Interior Point Methods for
Linear Optimization
C. Roos, T. Terlaky, J.-Ph. Vial
Dedicated to our wives
Gerda, Gabriella and Marie
and our children
Jacoline, Geranda, Marijn
Viktor
Benjamin and Emmanuelle
Contents
List of figures
List of tables
Preface
Acknowledgements

1 Introduction
  1.1 Subject of the book
  1.2 More detailed description of the contents
  1.3 What is new in this book?
  1.4 Required knowledge and skills
  1.5 How to use the book for courses
  1.6 Footnotes and exercises
  1.7 Preliminaries
    1.7.1 Positive definite matrices
    1.7.2 Norms of vectors and matrices
    1.7.3 Hadamard inequality for the determinant
    1.7.4 Order estimates
    1.7.5 Notational conventions

Part I  Introduction: Theory and Complexity

2 Duality Theory for Linear Optimization
  2.1 Introduction
  2.2 The canonical LO-problem and its dual
  2.3 Reduction to inequality system
  2.4 Interior-point condition
  2.5 Embedding into a self-dual LO-problem
  2.6 The classes B and N
  2.7 The central path
    2.7.1 Definition of the central path
    2.7.2 Existence of the central path
  2.8 Existence of a strictly complementary solution
  2.9 Strong duality theorem
  2.10 The dual problem of an arbitrary LO problem
  2.11 Convergence of the central path

3 A Polynomial Algorithm for the Self-dual Model
  3.1 Introduction
  3.2 Finding an ε-solution
    3.2.1 Newton-step algorithm
    3.2.2 Complexity analysis
  3.3 Polynomial complexity result
    3.3.1 Introduction
    3.3.2 Condition number
    3.3.3 Large and small variables
    3.3.4 Finding the optimal partition
    3.3.5 A rounding procedure for interior-point solutions
    3.3.6 Finding a strictly complementary solution
  3.4 Concluding remarks

4 Solving the Canonical Problem
  4.1 Introduction
  4.2 The case where strictly feasible solutions are known
    4.2.1 Adapted self-dual embedding
    4.2.2 Central paths of (P) and (D)
    4.2.3 Approximate solutions of (P) and (D)
  4.3 The general case
    4.3.1 Introduction
    4.3.2 Alternative embedding for the general case
    4.3.3 The central path of (SP2)
    4.3.4 Approximate solutions of (P) and (D)

Part II  The Logarithmic Barrier Approach

5 Preliminaries
  5.1 Introduction
  5.2 Duality results for the standard LO problem
  5.3 The primal logarithmic barrier function
  5.4 Existence of a minimizer
  5.5 The interior-point condition
  5.6 The central path
  5.7 Equivalent formulations of the interior-point condition
  5.8 Symmetric formulation
  5.9 Dual logarithmic barrier function

6 The Dual Logarithmic Barrier Method
  6.1 A conceptual method
  6.2 Using approximate centers
  6.3 Definition of the Newton step
  6.4 Properties of the Newton step
  6.5 Proximity and local quadratic convergence
  6.6 The duality gap close to the central path
  6.7 Dual logarithmic barrier algorithm with full Newton steps
    6.7.1 Convergence analysis
    6.7.2 Illustration of the algorithm with full Newton steps
  6.8 A version of the algorithm with adaptive updates
    6.8.1 An adaptive-update variant
    6.8.2 The affine-scaling direction and the centering direction
    6.8.3 Calculation of the adaptive update
    6.8.4 Illustration of the use of adaptive updates
  6.9 A version of the algorithm with large updates
    6.9.1 Estimates of barrier function values
    6.9.2 Estimates of objective values
    6.9.3 Effect of large update on barrier function value
    6.9.4 Decrease of the barrier function value
    6.9.5 Number of inner iterations
    6.9.6 Total number of iterations
    6.9.7 Illustration of the algorithm with large updates

7 The Primal-Dual Logarithmic Barrier Method
  7.1 Introduction
  7.2 Definition of the Newton step
  7.3 Properties of the Newton step
  7.4 Proximity and local quadratic convergence
    7.4.1 A sharper local quadratic convergence result
  7.5 Primal-dual logarithmic barrier algorithm with full Newton steps
    7.5.1 Convergence analysis
    7.5.2 Illustration of the algorithm with full Newton steps
    7.5.3 The classical analysis of the algorithm
  7.6 A version of the algorithm with adaptive updates
    7.6.1 Adaptive updating
    7.6.2 The primal-dual affine-scaling and centering direction
    7.6.3 Condition for adaptive updates
    7.6.4 Calculation of the adaptive update
    7.6.5 Special case: adaptive update at the µ-center
    7.6.6 A simple version of the condition for adaptive updating
    7.6.7 Illustration of the algorithm with adaptive updates
  7.7 The predictor-corrector method
    7.7.1 The predictor-corrector algorithm
    7.7.2 Properties of the affine-scaling step
    7.7.3 Analysis of the predictor-corrector algorithm
    7.7.4 An adaptive version of the predictor-corrector algorithm
    7.7.5 Illustration of adaptive predictor-corrector algorithm
    7.7.6 Quadratic convergence of the predictor-corrector algorithm
  7.8 A version of the algorithm with large updates
    7.8.1 Estimates of barrier function values
    7.8.2 Decrease of barrier function value
    7.8.3 A bound for the number of inner iterations
    7.8.4 Illustration of the algorithm with large updates

8 Initialization

Part III  The Target-following Approach

9 Preliminaries
  9.1 Introduction
  9.2 The target map and its inverse
  9.3 Target sequences
  9.4 The target-following scheme

10 The Primal-Dual Newton Method
  10.1 Introduction
  10.2 Definition of the primal-dual Newton step
  10.3 Feasibility of the primal-dual Newton step
  10.4 Proximity and local quadratic convergence
  10.5 The damped primal-dual Newton method

11 Applications
  11.1 Introduction
  11.2 Central-path-following method
  11.3 Weighted-path-following method
  11.4 Centering method
  11.5 Weighted-centering method
  11.6 Centering and optimizing together
  11.7 Adaptive and large target-update methods

12 The Dual Newton Method
  12.1 Introduction
  12.2 The weighted dual barrier function
  12.3 Definition of the dual Newton step
  12.4 Feasibility of the dual Newton step
  12.5 Quadratic convergence
  12.6 The damped dual Newton method
  12.7 Dual target-updating

13 The Primal Newton Method
  13.1 Introduction
  13.2 The weighted primal barrier function
  13.3 Definition of the primal Newton step
  13.4 Feasibility of the primal Newton step
  13.5 Quadratic convergence
  13.6 The damped primal Newton method
  13.7 Primal target-updating

14 Application to the Method of Centers
  14.1 Introduction
  14.2 Description of Renegar's method
  14.3 Targets in Renegar's method
  14.4 Analysis of the center method
  14.5 Adaptive- and large-update variants of the center method

Part IV  Miscellaneous Topics

15 Karmarkar's Projective Method
  15.1 Introduction
  15.2 The unit simplex Σ_n in IR^n
  15.3 The inner-outer sphere bound
  15.4 Projective transformations of Σ_n
  15.5 The projective algorithm
  15.6 The Karmarkar potential
  15.7 Iteration bound for the projective algorithm
  15.8 Discussion of the special format
  15.9 Explicit expression for the Karmarkar search direction
  15.10 The homogeneous Karmarkar format

16 More Properties of the Central Path
  16.1 Introduction
  16.2 Derivatives along the central path
    16.2.1 Existence of the derivatives
    16.2.2 Boundedness of the derivatives
    16.2.3 Convergence of the derivatives
  16.3 Ellipsoidal approximations of level sets

17 Partial Updating
  17.1 Introduction
  17.2 Modified search direction
  17.3 Modified proximity measure
  17.4 Algorithm with rank-one updates
  17.5 Count of the rank-one updates

18 Higher-Order Methods
  18.1 Introduction
  18.2 Higher-order search directions
  18.3 Analysis of the error term
  18.4 Application to the primal-dual Dikin direction
    18.4.1 Introduction
    18.4.2 The (first-order) primal-dual Dikin direction
    18.4.3 Algorithm using higher-order Dikin directions
    18.4.4 Feasibility and duality gap reduction
    18.4.5 Estimate of the error term
    18.4.6 Step size
    18.4.7 Convergence analysis
  18.5 Application to the primal-dual logarithmic barrier method
    18.5.1 Introduction
    18.5.2 Estimate of the error term
    18.5.3 Reduction of the proximity after a higher-order step
    18.5.4 The step-size
    18.5.5 Reduction of the barrier parameter
    18.5.6 A higher-order logarithmic barrier algorithm
    18.5.7 Iteration bound
    18.5.8 Improved iteration bound

19 Parametric and Sensitivity Analysis
  19.1 Introduction
  19.2 Preliminaries
  19.3 Optimal sets and optimal partition
  19.4 Parametric analysis
    19.4.1 The optimal-value function is piecewise linear
    19.4.2 Optimal sets on a linearity interval
    19.4.3 Optimal sets in a break point
    19.4.4 Extreme points of a linearity interval
    19.4.5 Running through all break points and linearity intervals
  19.5 Sensitivity analysis
    19.5.1 Ranges and shadow prices
    19.5.2 Using strictly complementary solutions
    19.5.3 Classical approach to sensitivity analysis
    19.5.4 Comparison of the classical and the new approach
  19.6 Concluding remarks

20 Implementing Interior Point Methods
  20.1 Introduction
  20.2 Prototype algorithm
  20.3 Preprocessing
    20.3.1 Detecting redundancy and making the constraint matrix sparser
    20.3.2 Reducing the size of the problem
  20.4 Sparse linear algebra
    20.4.1 Solving the augmented system
    20.4.2 Solving the normal equation
    20.4.3 Second-order methods
  20.5 Starting point
    20.5.1 Simplifying the Newton system of the embedding model
    20.5.2 Notes on warm start
  20.6 Parameters: step-size, stopping criteria
    20.6.1 Target-update
    20.6.2 Step size
    20.6.3 Stopping criteria
  20.7 Optimal basis identification
    20.7.1 Preliminaries
    20.7.2 Basis tableau and orthogonality
    20.7.3 The optimal basis identification procedure
    20.7.4 Implementation issues of basis identification
  20.8 Available software

Appendix A  Some Results from Analysis
Appendix B  Pseudo-inverse of a Matrix
Appendix C  Some Technical Lemmas
Appendix D  Transformation to canonical form
  D.1 Introduction
  D.2 Elimination of free variables
  D.3 Removal of equality constraints
Appendix E  The Dikin step algorithm
  E.1 Introduction
  E.2 Search direction
  E.3 Algorithm using the Dikin direction
  E.4 Feasibility, proximity and step-size
  E.5 Convergence analysis

Bibliography
Author Index
Subject Index
Symbol Index
List of Figures
1.1 Dependence between the chapters.
3.1 Output Full-Newton step algorithm for the problem in Example I.7.
5.1 The graph of ψ.
5.2 The dual central path if b = (0, 1).
5.3 The dual central path if b = (1, 1).
6.1 The projection yielding s^{-1}∆s.
6.2 Required number of Newton steps to reach proximity 10^{-16}.
6.3 Convergence rate of the Newton process.
6.4 The proximity before and after a Newton step.
6.5 Demonstration no.1 of the Newton process.
6.6 Demonstration no.2 of the Newton process.
6.7 Demonstration no.3 of the Newton process.
6.8 Iterates of the dual logarithmic barrier algorithm.
6.9 The idea of adaptive updating.
6.10 The iterates when using adaptive updates.
6.11 The functions ψ(δ) and ψ(−δ) for 0 ≤ δ < 1.
6.12 Bounds for b^T y.
6.13 The first iterates for a large update with θ = 0.9.
7.1 Quadratic convergence of primal-dual Newton process (µ = 1).
7.2 Demonstration of the primal-dual Newton process.
7.3 The iterates of the primal-dual algorithm with full steps.
7.4 The primal-dual full-step approach.
7.5 The full-step method with an adaptive barrier update.
7.6 Iterates of the primal-dual algorithm with adaptive updates.
7.7 Iterates of the primal-dual algorithm with cheap adaptive updates.
7.8 The right-hand side of (7.40) for τ = 1/2.
7.9 The iterates of the adaptive predictor-corrector algorithm.
7.10 Bounds for ψ_µ(x, s).
7.11 The iterates when using large updates with θ = 0.5, 0.9, 0.99 and 0.999.
9.1 The central path in the w-space (n = 2).
10.1 Lower bound for the decrease in φ_w during a damped Newton step.
11.1 A Dikin-path in the w-space (n = 2).
14.1 The center method according to Renegar.
15.1 The simplex Σ_3.
15.2 One iteration of the projective algorithm (x = x^k).
18.1 Trajectories in the w-space for higher-order steps with r = 1, 2, 3, 4, 5.
19.1 A shortest path problem.
19.2 The optimal partition of the shortest path problem in Figure 19.1.
19.3 The optimal-value function g(γ).
19.4 The optimal-value function f(β).
19.5 The feasible region of (D).
19.6 A transportation problem.
20.1 Basis tableau.
20.2 Tableau for a maximal basis.
E.1 Output of the Dikin Step Algorithm for the problem in Example I.7.
List of Tables
2.1. Scheme for dualizing.
3.1. Estimates for large and small variables on the central path.
3.2. Estimates for large and small variables if δ_c(z) ≤ τ.
6.1. Output of the dual full-step algorithm.
6.2. Output of the dual full-step algorithm with adaptive updates.
6.3. Progress of the dual algorithm with large updates, θ = 0.5.
6.4. Progress of the dual algorithm with large updates, θ = 0.9.
6.5. Progress of the dual algorithm with large updates, θ = 0.99.
7.1. Output of the primal-dual full-step algorithm.
7.2. Proximity values in the final iterations.
7.3. The primal-dual full-step algorithm with expensive adaptive updates.
7.4. The primal-dual full-step algorithm with cheap adaptive updates.
7.5. The adaptive predictor-corrector algorithm.
7.6. Asymptotic orders of magnitude of some relevant vectors.
7.7. Progress of the primal-dual algorithm with large updates, θ = 0.5.
7.8. Progress of the primal-dual algorithm with large updates, θ = 0.9.
7.9. Progress of the primal-dual algorithm with large updates, θ = 0.99.
7.10. Progress of the primal-dual algorithm with large updates, θ = 0.999.
16.1. Asymptotic orders of magnitude of some relevant vectors.
Preface
Linear Optimization1 (LO) is one of the most widely taught and applied mathematical
techniques. Due to revolutionary developments both in computer technology and
algorithms for linear optimization, ‘the last ten years have seen an estimated six orders
of magnitude speed improvement’.2 This means that problems that could not be solved
10 years ago, due to a required computational time of one year, say, can now be solved
within some minutes. For example, linear models of airline crew scheduling problems
with as many as 13 million variables have recently been solved within three minutes
on a four-processor Silicon Graphics Power Challenge workstation. The achieved
acceleration is due partly to advances in computer technology and for a significant
part also to the developments in the field of so-called interior-point methods for linear
optimization.
Until very recently, the method of choice for solving linear optimization problems
was the Simplex Method of Dantzig [59]. Since the initial formulation in 1947, this
method has been constantly improved. It is generally recognized to be very robust and
efficient and it is routinely used to solve problems in Operations Research, Business,
Economics and Engineering. In an effort to explain the remarkable efficiency of the
Simplex Method, people strived to prove, using the theory of complexity, that the
computational effort to solve a linear optimization problem via the Simplex Method
is polynomially bounded in the size of the problem instance. This question is still
unsettled today, but it stimulated two important proposals of new algorithms for LO.
The first one is due to Khachiyan in 1979 [167]: it is based on the ellipsoid technique
for nonlinear optimization of Shor [255]. With this technique, Khachiyan proved that
LO belongs to the class of polynomially solvable problems. Although this result has
had a great theoretical impact, the new algorithm failed to deliver its promises in
actual computational efficiency. The second proposal was made in 1984 by Karmarkar [165]. Karmarkar’s algorithm is also polynomial, with a better complexity bound
than Khachiyan's, but it has the further advantage of being highly efficient in practice.
After an initial controversy it has been established that for very large, sparse problems,
subsequent variants of Karmarkar's method often outperform the Simplex Method.

1 The field of Linear Optimization has been given the name Linear Programming in the past. The origin of this name goes back to the Dutch Nobel prize winner Koopmans. See Dantzig [60]. Nowadays the word 'programming' usually refers to the activity of writing computer programs, and as a consequence its use instead of the more natural word 'optimization' gives rise to confusion. Following others, like Padberg [230], we prefer to use the name Linear Optimization in the book. It may be noted that in the nonlinear branches of the field of Mathematical Programming (like Combinatorial Optimization, Discrete Optimization, Semidefinite Optimization, etc.) this terminology has already become generally accepted.
2 This claim is due to R.E. Bixby, professor of Computational and Applied Mathematics at Rice University, and director of CPLEX Optimization, Inc., a company that markets algorithms for linear and mixed-integer optimization. See the news bulletin of the Center For Research on Parallel Computation, Volume 4, Issue 1, Winter 1996. Bixby adds that parallelization may lead to 'at least eight orders of magnitude improvement – the difference between a year and a fraction of a second!'
Though the field of LO was considered more or less mature some ten years ago, after
Karmarkar’s paper it suddenly surfaced as one of the most active areas of research in
optimization. In the period 1984–1989 more than 1300 papers were published on the
subject, which became known as Interior Point Methods (IPMs) for LO.3 Originally
the aim of the research was to get a better understanding of the so-called Projective
Method of Karmarkar. Soon it became apparent that this method was related to
classical methods like the Affine Scaling Method of Dikin [63, 64, 65], the Logarithmic
Barrier Method of Frisch [86, 87, 88] and the Center Method of Huard [148, 149],
and that the last two methods could also be proved to be polynomial. Moreover, it
turned out that the IPM approach to LO has a natural generalization to the related
field of convex nonlinear optimization, which resulted in a new stream of research
and an excellent monograph of Nesterov and Nemirovski [226]. Promising numerical
performances of IPMs for convex optimization were recently reported by Breitfeld
and Shanno [50] and Jarre, Kocvara and Zowe [162]. The monograph of Nesterov
and Nemirovski opened the way into another new subfield of optimization, called
Semidefinite Optimization, with important applications in System Theory, Discrete
Optimization, and many other areas. For a survey of these developments the reader
may consult Vandenberghe and Boyd [48].
As a consequence of the above developments, there are now profound reasons why
people may want to learn about IPMs. We hope that this book answers the need of
professors who want to teach their students the principles of IPMs, of colleagues who
need a unified presentation of a desperately burgeoning field, of users of LO who want
to understand what is behind the new IPM solvers in commercial codes (CPLEX, OSL,
. . .) and how to interpret results from those codes, and of other users who want to
exploit the new algorithms as part of a more general software toolbox in optimization.
Let us briefly indicate here what the book offers, and what it does not. Part I
contains a small but complete and self-contained introduction to LO. We deal with
the duality theory for LO and we present a first polynomial method for solving an LO
problem. We also present an elegant method for the initialization of the method,
using the so-called self-dual embedding technique. Then in Part II we present a
comprehensive treatment of Logarithmic Barrier Methods. These methods are applied
to the LO problem in standard format, the format that has become most popular in
the field because the Simplex Method was originally devised for that format. This
part contains the basic elements for the design of efficient algorithms for LO. Several
types of algorithm are considered and analyzed. Very often the analysis improves the
existing analysis and leads to sharper complexity bounds than known in the literature.
In Part III we deal with the so-called Target-following Approach to IPMs. This is a
unifying framework that enables us to treat many other IPMs, like the Center Method,
in an easy way. Part IV covers some additional topics. It starts with the description
and analysis of the Projective Method of Karmarkar. Then we discuss some more
interesting theoretical properties of the central path. We also discuss two interesting
methods to enhance the efficiency of IPMs, namely Partial Updating, and so-called
Higher-Order Methods. This part also contains chapters on parametric and sensitivity
analysis and on computational aspects of IPMs.

3 We refer the reader to the extensive bibliography of Kranich [179, 180] for a survey of the literature on the subject until 1989. A more recent (annotated) bibliography was given by Roos and Terlaky [242]. A valuable source of information is the World Wide Web interior point archive: http://www.mcs.anl.gov/home/otc/InteriorPoint.archive.html.
It may be clear from this description that we restrict ourselves to Linear Optimization in this book. We do not dwell on such interesting subjects as Convex Optimization and Semidefinite Optimization, but we consider the book as a preparation for
the study of IPMs for these types of optimization problem, and refer the reader to the
existing literature.4
Some popular topics in IPMs for LO are not covered by the book. For example,
we do not treat the (Primal) Affine Scaling Method of Dikin.5 The reason for this
is that we restrict ourselves in this book to polynomial methods and until now the
polynomiality question for the (Primal) Affine Scaling Method is unsettled. Instead
we describe in Appendix E a primal-dual version of Dikin’s affine-scaling method
that is polynomial. Chapter 18 describes a higher-order version of this primal-dual
affine-scaling method that has the best possible complexity bound known until now
for interior-point methods.
Another topic not touched in the book is (Primal-Dual) Infeasible Start Methods.
These methods, which have drawn a lot of attention in the last years, deal with the
situation when no feasible starting point is available.6 In fact, Part I of the book
provides a much more elegant solution to this problem; there we show that any given
LO problem can be embedded in a self-dual problem for which a feasible interior
starting point is known. Further, the approach in Part I is theoretically more efficient
than using an Infeasible Start Method, and from a computational point of view is not
more involved, as we show in Chapter 20.
We hope that the book will be useful to students, users and researchers, inside and
outside the field, in offering them, under a single cover, a presentation of the most
successful ideas in interior-point methods.
Kees Roos
Tamás Terlaky
Jean-Philippe Vial

4 For Convex Optimization the reader may consult den Hertog [140], Nesterov and Nemirovski [226] and Jarre [161]. For Semidefinite Optimization we refer to Nesterov and Nemirovski [226], Vandenberghe and Boyd [48] and Ramana and Pardalos [236]. We also mention Shanno and Breitfeld and Simantiraki [252] for the related topic of barrier methods for nonlinear programming.
5 A recent survey on affine scaling methods was given by Tsuchiya [272].
6 We refer the reader to, e.g., Potra [235], Bonnans and Potra [45], Wright [295, 297], Wright and Ralph [296] and the recent book of Wright [298].

Preface to the 2005 edition

Twenty years after Karmarkar's [165] epoch-making paper, interior point methods
(IPMs) made their way to all areas of optimization theory and practice. The theory of
IPMs matured, and their professional software implementations significantly pushed the
boundary of efficiently solvable problems. Eight years have passed since the first edition
of this book was published. In these years the theory of IPMs further crystallized.
One of the notable developments is that the significance of the self-dual embedding
model, which is a distinctive feature of this book, got fully recognized. Leading linear
and conic-linear optimization software packages, such as MOSEK7 and SeDuMi8, are
developed on the bedrock of the self-dual model, and the leading commercial linear
optimization package CPLEX9 includes the embedding model as a proposed option to
solve difficult practical problems.
This new edition of this book features a completely rewritten first part. While
keeping the simplicity of the presentation and accessibility of complexity analysis,
the featured IPM in Part I is now a standard, primal-dual path-following Newton
algorithm. This choice allows us to reach the so-far best known complexity result in
an elementary way, immediately in the first part of the book.
As always, the authors had to make choices when and how to cut the expansion of
the material of the book, and which new results to include in this edition. We cannot
resist mentioning two developments after the publication of the first edition.
The first development can be considered as a direct consequence of the approach
taken in the book. In our approach properties of the univariate function ψ(t), as defined
in Section 5.5 (page 92), play a key role. The book makes clear that the primal-, dual- and primal-dual logarithmic barrier functions can be defined in terms of ψ(t), and
as such ψ(t) is at the heart of all logarithmic barrier functions; we call it now the
kernel function of the logarithmic barrier function. After the completion of the book
it became clear that more efficient large-update IPMs than those considered in this
book, which are all based on the logarithmic barrier function, can be obtained simply
by replacing ψ(t) by other kernel functions. A large class of such kernel functions,
which allowed the worst-case complexity of large-update IPMs to be improved, is the family
of self-regular functions, which is the subject of the monograph [233]; more kernel
functions were considered in [32].
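As an illustration of this terminology (using one common convention, which need not coincide in every constant with the definition given in Section 5.5), with the kernel ψ(t) = t − 1 − ln t the primal-dual logarithmic barrier function of a positive pair (x, s) with respect to the barrier parameter µ can be written as

    Φ(x, s; µ) = Σ_{i=1}^{n} ψ(x_i s_i / µ) = (x^T s)/µ − n − Σ_{i=1}^{n} ln(x_i s_i / µ),

which is nonnegative and vanishes exactly at the µ-center, where x_i s_i = µ for all i; replacing ψ by another kernel function changes the barrier, and hence the search direction, accordingly.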
A second, more recent development deals with the complexity of IPMs. Until now,
the best iteration bound for IPMs is O(√n L), where n denotes the dimension of the
problem (in standard form) and L the binary input size of the problem. In 1996, Todd
and Ye showed that O(∛n L) is a lower bound for the iteration complexity of IPMs
[267]. It is well known that the iteration complexity highly depends on the curliness
of the central path, and that the presence of redundancy may severely affect this
curliness. Deza et al. [61] showed that by adding enough redundant constraints to the
Klee-Minty example of dimension n, the central path may be forced to visit all 2^n
vertices of the Klee-Minty cube. An enhanced version of the same example, where the
number of inequalities is N = O(2^{2n} n^3), yields an O(√N / log N) lower bound for the
iteration complexity, thus almost closing (up to a factor of log N) the gap with the
best worst-case iteration bound for IPMs [62].
Instructors adopting the book as a textbook in a course may contact the authors at
<terlaky@mcmaster.ca> to obtain the "Solution Manual" for the exercises and to
get access to a user forum.
March 2005
Kees Roos
Tamás Terlaky
Jean-Philippe Vial

7 MOSEK: http://www.mosek.com
8 SeDuMi: http://sedumi.mcmaster.ca
9 CPLEX: http://cplex.com
Acknowledgements
The subject of this book came into existence during the twelve years following 1984
when Karmarkar initiated the field of interior-point methods for linear optimization.
Each of the authors has been involved in the exciting research that gave rise to the
subject and in many cases they published their results jointly. Of course the book
is primarily organized around these results, but it goes without saying that many
other results from colleagues in the ‘interior-point community’ are also included. We
are pleased to acknowledge their contribution and at the appropriate places we have
strived to give them credit. If some authors do not find due mention of their work
we apologize for this and invoke as an excuse the exploding literature that makes it
difficult to keep track of all the contributions.
To reach a unified presentation of many diverse results, it did not suffice to make
a bundle of existing papers. It was necessary to recast completely the form in which
these results found their way into the journals. This was a very time-consuming task:
we want to thank our universities for giving us the opportunity to do this job.
We gratefully acknowledge the developers of LaTeX for designing this powerful text
processor and our colleagues Leo Rog and Peter van der Wijden for their assistance
whenever there was a technical problem. For the construction of many tables and
figures we used MATLAB; nowadays we could say that a mathematician without
MATLAB is like a physicist without a microscope. It is really exciting to study the
behavior of a designed algorithm with the graphical features of this ‘mathematical
microscope’.
We greatly enjoyed stimulating discussions with many colleagues from all over the
world in the past years. Often this resulted in cooperation and joint publications.
We kindly acknowledge that without the input from their side this book could not
have been written. Special thanks are due to those colleagues who helped us during
the writing process. We mention János Mayer (University of Zürich, Switzerland) for
his numerous remarks after a critical reading of large parts of the first draft and
Michael Saunders (Stanford University, USA) for an extremely careful and useful
preview of a later version of the book. Many other colleagues helped us to improve
intermediate drafts. We mention Jan Brinkhuis (Erasmus University, Rotterdam)
who provided us with some valuable references, Erling Andersen (Odense University,
Denmark), Harvey Greenberg and Allen Holder (both from the University of Colorado
at Denver, USA), Tibor Illés (Eötvös University, Budapest), Florian Jarre (University
of Würzburg, Germany), Etienne de Klerk (Delft University of Technology), Panos
Pardalos (University of Florida, USA), Jos Sturm (Erasmus University, Rotterdam),
and Joost Warners (Delft University of Technology).
Finally, the authors would like to acknowledge the generous contributions of
numerous colleagues and students. Their critical reading of earlier drafts of the
manuscript helped us to clean up the new edition by eliminating typos and using
their constructive remarks to improve the readability of several parts of the book. We
mention Jiming Peng (McMaster University), Gema Martinez Plaza (The University
of Alicante) and Manuel Vieira (University of Lisbon/University of Technology Delft).
Last but not least, we want to express warm thanks to our wives and children. They
also contributed substantially to the book by their mental support, and by forgiving
our shortcomings as fathers for too long.
1 Introduction

1.1 Subject of the book
This book deals with linear optimization (LO). The object of LO is to find the optimal
(minimal or maximal) value of a linear function subject to linear constraints on the
variables. The constraints may be either equality or inequality constraints.1 From
the point of view of applications, LO possesses many nice features. Linear models are
relatively simple to create. They can be realistic enough to give a proper account of the
problems at hand. As a consequence, LO models have found applications in different
areas such as engineering, management, logistics, statistics, pattern recognition, etc.
LO is also very relevant to economic theory. It underlies the analysis of linear activity
models and provides, through duality theory, a nice insight into the price mechanism.
However, we will not deal with applications and modeling. Many existing textbooks
teach more about this.2
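To fix ideas, a linear optimization problem and its dual can be written, for instance, in the canonical form with nonnegative variables and inequality constraints (a sketch only; the precise formats used in this book are introduced in Chapters 2 and 5) as

    (P)  min { c^T x : Ax ≥ b, x ≥ 0 },        (D)  max { b^T y : A^T y ≤ c, y ≥ 0 },

where A is an m × n matrix, b ∈ IR^m and c ∈ IR^n, and duality theory relates the optimal values of (P) and (D).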
Our interest will be mainly in methods for solving LO problems, especially Interior
Point Methods (IPM’s). Renewed interest in these methods for solving LO problems
arose after the seminal paper of Karmarkar [165] in 1984. The overwhelming amount
of research of the last ten years has been tremendously prolific. Many new algorithms
were proposed and almost all of these algorithms have been shown to be efficient, at
least from a theoretical point of view. Our first aim is to present a comprehensive and
unified treatment of many of these new methods.
It may not be surprising that exploring a new method for LO should lead to a new
view of the theory of LO. In fact, a similar interaction between method and theory
is well known for the Simplex Method; in the past the theory of LO and the Simplex
Method were intimately related. The fundamental results of the theory of LO concern
strong duality and the existence of a strictly complementary solution. Our second aim
will be to derive these results from limiting properties of the so-called central path of
an LO problem.
Thus the very theory of LO is revisited. The central path appears to play a key role
both in the development of the theory and in the design of algorithms.
1 The more general optimization problem arising when the objective function and/or the constraints are nonlinear is not considered. It may be pointed out that LO is the first building block in the development of the theory of nonlinear optimization. Algorithmically, LO is also widely used in nonlinear and integer optimization, either as a subroutine in a more complicated algorithm or as a starting point of a specialized algorithm.
2 The book of Williams [293] is completely devoted to the design of mathematical models, including linear models.
As a consequence, the book can be considered a self-contained treatment of LO.
The reader familiar with the subject of LO will easily recognize the difference from
the classical approach to the theory. The Simplex Method in essence explores the
polyhedral structure of the domain (or feasible region) of an LO problem. Accordingly,
the classical approach to the theory of LO concentrates on the polyhedral structure of
the domain. On the other hand, the IPM approach uses the central path as a guide to
the set of optimal solutions, and the theory follows by studying the limiting properties
of this path.3 As we will see, the limit of the central path is a strictly complementary
solution. Strictly complementary solutions play a crucial role in the theory as presented
in Part I of the book. Also, in general, the output of a well-designed IPM for LO is a
strictly complementary solution. Recall that the Simplex Method generates a so-called
basic solution and that such solutions are fundamental in the classical theory of LO.
From the practical point of view it is most important to study the sensitivity of
an optimal solution under perturbations in the data of an LO problem. This is the
subject of Sensitivity (or Parametric or Postoptimal) Analysis. Our third aim will be
to present some new results in this respect, which will make clear the well-known fact
that the classical approach has some inherent weaknesses. These weaknesses can be
overcome by exploring the concept of the optimal partition of an LO problem which
is closely related to a strictly complementary solution.
1.2 More detailed description of the contents
As stated in the previous section, we intend to present an interior point approach
to both the theory of LO and algorithms for LO (design, convergence, complexity
and asymptotic behavior). The common thread through the various parts of the book
will be the prominent role of strictly complementary solutions; this notion plays a
crucial role in the IPM approach and distinguishes the new approach from the classical
Simplex based approach.
Part I of the book consists of Chapters 2, 3 and 4. This part is a self-contained
treatment of LO. It provides the main theoretical results for LO, as well as a
polynomial method for solving the LO problem. The theory of LO is developed in
Chapter 2. This is done in a way that is probably new for most readers, even for those
who are familiar with LO. As indicated before, in IPM’s a fundamental element is
the central path of a problem. This path is introduced in Chapter 2 and the duality
theory for LO is derived from its properties. The general theory turns out to follow
easily when considering first the relatively small class of so-called self-dual problems.
The results for self-dual problems are extended to general problems by embedding
any given LO problem in an appropriate self-dual problem. Chapter 3 presents an
algorithm that solves self-dual problems in polynomial time. It may be emphasized
that this algorithm yields a so-called strictly complementary solution of the given
problem. Such a solution, in general, provides much more information on the set of
optimal solutions than an optimal basic solution as provided by the Simplex Method.
The strictly complementary solution is obtained by applying a rounding procedure to
a sufficiently accurate approximate solution. Chapter 4 is devoted to LO problems in
canonical format, with (only) nonnegative variables and (only) inequality constraints.
A thorough discussion of the special structure of the canonical format provides some
specialized embeddings in self-dual problems. As a byproduct we find the central
path for canonical LO problems. We also discuss how an approximate solution for the
canonical problem can be obtained from an approximate solution of the embedding
problem.

3 Most of the fundamental duality results for LO will be well known to many of the readers; they can be found in any textbook on LO. Probably the existence of a strictly complementary solution is less well known. This result has been shown first by Goldman and Tucker [111] and will be referred to as the Goldman–Tucker theorem. It plays a crucial role in this book. We get it as a byproduct of the limiting behavior of the central path.
The two main components in an iterative step of an IPM are the search direction
and the step-length along that direction. The algorithm in Part I is a rather simple
primal-dual algorithm based on the primal-dual Newton direction and uses a very
simple step-length rule: the step length is always 1. The resulting Full-Newton Step
Algorithm is polynomial and straightforward to implement. However, the theoretical
iteration bound derived for this algorithm, although polynomial, is relatively poor
when compared with algorithms based on other search strategies. Therefore, more
efficient methods are considered in Part II of the book; they are so-called Logarithmic
Barrier Methods. For reasons of compatibility with the existing literature, on both
the Simplex Method and IPM’s, we abandon the canonical format (with nonnegative
variables and inequality constraints) in Part II and use the so-called standard format
(with nonnegative variables and equality constraints).
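To convey the flavor of such a full-Newton-step method, the following is a minimal Python sketch for a standard-format problem min{c^T x : Ax = b, x ≥ 0}, assuming a strictly feasible and well-centered starting triple (x, y, s) is given; the function name, the update factor and all other details are illustrative only and do not reproduce the algorithm of Part I literally.

import numpy as np

def full_newton_step_ipm(A, b, c, x, y, s, eps=1e-8):
    # Primal-dual path-following sketch with full Newton steps for
    # min c'x s.t. Ax = b, x >= 0, started from a strictly feasible
    # triple: x > 0, s > 0, Ax = b, A'y + s = c.
    m, n = A.shape
    theta = 1.0 / (2.0 * np.sqrt(n))      # small barrier update (full-step regime)
    mu = float(x @ s) / n
    while x @ s >= eps:                   # stop when the duality gap x's is small
        mu *= 1.0 - theta                 # shrink the barrier parameter
        # Newton step toward the mu-center, i.e. toward x_i * s_i = mu for all i:
        #   A dx = 0,   A' dy + ds = 0,   s*dx + x*ds = mu*e - x*s
        r = (mu - x * s) / s
        M = (A * (x / s)) @ A.T           # A diag(x/s) A'
        dy = np.linalg.solve(M, -(A @ r))
        ds = -A.T @ dy
        dx = r - (x / s) * ds
        x, y, s = x + dx, y + dy, s + ds  # step length 1: a full Newton step
    return x, y, s

With an update factor of order 1/√n as above, the duality gap shrinks by a constant factor every O(√n) iterations, which is the kind of polynomial bound analyzed in Part I (provided the iterates stay sufficiently close to the central path, a condition the sketch does not verify).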
In order to make Part II independent of Part I, in Chapter 5 we revisit duality
theory and discuss the relevant results for the standard format from an interior point
of view. This includes, of course, the definition and existence of the central paths for
the (primal) problem in standard form and its dual problem (which has free variables
and inequality constraints). Using a symmetric formulation of both problems we see
that any method for the primal problem induces in a natural way a method for the dual
problem and vice versa. Then, in Chapter 6, we focus on the Dual Logarithmic Barrier
Method; according to the previous remark the analysis can be naturally, and easily,
transformed to the primal case. The search direction here is the Newton direction for
minimizing the (classical) dual logarithmic barrier function with barrier parameter µ.
Three types of method are considered. First we analyze a method that uses full Newton
steps and small updates of the barrier parameter µ. This gives another central-path-following method that admits the best possible iteration bound. Secondly, we discuss
the use of adaptive updates of µ; this leaves the iteration bound unchanged, but
enhances the practical behavior. Finally, we consider methods that use large updates
of µ and a bounded number of damped Newton steps between each pair of successive
barrier updates. The (theoretical worst-case) iteration bound is worse than for the
full Newton step method, but this seems to be due to the poor analysis of this type
of method. In practice large-update methods are much more efficient than the full
Newton step method. This is demonstrated by some (small) examples. Chapter 7
deals with the Primal-Dual Logarithmic Barrier Method. It has basically the same
structure as Chapter 6. Having defined the primal-dual Newton direction, we deal
first with a full primal-dual Newton step method that allows small updates in the
barrier parameter µ. Then we consider a method with adaptive updates of µ, and
finally methods that use large updates of µ and a bounded number of damped primal-dual Newton steps between each pair of successive barrier updates. In-between we
also deal with the Predictor-Corrector Method. The nice feature of this method is
its asymptotic quadratic convergence rate. Some small computational examples are
included that highlight the better performance of the primal-dual Newton method
compared with the dual (or primal) Newton method. The methods used in Part II
need to be initialized with a strictly feasible solution.4 Therefore, in Chapter 8 we
discuss how to meet this condition. This concludes the description of Part II.
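For orientation, in one common notation (which may differ in sign and scaling from the convention adopted in this book) the classical dual logarithmic barrier function for the dual problem max{b^T y : A^T y + s = c, s ≥ 0} is

    b^T y + µ Σ_{j=1}^{n} ln s_j,   with s = c − A^T y > 0,

and for each fixed barrier parameter µ its maximizer is the µ-center of the dual feasible region; the methods of Chapter 6 approximate these centers by Newton steps while µ is driven to zero.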
At this stage of the book, the reader will have encountered the main theoretical
ideas underlying efficient implementations of IPM’s for LO. He will have been exposed
to many variants of IPM’s, dual and primal-dual methods with either full or damped
Newton steps.5 The search directions in these methods are Newton directions. All these
methods, in one way or another, use the central path as a guideline to optimality. Part
III is devoted to a broader class of IPM’s, some of which also follow the central path but
others do not. In Chapter 9 we introduce the unifying concepts of target sequence and
Target-following Methods. In the Logarithmic Barrier Methods of Part II the target
sequence always consists of points on the central path. Other IPM’s can be simply
characterized by their target sequence. We present some examples in Chapter 11,
where we deal with weighted-path-following methods, a Dikin-path-following method,
and also with a centering method that can be used to compute the so-called weightedanalytic center of a polytope. Chapters 10, 12 and 13 present respectively primal-dual,
dual and primal versions of Newton’s method for following a given target sequence.
Finally, concluding Part III, in Chapter 14 we describe a famous interior-point method,
due to Renegar and based on the center method of Huard; we show that it nicely fits
in the framework of target-following methods, with the targets on the central path.
Part IV is entitled Miscellaneous Topics: it contains material that deserves a place
in the book but did not fit well in any of the previous three parts. The reader will
have noticed that until now we have not discussed the very first polynomial IPM,
the Projective Method of Karmarkar. This is because the mainstream of research into
IPM’s diverged from this method soon after 1984.6 Because of the big influence this
algorithm had on the field of LO, and also because there is still a small ongoing stream
of research in this direction, it deserves a place in this book. We describe and analyze
Karmarkar’s method in Chapter 15. Surprisingly enough, and in contrast with all
other methods discussed in this book, both in the description and the analysis of Karmarkar’s method we do not refer to the central path; also, the search direction differs
from the Newton directions used in the other methods. In Chapter 16 we return to the
central path. We show that the central path is differentiable and study the asymptotic
4 A feasible solution is called strictly feasible if no variable or inequality constraint is at (one of) its bound(s).
5 In the literature, full-step methods are often called short-step methods and damped Newton step methods long-step methods or large-step methods. In damped-step methods a line search is made in each iteration that aims to (approximately) minimize a barrier (or potential) function. Therefore, these methods are also known as potential reduction methods.
6 There are still many textbooks on LO that do not deal with IPM's. Moreover, in some other textbooks that pay attention to IPM's, the authors only discuss the Projective Method of Karmarkar, thereby neglecting the important developments after 1984 that gave rise to the efficient methods used in the well-known commercial codes, such as CPLEX and OSL. Exceptions, in this respect, are Bazaraa, Sherali and Shetty [37], Padberg [230] and Fang and Puthenpura [74], who discuss the existence of other IPM's in a separate section or chapter. We also mention Saigal [249], who gives a large chapter (of 150 pages) on a topic not covered in this book, namely (primal) affine-scaling methods. A recent survey on these methods is given by Tsuchiya [272].
behavior of the derivatives when the optimal set is approached. We also show that we
can associate with each point on the central path two homothetic ellipsoids centered at
this point so that one ellipsoid is contained in the feasible region and the other ellipsoid
contains the optimal set. The next two chapters deal with methods for accelerating
IPM’s. Chapter 17 deals with a technique called partial updating, already proposed in
Karmarkar’s original paper. In Chapter 18 we consider so-called higher-order methods.
The Newton methods used before are considered to be first-order methods. It is shown
that more advanced search directions improve the iteration bound for several first-order
methods. The complexity bound achieves the best value known for IPM's nowadays.
We also apply the higher-order technique to the Logarithmic Barrier Method.
Chapter 19 deals with Parametric and Sensitivity Analysis. This classical subject
in LO is of great importance in the analysis of practical linear models. Almost any
textbook includes a section about it and many commercial optimization packages offer
an option to perform post-optimal analysis. Unfortunately, the classical approach,
based on the use of an optimal basic solution, has some inherent weaknesses. These
weaknesses are discussed and demonstrated. We follow a new approach in this chapter,
leading to a better understanding of the subject and avoiding the shortcomings of
the classical approach. The notions of optimal partition and strictly complementary
solution play an important role, but to avoid any misunderstanding, it should be
emphasized that the new approach can also be performed when only an optimal basic
solution is available.
After all the efforts spent in the book to develop beautiful theorems and convergence
results the reader may want to get some more evidence that IPM’s work well in
practice. Therefore the final chapter is devoted to the implementation of IPM’s.
Though most implementations more or less follow the scheme prescribed by the
theory, there is still a large gap between the theory and an efficient implementation.
Chapter 20 discusses some of the important implementation issues.
1.3 What is new in this book?
The book offers an approach to LO and to IPM’s that is new in many aspects.7 First,
the derivation of the main theoretical results for LO, like the duality theory and the
existence of a strictly complementary solution from properties of the central path, is
new. The primal-dual algorithm for solving self-dual problems is also new; equipped
with the rounding procedure it yields an exact strictly complementary solution. The
derivation of the polynomial complexity of the whole procedure is surprisingly simple.8
The algorithms in Part II, based on the logarithmic barrier method, are known
from the literature, but their analysis contains many new elements, often resulting
in much sharper bounds than those in the literature. In this respect an important
(and new) tool is the function ψ, first introduced in Section 5.5 and used through
the rest of the book. We present a comprehensive discussion of all possible variants
of these algorithms (like dual, primal and primal-dual full-step, adaptive-update and
7 Of course, the book is inspired by many papers and results of many colleagues. Thinking over these results often led to new insights, new algorithms and new ways to analyze these algorithms.
8 The approach in Part I, based on the embedding of a given LO problem in a self-dual problem, suggests some new and promising implementation strategies.
large-update methods). We also deal with the predictor-corrector method, which is
very important from the practical point of view, and show that this method has an
asymptotically quadratic convergence rate. We also discuss the techniques of partial
updating and the use of higher-order methods. Finally, we present a new approach to
sensitivity analysis and discuss many computational aspects which are crucial for
efficient implementation of IPM’s.
1.4 Required knowledge and skills
We wanted to write a book that presents the most prominent results on IPM’s in a
unified and comprehensive way, with a full development of the most important items.
In particular, Part I can be considered as an elementary introduction to LO, containing
both a complete derivation of the duality theory and an easy-to-analyze polynomial
algorithm.
The mathematical tools that are used do not go beyond standard calculus and linear
algebra. Nevertheless, people educated in the Simplex-based approach to LO will need
some effort to get acquainted with the formalism and the mathematical manipulations.
They have struggled with the algebra of pivoting; the new methods do not refer to
pivoting.9 However, the tools used are not much more advanced than those that were
required to master the Simplex Method. We therefore expect that people will quickly
get acquainted with the new tools, just as many generations of students have become
familiar with pivoting.
In general, the level of the book will be accessible to any student in Operations
Research and Mathematics, with 2 to 3 years of basic training in calculus and linear
algebra.
1.5 How to use the book for courses
Owing to the importance of LO in theory and in practice, it must be expected that
IPM’s will soon become a popular topic in Operations Research and other fields where
LO is used, such as Business, Economics and Engineering. More and more institutions
will open courses dedicated to IPM’s for LO. It has been one of our purposes to collect
in this book all relevant material from research papers, survey papers, etc. and to strive
for a cohesive and easily accessible source for such courses.
The dependence between the chapters is demonstrated in Figure 1.1. This figure
indicates some possible reading paths through the book. For newcomers in the field
we recommend starting with Part I, consisting of Chapters 2, 3 and 4. This part of
the book can be used for a basic course in LO, covering duality theory and offering
a first and easy-to-analyze polynomial algorithm: the Full-Newton Step Algorithm.
Part II deals with LO problems in standard format. Chapter 5 covers the duality
theory and Chapters 6 and 7 deal with several interesting variants of the Logarithmic
9 However, numerical analysts who want to perform the actual implementation really need to master advanced sparse linear algebra, including pivoting strategies in matrix factorization. See Chapter 20.
[Figure 1.1: Dependence between the chapters.]
Barrier Method that underlie the efficient solvers in existing commercial optimization
packages. For readers who know the Simplex Method and who are familiar with the
LO problem in standard format, we made Part II independent of Part I; they might
wish to start their reading with Part II and then proceed with Part I.
Part III, on the target-following approach, offers much new understanding of the
principles of IPM’s, as well as a unifying and easily accessible treatment of other
IPM’s, such as the method of Renegar (Chapter 14). This part could be part of a
more advanced course on IPM’s.
Chapter 15 contains a relatively simple description and analysis of Karmarkar’s
Projective Method. This chapter is almost independent of the previous chapters and
hence can be read at any stage.
Chapters 16, 17 and 18 could find a place in an advanced course. The value of
Chapter 16 is purely theoretical and is recommended to readers who want to delve
more deeply into properties of the central path. The other two chapters, on the other
hand, have more practical value. They describe and apply two techniques (partial
updating and higher-order methods) that can be used to enhance the efficiency of
some methods.
We consider Chapter 19 to be extremely important for users of LO who are interested
in the sensitivity of their models to perturbations in the input data. This chapter is
independent of almost all the previous chapters.
Finally, Chapter 20 is relevant for readers who are interested in implementation
issues. It assumes a basic understanding of many theoretical concepts for IPM’s and
of advanced numerical algebra.
1.6 Footnotes and exercises
It may be worthwhile to devote some words to the positioning of footnotes and
exercises in this book. The footnotes are used to refer to related references, or to
make a small digression from the main thrust of the reasoning. We preferred to place
the footnotes not at the end of each chapter but at the bottom of the page they refer
to. We have treated exercises in the same way. They often have a goal similar to
footnotes, namely to highlight a result closely related to results discussed in the book.
1.7 Preliminaries
We assume that the reader is familiar with the basic concepts of linear algebra, such as
linear (sub-)space, linear (in-)dependence of vectors, determinant of a (square) matrix,
nonsingularity of a matrix, inverse of a matrix, etc. We recall some basic concepts and
results in this section.10
1.7.1 Positive definite matrices
The space of all square n × n matrices is denoted by IR^{n×n}. A matrix A ∈ IR^{n×n}
is called a positive definite matrix if A is symmetric and each of its eigenvalues is
positive.11 The following statements are equivalent for any symmetric matrix A:
(i) A is positive definite;
(ii) A = C^T C for some nonsingular matrix C;
(iii) x^T Ax > 0 for each nonzero vector x.
A matrix A ∈ IR^{n×n} is called a positive semi-definite matrix if A is symmetric
and its eigenvalues are nonnegative. The following statements are equivalent for any
symmetric matrix A:
(i) A is positive semi-definite;
(ii) A = C^T C for some matrix C;
(iii) x^T Ax ≥ 0 for each vector x.
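As a small numerical illustration of these equivalences (a sketch, assuming NumPy is available; the matrix C below is just an arbitrary example), one can check positive definiteness both via the eigenvalues and via property (iii):

import numpy as np

# Any C with full rank gives a positive definite A = C^T C (property (ii)).
C = np.array([[2.0, 1.0],
              [0.0, 3.0]])
A = C.T @ C                          # symmetric and positive definite by construction

print(np.linalg.eigvalsh(A))         # all eigenvalues strictly positive

# Property (iii): x^T A x > 0 for a few random nonzero vectors x.
rng = np.random.default_rng(0)
for _ in range(3):
    x = rng.standard_normal(2)
    print(x @ A @ x > 0)             # True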
1.7.2 Norms of vectors and matrices
In this book a vector x is always an n-tuple (x1 , x2 , . . . , xn ) in IRn . The numbers
xi (1 ≤ i ≤ n) are called the coordinates or entries of x. Usually we think of x as a
10 For a more detailed treatment we refer the reader to books like Bellman [38], Birkhoff and MacLane [41], Golub and Van Loan [112], Horn and Johnson [147], Lancaster and Tismenetsky [181], Ben-Israel and Greville [39], Strang [259] and Watkins [289].
11 Some authors do not include symmetry as part of the definition. For example, Golub and Van Loan [112] call A positive definite if (iii) holds without requiring symmetry of A.
column vector and of its transpose, denoted by x^T, as a row vector. If all entries of x
are zero we simply write x = 0. A special vector is the all-one vector, denoted by e,
whose coordinates are all equal to 1. The scalar product of x and s ∈ IR^n is given by
\[ x^T s = \sum_{i=1}^{n} x_i s_i. \]
We recall the following properties of norms for vectors and matrices. A norm (or
vector norm) on IR^n is a function that assigns to each x ∈ IR^n a nonnegative number
‖x‖ such that for all x, s ∈ IR^n and α ∈ IR:
\[ \|x\| > 0 \ \text{ if } x \neq 0, \qquad \|\alpha x\| = |\alpha|\,\|x\|, \qquad \|x + s\| \le \|x\| + \|s\|. \]
The Euclidean norm is defined by
\[ \|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}. \]
When the norm is not further specified, ‖x‖ will always refer to the Euclidean norm.
The Cauchy–Schwarz inequality states that for x, s ∈ IR^n:
\[ x^T s \le \|x\|\,\|s\|. \]
The inequality holds with equality if and only if x and s are linearly dependent.
For any positive number p we also have the p-norm, defined by
\[ \|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}. \]
The Euclidean norm is the special case where p = 2 and is therefore also called the
2-norm. Another important special case is the 1-norm:
\[ \|x\|_1 = \sum_{i=1}^{n} |x_i|. \]
Letting p go to infinity we get the so-called infinity norm:
\[ \|x\|_\infty = \lim_{p \to \infty} \|x\|_p. \]
We have
\[ \|x\|_\infty = \max_{1 \le i \le n} |x_i|. \]
For any positive definite n × n matrix A we have a vector norm ‖·‖_A according to
\[ \|x\|_A = \sqrt{x^T A x}. \]
For any norm the unit ball in IR^n is the set {x ∈ IR^n : ‖x‖ = 1}.
By concatenating the columns of an n × n matrix A (in the natural order), A can be
considered a vector in IR^{n²}. A function assigning to each A ∈ IR^{n×n} a real number ‖A‖
is called a matrix norm if it satisfies the conditions for a vector norm and moreover
\[ \|AB\| \le \|A\|\,\|B\|, \]
for all A, B ∈ IR^{n×n}. A well-known matrix norm is the Frobenius norm ‖·‖_F, which is
simply the vector 2-norm applied to the matrix:
\[ \|A\|_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij}^2}. \]
Every vector norm induces a matrix norm according to
\[ \|A\| = \max_{\|x\|=1} \|Ax\|. \]
This matrix norm satisfies ‖Ax‖ ≤ ‖A‖ ‖x‖ for all x ∈ IR^n.
The vector 1-norm induces the matrix norm
\[ \|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |A_{ij}|, \]
and the vector ∞-norm induces the matrix norm
\[ \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |A_{ij}|. \]
‖A‖_1 is also called the column sum norm and ‖A‖_∞ the row sum norm. Note that
\[ \|A\|_\infty = \|A^T\|_1. \]
Hence, if A is symmetric then ‖A‖_∞ = ‖A‖_1. The matrix norm induced by the vector
2-norm is, by definition,
\[ \|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2. \]
This norm is also called the spectral matrix norm. Observe that it differs from the
Frobenius norm (consider both norms for A = I, where I = diag(e)). In general,
\[ \|A\|_2 \le \|A\|_F. \]
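These matrix norms are easy to experiment with; the following sketch (assuming NumPy, with an arbitrary example matrix) checks the column sum and row sum norms, the identity ‖A‖_∞ = ‖A^T‖_1, and the inequality ‖A‖_2 ≤ ‖A‖_F:

import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

one_norm = np.linalg.norm(A, 1)          # column sum norm: 6.0
inf_norm = np.linalg.norm(A, np.inf)     # row sum norm: 7.0
two_norm = np.linalg.norm(A, 2)          # spectral norm
fro_norm = np.linalg.norm(A, 'fro')      # Frobenius norm

print(one_norm, inf_norm)
print(np.isclose(inf_norm, np.linalg.norm(A.T, 1)))   # True: ||A||_inf = ||A^T||_1
print(two_norm <= fro_norm)                            # True, as stated above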
1.7.3 Hadamard inequality for the determinant
For an n × n matrix A with columns a_1, a_2, . . . , a_n the absolute value of its determinant satisfies
\[ |\det(A)| = \text{volume of the parallelepiped spanned by } a_1, a_2, \ldots, a_n. \]
This interpretation of the determinant implies the inequality
\[ |\det(A)| \le \|a_1\|_2 \,\|a_2\|_2 \cdots \|a_n\|_2, \]
which is known as the Hadamard inequality.12
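A quick numerical check of the Hadamard inequality on a random matrix (a sketch assuming NumPy; the size 4 is arbitrary):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

lhs = abs(np.linalg.det(A))
rhs = np.prod(np.linalg.norm(A, axis=0))   # product of the Euclidean column norms

print(lhs <= rhs + 1e-12)                   # True: Hadamard's inequality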
1.7.4 Order estimates
Let f and g be functions from the positive reals to the positive reals. In many estimates
the following definitions will be helpful.
• We write f (x) = O(g(x)) if there exists a positive constant c such that f (x) ≤ cg(x),
for all x > 0.
• We write f (x) = Ω(g(x)) if there exists a positive constant c such that f (x) ≥ cg(x),
for all x > 0.
• We write f (x) = Θ(g(x)) if there exist positive constants c1 and c2 such that
c1 g(x) ≤ f (x) ≤ c2 g(x), for all x > 0.
1.7.5 Notational conventions
The identity matrix usually is denoted as I; if the size of I is not clear from the
context we use a subscript like in In to specify that it is the n × n identity matrix.
Similarly, zero matrices and zero vectors usually are denoted simply as 0; but if the
size is ambiguous, we use subscripts like in 0m×n to specify the size. The all-one vector
is always denoted as e, and if necessary the size is specified by a subscript.
For any x ∈ IR^n we often denote the diagonal matrix diag(x) by the corresponding
capital X. For example, D = diag(d). The componentwise product of two vectors
x, s ∈ IR^n, known as the Hadamard product of x and s, is denoted compactly by xs.13
The i-th entry of xs is x_i s_i. In other words, xs = Xs = Sx. As a consequence we have
for the scalar product of x and s,
\[ x^T s = e^T (xs), \]
which will be used repeatedly later on. Similarly we use x/s for the componentwise
quotient of x and s. This kind of notation is also used for unary operations. For
example, the i-th entry of x^{-1} is x_i^{-1} and the i-th entry of √x is √x_i. This notation
is consistent as long as componentwise operations are given precedence over matrix
operations. Thus, if A is a matrix then Axs = A(xs).
12 See, e.g., Horn and Johnson [147], page 477.
13 In the literature this product is known as the Hadamard product of x and s. It is often denoted by x•s. Throughout the book we will use the shorter notation xs. Note that if x and s are nonnegative then xs = 0 holds if and only if x^T s = 0.
Part I
Introduction: Theory and Complexity

2 Duality Theory for Linear Optimization
2.1 Introduction
This chapter introduces the reader to the main theoretical results in the field of linear
optimization (LO). These results concern the notion of duality in LO. An LO problem
consists of optimizing (i.e., minimizing or maximizing) a linear objective function
subject to a finite set of linear constraints. The constraints may be equality constraints
or inequality constraints. If the constraints are inconsistent, so that they do not allow
any feasible solution, then the problem is called infeasible, otherwise feasible. In the
latter case the feasible set (or domain) of the problem is not empty; then there are two
possibilities: the objective function is either unbounded or bounded on the domain. In
the first case, the problem is called unbounded and in the second case bounded. The
set of optimal solutions of a problem is referred to as the optimal set; the optimal set
is empty if and only if the problem is infeasible or unbounded.
For any LO problem we may construct a second LO problem, called its dual problem,
or shortly its dual. A problem and its dual are closely related. The relation can be
expressed nicely in terms of the optimal sets of both problems. If the optimal set of one
of the two problems is nonempty, then so is the optimal set of the other problem;
moreover, the optimal values of the objective functions for both problems are equal.
These nontrivial results are the basic ingredients of the so-called duality theory for
LO.
The duality theory for LO can be derived in many ways.1 A popular approach in
textbooks to this theory is constructive. It is based on the Simplex Method. While
solving a problem by this method, at each iterative step the method generates so-
1 The first duality results in LO were obtained in a nonconstructive way. They can be derived from some variants of Farkas' lemma [75], or from more general separation theorems for convex sets. See, e.g., Osborne [229] and Saigal [249]. An alternative approach is based on direct inductive proofs of theorems of Farkas, Weyl and Minkowski and derives the duality results for LO as a corollary of these theorems. See, e.g., Gale [91]. Constructive proofs are based on finite termination of a suitable algorithm for solving either linear inequality systems or LO problems. A classical method for solving linear inequality systems in a finite number of steps is Fourier-Motzkin elimination. By this method we can decide in finite time if the system admits a feasible solution or not. See, e.g., Dantzig [59]. This can be used to prove Farkas' lemma from which the duality results for LO then easily follow. For the LO problem there exist several finite termination methods. One of them, the Simplex Method, is sketched in this paragraph. Many authors use such a method to derive the duality results for LO. See, e.g., Chvátal [55], Dantzig [59], Nemhauser and Wolsey [224], Papadimitriou and Steiglitz [231], Schrijver [250] and Walsh [287].
called multipliers associated with the constraints. The method terminates when the
multipliers turn out to be feasible for the dual problem; then it yields an optimal
solution both for the primal and the dual problem.2
Interior point methods are also intimately linked with duality theory. The key
concept is the so-called central path, an analytic curve in the interior of the domain of
the problem that starts somewhere in the ‘middle’ of the domain and ends somewhere
in the ‘middle’ of the optimal set of the problem. The term ‘middle’ in this context will
be made precise later. Interior point methods follow the central path (approximately)
as a guideline to the optimal set.3 One of the aims of this chapter is to show that the
aforementioned duality results can be derived from properties of the central path.4
Not every problem has a central path. Therefore, it is important in this framework to
determine under which condition the central path exists. It happens that this condition
implies the existence of the central path for the dual problem and the points on the
dual central path are closely related to the points on the primal central path. As a
consequence, following the primal central path (approximately) to the primal optimal
set always goes together with following the dual central path (approximately) to the
dual optimal set. Thus, when the primal and dual central paths exist, the interior-point approach yields in a natural way the duality theory for LO, just as in the case of
the Simplex Method. When the central paths do not exist the duality results can be
obtained by a little trick, namely by embedding the given problem in a larger problem
which has a central path. Below this approach will be discussed in more detail.
We start the whole analysis, in the next section, by considering the LO problem in
the so-called canonical form. So the objective is to minimize a linear function over a
set of inequality constraints of greater-than-or-equal type with nonnegative variables.
Since every LO problem admits a canonical representation, the validity of the
duality results in this chapter naturally extends to arbitrary LO problems. Usually
the canonical form of an LO problem is obtained by introducing new variables and/or
constraints. As a result, the number of variables and/or constraints may be doubled.
In Appendix D.1 we present a specific scheme that transforms any LO problem that is
not in the canonical form to a canonical problem in such a way that the total number
of variables and constraints does not increase, and even decreases in many cases.
We show that solving the canonical LO problem can be reduced to finding a solution
of an appropriate system of inequalities. In Section 2.4 we impose a condition on the
system—the interior-point condition— and we show that this condition is not satisfied
by our system of inequalities. By expanding the given system slightly however we get
an equivalent system that satisfies the interior-point condition. Then we construct a
self-dual problem5 whose domain is defined by the last system. We further show that
a solution of the system, and hence of the given LO problem, can easily be obtained
2 The Simplex Method was proposed first by Dantzig [59]. In fact, this method has many variants due to various strategies for choosing the pivot element. When we refer to the Simplex Method we always assume that a pivot strategy is used that prevents cycling and thus guarantees finite termination of the method.
3 This interpretation of recent interior-point methods for LO was proposed first by Megiddo [200]. The notion of central path originates from nonlinear (convex) optimization; see Fiacco and McCormick [77].
4 This approach to the duality theory has been worked out by Güler et al. [133, 134].
5 Problems of this special type were considered first by Tucker [274], in 1956.
from a so-called strictly complementary solution of the self-dual problem.
Thus the canonical problem can be embedded in a natural way into a self-dual problem, and using the existence of a strictly complementary solution for the
embedding self-dual problem we derive the classical duality results for the canonical
problem. This is achieved in Section 2.9.
The self-dual problem in itself is a trivial LO problem. In this problem all variables
are nonnegative. The problem is trivial in the sense that the zero vector is feasible
and also optimal. In general the zero vector will not be the only optimal solution.
If the optimal set contains nonzero vectors, then some of the variables must occur
with positive value in an optimal solution. Thus we may divide the variables into two
groups: one group contains the variables that are zero in each optimal solution, and
the second group contains the other variables that may occur with positive sign in an
optimal solution. Let us call for the moment the variables in the first group ‘bad’
variables and those in the second group ‘good’ variables.
We proceed by showing that the interior-point condition guarantees the existence
of the central path. The proof of this fact in Section 2.7 is constructive. From the
limiting behavior of the central path when it approaches the optimal set, we derive
the existence of a strictly complementary solution of the self-dual problem. In such
an optimal solution all ‘good’ variables are positive, whereas the ’bad’ variables are
zero, of course. Next we prove the same result for the case where the interior-point
condition does not hold. From this we derive that every (canonical) LO problem that
has an optimal solution, also has a strictly complementary optimal solution.
It may be clear that the nontrivial part of the above analysis concerns the existence
of a strictly complementary solution for the self-dual problem. Such solutions play
a crucial role in the approach of this book. Obviously a strictly complementary
solution provides much more information on the optimal set of the problem than
just one optimal solution, because variables that occur with zero value in a strictly
complementary solution will be zero in any optimal solution.6
One of the surprises of this chapter is that the above results for the self-dual problem
immediately imply all basic duality results for the general LO problem. This is shown
first for the canonical problem in Section 2.9 and then for general LO problems in
Section 2.10; in this section we present an easy-to-remember scheme for writing down
the dual problem of any given LO problem. This involves first transforming the given
problem to a canonical form, then taking the dual of this problem and reformulating
the canonical dual so that its relation to the given problem becomes more apparent.
The scheme is such that applying it twice returns the original problem. Finally,
although the result is not used explicitly in this chapter, but because it is interesting
in itself, we conclude this chapter with Section 2.11 where we show that the central
path converges to an optimal solution.
6 The existence of strictly complementary optimal solutions was shown first by Goldman and Tucker [111] in 1956. Balinski and Tucker [33], in 1969, gave a constructive proof.
2.2 The canonical LO-problem and its dual
We say that a linear optimization problem is in canonical form if it is written in the
following way:
\[ (P) \qquad \min \{ c^T x : Ax \ge b,\ x \ge 0 \}, \tag{2.1} \]
where the matrix A is of size m × n, the vectors c and x are in IR^n and b in IR^m.
Note that all the constraints in (P) are inequality constraints and the variables
are nonnegative. Each LO-problem can be transformed to an equivalent canonical
problem.7 Given the above canonical problem (P), we consider a second problem,
denoted by (D) and called the dual problem of (P), given by
\[ (D) \qquad \max \{ b^T y : A^T y \le c,\ y \ge 0 \}. \tag{2.2} \]
The two problems (P ) and (D) share the matrix A and the vectors b and c in their
description. But the role of b and c has been interchanged: the objective vector c of
(P ) is the right-hand side vector of (D), and, similarly, the right-hand side vector b
of (P ) is the objective vector of (D). Moreover, the constraint matrix in (D) is the
transposed matrix AT , where A is the constraint matrix in (P ). In both problems the
variables are nonnegative. The problems differ in that (P ) is a minimization problem
whereas (D) is a maximization problem, and, moreover, the inequality symbols in the
constraints have opposite direction.8,9
At this stage we make a crucial observation.
Lemma I.1 (Weak duality) Let x be feasible for (P) and y for (D). Then
\[ b^T y \le c^T x. \tag{2.3} \]
Proof: If x is feasible for (P) and y for (D), then x ≥ 0, y ≥ 0, Ax ≥ b and A^T y ≤ c.
As a consequence we may write
\[ b^T y \le (Ax)^T y = x^T A^T y \le c^T x. \]
This proves the lemma. ✷
Hence, any y that is feasible for (D) provides a lower bound bT y for the value of cT x,
whenever x is feasible for (P ). Conversely, any x that is feasible for (P ) provides an
upper bound cT x for the value of bT y, whenever y is feasible for (D). This phenomenon
is known as the weak duality property. We have as an immediate consequence the
following.
Corollary I.2 If x is feasible for (P ) and y for (D), and cT x = bT y, then x is optimal
for (P ) and y is optimal for (D).
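Weak duality is easy to verify numerically; the following sketch (assuming NumPy; the data A, b, c and the points x, y are an arbitrary example, not taken from the book) checks inequality (2.3):

import numpy as np

# A small instance of (P): min c^T x s.t. Ax >= b, x >= 0, with dual (D).
A = np.array([[1.0, 1.0],
              [2.0, 0.5]])
b = np.array([1.0, 1.0])
c = np.array([3.0, 2.0])

x = np.array([1.0, 1.0])     # feasible for (P): Ax >= b and x >= 0
y = np.array([0.5, 0.5])     # feasible for (D): A^T y <= c and y >= 0

assert np.all(A @ x >= b) and np.all(x >= 0)
assert np.all(A.T @ y <= c) and np.all(y >= 0)
print(b @ y <= c @ x)        # True: weak duality, b^T y <= c^T x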
7 For this we refer to any text book on LO. In Appendix D it is shown that this can be achieved without increasing the numbers of constraints and variables.
8 Exercise 1 The dual problem (D) can be transformed into canonical form by replacing the constraint A^T y ≤ c by −A^T y ≥ −c and the objective max b^T y by min −b^T y. Verify that the dual of the resulting problem is exactly (P).
9 Exercise 2 Let the matrix A be skew-symmetric, i.e., A^T = −A, and let b = −c. Verify that then (D) is essentially the same problem as (P).
The (nonnegative) difference
\[ c^T x - b^T y \tag{2.4} \]
between the primal objective value at a primal feasible x and the dual objective value
at a dual feasible y is called the duality gap for the pair (x, y). We just established
that if the duality gap vanishes then x is optimal for (P) and y is optimal for (D).
Quite surprisingly, the converse statement is also true: if x is an optimal solution of
(P ) and y is an optimal solution of (D) then the duality gap vanishes at the pair
(x, y). This result is known as the strong duality property in LO. One of the aims of
this chapter is to prove this most important result. So, in this chapter we will not use
this property, but prove it!
Thus our starting point is the question under which conditions an optimal pair (x, y)
exists with vanishing duality gap. In the next section we reduce this question to the
question whether some system of linear inequalities is solvable.
2.3 Reduction to inequality system
In this section we consider the question whether (P ) and (D) have optimal solutions
with vanishing duality gap. This will be true if and only if the inequality system
\[
\begin{array}{rcl}
Ax & \ge & b, \qquad x \ge 0, \\
-A^T y & \ge & -c, \qquad y \ge 0, \\
b^T y - c^T x & \ge & 0
\end{array} \tag{2.5}
\]
has a solution. This follows by noting that x and y satisfy the inequalities in the first
two lines if and only if they are feasible for (P ) and (D) respectively. By Lemma I.1
this implies cT x − bT y ≥ 0. Hence, if we also have bT y − cT x ≥ 0 we get bT y = cT x,
proving the claim.
If κ = 1, the following inequality system is equivalent to (2.5), as easily can be
verified:
\[
\begin{bmatrix} 0_{m\times m} & A & -b \\ -A^T & 0_{n\times n} & c \\ b^T & -c^T & 0 \end{bmatrix}
\begin{bmatrix} y \\ x \\ \kappa \end{bmatrix} \ge
\begin{bmatrix} 0_m \\ 0_n \\ 0 \end{bmatrix},
\qquad x \ge 0,\ y \ge 0,\ \kappa \ge 0. \tag{2.6}
\]
The new variable κ is called the homogenizing variable. Since the right-hand side
in (2.6) is the zero vector, this system is homogeneous: whenever (y, x, κ) solves the
system then λ(y, x, κ) also solves the system, for any positive λ. Now, given any
solution (x, y, κ) of (2.6) with κ > 0, (x/κ, y/κ, 1) yields a solution of (2.5). This
makes clear that, in fact, the two systems are completely equivalent unless every
solution of (2.6) has κ = 0. But if κ = 0 for every solution of (2.6), then it follows that
no solution exists with κ = 1, and therefore the system (2.5) cannot have a solution in
that case. Evidently, we can work with the second system without loss of information
about the solution set of the first system.
Hence, defining the matrix M̄ and the vector z̄ by
\[
\bar{M} := \begin{bmatrix} 0 & A & -b \\ -A^T & 0 & c \\ b^T & -c^T & 0 \end{bmatrix},
\qquad
\bar{z} := \begin{bmatrix} y \\ x \\ \kappa \end{bmatrix}, \tag{2.7}
\]
where we omitted the size indices of the zero blocks, we have reduced the problem
of finding optimal solutions for (P) and (D) with vanishing duality gap to finding a
solution of the inequality system
\[ \bar{M}\bar{z} \ge 0, \qquad \bar{z} \ge 0, \qquad \kappa > 0. \tag{2.8} \]
If this system has a solution then it gives optimal solutions for (P ) and (D) with
vanishing duality gap; otherwise such optimal solutions do not exist. Thus we have
proved the following result.
Theorem I.3 The problems (P ) and (D) have optimal solutions with vanishing
duality gap if and only if system (2.8), with M̄ and z̄ as defined in (2.7), has a
solution.
Thus our task has been reduced to finding a solution of (2.8), or to proving that such
a solution does not exist. In the sequel we will deal with this problem. In doing so,
we will strongly use the fact that the matrix M̄ is skew-symmetric, i.e., M̄^T = −M̄.10
Note that the order of M̄ equals m + n + 1.
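The matrix M̄ of (2.7) is straightforward to assemble; a small sketch (assuming NumPy; the data A, b, c below are an arbitrary illustration) that also confirms its skew-symmetry and its order m + n + 1:

import numpy as np

def build_M_bar(A, b, c):
    """Assemble the skew-symmetric matrix M_bar of (2.7) from the data of (P)."""
    m, n = A.shape
    return np.block([
        [np.zeros((m, m)),  A,                -b.reshape(-1, 1)],
        [-A.T,              np.zeros((n, n)),  c.reshape(-1, 1)],
        [b.reshape(1, -1), -c.reshape(1, -1),  np.zeros((1, 1))],
    ])

A = np.array([[1.0, 1.0]])      # an arbitrary 1 x 2 example, not from the book
b = np.array([1.0])
c = np.array([2.0, 1.0])

M_bar = build_M_bar(A, b, c)
print(M_bar.shape)                      # (4, 4): order m + n + 1
print(np.allclose(M_bar.T, -M_bar))     # True: skew-symmetric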
2.4 Interior-point condition
The method we are going to use in the next chapter for solving (2.8) is an interior-point method (IPM), and for this we need the system to satisfy the interior-point
condition.
Definition I.4 (IPC) We say that any system of (linear) equalities and (linear)
inequalities satisfies the interior-point condition (IPC) if there exists a feasible solution
that strictly satisfies all inequality constraints in the system.
Unfortunately the system (2.8) does not satisfy the IPC. Because if z = (x, y, κ)
is a solution, then x/κ is feasible for (P) and y/κ is feasible for (D). But then
(c^T x − b^T y)/κ ≥ 0, by weak duality. Since κ > 0, this implies b^T y − c^T x ≤ 0.
On the other hand, after substitution of (2.7), the last constraint in (2.8) requires
b^T y − c^T x ≥ 0. It follows that b^T y − c^T x = 0, and hence no feasible solution of (2.8)
satisfies the last inequality in (2.8) strictly.
To overcome this shortcoming of the system (2.8) we increase the dimension by
adding one more nonnegative variable ϑ to the vector z̄, and by extending M̄ with
one extra column and row, according to
\[
M := \begin{bmatrix} \bar{M} & r \\ -r^T & 0 \end{bmatrix},
\qquad
z := \begin{bmatrix} \bar{z} \\ \vartheta \end{bmatrix}, \tag{2.9}
\]
10 Exercise 3 If S is an n × n skew-symmetric matrix and z ∈ IR^n, then z^T Sz = 0. Prove this.
where
\[ r = e_{m+n+1} - \bar{M}\,e_{m+n+1}, \tag{2.10} \]
with e_{m+n+1} denoting the all-one vector of length m + n + 1. So we have
\[
M = \begin{bmatrix} \bar{M} & r \\ -r^T & 0 \end{bmatrix},
\qquad
r = \begin{bmatrix} e_m - A e_n + b \\ e_n + A^T e_m - c \\ 1 - b^T e_m + c^T e_n \end{bmatrix},
\qquad
z = \begin{bmatrix} y \\ x \\ \kappa \\ \vartheta \end{bmatrix}. \tag{2.11}
\]
The order of the matrix M is m + n + 2. To simplify the presentation, in the rest of
this chapter we denote this number as n̄:
\[ \bar{n} = m + n + 2. \]
Letting q be the vector of length n̄ given by
\[ q := \begin{bmatrix} 0_{\bar{n}-1} \\ \bar{n} \end{bmatrix}, \tag{2.12} \]
we consider the system
\[ Mz \ge -q, \qquad z \ge 0. \tag{2.13} \]
We make two important observations. First we observe that the matrix M is
skew-symmetric. Secondly, the system (2.13) satisfies the IPC. The all-one vector does
the work, because taking z̄ = e_{n̄−1} and ϑ = 1, we have
\[
Mz + q =
\begin{bmatrix} \bar{M} & r \\ -r^T & 0 \end{bmatrix}
\begin{bmatrix} e_{\bar{n}-1} \\ 1 \end{bmatrix}
+
\begin{bmatrix} 0 \\ \bar{n} \end{bmatrix}
=
\begin{bmatrix} \bar{M} e_{\bar{n}-1} + r \\ -r^T e_{\bar{n}-1} + \bar{n} \end{bmatrix}
=
\begin{bmatrix} e_{\bar{n}-1} \\ 1 \end{bmatrix}. \tag{2.14}
\]
The last equality is due to the definition of r, which implies M̄ e_{n̄−1} + r = e_{n̄−1} and
\[
-r^T e_{\bar{n}-1} + \bar{n}
= -\left(e_{\bar{n}-1} - \bar{M} e_{\bar{n}-1}\right)^T e_{\bar{n}-1} + \bar{n}
= -e_{\bar{n}-1}^T e_{\bar{n}-1} + \bar{n} = 1,
\]
where we used e_{n̄−1}^T M̄ e_{n̄−1} = 0 (cf. Exercise 3, page 20).
The usefulness of system (2.13) stems from two facts. First, it satisfies the IPC
and hence can be treated by an interior-point method. What this implies will
become apparent in the next chapter. Another crucial property is that there is a
correspondence between the solutions of (2.8) and the solutions of (2.13) with ϑ = 0.
To see this it is useful to write (2.13) in terms of z̄ and ϑ:
\[
\begin{bmatrix} \bar{M} & r \\ -r^T & 0 \end{bmatrix}
\begin{bmatrix} \bar{z} \\ \vartheta \end{bmatrix}
+
\begin{bmatrix} 0 \\ \bar{n} \end{bmatrix}
\ge 0, \qquad \bar{z} \ge 0, \quad \vartheta \ge 0.
\]
Obviously, if z = (z̄, 0) satisfies (2.13), this implies M̄ z̄ ≥ 0 and z̄ ≥ 0, and hence z̄
satisfies (2.8). On the other hand, if z̄ satisfies (2.8) then M̄ z̄ ≥ 0 and z̄ ≥ 0; as a
consequence z = (z̄, 0) satisfies (2.13) if and only if −rT z̄ + n̄ ≥ 0, i.e., if and only if
\[ r^T \bar{z} \le \bar{n}. \]
If r^T z̄ ≤ 0 this certainly holds. Otherwise, if r^T z̄ > 0, the positive multiple n̄z̄/(r^T z̄) of
z̄ satisfies this inequality. Since a positive multiple preserves signs, this is sufficient for
our goal. We summarize the above discussion in the following theorem.
Theorem I.5 The following three statements are equivalent:
(i) Problems (P ) and (D) have optimal solutions with vanishing duality gap;
(ii) If M̄ and z̄ are given by (2.7) then (2.8) has a solution;
(iii) If M and z are given by (2.11) then (2.13) has a solution with ϑ = 0 and κ > 0.
Moreover, system (2.13) satisfies the IPC.
2.5 Embedding into a self-dual LO-problem
Obviously, solving (2.8) is equivalent to finding a solution of the minimization problem
\[ (SP_0) \qquad \min \{ 0^T \bar{z} : \bar{M}\bar{z} \ge 0,\ \bar{z} \ge 0 \} \tag{2.15} \]
with κ > 0. In fact, this is the way we are going to follow: our aim will be to find
out whether this problem has a(n optimal) solution with κ > 0 or not. Note that
the latter condition makes our task nontrivial. Because finding an optimal solution of
(SP0 ) is trivial: the zero vector is feasible and hence optimal. Also note that (SP0 ) is
in the canonical form. However, it has a very special structure: its feasible domain is
homogeneous and since M̄ is skew-symmetric, the problem (SP0 ) is a self-dual problem
(cf. Exercise 2, page 18). We say that (SP0 ) is a self-dual embedding of the canonical
problem (P ) and its dual problem (D).
If the constraints in an LO problem satisfy the IPC, then we simply say that the
problem itself satisfies the IPC. As we established in the previous section, the self-dual
embedding (SP0 ) does not satisfy the IPC, and therefore, from an algorithmic point
of view this problem is not useful.
In the previous section we reduced the problem of finding optimal solutions of (P)
and (D) with vanishing duality gap to finding a solution of (2.13) with ϑ = 0 and κ > 0.
For that purpose we consider another self-dual embedding of (P) and (D), namely
\[ (SP) \qquad \min \{ q^T z : Mz \ge -q,\ z \ge 0 \}. \tag{2.16} \]
The following theorem shows that we can achieve our goal by solving this problem.
Theorem I.6 The system (2.13) has a solution with ϑ = 0 and κ > 0 if and only if
the problem (SP ) has an optimal solution with κ = zn̄−1 > 0.
Proof: Since q ≥ 0 and z ≥ 0, we have q T z ≥ 0, and hence the optimal value of (SP )
is certainly nonnegative. On the other hand, since q ≥ 0 the zero vector (z = 0) is
feasible, and yields zero as objective value, which is therefore the optimal value. Since
q T z = n̄ϑ, we conclude that the optimal solutions of (2.16) are precisely the vectors z
satisfying (2.13) with ϑ = 0. This proves the theorem.
✷
We associate to any vector z ∈ IR^{n̄} its slack vector s(z) as follows:
\[ s(z) := Mz + q. \tag{2.17} \]
Then we have
\[ z \text{ is feasible for } (SP) \iff z \ge 0,\ s(z) \ge 0. \]
As we established in the previous section, the inequalities defining the feasible domain
of (SP ) satisfy the IPC. To be more specific, we found in (2.14) that the all-one vector
e is feasible and its slack vector is the all-one vector. In other words,
\[ s(e) = e. \tag{2.18} \]
We proceed by giving a small example.
Example I.7 By way of example we consider the case where the problems (P) and
(D) are determined by the following constraint matrix A, and vectors b and c:11
\[
A = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad
b = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \qquad
c = \begin{bmatrix} 2 \end{bmatrix}.
\]
According to (2.7) the matrix M̄ is then equal to
\[
\bar{M} = \begin{bmatrix} 0 & A & -b \\ -A^T & 0 & c \\ b^T & -c^T & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \\ -1 & 0 & 0 & 2 \\ 1 & -1 & -2 & 0 \end{bmatrix},
\]
and according to (2.10), the vector r becomes
\[
r = e - \bar{M}e
= \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} 0 \\ 1 \\ 1 \\ -2 \end{bmatrix}
= \begin{bmatrix} 1 \\ 0 \\ 0 \\ 3 \end{bmatrix}.
\]
Thus, by (2.11) and (2.12), we obtain
\[
M = \begin{bmatrix}
0 & 0 & 1 & -1 & 1 \\
0 & 0 & 0 & 1 & 0 \\
-1 & 0 & 0 & 2 & 0 \\
1 & -1 & -2 & 0 & 3 \\
-1 & 0 & 0 & -3 & 0
\end{bmatrix},
\qquad
q = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 5 \end{bmatrix}.
\]
Hence, the self-dual problem (SP), as given by (2.16), gets the form
\[
\min \left\{ 5\vartheta :
\begin{bmatrix}
0 & 0 & 1 & -1 & 1 \\
0 & 0 & 0 & 1 & 0 \\
-1 & 0 & 0 & 2 & 0 \\
1 & -1 & -2 & 0 & 3 \\
-1 & 0 & 0 & -3 & 0
\end{bmatrix}
\begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \\ z_5 \end{bmatrix}
+
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 5 \end{bmatrix}
\ge 0,\;
\begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 = \kappa \\ z_5 = \vartheta \end{bmatrix}
\ge 0 \right\}. \tag{2.19}
\]
Note that the all-one vector is feasible for this problem and that its surplus vector
also is the all-one vector. This is in accordance with (2.18). As we shall see later on,
it means that the all-one vector is the point on the central path for µ = 1.
♦
11 Cf. Example D.5 (page 449) in Appendix D.
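The data of this example can be checked mechanically; a small sketch (assuming NumPy) that rebuilds M̄, r, M and q and verifies s(e) = e, in accordance with (2.18):

import numpy as np

A = np.array([[1.0], [0.0]])
b = np.array([1.0, -1.0])
c = np.array([2.0])

M_bar = np.block([
    [np.zeros((2, 2)),  A,                -b.reshape(-1, 1)],
    [-A.T,              np.zeros((1, 1)),  c.reshape(-1, 1)],
    [b.reshape(1, -1), -c.reshape(1, -1),  np.zeros((1, 1))],
])
r = np.ones(4) - M_bar @ np.ones(4)          # r = e - M_bar e = (1, 0, 0, 3)

M = np.block([[M_bar,               r.reshape(-1, 1)],
              [-r.reshape(1, -1),   np.zeros((1, 1))]])
q = np.array([0.0, 0.0, 0.0, 0.0, 5.0])      # q = (0_{n-1}, n) with n = 5

print(r)                     # [1. 0. 0. 3.]
print(M @ np.ones(5) + q)    # the all-one vector: s(e) = e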
Remark I.8 In the rest of this chapter, and the next chapter, we deal with the problem
(SP ). In fact, our analysis does not only apply to the case that M and q have the
special form of (2.11) and (2.12). Therefore we extend the applicability of our analysis
by weakening the assumptions on M and q. Unless stated otherwise below we only
assume the following:
\[ M^T = -M, \qquad q \ge 0, \qquad s(e) = e. \tag{2.20} \]
The last two variables in the vector z play a special role. They are the homogenizing
variable κ = zn̄−1 , and ϑ = zn̄ . The variable ϑ is called the normalizing variable,
because of the following important property.
Lemma I.9 One has
\[ e^T z + e^T s(z) = \bar{n} + q^T z. \tag{2.21} \]
Proof: The identity in the lemma is a consequence of the orthogonality property (cf.
Exercise 3, page 20)
\[ u^T M u = 0, \qquad \forall u \in \mathbb{R}^{\bar{n}}. \tag{2.22} \]
First we deduce that for every z one has
\[ q^T z = z^T (s(z) - Mz) = z^T s(z) - z^T M z = z^T s(z). \tag{2.23} \]
Taking u = e − z in (2.22) we obtain
\[ (z - e)^T (s(z) - s(e)) = 0. \]
Since s(e) = e, e^T e = n̄ and z^T s(z) = q^T z, the relation (2.21) follows. ✷
It follows from Lemma I.9 that the sum of the positive coordinates in z and s(z)
is bounded above by n̄ + q T z. Note that this is especially interesting if z is optimal,
because then q^T z = 0. Hence, if z is optimal then
\[ e^T z + e^T s(z) = \bar{n}. \tag{2.24} \]
Since z and s(z) are nonnegative this implies that the set of optimal solutions is
bounded.
Another interesting feature of the LO-problem (2.16) is that it is self-dual: the dual
problem is
\[ (DSP) \qquad \max \{ -q^T u : M^T u \le q,\ u \ge 0 \}; \]
since M is skew-symmetric, M^T u ≤ q is equivalent to −Mu ≤ q, or Mu ≥ −q,
and maximizing −q^T u is equivalent to minimizing q^T u, and thus the dual problem is
essentially the same problem as (2.16).
The rest of the chapter is devoted to our main task, namely to find an optimal
solution of (2.16) with κ > 0 or to establish that such a solution does not exist.
2.6 The classes B and N
We introduce the index sets B and N according to
\[
\begin{aligned}
B &:= \{ i : z_i > 0 \text{ for some optimal } z \}, \\
N &:= \{ i : s_i(z) > 0 \text{ for some optimal } z \}.
\end{aligned}
\]
So, B contains all indices i for which an optimal solution z with positive zi exists. We
also write z_i ∈ B if i ∈ B. Note that we certainly have ϑ ∉ B, because ϑ is zero in
any optimal solution of (SP ). The main question we have to answer is whether κ ∈ B
holds or not. Because if κ ∈ B then there exists an optimal solution z with κ > 0,
in which case (P ) and (D) have optimal solutions with vanishing duality gap, and
otherwise not.
The next lemma implies that the sets B and N are disjoint. In this lemma, and
further on, we use the following notation. To any vector u ∈ IRk , we associate the
diagonal matrix U whose diagonal entries are the elements of u, in the same order. If
also v ∈ IRk , then U v will be denoted shortly as uv. Thus uv is a vector whose entries
are obtained by multiplying u and v componentwise.
Lemma I.10 Let z^1 and z^2 be feasible for (SP). Then z^1 and z^2 are optimal solutions
of (SP) if and only if z^1 s(z^2) = z^2 s(z^1) = 0.
Proof: According to (2.23) we have for any feasible z:
\[ q^T z = z^T s(z). \tag{2.25} \]
As a consequence, z ≥ 0 is optimal if and only if s(z) ≥ 0 and z^T s(z) = 0. Since, by
(2.22),
\[ \left(z^1 - z^2\right)^T M \left(z^1 - z^2\right) = 0, \]
we have
\[ \left(z^1 - z^2\right)^T \left(s(z^1) - s(z^2)\right) = 0. \]
Expanding the product on the left and rearranging the terms we get
\[ (z^1)^T s(z^2) + (z^2)^T s(z^1) = (z^1)^T s(z^1) + (z^2)^T s(z^2). \]
Now z^1 is optimal if and only if (z^1)^T s(z^1) = 0, by (2.25), and similarly for z^2. Hence,
since z^1, z^2, s(z^1) and s(z^2) are all nonnegative, z^1 and z^2 are optimal if and only if
\[ (z^1)^T s(z^2) + (z^2)^T s(z^1) = 0, \]
which is equivalent to
\[ z^1 s(z^2) = z^2 s(z^1) = 0, \]
proving the lemma. ✷
Corollary I.11 The sets B and N are disjoint.
Proof: If i ∈ B ∩ N then there exist optimal solutions z 1 and z 2 of (SP ) such that
zi1 > 0 and si (z 2 ) > 0. This would imply zi1 si (z 2 ) > 0, a contradiction with Lemma
I.10. Hence B ∩ N is the empty set.
✷
By way of example we determine the classes B and N for the problem considered
in Example I.7.
Example I.12 Consider the self-dual problem (SP) in Example I.7, as given by
(2.19). For any z ∈ IR^5 we have
\[
s(z) = \begin{bmatrix}
z_3 - z_4 + z_5 \\ z_4 \\ 2z_4 - z_1 \\ z_1 - z_2 - 2z_3 + 3z_5 \\ 5 - z_1 - 3z_4
\end{bmatrix}
= \begin{bmatrix}
z_3 - \kappa + \vartheta \\ \kappa \\ 2\kappa - z_1 \\ z_1 - z_2 - 2z_3 + 3\vartheta \\ 5 - z_1 - 3\kappa
\end{bmatrix}.
\]
Now z is feasible if z ≥ 0 and s(z) ≥ 0, and optimal if moreover zs(z) = 0. So
z = (z_1, z_2, z_3, κ, ϑ) is optimal if and only if
\[
\begin{bmatrix}
z_3 - \kappa + \vartheta \\ \kappa \\ 2\kappa - z_1 \\ z_1 - z_2 - 2z_3 + 3\vartheta \\ 5 - z_1 - 3\kappa
\end{bmatrix} \ge 0,
\qquad
\begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ \kappa \\ \vartheta \end{bmatrix} \ge 0,
\qquad
\begin{aligned}
z_1 (z_3 - \kappa + \vartheta) &= 0, \\
z_2 \kappa &= 0, \\
z_3 (2\kappa - z_1) &= 0, \\
\kappa (z_1 - z_2 - 2z_3 + 3\vartheta) &= 0, \\
\vartheta (5 - z_1 - 3\kappa) &= 0.
\end{aligned}
\]
Adding the equalities at the right we obtain 5ϑ = 0, which gives ϑ = 0, as it should.
Substitution gives
\[
\begin{bmatrix}
z_3 - \kappa \\ \kappa \\ 2\kappa - z_1 \\ z_1 - z_2 - 2z_3 \\ 5 - z_1 - 3\kappa
\end{bmatrix} \ge 0,
\qquad
\begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ \kappa \\ 0 \end{bmatrix} \ge 0,
\qquad
\begin{aligned}
z_1 (z_3 - \kappa) &= 0, \\
z_2 \kappa &= 0, \\
z_3 (2\kappa - z_1) &= 0, \\
\kappa (z_1 - z_2 - 2z_3) &= 0, \\
\vartheta &= 0.
\end{aligned}
\]
Note that if κ = 0 then the inequality 2κ − z_1 ≥ 0 implies z_1 = 0, and then the
inequality z_1 − z_2 − 2z_3 ≥ 0 gives also z_2 = 0 and z_3 = 0. Hence, z = 0 is the only
optimal solution for which κ = 0. So, let us assume κ > 0. Then we deduce from the
second and fourth equality that z_2 = 0 and z_1 − z_2 − 2z_3 = 0. This reduces our system
to
\[
z_1 = 2z_3, \quad z_2 = 0,
\qquad
\begin{bmatrix}
z_3 - \kappa \\ \kappa \\ 2\kappa - 2z_3 \\ 0 \\ 5 - 2z_3 - 3\kappa
\end{bmatrix} \ge 0,
\qquad
\begin{bmatrix} 2z_3 \\ 0 \\ z_3 \\ \kappa \\ 0 \end{bmatrix} \ge 0,
\qquad
\begin{aligned}
2z_3 (z_3 - \kappa) &= 0, \\
z_3 (2\kappa - 2z_3) &= 0, \\
0 &= 0, \\
\vartheta &= 0.
\end{aligned}
\]
The equations at the right make clear that either z_3 = 0 or z_3 = κ. However, the
inequality z_3 − κ ≥ 0 forces z_3 > 0 since κ > 0. Thus we find that any optimal solution
has the form
\[
z = \begin{bmatrix} 2\kappa \\ 0 \\ \kappa \\ \kappa \\ 0 \end{bmatrix},
\qquad
s(z) = \begin{bmatrix} 0 \\ \kappa \\ 0 \\ 0 \\ 5 - 5\kappa \end{bmatrix},
\qquad 0 \le \kappa \le 1. \tag{2.26}
\]
This implies that in this example the sets B and N are given by
\[ B = \{1, 3, 4\}, \qquad N = \{2, 5\}. \]
♦
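A quick numerical sanity check of (2.26) (a sketch assuming NumPy, with M and q as in Example I.7): taking κ = 1/2 gives z = (1, 0, 1/2, 1/2, 0), which is feasible, has objective value q^T z = 0 and satisfies zs(z) = 0, and whose positive entries sit exactly in B = {1, 3, 4}, while the positive entries of s(z) sit in N = {2, 5}.

import numpy as np

M = np.array([[ 0,  0,  1, -1,  1],
              [ 0,  0,  0,  1,  0],
              [-1,  0,  0,  2,  0],
              [ 1, -1, -2,  0,  3],
              [-1,  0,  0, -3,  0]], dtype=float)
q = np.array([0, 0, 0, 0, 5], dtype=float)

z = np.array([1.0, 0.0, 0.5, 0.5, 0.0])      # the solution (2.26) with kappa = 1/2
s = M @ z + q

print(np.all(z >= 0) and np.all(s >= 0))     # feasible
print(q @ z, np.max(np.abs(z * s)))          # objective 0.0 and zs(z) = 0: optimal
print(np.where(z > 0)[0] + 1)                # [1 3 4] = B
print(np.where(s > 0)[0] + 1)                # [2 5]   = N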
In the above example the union of B and N is the full index set. This is not a
coincidence. Our next aim is to prove that this always holds.12,13,14,15 As a consequence
these sets form a partition of the full index set {1, 2, . . . , n̄}; it is the so-called optimal
partition of (SP ). This important and nontrivial result is fundamental to our purpose
but its proof requires some effort. It highly depends on properties of the central path
of (SP ), which is introduced in the next section.
2.7 The central path
2.7.1 Definition of the central path
Recall from (2.14) that s(e) = e, where e (as always) denotes the all-one vector of
appropriate length (in this case, n̄). As a consequence, we have a vector z such that
z_i s_i(z) = 1 (1 ≤ i ≤ n̄), which, using our shorthand notation, can also be expressed as
\[ z = e \;\Rightarrow\; zs(z) = e. \tag{2.27} \]
Now we come to a very fundamental notion, both from a theoretical and algorithmic
point of view, namely the central path of the LO-problem at hand. The underlying
12
Exercise 4 Following the same approach as in Example I.7 construct the embedding problem for
the case where the problems (P ) and (D) are determined by
A=
13
h
1
0
i
,
i
,
b=
h
1
0
b=
h
1
1
Exercise 6 Same as in Exercise 4, but now with
A=
15
1
0
i
,
i
,
c=
2
,
and, following the approach of Example I.12, find the set of all optimal solutions and the optimal
partition.
Exercise 5 Same as in Exercise 4, but now with
A=
14
h
h
1
0
h
1
0
i
,
i
,
b=
h
1
β
h
1
β
i
,
i
,
c=
Exercise 7 Same as in Exercise 4, but now with
A=
b=
c=
c=
2
2
2
.
,
β > 0.
,
β < 0.
theoretical property is that for every positive number µ there exists a nonnegative
vector z such that
\[ zs(z) = \mu e, \qquad z \ge 0, \qquad s(z) \ge 0, \tag{2.28} \]
and moreover, this vector is unique. If µ = 1, the existence of such a vector is
guaranteed by (2.27). Also note that if we put µ = 0 in (2.28) then the solutions
are just the optimal solutions of (SP). As we have seen in Example I.12 there may be
more than one optimal solution. Therefore, if µ = 0 the system (2.28) may have
multiple solutions. The following lemma is of much interest. It makes clear that for
µ > 0 the system (2.28) has at most one solution.
Lemma I.13 If µ > 0, then there exists at most one nonnegative vector z such that
(2.28) holds.
Proof: Let z^1 and z^2 be two nonnegative vectors satisfying (2.28), and let s^1 = s(z^1)
and s^2 = s(z^2). Since µ > 0, z^1, z^2, s^1, s^2 are all positive. Define ∆z := z^2 − z^1, and
similarly ∆s := s^2 − s^1. Then we may easily verify that
\[ M\Delta z = \Delta s, \tag{2.29} \]
\[ z^1 \Delta s + s^1 \Delta z + \Delta s\,\Delta z = 0. \tag{2.30} \]
Using that M is skew-symmetric, (2.22) implies that ∆z^T ∆s = 0, or, equivalently,
\[ e^T (\Delta z\,\Delta s) = 0. \tag{2.31} \]
Rewriting (2.30) gives
\[ (z^1 + \Delta z)\Delta s + s^1 \Delta z = 0. \]
Since z 1 + ∆z = z 2 > 0 and s1 > 0, this implies that no two corresponding entries in
∆z and ∆s have the same sign. So it follows that
\[ \Delta z\,\Delta s \le 0. \tag{2.32} \]
Combining (2.31) and (2.32), we obtain ∆z∆s = 0. Hence either (∆z)i = 0 or
(∆s)i = 0, for each i. Using (2.30) once more, we conclude that (∆z)i = 0 and
(∆s)i = 0, for each i. Hence ∆z = ∆s = 0, whence z 1 = z 2 and s1 = s2 . This proves
the lemma.
✷
To prove the existence of a solution to (2.28) requires much more effort. We postpone
this to the next section. For the moment, let us take the existence of a solution to
(2.28) for granted and denote it as z(µ). We call it the µ-center of (SP ). The set
{z(µ) : µ > 0}
of all µ-centers represents a parametric curve in the feasible region of (SP ). This curve
is called the central path of (SP). Note that
\[ q^T z(\mu) = s(z(\mu))^T z(\mu) = \mu \bar{n}. \tag{2.33} \]
This proves that along the central path, when µ approaches zero, the objective value
q^T z(µ) monotonically decreases to zero, at a linear rate.
2.7.2 Existence of the central path
In this section we give an algorithmic proof of the existence of a solution to (2.28).
Starting at z = e we construct the µ-center for any µ > 0. This is done by using the
so-called Newton direction as a search direction. The results in this section will also
be used later when dealing with a polynomial-time method for solving (SP ).
Newton direction
Assume that z is a positive solution of (SP ) such that its slack vector s = s(z) is
positive, and let ∆z denote a displacement in the z-space. Our aim is to find ∆z such
that z + ∆z is the µ-center. We denote
\[ z^+ := z + \Delta z, \]
and the new slack vector as s^+:
\[ s^+ := s(z^+) = M(z + \Delta z) + q = s + M\Delta z. \]
Thus, the displacement ∆s in the s-space is simply given by
\[ \Delta s = s^+ - s = M\Delta z. \]
Observe that ∆z and ∆s are orthogonal, since by (2.22):
\[ (\Delta z)^T \Delta s = (\Delta z)^T M \Delta z = 0. \tag{2.34} \]
We want ∆z to be such that z^+ becomes the µ-center, which means
(z + ∆z)(s + ∆s) = µe, or
\[ zs + z\Delta s + s\Delta z + \Delta z\,\Delta s = \mu e. \]
This equation is nonlinear, due to the quadratic term ∆z∆s. Applying Newton’s
method, we omit this nonlinear term, leaving us with the following linear system
in the unknown vectors ∆z and ∆s:
\[ M\Delta z - \Delta s = 0, \tag{2.35} \]
\[ z\Delta s + s\Delta z = \mu e - zs. \tag{2.36} \]
This system has a unique solution, as easily may be verified, by using that M is
skew-symmetric and z > 0 and s > 0.16,17 The solution ∆z is called the Newton
direction. Since we omitted the quadratic term ∆z∆s in our calculation of the Newton
16 Exercise 8 The coefficient matrix of the system (2.35)–(2.36) of linear equations in ∆z and ∆s is
\[ \begin{bmatrix} M & -I \\ S & Z \end{bmatrix}. \]
As usual, Z = diag(z) and S = diag(s), with z > 0 and s > 0, and I denotes the identity matrix. Show that this matrix is nonsingular.
17 Exercise 9 Let M be a skew-symmetric matrix of size n × n and Z and S positive diagonal matrices of the same size as M. Then the matrix S + ZM is nonsingular. Prove this.
direction, z + ∆z will (in general) not be the µ-center, but hopefully it will be a good
approximation. In fact, using (2.36), after the Newton step one has
\[ z^+ s(z^+) = (z + \Delta z)(s + \Delta s) = zs + (z\Delta s + s\Delta z) + \Delta z\,\Delta s = \mu e + \Delta z\,\Delta s. \tag{2.37} \]
Comparing this with our desire, namely z^+ s(z^+) = µe, we see that the ‘error’ is
precisely the quadratic term ∆z∆s. Using (2.22), we deduce from (2.37) that
\[ (z^+)^T s(z^+) = \mu e^T e = \mu \bar{n}, \tag{2.38} \]
showing that after the Newton step the duality gap already has the desired value.
Example I.14 Let us compute the Newton step at z = e for the self-dual problem
(SP) in Example I.7, as given by (2.19), with respect to some µ > 0. Since
z = s(z) = e, the equation (2.36) reduces to
\[ \Delta s + \Delta z = \mu e - e = (\mu - 1)e. \]
Hence, by substitution into (2.35) we obtain
\[ (M + I)\,\Delta z = (\mu - 1)e. \]
It suffices to know the solution of the equation (M + I)ζ = e, because then ∆z =
(µ − 1)ζ. Thus we need to solve ζ from
\[
\begin{bmatrix}
1 & 0 & 1 & -1 & 1 \\
0 & 1 & 0 & 1 & 0 \\
-1 & 0 & 1 & 2 & 0 \\
1 & -1 & -2 & 1 & 3 \\
-1 & 0 & 0 & -3 & 1
\end{bmatrix} \zeta =
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix},
\]
which gives the unique solution
\[ \zeta = \left( -\tfrac{1}{3}, \tfrac{8}{9}, \tfrac{4}{9}, \tfrac{1}{9}, 1 \right)^T. \]
Hence
\[ \Delta z = (\mu - 1)\left( -\tfrac{1}{3}, \tfrac{8}{9}, \tfrac{4}{9}, \tfrac{1}{9}, 1 \right)^T, \tag{2.39} \]
and
\[ \Delta s = M\Delta z = (\mu - 1)(e - \zeta) = (\mu - 1)\left( \tfrac{4}{3}, \tfrac{1}{9}, \tfrac{5}{9}, \tfrac{8}{9}, 0 \right)^T. \tag{2.40} \]
After the Newton step we thus have
\[
z^+ s^+ = (z + \Delta z)(s + \Delta s) = zs + (\Delta z + \Delta s) + \Delta z\,\Delta s
= e + (\mu - 1)e + \Delta z\,\Delta s = \mu e + \Delta z\,\Delta s
= \mu e + \frac{(\mu - 1)^2}{81}\,(-36, 8, 20, 8, 0)^T.
\]
♦
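These numbers are easy to reproduce; a small sketch (assuming NumPy, with M as in Example I.7) that solves (M + I)ζ = e and forms the Newton displacements (2.39) and (2.40):

import numpy as np

M = np.array([[ 0,  0,  1, -1,  1],
              [ 0,  0,  0,  1,  0],
              [-1,  0,  0,  2,  0],
              [ 1, -1, -2,  0,  3],
              [-1,  0,  0, -3,  0]], dtype=float)
e = np.ones(5)

zeta = np.linalg.solve(M + np.eye(5), e)
print(zeta)                                      # (-1/3, 8/9, 4/9, 1/9, 1)

mu = 0.5                                         # any positive value; 0.5 is used in Example I.17
dz = (mu - 1) * zeta                             # Newton displacement (2.39)
ds = M @ dz                                      # equals (mu - 1)(e - zeta), cf. (2.40)
print(np.allclose(ds, (mu - 1) * (e - zeta)))    # True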
Proximity measure
To measure the quality of any approximation z of z(µ), we introduce a proximity
measure δ(z, µ) that vanishes if z = z(µ) and is positive otherwise. To this end we
introduce the variance vector of z with respect to µ as follows:
\[ v := \sqrt{\frac{zs(z)}{\mu}}, \tag{2.41} \]
where all operations are componentwise. Note that
\[ zs(z) = \mu e \;\Leftrightarrow\; v = e. \]
The proximity measure δ(z, µ) is now defined by18
\[ \delta(z, \mu) := \tfrac{1}{2}\left\| v - v^{-1} \right\|. \tag{2.42} \]
Note that if z = z(µ) then v = e and hence δ(z, µ) = 0, and otherwise δ(z, µ) > 0. We
show below that if δ(z, µ) < 1 then the Newton process converges quadratically fast
to z(µ). For this we need the following lemma, which estimates the error term in terms
of the proximity measure. In this lemma ‖·‖ denotes the Euclidean norm (or 2-norm)
and ‖·‖_∞ the Chebyshev norm (or infinity norm) of a vector.
Lemma I.15 If δ := δ(z, µ), then ‖∆z∆s‖_∞ ≤ µδ² and ‖∆z∆s‖ ≤ µδ²√2.
Proof: Componentwise division of (2.36) by √µ v = √(zs) yields
\[ \sqrt{\frac{z}{s}}\,\Delta s + \sqrt{\frac{s}{z}}\,\Delta z = \sqrt{\mu}\left( v^{-1} - v \right). \]
The terms at the left represent orthogonal vectors whose componentwise product
is ∆z∆s. Applying Lemma C.4 in Appendix C to these vectors, and using that
‖v^{-1} − v‖ = 2δ, the result immediately follows. ✷
Quadratic convergence of the Newton process
We are now ready for the main result on the Newton direction.
Theorem I.16 If δ := δ(z, µ) < 1, then the Newton step is strictly feasible, i.e.,
z^+ > 0 and s^+ > 0. Moreover,
\[ \delta(z^+, \mu) \le \frac{\delta^2}{\sqrt{2(1 - \delta^2)}}. \]
18 In the analysis of interior-point methods we always need to introduce a quantity that measures the ‘distance’ of a feasible vector z to the central path or to the µ-center. This can be done in many ways as becomes apparent in the course of this book. In the coming chapters we make use of a variety of so-called proximity measures. Most of these measures are based on the simple observation that z is equal to the µ-center if and only if v = e and z is on the central path if and only if the vector zs(z) is a scalar multiple of the all-one vector.
Proof: Let 0 ≤ α ≤ 1, z^α = z + α∆z and s^α = s + α∆s. We then have, using (2.36),
\[
z^\alpha s^\alpha = (z + \alpha\Delta z)(s + \alpha\Delta s) = zs + \alpha(z\Delta s + s\Delta z) + \alpha^2 \Delta z\,\Delta s
= (1 - \alpha)zs + \alpha\left(\mu e + \alpha\Delta z\,\Delta s\right).
\]
By Lemma I.15,
\[ \mu e + \alpha\Delta z\,\Delta s \ge \mu e - \alpha\left\| \Delta z\,\Delta s \right\|_\infty e \ge \mu(1 - \alpha\delta^2)\, e > 0. \]
Hence, since (1 − α)zs ≥ 0, we have z α sα > 0, for all α ∈ [0, 1]. Therefore, the
components of z α and sα cannot vanish when α ∈ [0, 1]. Hence, since z > 0 and s > 0,
by continuity, z α and sα must be positive for any such α, especially for α = 1. This
proves the first statement in the lemma.
Now let us turn to the proof of the second statement. Let δ^+ := δ(z^+, µ) and let v^+
be the variance vector of z^+ with respect to µ:
\[ v^+ = \sqrt{\frac{z^+ s^+}{\mu}}. \]
Then, by definition,
\[ 2\delta^+ = \left\| (v^+)^{-1} - v^+ \right\| = \left\| (v^+)^{-1}\left( e - (v^+)^2 \right) \right\|. \tag{2.43} \]
Recall from (2.37) that z^+ s^+ = µe + ∆z∆s. In other words,
\[ (v^+)^2 = e + \frac{\Delta z\,\Delta s}{\mu}. \]
Substitution into (2.43) gives
\[
2\delta^+ = \left\| \frac{-\,\frac{\Delta z\,\Delta s}{\mu}}{\sqrt{e + \frac{\Delta z\,\Delta s}{\mu}}} \right\|
\le \frac{\left\| \frac{\Delta z\,\Delta s}{\mu} \right\|}{\sqrt{1 - \left\| \frac{\Delta z\,\Delta s}{\mu} \right\|_\infty}}
\le \frac{\delta^2 \sqrt{2}}{\sqrt{1 - \delta^2}}.
\]
The last inequality follows by using Lemma I.15 twice. Thus the proof is complete. ✷
Theorem I.16 implies that when δ ≤ 1/√2, then after a Newton step the proximity to
the µ-center satisfies δ(z^+, µ) ≤ δ². In other words, Newton's method is quadratically
convergent.
Example I.17 Using the self-dual problem (SP) in Example I.7 again, we consider
in this example feasibility of the Newton step, and the proximity measure before and
after the Newton step at z = e for several values of µ, to be specified below. We will see
that the Newton step performs much better than Theorem I.16 predicts! In Example
I.14 we found the values of ∆z and ∆s. Using these values we find for the new iterate:
z⁺ = e + (µ − 1) (−1/3, 8/9, 4/9, 1/9, 1)ᵀ,
and since s = s(e) = e,
s⁺ = e + (µ − 1) (4/3, 1/9, 5/9, 8/9, 0)ᵀ.
Hence the Newton step is feasible, i.e., z⁺ and s⁺ are nonnegative, if and only if
0.25 ≤ µ ≤ 4,
as easily may be verified. For any such µ we have
δ(z, µ) = ½ ‖√µ e − e/√µ‖ = ½ |√µ − 1/√µ| ‖e‖ = (√5/2) |√µ − 1/√µ|.
Note that Theorem I.16 guarantees feasibility only if δ(z, µ) ≤ 1. This holds if
5µ² − 14µ + 5 ≤ 0, which is equivalent to
0.4202 ≈ (7 − 2√6)/5 ≤ µ ≤ (7 + 2√6)/5 ≈ 2.3798.
The same theorem guarantees quadratic convergence if δ(z, µ) ≤ 1/√2, which
holds if and only if
0.5367 ≈ (6 − √11)/5 ≤ µ ≤ (6 + √11)/5 ≈ 1.8633.
By way of example, consider the case where µ = 0.5. Then we have δ(z, µ) = (1/4)√10 ≈
0.7906 and, by Theorem I.16, δ(z⁺, µ) ≤ (5/12)√3 ≈ 0.7217. Let us compute the actual
value of δ(z⁺, µ). For µ = 0.5 we have
z⁺ = e − ½ (−1/3, 8/9, 4/9, 1/9, 1)ᵀ = (7/6, 5/9, 7/9, 17/18, 1/2)ᵀ,
and since s = s(e) = e,
s⁺ = e − ½ (4/3, 1/9, 5/9, 8/9, 0)ᵀ = (1/3, 17/18, 13/18, 5/9, 1)ᵀ.
Therefore,
(v⁺)² = z⁺s⁺/µ = (7/9, 85/81, 91/81, 85/81, 1)ᵀ.
Finally, we compute δ(z⁺, µ) by using
4δ(z⁺, µ)² = ‖v⁺ − (v⁺)⁻¹‖² = Σᵢ₌₁⁵ (vᵢ⁺)² + Σᵢ₌₁⁵ (vᵢ⁺)⁻² − 10.
Note that the first sum equals (z⁺)ᵀs⁺/µ = 2n̄µ = 5. The second sum equals 5.0817.
Thus we obtain 4δ(z⁺, µ)² = 0.0817, which gives δ(z⁺, µ) = 0.1429.
♦
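A quick numerical re-check of the last example is easy to script. The following Python sketch (ours, not part of the original text) plugs the step components implied by the formulas above into the definition of the proximity measure for µ = 0.5 and reproduces δ(z⁺, µ) ≈ 0.1429.

    import numpy as np

    mu = 0.5
    z = np.ones(5)                 # z = e, with s(e) = e for the sample problem
    s = np.ones(5)
    dz = (mu - 1) * np.array([-1/3, 8/9, 4/9, 1/9, 1])   # step in z (Example I.14)
    ds = (mu - 1) * np.array([ 4/3, 1/9, 5/9, 8/9, 0])   # step in s

    z_plus, s_plus = z + dz, s + ds
    v_plus = np.sqrt(z_plus * s_plus / mu)
    delta_plus = 0.5 * np.linalg.norm(v_plus - 1 / v_plus)
    print(round(delta_plus, 4))    # 0.1429, as computed in Example I.17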
Existence of the central path
Now suppose that we know the µ-center for µ = µ⁰ > 0 and let us denote z⁰ = z(µ⁰).
Note that this is true for µ⁰ = 1, with z⁰ = e, because e s(e) = e. So e is the µ-center
for µ = 1.
Since z⁰s(z⁰) = µ⁰e, the v-vector for z⁰ with respect to an arbitrary µ > 0 is given
by
v = √(z⁰s(z⁰)/µ) = √(µ⁰/µ) e.
Hence we have δ(z⁰, µ) ≤ 1/√2 if and only if
½ ‖√(µ⁰/µ) e − √(µ/µ⁰) e‖ ≤ 1/√2.
Using ‖e‖ = √n̄, one may easily verify that this holds if and only if
1/β ≤ µ/µ⁰ ≤ β,   β := 1 + 1/n̄ + √(2/n̄ + 1/n̄²).  (2.44)
Now starting the Newton process at z⁰, with µ fixed, and such that µ satisfies (2.44),
we can generate an infinite sequence z⁰, z¹, . . . , z^k, . . . such that
δ(z^k, µ) ≤ 1/2^(2^(k−1)).
Hence
lim_(k→∞) δ(z^k, µ) = 0.
The generated sequence has at least one accumulation point z∗, since the iterates
z¹, . . . , z^k, . . . lie in the compact set
{z : z ≥ 0, s(z) ≥ 0, eᵀz + eᵀs(z) = n̄(1 + µ)},
due to (2.21) and (2.38). Since δ(z∗, µ) = 0, we obtain z∗s(z∗) = µe. Due to Lemma
I.13, z∗ is unique. This proves that the µ-center exists if µ satisfies (2.44) with µ⁰ = 1,
i.e., if
1/β ≤ µ ≤ β.
By redefining µ⁰ as one of the endpoints of the above interval we can repeat the above
procedure, and extend the interval where the µ-center exists to
1/β² ≤ µ ≤ β²,
and so on. After applying the procedure k times the interval where the µ-center
certainly exists is given by
1/β^k ≤ µ ≤ β^k.
For arbitrary µ > 0, we have to apply the above procedure at most
|log µ| / log β
times, to prove the existence of the µ-center. This completes the proof of the existence
of the central path.
It may be worth noting that, using n̄ ≥ 2 and log(1 + t) ≥ t/(1 + t) for t ≥ 0,19
log β = log(1 + 1/n̄ + √(2/n̄ + 1/n̄²)) ≥ log(1 + √(2/n̄)) ≥ √(2/n̄)/(1 + √(2/n̄)) = √2/(√n̄ + √2) ≥ 1/√(2n̄).
Hence the number of times that we have to apply the above described procedure to
obtain the µ-center is bounded above by
√(2n̄) |log µ|.  (2.45)
We have just shown that the system (2.28) has a unique solution for every positive µ.
The solution is called the µ-center, and denoted as z(µ). The set of all µ-centers is a
curve in the interior of the feasible region of (SP). Equivalently, the µ-center, as given
by (2.28), can be characterized as the unique solution of the system
M z + q = s,  z ≥ 0,  s ≥ 0,  zs = µe,  (2.46)
with M and z as defined in (2.11), and s = s(z), as in (2.17).20,21,22
2.8  Existence of a strictly complementary solution
Now that we have proven the existence of the central path we can use it as a guide
to the optimal set, by letting the parameter µ approach zero. As we show in this
section, in this way we obtain an optimal solution z such that z + s(z) > 0.
Definition I.18 Two nonnegative vectors a and b in IRn are said to be complementary
vectors if ab = 0. If moreover a + b > 0 then a and b are called strictly complementary
vectors.
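In code, these two notions are immediate to check. A tiny Python helper (ours, not from the book; the exact-zero test would be replaced by a tolerance in floating-point work):

    import numpy as np

    def complementary(a, b):
        """a and b are complementary if their componentwise product vanishes."""
        return np.all(a * b == 0)

    def strictly_complementary(a, b):
        """Complementary, and in addition a + b is positive in every coordinate."""
        return complementary(a, b) and np.all(a + b > 0)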
19 See, e.g., Exercise 39, page 133.
20 Exercise 10 Using the definitions of z and q, according to (2.11) and (2.12), show that ϑ(µ) = µ.
21 Exercise 11 In this exercise a skew-symmetric M and four vectors q(i), i = 1, 2, 3, 4 are given as
follows:
M = [ 0  −1 ; 1  0 ],   q(1) = [ 0 ; 0 ],   q(2) = [ 1 ; 0 ],   q(3) = [ 0 ; 1 ],   q(4) = [ 1 ; 1 ].
For each of the four cases q = q(i), i = 1, 2, 3, 4, one is asked to verify (1) if the system (2.46) has
a solution if µ > 0 and (2) if the first equation in (2.46) satisfies the IPC, i.e., has a solution with
z > 0 and s > 0.
22 Exercise 12 Show that z(µ) is continuous (and differentiable) at any positive µ. (Hint: Apply
the implicit function theorem (cf. Proposition A.2 in Appendix A) to the system (2.46).)
Recall that optimality of z means that zs(z) = 0, which means that z and s(z) are
complementary vectors. We are going to show that there exists an optimal vector z
such that z and s(z) are strictly complementary vectors. Then for every index i, either
zi > 0 or si(z) > 0. This implies that the index sets B and N, introduced in Section
2.6, form a partition of the index set, the so-called optimal partition of (SP).
It is convenient to introduce some more notation.
Definition I.19 If z is a nonnegative vector, we define its support, denoted by σ(z),
as the set of indices i for which zi > 0:
σ(z) = {i : zi > 0} .
Note that if z is feasible then zs(z) = 0 holds if and only if σ(z) ∩ σ(s(z)) = ∅.
Furthermore, z is a strictly complementary optimal solution if and only if it is optimal
and σ(z) ∪ σ(s(z)) = {1, 2, . . . , n̄}.
Theorem I.20 (SP ) has an optimal solution z ∗ with z ∗ + s(z ∗ ) > 0.
Proof: Let {µ_k}_{k=1}^∞ be a monotonically decreasing sequence of positive numbers
µ_k such that µ_k → 0 if k → ∞, and let s(µ_k) := s(z(µ_k)). Due to Lemma I.9
the set {(z(µk ), s(µk ))} lies in a compact set, and hence it contains a subsequence
converging to a point (z ∗ , s∗ ), with s∗ = s(z ∗ ). Since z(µk )T s(µk ) = n̄µk → 0, we
have (z ∗ )T s∗ = 0. Hence, from (2.25), q T z ∗ = 0. So z ∗ is an optimal solution.
We claim that (z ∗ , s∗ ) is a strictly complementary pair. To prove this, we apply the
orthogonality property (2.22) to the points z ∗ and z(µk ), which gives
(z(µk ) − z ∗ )T (s(µk ) − s∗ ) = 0.
Rearranging the terms, and using z(µk )T s(µk ) = n̄µk and (z ∗ )T s∗ = 0, we arrive at
Σ_{j∈σ(z∗)} z∗_j s_j(µ_k) + Σ_{j∈σ(s∗)} z_j(µ_k) s∗_j = n̄µ_k.
Dividing both sides by µ_k and recalling that z_j(µ_k)s_j(µ_k) = µ_k, we obtain
Σ_{j∈σ(z∗)} z∗_j / z_j(µ_k) + Σ_{j∈σ(s∗)} s∗_j / s_j(µ_k) = n̄.
Letting k → ∞, the first sum on the left becomes equal to the number of positive
coordinates in z ∗ . Similarly, the second sum becomes equal to the number of positive
coordinates in s∗ . The sum of these numbers being n̄, we conclude that the optimal
pair (z ∗ , s∗ ) is strictly complementary.23,24
✷
23
24
By using a similar proof technique it can be shown that the limit of z(µ) exists if µ goes to zero.
In other words, the central path converges. The limit point is (of course) a uniquely determined
optimal solution of (SP ), which can further be characterized as the so-called analytic center of the
set of optimal solutions (cf. Section 2.11).
Let us also mention that Theorem I.20 is a special case of an old result of Goldman and Tucker which
states that every feasible linear system of equalities and inequalities has a strictly complementary
solution [111].
By Theorem I.20 there exists a strictly complementary solution z of (SP ). Having
such a solution, the classes B and N simply follow from
B = {i : zi > 0} ,
N = {i : si (z) > 0} .
Now recall from Theorem I.5 and Theorem I.6 that the problems (P ) and (D) have
optimal solutions with vanishing duality gap if and only if (SP ) has an optimal solution
with κ > 0. Due to Theorem I.20 this can be restated as follows.
Corollary I.21 The problems (P ) and (D) have optimal solutions with vanishing
duality gap if and only if κ ∈ B.
Let us consider in more detail the implications of κ ∈ B for the problem (SP0), and,
more importantly, for (P) and (D).
Theorem I.20 implies the existence of a strictly complementary optimal solution z
of (SP ). Let z be such an optimal solution. Then we have
zs(z) = 0,
z + s(z) > 0,
z ≥ 0,
s(z) ≥ 0.
Now using s(z) = M z + q and ϑ = 0, and also (2.11) and (2.12), we obtain
z = (y; x; κ; 0) ≥ 0,   s(z) = (Ax − κb; −Aᵀy + κc; bᵀy − cᵀx; n̄ − (yᵀ, xᵀ, κ) r) ≥ 0.
Neglecting the last entry in both vectors, it follows that
z̄ := (y; x; κ) ≥ 0,   s̄(z̄) := M̄ z̄ = (Ax − κb; −Aᵀy + κc; bᵀy − cᵀx) ≥ 0,  (2.47)
and moreover,
z̄ s̄(z̄) = 0,  z̄ + s̄(z̄) > 0,  z̄ ≥ 0,  s̄(z̄) ≥ 0.  (2.48)
This shows that z̄ is a strictly complementary optimal solution of (SP0 ). Hence the
next theorem requires no further proof.
Theorem I.22 (SP0 ) has an optimal solution z̄ with z̄ + s̄(z̄) > 0.
Note that (2.47) and (2.48) are homogeneous in the variables x, y and κ. So, assuming
κ ∈ B, without loss of generality we may put κ = 1. Then we come to
y ≥ 0,   Ax − b ≥ 0,       y(Ax − b) = 0,        y + (Ax − b) > 0,
x ≥ 0,   c − Aᵀy ≥ 0,      x(c − Aᵀy) = 0,       x + (c − Aᵀy) > 0,
1 ≥ 0,   bᵀy − cᵀx ≥ 0,    bᵀy − cᵀx = 0,        1 + (bᵀy − cᵀx) > 0.
This makes clear that x is feasible for (P ) and y is feasible for (D), and because
cT x = bT y these solutions are optimal with vanishing duality gap. We get a little
more information from the above system, namely
y(Ax − b) = 0,   x(c − Aᵀy) = 0,
y + (Ax − b) > 0,   x + (c − Aᵀy) > 0.
The upper two relations show that the dual vector y and the primal slack vector
Ax − b are strictly complementary, whereas the lower two relations express that the
primal vector x and the dual slack vector c − AT y are strictly complementary. Thus
the following is also true.
Theorem I.23 If κ ∈ B then the problems (P ) and (D) have optimal solutions that
are strictly complementary with the slack vector of the other problem. Moreover, the
optimal values of (P ) and (D) are equal.
An intriguing question is of course what can be said about the problems (P) and (D)
if κ ∉ B, i.e., if κ ∈ N. This question is completely answered in the next section.
2.9  Strong duality theorem
We start by proving the following lemma.
Lemma I.24 If κ ∈ N then there exist vectors x and y such that
x ≥ 0,  y ≥ 0,  Ax ≥ 0,  Aᵀy ≤ 0,  bᵀy − cᵀx > 0.
Proof: Let κ ∈ N. Substitution of κ = 0 in (2.47) and (2.48) yields
y ≥ 0,   Ax ≥ 0,           y(Ax) = 0,            y + Ax > 0,
x ≥ 0,   −Aᵀy ≥ 0,         x(Aᵀy) = 0,           x − Aᵀy > 0,
0 ≥ 0,   bᵀy − cᵀx ≥ 0,    0·(bᵀy − cᵀx) = 0,    0 + (bᵀy − cᵀx) > 0.
It follows that the vectors x and y are as desired, thus the lemma is proved.
✷
Let us call an LO-problem solvable if it has an optimal solution, and unsolvable
otherwise. Note that an LO-problem can be unsolvable for two possible reasons: the
domain of the problem is empty, or the domain is not empty but the objective function
is unbounded on the domain. In the first case the problem is called infeasible and in
the second case unbounded.
Theorem I.25 If κ ∈ N then neither (P ) nor (D) has an optimal solution.
Proof: Let κ ∈ N. By Lemma I.24 we then have vectors x and y such that
x ≥ 0,  y ≥ 0,  Ax ≥ 0,  Aᵀy ≤ 0,  bᵀy − cᵀx > 0.  (2.49)
By the last inequality we cannot have bᵀy ≤ 0 and cᵀx ≥ 0. Hence,
either bᵀy > 0 or cᵀx < 0.  (2.50)
Suppose that (P ) is not infeasible. Then there exists x∗ such that
x∗ ≥ 0
and
Ax∗ ≥ b.
Using (2.49) we find that x∗ + x ≥ 0 and A(x∗ + x) ≥ b. So x∗ + x is feasible for (P ).
We cannot have bᵀy > 0, because this would lead to the contradiction
0 < bᵀy ≤ (Ax∗)ᵀy = (x∗)ᵀ(Aᵀy) ≤ 0,
since x∗ ≥ 0 and Aᵀy ≤ 0. Hence we have bᵀy ≤ 0. By (2.50) this implies cᵀx < 0.
But then we have for any positive λ that x∗ + λx is feasible for (P ) and
cT (x∗ + λx) = cT x∗ + λcT x,
showing that the objective value goes to minus infinity if λ grows to infinity. Thus we
have shown that (P ) is either infeasible or unbounded, and hence (P ) has no optimal
solution.
The other case can be handled in the same way. If (D) is feasible then there exists
y ∗ such that y ∗ ≥ 0 and AT y ∗ ≤ c. Due to (2.49) we find that y ∗ + y ≥ 0 and
Aᵀ(y∗ + y) ≤ c. So y∗ + y is feasible for (D). We then cannot have cᵀx < 0, because
this gives the contradiction
0 > cᵀx ≥ (Aᵀy∗)ᵀx = (y∗)ᵀ(Ax) ≥ 0,
since y∗ ≥ 0 and Ax ≥ 0. Hence cᵀx ≥ 0. By (2.50) this implies bᵀy > 0. But then we
have for any positive λ that y ∗ + λy is feasible for (D) and
bT (y ∗ + λy) = bT y ∗ + λbT y.
If λ grows to infinity then the last expression goes to infinity as well, so (D) is an
unbounded problem. Thus we have shown that (D) is either infeasible or unbounded.
This completes the proof.
✷
The following theorem summarizes the above results.
Theorem I.26 (Strong duality theorem) For an LO problem (P ) in canonical
form and its dual problem (D) we have the following two alternatives:
(i) Both (P ) and (D) are solvable and there exist (strictly complementary) optimal
solutions x for (P ) and y for (D) such that cT x = bT y.
(ii) Neither (P ) nor (D) is solvable.
This theorem is known as the strong duality theorem. It is the result that we
announced in Section 2.2. It implies that if one of the problems (P ) and (D) is solvable
then the other problem is solvable as well and in that case the duality gap vanishes
at optimality. So the optimal values of both problems are then equal.
If (B, N ) is the optimal partition of the self-dual problem (SP ) in which (P ) and
(D) are embedded, then case (i) occurs if κ ∈ B and case (ii) if κ ∈ N . Also, by
Theorem I.25, case (ii) occurs if and only if there exist x and y such that (2.49) holds,
and then at least one of the two problems is infeasible.
Duality is a major topic in the theory of LO. At many places in the book, and in
many ways, we explore duality properties. The above result concerns an LO problem
(P ) in canonical form and its dual problem (D). In the next section we will extend
the applicability of Theorem I.26 to any LO problem.
We conclude the present section with an interesting observation.
Remark I.27 In the classical approach to LO we have so-called theorems of the
alternatives, also known as variants of Farkas’ lemma. We want to establish here that the
fact that (2.47) has a strictly complementary solution for each vector c ∈ IRn implies Farkas’
lemma. We show this below for the following variant of the lemma.
Lemma I.28 (Farkas’ lemma [75]) For a given m × n matrix A and a vector b ∈ IRm
either the system
Ax ≥ b, x ≥ 0
has a solution or the system
AT y ≤ 0, bT y > 0, y ≥ 0
has a solution but not both systems have a solution.
Proof: The obvious part of the lemma is that not both systems can have a solution, because
this would lead to the contradiction
0 < bT y ≤ (Ax)T y = xT AT y ≤ 0.
Taking c = 0 in (2.47), it follows that there exist x and y such that the two vectors
z = (y; x; κ) ≥ 0,   s(z) = (Ax − κb; −Aᵀy; bᵀy) ≥ 0
are strictly complementary. For κ there are two possibilities: either κ = 0 or κ > 0. In the
first case we obtain AT y ≤ 0, bT y > 0, y ≥ 0. In the second case we may assume without
loss of generality that κ = 1. Then x satisfies Ax ≥ b, x ≥ 0, proving the claim.25
•
2.10  The dual problem of an arbitrary LO problem
Every LO problem can be transformed into a canonical form. In fact, this can be done
in many ways. In its canonical form the problem has a dual problem. In this way we
can obtain a dual problem for any LO problem. Unfortunately the transformation to
canonical form is not unique, and as a consequence, the dual problem obtained in this
way is not uniquely determined.
25
Exercise 13 Derive Theorem I.22 from Farkas’ lemma. In other words, use Farkas’ lemma to show
that for any skew-symmetric matrix M there exists a vector x such that
x ≥ 0,
M x ≥ 0,
x + M x > 0.
The aim of this section is to show that we can find a dual problem for any given
problem in a unique and simple way, so that when taking the dual of the dual problem
we get the original problem, in its original description.
Recall that three types of variables can be distinguished: nonnegative variables, free
variables and nonpositive variables. Similarly, three types of constraints can occur
in an LO problem: equality constraints, inequality constraints of the ≤ type and
inequality constraints of the ≥ type. For our present purpose we need to consider the
LO problem in its most general form, with all types of constraint and all types of
variable. Therefore, we consider the following problem as the primal problem:
(P)   min { (c⁰)ᵀx⁰ + (c¹)ᵀx¹ + (c²)ᵀx² :  A₀x⁰ + A₁x¹ + A₂x² = b⁰,
                                            B₀x⁰ + B₁x¹ + B₂x² ≥ b¹,
                                            C₀x⁰ + C₁x¹ + C₂x² ≤ b²,
                                            x¹ ≥ 0,  x² ≤ 0 },
where, for each i = 0, 1, 2, Ai , Bi and Ci are matrices and bi , ci and xi are vectors,
and the sizes of these matrices and vectors, which we do not further specify, are such
that all expressions in the problem are well defined.
Now let us determine the dual of this problem. We first put it into canonical form.26
To this end we replace the equality constraint by two inequality constraints and we
multiply the ≤ constraint by −1. Furthermore, we replace the nonpositive variable x2
by x3 = −x2 and the free variable x0 by x+ − x− with x+ and x− nonnegative. This
yields the following equivalent problem:
minimize   (c⁰; −c⁰; c¹; −c²)ᵀ (x⁺; x⁻; x¹; x³)
subject to
  [  A₀  −A₀   A₁  −A₂ ] [x⁺]     [  b⁰]
  [ −A₀   A₀  −A₁   A₂ ] [x⁻]  ≥  [ −b⁰]
  [  B₀  −B₀   B₁  −B₂ ] [x¹]     [  b¹]
  [ −C₀   C₀  −C₁   C₂ ] [x³]     [ −b²]
  (x⁺; x⁻; x¹; x³) ≥ 0.
In terms of vectors z 1 , z 2 , z 3 , z 4 that contain the appropriate nonnegative dual
variables, the dual of this problem becomes
maximize   (b⁰; −b⁰; b¹; −b²)ᵀ (z¹; z²; z³; z⁴)
subject to
  [  A₀ᵀ  −A₀ᵀ   B₀ᵀ  −C₀ᵀ ] [z¹]     [  c⁰]
  [ −A₀ᵀ   A₀ᵀ  −B₀ᵀ   C₀ᵀ ] [z²]  ≤  [ −c⁰]
  [  A₁ᵀ  −A₁ᵀ   B₁ᵀ  −C₁ᵀ ] [z³]     [  c¹]
  [ −A₂ᵀ   A₂ᵀ  −B₂ᵀ   C₂ᵀ ] [z⁴]     [ −c²]
  (z¹; z²; z³; z⁴) ≥ 0.

26 The transformations carried out below lead to an increase of the numbers of constraints and
variables in the problem formulation. They are therefore 'bad' from a computational point of view.
But our present purpose is purely theoretical. In Appendix D it is shown how the problem can be
put in canonical form without increasing these numbers.
We can easily check that the variables z 1 and z 2 only occur together in the combination
z 1 − z 2 . Therefore, we can replace the variables by one free variable y 0 := z 1 − z 2 .
This reduces the problem to
maximize   (b⁰; b¹; −b²)ᵀ (y⁰; z³; z⁴)
subject to
  [  A₀ᵀ   B₀ᵀ  −C₀ᵀ ] [y⁰]     [  c⁰]
  [ −A₀ᵀ  −B₀ᵀ   C₀ᵀ ] [z³]  ≤  [ −c⁰]
  [  A₁ᵀ   B₁ᵀ  −C₁ᵀ ] [z⁴]     [  c¹]
  [ −A₂ᵀ  −B₂ᵀ   C₂ᵀ ]          [ −c²]
  z³ ≥ 0,  z⁴ ≥ 0.
In this problem the first two blocks of constraints can be taken together into one block
of equality constraints:
max { (b⁰)ᵀy⁰ + (b¹)ᵀz³ − (b²)ᵀz⁴ :   A₀ᵀ y⁰ + B₀ᵀ z³ − C₀ᵀ z⁴ = c⁰,
                                      A₁ᵀ y⁰ + B₁ᵀ z³ − C₁ᵀ z⁴ ≤ c¹,
                                     −A₂ᵀ y⁰ − B₂ᵀ z³ + C₂ᵀ z⁴ ≤ −c²,
                                      z³ ≥ 0,  z⁴ ≥ 0 }.
Finally we multiply the last block of constraints by -1, we replace the nonnegative
variable z 3 by y 1 = z 3 and the nonnegative variable z 4 by the nonpositive variable
y 2 = −z 4 . This transforms the dual problem to its final form, namely
(D)   max { (b⁰)ᵀy⁰ + (b¹)ᵀy¹ + (b²)ᵀy² :  A₀ᵀ y⁰ + B₀ᵀ y¹ + C₀ᵀ y² = c⁰,
                                            A₁ᵀ y⁰ + B₁ᵀ y¹ + C₁ᵀ y² ≤ c¹,
                                            A₂ᵀ y⁰ + B₂ᵀ y¹ + C₂ᵀ y² ≥ c²,
                                            y¹ ≥ 0,  y² ≤ 0 }.
Comparison of the primal problem (P ) with its dual problem (D), in its final
description, reveals some simple rules for the construction of a dual problem for
any given LO problem. First, the objective vector and the right-hand side vector are
interchanged in the two problems, and the constraint matrix is transposed. At first
sight it may not be obvious that the types of the dual variables and the dual constraints
can be determined. We need to realize that the vector y 0 of dual variables relates to
the first block of constraints in the primal problem, y 1 to the second block and y 2 to
the third block of constraints. Then the relation becomes obvious: equality constraints
in the primal problem yield free variables in the dual problem, inequality constraints
in the primal problem of type ≥ yield nonnegative variables in the dual problem, and
inequality constraints in the primal problem of type ≤ yield nonpositive variables in
the dual problem. For the types of dual constraint we have similar relations. Here the
I.2 Duality Theory
43
vector of primal variables x0 relates to the first block of constraints in the dual problem,
x1 to the second block and x2 to the third block of constraints. Free variables in the
primal problem yield equality constraints in the dual problem, nonnegative variables
in the primal problem yield inequality constraints of type ≤ in the dual problem, and
nonpositive variables in the primal problem yield inequality constraints of type ≥ in
the dual problem. Table 2.1. summarizes these observations, and as such provides a
simple scheme for writing down a dual problem for any given minimization problem.
To get the dual of a maximization problem, one simply has to use the table from the
right to the left.
Primal problem (P)              Dual problem (D)
min cᵀx                         max bᵀy
equality constraint             free variable
inequality constraint ≥         variable ≥ 0
free variable                   equality constraint
variable ≥ 0                    inequality constraint ≤
variable ≤ 0                    inequality constraint ≥
inequality constraint ≤         variable ≤ 0

Table 2.1.   Scheme for dualizing.
As indicated before, the dualizing scheme is such that when it is applied twice, the
original problem is returned. This easily follows from Table 2.1., by inspection.27
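As a small illustration of Table 2.1. (an example of our own, not taken from the text), reading the table from left to right turns the problem on the left into the one on the right:

(P)   min { x₁ + 2x₂ :  x₁ + x₂ = 3,  x₁ − x₂ ≥ 1,  x₁ ≥ 0, x₂ free },
(D)   max { 3y₁ + y₂ :  y₁ + y₂ ≤ 1,  y₁ − y₂ = 2,  y₁ free, y₂ ≥ 0 }.

Here the equality constraint of (P) produces the free variable y₁, the ≥ constraint produces y₂ ≥ 0, the nonnegative variable x₁ produces the ≤ constraint, and the free variable x₂ produces the equality constraint; applying the table once more, from right to left, returns (P).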
2.11  Convergence of the central path
We already announced in footnote 23 (page 36) that the central path has a unique
limit point in the optimal set. Because this result was not needed in the rest of this
chapter, we postponed its proof to this section. We also characterize the limit point
as the so-called analytic center of the optimal set of (SP ).
As before, we assume that the central path of (SP ) exists. Our aim is to investigate
the behavior of the central path as µ tends to 0. From the proof of Theorem I.20 we
know that the central path has a subsequence converging to an optimal solution. This
was sufficient for proving the existence of a strictly complementary solution of (SP ).
However, as we show below, the central path itself converges. The limit point z ∗ and
27 Exercise 14 Using the results of this chapter prove that the following three statements are
equivalent:
(i) (SP) satisfies the interior-point condition;
(ii) the level sets L_γ := {(z, s(z)) : qᵀz ≤ γ, s(z) = Mz + q ≥ 0, z ≥ 0} of qᵀz are bounded;
(iii) the optimal set of (SP) is bounded.
its surplus vector s∗ := s(z ∗ ) form a strictly complementary optimal solution pair,
and hence determine the optimal partition (B, N ) of (SP ).
The optimal set of (SP) is given by
SP∗ = {(z, s) : Mz + q = s, z ≥ 0, s ≥ 0, qᵀz = 0}.
This makes clear that SP∗ is the intersection of the affine space
{(z, s) : Mz + q = s, qᵀz = 0}
with the nonnegative orthant of IR^(2n).
At this stage we need to define the analytic center of SP ∗ . We give the definition
for the more general case of an arbitrary (nonempty) set that is the intersection of an
affine space in IRp and the nonnegative orthant of IRp .
Definition I.29 (Analytic center) 28 Let the nonempty and bounded set T be the
intersection of an affine space in IRp with the nonnegative orthant of IRp . We define
the support σ(T ) of T as the subset of the full index set {1, 2, . . . , p} given by
σ(T ) = {i : ∃x ∈ T such that xi > 0} .
The analytic center of T is defined as the zero vector if σ(T) is empty; otherwise it is
the vector in T that maximizes the product
∏_{i∈σ(T)} x_i,   x ∈ T.  (2.51)
If the support of the set T in the above definition is nonempty then the convexity of
T implies the existence of a vector x ∈ T such that xσ(T ) > 0. Moreover, if σ(T ) is
nonempty then the maximum value of the product (2.51) exists since T is bounded.
Since the logarithm of the product (2.51) is strictly concave, the maximum value (if it
exists) is attained at a unique point of T . Thus the above definition uniquely defines
the analytic center for any bounded subset that is the intersection of an affine space
in IRp with the nonnegative orthant of IRp .
Due to Lemma I.9 any pair (z, s) ∈ SP ∗ satisfies
eT z + eT s(z) = n̄.
This makes clear that the optimal set SP ∗ is bounded. Its analytic center therefore
exists. We now show that the central path converges to this analytic center. The proof
very much resembles that of Theorem I.20.29
28
29
The notion of analytic center of a polyhedron was introduced by Sonnevend [257]. It plays a crucial
role in the theory of interior-point methods.
The limiting behavior of the central path as µ approaches zero has been an important subject in
research on interior-point methods for a long time. In the book by Fiacco and McCormick [77]
the convergence of the central path to an optimal solution is investigated for general convex
optimization problems. McLinden [197] considered the limiting behavior of the path for monotone
complementarity problems and introduced the idea for the proof-technique of Theorem I.20, which
was later adapted by Güler and Ye [135]. Megiddo [200] extensively investigated the properties of
the central path, which motivated Monteiro and Adler [218], Tanabe [261] and Kojima, Mizuno
and Yoshise [178] to investigate primal-dual methods. Other relevant references for the limiting
behavior of the central path are Adler and Monteiro [3], Asić, Kovačević-Vujčić and Radosavljević-Nikolić [28], Güler [131], Kojima, Mizuno and Noma [176], Monteiro and Tsuchiya [222] and
Witzgall, Boggs and Domich [294], Halická [137], Wechs [290] and Zhao and Zhu [321].
Theorem I.30 The central path converges to the analytic center of the optimal set
SP ∗ of (SP ).
Proof: Let (z∗, s∗) be an accumulation point of the central path, where s∗ = s(z∗).
The existence of such a point has been established in the proof of Theorem I.20. Let
{µ_k}_{k=1}^∞ be a positive sequence such that µ_k → 0 and such that (z(µ_k), s(µ_k)), with
s(µ_k) = s(z(µ_k)), converges to (z∗, s∗). Then z∗ is optimal, which means z∗s∗ = 0,
and z∗ and s∗ are strictly complementary, i.e., z∗ + s∗ > 0.
Now let z̄ be optimal in (SP ) and let s̄ = M z̄ + q be its surplus vector. Applying
the orthogonality property (2.22) to the points z̄ and z(µ) we obtain
(z(µk ) − z̄)T (s(µk ) − s̄) = 0.
Rearranging terms and using z(µ_k)ᵀs(µ_k) = n̄µ_k and z̄ᵀs̄ = 0, we get
Σ_{j=1}^n̄ z̄_j s_j(µ_k) + Σ_{j=1}^n̄ s̄_j z_j(µ_k) = n̄µ_k.
Since the pair (z∗, s∗) is strictly complementary and (z̄, s̄) is an arbitrary optimal pair,
we have for each coordinate j:
z∗_j = 0 ⇒ z̄_j = 0,   s∗_j = 0 ⇒ s̄_j = 0.
Hence, z̄_j = 0 if j ∉ σ(z∗) and s̄_j = 0 if j ∉ σ(s∗). Thus we may write
Σ_{j∈σ(z∗)} z̄_j s_j(µ_k) + Σ_{j∈σ(s∗)} s̄_j z_j(µ_k) = n̄µ_k.
Dividing both sides by µ_k = z_j(µ_k)s_j(µ_k), we get
Σ_{j∈σ(z∗)} z̄_j / z_j(µ_k) + Σ_{j∈σ(s∗)} s̄_j / s_j(µ_k) = n̄.
Letting k → ∞, it follows that
Σ_{j∈σ(z∗)} z̄_j / z∗_j + Σ_{j∈σ(s∗)} s̄_j / s∗_j = n̄.
Using the arithmetic-geometric-mean inequality we obtain
( ∏_{j∈σ(z∗)} z̄_j/z∗_j · ∏_{j∈σ(s∗)} s̄_j/s∗_j )^(1/n̄) ≤ (1/n̄) ( Σ_{j∈σ(z∗)} z̄_j/z∗_j + Σ_{j∈σ(s∗)} s̄_j/s∗_j ) = 1.
Obviously, the above inequality implies
∏_{j∈σ(z∗)} z̄_j · ∏_{j∈σ(s∗)} s̄_j ≤ ∏_{j∈σ(z∗)} z∗_j · ∏_{j∈σ(s∗)} s∗_j.
This shows that (z∗, s∗) maximizes the product ∏_{j∈σ(z∗)} z_j · ∏_{j∈σ(s∗)} s_j over the
optimal set. Hence the central path of (SP) has only one accumulation point when µ
approaches zero and this is the analytic center of SP∗.
✷
Example I.31 Let us compute the limit point of the central path of the self-dual
problem (SP ) in Example I.7, as given by (2.19). Recall from (2.26) in Example I.12
that any optimal solution has the form
z = (2κ, 0, κ, κ, 0)ᵀ,   s(z) = (0, κ, 0, 0, 5 − 5κ)ᵀ,   0 ≤ κ ≤ 1,
from which the sets B and N follow:
B = {1, 3, 4},   N = {2, 5}.
Hence we have for any optimal z,
∏_{j∈B} z_j · ∏_{j∈N} s_j(z) = 2κ⁴(5 − 5κ) = 10(κ⁴ − κ⁵).
This product is maximal for κ = 0.8 (the derivative 10(4κ³ − 5κ⁴) vanishes at κ = 4/5),
so the analytic center of the optimal set is given by30,31,32,33
z = (1.6, 0, 0.8, 0.8, 0)ᵀ,   s(z) = (0, 0.8, 0, 0, 1)ᵀ.
♦
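The maximizing value κ = 0.8 is also easy to confirm numerically. A one-off Python check (ours, not part of the text) evaluates 10(κ⁴ − κ⁵) on a fine grid:

    import numpy as np

    kappa = np.linspace(0.0, 1.0, 100001)
    product = 10 * (kappa**4 - kappa**5)   # the product maximized by the analytic center
    print(kappa[np.argmax(product)])       # 0.8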
The convergence of the central path when µ goes to zero implies the boundedness of
the coordinates of z(µ) and s(µ) for any finite section of the central path. Of course,
this also follows from Lemma I.9 and (2.33).34
30
Exercise 15 Find the analytic center of the self-dual problem considered in Exercise 4 (page 27).
31
Exercise 16 Find the analytic center of the self-dual problem considered in Exercise 5 (page 27).
32
Exercise 17 Find the analytic center of the self-dual problem considered in Exercise 6 (page 27).
33
Exercise 18 Find the analytic center of the self-dual problem considered in Exercise 7 (page 27).
34 Exercise 19 For any positive µ consider the set
SP_µ := {(z, s) : Mz + q = s, z ≥ 0, s ≥ 0, qᵀz = qᵀz(µ)}.
Using the same proof-technique as for Theorem I.30, show that the pair (z(µ), s(µ)) is the analytic
center of this set.
3  A Polynomial Algorithm for the Self-dual Model

3.1  Introduction
The previous chapter made clear that any (canonical) LO problem can be solved by
finding a strictly complementary solution of a specific self-dual problem that satisfies
the interior-point assumption. In particular, the self-dual problem has the form
(SP)   min { qᵀz : Mz ≥ −q, z ≥ 0 },
where M is a skew-symmetric matrix and q a nonnegative vector. Deviating from the
notation in Chapter 2 we denote the order of M as n (instead of n̄). Then, according
to (2.12) the vector q has the form
q := (0_{n−1}; n).  (3.1)
Note that due to the definition of the matrix M we may assume that n ≥ 5.
Like before, we associate to any vector z ∈ IRⁿ its slack vector s(z):
s(z) := Mz + q.  (3.2)
As a consequence we have
z is feasible for (SP) if and only if z ≥ 0 and s(z) ≥ 0.
Also recall that the all-one vector e is feasible for (SP ) and its slack vector is the
all-one vector (cf. Theorem I.5):
s(e) = e.
(3.3)
Assuming that the entries in M and q are integral (or rational), we show in this chapter
that we can find a strictly complementary solution of (SP ) in polynomial time. This
means that we present an algorithm that yields a strictly complementary solution of
(SP ) after a number of arithmetic operations that is bounded by a polynomial in the
size of (SP ).
Remark I.32 The terminology is taken from complexity theory. For our purpose it is not
necessary to have a deep understanding of this theory. Major contributions to complexity
theory were given by Cook [56], Karp [166], Aho, Hopcroft and Ullman [5], and Garey and
Johnson [92]. For a survey focusing on linear and combinatorial optimization problems we
refer the reader to Schrijver [250]. Complexity theory distinguishes between easy and hard
problems. In this theory a problem consists of a class of problem instances, so ‘the’ LO
problem consists of all possible instances of LO problems; here we restrict ourselves to LO
problems with integral input data.1 A problem is called solvable in polynomial time (or simply
polynomial or easy) if there exists an algorithm that solves each instance of the problem in
a time that is bounded above by a polynomial in the size of the problem instance; otherwise
the problem is considered to be hard. In general the size of an instance is defined as the
length of a binary string encoding the instance. For the problem (SP ) such a string consists
of binary encodings of the entries in the matrix M and the vector q. Note that the binary
encoding of a positive integer a requires a string of length 1 + ⌈log2 (1 + |a|)⌉. (The first 1
serves to encode the sign of the number.) If the entries in M and q are integral, the total
length of the string for encoding (SP ) becomes
Σ_{i=1}^n (1 + ⌈log₂(1 + |q_i|)⌉) + Σ_{i,j=1}^n (1 + ⌈log₂(1 + |M_ij|)⌉)
   = n(n + 1) + Σ_{i=1}^n ⌈log₂(1 + |q_i|)⌉ + Σ_{i,j=1}^n ⌈log₂(1 + |M_ij|)⌉.  (3.4)
Instead we work with the smaller number
L = n(n + 1) + log2 Π,
(3.5)
where Π is the product of all nonzero entries in q and M . Ignoring the integrality operators,
we can show that the expression in (3.4) is less than 2L. In fact, one can easily understand
that the number of operations of an algorithm is polynomial in (3.4) if and only if it is
bounded by a polynomial in L.
•
We consider the number L, as given by (3.5), as the size of (SP ). In fact we use
this number only once. In the next section we present an algorithm that generates a
positive vector z such that z T s(z) ≤ ε, where ε is any positive number, and we derive
a bound for the number of iterations required by the algorithm. Then, in Section 3.3,
we show that this algorithm can be used to find a strictly complementary solution of
(SP ) and we derive an iteration bound that depends on the so-called condition number
of (SP ). Finally, we show that the iteration bound can be bounded from above by a
polynomial in the quantity L, which represents the size of (SP ).
3.2  Finding an ε-solution
After all the theoretical results of the previous sections we are now ready to present
an algorithm that finds a strictly complementary solution of (SP) in polynomial time.
The workhorse of the algorithm is the Newton step that was introduced in Section
2.7.2. It will be convenient to recall its definition and some of its properties.
1
We could easily have included LO problems with rational input data in our considerations, because
if the entries in M and q are rational numbers then after multiplication of these entries with their
smallest common multiple, all entries become integral. Thus, each problem instance with rational
data can easily be transformed to an equivalent problem with integral data.
Given a positive vector z such that s = s(z) > 0, the Newton direction ∆z at z
with respect to µ (or the µ-center z(µ)) is uniquely determined by the linear system
(cf. (2.35) – (2.36))
M∆z − ∆s = 0,  (3.6)
z∆s + s∆z = µe − zs.  (3.7)
Substituting (3.6) into (3.7) we get2
(S + ZM)∆z = µe − zs.
Since the matrix S + ZM is invertible (cf. Exercise 9, page 29), it follows that
∆z = (S + ZM)⁻¹(µe − zs),  (3.8)
∆s = M∆z.  (3.9)
The result of the Newton step is denoted as
z⁺ := z + ∆z;
the new slack vector is then given by
s⁺ := s(z⁺) = M(z + ∆z) + q = s + M∆z.
The vectors ∆z and ∆s are orthogonal, by (2.34). After the Newton step the objective
value has the desired value nµ, by (2.38):
qᵀz⁺ = (s⁺)ᵀz⁺ = nµ.  (3.10)
The variance vector of z with respect to µ is defined by (cf. (2.41))3:
v := √(zs(z)/µ).  (3.11)
This implies
zs(z) = µe  ⇔  v = e.  (3.12)
We use δ(z, µ) as a measure for the proximity of z to z(µ). It is defined by (cf. (2.42))
δ(z, µ) := ½ ‖v − v⁻¹‖.  (3.13)
If z = z(µ) then v = e and hence δ(z, µ) = 0, otherwise δ(z, µ) > 0. If δ(z, µ) < 1
then the Newton step is feasible, and if δ(z, µ) < 1/√2 then the Newton process
converges quadratically fast to z(µ). This is a consequence of the next lemma (cf.
Theorem I.16).
Lemma I.33 If δ := δ(z, µ) < 1, then the Newton step is strictly feasible, i.e., z⁺ > 0
and s⁺ > 0. Moreover,
δ(z⁺, µ) ≤ δ² / √(2(1 − δ²)).
2 Here, as usual, Z = diag(z) and S = diag(s).
3 Exercise 20 If we define d := √(z/s), where s = s(z), then show that the Newton step ∆z satisfies
(I + DMD) ∆z = (z/v)(v⁻¹ − v) = µs⁻¹ − z.
3.2.1  Newton-step algorithm
The idea of the algorithm is quite simple. Starting at z = e, we choose µ < 1 such
that
δ(z, µ) ≤ 1/√2,  (3.14)
and perform a Newton step targeting at z(µ). After the step the new iterate z
satisfies δ(z, µ) ≤ 1/2. Then we decrease µ such that (3.14) holds for the new values
of z and µ, and repeat the procedure. Note that after each Newton step we have
qᵀz = zᵀs(z) = nµ. Thus, if µ approaches zero, then z will approach the set of
optimal solutions. Formally the algorithm can be stated as follows.
Full-Newton step algorithm

Input:
  An accuracy parameter ε > 0;
  a barrier update parameter θ, 0 < θ < 1.

begin
  z := e; µ := 1;
  while nµ ≥ ε do
  begin
    µ := (1 − θ)µ;
    z := z + ∆z;
  end
end
Note that the reduction of the barrier parameter µ is realized by multiplication with
the factor 1 − θ. In the next section we discuss how an appropriate value of the
update parameter θ can be obtained, so that during the course of the algorithm the
iterates are kept within the region where Newton's method is quadratically convergent.
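To make the algorithm concrete, the following compact Python sketch is one possible implementation (ours, not the book's code). It takes the skew-symmetric matrix M as a NumPy array, builds q as in (3.1), and computes the Newton step via (3.8)–(3.9); as in the text, it presumes that the all-one vector is feasible with s(e) = e.

    import numpy as np

    def full_newton_step_algorithm(M, eps=1e-3, theta=None):
        """Full-Newton step algorithm for (SP): min { q^T z : Mz >= -q, z >= 0 }."""
        n = M.shape[0]
        q = np.zeros(n)
        q[-1] = n                              # q := (0_{n-1}; n), see (3.1)
        if theta is None:
            theta = 1.0 / np.sqrt(2 * n)       # the value analyzed below (Lemma I.35)
        z = np.ones(n)                         # start at z = e, where s(e) = e
        mu = 1.0
        while n * mu >= eps:
            mu *= (1.0 - theta)                # barrier update
            s = M @ z + q                      # slack vector s(z), see (3.2)
            # Newton step: solve (S + ZM) dz = mu*e - z*s, cf. (3.8)-(3.9)
            dz = np.linalg.solve(np.diag(s) + np.diag(z) @ M,
                                 mu * np.ones(n) - z * s)
            z = z + dz
        return z

For the 5 × 5 sample problem of Example I.7 (whose matrix is not repeated here) such a sketch should reproduce the behavior reported in Example I.38 below.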
3.2.2  Complexity analysis
At the start of the algorithm we have µ = 1 and z = z(1) = e, whence qᵀz = n and
δ(z, µ) = 0. In each iteration µ is first reduced by the factor 1 − θ and then the
Newton step is made targeting the new µ-center. It will be clear that the reduction
of µ has an effect on the value of the proximity measure. This effect is fully described by
the following lemma.
Lemma I.34 Let z > 0 and µ > 0 be such that s = s(z) > 0 and qᵀz = nµ. Moreover,
let δ := δ(z, µ) and µ′ = (1 − θ)µ. Then
δ(z, µ′)² = (1 − θ)δ² + θ²n / (4(1 − θ)).
Proof: Let δ⁺ := δ(z, µ′) and v = √(zs/µ), as in (3.11). Then, by definition,
4(δ⁺)² = ‖√(1 − θ) v⁻¹ − v/√(1 − θ)‖² = ‖√(1 − θ) (v⁻¹ − v) − θv/√(1 − θ)‖².
From zᵀs = nµ it follows that ‖v‖² = n. This implies
vᵀ(v⁻¹ − v) = n − ‖v‖² = 0.
Hence, v is orthogonal to v⁻¹ − v. Therefore,
4(δ⁺)² = (1 − θ) ‖v⁻¹ − v‖² + θ²‖v‖²/(1 − θ) = (1 − θ) ‖v⁻¹ − v‖² + nθ²/(1 − θ).
Since ‖v⁻¹ − v‖ = 2δ, the result follows. ✷

Lemma I.35 Let θ = 1/√(2n). Then at the start of each iteration we have
qᵀz = nµ  and  δ(z, µ) ≤ 1/2.  (3.15)
Proof: At the start of the first iteration we have µ = 1 and z = e, so qᵀz = n and
δ(z, µ) = 0. Therefore (3.15) certainly holds at the start of the first iteration. Now
suppose that (3.15) holds at the start of some iteration. We show that (3.15) then also
holds at the start of the next iteration. Let δ = δ(z, µ). When the barrier parameter
is updated to µ′ = (1 − θ)µ, Lemma I.34 gives
δ(z, µ′)² = (1 − θ)δ² + θ²n/(4(1 − θ)) ≤ (1 − θ)/4 + 1/(8(1 − θ)) ≤ 3/8.
The last inequality can be understood as follows. Due to n ≥ 2 we have 0 ≤ θ ≤
1/√4 = 1/2. The left hand side expression in the last inequality is a convex function
of θ. Its value at θ = 0 as well as at θ = 1/2 equals 3/8, hence its value does not
exceed 3/8 for θ ∈ [0, 1/2].
Since 3/8 ≤ 1/2, it follows that after the µ-update δ(z, µ′) ≤ 1/√2. Hence, by
Lemma I.33, after performing the Newton step we certainly have δ(z⁺, µ′) ≤ 1/2.
Finally, by (3.10), qᵀz⁺ = nµ′. Thus the lemma has been proved. ✷
How many iterations are needed by the algorithm? The answer is provided by the
following lemma.
Lemma I.36 After at most
(1/θ) log(n/ε)
iterations we have nµ ≤ ε.
Proof: Initially, the objective value is n and in each iteration it is reduced by the
factor 1 − θ. Hence, after k iterations we have µ = (1 − θ)^k. Therefore, the objective
value, given by qᵀz(µ) = nµ, is smaller than, or equal to, ε if
(1 − θ)^k n ≤ ε.
Taking logarithms, this becomes
k log(1 − θ) + log n ≤ log ε.
Since −log(1 − θ) ≥ θ, this certainly holds if
kθ ≥ log n − log ε = log(n/ε).
This implies the lemma. ✷
The above results are summarized in the next theorem, which requires no further
proof.
Theorem I.37 If θ = 1/√(2n) then the algorithm requires at most
⌈√(2n) log(n/ε)⌉
iterations. The output is a feasible z > 0 such that qᵀz = nµ ≤ ε and δ(z, µ) ≤ 1/2.
This theorem shows that we can get an ε-solution z of our self-dual model with ε as
small as desired.4
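The iteration bound is easy to evaluate. A two-line Python check (ours) reproduces the count used in Example I.38 below, namely 27 iterations for n = 5 and ε = 10⁻³:

    import math
    n, eps = 5, 1e-3
    print(math.ceil(math.sqrt(2 * n) * math.log(n / eps)))   # 27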
A crucial question for us is whether the variable κ = zn−1 is positive or zero in
the limit, when µ goes to zero. In practice, for small enough ε it is usually no serious
problem to decide which of the two cases occurs. In theory, however, this means that
we need to know what the optimal partition of the problem is. As we explain in the
next section, the optimal partition can be found in polynomial time. This requires
some further analysis of the central path.
Example I.38 In this example we demonstrate the behavior of the Full-Newton step
algorithm by applying it to the problem (SP ) in Example I.7, as given in (2.19) on
page 23. According to Theorem I.37, with n = 5, the algorithm requires at most
⌈√10 log(5/ε)⌉
iterations. For ε = 10−3 we have log (5/ε) = log 5000 = 8.5172, and we get 27 as an
upper bound for the number of iterations. When running the algorithm with this ε
the actual number of iterations is 22. The actual values of the output of the algorithm
are
z = (1.5999, 0.0002, 0.8000, 0.8000, 0.0002)T
and
s(z) = (0.0001, 0.8000, 0.0002, 0.0002, 1.0000)T .
The left plot in Figure 3.1 shows how the coordinates of the vector z := (z1 , z2 , z3 , z4 =
κ, z5 = ϑ), which contains the variables in the problem, develop in the course of the
algorithm. The right plot does the same for the coordinates of the surplus vector
s(z) := (s1 , s2 , s3 , s4 , s5 ). Observe that z and s(z) converge nicely to the limit point
of the central path of the sample problem as given in Example I.31.
♦
4
It is worth pointing out that if we put ε = nµ in the iteration bound of Theorem I.37 we get
exactly the same bound as given by (2.45).
[Figure 3.1   Output Full-Newton step algorithm for the problem in Example I.7. Left panel: the coordinates of z (z1, z2, z3, κ, ϑ); right panel: the coordinates of s(z) (s1, . . . , s5); both plotted against the iteration number.]

3.3  Polynomial complexity result

3.3.1  Introduction
Having a strictly complementary solution z of (SP ), we also know the optimal partition
(B, N ) of (SP ), as defined in Section 2.6. For if z is a strictly complementary solution
of (SP ) then we have zs(z) = 0 and z + s(z) > 0, and the optimal partition follows
from5
B = {i : zi > 0}
N = {i : si (z) > 0} .
Definition I.39 The restriction of a vector z ∈ IRn to the coordinates in a subset I
of the full index set {1, 2, . . . , n} is denoted by zI .
Hence if z̃ is a strictly complementary solution of (SP ) then
z̃B > 0,
z̃N = 0,
sB (z̃) = 0,
sN (z̃) > 0.
Now let z be any feasible solution of (SP). Then, by Lemma I.10, with z1 = z, z2 = z̃,
we obtain that z is optimal if and only if zs(z̃) = z̃s(z) = 0. This gives
z is optimal for (SP)  ⇐⇒  z_N = 0 and s_B(z) = 0.
5 It may be sensible to point out that if, conversely, the optimal partition is known, then it is not
obvious at all how to find a strictly complementary solution of (SP).
As a consequence, the set SP ∗ of optimal solutions of (SP ) is completely determined
by the optimal partition (B, N ) of (SP ). We thus may write
SP ∗ = {z ∈ SP : zN = 0, sB (z) = 0} ,
where SP denotes the feasible region of (SP ).
Assuming that M and q are integral we show in this section that a strictly
complementary solution of (SP ) can be found in polynomial time. We divide the
work into a few steps. First we apply the Full-Newton step algorithm with a suitable
(small enough) value of the accuracy parameter ε. This yields a positive solution z
of (SP ) such that s(z) is positive as well and such that the pair (z, s(z)) is almost
strictly complementary in the sense that for each index i one of the positive coordinates
in the pair (zi , si (z)) is large and the other is small. To distinguish between ‘large’
and ‘small’ coordinates we introduce the so-called condition number of (SP ). We
are able to specify ε such that the large coordinates of z are in B and the small
coordinates of z in N . The optimal partition of (SP ) can thus be derived from
the almost strictly complementary solution z provided by the algorithm. Then, in
Section 3.3.6, a rounding procedure is described that yields a strictly complementary
solution of (SP ) in polynomial time.
3.3.2  Condition number
Below, (B, N ) always denotes the optimal partition of (SP ), and SP ∗ the optimal set
of (SP ). We first introduce the following two numbers:
σ^z_SP := min_{i∈B} max_{z∈SP∗} {z_i},      σ^s_SP := min_{i∈N} max_{z∈SP∗} {s_i(z)}.
By convention we take σ^z_SP = ∞ if B is empty and σ^s_SP = ∞ if N is empty. Since the
optimal set SP∗ is bounded, σ^z_SP is finite if B is nonempty and σ^s_SP is finite if N is
nonempty. Due to the definition of the sets B and N both numbers are positive, and
since B and N cannot both be empty at least one of the two numbers is finite. As a
consequence, the number
σ_SP := min {σ^z_SP, σ^s_SP}
is positive and finite. This number plays a crucial role in the further analysis and is
called the condition number of (SP).6 Using that z and s(z) are complementary if
z ∈ SP∗ we can easily verify that σ_SP can also be written as
σ_SP := min_{1≤i≤n} max_{z∈SP∗} {z_i + s_i(z)}.
Example I.40 Let us calculate the condition number of our sample problem (2.19)
in Example I.7. We found in Example I.12 that any optimal solution z has the form
as given by (2.26), namely
z = (2κ, 0, κ, κ, 0)ᵀ,   s(z) = (0, κ, 0, 0, 5 − 5κ)ᵀ,   0 ≤ κ ≤ 1.
Hence we have for any optimal z,
z + s(z) = (2κ, κ, κ, κ, 5 − 5κ)ᵀ,   0 ≤ κ ≤ 1.

6 This condition number seems to be a natural quantity for measuring the degree of hardness of
(SP). The smaller the number the more difficult it is to find a strictly complementary solution.
In a more general context, it was introduced by Ye [311]. See also Ye and Pardalos [314]. For a
discussion of other condition numbers and their relation to the size of a problem we refer the reader
to Vavasis and Ye [280].
To find the condition number we need to find the maximum value of each of the
variables in this vector. These values are 2, 1, 1, 1 (for κ = 1) and 5 (for κ = 0),
respectively. The minimum of these maximal values being 1, the condition number of
our sample problem (2.19) turns out to be 1.7,8,9,10
♦
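Numerically, the same value can be confirmed by scanning κ over [0, 1]; a short Python check (ours, not from the text):

    import numpy as np

    kappa = np.linspace(0.0, 1.0, 1001)
    # z + s(z) = (2*kappa, kappa, kappa, kappa, 5 - 5*kappa) over the optimal set
    vals = np.vstack([2 * kappa, kappa, kappa, kappa, 5 - 5 * kappa])
    print(vals.max(axis=1).min())   # 1.0, the condition number found above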
In the above example we were able to calculate the condition number of a given
problem. We see below that when we know the condition number of a problem we
can profit from it in the solution procedure. In general, however, the calculation of
the condition number is at least as hard as solving the problem. Hence, in general, we
have to solve a problem without knowing its condition number. In such cases there is
a cheap way to get a lower bound for the condition number. We proceed by deriving
such a lower bound for σSP in terms of the data of the problem (SP ). We introduce
some more notation.
Definition I.41 The submatrix of M consisting of the elements in the rows whose
indices are in I and the columns whose indices are in J is denoted by MIJ .
Using this convention, we have for any vector z and its surplus vector s = s(z):
(s_B; s_N) = [ M_BB  M_BN ; M_NB  M_NN ] (z_B; z_N) + (q_B; q_N).  (3.16)
Recall from the previous section that the vector z is optimal if and only if z and s are
nonnegative, z_N = 0 and s_B = 0. Hence we have qᵀz = q_Bᵀ z_B. Due to the existence of
7
Exercise 21 Using the results of Exercise 4 (page 27), prove that the condition number of the
self-dual problem in question equals 5/4.
8
Exercise 22 Using the results of Exercise 5 (page 27), prove that the condition number of the
self-dual problem in question equals 5/4.
9
Exercise 23 Using the results of Exercise 6 (page 27), prove that the condition number of the
self-dual problem in question equals 5/(1 + β) if β ≥ 2 and otherwise 5β/(2(1 + β)).
10
Exercise 24 Using the results of Exercise 7 (page 27), prove that the condition number of the
self-dual problem in question equals 5/(4 − β) if β ≤ −1 and otherwise −5β/(4 − β).
a strictly complementary solution z, for which zB is positive, we conclude that
qB = 0.
(3.17)
Thus it becomes clear that a vector z and its surplus vector s are optimal for (SP) if
and only if z_B ≥ 0, z_N = 0, s_B = 0, s_N ≥ 0 and
(0; s_N) = [ M_BB  M_BN ; M_NB  M_NN ] (z_B; 0) + (0; q_N).
This is equivalent to
[ M_BB  0_BN ; M_NB  −I_NN ] (z_B; s_N) = (0_B; −q_N),   z_B ≥ 0, z_N = 0, s_B = 0, s_N ≥ 0.  (3.18)
Note that any strictly complementary solution z gives rise to a positive solution of
this system. For the calculation of σSP we need to know the maximal value of each
coordinate of the vector (zB , sN ) when this vector runs through all possible solutions
of (3.18). Then σSP is just the smallest of all these maximal values.
At this stage we may apply Lemma C.1 in Appendix C to (3.18), which gives us an
easy to compute lower bound for σSP .
Theorem I.42 The condition number σ_SP of (SP) satisfies
σ_SP ≥ 1 / ∏_{j=1}^n ‖M_j‖,
where M_j denotes the j-th column of M.
Proof: Recall that the optimal set of (SP ) is determined by the equation (3.18). Also,
by Lemma I.9 we have eT z + eT s(z) = n, showing that the optimal set is bounded.
As we just established, the system (3.18) has a positive solution, and hence we may
apply Lemma C.1 to (3.18) with
A = [ M_BB  0_BN ; M_NB  −I_NN ].
The columns in A made up by the two left blocks are the columns M_j of M with
j ∈ B, whereas the columns made up by the two right blocks are unit vectors. Thus
we obtain that the maximal value of each coordinate of the vector (z_B, s_N) is bounded
below by the quantity
1 / ∏_{j∈B} ‖M_j‖.
With the definition of σ_SP this implies
σ_SP ≥ 1 / ∏_{j∈B} ‖M_j‖ ≥ 1 / ∏_{j=1}^n ‖M_j‖.
The last inequality is an obvious consequence of the assumption that all columns in
M are nonzero and integral. Hence the theorem has been proved.
✷
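The bound of Theorem I.42 is trivial to evaluate from the data. A minimal Python sketch (ours):

    import numpy as np

    def sigma_lower_bound(M):
        """Lower bound on the condition number of (SP) from Theorem I.42."""
        column_norms = np.linalg.norm(M, axis=0)   # Euclidean norm of each column M_j
        return 1.0 / np.prod(column_norms)

For all but very small integral data this quantity is of course extremely small; its role in the text is to turn the iteration bound of Theorem I.47 below into a bound that is polynomial in the size L of (SP).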
3.3.3  Large and small variables
It will be convenient to call the coordinates of z(µ) that are indexed by B the
large coordinates of z(µ), and the other coordinates the small coordinates of z(µ).
Furthermore, the coordinates of s_N(µ) are called the large coordinates of s(µ), and
the coordinates of s_B(µ) the small coordinates of s(µ). The next lemma provides lower
bounds for the large coordinates and upper bounds for the small coordinates of z(µ)
and s(µ). This lemma implies that the large coordinates of z(µ) and s(µ) are bounded
away from zero along the central path, and there exists a uniform lower bound that is
independent of µ. Moreover, the small coordinates are bounded above by a constant
times µ, where the constant depends only on the data in the problem (SP ). In other
words, the order of magnitude of the small coordinates is O(µ). The bounds in the
lemma use the condition number σSP of (SP ).
Lemma I.43 For any positive µ we have
z_i(µ) ≥ σ_SP/n,  i ∈ B,        z_i(µ) ≤ nµ/σ_SP,  i ∈ N,
s_i(µ) ≤ nµ/σ_SP,  i ∈ B,       s_i(µ) ≥ σ_SP/n,  i ∈ N.
Proof: First let i ∈ N and let z̃ be an optimal solution such that s̃i := si (z̃) is
maximal. Then the definition of the condition number σSP implies that s̃i ≥ σSP .
Applying the orthogonality property (2.22) to the points z̃ and z(µ) we obtain
(z(µ) − z̃)T (s(µ) − s̃) = 0,
which gives
z(µ)T s̃ + s(µ)T z̃ = nµ.
This implies
zi (µ)s̃i ≤ z(µ)T s̃ ≤ nµ.
Dividing by s̃_i and using that s̃_i ≥ σ_SP we obtain
z_i(µ) ≤ nµ/s̃_i ≤ nµ/σ_SP.
Since z_i(µ)s_i(µ) = µ, it also follows that
s_i(µ) ≥ σ_SP/n.
This proves the second and fourth inequality in the lemma. The other inequalities are
obtained in the same way. Let i ∈ B and let z̃ be an optimal solution such that z̃i
is maximal. Then the definition of the condition number σSP implies that z̃i ≥ σSP .
Applying the orthogonality property to the points z̃ and z(µ) we obtain in the same
way as before
si (µ)z̃i ≤ s(µ)T z̃ ≤ nµ.
From this we deduce that
s_i(µ) ≤ nµ/z̃_i ≤ nµ/σ_SP.
Using once more z(µ)s(µ) = µe we find
z_i(µ) ≥ σ_SP/n,
completing the proof of the lemma.11,12 ✷
We collect the results of the above lemma in Table 3.1..

              i ∈ B            i ∈ N
z_i(µ)        ≥ σ_SP/n         ≤ nµ/σ_SP
s_i(µ)        ≤ nµ/σ_SP        ≥ σ_SP/n

Table 3.1.   Estimates for large and small variables on the central path.
The lemma has an important consequence. If µ is so small that
nµ/σ_SP < σ_SP/n,
then we have a complete separation of the small and the large variables. This means
that if a point z(µ) on the central path is given so that
µ < σ_SP²/n²,
then we can determine the optimal partition (B, N) of (SP).
In the next section we show that the Full-Newton step algorithm can produce a
point z in the neighborhood of the central path with this feature, namely that it gives
a complete separation of the small and the large variables.
3.3.4  Finding the optimal partition
Theorem I.37 states that after at most
⌈√(2n) log(n/ε)⌉  (3.19)
iterations the Full-Newton step algorithm yields a feasible solution z such that
qᵀz = nµ ≤ ε and δ(z, µ) ≤ 1/2. We show in this section that if µ is small enough
we can recognize the optimal partition (B, N) from z, and such z can be found in
a number of iterations that depends only on the dimension n and on the condition
number σ_SP of (SP).

11 Exercise 25 Let 0 < µ < µ̄. Using the orthogonality property (2.22), show that for each i
(1 ≤ i ≤ n),
z_i(µ)/z_i(µ̄) + s_i(µ)/s_i(µ̄) ≤ 2n.
12 The result in Exercise 25 can be improved to
z_i(µ)/z_i(µ̄) + s_i(µ)/s_i(µ̄) ≤ n,
which implies
z_i(µ) ≤ n z_i(µ̄),   s_i(µ) ≤ n s_i(µ̄).
For a proof we refer the reader to Vavasis and Ye [281].
We need a simple measure for the distance of z to the central path. To this end, for
each positive feasible vector z with s(z) > 0, we define the number δ_c(z) as follows:
δ_c(z) := max(zs(z)) / min(zs(z)).  (3.20)
Observe that δ_c(z) = 1 if and only if zs(z) is a multiple of the all-one vector e. This
occurs precisely if z lies on the central path. Otherwise we have δ_c(z) > 1. We consider
δ_c(z) as an indicator for the 'distance' of z to the central path.13
Lemma I.44 If δ(z, µ) ≤ 1/2 then δ_c(z) ≤ 4.
Proof: Using the variance vector v of z, with respect to the given µ > 0, we may
write
δ_c(z) = max(µv²)/min(µv²) = max(v²)/min(v²).
Using (3.13), it follows from δ(z, µ) ≤ 1/2 that
‖v − v⁻¹‖ ≤ 1.
Without loss of generality we assume that the coordinates of v are ordered such that
v₁ ≥ v₂ ≥ . . . ≥ vₙ.
Then δ_c(z) = v₁²/vₙ². Now consider the problem
max { v₁²/vₙ² : ‖v − v⁻¹‖ ≤ 1 }.
The optimal value of this problem is an upper bound for δ_c(z). One may easily verify
that the optimal solution has v_i = 1 if 1 < i < n, v₁ = √2 and vₙ = 1/√2. Hence the
optimal value is 4.14 This proves the lemma. ✷
13 In the analysis of interior-point methods we always need to introduce a quantity that measures the
'distance' of a feasible vector x to the central path. This can be done in many ways as becomes
apparent in the course of this book. In the coming chapters we make use of a variety of so-called
proximity measures. All these measures are based on the simple observation that x is on the central
path if and only if the vector xs(x) is a scalar multiple of the all-one vector.
14 Exercise 26 Prove that
max { v₁²/vₙ² : Σ_{i=1}^n (v_i − 1/v_i)² ≤ 1 } = 4.
Lemma I.45 Let z be a feasible solution of (SP) such that δ_c(z) ≤ τ. Then, with
s = s(z), we have
z_i ≥ σ_SP/(τn),  i ∈ B,        z_i ≤ zᵀs/σ_SP,  i ∈ N,
s_i ≤ zᵀs/σ_SP,  i ∈ B,         s_i ≥ σ_SP/(τn),  i ∈ N.
Proof: The proof is basically the same as the proof of Lemma I.43. It is a little more
complicated because the estimates now concern a point off the central path. From
δc (z) ≤ τ we conclude that there exist positive numbers τ1 and τ2 such that τ τ1 = τ2
and
τ1 ≤ zi si ≤ τ2 , 1 ≤ i ≤ n.
(3.21)
When we realize that these inequalities replace the role of the identity zi (µ)si (µ) = µ
in the proof of Lemma I.43 the generalization becomes almost straightforward. First
suppose that i ∈ N and let z̃ be an optimal solution such that s̃i := si (z̃) is
maximal. Then, from the definition of σ_SP, it follows that s̃_i ≥ σ_SP. Applying
the orthogonality property (2.22) to the points z̃ and z, we obtain in the same way as
before
zi s̃i ≤ z T s̃ ≤ z T s.
Hence, dividing both sides by s̃_i and using that s̃_i ≥ σ_SP we get
z_i ≤ zᵀs/σ_SP.
From the left inequality in (3.21) we also have z_i s_i ≥ τ₁. Hence we must have
s_i ≥ τ₁σ_SP/(zᵀs).
The right inequality in (3.21) gives zᵀs ≤ nτ₂. Thus
s_i ≥ τ₁σ_SP/(nτ₂) = σ_SP/(nτ).
This proves the second and fourth inequality in the lemma. The other inequalities
are obtained in the same way. If i ∈ B and z̃ is an optimal solution such that z̃i is
maximal, then z̃i ≥ σSP . Applying the orthogonality property (2.22) to the points z̃
and z we obtain
si z̃i ≤ sT z̃ ≤ z T s.
Thus we get
s_i ≤ zᵀs/z̃_i ≤ zᵀs/σ_SP.
Using once more that z_i s_i ≥ τ₁ and zᵀs ≤ nτ₂ we obtain
z_i ≥ τ₁σ_SP/(zᵀs) ≥ τ₁σ_SP/(nτ₂) = σ_SP/(nτ),
completing the proof of the lemma.
✷
              i ∈ B            i ∈ N
z_i           ≥ σ_SP/(τn)      ≤ zᵀs/σ_SP
s_i(z)        ≤ zᵀs/σ_SP       ≥ σ_SP/(τn)

Table 3.2.   Estimates for large and small variables if δ_c(z) ≤ τ.
The results of the above lemma are shown in Table 3.2.. We conclude that if zᵀs
is so small that
zᵀs/σ_SP < σ_SP/(τn),
then we have a complete separation of the small and the large variables. Thus we may
state without further proof the following result.
state without further proof the following result.
Lemma I.46 Let z be a feasible solution of (SP) such that δ_c(z) ≤ τ. If
z^T s(z) < σ_SP^2/(τn)
then the optimal partition of (SP) follows from
B = {i : z_i > s_i(z)}   and   N = {i : z_i < s_i(z)}.    (3.22)
This lemma is the basis of our next result.
Theorem I.47 After at most
⌈√(2n) log(4n^2/σ_SP^2)⌉    (3.23)
iterations, the Full-Newton step algorithm yields a feasible (and positive) solution z of (SP) that reveals the optimal partition (B, N) of (SP) according to (3.22).
Proof: Let us run the Full-Newton step algorithm with ε = σ_SP^2/(4n). Then Theorem I.37 states that we obtain a feasible z with z^T s(z) ≤ σ_SP^2/(4n) and δ(z, µ) ≤ 1/2. Lemma I.44 implies that δ_c(z) ≤ 4. By Lemma I.46, with τ = 4, this z gives a complete separation between the small variables and the large variables. By Theorem I.37, the required number of iterations for the given ε is at most
√(2n) log(4n^2/σ_SP^2),
which is equal to the bound given in the theorem. Thus the proof is complete.
✷
Example I.48 Let us apply Theorem I.47 to the self-dual problem (2.19) in Example I.7. Then n = 5 and, according to Example I.40 (page 54), σ_SP = 1. Thus the iteration bound (3.23) in Theorem I.47 becomes
⌈√10 log(100)⌉ = ⌈14.5628⌉ = 15.
With the help of Figure 3.1 (page 53) we can now determine the optimal partition and we find
B = {1, 3, 5},   N = {2, 4},
in agreement with the result of Example I.12.    ♦
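The bound (3.23) is easy to evaluate numerically. The following small Python sketch (ours, not the book's; it assumes natural logarithms, as in the example above) reproduces the number 15 obtained in Example I.48.

import math

def partition_iteration_bound(n, sigma_sp):
    # Iteration bound (3.23): ceil( sqrt(2n) * log(4 n^2 / sigma_SP^2) )
    return math.ceil(math.sqrt(2 * n) * math.log(4 * n**2 / sigma_sp**2))

print(partition_iteration_bound(5, 1.0))   # Example I.48: ceil(sqrt(10) log 100) = 15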
3.3.5 A rounding procedure for interior-point solutions
We have just established that the optimal partition of (SP ) can be found after a
finite number of iterations of the Full-Newton step algorithm. The required number
of iterations is at most equal to the number given by (3.23). After this number of
iterations the small variables and the large variables are well enough separated from
each other to reveal the classes B and N that constitute the optimal partition.
The aim of this section and the next section is to show that if B has been fixed then
a strictly complementary solution of (SP ) can be obtained with little extra effort.15
First we establish that the class B is not empty.
Lemma I.49 The class B in the optimal partition of (SP ) is not empty.
Proof: If B is the empty set then z = 0 is the only optimal solution. Since, by
Theorem I.20, this solution must be strictly complementary we must have s(z) > 0.
Since s(z) = M z + q = q, we find q > 0. This contradicts that q has zero entries, by
(3.1). This proves the lemma.
✷
Assuming that the optimal partition (B, N ) has been determined, with B nonempty,
we describe a rounding procedure that can be applied to any positive vector z with
positive surplus vector s(z) to yield a vector z̄ such that z̄ and its surplus vector
s̄ = s(z̄) are complementary (in the sense that z̄N = s̄B = 0) but not necessarily
nonnegative. In the next section we run the algorithm an additional number of
iterations to get a sharper separation between the small and the large variables and
we show that the rounding procedure yields a strictly complementary solution in
polynomial time.
Let us have a positive vector z with positive surplus vector s(z). Recall from (3.16), page 55, that
(s_B; s_N) = [ M_BB, M_BN;  M_NB, M_NN ] (z_B; z_N) + (q_B; q_N).
15 It is generally believed that interior-point methods for LO never generate an exact optimal solution in polynomial time (Andersen and Ye [11]). In fact, Ye [308] showed in 1992 that a strictly complementary solution can be found in polynomial time by all the known O(n^3 L) interior-point methods. See also Mehrotra and Ye [208]. The rounding procedure described in this chapter is essentially the same as the one presented in these two papers and leads to finite termination of the algorithm.
This implies that
s_B = M_BB z_B + M_BN z_N + q_B.
Since q_B = 0, by (3.17), ξ = z_B satisfies the system of equations in the unknown vector ξ given by
M_BB ξ = s_B − M_BN z_N.    (3.24)
Note that zB is a ‘large’ solution of (3.24), because the entries of zB are large variables.
On the other hand we can easily see that (3.24) must have more solutions. This follows
from the existence of a strictly complementary solution of (SP ), because for any such
solution z̃ we derive from z̃N = 0 and sB (z̃) = 0 that MBB z̃B = 0. Since z̃B > 0, it
follows that the columns of MBB are linearly dependent, and hence (3.24) has multiple
solutions.
Now let ξ be any solution of (3.24) and consider the vector z̄ defined by
z̄B = zB − ξ,
z̄N = 0.
For the surplus vector s̄ = s(z̄) of z̄ we have
s̄B = MBB z̄B + MBN z̄N = MBB z̄B = MBB (zB − ξ) = 0.
So we have z̄N = s̄B = 0, which means that the vectors z̄ and s̄ are complementary. It
will be clear, however, that the vectors z̄ and s̄ are not necessarily nonnegative. This
holds only if
z̄B = zB − ξ ≥ 0,
and
s̄N = MN B z̄B + MN N z̄N + qN = MN B (zB − ξ) + qN = sN − MN N zN − MN B ξ ≥ 0.
We conclude that if (3.24) admits a solution ξ that satisfies the last two inequalities
then z̄ is a solution of (SP ). Moreover, if ξ satisfies these inequalities strictly, so that
zB − ξ > 0,
sN − MN N zN − MN B ξ > 0,
(3.25)
then z̄ is a strictly complementary solution of (SP ). In the next section we show that
solving (3.24) by Gaussian elimination gives such a solution, provided the separation
between the small and the large variables is sharp enough.
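The rounding step just described takes only a few lines of linear algebra. The sketch below is ours, not the book's: it assumes NumPy arrays M, q, z and index lists B, N, and it uses a least-norm solve of (3.24) rather than the Gaussian elimination analyzed in the next section.

import numpy as np

def round_to_complementary(M, q, z, B, N):
    # Solve (3.24): M_BB xi = s_B - M_BN z_N, then set zbar_B = z_B - xi, zbar_N = 0.
    s = M @ z + q
    rhs = s[B] - M[np.ix_(B, N)] @ z[N]
    xi, *_ = np.linalg.lstsq(M[np.ix_(B, B)], rhs, rcond=None)   # a (least-norm) solution
    zbar = np.zeros_like(z, dtype=float)
    zbar[B] = z[B] - xi
    sbar = M @ zbar + q          # by construction sbar_B = 0 and zbar_N = 0
    return zbar, sbar            # strictly complementary when zbar_B > 0 and sbar_N > 0, cf. (3.25)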
Example I.50 In this example we show that the Full-Newton step algorithm equipped with the above described rounding procedure solves the sample problem (2.19) in Example I.7 in one iteration. Recall from Example I.14 that the Newton step in the first iteration is given by (2.39) and (2.40). Since in this iteration µ = 1 − θ, substituting θ = 1/√10, we find
Δz = −(1/√10) (−1/3, 8/9, 4/9, 1/9, 1)^T = −(−0.1054, 0.2811, 0.1405, 0.0351, 0.3162)^T,
and
Δs = −(1/√10) (4/3, 1/9, 5/9, 8/9, 0)^T = −(0.4216, 0.0351, 0.1757, 0.2811, 0.0000)^T.
Hence, after one iteration the new iterate is given by
z = (1.1054, 0.7189, 0.8595, 0.9649, 0.6838)^T,
and
s = (0.5784, 0.9649, 0.8243, 0.7189, 1.0000)^T.
Let us compute the sets B and N , as defined by (3.22). This gives
B = {1, 3, 4} ,
N = {2, 5} .
It is worth mentioning that these are already the classes of the optimal partition of
the problem. This becomes clear by applying the rounding procedure at z with respect
to the partition (B, N). The matrix M_BB is given by
M_BB = [ 0, 1, −1;  −1, 0, 2;  1, −2, 0 ].
We have
M_BB z_B = [ 0, 1, −1;  −1, 0, 2;  1, −2, 0 ] (1.1054, 0.8595, 0.9649)^T = (−0.1054, 0.8243, −0.6135)^T.
So we need to find a 'small' solution ζ of the system
M_BB ζ = (−0.1054, 0.8243, −0.6135)^T.
A solution of this system is
ζ = (0.0000, 0.3067, 0.4122)^T.
The rounded solution is now defined by
z̄_B = z_B − ζ = (1.1054, 0.8595, 0.9649)^T − (0.0000, 0.3067, 0.4122)^T = (1.1054, 0.5527, 0.5527)^T,
z̄_N = 0.
Hence the rounded solution is
z = (1.1054, 0.0000, 0.5527, 0.5527, 0.0000)T .
The corresponding slack vector is
s(z) = M z + q = (0.0000, 0.5527, 0.0000, 0.0000, 2.2365)T .
Since z and s(z) are nonnegative and complementary, z is optimal. Moreover, z+s(z) >
0, so z is a strictly complementary solution. Hence we have solved the sample problem
in one iteration.
♦
Remark I.51 In the above example we used for ξ the least norm solution of (3.24). This is the solution of the minimization problem
min { ‖ξ‖ : M_BB ξ = M_BB z_B }.
Formally the least norm solution can be described as
ξ = M_BB^+ M_BB z_B,
where M_BB^+ denotes the generalized inverse (cf. Appendix B) of M_BB. We may then write
z_B − ξ = (I_BB − M_BB^+ M_BB) z_B,
where I_BB is the identity matrix of appropriate size.
There are different ways to obtain a suitable vector ξ. Note that our aim is to obtain a ξ such that z_B − ξ is positive. This is equivalent to e_B − z_B^{−1} ξ > 0, which certainly holds if ‖z_B^{−1} ξ‖ < 1. An alternative approach might therefore be to use the solution of
min { ‖z_B^{−1} ξ‖ : M_BB ξ = M_BB z_B },
which gives
ξ = Z_B (M_BB Z_B)^+ M_BB z_B,
as easily may be verified.    •
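Both formulas in Remark I.51 are one-liners with a pseudoinverse routine. The sketch below (ours, using NumPy and the data of Example I.50) computes the two candidate vectors ξ; in both cases z_B − ξ stays positive, which is all the rounding procedure needs, although the resulting ξ need not coincide with the particular ζ displayed in the example.

import numpy as np

M_BB = np.array([[0., 1., -1.],
                 [-1., 0., 2.],
                 [1., -2., 0.]])
z_B = np.array([1.1054, 0.8595, 0.9649])
rhs = M_BB @ z_B

xi_least_norm = np.linalg.pinv(M_BB) @ rhs            # xi = M_BB^+ M_BB z_B
Z_B = np.diag(z_B)
xi_scaled = Z_B @ np.linalg.pinv(M_BB @ Z_B) @ rhs    # xi = Z_B (M_BB Z_B)^+ M_BB z_B

print(z_B - xi_least_norm)   # componentwise positive
print(z_B - xi_scaled)       # componentwise positive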
Of course, we were lucky in the above example in two ways: the first iterate already
determined the optimal partition and, moreover, at this iterate the rounding procedure
yielded a strictly complementary solution. In general more iterations will be necessary
to find the optimal partition and once the optimal partition has been found the
rounding procedure may not yield a strictly complementary solution at once. But,
as we see in the next section, after sufficiently many iterations we can always find an
exact solution of any problem in this way, and the required number of iterations can
be bounded by a (linear) polynomial of the size of the problem.
3.3.6 Finding a strictly complementary solution
In this section we assume that the optimal partition (B, N ) of (SP ) is known. In the
previous section we argued that it may be assumed without loss of generality that
the set B is not empty. In this section we show that when we run the algorithm an
additional number of iterations, the rounding procedure of the previous section can be
used to construct a strictly complementary solution of (SP ). The additional number
of iterations depends on the size of B and is aimed at creating a sufficiently large
distance between the small and the large variables.
We need some more notation. First, ω will denote the infinity norm of M:
ω := ‖M‖_∞ = max_{1≤i≤n} Σ_{j=1}^n |M_ij|.
Second, B* denotes the subset of B for which the columns in M_BB are nonzero, and third, the number π_B is defined by
π_B := 1 if B* = ∅,   and   π_B := Π_{j∈B*} ‖(M_BB)_j‖ otherwise.
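For concreteness, here is a small NumPy sketch (our own helper, not from the book) that evaluates ω and π_B for a given matrix M and index list B; it is handy when checking the hypothesis of the next lemma on small examples.

import numpy as np

def omega_and_pi_B(M, B):
    omega = np.abs(M).sum(axis=1).max()            # infinity norm of M
    M_BB = M[np.ix_(B, B)]
    col_norms = np.linalg.norm(M_BB, axis=0)       # Euclidean norms of the columns of M_BB
    nonzero = col_norms > 0                        # the columns indexed by B*
    pi_B = float(np.prod(col_norms[nonzero])) if nonzero.any() else 1.0
    return omega, pi_B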
Lemma I.52 Let z be a feasible solution of (SP) such that δ_c(z) ≤ τ = 4. If
z^T s(z) ≤ σ_SP^2 / (4n(1 + ω)^2 π_B √|B|),
with ω and π_B as defined above, then a strictly complementary solution can be found in O(|B*|^3) arithmetical operations.
Proof: Suppose that z is a positive solution of (SP) with positive surplus vector s = s(z) such that δ_c(z) ≤ 4 and z^T s ≤ ε, where
ε := σ_SP^2 / (4n(1 + ω)^2 π_B √|B|).    (3.26)
Recall that the quantities |B|, ω and π_B are all at least 1 and also, by Lemma I.45, that the small variables in z and s are less than ε/σ_SP and the large variables are at least σ_SP/(4n).
We now show that the system (3.24) has a solution ξ whose coordinates are small
enough, so that
zB − ξ > 0, sN − MN N zN − MN B ξ > 0.
(3.27)
We need to distinguish between the cases where MBB is zero and nonzero respectively.
We first consider the case where MBB = 0. Then ξ = 0 satisfies (3.24) and for this
ξ the condition (3.27) for the rounded solution z̄ to be strictly complementary reduces
to the single inequality
sN − MN N zN > 0.
(3.28)
This inequality is satisfied if M_NN = 0. Otherwise, if M_NN ≠ 0, since z_N is small we may write
‖M_NN z_N‖_∞ ≤ ‖M_NN‖_∞ ‖z_N‖_∞ ≤ ‖M‖_∞ ε/σ_SP = εω/σ_SP.
Hence, since s_N is large, (3.28) certainly holds if
εω/σ_SP < σ_SP/(4n),
which is equivalent to
ε < σ_SP^2/(4nω).
Since this inequality is implied by the hypothesis of the lemma, we conclude that the rounding procedure yields a strictly complementary solution if M_BB = 0.
Now consider the case where M_BB ≠ 0. Then we solve (3.24) by Gaussian elimination. This goes as follows. Let B1 and B2 be two subsets of B such that M_{B1 B2} is a nonsingular square submatrix of M_BB with maximal rank, and let ζ be the unique solution of the equation
M_{B1 B2} ζ = s_{B1} − M_{B1 N} z_N.
From Cramer's rule we know that the i-th entry of ζ, with i ∈ B2, is given by
ζ_i = det M_{B1 B2}^{(i)} / det M_{B1 B2},
where M_{B1 B2}^{(i)} is the matrix arising by replacing the i-th column in M_{B1 B2} by the vector s_{B1} − M_{B1 N} z_N. Since the entries of M_{B1 B2} are integral and this matrix is nonsingular, the absolute value of its determinant is at least 1. As a consequence we have
|ζ_i| ≤ |det M_{B1 B2}^{(i)}|.
The right-hand side is no larger than the product of the norms of the columns in the matrix M_{B1 B2}^{(i)}, due to the inequality of Hadamard (cf. Section 1.7.3). Thus
|ζ_i| ≤ ‖s_B − M_BN z_N‖ Π_{j∈B2\{i}} ‖(M_{B1 B2})_j‖ ≤ ‖s_B − M_BN z_N‖ π_B.    (3.29)
The last inequality follows because the norm of each nonzero column in M_BB is at least 1, and π_B is the product of these norms.
Since s_B and z_N are small variables we have
‖s_B‖_∞ ≤ ε/σ_SP
and
‖M_BN z_N‖_∞ ≤ ‖M_BN‖_∞ ‖z_N‖_∞ ≤ ‖M‖_∞ ‖z_N‖_∞ ≤ εω/σ_SP.
Therefore
‖s_B − M_BN z_N‖ ≤ √|B| ‖s_B − M_BN z_N‖_∞ ≤ √|B| ε(1 + ω)/σ_SP.
Substituting this inequality in (3.29), we obtain
|ζ_i| ≤ ε(1 + ω)π_B √|B| / σ_SP.
Defining ξ by
ξ_{B2} = ζ,   ξ_i = 0,  i ∈ B \ B2,
the vector ξ satisfies (3.24), because M_{B1 B2} is a nonsingular square submatrix of M_BB with maximal rank and because s_B − M_BN z_N (= M_BB z_B) belongs to the column space of M_BB. Hence we have shown that Gaussian elimination yields a solution ξ of (3.24) such that
‖ξ‖_∞ ≤ ε(1 + ω)π_B √|B| / σ_SP.    (3.30)
Applying the rounding procedure of the previous section to z, using ξ, we obtain
the vector z̄ defined by
z̄B = zB − ξ, z̄N = 0,
and the surplus vector s̄ = s(z̄) satisfies s̄_B = 0. So z̄ is complementary. We proceed by showing that z̄ is a strictly complementary solution of (SP) by proving that ξ satisfies the condition (3.25), namely
z̄_B = z_B − ξ > 0,   s̄_N = s_N − M_NN z_N − M_NB ξ > 0.
We first establish that z̄_B is positive. This is now easy. The coordinates of z_B are large and the nonzero coordinates of ξ are bounded above by the right-hand side in (3.30). Therefore, z̄_B will be positive if
ε(1 + ω)π_B √|B| / σ_SP < σ_SP/(4n),
or, equivalently,
ε < σ_SP^2 / (4n(1 + ω)π_B √|B|),
and this is guaranteed by the hypothesis in the lemma.
We proceed by estimating the coordinates of s̄_N. First we write
‖M_NN z_N + M_NB ξ‖_∞ = ‖(M_NN  M_NB)(z_N; ξ)‖_∞ ≤ ‖M‖_∞ ‖(z_N; ξ)‖_∞.
Using (3.30) and the fact that z_N is small we obtain
‖M_NN z_N + M_NB ξ‖_∞ ≤ ω max( ε/σ_SP, ε(1 + ω)π_B √|B| / σ_SP ) = εω(1 + ω)π_B √|B| / σ_SP.
Here we used again that π_B ≥ 1 and |B| ≥ 1. Hence, since the coordinates of s_N are large, the coordinates of s̄_N will be positive if
εω(1 + ω)π_B √|B| / σ_SP < σ_SP/(4n),
or, equivalently, if
ε < σ_SP^2 / (4nω(1 + ω)π_B √|B|),
and this follows from the hypothesis in the lemma.
Thus we have shown that the condition for z̄ being strictly complementary is satisfied. Finally, the calculation of ζ can be performed by Gaussian elimination and this requires O(|B*|^3) arithmetic operations. Thus the proof is complete.
✷
The next theorem now easily follows from the last lemma.
Theorem I.53 Using the notation introduced above, the Full-Newton step algorithm yields a feasible solution z for which the rounding procedure yields a strictly complementary solution of (SP), after at most
⌈√(2n) log( 4n^2 (1 + ω)^2 π_B √|B| / σ_SP^2 )⌉
iterations.
Proof: By Lemma I.52 the rounding procedure yields a strictly complementary solution if we run the Full-Newton step algorithm with
ε = σ_SP^2 / (4n(1 + ω)^2 π_B √|B|).
By Theorem I.37 for this value of ε the Full-Newton step algorithm requires at most
⌈√(2n) log( 4n^2 (1 + ω)^2 π_B √|B| / σ_SP^2 )⌉
iterations. This proves the theorem.
✷
Remark I.54 The result in Theorem I.53 can be used to estimate the number of arithmetic operations required by the algorithm in a worst-case situation. This number can be bounded by a polynomial of the size L of (SP) (cf. Remark I.32), as we show. We thus establish that the method proposed in this chapter solves the self-dual model in polynomial time. As a consequence, by the results of the previous chapter, it also solves the canonical LO problem in polynomial time.
The iteration bound in the theorem is worst if B contains all indices. Ignoring the integrality operator, and denoting the number of iterations by K, the iteration bound becomes
K ≤ √(2n) log( 4n^2 √n (1 + ω)^2 π_M / σ_SP^2 ),
where
π_M := Π_{j=1}^n ‖M_j‖.
By Theorem I.42 we have
σ_SP ≥ 1 / Π_{j=1}^n ‖M_j‖ = 1/π_M.
Substituting this we get the upper bound
K ≤ √(2n) log( 4n^{5/2} (1 + ω)^2 π_M^3 ),    (3.31)
for the number of iterations. A rather pessimistic estimate yields
π_M^2 = Π_{j=1}^n ( Σ_{i=1}^n M_ij^2 ) ≤ n^n Π^2.
This follows by expanding the product in the middle, which gives n^n terms, each of which is bounded above by Π^2, where Π is defined in Remark I.32 as the product of all nonzero entries in q and M. We also have the obvious (and very pessimistic) inequality ω ≤ Π, which implies 1 + ω ≤ 2Π. Substituting these pessimistic estimates in (3.31) we obtain
K ≤ √(2n) log( 4n^{5/2} (2Π)^2 (n^n Π^2)^{3/2} ) = √(2n) log( 16 n^{(3n+5)/2} Π^5 ).
This can be further reduced. One has
log( 16 n^{(3n+5)/2} Π^5 ) = log 16 + ((3n + 5)/2) log n + 5 log Π
                           < 3 + ((3n + 5)(n − 1))/2 + (7/2) log_2 Π
                           = (1/2)(3n^2 + 2n + 1) + (7/2) log_2 Π
                           < (7/2)( n(n + 1) + log_2 Π ).
The first inequality is due to log 16 = 2.7726 < 3, log n ≤ n − 1 and log Π = 0.6931 log_2 Π, and the second inequality holds because 7n(n + 1) > 3n^2 + 2n + 1 for all n.
Finally, using the definition (3.5) of the size L (= n(n + 1) + log_2 Π), we obtain
K < (7/2) √(2n) L < 5 √n L.
Thus the claim has been proved.    •

3.4 Concluding remarks
The analysis in this chapter is based on properties of the central path of (SP). To be more specific, on the property that when one moves along the central path to the optimal set, the separation between the large and small variables becomes apparent. We showed that the Full-Newton step algorithm together with a simple rounding procedure yields a polynomial algorithm for solving a canonical LP problem; the iteration bound is 5√n L, where L is the binary input size of the problem.
In the literature many other polynomial-time interior-point algorithms have been
presented. We will encounter many of these algorithms in the rest of the book. Almost
all of these algorithms are based on a Newton-type search direction. At this stage we
want to mention an interesting exception, which is based on an idea of Dikin and that
also can be used to solve in polynomial time the self-dual problem that we considered
in this and the previous chapter. In fact, an earlier version of this book used the
Dikin Step Algorithm in this part of the book. The iteration bound that we could
obtain for this algorithm was 7nL. Because it leads to a better iteration bound, in this
edition we preferred to use the Full-Newton step algorithm. But because the Dikin
Step Algorithm is interesting in itself, and also because further on in the book we will
deal with Dikin's method, we decided to keep a full description and analysis of the
Dikin Step Algorithm in the book. It can be found in Appendix F.16
16 The Dikin Step Algorithm was investigated first by Jansen et al. [156]; the analysis of the algorithm used in this chapter is based on a paper of Ling [182]. By including higher-order components in the search direction, the complexity can be improved by a factor √n, thus yielding a bound of the same order as for the Full-Newton step algorithm. This has been shown by Jansen et al. [160]. See also Chapter 18.
4 Solving the Canonical Problem

4.1 Introduction

In Chapter 2 we discussed the fact that every LO problem has a canonical description of the form
(P)    min{ c^T x : Ax ≥ b, x ≥ 0 }.
The matrix A is of size m × n and the vectors c and x are in IR^n and b in IR^m. In this chapter we further discuss how this problem, and its dual problem
(D)    max{ b^T y : A^T y ≤ c, y ≥ 0 },
can be solved by using the algorithm of the previous chapter for solving a self-dual embedding of both problems. With
M̄ := [ 0, A, −b;  −A^T, 0, c;  b^T, −c^T, 0 ],   z̄ := (y; x; κ),    (4.1)
as in (2.7), the embedding problem is given by (2.15). It is the self-dual homogeneous problem
(SP0)    min{ 0^T z̄ : M̄ z̄ ≥ 0, z̄ ≥ 0 }.    (4.2)
In Chapter 3 we showed that a strictly complementary solution z̄ of (SP0 ) can be
found in polynomial time. If a strictly complementary solution z̄ has κ > 0 then
x̄ = x/κ is an optimal solution of (P ), and if κ = 0 then (P ) (and also its dual (D))
must be either unbounded or infeasible. This was shown in Section 2.8, where we also
found that any strictly complementary solution of (SP0 ) with κ > 0 provides a strictly
complementary pair of solutions (x̄, ȳ) for (P ) and (D). Thus x̄ is primal feasible and
ȳ dual feasible. The complementarity means that
x̄ (c − A^T ȳ) = 0,   ȳ (Ax̄ − b) = 0,
and the strictness of the complementarity that
x̄ + (c − A^T ȳ) > 0,   ȳ + (Ax̄ − b) > 0.
Obviously these results imply that every LO problem can be solved exactly in
polynomial time. The aim of this chapter is to make a more thorough investigation of
the consequences of the results in Chapter 2 and Chapter 3. We restrict ourselves to
the canonical model.
The algorithm for the self-dual model, presented in Section 3.2, requires knowledge
of a positive z̄ such that the surplus vector s(z̄) = M̄ z̄ of z̄ is positive. However, such
z̄ does not exist, as we argued in Section 2.4. But then, as we showed in the same
section, we can embed (SP0 ) in a slightly larger self-dual problem, named (SP ) and
given by (cf. (2.16))
(SP)    min{ q^T z : Mz ≥ −q, z ≥ 0 },    (4.3)
for which the constraint matrix has one extra row and one extra column, so that any
strictly complementary solution of (SP ) induces a strictly complementary solution
of (SP0 ). Hence, applying the algorithm to the larger problem (SP ) yields a strictly
complementary solution of (SP0 ), hence also for (P ) and (D) if these problems are
solvable.
It should be noted that both the description of the Full-Newton Step algorithm
(page 50) and its analysis apply to any problem of the form (4.3) that satisfies the
IPC, provided that the matrix M is skew-symmetric and q ≥ 0. In other words, we
did not exploit the special structure of the matrix M , as given by (2.11), neither did
we use the special structure of the vector q, as given by (2.12).
Also note that if the embedding problem is ill-conditioned, in the sense that the
condition number σSP is small, we are forced to run the Full-Newton step algorithm
with a (very) small value of the accuracy parameter. In practice, due to limitations of
machine precision, it may happen that we cannot reach the state at which an exact
solution of (SP ) can be found. In that case the question becomes important of what
conclusions can be drawn for the canonical problem (P ) and its dual problem (D)
when an ε-solution for the embedding self-dual problem is available.
The aim of this chapter is twofold. We want to present two other embeddings of
(SP0 ) that satisfy the IPC. Recall that the embedding in Chapter 2 did not require
any foreknowledge about the problems (P ) and (D). We present another embedding
that can also be used for that case. A crucial question that we want to investigate is if
we can then decide whether the given problems have optimal solutions or not without
using the rounding procedure. Obviously, this amounts to deciding whether we have
κ > 0 in the limit or not. This will be the subject in Section 4.3.
Our first aim, however, is to consider an embedding that applies if both (P ) and
(D) have a strictly feasible solution and such solutions are known in advance. This case
is relatively easy, because we then know for sure that κ > 0 in the limit.
4.2 The case where strictly feasible solutions are known
We start with the easiest case, namely when strictly feasible solutions of (P ) and (D)
are given. Suppose that x0 ∈ IRn and y 0 ∈ IRm are strictly feasible solutions of (P )
and (D) respectively:
x0 > 0, s(x0 ) = Ax0 − b > 0
and
y 0 > 0, s(y 0 ) = c − AT y 0 > 0.
4.2.1 Adapted self-dual embedding
Let
M := [ 0, A, −b, 0;  −A^T, 0, c, 0;  b^T, −c^T, 0, 1;  0, 0, −1, 0 ],   z := (y; x; κ; ϑ),   q := (0; 0; 0; 2),
and consider the self-dual problem
(SP1)    min{ q^T z : Mz + q ≥ 0, z ≥ 0 }.
Note that q ≥ 0. We proceed by showing that this problem has a positive solution with positive surplus vector. Let
ϑ0 := 1 + c^T x0 − b^T y0.
The weak duality property implies that c^T x0 − b^T y0 ≥ 0. If c^T x0 − b^T y0 = 0 then x0 and y0 are optimal and we are done. Otherwise we have ϑ0 > 1. We can easily check that for
z0 := (y0; x0; 1; ϑ0)
we have
s(z0) := M z0 + q = (Ax0 − b;  c − A^T y0;  b^T y0 − c^T x0 + ϑ0;  −1 + 2) = (s(x0); s(y0); 1; 1),
so both z0 and its surplus vector are positive.1 Now let z̄ be a strictly complementary solution of (SP1). Then we have, for suitable vectors ȳ and x̄ and scalars κ̄ and ϑ̄,
z̄ := (ȳ; x̄; κ̄; ϑ̄) ≥ 0,   s(z̄) = (Ax̄ − κ̄b;  κ̄c − A^T ȳ;  b^T ȳ − c^T x̄ + ϑ̄;  2 − κ̄) ≥ 0,   z̄ s(z̄) = 0,   z̄ + s(z̄) > 0.
Since the optimal objective value is zero, we have ϑ̄ = 0. On the other hand, we cannot have κ̄ = 0, because this would imply the contradiction that either (P) or (D) is infeasible. Hence we conclude that κ̄ > 0. This has the consequence that x̃ = x̄/κ̄ is feasible for (P) and ỹ = ȳ/κ̄ is feasible for (D), as follows from the feasibility of z̄. The complementarity of z̄ and s(z̄) now yields that
s(κ̄) := b^T ȳ − c^T x̄ = 0.
1 Exercise 27 If it happens that we have a primal feasible x0 and a dual feasible y 0 such that
x0 s(y 0 ) = µen and y 0 s(x0 ) = µem for some positive µ, find an embedding satisfying the IPC such
that z 0 is on its central path.
Thus it follows that x̄/κ̄ is optimal for (P ) and ȳ/κ̄ is optimal for (D). Finally, the
strict complementarity of z̄ and s(z̄) gives the strict complementarity of this solution
pair.
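To make the adapted embedding concrete, here is a short NumPy sketch (our own construction, following the definitions of M, q and z0 above) that assembles (SP1) from given data A, b, c and strictly feasible x0, y0, and returns the starting surplus vector, which should equal (s(x0); s(y0); 1; 1).

import numpy as np

def sp1_embedding(A, b, c, x0, y0):
    m, n = A.shape
    M = np.zeros((m + n + 2, m + n + 2))
    M[:m, m:m+n] = A;        M[:m, m+n] = -b
    M[m:m+n, :m] = -A.T;     M[m:m+n, m+n] = c
    M[m+n, :m] = b;          M[m+n, m:m+n] = -c;  M[m+n, m+n+1] = 1.0
    M[m+n+1, m+n] = -1.0                          # M is skew-symmetric
    q = np.zeros(m + n + 2); q[-1] = 2.0
    theta0 = 1.0 + c @ x0 - b @ y0
    z0 = np.concatenate([y0, x0, [1.0, theta0]])
    return M, q, z0, M @ z0 + q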
4.2.2 Central paths of (P) and (D)
At this stage we want to point out an interesting and important consequence of the existence of strictly feasible solutions of (P) and (D). In that case we can define central paths for the problems (P) and (D). This goes as follows. Let µ be an arbitrary positive number. Then the µ-center of (SP1) is determined as the unique solution of the system (cf. (2.46), page 35)
z ≥ 0,  s ≥ 0,  Mz + q = s,  zs = µ e_{m+n+2}.    (4.4)
In other words, there exist unique nonnegative x, y, κ, ϑ such that
Ax − κb ≥ 0,   κc − A^T y ≥ 0,   b^T y − c^T x + ϑ ≥ 0,   2 − κ ≥ 0
and, moreover,
y (Ax − κb) = µ e_m,   x (κc − A^T y) = µ e_n,   κ (b^T y − c^T x + ϑ) = µ,   ϑ (2 − κ) = µ.    (4.5)
An immediate consequence is that all the nonnegative entities mentioned above are positive. Surprisingly enough, we can compute the value of κ from (4.4). Taking the inner product of both sides in the first equation with z, while using the orthogonality property, we get q^T z = z^T s. The second equation in (4.4) gives z^T s = (n + m + 2)µ. Due to the definition of q we obtain2
2ϑ = (n + m + 2)µ.    (4.6)
In fact, this relation expresses that the objective value q^T z = 2ϑ along the central path equals the dimension of the matrix M times µ, already established in Section 2.7. Substitution of the last equation in (4.5) into (4.6) yields
2ϑ = (n + m + 2) ϑ (2 − κ).
Since ϑ > 0, after dividing by ϑ it easily follows that
κ = 2(n + m + 1)/(n + m + 2).    (4.7)
Substitution of the values of κ and ϑ in the third equation gives
(c^T x − b^T y)/κ = (ϑκ − µ)/κ^2 = ϑ/κ − µ/κ^2 = (n + m)µ/κ^2 = ((n + m)(n + m + 2)^2 / (4(n + m + 1)^2)) µ.
2 The relation can also be obtained by adding all the equations in (4.5).
Now, defining
x̄ = x/κ,   ȳ = y/κ,   ϑ̄ = ϑ/κ,   µ̄ = µ/κ^2,
and using the notation
s(x̄) := Ax̄ − b,   s(ȳ) := c − A^T ȳ,
we obtain that the positive vectors x̄ and ȳ are feasible for (P) and (D) respectively with s(x̄) and s(ȳ) positive, and moreover,
ȳ s(x̄) = µ̄ e_m,   x̄ s(ȳ) = µ̄ e_n.    (4.8)
If µ runs through the interval (0, ∞) then µ̄ runs through the same interval, since
κ is constant. We conclude that for every positive µ̄ there exist positive vectors x̄
and ȳ that are feasible for (P ) and (D) respectively and are such that x̄, ȳ and their
associated surplus vectors s(x̄) and s(ȳ) satisfy (4.8).
Our next aim is to show that the system (4.8) cannot have more than one solution
with x̄ and ȳ feasible for (P ) and (D). Suppose that x̄ and ȳ are feasible for (P ) and
(D) and satisfy (4.8). Then it is quite easy to derive a solution for (4.5) as follows.
First we calculate κ from (4.7). Then taking µ = κ2 µ̄, we can find ϑ from (4.6). Finally,
the values x = κx̄ and y = κȳ satisfy (4.5). Since the solution of (4.5) is unique, it
follows that the solution of (4.8) is unique as well. Thus we have shown that for each
positive µ̄ the system (4.8) has a unique solution with x̄ and ȳ feasible for (P ) and
(D).
Denoting the solution of (4.8) by x̄(µ̄) and ȳ(µ̄), we obtain the central paths of (P )
and (D) by letting µ̄ run through all positive values. Summarizing the above results,
we have proved the following.
Theorem I.55 Let (x(µ), y(µ), κ(µ), ϑ(µ)) denote the point on the central path of (SP1) corresponding to the barrier parameter value µ. Then we have κ(µ) = κ with
κ = 2(n + m + 1)/(n + m + 2).
If µ̄ = µ/κ^2, then x̄(µ̄) = x(µ)/κ and ȳ(µ̄) = y(µ)/κ are the points on the central paths of (P) and (D) corresponding to the barrier parameter µ̄. As a consequence we have
c^T x̄ − b^T ȳ = x̄^T s(ȳ) + ȳ^T s(x̄) = (n + m)µ̄.
4.2.3 Approximate solutions of (P) and (D)
Our aim is to solve the given problem (P ) by solving the embedding problem (SP1 ).
The Full-Newton step algorithm yields an ε-solution, i.e. a feasible solution z of (SP1 )
such that q T z ≤ ε, where ε is some positive number. Therefore, it is of great importance
to see how we can derive approximate solutions for (P ) and (D) from any such solution
of (SP1 ). In this respect the following lemma is of interest.
Lemma I.56 Let z = (y, x, κ, ϑ) be a positive solution of (SP1). If
x̃ = x/κ,   ỹ = y/κ,
then x̃ is feasible for (P), ỹ is feasible for (D), and the duality gap at the pair (x̃, ỹ) satisfies
c^T x̃ − b^T ỹ ≤ ϑ/κ.
Proof: Since z is feasible for (SP1), we have
Ax − κb ≥ 0,   −A^T y + κc ≥ 0,   b^T y − c^T x + ϑ ≥ 0,   −κ + 2 ≥ 0.
With x̃ and ỹ as defined in the lemma it follows that Ax̃ ≥ b, A^T ỹ ≤ c and
c^T x̃ − b^T ỹ ≤ ϑ/κ,
thus proving the lemma.    ✷
✷
The above lemma makes clear that it is important for our goal to have a solution
z = (y, x, κ, ϑ) of (SP1 ) for which the quotient ϑ/κ is small. From (4.7) in Section
4.2.2 we know that along the central path the variable κ is constant and given by
κ=
2(n + m + 1)
.
n+m+2
Hence, along the central path we have the following inequality:
cT x̃ − bT ỹ ≤
(n + m + 2) ϑ
.
2(n + m + 1)
For large-scale problems, where n + m is large, this means that the duality gap at the
feasible pair (x̃, ỹ) is about ϑ/2.
Unfortunately our algorithm for solving (SP1 ) generates a feasible solution z that
is not necessarily on the central path. Hence the above estimate for the duality gap
at (x̃, ỹ) is no longer valid. However, we show now that the estimate is ‘almost’ valid
because the solution z generated by the algorithm is close to the central path. To be
more precise, according to Lemma I.44 z satisfies δc (z) ≤ τ , where τ = 4, and where
the proximity measure δc (z) is defined by
δ_c(z) = max(zs(z)) / min(zs(z)).
Recall that δc (z) = 1 if and only if zs(z) is a multiple of the all-one vector e. This
occurs precisely if z lies on the central path. Otherwise we have δc (z) > 1. Now we
can prove the following generalization of Lemma I.56.
Lemma I.57 Let τ ≥ 1 and let z = (y, x, κ, ϑ) be a feasible solution of (SP1) such that δ_c(z) ≤ τ. If
x̃ = x/κ,   ỹ = y/κ,
then x̃ is feasible for (P), ỹ is feasible for (D), and the duality gap at the pair (x̃, ỹ) satisfies
c^T x̃ − b^T ỹ < ((n + m + 2) / (2(n + m + 2 − τ))) ϑ.
Proof: Recall from (2.23) that q^T z = z^T s(z). Since q^T z = 2ϑ, the average value of the products z_i s_i(z) is equal to
2ϑ/(n + m + 2).
From δ_c(z) ≤ τ we deduce the following bounds:3,4
2ϑ/(τ(n + m + 2)) ≤ z_i s_i(z) ≤ 2τϑ/(n + m + 2),   1 ≤ i ≤ m + n + 2.    (4.9)
The lemma is obtained by applying these inequalities to the last two coordinates of z, which are κ and ϑ. Application of (4.9) to z_i = ϑ yields the inequalities
2ϑ/(τ(n + m + 2)) ≤ ϑ(2 − κ) ≤ 2τϑ/(n + m + 2).
After division by ϑ and some elementary reductions, this gives the following bounds on κ:
2(n + m + 2 − τ)/(n + m + 2) ≤ κ ≤ 2(τ(n + m + 2) − 1)/(τ(n + m + 2)).    (4.10)
Application of the left-hand side inequality in (4.9) to z_i = κ leads to
κ (b^T y − c^T x + ϑ) ≥ 2ϑ/(τ(n + m + 2)).
Using the upper bound for κ in (4.10) we obtain
b^T y − c^T x + ϑ ≥ (2ϑ/(τ(n + m + 2))) · (τ(n + m + 2)/(2(τ(n + m + 2) − 1))) = ϑ/(τ(n + m + 2) − 1).
Hence,
c^T x − b^T y ≤ ϑ − ϑ/(τ(n + m + 2) − 1) = ((τ(n + m + 2) − 2)/(τ(n + m + 2) − 1)) ϑ < ϑ.
Finally, dividing both sides of this inequality by κ, and using the lower bound for κ in (4.10), we obtain
c^T x̃ − b^T ỹ = (c^T x − b^T y)/κ < ((n + m + 2)/(2(n + m + 2 − τ))) ϑ.
This proves the lemma.5    ✷
3 These bounds are sufficient for our purpose. Sharper bounds could be obtained from the next exercise.
4 Exercise 28 Let x ∈ IR^n_+ and τ ≥ 1. Prove that if e^T x = nσ and τ min(x) ≥ max(x) then
σ/τ ≤ nσ/(1 + (n − 1)τ) ≤ x_i ≤ τnσ/(n + τ − 1) ≤ τσ,   1 ≤ i ≤ n.
5 Exercise 29 Using the sharper bounds for z_i s_i(z) obtainable from Exercise 28, and using the notation of Lemma I.57, derive the following bound for the duality gap:
c^T x̃ − b^T ỹ ≤ ((n + m + 1 + τ)((n + m + 1)τ − 1) / (2τ(n + m + 1)^2)) ϑ.
For large-scale problems the above lemma implies that the duality gap at the feasible
pair (x̃, ỹ) is about ϑ/2, provided that τ is small compared with n + m.
4.3 The general case

4.3.1 Introduction
This time we assume that there is no foreknowledge about (P ) and (D). It may well
be that one of the problems is infeasible, or both. This raises the question of whether
the given problems have any solution at all. This question must be answered by the
solution method. In fact, the method that we presented in Chapter 3 perfectly answers
the question. In the next section, we present an alternative self-dual embedding. The
new embedding problem can be solved in exactly the same way as the embedding
problem (SP ) in Chapter 3, and by using the rounding procedure described there, we
can find a strictly complementary solution. Then the answer to the above question is
given by the value of the homogenizing variable κ. If this variable is positive then both
(P ) and (D) have optimal solutions; if it is zero then at least one of the two problems
is infeasible. Our aim is to develop some tools that may be helpful in deciding if κ is
positive or not without using the rounding procedure.
4.3.2 Alternative embedding for the general case
Let x0 and y0 be arbitrary positive vectors of dimension n and m respectively. Defining positive vectors s0 and t0 by the relations
x0 s0 = e_n,   y0 t0 = e_m,
we consider the self-dual problem
(SP2)    min{ q^T z : Mz + q ≥ 0, z ≥ 0 },
where M and q are given by
M := [ 0, A, −b, b̄;  −A^T, 0, c, c̄;  b^T, −c^T, 0, β;  −b̄^T, −c̄^T, −β, 0 ],   q := (0_m; 0_n; 0; n + m + 2),
with
b̄ = t0 + b − Ax0,   c̄ = s0 − c + A^T y0,   β = 1 − b^T y0 + c^T x0.
Taking
z0 := (y0; x0; 1; 1),
we then have
M z0 + q = (Ax0 − b + b̄;  −A^T y0 + c + c̄;  b^T y0 − c^T x0 + β;  −b̄^T y0 − c̄^T x0 − β) + (0_m; 0_n; 0; n + m + 2) = (t0; s0; 1; 1).
Except for the last entry in the last vector this is obvious. For this entry we write
−b̄^T y0 − c̄^T x0 − β = −(t0 + b − Ax0)^T y0 − (s0 − c + A^T y0)^T x0 − β
                      = −(t0)^T y0 − b^T y0 + (x0)^T A^T y0 − (s0)^T x0 + c^T x0 − (y0)^T Ax0 − β
                      = −m − b^T y0 − n + c^T x0 − β = −m − n − 1,
whence
−b̄^T y0 − c̄^T x0 − β + n + m + 2 = 1.
We conclude that z 0 is a positive solution of (SP2 ) with a positive surplus vector.
Moreover, since x0 s0 = en and y 0 t0 = em , this solution lies on the central path of
(SP2 ) and the corresponding barrier parameter value is 1. It remains to show that if a
strictly complementary solution of (SP2 ) is available then we can solve problems (P )
and (D). Therefore, let
ȳ
x̄
κ̄
ϑ̄
be a strictly complementary solution. Then, since the optimal value of (SP2 ) is zero,
we have ϑ̄ = 0. As a consequence, the vector
ȳ
z̄ := x̄
κ̄
is a strictly complementary solution of
T
y
0
y
0
A
−b
y
0
m
mm
m
T
min 0n x : −A
0nn c x ≥ 0n , x ≥ 0 .
0
κ
0
κ
bT −cT 0
κ
This is the problem (SP0 ), that we introduced in Chapter 2. We can duplicate the
arguments used there to conclude that if κ̄ is positive then the pair (x̄/κ̄, ȳ/κ̄) provides
strictly complementary optimal solutions of (P ) and (D), and if κ̄ is zero then one
of the two problems is infeasible and the other is unbounded, or both problems are
infeasible.
Thus (SP2 ) provides a self-dual embedding for (P ) and (D). Moreover, z 0 provides
a suitable starting point for the Full-Newton step algorithm. It is the point on the
central path of (SP2 ) corresponding to the barrier parameter value 1.
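Analogously to the sketch given for (SP1), the embedding (SP2) can be assembled directly from A, b, c and any positive x0, y0; the snippet below (our own helper, not the book's) also exposes the property derived above: the starting point z0 satisfies z0 s(z0) = e, i.e. it lies on the central path with barrier parameter 1.

import numpy as np

def sp2_embedding(A, b, c, x0, y0):
    m, n = A.shape
    s0, t0 = 1.0 / x0, 1.0 / y0                  # so that x0*s0 = e_n and y0*t0 = e_m
    bbar = t0 + b - A @ x0
    cbar = s0 - c + A.T @ y0
    beta = 1.0 - b @ y0 + c @ x0
    M = np.zeros((m + n + 2, m + n + 2))
    M[:m, m:m+n] = A;      M[:m, m+n] = -b;     M[:m, m+n+1] = bbar
    M[m:m+n, :m] = -A.T;   M[m:m+n, m+n] = c;   M[m:m+n, m+n+1] = cbar
    M[m+n, :m] = b;        M[m+n, m:m+n] = -c;  M[m+n, m+n+1] = beta
    M[m+n+1, :m] = -bbar;  M[m+n+1, m:m+n] = -cbar;  M[m+n+1, m+n] = -beta
    q = np.zeros(m + n + 2); q[-1] = n + m + 2
    z0 = np.concatenate([y0, x0, [1.0, 1.0]])
    return M, q, z0       # check: z0 * (M @ z0 + q) equals the all-one vector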
4.3.3 The central path of (SP2)
In this section we point out some properties of the central path of the problem (SP2). Let µ be an arbitrary positive number. Then the µ-center of (SP2) is determined as the unique solution of the system (cf. (2.46), page 35)
z ≥ 0,  s ≥ 0,  Mz + q = s,  zs = µ e_{m+n+2}.    (4.11)
This solution defines the point on the central path of (SP2) corresponding to the barrier parameter value µ. Hence there exist unique positive x, y, κ, ϑ such that
Ax − κb + ϑb̄ > 0,   κc − A^T y + ϑc̄ > 0,   s(κ) := b^T y − c^T x + ϑβ > 0,   s(ϑ) := n + m + 2 − b̄^T y − c̄^T x − κβ > 0,    (4.12)
and, moreover,
y (Ax − κb + ϑb̄) = µ e_m,   x (κc − A^T y + ϑc̄) = µ e_n,   κ (b^T y − c^T x + ϑβ) = µ,   ϑ (n + m + 2 − b̄^T y − c̄^T x − κβ) = µ.    (4.13)
Just as in Section 4.2.2 we take the inner product of both sides with z in the first equation of (4.11). Using the orthogonality property, we obtain q^T z = z^T s. The second equation in (4.11) gives z^T s = (n + m + 2)µ. Due to the definition of q we obtain
(n + m + 2)ϑ = (n + m + 2)µ,
which gives ϑ = µ. Since ϑ s(ϑ) = µ, by the fourth equation in (4.13), we conclude that s(ϑ) = 1. Since
s(ϑ) = n + m + 2 − b̄^T y − c̄^T x − κβ,
this leads to
b̄^T y + c̄^T x + κβ = n + m + 1.    (4.14)
Using ϑ = µ, the third equality in (4.13) can be rewritten as
κ (b^T y − c^T x) = µ − µκβ,
which gives
κβ = 1 + (κ/µ)(c^T x − b^T y).
Substituting this in (4.14) we get
b̄^T y + c̄^T x + (κ/µ)(c^T x − b^T y) = n + m,
which is equivalent to
(κc + µc̄)^T x − (κb − µb̄)^T y = µ(n + m).    (4.15)
This relation admits a nice interpretation. The first two inequalities in (4.12) show that x is feasible for the perturbed problem
min{ (κc + µc̄)^T x : Ax ≥ κb − µb̄, x ≥ 0 },
and y is feasible for the dual problem
max{ (κb − µb̄)^T y : A^T y ≤ κc + µc̄, y ≥ 0 }.
For these perturbed problems the duality gap at the pair (x, y) is µ(n + m), from
(4.15). Now consider the behavior along the central path when µ approaches zero.
Two cases can occur: either κ converges to some positive value, or κ goes to zero. In
both cases the duality gap converges to zero. Roughly speaking, the limiting values of
x and y are optimal solutions for the perturbed problems. In the first case, when κ
converges to some positive value, asymptotically the first perturbed problem becomes
equivalent to (P ). We simply have to replace the variable x by κx. Also, the second
problem becomes equivalent to (D): replace the variable y by κy. In the second case
however, when κ goes to zero in the limit, then asymptotically the perturbed problems
become
min 0T x : Ax ≥ 0, x ≥ 0 ,
and
max
0T y : AT y ≤ 0, y ≥ 0 .
As we know, one of the problems (P ) and (D) is then infeasible and the other
unbounded, or both problems are infeasible.
When dealing with a solution method for the canonical problem, the method
must decide which of these two cases occurs. In this respect we make an interesting
observation. Clearly the first case occurs if and only if κ ∈ B and the second case if and
only if κ ∈ N , where (B, N ) is the optimal partition of (SP2 ). In other words, which
of the two cases occurs depends on whether κ is a large variable or a small variable.
Note that the variable ϑ is always small; in the present case we have ϑ(µ) = µ, for
each µ > 0. Recall from Lemma I.43 that the large variables are bounded below by
σ_SP/n and the small variables above by nµ/σ_SP. Hence, if κ is a large variable then κ ≥ σ_SP/n implies
ϑ/κ = µ/κ ≤ nµ/σ_SP.
This implies that the quotient ϑ/κ goes to zero if µ goes to zero. On the other hand, if κ is a small variable then
κ/ϑ ≤ nµ/(ϑσ_SP) = n/σ_SP,
proving that the quotient κ/ϑ is bounded above. Therefore, if µ goes to zero, κ^2/ϑ goes to zero as well, and hence ϑ/κ^2 goes to infinity. Thus we may state the following without further proof.
Theorem I.58 If κ is a large variable then
lim_{µ↓0} ϑ/κ = lim_{µ↓0} ϑ/κ^2 = 0,
and if κ is a small variable then
lim_{µ↓0} ϑ/κ^2 = ∞.
The above theorem provides another theoretical tool for distinguishing between the
two possible cases.
4.3.4 Approximate solutions of (P) and (D)
Assuming that an ε-solution z = (y, x, κ, ϑ) for the embedding problem (SP2) is given, we proceed by investigating what information this gives on the embedded problem (P) and its dual (D). With
x̃ := x/κ,   ỹ := y/κ,
the feasibility of z for (SP2) implies the following inequalities:
Ax̃ ≥ b − (ϑ/κ)b̄,   A^T ỹ ≤ c + (ϑ/κ)c̄,   c^T x̃ − b^T ỹ ≤ (ϑ/κ)β,   κ(c̄^T x̃ + b̄^T ỹ + β) ≤ n + m + 2.    (4.16)
Clearly we cannot conclude that x̃ is feasible for (P) or that ỹ is feasible for (D). But x̃ is feasible for the perturbed problem
(P′)    min{ (c + (ϑ/κ)c̄)^T x̄ : Ax̄ ≥ b − (ϑ/κ)b̄, x̄ ≥ 0 },
and ỹ is feasible for its dual problem
(D′)    max{ (b − (ϑ/κ)b̄)^T ȳ : A^T ȳ ≤ c + (ϑ/κ)c̄, ȳ ≥ 0 }.
We have the following lemma.
Lemma I.59 Let z = (y, x, κ, ϑ) be a feasible solution of (SP2) with κ > 0. If
x̃ = x/κ,   ỹ = y/κ,
then x̃ is feasible for (P′), ỹ is feasible for (D′), and the duality gap at the pair (x̃, ỹ) for this pair of perturbed problems satisfies
(c + (ϑ/κ)c̄)^T x̃ − (b − (ϑ/κ)b̄)^T ỹ ≤ (n + m + 2)ϑ/κ^2.
Proof: We have already established that x̃ is feasible for (P′) and ỹ is feasible for (D′). We rewrite the duality gap for the perturbed problems (P′) and (D′) at the pair (x̃, ỹ) as follows:
(c + (ϑ/κ)c̄)^T x̃ − (b − (ϑ/κ)b̄)^T ỹ = c^T x̃ − b^T ỹ + (ϑ/κ)(c̄^T x̃ + b̄^T ỹ).
The third inequality in (4.16) gives
c^T x̃ − b^T ỹ ≤ (ϑ/κ)β
and the fourth inequality
c̄^T x̃ + b̄^T ỹ ≤ (n + m + 2)/κ − β.
Substitution gives
(c + (ϑ/κ)c̄)^T x̃ − (b − (ϑ/κ)b̄)^T ỹ ≤ (ϑ/κ)β + (ϑ/κ)((n + m + 2)/κ − β) = (n + m + 2)ϑ/κ^2,
proving the lemma.    ✷
✷
The above lemma seems to be of interest only if κ is a large variable. For if ϑ/κ
and ϑ/κ2 are small enough then the lemma provides a pair of vectors (x̃, ỹ) such that
x̃ and ỹ are ‘almost’ feasible for (P ) and (D) respectively and the duality gap at this
pair is small. The error in feasibility for (P ) is given by the vector (ϑ/κ)b̄ and the
error in feasibility for (D) by the vector (ϑ/κ)c̄, whereas the duality gap with respect
to (P ) and (D) equals approximately
cT x̃ − bT ỹ.
Part II
The Logarithmic Barrier Approach

5 Preliminaries

5.1 Introduction
In the previous chapters we showed that every LO problem can be solved in polynomial
time. This was achieved by transforming the given problem to its canonical form and
then embedding it into a self-dual model. We proved that the self-dual model can
be solved in polynomial time. Our proof was based on the algorithm in Chapter 3
that uses the Newton direction as search direction. As we have seen, this algorithm is
conceptually simple and allows a quite elementary analysis. For the theoretical purpose
of Part I of the book this algorithm therefore is an ideal choice.
From the practical point of view, however, there exist more efficient algorithms.
The aim of this part of the book is to deal with a class of algorithms that has a
relatively long history, going back to work of Frisch [88] in 1955. Frisch was the first
to propose the use of logarithmic barrier functions in LO. The idea was worked out by
Lootsma [185] and in the classical book of Fiacco and McCormick [77]. After 1984, the
year when Karmarkar’s paper [165] raised new interest in the interior-point approach
to LO, the so-called logarithmic barrier approach also began a new life. It became
the basis of a wide class of polynomial time algorithms. Variants of the most efficient
algorithms in this class found their way into commercial optimization packages like
CPLEX and OSL.1
The aim of this part of the book is to provide a thorough introduction to these
algorithms. In the literature of the last decade these interior-point algorithms were
developed for LO problems in the so-called standard format:
(P)    min{ c^T x : Ax = b, x ≥ 0 },
where A is an m × n matrix of rank m, c, x ∈ IR^n, and b ∈ IR^m. This format also served
as the standard for the literature on the Simplex Method. Because of its historical
status, we adopt the standard format for this part of the book.
We want to point out, however, that all results in this part can easily be adapted
to any other format, including the self-dual model of Part I. We only have to define a
suitable logarithmic barrier function for the format under consideration.
A disadvantage of the change from the self-dual to the standard format is that it
leads to some repetition of results. For example, we need to establish under what
conditions the problem (P ) in standard format has a central path, and so on. In fact,
1 CPLEX is a product of CPLEX Optimization, Inc. OSL stands for Optimization Subroutine Library and is the optimization package of IBM.
we could have derived all these results from the results in Chapter 2. But, instead, to
make this part of the book more accessible for readers who are better acquainted with the standard format than with the less well-known self-dual format, we decided to make
this part self-contained.
Readers who went through Part I may only be interested in methods for solving the
self-dual problem
(SP)    min{ q^T x : Mx ≥ −q, x ≥ 0 },
with q ≥ 0 and M T = −M . Those readers may be advised to skip the rest of this
chapter and continue with Chapters 6 and 7. The relevance of these chapters for
solving (SP ) is due to the fact that (SP ) can easily be brought into the standard
format by introducing a surplus vector s to create equality constraints. Since x and s
are nonnegative, this yields (SP ) in the standard format:
(SPS)    min{ q^T x : Mx − s = −q, x ≥ 0, s ≥ 0 }.
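The rewriting of (SP) into (SPS) amounts to appending a negated identity block to M. A minimal sketch (ours) of this transformation in NumPy:

import numpy as np

def self_dual_to_standard(M, q):
    # (SP):  min q^T x  s.t. Mx >= -q, x >= 0   becomes
    # (SPS): min c^T (x, s)  s.t. [M, -I](x, s) = -q, (x, s) >= 0
    n = M.shape[0]
    A_std = np.hstack([M, -np.eye(n)])
    b_std = -q
    c_std = np.concatenate([q, np.zeros(n)])
    return A_std, b_std, c_std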
In this part of the book we take the classical duality results for the standard format
of the LO problem as granted. We briefly review these results in the next section.
5.2 Duality results for the standard LO problem
The standard format problem (P) has the following dual problem:
(D)    max{ b^T y : A^T y + s = c, s ≥ 0 },
where s ∈ IR^n and y ∈ IR^m. We call (D) the standard dual problem. The feasible regions of (P) and (D) are denoted by P and D, respectively:
P := { x : Ax = b, x ≥ 0 },   D := { (y, s) : A^T y + s = c, s ≥ 0 }.
If P is empty we call (P ) infeasible, otherwise feasible. If (P ) is feasible and the
objective value cT x is unbounded below on P, then (P ) is called unbounded, otherwise
bounded. We use similar terminology for the dual problem (D).
Since we assumed that A has full (row) rank m, we have a one-to-one correspondence
between y and s in the pairs (y, s) ∈ D. In order to facilitate the discussion we feel
free to refer to any pair (y, s) ∈ D either by y ∈ D or s ∈ D. The (relative) interiors
of P and D are denoted by P+ and D+:
P+ := { x : Ax = b, x > 0 },   D+ := { (y, s) : A^T y + s = c, s > 0 }.
We recall the well known and almost trivial weak duality result for the LO problem
in standard format.
Proposition II.1 (Weak duality) Let x and s be feasible for (P ) and (D), respectively. Then cT x−bT y = xT s ≥ 0. Consequently, cT x is an upper bound for the optimal
value of (D), if it exists, and bT y is a lower bound for the optimal value of (P ), if it
exists. Moreover, if the duality gap xT s is zero then x is an optimal solution of (P )
and (y, s) is an optimal solution of (D).
Proof: The proof is straightforward. We have
0 ≤ xT s = xT (c − AT y) = cT x − (Ax)T y = cT x − bT y.
(5.1)
This implies that cT x is an upper bound for the optimal objective value of (D), and
bT y is a lower bound for the optimal objective value of (P ), and, moreover, if the
duality gap is zero then the pair (x, s) is optimal.
✷
A direct consequence of Proposition II.1 is that if one of the problems (P ) and
(D) is unbounded, then the other problem is infeasible. The classical duality results
for the primal and dual problems in standard format boil down to the following two
results. The first result is the Duality Theorem (due to von Neumann, 1947, [227]),
and the second result will be referred to as the Goldman–Tucker Theorem (Goldman
and Tucker, 1956, [111]).
Theorem II.2 (Duality Theorem) If (P ) and (D) are feasible then both problems
have optimal solutions. Then, if x ∈ P and (y, s) ∈ D, these are optimal solutions
if and only if xT s = 0. Otherwise neither of the two problems has optimal solutions:
either both (P ) and (D) are infeasible or one of the two problems is infeasible and the
other one is unbounded.
Theorem II.3 (Goldman–Tucker Theorem) If (P ) and (D) are feasible then
there exists a strictly complementary pair of optimal solutions, that is an optimal
solution pair (x, s) satisfying x + s > 0.
It may be noted that these two classical results follow immediately from the results in
Part I.2 For future use we also mention that (P ) is infeasible if and only if there exists
a vector y such that AT y ≤ 0 and bT y > 0, and (D) is infeasible if and only if there
exists a vector x ≥ 0 such that Ax = 0 and cT x < 0. These statements are examples
of theorems of the alternatives and easily follow from Farkas’ lemma.3
We denote the set of all optimal solutions of (P ) by P ∗ and similarly D∗ denotes the
set of optimal solutions of (D). Of course, P ∗ is empty if and only if (P ) is infeasible
or unbounded, and D∗ is empty if and only if (D) is infeasible or unbounded. Note
that the Duality Theorem (II.2) implies that P ∗ is empty if and only if D∗ is empty.
2 Exercise 30 Derive Theorem II.2 and Theorem II.3 from Theorem I.26.
3 Exercise 31 Using Farkas' lemma (cf. Remark I.27), prove:
(i) either the system Ax = b, x ≥ 0 or the system AT y ≤ 0, bT y > 0 has a solution;
(ii) either the system AT y ≤ c or the system Ax = 0, x ≥ 0, cT x < 0 has a solution.
5.3 The primal logarithmic barrier function
We start by introducing the so-called logarithmic barrier function for the primal problem (P). This is the function g̃_µ(x) defined by
g̃_µ(x) := c^T x − µ Σ_{j=1}^n log x_j,    (5.2)
where µ is a positive number called the barrier parameter, and x runs through all
primal feasible vectors that are positive. The domain of g̃µ is the set P + .
The use of logarithmic barrier functions in LO was first proposed by Frisch [88] in
1955. By minimizing g̃µ (x), we try to realize two goals at the same time, namely to
find a primal feasible vector x for which c^T x is small and such that the barrier term Σ_{j=1}^n log x_j is large. Frisch observed that the minimization of g̃_µ(x) can be done easily
by using standard techniques from nonlinear optimization. The barrier parameter can
be used to put more emphasis on either the objective value cT x of the primal LO
problem (P ), or on the barrier term. Intuitively, by letting µ take a small (positive)
value, we may expect that a minimizer of g̃µ (x) will be a good approximation for an
optimal solution of (P ). It has taken approximately 40 years to make clear that this
is a brilliant idea, not only from a practical but also from a theoretical point of view.
In this part of the book we deal with logarithmic barrier methods for solving both the
primal problem (P ) and the dual problem (D), and we show that when worked out
in an appropriate way, the resulting methods solve both (P ) and (D) in polynomial
time.
5.4 Existence of a minimizer
In the logarithmic barrier approach a major question is whether the barrier function
has a minimizing point or not. This section is devoted to this question, and we present
some necessary and sufficient conditions. One of these (mutually equivalent) conditions
will be called the interior-point condition. This condition is fundamental not only for
the logarithmic barrier approach, but as we shall see, for all interior-point approaches.
Note that the definition of g̃µ (x) can be extended to the set IRn++ of all positive
vectors x, and that g̃µ (x) is differentiable on this set. We can easily verify that the
gradient of g̃µ is given by
∇g̃µ (x) = c − µx−1 ,
and the Hessian matrix by
∇2 g̃µ (x) = µX −2 .
Obviously, the Hessian is positive definite for any x ∈ IRn++ . This means that g̃µ (x)
is strictly convex on IRn++ . We are interested in the behavior of g̃µ on its domain,
which is the set P + of the positive vectors in the primal feasible space. Since P +
is the intersection of IRn++ and the affine space {x : Ax = b}, it is a relatively open
subset of IRn++ . Therefore, the smallest affine space containing P + is the affine space
II.5 Preliminaries
91
{x : Ax = b}, and the linear space parallel to it is the null space N (A) of A:
N (A) = {x : Ax = 0} .
Taking D = IRn++ and C = P + , we may now apply Proposition A.1. From this we
conclude that g̃µ has a minimizer if and only if there exists an x ∈ P + such that
c − µx−1 ⊥ N (A).
Since the orthogonal complement of the null space of A is the row space of A, it follows
that x ∈ P + is a minimizer of g̃µ if and only if there exists a vector y ∈ IRm such that
c − µx−1 = AT y.
By putting s := µx^{−1}, which is equivalent to xs = µe, it follows that g̃_µ has a minimizer if and only if there exist vectors x, y and s such that
Ax = b,   x > 0,
A^T y + s = c,   s > 0,
xs = µe.    (5.3)
We thus have shown that this system represents the optimality conditions for the primal logarithmic barrier minimization problem, given by
(P_µ)    min{ g̃_µ(x) : x ∈ P+ }.
We refer to the system (5.3) as the KKT system with respect to µ.4
Note that the condition x > 0 can be relaxed to x ≥ 0, because the third equation in
(5.3) forces strict inequality. Similarly, the condition s > 0 can be replaced by s ≥ 0.
Thus, the first equation in (5.3) is simply the feasibility constraint for the primal
problem (P ) and the second equation is the feasibility constraint for the dual problem
(D). For reasons that we shall make clear later on, the third constraint is referred to
as the centering condition with respect to µ.
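As a quick illustration of the quantities just introduced, the following NumPy sketch (our own, not part of the book) evaluates g̃_µ, its gradient and Hessian, and the residuals of the KKT system (5.3); at the µ-center all three residuals vanish.

import numpy as np

def barrier(x, c, mu):
    return c @ x - mu * np.sum(np.log(x))           # g~_mu(x) = c^T x - mu * sum log x_j

def barrier_grad(x, c, mu):
    return c - mu / x                               # c - mu * x^{-1}

def barrier_hess(x, mu):
    return mu * np.diag(1.0 / x**2)                 # mu * X^{-2}, positive definite for x > 0

def kkt_residuals(x, y, s, A, b, c, mu):
    return A @ x - b, A.T @ y + s - c, x * s - mu   # the three equations of (5.3)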
5.5 The interior-point condition
If the KKT system has a solution for some positive value of the barrier parameter µ,
then the primal feasible region contains a positive vector x, and the dual feasible region
contains a pair (y, s) with positive slack vector s. In short, both P and D contain a
positive vector. At this stage we announce the surprising result that the converse is
also true: if both P and D contain a positive vector, then the KKT system has a
solution for any positive µ. This is a consequence of the following theorem.
Theorem II.4 Let µ > 0. Then the following statements are equivalent:
(i) both P and D contain a positive vector;
(ii) there exists a (unique) minimizer of g̃_µ on P+;
(iii) the KKT system (5.3) has a (unique) solution.
4 The reader who is familiar with the theory of nonlinear optimization will recognize in this system the first-order optimality conditions, also known as Karush–Kuhn–Tucker conditions, for (P_µ).
Proof: The equivalence of (ii) and (iii) has been established above. We have also already observed the implication (iii) ⇒ (i). So the proof of the theorem will be complete if we show (i) ⇒ (ii). The proof of this implication is more sophisticated.
Assuming (i), there exist vectors x0 and y0 such that x0 is feasible for (P) and y0 is feasible for (D), x0 > 0 and s0 := c − A^T y0 > 0. Taking K = g̃_µ(x0) and defining the level set L_K of g̃_µ by
L_K := { x ∈ P+ : g̃_µ(x) ≤ K },
LK := x ∈ P + : g̃µ (x) ≤ K ,
we have x0 ∈ LK , so LK is not empty. Since g̃µ is continuous on its domain, it suffices
to show that LK is compact. Because then g̃µ has a minimizer, and since g̃µ is strictly
convex this minimizer is unique. Thus to complete the proof we show below that LK
is compact.
Let x ∈ LK . Using Proposition II.1 we have
cT x − bT y 0 = xT s0 ,
so, in the definition of g̃_µ(x) we may replace c^T x by b^T y0 + x^T s0:
g̃_µ(x) = c^T x − µ Σ_{j=1}^n log x_j = b^T y0 + x^T s0 − µ Σ_{j=1}^n log x_j.
Since x^T s0 = e^T (xs0) and e^T e = n, this can be written as
g̃_µ(x) = e^T (xs0 − µe) − µ Σ_{j=1}^n log(x_j s0_j/µ) + nµ − nµ log µ + b^T y0 + µ Σ_{j=1}^n log s0_j,
or, equivalently,
e^T (xs0 − µe) − µ Σ_{j=1}^n log(x_j s0_j/µ) = g̃_µ(x) − nµ + nµ log µ − b^T y0 − µ Σ_{j=1}^n log s0_j.
Hence, using g̃_µ(x) ≤ K and defining K̄ by
K̄ := K − nµ + nµ log µ − b^T y0 − µ Σ_{j=1}^n log s0_j,
we obtain
e^T (xs0 − µe) − µ Σ_{j=1}^n log(x_j s0_j/µ) ≤ K̄.    (5.4)
Note that K̄ does not depend on x.
Now let the function ψ : (−1, ∞) → IR be defined by
ψ(t) = t − log(1 + t).    (5.5)
Then, also using e^T e = n, we may rewrite (5.4) as follows:
µ Σ_{j=1}^n ψ( x_j s0_j/µ − 1 ) ≤ K̄.    (5.6)
The rest of the proof is based on some simple properties of the function ψ(t),5 namely
• ψ(t) ≥ 0 for t > −1;
• ψ is strictly convex;
• ψ(0) = 0;
• limt→∞ ψ(t) = ∞;
• limt↓−1 ψ(t) = ∞.
In words: ψ(t) is strictly convex on its domain and minimal at t = 0, with ψ(0) = 0;
moreover, ψ(t) goes to infinity if t goes to one of the boundaries of the domain (−1, ∞)
of ψ. Figure 5.1 depicts the graph of ψ.
Figure 5.1   The graph of ψ.
Since ψ is nonnegative on its domain, each term in the above sum is nonnegative.
Therefore,
µ ψ( x_j s_j^0 / µ − 1 ) ≤ K̄,    1 ≤ j ≤ n.
Now using that ψ(t) is strictly convex, zero at t = 0, and unbounded if t goes to −1
or to infinity, it follows that there must exist unique nonnegative numbers a and b,
5
E. Klafszky drew our attention to the fact that this function is known in the literature. It was
used in a different context for measuring discrepancy between two positive vectors in IRn . See
Csiszár [58] and Klafszky, Mayer and Terlaky [169].
with a < 1, such that
ψ(−a) = ψ(b) = K̄ / µ.
We conclude that
−a ≤ x_j s_j^0 / µ − 1 ≤ b,    1 ≤ j ≤ n,
which gives
µ(1 − a) / s_j^0 ≤ x_j ≤ µ(1 + b) / s_j^0,    1 ≤ j ≤ n.
Since 1 − a > 0, this shows that each coordinate of the vector x belongs to a finite
and closed interval on the set (0, ∞) of positive real numbers. As a consequence, since
the level set LK is a closed subset of the Cartesian product of these intervals, LK is
compact. Thus we have shown that (ii) holds.
✷
The first condition in Theorem II.4 will be referred to as the interior-point condition.
Let us point out once more that the word ‘unique’ in the second statement comes from
the fact that g̃µ is strictly convex, which implies that g̃µ has at most one minimizer.
The equivalence of (ii) and (iii) now justifies the word ‘unique’ in the third statement.
Remark II.5 It is possible to give an elementary proof (i.e., without using the equivalence
of (ii) and (iii) in Theorem II.4) of the fact that the KKT system (5.3) cannot have more
than one solution. This goes as follows. Let x1 , y 1 , s1 and x2 , y 2 , s2 denote two solutions of the
equation system (5.3). Define ∆x := x2 − x1 , and similarly ∆y := y 2 − y 1 and ∆s := s2 − s1 .
Then we may easily verify that
A∆x = 0    (5.7)
A^T ∆y + ∆s = 0    (5.8)
x^1 ∆s + s^1 ∆x + ∆s∆x = 0.    (5.9)
From (5.7) and (5.8) we deduce that ∆sT ∆x = 0, or
eT ∆x∆s = 0.
(5.10)
Rewriting (5.9) gives
(x1 + ∆x)∆s + s1 ∆x = 0.
Since x1 + ∆x = x2 > 0 and s1 > 0, this implies that no two corresponding entries in ∆x
and ∆s have the same sign. So it follows that
∆x∆s ≤ 0.
(5.11)
Combining (5.10) and (5.11), we obtain ∆x∆s = 0. Hence either (∆x)i = 0 or (∆s)i = 0,
for each i. Using (5.9), we conclude that (∆x)i = 0 and (∆s)i = 0, for each i. Hence x1 = x2
and s1 = s2 . Consequently, AT (y 1 − y 2 ) = 0. Since rank (A) = m, the columns of AT are
linearly independent and it follows that y 1 = y 2 . This proves the claim.
•
5.6  The central path
Theorem II.4 has several important consequences. First we remark that the interior-point condition is independent of the barrier parameter. Therefore, since this condition
is equivalent to the existence of a minimizer of the logarithmic barrier function g̃µ ,
if such a minimizer exists for some (positive) µ, then it exists for all µ. Hence, the
interior point condition guarantees that the KKT system (5.3) has a unique solution
for every positive value of µ. These solutions are denoted throughout as x(µ), y(µ)
and s(µ), and we call x(µ) the µ-center of (P ) and (y(µ), s(µ)) the µ-center of (D).
The set
{x(µ) : µ > 0}
of all primal µ-centers represents a parametric curve in the feasible region P of (P )
and is called the central path of (P ). Similarly, the set
{(y(µ), s(µ)) : µ > 0}
is called the central path of (D).
Remark II.6 It may be worthwhile to point out that along the primal central path the primal
objective value cT x(µ) is monotonically decreasing and along the dual central path the dual
objective value bT y(µ) is monotonically increasing if µ decreases. In fact, in both cases the
monotonicity is strict unless the objective value is constant on the feasible region, and in the
latter case the central path is just a point. Although we will not use these results we include
here the proof for the primal case.6 Recall that x(µ) is the (unique) minimizer of the primal
logarithmic barrier function
g̃µ(x) = c^T x − µ Σ_{j=1}^n log x_j,
as given by (5.2), when x runs through the positive vectors in P. First we deal with the
case when the primal objective value is constant on P. We have the following equivalent
statements:
(i) c^T x is constant for x ∈ P;
(ii) x(µ) is constant for µ > 0;
(iii) x(µ1) = x(µ2) for some µ1 and µ2 with 0 < µ1 < µ2;
(iv) there exists a ξ ∈ IR^n such that s(µ) = µξ for µ > 0.
The proof is easy. If (i) holds then the minimizer of g̃µ (x) is independent of µ, and hence
x(µ) is constant for all µ > 0, which means that (ii) holds. The implication (ii) ⇒ (iii) is
obvious. Assuming (iii), let ξ be such that x(µ1 ) = x(µ2 ) = ξ. Since s(µ1 ) = µ1 ξ −1 and
s(µ2 ) = µ2 ξ −1 we have
AT y(µ1 ) + µ1 ξ −1 = c,
AT y(µ2 ) + µ2 ξ −1 = c.
This implies
(µ2 − µ1 ) c = AT (µ2 y(µ1 ) − µ1 y(µ2 )) ,
6
The idea of the following proof is due to Fiacco and McCormick [77]. They deal with the more
general case of a convex optimization problem and prove the monotonicity of the objective value
only for the primal central path. We also refer the reader to den Hertog, Roos and Vial [146] for
a different proof. The proof for the dual central path is similar to the proof for the primal central
path and is left to the reader.
96
II Logarithmic Barrier Approach
showing that c belongs to the row space of A. This means that (i) holds.7 Thus we have
shown the equivalence of (i) to (iii). The equivalence of (ii) and (iv) is immediate from
x(µ)s(µ) = µe for all µ > 0.
Now consider the case where the primal objective value is not constant on P. Letting
0 < µ1 < µ2 and x1 = x(µ1 ) and x2 = x(µ2 ), we claim that cT x1 < cT x2 . The above
equivalence (i) ⇔ (iii) makes it clear that x1 6= x2 . The rest of the proof is based on
the fact that g̃µ (x) is strictly convex. From this we deduce that g̃µ1 (x1 ) < g̃µ1 (x2 ) and
g̃µ2 (x2 ) < g̃µ2 (x1 ). Hence
c^T x^1 − µ1 Σ_{j=1}^n log x_j^1 < c^T x^2 − µ1 Σ_{j=1}^n log x_j^2    (5.12)
and
c^T x^2 − µ2 Σ_{j=1}^n log x_j^2 < c^T x^1 − µ2 Σ_{j=1}^n log x_j^1.    (5.13)
The sums in these inequalities can be eliminated by multiplying both sides of (5.12) by µ2
and both sides of (5.13) by µ1 , and then adding the resulting inequalities. Thus we find
µ2 cT x1 + µ1 cT x2 < µ2 cT x2 + µ1 cT x1 ,
which is equivalent to
(µ2 − µ1) ( c^T x^1 − c^T x^2 ) < 0.
Since µ2 − µ1 > 0 we obtain cT x1 < cT x2 , proving the claim.
•
It is obvious that if one of the problems (P) and (D) is infeasible, then the interior-point condition cannot be satisfied, and hence the central paths do not exist. But
feasibility of both (P ) and (D) is not enough for the existence of the central paths:
the central paths exist if and only if both the primal and the dual feasible region
contain a positive vector. In that case, when the interior-point condition is satisfied,
the central path can be obtained by solving the KKT system.
Unfortunately, the KKT system is nonlinear, and hence in general it will not be
possible to solve it explicitly. In order to understand better the type of nonlinearity,
we show that the KKT system can be reformulated as a system of m polynomial
equations of degree at most n, in the m coordinates of the vector y. This goes as
follows. From the second and the third equations we derive that
x = µ ( c − A^T y )^{−1}.
Substituting this in the first equation we obtain
µ A ( c − A^T y )^{−1} = b.    (5.14)
If we multiply each of the m equations in this system by the product of the n
coordinates of the vector c − AT y, which are linear in the m coordinates yj , we arrive
at m polynomial equations of degree at most n in the coordinates of y.
We illustrate this by a simple example.
7
Exercise 32 Assume that (P ) and (D) satisfy the interior point condition. Prove that the primal
objective value is constant on the primal feasible region P if and only if c = AT λ for some λ ∈ IRm ,
and the dual objective value is constant on the dual feasible region D if and only if b = 0.
Example II.7 Consider the case where8
A = [ 1  −1  0
      0   0  1 ],    c = ( 1, 1, 1 )^T.
For the moment we do not further specify the vector b. The left-hand side of (5.14)
becomes
µ A ( c − A^T y )^{−1} = µ [ 1  −1  0 ;  0  0  1 ] ( 1/(1 − y1),  1/(1 + y1),  1/(1 − y2) )^T = ( 2µ y1 / (1 − y1²),  µ / (1 − y2) )^T.
This means that the KKT system (5.3) is equivalent to the system of equations
( 2µ y1 / (1 − y1²),  µ / (1 − y2) )^T = ( b1, b2 )^T,    ( 1 − y1, 1 + y1, 1 − y2 ) ≥ 0.
We consider this system for special choices of the vector b. Obviously, if b2 ≤ 0 then
the system has no solution, since µ > 0 and 1 − y2 ≥ 0. Note that the second equation
in Ax = b then requires that x3 ≤ 0, showing that the primal feasible region does not
contain a positive vector in that case. Hence, the central path exists only if b2 > 0.
Without loss of generality we may put b2 = 1. Then we find
y2 = 1 − µ.
Now consider the case where b1 = 0:
b = ( 0, 1 )^T.
Then we obtain y1 = 0 from the first equation, and hence for each µ > 0:
x(µ) = (µ, µ, 1)
s(µ) = (1, 1, µ)
y(µ) = (0, 1 − µ).
Thus we have found a parametric representation of the central paths of (P ) and (D).
They are straight half lines in this case. The dual central path (in the y-space) is
shown in Figure 5.2.
8
Note that these data are the same as in the examples D.5, D.6 and D.7 in Appendix D. These
examples differ only in the vector b.
Figure 5.2   The dual central path if b = (0, 1).
Let us also consider the case where b1 = 1:
"
b=
1
1
#
.
The first equation in the reduced KKT system then becomes
y1² + 2µ y1 − 1 = 0,
giving
y1 = −µ ± √(1 + µ²).
The minus sign gives y1 ≤ −1, which implies s2 = 1 + y1 ≤ 0. Since 1 + y1 must be
positive, the unique solution for y1 is determined by the plus sign:
y1 = −µ + √(1 + µ²).
With y(µ) found, the calculation of s(µ) and x(µ) is straightforward, and yields a
parametric representation of the central paths of (P ) and (D). We have for each
µ > 0:
x(µ) = ( (1 + µ + √(1 + µ²))/2,  (−1 + µ + √(1 + µ²))/2,  1 )
s(µ) = ( 1 + µ − √(1 + µ²),  1 − µ + √(1 + µ²),  µ )
y(µ) = ( −µ + √(1 + µ²),  1 − µ ).
The dual central path in the y-space is shown in Figure 5.3.
Figure 5.3   The dual central path if b = (1, 1).
Note that in the above examples the limit of the central path exists if µ approaches
zero, and that the limit point is an optimal solution. In fact this property of the central
path is at the heart of the interior-point methods for solving the problems (P ) and
(D). The central path is used as a guideline to the optimal solution set.
♦
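As an illustration, the parametric expressions found in Example II.7 are easy to check numerically. The following sketch (Python with NumPy assumed) evaluates them for both choices of b and verifies the feasibility and centering conditions Ax = b, A^T y + s = c and xs = µe.

import numpy as np

A = np.array([[1., -1., 0.],
              [0.,  0., 1.]])
c = np.array([1., 1., 1.])

def centers(mu, b1):
    """Closed-form mu-centers of Example II.7 for b = (b1, 1) with b1 in {0, 1}."""
    if b1 == 0:
        x = np.array([mu, mu, 1.0])
        s = np.array([1.0, 1.0, mu])
        y = np.array([0.0, 1.0 - mu])
    else:
        r = np.sqrt(1.0 + mu**2)
        x = np.array([(1.0 + mu + r) / 2, (-1.0 + mu + r) / 2, 1.0])
        s = np.array([1.0 + mu - r, 1.0 - mu + r, mu])
        y = np.array([-mu + r, 1.0 - mu])
    return x, y, s

for b1 in (0, 1):
    b = np.array([float(b1), 1.0])
    for mu in (2.0, 1.0, 0.1, 0.001):
        x, y, s = centers(mu, b1)
        assert np.allclose(A @ x, b)          # primal feasibility
        assert np.allclose(A.T @ y + s, c)    # dual feasibility
        assert np.allclose(x * s, mu)         # centering condition x s = mu e
print("central-path formulas verified")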
5.7  Equivalent formulations of the interior-point condition
Later on we need other conditions that are equivalent to the interior-point condition.
In this section we deal with one of them.
Let x be feasible for the primal problem, and (y, s) for the dual problem. Then,
omitting y, we call (x, s) a primal-dual pair. From Proposition II.1 we recall that the
duality gap for this pair is given by
cT x − bT y = xT s.
We now derive an important consequence of the interior point condition on the level
sets of the duality gap. In doing so, we shall use a simple relationship that we state,
for further use, as a lemma. The relation in the lemma is an immediate consequence
of the orthogonality of the row space and the null space of the matrix A.
Lemma II.8 Assume x̄ ∈ P and s̄ ∈ D. Then for all primal-dual feasible pairs (x, s),
xT s = s̄T x + x̄T s − x̄T s̄.
Proof: From the feasibility assumption, the vectors x − x̄ and s − s̄ are orthogonal,
since the first vector belongs to the null space of A while the second is in the row space
of A. Expanding the scalar product (x − x̄)T (s − s̄) and equating it to zero yields the
result.
✷
Theorem II.9 Let the interior-point assumption hold. Then, for each positive K, the
set of all primal-dual feasible pairs (x, s) such that xT s ≤ K is bounded.
Proof: By the interior-point assumption there exists a positive primal-dual feasible
pair (x̄, s̄). Using the bound x^T s ≤ K in Lemma II.8, we get
s̄T x + x̄T s ≤ K + x̄T s̄.
This implies that both s̄T x and x̄T s are bounded. Since x̄ > 0 and s̄ > 0, we conclude
that all components of x and s must also be bounded.
✷
We can restate Theorem II.9 by saying that the interior-point condition implies that
all level sets of the duality gap are bounded. Interestingly enough, the converse is also
true. If all level sets of the duality gap are bounded, then the interior point condition
is satisfied. This is a consequence of our next result.9
Theorem II.10 Let the feasible regions of (P ) and (D) be nonempty. Then the
following statements are equivalent:
(i) both P and D contain a positive vector;
(ii) the level sets of the duality gap are bounded;
(iii) the optimal sets of (P ) and (D) are bounded.
Proof: The implication (i) ⇒ (ii) is just a restatement of Theorem II.9. The
implication (ii) ⇒ (iii) is obvious, because optimal solutions of (P ) and (D) are
contained in any nonempty level set of the duality gap. The implication (iii) ⇒ (i) in
the theorem is nontrivial and can be proved as follows.
Since the feasible regions of (P ) and (D) are nonempty we have optimal solutions
x∗ and (y ∗ , s∗ ) for (P ) and (D). First assume that the optimal set of (P ) is bounded.
Since x ∈ P is optimal for (P ) if and only if xT s∗ = 0, this set is given by
P∗ = { x : Ax = b, x ≥ 0, x^T s∗ = 0 }.
The boundedness of P∗ implies that the problem
max_x { e^T x : Ax = b, x ≥ 0, x^T s∗ = 0 }
is bounded, and hence it has an optimal solution. Since x and s∗ are nonnegative, the problem is equivalent to
max_x { e^T x : Ax = b, x ≥ 0, x^T s∗ ≤ 0 }.
9
This result was first established by McLinden [197, 198]. See also Megiddo [200].
Hence, the dual of this problem is feasible. The dual is given by
min_{y,λ} { b^T y : A^T y + λ s∗ ≥ e, λ ≥ 0 }.
Let (ȳ, λ̄) be feasible for this problem. Then we have AT ȳ + λ̄s∗ ≥ e. If λ̄ = 0 then
AT ȳ ≥ e, which implies
AT (y ∗ − ȳ) = AT y ∗ − AT ȳ ≤ c − e,
and hence y ∗ − ȳ is dual feasible with positive slack vector. Now let λ̄ > 0. Then,
replacing s∗ by c − AT y ∗ in AT ȳ + λ̄s∗ ≥ e we get
A^T ȳ + λ̄ ( c − A^T y∗ ) ≥ e.
Dividing by the positive number λ̄ we obtain
A^T ( y∗ − ȳ/λ̄ ) + e/λ̄ ≤ c,
showing that y ∗ − ȳ/λ̄ is feasible for (D) with a positive slack vector.
We proceed by assuming that the (nonempty!) optimal set of (D) is bounded. The
same arguments apply in this case. Using that (y, s) ∈ D is optimal for (D) if and
only if sT x∗ = 0, the dual optimal set is given by
D∗ = { (y, s) : A^T y + s = c, s ≥ 0, s^T x∗ = 0 }.
The boundedness of D∗ implies that the problem
max_{y,s} { e^T s : A^T y + s = c, s ≥ 0, s^T x∗ = 0 }
is bounded and hence has an optimal solution. This implies that the problem
max_{y,s} { e^T s : A^T y + s = c, s ≥ 0, s^T x∗ ≤ 0 }
is also feasible and bounded. Hence, the dual problem, given by
min_{x,η} { c^T x : Ax = 0, x + η x∗ ≥ e, η ≥ 0 },
is feasible and bounded as well. We only use the feasibility. Let (x̄, η̄) be a feasible
solution. Then x̄ + η̄x∗ ≥ e and Ax̄ = 0. If η̄ = 0 then we have x∗ + x̄ ≥ e > 0 and
A (x∗ + x̄) = Ax̄ + Ax∗ = b, whence x∗ + x̄ is a positive vector in P. If η̄ > 0 then we
write
A ( x̄/η̄ + x∗ ) = (1/η̄) Ax̄ + Ax∗ = b,
yielding that the positive vector x̄/η̄ + x∗ is feasible for (P ). Thus we have shown that
(iii) implies (i), completing the proof.
✷
Each of the three statements in Theorem II.10 deals with properties of both (P )
and (D). We also have two one-sided versions of Theorem II.10 in which we have three
equivalent statements where each statement involves a property of (P ) or a property
of (D). We state these results as corollaries, in which a primal level set means any set
of the form
{ x ∈ P : c^T x ≤ K }
and a dual level set means any set of the form
{ y ∈ D : b^T y ≥ K },
where K may be any real number. The first corollary follows.
Corollary II.11 Let the feasible regions of (P ) and (D) be nonempty. Then the
following statements are equivalent:
(i) P contains a positive vector;
(ii) the level sets of the dual objective are bounded;
(iii) the optimal set of (D) is bounded.
Proof: Recall that the hypothesis in the corollary implies that the optimal sets of
(P ) and (D) are nonempty. The proof is cyclic, and goes as follows.
(i) ⇒ (ii): Letting x̄ ∈ P, with x̄ > 0, we show that each level set of the dual
objective is bounded. For any number K let DK be the corresponding level set of the
dual objective:
DK = { (y, s) ∈ D : b^T y ≥ K }.
Then (y, s) ∈ DK implies
sT x̄ = cT x̄ − bT y ≤ cT x̄ − K.
Since x̄ > 0, the i-th coordinate of s must be bounded above by (cT x̄ − K)/x̄i .
Therefore, DK is bounded.
(ii) ⇒ (iii): This implication is trivial, because the optimal set of (D) is a level set
of the dual objective.
(iii) ⇒ (i): This implication has been obtained as part of the proof of Theorem II.10.
✷
The proof of the second corollary goes in the same way and is therefore omitted.
Corollary II.12 Let the feasible regions of (P ) and (D) be nonempty. Then the
following statements are equivalent:
(i) D contains a positive vector;
(ii) the level sets of the primal objective are bounded;
(iii) the optimal set of (P ) is bounded.
We conclude this section with some interesting consequences of these corollaries.
We assume that the feasible regions P and D are nonempty.
Corollary II.13 D is bounded if and only if the null space of A contains a positive
vector.
Proof: The dual feasible region remains unchanged if we put b = 0. In that case D
coincides with the optimal set D∗ of (D), and this is the only nonempty dual level set.
Hence, Corollary II.11 yields that D is bounded if and only if P contains a positive
vector. Since b = 0 this gives the result.
✷
Corollary II.14 P is bounded if and only if the row space of A contains a positive
vector.
Proof: The primal feasible region remains unchanged if we put c = 0. Now P coincides
with the primal optimal set P∗ of (P), and Corollary II.12 yields that P is bounded
if and only if D contains a positive vector. Since c = 0 this gives the result.
✷
Note that the word ‘positive’ in the last two corollaries could be replaced by the word
‘negative’, because a linear space contains a positive vector if and only if it contains
a negative vector. An immediate consequence of Corollary II.13 and Corollary II.14 is
as follows.
Corollary II.15 At least one of the two sets P and D is unbounded.
Proof: If both sets are bounded then there exist a positive vector x and a vector y
such that Ax = 0 and AT y > 0. This gives the contradiction
0 = (Ax)^T y = x^T ( A^T y ) > 0.
The result follows.
✷
Remark II.16 If (P ) and (D) satisfy the interior-point condition then for every positive
µ we have a primal-dual pair (x, s) such that xs = µe. Letting µ go to infinity, it follows that
for each index i the product xi si goes to infinity. Therefore, at least one of the coordinates
xi and si must be unbounded. It can be shown that exactly one of these two coordinates is
unbounded and the other is bounded. This is an example of a coordinatewise duality property.
We will not go further in this direction here, but refer the reader to Williams [291, 292] and
to Güler et al. [134].
•
5.8  Symmetric formulation
In this chapter we dealt with the LO problem in standard form
(P)    min { c^T x : Ax = b, x ≥ 0 },
and its dual problem
(D)    max { b^T y : A^T y + s = c, s ≥ 0 }.
Note that there is an asymmetry in problems (P ) and (D). The constraints in (P ) and
(D) are equality constraints, but in (P ) all variables are nonnegative, whereas in (D)
we also have free variables, in y. Note that we could eliminate s in the formulation of
(D), leaving us with the inequality constraints A^T y ≤ c, but this would not remove the
asymmetry in the formulations.
We could have avoided the asymmetry by using a different format for problem (P ),
but because the chosen format is more or less standard in the literature, we decided
to use the standard format in this chapter and to accept its inherent asymmetry. Note
that the asymmetry is also reflected in the KKT system. This is especially true for
the first two equations, because the third equation is symmetric in x and s.
In this section we make an effort to show that it is quite easy to obtain a perfect
symmetry in the formulations. This has some practical value. It implies that every
concept, or result, or algorithm for one of the two problems, has its natural counterpart
for the other problem. It will also highlight the underlying geometry of an LO problem.
Let us define the linear space L as the null space of the matrix A:
L = { x ∈ IR^n : Ax = 0 },    (5.15)
and let L⊥ denote the orthogonal complement of L. Then, due to a well known result
in linear algebra, L⊥ is the row space of the matrix A, i.e.,
L⊥ = { A^T y : y ∈ IR^m }.    (5.16)
Now let x̄ be any vector satisfying Ax̄ = b. Then x is primal feasible if and only if x ∈ x̄ + L and x ∈ IR^n_+. So the primal problem can be reformulated as
(P′)    min { c^T x : x ∈ (x̄ + L) ∩ IR^n_+ }.
So, (P ) amounts to minimizing the linear function cT x over the intersection of the
affine space x̄ + L and the nonnegative orthant IRn+ .
We can put (D) in the same format by eliminating the vector y of free variables.
To this end we observe that s ∈ IRn is feasible for (D) if and only if s ∈ c + L⊥ and
s ∈ IRn+ . Given any vector s ∈ c + L⊥ , let y be such that AT y + s = c. Then
b^T y = (Ax̄)^T y = x̄^T A^T y = x̄^T (c − s) = c^T x̄ − x̄^T s.    (5.17)
Omitting the constant cT x̄, it follows that solving (D) is equivalent to solving the
problem
(D′)    min { x̄^T s : s ∈ (c + L⊥) ∩ IR^n_+ }.
Thus we see that the dual problem amounts to minimizing the linear function x̄T s
over the intersection of the affine space c + L⊥ and the nonnegative orthant IRn+ .
The similarity with reformulation (P ′ ) is striking: both problems are minimization
problems, the roles of the vectors x̄ and c are interchanged, and the underlying linear
spaces are each other's orthogonal complement. An immediate consequence is also that
the dual of the dual problem is the primal problem.10 The KKT conditions can now
be expressed in a way that is completely symmetric in x and s:
x ∈ (x̄ + L) ∩ IR^n_+,    x > 0,
s ∈ (c + L⊥) ∩ IR^n_+,    s > 0,
xs = µe.    (5.18)
10
The affine spaces c + L⊥ and x̄ + L intersect in a unique point ξ ∈ IR^n. Hence, we could even take c = x̄ = ξ.
Due to (5.17), we conclude that on the dual feasible region, bT y and x̄T s sum up to
the constant cT x̄.
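The symmetric formulation is easy to experiment with numerically. The sketch below (Python with NumPy and SciPy assumed; the data are those of Example II.7 with b = (1, 1)) builds bases for L and L⊥ and checks relation (5.17), i.e. that b^T y + x̄^T s is constant on the dual affine space.

import numpy as np
from scipy.linalg import null_space

A = np.array([[1., -1., 0.],
              [0.,  0., 1.]])
b = np.array([1., 1.])
c = np.array([1., 1., 1.])

L = null_space(A)                      # columns span L = {x : Ax = 0}
L_perp = A.T                           # columns span L_perp = row space of A
assert np.allclose(L.T @ L_perp, 0)    # the two subspaces are orthogonal

x_bar = np.linalg.lstsq(A, b, rcond=None)[0]   # any point with A x_bar = b
assert np.allclose(A @ x_bar, b)

# On the dual affine set c + L_perp, b^T y + x_bar^T s = c^T x_bar, as in (5.17).
rng = np.random.default_rng(0)
for _ in range(5):
    y = rng.normal(size=2)
    s = c - A.T @ y                    # s lies in c + L_perp for every y
    assert np.isclose(b @ y + x_bar @ s, c @ x_bar)
print("b^T y + x_bar^T s = c^T x_bar on the dual affine space")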
5.9  Dual logarithmic barrier function
We conclude this chapter by introducing the dual logarithmic barrier function, using
the symmetry that has now become apparent. Recall that for any positive µ the primal
µ-center x(µ) has been characterized as the minimizer of the primal logarithmic barrier
function g̃µ (x), as given by (5.2):
g̃µ(x) = c^T x − µ Σ_{j=1}^n log x_j.
Using the symmetry, we obtain that the dual µ-center s(µ) can be characterized as
the minimizer of the function
h̃µ(s) := x̄^T s − µ Σ_{j=1}^n log s_j,    (5.19)
where s runs through all positive dual feasible slack vectors. According to (5.17), we
may replace x̄T s by cT x̄ − bT y. Omitting the constant cT x̄, it follows that (y(µ), s(µ))
is the minimizer of the function
kµ(y, s) = −b^T y − µ Σ_{j=1}^n log s_j.
The last function is usually called the dual logarithmic barrier function. Recall that for
any dual feasible pair (y, s), h̃µ (s) and kµ (y, s) differ by a constant only. It may often
be preferable to use h̃µ (s), because then we only have to deal with the nonnegative
slack vectors, and not with the free variable y. It will be convenient to refer also to
h̃µ (s) as the dual logarithmic barrier function.
From now on we assume that the interior point condition is satisfied, unless
stated otherwise. As a consequence, both the primal and the dual logarithmic barrier
functions have a minimizer, for each µ > 0. These minimizers are denoted by x(µ)
and s(µ) respectively.
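For a small problem the dual µ-center can be computed directly by minimizing the dual logarithmic barrier function over y. The sketch below (Python with NumPy and SciPy assumed; a general-purpose minimizer is used here only for illustration) recovers y(µ) for Example II.7 with b = (1, 1) and compares it with the closed-form expression found there.

import numpy as np
from scipy.optimize import minimize

A = np.array([[1., -1., 0.],
              [0.,  0., 1.]])
b = np.array([1., 1.])
c = np.array([1., 1., 1.])
mu = 0.5

def k_mu(y):
    # dual log-barrier as a function of y alone (s = c - A^T y is determined by y)
    s = c - A.T @ y
    if np.any(s <= 0):          # outside the dual interior the barrier is +infinity
        return np.inf
    return -b @ y - mu * np.sum(np.log(s))

y_num = minimize(k_mu, x0=np.array([0., 0.]), method='Nelder-Mead',
                 options={'xatol': 1e-10, 'fatol': 1e-12}).x
y_exact = np.array([-mu + np.sqrt(1 + mu**2), 1 - mu])   # closed form from Example II.7
print(y_num, y_exact)           # the two should agree closely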
6  The Dual Logarithmic Barrier Method
In the previous chapter we introduced the central path of a problem as the set
consisting of all µ-centers, with µ running through all positive real numbers. Using
this we can now easily describe the basic idea behind the logarithmic barrier method.
We do so for the dual problem in standard format:
(D)    max { b^T y : A^T y + s = c, s ≥ 0 }.
Recall that any method for the dual problem can also be used for solving the primal
problem, because of the symmetry discussed in Section 5.8. The dual problem has the
advantage that its feasible region—in the y-space—can be drawn if its dimension is
small enough (m = 1, 2 or 3). This enables us to illustrate graphically some aspects
of the methods to be described below.
6.1  A conceptual method
We assume that we know the µ-centers y(µ) and s(µ) for some positive µ = µ0 . Later
on, in Chapter 8, we show that this assumption can be made without loss of generality.
Given s(µ), the primal µ-center x(µ) follows from the relation
x(µ)s(µ) = µe.
Now the duality gap for the pair of µ-centers is given by
cT x(µ) − bT y(µ) = x(µ)T s(µ) = nµ.
The last equality follows since we have for each i that
xi (µ)si (µ) = µ.
It follows that if µ goes to zero, then the duality gap goes to zero as well. As a
consequence we have that if µ is small enough, then the pair (y(µ), s(µ)) is ‘almost’
optimal for the dual problem. This can also be seen by comparing the dual objective
value bT y(µ) with the optimal value of (D). Denoting the optimal value of (P ) and
(D) by z ∗ we know from Proposition II.1 that
bT y(µ) ≤ z ∗ ≤ cT x(µ),
so we have
z ∗ − bT y(µ) ≤ cT x(µ) − bT y(µ) = x(µ)T s(µ) = nµ,
and
cT x(µ) − z ∗ ≤ cT x(µ) − bT y(µ) = x(µ)T s(µ) = nµ.
Thus, if µ is chosen small enough, the primal objective value cT x(µ) and the dual
objective value bT y(µ) can simultaneously be driven arbitrarily close to the optimal
value. We thus have to deal with the question of how to obtain the µ-centers for small
enough values of µ.
Now let µ∗ be obtained from µ by
µ∗ := (1 − θ) µ,
where θ is a positive constant smaller than 1. We may expect that if θ is not too large,
the µ∗ -centers will be close to the given µ-centers.1 For the moment, let us assume
that we are able to calculate the µ∗ -centers, provided θ is not too large. Then the
following conceptual algorithm can be used to find ε-optimal solutions of both (P )
and (D).
Conceptual Logarithmic Barrier Algorithm
Input:
An accuracy parameter ε > 0;
a barrier update parameter θ, 0 < θ < 1;
the center (y(µ0 ), s(µ0 )) for some µ0 > 0.
begin
µ := µ0 ;
while nµ ≥ ε do
begin
µ := (1 − θ)µ;
s := s(µ);
end
end
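For Example II.7 with b = (0, 1) the µ-centers are available in closed form, so the conceptual algorithm can actually be executed. A minimal sketch (Python with NumPy assumed) reads as follows.

import numpy as np

def conceptual_barrier_method(mu0, theta, eps, n=3):
    """Conceptual algorithm run on Example II.7 with b = (0,1),
    where the mu-centers are known in closed form."""
    mu = mu0
    y = np.array([0.0, 1.0 - mu])          # the given mu0-center
    s = np.array([1.0, 1.0, mu])
    while n * mu >= eps:
        mu *= (1 - theta)
        y = np.array([0.0, 1.0 - mu])      # exact mu-center, closed form here
        s = np.array([1.0, 1.0, mu])
    x = mu / s                             # x(mu) from the centering condition
    return x, y, s, mu

x, y, s, mu = conceptual_barrier_method(mu0=2.0, theta=0.5, eps=1e-6)
print(x @ s, 3 * mu)     # the duality gap x^T s equals n*mu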
Recall that, given the dual center s(µ), the primal center x(µ) can be calculated
immediately from the centering condition at µ. Hence, the output of this algorithm
is a feasible primal-dual pair of solutions for (P ) and (D) such that the duality gap
does not exceed ε. How many iterations are needed by the algorithm? The answer is
provided by the following lemma.
1
This is a consequence of the fact that the µ-centers depend continuously on the barrier parameter
µ, due to a result of Fiacco and McCormick [77]. See also Chapter 16.
Lemma II.17 If the barrier parameter µ has the initial value µ0 and is repeatedly
multiplied by 1 − θ, with 0 < θ < 1, then after at most
(1/θ) log ( nµ0 / ε )
iterations we have nµ ≤ ε.
Proof: Initially the duality gap is nµ0 , and in each iteration it is reduced by the
factor 1 − θ. Hence, after k iterations the duality gap is smaller than ε if
(1 − θ)k nµ0 ≤ ε.
The rest of the proof goes in the same way as in the proof of Lemma I.36. Taking logarithms we get
k log(1 − θ) + log(nµ0) ≤ log ε.
Since − log(1 − θ) ≥ θ, this certainly holds if
kθ ≥ log(nµ0) − log ε = log ( nµ0 / ε ).
This implies the lemma.    ✷
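A quick numerical sanity check of this bound (a Python sketch; the function names are ours) compares the exact number of barrier updates with the bound of Lemma II.17.

import math

def bound(n, mu0, eps, theta):
    # upper bound of Lemma II.17 on the number of barrier updates
    return math.ceil(math.log(n * mu0 / eps) / theta)

def exact(n, mu0, eps, theta):
    # count multiplications by (1 - theta) until n*mu < eps (the loop guard above)
    mu, k = mu0, 0
    while n * mu >= eps:
        mu *= 1 - theta
        k += 1
    return k

for theta in (0.5, 0.1925, 0.05):
    print(theta, exact(3, 2.0, 1e-4, theta), bound(3, 2.0, 1e-4, theta))
# the exact count never exceeds the bound, since -log(1 - theta) >= theta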
To make the algorithm more practical, we have to avoid the exact calculation of the
µ-center s(µ). This is the subject of the following sections.
6.2  Using approximate centers
Recall that any µ-center is the minimizer for the corresponding logarithmic barrier
function. Therefore, by minimizing the corresponding logarithmic barrier function we
will find the µ-center. Since the logarithmic barrier function has a positive definite
Hessian, Newton’s method is a natural candidate for this purpose. If we know the
µ-center, then defining µ∗ by µ∗ := (1 − θ)µ, just as in the preceding section, we
can move to the µ∗ -center by applying Newton’s method to the logarithmic barrier
function corresponding to µ∗ , starting at the µ-center. Having reached the µ∗ -center,
we can repeat this process until the barrier parameter has become small enough. In
fact this would yield an implementation of the conceptual algorithm of the preceding
section. Unfortunately, however, after the update of the barrier parameter to µ∗, finding the µ∗-center exactly would require infinitely many Newton steps. To restrict the
number of Newton steps between two successive updates of the barrier parameter, we
do not calculate the µ∗ -center exactly, but instead use an approximation of it. Our
first aim is to show that this can be done in such a way that only one Newton step
is taken between two successive updates of the barrier parameter. Later on we deal
with a different approach where the number of Newton steps between two successive
updates of the barrier parameter may be larger than one.
In the following sections we are concerned with a more detailed analysis of the
use of approximate centers. In the analysis we need to measure the proximity of
an approximate center to the exact center. We also have to study the behavior of
Newton’s method when applied to the logarithmic barrier function. We start in the
next section with the calculation of the Newton step. Then we proceed to defining a
proximity measure and deal with some related properties. After this we can formulate
the algorithm, and analyze it.
6.3  Definition of the Newton step
In this section we assume that we are given a dual feasible pair (y, s), and, by applying
Newton’s method to the dual logarithmic barrier function corresponding to the barrier
parameter value µ, we try to find the minimizer of this function, which is the pair
(y(µ), s(µ)). Recall that the dual logarithmic barrier function is the function kµ (y, s)
defined by
kµ(y, s) := −b^T y − µ Σ_{i=1}^n log s_i,
where (y, s) runs through all dual feasible pairs with positive slack vector s. Recall
also that y and s are related by the dual feasibility condition
AT y + s = c,
s ≥ 0,
and since we assume that A has full rank, this defines a one-to-one correspondence
between the components y and s in dual feasible pairs. As a consequence, we can
consider kµ (y, s) as a function of s alone. In Section 5.8 we showed that kµ (y, s) differs
only by the constant cT x̄ from
h̃µ(s) = x̄^T s − µ Σ_{j=1}^n log s_j,
provided Ax̄ = b.
Our present aim is to compute the minimizer s(µ) of h̃µ (s). Assuming s 6= s(µ), we
construct a search direction by applying Newton’s method to h̃µ (s). We first calculate
the first and second derivatives of h̃µ (s) with respect to s, namely
∇h̃µ (s) = x̄ − µs−1 ,
∇2 h̃µ (s) = µS −2 ,
where, as usual, S = diag (s). The Newton step ∆s — in the s-space — is the minimizer
of the second-order approximation of h̃µ (s + ∆s) at s, which is given by
t(∆s) := h̃µ(s) + ( x̄ − µ s^{−1} )^T ∆s + (1/2) ∆s^T ( µ S^{−2} ) ∆s,
subject to the condition that s+ ∆s is dual feasible. The latter means that there exists
∆y such that
AT (y + ∆y) + s + ∆s = c.
Since AT y + s = c, this is equivalent to
AT ∆y + ∆s = 0
II.6 Dual Logarithmic Barrier Method
111
for some ∆y.
We make use of an (n − m) × n matrix H whose null space is equal to the row space
of A. Then the condition on ∆s simply means that H∆s = 0, which is equivalent to
∆s ∈ null space of H.
Using Proposition A.1, we find that ∆s minimizes t(∆s) if and only if
∇t(∆s) = x̄ − µs−1 + µs−2 ∆s ⊥ null space of H.
It is useful to restate these conditions in terms of the matrix HS:2
sx̄ − µe + µ s^{−1}∆s ⊥ null space of HS,
and
µ s^{−1}∆s ∈ null space of HS.
Therefore, writing
sx̄ − µe = −µ s^{−1}∆s + ( sx̄ − µe + µ s^{−1}∆s ),
we have a decomposition of the vector sx̄ − µe into two components, with the first
component in the null space of HS and the second component orthogonal to the
null space of HS. Stated otherwise, µs−1 ∆s is the orthogonal projection of µe − sx̄
into the null space of HS. Hence we have shown that
µs−1 ∆s = PHS (µe − sx̄) .
(6.1)
From this relation the Newton step ∆s can be calculated. Since the projection matrix
PHS 3 is given by
PHS = I − SH^T ( HS²H^T )^{−1} HS,
we obtain the following expression for ∆s:
∆s = s ( I − SH^T ( HS²H^T )^{−1} HS ) ( e − sx̄/µ ).
Recall that x̄ may be any vector such that Ax̄ = b. It follows that the right-hand
side in (6.1) must be independent of x̄. It is left to the reader to verify that this is
indeed true.4,5,6 We are now going to explore this in a surprising way with extremely
important consequences.
2
Exercise 33 Let S be a square and nonsingular matrix and H be any other matrix such that the product HS is well defined. Then x ∈ null space of H if and only if S^{−1}x ∈ null space of HS, and x ⊥ null space of H if and only if Sx ⊥ null space of HS^T. Prove this.
3
For any matrix Q the matrix of the orthogonal projection onto the null space of Q is denoted as PQ.
4
Exercise 34 Show that PHS(s∆x) = 0 whenever A∆x = 0.
5
Exercise 35 The Newton step in the y-space is given by
∆y = ( AS^{−2}A^T )^{−1} ( b/µ − AS^{−1}e ).
Prove this. (Hint: Use that A^T ∆y + ∆s = 0.)
6
Observe that the computation of ∆s requires the inversion of the matrix HS²H^T, and the computation of ∆y the inversion of the matrix AS^{−2}A^T. It is not clear in general which of the two inversions is more attractive from a computational point of view.
If we let x̄ run through the affine space Ax̄ = b then the vector µe − sx̄ runs through
another affine space that is parallel to the null space of AS −1 . Now using that
null space of AS −1 = row space of HS,
we conclude that the affine space consisting of all vectors µe − sx̄, with Ax̄ = b, is
orthogonal to the null space of HS. This implies that these two spaces intersect in a
unique point. Hence there exists a unique vector x̄ satisfying Ax̄ = b such that µe − sx̄
belongs to the null space of HS. We denote this vector as x(s, µ). From its definition
we have
PHS (µe − sx(s, µ)) = µe − sx(s, µ),
thus yielding the following expression for the Newton step:
µs−1 ∆s = µe − sx(s, µ).
(6.2)
Figure 6.1 depicts the situation.
Figure 6.1   The projection yielding s^{−1}∆s: projecting the affine space {µe − sx̄ : Ax̄ = b} orthogonally onto the null space of HS gives µ s^{−1}∆s = µe − s x(s, µ); the row space of HS equals the null space of AS^{−1}.
Another important feature of the vector x(s, µ) is that it minimizes the 2-norm of
µe − sx̄ in the affine space Ax̄ = b. Hence, x(s, µ) can be characterized by the property
x(s, µ) = argminx {kµe − sxk : Ax = b} .
(6.3)
We summarize these results in a theorem.
Theorem II.18 Let s be any positive dual feasible slack vector. Then the Newton
step ∆s at s with respect to the dual logarithmic barrier function corresponding to the
barrier parameter value µ satisfies (6.2), with x(s, µ) as defined in (6.3).
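Theorem II.18 translates directly into a small computation: x(s, µ) solves an equality-constrained least-squares problem, and ∆s then follows from (6.2). The following sketch (Python with NumPy assumed; newton_step is an illustrative helper name) also recovers ∆y via the formula of Exercise 35 and checks the consistency relation A^T∆y + ∆s = 0.

import numpy as np

def newton_step(A, b, c, y, mu):
    """Newton step for the dual log-barrier function at a dual feasible y (sketch).

    x(s, mu) = argmin { ||mu*e - s*x|| : Ax = b }         (6.3)
    mu * s^{-1} * Delta_s = mu*e - s*x(s, mu)             (6.2)
    Delta_y = (A S^{-2} A^T)^{-1} (b/mu - A s^{-1})       (Exercise 35)
    """
    s = c - A.T @ y                       # dual slack, assumed positive
    m, n = A.shape
    # KKT system of min ||diag(s) x - mu*e||^2 subject to A x = b
    K = np.block([[np.diag(s**2), A.T],
                  [A,             np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([mu * s, b]))
    x = sol[:n]                           # x(s, mu)
    ds = s * (mu - s * x) / mu            # Delta_s, from (6.2)
    dy = np.linalg.solve(A @ np.diag(s**-2.0) @ A.T, b / mu - A @ (1.0 / s))
    assert np.allclose(A.T @ dy + ds, 0)  # the two steps are consistent
    return x, ds, dy

# Example II.7 with b = (0,1), at the 2-center y = (0,-1): the step vanishes
A = np.array([[1., -1., 0.], [0., 0., 1.]])
x, ds, dy = newton_step(A, np.array([0., 1.]), np.array([1., 1., 1.]),
                        np.array([0., -1.]), mu=2.0)
print(ds, dy)    # both (numerically) zero, since s = s(2)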
6.4  Properties of the Newton step
We denote the result of the Newton step at s by s+ . Thus we may write
s+ := s + ∆s = s ( e + s^{−1}∆s ).
A major question is whether s+ is feasible or not. Another important question is
whether x(s, µ) is primal feasible. In this section we deal with these two questions,
and we show that both questions allow a perfect answer.
We start with the feasibility of s+ . Clearly, s+ is feasible if and only if s+ is
nonnegative, and this is true if and only if
e + s−1 ∆s ≥ 0.
(6.4)
We conclude that the (full) Newton step is feasible if (6.4) is satisfied.
Let us now consider the vector x(s, µ). By definition, it satisfies the equation Ax = b,
so if it is nonnegative, then x(s, µ) is primal feasible. We can derive a simple condition
for that. From (6.2) we obtain that
x(s, µ) = µ s^{−1} ( e − s^{−1}∆s ).
(6.5)
We conclude that x(s, µ) is primal feasible if and only if
e − s−1 ∆s ≥ 0.
(6.6)
Combining this result with (6.4) we state the following lemma.
Lemma II.19 If the Newton step ∆s satisfies
−e ≤ s−1 ∆s ≤ e
then x(s, µ) is primal feasible, and s+ = s + ∆s is dual feasible.
Remark II.20 We make an interesting observation. Since s is positive, (6.6) is
equivalent to
s − ∆s ≥ 0.
Note that s − ∆s is obtained by moving from s in the opposite direction of the Newton
step. Thus we conclude that x(s, µ) is primal feasible if and only if a backward Newton
step yields a feasible point for the dual problem.
We conclude this section by considering the special case where ∆s = 0. From (6.2)
we deduce that this occurs if and only if sx(s, µ) = µe, i.e., if and only if s and x(s, µ)
satisfy the centering condition with respect to µ. Since s and x(s, µ) are positive, they
satisfy the KKT conditions. Now the unicity property gives us that x(s, µ) = x(µ)
and s = s(µ). Thus we see that the Newton step at s is equal to the zero vector if and
only if s = s(µ). This could have been expected, because s(µ) is the minimizer of the
dual logarithmic barrier function.
6.5  Proximity and local quadratic convergence
Lemma II.19 in the previous section states under what conditions the Newton step
yields feasible solutions on both the dual and the primal side. This turned out to be
the case when
−e ≤ s−1 ∆s ≤ e.
Observe that these inequalities can be rephrased simply by saying that the infinity
norm of the vector s−1 ∆s does not exceed 1. We refer to s−1 ∆s as the Newton step
∆s scaled by s, or, in short, the scaled Newton step at s.
In the analysis of the logarithmic barrier method we need a measure for the ‘distance’
of s to the µ-center s(µ). The above observation might suggest that the infinity norm
of the scaled Newton step could be used for that purpose. However, it turns out to
be more convenient to use the 2-norm of the scaled Newton step. So we measure the
proximity of s to s(µ) by the quantity7
δ(s, µ) := ‖ s^{−1}∆s ‖.    (6.7)
At the end of the previous section we found that the Newton step ∆s vanishes if and
only if s is equal to s(µ). As a consequence we have
δ(s, µ) = 0 ⇐⇒ s = s(µ).
The obvious question that we have to deal with is about the improvement in the
proximity to s(µ) after a feasible Newton step. The next theorem provides a very
elegant answer to this question. In the proof of this theorem we need a different
characterization of the proximity δ(s, µ), which is an immediate consequence of
Theorem II.18, namely
δ(s, µ) = ‖ e − (1/µ) s x(s, µ) ‖ = (1/µ) min_x { ‖ µe − sx ‖ : Ax = b }.    (6.8)
We have the following result.
Theorem II.21 If δ(s, µ) ≤ 1, then x(s, µ) is primal feasible, and s+ = s + ∆s is
dual feasible. Moreover,
δ(s+ , µ) ≤ δ(s, µ)2 .
Proof: The first part of the theorem is an obvious consequence of Lemma II.19,
because the infinity norm of s−1 ∆s does not exceed its 2-norm and hence does not
exceed 1. Now let us turn to the proof of the second statement. Using (6.8) we write
δ(s+, µ) = (1/µ) min_x { ‖ µe − s+ x ‖ : Ax = b }.
7
Exercise 36 If s = s(µ) then we know that µs^{−1} is primal feasible. Now let δ = δ(s, µ) > 0 and consider x = µs^{−1}. Let Q = AS^{−2}A^T. Then Q is positive definite, and so is its inverse. Hence Q^{−1} defines a norm that we denote as ‖·‖_{Q^{−1}}. Thus, for any z ∈ IR^n:
‖z‖_{Q^{−1}} = √( z^T Q^{−1} z ).
Measuring the amount of infeasibility of x in the sense of this norm, prove that
‖Ax − b‖_{Q^{−1}} = µδ.
Substituting for x the vector x(s, µ) we obtain the inequality
δ(s+, µ) ≤ (1/µ) ‖ µe − s+ x(s, µ) ‖.    (6.9)
The vector µe − s+ x(s, µ) can be reduced as follows:
µe − s+ x(s, µ) = µe − (s + ∆s) x(s, µ) = µe − s x(s, µ) − ∆s x(s, µ).
From (6.2) this implies
µe − s+ x(s, µ) = µ s^{−1}∆s − ∆s x(s, µ) = ( µe − s x(s, µ) ) s^{−1}∆s = µ ( s^{−1}∆s )².    (6.10)
Thus we obtain, by substitution of this equality in (6.9),
δ(s+, µ) ≤ ‖ ( s^{−1}∆s )² ‖ ≤ ‖ s^{−1}∆s ‖_∞ ‖ s^{−1}∆s ‖.
Now from the obvious inequality ‖z‖_∞ ≤ ‖z‖, with z = s^{−1}∆s, the result follows. ✷
Theorem II.21 implies that after a Newton step the proximity to the µ-center
does not exceed the square of the proximity before the Newton step. In other
words, Newton’s method is quadratically convergent. Moreover, the theorem defines
a neighborhood of the µ-center s(µ) where the quadratic convergence occurs, namely
{s ∈ D : δ(s, µ) < 1} .
(6.11)
This result is extremely important. It implies that when the present iterate s is close
to s(µ), then only a small number of Newton steps brings us very close to s(µ). For
instance, if δ(s, µ) = 0.5, then only 6 Newton steps yield an iterate with proximity
less than 10−16 . Figure 6.2 shows a graph depicting the required number of steps to
reach proximity 10^{−16} when starting at any given value of the proximity in the interval (0, 1).
Figure 6.2   Required number of Newton steps to reach proximity 10^{−16}.
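The quadratic convergence is easy to observe numerically. The snippet below (reusing the newton_step sketch of Section 6.3, for Example II.7 with b = (0, 1)) performs repeated full Newton steps at the fixed value µ = 2 and prints the proximity (6.8) after each step; the printed values decrease roughly quadratically.

import numpy as np
# reuses the newton_step sketch from Section 6.3

A = np.array([[1., -1., 0.], [0., 0., 1.]])
b = np.array([0., 1.])
c = np.array([1., 1., 1.])
y, mu, n = np.array([0.4, 0.2]), 2.0, 3   # starting point with delta(s, 2) < 1

for k in range(6):
    x, ds, dy = newton_step(A, b, c, y, mu)
    s = c - A.T @ y
    delta = np.linalg.norm(np.ones(n) - s * x / mu)   # proximity (6.8)
    print(k, delta)                                   # roughly squares at every step
    y = y + dy                                        # full Newton step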
We can also consider it differently. If we repeatedly apply Newton steps, starting at
s0 = s, then after k Newton steps the resulting point, denoted by sk , satisfies
δ(s^k, µ) ≤ δ(s^0, µ)^(2^k).
Hence, taking logarithms on both sides,
− log δ(s^k, µ) ≥ −2^k log δ(s^0, µ),
see Figure 6.3 (page 116).
Figure 6.3   Convergence rate of the Newton process (lower bound for −log δ(s^k, µ) / (−log δ(s^0, µ)) against the iteration number k).
The above algebraic proof of the quadratic convergence property is illustrated
geometrically by Figure 6.4 (page 117). Like in Figure 6.1, in Figure 6.4 the
null space of HS and the row space of HS are represented by perpendicular axes.
From (6.1) we know that the orthogonal projection of any vector µe − sx, with
Ax = b, into the null space of HS yields µs−1 ∆s. Hence the norm of this projection
is equal to µδ(s, µ). In other words, µδ(s, µ) is equal to the Euclidean distance from
the affine space {µe − sx : Ax = b} to the origin. Therefore, the proximity after the
Newton step, given by µδ(s+ , µ), is the Euclidean distance from the affine space
{µe − s+ x : Ax = b} to the origin. The affine space {µe − s+ x : Ax = b} contains
the vector µe − s+ x(s, µ), which is equal to µ ( s^{−1}∆s )², from (6.10). Hence, µδ(s+, µ)
does not exceed the norm of this vector.
The properties of the proximity measure δ(s, µ) described in Theorem II.21 are
illustrated graphically in the next example. In this example we draw some level curves
for the proximity measure for some fixed value of the barrier parameter µ, and we
show how the Newton step behaves when applied at some points inside and outside
the region of quadratic convergence, as given by (6.11). We do this for some simple
problems.
Figure 6.4   The proximity before and after a Newton step: µδ(s, µ) and µδ(s+, µ) are the distances from the affine spaces {µe − sx̄ : Ax̄ = b} and {µe − s+x̄ : Ax̄ = b} to the origin.
Figure 6.5   Demonstration no. 1 of the Newton process (level curve δ(s, 2) = 1 shown).
Example II.22 First we take A and c as in Example II.7 on page 97, and b = (0, 1)T .
Figure 5.2 (page 98) shows the feasible region and the central path. In Figure 6.5 we
have added some level curves for δ(s, 2). We have also depicted the Newton step at
several points in the feasible region. The respective starting points are indicated by
the symbol ‘o ’, and the resulting point after a Newton step by the symbol ‘∗ ’; the two
points are connected by a straight line to indicate the Newton step.
Note that, in agreement with Theorem II.21, when starting within the region of
quadratic convergence, i.e., when δ(s, µ) < 1, the Newton step is not only feasible, but
there is a significant decrease in the proximity to the 2-center. Also, when starting
outside the region of quadratic convergence, i.e., when δ(s, µ) ≥ 1, it may happen that
the Newton step leaves the feasible region.
In Figure 6.6 we depict similar results for the problem defined in Example II.7 with
b = (1, 1)T .
Figure 6.6   Demonstration no. 2 of the Newton process.
Finally, Figure 6.7 depicts the situation for a new, less regular, problem. It is defined
by another choice of the problem data A, b and c, with A again having two rows so that the dual feasible region can be drawn in the y-plane.
This figure makes clear that after a Newton step the proximity to the 2-center may
increase. Concluding this example, we may state that inside the region of quadratic
convergence our proximity measure provides perfect control over the Newton process,
Figure 6.7   Demonstration no. 3 of the Newton process.
but outside this region it has little value.
♦
6.6  The duality gap close to the central path
A nice feature of the µ-center s = s(µ) is that the vector x = µs−1 is primal feasible,
and the duality gap for the primal-dual pair (x, s) is given by nµ. One might ask
about the situation when s is close to s(µ). The next theorem provides a satisfactory
answer. It states that for small values of the proximity δ(s, µ) the duality gap for the
pair (x(s, µ), s) is close to the gap for the µ-centers.
Theorem II.23 Let δ := δ(s, µ) ≤ 1. Then the duality gap for the primal-dual pair
(x(s, µ), s) satisfies
nµ (1 − δ) ≤ sT x(s, µ) ≤ nµ (1 + δ) .
Proof: From Theorem II.21 we know that x(s, µ) is primal feasible. Hence, for the
duality gap we have
s^T x(s, µ) = s^T ( µ s^{−1} ( e − s^{−1}∆s ) ) = µ e^T ( e − s^{−1}∆s ).
Since the coordinates of the vector e − s−1 ∆s lie in the interval [1 − δ, 1 + δ], the result
follows.
✷
Remark II.24 The above estimate for the duality gap is not as sharp as it could be, but is
sufficient for our goal. Nevertheless, we want to point out that the Cauchy–Schwarz inequality
gives stronger bounds. We have
s^T x(s, µ) = µ e^T ( e − s^{−1}∆s ) = nµ − µ e^T ( s^{−1}∆s ).
Hence
| s^T x(s, µ) − nµ | = µ | e^T ( s^{−1}∆s ) | ≤ µ ‖e‖ ‖ s^{−1}∆s ‖ = µ √n δ,
and it follows that
nµ ( 1 − δ/√n ) ≤ s^T x(s, µ) ≤ nµ ( 1 + δ/√n ).    •
6.7  Dual logarithmic barrier algorithm with full Newton steps
We can now describe an algorithm using approximate centers. We assume that we are
given a pair (y 0 , s0 ) ∈ D and a µ0 > 0 such that (y 0 , s0 ) is close to the µ0 -center in
the sense of the proximity measure δ(s0 , µ0 ). In the algorithm the barrier parameter
monotonically decreases from the initial value µ0 to some small value determined by
the desired accuracy. In the algorithm we denote by p(s, µ) the Newton step ∆s at
s ∈ D+ to emphasize the dependence on the barrier parameter µ.
Dual Logarithmic Barrier Algorithm with full Newton steps
Input:
A proximity parameter τ , 0 ≤ τ < 1;
an accuracy parameter ε > 0;
(y 0 , s0 ) ∈ D and µ0 > 0 such that δ(s0 , µ0 ) ≤ τ ;
a fixed parameter θ, 0 < θ < 1.
begin
s := s0 ; µ := µ0 ;
while nµ ≥ (1 − θ)ε do
begin
s := s + p(s, µ);
µ := (1 − θ)µ;
end
end
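A direct transcription of this algorithm into code might look as follows (a sketch, again reusing the newton_step helper of Section 6.3). With the data used in Section 6.7.2 below (A and c as in Example II.7, b = (1, 1), y0 = (0, 0), µ0 = 2, θ = 1/(3√n) and ε = 10^{−4}) it performs 53 iterations, in agreement with the run reported there.

import numpy as np
# assumes the newton_step sketch from Section 6.3

def dual_full_step_algorithm(A, b, c, y0, mu0, theta, eps):
    y, mu = np.array(y0, dtype=float), float(mu0)
    n = A.shape[1]
    k = 0
    while n * mu >= (1 - theta) * eps:
        x, ds, dy = newton_step(A, b, c, y, mu)   # full Newton step p(s, mu)
        y = y + dy
        mu *= (1 - theta)
        k += 1
    x, _, _ = newton_step(A, b, c, y, mu)         # primal point x(s, mu) at the final mu
    s = c - A.T @ y
    return x, y, s, mu, k

A = np.array([[1., -1., 0.], [0., 0., 1.]])
b = np.array([1., 1.]); c = np.array([1., 1., 1.])
n = 3
x, y, s, mu, k = dual_full_step_algorithm(A, b, c, y0=(0., 0.), mu0=2.0,
                                          theta=1.0 / (3.0 * np.sqrt(n)), eps=1e-4)
print(k, x @ s)    # 53 iterations and a duality gap below 2*eps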
We prove the following theorem.
Theorem II.25 If τ = 1/√2 and θ = 1/(3√n), then the Dual Logarithmic Barrier Algorithm with full Newton steps requires at most
3√n log ( nµ0 / ε )
iterations. The output is a primal-dual pair (x, s) such that x^T s ≤ 2ε.
6.7.1  Convergence analysis
The proof of Theorem II.25 depends on the following lemma. The lemma generalizes
Theorem II.21 to the case where, after the Newton step corresponding to the barrier
parameter value µ, the barrier parameter is updated to µ+ = (1 − θ)µ. Taking θ = 0
in the lemma we get back the result of Theorem II.21.
Lemma II.26 8 Assuming δ(s, µ) ≤ 1, let s+ be obtained from s by moving along the
Newton step ∆s = p(s, µ) at s corresponding to the barrier parameter value µ, and let
µ+ = (1 − θ)µ. Then we have
δ(s+, µ+)² ≤ δ(s, µ)⁴ + θ² n / (1 − θ)².
Proof: By definition we have
δ(s+, µ+) = (1/µ+) min_x { ‖ µ+ e − s+ x ‖ : Ax = b }.
Substituting for x the vector x(s, µ) we obtain the inequality:
δ(s+, µ+) ≤ (1/µ+) ‖ µ+ e − s+ x(s, µ) ‖ = ‖ e − s+ x(s, µ) / ( µ(1 − θ) ) ‖.
From (6.10) we deduce that
s+ x(s, µ) = µ ( e − ( s^{−1}∆s )² ).
Substituting this, while simplifying the notation by using
h := s−1 ∆s,
we get
δ(s+, µ+) ≤ ‖ e − ( e − h² ) / (1 − θ) ‖ = ‖ h² − θ/(1 − θ) ( e − h² ) ‖.    (6.12)
To further simplify the notation we replace θ/ (1 − θ) by ρ. Then taking squares of
both sides in the last inequality we obtain
δ(s+, µ+)² ≤ ‖h²‖² − 2ρ (h²)^T ( e − h² ) + ρ² ‖ e − h² ‖².
Since ‖h‖ = δ(s, µ) ≤ 1 we have
0 ≤ e − h² ≤ e.
Hence we have
(h²)^T ( e − h² ) ≥ 0,    ‖ e − h² ‖² ≤ ‖e‖².
8
This lemma and its proof are due to Ling [182]. It improves estimates used by Roos and Vial [245].
Using this, and also that ‖e‖² = n, we obtain
δ(s+, µ+)² ≤ ‖h²‖² + ρ² ‖e‖² ≤ ‖h‖⁴ + ρ² n = δ(s, µ)⁴ + ρ² n,
thus proving the lemma.
✷
Remark II.27 It may be noted that a weaker result can be obtained in a simpler way by applying the triangle inequality to (6.12). This yields
δ(s+, µ+) ≤ ‖h²‖ + θ/(1 − θ) ‖ e − h² ‖ ≤ δ(s, µ)² + θ√n/(1 − θ).
This result is strong enough to derive a polynomial iteration bound, but the resulting bound
will be slightly weaker than the one in Theorem II.25.
•
The proof of Theorem II.25 goes now as follows. Taking θ = 1/(3√n), we have
θ√n/(1 − θ) = (1/3) / ( 1 − 1/(3√n) ) ≤ (1/3) / (2/3) = 1/2.
Hence, applying the lemma, we obtain
δ(s+, µ+)² ≤ δ(s, µ)⁴ + 1/4.
Therefore, if δ(s, µ) ≤ τ = 1/√2, then we obtain
δ(s+, µ+)² ≤ 1/4 + 1/4 = 1/2,
which implies that δ(s+, µ+) ≤ 1/√2 = τ. Thus it follows that after each iteration of
the algorithm the property
δ(s, µ) ≤ τ
is maintained. The iteration bound in the theorem is an immediate consequence of
Lemma I.36. Finally, if s is the dual iterate at termination of the algorithm, and µ the
value of the barrier parameter, then with x = x(s, µ), Theorem II.23 yields
sT x(s, µ) ≤ nµ (1 + δ(s, µ)) ≤ nµ (1 + τ ) ≤ 2nµ.
Since upon termination we have nµ ≤ ε, it follows that sT x(s, µ) ≤ 2ε. This completes
the proof of the theorem.
✷
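The two numerical facts used in this argument — that θ√n/(1 − θ) ≤ 1/2 and that τ⁴ + θ²n/(1 − θ)² ≤ 1/2 for θ = 1/(3√n) and τ = 1/√2 — can be checked for any n with a few lines of code (a trivial sketch):

import math

for n in (1, 2, 3, 10, 100, 10**6):
    theta = 1.0 / (3.0 * math.sqrt(n))
    tau = 1.0 / math.sqrt(2.0)
    assert theta * math.sqrt(n) / (1 - theta) <= 0.5 + 1e-12
    assert tau**4 + theta**2 * n / (1 - theta)**2 <= 0.5 + 1e-12   # delta stays <= tau
print("invariant delta <= 1/sqrt(2) is maintained for all tested n")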
6.7.2  Illustration of the algorithm with full Newton steps
In this section we start with a straightforward application of the logarithmic barrier
algorithm. After that we devote some sections to modifications of the algorithm that
increase the practical efficiency of the algorithm without destroying the theoretical
iteration bound.
As an example we solve the problem with A and c as in Example II.7, and with
bT = (1, 1). Written out, the (dual) problem is given by
max {y1 + y2 : −1 ≤ y1 ≤ 1, y2 ≤ 1} .
and the primal problem is
min {x1 + x2 + x3 : x1 − x2 = 1, x3 = 1, x ≥ 0} .
We can start the algorithm at y = (0, 0) and µ = 2, because we then have s = (1, 1, 1)
and, since x = (2, 1, 1) is primal feasible,
δ(s, µ) ≤ ‖ sx/µ − e ‖ = ‖ ( 0, −1/2, −1/2 ) ‖ = 1/√2.
With ε = 10−4 , the dual logarithmic barrier algorithm needs 53 iterations to generate
the primal feasible solution x = (1.000015, 0.000015, 1.000000) and the dual feasible
pair (y, s) with y = (0.999971, 0.999971) and s = (0.000029, 1.999971, 0.000029). The
respective objective values are cT x = 2.000030 and bT y = 1.999943, and the duality
gap is 0.000087.
Table 6.1. (page 124) shows some quantities generated in the course of the algorithm.
For each iteration the table shows the values of nµ, the first coordinate of x(s, µ), the
coordinates of y, the first coordinate of s, the proximity δ = δ(s, µ) before and the
proximity δ + = δ(s+ , µ) after the Newton step at y to the µ-center, and, in the last
column, the barrier update parameter θ, which is constant in this example.
The columns for δ and δ + in Table 6.1. are of special interest. They make clear
that the behavior of the algorithm differs from what might be expected. The analysis
was based on the idea of maintaining the proximity of the iterates below the value τ = 1/√2 ≈ 0.7071, so as to stay in the region where Newton's method is very efficient.
Therefore we updated the barrier parameter in such a way that just before the Newton
step, i.e., just after the update of the barrier parameter, the proximity should reach
the value τ . The table makes clear that in reality the proximity takes much smaller
values (soon after the start). Asymptotically the proximity before the Newton step is
always 0.2721 and after the Newton step 0.0524.
This can also be seen from Figure 6.8, which shows the relevant part of the feasible
region and the central path. The points y are indicated by small circles and the exact
µ-centers as asterisks. The above observation becomes very clear in this figure: soon
after the start the circles and the asterisks can hardly be distinguished. The figure
also shows at each iteration the region where the proximity is smaller than τ , thus
indicating the space where we are allowed to move without leaving the region of
quadratic convergence. Instead of using this space the algorithm moves in a very
narrow neighborhood of the central path.
6.8
A version of the algorithm with adaptive updates
The example in the previous section has been discussed in detail in the hope that the
reader will now understand that there is an easy way to reduce the number of iterations
It.    nµ        x1        y1        y2         s1        δ       δ+      θ
 0   6.000000  2.500000  0.000000   0.000000  1.000000  0.6124  0.2509  0.1925
 1   4.845299  2.255388  0.250000  −0.500000  0.750000  0.0901  0.0053  0.1925
 2   3.912821  1.969957  0.285497  −0.606897  0.714503  0.2491  0.0540  0.1925
 3   3.159798  1.749168  0.342068  −0.234058  0.657932  0.2003  0.0303  0.1925
 4   2.551695  1.578422  0.403015  −0.022234  0.596985  0.2334  0.0420  0.1925
 5   2.060621  1.447319  0.467397   0.184083  0.532603  0.2285  0.0379  0.1925
 6   1.664054  1.347011  0.532510   0.337370  0.467490  0.2406  0.0416  0.1925
 7   1.343807  1.270294  0.595745   0.466322  0.404255  0.2438  0.0421  0.1925
 8   1.085191  1.211482  0.654936   0.568477  0.345064  0.2500  0.0441  0.1925
 9   0.876346  1.166207  0.708650   0.651736  0.291350  0.2537  0.0453  0.1925
10   0.707693  1.131170  0.756184   0.718677  0.243816  0.2574  0.0467  0.1925
11   0.571498  1.103907  0.797423   0.772849  0.202577  0.2601  0.0477  0.1925
12   0.461513  1.082581  0.832650   0.816552  0.167350  0.2624  0.0486  0.1925
13   0.372695  1.065815  0.862383   0.851862  0.137617  0.2643  0.0493  0.1925
14   0.300969  1.052577  0.887244   0.880369  0.112756  0.2658  0.0499  0.1925
15   0.243048  1.042085  0.907881   0.903393  0.092119  0.2670  0.0504  0.1925
16   0.196273  1.033741  0.924914   0.921985  0.075086  0.2680  0.0508  0.1925
17   0.158500  1.027088  0.938910   0.936999  0.061090  0.2688  0.0511  0.1925
18   0.127997  1.021771  0.950370   0.949123  0.049630  0.2695  0.0513  0.1925
19   0.103364  1.017513  0.959728   0.958915  0.040272  0.2700  0.0515  0.1925
20   0.083472  1.014098  0.967352   0.966821  0.032648  0.2704  0.0517  0.1925
21   0.067407  1.011356  0.973553   0.973207  0.026447  0.2707  0.0518  0.1925
22   0.054435  1.009152  0.978589   0.978363  0.021411  0.2710  0.0519  0.1925
23   0.043959  1.007378  0.982674   0.982527  0.017326  0.2712  0.0520  0.1925
24   0.035499  1.005950  0.985986   0.985890  0.014014  0.2714  0.0521  0.1925
25   0.028667  1.004800  0.988668   0.988605  0.011332  0.2716  0.0521  0.1925
26   0.023150  1.003873  0.990839   0.990798  0.009161  0.2717  0.0522  0.1925
27   0.018695  1.003125  0.992596   0.992569  0.007404  0.2718  0.0522  0.1925
28   0.015097  1.002522  0.994017   0.993999  0.005983  0.2718  0.0523  0.1925
29   0.012192  1.002036  0.995165   0.995154  0.004835  0.2719  0.0523  0.1925
30   0.009845  1.001643  0.996094   0.996087  0.003906  0.2720  0.0523  0.1925
31   0.007951  1.001327  0.996845   0.996840  0.003155  0.2720  0.0523  0.1925
32   0.006421  1.001071  0.997451   0.997448  0.002549  0.2720  0.0523  0.1925
33   0.005185  1.000865  0.997941   0.997939  0.002059  0.2721  0.0523  0.1925
34   0.004187  1.000698  0.998337   0.998336  0.001663  0.2721  0.0523  0.1925
35   0.003381  1.000564  0.998657   0.998656  0.001343  0.2721  0.0524  0.1925
36   0.002731  1.000455  0.998915   0.998915  0.001085  0.2721  0.0524  0.1925
37   0.002205  1.000368  0.999124   0.999124  0.000876  0.2721  0.0524  0.1925
38   0.001781  1.000297  0.999292   0.999292  0.000708  0.2721  0.0524  0.1925
39   0.001438  1.000240  0.999429   0.999428  0.000571  0.2721  0.0524  0.1925
40   0.001161  1.000194  0.999539   0.999538  0.000461  0.2721  0.0524  0.1925
41   0.000938  1.000156  0.999627   0.999627  0.000373  0.2721  0.0524  0.1925
42   0.000757  1.000126  0.999699   0.999699  0.000301  0.2721  0.0524  0.1925
43   0.000612  1.000102  0.999757   0.999757  0.000243  0.2722  0.0524  0.1925
44   0.000494  1.000082  0.999804   0.999804  0.000196  0.2722  0.0524  0.1925
45   0.000399  1.000066  0.999841   0.999841  0.000159  0.2722  0.0524  0.1925
46   0.000322  1.000054  0.999872   0.999872  0.000128  0.2722  0.0524  0.1925
47   0.000260  1.000043  0.999897   0.999897  0.000103  0.2722  0.0524  0.1925
48   0.000210  1.000035  0.999917   0.999917  0.000083  0.2722  0.0524  0.1925
49   0.000170  1.000028  0.999933   0.999933  0.000067  0.2722  0.0524  0.1925
50   0.000137  1.000023  0.999946   0.999946  0.000054  0.2722  0.0524  0.1925
51   0.000111  1.000018  0.999956   0.999956  0.000044  0.2722  0.0524  0.1925
52   0.000089  1.000015  0.999964   0.999964  0.000036  0.2722  0.0524  0.1925
53   0.000072  1.000015  0.999971   0.999971  0.000029    −       −       −
Table 6.1.   Output of the dual full-step algorithm.
Figure 6.8   Iterates of the dual logarithmic barrier algorithm (the plot shows, in the (y1 , y2 )-plane, the central path and the boundaries δ(s, 2) = τ, δ(s, 2(1 − θ)) = τ and δ(s, 2(1 − θ)²) = τ).
required by the algorithm without losing the quality of the solution guaranteed by
Theorem II.25. The obvious way to reach this goal is to make larger updates of the
barrier parameter while keeping the iterates in the region of quadratic convergence.
This is called the adaptive-update strategy,9 which we discuss in the next section.
After that we deal with a more greedy approach, using larger updates of the barrier
parameter, in which we may temporarily leave the region of quadratic convergence.
This is the so-called large-update strategy. The analysis of the large-update strategy
cannot be based on the proximity measure δ(y, µ) alone, because outside the region of
quadratic convergence this measure has no useful meaning. But, as we shall see, there
exists a different way of measuring the progress of the algorithm in that case.
6.8.1
An adaptive-update variant
Observe that the iteration bound of Theorem II.25 was obtained by requiring that
after each update of the barrier parameter µ the proximity satisfies
δ(s, µ) ≤ τ.
(6.13)
In order to make clear how this observation can be used to improve the performance
of the algorithm without losing the iteration bound of Theorem II.25, let us briefly
recall the idea behind the proof of this theorem. At the start of an iteration we are
given s and µ such that (6.13) holds. We then make a Newton step to the µ-center,
9
The adaptive-update strategy was first proposed by Ye [303]. See also Roos and Vial [245].
Figure 6.9   The idea of adaptive updating (the figure shows the central path with the targets y(µ) and y(µ+ ), the iterate (y + , s+ ), and the level curves δ(s+ , µ) = τ², δ(s+ , µ) = τ and δ(s+ , µ+ ) = τ).
which yields s+ , and we have
δ(s+ , µ) ≤ τ 2 .
(6.14)
Then we update µ to a smaller value µ+ = (1 − θ)µ such that
δ(s+ , µ+ ) ≤ τ,
(6.15)
and we start the next iteration. Our estimates in the proof of Theorem II.25 were such
that it has become clear that the value θ = 1/(3√n) guarantees that (6.15) will hold.
But from the example in the previous section we know that actually the new proximity
may be much smaller than τ . In other words, it may well happen that using the given
value of θ we start the next iteration with an s+ and a µ+ such that δ(s+ , µ+ ) is
(much) smaller than τ .
It will be clear that this opens a way to speed up the algorithm without degrading
the iteration bound. For if we take θ larger than the value θ = 1/(3√n) used in
Theorem II.25, thus enforcing a deeper update of the barrier parameter in such a way
that (6.15) still holds, then the analysis in the proof of Theorem II.25 remains valid
but the number of iterations decreases. The question arises of how deep the update
might be. In other words, we have to deal with the problem that we are given s+ and
µ such that (6.14) holds, and we ask how large we can take θ in µ+ = (1 − θ)µ so that
(6.15) holds with equality:
δ(s+ , µ+ ) = τ.
See Figure 6.9. Note that we know beforehand that this value of θ is at least
θ = 1/(3√n).
To answer the above question we need to introduce the so-called affine-scaling
direction and the centering direction at s.
6.8.2
The affine-scaling direction and the centering direction
From (6.1) we recall that the Newton step at s to the µ-center is given by
µ s−1 ∆s = PHS (µe − sx̄),
so we may write
∆s = S PHS (e − sx̄/µ) = S PHS (e) − (1/µ) S PHS (sx̄).
The directions
∆c s := S PHS (e)   (6.16)
and
∆a s := −S PHS (sx̄)   (6.17)
are called the centering direction and the affine-scaling direction respectively. Note that
these two directions depend only on the iterate s and not on the barrier parameter µ.
Now the Newton step at s to the µ-center can be written as
∆s = ∆c s + (1/µ) ∆a s.
By definition (6.7), the proximity δ(s, µ) is given by δ(s, µ) := ‖s−1 ∆s‖. Thus we have
δ(s, µ) = ‖dc + (1/µ) da‖,
where
dc = s−1 ∆c s,   da = s−1 ∆a s
are the scaled centering and affine-scaling directions respectively.
6.8.3
Calculation of the adaptive update
Now that we know how the proximity depends on the barrier parameter we are able
to solve the problem posed above. We assume that we have an iterate s such that for
some µ > 0 and 0 < τ < 1/√2,
δ := δ(s, µ) ≤ τ²,
and we ask for the smallest value µ+ of the barrier parameter such that
δ(s, µ+ ) = τ.
Clearly, µ+ is the smallest positive root of the equation
δ(s, µ) = ‖dc + (1/µ) da‖ = τ.   (6.18)
Note that in the case where b = 0, the dual objective value is constant on the dual
feasible region and hence s is optimal.10,11 We assume that da ≠ 0. This is true if and
only if b ≠ 0. It then follows from (6.18) that δ(s, µ) depends continuously on µ and
goes to infinity if µ approaches zero. Hence, since τ > τ², equation (6.18) has at least
one positive solution.
Squaring both sides of (6.18), we arrive at the following quadratic equation in 1/µ:
(1/µ²) ‖da‖² + (2/µ) (da )T dc + ‖dc‖² − τ² = 0.   (6.19)
The two roots of (6.19) are given by
( −(da )T dc ± √( ((da )T dc )² − ‖da‖² (‖dc‖² − τ²) ) ) / ‖da‖².
We already know that at least one of the roots is positive. Hence, although we do not
know the sign of the second root, we may conclude that 1/µ∗ , where µ∗ is the value of
the barrier parameter we are looking for, is equal to the larger of the two roots. This
gives, after some elementary calculations,
µ∗ = ( (da )T dc + √( ((da )T dc )² − ‖da‖² (‖dc‖² − τ²) ) ) / ( τ² − ‖dc‖² ).
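The update µ∗ is easy to evaluate once dc and da are available. The following small numpy sketch is illustrative only: it assumes the scaled directions dc and da have already been formed as described above, computes 1/µ∗ as the larger root of (6.19), and checks that the resulting proximity equals τ.

import numpy as np

def adaptive_update(dc, da, tau):
    # smallest positive mu with ||dc + da/mu|| = tau, cf. (6.18)-(6.19)
    a = float(np.dot(da, dc))                     # (d^a)^T d^c
    disc = a * a - np.dot(da, da) * (np.dot(dc, dc) - tau ** 2)
    inv_mu = (-a + np.sqrt(disc)) / np.dot(da, da)   # larger root of (6.19)
    return 1.0 / inv_mu

# small numerical check with made-up direction vectors
dc = np.array([0.1, -0.05, 0.02])
da = np.array([0.8, 0.3, -0.4])
tau = 0.5
mu_plus = adaptive_update(dc, da, tau)
print(mu_plus, np.linalg.norm(dc + da / mu_plus))    # second number is 0.5 = tau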
It is interesting to observe that it is easy to characterize the situation that both
roots of (6.18) are positive. By considering the constant term in the quadratic equation
(6.19) we see that both roots are positive if and only if ‖dc‖² − τ² > 0. From (6.18) it
follows that ‖dc‖ = δ(s, ∞). Thus, both roots are positive if and only if
δ(s, ∞) > τ.
Obviously this situation occurs only if
(da )T dc < 0.
Thus we find the interesting result
δ(s, ∞) > τ  ⇒  (da )T dc < 0.
At the central path, when δ(s, µ) = 0, we have da = −µ dc , so in that case the above
implication is obvious.
10 Exercise 37  Show that da = 0 if and only if b = 0.
11 Exercise 38  Consider the case b = 0. Then the primal feasibility condition is Ax = 0, x ≥ 0,
which is homogeneous in x. Show that x(s, µ) = µx(s, 1) for each µ > 0, and that δ(s, µ) is
independent of µ. Taking s = s(1), it now easily follows that s(µ) = s(1) for each µ > 0. This
means that the dual central path is a point in this case, whereas the primal central path is a
straight half line. If s and µ > 0 are such that δ(s, µ) < 1 then the Newton process converges
quadratically to s(1), which is the analytic center of the dual feasible region. See also Roos and
Vial [243] and Ye [310].
6.8.4
Illustration of the use of adaptive updates
By way of example we solve the same problem as in Section 6.7.2 with the dual
logarithmic barrier algorithm, now using adaptive updates. As before, we start
the algorithm at y = (0, 0) and µ = 2. With ε = 10−4 and adaptive updates,
the dual full-step algorithm needs 20 iterations to generate the primal feasible
solution x = (1.000013, 0.000013, 1.000000) and the dual feasible pair (y, s) with y =
(0.999973, 0.999986) and s = (0.000027, 1.999973, 0.000014). The respective objective
values are cT x = 2.000027 and bT y = 1.999960, and the duality gap is 0.000067.
Table 6.2. (page 129) gives some information on how the algorithm progresses. From
the seventh column in this table (with the heading δ) it is clear that we have reached
our goal: after each update of the barrier parameter the proximity equals τ . Moreover,
the adaptive barrier parameter update strategy reduced the number of iterations, from
53 to 20.
Figure 6.10 (page 130) provides a graphical illustration of the adaptive strategy.
It shows the relevant part of the feasible region and the central path, as well as
the first four points generated by the algorithm and their regions of quadratic
convergence. After each update the iterate lies on the boundary of the region of
quadratic convergence for the next value of the barrier parameter.
It.    nµ        x1        y1        y2        s1        δ       δ+      θ
 0   3.000000  1.500000  0.000000  0.000000  1.000000  0.7071  0.1581  0.5000
 1   1.778382  1.374235  0.500000  0.000000  0.500000  0.7071  0.4725  0.4072
 2   0.863937  1.149409  0.579559  0.686927  0.420441  0.7071  0.4563  0.5142
 3   0.505477  1.091171  0.864662  0.714208  0.135338  0.7071  0.4849  0.4149
 4   0.280994  1.047762  0.847943  0.913169  0.152057  0.7071  0.4912  0.4441
 5   0.165317  1.028293  0.954529  0.906834  0.045471  0.7071  0.4776  0.4117
 6   0.093710  1.015735  0.947640  0.971182  0.052360  0.7071  0.4937  0.4332
 7   0.055038  1.009255  0.984428  0.968951  0.015572  0.7071  0.4799  0.4127
 8   0.031566  1.005275  0.982196  0.990450  0.017804  0.7071  0.4915  0.4265
 9   0.018469  1.003087  0.994677  0.989568  0.005323  0.7071  0.4827  0.4149
10   0.010662  1.001779  0.993971  0.996813  0.006029  0.7071  0.4894  0.4227
11   0.006220  1.001038  0.998188  0.996484  0.001812  0.7071  0.4846  0.4167
12   0.003603  1.000601  0.997961  0.998931  0.002039  0.7071  0.4881  0.4207
13   0.002098  1.000350  0.999385  0.998814  0.000615  0.7071  0.4856  0.4177
14   0.001217  1.000203  0.999311  0.999640  0.000689  0.7071  0.4874  0.4198
15   0.000708  1.000118  0.999792  0.999599  0.000208  0.7071  0.4862  0.4183
16   0.000411  1.000069  0.999767  0.999879  0.000233  0.7071  0.4870  0.4193
17   0.000239  1.000040  0.999930  0.999865  0.000070  0.7071  0.4864  0.4186
18   0.000139  1.000023  0.999921  0.999959  0.000079  0.7071  0.4868  0.4191
19   0.000081  1.000013  0.999976  0.999954  0.000024  0.7071  0.4865  0.4187
20   0.000081  1.000013  0.999973  0.999986  0.000027    −       −       −
Table 6.2.   Output of the dual full-step algorithm with adaptive updates.
Figure 6.10   The iterates when using adaptive updates (the plot shows, in the (y1 , y2 )-plane, the central path and the boundary δ(s, 2) = τ around the first target).
6.9
A version of the algorithm with large updates
In this section we consider a more greedy approach than the adaptive strategy, using
larger updates of the barrier parameter. As before, we assume that we have an iterate
s and a µ > 0 such that s belongs to the region of quadratic convergence around the
µ-center. In fact we assume that12
δ(s, µ) ≤ τ = 1/√2.
Starting at s we want to reach the region of quadratic convergence around the µ+ -center, with
µ+ = (1 − θ)µ,
and we assume that θ is so large that s lies outside the region of quadratic convergence
around the µ+ -center. In fact, it may well happen that δ(s, µ+ ) is much larger than 1.
It is clear that the analysis of the previous sections, where we always took full Newton
steps for the target value of the barrier parameter, is then no longer useful: this analysis
was based on the nice behavior of Newton’s method in a close neighborhood of the
µ+ -center. Being outside this region, we can no longer profit from this nice behavior
and we need an alternative approach.
Now remember that the target center s(µ+ ) can be characterized as the (unique)
12 We could have taken a different value for τ, for example τ = 1/2, but the choice τ = 1/√2 seems
to be natural. The analysis below supports our choice. In the literature the choice τ = 1/2 is very
popular (see, e.g., [140]). It is easy to adapt the analysis below to this value.
minimizer of the dual logarithmic barrier function
kµ+ (y, s) = −bT y − µ+ ∑_{j=1}^{n} log sj
and that this function is strictly convex on the interior of the dual feasible region.
Hence, the difference
kµ+ (y, s) − kµ+ (y(µ+ ), s(µ+ ))
vanishes if and only if s = s(µ+ ) and is positive elsewhere. The difference can therefore
be used as another indicator for the ‘distance’ from s to s(µ+ ). That is exactly what
we plan to do. Outside the region of quadratic convergence the barrier function value
will act as a measure for proximity to the µ-center. We show that when moving in the
direction of the Newton step at s the barrier function decreases, and that by choosing
an appropriate step-size we can guarantee a sufficient decrease of the barrier function
value. In principle, the step-size can be obtained from a one-dimensional line search
in the Newton direction so as to minimize the barrier function in this direction. Based
on these ideas we derive an upper bound for the required number of damped Newton
steps to reach the vicinity of s(µ+ ); the upper bound will be a function of θ.
The algorithm is described on page 131. We refer to the first while-loop in the
Dual Logarithmic Barrier Algorithm with Large Updates
Input:
A proximity parameter τ = 1/√2;
an accuracy parameter ε > 0;
a variable damping factor α;
an update parameter θ, 0 < θ < 1;
(y 0 , s0 ) ∈ D and µ0 > 0 such that δ(s0 , µ0 ) ≤ τ .
begin
s := s0 ; µ := µ0 ;
while nµ ≥ ε do
begin
µ := (1 − θ)µ;
while δ(s, µ) ≥ τ do
begin
s := s + αp(s, µ);
(The damping factor α must be such that kµ (y, s) decreases
sufficiently. The default value is 1/(1 + δ(s, µ)).)
end
end
end
algorithm as the outer loop and to the second while-loop as the inner loop. Each
execution of the outer loop is called an outer iteration and each execution of the inner
loop an inner iteration. The required number of outer iterations depends only on the
dimension n of the problem, on µ0 and ε, and on the (fixed) barrier update parameter
θ. This number immediately follows from Lemma I.36. The main task in the analysis
of the algorithm is to derive an upper bound for the number of iterations in the inner
loop. For that purpose we need some lemmas that estimate barrier function values
and objective values in the region of quadratic convergence around the µ-center. Since
these estimates are interesting in themselves, and also because their importance goes
beyond the analysis of the present algorithm with line searches alone, we discuss them
in separate sections.
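Before turning to these estimates, the loop structure of the algorithm can be summarized in a short computational sketch. The code below is illustrative only: it uses numpy, the dual Newton direction (AS−2 AT )−1 (b/µ − As−1 ) quoted in Remark II.44 of the next chapter, the proximity δ(s, µ) = ‖s−1 ∆s‖, and a tiny made-up feasible problem rather than the example problem of Section 6.7.2.

import numpy as np

def dual_newton(A, b, s, mu):
    # Newton step for the dual barrier function at (y, s)
    Sinv = 1.0 / s
    M = (A * Sinv**2) @ A.T                       # A S^{-2} A^T
    dy = np.linalg.solve(M, b / mu - A @ Sinv)
    return dy, -A.T @ dy

def dual_large_update(A, b, c, y, mu, theta, eps, tau=1/np.sqrt(2)):
    # minimal sketch of the outer/inner loop structure of the algorithm above
    n = A.shape[1]
    s = c - A.T @ y
    while n * mu >= eps:
        mu *= (1 - theta)                         # outer iteration: barrier update
        while True:                               # inner iterations: damped Newton steps
            dy, ds = dual_newton(A, b, s, mu)
            delta = np.linalg.norm(ds / s)        # proximity delta(s, mu)
            if delta < tau:
                break
            alpha = 1.0 / (1.0 + delta)           # damping factor of Lemma II.38
            y, s = y + alpha * dy, s + alpha * ds
    return y, s, mu

# illustrative data (assumed, not the book's example): y = 0 is strictly dual
# feasible since s = c > 0, and b is chosen so that the primal is feasible too
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = A @ np.array([1.0, 1.0, 1.0])
c = np.array([1.0, 2.0, 1.0])
y, s, mu = dual_large_update(A, b, c, y=np.zeros(2), mu=2.0, theta=0.5, eps=1e-4)
print(y, b @ y)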
6.9.1
Estimates of barrier function values
We start with the barrier function values. Our goal is to estimate dual barrier function
values in the region of quadratic convergence around the µ-center. It will be convenient
not to deal with the barrier function itself, but to scale it by the barrier parameter.
Therefore we introduce
hµ (s) := (1/µ) kµ (y, s) = −bT y/µ − ∑_{j=1}^{n} log sj .
Let us point out once more that y is omitted in the argument of hµ (s) because of the
one-to-one correspondence between y and s in dual feasible pairs (y, s). We also use
the primal barrier function scaled by µ:
gµ (x) := (1/µ) g̃µ (x) = cT x/µ − ∑_{j=1}^{n} log xj .
Recall that both barrier functions are strictly convex on their domain and that s(µ)
and x(µ) are their respective minimizers. Therefore, defining
φpµ (x) := gµ (x) − gµ (x(µ)),
φdµ (s) := hµ (s) − hµ (s(µ)),
we have φdµ (s) ≥ 0, with equality if and only if s = s(µ), and also φpµ (x) ≥ 0, with
equality if and only if x = x(µ). As a consequence, defining
φµ (x, s) := φpµ (x) + φdµ (s),
(6.20)
where (x, s) is any pair of positive primal and dual feasible solutions, we have
φµ (x, s) ≥ 0, and the equality holds if and only if x = x(µ) and s = s(µ). The
function φµ : P + × D+ → IR+ is called the primal-dual logarithmic barrier function
with barrier parameter µ. Now the following lemma is almost obvious.
Lemma II.28 Let x > 0 be primal feasible and s > 0 dual feasible. Then
φpµ (x) = φµ (x, s(µ)) ≤ φµ (x, s)
and
φdµ (s) = φµ (x(µ), s) ≤ φµ (x, s).
Proof: The inequalities in the lemma are immediate from (6.20) since φpµ (x) and
φdµ (s) are nonnegative. Similarly, the equalities follow since φpµ (x(µ)) = φdµ (s(µ)) = 0.
Thus the lemma has been proved.
✷
In the sequel, properties of the function φµ form the basis of many of our estimates.
These estimates follow from properties of the univariate function
ψ(t) = t − log(1 + t),   t > −1,   (6.21)
as defined in (5.5).13 The definition of ψ is extended to any vector z = (z1 , z2 , . . . , zn )
satisfying z + e > 0 according to
Ψ(z) = ∑_{j=1}^{n} ψ(zj ) = ∑_{j=1}^{n} (zj − log(1 + zj )) = eT z − ∑_{j=1}^{n} log(1 + zj ).   (6.22)
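The functions ψ and Ψ are straightforward to evaluate numerically. The sketch below is a direct transcription of (6.21) and (6.22) into numpy, and it also checks, for a sample vector, the two inequalities (6.24) stated further on.

import numpy as np

def psi(t):
    # psi(t) = t - log(1 + t), defined for t > -1, cf. (6.21)
    return t - np.log1p(t)

def Psi(z):
    # Psi(z) = sum of psi(z_j) for a vector z with z + e > 0, cf. (6.22)
    return np.sum(psi(z))

# numerical look at (6.24): psi(||z||) <= Psi(z) <= psi(-||z||),
# the upper bound being valid only when ||z|| < 1
z = np.array([0.3, -0.2, 0.1])
nz = np.linalg.norm(z)
print(psi(nz) <= Psi(z) <= psi(-nz))    # True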
We now make a crucial observation, namely that the barrier functions φµ (x, s), φpµ (x)
and φdµ (s) can be nicely expressed in terms of the function Ψ.
Lemma II.29 Let x > 0 be primal feasible and s > 0 dual feasible. Then
(i) φµ (x, s) = Ψ (xs/µ − e);
(ii) φpµ (x) = Ψ (xs(µ)/µ − e);
(iii) φdµ (s) = Ψ (x(µ)s/µ − e).
Proof: 14 First we consider item (i). We use that cT x − bT y = xT s and cT x(µ) −
bT y(µ) = x(µ)T s(µ) = nµ. Now φµ (x, s) can be reduced as follows:
φµ (x, s) = hµ (s) + gµ (x) − (hµ (s(µ)) + gµ (x(µ)))
 = xT s/µ − ∑_{j=1}^{n} log xj sj − x(µ)T s(µ)/µ + ∑_{j=1}^{n} log xj (µ)sj (µ)
 = xT s/µ − ∑_{j=1}^{n} log xj sj − n + n log µ.
Since xT s = eT (xs) and eT e = n, we find the following expression for φµ (x, s):15
φµ (x, s) = eT (sx/µ − e) − ∑_{j=1}^{n} log (xj sj/µ) = Ψ (sx/µ − e).   (6.23)
This proves the first statement in the lemma. The second statement follows by
substituting s = s(µ) in the first statement, and using Lemma II.28. Similarly, the
third statement follows by substituting x = x(µ) in the first statement.
✷
13 Exercise 39  Let t > −1. Prove that ψ(−t/(1 + t)) + ψ(t) = t²/(1 + t).
14 Note that the dependence of φµ (x, s) on x and s is such that it depends only on the coordinatewise product xs of x and s.
15 Exercise 40  Considering (6.23) as the definition of φµ (x, s), and without using the properties of ψ, show that φµ (x, s) is nonnegative, and zero if and only if xs = µe. (Hint: use the arithmetic-geometric-mean inequality.)
Now we are ready to derive lower and upper bounds for the value of the dual
logarithmic barrier function in the region of quadratic convergence around the µ-center.
These bounds heavily depend on the following two inequalities:
ψ(‖z‖) ≤ Ψ(z) ≤ ψ(−‖z‖),   z > −e.   (6.24)
The second inequality is valid only if ‖z‖ < 1. The inequalities in (6.24) are
fundamental for our purpose and are immediate consequences of Lemma C.2 in
Appendix C.16,17
Lemma II.30 18  Let δ := δ(s, µ). Then
φdµ (s) ≥ δ − log(1 + δ) = ψ(δ).
Moreover, if δ < 1, then
φdµ (s) ≤ φµ (x(s, µ), s) ≤ −δ − log(1 − δ) = ψ(−δ).
Proof: By applying the inequalities in (6.24) to (6.23) we obtain for any positive
primal feasible x:
ψ(‖sx/µ − e‖) ≤ φµ (x, s) ≤ ψ(−‖sx/µ − e‖),   (6.25)
where the second inequality is valid only if the norm of xs/µ − e does not exceed 1.
Using (6.8) we write
δ = δ(s, µ) = ‖e − sx(s, µ)/µ‖ ≤ ‖e − sx/µ‖.
Hence, by the monotonicity of ψ(t) for t ≥ 0,
ψ(δ) ≤ ψ(‖sx/µ − e‖),
16 At least one of the inequalities in (6.24) shows up in almost every paper on interior-point methods. As far as we know, all usual proofs use the power series expansion of log(1 + x), −1 < x < 1, and do not characterize the case of equality, at least not explicitly. We give an elementary proof in Appendix C (page 435).
17 Exercise 41  Let z ∈ IRn . Prove that
z ≥ 0  ⇒  Ψ(z) ≤ nψ (‖z‖/√n) ≤ ‖z‖²/2,
−e < z ≤ 0  ⇒  Ψ(z) ≥ nψ (−‖z‖/√n) ≥ ‖z‖²/2.
18 This lemma improves a similar result of den Hertog et al. [146] and den Hertog [140]. The improvement is due to a suggestion made by Osman Güler [130] during a six month stay at Delft in 1992, namely to use the primal logarithmic barrier function in the analysis of the dual logarithmic barrier method. This approach not only simplifies the analysis significantly, but also leads to sharper estimates. It may be appropriate to mention that even stronger bounds for φµ (x, s) will be derived in Lemma II.69, but there we use a different proximity measure.
for any positive primal feasible x. Taking x = x(µ) and using the left inequality in
(6.25) and the third statement in Lemma II.29, we get
ψ(δ) ≤ ψ(‖sx(µ)/µ − e‖) ≤ φµ (x(µ), s) = φdµ (s),
proving the first inequality in the lemma. For the proof of the second inequality in the
lemma we assume δ < 1 and put x = x(s, µ) in the right inequality in (6.25). This
gives
φµ (x(s, µ), s) ≤ ψ(−‖sx(s, µ)/µ − e‖) = ψ(−δ).
By Lemma II.28 we also have φdµ (s) ≤ φµ (x(s, µ), s). Thus the lemma follows.
✷
The functions ψ(δ) and ψ(−δ), for 0 ≤ δ < 1, play a dominant role in many of the
estimates below. Figure 6.11 shows their graphs.
Figure 6.11   The functions ψ(δ) and ψ(−δ) for 0 ≤ δ < 1.
6.9.2
Estimates of objective values
We proceed by considering the dual objective value bT y in the region of quadratic
convergence around the µ-center. Using that x(µ)s(µ) = µe and cT x(µ) − bT y(µ) =
x(µ)T s(µ) = nµ, we write
bT y(µ) − bT y = cT x(µ) − nµ − bT y = sT x(µ) − nµ = eT (sx(µ) − µe)
 = µ eT (sx(µ)/µ − e) = µ eT (s/s(µ) − e).   (6.26)
Applying the Cauchy–Schwarz inequality to the expression for bT y(µ) − bT y in (6.26),
we obtain
bT y(µ) − bT y ≤ µ√n ‖s/s(µ) − e‖.   (6.27)
We assume δ := δ(s, µ) < 1/√2. It seems reasonable then to expect that the norm of
the vector
hs := s/s(µ) − e = sx(µ)/µ − e
will not differ too much from δ. In any case, that is what we are going to show. It will
then follow that the absolute value of bT y(µ) − bT y is of order µδ√n.
Note that hs can be written as
hs = (s − s(µ))/s(µ),
and hence ‖hs‖ measures the relative difference between s and s(µ). We also introduce
a similar vector for any primal feasible x > 0:
hx := x/x(µ) − e = xs(µ)/µ − e = (x − x(µ))/x(µ).
Using that x − x(µ) and s − s(µ) are orthogonal, as these vectors belong to the null
space and row space of A, respectively, we may write
hxT hs = ((x − x(µ))/x(µ))T ((s − s(µ))/s(µ)) = (1/µ) (x − x(µ))T (s − s(µ)) = 0.
This makes clear that hx and hs are orthogonal as well. In the rest of this section we
work with x = x(s, µ) and derive upper bounds for ‖hx‖ and ‖hs‖. It is convenient to
introduce the vector
h = hx + hs .
The next lemma implicitly yields an upper bound for ‖h‖.
Lemma II.31 Let δ = δ(s, µ) < 1 and x = x(s, µ). Then ψ(‖h‖) ≤ ψ(−δ).
Proof: Using Lemma II.29 we may rewrite (6.20) as
φµ (x, s) = Ψ(hx ) + Ψ(hs ).
By the first inequality in (6.24) we have
Ψ(hx ) ≥ ψ(‖hx‖) and Ψ(hs ) ≥ ψ(‖hs‖).
Applying the first inequality in (6.24) to the 2-dimensional vector (‖hx‖ , ‖hs‖), we
obtain
ψ(‖hx‖) + ψ(‖hs‖) ≥ ψ(‖h‖).
Here we used that hx and hs are orthogonal. Substitution gives
φµ (x, s) ≥ ψ(‖h‖).
On the other hand, by Lemma II.30 we have φµ (x, s) ≤ ψ(−δ), thus completing the
proof.
✷
Let us point out that we can easily deduce from Lemma II.31 an interesting
upper bound for ‖h‖ if δ < 1. It can then be shown that ψ(‖h‖) ≤ ψ(−δ) implies
‖h‖ ≤ δ/(1 − δ).19,20 This implies that ‖h‖ ≤ 1 if δ ≤ 1/2. However, for our purpose
this bound is not strong enough. We prove a stronger result that implies that ‖h‖ ≤ 1
if δ < 1/√2.
Lemma II.32 Let δ = δ(s, µ) ≤ 1/√2. Then ‖h‖ < √2.
Proof: By Lemma II.31 we have ψ(‖h‖) ≤ ψ(−δ). Since ψ(−δ) is monotonically
increasing in δ, this implies
ψ(‖h‖) ≤ ψ(−1/√2) = 0.52084.
Since
ψ(√2) = 0.53284 > 0.52084,
and ψ(t) is monotonically increasing for t ≥ 0, we conclude that ‖h‖ < √2.
✷
We now have the following result.
Lemma II.33 21  Let δ := δ(s, µ) ≤ 1/√2. Then
‖hs‖ = ‖s/s(µ) − e‖ ≤ √(1 − √(1 − 2δ²)).
Moreover, if x = x(s, µ) then also
‖hx‖ = ‖x/x(µ) − e‖ ≤ √(1 − √(1 − 2δ²)).
Proof: Lemma II.32 implies that
‖hx + hs‖ = ‖h‖ < √2.
On the other hand, since
xs/µ − e = xs/(x(µ)s(µ)) − e = (e + hx )(e + hs ) − e = hx + hs + hx hs ,
with x = x(s, µ), and using (6.8), it follows that
‖hx + hs + hx hs‖ = δ ≤ 1/√2.
19 Exercise 42  Let 0 ≤ t < 1. Prove that
ψ(−t/(1 + t)) ≤ t²/(2(1 + t)) ≤ ψ(t) ≤ t²/2 ≤ ψ(−t) ≤ t²/(2(1 − t)) ≤ ψ(t/(1 − t)).
Also show that the first two inequalities are valid for any t > 0.
20 Exercise 43  Let 0 ≤ δ < 1 and r ≥ 0 be such that ψ(r) ≤ ψ(−δ). Prove that r ≤ δ/(1 − δ).
21 For δ ≤ 1/2 this lemma was first shown by Gonzaga (private communication, Delft, 1994).
At this stage we may apply the fourth uv-lemma (Lemma C.8 in Appendix C) with
u = hx and v = hs , to obtain the lemma.
✷
We are now ready for the main result of this section.
Theorem II.34 If δ = δ(s, µ) ≤ 1/√2 then
bT y(µ) − bT y ≤ µ√n √(1 − √(1 − 2δ²)).
Proof: Recall from (6.27) that
bT y(µ) − bT y ≤ µ√n ‖hs‖ .
Substituting the bound of Lemma II.33 on ‖hs‖, the theorem follows.
✷
Figure 6.12   The graphs of δ and √(1 − √(1 − 2δ²)) for 0 ≤ δ ≤ 1/√2.
Figure 6.12 (page 138) shows the graphs of δ and √(1 − √(1 − 2δ²)). It is clear that
for small values of δ (δ ≤ 0.3, say) the functions can hardly be distinguished.
6.9.3
Effect of large update on barrier function value
We start by considering the effect of an update of the barrier parameter on the
difference between the dual barrier function value and its minimal value. More
precisely, we assume that for given dual feasible s and µ > 0 we have δ = δ(s, µ) ≤ 1/√2,
and we want to estimate
φdµ+ (s) = hµ+ (s) − hµ+ (s(µ+ )),
where µ+ = µ(1 − θ) for 0 ≤ θ < 1. Note that Lemma II.30 gives the answer if θ = 0:
φdµ (s) ≤ ψ(−δ).
For the general case, where θ > 0, we write
φdµ+ (s) = hµ+ (s) − hµ+ (s(µ+ ))
 = hµ+ (s) − hµ+ (s(µ)) + hµ+ (s(µ)) − hµ+ (s(µ+ ))
 = hµ+ (s) − hµ+ (s(µ)) + φdµ+ (s(µ)),   (6.28)
and we treat the first two terms and the last term in the last expression separately.
Lemma II.35 In the above notation,
hµ+ (s) − hµ+ (s(µ)) ≤ ψ(−δ) + (θ√n/(1 − θ)) √(1 − √(1 − 2δ²)).
Proof: Just using definitions we write
hµ+ (s) − hµ+ (s(µ)) = −bT y/µ+ − ∑_{j=1}^{n} log sj + bT y(µ)/µ+ + ∑_{j=1}^{n} log sj (µ)
 = −∑_{j=1}^{n} log sj + ∑_{j=1}^{n} log sj (µ) + (bT y(µ) − bT y)/µ+
 = hµ (s) − hµ (s(µ)) + (bT y(µ) − bT y)/µ+ − (bT y(µ) − bT y)/µ
 = φdµ (s) + (θ/(1 − θ)) (bT y(µ) − bT y)/µ.
Applying Lemma II.30 to the first term in the last expression, and Theorem II.34 to
the second term gives the lemma.
✷
Lemma II.36 In the above notation,
φdµ+ (s(µ)) ≤ φµ+ (x(µ), s(µ)) = nψ (θ/(1 − θ)).
Proof: The inequality follows from Lemma II.28. The equality is obtained as follows.
From (6.23),
φµ+ (x(µ), s(µ)) = eT (s(µ)x(µ)/µ+ − e) − ∑_{j=1}^{n} log (xj (µ)sj (µ)/µ+ ).
Since x(µ)s(µ) = µe and µ+ = (1 − θ)µ, this can be simplified to
φµ+ (x(µ), s(µ)) = eT (µe/µ+ − e) − ∑_{j=1}^{n} log (µ/µ+ )
 = eT (e/(1 − θ) − e) − ∑_{j=1}^{n} log (1/(1 − θ))
 = n (θ/(1 − θ) − log (1 + θ/(1 − θ)))
 = nψ (θ/(1 − θ)).
This completes the proof.
✷
Combining the results of the last two lemmas we find the next lemma.
Lemma II.37 Let δ(s, µ) ≤ 1/√2 for some dual feasible s and µ > 0. Then, if
µ+ = µ(1 − θ) with 0 ≤ θ < 1, we have
φdµ+ (s) ≤ ψ(−1/√2) + θ√n/(1 − θ) + nψ (θ/(1 − θ)).
Proof: The lemma follows from (6.28) and the bounds provided by the previous
lemmas, by substitution of δ = 1/√2.
✷
With s, µ and µ+ as in the last lemma, our aim is to estimate the number of damped
Newton steps required to reach the vicinity of the µ+ -center when starting at s. To
this end we proceed by estimating the decrease in the barrier function value during a
damped Newton step.
6.9.4
Decrease of the barrier function value
In this section we consider a damped Newton step to the µ-center at an arbitrary
positive dual feasible s and we estimate its effect on the barrier function value. The
analysis also yields a suitable value for the damping factor α. The result of the damped
Newton step is denoted by s+ , so
s+ = s + α∆s,
(6.29)
where ∆s denotes the full Newton step.
Lemma II.38 Let δ = δ(s, µ). If α = 1/(1 + δ) then the damped Newton step (6.29)
is feasible and it reduces the barrier function value by at least δ − log(1 + δ). In other
words,
φdµ (s) − φdµ (s+ ) ≥ δ − log(1 + δ) = ψ(δ).
Proof: First recall from (6.5) in Section 6.5 that the Newton step ∆s is determined
by
x(s, µ) = µ s−1 (e − s−1 ∆s).
We denote x(s, µ) briefly as x. With
z := ∆s/s = e − xs/µ,
the damped Newton step can be described as follows:
s+ = s + α∆s = s(e + αz).
Since s+ is feasible if and only if it is nonnegative, the step is certainly feasible if
α ‖z‖ < 1. Since δ = ‖z‖, the value for α specified by the lemma satisfies this condition,
and hence the feasibility of s+ follows. Now we consider the decrease in the dual barrier
function value during the step. We may write
φdµ (s) − φdµ (s+ ) = hµ (s) − hµ (s+ )
 = −bT y/µ − ∑_{j=1}^{n} log sj + bT y+/µ + ∑_{j=1}^{n} log s+j
 = (bT y+ − bT y)/µ + ∑_{j=1}^{n} log (1 + αzj ).
The difference bT y+ − bT y can be written as follows:
bT y+ − bT y = (cT x − bT y) − (cT x − bT y+ ) = xT s − xT s+
 = −α xT (sz) = −α eT (xs)z = αµ eT (z − e)z.
Thus we obtain
φdµ (s) − φdµ (s+ ) = α eT (z − e)z + ∑_{j=1}^{n} log (1 + αzj )
 = α eT z² − ( eT (αz) − ∑_{j=1}^{n} log (1 + αzj ) )
 = α δ² − Ψ(αz).
Since ‖αz‖ < 1 we may apply the right-hand side inequality in (6.24), which gives
Ψ(αz) ≤ ψ(−‖αz‖) = ψ(−αδ), whence
φdµ (s) − φdµ (s+ ) ≥ α δ² − ψ(−αδ) = α δ² + αδ + log(1 − αδ).
As a function of α, the right-hand side expression is increasing for 0 ≤ α ≤ 1/(1 + δ),
as can be easily verified, and it attains its maximal value at α = 1/(1 + δ), which is
the value specified in the lemma. Substitution of this value yields the bound in the
lemma. Thus the proof is complete.22,23
✷
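The step-size choice of Lemma II.38 is easy to check numerically. The following tiny sketch (the function name is ours) evaluates the lower bound αδ² + αδ + log(1 − αδ) from the proof and confirms that at α = 1/(1 + δ) it equals ψ(δ) = δ − log(1 + δ).

import numpy as np

def decrease(alpha, delta):
    # lower bound on the barrier decrease derived in the proof of Lemma II.38
    return alpha * delta**2 + alpha * delta + np.log(1.0 - alpha * delta)

delta = 0.9
alpha = 1.0 / (1.0 + delta)
print(np.isclose(decrease(alpha, delta), delta - np.log(1.0 + delta)))   # True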
We are now ready to estimate the number of (inner) iterations between two
successive updates of the barrier parameter.
6.9.5
Number of inner iterations
Lemma II.39 The number of (inner) iterations between two successive updates of the
barrier parameter is no larger than
⌈3 (θ√n/(1 − θ) + 1)²⌉.
Proof: From Lemma II.37 we know that after the update of µ we have
φdµ+ (s) ≤ ψ(−τ) + θ√n/(1 − θ) + nψ (θ/(1 − θ)),
where τ = 1/√2. The algorithm repeats damped Newton steps as long as the iterate s
satisfies δ = δ(s, µ+ ) > τ. In that case the step decreases the barrier function value
by at least ψ(δ), by Lemma II.38. Since δ > τ, the decrease is at least
ψ(τ) = 0.172307.
As soon as the barrier function value has reached ψ(τ) we are sure that δ(s, µ+ ) ≤ τ,
from Lemma II.30. Hence, the number of inner iterations is no larger than
(1/ψ(τ)) ( ψ(−τ) − ψ(τ) + θ√n/(1 − θ) + nψ (θ/(1 − θ)) ).
The rest of the proof consists in reducing this expression to the one in the lemma.
First, using that ψ(−τ) = 0.52084, we obtain
(ψ(−τ) − ψ(τ))/ψ(τ) = 0.34853/0.172307 ≤ 3.
22 Exercise 44  In the proof of Lemma II.38 we found the following expression for the decrease in the dual barrier function value:
φdµ (s) − φdµ (s+ ) = α eT z² − Ψ(αz),
where α denotes the size of the damped Newton step. Show that the decrease is maximal for the unique step-size ᾱ determined by the equation
eT z² = ∑_{j=1}^{n} ᾱzj²/(1 + ᾱzj )
and that for this value the decrease is given by Ψ(ᾱz/(e + ᾱz)).
23 It is interesting to observe that Lemma II.38 provides a second proof of the first statement in Lemma II.30, namely
φdµ (s) ≥ ψ(δ),
where δ := δ(s, µ). This follows from Lemma II.38, since φdµ (s+ ) ≥ 0.
Furthermore, using ψ(t) ≤ t²/2 for t ≥ 0 we get24
nψ (θ/(1 − θ)) ≤ nθ²/(2(1 − θ)²).   (6.30)
Finally, using that 1/ψ(τ) < 6 we obtain the following upper bound for the number
of inner iterations:
3 + 6θ√n/(1 − θ) + 3nθ²/(1 − θ)² = 3 (θ√n/(1 − θ) + 1)² ≤ ⌈3 (θ√n/(1 − θ) + 1)²⌉.
This proves the lemma.
✷
Remark II.40 It is tempting to apply Lemma II.39 to the case where θ = 1/(3√n). We
know that for that value of θ one full Newton step keeps the iterate in the region of quadratic
convergence around the µ+ -center. Substitution of this value in the bound of Lemma II.39
however yields that at least 6 damped Newton steps are required for the same purpose. This
disappointing result reveals a weakness of the above analysis. The weakness probably stems
from the fact that the estimate of the number of inner iterations in one outer iteration is based
on the assumption that the decrease in the barrier function value is given by the constant
ψ(τ ). Actually the decrease is at least ψ(δ). Since in many inner iterations, in particular in
the iterations immediately after the update of the barrier parameter, the proximity δ may be
much larger than τ , the actual number of iterations may be much smaller than the pessimistic
estimate of Lemma II.39. This is the reason why for the algorithm with large updates there
exists a gap between theory and practice. In practice the number of inner iterations is much
smaller than the upper bound given by the lemma. Hopefully future research will close this
gap.25
•
6.9.6
Total number of iterations
We proceed by estimating the total number of iterations required by the algorithm.
Theorem II.41 To obtain a primal-dual pair (x, s), with x = x(s, µ), such that
xT s ≤ 2ε, at most
⌈ ⌈3 (θ√n/(1 − θ) + 1)²⌉ (1/θ) log (nµ0/ε) ⌉
iterations are required by the logarithmic barrier algorithm with large updates.
24 A different estimate arises by using Exercise 39, which implies ψ(t) ≤ t²/(1 + t) for t > −1. Hence
nψ (θ/(1 − θ)) ≤ nθ²/(1 − θ),
which is sharper than (6.30) if θ > 1/2. The use of (6.30) however does not deteriorate the order of our estimates below.
25 Exercise 45  Let δ = δ(s, µ) > 0 and x = x(s, µ). Then the vector z = (xs/µ) − e has at least one positive coordinate. Prove this. Hence, if z has only one nonzero coordinate then this coordinate equals ‖z‖. Show that in that case the single damped Newton step with step-size α = 1/(1 + δ) yields s+ = s(µ).
Proof: The number of outer iterations follows from Lemma I.36. The bound in the
theorem is obtained by multiplying this number by the bound of Lemma II.39 for
the number of inner iterations per outer iteration and rounding the product, if not
integral, to the smallest integer above it.
✷
We end this section by drawing two conclusions. If we take θ to be a fixed constant
(independent of n), for example θ = 1/2, the iteration bound of Theorem II.41 becomes
O (n log (nµ0/ε)).
For such values of θ we say that the algorithm uses large updates. The number of
inner iterations per outer iteration is then O(n).
If we take θ = ν/√n for some fixed constant ν (independent of n), the iteration
bound of Theorem II.41 becomes
O (√n log (nµ0/ε)),
provided that n is large enough (n ≥ ν² say). It has become common to say that the
algorithm uses medium updates. The number of inner iterations per outer iteration is
then bounded by a constant, depending on ν.
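As a rough numerical companion to these conclusions, the following sketch evaluates the bound of Lemma II.39 and the total bound of Theorem II.41; the values of n, µ0 and ε are illustrative only.

import numpy as np

def inner_bound(n, theta):
    # bound of Lemma II.39 on the inner iterations per outer iteration
    return int(np.ceil(3.0 * (theta * np.sqrt(n) / (1.0 - theta) + 1.0) ** 2))

def total_bound(n, theta, mu0, eps):
    # iteration bound of Theorem II.41 for the large-update algorithm
    outer = np.log(n * mu0 / eps) / theta
    return int(np.ceil(inner_bound(n, theta) * outer))

# a large update (theta = 1/2) versus a medium update (theta = 1/sqrt(n))
n, mu0, eps = 3, 2.0, 1e-4
print(total_bound(n, 0.5, mu0, eps), total_bound(n, 1.0 / np.sqrt(n), mu0, eps))

The printed bounds are far larger than the 14 to 16 iterations observed in the tables below, which illustrates the gap between theory and practice mentioned in Remark II.40.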
In the next section we give an illustration.
6.9.7
Illustration of the algorithm with large updates
We use the same sample problem as before (see Sections 6.7.2 and 6.8.4) and solve
it using the dual logarithmic barrier algorithm with large updates. We do this for
several values of the barrier update parameter θ. As before, we start the algorithm at
y = (0, 0) and µ = 2, and the accuracy parameter is set to ε = 10−4 . For θ = 0.5,
Table 6.3. (page 145) lists the algorithm’s progress.
The table needs some explanation. The first two columns contain counters for the
outer and inner iterations, respectively. The algorithm requires 16 outer and 16 inner
iterations. The table shows the effect of each outer iteration, which involves an update
of the barrier parameter, and also the effect of each inner iteration, which involves a
move in the dual space. During a barrier parameter update the dual variables y and
s remain unchanged, but, because of the change in µ, the primal variable x(s, µ) and
the proximity attain new values. After each update, damped Newton steps are taken
until the proximity reaches the value τ . In this example the number of inner iterations
per outer iteration is never more than one. Note that we can guarantee the primal
feasibility of x only if the proximity is at most one. Since the table shows only the
second coordinate of x (and also of s), infeasibility of x can only be detected from the
table if x2 is negative. In this example this does not occur, but it occurs in the next
example, where we solve the same problem with θ = 0.9.
With θ = 0.9, Table 6.4. (page 146) shows that in some iterations x is infeasible
indeed. Moreover, although the number of outer iterations is much smaller than in
the previous case (5 instead of 16), the total number of iterations is almost the same
(14 instead of 16). Clearly, and understandably, the deeper updates make it harder to
reach the new target region.
Outer  Inner    nµ        x2         y1        y2        s2       δ
  0      0   6.000000  1.500000  0.000000  0.000000  1.000000  0.6124
  1          3.000000  0.500000  0.000000  0.000000  1.000000  0.7071
         1   3.000000  0.690744  0.292893  0.000000  1.292893  0.2229
  2          1.500000  0.230248  0.292893  0.000000  1.292893  1.3081
         2   1.500000  0.302838  0.519549  0.433260  1.519549  0.2960
  3          0.750000  0.105977  0.519549  0.433260  1.519549  1.7316
         3   0.750000  0.138696  0.717503  0.696121  1.717503  0.3618
  4          0.375000  0.056177  0.717503  0.696121  1.717503  2.0059
         4   0.375000  0.065989  0.847850  0.840792  1.847850  0.4050
  5          0.187500  0.029627  0.847850  0.840792  1.847850  2.1632
         5   0.187500  0.032120  0.920315  0.918672  1.920315  0.4367
  6          0.093750  0.015201  0.920315  0.918672  1.920315  2.2575
         6   0.093750  0.015842  0.959178  0.958681  1.959178  0.4591
  7          0.046875  0.007704  0.959178  0.958681  1.959178  2.3176
         7   0.046875  0.007866  0.979268  0.979161  1.979268  0.4744
  8          0.023438  0.003878  0.979268  0.979161  1.979268  2.3556
         8   0.023438  0.003920  0.989548  0.989516  1.989548  0.4844
  9          0.011719  0.001946  0.989548  0.989516  1.989548  2.3793
         9   0.011719  0.001956  0.994747  0.994740  1.994747  0.4907
 10          0.005859  0.000975  0.994747  0.994740  1.994747  2.3937
        10   0.005859  0.000977  0.997366  0.997364  1.997366  0.4945
 11          0.002930  0.000488  0.997366  0.997364  1.997366  2.4023
        11   0.002930  0.000488  0.998681  0.998680  1.998681  0.4968
 12          0.001465  0.000244  0.998681  0.998680  1.998681  2.4074
        12   0.001465  0.000244  0.999340  0.999340  1.999340  0.4982
 13          0.000732  0.000122  0.999340  0.999340  1.999340  2.4103
        13   0.000732  0.000122  0.999670  0.999670  1.999670  0.4990
 14          0.000366  0.000061  0.999670  0.999670  1.999670  2.4120
        14   0.000366  0.000061  0.999835  0.999835  1.999835  0.4994
 15          0.000183  0.000031  0.999835  0.999835  1.999835  2.4130
        15   0.000183  0.000031  0.999917  0.999917  1.999917  0.4997
 16          0.000092  0.000015  0.999917  0.999917  1.999917  2.4135
        16   0.000092  0.000015  0.999959  0.999959  1.999959  0.4998
Table 6.3.   Progress of the dual algorithm with large updates, θ = 0.5.
This is even more true in the last example where we take θ = 0.99. Table 6.5.
(page 146) shows the result. The number of outer iterations is only 3, but the total
number of iterations is still 14. This leads us to the important observation that the
deep update strategy has its limits. On the other hand, the number of iterations is
competing with the methods using full Newton steps, and is significantly less than the
iteration bound of Theorem II.41.
Outer  Inner    nµ         x2         y1        y2        s2        δ
  0      0   6.000000   1.500000  0.000000  0.000000  1.000000   0.6124
  1          0.600000  −0.300000  0.000000  0.000000  1.000000   5.3385
         1   0.600000   0.014393  0.394413  0.631060  1.394413   2.4112
         2   0.600000   0.108620  0.762163  0.722418  1.762163   0.5037
  2          0.060000  −0.005240  0.762163  0.722418  1.762163  16.8904
         3   0.060000   0.008563  0.906132  0.922246  1.906132   4.7236
         4   0.060000   0.010057  0.967364  0.961475  1.967364   1.1306
         5   0.060000   0.010098  0.977293  0.978223  1.977293   0.1716
  3          0.006000   0.000891  0.977293  0.978223  1.977293  14.3247
         6   0.006000   0.000994  0.992649  0.992275  1.992649   3.9208
         7   0.006000   0.001001  0.996651  0.996769  1.996651   0.9143
         8   0.006000   0.001001  0.997834  0.997808  1.997834   0.1277
  4          0.000600   0.000099  0.997834  0.997808  1.997834  13.9956
         9   0.000600   0.000100  0.999254  0.999264  1.999254   3.8257
        10   0.000600   0.000100  0.999676  0.999673  1.999676   0.8883
        11   0.000600   0.000100  0.999782  0.999783  1.999782   0.1224
  5          0.000060   0.000010  0.999782  0.999783  1.999782  13.9508
        12   0.000060   0.000010  0.999926  0.999926  1.999926   3.8128
        13   0.000060   0.000010  0.999967  0.999968  1.999967   0.8847
        14   0.000060   0.000010  0.999978  0.999978  1.999978   0.1216
Table 6.4.   Progress of the dual algorithm with large updates, θ = 0.9.
Outer  Inner    nµ         x2         y1        y2        s2         δ
  0      0   6.000000   1.500000  0.000000  0.000000  1.000000    0.6124
  1          0.060000  −0.480000  0.000000  0.000000  1.000000   60.4235
         1   0.060000  −0.133674  0.407010  0.797740  1.407010   28.2966
         2   0.060000   0.008587  0.906680  0.860655  1.906680    7.0268
         3   0.060000   0.009852  0.949767  0.964246  1.949767    1.7270
         4   0.060000   0.010099  0.978068  0.974574  1.978068    0.2919
  2          0.000600  −0.000021  0.978068  0.974574  1.978068  166.4835
         5   0.000600   0.000086  0.992297  0.993722  1.992297   48.2832
         6   0.000600   0.000099  0.998161  0.997593  1.998161   13.7438
         7   0.000600   0.000100  0.999183  0.999394  1.999183    3.6913
         8   0.000600   0.000100  0.999720  0.999656  1.999720    0.8224
         9   0.000600   0.000100  0.999781  0.999792  1.999781    0.1013
  3          0.000006   0.000001  0.999781  0.999792  1.999781  149.4817
        10   0.000006   0.000001  0.999939  0.999934  1.999939   43.4727
        11   0.000006   0.000001  0.999980  0.999981  1.999980   12.4359
        12   0.000006   0.000001  0.999994  0.999993  1.999994    3.3655
        13   0.000006   0.000001  0.999997  0.999997  1.999997    0.7573
        14   0.000006   0.000001  0.999998  0.999998  1.999998    0.0949
Table 6.5.   Progress of the dual algorithm with large updates, θ = 0.99.
We conclude this section with a graphical illustration of the algorithm, with θ = 0.9.
Figure 6.13 shows the first outer iteration, which consists of 2 inner iterations.
Figure 6.13   The first iterates for a large update with θ = 0.9 (the plot shows, in the (y1 , y2 )-plane, the central path and the boundaries δ(s, 2) = τ and δ(s, 0.2) = τ).
7
The Primal-Dual Logarithmic
Barrier Method
7.1
Introduction
In the previous chapter we dealt extensively with the dual logarithmic barrier approach
to the LO problem. It has become clear that Newton’s method, when applied to find
the minimizer of the dual logarithmic barrier function, yields a search direction ∆s in
the dual space that allows us to follow the dual central path (approximately) to the
dual optimal set. We were able to show that an ε-solution of (D) can be obtained in a
number of iterations that is proportional to the product of the logarithm of the initial
duality gap divided by the desired accuracy, and √n (for the full-step method and the
medium-update method) or n (for the large-update method). Although the driving
force in the dual logarithmic barrier approach is the desire to solve the dual problem
(D), it also yields an ε-solution of the primal problem (P ). The problem (P ) also plays
a crucial role in the analysis of the method. For example, the Newton step ∆s at (y, s)
for the barrier parameter value µ is described by the primal variable x(s, µ). Moreover,
the convergence proof of the method uses the duality gap cT x(s, µ) − bT y. Finally, the
analysis of the medium-update and large-update versions of the dual method strongly
depend on the properties of the primal-dual logarithmic barrier function φµ (x, s).
The aim of this chapter is to show that we can benefit from the primal problem not
only in the analysis but also in the design of the algorithm. The idea is to solve both
the dual and the primal problem simultaneously, by taking in each iteration a step
∆s in the dual space and a step ∆x in the primal space. Here, the search directions
∆s and ∆x still have to be defined. This is done in the next section. Again, Newton’s
name is given to the search directions, but now the search directions arise from an
iterative method — also due to Newton — for solving the system of equations defining
the µ-centers of (P ) and (D).
In the following paragraphs we follow the same program as for the dual algorithms:
we first introduce a proximity measure, then we deal with full-step methods, with both
fixed and adaptive updates of the barrier parameter, and finally we consider methods
that use deep (but fixed) updates and damped Newton steps.
For the sake of clarity, it might be useful to emphasize that it is not our aim to
take for ∆s the dual Newton step and for ∆x its counterpart, the primal Newton step.
For this would mean that we were executing two algorithms simultaneously, namely
the dual logarithmic barrier algorithm and the primal logarithmic barrier algorithm.
Apart from the fact that this makes no sense, it doubles the computational work
(roughly speaking). Instead, we define the search directions ∆s and ∆x in a new
way and we show that the resulting algorithms, called primal-dual algorithms, allow
similar theoretical iteration bounds to their dual (or primal) counterparts. In practice,
however, primal-dual methods have a very good reputation. Many computational
studies give support to this reputation. This is especially true for the so-called
predictor-corrector method, which is discussed in Section 7.7.
7.2
Definition of the Newton step
In this section we are given a positive primal-dual feasible pair (x, (y, s)), and some
µ > 0. Our aim is to define search directions ∆x, ∆y, ∆s that move in the direction of
the µ-center x(µ), y(µ), s(µ). In fact, we want the new iterates x + ∆x, y + ∆y, s + ∆s
to satisfy the KKT system (5.3) with respect to µ. After substitution this yields the
following conditions on ∆x, ∆y, ∆s:
A(x + ∆x) = b,   x + ∆x > 0,
AT (y + ∆y) + s + ∆s = c,   s + ∆s > 0,
(x + ∆x)(s + ∆s) = µe.
If we neglect for the moment the inequality constraints, then, since Ax = b and
AT y + s = c, this system can be rewritten as follows:
A∆x = 0,
AT ∆y + ∆s = 0,   (7.1)
s∆x + x∆s + ∆x∆s = µe − xs.
Unfortunately, this system of equations in ∆x, ∆y and ∆s is nonlinear, because of the
term ∆x∆s in the third equation. To overcome this difficulty we simply neglect this
quadratic term, according to Newton’s method for solving nonlinear equations, and
we obtain the linear system
A∆x = 0,
AT ∆y + ∆s = 0,   (7.2)
s∆x + x∆s = µe − xs.
Below we show that this system determines the displacements ∆x, ∆y and ∆s
uniquely. We call them the primal-dual Newton directions and these are the directions
we are going to use.
Theorem II.42 The system (7.2) has a unique solution, namely
∆y = (AXS−1 AT )−1 (b − µAs−1 ),
∆s = −AT ∆y,
∆x = µs−1 − x − xs−1 ∆s.
Proof: We divide the third equation in (7.2) coordinatewise by s, and obtain
∆x + xs−1 ∆s = µs−1 − x.
(7.3)
Multiplying this equation from the left by A, and using that A∆x = 0 and Ax = b,
we get
AXS −1 ∆s = µAs−1 − Ax = µAs−1 − b.
The second equation gives ∆s = −AT ∆y. Substituting this we find
AXS −1 AT ∆y = b − µAs−1 .
Since A is an m × n matrix of rank m, the matrix AXS −1 AT has size m × m and is
nonsingular, so the last equation determines ∆y uniquely as specified in the theorem.
Now ∆s follows uniquely from ∆s = −AT ∆y. Finally, (7.3) yields the expression for
∆x.1
✷
Remark II.43 In the analysis below we do not use the expressions just found for the search
directions in the primal and the dual space. But it is important to see that their computation
requires the solution of a linear system of equations with AXS −1 AT as coefficient matrix.
We refer the reader to Chapter 20 for a discussion of computational issues related to efficient
solution methods for such systems.
•
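For concreteness, the expressions of Theorem II.42 translate directly into code. The sketch below (numpy; the data are illustrative, not taken from the book) forms the matrix AXS−1 AT , solves for ∆y, recovers ∆s and ∆x, and checks that the displacements indeed satisfy the linear system (7.2).

import numpy as np

def primal_dual_newton(A, b, x, s, mu):
    # primal-dual Newton directions of Theorem II.42; x must satisfy Ax = b
    M = (A * (x / s)) @ A.T                      # A X S^{-1} A^T
    dy = np.linalg.solve(M, b - mu * (A @ (1.0 / s)))
    ds = -A.T @ dy
    dx = mu / s - x - (x / s) * ds
    return dx, dy, ds

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
x = np.array([1.0, 1.0, 1.0])
b = A @ x
s = np.array([0.5, 1.5, 0.5])                    # a positive dual slack vector
mu = 1.0
dx, dy, ds = primal_dual_newton(A, b, x, s, mu)
print(np.allclose(A @ dx, 0), np.allclose(s * dx + x * ds, mu - x * s))   # True True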
Remark II.44 We can easily deduce from Theorem II.42 that the primal-dual directions
for the y- and the s-space differ from the dual search directions used in the previous chapter.
For example, the dual direction for y was given by
(AS−2 AT )−1 (b/µ − As−1 ),
whereas the primal-dual direction is given by
(AXS−1 AT )−1 (b − µAs−1 ).
The difference is that the scaling matrix S −2 in the dual case is replaced by the scaling matrix
XS −1 /µ in the primal-dual case. Note that the two scaling matrices coincide if and only if
XS = µI, which happens if and only if x = x(µ) and s = s(µ). In that case both expressions
vanish, since then µAs−1 = Ax = b. We conclude that if s 6= s(µ) then the dual directions
in the y- and in the s-space differ from the corresponding primal-dual directions. A similar
result holds for the search direction in the primal space. It may be worthwhile to point out
that the dual search direction at y depends only on y itself and the slack vector s = c − AT y,
whereas the primal-dual direction at y also depends on the given primal variable x.
•
1 Exercise 46  An alternative proof of the unicity property in Theorem II.42 can be obtained by showing that the matrix in the linear system (7.2) is nonsingular. This matrix is given by
[ A   0    0 ]
[ 0   AT   I ]
[ S   0    X ].
Prove that this matrix is nonsingular.
7.3
Properties of the Newton step
We denote the result of the (full) Newton step at (x, y, s) by (x+ , y + , s+ ):
x+ = x + ∆x,
y + = y + ∆y,
s+ = s + ∆s.
Note that the new iterates satisfy the affine equations Ax+ = b and AT y + + s+ = c,
since A∆x = 0 and AT ∆y + ∆s = 0, so we only have to concentrate on the sign of
the vectors x+ and s+ . We call the Newton step feasible if x+ and s+ are nonnegative
and strictly feasible if x+ and s+ are positive. The main aim of this section is to find
conditions for feasibility and strict feasibility of the (full) Newton step.
First we deal with two simple lemmas.2
Lemma II.45 ∆x and ∆s are orthogonal.
Proof: Since A∆x = 0, ∆x belongs to the null space of A, and since ∆s = −AT ∆y,
∆s belongs to the row space of A. Since these spaces are orthogonal, the lemma follows.
✷
If x+ and s+ are nonnegative (positive), then their product is nonnegative (positive)
as well. We may write
x+ s+ = (x + ∆x)(s + ∆s) = xs + (s∆x + x∆s) + ∆x∆s.
Since s∆x + x∆s = µe − xs this leads to
x+ s+ = µe + ∆x∆s.
(7.4)
Thus it follows that x+ and s+ are feasible only if µe + ∆x∆s is nonnegative.
Surprisingly enough, the converse is also true. This is the content of our next lemma.
Lemma II.46 The primal-dual Newton step is feasible if and only if µe + ∆x∆s ≥ 0
and strictly feasible if and only if µe + ∆x∆s > 0.
Proof: The ‘only if’ part of both statements in the lemma follows immediately from
(7.4). For the proof of the converse part we introduce a step length α, 0 ≤ α ≤ 1, and
we define
xα = x + α∆x, y α = y + α∆y, sα = s + α∆s.
We then have x0 = x, x1 = x+ and similar relations for the dual variables. Hence we
have x0 s0 = xs > 0. The proof uses a continuity argument, namely that x1 and s1
are nonnegative if xα sα is positive for all α in the open interval (0, 1). This argument
has a simple geometric interpretation: x1 and s1 are feasible if and only if the open
segment connecting x0 and x1 lies in the interior of the primal feasible region, and the
open segment connecting s0 and s1 lies in the interior of the dual feasible region. Now
we write
xα sα = (x + α∆x)(s + α∆s) = xs + α (s∆x + x∆s) + α2 ∆x∆s.
2
One might observe that some of the results in this and the next section are quite similar to
analogous results in Section 2.7.2 in Part I for the Newton step for the self-dual model. To keep
the treatment here self-supporting we do not invoke these results, however.
Using s∆x + x∆s = µe − xs gives
xα sα = xs + α (µe − xs) + α2 ∆x∆s.
Now suppose µe + ∆x∆s ≥ 0. Then it follows that
xα sα ≥ xs + α (µe − xs) − α2 µe = (1 − α) (xs + αµe) .
Since xs and e are positive it follows that xα sα > 0 for 0 ≤ α < 1. Hence, none of the
entries of xα and sα vanish for 0 ≤ α < 1. Since x0 and s0 are positive, this implies
that xα > 0 and sα > 0 for 0 ≤ α < 1. Therefore, by continuity, the vectors x1 and
s1 cannot have negative entries. This completes the proof of the first statement in the
lemma. Assuming µe + ∆x∆s > 0, we derive in the same way
xα sα > xs + α (µe − xs) − α2 µe = (1 − α) (xs + αµe) .
This implies that x1 s1 > 0. Hence, by continuity, x1 and s1 must be positive, proving
the second statement in the lemma.
✷
We proceed with a discussion of the vector ∆x∆s. From (7.4) it is clear that the
error made by neglecting the second-order term in the nonlinear system (7.1) is given
by this vector. It represents the so-called second-order effect in the Newton step.
Therefore it will not be surprising that the vector ∆x∆s plays a crucial role in the
analysis of primal-dual methods.
It is worth considering the ideal case where the second-order term vanishes. If
∆x∆s = 0, then ∆x and ∆s solve the nonlinear system (7.1). By Lemma II.46 the
Newton iterates x+ and s+ are feasible in this case. Hence they satisfy the KKT
conditions. Now the unicity property gives us that x+ = x(µ) and s+ = s(µ). Thus
we see that the Newton process is exact in this case: it produces the µ-centers in one
iteration.3
In general the second-order term is nonzero and the new iterates do not coincide
with the µ-centers. But we have the surprising property that the duality gap always
assumes the same value as at the µ-centers, where the duality gap equals nµ.
Lemma II.47 If the primal-dual Newton step is feasible then (x+ )T s+ = nµ.
Proof: Using (7.4) and the fact that the vectors ∆x and ∆s are orthogonal, the
duality gap after the Newton step can be written as follows:
(x+ )T s+ = eT (x+ s+ ) = eT (µe + ∆x∆s) = µ eT e = nµ.
This proves the lemma.
✷
In the general case we need some quantity for measuring the progress of the
Newton iterates on the way to the µ-centers. As in the case of the dual logarithmic
barrier method we start by considering a ‘full-step method’. We then deal with
3
Exercise 47 Let (x, s) be a positive primal-dual feasible pair with x = x(µ). Show that the
Newton process is exact in this case, with ∆x = 0 and ∆s = s(µ) − s. (A similar result holds if
s = s(µ), and follows in the same way.)
an ‘adaptive method’, in which the barrier parameter is updated ‘adaptively’, and
then turn to the ‘large-update method’, which uses large fixed updates and damped
Newton steps. For the large-update method we already have an excellent candidate
for measuring proximity to the µ-centers, namely the primal-dual logarithmic barrier
function φµ (x, s). For the full-step method and the adaptive method we need a new
measure that is introduced in the next section.
7.4
Proximity and local quadratic convergence
Recall that for the dual method we have used the Euclidean norm of the Newton step
∆s scaled by s as a proximity measure. It is not at all obvious how this successful
approach can be generalized to the primal-dual case. However, there is a natural way
of doing this, but we first have to reformulate the linear system (7.2) that defines the
Newton directions in the primal-dual case. To this end we introduce the vectors
d := √(x/s),   u := √(xs/µ).
Using d we can rescale x and s to the same vector, namely u:
d−1 x/√µ = u,   ds/√µ = u.
Now we scale ∆x and ∆s similarly to dx and ds :
d−1 ∆x/√µ =: dx ,   d∆s/√µ =: ds .   (7.5)
For easy reference in the future we write
x+ = x + ∆x = √µ d (u + dx )   (7.6)
s+ = s + ∆s = √µ d−1 (u + ds )   (7.7)
and, using (7.4),
x+ s+ = µe + ∆x∆s = µ (e + dx ds ).   (7.8)
Thus we may restate Lemma II.46 without further proof as follows.
Lemma II.48 The primal-dual Newton step is feasible if and only if
e + dx ds ≥ 0   (7.9)
and strictly feasible if and only if
e + dx ds > 0.   (7.10)
Since
∆x∆s = µ dx ds ,   (7.11)
the orthogonality of ∆x and ∆s implies that the scaled displacements dx and ds are
orthogonal as well. Now we may reformulate the left-hand side in the third equation
of the KKT system as follows:
s∆x + x∆s = √µ (sd dx + xd−1 ds ) = µ u (dx + ds ),
and the right-hand side can be rewritten as
µe − xs = µe − µu² = µ u (u−1 − u).
The third equation can then be restated simply as
dx + ds = u−1 − u.
On the other hand, the first and the second equations can be reformulated as ADdx = 0
and (AD)T dy + ds = 0, where
dy = ∆y/√µ.
We conclude that the scaled displacements dx , dy and ds satisfy
ADdx = 0,
(AD)T dy + ds = 0,   (7.12)
dx + ds = u−1 − u.
The first two equations show that the vectors dx and ds belong to the null space and
the row space of the matrix AD respectively. These two spaces are orthogonal and
the row space of AD is equal to the null space of the matrix HD−1 , where H is any
matrix whose null space is equal to the row space of A, as defined in Section 6.3 (page
111). The last equation makes clear that dx and ds form the orthogonal components
of the vector u−1 − u in these complementary subspaces. Therefore, we find4,5
dx
=
ds
=
PAD (u−1 − u)
(7.13)
PHD−1 (u−1 − u).
(7.14)
The orthogonality of dx and ds also implies
kdx k2 + kds k2 = u−1 − u
2
.
(7.15)
Note that the displacements dx , ds (and also dy ) are zero if and only if u−1 − u = 0.
In this case x, y and s coincide with the respective µ-centers. It will be clear that
the quantity u−1 − u is a natural candidate for measuring closeness to the pair of
4 Exercise 48  Verify that the expressions for the scaled displacements dx and ds in (7.13) and (7.14) are in accordance with Theorem II.42.
5 Exercise 49  Show that PAD + PHD−1 = I, where I denotes the identity matrix in IRn . Also show that
PAD = D−1 H T (HD−2 H T )−1 HD−1 ,   PHD−1 = DAT (AD2 AT )−1 AD.
156
II Logarithmic Barrier Approach
µ-centers. It turns out that it is more convenient not to use the norm of u−1 − u itself,
but to divide it by 2. Therefore, we define
δ(x, s; µ) := (1/2) ‖u−1 − u‖ = (1/2) ‖√(xs/µ) − √(µ/(xs))‖.         (7.16)
By (7.15), δ(x, s; µ) is simply half of the Euclidean norm of the concatenation of the
search direction vectors ∆x and ∆s after some appropriate scaling.6,7,8
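For readers who want to experiment numerically, the following sketch (Python with NumPy; the function name and the use of the normal equations are our own choices, and A is assumed to have full row rank) computes the scaled displacements (7.13) and (7.14) and the proximity measure (7.16) for a given positive primal-dual pair. It is only an illustration of the formulas above, not part of the theory.

import numpy as np

def scaled_directions_and_proximity(A, x, s, mu):
    # d and u as defined above; D = diag(d)
    d = np.sqrt(x / s)
    u = np.sqrt(x * s / mu)
    v = 1.0 / u - u                        # the vector u^{-1} - u
    AD = A * d                             # the matrix A D
    # ds is the orthogonal projection of v onto the row space of AD,
    # dx the complementary projection onto the null space of AD
    ds = AD.T @ np.linalg.solve(AD @ AD.T, AD @ v)
    dx = v - ds
    delta = 0.5 * np.linalg.norm(v)        # proximity measure (7.16)
    return dx, ds, delta

# The unscaled Newton step follows from (7.5):
#   Delta x = sqrt(mu) * d * dx,   Delta s = sqrt(mu) * ds / d.
# The returned dx and ds are orthogonal and satisfy dx + ds = u^{-1} - u, as in (7.12).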
In the previous section we discussed that the quality of the Newton step greatly
depends on the second-order term ∆x∆s. Recall that this term, when expressed in
the scaled displacements, equals µdx ds . We proceed by showing that the vector dx ds
can be nicely bounded in terms of the proximity measure.
Lemma II.49 Let (x, s) be any positive primal-dual pair and suppose µ > 0. If
δ := δ(x, s; µ), then ‖dx ds‖∞ ≤ δ2 and ‖dx ds‖ ≤ δ2 √2.
Proof: Since the vectors dx and ds are orthogonal, the lemma follows immediately
from the first uv-lemma (Lemma C.4 in Appendix C) by noting that dx + ds = u−1 − u
and ‖u−1 − u‖ = 2δ.
✷
We are now ready for the main result of this section (Theorem II.50 below), which
is the primal-dual analogue of Theorem II.21 for the dual logarithmic barrier method.
Theorem II.50 If δ := δ(x, s; µ) ≤ 1, then the primal-dual Newton step is feasible,
i.e., x+ and s+ are nonnegative. Moreover, if δ < 1, then x+ and s+ are positive and
δ(x+, s+; µ) ≤ δ2 / √(2(1 − δ2)).
6
This proximity measure was introduced by Jansen et al. [157]. In the context of primal-dual
methods, most authors used a different but closely related proximity measure. See Section 7.5.3.
Because of the analogy with the proximity measure in the dual case, and also because of its natural
interpretation as the norm of the scaled Newton direction, we prefer the proximity measure as
defined by (7.16). Another motivation for the use of this measure is that it allows sharper estimates
in the analysis of the primal-dual methods. This will become clear later.
7 Exercise 50 Let δ = δ(x, s; µ). In general the vector x̄ = µs−1 is not primal feasible, and the
vector s̄ = µx−1 not dual feasible. The aim of this exercise is to show that the deviation from
feasibility can be measured in a natural way. Defining
Gp = AD2 AT,    Gd = HD−2 H T,
we have
‖Ax̄ − b‖Gp−1 = √µ ‖ds‖,    ‖H s̄ − Hc‖Gd−1 = √µ ‖dx‖.
As a consequence, prove that
‖Ax̄ − b‖2Gp−1 + ‖H s̄ − Hc‖2Gd−1 = 4µδ2.
8 Exercise 51 Prove that
δ(x, s; µ) = √( (1/2) Σi=1..n ( cosh( log(xi si/µ) ) − 1 ) ) = √( Σi=1..n sinh2( (1/2) log(xi si/µ) ) ).
Proof: Since δ := δ(x, s; µ) ≤ 1, Lemma II.49 implies that kdx ds k∞ ≤ 1. Now Lemma
II.48 yields that the primal-dual Newton step is feasible, thus proving the first part
of the theorem. Now let us turn to the proof of the second statement. Obviously, the
same arguments as for the first part show that if δ < 1 then x+ and s+ are positive.
Let δ+ := δ(x+, s+; µ) and
u+ := √(x+ s+ / µ).
Then we have, by definition,
2δ+ = ‖(u+)−1 − u+‖ = ‖(u+)−1 (e − (u+)2)‖.
Recall from (7.8) that
x+ s+ = µ (e + dx ds).
Hence,
u+ = √(e + dx ds).
Substitution gives
2δ+ = ‖dx ds / √(e + dx ds)‖ ≤ ‖dx ds‖ / √(1 − ‖dx ds‖∞).
Now using the bounds in Lemma II.49 we obtain
2δ+ ≤ δ2 √2 / √(1 − δ2).
Dividing both sides by 2 we arrive at the result in the theorem.
✷
Theorem II.50 makes clear that the primal-dual Newton method is quadratically
convergent in the region
{(x, s) ∈ P × D : δ(x, s; µ) ≤ 1/√2 = 0.7071},                       (7.17)
where we have δ+ ≤ δ2. It is clear that Theorem II.50 has no value if the upper bound
for δ(x+, s+; µ) is not smaller than δ, which is the case if δ ≥ √(2/3) = 0.8165.
As for the dual Newton method, we provide a graphical example to illustrate how
the primal-dual Newton process behaves.
Example II.51 We use the same problem as in Example II.7 with b = (1, 1)T . So
A, b and c are given by
A = [ 1 −1 0 ; 0 0 1 ],    c = (1, 1, 1)T,    b = (1, 1)T.
Instead of drawing a graph in the dual (or primal) space we take another approach. We
associate with each primal-dual pair (x, s) the positive vector w = xs, and represent
this vector by a point in the so-called w-space, which is the interior of the nonnegative
orthant of IRn, with n = 3.

Figure 7.1   Quadratic convergence of primal-dual Newton process (µ = 1).

Note that δ(x, s; µ) = 0 if and only if x = x(µ) and
s = s(µ), and that in that case xs = µe. Hence, in the w-space the central path is
represented by the half-line µe, µ > 0. Figure 7.1 shows the level curves (in the w-space) for the proximity values τ = 1/√2 and τ2 with respect to µ = 1, and also how
the Newton step behaves when applied at some points on the boundary of the region
of quadratic convergence. This figure depicts the w-space projected onto its first two
coordinates. The starting point for a Newton step is always indicated by the symbol
‘o ’, and the point resulting from the step by the symbol ‘∗ ’.9 The curve connecting
the two points shows the intermediate values of xs on the way from the starting point
to the point after the full Newton step. The points on these curves represent
xα sα = (x + α∆x)(s + α∆s) = xs + α (x∆s + s∆x) + α2 ∆x∆s, 0 ≤ α ≤ 1,
where (x0 , s0 ) is the starting point of the iteration and (x1 , s1 ) the result of the full
Newton step. If there were no second-order effects (i.e., if ∆x∆s = 0) then this curve
would be a straight line. So the curvature of the line connecting the point before and
after a step is an indication of the second-order effect. Note that after the Newton
step the new proximity value is always smaller than τ 2 = 1/2, in agreement with
Theorem II.50. In fact, one may observe that often the decrease in the proximity to
the 1-center is much more significant.
9
The starting points in this example were obtained by using theory that will be developed later in
the book, in Part III. There we show that for any positive vector w ∈ IRn there exists a primal-dual
pair (x, s) such that xs = w and we also deal with methods that yield such a pair. For each starting
point the first two entries of w can be read from the figure; for the third coordinate of w we took
the value 1, the value of w3 at the 1-center, since x(1)s(1) = e.
When starting outside the region of quadratic convergence the behavior of the
Newton process is quite unpredictable. Note that the feasibility of the (full) Newton
step is then not guaranteed by the theory.
Figure 7.2   Demonstration of the primal-dual Newton process.
In Figure 7.2 we consider the behavior of the Newton process outside this region,
even for proximity values larger than 1. The behavior (in this simple example) is
surprisingly good if we start on (or close to) the central path. When starting closer
to the boundary of the w-space the second-order effect becomes more evident and
this may result in infeasibility of the Newton step, as Figure 7.2 demonstrates (for
example if w1 = 8 and w2 = 1). This observation, that Newton’s method performs
better when the starting point is on or close to the central path than when we start
close to the boundary of the nonnegative orthant, is not supported by the theory, but
is in agreement with common computational practice.
♦
7.4.1
A sharper local quadratic convergence result
In this section we show that Theorem II.50 can be slightly improved. By using the
third uv−lemma (Lemma C.7 in Appendix C) we obtain the following.
Theorem II.52 If δ = δ(x, s; µ) < 1 then
δ(x+, s+; µ) ≤ δ2 / √(2(1 − δ4)).
Proof: From the proof of Theorem II.50 we recall the definitions of δ + and u+ , and
the relation
u+ = √(e + dx ds).
Since dx and ds are orthogonal this implies that ‖u+‖2 = n. Now we may write
4(δ+)2 = ‖(u+)−1 − u+‖2 = ‖(u+)−1‖2 + ‖u+‖2 − 2n = ‖(u+)−1‖2 − n = eT ( e/(e + dx ds) − e ).
Application of Lemma C.7 to the last expression (with u = dx and v = ds) yields the
result of the theorem, since ‖dx + ds‖ = 2δ, with δ < 1.
✷
7.5
Primal-dual logarithmic barrier algorithm with full Newton
steps
In this section we investigate a primal-dual algorithm using approximate centers. The
algorithm is described below. It is assumed that we are given a positive primal-dual
pair (x0 , s0 ) ∈ P + × D+ and µ0 > 0 such that (x0 , s0 ) is close to the µ0 -center in the
sense of the proximity measure δ(x0 , s0 ; µ0 ). In the algorithm ∆x and ∆s denote the
primal-dual Newton step, as defined before.
Primal-Dual Logarithmic Barrier Algorithm with full Newton steps
Input:
A proximity parameter τ , 0 ≤ τ < 1;
an accuracy parameter ε > 0;
(x0 , s0 ) ∈ P + × D+ and µ0 > 0 such that (x0 )T s0 = nµ0 and
δ(x0 , s0 ; µ0 ) ≤ τ ;
a barrier update parameter θ, 0 < θ < 1.
begin
x := x0 ; s := s0 ; µ := µ0 ;
while nµ ≥ (1 − θ)ε do
begin
x := x + ∆x;
s := s + ∆s;
µ := (1 − θ)µ;
end
end
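For illustration only, here is a minimal Python sketch of the algorithm box above (assuming NumPy, a full row rank matrix A, and a strictly feasible starting pair that satisfies the initialization condition). The Newton step is obtained from the defining system (7.2) via the normal equations, which is one possible implementation choice; the helper names are our own.

import numpy as np

def newton_step(A, x, s, mu):
    # solves A dx = 0, A^T dy + ds = 0, s dx + x ds = mu e - x s
    r = mu / s - x                           # (mu e - x s) / s
    M = A @ np.diag(x / s) @ A.T             # A D^2 A^T
    dy = -np.linalg.solve(M, A @ r)
    ds = -A.T @ dy
    dx = r - (x / s) * ds
    return dx, dy, ds

def full_step_algorithm(A, x, y, s, eps=1e-4):
    n = x.size
    mu = x @ s / n                           # so that x^T s = n mu
    theta = 1.0 / np.sqrt(2 * n)             # the choice of Theorem II.53 below
    while n * mu >= (1 - theta) * eps:
        dx, dy, ds = newton_step(A, x, s, mu)
        x, y, s = x + dx, y + dy, s + ds     # full Newton step
        mu = (1 - theta) * mu                # barrier update
    return x, y, s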
We have the following theorem. The proof will follow below.
Theorem II.53 If τ = 1/√2 and θ = 1/√(2n), then the Primal-Dual Logarithmic
Barrier Algorithm with full Newton steps requires at most
√(2n) log(nµ0/ε)
iterations. The output is a primal-dual pair (x, s) such that xT s ≤ ε.
7.5.1
Convergence analysis
Just as in the dual case the proof depends on a lemma that quantifies the effect on
the proximity measure of an update of the barrier parameter to µ+ = (1 − θ)µ.
Lemma II.54 Let (x, s) be a positive primal-dual pair and µ > 0 such that xT s = nµ.
Moreover, let δ := δ(x, s; µ) and let µ+ = (1 − θ)µ. Then
δ(x, s; µ+)2 = (1 − θ)δ2 + θ2 n/(4(1 − θ)).
Proof: Let δ+ := δ(x, s; µ+) and u = √(xs/µ). Then, by definition,
4(δ+)2 = ‖√(1 − θ) u−1 − u/√(1 − θ)‖2 = ‖√(1 − θ) (u−1 − u) − θu/√(1 − θ)‖2.
From xT s = nµ it follows that ‖u‖2 = n. Hence, u is orthogonal to u−1 − u:
uT (u−1 − u) = n − ‖u‖2 = 0.
Therefore,
4(δ+)2 = (1 − θ) ‖u−1 − u‖2 + θ2 ‖u‖2/(1 − θ).
Finally, since ‖u−1 − u‖ = 2δ and ‖u‖2 = n the result follows.
✷
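A quick numerical sanity check of Lemma II.54 (an illustration only, under the stated hypothesis xT s = nµ; the snippet assumes NumPy) can be carried out as follows.

import numpy as np

rng = np.random.default_rng(0)
n, theta = 5, 0.3
x = rng.uniform(0.5, 2.0, n)
s = rng.uniform(0.5, 2.0, n)
mu = x @ s / n                           # enforces x^T s = n mu

def delta(x, s, mu):
    u = np.sqrt(x * s / mu)
    return 0.5 * np.linalg.norm(1 / u - u)

lhs = delta(x, s, (1 - theta) * mu) ** 2
rhs = (1 - theta) * delta(x, s, mu) ** 2 + theta ** 2 * n / (4 * (1 - theta))
print(abs(lhs - rhs))                    # agrees up to rounding errors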
The proof of Theorem II.53 now goes as follows. At the start of the algorithm we
have δ(x, s; µ) ≤ τ = 1/√2. After the primal-dual Newton step to the µ-center we
have, by Theorem II.50, δ(x+, s+; µ) ≤ 1/2. Also, from Lemma II.47, (x+)T s+ = nµ.
Then, after the barrier parameter is updated to µ+ = (1 − θ)µ, with θ = 1/√(2n),
Lemma II.54 yields the following upper bound for δ(x+, s+; µ+):
δ(x+, s+; µ+)2 ≤ (1 − θ)/4 + 1/(8(1 − θ)) ≤ 3/8.
Assuming n ≥ 2, the last inequality follows since its left-hand side is a convex function
of θ, whose value is 3/8 both at θ = 0 and at θ = 1/2. Since θ ∈ [0, 1/2], the left-hand
side does not exceed 3/8. Since 3/8 < 1/2, we obtain δ(x+, s+; µ+) ≤ 1/√2 = τ. Thus,
after each iteration of the algorithm the property
δ(x, s; µ) ≤ τ
is maintained, and hence the algorithm is well defined. The iteration bound in the
theorem follows from Lemma I.36. Finally, since after each full Newton step the
duality gap attains its target value, by Lemma II.47, the duality gap for the pair (x, s)
generated by the algorithm is at most ε. This completes the proof of the theorem. ✷
Remark II.55 It is worthwhile to discuss the quality of the iteration bound in Theorem
II.53. For that purpose we consider the hypothetical situation where the Newton step in
each iteration is exact. Then, putting δ + = δ(x+ , s+ , µ+ ), after the update of the barrier
parameter we have
4(δ+)2 = θ2 n/(1 − θ),
and hence we have δ+ ≤ 1/√2 only if θ2 n ≤ 2(1 − θ). This occurs only if θ < √(2/n). Hence,
if we maintain the property δ(x, s; µ) ≤ 1/√2 after the update of the barrier parameter, then
the iteration bound will never be smaller than
√(n/2) log(nµ0/ε).                                                   (7.18)
Note that the iteration bound of Theorem II.53 is only a factor 2 worse than the ‘ideal’
iteration bound (7.18). Recall that the bound (7.18) assumes that the Newton step is exact
in each iteration. In this respect it is interesting to indicate that for larger values of n the
result of Theorem II.53 can be improved so that it becomes closer to the ‘ideal’ iteration
bound. But then we need to use the stronger quadratic convergence result of Theorem II.52.
If we take θ = 1/√n, then by using Lemma II.54 and Theorem II.52, we may easily verify
that the property δ(x, s; µ) ≤ τ = 1/√2 is maintained if
(1 − θ)/6 + 1/(4(1 − θ)) ≤ 1/2.
This holds if θ ≤ 0.36602, which corresponds to n ≥ 8. Thus, for n ≥ 8 the iteration bound
of Theorem II.53 can be improved to
√n log(nµ0/ε).                                                       (7.19)
This iteration bound is the best among all known iteration bounds for interior-point methods.
It differs by only a factor √2 from the ideal bound (7.18).
•
7.5.2
Illustration of the algorithm with full Newton steps
We use the same sample problem as before (see Sections 6.7.2, 6.8.4 and 6.9.7). As
starting point we use the vectors x = (2, 1, 1), y = (0, 0) and s = (1, 1, 1), and since
xT s = 4, we take the initial value of the barrier parameter µ equal to 4/3. We can
easily check that δ(x, s; µ) = 0.2887. So these data can indeed be used to initialize the
algorithm. With ε = 10−4, the algorithm generates the data collected in Table 7.1.
As before, Table 7.1. contains one entry (the first) of the vectors x and s. The seventh
column contains the values of the proximity δ = δ(x, s; µ) before the Newton step, and
the eighth column the proximity δ + = δ(x+ , s+ ; µ) after the Newton step at (x, s) to
the current µ-center.
It.   nµ         x1         y1          y2          s1         δ       δ+      θ
0     4.000000   2.000000   0.000000    0.000000    1.000000   0.2887  0.0000  0.4082
1     2.367007   2.000000   0.333333    −0.333333   0.666667   0.4596  0.0479  0.4082
2     1.400680   1.510102   0.442200    0.210998    0.557800   0.4611  0.0586  0.4082
3     0.828855   1.267497   0.601207    0.533107    0.398793   0.4618  0.0437  0.4082
4     0.490476   1.148591   0.744612    0.723715    0.255388   0.4608  0.0271  0.4082
5     0.290240   1.085283   0.843582    0.836508    0.156418   0.4601  0.0162  0.4082
6     0.171750   1.049603   0.905713    0.903253    0.094287   0.4598  0.0096  0.4082
7     0.101633   1.029055   0.943610    0.942750    0.056390   0.4597  0.0057  0.4082
8     0.060142   1.017089   0.966423    0.966122    0.033577   0.4596  0.0034  0.4082
9     0.035589   1.010076   0.980058    0.979953    0.019942   0.4596  0.0020  0.4082
10    0.021060   1.005950   0.988174    0.988137    0.011826   0.4596  0.0012  0.4082
11    0.012462   1.003516   0.992993    0.992980    0.007007   0.4596  0.0007  0.4082
12    0.007375   1.002079   0.995850    0.995846    0.004150   0.4596  0.0004  0.4082
13    0.004364   1.001230   0.997543    0.997542    0.002457   0.4596  0.0002  0.4082
14    0.002582   1.000728   0.998546    0.998545    0.001454   0.4596  0.0001  0.4082
15    0.001528   1.000430   0.999139    0.999139    0.000861   0.4596  0.0001  0.4082
16    0.000904   1.000255   0.999491    0.999491    0.000509   0.4596  0.0001  0.4082
17    0.000535   1.000151   0.999699    0.999699    0.000301   0.4596  0.0000  0.4082
18    0.000317   1.000089   0.999822    0.999822    0.000178   0.4596  0.0000  0.4082
19    0.000187   1.000053   0.999894    0.999894    0.000106   0.4596  0.0000  0.4082
20    0.000111   1.000031   0.999938    0.999938    0.000062   0.4596  0.0000  0.4082
21    0.000066   1.000018   0.999963    0.999963    0.000037   0.4596  0.0000  0.4082
22    0.000039   1.000011   0.999978    0.999978    0.000022   −       −       −

Table 7.1.   Output of the primal-dual full-step algorithm.
Comparing the results in Table 7.1. with those in the corresponding table for the
dual algorithm with full steps (Table 6.1., page 124), the most striking differences are
the number of iterations and the behavior of the proximity measure. In the primal-dual
case the number of iterations is 22 (instead of 53). This can be easily understood
from the fact that we could use the larger barrier update parameter θ = 1/√(2n) (instead of
θ = 1/(3√n)).
The second difference is probably more important. In the primal-dual case Newton’s
method is much more efficient than in the dual case. This is especially evident in the
final iterations where both methods show very stable behavior. In the dual case the
proximity takes in these iterations the values 0.2722 (before) and 0.0524 (after the
Newton step), whereas in the primal-dual case these values are respectively 0.4596 and
0.0000. Note that in the dual case the effect of the Newton step is slightly better than
the quadratic convergence result of Theorem II.21. In the primal-dual case, however,
the effect of the Newton step is much better than predicted by Theorem II.50, and
even much better than the improved quadratic convergence result of Theorem II.52.
The figures in Table 7.1. justify the statement (at least for this sample problem, but
we observed the same phenomenon in other experiments) that asymptotically the
primal-dual Newton method is almost exact.
Remark II.56 It is of interest to have a closer (and more accurate) look at the proximity
values in the final iterations. They are given in Table 7.2. (page 164). These figures show that
It.   δ                   δ+
11    0.45960642869434    0.00069902816289
12    0.45960584496214    0.00041365328341
13    0.45960564054812    0.00024478048789
14    0.45960556896741    0.00014484936548
15    0.45960554390189    0.00008571487895
16    0.45960553512461    0.00005072193012
17    0.45960553205110    0.00003001478966
18    0.45960553097480    0.00001776130347
19    0.45960553059816    0.00001051028182
20    0.45960553046642    0.00000621947704
21    0.45960553041942    0.00000368038542

Table 7.2.   Proximity values in the final iterations.
in the final iterations, where Newton’s method is almost exact, the quality of the method
gradually improves. After the step the proximity decreases monotonically. In fact, surprisingly
enough, the rate of decrease of subsequent values of the proximity after the step is almost
constant (0.59175). Remember that the barrier parameter µ also decreases at a linear rate
by a factor 1 − θ, where θ = 1/√(2n). In our case we have n = 3. This gives θ = 0.4082 and
1 − θ = 0.59175, precisely the rate of decrease in δ+. Before the Newton step the proximity
is almost constant (0.4596). Not surprisingly, this is precisely the value of θ√n/(2√(1 − θ)).
Thus, our numerical experiment gives rise to a conjecture:
Conjecture II.57 Asymptotically the quality of the primal-dual Newton step gradually
improves. The proximity before the step converges to some constant and the proximity after the
step decreases monotonically to zero with a linear convergence rate. The rate of convergence
is equal to 1 − θ.
This observed behavior of the primal-dual Newton method has no theoretical justification at
the moment.
•
We conclude this section with a graphical illustration. Figure 7.3 shows on two
graphs the progress of the algorithm in the w-space (cf. Example II.51 on page 157).
In both figures the w-space is projected onto its first two coordinates. The difference
between the two graphs is due to the scaling of the axes. On the left graph the scale
is linear and on the right graph it is logarithmic. As in Example II.51, the curves
connecting the subsequent iterates show the intermediate values of xs on the way to
the next iterate. The graphs show that after the first iteration the iterates follow the
central path quite accurately.
Figure 7.3   The iterates of the primal-dual algorithm with full steps.

7.5.3
The classical analysis of the algorithm
In this section we give a different analysis of the primal-dual logarithmic barrier
algorithm with full Newton steps. The analysis uses the proximity measure
σ(x, s; µ) := ‖xs/µ − e‖,
which is very common in the literature on primal-dual methods.10
Because of its widespread use, it seems useful to show in this section how the analysis
can be easily adapted to the use of the classical proximity measure. In fact, the only
thing we have to do is find suitable analogues of the quadratic convergence result in
Theorem II.50 and the barrier update result of Lemma II.54.
Theorem II.58 11 If σ := σ(x, s; µ) ≤ 2/(1 + √(1 + √2)) = 0.783155, then the
primal-dual Newton step is feasible. Moreover, in that case
σ(x+, s+; µ) ≤ σ2 / (2√2 (1 − σ)).
Proof: First we derive from ‖u2 − e‖ = σ the obvious inequality
1 − σ ≤ (ui)2 ≤ 1 + σ,    1 ≤ i ≤ n.
10
It was introduced by Kojima, Mizuno and Yoshise [178] and used in many other papers. See, e.g.,
Gonzaga [124], den Hertog [140], Marsten Shanno and Simantiraki [196], McShane, Monma and
Shanno [199], Mehrotra and Sun [205], Mizuno [215], Monteiro and Adler [218], Todd [262, 264],
Zhang and Tapia [319].
11
This result is due to Mizuno [212].
This implies
‖u−2‖∞ ≤ 1/(1 − σ).                                                  (7.20)
From (7.4) we recall that
x+ s+ = µe + ∆x∆s.
Hence, using (7.11), we have12,13
σ(x+, s+; µ) := ‖x+ s+/µ − e‖ = ‖∆x∆s/µ‖ = ‖dx ds‖.
By the first uv-lemma (Lemma C.4 in Appendix C) we have
‖dx ds‖ ≤ (1/(2√2)) ‖dx + ds‖2 = (1/(2√2)) ‖u−1 − u‖2.
Using (7.20) we write
‖u−1 − u‖2 = ‖u−1 (e − u2)‖2 ≤ ‖u−2‖∞ ‖e − u2‖2 ≤ σ2/(1 − σ).
Hence we get
σ(x+, s+; µ) ≤ σ2/(2√2 (1 − σ)).
Since σ(x+, s+; µ) = ‖dx ds‖, feasibility of the new iterates is certainly guaranteed if
σ(x+, s+; µ) ≤ 1, from (7.9). This condition is certainly satisfied if
σ2/(2√2 (1 − σ)) ≤ 1,
and this inequality holds if and only if σ ≤ 2/(1 + √(1 + √2)), as can easily be
verified. The theorem follows.
✷
12 Exercise 52 This exercise provides an alternative proof of the first inequality in Lemma C.4. Let
u and v denote vectors in IRn and δ > 0 (δ ∈ IR). First prove that
min { u1 v1 : Σi ui vi = 0, Σi (ui2 + vi2) = 4δ2 } = −δ2.
Using this, show that if u and v are orthogonal and ‖u + v‖ = 2δ then ‖uv‖∞ ≤ δ2.
13 Exercise 53 This exercise provides a tighter version of the second inequality in Lemma C.4. Let u
and v denote vectors in IRn and δ > 0 (δ ∈ IR). First prove that
max { Σi ui2 vi2 : Σi ui vi = 0, Σi (ui2 + vi2) = 4δ2 } = nδ4/(n − 1).
Using this, show that if u and v are orthogonal and ‖u + v‖ = 2δ then ‖uv‖ ≤ δ2 √2.
Lemma II.59 Let (x, s) be a positive primal-dual pair and µ > 0 such that xT s = nµ.
Moreover, let σ := σ(x, s; µ) and let µ+ = (1 − θ)µ. Then we have
σ(x, s; µ+) = √(σ2 + θ2 n) / (1 − θ).
Proof: Let σ + := σ(x, s; µ+ ), with xT s = nµ. Then, by definition,
(σ+)2 = ‖xs/((1 − θ)µ) − e‖2 = (1/(1 − θ)2) ‖xs/µ − e + θe‖2.
The vectors e and xs/µ − e are orthogonal, as easily follows. Hence
‖xs/µ − e + θe‖2 = ‖xs/µ − e‖2 + ‖θe‖2 = σ2 + θ2 n.
The lemma follows.
✷
From the above results, it is clear that maintaining the property σ(x, s; µ) ≤ τ
during the course of the algorithm amounts to the following condition on θ:
(1/(1 − θ)) √( τ4/(8(1 − τ)2) + nθ2 ) ≤ τ.                           (7.21)
For any given τ this inequality determines how deep the updates of the barrier
parameter are allowed to be. Since the full Newton step must be feasible we may
assume that
τ ≤ 2/(1 + √(1 + √2)) = 0.783155.
Squaring both sides of (7.21) gives
τ4/(8(1 − τ)2) + nθ2 ≤ τ2 (1 − θ)2.
This implies nθ2 ≤ τ2, and hence the parameter θ must satisfy θ ≤ τ/√n.
The iteration bound of Lemma I.36 becomes smaller for larger values of θ. Our aim
here is to show that for the best possible choice of θ the iteration bound resulting
from the classical analysis cannot be better than the bound of Theorem II.53. For
that purpose we may assume that n is so large that 1 − θ ≈ 1. Then the condition on
θ becomes
τ4/(8(1 − τ)2) + nθ2 ≤ τ2,
or equivalently,
nθ2 ≤ τ2 − τ4/(8(1 − τ)2).                                           (7.22)
Note that the right-hand side expression must be nonnegative, which holds only if
τ ≤ 2√2/(1 + 2√2) = 0.738796.
We can easily verify that the right-hand side expression in (7.22) is maximal if
7τ 3 − 22τ 2 + 24τ − 8 = 0,
which occurs for τ = 0.60155. Substituting this value in (7.22) we obtain
nθ2 ≤ 0.258765,
which amounts to
θ ≤ 0.508689/√n ≈ 1/(2√n).
Obviously, this upper bound for θ is too optimistic. The above argument makes clear
that by using the ‘classical’ proximity measure σ(x, s; µ) in the analysis of the primal-dual method with full Newton steps, the iteration bound obtained with the proximity
measure δ(x, s; µ) cannot be improved.
7.6
A version of the algorithm with adaptive updates
7.6.1
Adaptive updating
We have seen in Section 7.5 that when the property
δ(x, s; µ) ≤ τ = 1/√2                                                (7.23)
is maintained after the update of the barrier parameter, the values of the barrier
update parameter θ are limited by the upper bound θ < √(2/n), and therefore, the
iteration bound cannot be better than the ‘ideal’ bound
√(n/2) log(nµ0/ε).
Thus, larger updates of the barrier parameter are possible only when abandoning the
idea that property (7.23) must hold after each update of the barrier parameter.
To make clear how this can be done without losing the iteration bound of
Theorem II.53, we briefly recall the idea behind the proof of this theorem. After
each Newton step we have a primal-dual pair (x, s) and µ > 0 such that
δ(x, s; µ) ≤ τ̄ = τ2/√(2(1 − τ2)).                                    (7.24)
Then we update µ to a smaller value µ+ = (1 − θ)µ such that
δ(x, s; µ+) ≤ τ,                                                     (7.25)
and we perform a Newton step to the µ+ -center, yielding a primal-dual pair (x+ , s+ )
such that δ(x+ , s+ ; µ+ ) ≤ τ̄ . Figure 7.4 illustrates this.
Why does this scheme work? It works because every time we perform a Newton
step the iterates x and s are such that xs is in the region around the µ-center where
Newton's method behaves well. The theory guarantees that if the proximity does not
exceed the parameter τ = 1/√2 then we stay within this region. However, in practice
the region where Newton's method behaves well may be much larger.

Figure 7.4   The primal-dual full-step approach.
Thus we can adapt our strategy to this phenomenon and choose the smallest barrier
parameter µ+ = (1 − θ)µ so that after the Newton step to the µ+ -center the iterates
satisfy δ(x+ , s+ ; µ+ ) ≤ τ̄ . Therefore, let us consider the following problem:
Given a primal-dual pair (x, s) and µ > 0 such that δ := δ(x, s; µ) ≤ τ̄ ,
find the largest θ such that after the Newton step at (x, s) with barrier
parameter value µ+ = (1 − θ)µ we have δ + = δ(x+ , s+ ; µ+ ) ≤ τ̄ .
Here we use the parameter τ̄ instead of τ , because until now τ referred to the proximity
before the Newton step, whereas τ̄ is an upper bound for the proximity just after the
Newton step. It is natural to take for τ̄ the value 1/2, because this is an upper bound
√
for the proximity after the Newton step when the proximity before the step is 1/ 2.
Our aim in this section is to investigate how deep the updates can be taken, so as to
enhance the performance of the algorithm as much as possible. See Figure 7.5.14 Just
as in the case of the dual method with adaptive updates, we need to introduce the
so-called primal-dual affine-scaling and primal-dual centering directions at (x, s).
14
The idea of using adaptive updates of the barrier parameter in a primal-dual method can be found
in, e.g., Jarre and Saunders [163].
Figure 7.5   The full-step method with an adaptive barrier update.

7.6.2
The primal-dual affine-scaling and centering direction
We first recall some definitions and properties from Section 7.4. With
d = √(x/s),    u = √(xs/µ),
the vectors x and s can be scaled by d to the vector u as follows:
d−1 x/√µ = d s/√µ = u.
The same scaling applied to the Newton steps ∆x and ∆s yields the scaled Newton
steps dx and ds:
dx = d−1 ∆x/√µ,    ds = d ∆s/√µ,
and these satisfy
dx + ds = u−1 − u.
Moreover, the vectors dx and ds are orthogonal. They are the components of the vector
u−1 − u in the null space of AD and the null space of HD−1 respectively:
dx = PAD (u−1 − u)                                                   (7.26)
ds = PHD−1 (u−1 − u).                                                (7.27)
In this section we work mainly with the scaled Newton steps dx and ds . The
last expressions yield a natural way of separating these directions into a so-called
affine-scaling component and a centering component. The (scaled) centering directions
are defined by
dcx = PAD (u−1 ), dcs = PHD−1 (u−1 ),
(7.28)
and the (scaled) affine directions by
dax = −PAD (u),
das = −PHD−1 (u).
(7.29)
Now we have the obvious relations
dx = dcx + dax,    ds = dcs + das,
and
dcx + dcs = u−1,    dax + das = −u.
The unscaled centering and affine-scaling directions are defined in the obvious way:
∆a x := √µ d dax, etc. For the sake of completeness we list these definitions below and
we also give some alternative expressions which can straightforwardly be verified.
∆a x := √µ d dax = −√µ D PAD(u) = −D PAD(√(xs))
∆a s := √µ d−1 das = −√µ D−1 PHD−1(u) = −D−1 PHD−1(√(xs))
∆c x := √µ d dcx = √µ D PAD(u−1) = µ D PAD(e/√(xs))
∆c s := √µ d−1 dcs = √µ D−1 PHD−1(u−1) = µ D−1 PHD−1(e/√(xs)).
Note that the affine-scaling directions ∆a x and ∆a s depend only on the iterates x and
s and not on the barrier parameter µ. For the centering directions we have that ∆c x/µ
and ∆c s/µ depend only on the iterates x and s and not on the barrier parameter µ.
Also note that if we are on the central path, i.e., if x = x(µ) and s = s(µ), then we
have u = e. This implies u−1 − u = 0, whence dx = ds = 0. Hence, on the central path
we have dax = −dcx and das = −dcs .
For future reference we observe that the above definitions imply the obvious relations
∆x = ∆a x + ∆c x
(7.30)
∆s = ∆a s + ∆c s,
which show that the (unscaled) full Newton step (∆x, ∆s) — at (x, s) and for the
barrier parameter value µ — can be nicely decomposed in its affine scaling and its
centering component.
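The decomposition (7.28) through (7.30) is easy to reproduce numerically. The sketch below (Python with NumPy; the helper names are ours and A is assumed to have full row rank) returns the scaled affine-scaling and centering components; their sums reproduce dx and ds, and the identities dcx + dcs = u−1 and dax + das = −u can be checked directly.

import numpy as np

def scaled_components(A, x, s, mu):
    d = np.sqrt(x / s)
    u = np.sqrt(x * s / mu)
    AD = A * d

    def proj_row(v):      # projection onto the row space of AD (null space of HD^{-1})
        return AD.T @ np.linalg.solve(AD @ AD.T, AD @ v)

    def proj_null(v):     # projection onto the null space of AD
        return v - proj_row(v)

    dxc, dsc = proj_null(1 / u), proj_row(1 / u)    # centering components (7.28)
    dxa, dsa = -proj_null(u), -proj_row(u)          # affine-scaling components (7.29)
    return dxa, dsa, dxc, dsc

# dx = dxc + dxa and ds = dsc + dsa then give the scaled Newton step, as in (7.30).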
7.6.3
Condition for adaptive updates
In this section we start to deal with the problem stated before. Let (x, s) be a positive
primal-dual pair and µ > 0 such that δ = δ(x, s; µ) ≤ τ̄ . We want to investigate how
large θ can be so that after the Newton step at (x, s) with barrier parameter value
µ+ = (1 − θ)µ we have δ + = δ(x+ , s+ ; µ+ ) ≤ τ̄ . We derive a condition for the barrier
update parameter θ that guarantees the desired behavior.
The vector u, the scaled search directions dx and ds and their (scaled) centering
components dcx , dcs and (scaled) affine components dax , das , have the same meaning as
in the previous section; the entities u, dax and das depend on the given value µ of the
barrier parameter. The scaled search directions at (x, s) with barrier parameter value
µ+ are denoted by dx+ and ds+. Letting ∆x and ∆s denote the (unscaled) Newton
directions with respect to µ+, we have
s∆x + x∆s = µ+ e − xs,
and therefore, also using (7.11),
x+ s+ = µ+ e + ∆x∆s = µ+ (e + dx+ ds+).
By Lemma II.48, the step is feasible if e + dx+ ds+ ≥ 0, and this certainly holds if
‖dx+ ds+‖∞ ≤ 1.
Moreover, from the proof of Theorem II.50 we recall that the proximity δ + :=
δ(x+ , s+ ; µ+ ) of the new pair (x+ , s+ ) with respect to the µ+ -center is given by
2δ+ = ‖dx+ ds+ / √(e + dx+ ds+)‖.
This implies that we have δ+ ≤ τ̄ if and only if
‖dx+ ds+ / √(e + dx+ ds+)‖2 ≤ 4τ̄ 2.
In the sequel we use the weaker condition
‖dx+ ds+‖2 ≤ 4τ̄ 2 (1 − ‖dx+ ds+‖∞),                                 (7.31)
(7.31)
which we refer to as the condition for adaptive updating. A very important observation
is that when this condition is satisfied, the Newton step is feasible. Because, if (7.31)
holds, since the left-hand side expression is nonnegative, the right-hand side expression
must be nonnegative as well, and hence ‖dx+ ds+‖∞ ≤ 1. Thus, in the further analysis
we may concentrate on the condition for adaptive updating (7.31).
7.6.4
Calculation of the adaptive update
We proceed by deriving upper bounds for the 2-norm and the infinity norm of the
vector dx+ ds+. It is convenient to introduce the vector
ū := √(xs/µ+).
We then have
ū = √(xs/µ+) = √(xs/((1 − θ)µ)) = u/√(1 − θ).
Hence, using this and (7.26),
dx+ = PAD(ū−1 − ū) = √(1 − θ) PAD(u−1) − (1/√(1 − θ)) PAD(u),
and
ds+ = PHD−1(ū−1 − ū) = √(1 − θ) PHD−1(u−1) − (1/√(1 − θ)) PHD−1(u).
Now using (7.28) and (7.29) we obtain
dx+ = √(1 − θ) dcx + dax/√(1 − θ)                                    (7.32)
ds+ = √(1 − θ) dcs + das/√(1 − θ).                                   (7.33)
Note that dx+ can be rewritten in the following way:
√(1 − θ) dcx + dax/√(1 − θ) = √(1 − θ) (dcx + dax) + (1/√(1 − θ) − √(1 − θ)) dax
                            = √(1 − θ) dx + θ dax/√(1 − θ).
Since ds+ can be reformulated in exactly the same way we find
dx+ = √(1 − θ) dx + θ dax/√(1 − θ)
ds+ = √(1 − θ) ds + θ das/√(1 − θ).
Multiplication of both expressions gives
dx+ ds+ = (1 − θ) dx ds + θ (dx das + ds dax) + θ2 dax das/(1 − θ).  (7.34)
At this stage we see how the coordinates of the vector dx+ ds+ depend on θ. The
coordinates of (1 − θ) dx+ ds+ are quadratic functions of θ:
(1 − θ) dx+ ds+ = (1 − θ)2 dx ds + θ(1 − θ) (dx das + ds dax) + θ2 dax das.
When multiplying the condition (7.31) for adaptive updating by (1 − θ)2, this condition
can be rewritten as
4τ̄ 2 (1 − θ)2 − ‖(1 − θ) dx+ ds+‖2 ≥ 4τ̄ 2 (1 − θ) ‖(1 − θ) dx+ ds+‖∞.  (7.35)
Now denoting the left-hand side member by p(θ) and the i-th coordinate of the vector
(1 − θ) dx+ ds+ by qi(θ), with τ̄ given, we need to find the largest positive θ that satisfies
the following inequalities:
p(θ) ≥ 4τ̄ 2 (1 − θ) qi(θ),  1 ≤ i ≤ n,
p(θ) ≥ −4τ̄ 2 (1 − θ) qi(θ),  1 ≤ i ≤ n.
Since p(θ) is a polynomial of degree 4 in θ, and each qi (θ) is a polynomial of degree
2 in θ, the largest positive θ satisfying each single one of these 2n inequalities can be
found straightforwardly by solving a polynomial equation of degree 4. The smallest of
the 2n positive numbers obtained in this way (some of them may be infinite, but not
all of them!) is the value of θ determined by the condition of adaptive updating. Thus
we have shown that the largest θ satisfying the condition for adaptive updating can
be found by solving 2n polynomial equations of degree 4.15
Below we deal with a second approach. We consider a further relaxation of the
condition for adaptive updating that requires the solution of only one quadratic
equation. Of course, this approach yields a smaller value of θ than the above procedure,
which gives the exact solution of the condition (7.31) for adaptive updating. Before
proceeding it is of interest to investigate the special case where we start at the µ-centers
x = x(µ) and s = s(µ).
7.6.5
Special case: adaptive update at the µ-center
When we start with x = x(µ) and s = s(µ), we established earlier that u = e,
dx = ds = 0, dax = −dcx and das = −dcs . Substituting this in (7.34) we obtain
dx+ ds+ = θ2 dax das / (1 − θ).
Now we can use the first uv-lemma (Lemma C.4 in Appendix C) to estimate the
2-norm and the infinity norm of dax das . Since dax + das = −u = −e, we obtain
‖dax das‖ ≤ n/(2√2),    ‖dax das‖∞ ≤ n/4.
Substitution in (7.31) gives
θ4 n2/(8(1 − θ)2) ≤ 4τ̄ 2 (1 − θ2 n/(4(1 − θ))).
This can be rewritten as
θ4 n2/(8(1 − θ)2) + τ̄ 2 θ2 n/(1 − θ) ≤ 4τ̄ 2,
which is equivalent to
( θ2 n/(2√2 (1 − θ)) + τ̄ 2 √2 )2 ≤ 4τ̄ 2 + 2τ̄ 4,
or
θ2 n/(1 − θ) ≤ 2√2 ( √(4τ̄ 2 + 2τ̄ 4) − τ̄ 2 √2 ).
Substituting τ̄ = 1/2 gives
θ2 n/(1 − θ) ≤ 2.
15
In fact, more efficient procedures exist for solving the condition for adaptive updating, but here
our only aim has been to show that there exists an efficient procedure for finding the maximal
value of the parameter θ satisfying the condition for adaptive updating.
This result has its own interest. The bound obtained is exactly the ‘ideal’ bound for θ
derived in Section 7.5 for the hypothetical situation where the Newton step is exact.
Here we obtained a better bound without this assumption, but under the more realistic
assumption that we start at the µ-centers x = x(µ) and s = s(µ).
7.6.6
A simple version of the condition for adaptive updating
We return to the general case, and show how a weakened version of the condition for
adaptive updating
‖dx+ ds+‖2 ≤ 4τ̄ 2 (1 − ‖dx+ ds+‖∞)
can be reduced to a quadratic inequality in θ. With
d+ := dx+ + ds+,
the first uv-lemma (Lemma C.4 in Appendix C) implies that
‖dx+ ds+‖ ≤ ‖d+‖2/(2√2),    ‖dx+ ds+‖∞ ≤ ‖d+‖2/4.
Substituting these bounds in the condition for adaptive updating we obtain the weaker
condition
‖d+‖4/8 ≤ 4τ̄ 2 (1 − ‖d+‖2/4).
Rewriting this as
( ‖d+‖2 + 4τ̄ 2 )2 ≤ 32τ̄ 2 + 16τ̄ 4,
we obtain
‖d+‖2 ≤ √(32τ̄ 2 + 16τ̄ 4) − 4τ̄ 2.
Substituting τ̄ = 1/2 leads to the condition
‖d+‖2 ≤ 2.                                                           (7.36)
From the expressions (7.32) and (7.33) for dx+ and ds+, and also using that
dcx + dcs = u−1 and dax + das = −u, we find
d+ = √(1 − θ) u−1 − u/√(1 − θ).
From this expression we can calculate the norm of d+ :
‖d+‖2 = (1 − θ) ‖u−1‖2 + ‖u‖2/(1 − θ) − 2n.
Since ‖u‖2 = n and ‖u−1‖2 = n + 4δ2, where δ = δ(x, s; µ), we obtain
‖d+‖2 = (1 − θ)(n + 4δ2) + n/(1 − θ) − 2n = 4(1 − θ)δ2 + θ2 n/(1 − θ).16
16 Since ‖d+‖ = 2δ(x, s; µ+), this analysis yields in a different way the same result as in Lemma II.54,
namely
δ(x, s; µ+)2 = (1 − θ)δ2 + θ2 n/(4(1 − θ)).
Putting this in (7.36) we obtain the following condition on θ:
4(1 − θ)δ2 + θ2 n/(1 − θ) ≤ 2.
The largest θ satisfying this inequality is given by
θ = ( √(2n + 1 − 4nδ2) + 4δ2 − 1 ) / (n + 4δ2).                      (7.37)
With this value of θ we are sure that when starting with δ(x, s; µ) = δ, after the Newton
step with barrier parameter value µ+ = (1−θ)µ we have δ(x+ , s+ ; µ+ ) ≤ 1/2. If δ = 0,
the above expression reduces to
θ = 2/(1 + √(2n + 1)) ≤ √(2/n),
and if δ = 1/2 to
θ = 1/√(n + 1),17
as easily may be verified. Hence, when using cheap adaptive updates the actual value
of θ varies from iteration to iteration but it always lies between the above two extreme
values. The ratio between these extreme values is about √2. As a consequence, the
speedup factor is bounded above by (approximately) √2.
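In code, the cheap adaptive update is a one-line formula. A possible sketch (Python; the function name is our own) evaluating (7.37) is given below.

import numpy as np

def cheap_adaptive_theta(n, delta):
    # largest theta with 4(1-theta)delta^2 + theta^2 n/(1-theta) <= 2, cf. (7.37)
    return (np.sqrt(2 * n + 1 - 4 * n * delta ** 2) + 4 * delta ** 2 - 1) / (n + 4 * delta ** 2)

# For delta = 0 this gives 2/(1 + sqrt(2n+1)); with n = 3 that value is 0.548584,
# the limit value of theta observed in Table 7.4 in the next section.
print(cheap_adaptive_theta(3, 0.0), cheap_adaptive_theta(3, 0.5))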
7.6.7
Illustration of the algorithm with adaptive updates
With the same example as in the previous illustrations, and the same initialization
of the algorithm as in Section 7.5.2, we experiment in this section with two adaptive-update strategies. First we consider the most expensive strategy, and calculate the
barrier update parameter θ from (7.35). In this case we need to solve 2n polynomial
inequalities of degree four. The algorithm, with ε = 10−4 , then runs as shown in
Table 7.3. As before, Table 7.3. contains one entry (the first) of the vectors x and
s. A new column shows the value of the barrier update parameter in each iteration.
The fast increase of this parameter to almost 1 is surprising. It results in very fast
convergence of the method: only 5 iterations yield the desired accuracy.
When we calculate θ according to (7.37), the performance of the algorithm is as
shown in Table 7.4. Now 15 iterations are needed instead of 6. In this example in
the final iterations θ seems to stabilize around the value 0.548584. This implies that
the convergence rate for the duality gap is linear. This is in contrast with the other
approach, where the convergence rate for the duality gap appears to be quadratic.
Unfortunately, at this time no theoretical justification for a quadratic convergence
rate of the adaptive version of the full-step method exists. For the moment we leave
17
We could have used this value of θ in Theorem II.53, leading to the iteration bound
√(n + 1) log(nµ0/ε)
for the Primal-Dual Logarithmic Barrier Algorithm with full Newton steps.
It.   nµ         x1         y1         y2         s1         δ       δ+      θ
0     4.000000   2.000000   0.000000   0.000000   1.000000   0.2887  0.7071  0.679623
1     1.281509   1.093836   0.333333   0.572830   0.666667   0.7071  0.7071  0.846142
2     0.197170   1.010191   0.888935   0.934277   0.111065   0.7071  0.7071  0.976740
3     0.004586   1.000224   0.997391   0.998471   0.002609   0.7071  0.7071  0.999460
4     0.000002   1.000000   0.999999   0.999999   0.000001   0.7071  0.1472  0.999999
5     0.000000   1.000000   1.000000   1.000000   0.000000   −       −       −

Table 7.3.   The primal-dual full-step algorithm with expensive adaptive updates.
It.   nµ         x1         y1         y2         s1         δ       δ+      θ
0     4.000000   2.000000   0.000000   0.000000   1.000000   0.2887  0.2934  0.534847
1     1.860612   1.286871   0.333333   0.379796   0.666667   0.2934  0.1355  0.534357
2     0.866381   1.138033   0.698479   0.711206   0.301521   0.1355  0.0670  0.545715
3     0.393584   1.063707   0.865026   0.868805   0.134974   0.0670  0.0308  0.547890
4     0.177943   1.029247   0.939865   0.940686   0.060135   0.0308  0.0140  0.548438
5     0.080352   1.013307   0.973046   0.973216   0.026954   0.0140  0.0063  0.548554
6     0.036275   1.006028   0.987874   0.987908   0.012126   0.0063  0.0028  0.548578
7     0.016375   1.002726   0.994534   0.994542   0.005466   0.0028  0.0013  0.548583
8     0.007392   1.001231   0.997535   0.997536   0.002465   0.0013  0.0006  0.548584
9     0.003337   1.000556   0.998887   0.998888   0.001113   0.0006  0.0003  0.548584
10    0.001506   1.000251   0.999498   0.999498   0.000502   0.0003  0.0001  0.548584
11    0.000680   1.000113   0.999773   0.999773   0.000227   0.0001  0.0001  0.548584
12    0.000307   1.000051   0.999898   0.999898   0.000102   0.0001  0.0000  0.548584
13    0.000139   1.000023   0.999954   0.999954   0.000046   0.0000  0.0000  0.548584
14    0.000063   1.000010   0.999979   0.999979   0.000021   0.0000  0.0000  0.548584
15    0.000028   1.000005   0.999991   0.999991   0.000009   −       −       −

Table 7.4.   The primal-dual full-step algorithm with cheap adaptive updates.
this topic with the conclusion that the above comparison between the ‘expensive’ and
the ‘cheap’ adaptive update full-step method suggests that it is worth spending extra
effort in finding as large values for θ as possible.
We conclude the section with a graphical illustration of the adaptive updating
strategy. Figure 7.6 shows on two graphs the progress of the algorithm with the
expensive update. The graphs show the first two coordinates of the iterates in the
w-space. The left graph has a linear scale and the right graph a logarithmic scale.
Figure 7.7 concerns the case when cheap updates are used.
7.7
The predictor-corrector method
In the previous section it became clear that the Newton step can be decomposed into an
affine-scaling component and a centering component. Using the notations introduced
Figure 7.6   Iterates of the primal-dual algorithm with adaptive updates.
there, we recall from Section 7.6.2 that the (scaled) centering components are given
by
dcx = PAD(u−1),    dcs = PHD−1(u−1),
and the (scaled) affine components by
dax = −PAD(u),    das = −PHD−1(u),
where
d = √(x/s),    u = √(xs/µ).
Figure 7.7   Iterates of the primal-dual algorithm with cheap adaptive updates.
We also recall the relations
dx = dcx + dax,    ds = dcs + das,
and
dcx + dcs = u−1,    dax + das = −u.
The unscaled centering and affine-scaling components are given by
∆a x = √µ d dax,    ∆a s = √µ d−1 das,
and
∆c x = √µ d dcx,    ∆c s = √µ d−1 dcs;
as a consequence we have
∆a x ∆a s = µ dax das,    ∆c x ∆c s = µ dcx dcs.
It is interesting to consider the effect of moving along these directions. Let us define
xa(θ) = x + θ∆a x,    sa(θ) = s + θ∆a s,
xc = x + ∆c x,    sc = s + ∆c s.
We say that xa (θ) and sa (θ) result from an affine-scaling step of size θ at (x, s). In
preparation for the next lemma we first establish the following two relations:
x∆a s + s∆a x = −xs                                                  (7.38)
x∆c s + s∆c x = µe.                                                  (7.39)
These relations easily follow from the previous ones. We show this for the first of the
two relations. We first write
x∆a s = √µ x d−1 das = µ u das,
and
s∆a x = √µ s d dax = µ u dax.
Adding the last two equalities we get (7.38):
x∆a s + s∆a x = µu (dax + das ) = −µu2 = −xs.
Now we can prove
Lemma II.60 Let xT s = nµ. Assuming feasibility of the steps, the affine-scaling step
reduces the duality gap by a factor 1 − θ and the step along the centering components
doubles the duality gap.
Proof: We have
xa (θ)sa (θ) = (x + θ∆a x) (s + θ∆a s) = xs + θ (x∆a s + s∆a x) + θ2 ∆a x∆a s.
Using (7.38) we find
xa (θ)sa (θ) = (1 − θ)xs + θ2 ∆a x∆a s.
Using that ∆a x and ∆a s are orthogonal we obtain
(xa(θ))T sa(θ) = eT ((1 − θ)xs + θ2 ∆a x∆a s) = (1 − θ) xT s,
proving the first statement. For the second statement we write
xc sc = (x + ∆c x) (s + ∆c s) = xs + (x∆c s + s∆c x) + ∆c x∆c s.
Substitution of (7.39) gives
xc sc = xs + µe + ∆c x ∆c s.
Thus we obtain
(xc)T sc = eT (xs + µe + ∆c x ∆c s) = xT s + µ eT e = 2nµ.
This completes the proof.
✷
Recall from (7.30) that the (unscaled) full Newton step (∆x, ∆s) at (x, s) — with
barrier parameter value µ — can be decomposed in its affine scaling and its centering
component. The above lemma makes clear that in the algorithms we dealt with before,
the reduction in the duality gap during a (full) Newton step is delivered by the affine-scaling component in the Newton step. The centering component in the Newton step
forces the iterates to stay close to the central path.
When solving a given LO problem, we wish to find a primal-dual pair with a
duality gap close to zero. We want to reduce the duality gap as fast as possible to
zero. Therefore, it becomes natural to consider algorithms that put more emphasis on
the affine-scaling component. That is the underlying idea of the predictor-corrector
method which is the subject of this section. Note that when the full affine-scaling step
(with step-size 1) is feasible, it produces a feasible pair with duality gap zero, and
hence it yields an optimal solution pair in a single step. This makes clear that the full
affine step will be infeasible in general.
In the predictor-corrector method, instead of combining the two directions in a
single Newton step, we decompose the Newton step into two steps, an affine-scaling
step first and, next, a so-called pure centering step.18 Since a full affine-scaling step
is infeasible, we use a damping parameter θ. By taking θ small enough we enforce
feasibility of the step, and at the same time gain control over the loss of proximity
to the central path. The aim of the centering step is to restore the proximity to the
central path. This is obtained by using a Newton step with barrier parameter value
µ, where nµ is equal to the present duality gap. Such a step leaves the duality gap
unchanged, by Lemma II.47.
7.7.1
The predictor-corrector algorithm
In the description of the predictor-corrector algorithm below (page 182), ∆x and ∆s
denote the full Newton step at (x, s) with the current value of the barrier parameter
µ, and ∆a x and ∆a s denote the full affine-scaling step at the current iterate (x, s).
Observe that according to Lemma II.60 the damping factor θ for the affine-scaling
step can also be interpreted as an updating parameter for the barrier parameter µ.
We have the following theorem.
Theorem II.61 If τ = 1/2 and θ = 1/(2√n), then the Predictor-Corrector Algorithm
requires at most
2√n log(nµ0/ε)
iterations. The output is a primal-dual pair (x, s) such that xT s ≤ ε.
The proof of this result is postponed to Section 7.7.3. It requires a careful analysis
of the affine-scaling step, which is the subject of the next section. Let us note now
that the iteration bound is a factor √2 worse than the bound in Theorem II.53 for
the algorithm with full Newton steps. Moreover, each major iteration in the predictor-
corrector algorithm consists of two steps: the centering step (also called the corrector
step) and the affine-scaling step (also called the predictor step).
7.7.2
Properties of the affine-scaling step
The purpose of this section is to analyze the effect of an affine-scaling step with size
θ on the proximity measure. As before, (x, s) denotes a positive primal-dual pair. We
18
The idea of breaking down the Newton direction into its affine-scaling and its centering component
seems to be due to Mehrotra [205]. The method considered in this chapter was proposed first by
Mizuno, Todd and Ye [217]; they were the first to use the name predictor-corrector method. The
analysis in this chapter closely resembles their analysis. Like them we alternate (single) primal-dual affine-scaling steps and (single) primal-dual centering steps. An earlier paper of Sonnevend,
Stoer and Zhao [258] is based on similar ideas, except that they use multiple centering steps. It
soon appeared that one could prove that the method asymptotically has a quadratic convergence
rate (see, e.g., Mehrotra [206, 205], Ye et al. [317], Gonzaga and Tapia [126, 127], Ye [309] and
Luo and Ye [188].). Quadratic convergence of the primal-dual predictor-corrector method is the
subject in Section 7.7.6. A dual version of the predictor-corrector method was considered by Barnes,
Chopra and Jensen [36]; they showed polynomial-time convergence with an O(nL) iteration bound.
Mehrotra’s variant of the primal-dual predictor-corrector method will be discussed in Chapter 20. It
significantly cuts down the computational effort to achieve the greatest practical efficiency among
all interior-point methods. See, e.g., Lustig, Marsten and Shanno [192]. As a consequence the
method has become very popular.
Predictor-Corrector Algorithm
Input:
A proximity parameter τ , 0 ≤ τ < 1;
an accuracy parameter ε > 0;
(x0 , s0 ) ∈ P × D, µ0 > 0 with (x0 )T s0 = nµ0 , δ(x0 , s0 ; µ0 ) ≤ τ ;
a barrier update parameter θ, 0 < θ < 1.
begin
x := x0 ; s := s0 ; µ := µ0 ;
while nµ ≥ (1 − θ)ε do
begin
x := x + ∆x;
s := s + ∆s;
x := x + θ∆a x;
s := s + θ∆a s;
µ := (1 − θ)µ;
end
end
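A compact numerical sketch of the predictor-corrector iteration (Python with NumPy; it reuses the hypothetical newton_step helper from the full-step sketch in Section 7.5, assumes A has full row rank, and uses the fixed damping θ = 1/(2√n) of Theorem II.61 below) could look as follows.

import numpy as np

def affine_step(A, x, s):
    # A dax = 0, A^T day + das = 0, s dax + x das = -x s  (cf. (7.38))
    M = A @ np.diag(x / s) @ A.T
    day = np.linalg.solve(M, A @ x)
    das = -A.T @ day
    dax = -x - (x / s) * das
    return dax, day, das

def predictor_corrector(A, x, y, s, eps=1e-4):
    n = x.size
    mu = x @ s / n
    theta = 1.0 / (2 * np.sqrt(n))
    while n * mu >= (1 - theta) * eps:
        dx, dy, ds = newton_step(A, x, s, mu)         # corrector (centering) step
        x, y, s = x + dx, y + dy, s + ds
        dax, day, das = affine_step(A, x, s)          # predictor (affine-scaling) step
        x, y, s = x + theta * dax, y + theta * day, s + theta * das
        mu = (1 - theta) * mu
    return x, y, s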
assume that µ > 0 is such that xT s = nµ, and δ := δ(x, s; µ). Recall from (7.16) that
δ = (1/2) ‖u−1 − u‖ = (1/2) ‖(e − u2)/u‖,
where
u = √(xs/µ).
We need a simple bound on the coordinates of the vector u.
Lemma II.62 Let ρ(δ) := δ + √(1 + δ2). Then
1/ρ(δ) ≤ ui ≤ ρ(δ),    1 ≤ i ≤ n.
Proof: Since ui is positive for each i, we have
−2δui ≤ 1 − u2i ≤ 2δui .
This implies
u2i − 2δui − 1 ≤ 0 ≤ u2i + 2δui − 1.
Rewriting this as
(ui − δ)2 − 1 − δ2 ≤ 0 ≤ (ui + δ)2 − 1 − δ2
we obtain
(ui − δ)2 ≤ 1 + δ2 ≤ (ui + δ)2,
which implies
ui − δ ≤ |ui − δ| ≤ √(1 + δ2) ≤ ui + δ.
Thus we arrive at
−δ + √(1 + δ2) ≤ ui ≤ δ + √(1 + δ2) = ρ(δ).
For the left-hand expression we write
−δ + √(1 + δ2) = 1/(δ + √(1 + δ2)) = 1/ρ(δ).
This proves the lemma.
✷
Now we can prove the following.
Lemma II.63 Let the pair (x+ , s+ ) result from an affine-scaling step at (x, s) with
step-size θ. If xT s = nµ and δ := δ(x, s; µ) < τ , then we have δ + := δ(x+ , s+ ; (1 −
θ)µ) ≤ τ if θ satisfies the inequality
θ2 n/(1 − θ) ≤ 2√2 ( τ √( 4/ρ(δ)2 + 4δρ(δ)√2 + 2τ2 ) − 2δρ(δ) − τ2 √2 ).   (7.40)
(7.40)
For fixed τ , the right-hand side expression in (7.40) is a monotonically decreasing
function of δ.
Proof: From the proof of Lemma II.60 we recall that
x+ s+ = (1 − θ)xs + θ2 ∆a x∆a s.
This can be rewritten as
x+ s+ = µ ((1 − θ)u2 + θ2 dax das).
Defining
u+ := √( x+ s+ / ((1 − θ)µ) ),
we thus have
(u+)2 = u2 + θ2 dax das / (1 − θ).
The proximity after the affine-scaling step satisfies
δ+ = (1/2) ‖(u+)−1 (e − (u+)2)‖ ≤ (1/2) ‖(u+)−1‖∞ ‖e − (u+)2‖.
We proceed by deriving bounds for the last two norms. First we consider the second
norm:
‖e − (u+)2‖ = ‖e − u2 − θ2 dax das/(1 − θ)‖ ≤ ‖e − u2‖ + (θ2/(1 − θ)) ‖dax das‖
            ≤ ‖e − u2‖ + θ2 n/(2√2 (1 − θ)).
For the last inequality we applied the first uv-lemma (Lemma C.4 in Appendix C) to
the vectors dax and das and further utilized ‖u‖2 = n. From Lemma II.62, we further
obtain
‖e − u2‖ = ‖u (u−1 − u)‖ ≤ ‖u‖∞ ‖u−1 − u‖ ≤ 2δρ(δ).
For the estimate of ‖(u+)−1‖∞ we write, using Lemma II.62 and the first uv-lemma
once more,
(ui+)2 ≥ ui2 − (θ2/(1 − θ)) ‖dax das‖∞ ≥ 1/ρ(δ)2 − θ2 n/(4(1 − θ)).
We conclude, by substitution of these estimates, that
δ+ ≤ ( 2δρ(δ) + θ2 n/(2√2 (1 − θ)) ) / ( 2 √( 1/ρ(δ)2 − θ2 n/(4(1 − θ)) ) ).
Hence, δ+ ≤ τ holds if
( 2δρ(δ) + θ2 n/(2√2 (1 − θ)) )2 ≤ 4τ2/ρ(δ)2 − θ2 n τ2/(1 − θ).      (7.41)
This can be rewritten as
( 2δρ(δ) + θ2 n/(2√2 (1 − θ)) )2 + 2τ2 √2 ( 2δρ(δ) + θ2 n/(2√2 (1 − θ)) ) ≤ 4τ2/ρ(δ)2 + 4τ2 δρ(δ)√2,
or equivalently,
( 2δρ(δ) + θ2 n/(2√2 (1 − θ)) + τ2 √2 )2 ≤ 4τ2/ρ(δ)2 + 4τ2 δρ(δ)√2 + 2τ4.
By taking the square root we get
2δρ(δ) + θ2 n/(2√2 (1 − θ)) + τ2 √2 ≤ τ √( 4/ρ(δ)2 + 4δρ(δ)√2 + 2τ2 ).
By rearranging terms this can be rewritten as
θ2 n/(2√2 (1 − θ)) ≤ τ √( 4/ρ(δ)2 + 4δρ(δ)√2 + 2τ2 ) − 2δρ(δ) − τ2 √2.
This implies the first statement in the lemma. For the proof of the second statement
we observe that the inequality (7.40) in the lemma is equivalent to the inequality
(7.41). We can easily verify that the left-hand side expression in (7.41) is increasing in
both δ and θ and the right-hand side expression is decreasing in both δ and θ. Hence,
if θ satisfies (7.41) for some value of δ, then the same value of θ satisfies (7.41) also
for smaller values of δ. Since the inequalities (7.40) and (7.41) are equivalent, the last
inequality has the same property: if θ satisfies (7.40) for some value of δ, then the
same value of θ satisfies (7.40) also for smaller values of δ. This implies the second
statement in the lemma and completes the proof.
✷
Figure 7.8   The right-hand side of (7.40) for τ = 1/2.
Figure 7.8 shows the graph of the right-hand side of (7.40) as a function of δ.
With the above lemma the analysis of the predictor-corrector algorithm can easily
be accomplished. We do this in the next section. At the end of this section we apply
the lemma to the special case where we start the affine-scaling step at the µ-centers.
Then δ = 0 and ρ(δ) = 1. Substitution of these values in the lemma yields that the
proximity after the step does not exceed τ if
θ2 n/(1 − θ) ≤ 2√2 ( √(4τ2 + 2τ4) − τ2 √2 ).
Note that this bound coincides with the corresponding bound obtained in Section 7.6.5
for an adaptive update at the µ-center with the full-step method.
7.7.3
Analysis of the predictor-corrector algorithm
In this section we provide the proof of Theorem II.61. Taking τ = 1/2 and θ = 1/(2√n)
we show that each iteration starts with x, s and µ such that δ(x, s; µ) ≤ τ. This makes
the algorithm well defined, and implies the result of the theorem.
The corrector step is simply a Newton step to the µ-center. By Theorem II.50 (on
page 156) the result is a pair (x, s) such that
δ := δ(x, s; µ) ≤ (1/4)/√(2(1 − 1/4)) = 1/√24.
Now we apply Lemma II.63 to this pair (x, s). This lemma states that the affine step
with step-size θ leaves the proximity with respect to the barrier parameter (1 − θ)µ
smaller than (or equal to) τ if θ satisfies (7.40) and, moreover, that for fixed τ the
right-hand side of (7.40) is monotonically decreasing in δ. For δ = 1/√24 we have
ρ(δ) = 1/√24 + √(1 + 1/24) = √(3/2).
Substitution of the given values in the right-hand side of (7.40) yields the value
0.612626 (cf. Figure 7.8, with δ = 1/√24 = 0.204124). Hence (7.40) is certainly
satisfied if
θ2 n/(1 − θ) ≤ 0.612626.
If θ = 1/(2√n) this condition is satisfied for each n ≥ 1. This proves Theorem II.61.
✷
Remark II.64 In the above analysis we could also have used the improved quadratic
convergence result of Theorem II.52. However, this does not give a significant change. After
the centering step the proximity satisfies
δ := δ(x, s; µ) ≤ (1/4)/√(2(1 − 1/16)) = 1/√30,
and the condition on θ becomes a little weaker, namely:
θ2 n/(1 − θ) ≤ 0.768349.
•
7.7.4
An adaptive version of the predictor-corrector algorithm
As stated before, the predictor-corrector method is the most popular interior-point
method for solving LO problems in practice. But this is not true for the version we
dealt with in the previous section. When we update the barrier parameter each time
by the factor 1 − θ, with θ = 1/(2√n), as in that algorithm, the required number of
iterations will be as predicted by Theorem II.61. That is, each iteration reduces the
duality gap by the constant factor 1 − θ and hence the duality gap reaches the desired
accuracy in a number of iterations that is proportional to √n. The obvious way to
reduce the number of iterations is to use adaptive updates of the barrier parameter.
The following lemma is crucial.
Lemma II.65 Let the pair (x+ , s+ ) result from an affine-scaling step at (x, s) with
step-size θ. If xT s = nµ and δ := δ(x, s; µ) < τ , then δ + := δ(x+ , s+ ; µ(1 − θ)) ≤ τ if
(θ2/(1 − θ)) ‖dax das‖ ≤ 2τ ( √( 1/ρ(δ)2 + 2δρ(δ) + τ2 ) − τ ) − 2δρ(δ).   (7.42)
II.7 Primal-Dual Logarithmic Barrier Method
187
Proof: The proof is a slight modification of the proof of Lemma II.63. We recall from
that proof that the proximity after the affine-scaling step satisfies
2
−1
−1
2
1
1
,
e − u+
u+
u+
≤
e − u+
δ+ =
2
2
∞
where, as before,
u+
2
= u2 +
θ2 dax das
,
1−θ
dax and das denote the scaled affine-scaling components, and u =
some estimates:
2
θ2
kda da k ,
≤ e − u2 +
e − u+
1−θ x s
and
e − u2 ≤ 2δρ(δ).
Moreover,
p
xs/µ. We also recall
θ2
θ2
1
−
kdax das k∞ ≥
kda da k .
2
1−θ
ρ(δ)
1−θ x s
By substitution of these estimates we obtain
u+
i
2
≥ u2i −
2
θ
2δρ(δ) + 1−θ
kdax das k
.
δ ≤ q
θ2
1
a a
2 ρ(δ)
2 − 1−θ kdx ds k
+
Hence, δ + ≤ τ holds if
2
θ2
4θ2 τ 2 a a
4τ 2
a a
2δρ(δ) +
−
kdx ds k ≤
kd d k .
1−θ
ρ(δ)2
1−θ x s
This can be rewritten as
2
4τ 2
θ2
θ2
a a
a a
2
+ 8τ 2 δρ(δ),
kdx ds k + 4τ 2δρ(δ) +
kdx ds k ≤
2δρ(δ) +
1−θ
1−θ
ρ(δ)2
or equivalently,
2
θ2
4τ 2
a a
2
2δρ(δ) +
+ 8τ 2 δρ(δ) + 4τ 4 .
kdx ds k + 2τ
≤
1−θ
ρ(δ)2
By taking the square root we get
θ2
kda da k + 2τ 2 ≤ 2τ
2δρ(δ) +
1−θ x s
which reduces to
θ2
kda da k ≤ 2τ
1−θ x s
s
s
1
+ 2δρ(δ) + τ 2 ,
ρ(δ)2
1
+ 2δρ(δ) + τ 2 − τ
ρ(δ)2
This completes the proof.
From this lemma we derive the next theorem.
!
− 2δρ(δ).
✷
188
II Logarithmic Barrier Approach
Theorem II.66 If τ = 1/3 then the property δ(x, s; µ) ≤ τ is maintained in each
iteration if θ is taken equal to
θ=
2
p
.
1 + 1 + 13 kdax das k
Proof: We only need to show that when we start some iteration with x, s and µ such
that δ(x, s; µ) ≤ τ, then after this iteration the property δ(x, s; µ) ≤ τ is maintained.
By Theorem II.50 (on page 156) the result of the corrector step is a pair (x, s) such
that
1
1
=
.
δ := δ(x, s; µ) ≤ q 9
12
2(1 − 1 )
9
Now we apply Lemma II.65 to (x, s). By this lemma the affine step with step-size θ
leaves the proximity with respect to the barrier parameter (1 − θ)µ smaller than (or
equal to) τ if θ satisfies (7.42). For δ = 1/12 we have ρ(δ) = 1.0868. Substitution of
the given values in the right-hand side expression yields 0.308103, which is greater
than 4/13. The right-hand side is monotonic in δ, as can be verified by elementary
means, so smaller values of δ yield larger values than 4/13. Thus the proximity after
the affine-scaling step does not exceed τ if θ satisfies
4
θ2
kda da k ≤
.
1−θ x s
13
We may easily verify that the value in the theorem satisfies this condition with equality.
Hence the proof is complete.
✷
7.7.5
Illustration of adaptive predictor-corrector algorithm
With the same example as in the previous illustrations, and the same initialization,
the adaptive predictor-corrector algorithm, with ε = 10−4 , runs as shown in Table 7.5.
(page 189). Each iteration consists of two steps: the corrector step (with θ = 0) and
the affine-scaling step (with θ as given by Theorem II.66). Table 7.5. shows that only
7 iterations yield the desired accuracy. After the corrector step the proximity is always
very small, especially in the final iterations. This is the same phenomenon as observed
previously, namely that the Newton process is almost exact. For the affine-scaling steps
we see the same behavior as in the full-step method with adaptive updates. The value
of the barrier update parameter increases very quickly to 1. As a result the duality
gap goes very quickly to zero. This is not accidental. It is a property of the predictorcorrector method with adaptive updates, as shown in the next section. Figure 7.9
(page 190) shows on two graphs the progress of the algorithm in the w-space.
7.7.6
Quadratic convergence of the predictor-corrector algorithm
It is clear that the rate of convergence in the predictor-corrector method depends on
the values taken by the barrier update parameter θ. We show in this section that the
II.7 Primal-Dual Logarithmic Barrier Method
It.
nµ
x1
1
1
2
2
3
3
4
4
5
5
6
6
7
7
4.000000
4.000000
1.595030
1.595030
0.593303
0.593303
0.146755
0.146755
0.013557
0.013557
0.000138
0.000138
0.000000
0.000000
2.000000
2.000000
1.278509
1.334918
1.088217
1.108991
1.019821
1.025085
1.001775
1.002265
1.000018
1.000023
1.000000
1.000000
Table 7.5.
y1
y2
189
s1
δ
θ
0.000000
0.000000 1.000000 0.2887 0.000000
0.333333 −0.333333 0.666667 0.0000 0.601242
0.493665
0.468323 0.506335 0.1576 0.000000
0.606483
0.468323 0.393517 0.0085 0.628030
0.780899
0.802232 0.219101 0.1486 0.000000
0.822447
0.802232 0.177553 0.0031 0.752648
0.941805
0.951082 0.058195 0.1543 0.000000
0.952333
0.951082 0.047667 0.0008 0.907623
0.994513
0.995481 0.005487 0.1568 0.000000
0.995492
0.995481 0.004508 0.0001 0.989826
0.999944
0.999954 0.000056 0.1575 0.000000
0.999954
0.999954 0.000046 0.0000 0.999894
1.000000
1.000000 0.000000 0.1576 0.000000
1.000000
1.000000 0.000000 0.0000 1.000000
The adaptive predictor-corrector algorithm.
rate of convergence eventually becomes quadratic. To achieve a quadratic convergence
rate it must be true that in the limit, (1−θ)µ is of the order O(µ2 ), so that 1−θ = O(µ).
In this section we show that the value of θ in Theorem II.66 has this property. The
following lemma makes clear that for our purpose it is sufficient to concentrate on the
magnitude of the norm of the vector dax das .
Lemma II.67 The value of the barrier update parameter θ in Theorem II.66 satisfies
1−θ ≤
13 a a
kdx ds k .
4
Hence, the rate of convergence for the adaptive predictor-corrector method is quadratic
if kdax das k = O(µ).
Proof: The lemma is an easy consequence of properties of the function f : [0, ∞) →
IR+ defined by
2
√
.
f (x) = 1 −
1 + 1 + 13x
The derivative is given by
13
f ′ (x) = √
2
√
1 + 13x 1 + 1 + 13x
and the second derivative by
√
−169 1 + 3 1 + 13x
f (x) =
3 .
√
3
2 (1 + 13x) 2 1 + 1 + 13x
′′
190
II Logarithmic Barrier Approach
3
w2
✻
2.5
101
w2
100
✻
✻
10−1
central path
2
10−2
10−3
1.5
10−4
µ1 e
10−5
1
10−6
■ δ(w, µ1 ) = τ
10−7
0.5
10−8
0
0
0.5
1
1.5
Figure 7.9
2
2.5
✲ w1
3
10−8 10−7 10−6 10−5 10−4 10−3 10−2 10−1 100 101
✲ w1
The iterates of the adaptive predictor-corrector algorithm.
This implies that f is monotonically increasing and concave. Since f ′ (0) = 13/4, it
follows that f (x) ≤ 13x/4 for each x ≥ 0. Putting x = kdax das k gives the lemma. ✷
We need one more basic fact in the analysis below. This concerns the optimal sets
P ∗ and D∗ of the primal and dual problems. Defining the index sets
B := {i : xi > 0 for some x ∈ P ∗ }
and
N := {i : si > 0 for some s ∈ D∗ } ,
we know that these sets are disjoint, because xT s = 0 whenever x ∈ P ∗ and s ∈ D∗ .
We need the far from obvious fact that each index i, 1 ≤ i ≤ n, belongs either to
B or N .19 As a consequence, the sets B and N form a partition of the index set
{i : 1 ≤ i ≤ n}. This partition is called the optimal partition of the problems (P )
and (D).
The behavior of the components of the vectors dax and das strongly depends
on whether a component belongs to one set of the optimal partition or to the
complementary set. Table 7.6. summarizes some facts concerning the order of
magnitude of the components of various vectors of interest.
From this table we
read, for example, that xB = Θ(1) and ∆a xN = O(µ). According to the definition
of the symbols Θ and O this means that there exist positive constants c1 , c2 , c3 such
that c1 e ≤ xB ≤ c2 e and ∆a xN ≤ c3 µ.20 In our case it is important to stress that
19
20
This is the content of the Goldman–Tucker Theorem (Theorem II.3), an early result in the theory
of Linear Optimization that has often been considered exotic. The original proof was based on
Farkas’ lemma (see, e.g., Schrijver [250], pp. 95–96). In Part I of this book we have shown that
the corresponding result for the self-dual model is a natural byproduct of the limiting behavior of
the central path. We also refer the reader to Güler et al. [134], who derived the Goldman–Tucker
Theorem from the limiting behavior of the central path for the standard format. Güler and Ye [135]
showed that interior-point algorithms — in a wide class — keep the iterates so close to the central
path that these algorithms yield the optimal partition of the problem.
See Section 1.7.4 for definitions of the order symbols O and Θ.
II.7 Primal-Dual Logarithmic Barrier Method
Vector
Table 7.6.
B
191
N
1
x
Θ(1)∗
Θ(µ)
2
s
Θ(µ)
Θ(1)∗
3
u
Θ(1)
4
d
Θ( √1µ )
Θ(1)
√
Θ( µ)
5
dax
6
das
7
∆a x
8
∆a s
O(µ)∗
O(1)
O(µ)∗
O(µ)
O(1)
O(µ)∗
O(µ)
O(µ)∗
Asymptotic orders of magnitude of some relevant vectors.
these constants are independent of the iterates x, s and of the value µ of the barrier
parameter. They depend only on the problem data A, b and c. Some of the statements
in the table are almost trivial; the more difficult ones are indicated by an asterisk.
Below we present the relevant proofs.
Let us temporarily postpone the proof of the statements in Table 7.6. and show
that the order estimates given in the table immediately imply quadratic convergence
of the adaptive predictor-corrector method.
Theorem II.68 The adaptive predictor-corrector method is asymptotically quadratically convergent.
Proof: From Table 7.6. we deduce that each component of the vector dax das is bounded
by O(µ). From our conventions this implies that dax das = O(µ). Hence the result follows
from Lemma II.67.
✷
The rest of this section is devoted to proving the estimates in Table 7.6.. Note that
at the start of an affine-scaling step we have δ = δ(x, s; µ) ≤ 1/12, from the proof of
Theorem II.66. This property will be used several times in the sequel. We start with
line 3 in the table.
Line 3: With δ ≤ 1/12, Lemma II.62 implies that each component ui of u satisfies
0.92013 ≤
This proves that u = Θ(1).
1
≤ ui ≤ ρ(δ) ≤ 1.0868.
ρ(δ)
192
II Logarithmic Barrier Approach
Lines 1 and 2: We start with the estimates for xB and sB . We need the following
two positive numbers:21
σp
:=
min max{xi : x ∈ P ∗ }
σd
:=
min max{si : s ∈ D∗ }.
i∈B
i∈N
Note that these numbers depend only on the data of the problem and not on the
iterates. Moreover, due to the existence of a strictly complementary optimal solution
pair, both numbers are positive. Now let i ∈ B and let x̄ ∈ P ∗ be such that x̄i is
maximal. Then, using that x̄i ≥ σp > 0, we may write
si =
sT x̄
sT x̄
si x̄i
≤
≤
.
x̄i
x̄i
σp
Since x̄ is optimal, cT x̄ ≤ cT x. Hence, with y such that s = c − AT y, we have
sT x̄ = cT x̄ − bT y ≤ cT x − bT y = sT x = nµ,
so that
si ≤
nµ
,
σp
∀i ∈ B.
This implies that sB = O(µ). From the third line in Table 7.6. we derive that
xB sB = µu2B = Θ(µ). The last two estimates imply that
(xB )−1 =
O(µ)
sB
=
= O(1).
Θ(µ)
Θ(µ)
This implies that xB is bounded away from zero. On the other hand, since the pair
(x, s) has duality gap nµ and hence, by Theorem II.9 (on page 100), belongs to a
bounded set, we have xB = O(1). Thus we may conclude that xB = Θ(1). Since we
also have xB sB = Θ(µ), it follows that sB = Θ(µ). In exactly the same way we derive
that sN = Θ(1) and xN = Θ(µ).
Line 4: The estimates in the fourth line follow directly from the definition of d and
the estimates for x and s in the first two lines.
Line 5 and 6: We obtain an order estimate for (dax )N and (das )B by the following
simple argument. By its definition dax is the component of√the vector −u in the null
space of the matrix AD. Hence we have kdax k ≤ kuk = n. Therefore, dax = O(1).
Since (dax )N is a subvector of dax , we must also have (dax )N = O(1). A similar argument
applies to (das )B .
The estimates for (dax )B and (das )N are much more difficult to obtain. We only deal
with the estimate for (dax )B ; the result for (das )N can be obtained in a similar way.
21
These quantities were introduced by Ye [311]. See also Vavasis and Ye [280]. The numbers σp
s
x
for the self-dual model, as introduced in
and σSP
and σd closely resemble the numbers σSP
Section 3.3.2 of Part I. According to the definition of the condition number σSP for the self-dual
model, the smallest of the two numbers σp and σd is a natural candidate for a condition number
for the standard problems (P ) and (D). We refer the reader to the above-mentioned papers for a
discussion of other condition numbers and their mutual relations.
II.7 Primal-Dual Logarithmic Barrier Method
193
The main force in the derivation below is the observation that dax can be written as
the projection on the null space of AD of a vector that vanishes on the index set B.22
This can be seen as follows. We may write
√
1
1
dax = −PAD (u) = − √ PAD ( xs) = − √ PAD (ds).
µ
µ
Now let (ỹ, s̃) be any dual optimal pair. Then
s = c − AT y = AT ỹ + s̃ − AT y = s̃ + AT (ỹ − y),
so we have
ds = ds̃ + (AD)T (ỹ − y).
This means that ds − ds̃ belongs to the row space of AD. The row space being
orthogonal to the null space of AD, it follows that
PAD (ds) = PAD (ds̃).
Thus we obtain
1
dax = − √ PAD (ds̃).
µ
(7.43)
Since s̃ is dual optimal, all its positive coordinates belong to the index set N , and
hence we have s̃B = 0. Now we can rewrite (7.43) in the following way:
√
2
− µ dax = argminh kds̃ − hk : ADh = 0 ,
or equivalently,
√
2
2
− µ dax = argminh kdB s̃B − hB k + kdN s̃N − hN k : AB DB hB + AN DN hN = 0 .
This means that the solution of the last minimization problem is given by hB =
√
√
− µ (dax )B and hN = − µ (dax )N . Hence, substituting the optimal value for hN as
above, and also using that s̃B = 0, we obtain
√
√
2
− µ (dax )B = argminhB khB k : AB DB hB = µ AN DN (dax )N .
√
Stated otherwise, − µ (dax )B can be characterized as the vector of smallest norm in
the affine space
√
S = {ξ : AB DB ξ = µ AN DN (dax )N } .
√
Now consider the least norm solution of the equation AB z = µ AN DN (dax )N . This
solution is given by
√
a
z ∗ = µ A+
B AN DN (dx )N ,
22
We kindly acknowledge that the basic idea of the analysis below was communicated privately to us
by our colleague Gonzaga. We also refer the reader to Gonzaga and Tapia [127] and Ye et al. [317];
these papers deal with the asymptotically quadratic convergence rate of the predictor-corrector
method.
194
II Logarithmic Barrier Approach
−1 ∗
23
where A+
B denotes the pseudo-inverse √of the matrix AB . It is obvious that DB z
a
belongs to the affine space S. Hence, − µ(dx )B being the vector of smallest norm in
S, we obtain
√
√
−1 ∗
−1 +
k µ (dax )B k ≤ DB
z = µ DB
AB AN DN (dax )N ,
or, dividing both sides by
√
µ,
−1 +
k(dax )B k ≤ DB
AB AN DN (dax )N .
This implies
−1
k(dax )B k ≤ DB
a
A+
B kAN k kDN k k(dx )N k .
Since, by convention, A+
B and kAN k are bounded by O(1), and the order of magnitudes of the other norms on the right-hand side multiply to O(µ), we obtain that
k(dax )B k = O(µ). This implies the entry (dax )B = O(µ) in the table.
Line 7 and 8: These lines are not necessary for the proof of Theorem II.68. We only
add them because of their own interest. They immediately follow from the previous
lines in the table and the relations
∆x =
√
µ ddx ,
∆s =
√ −1
µ d ds .
This completes the proof of all the entries in Table 7.6..
7.8
A version of the algorithm with large updates
The primal-dual methods considered so far share the property that the iterates stay
close to the central path. More precisely, each generated primal-dual pair (x, s) belongs
to the region of quadratic convergence around some µ-center. In this section we
consider an algorithm in which the iterates may temporarily get quite far from the
central path, because of a large, but fixed, update of the barrier parameter. Then, by
using damped Newton steps, we return to the neighborhood of the point of the central
path corresponding to the new value of the barrier parameter. The algorithm is the
natural primal-dual analogue of the dual algorithm with large updates in Section 6.9.
Just as in the dual case, when the iterates leave the neighborhood of the central path
the proximity measure for the full-step method, δ(x, s; µ), becomes less relevant as a
measure for closeness to the central path. It will be of no surprise that in the primaldual case the primal-dual logarithmic barrier function φµ (x, s) is a perfect tool for this
job. Recall from (6.23), on page 133, that φµ (x, s) is given by
φµ (x, s) = Ψ
xs
−e
µ
= eT
X
n
xj sj
xs
log
−e −
,
µ
µ
j=1
(7.44)
and from Section 6.9 (page 130) that φµ (x, s) is nonnegative on its domain (the set
of all positive primal-dual pairs), is strictly convex, has a (unique) minimizer, namely
23
See Appendix B.
II.7 Primal-Dual Logarithmic Barrier Method
195
(x, s) = (x(µ), s(µ)) and, finally that φµ (x(µ), s(µ)) = 0.24
The algorithm is described below (page 195). As usual, ∆x and ∆s denote the
Newton step at the current pair (x, s) with the barrier parameter equal to its current
value µ. The first while-loop in the algorithm is called the outer loop and the second
Primal-Dual Logarithmic Barrier Algorithm with Large Updates
Input:
A proximity parameter τ ;
an accuracy parameter ε > 0;
a variable damping factor α;
a fixed barrier update parameter θ, 0 < θ < 1;
(x0 , s0 ) ∈ P × D and µ0 > 0 such that δ(x0 , s0 ; µ0 ) ≤ τ .
begin
x := x0 ; s := s0 ; µ := µ0 ;
while nµ ≥ ε do
begin
µ := (1 − θ)µ;
while δ(x, s; µ) ≥ τ do
begin
x := x + α∆x;
s := s + α∆s;
(The damping factor α must be such that φµ (x, s) decreases
sufficiently. Lemma II.72 gives a default value for α.)
end
end
end
while-loop the inner loop. Each execution of the outer loop is called an outer iteration
and each execution of the inner loop an inner iteration. The required number of outer
iterations depends only on the dimension n of the problem, on µ0 and ε, and on the
(fixed) barrier update parameter θ. This number immediately follows from Lemma
I.36 and is given by
nµ0
1
.
log
θ
ε
Just as in the dual case, the main task in the analysis of the algorithm is the estimation
of the number of iterations between two successive updates of the barrier parameter.
24
Exercise 54 Let the positive primal-dual pair (x, s) be given. We want to find µ > 0 such that
φµ (x, s) is minimal. Show that this happens if µ = xT s/n and verify that for this value of µ we
have
n
xT s X
nxs
−
e
=
n
log
−
log xj sj .
φµ (x, s) = Ψ
xT s
n
j=1
196
II Logarithmic Barrier Approach
This is the purpose of the next sections. We first derive some estimates of φµ (x, s) in
terms of the proximity measure δ(x, s; µ).
7.8.1
Estimates of barrier function values
The estimates in this section are of the same type as the estimates in Section 6.9.1
for the dual case.25 Many of these estimates there were given in terms of the function
ψ : (−1, ∞) → IR determined by (5.5):
ψ(t) = t − log(1 + t),
which is nonnegative on its domain, strictly convex and zero at t = 0. For z ∈ IRn ,
with z + e > 0, we defined in (6.22), page 133,
Ψ(z) =
n
X
ψ(zj ).
(7.45)
j=1
The estimates in Section 6.9.1 were given in terms of the dual proximity measure
δ(y, µ). Our aim is to derive similar estimates, but now in terms of the primal-dual
proximity measure δ(x, s; µ).
Let (x, s) be any positive primal-dual pair and µ > 0. Then, with u as usual:
r
xs
,
u=
µ
we may write
n
X
log u2j = Ψ u2 − e .
φµ (x, s) = eT u2 − e −
j=1
Using this we prove the next lemma.
√
Lemma II.69 Let δ := δ(x, s; µ) and ρ(δ) := δ + 1 + δ 2 . Then
−2δ
ψ
≤ φµ (x, s) ≤ ψ (2δρ(δ)) .
ρ(δ)
The first (second) inequality holds with equality if and only if one of the coordinates
of u attains the value ρ(δ) (1/ρ(δ)) and all other coordinates are equal to 1.
Proof: Fixing δ, we consider the behavior of Ψ u2 − e on the set
T := u ∈ IRn : u−1 − u = 2δ, u ≥ 0 .
Note that this set is invariant under inverting coordinates of u. Because of the
inequality
1
ψ(t − 1) > ψ
− 1 , t > 1,
(7.46)
t
25
The estimates in this section are new and dramatically improve existing estimates from the
literature. See, e.g., Monteiro and Adler [218], Mizuno and Todd [216], Jansen et al. [157] and
den Hertog [140].
II.7 Primal-Dual Logarithmic Barrier Method
197
whose elementary proof is left as an exercise 26 , this implies that u ≥ e if u maximizes
Ψ(u2 − e) on T and u ≤ e if u minimizes Ψ(u2 − e) on T .
Consider first the case where u is a maximizer of Ψ on the set T . The first-order
optimality conditions are
e
u2 − e
,
2u
=
2λ
u
−
u2
u3
(7.47)
where λ ∈ IR. This can be rewritten as
u2 u2 − e = λ u2 − e u2 + e .
It follows that each coordinate of u satisfies
ui = 1
u2i = λ u2i + 1 .
or
Since u > 0, we may conclude from this that the coordinates of u that differ from 1
are mutually equal. Suppose that u has k such coordinates, and that their common
value is ν. Note that k > 0, unless δ = 0, in which case the lemma is trivial. Therefore,
we may assume that k ≥ 1. Now, since u ∈ T ,
2
1
−ν
= 4δ 2 ,
k
ν
which gives
1
2δ
−ν = √ .
ν
k
Since u is a maximizer, we have ν ≥ 1, and hence
δ
ν=ρ √
.
k
Therefore, using that
ρ(t)2 − 1 = 2tρ(t),
we obtain
2
2
t ∈ IR,
Ψ u − e = kψ ν − 1 = kψ
2δ
√ ρ
k
(7.48)
δ
√
k
.
The expression on the right-hand side is decreasing as a function of k.27 Hence the
maximal value is attained if k = 1, and this value equals ψ (2δ ρ (δ)). The second
inequality in the lemma follows.
The first inequality is obtained in the same way. If u is a minimizer of Ψ on the set
T , then the first-order optimality conditions (7.47) imply in the same way as before
26
Exercise 55 Derive (7.46) from the inequalities in Exercise 42 (page 137).
27
Exercise 56 Let δ and ρ(δ) be as defined in Lemma II.69, and let k ≥ 1. Prove that
kψ
2δ
√ ρ
k
δ
√
k
and that this expression is maximal if k = 1.
= kψ
√
2δ2 + 2δ δ2 + k
k
198
II Logarithmic Barrier Approach
that the coordinates of u that differ from 1 are mutually equal. Assuming that u has
k such coordinates, and that their common value is ν again, we now have ν ≤ 1, and
hence
1
ν = .
ρ √δk
Using (7.48), it follows that
1 − ρ(t)2
−2tρ(t)
−2t
1
−
1
=
=
=
.
2
2
2
ρ(t)
ρ(t)
ρ(t)
ρ(t)
Hence we may write
−2δ
.
Ψ u2 − e = kψ ν 2 − 1 = kψ √
k ρ √δk
The expression on the right-hand side is increasing as a function of k.28 Hence the
minimal value is attained if k = 1, and this value equals ψ (−2δ/ρ (δ)). Thus the proof
of the lemma is complete.
✷
4
3
ψ (2δρ(δ))
❯
2
❑
1
ψ
0
0
0.5
−2δ
ρ(δ)
1
1.5
2
✲ δ = δ(x, s; µ)
Figure 7.10
28
Bounds for ψµ (x, s).
Exercise 57 Let δ and ρ(δ) be as defined in Lemma II.69, and let k ≥ 1. Prove that
kψ √
−2δ
kρ
√δ
k
= kψ
and that this expression is minimal if k = 1.
−2δ
√
δ + δ2 + k
II.7 Primal-Dual Logarithmic Barrier Method
199
Figure 7.10 shows the graphs of the bounds in Lemma II.69 for φµ (x, s) as a function
of the proximity δ.
Remark II.70 It may be worthwhile to discuss the quality of these bounds. Both bounds
are valid for all (nonnegative) values of the proximity. Especially for the upper bound this
is worth noting. Proximity measures known in the literature do not have this feature. For
example, with the popular measure
xs
−e
µ
all known upper bounds grow to infinity if the measure approaches 1. The upper bound of
Lemma II.69 goes to infinity only if our proximity measure goes to infinity.
The lower bound goes to infinity as well if if our proximity measure goes to infinity, due
to the fact that −2δ/ρ(δ) converges to -1 if δ goes to infinity. This is a new feature, which
will be used below in the analysis of the large-update method. On the other hand, it must be
noted that the lower bound grows very slowly if δ increases. For example, if δ = 1, 000, 000
then the lower bound is only 28.0168.
•
7.8.2
Decrease of barrier function value
Suppose again that (x, s) is any positive primal-dual pair and µ > 0. In this section we
analyze the effect on the barrier function value of a damped Newton step at (x, s) to
the µ-center. With u as defined before, the Newton displacements ∆x and ∆s satisfy
x∆s + s∆x = µe − xs.
Let x+ and s+ result from a damped Newton step of size α at (x, s). Then we have
x+ = x + α∆x,
s+ = s + α∆s.
Using the scaled displacements dx and ds , defined in (7.5), page 154, we can also write
√
√
x+ = µ d (u + αdx ) , s+ = µ d−1 (u + αds ) .
As a consequence,
x+ s+ = µ (u + αdx ) (u + αds ) = µ u2 + α e − u2 + α2 dx ds .
Here we used that u (dx + ds ) = e − u2 , which follows from
dx + ds = u−1 − u.
Now, defining
u+ :=
s
(7.49)
x+ s+
,
µ
it follows that
u+
2
= (u + αdx ) (u + αds ) = u2 + α e − u2 + α2 dx ds .
Subtracting e we get
u+
2
− e = (1 − α) u2 − e + α2 dx ds .
(7.50)
200
II Logarithmic Barrier Approach
Note that the orthogonality of dx and ds implies that eT dx ds = 0. Using this we find
the following expression for φµ (x+ , s+ ):
φµ (x+ , s+ )
eT
=
u+
2
n
X
2
log u+
−e −
j
j=1
n
X
2
log u+
(1 − α) eT u2 − e −
.
j
=
j=1
The next lemma provides an expression for the decrease of the barrier function value
during a damped Newton step.
Lemma II.71 Let δ = δ(x, s; µ) and let α be such that the pair (x+ , s+ ) resulting
from the damped Newton step of size α is feasible. Then we have
αds
αdx
+ +
2
−Ψ
.
φµ (x, s) − φµ (x , s ) = 4αδ − Ψ
u
u
Proof: For the moment let us denote ∆ := φµ (x, s) − φµ (x+ , s+ ). Then we have
∆
=
=
e
T
2
u −e −
n
X
log u2j
j=1
n
X
log
αeT u2 − e +
j=1
Since
u+
we may write
u+
u
Substituting this we obtain
∆
=
=
αe
T
2
2
=
n
X
αeT u2 − e +
!
+ 2
uj
uj
2
u −e +
n
X
j=1
2
.
dx
ds
e+α
e+α
.
u
u
j=1
n
X
log
u+
j
uj
!2
X
n
(ds )j
(dx )j
+
log 1 + α
.
log 1 + α
uj
uj
j=1
j=1
Observe that, by the definition of Ψ,
n
X
(dx )j
dx
αdx
−Ψ
log 1 + α
= αeT
uj
u
u
j=1
and, similarly,
(ds )j
ds
αds
T
log 1 + α
−Ψ
.
= αe
uj
u
u
j=1
n
X
log u+
j
= (u + αdx ) (u + αds )
2
u −e +
− (1 − α)e
T
II.7 Primal-Dual Logarithmic Barrier Method
201
Substituting this in the last expression for ∆ we arrive at
ds
αdx
αds
dx
T
2
T
T
+ αe
−Ψ
−Ψ
.
∆ = αe u − e + αe
u
u
u
u
Using (7.49) once more, the coefficients of α in the first three terms can be taken
together as follows:
2
dx + ds
2
T
u −e+
e
= eT u2 − e + u−2 − e = eT u−1 − u .
u
Thus we obtain
∆ = α u−1 − u
2
−Ψ
αdx
u
Since u−1 − u = 2δ, the lemma follows.29,30
−Ψ
αds
u
.
✷
We proceed by deriving a lower bound for the expression in the above lemma. The
next lemma also specifies a value of the damping parameter α for which the decrease
in the barrier function value attains the lower bound.
Lemma II.72 Let δ = δ(x, s; µ) and let α = 1/ω − 1/(ω + 4δ 2 ), where
s
s
2
2
2
2
∆x
dx
∆s
ds
ω :=
+
=
+
.
x
s
u
u
Then the pair (x+ , s+ ) resulting from the damped Newton step of size α is feasible.
Moreover, the barrier function value decreases by at least ψ(2δ/ρ(δ)). In other words,
2δ
φµ (x, s) − φµ (x+ , s+ ) ≥ ψ
.
ρ(δ)
29
30
Exercise 58 Verify that
∆x
dx
∆s
ds
=
,
=
.
x
u
s
u
Exercise 59 Using Lemma II.71, show that the decrease in the primal-dual barrier function value
after a damped step of size α can be written as:
∆ := φµ (x, s) − φµ (x+ , s+ ) = α kdx k2 + α kds k2 − Ψ
αdx
u
−Ψ
Now let z be the concatenation of the vectors dx and ds . Then we may write
.
z 2
u
z
+ αu
!
,
∆ = α kzk2 − Ψ
αz
u
αds
u
.
Using this, show that the decrease is maximal for the unique step-size ᾱ determined by the equation
T 2
e z =e
T
α
e
and that for this value the decrease is given by
Ψ
−ᾱz
u + ᾱz
=Ψ
−ᾱdx
u + ᾱdx
+Ψ
−ᾱds
u + ᾱds
.
202
II Logarithmic Barrier Approach
Proof: Assuming feasibility of the damped step with size α, we know from
Lemma II.71 that the decrease in the barrier function value is given by
αds
αdx
2
−Ψ
.
∆ := 4αδ − Ψ
u
u
We now apply the right-hand side inequality in (6.24), page 134, to the vector in IR2n
obtained by concatenating the vectors αdx /u and αds /u. Note that the norm of this
vector is given by αω, with ω as defined in the lemma, and that αω < 1 for the value
of α specified in the lemma. Then we obtain
∆ ≥ 4αδ 2 − ψ (−αω) = 4αδ 2 + αω + log (1 − αω) .
(7.51)
As a function of α, the derivative of the right-hand side expression is given by
ω
4δ 2 (1 − αω) − αω 2
=
.
1 − αω
1 − αω
From this we see that the right-hand side expression in (7.51) is increasing for
4δ 2 + ω −
0 ≤ α ≤ ᾱ :=
4δ 2
1
1
,
= −
2
ω (ω + 4δ )
ω ω + 4δ 2
and decreasing for larger values of α. Hence it attains its maximal value at α = ᾱ,
which is the value specified in the lemma. Moreover, since the barrier function is finite
for 0 ≤ α ≤ ᾱ, the damped Newton step of size ᾱ is feasible. Substitution of α = ᾱ in
(7.51) yields the following bound for ∆:
2
4δ
4δ 2
ω
4δ 2
4δ 2
=
ψ
.
=
+ log
−
log
1
+
∆≥
ω
ω + 4δ 2
ω
ω
ω
In this bound we may replace ω by a larger value, since ψ(t) is monotonically increasing
for t nonnegative. An upper bound for ω can be obtained as follows:
s
q
2
2
ds
dx
2
2
ω=
+
≤ u−1 ∞ kdx k + kds k = u−1 ∞ u−1 − u .
u
u
Since u−1
∞
≤ ρ(δ), by Lemma II.62, page 182, and u−1 − u = 2δ we obtain
ω ≤ 2δρ(δ).
Substitution of this bound yields
∆≥ψ
completing the proof.31
31
2δ
ρ(δ)
(7.52)
,
✷
Exercise 60 With ω as defined in Lemma II.72, show that
ω≥
2δ
.
ρ(δ)
Using this and (7.52), prove that the step-size α specified in Lemma II.72 satisfies
δ2
ρ(δ)2
1
≤α=
≤
.
2
2ρ(δ) (2ρ(δ) + δ)
ω (ω + δ )
2 (2 + δρ(δ))
II.7 Primal-Dual Logarithmic Barrier Method
203
Remark II.73 The same analysis as in Lemma II.72 can be applied to the case where
different step-sizes are taken for the x-space and the s-space. Let x+ = x + α∆x and
s+ = s + β∆s, with α and β such that both steps are feasible. Then the decrease in the
primal-dual barrier function value is given by
∆ := φµ (x, s) − φµ (x+ , s+ ) = α kdx k2 − Ψ
αdx
u
+ β kds k2 − Ψ
βds
u
.
Defining ω1 := kdx /uk, the x-part of the right-hand side can be bounded by
∆1 := α kdx k2 − Ψ
αdx
u
≥ψ
kdx k2
ω1
,
and this bound holds with equality if
α = ᾱ :=
1
1
−
.
ω1
ω1 + kdx k2
Similarly, defining ω2 := kds /uk, the s-part of the right-hand side can be bounded by
∆2 := β kds k2 − Ψ
βds
u
≥ψ
kds k2
ω2
,
and this bound holds with equality if
β = β̄ :=
1
1
−
.
ω2
ω2 + kds k2
Hence,
∆ = ∆1 + ∆2 ≥ ψ
kdx k2
ω1
+ψ
kds k2
ω2
.
We can easily verify that
ω1 ≤ ρ(δ) kdx k ,
ω2 ≤ ρ(δ) kds k .
Using the monotonicity of ψ, it follows that
∆1 ≥ ψ
kdx k
ρ(δ)
,
∆2 ≥ ψ
kds k
ρ(δ)
.
We obtain in this way
∆ = ∆1 + ∆2 ≥ ψ
kdx k
ρ(δ)
+ψ
kds k
ρ(δ)
.
Finally, applying the left inequality in (6.24) to the right-hand side expression, we can easily
derive that
s
∆ ≥ ψ
kdx k2 + kds k2
=ψ
ρ(δ)2
2δ
ρ(δ)
.
204
II Logarithmic Barrier Approach
Note that this is exactly the same bound as obtained in Lemma II.72. Thus, different stepsizes in the x-space and s-space give in this analysis no advantage over equal step-sizes in
both spaces. This contradicts an earlier (and incorrect) result of Roos and Vial in [246].32 •
For our goal it is of interest to√derive the following two conclusions from the above
lemma. First, if δ(x, s; µ) = 1/ 2 then a damped Newton step reduces the barrier
function by at least 0.182745, which is larger than 1/6. On the other hand for larger
values of δ(x, s; µ) the lower bound for the reduction in the barrier function value
seems to be rather poor. It seems reasonable to expect that the reduction grows to
infinity if δ goes to infinity. However, if δ goes to infinity then 2δ/ρ(δ) goes to 1,
and hence the lower bound in the lemma is bounded by the rather small constant
ψ(1) = 1 − log 2.33
7.8.3
A bound for the number of inner iterations
As before, we assume that we have an iterate (x, s) and µ > 0 such that (x, s) belongs
to the region around the µ-center determined by
δ = δ(x, s; µ) ≤ τ,
for some positive τ . Starting at (x, s) we count the number of inner iterations needed
to reach the corresponding region around the µ+ -center, with
µ+ = (1 − θ)µ.
Implicitly it is assumed that θ is so large that (x, s) lies outside the region of quadratic
convergence around the µ+ -center, but this is not essential for the analysis below.
Recall that the target centers x(µ+ ) and s(µ+ ) are the (unique) minimizers of the
primal-dual logarithmic barrier function φµ+ (x, s), and that the value of this function
is an indicator for the ‘distance’ from (x, s) to (x(µ+ ), s(µ+ )).
We start by considering the effect of an update of the barrier parameter to
µ+ = (1 − θ)µ with 0 ≤ θ < 1, on the barrier function value. Note that Lemma
II.69 gives the answer if θ = 0:
φµ (x, s) ≤ ψ(2δρ(δ)).
32
Exercise 61 In this exercise we consider the case where different step-sizes are taken for the xspace and the s-space. Let x+ = x + α∆x and s+ = s + β∆s, with α and β such that both steps
are feasible. Prove that the decrease in the primal-dual barrier function value is given by
∆ := φµ (x, s) − φµ (x+ , s+ ) = α kdx k2 + β kds k2 − Ψ
αdx
u
−Ψ
βds
u
.
Using this, show that the decrease is maximal for the unique step-sizes ᾱ and β̄ determined by the
equations
!
!
T
2
e (dx ) = e
α
T
dx
u
2
,
e + α dux
T
2
e (ds ) = e
β
T
ds
u
2
e + β dus
,
and that for these values of α and β the decrease is given by
Ψ
33
−ᾱdx
u + ᾱdx
+Ψ
−β̄ds
u + β̄ds
.
We want to explicitly show the inherent weakness of the lower bound in Lemma II.72 in the hope
that it will stimulate the reader to look for a stronger result.
II.7 Primal-Dual Logarithmic Barrier Method
205
For the general case, with θ > 0, we have the following lemma.
Lemma II.74 Using the above notation, we have
√
θ
2δρ(δ)θ n
.
+ nψ
φµ+ (x, s) ≤ φµ (x, s) +
1−θ
1−θ
Proof: The proof is more or less straightforward. The vector u is defined as usual.
2
X
n
u2j
u
φµ+ (x, s)
=
eT
log
−e −
1−θ
1−θ
j=1
2
n
X
u
2
2
T
2
T
− u + n log(1 − θ)
log uj + e
=
e u −e −
1−θ
j=1
θeT u2
+ n log(1 − θ)
1−θ
θn
θ
uT u − u−1 +
+ n log(1 − θ).
=
φµ (x, s) +
1−θ
1−θ
The second term in the last expression can be bounded by using
√
uT u − u−1 ≤ kuk u − u−1 ≤ 2δρ(δ) n.
=
φµ (x, s) +
The first inequality is simply the Cauchy–Schwarz
√ inequality and
√ the second inequality
follows from u−1 − u = 2δ and kuk ≤
n kuk∞ ≤ nρ(δ), where we used
Lemma II.62, page 182. We also have
θn
θ
θ
θ
= nψ
.
+ n log(1 − θ) = n
− log 1 +
1−θ
1−θ
1−θ
1−θ
Substitution yields
√
2δρ(δ)θ n
θ
φµ+ (x, s) ≤ φµ (x, s) +
,
+ nψ
1−θ
1−θ
and hence the lemma has been proved.
✷
Now we are ready to estimate the number of (inner) iterations between two
successive updates of the barrier parameter.
Lemma II.75 For given θ (0 < θ < 1), let
Then, when
√
θ n
R :=
.
1−θ
√
R
τ= p
√ ,
2 1+ R
the number of (inner) iterations between two successive updates of the barrier
parameter is not larger than
4
s
√
θ n
.
2 1 +
1−θ
206
II Logarithmic Barrier Approach
Proof: Suppose that δ = δ(x, s; µ) ≤ τ . Then it follows from Lemma II.74 that after
the update of the barrier parameter to µ+ = (1 − θ)µ we have
√
θ
2δρ(δ)θ n
.
+ nψ
φµ+ (x, s) ≤ φµ (x, s) +
1−θ
1−θ
By Lemma II.69 we have φµ (x, s) ≤ ψ (2δρ(δ)). Using the monotonicity of ψ and,
since δ ≤ τ , 2δρ(δ) ≤ 2τ ρ(τ ) we obtain
√
2τ ρ(τ )θ n
θ
φµ+ (x, s) ≤ ψ (2τ ρ(τ )) +
+ nψ
.
1−θ
1−θ
Application of the inequality ψ(t) ≤ t2 /2 for t ≥ 0 to the first and the third terms
yields
φ
µ+
2
√
√
√
2τ ρ(τ )θ n
nθ2
θ n
√
.
(x, s) ≤ 2τ ρ(τ ) +
= τ ρ(τ ) 2 +
+
1−θ
2(1 − θ)2
2(1 − θ)
2
2
The algorithm repeats damped Newton steps until the iterate (x, s) satisfies δ =
δ(x, s; µ+ ) ≤ τ . Each damped step decreases the barrier function value by at least
ψ (2τ /ρ(τ )). Hence, after
2
√
√
1
n
θ
τ ρ(τ ) 2 + √
(7.53)
2τ
2(1 − θ)
ψ ρ(τ
)
iterations the value of the barrier function will have reached (or bypassed) the value
ψ (2τ /ρ(τ )). From Lemma II.69, using that ψ (2τ /ρ(τ )) < ψ (−2τ /ρ(τ )), the iterate
(x, s) then certainly satisfies δ(x, s; µ+ ) ≤ τ , and hence (7.53) provides an upper
bound for the number of inner iterations between two successive updates of the barrier
parameter.
The rest of the proof consists in manipulating this expression. First, using ψ(t) ≥
t2 /(2(1 + t)) and 0 ≤ 2τ /ρ(τ ) ≤ 1, we obtain
ψ
2τ
ρ(τ )
≥
4τ 2
ρ(τ )2
2 1+
2τ
ρ(τ )
=
τ2
τ2
2
≥
.
2τ ρ(τ )2
ρ(τ )2
1 + ρ(τ
)
Substitution reduces the upper bound (7.53) to
&
2 ' &
√
√ 2 '
√
θ n
θρ(τ
)
ρ(τ )2
n
= 2 ρ(τ )2 +
.
τ ρ(τ ) 2 + √
τ2
2τ
(1
−
θ)
2(1 − θ)
For fixed θ the number of inner iterations is a function of τ . Note that this function
goes to infinity if τ goes to zero or to infinity. Our aim is to determine τ such that
this function is minimized. To this end we consider
T (τ ) := ρ(τ )2 +
ρ(τ )R
,
2τ
II.7 Primal-Dual Logarithmic Barrier Method
207
with R as given in the lemma. The derivative of T (τ ) with respect to τ can be simplified
to
4τ 2 ρ(τ )2 − R
√
.
T ′ (τ ) =
2τ 2 1 + τ 2
Hence T (τ ) is minimal if
2τ ρ(τ ) =
√
R.
We can solve this equation for τ . It can be rewritten as
√
ρ(τ )2 − 1 = R,
which gives
ρ(τ ) =
Hence,
1
τ=
2
ρ(τ ) −
1
ρ(τ )
1
=
2
q
√
1 + R.
!
√
q
√
1
R
= p
1+ R− p
√
√ .
1+ R
2 1+ R
(7.54)
Substitution of this value in T (τ ) gives
√
R
R
1
+
√ 2
√
ρ(τ )R
√
T (τ ) = ρ(τ )2 +
= 1+ R .
=1+ R+
2τ
R
For the value of τ given by (7.54) the number of inner iterations between two successive
updates of the barrier parameter will not be larger than
4
s
√
√ 4
θ n
= 2 1 +
2 1+ R
,
1−θ
which proves the lemma.
✷
√
Remark II.76 Note that for small values of θ, so that θ n is bounded by a constant, the
above lemma implies that the number of inner iterations between two successive
updates of
√
the barrier parameter is bounded by a constant. For example, with θ = 1/ 2n, which gives
(for large values of n) τ = 0.309883, this number is given by
&
2 1+
r
1
√
2
4 '
= 23.
Unfortunately the constant is rather large. Because, if τ = 0.309883 then we know that after
√
an update with θ = 1/ 2n one full Newton step will be sufficient to reach the vicinity of
the new target. In fact, it turns out that the bound has the same weakness as the bound
in Theorem II.41 for the dual case. As discussed earlier, this weak result is due to the poor
analysis.
•
In practice the number of inner iterations is much smaller than the number predicted
by the lemma. This is illustrated by some examples in the next section. But first we
208
II Logarithmic Barrier Approach
formulate the main conclusion of this section, namely that the primal-dual logarithmic
barrier method with large updates is polynomial. This is the content of our final result
in this section.
Theorem II.77 The following expression is an upper bound for the total number of
iterations required by the logarithmic barrier algorithm with line searches:
4
s
√
0
θ n
nµ
1
.
log
2 1 +
θ
1−θ
ε
Here it is assumed that τ is chosen as in Lemma II.75:
√
√
θ n
R
.
τ= p
√ , where R =
1−θ
2 1+ R
√
If θ ≤ n/(n + n) the output is a primal-dual pair (x, s) such that xT s ≤ 2ε.
Proof: The number of outer iterations follows from Lemma I.36. The bound in the
theorem is obtained by multiplying this number by the bound of Lemma II.75 for
the number of inner iterations per outer iteration and rounding the product, if not
integral, to the smallest integer above it. Finally, for the last statement we use the
inequality
2δρ(δ)
nµ,
xT s ≤ 1 + √
n
where δ = δ(x, s; µ); the elementary proof of this inequality is left as an exercise.34,35
For the output pair (x, s) we may apply this inequality with δ ≤ τ . Since
√
q
√
R
p
τ=
√ , ρ(τ ) = 1 + R,
2 1+ R
√
√
we have 2τ ρ(τ ) = R. Now θ ≤ n/(n + n) implies that R ≤ n, and hence we obtain
T
that x s ≤ 2nµ ≤ 2ε.
✷
Just as in the dual case, we draw two conclusions from the last theorem. If we take
for θ a fixed constant (independent of n), for example θ = 1/2, the algorithm is called
a large-update algorithm and the iteration bound of Theorem II.77 becomes
nµ0
.
O n log
ε
34
Exercise 62 Let (x, s) be a positive primal-dual pair and µ > 0. If δ = δ(x, s; µ), prove that
xT s − nµ = µ uT u − u−1
35
≤
2δρ(δ)
nµ.
√
n
√
Exercise 63 The bound in Exercise 62 is based on the estimate kuk ≤ ρ(δ) n. Prove the sharper
estimate
√
δ
kuk ≤ ρ √
n.
n
II.7 Primal-Dual Logarithmic Barrier Method
209
√
If we take θ = ν/ n for some fixed constant ν (independent of n), the algorithm is
called a medium-update algorithm and the iteration bound of Theorem II.77 becomes
√
nµ0
,
O
n log
ε
provided that n is large enough (n ≥ ν 2 say).
7.8.4
Illustration of the algorithm with large updates
We use the same sample problem as in the numerical examples given before, and solve
this problem using the primal-dual logarithmic barrier algorithm with large updates.
We use the same initialization as before, namely x = (2, 1, 1), y = (0, 0), s = (1, 1, 1)
and µ = 4/3. We do this for the values 0.5, 0.9, 0.99 and 0.999 of the barrier update
parameter θ. It may be interesting to mention the values of the parameter τ , as given
by Lemma II.75, for these values of θ. With n = 3, these values are respectively
0.43239, 0.88746, 1.74397 and 3.18671. The progress of the algorithm for the three
successive values of θ is shown in Tables 7.7. (page 210), 7.8., 7.9. and 7.10. (page
211). The tables need some explanation. They show only the first coordinates of x
and of s. As in the corresponding tables for the dual case, the tables not only have
lines for the inner iterations, but also for the outer iterations, which multiply the value
of the barrier parameter by the fixed factor 1−θ. The last column shows the proximity
to the current µ-center. The proximity value δ increases in the outer iterations and
decreases in the inner iterations.
The tables clearly demonstrate the advantages of the large-update strategy. The
number of inner iterations between two successive updates of the barrier parameter is
never more than two.
In the last table (with θ = 0.999) the sample problem is solved in only 3 iterations,
which is the best result obtained so far. The practical behavior is significantly better
than the theoretical analysis justifies. This is typical, and the same phenomenon occurs
for larger problems than the small sample problem.
We conclude this section with a graphical illustration of the algorithm, in Figure
7.11 (page 212).
210
II Logarithmic Barrier Approach
Outer Inner
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Table 7.7.
nµ
x1
y1
y2
s1
δ
0 4.000000
2.000000
1 2.000000
1.000000
2 1.000000
0.500000
3 0.500000
0.250000
4 0.250000
0.125000
5 0.125000
0.062500
6 0.062500
0.031250
7 0.031250
0.015625
8 0.015625
0.007812
9 0.007812
0.003906
10 0.003906
0.001953
11 0.001953
0.000977
12 0.000977
0.000488
13 0.000488
0.000244
14 0.000244
0.000122
15 0.000122
0.000061
16 0.000061
2.000000
2.000000
1.372070
1.372070
1.158784
1.158784
1.082488
1.082488
1.041691
1.041691
1.020805
1.020805
1.010423
1.010423
1.005201
1.005201
1.002606
1.002606
1.001300
1.001300
1.000651
1.000651
1.000325
1.000325
1.000163
1.000163
1.000081
1.000081
1.000041
1.000041
1.000020
1.000020
1.000010
0.000000
0.000000
0.313965
0.313965
0.649743
0.649743
0.835475
0.835475
0.916934
0.916934
0.958399
0.958399
0.979157
0.979157
0.989597
0.989597
0.994789
0.994789
0.997399
0.997399
0.998697
0.998697
0.999350
0.999350
0.999674
0.999674
0.999837
0.999837
0.999919
0.999919
0.999959
0.999959
0.999980
0.000000
0.000000
0.313965
0.313965
0.666131
0.666131
0.835249
0.835249
0.916776
0.916776
0.958395
0.958395
0.979156
0.979156
0.989598
0.989598
0.994789
0.994789
0.997399
0.997399
0.998697
0.998697
0.999350
0.999350
0.999674
0.999674
0.999837
0.999837
0.999919
0.999919
0.999959
0.999959
0.999980
1.000000
1.000000
0.686035
0.686035
0.350257
0.350257
0.164525
0.164525
0.083066
0.083066
0.041601
0.041601
0.020843
0.020843
0.010403
0.010403
0.005211
0.005211
0.002601
0.002601
0.001303
0.001303
0.000650
0.000650
0.000326
0.000326
0.000163
0.000163
0.000081
0.000081
0.000041
0.000041
0.000020
0.2887
0.6455
0.2334
0.6838
0.1559
0.6237
0.0587
0.6031
0.0281
0.6115
0.0147
0.6111
0.0073
0.6129
0.0039
0.6111
0.0019
0.6129
0.0015
0.6111
0.0007
0.6129
0.0012
0.6112
0.0006
0.6129
0.0011
0.6112
0.0005
0.6129
0.0012
0.6112
0.0005
Progress of the primal-dual algorithm with large updates, θ = 0.5.
II.7 Primal-Dual Logarithmic Barrier Method
Outer Inner
0
1
2
3
4
5
Table 7.8.
nµ
x1
y1
y2
s1
δ
0 4.000000
0.400000
1 0.400000
2 0.400000
0.040000
3 0.040000
0.004000
4 0.004000
0.000400
5 0.000400
0.000040
6 0.000040
2.000000
2.000000
1.051758
1.078981
1.078981
1.004551
1.004551
1.000621
1.000621
1.000066
1.000066
1.000007
0.000000
0.000000
0.263401
0.875555
0.875555
0.976424
0.976424
0.998596
0.998596
0.999867
0.999867
0.999987
0.000000
0.000000
0.684842
0.861676
0.861676
0.983729
0.983729
0.998677
0.998677
0.999868
0.999868
0.999987
1.000000
1.000000
0.736599
0.124445
0.124445
0.023576
0.023576
0.001404
0.001404
0.000133
0.000133
0.000013
0.2887
2.4664
1.1510
0.0559
2.5417
0.3661
2.7838
0.0447
2.4533
0.0070
2.4543
0.0027
Progress of the primal-dual algorithm with large updates, θ = 0.9.
Outer Inner
0
1
2
3
Table 7.9.
nµ
x1
y1
y2
s1
δ
0 4.000000
0.040000
1 0.040000
2 0.040000
0.000400
3 0.000400
0.000004
4 0.000004
2.000000
2.000000
2.000000
1.004883
1.004883
1.007772
1.007772
1.000038
0.000000
0.000000
0.000000
0.251292
0.251292
0.987570
0.987570
0.999743
0.000000
0.000000
0.000000
0.743825
0.743825
0.986233
0.986233
0.999834
1.000000
1.000000
1.000000
0.748708
0.748708
0.012430
0.012430
0.000257
0.2887
8.5737
4.2530
0.0816
8.7620
0.4532
9.5961
0.0392
Progress of the primal-dual algorithm with large updates, θ = 0.99.
Outer Inner
0
1
2
Table 7.10.
211
nµ
x1
y1
y2
0 4.000000
0.004000
1 0.004000
2 0.004000
0.000004
3 0.000004
2.000000
2.000000
1.000977
1.000481
1.000481
1.000000
0.000000
0.000000
0.250006
0.999268
0.999268
0.999998
0.000000
0.000000
0.749018
0.998990
0.998990
0.999999
s1
δ
1.000000 0.2887
1.000000 27.3587
0.749994 13.6684
0.000732 0.3722
0.000732 22.4872
0.000002 0.2066
Progress of the primal-dual algorithm with large updates, θ = 0.999.
212
w2
II Logarithmic Barrier Approach
102
w2
✻
✻
100
100
10−2
10−2
10−4
10−4
10−6
10−8
10−8
w2
102
10−6
θ = 0.5
τ = 0.433
10−6
10−4
10−2
100
✲ w1
10−8
10−8
102
102
w2
✻
θ = 0.9
τ = 0.887
10−6
10−4
10−2
100
✲ w1
102
102
✻
100
100
10−2
10−2
10−4
10−4
10−6
10−8
10−8
10−6
θ = 0.99
τ = 1.744
10−6
10−4
Figure 7.11
10−2
100
✲ w1
102
10−8
10−8
θ = 0.999
τ = 3.187
10−6
10−4
10−2
100
✲ w1
102
The iterates when using large updates with θ = 0.5, 0.9, 0.99 and 0.999.
8
Initialization
All the methods of this part of the book assume the availability of a starting point
on or close to the central path of the problem. Sometimes such a point is known, but
more often we have no foreknowledge of the problem under consideration. For these
cases we provide in this chapter a transformation of the problem yielding an equivalent
problem for which a point on the central path is available. This transformation is based
on results in Part I and is described below in detail.1
Suppose that we want to solve the problem (P ) in standard format:
(P )
min cT x : Ax = b, x ≥ 0 ,
where A is an m × n matrix of rank m, c, x ∈ IRn , and b ∈ IRm . Let I be a subset of
the full index set {1, 2, . . . , n} such that the submatrix AI of A has size m × m and is
nonsingular. Thus, AI is a basis for (P ). After reordering the columns of A, we may
write
A = (AI AJ ) ,
where J denotes the complement of I. Now Ax = b can be rewritten as
AI xI + AJ xJ = b,
which is equivalent to
As a consequence we have
xI = AI−1 (b − AJ xJ ) .
T
T −1
T −T
cT x = cTI xI + cTJ xJ = cTI A−1
I (b − AJ xJ ) + cJ xJ = cI AI b + cJ − AJ AI cI
Hence, omitting the constant cTI A−1
I b we can reformulate (P ) as
n
o
T
(P c )
min
cJ − ATJ AI−T cI xJ : AI−1 (b − AJ xJ ) ≥ 0, xJ ≥ 0 ,
T
xJ .
or equivalently,
(P c )
1
min
n
cJ − ATJ AI−T cI
T
o
xJ : −AI−1 AJ xJ ≥ −A−1
b,
x
≥
0
.
J
I
We want to point out an advantage of the approach in this chapter over the approach in the existing literature. The technique of embedding a given standard form problem in a homogeneous and
self–dual problem was introduced by Ye, Todd and Mizuno [316] in 1994. See also Wu, Wu and
Ye [299]; their final model contains free variables. In our approach the occurrence of free variables
is avoided by first reducing the given standard problem to a canonical problem. For a different
approach to the initialization problem we refer to, e.g., Lustig [189, 190].
214
II Logarithmic Barrier Approach
Thus we have transformed (P ) to the equivalent problem (P c ), which is in canonical
format. Chapter 4 describes how we can embed any canonical model in a self-dual
model so that a strictly complementary solution of the latter model either yields a
strictly complementary solution of the canonical problem or makes clear that the
canonical problem is infeasible or unbounded. Moreover, for the embedding problem
we have a point on its central path available. If we apply such an embedding to (P c ),
the resulting self-dual model may be given by
(SP c )
min q T ξ : M ξ ≥ −q, ξ ≥ 0 ,
where M is skew-symmetric and q ≥ 0. Let ξ(µ) be a given point on the central path
of (SP c ) for some positive µ. Now (SP c ) can be written in the standard form by
associating the surplus vector σ(ξ) := M ξ + q with any ξ. We then may rewrite (SP c )
as
(SSP c )
min q T ξ : M ξ − σ = −q, ξ ≥ 0, σ ≥ 0 ,
and we have
ξ(µ)σ(ξ(µ)) = µe,
where e is an all-one vector of appropriate size. Note that (SSP c ) is in the standard
format. We can rewrite it as
(P̄ )
min c̄T x̄ : Āx̄ = b̄, x̄ ≥ 0 ,
with
Ā =
h
M −I
i
,
c̄ =
"
q
0
#
,
b̄ = −q.
The problem (P̄ ) is in the standard format and hence the methods of this chapter
can be used to yield an ε-solution of (P̄ ) provided that we have a solution on or close
to its central path. We now show that this condition is satisfied by showing that the
µ-center of (P̄ ) is known. To this end we need to consider also the dual problem of
(P̄ ), namely
(D̄)
max b̄T ȳ : ĀT ȳ + s̄ = c̄, s̄ ≥ 0 .
For the slack vector s̄ we have
T
s̄ = c̄ − Ā ȳ =
"
q − M T ȳ
ȳ
#
=
"
q + M ȳ
ȳ
#
.
Here we used that M T = −M . Now with the definition
"
#
ξ(µ)
, ȳ =: ξ(µ),
x̄ :=
σ(ξ(µ))
x̄ is feasible for (P̄ ) and ȳ is feasible for (D̄). The feasibility of ȳ follows by considering
its slack vector:
"
# "
# "
#
q + M ȳ
q + M ξ(µ)
σ(ξ(µ))
s̄ =
=
=
.
ȳ
ξ(µ)
ξ(µ)
II.8 Initialization
215
For the product of x̄ and s̄ we have
"
#"
# "
# "
#
ξ(µ)
σ(ξ(µ))
ξ(µ)σ(ξ(µ))
µe
x̄s̄ =
=
=
.
σ(ξ(µ))
ξ(µ)
σ(ξ(µ))ξ(µ)
µe
This proves that x̄ is the µ-center of (P̄ ), as required.
By way of example we apply the above transformation to the sample problem used
throughout this part of the book.
Example II.78 Taking A and c as in Example II.7 (page 97), and b = (1, 1)T , we
have
"
#
" #
1
1 −1
0
1
A=
, b=
, c = 1 ,
0
0
1
1
1
and (P ) is the problem
(P )
min {x1 + x2 + x3 : x1 − x2 = 1, x3 = 1, x ≥ 0} .
The first and the third column of A form a basis. With the index set I defined
accordingly, the matrix AI is the identity matrix. Then we express x1 and x3 in
terms of x2 :
#
# "
"
1 + x2
x1
.
=
1
x3
Using this we eliminate x1 and x3 from (P ) and we obtain the canonical problem (P c ):
#
)
"
(
" #
−1
1
c
, x2 ≥ 0 .
x2 ≥
(P )
min 2x2 + 2 :
−1
0
Being unrealistic, but just to demonstrate the transformation process for this simple
case, we do not assume any foreknowledge and embed this problem in a self-dual
problem as described in Section 4.3.2 Taking 1 for x0 and s0 , and for y 0 and t0 the
all-one vector of length 2, the self-dual embedding problem is given by (SP c ) with
0
0
0
1
1 −1
0
0
0
0
1
0
,
0 .
q
=
M =
−1
0
0
2
0
0
5
0
−1 −1 −2
5
1
0
0 −5
0
Now the all-one vector is feasible for (SP c ) and its surplus vector is also the all-one
vector, as easily can be verified. It follows that the all-one vector is the point on the
central path for µ = 1. Adding surplus variables to this problem we get a problem in
the standard format with 5 equality constraints and 10 variables. Solving this problem
2
Exercise 64 The canonical problem (P c ) contains an empty row. Remove this row and then
perform the embedding. Show that this leads to the same solution of (P c ).
216
II Logarithmic Barrier Approach
with the large-update logarithmic barrier method (with θ = 0.999 and ε = 10−4 ), we
find in 4 iterations the strictly complementary solution
4 4 8
4
ξ = (0, 0, 0, , 0, , , , 0, 1).
5
5 5 5
The slack vector is
4
4 4 8
σ(ξ) = ( , , , 0, 1, 0, 0, 0, , 0).
5 5 5
5
Note that the first five coordinates of ξ are equal to the last five coordinates of σ(ξ)
and vice versa. In fact, the first five coordinates of ξ form a solution of the self-dual
embedding (SP c ) of (P c ). The homogenizing variable, the fourth entry in ξ, is positive.
Therefore, we have found an optimal solution of (P c ). The optimal value of x2 in (P c ),
the third coordinate in the vector ξ, is given by x2 = 0. Hence x = (1, 0, 1) is optimal
for the original problem (P ).
♦
A clear disadvantage of the above embedding procedure seems to be that it increases
the size of the problem. If the constraint matrix A of (P ) has size m × n then the final
standard form problem that we have to solve has size (n + 2) × 2(n + 2). However,
when the procedure is implemented efficiently the amount of extra computation can
be reduced significantly. In fact, the computation of the search direction for the larger
problem can be organized in such a way that it requires the solution of three linear
systems with the same matrix of size (m+2)×(m+2). This is explained in Chapter 20.
Part III
The Target-following
Approach
9
Preliminaries
9.1
Introduction
In this part we deal again with the problems (P ) and (D) in the standard form:
(P )
min cT x : Ax = b, x ≥ 0 ,
(D)
max
T
b y : AT y ≤ c .
As before, the matrix A is of size m × n with full row rank and the vectors c and x are
in IRn and b in IRm . Assuming that the interior-point condition is satisfied we recall
from Theorem II.4 that the KKT system (5.3)
Ax
AT y + s
=
=
b,
c,
xs
=
µe
x ≥ 0,
s ≥ 0,
(9.1)
has a unique solution for every positive value of µ. These solutions are called the µcenters of (P ) and (D). The above result is fundamental for the algorithms analyzed
in Part II. When µ runs through the positive real line then the solutions of the KKT
system run through the central paths of (P ) and (D); the methods in Part II just
approximately follow the central path to the optimal sets of (P ) and (D). These
methods were called logarithmic barrier methods because the points on the central
path are minimizers of the logarithmic barrier functions for (P ) and (D). For obvious
reasons they have also become known under the name central-path-following methods.
In each (outer) iteration of such a method the value of the parameter µ is fixed
and starting at a given feasible solution of (P ) and/or (D) a good approximation is
constructed of the µ-centers of (P ) and (D). Numerically the approximate solutions
are obtained either by using Newton’s method for solving the KKT system or by
using Newton’s method for minimizing the logarithmic barrier function of (P ) and
(D). In the first case Newton’s method provides displacements for both (P ) and (D);
then we speak of a primal-dual method. In the second case Newton’s method provides
a displacement for either (P ) or (D), depending on whether the logarithmic barrier
function of (P ) or (D) is used in Newton’s method. This gives the so-called primal
methods and dual methods respectively. In all cases the result of an (outer) iteration
is a primal-dual pair approximating the µ-centers and such that the duality gap is
approximately nµ.
220
III Target-following Approach
In this part we present a generalization of the above results. The starting point is
the observation that if the vector µe on the right-hand side of the KKT system (9.1)
is replaced by any positive vector w then the resulting system still has a (unique)
solution. Thus, for any positive vector w the system
Ax
=
b,
AT y + s
xs
=
=
c,
w
x ≥ 0,
s ≥ 0,
(9.2)
has a unique solution, denoted by x(w), y(w), s(w).1 This result is interesting in itself.
It means that we can associate with each positive vector w the primal-dual pair
(x(w), s(w)).2 The map ΦP D associating with any w > 0 the pair (x(w), s(w)) will
be called the target map associated with (P ) and (D). In the next section we discuss
its existence and also some interesting properties.
In the present context, it is convenient to refer to the interior of the nonnegative
orthant in IRn as the w-space. Any (possibly infinite) sequence of positive vectors
wk (k = 1, 2, . . .) in the w-space is called a target sequence. If a target sequence
converges to the origin, then the duality gap eT wk for the corresponding pair in the
sequence ΦP D (wk ) converges to zero. We are especially interested in target sequences
of this type for which the sequence ΦP D (wk ) is convergent as well, and for which the
limiting primal-dual pair is strictly complementary. In Section 9.3 we derive a sufficient
condition on target sequences (converging to the origin) that yields this property. We
also give a condition such that the limiting pair consists of so-called weighted-analytic
centers of the optimal sets of (P ) and (D).
With any central-path-following method we can associate a target sequence on the
central path by specifying the values of the barrier parameter µ used in the successive
(outer) iterations. The central-path-following method can be interpreted as a method
that takes the points on the central path as intermediate targets on the way to the
origin. Thus it becomes apparent how the notion of central-path-following methods
can be generalized to target-following methods, which (approximately) follow arbitrary
target sequences. To develop this idea further we need numerical procedures that can
be used to obtain a good approximation of the primal-dual pair corresponding to
some specified positive target vector. Chapters 10, 12 and 13 are devoted to such
procedures. The basic principle is again Newton’s method. Chapter 10 describes a
primal-dual method, Chapter 12 a dual method, and Chapter 13 a primal method.
The target-following approach offers a very general framework for the analysis
of almost all known interior-point methods. In Chapter 11 we analyze some of the
methods of Part II in this framework. We also deal with some other applications,
including a target-following method that is based on the Dikin direction, as introduced
in Appendix E. Finally, in Chapter 14 we deal with the so-called method of centers.
This method will be described and after putting it into the target-following framework
we provide a new and relatively easy analysis of the method.
1
This result, which establishes a one-to-one correspondence between primal-dual pairs (x, s) and
positive vectors in IRn , was proved first in Kojima et al. [175]. Below we present a simple alternative
proof. Mizuno [212, 214] was the first to use this property in the design of an algorithm.
2
Here, as before, we use that any dual feasible pair (y, s) can be uniquely represented by either y
or s. This is due to the assumption that A has full row rank.
9.2 The target map and its inverse
Our first aim in this section is to establish that the target map ΦP D is well defined.
That is, we need to show that for any positive vector w ∈ IRn the system (9.2) has
a unique solution. To this end we use a modification of the primal-dual logarithmic
barrier as given by (6.23). Replacing the role of the vector µe in this function by the
vector w, we consider the modified primal-dual logarithmic barrier function defined
by
    φw(x, s) = (1/max(w)) Σ_{j=1}^{n} w_j ψ( x_j s_j / w_j − 1 ).                (9.3)
Here the function ψ has its usual meaning (cf. (5.5), page 92). The scaling factor
1/ max (w) serves to scale φw (x, s) in such a way that φw (x, s) coincides with the
primal-dual logarithmic barrier function (7.44) in Section 7.8 (page 194) if w is on the
central path.3
Note that φw (x, s) is defined for all positive primal-dual pairs (x, s). Moreover,
φw (x, s) ≥ 0 and the equality holds if and only if xs = w. Hence, the weighted KKT
system (9.2) has a solution if and only if the minimal value of φw is 0.
By expanding φw (x, s) we get
    max(w) φw(x, s) = Σ_{j=1}^{n} w_j ( x_j s_j/w_j − 1 − log( x_j s_j/w_j ) )
                    = Σ_{j=1}^{n} x_j s_j − Σ_{j=1}^{n} w_j − Σ_{j=1}^{n} w_j log x_j s_j + Σ_{j=1}^{n} w_j log w_j
                    = x^T s − Σ_{j=1}^{n} w_j log x_j s_j − e^T w + Σ_{j=1}^{n} w_j log w_j.                (9.4)
Neglecting for the moment the constant part, that is the part that does not depend
on x and s, we are left with the function
    x^T s − Σ_{j=1}^{n} w_j log x_j s_j.                (9.5)
This function is usually called a weighted primal-dual logarithmic barrier function
with the coefficients of the vector w as weighting coefficients. Since xT s = cT x − bT y,
the first term in (9.5) is linear on the domain of φw (x, s). The second term, called the
barrier term, is strictly convex and hence it follows that φw (x, s) is strictly convex on
its domain.
3 If w = µe then max(w) = µ and hence
    φw(x, s) = (1/µ) Σ_{j=1}^{n} µ ψ( x_j s_j/µ − 1 ) = Σ_{j=1}^{n} ψ( x_j s_j/µ − 1 ) = Ψ( xs/µ − e );
this is precisely the primal-dual logarithmic barrier function φµ (x, s) as given by (6.23) and (7.44),
and that was used in the analysis of the large-update central-path-following logarithmic barrier
method.
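The definitions above are easy to experiment with numerically. The following small sketch is ours, not part of the original text, and its data are arbitrary illustrative vectors: it evaluates φw via (9.3), checks that φw is nonnegative and vanishes exactly when xs = w, and verifies the expanded form (9.4).

import numpy as np

def psi(t):
    # psi(t) = t - log(1 + t), the kernel used in (9.3); defined for t > -1
    return t - np.log(1.0 + t)

def phi_w(x, s, w):
    # modified primal-dual logarithmic barrier function (9.3)
    return np.sum(w * psi(x * s / w - 1.0)) / np.max(w)

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, 5)
s = rng.uniform(0.5, 2.0, 5)
w = rng.uniform(0.5, 2.0, 5)

print(phi_w(x, s, w) >= 0.0)                 # phi_w is nonnegative ...
print(np.isclose(phi_w(x, s, x * s), 0.0))   # ... and vanishes exactly when xs = w

# agreement with the expanded form (9.4)
expanded = x @ s - np.sum(w * np.log(x * s)) - np.sum(w) + np.sum(w * np.log(w))
print(np.isclose(np.max(w) * phi_w(x, s, w), expanded))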
In the sequel we need a quantity to measure the distance from a positive vector w
to the central path of the w-space. Such a measure was introduced in Section 3.3.4 in
(3.20). We use the same measure here, namely
    δc(w) := max(w) / min(w).                (9.6)
Now we are ready to derive the desired result by adapting Theorem II.4 and its proof
to the present case. With w fixed, for given K ∈ IR the level set LK of φw is defined
by
    LK = { (x, s) : x ∈ P+, s ∈ D+, φw(x, s) ≤ K }.
Theorem III.1 Let w ∈ IRn and w > 0. Then the following statements are equivalent:
(i) (P ) and (D) satisfy the interior-point condition.
(ii) There exists K ≥ 0 such that the level set LK is nonempty and compact.
(iii) There exists a (unique) primal-dual pair (x, s) minimizing φw with x and s both
positive.
(iv) There exist (unique) x, s ∈ IRn and y ∈ IRm satisfying (9.2);
(v) For each K ≥ 0 the level set LK is nonempty and compact.
Proof: (i) ⇒ (ii): Assuming (i), there exists a positive x0 ∈ P+ and a positive s0 ∈ D+. With K = φw(x0, s0) the level set LK contains the pair (x0, s0). Thus, LK is not empty, and we need to show that LK is compact. Let (x, s) ∈ LK. Then, by the definition of LK,
    Σ_{i=1}^{n} w_i ψ( x_i s_i/w_i − 1 ) ≤ K max(w).
Since each term in the sum is nonnegative, this implies
    ψ( x_i s_i/w_i − 1 ) ≤ K max(w)/min(w) = K δc(w),        1 ≤ i ≤ n.
Since ψ is strictly convex on its domain and goes to infinity at its boundaries, there
exist unique positive numbers a and b, with a < 1, such that
ψ(−a) = ψ(b) = Kδc (w).
We conclude that
    −a ≤ x_i s_i/w_i − 1 ≤ b,        1 ≤ i ≤ n,
which gives
    w_i(1 − a) ≤ x_i s_i ≤ w_i(1 + b),        1 ≤ i ≤ n.                (9.7)
From the right-hand side inequality we deduce that
xT s ≤ (1 + b)eT w.
We proceed by showing that this and (i) imply that the coordinates of x and s can
be bounded above. Since A(x − x0 ) = 0, the vector x − x0 belongs to the null space of
A. Similarly, s − s0 = AT (y 0 − y) implies that s − s0 lies in the row space of A. The
row space and the null space of A are orthogonal and hence we have
(x − x0 )T (s − s0 ) = 0.
(9.8)
Writing this as
xT s0 + sT x0 = xT s + (x0 )T (s0 )
and using xT s ≤ (1 + b)eT w, we find
xT s0 + sT x0 ≤ (1 + b)eT w + (x0 )T (s0 ).
(9.9)
Since sT x0 ≥ 0, x ≥ 0, and s0 > 0, this implies for each index i that
xi s0i ≤ xT s0 + sT x0 ≤ (1 + b)eT w + (x0 )T (s0 ),
whence
    x_i ≤ ( (1 + b) e^T w + (x0)^T (s0) ) / s0_i,
proving that the coordinates of the vector x are bounded above. The coordinates of
the vector s are bounded above as well. This can be derived from (9.9) in exactly the
same way as for the coordinates of x. Using xT s0 ≥ 0, s ≥ 0, and x0 > 0, we obtain
for each index i that
    s_i ≤ ( (1 + b) e^T w + (x0)^T (s0) ) / x0_i.
Thus we have shown that the level set LK is bounded. We proceed by showing that
LK is compact. Each si being bounded above, the left inequality in (9.7) implies that
xi is bounded away from zero. In fact, we have
    x_i ≥ (1 − a) w_i / s_i ≥ (1 − a) x0_i w_i / ( (1 + b) e^T w + (x0)^T (s0) ).
In the same way we derive that for each i,
    s_i ≥ (1 − a) w_i / x_i ≥ (1 − a) s0_i w_i / ( (1 + b) e^T w + (x0)^T (s0) ).
We conclude that for each i there exist positive numbers αi and βi with 0 < αi ≤ βi ,
such that
αi ≤ xi , si ≤ βi , 1 ≤ i ≤ n.
Thus we have proved the inclusion
    LK ⊆ Π_{i=1}^{n} [α_i, β_i] × [α_i, β_i].
The set on the right-hand side lies in the positive orthant of IRn × IRn , and being the
Cartesian product of closed intervals, it is compact. Since φw is continuous, and well
defined on this set, it follows that LK is compact. Thus we have shown that (ii) holds.
(ii) ⇒ (iii): Suppose that (ii) holds. Then, for some nonnegative K the level set LK
is nonempty and compact. Since φw is continuous, it follows that φw has a minimizer
(x, s) in LK . Moreover, since φw is strictly convex, this minimizer is unique. Finally,
from the definition of φw , ψ ((xi si /wi ) − 1) must be finite, and hence xi si > 0 for each
i. This implies that x > 0 and s > 0, proving (iii).
(iii) ⇒ (iv): Suppose that (iii) holds. Then φw has a (unique) minimizer. Since the
domain P + × D+ of φw is open, (x, s) ∈ P + × D+ is a minimizer of φw if and only
if the gradient of φw is orthogonal to the linear space parallel to the smallest affine
space containing P + × D+ (cf. Proposition A.1). This linear space is determined by
the affine system
Ax = 0, Hs = 0,
where H is a matrix such that its row space is the null space of A and vice versa. The
gradient of φw with respect to the coordinates of x satisfies
    max(w) ∇x φw(x, s) = s − w/x,
and with respect to the coordinates of s we have
    max(w) ∇s φw(x, s) = x − w/s.
Application of Proposition A.1 yields that ∇x φw (x, s) must lie in the row space of A
and ∇s φw (x, s) must lie in the row space of H. These two spaces are orthogonal, and
hence we obtain
    ( s − w/x )^T ( x − w/s ) = 0.
This can be rewritten as
    ( s − w/x )^T X S^{−1} ( s − w/x ) = 0.
Since XS^{−1} is a diagonal matrix with positive elements on the diagonal, this implies
    s − w/x = 0,
whence xs = w. This proves that (x, s) is a minimizer of φw if and only if (x, s) satisfies
(9.2). Hence (iv) follows from (iii).
(iv) ⇒ (i): Let (x, s) be a solution of (9.2). Since w > 0 and x and s are nonnegative, both x and s are positive. This proves that (P ) and (D) satisfy the interior-point condition.
Thus it has been shown that statements (i) to (iv) in the theorem are equivalent. We
finally prove that statement (v) is equivalent with each of these statements. Obviously
(v) implies (ii). On the other hand, assuming that statements (i) to (iv) hold, let x and
s solve (9.2). Then we have x > 0, s > 0 and xs = w. This implies that φw (x, s) = 0,
as easily follows by substitution. Now let K be any nonnegative number. Then the
level set LK contains the pair (x, s) and hence it is nonempty. Finally, from the above
proof of the implication (i) ⇒ (ii) it is clear that LK is compact. This completes the
proof of the theorem.
✷
If the interior-point condition is satisfied, then the target map ΦP D provides a
tool for representing any positive primal-dual pair (x, s) by the positive vector xs,
which is the inverse image of the pair (x, s). The importance of this feature cannot
be overestimated. It means that the interior of the nonnegative orthant in IRn can
be used to represent all positive primal-dual pairs. As a consequence, the behavior
of primal-dual methods that generate sequences of positive primal-dual pairs can be
described in the nonnegative orthant in IRn . Obviously, the central paths of (P ) and
(D) are represented by the bisector {µe : µ > 0} of the w-space; in the sequel we
refer to the bisector as the central path of the w-space. See Figure 9.1.
Figure 9.1   The central path in the w-space (n = 2), showing the w1- and w2-axes, the central path, and a line along which the duality gap is constant.
For central-path-following methods the target sequence is a sequence on this path
converging to the origin. The iterates of these methods are positive primal-dual pairs
‘close’ to the target points on the central path, in the sense of some proximity measure.
In the next sections we deal with target sequences that are not necessarily on the
central path.
Remark III.2 We conclude this section with an interesting observation, namely that the
target map of (P ) and (D) contains so much information that we can reconstruct the data A, b
and c from the target map.4 This can be shown as follows. We take partial derivatives with
4 This result was established by Crouzeix and Roos [57] in an unpublished note.
respect to the coordinates of w in the weighted KKT system (9.2). Denoting the Jacobians
of x, y and s simply by x′ , y ′ and s′ respectively, we have
    x′ = ∂x/∂w,        y′ = ∂y/∂w,        s′ = ∂s/∂w,
where the (i, j) entry of x′ is the partial derivative ∂xi /∂wj , etc. Note that x′ and s′ are
n × n matrices and y ′ is an m × n matrix. Thus we obtain
    A x′ = 0,
    A^T y′ + s′ = 0,
    X s′ + S x′ = I,                                            (9.10)
where I denotes the identity matrix of size n × n.5 The third equation is equivalent to
    s′ = X^{−1}( I − S x′ ).
Using also the second equation we get
    A^T y′ = X^{−1}( S x′ − I ).                                (9.11)
Since y′ is an m × n matrix of rank m there exists an n × m matrix R such that y′R equals the m × m identity matrix. Multiplying (9.11) from the right by R we obtain
    A^T = A^T y′ R = X^{−1}( S x′ − I ) R,
which determines the matrix A uniquely. Finally, for any positive w, the vectors b and c
follow from b = Ax(w) and c = AT y(w) + s(w).
•
9.3 Target sequences
Let us consider a target sequence
    w0, w1, w2, . . . , wk, . . .                (9.12)
which converges to the origin. The vectors wk in the sequence are positive and
    lim_{k→∞} wk = 0.
As a consequence, for the duality gap eT wk at wk we have lim_{k→∞} eT wk = 0; this implies that the accumulation points of the sequence
    ΦPD(w0), ΦPD(w1), ΦPD(w2), . . . , ΦPD(wk), . . .                (9.13)
are optimal primal-dual pairs.6 In the sequel (x∗, s∗) denotes any such optimal primal-dual pair.
5
Since the matrix of system (9.10) is nonsingular, the implicit function theorem (cf. Proposition A.2
in Appendix A) implies the existence of all the relevant partial derivatives.
6
Exercise 65 By definition, an accumulation point of the sequence (9.13) is a primal-dual pair
that is the limiting point of some convergent subsequence of (9.13). Verify the existence of such a
convergent subsequence.
We are especially interested in target sequences for which the accumulation pairs
(x∗ , s∗ ) are strictly complementary. We prove below that this happens if the target
sequence lies in some cone neighborhood of the central path defined by
δc (w) ≤ τ,
where τ is fixed and τ ≥ 1. Recall that δc (w) ≥ 1, with equality if and only if w is on
the central path. Also, δc (w) is homogeneous in w: for any positive scalar λ and for
any positive vector w we have
δc (λw) = δc (w).
As a consequence, the inequality δc (w) ≤ τ determines a cone in the w-space.
In Theorem I.20 we showed for the self-dual model that the limiting pairs of any
target sequence on the central path are strictly complementary optimal solutions. Our
next result not only implies an analogous result for the standard format but it extends
it to target sequences lying inside a cone around the central path in the w-space.
Theorem III.3 Let τ ≥ 1 and let the target sequence (9.12) be such that δc (wk ) ≤ τ
for each k. Then every accumulation pair (x∗ , s∗ ) of the sequence (9.13) is strictly
complementary.
Proof: For each k = 1, 2, . . ., let (xk , sk ) := ΦP D (wk ). Then we have
xk sk = wk ,
k = 1, 2, . . . .
Now let (x∗ , s∗ ) be any accumulation point of the sequence (9.13). Then there
exists a subsequence of the given sequence whose primal-dual pairs converge to
(x∗ , s∗ ). Without loss of generality we assume that the given sequence itself is such a
subsequence. Since xk − x∗ and sk − s∗ belong respectively to the null space and the
row space of A, these vectors are orthogonal. Hence,
    ( x^k − x∗ )^T ( s^k − s∗ ) = 0.
Expanding the product and rearranging terms, we get
    (x∗)^T s^k + (s∗)^T x^k = (s^k)^T x^k + (s∗)^T x∗.
Using that (s^k)^T x^k = eT wk and (x∗)^T s∗ = 0, we arrive at
    Σ_{j∈σ(x∗)} x∗_j s^k_j + Σ_{j∈σ(s∗)} s∗_j x^k_j = eT wk,        k = 1, 2, . . . .
Here σ(x∗ ) denotes the support of x∗ and σ(s∗ ) the support of s∗ .7 Using that
xk sk = wk , we can write the last equation as
    Σ_{j∈σ(x∗)} w^k_j x∗_j / x^k_j + Σ_{j∈σ(s∗)} w^k_j s∗_j / s^k_j = eT wk,        k = 1, 2, . . . .
7 The support of a vector is defined in Section 2.8, Definition I.19, page 36.
Now let ε be a (small) positive number such that
    (1 + ε)/(nε) > τ.
Then, since (x∗, s∗) is the limit of the sequence (x^k, s^k)_{k=0}^{∞}, there exists a natural number K such that
    x∗_j / x^k_j ≤ 1 + ε    and    s∗_j / s^k_j ≤ 1 + ε
for each j (1 ≤ j ≤ n) and for all k ≥ K. Hence, for k ≥ K we have
    eT wk ≤ (1 + ε) ( Σ_{j∈σ(x∗)} w^k_j + Σ_{j∈σ(s∗)} w^k_j ).
If the pair (x∗ , s∗ ) is not strictly complementary, there exists an index i that does not
belong to the union σ(x∗ ) ∪ σ(s∗ ) of the supports of x∗ and s∗ . Then we have
    Σ_{j∈σ(x∗)} w^k_j + Σ_{j∈σ(s∗)} w^k_j ≤ eT wk − w^k_i.
Substitution gives
    eT wk ≤ (1 + ε)( eT wk − w^k_i ).
This implies
    (1 + ε) w^k_i ≤ ε eT wk.
The average value of the elements of wk is eT wk /n. Since δc (wk ) ≤ τ , the average
value does not exceed τ wik . Hence, eT wk ≤ nτ wik . Substituting this we obtain
(1 + ε)wik ≤ nετ wik .
Now dividing both sides by wik we arrive at the contradiction
1 + ε ≤ nετ.
This proves that (x∗ , s∗ ) is strictly complementary.
✷
If a target sequence satisfies the condition in Theorem III.3 for some τ ≥ 1, it is
clear that the ratios between the coordinates of the vectors wk are bounded. In fact,
    1/τ ≤ w^k_i / w^k_j ≤ τ
for all k and for all i and j. For target sequences on the central path these ratios are
all equal to one, so the limits of the ratios exist if k goes to infinity. In general we are
interested in target sequences for which the limits of these ratios exist when k goes to
infinity. Since the ratios between the coordinates do not change if wk is multiplied by
a positive constant, this happens if and only if there exists a positive vector w∗ such
that
    lim_{k→∞} n wk / (eT wk) = w∗,                (9.14)
and then the limiting values of the ratios are given by the ratios between the
coordinates of w∗ . Note that we have eT w∗ = n, because the sum of the coordinates
of each vector nwk /eT wk is equal to n. Also note that if a target sequence satisfies
(9.14), we may find a τ ≥ 1 such that δc (wk ) ≤ τ for each k. In fact, we may take
    τ = max_{i,j,k} w^k_i / w^k_j.
Hence, by Theorem III.3, any accumulation pair (x∗ , s∗ ) for such a sequence is strictly
complementary.
Our next result shows that if (9.14) holds then the limiting pair (x∗ , s∗ ) is unique
and can be characterized as a weighted-analytic center of the optimal sets of (P ) and
(D). Let us first define this notion.
Definition III.4 (Weighted-analytic center) Let the nonempty and bounded set
T be the intersection of an affine space in IRp with the nonnegative orthant of IRp . We
define the support σ(T ) of T as the subset of the full index set {1, 2, . . . , p} given by
σ(T ) = {i : ∃x ∈ T such that xi > 0} .
If w is any positive vector in IRp then the corresponding weighted-analytic center of
T is defined as the zero vector if σ(T ) is empty, otherwise it is the vector in T that
maximizes the product
    Π_{i∈σ(T)} x_i^{w_i},        x ∈ T.                (9.15)
If the support of T is not empty then the convexity of T implies the existence of a vector x ∈ T such that x_{σ(T)} > 0. Moreover, if σ(T) is not empty then the maximum value of the product (9.15) exists since T is bounded. Since the logarithm of the product (9.15) is strictly concave in the coordinates indexed by σ(T), the maximum value is attained at a unique point of T. The above definition generalizes the notion of analytic center, as defined by Definition I.29, and it uniquely defines the weighted-analytic center (for any positive weighting vector w) for any bounded subset that is the intersection of an affine space in IRp with the nonnegative orthant of IRp. 8
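As a concrete illustration (ours, not the book's), take for T the standard simplex {x ≥ 0 : eT x = 1} in IRp, whose support is the full index set. Maximizing Σ_i w_i log x_i over this T, the Lagrange conditions w_i/x_i = λ give the closed form x = w/(eT w) for its weighted-analytic center; the sketch below checks numerically that no randomly drawn feasible point attains a larger value of the product (9.15).

import numpy as np

def weighted_log_product(x, w):
    # logarithm of the product prod_i x_i^{w_i} in (9.15); assumes x > 0
    return np.sum(w * np.log(x))

rng = np.random.default_rng(1)
w = rng.uniform(0.5, 3.0, 6)
center = w / w.sum()          # closed-form weighted-analytic center of the simplex

# Sanity check against random feasible points of the simplex.
samples = rng.dirichlet(np.ones(6), size=2000)
values = [weighted_log_product(p, w) for p in samples]
print(weighted_log_product(center, w) >= max(values))   # True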
Below we apply this notion to the optimal sets of (P ) and (D). If a target sequence
satisfies (9.14) then the next result states that the sequence of its primal-dual pairs
converges to the pair of weighted-analytic centers of the optimal sets of (P ) and (D).
Theorem III.5 Let the target sequence (9.12) be such that (9.14) holds for some
w∗ , and let (x∗ , s∗ ) be an accumulation point of the sequence (9.13). Then x∗ is the
weighted-analytic center of P ∗ with respect to w∗ , and s∗ is the weighted-analytic
center of D∗ with respect to w∗ .
Proof: We have already established that the limiting pair (x∗ , s∗ ) is strictly complementary, from Theorem III.3. As a consequence, the support of the optimal set P ∗
8
Exercise 66 Let w be any positive vector in IRp and let the bounded set T be the intersection of
an affine space in IRp with the nonnegative orthant of IRp . Show that the weighted-analytic center
(with w as weighting vector) of T coincides with the analytic center of T if and only if w is a scalar
multiple of the all-one vector.
of (P ) is equal to the support σ(x∗ ) of x∗ , and the support of the optimal set D∗ of
(D) is equal to the support σ(s∗ ) of s∗ .
Now let x̄ be optimal for (P ) and s̄ for (D). Applying the orthogonality property
to the pairs (x̄, s̄) and (xk , sk ) := ΦP D (wk ) we obtain
(xk − x̄)T (sk − s̄) = 0.
Expanding the product and rearranging terms, we get
    (x̄)^T s^k + (s̄)^T x^k = (s^k)^T x^k + (s̄)^T x̄.
Since (s^k)^T x^k = eT wk and (x̄)^T s̄ = 0, we get
    Σ_{j∈σ(x∗)} x̄_j s^k_j + Σ_{j∈σ(s∗)} s̄_j x^k_j = eT wk,        k = 1, 2, . . . .
Here we have also used that σ(x̄) ⊂ σ(x∗ ) and σ(s̄) ⊂ σ(s∗ ). Using xk sk = wk we
have
    Σ_{j∈σ(x∗)} w^k_j x̄_j / x^k_j + Σ_{j∈σ(s∗)} w^k_j s̄_j / s^k_j = eT wk,        k = 1, 2, . . . .
Multiplying both sides by n/eT wk we get
    Σ_{j∈σ(x∗)} n w^k_j x̄_j / ( eT wk · x^k_j ) + Σ_{j∈σ(s∗)} n w^k_j s̄_j / ( eT wk · s^k_j ) = n,        k = 1, 2, . . . .
Letting k → ∞, it follows that
    Σ_{j∈σ(x∗)} w∗_j x̄_j / x∗_j + Σ_{j∈σ(s∗)} w∗_j s̄_j / s∗_j = n.
At this stage we apply the geometric inequality,9 which states that for any two positive vectors α and β in IRn,
    Π_{j=1}^{n} ( α_j/β_j )^{β_j} ≤ ( Σ_{j=1}^{n} α_j / Σ_{j=1}^{n} β_j )^{Σ_{j=1}^{n} β_j}.
We apply this inequality with β = w∗ and
    α_j = w∗_j x̄_j / x∗_j    (j ∈ σ(x∗)),        α_j = w∗_j s̄_j / s∗_j    (j ∈ σ(s∗)).
Thus we obtain, using that the sum of the weights w∗_j equals n,
    Π_{j∈σ(x∗)} ( x̄_j/x∗_j )^{w∗_j} · Π_{j∈σ(s∗)} ( s̄_j/s∗_j )^{w∗_j} ≤ ( (1/n) ( Σ_{j∈σ(x∗)} w∗_j x̄_j/x∗_j + Σ_{j∈σ(s∗)} w∗_j s̄_j/s∗_j ) )^{n} = 1.
9 When β is the all-one vector e, the geometric inequality reduces to the arithmetic-geometric-mean inequality. For a proof of the geometric inequality we refer to Hardy, Littlewood and Pólya [139].
Substituting s̄ = s∗ in the above inequality we get
    Π_{j∈σ(x∗)} x̄_j^{w∗_j} ≤ Π_{j∈σ(x∗)} (x∗_j)^{w∗_j},
and substituting x̄ = x∗ gives
    Π_{j∈σ(s∗)} s̄_j^{w∗_j} ≤ Π_{j∈σ(s∗)} (s∗_j)^{w∗_j}.
This shows that x∗ maximizes the product
    Π_{j∈σ(x∗)} x_j^{w∗_j}
and s∗ the product
    Π_{j∈σ(s∗)} s_j^{w∗_j}
over the optimal sets of (P ) and (D) respectively. Hence the proof is complete.
✷

9.4 The target-following scheme
We are ready to describe more formally the main idea of the target-following approach.
Assume we are given some positive primal-dual feasible pair (x0 , s0 ). Put w0 := x0 s0
and assume that we have a sequence
    w0, w1, . . . , wk, . . . , wK                (9.16)
of points wk in the w-space with the following property:
Given the primal-dual pair for wk , with 0 ≤ k < K, it is ‘easy’ to compute
the primal-dual pair for wk+1 .
We call such a sequence a traceable target sequence of length K.
If a traceable sequence of length K is available, then we can solve the given
problem pair (P ) and (D), up to the precision level eT wK , in K iterations. The k-th
iteration in this method would consist of the computation of the primal-dual target-pair corresponding to the target point wk. Conceptually, the algorithm is described as
follows (page 232).
Some remarks are in order. Firstly, in practice the primal-dual pair (x(w̄), s(w̄))
corresponding to an intermediate target w̄ is not computed exactly. Instead we
compute it approximately, but so that the approximating pair is close to w̄ in the
sense of a suitable proximity measure.
Secondly, the target sequence is not necessarily prescribed beforehand. It may be
generated in the course of the algorithm. Both cases occurred in Chapter 7. For
example, the primal-dual logarithmic barrier algorithm with full Newton steps in
Conceptual Target-following Algorithm

Input:
    A positive primal-dual pair (x0, s0);
    a final target vector w̃.

begin
    w := x0 s0;
    while w is not ‘close’ to w̃ do
    begin
        choose an ‘intermediate’ target w̄;
        compute x(w̄) and s(w̄);
        w := x(w̄)s(w̄);
    end
end
Section 7.5 uses intermediate targets of the form w = µe, and each subsequent target is
given by (1−θ)w, with θ fixed. The same is true for the primal-dual logarithmic barrier
algorithm with large updates in Section 7.8. In contrast, the primal-dual logarithmic
barrier algorithm with adaptive updates (cf. Section 7.6.1) defines its target points
adaptively.
Thirdly, if we say that the primal-dual pair corresponding to a given target can
be computed ‘easily’, we mean that we have an efficient numerical procedure for
finding this primal-dual pair, at least approximately. The numerical method is always
Newton’s method, either for solving the KKT system defining the primal-dual pair, or
for finding the minimizer of a suitable barrier function. When full Newton steps are
taken, the target must be close to where we are, and one step must yield a sufficiently
accurate approximation of the primal-dual pair for this target. In the literature,
methods of this type are usually called short-step methods when the target sequence
is prescribed, and adaptive-step methods if the target sequence is defined adaptively.
We call them full-step methods. If subsequent targets are at a greater distance we
are forced to use damped Newton steps. The number of Newton steps necessary to
reach the next target (at least approximately) may then become larger than one. To
achieve polynomiality we need to guarantee that this number can be bounded either by a constant or by some suitable function of n, e.g., O(√n) or O(n). We refer to
such methods as multistep methods. They appear in the literature as medium-step
methods and large-step methods.
In general, a primal-dual target-following algorithm is based on some finite
underlying target sequence w0, w1, . . . , wK = w̃. The final target w̃ is a vector with
small duality gap eT w̃ if we are optimizing, but other final targets are allowable as
well; examples of both types of target sequence are given in Chapter 11 below. The
general structure is as follows.
Generic (Primal-Dual) Target-following Algorithm

Input:
    A positive primal-dual pair (x0, s0) such that x0 s0 = w0;
    a final target vector w̃.

begin
    x := x0; s := s0; w := w0;
    while w is not ‘close’ to w̃ do
    begin
        replace w by the next target in the sequence;
        while xs is not ‘close’ to w do
        begin
            apply Newton steps at (x, s) with w as target
        end
    end
end
For each target in the sequence the next target can be prescribed (in advance), but
it can also be defined adaptively. If it is close to the present target then a single (full)
Newton step may suffice to reach the next target, otherwise we apply a multistep
method, using damped Newton steps.
The target-following approach is more general than the standard central-path-following schemes that appear in the literature. The vast majority of the latter use
target sequences on the central path.10 We show below, in Chapter 11, that many
classical results in the literature can be put in the target-following scheme and that
this scheme often dramatically simplifies the analysis.
First, we derive the necessary numerical tools in the next chapter. This amounts
to generalizing results obtained before in Part II for the case where the target is on
the central path to the case where it is off the central path. We first analyze the full
primal-dual Newton step method and the damped primal-dual Newton step method
for computing the primal-dual pair corresponding to a given target vector. To this end
we introduce a proximity measure, and we show that the full Newton step method
is quadratically convergent. For the damped Newton method we show that a single
step reduces the primal-dual barrier function by at least a constant, provided that the
proximity measure is bounded below by a constant. We then have the basic ingredients
10
There are so many papers on the subject that it is impossible to give an exhaustive list. We
mention a few of them. Short-step methods along the central path can be found in Renegar [237],
Gonzaga [118], Roos and Vial [245], Monteiro and Adler [218] and Kojima et al. [178]. We also
refer the reader to the excellent survey of Gonzaga [124]. The concept of target-following methods
was introduced by Jansen et al. [159]. Closely related methods, using so-called α-sequences, were
considered before by Mizuno for the linear complementarity problem in [212] and [214]. The first
results on multistep methods were those of Gonzaga [121, 122] and Roos and Vial [244]. We also
mention den Hertog, Roos and Vial [146] and Mizuno, Todd and Ye [217]. The target-following
scheme was applied first to multistep methods by Jansen et al. [158].
for the analysis of primal-dual target-following methods.
The results of the next chapter are used in Chapter 11 for the analysis of several
interesting algorithms. There we restrict ourselves to full Newton step methods because
they give the best complexity results. Later we show that the target-following concept
is also useful when dealing with dual or primal methods. We also show that the primal-dual pair belonging to a target vector can be efficiently computed by such methods.
This is the subject of Chapters 12 and 13.
10 The Primal-Dual Newton Method

10.1 Introduction
Suppose that a positive primal-dual feasible pair (x, s) is given as well as some target
vector w > 0. Our aim is to find the primal-dual pair (x(w), s(w)). Recall that to the
dual feasible slack vector s belongs a unique y such that AT y + s = c. The vector in
the y-space corresponding to s(w) is denoted by y(w). In this section we define search
directions ∆x, ∆y, ∆s at the given pair (x, s) that are aimed to bring us closer to the
target pair (x(w), s(w)) corresponding to w. The search directions in this section are
obtained by applying Newton’s method to the weighted KKT system (9.2), page 220.
The approach closely resembles the treatment in Chapter 7. There the target was on
the central path, but now the target may be any positive vector w. It will become
clear that the results of Chapter 7 can be generalized almost straightforwardly to the
present case. To avoid tiresome repetitions we omit detailed arguments when they
are similar to arguments used in Chapter 7.
10.2 Definition of the primal-dual Newton step
We want the iterates x + ∆x, y + ∆y, s + ∆s to satisfy the weighted KKT system (9.2)
with respect to the target w. So we want ∆x, ∆y and ∆s to satisfy
    A(x + ∆x) = b,                            x + ∆x ≥ 0,
    A^T(y + ∆y) + s + ∆s = c,                 s + ∆s ≥ 0,
    (x + ∆x)(s + ∆s) = w.
Neglecting the inequality constraints, we can rewrite this as follows:
    A ∆x = 0,
    A^T ∆y + ∆s = 0,
    s∆x + x∆s + ∆x∆s = w − xs.                                (10.1)
Newton’s method amounts to linearizing this system by neglecting the second-order
term ∆x∆s in the third equation. Thus we obtain the linear system
    A ∆x = 0,
    A^T ∆y + ∆s = 0,
    s∆x + x∆s = w − xs.                                        (10.2)
Comparing this system with (7.2), page 150, in Chapter 7, we see that the only
difference occurs in the third equation, where the target vector w replaces the target
µe on the central path. In particular, both systems have the same matrix. Since this
matrix is nonsingular (cf. Theorem II.42, page 150, and Exercise 46, page 151), system
(10.2) determines the displacements ∆x, ∆y and ∆s uniquely. We call them the primal-dual Newton directions at (x, s) corresponding to the target w.1,2,3 It may be worth
pointing out that computation of the displacements ∆x, ∆y and ∆s amounts to solving
a positive definite system with the matrix AXS −1 AT , just like when the target is on
the central path.
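For readers who want to compute the direction numerically, the following sketch (ours, with a purely illustrative toy instance, and not the book's implementation) forms the matrix AXS^{−1}A^T, recovers ∆x, ∆y and ∆s from the closed-form solution of (10.2) given in Exercise 67 below, and then verifies the three equations.

import numpy as np

def newton_direction(A, b, x, s, w):
    # Primal-dual Newton direction for the target w, via the normal equations:
    #   (A X S^{-1} A^T) dy = b - A(w/s),   ds = -A^T dy,   dx = w/s - x - (x/s) ds
    M = A @ np.diag(x / s) @ A.T
    dy = np.linalg.solve(M, b - A @ (w / s))
    ds = -A.T @ dy
    dx = w / s - x - (x / s) * ds
    return dx, dy, ds

# A tiny instance that is strictly feasible by construction.
rng = np.random.default_rng(2)
m, n = 2, 5
A = rng.standard_normal((m, n))
x = rng.uniform(0.5, 2.0, n)
s = rng.uniform(0.5, 2.0, n)
y = rng.standard_normal(m)
b, c = A @ x, A.T @ y + s          # this choice makes (x, y, s) primal-dual feasible

w = rng.uniform(0.5, 2.0, n)       # an arbitrary positive target vector
dx, dy, ds = newton_direction(A, b, x, s, w)

print(np.allclose(A @ dx, 0))                     # dx lies in the null space of A
print(np.allclose(A.T @ dy + ds, 0))              # dual equation of (10.2)
print(np.allclose(s * dx + x * ds, w - x * s))    # linearized centering equation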
10.3 Feasibility of the primal-dual Newton step
In this section we investigate the feasibility of the (full) Newton step. As before, the
result of the Newton step at (x, y, s) is denoted by (x+ , y + , s+ ), so
x+ = x + ∆x,
y + = y + ∆y,
s+ = s + ∆s.
Since the new iterates satisfy the affine equations we only have to deal with the
question of whether x+ and s+ are nonnegative or not. We have
x+ s+ = (x + ∆x)(s + ∆s) = xs + (s∆x + x∆s) + ∆x∆s.
Since s∆x + x∆s = w − xs this leads to
x+ s+ = w + ∆x∆s.
(10.3)
Hence, x+ and s+ are feasible only if w + ∆x∆s is nonnegative. The converse is also
true. This is the content of the next lemma.
Lemma III.6 The primal-dual Newton step at (x, s) to the target w is feasible if and
only if w + ∆x∆s ≥ 0.
Proof: The proof uses exactly the same arguments as the proof of Lemma II.46; we
simply need to replace the vector µe by w. We leave it to the reader to verify this. ✷
Note that Newton’s method is exact when the second-order term ∆x∆s vanishes.
In that case we have x+ s+ = w. This means that the pair (x+ , s+ ) is the image of w
under the target map, whence x+ = x(w) and s+ = s(w).
In general ∆x∆s will not be zero and Newton’s method will not be exact. However,
the duality gap always assumes the correct value eT w after the Newton step.
1 Exercise 67 Prove that the system (10.2) has a unique solution, namely
    ∆y = ( A X S^{−1} A^T )^{−1} ( b − A W s^{−1} ),
    ∆s = −A^T ∆y,
    ∆x = w s^{−1} − x − x s^{−1} ∆s.
2 Exercise 68 When w = 0 in (10.2), the resulting directions coincide with the primal-dual affine-scaling directions introduced in Section 7.6.2. Verify this.
3 Exercise 69 When w = µe and µ = x^T s/n in (10.2), the resulting directions coincide with the primal-dual centering directions introduced in Section 7.6.2. Verify this.
Lemma III.7 If the primal-dual Newton step is feasible then (x+)^T s+ = eT w.
Proof: This is immediate from (10.3) because the vectors ∆x and ∆s are orthogonal.
✷
In the following sections we further analyze the primal-dual Newton method. This
requires a quantity for measuring the progress of the Newton iterates on the way to
the pair ΦP D (w). As may be expected, two cases could occur. In the first case the
present pair (x, s) is ‘close’ to ΦP D (w) and full Newton steps are feasible. In that case
the full Newton step method is (hopefully, and locally) quadratically convergent. In
the second case the present pair (x, s) is ‘far’ from ΦP D (w) and the full Newton step
may not be feasible. Then we are forced to take damped Newton steps and we may
expect no more than a linear convergence rate. In both cases we need a new quantity
for measuring the proximity of the current iterate to the target vector w. The next
section deals with the first case and the second case is considered in Section 10.5. It
will be no surprise that we use the weighted primal-dual barrier function φw (x, s) in
Section 10.5 to measure progress of the method.
10.4 Proximity and local quadratic convergence
Recall from (7.16), page 156, that in the analysis of the central-path-following primal-dual method we measured the distance of the pair (x, s) to the target µe by the quantity
    δ(x, s; µ) = (1/2) ‖ √( xs/(µe) ) − √( (µe)/(xs) ) ‖.
This can be rewritten as
    δ(x, s; µ) = ( 1/(2√µ) ) ‖ ( µe − xs )/√(xs) ‖.
Note that the right-hand side measures, in some way, the distance in the w-space between the inverse image µe of the pair of µ-centers (x(µe), s(µe)) and the primal-dual pair (x, s).4 For a general target vector w we adapt this measure to
    δ(xs, w) := ( 1/(2√(min(w))) ) ‖ ( w − xs )/√(xs) ‖.                (10.4)
The quantity on the right measures the distance from the coordinatewise product xs
to w. It is defined for (ordered) pairs of vectors in the w-space. Therefore, and because
it will be more convenient in the future, we express this feature by using the notation
4
This observation makes clear that the proximity measure δ(x, s; µ) ignores the actual data of the
problems (P ) and (D), which is contained in A, b and c. Since the behavior of Newton’s method
does depend on these data, it follows that the effect of a (full) Newton step on the proximity
measure depends on the data of the problem. This reveals the weakness of the analysis of the
full-step method (cf. Chapter 6.7). It ignores the actual data of the problem and only provides a
worst-case analysis. In contrast, with adaptive updates (cf. Chapter 6.8) the data of the problem
are taken into account and, as a result, the performance of the method is improved.
δ(xs, w) instead of the alternative notation δ(x, s; w). We prove in this section that
the Newton method is quadratically convergent in terms of this proximity measure.5
As before we use scaling vectors d and u. The definition of u needs to be adapted to the new situation:
    d := √(x/s),        u := √(xs/w).                (10.5)
Note that xs = w if and only if u = e. We also introduce a vector v according to
    v = √(xs).
With d we can rescale both x and s to the vector v:6
    d^{−1} x = v,        d s = v.
Rescaling ∆x and ∆s similarly:
    d^{−1} ∆x =: dx,        d ∆s =: ds,                (10.6)
we see that
∆x∆s = dx ds .
Consequently, the orthogonality of ∆x and ∆s implies that the scaled displacements
dx and ds are orthogonal as well. Now we may reduce the left-hand side in the third
equation of the KKT system as follows:
s∆x + x∆s = sdd−1 ∆x + xd−1 d∆s = v (dx + ds ) ,
so the third equation can be restated simply as
dx + ds = v −1 (w − xs) .
5 Exercise 70 The definition (10.4) of the primal-dual proximity measure δ = δ(xs, w) implies that
    2δ(xs, w) ≥ ‖ ( w − xs )/( √w √(xs) ) ‖ = ‖ √( w/(xs) ) − √( (xs)/w ) ‖.
Using this and Lemma II.62, prove
    1/ρ(δ) ≤ √( x_i s_i / w_i ) ≤ ρ(δ),        1 ≤ i ≤ n.
6
Here we deviate from the approach in Chapter 7. The natural generalization of the approach there
would be to rescale x and s to u:
    d^{−1} x / √w = u,        d s / √w = u,
and then rescale ∆x and ∆s accordingly to
    d^{−1} ∆x / √w =: dx,        d ∆s / √w =: ds.
But then we have
    ∆x∆s = w dx ds
and we lose the orthogonality of dx and ds with respect to the standard inner product. This could
be resolved by changing the inner product in such a way that orthogonality is preserved. We leave
it as an (interesting) exercise to the reader to work this out. Here the difficulty is circumvented by
using a different scaling.
On the other hand, the first and second equations can be reformulated as ADdx = 0
and (AD)T dy + ds = 0, where dy = ∆y. We conclude that the scaled displacements
dx , dy and ds satisfy
    A D dx = 0,
    (A D)^T dy + ds = 0,
    dx + ds = v^{−1}( w − xs ).                                (10.7)
Using the same arguments as in Chapter 7 we conclude that dx and ds form the
components of v −1 (w − xs) in the null space and the row space of AD, respectively.
Note that w − xs represents the move we want to make in the w-space. Therefore we
denote it as ∆w. It is also convenient to use a scaled version dw of ∆w, namely
    dw := v^{−1}( w − xs ) = v^{−1} ∆w.                        (10.8)
Then we have
    dx + ds = dw                                                (10.9)
and, since dx and ds are orthogonal,
    ‖dx‖² + ‖ds‖² = ‖dw‖².                                      (10.10)
This makes clear that the scaled displacements dx , ds (and also dy ) are zero if and
only if dw = 0. In that case x, y and s coincide with their values at w. An immediate
consequence of the definition (10.4) of the proximity δ(xs, w) is
    δ(xs, w) = ‖dw‖ / ( 2√(min(w)) ).                          (10.11)
The next lemma contains upper bounds for the 2-norm and the infinity norm of the
second-order term dx ds .
Lemma III.8 We have ‖dx ds‖_∞ ≤ (1/4) ‖dw‖² and ‖dx ds‖ ≤ ( 1/(2√2) ) ‖dw‖².
Proof: The lemma follows immediately by applying the first uv-lemma (Lemma C.4)
to the vectors dx and ds .
✷
Lemma III.9 The Newton step is feasible if δ(xs, w) ≤ 1.
Proof: Lemma III.6 guarantees feasibility of the Newton step if w + ∆x∆s ≥ 0. Since
∆x∆s = dx ds this certainly holds if the infinity norm of the quotient dx ds /w does not
exceed 1. Using Lemma III.8 and (10.11) we may write
    ‖ dx ds / w ‖_∞ ≤ ‖dx ds‖_∞ / min(w) ≤ ‖dw‖² / ( 4 min(w) ) = δ(xs, w)² ≤ 1.
This implies the lemma.
✷
We are ready for the main result of this section, which is a perfect analogue of
Theorem II.50, where the target is on the central path.
Theorem III.10 If δ := δ(xs, w) ≤ 1, then the primal-dual Newton step is feasible and (x+)^T s+ = eT w. Moreover, if δ < 1 then
    δ(x+ s+, w) ≤ δ² / √( 2(1 − δ²) ).
Proof: The first part of the theorem is a restatement of Lemma III.9 and Lemma III.7. We proceed with the proof of the second statement. By definition,
    δ(x+ s+, w)² = ( 1/(4 min(w)) ) ‖ ( w − x+ s+ )/√(x+ s+) ‖².
Recall from (10.3) that x+ s+ = w + ∆x∆s = w + dx ds. Using also Lemma III.8 and (10.11), we write
    min(x+ s+) ≥ min(w) − ‖dx ds‖_∞ ≥ min(w) − (1/4) ‖dw‖² = min(w)( 1 − δ² ).
Thus we find, by substitution,
    δ(x+ s+, w)² ≤ ‖ w − x+ s+ ‖² / ( 4(1 − δ²) min(w)² ) = ‖dx ds‖² / ( 4(1 − δ²) min(w)² ).
Finally, using the upper bound for ‖dx ds‖ in Lemma III.8 and also using (10.11) once more, we obtain
    δ(x+ s+, w)² ≤ ‖dw‖⁴ / ( 32(1 − δ²) min(w)² ) = δ⁴ / ( 2(1 − δ²) ).
This implies the theorem.7
✷
It is clear that the above result has value only if the given pair (x, s) is close enough to the target vector w. It guarantees quadratic convergence to the target if δ(xs, w) ≤ 1/√2. Convergence is guaranteed only if δ(xs, w) ≤ √(2/3). For larger values of δ(xs, w) we need a different analysis. Then we measure progress of the iterates in terms of the barrier function φw (x, s) and we use damped Newton steps. This is the subject of the next section.
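The quadratic decrease of Theorem III.10 is easy to observe numerically. The sketch below is ours and purely illustrative: it builds a strictly feasible toy instance, chooses a target close enough that δ(xs, w) ≤ 1/√2, prints δ after each full Newton step, and checks that the duality gap equals eT w after a step (Lemma III.7).

import numpy as np

def delta(x, s, w):
    # proximity measure (10.4)
    return np.linalg.norm((w - x * s) / np.sqrt(x * s)) / (2.0 * np.sqrt(np.min(w)))

def full_newton_step(A, b, x, y, s, w):
    # full primal-dual Newton step (10.2) towards the target w
    M = A @ np.diag(x / s) @ A.T
    dy = np.linalg.solve(M, b - A @ (w / s))
    ds = -A.T @ dy
    dx = w / s - x - (x / s) * ds
    return x + dx, y + dy, s + ds

rng = np.random.default_rng(3)
m, n = 2, 5
A = rng.standard_normal((m, n))
x = rng.uniform(0.9, 1.1, n)
s = rng.uniform(0.9, 1.1, n)
y = rng.standard_normal(m)
b = A @ x                                  # x is primal feasible; s is dual feasible for c = A^T y + s

w = x * s * rng.uniform(0.95, 1.05, n)     # a target close to xs, so delta <= 1/sqrt(2)
for _ in range(4):
    print("delta =", delta(x, s, w))       # decreases roughly quadratically
    x, y, s = full_newton_step(A, b, x, y, s, w)
print(np.isclose(x @ s, np.sum(w)))        # the duality gap equals e^T w after a step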
10.5 The damped primal-dual Newton method
As before, we are given a positive primal-dual pair (x, s) and a target vector w > 0.
Let x+ and s+ result from a damped Newton step of size α at (x, s). In this section
7 Recall from Lemma C.6 in Section 7.4.1 that we have the better estimate
    δ(x+ s+, w) ≤ δ² / √( 2(1 − δ⁴) )
if the target w is on the central path. We were not able to get the same result if w is off the central
path. We leave this as a topic for further research.
we analyze the effect of a damped Newton step — at (x, s) and for the target w — on
the value of the barrier function φw (x, s) (as defined on page 221). We have
    x+ = x + α∆x,        s+ = s + α∆s,
where α denotes the step-size, and 0 ≤ α ≤ 1. Using the scaled displacements dx and ds as defined in (10.6), we may also write
    x+ = d( v + αdx ),        s+ = d^{−1}( v + αds ),
where v = √(xs). As a consequence,
    x+ s+ = ( v + αdx )( v + αds ) = v² + αv( dx + ds ) + α² dx ds.
Since
    v( dx + ds ) = w − xs = w − v²,
we obtain
    x+ s+ = v² + α( w − v² ) + α² dx ds.                        (10.12)
Now, defining
    v+ := √(x+ s+),
we have
    (v+)² = ( v + αdx )( v + αds )                              (10.13)
and
    (v+)² − v² = α( w − v² ) + α² dx ds.                        (10.14)
The next theorem provides a lower bound for the decrease of the barrier function value
during a damped Newton step. The bound coincides with the result of Lemma II.72
if w is on the central path and becomes worse if the ‘distance’ from w to the central
path increases.
Theorem III.11 Let δ = δ(xs, w) and let α = 1/ω − 1/( ω + 4δ²/δc(w) ), where8
    ω := √( ‖∆x/x‖² + ‖∆s/s‖² ) = √( ‖dx/v‖² + ‖ds/v‖² ).
Then the pair (x+, s+) resulting from the damped Newton step of size α is feasible. Moreover,
    φw(x, s) − φw(x+, s+) ≥ ψ( 2δ / ( δc(w) ρ(δ) ) ).
Proof: It will be convenient to express max(w) φw(x, s), given by (9.4), page 221, in terms of v = √(xs). We obviously have
    max(w) φw(x, s) = eT v² − Σ_{j=1}^{n} w_j log v_j² − eT w + Σ_{j=1}^{n} w_j log w_j.
8 Exercise 71 Verify that
    ∆x/x = dx/v,        ∆s/s = ds/v.
Hence we have the following expression for max(w) φw(x+, s+):
    max(w) φw(x+, s+) = eT (v+)² − Σ_{j=1}^{n} w_j log (v_j^+)² − eT w + Σ_{j=1}^{n} w_j log w_j.
With ∆ := φw(x, s) − φw(x+, s+), subtracting both expressions yields
    max(w) ∆ = eT ( v² − (v+)² ) + Σ_{j=1}^{n} w_j log( (v_j^+)² / v_j² ).
Substitution of (10.13) and (10.14) gives
    max(w) ∆ = −α eT ( w − v² ) + Σ_{j=1}^{n} w_j log( 1 + α (dx)_j/v_j ) + Σ_{j=1}^{n} w_j log( 1 + α (ds)_j/v_j ).
Here we took advantage of the orthogonality of dx and ds in omitting the term
containing eT dx ds . The definition of ψ implies
    log( 1 + α (dx)_j/v_j ) = α (dx)_j/v_j − ψ( α (dx)_j/v_j )
and a similar result for the terms containing entries of ds . Substituting this we obtain
    max(w) ∆ = −α eT ( w − v² ) + α eT ( w dx/v ) + α eT ( w ds/v )
               − Σ_{j=1}^{n} w_j ( ψ( α (dx)_j/v_j ) + ψ( α (ds)_j/v_j ) ).
The contribution of the terms on the left of the sum can be reduced to α ‖dw‖². This follows because
    −( w − v² ) + w( dx + ds )/v = −v dw + w dw/v = ( w − v² ) dw/v = v dw²/v = dw².
It can easily be understood that the sum attains its maximal value if all the coordinates
of the concatenation of the vectors αdx /v and αds /v are zero except one, and the
nonzero coordinate, for which wj must be maximal, is equal to minus the norm of this
concatenated vector. The norm of the concatenation of αdx /v and αds /v being αω,
we arrive at
    max(w) ∆ ≥ α ‖dw‖² − max(w) ψ(−αω) = 4αδ² min(w) − max(w) ψ(−αω).
This can be rewritten as
    ∆ ≥ 4αδ²/δc(w) − ψ(−αω) = 4αδ²/δc(w) + αω + log( 1 − αω ).                (10.15)
The derivative of the right-hand side expression with respect to α is
    4δ²/δc(w) + ω − ω/( 1 − αω ),
and it vanishes only for the value of α specified in the lemma. As in the proof of
Lemma II.72 (page 201) we conclude that the specified value of α maximizes the lower
bound for ∆ in (10.15), and, as a consequence, the damped Newton step of the specified
size is feasible. Substitution in (10.15) yields, after some elementary reductions, the
following bound for ∆:
    ∆ ≥ 4δ²/( ω δc(w) ) − log( 1 + 4δ²/( ω δc(w) ) ) = ψ( 4δ²/( ω δc(w) ) ).
In this bound we may replace ω by a larger value, since ψ(t) is monotonically increasing
for t nonnegative. An upper bound for ω can be obtained as follows:
    ω = √( ‖dx/v‖² + ‖ds/v‖² ) ≤ ‖dw‖ / min(v) = 2δ √(min(w)) / min(v).
Let the index k be such that min (v) = vk . Then we may write
    2δ √(min(w)) / min(v) = 2δ √(min(w)) / v_k ≤ 2δ √(w_k) / v_k = 2δ √( w_k/(x_k s_k) ) = 2δ u_k^{−1},
where u denotes the vector defined in (10.5). The coordinates of u can be bounded
nicely by using the function ρ(δ) defined in Lemma II.62 (page 182). This can be
achieved by reducing δ = δ(xs, w), as given in (10.4), in the following way:
    δ = ( 1/(2√(min(w))) ) ‖ ( w − xs )/√(xs) ‖ ≥ (1/2) ‖ √( w/(xs) ) − √( (xs)/w ) ‖ = (1/2) ‖ u^{−1} − u ‖.
Hence we have ‖u^{−1} − u‖ ≤ 2δ. Applying Lemma II.62 it follows that the coordinates
of u and u−1 are bounded above by ρ(δ) (cf. Exercise 70, page 238). Hence we may
conclude that
ω ≤ 2δρ(δ).
Substitution of this bound in the last lower bound for ∆ yields
    ∆ ≥ ψ( 2δ / ( δc(w) ρ(δ) ) ),
completing the proof.
✷
The damped Newton method will be used only if δ = δ(xs, w) ≥ 1/√2, because for smaller values of δ full Newton steps give quadratic convergence to the target. For δ = δ(xs, w) ≥ 1/√2 we have
    2δ/ρ(δ) ≥ √2 / ( 1/√2 + √(1 + 1/2) ) = 2/( 1 + √3 ) = √3 − 1 = 0.73205,
so outside the region of quadratic convergence around the target w, a damped Newton
step reduces the barrier function value by at least
    ψ( 0.73205 / δc(w) ).
Figure 10.1   Lower bound for the decrease in φw during a damped Newton step (the function ψ(0.73205/δc(w)) plotted against δc(w)).
The graph in Figure 10.1 depicts this function for 1 ≤ δc (w) ≤ 10.
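A small numerical sketch of the damped step may also help; it is ours, not part of the text, and the instance is illustrative. It uses the step size α of Theorem III.11, takes ρ(δ) = δ + √(1 + δ²) as we read Lemma II.62 (consistent with the value 0.73205 computed above), and compares the actual decrease of φw with the guaranteed lower bound ψ(2δ/(δc(w)ρ(δ))).

import numpy as np

def psi(t):
    return t - np.log(1.0 + t)

def phi_w(x, s, w):
    return np.sum(w * psi(x * s / w - 1.0)) / np.max(w)

def delta(x, s, w):
    return np.linalg.norm((w - x * s) / np.sqrt(x * s)) / (2.0 * np.sqrt(np.min(w)))

def rho(d):
    # rho(delta) = delta + sqrt(1 + delta^2), as we read Lemma II.62
    return d + np.sqrt(1.0 + d * d)

def damped_newton_step(A, b, x, y, s, w):
    # damped primal-dual Newton step with the step size alpha of Theorem III.11
    M = A @ np.diag(x / s) @ A.T
    dy = np.linalg.solve(M, b - A @ (w / s))
    ds = -A.T @ dy
    dx = w / s - x - (x / s) * ds
    d = delta(x, s, w)
    dc = np.max(w) / np.min(w)
    omega = np.sqrt(np.sum((dx / x) ** 2) + np.sum((ds / s) ** 2))
    alpha = 1.0 / omega - 1.0 / (omega + 4.0 * d * d / dc)
    return x + alpha * dx, y + alpha * dy, s + alpha * ds

rng = np.random.default_rng(4)
m, n = 2, 5
A = rng.standard_normal((m, n))
x = rng.uniform(0.5, 2.0, n)
s = rng.uniform(0.5, 2.0, n)
y = rng.standard_normal(m)
b = A @ x
w = rng.uniform(0.05, 0.2, n)      # a target far from xs, so the damped regime applies

d = delta(x, s, w)
dc = np.max(w) / np.min(w)
x1, y1, s1 = damped_newton_step(A, b, x, y, s, w)
print("actual decrease    :", phi_w(x, s, w) - phi_w(x1, s1, w))
print("guaranteed decrease:", psi(2.0 * d / (dc * rho(d))))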
Remark III.12 The above analysis is based on the barrier function φw (x, s) defined in
(9.3). We showed in (9.4) and (9.5) that, up to a constant factor max (w), the variable part
in this function is given by the weighted primal-dual logarithmic barrier function
    x^T s − Σ_{j=1}^{n} w_j log x_j s_j.
In this function the weights occur in the barrier term.
We want to point out that there exists an alternative way to analyze the damped Newton
method by using a barrier function for which the weights occur in the objective term. Consider
    φ̄w(x, s) := eT( xs/w − e ) − Σ_{j=1}^{n} log( x_j s_j / w_j ) = Σ_{j=1}^{n} ψ( x_j s_j/w_j − 1 ) = Ψ( xs/w − e ).                (10.16)
Clearly φ̄w (x, s) is defined for all positive primal-dual pairs (x, s). Moreover, φ̄w (x, s) ≥ 0
and the equality holds if and only if xs = w. Hence, the solution of the weighted KKT system
(9.2) is characterized by the fact that it satisfies the equation φ̄w (x, s) = 0. The variable part
of φ̄w (x, s) is given by
    eT ( xs/w ) − Σ_{j=1}^{n} log x_j s_j,
which has the weights in the objective term. It has recently been shown by de Klerk, Roos
and Terlaky [172] that this function can equally well serve in the analysis of the damped
Newton method. In fact, Theorem III.11 remains true if φw is replaced by φ̄w . This might
be surprising because, whereas φw is strictly convex on its domain, φ̄w is not convex unless
w is on the central path.9
•
9
Exercise 72 Let (x, s) be any positive primal-dual pair. Show that
φ̄w (x, s) ≤ φw (x, s).
11 Applications

11.1 Introduction
In this Chapter we present some examples of traceable target sequences. The examples
are chosen to cover the most prominent primal-dual methods and results in the
literature. We restrict ourselves to sequences that can be traced by full Newton steps.1
To keep the presentation simple, we make a further assumption, namely that
Newton’s method is exact in its region of quadratic convergence. In other words,
we assume that the algorithm generates exact primal-dual pairs for the respective
targets in the target sequence. In a practical algorithm the generated primal-dual
pairs will never exactly match their respective targets. However, our assumption does
not change the order of magnitude for the obtained iteration bounds. In fact, at the
cost of a little more involved analysis we can obtain the same iteration bounds for a
practical algorithm, except for a small constant factor. This can be understood from
the following theorem, where we assume that we are given a ‘good’ approximation
for the primal-dual pair (x(w), s(w)) corresponding to the target w and we consider
the effect of an update of the target to w̄. We make clear that δ(xs, w̄) ≈ δ(w, w̄) if
δ(xs, w) is small.
Thus, we assume that the proximity δ = δ(xs, w) is small. Recall that the quadratic
√
convergence property of Newton’s method justifies this assumption. If δ ≤ 1/ 2 then
in no more than 6 full Newton steps we are sure that a primal-dual pair (x, s) is
obtained for which δ(xs, w) ≤ 10^{−10}. Thus, if K denotes the length of the target
sequence, 6K additional Newton steps are sufficient to work with ‘exact’ primal-dual
pairs, at least from a computational point of view.
Theorem III.13 Let the primal-dual pair (x, s) and the target w be such that δ =
δ(xs, w). Then, for any other target vector w̄ we have
    δ(xs, w̄) ≤ √( min(w)/min(w̄) ) δ + ρ(δ) δ(w, w̄).
1 The motivation for this choice is that full Newton steps give the best iteration bounds. The results in the previous chapter for the damped Newton step provide the ingredients for the analysis of target-following methods using the multistep strategy. Target sequences for multistep methods were treated extensively by Jansen in [151]. See also Jansen et al. [158].
Proof: We may write
    δ(xs, w̄) = ( 1/(2√(min(w̄))) ) ‖ ( xs − w̄ )/√(xs) ‖ = ( 1/(2√(min(w̄))) ) ‖ ( xs − w + w − w̄ )/√(xs) ‖.
Using the triangle inequality we get
    δ(xs, w̄) ≤ ( 1/(2√(min(w̄))) ) ‖ ( xs − w )/√(xs) ‖ + ( 1/(2√(min(w̄))) ) ‖ ( w − w̄ )/√(xs) ‖.
This implies
    δ(xs, w̄) ≤ √( min(w)/min(w̄) ) δ(xs, w) + ( 1/(2√(min(w̄))) ) ‖ √( w/(xs) ) ( w − w̄ )/√w ‖.
From the result of Exercise 70 on page 238, this can be reduced to
    δ(xs, w̄) ≤ √( min(w)/min(w̄) ) δ + ρ(δ) δ(w, w̄),
completing the proof.
✷
In the extreme case where δ(xs, w) = 0, we have xs = w and hence δ(xs, w̄) =
δ(w, w̄). In that case the bound in the lemma is sharp, since δ = 0 and ρ(0) = 1. If δ
is small, then the first term in the bound for δ(xs, w̄) will be small compared to the
    √( min(w)/min(w̄) ) ≤ √( w_k/w̄_k ) ≤ ρ( δ(w, w̄) ).                (11.1)
w̄k
Here the index k is such that min (w̄) = w̄k .2 Since ρ(δ) ≈ 1 if δ ≈ 0, we conclude
that δ(xs, w̄) ≈ δ(w, w̄) if δ is small.
11.2 Central-path-following method
Central-path-following methods were investigated extensively in Part II. The aim of
this section is twofold. It provides a first (and easy) illustration of the use of the target-following approach, and it yields one of the main results of Part II in a relatively cheap
way.
The target points have the form w = µe, µ > 0. When at the target w, we let the
next target point be given by
w̄ = (1 − θ)w, 0 < θ < 1.
2
When combining the bounds in Theorem III.13 and (11.1) one gets the bound
δ(xs, w̄) ≤ ρ (δ(w, w̄)) δ(xs, w) + ρ (δ(xs, w)) δ(w, w̄),
which has a nice symmetry, but which is weaker than the bound of Theorem III.13.
Then some straightforward calculations yield δ(w, w̄):
    δ(w, w̄) = ( 1/(2√(min(w̄))) ) ‖ ( w̄ − w )/√w ‖ = θ ‖√(µe)‖ / ( 2√( (1 − θ)µ ) ) = θ√n / ( 2√(1 − θ) ).
Assuming that n ≥ 4 we find that
    δ(w, w̄) ≤ 1/√2        if        θ = 1/√n.
Hence, by Lemma I.36, a full Newton step method needs
    √n log( nµ0/ε )
iterations3 to generate an ε-solution when starting at w0 = µ0 e.
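The following sketch (ours, not the book's implementation) instantiates the generic target-following algorithm with targets µe on the central path, taking one full Newton step per target. Since a single step only approximates the exact recentering assumed in this chapter, it uses the slightly more conservative update θ = 1/(2√n); the instance is illustrative and is constructed so that the starting point is the µ-center for µ0 = 1.

import numpy as np

def full_newton_step(A, b, x, y, s, w):
    M = A @ np.diag(x / s) @ A.T
    dy = np.linalg.solve(M, b - A @ (w / s))
    ds = -A.T @ dy
    dx = w / s - x - (x / s) * ds
    return x + dx, y + dy, s + ds

rng = np.random.default_rng(5)
m, n = 3, 9
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
x = np.ones(n)                      # xs = e, so (x, y, s) sits on the central path
s = np.ones(n)                      # with mu0 = 1 for the data b, c defined below
b, c = A @ x, A.T @ y + s

mu, eps = 1.0, 1e-4
theta = 1.0 / (2.0 * np.sqrt(n))    # conservative version of theta = 1/sqrt(n)
count = 0
while n * mu > eps:                 # the duality gap at the target mu*e is n*mu
    mu *= 1.0 - theta               # shrink the target along the central path
    x, y, s = full_newton_step(A, b, x, y, s, mu * np.ones(n))
    count += 1

print("final duality gap:", x @ s)
print("iterations       :", count)
print("(1/theta)*log(n*mu0/eps):", np.log(n / eps) / theta)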
11.3 Weighted-path-following method
With a little extra effort, we can also analyze the case where the target sequence lies on
the half line w = µw0, µ > 0, for some fixed positive vector w0. This half line is a so-called weighted path in the w-space. The primal-dual pairs on it converge to weighted-analytic centers of the optimal sets of (P ) and (D), due to Theorem III.5. Note that
when using a target sequence of this type we can start the algorithm anywhere in the
w-space. However, as we shall see, not using the central path diminishes the efficiency
of the algorithm.
Letting the next target point be given by
w̄ = (1 − θ)w, 0 < θ < 1,
(11.2)
we obtain
    δ(w, w̄) = ( 1/(2√(min(w̄))) ) ‖ ( w̄ − w )/√w ‖ = ‖θ√w‖ / ( 2√( (1 − θ) min(w) ) ) = ( θ/(2√(1 − θ)) ) ‖ √( w/min(w) ) ‖.
Using δc (w), as defined in (9.6), page 222, which measures the proximity of w to the
central path, we may write
    ‖ √( w/min(w) ) ‖ ≤ ‖e‖ √( max(w)/min(w) ) = √( n δc(w) ).
Thus we obtain
    δ(w, w̄) ≤ θ √( n δc(w) ) / ( 2√(1 − θ) ).
3 Formally, we should round the iteration bound to the smallest integer exceeding it. For simplicity we
omit the corresponding rounding operator in the iteration bounds in this chapter; this is common
practice in the literature.
Assuming n ≥ 4 again, we find that
    δ(w, w̄) ≤ 1/√2        if        θ = 1/√( n δc(w) ).
Hence, when starting at w0 , we are sure that the duality gap is smaller than ε after
at most
    √( n δc(w0) ) log( eT w0 / ε )                (11.3)
iterations. Here we used the obvious identity δc(w0) = δc(w). Comparing this result with the iteration bound of the previous section we observe that we introduce a factor √(δc(w0)) > 1 into the iteration bound by not using the central path.4
The above result indicates that in some sense the central path is the best path to
follow to the optimal set. When starting further from the central path the iteration
bound becomes worse. This result gives us evidence of the very special status of the
central path among all possible weighted paths to the optimal set.
11.4 Centering method
If we are given a primal-dual pair (x0 , s0 ) such that w0 = x0 s0 is not on the central
path, then instead of following the weighted path through w0 to the origin, we can use
an alternative strategy. The idea is first to move to the central path and then follow
the central path to the origin. We know already how to follow the central path. But
the other problem, moving from some point w0 in the w-space to the central path, is
new. This problem has become known as the centering problem.5,6
The centering problem can be solved by using a target sequence starting at w0 and
ending on the central path. We shall propose a target sequence that converges in
    √n log δc(w0)                (11.4)
iterations.7 The iteration bound (11.4) can be obtained as follows. Let w̄ be obtained
from some point w outside the central path by replacing each entry wi such that
wi < (1 + θ) min (w)
4 Primal-dual weighted-path-following methods were first proposed and discussed by Megiddo [200]. Later they were also analyzed by Ding and Li [67]. A primal version was studied by Roos and den Hertog [241].
5 The centering approach presented here was proposed independently by den Hertog [140] and Mizuno [212].
6 Exercise 73 The centering problem includes the problem of finding the analytic center of a polytope. Why?
7 Note that the quantity δc(w0) appears under a logarithm. This is very important from the viewpoint of complexity analysis. If the weights were initially determined from a primal-dual feasible pair (x0, s0), we can say that δc(w0) has the same input length as the two points. It is reasonable to assume that this input length is at most equivalent to the input length of the data of the problem, but there is no real reason to state that it is strictly smaller. Since an algorithm is claimed to be polynomial only when the bound on the number of iterations is a function of the logarithm of the length of the input data, it is better to have the quantity δc(w0) under the logarithm.
by (1 + θ) min (w), where θ is some positive constant satisfying 1 + θ ≤ δc (w). It then
follows that
    δc(w̄) = δc(w)/(1 + θ).
Using that 0 ≤ w̄_i − w_i ≤ θ min(w) we write
    δ(w, w̄) = ( 1/(2√(min(w̄))) ) ‖ ( w̄ − w )/√w ‖ ≤ ( 1/(2√( (1 + θ) min(w) )) ) ‖ θ min(w) e/√w ‖.
This implies
    δ(w, w̄) ≤ ( θ/(2√(1 + θ)) ) ‖ √(min(w)) e/√w ‖ ≤ ( θ/(2√(1 + θ)) ) ‖e‖ = θ√n / ( 2√(1 + θ) ) ≤ θ√n/2,
so we have
    δ(w, w̄) ≤ 1/√2        if        θ = √2/√n.
n
At each iteration, δc (w) decreases by the factor 1 + θ. Thus, when starting at w0 , we
certainly have reached the central path if the iteration number k satisfies
    (1 + θ)^k ≥ δc(w0).
Substituting the value of θ and then taking logarithms, we obtain
    k log( 1 + √2/√n ) ≥ log δc(w0).
If n ≥ 3, this inequality is satisfied if8
    k/√n ≥ log δc(w0).
Thus we find that no more than
    √n log δc(w0)                (11.5)
iterations bring the iterate onto the central path. This proves the iteration bound
(11.4) for the centering problem.
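Since the analysis of this section concerns only the targets themselves, it can be illustrated without any problem data. The sketch below (ours and purely illustrative) repeats the update of this section with θ = √2/√n, capping the factor by δc(w) in the final round as the condition 1 + θ ≤ δc(w) requires, and compares the number of rounds with the bound (11.5).

import numpy as np

def delta_c(w):
    # distance to the central path of the w-space, (9.6)
    return np.max(w) / np.min(w)

rng = np.random.default_rng(6)
n = 25
w = rng.uniform(0.1, 10.0, n)           # a starting target well off the central path
theta = np.sqrt(2.0) / np.sqrt(n)

dc0 = delta_c(w)
rounds = 0
while delta_c(w) > 1.0 + 1e-12:
    factor = min(1.0 + theta, delta_c(w))          # respect 1 + theta <= delta_c(w)
    w = np.maximum(w, factor * np.min(w))          # raise the small coordinates
    rounds += 1

print("rounds      :", rounds)
print("bound (11.5):", np.sqrt(n) * np.log(dc0))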
The above-described target sequence ends at the point max (w) e on the central
path. From there on we can follow the central path as described in Section 11.2 and
we reach an ε-solution after a total of
    √n ( log δc(w0) + log( n max(w0)/ε ) )                (11.6)
8 If n ≥ 3 then we have
    log( 1 + √2/√n ) ≥ 1/√n.
iterations.
Note that this bound for a strategy that first centralizes and then optimizes is better
than the one we obtained for the more direct strategy (11.2) of following a sequence
along the weighted path. In fact the bound (11.6) is the best one known until now
when the starting point lies away from the central path.
Remark III.14 The above centering strategy pushes the small coordinates of w0 upward
to max (w0 ). We can also consider the more obvious strategy of moving the large coordinates
of w0 downward to min (w0 ). Following a similar analysis we obtain
δ(w, w̄) ≤
θ
Hence,
1
δ(w, w̄) ≤ √
2
if
p
nδc (w0 )
.
2
√
2
.
θ= p
nδc (w0 )
As a consequence, in the resulting iteration bound, which is proportional to 1/θ, the quantity
δc (w0 ) does not appear under the logarithm. This makes clear that we get a slightly worse
result than (11.5) in this case.9
•
11.5
Weighted-centering method
The converse of the centering problem consists in finding a primal-dual pair (x, s)
such that the ratios between the coordinates of xs are prescribed, when a point on the
central path is given. If w1 is a positive vector whose coordinates have the prescribed
weights, then we want to find feasible x and s such that xs = λw1 for some positive λ.
In fact, the aim is not to solve this √
problem exactly; it is enough if we find a primaldual pair such that δ(xs, λw1 ) ≤ 1/ 2 for some positive λ. This problem is known as
the weighted-centering problem.10
Let the primal-dual pair be given for the point w0 = µe on the central path, with
µ > 0. We first rescale the given vector w1 by a positive scalar factor in such a way
9
Exercise 74 Another strategy for reaching the central path from a given vector w 0 can be defined
as follows. When at w, we define w̄ as follows. As long as max(w) > (1+θ) min(w) do the following:
w̄i =
min(w) + θ min(w),
max(w) − θ min(w),
wi ,
if wi < (1 + θ) min(w),
if wi ≥ (1 + θ) min(w) and wi > max(w) − θ min(w),
otherwise.
Analyze this strategy and show that the iteration bound is the same as (11.5), but when the central
path is reached the duality gap is (in general) smaller, yielding a slight improvement of (11.6).
10
The treatment of the weighted-centering problem presented here was first proposed by Mizuno [214].
It closely resembles our approach to the centering problem. See also Jansen et al. [159, 158] and
Jansen [151]. A special case of the weighted-centering problem was considered by Atkinson and
Vaidya [29] and later also by Freund [85] and Goffin and Vial [102]. Their objective was to find the
weighted-analytic center of a polytope. Our approach generates the weighted-analytic center of the
primal polytope P if we take c = 0, and the weighted-analytic center of the dual polytope D if we
take b = 0. The approach of Atkinson and Vaidya was put into the target-following framework by
Jansen et al. [158]. See also Jansen [151]. The last two references use two nested traceable target
sequences. The result is a significantly simpler analysis as well as a better iteration bound than
Atkinson and Vaidya’s bound.
III.11 Applications
253
that
max (w1 ) = µ,
and we construct a traceable target sequence from w0 to w1 . When we put w := w0 ,
the coordinates of w corresponding to the largest coordinates of w1 have their correct
value. We gradually decrease the other coordinates of w to their correct value by using
a similar technique as in the previous section. Let w̄ be obtained from w by redefining
each entry wi according to
w̄i := max wi1 , wi − (1 − θ) min (w) ,
where θ is some positive constant smaller than one. Note that w̄i can never become
smaller than wi1 and if it has reached this value then it remains constant in subsequent
target vectors. Hence, this process leaves the ‘correct’ coordinates of w — those have
the larger values — invariant, and it decreases the other coordinates by (1−θ) min (w),
or less if undershooting should occur. Thus, we have
min (w̄) ≥ (1 − θ) min (w),
with equality, except possibly for the last point in the target sequence, and
0 ≤ wi − w̄i ≤ θ min (w).
To make the sequence traceable, θ cannot be taken too large. Using the last two
inequalities we write
1
w̄ − w
θ min (w) e
1
√
√
≤ p
.
δ(w, w̄) = p
w
w
2 min (w̄)
2 (1 − θ) min (w)
This gives
θ
δ(w, w̄) ≤ p
2 (1 − θ)
p
√
min (w) e
θ
θ n
√
.
≤ p
kek = √
w
2 1−θ
2 (1 − θ)
As before, assuming n ≥ 4 we get
1
δ(w, w̄) ≤ √
2
if
1
θ=√ .
n
Before the final iteration, which puts all entries of w at their correct values, each
iteration increases δc (w) by the factor 1/ (1 − θ). We certainly have reached w1 if the
iteration number k satisfies
1
≥ δc (w1 ).
(1 − θ)k
Taking logarithms, this inequality becomes
−k log (1 − θ) ≥ log δc (w1 )
and this certainly holds if
kθ ≥ log δc (w1 ),
√
since θ ≤ − log (1 − θ). Substitution of θ = 1/ n yields that no more than
√
n log δc (w1 )
iterations bring the iterate to w1 .
254
11.6
III Target-following Approach
Centering and optimizing together
In Section 11.4 we discussed a two-phase strategy for the case where the initial primaldual feasible pair (x0 , s0 ) is not on the central path. The first phase is devoted to
centralizing and the second phase to optimizing. Although this strategy achieves the
best possible iteration bound obtained so far, it is worth considering an alternative
strategy that combines the two phases at the same time.
Let w0 := x0 s0 and consider the function f : IR+ → IRn+ defined by
f (θ) :=
w0
, θ ≥ 0.
e + θw0
(11.7)
The image of f defines a path in the w-space starting at f (0) = w0 and converging to
the origin when θ goes to infinity. See Figure 11.1.
w2
✻
central path
❘
w0
■
Dikin-path
✲ w1
Figure 11.1
A Dikin-path in the w-space (n = 2).
We refer to this path as the Dikin-path in the w-space starting at w0 .11 It may easily
be checked that if w1 lies on the Dikin-path starting at w0 , then the Dikin-path
11
Dikin, well known for his primal affine-scaling method for LO, did not consider primal-dual
methods. Nevertheless, the discovery of this path in the w-space has been inspired by his work.
Therefore, we gave his name to it. The relation with Dikin’s work is as follows. The direction of
the tangent to the Dikin-path is obtained by differentiating f (θ) with respect to θ. This yields
−(w 0 )2
df (θ)
=
= −f (θ)2 .
dθ
(e + θw 0 )2
This implies that the Dikin-path is a trajectory of the vector field −w 2 in the w-space. Without
going further into it we refer the reader to Jansen, Roos and Terlaky [156] where this field was
used to obtain the primal-dual analogue of the so-called primal affine-scaling direction of Dikin [63].
This is precisely the direction used in the Dikin Step Algorithm, in Appendix E.
III.11 Applications
255
starting at w1 is just the continuation of the path starting at w0 .12 Asymptotically,
the Dikin-path becomes tangent to the central path, because for very large values of
θ we have
e
f (θ) ≈ .
θ
We can easily establish that along the path the proximity to the central path is
improving. This goes as follows. Let w := f (θ). Then, using that f preserves the
ordering of the coordinates,13 we may write
δc (w) =
max (w 0 )
1+θ max (w 0 )
0
min (w )
1+θ min (w 0 )
= δc (w0 )
1 + θ min (w0 )
≤ δc (w0 ).
1 + θ max (w0 )
(11.8)
The last inequality is strict if δc (w0 ) > 1. Also, the duality gap is decreasing. This
follows because
w0
eT w 0
eT w := eT
≤
< eT w 0 .
0
e + θw
1 + θ min (w0 )
Consequently, the Dikin-path achieves the two goals that were assigned to it. It
centralizes and optimizes at the same time.
Let us now try to devise a traceable target sequence along the Dikin-path. Suppose
that w is a point on this path. Without loss of generality we may assume that
w = f (0) = w0 . Let w̄ := f (θ) for some positive θ. Then we have
1
w̄ − w
1
√
= p
δ(w, w̄) = p
w
2 min (w̄)
2 min (w̄)
w
e+θw
−w
√
,
w
which can be simplified to
3
θw 2
1
.
δ(w, w̄) = p
2 min (w̄) e + θw
Using that f preserves the ordering of the coordinates we further deduce
p
p
3
1 + θ min (w)
max (w)
θw 2
θw
√
p
δ(w, w̄) =
,
≤ p
e + θw
e + θw
2 min (w)
2 min (w)
which gives
p
δc (w)
θw
√
δ(w, w̄) ≤
.
2
e + θw
Finally, since e + θw > e, we get
δ(w, w̄) ≤
1 p
θ δc (w) kwk .
2
12
Exercise 75 Show that if w 1 lies on the Dikin-path starting at w 0 , then the Dikin-path starting
at w 1 is just the continuation of the path starting at w 0 .
13
0 and w := f (θ), with f (θ) as defined in (11.7). Prove that
Exercise 76 Let w10 ≤ w20 ≤ . . . ≤ wn
for each positive θ we have w1 ≤ w2 ≤ . . . ≤ wn .
256
III Target-following Approach
So we have
1
δ(w, w̄) ≤ √
2
if
√
2
p
θ=
.
kwk δc (w)
We established above that the duality gap is reduced by at least the factor 1+θ min (w).
Replacing θ by its value defined above, we have
√
√
√
2 min (w)
2 min (w)
2
p
p
p
= 1+
.
1 + θ min (w) = 1 +
≥ 1+
kwk δc (w)
max (w) nδc (w)
δc (w) nδc (w)
Using δc (w) < δc (w0 ), we deduce in the usual way that after
p
eT w 0
δc (w0 ) nδc (w0 ) log
ε
(11.9)
iterations the duality gap is smaller than ε.
For large values of δc (w0 ) this bound is significantly worse than the bounds obtained
in the previous sections when starting off the central path. It is even worse —
by a factor δc (w0 ) — than the bound for the weighted-path-following method in
Section 11.3. The reason for this weak result is that in the final step, just before
(11.9), we replaced δc (w) by δc (w0 ). Thus we did not fully explore the centralizing
effect of the Dikin-path, which implies that in the final iterations δc (w) tends to 1.
To improve the bound we shall look at the process in a different way. Instead of
directly estimating the number of target moves until a suitable duality gap is achieved,
we shall concentrate on the number of steps that are required to get close to the central
path, a state that can be measured for instance by δc (w) < 2.
Using (11.8) and substituting the value of θ, we obtained
p
√
kwk δc (w) + min (w) 2
1 + θ min (w)
p
= δc (w)
δc (w̄) = δc (w)
√ .
1 + θ max (w)
kwk δc (w) + max (w) 2
This can be written as
δc (w̄) = δc (w)
!
√
2 (max (w) − min (w))
p
.
1−
√
kwk δc (w) + max (w) 2
√
Using that kwk ≤ max (w) n and max (w) = δc (w) min (w) we obtain
√
2 (δ (w) − 1)
p c
δc (w̄) ≤ δc (w) 1 −
√ .
δc (w)
nδc (w) + 2
Now assuming n ≥ 6 and δc (w) ≥ 2 we get
√
2 (δ (w) − 1)
1
p c
.
√ ≥ p
2 nδc (w)
δc (w)
nδc (w) + 2
This can be verified by elementary means. As a consequence, under these assumptions,
!
1
.
δc (w̄) ≤ δc (w) 1 − p
2 nδc (w)
III.11 Applications
257
Hence, using that δc (w) ≤ δc (w0 ), after k iterations we have
δc (w̄) ≤
1
1− p
2 nδc (w0 )
!k
δc (w0 ).
By the usual arguments, it follows that δc (w̄) ≤ 2 after at most
2
p
δc (w0 )
nδc (w0 ) log
2
iterations. The proximity to the central path is then at most 2. Now from (11.9) it
follows that the number of iterations needed to reach an ε-solution does not exceed
√
eT w 0
.
2 2n log
ε
By adding the two numbers, we obtain the iteration bound
p
√
√
δc (w0 )
eT w 0
0
n 2 2 log
.
+ 2 δc (w ) log
ε
2
Note that this bound is better than the previous bound (11.9) and also better than
the bound (11.3) for following the weighted central path. But it is still worse than the
bound (11.6) for the two-phase strategy.
11.7
Adaptive and large target-update methods
The complexity bounds derived in the previous sections are based on a worst-case
analysis of full Newton step methods. Each target step is chosen to be short enough
so that, in any possible instance, proximity will remain under control. Moreover, the
target step is not at all influenced by the particular primal-dual feasible pair. As a
consequence, for an implementation of a full-step target-following method the required
running time may give rise to some disappointment.
It then becomes tempting to take larger target-updates. An obvious improvement
would be to relate the target move to the primal-dual feasible pair and to make the
move as large as possible while keeping proximity to the primal-dual feasible pair
under control; in that case a full Newton step still yields a new primal-dual feasible
pair closer to the target and the process may be repeated. This enhancement of the
full-step strategy into the so-called adaptive step or maximal step strategy does not
improve the overall theoretical complexity bound, but it has a dramatic effect on the
efficiency, especially on the asymptotic convergence rate.14
Despite this nice asymptotic result, the steps in the adaptive-step method may in
general be too short to produce a really efficient method. In practical applications it
is often wise to work with larger target-updates. One obvious shortcoming of a large
14
In a recent paper [125], Gonzaga showed that the maximal step method — with some additional
safeguard steps — is asymptotically quadratically convergent; i.e., in the final iterations the duality
gap converges to zero quadratically. Gonzaga also showed that the iterates converge to the analytic
centers of the optimal sets of (P ) and (D).
258
III Target-following Approach
target-update is that the full Newton step may cause infeasibility. To overcome this
difficulty one must use a damped Newton step. The progress is then measured by
the primal-dual barrier logarithmic function φw (x, s) analyzed in Section 10.5. Using
the results of that section, iteration bounds for the damped Newton method can be
derived for large-update versions of the target sequences dealt with in this chapter.
In accordance with the results in Chapter 7 for the logarithmic
√ barrier central-pathfollowing method, the iteration bounds are always a factor n worse than those for
the full-step methods. We feel that it goes beyond the aim of this chapter to give a
detailed report of the results obtained in this direction. We refer the reader to the
references mentioned in the course of this chapter.15
15
In this connection it may be useful to mention again the book of Jansen [151], which contains a
thorough treatment of the target-following approach. Jansen also deals with methods using large
target-updates. He provides some additional examples of traceable target sequences that can be
used to simplify drastically the analysis of existing methods, such as the cone-affine-scaling method
of Sturm and Zhang [260] and the shifted barrier method of Freund [84]. These results can also be
found in Jansen et al. [158].
12
The Dual Newton Method
12.1
Introduction
The results in the previous sections have made clear that the image of a given target
vector w > 0 under the target map ΦP D (w) can be computed provided that we are
given some positive primal-dual pair (x, s). If the given pair (x, s) is such that xs
is close to w, Newton’s method can be applied to the weighted KKT system (9.2).
Starting at (x, s) this method generates a sequence of primal-dual pairs converging to
ΦP D (w). The distance from the pair (x, s) to w is measured by the proximity measure
δ(xs, w) in (10.4):
1
w − xs
√
δ(xs, w) := p
.
xs
2 min (w)
√
If δ(xs, w) ≤ 1/ 2 then the primal-dual method converges quadratically to ΦP D (w).
For larger values of δ(xs, w) we could realize a linear convergence rate by using damped
Newton steps of appropriate size. The sketched approach is called primal-dual because
it uses search steps in both the x-space and the s-space at each iteration.
The aim of this chapter and the next is to show that the same goal can be realized
by moving only in the primal space or the dual space. Assuming that we are given a
positive primal feasible solution x, a primal method moves in the primal space until
it reaches x(w). Similarly, a dual method starts at some given dual feasible solution
(y, s) with s > 0, and moves in the dual space until it reaches (y(w), s(w)). We deal
with dual methods in the next sections, and consider primal methods in the next
chapter. In both cases the search direction is obtained by applying Newton’s method
to a suitable weighted logarithmic barrier function. The general framework of a dual
target-following algorithm is described on page 260. The underlying target sequence
starts at w0 and ends at w̃.
12.2
The weighted dual barrier function
The search direction in a dual method is obtained by applying Newton’s method to
the weighted dual logarithmic barrier function φdw (y), given by
φdw (y)
−1
:=
min (w)
T
b y+
n
X
i=1
wi log si
!
,
(12.1)
260
III Target-following Approach
Generic Dual Target-following Algorithm
Input:
A dual feasible pair (y 0 , s0 ) such that y 0 = y w0 ; s0 = s w0 ;
a final target vector w̃.
begin
y := y 0 ; s = s0 ; w := w0 ;
while w is not ‘close’ to w̃ do
begin
replace w by the next target in the sequence;
while (y, s) is not ‘close’ to (y(w), s(w)) do
begin
apply Newton steps at (y, s) to the target w
end
end
end
with s = c − AT y. In this section we prove that φdw (y) attains its minimal value at
y(w). In the next section it turns out that φdw (y) is strictly convex. The first property
can easily be derived from the primal-dual logarithmic barrier function φw used in
Section 10.5. With w fixed, we consider φw at the pair (x(w), s). Starting from (9.4),
page 221, and using x(w)T s = cT x(w) − bT y and x(w)s(w) = w we write
max (w) φw (x(w), s)
=
=
=
=
T
x(w) s −
x(w)T s −
n
X
j=1
n
X
j=1
T
wj log xj (w)sj − e w +
wj log sj − eT w +
cT x(w) − bT y −
n
X
j=1
n
X
n
X
wj log wj
j=1
wj log sj (w)
j=1
wj log sj − eT w +
min (w) φdw (y) + cT x(w) − eT w +
n
X
n
X
wj log sj (w)
j=1
wj log sj (w).
j=1
Since w is fixed, this shows that min (w) φdw (y) and max (w)φw (x(w), s) differ by a
constant. Since φw (x(w), s) attains its minimal value at s(w), it follows that φdw (y)
must attain its minimal value at y(w).1
1
Exercise 77 For each positive primal-dual pair (x, s), prove that
φw (x, s) = φw (x, s(w)) + φw (x(w), s).
III.12 Dual Newton Method
12.3
261
Definition of the dual Newton step
d
Let y be dual feasible and w > 0. We denote the gradient of φdw (y) at y by gw
(y) and
d
the Hessian by Hw (y). These are
d
gw
(y) :=
and
−1
b − AW s−1
min (w)
Hwd (y) :=
1
AW S −2 AT ,
min (w)
as can be easily verified. Note that Hwd (y) is positive definite. It follows that φdw (y) is
a strictly convex function.
The Newton step at y is given by
−1
d
(12.2)
b − AW s−1 .
∆y = −Hwd (y)−1 gw
(y) = AW S −2 AT
Since y(w) is the minimizer of φdw (y) we have ∆y = 0 if and only if y = y(w). We
measure the proximity of y with respect to y(w) by a suitable norm of ∆y, namely
the norm induced by the positive definite matrix Hwd (y):
δ d (y, w) := k∆ykHwd (y) .
We call this the Hessian norm of ∆y. We show below that it is an appropriate
generalization of the proximity measure used in Section 6.5 (page 114) for the analysis
of the dual logarithmic barrier approach. More precisely, we find that both measures
coincide if w is on the central path.
d
Using the definition of the Hessian norm of ∆y = −Hwd (y)−1 gw
(y) we may write
q
q
d (y)T H d (y)−1 g d (y).
δ d (y, w) = ∆y T Hwd (y)∆y = gw
(12.3)
w
w
Remark III.15 The dual proximity measure δ d (y, w) can be characterized in a different
way as follows:
where
1
min
δ d (y, w) = p
min (w) x
n
d−1 x −
w
s
o
: Ax = b ,
√
w
.
(12.4)
s
We want to explain this here, because later on, for the primal method this characterization
provides a natural way of defining a primal proximity measure.
Let x satisfy Ax = b. We do not require x to be nonnegative. Replacing b by Ax in the
above expression (12.2) for ∆y and using d from (12.4), we obtain
d :=
∆y = AD2 AT
This can be rewritten as
∆y = AD2 AT
−1
−1
Ax − AW s−1 .
ADd−1 x − ws−1 = AD2 AT
−1
AD
sx − w
√
.
w
262
III Target-following Approach
The corresponding displacement in the slack space is given by ∆s = −AT ∆y. This implies
d∆s = − (AD)T AD2 AT
−1
AD
sx − w
√
.
w
√
This makes clear that −d∆s is equal to the orthogonal projection of the vector (sx − w) / w
into the row space of AD. Hence, we have
d∆s = −
where
x(s, w) = argminx
Lemma III.16 below implies
The claim follows.
12.4
sx(s, w) − w
√
,
w
sx − w
√
w
: Ax = b
.
1
kd∆sk .
δ d (y, w) = p
min (w)
•
Feasibility of the dual Newton step
Let y + result from the Newton step at y:
y + := y + ∆y.
If we define
∆s := −AT ∆y,
the slack vector for y + is just s + ∆s, as easily follows. The Newton step is feasible if
and only if s + ∆s ≥ 0. It is convenient to introduce the vector v according to
r
w
.
(12.5)
v :=
min (w)
Note that v ≥ e and v = e if and only if w is on the central path. Now we can prove
the next lemma. From this lemma it becomes clear that δ d (y, w) coincides with the
proximity measure δ(y, µ), defined in (6.7), page 114, if w = µe.
Lemma III.16
δ d (y, w) =
v∆s
∆s
∆s
≥
≥
s
s
s
.
∞
If δ d (y, w) ≤ 1 then y ∗ = y + ∆y is dual feasible.
Proof: Using (12.3) and the above expression for Hwd (y), we write
δ d (y, w)2 = ∆y T Hwd (y)∆y =
1
∆y T AW S −2 AT ∆y.
min (w)
III.12 Dual Newton Method
263
Replacing AT ∆y by −∆s and also using the definition (12.5) of v, we get
v∆s
s
δ d (y, w)2 = ∆sT V 2 S −2 ∆s =
2
.
Thus we obtain
δ d (y, w) =
v∆s
∆s
∆s
≥
≥
s
s
s
.
∞
The first inequality follows because v ≥ e, and the second inequality is trivial. This
proves the first part of the lemma. For the second part, assume δ d (y, w) ≤ 1. Then
we derive from the last inequality in the first part of the lemma that |∆s| ≤ s, which
implies s + ∆s ≥ 0. The lemma is proved.
✷
12.5
Quadratic convergence
The aim of this section is to generalize the quadratic convergence result of the dual
Newton method in Theorem II.21, page 114, to the present case.2
Theorem III.17 δ d (y + , w) ≤ δ d (y, w)2 .
Proof: By definition
d
d
δ d (y + , w)2 = gw
(y + )T Hwd (y + )−1 gw
(y + ).
d
The main part of the proof consists of the calculation of Hwd (y + ) and gw
(y + ).
It is convenient to work with the matrix
−1
B := AV (S + ∆S)
Using B we write
−2
Hwd (y + ) = AV 2 (S + ∆S)
.
AT = BB T .
d
Note that BB T is nonsingular because A has full row rank. For gw
(y + ) we may write
d
gw
(y + )
=
=
−1
b − AW (s + ∆s)−1
min (w)
−1
e
e
b − AW s−1 + AW
.
−
min (w)
s s + ∆s
d
The first two terms form gw
(y). Replacing W in the third term by min (w) V 2 , we
obtain
∆s
d
d
gw
(y + ) = gw
(y) − AV 2
.
s (s + ∆s)
2
An alternative proof of Theorem III.17 can be given by generalizing the proof of Theorem II.21;
this approach is followed in Jansen et al. [157] and also in the next chapter, where we deal with
the analogous result for primal target-following methods. The proof given here seems to be new,
and is more straightforward.
264
III Target-following Approach
Since
d
gw
(y) = −Hwd (y)∆y = −AV 2 S −2 AT ∆y = AV 2 S −2 ∆s
we get
d
gw
(y + )
= AV
2
∆s
∆s
−
2
s
s (s + ∆s)
2
= AV
(∆s)
2
s (s + ∆s)
2
!
.
The definition of B enables us to rewrite this as
2
∆s
d
+
gw (y ) = BV
.
s
d
Substituting the derived expressions for Hwd (y + ) and gw
(y + ) in the expression for
d +
2
δ (y , w) we find
2 !T
2
∆s
∆s
d +
2
T −1
T
δ (y , w) = V
BV
B BB
.
s
s
Since B T BB T
−1
B is a projection matrix,3 this implies
2 !T
2
2
∆s
∆s
∆s
d +
2
δ (y , w) ≤ V
V
= V
s
s
s
whence
δ d (y + , w) ≤ V
∆s
s
2
≤
∆s
s
∞
,
V ∆s
.
s
Finally, using Lemma III.16, the theorem follows.
12.6
2
✷
The damped dual Newton method
In this section we consider a damped Newton step to a target vector w > 0 at an
arbitrary dual feasible y with positive slack vector s = c − AT y. We use the damping
factor α and move from y to y + = y+α∆y. The resulting slack vector is s+ = c−AT y + .
Obviously s+ = s + α∆s, where ∆s = −AT ∆y. We prove the following generalization
of Lemma II.38.
Theorem III.18 Let δ = δ d (y, w). If α = 1/(δc (w)+ δ) then the damped Newton step
of size α is feasible and
δ
d
d
+
φw (y) − φw (y ) ≥ δc (w) ψ
.
δc (w)
3
It may be worth mentioning here how the proof can be adapted to the case where A does not
have full row rank. First, δd (y, w) can be redefined by replacing the inverse of the Hessian matrix
d (y) in (12.3) by its generalized inverse. Then, in the proof of Theorem III.17 we may use the
Hw
generalized inverse of BB T instead of its inverse. We then also have that
B T BB T
+
B
is a projection matrix and hence we can proceed in the same way.
III.12 Dual Newton Method
265
Proof: Defining ∆ := φdw (y) − φdw (y + ), we have
−1
∆=
min (w)
n
X
s+
wi log i
b y−b y −
si
i=1
T
T +
!
,
or equivalently,
−1
∆=
min (w)
!
α∆si
wi log 1 +
−αb ∆y −
.
si
i=1
T
n
X
Using the definition of the function ψ, we can write this as
!
n
X
α∆si
−1
α∆si
T
wi
∆=
−αb ∆y −
.
−ψ
min (w)
si
si
i=1
Thus we obtain
1
∆=
min (w)
n
∆s X
αbT ∆y + αwT
−
wi ψ
s
i=1
α∆si
si
!
.
The first two terms between the outer brackets can be reduced to α min (w) δ 2 . To this
end we write
bT ∆y + wT
T
∆s
d
(y)T ∆y.
= b − AW s−1 ∆y = − min (w) gw
s
d
Since ∆y = −Hwd (y)−1 gw
(y), we get
bT ∆y + wT
∆s
= min (w)δ 2 ,
s
proving the claim. Using the same argument as in the proof of Theorem III.11, it can
easily be understood that the sum between the brackets attains its maximal value if all
the coordinates of the vector α∆s/s are zero except one, and the nonzero coordinate,
for which wj must be maximal, is equal to minus the norm of this vector. Thus we
obtain
∆s
1
2
α min (w) δ − max (w) ψ −α
∆
≥
min (w)
s
∆s
=
α δ 2 − δc (w) ψ −α
.
s
Now also using Lemma III.16 and the monotonicity of ψ we obtain
∆ ≥ α δ 2 − δc (w) ψ (−αδ) = αδ 2 + δc (w) (αδ + log (1 − αδ)) .
It is easily verified that the right-hand side expression is maximal if α = 1/(δc (w) + δ).
Substitution of this value yields
δ
δ
= δ − δc (w) log 1 +
.
∆ ≥ δ + δc (w) log 1 −
δc (w) + δ
δc (w)
266
III Target-following Approach
This can be written as
∆ ≥ δc (w)
δ
δ
δ
= δc (w) ψ
,
− log 1 +
δc (w)
δc (w)
δc (w)
completing the proof.
12.7
✷
Dual target-updating
When analysing a dual target-following method we need to quantify the effect of
an update of the target on the proximity measure. We derive the dual analogue of
Theorem III.13 in this section. We assume that (y, s) is dual feasible and δ = δ d (y, w)
for some target vector w, and letting w∗ be any other target vector we derive an
upper bound for δ d (y, w∗ ). We have the following result, in which δ (w∗ , w) measures
the ‘distance’ from w∗ to w according to the primal-dual proximity measure introduced
in (10.4):
1
w − w∗
√
δ(w∗ , w) := p
.
(12.6)
w∗
2 min (w)
Theorem III.19
p
r
min(w)
w
δ (y, w ) ≤ p
w∗
min(w∗ )
∗
d
∞
δ d (y, w) + 2δ (w∗ , w) .
∗
d
Proof: By definition δ (y, w ) satisfies
d
δ d (y, w∗ ) = gw
∗ (y)
d (y)−1
Hw
∗
=
This implies
δ d (y, w∗ ) =
−1
b − AW ∗ s−1
∗
min (w )
1
b − AW s−1 − A (W ∗ − W ) s−1
min (w∗ )
.
d (y)−1
Hw
∗
d (y)−1
Hw
∗
.
Using the triangle inequality we derive from this
δ d (y, w∗ ) ≤
min (w)
g d (y)
min (w∗ ) w
d (y)−1
Hw
∗
+
1
A (W ∗ − W ) s−1
min (w∗ )
d (y)−1
Hw
∗
.
We have4
Hwd ∗ (y)
=
=
4
1
1
W ∗ W −2 T
AW ∗ S −2 AT =
A
S A
∗
∗
min (w )
min (w )
W
∗
w
1
AW S −2 AT
min
min (w∗ )
w
∗
min (w)
w
Hwd (y).
min
min (w∗ )
w
The meaning of the symbol ‘’ below is as follows. For any two square matrices P and Q we write
P Q (or P Q) if the matrix P − Q is positive semidefinite. If this holds and Q is nonsingular
then P must also be nonsingular and Q−1 P −1 . This property is used here.
III.12 Dual Newton Method
267
Hence
Hwd ∗ (y)−1
min (w∗ ) w
min (w) w∗
∞
Hwd (y)−1 .
We use this inequality to estimate the first term in the above estimate for δ d (y, w∗ ):
min (w)
g d (y)
min (w∗ ) w
≤
d (y)−1
Hw
∗
=
s
min (w) min(w∗ ) w
g d (y)
min (w∗ ) min(w) w∗ ∞ w
s
min (w) w
δ d (y, w).
min (w∗ ) w∗ ∞
d (y)−1
Hw
For the second term it is convenient to use the positive vector v ∗ defined by
s
w∗
∗
,
v =
min (w∗ )
and the matrix B defined by B = AS −1 . Then we have
2
Hwd ∗ (y) = B (V ∗ ) B T
and
A (W ∗ − W ) s−1 = B (w∗ − w) ,
so we may write
A (W ∗ − W ) s−1
2
d (y)−1
Hw
∗
=
=
where
−1
B (w − w∗ )
(B (w − w∗ ))T B (V ∗ )2 B T
T
(V ∗ )−1 (w − w∗ )
H (V ∗ )−1 (w − w∗ ) ,
−1
BV ∗ .
H = (BV ∗ )T B (V ∗ )2 B T
Clearly, H = H 2 . Thus, H is a projection matrix, whence H I. Therefore,
A (W ∗ − W ) s−1
2
d (y)−1
Hw
∗
−1
≤ (V ∗ )
(w − w∗ )
2
= min (w∗ )
w − w∗
√
w∗
The last equality follows by using the definition of v ∗ . Thus we obtain
1
A (W ∗ − W ) s−1
min (w∗ )
d (y)−1
Hw
∗
1
≤ p
min (w∗ )
Substituting the obtained bounds we arrive at
s
min (w) w
1
d
∗
δ (y, w ) ≤
δ d (y, w) + p
min (w∗ ) w∗ ∞
min (w∗ )
w − w∗
√
.
w∗
w − w∗
√
.
w∗
2
.
268
III Target-following Approach
Finally, using the definition of the primal-dual proximity measure δ (w∗ , w), according
to (10.4), we may write
p
2δ (w∗ , w) min (w)
1
w − w∗
p
√
p
=
,
(12.7)
w∗
min (w∗ )
min (w∗ )
and the theorem follows.
✷
In the special case where w∗ = (1 − θ)w the above result reduces to
d
1
θ kwk
θ
kwk
1
δ (y, w)
d
d
∗
√
=
δ (y, w) +
.
+√
δ (y, w ) ≤ √
1−θ
min (w)
1−θ
1−θ
1 − θ min (w)
Moreover, if w = µe, this gives
δ d (y, w∗ ) ≤
√
1
δ d (y, w) + θ n .
1−θ
13
The Primal Newton Method
13.1
Introduction
The aim of this chapter is to show that the idea of a target-following method can also be
realized by moving only in the primal space. Starting at a given positive primal feasible
solution x a primal method moves in the primal space until it reaches x(w) where
w denotes an intermediate (positive) target vector. The search direction follows by
applying Newton’s method to a weighted logarithmic barrier function. This function is
introduced in the next section. Its minimizer is precisely x(w). Hence, by taking (full or
damped) Newton steps with respect to this function we can (approximately) compute
x(w). The general framework of a primal target-following algorithm is described below.
Generic Primal Target-following Algorithm
Input:
A primal feasible vector x0 such that x0 = x w0 ;
a final target vector w̃.
begin
x := x0 ; w := w0 ;
while w is not ‘close’ to w̃ do
begin
Replace w by the next target in the sequence;
while x is not ‘close’ to x(w) do
begin
Apply Newton steps at x to the target w
end
end
end
The underlying target sequence starts at w0 and ends — via some intermediate target
vectors — at w̃.
270
13.2
III Target-following Approach
The weighted primal barrier function
The search direction in a primal method is obtained by applying Newton’s method to
the weighted primal barrier function given by
n
X
1
cT x −
wj log xj .
(13.1)
φpw (x) :=
min (w)
j=1
We first establish that φpw (x) attains its minimal value at x(w). This easily follows
by using the barrier function φw in the same way as for the dual weighted barrier
function. Starting from (9.4), on page 221, and using xT s(w) = cT x − bT y(w) and
x(w)s(w) = w we write
max (w) φw (x, s(w))
=
=
=
=
T
x s(w) −
xT s(w) −
cT x −
n
X
j=1
n
X
j=1
n
X
j=1
T
wj log xj sj (w) − e w +
wj log xj − eT w +
n
X
n
X
wj log wj
j=1
wj log xj (w)
j=1
wj log xj − bT y(w) − eT w +
min (w) φpw (x) − bT y(w) − eT w +
n
X
n
X
wj log xj (w)
j=1
wj log xj (w).
j=1
This implies that x(w) is a unique minimizer of φpw (x).
13.3
Definition of the primal Newton step
p
(x)
Let x be primal feasible and let w > 0. We denote the gradient of φpw (x) at x by gw
p
and the Hessian by Hw (x). These are
1
w
p
gw
(x) :=
c−
min (w)
x
and
Hwp (x) :=
1
W X −2 = V 2 X −2 ,
min (w)
where V = diag (v), with v as defined in (12.5) in the previous chapter. Note that
Hwp (x) is positive definite. It follows that φpw (x) is a strictly convex function.
The calculation of the Newton step ∆x is a little complicated by the fact that we
want x + ∆x to stay in the affine space Ax = b. This means that ∆x must satisfy
A∆x = 0. The Newton step at x is then obtained by minimizing the second-order
Taylor polynomial at x subject to this constraint. Thus, ∆x is the solution of
1
T p
T
p
min ∆x gw (x) + ∆x Hw (x)∆x : A∆x = 0 .
∆x
2
III.13 Primal Newton Method
271
The optimality conditions for this minimization problem are
p
gw
(x) + Hwp (x)∆x
A∆x
=
=
AT u
0,
where the coordinates of u ∈ IRm are Lagrange multipliers. We introduce the scaling
vector d according to
x
d := √ .
w
Observe that Hwp (x) = D−2 / min (w). The optimality conditions can be rewritten as
w
−d−1 ∆x + min (w) (AD)T u
=
d c−
x
AD(d−1 ∆x)
=
0,
which shows that −d−1 ∆x is the orthogonal projection of d (c − w/x) into the null
space of AD:
xc − w
w
−1
√
=⇒ ∆x = −DPAD
.
(13.2)
−d ∆x = PAD d c −
x
w
√
Remark III.20 When w = µe we have d = x/ µ. Since AD and AX have the same null
space, we have PAD = PAX . Therefore, in this case the Newton step is given by
1
∆x = − √ XPAX
µ
xc − µe
√
µ
= −XPAX
Xc
−e .
µ
This search direction is used in the so-called primal logarithmic barrier method, which is
obtained by applying the results of this chapter to the case where the targets are on the
central path. It is the natural analogue of the dual logarithmic barrier method treated in
Chapter 6.
•
We introduce the following proximity measure to quantify the distance from x to
x(w):
o
n
1
w
δ p (x, w) = p
: AT y + s = c .
(13.3)
min d s −
x
min (w) y,s
This measure is inspired by the measure (6.8) for the dual logarithmic barrier method,
introduced in Section 6.5.1 Let us denote by s(x, w) the minimizing s in (13.3).
Lemma III.21 We have
δ p (x, w) =
1
x s(x, w) − w
v∆x
√
=p
.
x
w
min(w)
Proof: For the proof of the first equality we eliminate s in (13.3) and write
o
n
o
n
w
w
− DAT y .
: AT y + s = c = min
d c−
min d s −
y
y,s
x
x
1
Similar proximity measures were used in Roos and Vial [245], and Hertog and Roos [142] for primal
methods, and in Mizuno [212, 214] and Jansen et al. [159] for primal-dual methods.
272
III Target-following Approach
Let ȳ denote the solution of the last minimization problem. Then
w
w
= DAT ȳ + PAD (d c −
.
d c−
x
x
Thus we obtain
From (13.2),
w
w
d c−
− DAT ȳ = PAD d c −
.
x
x
w
= −d−1 ∆x.
PAD d c −
x
Hence we get
p
1
−1
δ (x, w) = p
kd
min (w)
√
w∆x
v∆x
∆xk = p
=
,
x
x
min (w)
1
proving the first equality in the lemma. The second equality in the lemma follows from
the definition of s(x, w).2
✷
From the above proof and (13.2) we deduce that
d−1 ∆x = −
xs(x, w) − w
√
.
w
(13.4)
Also observe that the lemma implies that, just as in the dual case, the proximity
measure is equal to the ‘Hessian–norm’ of the Newton step:
δ p (x, w) = k∆xkHwp (x) .
13.4
Feasibility of the primal Newton step
Let x+ result from the Newton step at x:
x+ := x + ∆x.
The Newton step is feasible if and only if x + ∆x ≥ 0. Now we can prove the next
lemma.
Lemma III.22 If δ p (x, w) ≤ 1 then x∗ = x + ∆x is primal feasible.
Proof: From Lemma III.21 we derive
δ p (x, w) =
∆x
∆x
v∆x
≥
≥
x
x
x
.
∞
Hence, if δ p (x, w) < 1, then |∆x| ≤ x, which implies x + ∆x ≥ 0. The lemma follows.
✷
2
Exercise 78 If δp (x, w) ≤ 1 then s(x, w) is dual feasible. Prove this.
III.13 Primal Newton Method
13.5
273
Quadratic convergence
We proceed by showing that the primal Newton method is quadratically convergent.
Theorem III.23 δ p (x+ , w) ≤ δ p (x, w)2 .
Proof: Using the definition of δ p (x+ , w) we may write
δ p (x+ , w)
=
≤
≤
x+ s(x+ , w) − w
√
w
min(w)
1
x+ s(x, w) − w
p
√
w
min(w)
1
+
p
2 kx s(x, w) − wk.
min(w)
p
1
Denote s̄ := s(x, w). From (13.4) we obtain
x s̄ − w
x s̄(x s̄ − w)
s̄∆x = s̄dd−1 ∆x = −ds̄ √
=−
.
w
w
This implies
kx+ s̄ − wk = k(x + ∆x)s̄ − wk = xs̄ − w −
(xs̄ − w)2
xs̄(xs̄ − w)
=
.
w
w
Combining the above relations, we get
δ p (x+ , w) ≤ p
1
min(w)
2
xs̄ − w
√
w
2
≤
xs̄ − w
p
√
w
min(w)
1
This completes the proof.
13.6
!2
= δ p (x, w)2 .
✷
The damped primal Newton method
In this section we consider a damped primal Newton step to a target vector w > 0 at
an arbitrary positive primal feasible x. The damping factor is again denoted by α and
we move from x to x+ = x + α∆x. After Theorem III.18 it will be no surprise that
we have the following result.
Theorem III.24 Let δ = δ p (x, w). If α = 1/(δc (w) + δ) then the damped Newton
step of size α is feasible and
δ
p
p
+
.
φw (x) − φw (x ) ≥ δc (w) ψ
δc (w)
274
III Target-following Approach
Proof: Defining ∆ := φpw (x) − φpw (x+ ), we have
n
X
x+
wi log i
c x−c x +
xi
i=1
1
∆=
min (w)
T
T
+
!
,
or equivalently,
1
∆=
min (w)
n
X
α∆xi
wi log 1 +
−αc ∆x +
xi
i=1
T
!
.
Using the definition of the function ψ, this can be rewritten as
1
∆=
min (w)
T
−αc ∆x +
n
X
wi
i=1
α∆xi
−ψ
xi
α∆xi
xi
!
.
Thus we obtain
n
∆x X
wi ψ
−αc ∆x + αw
−
x
i=1
1
∆=
min (w)
T
T
α∆xi
xi
!
.
We reduce the first two terms between the outer brackets to α min (w) δ 2 :
−cT ∆x + wT
and from (13.2),
w T
∆x
− c−
x
=
=
√
Since d = x/ w this implies
w T
∆x
∆x,
=− c−
x
x
w
w T
d c−
PAD d c −
x
x
w 2
PAD d c −
= d−1 ∆x
x
2
.
w T
∆x = min (w) δ 2 ,
− c−
x
proving the claim. The sum between the brackets can be estimated in the same way
as for the dual method. Thus we obtain
∆x
1
α min (w) δ 2 − max (w) ψ −α
∆
≥
min (w)
x
∆x
=
α δ 2 − δc (w) ψ −α
,
x
yielding exactly the same lower bound for ∆ as in the dual case. Hence we can use
the same arguments as we did there to complete the proof.
✷
III.13 Primal Newton Method
13.7
275
Primal target-updating
We derive the primal analogue of Theorem III.19 in this section. We assume that x is
primal feasible and δ = δ p (x, w) for some target vector w. For any other target vector
w∗ we need to derive an upper bound for δ p (x, w∗ ). The result is completely similar
to Theorem III.19, but the proof must be adapted to the primal context.
Theorem III.25
p
r
min(w)
w
p
δ (x, w ) ≤
∗
∗
w
min(w )
p
∗
p
∗
δ (x, w) + 2δ (w , w) .
∞
Proof: By Lemma III.21,
x s(x, w∗ ) − w∗
1
√
,
δ p (x, w∗ ) = p
w∗
min (w∗ )
where s(x, w∗ ) satisfies the affine dual constraint AT y +s = c and minimizes the above
norm. Hence, since s(x, w) satisfies the affine dual constraint, replacing s(x, w∗ ) by
s(x, w) we obtain
δ p (x, w∗ )
≤
=
1
x s(x, w) − w∗
p
√
w∗
min (w∗ )
1
x s(x, w) − w + w − w∗
p
√
.
w∗
min (w∗ )
Using the triangle inequality we derive from this
x s(x, w) − w
w − w∗
1
1
√
p
√
+
.
δ p (x, w∗ ) ≤ p
w∗
w∗
min (w∗ )
min (w∗ )
The second term can be reduced by using (12.7) and then the theorem follows if the
first term on the right satisfies
s
min (w) w
x s(x, w) − w
1
p
√
≤
δ p (x, w).
(13.5)
∗
∗
min
(w∗ ) w∗ ∞
w
min (w )
This inequality can be obtained by writing
x s(x, w) − w
1
p
√
∗
w∗
min (w )
=
≤
=
Hence the theorem follows.
√
w x s(x, w) − w
1
p
√
√
∗
∗
w
w
min (w )
r
x s(x, w) − w
1
w
p
√
∗
∗
w ∞
w
min (w )
s
min (w) w
δ p (x, w).
min (w∗ ) w∗ ∞
✷
14
Application to the Method of
Centers
14.1
Introduction
Shortly after Karmarkar published his projective algorithm for linear optimization, some authors pointed out possible links with earlier literature. Gill et
al. [97] noticed the close similarity between the search directions in Karmarkar’s
algorithm and in the logarithmic barrier approach extensively studied by Fiacco
and√McCormick [77]. At the same time, Renegar [237] proposed an algorithm with
O( nL) iterations, an improvement over Karmarkar’s algorithm. Renegar’s scheme
was a clever implementation of Huard’s method of centers [148]. Again, there were clear
similarities, but equivalence was not established. For a while, the literature seemed
to develop in three approximately independent directions. The first stream dealt with
extensions of Karmarkar’s algorithm and was identified with the notion of projective
transformation and projective space.1 This is the topic of the next chapter. The second
stream of research was a revival and a new interpretation of the logarithmic approach.
We amply elaborated on that approach in Part II of this book. The third stream
prolonged Renegar’s contribution. Not so much has been done in this framework.2
After a decade of active research, it has become apparent that the links between
the three approaches are very tight. They only reflect different ways of looking at the
same thing. From one point of view, the similarity between the method of centers
and the logarithmic barrier approach is striking. In both cases, the progress towards
optimality is triggered by a parameter that is gradually shifted to its optimal value.
The iterations are performed in the primal, dual or primal-dual spaces; they are made
of Newton steps or damped Newton steps that aim to catch up with the parameter
variation. The parameter updates are either small enough
√ to allow full Newton steps
and the method is of a path-following type with an O( nL) iteration bound; or, the
updates are large and the method performs line searches along Newton’s direction
with the aim of reducing a certain potential. The parameter in the logarithmic barrier
approach is the penalty coefficient attached to the logarithm; in the method of centers,
the parameter is a bound on the optimal objective function value. In the logarithmic
barrier approach, the parameter is gradually moved to zero. In the method of centers,
1
For survey papers, we refer the reader to Anstreicher [17, 24], Goldfarb and Todd [109],
Gonzaga [123, 124], den Hertog and Roos [142] and Todd [265].
2
In this connection we cite den Hertog, Roos and Terlaky [143] and den Hertog [140].
278
III Target-following Approach
the parameter is monotonically shifted to the optimal value of the LO problem.
A similar link exists between Renegar’s method of centers and the variants
of Karmarkar’s method introduced by de Ghellinck and Vial [95] and Todd and
Burrell [266]. Those variants use a parameter — a lower bound in case of a
minimization problem — that is triggered to its optimal value. If this parameter
is kept fixed, the projective algorithm computes an analytic center3 that is the dual
of the center used by Renegar. Consequently, there also exist path-following schemes
for the projective algorithm, see Shaw and Goldfarb [254], and Goffin and Vial [103];
these are very close to Renegar’s method.
In this chapter we concentrate on the method of centers. Our aim is to show that the
method can be described and analyzed quite well in the target-following framework.4
14.2
Description of Renegar’s method
The method of centers (or center method) can easily be described by considering the
barrier function used by Renegar.5 Assuming the knowledge of a strict lower bound z
for the optimal value of the dual problem (D) he considers the function
φR (y, z) := −q log(bT y − z) −
n
X
log si ,
i=1
where q is some positive number and s = c − AT y. His method consists of finding
(an approximation of) the minimizer y(z) of this barrier function by using Newton’s
method. Then the lower bound z is enlarged to
z̄ = z + θ(bT y(z) − z)
(14.1)
3
The computation of analytic centers can be performed via variants of the projective algorithm. In
this connection, we cite Atkinson [29] and Goffin and Vial [102].
4
The method of centers has an interest of its own. First, the approach formalizes Huard’s scheme
and supports Huard’s intuition of an efficient interior-point algorithm. There are also close links
with Karmarkar’s method that are made explicit in Vial [285]. Second, the method of centers
offers a natural framework for cutting plane methods. Cutting plane methods could be described
in short as a way to solve an LO problem with so many (possibly infinite) inequality constraints
that we cannot even enumerate them in a reasonable computational time. The only possibility is
to generate them one at a time, as they seem needed to insure feasibility eventually. Generating
cuts from a center, and in particular, from an analytic center, appears to be sound from both
the theoretical and the practical point of views. The idea of using analytic centers in this context
was alluded to by Sonnevend [257] and fully worked out by Goffin, Haurie and Vial [99]. See
du Merle [209] and Gondzio et al. [115] for a detailed description of the method, and e.g., Bahn et
al. [31] and Goffin et al. [98] for results on large scale programs. Let us mention that the complexity
analysis of a conceptual method of analytic centers was given first by Atkinson and Vaidya [30]
and Nesterov [225]. An implementable version of the method using approximate analytic centers
is analyzed by Goffin, Luo and Ye [100], Luo [186], Ye [312], Goffin and Sharifi-Mokhtarian [101],
Altman and Kiwiel [7], Kiwiel [168], and Goffin and Vial [104]. Besides, to highlight the similarity
between the method of centers and the logarithmic barrier approach it is worth noting that
logarithmic barrier methods also allow a natural cutting plane scheme based on adding and deleting
constraints. We refer the reader to den Hertog [140], den Hertog, Roos and Terlaky [145], den Hertog
et al. [141] and Kaliski et al. [164]. For a complexity analysis of a special variant of this method
we refer the reader to Luo, Roos and Terlaky [187].
5
The notation used here differs from the notation of Renegar. This is partly due to the fact that
Renegar dealt with a solution method for the primal problem whereas we apply his approach to
the dual problem.
III.14 Method of Centers
279
for some positive θ such that z̄ is again a strict lower bound for the optimal value and
the process is repeated. Renegar showed that this scheme can be used to construct an
ε-solution of (D) in at most
√
bT y 0 − z 0
n log
O
ε
iterations, where the superscript 0 refers to initial values, as usual. In this way he was
the first to obtain this iteration bound.
The algorithm can be described as follows.
Renegar’s Method of Centers
Input:
A strict lower bound z 0 for the optimal value of (D);
a dual feasible y 0 such √
that y 0 is ‘close’ to y(z 0 );
a positive number q ≥ n;
an update parameter θ, 0 < θ < 1.
begin
y := y 0 ; z := z 0 ;
while bT y − z ≥ ε do
begin
z = z + θ bT y − z ;
while y is not ‘close’ to y(z) do
begin
Apply Newton steps at y aiming at y(z)
end
end
end
14.3
Targets in Renegar’s method
Let us now look at how this approach fits into the target-following concept. First we
observe that φR can be considered as the barrier term in a weighted barrier function
for the dual problem when we add the constraint bT y ≥ z to the dual constraints and
give the extra constraint the weight q. Giving the extra constraint the index 0, and
indexing the other constraints by 1 to n as usual, we have the vector of weights
w = (q, 1, 1, . . . , 1).
The second observation is that Renegar’s barrier function is exactly the weighted dual
barrier function φdw (cf. (12.1) on page 259) for the problem
(DR)
max 0T y : AT y + s = c, −bT y + s0 = −z, s ≥ 0, s0 ≥ 0 .
280
III Target-following Approach
The feasible region of this problem is just the feasible region of (D) cut by the objective
constraint bT y ≥ z. Since the objective function is trivial, each feasible point is optimal.
As a consequence, the weighted central path of (DR) is a point and hence this point,
which is the minimizer of φR , is just the weighted-analytic center (according to w)
of the feasible region of (D) cut by the objective constraint (cf. Theorem III.5 on
page 229).
The dual problem of (DR) is the following homogeneous problem:
(P R)
min cT x̃ − x̃0 z : Ax̃ − x̃0 b = 0, x̃ ≥ 0, x̃0 ≥ 0 .
Applying Theorem III.1 (page 222), we see that the optimality conditions for
φR (y, z) = φdw (y) are given by
Ax̃ − x̃0 b
AT y + s
b T y − s0
x̃s
x̃ s
0 0
=
0,
=
=
c,
z,
=
=
e,
q.
x̃, x0 ≥ 0,
s ≥ 0,
s0 ≥ 0,
(14.2)
The third and fifth equations imply
x̃0 =
q
q
= T
.
0
s
b y−z
x :=
x̃
bT y − z
=
x̃
x̃0
q
Hence, defining
(14.3)
we get
Ax
AT y + s
=
=
b,
c,
xs
=
µz e,
where
µz :=
bT y(z) − z
,
q
x ≥ 0,
s ≥ 0,
(14.4)
(14.5)
with y(z) denoting the minimizer of Renegar’s barrier function φR (y). We conclude
that y(z) can be characterized in two ways. First, it is the weighted-analytic center
of the feasible region of (D) cut by the objective constraint bT y ≥ z and, second, it
is the point on the central path of (D) corresponding to the above barrier parameter
value µz . Figure 14.1 depicts the situation.
In the course of the center method the lower bound z is gradually updated to the
optimal value of (D) and after each update of the lower bound the corresponding
minimizer y(z) is (approximately) computed. Since y(z) represents the dual part of
the primal-dual pair belonging to the vector µz e in the w-space, we conclude that the
center method can be considered as a central-path-following method.
III.14 Method of Centers
281
✻
b
y(z)
bT y ≥ z
❄
Figure 14.1
14.4
The center method according to Renegar.
Analysis of the center method
It will be clear that in the analysis of his method Renegar had to deal with the question
of how far the value of the lower bound z can be enlarged — according to (14.1) —
so that the minimizer ȳ of φR (y, z̄) can be computed efficiently; hereby it may be
assumed that the minimizer y of φR (y, z) is known.6 The answer to this question
determines the speed of convergence of the method. As we know, the answer depends
on the proximity δ(µz e, µz̄ e) of the present target vector µz e to the new target vector
µz̄ e. Thus, we have to estimate the proximity δ(µz e, µz̄ e), where z̄ is given by (14.1).
Further analysis below is a little complicated by the fact that the new target vector
µz̄ e is not known, since
µz̄ =
bT y(z̄) − z̄
q
depends on the unknown minimizer y(z̄) of φR (y, z̄). To cope with this complication
we need some further estimates.
Let (x(z), y(z), s(z)) denote the solution of (14.4), so it is the point on the central
path of (P ) and (D) corresponding to the strict lower bound z for the optimal value.
Then the duality gap at this point is given by
n bT y(z) − z
c x(z) − b y(z) = nµz =
.
q
T
6
T
As far as the numerical procedure for the computation of the minimizer of Renegar’s barrier
function is concerned, it may be clear that there are a lot of possible choices. Renegar presented
a dual method in [237]. His search direction is the Newton direction for minimizing φR . In our
framework this amounts to applying the dual Newton method for the computation of the primaldual pair corresponding to the target vector w for the problems (P R) and (DR); this method has
been discussed in Section 12.2. Obviously, the same goal can be achieved by using any efficient
computational — primal, dual or primal-dual — method for the computation of the primal-dual
pair corresponding to the target vector µz e for (P ) and (D).
282
III Target-following Approach
This identity can be written as
n+q
n
cT x(z) − z
=
=1+ .
bT y(z) − z
q
q
(14.6)
Denoting the optimal value by z ∗ we have cT x(z) ≥ z ∗ . Hence
n
bT y(z) − z .
z∗ − z ≤ 1 +
q
Also observe that when we know x(z) and y(z) then the lower bound z can be
reconstructed: solving z from (14.6) and (14.5) respectively we get
z=
(n + q) bT y(z) − q cT x(z)
= bT y(z) − qµz .
n
For the updated lower bound z̄ we thus find the expression
z̄ = bT y(z) − qµz + θ bT y(z) − z = bT y(z) − qµz + θqµz = bT y(z) − (1 − θ) qµz .
Since bT y(z) is a lower bound for the optimal value, this relation makes clear that we
are able to guarantee that z̄ is a strict lower bound for the optimal value only if θ < 1.
Lemma III.26 The dual objective value bT y(z) is monotonically increasing, whereas
the primal objective value cT x(z) and bT y(z) − z are monotonically decreasing if z
increases.7
Proof: We first prove the second part of the lemma. To this end we use the weighted
primal barrier function for (P R),
φpw,z (x̃, x̃0 ) = cT x̃ − x̃0 z − q log x̃0 −
n
X
log x̃i .
i=1
The dependence of this function on the lower bound z is expressed by the corresponding subindex. Now let z and z̄ be two strict
lower bounds for the optimal value of
(P ) and (D) and z̄ > z. Since x̃(z), x̃0 (z) minimizes φpw,z (x̃, x̃0 ) and x̃(z̄), x̃0 (z̄)
minimizes φpw,z̄ (x̃, x̃0 ) we have
φpw,z x̃(z), x̃0 (z) ≤ φpw,z x̃(z̄), x̃0 (z̄) , φpw,z̄ x̃(z̄), x̃0 (z̄) ≤ φpw,z̄ x̃(z), x̃0 (z) .
Adding these inequalities, we get
φpw,z x̃(z), x̃0 (z) + φpw,z̄ x̃(z̄), x̃0 (z̄) ≤ φpw,z x̃(z̄), x̃0 (z̄) + φpw,z̄ x̃(z), x̃0 (z) .
Evaluating the expressions in these inequalities and omitting the common terms on
both sides — the terms in which the parameters z and z̄ do not occur — we find
−x̃0 (z)z − x̃0 (z̄)z̄ ≤ −x̃0 (z̄)z − x̃0 (z)z̄,
7
This lemma is taken from den Hertog [140]. The proof below is a slight variation on his proof. The
proof technique is due to Fiacco and McCormick [77] and can be applied to obtain monotonicity
of the objective value along the central path in a much wider class of convex problems. We refer
the reader to den Hertog, Roos and Terlaky [144] and den Hertog [140].
III.14 Method of Centers
or equivalently,
283
(z̄ − z) x̃0 (z̄) − x̃0 (z) ≥ 0.
This implies x̃0 (z̄) − x̃0 (z) ≥ 0, or
x̃0 (z̄) ≥ x̃0 (z).
By (14.3) this is equivalent to
bT y(z̄) − z̄ ≤ bT y(z) − z.
Thus we have shown that bT y(z) − z is monotonically decreasing if z increases. This
implies that µz is also monotonically decreasing if z increases. The rest of the lemma
follows because along the central path the dual objective value is increasing and the
primal objective value is decreasing. The proof of this property of the central path
can be found in Remark II.6 (page 95).
✷
Now let z̄ be given by (14.1). Then we may write
cT x(z̄) − z − θ bT y(z) − z
cT x(z̄) − z̄
=
.
cT x(z) − z
cT x(z) − z
By the above lemma we have cT x(z̄) ≤ cT x(z). Hence, using also (14.6) we get
θ bT y(z) − z
cT x(z̄) − z̄
θq
≤1−
=1−
.
cT x(z) − z
cT x(z) − z
n+q
Using (14.6) once more we derive
bT y(z̄) − z̄
cT x(z̄) − z̄
=
,
bT y(z) − z
cT x(z) − z
and so
bT y(z̄) − z̄
θq
≤1−
.
T
b y(z) − z
n+q
Therefore we obtain the following relation between µz̄ and µz :
θq
µz̄ ≤ 1 −
µz .
n+q
(14.7)
For the moment we deviate from Renegar’s approach by taking as a new target the
vector
θq
w̄ := 1 −
w,
(14.8)
n+q
where w = µz e. Instead of Renegar’s target vector µz̄ e we use w̄ as a target vector.
Due to the inequality (14.7) this means that we slow down the progress to optimality
compared with Renegar’s
approach. We show, however, that the modified strategy
√
still yields an O( nL) iteration bound, just as Renegar’s approach. Assuming n ≥ 4,
the argument used in Section 11.2 implies that
1
δ(µz e, w̄) ≤ √
2
if
θq
1
= √ .
n+q
n
284
III Target-following Approach
Hence, when
θ=
n+q
√ ,
q n
(14.9)
the primal-dual pair belonging to the target w̄ can be computed efficiently, to any
desired accuracy.
Since the barrier parameter, and hence the duality gap, at the new target is reduced
by the factor 1 − θq/ (n + q) we obtain an ε-solution after at most
√
eT w 0
n+q
eT w 0
log
= n log
θq
ε
ε
iterations. Here w0 denotes the initial point in the w-space.
Note that the parameter q disappeared in the iteration bound. In fact, the above
analysis, based on the updating scheme (14.8), works for every positive value of q and
gives the same iteration bound for each value of q.
On the other hand, when using Renegar’s scheme, the update goes via the strict
lower bound z. As we established before, it is then
√ necessary to keep θ < 1. So
Renegar’s approach only works if q satisfies n+q < q n. This amounts to the following
condition on q:
√
n
> n.
q≥√
n−1
√
Renegar, in [237], recommended q = n and θ = 1/ 13 q . Den Hertog
[140], who
√
√
simplified the analysis significantly, used q ≥ 2 n and θ = 1/ 8 q . In both cases
the iteration bound is of the same order of magnitude as the bound derived above.8
14.5
Adaptive- and large-update variants of the center method
In the logarithmic barrier approach, we used a penalty parameter to trigger the
algorithm. By letting the parameter go to zero in a controlled way, we could drive
the pairs of dual solutions to optimality. The crux of the analysis was the updating
scheme: small, adaptive or large updates, with results of variable complexity. Small or
adaptive updates
allow relatively small reductions of the duality gap — by a factor
√
1 − O (1/ n) — in O(1)√Newton steps between two successive updates, and achieve
global convergence in O( nL) iterations. Large updates allow sharp decreases of the
duality gap — by a factor 1 − Θ (1) — but require more Newton steps (usually as
many as O(n)) between two successive updates and lead to global convergence in
O(nL) iterations. A similar situation occurs for target-following methods, where the
algorithm is triggered by the targets; the target sequence can be designed such that
similar convergence results arise for small, adaptive and large updates respectively.
The method of this chapter, the (dual) center method of Renegar, has a different
triggering mechanism: a lower bound on the optimal objective value. The idea is to
8
√
√
√
For q = n we obtain from (14.9) θ = 2/ n and for q ≥ 2 n we get θ ≤ 1/2 + 1/ n. These
values for θ are larger than the respective values used by Renegar and Den Hertog. We should
note however that this is, at least partly, due to the fact that the analysis of both Renegar and
den Hertog is based on the use of approximate central solutions whereas we made the simplifying
assumption that exact central solutions are computed for each value of µz .
III.14 Method of Centers
285
move this bound up to the point where the objective is set near to its optimal value. For
any such lower bound z the dual polytope AT y ≤ c is cut by the objective constraint
bT y ≥ z and the (ideal) new iterate is a weighted-analytic center of the cut polytope.
The weighting vector treats all the constraints in AT y ≤ c equally but it gives extra
emphasis to the objective constraint by the factor q. Enlarging q, pushes the new
iterate in the direction of the optimal set. This opens the way to adaptive- and largeupdate versions of Renegar’s method. Appropriate values for q can easily be found.
To see this it suffices to recall from (14.7) that the duality gap between two successive
updates of the lower bound reduces by at least the factor
1−
θq
.
n+q
For example, q = n and θ = 1/2 give a reduction of the duality gap by at least 3/4.
It is clear that the reduction factor for the duality gap can be made arbitrarily small
by choosing appropriate values for q and θ (0 < θ < 1). We then get out of the
domain of quadratic convergence, but by using damped Newton steps we can reach
the new weighted-analytic center in a controlled number of steps. From this it will
be clear that the updates of the lower bound can be designed in such a way that
adaptive- or large-update versions of the center method arise and that the complexity
results will be similar to those for the logarithmic barrier method. These ideas can
be worked out easily in the target-following framework. In fact, if Renegar’s method
is modified according to the updating scheme (14.8), the results immediately follow
from the corresponding results for the logarithmic barrier approach.9
9
Adaptive and large-update variants of the center method are analyzed by den Hertog [140].
Part IV
Miscellaneous Topics
15
Karmarkar’s Projective Method
15.1
Introduction
It has been pointed out before that recent research in interior-point methods for LO
has been motivated by the appearance of the seminal paper [165] of Karmarkar in
1984. Despite its extraordinary power of stimulation of the scientific community, Karmarkar’s so-called projective method seemed to remain a very particular method,
remotely related to the huge literature to which it gave rise. Significantly many papers
appeared on the projective algorithm itself,1 but the link with other methods, in
particular Renegar’s, has not drawn much attention up to recently.2 The decaying
interest for the primal projective method is also due to a poorer behavior on solving
practical optimization problems.3 In this chapter we provide a simplified description
and analysis of the projective method and we also relate it to the other methods
described in this book.
Karmarkar considered the very special problem
(P K)
min cT x : Ax = 0, eT x = n, x ≥ 0 ,
where, as before, A is an m × n matrix of rank m, and e denotes the all-one vector.
Karmarkar made two seemingly restrictive assumptions, namely that the optimal value
cT x∗ of the problem is known and has value zero, and secondly, that the all-one vector
e is feasible for (P K). Note that the problem (P K) is trivial if cT e = 0. Then the
all-one vector e is an optimal solution. So we assume throughout that this case is
excluded. As a consequence we have
cT e > 0.
(15.1)
1
Papers in that stream were written by Anstreicher [14, 15, 16, 18, 19, 20, 21, 22, 23, 24], Freund [83,
85], de Ghellinck and Vial [95, 96], Goffin and Vial [102, 103], Goldfarb and Mehrotra [105, 106, 107],
Goldfarb and Xiao [110], Goldfarb and Shaw [108], Shaw and Goldfarb [254], Gonzaga [117, 119],
Roos [239], Vial [282, 283, 284], Xu, Yao and Chen [300], Yamashita [301], Ye [304, 305, 306, 307],
Ye and Todd [315] and Todd and Burrell [266]. We also refer the reader to the survey papers
Anstreicher [17, 24], Goldfarb and Todd [109], Gonzaga [123, 124], den Hertog and Roos [142] and
Todd [265].
2
See Vial [285, 286].
3
In their comparison between the primal projective method and a primal-dual method, Fraley and
Vial [80, 81] concluded to the superiority of the later for solving optimization problems. However,
it is worth mentioning that the projective algorithm has been used with success in the computation
of analytic centers in an interior-point cutting plane algorithm; in particular, Bahn et al. [31] and
Goffin et al. [98] could solve very large decomposition problems with this approach.
290
IV Miscellaneous Topics
Later on it is made clear that the model (P K) is general enough for our purpose.
If it can be solved in polynomial time then the same is true for every LO problem.
15.2
The unit simplex Σn in IRn
The feasible region of (P K) is contained in the unit simplex in IRn . This simplex plays
a crucial role in the projective method. We denote it by Σn :
Σn = x ∈ IRn : eT x = n, x ≥ 0 .
Obviously4 the all-one vector e belongs to Σn and lies at the heart of it. The sphere
in IRn centered at e and with radius ρ is denoted by B(e, ρ). The analysis of the
projective method requires knowledge of the smallest sphere B(e, R) containing Σn as
well as the largest sphere B(e, r) whose intersection with the hyperplane eT x = n is
contained in Σn .
It can easily be understood that R is equal to the Euclidean distance from the center
e of Σn to the vertex (n, 0, . . . , 0). See Figure 15.1, which depicts Σ3 . We have
x3
(0, 0, 3)
✕
(0, 32 , 23 )
r
( 23 , 0, 32 )
e = (1, 1, 1)
(0, 0, 0)
(0, 3, 0)
( 23 , 32 , 0)
(3, 0, 0)
x2
R
❘
x1
Figure 15.1
R=
The simplex Σ3 .
p
p
(n − 1)2 + (n − 1)12 = n(n − 1).
Similarly, r is equal to the Euclidean distance from e to the center of one of the faces
4
It might be worthwhile to indicate that the dimension of the polytope Σn is n − 1, since this is the
dimension of the hyperplane eT x = n, which is the smallest affine space containing Σn .
IV.15 Karmarkar’s Projective Method
291
n
n
, . . . , n−1
), and therefore
of Σn , such as (0, n−1
r=
s
1 + (n − 1)
Assuming n > 1, we thus have
15.3
n
−1
n−1
2
=
r
n
.
n−1
1
r
=
.
R
n−1
The inner-outer sphere bound
As usual, let P denote the feasible region of the given problem (P K). Then we may
write P as
P = Ω ∩ {x ∈ IRn : x ≥ 0} ,
where Ω is the affine space determined by
Ω = x ∈ IRn : Ax = 0, eT x = n .
Now consider the minimization problem
min cT x : x ∈ Ω ∩ B(e, r) .
This problem can be solved explicitly. Since Ω is an affine space containing the center
e of the sphere B(e, r), the intersection of the two sets is a sphere of radius r in a
lower-dimensional space. Hence the minimum value of cT x over Ω ∩ B(e, r) occurs
uniquely at the point
z 1 := e − rp,
where p is the vector of unit length whose direction is obtained by projecting the vector
c into the linear space parallel to Ω. Similarly, when x runs through Ω ∩ B(e, R), the
minimal value will be attained uniquely at the point
z 2 := e − Rp.
Since
Ω ∩ B(e, r) ⊆ P ⊆ Ω ∩ B(e, R),
and the minimal value over P is given as zero, we must have
cT z 2 ≤ 0 ≤ cT z 1 .
This can be rewritten as
cT e − RcT p ≤ 0 ≤ cT e − rcT p.
The left inequality and (15.1) imply
cT p ≥
cT e
> 0.
R
292
IV Miscellaneous Topics
Hence,
cT z 1 = cT e − rcT p ≤ cT e −
r T
c e=
R
1−
1
n−1
cT e.
Thus, starting at the feasible point e we may construct in this way the new feasible
point z 1 whose objective value, compared with the value at e, is reduced by the factor
1 − 1/(n − 1).
At this stage we note that we want the new point to be positive. The above procedure
may end at the boundary of the simplex. This can be prevented by introducing a stepsize α ∈ (0, 1) and using the point
z := e − αrp
as the new iterate. Below α ≈ 1/2 will turn out to be a good choice. The objective
value is then reduced by the factor
α
.
1−
n−1
It is clear that the above procedure can be used only once. The reduction factor for
the objective value is 1 − r/R, where r/R is the ratio between the radius of the largest
inscribed sphere and the radius of the smallest circumscribed sphere for the feasible
region. This ratio is maximal at the center e of the feasible region. If we approach the
boundary of the region the ratio goes to zero and the reduction factor goes to 1 and
we cannot make enough progress to get an efficient method.
Here Karmarkar made a brilliant contribution. His idea is to transform the problem
to an equivalent problem by using a projective transformation that maps the new
iterate back to the center e of the simplex Σn . We describe this transformation in the
next section. After the transformation the procedure can be repeated and the objective
value is reduced by the same factor. After sufficiently many iterations, a feasible point
can be obtained with objective value as close to zero as we wish.
15.4
Projective transformations of Σn
Let d > 0 be any positive vector. With IRn+ denoting the set of nonnegative vectors in
IRn , the projective transformation Td : IRn+ \ {0} → Σn is defined by
Td : x 7→
ndx
ndx
= T
.
dT x
e (dx)
Note that Td can be decomposed into two transformations: a coordinate-wise scaling
x 7→ dx and a global scaling x 7→ nx/eT x. The first transformation is defined for
each x, and is linear; the second transformation — which coincides with Te — is
only defined if eT x is nonzero, and is nonlinear. As a consequence, Td is a nonlinear
transformation.
It may easily be verified that Td maps the simplex Σn into itself and that it is
invertible on Σn ; the inverse on Σn is simply
Td−1 : x 7→
nd−1 x
.
eT (d−1 x)
IV.15 Karmarkar’s Projective Method
293
The projective transformation has some important properties.
Proposition IV.1 For each d > 0 the projective transformation Td is a one-to-one
map of the simplex Σn onto itself. The intersection of Σn with the linear subspace
{x : Ax = 0} is mapped to the intersection of Σn with another subspace of the
same dimension, namely x : AD−1 x = 0 . Besides, the transformation is positively
homogeneous of degree zero; that is, for any λ > 0,
Td (λx) = Td (x).
Proof: The first statement is immediate. To prove the second statement, let x ∈ Σn .
Then Ax = 0 if and only if Ad−1 dx = 0, which is equivalent to AD−1 Td (x) = 0. This
implies the second statement. The last statement is immediate from the definition. ✷
Now let z be a feasible and positive point. For any nonzero x ∈ P there exists a
unique ξ ∈ Σn such that x = Tz (ξ). We have Ax = 0 if and only if AZξ = 0 and
T
cT x = cT Tz (ξ) = cT
n (Zc) ξ
nzξ
= T
.
eT (zξ)
e (zξ)
Hence the problem (P K) can be reformulated as
min
(
)
T
n (Zc) ξ
T
: AZξ = 0, e ξ = n, ξ ≥ 0 .
eT (zξ)
Note that the objective of this problem is nonlinear. But we know that the optimal
T
value is zero and this can happen only if (Zc) ξ = 0. So we may replace the nonlinear
T
objective by the linear objective (Zc) ξ and, changing the variable ξ back to x, we
are left with the linear problem
(P KS)
min
n
o
T
(Zc) x : AZx = 0, eT x = n, x ≥ 0 .
Note that the feasibility of z implies Az = 0, whence AZe = 0, showing that e is
feasible for the new problem. Thus we can use the procedure described in Section 15.3
to construct a new feasible point for the transformed problem so that the objective
value is reduced by a factor 1 − α/ (n − 1). The new point is obtained by minimizing
the objective over the inscribed sphere with radius αr:
min
15.5
n
o
T
(Zc) x : AZx = 0, eT x = n, kx − ek ≤ αr .
The projective algorithm
We can now describe the algorithm as follows.
294
IV Miscellaneous Topics
Projective Algorithm
Input:
An accuracy parameter ε > 0.
begin
x := e;
while cT x ≥ ε do
begin
n
o
z := argminξ (Xc)T ξ : AXξ = 0, eT ξ = n, kξ − ek ≤ αr ;
x := Tx (z);
end
end
As long as the objective value at the current iterate x is larger than the threshold
value ε, the problem is rescaled by the projective transformation Tx−1 . This makes the
all-one vector feasible. Then the new iterate z for the transformed problem is obtained
by minimizing the objective value over the inscribed sphere with radius αr. After this
the inverse of the map Tx−1 — that is Tx — is applied to z and we get a point that
is feasible for the original problem (P K) again. This is repeated until the objective
value is small enough. Figure 15.2 depicts one iteration of the algorithm.
c
✻
✠
Σn
xk
xk+1
✒
Figure 15.2
✠
✠
T −1
✲x
✛
Tx
e
optimal solution
✕Xc
Ax = 0
kξ − ek = αr
e
z
■
■
Σn
■
optimal solution
AXξ = 0
One iteration of the projective algorithm (x = xk ).
In the next section we derive an iteration bound for the algorithm. Unfortunately, the
analysis of the algorithm cannot be based on the reduction of the objective value in
each iteration. This is because the objective value is not preserved under the projective
transformation. This is the price we pay for the linearization of the nonlinear problem
IV.15 Karmarkar’s Projective Method
295
after each projective transformation. Here, again, Karmarkar proposed an elegant
solution. The progress of the method can be measured by a suitable potential function.
We introduce this function in the next section.
15.6
The Karmarkar potential
Karmarkar used the following potential function in the analysis of his method.
φK (x) = n log cT x −
n
X
log xi .
i=1
The usefulness of this function depends on two lemmas.
Lemma IV.2 If x ∈ Σn then
cT x ≤ exp
φK (x)
n
.
Proof: Since eT x = n, using the geometric-arithmetic-mean inequality we may write
n
X
i=1
log xi ≤ n log
Therefore
φK (x) = n log cT x −
which implies the lemma.
eT x
= n log 1 = 0.
n
n
X
i=1
log xi ≥ n log cT x,
✷
Lemma IV.3 Let x and z be positive vectors in Σn and y = Tx (z). Then
T
φK (x) − φK (y) = n log
(Xc) e
(Xc)T z
+
n
X
log zi .
i=1
Proof: First we observe that φK (x) is homogeneous of degree zero in x. In other
words, for each positive λ we have
φK (λx) = φK (x).
As a consequence we have
φK (y) = φK (Tx (z)) = φK
nxz
T
e (xz)
= φK (xz) ,
as follows by taking λ = n/eT (xz). Therefore,
n
φK (x) − φK (y) = φK (x) − φK (xz) = n log
X
xi
cT x
log
,
−
T
c (xz) i=1
xi zi
296
IV Miscellaneous Topics
from which the lemma follows.
✷
Applying the above lemma with z = e − αrp we can prove that each iteration of
the projective algorithm decreases the potential by at least 0.30685 when choosing α
appropriately.
Lemma IV.4 Taking α = 1/(1 + r), each iteration of the projective algorithm
decreases the potential function value by at least 1 − log 2 = 0.30685.
Proof: By Lemma IV.3, at any iteration the potential function value decreases by
the amount
n
(Xc)T e X
log zi .
∆ = n log
+
T
(Xc) z i=1
Recall that Xc is the objective vector in the transformed problem. Since the objective
value of the transformed problem is reduced by at least a factor 1 − αr/R and
z = e − αrp, we obtain
n
αr X
log (1 − αrpi ) .
+
∆ ≥ −n log 1 −
R
i=1
(15.2)
For the first term we write
αr αnr
αr
αr
αr
αr
−n log 1 −
+ψ −
+ nψ −
=n
=
= αr2 + nψ −
.
R
R
R
R
R
R
Here, and below we use the function ψ as defined in (5.5), page 92. The second term
in (15.2) can be written as
n
X
i=1
log (1 − αrpi ) = −αreT p −
n
X
i=1
ψ (−αrpi ) = −
n
X
ψ (−αrpi ) .
i=1
Here we have used the fact that eT p = 0. By the right-hand side inequality in (6.24),
on page 134, the above sum can be bounded above by ψ (−αr kpk). Since kpk = 1 we
obtain
αr
∆ ≥ αr2 + nψ −
− ψ (−αr) .
R
Omitting the second term, which is nonnegative, we arrive at
∆ ≥ αr2 − ψ (−αr) = αr2 + αr + log (1 − αr) .
The right-hand side expression is maximal if α = 1/ (1 + r). Substitution of this value
yields
r
∆ ≥ r + log 1 −
= r − log (1 + r) = ψ (r) .
1+r
p
Since r = n/(n − 1) > 1 we have ψ (r) > ψ (1) = 1−log 2, and the proof is complete.
✷
IV.15 Karmarkar’s Projective Method
15.7
297
Iteration bound for the projective algorithm
The convergence result is as follows.
Theorem IV.5 After no more than
cT e
n
log
ψ(1)
ε
iterations the algorithm stops with a feasible point x such that cT x ≤ ε.
Proof: After k iterations the iterate x satisfies
φK (x) − φK (e) < −kψ(1).
Since φK (e) = n log cT e,
φK (x) < n log cT e − kψ(1).
Using Lemma IV.2, we obtain
φK (x)
n log cT e − kψ(1)
cT x ≤ exp
< exp
.
n
n
The stopping criterion is thus certainly met as soon as
n log cT e − kψ(1)
≤ ε.
exp
n
Taking logarithms of both sides we get
n log cT e − kψ(1) ≤ n log ε,
or equivalently,
k≥
n
cT e
log
,
ψ(1)
ε
which yields the bound in the theorem.
15.8
✷
Discussion of the special format
The problem (P K) solved by the Projective Method of Karmarkar has a special format
that is called the Karmarkar format. Except for the so-called normalizing constraint
eT x = n, the constraints in (P K) are homogeneous. Furthermore, it is assumed that
the optimal value is zero and that some positive feasible vector is given.5 We may
5
In fact, Karmarkar assumed that the all-one vector e is feasible, but it is sufficient if some given
positive vector w is feasible. In that case we can use the projective transformation Tw−1 as defined
in Section 15.4, to transform the problem to another problem in the Karmarkar format and for
which the all-one vector is feasible.
298
IV Miscellaneous Topics
wonder how the Projective Method could be used to solve an arbitrary LO problem
that is not given in the Karmarkar format.6
Clearly problem (P K) is in the standard format and, since its feasible region is
contained in the unit simplex Σn in IRn , the feasible region is bounded. Finally, since
the all-one vector is feasible, (P K) satisfies the interior-point condition. In this section
we first show that a problem (P ) in standard format can easily be reduced to the Karmarkar format whenever the feasible region P of (P ) is bounded and the interior-point
condition is satisfied. Secondly, we discuss how a general LO problem can be put in
the format of (P K).
Thus, let the feasible region P of the standard problem
(P )
min cT x : Ax = b, x ≥ 0
be bounded and let it contain a positive vector. Now let the pair (ȳ, s̄) be optimal for
the dual problem
(D)
max bT y : AT y + s = c, s ≥ 0 .
Then we have, for any primal feasible x,
s̄T x = cT x − bT ȳ.
So s̄T x and cT x differ by the constant bT ȳ and hence the problem
(P ′ )
min s̄T x : Ax = b, x ≥ 0
has the same optimal set as (P ). Since s̄ is dual optimal, the optimal value of (P ′ ) is
zero. Since the feasible region P is bounded, we deduce from Corollary II.14 that the
row space of the constraint matrix A contains a positive vector. That is, there exists
a λ ∈ IRm such that
v := AT λ > 0.
Now, defining
ν := bT λ,
we have for any feasible x,
6
T
v T x = AT λ x = λT Ax = λT b = ν.
The first assumption on a known optimal value for a problem in the Karmarkar format was removed
by Todd and Burrell [266]. They used a simple observation that for any ζ, the objective cT x − ζ
is equivalent to (c − (ζ/n) e)T x. If ζ = ζ ∗ , the optimal value of problem (P K), the assumption
of a zero optimal value is verified for the problem with the new objective. If ζ < ζ ∗ , Todd and
Burrell were able to show that the algorithm allows an update of the lower bound ζ by a simple
linear ratio test after finitely many iterations; the overall procedure has the same complexity as the
original algorithm of Karmarkar. The second assumption of a known interior feasible solution was
removed by Ghellinck and Vial [95] by using a different projective embedding. They also used the
same parametrization as Todd and Burrell and thus produced the first combined phase I – phase
II interior-point algorithm, simultaneously resolving optimality and feasibility. They also pointed
out that the projective algorithm was truly a Newton method. The update of the bound in their
method is done by an awkward quadratic test. Fraley [79] was able to replace the quadratic test by
a simpler linear ratio test. To remain consistent with Part I of the book, we shall not dwell upon
those approaches, but rather use a homogeneous self-dual embedding, and analyze the behavior of
Karmarkar algorithm on the embedding problem.
IV.15 Karmarkar’s Projective Method
299
Since there exists a positive primal feasible x and v is positive, it follows that
ν = v T x > 0. We may write
νAx = νb = v T x b = b v T x = bv T x.
Hence,
Defining
νA − bv T x = 0.
A′ := νA − bv T ,
we conclude that
n
o
T
P = x : A′ x = 0, v T x = ν ,
and hence (P ′ ) can be reformulated as
(P ′ )
min s̄T x : A′ x = 0, v T x = ν, x ≥ 0 ,
where ν > 0. This problem can be rewritten as
n
o
T
(P ′′ )
min
s̄v −1 x̄ : A′ V −1 x̄ = 0, eT x̄ = n, x̄ ≥ 0 ,
where the new variable x̄ relates to the old variable x according to x̄ = nvx/ν. Since
(P ) satisfies the interior-point condition, this condition is also satisfied by (P ′ ). Hence,
the problem (P ′′ ) is not only equivalent to the given standard problem (P ), but
it satisfies all the conditions of the Karmarkar format: except for the normalizing
constraint the constraints are homogeneous, the optimal value is zero, and some
positive feasible vector is given. Thus we have shown that any standard primal problem
for which the feasible set is bounded has a representation in the Karmarkar format.7
Our second goal in this section is to point out that any given LO problem can
be transformed to a problem in the Karmarkar format. Here we use some results
from Chapter 2. First, the given problem can be put in the canonical format,
where all constraints are inequality constraints and the variables are nonnegative (see
Appendix D.1). Then we can embed the resulting canonical problem — and its dual
problem — in a homogeneous self-dual problem, as described in Section 2.5 (cf. (2.15)).
Thus we arrive at a problem of the form
min 0T x : M x ≥ 0, x ≥ 0 ,
where M is skew-symmetric (M = −M T ) and we need to find a strictly complementary
solution for this problem. We proceed by reducing this problem to the Karmarkar
format.
First we use the procedure described in Section 2.5 to embed the above self-dual
problem in a self-dual problem that satisfies the interior-point condition. As before,
let the vector r be defined by
r := e − M e.
7
It should be noted that this statement has only theoretical value; to reduce a given standard
problem with bounded feasible region to the Karmarkar format we need a dual feasible pair (ȳ, s̄)
with s̄ > 0; in general such a pair will not be available beforehand.
300
IV Miscellaneous Topics
Now consider the self-dual model in IRn+1 given by
"
#T " # "
#" # "
# " # " #
0
x
M r
x
0
0
x
:
+
≥
,
≥
0
.
min
n+1
ξ
−rT 0
ξ
n+1
0
ξ
Taking
(x, ξ) = (e, 1) ,
we get
"
M r
−rT 0
#"
x
ξ
#
+
"
0
n+1
#
=
"
Me + r
−rT e + n + 1
#
=
"
e
1
#
,
as can easily be verified. By introducing the surplus vector (s, η), we can write the
inequality constraints as equality constraints and get the equivalent problem
(
"
#" # " # "
# " # " #
)
M r
x
s
0
x
s
min ξ :
−
=
,
,
≥ 0 . (15.3)
−rT 0
ξ
η
−n − 1
ξ
η
We replaced the objective (n + 1)ξ by ξ; this is allowed since the optimal objective is
0. Note that the all-one vector ((x, ξ, s, η) = (e, 1, e, 1)) is feasible for (15.3) and the
optimal value is zero. When summing up all the constraints we obtain
eT M x + eT rξ − eT s − rT x − η = −n − 1.
Since r = e − M e and eT M e = 0, this reduces to
eT x + eT s + ξ + η = (n + 1)(1 + ξ).
(15.4)
We can replace the last equality constraint in (15.3) by (15.4). Thus we arrive at the
problem
x
x
#
#
"
"
M r −I 0 ξ
0
ξ
.
(15.5)
,
=
≥
0
min ξ :
s
eT 1 eT 1 s
(n + 1)(1 + ξ)
η
η
Instead of this problem we consider
x
"
#
"
# x
ξ
ξ
0
M r −I 0
.
≥
0
=
,
min ξ :
s
2(n + 1)
eT 1 eT 1 s
η
η
(15.6)
We established above that the all-one vector is feasible for (15.5); obviously this implies
that the all-one vector is also feasible for (15.6). It follows that the problem (15.6)
is in the Karmarkar format and hence it can be solved by the projective method.
Any optimal solution (x∗ , ξ ∗ , s∗ , η ∗ ) of (15.6) has ξ ∗ = 0. It is easily verified that
(x∗ , ξ ∗ , s∗ , η ∗ ) /2 is feasible for (15.5) and also optimal.
IV.15 Karmarkar’s Projective Method
301
Thus we have shown how any given LO problem can be embedded into a problem
that has the Karmarkar format and for which the all-one vector is feasible. We should
note however that solving the given problem by solving the embedding problem
requires a strictly complementary solution of the embedding problem. Thus we are
left with an important question, namely, does the Projective Method yield a strictly
complementary solution? A positive answer to this question has been given by
Muramatsu and Tsuchiya [223]. Their proof uses the fact that there is a close relation
between Karmarkar’s method and the primal affine-scaling method of Dikin8 when
applied to the homogeneous problem obtained by omitting the normalizing constraint
in the Karmarkar format. The next two sections serve to highlight this relation. We
first derive an explicit expression for the search direction in the Projective Method. The
result is that this direction can be interpreted as a primal logarithmic barrier direction
for the homogeneous problem. Then we show that the homogeneous problem has
optimal value zero and that any strictly complementary solution of the homogeneous
problem yields a solution of the Karmarkar format.
15.9
Explicit expression for the Karmarkar search direction
It may be surprising that in the discussion of Karmarkar’s approach there is no mention
of some issues that were crucial in the methods discussed in the rest of this book. The
most striking example of this is the complete absence of the central path in Karmarkar’s approach. Also, whereas the search direction in all the other methods is
obtained by applying Newton’s method — either to a logarithmic barrier function
or to the centering conditions — the search direction in the Projective Method is
obtained from a different perspective. The aim of this section is to derive an explicit
expression for the search direction in the Projective Method. In this way we establish
a surprising relation with the Newton direction in the primal logarithmic method for
the homogeneous problem arising when the normalizing constraint in the Karmarkar
format is neglected.
Let x be a positive vector that is feasible for (P K). Recall from Section 15.5 that
the new iterate x+ in the Projective Algorithm is obtained from x+ = Tx (z) where
n
o
T
z = argminξ (Xc) ξ : AXξ = 0, eT ξ = n, kξ − ek ≤ αr .
Here r denotes the radius of the maximal inscribed sphere in the simplex Σn and α is
the step-size. From this we can easily derive that9
z = e + α∆z,
where
∆z = argmin∆ξ
8
9
n
o
T
(Xc) ∆ξ : AX∆ξ = 0, eT ∆ξ = 0, k∆ξk = r .
For a brief description of the primal affine-scaling method of Dikin we refer to the footnote on
page 339.
We assume throughout that cT x is not constant on the feasible region of (P K). With this
assumption the vector z is uniquely defined.
302
IV Miscellaneous Topics
By writing down the first-order optimality conditions for this minimization problem
we obtain
AX∆z
eT ∆z
=
=
0
0
Xc
k∆zk
=
=
XAT y + σe + η∆z
r,
(15.7)
where σ, η ∈ IR and y ∈ IRm . Multiplying the third equation from the left by AX and
using the first equation and
AXe = Ax = 0,
(15.8)
we get
AX 2 c = AX 2 AT y,
whence
y = AX 2 AT
−1
AX 2 c.
Substituting this in the third equation of (15.7) gives
σe + η∆z
=
=
=
−1
AX 2 c
Xc − XAT AX 2 AT
−1
T
I − (AX) AX 2 AT
AX Xc
PAX (Xc) .
(15.9)
Taking the inner product with e on both sides, while using eT ∆z = 0 and eT e = n,
we get
nσ = eT PAX (Xc) .
Since AXe = 0, according to (15.8), e belongs to the null space of AX and hence
PAX (e) = e.
(15.10)
Using this we write
T
T
eT PAX (Xc) = (Xc) PAX (e) = (Xc) e = cT x.
(15.11)
Thus we obtain nσ = cT x. Substituting this in (15.9) we get
cT x
cT x
e = PAX Xc −
e .
η∆z = PAX (Xc) −
n
n
The second equality follows by using (15.10) once more. Up to its sign, the value of
the factor η now follows from k∆zk = r. This implies
T
PAX Xc − c nx e
.
∆z = ±r
(15.12)
T
PAX Xc − c nx e
Here we assumed that the vector
cT x
cT x
e = PAX (Xc) −
e
χ := PAX Xc −
n
n
(15.13)
IV.15 Karmarkar’s Projective Method
303
is nonzero. This is indeed true. We leave this fact as an exercise to the reader.10The
T
sign in (15.12) follows by using that we are minimizing (Xc) ∆z. So we must have
T
(Xc) ∆z ≤ 0. In this respect the following observation is crucial. By using the
Cauchy–Schwarz inequality we may write
√
T
T
cT x = (Xc) e = (Xc) PAX (e) = eT PAX (Xc) ≤ n kPAX (Xc)k .
Note that this inequality holds with equality only if PAX (Xc) is a scalar multiple of e.
This would imply that ∆z is a scalar multiple of e. Since eT ∆z = 0 and k∆zk = r > 0
this case cannot occur. Thus we obtain
cT x
kPAX (Xc)k > √ .
n
As a consequence,
cT x
T
e
(Xc) PAX Xc −
n
=
=
cT x
T
(Xc) PAX (e)
n
2
cT x
2
kPAX (Xc)k −
> 0.
n
T
(Xc) PAX (Xc) −
T
We conclude from this that (Xc) ∆z ≤ 0 holds only for the minus sign in (15.12).
Thus we find
rχ
∆z = −
.
(15.14)
kχk
We proceed by deriving an expression for x+ . We have
nx (e + α∆z)
nx (e + α∆z)
nxz
=x+
−
x
.
x+ = Tx (z) = T = T
x z
x (e + α∆z)
xT (e + α∆z)
So the displacement in the x-space is given by the expression between the brackets.
This expression can be reduced as follows. We have
nx (e + α∆z)
nx (e + α∆z) − xT (e + α∆z) x
nx (∆z) − xT (∆z) x
−x=
=α
.
T
T
x (e + α∆z)
x (e + α∆z)
xT (e + α∆z)
Here we used that eT x = n. Hence we may write
x+ = x + α∆x,
where
∆x =
nx (∆z) − xT (∆z) x
.
xT (e + α∆z)
(15.15)
Using (15.14) the enumerator in the last expression can be reduced as follows:
rx
nrxχ r xT χ x
T
xT χ e − nχ .
+
=
nx (∆z) − x (∆z) x = −
kχk
kχk
kχk
10
Exercise 79 Show that the assumption (15.1) implies that cT x is positive on the (relative) interior
of the feasible region of (P K). Derive from this that the vector χ is nonzero, for any feasible x
with x > 0.
304
IV Miscellaneous Topics
Using the definition (15.13) of χ and eT x = n, we may write
xT χ e − nχ
=
xT PAX (Xc) − cT x e − nPAX (Xc) + cT x e
=
xT PAX (Xc) e − nPAX (Xc)
Xc
=
nµ PAX e −
µ
where
µ=
So we have
xT PAX (Xc)
.
n
rnµ
XPAX
nx (∆z) − x (∆z) x =
kχk
T
(15.16)
Xc
e−
.
µ
Substituting this relation in the above expression (15.15) for ∆x gives
rnµ
Xc
∆x =
XPAX e −
kχk xT (e + α∆z)
µ
(15.17)
Thus we have found an explicit expression for the search direction ∆x used in the
Projective Method of Karmarkar.11 Note that this direction is a scalar multiple of
Xc
−XPAX
−e
µ
and that this is precisely the primal logarithmic barrier direction12 at x for the barrier
parameter value µ, given by (15.16), for the homogeneous problem
(P KH)
min cT x : Ax = 0, x ≥ 0 .
Note also that problem (P KH) arises when the normalizing constraint in (P K) is
neglected. We consider the problem (P KH) in more detail in the next section.
15.10
The homogeneous Karmarkar format
In this section we want to point out a relation between the primal logarithmic barrier
method when applied to the homogeneous problem (P KH) and the Projective Method
of Karmarkar. It is assumed throughout that (P K) satisfies the assumptions of the
Karmarkar format. Recall that (P KH) is given by
(P KH)
min cT x : Ax = 0, x ≥ 0 .
We first show that the optimal value of (P KH) is zero. Otherwise there exists a
nonnegative vector x satisfying Ax = 0 such that cT x < 0. But then
nx
Te (x) = T
e x
11
12
Show that cT ∆x, with ∆x given by (15.17), is negative if and only if
cT x xT PAX (Xc) > n kPAX (Xc)k2 .
See Remark III.20 on page 271.
IV.15 Karmarkar’s Projective Method
305
is feasible for (P K) and satisfies cT Te (x) < 0, contradicting the fact that the optimal
value of (P K) is zero. The claim follows.13
It is clear that any optimal solution of (P K) is nonzero and optimal for (P KH).
So (P KH) will have a nonzero optimal solution x. Now, if x is optimal then λx is
optimal as well for any nonnegative λ. Therefore, since (P KH) has a nonzero optimal
solution, the optimal set of (P KH) is unbounded. This implies, by Corollary II.12
(page 102), that the dual problem (DKH) of (P KH), given by
(DKH)
max 0T y : AT y + s = c, s ≥ 0 ,
does not contain a strictly feasible solution. Thus, (DKH) cannot satisfy the interiorpoint condition. As a consequence, the central paths of (P KH) and (DKH) do not
exist.
Note that any nonzero feasible solution x of (P KH) can be rescaled to Te (x) so
that it becomes feasible for (P K). All scalar multiples λx, with λ ≥ 0, are feasible for
(P KH), so we have a one-to-one correspondence between feasible solutions of (P K)
and feasible rays in (P KH). Therefore, we can neglect the normalizing constraint
in (P K) and just look for a nonzero optimal solution of (P KH). The behavior of
the affine-scaling direction on (P KH) has been carefully analyzed by Tsuchiya and
Muramatsu [273]. The results of this paper form the basis of the paper [223] by
the same authors in which they prove that the Projective Method yields a strictly
complementary solution of (P K).14
13
A different proof of the claim can be obtained as follows. The dual problem of (P K) is
(DK)
max
0T y + nζ : AT y + ζe + s = c, s ≥ 0 .
This problem has an optimal solution and, due to Karmarkar’s assumption, its optimal value is
zero. Thus it follows that (y, ζ) is optimal for (DK) if and only if ζ = 0 and y is an optimal solution
of
(DKH)
max 0T y : AT y + s = c, s ≥ 0 .
By dualizing (DKH) we regain the problem (P KH), and hence, by the duality theorem the optimal
value of (P KH) must be zero.
14
Exercise 80 Let x be feasible for (P KH) and positive and let µ > 0. Then, defining the number
δ(x, µ) by
n xs
o
δ(x, µ) := min
− e : AT y + s = c ,
µ
y,s
we have δ(x, µ) ≥ 1. Prove this.
16
More Properties of the Central
Path
16.1
Introduction
In this chapter we reconsider the self-dual problem
(SP )
min
T
q x : M x ≥ −q, x ≥ 0 ,
where the matrix M is of size n × n and skew-symmetric, and the vector q is
nonnegative.
We assume that the central path of (SP ) exists, and our aim is to further investigate
the behavior of the central path, especially as µ tends to 0. As usual, we denote the
µ-center by x(µ) and its surplus vector by s(µ) = s (x(µ)). From Theorem I.30 (on
page 45) we know that the central path converges to the analytic center of the optimal
set SP ∗ of (SP ). The limit point x∗ and s∗ := s(x∗ ) form a strictly complementary
optimal solution pair, and hence determine the optimal partition of (SP ), which is
denoted by π = (B, N ).
We first deal with the derivatives of x(µ) and s(µ) with respect to µ. In the next
section we prove their existence. In Section 16.2.2 we show that the derivatives are
bounded, and we also investigate the limits of the derivatives when µ approaches zero.
In a final section we show that there exist two homothetic ellipsoids that are centered
at the µ-center and which respectively contain, and are contained in, an appropriate
level set of the objective value q T x.
16.2
Derivatives along the central path
16.2.1
Existence of the derivatives
A fundamental result in the theory of interior point methods is the existence and
uniqueness of the solution of the system
F (w, x, s) =
"
Mx + q − s
xs − w
#
= 0,
x ≥ 0, s ≥ 0
308
IV Miscellaneous Topics
for all positive w.1 The solution is denoted by x(w) and s(w).
Remark IV.6 It is possible to give an elementary proof of the fact that the equation
F (w, x, s) = 0 cannot have more than one solution. This goes as follows. Let x1 , s1 and x2 , s2
denote two solutions of the equation. Define ∆x := x2 − x1 , and ∆s := s2 − s1 . Then it
follows from M x1 + s1 = M x2 + s2 = q that
M ∆x = ∆s.
Since M is skew symmetric it follows that ∆xT ∆s = ∆xT M ∆x = 0, so
eT (∆x∆s) = 0.
1 1
2 2
x1j
x2j
From x s = x s = w we derive that if
=
holds for some j then also
versa. In other words,
(∆x)j = 0 ⇔ (∆s)j = 0, j = 1, · · · , n.
(16.1)
s1j
=
s2j ,
and vice
(16.2)
Also, if x1j ≤ x2j for some j, then s1j ≥ s2j , and if x1j ≥ x2j then s1j ≤ s2j . Thus we have
(∆x)j (∆s)j ≤ 0, j = 1, · · · , n.
Using (16.1) we obtain
(∆x)j (∆s)j = 0, j = 1, · · · , n.
This, together with (16.2) yields that (∆x)j = 0 and (∆s)j = 0, for each j. Hence we conclude
that ∆x = ∆s = 0. Thus we have shown x2 = x1 and s2 = s1 , proving the claim.2
•
With z = (x, s), the gradient matrix (or Jacobian) of F (w, x, s) with respect to z is
#
"
M −I
,
∇z F (w, x, s) =
S X
where S and X are the diagonal matrices corresponding to x and s, respectively. This
matrix is independent of w and depends continuously on x, s and is nonsingular (cf.
Exercise 8). Hence we may apply the implicit function theorem.3 Since F (w, x, s) is
infinitely many times differentiable the same is true for x(w) and s(w), and we have
# "
#−1 " #
"
∂x
M −I
0
∂w
=
∂s
S X
I
∂w
On the central path we have w = µe, with µ ∈ (0, ∞). Let us consider the more
general situation where w is a function of a parameter t, such that w(t) > 0 for all
t ∈ T with T an open interval T ⊆ IR. Moreover, we assume that w is in the class C ∞
of infinitely differentiable functions. Then the first-order derivatives of x(t) and s(t)
with respect to t are given by
"
# "
#−1 "
#
x′ (t)
M
−I
0
=
.
(16.3)
s′ (t)
S(t) X(t)
w′ (t)
1
2
3
This result follows from Theorem II.4 if w = µe. For arbitrary w > 0 a proof similar to that of
Theorem III.1 can be given.
A more general result, for the case where M is a so-called P0 -matrix, is proven in Kojima et
al. [175].
Cf. Proposition A.2.
IV.16 More Theoretical Results
309
Changing notation, and denoting x′ (t) by x(1) , and similar for s and w, using induction
we can easily obtain the higher-order derivatives. Actually, we have
"
x(k)
s(k)
#
=
"
M
−I
S(t) X(t)
where
w̃ = w(k) −
k−1
X
i=1
k
i
#−1 "
0
w̃
#
x(k−i) s(i) .
If w is analytic in t then so are x and s.4
When applying the above results to the case where x = x(µ) and s = s(µ), with
µ ∈ (0, ∞), it follows that all derivatives with respect to µ exist and that x and s
depend analytically on µ.
16.2.2
Boundedness of the derivatives
Recall that the point x(µ) and its surplus vector s(µ) are uniquely determined by the
system of equations
Mx + q
xs
=
=
s, x ≥ 0, s ≥ 0,
µe.
(16.4)
Taking derivatives with respect to µ in (16.4) we find, as a special case of (16.3),
M ẋ
xṡ + sẋ
=
=
ṡ
e.
(16.5)
The derivatives of x(µ) and s(µ) with respect to µ are now denoted by ẋ and ṡ
respectively. In this section we derive bounds for the derivatives.5 These bounds are
used in the next section to study the asymptotic behavior of the derivatives when µ
approaches zero. Since we are interested only in the asymptotic behavior, we assume
in this section that µ is bounded above by some fixed positive number µ̄.
Table 16.1. (page 310) summarizes some facts concerning the order of magnitude of
the components of various vectors of interest. We are interested in the dependence on
µ. All other problem dependent data (like the condition number σSP , the dimension
n of the problem, etc.) are considered as constants in the analysis below.
From Table 16.1. we read that, e.g., xB (µ) = Θ(1) and ẋN (µ) = O(1). For the
meaning of the symbols Θ and O we refer to Section 1.7.4. See also page 190. It is
important to stress that the constants hidden in the order symbols are independent
4
This follows from an extension of the implicit function theorem. We refer the reader to, e.g.,
Fiacco [76], Theorem 2.4.2, page 36. See also Halická [137], Wechs [290] and Zhao and Zhu [321].
5
We restrict ourselves to first-order derivatives. The asymptotic behavior of the derivatives has been
considered by Adler and Monteiro [3], Witzgall, Boggs and Domich [294] and Ye et al. [313]. We
also mention Güler [131], who also considers the higher-order derivatives and their asymptotic
behavior, both when µ goes to zero and when µ goes to infinity. A very interesting result in his
paper is that all the higher-order derivatives vanish if µ approaches infinity, which indicates that
the central path is asymptotically linear at infinity.
310
IV Miscellaneous Topics
Vector
Table 16.1.
B
N
1
x(µ)
Θ(1)
Θ(µ)
2
s(µ)
Θ(µ)
3
d(µ)
Θ( √1µ )
Θ(1)
√
Θ( µ)
4
ẋ(µ)
O(1)
O(1)
5
ṡ(µ)
O(1)
O(1)
Asymptotic orders of magnitude of some relevant vectors.
of the vectors x, s and of the value µ of the barrier parameter. They depend only on
the problem data M and q and the upper bound µ̄ for µ.
The statements in the first two lines of the table almost immediately follow from
Lemma I.43 on page 57. For example, for i ∈ B the lemma states nxi (µ) ≥ σSP , where
σSP is the condition number of (SP ). This means that xi (µ) is bounded below by a
constant. But, since xi (µ) is bounded on the finite section 0 < µ ≤ µ̄ of the central
path, as a consequence of Lemma I.9 (page 24), xi (µ) is also bounded above by a
constant. This justifies the statement xi (µ) = Θ(1). Since, xi (µ)si (µ) = µ, this also
implies si (µ) = Θ(µ). This explains the first two lines for the B-part. The estimates
for the N -parts of xi (µ) and si (µ) are derived in the same way.
The third line shows order estimates for the vector d(µ), given by
s
x(µ)
.
d(µ) =
s(µ)
These estimates immediately follow from the first two lines of the table. It remains to
deal with the last two lines in the table, which concern the derivatives.
In the rest of this section we omit the argument µ and write simply x instead of
x(µ). This gives no rise to confusion. We start
√ by writing the second equation in (16.5)
in a different way. Dividing both sides by xs, and using that xs = µe we get
e
dṡ + d−1 ẋ = √ .
µ
(16.6)
Note that the orthogonality of ẋ and ṡ — which is immediate from the first equation
in (16.5) since M is skew-symmetric — implies that the vectors dṡ and d−1 ẋ are
orthogonal as well. Hence we have
kdṡk2 + d−1 ẋ
Consequently
√
n
kdṡk ≤ √ ,
µ
2
e
= √
µ
−1
d
2
=
n
.
µ
√
n
ẋ ≤ √ .
µ
IV.16 More Theoretical Results
This implies
311
√
n
≤
d−1
ẋ
√ .
N N
µ
√
The third line in the table gives dB = Θ(1/ µ). This together with the left-hand side
inequality implies ṡB = O(1). Similarly, the right-hand side inequality implies that
ẋN = O(1). Thus we have settled the derivatives of the small coordinates.
It remains to deal with the estimates for the derivatives of the large coordinates,
ẋB and ṡN . This is the harder part. We need to characterize the scaled derivatives dṡ
and d−1 ẋ in a different way.
√
n
kdB ṡB k ≤ √ ,
µ
Lemma IV.7 Let x̃ be any vector in IRn and s̃ = s(x̃). Then
"
#
"
#
1
d−1 ẋ
ds̃
= P(MD −D−1 )
.
µ
dṡ
d−1 x̃
Here P(MD −D−1 ) denotes the orthogonal projection onto the null space of the matrix
M D − D−1 , where D is the diagonal matrix of d.6
Proof:
Letting I denote the identity matrix of size n × n, we multiply both sides
in (16.6) by DM D − I. This gives
e
(DM D − I) dṡ + d−1 ẋ = (DM D − I) √ .
µ
By expanding the products we get
e
e
DM ẋ − dṡ + DM D2 ṡ − d−1 ẋ = DM D √ − √ .
µ
µ
With M ẋ = ṡ this simplifies to
e
e
DM D2 ṡ − d−1 ẋ = DM D √ − √ ,
µ
µ
and this can be rewritten as
e
√ − d−1 ẋ
µ
=
=
6
e
DM D √ − DM D2 ṡ = DM D
µ
e
T
−DM D √ − dṡ .
µ
e
√ − dṡ
µ
Exercise 81 Using the notation of Lemma IV.7, let x̃ run through all vectors in IRn . Then the
vector
ds̃
d−1 x̃
runs through an affine space parallel to the row space of the matrix
intersects the null space of
a vector x̃ in IRn such that
MD
−D −1
µẋs(µ)
µṡx(µ)
where s̃ = s(x̃).
M D −D −1
. This space
in a unique point. Using this, prove that there exists
=
x(µ)s̃
=
s(µ)x̃,
312
IV Miscellaneous Topics
Using this we write
"
#
√e − d−1 ẋ
µ
√e
µ
− dṡ
"
#
−DM T
e
D
−
d
ṡ
√
µ
D−1
h
iT e
− M D −D−1
D √ − dṡ .
µ
=
=
This shows that
the vector on the left belongs to the row space of the matrix
M D − D−1 . Observing that, on the other hand,
"
#
i d−1 ẋ
h
= 0,
M D −D−1
dṡ
which means that the vector of scaled derivatives
"
#
d−1 ẋ
dṡ
(16.7)
belongs to the null space of the matrix M D − D−1 , we conclude that the vector
(16.7) can be characterized as the orthogonal projection of the vector
" e #
√
µ
√e
µ
into the null space of M D − D−1 . In other words,
" e #
"
#
√
d−1 ẋ
µ
.
= P(MD −D−1 )
√e
dṡ
µ
p
√
Since xs = µe, we may replace the vector e by xs/µ. Now using that xs = ds =
d−1 x, we get
"
#
"
#
1
d−1 ẋ
ds
= P(MD −D−1 )
.
µ
dṡ
d−1 x
Finally, let x̃ be any vector in IRn and s̃ = s(x̃). Then we may write
"
# "
#
"
# "
#
ds̃
ds
d (s̃ − s)
DM (x̃ − x)
−
=
=
d−1 x
d−1 (x̃ − x)
d−1 (x̃ − x)
d−1 x̃
"
#
h
iT
−DM T (x̃ − x)
−1
=
=
−
(x̃ − x) .
M
D
−D
d−1 (x̃ − x)
The last vector is in the row space of
P(MD −D−1 )
"
h
ds
d−1 x
i
M D −D−1 , and hence we have
#
= P(MD −D−1 )
"
ds̃
d−1 x̃
#
,
IV.16 More Theoretical Results
313
proving the lemma.
✷
Using Lemma IV.7 with x̃ = x∗ and s̃ = s∗ we have
"
"
#
# 2
" #
h
i h
d−1 ẋ
ds∗ − h
−1
µ
= argminh,k∈IRn
:
=
0
.
M
D
−D
dṡ
d−1 x∗ − k
k
Hence, the unique solution of the above least squares problem is given by
h = µd−1 ẋ,
k = µdṡ.
The left-hand side of the constraint in the problem can be split as follows:
"
"
#
#
i h
i h
h
h
N
B
−1
−1
= 0.
+ MN DN −DB
MB DB −DN
kB
kN
Substituting the optimal values for hN and kB we find that hB and kN need to satisfy
#
#
"
"
h
i d−1 ẋ
h
i h
N
B
N
−1
−1
=
−µ MN DN −DB
MB DB −DN
dB ṡB
kN
#
"
h
i ẋ
N
.
=
−µ MN −IB
ṡB
Since x∗N = 0 and s∗B = 0 we obtain the following characterization of the derivatives
for the large coordinates:
"
#
d−1
B ẋB
µ
=
dN ṡN
"
#
"
#
"
# 2
h
i ẋ
h
i h
h
N
B
B
−1
.
= −µ MN −IB
: MB DB −DN
argminhB ,kN
kN
ṡB
kN
(16.8)
Now let z = (zB , zN ) be the least norm solution of the equation
"
#
"
#
h
i z
h
i ẋ
B
N
= −µ MN −IB
.
MB −IN
zN
ṡB
Then we have
"
zB
zN
#
h
= −µ MB −IN
i+ h
MN −IB
i
"
ẋN
ṡB
#
(16.9)
h
i+
h
i
where MB −IN
denotes the pseudo-inverse7 of the matrix MB −IN . It is
obvious that
−1
hB = dB
zB , kN = dN zN
7
See Appendix B.
314
IV Miscellaneous Topics
is feasible for (16.8). It follows that
#
#
"
"
−1
µd−1
ẋ
d
z
B
B
B
B
.
≤
µdN ṡN
dN zN
√
√
From Table 16.1. we know that d−1
B = Θ( µ) and dN = Θ( µ), so it follows that
#
"
#
"
zB
µd−1
√
B ẋB
.
≤ Θ( µ)
zN
µdN ṡN
Moreover, we have already established that
ẋN = O(1),
ṡB = O(1).
Hence, using also (16.9),
z = µ O(1),
where the constant in the order symbol now also contains the norm of the matrix
+
(MB − IN ) (MN − IB ). Note that this matrix, and hence also its norm, depends
only on the data of the problem. Substitution yields, after dividing both sides by µ,
"
#
d−1
√
B ẋB
= Θ( µ) O(1).
dN ṡN
√
√
Using once more d−1
B = Θ( µ) and dN = Θ( µ), we finally obtain
"
#
ẋB
= O(1),
ṡN
completing the proof of the estimates in the table.
16.2.3
Convergence of the derivatives
Consider the second equation in (16.5):
xṡ + sẋ = e.
Recall that ẋ and ṡ are orthogonal. Since xs = µe, the vectors xṡ and sẋ are orthogonal
as well, so this equation represents an orthogonal decomposition of the all-one vector
e. It is interesting to consider this decomposition as µ goes to zero. This is done in the
next theorem. Its proof uses the results of the previous section, which are summarized
in Table 16.1..
Theorem IV.8 If µ approaches zero, then xṡ and sẋ converge to complementary
{0, 1}-vectors. The supports of their limits are B and N , respectively.
Proof: For each index i we have
xi ṡi + si ẋi = 1.
IV.16 More Theoretical Results
315
Now let i ∈ B and let µ approach zero. Then si → 0. Since ẋi is bounded, it follows
that si ẋi → 0. Therefore, xi ṡi → 1. Similarly, if i ∈ N , then xi → 0. Since ṡi is
bounded, xi ṡi → 0 and hence si ẋi → 1. This implies the theorem.
✷
The next theorem is an immediate consequence of Theorem IV.8 and requires no
further proof. It establishes that the derivatives of the small variables converge if µ
approaches zero.8
Theorem IV.9 We have limµ↓0 ẋN = (s∗N )−1 and limµ↓0 ṡB = (x∗B )−1 .
16.3
✷
Ellipsoidal approximations of level sets
In this section we discuss another property of µ-centers. Namely, that there exist two
homothetic ellipsoids that are centered at the µ-center and which respectively contain,
and are contained in, an appropriate level set of the objective function q T x. In this
section we keep µ > 0 fixed.
For any K ≥ 0 we define the level set
MK := x : x ≥ 0, s(x) = M x + q ≥ 0, q T x ≤ K .
(16.10)
Since q T x(µ) = nµ, we have x(µ) ∈ MK if and only if nµ ≤ K. Note that M0
represents the set of optimal solutions of (SP ), since q T x ≤ 0 if and only if q T x = 0.
Hence M0 = SP ∗ .
For any number r ≥ 0 we also define the ellipsoid
)
(
2
2
s(x)
x
2
−e +
−e ≤r .
E(µ, r) := x :
x(µ)
s(µ)
Note that the norms in the defining inequality of this ellipsoid vanish if x = x(µ), so
the analytic center x(µ) is the center of the ellipsoid E(µ, r).
Theorem IV.10 E(µ, 1) ⊆ Mµ(n+√n) and M0 ⊆ E(µ, n).
Proof: Assume x ∈ E(µ, 1). We denote s(x) simply by s. To√prove the first inclusion
we need to show that x ≥ 0, s = s(x) ≥ 0 and q T x ≤ µ (n + n).
To simplify the notation we make use once more of the vectors hx and hs introduced
in Section 6.9.2, namely
x
s
hx =
− e, hs =
− e,
(16.11)
x(µ)
s(µ)
or equivalently,
hx =
8
x − x(µ)
,
x(µ)
hs =
s − s(µ)
.
s(µ)
Theorem IV.9 gives only the limiting values of the derivatives of the small variables and says nothing
about convergence of the derivatives for the large coordinates. For this we refer to Güler [131], who
shows that all derivatives converge when µ approaches zero along a weighted path. In fact, he
extends this result to all higher-order derivatives and he gets similar results for the case where µ
approaches infinity.
316
IV Miscellaneous Topics
Obviously, hx and hs are orthogonal. Hence, defining
h := hx + hs ,
we find
khk2 = khx k2 + khs k2 ≤ 1.
Hence khx k ≤ 1. We easily see that this implies x√≥ 0. Similarly, khs k ≤ 1 implies
s ≥ 0. Thus it remains to show that q T x ≤ µ (n + n). Since
xs
xs
=
,
(hx + e) (hs + e) =
x(µ)s(µ)
µ
and on the other hand
(hx + e) (hs + e) = hx hs + hx + hs + e = hx hs + h + e,
we get
xs
− e − hx hs .
(16.12)
µ
Taking the inner product of both sides with the all-one vector, while using once more
that hx and hs are orthogonal, we arrive at
h=
eT h =
This gives
xT s
qT x
− eT e =
− n.
µ
µ
(16.13)
q T x = µ n + eT h .
Finally, applying the Cauchy–Schwarz inequality to eT h, while using khk ≤ 1, we get
√
q T x ≤ µ (n + kek) = µ n + n ,
proving the first inclusion in the theorem.
To prove the second inclusion, let x be optimal for (SP ). Then q T x = 0, and hence,
from (16.13), eT h = −n. Since x ≥ 0 and s ≥ 0, (16.11) gives hx ≥ −e and hs ≥ −e.
Thus we find h ≥ −2e. Now consider the maximization problem
n
o
2
max khk : eT h = −n, h ≥ −2e ,
(16.14)
h
and let h̄ be a solution of this problem. Then, for arbitrary i and j, with 1 ≤ i < j ≤ n,
h̄i and h̄j solve the problem
max h2i + h2j : hi + hj = h̄i + h̄j , hi ≥ −2, hj ≥ −2 ,
hi ,hj
We easily understand that this implies that either hi = −2 or hj = −2. Thus, h̄
must have n − 1 coordinates equal to −2 and the remaining coordinate equal to
−n − (n − 1)(−2) = n − 2, and hence,
h̄
2
= (n − 1)4 + (n − 2)2 = n2 .
Therefore, khk ≤ n. This means that x ∈ E(µ, n), and hence the theorem follows.9 ✷
9
Exercise 82 Using the notation of this section, prove that
Mnµ ⊆ E(µ, 2n).
17
Partial Updating
17.1
Introduction
In this chapter we deal with a technique that can be applied to almost
√ every
interior-point algorithm to enhance the theoretical efficiency by a factor n. The
technique is called partial updating, and was introduced by Karmarkar in [165]. His
projective algorithm, as presented in Chapter 15, needs O(nL) iterations and O(n3 )
arithmetic operations per iteration. Thus, in total the projective algorithm requires
O(n4 L) arithmetic operations. Karmarkar showed that this complexity bound can
be reduced to O(n3.5 L) arithmetic operations by using partial updating. It has since
become apparent that the same technique can be applied to many other interior-point
√
algorithms with the same effect: a reduction of the complexity bound by a factor n.1
The partial updating technique can be described as follows. In an interior-point
method for solving the problems (P ) and (D) — in the standard format of Part II
— each search direction is obtained by solving a linear system involving a matrix of
the form AD2 AT , where the scaling matrix D = diag (d) is a positive diagonal matrix
depending on the method. In a primal-dual method we have D2 = XS −1 , in a primal
method D = X, and in a dual method D = S −1 . The matrix D varies from iteration
to iteration, due to the variations in x and/or s. We assume that A is m × n with
rank m. Without partial updating the computation of the search directions requires
at each iteration O(n3 ) arithmetic operations for factorization of the matrix AD2 AT
and only O(n2 ) operations for all the other required arithmetic operations.
Although the matrix AD2 AT varies from iteration to iteration, it seems reasonable
to expect that the variations are not too large, and that the matrix at the next iteration
is related in some sense to the current matrix. In other words, the calculation of the
search direction in the next iteration might benefit from earlier calculations. In some
way, that goal is achieved by the use of partial updating.
To simplify the discussion we assume for the moment that at some iteration the
scaling matrix is the identity matrix I and at the next iteration D. Then, if ai denotes
the i-th column of A, we may write
AD2 AT =
n
X
i=1
1
d2i ai aTi =
n
X
i=1
ai aTi +
n
X
i=1
d2i − 1 ai aTi .
See for example Anstreicher [20], Anstreicher and Bosch [25, 26], Bosch [46], Bosch and Anstreicher [47], den Hertog, Roos and Vial [146], Gonzaga [118], Kojima, Mizuno and Yoshise [177],
Mehrotra [204], Mizuno [213], Monteiro and Adler [219], Roos [240], Vaidya [276] and Ye [306].
318
IV Miscellaneous Topics
Hence
AD2 AT = AAT +
n
X
i=1
d2i − 1 ai aTi ,
showing that AD A arises by adding the n rank-one matrices d2i − 1 ai aTi to AAT .
Now consider the hypothetical situation that di = 1 for every i, except for i = 1. Then
we have
AD2 AT = AAT + d21 − 1 a1 aT1
2
T
and AD2 AT is a so-called rank-one modification of AAT . By the well known ShermanMorrison formula2 we then have
−1
−1
−1
−1
AAT
a1 aT1 AAT
.
AD2 AT
= AAT
− d21 − 1
−1
1 + (d21 − 1) aT1 (AAT ) a1
This expression makes clear that the inverse of AD2 AT is equal to the inverse of AAT
plus a scalar multiple of the rank-one matrix vv T , where
v = AAT
−1
a1 .
−1
−1
We say that AD2 AT
is a rank-one update of AAT
. The computation of a
rank-one update requires O(n2 ) arithmetic operations, as may easily be verified.
In the general situation, when all the entries of d differ from 1, the inverse of the
matrix AD2 AT can be obtained by applying n rank-one updates to the inverse of
AAT . This still requires O(n3 ) arithmetic operations.
The underlying idea for the partial updating technique is to perform only those
rank-one updates that correspond to coordinates i of d for which d2i − 1 exceeds
some threshold value. A partial updating algorithm maintains an approximation d˜
e 2 AT instead of AD2 AT ; the value of d˜i is updated to its correct
of d and uses AD
value if it deviates too much from di . Each update of an entry in d˜ necessitates
e 2 AT . But each such modification
modification of the inverse (or factorization) of AD
can be accomplished by a rank-one update, and this requires only O(n2 ) arithmetic
operations.3 The success of the partial updating technique comes from the fact that
it can reduce
the total number of rank-one updates in the course of an algorithm by
√
a factor n.
The analysis of an interior-point algorithm with partial updating consists of two
parts. First we need to show that the modified search directions, obtained by using
the scaling matrix d˜ instead of d, are sufficiently accurate to maintain the polynomial
complexity of the original algorithm; this amounts to showing that the modified
algorithm has a worst-case iteration count of the same order of magnitude as the
2
Exercise 83 Let Q, R, S, T be matrices such that the matrices Q and Q + RS T are nonsingular
and R and S are n × k matrices of rank k ≤ n. Prove that
(Q + RS T )−1 = Q−1 − Q−1 R(I + S T Q−1 R)−1 S T Q−1 .
The Sherman-Morrison formula arises by taking R = S = a, where a is a nonzero vector [136].
3
We refer the reader to Shanno [251] for more details of rank-one updates of a Cholesky factorization
of a matrix of the form AD 2 AT .
IV.17 Partial Updating
319
original algorithm. Then, secondly, we have to count the total number of rank-one
updates in the modified algorithm.
As indicated above, the partial updating technique can be applied to a wide class of
interior-point algorithms. Below we demonstrate its use only for the dual logarithmic
barrier method with full Newton steps, which was analyzed in Chapter 6.
17.2
Modified search direction
Recall from Exercise 35 (page 111) that the search direction in the dual logarithmic
barrier method is given by
−1 b
− AS −1 e .
∆y = AS −2 AT
µ
More precisely, this is the search direction at y, with s = c − AT y > 0, and for the
barrier parameter value µ. In the sequel we use instead
−1 b
− AS −1 e ,
∆y = ASe−2 AT
µ
where s̃ is such that s̃ = λs with
λi ∈
1
,σ ,
σ
1 ≤ i ≤ n,
(17.1)
for some fixed real constant σ > 1. The corresponding displacement in the s-space is
given by
−1 b
−1
T
T
−2 T
e
− AS e .
(17.2)
∆s = −A ∆y = −A AS A
µ
Letting x̄ be such that Ax̄ = b we may write
T
−1
sx̄
s̃−1 ∆s = − ASe−1
ASe−2 AT
ASe−1 Λ
−e ,
µ
showing that −s̃−1 ∆s equals the orthogonal projection of the vector
sx̄
Λ
−e
µ
into the row space of the matrix ASe−1 . Since the row space of the matrix ASe−1 is
e where H is the same matrix as used in Chapter 6 —
equal to the null space of H S,
and defined in Section 5.8, page 111 — we have
sx̄
s̃−1 ∆s = −PH Se Λ
−e
.
(17.3)
µ
Note that if λ = e then the above expression coincides with the expression for the
dual Newton step in (6.1). Defining
sx
−e
: Ax = b ,
(17.4)
x̃(s, µ) = argminx
Λ
µ
320
IV Miscellaneous Topics
and using the same arguments as in Section 6.5 we can easily verify that
s̃x̃(s, µ)
sx̃(s, µ)
sx̄
−e
=Λ
−e =
− λ,
PH Se Λ
µ
µ
µ
yielding the following expression for the modified Newton step:
sx̃(s, µ)
.
s̃−1 ∆s = Λ e −
µ
17.3
(17.5)
Modified proximity measure
The proximity of $s$ to $s(\mu)$ is measured by the quantity
\[ \tilde\delta(s,\mu) := \left\|\frac{\Delta s}{\tilde s}\right\|. \tag{17.6} \]
From (17.5) it follows that the modified Newton step $\Delta s$ vanishes if and only if $s\tilde x(s,\mu) = \mu e$, which holds if and only if $\tilde x(s,\mu) = x(\mu)$ and $s = s(\mu)$. As a consequence we have
\[ \tilde\delta(s,\mu) = 0 \iff s = s(\mu). \]
An immediate consequence of (17.4) and (17.5) is
\[ \tilde\delta(s,\mu) = \left\|\Lambda\left(e - \frac{s\tilde x(s,\mu)}{\mu}\right)\right\| = \min_x\left\{\left\|\Lambda\left(\frac{sx}{\mu} - e\right)\right\| : Ax = b\right\}. \tag{17.7} \]
The next lemma shows that the modified proximity $\tilde\delta(s,\mu)$ has a simple relation with the standard proximity measure $\delta(s,\mu)$.

Lemma IV.11
\[ \frac{\delta(s,\mu)}{\sigma} \le \tilde\delta(s,\mu) \le \sigma\,\delta(s,\mu). \]
Proof: Using (17.7) and $\max(\lambda) \le \sigma$ we may write
\[ \tilde\delta(s,\mu) = \left\|\Lambda\left(e - \frac{s\tilde x(s,\mu)}{\mu}\right)\right\| \le \left\|\Lambda\left(e - \frac{sx(s,\mu)}{\mu}\right)\right\| \le \|\lambda\|_\infty\left\|e - \frac{sx(s,\mu)}{\mu}\right\| \le \sigma\,\delta(s,\mu). \]
On the other hand we have
\[ \delta(s,\mu) = \left\|e - \frac{sx(s,\mu)}{\mu}\right\| \le \left\|\Lambda^{-1}\Lambda\left(e - \frac{s\tilde x(s,\mu)}{\mu}\right)\right\| \le \left\|\lambda^{-1}\right\|_\infty\left\|\Lambda\left(e - \frac{s\tilde x(s,\mu)}{\mu}\right)\right\| \le \sigma\,\tilde\delta(s,\mu). \]
This implies the lemma. ✷

The next lemma generalizes Lemma II.26 in Section 6.7.
Lemma IV.12 Assuming $\tilde\delta(s,\mu) \le 1$, let $s^+$ be obtained from $s$ by moving along the modified Newton step $\Delta s$ at $s$ for the barrier parameter value $\mu$, and let $\mu^+ = (1-\theta)\mu$. Assuming that $s^+$ is feasible, we have
\[ \tilde\delta(s^+,\mu^+) \le \sigma\sqrt{\tilde\delta(s,\mu)^4 + \frac{\theta^2n}{(1-\theta)^2}} + \frac{\left(\sigma^2-1\right)\tilde\delta(s,\mu)}{1-\theta}. \]
Proof: By definition,
\[ \delta(s^+,\mu^+) = \min_x\left\{\left\|e - \frac{s^+x}{\mu^+}\right\| : Ax = b\right\}. \]
Substituting for $x$ the vector $\tilde x(s,\mu)$ and replacing $\mu^+$ by $\mu(1-\theta)$ we obtain the following inequality:
\[ \delta(s^+,\mu^+) \le \left\|e - \frac{s^+\tilde x(s,\mu)}{\mu(1-\theta)}\right\|. \tag{17.8} \]
Simplifying the notation by using
\[ h := \frac{\Delta s}{\tilde s} = \frac{\Delta s}{\lambda s}, \tag{17.9} \]
we may rewrite (17.5) as
\[ s\tilde x(s,\mu) = \mu\left(e - \lambda^{-1}h\right). \tag{17.10} \]
Using this and (17.9) we get
\[ s^+\tilde x(s,\mu) = (s + \Delta s)\,\tilde x(s,\mu) = (s + \lambda sh)\,\tilde x(s,\mu) = (e + \lambda h)\,s\tilde x(s,\mu) = \mu\,(e + \lambda h)\left(e - \lambda^{-1}h\right). \]
Substituting this into (17.8) we obtain
\[ \delta(s^+,\mu^+) \le \left\|e - \frac{(e + \lambda h)\left(e - \lambda^{-1}h\right)}{1-\theta}\right\| = \left\|e - \frac{e - h^2 + \left(\lambda - \lambda^{-1}\right)h}{1-\theta}\right\|. \]
This can be rewritten as
\[ \delta(s^+,\mu^+) \le \left\|h^2 - \frac{\theta\left(e - h^2\right)}{1-\theta} - \frac{\left(\lambda - \lambda^{-1}\right)h}{1-\theta}\right\|. \]
The triangle inequality now yields
\[ \delta(s^+,\mu^+) \le \left\|h^2 - \frac{\theta\left(e - h^2\right)}{1-\theta}\right\| + \frac{\left\|\left(\lambda - \lambda^{-1}\right)h\right\|}{1-\theta}. \tag{17.11} \]
The first norm resembles (6.12) and, since $\|h\| \le 1$, can be estimated in the same way. This gives
\[ \left\|h^2 - \frac{\theta\left(e - h^2\right)}{1-\theta}\right\|^2 \le \|h\|^4 + \frac{\theta^2n}{(1-\theta)^2}. \]
For the second norm in (17.11) we write
\[ \frac{\left\|\left(\lambda - \lambda^{-1}\right)h\right\|}{1-\theta} \le \frac{\left\|\lambda - \lambda^{-1}\right\|_\infty\|h\|}{1-\theta} \le \frac{\left(\sigma - \sigma^{-1}\right)\|h\|}{1-\theta}. \]
Substituting the last two bounds in (17.11), while using $\|h\| = \tilde\delta(s,\mu)$, we find
\[ \delta(s^+,\mu^+) \le \sqrt{\tilde\delta(s,\mu)^4 + \frac{\theta^2n}{(1-\theta)^2}} + \frac{\left(\sigma - \sigma^{-1}\right)\tilde\delta(s,\mu)}{1-\theta}. \]
Finally, Lemma IV.11 gives $\tilde\delta(s^+,\mu^+) \le \sigma\delta(s^+,\mu^+)$ and the bound in the lemma follows. ✷
Lemma IV.13 Let $n \ge 3$. Using the notation of Lemma IV.12 and taking $\sigma = 9/8$ and $\theta = 1/(6\sqrt{n})$, we have
\[ \tilde\delta(s,\mu) \le \tfrac12 \;\Rightarrow\; \tilde\delta(s^+,\mu^+) \le \tfrac12, \]
and the new iterate $s^+$ is feasible.

Proof: The implication in the lemma follows by substituting the given values in the bound for $\tilde\delta(s^+,\mu^+)$ in Lemma IV.12. If $n \ge 3$ this gives
\[ \tilde\delta(s^+,\mu^+) \le 0.49644 < 0.5, \]
yielding the desired result. By Lemma IV.11 this implies $\delta(s^+,\mu^+) \le \sigma/2 = 9/16$. From this the feasibility of $s^+$ follows. ✷
The above lemma shows that for the specified values of the parameters σ and θ
the modified Newton steps keep the iterates close to the central path. The value of
the barrier update parameter θ in Lemma IV.13 is a factor of two smaller than in
the algorithm of Section 6.7. Hence we must expect that the iteration bound for an
algorithm based on these parameter values will be a factor of two worse. This is the
price we pay for using the modified Newton direction. On the other hand, in terms of
the number of arithmetic operations required to reach an ε-solution, the gain is much
larger. This will become clear in the next section.
The modified algorithm is described on page 323.
Note that in this algorithm the vector λ may be arbitrary at each iteration, subject
to (17.1). The next theorem specifies values for the parameters τ , θ and σ for which
the algorithm is well defined and has a polynomial iteration bound.
Theorem IV.14 If $\tau = 1/2$, $\theta = 1/(6\sqrt{n})$ and $\sigma = 9/8$, then the Dual Logarithmic Barrier Algorithm with Modified Full Newton Steps requires at most
\[ 6\sqrt{n}\,\log\frac{n\mu^0}{\varepsilon} \]
iterations. The output is a primal-dual pair $(x,s)$ such that $x^Ts \le 2\varepsilon$.
Dual Log. Barrier Algorithm with Modified Full Newton Steps
Input:
A proximity parameter τ , 0 ≤ τ < 1;
an accuracy parameter ε > 0;
(y 0 , s0 ) ∈ D and µ0 > 0 such that δ(s0 , µ0 ) ≤ τ ;
a barrier update parameter θ, 0 < θ < 1;
a threshold value σ, σ > 1.
begin
s := s0 ; µ := µ0 ;
while nµ ≥ ε do
begin
Choose any λ satisfying (17.1);
s := s + ∆s, ∆s from (17.2);
µ := (1 − θ)µ;
end
end
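As an illustration only (not taken from the book), the following Python/NumPy sketch mirrors the algorithm box above; the feasibility of the starting point, the simple choice λ = e within the box (17.1), and the function name are assumptions.

```python
import numpy as np

def dual_log_barrier_modified(A, b, c, y, mu, sigma=9/8, eps=1e-6):
    """Dual log barrier method with modified full Newton steps (sketch).

    At each iteration a scaling vector lam with entries in [1/sigma, sigma]
    is chosen, the modified direction (17.2) is computed with
    s_tilde = lam * s, and mu is reduced by the factor 1 - theta.
    """
    n = A.shape[1]
    theta = 1.0 / (6.0 * np.sqrt(n))
    s = c - A.T @ y                        # dual slack, assumed positive
    lam = np.ones(n)                       # any choice satisfying (17.1)
    while n * mu >= eps:
        s_tilde = lam * s
        M = (A / s_tilde**2) @ A.T         # A * S_tilde^{-2} * A^T
        rhs = b / mu - A @ (1.0 / s)
        dy = np.linalg.solve(M, rhs)
        ds = -A.T @ dy                     # modified Newton step (17.2)
        y, s = y + dy, s + ds
        mu *= (1.0 - theta)
    return y, s, mu
```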
Proof: According to Lemma IV.13 the algorithm is well defined. The iteration bound
is an immediate consequence of Lemma I.36. Finally, the duality gap of the final iterate
can be estimated as follows. For the final iterate s we have δ̃(s, µ) ≤ 1/2, with nµ ≤ ε.
Taking x = x̃(s, µ) it follows from (17.10) that
sT x̃(s, µ) = nµ − µhT λ−1 .
Since
√
hT λ−1 ≤ λ−1 khk ≤ σ δ̃(s, µ) n ≤ 9n/16 ≤ n,
we obtain
sT x̃(s, µ) ≤ 2nµ ≤ 2ε.
The proof is complete.
✷

17.4 Algorithm with rank-one updates
We now present a variant of the algorithm in the previous section in which the vector λ used in the computation of the modified Newton step is prescribed. See page 324. Note that at each iteration the vector s̃ is updated in such a way that the vector λ used in the computation of the modified Newton step satisfies (17.1). As a consequence, the iteration bound for the algorithm is given by Theorem IV.14. Hence, the algorithm yields an exact solution of $(D)$ in $O(\sqrt{n}\,L)$ iterations. Without using partial updates — which corresponds to giving the threshold parameter σ the value 1 — the bound for
Full Step Dual Log. Barrier Algorithm with Rank-One Updates

Input:
  A proximity parameter τ, τ = 1/2;
  an accuracy parameter ε > 0;
  (y⁰, s⁰) ∈ D and µ⁰ > 0 such that δ(s⁰, µ⁰) ≤ τ;
  a barrier update parameter θ, θ = 1/(6√n);
  a threshold value σ, σ = 9/8.

begin
  s := s⁰; µ := µ⁰; s̃ := s;
  while nµ ≥ ε do
  begin
    λ := s̃ s⁻¹;
    s := s + ∆s, ∆s from (17.2);
    for i := 1 to n do
    begin
      if s̃_i/s_i ∉ [1/σ, σ] then s̃_i := s_i
    end
    µ := (1 − θ)µ;
  end
end
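The following fragment (an illustrative sketch, not from the text) shows the bookkeeping in the box above: after each modified Newton step, every coordinate of s̃ whose ratio to the new s_i has left the interval [1/σ, σ] is reset, and each reset corresponds to one rank-one update of the factorization of A S̃⁻² Aᵀ. The counter `num_rank_one` is a hypothetical name.

```python
import numpy as np

def refresh_s_tilde(s, s_tilde, sigma=9/8):
    """Reset the coordinates of s_tilde that violate (17.1) and count the resets.

    Each reset corresponds to one rank-one update of the factorization of
    A * S_tilde^{-2} * A^T, at a cost of O(n^2) arithmetic operations.
    """
    num_rank_one = 0
    s_tilde = s_tilde.copy()
    ratio = s_tilde / s
    for i in np.where((ratio < 1/sigma) | (ratio > sigma))[0]:
        s_tilde[i] = s[i]
        num_rank_one += 1
    return s_tilde, num_rank_one
```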
the total number of arithmetic operations becomes $O\!\left(n^{3.5}L\right)$. Recall that the extra factor $n^3$ can be interpreted as being due to $n$ rank-one updates per iteration, with $O\!\left(n^2\right)$ arithmetic operations per rank-one update.
The total number of rank-one updates in the above algorithm is equal to the number of times that a coordinate of the vector s̃ is updated. We estimate this number in the next section, and we show that on the average it is not more than $O(\sqrt{n})$ per iteration, instead of $n$. Thus the overall bound for the total number of arithmetical operations becomes $O\!\left(n^3L\right)$.
17.5 Count of the rank-one updates
We need to count (or estimate) the number of times that a coordinate of the vector s̃ changes. Let $s^k$ and $\tilde s^k$ denote the values assigned to $s$ and to $\tilde s$, respectively, at iteration $k$ of the algorithm. We also use the superscript $k$ to refer to values assigned to other relevant entities during the $k$-th iteration. For example, the value assigned to $\lambda$ at iteration $k$ is denoted by $\lambda^k$ and satisfies
\[ \lambda^k = \frac{\tilde s^{k-1}}{s^{k-1}}, \qquad k \ge 1. \]
Moreover, denoting the modified Newton step at iteration $k$ by $\Delta s^k$, we have
\[ \Delta s^k = s^k - s^{k-1} = \tilde s^{k-1}h^k = \lambda^ks^{k-1}h^k, \qquad k \ge 1. \tag{17.12} \]
Note that the algorithm is initialized so that $s^0 = \tilde s^0$ and these are the values of $s$ and $\tilde s$ just before the first iteration.
Now consider the $i$-th coordinate of $\tilde s$. Suppose that $\tilde s_i$ is updated at iteration $k_1 \ge 0$ and next updated at iteration $k_2 > k_1$. Then the updating rule implies that the sequence
\[ \frac{s_i^{k_1+1}}{\tilde s_i^{k_1}},\ \frac{s_i^{k_1+2}}{\tilde s_i^{k_1+1}},\ \ldots,\ \frac{s_i^{k_2-1}}{\tilde s_i^{k_2-2}},\ \frac{s_i^{k_2}}{\tilde s_i^{k_2-1}} \]
has the property that the last entry lies outside the interval $(1/\sigma,\sigma)$ whereas all the other entries lie inside this interval. Since
\[ s_i^{k_1} = \tilde s_i^{k_1} = \tilde s_i^{k_1+1} = \ldots = \tilde s_i^{k_2-1} \]
we can rewrite the above sequence as
\[ \frac{s_i^{k_1+1}}{s_i^{k_1}},\ \frac{s_i^{k_1+2}}{s_i^{k_1}},\ \ldots,\ \frac{s_i^{k_2-1}}{s_i^{k_1}},\ \frac{s_i^{k_2}}{s_i^{k_1}}. \tag{17.13} \]
Hence, with
\[ p_j := s_i^{k_1+j}, \qquad 0 \le j \le K := k_2 - k_1, \tag{17.14} \]
the sequence
\[ p_0, p_1, \ldots, p_K \]
has the property
\[ \frac{p_j}{p_0} \in \left(\frac{1}{\sigma},\,\sigma\right), \qquad 1 \le j < K, \tag{17.16} \]
and
\[ \frac{p_K}{p_0} \notin \left(\frac{1}{\sigma},\,\sigma\right). \tag{17.17} \]
(17.17)
Our estimate of the number of rank-one updates in the algorithm depends on a
technical lemma on such sequences. The proof of this lemma (Lemma IV.15 below)
requires another technical lemma that can be found in Appendix C (Lemma C.3).
Lemma IV.15 Let σ ≥ 1 and let p0 , p1 , . . . , pK be a finite sequence of positive
numbers satisfying (17.16) and (17.17). Then
K−1
X
j=0
pj+1 − pj
1
≥1− .
pj
σ
Proof: We start with K = 1. Then we need to show
p1 − p0
1
p1
=
−1 ≥1− .
p0
p0
σ
(17.18)
If $p_1/p_0 \le 1/\sigma$ then
\[ \left|\frac{p_1}{p_0} - 1\right| = 1 - \frac{p_1}{p_0} \ge 1 - \frac{1}{\sigma}, \]
and if $p_1/p_0 \ge \sigma$ then
\[ \left|\frac{p_1}{p_0} - 1\right| = \frac{p_1}{p_0} - 1 \ge \sigma - 1 \ge \frac{\sigma-1}{\sigma} = 1 - \frac{1}{\sigma}. \]
We proceed with $K \ge 2$. It is convenient to denote the left-hand side expression in (17.18) by $g(p_0, p_1, \ldots, p_K)$. We start with an easy observation: if $p_{j+1} = p_j$ for some $j$ ($0 \le j < K$) then $g(p_0, p_1, \ldots, p_K)$ does not change if we remove $p_{j+1}$ from the sequence. So without loss of generality we may assume that no two subsequent elements in the given sequence $p_0, p_1, \ldots, p_K$ are equal.
Now let the given sequence $p_0, p_1, \ldots, p_K$ be such that $g(p_0, p_1, \ldots, p_K)$ is minimal. For $0 < j < K$ we consider the two terms in $g(p_0, p_1, \ldots, p_K)$ that contain $p_j$. The contribution of these two terms is given by
\[ \left|\frac{p_j - p_{j-1}}{p_{j-1}}\right| + \left|\frac{p_{j+1} - p_j}{p_j}\right| = \left|1 - \frac{p_j}{p_{j-1}}\right| + \left|1 - \frac{p_{j+1}}{p_{j-1}}\,\frac{p_{j-1}}{p_j}\right|. \tag{17.19} \]
Since $p_0, p_1, \ldots, p_K$ minimizes $g(p_0, p_1, \ldots, p_K)$, when fixing $p_{j-1}$ and $p_{j+1}$, $p_j$ must minimize (17.19). If $p_{j+1} \le p_{j-1}$ then Lemma C.3 (page 437) implies that
\[ \frac{p_j}{p_{j-1}} = 1 \quad\text{or}\quad \frac{p_j}{p_{j-1}} = \frac{p_{j+1}}{p_{j-1}}. \]
This means that
\[ p_j = p_{j-1} \quad\text{or}\quad p_j = p_{j+1}. \]
Hence, in this case the sequence has two subsequent elements that are equal, which has been excluded above. We conclude that $p_{j+1} > p_{j-1}$. Applying Lemma C.3 once more, we obtain
\[ \frac{p_j}{p_{j-1}} = \sqrt{\frac{p_{j+1}}{p_{j-1}}}. \]
Thus it follows that
\[ p_{j-1} < p_j = \sqrt{p_{j-1}p_{j+1}} < p_{j+1} \]
for each $j$, $0 < j < K$, showing that the sequence $p_0, p_1, \ldots, p_K$ is strictly increasing and each entry $p_j$, with $0 < j < K$, is the geometric mean of the surrounding entries. This implies that the sequence $p_j/p_0$, $1 \le j \le K$, is geometric and we have
\[ \frac{p_j}{p_0} = \alpha^j, \qquad 1 \le j \le K, \]
where
\[ \alpha = \sqrt{\frac{p_2}{p_0}} > 1. \]
In that case we must have $p_K/p_0 \ge \sigma$ and hence $\alpha$ satisfies $\alpha^K \ge \sigma$. Since
\[ g(p_0, p_1, \ldots, p_K) = \sum_{j=0}^{K-1}\left(\frac{p_{j+1}}{p_j} - 1\right) = \sum_{j=0}^{K-1}(\alpha - 1) = K(\alpha - 1), \]
the inequality in the lemma follows if
\[ K(\alpha - 1) \ge 1 - \frac{1}{\alpha^K}. \]
This inequality holds for each natural number $K$ and for each real number $\alpha \ge 1$. This can be seen by reducing the right-hand side as follows:
\[ 1 - \frac{1}{\alpha^K} = \frac{\alpha^K - 1}{\alpha^K} = \frac{(\alpha-1)\left(\alpha^{K-1} + \ldots + \alpha + 1\right)}{\alpha^K} = (\alpha-1)\left(\alpha^{-1} + \alpha^{-2} + \ldots + \alpha^{-K}\right) \le K(\alpha - 1). \]
This completes the proof. ✷
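As a quick sanity check (not part of the text), the following snippet evaluates the left-hand side of (17.18) for the worst-case geometric sequences identified in the proof and confirms that it stays above 1 − 1/σ; the construction of the test sequences is purely illustrative.

```python
import numpy as np

def g(p):
    """Left-hand side of (17.18): sum of |p[j+1] - p[j]| / p[j]."""
    p = np.asarray(p, dtype=float)
    return np.sum(np.abs(np.diff(p)) / p[:-1])

sigma = 9 / 8
for K in range(1, 8):
    alpha = sigma ** (1.0 / K)           # geometric sequence with p_K / p_0 = sigma
    p = alpha ** np.arange(K + 1)
    assert g(p) >= 1 - 1/sigma - 1e-12   # bound of Lemma IV.15 holds
print("Lemma IV.15 bound verified for K = 1, ..., 7")
```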
Now the next lemma follows easily.
Lemma IV.16 Suppose that the component $\tilde s_i$ of $\tilde s$ is updated at iteration $k_1$ and next updated at iteration $k_2 > k_1$. Then
\[ \sum_{k=k_1}^{k_2-1}\left|\frac{\Delta s_i^{k+1}}{s_i^k}\right| \ge 1 - \frac{1}{\sigma}, \]
where $\Delta s_i^{k+1}$ denotes the $i$-th coordinate of the modified Newton step at iteration $k+1$.

Proof: Applying Lemma IV.15 to the sequence $p_0, p_1, \ldots, p_K$ defined by (17.14) we get
\[ \sum_{k=k_1}^{k_2-1}\left|\frac{s_i^{k+1} - s_i^k}{s_i^k}\right| \ge 1 - \frac{1}{\sigma}. \]
Since $s^{k+1} - s^k = \Delta s^{k+1}$ by definition, the lemma follows. ✷
Theorem IV.17 Let $N$ denote the total number of iterations of the algorithm and $n_i$ the total number of updates of $\tilde s_i$. Then
\[ \sum_{i=1}^{n}n_i \le 6N\sqrt{n}. \]
Proof: Recall from (17.12) that
\[ \Delta s^{k+1} = \lambda^{k+1}s^kh^{k+1}. \]
Hence, for $1 \le i \le n$,
\[ \frac{\Delta s_i^{k+1}}{s_i^k} = \lambda_i^{k+1}h_i^{k+1}. \]
Now Lemma IV.16 implies
\[ \sum_{k=1}^{N}\left|\lambda_i^kh_i^k\right| = \sum_{k=0}^{N-1}\left|\lambda_i^{k+1}h_i^{k+1}\right| \ge n_i\left(1 - \frac{1}{\sigma}\right). \]
Taking the sum over $i$ we obtain
\[ \sum_{i=1}^{n}n_i \le \frac{\sigma}{\sigma-1}\sum_{k=1}^{N}\sum_{i=1}^{n}\left|\lambda_i^kh_i^k\right|. \]
The inner sum can be bounded above by
\[ \sum_{i=1}^{n}\left|\lambda_i^kh_i^k\right| \le \sigma\sum_{i=1}^{n}\left|h_i^k\right| = \sigma\left\|h^k\right\|_1 \le \sigma\left\|h^k\right\|\sqrt{n}. \]
Since $\left\|h^k\right\| = \tilde\delta(s^k,\mu^k) \le \tau$ we obtain
\[ \sum_{i=1}^{n}n_i \le \frac{\sigma}{\sigma-1}\sum_{k=1}^{N}\sigma\tau\sqrt{n} = \frac{N\sigma^2\tau\sqrt{n}}{\sigma-1}. \]
Substituting the values of $\sigma$ and $\tau$ specified in the algorithm proves the theorem. ✷
Finally, using the iteration bound of Theorem IV.14 and that each rank-one update requires $O(n^2)$ arithmetic operations, we may state our final result without further proof.

Theorem IV.18 The Full Step Dual Logarithmic Barrier Algorithm with Rank-One Updates requires at most
\[ 36n^3\log\frac{n\mu^0}{\varepsilon} \]
arithmetic operations. The output is a primal-dual pair $(x,s)$ such that $x^Ts \le 2\varepsilon$.
18 Higher-Order Methods

18.1 Introduction
In a target-following method the Newton directions $\Delta x$ and $\Delta s$ to a given target point $w$ in the $w$-space,¹ and at a given positive primal-dual pair $(x,s)$, are obtained by solving the system (10.2):
\[ A\Delta x = 0, \qquad A^T\Delta y + \Delta s = 0, \qquad s\Delta x + x\Delta s = \Delta w, \tag{18.1} \]
where $\Delta w = w - xs$. Recall that this system was obtained by neglecting the second-order term $\Delta x\Delta s$ in the third equation of the nonlinear system (10.1), given by
\[ A\Delta x = 0, \qquad A^T\Delta y + \Delta s = 0, \qquad s\Delta x + x\Delta s + \Delta x\Delta s = \Delta w. \tag{18.2} \]
An exact solution — $(\widetilde{\Delta x}, \widetilde{\Delta y}, \widetilde{\Delta s})$, say — of (18.2) would yield the primal-dual pair corresponding to the target $w$, because
\[ (x + \widetilde{\Delta x})(s + \widetilde{\Delta s}) = w, \]
as can easily be verified. Unfortunately, finding an exact solution of the nonlinear system (18.2) is computationally hard. Therefore, following a classical approach in mathematics when dealing with nonlinearity, we linearize the system and use the solutions of the linearized system (18.1). Denoting its solution simply by $(\Delta x, \Delta y, \Delta s)$, the primal-dual pair $(x + \Delta x, s + \Delta s)$ satisfies
\[ (x + \Delta x)(s + \Delta s) = w - \Delta x\Delta s, \]
and hence the ‘error’ after the step is given by $\Delta x\Delta s$. Thus, this error represents the price we have to pay for using a solution of the linearized system (18.1). We refer to it henceforth as the second-order effect.
¹ We defined the $w$-space in Section 9.1, page 220; it is simply the interior of the nonnegative orthant in $\mathbb{R}^n$.
Clearly, the second-order effect strongly depends on the actual data of the problem
under consideration.2
It would be very significant if we could eliminate the above described second-order
effect, or at least minimize it in some way or another. One way to do this is to use
so-called higher-order methods.3 The Newton method used so far is considered to be a
first-order method. In the next section the search directions for higher-order methods
are introduced. Then we devote a separate section (Section 18.3) to the estimate of the
(higher-order) error term E r (α), where r ≥ 1 denotes the order of the search direction
and α the step-size. The results of Section 18.3 are applied in two subsequent sections.
In Section 18.4 we first discuss and extend the definition of the primal-dual Dikin
direction, as introduced in Appendix E for the self-dual problem, to a primal-dual
Dikin direction for the problems (P ) and (D) in standard format. Then we consider
a higher-order version of this direction, and we show that the iteration bound can be reduced by the factor $\sqrt{n}$ without increasing the complexity per iteration. Then,
in Section 18.5 we apply the results of Section 18.3 to the primal-dual logarithmic
barrier method, as considered in Chapter 7 of Part II. This section is based on a
paper of Zhao [320]. Here the use of higher-order search directions does not improve
the iteration bound when compared with the (first-order) full Newton step method.
Recall that in the full Newton step method the iterates stay very close to the central
path. This can be expressed by saying this method keeps the iterates in a ‘narrow
cone’ around the central path. We shall see that a higher-order method allows the
iterates to stay further away from the central path, which makes such a method a
‘wide cone’ method.
18.2 Higher-order search directions
Suppose that we are given a positive primal-dual pair $(x,s)$ and we want to find the primal-dual pair corresponding to $\bar w := xs + \Delta w$ for some displacement $\Delta w$ in the $w$-space. Our aim is to generate suitable search directions $\Delta x$ and $\Delta s$ at $(x,s)$. One way to derive such directions is to consider the linear line segment in the $w$-space connecting $xs$ with $\bar w$. A parametric representation of this segment is given by
\[ xs + \alpha\Delta w, \qquad 0 \le \alpha \le 1. \]
² In the $w$-space the ideal situation is that the curve $(x + \alpha\Delta x)(s + \alpha\Delta s)$, $0 \le \alpha \le 1$, moves from $xs$ in a straight line to the target $w$. As a matter of fact, the second-order effect ‘blows’ the curve away from this straight line segment. Considering $\alpha$ as a time parameter, we can think of the iterate $(x + \alpha\Delta x)(s + \alpha\Delta s)$ as the trajectory of a vessel sailing from $xs$ to $w$ and of the second-order effect as a wind blowing it away from its trajectory. To reach the target $w$ the bargeman can put over the helm now and then, which in our context is accomplished by updating the search direction. In practice, a bargeman will anticipate the fact that the wind is (locally) constant and he can put the helm in a fixed position that prevents the vessel being driven from its correct course. It may be interesting to mention a computer game called Schiet Op™, designed by Brinkhuis and Draisma, that is based on this phenomenon [51]. It requires the player to find an optimal path in the $w$-space to the origin.
³ The idea of using higher-order search directions as presented in this chapter is due to Monteiro, Adler and Resende [220], and was later investigated by Zhang and Zhang [318], Hung and Ye [150], Jansen et al. [160] and Zhao [320]. The idea has also been applied in the context of a predictor-corrector method by Mehrotra [202, 205].
To any point of this segment belongs a primal-dual pair and we denote this pair by $(x(\alpha), s(\alpha))$.⁴ Since $x(\alpha)$ and $s(\alpha)$ depend analytically⁵ on $\alpha$ there exist $x^{(i)}$ and $s^{(i)}$, with $i = 0, 1, \ldots$, such that
\[ x(\alpha) = \sum_{i=0}^{\infty}x^{(i)}\alpha^i, \qquad s(\alpha) = \sum_{i=0}^{\infty}s^{(i)}\alpha^i, \qquad 0 \le \alpha \le 1. \tag{18.3} \]
Obviously, $x(0) = x = x^{(0)}$ and $s(0) = s = s^{(0)}$. From $Ax(\alpha) = b$, for each $\alpha \in [0,1]$, we derive
\[ Ax^{(0)} = b, \qquad Ax^{(i)} = 0, \quad i \ge 1. \tag{18.4} \]
Similarly, there exist unique $y^{(i)}$ and $s^{(i)}$, $i = 0, 1, \ldots$, such that
\[ A^Ty^{(0)} + s^{(0)} = c, \qquad A^Ty^{(i)} + s^{(i)} = 0, \quad i \ge 1. \tag{18.5} \]
Furthermore, from
\[ x(\alpha)s(\alpha) = xs + \alpha\Delta w, \]
by expanding $x(\alpha)$ and $s(\alpha)$ and then equating terms with equal powers of $\alpha$, we get the following relations:
\[ x^{(0)}s^{(0)} = xs, \tag{18.6} \]
\[ x^{(0)}s^{(1)} + s^{(0)}x^{(1)} = \Delta w, \tag{18.7} \]
\[ \sum_{i=0}^{k}x^{(i)}s^{(k-i)} = 0, \qquad k = 2, 3, \ldots. \tag{18.8} \]
i=0
The first relation implies once more that x(0) = x and s(0) = s. Using this and (18.4),
(18.5) and (18.7) we obtain
Ax(1)
=
0
(1)
=
=
0
∆w.
T (1)
A y +s
sx(1) + xs(1)
4
5
(18.9)
In other chapters of this book x(α) denotes the α-center on the primal central path. To avoid any
misunderstanding it might be appropriate to emphasize that in this chapter x(α) — as well as s(α)
— has a different meaning, as indicated.
Note that x(α) and s(α) are uniquely determined by the relations
Ax(α)
=
b,
AT y(α) + s(α)
x (α) s(α)
=
=
c,
s > 0,
xs + α∆w.
x > 0,
Obviously, the right-hand sides in these relations depend linearly (and hence analytically) on α.
Since the Jacobian matrix of the left-hand side is nonsingular, the implicit function theorem (cf.
Proposition A.2 in Appendix A) implies that x(α), y(α) and s(α) depend analytically on α. See
also Section 16.2.1.
This shows that $x^{(1)}$ and $s^{(1)}$ are just the primal-dual Newton directions at $(x,s)$ for the target $\bar w = xs + \Delta w$.⁶ Using (18.4), (18.5) and (18.8) we find that the higher-order coefficients $x^{(k)}$, $y^{(k)}$ and $s^{(k)}$, with $k \ge 2$, satisfy the linear system
\[ Ax^{(k)} = 0, \qquad A^Ty^{(k)} + s^{(k)} = 0, \qquad sx^{(k)} + xs^{(k)} = -\sum_{i=1}^{k-1}x^{(i)}s^{(k-i)}, \qquad k = 2, 3, \ldots, \tag{18.10} \]
thus giving a recursive expression for the higher-order coefficients. The remarkable thing here is that the coefficient matrix in (18.10) is the same as in (18.9). This has the important consequence that as soon as the standard (first-order) Newton directions $x^{(1)}$ and $s^{(1)}$ have been calculated from the linear system (18.9), the second-order terms $x^{(2)}$ and $s^{(2)}$ can be computed from a linear system with the same coefficient matrix. Having $x^{(2)}$ and $s^{(2)}$, we can compute $x^{(3)}$ and $s^{(3)}$, and so on. Hence, from a computational point of view the higher-order terms $x^{(k)}$ and $s^{(k)}$, with $k \ge 2$, can be obtained relatively cheaply.
Assuming that the computation of the Newton directions requires $O(n^3)$ arithmetic operations, the computation of each subsequent pair $\left(x^{(k)}, s^{(k)}\right)$ of higher-order coefficients requires $O(n^2)$ arithmetic operations. For example, if we compute the pairs $\left(x^{(k)}, s^{(k)}\right)$ for $k = 1, 2, \ldots, n$, this doubles the computational cost per iteration. There is some reason to expect, however, that we will obtain a more accurate search direction; this may result in a speedup that justifies the extra computational effort.
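The following sketch (illustrative only, not from the book) computes the coefficients $x^{(k)}, s^{(k)}$ from (18.9) and (18.10) by factoring the normal-equation matrix once and reusing it for every order; the function name and the use of a dense Cholesky factorization are assumptions.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def higher_order_coefficients(A, x, s, dw, r):
    """Return lists [x^(1),...,x^(r)], [s^(1),...,s^(r)] from (18.9)-(18.10).

    Every order k solves a system with the same coefficient matrix, so the
    factorization of A*D^2*A^T (with D^2 = diag(x/s)) is computed only once.
    """
    d2 = x / s
    M = (A * d2) @ A.T                        # A D^2 A^T
    chol = cho_factor(M)
    xs_coeff, ss_coeff = [], []
    rhs = dw                                  # right-hand side of order 1
    for k in range(1, r + 1):
        # Solve  A x^(k) = 0,  A^T y^(k) + s^(k) = 0,  s x^(k) + x s^(k) = rhs.
        y_k = -cho_solve(chol, A @ (rhs / s))
        s_k = -A.T @ y_k
        x_k = (rhs - x * s_k) / s
        xs_coeff.append(x_k); ss_coeff.append(s_k)
        # Right-hand side for the next order, from (18.10).
        rhs = -sum(xs_coeff[i] * ss_coeff[k - 1 - i] for i in range(k))
    return xs_coeff, ss_coeff
```

For $k = 1$ this reproduces the formula of Exercise 84 below; for $k \ge 2$ only a new right-hand side is formed.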
By truncating the expansion (18.3), we define the primal-dual Newton directions of order $r$ at $(x,s)$ with step-size $\alpha$ by
\[ \Delta^{r,\alpha}x := \sum_{i=1}^{r}x^{(i)}\alpha^i, \qquad \Delta^{r,\alpha}s := \sum_{i=1}^{r}s^{(i)}\alpha^i. \tag{18.11} \]
Moving along these directions we arrive at
\[ x^r(\alpha) := x + \Delta^{r,\alpha}x, \qquad s^r(\alpha) := s + \Delta^{r,\alpha}s. \]
Recall that we started this section by taking $\bar w = xs + \Delta w$ as the target point in the $w$-space. Now that we have introduced the step-size $\alpha$ it is more natural to consider
\[ \bar w(\alpha) := xs + \alpha\Delta w \]
as the target. In the following lemma we calculate $x^r(\alpha)s^r(\alpha)$, which is the next iterate in the $w$-space, and hence obtain an expression for the deviation from the target $\bar w(\alpha)$ after the step.
⁶ Exercise 84 Verify that $y^{(1)}$ can be solved from (18.9) by the formula
\[ y^{(1)} = -\left(AXS^{-1}A^T\right)^{-1}AS^{-1}\Delta w. \]
This generalizes the expression for the logarithmic barrier direction in Exercise 35, page 111. Given $y^{(1)}$, $s^{(1)}$ and $x^{(1)}$ follow from
\[ s^{(1)} = -A^Ty^{(1)}, \qquad x^{(1)} = S^{-1}\left(\Delta w - xs^{(1)}\right). \]
Lemma IV.19
\[ x^r(\alpha)s^r(\alpha) = xs + \alpha\Delta w + \sum_{k=r+1}^{2r}\alpha^k\left(\sum_{i=k-r}^{r}x^{(i)}s^{(k-i)}\right). \]
Proof: We may write
\[ x^r(\alpha) := x + \Delta^{r,\alpha}x = \sum_{i=0}^{r}x^{(i)}\alpha^i, \]
and we have a similar expression for $s^r(\alpha)$. Therefore,
\[ x^r(\alpha)s^r(\alpha) = \left(\sum_{i=0}^{r}x^{(i)}\alpha^i\right)\left(\sum_{i=0}^{r}s^{(i)}\alpha^i\right). \tag{18.12} \]
The right-hand side can be considered as a polynomial in $\alpha$ of degree $2r$. We consider the coefficient of $\alpha^k$ for $0 \le k \le 2r$. If $0 \le k \le r$ then the coefficient of $\alpha^k$ is given by
\[ \sum_{i=0}^{k}x^{(i)}s^{(k-i)}. \]
By (18.8), this expression vanishes if $k \ge 2$. Furthermore, if $k = 1$ the expression is equal to $\Delta w$, by (18.7), and if $k = 0$ it is equal to $xs$, by (18.6). So it remains to consider the coefficient of $\alpha^k$ on the right-hand side of (18.12) for $r+1 \le k \le 2r$. For these values of $k$ the corresponding coefficient in (18.12) is given by
\[ \sum_{i=k-r}^{r}x^{(i)}s^{(k-i)}. \]
Hence, collecting the above results, we get
\[ x^r(\alpha)s^r(\alpha) = xs + \alpha\Delta w + \sum_{k=r+1}^{2r}\alpha^k\left(\sum_{i=k-r}^{r}x^{(i)}s^{(k-i)}\right). \tag{18.13} \]
This completes the proof. ✷
In the next section we further analyze the error term
\[ E^r(\alpha) := \sum_{k=r+1}^{2r}\alpha^k\left(\sum_{i=k-r}^{r}x^{(i)}s^{(k-i)}\right). \tag{18.14} \]
We conclude this section with two observations. First, taking $r = 1$ we get
\[ E^1(\alpha) = \alpha^2x^{(1)}s^{(1)} = \alpha^2\Delta x\Delta s, \]
where $\Delta x$ and $\Delta s$ are the standard primal-dual Newton directions at $(x,s)$. This is in accordance with earlier results (see, e.g., (10.12)). If we use a first-order Newton step
then the error is of order two in $\alpha$. In the general case, of a step of order $r$, the error term $E^r(\alpha)$ is of order $r+1$ in $\alpha$.
The second observation concerns the orthogonality of the search directions in the $x$- and $s$-spaces. It is immediate from the first two equations in (18.9) and (18.10) that
\[ \left(x^{(i)}\right)^Ts^{(j)} = 0, \qquad \forall i \ge 1,\ \forall j \ge 1. \]
As a consequence,
\[ \left(\Delta^{r,\alpha}x\right)^T\Delta^{r,\alpha}s = 0, \]
and also, from Lemma IV.19,
\[ \left(x^r(\alpha)\right)^Ts^r(\alpha) = e^T(xs + \alpha\Delta w) = e^T\bar w(\alpha). \]
Thus, after the step with size $\alpha$, the duality gap is equal to the gap at the target $\bar w(\alpha)$.
Figure 18.1 illustrates the use of higher-order search directions.

Figure 18.1   Trajectories in the $w$-space for higher-order steps with $r = 1, 2, 3, 4, 5$.
18.3 Analysis of the error term
The main task in the analysis of the higher-order method is to estimate the error term $E^r(\alpha)$, given by (18.14). Our first estimate is very loose. We write
\[ \|E^r(\alpha)\| \le \sum_{k=r+1}^{2r}\alpha^k\left\|\sum_{i=k-r}^{r}x^{(i)}s^{(k-i)}\right\| \le \sum_{k=r+1}^{2r}\alpha^k\sum_{i=k-r}^{r}\left\|x^{(i)}s^{(k-i)}\right\|, \tag{18.15} \]
and we concentrate on estimating the norms in the last sum. We use the vectors $d$ and $v$ introduced in Section 10.4:
\[ d = \sqrt{\frac{x}{s}}, \qquad v = \sqrt{xs}. \tag{18.16} \]
Then the third equation in (18.9) can be rewritten as
\[ d^{-1}x^{(1)} + ds^{(1)} = \frac{\Delta w}{v}, \tag{18.17} \]
and the third equation in (18.10) as
\[ d^{-1}x^{(k)} + ds^{(k)} = -v^{-1}\sum_{i=1}^{k-1}x^{(i)}s^{(k-i)}, \qquad k = 2, 3, \ldots. \tag{18.18} \]
Since $x^{(k)}$ and $s^{(k)}$ are orthogonal for $k \ge 1$, the vectors $d^{-1}x^{(k)}$ and $ds^{(k)}$ are orthogonal as well. Therefore,
\[ \left\|d^{-1}x^{(k)}\right\|^2 + \left\|ds^{(k)}\right\|^2 = \left\|d^{-1}x^{(k)} + ds^{(k)}\right\|^2, \qquad k \ge 1. \]
Hence, defining
\[ q^{(k)} := d^{-1}x^{(k)} + ds^{(k)}, \qquad k \ge 1, \tag{18.19} \]
we have for each $k \ge 1$,
\[ \left\|d^{-1}x^{(k)}\right\| \le \left\|q^{(k)}\right\|, \qquad \left\|ds^{(k)}\right\| \le \left\|q^{(k)}\right\|, \]
and as a consequence, for $1 \le i \le k-1$ we may write
\[ \left\|x^{(i)}s^{(k-i)}\right\| = \left\|d^{-1}x^{(i)}\,ds^{(k-i)}\right\| \le \left\|d^{-1}x^{(i)}\right\|\left\|ds^{(k-i)}\right\| \le \left\|q^{(i)}\right\|\left\|q^{(k-i)}\right\|. \tag{18.20} \]
Substitution of these inequalities in the bound (18.15) for $E^r(\alpha)$ yields
\[ \|E^r(\alpha)\| \le \sum_{k=r+1}^{2r}\alpha^k\sum_{i=k-r}^{r}\left\|q^{(i)}\right\|\left\|q^{(k-i)}\right\|. \tag{18.21} \]
We proceed by deriving upper bounds for $\left\|q^{(k)}\right\|$, $k \ge 1$.
Lemma IV.20 For each $k \ge 1$,
\[ \left\|q^{(k)}\right\| \le \varphi_k\left\|v^{-1}\right\|_\infty^{k-1}\left\|q^{(1)}\right\|^k, \tag{18.22} \]
where the integer sequence $\varphi_1, \varphi_2, \ldots$ is defined recursively by $\varphi_1 = 1$ and
\[ \varphi_k = \sum_{i=1}^{k-1}\varphi_i\varphi_{k-i}. \tag{18.23} \]
Proof: The proof is by induction on $k$. Note that (18.22) holds trivially if $k = 1$. Assume that (18.22) holds for $q^{(\ell)}$ if $1 \le \ell < k$. We complete the proof by deducing from this assumption that the lemma is also true for $q^{(k)}$. For $k \ge 2$ we obtain from the definition (18.19) of $q^{(k)}$ and (18.18) that
\[ q^{(k)} = -v^{-1}\sum_{i=1}^{k-1}x^{(i)}s^{(k-i)}. \]
Hence, using (18.20),
\[ \left\|q^{(k)}\right\| \le \left\|v^{-1}\right\|_\infty\sum_{i=1}^{k-1}\left\|q^{(i)}\right\|\left\|q^{(k-i)}\right\|. \]
At this stage we apply the induction hypothesis to the last two norms, yielding
\[ \left\|q^{(k)}\right\| \le \left\|v^{-1}\right\|_\infty\sum_{i=1}^{k-1}\varphi_i\left\|v^{-1}\right\|_\infty^{i-1}\left\|q^{(1)}\right\|^i\,\varphi_{k-i}\left\|v^{-1}\right\|_\infty^{k-i-1}\left\|q^{(1)}\right\|^{k-i}, \]
which can be simplified to
\[ \left\|q^{(k)}\right\| \le \left\|v^{-1}\right\|_\infty^{k-1}\left\|q^{(1)}\right\|^k\sum_{i=1}^{k-1}\varphi_i\varphi_{k-i}. \]
Finally, using (18.23) the lemma follows. ✷

The solution of the recursion (18.23) with $\varphi_1 = 1$ is given by⁷
\[ \varphi_k = \frac{1}{k}\binom{2k-2}{k-1}. \tag{18.24} \]
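The numbers $\varphi_k$ in (18.24) are the Catalan numbers. As a small illustration (not from the text), the closed form can be checked against the recursion (18.23):

```python
from math import comb

def phi_closed(k):
    """Closed form (18.24): phi_k = (1/k) * binom(2k-2, k-1)."""
    return comb(2 * k - 2, k - 1) // k

phi = {1: 1}
for k in range(2, 12):
    phi[k] = sum(phi[i] * phi[k - i] for i in range(1, k))   # recursion (18.23)
    assert phi[k] == phi_closed(k)
print([phi[k] for k in range(1, 12)])   # 1, 1, 2, 5, 14, 42, ...
```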
This enables us to prove our next result.
Lemma IV.21 For each $k = r+1, \ldots, 2r$,
\[ \sum_{i=k-r}^{r}\left\|q^{(i)}\right\|\left\|q^{(k-i)}\right\| \le \frac{2^{2k-3}}{k}\left\|v^{-1}\right\|_\infty^{k-2}\left\|q^{(1)}\right\|^k. \]
⁷ Exercise 85 Prove that (18.24) is the solution of the recursion in (18.23) satisfying $\varphi_1 = 1$ (cf., e.g., Liu [184]).
Proof: Using Lemma IV.20 we may write
\[ \sum_{i=k-r}^{r}\left\|q^{(i)}\right\|\left\|q^{(k-i)}\right\| \le \sum_{i=k-r}^{r}\varphi_i\left\|v^{-1}\right\|_\infty^{i-1}\left\|q^{(1)}\right\|^i\,\varphi_{k-i}\left\|v^{-1}\right\|_\infty^{k-i-1}\left\|q^{(1)}\right\|^{k-i}, \]
which is equivalent to
\[ \sum_{i=k-r}^{r}\left\|q^{(i)}\right\|\left\|q^{(k-i)}\right\| \le \left\|v^{-1}\right\|_\infty^{k-2}\left\|q^{(1)}\right\|^k\sum_{i=k-r}^{r}\varphi_i\varphi_{k-i}. \]
For the last sum we use again a loose bound:
\[ \sum_{i=k-r}^{r}\varphi_i\varphi_{k-i} \le \sum_{i=1}^{k-1}\varphi_i\varphi_{k-i} = \varphi_k. \]
Using (18.24) and $k \ge 2$ we can easily derive that
\[ \varphi_k \le \frac{2^{2k-3}}{k}, \qquad k \ge 2. \]
Substituting this we obtain
\[ \sum_{i=k-r}^{r}\left\|q^{(i)}\right\|\left\|q^{(k-i)}\right\| \le \frac{2^{2k-3}}{k}\left\|v^{-1}\right\|_\infty^{k-2}\left\|q^{(1)}\right\|^k, \]
proving the lemma. ✷
Now we are ready for the main result of this section.

Theorem IV.22 We have
\[ \|E^r(\alpha)\| \le \frac{1}{r+1}\sum_{k=r+1}^{2r}\alpha^k2^{2k-3}\left\|v^{-1}\right\|_\infty^{k-2}\left\|q^{(1)}\right\|^k. \]
Proof: From (18.21) we recall that
\[ \|E^r(\alpha)\| \le \sum_{k=r+1}^{2r}\alpha^k\sum_{i=k-r}^{r}\left\|q^{(i)}\right\|\left\|q^{(k-i)}\right\|. \]
Replacing the second sum by the upper bound in Lemma IV.21 and using that $k \ge r+1$ in the first sum, we obtain the result. ✷
18.4 Application to the primal-dual Dikin direction

18.4.1 Introduction

The Dikin direction, described in Appendix E, is one of the directions that can be used for solving the self-dual problem. In the next section we show that its definition can
easily be adapted to problems $(P)$ and $(D)$ in standard format. It will become clear that the analysis of the self-dual model also applies to the standard model and vice versa. Although we don't work it out here, we mention that use of the (first-order) Dikin direction leads to an algorithm for solving the standard model that requires at most
\[ \tau n\log\frac{\left(x^0\right)^Ts^0}{\varepsilon} \]
iterations, where $(x^0, s^0)$ denotes the initial primal-dual pair, $\varepsilon$ is an upper bound for the duality gap upon termination of the algorithm and $\tau$ an upper bound for the distance of the iterates to the central path.⁸ The complexity per iteration is $O(n^3)$, as usual. This is in accordance with the bounds in Appendix E for the self-dual model. By using higher-order versions of the Dikin direction the complexity can be improved by a factor $(\tau n)^{\frac{r-1}{2r}}$. Note that this factor goes to $\sqrt{\tau n}$ if $r$ goes to infinity. The complexity per iteration is $O(n^3 + rn^2)$. Hence, when taking $r = n$, the complexity per iteration remains $O(n^3)$. In that case we show that the iteration bound is improved by the factor $\sqrt{\tau n}$. When $\tau = O(1)$, which can be assumed without loss of generality, we obtain the iteration bound
\[ O\!\left(\sqrt{n}\,\log\frac{\left(x^0\right)^Ts^0}{\varepsilon}\right), \]
which is the best iteration bound for interior-point methods known until now.
18.4.2 The (first-order) primal-dual Dikin direction
Let $(x,s)$ be a positive primal-dual pair for $(P)$ and $(D)$ and let $\Delta x$ and $\Delta s$ denote displacements in the $x$-space and the $s$-space. Moving along $\Delta x$ and $\Delta s$ we arrive at
\[ x^+ := x + \Delta x, \qquad s^+ := s + \Delta s. \]
The new iterates will be feasible if
\[ A\Delta x = 0, \qquad A^T\Delta y + \Delta s = 0, \]
where $\Delta y$ represents the displacement in the $y$-space corresponding to $\Delta s$, and both $x^+$ and $s^+$ are nonnegative. Since $\Delta x$ and $\Delta s$ are orthogonal, the new duality gap is given by
\[ \left(x^+\right)^Ts^+ = x^Ts + x^T\Delta s + s^T\Delta x. \]
⁸ Originally, the Dikin direction was introduced for the standard format. See Jansen, Roos and Terlaky [156].
Replicating Dikin's idea, just as in Section E.2, we replace the nonnegativity conditions by the condition⁹
\[ \left\|\frac{\Delta x}{x} + \frac{\Delta s}{s}\right\| \le 1. \]
This can be rewritten as
\[ \left\|\frac{x^+ - x}{x} + \frac{s^+ - s}{s}\right\| \le 1, \]
showing that the new iterates are sought within an ellipsoid, called the Dikin ellipsoid at the given pair $(x,s)$. Since our aim is to minimize the duality gap, we consider the optimization problem
\[ \min\left\{e^T(s\Delta x + x\Delta s) : A\Delta x = 0,\ A^T\Delta y + \Delta s = 0,\ \left\|\frac{\Delta x}{x} + \frac{\Delta s}{s}\right\| \le 1\right\}. \tag{18.25} \]
The crucial observation is that (18.25) determines the displacements $\Delta x$ and $\Delta s$ uniquely. The arguments are almost the same as in Section E.2. Using the vectors $d$ and $v$ in (18.16), $x$ and $s$ can be rescaled to the same vector $v$:
\[ d^{-1}x = v, \qquad ds = v. \]
As usual, we rescale $\Delta x$ and $\Delta s$ accordingly to
\[ d_x := d^{-1}\Delta x, \qquad d_s := d\Delta s. \tag{18.26} \]
Then
\[ \frac{\Delta x}{x} = \frac{d_x}{v}, \qquad \frac{\Delta s}{s} = \frac{d_s}{v}, \]
and moreover,
\[ \Delta x\Delta s = d_xd_s. \]
⁹ Dikin introduced the so-called primal affine-scaling direction at a primal feasible $x$ ($x > 0$) by minimizing the primal objective $c^T(x + \Delta x)$ over the ellipsoid
\[ \left\|\frac{\Delta x}{x}\right\| \le 1, \]
subject to $A\Delta x = 0$. So the primal affine-scaling direction is determined as the unique solution of
\[ \min\left\{c^T\Delta x : A\Delta x = 0,\ \left\|\frac{\Delta x}{x}\right\| \le 1\right\}. \]
Dikin showed convergence of the primal affine-scaling method ([63, 64, 65]) under some nondegeneracy assumptions. Later, without nondegeneracy assumptions, Tsuchiya [268, 270] proved convergence of the method with damped steps. Dikin and Roos [66] proved convergence of a full-step method for the special case that the given problem is homogeneous. Despite many attempts, until now it has not been possible to show that the method is polynomial. For a recent survey paper we refer the reader to Tsuchiya [272]. The approach in this section seems to be the natural generalization to the primal-dual framework.
Also, the scaled displacements $d_x$ and $d_s$ are orthogonal. Now the vector occurring in the ellipsoidal constraint in (18.25) can be reduced to
\[ \frac{\Delta x}{x} + \frac{\Delta s}{s} = \frac{d_x + d_s}{v}. \]
Moreover, the variable vector in the objective of problem (18.25) can be written as
\[ s\Delta x + x\Delta s = xs\left(\frac{\Delta x}{x} + \frac{\Delta s}{s}\right) = v\,(d_x + d_s). \]
With
\[ d_w := d_x + d_s, \tag{18.27} \]
the vectors $d_x$ and $d_s$ are uniquely determined as the orthogonal components of $d_w$ in the null space and row space of $AD$, so we have
\[ d_x = P_{AD}(d_w), \tag{18.28} \]
\[ d_s = d_w - d_x. \tag{18.29} \]
Thus we can solve the problem (18.25) by solving the much simpler problem
\[ \min\left\{v^Td_w : \left\|\frac{d_w}{v}\right\| \le 1\right\}. \tag{18.30} \]
The solution of (18.30) is given by
\[ d_w = -\frac{v^3}{\left\|v^2\right\|} = -\frac{(xs)^{\frac32}}{\|xs\|}. \]
It follows that $d_x$ and $d_s$ are uniquely determined by the system
\[ ADd_x = 0, \qquad (AD)^Td_y + d_s = 0, \qquad d_x + d_s = d_w. \]
In terms of the unscaled displacements this can be rewritten as
\[ A\Delta x = 0, \qquad A^T\Delta y + \Delta s = 0, \qquad s\Delta x + x\Delta s = \Delta w, \tag{18.31} \]
where
\[ \Delta w = vd_w = -\frac{v^4}{\left\|v^2\right\|} = -\frac{(xs)^2}{\|xs\|}. \tag{18.32} \]
We conclude that the solution of the minimization problem (18.25) is uniquely determined by the linear system (18.31). Hence the (first-order) Dikin directions $\Delta x$ and $\Delta s$ are the Newton directions at $(x,s)$ corresponding to the displacement $\Delta w$ in the $w$-space, as given by (18.32). We therefore call $\Delta w$ the Dikin direction in the $w$-space.
In the next section we consider an algorithm using higher-order Dikin directions. Using the estimates of the error term $E^r(\alpha)$ in the previous section we analyze this algorithm in subsequent sections.
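For illustration only (not from the book), the following sketch computes the first-order primal-dual Dikin direction by forming $\Delta w$ from (18.32) and then solving the Newton system (18.31) via the normal equations; the function name and dense linear algebra are assumptions.

```python
import numpy as np

def dikin_direction(A, x, s):
    """First-order primal-dual Dikin direction for the standard model.

    Builds dw = -(xs)^2 / ||xs|| as in (18.32) and solves (18.31):
      A dx = 0,  A^T dy + ds = 0,  s dx + x ds = dw.
    """
    w = x * s
    dw = -(w**2) / np.linalg.norm(w)       # Dikin direction in the w-space
    d2 = x / s
    M = (A * d2) @ A.T                     # normal matrix A D^2 A^T
    dy = -np.linalg.solve(M, A @ (dw / s))
    ds = -A.T @ dy
    dx = (dw - x * ds) / s
    return dx, dy, ds
```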
18.4.3 Algorithm using higher-order Dikin directions
For the rest of this section, ∆w denotes the Dikin direction in the w-space as given
by (18.32). For r = 1, 2, . . . and for some fixed step-size α that is specified later, the
corresponding higher-order Newton steps of order r at (x, s) are given by (18.11). The
iterates after the step depend on the step-size α. To express this dependence we denote
them as x(α) and s(α) as in Section 18.2. We consider the following algorithm.
Higher-Order Dikin Step Algorithm for the Standard Model
Input:
An accuracy parameter ε > 0;
a step-size parameter α, 0 < α ≤ 1;
a positive primal-dual pair x0 , s0 .
begin
x := x0 ; s := s0 ;
while xT s ≥ ε do
begin
x := x(α) = x + ∆r,α x;
s := s(α) = s + ∆r,α s
end
end
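A compact sketch of the algorithm box above (illustrative only, not from the book); it reuses the hypothetical helper `higher_order_coefficients` from the earlier sketch in Section 18.2 and assumes a feasible positive starting pair.

```python
import numpy as np

def higher_order_dikin(A, x, s, r, alpha, eps=1e-6):
    """Higher-Order Dikin Step Algorithm for the Standard Model (sketch).

    Each iteration forms the Dikin direction dw from (18.32), computes the
    coefficients x^(1..r), s^(1..r) from (18.9)-(18.10), and takes the
    order-r step (18.11) with the fixed step-size alpha.
    """
    while x @ s >= eps:
        w = x * s
        dw = -(w**2) / np.linalg.norm(w)                     # (18.32)
        xs_c, ss_c = higher_order_coefficients(A, x, s, dw, r)
        dx = sum(xi * alpha**(i + 1) for i, xi in enumerate(xs_c))
        ds = sum(si * alpha**(i + 1) for i, si in enumerate(ss_c))
        x, s = x + dx, s + ds                                # step (18.11)
    return x, s
```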
Below we analyze this algorithm. Our aim is to keep the iterates (x, s) within some
cone
max (xs)
δc (xs) =
≤τ
min (xs)
around the central path, for some fixed τ > 1; τ is chosen such that
δc (x0 s0 ) ≤ τ.
18.4.4 Feasibility and duality gap reduction

As before, we use the superscript $^+$ to refer to entities after the higher-order Dikin step of size $\alpha$ at $(x,s)$. Thus,
\[ x^+ := x(\alpha) = x + \Delta^{r,\alpha}x, \qquad s^+ := s(\alpha) = s + \Delta^{r,\alpha}s, \]
and from Lemma IV.19,
\[ x^+s^+ = x(\alpha)s(\alpha) = xs + \alpha\Delta w + E^r(\alpha), \tag{18.33} \]
where the higher-order error term $E^r(\alpha)$ is given by (18.14).
The step-size $\alpha$ is feasible if the new iterates are positive. Using the same (simple continuity) argument as in the proof of Lemma E.2, page 455, we get the following result.
Lemma IV.23 If $\bar\alpha$ is such that $x(\alpha)s(\alpha) > 0$ for all $\alpha$ satisfying $0 \le \alpha \le \bar\alpha$, then the step-size $\bar\alpha$ is feasible.

Lemma IV.23 implies that the step-size $\bar\alpha$ is feasible if
\[ xs + \alpha\Delta w + E^r(\alpha) \ge 0, \qquad 0 \le \alpha \le \bar\alpha. \]
At the end of Section 18.2 we established that after the step the duality gap attains the value $e^T(xs + \alpha\Delta w)$. This leads to the following lemma.

Lemma IV.24 If the step-size $\alpha$ is feasible then
\[ \left(x^+\right)^Ts^+ \le \left(1 - \frac{\alpha}{\sqrt{n}}\right)x^Ts. \]
Proof: We have
\[ \left(x^+\right)^Ts^+ = e^T\left(xs - \alpha\frac{(xs)^2}{\|xs\|}\right) = x^Ts - \alpha\|xs\|. \]
The Cauchy–Schwarz inequality implies
\[ \|xs\| = \frac{1}{\sqrt{n}}\|e\|\,\|xs\| \ge \frac{e^T(xs)}{\sqrt{n}} = \frac{x^Ts}{\sqrt{n}}, \]
and the lemma follows.
✷

18.4.5 Estimate of the error term
By Theorem IV.22 the error term $E^r(\alpha)$ satisfies
\[ \|E^r(\alpha)\| \le \frac{1}{8(r+1)}\sum_{k=r+1}^{2r}\alpha^k2^{2k}\left\|v^{-1}\right\|_\infty^{k-2}\left\|q^{(1)}\right\|^k. \]
In the present case we have, from (18.19), (18.17) and (18.32),
\[ q^{(1)} = \frac{\Delta w}{v} = -\frac{v^3}{\left\|v^2\right\|}. \]
Hence
\[ \left\|q^{(1)}\right\| = \left\|\frac{v^3}{\left\|v^2\right\|}\right\| \le \|v\|_\infty\frac{\left\|v^2\right\|}{\left\|v^2\right\|} = \|v\|_\infty = \max(v). \]
Therefore,
\[ \left\|v^{-1}\right\|_\infty\left\|q^{(1)}\right\| \le \frac{\max(v)}{\min(v)} = \sqrt{\frac{\max(xs)}{\min(xs)}} = \sqrt{\delta_c(xs)} \le \sqrt{\tau}. \]
Substituting this we get
\[ \|E^r(\alpha)\| \le \frac{\left\|v^{-1}\right\|_\infty^{-2}}{8(r+1)}\sum_{k=r+1}^{2r}\alpha^k2^{2k}\tau^{\frac k2} = \frac{\min(xs)}{8(r+1)}\sum_{k=r+1}^{2r}\left(4\alpha\sqrt{\tau}\right)^k. \tag{18.34} \]
18.4.6 Step size
Assuming $\delta_c(xs) \le \tau$, with $\tau > 1$, we establish a bound for the step-size $\alpha$ such that this property is maintained after a higher-order Dikin step. The analysis follows the same lines as the analysis in Section E.4 of the algorithm for the self-dual model with first-order Dikin steps. As there, we derive from $\delta_c(xs) \le \tau$ the existence of positive numbers $\tau_1$ and $\tau_2$ such that
\[ \tau_1e \le xs \le \tau_2e, \qquad \text{with } \tau_2 = \tau\tau_1. \tag{18.35} \]
Without loss of generality we take
\[ \tau_1 = \min(xs). \]
The following lemma generalizes Lemma E.4.

Lemma IV.25 Let $\tau > 1$. Suppose that $\delta_c(xs) \le \tau$ and let $\tau_1$ and $\tau_2$ be such that (18.35) holds. If the step-size $\alpha$ satisfies
\[ \alpha \le \min\left\{\frac{\|xs\|}{2\tau_2},\ \frac{1}{4\sqrt{\tau}},\ \frac{1}{4\sqrt{\tau}}\sqrt[r]{\frac{2\tau_1\sqrt{\tau}}{\|xs\|}}\right\}, \]
then we have $\delta_c(x^+s^+) \le \tau$.
Proof: Using (18.33) and the definition of ∆w we obtain
2
x+ s+ = x(α)s(α) = xs + α∆w + E r (α) = xs −
α (xs)
+ E r (α).
kxsk
Using the first bound on α in the lemma, we can easily verify that the map
t 7→ t −
αt2
kxsk
is an increasing function for t ∈ [0, τ2 ]. Application of this map to each component of
the vector xs gives
2
α (xs)
ατ22
ατ12
e ≤ xs −
e.
≤ τ2 −
τ1 −
kxsk
kxsk
kxsk
It follows that
ατ12
ατ22
r
+ +
τ1 −
e + E (α) ≤ x s ≤ τ2 −
e + E r (α).
kxsk
kxsk
Hence, assuming for the moment that the Dikin step of size α is feasible, we certainly
have δ(x+ s+ ) ≤ τ if
ατ22
ατ12
r
e + E (α) ≥ τ2 −
e + E r (α).
τ
τ1 −
kxsk
kxsk
Since τ2 = τ τ1 , this reduces to
α
τ22 − τ τ12
kxsk
e + (τ − 1)E r (α) ≥ 0.
Since τ22 − τ τ12 = (τ − 1) τ1 τ2 we can divide by τ − 1, thus obtaining
ατ1 τ2
e + E r (α) ≥ 0.
kxsk
This inequality is certainly satisfied if
kxsk kE r (α)k ≤ ατ1 τ2 .
Using the upper bound (18.34) for E r (α) it follows that we have δ(x+ s+ ) ≤ τ if α is
such that
2r
√ k
kxsk min (xs) X
4α τ ≤ ατ1 τ2 .
8 (r + 1)
k=r+1
Since min (xs) = τ1 , this inequality simplifies to
2r
X
kxsk
8 (r + 1)
k=r+1
√ k
4α τ ≤ ατ2 .
√
The second bound in the lemma implies that 4α τ < 1. Therefore, the last sum is
bounded above by
2r
X
√ r+1
√ k
.
4α τ ≤ r 4α τ
k=r+1
Substituting this we arrive at the inequality
√ r+1
r kxsk (4α τ )
≤ ατ2 .
8 (r + 1)
Omitting the factor r/(r + 1), we can easily check that this inequality certainly holds
if
s
√
1 r 2τ1 τ
,
α≤ √
4 τ
kxsk
which is the third bound on α in the lemma. Thus we have shown that for each stepsize α satisfying the bounds in the lemma, we have δ(x+ s+ ) ≤ τ . But this implies that
the coordinates of x+ s+ do not vanish for any of these step-sizes. By Lemma IV.23
this also implies that the given step-size α is feasible. Hence the lemma follows. ✷
18.4.7 Convergence analysis
With the result of the previous section we can now derive an upper bound for the
number of iterations needed by the algorithm.
Lemma IV.26 Let $4/n \le \tau \le 4n$. Then, with the step-size
\[ \alpha = \frac{1}{4\sqrt{\tau}}\sqrt[r]{\frac{2}{\sqrt{\tau n}}}, \]
the Higher-Order Dikin Step Algorithm for the Standard Model requires at most
\[ 4\sqrt{\tau n}\,\sqrt[r]{\frac{\sqrt{\tau n}}{2}}\,\log\frac{\left(x^0\right)^Ts^0}{\varepsilon} \]
iterations.¹⁰ The output is a feasible primal-dual pair $(x,s)$ such that $\delta_c(xs) \le \tau$ and $x^Ts \le \varepsilon$.
Proof: Initially we are given a feasible primal-dual pair (x0 , s0 ) such that δc (x0 s0 ) ≤
τ. The given step-size α guarantees that these properties are maintained after each
iteration. This can be deduced from Lemma IV.25, as we now show. It suffices to show
that the specified value of α meets the bound in Lemma IV.25. Since τ n ≥ 4 we have
s
1
1 r 2
√
≤ √ ,
α= √
4 τ
τn
4 τ
√
showing that α meets the second bound. Since kxsk ≤ τ2 n we have
√
√
√
2τ1 τ
2τ1 τ
2 τ
2
√ = √ = √ ,
≥
kxsk
τ2 n
τ n
τn
which implies that α also meets the third bound in Lemma IV.25. Finally, for the first
bound in the lemma, we may write
√
√
n
kxsk
τ1 n
1
≥
=
≥ √ .
2τ2
2τ2
2τ
4 τ
The last inequality follows because τ ≤ 4n. Thus we have shown that α meets the
bounds in Lemma IV.25. As a consequence, the property δc (xs) ≤ τ is maintained
during the course of the algorithm. This also implies that the algorithm is well defined
and, hence, the only remaining task is to derive the iteration bound in the lemma. By
Lemma IV.24, each iteration reduces the duality gap by a factor 1 − θ, where
s
1 r 2
α
√ .
θ= √ = √
n
4 τn
τn
10
When r = 1 the step-size becomes
α=
2τ
1
√ ,
n
which is a factor of 2 smaller than the step-size in Section E.5. As a consequence the iteration
bound is a factor of 2 worse than in Section E.5. This is due to a weaker estimate of the error
term.
Hence, by Lemma I.36, the duality gap satisfies xT s ≤ ε after at most
r√
T
√ r τn
x0 s0
1
nµ0
log
= 4 τn
log
θ
ε
2
ε
iterations. This completes the proof.
✷
Recall that each iteration requires $O\!\left(n^3 + rn^2\right)$ arithmetic operations. In the rest of this section we take the order $r$ of the search direction equal to $r = n$. Then the complexity per iteration is still $O\!\left(n^3\right)$, just as in the case of a first-order method. The iteration bound of Lemma IV.26 then becomes
\[ 4\sqrt{\tau n}\,\sqrt[n]{\frac{\sqrt{\tau n}}{2}}\,\log\frac{\left(x^0\right)^Ts^0}{\varepsilon}. \]
Now, assuming $\tau \le 4n$, we have
\[ \sqrt[n]{\frac{\sqrt{\tau n}}{2}} \le \sqrt[n]{n}. \]
The last expression is maximal for $n = 3$ and is then equal to $1.44225$. Thus we may state without further proof the following theorem.

Theorem IV.27 Let $4/n \le \tau \le 4n$ and $r = n$. Then the Higher-Order Dikin Step Algorithm for the Standard Model stops after at most
\[ 6\sqrt{\tau n}\,\log\frac{\left(x^0\right)^Ts^0}{\varepsilon} \]
iterations. Each iteration requires $O(n^3)$ arithmetic operations.

For $\tau = 2$, which can be taken without loss of generality, the iteration bound of Theorem IV.27 becomes
\[ O\!\left(\sqrt{n}\,\log\frac{\left(x^0\right)^Ts^0}{\varepsilon}\right), \]
which is the best obtainable bound.
18.5 Application to the primal-dual logarithmic barrier method

18.5.1 Introduction
In this section we apply the higher-order approach to the (primal-dual) logarithmic barrier method. If the target value of the barrier parameter is $\mu$, then the search direction in the $w$-space at a given primal-dual pair $(x,s)$ is given by
\[ \Delta w = \mu e - xs. \]
We measure the proximity from $(x,s)$ to the target $\mu e$ by the usual measure
\[ \delta(xs,\mu) = \frac12\left\|\sqrt{\frac{xs}{\mu e}} - \sqrt{\frac{\mu e}{xs}}\right\| = \frac{1}{2\sqrt{\mu}}\left\|\frac{\mu e - xs}{\sqrt{xs}}\right\|. \tag{18.36} \]
In this chapter we also use an infinity-norm based proximity to the central path, namely
\[ \delta_\infty(xs,\mu) := \left\|\sqrt{\frac{\mu e}{xs}}\right\|_\infty = \max_i\sqrt{\frac{\mu}{x_is_i}} = \max_i\frac{\sqrt{\mu}}{v_i}. \tag{18.37} \]
Recall from Lemma II.62 that we always have
\[ \delta_\infty(xs,\mu) \le \rho\left(\delta(xs,\mu)\right). \tag{18.38} \]
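As a small illustration (not part of the text), both proximity measures can be evaluated directly from their definitions (18.36) and (18.37):

```python
import numpy as np

def delta(x, s, mu):
    """Proximity measure (18.36): 0.5 * || sqrt(xs/mu) - sqrt(mu/xs) ||."""
    v = np.sqrt(x * s)
    return 0.5 * np.linalg.norm(v / np.sqrt(mu) - np.sqrt(mu) / v)

def delta_inf(x, s, mu):
    """Infinity-norm proximity (18.37): max_i sqrt(mu / (x_i s_i))."""
    return np.max(np.sqrt(mu / (x * s)))
```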
Just as in the previous section, where we used the Dikin direction, our aim is to consider
a higher-order logarithmic barrier method that keeps the iterates within some cone
around the central path. The cone is obtained by requiring that the primal-dual pairs
(x, s) generated by the method are such that there exists a µ > 0 such that
δ(xs, µ) ≤ τ,
and
δ∞ (xs, µ) ≤ ζ
(18.39)
where τ and ζ denote some fixed positive numbers that specify the ‘width’ of the cone
around the central path in which the iterates are allowed to move.
When ζ = ρ(τ ) it follows from (18.38) that
δ(xs, µ) ≤ τ
⇒
δ∞ (xs, µ) ≤ ζ.
Hence, the logarithmic barrier methods considered in Part II fall within the present framework with $\zeta = \rho(\tau)$. The full Newton step method considered in Part II uses $\tau = 1/\sqrt{2}$. In the large-update methods of Part II the updates of the barrier parameter $\mu$ reduce $\mu$ by a factor $1-\theta$, where $\theta = O(1)$. As a consequence, after a barrier update we have $\delta(xs,\mu) = O(\sqrt{n})$. Hence, we may say that the full Newton step methods in Part II keep the iterates in a cone with $\tau = O(1)$, and the large-update methods in a wider cone with $\tau = O(\sqrt{n})$. Recall that the methods using the wider cone — the large-update methods — are multistep methods. Each single step is a damped (first-order) Newton step and the progress is measured by the decrease of the (primal-dual) logarithmic barrier function.
In this section we consider a method that works within a ‘wide’ cone, with $\tau = O(\sqrt{n})$ and $\zeta = O(1)$, but we use higher-order Newton steps instead of damped first-order steps. The surprising feature of the method is that progress can be controlled
by using the proximity measures δ(xs, µ) and δ∞ (xs, µ). We show that after an update
of the barrier parameter a higher-order step reduces the proximity δ(xs, µ) by a factor
smaller than one and keeps the proximity δ∞ (xs, µ) under a fixed threshold value
ζ ≥ 2. Then the barrier parameter value can be decreased to a smaller value while
respecting the cone condition (18.39). In this way we obtain a ‘wide-cone method’ whose iteration bound is $O\!\left(\sqrt{n}\,\log\frac{\left(x^0\right)^Ts^0}{\varepsilon}\right)$. Each iteration consists of a single higher-order Newton step.
Below we need to analyze the effect of a higher-order Newton step on the proximity measures. For that purpose the error term must be estimated.
18.5.2 Estimate of the error term
Recall that, by Theorem IV.22, the error term $E^r(\alpha)$ of Lemma IV.19 satisfies
\[ \|E^r(\alpha)\| \le \frac{1}{8(r+1)}\sum_{k=r+1}^{2r}\alpha^k2^{2k}\left\|v^{-1}\right\|_\infty^{k-2}\left\|q^{(1)}\right\|^k, \tag{18.40} \]
where $v = \sqrt{xs}$. In the present case, (18.19) and (18.17) give
\[ q^{(1)} = \frac{\Delta w}{v} = \frac{\mu e - xs}{\sqrt{xs}}. \]
Hence, using (18.36) and denoting $\delta(xs,\mu)$ by $\delta$, we find
\[ \left\|q^{(1)}\right\| = 2\sqrt{\mu}\,\delta \le 2\sqrt{\mu}\,\tau. \]
Furthermore, by using (18.37) and putting $\delta_\infty := \delta_\infty(xs,\mu)$ we have
\[ \left\|v^{-1}\right\|_\infty = \frac{1}{\sqrt{\mu}}\left\|\sqrt{\frac{\mu}{xs}}\right\|_\infty = \frac{\delta_\infty}{\sqrt{\mu}} \le \frac{\zeta}{\sqrt{\mu}}. \]
Substituting these in (18.40) we get¹¹
\[ \|E^r(\alpha)\| \le \frac{\mu}{8(r+1)\delta_\infty^2}\sum_{k=r+1}^{2r}\left(8\alpha\delta\delta_\infty\right)^k \le \frac{\mu}{8(r+1)\zeta^2}\sum_{k=r+1}^{2r}\left(8\alpha\tau\zeta\right)^k. \tag{18.41} \]
Below we always make the natural assumption that $\alpha \le 1$. Moreover, $\delta$ and $\delta_\infty$ always denote $\delta(xs,\mu)$ and $\delta_\infty(xs,\mu)$ respectively.
Lemma IV.28 Let the step-size be such that $\alpha \le 1/(8\delta\delta_\infty)$. Then
\[ \|E^r(\alpha)\| \le \frac{r\mu\left(8\alpha\delta\delta_\infty\right)^{r+1}}{8(r+1)\delta_\infty^2}. \]
Proof: Since $8\alpha\delta\delta_\infty \le 1$, we have
\[ \sum_{k=r+1}^{2r}\left(8\alpha\delta\delta_\infty\right)^k \le r\left(8\alpha\delta\delta_\infty\right)^{r+1}. \]
Substitution in (18.41) gives the lemma. ✷
Corollary IV.29 Let $\delta \le \tau$, $\delta_\infty \le \zeta$ and $\alpha \le 1/(8\tau\zeta)$. Then
\[ \|E^r(\alpha)\| \le \frac{r\mu\left(8\alpha\tau\zeta\right)^{r+1}}{8(r+1)\zeta^2}. \]
¹¹ For $r = 1$ the derived bound for the error term gives $\left\|E^1(1)\right\| \le 4\mu\delta^2$, as follows easily. It is interesting to compare this bound with the error bound in Section 7.4 (cf. Lemma II.49), which amounts to $\left\|E^1(1)\right\| \le \mu\delta^2\sqrt{2}$. Although the present bound is weaker by a factor of $2\sqrt{2}$ for $r = 1$, it is sharp enough for our present purpose. It is also sharp enough to derive an $O(\sqrt{n})$ complexity bound for $r = 1$ with some worse constant than before. Our main interest here is the case where $r > 1$.
18.5.3 Reduction of the proximity after a higher-order step
Recall from (18.13) that after a higher-order step of size $\alpha$ we have
\[ x^r(\alpha)s^r(\alpha) = xs + \alpha\Delta w + E^r(\alpha) = xs + \alpha(\mu e - xs) + E^r(\alpha). \]
We consider
\[ \bar w(\alpha) := xs + \alpha(\mu e - xs) \]
as the (intermediate) target during the step. The new iterate in the $w$-space is denoted by $w(\alpha)$, so
\[ w(\alpha) = x^r(\alpha)s^r(\alpha). \]
As a consequence,
\[ w(\alpha) = \bar w(\alpha) + E^r(\alpha). \tag{18.42} \]
The proximities of the new iterate with respect to the $\mu$-center are given by
\[ \delta(w(\alpha),\mu) = \frac12\left\|\sqrt{\frac{w(\alpha)}{\mu}} - \sqrt{\frac{\mu}{w(\alpha)}}\right\| \]
and
\[ \delta_\infty(w(\alpha),\mu) = \left\|\sqrt{\frac{\mu}{w(\alpha)}}\right\|_\infty. \]
Ideally the proximities after the step would be $\delta(\bar w(\alpha),\mu)$ and $\delta_\infty(\bar w(\alpha),\mu)$. We first derive upper bounds for $\delta(\bar w(\alpha),\mu)$ and $\delta_\infty(\bar w(\alpha),\mu)$, respectively, in terms of $\tau$, $\zeta$ and the step-size $\alpha$.
Lemma IV.30 We have
(i) $\delta(\bar w(\alpha),\mu) \le \sqrt{1-\alpha}\,\delta$;
(ii) $\delta_\infty(\bar w(\alpha),\mu) \le \sqrt{\alpha + (1-\alpha)\delta_\infty^2}$.

Proof: It is easily verified that for any positive vector $w$, by their definitions (18.36) and (18.37), both $\delta(w,\mu)^2$ and $\delta_\infty(w,\mu)^2$ are convex functions of $w$. Since
\[ \bar w(\alpha) = xs + \alpha(\mu e - xs) = \alpha(\mu e) + (1-\alpha)xs, \qquad 0 \le \alpha \le 1, \]
$\bar w(\alpha)$ is a convex combination of $\mu e$ and $xs$. Hence, by the convexity of $\delta(w,\mu)^2$,
\[ \delta(\bar w(\alpha),\mu)^2 \le \alpha\,\delta(\mu e,\mu)^2 + (1-\alpha)\,\delta(xs,\mu)^2. \]
Since $\delta(\mu e,\mu) = 0$, the first statement of the lemma follows.
The proof of the second claim is analogous. The convexity of $\delta_\infty(xs,\mu)^2$ gives
\[ \delta_\infty(\bar w(\alpha),\mu)^2 \le \alpha\,\delta_\infty(\mu e,\mu)^2 + (1-\alpha)\,\delta_\infty(xs,\mu)^2. \]
Since $\delta_\infty(\mu e,\mu) = 1$, the lemma follows. ✷
It is very important for our purpose that when the pair (x, s) satisfies the cone
condition (18.39) for µ > 0, then after a higher-order step at (x, s) to the µ-center,
the new iterates also satisfy the cone condition. The next corollary of Lemma IV.30 is a first step in this direction. It shows that $\bar w(\alpha)$ satisfies the cone condition. Recall that $\bar w(\alpha) = w(\alpha)$ if the higher-order step is exact. Later we deal with the case where the higher-order step is not exact (cf. Theorem IV.35 below). This requires careful estimation of the error term $E^r(\alpha)$.

Corollary IV.31 Let $\delta \le \tau$ and $\delta_\infty \le \zeta$, with $\zeta \ge 2$. Then we have
(i) $\delta(\bar w(\alpha),\mu) \le \sqrt{1-\alpha}\,\tau \le \left(1 - \frac{\alpha}{2}\right)\tau$;
(ii) $\delta_\infty(\bar w(\alpha),\mu) \le \sqrt{\alpha + (1-\alpha)\zeta^2} \le \left(1 - \frac{3\alpha}{8}\right)\zeta \le \zeta$.

Proof: The first claim is immediate from the first part of Lemma IV.30, since $\delta \le \tau$ and $\sqrt{1-\alpha} \le 1 - \alpha/2$. For the proof of the second statement we write, using the second part of Lemma IV.30 and $\zeta \ge 2$,
\[ \sqrt{\alpha + (1-\alpha)\delta_\infty^2} \le \sqrt{\alpha + (1-\alpha)\zeta^2} \le \sqrt{\alpha\frac{\zeta^2}{4} + (1-\alpha)\zeta^2} = \sqrt{1 - \frac{3\alpha}{4}}\,\zeta \le \left(1 - \frac{3\alpha}{8}\right)\zeta \le \zeta. \]
Thus the corollary has been proved. ✷
The next lemma provides an expression for the ‘error’ in the proximities after the step. We use the following relation, which is an obvious consequence of (18.42):
\[ \frac{w(\alpha)}{\mu} = \frac{\bar w(\alpha)}{\mu}\left(e + \frac{E^r(\alpha)}{\bar w(\alpha)}\right). \tag{18.43} \]
Lemma IV.32 Let $\alpha$ be such that
\[ \left\|\frac{E^r(\alpha)}{\bar w(\alpha)}\right\|_\infty \le \frac{\sqrt{5}-1}{2}. \]
Then we have
(i) $\delta(w(\alpha),\mu) \le \delta(\bar w(\alpha),\mu) + \sqrt{1 + \delta(\bar w(\alpha),\mu)^2}\,\left\|\frac{E^r(\alpha)}{\bar w(\alpha)}\right\|$;
(ii) $\delta_\infty(w(\alpha),\mu) \le \delta_\infty(\bar w(\alpha),\mu)\left(1 + \left\|\frac{E^r(\alpha)}{\bar w(\alpha)}\right\|_\infty\right)$.
Proof: Using (18.43) we may write
1
δ(w(α), µ) =
2
s
w̄(α)
µ
s
−1
µ
E r (α)
E r (α)
e+
−
e+
.
w̄(α)
w̄(α)
w̄(α)
To simplify the notation we omit the argument α in the rest of the proof and we
introduce the notation
E r (α)
λ :=
,
w̄(α)
so that
r
1
δ(w(α), µ) =
2
Since
q
w̄
µ
−
w̄
(e + λ) −
µ
r
µ
−1
(e + λ)
.
w̄
q
−1
µ
(e + λ) − w̄
(e + λ) =
p
q
1
− 12
µ
2
(e
+
λ)
(e
+
λ)
+ w̄
−
e
−
−
e
,
µ
w̄
q
pµ
w̄
w̄
µ
application of the triangle inequality gives
r
rµ
1
1
w̄
−1
(e + λ) 2 − e −
(e + λ) 2 − e . (18.44)
δ(w(α), µ) ≤ δ(w̄(α), µ) +
2
µ
w̄
Denoting the i-th coordinate of the vector under the last norm by zi , we have
r
rµ
1
1
w̄i
2
(1 + λi ) − 1 −
(1 + λi )− 2 − 1 .
zi =
µ
w̄i
This implies
|zi | ≤
r
1
w̄i
(1 + λi ) 2 − 1 +
µ
r
1
µ
(1 + λi )− 2 − 1 .
w̄i
√
The hypothesis of the lemma implies |λi | ≤ ( 5 − 1)/2. Now using some elementary
inequalities,12 we get
r
r
r
r
w̄i
µ
w̄i
µ
|λi | .
|λi | =
|λi | +
+
|zi | ≤
µ
w̄i
µ
w̄i
Since
r
w̄i
+
µ
and
r
r
µ
w̄i
2
w̄
−
µ
=4+
r
µ
w̄
we conclude that
Hence
r
2
∞
≤
r
r
w̄
−
µ
µ
w̄i
r
2
µ
w̄
p
|zi | ≤ 2 1 + δ(w̄(α), µ)2 |λi | ,
kzk ≤ 2
12
w̄i
−
µ
≤4+
r
w̄
−
µ
2
= 4δ(w̄(α), µ)2 ,
1 ≤ i ≤ n.
p
1 + δ(w̄(α), µ)2 kλk .
Exercise 86 Prove the following inequalities:
\[ \left|(1+\lambda)^{\frac12} - 1\right| \le |\lambda|, \qquad -1 \le \lambda \le 1, \]
\[ \left|(1+\lambda)^{-\frac12} - 1\right| \le |\lambda|, \qquad \frac{1-\sqrt{5}}{2} \le \lambda \le 1. \]
µ
w̄
2
∞
Substituting this in (18.44) proves the first statement of the lemma.
The proof of the second statement in the lemma is analogous. We write
s
−1
r
µ
µ
E r (α)
e+
=
δ∞ (w(α), µ)
=
w(α) ∞
w̄(α)
w̄(α)
∞
r
µ
−1
(e + λ)
=
w̄(α)
∞
r
r
1
µ
µ
=
(e + λ)− 2 − e
+
w̄(α)
w̄(α)
∞
r
r
µ
µ
− 21
+
(e + λ) − e
≤
.
w̄(α) ∞
w̄(α) ∞
∞
Using again the results of Exercise 86 we can simplify this to
δ∞ (w(α), µ)
≤
=
δ∞ (w̄(α), µ) + δ∞ (w̄(α), µ) kλk∞
E r (α)
δ∞ (w̄(α), µ) 1 +
,
w̄(α) ∞
proving the lemma.
✷
The following corollary easily follows from Lemma IV.32 and Corollary IV.31.

Corollary IV.33 Let $\delta \le \tau$ and $\delta_\infty \le \zeta$, with $\zeta \ge 2$. If $\alpha$ is such that
\[ \left\|\frac{E^r(\alpha)}{\bar w(\alpha)}\right\|_\infty \le \frac{\sqrt{5}-1}{2}, \]
then we have
(i) $\delta(w(\alpha),\mu) \le \left(1 - \frac{\alpha}{2}\right)\tau + \sqrt{1+\tau^2}\,\left\|\frac{E^r(\alpha)}{\bar w(\alpha)}\right\|$;
(ii) $\delta_\infty(w(\alpha),\mu) \le \left(1 - \frac38\alpha\right)\left(1 + \left\|\frac{E^r(\alpha)}{\bar w(\alpha)}\right\|_\infty\right)\zeta$.

We proceed by finding a step-size $\alpha$ that satisfies the hypothesis of Lemma IV.32 and Corollary IV.33.

Lemma IV.34 With $\delta$ and $\zeta$ as in Corollary IV.33, let the step-size $\alpha$ be such that $\alpha \le 1/(8\tau\zeta)$. Then
\[ \left\|\frac{E^r(\alpha)}{\bar w(\alpha)}\right\| \le \frac{r\left(8\alpha\tau\zeta\right)^{r+1}}{8(r+1)} \]
and $\alpha$ satisfies the hypothesis of Lemma IV.32 and Corollary IV.33.
Proof: We may write
\[ \left\|\frac{E^r(\alpha)}{\bar w(\alpha)}\right\| \le \left\|\frac{e}{\bar w(\alpha)}\right\|_\infty\|E^r(\alpha)\| = \frac{1}{\mu}\left\|\sqrt{\frac{\mu e}{\bar w(\alpha)}}\right\|_\infty^2\|E^r(\alpha)\| \le \frac{\zeta^2}{\mu}\|E^r(\alpha)\|, \]
where the last inequality follows from Corollary IV.31. Now using Corollary IV.29 we have
\[ \left\|\frac{E^r(\alpha)}{\bar w(\alpha)}\right\| \le \frac{r\left(8\alpha\tau\zeta\right)^{r+1}}{8(r+1)}, \]
proving the first part of the lemma. The second part follows from the first part by using $8\alpha\tau\zeta \le 1$:
\[ \left\|\frac{E^r(\alpha)}{\bar w(\alpha)}\right\|_\infty \le \left\|\frac{E^r(\alpha)}{\bar w(\alpha)}\right\| \le \frac{r}{8(r+1)} < \frac{\sqrt{5}-1}{2}, \]
completing the proof. ✷
Equipped with the above results we can prove the next theorem.

Theorem IV.35 Let $\delta \le \tau$, $\delta_\infty \le \zeta$, with $\zeta \ge 2$, and $\alpha \le 1/(8\tau\zeta)$. Then
(i) $\delta(w(\alpha),\mu) \le \left(1 - \frac{\alpha}{2}\right)\tau + \frac{r}{8(r+1)}\sqrt{1+\tau^2}\,\left(8\alpha\tau\zeta\right)^{r+1}$;
(ii) $\delta_\infty(w(\alpha),\mu) \le \left(1 - \frac38\alpha\right)\left(1 + \frac{r}{8(r+1)}\left(8\alpha\tau\zeta\right)^{r+1}\right)\zeta$.

Proof: For the given step-size the hypothesis of Corollary IV.33 is satisfied, by Lemma IV.34. From Lemma IV.34 we also deduce the second inequality in
\[ \left\|\frac{E^r(\alpha)}{\bar w(\alpha)}\right\|_\infty \le \left\|\frac{E^r(\alpha)}{\bar w(\alpha)}\right\| \le \frac{r\left(8\alpha\tau\zeta\right)^{r+1}}{8(r+1)}. \]
Substituting these inequalities in Corollary IV.33 yields the theorem. ✷
18.5.4 The step-size
In the sequel the step-size $\alpha$ is given the value
\[ \alpha = \frac{1}{8\tau\zeta\,\sqrt[r]{(r+1)\zeta\sqrt{1+\tau^2}}}, \tag{18.45} \]
where $\delta = \delta(xs,\mu) \le \tau$ and $\delta_\infty = \delta_\infty(xs,\mu) \le \zeta$. It is assumed that $\zeta \ge 2$. The next theorem makes clear that after a higher-order step with the given step-size $\alpha$ the proximity $\delta$ is below a fixed fraction of $\tau$ and the proximity $\delta_\infty$ below a fixed fraction of $\zeta$.

Theorem IV.36 If the step-size is given by (18.45) then
\[ \delta(w(\alpha),\mu) \le \left(1 - \frac{\alpha(r^2+1)}{2(r+1)^2}\right)\tau. \]
Moreover,
\[ \delta_\infty(w(\alpha),\mu) \le \left(1 - \frac{\alpha}{8}\right)\zeta. \]
Proof: The proof uses Theorem IV.35. This theorem applies because for the given value of $\alpha$ we have
\[ \alpha = \frac{1}{8\tau\zeta\,\sqrt[r]{(r+1)\zeta\sqrt{1+\tau^2}}} \le \frac{1}{8\tau\zeta}, \]
whence $8\alpha\tau\zeta \le 1$. Hence, by the first statement in Theorem IV.35,
\[ \delta(w(\alpha),\mu) \le \left(1 - \frac{\alpha}{2}\right)\tau + \frac{r}{8(r+1)}\sqrt{1+\tau^2}\,\left(8\alpha\tau\zeta\right)^{r+1}. \tag{18.46} \]
The second term on the right can be reduced by using the definition of $\alpha$:
\[ \delta(w(\alpha),\mu) \le \left(1 - \frac{\alpha}{2}\right)\tau + \frac{r}{8(r+1)}\sqrt{1+\tau^2}\,\frac{8\alpha\tau\zeta}{(r+1)\zeta\sqrt{1+\tau^2}} = \left(1 - \frac{\alpha}{2} + \frac{r\alpha}{(r+1)^2}\right)\tau = \left(1 - \frac{r^2+1}{2(r+1)^2}\,\alpha\right)\tau. \]
This proves the first statement. The second claim follows in a similar way from the second statement in Theorem IV.35:
\[ \begin{aligned} \delta_\infty(w(\alpha),\mu) &\le \left(1 - \frac38\alpha\right)\left(1 + \frac{r}{8(r+1)}\left(8\alpha\tau\zeta\right)^{r+1}\right)\zeta \\ &= \left(1 - \frac38\alpha\right)\left(1 + \frac{r}{8(r+1)}\,\frac{8\alpha\tau\zeta}{(r+1)\zeta\sqrt{1+\tau^2}}\right)\zeta \\ &= \left(1 - \frac38\alpha\right)\left(1 + \frac{r\alpha\tau}{(r+1)^2\sqrt{1+\tau^2}}\right)\zeta \\ &< \left(1 - \frac38\alpha\right)\left(1 + \frac{r\alpha}{(r+1)^2}\right)\zeta \\ &\le \left(1 - \frac38\alpha\right)\left(1 + \frac14\alpha\right)\zeta \\ &\le \left(1 - \frac{\alpha}{8}\right)\zeta. \end{aligned} \]
In the last but one inequality we used that $r/(r+1)^2$ is monotonically decreasing in $r$ (for $r \ge 1$). ✷
18.5.5 Reduction of the barrier parameter
In this section we assume that $\delta = \delta(xs,\mu) \le \tau$, where $\tau$ is any positive number. After a higher-order step with step-size $\alpha$, given by (18.45), we have by Theorem IV.36,
\[ \delta(w(\alpha),\mu) \le (1-\beta)\,\delta, \]
where
\[ \beta = \frac{\alpha(r^2+1)}{2(r+1)^2}. \tag{18.47} \]
Below we investigate how far $\mu$ can be decreased after the step while keeping the proximity $\delta$ less than or equal to $\tau$. Before doing this we observe that
\[ \delta_\infty(xs,\mu) = \left\|\sqrt{\frac{\mu e}{xs}}\right\|_\infty \]
is monotonically decreasing as $\mu$ decreases. Hence, we do not have to worry about $\delta_\infty$ when $\mu$ is reduced. Defining
\[ \mu^+ := (1-\theta)\mu, \]
we first deal with a lemma that later gives an upper bound for $\delta(w(\alpha),\mu^+)$.¹³
Lemma IV.37 Let $(x,s)$ be a positive primal-dual pair and suppose $\mu > 0$. If $\delta := \delta(xs,\mu)$ and $\mu^+ = (1-\theta)\mu$ then
\[ \delta(xs,\mu^+) \le \frac{2\delta + \theta\sqrt{n}}{2\sqrt{1-\theta}}. \]
Proof: By the definition of $\delta(xs,\mu^+)$,
\[ \delta(xs,\mu^+) = \frac12\left\|\sqrt{\frac{xs}{\mu^+e}} - \sqrt{\frac{\mu^+e}{xs}}\right\|. \]
To simplify the notation in the proof we use $u = \sqrt{xs/\mu}$. Then we may write
\[ \delta(xs,\mu^+) = \frac12\left\|\frac{u}{\sqrt{1-\theta}} - \sqrt{1-\theta}\,u^{-1}\right\| = \frac12\left\|\sqrt{1-\theta}\left(u - u^{-1}\right) + \frac{\theta u}{\sqrt{1-\theta}}\right\|. \]
Using the triangle inequality and¹⁴ also
\[ \|u\| \le \left\|u - u^{-1}\right\| + \sqrt{n} = 2\delta + \sqrt{n}, \]
we get
\[ \delta(xs,\mu^+) \le \delta\sqrt{1-\theta} + \frac{\theta\|u\|}{2\sqrt{1-\theta}} \le \delta\sqrt{1-\theta} + \frac{\theta\left(2\delta + \sqrt{n}\right)}{2\sqrt{1-\theta}} = \frac{2\delta + \theta\sqrt{n}}{2\sqrt{1-\theta}}, \]
proving the lemma. ✷

¹³ A similar result was derived in Lemma II.54, but under the assumption that $x^Ts = n\mu$. This assumption will in general not be satisfied in the present context, and hence we have a weaker bound.
¹⁴ Exercise 87 For each positive number $\xi$ we have
\[ |\xi| \le \left|\xi - \frac{1}{\xi}\right| + 1. \]
Prove this and derive that for each positive vector $u$ the following inequality holds:
\[ \|u\| \le \left\|u - u^{-1}\right\| + \sqrt{n}. \]
Theorem IV.38 Let $\delta = \delta(xs,\mu) \le \tau$ and $\delta_\infty = \delta_\infty(xs,\mu) \le \zeta$, with $\zeta \ge 2$. Taking first a higher-order step at $(x,s)$, with $\alpha$ according to (18.45), and then updating the barrier parameter to $\mu^+ = (1-\theta)\mu$, where
\[ \theta = \frac{2\beta\tau}{2\tau + \sqrt{n}} = \frac{\alpha\tau(r^2+1)}{(r+1)^2\left(2\tau + \sqrt{n}\right)}, \tag{18.48} \]
we have $\delta(w(\alpha),\mu^+) \le \tau$ and $\delta_\infty(w(\alpha),\mu^+) \le \zeta$.

Proof: The second part of Theorem IV.36 implies that after a step of the given size $\delta_\infty(w(\alpha),\mu) \le \zeta$. We established earlier that $\delta_\infty$ monotonically decreases when $\mu$ decreases. As a result we have $\delta_\infty(w(\alpha),\mu^+) \le \zeta$. Now let us estimate $\delta(w(\alpha),\mu^+)$.
After a higher-order step with step-size $\alpha$ as given by (18.45), we have by the first part of Theorem IV.36,
\[ \delta(w(\alpha),\mu) \le \left(1 - \frac{\alpha(r^2+1)}{2(r+1)^2}\right)\delta(xs,\mu) = (1-\beta)\,\delta, \]
with $\beta$ as defined in (18.47). Also using Lemma IV.37 we obtain
\[ \delta(w(\alpha),\mu^+) \le \frac{2\delta(w(\alpha),\mu) + \theta\sqrt{n}}{2\sqrt{1-\theta}} \le \frac{2(1-\beta)\delta + \theta\sqrt{n}}{2\sqrt{1-\theta}}. \]
Since $\delta \le \tau$, we certainly have $\delta(w(\alpha),\mu^+) \le \tau$ if
\[ \frac{2(1-\beta)\tau + \theta\sqrt{n}}{2\sqrt{1-\theta}} \le \tau. \]
This inequality can be rewritten as
\[ 2(1-\beta)\tau + \theta\sqrt{n} \le 2\tau\sqrt{1-\theta}. \]
Using $\sqrt{1-\theta} \ge 1-\theta$, the above inequality certainly holds if
\[ 2(1-\beta)\tau + \theta\sqrt{n} \le 2\tau(1-\theta). \]
It is easily verified that the value of $\theta$ in (18.48) satisfies this inequality with equality. Thus the proof is complete. ✷
18.5.6 A higher-order logarithmic barrier algorithm
Formally the logarithmic barrier algorithm using higher-order Newton steps can be
described as below.
Higher-Order Logarithmic Barrier Algorithm

Input:
  A natural number r, the order of the search directions;
  a positive number τ, specifying the cone;
  a primal-dual pair (x⁰, s⁰) and µ⁰ > 0 such that δ(x⁰s⁰, µ⁰) ≤ τ;
  ζ := max{2, δ∞(x⁰s⁰, µ⁰)};
  a step-size parameter α, from (18.45);
  an update parameter θ, from (18.48);
  an accuracy parameter ε > 0;
begin
x := x0 ; s := s0 ; µ := µ0 ;
while xT s ≥ ε do
begin
x := x(α) = x + ∆r,α x;
s := s(α) = s + ∆r,α s;
µ := (1 − θ)µ
end
end
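A sketch of the algorithm box in Python (illustrative, not from the book); it reuses the hypothetical helpers `higher_order_coefficients` and `delta_inf` from the earlier sketches and assumes a starting pair satisfying the cone condition.

```python
import numpy as np

def higher_order_log_barrier(A, x, s, mu, r, tau, eps=1e-6):
    """Higher-Order Logarithmic Barrier Algorithm (sketch).

    Each iteration targets mu*e - x*s, takes one order-r Newton step of size
    alpha from (18.45), and reduces mu by the factor 1 - theta from (18.48).
    """
    n = A.shape[1]
    zeta = max(2.0, delta_inf(x, s, mu))
    alpha = 1.0 / (8 * tau * zeta * ((r + 1) * zeta * np.sqrt(1 + tau**2))**(1.0 / r))
    beta = alpha * (r**2 + 1) / (2 * (r + 1)**2)
    theta = 2 * beta * tau / (2 * tau + np.sqrt(n))
    while x @ s >= eps:
        dw = mu * np.ones(n) - x * s                       # target direction in w-space
        xs_c, ss_c = higher_order_coefficients(A, x, s, dw, r)
        x = x + sum(xi * alpha**(i + 1) for i, xi in enumerate(xs_c))
        s = s + sum(si * alpha**(i + 1) for i, si in enumerate(ss_c))
        mu *= (1.0 - theta)
    return x, s, mu
```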
A direct consequence of the specified values of the step-size α and update parameter
θ is that the properties δ(xs, µ) ≤ τ and δ∞ (xs, µ) ≤ ζ are maintained in the course of
the algorithm. This follows from Theorem IV.38 and makes the algorithm well-defined.
18.5.7 Iteration bound
In the further analysis of the algorithm we choose
√
τ = n and r = n.
At the end of each iteration of the algorithm we have δ(xs, µ) ≤ τ = √n.
As a consequence (cf. Exercise 62),

xT s ≤ (1 + 2τρ(τ)/√n) nµ = (1 + 2ρ(√n)) nµ ≤ 4(1 + √n) nµ.
Hence xT s ≤ ε holds if 4(1 + √n) nµ ≤ ε, or

µ ≤ ε/(4n(1 + √n)).
Recall that at each iteration the barrier parameter is reduced by a factor 1 − θ, with

θ = ατ(r² + 1)/((r + 1)²(2τ + √n)) = α(n² + 1)/(3(n + 1)²) ≥ α/6.           (18.49)
The last inequality holds for all n ≥ 1. Using Lemma I.36 we find that the number of
iterations does not exceed
(6/α) log(4(1 + √n) nµ0/ε).
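The counting behind this bound is elementary: µ is multiplied by 1 − θ in every iteration, it has to be driven below ε/(4n(1 + √n)), and θ ≥ α/6 together with −log(1 − θ) ≥ θ gives the stated estimate. The following small sketch (our own; it uses only the Python standard library and does not reproduce Lemma I.36) makes the count explicit for concrete values of n, µ0, ε and θ.

import math

def iterations_needed(mu0, eps, theta, n):
    # target value of mu below which x^T s <= eps is guaranteed
    mu_bar = eps / (4 * n * (1 + math.sqrt(n)))
    exact = math.ceil(math.log(mu0 / mu_bar) / (-math.log(1 - theta)))
    bound = math.log(mu0 / mu_bar) / theta   # Lemma I.36-style estimate
    assert exact <= math.ceil(bound)
    return exact, bound

# for example n = 100, mu0 = 1, eps = 1e-6 and theta = 0.01
print(iterations_needed(mu0=1.0, eps=1e-6, theta=0.01, n=100))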
Substituting α in (18.45) and τ = √n, we get

6/α = 48ζ√n ((n + 1)ζ√(1 + √n))^(1/n).
For n ≥ 1 we have

((n + 1)√(1 + √n))^(1/n) ≤ 2√2 = 2.8284,
with equality only if n = 1. Thus we find
6/α ≤ 136 ζ^((n+1)/n) √n.
Thus we may state the next theorem without further proof.
Theorem IV.39 The Higher-Order Logarithmic Barrier Algorithm needs at most

136 ζ^((n+1)/n) √n log(4(1 + √n) nµ0/ε)

iterations. Each iteration requires O(n³) arithmetic operations. The output is a primal-dual pair (x, s) such that xT s ≤ ε.
When starting the algorithm on the central path, with µ0 = x0T s0/n, we have ζ = 2.
In that case δ∞(xs, µ) ≤ 2 at each iteration and the iteration bound of Theorem IV.39
becomes

544√n log(4(1 + √n) nµ0/ε) = O(√n log(n√n µ0/ε)).                         (18.50)
In fact, as long as ζ = O(1) the iteration bound is given by the right-hand expression
in (18.50). Note that this bound has the same order of magnitude as the best known
iteration complexity bound.
When (x0, s0) is far from the central path, the value of ζ may be so large that the
iteration bound of Theorem IV.39 becomes very poor. Note that ζ can be as large as
ρ(τ), which would give an extra factor O(n^((n+1)/(2n))) in (18.50). However, a more careful
analysis yields a much better bound, as we show in the next section.
18.5.8 Improved iteration bound
In this section we consider the situation where the algorithm starts with a high value
of ζ. Recall from the previous section that if τ = √n then ζ is always bounded by
ζ ≤ ρ(√n) = O(√n). Now the second part of Theorem IV.36 implies that after a
higher-order step at (x, s) to the µ-center we have
δ∞(w(α), µ) ≤ (1 − α/8) ζ.
Reducing µ to µ+ = (1 − θ)µ we get
δ∞(w(α), µ+) ≤ (1 − θ)(1 − α/8) ζ.
Now using the lower bound (18.49) for θ it follows that
δ∞(w(α), µ+) ≤ (1 − α/6)(1 − α/8) ζ.
Since 0 ≤ α ≤ 1 we have (1 − α/6)(1 − α/8) ≤ 1 − α/4. Hence
δ∞(w(α), µ+) ≤ (1 − α/4) ζ.
Substituting the value of α, while using

8((n + 1)ζ√(1 + √n))^(1/n) ≤ 8((n + 1)ρ(√n)√(1 + √n))^(1/n) ≤ 55,
we obtain

δ∞(w(α), µ+) ≤ ζ − ζ/(220τζ) = ζ − 1/(220√n),

showing that δ∞(xs, µ) decreases by at least 1/(220√n) in one iteration. Obviously,
we can redefine ζ according to

ζ := max{2, δ∞(w(α), µ+)} ≤ max{2, ζ − 1/(220√n)}
in the next iteration and continue the algorithm with this new value. In this way ζ
reaches the value 2 in no more than

220√n (ζ0 − 2) = O(ζ0√n)

iterations, where ζ0 = δ∞(x0 s0, µ0). From then on ζ keeps the value 2, and the number
of additional iterations is bounded by (18.50). Hence we may state the following
improvement of Theorem IV.39 without further proof.
Theorem IV.40 The Higher-Order Logarithmic Barrier Algorithm needs at most

O(ζ0√n + √n log(4√n nµ1/ε))

iterations. Each iteration requires O(n³) arithmetic operations. The output is a primal-dual pair (x, s) such that xT s ≤ ε.
In this theorem µ1 denotes the value of the barrier parameter attained at the first
iteration for which ζ = 2. Obviously, µ1 ≤ µ0 .
19 Parametric and Sensitivity Analysis

19.1 Introduction
Many commercial optimization packages for solving LO problems not only solve the
problem at hand, but also provide additional information on the solution. This added
information concerns the sensitivity of the solution produced by the package to perturbations in the data of the problem. In this chapter we deal with a problem (P ) in
standard format:
(P)        min {cT x : Ax = b, x ≥ 0}.
The dual problem (D) is written as
(D)        max {bT y : AT y + s = c, s ≥ 0}.
The input data for both problems consists of the matrix A, which is of size m × n,
and the vectors b ∈ IRm and c ∈ IRn . The optimal value of (P ) and (D) is denoted by
zA (b, c), with zA (b, c) = −∞ if (P ) is unbounded and (D) infeasible, and zA (b, c) = ∞
if (D) is unbounded and (P ) infeasible. If (P ) and (D) are both infeasible then zA (b, c)
is undefined. We call zA the optimal-value function for the matrix A.
The extra information provided by solution packages concerns only changes in the
vectors b and c. We also restrict ourselves to such changes. It will follow from the
results below that zA (b, c) depends continuously on the vectors b and c. In contrast,
the effect of changes in the matrix A is not necessarily continuous. The next example
provides a simple illustration of this phenomenon.1
Example IV.41 Consider the problem
min {x2 : αx1 + x2 = 1, x1 ≥ 0, x2 ≥ 0} ,
where α ∈ IR. In this example we have A = (α 1), b = (1) and c = (0 1)T . We can
easily verify that zA (b, c) = 0 if α > 0 and zA (b, c) = 1 if α ≤ 0. Thus, if zA (b, c) is
considered a function of α, a discontinuity occurs at α = 0.
♦
Thus, the dependence of zA(b, c) on the entries in b and c is simpler than the
dependence of zA(b, c) on the entries in A.
1
For some results on the effect of changes in A we refer the reader to Mills [210] and Gal [89].
We develop some theory in this chapter for the analysis of one-dimensional
parametric perturbations of the vectors b and c. Given a pair of optimal solutions
for (P ) and (D), we present an algorithm in Section 19.4.5 for the computation
of the optimal-value function under such a perturbation. Then, in Section 19.5 we
consider the special case of sensitivity analysis, also called postoptimal analysis. This
classical topic is treated in almost all (text-)books on LO and implemented in almost
all commercial optimization packages for LO. We show in Section 19.5.1 that the so-called ranges and shadow prices of the coefficients in b and c can be obtained by solving
auxiliary LO problems. In Section 19.5.3 we briefly discuss the classical approach to
sensitivity analysis, which is based on the use of an optimal basic solution and the
corresponding optimal basis. Although the classical approach is much cheaper from a
computational point of view, it yields less information and can easily be misinterpreted.
This is demonstrated in Section 19.5.4, where we provide a striking example of the
inherent weaknesses of the classical approach.
19.2 Preliminaries
The feasible regions of (P) and (D) are denoted by

P := {x : Ax = b, x ≥ 0},
D := {(y, s) : AT y + s = c, s ≥ 0}.
Assuming that (P ) and (D) are both feasible, the optimal sets of (P ) and (D) are
denoted by P ∗ and D∗ . We define the index sets B and N by
B := {i : xi > 0 for some x ∈ P ∗ } ,
N := {i : si > 0 for some (y, s) ∈ D∗ } .
The Duality Theorem (Theorem II.2) implies that B ∩ N = ∅, and the Goldman–
Tucker Theorem (Theorem II.3) that
B ∪ N = {1, 2, . . . , n}.
Thus, B and N form a partition of the full index set. This (ordered) partition, denoted
by π = (B, N ), is the optimal partition of problems (P ) and (D). It is obvious that
the optimal partition depends on b and c.
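The definition of B can be turned into a small computation. The sketch below is our own illustration; it assumes SciPy's linprog is available and that (P) and (D) have optimal solutions. It first computes the optimal value z∗ and then, for each index i, maximizes xi over the optimal face {x ∈ P : cT x = z∗}; the index belongs to B exactly when this maximum is positive, and N is the complement of B by the Goldman–Tucker Theorem.

import numpy as np
from scipy.optimize import linprog

def optimal_partition(A, b, c, tol=1e-8):
    A, b, c = np.asarray(A, float), np.asarray(b, float), np.asarray(c, float)
    n = A.shape[1]
    z = linprog(c, A_eq=A, b_eq=b, bounds=(0, None)).fun   # optimal value z*
    A_face = np.vstack([A, c])                             # optimal face: Ax = b, c^T x = z*, x >= 0
    b_face = np.append(b, z)
    B = []
    for i in range(n):
        obj = np.zeros(n); obj[i] = -1.0                   # maximize x_i over the optimal face
        res = linprog(obj, A_eq=A_face, b_eq=b_face, bounds=(0, None))
        if res.status == 0 and -res.fun > tol:
            B.append(i)
    N = [i for i in range(n) if i not in B]
    return B, N

# tiny illustration: min x1 + 3x2 + x3 subject to x1 + x2 + x3 = 4, x >= 0
print(optimal_partition([[1, 1, 1]], [4], [1, 3, 1]))      # -> ([0, 2], [1])

Since an interior-point method yields a strictly complementary solution, in practice the partition can also be read off directly from the supports of such a solution; the auxiliary problems above use nothing beyond the definition of B.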
19.3 Optimal sets and optimal partition
In the rest of this chapter we assume that b and c are such that (P ) and (D) have
optimal solutions, and π = (B, N ) denotes the optimal partition of both problems. By
definition, the optimal partition is determined by the sets of optimal solutions for (P )
and (D). In this section it is made clear that, conversely, the optimal partition provides
essential information on the optimal solution sets P ∗ and D∗ . The next lemma follows
immediately from the Duality Theorem and is stated without proof.
Lemma IV.42 Let x∗ ∈ P∗ and (y∗, s∗) ∈ D∗. Then

P∗ = {x : x ∈ P, xT s∗ = 0},
D∗ = {(y, s) : (y, s) ∈ D, sT x∗ = 0}.
As before, we use the notation xB and xN to refer to the restriction of a vector
x ∈ IRn to the coordinate sets B and N respectively. Similarly, AB denotes the
restriction of A to the columns in B and AN the restriction of A to the columns
in N . Now the sets P ∗ and D∗ can be described in terms of the optimal partition.
Lemma IV.43 Given the optimal partition (B, N ) of (P ) and (D), the optimal sets
of both problems are given by
P∗ = {x : x ∈ P, xN = 0},
D∗ = {(y, s) : (y, s) ∈ D, sB = 0}.
Proof: Let x∗ , s∗ be any strictly complementary pair of solutions of (P ) and (D), and
(x, s) an arbitrary pair of feasible solutions. Then, from Lemma IV.42, x is optimal
for (P ) if and only if xT s∗ = 0. Since s∗B = 0 and s∗N > 0, we have xT s∗ = 0 if and
only if xN = 0, thus proving that P ∗ consists of all primal feasible x for which xN = 0.
Similarly, if (y, s) ∈ D then this pair is optimal if and only if sT x∗ = 0. Since x∗B > 0
and x∗N = 0, this occurs if and only if sB = 0, thus proving that D∗ consists of all
dual feasible s for which sB = 0.
✷
To illustrate the meaning of Lemma IV.43 we give an example.
Example IV.44 Figure 19.1 shows a network with given arc lengths, and we ask for
a shortest path from node s to node t.
Denoting the set of nodes in this network by V and the set of arcs by E, any path
from s to t can be represented by a 0-1 vector x of length |E|, whose coordinates are
indexed by the arcs, such that xe = 1 if and only if arc e belongs to the path. The
length of the path is then given by
∑e∈E ce xe ,                                                               (19.1)
Figure 19.1    A shortest path problem.
where ce denotes the length of arc e, for all e ∈ E. Furthermore, denoting e = (v, w)
if arc e points from node v to node w (with v ∈ V and w ∈ V ), and denoting xe by
xvw , x will satisfy the following balance equations:
∑v∈V xsv = 1
∑v∈V xvu = ∑v∈V xuv ,     u ∈ V \ {s, t}                                   (19.2)
∑v∈V xvt = 1
Now consider the LO problem consisting of minimizing the linear function (19.1)
subject to the linear equality constraints in (19.2), with all variables xe , e ∈ E,
nonnegative. This problem has the standard format: it is a minimization problem
with equality constraints and nonnegative variables. Solving this problem with an
interior-point method we find a strictly complementary solution, and hence the optimal
partition of the problem. In this way we have computed the optimal partition (B, N )
of the problem. Since in this example there is a 1-to-1 correspondence between the
arcs and the variables, we may think of B and N as a partition of the arcs in the
network.
Figure 19.2    The optimal partition of the shortest path problem in Figure 19.1.
In Figure 19.2 we have drawn the network once more, but now with the arcs in B
solid and the arcs in N dashed. The meaning of Lemma IV.43 is that any path from
s to t using only solid arcs is a shortest path, and all shortest paths use exclusively
solid arcs. In other words, the set B consists of all arcs in the network which occur in
some shortest path from s to t and the set N contains arcs in the network which do
not belong to any shortest path from s to t.2
♦
2
Exercise 88 Consider any network with node set V and arc set E and let s and t be two distinct
nodes in this network. If all arcs in the network have positive length, then the set B, consisting of
all arcs in the network which occur in at least one shortest path from s to t, does not contain a
(directed) circuit. Prove this.
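For completeness we indicate how the formulation (19.1)–(19.2) looks in code. The sketch is our own and assumes SciPy; it uses a small digraph of its own rather than the network of Figure 19.1, and it writes the balance equations in the equivalent node–arc incidence form (one equation per node, outflow minus inflow equal to the supply).

import numpy as np
from scipy.optimize import linprog

# arcs (v, w, length) of a hypothetical digraph with nodes s, a, b, t
arcs = [('s', 'a', 1.0), ('s', 'b', 4.0), ('a', 'b', 1.0),
        ('a', 't', 5.0), ('b', 't', 1.0)]
nodes = ['s', 'a', 'b', 't']

c = np.array([length for _, _, length in arcs])        # objective (19.1)
A = np.zeros((len(nodes), len(arcs)))                  # balance equations (19.2)
for j, (v, w, _) in enumerate(arcs):
    A[nodes.index(v), j] += 1.0                        # arc e = (v, w) leaves v
    A[nodes.index(w), j] -= 1.0                        # and enters w
b = np.zeros(len(nodes))
b[nodes.index('s')], b[nodes.index('t')] = 1.0, -1.0   # one unit flows from s to t

res = linprog(c, A_eq=A, b_eq=b, bounds=(0, None))
print(res.fun)   # length of a shortest path; here 1 + 1 + 1 = 3
print(res.x)     # the unique shortest path s -> a -> b -> t, i.e. x = (1, 0, 1, 0, 1)

Applying a routine such as the optimal_partition sketch after Section 19.2 to this data then classifies the arcs into B (lying on some shortest path) and N (lying on no shortest path), which is exactly the information displayed in Figure 19.2.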
The next result deals with the dimensions of the optimal sets P ∗ and D∗ . Here, as
usual the (affine) dimension of a subset of IRk is the dimension of the smallest affine
subspace in IRk containing the subset.
Lemma IV.45 We have
dim P∗ = |B| − rank(AB),
dim D∗ = m − rank(AB).
Proof: The optimal set of (P ) is given by
P ∗ = {x : Ax = b, xB ≥ 0, xN = 0} ,
and hence the smallest affine subspace of IRn containing P ∗ is given by
{x : AB xB = b, xN = 0} .
The dimension of this affine space is equal to the dimension of the null space of AB .
Since this dimension is given by |B| − rank (AB ), the first statement follows.
For the proof of the second statement we use that the dual optimal set can be
described by
D∗ = {(y, s) : AT y + s = c, sB = 0, sN ≥ 0}.
This is equivalent to
D∗ = {(y, s) : ATB y = cB , ATN y + sN = cN , sB = 0, sN ≥ 0}.
The smallest affine subspace containing this set is
{(y, s) : ATB y = cB , ATN y + sN = cN , sB = 0}.
Obviously sN is uniquely determined by y, and any y satisfying ATB y = cB yields
a point in this affine space. Hence the dimension of the affine space is equal to the
dimension of the null space of ATB . Since m is the number of columns of ATB , the
dimension of the null space of ATB equals m − rank (AB ). This completes the proof. ✷
Lemma IV.45 immediately implies that (P ) has a unique solution if and only if
rank (AB ) = |B|. Clearly this happens if and only if the columns in AB are linearly
independent. Also, (D) has a unique solution if and only if rank (AB ) = m, which
happens if and only if the rows in AB are linearly independent.3
3
It has become common practice in the literature to call the problem (P ) degenerate if (P ) or
(D) have multiple optimal solutions. Degeneracy is an important topic in LO. In the context
of the Simplex Method it is well known as a source of difficulties. This is especially true when
dealing with sensitivity analysis. See, e.g., Gal [90] and Greenberg [128]. But also in the context
of interior-point methods the occurrence of degeneracy may influence the behavior of the method.
We mention some references: Gonzaga [120], Güler et al. [132], Todd [263], Tsuchiya [269, 271],
Hall and Vanderbei [138].
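Lemma IV.45 is immediate to evaluate once A and the optimal partition are available. A minimal sketch (our own, assuming NumPy):

import numpy as np

def optimal_set_dimensions(A, B):
    # dim P* = |B| - rank(A_B) and dim D* = m - rank(A_B), as in Lemma IV.45
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    rank_AB = np.linalg.matrix_rank(A[:, B]) if len(B) > 0 else 0
    return len(B) - rank_AB, m - rank_AB

# the one-constraint problem min x1 + 3x2 + x3, x1 + x2 + x3 = 4, x >= 0 has B = {0, 2}
print(optimal_set_dimensions([[1, 1, 1]], [0, 2]))     # -> (1, 0)

The output (1, 0) says that the primal optimal set is a segment while the dual optimal solution is unique, in line with the remarks above.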
19.4 Parametric analysis
In this section we start to investigate the effect of changes in b and c on the optimal-value function zA (b, c). We consider one-dimensional parametric perturbations of b
and c. So we want to study
zA (b + β∆b, c + γ∆c)
as a function of the parameters β and γ, where ∆b and ∆c are given perturbation
vectors. From now on the vectors b and c are fixed, and the variations come from
the parameters β and γ. In fact, we restrict ourselves to the cases that the variations
occur only in one of the two vectors b and c. In other words, taking γ = 0 we consider
variations in β and taking β = 0 we consider variations in γ.
If γ = 0, then (Pβ ) will denote the perturbed primal problem and (Dβ ) its dual.
The feasible regions of these problems are denoted by Pβ and Dβ . Similarly, if β = 0,
then (Dγ ) will denote the perturbed dual problem and (Pγ ) its dual and the feasible
regions of these problems are Dγ and Pγ . Observe that the feasible region of (Dβ ) is
simply D and the feasible region of (Pγ ) is simply P. We use the superscript ∗ to refer
to the optimal set of each of these problems.
We assume that b and c are such that (P ) and (D) are both feasible. Then zA (b, c)
is well defined and finite. It is convenient to introduce the following notations:
b(β) := b + β∆b,
f (β) := zA (b(β), c),
c(γ) := c + γ∆c,
g(γ) := zA (b, c(γ)).
Here the domain of the parameters β and γ is taken as large as possible. Let us consider
the domain of f . This function is defined as long as zA (b(β), c) is well defined. Since
the feasible region of (Dβ ) is constant when β varies, and since we assume that (Dβ )
is feasible for β = 0, it follows that (Dβ ) is feasible for all values of β. Therefore,
f (β) is well defined if the dual problem (Dβ ) has an optimal solution and f (β) is not
defined (or infinity) if the dual problem (Dβ ) is unbounded. By the Duality Theorem
it follows that f (β) is well defined if and only if the primal problem (Pβ ) is feasible.
In exactly the same way it can be understood that the domain of g consists of all γ
for which (Dγ ) is feasible (and (Pγ ) bounded).
Lemma IV.46 The domains of f and g are convex.
Proof: We give the proof for f . The proof for g is similar and therefore omitted.
Let β1 , β2 ∈ dom (f ) and β1 < β < β2 . Then f (β1 ) and f (β2 ) are finite, which means
that both Pβ1 and Pβ2 are nonempty. Let x1 ∈ Pβ1 and x2 ∈ Pβ2 . Then x1 and x2
are nonnegative and
Ax1 = b + β1 ∆b,
Ax2 = b + β2 ∆b.
Now consider

x := x1 + ((β − β1)/(β2 − β1)) (x2 − x1) = ((β2 − β) x1 + (β − β1) x2)/(β2 − β1).
Note that x is a convex combination of x1 and x2 and hence x is nonnegative. We
proceed by showing that x ∈ Pβ. Using that A(x2 − x1) = (β2 − β1) ∆b this goes as
follows:
Ax = Ax1 + ((β − β1)/(β2 − β1)) A(x2 − x1)
   = b + β1∆b + ((β − β1)/(β2 − β1)) (β2 − β1) ∆b
   = b + β1∆b + (β − β1) ∆b
   = b + β∆b.
This proves that (Pβ ) is feasible and hence β ∈ dom (f ), completing the proof.
✷
The domains of f and g are in fact closed intervals on the real line. This follows
from the above lemma, and the fact that the complements of the domains of f and g
are open subsets of the real line. The last statement is the content of the next lemma.
Lemma IV.47 The complements of the domains of f and g are open subsets of the
real line.
Proof: As in the proof of the previous lemma we omit the proof for g because it
is similar to the proof for f . We need to show that the complement of dom (f ) is
open. Let β ∉ dom(f). This means that (Dβ) is unbounded. This is equivalent to the
existence of a vector z such that
AT z ≤ 0,
(b + β∆b)T z > 0.
Fixing z and considering β as a variable, the set of all β satisfying the strict inequality
(b + β∆b)T z > 0 is an open interval. For all β in this interval (Dβ) is unbounded.
Hence the complement of the domain of f is open. This proves the lemma.
✷
A consequence of the last two lemmas is the next theorem, which requires no further
proof.
Theorem IV.48 The domains of f and g are closed intervals on the real line.4 ✷
Example IV.49 Let (D) be the problem
max_{y=(y1,y2)} {y2 : y2 ≤ 1}.
In this case b = (0, 1) and c = (1). Note that (D) is feasible and bounded. The set
of all optimal solutions consists of all (y1 , 1) with y1 ∈ IR. Now let ∆b = (1, 0), and
consider the effect of replacing b by b + β∆b, and let f (β) be as defined above. Then
f(β) = max_{y=(y1,y2)} {y2 + βy1 : y2 ≤ 1}.

4 To avoid misunderstanding we point out that a singleton {a} (a ∈ IR) is also considered as a closed
interval.
We can easily verify that the perturbed problem is unbounded for all nonzero β. Hence
the domain of f is the singleton {0}.5
♦
19.4.1 The optimal-value function is piecewise linear
In this section we show that the functions f (β) and g(γ) are piecewise linear on their
domains. We start with g(γ).
Theorem IV.50 g(γ) is continuous, concave and piecewise linear.
Proof: By definition,
g(γ) = min {c(γ)T x : x ∈ P}.
For each γ the minimum value is attained at the central solution of the perturbed
problem (Pγ ). This solution is uniquely determined by the optimal partition of (Pγ ).
Since the number of partitions of the full index set {1, 2, . . . , n} is finite, we may write
g(γ) = min {c(γ)T x : x ∈ T},
where T is a finite subset of P. For each x ∈ T we have
c(γ)T x = cT x + γ∆cT x,
which is a linear function of γ. Thus, g(γ) is the minimum of a finite set of linear
functions.6 This implies that g(γ) is continuous, concave and piecewise linear, proving
the theorem.
✷
Theorem IV.51 f (β) is continuous, convex and piecewise linear.
Proof: The proof goes in the same way as for Theorem IV.50. By definition,
f(β) = max {b(β)T y : y ∈ D}.
For each β the maximum value is attained at a central solution (y ∗ , s∗ ) of (D). Now
s∗ is uniquely determined by the optimal partition of (D) and b(β)T y ∗ is constant for
all optimal y ∗ . Associating one particular y ∗ with any possible slack s∗ arising in this
way, we obtain that
f(β) = max {b(β)T y : y ∈ S},
where S is a finite subset of D. For each y ∈ S, we have
b(β)T y = bT y + β∆bT y,
5 Exercise 89 With (D) and f (β) as defined in Example IV.49 we consider the effect on the domain
of f when some constraints are added. When the constraint y1 ≥ 0 is added to (D), the domain of
f becomes (−∞, 0]. When the constraint y1 ≤ 0 is added to (D), the domain of f becomes [0, ∞)
and when both constraints are added the domain of f becomes (−∞, ∞). Prove this.
6 Exercise 90 Prove that the minimum of a finite family of linear functions, each defined on the
same closed interval, is continuous, concave and piecewise linear.
which is a linear function of β. This makes clear that f (β) is the maximum of a finite
set of linear functions. Therefore, f (β) is continuous, convex and piecewise linear, as
required.
✷
The values of β where the slope of the optimal-value function f (β) changes are
called break points of f , and any interval between two successive break points of f is
called a linearity interval of f . In a similar way we define break points and linearity
intervals for g.
Example IV.52 For any γ ∈ IR consider the problem (Pγ ) defined by
(Pγ )
min
x1 + (3 + γ)x2 + (1 − γ)x3
s.t.
x1 + x2 + x3 = 4,
x1 , x2 , x3 ≥ 0.
In this case b is constant and the perturbation vector for c = (1, 3, 1) is
∆c = (0, 1, −1).
The dual problem is
(Dγ )
max {4y : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ} .
From this it is obvious that the optimal value is given by
g(γ) = 4 min (1, 3 + γ, 1 − γ) .
The graph of the optimal-value function g(γ) is depicted in Figure 19.3.

Figure 19.3    The optimal-value function g(γ).

Note that
g(γ) is piecewise linear and concave. The break points of g occur for γ = −2 and
γ = 0.
♦
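The closed form just derived is easy to confirm numerically. A small check (our own, assuming SciPy) that solves (Pγ) for a few values of γ and compares the optimal value with 4 min(1, 3 + γ, 1 − γ):

import numpy as np
from scipy.optimize import linprog

A_eq, b_eq = [[1.0, 1.0, 1.0]], [4.0]
for gamma in (-4.0, -2.0, -1.0, 0.0, 1.0, 2.5):
    c = np.array([1.0, 3.0 + gamma, 1.0 - gamma])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    assert abs(res.fun - 4 * min(1, 3 + gamma, 1 - gamma)) < 1e-8
print("g(gamma) equals 4 min(1, 3 + gamma, 1 - gamma) at all sampled points")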
19.4.2 Optimal sets on a linearity interval
For any β in the domain of f we denote the optimal set of (Pβ ) by Pβ∗ and the optimal
set of (Dβ ) by Dβ∗ .
Theorem IV.53 If f (β) is linear on the interval [β1 , β2 ], where β1 < β2 , then the
dual optimal set Dβ∗ is constant (i.e. invariant) for β ∈ (β1 , β2 ).
Proof: Let β̄ ∈ (β1 , β2 ) be arbitrary and let ȳ ∈ Dβ̄∗ be arbitrary as well. Since ȳ is
optimal for (Dβ̄ ) we have
f (β̄) = b(β̄)T ȳ = bT ȳ + β̄∆bT ȳ,
and, since ȳ is dual feasible for all β,
b (β1 )T ȳ = bT ȳ + β1 ∆bT ȳ ≤ f (β1 ),
b (β2 )T ȳ = bT ȳ + β2 ∆bT ȳ ≤ f (β2 ).
Hence we find

f(β1) − f(β̄) ≥ (β1 − β̄) ∆bT ȳ,     f(β2) − f(β̄) ≥ (β2 − β̄) ∆bT ȳ.

The linearity of f on [β1, β2] implies

(f(β̄) − f(β1))/(β̄ − β1) = (f(β2) − f(β̄))/(β2 − β̄).
Now using that β2 − β̄ > 0 and β1 − β̄ < 0 we obtain
∆bT ȳ ≤ (f(β2) − f(β̄))/(β2 − β̄) = (f(β̄) − f(β1))/(β̄ − β1) ≤ ∆bT ȳ.
Hence, the last two inequalities are equalities, and the slope of f on the closed interval
[β1 , β2 ] is just ∆bT ȳ. This means that the derivative of f with respect to β on the
open interval (β1 , β2 ) satisfies
f′(β) = ∆bT ȳ,     ∀β ∈ (β1, β2),
or equivalently,
f (β) = bT ȳ + β∆bT ȳ = b (β)T ȳ,
∀β ∈ (β1 , β2 ) .
We conclude that ȳ is optimal for any (Dβ ) with β ∈ (β1 , β2 ). Since ȳ was arbitrary
in Dβ̄∗ , it follows that
Dβ̄∗ ⊆ Dβ∗ , ∀β ∈ (β1 , β2 ) .
Since β̄ was arbitrary in the open interval (β1 , β2 ), the above argument applies to any
β̃ ∈ (β1 , β2 ); so we also have
Dβ̃∗ ⊆ Dβ∗ ,
∀β ∈ (β1 , β2 ) .
We may conclude that Dβ̄∗ ⊆ Dβ̃∗ and Dβ̃∗ ⊆ Dβ̄∗ , which gives Dβ̄∗ = Dβ̃∗ . The theorem
follows.
✷
The above proof reveals that ∆bT y must have the same value for all y ∈ Dβ∗ and for
all β ∈ (β1 , β2 ). So we may state the following.
Corollary IV.54 Under the hypothesis of Theorem IV.53,
f ′ (β) = ∆bT y,
∀β ∈ (β1 , β2 ) , ∀y ∈ Dβ∗ .
By continuity we may write

f(β) = bT ȳ + β∆bT ȳ = b(β)T ȳ,     ∀β ∈ [β1, β2].
This immediately implies another consequence.
Corollary IV.55 Under the hypothesis of Theorem IV.53 let D∗(β1,β2) := Dβ∗ for
arbitrary β ∈ (β1, β2). Then

D∗(β1,β2) ⊆ Dβ∗1 ,     D∗(β1,β2) ⊆ Dβ∗2 .
In the next result we deal with the converse of the implication in Theorem IV.53.
Theorem IV.56 Let β1 and β2 > β1 be such that Dβ∗1 = Dβ∗2 . Then Dβ∗ is constant
for all β ∈ [β1 , β2 ] and f (β) is linear on the interval [β1 , β2 ].
Proof: Let ȳ ∈ Dβ∗1 = Dβ∗2 . Then
f (β1 ) = b (β1 )T ȳ,
f (β2 ) = b (β2 )T ȳ.
Consider the linear function h:

h(β) = b(β)T ȳ = (b + β∆b)T ȳ,     ∀β ∈ [β1, β2].
Then h coincides with f at β1 and β2 . Since f is convex this implies
f (β) ≤ h(β),
∀β ∈ [β1 , β2 ].
Now ȳ is feasible for all β ∈ [β1 , β2 ]. Since f (β) is the optimal value of (Dβ ), it follows
that
f(β) ≥ b(β)T ȳ = (b + β∆b)T ȳ = h(β).
Therefore, f coincides with h on [β1 , β2 ]. As a consequence, f is linear on [β1 , β2 ] and
ȳ is optimal for (Dβ ) whenever β ∈ [β1 , β2 ]. Since ȳ is arbitrary in Dβ∗1 = Dβ∗2 this
implies that Dβ∗1 = Dβ∗2 is a subset of Dβ∗ for any β ∈ (β1 , β2 ). By Theorem IV.53, and
Corollary IV.55 we also have the converse inclusion. The dual optimal set on (β1 , β2 )
is therefore constant, and the proof is complete.
✷
Each of the above results about f (β) has its analogue for g(γ). We state these results
without further proof.7 The omitted proofs are straightforward modifications of the
above proofs.
Theorem IV.57 If g(γ) is linear on the interval [γ1 , γ2 ], where γ1 < γ2 , then the
primal optimal set Pγ∗ is constant for γ ∈ (γ1 , γ2 ).
7
Exercise 91 Prove Theorem IV.57, Corollary IV.58, Corollary IV.59 and Theorem IV.60.
Corollary IV.58 Under the hypothesis of Theorem IV.57,
g ′ (γ) = ∆cT x,
∀γ ∈ (γ1 , γ2 ) , ∀x ∈ Pγ∗ .
Corollary IV.59 Under the hypothesis of Theorem IV.57 let P∗(γ1,γ2) := Pγ∗ for
arbitrary γ ∈ (γ1, γ2). Then

P∗(γ1,γ2) ⊆ Pγ∗1 ,     P∗(γ1,γ2) ⊆ Pγ∗2 .
Theorem IV.60 Let γ1 and γ2 > γ1 be such that Pγ∗1 = Pγ∗2 . Then Pγ∗ is constant
for all γ ∈ [γ1 , γ2 ] and g(γ) is linear on the interval [γ1 , γ2 ].
19.4.3 Optimal sets in a break point
Returning to the function f , we established in the previous section that if β ∈ dom (f )
is not a break point of f then the quantity ∆bT y is constant for all y ∈ Dβ∗ . In this
section we will see that this property is characteristic for ‘nonbreak’ points.
If the domain of f has a right extreme point then we may consider the right
derivative at this point to be ∞, and if the domain of f has a left extreme point
the left derivative at this point may be taken as −∞. Then β is a break point of f if
and only if the left and the right derivatives of f at β are different. This follows from
the definition of a break point. Denoting the left and the right derivatives by f′−(β)
and f′+(β) respectively, the convexity of f implies that at a break point β we have
f′−(β) < f′+(β).
If dom (f ) has a right extreme point, it is convenient to consider the open interval
at the right of this point as a linearity interval where both f and its derivative are
∞. Similarly, if dom (f ) has a left extreme point, we may consider the open interval
at the left of this point as a linearity interval where f is ∞ and its derivative −∞.
Obviously, these extreme linearity intervals are characterized by the fact that on the
intervals the primal problem is infeasible and the dual problem unbounded. The dual
problem is unbounded if and only if the set Dβ∗ of optimal solutions is empty.
Lemma IV.61 8 Let β, β − and β + belong to the interior of dom (f ) such that β +
belongs to the open linearity interval just to the right of β and β − to the open linearity
interval just to the left of β. Moreover, let y + ∈ Dβ∗ + and y − ∈ Dβ∗ − . Then
f′−(β) = min_y {∆bT y : y ∈ Dβ∗} = ∆bT y−,
f′+(β) = max_y {∆bT y : y ∈ Dβ∗} = ∆bT y+.
Proof: We give the proof for f′+(β). The proof for f′−(β) goes in the same way and
is omitted. Since y+ is optimal for (Dβ+) we have

(b + β+∆b)T y+ = f(β+) ≥ (b + β+∆b)T y,     ∀y ∈ Dβ∗.
8
This lemma can also be obtained as a special case of a result of Mills [210]. His more general result
gives the directional derivatives of the optimal-value function with respect to any ‘admissible’
perturbation of A, b and c; when only b is perturbed it gives the same result as the lemma.
We also have y + ∈ Dβ∗ , from Theorem IV.53 and Corollary IV.55. Therefore,
(b + β∆b)T y+ = (b + β∆b)T y,     ∀y ∈ Dβ∗.
Subtracting both sides of this equality from the corresponding sides in the last
inequality gives
(β+ − β) ∆bT y+ ≥ (β+ − β) ∆bT y,     ∀y ∈ Dβ∗.
Dividing both sides by the positive number β + − β we get
∆bT y + ≥ ∆bT y, ∀y ∈ Dβ∗ ,
thus proving that
max_y {∆bT y : y ∈ Dβ∗} = ∆bT y+.
Since f′+(β) = ∆bT y+, from Corollary IV.54, the lemma follows.
✷
The above lemma admits a nice generalization that is also valid if β is an extreme
point of the domain of f .
Theorem IV.62 Let β ∈ dom (f ) and let x∗ be any optimal solution of (Pβ ). Then
the derivatives at β satisfy
f′−(β) = min_{y,s} {∆bT y : AT y + s = c, s ≥ 0, sT x∗ = 0},
f′+(β) = max_{y,s} {∆bT y : AT y + s = c, s ≥ 0, sT x∗ = 0}.
Proof: As in the previous lemma, we give the proof for f′+(β) and omit the proof for
f′−(β). Consider the optimization problem

max_{y,s} {∆bT y : AT y + s = c, s ≥ 0, sT x∗ = 0}.                        (19.3)
First we establish that if β belongs to the interior of dom (f ) then this is exactly the
same problem as the maximization problem in Lemma IV.61. This follows because if
AT y + s = c, s ≥ 0, then (y, s) is optimal for (Dβ ) if and only if sT x∗ = 0, since x∗
is an optimal solution of the dual problem (Pβ ) of (Dβ ). If β belongs to the interior
of dom (f ) then the theorem follows from Lemma IV.61. Hence it remains to deal
with the case where β is an extreme point of dom (f ). It is easily verified that if β is
the left extreme point of dom (f ) then we can repeat the arguments in the proof of
Lemma IV.61. Thus it remains to prove the theorem if β is the right extreme point of
dom (f). Since f′+(β) = ∞ in that case, we need to show that the above maximization
problem (19.3) is unbounded.
Let β be the right extreme point of dom (f ) and suppose that the problem (19.3) is
not unbounded. Let us point out first that (19.3) is feasible. Its feasible region is just
the optimal set of the dual (Dβ) of (Pβ). Since (Pβ) has an optimal solution, (Dβ)
has an optimal solution as well. This implies that (Dβ ) is feasible. Therefore, (19.3)
is feasible as well. Hence, if (19.3) is not unbounded, the problem itself and its dual
have optimal solutions. The dual problem is given by
min_{ξ,λ} {cT ξ : Aξ = ∆b, ξ + λx∗ ≥ 0}.
We conclude that there exists a vector ξ ∈ IRn and a scalar λ such that Aξ =
∆b, ξ + λx∗ ≥ 0. This implies that we cannot have ξi < 0 and x∗i = 0. In other
words,
x∗i = 0 ⇒ ξi ≥ 0.
Hence, there exists a positive ε such that x̄ := x∗ + εξ ≥ 0. Now we have
Ax̄ = A (x∗ + εξ) = Ax∗ + εAξ = b + (β + ε) ∆b.
Thus we find that (Pβ+ε ) admits x̄ as a feasible point. This contradicts the assumption
that β is the right extreme point of dom (f ). We conclude that (19.3) is unbounded,
proving the theorem.
✷
The picture becomes more complete now. Note that Theorem IV.62 is valid for any
value of β in the domain of f . The theorem reestablishes that at a ‘nonbreak’ point,
where the left and right derivative of f are equal, the value of ∆bT y is constant when y
runs through the dual optimal set Dβ∗ . But it also makes it clear that at a break point,
where the two derivatives are different, ∆bT y is not constant when y runs through the
dual optimal set Dβ∗ . Then the extreme values of ∆bT y yield the left and the right
derivatives of f at β; the left derivative is the minimum and the right derivative the
maximal value of ∆bT y when y runs through the dual optimal set Dβ∗ .
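Theorem IV.62 translates directly into two auxiliary LO problems that a solver can handle. The sketch below is our own and assumes SciPy's linprog. Since x∗ ≥ 0 and s ≥ 0, the condition sT x∗ = 0 is imposed simply by fixing si = 0 on the support of x∗; a non-optimal solver status is read as unboundedness, so an unbounded minimization reports f′−(β) = −∞ and an unbounded maximization reports f′+(β) = ∞, in accordance with the convention for extreme points of dom(f).

import numpy as np
from scipy.optimize import linprog

def one_sided_derivatives(A, c, db, x_star, tol=1e-8):
    A, c, db = np.asarray(A, float), np.asarray(c, float), np.asarray(db, float)
    m, n = A.shape
    # variables z = (y, s) with y free and s >= 0; constraints A^T y + s = c
    A_eq = np.hstack([A.T, np.eye(n)])
    support = np.where(np.asarray(x_star, float) > tol)[0]
    fix = np.zeros((len(support), m + n))
    fix[np.arange(len(support)), m + support] = 1.0         # s_i = 0 on the support of x*
    A_eq = np.vstack([A_eq, fix])
    b_eq = np.concatenate([c, np.zeros(len(support))])
    bounds = [(None, None)] * m + [(0, None)] * n
    obj = np.concatenate([db, np.zeros(n)])                 # the linear function db^T y
    lo = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=bounds)  # minimize db^T y
    hi = linprog(-obj, A_eq=A_eq, b_eq=b_eq, bounds=bounds) # maximize db^T y
    left = lo.fun if lo.status == 0 else -np.inf
    right = -hi.fun if hi.status == 0 else np.inf
    return left, right

# data of Example IV.79 below, at the break point beta = 0, with x* = (0, 0, 1):
# the left and right derivatives of f are -2 and 0
print(one_sided_derivatives([[1, -1, 0], [0, 0, 1]], [1, 1, 1], [1, -1], [0, 0, 1]))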
It is worth pointing out another consequence of Lemma IV.61 and Theorem IV.62.
Using the notation of the lemma we have the inclusions
Dβ∗ − ⊆ Dβ∗ ,
Dβ∗ + ⊆ Dβ∗ ,
which follow from Corollary IV.55 if β is not an extreme point of dom (f ). If β is the
right extreme point then Dβ∗ + is empty, and if it is the left extreme point then Dβ∗ − is
empty as well; hence the above inclusions hold everywhere. Now suppose that β is a
nonextreme break point of f . Then letting y run through the set Dβ∗ − we know that
∆bT y is constant and equal to the left derivative of f at β, and if y runs through Dβ∗ +
then ∆bT y is constant and equal to the right derivative of f at β and, finally, if y
runs through Dβ∗ then ∆bT y is not constant. Thus the three sets must be mutually
different. As a consequence, the above inclusions must be strict. Moreover, since the
left and the right derivatives at β are different, the sets Dβ∗ − and Dβ∗ + are disjoint.
Thus we may state the following.
Corollary IV.63 Let β be a nonextreme break point of f and let β + and β − be as
defined in Lemma IV.61. Then we have
Dβ∗ − ⊂ Dβ∗ ,
Dβ∗ + ⊂ Dβ∗ ,
Dβ∗ − ∩ Dβ∗ + = ∅,
where the inclusions are strict.9
9
Exercise 92 Using the notation of Lemma IV.61 and Corollary IV.63, we have
Dβ∗ − ∪ Dβ∗ + ⊆ Dβ∗ .
Show that the inclusion is always strict. (Hint: use the central solution of (Dβ ).)
Two other almost obvious consequences of the above results are the following
corollaries.10
Corollary IV.64 Let β be a nonextreme break point of f and let β + and β − be as
defined in Lemma IV.61. Then
Dβ∗− = {y ∈ Dβ∗ : ∆bT y = ∆bT y−},     Dβ∗+ = {y ∈ Dβ∗ : ∆bT y = ∆bT y+}.
Corollary IV.65 Let β be a nonextreme break point of f and let β + and β − be as
defined in Lemma IV.61. Then
dim Dβ∗ − < dim Dβ∗ ,
dim Dβ∗ + < dim Dβ∗ .
Remark IV.66 It is interesting to consider the dual optimal set Dβ∗ when β runs from
−∞ to ∞. To the left of the smallest break point (the break point for which β is minimal)
the set Dβ∗ is constant. It may happen that Dβ∗ is empty there, due to the absence of optimal
solutions for these small values of β. This occurs if (Dβ ) is unbounded (which means that
(Pβ ) is infeasible) for the values of β on the farthest left open linearity interval. Then, at the
first break point, the set Dβ∗ increases to a larger set, and as we pass to the next open linearity
interval the set Dβ∗ becomes equal to a proper subset of this enlarged set. This process repeats
itself at every new break point: at a break point of f the dual optimal set expands itself, and
as we pass to the next open linearity interval it shrinks to a proper subset of the enlarged
set. Since the derivative of f is monotonically increasing when β runs from −∞ to ∞, every
new dual optimal set arising in this way differs from all previous ones. In other words, every
break point of f and every linearity interval of f has its own dual optimal set.11
•
We state the dual analogues of Lemma IV.61 and Theorem IV.62 and their
corollaries without further proof.12
Lemma IV.67 Let γ, γ − and γ + belong to the interior of dom (g), γ + to the open
linearity interval just to the right of γ, and γ − to the open linearity interval just to
the left of γ. Moreover, let x+ ∈ Pγ∗+ and x− ∈ Pγ∗− . Then
g′−(γ) = max_x {∆cT x : x ∈ Pγ∗} = ∆cT x−,
g′+(γ) = min_x {∆cT x : x ∈ Pγ∗} = ∆cT x+.
Theorem IV.68 Let γ ∈ dom (g) and let (y ∗ , s∗ ) be any optimal solution of (Dγ ).
Then the derivatives at γ satisfy
g′−(γ) = max_x {∆cT x : Ax = b, x ≥ 0, xT s∗ = 0},
g′+(γ) = min_x {∆cT x : Ax = b, x ≥ 0, xT s∗ = 0}.
10 Exercise 93 Prove Corollary IV.64 and Corollary IV.65.
11 Exercise 94 The dual optimal sets belonging to two different open linearity intervals of f are
disjoint. Prove this. (Hint: use that the derivatives of f on the two intervals are different.)
12 Exercise 95 Prove Lemma IV.67, Theorem IV.68, Corollary IV.69, Corollary IV.70 and Corollary IV.71.
Corollary IV.69 Let γ be a nonextreme break point of g and let γ + and γ − be as
defined in Lemma IV.67. Then
Pγ∗− ⊂ Pγ∗ ,
Pγ∗+ ⊂ Pγ∗ ,
Pγ∗− ∩ Pγ∗+ = ∅,
where the inclusions are strict.13
Corollary IV.70 Let γ be a nonextreme break point of g and let γ + and γ − be as
defined in Lemma IV.67. Then
Pγ∗− = {x ∈ Pγ∗ : ∆cT x = ∆cT x−},     Pγ∗+ = {x ∈ Pγ∗ : ∆cT x = ∆cT x+}.
Corollary IV.71 Let γ be a nonextreme break point of g and let γ + and γ − be as
defined in Lemma IV.67. Then
dim Pγ∗− < dim Pγ∗ ,
dim Pγ∗+ < dim Pγ∗ .
The next example illustrates the results of this section.
Example IV.72 We use the same problem as in Example IV.52. For any γ ∈ IR the
problem (Pγ ) is defined by
(Pγ )
min
x1 + (3 + γ)x2 + (1 − γ)x3
s.t.
x1 + x2 + x3 = 4,
x1 , x2 , x3 ≥ 0,
and the dual problem is
(Dγ )
max {4y : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ} .
The perturbation vector for c = (1, 3, 1) is
∆c = (0, 1, −1).
The graph of g is depicted in Figure 19.3 (page 369). The break points of g occur at
γ = −2 and γ = 0.
For γ < −2 the optimal solution of (Pγ ) is x = (0, 4, 0), and then ∆cT x = 4. At the
break point γ = −2 the primal optimal solution set is given by
{x = (x1 , x2 , 0) : x1 + x2 = 4, x1 ≥ 0, x2 ≥ 0} .
The extreme values of ∆cT x on this set are 4 and 0. The maximal value occurs for
x = (0, 4, 0) and the minimal value for x = (4, 0, 0). Hence, the left and right derivatives
of g at γ = −2 are given by these values. If −2 < γ < 0 then the optimal solution of
the primal problem is given by x = (4, 0, 0) and ∆cT x = 0, so the derivative of g is 0
in this region. At the break point γ = 0 the primal optimal solution set is given by
{x = (x1 , 0, x3 ) : x1 + x3 = 4, x1 ≥ 0, x3 ≥ 0} .
The extreme values of ∆cT x on this set are 0 and −4. The left and right derivatives
of g at γ = 0 are given by these values. The maximal value occurs for x = (4, 0, 0)
and the minimal value for x = (0, 0, 4). Observe that in this example the primal
optimal solution set at every break point has dimension 1, whereas in the open linearity
intervals the optimal solution is always unique.
♦
13 Exercise 96 Find an example where Pγ∗− = ∅ and Pγ∗ ≠ ∅.
19.4.4 Extreme points of a linearity interval
In this section we assume that β̄ belongs to the interior of a linearity interval [β1 , β2 ].
Given an optimal solution of (Dβ̄ ) we show how the extreme points β1 and β2 of the
linearity interval containing β̄ can be found by solving two auxiliary LO problems.
Theorem IV.73 Let β̄ be arbitrary and let (y ∗ , s∗ ) be any optimal solution of (Dβ̄ ).
Then the extreme points of the linearity interval [β1 , β2 ] containing β̄ follow from
β1 = min_{β,x} {β : Ax = b + β∆b, x ≥ 0, xT s∗ = 0},
β2 = max_{β,x} {β : Ax = b + β∆b, x ≥ 0, xT s∗ = 0}.
Proof: We only give the proof for β1 .14 Consider the minimization problem
min_{β,x} {β : Ax = b + β∆b, x ≥ 0, xT s∗ = 0}.                            (19.4)
We first show that this problem is feasible. Since (Dβ̄ ) has an optimal solution, its
dual problem (Pβ̄ ) has an optimal solution as well. Letting x̄ be optimal for (Pβ̄ ), we
can easily verify that β = β̄ and x = x̄ are feasible for (19.4).
We proceed by considering the case where (19.4) is unbounded. For any β ≤ β̄
there exists a vector x that satisfies Ax = b + β∆b, x ≥ 0, xT s∗ = 0. Now (y ∗ , s∗ )
is feasible for (Dβ ) and x is feasible for (Pβ ). Since xT s∗ = 0, x is optimal for
(Pβ ) and (y ∗ , s∗ ) is optimal for (Dβ ). The optimal value of both problems is given
by b(β)T y ∗ = bT y ∗ + β∆bT y ∗ . This means that β belongs to the linearity interval
containing β̄. Since this holds for any β ≤ β̄, the left boundary of this linearity
interval is −∞, as it should be.
It remains to deal with the case where (19.4) has an optimal solution, say (β ∗ , x∗ ).
We then have Ax∗ = b + β ∗ ∆b = b(β ∗ ), so x∗ is feasible for (Pβ ∗ ). Since (y ∗ , s∗ )
is feasible for (Dβ ∗ ) and x∗ T s∗ = 0 it follows that x∗ is optimal for (Pβ ∗ ) and
(y ∗ , s∗ ) is optimal for (Dβ ∗ ). The optimal value of both problems is given by
b(β ∗ )T y ∗ = bT y ∗ + β ∗ ∆bT y ∗ . This means that β ∗ belongs to the linearity interval
containing β̄, and it follows that β ∗ ≥ β1 .
On the other hand, from Corollary IV.55 the pair (y ∗ , s∗ ) is optimal for (Dβ1 ). Now
let x̄ be optimal for (Pβ1 ). Then we have
Ax̄ = b(β1) = b + β1∆b,   x̄ ≥ 0,
x̄T s∗ = 0,
which shows that the pair (β1 , x̄) is feasible for the above minimization problem. This
implies that β ∗ ≤ β1 . Hence we obtain that β ∗ = β1 . This completes the proof.
✷
If β̄ is not a break point then there is only one linearity interval containing β̄, and
hence this must be the linearity interval [β1 , β2 ], as given by Theorem IV.73.
It is worth pointing out that if β̄ is a break point there are three linearity intervals
containing β̄, namely the singleton interval [β̄, β̄] and the two surrounding linearity
intervals. In the singleton case, the linearity interval [β1 , β2 ] given by Theorem IV.73
may be any of these three intervals, and which one it is depends on the given optimal
14
Exercise 97 Prove the second part (on β2 ) of Theorem IV.73.
solution (y ∗ , s∗ ) of (Dβ̄ ). It can easily be understood that the linearity interval at
the right of β̄ will be found if (y ∗ , s∗ ) happens to be optimal on the right linearity
interval. This occurs when ∆bT y∗ = f′+(β̄), due to Corollary IV.64. Similarly, the
linearity interval at the left of β̄ will be found if (y ∗ , s∗ ) is optimal on the left linearity
interval and this occurs when ∆bT y∗ = f′−(β̄), also due to Corollary IV.64. Finally, if
f′−(β̄) < ∆bT y∗ < f′+(β̄),                                                 (19.5)
then we have β1 = β2 = β̄ in Theorem IV.73. The last situation seems to be most
informative. It clearly indicates that β̄ is a break point of f , which is not apparent
in the other two situations. Knowing that β̄ is a break point of f we can find the
two one-sided derivatives of f at β̄ as well as optimal solutions for the two intervals
surrounding β̄ from Theorem IV.62. In the light of this discussion the following result
is of interest. It shows that the above ambiguity can be avoided by the use of strictly
complementary optimal solutions.
Theorem IV.74 Let β̄ be a break point and let (y ∗ , s∗ ) be a strictly complementary
optimal solution of (Dβ̄ ). Then the numbers β1 and β2 given by Theorem IV.73 satisfy
β1 = β2 = β̄.
Proof: If (y ∗ , s∗ ) is a strictly complementary optimal solution of (Dβ̄ ) then it uniquely
determines the optimal partition of (Dβ̄ ) and this partition differs from the optimal
partitions corresponding to the optimal sets of the linearity intervals surrounding β̄.
Hence (y ∗ , s∗ ) does not belong to the optimal sets of the linearity intervals surrounding
β̄. From Corollary IV.64 it follows that ∆bT y ∗ satisfies (19.5), and the theorem follows.
✷
It is not difficult to state the corresponding results for g. We do this below, omitting
the proofs, and then provide an example of their use.15
Theorem IV.75 Let γ̄ be arbitrary and let x∗ be any optimal solution of (Pγ̄ ). Then
the extreme points of the linearity interval [γ1 , γ2 ] containing γ̄ follow from
γ1 = min_{γ,y,s} {γ : AT y + s = c + γ∆c, s ≥ 0, sT x∗ = 0},
γ2 = max_{γ,y,s} {γ : AT y + s = c + γ∆c, s ≥ 0, sT x∗ = 0}.
Theorem IV.76 Let γ̄ be a break point and let x∗ be a strictly complementary
optimal solution of (Pγ̄ ). Then the numbers γ1 and γ2 given by Theorem IV.75 satisfy
γ1 = γ2 = γ̄.
Example IV.77 We use the same problem as in Example IV.72. Using the notation
of Theorem IV.75 we first determine the linearity interval for γ̄ = −1. We can easily
verify that x = (4, 0, 0) is optimal for (P−1 ). Hence the extreme points γ1 and γ2 of
the linearity interval containing γ̄ follow by minimizing and maximizing γ over the
region
{γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 4(1 − y) = 0} .
15
Exercise 98 Prove Theorem IV.75 and Theorem IV.76.
The last constraint implies y = 1, so the other constraints reduce to 1 ≤ 3 + γ and
1 ≤ 1 − γ, which gives −2 ≤ γ ≤ 0. Hence the linearity interval containing γ̄ = −1 is
[−2, 0].
When γ̄ = 1, x = (0, 0, 4) is optimal for (P1 ), and the linearity interval containing
γ̄ follows by minimizing and maximizing γ over the region
{γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 4(1 − γ − y) = 0} .
The last constraint implies y = 1 − γ. Now the other constraints reduce to 1 − γ ≤ 1
and 1 − γ ≤ 3 + γ, which is equivalent to γ ≥ 0. So the linearity interval containing
γ̄ = 1 is [0, ∞).
When γ̄ = −3, x = (0, 4, 0) is optimal for (P−3 ), and the linearity interval containing
γ̄ follows by minimizing and maximizing γ over the region
{γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 4(3 + γ − y) = 0} .
The last constraint implies y = 3+γ, and the other constraints reduce to 3+γ ≤ 1 and
3 + γ ≤ 1 − γ, which is equivalent to γ ≤ −2. Thus, the linearity interval containing
γ̄ = −3 is (−∞, −2].
Observe that the linearity intervals just calculated agree with Figure 19.3.
Finally we demonstrate the use of Theorem IV.76 at a break point. Taking γ̄ = 0,
we see that x = (4, 0, 0) is optimal for (P0 ), and we need to minimize and maximize γ
over the region
{γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 4(1 − y) = 0} .
This gives −2 ≤ γ ≤ 0 and we find the linearity interval [−2, 0] left from 0. This is
because x = (4, 0, 0) is also optimal on this interval. Recall from Example IV.72 that
the optimal set at γ = 0 is given by
{x = (x1 , 0, x3 ) : x1 + x3 = 4, x1 ≥ 0, x3 ≥ 0} .
Thus, instead of the optimal solution x = (4, 0, 0) we may equally well use the strictly
complementary solution x = (2, 0, 2). Then we need to minimize and maximize γ over
the region
{γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 2(1 − y) + 2(1 − γ − y) = 0} .
The last constraint amounts to γ = 2 − 2y. Substitution in the third constraint yields
y ≤ −1 + 2y or y ≥ 1. Because of the first constraint we get y = 1, from which it
follows that γ = 0. Thus, γ1 = γ2 = 0 in accordance with Theorem IV.76.
♦
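The two auxiliary problems of Theorem IV.75 are ordinary LO problems in the variables (γ, y, s), so the computations of Example IV.77 can be automated. The sketch below is our own, assumes SciPy, and reads any non-optimal solver status as unboundedness; it reproduces the linearity intervals found above.

import numpy as np
from scipy.optimize import linprog

def linearity_interval_g(A, c, dc, x_star, tol=1e-8):
    A, c, dc = np.asarray(A, float), np.asarray(c, float), np.asarray(dc, float)
    m, n = A.shape
    # variables z = (gamma, y, s); constraints A^T y + s - gamma*dc = c, s >= 0
    A_eq = np.hstack([-dc.reshape(-1, 1), A.T, np.eye(n)])
    b_eq = c.copy()
    support = np.where(np.asarray(x_star, float) > tol)[0]
    fix = np.zeros((len(support), 1 + m + n))
    fix[np.arange(len(support)), 1 + m + support] = 1.0     # s_i = 0 on the support of x*
    A_eq = np.vstack([A_eq, fix])
    b_eq = np.concatenate([b_eq, np.zeros(len(support))])
    bounds = [(None, None)] * (1 + m) + [(0, None)] * n
    obj = np.zeros(1 + m + n); obj[0] = 1.0                 # the variable gamma
    lo = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    hi = linprog(-obj, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    gamma1 = lo.fun if lo.status == 0 else -np.inf
    gamma2 = -hi.fun if hi.status == 0 else np.inf
    return gamma1, gamma2

A, c, dc = [[1, 1, 1]], [1, 3, 1], [0, 1, -1]
print(linearity_interval_g(A, c, dc, x_star=[4, 0, 0]))   # the interval [-2, 0]
print(linearity_interval_g(A, c, dc, x_star=[0, 0, 4]))   # the interval [0, inf)
print(linearity_interval_g(A, c, dc, x_star=[0, 4, 0]))   # the interval (-inf, -2]
print(linearity_interval_g(A, c, dc, x_star=[2, 0, 2]))   # the break point: [0, 0]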
19.4.5 Running through all break points and linearity intervals
Using the results of the previous sections, we present in this section an algorithm that
yields the optimal-value function for a one-dimensional perturbation of the vector b
or the vector c. We first deal with a one-dimensional perturbation of b by a scalar
multiple of the vector ∆b; we state the algorithm for the calculation of the optimal-value function and then prove that the algorithm finds all break points and linearity
intervals. It will then be clear how to treat a one-dimensional perturbation of c; we
state the corresponding algorithm and its convergence result without further proof.
We provide examples for both cases.
Assume that we are given optimal solutions x∗ of (P ) and (y ∗ , s∗ ) of (D). In the
notation of the previous sections, the problem (Pβ ) and its dual (Dβ ) arise by replacing
the vector b by b(β) = b+β∆b; the optimal value of these problems is denoted by f (β).
So we have f (0) = cT x∗ = bT y ∗ . The domain of the optimal-value function is (−∞, ∞)
and f (β) = ∞ if and only if (Dβ ) is unbounded. Recall from Theorem IV.51 that f (β)
is convex and piecewise linear. Below we present an algorithm that determines f on the
nonnegative part of the real line. We leave it to the reader to find some straightforward
modifications of the algorithm, yielding an algorithm that generates f on the other
part of the real line.16 The algorithm is as follows.17
The Optimal Value Function f (β), β ≥ 0
Input:
An optimal solution (y ∗ , s∗ ) of (D);
a perturbation vector ∆b.
begin
k := 1; y0 := y∗; s0 := s∗; ready:=false;
while not ready do
begin
Solve maxβ,x {β : Ax = b + β∆b, x ≥ 0, xT sk−1 = 0};
if this problem is unbounded: ready:=true
else let (βk , xk ) be an optimal solution;
begin
Solve maxy,s {∆bT y : AT y + s = c, s ≥ 0, sT xk = 0};
if this problem is unbounded: ready := true
else let (y k , sk ) be an optimal solution;
k := k + 1;
end
end
end
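A possible rendering of this algorithm in code is sketched below; it is our own and assumes SciPy's linprog. The complementarity conditions xT sk−1 = 0 and sT xk = 0 are imposed by fixing the corresponding coordinates to zero through the variable bounds, and a non-optimal solver status of an auxiliary problem plays the role of the flag 'ready'. In exact arithmetic the loop visits every break point exactly once (Theorem IV.78 below); in floating point the tolerance tol decides which coordinates count as positive.

import numpy as np
from scipy.optimize import linprog

def optimal_value_function_f(A, b, c, db, s_star, tol=1e-8):
    """Break points, optimal values and slopes of f(beta) for beta >= 0,
    starting from the slack s* of an optimal solution of (D)."""
    A, b, c, db = (np.asarray(v, float) for v in (A, b, c, db))
    m, n = A.shape
    s = np.asarray(s_star, float)
    breakpoints, values, slopes = [], [], []
    while True:
        # first auxiliary problem: max beta s.t. Ax = b + beta*db, x >= 0, x^T s = 0
        A1 = np.hstack([A, -db.reshape(-1, 1)])              # variables (x, beta)
        obj1 = np.zeros(n + 1); obj1[n] = -1.0               # maximize beta
        xb = [(0.0, 0.0) if s[i] > tol else (0.0, None) for i in range(n)]
        res1 = linprog(obj1, A_eq=A1, b_eq=b, bounds=xb + [(None, None)])
        if res1.status != 0:                                 # unbounded: f is linear to the right
            break
        x, beta = res1.x[:n], res1.x[n]
        breakpoints.append(float(beta)); values.append(float(c @ x))
        # second auxiliary problem: max db^T y s.t. A^T y + s = c, s >= 0, s^T x = 0
        A2 = np.hstack([A.T, np.eye(n)])                     # variables (y, s)
        obj2 = np.concatenate([-db, np.zeros(n)])            # maximize db^T y
        sb = [(0.0, 0.0) if x[i] > tol else (0.0, None) for i in range(n)]
        res2 = linprog(obj2, A_eq=A2, b_eq=c, bounds=[(None, None)] * m + sb)
        if res2.status != 0:                                 # unbounded: last break point reached
            break
        s = res2.x[m:]
        slopes.append(float(db @ res2.x[:m]))
    return breakpoints, values, slopes

# data of Example IV.79 below: break points 0 and 1, f = 1 at both, slope 0 on (0, 1)
A = [[1, -1, 0], [0, 0, 1]]; b = [0, 1]; c = [1, 1, 1]; db = [1, -1]
print(optimal_value_function_f(A, b, c, db, s_star=[1, 1, 0]))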
The next theorem states that the above algorithm finds the successive break points
of f on the nonnegative part of the real line, as well as the slopes of f on the successive
linearity intervals.
Theorem IV.78 The algorithm terminates after a finite number of iterations. If K
is the number of iterations upon termination then β1, β2, . . . , βK are the successive
break points of f on the nonnegative real line. The optimal value at βk (1 ≤ k ≤ K)
is given by cT xk and the slope of f on the interval (βk, βk+1) (1 ≤ k < K) by ∆bT yk.

16 Exercise 99 When the two maximization problems in the algorithm are changed into minimization problems, the algorithm yields the break points and linearity intervals for negative values of
β. Prove this.
17 After the completion of this section the same algorithm appeared in a recent paper of Monteiro
and Mehrotra [221] and the authors became aware of the fact that these authors already published
the algorithm in 1992 [207].
Proof: In the first iteration the algorithm starts by solving
max_{β,x} {β : Ax = b + β∆b, x ≥ 0, xT s0 = 0},
where s0 is the slack vector in the given optimal solution (y 0 , s0 ) = (y ∗ , s∗ ) of
(D) = (D0 ). This problem is feasible, because (P ) has an optimal solution x∗ and
(β, x) = (0, x∗ ) satisfies the constraints. Hence the first auxiliary problem is either
unbounded or it has an optimal solution (β1 , x1 ). By Theorem IV.73 β1 is equal to
the extreme point at the right of the linearity interval containing 0. If the problem
is unbounded (when β1 = ∞) then f is linear on (0, ∞) and the algorithm stops;
otherwise β1 is the first break point to the right of 0. (Note that it may happen that
β1 = 0. This certainly occurs if 0 is a break point of f and the starting solution (y ∗ , s∗ )
is strictly complementary.) Clearly x1 is primal feasible at β = β1 . Since (y 1 , s1 ) is
dual feasible at β = β1 and (x1 )T s1 = 0 we see that x1 is optimal for (Pβ1 ). Hence
f (β1 ) = cT x1 . Also observe that (y 1 , s1 ) is dual optimal at β1 . (This also follows from
Corollary IV.55.)
Assuming that the second half of the algorithm occurs, when the above problem has
an optimal solution, the algorithm proceeds by solving a second auxiliary problem,
namely
max_{y,s} {∆bT y : AT y + s = c, s ≥ 0, sT x1 = 0}.
By Theorem IV.62 the maximal value is equal to the right derivative of f at β1 . If the
problem is unbounded then β1 is the largest break point of f on (0, ∞) and f (β) = ∞
for β > β1 . In that case we are done and the algorithm stops. Otherwise, when the
problem is bounded, the optimal solution (y 1 , s1 ) is such that ∆bT y 1 is equal to the
slope on the linearity interval to the right of β1 , by Lemma IV.61. Moreover, from
Corollary IV.64, (y 1 , s1 ) is dual optimal on the open linearity interval to the right of
β1 . Hence, at the start of the second iteration (y 1 , s1 ) is an optimal solution at the
open interval to the right of the first break point on [0, ∞). Thus we can start the
second iteration and proceed as in the first iteration. Since each iteration produces a
linearity interval, and f has only finitely many such intervals, the algorithm terminates
after a finite number of iterations.
✷
Example IV.79 Consider the primal problem
(P )
min {x1 + x2 + x3 : x1 − x2 = 0, x3 = 1, x = (x1 , x2 , x3 ) ≥ 0}
and its dual
(D)        max {y2 : −1 ≤ y1 ≤ 1, y2 ≤ 1}.
Hence, in this case we have

A = [1 −1 0; 0 0 1],     c = (1, 1, 1)T,     b = (0, 1)T.
We perturb the vector b by a scalar multiple of ∆b = (1, −1)T to

b(β) = b + β∆b = (β, 1 − β)T,
and use the algorithm to find the break points and linearity intervals of f (β) =
zA(b(β), c).
Optimal solutions of (P ) and (D) are given by
x∗ = (0, 0, 1),
y ∗ = (0, 1),
s∗ = (1, 1, 0).
Thus, entering the first iteration of the algorithm we consider
max_{β,x} {β : x1 − x2 = β, x3 = 1 − β, x ≥ 0, x1 + x2 = 0}.
From x ≥ 0, x1 + x2 = 0 we deduce that x1 = x2 = 0 and hence β = 0. Thus we find
the first break point and the optimal value at this break point:
β1 = 0,
x1 = (0, 0, 1),
f (β1 ) = cT x1 = 1.
We proceed with the second auxiliary problem:
max_y {y1 − y2 : −1 ≤ y1 ≤ 1, y2 ≤ 1, 1 − y2 = 0}.
It follows that y2 = 1 and y1 − y2 = y1 − 1 is maximal if y1 = 1. Thus we find an
optimal solution (y 1 , s1 ) for the linearity interval just to the right of β1 and the slope
of f on this interval:
y1 = (1, 1),     s1 = (0, 2, 0),     f′+(β1) = ∆bT y1 = 0.
In the second iteration the first auxiliary problem is
max_{β,x} {β : x1 − x2 = β, x3 = 1 − β, x ≥ 0, 2x2 = 0},
which is equivalent to
max_{β,x} {β : β = x1, β = 1 − x3, x ≥ 0, x2 = 0}.
Clearly the maximum value of β is attained at x1 = 1 and x3 = 0. Thus we find the
second break point and the optimal value at this break point:
β2 = 1,     x2 = (1, 0, 0),     f(β2) = cT x2 = 1.

The second auxiliary problem becomes
max_y {y1 − y2 : −1 ≤ y1 ≤ 1, y2 ≤ 1, 1 − y1 = 0},
which is equivalent to
max_y {1 − y2 : y2 ≤ 1, y1 = 1}.
Clearly this problem is unbounded. Hence f′+(β2) = ∞ and we are done. For larger
values of β the primal problem (Pβ) becomes infeasible and the dual problem (Dβ)
unbounded.
We proceed by calculating f(β) for negative values of β. Using Exercise 99 (page 380),
the first auxiliary problem, in the first iteration, becomes simply

min_{β,x} {β : x1 − x2 = β, x3 = 1 − β, x ≥ 0, x1 + x2 = 0}.
We can easily verify that this problem has the same solution as its counterpart, when
we maximize β. This is due to the fact that β = 0 is a break point of f . We find, as
before,
β1 = 0, x1 = (0, 0, 1), f (β1 ) = cT x1 = 1.
We proceed with the second auxiliary problem:
min_y {y1 − y2 : −1 ≤ y1 ≤ 1, y2 ≤ 1, 1 − y2 = 0}.
Since y2 = 1 we have y1 − y2 = y1 − 1 and this is minimal if y1 = −1. Thus we find
an optimal solution (y 1 , s1 ) for the linearity interval just to the left of β1 = 0 and the
slope of f on this interval:
y1 = (−1, 1),     s1 = (2, 0, 0),     f′−(β1) = ∆bT y1 = −2.
In the second iteration the first auxiliary problem becomes
min_{β,x} {β : x1 − x2 = β, x3 = 1 − β, x ≥ 0, 2x1 = 0},
which is equivalent to
min_{β,x} {β : β = −x2, β = 1 − x3, x ≥ 0, x1 = 0}.
Obviously this problem is unbounded. This means that f (β) is linear on the negative
real line, and we are done. Figure 19.4 (page 383) depicts the optimal-value function
f(β) as just calculated.

Figure 19.4    The optimal-value function f(β).
♦
When the vector c is perturbed by a scalar multiple of ∆c to c(γ) = c + γ∆c,
the algorithm for the calculation of the optimal value function g(γ) can be stated as
follows. Recall that g is concave. That is why the second auxiliary problem in the
algorithm is a minimization problem.18
The Optimal Value Function g(γ), γ ≥ 0
Input:
An optimal solution x∗ of (P );
a perturbation vector ∆c.
begin
ready:=false;
k := 1; x0 := x∗ ;
while not ready do
begin
Solve maxγ,y,s {γ : AT y + s = c + γ∆c, s ≥ 0, sT xk−1 = 0};
if this problem is unbounded: ready:=true
else let (γk , y k , sk ) be an optimal solution;
begin
Solve minx {∆cT x : Ax = b, x ≥ 0, xT sk = 0};
if this problem is unbounded: ready:=true
else let xk be an optimal solution;
k := k + 1;
end
end
end
The above algorithm finds the successive break points of g on the nonnegative real
line as well as the slopes of g on the successive linearity intervals. The proof uses
18
Exercise 100 When the maximization problem in the algorithm is changed into a minimization
problem and the minimization into a maximization problem, the algorithm yields the break points
and linearity intervals for negative values of γ. Prove this.
arguments similar to the arguments in the proof of Theorem IV.78 and is therefore
omitted.
Theorem IV.80 The algorithm terminates after a finite number of iterations. If K
is the number of iterations upon termination then γ1 , γ2 , . . . , γK are the successive
break points of g on the nonnegative real line. The optimal value at γk (1 ≤ k ≤ K)
is given by bT y k and the slope of g on the interval (γk , γk+1 ) (1 ≤ k < K) by ∆cT xk .
✷
The next example illustrates the use of the above algorithm.
Example IV.81 In Example IV.72 we considered the primal problem
(P)        min {x1 + 3x2 + x3 : x1 + x2 + x3 = 4, x1, x2, x3 ≥ 0}
and its dual problem
(D)
max {4y : y ≤ 1, y ≤ 3, y ≤ 1} ,
with the perturbation vector
∆c = (0, 1, −1)
and we calculated the linearity intervals from Lemma IV.67. This required the
knowledge of an optimal primal solution for each interval. Theorem IV.80 enables
us to find these intervals from the knowledge of an optimal solution x∗ of (P ) only.
Entering the first iteration of the above algorithm with x∗ = (4, 0, 0) we consider
max_{γ,y} {γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 4(1 − y) = 0}.
We can easily see that y = 1 is optimal with γ = 0. Thus we find the first break point
and the optimal value at this break point:
γ1 = 0,     y1 = 1,     s1 = (0, 2, 0),     g(γ1) = bT y1 = 4.
The second auxiliary problem is now given by:
min_x {x2 − x3 : x1 + x2 + x3 = 4, x1, x2, x3 ≥ 0, 2x2 = 0}.
It follows that x2 = 0 and x2 − x3 = −x3 is minimal if x3 = 4 and x1 = 0. Thus we
find an optimal solution x1 for the linearity interval just to the right of γ1 and the
slope of g on this interval:
x1 = (0, 0, 4),     g′+(γ1) = ∆cT x1 = −4.
In the second iteration the first auxiliary problem is
max_{γ,y} { γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 4(1 − γ − y) = 0 }.
It follows that y = 1 − γ and the problem becomes equivalent to
max_{γ,y} { γ : 1 − γ ≤ 1, 1 − γ ≤ 3 + γ, y = 1 − γ }.
Clearly this problem is unbounded. Hence g is linear for values of γ larger than γ1 = 0.
We proceed by calculating g(γ) for negative values of γ. Using Exercise 100
(page 384), the first auxiliary problem, in the first iteration, becomes simply
min_{γ,y} { γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 4(1 − y) = 0 }.
Since y = 1 this is equivalent to
min_{γ,y} { γ : −2 ≤ γ ≤ 0, y = 1 },
so the first break point and the optimal value at this break point are given by
γ1 = −2,    y^1 = 1,    s^1 = (0, 0, 2),    g(γ1) = b^T y^1 = 4.
The second auxiliary problem is now given by:
max_x { x2 − x3 : x1 + x2 + x3 = 4, x1, x2, x3 ≥ 0, 2x3 = 0 },
which is equivalent to
max_x { x2 : x1 + x2 = 4, x1, x2 ≥ 0, x3 = 0 }.
Since x2 is maximal if x1 = 0 and x2 = 4 we find an optimal solution x1 for the
linearity interval just to the left of γ1 and the slope of g on this interval:
x^1 = (0, 4, 0),    g'_−(γ1) = ∆c^T x^1 = 4.
In the second iteration the first auxiliary problem is
min_{γ,y} { γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 4(3 + γ − y) = 0 }.
It follows that y = 3 + γ and the problem becomes equivalent to
min_{γ,y} { γ : 3 + γ ≤ 1, 3 + γ ≤ 1 − γ, y = 3 + γ }.
Clearly this problem is unbounded. Hence g is linear for values of γ smaller than
γ1 = −2. This completes the calculation of the optimal-value function g(γ) for the
present example. We can easily check that the above results are in accordance with
the graph of g(γ) in Figure 19.3 (page 369).19
♦
19
Exercise 101 In Example IV.81 the algorithm for the computation of the optimal-value function
g(γ) was initialized by the optimal solution x∗ = (4, 0, 0) of (P ). Execute the algorithm once more
now using the optimal solution x∗ = (2, 0, 2) of (P ).
19.5 Sensitivity analysis
Sensitivity analysis is the special case of parametric analysis where only one coefficient
of b, or c, is perturbed. This means that the perturbation vector is a unit vector. The
derivative of the optimal-value function with respect to a coefficient is called the shadow price and
the corresponding linearity interval the range of the coefficient. When dealing with
sensitivity analysis the aim is to find the shadow prices and ranges of all coefficients
in b and c. Of course, the current value of a coefficient may or may not be a break
point. In the latter case, when the current coefficient is not a break point, it lies in the
interior of a linearity interval; the range of the coefficient is then this (closed) linearity
interval and its shadow price is the slope of the optimal-value function on this interval.
If the coefficient is a break point, then we have two shadow prices, the left-shadow
price, which is the left derivative of the optimal-value function at the current value,
and the right-shadow price, the right derivative of the optimal-value function at the
current value.20
19.5.1 Ranges and shadow prices
Let x∗ be an optimal solution of (P ) and (y ∗ , s∗ ) an optimal solution of (D). With
ei denoting the i-th unit vector (1 ≤ i ≤ m), the range of the i-th coefficient bi
of b is simply the linearity interval of the optimal-value function zA (b + βei , c) that
contains zero. From Theorem IV.73, the extreme points of this linearity interval follow
by minimizing and maximizing β over the set
{ β : Ax = b + βei, x ≥ 0, x^T s* = 0 }.
With bi considered as a variable, the range of bi follows by minimizing and maximizing
bi over the set
{ bi : Ax = b, x ≥ 0, x^T s* = 0 }.     (19.6)
The variables in this problem are x and bi . For the shadow prices of bi we use
Theorem IV.62. The left- and right-shadow prices of bi follow by minimizing and
maximizing eTi y = yi over the set
{ yi : A^T y + s = c, s ≥ 0, s^T x* = 0 }.     (19.7)
Similarly, the range of the j-th coefficient cj of c is equal to the linearity interval of the
optimal-value function zA (b, c + γej ) that contains zero. Changing cj into a variable
and using Theorem IV.75, we obtain the extreme points of this linearity interval by
minimizing and maximizing cj over the set
{ cj : A^T y + s = c, s ≥ 0, s^T x* = 0 }.     (19.8)
20
Sensitivity analysis is an important topic in the application oriented literature on LO. Some relevant
references, in chronological order, are Gal [89], Gauvin [93], Evans and Baker [72, 73], Akgül [6],
Knolmayer [173], Gal [90], Greenberg [128], Rubin and Wagner [247], Ward and Wendell [288],
Adler and Monteiro [4], Mehrotra and Monteiro [207], Jansen, Roos and Terlaky [153], Jansen,
de Jong, Roos and Terlaky [152] and Greenberg [129]. It is surprising that in the literature on
sensitivity analysis it is far from common to distinguish between left- and right-shadow prices. One
of the early exceptions was Gauvin [93]; this paper, however, is not mentioned in the historical
survey on sensitivity analysis of Gal [90].
In this problem the variables are the vectors y and s and also cj . For the shadow
prices of cj we use Theorem IV.68. The left- and right-shadow prices of cj follow by
minimizing and maximizing eTj x = xj over the set
{ xj : Ax = b, x ≥ 0, x^T s* = 0 }.     (19.9)
Some remarks are in order. If bi is not a break point, which becomes evident if the
extreme values in (19.6) both differ from bi, then we know that the left- and right-shadow
prices of bi are the same and these are given by y*_i. In that case there is no
need to solve (19.7). On the other hand, when bi is a break point, it is clear from
the discussion following the proof of Theorem IV.73 that there are three possibilities.
When the range of bi is determined by solving (19.6) the result may be one of the
two linearity intervals surrounding bi ; in that case yi∗ is the shadow price of bi on
this interval. This happens if and only if the given optimal solution y ∗ is such that
yi∗ is an extreme value in the set (19.7). The third possibility is that the extreme
values in the set (19.6) are both equal to bi . This certainly occurs if y ∗ is a strictly
complementary solution of (D). In each of the three cases it becomes clear, once (19.6)
has been solved, that bi is a break point, and the left- and right-shadow prices at bi can be
found by determining the extreme values of (19.7). Clearly similar remarks apply to
the ranges and shadow prices of the coefficients of the vector c.
19.5.2 Using strictly complementary solutions
The formulas for the ranges and shadow prices of the coefficients of b and c can be
simplified when the given optimal solutions x∗ of (P ) and (y ∗ , s∗ ) of (D) are strictly
complementary. Let (B, N ) denote the optimal partition of (P ) and (D). Then we
have x∗B > 0, x∗N = 0 and s∗B = 0, s∗N > 0. As a consequence, we have xT s∗ = 0 in
(19.6) and (19.9) if and only if xN = 0. Similarly, sT x∗ = 0 holds in (19.7) and (19.8)
if and only if sB = 0.
Using this we can reformulate (19.6) as
{ bi : Ax = b, xB ≥ 0, xN = 0 },     (19.10)
and (19.7) as
{ yi : A^T y + s = c, sB = 0, sN ≥ 0 }.     (19.11)
Similarly, (19.8) can be rewritten as
{ cj : A^T y + s = c, sB = 0, sN ≥ 0 },     (19.12)
and (19.9) as
{ xj : Ax = b, xB ≥ 0, xN = 0 }.     (19.13)
We proceed with an example.21
Example IV.82 Consider the (primal) problem (P ) defined by
     min   x1 + 4x2 + x3 + 2x4 + 2x5
     s.t. −2x1 + x2 + x3      + x5 − x6      = 0
           x1 + x2 − x3 + x4           − x7  = 1
           x1, x2, x3, x4, x5, x6, x7 ≥ 0.
The dual problem (D) is

     max   y2
     s.t. −2y1 + y2 ≤ 1     (1)
           y1 + y2 ≤ 4      (2)
           y1 − y2 ≤ 1      (3)
                y2 ≤ 2      (4)
           y1      ≤ 2      (5)
          −y1      ≤ 0      (6)
          −y2      ≤ 0      (7)
Problem (D) can be solved graphically. Its feasible region is shown in Figure 19.5
(page 390).
Since we are maximizing y2 in (D), the figure makes clear that the set of optimal
solutions is given by
D∗ = {(y1 , y2 ) : 0.5 ≤ y1 ≤ 2, y2 = 2} ,
and hence the optimal value is 2. Note that all slack values can be positive at an
optimal solution except the slack value of the constraint y2 ≤ 2. This means that the
set N in the optimal partition (B, N ) equals N = {1, 2, 3, 5, 6, 7}. Hence, B = {4}.
Therefore, at optimality only the variable x4 can be positive. It follows that
P ∗ = {x ∈ P : x1 = x2 = x3 = x5 = x6 = x7 = 0} = {(0, 0, 0, 1, 0, 0, 0)} ,
and (P) has a unique optimal solution: x = (0, 0, 0, 1, 0, 0, 0).
21 Exercise 102 The ranges and shadow prices can also be found by solving the corresponding dual
problems. For example, the maximal value of bi in (19.10) can be found by solving
     min { b^T y : A_B^T y ≥ 0, yi = −1 }
and the minimal value by solving
     max { b^T y : A_B^T y ≤ 0, yi = −1 }.
Formulate the dual problems for the other six cases.
[Figure 19.5: The feasible region of (D) in the (y1, y2)-plane, with the seven constraints (1)-(7) indicated.]
The next table shows the result of a complete sensitivity analysis. It shows the
ranges and shadow prices for all coefficients of b and c, where these vectors have their
usual meaning. For each coefficient that is a break point we give the shadow price as a
closed interval; the extreme values of this interval are the left- and right-shadow prices
of the coefficient. In this example this happens only for b1 . The range of a break point
consists of the point itself; the table gives this point. On the other hand, for ‘nonbreak
points’ the range is a proper interval and the shadow price is a number.
Coefficient     Range         Shadow prices
b1 = 0          0             [1/2, 2]
b2 = 1          [0, ∞)        2
c1 = 1          [−2, ∞)       0
c2 = 4          [5/2, ∞)      0
c3 = 1          [−3/2, ∞)     0
c4 = 2          [0, 3]        1
c5 = 2          [1/2, ∞)      0
c6 = 0          [−2, ∞)       0
c7 = 0          [−2, ∞)       0
We perform the sensitivity analysis here for b1 and c4 .
Range and shadow prices for b1
Using (19.10) the range of b1 follows by minimizing and maximizing b1 over the system
0 = b1,     x4 = 1.
The solution of this system is unique: x4 = 1 and b1 = 0, so the range of b1 is the
interval [0, 0]. This means that b1 = 0 is a break point.
The left- and right-shadow prices of b1 follow by minimizing and maximizing y1
over y ∈ D*. The minimal value is 0.5 and the maximal value 2, so the left- and
right-shadow prices are 0.5 and 2, respectively.
Range and shadow price for c4
The range of c4 is found by using (19.12). This amounts to minimizing and maximizing
c4 over the system
     −2y1 + y2 ≤ 1
       y1 + y2 ≤ 4
       y1 − y2 ≤ 1
            y2 = c4
       y1      ≤ 2
       y1      ≥ 0
       y2      ≥ 0.
This optimization problem can easily be solved by using Figure 19.5. It amounts to
the question of which values of y2 are feasible when the fourth constraint is removed
in Figure 19.5. We can easily verify that all values of y2 in the closed interval [0, 3]
(and no other values) satisfy this system. Therefore, the range of c4 is this interval. The shadow
price of c4 is given by e4^T x = x4 = 1.    ♦
19.5.3 Classical approach to sensitivity analysis
Commercial optimization packages for the solution of LO problems usually offer the
possibility of doing sensitivity analysis. The sensitivity analysis in many existing
commercial optimization packages is based on the naive approach presented in first
year textbooks. As a result, the outcome of the sensitivity analysis is often confusing.
We explain this below.
The ‘classical’ approach to sensitivity analysis is based on the Simplex Method for
solving LO problems.22 The Simplex Method produces a so-called basic solution of
22
With the word ‘classical’ we want to refer to the approach which dominates the literature, especially
well known textbooks dealing with parametric and/or sensitivity analysis. This approach has led
to the existing misuse of parametric optimization in commercial packages. This misuse is however
a shortcoming of the packages and by no means a shortcoming in the whole existing theoretical
literature. In this respect we want to refer the reader to Nožička, Guddat, Hollatz and Bank [228].
In this book the parametric issue is correctly handled in terms of the Simplex Method, polyhedrons,
faces of polyhedra etc. Besides parameterizing either the objective vector or the right-hand side
vector, much more general parametric issues are also discussed. The following citation is taken
from this book: Den qualitativen Untersuchungen in den meisten erschienenen Aufsätzen und
Büchern liegt das Simplexverfahren zugrunde. Zwangsläufig unterliegen alle derartig gewonnenen
Aussagen den Schwierigkeiren, die bei Beweisführungen mit Hilfe der Simplexmethode im Falle der
Entartung auftreten. In einigen Arbeiten wurde ein rein algebraischer Weg verfolgt, der in gewisse
Spezialfällen zu Resultaten führte, im allgemeinen aber bisher keine qualitative Analyse erlieferte.
the problem. It suffices for our purpose to know that such a solution is determined
by an optimal basis. Assuming that A is of size m × n and rank (A) = m, a basis is
a nonsingular m × m submatrix AB ′ of A and the corresponding basic solution x is
determined by
AB ′ xB ′ = b, xN ′ = 0,
where N ′ consists of the indices not in B ′ . Defining a vector y by
ATB ′ y = cB ′ ,
and denoting the slack vector of y by s, we have sB ′ = 0. Since xN ′ = 0, it follows
that xs = 0, proving that x and s are complementary vectors. Hence, if xB ′ and sN ′
are nonnegative then x is optimal for (P ) and (y, s) is optimal for (D). In that case
AB ′ is called an optimal basis for (P ) and (D). A main result in the Simplex based
approach to LO is that such an optimal basis always exists — provided the assumption
that rank (A) = m is satisfied — and the Simplex Method generates such a basis. For
a detailed description of the Simplex Method and its underlying theory we refer the
reader to any (text-)book on LO.23
Any optimal basis leads to a natural division of the indices into m basic indices and
n − m nonbasic indices, thus yielding a partition (B ′ , N ′ ) of the index set. We call this
the optimal basis partition induced by the optimal basis B ′ . Obviously, an optimal
basis partition need not be an optimal partition. In fact, this observation is crucial as
we show below.
The classical approach to sensitivity analysis amounts to applying the ‘formulas’
(19.10) – (19.13) for the ranges and shadow prices, but with the optimal basis partition
(B ′ , N ′ ) instead of the optimal partition (B, N ). It is clear that in general (B ′ , N ′ )
is not necessarily the optimal partition because (P ) and (D) may have more than
one optimal basis. The outcome of the classical analysis will therefore depend on
the optimal basis AB ′ . Hence, correct implementations of the classical approach may
give rise to different ‘ranges’ and ‘shadow prices’.24 The next example illustrates this
phenomenon. In a subsequent section a further example is given, where we apply
several commercial optimization packages to a small transportation problem.
Example IV.83 For problems (P ) and (D) in Example IV.82 we have three optimal
bases. These are given in the table below. The column at the right gives the ‘ranges’
for c4 for each of these bases.
Basis    B′         ‘Range’ for c4
1        {1, 4}     [1, 3]
2        {2, 4}     [2, 3]
3        {4, 5}     [1, 2]
We get three different ‘ranges’, depending on the optimal basis. Let us do the
calculations for the first optimal basis in the table. The ‘range’ of c4 is found by
23 See, e.g., Dantzig [59], Papadimitriou and Steiglitz [231], Chvátal [55], Schrijver [250], Fang and Puthenpura [74] and Sierksma [256].
24 We put the words range and shadow price between quotes if they refer to ranges and shadow prices obtained from an optimal basis partition (which may differ from the unique optimal partition).
using (19.12) with (B, N ) such that B = B ′ = {1, 4}. This amounts to minimizing
and maximizing c4 over the system
     −2y1 + y2 = 1
       y1 + y2 ≤ 4
       y1 − y2 ≤ 1
            y2 = c4
       y1      ≤ 2
       y1      ≥ 0
       y2      ≥ 0.
Using Figure 19.5 we can easily solve this problem. The question now is which values
of y2 are feasible when the fourth constraint is removed in Figure 19.5 and the first
constraint is active. We can easily verify that this leads to 1 ≤ y2 ≤ 3, thus yielding
the closed interval [1, 3] as the ‘range’ for c4 . The other two ‘ranges’ can be found in
the same way by keeping the second and the fifth constraints active, respectively.
A commercial optimization package provides the user with one of the three ranges
in the table, depending on the optimal basis found by the package. Observe that each
of the three ranges is a subrange of the correct range, which is [0, 3]. Note that the
current value 2 of c4 lies in the interior of this interval, whereas for two of the ‘ranges’ in the
table, 2 is an extreme point. This might lead to the wrong conclusion that 2 is a break
point of the optimal-value function.
♦
It can easily be understood that the ‘range’ obtained from an optimal basis partition
is always a subinterval of the whole linearity interval. Of course, sometimes the
subinterval may coincide with the whole interval. For the shadow prices a similar
statement holds. At a ‘nonbreak point’ an optimal basis partition yields the correct
shadow price. At a break point, however, an optimal basis partition yields one ‘shadow
price’, which may be any number between the left- and the right-shadow price. The
example in the next section demonstrates this behavior very clearly.
Before proceeding with the next section we must note that from a computational
point of view, the approach using an optimal basis partition is much cheaper than using
the optimal partition. In the latter case we need to solve some auxiliary LO problems
— in the worst case four for each coefficient. When the optimal partition (B, N ) is
replaced by an optimal basis partition (B ′ , N ′ ), however, it becomes computationally
very simple to determine the ‘ranges’ and ‘shadow prices’.
For example, consider the ‘range’ problem for bi . This amounts to minimizing and
maximizing bi over the set
{bi : Ax = b, xB ′ ≥ 0, xN ′ = 0} .
Since AB ′ is nonsingular, it follows that
x_B′ = A_B′^{−1} b,
and hence the condition x_B′ ≥ 0 reduces to
A_B′^{−1} b ≥ 0.
This is a system of m linear inequalities in the coefficient bi , with i fixed, and hence its
solution can be determined straightforwardly. Note that the system is feasible, because
the current value of bi is such that the system is satisfied. Hence, the solution is a
closed interval containing the current value of bi .
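As an illustration of this computation, the small sketch below performs exactly this ratio test for one coefficient bi. The function and argument names are ad hoc; 'basis' is assumed to be the list of column indices of an optimal basis AB′, as delivered for instance by a simplex solver.

```python
import numpy as np

def classical_range_bi(A, b, basis, i):
    """The 'range' of b_i derived from one optimal basis B': all values of b_i
    for which A_B'^{-1} b stays nonnegative (a simple ratio test)."""
    A, b = np.asarray(A, dtype=float), np.asarray(b, dtype=float)
    AB = A[:, basis]                                   # nonsingular m x m basis matrix
    x0 = np.linalg.solve(AB, b)                        # current basic solution x_B'
    w = np.linalg.solve(AB, np.eye(A.shape[0])[:, i])  # column i of A_B'^{-1}
    # With b_i = t we have x_B'(t) = x0 + (t - b[i]) * w >= 0 componentwise.
    lo, hi = -np.inf, np.inf
    for xk, wk in zip(x0, w):
        if wk > 0:
            lo = max(lo, b[i] - xk / wk)
        elif wk < 0:
            hi = min(hi, b[i] - xk / wk)
    return lo, hi
```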
19.5.4 Comparison of the classical and the new approach
For the comparison we use a simple problem, arising when transporting commodities
(of one type) from three distribution centers to three warehouses. The supply values at
the three distribution centers are 2, 6 and 5 units respectively, and the demand value
at each of the three warehouses is just 3. We assume that the cost of transporting
one unit of commodity from a distribution center to a warehouse is independent
of the distribution center and the warehouse, and that this cost is equal to one (unit of
currency). The aim is to meet the demand at the warehouses at minimal cost. This
problem is depicted in Figure 19.6 by means of a network. The left three nodes in
this network represent the distribution centers and the right three nodes the three
warehouses. The arcs represent the transportation routes from the distribution centers
to the warehouses. The supply and demand values are indicated at the respective
nodes. The transportation problem consists of assigning ‘flow’ values to the arcs in
[Figure 19.6: A transportation problem; supply nodes with a1 = 2, a2 = 6, a3 = 5 on the left, demand nodes with b1 = b2 = b3 = 3 on the right, and an arc from every distribution center to every warehouse.]
the network so that the demand is met and the supply values are respected; this must
be done in such a way that the cost of the transportation to the demand nodes is
minimized. Because of the choice of cost coefficients, the total cost is simply the sum
of all arc flow values. Since the total demand is 9, this is also the optimal value for
the total cost value. Note that there are many optimal flows; this is due to the fact
that all arcs are equally expensive. So far, everything is trivial.
Sensitivity to demand and supply values
Now we want to determine the sensitivity of the optimal value to perturbations of
the supply and demand values. Denoting the supply values by a = (a1 , a2 , a3 ) and
the demand values by b = (b1 , b2 , b3 ), we can determine the ranges of these values by
hand.
For example, when b1 is changed, the total demand becomes 6 + b1 and this is the
optimal value as long as such a demand can be met by the present supply. This leads
to the condition
6 + b1 ≤ 2 + 6 + 5 = 13,
which yields b1 ≤ 7. For larger values of b1 the problem becomes infeasible. When
b1 = 0, the arcs leading to the first demand node have zero flow value in any optimal
solution. This means that 0 is a break point, and the range of b1 is [0, 7]. Because of
the symmetry in the network for the demand nodes, the range for b2 and b3 will be
the same interval.
When a1 is changed, the total supply becomes 11 + a1 and this will be sufficient as
long as
11 + a1 ≥ 9,
which yields a1 ≥ −2. The directed arcs can only handle nonnegative supply values,
and hence the range of a1 is [0, ∞). Similarly, the range for a2 follows from
7 + a2 ≥ 9,
which yields the range [2, ∞) for a2 , and the range for a3 follows from
8 + a3 ≥ 9,
yielding the range [1, ∞) for a3 .
To compare these ranges with the ‘ranges’ provided by the classical approach, we
made a linear model of the above problem, solved it using five well-known commercial
optimization packages, and performed a sensitivity analysis with these packages. We
used the following linear standard model:
     min   Σ_{i=1}^{3} Σ_{j=1}^{3} xij
     s.t.  x11 + x12 + x13 + s1        = 2
           x21 + x22 + x23 + s2        = 6
           x31 + x32 + x33 + s3        = 5
           x11 + x21 + x31 − d1        = 3
           x12 + x22 + x32 − d2        = 3
           x13 + x23 + x33 − d3        = 3
           xij, si, dj ≥ 0,   i, j = 1, 2, 3.

The meaning of the variables is as follows:
     xij : the amount of transport from supply node i to demand node j,
     si  : excess supply at supply node i,
     dj  : shortage of demand at node j,
where i and j run from 1 to 3.
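As a small illustration (using scipy.optimize.linprog rather than any of the packages discussed below), the following sketch builds exactly this model, solves it, and reads off the optimal value and the dual values of the equality rows. The duals should reproduce the shadow prices mentioned in the footnote below, namely 0 for the supply rows and 1 for the demand rows; note, however, that one solver call of this kind reports a single optimal basis and therefore cannot by itself distinguish left- and right-shadow prices or produce the correct ranges.

```python
import numpy as np
from scipy.optimize import linprog

# Variables ordered as (x11, x12, ..., x33, s1, s2, s3, d1, d2, d3).
n = 15
cost = np.r_[np.ones(9), np.zeros(6)]                # minimize total flow
A_eq = np.zeros((6, n))
for i in range(3):                                   # supply rows: sum_j x_ij + s_i = a_i
    A_eq[i, 3 * i:3 * i + 3] = 1.0
    A_eq[i, 9 + i] = 1.0
for j in range(3):                                   # demand rows: sum_i x_ij - d_j = 3
    A_eq[3 + j, [j, 3 + j, 6 + j]] = 1.0
    A_eq[3 + j, 12 + j] = -1.0
b_eq = np.array([2.0, 6.0, 5.0, 3.0, 3.0, 3.0])
res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n, method="highs")
print(res.fun)                 # optimal total cost: 9.0
print(res.eqlin.marginals)     # duals; expected 0 for supply rows, 1 for demand rows
```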
The result of the experiment is shown in the table below.25 The columns correspond
to the supply and the demand coefficients. Their current values are put between
brackets. The rows in the table corresponding to the five packages26 CPLEX, LINDO,
PC-PROG, XMP and OSL show the ‘ranges’ produced by these packages. The last
row contains the ranges calculated before by hand.27
‘Ranges’ of supply and demand values

LO package       a1 (2)    a2 (6)    a3 (5)    b1 (3)    b2 (3)    b3 (3)
CPLEX            [0,3]     [4,7]     [1,∞)     [2,7]     [2,5]     [2,5]
LINDO            [1,3]     [2,∞)     [4,7]     [2,4]     [1,4]     [1,7]
PC-PROG          [0,∞)     [4,∞)     [3,6]     [2,5]     [0,5]     [2,5]
XMP              [0,3]     [6,7]     [1,∞)     [2,3]     [2,3]     [2,7]
OSL              [0,3]     [4,7]     (−∞,∞)    [2,7]     [2,5]     [2,5]
Correct range    [0,∞)     [2,∞)     [1,∞)     [0,7]     [0,7]     [0,7]
The table clearly demonstrates the weaknesses of the classical approach. Sensitivity
analysis is considered to be a tool for obtaining information about the bottlenecks
and degrees of freedom in the problem. The information provided by the commercial
optimization packages is confusing and hardly allows a solid interpretation. For
example, in our example problem there is obvious symmetry between the demand
nodes. None of the five packages gives evidence of this symmetry.
Remark IV.84 As stated before, the ‘ranges’ and ‘shadow prices’ provided by the classical
approach arise by applying the formulas (19.10) – (19.13) for the ranges and shadow prices,
but replacing the optimal partition (B, N ) by the optimal basis partition (B ′ , N ′ ). Indeed,
the ‘ranges’ in the table can be reconstructed in this way. We will not do this here, but to
enable the interested reader to perform the relevant calculations we give the optimal basis
partitions used by the packages. If the optimal basis partition is (B ′ , N ′ ), it suffices to know
the variables in B ′ for each of the five packages. These ‘basic variables’ are given in the next
table.
LO package       Basic variables
CPLEX            x12, x21, x22, x23, x31, s3
LINDO            x11, x23, x31, x32, x33, s2
PC-PROG          x22, x23, x31, x33, s1, s2
XMP              x13, x21, x22, x23, x33, s3
OSL              x12, x21, x22, x23, x31, s3

25 The dual problem has a unique solution in this example. These are the shadow prices for the demand and supply values. All packages return this unique solution, namely 0 for the supply values — due to the excess of supply — and 1 for the demand values.
26 For more information on these packages we refer the reader to Sharda [253].
27 The ‘range’ provided by the IBM package OSL (Optimization Subroutine Library) for a3 is not a subrange of the correct range; this must be due to a bug in OSL. The correct ‘range’ for the optimal basis partition used by OSL is [1, ∞).
Note that CPLEX and OSL use the same optimal basis. The output of their sensitivity
analysis differs, however. As noted before, the explanation of this phenomenon is that the
OSL implementation of the classical approach must contain a bug.
•
The sensitivity analysis for the cost coefficients cij is considered next. The results
are similar, as we shall see.
Sensitivity to cost coefficients
The current values of the cost coefficients cij are all 1. As a consequence each feasible
flow on the network is optimal if the sum of the flow values xij equals 9. When one
of the arcs becomes more expensive, then the flow on this arc can be rerouted over
the other arcs and the optimal value remains 9. Hence the right-shadow price of each
cost coefficient equals 0. On the other hand, if one of the arcs becomes cheaper, then
it becomes attractive to let this arc carry as much flow as possible. The maximal flow
values for the arcs are 2 for the arcs emanating from the first supply node and 3 for
the other arcs. Since for each arc there exists an optimal solution of the problem in
which the flow value on that arc is zero, a decrease of 1 in the cost coefficient for the
arcs emanating from the first supply node leads to a decrease of 2 in the total cost,
and for the other arcs the decrease in the total cost is 3. Thus we have found the
left- and right-shadow prices of the cost coefficients. Since the left- and right-shadow
prices are all different, the current value of each of the cost coefficients is a break
point. Obviously, the linearity interval to the left of this break point is (−∞, 1] and
the linearity interval to the right of it is [1, ∞).
In the next table the ‘shadow prices’ provided by the five commercial optimization
packages are given. The last row in the table contains the correct values of the left- and right-shadow prices, as just calculated.
‘Shadow prices’ of cost coefficients
LO package       c11     c12     c13     c21     c22     c23     c31     c32     c33
CPLEX            0       2       0       2       1       3       1       0       0
LINDO            2       0       0       0       0       2       1       3       1
PC-PROG          0       0       0       0       3       1       3       0       2
XMP              0       0       2       3       3       0       0       0       1
OSL              0       2       0       2       1       3       1       0       0
Correct values   [2,0]   [2,0]   [2,0]   [3,0]   [3,0]   [3,0]   [3,0]   [3,0]   [3,0]
Note that in all cases the ‘shadow price’ of a package lies in the interval between
the left- and right-shadow prices.
The last table shows the ‘ranges’ of the packages and the correct left and right
ranges for the cost coefficients.28 It is easy to understand the correct ranges. For
example, if c11 increases then the corresponding arc becomes more expensive than the
other arcs, and hence will not be used in an optimal solution. On the other hand, if
28
In this table we use shorthand notation for the infinite intervals [1,∞) and (-∞,1]. The interval
[1,∞) is denoted by [1, ) and the interval (-∞,1] by ( ,1].
c11 decreases then it becomes attractive to use this arc as much as possible; due to
the limited supply value (i.e., 2) in the first supply node a flow of value 2 will be sent
along this arc whatever the value of c11 is. Considering c21 , we see the same behavior if
c21 increases: the arc will not be used. But if c21 ∈ [0, 1], then the arc will be preferred
above the other arcs, and its flow value will be 3. If c21 were to become negative, then
it would become attractive to send even a flow of value 6 along this arc, despite the fact
that the first demand node then receives oversupply. So c21 = 0 is a break point.
Note that if a ‘shadow price’ of a package is equal to the left or right-shadow price
then the ‘range’ provided by the package must be a subinterval of the correct range.
Moreover, if the ‘shadow price’ of a package is not equal to the left or right-shadow
price then the ‘range’ provided by the package must be the singleton [1, 1]. The results
of the packages are consistent in this respect, as follows easily by inspection.
‘Ranges’ of the cost coefficients
LO package       c11     c12     c13     c21     c22     c23     c31     c32     c33
CPLEX            [1, )   ( ,1]   [1, )   [1,1]   [1,1]   [0,1]   [1,1]   [1, )   [1, )
LINDO            ( ,1]   [1, )   [1, )   [1, )   [1, )   [1,1]   [1,1]   [0,1]   [1,1]
PC-PROG          [1, )   [1, )   [1, )   [1, )   [0,1]   [1,1]   [0,1]   [1, )   [1,1]
XMP^29           –       –       ( ,1]   [0,1]   [0,1]   [1,1]   –       –       [1,1]
OSL^30           [1, )   [1,1]   [1, )   [1,1]   [1,1]   [1,1]   [1,1]   [1, )   [1, )
Left range       ( ,1]   ( ,1]   ( ,1]   [0,1]   [0,1]   [0,1]   [0,1]   [0,1]   [0,1]
Right range      [1, )   [1, )   [1, )   [1, )   [1, )   [1, )   [1, )   [1, )   [1, )
19.6 Concluding remarks
In this chapter we developed the theory necessary for the analysis of one-dimensional
parametric perturbations of the vectors b and c in the standard formulation of the
primal problem (P ) and its dual problem (D). Given a pair of optimal solutions for
these problems, we presented algorithms in Section 19.4.5 for the computation of the
optimal-value function under such a perturbation. In Section 19.5 we concentrated on
the special case of sensitivity analysis. In Section 19.5.1 we showed that the ranges and
shadow prices of the coefficients of b and c can be obtained by solving auxiliary LO
problems. We also discussed how the ranges obtained in this way can be ambiguous,
but that the ambiguity can be avoided by using strictly complementary solutions.
We proceeded in Section 19.5.3 by discussing the classical approach to sensitivity
analysis, based on the use of an optimal basic solution and the corresponding optimal
basis. We showed that this approach is much cheaper from a computational point of
29
For some unclear reason XMP did not provide all ranges. The missing entries in its row are all
equal to [1,∞).
30
In Remark IV.84 it was established that OSL and CPLEX use the same optimal basis; nevertheless
their ‘ranges’ for c12 and c23 are different. One may easily verify that these ‘ranges’ are (−∞, 1]
and [0, 1] respectively. Thus, the CPLEX ‘ranges’ are consistent with this optimal basis and the
OSL ‘ranges’ are not.
view. On the other hand, much less information is usually obtained and the information
is often confusing. In the previous section we provided a striking example by presenting
the sensitivity information provided by five commercial optimization packages for a
simple transportation problem.
The shortcomings of the classical approach are well known among experts in the
field. At several places in the literature these experts raised their voices to warn of the
possible implications of using the classical approach. By way of example we include a
citation of Rubin and Wagner [247]:
Managers who build their own microcomputer linear programming models
are apt to misuse the resulting shadow prices and shadow costs. Fallacious
interpretations of these values can lead to expensive mistakes, especially
unwarranted capital investments.
As a result of the unreliability of the sensitivity information provided by computer
packages, the reputation of sensitivity analysis as a tool for obtaining information
about the bottlenecks and degrees of freedom has suffered a lot. Many potential users
of such information do not use it, because they want to avoid the pitfalls that are
inherent in the classical approach.
The theory developed in this chapter provides a solid base for reliable sensitivity
modules in future generations of computer packages for LO.
20 Implementing Interior Point Methods

20.1 Introduction
Several polynomial interior-point algorithms were discussed in the previous chapters.
Interior point algorithms not only provide the best theoretical complexity for LO
problems but allow highly efficient implementations as well. Obviously not all
polynomial algorithms are practically efficient. In particular, all full Newton step
methods (see, e.g., Section 6.7) are inefficient in practice. However variants like
the predictor-corrector method (see Section 7.7) and large-update methods (see
Section 6.9) allow efficient implementations. The aim of this chapter is to give some
hints on how some of these interior point algorithms can be converted into efficient
implementations. To reach this goal several problems have to be dealt with. Some
of these problems have been at least partially discussed earlier (e.g., the embedding
problem in Chapter 2) but need further elaboration. Some other topics (e.g., methods
of sparse numerical linear algebra, preprocessing) have not yet been touched.
By reviewing the various interior-point methods we observe that they are all based
on similar assumptions and are built up from similar ingredients. We can extract the
following essential elements of interior-point methods (IPMs).
Appropriate problem form. All algorithms assume that the LO problem satisfies
certain assumptions. The problem must be in an appropriate form (e.g., the
canonical form or the standard form). In the standard form the coefficient matrix
A must have full row rank. Techniques to bring a given LO problem to the desired
form, and at the same time to eliminate redundant constraints and variables,
are called preprocessing and are discussed in Section 20.3.
Search direction. The search direction in interior-point methods is always a Newton
direction. To calculate this direction we have to solve a system of linear
equations. Except for the right-hand side and the scaling, this system is the
same for all the methods. Computationally the solution of the system amounts
to factorizing a square matrix and then solving the triangular systems by
forward or backward substitution. The factorization is the most expensive part
of an iteration. Without efficient sparse linear algebra routines, interior-point
methods would not be practical. Various elements of sparse matrix techniques are
discussed in Section 20.4. A straightforward idea for reducing the computational
cost is to reuse the same factorization. This leads to the idea of second- and
higher-order methods discussed in Section 20.4.3.
Interior point. The interior-point assumption is presupposed, i.e. that both the
primal and the dual problem have a strictly positive (preferably centered) initial
solution. Most LO problems do not have such a solution, but still have to be
solved. A theoretically appealing and at the same time practical method is
to embed the problem in a self-dual model, as discussed in Chapter 2. The
embedding model is revisited and elaborated on in Section 20.5.
Reoptimization. In practice it often happens that several variants of the same LO
problem need to be solved successively. One might expect that the solution of an
earlier version would be a good starting point for a slightly modified problem.
For this so-called warm start problem the embedding model also provides a good
solution as discussed in Section 20.5.2.
Parameters: Step size, stopping criteria. The iterates in IPMs should stay in
some neighborhood of the central path. Theoretically good step-sizes can result
in hopelessly slow convergence in practice. A practical step-size selection rule
is discussed. At some point, when the duality gap is small, the calculation is
terminated. The theoretical criteria are typically far beyond today’s machine
precision. A practical criterion is presented in Section 20.6.
Optimal basis identification. It is not an essential element of interior-point methods, but sometimes it still might be important1 to find an optimal basis. Then
we need to provide the ability to ‘cross over’ from an interior solution to a basic
solution. An elegant strongly polynomial algorithm is presented in Section 20.7.
20.2 Prototype algorithm
In most practical LO problems, in addition to the equality and inequality constraints,
the variables have lower and upper bounds. Thus we deal with the primal problem in
the following form:
min_x { c^T x : Ax ≥ b, x ≤ bu, x ≥ 0 },     (20.1)
where c, x, bu ∈ R^n, b ∈ R^m, and the matrix A is of size m × n. Now its dual is
max_{y,yu} { b^T y − bu^T yu : A^T y − yu ≤ c, y ≥ 0, yu ≥ 0 },     (20.2)
where y ∈ IRm and yu ∈ IRn . Let us denote the slack variables in the primal problem
(20.1) by
z = Ax − b, zu = bu − x
and in the dual problem (20.2) by
s = c + yu − AT y,
1
Here we might think about linear integer optimization when cutting planes are to be generated to
cut off the current nonintegral optimal solution.
respectively. Here we assume not only that the problem pair satisfies the interior-point
assumption, but also that a strictly positive solution (x, zu , z, s, yu , y) > 0 is given,
satisfying all the constraints in (20.1) and (20.2) respectively. How to solve these
problems without the explicit knowledge of an interior point is discussed in Section
20.5.
The central path of the pair of problems given in (20.1) and (20.2) is defined as
the set of solutions (x(µ), zu (µ), z(µ)) > 0 and (s(µ), yu (µ), y(µ)) > 0 for µ > 0 of the
system
Ax − z = b,
x + zu = bu,
A^T y − yu + s = c,
x s = µe,
zu yu = µe,
z y = µe,     (20.3)
where e is the vector of all ones with appropriate dimension.
Observe that the first three of the above equations are linear and force primal and
dual feasibility of the solution. The last three equations are nonlinear. They become the
complementarity conditions when µ = 0, which together with the feasibility constraints
provides optimality of the solutions. The actual duality (or complementarity) gap g
can easily be computed:
g = x^T s + zu^T yu + z^T y,
which equals (2n + m)µ on the central path.
One iteration of a primal-dual algorithm makes one step of Newton’s method applied
to system (20.3) with a given µ; then µ is reduced and the procedure is repeated as
long as the duality gap is larger than a predetermined tolerance.
Given a solution (x, zu , z) > 0 of (20.1) and (s, yu , y) > 0 of (20.2) the Newton
direction for (20.3) is obtained by solving a system of linear equations. This system
can be written as follows, where the ordering of the variables is chosen so that the
structure of the coefficient matrix becomes apparent.
\[
\begin{pmatrix}
 0   & A & -I & 0   & 0   & 0 \\
 A^T & 0 &  0 & -I  & 0   & I \\
 0   & I &  0 & 0   & I   & 0 \\
 0   & S &  0 & 0   & 0   & X \\
 0   & 0 &  0 & Z_u & Y_u & 0 \\
 Z   & 0 &  Y & 0   & 0   & 0
\end{pmatrix}
\begin{pmatrix} \Delta y \\ \Delta x \\ \Delta z \\ \Delta y_u \\ \Delta z_u \\ \Delta s \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ 0 \\ \mu e - xs \\ \mu e - z_u y_u \\ \mu e - zy \end{pmatrix}.
\qquad (20.4)
\]
In making a step, in order to preserve the positivity of (x, zu, z) and (s, yu, y), a step-size α usually smaller than one (a damped Newton step) is chosen.
Let us have a closer look at the Newton system. From the last four equations in
(20.4) we can easily derive
∆zu = −∆x,
∆s = x^{−1}(µe − xs − s∆x),
∆yu = zu^{−1}(µe − zu yu − yu ∆zu) = zu^{−1}(µe − zu yu + yu ∆x),
∆z = y^{−1}(µe − yz − z∆y).     (20.5)
With these relations, (20.4) reduces to
\[
\begin{pmatrix} D^2 & A \\ A^T & -\bar D^{-2} \end{pmatrix}
\begin{pmatrix} \Delta y \\ \Delta x \end{pmatrix}
=
\begin{pmatrix} r \\ h \end{pmatrix},
\qquad (20.6)
\]
where
D^2 = Z Y^{−1},
D̄^{−2} = S X^{−1} + Yu Zu^{−1},
r = y^{−1}(µe − yz),
h = zu^{−1}(µe − zu yu) − x^{−1}(µe − xs).
The solution of the reduced Newton system (20.6) is the computationally most involved
step of any interior point method. The system (20.6) in this form is a symmetric
indefinite system and is referred to as the augmented system. If the second equation
in the augmented system is multiplied by −1 a system with a positive definite (but
unsymmetric) matrix is obtained.
The augmented system (20.6) is equivalent to
∆x = D̄^2(A^T ∆y − h)
and
(A D̄^2 A^T + D^2) ∆y = r + A D̄^2 h.     (20.7)
The last equation is referred to as the normal equation.2 The way to solve the systems
(20.6) and (20.7) efficiently is discussed in detail in Section 20.4.
After system (20.6) or (20.7) is solved, using formulas (20.5) we obtain the solution
of (20.4). Now the maximal feasible step lengths αP for the primal (x, z, zu ) and αD for
the dual (s, y, yu ) variables are calculated. Then these step-sizes are slightly reduced
by a factor α0 < 1 to avoid reaching the boundary. The new iterate is computed as
x^{k+1} := x^k + α0 αP ∆x,        s^{k+1} := s^k + α0 αD ∆s,
zu^{k+1} := zu^k + α0 αP ∆zu,     yu^{k+1} := yu^k + α0 αD ∆yu,
z^{k+1} := z^k + α0 αP ∆z,        y^{k+1} := y^k + α0 αD ∆y.     (20.9)
After the step, the parameter µ is updated and the process is repeated. A prototype
algorithm can be summarized as follows.
2 Exercise 103 Show that if first ∆y is calculated from the system (20.6) as a function of ∆x the following formulas arise:
∆y = −D^{−2}(A∆x − r)
and
(A^T D^{−2} A + D̄^{−2}) ∆x = A^T D^{−2} r − h.     (20.8)
Observe that this symmetric formulation allows for further utilization of the structure of the normal equation. We are free to choose between (20.7) and (20.8) depending on which has a nicer sparsity structure.
Prototype Primal–Dual Algorithm

Input:
  An accuracy parameter ε > 0;
  (x^0, zu^0, z^0) and (s^0, yu^0, y^0), interior solutions for (20.1) and (20.2);
  a parameter α0 < 1; µ0 > 0.

begin
  (x, zu, z, s, yu, y) := (x^0, zu^0, z^0, s^0, yu^0, y^0); µ := µ0;
  while (2n + m)µ ≥ ε do
  begin
    reduce µ;
    solve (20.4) to obtain (∆x, ∆zu, ∆z, ∆s, ∆yu, ∆y);
    determine αP and αD;
    update (x, zu, z, s, yu, y) by (20.9)
  end
end
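As an illustration, the following dense numpy sketch implements one iteration of this prototype for a strictly feasible current iterate: it forms the Newton direction via the normal equation (20.7), recovers the remaining components from (20.5), performs the ratio tests for αP and αD, and applies the damped update (20.9). The function name, the capping of the step lengths at 1 and the damping factor α0 = 0.9995 are illustrative choices; a practical code would use the sparse factorizations of Section 20.4 and the µ-update and stopping rules of Section 20.6.

```python
import numpy as np

def prototype_step(A, x, zu, z, s, yu, y, mu, alpha0=0.9995):
    """One damped Newton step of the prototype algorithm, via (20.5)-(20.7)
    and (20.9), using dense algebra and diagonal matrices stored as vectors."""
    e, em = np.ones(len(x)), np.ones(len(y))
    D2 = z / y                                 # D^2 = Z Y^{-1}
    Dbar_inv2 = s / x + yu / zu                # Dbar^{-2} = S X^{-1} + Y_u Z_u^{-1}
    Dbar2 = 1.0 / Dbar_inv2
    r = (mu * em - y * z) / y
    h = (mu * e - zu * yu) / zu - (mu * e - x * s) / x
    # Normal equation (20.7): (A Dbar^2 A^T + D^2) dy = r + A Dbar^2 h.
    M = A @ (Dbar2[:, None] * A.T) + np.diag(D2)
    dy = np.linalg.solve(M, r + A @ (Dbar2 * h))
    dx = Dbar2 * (A.T @ dy - h)
    # Remaining components from (20.5).
    dzu = -dx
    ds = (mu * e - x * s - s * dx) / x
    dyu = (mu * e - zu * yu + yu * dx) / zu
    dz = (mu * em - y * z - z * dy) / y
    # Maximal feasible step lengths (capped at 1), then damping as in (20.9).
    def max_step(v, dv):
        neg = dv < 0
        return min(1.0, float(np.min(-v[neg] / dv[neg]))) if np.any(neg) else 1.0
    aP = min(max_step(x, dx), max_step(zu, dzu), max_step(z, dz))
    aD = min(max_step(s, ds), max_step(yu, dyu), max_step(y, dy))
    return (x + alpha0 * aP * dx, zu + alpha0 * aP * dzu, z + alpha0 * aP * dz,
            s + alpha0 * aD * ds, yu + alpha0 * aD * dyu, y + alpha0 * aD * dy)
```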
Before discussing all the ingredients in more detail we make an important observation.
Solving a problem with individual upper bounds on the variables does not require
significantly more computational effort than solving the same problem without such
upper bounds. In both cases the augmented system (20.6) and the normal equation
(20.7) have the same size. The extra costs per iteration arising from the upper bounds
are just O(n), namely some extra ratio tests to determine the maximal possible step
sizes and some extra vector manipulations (see equations (20.5)).3
20.3 Preprocessing
An important issue for all implementations is to transform the problem into an
appropriate form, e.g., to the canonical form with upper bounded variables (20.1), and
to reduce the problem size in order to reach a minimal representation of the problem.
This aim is quite plausible. A smaller problem needs less memory to store, usually fewer
iterations of the algorithm, and if the transformation reduces the number of nonzero
coefficients or improves the sparsity structure then fewer arithmetic operations per
iteration are needed. A minimal representation should be free of redundancies, implied
variables and inequalities. In general it is not realistic to strive to find the minimal
representation of a given problem. But by analysing the structure of the problem it is
often possible to reduce the problem size significantly. In fact, almost all large-scale LO
problems contain redundancies in practice. The use of modeling languages and matrix
generators easily allows the generation of huge models. Modelers choose to formulate
models that are easy to understand and modify; this often leads to the introduction
3
Exercise 104 Check that the computational cost per iteration increases just by O(n) if individual
upper bounds are imposed on the variables.
of superfluous variables and redundant constraints. To remove at least most of these
redundancies is, however, a nontrivial task; this is the aim of preprocessing.
As we have already indicated, computationally the most expensive part of an
interior-point iteration is calculating the search direction, i.e., solving the normal equation
(20.7) or the augmented system (20.6). With a compact formulation the speedup can
be significant.4
20.3.1 Detecting redundancy and making the constraint matrix sparser
By analysing the sparsity pattern of the matrix A, one can frequently reduce the
problem size. The aim of the sparsity analysis is to reduce the number of nonzero
elements in the constraint matrix A; this is done by elementary matrix operations.
In fact, as a consequence, the sparsity analysis mainly depends on just the nonzero
structure of the matrix A and it is largely independent of the magnitude of the
coefficients.
1. First we look for pairs of constraints with the same nonzero pattern. If we have two
(in-)equality constraints which are identical — up to a scalar multiplier — then
one of these constraints is removed from the problem. If one of them is an equality
constraint and the other an inequality constraint then the inequality constraint is
dropped. If they are opposite inequalities then they are replaced by one equality
constraint.
2. Linearly dependent constraints are removed. (Dependency can easily be detected
by using elimination.)
3. Duplicate columns are removed.
4. To improve the sparsity pattern of the constraint matrix A further we first put
the constraints into equality form. Then by adding and subtracting constraints
with appropriate multipliers, we can eliminate several nonzero entries.5 During
this process we have to make sure that the resulting sparser system is equivalent
to the original one. Mathematically this means that we look for a nonsingular
matrix Q ∈ IRm×m such that the matrix QA is as sparse as possible. Such sparser
constraints in the resulting equivalent formulation
QAx = Qb
might be much more suitable for a direct application of the interior-point solver.6
4 Preprocessing is not a new idea, but has enjoyed much attention since the introduction of interior-point methods. This is due to the fact that the realized speedup is often larger than for the Simplex Method. For further reading we refer the reader to, e.g., Brearley, Mitra and Williams [49], Adler et al. [1], Lustig, Marsten and Shanno [191], Andersen and Andersen [9], Gondzio [113], Andersen [8], Bixby [42], Lustig, Marsten and Shanno [193] and Andersen et al. [10].
5 As an illustration let us consider two constraints a_k^T x = bk and a_j^T x = bj where σ(a_k) ⊆ σ(a_j). (Recall that σ(x) = { i : xi ≠ 0 }.) Now if we define ā_j = a_j + λ a_k and b̄_j = bj + λ bk, where λ is chosen so that σ(ā_j) ⊂ σ(a_j), then the constraint a_j^T x = bj can be replaced by ā_j^T x = b̄_j while the number of nonzero coefficients is reduced by at least one.
6 Exact solution of this Sparsity Problem is an NP-complete problem (Chang and McCormick [54]) but efficient heuristics (Adler et al. [1], Chang and McCormick [54] and Gondzio [113]) usually produce satisfactory nonzero reductions in A. The algorithm of Gondzio [113], for example, looks for a row of A that has a sparsity pattern that is a subset of the sparsity pattern of other rows and uses it to eliminate nonzero elements from these rows.
20.3.2 Reducing the size of the problem
In general, finding all redundancies in an LO problem is a more difficult problem than
solving the problem; hence, preprocessing procedures use a great variety of simple
inspection techniques to detect obvious redundancies. These techniques are very cheap
and fast, and are applied repeatedly until the problem cannot be reduced by these
techniques any more. Here we discuss a small collection of commonly used reduction
procedures.
1. Empty rows and columns are removed.
2. A fixed variable (xj = uj ) can be substituted out of the problem.
3. A row with a single variable defines a simple bound; after an appropriate bound
update the row can be removed.
4. We call variable xj a free column singleton if it contains a single nonzero coefficient
and there are neither lower nor upper bounds imposed on it. In this case the variable
xj can be substituted out of the problem. As a result both the variable xj and the
constraint in which it occurs are eliminated. The same holds for so-called implied
free variables, i.e., for variables for which implied bounds (discussed later on) are
at least as tight as the original bounds.
5. All the free variables can be eliminated by making them a free singleton column by
eliminating all but one coefficient in their columns. Here we recall the techniques
that were discussed in Theorem D.1 in which the LO problem was reduced to
canonical form. In the elimination steps we have to pay special attention to the
sparsity, by choosing elements in the elimination steps that reduce the number of
nonzero coordinates in A or, at least, produce the smallest amount of new nonzero
elements.
6. Trivial lower and upper bounds for each constraint i are determined. If

   b̲i = Σ_{j: aij < 0} aij buj   and   b̄i = Σ_{j: aij > 0} aij buj,     (20.10)

   then clearly

   b̲i ≤ Σ_j aij xj ≤ b̄i.     (20.11)

   Observe that due to the nonnegativity of x, for the bounds we have b̲i ≤ 0 ≤ b̄i.
   If the inequalities (20.11) are at least as tight as the original constraint, then
   constraint i is redundant. If one of them contradicts the original i-th constraint, then
   the problem is infeasible. In some special cases (e.g. a ‘less than or equal to’ row with
   bi = b̲i, a ‘greater than or equal to’ row with bi = b̄i, or an equality row for which bi
   is equal to one of the limits b̲i or b̄i) the constraint in the optimization problem
   becomes a forcing one. This means that the only way to satisfy the constraint is
   to fix all variables that appear in it at their appropriate bounds. Then all of these
   variables can be substituted out of the problem.
7. From the constraint limits (20.10), implied variable bounds can be derived
   (remember, we have 0 ≤ x ≤ bu). Assume that for an inequality constraint
   Σ_j aij xj ≤ bi the bounds (20.11) are derived. Then for each k such that aik > 0 we have

   b̲i + aik xk ≤ Σ_j aij xj ≤ bi,

   and for each k such that aik < 0 we have

   b̲i + aik (xk − uk) ≤ Σ_j aij xj ≤ bi.

   Now the new implied bounds from row i are easily derived as

   xk ≤ u′k = (bi − b̲i)/aik              for all k : aik > 0,
   xk ≥ l′k = uk + (bi − b̲i)/aik         for all k : aik < 0.

   If these bounds are tighter than the original ones, then the variable bounds are
   improved. (A small sketch of these bound computations is given after this list.)
8. Apply the same techniques to the dual problem.
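The sketch announced in item 7 follows. It treats every row as a '≤' constraint with right-hand side bi and upper bounds xu on the variables; the names are ad hoc, and equality and '≥' rows, as well as repeated presolve passes, are left out.

```python
import numpy as np

def row_bounds_and_implied_bounds(A, b, xu):
    """Trivial row bounds (20.10)-(20.11) and implied variable bounds from
    item 7, treating every row as sum_j a_ij x_j <= b_i with 0 <= x <= xu."""
    A, b, xu = (np.asarray(v, dtype=float) for v in (A, b, xu))
    b_low = np.minimum(A, 0.0) @ xu                 # lower limit of the row
    b_high = np.maximum(A, 0.0) @ xu                # upper limit of the row
    redundant = b_high <= b                         # (20.11) at least as tight as the row
    infeasible = b_low > b                          # the row can never be satisfied
    forcing = np.isclose(b_low, b)                  # all variables forced to a bound
    new_upper, new_lower = xu.copy(), np.zeros_like(xu)
    for i in range(A.shape[0]):                     # implied bounds from each row i
        slack = b[i] - b_low[i]
        pos, neg = A[i] > 0, A[i] < 0
        new_upper[pos] = np.minimum(new_upper[pos], slack / A[i, pos])
        new_lower[neg] = np.maximum(new_lower[neg], xu[neg] + slack / A[i, neg])
    return b_low, b_high, redundant, infeasible, forcing, new_lower, new_upper
```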
The application of all presolve techniques described so far often results in impressive
reductions of the initial problem formulation. Once the solution for the reduced
problem is found, we have to recover the complete primal and dual solutions for the
original problem. This phase is called postprocessing.
20.4 Sparse linear algebra
As became clear in Section 20.2, the computationally most intensive part of an interior-point algorithm is to solve either the augmented system (20.6):
\[
\begin{pmatrix} D^2 & A \\ A^T & -\bar D^{-2} \end{pmatrix}
\begin{pmatrix} \Delta y \\ \Delta x \end{pmatrix}
=
\begin{pmatrix} r \\ h \end{pmatrix},
\qquad (20.12)
\]
or the normal equation (20.7):
\[
(A \bar D^2 A^T + D^2)\, \Delta y = q,
\qquad (20.13)
\]
where q = r + A D̄^2 h. At each iteration one of the systems (20.12) or (20.13) has to
be solved. In the subsequent iterations only the diagonal scaling matrices D and D̄
and the right-hand sides are changing. The nonzero structure of the augmented and
normal matrices remains the same in all the iterations. For an efficient implementation
it is absolutely necessary to design numerical routines that make use of this constant
sparsity structure.
20.4.1 Solving the augmented system
To solve the augmented system (20.12) a well-established technique, the Bunch–
Parlett factorization7 may be used. Observe that the coefficient matrix of (20.12)
is nonsingular, symmetric and indefinite. The Bunch–Parlett factorization of the
symmetric indefinite matrix has the form
\[
P \begin{pmatrix} D^2 & A \\ A^T & -\bar D^{-2} \end{pmatrix} P^T = L \Lambda L^T,
\qquad (20.14)
\]
7
For the original description of the algorithm we refer to Bunch and Parlett [53]. For further
application to solving least squares problems we refer the reader to Arioli, Duff and de Rijk [27],
Björk [44] and Duff [68].
for some permutation matrix P , where Λ is an indefinite block diagonal matrix with
1 × 1 and 2 × 2 blocks and L is a lower triangular matrix. The factorization is basically
an elimination (Gaussian) algorithm, in which we have to specify at each stage which
row and which column is used for the purpose of elimination.
In the Bunch–Parlett factorization, to produce a sparse and numerically stable L
and Λ at each iteration the system is dynamically analyzed. Thus it may well happen
that at each iteration structurally different factors are generated. This means that
in the choice of the element that is used for the elimination, both the sparsity and
stability of the triangular factor are considered. Within these stability and sparsity
considerations we have a great deal of freedom in this selection; we are not restricted
to the diagonal elements (one possible trivial choice) of the coefficient matrix. The
efficiency depends strongly on the heuristics used in the selection strategy.
The relatively expensive so-called analyze phase is frequently skipped and the same
structure is reused in subsequent iterations and updated only occasionally when the
numerical properties make it necessary. A popular selection rule is detecting ‘dense’
columns and rows (with many nonzero coefficients) and eliminating first in the
diagonal positions of D2 and D̄−2 in the augmented matrix (20.12) corresponding
to sparse rows and columns. The dense structure is pushed to the last stage of
factorization as a dense window. In general it is unclear what threshold density should
be used to separate dense and sparse structures. When the number of nonzeros in dense
columns is significantly larger than the average number of entries in sparse columns
then it is easy to determine a fixed threshold value. Whenever more complicated
sparsity structures appear, more sophisticated heuristics are needed.8
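As a small, dense illustration of this approach, the sketch below factorizes the augmented matrix with scipy.linalg.ldl, which applies a Bunch-Kaufman style symmetric pivoting closely related to Bunch-Parlett; no sparsity analysis or dense-window handling is attempted, and the triangular structure of the factor is not exploited in the solves.

```python
import numpy as np
from scipy.linalg import ldl

def solve_augmented(A, D2, Dbar_inv2, r, h):
    """Solve the augmented system (20.12) with a dense symmetric-indefinite
    LDL^T factorization; D2 and Dbar_inv2 are the diagonals of D^2 and Dbar^{-2}."""
    m = A.shape[0]
    K = np.block([[np.diag(D2), A],
                  [A.T, -np.diag(Dbar_inv2)]])
    rhs = np.concatenate([r, h])
    lu, d, perm = ldl(K)                       # K = lu @ d @ lu.T
    sol = np.linalg.solve(lu.T, np.linalg.solve(d, np.linalg.solve(lu, rhs)))
    return sol[:m], sol[m:]                    # (Delta y, Delta x)
```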
20.4.2 Solving the normal equation
The other popular method for calculating the search direction is to solve the normal
equation (20.13). The method of choice in this case is the sparse Cholesky factorization:
P̄ (A D̄^2 A^T + D^2) P̄^T = L̄ Λ L̄^T,     (20.15)
(20.15)
for some permutation matrix P̄ , where L̄ is a lower triangular matrix and Λ is a positive
definite diagonal matrix. It should be clear from the derivation of the normal equation
that the normal equation approach can be considered as a special implementation of
the augmented system approach. More concretely this means that we first eliminate
either ∆x or ∆y by using all the diagonal entries of either D̄−2 or D2 . Thus the normal
equation approach is less flexible but, on the other hand, the coefficient matrix to be
factorized is symmetric positive definite, and both the matrix and its factors have a
constant sparsity structure.
The Cholesky factorization of (20.15) exists for any positive D2 and D̄2 . The sparsity
structure of L̄ is independent of these diagonal matrices and hence is constant in all
iterations if the same elimination steps are performed. Consequently it is sufficient
to analyze the structure just once and determine a good ordering of the rows and
8
To discuss these heuristics is beyond the scope of this chapter. The reader can find detailed
discussion of the advantages and disadvantages of the normal equation approach in the next
section and in the papers Andersen et al. [10], Duff et al. [69], Fourer and Mehrotra [78], Gondzio
and Terlaky [116], Maros and Mészáros [195], Turner [275], Vanderbei [277] and Vanderbei and
Carpenter [278].
columns in order to obtain sparse factors. To determine such an ordering involves
considerable computational effort, but it is the basis of a successful implementation
of the Cholesky factorization in interior-point methods. This is the analyze phase.
More formally, we have to find a permutation matrix P such that the Cholesky
factor of P (AD̄2 AT + D2 )P T is the sparsest possible. Due to the difficulty of this
problem, heuristics are used in practice to find such a good permutation.9 Two efficient
heuristics, namely the minimum degree and the minimum local fill-in orderings, are
particularly useful in interior-point method implementations. These heuristics are
described briefly below.
Minimum degree ordering
Since the matrix to be factorized is positive definite and symmetric the elimination
can be restricted to the diagonal elements. This limitation preserves the symmetry
and positive definiteness of the Schur complement. Let us assume that in the k-th
step of the Gaussian elimination the i-th row of the Schur complement contains ni
nonzero entries. If this row is used for the elimination, then the elimination requires
f_i = ½ (n_i − 1)^2     (20.16)
floating point operations (flops). The number fi estimates the computational effort
and gives an overestimate of the fill-in that can result from the elimination. The best
choice of row i at step k is the one that minimizes fi .10
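A toy version of this selection rule is sketched below; it merely evaluates (20.16) for the rows of the current Schur complement, whereas a real ordering code works on the elimination graph and updates the degrees after every elimination step.

```python
import numpy as np
import scipy.sparse as sp

def minimum_degree_pivot(S):
    """Pick the next pivot row of a symmetric Schur complement S by the merit
    function f_i = (n_i - 1)^2 / 2 of (20.16)."""
    S = sp.csr_matrix(S)
    n_i = np.diff(S.indptr)            # stored nonzeros per row
    f = 0.5 * (n_i - 1) ** 2
    return int(np.argmin(f))
```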
Minimum local fill-in ordering
Let us observe that, in general, fi in (20.16) considerably overestimates the number
of fill-ins at a given iteration of the elimination process because it does not take into
account the fact that in many positions of the predicted fill-in, nonzero entries already
exist. It is possible that another candidate that seems to be worse in terms of (20.16)
would produce less fill-in because in the elimination, mainly existing nonzero entries
would be updated. The minimum local fill-in ordering takes locally the real fill-in into
account. As a consequence, each step is more expensive but the resulting factors are
sparser. This higher cost has to be paid once in the analyze phase.
Disadvantages of the normal equations approach
The normal equations approach shows uniformly good performance when applied to
the solution of the majority of linear programs. Unfortunately, it suffers from a serious
drawback. The presence of dense columns in A might be catastrophic if they are not
treated with extra care. A dense column of A with k nonzero elements creates a k × k
dense submatrix (dense window) of the normal matrix (20.13). Such dense columns
do not seriously influence the efficiency of the augmented system approach.
9 Yannakakis [302] proved that finding the optimal permutation is an NP-complete problem.
10 The function fi is Markowitz’s merit function [194]. Interpreting this process in terms of the
elimination graph (cf. George and Liu [94]), we can see that it is equivalent to the choice of the
node in the graph that has the minimum degree (this gave the name to this heuristic).
In order to handle dense columns efficiently the first step is to identify them. This
typically means to choose a threshold value. If the number of nonzeros in a column
is larger than this threshold, the column is considered to be dense, the remaining
columns as sparse. Denoting the matrix of the sparse columns in A by As and the
matrix of the dense columns by Ad , the equation (20.12) can be written as follows.
$$
\begin{pmatrix}
D^2 & A_d & A_s \\
A_d^T & -\bar D_d^{-2} & 0 \\
A_s^T & 0 & -\bar D_s^{-2}
\end{pmatrix}
\begin{pmatrix}
\Delta y \\ \Delta x_d \\ \Delta x_s
\end{pmatrix}
=
\begin{pmatrix}
r \\ h_d \\ h_s
\end{pmatrix}.
\qquad (20.17)
$$
After eliminating $\Delta x_s = -\bar D_s^{-2}(h_s - A_s^T\Delta y)$ we get the equation
$$
\begin{pmatrix}
D^2 + A_s\bar D_s^{-2}A_s^T & A_d \\
A_d^T & -\bar D_d^{-2}
\end{pmatrix}
\begin{pmatrix}
\Delta y \\ \Delta x_d
\end{pmatrix}
=
\begin{pmatrix}
r + A_s\bar D_s^{-2}h_s \\ h_d
\end{pmatrix}.
\qquad (20.18)
$$
Here the left-upper block of the coefficient matrix is positive definite symmetric and
sparse, thus it is easy to factorize efficiently. As the reader can easily see, this approach
tries to combine the advantages of the normal equation approach and the augmented
system approach.11,12
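A minimal dense Python sketch of this splitting may clarify the computation; it assumes that the diagonals ws and wd of the matrices written as $\bar D_s^{-2}$ and $\bar D_d^{-2}$ in (20.18) are given, factorizes only the sparse part by Cholesky, and removes the k dense columns by a small Schur complement (the dense window).

```python
import numpy as np

def solve_dense_split(As, Ad, D2, ws, wd, r, hs, hd):
    """Sketch of solving (20.18): H collects the sparse columns and gets the
    (in practice sparse) Cholesky factorization, while the k dense columns
    A_d are handled through a k x k Schur complement."""
    H = np.diag(D2) + (As * ws) @ As.T          # D^2 + A_s diag(ws) A_s^T
    L = np.linalg.cholesky(H)
    solveH = lambda b: np.linalg.solve(L.T, np.linalg.solve(L, b))

    r1 = r + As @ (ws * hs)                     # right-hand side of (20.18)
    HinvAd, Hinvr1 = solveH(Ad), solveH(r1)
    S = -np.diag(wd) - Ad.T @ HinvAd            # k x k dense window
    dxd = np.linalg.solve(S, hd - Ad.T @ Hinvr1)
    dy = Hinvr1 - HinvAd @ dxd
    return dy, dxd
```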
20.4.3 Second-order methods
An attempt to reduce the computational cost of interior-point methods is based on
trying to reuse the same factorization of either the normal matrix or the augmented
system. Both in theory and in practice, factorization is much more expensive than
backsolve of triangular systems; so we can do additional backsolves in each iteration
with different right-hand sides if these reduce the total number of interior-point
iterations. This is the essential idea of higher-order methods. Our discussion here
follows the present computational practice; so we consider only the second-order
11 An appealing advantage of the symmetric formulation of the LO problem is that in (20.18) the matrix $D^2 + A_s\bar D_s^{-2}A_s^T$ is nonsingular. If one used the standard $Ax = b$, $x \ge 0$ form, then we would have just $A_s\bar D_s^{-2}A_s^T$, which might be singular. To handle this unpleasant situation an extra trick is needed. For this we refer the reader to Andersen et al. [13] and also to Exercise 105.
12 Exercise 105 Verify that $(\Delta y, \Delta x_d)$ is the solution of
$$\begin{pmatrix} A_s\bar D_s^{-2}A_s^T & A_d \\ A_d^T & -\bar D_d^{-2} \end{pmatrix}\begin{pmatrix} \Delta y \\ \Delta x_d \end{pmatrix} = \begin{pmatrix} r + A_s\bar D_s^{-2}h_s \\ h_d \end{pmatrix}$$
if and only if $(\Delta y, \Delta x_d, u)$ solves
$$\begin{pmatrix} A_s\bar D_s^{-2}A_s^T + QQ^T & A_d & Q \\ A_d^T & -\bar D_d^{-2} & 0 \\ Q^T & 0 & I \end{pmatrix}\begin{pmatrix} \Delta y \\ \Delta x_d \\ u \end{pmatrix} = \begin{pmatrix} r + A_s\bar D_s^{-2}h_s \\ h_d \\ 0 \end{pmatrix}$$
with any matrix Q having appropriate dimension. Observe that by choosing Q properly (e.g. diagonal) the matrix $A_s\bar D_s^{-2}A_s^T + QQ^T$ is nonsingular.
predictor-corrector method that is implemented in several codes with great success.13
Predictor-corrector technique
This predictor-corrector method has two components. The first is an adaptive choice
of the barrier parameter µ; the other is a second-order approximation of the central
path .
The first step in the predictor-corrector algorithm is to compute the primal-dual
affine-scaling (predictor) direction. This is the solution of the Newton system (20.4)
with µ = 0 and is indicated by ∆a . It is easy to see that if a step of size α is taken
in the affine-scaling direction, then the duality gap is reduced by α; i.e. if a large step
can be made in this direction then significant progress is made in the optimization. If
the feasible step-size in the affine-scaling direction is small, we expect that the current
point is close to the boundary; thus centering is needed and µ should not be reduced
too much.
In the predictor-corrector algorithm, first the predicted duality gap is calculated
that results from a step along the primal-dual affine-scaling direction. To this end,
when the affine-scaling direction is computed, the maximum primal (αaP ) and dual
(αaD ) feasible step sizes are determined that preserve nonnegativity of (x, zu , z) and
(s, yu , y). Then the predicted duality gap
$$g_a = (x + \alpha_a^P\Delta^a x)^T(s + \alpha_a^D\Delta^a s) + (z_u + \alpha_a^P\Delta^a z_u)^T(y_u + \alpha_a^D\Delta^a y_u) + (z + \alpha_a^P\Delta^a z)^T(y + \alpha_a^D\Delta^a y)$$
is computed and is used to determine a target point
$$\mu = \left(\frac{g_a}{g}\right)^2\frac{g_a}{n} \qquad (20.19)$$
on the central path . Here ga /n relates to the central point with the same duality gap
that the predictor affine step would produce, and the factor (ga /g)2 pushes the target
further towards optimality in a way that depends on the achieved reduction of the
predictor step. Now the second-order component of the predictor-corrector direction
is computed. Ideally we would like to compute a step such that the next iterate is
perfectly centered, i.e.,
$$(x+\Delta x)(s+\Delta s) = \mu e,\qquad (z_u+\Delta z_u)(y_u+\Delta y_u) = \mu e,\qquad (z+\Delta z)(y+\Delta y) = \mu e,$$
13 The second-order predictor-corrector technique presented here is due to Mehrotra [205]; from a
computational point of view the method is very successful. The higher than order 2 methods —
discussed in Chapter 18 — are implementable too, but to date computational results with methods
of order higher than 2 are quite disappointing. See Andersen et al. [10]. Mehrotra was motivated
by the paper of Monteiro, Adler and Resende [220], who were the first to introduce the primal-dual
affine-scaling direction and higher-order versions of the primal-dual affine-scaling direction; they
elaborated on a computational paper of Adler, Karmarkar, Resende and Veiga [2] that uses the
dual affine-scaling direction and higher-order versions of it.
or equivalently
$$x\Delta s + s\Delta x = -xs + \mu e - \Delta x\Delta s,\qquad z_u\Delta y_u + y_u\Delta z_u = -z_uy_u + \mu e - \Delta z_u\Delta y_u,\qquad z\Delta y + y\Delta z = -zy + \mu e - \Delta z\Delta y.$$
Usually, in the computation of the Newton direction the second-order terms
$$\Delta x\Delta s,\quad \Delta z_u\Delta y_u,\quad \Delta z\Delta y$$
are neglected (recall (20.4)). Instead of neglecting the second-order terms, the products of the affine directions
$$\Delta^a x\,\Delta^a s,\quad \Delta^a z_u\,\Delta^a y_u,\quad \Delta^a z\,\Delta^a y$$
are used as the predictions of the second-order effect. One step of the algorithm can
be summarized as follows.
• Solve (20.4) with µ = 0, resulting in the affine step (∆a x, ∆a zu , ∆a z) and
(∆a s, ∆a yu , ∆a y).
• Calculate the maximal feasible step lengths αaP and αaD .
• Calculate the predicted duality gap ga and µ by (20.19).
• Solve the corrected Newton system
$$
\begin{aligned}
A\Delta x - \Delta z &= 0 \\
\Delta x + \Delta z_u &= 0 \\
A^T\Delta y - \Delta y_u + \Delta s &= 0 \\
x\Delta s + s\Delta x &= -xs + \mu e - \Delta^a x\,\Delta^a s, \\
z_u\Delta y_u + y_u\Delta z_u &= -z_uy_u + \mu e - \Delta^a z_u\,\Delta^a y_u, \\
z\Delta y + y\Delta z &= -zy + \mu e - \Delta^a z\,\Delta^a y.
\end{aligned}
\qquad (20.20)
$$
• Calculate the maximal feasible step lengths αP and αD and make a damped step
by using (20.9).14
Finally, observe that a single iteration of this second-order predictor-corrector primal-dual method needs two solves of the same large, sparse linear system (20.4) and (20.20)
for two different right-hand sides. Thus the same factorization can be used twice.
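The following Python sketch outlines one such iteration for the pair (x, s) only; the blocks $(z_u, y_u)$ and $(z, y)$ are treated in exactly the same way. The callback solve_kkt is an assumption introduced here (it is not from the text) and stands for one backsolve with the already computed factorization of the linear system; the same factorization serves both the predictor and the corrector solve.

```python
import numpy as np

def max_step(v, dv, cap):
    """Largest step alpha <= cap keeping v + alpha*dv nonnegative."""
    neg = dv < 0
    if not np.any(neg):
        return cap
    return min(cap, np.min(-v[neg] / dv[neg]))

def predictor_corrector_step(solve_kkt, x, s, n, alpha0=0.99995):
    """One second-order predictor-corrector iteration in the spirit of
    Section 20.4.3, reusing one factorization for two right-hand sides."""
    # predictor: affine-scaling direction (mu = 0)
    dx_a, ds_a = solve_kkt(-x * s)
    aP, aD = max_step(x, dx_a, 1.0), max_step(s, ds_a, 1.0)
    g = x @ s
    ga = (x + aP * dx_a) @ (s + aD * ds_a)      # predicted duality gap
    mu = (ga / g) ** 2 * (ga / n)               # target value, cf. (20.19)
    # corrector: same factorization, second-order right-hand side as in (20.20)
    dx, ds = solve_kkt(-x * s + mu - dx_a * ds_a)
    aP, aD = max_step(x, dx, alpha0), max_step(s, ds, alpha0)
    return x + aP * dx, s + aD * ds
```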
20.5 Starting point
The self-dual embedding problem is an elegant theoretical construction for handling
the starting point problem. At the same time it can also be the basis of an efficient
implementation. In this section we show that solving the slightly larger embedding
14 This presentation of the algorithm follows the paper of Mehrotra [205]. It differs from the second-order method of Chapter 18.
problem does not increase the computational cost significantly.15 Before presenting
the embedding problem, we summarize some of its surprisingly nice properties.
1. The embedding problem is self-dual: the dual problem is identical to the primal
one.
2. It is always feasible. Furthermore, the interior of the feasible set of the embedding
problem is also nonempty; hence the optimal faces are bounded (from Theorem
II.10). So interior-point methods always converge to an optimal solution.
3. Optimality of the original problem is detected by convergence, independently of
the boundedness or unboundedness of the optimal faces of the original problem.
4. Infeasibility of the original problem is detected by convergence as well.16 Primal,
dual or primal and dual rays for the original problems are identified to prove dual,
primal or dual and primal infeasibility (cf. Theorem I.26).
5. For the embedding problem a perfectly centered initial pair can always be
constructed.
6. It allows an elegant handling of the warm start problem.
7. The embedding problem can be solved with any method that generates a strictly
complementary solution; if the chosen method is polynomial, it solves the original
problem with essentially the same complexity bound. Thus we can achieve the best
possible complexity bounds for solving an arbitrary problem.
Self-dual embedding
We consider problems (20.1) and (20.2). To formulate the embedding problem we
need to introduce some further vectors in a way similar to that of Chapter 2. We start
with
x0 > 0, zu0 > 0, z 0 > 0, s0 > 0, yu0 > 0, y 0 > 0, κ0 > 0, ϑ0 > 0, ρ0 > 0, ν 0 > 0,
where x0 , zu0 , s0 , yu0 ∈ IRn , y 0 , z 0 ∈ IRm and κ0 , ϑ0 , ρ0 , ν 0 ∈ IR are arbitrary. Then we
define b̄ ∈ IRm , b̄u , c̄ ∈ IRn , the scaled error at the arbitrary initial interior solutions
(recall the construction in Section 4.3), and parameters β, γ ∈ IR as follows:
$$
\begin{aligned}
\bar b_u &= \frac{1}{\vartheta^0}\,(b_u\kappa^0 - x^0 - z_u^0) \\
\bar b &= \frac{1}{\vartheta^0}\,(b\kappa^0 - Ax^0 + z^0) \\
\bar c &= \frac{1}{\vartheta^0}\,(c\kappa^0 + y_u^0 - A^Ty^0 - s^0) \\
\beta &= \frac{1}{\vartheta^0}\,(c^Tx^0 - b^Ty^0 + b_u^Ty_u^0 + \nu^0)
\end{aligned}
$$
15 Such embedding was first introduced by Ye, Todd and Mizuno [316] using the standard form problems (20.29) and (20.30). They discussed most of the advantages of this embedding and showed that Mizuno, Todd and Ye’s [217] predictor-corrector algorithms solve the LO problem in $O(\sqrt{n}\,L)$
iterations, yielding the first infeasible IPM with this complexity. Somewhat later Jansen, Roos and
Terlaky [155] presented the self-dual problem for the symmetric form primal-dual pair in a concise
introduction to the theory of LO based on IPMs.
16 The popular so-called infeasible-start methods detect unboundedness or infeasibility of the original
problem by divergence of the iterates.
$$
\begin{aligned}
\gamma &= \beta\kappa^0 + \bar b^Ty^0 - \bar b_u^Ty_u^0 - \bar c^Tx^0 + \rho^0 \\
&= \frac{1}{\vartheta^0}\left[(x^0)^Ts^0 + (y_u^0)^Tz_u^0 + (y^0)^Tz^0 + \kappa^0\rho^0\right] + \rho^0 > 0.
\end{aligned}
$$
It is worth noting that if x0 is strictly feasible for (20.1), κ0 = 1, z 0 = Ax0 − b and
zu0 = bu − x0 , then b̄ = 0 and b̄u = 0. Also if (y 0 , yu0 ) is strictly feasible for (20.2),
κ0 = 1 and s0 = c − AT y 0 + yu0 , then c̄ = 0. In some sense the vectors b̄, b̄u and c̄
measure the amount of scaled infeasibility of the given vectors x0 , z 0 , zu0 , s0 , y 0 , yu0 .
Now consider the following self-dual LO problem:
$$
\begin{array}{rl}
(SP)\qquad \min & \gamma\vartheta \\
\mbox{s.t.} & -x + b_u\kappa - \bar b_u\vartheta \ \ge\ 0 \\
 & Ax - b\kappa + \bar b\vartheta \ \ge\ 0 \\
 & y_u - A^Ty + c\kappa - \bar c\vartheta \ \ge\ 0 \\
 & -b_u^Ty_u + b^Ty - c^Tx + \beta\vartheta \ \ge\ 0 \\
 & \bar b_u^Ty_u - \bar b^Ty + \bar c^Tx - \beta\kappa \ \ge\ -\gamma \\
 & y_u \ge 0,\ y \ge 0,\ x \ge 0,\ \kappa \ge 0,\ \vartheta \ge 0.
\end{array}
$$
Let us denote the slack variables for the problem (SP ) by zu , z, s, ν and ρ
respectively. By construction the positive solution x = x0 , z = z 0 , zu = zu0 , s =
s0 , y = y 0 , yu = yu0 , κ = κ0 , ϑ = ϑ0 , ν = ν 0 , ρ = ρ0 is interior feasible for problem
(SP). Also note that if, e.g., we choose x = x0 = e, z = z 0 = e, zu = zu0 = e, s =
s0 = e, y = y 0 = e, yu = yu0 = e, κ = κ0 = 1, ϑ = ϑ0 = 1, ν = ν 0 = 1, ρ = ρ0 = 1,
then this solution with µ = 1 is a perfectly centered initial solution for problem (SP).
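The definitions above translate directly into a few lines of Python; the sketch below (an illustration, not code from the text) computes $\bar b_u$, $\bar b$, $\bar c$, $\beta$ and $\gamma$ from an arbitrary positive starting point, and with the all-one choice the corresponding point is perfectly centered with µ = 1.

```python
import numpy as np

def embedding_data(A, b, bu, c, x0, zu0, z0, s0, yu0, y0,
                   kappa0=1.0, theta0=1.0, rho0=1.0, nu0=1.0):
    """Scaled residuals and scalars of the self-dual embedding (SP); all the
    arguments carry the superscript 0 in the text."""
    bbar_u = (bu * kappa0 - x0 - zu0) / theta0
    bbar   = (b * kappa0 - A @ x0 + z0) / theta0
    cbar   = (c * kappa0 + yu0 - A.T @ y0 - s0) / theta0
    beta   = (c @ x0 - b @ y0 + bu @ yu0 + nu0) / theta0
    gamma  = (x0 @ s0 + yu0 @ zu0 + y0 @ z0 + kappa0 * rho0) / theta0 + rho0
    return bbar_u, bbar, cbar, beta, gamma

# all-one data: the resulting point is a perfectly centered starting point (mu = 1)
m, n = 3, 5
A = np.arange(1.0, 16.0).reshape(m, n)
e_n, e_m = np.ones(n), np.ones(m)
print(embedding_data(A, e_m, e_n, e_n, e_n, e_n, e_m, e_n, e_n, e_m))
```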
The following theorem follows easily in the same way as Theorem I.20.17
Theorem IV.85 The embedding (SP ) of the given problems (20.1) and (20.2) has
the following properties:
(i) The self-dual problem (SP ) is feasible and hence both primal and dual feasible.
Thus it has an optimal solution.
(ii) For any optimal solution of (SP), ϑ∗ = 0.
(iii) (SP ) always has a strictly complementary optimal solution (yu∗ , y ∗ , x∗ , κ∗ , ϑ∗ ).
(iv) If κ∗ > 0, then x∗ /κ∗ and (y ∗ , yu∗ )/κ∗ are strictly complementary optimal
solutions of (20.1) and (20.2) respectively.
(v) If κ∗ = 0, then either (20.1) or (20.2) or both are infeasible.
Solving the embedding model needs just slightly more computation per iteration
than solving problem (20.1). This small extra effort is the cost of having several
important advantages: having a centered initial starting point, detecting infeasibility
by convergence, applicability of any IPM without degrading theoretical complexity.
The rest of this section is devoted to showing that the computation of the Newton
direction for the embedding problem (SP ) reduces to almost the same sized augmented
(20.6) or normal equation (20.7) systems as in the case of (20.1).
17 Exercise 106 Prove this theorem.
In Chapter 3 the self-dual problem
$$(SP)\qquad \min\left\{\tilde q^T\tilde x \ :\ M\tilde x \ge -\tilde q,\ \tilde x \ge 0\right\},$$
was solved, where M is of size n × n and skew-symmetric and q̃ ∈ IRn+ . Given an initial
positive solution (x̃, s̃) > 0, where s̃ = M x̃ + q̃, a Newton step for problem (SP ) with
a value µ > 0 was given as
$$\widetilde{\Delta s} = M\,\widetilde{\Delta x},$$
where $\widetilde{\Delta x}$ is the solution of the system
$$\left(M + \tilde X^{-1}\tilde S\right)\widetilde{\Delta x} = \mu\tilde x^{-1} - \tilde s. \qquad (20.21)$$
Now we have to analyze how the positive definite system (20.21) can be efficiently
solved in the case of problem (SP). For this problem we have x̃ = (yu , y, x, κ, ϑ),
s̃ = (zu , z, s, ν, ρ) and
$$
M = \begin{pmatrix}
0 & 0 & -I & b_u & -\bar b_u \\
0 & 0 & A & -b & \bar b \\
I & -A^T & 0 & c & -\bar c \\
-b_u^T & b^T & -c^T & 0 & \beta \\
\bar b_u^T & -\bar b^T & \bar c^T & -\beta & 0
\end{pmatrix}
\qquad\mbox{and}\qquad
\tilde q = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ \gamma \end{pmatrix}.
$$
Hence the Newton equation (20.21) can be written as
$$
\begin{pmatrix}
Y_u^{-1}Z_u & 0 & -I & b_u & -\bar b_u \\
0 & Y^{-1}Z & A & -b & \bar b \\
I & -A^T & X^{-1}S & c & -\bar c \\
-b_u^T & b^T & -c^T & \frac{\nu}{\kappa} & \beta \\
\bar b_u^T & -\bar b^T & \bar c^T & -\beta & \frac{\rho}{\vartheta}
\end{pmatrix}
\begin{pmatrix}
\Delta y_u \\ \Delta y \\ \Delta x \\ \Delta\kappa \\ \Delta\vartheta
\end{pmatrix}
=
\begin{pmatrix}
\mu y_u^{-1} - z_u \\ \mu y^{-1} - z \\ \mu x^{-1} - s \\ \mu\frac{1}{\kappa} - \nu \\ \mu\frac{1}{\vartheta} - \rho
\end{pmatrix}.
\qquad (20.22)
$$
From the first and the second equation it easily follows that
$$\Delta y_u = Y_uZ_u^{-1}\left(\Delta x - b_u\Delta\kappa + \bar b_u\Delta\vartheta + \mu y_u^{-1} - z_u\right)$$
and
$$\Delta y = YZ^{-1}\left(-A\Delta x + b\Delta\kappa - \bar b\Delta\vartheta + \mu y^{-1} - z\right).$$
We simplify the notation by introducing
$$W_u := Z_u^{-1}Y_u.$$
Then, by substituting the value of ∆yu in (20.22) we find18
$$
\begin{pmatrix}
Y^{-1}Z & A & -b & \bar b \\
-A^T & X^{-1}S + W_u & c - W_ub_u & -\bar c + W_u\bar b_u \\
b^T & -c^T - b_u^TW_u & \frac{\nu}{\kappa} + b_u^TW_ub_u & \beta - b_u^TW_u\bar b_u \\
-\bar b^T & \bar c^T + \bar b_u^TW_u & -\beta - \bar b_u^TW_ub_u & \frac{\rho}{\vartheta} + \bar b_u^TW_u\bar b_u
\end{pmatrix}
\begin{pmatrix}
\Delta y \\ \Delta x \\ \Delta\kappa \\ \Delta\vartheta
\end{pmatrix}
=
\begin{pmatrix}
r_1 \\ r_2 \\ r_3 \\ r_4
\end{pmatrix},
\qquad (20.23)
$$
where for simplicity the right-hand side elements are denoted by r1 , . . . , r4 . Now if we
multiply the second block of equations (corresponding to the right-hand side r2 ) in
(20.23) by −1, a system analogous to the augmented system (20.6) of problem (20.1) is
obtained. The difference is that here we have two additional constraints and variables.
For the solution of this system, the factorization of the matrix may happen in the
same way, but the last two rows and columns (these are typically dense) should be
left to the last two steps of the factorization. A 2 × 2 dense window for (∆κ, ∆ϑ) then
remains.
If we further simplify (20.23) by substituting the value of ∆y, the analogue to the
normal equation system of the problem (SP ) is produced. For simplicity the scalars
here are denoted by η1 , . . . , η8 and r5 , r6 , r7 .19,20
$$
\begin{pmatrix}
A^TZ^{-1}YA + X^{-1}S + Z_u^{-1}Y_u & \eta_1 & \eta_2 \\
\eta_3 & \eta_4 & \eta_5 \\
\eta_6 & \eta_7 & \eta_8
\end{pmatrix}
\begin{pmatrix}
\Delta x \\ \Delta\kappa \\ \Delta\vartheta
\end{pmatrix}
=
\begin{pmatrix}
r_5 \\ r_6 \\ r_7
\end{pmatrix}.
\qquad (20.24)
$$
18 Exercise 107 Verify that
$$\begin{pmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \end{pmatrix} = \begin{pmatrix} \mu y^{-1} - z \\ \mu x^{-1} - s - (\mu z_u^{-1} - y_u) \\ \mu\kappa^{-1} - \nu + b_u^T(\mu z_u^{-1} - y_u) \\ \mu\vartheta^{-1} - \rho - \bar b_u^T(\mu z_u^{-1} - y_u) \end{pmatrix}.$$
19 Exercise 108 Verify that
$$
\begin{aligned}
\eta_1 &= c - W_ub_u - A^TZ^{-1}Yb, & \eta_2 &= -\bar c + W_u\bar b_u + A^TZ^{-1}Y\bar b, \\
\eta_3 &= -c^T - b_u^TW_u - b^TZ^{-1}YA, & \eta_4 &= \nu\kappa^{-1} + b_u^TW_ub_u + b^TZ^{-1}Yb, \\
\eta_5 &= \beta - b_u^TW_u\bar b_u - b^TZ^{-1}Y\bar b, & \eta_6 &= \bar c^T + \bar b_u^TW_u + \bar b^TZ^{-1}YA, \\
\eta_7 &= -\beta - \bar b_u^TW_ub_u - \bar b^TZ^{-1}Yb, & \eta_8 &= \rho\vartheta^{-1} + \bar b_u^TW_u\bar b_u + \bar b^TZ^{-1}Y\bar b
\end{aligned}
$$
and
$$
\begin{aligned}
r_5 &= \mu x^{-1} - s - (\mu z_u^{-1} - y_u) + A^T(\mu z^{-1} - y), \\
r_6 &= \mu\kappa^{-1} - \nu + b_u^T(\mu z_u^{-1} - y_u) - b^T(\mu z^{-1} - y), \\
r_7 &= \mu\vartheta^{-1} - \rho - \bar b_u^T(\mu z_u^{-1} - y_u) + \bar b^T(\mu z^{-1} - y).
\end{aligned}
$$
20 Exercise 109 Develop similar formulas for the normal equation if ∆x is eliminated instead of ∆y. Compare the results with (20.7) and (20.8).
20.5.1 Simplifying the Newton system of the embedding model
As mentioned with respect to the augmented system, we easily verify that the
difference between the normal equations of problem (20.1) and the embedding problem
(SP ) is that here two additional constraints and variables are present. Note that the
last two rows and columns in (20.23) and (20.24) are neither symmetric nor skew-symmetric. The reader might think that these two extra columns deteriorate the
efficiency of the algorithm (it requires two additional back-solves for the computation
of the Newton direction) and hence make the embedding approach less attractive
in practice. However, the computational cost can easily be reduced by a simple
observation. First, note that for any interior solution (yu , y, x, κ, ϑ) the duality gap
(see also Exercise 10 on page 35) is equal to
2γϑ.
Second, remember that in Lemma II.47 we have proved that in a primal-dual method
the target duality gap is always reached after a full Newton step. Since the duality
gap on the central path with the value µ equals
$$2(m + 2n + 2)\mu,$$
and the target duality gap is determined by the target value µ+ = (1 − θ)µ, the step ∆ϑ can be calculated directly:
$$\Delta\vartheta = \vartheta^+ - \vartheta = \frac{\mu^+ - \mu}{\gamma}\,(m + 2n + 2) = -\frac{\theta\mu}{\gamma}\,(m + 2n + 2).$$
As a result we conclude that the value of ∆ϑ in (20.24) is known; it can simply be substituted in the Newton system, and the system (20.23) reduces to almost the original size. This simplification makes it possible to implement IPMs based on the self-dual embedding model efficiently: the cost per iteration is only one extra back-solve.
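A small sketch of this reduction follows: ∆ϑ is computed in closed form and the last column of (20.24) is moved to the right-hand side, after which the last row can be dropped (the dense representation below is only for illustration).

```python
import numpy as np

def delta_vartheta(mu, theta, gamma, m, n):
    # step in the homogenizing variable, known before any solve is performed
    return -theta * mu * (m + 2 * n + 2) / gamma

def reduce_system(M3, rhs3, dvartheta):
    """Substitute the known Delta-vartheta into the block system (20.24):
    its last column moves to the right-hand side and the last row is dropped,
    so a system of essentially the original size remains."""
    M3, rhs3 = np.asarray(M3, float), np.asarray(rhs3, float)
    return M3[:-1, :-1], rhs3[:-1] - dvartheta * M3[:-1, -1]
```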
20.5.2 Notes on warm start
Many practical problems need the solution of a sequence of similar linear programs
where small perturbations are made to b and/or c (possibly also in A). As long as these
perturbations are small, we naturally expect that the optimal solutions are not far from
each other and restarting the optimization from the solution of the old problem (warm
start) should be more efficient than solving the problem from scratch.21
The difficulty in the IPM warm start comes from the fact that the old optimal
solution is very close to the boundary (this is a necessity since all optimal solutions in
an LO problem are on the boundary of the feasible set) and well centered. This point,
in the perturbed problem, still remains close to the boundary or becomes infeasible,
but even if it remains feasible it is very poorly centered. Consequently, the IPM
makes a long sequence of short steps because the iterates cannot get away from the
boundary. Therefore for an efficient warm start we need a well-centered point close to
21 Some early attempts to solve such problems are due to Freund [84] who uses shifted barriers, and
Polyak [234] who applies modified barrier functions. For further elaboration of the literature see,
e.g., Lustig, Marsten and Shanno [193], Gondzio and Terlaky [116] and Andersen et al. [10].
the old optimal one or an efficient centering method (to get far from the boundary)
to overcome these difficulties. These two possibilities are discussed briefly below.
Independent of the approach chosen it would be wise to save a well-centered almost
optimal solution (say, with $10^{-2}$ relative duality gap) that is still sufficiently far away
from the boundary.
• Centered solutions for warm start in (SP ) embedding. Among the spectacular properties of the (SP ) embedding listed in the previous section, the ability to
construct always perfectly centered initial interior points was mentioned. The old
well-centered almost optimal solution x∗ , z ∗ , zu∗ , s∗ , y ∗ , yu∗ , κ∗ , ϑ∗ , ρ∗ , ν ∗ can
be used as the initial point for embedding the perturbed problem. As we have seen
in Section 20.5, b̄, c̄, β and γ can always be redefined so that the above solution
stays well centered. The construction allows simultaneous perturbations of b, bu ,
c and even the matrix A. Additionally, it extends to handling new constraints or
variables added to the problem (e.g., in buildup or cutting plane schemes). In these
cases, we can keep the solution for the old coordinates (let µ be the actual barrier
parameter) and set the initial value of the new complementary variables equal to
√µ. This results in a perfectly centered initial solution.
• Efficient centering. If the old solution remains feasible, but is badly centered, we
might proceed with this solution without making a new embedding. The common
approach is to use a path-following method for the recentering process; it uses
targets on the central path . Because of the weak performance of Newton’s method
far off the central path, this approach is too optimistic for a warm start. The target-following method discussed in Part III (Section 11.4) offers much more flexibility
in choosing achievable targets, thus leading to efficient ways of centering. A target
sequence that improves centrality allows larger steps and therefore speeds up the
centering and, as a consequence, the optimization process.22
20.6 Parameters: step-size, stopping criteria
20.6.1 Target-update
The easiest way to ensure that all iterates remain close to the central path is to
decrease µ by a very small amount at each iteration. This provides the best theoretical
worst-case complexity, as we have seen in discussing full Newton step methods. These
methods demonstrate hopelessly slow convergence in practice and their theoretical
worst-case complexity is identical to their practical performance.
In large-update methods the barrier parameter is reduced much faster than the theory
suggests. To preserve polynomial convergence of these methods in theory, several
Newton steps are computed between two reductions of µ (update of the target) until
the iterate is in a sufficiently small neighborhood of the central path . In practice this
multistep strategy is ignored and at each reduction of µ, at each target-update, only
one Newton step is made. A drawback of this strategy is that the iterates might get
22 Computational results based on centering target sequences are presented in Gondzio [114] and
Andersen et al. [10].
far away from the central path or from the target point, and the efficiency of the
Newton method might deteriorate. A careful strategy for updating µ and for step-length selection reduces the danger of this negative scenario.
At an interior iterate the current duality gap is given by
g = xT s + zuT yu + z T y,
which is equal to (2n + m)µ if the iterate is on the central path . The central point
with the same duality gap as the current iterate belongs to the value
$$\mu = \frac{x^Ts + z_u^Ty_u + z^Ty}{2n + m}.$$
The target µ value is chosen so that the target duality gap is significantly smaller, but
does not put the target too far away. Thus we take
$$\mu_{\rm new} = (1-\theta)\,\frac{x^Ts + z_u^Ty_u + z^Ty}{2n + m}, \qquad (20.25)$$
where θ ∈ [0, 1]. The value θ = 0 corresponds to pure centering, while θ < 1 aims to
reduce the duality gap. A solid but still optimistic update is θ = 3/4.23
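The arithmetic behind this choice (see also the footnote) is easily checked:

```python
# to shrink the duality gap by a factor 10^12 in about 20 iterations one needs
# (1 - theta)^20 <= 10^-12, assuming each target is essentially reached
iters, reduction = 20, 1e12
theta = 1.0 - reduction ** (-1.0 / iters)
print(round(theta, 3))   # approximately 0.75, i.e. theta = 3/4
```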
20.6.2 Step size
Although there is not much supporting theory, current implementations use very large
and different step-sizes in the primal and dual spaces.24 All implementations use a
variant of the following strategy. First the maximum possible step-sizes are computed:
$$\alpha^P := \max\{\alpha > 0 : (x, z, z_u) + \alpha(\Delta x, \Delta z, \Delta z_u) \ge 0\}$$
and
$$\alpha^D := \max\{\alpha > 0 : (s, y, y_u) + \alpha(\Delta s, \Delta y, \Delta y_u) \ge 0\},$$
and these step-sizes are slightly reduced by a factor α0 = 0.99995 to ensure that the
new point is strictly positive. Although this aggressive, i.e. very large, choice of α0
is frequently reported to be the best, we must be careful and include a safeguard to
handle the case when α0 = 0.99995 turns out to be too aggressive.
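A Python sketch of this rule follows, with a crude backtracking safeguard that shrinks the damping factor whenever the damped point comes numerically too close to the boundary; the particular threshold is an assumption added for illustration, not a rule from the text.

```python
import numpy as np

def step_sizes(x, z, zu, dx, dz, dzu, s, y, yu, ds, dy, dyu,
               alpha0=0.99995, shrink=0.5, min_prod=1e-13):
    """Ratio tests for the maximal primal and dual step sizes, damped by alpha0."""
    def max_ratio(vs, dvs):
        r = np.inf
        for v, dv in zip(vs, dvs):
            neg = dv < 0
            if np.any(neg):
                r = min(r, np.min(-v[neg] / dv[neg]))
        return min(r, 1e8)          # finite cap in case no component decreases

    aP = max_ratio((x, z, zu), (dx, dz, dzu))
    aD = max_ratio((s, y, yu), (ds, dy, dyu))
    f = alpha0
    while f > 1e-3:                 # safeguard against an overly aggressive alpha0
        xn, sn = x + f * aP * dx, s + f * aD * ds
        if np.min(xn * sn) > min_prod:
            break
        f *= shrink
    return f * aP, f * aD
```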
20.6.3 Stopping criteria
Interior point algorithms terminate when the duality gap is small enough and the
current solution is feasible for the original problems (20.1) and (20.2), or when the
23 In the published literature, iteration counts larger than 50 almost never occur and most frequently
iteration numbers around 20 are reported. Taking this number as a target iteration count and
assuming that (in contrast to the theoretical worst-case analysis) Newton’s method provides iterates
always close to the target point, we can calculate how large the target-update (how small (1 − θ))
should be to reach the desired accuracy within the required number of iterations. Thus, for a
problem with $10^4$ variables and a centered initial solution with µ = 1 and aiming for a solution with 8 digits of accuracy, we have to reduce the duality gap by a factor of $10^{12}$ in 20 iterations. By straightforward calculation we can easily verify that the value θ = 3/4 is an appropriate choice
for this purpose.
24 Kojima, Megiddo and Mizuno [174] proved global convergence of a primal-dual method that allows
such large step-sizes in most iterations.
infeasibility is small enough. The practical tolerances are larger than the theoretical
bounds that guarantee identification of an exact solution; this is a common drawback
of all numerical algorithms for solving LO problems. To obtain a sensible solution the
duality gap and the measure of infeasibility should be related to the problem data.
Relative primal infeasibility is related to the length of the vectors b and bu , dual
infeasibility is related to the length of the vector c, and the duality gap is related to
the actual objective value. A solution with p digits relative accuracy is guaranteed by
the stopping criteria presented here:
$$\frac{\|Ax - z - b\|}{1 + \|b\|} \le 10^{-p} \quad\mbox{and}\quad \frac{\|x + z_u - b_u\|}{1 + \|b_u\|} \le 10^{-p}, \qquad (20.26)$$
$$\frac{\|c - A^Ty + y_u - s\|}{1 + \|c\|} \le 10^{-p}, \qquad (20.27)$$
$$\frac{|c^Tx - (b^Ty - b_u^Ty_u)|}{1 + |c^Tx|} \le 10^{-p}. \qquad (20.28)$$
An 8-digit solution (p = 8) is typically required in the literature. Let us observe
that conditions (20.26–20.28) still depend on the scaling of the problem and somehow
use the assumption that the coefficients of the vectors b, bu , c are about the same
magnitude as those of the matrix A — preferably near 1.
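The criteria (20.26)–(20.28) translate into the following small Python check (an illustration only):

```python
import numpy as np

def p_digit_solution(A, b, bu, c, x, z, zu, s, y, yu, p=8):
    """Relative feasibility and optimality tests (20.26)-(20.28)."""
    tol = 10.0 ** (-p)
    prim = (np.linalg.norm(A @ x - z - b) / (1 + np.linalg.norm(b)) <= tol and
            np.linalg.norm(x + zu - bu) / (1 + np.linalg.norm(bu)) <= tol)
    dual = np.linalg.norm(c - A.T @ y + yu - s) / (1 + np.linalg.norm(c)) <= tol
    gap = abs(c @ x - (b @ y - bu @ yu)) / (1 + abs(c @ x)) <= tol
    return prim and dual and gap
```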
An important note is needed here. The theoretical worst-case bound $O(\sqrt{n}\,\log\frac{1}{\varepsilon})$
is still far from computational practice. It is still extremely pessimistic; in practice
the number of iterations is something like O(log n). It is rare that the current
implementations of interior-point methods use more than 50 iterations to reach an
8-digit optimal solution.
20.7 Optimal basis identification
20.7.1 Preliminaries
An optimal basis identification procedure is an algorithm that generates an optimal
basis and the related optimal basic solutions from an arbitrary primal-dual optimal
solution pair. In this section we briefly recall the notion of an optimal basis. In order
to ease the discussion we use the standard format:
$$\min\left\{c^Tx : Ax = b,\ x \ge 0\right\}, \qquad (20.29)$$
where c, x ∈ IRn , b ∈ IRm , and the matrix A is of size m × n. The dual problem is
$$\max\left\{b^Ty : A^Ty + s = c,\ s \ge 0\right\}, \qquad (20.30)$$
where y ∈ IRm and s ∈ IRn . We assume that A has rank m. A basis AB is a nonsingular
rank m submatrix of A, where the set of column indices of AB is denoted by B. A
basic solution of the primal problem (20.29) is a vector x where all the coordinates in
N = {1, . . . , n} − B are set to zero and the basis coordinates form the unique solution
of the equation AB xB = b. The corresponding dual basic solution is defined as the
unique solution of ATB y = cB , along with sB = 0 and sN = cN − ATN y. It is clear from
this definition that a primal-dual pair (x, s) of basic solutions is always complementary,
and hence, if both x and s are feasible, they are primal and dual optimal, respectively.
A basic solution is called primal (dual) degenerate if at least one component of xB
(sN ) is zero.
There might be two reasons in practice to require an optimal basic solution for an
LO problem.
1. If the given problem is a mixed integer LO problem then some or all of the variables
must be integer. After solving the continuous relaxation we have to generate cuts
to cut off the nonintegral optimal solution. To date, such cuts can be generated
only if an optimal basic solution is available.25 Up till now there has been only one
attempt to design a cut generation procedure within the interior-point setting (see
Mitchell [211]).
2. In practical applications of LO, a sequence of slightly perturbed problems often
has to be solved. This is the case in combinatorial optimization when new cuts
are added to the problem or if a branch and bound algorithm is applied. Also if,
e.g., in production planning models the optimal solutions for different scenarios
are calculated and compared, we need to solve a sequence of slightly perturbed
problems. When such closely related problems are solved, we expect that the
previous optimal solution can help to solve the new problem faster. Although some
methods for potentially efficient warm start were discussed in Section 20.5.2, in
some cases it might be advantageous in practice to use Simplex type solvers initiated
with an old optimal basis.
In this section we describe how an optimal basic solution can be obtained from any
optimal solution pair of the problem.
20.7.2 Basis tableau and orthogonality
We introduce briefly the notions of basis tableau and pivot transformation and we
show how orthogonal vectors can be obtained from a basis tableau. Let A be the
constraint matrix, with columns aj for 1 ≤ j ≤ n, and let AB be a basis chosen from
the columns of A. The basis tableau QB corresponding to B is defined by the equation
$$A_BQ_B = A. \qquad (20.31)$$
Because this gives rise to no confusion, we write below Q instead of QB . The rows of
Q are naturally indexed by the indices in B and the columns by 1, 2, . . . , n. If i ∈ B
and j = 1, 2, . . . , n the corresponding element of Q is denoted by qij . See Figure 20.1
(page 423). It is clear that qij is the coefficient of ai in the unique basis representation
of the vector aj :
$$a_j = \sum_{i\in B} q_{ij}\,a_i.$$
For j ∈ B this implies
$$q_{ij} = \begin{cases} 1 & \mbox{if } i = j, \\ 0 & \mbox{otherwise,} \end{cases}$$
25 The reader may consult the books of Schrijver [250] and Nemhauser and Wolsey [224] to learn
about combinatorial optimization.
Figure 20.1   Basis tableau. (The rows of Q are indexed by i ∈ B, the columns by j = 1, . . . , n; the entry in row i and column j is q_ij.)
Thus, if j ∈ B, the corresponding column in Q is a unit vector with its 1 in the row
corresponding to j. Hence, by a suitable reordering of columns QB — the submatrix
of Q consisting of the columns indexed by B — becomes an identity matrix. It is
convenient for the reasoning if this identity matrix occupies the first m columns.
Therefore, by permuting the columns of Q by a permutation matrix P we write
$$QP = \begin{bmatrix} I & Q_N \end{bmatrix}, \qquad (20.32)$$
where QN denotes the submatrix of Q arising when the columns of QB are deleted.
In the next section, where we present the optimal basis identification procedure,
we will need a well-known orthogonality property of basis tableaus.26 This property
follows from the obvious matrix identity
$$\begin{bmatrix} I & Q_N \end{bmatrix}\begin{bmatrix} Q_N \\ -I \end{bmatrix} = 0.$$
Because of (20.32) this can be written as
$$QP\begin{bmatrix} Q_N \\ -I \end{bmatrix} = 0. \qquad (20.33)$$
Defining
$$R := P\begin{bmatrix} Q_N \\ -I \end{bmatrix}, \qquad (20.34)$$
we have rank Q = m, rank R = n − m and QR = 0. We associate with each index a
vector in IRn as follows. If i ∈ B, q (i) will denote the corresponding row of Q and if
j ∈ N then q(j) is the corresponding column of R.
Clearly, the vectors q (i) , i ∈ B, span the row space of Q = QB and the vectors q(j) ,
j ∈ N , span the column space of R. Since these spaces are orthogonal, they are each
26 See, e.g., Rockafellar [238] or Klafszky and Terlaky [171].
other’s orthogonal complement. Note that the row space of Q is the same as the row
space of A. We thus see that the above spaces are independent of the basis B.
Now let $A_{B'}$ be another basis for A and let $q'_{(j)}$ denote the vector associated with an index $j \notin B'$. Then the aforementioned orthogonality property states that
$$q^{(i)} \perp q'_{(j)}$$
for all $i \in B$ and $j \notin B'$. This is an obvious consequence of the observation in the
previous paragraph.
It is well known that the basis tableau for B ′ can be obtained from the tableau for
B by performing a sequence of pivot operations. A pivot operation replaces a basis
vector $a_i$, $i \in B$, by a nonbasic vector $a_j$, $j \notin B$, where $q_{ij} \neq 0$.27
Example IV.86 For better understanding let us consider a simple numerical example. The following two basic tableaus can be transformed into each other by a single
pivot.
          a1   a2   a3   a4   a5
    a5     2    1    3    0    1
    a4    -1   -1    4    1    0

          a1   a2   a3   a4   a5
    a2     2    1    3    0    1
    a4     1    0    7    1    1
It is easy to check that for the first tableau q(3) = (0, 0, −1, 4, 3) and for the second
tableau q (4) = (1, 0, 7, 1, 1), and that these vectors are orthogonal.28,29
✷
♦
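The orthogonality claimed in the example is easily verified numerically; the sketch below (an illustration only) builds $q_{(3)}$ and $q^{(4)}$ from the two tableaus and checks that their inner product vanishes.

```python
import numpy as np

# the two tableaus of Example IV.86 (rows indexed by the basic columns)
T1 = {'a5': [2, 1, 3, 0, 1], 'a4': [-1, -1, 4, 1, 0]}   # basis {a5, a4}
T2 = {'a2': [2, 1, 3, 0, 1], 'a4': [1, 0, 7, 1, 1]}     # basis {a2, a4}

# column vector q_(3) of R for the first tableau: column 3 of the tableau spread
# over the basic positions, with -1 in position 3 itself
q_col3 = np.zeros(5)
q_col3[4], q_col3[3], q_col3[2] = T1['a5'][2], T1['a4'][2], -1   # (0, 0, -1, 4, 3)

# row vector q^(4) of the second tableau
q_row4 = np.array(T2['a4'], dtype=float)                          # (1, 0, 7, 1, 1)

print(q_col3 @ q_row4)   # 0.0: the two vectors are orthogonal
```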
20.7.3 The optimal basis identification procedure
Given any complementary solution, the algorithm presented below constructs an
optimal basis in at most n iterations.30 Since the iteration count and thus the number
of necessary arithmetic operations depends only on the dimension of the problem and
is independent of the actual problem data, the algorithm is called strongly polynomial.
The algorithm can be initialized with any optimal (and thus complementary)
solution pair (x, s). This pair defines a partition of the index set as follows:
$$B = \{i : x_i > 0\},\qquad N = \{i : s_i > 0\},\qquad T = \{i : x_i = s_i = 0\}.$$
27 Exercise 110 Let i ∈ B, where AB is a basis. For any $j \notin B$ such that $q_{ij} \neq 0$ show that B′ = (B \ {i}) ∪ {j} also defines a basis, and the tableau for B′ can be obtained from the tableau for B by one pivot operation.
28 Exercise 111 For each of the tableaus in Example IV.86, give the permutation matrix P and the matrix R according to (20.33) and (20.34).
29 Exercise 112 For each of the tableaus in Example IV.86, give a full basis of the row space of the tableau and of its orthogonal complement.
30 The algorithm discussed here was proposed by Megiddo [201]. He has also proved that an optimal basis cannot be constructed only from a primal or dual optimal solution in strongly polynomial time unless there exists a strongly polynomial algorithm for solving the LO problem. The problem of constructing a vertex solution from an interior-point solution has also been considered by Mehrotra [203].
As we have seen in Section 3.3.6, interior-point methods produce a strictly complementary optimal solution and hence such a solution gives a partition with T = ∅. But
below we deal with the general case and we allow T to be nonempty.
The optimal basis identification procedure consists of three phases. In the first phase
a so-called maximal basis is constructed. A basis of A is called maximal with respect
to (x, s) if
• it has the maximum possible number of columns from AB ,
• it has the maximum possible number of columns from (AB , AT ).
Then, in the second and third phases, independently of each other, primal and dual
elimination procedures are applied to produce primal and dual feasible basic solutions
respectively.
Note that a maximal basis is not unique and not necessarily primal and/or dual
feasible. A maximal basis can be found by the following simple pivot algorithm.
Because of the assumption rank (A) = m, all the artificial basis vectors {e1 , · · · , em }
Initial basis
Input:
Optimal solution pair (x, s) and the related partition (B, N, T );
artificial basis an+1 = e1 , · · · , an+m = em ;
B = {n + 1, · · · , n + m}.
Output:
A maximal basis B ⊆ {1, · · · , n}.
begin
while qij ≠ 0, i > n, j ∈ AB do
begin
pivot on position (i, j) (ai leaves and aj enters the basis);
B := (B \ {i}) ∪ {j} .
end
while qij ≠ 0, i > n, j ∈ AT do
begin
pivot on position (i, j) (ai leaves and aj enters the basis);
B := (B \ {i}) ∪ {j} .
end
while qij ≠ 0, i > n, j ∈ AN do
begin
pivot on position (i, j) (ai leaves and aj enters the basis);
B := (B \ {i}) ∪ {j} .
end
end
are eliminated from the basis at termination. Since the AB part is investigated first,
the number of basis vectors from AB is maximal; similarly the number of basis vectors
from [AB , AT ] is also maximal. In a practical implementation, special attention must
be given to the selection of the pivot elements in the above algorithm. Typically there
is a lot of freedom in the pivot selection, since a large number of leaving and/or entering
variables could be selected at each iteration.31 The structure of the basis tableau
resulting from the algorithm is visualized in Figure 20.2. Note that the tableau is
never computed in practice; just the basis, in a factorized form. The tableau form is
used just to ease the explanation and understanding.
Figure 20.2   Tableau for a maximal basis. (Identity blocks appear in the A_B columns for the basic indices i ∈ B ∩ B, in the A_T columns for i ∈ B ∩ T, and in the A_N columns for i ∈ B ∩ N, with zero blocks to the left of the latter two.)
We proceed by a primal and a dual phase, performed independently of each other.
They make the basis primal and dual feasible, respectively.
Observe that in the elimination step of the first while-loop of the primal phase the columns of $A_{\tilde B}$ are dependent. Hence there exists a nonzero solution of $A_{\tilde B}\bar x_{\tilde B} = 0$.32 In the elimination step the ‘maximal’ property of the basis is lost, but it is restored in the second while-loop. As we can see, the Primal phase works only on the $(A_{\tilde B}, A_{\tilde T})$ part of the matrix A. In fact it reduces the $A_{\tilde B}$ part to an independent set of column vectors.
At termination the maximal basis is primal feasible and x̃ is the corresponding primal feasible basic solution, i.e., $\tilde x_B = A_B^{-1}b \ge 0$ and $\tilde x_N = 0$. The number of eliminations in the first while-loop is at most $|B| - \mathop{\rm rank}(A_B)$ and the number of pivots in the second while-loop is also at most $|B| - \mathop{\rm rank}(A_B)$.
The Dual phase presented below works on the (AT , AN ) part. It reduces AN and
extends AT so that no vector from AN remains in the basis.
Note that in the elimination step of the first while-loop the rank of $[A_B, A_{\tilde T}]$ is less than m.33 In the elimination step the ‘maximal’ property of the basis is
31 We would like to choose always the pivot element that produces the least fill-in in the inverse basis. For this pivot selection problem many heuristics are possible. It can at least locally be optimized by, e.g., the heuristic Markowitz rule (recall Section 20.4). Implementation issues related to optimal basis identification procedures are discussed in Andersen and Ye [11], Andersen et al. [10] and Bixby and Saltzman [43].
32 In fact, an appropriate x̄ can be read from the tableau. Because of the orthogonality property any vector $q_{(j)}$ for $j \in \tilde B - B$ can be used. In a practical implementation the tableau is not available; only the (factorized) basis matrix QB is available. But then a vector $q_{(j)}$ can be obtained by computing a single nonbasic column of the tableau.
33 For an appropriate s̄ we can choose any vector $q^{(i)}$ for $i \in \tilde N \cap B$; so only one row of the tableau has to be computed at each execution of the first while-loop.
Primal phase
Input:
Optimal solution pair (x̃, s) and the related partition (B̃, N, T̃);
maximal basis B.
Output:
A maximal basis B ⊆ {1, · · · , n};
optimal solution (x̃, s), partition (B̃, N, T̃) with B̃ ⊂ B.
begin
while B̃ ⊈ B do
begin
begin
let x̄ be such that A_B̃ x̄_B̃ = 0, x̄_{T̃∪N} = 0, x̄ ≠ 0;
eliminate a(t least one) coordinate of x̃, let x̃ := x̃ − ϑx̄ ≥ 0;
B̃ := σ(x̃), T̃ := {1, . . . , n} \ (B̃ ∪ N);
end
while qij ≠ 0, i ∈ T̃ ∩ B, j ∈ B̃ do
begin
pivot on position (i, j) (ai leaves, aj enters the basis);
B := (B \ {i}) ∪ {j}.
end
end
end
lost but is restored in the second while-loop. At termination the maximal basis
is dual feasible and s̃ is the corresponding dual feasible basic solution, i.e., $\tilde s_N = c_N - A_N^T(A_B^{-1})^Tc_B$ and $\tilde s_B = 0$. The number of eliminations in the first while-loop is at most $m - \mathop{\rm rank}(A_B, A_T)$ and the number of pivots in the second while-loop is also at most $m - \mathop{\rm rank}(A_B, A_T)$.
To summarize, by first constructing a maximal basis and then performing the primal
and dual phases, the above algorithm generates an optimal basis after at most n
iterations. First we need at most m pivots to construct the maximal basis, then in the
primal phase |B| − rank (B) and in the dual phase m − rank (AB , AT ) pivots follow.
Finally, to verify the n-step bound, observe that after the initial maximal basis is
constructed, each variable might enter the basis at most once.
20.7.4 Implementation issues of basis identification
In the above basis identification algorithm it is assumed that a pair of exact
primal/dual optimal solutions is known. This is never the case in practice. Interior
point algorithms generate only a sequence converging to optimal solutions and because
of the finite precision of computations the solutions are neither exactly feasible nor
complementary. Somehow we have to make a decision about which variables are
Dual phase
Input:
Optimal solutions (x, s̃), partition (B, Ñ, T̃);
maximal basis B.
Output:
A maximal basis B ⊆ {1, · · · , n};
optimal solution (x, s̃), partition (B, Ñ, T̃) with Ñ ∩ B = ∅.
begin
while Ñ ∩ B ≠ ∅ do
begin
begin
let s̄ be such that s̄ = A^T y, s̄_{B∪T̃} = 0, s̄ ≠ 0;
eliminate a(t least one) coordinate of s̃, let s̃ := s̃ − ϑs̄ ≥ 0;
Ñ := σ(s̃), T̃ := {1, · · · , n} \ (B ∪ Ñ);
end
while qij ≠ 0, i ∈ Ñ ∩ B, j ∈ T̃ do
begin
pivot on position (i, j) (ai leaves, aj enters the basis);
B := (B \ {i}) ∪ {j}.
end
end
end
positive and which are zero at the optimum.
Let (x̄, ȳ, s̄) be feasible and (x̄)T s̄ ≤ ε. Let us make a guess for the optimal partition
of the problem as
B = { i | x̄i ≥ s̄i }
and
N = { i | x̄i < s̄i }.
Now we can define the following perturbed problem34
$$\min\left\{\bar c^Tx : Ax = \bar b,\ x \ge 0\right\}, \qquad (20.35)$$
where
$$\bar b = A_B\bar x_B,\qquad \bar c_B = A_B^T\bar y\qquad\mbox{and}\qquad \bar c_N = A_N^T\bar y + \bar s_N.$$
Now the vectors (x, y, s), where y = ȳ and
$$x_i = \begin{cases} \bar x_i & i \in B, \\ 0 & i \in N \end{cases}\qquad\mbox{and}\qquad s_i = \begin{cases} 0 & i \in B, \\ \bar s_i & i \in N, \end{cases}\qquad (20.36)$$
34 This approach was proposed by Andersen and Ye [11].
are strictly complementary optimal solutions of (20.35).35 If ε is small enough,
then the partition (B, N ) is the optimal partition of (20.29) (recall the results of
Theorem I.47 and observe that the proof does not depend on the specific algorithm,
just on the centrality condition and the stopping precision). Thus problems (20.29)
and (20.35) share the same partition and the same set of optimal bases. As an
optimal complementary solution for (20.35) is available, the above basis identification
algorithm can be applied to this perturbed problem. The resulting optimal basis,
within a small margin of error (depending on ε), is an optimal basis for (20.29).
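A Python sketch of this rounding step follows; it guesses the partition as above and assembles the data of the perturbed problem (20.35) together with the strictly complementary pair (20.36). The use of the dual slack s̄ for the N-part of c̄ follows the construction above; the code is an illustration, not part of the text.

```python
import numpy as np

def perturbed_problem(A, xbar, ybar, sbar):
    """Guess the optimal partition from an almost-optimal interior point and
    build the data (20.35) for which the rounded pair (20.36) is exactly and
    strictly complementarily optimal."""
    B = xbar >= sbar            # guessed optimal partition
    N = ~B
    bbar = A[:, B] @ xbar[B]
    cbar = np.empty_like(sbar)
    cbar[B] = A[:, B].T @ ybar
    cbar[N] = A[:, N].T @ ybar + sbar[N]
    x = np.where(B, xbar, 0.0)
    s = np.where(B, 0.0, sbar)
    return B, bbar, cbar, x, s
```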
20.8 Available software
After twenty years of intensive research, IPMs are now well understood both in theory
and practice. As a result a number of sophisticated implementations exist of IPMs for
LO. Below we give a list of some of these codes; some of them contain both a Simplex
and an IPM solver. They are capable of solving, in a few minutes on a PC, linear problems that were hardly solvable on a supercomputer fifteen years ago.
CPLEX (CPLEX/ BARRIER) (CPLEX Optimization, Inc.). For information
contact http://www.cplex.com.
CPLEX is leading the market at this moment. It is a most complete and robust
package. It contains a primal and a dual Simplex solver, an efficient interior-point
implementation with cross-over,36 a good mixed-integer code, a network and a quadratic programming solver. It is supported by most modelling languages and available
for most platforms.
XPRESS-MP (DASH Optimization). For information contact the vendor’s WEB
page: http://www.dashoptimization.com.
An excellent package including Simplex and IPM solvers. It is almost as complete
as CPLEX.
CLP (The LO solver on COIN-OR). For more information contact
http://www.coin-or.org/cgi-bin/cvsweb.cgi/COIN/Clp/.
COIN-OR’s LO package is written by the IBM LO team. Like CPLEX, CLP contains both Simplex and IPM solvers. It is capable of solving linear and quadratic optimization problems.
LOQO. Available from http://www.princeton.edu/~rvdb/.
LOQO is developed by Vanderbei (Department of Operations Research and
Financial Engineering, Princeton University, Princeton, NJ 08544, USA). It is a
robust implementation of a primal-dual infeasible-start IPM for convex quadratic
optimization. LOQO is a commercial package, like CPLEX and OSL, but it is available
for academic purposes for a modest license fee.
35 Producing a reliable guess for the optimal partition is a nontrivial task. The simple method
presented by (20.36) seems to work reasonably well in practice. See El-Bakry, Tapia and
Zhang [71, 70]. However, Andersen and Ye [11] report good results by using a more sophisticated
indicator to predict the optimal partition (B, N ) based on the primal-dual search direction.
36 Close to optimality the solver rounds the IPM solution to a (not necessarily optimal) basic solution
and switches to the Simplex solver, that generates an optimal basic solution.
HOPDM. Available from
http://www.maths.ed.ac.uk/~gondzio/software/hopdm.html.
HOPDM is developed by Gondzio (School of Mathematics, The University of
Edinburgh, Edinburgh, Scotland). It implements a higher order primal-dual method.
It is in the public domain — in a form of FORTRAN source files — for academic
purposes.
BPMPD. Available from http://www.sztaki.hu/~meszaros/bpmpd/.
Mészáros’ BPMPD is an implementation of a primal-dual predictor-corrector IPM
including both the normal and augmented system approach. The code is available as
an executable file for academic purposes.
LIPSOL. Available from http://www.caam.rice.edu/~yzhang/.
Zhang’s LIPSOL is written in MATLAB and FORTRAN. It is an implementation
of the primal-dual predictor-corrector method. One of its features is the use of the
MATLAB programming language, which makes its use relatively easy.
PCx. Available from http://www-fp.mcs.anl.gov/otc/Tools/PCx/.
This code was developed by Czyzyk, Mehrotra and Wright at the Argonne National Laboratory, Chicago. It is a stand-alone C implementation of an infeasible primal-dual predictor-corrector IPM. PCx is freely available, but is not public domain software.
McIPM. Available from http://www.cas.mcmaster.ca/~oplab/index.html.
This code was developed at the Advanced Optimization Lab, McMaster University
by Zhu, Peng and Terlaky. McIPM is written in MATLAB and C. It is a unique
implementation of a Self-Regular primal-dual predictor-corrector IPM and it is based
on the self-dual embedding model. The use of MATLAB makes it relatively easy to use. It is freely available under an open source license.
More information about codes for linear optimization, either for commercial or research
purposes, is available at the World Wide Web site of the LP FAQ (LP Frequently Asked
Questions) at
• http://www-unix.mcs.anl.gov/otc/Guide/faq/linear-programming-faq.html
• ftp://rtfm.mit.edu/pub/usenet/sci.answers/linear-programming-faq
Appendix A
Some Results from Analysis
In Part II we need a result from convex analysis. We include its elementary proof in
this appendix for the sake of completeness. A closely related result can be found in
Bazaraa et al. [37] (Theorem 3.4.3 and Corollary 1, pp. 101–102). Recall that a subset
C of IRk is called relatively open if C is open in the smallest affine subspace of IRk
containing C.
Proposition A.1 Let f : D → IR be a differentiable function, where D ⊆ IRk is an
open set, and let C be a relatively open convex subset of D such that f is convex on
C. Moreover, let L denote the subspace parallel to the smallest affine space containing
C. Then, x∗ ∈ C minimizes f over C iff
∇f (x∗ ) ⊥ L.
(A.1)
Proof: Since f is convex on C, we have for any x, x∗ ∈ C,
f (x) ≥ f (x∗ ) + ∇f (x∗ )T (x − x∗ ).
Since x − x∗ ∈ L, the sufficiency of condition (A.1) follows immediately. To prove the
necessity of (A.1), consider xt = x∗ + t(x− x∗ ), with t ∈ IR. The convexity of C implies
that if 0 ≤ t ≤ 1, then xt ∈ C. Moreover, since C is open, we also have xt ∈ C when
t ≥ −a for some positive a. Since f is differentiable, we have
$$\nabla f(x^*)^T(x - x^*) = \lim_{t\downarrow 0}\frac{f(x_t) - f(x^*)}{t} = \lim_{t\uparrow 0}\frac{f(x_t) - f(x^*)}{t}.$$
Now let x∗ ∈ C minimize f . Since f (xt ) ≥ f (x∗ ), letting t → 0 we have that the first
limit must be nonnegative, and the second nonpositive. Hence both limits are zero. So
we have $\nabla f(x^*)^T(x - x^*) = 0$, ∀x ∈ C. Thus (A.1) follows.
✷
At several places in the book we mention the implicit function theorem. There
exist many forms of this theorem. See, e.g., Franklin [82], Buck [52], Fiacco [76] or
Rudin [248]. We cite here a version of Bertsekas [40] (Proposition A.25, pp. 554).1
Proposition A.2 (Implicit Function Theorem) Let f : IRn+m → IRm be a
function of w ∈ IRn and z ∈ IRm such that:
1 In fact, Proposition A.25 in Bertsekas [40] contains a typo. It says that f : IRn+m → IRn instead
of f : IRn+m → IRm .
(i) There exist w̄ ∈ IRn and z̄ ∈ IRm such that f (w̄, z̄) = 0.
(ii) f is continuous and has a continuous and nonsingular gradient matrix (or
Jacobian) ∇z f (w, z) in an open set containing (w̄, z̄).
Then there exists open sets Sw̄ ⊆ IRn and Sz̄ ⊆ IRm containing w̄ and z̄, respectively,
and a continuous function φ : Sw̄ → Sz̄ such that z̄ = φ(w̄) and f (w, φ(w)) = 0 for
all w ∈ Sw̄ . The function φ is unique in the sense that if w ∈ Sw̄ , z ∈ Sz̄ , and
f (w, z) = 0, then z = φ(w). Furthermore, if for some p > 0, f is p times continuously
differentiable the same is true for φ, and we have
$$\nabla\phi(w) = -\left(\nabla_z f(w, \phi(w))\right)^{-1}\nabla_w f(w, \phi(w)).$$
Appendix B
Pseudo-inverse of a Matrix
We are interested in the least norm solution of the linear system of equations
Ax = b,
where A is an m × n matrix of rank r, and b ∈ IRm . We assume that a solution exists,
i.e., b belongs to the column space of A.
First we consider the case where r = n. Then the columns of A are linearly
independent and hence the solution is unique. It is obtained by premultiplication
of the system by AT : AT Ax = AT b. Since AT A is nonsingular we find
x = (AT A)−1 AT b (r = n).
We proceed with the case where r = m < n. Then Ax = b has multiple solutions.
The least norm solution is characterized by the fact that it is orthogonal to the null
space of A. So in this case the solution belongs to the row space of A and hence can
be written as x = AT λ, λ ∈ IRm . This implies that AAT λ = b. This time AAT is
nonsingular, and we obtain that λ = (AAT )−1 b, whence
x = AT (AAT )−1 b
(r = m).
Finally we consider the general case, without making any assumption on the rank of
A. We start by decomposing A as follows:
A = A1 A2 ,
where A1 is an m × r matrix of rank r, and A2 is an r × n matrix of rank r.
There are many ways to realize such a decomposition. One way is the well-known
LU decomposition of A.1
Now Ax = b can be rewritten as A1 A2 x = b. Since A1 has full column rank we are
in the first situation, and hence it follows that
A2 x = (AT1 A1 )−1 AT1 b.
Thus our problem is reduced to finding a least norm solution of the last system. Since
A2 has full row rank we are now in the second situation, and hence it follows that
x = AT2 (A2 AT2 )−1 (AT1 A1 )−1 AT1 b.
1 See, e.g., the book of Strang [259].
Thus we have found the least norm solution of Ax = b. Defining the matrix A+
according to
A+ = AT2 (A2 AT2 )−1 (AT1 A1 )−1 AT1 ,
(B.1)
we conclude that the least norm solution of Ax = b is given by x = A+ b.
The matrix A+ is called the pseudo-inverse of A. We can easily verify that A+
satisfies the following relations:
$$
\begin{aligned}
AA^+A &= A, \qquad\qquad &(B.2)\\
A^+AA^+ &= A^+, &(B.3)\\
(AA^+)^T &= AA^+, &(B.4)\\
(A^+A)^T &= A^+A. &(B.5)
\end{aligned}
$$
Theorem B.1 The equations (B.2) to (B.5) determine A+ uniquely.
Proof: We already have seen that a solution exists. Suppose that we have two
solutions, X1 and X2 say. From (B.2) and (B.5) we derive that X1 AAT = AT , and
X2 AAT = AT . So (X1 − X2 )AAT = 0. This implies (X1 − X2 )AAT (X1 − X2 )T = 0,
and hence we must have (X1 − X2 )A = 0. This means that the columns of X1 − X2
belong to the left null space of A. On the other hand (B.3) and (B.4) imply that
AX1 X1T = X1T , and AX2 X2T = X2T . Hence A(X1 X1T − X2 X2T ) = X1T − X2T . This
means that the columns of X1 − X2 belong to the column space of A. Since the
column space and the left null space of A are orthogonal this implies that X1 = X2 .
✷
There is another interesting way to describe the pseudo-inverse A+ of A, which
uses the so-called singular value decomposition (SV D) of A. Let r denote the rank
of A, and let λ1 , λ2 , · · · , λr denote the nonzero (hence positive) eigenvalues of AAT .
Furthermore, let Q1 and Q2 denote orthogonal matrices such that the first r columns
of Q1 constitute a basis of the column space of A, and the first r columns of Q2
constitute a basis of the row space of A. Then, if Σ denotes the m × n matrix whose
only nonzero elements are Σ11 , Σ22 , · · · , Σrr , with
$$\Sigma_{ii} = \sigma_i := \sqrt{\lambda_i},\qquad 1 \le i \le r,$$
then we have
A = Q1 ΣQT2 .
This is the SV D of A, and the numbers σi , 1 ≤ i ≤ r are called the singular values
of A.
Using Theorem B.1 we can easily verify that Σ+ is the n × m matrix whose only
nonzero elements are the first r diagonal elements, and these are the inverses of the
singular values of A. Then, using Theorem B.1 once more, we can easily check that
A+ is given by
A+ = Q2 Σ+ QT1 .
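The construction of A+ is easy to try out numerically; the sketch below uses a QR factorization to produce one admissible decomposition A = A1 A2 (this particular choice is ours, not part of the text) and compares the result of (B.1) with the SVD-based pseudo-inverse.

```python
import numpy as np

def pinv_via_factorization(A, r):
    """Pseudo-inverse via (B.1): A = A1 A2 with A1 (m x r) of full column rank
    and A2 (r x n) of full row rank; here the split is obtained from a QR
    factorization, assuming the rank r is known."""
    Q, R = np.linalg.qr(A)
    A1, A2 = Q[:, :r], R[:r, :]
    return A2.T @ np.linalg.inv(A2 @ A2.T) @ np.linalg.inv(A1.T @ A1) @ A1.T

A = np.array([[1., 2., 3.], [2., 4., 6.]])        # a rank-1 matrix
Ap = pinv_via_factorization(A, r=1)
print(np.allclose(Ap, np.linalg.pinv(A)))          # agrees with the SVD-based A^+
```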
Appendix C
Some Technical Lemmas
Lemma C.1 Let A be an m × n matrix with columns Aj and b a vector of dimension
m such that the set
S := {x : Ax = b, x ≥ 0}
is bounded and contains a positive vector. Moreover, let all the entries in A and b be
integral. Then for each i, with 1 ≤ i ≤ n,
$$\max_x\,\{x_i : x \in S\}\ \ge\ \frac{1}{\prod_{j=1}^n \|A_j\|}.$$
Proof: Observe that each column Aj of A must be nonzero, due to the boundedness
of S. Fixing the index i, let x ∈ S be such that xi is maximal. Note that such an x
exists since S is bounded. Moreover, since S contains a positive vector, we must have
xi > 0. Let J be the support of x:
J = {j : xj > 0} .
We assume that x is such that the cardinality of its support is minimal. Then the
columns of the submatrix AJ of A are linearly independent. This can be shown as
follows. Let there exist a nonzero vector λ ∈ IRn such that
$$\sum_{j\in J}\lambda_jA_j = 0,$$
and $\lambda_k = 0$ for each $k \notin J$. Then Aλ = 0. Hence, if ε is small enough, x ± ελ has
the same support as x and is positive on J. Moreover, x ± ελ ∈ S. Since the i-th
coordinate cannot exceed xi it follows that λi = 0. Since S is bounded, at least one
of the coordinates of λ must be negative, because otherwise S would contain the ray
x + ελ, ε > 0. By increasing the value of ε until one of its coordinates reaches zero
we get a vector in S with less than |J| nonzero coordinates and for which the i-th
coordinate still has value xi . This contradicts the assumption that x has minimal
support among such vectors, thus proving that the columns of the submatrix AJ of A
are linearly independent.
Now let AKJ be any nonsingular submatrix of AJ . Here K denotes a suitable subset
of the row indices 1, 2, · · · , m of A. Then we have
AKJ xJ = bK ,
since the coordinates $x_j$ of x with $j \notin J$ are zero. We can solve $x_i$ from this equation by using Cramer’s rule.1 This yields
$$x_i = \frac{\det A'_{KJ}}{\det A_{KJ}}, \qquad (C.1)$$
where A′KJ denotes the matrix arising from AKJ by replacing the i-th column by bK .
We know that xi > 0. This implies |det A′KJ | > 0. Since all the entries in the matrix
A′KJ are integral the absolute value of its determinant is at least 1. Thus we find
$$x_i \ \ge\ \frac{1}{|\det A_{KJ}|}.$$
Now using that |det AKJ | is bounded above by the product of the norms of its columns,
due to the well-known Hadamard inequality2 for determinants, we find3
$$x_i \ \ge\ \frac{1}{\prod_{j\in J}\|A_{Kj}\|}\ \ge\ \frac{1}{\prod_{j\in J}\|A_j\|}\ \ge\ \frac{1}{\prod_{j=1}^n\|A_j\|}.$$
The second inequality is obvious and the last inequality follows since A has no zero
columns and hence the norm of each column of A is at least 1. This proves the lemma.
✷
We proceed with a proof of the two basic inequalities in (6.24) on page 134. The
proof uses standard techniques for proving elementary inequalities.4
Lemma C.2 Let z ∈ IRn , and α ≥ 0. Then each of the two inequalities
$$\psi(\alpha\|z\|) \ \le\ \Psi(\alpha z) \ \le\ \psi(-\alpha\|z\|)$$
holds whenever the involved expressions are well defined. The left (right) inequality
holds with equality if and only if one of the coordinates of z equals $\|z\|$ ($-\|z\|$,
respectively) and the remaining coordinates are zero.
Proof: Fixing z we introduce
$$g(\alpha) := \psi(\alpha\|z\|)\qquad\mbox{and}\qquad G(\alpha) := \Psi(\alpha z) = \sum_{i=1}^n\psi(\alpha z_i),$$
where α is such that $\alpha z > -e$ and $\alpha\|z\| > -1$. Both functions are twice differentiable with respect to α. Using that $\psi'(t) = 1 - 1/(1+t)$ we obtain
$$g'(\alpha) = \frac{\alpha\|z\|^2}{1+\alpha\|z\|},\qquad G'(\alpha) = \sum_{i=1}^n\frac{\alpha z_i^2}{1+\alpha z_i}$$
1 The idea of using Cramer's rule in this way was applied first by Khachiyan [167].
2 Cf. Section 1.7.3.
3 The idea of using Hadamard's inequality for deriving bounds on the coordinates xi of x from (C.1) was applied earlier by Klafszky and Terlaky [170] in a similar context.
4 The proof is due to Jiming Peng [232].
and
    g′′(α) = ‖z‖² / (1 + α‖z‖)²,      G′′(α) = ∑_{i=1}^n zi² / (1 + αzi)².
Now consider the case where α ≥ 0. Then using zi ≤ ‖z‖ we may write
    G′′(α) = ∑_{i=1}^n zi² / (1 + αzi)² ≥ ∑_{i=1}^n zi² / (1 + α‖z‖)² = ‖z‖² / (1 + α‖z‖)² = g′′(α).
So G(α) − g(α) is convex for α ≥ 0. Since
    g(0) = G(0) = 0,      g′(0) = G′(0) = 0,                                    (C.2)
it follows that G(α) ≥ g(α) if α ≥ 0. This proves the left hand side inequality in the
lemma.
The right inequality follows in the same way. Let α ≥ 0 be such that e + αz > 0
and 1 − α‖z‖ > 0. Using 1 + αzi ≥ 1 − α‖z‖ > 0 we may write
    G′′(α) = ∑_{i=1}^n zi² / (1 + αzi)² ≤ ∑_{i=1}^n zi² / (1 − α‖z‖)² = ‖z‖² / (1 − α‖z‖)² = g′′(−α).
This implies that G(α) − g(−α) is concave for α ≥ 0. Using (C.2) once more we obtain
G(α) ≤ g(−α) if α ≥ 0, which is the right hand side inequality in the lemma.
Note that in both cases equality occurs only if zi² = ‖z‖² for some i. Since the
remaining coordinates are zero in that case, the lemma follows.
✷
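A quick numerical sanity check of Lemma C.2 (a sketch added here, not part of the original text) uses ψ(t) = t − log(1 + t), the only choice consistent with ψ(0) = 0 and the derivative ψ′(t) = 1 − 1/(1 + t) used in the proof.

    import numpy as np

    def psi(t):                 # psi(t) = t - log(1 + t); psi'(t) = 1 - 1/(1 + t)
        return t - np.log1p(t)

    def Psi(z):                 # Psi(z) = sum_i psi(z_i)
        return np.sum(psi(z))

    rng = np.random.default_rng(1)
    z = rng.standard_normal(5)
    norm_z = np.linalg.norm(z)

    # Keep alpha*||z|| < 1 so that both psi(-alpha*||z||) and Psi(alpha*z) are defined.
    for alpha in np.linspace(0.0, 0.9 / norm_z, 4):
        left, middle, right = psi(alpha * norm_z), Psi(alpha * z), psi(-alpha * norm_z)
        assert left <= middle + 1e-12 and middle <= right + 1e-12
        print(f"alpha = {alpha:.3f}:  {left:.4f} <= {middle:.4f} <= {right:.4f}")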
We proceed with another technical lemma that is used in the proof of Lemma IV.15
in Chapter 17 (page 325).
Lemma C.3 Let p be a positive number and let f : IR+ → IR+ be defined by
    f(x) := |1 − x| + |1 − p/x| .
If p ≥ 1 then f attains its minimal value at x = √p, and if 0 < p ≤ 1 then f attains
its minimal value at x = 1 and at x = p.
Proof: First consider the case p ≥ 1. If x ≤ 1 then we have
    f(x) = 1 − x + p/x − 1 = p/x − x.
Hence, if x ≤ 1 the derivative of f is given by
    f′(x) = −p/x² − 1 < 0.
Thus, f is monotonically decreasing if x ≤ 1. If x ≥ p then we have
    f(x) = x − 1 + 1 − p/x = x − p/x
and the derivative of f is given by
    f′(x) = 1 + p/x² > 0,
proving that f is monotonically increasing if x ≥ p. For 1 ≤ x ≤ p we have
    f(x) = x − 1 + p/x − 1 = x + p/x − 2.
Now the derivative of f is given by
    f′(x) = 1 − p/x²
and the second derivative by
    f′′(x) = 2p/x³ > 0.
Hence f is convex if x ∈ [1, p]. Putting f′(x) = 0 we get x = √p, proving the first part
of the lemma.
The case p ≤ 1 is treated as follows. If x ≤ p then
    f(x) = 1 − x + p/x − 1 = p/x − x,
and, as before, f is monotonically decreasing. If x ≥ 1 then
    f(x) = x − 1 + 1 − p/x = x − p/x
and f is monotonically increasing. Now let p ≤ x ≤ 1. Then
    f(x) = 1 − x + 1 − p/x = 2 − x − p/x.
Hence f is concave if x ∈ [p, 1], and f has local minima at x = p and x = 1. Since
f (1) = f (p) = 1 − p the second part of the lemma follows.
✷
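The minimizers named in Lemma C.3 are easy to confirm numerically; the short sketch below (added here, not from the original text) evaluates f on a fine grid for one value of p on each side of 1.

    import numpy as np

    def f(x, p):
        return np.abs(1.0 - x) + np.abs(1.0 - p / x)

    x = np.linspace(0.05, 5.0, 100000)
    for p in (4.0, 0.25):
        i = np.argmin(f(x, p))
        print(f"p = {p}:  numerical minimizer x ~ {x[i]:.3f}, f = {f(x[i], p):.4f}")
        # For p = 4 the minimizer is sqrt(p) = 2 with value 2*sqrt(p) - 2 = 2;
        # for p = 0.25 both x = p and x = 1 give the minimal value 1 - p = 0.75.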
The rest of this appendix is devoted to some properties of the componentwise
product uv of two orthogonal vectors u and v in IRn . The first two lemmas give
some upper bounds for the 2-norm and the infinity norm of uv.
Lemma C.4 (First uv-lemma) If u and v are orthogonal in IRn , then
    ‖uv‖∞ ≤ (1/4) ‖u + v‖²,      ‖uv‖ ≤ (√2/4) ‖u + v‖².
Proof: We may write
    uv = (1/4) [(u + v)² − (u − v)²].
From this we derive the componentwise inequality
    −(1/4) (u − v)² ≤ uv ≤ (1/4) (u + v)².                                      (C.3)
This implies
    −(1/4) ‖u − v‖² e ≤ uv ≤ (1/4) ‖u + v‖² e.
Since u and v are orthogonal, the vectors u − v and u + v have the same norm, and
hence the first inequality in the lemma follows. For the second inequality we derive
from (C.3) that
    ‖uv‖² = eᵀ(uv)² = (1/16) eᵀ[(u + v)² − (u − v)²]² ≤ (1/16) eᵀ[(u + v)⁴ + (u − v)⁴].
Since eᵀz⁴ ≤ ‖z‖⁴ for any z ∈ IRn, we obtain
    ‖uv‖² ≤ (1/16) [‖u + v‖⁴ + ‖u − v‖⁴].                                       (C.4)
Using again that ‖u − v‖ = ‖u + v‖, we confirm the second inequality.
✷
The next lemma provides a second upper bound for ‖uv‖.
Lemma C.5 (Second uv-lemma) 5 If u and v are orthogonal in IRn, then
    ‖uv‖ ≤ (1/√2) ‖u‖ ‖v‖ .
Proof: Recall from (C.4) that
    ‖uv‖² ≤ (1/16) [‖u + v‖⁴ + ‖u − v‖⁴].
Now first assume that u and v are unit vectors, i.e., ‖u‖ = ‖v‖ = 1. Then the
orthogonality of u and v implies that ‖u + v‖⁴ = ‖u − v‖⁴ = 4, whence ‖uv‖² ≤ 1/2.
In the general case, if u or v is not a unit vector, then if one of the two vectors is the
zero vector, the lemma is obvious. Else we may write
    ‖uv‖ = ‖u‖ ‖v‖ ‖ (u/‖u‖)(v/‖v‖) ‖ .
Now applying the above result for the case of unit vectors to u/‖u‖ and v/‖v‖ we
obtain the lemma.
✷
The bound for ‖uv‖ in Lemma C.5 is stronger than the corresponding bound in
Lemma C.4. This easily follows by using ab ≤ (1/2)(a² + b²) with a = ‖u‖ and b = ‖v‖.
It may be noted that the last inequality provides also an alternative proof for the
bound for ‖uv‖ in Lemma C.4.
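The two uv-lemmas are easily verified on random orthogonal pairs; the following sketch (added here, not part of the original text) builds v orthogonal to u by projection and checks both bounds of Lemma C.4 and the bound of Lemma C.5.

    import numpy as np

    rng = np.random.default_rng(2)
    for _ in range(5):
        u = rng.standard_normal(6)
        v = rng.standard_normal(6)
        v -= (u @ v) / (u @ u) * u            # make v orthogonal to u
        uv = u * v                            # componentwise product
        s = np.linalg.norm(u + v) ** 2
        assert np.max(np.abs(uv)) <= s / 4 + 1e-12               # Lemma C.4, infinity norm
        assert np.linalg.norm(uv) <= np.sqrt(2) / 4 * s + 1e-12  # Lemma C.4, 2-norm
        assert np.linalg.norm(uv) <= np.linalg.norm(u) * np.linalg.norm(v) / np.sqrt(2) + 1e-12  # Lemma C.5
    print("all uv-lemma bounds verified on random orthogonal pairs")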
For the proof of the third uv-lemma we need the next lemma.
Lemma C.6 6 Let γ be a vector in IRp such that γ > −e and eT γ = σ. Then if either
γ ≥ 0 or γ ≤ 0,
    ∑_{i=1}^p (−γi) / (1 + γi) ≤ −σ / (1 + σ) ;
equality holds if and only if at most one of the coordinates of γ is nonzero.
5 For the case in which u and v are unit vectors, this lemma has been found by several authors. See, e.g., Mizuno [214], Jansen et al. [154], Gonzaga [125]. The extension to the general case in Lemma C.5 is due to Gonzaga (private communication). We will refer to this lemma as the second uv-lemma.
6 This lemma and the next lemma are due to Ling [182, 183].
Proof: The lemma is trivial if γ = 0, so we may assume that γ is nonzero. For the
proof of the lemma we use the function f : (−1, ∞)p → IR defined by
    f(γ) := ∑_{i=1}^p (−γi) / (1 + γi) .
We can easily verify that f is convex (its Hessian is positive definite). Observe that
∑_{i=1}^p γi/σ = 1 and, since either γ ≥ 0 or γ ≤ 0, γi/σ ≥ 0. Therefore we may write
    f(γ) = f( ∑_{i=1}^p (γi/σ) σei ) ≤ ∑_{i=1}^p (γi/σ) f(σei) = ∑_{i=1}^p (γi/σ) · (−σ)/(1 + σ) = −σ/(1 + σ),
where ei denotes the i-th unit vector in IRp . This proves the inequality in the lemma.
Note that the inequality holds with equality if γ = σei , for some i, and that in all
other cases the inequality is strict since the Hessian of f is positive definite. Thus the
lemma has been proved.
✷
Using the above lemmas we prove the next lemma.
Lemma C.7 (Third uv-lemma) Let u and v be orthogonal in IRn, and suppose
‖u + v‖ = 2r with r < 1. Then
    eᵀ( e/(e + uv) − e ) ≤ 2r⁴ / (1 − r⁴) .
Proof: The first uv-lemma implies that ‖uv‖∞ ≤ r² < 1. Hence, putting β := uv we
have eᵀβ = 0 and −e < β < e. Now let
    I+ := {i : βi > 0},      I− := {i : βi < 0}.
Then
    ∑_{i∈I+} βi = − ∑_{i∈I−} βi .
Let σ denote this common value. Using Lemma C.6 twice, with respectively γi = βi
for i ∈ I+ and γi = βi for i ∈ I−, we obtain
    eᵀ( e/(e + uv) − e ) = eᵀ( e/(e + β) − e ) = ∑_{i=1}^n (−βi)/(1 + βi)
        = ∑_{i∈I+} (−βi)/(1 + βi) + ∑_{i∈I−} (−βi)/(1 + βi)
        ≤ −σ/(1 + σ) + σ/(1 − σ) = 2σ²/(1 − σ²).
The last expression is monotonically increasing in σ. Hence we may replace it by an
upper bound, which can be obtained as follows:
    σ = (1/2) ∑_{i=1}^n |βi| = (1/2) ∑_{i=1}^n |ui vi| ≤ (1/4) ∑_{i=1}^n (ui² + vi²) = (1/4) ‖u + v‖² = r².
Substitution of this bound for σ yields the lemma.
✷
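The bound of Lemma C.7 can be verified in the same spirit; the sketch below (added, not from the original text) scales a random orthogonal pair so that ‖u + v‖ = 2r with r < 1 and evaluates both sides.

    import numpy as np

    rng = np.random.default_rng(3)
    r = 0.8
    u = rng.standard_normal(5)
    v = rng.standard_normal(5)
    v -= (u @ v) / (u @ u) * u                  # orthogonalize
    scale = 2 * r / np.linalg.norm(u + v)       # enforce ||u + v|| = 2r
    u, v = scale * u, scale * v

    lhs = np.sum(1.0 / (1.0 + u * v) - 1.0)     # e^T ( e/(e + uv) - e )
    rhs = 2 * r**4 / (1 - r**4)
    print(f"{lhs:.4f} <= {rhs:.4f}")
    assert lhs <= rhs + 1e-12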
Lemma C.8 (Fourth uv-lemma) Let u and v be orthogonal in IRn and suppose
‖u + v‖ ≤ √2 and δ = ‖u + v + uv‖ ≤ 1/√2. Then
    ‖u‖ ≤ √( 1 − √(1 − 2δ²) ) .
Proof: It is convenient for the proof to introduce the vector
z = u + v,
and to denote r := ‖z‖. Since u and v are orthogonal there exists a ϕ, 0 ≤ ϕ ≤ π/2,
such that
    ‖u‖ = r cos ϕ,      ‖v‖ = r sin ϕ.                                          (C.5)
Note that if the angle ϕ equals π/4 then r = ‖z‖ ≤ √2 implies that ‖u‖ = ‖v‖ ≤ 1.
But for the general case we only know that 0 ≤ ϕ ≤ π/2 and hence at first sight we
should expect that the norms ‖u‖ and ‖v‖ may well exceed 1. However, it will turn
out below that the second condition in the lemma, namely δ = ‖u + v + uv‖ ≤ 1/√2,
restricts the values of ϕ to a small neighborhood of π/4, depending on δ, thus yielding
the tighter upper bound for ‖u‖ in the lemma. Of course, the symmetry with respect
to u and v implies the same upper bound for ‖v‖.
We have
    δ = ‖u + v + uv‖ ≥ ‖u + v‖ − ‖uv‖ = ‖z‖ − ‖uv‖ .                            (C.6)
Applying the second uv-lemma (Lemma C.5) we find
    ‖uv‖ ≤ (1/√2) ‖u‖ ‖v‖ = (1/√2) r² cos ϕ sin ϕ = r² sin 2ϕ / (2√2) .
Substituting this in (C.6) we obtain
    δ ≥ r − r² sin 2ϕ / (2√2) .                                                 (C.7)
The lemma is trivial if either ϕ = 0 or ϕ = π/2, because then either u = 0 or u = z.
In the latter case, v = 0, whence ‖u‖ = δ. Since (cf. Figure 6.12, page 138)
    δ ≤ √( 1 − √(1 − 2δ²) ) ,
the claim follows. Therefore, from now on it is assumed that 0 < ϕ < π/2.
Thus, sin 2ϕ > 0 and (C.7) is equivalent to
    (sin 2ϕ) r² − 2√2 r + 2√2 δ ≥ 0.
The left-hand side expression is quadratic in r and vanishes if
    r = ( √2 / sin 2ϕ ) ( 1 ± √(1 − δ√2 sin 2ϕ) ) .
The plus sign gives a value larger than √2. Thus we obtain
    r ≤ ( √2 / sin 2ϕ ) ( 1 − √(1 − δ√2 sin 2ϕ) ) = 2δ / ( 1 + √(1 − δ√2 sin 2ϕ) ) .
Consequently, using 0 < ϕ < π/2,
    ‖u‖ = r cos ϕ ≤ 2δ cos ϕ / ( 1 + √(1 − δ√2 sin 2ϕ) ) .
We proceed by considering the function
    f(ϕ) := 2δ cos ϕ / ( 1 + √(1 − δ√2 sin 2ϕ) ) ,      0 ≤ ϕ ≤ π/2,
with δ√2 ≤ 1. Clearly this function is nonnegative and differentiable on the interval
[0, π/2]. Moreover, f(0) = δ and f(π/2) = 0. On the open interval (0, π/2) the
derivative of f with respect to ϕ is zero if and only if
    − sin ϕ ( 1 + √(1 − δ√2 sin 2ϕ) ) + δ√2 cos ϕ cos 2ϕ / √(1 − δ√2 sin 2ϕ) = 0.
This reduces to
    − sin ϕ √(1 − δ√2 sin 2ϕ) − sin ϕ (1 − δ√2 sin 2ϕ) + δ√2 cos ϕ cos 2ϕ = 0,
which can be rewritten as
    δ√2 cos ϕ − sin ϕ = sin ϕ √(1 − δ√2 sin 2ϕ) .
Taking squares we obtain
    2δ² cos² ϕ + sin² ϕ − δ√2 sin 2ϕ = sin² ϕ − δ√2 sin² ϕ sin 2ϕ,
which simplifies to
    2δ² cos² ϕ = δ√2 sin 2ϕ (1 − sin² ϕ) = δ√2 sin 2ϕ cos² ϕ.
Dividing by δ√2 cos² ϕ we find the surprisingly simple expression
    sin 2ϕ = δ√2 .
We assume that δ is positive, because if δ = 0 the lemma is trivial. Then sin 2ϕ = δ√2
admits two values for ϕ on the interval [0, π/2], one at each side of π/4. Since we are
maximizing f we have to take the value to the left of π/4. For this value, cos 2ϕ is
positive. Therefore we may write
    f(ϕ) = 2δ cos ϕ / ( 1 + √(1 − sin² 2ϕ) ) = 2δ cos ϕ / (1 + cos 2ϕ) = 2δ cos ϕ / (2 cos² ϕ) = δ / cos ϕ .
Now cos ϕ can be solved from the equation 2 cos ϕ sin ϕ = δ√2. Taking the larger of
the two roots we obtain
    cos ϕ = (1/√2) √( 1 + √(1 − 2δ²) ) .
For this value of ϕ we have
    f(ϕ) = δ√2 / √( 1 + √(1 − 2δ²) ) = ( δ√2 / √(2δ²) ) √( 1 − √(1 − 2δ²) ) = √( 1 − √(1 − 2δ²) ) .
Clearly this value is larger than the values at the boundary points ϕ = 0 and ϕ = π/2.
Hence it gives the maximum value of r cos ϕ on the whole interval [0, π/2]. Thus the
lemma follows.
✷
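Finally, the bound of Lemma C.8 can be spot-checked numerically; in the sketch below (added, not part of the original text) the orthogonal pair is scaled down until both hypotheses of the lemma hold.

    import numpy as np

    rng = np.random.default_rng(4)
    u = rng.standard_normal(5)
    v = rng.standard_normal(5)
    v -= (u @ v) / (u @ u) * u                       # orthogonalize

    # Scale (u, v) down until ||u+v|| <= sqrt(2) and delta = ||u+v+uv|| <= 1/sqrt(2).
    t = 1.0
    while True:
        uu, vv = t * u, t * v
        delta = np.linalg.norm(uu + vv + uu * vv)
        if np.linalg.norm(uu + vv) <= np.sqrt(2) and delta <= 1 / np.sqrt(2):
            break
        t *= 0.9

    bound = np.sqrt(1 - np.sqrt(1 - 2 * delta**2))
    print(f"||u|| = {np.linalg.norm(uu):.4f} <= {bound:.4f}")
    assert np.linalg.norm(uu) <= bound + 1e-12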
Appendix D
Transformation to canonical form
D.1
Introduction
It is almost obvious that every LO problem can be rewritten in the canonical form given
by (P ). To see this, some simple observations are sufficient. First, any maximization
problem can be turned into a minimization problem by multiplying the objective
function by −1. Second, any equality constraint aT x = b can be replaced by the
two inequality constraints aT x ≤ b, aT x ≥ b, and any inequality constraint aT x ≤ b
is equivalent to −aT x ≥ −b. Third, any free variable x, with no sign requirements,
can be written as x = x+ − x− , with x+ and x− nonnegative. By applying these
transformations to any given LO problem, we get an equivalent problem that has the
canonical form of (P ). The new problem is equivalent to the given problem in the
sense that the new problem is feasible if and only if the given problem is feasible, and
unbounded if and only if the given problem is unbounded, and, moreover, if the given
problem has (one or more) optimal solutions then these can be found from the optimal
solution(s) of the new problem.
The approach just sketched is quite popular in textbooks,1 despite the fact
that in practice, when dealing with solution methods, it has a number of obvious
shortcomings. First, it increases the number of constraints and/or variables in the
problem description. Each equality constraint is removed at the cost of an extra
constraint, and each free variable is removed at the cost of an extra variable. Especially
when the given problem is a large-scale problem it may be desirable to keep the
dimensions of the problem as small as possible. Apart from this shortcoming the
approach is even more inappropriate when dealing with an interior-point solution
method. It will become clear later on that it is then essential to have a feasible region
with a nonempty interior so that the level sets for the duality gap are bounded.
However, when an equality constraint is replaced by two inequality constraints, these
two inequalities cannot have positive slack values for any feasible point. This means
that the interior of the feasible region is empty after the transformation. Moreover,
the nonnegative variables introduced by eliminating a free variable are unbounded:
when the same constant is added to the two new variables their difference remains the
same. Hence, if in the original problem the level sets of the duality gap were bounded,
we would lose this property in the new formulation of the problem.
For deriving theoretical results, the above properties of the described transformations
may give no problems at all. In fact, an example of an application of this type is
given in Section 2.10. However, when it is our aim to solve a given LO problem, the
approach cannot be recommended, especially if the solution method is an interior-point
method.

1 See, e.g., Schrijver [250], page 91, and Padberg [230], page 23.
The purpose of this section is to show that there exists an alternative approach
that has an opposite effect on the problem size: it reduces the size of the problem.
Moreover, if the original problem has a nonempty interior feasible region then so has
the transformed problem, and if the level sets in the original problem are bounded
then they are bounded after the transformation as well. In this approach, outlined
below, each equality constraint and each free variable in the original problem reduces
the number of variables or the number of constraints by one. Stated more precisely,
we have the following result.
Theorem D.1 Let (P ) be an LO problem with m constraints and n variables.
Moreover let (P ) have m0 equality constraints and n0 free variables. Then there exists
an equivalent canonical problem for which the sum of the number of constraints and
the number of variables is not more than n + m − n0 − m0 .
Proof: In an arbitrary LO problem we distinguish between the following types of
variable: nonnegative variables, free variables and nonpositive variables.2 Similarly,
three types of constraints can occur: equality constraints, inequality constraints of the
less-than-or-equal-to (≤) type and inequality constraints of the greater-than-or-equal-to (≥) type. It is clear that nonpositive variables can be replaced by nonnegative
variables at no cost by taking the opposites as new variables. Also, inequality
constraints of the less-than-or-equal-to type can be turned into inequality constraints
of the greater-than-or-equal-to type through multiplication by −1. In this way we can
transform the problem to the following form at no cost:
    (P¹)     min { (c0)ᵀx0 + (c1)ᵀx1 :  A0 x0 + A1 x1 = b0,  B0 x0 + B1 x1 ≥ b1,  x1 ≥ 0 },
where, for i = 0, 1, Ai and Bi are matrices and bi , ci and xi are vectors. The vector
x0 contains the n0 free variables, and there are m0 equality constraints. The variables
in x1 are nonnegative, and their number is n − n0 , whereas the number of inequality
constraints is m − m0. The sizes of the matrices and the vectors in (P¹) are such that
all expressions in the problem are well defined and need no further specification.
D.2
Elimination of free variables
In this section we discuss the elimination of free variables, thus showing how to obtain
a problem in which all variables are nonnegative. We may assume that the matrix
[A0 A1] has full row rank. Otherwise the set of equality constraints is redundant
or inconsistent. If the system is not inconsistent, we can eliminate some of these
constraints until the above condition on the rank is satisfied, i.e., rank([A0 A1]) = m0.

2 A variable xi in (P) is called a nonnegative variable if (P) contains an explicit constraint xi ≥ 0 and a nonpositive variable if there is a constraint xi ≤ 0 in (P); all remaining variables are called free variables. For the moment this classification of the variables is sufficient for our goal. But it may be useful to discuss the role of bounds on the variables. In this proof we consider any constraint of the form xi ≥ ℓ or xi ≤ u, with ℓ and u nonzero, as an inequality constraint. If the problem requires a variable xi to satisfy ℓ ≤ xi ≤ u then we can save one constraint by a simple shift of xi: defining x′i := xi − ℓ, the new variable is nonnegative and is bounded above by x′i ≤ u − ℓ.
Introducing a surplus vector x2 , we can write the inequality constraints as
B0 x0 + B1 x1 − x2 = b1 ,
x2 ≥ 0.
The constraints in the problem are then represented by the equality system
    [ A0  A1      0       ] [x0]     [b0]
    [ B0  B1   −I_{m−m0}  ] [x1]  =  [b1] ,      x1 ≥ 0,  x2 ≥ 0,
                             [x2]
where Im−m0 denotes the identity matrix of size (m − m0 ) × (m − m0 ). We now have
m equality constraints and n + m − m0 variables. Grouping together the nonnegative
variables, we may write the last system as
             [x0]     [b0]            [x1]
    [F  G]   [z ]  =  [b1] ,    z  =  [x2]  ≥ 0,
where x0 contains the free variables, as before, and the variables in z are nonnegative.
Note that, as a consequence of the above rank condition, the matrix [F G] has full
row rank. The size of F is m × n0 and the size of G is m × (n − n0 + m − m0 ).
Let us denote the rank of F by r. Then we obviously have r ≤ n0. Now, using
Gaussian elimination, we can express r free variables in the remaining variables. We
simply have to pivot on free variables as long as possible. So, as long as free variables
occur in the problem formulation we choose a free variable and a constraint in which
it occurs. Then, using this (equality) constraint, we express the free variable in the
other variables and by substitution eliminate it from the other constraints and from
the objective function. Since F has rank r, we can do this r times, and after reordering
variables and equations if necessary, the constraints get the form
    [ Ir  H  Dr ] [x̄0]     [dʳ]            [x̄0]
    [ 0   0  D  ] [x̃0]  =  [d ] ,    x0 =  [x̃0] ,   z ≥ 0,                     (D.1)
                  [z ]
where Ir is the r × r identity matrix, which is multiplied with x̄0 , the vector of the
eliminated free variables, and H is an r × (n0 − r) matrix, which is multiplied with
x̃0 , the vector of free variables that are not eliminated; the columns of Dr and D
correspond to the nonnegative variables in z. Moreover, since the variables x̄0 have
been eliminated from the objective function, there exist vectors cH and cD such that
the objective function has the form
    cHᵀ x̃0 + cDᵀ z.                                                             (D.2)
We are left with m equalities. The first r equalities express the free variables in x̄0 in
the remaining variables, while the remaining m − r equalities contain no free variables.
Observe that the first r equalities do not impose a condition on the feasibility of the
vector z; they simply tell us how the values of the free variables in x̄0 can be calculated
from the remaining variables.
We conclude that the problem is feasible if and only if the system
    Dz = d,   z ≥ 0                                                             (D.3)
is feasible. Assuming this, for any z satisfying (D.3) we can choose the vector x̃0
arbitrarily and then compute x̄0 such that the resulting vector satisfies (D.1). So fixing
z, and hence also fixing its contribution cDᵀ z to the objective function (D.2), we can
make the objective value arbitrarily small if the vector cH is nonzero. Since the variables
in x̄0 do not occur in the objective function, it follows from this that the problem is
unbounded if cH is nonzero.
So, if the problem is not unbounded then cH = 0. In that case it remains to solve
the problem
    (P′)     min { cDᵀ z : Dz = d, z ≥ 0 },
where D is an (m − r) × (n − n0 + m − m0 ) matrix and this matrix has rank m − r.
Note that (P ′ ) is in standard format.
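The pivoting step of this section is easy to mimic numerically. The following sketch (added here, with made-up data, not from the original text) takes a system [F G][x0; z] = b with a single free variable, eliminates it by Gaussian elimination, and exposes the reduced system Dz = d of (D.3) together with the expression of x0 in the remaining variables, as in (D.1).

    import numpy as np

    F = np.array([[1.0], [2.0]])             # column of the free variable x0
    G = np.array([[1.0, 0.0], [0.0, 1.0]])   # columns of the nonnegative variables z
    b = np.array([3.0, 4.0])

    M = np.hstack([F, G])                    # full constraint matrix [F G]
    # Pivot on the free variable in the first row, then eliminate it from the rest.
    piv = M[0, 0]
    M[0, :] /= piv; b[0] /= piv
    for i in range(1, M.shape[0]):
        factor = M[i, 0]
        M[i, :] -= factor * M[0, :]
        b[i] -= factor * b[0]

    # The last row no longer involves x0: it is the reduced system D z = d of (D.3);
    # the first row expresses x0 in z, as in (D.1).
    D, d = M[1:, 1:], b[1:]
    print("D =", D, " d =", d)
    print(f"x0 = {b[0]} - {M[0, 1:]} @ z")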
D.3
Removal of equality constraints
We now show how problem (P ′ ) can be reduced to canonical form. This goes by using
the same pivoting procedure as above. Choose a variable and an equality constraint in
which it occurs. Use the constraint to express the chosen variable in the other variables
and then eliminate this variable from the other constraints and the objective function.
Since D has rank m − r we can repeat this process m − r times and then we are left
with expressions for the m − r eliminated variables in the remaining (nonnegative)
variables. The number of the remaining variables is
n − n0 + m − m0 − (m − r) = n − n0 + r − m0 .
Now the nonnegativity conditions on the m − r eliminated variables result in m − r
inequality constraints for the remaining n − n0 + r − m0 variables. So we are left with
m − r inequality constraints that contain n − n0 + r − m0 variables. The sum of these
numbers being n + m − n0 − m0 , the theorem has been proved.
✷
Before giving an example of the above reduction we make some observations.
Remark D.2 When dealing with an LO problem, it is most often desirable to have
an economical representation of the problem. Theorem D.1 implies that whenever the
model contains equality constraints or free variables, then the size of the constraint
matrix can be reduced by transforming the problem to a canonical form. As a
consequence, when we consider the dimension of the constraint matrix as a measure of
the size of the model, then any minimal representation of the problem has a canonical
form. Of course, here it is assumed that in any such representation, nonpositive
variables are replaced by nonnegative variables and ≤ inequalities by ≥ inequalities;
these transformations do not change the dimension of the constraint matrix. In this
connection it may be useful to point out that the representation obtained by the
transformation in the proof of Theorem D.1 may be far from a minimal representation.
Any claim of this type is poorly founded. For example, if the given problem is infeasible
then a representation with one constraint and one variable exists. But to find out
whether the problem is infeasible one really has to solve it.
Remark D.3 It may happen that after the above transformations we are left with a
canonical problem
    (P)     min { cᵀx : Ax ≥ b, x ≥ 0 },
for which the matrix A has a zero row. In that case we can reduce the problem further.
If the i-th row of A is zero and bi ≤ 0 then the i-th row of A and the i-th entry of b
can be removed. If bi > 0 then we may decide that the problem is infeasible.
Remark D.4 Also if A has a zero column further reduction is possible. If the j-th
column of A is zero and cj > 0 then we have xj = 0 in any optimal solution and this
column and the corresponding entry of c can be deleted. If cj < 0 then the problem is
unbounded. Finally, if cj = 0 then xj may be given any (nonnegative) value. For the
further analysis of the problem we may delete the j-th column of A and the entry cj
in c.
Example D.5 By way of example we consider the problem
    (EP)     max { y1 + y2 : −1 ≤ y1 ≤ 1, y2 ≤ 1 }.                             (D.4)
This problem has two variables and three constraints, so the constraint matrix has size
3 × 2. Since the two variables are free (cf. Footnote 2), Theorem D.1 guarantees the
existence of a canonical description of the problem for which the sum of the numbers
of rows and columns in the constraint matrix is at most 3 (= 5 − 2). Following the
scheme of the proof of Theorem D.1 we construct such a canonical formulation. First,
by introducing nonnegative slack variables for the three inequality constraints, we
change all constraints into equality constraints:
    −y1      + s1           = 1
     y1           + s2      = 1
          y2           + s3 = 1.
The free variables y1 and y2 can be eliminated by using
    y1 = s1 − 1
    y2 = 1 − s3 ,
and since y1 + y2 = s1 − s3 we obtain the equivalent problem
max {s1 − s3 : s1 + s2 = 2, s1 , s2 , s3 ≥ 0} .
By elimination of s2 this reduces to
max {s1 − s3 : s1 ≤ 2, s1 , s3 ≥ 0} .
(D.5)
The problem is now reduced to the dual canonical form, as given by (2.2), with the
following constraint matrix A, right-hand side vector c and objective vector b:
" #
"
#
h i
1
1
A=
, c= 2 , b=
.
0
−1
Note that the constraint matrix in this problem has size 2 × 1, and the sum of the
dimensions is 3, as expected.
♦
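The reduction of Example D.5 can be checked with any LP solver; the sketch below (added here, assuming SciPy's linprog is available) solves the original problem (EP) and the reduced problem (D.5) and recovers y from the optimal s via y1 = s1 − 1, y2 = 1 − s3.

    import numpy as np
    from scipy.optimize import linprog

    # (EP): max y1 + y2  s.t.  -1 <= y1 <= 1,  y2 <= 1  (y2 free from below)
    ep = linprog(c=[-1.0, -1.0], bounds=[(-1, 1), (None, 1)])

    # (D.5): max s1 - s3  s.t.  s1 <= 2,  s1 >= 0,  s3 >= 0
    d5 = linprog(c=[-1.0, 1.0], A_ub=[[1.0, 0.0]], b_ub=[2.0], bounds=(0, None))

    s1, s3 = d5.x
    print("optimal value of (EP):", -ep.fun)        # 2
    print("optimal value of (D.5):", -d5.fun)       # 2
    print("recovered y:", (s1 - 1.0, 1.0 - s3))     # (1.0, 1.0)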
In the above example the optimal solution y = (1, 1) is unique. We consider below
two modifications of the sample problem (EP ) by changing the objective function. In
the first modification we use the objective function y1 ; then the optimal set consists
of all y = (1, y2 ) with y2 ≤ 1. The optimal solution is no longer unique. The second
modification has objective function y1 − y2 ; then the problem is unbounded, as can
easily be seen.
Example D.6 In this example we consider the problem
    max { y1 : −1 ≤ y1 ≤ 1, y2 ≤ 1 }.                                           (D.6)
As in the previous example we can introduce nonnegative slack variables s1 , s2 and s3
and then eliminate the variables y1 , y2 and s2 , arriving at the canonical problem
    max { s1 : s1 ≤ 2, s1, s3 ≥ 0 }.                                            (D.7)
Here we have replaced the objective y1 = s1 − 1 simply by s1 , thereby omitting the
constant −1, which is irrelevant for the optimization. The dependence of the eliminated
variables on the variables in this problem is the same as in the previous example:
    y1 = s1 − 1
    y2 = 1 − s3
    s2 = 2 − s1 .
The constraint matrix A and the right-hand side vector c in the dual canonical
formulation are the same as before; only the objective vector b has changed:
         [1]                     [1]
    A =  [0] ,   c = [2] ,   b = [0] .
♦
Example D.7 Finally we consider the unbounded problem
    max { y1 − y2 : −1 ≤ y1 ≤ 1, y2 ≤ 1 }.                                      (D.8)
In this case the optimal set is empty. To avoid repetition we immediately state the
canonical model:
    max { s1 + s3 : s1 ≤ 2, s3 ≥ 0 }.                                           (D.9)
The dependence of the eliminated variables on the variables in this problem is
the same as in the previous example. The matrix A and vectors c and b are now
         [1]                     [1]
    A =  [0] ,   c = [2] ,   b = [1] .
♦
Appendix E
The Dikin step algorithm
E.1
Introduction
In this appendix we reconsider the self-dual problem
    (SP)     min { qᵀz : Mz ≥ −q, z ≥ 0 },                                      (E.1)
as given by (2.16) and we present a simple algorithm for solving (SP ) different from
the full-Newton step algorithm of Section 3. Recall that we may assume without loss
of generality that x = e is feasible and s(e) = M e + q = e, so e is the point on the
central path of (SP ) corresponding to the value 1 of the barrier parameter. Moreover,
at this point the objective value equals n, the order of the skew-symmetric matrix M .
To avoid the trivial case n = 1 (when M = 0), we assume below that n ≥ 2.
The algorithm can be described roughly as follows. Starting at x0 = e the algorithm
approximately follows the central path until the objective value reaches some (small)
target value ε. This is achieved by moving from x0 along a direction — more or less
tangent to the central path — to the next iterate x1 , in such a way that x1 is close to
the central path again, but with a smaller objective value. Then we repeat the same
procedure until the objective has become small enough.
In the next section we define the search direction used in the algorithm.1 Then,
in Section E.3 the algorithm is defined and in subsequent sections the algorithm is
analyzed. This results in an iteration bound, in Section E.5.
E.2
Search direction
Let x be a positive solution of (SP ) such that its surplus vector s = s(x) is positive,
and let ∆x denote a displacement in the x-space. For the moment we neglect the
nonnegativity conditions in (SP ). Then, the new iterate x+ is given by
x+ := x + ∆x,
1 After the appearance of Karmarkar's paper in 1984, Barnes [34] and Vanderbei, Meketon and Freedman [279] proposed a simplified version of Karmarkar's algorithm. Later, their algorithm appeared to be just a rediscovery of the primal affine-scaling method proposed by Dikin [63] in 1967. See also Barnes [35]. The search direction used in this chapter can be considered as a primal-dual variant of the affine-scaling direction of Dikin (cf. the footnote on page 339) and is therefore named the Dikin direction. It was first proposed by Jansen, Roos and Terlaky [156].
and the new surplus vector s+ follows from
s+ := s(x+ ) = M (x + ∆x) + q = s + M ∆x.
The displacement ∆s in the s-space is simply given by
∆s = s+ − s = M ∆x,
and, hence, the two displacements are related by
    M∆x − ∆s = 0.                                                               (E.2)
This implies, by the orthogonality property (2.22), that ∆x and ∆s are orthogonal:
    (∆x)ᵀ∆s = (∆x)ᵀM∆x = 0.                                                     (E.3)
The inequality constraints in (SP ) require that
x + ∆x ≥ 0,
s + ∆s ≥ 0.
In fact, we want to stay in the interior of the feasible region, so we need to find
displacements ∆x and ∆s such that
x + ∆x > 0,
s + ∆s > 0.
Following an idea of Dikin [63, 65], we replace the nonnegativity conditions by requiring
that the next iterates (x + ∆x, s + ∆s) belong to a suitable ellipsoid. We define this
ellipsoid by requiring that
    ‖ ∆x/x + ∆s/s ‖ ≤ 1,                                                        (E.4)
and call this ellipsoid in IR2n the Dikin ellipsoid.
Remark E.1 It may be noted that when there are no additional conditions on the
displacements ∆x and ∆s, then the Dikin ellipsoid is highly degenerate in the sense that
it contains a linear space. For then the equation s∆x + x∆s = 0 determines an n-dimensional
linear space that is contained in it. However, when intersecting the Dikin ellipsoid with the
linear space (E.2), we get a bounded set. This can be seen as follows. The pair (∆x, ∆s)
belongs to the Dikin ellipsoid if and only if (E.4) holds. Now (E.4) can be rewritten as
    ‖ (s∆x + x∆s) / (xs) ‖ ≤ 1.
By substitution of ∆s = M∆x this becomes
    ‖ (s∆x + xM∆x) / (xs) ‖ ≤ 1,
which is equivalent to
    ‖ (XS)⁻¹ (S + XM) ∆x ‖ ≤ 1.
The matrix (XS)−1 (S + XM ) is nonsingular, and hence ∆x is bounded. See also Exercise 9
(page 29) and Exercise 113 (page 453).
•
Our aim is to minimize the objective value qᵀx = xᵀs. The new objective value is
    (x + ∆x)ᵀ(s + ∆s) = xᵀs + xᵀ∆s + sᵀ∆x.
Here we have used that ∆x and ∆s are orthogonal, from (E.3). Now minimizing
the new objective value over the Dikin ellipsoid amounts to solving the following
optimization problem:
    min { sᵀ∆x + xᵀ∆s : M∆x − ∆s = 0,  ‖ ∆x/x + ∆s/s ‖ ≤ 1 } .                  (E.5)
We proceed by showing that this problem uniquely determines the search direction
vectors. For this purpose we rewrite (E.5) as follows.
    min { (xs)ᵀ( ∆x/x + ∆s/s ) : M∆x − ∆s = 0,  ‖ ∆x/x + ∆s/s ‖ ≤ 1 } .         (E.6)
The vector
    ξ := ∆x/x + ∆s/s
must belong to the unit ball. When we neglect the affine constraint ∆s = M∆x in
(E.6) we get the relaxation
    min { (xs)ᵀξ : ‖ξ‖ ≤ 1 } .
This problem has a trivial (and unique) solution, namely
    ξ = − xs/‖xs‖ .
Thus, if we can find ∆x and ∆s such that
    ∆x/x + ∆s/s = − xs/‖xs‖                                                     (E.7)
    M∆x − ∆s = 0                                                                (E.8)
then ∆x and ∆s will solve (E.5). Multiplying both sides of (E.7) with xs yields
    s∆x + x∆s = − x²s²/‖xs‖ .                                                   (E.9)
Now substituting (E.8) we get²,³
    (S + XM) ∆x = − x²s²/‖xs‖ .
2 As usual, X = diag(x) and S = diag(s).
3 Exercise 113 If we define d := √(x/s) then show that the Dikin step ∆x can be rewritten as
    ∆x = − D (I + DMD)⁻¹ x^{3/2} s^{3/2} / ‖xs‖ .
Thus we have found the solution of (E.5), namely
    ∆x = − (S + XM)⁻¹ x²s²/‖xs‖                                                 (E.10)
    ∆s = M∆x.                                                                   (E.11)
We call ∆x the Dikin direction or Dikin step at x for the self-dual model (SP ). In the
next section we present an algorithm that is based on the use of this direction, and in
subsequent sections we prove that this algorithm solves (SP ) in polynomial time.
E.3
Algorithm using the Dikin direction
The reader should be aware that we have so far not discussed whether the Dikin step
yields a feasible point. Before stating our algorithm we need to deal with this. For the
moment it suffices to point out that in the algorithm we use a step-size parameter α.
Starting at x we move in the direction along the Dikin step ∆x to x + α∆x. The value
of α is specified later on. The algorithm can now be described as follows.
Dikin Step Algorithm for the Self-dual Model
Input:
An accuracy parameter ε > 0;
a step-size parameter α, 0 < α ≤ 1;
x0 > 0 such that s(x0 ) > 0.
begin
x := x0 ; s := s(x);
while xT s ≥ ε do
begin
x := x + α∆x (with ∆x from (E.10));
s := s(x);
end
end
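For concreteness, the algorithm can be sketched in a few lines of Python. This is an added illustration, not the authors' code; the skew-symmetric matrix M below is made up, and q is chosen as q = e − Me so that z = e is strictly feasible with s(e) = e.

    import numpy as np

    def dikin_step_algorithm(M, q, eps=1e-2, tau=2.0):
        n = len(q)
        alpha = 1.0 / (tau * np.sqrt(n))             # default step size of Theorem E.7
        x = np.ones(n)                               # start at x = e (on the central path)
        s = M @ x + q
        while x @ s >= eps:
            X, S = np.diag(x), np.diag(s)
            dx = -np.linalg.solve(S + X @ M, x**2 * s**2) / np.linalg.norm(x * s)  # (E.10)
            x = x + alpha * dx
            s = M @ x + q                            # surplus vector s(x)
        return x, s

    # A small made-up skew-symmetric M; q = e - M e makes z = e strictly feasible.
    M = np.array([[ 0.0,  1.0, -2.0],
                  [-1.0,  0.0,  1.0],
                  [ 2.0, -1.0,  0.0]])
    q = np.ones(3) - M @ np.ones(3)
    x, s = dikin_step_algorithm(M, q)
    print("q^T x =", q @ x, " x*s =", x * s)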
Below we analyze this algorithm and provide a default value for the step-size
parameter α for which the Dikin step is always feasible. This makes the algorithm
well defined. In the analysis of the algorithm we need a measure for the ‘distance’ of
an iterate x to the central path. To this end, for each positive feasible vector x with
s(x) > 0, we use the number δc(x) as introduced earlier in (3.20):
    δc(x) := max(xs(x)) / min(xs(x)) .                                          (E.12)
Below, in Theorem E.8 we show that the algorithm needs no more than O(τ n)
iterations to produce a solution x with xᵀs(x) ≤ ε, where τ depends on x0 according to
    τ = max{ 2, δc(x0) } .
Recall that it may be assumed without loss of generality that x0 lies on the central
path, in which case δc (x0 ) = 1 and τ = 2.
E.4
Feasibility, proximity and step-size
We proceed by a condition on the step-size that guarantees the feasibility of the new
iterates. Let us say that the step-size α is feasible if the new iterate and its surplus
vector are positive. Then we may state the following result. In this lemma, and further
on, we simply write s for s(x).
Lemma E.2 Let α ≥ 0, xα = x + α∆x and sα = s + α∆s. If ᾱ is such that xα sα > 0
for all α satisfying 0 ≤ α ≤ ᾱ, then the step-size ᾱ is feasible.
Proof: If ᾱ satisfies the hypothesis of the lemma then the coordinates of xα and sα
cannot vanish for any α ∈ [0, ᾱ]. Hence, since x0 s0 > 0, by continuity, xα and sα must
be positive for any such α.
✷
We use the superscript ⁺ to refer to entities after the Dikin step of size α at x:
    x⁺ := x + α∆x,      s⁺ := s + α∆s.
Consequently,
    x⁺s⁺ = (x + α∆x)(s + α∆s) = xs + α (s∆x + x∆s) + α²∆x∆s.
Using (E.9), we obtain
    x⁺s⁺ = xs − α x²s²/‖xs‖ + α²∆x∆s.                                           (E.13)
Recall that the objective value is given by q T x = xT s. In the next lemma we investigate
the reduction of the objective value during a feasible Dikin step with size α.
Lemma E.3 If the step-size α is feasible then
    (x⁺)ᵀ s⁺ ≤ (1 − α/√n) xᵀs.
Proof: Using (E.13) and the fact that ∆x and ∆s are orthogonal, the objective value
(x⁺)ᵀs⁺ after the step can be expressed as follows.
    (x⁺)ᵀ s⁺ = xᵀs − α eᵀ( x²s²/‖xs‖ ) = xᵀs − α ‖xs‖ .
The Cauchy–Schwarz inequality implies
    xᵀs = eᵀ(xs) ≤ ‖e‖ ‖xs‖ = √n ‖xs‖ .
Substitution gives
    (x⁺)ᵀ s⁺ ≤ (1 − α/√n) xᵀs.
Hence the lemma follows.
✷
Now let τ ≥ 1 be some constant. We assume that we are given a feasible x such that
δc (x) ≤ τ . Our aim is to find a step-size α such that these properties are maintained
after the Dikin step.
One easily verifies that δc (x) ≤ τ holds if and only if there exist positive numbers
τ1 and τ2 such that
τ1 e ≤ xs ≤ τ2 e,
with τ2 = τ τ1 .
Obvious choices are τ1 = min(xs) and τ2 = max(xs). Only then one has δc (x) = τ . In
the sequel we assume that x is feasible and δc (x) ≤ τ . Note that this implies x > 0
and s > 0.
Lemma E.4 If the step-size α satisfies
    α ≤ 1/(τ√n),                                                                (E.14)
then the map
    t ↦ t − α t²/‖xs‖                                                           (E.15)
is monotonically increasing for t ∈ [0, τ2].
Proof: The lemma holds if the derivative of the map in the lemma is nonnegative
for t ∈ [0, τ2]. So we need to show that 1 − 2αt/‖xs‖ ≥ 0 for t ∈ [0, τ2]. Since n ≥ 2
we have
    α ≤ 1/(τ√n) = τ1/(τ2√n) ≤ (τ1√n)/(2τ2) = ‖τ1e‖/(2τ2) ≤ ‖xs‖/(2τ2) .         (E.16)
Hence, using t ∈ [0, τ2], we may write
    2αt/‖xs‖ ≤ 2ατ2/‖xs‖ ≤ 1.
This implies the lemma.
✷
In the sequel we assume that α satisfies (E.14). By Lemma E.4 the map (E.15) is
then monotonically increasing for t ≤ τ2 . Since τ1 e ≤ xs ≤ τ2 e, by applying this map
to the components of xs we get
    (τ1 − α τ1²/‖xs‖) e ≤ xs − α x²s²/‖xs‖ ≤ (τ2 − α τ2²/‖xs‖) e.
Substitution into (E.13) gives
    (τ1 − α τ1²/‖xs‖) e + α²∆x∆s ≤ x⁺s⁺ ≤ (τ2 − α τ2²/‖xs‖) e + α²∆x∆s,         (E.17)
thus yielding lower and upper bounds for the entries of x+ s+ .
A crucial part of the analysis consists of finding a bound for the last term in (E.17).
For that purpose we use Lemma C.4 in Appendix C. The first statement in this lemma
(with u = d⁻¹∆x and v = d∆s, where d = √(x/s)) gives
    ‖∆x∆s‖∞ = ‖ (d⁻¹∆x)(d∆s) ‖∞ ≤ (1/4) ‖ d⁻¹∆x + d∆s ‖² = (1/4) ‖ (s∆x + x∆s)/√(xs) ‖² .
Now using (E.9) we get
    ‖∆x∆s‖∞ ≤ (1/4) ‖ √(xs) · xs/‖xs‖ ‖² ≤ (1/4) ‖xs‖∞ ‖ xs/‖xs‖ ‖² = (1/4) ‖xs‖∞ ≤ τ2/4 .   (E.18)
Lemma E.5 When assuming (E.14), the step size α is feasible if
    α ≤ √( 2(2τ − 1) ) / τ .                                                    (E.19)
Proof: Using (E.18) we derive from the left-hand side inequality in (E.17) that
    x⁺s⁺ ≥ (τ1 − α τ1²/‖xs‖) e − α² (τ2/4) e.                                   (E.20)
The step size α is feasible if and only if x⁺s⁺ is a positive vector. Due to (E.14) we
have α ≤ ‖xs‖/(2τ2) (cf. (E.16)). Hence x⁺s⁺ certainly is positive if
    τ1 − τ1²/(2τ2) − α² τ2/4 = τ1 (1 − 1/(2τ)) − α² τ2/4 > 0.
One easily verifies that the last inequality is equivalent to the inequality in the lemma.
✷
Lemma E.6 4 Assuming (E.14), (E.19), and δc(x) ≤ τ, let
    α ≤ ( 4(τ − 1)/(τ + 1) ) · 1/(τ√n) .
Then x⁺ is feasible and δc(x⁺) ≤ τ.
Proof: Obviously, (E.20) yields a lower bound for min(x+ s+ ). In the same way,
by using (E.18) once more, the right-hand side inequality in (E.17) yields an upper
bound, namely
    x⁺s⁺ ≤ (τ2 − α τ2²/‖xs‖) e + α² (τ2/4) e.                                   (E.21)
Now it follows from (E.20) and (E.21) that
    δc(x⁺) ≤ ( τ2 − α τ2²/‖xs‖ + α² τ2/4 ) / ( τ1 − α τ1²/‖xs‖ − α² τ2/4 )
           = τ2 ( 1 − α τ2/‖xs‖ + α²/4 ) / ( τ1 ( 1 − α τ1/‖xs‖ − α² τ/4 ) )
           = τ ( 1 + ( α(τ1 − τ2)/‖xs‖ + α²(τ + 1)/4 ) / ( 1 − α τ1/‖xs‖ − α² τ/4 ) ) .
4 In the previous edition of this book our estimate for δc(x⁺) contained a technical error. We kindly acknowledge the authors Zoltán and Szabolcs of [322] for pointing this out.
This makes clear that δc(x⁺) ≤ τ holds if
    α(τ1 − τ2)/‖xs‖ + α²(τ + 1)/4 ≤ 0.
Using τ1 ≤ τ2 and ‖xs‖ ≤ τ2√n, one easily obtains the upper bound for α in the
lemma.
✷
Note that if τ ≥ 2 then 4(τ − 1)/ (τ + 1) > 1. Hence the bound in Lemma E.6
is then weaker than the bounds in Lemma E.4 and Lemma E.5. Therefore, without
further proof we may state the main result of this section.
Theorem E.7 Let τ ≥ 2 and α = 1/(τ√n). Then α is feasible, and δc(x) ≤ τ implies
δc(x⁺) ≤ τ.
E.5
Convergence analysis
We are now ready for deriving an upper bound for the number of iterations needed
by the algorithm.
Theorem E.8 Let τ := max{2, δc(x0)} and α = 1/(τ√n). Then the Dikin Step
Algorithm for the Self-dual Model requires at most
    τ n log( qᵀx0 / ε )
iterations. The output is a feasible solution x such that δc (x) ≤ τ and q T x ≤ ε.
Proof: Initially we are given a feasible x = x0 > 0 such that δc (x) ≤ τ . Since τ ≥ 2
these properties are maintained during the execution of the algorithm, by Theorem
E.7. Initially the objective value equals q T x0 . Each iteration reduces the objective
value by a factor 1 − 1/(nτ ), by Lemma E.3. Hence, after k iterations the objective
value is smaller than ε if
    ( 1 − 1/(nτ) )^k  qᵀx0 ≤ ε.
Taking logarithms, this becomes
    k log( 1 − 1/(nτ) ) + log( qᵀx0 ) ≤ log ε.
Since
    log( 1 − 1/(nτ) ) ≤ −1/(nτ),
this certainly holds if k satisfies
    −k/(nτ) ≤ −log( qᵀx0 ) + log ε = −log( qᵀx0 / ε ).
This implies the theorem.
✷
Example E.9 In this example we demonstrate the behavior of the Dikin Step
Algorithm by applying it to the problem (SP ) in Example I.7, as given in (2.19)
(page 23). The same problem was solved earlier by the Full-Newton Step Algorithm
in Example I.38.
We initialize the algorithm with z = e. Then Theorem E.8, with τ = 2 and n = 5,
yields that the algorithm requires at most
    10 log(5/ε)
iterations. For ε = 10−2 we have log (5/ε) = log 500 = 6.2146, and we get 63 as an
upper bound for the number of iterations. When running the algorithm with this ε
the actual number of iterations is 58. The output of the algorithm is
z = (1.5985, 0.0025, 0.7998, 0.8005, 0.0020)
and
s(z) = (0.0012, 0.8005, 0.0025, 0.0025, 1.0000).
The left plot in Figure E.1 shows how the coordinates of the vector z develop in the
course of the algorithm. The right plot does the same for the coordinates of the surplus
vector s = s(z).

Figure E.1   Output of the Dikin Step Algorithm for the problem in Example I.7 (left: the coordinates of z against the iteration number; right: the coordinates of the surplus vector s(z) against the iteration number).

Observe that z and s(z) converge to the same solution as found in
Example I.38 by the Full-Newton Step Algorithm, but the number of iterations is
higher.
♦
Bibliography
[1] I. Adler, N.K. Karmarkar, M.G.C. Resende, and G. Veiga. Data structures and programming techniques for the implementation of Karmarkar’s algorithm. ORSA J. on
Computing, 1:84–106, 1989.
[2] I. Adler, N.K. Karmarkar, M.G.C. Resende, and G. Veiga. An implementation of Karmarkar’s algorithm for linear programming. Mathematical Programming, 44:297–335,
1989. Errata in Mathematical Programming, 50:415, 1991.
[3] I. Adler and R.D.C. Monteiro. Limiting behavior of the affine scaling continuous
trajectories for linear programming problems. Mathematical Programming, 50:29–51,
1991.
[4] I. Adler and R.D.C. Monteiro. A geometric view of parametric linear programming.
Algorithmica, 8:161–176, 1992.
[5] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The Design and Analysis of Computer
Algorithms. Addison-Wesley, Reading, Mass., 1974.
[6] M. Akgül. A note on shadow prices in linear programming. J. Operational Research
Society, 35:425–431, 1984.
[7] A. Altman and K.C. Kiwiel. A note on some analytic center cutting plane methods
for convex feasibility and minimization problems. Computational Optimization and
Applications, 5, 1996.
[8] E.D. Andersen. Finding all linearly dependent rows in large-scale linear programming.
Optimization Methods and Software, 6:219–227, 1995.
[9] E.D. Andersen and K.D. Andersen. Presolving in linear programming. Mathematical
Programming, 71:221–245, 1995.
[10] E.D. Andersen, J. Gondzio, Cs. Mészáros, and X. Xu. Implementation of interior
point methods for large scale linear programming. In T. Terlaky, editor, Interior Point
Methods of Mathematical Programming, pp. 189–252. Kluwer Academic Publishers,
Dordrecht, The Netherlands, 1996.
[11] E.D. Andersen and Y. Ye. Combining interior-point and pivoting algorithms for linear
programming. Management Science, 42(12):1719–1731, 1996.
[12] K.D. Andersen. A modified Schur complement method for handling dense columns
in interior-point methods for linear programming. ACM Transactions Mathematical
Software, 22(3):348–356, 1996.
[13] E.D. Andersen, C. Roos, T. Terlaky, T. Trafalis and J.P. Warners. The use of low-rank
updates in interior-point methods. In: Ed. Y. Yuan, Numerical Linear Algebra and
Optimization, pp. 1–12. Science Press, Beijing, China, 2004.
[14] K.M. Anstreicher. A monotonic projective algorithm for fractional linear programming.
Algorithmica, 1(4):483–498, 1986.
[15] K.M. Anstreicher. A strengthened acceptance criterion for approximate projections in
Karmarkar’s algorithm. Operations Research Letters, 5:211–214, 1986.
[16] K.M. Anstreicher. A combined Phase I – Phase II projective algorithm for linear programming. Mathematical Programming, 43:209–223, 1989.
[17] K.M. Anstreicher. Progress in interior point algorithms since 1984. SIAM News, 22:12–
14, March 1989.
[18] K.M. Anstreicher. The worst-case step in Karmarkar’s algorithm. Mathematics of
Operations Research, 14:294–302, 1989.
[19] K.M. Anstreicher. Dual ellipsoids and degeneracy in the projective algorithm for linear
programming. Contemporary Mathematics, 114:141–149, 1990.
[20] K.M. Anstreicher. A standard form variant and safeguarded linesearch for the modified
Karmarkar algorithm. Mathematical Programming, 47:337–351, 1990.
[21] K.M. Anstreicher. A combined phase I – phase II scaled potential algorithm for linear
programming. Mathematical Programming, 52:429–439, 1991.
[22] K.M. Anstreicher. On the performance of Karmarkar’s algorithm over a sequence of
iterations. SIAM J. on Optimization, 1(1):22–29, 1991.
[23] K.M. Anstreicher. On interior algorithms for linear programming with no regularity
assumptions. Operations Research Letters, 11:209–212, 1992.
[24] K.M. Anstreicher. Potential reduction algorithms. In T. Terlaky, editor, Interior Point
Methods of Mathematical Programming, pp. 125–158. Kluwer Academic Publishers,
Dordrecht, The Netherlands, 1996.
[25] K.M. Anstreicher and R.A. Bosch. Long steps in a O(n3 L) algorithm for linear programming. Mathematical Programming, 54:251–265, 1992.
[26] K.M. Anstreicher and R.A. Bosch. A new infinity-norm path following algorithm for
linear programming. SIAM J. on Optimization, 5:236–246, 1995.
[27] M. Arioli, I.S. Duff, and P.P.M. de Rijk. On the augmented system approach to sparse
least-squares problems. Numer. Math., 55:667–684, 1989.
[28] M.D. Asić, V.V. Kovačević-Vujčić, and M.D. Radosavljević-Nikolić. A note on limiting behavior of the projective and the affine rescaling algorithms. Contemporary
Mathematics, 114:151–157, 1990.
[29] D.S. Atkinson and P.M. Vaidya. A scaling technique for finding the weighted analytic
center of a polytope. Mathematical Programming, 57:163–192, 1992.
[30] D.S. Atkinson and P.M. Vaidya. A cutting plane algorithm that uses analytic centers.
Mathematical Programming, 69(69), 1995.
[31] O. Bahn, J.-L. Goffin, O. du Merle, and J.-Ph. Vial. A cutting plane method from
analytic centers for stochastic programming. Mathematical Programming, 69(1):45–73,
1995.
[32] Y.Q. Bai, M. Elghami, and C. Roos. A comparative study of kernel functions for
primal-dual interior-point algorithms in linear optimization. SIAM J. on Optimization,
15(1):101–128, 2004.
[33] M.L. Balinski and A.W. Tucker. Duality theory of linear programs: a constructive
approach with applications. SIAM Review, 11:499–581, 1969.
[34] E.R. Barnes. A variation on Karmarkar’s algorithm for solving linear programming
problems. Mathematical Programming, 36:174–182, 1986.
[35] E.R. Barnes. Some results concerning convergence of the affine scaling algorithm.
Contemporary Mathematics, 114:131–139, 1990.
[36] E.R. Barnes, S. Chopra, and D.J. Jensen. The affine scaling method with centering.
Technical Report, Dept. of Mathematical Sciences, IBM T. J. Watson Research Center,
P. O. Box 218, Yorktown Heights, NY 10598, USA, 1988.
[37] M.S. Bazaraa, H.D. Sherali, and C.M. Shetty. Nonlinear Programming: Theory and
Algorithms. John Wiley & Sons, New York (second edition), 1993.
[38] R. Bellman. Introduction to Matrix Analysis. Volume 12 of SIAM Classics in Applied
Mathematics. SIAM, Philadelphia, 1995.
[39] A. Ben-Israel and T.N.E. Greville. Generalized Inverses: Theory and Applications.
John Wiley & Sons, New York, USA, 1974.
[40] D.P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, Massachusetts,
1995.
[41] G. Birkhoff and S. MacLane. A Survey of Modern Algebra. Macmillan, New York,
1977.
[42] R.E. Bixby. Progress in linear programming. ORSA J. on Computing, 6:15–22, 1994.
[43] R.E. Bixby and M.J. Saltzman. Recovering an optimal LP basis from an interior point
solution. Operations Research Letters, 15(4):169–178, 1994.
[44] Å. Björk. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996.
[45] J.F. Bonnans and F.A. Potra. Infeasible path-following algorithms for linear complementarity problems. Mathematics of Operations Research, 22(2), 378–407, 1997.
[46] R.A. Bosch. On Mizuno’s rank one updating algorithm for linear programming. SIAM
J. on Optimization, 3:861–867, 1993.
[47] R.A. Bosch and K.M. Anstreicher. On partial updating in a potential reduction linear
programming algorithm of Kojima, Mizuno and Yoshise. Algorithmica, 9(1):184–197,
1993.
[48] S.E. Boyd and L. Vandenberghe. Semidefinite programming. SIAM Review, 38(1):49–
96, 1996.
[49] A.L. Brearley, G. Mitra, and H.P. Williams. Analysis of mathematical programming problems prior to applying the simplex algorithm. Mathematical Programming, 15:54–83,
1975.
[50] M.G. Breitfeld and D.F. Shanno. Computational experience with modified log-barrier
methods for nonlinear programming. Annals of Operations Research, 62:439–464, 1996.
[51] J. Brinkhuis and G. Draisma. Schiet OpT M . Special Report, Econometric Institute,
Erasmus University, Rotterdam, 1996.
[52] R.C. Buck. Advanced Calculus. International Series in Pure and Applied Mathematics.
McGraw-Hill Book Company, New York (third edition), 1978.
[53] J.R. Bunch and B.N. Parlett. Direct methods for solving symmetric indefinite systems
of linear equations. SIAM J. on Numerical Analysis, 8:639–655, 1971.
[54] S.F. Chang and S.T. McCormick. A hierarchical algorithm for making sparse matrices
sparse. Mathematical Programming, 56:1–30, 1992.
[55] V. Chvátal. Linear programming. W.H. Freeman and Company, New York, USA, 1983.
[56] S.A. Cook. The complexity of theorem-proving procedures. In Proceedings of Third
Annual ACM Symposium on Theory of Computing, pp. 151–158. ACM, New York,
1971.
[57] J.P. Crouzeix and C. Roos. On the inverse target map of a linear programming problem.
Unpublished Manuscript, University of Clermont, France, 1994.
[58] I. Csiszár. I-divergence geometry of probability distributions and minimization problems. Annals of Probability, 3:146–158, 1975.
[59] G.B. Dantzig. Linear Programming and Extensions. Princeton Univ. Press, Princeton,
New Jersey, 1963.
[60] G.B. Dantzig. Linear programming. In J.K. Lenstra, A.H.G. Rinnooy Kan, and
A. Schrijver, editors, History of mathematical programming. A collection of personal
reminiscences. CWI, North–Holland, The Netherlands, 1991.
[61] A. Deza, E. Nematollahi, R. Peyghami and T. Terlaky. The central path visits all the
vertices of the Klee-Minty cube. AdvOl-Report #2004/11. McMaster Univ., Hamilton,
Ontario, Canada.
[62] A. Deza, E. Nematollahi and T. Terlaky. How good are interior point methods? KleeMinty cubes tighten iteration-complexity bounds. AdvOl-Report #2004/20. McMaster
Univ., Hamilton, Ontario, Canada.
[63] I.I. Dikin. Iterative solution of problems of linear and quadratic programming. Doklady
Akademii Nauk SSSR, 174:747–748, 1967. Translated in Soviet Mathematics Doklady,
8:674–675, 1967.
[64] I.I. Dikin. On the convergence of an iterative process. Upravlyaemye Sistemi, 12:54–60,
1974. (In Russian).
[65] I.I. Dikin. Letter to the editor. Mathematical Programming, 41:393–394, 1988.
[66] I.I. Dikin and C. Roos. Convergence of the dual variables for the primal affine scaling
method with unit steps in the homogeneous case. J. of Optimization Theory and
Applications, 95:305–321, 1997.
[67] J. Ding and T.Y. Li. An algorithm based on weighted logarithmic barrier functions for
linear complementarity problems. Arabian J. for Science and Engineering, 15(4):679–
685, 1990.
[68] I.S. Duff. The solution of large-scale least-squares problems on supercomputers. Annals
of Operations Research, 22:241–252, 1990.
[69] I.S. Duff, N.I.M. Gould, J.K. Reid, J.A. Scott, and K. Turner. The factorization of
sparse symmetric indefinite matrices. IMA J. on Numerical Analysis, 11:181–204, 1991.
[70] A.S. El-Bakry, R.A. Tapia, and Y. Zhang. A study of indicators for identifying zero
variables in interior-point methods. SIAM Review, 36(1):45–72, 1994.
[71] A.S. El-Bakry, R.A. Tapia, and Y. Zhang. On the convergence rate of Newton interiorpoint methods in the absence of strict complementarity. Computational optimization
and Applications, 6:157-167, 1996.
[72] J.R. Evans and N.R. Baker. Degeneracy and the (mis)interpretation of sensitivity
analysis in linear programming. Decision Sciences, 13:348–354, 1982.
[73] J.R. Evans and N.R. Baker. Reply to ‘On ranging cost coefficients in dual degenerate
linear programming problems’. Decision Sciences, 14:442–443, 1983.
[74] S.-Ch. Fang and S. Puthenpura. Linear optimization and extensions: theory and algorithms. Prentice Hall, Englewood Cliffs, New Jersey 07632, 1993.
[75] J. Farkas. Theorie der einfachen Ungleichungen. J. Reine und Angewandte Mathematik,
124:1–27, 1902.
[76] A.V. Fiacco. Introduction to Sensitivity and Stability Analysis in Nonlinear Programming, volume 165 of Mathematics in Science and Engineering. Academic Press, New
York, 1983.
[77] A.V. Fiacco and G.P. McCormick. Nonlinear Programming: Sequential Unconstrained
Minimization Techniques. John Wiley & Sons, New York, 1968. Reprint: Volume 4 of
SIAM Classics in Applied Mathematics, SIAM Publications, Philadelphia, PA 19104–
2688, USA, 1990.
[78] R. Fourer and S. Mehrotra. Solving symmetric indefinite systems in an interior-point
method for linear programming. Mathematical Programming, 62:15–40, 1993.
[79] C. Fraley. Linear updates for a single-phase projective method. Operations Research
Letters, 9:169–174, 1990.
[80] C. Fraley and J.-Ph. Vial. Numerical study of projective methods for linear programming. In S. Dolecki, editor, Optimization: Proceedings of the 5th French-German
Conference in Castel-Novel, Varetz, France, October 1988, volume 1405 of Lecture Notes
in Mathematics, pp. 25–38. Springer Verlag, Berlin, West-Germany, 1989.
[81] C. Fraley and J.-Ph. Vial. Alternative approaches to feasibility in projective methods
for linear programming. ORSA J. on Computing, 4:285–299, 1992.
[82] P. Franklin. A Treatise on Advanced Calculus. John Wiley & Sons, New York (fifth
edition), 1955.
[83] R.M. Freund. An analog of Karmarkar's algorithm for inequality constrained linear
programs, with a ’new’ class of projective transformations for centering a polytope.
Operations Research Letters, 7:9–13, 1988.
[84] R.M. Freund. Theoretical efficiency of a shifted barrier function algorithm for linear
programming. Linear Algebra and Its Applications, 152:19–41, 1991.
[85] R.M. Freund. Projective transformation for interior-point algorithms, and a superlinearly convergent algorithm for the w-center problem. Mathematical Programming,
58:385–414, 1993.
[86] K.R. Frisch. Principles of linear programming—the double gradient form of the logarithmic potential method. Memorandum, Institute of Economics, University of Oslo,
Oslo, Norway, October 1954.
[87] K.R. Frisch. La resolution des problemes de programme lineaire par la methode du
potential logarithmique. Cahiers du Seminaire D’Econometrie, 4:7–20, 1956.
[88] K.R. Frisch. The logarithmic potential method for solving linear programming problems. Memorandum, University Institute of Economics, Oslo, 1955.
[89] T. Gal. Postoptimal analyses, parametric programming and related topics. McGraw-Hill Inc., New York/Berlin, 1979.
[90] T. Gal. Shadow prices and sensitivity analysis in linear programming under degeneracy,
state-of-the-art-survey. OR Spektrum, 8:59–71, 1986.
[91] D. Gale. The Theory of Linear Economic Models. McGraw–Hill, New York, USA, 1960.
[92] M.R. Garey and D.S. Johnson. Computers and Intractability: a Guide to the Theory
of NP-completeness. Freeman, San Francisco, 1979.
[93] J. Gauvin. Quelques precisions sur les prix marginaux en programmation lineaire.
INFOR, 18:68–73, 1980. (In French).
[94] A. George and J.W.-H. Liu. Computing Solution of Large Sparse Positive Definite
Systems. Prentice-Hall, Englewood Cliffs, NJ, 1981.
[95] G. de Ghellinck and J.-Ph. Vial. A polynomial Newton method for linear programming.
Algorithmica, 1(4):425–453, 1986.
[96] G. de Ghellinck and J.-Ph. Vial. An extension of Karmarkar’s algorithm for solving a
system of linear homogeneous equations on the simplex. Mathematical Programming,
39:79–92, 1987.
[97] P.E. Gill, W. Murray, M.A. Saunders, J.A. Tomlin, and M.H. Wright. On projected
Newton barrier methods for linear programming and an equivalence to Karmarkar’s
projective method. Mathematical Programming, 36:183–209, 1986.
[98] J.-L. Goffin, J. Gondzio, R. Sarkissian, and J.-Ph. Vial. Solving nonlinear multicommodity flow problems by the analytic center cutting plane method. Mathematical
Programming, 76:131–154, 1997.
[99] J.-L. Goffin, A. Haurie, and J.-Ph. Vial. Decomposition and nondifferentiable optimization with the projective algorithm. Management Science, 38(2):284–302, 1992.
[100] J.-L. Goffin, Z.-Q. Luo, and Y. Ye. Complexity analysis of an interior cutting plane for
convex feasibility problems. SIAM J. on Optimization, 6(3), 1996.
[101] J.-L. Goffin and F. Sharifi-Mokhtarian. Primal-dual-infeasible Newton
approach for the analytic center deep-cutting plane method. J. Optim. Theory Appl.,
101(1):35–58, 1999.
[102] J.-L. Goffin and J.-Ph. Vial. On the computation of weighted analytic centers and dual
ellipsoids with the projective algorithm. Mathematical Programming, 60:81–92, 1993.
[103] J.-L. Goffin and J.-Ph. Vial. Short steps with Karmarkar’s projective algorithm for
linear programming. SIAM J. on Optimization, 4:193–207, 1994.
[104] J.-L. Goffin and J.-Ph. Vial. Shallow, deep and very deep cuts in the analytic center
cutting plane method. Mathematical Programming, 84(1):89–103, 1999.
[105] D. Goldfarb and S. Mehrotra. Relaxed variants of Karmarkar’s algorithm for linear
programs with unknown optimal objective value. Mathematical Programming, 40:183–
195, 1988.
[106] D. Goldfarb and S. Mehrotra. A relaxed version of Karmarkar’s method. Mathematical
Programming, 40:289–315, 1988.
[107] D. Goldfarb and S. Mehrotra. A self-correcting version of Karmarkar’s algorithm. SIAM
J. on Numerical Analysis, 26:1006–1015, 1989.
[108] D. Goldfarb and D.X. Shaw. On the complexity of a class of projective interior point
methods. Mathematics of Operations Research, 20:116–134, 1995.
[109] D. Goldfarb and M.J. Todd. Linear Programming. In G.L. Nemhauser, A.H.G. Rinnooy
Kan, and M.J. Todd, editors, Optimization, volume 1 of Handbooks in Operations
Research and Management Science, pp. 141–170. North Holland, Amsterdam, The
Netherlands, 1989.
[110] D. Goldfarb and D. Xiao. A primal projective interior point method for linear programming. Mathematical Programming, 51:17–43, 1991.
[111] A.J. Goldman and A.W. Tucker. Theory of linear programming. In H.W. Kuhn and
A.W. Tucker, editors, Linear Inequalities and Related Systems, Annals of Mathematical
Studies, No. 38, pp. 53–97. Princeton University Press, Princeton, New Jersey, 1956.
[112] G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins University
Press, Baltimore (second edition), 1989.
[113] J. Gondzio. Presolve analysis of linear programs prior to applying the interior point
method. INFORMS J. on Computing, 9:73–91, 1997.
[114] J. Gondzio. Multiple centrality corrections in a primal-dual method for linear programming. Computational Optimization and Applications, 6:137–156, 1996.
[115] J. Gondzio, O. du Merle, R. Sarkissian, and J.-Ph. Vial. ACCPM - a library for
convex optimization based on an analytic center cutting plane method. European J. of
Operational Research, 94:206–211, 1996.
[116] J. Gondzio and T. Terlaky. A computational view of interior point methods for linear
programming. In J.E. Beasley, editor, Advances in Linear and Integer Programming,
pp. 103–185. Oxford University Press, Oxford, Great Britain, 1996.
[117] C.C. Gonzaga. A simple representation of Karmarkar’s algorithm. Technical Report,
Dept. of Systems Engineering and Computer Science, COPPE Federal University of
Rio de Janeiro, 21941 Rio de Janeiro, RJ, Brazil, May 1988.
[118] C.C. Gonzaga. An algorithm for solving linear programming problems in O(n^3 L)
operations. In N. Megiddo, editor, Progress in Mathematical Programming: Interior
Point and Related Methods, pp. 1–28. Springer Verlag, New York, 1989.
[119] C.C. Gonzaga. Conical projection algorithms for linear programming. Mathematical
Programming, 43:151–173, 1989.
[120] C.C. Gonzaga. Convergence of the large step primal affine-scaling algorithm for
primal nondegenerate linear programs. Technical Report ES–230/90, Dept. of Systems
Engineering and Computer Science, COPPE Federal University of Rio de Janeiro,
21941 Rio de Janeiro, RJ, Brazil, September 1990.
[121] C.C. Gonzaga. Large step path-following methods for linear programming, Part I:
Barrier function method. SIAM J. on Optimization, 1:268–279, 1991.
[122] C.C. Gonzaga. Large step path-following methods for linear programming, Part II:
Potential reduction method. SIAM J. on Optimization, 1:280–292, 1991.
[123] C.C. Gonzaga. Search directions for interior linear programming methods. Algorithmica, 6:153–181, 1991.
[124] C.C. Gonzaga. Path-following methods for linear programming. SIAM Review, 34(2):
167–227, 1992.
[125] C.C. Gonzaga. The largest step path following algorithm for monotone linear complementarity problems. Mathematical Programming, 76(2):309–332, 1997.
[126] C.C. Gonzaga and R.A. Tapia. On the convergence of the Mizuno–Todd–Ye algorithm
to the analytic center of the solution set. SIAM J. on Optimization, 7: 47–65, 1997.
[127] C.C. Gonzaga and R.A. Tapia. On the quadratic convergence of the simplified Mizuno–
Todd–Ye algorithm for linear programming. SIAM J. on Optimization, 7:66–85, 1997.
[128] H.J. Greenberg. An analysis of degeneracy. Naval Research Logistics Quarterly, 33:635–
655, 1986.
[129] H.J. Greenberg. The use of the optimal partition in a linear programming solution for
postoptimal analysis. Operations Research Letters, 15:179–186, 1994.
[130] O. Güler, 1994. Private communication.
[131] O. Güler. Limiting behavior of the weighted central paths in linear programming.
Mathematical Programming, 65(2):347–363, 1994.
[132] O. Güler, D. den Hertog, C. Roos, T. Terlaky, and T. Tsuchiya. Degeneracy in interior
point methods for linear programming: A survey. Annals of Operations Research,
46:107–138, 1993.
[133] O. Güler, C. Roos, T. Terlaky, and J.-Ph. Vial. Interior point approach to the theory
of linear programming. Cahiers de Recherche 1992.3, Faculté des Sciences Economiques
et Sociales, Université de Genève, Genève, Switzerland, 1992.
[134] O. Güler, C. Roos, T. Terlaky, and J.-Ph. Vial. A survey of the implications of the
behavior of the central path for the duality theory of linear programming. Management
Science, 41:1922–1934, 1995.
[135] O. Güler and Y. Ye. Convergence behavior of interior-point algorithms. Mathematical
Programming, 60(2):215–228, 1993.
[136] W. W. Hager. Updating the inverse of a matrix. SIAM Review, 31(2):221–239, June
1989.
[137] M. Halická. Analytical properties of the central path at the boundary point in linear
programming. Mathematical Programming, 84:335–355, 1999.
[138] L.A. Hall and R.J. Vanderbei. Two-thirds is sharp for affine scaling. Operations Research
Letters, 13:197–201, 1993.
[139] G.H. Hardy, J.E. Littlewood, and G. Pólya. Inequalities. Cambridge University Press,
Cambridge, UK, 1934.
[140] D. den Hertog. Interior Point Approach to Linear, Quadratic and Convex Programming,
volume 277 of Mathematics and its Applications. Kluwer Academic Publishers,
Dordrecht, The Netherlands, 1994.
[141] D. den Hertog, J.A. Kaliski, C. Roos, and T. Terlaky. A logarithmic barrier cutting
plane method for convex programming. Annals of Operations Research, 58:69–98,
1995.
[142] D. den Hertog and C. Roos. A survey of search directions in interior point methods for
linear programming. Mathematical Programming, 52:481–509, 1991.
[143] D. den Hertog, C. Roos, and T. Terlaky. A potential reduction variant of Renegar’s
short-step path-following method for linear programming. Linear Algebra and Its
Applications, 152:43–68, 1991.
[144] D. den Hertog, C. Roos, and T. Terlaky. On the monotonicity of the dual objective
along barrier paths. COAL Bulletin, 20:2–8, 1992.
[145] D. den Hertog, C. Roos, and T. Terlaky. Adding and deleting constraints in the
logarithmic barrier method for LP. In D.-Z. Du and J. Sun, editors, Advances
in Optimization and Approximation, pp. 166–185. Kluwer Academic Publishers,
Dordrecht, The Netherlands, 1994.
[146] D. den Hertog, C. Roos, and J.-Ph. Vial. A complexity reduction for the long-step
path-following algorithm for linear programming. SIAM J. on Optimization, 2:71–87,
1992.
[147] R.A. Horn and C.R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, UK, 1985.
[148] P. Huard. Resolution of mathematical programming with nonlinear constraints by the
method of centers. In J. Abadie, editor, Nonlinear Programming, pp. 207–219. North
Holland, Amsterdam, The Netherlands, 1967.
[149] P. Huard. A method of centers by upper-bounding functions with applications. In J.B.
Rosen, O.L. Mangasarian, and K. Ritter, editors, Nonlinear Programming: Proceedings
of a Symposium held at the University of Wisconsin, Madison, Wisconsin, USA, May
1970, pp. 1–30. Academic Press, New York, USA, 1970.
[150] P. Hung and Y. Ye. An asymptotically O(√n L)-iteration path-following linear programming algorithm that uses long steps. SIAM J. on Optimization, 6:570–586, 1996.
[151] B. Jansen. Interior Point Techniques in Optimization. Complexity, Sensitivity and
Algorithms, volume 6 of Applied Optimization. Kluwer Academic Publishers, Dordrecht,
The Netherlands, 1997.
[152] B. Jansen, J.J. de Jong, C. Roos, and T. Terlaky. Sensitivity analysis in linear
programming: just be careful! European J. of Operational Research, 101:15–28,
1997.
[153] B. Jansen, C. Roos, and T. Terlaky. An interior point approach to postoptimal
and parametric analysis in linear programming. Technical Report 92–21, Faculty of
Technical Mathematics and Computer Science, TU Delft, NL–2628 CD Delft, The
Netherlands, April 1992.
[154] B. Jansen, C. Roos, and T. Terlaky. A family of polynomial affine scaling algorithms
for positive semi-definite linear complementarity problems. SIAM J. on Optimization,
7(1):126–140, 1997.
[155] B. Jansen, C. Roos, and T. Terlaky. The theory of linear programming: Skew symmetric
self-dual problems and the central path. Optimization, 29:225–233, 1994.
[156] B. Jansen, C. Roos, and T. Terlaky. A polynomial Dikin-type primal-dual algorithm
for linear programming. Mathematics of Operations Research, 21:341–353, 1996.
[157] B. Jansen, C. Roos, T. Terlaky, and J.-Ph. Vial. Primal-dual algorithms for linear
programming based on the logarithmic barrier method. J. of Optimization Theory and
Applications, 83:1–26, 1994.
[158] B. Jansen, C. Roos, T. Terlaky, and J.-Ph. Vial. Long-step primal-dual target-following
algorithms for linear programming. Mathematical Methods of Operations Research,
44:11–30, 1996.
[159] B. Jansen, C. Roos, T. Terlaky, and J.-Ph. Vial. Primal-dual target-following algorithms for linear programming. Annals of Operations Research, 62:197–231, 1996.
[160] B. Jansen, C. Roos, T. Terlaky, and Y. Ye. Improved complexity using higher-order
correctors for primal-dual Dikin affine scaling. Mathematical Programming, 76:117–130,
1997.
[161] F. Jarre. Interior-point methods for classes of convex programs. In T. Terlaky, editor,
Interior Point Methods of Mathematical Programming, pp. 255–296. Kluwer Academic
Publishers, Dordrecht, The Netherlands, 1996.
[162] F. Jarre, M. Kocvara, and J. Zowe. Optimal truss design by interior-point methods.
SIAM J. on Optimization, 8(4):1084–1107, 1998.
[163] F. Jarre and M.A. Saunders. An adaptive primal-dual method for linear programming.
COAL Newsletter, 19:7–16, August 1991.
[164] J. Kaliski, D. Haglin, C. Roos, and T. Terlaky. Logarithmic barrier decomposition
methods for semi-infinite programming. International Transactions in Operations
Research, 4:285–303, 1997.
[165] N.K. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4:373–395, 1984.
[166] R.M. Karp. Reducibility among combinatorial problems. In R.E. Miller and J.W.
Thatcher, editors, Complexity of computer computations, pp. 85–103. Plenum Press,
New York, 1972.
[167] L.G. Khachiyan. A polynomial algorithm in linear programming. Doklady Akademiia
Nauk SSSR, 244:1093–1096, 1979. Translated into English in Soviet Mathematics
Doklady 20, 191–194.
[168] K.C. Kiwiel. Complexity of some cutting plane methods that use analytic centers.
Mathematical Programming, 74(1), 1996.
[169] E. Klafszky, J. Mayer, and T. Terlaky. Linearly constrained estimation by mathematical
programming. European J. of Operational Research, 34:254–267, 1989.
[170] E. Klafszky and T. Terlaky. On the ellipsoid method. Szigma, 20(2–3):196–208, 1988.
In Hungarian.
[171] E. Klafszky and T. Terlaky. The role of pivoting in proving some fundamental theorems
of linear algebra. Linear Algebra and its Applications, 151:97–118, 1991.
[172] E. de Klerk, C. Roos, and T. Terlaky. A nonconvex weighted potential function for
polynomial target following methods. Annals of Operations Research, 81:3–14, 1998.
[173] G. Knolmayer. The effects of degeneracy on cost-coefficient ranges and an algorithm
to resolve interpretation problems. Decision Sciences, 15:14–21, 1984.
[174] M. Kojima, N. Megiddo, and S. Mizuno. A primal-dual infeasible-interior-point algorithm for linear programming. Mathematical Programming, 61:263–280, 1993.
[175] M. Kojima, N. Megiddo, T. Noma, and A. Yoshise. A unified approach to interior
point algorithms for linear complementarity problems, volume 538 of Lecture Notes in
Computer Science. Springer Verlag, Berlin, Germany, 1991.
[176] M. Kojima, S. Mizuno, and T. Noma. Limiting behavior of trajectories by a continuation method for monotone complementarity problems. Mathematics of Operations
Research, 15(4):662–675, 1990.
[177] M. Kojima, S. Mizuno, and A. Yoshise. A polynomial-time algorithm for a class of
linear complementarity problems. Mathematical Programming, 44:1–26, 1989.
[178] M. Kojima, S. Mizuno, and A. Yoshise. A primal-dual interior point algorithm for
linear programming. In N. Megiddo, editor, Progress in Mathematical Programming:
Interior Point and Related Methods, pp. 29–47. Springer Verlag, New York, 1989.
[179] E. Kranich. Interior point methods for mathematical programming: A bibliography.
Discussion Paper 171, Institute of Economy and Operations Research, FernUniversität
Hagen, P.O. Box 940, D–5800 Hagen 1, West–Germany, May 1991. Available through
NETLIB, see Kranich [180].
[180] E. Kranich. Interior-point methods bibliography. SIAG/OPT Views-and-News, A
Forum for the SIAM Activity Group on Optimization, 1:11, 1992.
[181] P. Lancaster and M. Tismenetsky. The Theory of Matrices with Applications. Academic
Press, Orlando (second edition), 1985.
[182] P.D. Ling. A new proof of convergence for the new primal-dual affine scaling interior-point algorithm of Jansen, Roos and Terlaky. Working paper, University of East Anglia,
Norwich, England, 1993.
[183] P.D. Ling. A predictor-corrector algorithm. Working Paper, University of East Anglia,
Norwich, England, 1993.
[184] C.L. Liu. Introduction to Combinatorial Mathematics. McGraw-Hill Book Company,
New York, 1968.
[185] F.A. Lootsma. Numerical Methods for Nonlinear Optimization. Academic Press,
London, UK, 1972.
[186] Z.-Q. Luo. Analysis of a cutting plane method that uses weighted analytic center and
multiple cuts. SIAM J. on Optimization, 7(3):697–716, 1997.
[187] Z.-Q. Luo, C. Roos, and T. Terlaky. Complexity analysis of a logarithmic barrier
decomposition method for semi-infinite linear programming. Applied Numerical
Mathematics, 29:379–394, 1999.
[188] Z.-Q. Luo and Y. Ye. A genuine quadratically convergent polynomial interior point
algorithm for linear programming. In D.-Z. Du and J. Sun, editors, Advances in
Optimization and Approximation, pp. 235–246. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1994.
[189] I.J. Lustig. An analysis of an available set of linear programming test problems.
Computers and Operations Research, 16:173–184, 1989.
[190] I.J. Lustig. Phase 1 search directions for a primal-dual interior point method for linear
programming. Contemporary Mathematics, 114:121–130, 1990.
[191] I.J. Lustig, R.E. Marsten, and D.F. Shanno. Computational experience with a primal-dual interior point method for linear programming. Linear Algebra and Its Applications,
152:191–222, 1991.
[192] I.J. Lustig, R.E. Marsten, and D.F. Shanno. On implementing Mehrotra’s predictor-corrector interior point method for linear programming. SIAM J. on Optimization,
2:435–449, 1992.
[193] I.J. Lustig, R.E. Marsten, and D.F. Shanno. Interior point methods for linear programming: Computational state of the art. ORSA J. on Computing, 6(1):1–14, 1994.
[194] H.M. Markowitz. The elimination form of the inverse and its application to linear
programming. Management Science, 3:255–269, 1957.
[195] I. Maros and Cs. Mészáros. The role of the augmented system in interior point methods.
European J. of Operational Research, 107(3):720–736, 1998.
[196] R.E. Marsten, D.F. Shanno, and E.M. Simantiraki. Interior point methods for linear
and nonlinear programming. In I.A. Duff and A. Watson, editors, The state of the art
in numerical analysis (York, 1996), volume 63 of Inst. Math. Appl. Conf. Ser. New
Ser., pages 339–362. Oxford Univ. Press, New York, 1997.
[197] L. McLinden. The analogue of Moreau’s proximation theorem, with applications to the
nonlinear complementarity problem. Pacific J. of Mathematics, 88:101–161, 1980.
[198] L. McLinden. The complementarity problem for maximal monotone multifunctions.
In R.W. Cottle, F. Giannessi, and J.L. Lions, editors, Variational Inequalities and
Complementarity Problems, pp. 251–270. John Wiley and Sons, New York, 1980.
[199] K.A. McShane, C.L. Monma, and D.F. Shanno. An implementation of a primal-dual
interior point method for linear programming. ORSA J. on Computing, 1:70–83, 1989.
[200] N. Megiddo. Pathways to the optimal set in linear programming. In N. Megiddo,
editor, Progress in Mathematical Programming: Interior Point and Related Methods,
pp. 131–158. Springer Verlag, New York, 1989. Identical version in Proceedings of the
6th Mathematical Programming Symposium of Japan, Nagoya, Japan, pp. 1–35, 1986.
[201] N. Megiddo. On finding primal- and dual-optimal bases. ORSA J. on Computing,
3:63–65, 1991.
[202] S. Mehrotra. Higher order methods and their performance. Technical Report 90–16R1,
Dept. of Industrial Engineering and Management Science, Northwestern University,
Evanston, IL 60208, USA, 1990. Revised July 1991.
[203] S. Mehrotra. Finding a vertex solution using an interior point method. Linear Algebra
and Its Applications, 152:233–253, 1991.
[204] S. Mehrotra. Deferred rank-one updates in O(n^3 L) interior point algorithm. J. of the
Operations Research Society of Japan, 35:345–352, 1992.
[205] S. Mehrotra. On the implementation of a (primal-dual) interior point method. SIAM
J. on Optimization, 2(4):575–601, 1992.
[206] S. Mehrotra. Quadratic convergence in a primal-dual method. Mathematics of Operations Research, 18:741–751, 1993.
[207] S. Mehrotra and R.D.C. Monteiro. Parametric and range analysis for interior point
methods. Mathematical Programming, 74:65–82, 1996.
[208] S. Mehrotra and Y. Ye. On finding the optimal facet of linear programs. Mathematical
Programming, 62:497–515, 1993.
[209] O. du Merle. Mise en œuvre et développements de la méthode de plans coupants basés
sur les centres analytiques. PhD thesis, Faculté des Sciences Economiques et Sociales,
Université de Genève, 1995. In French.
[210] H.D. Mills. Marginal values of matrix games and linear programs. In H.W. Kuhn and
A.W. Tucker, editors, Linear Inequalities and Related Systems, Annals of Mathematical
Studies, No. 38, pp. 183–193. Princeton University Press, Princeton, New Jersey, 1956.
[211] J.E. Mitchell. Fixing variables and generating classical cutting planes when using an
interior point branch and cut method to solve integer programming problems. European
J. of Operational Research, 97:139–148, 1997.
[212] S. Mizuno. An O(n^3 L) algorithm using a sequence for linear complementarity problems.
J. of the Operations Research Society of Japan, 33:66–75, 1990.
[213] S. Mizuno. A rank-one updating interior algorithm for linear programming. Arabian
J. for Science and Engineering, 15(4):671–677, 1990.
[214] S. Mizuno. A new polynomial time method for a linear complementarity problem.
Mathematical Programming, 56:31–43, 1992.
[215] S. Mizuno. A primal-dual interior point method for linear programming. The Proceeding
of the Institute of Statistical Mathematics, 40(1):27–44, 1992. In Japanese.
[216] S. Mizuno and M.J. Todd. An O(n^3 L) adaptive path following algorithm for a linear
complementarity problem. Mathematical Programming, 52:587–595, 1991.
[217] S. Mizuno, M.J. Todd, and Y. Ye. On adaptive-step primal-dual interior-point algorithms for linear programming. Mathematics of Operations Research, 18:964–981,
1993.
[218] R.D.C. Monteiro and I. Adler. Interior-path following primal-dual algorithms: Part I:
Linear programming. Mathematical Programming, 44:27–41, 1989.
[219] R.D.C. Monteiro and I. Adler. Interior path-following primal-dual algorithms: Part II:
Convex quadratic programming. Mathematical Programming, 44:43–66, 1989.
[220] R.D.C. Monteiro, I. Adler, and M.G.C. Resende. A polynomial-time primal-dual affine
scaling algorithm for linear and convex quadratic programming and its power series
extension. Mathematics of Operations Research, 15:191–214, 1990.
[221] R.D.C. Monteiro and S. Mehrotra. A general parametric analysis approach and
its implications to sensitivity analysis in interior point methods. Mathematical
Programming, 72:65–82, 1996.
[222] R.D.C. Monteiro and T. Tsuchiya. Limiting behavior of the derivatives of certain
trajectories associated with a monotone horizontal linear complementarity problem.
Mathematics of Operations Research, 21(4):793–814, 1996.
[223] M. Muramatsu and T. Tsuchiya. Convergence analysis of the projective scaling algorithm based on a long-step homogeneous affine scaling algorithm. Mathematical
Programming, 72:291–305, 1996.
[224] G.L. Nemhauser and L.A. Wolsey. Integer and Combinatorial Optimization. J. Wiley
& Sons, New York, 1988.
[225] Y. Nesterov. Cutting plane algorithms from analytic centers: efficiency estimates.
Mathematical Programming, 69(1), 1995.
[226] Y. Nesterov and A.S. Nemirovskii. Interior Point Polynomial Methods in Convex
Programming: Theory and Algorithms. SIAM Publications. SIAM, Philadelphia, USA,
1993.
[227] J. von Neumann. On a maximization problem. Manuscript, Institute for Advanced
Studies, Princeton University, Princeton, NJ 08544, USA, 1947.
[228] F. Nožička, J. Guddat, H. Hollatz, and B. Bank. Theorie der linearen parametrischen
Optimierung. Akademie-Verlag, Berlin, 1974.
[229] M.R. Osborne. Finite Algorithms in Optimization and Data Analysis. John Wiley &
Sons, New York, USA, 1985.
[230] M. Padberg. Linear Optimization and Extensions, volume 12 of Algorithms and Combinatorics. Springer Verlag, Berlin, West–Germany, 1995.
[231] C.H. Papadimitriou and K. Steiglitz. Combinatorial optimization. Algorithms and
complexity. Prentice–Hall, Inc., Englewood Cliffs, New Jersey, 1982.
[232] J. Peng. Private communication.
[233] J. Peng, C. Roos and T. Terlaky. Self-Regularity: A New Paradigm for Primal-Dual
Interior Point Methods. Princeton University Press, 2002.
[234] R. Polyak. Modified barrier functions (theory and methods). Mathematical Programming, 54:177–222, 1992.
[235] F.A. Potra. A quadratically convergent predictor-corrector method for solving linear
programs from infeasible starting points. Mathematical Programming, 67(3):383–406,
1994.
[236] M.V. Ramana and P.M. Pardalos. Semidefinite programming. In T. Terlaky, editor,
Interior Point Methods of Mathematical Programming, pp. 369–398. Kluwer Academic
Publishers, Dordrecht, The Netherlands, 1996.
[237] J. Renegar. A polynomial-time algorithm, based on Newton’s method, for linear programming. Mathematical Programming, 40:59–93, 1988.
[238] R.T. Rockafellar. The elementary vectors of a subspace of IR^N. In R.C. Bose and T.A.
Dowling, editors, Combinatorial Mathematics and Its Applications: Proceedings North
Carolina Conference, Chapel Hill, 1967, pp. 104–127. The University of North Carolina
Press, Chapel Hill, North Carolina, 1969.
[239] C. Roos. New trajectory-following polynomial-time algorithm for linear programming
problems. J. of Optimization Theory and Applications, 63:433–458, 1989.
[240] C. Roos. An O(n^3 L) approximate center method for linear programming. In S. Dolecki,
editor, Optimization: Proceedings of the 5th French–German Conference in Castel–
Novel, Varetz, France, October 1988, volume 1405 of Lecture Notes in Mathematics,
pp. 147–158. Springer Verlag, Berlin, West–Germany, 1989.
[241] C. Roos and D. den Hertog. A polynomial method of approximate weighted centers for
linear programming. Technical Report 89–13, Faculty of Mathematics and Computer
Science, TU Delft, NL–2628 BL Delft, The Netherlands, 1989.
[242] C. Roos and T. Terlaky. Advances in linear optimization. In M. Dell’Amico, F. Maffioli,
and S. Martello, editors, Annotated Bibliographies in Combinatorial Optimization, pp.
95–114. John Wiley & Sons, New York, USA, 1997.
[243] C. Roos and J.-Ph. Vial. Analytic centers in linear programming. Technical Report
88–74, Faculty of Mathematics and Computer Science, TU Delft, NL–2628 BL Delft,
The Netherlands, 1988.
[244] C. Roos and J.-Ph. Vial. Long steps with the logarithmic penalty barrier function
in linear programming. In J. Gabszewicz, J.F. Richard, and L. Wolsey, editors,
Economic Decision–Making: Games, Economics and Optimization, dedicated to J.H.
Drèze, pp. 433–441. Elsevier Science Publisher B.V., Amsterdam, The Netherlands,
1989.
[245] C. Roos and J.-Ph. Vial. A polynomial method of approximate centers for linear programming. Mathematical Programming, 54:295–305, 1992.
[246] C. Roos and J.-Ph. Vial. Achievable potential reductions in the method of Kojima et
al. in the case of linear programming. Revue RAIRO–Operations Research, 28:123–133,
1994.
[247] D.S. Rubin and H.M. Wagner. Shadow prices: tips and traps for managers and instructors. Interfaces, 20:150–157, 1990.
[248] W. Rudin. Principles of Mathematical Analysis. McGraw-Hill Book Company, New
York, 1978.
[249] R. Saigal. Linear Programming, A modern integrated analysis. International series
in operations research & management. Kluwer Academic Publishers, Dordrecht, The
Netherlands, 1995.
[250] A. Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, New
York, 1986.
[251] D.F. Shanno. Computing Karmarkar projections quickly. Mathematical Programming,
41:61–71, 1988.
[252] D.F. Shanno, M.G. Breitfeld, and E.M. Simantiraki. Implementing barrier methods for
nonlinear programming. In T. Terlaky, editor, Interior Point Methods of Mathematical
Programming, pp. 369–398. Kluwer Academic Publishers, Dordrecht, The Netherlands,
1996.
[253] R. Sharda. Linear programming software for personal computers: 1995 survey. OR/MS
Today, pp. 49–57, October 1995.
[254] D.X. Shaw and D. Goldfarb. A path-following projective interior point method for
linear programming. SIAM J. on Optimization, 4:65–85, 1994.
[255] N.Z. Shor. Quadratic optimization problems. Soviet J. of Computer and System Sciences, 25:1–11, 1987.
[256] G. Sierksma. Linear and integer programming, volume 245 of Monographs and Textbooks
in Pure and Applied Mathematics. Marcel Dekker Inc., New York, second edition, 2002.
Theory and practice, With 1 IBM-PC floppy disk (“INTPM, a version of Karmarkar’s
Interior Point Method”) by J. Gjaltema and G. A. Tijssen (3.5 inch; HD).
[257] Gy. Sonnevend. An “analytic center” for polyhedrons and new classes of global algorithms for linear (smooth, convex) programming. In A. Prékopa, J. Szelezsán, and
B. Strazicky, editors, System Modelling and Optimization: Proceedings of the 12th
IFIP–Conference held in Budapest, Hungary, September 1985, volume 84 of Lecture
Notes in Control and Information Sciences, pp. 866–876. Springer Verlag, Berlin, West–
Germany, 1986.
[258] Gy. Sonnevend, J. Stoer, and G. Zhao. On the complexity of following the central
path by linear extrapolation in linear programming. Methods of Operations Research,
62:19–31, 1990.
[259] G. Strang. Linear Algebra and its Applications. Harcourt Brace Jovanovich, Orlando,
Florida, USA, 1988.
[260] J.F. Sturm and S. Zhang. An O(√n L) iteration bound primal-dual cone affine scaling
algorithm. Mathematical Programming, 72:177–194, 1996.
[261] K. Tanabe. Centered Newton methods and Differential Geometry of Optimization.
Cooperative Research Report 89. The Institute of Statistical Mathematics, Tokyo,
Japan, 1996. (Contains 38 papers related to the subject).
[262] M.J. Todd. Recent developments and new directions in linear programming. In M.
Iri and K. Tanabe, editors, Mathematical Programming: Recent Developments and
Applications, pp. 109–157. Kluwer Academic Press, Dordrecht, The Netherlands, 1989.
[263] M.J. Todd. The effects of degeneracy, null and unbounded variables on variants of
Karmarkar’s linear programming algorithm. In T.F. Coleman and Y. Li, editors, Large-Scale Numerical Optimization. Volume 46 of SIAM Proceedings in Applied Mathematics,
pp. 81–91. SIAM, Philadelphia, PA, USA, 1990.
[264] M.J. Todd. A lower bound on the number of iterations of primal-dual interior-point
methods for linear programming. In G.A. Watson and D.F. Griffiths, editors, Numerical
Analysis 1993, volume 303 of Pitman Research Notes in Mathematics, pp. 237–259.
Longman Press, Harlow, 1994. See also [267].
[265] M.J. Todd. Potential-reduction methods in mathematical programming. Mathematical
Programming, 76(1):3–45, 1997.
[266] M.J. Todd and B.P. Burrell. An extension of Karmarkar’s algorithm for linear programming using dual variables. Algorithmica, 1(4):409–424, 1986.
[267] M.J. Todd and Y. Ye. A lower bound on the number of iterations of long-step and polynomial interior-point linear programming algorithms. Annals of Operations Research,
62:233–252, 1996.
[268] T. Tsuchiya. Global convergence of the affine scaling methods for degenerate linear
programming problems. Mathematical Programming, 52:377–404, 1991.
[269] T. Tsuchiya. Degenerate linear programming problems and the affine scaling method.
Systems, Control and Information, 34(4):216–222, April 1990. (In Japanese).
[270] T. Tsuchiya. Global convergence property of the affine scaling methods for primal degenerate linear programming problems. Mathematics of Operations Research, 17(3):527–
557, 1992.
[271] T. Tsuchiya. Quadratic convergence of Iri–Imai algorithm for degenerate linear programming problems. J. of Optimization Theory and Applications, 87(3):703–726, 1995.
[272] T. Tsuchiya. Affine scaling algorithm. In T. Terlaky, editor, Interior Point Methods of
Mathematical Programming, pp. 35–82. Kluwer Academic Publishers, Dordrecht, The
Netherlands, 1996.
[273] T. Tsuchiya and M. Muramatsu. Global convergence of the long-step affine scaling
algorithm for degenerate linear programming problems. SIAM J. on Optimization,
5(3):525–551, 1995.
[274] A.W. Tucker. Dual systems of homogeneous linear relations. In H.W. Kuhn and
A.W. Tucker, editors, Linear Inequalities and Related Systems, Annals of Mathematical
Studies, No. 38, pp. 3–18. Princeton University Press, Princeton, New Jersey, 1956.
[275] K. Turner. Computing projections for the Karmarkar algorithm. Linear Algebra and
Its Applications, 152:141–154, 1991.
[276] P.M. Vaidya. An algorithm for linear programming which requires O((m + n)n^2 +
(m + n)^1.5 nL) arithmetic operations. Mathematical Programming, 47:175–201, 1990.
[277] R.J. Vanderbei. Symmetric quasi-definite matrices. SIAM J. on Optimization, 5(1):
100–113, 1995.
[278] R.J. Vanderbei and T.J. Carpenter. Symmetric indefinite systems for interior point
methods. Mathematical Programming, 58:1–32, 1993.
[279] R.J. Vanderbei, M.S. Meketon, and B.A. Freedman. A modification of Karmarkar’s
linear programming algorithm. Algorithmica, 1(4):395–407, 1986.
[280] S.A. Vavasis and Y. Ye. Condition numbers for polyhedra with real number data.
Operations Research Letters, 17:209–214, 1995.
[281] S.A. Vavasis and Y. Ye. A primal-dual interior point method whose running time
depends only on the constraint matrix. Mathematical Programming, 74:79–120, 1996.
[282] J.-Ph. Vial. A fully polynomial time projective method. Operations Research Letters,
7(1), 1988.
[283] J.-Ph. Vial. A unified approach to projective algorithms for linear programming. In
S. Dolecki, editor, Optimization: Proceedings of the 5th French–German Conference
in Castel–Novel, Varetz, France, October 1988, volume 1405 of Lecture Notes in
Mathematics, pp. 191–220. Springer Verlag, Berlin, West–Germany, 1989.
[284] J.-Ph. Vial. A projective algorithm for linear programming with no regularity condition.
Operations Research Letters, 12(1), 1992.
[285] J.-Ph. Vial. A generic path-following algorithm with a sliding constraint and its application to linear programming and the computation of analytic centers. Technical
Report 1996.8, LOGILAB/Management Studies, University of Geneva, Switzerland,
1996.
[286] J.-Ph. Vial. A path-following version of the Todd-Burrell procedure for linear programming. Mathematical Methods of Operations Research, 46(2):153–167, 1997.
[287] G.R. Walsh. An Introduction to Linear Programming. John Wiley & Sons, New York,
USA, 1985.
[288] J.E. Ward and R.E. Wendell. Approaches to sensitivity analysis in linear programming.
Annals of Operations Research, 27:3–38, 1990.
[289] D.S. Watkins. Fundamentals of Matrix Computations. John Wiley & Sons, New York,
1991.
[290] M. Wechs. The analyticity of interior-point-paths at strictly complementary solutions
of linear programs. Optimization Methods and Software, 9:209–243, 1998.
[291] A.C. Williams. Boundedness relations for linear constraint sets. Linear Algebra and
Its Applications, 3:129–141, 1970.
[292] A.C. Williams. Complementarity theorems for linear programming. SIAM Review,
12:135–137, 1970.
[293] H.P. Williams. Model Building in Mathematical Programming. John Wiley & Sons,
New York, USA (third edition), 1990.
[294] C. Witzgall, P.T. Boggs, and P.D. Domich. On the convergence behavior of trajectories
for linear programming. Contemporary Mathematics, 114:161–187, 1990.
[295] S.J. Wright. An infeasible-interior-point algorithm for linear complementarity problems. Mathematical Programming, 67(1):29–52, 1994.
[296] S.J. Wright and D. Ralph. A superlinear infeasible-interior-point algorithm for monotone nonlinear complementarity problems. Mathematics of Operations Research,
21(4):815–838, 1996.
[297] S.J. Wright. A path-following infeasible-interior-point algorithm for linear complementarity problems. Optimization Methods and Software, 2:79–106, 1993.
[298] S.J. Wright. Primal-Dual Interior-Point Methods. SIAM, Philadelphia, 1996.
[299] F. Wu, S. Wu, and Y. Ye. On quadratic convergence of the O(√n L)-iteration homogeneous and self-dual linear programming algorithm. Annals of Operations Research,
87:393–406, 1999.
[300] S.R. Xu, H.B. Yao, and Y.Q. Chen. An improved Karmarkar algorithm for linear
programming and its numerical tests. Mathematica Applicata, 5(1):14–21, 1992. (In
Chinese, English summary).
[301] H. Yamashita. A polynomially and quadratically convergent method for linear programming. Working Paper, Mathematical Systems Institute, Inc., Tokyo, Japan, 1986.
[302] M. Yannakakis. Computing the minimum fill-in is NP-complete. SIAM J. on Algebraic
and Discrete Methods, pp. 77–79, 1981.
[303] Y. Ye. Interior algorithms for linear, quadratic, and linearly constrained convex programming. PhD thesis, Dept. of Engineering Economic Systems, Stanford University,
Stanford, CA 94305, USA, 1987.
[304] Y. Ye. Karmarkar’s algorithm and the ellipsoid method. Operations Research Letters,
6:177–182, 1987.
[305] Y. Ye. A class of projective transformations for linear programming. SIAM J. on
Computing, 19:457–466, 1990.
[306] Y. Ye. An O(n^3 L) potential reduction algorithm for linear programming. Mathematical
Programming, 50:239–258, 1991.
[307] Y. Ye. Extensions of the potential reduction algorithm for linear programming. J. of
Optimization Theory and Applications, 72(3):487–498, 1992.
[308] Y. Ye. On the finite convergence of interior-point algorithms for linear programming.
Mathematical Programming, 57:325–335, 1992.
[309] Y. Ye. On the q-order of convergence of interior-point algorithms for linear programming. In Wu Fang, editor, Proceedings Symposium on Applied Mathematics. Chinese
Academy of Sciences, Institute of Applied Mathematics, 1992.
[310] Y. Ye. A potential reduction algorithm allowing column generation. SIAM J. on
Optimization, 2:7–20, 1992.
[311] Y. Ye. Toward probabilistic analysis of interior-point algorithms for linear programming. Mathematics of Operations Research, 19:38–52, 1994.
[312] Y. Ye. Complexity analysis of the analytic center cutting plane method that uses
multiple cuts. Mathematical Programming, 76(1):211–221, 1997.
[313] Y. Ye, O. Güler, R.A. Tapia, and Y. Zhang. A quadratically convergent O(√n L)-iteration algorithm for linear programming. Mathematical Programming, 59:151–162,
1993.
[314] Y. Ye and P.M. Pardalos. A class of linear complementarity problems solvable in
polynomial time. Linear Algebra and Its Applications, 152:3–17, 1991.
[315] Y. Ye and M.J. Todd. Containing and shrinking ellipsoids in the path-following algorithm. Mathematical Programming, 47:1–10, 1990.
[316] Y. Ye, M.J. Todd, and S. Mizuno. An O(√n L)-iteration homogeneous and self-dual
linear programming algorithm. Mathematics of Operations Research, 19:53–67, 1994.
[317] Y. Ye, O. Güler, R.A. Tapia, and Y. Zhang. A quadratically convergent O(√n L)-iteration algorithm for linear programming. Mathematical Programming, 59:151–162,
1993.
[318] L. Zhang and Y. Zhang. On polynomiality of the Mehrotra-type predictor-corrector
interior-point algorithms. Mathematical Programming, 68:303–318, 1995.
[319] Y. Zhang and R.A. Tapia. Superlinear and quadratic convergence of primal-dual
interior-point methods for linear programming revisited. J. of Optimization Theory
and Applications, 73(2):229–242, 1992.
[320] G. Zhao. Interior point algorithms for linear complementarity problems based on large
neighborhoods of the central path. SIAM J. on Optimization, 8(2):397–413, 1998.
[321] G. Zhao and J. Zhu. Analytical properties of the central trajectory in interior point
methods. In D-Z. Du and J. Sun, editors, Advances in Optimization and Approximation,
pp. 362–375. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1994.
[322] M. Zoltán and T. Szabolcs. Iterációfüggetlen lépéshossz és lépésbecslés a Dikin-algoritmus alkalmazásában a lineáris programozási feladatra. Alkalmazott Matematikai
Lapok, 26:365–379, 2009. (In Hungarian).
Author Index
Abadie, J., 468
Adler, I., 44, 165, 196, 233, 309, 317, 330,
387, 406, 412, 461, 472
Aho, A.V., 48, 461
Akgül, M., 387, 461
Altman, A., 278, 461
Andersen, E.D., xxiii, 62, 406, 409, 412,
418, 419, 426, 428, 429, 461
Andersen, K.D., 406, 411, 461
Anstreicher, K.M., 277, 289, 317, 461–463
Arioli, M., 408, 462
Asić, M.D., 44, 462
Atkinson, D.S., 252, 278, 462
Bahn, O., 278, 289, 462
Baker, N.R., 387, 464
Balinski, M.L., 17, 462
Bank, B., 391, 473
Barnes, E.R., 181, 451, 462, 463
Bazaraa, M.S., 4, 431, 463
Beasley, J.E., 466
Bellman, R., 8, 463
Ben-Israel, A., 8, 463
Bertsekas, D.P., 431, 463
Birkhoff, G., 8, 463
Bixby, R.E., xix, 406, 426, 463
Björck, Å., 408, 463
Boggs, P.T., 44, 309, 476
Bonnans, J.F., xxi, 463
Bosch, R.A., 317, 462, 463
Bose, R.C., 473
Boyd, S.E., xx, xxi, 463
Brearley, A.L., 406, 463
Breitfeld, M.G., xx, xxi, 463, 474
Brinkhuis, J., xxiii, 330, 463
Buck, R.C., 431, 463
Bunch, J.R., 408, 463
Burrell, B.P., 278, 289, 298, 475
Carpenter, T.J., 409, 475
Chang, S.F., 406, 463
Chen, Y.Q., 289, 477
Chopra, S., 181, 463
Chvátal, V., 15, 392, 463
Coleman, T.F., 475
Cook, S.A., 48, 463
Cottle, R.W., 471
Crouzeix, J.P., 225, 464
Csiszár, I., 93, 464
Czyzyk, J., 430
Dantzig, G.B., xix, 15, 16, 392, 464
Dell’Amico, M., 473
Dikin, I.I., xx, xxi, 4, 70, 220, 254–256,
301, 330, 337–341, 343, 345–347,
451–456, 458, 464
Ding, J., 250, 464
Dolecki, S., 465, 473, 476
Domich, P.D., 44, 309, 476
Dowling, T.A., 473
Drèze, J.H., 473
Draisma, G., 330, 463
Du, D.-Z., 468, 470
Duff, I.S., 408, 409, 462, 464, 471
El-Bakry, A.S., 429, 464
Evans, J.R., 387, 464
Fang, S.-Ch., 4, 392, 464
Farkas, J., 15, 89, 190, 464
Fiacco, A.V., 16, 44, 87, 95, 108, 277, 282,
309, 431, 464
Fourer, R., 409, 465
Fraley, C., 289, 298, 465
Franklin, P., 431, 465
Freedman, B.A., 451, 475
Freund, R.M., 252, 258, 289, 418, 465
Frisch, K.R., xx, 87, 90, 465
Gabszewicz, J., 473
Gal, T., 361, 365, 387, 465
Gale, D., 15, 465
Garey, M.R., 48, 465
Gauvin, J., 387, 465
George, A., 410, 465
Ghellinck, G. de, 278, 289, 298, 465
Giannessi, F., 471
Gill, P.E., 277, 465
Goffin, J.L., 252, 278, 289, 462, 466
Goldfarb, D., 277, 278, 289, 466, 474
Goldman, A.J., 2, 17, 36, 89, 466
Golub, G.H., 8, 466
Gondzio, J., 278, 406, 409, 418, 419, 430,
461, 466
Gonzaga, C.C., 137, 165, 181, 193, 233,
257, 277, 289, 317, 365, 439, 467
Gould, N.I.M., 464
Greenberg, H.J., xxiii, 365, 387, 467
Greville, T.N.E., 8, 463
Griffiths, D.F., 475
Guddat, J., 391, 473
Güler, O., 16, 44, 103, 190, 309, 315, 365,
467, 477
Haglin, D., 469
Halická, M., 44, 309, 468
Hall, L.A., 365, 468
Hardy, G.H., 230, 468
Haurie, A., 278, 466
Hertog, D. den, xxi, 95, 134, 165, 196, 233,
250, 271, 277, 278, 282, 284, 285,
289, 317, 467, 468, 473
Holder, A., xxiii
Hollatz, H., 391, 473
Hopcroft, J.E., 48, 461
Horn, R.A., 8, 11, 468
Huard, P., xx, 4, 277, 468
Hung, P., 330, 468
Illés, T., xxiii
Iri, M., 475
Jansen, B., 70, 156, 196, 233, 247, 252,
254, 258, 263, 271, 330, 338, 387,
414, 439, 451, 468, 469
Jarre, F., xx, xxi, xxiii, 169, 469
Jensen, D.J., 181, 463
Johnson, C.R., 8, 11
Johnson, D.S., 48, 465, 468
Jong, J.J. de, 387, 468
Kaliski, J.A., 278, 468, 469
Karmarkar, N.K., xix, xx, xxiii, 1, 4, 5, 7,
87, 277, 278, 289, 292, 295, 297–
301, 304, 317, 412, 451, 461, 469
Karp, R.M., 48, 469
Khachiyan, L.G., xix, xx, 436, 469
Kiwiel, K.C., 278, 461, 469
Klafszky, E., 93, 423, 436, 469
Klerk, E. de, xxiii, 245, 469
Knolmayer, G., 387, 470
Kocvara, M., xx, 469
Kojima, M., 44, 165, 220, 233, 308, 317,
420, 470
Koopmans, Tj.C., xix
Kovačević-Vujčić, V.V., 44, 462
Kranich, E., xx, 470
Kuhn, H.W., 466, 472, 475
Lancaster, P., 8, 470
Lenstra, J.K., 464
Li, T.Y., 250, 464, 475
Ling, P.D., 70, 121, 439, 470
Lions, J.L., 471
Littlewood, J.E., 230, 468
Liu, C.L., 336, 470
Liu, J.W.-H., 410, 465
Lootsma, F.A., 87, 470
Luo, Z.-Q., 181, 278, 466, 470
Lustig, I.J., 181, 213, 406, 418, 470, 471
Mészáros, Cs., 409, 430, 461, 471
Mac Lane, S., 8, 463
Maffioli, F., 473
Mangasarian, O.L., 468
Markowitz, H.M., 410, 426, 471
Maros, I., 409, 471
Marsten, R.E., 165, 181, 406, 418, 471
Martello, S., 473
Mayer, J., xxiii, 93, 469
McCormick, S.T., 16, 44, 87, 95, 108, 277,
282, 406, 463, 464
McLinden, L., 44, 100, 471
McShane, K.A., 165, 471
Megiddo, N., 16, 44, 100, 250, 420, 424,
467, 470, 471
Mehrotra, S., 62, 165, 181, 289, 317, 330,
380, 387, 409, 412, 413, 424, 430,
465, 466, 471, 472
Meketon, M.S., 451, 475
Merle, O. du, 278, 462, 466, 472
Miller, R.E., 469
Mills, H.D., 361, 372, 472
Minkowski, H., 15
Mitchell, J.E., 422, 472
Mitra, G., 406, 463
Mizuno, S., 44, 165, 181, 196, 213, 220,
233, 250, 252, 271, 317, 414, 439,
470, 472, 477
Monma, C.L., 165, 471
Monteiro, R.D.C., 44, 165, 196, 233, 309,
317, 330, 380, 387, 412, 461, 471,
472
Muramatsu, M., 301, 305, 472, 475
Murray, W., 465
Nemhauser, G.L., 15, 422, 466, 472
Nemirovski, A.S., xx, xxi, 472
Nesterov, Y.E., xx, xxi, 278, 472
Neumann, J. von, 89, 473
Newton, I., 3–6, 29–32, 34, 48–52, 54,
58, 59, 61–63, 68–70, 72, 75,
80, 87, 109–116, 118, 120, 121,
123, 125, 127, 128, 130, 131,
140, 142–144, 149, 150, 152–154,
156–165, 167–172, 175–177, 180,
181, 186, 188, 194, 195, 199–202,
204, 206, 207, 219, 220, 231–
241, 243–245, 247, 249, 257–259,
261–264, 269–273, 277, 278, 281,
284, 285, 298, 301, 319, 320, 322,
325, 329, 330, 332, 333, 340, 341,
347, 401, 403, 404, 412, 413, 415,
416, 418–420
Nožička, F., 391, 473
Noma, T., 44, 470
Osborne, M.R., 15, 473
Pólya, G., 230, 468
Padberg, M., xix, 4, 445, 473
Papadimitriou, C.H., 15, 392, 473
Pardalos, P.M., xxi, xxiii, 54, 473, 477
Parlett, B.N., 408, 463
Peng, J., 430, 436, 473
Polyak, R., 418, 473
Potra, F.A., xxi, 463, 473
Prékopa, A., 474
Puthenpura, S., 4, 392, 464
Radosavljević-Nikolić, M.D., 44, 462
Ralph, D., xxi, 476
Ramana, M.V., xxi, 473
Reid, J.K., 464
Renegar, J., 4, 7, 233, 277–281, 283–285,
289, 473
Resende, M.G.C., 330, 412, 461, 472
Richard, J.F., 473
Rijk, P.P.M. de, 408, 462
Rinnooy Kan, A.H.G., 464, 466
Ritter, K., 468
Rockafellar, R.T., 423, 473
Roos, C., xx, 95, 121, 125, 128, 204, 225,
233, 245, 250, 254, 271, 277, 278,
282, 289, 317, 338, 339, 387, 414,
451, 461, 464, 467–470, 473, 474
Rosen, J.B., 468
Rubin, D.S., 387, 399, 474
Rudin, W., 431, 474
Saigal, R., 4, 15, 474
Saltzman, M.J., 426, 463
Sarkissian, R., 466
Saunders, M.A., xxiii, 169, 465, 469
Schrijver, A., 15, 48, 190, 392, 422, 445,
464, 474
Scott, J.A., 464
Shanno, D.F., xx, xxi, 165, 181, 318, 406,
418, 463, 471, 474
Sharda, R., 396, 474
Sharifi-Mokhtarian, F., 278, 466
Shaw, D.X., 278, 289, 466, 474
Sherali, H.D., 4, 463
Shetty, C.M., 4, 463
Shor, N.Z., xix, 474
Sierksma, G., 392
Simantiraki, E.M., xxi, 165, 471, 474
Sonnevend, Gy., 44, 181, 278, 474
Steiglitz, K., 15, 392, 473
Stoer, J., 181, 474
Strang, G., 8, 433, 474
Strazicky, B., 474
Sturm, J., xxiii, 258, 474
Sun, J., 165, 468, 470
Szelezsán, J., 474
Tanabe, K., 44, 474, 475
Tapia, R.A., 165, 181, 193, 429, 464, 467,
477
Terlaky, T., xx, 93, 245, 254, 277, 278,
282, 338, 387, 409, 414, 418, 423,
430, 436, 451, 461, 462, 466–470,
473–475
Thatcher, J.W., 469
Tismenetsky, M., 8, 470
Todd, M.J., 165, 181, 196, 213, 233, 277,
278, 289, 298, 365, 414, 466, 472,
475, 477
Tomlin, J.A., 465
Trafalis, T., 461
Tsuchiya, T., xxi, 4, 44, 301, 305, 339, 365,
467, 472, 475
Tucker, A.W., 2, 16, 17, 36, 89, 462, 466,
472, 475
Turner, K., 409, 464, 475
Ullman, J.D., 48, 461
Vaidya, P.M., 252, 278, 317, 462, 475
Van Loan, C.F., 8, 466
Vandenberghe, L., xx, xxi, 463
Vanderbei, R.J., 365, 409, 429, 451, 468,
475
Vavasis, S.A., 54, 58, 192, 476
Veiga, G., 412, 461
Vial, J.P., 95, 121, 125, 128, 204, 233,
252, 271, 278, 289, 298, 317, 462,
465–469, 473, 474, 476
Wagner, H.M., 387, 399, 474
Walsh, G.R., 15, 476
Ward, J.E., 387, 476
Warners, J.P., xxiii, 461
Watkins, D.S., 8, 476
Watson, A., 471, 475
Wechs, M., 44, 309, 476
Wendell, R.E., 387, 476
Weyl, H., 15
Williams, A.C., 103, 476
Williams, H.P., 1, 406, 463
Witzgall, C., 44, 309, 476
Wolsey, L.A., 15, 422, 472, 473
Wright, M.H., 465
Wright, S.J., xxi, 430, 476
Wu, F., 213, 476
Wu, S., 213, 476
Xiao, D., 289, 466
Xu, S.R., 289, 477
Xu, X., 461
Yamashita, H., 289, 477
Yannakakis, M., 410, 477
Yao, H.B., 289, 477
Ye, Y., 44, 54, 58, 62, 125, 128, 181, 190,
192, 193, 213, 233, 278, 289, 309,
317, 330, 414, 426, 428, 429, 461,
466–472, 475–477
Yoshise, A., 44, 165, 317, 470
Zhang, L., 330, 477
Zhang, S., 258, 474
Zhang, Y., 165, 330, 429, 430, 464, 477
Zhao, G., 44, 181, 309, 330, 474, 478
Zhu, J., 44, 309, 430, 478
Zowe, J., xx, 469
Subject Index
1-norm, 9, see Symbol Index, ‖.‖_1
2-norm, 9, see Symbol Index, ‖.‖_2
p-norm, 9, see Symbol Index, ‖.‖_p
∞-norm, 9, see Symbol Index, ‖.‖_∞
µ-center, 28, see Symbol Index, x(µ), y(µ),
z(µ) and s(µ)
adaptive-step methods, see Target-following
Methods
adaptive-update strategy
dual case, 125
primal-dual case, 169
affine-scaling component, 171, see affine-scaling direction
affine-scaling direction
dual, 127
primal-dual, 171, 179
affine-scaling step of size θ, 179
algorithms
Conceptual Logarithmic Barrier Algorithm, 108, 107–109
Conceptual Target-following Algorithm, 232
Dikin Step Algorithm for Self-dual
Model, 454
Dual Logarithmic Barrier Algorithm,
107–149
with adaptive updates, 123–129
with full Newton steps, 120, 120–
123
with large updates, 131, 130–149
Dual Logarithmic Barrier Algorithm
with Modified Full Newton Steps,
323
Full Step Dual Logarithmic Barrier
Algorithm with Rank-One Updates, 324, 317–328
Full-Newton Step Algorithm for Selfdual Model, 50, 47–70
Generic Dual Target-following Algorithm, 260
Generic Primal Target-following Algorithm, 269
Generic Target-following Algorithm,
233
Higher-Order Dikin Step Algorithm
for the Standard Model, 341,
337–346
Higher-Order Logarithmic Barrier
Algorithm, 357, 346–359
Karmarkar’s Projective Method, 294,
289–305
Method of Centers, 277–285
Predictor-Corrector Algorithm, 182,
177–194
Primal-Dual Logarithmic Barrier Algorithm, 149–209
with adaptive updates, 168–177
with full Newton steps, 160, 150–
168
with large updates, 195, 194–209
Renegar’s Method of Centers, 277–
285
Target-following Methods, 235–275
all-one vector, see e
analytic center, 43
definition, 44
dual feasible region, 128
level set, 46
limit of central path, 45
analyticity of the central path, see central
path
analyze phase, see implementation aspects
arithmetic-geometric-mean inequality, 133
asymptotic behavior, 2
asymptotic behavior of central path, 4, see
central path
backward dual Newton step, 113
barrier parameter, 132
standard problem, 90
barrier term, 221
basic indices, 392
basic solution, 2, 391, see implementation
aspects
basis for (P), 213
basis identification procedure, see implementation aspects
basis tableau, see implementation aspects
binary encoding, 48, see complexity theory
bounded dual feasible region, 103
bounded level set, 100, 103, 222, 445
bounded primal feasible region, 103
bounded problem, 15
BPMPD, 430
break points, see Parametric Analysis
Bunch–Parlett factorization, see implementation aspects
canonical form
see canonical problem, 16
canonical model
see canonical problem, 16
canonical problem, 17, 18
approximate solutions, 76, 83
central path, 75
definition, 16, 18
dual problem, 18, 71
duality gap, 19
duality theorem, 39
embedding
if interior solutions are known, 72
in general, 78
homogenizing variable, see Symbol
Index, κ
KKT conditions, 74
normalizing variable, see Symbol
Index, ϑ
primal problem, 18, 71
strictly complementary solution, 17,
37, 38
strong duality property, 19, 39
strong duality theorem, 39
transformation into, 445
weak duality property, 18
Cauchy–Schwarz inequality, 9, 120, 136,
205, 303, 316, 342, 456
centering component, 171, see centering
direction
centering condition, 91
centering direction
dual, 127
primal-dual, 171, 179
centering method, 4, see Target-following
Methods
centering problem, 250
central path, 1, 16, 27, 28
algorithmic proof, 29
analyticity, 309
asymptotic behavior, 4, 309
canonical model, 73–76, 79–82
derivatives, 226, 307, 309, 315
differentiability, 4, 307
existence, 29–35, 90–99
general, xxi, 1–5, 7
implementation aspects, 403, 412,
418–420, 451, 454, 455
Karmarkar format, 301, 305
self-dual problem, 16, 17, 23, 27, 28,
31, 35, 36, 43–46, 52, 57–60, 70,
307–310, 322
standard model, 87, 95–99, 107, 117,
123, 128, 129, 149, 158, 159, 164,
171, 180, 181, 190, 194, 213–
215, 219–222, 225, 227, 228, 233,
235, 236, 239–241, 245, 249–252,
254–257, 261, 262, 271, 280–283,
330, 331, 338, 341, 347, 358
straight, 97, 128
uniqueness, 28
central-path-following methods, 219
Cholesky factorization, see implementation aspects
CLP, 429
column sum norm, 10
Combinatorial Optimization, xix
complementary vectors, 35
complete separation, 58
complexity, 2, 5, 70, 234, 284, 298, 318,
401, 415, 419
complexity analysis, 250, 278
complexity bounds, see iteration bounds,
xx, xxi, 5, 257, 317, 338, 348,
358, 414
complexity theory, xix
binary encoding, 48
polynomial time, 47
size of a problem instance, 47
solvable in polynomial time, 48
Conceptual Logarithmic Barrier Algorithm, 108, 107–109
iteration bound, 108
condition for adaptive updating, 172
condition number, 48, 54
cone neighborhood, 227
cone-affine-scaling, 258
constraint matrix, 18
corrector step, 181, see predictor-corrector
method
CPLEX, xix, xx, 4, 87, 396–398, 429
cutting plane methods, 278
damped Newton step, 131
damped-step methods, 4
damping parameter, 181
degenerate problem, 365
dense columns and rows, see implementation aspects
derivatives of x(µ) and s(µ), see central
path
differentiability of central path, 4, see
central path
Dikin direction, 451, 454
Dikin ellipsoid, 339, 452
Dikin step, 454
Dikin Step Algorithm for Self-dual Model,
454
duality gap reduction, 455
feasible step-size, 455
high-order variant, 337
iteration bound for ε-solution, 458
proximity measure, 454
search direction, 453
Dikin-path, 254
Dikin-path-following method, 4, see Target-following Methods
dimension optimal sets, 365, see standard
problem
directional derivatives, see Parametric
Analysis
Discrete Optimization, xix
distance to the central path, see proximity
measure
domain, 15
dual canonical problem, 18, see canonical
problem
definition, 18
dual level set, 102
Dual Logarithmic Barrier Algorithm, 107–
149
with adaptive updates, 123–129
affine-scaling direction, 127
centering direction, 127
illustration, 129
with full Newton steps, 120, 120–123
convergence analysis, 121–122
illustration, 122–123
iteration bound, 120
Newton step ∆s, 111
proximity measure, 114
quadratic convergence, 114–119
scaled Newton step, 112
with large updates, 131, 130–149
illustrations, 144–149
iteration bound, 143
step-size, 140, 143
Dual Logarithmic Barrier Algorithm with
Modified Full Newton Steps, 323
iteration bound, 322
dual methods, 219
dual of general LO problem, 40
dual problem, 15
dual standard problem, see standard
problem
Dual Target-following Method, see Target-following Methods
duality gap, 19
duality in LO, 15
Duality Theorem, 89, 362, 366
dualizing scheme, 43
elimination of free variables, 446
ellipsoid method, xix
equality constraints, 15
examples
calculation of central path, 97
classical sensitivity analysis, 392
condition number, 54
Dikin Step Algorithm, 458, 459
Dual Logarithmic Barrier Algorithm
with adaptive updates, 129
with full Newton steps, 122
with large update, 144
dual Newton process, 116
initialization, 215
Newton step Algorithm, 52
optimal partition, 62, 363
optimal set, 363
optimal-value function, 361, 369
at a break point, 378
computation, 381, 385
domain, 367
Predictor-Corrector Algorithm, 188
Primal-Dual Logarithmic Barrier Algorithm
with adaptive updates, 176
with full Newton steps, 162
with large updates, 209
primal-dual Newton process, 157
quadratic convergence Newton process, 116
quadratic convergence primal-dual
Newton process, 157
reduction to canonical format, 449,
450
rounding procedure, 63
self-dual embedding, 23, 26, 27, 30,
32, 46, 55, 449, 450
sensitivity analysis, 389
shadow prices, 376
shortest path problem, 363
Farkas’ lemma, 15, 40, 89
feasible problem, 15
feasible set, 15
feasible solution, 15
feasible step-size
Dikin Step Algorithm, 455
finite termination, 15, 16, 62
first-order method, 330
floating point operations, see implementation aspects
flops, see floating point operations
free variables, 446
Frobenius norm, 10
full index set, 27
Full Step Dual Logarithmic Barrier Algorithm with Rank-One Updates,
324, 317–328
modified proximity measure, 320–323
modified search direction, 319–320
required number of arithmetic operations, 328
Full-Newton Step Algorithm for Self-dual
Model, 50, 47–70
iteration bound for ε-solution, 52
iteration bound for exact solution, 68
iteration bound for optimal partition,
61
polynomiality, 69
proximity measure, 49, 59
rounding procedure, 62–65
search direction, 49
full-step methods, 4, see Target-following
Methods
Gaussian elimination, see implementation
aspects
generalized inverse, 65, 264, see pseudoinverse
geometric inequality, 230
Goldman–Tucker Theorem, 2, 89, 190,
362
gradient matrix, 308, see Jacobian
Hadamard inequality, 11, 436
Hadamard product, 11
Hessian norm, 261
Higher-Order Dikin Step Algorithm for
the Standard Model, 341, 337–
346
bound for the error term, 342
convergence analysis, 345–346
duality gap reduction, 342
feasible step-sizes, 342, 343
first-order direction, 340, 338–340
iteration bound, 338, 346
Higher-Order Logarithmic Barrier Algorithm, 357, 346–359
barrier parameter update, 356
bound for the error term, 348
convergence analysis, 357–359
improved iteration bound, 359
iteration bound, 358
proximity after a step, 353, 349–354
step-size, 353
higher-order methods, 5, 329–359
Schiet Op™, 330
search directions, 330–334
analysis of error term, 335–337
error term, 333
illustration, 334
second-order effect, 329
upper bound for error term, 337
homogeneous, 22
homogenizing variable, 19
HOPDM, 430
implementation aspects, 401–430
analyze phase, 410
augmented system
definition, 404
solution of, 408
basic solution
dual degeneracy, 422
primal degeneracy, 422
basis tableau, 422
Bunch–Parlett factorization, 408
Cholesky factorization, 409
dense columns and rows, 409
floating point operations, 410
Gaussian elimination, 410
Markowitz’s merit function, 410
maximal basis, 425
normal equation
advantages and disadvantages, 409
definition, 404
solution of, 409
structure, 404
optimal basis, 421
optimal basis identification, 421–430
ordering
minimum degree, 410
minimum local fill-in, 410
pivot transformation, 422
preprocessing, 405–408
detecting redundancy, 406
reduction of the problem size, 407
Schur complement, 410
second-order predictor-corrector method,
411
simplify the Newton system, 418
sparse linear algebra, 408–413
starting point, 413–419
self-dual embedding, 414
step-size, 420
stopping criteria, 420–421
warm start, 418–419
implicit function theorem, 226, 308, 309,
331, 431
inequality constraints, 15
infeasible problem, 15, 38
infinity norm, 9
inner iteration, 132, 195
inner loop, 131, 195
input size of an LO problem, see L
interior-point condition, 16, 20
standard problem, 94
interior-point method, 20
interior-point methods, xix, 16
IPC, 20
IPM, 20
iteration bounds, 3, 5, 48, 122, 125, 144, 145, 150, 162, 167, 168, 247, 250–252, 254, 257, 258, 277, 284, 294, 318, 322, 330, 338, 345, 347
Conceptual Logarithmic Barrier Algorithm, 108
Dikin Step Algorithm, 70, 458
Dual Logarithmic Barrier Algorithm
with full Newton steps, 120, 125
with large updates, 143
Dual Logarithmic Barrier Algorithm with Modified Full Newton Steps, 322
Full-Newton Step Algorithm, 52, 68
Higher-Order Dikin Step Algorithm
for the Standard Model, 346
Higher-Order Logarithmic Barrier
Algorithm, 358, 359
Karmarkar’s Projective Method, 297
Newton Step Algorithm, 69, 70
Primal-Dual Logarithmic Barrier Algorithm
with full Newton steps, 161, 168
with large updates, 208
Renegar’s Method of Centers, 279
Jacobian, 226, 308, 331, 432
Karmarkar format, see Symbol Index,
(P K), 297
definition, 289
discussion, 297–301
dual homogeneous version, 305
dual version, 305
homogeneous version, see Symbol
Index, (P KH), 304–305
Karmarkar’s Projective Method, 294, 289–305
decrease potential function, 296
iteration bound, 297
potential function, 295
search direction, 304, 301–304
step-size, 296
unit simplex in IRn , see Symbol
Index, Σn
illustration for n = 3, 290
inner-outer sphere bound, 292
inverse of the transformation Td , 293
projective transformation, see Symbol Index, Td
properties of Td , 293
radius largest inner sphere, see
Symbol Index, r
radius smallest outer sphere, see
Symbol Index, R
Karush–Kuhn–Tucker conditions, 91, see
KKT conditions
KKT conditions
canonical problem, 74
standard problem, 91
uniqueness of solution, 92, 222
large coordinates, 54, 57
large updates, 144
large-step methods, 4, see Target-following
Methods
large-update algorithm, 208
large-update strategy, 125
left-shadow price, see Sensitivity Analysis
level set
ellipsoidal approximation, 315
of φw (x, s), 222
of g̃µ (x), 92
of duality gap, 100, 103, 445
of primal objective, 102
LINDO, 396–398
linear constraints, 1, 15
linear function, 1
linear optimization, see LO
linear optimization problem, 15
Linear Programming, xix
linearity interval, see Parametric Analysis
LIPSOL, 430
LO, xix
logarithmic barrier function, 87
standard dual problem, 105
standard primal problem, 90
logarithmic barrier method, xx, 3, 219
dual method, 107
Newton step, 111
primal method, 271
Newton step, 271
primal-dual method, 149, 150
Newton step, 150
see also Target-following Methods, 219
long-step methods, 4
LOQO, 429
lower bound for σSP , 56
Markowitz’s merit function, see implementation aspects
Mathematical Programming, xix
matrix norm, 10
maximal basis, see implementation aspects
maximal step, see adaptive-step methods
McIPM, 430
medium updates, 144
medium-step methods, see Target-following
Methods
medium-update algorithm, 209
Method of Centers, 277–285
minimum degree, see implementation aspects
minimum local fill-in, see implementation
aspects
µ-center
(P ) and (D), 95
multipliers, 16
multistep methods, see Target-following
Methods
Newton direction, 29–31, 49
self-dual problem, 29
definition, 29
feasibility, 32
quadratic convergence, 31, 32
Newton step
to µ-center
dual case, 110
primal-dual case, 161
to target w
dual case, 261
primal case, 271
primal-dual case, 236
nonbasic indices, 392
nonnegative variables, 446
nonpositive variables, 446
normal equation, see implementation aspects
normalizing constraint, 297
normalizing variable, 24
objective function, 15
objective vector, 18
optimal basic solution, 362
optimal basis, 362, 392, see implementation aspects
optimal basis identification, see implementation aspects
optimal basis partition, see Sensitivity
Analysis
optimal partition, 2, 27, 36, see standard problem
standard problem, 190
optimal set, 15
optimal-value function, see Parametric
Analysis
optimizing, 15
orthogonality property, 24
OSL, xx, 4, 87, 396–398
outer iteration, 132, 195
outer iteration bound, 108
outer loop, 131, 195
Parametric Analysis, 361–386
optimal-value function, see Symbol
Index, zA (b, c), f (β) and c(γ)
algorithm for f (β), 380
algorithm for g(γ), 384
break points, 369
directional derivatives, 372
domain, 367
examples, 361, 367, 369, 376, 378, 381, 385
extreme points of linearity interval, 377, 378
linearity interval, 369
one-sided derivatives, 372, 373, 375
piecewise linearity, 368
perturbation vectors, see Symbol
Index, ∆b and ∆c
perturbed problems, see Symbol Index, (Pβ ) and (Dγ )
dual problem of (Dγ ), see Symbol
Index, (Pγ )
dual problem of (Pβ ), see Symbol
Index, (Dβ )
feasible region (Dγ ), see Symbol
Index, Dγ
feasible region (Pβ ), see Symbol
Index, Pβ
partial updating, 5, 317–328
Dual Logarithmic Barrier Algorithm with Modified Full Newton Steps, 323
Full Step Dual Logarithmic Barrier
Algorithm with Rank-One Updates, 324
rank-one modification, 318
rank-one update, 318
Sherman-Morrison formula, 318
path-following method, 4
central path, 248
Dikin-path, 254
primal or dual, see logarithmic barrier method and center method
weighted path, 249
PC-PROG, 396–398
PCx, 430
perturbed problems, see Parametric Analysis
pivot transformation, see implementation
aspects
polynomial time, 48, see complexity theory
polynomially solvable problems, xix
positive definite matrix, 8
positive semi-definite matrix, 8
postoptimal analysis, see Sensitivity Analysis
potential reduction methods, 4
predictor step, 181, see predictor-corrector
method
Predictor-Corrector Algorithm, 182, 177–194
adaptive version, 186–194
convergence analysis, 185–194
illustration, 188
iteration bound, 181
second-order version, see implementation aspects
predictor-corrector method, 150, see Predictor-Corrector Algorithm
preprocessing, see implementation aspects
primal affine-scaling, 339
primal affine-scaling method, 339, 451
primal canonical problem, 18, see canonical problem
definition, 18
primal level set, 102
primal logarithmic barrier method, 304
primal methods, 219
primal standard problem, see standard problem
Primal Target-following Method, see Target-following Methods
primal-dual affine-scaling, 169
primal-dual algorithms, 150
primal-dual centering, 169
Primal-Dual Logarithmic Barrier Algorithm, 149–209
duality gap after Newton step, 153
example Newton process, 159
feasibility of Newton step, 152, 154
initialization, 213–216
local quadratic convergence, 156, 159
Newton step, 150, 150–154
proximity measure, 156
with adaptive updates, 168–177
affine-scaling direction, 171, 179
centering direction, 171, 179
cheap adaptive update, 176
condition for adaptive updating, 172, 173
illustration, 176–177
with full Newton steps, 160, 150–168
classical analysis, 165–168
convergence analysis, 161–162
illustration, 162–164
iteration bound, 161
with large updates, 195, 194–209
illustrations, 209
iteration bound, 208
step-size, 201
primal-dual logarithmic barrier function, 132
primal-dual method, 219
primal-dual pair, 99
Primal-Dual Target-following Method, see
Target-following Methods
Projective Method, 277, see Karmarkar’s
Projective Method
proximity measures, 31, 59
δc (w), 222, 227
δc (x), 454
δc (z), 59
δ d (y, w), 261
δ p (x, w), 271, 272
δ(w∗ , w), 266
δ(z, µ), 49
δ(x, s; µ), 156, 237
δ(xs, w), 237
δ(s, µ), 114
σ(x, s; µ), 165
pseudo-inverse, 194, 313, 433–434
quadratic convergence
dual case, 114
primal-dual case, 156
ranges, see Sensitivity and/or Parametric
Analysis
rank-one modification, see partial updating
rank-one update, see partial updating
reliable sensitivity modules, 399
removal of equality constraints, 448
Renegar’s method, see Renegar’s Method
of Centers
Renegar’s Method of Centers, 279
adaptive and large-update variants, 284–285
analysis, 281–284
as target-following method, 279–280
barrier function, see Symbol Index,
φR (y, z)
description, 278
iteration bound, 279
lower bound update, 278
right-hand side vector, 18
right-shadow price, see Sensitivity Analysis
rounding procedure, 3, 54
row sum norm, 10
scaled Newton step, 114
scaling matrix, 151, 317
scheme for dualizing, 43
Schiet Op™, see higher-order methods
Schur complement, see implementation
aspects
search direction, 451
second-order effect
higher-order methods, 329
self-dual embedding, 22
self-dual model, see self-dual problem
self-dual problem, 13, 16, 24
central path
convergence, 43, 45
derivatives, 309–315
condition number, see Symbol Index,
σSP
definition, 22, 71, 72, 451
ellipsoidal approximations of level
sets, 315–316
limit central path, 36
objective value, 24, 25, 48, 50, 61, 66, 454, 455
optimal partition, 36
polynomial algorithm, 50, 47–70, 454
proximity measure, 31
strictly complementary solution, 35–37
strong duality theorem, 38
Semidefinite Optimization, xix
Sensitivity Analysis, 387–399
classical approach, 391–399
computationally cheap, 393
optimal basis partition, 392
pitfalls, 399
ranges depend on optimal basis, 392
results of 5 commercial packages, 394–398
definition, 387
example, 389
left- and right-shadow prices of bi , 387, 388
left- and right-shadow prices of cj , 388
left-shadow price, 387
range of bi , 387, 388
range of cj , 387, 388
range of a coefficient, 387
right-shadow price, 387
shadow price of a coefficient, 387
shadow prices, see Sensitivity and/or
Parametric Analysis
Sherman-Morrison formula, 318, see partial updating
shifted barrier method, 258
short-step methods, 4, see Target-following
Methods
Simplex Method, xix, xx, 1–3, 6, 7, 15, 16, 87, 365, 391, 392, 406
singular value decomposition, 434
size of a problem instance, see complexity
theory
skew-symmetric matrix, 18, 20–22, 24, 28, 29, 47, 214, 299, 307, 310, 416
slack vector, 22, 47
small coordinates, 54, 57
solvable in polynomial time, see complexity theory
solvable problem, 38
sparse linear algebra, see implementation
aspects
spectral matrix norm, 10
standard dual problem
logarithmic barrier function, 105
standard format, 87, see standard problem, 448
standard primal problem
logarithmic barrier function, 90
standard problem
barrier parameter, 90
barrier term, 90
central path
definition, 95
duality gap, 107
examples, 96–99
monotonicity, 95
classical duality results
complementarity, 89
strong duality, 89
weak duality, 88, 89
coordinatewise duality, 103
dual adaptive-update algorithm, 123–129
illustration, 129
dual algorithms, 107–149
dual barrier function, see Symbol
Index, kµ (y, s)
decrease after step, 140, 140–142
effect of an update, 140, 138–140
dual full-step algorithm, 120, 120–123
dual large-update algorithm, 131, 130–149
dual problem, 88, 103, 107
duality gap
close to central path, 119
on central path, 89, 99
estimates of dual objective values, 138, 135–138
interior-point condition, 94
equivalent conditions, 100
KKT conditions, 91
optimal partition, see Symbol Index,
π = (B, N )
optimal sets, 100, see Symbol Index,
P ∗ and D∗
determined by dual optimal solution, 363
determined by optimal partition, 363
dimensions, 365
example, 363
orthogonality property, 99
predictor-corrector algorithm, 182, 177–194
primal barrier function, 90, see Symbol Index, g̃µ (x)
primal problem, 87, 103
primal-dual adaptive-update algorithm, 168–177
primal-dual algorithms, 149–209
primal-dual barrier function, see Symbol Index, φµ (x, s)
decrease after step, 201, 199–204
effect of an update, 205
primal-dual full-step algorithm, 160, 150–168
primal-dual large-update algorithm, 195, 194–209
strictly complementary solution, 89
symmetric formulation, 103–105
starting point, see implementation aspects
step of size α
damped Newton step, 140, 154, 199, 202, 232, 240, 241, 258, 403
decrease barrier function, 140, 199, 201, 202, 241, 296, 347
Dikin step, 455
feasibility, 152, 154, 236, 239, 262, 272, 342, 343, 455
higher-order Dikin step, 341, 349
step-size, see implementation aspects
stopping criteria, see implementation aspects
strict complementarity
standard format, 89
strictly complementary solution, 2
strictly complementary vectors, 35
strictly feasible, 4
strong duality property, 19
strong duality theorem, 39
support of a vector, 36
target map, see Symbol Index, ΦP D , see
Target-following Methods
target pair, see Target-following Methods
target sequence, 4, see Target-following Methods
target vector, see Target-following Methods
Target-following Method, 4
Target-following Methods, 235–275
adaptive and large target-update, 257–258
adaptive-step methods, 232
dual method, 260, 259–268
barrier function, 259
effect of target update, 266
feasibility of Newton step, 262
linear convergence for damped
step, 264
local quadratic convergence, 263
Newton step, 261
proximity measure, 261
examples, 247–285
centering method, 250–252
central-path-following, 248–249
Dikin-path-following method, 254–257
method of centers, 277–285
Renegar’s method of centers, 277–285
weighted-centering method, 252–253
weighted-path-following, 249–250
full-step methods, 232
large-step methods, 232
medium-step methods, 232
multistep methods, 232
primal method, 269, 269–275
barrier function, 270
effect of target update, 275
feasibility of Newton step, 272
linear convergence for damped
step, 273
local quadratic convergence, 273
Newton step, 271
proximity measure, 271, 272
primal-dual method, 233, 235–245
barrier function, 221
duality gap after Newton step, 237
feasibility of Newton step, 236, 239
linear convergence for damped
steps, 241
local quadratic convergence, 240
Newton step, 235, 236
proximity measure, 237, 266
proximity measure, 222
short-step methods, 232
target map, 220
target pair, 235
target sequence, 220
properties, 226–231
target vector, 235
traceable target sequence, 231
theorems of the alternatives, 40
traceable target sequence, see Target-following Methods
types of constraint
equality, 446
inequality
greater-than-or-equal-to, 446
less-than-or-equal-to, 446
types of variable
free, 446
nonnegative, 446
nonpositive, 446
unbounded problem, 15, 38
unit ball in IRn , 10
unsolvable problem, 38
vanishing duality gap, 19, 37
variance vector, 31, 49, 59
warm start, see implementation aspects
weak duality, 18
weak duality property, 18
weighted dual logarithmic barrier function, 259, see Symbol Index,
φdw (y)
weighted path, 249
weighted primal barrier function, 270
weighted primal logarithmic barrier function, see Symbol Index, φpw (x)
weighted primal-dual logarithmic barrier
function, 221, see Symbol Index,
φw (x, s)
weighted-analytic center, 4, 220, 229
definition, 229
limit of target sequence, 229
weighted-centering problem, 252
weighted-path-following method, 4, see
Target-following Methods
weighting coefficients, 221
w-space, 220
XMP, 396–398
XPRESS-MP, 429
Symbol Index
(D′ ), 82
(D)
canonical form, 18, 71
standard form, 88, 103, 107, 219, 298,
361
(DK), 305
(DKH), 305
(D′ ), 104
(D̄), 214
(Dβ ), 366
(Dγ ), 366
(EP ), 449
(P ′′ ), 299
(P ′ ), 82, 298, 299, 448
(P )
canonical form, 18, 71, 449
standard form, 87, 103, 213, 219, 298,
361
(P K), 289
(P KH), 304
(P KS), 293
(P ′ ), 104
(P̄ ), 214
(Pβ ), 366
(P c ), 213, 214
(Pγ ), 366
(Pµ ), 91
(SP ), 22, 47, 72, 88, 307, 416, 451
(SP0 ), 71
(SP1 ), 73
(SP2 ), 78
(SP0 ), 22
(SSP ), 88
(SP c ), 214
(SSP c ), 214
A
canonical form, 18
Karmarkar form, 289
standard form, 87, 298, 361
k.k1 , 9
k.k2 , 9
k.kp , 9
k.k∞ , 9
B, 24, 190
b
canonical form, 18
standard form, 87, 298, 361
b(β), 366
B ∗ , 65
c
canonical form, 18
Karmarkar form, 289
standard form, 87, 298, 361
c(γ), 366
d, 170, 238
ds , 170, 238
das , 171
dcs , 171
dx , 170, 238
dax , 171
dcx , 171
D, 88
Dβ , 366
Dγ , 366
D+ , 88
D∗ , 89, 190, 362
dimension, 365
from optimal solution of (P ), 363
∆b, 366
∆c, 366
∆s, 49, 150, 452
∆a s, 171
∆c s, 171
∆x, 150, 451
∆z, 49
∆a x, 171
∆c x, 171
∆y, 150
δc (w), 222, 227
δc (x), 454
δc (z), 59
δ d (y, w), 261
δ p (x, w), 271, 272
∆s, 29
δ(w∗ , w), 266
δ(z, µ), 49
δ(x, s; µ), 156, 237
δ(xs, w), 237
∆z, 29
δ(s, µ), 114
δ(x, µ), 305
e, 9
E (µ, r), 315
f (β), 366
g(γ), 366
g̃µ (x), 90
scaled, see gµ (x)
gµ (x), 132
H, 111
h̃µ (s), 105, 110
scaled, see hµ (s)
hµ (s), 132
κ, 19
kµ (y, s), 105, 110
see also h̃µ (s), 105
L, 48, 70
L, 104
L⊥ , 104
M , 21
M̄ , 20, 23, 71
MBB , 55
MBN , 55
MIJ , 55
MK , 315
MNB , 55
MNN , 55
N , 24, 190
N (A), 91
n̄, 21
O, 11
Ω, 11
ω, 65
P, 88
Pβ , 366
Pγ , 366
P + , 88
P ∗ , 89, 190, 362
dimension, 365
from optimal solution of (D), 363
φdµ (s), 132
properties, 132, 133, 134
φpµ (x), 132
properties, 132, 133
φµ (x, s), 132
properties, 132–134
ΦP D , 220
existence, 222, 221–226
φR (y, z), 278
φdw (y), 260
φpw (x), 270
φw (x, s), 221
πB , 65
π = (B, N ), 362
PQ , 111
ψ
graph, 93
graphs of ψ(δ) and ψ(−δ), 135
properties, 93, 133, 137, 197, 198
Ψ
properties, 134
q, 21
qB , 55
qN , 55
R, 290, see Σn
r, 21, 291, see Σn
ρ(δ), 182
s(µ), 95
sα , 158, 455
sB , 55
sB (z̃), 53
sB (z), 53
σSP , 54
lower bound, 56
σ(x, s; µ), 165
σd , 192
σp , 192
Σn , 290
illustration for n = 3, 290
σ^s_SP, 54
σ(z), 36
σ^z_SP, 54
sN , 55
sN (z̃), 53
SP, 54
s+ , 455
SP ∗ , 44, 54
s(z̃), 53
s(z), 53
s(z), 22
Td , 292
properties, 293
Θ, 11
ϑ, 21
z̃B , 53
z̃N , 53
u, 170, 238
v, 238
w-space, 220
x(µ), 95
xα , 158, 455
x+ , 455
y(µ), 95
z, 21
z(µ), 28
zA (b, c), 361
zB , 55
z̄, 20, 23, 71
zI , 53
zN , 53, 55