
Interior Point Methods for Linear Optimization


C. Roos, T. Terlaky, J.-Ph. Vial

Dedicated to our wives Gerda, Gabriella and Marie and our children Jacoline, Geranda, Marijn, Viktor, Benjamin and Emmanuelle

Contents

List of Figures
List of Tables
Preface
Acknowledgements

1 Introduction
  1.1 Subject of the book
  1.2 More detailed description of the contents
  1.3 What is new in this book?
  1.4 Required knowledge and skills
  1.5 How to use the book for courses
  1.6 Footnotes and exercises
  1.7 Preliminaries
    1.7.1 Positive definite matrices
    1.7.2 Norms of vectors and matrices
    1.7.3 Hadamard inequality for the determinant
    1.7.4 Order estimates
    1.7.5 Notational conventions

I Introduction: Theory and Complexity

2 Duality Theory for Linear Optimization
  2.1 Introduction
  2.2 The canonical LO-problem and its dual
  2.3 Reduction to inequality system
  2.4 Interior-point condition
  2.5 Embedding into a self-dual LO-problem
  2.6 The classes B and N
  2.7 The central path
    2.7.1 Definition of the central path
    2.7.2 Existence of the central path
  2.8 Existence of a strictly complementary solution
  2.9 Strong duality theorem
  2.10 The dual problem of an arbitrary LO problem
  2.11 Convergence of the central path

3 A Polynomial Algorithm for the Self-dual Model
  3.1 Introduction
  3.2 Finding an ε-solution
    3.2.1 Newton-step algorithm
    3.2.2 Complexity analysis
  3.3 Polynomial complexity result
    3.3.1 Introduction
    3.3.2 Condition number
    3.3.3 Large and small variables
    3.3.4 Finding the optimal partition
    3.3.5 A rounding procedure for interior-point solutions
    3.3.6 Finding a strictly complementary solution
  3.4 Concluding remarks

4 Solving the Canonical Problem
  4.1 Introduction
  4.2 The case where strictly feasible solutions are known
    4.2.1 Adapted self-dual embedding
    4.2.2 Central paths of (P) and (D)
    4.2.3 Approximate solutions of (P) and (D)
  4.3 The general case
    4.3.1 Introduction
    4.3.2 Alternative embedding for the general case
    4.3.3 The central path of (SP2)
    4.3.4 Approximate solutions of (P) and (D)

II The Logarithmic Barrier Approach

5 Preliminaries
  5.1 Introduction
  5.2 Duality results for the standard LO problem
  5.3 The primal logarithmic barrier function
  5.4 Existence of a minimizer
  5.5 The interior-point condition
  5.6 The central path
  5.7 Equivalent formulations of the interior-point condition
  5.8 Symmetric formulation
  5.9 Dual logarithmic barrier function

6 The Dual Logarithmic Barrier Method
  6.1 A conceptual method
  6.2 Using approximate centers
  6.3 Definition of the Newton step
  6.4 Properties of the Newton step
  6.5 Proximity and local quadratic convergence
  6.6 The duality gap close to the central path
  6.7 Dual logarithmic barrier algorithm with full Newton steps
    6.7.1 Convergence analysis
    6.7.2 Illustration of the algorithm with full Newton steps
  6.8 A version of the algorithm with adaptive updates
    6.8.1 An adaptive-update variant
    6.8.2 The affine-scaling direction and the centering direction
    6.8.3 Calculation of the adaptive update
    6.8.4 Illustration of the use of adaptive updates
  6.9 A version of the algorithm with large updates
    6.9.1 Estimates of barrier function values
    6.9.2 Estimates of objective values
    6.9.3 Effect of large update on barrier function value
    6.9.4 Decrease of the barrier function value
    6.9.5 Number of inner iterations
    6.9.6 Total number of iterations
    6.9.7 Illustration of the algorithm with large updates

7 The Primal-Dual Logarithmic Barrier Method
  7.1 Introduction
  7.2 Definition of the Newton step
  7.3 Properties of the Newton step
  7.4 Proximity and local quadratic convergence
    7.4.1 A sharper local quadratic convergence result
  7.5 Primal-dual logarithmic barrier algorithm with full Newton steps
    7.5.1 Convergence analysis
    7.5.2 Illustration of the algorithm with full Newton steps
    7.5.3 The classical analysis of the algorithm
  7.6 A version of the algorithm with adaptive updates
    7.6.1 Adaptive updating
    7.6.2 The primal-dual affine-scaling and centering direction
    7.6.3 Condition for adaptive updates
    7.6.4 Calculation of the adaptive update
    7.6.5 Special case: adaptive update at the µ-center
    7.6.6 A simple version of the condition for adaptive updating
    7.6.7 Illustration of the algorithm with adaptive updates
  7.7 The predictor-corrector method
    7.7.1 The predictor-corrector algorithm
    7.7.2 Properties of the affine-scaling step
    7.7.3 Analysis of the predictor-corrector algorithm
    7.7.4 An adaptive version of the predictor-corrector algorithm
    7.7.5 Illustration of adaptive predictor-corrector algorithm
    7.7.6 Quadratic convergence of the predictor-corrector algorithm
  7.8 A version of the algorithm with large updates
    7.8.1 Estimates of barrier function values
    7.8.2 Decrease of barrier function value
    7.8.3 A bound for the number of inner iterations
    7.8.4 Illustration of the algorithm with large updates

8 Initialization

III The Target-following Approach

9 Preliminaries
  9.1 Introduction
  9.2 The target map and its inverse
  9.3 Target sequences
  9.4 The target-following scheme

10 The Primal-Dual Newton Method
  10.1 Introduction
  10.2 Definition of the primal-dual Newton step
  10.3 Feasibility of the primal-dual Newton step
  10.4 Proximity and local quadratic convergence
  10.5 The damped primal-dual Newton method

11 Applications
  11.1 Introduction
  11.2 Central-path-following method
  11.3 Weighted-path-following method
  11.4 Centering method
  11.5 Weighted-centering method
  11.6 Centering and optimizing together
  11.7 Adaptive and large target-update methods

12 The Dual Newton Method
  12.1 Introduction
  12.2 The weighted dual barrier function
  12.3 Definition of the dual Newton step
  12.4 Feasibility of the dual Newton step
  12.5 Quadratic convergence
  12.6 The damped dual Newton method
  12.7 Dual target-updating

13 The Primal Newton Method
  13.1 Introduction
  13.2 The weighted primal barrier function
  13.3 Definition of the primal Newton step
  13.4 Feasibility of the primal Newton step
  13.5 Quadratic convergence
  13.6 The damped primal Newton method
  13.7 Primal target-updating

14 Application to the Method of Centers
  14.1 Introduction
  14.2 Description of Renegar's method
  14.3 Targets in Renegar's method
  14.4 Analysis of the center method
  14.5 Adaptive- and large-update variants of the center method

IV Miscellaneous Topics

15 Karmarkar's Projective Method
  15.1 Introduction
  15.2 The unit simplex Σ_n in IR^n
  15.3 The inner-outer sphere bound
  15.4 Projective transformations of Σ_n
  15.5 The projective algorithm
  15.6 The Karmarkar potential
  15.7 Iteration bound for the projective algorithm
  15.8 Discussion of the special format
  15.9 Explicit expression for the Karmarkar search direction
  15.10 The homogeneous Karmarkar format

16 More Properties of the Central Path
  16.1 Introduction
  16.2 Derivatives along the central path
    16.2.1 Existence of the derivatives
    16.2.2 Boundedness of the derivatives
    16.2.3 Convergence of the derivatives
  16.3 Ellipsoidal approximations of level sets

17 Partial Updating
  17.1 Introduction
  17.2 Modified search direction
  17.3 Modified proximity measure
  17.4 Algorithm with rank-one updates
  17.5 Count of the rank-one updates

18 Higher-Order Methods
  18.1 Introduction
  18.2 Higher-order search directions
  18.3 Analysis of the error term
  18.4 Application to the primal-dual Dikin direction
    18.4.1 Introduction
    18.4.2 The (first-order) primal-dual Dikin direction
    18.4.3 Algorithm using higher-order Dikin directions
    18.4.4 Feasibility and duality gap reduction
    18.4.5 Estimate of the error term
    18.4.6 Step size
    18.4.7 Convergence analysis
  18.5 Application to the primal-dual logarithmic barrier method
    18.5.1 Introduction
    18.5.2 Estimate of the error term
    18.5.3 Reduction of the proximity after a higher-order step
    18.5.4 The step-size
    18.5.5 Reduction of the barrier parameter
    18.5.6 A higher-order logarithmic barrier algorithm
    18.5.7 Iteration bound
    18.5.8 Improved iteration bound

19 Parametric and Sensitivity Analysis
  19.1 Introduction
  19.2 Preliminaries
  19.3 Optimal sets and optimal partition
  19.4 Parametric analysis
    19.4.1 The optimal-value function is piecewise linear
    19.4.2 Optimal sets on a linearity interval
    19.4.3 Optimal sets in a break point
    19.4.4 Extreme points of a linearity interval
    19.4.5 Running through all break points and linearity intervals
  19.5 Sensitivity analysis
    19.5.1 Ranges and shadow prices
    19.5.2 Using strictly complementary solutions
    19.5.3 Classical approach to sensitivity analysis
    19.5.4 Comparison of the classical and the new approach
  19.6 Concluding remarks

20 Implementing Interior Point Methods
  20.1 Introduction
  20.2 Prototype algorithm
  20.3 Preprocessing
    20.3.1 Detecting redundancy and making the constraint matrix sparser
    20.3.2 Reducing the size of the problem
  20.4 Sparse linear algebra
    20.4.1 Solving the augmented system
    20.4.2 Solving the normal equation
    20.4.3 Second-order methods
  20.5 Starting point
    20.5.1 Simplifying the Newton system of the embedding model
    20.5.2 Notes on warm start
  20.6 Parameters: step-size, stopping criteria
    20.6.1 Target-update
    20.6.2 Step size
    20.6.3 Stopping criteria
  20.7 Optimal basis identification
    20.7.1 Preliminaries
    20.7.2 Basis tableau and orthogonality
    20.7.3 The optimal basis identification procedure
    20.7.4 Implementation issues of basis identification
  20.8 Available software

Appendix A Some Results from Analysis
Appendix B Pseudo-inverse of a Matrix
Appendix C Some Technical Lemmas
Appendix D Transformation to canonical form
  D.1 Introduction
  D.2 Elimination of free variables
  D.3 Removal of equality constraints
Appendix E The Dikin step algorithm
  E.1 Introduction
  E.2 Search direction
  E.3 Algorithm using the Dikin direction
  E.4 Feasibility, proximity and step-size
  E.5 Convergence analysis

Bibliography
Author Index
Subject Index
Symbol Index

List of Figures

1.1 Dependence between the chapters.
3.1 Output Full-Newton step algorithm for the problem in Example I.7.
5.1 The graph of ψ.
5.2 The dual central path if b = (0, 1).
5.3 The dual central path if b = (1, 1).
6.1 The projection yielding s^{-1}∆s.
6.2 Required number of Newton steps to reach proximity 10^{-16}.
6.3 Convergence rate of the Newton process.
6.4 The proximity before and after a Newton step.
6.5 Demonstration no. 1 of the Newton process.
6.6 Demonstration no. 2 of the Newton process.
6.7 Demonstration no. 3 of the Newton process.
6.8 Iterates of the dual logarithmic barrier algorithm.
6.9 The idea of adaptive updating.
6.10 The iterates when using adaptive updates.
6.11 The functions ψ(δ) and ψ(−δ) for 0 ≤ δ < 1.
6.12 Bounds for b^T y.
6.13 The first iterates for a large update with θ = 0.9.
7.1 Quadratic convergence of primal-dual Newton process (µ = 1).
7.2 Demonstration of the primal-dual Newton process.
7.3 The iterates of the primal-dual algorithm with full steps.
7.4 The primal-dual full-step approach.
7.5 The full-step method with an adaptive barrier update.
7.6 Iterates of the primal-dual algorithm with adaptive updates.
7.7 Iterates of the primal-dual algorithm with cheap adaptive updates.
7.8 The right-hand side of (7.40) for τ = 1/2.
7.9 The iterates of the adaptive predictor-corrector algorithm.
7.10 Bounds for ψ_µ(x, s).
7.11 The iterates when using large updates with θ = 0.5, 0.9, 0.99 and 0.999.
9.1 The central path in the w-space (n = 2).
10.1 Lower bound for the decrease in φ_w during a damped Newton step.
11.1 A Dikin-path in the w-space (n = 2).
14.1 The center method according to Renegar.
15.1 The simplex Σ_3.
15.2 One iteration of the projective algorithm (x = x^k).
18.1 Trajectories in the w-space for higher-order steps with r = 1, 2, 3, 4, 5.
19.1 A shortest path problem.
19.2 The optimal partition of the shortest path problem in Figure 19.1.
19.3 The optimal-value function g(γ).
19.4 The optimal-value function f(β).
19.5 The feasible region of (D).
19.6 A transportation problem.
20.1 Basis tableau.
20.2 Tableau for a maximal basis.
E.1 Output of the Dikin Step Algorithm for the problem in Example I.7.

List of Tables

2.1 Scheme for dualizing.
3.1 Estimates for large and small variables on the central path.
3.2 Estimates for large and small variables if δ_c(z) ≤ τ.
6.1 Output of the dual full-step algorithm.
6.2 Output of the dual full-step algorithm with adaptive updates.
6.3 Progress of the dual algorithm with large updates, θ = 0.5.
6.4 Progress of the dual algorithm with large updates, θ = 0.9.
6.5 Progress of the dual algorithm with large updates, θ = 0.99.
7.1 Output of the primal-dual full-step algorithm.
7.2 Proximity values in the final iterations.
7.3 The primal-dual full-step algorithm with expensive adaptive updates.
7.4 The primal-dual full-step algorithm with cheap adaptive updates.
7.5 The adaptive predictor-corrector algorithm.
7.6 Asymptotic orders of magnitude of some relevant vectors.
7.7 Progress of the primal-dual algorithm with large updates, θ = 0.5.
7.8 Progress of the primal-dual algorithm with large updates, θ = 0.9.
7.9 Progress of the primal-dual algorithm with large updates, θ = 0.99.
7.10 Progress of the primal-dual algorithm with large updates, θ = 0.999.
16.1 Asymptotic orders of magnitude of some relevant vectors.

Preface

Linear Optimization¹ (LO) is one of the most widely taught and applied mathematical techniques. Due to revolutionary developments both in computer technology and algorithms for linear optimization, 'the last ten years have seen an estimated six orders of magnitude speed improvement'.² This means that problems that could not be solved 10 years ago, due to a required computational time of one year, say, can now be solved within some minutes. For example, linear models of airline crew scheduling problems with as many as 13 million variables have recently been solved within three minutes on a four-processor Silicon Graphics Power Challenge workstation. The achieved acceleration is due partly to advances in computer technology and for a significant part also to the developments in the field of so-called interior-point methods for linear optimization.

Until very recently, the method of choice for solving linear optimization problems was the Simplex Method of Dantzig [59]. Since the initial formulation in 1947, this method has been constantly improved.
It is generally recognized to be very robust and efficient and it is routinely used to solve problems in Operations Research, Business, Economics and Engineering.

¹ The field of Linear Optimization has been given the name Linear Programming in the past. The origin of this name goes back to the Dutch Nobel prize winner Koopmans. See Dantzig [60]. Nowadays the word 'programming' usually refers to the activity of writing computer programs, and as a consequence its use instead of the more natural word 'optimization' gives rise to confusion. Following others, like Padberg [230], we prefer to use the name Linear Optimization in the book. It may be noted that in the nonlinear branches of the field of Mathematical Programming (like Combinatorial Optimization, Discrete Optimization, Semidefinite Optimization, etc.) this terminology has already become generally accepted.
² This claim is due to R.E. Bixby, professor of Computational and Applied Mathematics at Rice University, and director of CPLEX Optimization, Inc., a company that markets algorithms for linear and mixed-integer optimization. See the news bulletin of the Center For Research on Parallel Computation, Volume 4, Issue 1, Winter 1996. Bixby adds that parallelization may lead to 'at least eight orders of magnitude improvement—the difference between a year and a fraction of a second!'

In an effort to explain the remarkable efficiency of the Simplex Method, people strived to prove, using the theory of complexity, that the computational effort to solve a linear optimization problem via the Simplex Method is polynomially bounded with the size of the problem instance. This question is still unsettled today, but it stimulated two important proposals of new algorithms for LO. The first one is due to Khachiyan in 1979 [167]: it is based on the ellipsoid technique for nonlinear optimization of Shor [255]. With this technique, Khachiyan proved that LO belongs to the class of polynomially solvable problems. Although this result has had a great theoretical impact, the new algorithm failed to deliver its promises in actual computational efficiency. The second proposal was made in 1984 by Karmarkar [165]. Karmarkar's algorithm is also polynomial, with a better complexity bound than Khachiyan, but it has the further advantage of being highly efficient in practice. After an initial controversy it has been established that for very large, sparse problems, subsequent variants of Karmarkar's method often outperform the Simplex Method.

Though the field of LO was considered more or less mature some ten years ago, after Karmarkar's paper it suddenly surfaced as one of the most active areas of research in optimization. In the period 1984–1989 more than 1300 papers were published on the subject, which became known as Interior Point Methods (IPMs) for LO.³ Originally the aim of the research was to get a better understanding of the so-called Projective Method of Karmarkar. Soon it became apparent that this method was related to classical methods like the Affine Scaling Method of Dikin [63, 64, 65], the Logarithmic Barrier Method of Frisch [86, 87, 88] and the Center Method of Huard [148, 149], and that the last two methods could also be proved to be polynomial. Moreover, it turned out that the IPM approach to LO has a natural generalization to the related field of convex nonlinear optimization, which resulted in a new stream of research and an excellent monograph of Nesterov and Nemirovski [226].

³ We refer the reader to the extensive bibliography of Kranich [179, 180] for a survey of the literature on the subject until 1989. A more recent (annotated) bibliography was given by Roos and Terlaky [242]. A valuable source of information is the World Wide Web interior point archive: http://www.mcs.anl.gov/home/otc/InteriorPoint.archive.html.
Promising numerical performances of IPMs for convex optimization were recently reported by Breitfeld and Shanno [50] and Jarre, Kocvara and Zowe [162]. The monograph of Nesterov and Nemirovski opened the way into another new subfield of optimization, called Semidefinite Optimization, with important applications in System Theory, Discrete Optimization, and many other areas. For a survey of these developments the reader may consult Vandenberghe and Boyd [48].

As a consequence of the above developments, there are now profound reasons why people may want to learn about IPMs. We hope that this book answers the need of professors who want to teach their students the principles of IPMs, of colleagues who need a unified presentation of a desperately burgeoning field, of users of LO who want to understand what is behind the new IPM solvers in commercial codes (CPLEX, OSL, ...) and how to interpret results from those codes, and of other users who want to exploit the new algorithms as part of a more general software toolbox in optimization.

Let us briefly indicate here what the book offers, and what it does not. Part I contains a small but complete and self-contained introduction to LO. We deal with the duality theory for LO and we present a first polynomial method for solving an LO problem. We also present an elegant method for the initialization of the method, using the so-called self-dual embedding technique. Then in Part II we present a comprehensive treatment of Logarithmic Barrier Methods. These methods are applied to the LO problem in standard format, the format that has become most popular in the field because the Simplex Method was originally devised for that format. This part contains the basic elements for the design of efficient algorithms for LO. Several types of algorithm are considered and analyzed. Very often the analysis improves the existing analysis and leads to sharper complexity bounds than known in the literature. In Part III we deal with the so-called Target-following Approach to IPMs. This is a unifying framework that enables us to treat many other IPMs, like the Center Method, in an easy way. Part IV covers some additional topics. It starts with the description and analysis of the Projective Method of Karmarkar. Then we discuss some more interesting theoretical properties of the central path. We also discuss two interesting methods to enhance the efficiency of IPMs, namely Partial Updating, and so-called Higher-Order Methods. This part also contains chapters on parametric and sensitivity analysis and on computational aspects of IPMs.

It may be clear from this description that we restrict ourselves to Linear Optimization in this book. We do not dwell on such interesting subjects as Convex Optimization and Semidefinite Optimization, but we consider the book as a preparation for the study of IPMs for these types of optimization problem, and refer the reader to the existing literature.⁴ Some popular topics in IPMs for LO are not covered by the book.
For example, we do not treat the (Primal) Affine Scaling Method of Dikin.⁵ The reason for this is that we restrict ourselves in this book to polynomial methods and until now the polynomiality question for the (Primal) Affine Scaling Method is unsettled. Instead we describe in Appendix E a primal-dual version of Dikin's affine-scaling method that is polynomial. Chapter 18 describes a higher-order version of this primal-dual affine-scaling method that has the best possible complexity bound known until now for interior-point methods. Another topic not touched in the book is (Primal-Dual) Infeasible Start Methods. These methods, which have drawn a lot of attention in the last years, deal with the situation when no feasible starting point is available.⁶ In fact, Part I of the book provides a much more elegant solution to this problem; there we show that any given LO problem can be embedded in a self-dual problem for which a feasible interior starting point is known. Further, the approach in Part I is theoretically more efficient than using an Infeasible Start Method, and from a computational point of view is not more involved, as we show in Chapter 20.

We hope that the book will be useful to students, users and researchers, inside and outside the field, in offering them, under a single cover, a presentation of the most successful ideas in interior-point methods.

⁴ For Convex Optimization the reader may consult den Hertog [140], Nesterov and Nemirovski [226] and Jarre [161]. For Semidefinite Optimization we refer to Nesterov and Nemirovski [226], Vandenberghe and Boyd [48] and Ramana and Pardalos [236]. We also mention Shanno and Breitfeld and Simantiraki [252] for the related topic of barrier methods for nonlinear programming.
⁵ A recent survey on affine scaling methods was given by Tsuchiya [272].
⁶ We refer the reader to, e.g., Potra [235], Bonnans and Potra [45], Wright [295, 297], Wright and Ralph [296] and the recent book of Wright [298].

Kees Roos   Tamás Terlaky   Jean-Philippe Vial

Preface to the 2005 edition

Twenty years after Karmarkar's [165] epoch-making paper, interior point methods (IPMs) made their way to all areas of optimization theory and practice. The theory of IPMs matured, their professional software implementations significantly pushed the boundary of efficiently solvable problems. Eight years passed since the first edition of this book was published. In these years the theory of IPMs further crystallized. One of the notable developments is that the significance of the self-dual embedding model – that is a distinctive feature of this book – got fully recognized. Leading linear and conic-linear optimization software packages, such as MOSEK⁷ and SeDuMi⁸, are developed on the bedrock of the self-dual model, and the leading commercial linear optimization package CPLEX⁹ includes the embedding model as a proposed option to solve difficult practical problems.

⁷ MOSEK: http://www.mosek.com
⁸ SeDuMi: http://sedumi.mcmaster.ca
⁹ CPLEX: http://cplex.com

This new edition of this book features a completely rewritten first part. While keeping the simplicity of the presentation and accessibility of complexity analysis, the featured IPM in Part I is now a standard, primal-dual path-following Newton algorithm. This choice allows us to reach the so-far best known complexity result in an elementary way, immediately in the first part of the book. As always, the authors had to make choices when and how to cut the expansion of the material of the book, and which new results to include in this edition.
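In its simplest form, the self-dual embedding model referred to above can be pictured as follows; this display is an illustration of the classical Goldman–Tucker construction in generic notation, not the exact model developed in Chapter 2. For a canonical primal-dual pair

\[ \min\{\, c^T x : Ax \ge b,\ x \ge 0 \,\} \qquad\text{and}\qquad \max\{\, b^T y : A^T y \le c,\ y \ge 0 \,\}, \]

both problems are combined, with a homogenizing variable τ, into the single system

\[ Ax - b\,\tau \ge 0, \qquad -A^T y + c\,\tau \ge 0, \qquad b^T y - c^T x \ge 0, \qquad x \ge 0,\ y \ge 0,\ \tau \ge 0 . \]

The coefficient matrix of this system is skew-symmetric, so the problem (with zero objective) coincides with its own dual. A strictly complementary solution with τ > 0 yields optimal solutions x/τ and y/τ with equal objective values, while a strictly complementary solution with τ = 0 certifies that at least one of the two problems is infeasible. The embedding used in the book augments such a system (essentially with one extra row and column) so that an interior starting point is available by inspection.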
We cannot resist mentioning two developments after the publication of the first edition. The first development can be considered as a direct consequence of the approach taken in the book. In our approach properties of the univariate function ψ(t), as defined in Section 5.5 (page 92), play a key role. The book makes clear that the primal-, dual- and primal-dual logarithmic barrier functions can be defined in terms of ψ(t), and as such ψ(t) is at the heart of all logarithmic barrier functions; we call it now the kernel function of the logarithmic barrier function. After the completion of the book it became clear that more efficient large-update IPMs than those considered in this book, which are all based on the logarithmic barrier function, can be obtained simply by replacing ψ(t) by other kernel functions. A large class of such kernel functions, which allowed the worst-case complexity of large-update IPMs to be improved, is the family of self-regular functions, which is the subject of the monograph [233]; more kernel functions were considered in [32].

A second, more recent development deals with the complexity of IPMs. Until now, the best iteration bound for IPMs is O(√n L), where n denotes the dimension of the problem (in standard form) and L the binary input size of the problem. In 1996, Todd and Ye showed that O(∛n L) is a lower bound for the iteration complexity of IPMs [267]. It is well known that the iteration complexity highly depends on the curliness of the central path, and that the presence of redundancy may severely affect this curliness. Deza et al. [61] showed that by adding enough redundant constraints to the Klee-Minty example of dimension n, the central path may be forced to visit all 2^n vertices of the Klee-Minty cube. An enhanced version of the same example, where the number of inequalities is N = O(2^(2n) n^3), yields an O(√N / log N) lower bound for the iteration complexity, thus almost closing (up to a factor of log N) the gap with the best worst-case iteration bound for IPMs [62].
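Written out in symbols, the orders just quoted are the following (n is the dimension, L the binary input size; constants are suppressed, and this merely restates the bounds cited above):

\[ O(\sqrt{n}\,L) \ \text{ iterations (best known worst-case upper bound)}, \qquad \Omega(\sqrt[3]{n}\,L) \ \text{ iterations (Todd--Ye lower bound)}, \]
\[ \Omega\!\big(\sqrt{N}/\log N\big) \ \text{ iterations for the redundant Klee-Minty cube with } N = O\!\big(2^{2n} n^{3}\big) \text{ inequalities}, \]

the last of which matches the best known worst-case bound up to the factor log N.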
Instructors adapting the book as a textbook in a course may contact the authors at <terlaky@mcmaster.ca> for obtaining the 'Solution Manual' for the exercises and getting access to a user forum.

March 2005
Kees Roos   Tamás Terlaky   Jean-Philippe Vial

Acknowledgements

The subject of this book came into existence during the twelve years following 1984 when Karmarkar initiated the field of interior-point methods for linear optimization. Each of the authors has been involved in the exciting research that gave rise to the subject and in many cases they published their results jointly. Of course the book is primarily organized around these results, but it goes without saying that many other results from colleagues in the 'interior-point community' are also included. We are pleased to acknowledge their contribution and at the appropriate places we have strived to give them credit. If some authors do not find due mention of their work we apologize for this and invoke as an excuse the exploding literature that makes it difficult to keep track of all the contributions. To reach a unified presentation of many diverse results, it did not suffice to make a bundle of existing papers. It was necessary to recast completely the form in which these results found their way into the journals. This was a very time-consuming task: we want to thank our universities for giving us the opportunity to do this job.

We gratefully acknowledge the developers of LaTeX for designing this powerful text processor and our colleagues Leo Rog and Peter van der Wijden for their assistance whenever there was a technical problem. For the construction of many tables and figures we used MATLAB; nowadays we could say that a mathematician without MATLAB is like a physicist without a microscope. It is really exciting to study the behavior of a designed algorithm with the graphical features of this 'mathematical microscope'.

We greatly enjoyed stimulating discussions with many colleagues from all over the world in the past years. Often this resulted in cooperation and joint publications. We kindly acknowledge that without the input from their side this book could not have been written. Special thanks are due to those colleagues who helped us during the writing process. We mention János Mayer (University of Zürich, Switzerland) for his numerous remarks after a critical reading of large parts of the first draft and Michael Saunders (Stanford University, USA) for an extremely careful and useful preview of a later version of the book. Many other colleagues helped us to improve intermediate drafts. We mention Jan Brinkhuis (Erasmus University, Rotterdam) who provided us with some valuable references, Erling Andersen (Odense University, Denmark), Harvey Greenberg and Allen Holder (both from the University of Colorado at Denver, USA), Tibor Illés (Eötvös University, Budapest), Florian Jarre (University of Würzburg, Germany), Etienne de Klerk (Delft University of Technology), Panos Pardalos (University of Florida, USA), Jos Sturm (Erasmus University, Rotterdam), and Joost Warners (Delft University of Technology).

Finally, the authors would like to acknowledge the generous contributions of numerous colleagues and students. Their critical reading of earlier drafts of the manuscript helped us to clean up the new edition by eliminating typos and using their constructive remarks to improve the readability of several parts of the book. We mention Jiming Peng (McMaster University), Gema Martinez Plaza (The University of Alicante) and Manuel Vieira (University of Lisbon/University of Technology Delft).

Last but not least, we want to express warm thanks to our wives and children. They also contributed substantially to the book by their mental support, and by forgiving our shortcomings as fathers for too long.

1 Introduction

1.1 Subject of the book

This book deals with linear optimization (LO). The object of LO is to find the optimal (minimal or maximal) value of a linear function subject to linear constraints on the variables. The constraints may be either equality or inequality constraints.¹ From the point of view of applications, LO possesses many nice features. Linear models are relatively simple to create. They can be realistic enough to give a proper account of the problems at hand. As a consequence, LO models have found applications in different areas such as engineering, management, logistics, statistics, pattern recognition, etc. LO is also very relevant to economic theory. It underlies the analysis of linear activity models and provides, through duality theory, a nice insight into the price mechanism. However, we will not deal with applications and modeling. Many existing textbooks teach more about this.² Our interest will be mainly in methods for solving LO problems, especially Interior Point Methods (IPM's).

¹ The more general optimization problem arising when the objective function and/or the constraints are nonlinear is not considered. It may be pointed out that LO is the first building block in the development of the theory of nonlinear optimization. Algorithmically, LO is also widely used in nonlinear and integer optimization, either as a subroutine in a more complicated algorithm or as a starting point of a specialized algorithm.
² The book of Williams [293] is completely devoted to the design of mathematical models, including linear models.
Renewed interest in these methods for solving LO problems arose after the seminal paper of Karmarkar [165] in 1984. The overwhelming amount of research of the last ten years has been tremendously prolific. Many new algorithms were proposed and almost all of these algorithms have been shown to be efficient, at least from a theoretical point of view. Our first aim is to present a comprehensive and unified treatment of many of these new methods.

It may not be surprising that exploring a new method for LO should lead to a new view of the theory of LO. In fact, a similar interaction between method and theory is well known for the Simplex Method; in the past the theory of LO and the Simplex Method were intimately related. The fundamental results of the theory of LO concern strong duality and the existence of a strictly complementary solution. Our second aim will be to derive these results from limiting properties of the so-called central path of an LO problem. Thus the very theory of LO is revisited. The central path appears to play a key role both in the development of the theory and in the design of algorithms. As a consequence, the book can be considered a self-contained treatment of LO. The reader familiar with the subject of LO will easily recognize the difference from the classical approach to the theory. The Simplex Method in essence explores the polyhedral structure of the domain (or feasible region) of an LO problem. Accordingly, the classical approach to the theory of LO concentrates on the polyhedral structure of the domain. On the other hand, the IPM approach uses the central path as a guide to the set of optimal solutions, and the theory follows by studying the limiting properties of this path.³ As we will see, the limit of the central path is a strictly complementary solution. Strictly complementary solutions play a crucial role in the theory as presented in Part I of the book. Also, in general, the output of a well-designed IPM for LO is a strictly complementary solution. Recall that the Simplex Method generates a so-called basic solution and that such solutions are fundamental in the classical theory of LO.

From the practical point of view it is most important to study the sensitivity of an optimal solution under perturbations in the data of an LO problem. This is the subject of Sensitivity (or Parametric or Postoptimal) Analysis. Our third aim will be to present some new results in this respect, which will make clear the well-known fact that the classical approach has some inherent weaknesses. These weaknesses can be overcome by exploring the concept of the optimal partition of an LO problem, which is closely related to a strictly complementary solution.

³ Most of the fundamental duality results for LO will be well known to many of the readers; they can be found in any textbook on LO. Probably the existence of a strictly complementary solution is less well known. This result has been shown first by Goldman and Tucker [111] and will be referred to as the Goldman–Tucker theorem. It plays a crucial role in this book. We get it as a byproduct of the limiting behavior of the central path.
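In symbols, and in the standard format used from Part II onwards (Part I itself works with a canonical, self-dual setting, so the display below is meant only to fix ideas in generic notation), the objects just mentioned are the following. The primal-dual pair is

\[ (P)\quad \min\{\, c^T x : Ax = b,\ x \ge 0 \,\}, \qquad (D)\quad \max\{\, b^T y : A^T y + s = c,\ s \ge 0 \,\}. \]

For each µ > 0 the central path point (x(µ), y(µ), s(µ)) is the unique solution of

\[ Ax = b,\ x > 0, \qquad A^T y + s = c,\ s > 0, \qquad x_i s_i = \mu \quad (1 \le i \le n), \]

provided both problems have strictly feasible solutions (and A has full row rank, so that y is unique as well). As µ decreases to 0 the path converges to an optimal pair (x*, s*) that is strictly complementary, i.e. x* s* = 0 and x* + s* > 0, and this pair determines the optimal partition

\[ B = \{\, i : x_i^* > 0 \,\}, \qquad N = \{\, i : s_i^* > 0 \,\}, \qquad B \cup N = \{1,\dots,n\},\ \ B \cap N = \emptyset . \]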
1.2 More detailed description of the contents

As stated in the previous section, we intend to present an interior point approach to both the theory of LO and algorithms for LO (design, convergence, complexity and asymptotic behavior). The common thread through the various parts of the book will be the prominent role of strictly complementary solutions; this notion plays a crucial role in the IPM approach and distinguishes the new approach from the classical Simplex-based approach.

Part I of the book consists of Chapters 2, 3 and 4. This part is a self-contained treatment of LO. It provides the main theoretical results for LO, as well as a polynomial method for solving the LO problem. The theory of LO is developed in Chapter 2. This is done in a way that is probably new for most readers, even for those who are familiar with LO. As indicated before, in IPM's a fundamental element is the central path of a problem. This path is introduced in Chapter 2 and the duality theory for LO is derived from its properties. The general theory turns out to follow easily when considering first the relatively small class of so-called self-dual problems. The results for self-dual problems are extended to general problems by embedding any given LO problem in an appropriate self-dual problem. Chapter 3 presents an algorithm that solves self-dual problems in polynomial time. It may be emphasized that this algorithm yields a so-called strictly complementary solution of the given problem. Such a solution, in general, provides much more information on the set of optimal solutions than an optimal basic solution as provided by the Simplex Method. The strictly complementary solution is obtained by applying a rounding procedure to a sufficiently accurate approximate solution. Chapter 4 is devoted to LO problems in canonical format, with (only) nonnegative variables and (only) inequality constraints. A thorough discussion of the special structure of the canonical format provides some specialized embeddings in self-dual problems. As a byproduct we find the central path for canonical LO problems. We also discuss how an approximate solution for the canonical problem can be obtained from an approximate solution of the embedding problem.

The two main components in an iterative step of an IPM are the search direction and the step-length along that direction. The algorithm in Part I is a rather simple primal-dual algorithm based on the primal-dual Newton direction and uses a very simple step-length rule: the step length is always 1. The resulting Full-Newton Step Algorithm is polynomial and straightforward to implement. However, the theoretical iteration bound derived for this algorithm, although polynomial, is relatively poor when compared with algorithms based on other search strategies. Therefore, more efficient methods are considered in Part II of the book; they are so-called Logarithmic Barrier Methods. For reasons of compatibility with the existing literature, on both the Simplex Method and IPM's, we abandon the canonical format (with nonnegative variables and inequality constraints) in Part II and use the so-called standard format (with nonnegative variables and equality constraints).
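The full-Newton-step idea just described can be summarized in a few lines. The following Python sketch is an illustration only: it works with the standard format of Part II rather than with the self-dual embedding of Part I, it assumes that A has full row rank and that a strictly feasible, well-centered triple (x, y, s) is available, and the function name, the barrier-update choice θ = 1/(2√n) and the normal-equations solve are choices made for this sketch, not prescriptions taken from the book.

    import numpy as np

    def full_newton_step_ipm(A, x, y, s, eps=1e-8, theta=None):
        """Minimal full-Newton-step path-following sketch (illustration only).

        Assumes A has full row rank and that (x, y, s) is strictly feasible
        for a standard-format primal-dual pair and lies close to the central path.
        """
        n = A.shape[1]
        if theta is None:
            theta = 1.0 / (2.0 * np.sqrt(n))          # small ("full-step") barrier update
        mu = x.dot(s) / n                              # current barrier value
        while n * mu > eps:                            # n*mu ~ duality gap near the path
            mu *= 1.0 - theta                          # shrink the central-path target
            # Newton step for:  A dx = 0,  A^T dy + ds = 0,  s*dx + x*ds = mu - x*s
            d = x / s
            r = mu / s - x
            dy = np.linalg.solve((A * d) @ A.T, -A @ r)   # normal equations A D A^T dy = -A r
            ds = -A.T @ dy
            dx = r - d * ds
            x, y, s = x + dx, y + dy, s + ds           # step length 1: a full Newton step
        return x, y, s

For a suitably small θ of this order, the full steps remain strictly feasible and well centered in exact arithmetic, which is what lies behind the O(√n)-type iteration bounds discussed later; obtaining the required starting triple is precisely the initialization question addressed in Chapter 8 and, via the self-dual embedding, in Part I.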
In order to make Part II independent of Part I, in Chapter 5 we revisit duality theory and discuss the relevant results for the standard format from an interior point of view. This includes, of course, the definition and existence of the central paths for the (primal) problem in standard form and its dual problem (which has free variables and inequality constraints). Using a symmetric formulation of both problems we see that any method for the primal problem induces in a natural way a method for the dual problem and vice versa. Then, in Chapter 6, we focus on the Dual Logarithmic Barrier Method; according to the previous remark the analysis can be naturally, and easily, transformed to the primal case. The search direction here is the Newton direction for minimizing the (classical) dual logarithmic barrier function with barrier parameter µ. Three types of method are considered. First we analyze a method that uses full Newton steps and small updates of the barrier parameter µ. This gives another central-path-following method that admits the best possible iteration bound. Secondly, we discuss the use of adaptive updates of µ; this leaves the iteration bound unchanged, but enhances the practical behavior. Finally, we consider methods that use large updates of µ and a bounded number of damped Newton steps between each pair of successive barrier updates. The (theoretical worst-case) iteration bound is worse than for the full Newton step method, but this seems to be due to the poor analysis of this type of method. In practice large-update methods are much more efficient than the full Newton step method. This is demonstrated by some (small) examples.

Chapter 7 deals with the Primal-Dual Logarithmic Barrier Method. It has basically the same structure as Chapter 6. Having defined the primal-dual Newton direction, we deal first with a full primal-dual Newton step method that allows small updates in the barrier parameter µ. Then we consider a method with adaptive updates of µ, and finally methods that use large updates of µ and a bounded number of damped primal-dual Newton steps between each pair of successive barrier updates. In-between we also deal with the Predictor-Corrector Method. The nice feature of this method is its asymptotic quadratic convergence rate. Some small computational examples are included that highlight the better performance of the primal-dual Newton method compared with the dual (or primal) Newton method. The methods used in Part II need to be initialized with a strictly feasible solution.⁴ Therefore, in Chapter 8 we discuss how to meet this condition. This concludes the description of Part II.

⁴ A feasible solution is called strictly feasible if no variable or inequality constraint is at (one of) its bound(s).

At this stage of the book, the reader will have encountered the main theoretical ideas underlying efficient implementations of IPM's for LO. He will have been exposed to many variants of IPM's, dual and primal-dual methods with either full or damped Newton steps.⁵ The search directions in these methods are Newton directions. All these methods, in one way or another, use the central path as a guideline to optimality.

⁵ In the literature, full-step methods are often called short-step methods and damped Newton step methods long-step methods or large-step methods. In damped-step methods a line search is made in each iteration that aims to (approximately) minimize a barrier (or potential) function. Therefore, these methods are also known as potential reduction methods.

Part III is devoted to a broader class of IPM's, some of which also follow the central path but others do not. In Chapter 9 we introduce the unifying concepts of target sequence and Target-following Methods. In the Logarithmic Barrier Methods of Part II the target sequence always consists of points on the central path. Other IPM's can be simply characterized by their target sequence.
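In the same generic standard-format notation used above, the unifying idea can be made concrete: a target is a positive vector w, and the associated target point (x(w), y(w), s(w)) is the solution of

\[ Ax = b,\ x > 0, \qquad A^T y + s = c,\ s > 0, \qquad x_i s_i = w_i \quad (1 \le i \le n), \]

which exists and is unique under the interior-point condition. The central path corresponds to the special targets w = µe, a weighted path to w = µ·w̄ for a fixed weight vector w̄ > 0, and a pure centering step keeps w fixed while the iterate is moved closer to the target point. (Chapter 9 sets this up precisely and may parametrize targets slightly differently, for instance through componentwise square roots; the display is only meant to convey the idea.)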
We present some examples in Chapter 11, where we deal with weighted-path-following methods, a Dikin-path-following method, and also with a centering method that can be used to compute the so-called weighted analytic center of a polytope. Chapters 10, 12 and 13 present, respectively, primal-dual, dual and primal versions of Newton's method for following a given target sequence. Finally, concluding Part III, in Chapter 14 we describe a famous interior-point method, due to Renegar and based on the center method of Huard; we show that it fits nicely in the framework of target-following methods, with the targets on the central path. Part IV is entitled Miscellaneous Topics: it contains material that deserves a place in the book but did not fit well in any of the previous three parts. The reader will have noticed that until now we have not discussed the very first polynomial IPM, the Projective Method of Karmarkar. This is because the mainstream of research into IPM's diverged from this method soon after 1984.6 Because of the great influence this algorithm had on the field of LO, and also because there is still a small ongoing stream of research in this direction, it deserves a place in this book. We describe and analyze Karmarkar's method in Chapter 15. Surprisingly enough, and in contrast with all other methods discussed in this book, both in the description and the analysis of Karmarkar's method we do not refer to the central path; also, the search direction differs from the Newton directions used in the other methods. In Chapter 16 we return to the central path. We show that the central path is differentiable and study the asymptotic behavior of the derivatives when the optimal set is approached. We also show that we can associate with each point on the central path two homothetic ellipsoids centered at this point so that one ellipsoid is contained in the feasible region and the other ellipsoid contains the optimal set. The next two chapters deal with methods for accelerating IPM's. Chapter 17 deals with a technique called partial updating, already proposed in Karmarkar's original paper. In Chapter 18 we consider so-called higher-order methods. The Newton methods used before are considered to be first-order methods.

4 A feasible solution is called strictly feasible if no variable or inequality constraint is at (one of) its bound(s).
5 In the literature, full-step methods are often called short-step methods, and damped Newton step methods long-step methods or large-step methods. In damped-step methods a line search is made in each iteration that aims to (approximately) minimize a barrier (or potential) function. Therefore, these methods are also known as potential reduction methods.
6 There are still many textbooks on LO that do not deal with IPM's. Moreover, in some other textbooks that pay attention to IPM's, the authors only discuss the Projective Method of Karmarkar, thereby neglecting the important developments after 1984 that gave rise to the efficient methods used in the well-known commercial codes, such as CPLEX and OSL. Exceptions, in this respect, are Bazaraa, Sherali and Shetty [37], Padberg [230] and Fang and Puthenpura [74], who discuss the existence of other IPM's in a separate section or chapter. We also mention Saigal [249], who gives a large chapter (of 150 pages) on a topic not covered in this book, namely (primal) affine-scaling methods. A recent survey on these methods is given by Tsuchiya [272].
It is shown that more advanced search directions improve the iteration bounds of several of these first-order methods; the resulting complexity bound achieves the best value currently known for IPM's. We also apply the higher-order technique to the Logarithmic Barrier Method. Chapter 19 deals with Parametric and Sensitivity Analysis. This classical subject in LO is of great importance in the analysis of practical linear models. Almost any textbook includes a section about it and many commercial optimization packages offer an option to perform post-optimal analysis. Unfortunately, the classical approach, based on the use of an optimal basic solution, has some inherent weaknesses. These weaknesses are discussed and demonstrated. We follow a new approach in this chapter, leading to a better understanding of the subject and avoiding the shortcomings of the classical approach. The notions of optimal partition and strictly complementary solution play an important role, but to avoid any misunderstanding, it should be emphasized that the new approach can also be applied when only an optimal basic solution is available. After all the effort spent in the book on developing beautiful theorems and convergence results, the reader may want some more evidence that IPM's work well in practice. Therefore the final chapter is devoted to the implementation of IPM's. Though most implementations more or less follow the scheme prescribed by the theory, there is still a large gap between the theory and an efficient implementation. Chapter 20 discusses some of the important implementation issues.

1.3 What is new in this book?

The book offers an approach to LO and to IPM's that is new in many aspects.7 First, the derivation of the main theoretical results for LO, like the duality theory and the existence of a strictly complementary solution, from properties of the central path is new. The primal-dual algorithm for solving self-dual problems is also new; equipped with the rounding procedure it yields an exact strictly complementary solution. The derivation of the polynomial complexity of the whole procedure is surprisingly simple.8 The algorithms in Part II, based on the logarithmic barrier method, are known from the literature, but their analysis contains many new elements, often resulting in much sharper bounds than those in the literature. In this respect an important (and new) tool is the function ψ, first introduced in Section 5.5 and used through the rest of the book. We present a comprehensive discussion of all possible variants of these algorithms (like dual, primal and primal-dual full-step, adaptive-update and large-update methods). We also deal with the predictor-corrector method, which is very important from the practical point of view, and show that this method has an asymptotically quadratic convergence rate. We also discuss the techniques of partial updating and the use of higher-order methods. Finally, we present a new approach to sensitivity analysis and discuss many computational aspects that are crucial for an efficient implementation of IPM's.

7 Of course, the book is inspired by many papers and results of many colleagues. Thinking over these results often led to new insights, new algorithms and new ways to analyze these algorithms.
8 The approach in Part I, based on the embedding of a given LO problem in a self-dual problem, suggests some new and promising implementation strategies.
1.4 Required knowledge and skills

We wanted to write a book that presents the most prominent results on IPM's in a unified and comprehensive way, with a full development of the most important items. Part I especially can be considered an elementary introduction to LO, containing both a complete derivation of the duality theory and an easy-to-analyze polynomial algorithm. The mathematical tools that are used do not go beyond standard calculus and linear algebra. Nevertheless, people educated in the Simplex-based approach to LO will need some effort to get acquainted with the formalism and the mathematical manipulations. Whereas they have struggled with the algebra of pivoting, the new methods do not refer to pivoting at all.9 However, the tools used are not much more advanced than those that were required to master the Simplex Method. We therefore expect that people will quickly get acquainted with the new tools, just as many generations of students have become familiar with pivoting. In general, the level of the book will be accessible to any student in Operations Research and Mathematics with 2 to 3 years of basic training in calculus and linear algebra.

1.5 How to use the book for courses

Owing to the importance of LO in theory and in practice, it must be expected that IPM's will soon become a popular topic in Operations Research and other fields where LO is used, such as Business, Economics and Engineering. More and more institutions will open courses dedicated to IPM's for LO. It has been one of our purposes to collect in this book all relevant material from research papers, survey papers, etc. and to strive for a cohesive and easily accessible source for such courses. The dependence between the chapters is demonstrated in Figure 1.1. This figure indicates some possible reading paths through the book.

[Figure 1.1: Dependence between the chapters.]

For newcomers in the field we recommend starting with Part I, consisting of Chapters 2, 3 and 4. This part of the book can be used for a basic course in LO, covering duality theory and offering a first and easy-to-analyze polynomial algorithm: the Full-Newton Step Algorithm. Part II deals with LO problems in standard format. Chapter 5 covers the duality theory and Chapters 6 and 7 deal with several interesting variants of the Logarithmic Barrier Method that underlie the efficient solvers in existing commercial optimization packages. For readers who know the Simplex Method and who are familiar with the LO problem in standard format, we made Part II independent of Part I; they might wish to start their reading with Part II and then proceed with Part I. Part III, on the target-following approach, offers much new understanding of the principles of IPM's, as well as a unifying and easily accessible treatment of other IPM's, such as the method of Renegar (Chapter 14). This part could be included in a more advanced course on IPM's. Chapter 15 contains a relatively simple description and analysis of Karmarkar's Projective Method. This chapter is almost independent of the previous chapters and hence can be read at any stage. Chapters 16, 17 and 18 could find a place in an advanced course.

9 However, numerical analysts who want to perform the actual implementation really need to master advanced sparse linear algebra, including pivoting strategies in matrix factorization. See Chapter 20.
The value of Chapter 16 is purely theoretical; the chapter is recommended to readers who want to delve more deeply into the properties of the central path. The other two chapters, on the other hand, have more practical value. They describe and apply two techniques (partial updating and higher-order methods) that can be used to enhance the efficiency of some methods. We consider Chapter 19 to be extremely important for users of LO who are interested in the sensitivity of their models to perturbations in the input data. This chapter is independent of almost all the previous chapters. Finally, Chapter 20 is relevant for readers who are interested in implementation issues. It assumes a basic understanding of many theoretical concepts for IPM's and of advanced numerical algebra.

1.6 Footnotes and exercises

It may be worthwhile to devote some words to the positioning of footnotes and exercises in this book. The footnotes are used to refer to related references, or to make a small digression from the main thrust of the reasoning. We preferred to place the footnotes not at the end of each chapter but at the bottom of the page they refer to. We have treated exercises in the same way. They often have a goal similar to footnotes, namely to highlight a result closely related to results discussed in the book.

1.7 Preliminaries

We assume that the reader is familiar with the basic concepts of linear algebra, such as linear (sub-)space, linear (in-)dependence of vectors, determinant of a (square) matrix, nonsingularity of a matrix, inverse of a matrix, etc. We recall some basic concepts and results in this section.10

1.7.1 Positive definite matrices

The space of all square $n \times n$ matrices is denoted by $\mathbb{R}^{n\times n}$. A matrix $A \in \mathbb{R}^{n\times n}$ is called a positive definite matrix if $A$ is symmetric and each of its eigenvalues is positive.11 The following statements are equivalent for any symmetric matrix $A$:
(i) $A$ is positive definite;
(ii) $A = C^T C$ for some nonsingular matrix $C$;
(iii) $x^T A x > 0$ for each nonzero vector $x$.
A matrix $A \in \mathbb{R}^{n\times n}$ is called a positive semi-definite matrix if $A$ is symmetric and its eigenvalues are nonnegative. The following statements are equivalent for any symmetric matrix $A$:
(i) $A$ is positive semi-definite;
(ii) $A = C^T C$ for some matrix $C$;
(iii) $x^T A x \geq 0$ for each vector $x$.

1.7.2 Norms of vectors and matrices

In this book a vector $x$ is always an $n$-tuple $(x_1, x_2, \ldots, x_n)$ in $\mathbb{R}^n$. The numbers $x_i$ ($1 \le i \le n$) are called the coordinates or entries of $x$. Usually we think of $x$ as a column vector and of its transpose, denoted by $x^T$, as a row vector. If all entries of $x$ are zero we simply write $x = 0$. A special vector is the all-one vector, denoted by $e$, whose coordinates are all equal to 1. The scalar product of $x$ and $s \in \mathbb{R}^n$ is given by
\[ x^T s = \sum_{i=1}^{n} x_i s_i. \]
We recall the following properties of norms for vectors and matrices. A norm (or vector norm) on $\mathbb{R}^n$ is a function that assigns to each $x \in \mathbb{R}^n$ a nonnegative number $\|x\|$ such that for all $x, s \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}$:
\[ \|x\| > 0 \ \text{ if } x \neq 0, \qquad \|\alpha x\| = |\alpha| \, \|x\|, \qquad \|x + s\| \le \|x\| + \|s\|. \]

10 For a more detailed treatment we refer the reader to books like Bellman [38], Birkhoff and MacLane [41], Golub and Van Loan [112], Horn and Johnson [147], Lancaster and Tismenetsky [181], Ben-Israel and Greville [39], Strang [259] and Watkins [289].
11 Some authors do not include symmetry as part of the definition. For example, Golub and Van Loan [112] call $A$ positive definite if (iii) holds without requiring symmetry of $A$.
The Euclidean norm is defined by
\[ \|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}. \]
When the norm is not further specified, $\|x\|$ will always refer to the Euclidean norm. The Cauchy–Schwarz inequality states that for $x, s \in \mathbb{R}^n$:
\[ x^T s \le \|x\| \, \|s\|. \]
The inequality holds with equality if and only if $x$ and $s$ are linearly dependent. For any $p \ge 1$ we also have the $p$-norm, defined by
\[ \|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{\frac{1}{p}}. \]
The Euclidean norm is the special case where $p = 2$ and is therefore also called the 2-norm. Another important special case is the 1-norm:
\[ \|x\|_1 = \sum_{i=1}^{n} |x_i|. \]
Letting $p$ go to infinity we get the so-called infinity norm:
\[ \|x\|_\infty = \lim_{p \to \infty} \|x\|_p. \]
We have
\[ \|x\|_\infty = \max_{1 \le i \le n} |x_i|. \]
For any positive definite $n \times n$ matrix $A$ we have a vector norm $\|\cdot\|_A$ according to
\[ \|x\|_A = \sqrt{x^T A x}. \]
For any norm the unit ball in $\mathbb{R}^n$ is the set
\[ \{ x \in \mathbb{R}^n : \|x\| = 1 \}. \]
By concatenating the columns of an $n \times n$ matrix $A$ (in the natural order), $A$ can be considered a vector in $\mathbb{R}^{n^2}$. A function assigning to each $A \in \mathbb{R}^{n\times n}$ a real number $\|A\|$ is called a matrix norm if it satisfies the conditions for a vector norm and moreover
\[ \|AB\| \le \|A\| \, \|B\| \quad \text{for all } A, B \in \mathbb{R}^{n\times n}. \]
A well-known matrix norm is the Frobenius norm $\|\cdot\|_F$, which is simply the vector 2-norm applied to the matrix:
\[ \|A\|_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij}^2}. \]
Every vector norm induces a matrix norm according to
\[ \|A\| = \max_{\|x\| = 1} \|Ax\|. \]
This matrix norm satisfies
\[ \|Ax\| \le \|A\| \, \|x\|, \quad \forall x \in \mathbb{R}^n. \]
The vector 1-norm induces the matrix norm
\[ \|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |A_{ij}|, \]
and the vector $\infty$-norm induces the matrix norm
\[ \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |A_{ij}|. \]
$\|A\|_1$ is also called the column sum norm and $\|A\|_\infty$ the row sum norm. Note that $\|A\|_\infty = \|A^T\|_1$. Hence, if $A$ is symmetric then $\|A\|_\infty = \|A\|_1$. The matrix norm induced by the vector 2-norm is, by definition,
\[ \|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2. \]
This norm is also called the spectral matrix norm. Observe that it differs from the Frobenius norm (consider both norms for $A = I$, where $I = \operatorname{diag}(e)$). In general, $\|A\|_2 \le \|A\|_F$.

1.7.3 Hadamard inequality for the determinant

For an $n \times n$ matrix $A$ with columns $a_1, a_2, \ldots, a_n$ the absolute value of its determinant satisfies
\[ |\det(A)| = \text{volume of the parallelepiped spanned by } a_1, a_2, \ldots, a_n. \]
This interpretation of the determinant implies the inequality
\[ |\det(A)| \le \|a_1\|_2 \, \|a_2\|_2 \cdots \|a_n\|_2, \]
which is known as the Hadamard inequality.12

1.7.4 Order estimates

Let $f$ and $g$ be functions from the positive reals to the positive reals. In many estimates the following definitions will be helpful.
• We write $f(x) = O(g(x))$ if there exists a positive constant $c$ such that $f(x) \le c\,g(x)$ for all $x > 0$.
• We write $f(x) = \Omega(g(x))$ if there exists a positive constant $c$ such that $f(x) \ge c\,g(x)$ for all $x > 0$.
• We write $f(x) = \Theta(g(x))$ if there exist positive constants $c_1$ and $c_2$ such that $c_1 g(x) \le f(x) \le c_2 g(x)$ for all $x > 0$.

1.7.5 Notational conventions

The identity matrix is usually denoted as $I$; if the size of $I$ is not clear from the context we use a subscript, as in $I_n$, to specify that it is the $n \times n$ identity matrix. Similarly, zero matrices and zero vectors are usually denoted simply as $0$; but if the size is ambiguous, we use subscripts, as in $0_{m \times n}$, to specify the size. The all-one vector is always denoted as $e$, and if necessary its size is specified by a subscript. For any $x \in \mathbb{R}^n$ we often denote the diagonal matrix $\operatorname{diag}(x)$ by the corresponding capital $X$. For example, $D = \operatorname{diag}(d)$.

12 See, e.g., Horn and Johnson [147], page 477.
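The definitions collected in Sections 1.7.1 through 1.7.5 are easy to check numerically on small random data. The following sketch is only an illustration (NumPy is assumed to be available and the data are chosen arbitrarily): it builds a positive definite matrix as $C^TC$, evaluates the vector and matrix norms just introduced, and verifies the Cauchy–Schwarz and Hadamard inequalities as well as the $\operatorname{diag}(x) = X$ convention.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A positive definite matrix built as A = C^T C (Section 1.7.1);
# a random square C is nonsingular with probability one.
C = rng.standard_normal((n, n))
A = C.T @ C
assert np.all(np.linalg.eigvalsh(A) > 0)      # all eigenvalues are positive

x = rng.standard_normal(n)
s = rng.standard_normal(n)
assert x @ A @ x > 0                          # x^T A x > 0 for this (nonzero) x

# Vector norms and the Cauchy-Schwarz inequality.
assert abs(x @ s) <= np.linalg.norm(x) * np.linalg.norm(s) + 1e-12
norm_1, norm_2, norm_inf = np.linalg.norm(x, 1), np.linalg.norm(x), np.linalg.norm(x, np.inf)
norm_A = np.sqrt(x @ A @ x)                   # the norm ||x||_A induced by A

# Matrix norms: the spectral norm never exceeds the Frobenius norm.
assert np.linalg.norm(A, 2) <= np.linalg.norm(A, 'fro') + 1e-12

# Hadamard inequality: |det(A)| is at most the product of the column lengths of A.
assert abs(np.linalg.det(A)) <= np.prod(np.linalg.norm(A, axis=0)) * (1 + 1e-12)

# Notational convention: diag(x) is written as the capital X, so Xs is the componentwise product.
X = np.diag(x)
assert np.allclose(X @ s, x * s)
```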
The componentwise product of two vectors x, s ∈ IRn , known as the Hadamard product of x and s is denoted compactly by xs.13 The i-th entry of xs is xi si . In other words, xs = Xs = Sx. As a consequence we have for the scalar product of x and s, xT s = eT (xs), which will be used repeatedly later on. Similarly we use x/s for the componentwise quotient of x and s. This kind of notation is also used for√unitary operations. For √ example, the i-th entry of x−1 is xi−1 and the i-th entry of x is xi . This notation is consistent as long as componentwise operations are given precedence over matrix operations. Thus, if A is a matrix then Axs = A(xs). 12 See, e.g., Horn and Johnson [147], page 477. 13 In the literature this product is known as the Hadamard product of x and s. It is often denoted by x•s. Throughout the book we will use the shorter notation xs. Note that if x and s are nonnegative then xs = 0 holds if and only if xT s = 0. Part I Introduction: Theory and Complexity 2 Duality Theory for Linear Optimization 2.1 Introduction This chapter introduces the reader to the main theoretical results in the field of linear optimization (LO). These results concern the notion of duality in LO. An LO problem consists of optimizing (i.e., minimizing or maximizing) a linear objective function subject to a finite set of linear constraints. The constraints may be equality constraints or inequality constraints. If the constraints are inconsistent, so that they do not allow any feasible solution, then the problem is called infeasible, otherwise feasible. In the latter case the feasible set (or domain) of the problem is not empty; then there are two possibilities: the objective function is either unbounded or bounded on the domain. In the first case, the problem is called unbounded and in the second case bounded. The set of optimal solutions of a problem is referred to as the optimal set; the optimal set is empty if and only if the problem is infeasible or unbounded. For any LO problem we may construct a second LO problem, called its dual problem, or shortly its dual. A problem and its dual are closely related. The relation can be expressed nicely in terms of the optimal sets of both problems. If the optimal set of one of the two problems is nonempty, then neither is the optimal set of the other problem; moreover, the optimal values of the objective functions for both problems are equal. These nontrivial results are the basic ingredients of the so-called duality theory for LO. The duality theory for LO can be derived in many ways.1 A popular approach in textbooks to this theory is constructive. It is based on the Simplex Method. While solving a problem by this method, at each iterative step the method generates so1 The first duality results in LO were obtained in a nonconstructive way. They can be derived from some variants of Farkas’ lemma [75], or from more general separation theorems for convex sets. See, e.g., Osborne [229] and Saigal [249]. An alternative approach is based on direct inductive proofs of theorems of Farkas, Weyl and Minkowski and derives the duality results for LO as a corollary of these theorems. See, e.g., Gale [91]. Constructive proofs are based on finite termination of a suitable algorithm for solving either linear inequality systems or LO problems. A classical method for solving linear inequality systems in a finite number of steps is Fourier-Motzkin elimination. By this method we can decide in finite time if the system admits a feasible solution or not. See, e.g., Dantzig [59]. 
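The footnote above mentions Fourier-Motzkin elimination as a finite method for deciding feasibility of a system of linear inequalities. A minimal sketch is given below; the helper fm_feasible is our own illustration (not taken from the book), works on systems written as $Ax \le b$, and is exponential in the worst case, so it is meant only for very small instances.

```python
import numpy as np

def fm_feasible(A, b, tol=1e-9):
    """Decide feasibility of Ax <= b by Fourier-Motzkin elimination (tiny systems only)."""
    A, b = np.asarray(A, float), np.asarray(b, float)
    n = A.shape[1]
    for k in range(n - 1, -1, -1):            # eliminate x_k, then x_{k-1}, ...
        pos = [i for i in range(len(b)) if A[i, k] > tol]
        neg = [i for i in range(len(b)) if A[i, k] < -tol]
        zero = [i for i in range(len(b)) if abs(A[i, k]) <= tol]
        rows, rhs = [A[i, :k] for i in zero], [b[i] for i in zero]
        for i in pos:                          # combine every upper bound on x_k ...
            for j in neg:                      # ... with every lower bound on x_k
                rows.append(A[i, :k] / A[i, k] - A[j, :k] / A[j, k])
                rhs.append(b[i] / A[i, k] - b[j] / A[j, k])
        if not rows:                           # nothing left: remaining variables are free
            return True
        A, b = np.array(rows).reshape(len(rows), k), np.array(rhs)
    return bool(np.all(b >= -tol))             # only constants remain: feasible iff 0 <= b

# x1 + x2 <= 1, x1 >= 0, x2 >= 0 is feasible; adding x1 + x2 >= 2 makes it infeasible.
A = [[1, 1], [-1, 0], [0, -1]]
print(fm_feasible(A, [1, 0, 0]))                     # True
print(fm_feasible(A + [[-1, -1]], [1, 0, 0, -2]))    # False
```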
This can be used to proof Farkas’ lemma from which the duality results for LO then easily follow. For the LO problem there exist several finite termination methods. One of them, the Simplex Method, is sketched in this paragraph. Many authors use such a method to derive the duality results for LO. See, e.g., Chvátal [55], Dantzig [59], Nemhauser and Wolsey [224], Papadimitriou and Steiglitz [231], Schrijver [250] and Walsh [287]. 16 I Theory and Complexity called multipliers associated with the constraints. The method terminates when the multipliers turn out to be feasible for the dual problem; then it yields an optimal solution both for the primal and the dual problem.2 Interior point methods are also intimately linked with duality theory. The key concept is the so-called central path, an analytic curve in the interior of the domain of the problem that starts somewhere in the ‘middle’ of the domain and ends somewhere in the ‘middle’ of the optimal set of the problem. The term ‘middle’ in this context will be made precise later. Interior point methods follow the central path (approximately) as a guideline to the optimal set.3 One of the aims of this chapter is to show that the aforementioned duality results can be derived from properties of the central path.4 Not every problem has a central path. Therefore, it is important in this framework to determine under which condition the central path exists. It happens that this condition implies the existence of the central path for the dual problem and the points on the dual central path are closely related to the points on the primal central path. As a consequence, following the primal central path (approximately) to the primal optimal set goes always together with following the dual central path (approximately) to the dual optimal set. Thus, when the primal and dual central paths exist, the interiorpoint approach yields in a natural way the duality theory for LO, just as in the case of the Simplex Method. When the central paths do not exist the duality results can be obtained by a little trick, namely by embedding the given problem in a larger problem which has a central path. Below this approach will be discussed in more detail. We start the whole analysis, in the next section, by considering the LO problem in the so-called canonical form. So the objective is to minimize a linear function over a set of inequality constraints of greater-than-or-equal type with nonnegative variables. Since every LO problem admits a canonical representation, the validity of the duality results in this chapter naturally extend to arbitrary LO problems. Usually the canonical form of an LO problem is obtained by introducing new variables and/or constraints. As a result, the number of variables and/or constraints may be doubled. In Appendix D.1 we present a specific scheme that transforms any LO problem that is not in the canonical form to a canonical problem in such a way that the total number of variables and constraints does not increase, and even decreases in many cases. We show that solving the canonical LO problem can be reduced to finding a solution of an appropriate system of inequalities. In Section 2.4 we impose a condition on the system—the interior-point condition— and we show that this condition is not satisfied by our system of inequalities. By expanding the given system slightly however we get an equivalent system that satisfies the interior-point condition. Then we construct a self-dual problem5 whose domain is defined by the last system. 
We further show that a solution of the system, and hence of the given LO problem, can easily be obtained 2 3 The Simplex Method was proposed first by Dantzig [59]. In fact, this method has many variants due to various strategies for choosing the pivot element. When we refer to the Simplex Method we always assume that a pivot strategy is used that prevents cycling and thus guarantees finite termination of the method. This interpretation of recent interior-point methods for LO was proposed first by Megiddo [200]. The notion of central path originates from nonlinear (convex) optimization; see Fiacco and McCormick [77]. 4 This approach to the duality theory has been worked out by Güler et al. [133, 134]. 5 Problems of this special type were considered first by Tucker [274], in 1956. I.2 Duality Theory 17 from a so-called strictly complementary solution of the self-dual problem. Thus the canonical problem can be embedded in a natural way into a selfdual problem and using the existence of a strictly complementary solution for the embedding self-dual problem we derive the classical duality results for the canonical problem. This is achieved in Section 2.9. The self-dual problem in itself is a trivial LO problem. In this problem all variables are nonnegative. The problem is trivial in the sense that the zero vector is feasible and also optimal. In general the zero vector will not be the only optimal solution. If the optimal set contains nonzero vectors, then some of the variables must occur with positive value in an optimal solution. Thus we may divide the variables into two groups: one group contains the variables that are zero in each optimal solution, and the second group contains the other variables that may occur with positive sign in an optimal solution. Let us call for the moment the variables in the first group ‘good’ variables and those in the second group ‘bad’ variables. We proceed by showing that the interior-point condition guarantees the existence of the central path. The proof of this fact in Section 2.7 is constructive. From the limiting behavior of the central path when it approaches the optimal set, we derive the existence of a strictly complementary solution of the self-dual problem. In such an optimal solution all ‘good’ variables are positive, whereas the ’bad’ variables are zero, of course. Next we prove the same result for the case where the interior-point condition does not hold. From this we derive that every (canonical) LO problem that has an optimal solution, also has a strictly complementary optimal solution. It may be clear that the nontrivial part of the above analysis concerns the existence of a strictly complementary solution for the self-dual problem. Such solutions play a crucial role in the approach of this book. Obviously a strictly complementary solution provides much more information on the optimal set of the problem than just one optimal solution, because variables that occur with zero value in a strictly complementary solution will be zero in any optimal solution.6 One of the surprises of this chapter is that the above results for the self-dual problem immediately imply all basic duality results for the general LO problem. This is shown first for the canonical problem in Section 2.9 and then for general LO problems in Section 2.10; in this section we present an easy-to-remember scheme for writing down the dual problem of any given LO problem. 
This involves first transforming the given problem to a canonical form, then taking the dual of this problem and reformulating the canonical dual so that its relation to the given problem becomes more apparent. The scheme is such that applying it twice returns the original problem. Finally, although the result is not used explicitly in this chapter, but because it is interesting in itself, we conclude this chapter with Section 2.11 where we show that the central path converges to an optimal solution. 6 The existence of strictly complementary optimal solutions was shown first by Goldman and Tucker [111] in 1956. Balinski and Tucker [33], in 1969, gave a constructive proof. 18 I Theory and Complexity 2.2 The canonical LO-problem and its dual We say that a linear optimization problem is in canonical form if it is written in the following way:  (P ) min cT x : Ax ≥ b, x ≥ 0 , (2.1) where the matrix A is of size m × n, the vectors c and x are in IRn and b in IRm . Note that all the constraints in (P ) are inequality constraints and the variables are nonnegative. Each LO-problem can be transformed to an equivalent canonical problem.7 Given the above canonical problem (P ), we consider a second problem, denoted by (D) and called the dual problem of (P ), given by  (D) max bT y : AT y ≤ c, y ≥ 0 . (2.2) The two problems (P ) and (D) share the matrix A and the vectors b and c in their description. But the role of b and c has been interchanged: the objective vector c of (P ) is the right-hand side vector of (D), and, similarly, the right-hand side vector b of (P ) is the objective vector of (D). Moreover, the constraint matrix in (D) is the transposed matrix AT , where A is the constraint matrix in (P ). In both problems the variables are nonnegative. The problems differ in that (P ) is a minimization problem whereas (D) is a maximization problem, and, moreover, the inequality symbols in the constraints have opposite direction.8,9 At this stage we make a crucial observation. Lemma I.1 (Weak duality) Let x be feasible for (P ) and y for (D). Then bT y ≤ cT x. (2.3) Proof: If x is feasible for (P ) and y for (D), then x ≥ 0, y ≥ 0, Ax ≥ b and AT y ≤ c. As a consequence we may write  bT y ≤ (Ax)T y = xT AT y ≤ cT x. This proves the lemma. ✷ Hence, any y that is feasible for (D) provides a lower bound bT y for the value of cT x, whenever x is feasible for (P ). Conversely, any x that is feasible for (P ) provides an upper bound cT x for the value of bT y, whenever y is feasible for (D). This phenomenon is known as the weak duality property. We have as an immediate consequence the following. Corollary I.2 If x is feasible for (P ) and y for (D), and cT x = bT y, then x is optimal for (P ) and y is optimal for (D). 7 For this we refer to any text book on LO. In Appendix D it is shown that this can be achieved without increasing the numbers of constraints and variables. 8 Exercise 1 The dual problem (D) can be transformed into canonical form by replacing the constraint AT y ≤ c by −AT y ≥ −c and the objective max bT y by min −bT y. Verify that the dual of the resulting problem is exactly (P ). 9 Exercise 2 Let the matrix A be skew-symmetric, i.e., AT = −A, and let b = −c. Verify that then (D) is essentially the same problem as (P ). I.2 Duality Theory 19 The (nonnegative) difference cT x − b T y (2.4) between the primal objective value at a primal feasible x and the dual objective value at a dual feasible y is called the duality gap for the pair (x, y). 
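As a quick numerical illustration of Lemma I.1, the sketch below checks primal and dual feasibility for a small canonical pair and evaluates the duality gap (2.4). The data $A$, $b$, $c$ and the two feasible points are chosen ad hoc for the illustration, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical data for a canonical pair (P) and (D); any feasible x and y would do.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 4.0])

x = np.array([2.0, 2.0])          # primal feasible: Ax >= b, x >= 0
y = np.array([1.0, 1.0])          # dual feasible:   A^T y <= c, y >= 0
assert np.all(A @ x >= b) and np.all(x >= 0)
assert np.all(A.T @ y <= c) and np.all(y >= 0)

gap = c @ x - b @ y               # the duality gap (2.4)
print(gap)                        # weak duality (Lemma I.1): the gap is nonnegative
assert gap >= 0
```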
We just established that if the duality gap vanishes then x is optimal for (P ) and y is optimal for (D). Quite surprisingly, the converse statement is also true: if x is an optimal solution of (P ) and y is an optimal solution of (D) then the duality gap vanishes at the pair (x, y). This result is known as the strong duality property in LO. One of the aims of this chapter is to prove this most important result. So, in this chapter we will not use this property, but prove it! Thus our starting point is the question under which conditions an optimal pair (x, y) exists with vanishing duality gap. In the next section we reduce this question to the question if some system of linear inequalities is solvable. 2.3 Reduction to inequality system In this section we consider the question whether (P ) and (D) have optimal solutions with vanishing duality gap. This will be true if and only if the inequality system Ax ≥ b, T 0 T T −A y ≥ −c, b y−c x ≥ x ≥ 0, y ≥ 0, (2.5) has a solution. This follows by noting that x and y satisfy the inequalities in the first two lines if and only if they are feasible for (P ) and (D) respectively. By Lemma I.1 this implies cT x − bT y ≥ 0. Hence, if we also have bT y − cT x ≥ 0 we get bT y = cT x, proving the claim. If κ = 1, the following inequality system is equivalent to (2.5), as easily can be verified.     0m y 0m×m A −b       −AT 0n×n c   x  ≥  0n  , 0 κ bT −cT 0  x ≥ 0, y ≥ 0, κ ≥ 0. (2.6) The new variable κ is called the homogenizing variable. Since the right-hand side in (2.6) is the zero vector, this system is homogeneous: whenever (y, x, κ) solves the system then λ(y, x, κ) also solves the system, for any positive λ. Now, given any solution (x, y, κ) of (2.6) with κ > 0, (x/κ, y/κ, 1) yields a solution of (2.5). This makes clear that, in fact, the two systems are completely equivalent unless every solution of (2.6) has κ = 0. But if κ = 0 for every solution of (2.6), then it follows that no solution exists with κ = 1, and therefore the system (2.5) cannot have a solution in that case. Evidently, we can work with the second system without loss of information about the solution set of the first system. 20 I Theory and Complexity Hence, defining the matrix M̄ and the vector z̄ by     y 0 A −b     T M̄ :=  −A 0 c  , z̄ :=  x  , κ bT −cT 0 (2.7) M̄ z̄ ≥ 0, (2.8) where we omitted the size indices of the zero blocks, we have reduced the problem of finding optimal solutions for (P ) and (D) with vanishing duality gap to finding a solution of the inequality system z̄ ≥ 0, κ > 0. If this system has a solution then it gives optimal solutions for (P ) and (D) with vanishing duality gap; otherwise such optimal solutions do not exist. Thus we have proved the following result. Theorem I.3 The problems (P ) and (D) have optimal solutions with vanishing duality gap if and only if system (2.8), with M̄ and z̄ as defined in (2.7), has a solution. Thus our task has been reduced to finding a solution of (2.8), or to prove that such a solution does not exists. In the sequel we will deal with this problem. In doing so, we will strongly use the fact that the matrix M̄ is skew-symmetric, i.e., M̄ T = −M̄ .10 Note that the order of M̄ equals m + n + 1. 2.4 Interior-point condition The method we are going to use in the next chapter for solving (2.8) is an interiorpoint method (IPM), and for this we need the system to satisfy the interior-point condition. 
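Before turning to the interior-point condition, here is a small sketch of the construction just described. The helper build_Mbar (our own name, not from the book) assembles the matrix $\bar M$ of (2.7) from $A$, $b$ and $c$; the skew-symmetry $\bar M^T = -\bar M$ is then checked numerically. NumPy is assumed, and the data are the hypothetical values used in the previous snippet.

```python
import numpy as np

def build_Mbar(A, b, c):
    """The skew-symmetric matrix of (2.7) for the canonical pair (P), (D)."""
    m, n = A.shape
    return np.block([
        [np.zeros((m, m)), A,                -b.reshape(m, 1)],
        [-A.T,             np.zeros((n, n)),  c.reshape(n, 1)],
        [b.reshape(1, m),  -c.reshape(1, n),  np.zeros((1, 1))],
    ])

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 4.0])

Mbar = build_Mbar(A, b, c)
assert np.allclose(Mbar.T, -Mbar)                 # Mbar is skew-symmetric
assert Mbar.shape[0] == A.shape[0] + A.shape[1] + 1

# For the feasible pair (x, y) of the previous snippet, zbar = (y, x, 1) satisfies the
# first two blocks of Mbar @ zbar >= 0; the last entry equals b^T y - c^T x (minus the gap).
x, y = np.array([2.0, 2.0]), np.array([1.0, 1.0])
zbar = np.concatenate([y, x, [1.0]])
print(Mbar @ zbar)
```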
Definition I.4 (IPC) We say that any system of (linear) equalities and (linear) inequalities satisfies the interior-point condition (IPC) if there exists a feasible solution that strictly satisfies all inequality constraints in the system. Unfortunately the system (2.8) does not satisfy the IPC. Because if z = (x, y, κ) is a solution  then x/κ is feasible for (P ) and y/κ is feasible for (D). But then cT x − bT y /κ ≥ 0, by weak duality. Since κ > 0, this implies bT y − cT x ≤ 0. On the other hand, after substitution of (2.7), the last constraint in (2.8) requires bT y − cT x ≥ 0. It follows that bT y − cT x = 0, and hence no feasible solution of (2.8) satisfies the last inequality in (2.8) strictly. To overcome this shortcoming of the system (2.8) we increase the dimension by adding one more nonnegative variable ϑ to the vector z̄, and by extending M̄ with one extra column and row, according to # " # " M̄ r z̄ , z := , (2.9) M := T −r 0 ϑ 10 Exercise 3 If S is an n × n skew-symmetric matrix and z ∈ IRn , then z T Sz = 0. Prove this. I.2 Duality Theory 21 where r = em+n+1 − M̄ em+n+1 , with em+n+1 denoting an all-one vector of length m + n + 1. So we have       y 0 A −b e − Ae + b m n   −AT  x  r 0 c      M =  , r =  en + AT em − c  , z =  bT −cT 0   κ 1 − b T e m + cT e n −rT 0 ϑ (2.10)      (2.11) The order of the matrix M is m + n + 2. To simplify the presentation, in the rest of this chapter we denote this number as n̄: n̄ = m + n + 2. Letting q be the vector of length n̄ given by # " 0n̄−1 , q := n̄ (2.12) we consider the system M z ≥ −q, z ≥ 0. (2.13) We make two important observations. First we observe that the matrix M is skewsymmetric. Secondly, the system (2.13) satisfies the IPC. The all-one vector does the work, because taking z̄ = en̄−1 and ϑ = 1, we have # # " # " # " " #" 0 en̄−1 M̄ en̄−1 + r M̄ r en̄−1 . (2.14) + = = Mz + q = 1 1 n̄ −rT en̄−1 + n̄ −rT 0 The last equality is due to the definition of r, which implies M̄ en̄−1 + r = en̄−1 and T T en̄−1 + n̄ = 1, −rT en̄−1 + n̄ = − en̄−1 − M̄ en̄−1 en̄−1 + n̄ = −en̄−1 where we used eTn̄−1 M̄ en̄−1 = 0 (cf. Exercise 3, page 20). The usefulness of system (2.13) stems from two facts. First, it satisfies the IPC and hence can be treated by an interior-point method. What this implies will become apparent in the next chapter. Another crucial property is that there is a correspondence between the solutions of (2.8) and the solutions of (2.13) with ϑ = 0. To see this it is useful to write (2.13) in terms of z̄ and ϑ: #" # " # " z̄ 0 M̄ r + ≥ 0, z̄ ≥ 0, ϑ ≥ 0. T n̄ −r 0 ϑ Obviously, if z = (z̄, 0) satisfies (2.13), this implies M̄ z̄ ≥ 0 and z̄ ≥ 0, and hence z̄ satisfies (2.8). On the other hand, if z̄ satisfies (2.8) then M̄ z̄ ≥ 0 and z̄ ≥ 0; as a consequence z = (z̄, 0) satisfies (2.13) if and only if −rT z̄ + n̄ ≥ 0, i.e., if and only if rT z̄ ≤ n̄. If rT z̄ ≤ 0 this certainly holds. Otherwise, if rT z̄ > 0, the positive multiple nz̄/rT z̄ of z̄ satisfies rT z̄ ≤ n̄. Since a positive multiple preserves signs, this is sufficient for our goal. We summarize the above discussion in the following theorem. 22 I Theory and Complexity Theorem I.5 The following three statements are equivalent: (i) Problems (P ) and (D) have optimal solutions with vanishing duality gap; (ii) If M̄ and z̄ are given by (2.7) then (2.8) has a solution; (iii) If M and z are given by (2.11) then (2.13) has a solution with ϑ = 0 and κ > 0. Moreover, system (2.13) satisfies the IPC. 
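The extended system is just as easy to set up numerically. The sketch below (again with a hypothetical helper name, build_embedding, and NumPy assumed) forms $r$, $M$ and $q$ as in (2.10)-(2.12) and verifies the computation (2.14): the all-one vector is feasible and its slack vector is again the all-one vector.

```python
import numpy as np

def build_embedding(A, b, c):
    """The extended skew-symmetric system (2.10)-(2.12): returns M and q with s(e) = e."""
    m, n = A.shape
    nbar = m + n + 2
    Mbar = np.block([
        [np.zeros((m, m)), A,                -b.reshape(m, 1)],
        [-A.T,             np.zeros((n, n)),  c.reshape(n, 1)],
        [b.reshape(1, m),  -c.reshape(1, n),  np.zeros((1, 1))],
    ])
    e = np.ones(nbar - 1)
    r = e - Mbar @ e                                   # r = e - Mbar e
    M = np.block([[Mbar,              r.reshape(-1, 1)],
                  [-r.reshape(1, -1), np.zeros((1, 1))]])
    q = np.zeros(nbar)
    q[-1] = nbar                                       # q = (0, ..., 0, nbar)
    return M, q

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 4.0])

M, q = build_embedding(A, b, c)
e = np.ones(M.shape[0])
assert np.allclose(M.T, -M)                            # M is skew-symmetric
assert np.allclose(M @ e + q, e)                       # s(e) = e, so the IPC holds, cf. (2.14)
```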
2.5 Embedding into a self-dual LO-problem Obviously, solving (2.8) is equivalent to finding a solution of the minimization problem  (SP0 ) min 0T z̄ : M̄ z̄ ≥ 0, z̄ ≥ 0 (2.15) with κ > 0. In fact, this is the way we are going to follow: our aim will be to find out whether this problem has a(n optimal) solution with κ > 0 or not. Note that the latter condition makes our task nontrivial. Because finding an optimal solution of (SP0 ) is trivial: the zero vector is feasible and hence optimal. Also note that (SP0 ) is in the canonical form. However, it has a very special structure: its feasible domain is homogeneous and since M̄ is skew-symmetric, the problem (SP0 ) is a self-dual problem (cf. Exercise 2, page 18). We say that (SP0 ) is a self-dual embedding of the canonical problem (P ) and its dual problem (D). If the constraints in an LO problem satisfy the IPC, then we simply say that the problem itself satisfies the IPC. As we established in the previous section, the self-dual embedding (SP0 ) does not satisfy the IPC, and therefore, from an algorithmic point of view this problem is not useful. In the previous section we reduced the problem of finding optimal solutions (P ) and (D) with vanishing duality gap to finding a solution of (2.13) with ϑ = 0 and κ > 0. For that purpose we consider another self-dual embedding of (P ) and (D), namely  (SP ) min q T z : M z ≥ −q, z ≥ 0 . (2.16) The following theorem shows that we can achieve our goal by solving this problem. Theorem I.6 The system (2.13) has a solution with ϑ = 0 and κ > 0 if and only if the problem (SP ) has an optimal solution with κ = zn̄−1 > 0. Proof: Since q ≥ 0 and z ≥ 0, we have q T z ≥ 0, and hence the optimal value of (SP ) is certainly nonnegative. On the other hand, since q ≥ 0 the zero vector (z = 0) is feasible, and yields zero as objective value, which is therefore the optimal value. Since q T z = n̄ϑ, we conclude that the optimal solutions of (2.16) are precisely the vectors z satisfying (2.13) with ϑ = 0. This proves the theorem. ✷ We associate to any vector z ∈ IRn its slack vector s(z) as follows. s(z) := M z + q. (2.17) Then we have z is a feasible for (SP ) ⇐⇒ z ≥ 0, s(z) ≥ 0. I.2 Duality Theory 23 As we established in the previous section, the inequalities defining the feasible domain of (SP ) satisfy the IPC. To be more specific, we found in (2.14) that the all-one vector e is feasible and its slack vector is the all-one vector. In other words, s(e) = e. (2.18) We proceed by giving a small example. Example I.7 By way of example we consider the case where the problems (P ) and (D) are determined by the following constraint matrix A, and vectors b and c:11       1 1 , c= 2 . , b= A= −1 0 According to (2.7) the matrix M̄ is then equal to     0 0 1 −1 0 A −b  0 0 0 1  M̄ =  −AT 0 c =  −1 0 0 2 T T b −c 0 1 −1 −2 0 and according to (2.10), the vector r becomes    1 1    r = e − M̄ e =   1 − 1 Thus, by (2.11) and (2.12), we obtain  0 0 1 −1  0 0 0 1   M =  −1 0 0 2   1 −1 −2 0 −1 0 0 −3 Hence, the self-dual problem (SP ),    0 0 1 −1    0   0 0 1    min 5ϑ :  −1 0 0 2      1 −1 −2 0    −1 0 0 −3   0 1 0 1 = 1 0 −2 3 1 0 0 3 0     ,     ,      q=   0 0 0 0 5     .   as given by (2.16), gets the form         1 0 z1 z1            0   z2   0   z2         0   z3  +  0  ≥ 0,  z3  ≥ 0 . 
(2.19)           z4 = κ  3   z4   0     z5 0 5 z5 = ϑ Note that the all-one vector is feasible for this problem and that its surplus vector also is the all-one vector. This is in accordance with (2.18). As we shall see later on, it means that the all-one vector is the point on the central path for µ = 1. ♦ 11 cf. Example D.5 (page 449) in Appendix D. 24 I Theory and Complexity Remark I.8 In the rest of this chapter, and the next chapter, we deal with the problem (SP ). In fact, our analysis does not only apply to the case that M and q have the special form of (2.11) and (2.12). Therefore we extend the applicability of our analysis by weakening the assumptions on M and q. Unless stated otherwise below we only assume the following: M T = −M, q ≥ 0, s(e) = e. (2.20) The last two variables in the vector z play a special role. They are the homogenizing variable κ = zn̄−1 , and ϑ = zn̄ . The variable ϑ is called the normalizing variable, because of the following important property. Lemma I.9 One has eT z + eT s(z) = n̄ + q T z. (2.21) Proof: The identity in the lemma is a consequence of the orthogonality property (cf. Exercise 3, page 20) uT M u = 0, ∀u ∈ IRn̄ . (2.22) First we deduce that for every z one has q T z = z T (s(z) − M z) = z T s(z) − z T M z = z T s(z). (2.23) Taking u = e − z in (2.22) we obtain (z − e)T (s(z) − s(e)) = 0. Since s(e) = e, eT e = n̄ and z T s(z) = q T z, the relation (2.21) follows. ✷ It follows from Lemma I.9 that the sum of the positive coordinates in z and s(z) is bounded above by n̄ + q T z. Note that this is especially interesting if z is optimal, because then q T z = 0. Hence, if z is optimal then eT z + eT s(z) = n̄. (2.24) Since z and s(z) are nonnegative this implies that the set of optimal solutions is bounded. Another interesting feature of the LO-problem (2.16) is that it is self-dual: the dual problem is  (DSP ) max −q T u : M T u ≤ q, u ≥ 0 ; since M is skew-symmetric, M T u ≤ q is equivalent to −M u ≤ q, or M u ≥ −q, and maximizing −q T u is equivalent to minimizing q T u, and thus the dual problem is essential the same problem as (2.16). The rest of the chapter is devoted to our main task, namely to find an optimal solution of (2.16) with κ > 0 or to establish that such a solution does not exist. 2.6 The classes B and N We introduce the index sets B and N according to B := {i : zi > 0 for some optimal z} N := {i : si (z) > 0 for some optimal z} . I.2 Duality Theory 25 So, B contains all indices i for which an optimal solution z with positive zi exists. We also write zi ∈ B if i ∈ B. Note that we certainly have ϑ ∈ / B, because ϑ is zero in any optimal solution of (SP ). The main question we have to answer is whether κ ∈ B holds or not. Because if κ ∈ B then there exists an optimal solution z with κ > 0, in which case (P ) and (D) have optimal solutions with vanishing duality gap, and otherwise not. The next lemma implies that the sets B and N are disjoint. In this lemma, and further on, we use the following notation. To any vector u ∈ IRk , we associate the diagonal matrix U whose diagonal entries are the elements of u, in the same order. If also v ∈ IRk , then U v will be denoted shortly as uv. Thus uv is a vector whose entries are obtained by multiplying u and v componentwise. Lemma I.10 Let z 1 and z 2 be feasible for (SP ). Then z 1 and z 2 are optimal solutions of (SP ) if and only if z 1 s(z 2 ) = z 2 s(z 1 ) = 0. Proof: According to (2.23) we have for any feasible z: q T z = z T s(z). 
(2.25) As a consequence, z ≥ 0 is optimal if and only if s(z) ≥ 0 and z T s(z) = 0. Since, by (2.22),  T z 1 − z 2 M z 1 − z 2 = 0, we have z1 − z2 T  s(z 1 ) − s(z 2 ) = 0. Expanding the product on the left and rearranging the terms we get (z 1 )T s(z 2 ) + (z 2 )T s(z 1 ) = (z 1 )T s(z 1 ) + (z 2 )T s(z 2 ). Now z 1 is optimal if and only if (z 1 )T s(z 1 ) = 0, by (2.25), and similarly for z 2 . Hence, since z 1 , z 2 , s(z 1 ) and s(z 2 ) are all nonnegative, z 1 and z 2 are optimal if and only if (z 1 )T s(z 2 ) + (z 2 )T s(z 1 ) = 0, which is equivalent to z 1 s(z 2 ) = z 2 s(z 1 ) = 0, proving the lemma. ✷ Corollary I.11 The sets B and N are disjoint. Proof: If i ∈ B ∩ N then there exist optimal solutions z 1 and z 2 of (SP ) such that zi1 > 0 and si (z 2 ) > 0. This would imply zi1 si (z 2 ) > 0, a contradiction with Lemma I.10. Hence B ∩ N is the empty set. ✷ By way of example we determine the classes B and N for the problem considered in Example I.7. 26 I Theory and Complexity Example I.12 Consider the self-dual problem (SP ) in Example I.7, as given by (2.19):            0 0 1 −1 1 z1 0 z1       0 z  0  z     0 0 1 0 2    2             min 5ϑ :  −1 0 0 2 0   z3  +  0  ≥ 0,  z3  ≥ 0 .               1 −1 −2  z4 = κ  0 3   z4   0        −1 0 0 −3 0 z5 5 z5 = ϑ For any z ∈ IR5 we have     z3 − z4 + z5 z3 − κ + ϑ     z4 κ         s(z) =  2z4 − z1 2κ − z1 =       z1 − z2 − 2z3 + 3z5   z1 − z2 − 2z3 + 3ϑ  5 − z1 − 3z4 5 − z1 − 3κ Now z is feasible if z ≥ 0 and s(z) ≥ 0, and z = (z1 , z2 , z3 , κ, ϑ) is optimal if and only if     z1 z3 − κ + ϑ z    κ  2       2κ − z1  z3  ≥ 0,   ≥ 0,     κ  z1 − z2 − 2z3 + 3ϑ  ϑ 5 − z1 − 3κ Adding the equalities at the right we obtain 5ϑ Substitution gives     z1 z3 − κ z    κ  2        z3  ≥ 0,  2κ − z1  ≥ 0,     κ  z1 − z2 − 2z3  0 5 − z1 − 3κ optimal if moreover zs(z) = 0. So        z1 (z3 − κ + ϑ) = 0 z2 κ = 0 z3 (2κ − z1 ) = 0 .    κ (z1 − z2 − 2z3 + 3ϑ) = 0    ϑ (5 − z1 − 3κ) = 0 = 0, which gives ϑ = 0, as it should.        z1 (z3 − κ) = 0 z2 κ = 0 z3 (2κ − z1 ) = 0 .    κ (z1 − z2 − 2z3 ) = 0    ϑ=0 Note that if κ = 0 then the inequality 2κ − z1 ≥ 0 implies z1 = 0, and then the inequality z1 − z2 − 2z3 ≥ 0 gives also z2 = 0 and z3 = 0. Hence, z = 0 is the only optimal solution for which κ = 0. So, let us assume κ > 0. Then we deduce from the second and fourth equality that z2 = 0 and z1 − z2 − 2z3 = 0. This reduces our system to       2z3 (z3 − κ) = 0 z1 = 2z3 z3 − κ         z2 = 0 κ 0          z3 (2κ − 2z3 ) = 0 . z3   ≥ 0,  2κ − 2z3  ≥ 0,            0 κ 0=0    5 − 2z3 − 3κ 0 ϑ=0 The equations at the right make clear that either z3 = 0 or z3 = κ. However, the inequality z3 − κ ≥ 0 forces z3 > 0 since κ > 0. Thus we find that any optimal solution I.2 Duality Theory has the form 27   2κ  0      z =  κ ,   κ 0   0  κ      s(z) =  0  ,    0  5 − 5κ 0 ≤ κ ≤ 1. (2.26) This implies that in this example the sets B and N are given by B = {1, 3, 4} , N = {2, 5} . ♦ In the above example the union of B and N is the full index set. This is not an incident. Our next aim is to prove that this always holds. 12,13,14,15 As a consequence these sets form a partition of the full index set {1, 2, . . . , n̄}; it is the so-called optimal partition of (SP ). 
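For small instances the optimal partition can also be computed directly. Since the optimal value of (SP) is zero, index i belongs to B (respectively N) precisely when $z_i$ (respectively $s_i(z)$) can be made positive over the optimal face. The sketch below does this with scipy.optimize.linprog; the function name optimal_partition is ours, and boundedness of the optimal set, which holds under the assumptions (2.20) by (2.24), is taken for granted.

```python
import numpy as np
from scipy.optimize import linprog

def optimal_partition(M, q, tol=1e-7):
    """Determine B and N for (SP): min { q^T z : Mz + q >= 0, z >= 0 } (tiny instances).

    The optimal value of (SP) is 0, so the optimal face is { z >= 0 : -Mz <= q, q^T z <= 0 };
    index i is put into B (resp. N) when z_i (resp. s_i(z)) can be made positive on it."""
    nbar = M.shape[0]
    A_ub = np.vstack([-M, q.reshape(1, -1)])
    b_ub = np.concatenate([q, [0.0]])
    B, N = [], []
    for i in range(nbar):
        # maximize z_i over the optimal face (linprog minimizes, hence the minus sign)
        res = linprog(-np.eye(nbar)[i], A_ub=A_ub, b_ub=b_ub,
                      bounds=(0, None), method="highs")
        if res.status == 0 and -res.fun > tol:
            B.append(i)
        # maximize s_i(z) = (Mz + q)_i over the optimal face
        res = linprog(-M[i], A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
        if res.status == 0 and -res.fun + q[i] > tol:
            N.append(i)
    return B, N
```

Applied to the M and q produced by the build_embedding helper sketched after Section 2.4, the routine returns the two classes; that B and N together always cover every index is exactly the result whose proof is developed next.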
This important and nontrivial result is fundamental to our purpose but its proof requires some effort. It highly depends on properties of the central path of (SP ), which is introduced in the next section. 2.7 2.7.1 The central path Definition of the central path Recall from (2.14) that s(e) = e, where e (as always) denotes the all-one vector of appropriate length (in this case, n̄). As a consequence, we have a vector z such that zi si (z) = 1 (1 ≤ i ≤ n̄), which, using our shorthand notation can also be expressed as ⇒ z=e zs(z) = e. (2.27) Now we come to a very fundamental notion, both from a theoretical and algorithmic point of view, namely the central path of the LO-problem at hand. The underlying 12 Exercise 4 Following the same approach as in Example I.7 construct the embedding problem for the case where the problems (P ) and (D) are determined by A= 13 h 1 0 i , i , b= h 1 0 b= h 1 1 Exercise 6 Same as in Exercise 4, but now with A= 15 1 0 i , i , c=  2  , and, following the approach of Example I.12, find the set of all optimal solutions and the optimal partition. Exercise 5 Same as in Exercise 4, but now with A= 14 h h 1 0 h 1 0 i , i , b= h 1 β h 1 β i , i , c= Exercise 7 Same as in Exercise 4, but now with A= b= c= c=   2 2    2  . , β > 0. , β < 0. 28 I Theory and Complexity theoretical property is that for every positive number µ there exist a nonnegative vector z such that zs(z) = µe, z ≥ 0, s(z) ≥ 0, (2.28) and moreover, this vector is unique. If µ = 1, the existence of such a vector is guaranteed by (2.27). Also note that if we put µ = 0 in (2.28) then the solutions are just the optimal solutions of (SP ). As we have seen in Example I.12 there may more than one optimal solution. Therefore, if µ = 0 the system (2.28) may have multiple solutions. The following lemma is of much interest. It makes clear that for µ > 0 the system (2.28) has at most one solution. Lemma I.13 If µ > 0, then there exists at most one nonnegative vector z such that (2.28) holds. Proof: Let z 1 and z 2 to nonnegative vectors satisfying (2.28), and let s1 = s(z 1 ) and s2 = s(z 2 ). Since µ > 0, z 1 , z 2 , s1 , s2 are all positive. Define ∆z := z 2 − z 1 , and similarly ∆s := s2 − s1 . Then we may easily verify that M ∆z = ∆s 1 1 z ∆s + s ∆z + ∆s∆z = 0. (2.29) (2.30) Using that M is skew-symmetric, (2.22) implies that ∆z T ∆s = 0, or, equivalently, eT (∆z∆s) = 0. (2.31) Rewriting (2.30) gives (z 1 + ∆z)∆s + s1 ∆z = 0. Since z 1 + ∆z = z 2 > 0 and s1 > 0, this implies that no two corresponding entries in ∆z and ∆s have the same sign. So it follows that ∆z∆s ≤ 0. (2.32) Combining (2.31) and (2.32), we obtain ∆z∆s = 0. Hence either (∆z)i = 0 or (∆s)i = 0, for each i. Using (2.30) once more, we conclude that (∆z)i = 0 and (∆s)i = 0, for each i. Hence ∆z = ∆s = 0, whence z 1 = z 2 and s1 = s2 . This proves the lemma. ✷ To prove the existence of a solution to (2.28) requires much more effort. We postpone this to the next section. For the moment, let us take the existence of a solution to (2.28) for granted and denote it as z(µ). We call it the µ-center of (SP ). The set {z(µ) : µ > 0} of all µ-centers represents a parametric curve in the feasible region of (SP ). This curve is called the central path of (SP ). Note that q T z(µ) = s(µ)T z(µ) = µn̄. (2.33) This proves that along the central path, when µ approaches zero, the objective value q T z(µ) monotonically decreases to zero, at a linear rate. 
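Numerically, the defining system (2.28) can also be solved for a given µ > 0 with a generic root finder, which gives a quick way of tracing the central path on toy data. The sketch below uses scipy.optimize.fsolve and a hand-made two-dimensional instance satisfying s(e) = e; it only illustrates the definition and is not the construction used in the book, which follows in the next section.

```python
import numpy as np
from scipy.optimize import fsolve

def mu_center(M, q, mu, z0):
    """Solve z * s(z) = mu * e, with s(z) = Mz + q, by a generic root finder (illustration only)."""
    e = np.ones(M.shape[0])
    return fsolve(lambda z: z * (M @ z + q) - mu * e, z0)

# A tiny skew-symmetric instance with s(e) = e, so z(1) = e.
M = np.array([[0.0, 1.0], [-1.0, 0.0]])
q = np.ones(2) - M @ np.ones(2)
nbar = 2

z = np.ones(nbar)
for mu in [1.0, 0.5, 0.1, 0.01]:
    z = mu_center(M, q, mu, z)           # warm start from the previous center
    s = M @ z + q
    assert np.all(z > 0) and np.all(s > 0)
    assert np.isclose(q @ z, mu * nbar)  # q^T z(mu) = mu * nbar, cf. (2.33)
print(z, s)                              # z(mu) tends to an optimal solution of (SP) as mu -> 0
```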
I.2 Duality Theory 2.7.2 29 Existence of the central path In this section we give an algorithmic proof of the existence of a solution to (2.28). Starting at z = e we construct the µ-center for any µ > 0. This is done by using the so-called Newton direction as a search direction. The results in this section will also be used later when dealing with a polynomial-time method for solving (SP ). Newton direction Assume that z is a positive solution of (SP ) such that its slack vector s = s(z) is positive, and let ∆z denote a displacement in the z-space. Our aim is to find ∆z such that z + ∆z is the µ-center. We denote z + := z + ∆z, and the new slack vector as s+ : s+ := s(z + ) = M (z + ∆z) + q = s + M ∆z. Thus, the displacement ∆s in the s-space is simply given by ∆s = s+ − s = M ∆z. Observe that ∆z and ∆s are orthogonal, since by (2.22): (∆z)T ∆s = (∆z)T M ∆z = 0. (2.34) We want ∆z to be such that z + becomes the µ-center, which means (z + ∆z) (s + ∆s) = µe, or zs + z∆s + s∆z + ∆z∆s = µe. This equation is nonlinear, due to the quadratic term ∆z∆s. Applying Newton’s method, we omit this nonlinear term, leaving us with the following linear system in the unknown vectors ∆z and ∆s: M ∆z − ∆s = 0, z∆s + s∆z = µe − zs. (2.35) (2.36) This system has a unique solution, as easily may be verified, by using that M is skew-symmetric and z > 0 and s > 0.16,17 The solution ∆z is called the Newton direction. Since we omitted the quadratic term ∆z∆s in our calculation of the Newton 16 Exercise 8 The coefficient matrix of the system (2.35-2.36) of linear equations in ∆z and ∆s is  17 M S −I Z  . As usual, Z = diag (z) and S = diag (s), with z > 0 and s > 0, and I denotes the identity matrix. Show that this matrix is nonsingular. Exercise 9 Let M be a skew-symmetric matrix of size n × n and Z and S positive diagonal matrices of the same size as M . Then the matrix S + ZM is nonsingular. Prove this. 30 I Theory and Complexity direction, z + ∆z will (in general) not be the µ-center, but hopefully it will be a good approximation. In fact, using (2.36), after the Newton step one has z + s(z + ) = (z + ∆z)(s + ∆s) = zs + (z∆s + s∆z) + ∆z∆s = µe + ∆z∆s. (2.37) Comparing this with our desire, namely z + s(z + ) = µe, we see that the ‘error’ is precisely the quadratic term ∆z∆s. Using (2.22), we deduce from (2.37) that T z + s(z + ) = µeT e = µn̄, (2.38) showing that after the Newton step the duality gap already has the desired value. Example I.14 Let us compute the Newton step at z = e for the self-dual problem (SP ) in Example I.7, as given by (2.19), with respect to some µ > 0. Since z = s(z) = e, the equation (2.36) reduces to ∆s + ∆z = µe − e = (µ − 1)e. Hence, by substitution into (2.35) we obtain (M + I) ∆z = (µ − 1)e. It suffices to know the solution of the equation (M + I) ζ = e, because then ∆z = (µ − 1)ζ. Thus we need to solve ζ from     1 0 1 −1 1 1  0  1 1 0 1 0         0 1 2 0 ζ = 1,  −1      1 −1 −2 1 1 3 −1 0 0 −3 1 1 which gives the unique solution ζ= Hence and T  1 8 4 1 . − , , , ,1 3 9 9 9 T  1 8 4 1 ∆z = (µ − 1) − , , , , 1 , 3 9 9 9 ∆s = M ∆z = (µ − 1) (e − ζ) = (µ − 1) After the Newton step we thus have  4 1 5 8 , , , ,0 3 9 9 9 (2.39) T . (2.40) z + s+ = (z + ∆z) (s + ∆s) = zs + (∆z + ∆s) + ∆z∆s = e + (µ − 1)e + ∆z∆s = µ e + ∆z∆s (µ − 1)2 = µe + (−36, 8, 20, 8, 0)T . 81 ♦ I.2 Duality Theory 31 Proximity measure To measure the quality of any approximation z of z(µ), we introduce a proximity measure δ(z, µ) that vanishes if z = z(µ) and is positive otherwise. 
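Restated in matrix form, eliminating $\Delta s = M\Delta z$ from (2.36) leaves the linear system $(S + ZM)\Delta z = \mu e - zs$ with $Z = \operatorname{diag}(z)$ and $S = \operatorname{diag}(s)$, and this coefficient matrix is nonsingular by Exercise 9. The following sketch carries out full Newton steps for a decreasing sequence of values of µ; NumPy is assumed, and the toy instance is the one used in the previous snippet.

```python
import numpy as np

def newton_step(M, q, z, mu):
    """Full Newton step (2.35)-(2.36) at a strictly feasible z, for target mu.

    Substituting Delta s = M Delta z into (2.36) gives
        (S + Z M) Delta z = mu e - z s,   Z = diag(z), S = diag(s)."""
    s = M @ z + q
    e = np.ones(len(z))
    dz = np.linalg.solve(np.diag(s) + np.diag(z) @ M, mu * e - z * s)
    return dz, M @ dz

# The tiny instance with s(e) = e used before, so z = e is the 1-center.
M = np.array([[0.0, 1.0], [-1.0, 0.0]])
q = np.ones(2) - M @ np.ones(2)

z = np.ones(2)
for mu in [0.5, 0.25, 0.125]:            # shrink mu and take one full Newton step each time
    dz, ds = newton_step(M, q, z, mu)
    z = z + dz
    s = M @ z + q
    print(z, s, np.linalg.norm(z * s - mu * np.ones(2)))   # residual of zs(z) = mu e
```

For this tiny instance every step happens to land exactly on the new µ-center; in general a full step only brings the iterate close to it, as quantified by the analysis that follows. Shrinking µ and taking one full Newton step per update is the pattern behind the full-step methods treated later in the book.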
To this end we introduce the variance vector of z with respect to µ as follows: s zs(z) , (2.41) v := µ where all operations are componentwise. Note that ⇔ zs(z) = µe v = e. The proximity measure δ(z, µ) is now defined by18 δ(z, µ) := 1 2 v − v −1 . (2.42) Note that if z = z(µ) then v = e and hence δ(z, µ) = 0, and otherwise δ(z, µ) > 0. We show below that if δ(z, µ) < 1 then the Newton process quadratically fast converges to z(µ). For this we need the following lemma, which estimates the error term in terms of the proximity measure. In this lemma k.k denotes the Eucledian norm (or 2-norm) and k.k∞ the Chebychev norm (or infinity norm) of a vector. √ Lemma I.15 If δ := δ(z, µ), then k∆z∆sk∞ ≤ µδ 2 and k∆z∆sk ≤ µδ 2 2. √ √ Proof: Componentwise division of (2.36) by µ v = zs yields r z ∆s + s r  √ s ∆z = µ v −1 − v . z The terms at the left represent orthogonal vectors whose componentwise product is ∆z∆s. Applying Lemma C.4 in Appendix C to these vectors, and using that v −1 − v = 2δ, the result immediately follows. ✷ Quadratic convergence of the Newton process We are now ready for the main result on the Newton direction. Theorem I.16 If δ := δ(z, µ) < 1, then the Newton step is strictly feasible, i.e., z + > 0 and s+ > 0. Moreover, 18 δ2 . δ(z + , µ) ≤ p 2(1 − δ 2 ) In the analysis of interior-point methods we always need to introduce a quantity that measures the ‘distance’ of a feasible vector z to the central path or to the µ-center. This can be done in many ways as becomes apparent in the course of this book. In the coming chapters we make use of a variety of so-called proximity measures. Most of these measures are based on the simple observation that z is equal to the µ-center if and only if v = e and z is on the central path if and only if the vector zs(z) is a scalar multiple of the all-one vector. 32 I Theory and Complexity Proof: Let 0 ≤ α ≤ 1, z α = z + α∆z and sα = s + α∆s. We then have, using (2.36), z α sα = (z + α∆z)(s + α∆s) = zs + α (z∆s + s∆z) + α2 ∆z∆s = zs + α (µe − zs) + α2 ∆z∆s = (1 − α)zs + α (µe + α∆z∆s) By Lemma I.15, µe + α∆z∆s ≥ µe − α k∆z∆sk∞ e ≥ µ(1 − αδ 2 ) e > 0. Hence, since (1 − α)zs ≥ 0, we have z α sα > 0, for all α ∈ [0, 1]. Therefore, the components of z α and sα cannot vanish when α ∈ [0, 1]. Hence, since z > 0 and s > 0, by continuity, z α and sα must be positive for any such α, especially for α = 1. This proves the first statement in the lemma. Now let us turn to the proof of the second statement. Let δ + := δ(z + , µ) and let v + be the variance vector of z + with respect to µ: s z + s+ . v+ = µ Then, by definition, 2δ + = (v + )−1 − v + = (v + )−1 e − (v + )2 Recall from (2.37) that z + s+ = µe + ∆z∆s. In other words, v+ Substitution into (2.43) gives − ∆z∆s µ 2δ = q e + ∆z∆s µ + 2 =e+  . (2.43) ∆z∆s . µ ∆z∆s µ ≤ r 1− ∆z∆s µ ∞ √ δ2 2 ≤ √ . 1 − δ2 The last inequality follows by using Lemma I.15 twice. Thus the proof is complete. ✷ √ Theorem I.16 implies that when δ ≤ 1/ 2, then after a Newton step the proximity to the µ-center satisfies δ(z + , µ) ≤ δ 2 . In other words, Newton’s method is quadratically convergent. Example I.17 Using the self-dual problem (SP ) in Example I.7 again, we consider in this example feasibility of the Newton step, and the proximity measure before and after Newton step at z = e for several values of µ, to be specified below. We will see that the Newton step performs much better than Theorem I.16 predicts! In Example I.14 we found the values of ∆z and ∆s. 
Using these values we find for the new iterate: T  1 8 4 1 , z = e + (µ − 1) − , , , , 1 3 9 9 9 + I.2 Duality Theory 33 and since s = s(e) = e, s+ = e + (µ − 1)  4 1 5 8 , , , ,0 3 9 9 9 T . Hence the Newton step is feasible, i.e., z + and s+ are nonnegative, if and only if 0.25 ≤ µ ≤ 4, as easily may be verified. For any such µ we have √ 1 √ 5 √ e 1 1 1 √ δ(z, µ) = µe − √ µ − √ kek = µ− √ . = 2 µ 2 µ 2 µ Note that Theorem I.16 guarantees feasibility only if δ(z, µ) ≤ 1. This holds if 5µ2 − 14µ + 5 ≤ 0, which is equivalent to √  √  1 1 7−2 6 ≤µ≤ 7 + 2 6 ≈ 2.3798. 0.4202 ≈ 5 5 √ The same theorem guarantees quadratically convergence if δ(z, µ) ≤ 1/ 2, which holds if and only if √  √  1 1 0.5367 ≈ 6 − 11 ≤ µ ≤ 6 + 11 ≈ 1.8633. 5 5 √ By way of example, consider the case where µ = 0.5. Then we have δ(z, µ) = 14 10 ≈ √ 5 0.7906 and, by Theorem I.16, δ(z + , µ) ≤ 12 3 ≈ 0.7217. Let us compute the actual + value of δ(z , µ). For µ = 0.5 we have 1 2  1 s =e− 2  z+ = e − 1 8 4 1 − , , , ,1 3 9 9 9 T =  7 5 7 17 1 , , , , 6 9 9 18 2 T , and since s = s(e) = e, + 4 1 5 8 , , , ,0 3 9 9 9 Therefore, v+ 2 = z + s+ = µ Finally, we compute δ(z + , µ) by using 4δ(z + , µ)2 = v + − (v + )−1 T  2 T =  1 17 13 5 , , , ,1 3 18 18 9 7 85 91 85 , , , ,1 9 81 81 81 = 5 X i=1 vi+ 2 + T 5 X i=1 T . . vi+ −2 − 10. Note that the first sum equals (z + ) s+ /µ = 2n̄µ = 5. The second sum equals 5.0817. Thus we obtain 4δ(z + , µ)2 = 0.0817, which gives δ(z + , µ) = 0.1429. ♦ 34 I Theory and Complexity Existence of the central path Now suppose that we know the µ-center for µ = µ0 > 0 and let us denote z 0 = z(µ0 ). Note that this is true for µ0 = 1, with z 0 = e, because es(e) = e. So e is the µ-center for µ = 1. Since z 0 s(z 0 ) = µ0 e, the v-vector for z 0 with respect to an arbitrary µ > 0 is given by s s s z 0 s(z 0 ) = µ v= Hence we have δ(z 0 , µ) ≤ √1 2 µ0 e. µ if and only if 1 2 Using kek = µ0 e = µ s µ0 − µ s µ 1 kek ≤ √ . µ0 2 √ n̄, one may easily verify that this holds if and only if r 1 1 1 2 µ β := 1 + + + . ≤ 0 ≤ β, 2 β µ n̄ n̄ n̄ (2.44) Now starting the Newton process at z 0 , with µ fixed, and such that µ satisfies (2.44), we can generate an infinite sequence z 0 , z 1 , · · · z k , · · · such that Hence  δ zk, µ ≤ 1 . 22k−1  lim δ z k , µ = 0. k→∞ The generated sequence has at least one accumulation point z ∗ , since the iterates z 1 , · · · z k , · · · lie in the compact set eT z + eT s(z) = n̄ (1 + µ) , z ≥ 0, s(z) ≥ 0, due to (2.21) and (2.38). Since δ (z ∗ , µ) = 0, we obtain z ∗ s (z ∗ ) = µe. Due to Lemma I.13, z ∗ is unique. This proves that the µ-center exists if µ satisfies (2.44) with µ0 = 1, i.e., if 1 ≤ µ ≤ β. β By redefining µ0 as one of the endpoints of the above interval we can repeat the above procedure, and extend the interval where the µ-center exists to 1 ≤ µ ≤ β2. β2 and so on. After applying the procedure k times the interval where the µ-center certainly exists is given by 1 ≤ µ ≤ βk . βk I.2 Duality Theory 35 For arbitrary µ > 0, we have to apply the above procedure at most |log µ| log β times, to prove the existence of the µ-center. This completes the proof of the existence of the central path. t for t ≥ 0,19 It may be worth noting that, using n̄ ≥ 2 and log(1 + t) ≥ 1+t q ! r ! r √ 2 1 2 1 1 2 2 n̄ √ ≥√ . q = √ log β = log 1 + + ≥ log 1 + ≥ + n̄ n̄2 n̄ n̄ 2 n̄ + 2 2n̄ 1+ n̄ Hence the number of times that we have to apply the above described procedure to obtain the µ-center is bounded above by √ 2n̄ |log µ| . 
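The quantities in this existence argument are easy to evaluate. The sketch below (illustrative only) computes the extension factor β of (2.44) — for n̄ = 5 it returns (6 + √11)/5 ≈ 1.8633, the value found in Example I.17 — and compares the exact number ⌈|log µ| / log β⌉ of applications of the procedure with the bound √(2n̄) |log µ| of (2.45).

```python
import math

def beta(n_bar):
    # Extension factor from (2.44): starting from a known mu0-center, the
    # mu-center is reachable by quadratically convergent Newton steps for
    # all mu with 1/beta <= mu/mu0 <= beta.
    return 1.0 + 1.0 / n_bar + math.sqrt(2.0 / n_bar + 1.0 / n_bar ** 2)

def rounds(mu, n_bar):
    # Number of times the interval [beta^{-k}, beta^k] must be extended to
    # cover mu, together with the upper bound sqrt(2 n_bar) |log mu| of (2.45).
    exact = math.ceil(abs(math.log(mu)) / math.log(beta(n_bar)))
    bound = math.ceil(math.sqrt(2 * n_bar) * abs(math.log(mu)))
    return exact, bound

print(beta(5))                    # approx 1.8633, as in Example I.17
for n_bar in (5, 50, 500):
    print(n_bar, rounds(1e-6, n_bar))
```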
(2.45) We have just shown that the system (2.28) has a unique solution for every positive µ. The solution is called the µ-center, and denoted as z(µ). The set of all µ-centers is a curve in the interior of the feasible region of (SP ). The definition of the µ-center, as given by (2.28), can be equivalently given as the unique solution of the system M z + q = s, z ≥ 0, s ≥ 0 zs = µe, (2.46) with M and z as defined in (2.11), and s = s(z), as in (2.17).20,21,22 2.8 Existence of a strictly complementary solution Now that we have proven the existence of the central path we can use it as a guide to the optimal set, by letting the parameter µ approach to zero. As we show in this section, in this way we obtain an optimal solution z such that z + s(z) > 0. Definition I.18 Two nonnegative vectors a and b in IRn are said to be complementary vectors if ab = 0. If moreover a + b > 0 then a and b are called strictly complementary vectors. 19 See, e.g., Exercise 39, page 133. 20 Exercise 10 Using the definitions of z and q, according to (2.11) and (2.12), show that ϑ(µ) = µ. 21 Exercise 11 In this exercise a skew-symmetric M and four vectors q (i) , i = 1, 2, 3, 4 are given as follows: M = 22 h 0 −1 1 0 i , q (1) = h 0 0 i , q (2) = h 1 0 i , q (3) = h 0 1 i , q (4) = h 1 1 i . For each of the four cases q = q (i) , i = 1, 2, 3, 4, one is asked to verify (1) if the system (2.46) has a solution if µ > 0 and (2) if the first equation in (2.46) satisfies the IPC, i.e., has a solution with z > 0 and s > 0. Exercise 12 Show that z(µ) is continuous (and differentiable) at any positive µ. (Hint: Apply the implicit function theorem (cf. Proposition A.2 in Appendix A) to the system (2.46)). 36 I Theory and Complexity Recall that optimality of z means that zs(z) = 0, which means that z and s(z) are complementary vectors. We are going to show that there exists an optimal vector z such that z and s(z) are strictly complementary vectors. Then for every index i, either zi > 0 or si (z) > 0. This implies that the index sets B and N , introduced in Section 2.5 form a partition of the index set, the so-called optimal partition of (SP ). It is convenient to introduce some more notation. Definition I.19 If z is a nonnegative vector, we define its support, denoted by σ(z), as the set of indices i for which zi > 0: σ(z) = {i : zi > 0} . Note that if z is feasible then zs(z) = 0 holds if and only if σ(z) ∩ σ(s(z)) = ∅. Furthermore, z is a strictly complementary optimal solution if and only if it is optimal and σ(z) ∪ σ(s) = {1, 2, . . . , n̄}. Theorem I.20 (SP ) has an optimal solution z ∗ with z ∗ + s(z ∗ ) > 0. Proof: Let {µk }∞ k=1 be a monotonically decreasing sequence of positive numbers µk such that µk → 0 if k → ∞, and let s(µk ) := s (z(µk )). Due to Lemma I.9 the set {(z(µk ), s(µk ))} lies in a compact set, and hence it contains a subsequence converging to a point (z ∗ , s∗ ), with s∗ = s(z ∗ ). Since z(µk )T s(µk ) = n̄µk → 0, we have (z ∗ )T s∗ = 0. Hence, from (2.25), q T z ∗ = 0. So z ∗ is an optimal solution. We claim that (z ∗ , s∗ ) is a strictly complementary pair. To prove this, we apply the orthogonality property (2.22) to the points z ∗ and z(µk ), which gives (z(µk ) − z ∗ )T (s(µk ) − s∗ ) = 0. Rearranging the terms, and using z(µk )T s(µk ) = n̄µk and (z ∗ )T s∗ = 0, we arrive at X X zj (µk )s∗j = n̄µk . zj∗ sj (µk ) + j∈σ(z ∗ ) j∈σ(s∗ ) Dividing both sides by µk and recalling that zj (µk )sj (µk ) = µk , we obtain X j∈σ(z ∗ ) X zj∗ s∗j + = n̄. 
zj (µk ) sj (µk ) ∗ j∈σ(s ) Letting k → ∞, the first sum on the left becomes equal to the number of positive coordinates in z ∗ . Similarly, the second sum becomes equal to the number of positive coordinates in s∗ . The sum of these numbers being n̄, we conclude that the optimal pair (z ∗ , s∗ ) is strictly complementary.23,24 ✷ 23 24 By using a similar proof technique it can be shown that the limit of z(µ) exists if µ goes to zero. In other words, the central path converges. The limit point is (of course) a uniquely determined optimal solution of (SP ), which can further be characterized as the so-called analytic center of the set of optimal solutions (cf. Section 2.11). Let us also mention that Theorem I.20 is a special case of an old result of Goldman and Tucker which states that every feasible linear system of equalities and inequalities has a strictly complementary solution [111]. I.2 Duality Theory 37 By Theorem I.20 there exists a strictly complementary solution z of (SP ). Having such a solution, the classes B and N simply follow from B = {i : zi > 0} , N = {i : si (z) > 0} . Now recall from Theorem I.5 and Theorem I.6 that the problems (P ) and (D) have optimal solutions with vanishing duality gap if and only if (SP ) has an optimal solution with κ > 0. Due to Theorem I.20 this can be restated as follows. Corollary I.21 The problems (P ) and (D) have optimal solutions with vanishing duality gap if and only if κ ∈ B. Let us consider more in detail the implications of κ ∈ B for the problems (SP0 ), and more importantly, for (P ) and (D). Theorem I.20 implies the existence of a strictly complementary optimal solution z of (SP ). Let z be such an optimal solution. Then we have zs(z) = 0, z + s(z) > 0, z ≥ 0, s(z) ≥ 0. Now using s(z) = M z + q and ϑ = 0, and also (2.11) and (2.12), we obtain    z=  y x κ 0      ≥ 0,    s(z) =   Ax − κb −AT y + κc b T y − cT x   n̄ − y T , xT , κ r     ≥ 0.  Neglecting the last entry in both vectors, it follows that  Ax − κb   s̄(z̄) := M̄ z̄ =  −AT y + κc  ≥ 0, b T y − cT x  y   z̄ :=  x  ≥ 0, κ   (2.47) and moreover, z̄s̄(z̄) = 0, z̄ + s̄(z̄) > 0, z̄ ≥ 0, s̄(z̄) ≥ 0. (2.48) This shows that z̄ is a strictly complementary optimal solution of (SP0 ). Hence the next theorem requires no further proof. Theorem I.22 (SP0 ) has an optimal solution z̄ with z̄ + s̄(z̄) > 0. Note that (2.47) and (2.48) are homogeneous in the variables x, y and κ. So, assuming κ ∈ B, without loss of generality we may put κ = 1. Then we come to y ≥ 0, Ax − b ≥ 0, x ≥ 0, c − A y ≥ 0, 1 ≥ 0, T bT y − cT x ≥ 0, y (Ax − b) = 0,  x c − AT y = 0, bT y − cT x = 0, y + (Ax − b) > 0,  x + c − AT y > 0,  1 + bT y − cT x > 0. 38 I Theory and Complexity This makes clear that x is feasible for (P ) and y is feasible for (D), and because cT x = bT y these solutions are optimal with vanishing duality gap. We get a little more information from the above system, namely y (Ax − b) = 0,  x c − AT y = 0, y + (Ax − b) > 0,  x + c − AT y > 0. The upper two relations show that the dual vector y and the primal slack vector Ax − b are strictly complementary, whereas the lower two relations express that the primal vector x and the dual slack vector c − AT y are strictly complementary. Thus the following is also true. Theorem I.23 If κ ∈ B then the problems (P ) and (D) have optimal solutions that are strictly complementary with the slack vector of the other problem. Moreover, the optimal values of (P ) and (D) are equal. 
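Theorem I.23 suggests a simple numerical check: given candidate solutions x and y, one can verify feasibility, equality of the objective values, and (strict) complementarity of y with Ax − b and of x with c − A^T y. The sketch below performs this check; the data are an illustrative example of ours, not taken from the text.

```python
import numpy as np

def check_optimal_pair(A, b, c, x, y, tol=1e-9):
    # Conditions displayed before Theorem I.23: feasibility, zero duality gap,
    # complementarity of (y, Ax - b) and (x, c - A^T y), and strictness.
    ps = A @ x - b                       # primal slack
    ds = c - A.T @ y                     # dual slack
    feasible = (x >= -tol).all() and (ps >= -tol).all() and \
               (y >= -tol).all() and (ds >= -tol).all()
    zero_gap = abs(c @ x - b @ y) <= tol
    compl = np.abs(y * ps).max() <= tol and np.abs(x * ds).max() <= tol
    strict = (y + ps > tol).all() and (x + ds > tol).all()
    return feasible and zero_gap and compl, strict

# Illustrative data: (P) min{x1 + 2 x2 : x1 + x2 >= 1, x >= 0} with dual
# (D) max{y : y <= 1, y <= 2, y >= 0}; x = (1, 0) and y = 1 are optimal.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
print(check_optimal_pair(A, b, c, np.array([1.0, 0.0]), np.array([1.0])))
# prints (True, True): an optimal, strictly complementary pair
```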
An intriguing question is of course what can be said about the problems (P ) and (D) if κ ∈ / B, i.e., if κ ∈ N . This question is completely answered in the next section. 2.9 Strong duality theorem We start by proving the following lemma. Lemma I.24 If κ ∈ N then there exist vectors x and y such that x ≥ 0, y ≥ 0, Ax ≥ 0, AT y ≤ 0, bT y − cT x > 0. Proof: Let κ ∈ N . Substitution of κ = 0 in (2.47) and (2.48) yields y ≥ 0, x ≥ 0, 0 ≥ 0, Ax ≥ 0, −AT y ≥ 0, bT y − cT x ≥ 0, y (Ax) = 0,  x AT y = 0,  0 bT y − cT x = 0, y + Ax > 0, x−AT y > 0,  0 + bT y − cT x > 0. It follows that the vectors x and y are as desired, thus the lemma is proved. ✷ Let us call an LO-problem solvable if it has an optimal solution, and unsolvable otherwise. Note that an LO-problem can be unsolvable for two possible reasons: the domain of the problem is empty, or the domain is not empty but the objective function is unbounded on the domain. In the first case the problem is called infeasible and in the second case unbounded. Theorem I.25 If κ ∈ N then neither (P ) nor (D) has an optimal solution. Proof: Let κ ∈ N . By Lemma I.24 we then have vectors x and y such that x ≥ 0, y ≥ 0, Ax ≥ 0, AT y ≤ 0, bT y − cT x > 0. (2.49) I.2 Duality Theory 39 By the last inequality we cannot have bT y ≤ 0 and cT x ≥ 0. Hence, either bT y > 0 or cT x < 0. (2.50) Suppose that (P ) is not infeasible. Then there exists x∗ such that x∗ ≥ 0 and Ax∗ ≥ b. Using (2.49) we find that x∗ + x ≥ 0 and A(x∗ + x) ≥ b. So x∗ + x is feasible for (P ). We can not have bT y > 0, because this would lead to the contradiction 0 < bT y ≤ (Ax∗ )T y = x∗ T (AT y) ≤ 0, since x∗ ≥ 0 and AT y ≤ 0. Hence we have bT y ≤ 0. By (2.50) this implies cT x < 0. But then we have for any positive λ that x∗ + λx is feasible for (P ) and cT (x∗ + λx) = cT x∗ + λcT x, showing that the objective value goes to minus infinity if λ grows to infinity. Thus we have shown that (P ) is either infeasible or unbounded, and hence (P ) has no optimal solution. The other case can be handled in the same way. If (D) is feasible then there exists y ∗ such that y ∗ ≥ 0 and AT y ∗ ≤ c. Due to (2.49) we find that y ∗ + y ≥ 0 and AT (y ∗ + y) ≤ c. So y ∗ + y is feasible for (D). We then can not have cT x < 0, because this gives the contradiction 0 > cT x ≥ AT y ∗ T x = y ∗ T (Ax) ≥ 0, since y ∗ ≥ 0 and Ax ≥ 0. Hence cT x ≥ 0. By (2.50) this implies bT y > 0. But then we have for any positive λ that y ∗ + λy is feasible for (D) and bT (y ∗ + λy) = bT y ∗ + λbT y. If λ grows to infinity then the last expression goes to infinity as well, so (D) is an unbounded problem. Thus we have shown that (D) is either infeasible or unbounded. This completes the proof. ✷ The following theorem summarizes the above results. Theorem I.26 (Strong duality theorem) For an LO problem (P ) in canonical form and its dual problem (D) we have the following two alternatives: (i) Both (P ) and (D) are solvable and there exist (strictly complementary) optimal solutions x for (P ) and y for (D) such that cT x = bT y. (ii) Neither (P ) nor (D) is solvable. This theorem is known as the strong duality theorem. It is the result that we announced in Section 2.2. It implies that if one of the problems (P ) and (D) is solvable then the other problem is solvable as well and in that case the duality gap vanishes at optimality. So the optimal values of both problems are then equal. 
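Case (ii) of Theorem I.26 can be illustrated with a tiny instance of our own (not from the text). For the data below both (P) and (D) are infeasible, and the vectors x and y satisfy the certificate (2.49) of Lemma I.24, so neither problem has an optimal solution.

```python
import numpy as np

# Illustrative data: (P) min{x1 - 2 x2 : x1 - x2 >= 1, -x1 + x2 >= 1, x >= 0}.
# Adding the two constraints gives 0 >= 2, so (P) is infeasible; similarly
# (D) requires y1 - y2 <= 1 and y1 - y2 >= 2, so (D) is infeasible as well.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
b = np.array([1.0, 1.0])
c = np.array([1.0, -2.0])

# Vectors satisfying the certificate (2.49) of Lemma I.24:
x = np.array([1.0, 1.0])
y = np.array([1.0, 1.0])

print((x >= 0).all(), (y >= 0).all())            # True True
print((A @ x >= 0).all(), (A.T @ y <= 0).all())  # True True
print(b @ y - c @ x)                             # 3.0 > 0
```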
40 I Theory and Complexity If (B, N ) is the optimal partition of the self-dual problem (SP ) in which (P ) and (D) are embedded, then case (i) occurs if κ ∈ B and case (ii) if κ ∈ N . Also, by Theorem I.25, case (ii) occurs if and only if there exist x and y such that (2.49) holds, and then at least one of the two problems is infeasible. Duality is a major topic in the theory of LO. At many places in the book, and in many ways, we explore duality properties. The above result concerns an LO problem (P ) in canonical form and its dual problem (D). In the next section we will extend the applicability of Theorem I.26 to any LO problem. We conclude the present section with an interesting observation. Remark I.27 In the classical approach to LO we have so-called theorems of the alternatives, also known as variants of Farkas’ lemma. We want to establish here that the fact that (2.47) has a strictly complementary solution for each vector c ∈ IRn implies Farkas’ lemma. We show this below for the following variant of the lemma. Lemma I.28 (Farkas’ lemma [75]) For a given m × n matrix A and a vector b ∈ IRm either the system Ax ≥ b, x ≥ 0 has a solution or the system AT y ≤ 0, bT y > 0, y ≥ 0 has a solution but not both systems have a solution. Proof: The obvious part of the lemma is that not both systems can have a solution, because this would lead to the contradiction 0 < bT y ≤ (Ax)T y = xT AT y ≤ 0. Taking c = 0 in (2.47), it follows that there exist x and y such that the two vectors   y z =  x  ≥ 0, κ   Ax − κb s(z) =  −AT y  ≥ 0 bT y are strictly complementary. For κ there are two possibilities: either κ = 0 or κ > 0. In the first case we obtain AT y ≤ 0, bT y > 0, y ≥ 0. In the second case we may assume without loss of generality that κ = 1. Then x satisfies Ax ≥ b, x ≥ 0, proving the claim.25 • 2.10 The dual problem of an arbitrary LO problem Every LO problem can be transformed into a canonical form. In fact, this can be done in many ways. In its canonical form the problem has a dual problem. In this way we can obtain a dual problem for any LO problem. Unfortunately the transformation to canonical form is not unique, and as a consequence, the dual problem obtained in this way is not uniquely determined. 25 Exercise 13 Derive Theorem I.22 from Farkas’ lemma. In other words, use Farkas’ lemma to show that for any skew-symmetric matrix M there exists a vector x such that x ≥ 0, M x ≥ 0, x + M x > 0. I.2 Duality Theory 41 The aim of this section is to show that we can find a dual problem for any given problem in a unique and simple way, so that when taking the dual of the dual problem we get the original problem, in its original description. Recall that three types of variables can be distinguished: nonnegative variables, free variables and nonpositive variables. Similarly, three types of constraints can occur in an LO problem: equality constraints, inequality constraints of the ≤ type and inequality constraints of the ≥ type. For our present purpose we need to consider the LO problem in its most general form, with all types of constraint and all types of variable. 
Therefore, we consider the following problem as the primal problem: (P )  T   0  A0 x0 + A1 x1 + A2 x2 = b0 x0  c 1 1     : B0 x0 + B1 x1 + B2 x2 ≥ b1 , min x c   c2 2 C0 x0 + C1 x1 + C2 x2 ≤ b2 x x1 ≥ 0, x2 ≤ 0      , where, for each i = 0, 1, 2, Ai , Bi and Ci are matrices and bi , ci and xi are vectors, and the sizes of these matrices and vectors, which we do not further specify, are such that all expressions in the problem are well defined. Now let us determine the dual of this problem. We first put it into canonical form.26 To this end we replace the equality constraint by two inequality constraints and we multiply the ≤ constraint by −1. Furthermore, we replace the nonpositive variable x2 by x3 = −x2 and the free variable x0 by x+ − x− with x+ and x− nonnegative. This yields the following equivalent problem: minimize  T c0  −c0     c1  −c2 subject to   x+  x−     x1  x3  +   0  x A0 −A0 A1 −A2 b  −A0 A0 −A1 A2   x−   −b0        B0 −B0 B1 −B2   x1  ≥  b1  , x3 −C0 C0 −C1 C2 −b2    x+  x−     x1  ≥ 0. x3 In terms of vectors z 1 , z 2 , z 3 , z 4 that contain the appropriate nonnegative dual variables, the dual of this problem becomes maximize 26  T b0  −b0     b1  −b2   z1  z2     z3  z4 The transformations carried out below lead to an increase of the numbers of constraints and variables in the problem formulation. They are therefore ‘bad’ from a computational point of view. But our present purpose is purely theoretical. In Appendix D it is shown how the problem can be put in canonical form without increasing these numbers. 42 I Theory and Complexity subject to  1   0  c z AT0 −AT0 B0T −C0T  −AT AT −B T C T   z 2   −c0      0 0 0 0  ,   ≤    AT1 −AT1 B1T −C1T   z 3   c1  −c2 z4 −AT2 AT2 −B2T C2T   z1  z2     3  ≥ 0. z   z4 We can easily check that the variables z 1 and z 2 only occur together in the combination z 1 − z 2 . Therefore, we can replace the variables by one free variable y 0 := z 1 − z 2 . This reduces the problem to maximize subject to T   y0 b0  1  3  b  z  z4 −b2   0  c AT0 B0T −C0T  0  " #  −c0   −AT −B T C T  y z3       0 0 0 3 , ≥ 0. ≤ z        c1   AT1 B1T −C1T  z4 4 z −c2 −AT2 −B2T C2T  In this problem the first two blocks of constraints can be taken together into one block of equality constraints:   T   0 0 T 0 T 3 T 4 0 "   #   b y A y + B z − C z = c 0 0 0   3 z  1  3 T 0 T 3 T 4 1, ≥ 0 . max  b   z  : A1 y + B1 z − C1 z ≤ c 4   z   2 4 T 0 T 3 T 4 2   −b z −A2 y − B2 z + C2 z ≤ −c Finally we multiply the last block of constraints by -1, we replace the nonnegative variable z 3 by y 1 = z 3 and the nonnegative variable z 4 by the nonpositive variable y 2 = −z 4 . This transforms the dual problem to its final form, namely    T   0 T 0 T 1 T 2 0 0     y A0 y + B0 y + C0 y = c   b  1  1 1 2 2 1 1 T 0 T T (D) max  b   y  : A1 y + B1 y + C1 y ≤ c , y ≥ 0, y ≤ 0 .       b2 y2 AT2 y 0 + B2T y 1 + C2T y 2 ≥ c2 Comparison of the primal problem (P ) with its dual problem (D), in its final description, reveals some simple rules for the construction of a dual problem for any given LO problem. First, the objective vector and the right-hand side vector are interchanged in the two problems, and the constraint matrix is transposed. At first sight it may not be obvious that the types of the dual variables and the dual constraints can be determined. 
We need to realize that the vector y 0 of dual variables relates to the first block of constraints in the primal problem, y 1 to the second block and y 2 to the third block of constraints. Then the relation becomes obvious: equality constraints in the primal problem yield free variables in the dual problem, inequality constraints in the primal problem of type ≥ yield nonnegative variables in the dual problem, and inequality constraints in the primal problem of type ≤ yield nonpositive variables in the dual problem. For the types of dual constraint we have similar relations. Here the I.2 Duality Theory 43 vector of primal variables x0 relates to the first block of constraints in the dual problem, x1 to the second block and x2 to the third block of constraints. Free variables in the primal problem yield equality constraints in the dual problem, nonnegative variables in the primal problem yield inequality constraints of type ≤ in the dual problem, and nonpositive variables in the primal problem yield inequality constraints of type ≥ in the dual problem. Table 2.1. summarizes these observations, and as such provides a simple scheme for writing down a dual problem for any given minimization problem. To get the dual of a maximization problem, one simply has to use the table from the right to the left. Primal problem (P) Dual problem (D) min cT x max bT y equality constraint free variable inequality constraint ≥ variable ≥ 0 free variable equality constraint variable ≥ 0 inequality constraint ≤ variable ≤ 0 inequality constraint ≥ inequality constraint ≤ Table 2.1. variable ≤ 0 Scheme for dualizing. As indicated before, the dualizing scheme is such that when it is applied twice, the original problem is returned. This easily follows from Table 2.1., by inspection.27 2.11 Convergence of the central path We already announced in footnote 23 (page 36) that the central path has a unique limit point in the optimal set. Because this result was not needed in the rest of this chapter, we postponed its proof to this section. We also characterize the limit point as the so-called analytic center of the optimal set of (SP ). As before, we assume that the central path of (SP ) exists. Our aim is to investigate the behavior of the central path as µ tends to 0. From the proof of Theorem I.20 we know that the central path has a subsequence converging to an optimal solution. This was sufficient for proving the existence of a strictly complementary solution of (SP ). However, as we show below, the central path itself converges. The limit point z ∗ and 27 Exercise 14 Using the results of this chapter prove that the following three statements are equivalent: (i) (SP ) satisfies the interior-point condition;  (ii) the level sets Lγ := (z, s(z)) : q T z ≤ γ, s(z) = M z + q ≥ 0, z ≥ 0 of q T z are bounded; (iii) the optimal set of (SP ) is bounded. 44 I Theory and Complexity its surplus vector s∗ := s(z ∗ ) form a strictly complementary optimal solution pair, and hence determine the optimal partition (B, N ) of (SP ). The optimal set of (SP ) is given by  SP ∗ = (z, s) : M z + q = s, z ≥ 0, s ≥ 0, q T z = 0 . This makes clear that SP ∗ is the intersection of the affine space  (z, s) : M z + q = s, q T z = 0 with the nonnegative orthant of IR2n . At this stage we need to define the analytic center of SP ∗ . We give the definition for the more general case of an arbitrary (nonempty) set that is the intersection of an affine space in IRp and the nonnegative orthant of IRp . 
Definition I.29 (Analytic center) 28 Let the nonempty and bounded set T be the intersection of an affine space in IRp with the nonnegative orthant of IRp . We define the support σ(T ) of T as the subset of the full index set {1, 2, . . . , p} given by σ(T ) = {i : ∃x ∈ T such that xi > 0} . The analytic center of T is defined as the zero vector if σ(T ) is empty; otherwise it is the vector in T that maximizes the product Y xi , x ∈ T . (2.51) i∈σ(T ) If the support of the set T in the above definition is nonempty then the convexity of T implies the existence of a vector x ∈ T such that xσ(T ) > 0. Moreover, if σ(T ) is nonempty then the maximum value of the product (2.51) exists since T is bounded. Since the logarithm of the product (2.51) is strictly concave, the maximum value (if it exists) is attained at a unique point of T . Thus the above definition uniquely defines the analytic center for any bounded subset that is the intersection of an affine space in IRp with the nonnegative orthant of IRp . Due to Lemma I.9 any pair (z, s) ∈ SP ∗ satisfies eT z + eT s(z) = n̄. This makes clear that the optimal set SP ∗ is bounded. Its analytic center therefore exists. We now show that the central path converges to this analytic center. The proof very much resembles that of Theorem I.20.29 28 29 The notion of analytic center of a polyhedron was introduced by Sonnevend [257]. It plays a crucial role in the theory of interior-point methods. The limiting behavior of the central path as µ approaches zero has been an important subject in research on interior-point methods for a long time. In the book by Fiacco and McCormick [77] the convergence of the central path to an optimal solution is investigated for general convex optimization problems. McLinden [197] considered the limiting behavior of the path for monotone complementarity problems and introduced the idea for the proof-technique of Theorem I.20, which was later adapted by Güler and Ye [135]. Megiddo [200] extensively investigated the properties of the central path, which motivated Monteiro and Adler [218], Tanabe [261] and Kojima, Mizuno and Yoshise [178] to investigate primal-dual methods. Other relevant references for the limiting behavior of the central path are Adler and Monteiro [3], Asić, Kovačević-Vujčić and RadosavljevićNikolić [28], Güler [131], Kojima, Mizuno and Noma [176], Monteiro and Tsuchiya [222] and Witzgall, Boggs and Domich [294], Halická [137], Wechs [290] and Zhao and Zhu [321]. I.2 Duality Theory 45 Theorem I.30 The central path converges to the analytic center of the optimal set SP ∗ of (SP ). Proof: Let (z ∗ , s∗ ) be an accumulation point of the central path, where s∗ = s(z ∗ ). The existence of such a point has been established in the proof of Theorem I.20. Let {µk }∞ k=1 be a positive sequence such that µk → 0 and such that (z(µk ), s(µk )), with s(µk ) = s(z(µk )), converges to (z ∗ , s∗ ). Then z ∗ is optimal, which means z ∗ s∗ = 0, and z ∗ and s∗ are strictly complementary, i.e, z ∗ + s∗ > 0. Now let z̄ be optimal in (SP ) and let s̄ = M z̄ + q be its surplus vector. Applying the orthogonality property (2.22) to the points z̄ and z(µ) we obtain (z(µk ) − z̄)T (s(µk ) − s̄) = 0. Rearranging terms and using z(µk )T s(µk ) = nµk and (z̄)T s̄ = 0, we get n X z̄j sj (µk ) + n X s̄j zj (µk ) = nµk . j=1 j=1 Since the pair (z ∗ , s∗ ) is strictly complementary and (z̄, s̄) is an arbitrary optimal pair, we have for each coordinate j: s∗j = 0 ⇒ s̄j = 0. 
zj∗ = 0 ⇒ z̄j = 0, Hence, z̄j = 0 if j ∈ / σ(z ∗ ) and s̄j = 0 if j ∈ / σ(s∗ ). Thus we may write X X s̄j zj (µk ) = nµk . z̄j sj (µk ) + j∈σ(s∗ ) j∈σ(z ∗ ) Dividing both sides by µk = zj (µk )sj (µk ), we get X X z̄j s̄j + = n. zj (µk ) sj (µk ) ∗ ∗ j∈σ(z ) Letting k → ∞, it follows that j∈σ(s ) X j∈σ(z ∗ ) X s̄j z̄j + = n. ∗ zj s∗j ∗ j∈σ(s ) Using the arithmetic-geometric-mean inequality we obtain   1/n  X s̄j X z̄j Y z̄j Y s̄j 1    = 1. + ≤  zj∗ s∗j n zj∗ s∗j ∗ ∗ ∗ ∗ j∈σ(z ) j∈σ(s ) Obviously, the above inequality implies Y Y s̄j ≤ z̄j j∈σ(z ∗ ) j∈σ(s ) j∈σ(z ) j∈σ(s∗ ) Y j∈σ(z ∗ ) zj∗ Y s∗j . j∈σ(s∗ ) Q Q This shows that (z ∗ , s∗ ) maximizes the product j∈σ(z ∗ ) zj j∈σ(s∗ ) sj over the optimal set. Hence the central path of (SP ) has only one accumulation point when µ approaches zero and this is the analytic center of SP ∗ . ✷ 46 I Theory and Complexity Example I.31 Let us compute the limit point of the central path of the self-dual problem (SP ) in Example I.7, as given by (2.19). Recall from (2.26) in Example I.12 that any optimal solution has the form     2κ 0  0   κ          z =  κ  , s(z) =  0  , 0 ≤ κ ≤ 1,     κ  0  0 5 − 5κ from which the sets B and N follow: B = {1, 3, 4} , N = {2, 5} . Hence we have for any optimal z, Y Y  zj sj (z) = 2κ4 (5 − 5κ) = 10 κ4 − κ5 . j∈B j∈N This product is maximal for κ = 0.8, so the analytical center of the optimal set is given by 30,31,32,33   1.6  0      z =  0.8  ,    0.8  0   0  0.8      s(z) =  0  .    0  1 ♦ The convergence of the central path when µ goes to zero implies the boundedness of the coordinates of z(µ) and s(µ) for any finite section of the central path. Of course, this also follows from Lemma I.9 and (2.33).34 30 Exercise 15 Find the analytic center of the self-dual problem considered in Exercise 4 (page 27). 31 Exercise 16 Find the analytic center of the self-dual problem considered in Exercise 5 (page 27). 32 Exercise 17 Find the analytic center of the self-dual problem considered in Exercise 6 (page 27). 33 Exercise 18 Find the analytic center of the self-dual problem considered in Exercise 7 (page 27). 34 Exercise 19 For any positive µ consider the set SP µ :=  (z, s) : M z + q = s, z ≥ 0, s ≥ 0, q T z = q T z(µ) . Using the same proof-technique as for Theorem I.30, show that the pair (z(µ), s(µ)) is the analytic center of this set. 3 A Polynomial Algorithm for the Self-dual Model 3.1 Introduction The previous chapter made clear that any (canonical) LO problem can be solved by finding a strictly complementary solution of a specific self-dual problem that satisfies the interior-point assumption. In particular, the self-dual problem has the form  (SP ) min q T z : M z ≥ −q, z ≥ 0 , where M is a skew-symmetric matrix and q a nonnegative vector. Deviating from the notation in Chapter 2 we denote the order of M as n (instead of n̄). Then, according to (2.12) the vector q has the form " # 0n−1 q := . (3.1) n Note that due to the definition of the matrix M we may assume that n ≥ 5. Like before, we associate to any vector z ∈ IRn its slack vector s(z): s(z) := M z + q. (3.2) As a consequence we have z is a feasible for (SP ) if and only if z ≥ 0 and s(z) ≥ 0. Also recall that the all-one vector e is feasible for (SP ) and its slack vector is the all-one vector (cf. Theorem I.5): s(e) = e. (3.3) Assuming that the entries in M and q are integral (or rational), we show in this chapter that we can find a strictly complementary solution of (SP ) in polynomial time. 
This means that we present an algorithm that yields a strictly complementary solution of (SP ) after a number of arithmetic operations that is bounded by a polynomial in the size of (SP ). Remark I.32 The terminology is taken from complexity theory. For our purpose it is not necessary to have a deep understanding of this theory. Major contributions to complexity 48 I Theory and Complexity theory were given by Cook [56], Karp [166], Aho, Hopcroft and Ullman [5], and Garey and Johnson [92]. For a survey focusing on linear and combinatorial optimization problems we refer the reader to Schrijver [250]. Complexity theory distinguishes between easy and hard problems. In this theory a problem consists of a class of problem instances, so ‘the’ LO problem consists of all possible instances of LO problems; here we restrict ourselves to LO problems with integral input data.1 A problem is called solvable in polynomial time (or simply polynomial or easy) if there exists an algorithm that solves each instance of the problem in a time that is bounded above by a polynomial in the size of the problem instance; otherwise the problem is considered to be hard. In general the size of an instance is defined as the length of a binary string encoding the instance. For the problem (SP ) such a string consists of binary encodings of the entries in the matrix M and the vector q. Note that the binary encoding of a positive integer a requires a string of length 1 + ⌈log2 (1 + |a|)⌉. (The first 1 serves to encode the sign of the number.) If the entries in M and q are integral, the total length of the string for encoding (SP ) becomes n X i=1 (1 + ⌈log2 (1 + |qi |)⌉) + n(n + 1) + n X i=1 n X i,j=1 (1 + ⌈log 2 (1 + |Mij |)⌉) = ⌈log 2 (1 + |qi |)⌉ + n X i,j=1 ⌈log2 (1 + |Mij |)⌉ . (3.4) Instead we work with the smaller number L = n(n + 1) + log2 Π, (3.5) where Π is the product of all nonzero entries in q and M . Ignoring the integrality operators, we can show that the expression in (3.4) is less than 2L. In fact, one can easily understand that the number of operations of an algorithm is polynomial in (3.4) if and only if it is bounded by a polynomial in L. • We consider the number L, as given by (3.5), as the size of (SP ). In fact we use this number only once. In the next section we present an algorithm that generates a positive vector z such that z T s(z) ≤ ε, where ε is any positive number, and we derive a bound for the number of iterations required by the algorithm. Then, in Section 3.3, we show that this algorithm can be used to find a strictly complementary solution of (SP ) and we derive an iteration bound that depends on the so-called condition number of (SP ). Finally, we show that the iteration bound can be bounded from above by a polynomial in the quantity L, which represents the size of (SP ). 3.2 Finding an ε-solution After all the theoretical results of the previous sections we are now ready to present an algorithm that finds a strictly complementary solution of (SP ) in polynomial time. The working horse in the algorithm is the Newton step that was introduced in Section 2.7.2. It will be convenient to recall its definition and some of its properties. 1 We could easily have included LO problems with rational input data in our considerations, because if the entries in M and q are rational numbers then after multiplication of these entries with their smallest common multiple, all entries become integral. 
Thus, each problem instance with rational data can easily be transformed to an equivalent problem with integral data. I.3 Polynomial Algorithm 49 Given a positive vector z such that s = s(z) > 0, the Newton direction ∆z at z with respect to µ (or the µ-center z(µ)) is uniquely determined by the linear system (cf. (2.35) – (2.36)) M ∆z − ∆s = 0, z∆s + s∆z = µe − zs. Substituting (3.6) into (3.7) we get (3.6) (3.7) 2 (S + ZM ) ∆z = µe − zs. Since the matrix S + ZM is invertible (cf. Exercise 9, page 29), it follows that −1 ∆z = (S + ZM ) ∆s = M ∆z. (µe − zs) (3.8) (3.9) The result of the Newton step is denoted as z + := z + ∆z; the new slack vector is then given by s+ := s(z + ) = M (z + ∆z) + q = s + M ∆z. The vectors ∆z and ∆s are orthogonal, by (2.34). After the Newton step the objective value has the desired value nµ, by (2.38): q T z = sT z = nµ. The variance vector of z with respect to µ is defined by (cf. (2.41))3 : s zs(z) . v := µ (3.10) (3.11) This implies ⇔ zs(z) = µe v = e. (3.12) We use δ(z, µ) as a measure for the proximity of z to z(µ). It is defined by (cf. (2.42)) δ(z, µ) := 1 2 v − v −1 . (3.13) If z = z(µ) then v = e and hence δ(z, µ) = 0, otherwise √ δ(z, µ) > 0. If δ(z, µ) < 1 then the Newton step is feasible, and if δ(z, µ) < 1/ 2 then the Newton process quadratically fast converges to z(µ). This is a consequence of the next lemma (cf. Theorem I.16). Lemma I.33 If δ := δ(z, µ) < 1, then the Newton step is strictly feasible, i.e., z + > 0 and s+ > 0. Moreover, δ2 δ(z + , µ) ≤ p . 2(1 − δ 2 ) 2 3 Here, as usual, Z = diag (z) and S = diag (s). Exercise 20 If we define d := p z/s, where s = s(z), then show that the Newton step ∆z satisfies (I + DM D) ∆z =  z −1 v − v = µs−1 − z. v 50 I Theory and Complexity 3.2.1 Newton-step algorithm The idea of the algorithm is quite simple. Starting at z = e, we choose µ < 1 such that (3.14) δ(z, µ) ≤ √12 , and perform a Newton step targeting at z(µ). After the step the new iterate z satisfies δ(z, µ) ≤ 12 . Then we decrease µ such that (3.14) holds for the new values of z and µ, and repeat the procedure. Note that after each Newton step we have q T z = z T s(z) = nµ. Thus, if µ approaches zero, then z will approach the set of optimal solutions. Formally the algorithm can be stated as follows. Full-Newton step algorithm Input: An accuracy parameter ε > 0; a barrier update parameter θ, 0 < θ < 1. begin z = e; µ := 1; while nµ ≥ ε do begin µ := (1 − θ)µ; z := z + ∆z; end end Note that the reduction of the barrier parameter µ is realized by the multiplication with the factor 1 − θ. In the next section we discuss how an appropriate value of the update parameter θ can be obtained, so that during the course of the algorithm the iterates are kept within the region where Newton’s method is quadratically convergent. 3.2.2 Complexity analysis At the start of the algorithm we have µ = 1 and z = z(1) = e, whence q T z = n and δ(z, µ) = 0. In each iteration µ is first reduced with the factor 1 − θ and then the Newton step is made targeting the new µ-center. It will be clear that the reduction of µ has effect on the value of the proximity measure. This effect is fully described by the following lemma. Lemma I.34 Let z > 0 and µ > 0 be such that s = s(z) > 0 and q T z = nµ. Moreover, let δ := δ(z, µ) and µ′ = (1 − θ)µ. Then δ(z, µ′ )2 = (1 − θ)δ 2 + θ2 n . 4(1 − θ) I.3 Polynomial Algorithm 51 Proof: Let δ + := δ(z, µ′ ) and v = 4(δ + )2 = p zs/µ, as in (3.11). Then, by definition, √ v 1 − θ v −1 − √ 1−θ 2 √ =  θv 1 − θ v −1 − v − √ 1−θ 2 . 
From z T s = nµ it follows that kvk2 = n. This implies  v T v −1 − v = n − kvk2 = 0. Hence, v is orthogonal to v −1 − v. Therefore, 4(δ + )2 = (1 − θ) v −1 − v 2 2 + θ2 kvk = (1 − θ) v −1 − v 1−θ Since v −1 − v = 2δ, the result follows. Lemma I.35 Let θ = √1 . 2n 2 + nθ2 . 1−θ ✷ Then at the start of each iteration we have q T z = nµ and δ(z, µ) ≤ 1 2 (3.15) Proof: At the start of the first iteration we have µ = 1 and z = e, so q T z = n and δ(z, µ) = 0. Therefore (3.15) certainly holds at the start of the first iteration. Now suppose that (3.15) holds at the start of some iteration. We show that (3.15) then also holds at the start of the next iteration. Let δ = δ(z, µ). When the barrier parameter is updated to µ′ = (1 − θ)µ, Lemma I.34 gives δ(z, µ′ )2 = (1 − θ) δ 2 + θ2 n 1−θ 1 3 ≤ + ≤ . 4(1 − θ) 4 8(1 − θ) 8 The √ last inequality can be understood as follows. Due to n ≥ 2 we have 0 ≤ θ ≤ 1/ 4 = 1/2. The left hand side expression in the last inequality is a convex function of θ. Its value at θ = 0 as well as at θ = 1/2 equals 3/8, hence its value does not exceed 3/8 for θ ∈ [0, 1/2]. √ Since 3/8 ≤ 1/2, it follows that after the µ-update δ(z, µ′ ) ≤ 1/ 2. Hence, by Lemma I.33, after performing the Newton step we certainly have δ(z + , µ′ ) ≤ 1/2. Finally, by (3.10), q T z + = nµ′ . Thus the lemma has been proved. ✷ How many iterations are needed by the algorithm? The answer is provided by the following lemma. Lemma I.36 After at most iterations we have nµ ≤ ε.  1 n log θ ε  Proof: Initially, the objective value is n and in each iteration it is reduced by the factor 1 − θ. Hence, after k iterations we have µ = (1 − θ)k . Therefore, the objective value, given by q T z(µ) = nµ, is smaller than, or equal to ε if k (1 − θ) n ≤ ε. 52 I Theory and Complexity Taking logarithms, this becomes k log (1 − θ) + log n ≤ log ε. Since − log (1 − θ) ≥ θ, this certainly holds if kθ ≥ log n − log ε = log n . ε This implies the lemma. ✷ The above results are summarized in the next theorem which requires no further proof. Theorem I.37 If θ = √1 2n then the algorithm requires at most l√ nm 2n log ε iterations. The output is a feasible z > 0 such that q T z = nµ ≤ ε and δ(z, µ) ≤ 12 . This theorem shows that we can get an ε-solution z of our self-dual model with ε as small as desirable.4 A crucial question for us is whether the variable κ = zn−1 is positive or zero in the limit, when µ goes to zero. In practice, for small enough ε it is usually no serious problem to decide which of the two cases occurs. In theory, however, this means that we need to know what the optimal partition of the problem is. As we explain in the next section, the optimal partition can be found in polynomial time. This requires some further analysis of the central path. Example I.38 In this example we demonstrate the behavior of the Full-Newton step algorithm by applying it to the problem (SP ) in Example I.7, as given in (2.19) on page 23. According to Theorem I.37, with n = 5, the algorithm requires at most   √ 5 10 log ε iterations. For ε = 10−3 we have log (5/ε) = log 5000 = 8.5172, and we get 27 as an upper bound for the number of iterations. When running the algorithm with this ε the actual number of iterations is 22. The actual values of the output of the algorithm are z = (1.5999, 0.0002, 0.8000, 0.8000, 0.0002)T and s(z) = (0.0001, 0.8000, 0.0002, 0.0002, 1.0000)T . 
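For readers who wish to reproduce an experiment of this kind, the sketch below implements the Full-Newton step algorithm exactly as stated in Section 3.2.1, with θ = 1/√(2n). Since the data of the sample problem (2.19) are not reproduced in this excerpt, the algorithm is applied to the self-dual embedding of a small LP of our own choosing; the routine embed is our reconstruction of the construction behind (2.11)–(2.12), and all the algorithm needs from it is a skew-symmetric M with q = (0, . . . , 0, n) and s(e) = e.

```python
import numpy as np

def embed(A, b, c):
    # Self-dual embedding of (P) min{c^T x : Ax >= b, x >= 0}: a sketch of the
    # construction behind (2.11)-(2.12).  It returns a skew-symmetric M and
    # q = (0,...,0,n) such that z = e is feasible with s(e) = e, cf. (3.1)-(3.3).
    m, k = A.shape
    Mbar = np.block([[np.zeros((m, m)), A, -b.reshape(-1, 1)],
                     [-A.T, np.zeros((k, k)), c.reshape(-1, 1)],
                     [b.reshape(1, -1), -c.reshape(1, -1), np.zeros((1, 1))]])
    r = np.ones(m + k + 1) - Mbar @ np.ones(m + k + 1)
    M = np.block([[Mbar, r.reshape(-1, 1)],
                  [-r.reshape(1, -1), np.zeros((1, 1))]])
    q = np.zeros(m + k + 2)
    q[-1] = m + k + 2
    return M, q

def full_newton_step(M, q, eps=1e-3):
    # The Full-Newton step algorithm of Section 3.2.1 with theta = 1/sqrt(2n).
    n = len(q)
    theta = 1.0 / np.sqrt(2.0 * n)
    z, mu, it = np.ones(n), 1.0, 0
    while n * mu >= eps:
        mu *= 1.0 - theta                       # barrier update
        s = M @ z + q
        dz = np.linalg.solve(np.diag(s) + np.diag(z) @ M,
                             mu * np.ones(n) - z * s)
        z = z + dz                              # full Newton step
        it += 1
    return z, M @ z + q, it

# Illustrative LP (ours): min{x1 + 2 x2 : x1 + x2 >= 1, x >= 0}, so n = 5.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
M, q = embed(A, b, c)
z, s, it = full_newton_step(M, q)
print(it)                               # iteration count, within the bound of Theorem I.37
print(np.round(z, 4), np.round(s, 4))   # q^T z = n*mu < eps; large and small coordinates visible
```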
The left plot in Figure 3.1 shows how the coordinates of the vector z := (z1, z2, z3, z4 = κ, z5 = ϑ), which contains the variables in the problem, develop in the course of the algorithm. The right plot does the same for the coordinates of the surplus vector s(z) := (s1, s2, s3, s4, s5). Observe that z and s(z) converge nicely to the limit point of the central path of the sample problem as given in Example I.31. ♦

4 It is worth pointing out that if we put ε = nµ in the iteration bound of Theorem I.37 we get exactly the same bound as given by (2.45).

[Figure 3.1: Output of the Full-Newton step algorithm for the problem in Example I.7 — the coordinates of z (left) and of s(z) (right) plotted against the iteration number.]

3.3 Polynomial complexity result

3.3.1 Introduction

Having a strictly complementary solution z of (SP), we also know the optimal partition (B, N) of (SP), as defined in Section 2.6. For if z is a strictly complementary solution of (SP) then we have zs(z) = 0 and z + s(z) > 0, and the optimal partition follows from5

B = {i : zi > 0} , N = {i : si(z) > 0} .

Definition I.39 The restriction of a vector z ∈ IRn to the coordinates in a subset I of the full index set {1, 2, . . . , n} is denoted by zI.

Hence if z̃ is a strictly complementary solution of (SP) then z̃B > 0, z̃N = 0, sB(z̃) = 0, sN(z̃) > 0. Now let z be any feasible solution of (SP). Then, by Lemma I.10, with z1 = z, z2 = z̃ we obtain that z is optimal if and only if zs(z̃) = z̃s(z) = 0. This gives

z is optimal for (SP) ⇐⇒ zN = 0 and sB(z) = 0.

5 It may be sensible to point out that if, conversely, the optimal partition is known, then it is not obvious at all how to find a strictly complementary solution of (SP).

As a consequence, the set SP∗ of optimal solutions of (SP) is completely determined by the optimal partition (B, N) of (SP). We thus may write

SP∗ = {z ∈ SP : zN = 0, sB(z) = 0} ,

where SP denotes the feasible region of (SP). Assuming that M and q are integral we show in this section that a strictly complementary solution of (SP) can be found in polynomial time. We divide the work into a few steps. First we apply the Full-Newton step algorithm with a suitable (small enough) value of the accuracy parameter ε. This yields a positive solution z of (SP) such that s(z) is positive as well and such that the pair (z, s(z)) is almost strictly complementary in the sense that for each index i one of the positive coordinates in the pair (zi, si(z)) is large and the other is small. To distinguish between ‘large’ and ‘small’ coordinates we introduce the so-called condition number of (SP). We are able to specify ε such that the large coordinates of z are in B and the small coordinates of z in N. The optimal partition of (SP) can thus be derived from the almost strictly complementary solution z provided by the algorithm. Then, in Section 3.3.6, a rounding procedure is described that yields a strictly complementary solution of (SP) in polynomial time.

3.3.2 Condition number

Below, (B, N) always denotes the optimal partition of (SP), and SP∗ the optimal set of (SP). We first introduce the following two numbers:

σ^z_SP := min_{i∈B} max_{z∈SP∗} {zi} ,   σ^s_SP := min_{i∈N} max_{z∈SP∗} {si(z)} .

By convention we take σ^z_SP = ∞ if B is empty and σ^s_SP = ∞ if N is empty.
Since the ∗ z s optimal set SP is bounded, σSP is finite if B is nonempty and σSP is finite if N is nonempty. Due to the definition of the sets B and N both numbers are positive, and since B and N cannot be both empty at least one of the two numbers is finite. As a consequence, the number z s σSP := min {σSP , σSP } is positive and finite. This number plays a crucial role in the further analysis and is called the condition number of (SP ).6 Using that z and s(z) are complementary if z ∈ SP ∗ we can easily verify that σSP can also be written as σSP := min max∗ {zi + si (z)} . 1≤i≤n z∈SP Example I.40 Let us calculate the condition number of our sample problem (2.19) in Example I.7. We found in Example I.12 that any optimal solution z has the form 6 This condition number seems to be a natural quantity for measuring the degree of hardness of (SP ). The smaller the number the more difficult it is to find a strictly complementary solution. In a more general context, it was introduced by Ye [311]. See also Ye and Pardalos [314]. For a discussion of other condition numbers and their relation to the size of a problem we refer the reader to Vavasis and Ye [280]. I.3 Polynomial Algorithm as given by (2.26), namely   2κ  0      z =  κ ,   κ 0 Hence we have for any optimal z, 55   0  κ      s(z) =  0  ,    0  5 − 5κ   2κ  κ      z + s(z) =  κ  ,    κ  5 − 5κ 0 ≤ κ ≤ 1. 0 ≤ κ ≤ 1. To find the condition number we need to find the maximum values of each of the variables in in this vector. These values are 2, 1, 1, 1 (for κ = 1) and 5 (for κ = 0), respectively. The minimum of these maximal values being 1, the condition number of our sample problem (2.19) turns out to be 1.7,8,9,10 ♦ In the above example we were able to calculate the condition number of a given problem. We see below that when we know the condition number of a problem we can profit from it in the solution procedure. In general, however, the calculation of the condition number is at least as hard as solving the problem. Hence, in general, we have to solve a problem without knowing its condition number. In such cases there is a cheap way to get a lower bound for the condition number. We proceed by deriving such a lower bound for σSP in terms of the data of the problem (SP ). We introduce some more notation. Definition I.41 The submatrix of M consisting of the elements in the rows whose indices are in I and the columns whose indices are in J is denoted by MIJ . Using this convention, we have for any vector z and its surplus vector s = s(z):        sB zB MBB MBN q =   +  B .  (3.16) sN zN MN B MN N qN Recall from the previous section that the vector z is optimal if and only if z and s are T nonnegative, zN = 0 and sB = 0. Hence we have q T z = qB zB . Due to the existence of 7 Exercise 21 Using the results of Exercise 4 (page 27), prove that the condition number of the self-dual problem in question equals 5/4. 8 Exercise 22 Using the results of Exercise 5 (page 27), prove that the condition number of the self-dual problem in question equals 5/4. 9 Exercise 23 Using the results of Exercise 6 (page 27), prove that the condition number of the self-dual problem in question equals 5/(1 + β) if β ≥ 2 and otherwise 5β/(2(1 + β)). 10 Exercise 24 Using the results of Exercise 7 (page 27), prove that the condition number of the self-dual problem in question equals 5/(4 − β) if β ≤ −1 and otherwise −5β/(4 − β). 
56 I Theory and Complexity a strictly complementary solution z, for which zB is positive, we conclude that qB = 0. (3.17) Thus it becomes clear that a vector z and its surplus vector s are optimal for (SP ) if and only if zB ≥ 0, zN = 0, sB = 0, sN ≥ 0 and        0 MBB MBN z 0 =   B  +  . sN MN B MN N 0 qN This is equivalent to      MBB 0BN zB 0B   = , MN B −IN N sN −qN zB ≥ 0, zN = 0, sB = 0, sN ≥ 0. (3.18) Note that any strictly complementary solution z gives rise to a positive solution of this system. For the calculation of σSP we need to know the maximal value of each coordinate of the vector (zB , sN ) when this vector runs through all possible solutions of (3.18). Then σSP is just the smallest of all these maximal values. At this stage we may apply Lemma C.1 in Appendix C to (3.18), which gives us an easy to compute lower bound for σSP . Theorem I.42 The condition number σSP of (SP ) satisfies 1 , j=1 kMj k σSP ≥ Qn where Mj denotes the j-th column of M . Proof: Recall that the optimal set of (SP ) is determined by the equation (3.18). Also, by Lemma I.9 we have eT z + eT s(z) = n, showing that the optimal set is bounded. As we just established, the system (3.18) has a positive solution, and hence we may apply Lemma C.1 to (3.18) with   MBB 0BN A= . MN B −IN N The columns in A made up by the two left blocks are the columns Mj of M with j ∈ B, whereas the columns made up by the two right blocks are unit vectors. Thus we obtain that the maximal value of each coordinate of the vector (zB , sN ) is bounded below by the quantity 1 Q . j∈B kMj k With the definition of σSP this implies σSP ≥ Q 1 1 ≥ Qn . kM k j j∈B j=1 kMj k The last inequality is an obvious consequence of the assumption that all columns in M are nonzero and integral. Hence the theorem has been proved. ✷ I.3 Polynomial Algorithm 3.3.3 57 Large and small variables It will be convenient to call the coordinates of z(µ) that are indexed by B the large coordinates of z(µ), and the other coordinates the small coordinates of z(µ). Furthermore, the coordinates of sN (µ) are called the large coordinates of s(µ), and the coordinates of sB (µ) small coordinates of s(µ). The next lemma provides lower bounds for the large coordinates and upper bounds for the small coordinates of z(µ) and s(µ). This lemma implies that the large coordinates of z(µ) and s(µ) are bounded away from zero along the central path, and there exists a uniform lower bound that is independent of µ. Moreover, the small coordinates are bounded above by a constant times µ, where the constant depends only on the data in the problem (SP ). In other words, the order of magnitude of the small coordinates is O(µ). The bounds in the lemma use the condition number σSP of (SP ). Lemma I.43 For any positive µ we have zi (µ) ≥ σSP , i ∈ B, n zi (µ) ≤ nµ , i ∈ N, σSP si (µ) ≤ nµ , i ∈ B, σSP si (µ) ≥ σSP , i ∈ N. n Proof: First let i ∈ N and let z̃ be an optimal solution such that s̃i := si (z̃) is maximal. Then the definition of the condition number σSP implies that s̃i ≥ σSP . Applying the orthogonality property (2.22) to the points z̃ and z(µ) we obtain (z(µ) − z̃)T (s(µ) − s̃) = 0, which gives z(µ)T s̃ + s(µ)T z̃ = nµ. This implies zi (µ)s̃i ≤ z(µ)T s̃ ≤ nµ. Dividing by s̃i and using that s̃i ≥ σSP we obtain zi (µ) ≤ nµ nµ ≤ . s̃i σSP Since zi (µ)si (µ) = µ, it also follows that si (µ) ≥ σSP . n This proves the second and fourth inequality in the lemma. The other inequalities are obtained in the same way. 
Let i ∈ B and let z̃ be an optimal solution such that z̃i is maximal. Then the definition of the condition number σSP implies that z̃i ≥ σSP . Applying the orthogonality property to the points z̃ and z(µ) we obtain in the same way as before si (µ)z̃i ≤ s(µ)T z̃ ≤ nµ. From this we deduce that si (µ) ≤ nµ nµ ≤ . z̃i σSP 58 I Theory and Complexity Using once more z(µ)s(µ) = µe we find zi (µ) ≥ σSP , n completing the proof of the lemma.11,12 ✷ We collect the results of the above lemma in Table 3.1.. i∈B Table 3.1. i∈N zi (µ) ≥ σSP n ≤ nµ σSP si (µ) ≤ nµ σSP ≥ σSP n Estimates for large and small variables on the central path. The lemma has an important consequence. If µ is so small that σSP nµ < σSP n then we have a complete separation of the small and the large variables. This means that if a point z(µ) on the central path is given so that µ< σSP 2 , n2 then we can determine the optimal partition (B, N ) of (SP ). In the next section we show that the Full-Newton step algorithm can produce a point z in the neighborhood of the central path with this feature, namely that it gives a complete separation of the small and the large variables. 3.3.4 Finding the optimal partition Theorem I.37 states that after at most l√ nm 2n log ε 11 12 (3.19) Exercise 25 Let 0 < µ < µ̄. Using the orthogonality property (2.22), show that for each i (1 ≤ i ≤ n), zi (µ) si (µ) + ≤ 2n. zi (µ̄) si (µ̄) The result in Exercise 25 can be improved to si (µ) zi (µ) + ≤ n, zi (µ̄) si (µ̄) which implies zi (µ) ≤ nzi (µ̄), si (µ) ≤ nsi (µ̄). For a proof we refer the reader to Vavasis and Ye [281]. I.3 Polynomial Algorithm 59 iterations the Full-Newton step algorithm yields a feasible solution z such that q T z = nµ ≤ ε and δ(z, µ) ≤ 12 . We show in this section that if µ is small enough we can recognize the optimal partition (B, N ) from z, and such z can be found in a number of iterations that depends only on the dimension n and on the condition number σSP of (SP ). We need a simple measure for the distance of z to the central path. To this end, for each positive feasible vector z with s(z) > 0, we define the number δc (z) as follows: δc (z) := max (zs(z)) . min (zs(z)) (3.20) Observe that δc (z) = 1 if and only if zs(z) is a multiple of the all-one vector e. This occurs precisely if z lies on the central path. Otherwise we have δc (z) > 1. We consider δc (z) as an indicator for the ‘distance’ of z to the central path.13 Lemma I.44 If δ(z, µ) ≤ 1 2 then δc (z) ≤ 4. Proof: Using the variance vector v of z, with respect to the given µ > 0, we may write   max µv 2 max v 2 δc (z) = = . min (µv 2 ) min (v 2 ) Using (3.13), it follows from δ(z, µ) ≤ 1 2 that v − v −1 ≤ 1. Without loss of generality we assume that the coordinates of v are ordered such that v1 ≥ v2 ≥ . . . ≥ vn . Then δc (z) = v12 /vn2 . Now consider the problem max  v12 : vn2 v−v −1  ≤1 . The optimal value of this problem is an upper bound for √ δc (z). One may √ easily verify that the optimal solution has vi = 1 if 1 < i < n, v1 = 2 and vn = 1/ 2. Hence the optimal value is 4.14 This proves the lemma. ✷ 13 14 In the analysis of interior-point methods we always need to introduce a quantity that measures the ‘distance’ of a feasible vector x to the central path. This can be done in many ways as becomes apparent in the course of this book. In the coming chapters we make use of a variety of so-called proximity measures. 
All these measures are based on the simple observation that x is on the central path if and only if the vector xs(x) is a scalar multiple of the all-one vector. Exercise 26 Prove that max ( v12 2 vn : n  X i=1 1 vi − vi 2 ) ≤1 = 4. 60 I Theory and Complexity Lemma I.45 Let z be a feasible solution of (SP ) such that δc (z) ≤ τ . Then, with s = s(z), we have zT s σSP zi ≤ , i ∈ N, , i ∈ B, zi ≥ τn σSP si ≤ zT s , i ∈ B, σSP si ≥ σSP , i ∈ N. τn Proof: The proof is basically the same as the proof of Lemma I.43. It is a little more complicated because the estimates now concern a point off the central path. From δc (z) ≤ τ we conclude that there exist positive numbers τ1 and τ2 such that τ τ1 = τ2 and τ1 ≤ zi si ≤ τ2 , 1 ≤ i ≤ n. (3.21) When we realize that these inequalities replace the role of the identity zi (µ)si (µ) = µ in the proof of Lemma I.43 the generalization becomes almost straightforward. First suppose that i ∈ N and let z̃ be an optimal solution such that s̃i := si (z̃) is maximal. Then, from to the definition of σSP , it follows that s̃i ≥ σSP . Applying the orthogonality property (2.22) to the points z̃ and z, we obtain in the same way as before zi s̃i ≤ z T s̃ ≤ z T s. Hence, dividing both sides by s̃i and using that s̃i ≥ σSP we get zi ≤ zT s . σSP From the left inequality in (3.21) we also have zi si ≥ τ1 . Hence we must have si ≥ τ1 σSP . zT s The right inequality in (3.21) gives z T s ≤ nτ2 . Thus si ≥ τ1 σSP σSP = . nτ2 nτ This proves the second and fourth inequality in the lemma. The other inequalities are obtained in the same way. If i ∈ B and z̃ is an optimal solution such that z̃i is maximal, then z̃i ≥ σSP . Applying the orthogonality property (2.22) to the points z̃ and z we obtain si z̃i ≤ sT z̃ ≤ z T s. Thus we get si ≤ zT s zT s ≤ . z̃i σSP Using once more that zi si ≥ τ1 and z T s ≤ nτ2 we obtain zi ≥ σSP τ1 σSP τ1 σSP = ≥ , zT s nτ2 nτ completing the proof of the lemma. ✷ I.3 Polynomial Algorithm 61 i∈B Table 3.2. i∈N zi ≥ σSP τn ≤ zT s σSP si (z) ≤ zT s σSP ≥ σSP τn Estimates for large and small variables if δc (z) ≤ τ . The results of the above lemma are shown in Table 3.2.. We conclude that if z T s is so small that σSP zT s < σSP τn then we have a complete separation of the small and the large variables. Thus we may state without further proof the following result. Lemma I.46 Let z be a feasible solution of (SP ) such that δc (z) ≤ τ . If z T s(z) < σSP 2 τn then the optimal partition of (SP ) follows from B = {i : zi > si (z)} and N = {i : zi < si (z)} . (3.22) This lemma is the basis of our next result. Theorem I.47 After at most  √ 4n2 2n log σSP 2  (3.23) iterations, the Full-Newton step algorithm yields a feasible (and positive) solution z of (SP ) that reveals the optimal partition (B, N ) of (SP ) according to (3.22). 2 Proof: Let us run the Full-Newton step algorithm with ε = σSP / (4n). Then Theorem T 2 I.37 states that we obtain a feasible z with z s(z) ≤ σSP / (4n) and δ(z, µ) ≤ 1/2. Lemma I.44 implies that δc (z) ≤ 4. By Lemma I.46, with τ = 4, this z gives a complete separation between the small variables and the large variables. By Theorem I.37, the required number of iterations for the given ε is at most   √ 4n2 2n log σSP 2 which is equal to the bound given in the theorem. Thus the proof is complete. ✷ 62 I Theory and Complexity Example I.48 Let us apply Theorem I.47 to the self-dual problem (2.19) in Example I.7. Then n = 5 and, according to Example I.40 (page 54), σSP = 1. 
Thus the iteration bound (3.23) in Theorem I.47 becomes m l√ 10 log (100) = ⌈14.5628⌉ = 15. With the help of Figure 3.1 (page 53) we can now determine the optimal partition and we find B = {1, 3, 5} , N = {2, 4} , ♦ in agreement with the result of Example I.12. 3.3.5 A rounding procedure for interior-point solutions We have just established that the optimal partition of (SP ) can be found after a finite number of iterations of the Full-Newton step algorithm. The required number of iterations is at most equal to the number given by (3.23). After this number of iterations the small variables and the large variables are well enough separated from each other to reveal the classes B and N that constitute the optimal partition. The aim of this section and the next section is to show that if B has been fixed then a strictly complementary solution of (SP ) can be obtained with little extra effort.15 First we establish that the class B is not empty. Lemma I.49 The class B in the optimal partition of (SP ) is not empty. Proof: If B is the empty set then z = 0 is the only optimal solution. Since, by Theorem I.20, this solution must be strictly complementary we must have s(z) > 0. Since s(z) = M z + q = q, we find q > 0. This contradicts that q has zero entries, by (3.1). This proves the lemma. ✷ Assuming that the optimal partition (B, N ) has been determined, with B nonempty, we describe a rounding procedure that can be applied to any positive vector z with positive surplus vector s(z) to yield a vector z̄ such that z̄ and its surplus vector s̄ = s(z̄) are complementary (in the sense that z̄N = s̄B = 0) but not necessarily nonnegative. In the next section we run the algorithm an additional number of iterations to get a sharper separation between the small and the large variables and we show that the rounding procedure yields a strictly complementary solution in polynomial time. Let us have a positive vector z with positive surplus vector s(z). Recall from (3.16), page 55, that        15  sB sN = MBB MBN MN B MN N  zB zN + qB qN . It is generally believed that interior-point methods for LO never generate an exact optimal solution in polynomial time (Andersen and Ye [11]). In fact, Ye [308] showed in 1992 that a strictly complementary solution can be found in polynomial time by all the known O(n3 L) interior-point methods. See also Mehrotra and Ye [208]. The rounding procedure described in this chapter is essentially the same as the one presented in these two papers and leads to finite termination of the algorithm. I.3 Polynomial Algorithm 63 This implies that sB = MBB zB + MBN zN + qB . Since qB = 0, by (3.17), ξ = zB satisfies the system of equations in the unknown vector ξ given by MBB ξ = sB − MBN zN . (3.24) Note that zB is a ‘large’ solution of (3.24), because the entries of zB are large variables. On the other hand we can easily see that (3.24) must have more solutions. This follows from the existence of a strictly complementary solution of (SP ), because for any such solution z̃ we derive from z̃N = 0 and sB (z̃) = 0 that MBB z̃B = 0. Since z̃B > 0, it follows that the columns of MBB are linearly dependent, and hence (3.24) has multiple solutions. Now let ξ be any solution of (3.24) and consider the vector z̄ defined by z̄B = zB − ξ, z̄N = 0. For the surplus vector s̄ = s(z̄) of z̄ we have s̄B = MBB z̄B + MBN z̄N = MBB z̄B = MBB (zB − ξ) = 0. So we have z̄N = s̄B = 0, which means that the vectors z̄ and s̄ are complementary. 
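In code, the rounding step just described can be sketched as follows. This is a minimal NumPy sketch, not taken from the text: the function name round_to_partition is illustrative, the index sets B and N are assumed to be given as arrays of indices, and a least-squares solver is used only to produce one particular solution of (3.24) (recall that qB = 0 by (3.17), so the right-hand side equals MBB zB).

import numpy as np

def round_to_partition(M, q, z, B, N):
    # Rounding procedure of Section 3.3.5 (illustrative sketch).
    # Given a positive iterate z with surplus s(z) = M z + q and index sets
    # B, N of the optimal partition, solve system (3.24),
    #     M_BB xi = s_B - M_BN z_N,
    # and return zbar with zbar_B = z_B - xi, zbar_N = 0, plus its surplus.
    # By construction sbar_B = 0, so zbar and sbar are complementary;
    # their nonnegativity still has to be checked separately.
    s = M @ z + q
    M_BB = M[np.ix_(B, B)]
    M_BN = M[np.ix_(B, N)]
    rhs = s[B] - M_BN @ z[N]
    # Any solution of (3.24) will do; lstsq returns the minimum-norm
    # solution of this consistent system.
    xi, *_ = np.linalg.lstsq(M_BB, rhs, rcond=None)
    zbar = np.zeros_like(z)
    zbar[B] = z[B] - xi
    sbar = M @ zbar + q
    return zbar, sbar

# Usage sketch:
#   zbar, sbar = round_to_partition(M, q, z, B, N)
#   optimal = np.all(zbar >= 0) and np.all(sbar >= 0)
#   strictly_complementary = optimal and np.all(zbar + sbar > 0)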
It will be clear, however, that the vectors z̄ and s̄ are not necessarily nonnegative. This holds only if z̄B = zB − ξ ≥ 0, and s̄N = MN B z̄B + MN N z̄N + qN = MN B (zB − ξ) + qN = sN − MN N zN − MN B ξ ≥ 0. We conclude that if (3.24) admits a solution ξ that satisfies the last two inequalities then z̄ is a solution of (SP ). Moreover, if ξ satisfies these inequalities strictly, so that zB − ξ > 0, sN − MN N zN − MN B ξ > 0, (3.25) then z̄ is a strictly complementary solution of (SP ). In the next section we show that solving (3.24) by Gaussian elimination gives such a solution, provided the separation between the small and the large variables is sharp enough. Example I.50 In this example we show that the Full-Newton step algorithm equipped with the above described rounding procedure solves the sample problem (2.19) in Example I.7 in one iteration. Recall from Example I.14 that the Newton step in the first iteration√is given by (2.39) and (2.40). Since in this iteration µ = 1 − θ, substituting θ = 1/ 10, we find 1 ∆z = − √ 10  T 1 8 4 1 T − , , , ,1 = − (−0.1054, 0.2811, 0.1405, 0.0351, 0.3162) , 3 9 9 9 and 1 ∆s = − √ 10  4 1 5 8 , , , ,0 3 9 9 9 T T = − (0.4216, 0.0351, 0.1757, 0.2811, 0.0000) . 64 I Theory and Complexity Hence, after one iteration the new iterate is given by z = (1.1054, 0.7189, 0.8595, 0.9649, 0.6838)T , and T s = (0.5784, 0.9649, 0.8243, 0.7189, 1.0000) . Let us compute the sets B and N , as defined by (3.22). This gives B = {1, 3, 4} , N = {2, 5} . It is worth mentioning that these are already the classes of the optimal partition of the problem. This becomes clear by applying the rounding procedure at z with respect to the partition (B, N ). The matrix MBB is given by   0 1 −1   MBB =  −1 0 2 . 1 −2 0 We have     −0.1054 1.1054 0 1 −1      MBB zB =  −1 0 2   0.8595  =  0.8243  . −0.6135 0.9649 1 −2 0  So we need to find a ‘small’ solution ζ of the system     −0.1054 0 1 −1     MBB zB =  −1 0 2  ζ =  0.8243  . −0.6135 1 −2 0 A solution of this system is  0.0000   ζ =  0.3067  . 0.4122  The rounded solution is now defined by       1.1054 0.0000 1.1054       z̄B = zB − ζ =  0.8595  −  0.3067  =  0.5527  , 0.5527 0.4122 0.9649 z̄N = 0. Hence the rounded solution is z = (1.1054, 0.0000, 0.5527, 0.5527, 0.0000)T . The corresponding slack vector is s(z) = M z + q = (0.0000, 0.5527, 0.0000, 0.0000, 2.2365)T . Since z and s(z) are nonnegative and complementary, z is optimal. Moreover, z+s(z) > 0, so z is a strictly complementary solution. Hence we have solved the sample problem in one iteration. ♦ I.3 Polynomial Algorithm 65 Remark I.51 In the above example we used for ξ the least norm solution of (3.24). This is the solution of the minimization problem min {kξk : MBB ξ = MBB zB } . Formally the least norm solution can be described as + ξ = MBB MBB zB , + where MBB denotes the generalized inverse (cf. Appendix B) of MBB . We may then write  + zB − ξ = IBB − MBB MBB zB , where IBB is the identity matrix of appropriate size. There are different ways to obtain a suitable vector ξ. Note that our aim is to obtain a −1 ξ such that zB − ξ is positive. This is equivalent to eB − zB ξ > 0, which certainly holds if −1 zB ξ < 1. An alternative approach might therefore be to use the solution of min which gives  −1 zB ξ : MBB ξ = MBB zB , ξ = ZB (MBB ZB )+ MBB zB , • as easily may be verified. 
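The two choices of ξ discussed in Remark I.51 can be written down directly with the Moore–Penrose pseudoinverse. The following small NumPy sketch is only illustrative (the function names are not from the text); the data are those of Example I.50, but since (3.24) has infinitely many solutions the computed ξ need not coincide with the particular ζ displayed there.

import numpy as np

def xi_least_norm(M_BB, z_B):
    # Least norm solution of M_BB xi = M_BB z_B:  xi = M_BB^+ (M_BB z_B).
    return np.linalg.pinv(M_BB) @ (M_BB @ z_B)

def xi_scaled(M_BB, z_B):
    # Alternative of Remark I.51:  xi = Z_B (M_BB Z_B)^+ (M_BB z_B),
    # which instead keeps the relative perturbation Z_B^{-1} xi small.
    Z_B = np.diag(z_B)
    return Z_B @ (np.linalg.pinv(M_BB @ Z_B) @ (M_BB @ z_B))

if __name__ == "__main__":
    # M_BB and z_B as in Example I.50 (B = {1, 3, 4}).
    M_BB = np.array([[ 0.0,  1.0, -1.0],
                     [-1.0,  0.0,  2.0],
                     [ 1.0, -2.0,  0.0]])
    z_B = np.array([1.1054, 0.8595, 0.9649])
    for xi in (xi_least_norm(M_BB, z_B), xi_scaled(M_BB, z_B)):
        assert np.allclose(M_BB @ xi, M_BB @ z_B)   # xi solves (3.24)
        assert np.all(z_B - xi > 0)                 # rounded z_B stays positive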
Of course, we were lucky in the above example in two ways: the first iterate already determined the optimal partition and, moreover, at this iterate the rounding procedure yielded a strictly complementary solution. In general more iterations will be necessary to find the optimal partition and once the optimal partition has been found the rounding procedure may not yield a strictly complementary solution at once. But, as we see in the next section, after sufficiently many iterations we can always find an exact solution of any problem in this way, and the required number of iterations can be bounded by a (linear) polynomial of the size of the problem. 3.3.6 Finding a strictly complementary solution In this section we assume that the optimal partition (B, N ) of (SP ) is known. In the previous section we argued that it may be assumed without loss of generality that the set B is not empty. In this section we show that when we run the algorithm an additional number of iterations, the rounding procedure of the previous section can be used to construct a strictly complementary solution of (SP ). The additional number of iterations depends on the size of B and is aimed at creating a sufficiently large distance between the small and the large variables. We need some more notation. First, ω will denote the infinity norm of M : ω := kM k∞ = max 1≤i≤n n X j=1 |Mij | . Second, B ∗ denotes the subset of B for which the columns in MBB are nonzero, and third, the number πB is defined by    if B ∗ = ∅,  1 Y πB :=  k(MBB )j k otherwise.   j∈B ∗ 66 I Theory and Complexity Lemma I.52 Let z be a feasible solution of (SP ) such that δc (z) ≤ τ = 4. If z T s(z) ≤ 2 σSP 4n(1 + ω)2 πB p , |B| with ω and πB as defined above, then a strictly complementary solution can be found in 3 O(|B ∗ | ) arithmetical operations. Proof: Suppose that z is positive solution of (SP ) with positive surplus vector s = s(z) such that δc (z) ≤ 4 and z T s ≤ ε, where ε := 2 σSP 4n(1 + ω)2 πB p . |B| (3.26) Recall that the entities |B|, ω and πB are all at least 1 and also, by Lemma I.45, that the small variables in z and s are less than ε/σSP and the large variables are at least σSP /(4n). We now show that the system (3.24) has a solution ξ whose coordinates are small enough, so that zB − ξ > 0, sN − MN N zN − MN B ξ > 0. (3.27) We need to distinguish between the cases where MBB is zero and nonzero respectively. We first consider the case where MBB = 0. Then ξ = 0 satisfies (3.24) and for this ξ the condition (3.27) for the rounded solution z̄ to be strictly complementary reduces to the single inequality sN − MN N zN > 0. (3.28) This inequality is satisfied if MN N = 0. Otherwise, if MN N 6= 0, since zN is small we may write εω ε = . kMN N zN k∞ ≤ kMN N k∞ kzN k∞ ≤ kM k∞ σSP σSP Hence, since sN is large, (3.28) certainly holds if εω σSP < , σSP 4n which is equivalent to 2 σSP . 4nω Since this inequality is implied by the hypothesis of the lemma, we conclude that the rounding procedure yields a strictly complementary solution if MBB = 0. Now consider the case where MBB 6= 0. Then we solve (3.24) by Gaussian elimination. This goes as follows. Let B1 and B2 be two subsets of B such that MB1 B2 is a nonsingular square submatrix of MBB with maximal rank, and let ζ be the unique solution of the equation MB1 B2 ζ = sB1 − MB1 N zN . 
ε< I.3 Polynomial Algorithm 67 From Cramer’s rule we know that the i-th entry of ζ, with i ∈ B2 , is given by (i) det MB1 B2 , ζi = det MB1 B2 (i) where MB1 B2 is the matrix arising by replacing the i-th column in MB1 B2 by the vector sB1 − MB1 N zN . Since the entries of MB1 B2 are integral and this matrix is nonsingular, the absolute value of its determinant is at least 1. As a consequence we have (i) |ζi | ≤ det MB1 B2 . The right-hand side is no larger than the product of the norms of the columns in the (i) matrix MB1 B2 , due to the inequality of Hadamard (cf. Section 1.7.3). Thus |ζi | ≤ ksB − MBN zN k Y j∈B2 \{i} k(MB1 B2 )j k ≤ ksB − MBN zN k πB . (3.29) The last inequality follows because the norm of each nonzero column in MBB is at least 1, and πB is the product of these norms. Since sB and zN are small variables we have ksB k∞ ≤ ε σSP and kMBN zN k∞ ≤ kMBN k∞ kzN k∞ ≤ kM k∞ kzN k∞ ≤ εω . σSP Therefore ksB − MBN zN k ≤ p p ε (1 + ω) . |B| ksB − MBN zN k∞ ≤ |B| σSP Substituting this inequality in (3.29), we obtain ε(1 + ω)πB |ζi | ≤ σSP p |B| . Defining ξ by ξB2 = ζ, ξi = 0, i ∈ B \ B2 , the vector ξ satisfies (3.24), because MB1 B2 is a nonsingular square submatrix of MBB with maximal rank and because sB −MBN zN (= MBB zB ) belongs to the column space of MBB . Hence we have shown that Gaussian elimination yields a solution ξ of (3.24) such that p ε(1 + ω)πB |B| kξk∞ ≤ . (3.30) σSP Applying the rounding procedure of the previous section to z, using ξ, we obtain the vector z̄ defined by z̄B = zB − ξ, z̄N = 0, 68 I Theory and Complexity and the surplus vector s̄ = s(z̄) satisfies s̄B = 0. So z̄ is complementary. We proceed by showing that z̄ is a strictly complementary solution of (SP ) by proving that ξ satisfies the condition (3.25), namely z̄B = zB − ξ > 0, s̄N = sN − MN N zN − MN B ξ > 0. We first establish that z̄B is positive. This is now easy. The coordinates of zB are large and the nonzero coordinates of ξ are bounded above by the right-hand side in (3.30). Therefore, z̄B will be positive if p ε(1 + ω)πB |B| σSP < , σSP 4n or, equivalently, ε< 2 σSP p , 4n(1 + ω)πB |B| and this is guaranteed by the hypothesis in the lemma. We proceed by estimating the coordinates of s̄N . First we write    zN kMN N zN + MN B ξk∞ = k(MN N MN B )k∞ ≤ kM k ∞ ξ ∞ zN ξ  . ∞ Using (3.30) and the fact that zN is small we obtain kMN N zN + MN B ξk∞ ≤ ω max ε(1 + ω)πB , σSP σSP ε p ! |B| εω(1 + ω)πB = σSP p |B| . Here we used again that πB ≥ 1 and |B| ≥ 1. Hence, since the coordinates of sN are large, the coordinates of s̄N will be positive if p εω(1 + ω)πB |B| σSP < , σSP 4n or, equivalently, if ε< 2 σSP p , 4nω(1 + ω)πB |B| and this follows from the hypothesis in the lemma. Thus we have shown that the condition for z̄ being strictly complementary is satisfied. Finally, the calculation of ζ can be performed by Gaussian elimination and 3 this requires O(|B ∗ | ) arithmetic operations. Thus the proof is complete. ✷ The next theorem now easily follows from the last lemma. Theorem I.53 Using the notation introduced above, the Full-Newton step algorithm yields a feasible solution z for which the rounding procedure yields a strictly complementary solution of (SP ), after at most & p ' √ 4n2 (1 + ω)2 πB |B| 2n log 2 σSP iterations. I.3 Polynomial Algorithm 69 Proof: By Lemma I.52 the rounding procedure yields a strictly complementary solution if we run the Full-Newton step algorithm with ε= 2 σSP 4n(1 + ω)2 πB p . 
|B| By Theorem I.37 for this value of ε the Full-Newton step algorithm requires at most & p ' √ 4n2 (1 + ω)2 πB |B| 2n log 2 σSP iterations. This proves the theorem. ✷ Remark I.54 The result in Theorem I.53 can be used to estimate the number of arithmetic operations required by the algorithm in a worst-case situation. This number can be bounded by a polynomial of the size L of (SP ) (cf. Remark I.32), as we show. We thus establish that the method proposed in this chapter solves the self-dual model in polynomial time. As a consequence, by the results of the previous chapter, it also solves the canonical LO problem in polynomial time. The iteration bound in the theorem is worst if B contains all indices. Ignoring the integrality operator, and denoting the number of iterations by K, the iteration bound becomes K≤ √ √ 4n2 n (1 + ω)2 πM 2n log , 2 σSP where πM := n Y j=1 By Theorem I.42 we have kMj k . 1 1 . = π kM k M j j=1 σSP ≥ Qn Substituting this we get the upper bound K≤ √   5 3 , 2n log 4n 2 (1 + ω)2 πM (3.31) for the number of iterations. A rather pessimistic estimate yields 2 πM = n n Y X j=1 2 Mij i=1 ! ≤ nn Π2 . This follows by expanding the product in the middle, which gives nn terms, each of which is bounded above by Π2 , where Π is defined in Remark I.32 as the product of all nonzero entries in q and M . We also have the obvious (and very pessimistic) inequality ω ≤ Π, which implies 1 + ω ≤ 2Π. Substituting these pessimistic estimates in (3.31) we obtain K≤ √  5 2n log 4n 2 (2Π)2 nn Π2  32  = √  2n log 16n 3n+5 2  Π5 . 70 I Theory and Complexity This can be further reduced. One has  log 16 n 3n+5 2 Π5  = < = < 3n + 5 log n + 5 log Π 2 1 7 3 + (3n + 5) (n − 1) + log2 Π 2 2  1 3n2 + 2n + 1 + 7 log2 Π 2 7 (n(n + 1) + log 2 Π) . 2 log 16 + The first inequality is due to log 16 = 2.7726 < 3, log n ≤ n − 1 and log Π = 0.6931 log2 Π, and the second inequality holds because 7n(n + 1) > 3n2 + 2n + 1 for all n. Finally, using the definition (3.5) of the size L(= n(n + 1) + log2 Π)), we obtain K< Thus the claim has been proved. 3.4 √ 7√ 2n L < 5 n L. 2 • Concluding remarks The analysis in this chapter is based on properties of the central path of (SP ). To be more specific, on the property that when one moves along the central path to the optimal set, the separation between the large and small variables becomes apparent. We showed that the Full-Newton step algorithm together with a simple rounding procedure yields a √ polynomial algorithm for solving a canonical LP problem; the iteration bound is 5 n L, where L is the binary input size of the problem. In the literature many other polynomial-time interior-point algorithms have been presented. We will encounter many of these algorithms in the rest of the book. Almost all of these algorithms are based on a Newton-type search direction. At this stage we want to mention an interesting exception, which is based on an idea of Dikin and that also can be used to solve in polynomial time the self-dual problem that we considered in this and the previous chapter. In fact, an earlier version of this book used the Dikin Step Algorithm in this part of the book. The iteration bound that we could obtain for this algorithm was 7nL. Because it leads to a better iteration bound, in this edition we preferred to use the Full-Newton step algorithm. 
But because the Dikin Step Algorithm is interesting in itself, and also because further on in the book we will deal with Dikins method, we decided to keep a full description and analysis of the Dikin Step Algorithm in the book. It can be found in Appendix F.16 16 The Dikin Step Algorithm was investigated first by Jansen et al. [156]; the analysis of the algorithm used in this chapter is based on a paper of Ling [182]. By including √ higher-order components in the search direction, the complexity can be improved by a factor n, thus yielding a bound of the same order as for the Full-Newton step algorithm. This has been shown by Jansen et al. [160]. See also Chapter 18. 4 Solving the Canonical Problem 4.1 Introduction In Chapter 2 we discussed the fact that every LO problem has a canonical description of the form  (P ) min cT x : Ax ≥ b, x ≥ 0 . The matrix A is of size m × n and the vectors c and x are in IRn and b in IRm . In this chapter we further discuss how this problem, and its dual problem  (D) max bT y : AT y ≤ c, y ≥ 0 , can be solved by using the algorithm of the previous chapter for solving a self-dual embedding of both problems. With     y 0 A −b     (4.1) M̄ :=  −AT 0 c  , z̄ :=  x  , T T κ b −c 0 as in (2.7), the embedding problem is given by (2.15). It is the self-dual homogeneous problem  (SP0 ) min 0T z̄ : M̄ z̄ ≥ 0, z̄ ≥ 0 (4.2) In Chapter 3 we showed that a strictly complementary solution z̄ of (SP0 ) can be found in polynomial time. If a strictly complementary solution z̄ has κ > 0 then x̄ = x/κ is an optimal solution of (P ), and if κ = 0 then (P ) (and also its dual (D)) must be either unbounded or infeasible. This was shown in Section 2.8, where we also found that any strictly complementary solution of (SP0 ) with κ > 0 provides a strictly complementary pair of solutions (x̄, ȳ) for (P ) and (D). Thus x̄ is primal feasible and ȳ dual feasible. The complementarity means that  x̄ c − AT ȳ = 0, ȳ (Ax̄ − b) = 0, and the strictness of the complementarity that  x̄ + c − AT ȳ > 0, ȳ + (Ax̄ − b) > 0. Obviously these results imply that every LO problem can be solved exactly in polynomial time. The aim of this chapter is to make a more thorough investigation of 72 I Theory and Complexity the consequences of the results in Chapter 2 and Chapter 3. We restrict ourselves to the canonical model. The algorithm for the self-dual model, presented in Section 3.2, requires knowledge of a positive z̄ such that the surplus vector s(z̄) = M̄ z̄ of z̄ is positive. However, such z̄ does not exist, as we argued in Section 2.4. But then, as we showed in the same section, we can embed (SP0 ) in a slightly larger self-dual problem, named (SP ) and given by (cf. (2.16)), (SP ) min  q T z : M z ≥ −q, z ≥ 0 . (4.3) for which the constraint matrix has one extra row and one extra column, so that any strictly complementary solution of (SP ) induces a strictly complementary solution of (SP0 ). Hence, applying the algorithm to the larger problem (SP ) yields a strictly complementary solution of (SP0 ), hence also for (P ) and (D) if these problems are solvable. It should be noted that both the description of the Full-Newton Step algorithm (page 50) and its analysis apply to any problem of the form (4.3) that satisfies the IPC, provided that the matrix M is skew-symmetric and q ≥ 0. In other words, we did not exploit the special structure of the matrix M , as given by (2.11), neither did we use the special structure of the vector q, as given by (2.12). 
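As an illustration, the block structure (4.1) of the homogeneous embedding is easy to assemble from the canonical data. The NumPy sketch below (function names are illustrative) only builds M̄ and extracts a solution pair of (P) and (D) from a strictly complementary solution of (SP0) with κ > 0, exactly as described above; it is not an implementation of the full algorithm.

import numpy as np

def homogeneous_embedding(A, b, c):
    # The skew-symmetric matrix Mbar of (4.1); (SP0) then reads
    #     min { 0^T zbar : Mbar zbar >= 0, zbar >= 0 },  zbar = (y, x, kappa).
    m, n = A.shape
    return np.block([
        [np.zeros((m, m)),  A,                -b.reshape(m, 1)],
        [-A.T,              np.zeros((n, n)),  c.reshape(n, 1)],
        [ b.reshape(1, m), -c.reshape(1, n),   np.zeros((1, 1))],
    ])

def recover_pair(zbar, m, n):
    # From a strictly complementary solution zbar = (y, x, kappa) of (SP0):
    # if kappa > 0, return optimal solutions (x/kappa, y/kappa) of (P) and (D);
    # if kappa = 0, (P) and/or (D) is infeasible or unbounded.
    y, x, kappa = zbar[:m], zbar[m:m + n], zbar[m + n]
    if kappa > 0:
        return x / kappa, y / kappa
    return None

if __name__ == "__main__":
    # A small instance, used here only to check the block structure.
    A = np.array([[1.0, -1.0, 0.0], [0.0, 0.0, 1.0]])
    b = np.array([0.0, 1.0])
    c = np.array([1.0, 1.0, 1.0])
    Mbar = homogeneous_embedding(A, b, c)
    assert np.allclose(Mbar, -Mbar.T)   # skew-symmetry, as required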
Also note that if the embedding problem is ill-conditioned, in the sense that the condition number σSP is small, we are forced to run the Full-Newton step algorithm with a (very) small value of the accuracy parameter. In practice, due to limitations of machine precision, it may happen that we cannot reach the state at which an exact solution of (SP ) can be found. In that case the question becomes important of what conclusions can be drawn for the canonical problem (P ) and its dual problem (D) when an ε-solution for the embedding self-dual problem is available. The aim of this chapter is twofold. We want to present two other embeddings of (SP0 ) that satisfy the IPC. Recall that the embedding in Chapter 2 did not require any foreknowledge about the problems (P ) and (D). We present another embedding that can also be used for that case. A crucial question that we want to investigate is if we can then decide whether the given problems have optimal solutions or not without using the rounding procedure. Obviously, this amounts to deciding whether we have κ > 0 in the limit or not. This will be the subject in Section 4.3. Our first aim, however, is to consider an embedding that applies if both (P ) and (D) have a strictly feasible solution and such solutions are know in advance. This case is relatively easy, because we then know for sure that κ > 0 in the limit. 4.2 The case where strictly feasible solutions are known We start with the easiest case, namely when strictly feasible solutions of (P ) and (D) are given. Suppose that x0 ∈ IRn and y 0 ∈ IRm are strictly feasible solutions of (P ) and (D) respectively: x0 > 0, s(x0 ) = Ax0 − b > 0 and y 0 > 0, s(y 0 ) = c − AT y 0 > 0. I.4 Solving the Canonical Problem 4.2.1 73 Adapted self-dual embedding Let    M :=   0 A −b T −A 0 c bT −cT 0 0 0 −1 0 0 1 0 and consider the self-dual problem (SP1 ) min     ,    z :=   y x κ ϑ     ,    q :=   0 0 0 2    ,   T q z : M z + q ≥ 0, z ≥ 0 . Note that q ≥ 0. We proceed by showing that this problem has a positive solution with positive surplus vector. Let ϑ0 := 1 + cT x0 − bT y 0 . The weak duality property implies that cT x0 − bT y 0 ≥ 0. If cT x0 − bT y 0 = 0 then x0 and y 0 are optimal and we are done. Otherwise we have ϑ0 > 1. We can easily check that for  0  y  x0    z 0 :=    1  ϑ0 we have    s(z 0 ) := M z 0 + q =   Ax0 − b c − AT y 0 T 0 b y − cT x0 + ϑ0 −1 + 2       =   s(x0 ) s(y 0 ) 1 1    ,  so both z 0 and its surplus vector are positive.1 Now let z̄ be a strictly complementary solution of (SP1 ). Then we have, for suitable vectors ȳ and x̄ and scalars κ̄ and ϑ̄,     Ax̄ − κ̄b ȳ  κ̄c − AT ȳ   x̄      z̄ :=   ≥ 0, s(z̄) =  T  ≥ 0, z̄s(z̄) = 0, z̄ + s(z̄) > 0. T  b ȳ − c x̄ + ϑ̄   κ̄  ϑ̄ 2 − κ̄ Since the optimal objective value is zero, we have ϑ̄ = 0. On the other hand, we cannot have κ̄ = 0, because this would imply the contradiction that either (P ) or (D) is infeasible. Hence we conclude that κ̄ > 0. This has the consequence that x̃ = x̄/κ̄ is feasible for (P ) and ỹ = ȳ/κ̄ is feasible for (D), as follows from the feasibility of z̄. The complementarity of z̄ and s(z̄) now yields that s (κ̄) := bT ȳ − cT x̄ = 0. 1 Exercise 27 If it happens that we have a primal feasible x0 and a dual feasible y 0 such that x0 s(y 0 ) = µen and y 0 s(x0 ) = µem for some positive µ, find an embedding satisfying the IPC such that z 0 is on its central path. 
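The embedding (SP1) and its strictly feasible starting point are equally easy to write down once x0 and y0 are at hand. A minimal NumPy sketch (the function name is illustrative), with the block rows ordered (y, x, κ, ϑ) as above:

import numpy as np

def embedding_SP1(A, b, c, x0, y0):
    # Build M and q of (SP1) and the point z0 = (y0, x0, 1, theta0), where
    # theta0 = 1 + c^T x0 - b^T y0; its surplus is (Ax0 - b, c - A^T y0, 1, 1).
    m, n = A.shape
    theta0 = 1.0 + c @ x0 - b @ y0
    M = np.block([
        [np.zeros((m, m)),  A,                -b.reshape(m, 1),  np.zeros((m, 1))],
        [-A.T,              np.zeros((n, n)),  c.reshape(n, 1),  np.zeros((n, 1))],
        [ b.reshape(1, m), -c.reshape(1, n),   np.zeros((1, 1)),  np.ones((1, 1))],
        [np.zeros((1, m)),  np.zeros((1, n)), -np.ones((1, 1)),  np.zeros((1, 1))],
    ])
    q = np.concatenate([np.zeros(m + n + 1), [2.0]])
    z0 = np.concatenate([y0, x0, [1.0], [theta0]])
    return M, q, z0

# For strictly feasible x0 > 0, y0 > 0 (with Ax0 - b > 0 and c - A^T y0 > 0),
# both z0 and its surplus are positive, so z0 is a valid starting point for
# the Full-Newton step algorithm:
#     M, q, z0 = embedding_SP1(A, b, c, x0, y0)
#     assert np.all(z0 > 0) and np.all(M @ z0 + q > 0)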
74 I Theory and Complexity Thus it follows that x̄/κ̄ is optimal for (P ) and ȳ/κ̄ is optimal for (D). Finally, the strict complementarity of z̄ and s(z̄) gives the strict complementarity of this solution pair. 4.2.2 Central paths of (P ) and (D) At this stage we want to point out an interesting and important consequence of the existence of strictly feasible solutions of (P ) and (D). In that case we can define central paths for the problems (P ) and (D). This goes as follows. Let µ be an arbitrary positive number. Then the µ-center of (SP1 ) is determined as the unique solution of the system (cf. (2.46), page 35) z ≥ 0, s ≥ 0 Mz + q = s, zs = µ em+n+2 . (4.4) In other words, there exist unique nonnegative x, y, κ, ϑ such that Ax − κb ≥ 0, κc − AT y ≥ 0, bT y − cT x + ϑ ≥ 0, 2−κ≥0 and, moreover y (Ax − κb)  x κc − AT y  κ b T y − cT x + ϑ ϑ (2 − κ) = µem = µen = µ = µ. (4.5) An immediate consequence is that all the nonnegative entities mentioned above are positive. Surprisingly enough, we can compute the value of κ from (4.4). Taking the inner product of both sides in the first equation with z, while using the orthogonality property, we get q T z = z T s. The second equation in (4.4) gives z T s = (n + m + 2)µ. Due to the definition of q we obtain2 2ϑ = (n + m + 2)µ. (4.6) T In fact, this relation expresses that the objective value q z = 2ϑ along the central path equals the dimension of the matrix M times µ, already established in Section 2.7. Substitution of the last equation in (4.5) into (4.6) yields 2ϑ = (n + m + 2)ϑ (2 − κ) . Since ϑ > 0, after dividing by ϑ it easily follows that κ= 2(n + m + 1) . n+m+2 (4.7) Substitution of the values of κ and ϑ in the third equation gives 2 cT x − b T y ϑκ − µ (n + m) µ (n + m) (n + m + 2) ϑ µ µ. = = = − 2 = 2 κ κ κ κ2 κ2 4 (n + m + 1) 2 The relation can also be obtained by adding all the equations in (4.5). I.4 Solving the Canonical Problem Now, defining x̄ = x , κ ȳ = 75 y , κ ϑ̄ = ϑ , κ µ̄ = µ , κ2 and using the notation s(x̄) s(ȳ) Ax̄ − b c − AT ȳ, := := we obtain that the positive vectors x̄ and ȳ are feasible for (P ) and (D) respectively with s(x̄) and s(ȳ) positive, and moreover, ȳ s(x̄) = µ̄ em x̄ s(ȳ) = µ̄ en . (4.8) If µ runs through the interval (0, ∞) then µ̄ runs through the same interval, since κ is constant. We conclude that for every positive µ̄ there exist positive vectors x̄ and ȳ that are feasible for (P ) and (D) respectively and are such that x̄, ȳ and their associated surplus vectors s(x̄) and s(ȳ) satisfy (4.8). Our next aim is to show that the system (4.8) cannot have more than one solution with x̄ and ȳ feasible for (P ) and (D). Suppose that x̄ and ȳ are feasible for (P ) and (D) and satisfy (4.8). Then it is quite easy to derive a solution for (4.5) as follows. First we calculate κ from (4.7). Then taking µ = κ2 µ̄, we can find ϑ from (4.6). Finally, the values x = κx̄ and y = κȳ satisfy (4.5). Since the solution of (4.5) is unique, it follows that the solution of (4.8) is unique as well. Thus we have shown that for each positive µ̄ the system (4.8) has a unique solution with x̄ and ȳ feasible for (P ) and (D). Denoting the solution of (4.8) by x̄(µ̄) and ȳ(µ̄), we obtain the central paths of (P ) and (D) by letting µ̄ run through all positive values. Summarizing the above results, we have proved the following. Theorem I.55 Let (x(µ), y(µ), κ(µ), ϑ(µ)) denote the point on the central path of (SP1 ) corresponding to the barrier parameter value µ. Then we have κ(µ) = κ with κ= 2(n + m + 1) . 
n+m+2 If µ̄ = µ/κ2 , then x̄(µ̄) = x(µ)/κ and ȳ(µ̄) = y(µ)/κ are the points on the central paths of (P ) and (D) corresponding to the barrier parameter µ̄. As a consequence we have cT x̄ − bT ȳ = x̄T s(ȳ) + ȳ T s(x̄) = (n + m)µ̄. 4.2.3 Approximate solutions of (P ) and (D) Our aim is to solve the given problem (P ) by solving the embedding problem (SP1 ). The Full-Newton step algorithm yields an ε-solution, i.e. a feasible solution z of (SP1 ) such that q T z ≤ ε, where ε is some positive number. Therefore, it is of great importance to see how we can derive approximate solutions for (P ) and (D) from any such solution of (SP1 ). In this respect the following lemma is of interest. 76 I Theory and Complexity Lemma I.56 Let z = (y, x, κ, ϑ) be a positive solution of (SP1 ). If x̃ = x , κ ỹ = y , κ then x̃ is feasible for (P ), ỹ is feasible for (D), and the duality gap at the pair (x̃, ỹ) satisfies ϑ cT x̃ − bT ỹ ≤ . κ Proof: Since z is feasible for (SP1 ), we have Ax − κb −AT y + κc b T y − cT x + ϑ −κ + 2 ≥ ≥ ≥ ≥ 0 0 0 0. With x̃ and ỹ as defined in the theorem it follows that Ax̃ ≥ b, AT ỹ ≤ c and cT x̃ − bT ỹ ≤ ϑ , κ thus proving the lemma. ✷ The above lemma makes clear that it is important for our goal to have a solution z = (y, x, κ, ϑ) of (SP1 ) for which the quotient ϑ/κ is small. From (4.7) in Section 4.2.2 we know that along the central path the variable κ is constant and given by κ= 2(n + m + 1) . n+m+2 Hence, along the central path we have the following inequality: cT x̃ − bT ỹ ≤ (n + m + 2) ϑ . 2(n + m + 1) For large-scale problems, where n + m is large, this means that the duality gap at the feasible pair (x̃, ỹ) is about ϑ/2. Unfortunately our algorithm for solving (SP1 ) generates a feasible solution z that is not necessarily on the central path. Hence the above estimate for the duality gap at (x̃, ỹ) is no longer valid. However, we show now that the estimate is ‘almost’ valid because the solution z generated by the algorithm is close to the central path. To be more precise, according to Lemma I.44 z satisfies δc (z) ≤ τ , where τ = 4, and where the proximity measure δc (z) is defined by δc (z) = max (zs(z)) . min (zs(z)) Recall that δc (z) = 1 if and only if zs(z) is a multiple of the all-one vector e. This occurs precisely if z lies on the central path. Otherwise we have δc (z) > 1. Now we can prove the following generalization of Lemma I.56. I.4 Solving the Canonical Problem 77 Lemma I.57 Let τ ≥ 1 and let z = (y, x, κ, ϑ) be a feasible solution of (SP1 ) such that δc (z) ≤ τ . If y x x̃ = , ỹ = , κ κ then x̃ is feasible for (P ), ỹ is feasible for (D), and the duality gap at the pair (x̃, ỹ) satisfies n+m+2 ϑ. cT x̃ − bT ỹ < 2(n + m + 2 − τ ) Proof: Recall from (2.23) that q T z = z T s(z). Since q T z = 2ϑ, the average value of the products zi si (z) is equal to 2ϑ . n+m+2 From δc (z) ≤ τ we deduce the following bounds:3,4 2ϑ 2τ ϑ ≤ zi si (z) ≤ , τ (n + m + 2) n+m+2 1 ≤ i ≤ m + n + 2. (4.9) The lemma is obtained by applying these inequalities to the last two coordinates of z, which are κ and ϑ. Application of (4.9) to zi = ϑ yields the inequalities 2τ ϑ 2ϑ ≤ ϑ (2 − κ) ≤ . τ (n + m + 2) n+m+2 After division by ϑ and some elementary reductions, this gives the following bounds on κ: 2(n + m + 2 − τ ) 2 (τ (n + m + 2) − 1) ≤κ≤ . (4.10) n+m+2 τ (n + m + 2) Application of the left-hand side inequality in (4.9) to zi = κ leads to  κ b T y − cT x + ϑ ≥ 2ϑ . τ (n + m + 2) Using the upper bound for κ in (4.10) we obtain b T y − cT x + ϑ ≥ 2ϑ τ (n + m + 2) ϑ = . 
τ (n + m + 2) 2 (τ (n + m + 2) − 1) τ (n + m + 2) − 1 Hence, τ (n + m + 2) − 2 ϑ = ϑ < ϑ. τ (n + m + 2) − 1 τ (n + m + 2) − 1 Finally, dividing both sides of this inequality by κ, and using the lower bound for κ in (4.10), we obtain cT x − b T y ≤ ϑ − cT x̃ − bT ỹ = 3 4 n+m+2 cT x − b T y < ϑ. κ 2(n + m + 2 − τ ) These bounds are sufficient for our purpose. Sharper bounds could be obtained from the next exercise. T Exercise 28 Let x ∈ IRn + and τ ≥ 1. Prove that if e x = nσ and τ min(x) ≥ max(x) then nσ τ nσ σ ≤ ≤ xi ≤ ≤ τ σ, τ 1 + (n − 1)τ n+τ −1 1 ≤ i ≤ n. 78 I Theory and Complexity This proves the lemma.5 ✷ For large-scale problems the above lemma implies that the duality gap at the feasible pair (x̃, ỹ) is about ϑ/2, provided that τ is small compared with n + m. 4.3 4.3.1 The general case Introduction This time we assume that there is no foreknowledge about (P ) and (D). It may well be that one of the problems is infeasible, or both. This raises the question of whether the given problems have any solution at all. This question must be answered by the solution method. In fact, the method that we presented in Chapter 3 perfectly answers the question. In the next section, we present an alternative self-dual embedding. The new embedding problem can be solved in exactly the same way as the embedding problem (SP ) in Chapter 3, and by using the rounding procedure described there, we can find a strictly complementary solution. Then the answer to the above question is given by the value of the homogenizing variable κ. If this variable is positive then both (P ) and (D) have optimal solutions; if it is zero then at least one of the two problems is infeasible. Our aim is to develop some tools that may be helpful in deciding if κ is positive or not without using the rounding procedure. 4.3.2 Alternative embedding for the general case Let x0 and y 0 be arbitrary positive vectors of dimension n and m respectively. Defining positive vectors s0 and t0 by the relations x0 s0 = en , y 0 t0 = e m , we consider the self-dual problem (SP2 ) min  T q z : M z + q ≥ 0, z ≥ 0 , where M and q are given by   0mm A −b b̄  −AT 0 c c̄  nn   M :=  T , T  b −c 0 β  T −b̄ −c̄T −β 0 with b̄ 5 =    q :=   0m 0n 0 n+m+2    ,  t0 + b − Ax0 Exercise 29 Using the sharper bounds for zi si (z) obtainable from Exercise 28, and using the notation of Lemma I.57, derive the following bound for the duality gap: cT x̃ − bT ỹ ≤ (n + m + 1 + τ ) ((n + m + 1) τ − 1) 2τ (n + m + 1)2 ϑ. I.4 Solving the Canonical Problem c̄ β Taking = = 79 s0 − c + AT y 0 1 − bT y 0 + cT x0 .  we then have   z 0 :=      M z0 + q =   Ax0 − b + b̄ −AT y 0 + c + c̄ bT y 0 − cT x0 + β −b̄T y 0 − c̄T x0 − β y0 x0 1 1            +   0m 0n 0 n+m+2       =   t0 s0 1 1    .  Except for the last entry in the last vector this is obvious. For this entry we write T T −b̄T y 0 − c̄T x0 − β = − t0 + b − Ax0 y 0 − s0 − c + AT y 0 x0 − β T T = − t0 y 0 − bT y 0 + x0 AT y 0 T T − s0 x0 + cT x0 − y 0 Ax0 − β = −m − bT y 0 − n + cT x0 − β = −m − n − 1, whence −b̄T y 0 − c̄T x0 − β + n + m + 2 = 1. We conclude that z 0 is a positive solution of (SP2 ) with a positive surplus vector. Moreover, since x0 s0 = en and y 0 t0 = em , this solution lies on the central path of (SP2 ) and the corresponding barrier parameter value is 1. It remains to show that if a strictly complementary solution of (SP2 ) is available then we can solve problems (P ) and (D). 
Therefore, let   ȳ  x̄       κ̄  ϑ̄ be a strictly complementary solution. Then, since the optimal value of (SP2 ) is zero, we have ϑ̄ = 0. As a consequence, the vector   ȳ   z̄ :=  x̄  κ̄ is a strictly complementary solution of         T        y 0 y 0 A −b y 0 m mm m               T min  0n   x  :  −A 0nn c   x  ≥  0n  ,  x  ≥ 0 .       0 κ 0 κ bT −cT 0 κ 80 I Theory and Complexity This is the problem (SP0 ), that we introduced in Chapter 2. We can duplicate the arguments used there to conclude that if κ̄ is positive then the pair (x̄/κ̄, ȳ/κ̄) provides strictly complementary optimal solutions of (P ) and (D), and if κ̄ is zero then one of the two problems is infeasible and the other is unbounded, or both problems are infeasible. Thus (SP2 ) provides a self-dual embedding for (P ) and (D). Moreover, z 0 provides a suitable starting point for the Full-Newton step algorithm. It is the point on the central path of (SP2 ) corresponding to the barrier parameter value 1. 4.3.3 The central path of (SP2 ) In this section we point out some properties of the central path of the problem (SP2 ). Let µ be an arbitrary positive number. Then the µ-center of (SP2 ) is determined as the unique solution of the system (cf. (2.46), page 35) z ≥ 0, s ≥ 0 Mz + q = s, zs = µ em+n+2 . (4.11) This solution defines the point on the central path of (SP2 ) corresponding to the barrier parameter value µ. Hence there exists unique positive x, y, κ, ϑ such that Ax − κb + ϑb̄ > 0 κc − AT y + ϑc̄ > 0 s(κ) := b y − c x + ϑβ > 0 > 0 T T s(ϑ) := n + m + 2 − b̄T y − c̄T x − κβ (4.12) and, moreover,  y Ax − κb + ϑb̄  x κc − AT y + ϑc̄  κ bT y − cT x + ϑβ  ϑ n + m + 2 − b̄T y − c̄T x − κβ = µem = µen = µ = µ. (4.13) Just as in Section 4.2.2 we take the inner product of both sides with z in the first equation of (4.11). Using the orthogonality property, we obtain q T z = z T s. The second equation in (4.11) gives z T s = (n + m + 2)µ. Due to the definition of q we obtain (n + m + 2)ϑ = (n + m + 2)µ, which gives ϑ = µ. Since ϑs(ϑ) = µ, by the fourth equation in (4.13), we conclude that s(ϑ) = 1. Since s(ϑ) = n + m + 2 − b̄T y − c̄T x − κβ this leads to b̄T y + c̄T x + κβ = n + m + 1. (4.14) I.4 Solving the Canonical Problem 81 Using ϑ = µ, the third equality in (4.13) can be rewritten as  κ bT y − cT x = µ − µκβ, which gives κβ = 1 + Substituting this in (4.14) we get b̄T y + c̄T x + which is equivalent to  κ T c x − bT y . µ  κ T c x − bT y = n + m, µ (κc + µc̄)T x − κb − µb̄ T y = µ(n + m). (4.15) This relation admits a nice interpretation. The first two inequalities in (4.12) show that x is feasible for the perturbed problem o n T min (κc + µc̄) x : Ax ≥ κb − µb̄, x ≥ 0 , and y is feasible for the dual problem n o T max κb − µb̄ y : AT y ≤ κc + µc̄, y ≥ 0 . For these perturbed problems the duality gap at the pair (x, y) is µ(n + m), from (4.15). Now consider the behavior along the central path when µ approaches zero. Two cases can occur: either κ converges to some positive value, or κ goes to zero. In both cases the duality gap converges to zero. Roughly speaking, the limiting values of x and y are optimal solutions for the perturbed problems. In the first case, when κ converges to some positive value, asymptotically the first perturbed problem becomes equivalent to (P ). We simply have to replace the variable x by κx. Also, the second problem becomes equivalent to (D): replace the variable y by κy. 
In the second case however, when κ goes to zero in the limit, then asymptotically the perturbed problems become  min 0T x : Ax ≥ 0, x ≥ 0 , and max  0T y : AT y ≤ 0, y ≥ 0 . As we know, one of the problems (P ) and (D) is then infeasible and the other unbounded, or both problems are infeasible. When dealing with a solution method for the canonical problem, the method must decide which of these two cases occurs. In this respect we make an interesting observation. Clearly the first case occurs if and only if κ ∈ B and the second case if and only if κ ∈ N , where (B, N ) is the optimal partition of (SP2 ). In other words, which of the two cases occurs depends on whether κ is a large variable or a small variable. Note that the variable ϑ is always small; in the present case we have ϑ(µ) = µ, for each µ > 0. Recall from Lemma I.43 that the large variables are bounded below by 82 I Theory and Complexity σSP /n and the small variables above by nµ/σSP . Hence, if κ is a large variable then κ ≥ σSP /n implies µ nµ ϑ . = ≤ κ κ σSP This implies that the quotient ϑ/κ goes to zero if µ goes to zero. On the other hand, if κ is a small variable then n nµ κ = , ≤ ϑ ϑσSP σSP proving that the quotient κ/ϑ is bounded above. Therefore, if µ goes to zero, κ2 /ϑ goes to zero as well, and hence ϑ/κ2 goes to infinity. Thus we may state the following without further proof. Theorem I.58 If κ is a large variable then lim µ↓0 ϑ ϑ = lim 2 = 0, µ↓0 κ κ and if κ is a small variable then lim µ↓0 ϑ = ∞. κ2 The above theorem provides another theoretical tool for distinguishing between the two possible cases. 4.3.4 Approximate solutions of (P ) and (D) Assuming that an ε-solution z = (y, x, κ, ϑ) for the embedding problem (SP2 ) is given, we proceed by investigating what information this gives on the embedded problem (P ) and its dual (D). With x y x̃ := , ỹ := , κ κ the feasibility of z for (SP2 ) implies the following inequalities: Ax̃ AT ỹ cT x̃ − bT ỹ  κ b̄T x̃ + c̄T ỹ + β ≥ ≤ b − ϑκ b̄ c + ϑκ c̄ ≤ ϑ κβ ≤ n + m + 2. (4.16) Clearly we cannot conclude that x̃ is feasible for (P ) or that ỹ is feasible for (D). But x̃ is feasible for the perturbed problem ( ) T ϑ ϑ ′ (P ) min c + c̄ x̄ : Ax̄ ≥ b − b̄, x ≥ 0 , κ κ and ỹ is feasible for its dual problem ( ) T ϑ ϑ ′ T (D ) max b − b̄ ȳ : A ȳ ≤ c + c̄, y ≥ 0 . κ κ We have the following lemma. I.4 Solving the Canonical Problem 83 Lemma I.59 Let z = (y, x, κ, ϑ) be a feasible solution of (SP2 ) with κ > 0. If x̃ = x , κ y , κ ỹ = then x̃ is feasible for (P ′ ), ỹ is feasible for (D′ ), and the duality gap at the pair (x̃, ỹ) for this pair of perturbed problems satisfies T T   (n + m + 2)ϑ ϑ ϑ . ỹ ≤ x̃ − b − b̄ c + c̄ κ κ κ2 Proof: We have already established that x̃ is feasible for (P ′ ) and ỹ is feasible for (D′ ). We rewrite the duality gap for the perturbed problems (P ′ ) and (D′ ) at the pair (x̃, ỹ) as follows: T T    ϑ T ϑ ϑ c̄ x̃ + b̄T ỹ . ỹ = cT x̃ − bT ỹ + x̃ − b − b̄ c + c̄ κ κ κ The third inequality in (4.16) gives cT x̃ − bT ỹ ≤ ϑ β κ and the fourth inequality c̄T x̃ + b̄T ỹ ≤ n+m+2 − β. κ Substitution gives  c+ ϑ c̄ κ T  T   ϑ n+m+2 (n + m + 2)ϑ ϑ ϑ , −β = ȳ ≤ β + x̄ − b − b̄ κ κ κ κ κ2 proving the lemma. ✷ The above lemma seems to be of interest only if κ is a large variable. For if ϑ/κ and ϑ/κ2 are small enough then the lemma provides a pair of vectors (x̃, ỹ) such that x̃ and ỹ are ‘almost’ feasible for (P ) and (D) respectively and the duality gap at this pair is small. 
The error in feasibility for (P ) is given by the vector (ϑ/κ)b̄ and the error in feasibility for (D) by the vector (ϑ/κ)c̄, whereas the duality gap with respect to (P ) and (D) equals approximately cT x̃ − bT ỹ. Part II The Logarithmic Barrier Approach 5 Preliminaries 5.1 Introduction In the previous chapters we showed that every LO problem can be solved in polynomial time. This was achieved by transforming the given problem to its canonical form and then embedding it into a self-dual model. We proved that the self-dual model can be solved in polynomial time. Our proof was based on the algorithm in Chapter 3 that uses the Newton direction as search direction. As we have seen, this algorithm is conceptually simple and allows a quite elementary analysis. For the theoretical purpose of Part I of the book this algorithm therefore is an ideal choice. From the practical point of view, however, there exist more efficient algorithms. The aim of this part of the book is to deal with a class of algorithms that has a relatively long history, going back to work of Frisch [88] in 1955. Frisch was the first to propose the use of logarithmic barrier functions in LO. The idea was worked out by Lootsma [185] and in the classical book of Fiacco and McCormick [77]. After 1984, the year when Karmarkar’s paper [165] raised new interest in the interior-point approach to LO, the so-called logarithmic barrier approach also began a new life. It became the basis of a wide class of polynomial time algorithms. Variants of the most efficient algorithms in this class found their way into commercial optimization packages like CPLEX and OSL.1 The aim of this part of the book is to provide a thorough introduction to these algorithms. In the literature of the last decade these interior-point algorithms were developed for LO problems in the so-called standard format:  (P ) min cT x : Ax = b, x ≥ 0 , where A is an m× n matrix of rank m, c, x ∈ IRn , and b ∈ IRm . This format also served as the standard for the literature on the Simplex Method. Because of its historical status, we adopt the standard format for this part of the book. We want to point out, however, that all results in this part can easily be adapted to any other format, including the self-dual model of Part I. We only have to define a suitable logarithmic barrier function for the format under consideration. A disadvantage of the change from the self-dual to the standard format is that it leads to some repetition of results. For example, we need to establish under what conditions the problem (P ) in standard format has a central path, and so on. In fact, 1 CPLEX is a product of CPLEX Optimization, Inc. OSL stands for Optimization Subroutine Library and is the optimization package of IBM. 88 II Logarithmic Barrier Approach we could have derived all these results from the results in Chapter 2. But, instead, to make this part of the book more accessible for readers who are better acquainted with the standard format rather than the less known self-dual format, we decided to make this part self-contained. Readers who went through Part I may only be interested in methods for solving the self-dual problem  (SP ) min q T x : M x ≥ −q, x ≥ 0 , with q ≥ 0 and M T = −M . Those readers may be advised to skip the rest of this chapter and continue with Chapters 6 and 7. The relevance of these chapters for solving (SP ) is due to the fact that (SP ) can easily be brought into the standard format by introducing a surplus vector s to create equality constraints. 
Since x and s are nonnegative, this yields (SP ) in the standard format:  (SP S) min q T x : M x − s = −q, x ≥ 0, s ≥ 0 . In this part of the book we take the classical duality results for the standard format of the LO problem as granted. We briefly review these results in the next section. 5.2 Duality results for the standard LO problem The standard format problem (P ) has the following dual problem:  (D) max bT y : AT y + s = c, s ≥ 0 , where s ∈ IRn and y ∈ IRm . We call (D) the standard dual problem. The feasible regions of (P ) and (D) are denoted by P and D, respectively: P := {x : Ax = b, x ≥ 0} ,  D := (y, s) : AT y + s = c, s ≥ 0 . If P is empty we call (P ) infeasible, otherwise feasible. If (P ) is feasible and the objective value cT x is unbounded below on P, then (P ) is called unbounded, otherwise bounded. We use similar terminology for the dual problem (D). Since we assumed that A has full (row) rank m, we have a one-to-one correspondence between y and s in the pairs (y, s) ∈ D. In order to facilitate the discussion we feel free to refer to any pair (y, s) ∈ D either by y ∈ D or s ∈ D. The (relative) interiors of P and D are denoted by P + and D+ : P + := {x : Ax = b, x > 0} ,  D+ := (y, s) : AT y + s = c, s > 0 . We recall the well known and almost trivial weak duality result for the LO problem in standard format. II.5 Preliminaries 89 Proposition II.1 (Weak duality) Let x and s be feasible for (P ) and (D), respectively. Then cT x−bT y = xT s ≥ 0. Consequently, cT x is an upper bound for the optimal value of (D), if it exists, and bT y is a lower bound for the optimal value of (P ), if it exists. Moreover, if the duality gap xT s is zero then x is an optimal solution of (P ) and (y, s) is an optimal solution of (D). Proof: The proof is straightforward. We have 0 ≤ xT s = xT (c − AT y) = cT x − (Ax)T y = cT x − bT y. (5.1) This implies that cT x is an upper bound for the optimal objective value of (D), and bT y is a lower bound for the optimal objective value of (P ), and, moreover, if the duality gap is zero then the pair (x, s) is optimal. ✷ A direct consequence of Proposition II.1 is that if one of the problems (P ) and (D) is unbounded, then the other problem is infeasible. The classical duality results for the primal and dual problems in standard format boil down to the following two results. The first result is the Duality Theorem (due to von Neumann, 1947, [227]), and the second result will be referred to as the Goldman–Tucker Theorem (Goldman and Tucker, 1956, [111]). Theorem II.2 (Duality Theorem) If (P ) and (D) are feasible then both problems have optimal solutions. Then, if x ∈ P and (y, s) ∈ D, these are optimal solutions if and only if xT s = 0. Otherwise neither of the two problems has optimal solutions: either both (P ) and (D) are infeasible or one of the two problems is infeasible and the other one is unbounded. Theorem II.3 (Goldman–Tucker Theorem) If (P ) and (D) are feasible then there exists a strictly complementary pair of optimal solutions, that is an optimal solution pair (x, s) satisfying x + s > 0. It may be noted that these two classical results follow immediately from the results in Part I.2 For future use we also mention that (P ) is infeasible if and only if there exists a vector y such that AT y ≤ 0 and bT y > 0, and (D) is infeasible if and only if there exists a vector x ≥ 0 such that Ax = 0 and cT x < 0. 
These statements are examples of theorems of the alternatives and easily follow from Farkas’ lemma.3 We denote the set of all optimal solutions of (P ) by P ∗ and similarly D∗ denotes the set of optimal solutions of (D). Of course, P ∗ is empty if and only if (P ) is infeasible or unbounded, and D∗ is empty if and only if (D) is infeasible or unbounded. Note that the Duality Theorem (II.2) implies that P ∗ is empty if and only if D∗ is empty. 2 Exercise 30 Derive Theorem II.2 and Theorem II.3 from Theorem I.26. 3 Exercise 31 Using Farkas’ lemma (cf. Remark I.27), prove: (i) either the system Ax = b, x ≥ 0 or the system AT y ≤ 0, bT y > 0 has a solution; (ii) either the system AT y ≤ c or the system Ax = 0, x ≥ 0, cT x < 0 has a solution. 90 5.3 II Logarithmic Barrier Approach The primal logarithmic barrier function We start by introducing the so-called logarithmic barrier function for the primal problem (P ). This is the function g̃µ (x) defined by g̃µ (x) := cT x − µ n X log xj , (5.2) j=1 where µ is a positive number called the barrier parameter, and x runs through all primal feasible vectors that are positive. The domain of g̃µ is the set P + . The use of logarithmic barrier functions in LO was first proposed by Frisch [88] in 1955. By minimizing g̃µ (x), we try to realize two goals at the same time, namely to T find Pn a primal feasible vector x for which c x is small and such that the barrier term j=1 log xj is large. Frisch observed that the minimization of g̃µ (x) can be done easily by using standard techniques from nonlinear optimization. The barrier parameter can be used to put more emphasis on either the objective value cT x of the primal LO problem (P ), or on the barrier term. Intuitively, by letting µ take a small (positive) value, we may expect that a minimizer of g̃µ (x) will be a good approximation for an optimal solution of (P ). It has taken approximately 40 years to make clear that this is a brilliant idea, not only from a practical but also from a theoretical point of view. In this part of the book we deal with logarithmic barrier methods for solving both the primal problem (P ) and the dual problem (D), and we show that when worked out in an appropriate way, the resulting methods solve both (P ) and (D) in polynomial time. 5.4 Existence of a minimizer In the logarithmic barrier approach a major question is whether the barrier function has a minimizing point or not. This section is devoted to this question, and we present some necessary and sufficient conditions. One of these (mutually equivalent) conditions will be called the interior-point condition. This condition is fundamental not only for the logarithmic barrier approach, but as we shall see, for all interior-point approaches. Note that the definition of g̃µ (x) can be extended to the set IRn++ of all positive vectors x, and that g̃µ (x) is differentiable on this set. We can easily verify that the gradient of g̃µ is given by ∇g̃µ (x) = c − µx−1 , and the Hessian matrix by ∇2 g̃µ (x) = µX −2 . Obviously, the Hessian is positive definite for any x ∈ IRn++ . This means that g̃µ (x) is strictly convex on IRn++ . We are interested in the behavior of g̃µ on its domain, which is the set P + of the positive vectors in the primal feasible space. Since P + is the intersection of IRn++ and the affine space {x : Ax = b}, it is a relatively open subset of IRn++ . 
Therefore, the smallest affine space containing P + is the affine space II.5 Preliminaries 91 {x : Ax = b}, and the linear space parallel to it is the null space N (A) of A: N (A) = {x : Ax = 0} . Taking D = IRn++ and C = P + , we may now apply Proposition A.1. From this we conclude that g̃µ has a minimizer if and only if there exists an x ∈ P + such that c − µx−1 ⊥ N (A). Since the orthogonal complement of the null space of A is the row space of A, it follows that x ∈ P + is a minimizer of g̃µ if and only if there exists a vector y ∈ IRm such that c − µx−1 = AT y. By putting s := µx−1 , which is equivalent to xs = µe, it follows that g̃µ has a minimizer if and only if there exist vectors x, y and s such that Ax AT y + s = = b, c, xs = µe. x > 0, s > 0, (5.3) We thus have shown that this system represents the optimality conditions for the primal logarithmic barrier minimization problem, given by  (Pµ ) min g̃µ (x) : x ∈ P + . We refer to the system (5.3) as the KKT system with respect to µ.4 Note that the condition x > 0 can be relaxed to x ≥ 0, because the third equation in (5.3) forces strict inequality. Similarly, the condition s > 0 can be replaced by s ≥ 0. Thus, the first equation in (5.3) is simply the feasibility constraint for the primal problem (P ) and the second equation is the feasibility constraint for the dual problem (D). For reasons that we shall make clear later on, the third constraint is referred to as the centering condition with respect to µ. 5.5 The interior-point condition If the KKT system has a solution for some positive value of the barrier parameter µ, then the primal feasible region contains a positive vector x, and the dual feasible region contains a pair (y, s) with positive slack vector s. In short, both P and D contain a positive vector. At this stage we announce the surprising result that the converse is also true: if both P and D contain a positive vector, then the KKT system has a solution for any positive µ. This is a consequence of the following theorem. Theorem II.4 Let µ > 0. Then the following statements are equivalent: (i) both P and D contain a positive vector; 4 The reader who is familiar with the theory of nonlinear optimization will recognize in this system the first-order optimality conditions, also known as Karush–Kuhn–Tucker conditions, for (Pµ ). 92 II Logarithmic Barrier Approach (ii) there exists a (unique) minimizer of g̃µ on P + ; (iii) the KKT system (5.3) has a (unique) solution. Proof: The equivalence of (ii) and (iii) has been established above. We have also observed already the implication (iii) ⇒ (i). So the proof of the theorem will be complete if we show (i) ⇒ (ii). The proof of this implication is more sophisticated. 0 Assuming (i), there exist vectors x0 and y 0 such that x0 is feasible for  (P ) and y 0 0 0 T 0 is feasible for (D), x > 0 and s := c − A y > 0. Taking K = g̃µ x and defining the level set LK of g̃µ by  LK := x ∈ P + : g̃µ (x) ≤ K , we have x0 ∈ LK , so LK is not empty. Since g̃µ is continuous on its domain, it suffices to show that LK is compact. Because then g̃µ has a minimizer, and since g̃µ is strictly convex this minimizer is unique. Thus to complete the proof we show below that LK is compact. Let x ∈ LK . Using Proposition II.1 we have cT x − bT y 0 = xT s0 , so, in the definition of g̃µ (x) we may replace cT x by bT y 0 + xT s0 : T g̃µ (x) = c x − µ n X j=1 T 0 T 0 log xj = b y + x s − µ  Since xT s0 = eT xs0 and eT e = n, this can be written as n X log xj . 
j=1 n n X X  xj s0j log s0j , + n − nµ log µ + bT y 0 + µ log g̃µ (x) = eT xs0 − e − µ µ j=1 j=1 or, equivalently, n n X X  xj s0j log eT xs0 − e − µ log s0j . = g̃µ (x) − n + nµ log µ − bT y 0 − µ µ j=1 j=1 Hence, using g̃µ (x) ≤ K and defining K̄ by K̄ := K − n + nµ log µ − bT y 0 − µ we obtain n X log s0j , j=1 n X  xj s0j log eT xs0 − e − µ ≤ K̄. µ j=1 (5.4) Note that K̄ does not depend on x. Now let the function ψ : (−1, ∞) → IR be defined by ψ(t) = t − log(1 + t). (5.5) II.5 Preliminaries 93 Then, also using eT e = n, we may rewrite (5.4) as follows: ! n X xj s0j − 1 ≤ K̄. ψ µ µ j=1 (5.6) The rest of the proof is based on some simple properties of the function ψ(t),5 namely • ψ(t) ≥ 0 for t > −1; • ψ is strictly convex; • ψ(0) = 0; • limt→∞ ψ(t) = ∞; • limt↓−1 ψ(t) = ∞. In words: ψ(t) is strictly convex on its domain and minimal at t = 0, with ψ(0) = 0; moreover, ψ(t) goes to infinity if t goes to one of the boundaries of the domain (−1, ∞) of ψ. Figure 5.1 depicts the graph of ψ. 2 ψ(t) ✻ 1.75 1.5 1.25 1 0.75 0.5 0.25 0 −1 −0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 ✲ t Figure 5.1 The graph of ψ. Since ψ is nonnegative on its domain, each term in the above sum is nonnegative. Therefore, ! xj s0j − 1 ≤ K̄, 1 ≤ j ≤ n. µψ µ Now using that ψ(t) is strictly convex, zero at t = 0, and unbounded if t goes to −1 or to infinity, it follows that there must exist unique nonnegative numbers a and b, 5 E. Klafszky drew our attention to the fact that this function is known in the literature. It was used in a different context for measuring discrepancy between two positive vectors in IRn . See Csiszár [58] and Klafszky, Mayer and Terlaky [169]. 94 II Logarithmic Barrier Approach with a < 1, such that K̄ . µ ψ(−a) = ψ(b) = We conclude that −a ≤ xj s0j − 1 ≤ b, µ 1 ≤ j ≤ n, which gives µ(1 − a) µ(1 + b) ≤ xj ≤ , s0j s0j 1 ≤ j ≤ n. Since 1 − a > 0, this shows that each coordinate of the vector x belongs to a finite and closed interval on the set (0, ∞) of positive real numbers. As a consequence, since the level set LK is a closed subset of the Cartesian product of these intervals, LK is compact. Thus we have shown that (ii) holds. ✷ The first condition in Theorem II.4 will be referred to as the interior-point condition. Let us point out once more that the word ‘unique’ in the second statement comes from the fact that g̃µ is strictly convex, which implies that g̃µ has at most one minimizer. The equivalence of (ii) and (iii) now justifies the word ‘unique’ in the third statement. Remark II.5 It is possible to give an elementary proof (i.e., without using the equivalence of (ii) and (iii) in Theorem II.4) of the fact that the KKT system (5.3) cannot have more than one solution. This goes as follows. Let x1 , y 1 , s1 and x2 , y 2 , s2 denote two solutions of the equation system (5.3). Define ∆x := x2 − x1 , and similarly ∆y := y 2 − y 1 and ∆s := s2 − s1 . Then we may easily verify that A∆x = 0 (5.7) AT ∆y + ∆s = 0 (5.8) x1 ∆s + s1 ∆x + ∆s∆x = 0. (5.9) From (5.7) and (5.8) we deduce that ∆sT ∆x = 0, or eT ∆x∆s = 0. (5.10) Rewriting (5.9) gives (x1 + ∆x)∆s + s1 ∆x = 0. Since x1 + ∆x = x2 > 0 and s1 > 0, this implies that no two corresponding entries in ∆x and ∆s have the same sign. So it follows that ∆x∆s ≤ 0. (5.11) Combining (5.10) and (5.11), we obtain ∆x∆s = 0. Hence either (∆x)i = 0 or (∆s)i = 0, for each i. Using (5.9), we conclude that (∆x)i = 0 and (∆s)i = 0, for each i. Hence x1 = x2 and s1 = s2 . Consequently, AT (y 1 − y 2 ) = 0. 
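For a concrete illustration of the KKT system (5.3), the µ-center can be computed numerically by a damped Newton method applied to (5.3). The sketch below is not from the text and is not the algorithm analyzed later in this part; it merely assumes that A has full row rank, that the interior-point condition holds, and that a positive starting pair (x, s) is available (the function name is illustrative).

import numpy as np

def mu_center(A, b, c, mu, x, y, s, tol=1e-10, max_iter=50):
    # Damped Newton method for the KKT system (5.3):
    #     A x = b,   A^T y + s = c,   x s = mu e,   x > 0, s > 0.
    m, n = A.shape
    e = np.ones(n)
    for _ in range(max_iter):
        r_p = b - A @ x                # primal residual
        r_d = c - A.T @ y - s          # dual residual
        r_c = mu * e - x * s           # centering residual
        if max(np.linalg.norm(r) for r in (r_p, r_d, r_c)) < tol:
            break
        # Eliminate ds and dx and solve the normal equations for dy.
        rhs = r_p + A @ ((x * r_d - r_c) / s)
        dy = np.linalg.solve(A @ np.diag(x / s) @ A.T, rhs)
        ds = r_d - A.T @ dy
        dx = (r_c - x * ds) / s
        # Damp the step so that x and s stay strictly positive.
        alpha = 1.0
        while np.any(x + alpha * dx <= 0) or np.any(s + alpha * ds <= 0):
            alpha /= 2.0
        x, y, s = x + alpha * dx, y + alpha * dy, s + alpha * ds
    return x, y, s

if __name__ == "__main__":
    # The data of Example II.7 below, with b = (0, 1); there the mu-center is
    # x(mu) = (mu, mu, 1), s(mu) = (1, 1, mu), y(mu) = (0, 1 - mu).
    A = np.array([[1.0, -1.0, 0.0], [0.0, 0.0, 1.0]])
    b = np.array([0.0, 1.0])
    c = np.array([1.0, 1.0, 1.0])
    x, y, s = mu_center(A, b, c, mu=0.5, x=np.ones(3), y=np.zeros(2), s=np.ones(3))
    print(np.round(x, 6), np.round(y, 6), np.round(s, 6))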
Since rank (A) = m, the columns of AT are linearly independent and it follows that y 1 = y 2 . This proves the claim. • II.5 Preliminaries 5.6 95 The central path Theorem II.4 has several important consequences. First we remark that the interiorpoint condition is independent of the barrier parameter. Therefore, since this condition is equivalent to the existence of a minimizer of the logarithmic barrier function g̃µ , if such a minimizer exists for some (positive) µ, then it exists for all µ. Hence, the interior point condition guarantees that the KKT system (5.3) has a unique solution for every positive value of µ. These solutions are denoted throughout as x(µ), y(µ) and s(µ), and we call x(µ) the µ-center of (P ) and (y(µ), s(µ)) the µ-center of (D). The set {x(µ) : µ > 0} of all primal µ-centers represents a parametric curve in the feasible region P of (P ) and is called the central path of (P ). Similarly, the set {(y(µ), s(µ)) : µ > 0} is called the central path of (D). Remark II.6 It may worthwhile to point out that along the primal central path the primal objective value cT x(µ) is monotonically decreasing and along the dual central path the dual objective value bT y(µ) is monotonically increasing if µ decreases. In fact, in both cases the monotonicity is strict unless the objective value is constant on the feasible region, and in the latter case the central path is just a point. Although we will not use these results we include here the proof for the primal case.6 Recall that x(µ) is the (unique) minimizer of the primal logarithmic barrier function g̃µ (x) = cT x − µ n X log xj , j=1 as given by (5.2), when x runs through the positive vectors in P. First we deal with the case when the primal objective value is constant on P. We have the following equivalent statements: (i) (ii) (iii) (iv) cT x is constant for x ∈ P; x(µ) is constant for µ > 0; x(µ1 ) = x(µ2 ) for some µ1 and µ2 with 0 < µ1 < µ2 ; there exists a ξ ∈ IRn such that s(µ) = µξ for µ > 0. The proof is easy. If (i) holds then the minimizer of g̃µ (x) is independent of µ, and hence x(µ) is constant for all µ > 0, which means that (ii) holds. The implication (ii) ⇒ (iii) is obvious. Assuming (iii), let ξ be such that x(µ1 ) = x(µ2 ) = ξ. Since s(µ1 ) = µ1 ξ −1 and s(µ2 ) = µ2 ξ −1 we have AT y(µ1 ) + µ1 ξ −1 = c, AT y(µ2 ) + µ2 ξ −1 = c. This implies (µ2 − µ1 ) c = AT (µ2 y(µ1 ) − µ1 y(µ2 )) , 6 The idea of the following proof is due to Fiacco and McCormick [77]. They deal with the more general case of a convex optimization problem and prove the monotonicity of the objective value only for the primal central path. We also refer the reader to den Hertog, Roos and Vial [146] for a different proof. The proof for the dual central path is similar to the proof for the primal central path and is left to the reader. 96 II Logarithmic Barrier Approach showing that c belongs to the row space of A. This means that (i) holds.7 Thus we have shown the equivalence of (i) to (iii). The equivalence of (ii) and (iv) is immediate from x(µ)s(µ) = µe for all µ > 0. Now consider the case where the primal objective value is not constant on P. Letting 0 < µ1 < µ2 and x1 = x(µ1 ) and x2 = x(µ2 ), we claim that cT x1 < cT x2 . The above equivalence (i) ⇔ (iii) makes it clear that x1 6= x2 . The rest of the proof is based on the fact that g̃µ (x) is strictly convex. From this we deduce that g̃µ1 (x1 ) < g̃µ1 (x2 ) and g̃µ2 (x2 ) < g̃µ2 (x1 ). 
Hence cT x1 − µ1 n X log x1j < cT x2 − µ1 n X cT x2 − µ2 n X log x2j < cT x1 − µ2 n X and j=1 j=1 log x2j (5.12) log x1j . (5.13) j=1 j=1 The sums in these inequalities can be eliminated by multiplying both sides of (5.12) by µ2 and both sides of (5.13) by µ1 , and then adding the resulting inequalities. Thus we find µ2 cT x1 + µ1 cT x2 < µ2 cT x2 + µ1 cT x1 , which is equivalent to  (µ2 − µ1 ) cT x1 − cT x2 < 0. Since µ2 − µ1 > 0 we obtain cT x1 < cT x2 , proving the claim. • It is obvious that if one of the problems (P ) and (D) is infeasible, then the interiorpoint condition cannot be satisfied, and hence the central paths do not exist. But feasibility of both (P ) and (D) is not enough for the existence of the central paths: the central paths exist if and only if both the primal and the dual feasible region contain a positive vector. In that case, when the interior-point condition is satisfied, the central path can be obtained by solving the KKT system. Unfortunately, the KKT system is nonlinear, and hence in general it will not be possible to solve it explicitly. In order to understand better the type of nonlinearity, we show that the KKT system can be reformulated as a system of m polynomial equations of degree at most n, in the m coordinates of the vector y. This goes as follows. From the second and the third equations we derive that x = µ c − AT y −1 . Substituting this in the first equation we obtain µA c − AT y −1 = b. (5.14) If we multiply each of the m equations in this system by the product of the n coordinates of the vector c − AT y, which are linear in the m coordinates yj , we arrive at m polynomial equations of degree at most n in the coordinates of y. We illustrate this by a simple example. 7 Exercise 32 Assume that (P ) and (D) satisfy the interior point condition. Prove that the primal objective value is constant on the primal feasible region P if and only if c = AT λ for some λ ∈ IRm , and the dual objective value is constant on the dual feasible region D if and only if b = 0. II.5 Preliminaries 97 Example II.7 Consider the case where A= " 1 −1 0 0 8 0 1 #  1   c =  1 . 1  , For the moment we do not further specify the vector b. The left-hand side of (5.14) becomes µA c − AT y −1 =µ " 1 −1 0 0 0 1 # −1  2µy  1 1 − y1    1 − y12  .  1 + y1  =  µ 1 − y2 1 − y2  This means that the KKT system (5.3) is equivalent to the system of equations  2µy    1 b 2  1 − y1   1  , =  µ b2 1 − y2   1 − y1    1 + y1  ≥ 0. 1 − y2 We consider this system for special choices of the vector b. Obviously, if b2 ≤ 0 then the system has no solution, since µ > 0 and 1 − y2 ≥ 0. Note that the second equation in Ax = b then requires that x3 ≤ 0, showing that the primal feasible region does not contain a positive vector in that case. Hence, the central path exists only if b2 > 0. Without loss of generality we may put b2 = 1. Then we find y2 = 1 − µ. Now consider the case where b1 = 0: b= " 0 1 # . Then we obtain y1 = 0 from the first equation, and hence for each µ > 0: x(µ) = (µ, µ, 1) s(µ) y(µ) = = (1, 1, µ) (0, 1 − µ). Thus we have found a parametric representation of the central paths of (P ) and (D). They are straight half lines in this case. The dual central path (in the y-space) is shown in Figure 5.2. 8 Note that these data are the same as in the examples D.5, D.6 and D.7 in Appendix D. These examples differ only in the vector b. 
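Before turning to a second choice of b, the closed form just derived is easy to verify numerically. The following sketch (Python with NumPy is assumed here; the function name and the plain, unsafeguarded Newton iteration are illustrative choices, not part of the text) solves the KKT system (5.3) for this data and recovers x(µ) = (µ, µ, 1), s(µ) = (1, 1, µ) and y(µ) = (0, 1 − µ).

```python
import numpy as np

# Data of Example II.7 with b = (0, 1): A is 2x3 and c = (1, 1, 1).
A = np.array([[1.0, -1.0, 0.0],
              [0.0,  0.0, 1.0]])
b = np.array([0.0, 1.0])
c = np.array([1.0, 1.0, 1.0])
m, n = A.shape

def mu_center(mu, iters=20):
    """Solve the KKT system (5.3) by a plain Newton iteration on
    F(x, y, s) = (Ax - b, A^T y + s - c, xs - mu e) = 0.
    No safeguards are applied; adequate only for this tiny example."""
    x, y, s = np.ones(n), np.zeros(m), np.ones(n)
    for _ in range(iters):
        F = np.concatenate([A @ x - b, A.T @ y + s - c, x * s - mu])
        J = np.block([
            [A,                np.zeros((m, m)), np.zeros((m, n))],
            [np.zeros((n, n)), A.T,              np.eye(n)       ],
            [np.diag(s),       np.zeros((n, m)), np.diag(x)      ],
        ])
        dx, dy, ds = np.split(np.linalg.solve(J, -F), [n, n + m])
        x, y, s = x + dx, y + dy, s + ds
    return x, y, s

for mu in (2.0, 1.0, 0.1):
    x, y, s = mu_center(mu)
    print(mu, x, y, s)   # expect x = (mu, mu, 1), s = (1, 1, mu), y = (0, 1 - mu)
```

For this particular data the iteration reaches the µ-center immediately; in general a plain Newton iteration on (5.3) needs safeguards to keep x and s positive.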
Figure 5.2 The dual central path if b = (0, 1).

Let us also consider the case where b_1 = 1:

    b = (1, 1)^T.

The first equation in the reduced KKT system then becomes

    y_1^2 + 2µy_1 − 1 = 0,

giving

    y_1 = −µ ± sqrt(1 + µ^2).

The minus sign gives y_1 ≤ −1, which implies s_2 = 1 + y_1 ≤ 0. Since 1 + y_1 must be positive, the unique solution for y_1 is determined by the plus sign:

    y_1 = −µ + sqrt(1 + µ^2).

With y(µ) found, the calculation of s(µ) and x(µ) is straightforward, and yields a parametric representation of the central paths of (P) and (D). We have for each µ > 0:

    x(µ) = ( (1/2)(µ + 1 + sqrt(1 + µ^2)), (1/2)(−1 + µ + sqrt(1 + µ^2)), 1 ),
    s(µ) = ( 1 + µ − sqrt(1 + µ^2), 1 − µ + sqrt(1 + µ^2), µ ),
    y(µ) = ( −µ + sqrt(1 + µ^2), 1 − µ ).

The dual central path in the y-space is shown in Figure 5.3.

Figure 5.3 The dual central path if b = (1, 1).

Note that in the above examples the limit of the central path exists if µ approaches zero, and that the limit point is an optimal solution. In fact this property of the central path is at the heart of the interior-point methods for solving the problems (P) and (D). The central path is used as a guideline to the optimal solution set. ♦

5.7 Equivalent formulations of the interior-point condition

Later on we need other conditions that are equivalent to the interior-point condition. In this section we deal with one of them. Let x be feasible for the primal problem, and (y, s) for the dual problem. Then, omitting y, we call (x, s) a primal-dual pair. From Proposition II.1 we recall that the duality gap for this pair is given by

    c^T x − b^T y = x^T s.

We now derive an important consequence of the interior-point condition on the level sets of the duality gap. In doing so, we shall use a simple relationship that we state, for further use, as a lemma. The relation in the lemma is an immediate consequence of the orthogonality of the row space and the null space of the matrix A.

Lemma II.8 Assume x̄ ∈ P and s̄ ∈ D. Then for all primal-dual feasible pairs (x, s),

    x^T s = s̄^T x + x̄^T s − x̄^T s̄.

Proof: From the feasibility assumption, the vectors x − x̄ and s − s̄ are orthogonal, since the first vector belongs to the null space of A while the second is in the row space of A. Expanding the scalar product (x − x̄)^T (s − s̄) and equating it to zero yields the result. ✷

Theorem II.9 Let the interior-point assumption hold. Then, for each positive K, the set of all primal-dual feasible pairs (x, s) such that x^T s ≤ K is bounded.

Proof: By the interior-point assumption there exists a positive primal-dual feasible pair (x̄, s̄). Since x^T s ≤ K, Lemma II.8 gives

    s̄^T x + x̄^T s ≤ K + x̄^T s̄.

This implies that both s̄^T x and x̄^T s are bounded. Since x̄ > 0 and s̄ > 0, we conclude that all components of x and s must also be bounded. ✷

We can restate Theorem II.9 by saying that the interior-point condition implies that all level sets of the duality gap are bounded. Interestingly enough, the converse is also true: if all level sets of the duality gap are bounded, then the interior-point condition is satisfied. This is a consequence of our next result.^9

Theorem II.10 Let the feasible regions of (P) and (D) be nonempty.
Then the following statements are equivalent: (i) both P and D contain a positive vector; (ii) the level sets of the duality gap are bounded; (iii) the optimal sets of (P ) and (D) are bounded. Proof: The implication (i) ⇒ (ii) is just a restatement of Theorem II.9. The implication (ii) ⇒ (iii) is obvious, because optimal solutions of (P ) and (D) are contained in any nonempty level set of the duality gap. The implication (iii) ⇒ (i) in the theorem is nontrivial and can be proved as follows. Since the feasible regions of (P ) and (D) are nonempty we have optimal solutions x∗ and (y ∗ , s∗ ) for (P ) and (D). First assume that the optimal set of (P ) is bounded. Since x ∈ P is optimal for (P ) if and only if xT s∗ = 0, this set is given by  P ∗ = x : Ax = b, x ≥ 0, xT s∗ = 0 . The boundedness of P ∗ implies that the problem  max eT x : Ax = b, x ≥ 0, xT s∗ = 0 x is bounded, and hence it has an optimal solution. Since x and s∗ are nonnegative, the problem is equivalent to  max eT x : Ax = b, x ≥ 0, xT s∗ ≤ 0 . x 9 This result was first established by McLinden [197, 198]. See also Megiddo [200]. II.5 Preliminaries 101 Hence, the dual of this problem is feasible. The dual is given by  min bT y : AT y + λs∗ ≥ e, λ ≥ 0 . y,λ Let (ȳ, λ̄) be feasible for this problem. Then we have AT ȳ + λ̄s∗ ≥ e. If λ̄ = 0 then AT ȳ ≥ e, which implies AT (y ∗ − ȳ) = AT y ∗ − AT ȳ ≤ c − e, and hence y ∗ − ȳ is dual feasible with positive slack vector. Now let λ̄ > 0. Then, replacing s∗ by c − AT y ∗ in AT ȳ + λ̄s∗ ≥ e we get  AT ȳ + λ̄ c − AT y ∗ ≥ e. Dividing by the positive number λ̄ we obtain  ȳ  e AT y ∗ − + ≤ c, λ̄ λ̄ showing that y ∗ − ȳ/λ̄ is feasible for (D) with a positive slack vector. We proceed by assuming that the (nonempty!) optimal set of (D) is bounded. The same arguments apply in this case. Using that (y, s) ∈ D is optimal for (D) if and only if sT x∗ = 0, the dual optimal set is given by  D∗ = (y, s) : AT y + s = c, s ≥ 0, sT x∗ = 0 . The boundedness of D∗ implies that the problem  max eT s : AT y + s = c, s ≥ 0, sT x∗ = 0 y,s is bounded and hence has an optimal solution. This implies that the problem  max eT s : AT y + s = c, s ≥ 0, sT x∗ ≤ 0 y,s is also feasible and bounded. Hence, the dual problem, given by  min cT x : Ax = 0, x + ηx∗ ≥ e, η ≥ 0 , x,η is feasible and bounded as well. We only use the feasibility. Let (x̄, η̄) be a feasible solution. Then x̄ + η̄x∗ ≥ e and Ax̄ = 0. If η̄ = 0 then we have x∗ + x̄ ≥ e > 0 and A (x∗ + x̄) = Ax̄ + Ax∗ = b, whence x∗ + x̄ is a positive vector in P. If η̄ > 0 then we write   1 x̄ ∗ + x = Ax̄ + Ax∗ = b, A η̄ η̄ yielding that the positive vector x̄/η̄ + x∗ is feasible for (P ). Thus we have shown that (iii) implies (i), completing the proof. ✷ Each of the three statements in Theorem II.10 deals with properties of both (P ) and (D). We also have two one-sided versions of Theorem II.10 in which we have three 102 II Logarithmic Barrier Approach equivalent statements where each statement involves a property of (P ) or a property of (D). We state these results as corollaries, in which a primal level set means any set of the form  x ∈ P : cT x ≤ K and a dual level set means any set of the form  y ∈ D : bT y ≥ K , where K may be any real number. The first corollary follows. Corollary II.11 Let the feasible regions of (P ) and (D) be nonempty. Then the following statements are equivalent: (i) P contains a positive vector; (ii) the level sets of the dual objective are bounded; (iii) the optimal set of (D) is bounded. 
Proof: Recall that the hypothesis in the corollary implies that the optimal sets of (P ) and (D) are nonempty. The proof is cyclic, and goes as follows. (i) ⇒ (ii): Letting x̄ ∈ P, with x̄ > 0, we show that each level set of the dual objective is bounded. For any number K let DK be the corresponding level set of the dual objective:  DK = (y, s) ∈ D : bT y ≥ K . Then (y, s) ∈ DK implies sT x̄ = cT x̄ − bT y ≤ cT x̄ − K. Since x̄ > 0, the i-th coordinate of s must be bounded above by (cT x̄ − K)/x̄i . Therefore, DK is bounded. (ii) ⇒ (iii): This implication is trivial, because the optimal set of (D) is a level set of the dual objective. (iii) ⇒ (i): This implication has been obtained as part of the proof of Theorem II.10. ✷ The proof of the second corollary goes in the same way and is therefore omitted. Corollary II.12 Let the feasible regions of (P ) and (D) be nonempty. Then the following statements are equivalent: (i) D contains a positive vector; (ii) the level sets of the primal objective are bounded; (iii) the optimal set of (P ) is bounded. We conclude this section with some interesting consequences of these corollaries. We assume that the feasible regions P and D are nonempty. Corollary II.13 D is bounded if and only if the null space of A contains a positive vector. II.5 Preliminaries 103 Proof: The dual feasible region remains unchanged if we put b = 0. In that case D coincides with the optimal set D∗ of (D), and this is the only nonempty dual level set. Hence, Corollary II.11 yields that D is bounded if and only if P contains a positive vector. Since b = 0 this gives the result. ✷ Corollary II.14 P is bounded if and only if the row space of A contains a positive vector. Proof: The primal feasible region remains unchanged if we put c = 0. Now P coincides with the primal optimal set P ∗ of (P ), and Corollary II.12 yields that D is bounded if and only if D contains a positive vector. Since c = 0 this gives the result. ✷ Note that the word ‘positive’ in the last two corollaries could be replaced by the word ‘negative’, because a linear space contains a positive vector if and only if it contains a negative vector. An immediate consequence of Corollary II.13 and Corollary II.14 is as follows. Corollary II.15 At least one of the two sets P and D is unbounded. Proof: If both sets are bounded then there exist a positive vector x and a vector y such that Ax = 0 and AT y > 0. This gives the contradiction  T 0 = (Ax) y = xT AT y > 0. The result follows. ✷ Remark II.16 If (P ) and (D) satisfy the interior-point condition then for every positive µ we have a primal-dual pair (x, s) such that xs = µe. Letting µ go to infinity, it follows that for each index i the product xi si goes to infinity. Therefore, at least one of the coordinates xi and si must be unbounded. It can be shown that exactly one of these two coordinates is unbounded and the other is bounded. This is an example of a coordinatewise duality property. We will not go further in this direction here, but refer the reader to Williams [291, 292] and to Güler et al. [134]. • 5.8 Symmetric formulation In this chapter we dealt with the LO problem in standard form  (P ) min cT x : Ax = b, x ≥ 0 , and its dual problem (D) max  T b y : AT y + s = c, s ≥ 0 . Note that there is an asymmetry in problems (P ) and (D). The constraints in (P ) and (D) are equality constraints, but in (P ) all variables are nonnegative, whereas in (D) we also have free variables, in y. 
Note that we could eliminate s in the formulation of (D), leaving us with the inequality constraints A^T y ≤ c, but this would not remove the asymmetry in the formulations. We could have avoided the asymmetry by using a different format for problem (P), but because the chosen format is more or less standard in the literature, we decided to use the standard format in this chapter and to accept its inherent asymmetry. Note that the asymmetry is also reflected in the KKT system. This is especially true for the first two equations, because the third equation is symmetric in x and s.
In this section we make an effort to show that it is quite easy to obtain a perfect symmetry in the formulations. This has some practical value. It implies that every concept, result or algorithm for one of the two problems has its natural counterpart for the other problem. It will also highlight the underlying geometry of an LO problem.
Let us define the linear space L as the null space of the matrix A:

    L = { x ∈ IR^n : Ax = 0 },          (5.15)

and let L^⊥ denote the orthogonal complement of L. Then, due to a well known result in linear algebra, L^⊥ is the row space of the matrix A, i.e.,

    L^⊥ = { A^T y : y ∈ IR^m }.          (5.16)

Now let x̄ be any vector satisfying Ax̄ = b. Then x is primal feasible if and only if x ∈ x̄ + L and x ∈ IR^n_+. So the primal problem can be reformulated as

    (P′)    min { c^T x : x ∈ (x̄ + L) ∩ IR^n_+ }.

So (P) amounts to minimizing the linear function c^T x over the intersection of the affine space x̄ + L and the nonnegative orthant IR^n_+.
We can put (D) in the same format by eliminating the vector y of free variables. To this end we observe that s ∈ IR^n is feasible for (D) if and only if s ∈ c + L^⊥ and s ∈ IR^n_+. Given any vector s ∈ c + L^⊥, let y be such that A^T y + s = c. Then

    b^T y = (Ax̄)^T y = x̄^T A^T y = x̄^T (c − s) = c^T x̄ − x̄^T s.          (5.17)

Omitting the constant c^T x̄, it follows that solving (D) is equivalent to solving the problem

    (D′)    min { x̄^T s : s ∈ (c + L^⊥) ∩ IR^n_+ }.

Thus we see that the dual problem amounts to minimizing the linear function x̄^T s over the intersection of the affine space c + L^⊥ and the nonnegative orthant IR^n_+. The similarity with reformulation (P′) is striking: both problems are minimization problems, the roles of the vectors x̄ and c are interchanged, and the underlying linear spaces are each other's orthogonal complement. An immediate consequence is also that the dual of the dual problem is the primal problem.^10 The KKT conditions can now be expressed in a way that is completely symmetric in x and s:

    x ∈ (x̄ + L) ∩ IR^n_+,      x > 0,
    s ∈ (c + L^⊥) ∩ IR^n_+,    s > 0,          (5.18)
    xs = µe.

^10 The affine spaces c + L^⊥ and x̄ + L intersect in a unique point ξ ∈ IR^n. Hence, we could even take c = x̄ = ξ.

Due to (5.17), we conclude that on the dual feasible region, b^T y and x̄^T s sum up to the constant c^T x̄.

5.9 Dual logarithmic barrier function

We conclude this chapter by introducing the dual logarithmic barrier function, using the symmetry that has now become apparent. Recall that for any positive µ the primal µ-center x(µ) has been characterized as the minimizer of the primal logarithmic barrier function g̃_µ(x), as given by (5.2):

    g̃_µ(x) = c^T x − µ Σ_{j=1}^n log x_j.

Using the symmetry, we obtain that the dual µ-center s(µ) can be characterized as the minimizer of the function

    h̃_µ(s) := x̄^T s − µ Σ_{j=1}^n log s_j,          (5.19)

where s runs through all positive dual feasible slack vectors. According to (5.17), we may replace x̄^T s by c^T x̄ − b^T y.
Omitting the constant cT x̄, it follows that (y(µ), s(µ)) is the minimizer of the function T kµ (y, s) = −b y − µ n X log sj . j=1 The last function is usually called the dual logarithmic barrier function. Recall that for any dual feasible pair (y, s), h̃µ (s) and kµ (y, s) differ by a constant only. It may often be preferable to use h̃µ (s), because then we only have to deal with the nonnegative slack vectors, and not with the free variable y. It will be convenient to refer also to h̃µ (s) as the dual logarithmic barrier function. From now on we assume that the interior point condition is satisfied, unless stated otherwise. As a consequence, both the primal and the dual logarithmic barrier functions have a minimizer, for each µ > 0. These minimizers are denoted by x(µ) and s(µ) respectively. 6 The Dual Logarithmic Barrier Method In the previous chapter we introduced the central path of a problem as the set consisting of all µ-centers, with µ running through all positive real numbers. Using this we can now easily describe the basic idea behind the logarithmic barrier method. We do so for the dual problem in standard format:  (D) max bT y : AT y + s = c, s ≥ 0 . Recall that any method for the dual problem can also be used for solving the primal problem, because of the symmetry discussed in Section 5.8. The dual problem has the advantage that its feasible region—in the y-space—can be drawn if its dimension is small enough (m = 1, 2 or 3). This enables us to illustrate graphically some aspects of the methods to be described below. 6.1 A conceptual method We assume that we know the µ-centers y(µ) and s(µ) for some positive µ = µ0 . Later on, in Chapter 8, we show that this assumption can be made without loss of generality. Given s(µ), the primal µ-center x(µ) follows from the relation x(µ)s(µ) = µe. Now the duality gap for the pair of µ-centers is given by cT x(µ) − bT y(µ) = x(µ)T s(µ) = nµ. The last equality follows since we have for each i that xi (µ)si (µ) = µ. It follows that if µ goes to zero, then the duality gap goes to zero as well. As a consequence we have that if µ is small enough, then the pair (y(µ), s(µ)) is ‘almost’ optimal for the dual problem. This can also be seen by comparing the dual objective value bT y(µ) with the optimal value of (D). Denoting the optimal value of (P ) and (D) by z ∗ we know from Proposition II.1 that bT y(µ) ≤ z ∗ ≤ cT x(µ), 108 II Logarithmic Barrier Approach so we have z ∗ − bT y(µ) ≤ cT x(µ) − bT y(µ) = x(µ)T s(µ) = nµ, and cT x(µ) − z ∗ ≤ cT x(µ) − bT y(µ) = x(µ)T s(µ) = nµ. Thus, if µ is chosen small enough, the primal objective value cT x(µ) and the dual objective value bT y(µ) can simultaneously be driven arbitrarily close to the optimal value. We thus have to deal with the question of how to obtain the µ-centers for small enough values of µ. Now let µ∗ be obtained from µ by µ∗ := (1 − θ) µ, where θ is a positive constant smaller than 1. We may expect that if θ is not too large, the µ∗ -centers will be close to the given µ-centers.1 For the moment, let us assume that we are able to calculate the µ∗ -centers, provided θ is not too large. Then the following conceptual algorithm can be used to find ε-optimal solutions of both (P ) and (D). Conceptual Logarithmic Barrier Algorithm Input: An accuracy parameter ε > 0; a barrier update parameter θ, 0 < θ < 1; the center (y(µ0 ), s(µ0 )) for some µ0 > 0. 
begin µ := µ0 ; while nµ ≥ ε do begin µ := (1 − θ)µ; s := s(µ); end end Recall that, given the dual center s(µ), the primal center x(µ) can be calculated immediately from the centering condition at µ. Hence, the output of this algorithm is a feasible primal-dual pair of solutions for (P ) and (D) such that the duality gap does not exceed ε. How many iterations are needed by the algorithm? The answer is provided by the following lemma. 1 This is a consequence of the fact that the µ-centers depend continuously on the barrier parameter µ, due to a result of Fiacco and McCormick [77]. See also Chapter 16. II.6 Dual Logarithmic Barrier Method 109 Lemma II.17 If the barrier parameter µ has the initial value µ0 and is repeatedly multiplied by 1 − θ, with 0 < θ < 1, then after at most   1 nµ0 log θ ε iterations we have nµ ≤ ε. Proof: Initially the duality gap is nµ0 , and in each iteration it is reduced by the factor 1 − θ. Hence, after k iterations the duality gap is smaller than ε if (1 − θ)k nµ0 ≤ ε. The rest of the proof goes in the same as in the proof of Lemma I.36. Taking logarithms we get k log (1 − θ) + log(nµ0 ) ≤ log ε. Since − log (1 − θ) ≥ θ, this certainly holds if kθ ≥ log(nµ0 ) − log ε = log This implies the lemma. nµ0 . ε ✷ To make the algorithm more practical, we have to avoid the exact calculation of the µ-center s(µ). This is the subject of the following sections. 6.2 Using approximate centers Recall that any µ-center is the minimizer for the corresponding logarithmic barrier function. Therefore, by minimizing the corresponding logarithmic barrier function we will find the µ-center. Since the logarithmic barrier function has a positive definite Hessian, Newton’s method is a natural candidate for this purpose. If we know the µ-center, then defining µ∗ by µ∗ := (1 − θ)µ, just as in the preceding section, we can move to the µ∗ -center by applying Newton’s method to the logarithmic barrier function corresponding to µ∗ , starting at the µ-center. Having reached the µ∗ -center, we can repeat this process until the barrier parameter has become small enough. In fact this would yield an implementation of the conceptual algorithm of the preceding section. Unfortunately, however, after the update of the barrier parameter to µ∗ , to find the µ∗ -center exactly infinitely many Newton steps are needed. To restrict the number of Newton steps between two successive updates of the barrier parameter, we do not calculate the µ∗ -center exactly, but instead use an approximation of it. Our first aim is to show that this can be done in such a way that only one Newton step is taken between two successive updates of the barrier parameter. Later on we deal with a different approach where the number of Newton steps between two successive updates of the barrier parameter may be larger than one. In the following sections we are concerned with a more detailed analysis of the use of approximate centers. In the analysis we need to measure the proximity of an approximate center to the exact center. We also have to study the behavior of 110 II Logarithmic Barrier Approach Newton’s method when applied to the logarithmic barrier function. We start in the next section with the calculation of the Newton step. Then we proceed to defining a proximity measure and deal with some related properties. After this we can formulate the algorithm, and analyze it. 
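Before turning to the Newton step itself, it may be helpful to see the iteration count of Lemma II.17 confirmed by direct simulation. The small sketch below (plain Python; the function name and the parameter values are illustrative only) counts the number of reductions µ := (1 − θ)µ needed before nµ ≤ ε and compares it with the bound ⌈(1/θ) log(nµ^0/ε)⌉.

```python
import math

def reductions_needed(n, mu0, eps, theta):
    """Count the updates mu := (1 - theta) * mu performed until n * mu <= eps."""
    mu, k = mu0, 0
    while n * mu > eps:
        mu *= 1.0 - theta
        k += 1
    return k

n, mu0, eps = 3, 2.0, 1e-4
for theta in (0.5, 1.0 / (3.0 * math.sqrt(n)), 0.1):
    k = reductions_needed(n, mu0, eps, theta)
    bound = math.ceil(math.log(n * mu0 / eps) / theta)   # bound of Lemma II.17
    print(f"theta = {theta:.4f}: {k} reductions, bound {bound}")
```

The actual count never exceeds the bound, in line with the inequality −log(1 − θ) ≥ θ used in the proof of Lemma II.17.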
6.3 Definition of the Newton step In this section we assume that we are given a dual feasible pair (y, s), and, by applying Newton’s method to the dual logarithmic barrier function corresponding to the barrier parameter value µ, we try to find the minimizer of this function, which is the pair (y(µ), s(µ)). Recall that the dual logarithmic barrier function is the function kµ (y, s) defined by n X log si , kµ (y, s) := −bT y − µ i=1 where (y, s) runs through all dual feasible pairs with positive slack vector s. Recall also that y and s are related by the dual feasibility condition AT y + s = c, s ≥ 0, and since we assume that A has full rank, this defines a one-to-one correspondence between the components y and s in dual feasible pairs. As a consequence, we can consider kµ (y, s) as a function of s alone. In Section 5.8 we showed that kµ (y, s) differs only by the constant cT x̄ from h̃µ (s) = x̄T s − µ n X log sj , j=1 provided Ax̄ = b. Our present aim is to compute the minimizer s(µ) of h̃µ (s). Assuming s 6= s(µ), we construct a search direction by applying Newton’s method to h̃µ (s). We first calculate the first and second derivatives of h̃µ (s) with respect to s, namely ∇h̃µ (s) = x̄ − µs−1 , ∇2 h̃µ (s) = µS −2 , where, as usual, S = diag (s). The Newton step ∆s — in the s-space — is the minimizer of the second-order approximation of h̃µ (s + ∆s) at s, which is given by t(∆s) := h̃µ (s) + x̄ − µs−1 T 1 ∆s + ∆sT µS −2 ∆s, 2 subject to the condition that s+ ∆s is dual feasible. The latter means that there exists ∆y such that AT (y + ∆y) + s + ∆s = c. Since AT y + s = c, this is equivalent to AT ∆y + ∆s = 0 II.6 Dual Logarithmic Barrier Method 111 for some ∆y. We make use of an (n − m) × n matrix H whose null space is equal to the row space of A. Then the condition on ∆s simply means that H∆s = 0, which is equivalent to ∆s ∈ null space of H. Using Proposition A.1, we find that ∆s minimizes t(∆s) if and only if ∇t(∆s) = x̄ − µs−1 + µs−2 ∆s ⊥ null space of H. It is useful to restate these conditions in terms of the matrix HS:2 sx̄ − µe + µs−1 ∆s ⊥ null space of HS, and Therefore, writing µs−1 ∆s ∈ null space of HS.  sx̄ − µe = −µs−1 ∆s + sx̄ − µe + µs−1 ∆s , we have a decomposition of the vector sx̄ − µe into two components, with the first component in the null space of HS and the second component orthogonal to the null space of HS. Stated otherwise, µs−1 ∆s is the orthogonal projection of µe − sx̄ into the null space of HS. Hence we have shown that µs−1 ∆s = PHS (µe − sx̄) . (6.1) From this relation the Newton step ∆s can be calculated. Since the projection matrix PHS 3 is given by −1 HS, PHS = I − SH T HS 2 H T we obtain the following expression for ∆s:    −1 sx̄ . HS e− ∆s = s I − SH T HS 2 H T µ Recall that x̄ may be any vector such that Ax̄ = b. It follows that the right-hand side in (6.1) must be independent of x̄. It is left to the reader to verify that this is indeed true.4,5,6 We are now going to explore this in a surprising way with extremely important consequences. 2 3 Exercise 33 Let S be a square and nonsingular matrix and H be any other matrix such that the product HS is well defined. Then x ∈ null space of H if and only if S −1 x ∈ null space of HS, and x ⊥ null space of H if and only if Sx ⊥ null space of HS T . Prove this. For any matrix Q the matrix of the orthogonal projection onto the null space of Q is denoted as PQ . 4 Exercise 34 Show that PHS (s∆x) = 0 whenever A∆x = 0. 5 Exercise 35 The Newton step in the y-space is given by ∆y = AS −2 AT Prove this. 
(Hint: Use that AT ∆y + ∆s = 0.) 6 −1  b µ  − AS −1 e . Observe that the computation of ∆s requires the inversion of the matrix HS 2 H T , and the computation of ∆y the inversion of the matrix AS −2 AT . It is not clear in general which of the two inversions is more attractive from a computational point of view. 112 II Logarithmic Barrier Approach If we let x̄ run through the affine space Ax̄ = b then the vector µe − sx̄ runs through another affine space that is parallel to the null space of AS −1 . Now using that null space of AS −1 = row space of HS, we conclude that the affine space consisting of all vectors µe − sx̄, with Ax̄ = b, is orthogonal to the null space of HS. This implies that these two spaces intersect in a unique point. Hence there exists a unique vector x̄ satisfying Ax̄ = b such that µe − sx̄ belongs to the null space of HS. We denote this vector as x(s, µ). From its definition we have PHS (µe − sx(s, µ)) = µe − sx(s, µ), thus yielding the following expression for the Newton step: µs−1 ∆s = µe − sx(s, µ). (6.2) null space of AS −1 = row space of HS Figure 6.1 depicts the situation. ✛ {µe − sx̄ : Ax̄ = b} null space of HS ■ µ ∆s s = µe − sx(s, µ) Figure 6.1 The projection yielding s−1 ∆s. Another important feature of the vector x(s, µ) is that it minimizes the 2-norm of µe − sx̄ in the affine space Ax̄ = b. Hence, x(s, µ) can be characterized by the property x(s, µ) = argminx {kµe − sxk : Ax = b} . (6.3) We summarize these results in a theorem. Theorem II.18 Let s be any positive dual feasible slack vector. Then the Newton step ∆s at s with respect to the dual logarithmic barrier function corresponding to the barrier parameter value µ satisfies (6.2), with x(s, µ) as defined in (6.3). II.6 Dual Logarithmic Barrier Method 6.4 113 Properties of the Newton step We denote the result of the Newton step at s by s+ . Thus we may write  s+ := s + ∆s = s e + s−1 ∆s . A major question is whether s+ is feasible or not. Another important question is whether x(s, µ) is primal feasible. In this section we deal with these two questions, and we show that both questions allow a perfect answer. We start with the feasibility of s+ . Clearly, s+ is feasible if and only if s+ is nonnegative, and this is true if and only if e + s−1 ∆s ≥ 0. (6.4) We conclude that the (full) Newton step is feasible if (6.4) is satisfied. Let us now consider the vector x(s, µ). By definition, it satisfies the equation Ax = b, so if it is nonnegative, then x(s, µ) is primal feasible. We can derive a simple condition for that. From (6.2) we obtain that  x(s, µ) = µs−1 e − s−1 ∆s . (6.5) We conclude that x(s, µ) is primal feasible if and only if e − s−1 ∆s ≥ 0. (6.6) Combining this result with (6.4) we state the following lemma. Lemma II.19 If the Newton step ∆s satisfies −e ≤ s−1 ∆s ≤ e then x(s, µ) is primal feasible, and s+ = s + ∆s is dual feasible. Remark II.20 We make an interesting observation. Since s is positive, (6.6) is equivalent to s − ∆s ≥ 0. Note that s − ∆s is obtained by moving from s in the opposite direction of the Newton step. Thus we conclude that x(s, µ) is primal feasible if and only if a backward Newton step yields a dual feasible point for the dual problem. We conclude this section by considering the special case where ∆s = 0. From (6.2) we deduce that this occurs if and only if sx(s, µ) = µe, i.e., if and only if s and x(s, µ) satisfy the centering condition with respect to µ. Since s and x(s, µ) are positive, they satisfy the KKT conditions. 
Now the unicity property gives us that x(s, µ) = x(µ) and s = s(µ). Thus we see that the Newton step at s is equal to the zero vector if and only if s = s(µ). This could have been expected, because s(µ) is the minimizer of the dual logarithmic barrier function. 114 6.5 II Logarithmic Barrier Approach Proximity and local quadratic convergence Lemma II.19 in the previous section states under what conditions the Newton step yields feasible solutions on both the dual and the primal side. This turned out to be the case when −e ≤ s−1 ∆s ≤ e. Observe that these inequalities can be rephrased simply by saying that the infinity norm of the vector s−1 ∆s does not exceed 1. We refer to s−1 ∆s as the Newton step ∆s scaled by s, or, in short, the scaled Newton step at s. In the analysis of the logarithmic barrier method we need a measure for the ‘distance’ of s to the µ-center s(µ). The above observation might suggest that the infinity norm of the scaled Newton step could be used for that purpose. However, it turns out to be more convenient to use the 2-norm of the scaled Newton step. So we measure the proximity of s to s(µ) by the quantity7 δ(s, µ) := s−1 ∆s . (6.7) At the end of the previous section we found that the Newton step ∆s vanishes if and only if s is equal to s(µ). As a consequence we have δ(s, µ) = 0 ⇐⇒ s = s(µ). The obvious question that we have to deal with is about the improvement in the proximity to s(µ) after a feasible Newton step. The next theorem provides a very elegant answer to this question. In the proof of this theorem we need a different characterization of the proximity δ(s, µ), which is an immediate consequence of Theorem II.18, namely δ(s, µ) = e − 1 sx(s, µ) = min {kµe − sxk : Ax = b} . µ µ x (6.8) We have the following result. Theorem II.21 If δ(s, µ) ≤ 1, then x(s, µ) is primal feasible, and s+ = s + ∆s is dual feasible. Moreover, δ(s+ , µ) ≤ δ(s, µ)2 . Proof: The first part of the theorem is an obvious consequence of Lemma II.19, because the infinity norm of s−1 ∆s does not exceed its 2-norm and hence does not exceed 1. Now let us turn to the proof of the second statement. Using (6.8) we write δ(s+ , µ) = 7  1 min µe − s+ x µ x : Ax = b . Exercise 36 If s = s(µ) then we know that µs−1 is primal feasible. Now let δ = δ(s, µ) > 0 and consider x = µs−1 . Let Q = AS −2 AT . Then Q is positive definite, and so is its inverse. Hence Q−1 defines a norm that we denote as k.kQ−1 . Thus, for any z ∈ IRn : kzkQ−1 = p z T Q−1 z. Measuring the amount of infeasibility of x in the sense of this norm, prove that kAx − bkQ−1 = µδ. II.6 Dual Logarithmic Barrier Method 115 Substituting for x the vector x(s, µ) we obtain the inequality δ(s+ , µ) ≤ 1 µ µe − s+ x(s, µ) . (6.9) The vector µe − s+ x(s, µ) can be reduced as follows: µe − s+ x(s, µ) = µe − (s + ∆s) x(s, µ) = µe − sx(s, µ) − ∆sx(s, µ). From (6.2) this implies µe − s+ x(s, µ) = µs−1 ∆s− ∆sx(s, µ) = (µe − sx(s, µ)) s−1 ∆s = µ s−1 ∆s Thus we obtain, by substitution of this equality in (6.9), δ(s+ , µ) ≤ s−1 ∆s 2 ≤ s−1 ∆s ∞ 2 . (6.10) s−1 ∆s . Now from the obvious inequality kzk∞ ≤ kzk, with z = s−1 ∆s, the result follows. ✷ Theorem II.21 implies that after a Newton step the proximity to the µ-center is smaller than the square of the proximity before the Newton step. In other words, Newton’s method is quadratically convergent. Moreover, the theorem defines a neighborhood of the µ-center s(µ) where the quadratic convergence occurs, namely {s ∈ D : δ(s, µ) < 1} . (6.11) This result is extremely important. 
It implies that when the present iterate s is close to s(µ), then only a small number of Newton steps brings us very close to s(µ). For instance, if δ(s, µ) = 0.5, then only 6 Newton steps yield an iterate with proximity less than 10^{-16}. Figure 6.2 shows a graph depicting the required number of steps to reach proximity 10^{-16} when starting at any given value of the proximity in the interval (0, 1).

Figure 6.2 Required number of Newton steps to reach proximity 10^{-16}, as a function of the initial proximity δ(s, µ).

We can also consider it differently. If we repeatedly apply Newton steps, starting at s^0 = s, then after k Newton steps the resulting point, denoted by s^k, satisfies

    δ(s^k, µ) ≤ δ(s^0, µ)^{2^k}.

Hence, taking logarithms on both sides,

    −log δ(s^k, µ) ≥ −2^k log δ(s^0, µ),

see Figure 6.3 (page 116).

Figure 6.3 Convergence rate of the Newton process: lower bound for (−log δ(s^k, µ)) / (−log δ(s^0, µ)) as a function of the iteration number k.

The above algebraic proof of the quadratic convergence property is illustrated geometrically by Figure 6.4 (page 117). Like in Figure 6.1, in Figure 6.4 the null space of HS and the row space of HS are represented by perpendicular axes. From (6.1) we know that the orthogonal projection of any vector µe − sx, with Ax = b, into the null space of HS yields µs^{-1}∆s. Hence the norm of this projection is equal to µδ(s, µ). In other words, µδ(s, µ) is equal to the Euclidean distance from the affine space {µe − sx : Ax = b} to the origin. Therefore, the proximity after the Newton step, given by µδ(s^+, µ), is the Euclidean distance from the affine space {µe − s^+x : Ax = b} to the origin. The affine space {µe − s^+x : Ax = b} contains the vector µe − s^+x(s, µ), which is equal to µ(s^{-1}∆s)^2, from (6.10). Hence, µδ(s^+, µ) does not exceed the norm of this vector.

Figure 6.4 The proximity before and after a Newton step.

The properties of the proximity measure δ(s, µ) described in Theorem II.21 are illustrated graphically in the next example. In this example we draw some level curves for the proximity measure for some fixed value of the barrier parameter µ, and we show how the Newton step behaves when applied at some points inside and outside the region of quadratic convergence, as given by (6.11). We do this for some simple problems.

Example II.22 First we take A and c as in Example II.7 on page 97, and b = (0, 1)^T. Figure 5.2 (page 98) shows the feasible region and the central path. In Figure 6.5 we have added some level curves for δ(s, 2). We have also depicted the Newton step at several points in the feasible region. The respective starting points are indicated by the symbol 'o', and the resulting point after a Newton step by the symbol '∗'; the two points are connected by a straight line to indicate the Newton step.

Figure 6.5 Demonstration no. 1 of the Newton process.

Note that, in agreement with Theorem II.21, when starting within the region of quadratic convergence, i.e., when δ(s, µ) < 1, the Newton step is not only feasible, but there is a significant decrease in the proximity to the 2-center.
Also, when starting outside the region of quadratic convergence, i.e., when δ(s, µ) ≥ 1, it may happen that the Newton step leaves the feasible region. In Figure 6.6 we depict similar results for the problem defined in Example II.7 with b = (1, 1)^T.

Figure 6.6 Demonstration no. 2 of the Newton process.

Finally, Figure 6.7 depicts the situation for a new, less regular, problem, with m = 2 and n = 7; its constraint matrix has the rows

    (−2, 1, 1, 0, 1, −1, 0)   and   (1, 1, 1, −1, 1, 0, 0).

Figure 6.7 Demonstration no. 3 of the Newton process.

This figure makes clear that after a Newton step the proximity to the 2-center may increase. Concluding this example, we may state that inside the region of quadratic convergence our proximity measure provides perfect control over the Newton process, but outside this region it has little value. ♦

6.6 The duality gap close to the central path

A nice feature of the µ-center s = s(µ) is that the vector x = µs^{-1} is primal feasible, and the duality gap for the primal-dual pair (x, s) is given by nµ. One might ask about the situation when s is close to s(µ). The next theorem provides a satisfactory answer. It states that for small values of the proximity δ(s, µ) the duality gap for the pair (x(s, µ), s) is close to the gap for the µ-centers.

Theorem II.23 Let δ := δ(s, µ) ≤ 1. Then the duality gap for the primal-dual pair (x(s, µ), s) satisfies

    nµ(1 − δ) ≤ s^T x(s, µ) ≤ nµ(1 + δ).

Proof: From Theorem II.21 we know that x(s, µ) is primal feasible. Hence, for the duality gap we have

    s^T x(s, µ) = s^T ( µs^{-1}(e − s^{-1}∆s) ) = µ e^T (e − s^{-1}∆s).

Since the coordinates of the vector e − s^{-1}∆s lie in the interval [1 − δ, 1 + δ], the result follows. ✷

Remark II.24 The above estimate for the duality gap is not as sharp as it could be, but is sufficient for our goal. Nevertheless, we want to point out that the Cauchy–Schwarz inequality gives stronger bounds. We have

    s^T x(s, µ) = µ e^T (e − s^{-1}∆s) = nµ − µ e^T (s^{-1}∆s).

Hence

    | s^T x(s, µ) − nµ | = µ | e^T (s^{-1}∆s) | ≤ µ ‖e‖ ‖s^{-1}∆s‖ = µ√n δ,

and it follows that

    nµ( 1 − δ/√n ) ≤ s^T x(s, µ) ≤ nµ( 1 + δ/√n ).   •

6.7 Dual logarithmic barrier algorithm with full Newton steps

We can now describe an algorithm using approximate centers. We assume that we are given a pair (y^0, s^0) ∈ D and a µ^0 > 0 such that (y^0, s^0) is close to the µ^0-center in the sense of the proximity measure δ(s^0, µ^0). In the algorithm the barrier parameter monotonically decreases from the initial value µ^0 to some small value determined by the desired accuracy. In the algorithm we denote by p(s, µ) the Newton step ∆s at s ∈ D^+ to emphasize the dependence on the barrier parameter µ.

Dual Logarithmic Barrier Algorithm with full Newton steps

Input:
  a proximity parameter τ, 0 ≤ τ < 1;
  an accuracy parameter ε > 0;
  (y^0, s^0) ∈ D and µ^0 > 0 such that δ(s^0, µ^0) ≤ τ;
  a fixed parameter θ, 0 < θ < 1.

begin
  s := s^0; µ := µ^0;
  while nµ ≥ (1 − θ)ε do
  begin
    s := s + p(s, µ);
    µ := (1 − θ)µ;
  end
end

We prove the following theorem.

Theorem II.25 If τ = 1/√2 and θ = 1/(3√n), then the Dual Logarithmic Barrier Algorithm with full Newton steps requires at most

    ⌈ 3√n log (nµ^0 / ε) ⌉

iterations. The output is a primal-dual pair (x, s) such that x^T s ≤ 2ε.
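Before entering the convergence analysis, here is a compact numerical sketch of the above algorithm (NumPy assumed; the function name, the assertion and the final print are illustrative choices). The Newton step is computed in the y-space via the formula of Exercise 35, ∆y = (AS^{-2}A^T)^{-1}(b/µ − AS^{-1}e) with ∆s = −A^T∆y, and the data are those of Example II.7 with b = (1, 1), started at y = (0, 0) and µ^0 = 2, as in the illustration of Section 6.7.2 below. With ε = 10^{-4} the loop performs the 53 iterations reported there and returns a pair with duality gap below 2ε.

```python
import numpy as np

# Data of Example II.7 with b = (1, 1); starting point y = (0, 0), mu0 = 2.
A = np.array([[1.0, -1.0, 0.0],
              [0.0,  0.0, 1.0]])
b = np.array([1.0, 1.0])
c = np.array([1.0, 1.0, 1.0])
m, n = A.shape

def newton_step(y, mu):
    """Full Newton step for the dual barrier at (y, mu):
    dy = (A S^-2 A^T)^-1 (b/mu - A s^-1), ds = -A^T dy (Exercise 35)."""
    s = c - A.T @ y                          # dual slack, must be positive
    M = (A / s**2) @ A.T                     # A S^{-2} A^T
    dy = np.linalg.solve(M, b / mu - A @ (1.0 / s))
    return dy, -A.T @ dy, s

eps, tau = 1e-4, 1.0 / np.sqrt(2.0)
theta = 1.0 / (3.0 * np.sqrt(n))
y, mu, iters = np.zeros(m), 2.0, 0

while n * mu >= (1.0 - theta) * eps:
    dy, ds, s = newton_step(y, mu)           # Newton step at the current mu ...
    assert np.linalg.norm(ds / s) <= tau     # ... taken only while delta(s, mu) <= tau
    y, mu, iters = y + dy, (1.0 - theta) * mu, iters + 1

dy, ds, s = newton_step(y, mu)
x = (mu / s) * (1.0 - ds / s)                # x(s, mu), see (6.5)
print(iters, x @ s)                          # duality gap x^T s <= 2*eps (Theorem II.25)
```

The assertion documents the invariant maintained by the analysis: after every update of the barrier parameter the proximity stays below τ, so each full Newton step is taken inside the region of quadratic convergence.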
6.7.1 Convergence analysis The proof of Theorem II.25 depends on the following lemma. The lemma generalizes Theorem II.21 to the case where, after the Newton step corresponding to the barrier parameter value µ, the barrier parameter is updated to µ+ = (1 − θ)µ. Taking θ = 0 in the lemma we get back the result of Theorem II.21. Lemma II.26 8 Assuming δ(s, µ) ≤ 1, let s+ be obtained from s by moving along the Newton step ∆s = p(s, µ) at s corresponding to the barrier parameter value µ, and let µ+ = (1 − θ)µ. Then we have δ(s+ , µ+ )2 ≤ δ(s, µ)4 + θ2 n . (1 − θ)2 Proof: By definition we have δ(s+ , µ+ ) =  + 1 µ e − s+ x min µ+ x : Ax = b Substituting for x the vector x(s, µ) we obtain the inequality: δ(s+ , µ+ ) ≤ 1 µ+ µ+ e − s+ x(s, µ) = e − s+ x(s, µ) . µ(1 − θ) From (6.10) we deduce that  2  . s+ x(s, µ) = µ e − s−1 ∆s Substituting this, while simplifying the notation by using h := s−1 ∆s, we get δ(s+ , µ+ ) ≤ e −  e − h2 θ = h2 − e − h2 . 1−θ 1−θ (6.12) To further simplify the notation we replace θ/ (1 − θ) by ρ. Then taking squares of both sides in the last inequality we obtain δ(s+ , µ+ )2 ≤ h2 2 − 2ρ h2 Since khk = δ(s, µ) ≤ 1 we have T  e − h 2 + ρ2 e − h 2 2 . 0 ≤ e − h2 ≤ e. Hence we have h2 8 T  e − h2 ≥ 0, e − h2 2 ≤ kek2 . This lemma and its proof are due to Ling [182]. It improves estimates used by Roos and Vial [245]. 122 II Logarithmic Barrier Approach 2 Using this, and also that kek = n, we obtain δ(s+ , µ+ )2 ≤ h2 2 2 4 + ρ2 kek ≤ khk + ρ2 n = δ(s, µ)4 + ρ2 n, thus proving the lemma. ✷ Remark II.27 It may be noted that a weaker result can be obtained in a more simple way by applying the triangle inequality to (6.12). This yields δ(s+ , µ+ ) ≤ h2 + √ θ n θ e − h2 ≤ δ(s, µ)2 + . 1−θ 1−θ This result is strong enough to derive a polynomial iteration bound, but the resulting bound will be slightly weaker than the one in Theorem II.25. • √ The proof of Theorem II.25 goes now as follows. Taking θ = 1/(3 n), we have √ 1 θ n 3 ≤ = 1−θ 1 − 3√1 n 1 3 2 3 = 1 . 2 Hence, applying the lemma, we obtain 1 δ(s+ , µ+ )2 ≤ δ(s, µ)4 + . 4 √ Therefore, if δ(s, µ) ≤ τ = 1/ 2, then we obtain δ(s+ , µ+ )2 ≤ 1 1 1 + = , 4 4 2 √ which implies that δ(s+ , µ+ ) ≤ 1/ 2 = τ. Thus it follows that after each iteration of the algorithm the property δ(s, µ) ≤ τ is maintained. The iteration bound in the theorem is an immediate consequence of Lemma I.36. Finally, if s is the dual iterate at termination of the algorithm, and µ the value of the barrier parameter, then with x = x(s, µ), Theorem II.23 yields sT x(s, µ) ≤ nµ (1 + δ(s, µ)) ≤ nµ (1 + τ ) ≤ 2nµ. Since upon termination we have nµ ≤ ε, it follows that sT x(s, µ) ≤ 2ε. This completes the proof of the theorem. ✷ 6.7.2 Illustration of the algorithm with full Newton steps In this section we start with a straightforward application of the logarithmic barrier algorithm. After that we devote some sections to modifications of the algorithm that increase the practical efficiency of the algorithm without destroying the theoretical iteration bound. II.6 Dual Logarithmic Barrier Method 123 As an example we solve the problem with A and c as in Example II.7, and with bT = (1, 1). Written out, the (dual) problem is given by max {y1 + y2 : −1 ≤ y1 ≤ 1, y2 ≤ 1} . and the primal problem is min {x1 + x2 + x3 : x1 − x2 = 1, x3 = 1, x ≥ 0} . We can start the algorithm at y = (0, 0) and µ = 2, because we then have s = (1, 1, 1) and, since x = (2, 1, 1) is primal feasible,   0 sx 1 − e =  − 21  = √ . 
δ(s, µ) ≤ µ 2 − 12 With ε = 10−4 , the dual logarithmic barrier algorithm needs 53 iterations to generate the primal feasible solution x = (1.000015, 0.000015, 1.000000) and the dual feasible pair (y, s) with y = (0.999971, 0.999971) and s = (0.000029, 1.999971, 0.000029). The respective objective values are cT x = 2.000030 and bT y = 1.999943, and the duality gap is 0.000087. Table 6.1. (page 124) shows some quantities generated in the course of the algorithm. For each iteration the table shows the values of nµ, the first coordinate of x(s, µ), the coordinates of y, the first coordinate of s, the proximity δ = δ(s, µ) before and the proximity δ + = δ(s+ , µ) after the Newton step at y to the µ-center, and, in the last column, the barrier update parameter θ, which is constant in this example. The columns for δ and δ + in Table 6.1. are of special interest. They make clear that the behavior of the algorithm differs from what might be expected. The analysis was based √ on the idea of maintaining the proximity of the iterates below the value τ = 1/ 2 = 0.7071, so as to stay in the region where Newton’s method is very efficient. Therefore we updated the barrier parameter in such a way that just before the Newton step, i.e., just after the update of the barrier parameter, the proximity should reach the value τ . The table makes clear that in reality the proximity takes much smaller values (soon after the start). Asymptotically the proximity before the Newton step is always 0.2721 and after the Newton step 0.0524. This can also be seen from Figure 6.8, which shows the relevant part of the feasible region and the central path. The points y are indicated by small circles and the exact µ-centers as asterisks. The above observation becomes very clear in this figure: soon after the start the circles and the asterisks can hardly be distinguished. The figure also shows at each iteration the region where the proximity is smaller than τ , thus indicating the space where we are allowed to move without leaving the region of quadratic convergence. Instead of using this space the algorithm moves in a very narrow neighborhood of the central path. 6.8 A version of the algorithm with adaptive updates The example in the previous section has been discussed in detail in the hope that the reader will now understand that there is an easy way to reduce the number of iterations 124 It. 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 II Logarithmic Barrier Approach nµ 6.000000 4.845299 3.912821 3.159798 2.551695 2.060621 1.664054 1.343807 1.085191 0.876346 0.707693 0.571498 0.461513 0.372695 0.300969 0.243048 0.196273 0.158500 0.127997 0.103364 0.083472 0.067407 0.054435 0.043959 0.035499 0.028667 0.023150 0.018695 0.015097 0.012192 0.009845 0.007951 0.006421 0.005185 0.004187 0.003381 0.002731 0.002205 0.001781 0.001438 0.001161 0.000938 0.000757 0.000612 0.000494 0.000399 0.000322 0.000260 0.000210 0.000170 0.000137 0.000111 0.000089 0.000072 x1 2.500000 2.255388 1.969957 1.749168 1.578422 1.447319 1.347011 1.270294 1.211482 1.166207 1.131170 1.103907 1.082581 1.065815 1.052577 1.042085 1.033741 1.027088 1.021771 1.017513 1.014098 1.011356 1.009152 1.007378 1.005950 1.004800 1.003873 1.003125 1.002522 1.002036 1.001643 1.001327 1.001071 1.000865 1.000698 1.000564 1.000455 1.000368 1.000297 1.000240 1.000194 1.000156 1.000126 1.000102 1.000082 1.000066 1.000054 1.000043 1.000035 1.000028 1.000023 1.000018 1.000015 1.000015 y1 0.000000 0.250000 0.285497 0.342068 0.403015 0.467397 0.532510 0.595745 0.654936 0.708650 0.756184 0.797423 0.832650 0.862383 0.887244 0.907881 0.924914 0.938910 0.950370 0.959728 0.967352 0.973553 0.978589 0.982674 0.985986 0.988668 0.990839 0.992596 0.994017 0.995165 0.996094 0.996845 0.997451 0.997941 0.998337 0.998657 0.998915 0.999124 0.999292 0.999429 0.999539 0.999627 0.999699 0.999757 0.999804 0.999841 0.999872 0.999897 0.999917 0.999933 0.999946 0.999956 0.999964 0.999971 Table 6.1. y2 0.000000 −0.500000 −0.606897 −0.234058 −0.022234 0.184083 0.337370 0.466322 0.568477 0.651736 0.718677 0.772849 0.816552 0.851862 0.880369 0.903393 0.921985 0.936999 0.949123 0.958915 0.966821 0.973207 0.978363 0.982527 0.985890 0.988605 0.990798 0.992569 0.993999 0.995154 0.996087 0.996840 0.997448 0.997939 0.998336 0.998656 0.998915 0.999124 0.999292 0.999428 0.999538 0.999627 0.999699 0.999757 0.999804 0.999841 0.999872 0.999897 0.999917 0.999933 0.999946 0.999956 0.999964 0.999971 s1 1.000000 0.750000 0.714503 0.657932 0.596985 0.532603 0.467490 0.404255 0.345064 0.291350 0.243816 0.202577 0.167350 0.137617 0.112756 0.092119 0.075086 0.061090 0.049630 0.040272 0.032648 0.026447 0.021411 0.017326 0.014014 0.011332 0.009161 0.007404 0.005983 0.004835 0.003906 0.003155 0.002549 0.002059 0.001663 0.001343 0.001085 0.000876 0.000708 0.000571 0.000461 0.000373 0.000301 0.000243 0.000196 0.000159 0.000128 0.000103 0.000083 0.000067 0.000054 0.000044 0.000036 0.000029 δ 0.6124 0.0901 0.2491 0.2003 0.2334 0.2285 0.2406 0.2438 0.2500 0.2537 0.2574 0.2601 0.2624 0.2643 0.2658 0.2670 0.2680 0.2688 0.2695 0.2700 0.2704 0.2707 0.2710 0.2712 0.2714 0.2716 0.2717 0.2718 0.2718 0.2719 0.2720 0.2720 0.2720 0.2721 0.2721 0.2721 0.2721 0.2721 0.2721 0.2721 0.2721 0.2721 0.2721 0.2722 0.2722 0.2722 0.2722 0.2722 0.2722 0.2722 0.2722 0.2722 0.2722 − Output of the dual full-step algorithm. 
δ+ 0.2509 0.0053 0.0540 0.0303 0.0420 0.0379 0.0416 0.0421 0.0441 0.0453 0.0467 0.0477 0.0486 0.0493 0.0499 0.0504 0.0508 0.0511 0.0513 0.0515 0.0517 0.0518 0.0519 0.0520 0.0521 0.0521 0.0522 0.0522 0.0523 0.0523 0.0523 0.0523 0.0523 0.0523 0.0523 0.0524 0.0524 0.0524 0.0524 0.0524 0.0524 0.0524 0.0524 0.0524 0.0524 0.0524 0.0524 0.0524 0.0524 0.0524 0.0524 0.0524 0.0524 − θ 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 0.1925 − II.6 Dual Logarithmic Barrier Method 125 1 y2 0.5 ✻ 0 −0.5 ② −1 −1.5 −2 ② δ(s, 2(1 − θ)2 ) = τ ② δ(s, 2(1 − θ)) = τ δ(s, 2) = τ −2.5 ✛ −3 0 central path 0.5 1 ✲ y1 Figure 6.8 Iterates of the dual logarithmic barrier algorithm. required by the algorithm without losing the quality of the solution guaranteed by Theorem II.25. The obvious way to reach this goal is to make larger updates of the barrier parameter while keeping the iterates in the region of quadratic convergence. This is called the adaptive-update strategy,9 which we discuss in the next section. After that we deal with a more greedy approach, using larger updates of the barrier parameter, and in which we may leave temporarily the region of quadratic convergence. This is the so-called large-update strategy. The analysis of the large-update strategy cannot be based on the proximity measure δ(y, µ) alone, because outside the region of quadratic convergence this measure has no useful meaning. But, as we shall see, there exists a different way of measuring the progress of the algorithm in that case. 6.8.1 An adaptive-update variant Observe that the iteration bound of Theorem II.25 was obtained by requiring that after each update of the barrier parameter µ the proximity satisfies δ(s, µ) ≤ τ. (6.13) In order to make clear how this observation can be used to improve the performance of the algorithm without losing the iteration bound of Theorem II.25, let us briefly recall the idea behind the proof of this theorem. At the start of an iteration we are given s and µ such that (6.13) holds. We then make a Newton step to the µ-center, 9 The adaptive-update strategy was first proposed by Ye [303]. See also Roos and Vial [245]. 126 II Logarithmic Barrier Approach ■ central path y(µ+ ) ✛ y(µ) (y + , s+ ) ✛ Figure 6.9 ✛ δ(s+ , µ+ ) = τ δ(s+ , µ) = τ δ(s+ , µ) = τ 2 The idea of adaptive updating. which yields s+ , and we have δ(s+ , µ) ≤ τ 2 . (6.14) Then we update µ to a smaller value µ+ = (1 − θ)µ such that δ(s+ , µ+ ) ≤ τ, (6.15) and we start the next iteration. Our estimates in √ the proof of Theorem II.25 were such that it has become clear that the value θ = 1/(3 n) guarantees that (6.15) will hold. But from the example in the previous section we know that actually the new proximity may be much smaller than τ . In other words, it may well happen that using the given value of θ we start the next iteration with an s+ and a µ+ such that δ(s+ , µ+ ) is (much) smaller than τ . It will be clear that this opens a way to speed up the algorithm without √ degrading the iteration bound. 
For if we take θ larger than the value θ = 1/(3√n) used in Theorem II.25, thus enforcing a deeper update of the barrier parameter in such a way that (6.15) still holds, then the analysis in the proof of Theorem II.25 remains valid but the number of iterations decreases. The question arises of how deep the update might be. In other words, we have to deal with the problem that we are given s⁺ and µ such that (6.14) holds, and we ask how large we can take θ in µ⁺ = (1 − θ)µ so that (6.15) holds with equality: δ(s⁺, µ⁺) = τ. See Figure 6.9. Note that we know beforehand that this value of θ is at least 1/(3√n). To answer the above question we need to introduce the so-called affine-scaling direction and the centering direction at s.

6.8.2 The affine-scaling direction and the centering direction

From (6.1) we recall that the Newton step at s to the µ-center is given by

µ s⁻¹ ∆s = P_HS(µe − sx̄),

so we may write

∆s = S P_HS(e − sx̄/µ) = S P_HS(e) − (1/µ) S P_HS(sx̄).

The directions

∆ᶜs := S P_HS(e)   (6.16)

and

∆ᵃs := −S P_HS(sx̄)   (6.17)

are called the centering direction and the affine-scaling direction respectively. Note that these two directions depend only on the iterate s and not on the barrier parameter µ. Now the Newton step at s to the µ-center can be written as

∆s = ∆ᶜs + (1/µ) ∆ᵃs.

By definition (6.7), the proximity δ(s, µ) is given by δ(s, µ) := ‖s⁻¹∆s‖. Thus we have

δ(s, µ) = ‖dᶜ + (1/µ) dᵃ‖,

where dᶜ = s⁻¹∆ᶜs and dᵃ = s⁻¹∆ᵃs are the scaled centering and affine-scaling directions respectively.

6.8.3 Calculation of the adaptive update

Now that we know how the proximity depends on the barrier parameter we are able to solve the problem posed above. We assume that we have an iterate s such that for some µ > 0 and 0 < τ < 1/√2,

δ := δ(s, µ) ≤ τ²,

and we ask for the smallest value µ⁺ of the barrier parameter such that δ(s, µ⁺) = τ. Clearly, µ⁺ is the smallest positive root of the equation

δ(s, µ) = ‖dᶜ + (1/µ) dᵃ‖ = τ.   (6.18)

Note that in the case where b = 0, the dual objective value is constant on the dual feasible region and hence s is optimal.10,11 We assume that dᵃ ≠ 0. This is true if and only if b ≠ 0. It then follows from (6.18) that δ(s, µ) depends continuously on µ and goes to infinity if µ approaches zero. Hence, since τ > τ², equation (6.18) has at least one positive solution. Squaring both sides of (6.18), we arrive at the following quadratic equation in 1/µ:

(1/µ²) ‖dᵃ‖² + (2/µ) (dᵃ)ᵀdᶜ + ‖dᶜ‖² − τ² = 0.   (6.19)

The two roots of (6.19) are given by

( −(dᵃ)ᵀdᶜ ± √( ((dᵃ)ᵀdᶜ)² − ‖dᵃ‖² (‖dᶜ‖² − τ²) ) ) / ‖dᵃ‖².

We already know that at least one of the roots is positive. Hence, although we do not know the sign of the second root, we may conclude that 1/µ*, where µ* is the value of the barrier parameter we are looking for, is equal to the larger of the two roots. This gives, after some elementary calculations,

µ* = ‖dᵃ‖² / ( −(dᵃ)ᵀdᶜ + √( ((dᵃ)ᵀdᶜ)² − ‖dᵃ‖² (‖dᶜ‖² − τ²) ) ).

It is interesting to observe that it is easy to characterize the situation that both roots of (6.18) are positive. By considering the constant term in the quadratic equation (6.19) we see that both roots are positive if and only if ‖dᶜ‖² − τ² > 0. From (6.18) it follows that ‖dᶜ‖ = δ(s, ∞). Thus, both roots are positive if and only if δ(s, ∞) > τ. Obviously this situation occurs only if (dᵃ)ᵀdᶜ < 0. Thus we find the interesting result

δ(s, ∞) > τ ⇒ (dᵃ)ᵀdᶜ < 0.
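To make the computation above concrete, the following sketch (Python with NumPy; not from the book, and the function name adaptive_update and the toy vectors are illustrative assumptions) solves the quadratic equation (6.19) for 1/µ, takes the larger root, and returns the resulting value µ*.

    import numpy as np

    def adaptive_update(dc, da, tau):
        # Smallest mu with ||dc + da/mu|| = tau, i.e. delta(s, mu) = tau.
        # dc, da are the scaled centering and affine-scaling directions at s;
        # we assume da != 0 (equivalently b != 0), so the leading coefficient is positive.
        a = np.dot(da, da)               # ||d^a||^2
        b = np.dot(da, dc)               # (d^a)^T d^c
        c = np.dot(dc, dc) - tau ** 2    # ||d^c||^2 - tau^2
        disc = b * b - a * c             # nonnegative, since (6.18) has a positive root
        inv_mu = (-b + np.sqrt(disc)) / a    # larger root of (6.19) equals 1/mu*
        return 1.0 / inv_mu

    # Tiny check with made-up directions: for d^c = 0 the equation reduces to ||d^a||/mu = tau.
    dc = np.zeros(3)
    da = np.array([0.3, -0.2, 0.1])
    mu_star = adaptive_update(dc, da, tau=1.0 / np.sqrt(2.0))
    assert np.isclose(np.linalg.norm(dc + da / mu_star), 1.0 / np.sqrt(2.0))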
On the central path, where δ(s, µ) = 0, we have dᵃ = −µdᶜ, so in that case the above implication is obvious.

10 Exercise 37 Show that dᵃ = 0 if and only if b = 0.

11 Exercise 38 Consider the case b = 0. Then the primal feasibility condition is Ax = 0, x ≥ 0, which is homogeneous in x. Show that x(s, µ) = µx(s, 1) for each µ > 0, and that δ(s, µ) is independent of µ. Taking s = s(1), it now easily follows that s(µ) = s(1) for each µ > 0. This means that the dual central path is a point in this case, whereas the primal central path is a straight half-line. If s and µ > 0 are such that δ(s, µ) < 1 then the Newton process converges quadratically to s(1), which is the analytic center of the dual feasible region. See also Roos and Vial [243] and Ye [310].

6.8.4 Illustration of the use of adaptive updates

By way of example we solve the same problem as in Section 6.7.2 with the dual logarithmic barrier algorithm, now using adaptive updates. As before, we start the algorithm at y = (0, 0) and µ = 2. With ε = 10⁻⁴ and adaptive updates, the dual full-step algorithm needs 20 iterations to generate the primal feasible solution x = (1.000013, 0.000013, 1.000000) and the dual feasible pair (y, s) with y = (0.999973, 0.999986) and s = (0.000027, 1.999973, 0.000014). The respective objective values are cᵀx = 2.000027 and bᵀy = 1.999960, and the duality gap is 0.000067. Table 6.2 gives some information on how the algorithm progresses. From the seventh column in this table (with the heading δ) it is clear that we have reached our goal: after each update of the barrier parameter the proximity equals τ. Moreover, the adaptive barrier parameter update strategy reduced the number of iterations from 53 to 20. Figure 6.10 provides a graphical illustration of the adaptive strategy. It shows the relevant part of the feasible region and the central path, as well as the first four points generated by the algorithm and their regions of quadratic convergence. After each update the iterate lies on the boundary of the region of quadratic convergence for the next value of the barrier parameter.

[Table 6.2: Output of the dual full-step algorithm with adaptive updates. Columns It. (0–20), nµ, x1, y1, y2, s1, δ, δ⁺, θ; after every barrier update the proximity δ equals τ = 0.7071, the adaptive update θ varies between about 0.41 and 0.52, and nµ decreases to 0.000081.]

[Figure 6.10: The iterates when using adaptive updates, shown in the (y1, y2)-plane together with the central path and the region δ(s, 2) = τ.]

6.9 A version of the algorithm with large updates

In this section we consider a more greedy approach than the adaptive strategy, using larger updates of the barrier parameter. As before, we assume that we have an iterate s and a µ > 0 such that s belongs to the region of quadratic convergence around the µ-center. In fact we assume that12

δ(s, µ) ≤ τ = 1/√2.

Starting at s we want to reach the region of quadratic convergence around the µ⁺-center, with µ⁺ = (1 − θ)µ, and we assume that θ is so large that s lies outside the region of quadratic convergence around the µ⁺-center. In fact, it may well happen that δ(s, µ⁺) is much larger than 1. It is clear that the analysis of the previous sections, where we always took full Newton steps for the target value of the barrier parameter, is then no longer useful: that analysis was based on the nice behavior of Newton's method in a close neighborhood of the µ⁺-center. Being outside this region, we can no longer profit from this nice behavior and we need an alternative approach.

12 We could have taken a different value for τ, for example τ = 1/2, but the choice τ = 1/√2 seems to be natural. The analysis below supports our choice. In the literature the choice τ = 1/2 is very popular (see, e.g., [140]). It is easy to adapt the analysis below to this value.

Now remember that the target center s(µ⁺) can be characterized as the (unique) minimizer of the dual logarithmic barrier function

k_{µ⁺}(y, s) = −bᵀy − µ⁺ ∑_{j=1}^{n} log sⱼ,

and that this function is strictly convex on the interior of the dual feasible region. Hence, the difference k_{µ⁺}(y, s) − k_{µ⁺}(y(µ⁺), s(µ⁺)) vanishes if and only if s = s(µ⁺) and is positive elsewhere. The difference can therefore be used as another indicator for the 'distance' from s to s(µ⁺). That is exactly what we plan to do.
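As a small illustration of this idea, the helper below (Python with NumPy; a minimal sketch, not code from the book) evaluates the dual logarithmic barrier function k_µ(y, s). Its gap to the unknown minimal value k_µ(y(µ), s(µ)) is the distance indicator just described; since the minimizer is not available, the algorithm will only exploit guaranteed decreases of this value.

    import numpy as np

    def dual_barrier(b, y, s, mu):
        # k_mu(y, s) = -b^T y - mu * sum_j log s_j, finite only for s > 0.
        assert np.all(s > 0), "the barrier is defined on the interior of the dual feasible region"
        return -np.dot(b, y) - mu * np.sum(np.log(s))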
Outside the region of quadratic convergence the barrier function value will act as a measure for proximity to the µ-center. We show that when moving in the direction of the Newton step at s the barrier function decreases, and that by choosing an appropriate step-size we can guarantee a sufficient decrease of the barrier function value. In principle, the step-size can be obtained from a one-dimensional line search in the Newton direction so as to minimize the barrier function in this direction. Based on these ideas we derive an upper bound for the required number of damped Newton steps to reach the vicinity of s(µ⁺); the upper bound will be a function of θ. The algorithm is described below.

Dual Logarithmic Barrier Algorithm with Large Updates

Input:
  A proximity parameter τ = 1/√2;
  an accuracy parameter ε > 0;
  a variable damping factor α;
  an update parameter θ, 0 < θ < 1;
  (y⁰, s⁰) ∈ D and µ⁰ > 0 such that δ(s⁰, µ⁰) ≤ τ.
begin
  s := s⁰; µ := µ⁰;
  while nµ ≥ ε do
  begin
    µ := (1 − θ)µ;
    while δ(s, µ) ≥ τ do
    begin
      s := s + αp(s, µ);
      (The damping factor α must be such that k_µ(y, s) decreases sufficiently.
       The default value is 1/(1 + δ(s, µ)).)
    end
  end
end

We refer to the first while-loop in the algorithm as the outer loop and to the second while-loop as the inner loop. Each execution of the outer loop is called an outer iteration and each execution of the inner loop an inner iteration. The required number of outer iterations depends only on the dimension n of the problem, on µ⁰ and ε, and on the (fixed) barrier update parameter θ. This number immediately follows from Lemma I.36. The main task in the analysis of the algorithm is to derive an upper bound for the number of iterations in the inner loop. For that purpose we need some lemmas that estimate barrier function values and objective values in the region of quadratic convergence around the µ-center. Since these estimates are interesting in themselves, and also because their importance goes beyond the analysis of the present algorithm with line searches alone, we discuss them in separate sections.

6.9.1 Estimates of barrier function values

We start with the barrier function values. Our goal is to estimate dual barrier function values in the region of quadratic convergence around the µ-center. It will be convenient not to deal with the barrier function itself, but to scale it by the barrier parameter. Therefore we introduce

h_µ(s) := (1/µ) k_µ(y, s) = −bᵀy/µ − ∑_{j=1}^{n} log sⱼ.

Let us point out once more that y is omitted in the argument of h_µ(s) because of the one-to-one correspondence between y and s in dual feasible pairs (y, s). We also use the primal barrier function scaled by µ:

g_µ(x) := (1/µ) g̃_µ(x) = cᵀx/µ − ∑_{j=1}^{n} log xⱼ.

Recall that both barrier functions are strictly convex on their domain and that s(µ) and x(µ) are their respective minimizers. Therefore, defining

φᵖ_µ(x) := g_µ(x) − g_µ(x(µ)),   φᵈ_µ(s) := h_µ(s) − h_µ(s(µ)),

we have φᵈ_µ(s) ≥ 0, with equality if and only if s = s(µ), and also φᵖ_µ(x) ≥ 0, with equality if and only if x = x(µ). As a consequence, defining

φ_µ(x, s) := φᵖ_µ(x) + φᵈ_µ(s),   (6.20)

where (x, s) is any pair of positive primal and dual feasible solutions, we have φ_µ(x, s) ≥ 0, and equality holds if and only if x = x(µ) and s = s(µ). The function φ_µ : P⁺ × D⁺ → IR₊ is called the primal-dual logarithmic barrier function with barrier parameter µ. Now the following lemma is almost obvious.

Lemma II.28 Let x > 0 be primal feasible and s > 0 dual feasible.
Then φpµ (x) = φµ (x, s(µ)) ≤ φµ (x, s) and φdµ (s) = φµ (x(µ), s) ≤ φµ (x, s). Proof: The inequalities in the lemma are immediate from (6.20) since φpµ (x) and φdµ (s) are nonnegative. Similarly, the equalities follow since φpµ (x(µ)) = φdµ (s(µ)) = 0. Thus the lemma has been proved. ✷ II.6 Dual Logarithmic Barrier Method 133 In the sequel, properties of the function φµ form the basis of many of our estimates. These estimates follow from properties of the univariate function ψ(t) = t − log(1 + t), t > −1, (6.21) as defined in (5.5).13 The definition of ψ is extended to any vector z = (z1 , z2 , . . . , zn ) satisfying z + e > 0 according to Ψ(z) = n X ψ(zj ) = n X j=1 j=1 (zj − log(1 + zj )) = eT z − n X log(1 + zj ). (6.22) j=1 We now make a crucial observation, namely that the barrier functions φµ (x, s), φpµ (x) and φdµ (s) can be nicely expressed in terms of the function Ψ. Lemma II.29 Let x > 0 be primal feasible and s > 0 dual feasible. Then   (i) φµ (x, s) = Ψ xs − e ;   µ xs(µ) −e ; (ii) φpµ (x) = Ψ  µ  x(µ)s d (iii) φµ (s) = Ψ −e . µ Proof: 14 First we consider item (i). We use that cT x − bT y = xT s and cT x(µ) − bT y(µ) = x(µ)T s(µ) = nµ. Now φµ (x, s) can be reduced as follows: φµ (x, s) = = hµ (s) + gµ (x) − (hµ (s(µ)) + gµ (x(µ))) n n x(µ)T s(µ) X xT s X log xj sj − log zj (µ)sj (µ) − + µ µ j=1 j=1 n = xT s X − log xj sj − n + n log µ. µ j=1 Since xT s = eT (xs) and eT e = n, we find the following expression for φµ (x, s):15 φµ (x, s) = e T   X   n sx xj sj sx log −e − =Ψ −e . µ µ µ j=1 (6.23) This proves the first statement in the lemma. The second statement follows by substituting s = s(µ) in the first statement, and using Lemma II.28. Similarly, the third statement follows by substituting x = x(µ) in the first statement. ✷ 13 Exercise 39 Let t > −1. Prove that 14 15   −t t2 + ψ(t) = . 1+t 1+t Note that the dependence of φµ (x, s) on x and s is such that it depends only on the coordinatewise product xs of x and s. Exercise 40 Considering (6.23) as the definition of φµ (x, s), and without using the properties of ψ, show that φµ (x, s) is nonnegative, and zero if and only if xs = µe. (Hint: Use the arithmeticgeometric-mean inequality.) ψ 134 II Logarithmic Barrier Approach Now we are ready to derive lower and upper bounds for the value of the dual logarithmic barrier function in the region of quadratic convergence around the µcenter. These bounds heavily depend on the following two inequalities: ψ (kzk) ≤ Ψ(z) ≤ ψ (− kzk) , z > −e. (6.24) The second inequality is valid only if kzk < 1. The inequalities in (6.24) are fundamental for our purpose and are immediate consequences of Lemma C.2 in Appendix C.16,17 Lemma II.30 18 Let δ := δ(s, µ). Then φdµ (s) ≥ δ − log(1 + δ) = ψ(δ). Moreover, if δ < 1, then φdµ (s) ≤ φµ (x(s, µ), s) ≤ −δ − log(1 − δ) = ψ(−δ). Proof: By applying the inequalities in (6.24) to (6.23) we obtain for any positive primal feasible x:     sx sx ψ ≤ φµ (x, s) ≤ ψ − (6.25) −e −e , µ µ where the second inequality is valid only if the norm of xs/µ − e does not exceed 1. Using (6.8) we write δ = δ(s, µ) = e − sx sx(s, µ) ≤ e− . µ µ Hence, by the monotonicity of ψ(t) for t ≥ 0, ψ(δ) ≤ ψ 16 17 sx −e µ  , At least one of the inequalities in (6.24) shows up in almost every paper on interior-point methods. As far as we know, all usual proofs use the power series expansion of log(1 + x), −1 < x < 1 and do not characterize the case of equality, at least not explicitly. We give an elementary proof in Appendix C (page 435). Exercise 41 Let z ∈ IRn . 
Prove that z≥0 ⇒ Ψ(z) ≤ nψ   kzk √ n  ≤  kzk2 2 − kzk kzk2 ≥ . √ n 2 This lemma improves a similar result of den Hertog et al. [146] and den Hertog [140]. The improvement is due to a suggestion made by Osman Güler [130] during a six month stay at Delft in 1992, namely to use the primal logarithmic barrier function in the analysis of the dual logarithmic barrier method. This approach not only simplifies the analysis significantly, but also leads to sharper estimates. It may be appropriate to mention that even stronger bounds for φµ (x, s) will be derived in Lemma II.69, but there we use a different proximity measure. −e < z ≤ 0 18  ⇒ Ψ(z) ≥ nψ II.6 Dual Logarithmic Barrier Method 135 for any positive primal feasible x. Taking x = x(µ) and using the left inequality in (6.25) and the third statement in Lemma II.29, we get ψ(δ) ≤ ψ  sx(µ) −e µ  ≤ φµ (x(µ), s) = φdµ (s), proving the first inequality in the lemma. For the proof of the second inequality in the lemma we assume δ < 1 and put x = x(s, µ) in the right inequality in (6.25). This gives   sx(s, µ) φµ (x(s, µ), s) ≤ ψ − = ψ(−δ). −e µ By Lemma II.28 we also have φdµ (s) ≤ φµ (x(s, µ), s). Thus the lemma follows. ✷ The functions ψ(δ) and ψ(−δ), for 0 ≤ δ < 1, play a dominant role in many of the estimates below. Figure 6.11 shows their graphs. 3 2.75 2.5 2.25 2 1.75 1.5 1.25 1 0.75 ❯ ❯ 0.25 0 0 0.2 Figure 6.11 6.9.2 ψ(δ) ψ(−δ) 0.5 0.4 0.6 0.8 ✲ δ 1 The functions ψ(δ) and ψ(−δ) for 0 ≤ δ < 1. Estimates of objective values We proceed by considering the dual objective value bT y in the region of quadratic convergence around the µ-center. Using that x(µ)s(µ) = µe and cT x(µ) − bT y(µ) = x(µ)T s(µ) = nµ, we write bT y(µ) − bT y = = cT x(µ) − nµ − bT y = sT x(µ) − nµ = eT (sx(µ) − µe)     s sx(µ) T T − e = µe −e . (6.26) µe µ s(µ) 136 II Logarithmic Barrier Approach Applying the Cauchy–Schwarz inequality to the expression for bT y(µ) − bT y in (6.26), we obtain √ s −e . bT y(µ) − bT y ≤ µ n (6.27) s(µ) √ We assume δ := δ(s, µ) < 1/ 2. It seems reasonable then to expect that the norm of the vector s sx(µ) hs := −e= −e s(µ) µ will not differ too much from δ. In any case, that is what we are going √ to show. It will then follow that the absolute value of bT y(µ) − bT y is of order µδ n. Note that hs can be written as hs = s − s(µ) , s(µ) and hence khs k measures the relative difference between s and s(µ). We also introduce a similar vector for any primal feasible x > 0: hx := x x − x(µ) xs(µ) −e= −e= . µ x(µ) x(µ) Using that x − x(µ) and s − s(µ) are orthogonal, as these vectors belong to the null space and row space of A, respectively, we may write  T   1 x − x(µ) s − s(µ) hTx hs = = (x − x(µ))T (s − s(µ)) = 0. x(µ) s(µ) µ This makes clear that hx and hs are orthogonal as well. In the rest of this section we work with x = x(s, µ) and derive upper bounds for khx k and khs k. It is convenient to introduce the vector h = hx + hs . The next lemma implicitly yields an upper bound for khk. Lemma II.31 Let δ = δ(s, µ) < 1 and x = x(s, µ). Then ψ(khk) ≤ ψ(−δ). Proof: Using Lemma II.29 we may rewrite (6.20) as φµ (x, s) = Ψ(hx ) + Ψ(hs ). By the first inequality in (6.24) we have Ψ(hx ) ≥ ψ(khx k) and Ψ(hs ) ≥ ψ(khs k). Applying the first inequality in (6.24) to the 2-dimensional vector (khx k , khs k), we obtain ψ(khx k) + ψ(khs k) ≥ ψ(khk). Here we used that hx and hs are orthogonal. Substitution gives φµ (x, s) ≥ ψ(khk). II.6 Dual Logarithmic Barrier Method 137 On the other hand, by Lemma II.30 we have φµ (x, s) ≤ ψ(−δ), thus completing the proof. 
✷ Let us point out that we can easily deduce from Lemma II.31 an interesting upper bound for khk if δ < 1. It can then be shown that ψ(khk) ≤ ψ(−δ) implies khk ≤ δ/(1 − δ).19,20 This implies that khk ≤ 1 if δ ≤ 1/2. However, for our purpose this bound √ is not strong enough. We prove a stronger result that implies that khk ≤ 1 if δ < 1/ 2. √ √ Lemma II.32 Let δ = δ(s, µ) ≤ 1/ 2. Then khk < 2. Proof: By Lemma II.31 we have ψ(khk) ≤ ψ(−δ). Since ψ(−δ) is monotonically increasing in δ, this implies √ ψ(khk) ≤ ψ(−1/ 2) = 0.52084. Since √ ψ( 2) = 0.53284 > 0.52084, and ψ(t) is monotonically increasing for t ≥ 0, we conclude that khk < √ 2. We now have the following result. Lemma II.33 21 √ Let δ := δ(s, µ) ≤ 1/ 2. Then q p s khs k = − e ≤ 1 − 1 − 2δ 2 . s(µ) Moreover, if x = x(s, µ) then also khx k = x −e ≤ x(µ) Proof: Lemma II.32 implies that q p 1 − 1 − 2δ 2 . khx + hs k = khk < √ 2. On the other hand, since xs xs −e= − e = (e + hx )(e + hs ) − e = hx + hs + hx hs , µ x(µ)s(µ) with x = x(s, µ), and using (6.8), it follows that 1 khx + hs + hx hs k = δ ≤ √ . 2 19 Exercise 42 Let 0 ≤ t < 1. Prove that ψ 20 21  −t 1+t  ≤ t2 t2 t2 ≤ ψ(t) ≤ ≤ ψ(−t) ≤ ≤ψ 2(1 + t) 2 2(1 − t)  t 1−t  . Also show that the first two inequalities are valid for any t > 0. Exercise 43 Let 0 ≤ δ < 1 and r ≥ 0 be such that ψ(r) ≤ ψ(−δ). Prove that r ≤ δ/(1 − δ). For δ ≤ 1/2 this lemma was first shown by Gonzaga (private communication, Delft, 1994). ✷ 138 II Logarithmic Barrier Approach At this stage we may apply the fourth uv-lemma (Lemma C.8 in Appendix C) with u = hx and v = hs , to obtain the lemma. ✷ We are now ready for the main result of this section. √ Theorem II.34 If δ = δ(s, µ) ≤ 1/ 2 then q p √ T T b y(µ) − b y ≤ µ n 1 − 1 − 2δ 2 . Proof: Recall from (6.27) that √ bT y(µ) − bT y ≤ µ n khs k . Substituting the bound of Lemma II.33 on khs k, the theorem follows. ✷ 1 0.8 p √ 1 − 1 − 2δ 2 0.6 ❯ ❑ δ 0.4 0.2 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 ✲ δ Figure 6.12 The graphs of δ and p 1− √ √ 1 − 2δ 2 for 0 ≤ δ ≤ 1/ 2. p √ Figure 6.12 (page 138) shows the graphs of δ and 1 − 1 − 2δ 2 . It is clear that for small values of δ (δ ≤ 0.3 say) the functions can hardly be distinguished. 6.9.3 Effect of large update on barrier function value We start by considering the effect of an update of the barrier parameter on the difference between the dual barrier function value and its minimal value. More precisely, we assume that for given dual feasible s and µ > 0 we have δ = δ(s, µ) ≤ √ 1/ 2, and we want to estimate φdµ+ (s) = hµ+ (s) − hµ+ (s(µ+ )), where µ+ = µ(1 − θ) for 0 ≤ θ < 1. Note that Lemma II.30 gives the answer if θ = 0: φdµ (s) ≤ ψ(−δ). II.6 Dual Logarithmic Barrier Method 139 For the general case, where θ > 0, we write φdµ+ (s) hµ+ (s) − hµ+ (s(µ+ )) = hµ+ (s) − hµ+ (s(µ)) + hµ+ (s(µ)) − hµ+ (s(µ+ )) = hµ+ (s) − hµ+ (s(µ)) + φdµ+ (s(µ)), = (6.28) and we treat the first two terms and the last term in the last expression separately. Lemma II.35 In the above notation, hµ+ (s) − hµ+ (s(µ)) ≤ ψ(−δ) + Proof: Just using definitions we write √ q p θ n 1 − 1 − 2δ 2 . 1−θ n hµ+ (s) − hµ+ (s(µ)) = = = = n bT y(µ) X −bT y X log s + log sj (µ) − + j µ+ µ+ j=1 j=1 − n X j=1 log sj + n X log sj (µ) + j=1 bT y(µ) − bT y µ+ bT y(µ) − bT y bT y(µ) − bT y − µ+ µ T T θ b y(µ) − b y φdµ (s) + . 1−θ µ hµ (s) − hµ (s(µ)) + Applying Lemma II.30 to the first term in the last expression, and Theorem II.34 to the second term gives the lemma. ✷ Lemma II.36 In the above notation, φdµ+ (s(µ)) ≤ φµ+ (x(µ), s(µ)) = nψ  θ 1−θ  . 
Proof: The inequality follows from Lemma II.28. The equality is obtained as follows. From (6.23),   X n s(µ)x(µ) xj (µ)sj (µ) φµ+ (x(µ), s(µ)) = eT log − e − . µ+ µ+ j=1 Since x(µ)s(µ) = µe and µ+ = (1 − θ)µ, this can be simplified to  X  n µ µe log + − e − φµ+ (x(µ), s(µ)) = eT + µ µ j=1   X n e 1 = eT log −e − 1−θ 1 − θ j=1    θ θ = n − log 1 + 1−θ 1−θ   θ . = nψ 1−θ 140 II Logarithmic Barrier Approach This completes the proof. ✷ Combining the results of the last two lemmas we find the next lemma. √ Lemma II.37 Let δ(s, µ) ≤ 1/ 2 for some dual feasible s and µ > 0. Then, if µ+ = µ(1 − θ) with 0 ≤ θ < 1, we have φdµ+ (s) ≤ψ  −1 √ 2    √ θ n θ + . + nψ 1−θ 1−θ Proof: The lemma follows from√(6.28) and the bounds provided by the previous ✷ lemmas, by substitution of δ = 1/ 2. With s, µ and µ+ as in the last lemma, our aim is to estimate the number of damped Newton steps required to reach the vicinity of the µ+ -center when starting at s. To this end we proceed by estimating the decrease in the barrier function value during a damped Newton step. 6.9.4 Decrease of the barrier function value In this section we consider a damped Newton step to the µ-center at an arbitrary positive dual feasible s and we estimate its effect on the barrier function value. The analysis also yields a suitable value for the damping factor α. The result of the damped Newton step is denoted by s+ , so s+ = s + α∆s, (6.29) where ∆s denotes the full Newton step. Lemma II.38 Let δ = δ(s, µ). If α = 1/(1 + δ) then the damped Newton step (6.29) is feasible and it reduces the barrier function value by at least δ − log(1 + δ). In other words, φdµ (s) − φdµ (s+ ) ≥ δ − log(1 + δ) = ψ(δ). Proof: First recall from (6.5) in Section 6.5 that the Newton step ∆s is determined by  x(s, µ) = µs−1 e − s−1 ∆s . We denote x(s, µ) briefly as x. With z := ∆s xs =e− , s µ the damped Newton step can be described as follows: s+ = s + α∆s = s(e + αz). Since s+ is feasible if and only if it is nonnegative, the step is certainly feasible if α kzk < 1. Since δ = kzk, the value for α specified by the lemma satisfies this condition, II.6 Dual Logarithmic Barrier Method 141 and hence the feasibility of s+ follows. Now we consider the decrease in the dual barrier function value during the step. We may write φdµ (s) − φdµ (s+ ) = = hµ (s) − hµ (s+ ) n n −bT y + X −bT y X log sj − log s+ − + j . µ µ j=1 j=1 n = bT y + − bT y X log (1 + α zj ) . + µ j=1 The difference bT y + − bT y can be written as follows: bT y + − bT y  cT x − bT y − cT x − bT y + = xT s − xT s+ = −α xT (sz) = −α eT (xs) z = α µeT (z − e)z. = Thus we obtain φdµ (s) − φdµ (s+ ) = = = αeT (z − e)z +  n X log (1 + α zj ) j=1 α eT z 2 − eT (αz) − 2 α δ − Ψ(α z). n X j=1  log (1 + α zj ) Since kαzk < 1 we may apply the right-hand side inequality in (6.24), which gives Ψ(α z) ≤ ψ (−α kzk) = ψ (−αδ), whence φdµ (s) − φdµ (s+ ) ≥ α δ 2 − ψ(−αδ) = α δ 2 + α δ + log(1 − α δ). As a function of α, the right-hand side expression is increasing for 0 ≤ α ≤ 1/(1 + δ), as can be easily verified, and it attains its maximal value at α = 1/(1 + δ), which is the value specified in the lemma. Substitution of this value yields the bound in the 142 II Logarithmic Barrier Approach lemma. Thus the proof is complete.22,23 ✷ We are now ready to estimate the number of (inner) iterations between two successive updates of the barrier parameter. 
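Before turning to the iteration count, it may help to see the damped step of Lemma II.38 in code. The sketch below (Python with NumPy; not from the book) assumes a hypothetical helper newton_direction(s, mu) that returns the full Newton step ∆s at s for the µ-center; it uses δ(s, µ) = ‖s⁻¹∆s‖ from (6.7) and the default damping factor α = 1/(1 + δ).

    import numpy as np

    def inner_loop(s, mu, newton_direction, tau=1.0 / np.sqrt(2.0), max_steps=1000):
        # Damped Newton steps toward the mu-center until delta(s, mu) <= tau.
        # newton_direction(s, mu) is an assumed helper returning the full Newton step at s.
        for _ in range(max_steps):
            ds = newton_direction(s, mu)
            delta = np.linalg.norm(ds / s)     # delta(s, mu) = ||s^{-1} Delta s||
            if delta <= tau:
                return s                       # back in the region of quadratic convergence
            alpha = 1.0 / (1.0 + delta)        # Lemma II.38: decrease of at least psi(delta)
            s = s + alpha * ds                 # feasible, since alpha * delta < 1
        raise RuntimeError("inner loop did not reach the target proximity")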
6.9.5 Number of inner iterations Lemma II.39 The number of (inner) iterations between two successive updates of the barrier parameter is no larger than &  √ 2 ' θ n +1 . 3 1−θ Proof: From Lemma II.37 we know that after the update of µ we have   √ θ n θ φdµ+ (s) ≤ ψ(−τ ) + , + nψ 1−θ 1−θ √ where τ = 1/ 2. The algorithm repeats damped Newton steps as long the iterate s satisfies δ = δ(s, µ+ ) > τ . In that case the step decreases the barrier function value by at least ψ(δ), by Lemma II.38. Since δ > τ , the decrease is at least ψ(τ ) = 0.172307. As soon as the barrier function value has reached ψ(τ ) we are sure that δ(s, µ+ ) ≤ τ , from Lemma II.30. Hence, the number of inner iterations is no larger than     √ θ n θ 1 ψ(−τ ) − ψ(τ ) + . + nψ ψ(τ ) 1−θ 1−θ The rest of the proof consists in reducing this expression to the one in the lemma. First, using that ψ(−τ ) = 0.52084, we obtain ψ(−τ ) − ψ(τ ) 0.34853 = ≤ 3. ψ(τ ) 0.172307 22 Exercise 44 In the proof of Lemma II.38 we found the following expression for the decrease in the dual barrier function value: φdµ (s) − φdµ (s+ ) = α eT z 2 − Ψ(α z), where α denotes the size of the damped Newton step. Show that the decrease is maximal for the unique step-size ᾱ determined by the equation eT z 2 = n X j=1 αzj2 1 + αzj and that for this value the decrease is given by 23   ᾱz . e + ᾱz It is interesting to observe that Lemma II.38 provides a second proof of the first statement in Lemma II.30, namely φdµ (s) ≥ ψ(δ), Ψ where δ := δ(s, µ). This follows from Lemma II.38, since φdµ (s+ ) ≥ 0. II.6 Dual Logarithmic Barrier Method 143 Furthermore, using ψ(t) ≤ t2 /2 for t ≥ 0 we get24   nθ2 θ ≤ . nψ 1−θ 2(1 − θ)2 (6.30) Finally, using that 1/ψ(τ ) < 1/6 we obtain the following upper bound for the number of inner iterations:  &  √ 2 '  √ θ n 3nθ2 6θ n = 3 + +1 . 3+ 1−θ (1 − θ)2 1−θ This proves the lemma. ✷ √ Remark II.40 It is tempting to apply Lemma II.39 to the case where θ = 1/(3 n). We know that for that value of θ one full Newton step keeps the iterate in the region of quadratic convergence around the µ+ -center. Substitution of this value in the bound of Lemma II.39 however yields that at least 6 damped Newton steps are required for the same purpose. This disappointing result reveals a weakness of the above analysis. The weakness probably stems from the fact that the estimate of the number of inner iterations in one outer iteration is based on the assumption that the decrease in the barrier function value is given by the constant ψ(τ ). Actually the decrease is at least ψ(δ). Since in many inner iterations, in particular in the iterations immediately after the update of the barrier parameter, the proximity δ may be much larger than τ , the actual number of iterations may be much smaller than the pessimistic estimate of Lemma II.39. This is the reason why for the algorithm with large updates there exists a gap between theory and practice. In practice the number of inner iterations is much smaller than the upper bound given by the lemma. Hopefully future research will close this gap.25 • 6.9.6 Total number of iterations We proceed by estimating the total number of iterations required by the algorithm. Theorem II.41 To obtain a primal-dual pair (x, s), with x = x(s, µ), such that xT s ≤ 2ε, at most ' & &  √ 2 ' θ n nµ0 1 3 +1 log θ 1−θ ε iterations are required by the logarithmic barrier algorithm with large updates. 24 A different estimate arises by using Exercise 39, which implies ψ(t) ≤ t2 /(1 + t) for t > −1. 
Hence nψ 25  θ 1−θ  ≤ nθ 2 , 1−θ which is sharper than (6.30) if θ > 12 . The use of (6.30) however does not deteriorate the order of our estimates below. Exercise 45 Let δ = δ(s, µ) > 0 and x = x(s, µ). Then the vector z = (xs/µ) − e has at least one positive coordinate. Prove this. Hence, if z has only one nonzero coordinate then this coordinate equals kzk. Show that in that case the single damped Newton step with step-size α = 1/(1 + δ) yields s+ = s(µ). 144 II Logarithmic Barrier Approach Proof: The number of outer iterations follows from Lemma I.36. The bound in the theorem is obtained by multiplying this number by the bound of Lemma II.39 for the number of inner iterations per outer iteration and rounding the product, if not integral, to the smallest integer above it. ✷ We end this section by drawing two conclusions. If we take θ to be a fixed constant (independent of n), for example θ = 1/2, the iteration bound of Theorem II.41 becomes   nµ0 . O n log ε For such values of θ we say that the algorithm uses large updates. The number of inner iterations per outer iteration is then O(n). √ If we take θ = ν/ n for some fixed constant ν (independent of n), the iteration bound of Theorem II.41 becomes   √ nµ0 n log , O ε provided that n is large enough (n ≥ ν 2 say). It has become common to say that the algorithm uses medium updates. The number of inner iterations per outer iteration is then bounded by a constant, depending on ν. In the next section we give an illustration. 6.9.7 Illustration of the algorithm with large updates We use the same sample problem as before (see Sections 6.7.2 and 6.8.4) and solve it using the dual logarithmic barrier algorithm with large updates. We do this for several values of the barrier update parameter θ. As before, we start the algorithm at y = (0, 0) and µ = 2, and the accuracy parameter is set to ε = 10−4 . For θ = 0.5, Table 6.3. (page 145) lists the algorithm’s progress. The table needs some explanation. The first two columns contain counters for the outer and inner iterations, respectively. The algorithm requires 16 outer and 16 inner iterations. The table shows the effect of each outer iteration, which involves an update of the barrier parameter, and also the effect of each inner iteration, which involves a move in the dual space. During a barrier parameter update the dual variables y and s remain unchanged, but, because of the change in µ, the primal variable x(s, µ) and the proximity attain new values. After each update, damped Newton steps are taken until the proximity reaches the value τ . In this example the number of inner iterations per outer iteration is never more than one. Note that we can guarantee the primal feasibility of x only if the proximity is at most one. Since the table shows only the second coordinate of x (and also of s), infeasibility of x can only be detected from the table if x2 is negative. In this example this does not occur, but it occurs in the next example, where we solve the same problem with θ = 0.9. With θ = 0.9, Table 6.4. (page 146) shows that in some iterations x is infeasible indeed. Moreover, although the number of outer iterations is much smaller than in the previous case (5 instead of 16), the total number of iterations is almost the same (14 instead of 16). Clearly, and understandably, the deeper updates make it harder to reach the new target region. 
[Table 6.3: Progress of the dual algorithm with large updates, θ = 0.5. Columns Outer, Inner, nµ, x2, y1, y2, s2, δ; 16 outer and 16 inner iterations, nµ decreasing from 6.000000 to 0.000092; after each barrier update δ rises to between about 0.71 and 2.41, and a single damped Newton step brings it back below 0.5.]

This is even more true in the last example, where we take θ = 0.99. Table 6.5 shows the result. The number of outer iterations is only 3, but the total number of iterations is still 14. This leads us to the important observation that the deep-update strategy has its limits. On the other hand, the number of iterations is competitive with the methods using full Newton steps, and is significantly less than the iteration bound of Theorem II.41.
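To see how loose this bound is for the small example above, the sketch below (Python with NumPy; not from the book) evaluates it for n = 3, µ⁰ = 2 and ε = 10⁻⁴, reading it as ⌈⌈3(θ√n/(1 − θ) + 1)²⌉ · (1/θ) log(nµ⁰/ε)⌉; the observed totals 16, 14 and 14 are taken from Tables 6.3, 6.4 and 6.5.

    import numpy as np

    def iteration_bound(n, mu0, eps, theta):
        # Iteration bound of Theorem II.41 for the large-update dual algorithm.
        inner = np.ceil(3.0 * (theta * np.sqrt(n) / (1.0 - theta) + 1.0) ** 2)
        return int(np.ceil(inner * np.log(n * mu0 / eps) / theta))

    observed = {0.5: 16, 0.9: 14, 0.99: 14}   # totals reported in Tables 6.3-6.5
    for theta, total in observed.items():
        print(f"theta = {theta}: bound = {iteration_bound(3, 2.0, 1e-4, theta)}, observed = {total}")

Already for θ = 0.5 the bound runs into the hundreds while the algorithm needs only 16 iterations, which illustrates the gap between theory and practice discussed in Remark II.40.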
[Table 6.4: Progress of the dual algorithm with large updates, θ = 0.9. Columns Outer, Inner, nµ, x2, y1, y2, s2, δ; 5 outer iterations and 14 iterations in total, nµ decreasing from 6.000000 to 0.000060; in several rows x2 is negative, so the associated x is primal infeasible.]

[Table 6.5: Progress of the dual algorithm with large updates, θ = 0.99. Columns Outer, Inner, nµ, x2, y1, y2, s2, δ; 3 outer iterations and 14 iterations in total, nµ decreasing from 6.000000 to 0.000006; after each deep update δ jumps above 60 before the damped Newton steps restore the proximity.]

We conclude this section with a graphical illustration of the algorithm, with θ = 0.9. Figure 6.13 shows the first outer iteration, which consists of 2 inner iterations.
Although the driving force in the dual logarithmic barrier approach is the desire to solve the dual problem (D), it also yields an ε-solution of the primal problem (P ). The problem (P ) also plays a crucial role in the analysis of the method. For example, the Newton step ∆s at (y, s) for the barrier parameter value µ is described by the primal variable x(s, µ). Moreover, the convergence proof of the method uses the duality gap cT x(s, µ) − bT y. Finally, the analysis of the medium-update and large-update versions of the dual method strongly depend on the properties of the primal-dual logarithmic barrier function φµ (x, s). The aim of this chapter is to show that we can benefit from the primal problem not only in the analysis but also in the design of the algorithm. The idea is to solve both the dual and the primal problem simultaneously, by taking in each iteration a step ∆s in the dual space and a step ∆x in the primal space. Here, the search directions ∆s and ∆x still have to be defined. This is done in the next section. Again, Newton’s name is given to the search directions, but now the search directions arise from an iterative method — also due to Newton — for solving the system of equations defining the µ-centers of (P ) and (D). In the following paragraphs we follow the same program as for the dual algorithms: we first introduce a proximity measure, then we deal with full-step methods, with both fixed and adaptive updates of the barrier parameter, and finally we consider methods that use deep (but fixed) updates and damped Newton steps. For the sake of clarity, it might be useful to emphasize that it is not our aim to take for ∆s the dual Newton step and for ∆x its counterpart, the primal Newton step. For this would mean that we were executing two algorithms simultaneously, namely the dual logarithmic barrier algorithm and the primal logarithmic barrier algorithm. 150 II Logarithmic Barrier Approach Apart from the fact that this makes no sense, it doubles the computational work (roughly speaking). Instead, we define the search directions ∆s and ∆x in a new way and we show that the resulting algorithms, called primal-dual algorithms, allow similar theoretical iteration bounds to their dual (or primal) counterparts. In practice, however, primal-dual methods have a very good reputation. Many computational studies give support to this reputation. This is especially true for the so-called predictor-corrector method, which is discussed in Section 7.7. 7.2 Definition of the Newton step In this section we are given a positive primal-dual feasible pair (x, (y, s)), and some µ > 0. Our aim is to define search directions ∆x, ∆y, ∆s that move in the direction of the µ-center x(µ), y(µ), s(µ). In fact, we want the new iterates x + ∆x, y + ∆y, s + ∆s to satisfy the KKT system (5.3) with respect to µ. After substitution this yields the following conditions on ∆x, ∆y, ∆s: A(x + ∆x) = b, x + ∆x > 0, A (y + ∆y) + s + ∆s (x + ∆x)(s + ∆s) = = c, µe. s + ∆s > 0, T If we neglect for the moment the inequality constraints, then, since Ax = b and AT y + s = c, this system can be rewritten as follows: A∆x = 0, T A ∆y + ∆s = 0, s∆x + x∆s + ∆x∆s = µe − xs. (7.1) Unfortunately, this system of equations in ∆x, ∆y and ∆s is nonlinear, because of the term ∆x∆s in the third equation. To overcome this difficulty we simply neglect this quadratic term, according to Newton’s method for solving nonlinear equations, and we obtain the linear system A∆x = 0, AT ∆y + ∆s s∆x + x∆s = = 0, µe − xs. 
(7.2) Below we show that this system determines the displacements ∆x, ∆y and ∆s uniquely. We call them the primal-dual Newton directions and these are the directions we are going to use. Theorem II.42 The system (7.2) has a unique solution, namely  −1 b − µAs−1 ∆y = AXS −1 AT ∆s ∆x = = −AT ∆y µs−1 − x − xs−1 ∆s. II.7 Primal-Dual Logarithmic Barrier Method 151 Proof: We divide the third equation in (7.2) coordinatewise by s, and obtain ∆x + xs−1 ∆s = µs−1 − x. (7.3) Multiplying this equation from the left by A, and using that A∆x = 0 and Ax = b, we get AXS −1 ∆s = µAs−1 − Ax = µAs−1 − b. The second equation gives ∆s = −AT ∆y. Substituting this we find AXS −1 AT ∆y = b − µAs−1 . Since A is an m × n matrix of rank m, the matrix AXS −1 AT has size m × m and is nonsingular, so the last equation determines ∆y uniquely as specified in the theorem. Now ∆s follows uniquely from ∆s = −AT ∆y. Finally, (7.3) yields the expression for ∆x.1 ✷ Remark II.43 In the analysis below we do not use the expressions just found for the search directions in the primal and the dual space. But it is important to see that their computation requires the solution of a linear system of equations with AXS −1 AT as coefficient matrix. We refer the reader to Chapter 20 for a discussion of computational issues related to efficient solution methods for such systems. • Remark II.44 We can easily deduce from Theorem II.42 that the primal-dual directions for the y- and the s-space differ from the dual search directions used in the previous chapter. For example, the dual direction for y was given by AS −2 AT −1  b − As−1 µ  whereas the primal-dual direction is given by AXS −1 AT −1  b − µAs−1 . The difference is that the scaling matrix S −2 in the dual case is replaced by the scaling matrix XS −1 /µ in the primal-dual case. Note that the two scaling matrices coincide if and only if XS = µI, which happens if and only if x = x(µ) and s = s(µ). In that case both expressions vanish, since then µAs−1 = Ax = b. We conclude that if s 6= s(µ) then the dual directions in the y- and in the s-space differ from the corresponding primal-dual directions. A similar result holds for the search direction in the primal space. It may be worthwhile to point out that the dual search direction at y depends only on y itself and the slack vector s = c − AT y, whereas the primal-dual direction at y also depends on the given primal variable x. • 1 Exercise 46 An alternative proof of the unicity property in Theorem II.42 can be obtained by showing that the matrix in the linear system (7.2) is nonsingular. This matrix is given by " Prove that this matrix in nonsingular. A 0 0 AT S 0 0 I X # . 152 7.3 II Logarithmic Barrier Approach Properties of the Newton step We denote the result of the (full) Newton step at (x, y, s) by (x+ , y + , s+ ): x+ = x + ∆x, y + = y + ∆y, s+ = s + ∆s. Note that the new iterates satisfy the affine equations Ax+ = b and AT y + + s+ = c, since A∆x = 0 and AT ∆y + ∆s = 0, so we only have to concentrate on the sign of the vectors x+ and s+ . We call the Newton step feasible if x+ and s+ are nonnegative and strictly feasible if x+ and s+ are positive. The main aim of this section is to find conditions for feasibility and strict feasibility of the (full) Newton step. First we deal with two simple lemmas.2 Lemma II.45 ∆x and ∆s are orthogonal. Proof: Since A∆x = 0, ∆x belongs to the null space of A, and since ∆s = −AT ∆y, ∆s belongs to the row space of A. Since these spaces are orthogonal, the lemma follows. 
✷ If x+ and s+ are nonnegative (positive), then their product is nonnegative (positive) as well. We may write x+ s+ = (x + ∆x)(s + ∆s) = xs + (s∆x + x∆s) + ∆x∆s. Since s∆x + x∆s = µe − xs this leads to x+ s+ = µe + ∆x∆s. (7.4) Thus it follows that x+ and s+ are feasible only if µe + ∆x∆s is nonnegative. Surprisingly enough, the converse is also true. This is the content of our next lemma. Lemma II.46 The primal-dual Newton step is feasible if and only if µe + ∆x∆s ≥ 0 and strictly feasible if and only if µe + ∆x∆s > 0. Proof: The ‘only if’ part of both statements in the lemma follows immediately from (7.4). For the proof of the converse part we introduce a step length α, 0 ≤ α ≤ 1, and we define xα = x + α∆x, y α = y + α∆y, sα = s + α∆s. We then have x0 = x, x1 = x+ and similar relations for the dual variables. Hence we have x0 s0 = xs > 0. The proof uses a continuity argument, namely that x1 and s1 are nonnegative if xα sα is positive for all α in the open interval (0, 1). This argument has a simple geometric interpretation: x1 and s1 are feasible if and only if the open segment connecting x0 and x1 lies in the interior of the primal feasible region, and the open segment connecting s0 and s1 lies in the interior of the dual feasible region. Now we write xα sα = (x + α∆x)(s + α∆s) = xs + α (s∆x + x∆s) + α2 ∆x∆s. 2 One might observe that some of the results in this and the next section are quite similar to analogous results in Section 2.7.2 in Part I for the Newton step for the self-dual model. To keep the treatment here self-supporting we do not invoke these results, however. II.7 Primal-Dual Logarithmic Barrier Method 153 Using s∆x + x∆s = µe − xs gives xα sα = xs + α (µe − xs) + α2 ∆x∆s. Now suppose µe + ∆x∆s ≥ 0. Then it follows that xα sα ≥ xs + α (µe − xs) − α2 µe = (1 − α) (xs + αµe) . Since xs and e are positive it follows that xα sα > 0 for 0 ≤ α < 1. Hence, none of the entries of xα and sα vanish for 0 ≤ α < 1. Since x0 and s0 are positive, this implies that xα > 0 and sα > 0 for 0 ≤ α < 1. Therefore, by continuity, the vectors x1 and s1 cannot have negative entries. This completes the proof of the first statement in the lemma. Assuming µe + ∆x∆s > 0, we derive in the same way xα sα > xs + α (µe − xs) − α2 µe = (1 − α) (xs + αµe) . This implies that x1 s1 > 0. Hence, by continuity, x1 and s1 must be positive, proving the second statement in the lemma. ✷ We proceed with a discussion of the vector ∆x∆s. From (7.4) it is clear that the error made by neglecting the second-order term in the nonlinear system (7.1) is given by this vector. It represents the so-called second-order effect in the Newton step. Therefore it will not be surprising that the vector ∆x∆s plays a crucial role in the analysis of primal-dual methods. It is worth considering the ideal case where the second-order term vanishes. If ∆x∆s = 0, then ∆x and ∆s solve the nonlinear system (7.1). By Lemma II.46 the Newton iterates x+ and s+ are feasible in this case. Hence they satisfy the KKT conditions. Now the unicity property gives us that x+ = x(µ) and s+ = s(µ). Thus we see that the Newton process is exact in this case: it produces the µ-centers in one iteration.3 In general the second-order term is nonzero and the new iterates do not coincide with the µ-centers. But we have the surprising property that the duality gap always assumes the same value as at the µ-centers, where the duality gap equals nµ. T Lemma II.47 If the primal-dual Newton step is feasible then (x+ ) s+ = nµ. 
Proof: Using (7.4) and the fact that the vectors ∆x and ∆s are orthogonal, the duality gap after the Newton step can be written as follows: x+ T  s+ = eT x+ s+ = eT (µe + ∆x∆s) = µeT e = nµ. This proves the lemma. ✷ In the general case we need some quantity for measuring the progress of the Newton iterates on the way to the µ-centers. As in the case of the dual logarithmic barrier method we start by considering a ‘full-step method’. We then deal with 3 Exercise 47 Let (x, s) be a positive primal-dual feasible pair with x = x(µ). Show that the Newton process is exact in this case, with ∆x = 0 and ∆s = s(µ) − s. (A similar results holds if s = s(µ), and follows in the same way.) 154 II Logarithmic Barrier Approach an ‘adaptive method’, in which the barrier parameter is updated ‘adaptively’, and then turn to the ‘large-update method’, which uses large fixed updates and damped Newton steps. For the large-update method we already have an excellent candidate for measuring proximity to the µ-centers, namely the primal-dual logarithmic barrier function φµ (x, s). For the full-step method and the adaptive method we need a new measure that is introduced in the next section. 7.4 Proximity and local quadratic convergence Recall that for the dual method we have used the Euclidean norm of the Newton step ∆s scaled by s as a proximity measure. It is not at all obvious how this successful approach can be generalized to the primal-dual case. However, there is a natural way of doing this, but we first have to reformulate the linear system (7.2) that defines the Newton directions in the primal-dual case. To this end we introduce the vectors r r x xs , u := . d := s µ Using d we can rescale x and s to the same vector, namely u: d−1 x √ = u, µ ds √ = u. µ Now we scale ∆x and ∆s similarly to dx and ds : d−1 ∆x =: dx , √ µ d∆s √ =: ds . µ (7.5) For easy reference in the future we write x+ = x + ∆x + s = s + ∆s = = √ µ d (u + dx ) √ −1 µ d (u + ds ) (7.6) (7.7) and, using (7.4), x+ s+ = µe + ∆x∆s = µ (e + dx ds ) . (7.8) Thus we may restate Lemma II.46 without further proof as follows. Lemma II.48 The primal-dual Newton step is feasible if and only if e + dx ds ≥ 0 (7.9) e + dx ds > 0. (7.10) ∆x∆s = µdx ds , (7.11) and strictly feasible if and only if Since II.7 Primal-Dual Logarithmic Barrier Method 155 the orthogonality of ∆x and ∆s implies that the scaled displacements dx and ds are orthogonal as well. Now we may reformulate the left-hand side in the third equation of the KKT system as follows:  √ s∆x + x∆s = µ sddx + xd−1 ds = µ u (dx + ds ) , and the right-hand side can be rewritten as  µe − xs = µe − µu2 = µ u u−1 − u . The third equation can then be restated simply as dx + ds = u−1 − u. On the other hand, the first and the second equations can be reformulated as ADdx = 0 and (AD)T dy + ds = 0, where ∆y dy = √ . µ We conclude that the scaled displacements dx , dy and ds satisfy ADdx = 0 T (7.12) (AD) dy + ds = 0 dx + ds = u −1 − u. The first two equations show that the vectors dx and ds belong to the null space and the row space of the matrix AD respectively. These two spaces are orthogonal and the row space of AD is equal to the null space of the matrix HD−1 , where H is any matrix whose null space is equal to the row space of A, as defined in Section 6.3 (page 111). The last equation makes clear that dx and ds form the orthogonal components of the vector u−1 − u in these complementary subspaces. Therefore, we find4,5 dx = ds = PAD (u−1 − u) (7.13) PHD−1 (u−1 − u). 
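The formulas of Theorem II.42 translate directly into code. The sketch below (Python with NumPy; not from the book, and the function name is an illustrative assumption) forms the m × m matrix AXS⁻¹Aᵀ and recovers ∆y, ∆s and ∆x. For any strictly feasible input it can be used to check Lemma II.45 (∆x and ∆s are orthogonal) and Lemma II.47 (the duality gap after a feasible full step equals nµ).

    import numpy as np

    def primal_dual_newton_step(A, b, x, s, mu):
        # Primal-dual Newton directions from Theorem II.42:
        #   A X S^{-1} A^T dy = b - mu * A s^{-1},  ds = -A^T dy,
        #   dx = mu s^{-1} - x - x s^{-1} ds.
        M = (A * (x / s)) @ A.T                      # A X S^{-1} A^T, nonsingular if rank(A) = m
        dy = np.linalg.solve(M, b - mu * (A @ (1.0 / s)))
        ds = -A.T @ dy
        dx = mu / s - x - (x / s) * ds
        return dx, dy, ds

    # For a strictly feasible pair (x, (y, s)) one can verify, for example:
    #   np.isclose(dx @ ds, 0.0)                      # Lemma II.45
    #   np.isclose((x + dx) @ (s + ds), len(x) * mu)  # Lemma II.47, when the full step is feasible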
(7.14) The orthogonality of dx and ds also implies kdx k2 + kds k2 = u−1 − u 2 . (7.15) Note that the displacements dx , ds (and also dy ) are zero if and only if u−1 − u = 0. In this case x, y and s coincide with the respective µ-centers. It will be clear that the quantity u−1 − u is a natural candidate for measuring closeness to the pair of 4 5 Exercise 48 Verify that the expressions for the scaled displacements dx and ds in (7.13) and (7.14) are in accordance with Theorem II.42. Exercise 49 Show that PAD + PHD −1 = I, where I denotes the identity matrix in IRn . Also show that PAD = D −1 H T HD −2 H T −1 HD −1 , PHD −1 = DAT AD 2 AT −1 AD. 156 II Logarithmic Barrier Approach µ-centers. It turns out that it is more convenient not to use the norm of u−1 − u itself, but to divide it by 2. Therefore, we define δ(x, s; µ) := 1 1 −1 u −u = 2 2 r xs − µ r µ . xs (7.16) By (7.15), δ(x, s; µ) is simply half of the Euclidean norm of the concatenation of the search direction vectors ∆x and ∆s after some appropriate scaling.6,7,8 In the previous section we discussed that the quality of the Newton step greatly depends on the second-order term ∆x∆s. Recall that this term, when expressed in the scaled displacements, equals µdx ds . We proceed by showing that the vector dx ds can be nicely bounded in terms of the proximity measure. Lemma II.49 Let (x, s) be any positive primal-dual √ pair and suppose µ > 0. If δ := δ(x, s; µ), then kdx ds k∞ ≤ δ 2 and kdx ds k ≤ δ 2 2. Proof: Since the vectors dx and ds are orthogonal, the lemma follows immediately from the first uv−lemma (Lemma C.4 in Appendix C) by noting that dx +ds = u−1 −u and u−1 − u = 2δ. ✷ We are now ready for the main result of this section (Theorem II.50 below), which is the primal-dual analogue of Theorem II.21 for the dual logarithmic barrier method. Theorem II.50 If δ := δ(x, s; µ) ≤ 1, then the primal-dual Newton step is feasible, i.e., x+ and s+ are nonnegative. Moreover, if δ < 1, then x+ and s+ are positive and δ2 δ(x+ , s+ ; µ) ≤ p . 2(1 − δ 2 ) 6 This proximity measure was introduced by Jansen et al. [157]. In the context of primal-dual methods, most authors used a different but closely related proximity measure. See Section 7.5.3. Because of the analogy with the proximity measure in the dual case, and also because of its natural interpretation as the norm of the scaled Newton direction, we prefer the proximity measure as defined by (7.16). Another motivation for the use of this measure is that it allows sharper estimates in the analysis of the primal-dual methods. This will become clear later. 7 Exercise 50 Let δ = δ(x, s; µ). In general the vector x̄ = µs−1 is not primal feasible, and the vector s̄ = µx−1 not dual feasible. The aim of this exercise is to show that the deviation from feasibility can be measured in a natural way. Defining Gp = AD 2 AT , we have kAx̄ − bkG−1 = p √ µ kds k , Gd = HD −2 H T , kH s̄ − HckG−1 = √ d µ kdx k . As a consequence, prove that kAx̄ − bk2 −1 + kH s̄ − Hck2 −1 = 4µδ2 . Gp 8 G d Exercise 51 Prove that v u n  u1 X xi s i δ(x, s; µ) = t cosh log 2 i=1 µ  −1 v u n  1 uX xi s i =t log sinh2 . i=1 2 µ II.7 Primal-Dual Logarithmic Barrier Method 157 Proof: Since δ := δ(x, s; µ) ≤ 1, Lemma II.49 implies that kdx ds k∞ ≤ 1. Now Lemma II.48 yields that the primal-dual Newton step is feasible, thus proving the first part of the theorem. Now let us turn to the proof of the second statement. Obviously, the same arguments as for the first part show that if δ < 1 then x+ and s+ are positive. 
Let δ + := δ(x+ , s+ ; µ) and s u+ := x+ s+ . µ Then we have, by definition, 2δ + = (u+ )−1 − u+ = (u+ )−1 e − (u+ )2 Recall from (7.8) that  . x+ s+ = µ (e + dx ds ) . Hence, u+ = Substitution gives p e + dx ds . dx ds 2δ + = √ e + dx ds kdx ds k . ≤ p 1 − kdx ds k∞ Now using the bounds in Lemma II.49 we obtain √ δ2 2 2δ + ≤ √ . 1 − δ2 Dividing both sides by 2 we arrive at the result in the theorem. ✷ Theorem II.50 makes clear that the primal-dual Newton method is quadratically convergent in the region   1 (7.17) (x, s) ∈ P × D : δ(x, s; µ) ≤ √ = 0.7071 , 2 where we have δ + ≤ δ 2 . It is clear that Theorem II.50 has no value p if the upper bound for δ(x+ , s+ ; µ) is not smaller than δ, which is the case if δ ≥ 2/3 = 0.8165. As for the dual Newton method, we provide a graphical example to illustrate how the primal-dual Newton process behaves. Example II.51 We use the same problem as in Example II.7 with b = (1, 1)T . So A, b and c are given by   " # " # 1 1 1 −1 0   . A= , c =  1 , b = 1 0 0 1 1 Instead of drawing a graph in the dual (or primal) space we take another approach. We associate with each primal-dual pair (x, s) the positive vector w = xs, and represent this vector by a point in the so-called w-space, which is the interior of the nonnegative 158 II Logarithmic Barrier Approach 4 0.707 w2 ✻ ✻ central path 3 0.5 δ(w, 1) = 2 √1 2 ❄ 1 0 0 Figure 7.1 1 2 3 ✲ w1 4 Quadratic convergence of primal-dual Newton process (µ = 1). orthant of IRn , with n = 3. Note that δ(x, s; µ) = 0 if and only if x = x(µ) and s = s(µ), and that in that case xs = µe. Hence, in the w-space the central path is represented by the half-line µe, µ > 0.√ Figure 7.1 shows the level curves (in the wspace) for the proximity values τ = 1/ 2 and τ 2 with respect to µ = 1, and also how the Newton step behaves when applied at some points on the boundary of the region of quadratic convergence. This figure depicts the w-space projected onto its first two coordinates. The starting point for a Newton step is always indicated by the symbol ‘o ’, and the point resulting from the step by the symbol ‘∗ ’.9 The curve connecting the two points shows the intermediate values of xs on the way from the starting point to the point after the full Newton step. The points on these curves represent xα sα = (x + α∆x)(s + α∆s) = xs + α (x∆s + s∆x) + α2 ∆x∆s, 0 ≤ α ≤ 1, where (x0 , s0 ) is the starting point of the iteration and (x1 , s1 ) the result of the full Newton step. If there were no second-order effects (i.e., if ∆x∆s = 0) then this curve would be a straight line. So the curvature of the line connecting the point before and after a step is an indication of the second-order effect. Note that after the Newton step the new proximity value is always smaller than τ 2 = 1/2, in agreement with Theorem II.50. In fact, one may observe that often the decrease in the proximity to the 1-center is much more significant. 9 The starting points in this example were obtained by using theory that will be developed later in the book, in Part III. There we show that for any positive vector w ∈ IRn there exists a primal-dual pair (x, s) such that xs = w and we also deal with methods that yield such a pair. For each starting point the first two entries of w can be read from the figure; for the third coordinate of w we took the value 1, the value of w3 at the 1-center, since x(1)s(1) = e. 
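For readers who want to reproduce experiments like the one in Example II.51, the following minimal numerical sketch (an editorial illustration, not part of the original text) computes the full primal-dual Newton step from the system A∆x = 0, A^T∆y + ∆s = 0, s∆x + x∆s = µe − xs, together with the proximity measure (7.16). The function names newton_step and delta and the toy data in the usage lines are assumptions of this sketch; in particular the toy instance is not the sample problem of Example II.51.

    import numpy as np

    def newton_step(A, x, s, mu):
        # Full primal-dual Newton step: A*dx = 0, A^T*dy + ds = 0, s*dx + x*ds = mu*e - x*s,
        # solved via the normal equations (A D^2 A^T) dy = -A S^{-1} (mu*e - x*s).
        d2 = x / s                                   # diagonal of D^2 = X S^{-1}
        r = mu - x * s                               # right-hand side mu*e - xs
        dy = np.linalg.solve((A * d2) @ A.T, -A @ (r / s))
        ds = -A.T @ dy
        dx = (r - x * ds) / s
        return dx, dy, ds

    def delta(x, s, mu):
        # Proximity measure (7.16): half the Euclidean norm of u^{-1} - u with u = sqrt(xs/mu).
        u = np.sqrt(x * s / mu)
        return 0.5 * np.linalg.norm(1.0 / u - u)

    # Hypothetical strictly feasible toy data: A x = b with b = [3], and s = c - A^T y > 0.
    A = np.array([[1.0, 1.0, 1.0]])
    x, s = np.array([1.0, 1.0, 1.0]), np.array([1.0, 2.0, 3.0])
    mu = x @ s / x.size                              # chosen so that x^T s = n*mu
    dx, dy, ds = newton_step(A, x, s, mu)
    print(delta(x, s, mu), delta(x + dx, s + ds, mu))   # proximity before and after the full step

For this toy pair δ(x, s; µ) is below 1/√2, so by (7.17) the proximity after the full step should be at most δ²; running the snippet lets one check this, and by Lemma II.47 the new duality gap equals nµ.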
II.7 Primal-Dual Logarithmic Barrier Method 159 When starting outside the region of quadratic convergence the behavior of the Newton process is quite unpredictable. Note that the feasibility of the (full) Newton step is then not guaranteed by the theory. 9 w2 ✻8 1.5 ✻ central path 7 1.25 6 δ(w, 1) = 1.5 1 5 ❄ 4 0.75 3 0.5 2 0.25 1 0 0 Figure 7.2 1 2 3 4 5 6 7 8 9 ✲ w1 Demonstration of the primal-dual Newton process. In Figure 7.2 we consider the behavior of the Newton process outside this region, even for proximity values larger than 1. The behavior (in this simple example) is surprisingly good if we start on (or close to) the central path. When starting closer to the boundary of the w-space the second-order effect becomes more evident and this may result in infeasibility of the Newton step, as Figure 7.2 demonstrates (for example if w1 = 8 and w2 = 1). This observation, that Newton’s method performs better when the starting point is on or close to the central path than when we start close to the boundary of the nonnegative orthant, is not supported by the theory, but is in agreement with common computational practice. ♦ 7.4.1 A sharper local quadratic convergence result In this section we show that Theorem II.50 can be slightly improved. By using the third uv−lemma (Lemma C.7 in Appendix C) we obtain the following. Theorem II.52 If δ = δ(x, s; µ) < 1 then δ2 δ(x+ , s+ ; µ) ≤ p . 2(1 − δ 4 ) 160 II Logarithmic Barrier Approach Proof: From the proof of Theorem II.50 we recall the definitions of δ + and u+ , and the relation p u+ = e + dx ds . 2 Since dx and ds are orthogonal this implies that ku+ k = n. Now we may write 4(δ + )2 2 2 2 = (u+ )−1 + u+ − 2n   e 2 T −e . −n=e e + dx ds = (u+ )−1 − u+ = (u+ )−1 Application of Lemma C.7 to the last expression (with u = dx and v = ds ) yields the result of the theorem, since kdx + ds k = 2δ, with δ < 1. ✷ 7.5 Primal-dual logarithmic barrier algorithm with full Newton steps In this section we investigate a primal-dual algorithm using approximate centers. The algorithm is described below. It is assumed that we are given a positive primal-dual pair (x0 , s0 ) ∈ P + × D+ and µ0 > 0 such that (x0 , s0 ) is close to the µ0 -center in the sense of the proximity measure δ(x0 , s0 ; µ0 ). In the algorithm ∆x and ∆s denote the primal-dual Newton step, as defined before. Primal-Dual Logarithmic Barrier Algorithm with full Newton steps Input: A proximity parameter τ , 0 ≤ τ < 1; an accuracy parameter ε > 0; (x0 , s0 ) ∈ P + × D+ and µ0 > 0 such that (x0 )T s0 = nµ0 and δ(x0 , s0 ; µ0 ) ≤ τ ; a barrier update parameter θ, 0 < θ < 1. begin x := x0 ; s := s0 ; µ := µ0 ; while nµ ≥ (1 − θ)ε do begin x := x + ∆x; s := s + ∆s; µ := (1 − θ)µ; end end We have the following theorem. The proof will follow below. II.7 Primal-Dual Logarithmic Barrier Method 161 √ √ Theorem II.53 If τ = 1/ 2 and θ = 1/ 2n, then the Primal-Dual Logarithmic Barrier Algorithm with full Newton steps requires at most   √ nµ0 2n log ε iterations. The output is a primal-dual pair (x, s) such that xT s ≤ ε. 7.5.1 Convergence analysis Just as in the dual case the proof depends on a lemma that quantifies the effect on the proximity measure of an update of the barrier parameter to µ+ = (1 − θ)µ. Lemma II.54 Let (x, s) be a positive primal-dual pair and µ > 0 such that xT s = nµ. Moreover, let δ := δ(x, s; µ) and let µ+ = (1 − θ)µ. Then δ(x, s; µ+ )2 = (1 − θ)δ 2 + Proof: Let δ + := δ(x, s; µ+ ) and u = 4(δ + )2 = θ2 n . 4(1 − θ) p xs/µ. 
Then, by definition, √ u 1 − θu−1 − √ 1−θ 2 2 = √  θu 1 − θ u−1 − u + √ 1−θ 2 . From xT s = nµ it follows that kuk = n. Hence, u is orthogonal to u−1 − u: Therefore,  2 uT u−1 − u = n − kuk = 0. 4(δ + )2 = (1 − θ) u−1 − u 2 2 2 + θ2 kuk . 1−θ Finally, since u−1 − u = 2δ and kuk = n the result follows. ✷ The proof of Theorem √ II.53 now goes as follows. At the start of the algorithm we have δ(x, s; µ) ≤ τ = 1/ 2. After the primal-dual Newton step to the µ-center we have, by Theorem II.50, δ(x+ , s+ ; µ) ≤ 1/2. Also, from Lemma II.47, (x+ )T s+ =√nµ. Then, after the barrier parameter is updated to µ+ = (1 − θ)µ, with θ = 1/ 2n, Lemma II.54 yields the following upper bound for δ(x+ , s+ ; µ+ ): δ(x+ , s+ ; µ+ )2 ≤ 1 3 1−θ + ≤ . 4 8(1 − θ) 8 Assuming n ≥ 2, the last inequality follows since its left hand side is a convex function of θ, whose value is 3/8 both in θ = 0 and θ = 1/2. Since θ ∈ [0, 1/2],√the left hand side does not exceed 3/8. Since 3/8 < 1/2, we obtain δ(x+ , s+ ; µ+ ) ≤ 1/ 2 = τ. Thus, after each iteration of the algorithm the property δ(x, s; µ) ≤ τ 162 II Logarithmic Barrier Approach is maintained, and hence the algorithm is well defined. The iteration bound in the theorem follows from Lemma I.36. Finally, since after each full Newton step the duality gap attains its target value, by Lemma II.47, the duality gap for the pair (x, s) generated by the algorithm is at most ε. This completes the proof of the theorem. ✷ Remark II.55 It is worthwhile to discuss the quality of the iteration bound in Theorem II.53. For that purpose we consider the hypothetical situation where the Newton step in each iteration is exact. Then, putting δ + = δ(x+ , s+ , µ+ ), after the update of the barrier parameter we have θ2 n , 4(δ + )2 = 1−θ p √ and hence we have δ + ≤ 1/ 2 only if θ2 n ≤ 2(1 − θ). This occurs only if θ < 2/n. Hence, √ if we maintain the property δ(x, s; µ) ≤ 1/ 2 after the update of the barrier parameter, then the iteration bound will never be smaller than q nµ0 n log 2 ε  . (7.18) Note that the iteration bound of Theorem II.53 is only a factor 2 worse than the ‘ideal’ iteration bound (7.18). Recall that the bound (7.18) assumes that the Newton step is exact in each iteration. In this respect it is interesting to indicate that for larger values of n the result of Theorem II.53 can be improved so that it becomes closer to the ‘ideal’ iteration bound. But then we √ need to use the stronger quadratic convergence result of Theorem II.52. If we take θ = 1/ n, then by using √ Lemma II.54 and Theorem II.52, we may easily verify that the property δ(x, s; µ) ≤ τ = 1/ 2 is maintained if 1−θ 1 1 + ≤ . 4(1 − θ) 6 2 This holds if θ ≤ 0.36602, which corresponds to n ≥ 8. Thus, for n ≥ 8 the iteration bound of Theorem II.53 can be improved to  √ n log nµ0 ε  . (7.19) This iteration bound is the best among all known iteration bounds for interior-point methods. √ It differs by only a factor 2 from the ideal bound (7.18). • 7.5.2 Illustration of the algorithm with full Newton steps We use the same sample problem as before (see Sections 6.7.2, 6.8.4 and 6.9.7). As starting point we use the vectors x = (2, 1, 1), y = (0, 0) and s = (1, 1, 1), and since xT s = 4, we take the initial value of the barrier parameter µ equal to 4/3. We can easily check that δ(x, s; µ) = 0.2887. So these data can indeed be used to initialize the algorithm. With ε = 10−4 , the algorithm generates the data collected in Table 7.1.. As before, Table 7.1. contains one entry (the first) of the vectors x and s. 
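Before examining the remaining columns, readers who wish to reproduce a run of this kind may find a compact sketch of the boxed algorithm of Section 7.5 useful. The code below is an editorial illustration, not the authors' implementation; it assumes a strictly feasible starting pair with (x^0)^T s^0 = nµ^0 and δ(x^0, s^0; µ^0) ≤ 1/√2, and the data in the commented example call are a hypothetical toy instance, not the sample problem.

    import numpy as np

    def full_step_algorithm(A, x, s, mu, eps):
        # Primal-dual logarithmic barrier algorithm with full Newton steps:
        # each iteration takes a full Newton step towards the current mu-center
        # and then updates mu := (1 - theta) mu, with theta = 1/sqrt(2n) as in Theorem II.53.
        n = x.size
        theta = 1.0 / np.sqrt(2.0 * n)
        while n * mu >= (1.0 - theta) * eps:
            d2 = x / s
            r = mu - x * s
            dy = np.linalg.solve((A * d2) @ A.T, -A @ (r / s))
            ds = -A.T @ dy
            dx = (r - x * ds) / s
            x, s = x + dx, s + ds        # full Newton step
            mu *= 1.0 - theta            # barrier parameter update
        return x, s, mu

    # Example call with a hypothetical feasible instance (same toy data as the earlier sketch):
    # x, s, mu = full_step_algorithm(np.array([[1.0, 1.0, 1.0]]),
    #                                np.array([1.0, 1.0, 1.0]),
    #                                np.array([1.0, 2.0, 3.0]), 2.0, 1e-4)

By Theorem II.53 the number of passes through this loop is at most √(2n) log(nµ^0/ε).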
The seventh column contains the values of the proximity δ = δ(x, s; µ) before the Newton step, and the eighth column the proximity δ + = δ(x+ , s+ ; µ) after the Newton step at (x, s) to the current µ-center. II.7 Primal-Dual Logarithmic Barrier Method It. nµ x1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 4.000000 2.367007 1.400680 0.828855 0.490476 0.290240 0.171750 0.101633 0.060142 0.035589 0.021060 0.012462 0.007375 0.004364 0.002582 0.001528 0.000904 0.000535 0.000317 0.000187 0.000111 0.000066 0.000039 2.000000 2.000000 1.510102 1.267497 1.148591 1.085283 1.049603 1.029055 1.017089 1.010076 1.005950 1.003516 1.002079 1.001230 1.000728 1.000430 1.000255 1.000151 1.000089 1.000053 1.000031 1.000018 1.000011 Table 7.1. y1 y2 163 s1 δ δ+ θ 0.000000 0.000000 1.000000 0.2887 0.0000 0.4082 0.333333 −0.333333 0.666667 0.4596 0.0479 0.4082 0.442200 0.210998 0.557800 0.4611 0.0586 0.4082 0.601207 0.533107 0.398793 0.4618 0.0437 0.4082 0.744612 0.723715 0.255388 0.4608 0.0271 0.4082 0.843582 0.836508 0.156418 0.4601 0.0162 0.4082 0.905713 0.903253 0.094287 0.4598 0.0096 0.4082 0.943610 0.942750 0.056390 0.4597 0.0057 0.4082 0.966423 0.966122 0.033577 0.4596 0.0034 0.4082 0.980058 0.979953 0.019942 0.4596 0.0020 0.4082 0.988174 0.988137 0.011826 0.4596 0.0012 0.4082 0.992993 0.992980 0.007007 0.4596 0.0007 0.4082 0.995850 0.995846 0.004150 0.4596 0.0004 0.4082 0.997543 0.997542 0.002457 0.4596 0.0002 0.4082 0.998546 0.998545 0.001454 0.4596 0.0001 0.4082 0.999139 0.999139 0.000861 0.4596 0.0001 0.4082 0.999491 0.999491 0.000509 0.4596 0.0001 0.4082 0.999699 0.999699 0.000301 0.4596 0.0000 0.4082 0.999822 0.999822 0.000178 0.4596 0.0000 0.4082 0.999894 0.999894 0.000106 0.4596 0.0000 0.4082 0.999938 0.999938 0.000062 0.4596 0.0000 0.4082 0.999963 0.999963 0.000037 0.4596 0.0000 0.4082 0.999978 0.999978 0.000022 − − − Output of the primal-dual full-step algorithm. Comparing the results in Table 7.1. with those in the corresponding table for the dual algorithm with full steps (Table 6.1., page 124), the most striking differences are the number of iterations and the behavior of the proximity measure. In the primal-dual case the number of iterations is 22 (instead of 53). This can be easily understood from √ the fact that we could use the larger barrier update parameter θ = 1/ 2n (instead of √ θ = 1/(3 n)). The second difference is probably more important. In the primal-dual case Newton’s method is much more efficient than in the dual case. This is especially evident in the final iterations where both methods show very stable behavior. In the dual case the proximity takes in these iterations the values 0.2722 (before) and 0.0524 (after the Newton step), whereas in the primal-dual case these values are respectively 0.4596 and 0.0000. Note that in the dual case the effect of the Newton step is slightly better than the quadratic convergence result of Theorem II.21. In the primal-dual case, however, the effect of the Newton step is much better than predicted by Theorem II.50, and even much better than the improved quadratic convergence result of Theorem II.52. 164 II Logarithmic Barrier Approach The figures in Table 7.1. justify the statement (at least for this sample problem, but we observed the same phenomenon in other experiments) that asymptotically the primal-dual Newton method is almost exact. Remark II.56 It is of interest to have a closer (and more accurate) look at the proximity values in the final iterations. They are given in Table 7.2. (page 164). These figures show that It. 
δ δ+ 11 12 13 14 15 16 17 18 19 20 21 0.45960642869434 0.45960584496214 0.45960564054812 0.45960556896741 0.45960554390189 0.45960553512461 0.45960553205110 0.45960553097480 0.45960553059816 0.45960553046642 0.45960553041942 0.00069902816289 0.00041365328341 0.00024478048789 0.00014484936548 0.00008571487895 0.00005072193012 0.00003001478966 0.00001776130347 0.00001051028182 0.00000621947704 0.00000368038542 Table 7.2. Proximity values in the final iterations. in the final iterations, where Newton’s method is almost exact, the quality of the method gradually improves. After the step the proximity decreases monotonically. In fact, surprisingly enough, the rate of decrease of subsequent values of the proximity after the step is almost constant (0.59175). Remember that √ the barrier parameter µ also decreases at a linear rate by a factor 1 − θ, where θ = 1/ 2n. In our case we have n = 3. This gives θ = 0.4082 and 1 − θ = 0.59175, precisely the rate of decrease in δ + . Before the Newton step the proximity √ is almost constant (0.4596). Not surprisingly, this is precisely the value of θ n/(2(1 − θ)). Thus, our numerical experiment gives rise to a conjecture: Conjecture II.57 Asymptotically the quality of the primal-dual Newton step gradually improves. The proximity before the step converges to some constant and the proximity after the step decreases monotonically to zero with a linear convergence rate. The rate of convergence is equal to 1 − θ. This observed behavior of the primal-dual Newton method has no theoretical justification at the moment. • We conclude this section with a graphical illustration. Figure 7.3 shows on two graphs the progress of the algorithm in the w-space (cf. Example II.51 on page 157). In both figures the w-space is projected onto its first two coordinates. The difference between the two graphs is due to the scaling of the axes. On the left graph the scale is linear and on the right graph it is logarithmic. As in Example II.51, the curves connecting the subsequent iterates show the intermediate values of xs on the way to the next iterate. The graphs show that after the first iteration the iterates follow the central path quite accurately. II.7 Primal-Dual Logarithmic Barrier Method 165 101 5 w2 ✻ w2 ✻ 100 ✻ 4 central path 10−1 δ(w, µ1 ) = τ 3 10−2 ❄ 10−3 2 10−4 1 10−5 10−6 0 0 1 2 Figure 7.3 7.5.3 3 4 ✲ w1 5 10−6 10−5 10−4 10−3 10−2 10−1 100 ✲ w1 101 The iterates of the primal-dual algorithm with full steps. The classical analysis of the algorithm In this section we give a different analysis of the primal-dual logarithmic barrier algorithm with full Newton steps. The analysis uses the proximity measure σ(x, s; µ) := xs −e , µ which is very common in the literature on primal-dual methods.10 Because of its widespread use, it seems useful to show in this section how the analysis can be easily adapted to the use of the classical proximity measure. In fact, the only thing we have to do is find suitable analogues of the quadratic convergence result in Theorem II.50 and the barrier update result of Lemma II.54.  p √  Theorem II.58 11 If σ := σ(x, s; µ) ≤ 2/ 1 + 1 + 2 = 0.783155, then the primal-dual Newton step is feasible. Moreover, in that case σ2 . σ(x+ , s+ ; µ) ≤ √ 2 2(1 − σ) Proof: First we derive from u2 − e = σ the obvious inequality 1 − σ ≤ u2i ≤ 1 + σ, 1 ≤ i ≤ n. 10 It was introduced by Kojima, Mizuno and Yoshise [178] and used in many other papers. 
See, e.g., Gonzaga [124], den Hertog [140], Marsten Shanno and Simantiraki [196], McShane, Monma and Shanno [199], Mehrotra and Sun [205], Mizuno [215], Monteiro and Adler [218], Todd [262, 264], Zhang and Tapia [319]. 11 This result is due to Mizuno [212]. 166 II Logarithmic Barrier Approach This implies u−2 ∞ ≤ From (7.4) we recall that 1 . 1−σ (7.20) x+ s+ = µe + ∆x∆s. Hence, using (7.11), we have 12,13 σ(x+ , s+ ; µ) := ∆x∆s x+ s+ −e = = kdx ds k . µ µ By the first uv-lemma (Lemma C.4 in Appendix C) we have 1 1 2 kdx ds k ≤ √ kdx + ds k = √ u−1 − u 2 2 2 2 2 . Using (7.20) we write 2 u−1 − u = u−1 (e − u2 ) 2 ≤ u−2 2 e − u2 ∞ ≤ σ2 . 1−σ Hence we get σ2 σ(x+ , s+ ; µ) ≤ √ . 2 2(1 − σ) Since σ(x+ , s+ ; µ) = kdx ds k, feasibility of the new iterates is certainly guaranteed if σ(x+ , s+ ; µ) ≤ 1, from (7.9). This condition is certainly satisfied if σ2 √ ≤ 1, 2 2(1 − σ)  p √  and this inequality holds if and only if σ ≤ 2/ 1 + 1 + 2 , as can easily be verified. The theorem follows. ✷ 12 Exercise 52 This exercise provides an alternative proof of the first inequality in Lemma C.4. Let u and v denote vectors in IRn and δ > 0 (δ ∈ IR). First prove that min u,v 13 ( u1 v1 : n X ui vi = 0, i=1 n X i=1 u2i + vi2  = 4δ 2 ) = δ2 . Using this, show that if u and v are orthogonal and ku + vk = 2δ then kuvk∞ ≤ δ2 . Exercise 53 This exercise provides tighter version of the second inequality in Lemma C.4. Let u and v denote vectors in IRn and δ > 0 (δ ∈ IR). First prove that max u,v ( n X i=1 u2i vi2 : n X i=1 ui vi = 0, n X i=1  2 u2i + vi = 4δ2 ) = nδ4 . n−1 √ Using this show that if u and v are orthogonal and ku + vk = 2δ then kuvk ≤ δ2 2. II.7 Primal-Dual Logarithmic Barrier Method 167 Lemma II.59 Let (x, s) be a positive primal-dual pair and µ > 0 such that xT s = nµ. Moreover, let σ := σ(x, s; µ) and let µ+ = (1 − θ)µ. Then we have √ σ2 + θ2 n + . σ(x, s; µ ) = 1−θ Proof: Let σ + := σ(x, s; µ+ ), with xT s = nµ. Then, by definition, (σ + )2 = xs −e (1 − θ)µ 2 = 1 (1 − θ)2 xs − e + θe µ 2 . The vectors e and xs/µ − e are orthogonal, as easily follows. Hence xs − e + θe µ 2 = xs −e µ 2 2 + kθek = σ 2 + θ2 n. The lemma follows. ✷ From the above results, it is clear that maintaining the property σ(x, s : µ) ≤ τ during the course of the algorithm amounts to the following condition on θ: s 1 τ4 + nθ2 ≤ τ. (7.21) 1 − θ 8(1 − τ )2 For any given τ this inequality determines how deep the updates of the barrier parameter are allowed to be. Since the full Newton step must be feasible we may assume that 2 p τ≤ √ = 0.783155. 1+ 1+ 2 Squaring both sides of (7.21) gives τ4 + nθ2 ≤ τ 2 (1 − θ)2 . 8(1 − τ )2 √ This implies nθ2 ≤ τ 2 , and hence the parameter θ must satisfy θ ≤ τ / n. The iteration bound of Lemma I.36 becomes smaller for larger values of θ. Our aim here is to show that for the best possible choice of θ the iteration bound resulting from the classical analysis cannot be better than the bound of Theorem II.53. For that purpose we may assume that n is so large that 1 − θ ≈ 1. Then the condition on θ becomes τ4 + nθ2 ≤ τ 2 , 8(1 − τ )2 or equivalently, nθ2 ≤ τ 2 − τ4 . 8(1 − τ )2 (7.22) Note that the right-hand side expression must be nonnegative, which holds only if √ 2 2 √ = 0.738796. τ≤ 1+2 2 168 II Logarithmic Barrier Approach We can easily verify that the right-hand side expression in (7.22) is maximal if 7τ 3 − 22τ 2 + 24τ − 8 = 0, which occurs for τ = 0.60155. Substituting this value in (7.22) we obtain nθ2 ≤ 0.258765, which amounts to θ≤ 1 0.508689 √ ≈ √ . 
n 2 n Obviously, this upper bound for θ is too optimistic. The above argument makes clear that by using the ‘classical’ proximity measure σ(x, s; µ) in the analysis of the primaldual method with full Newton steps, the iteration bound obtained with the proximity measure δ(x, s; µ) cannot be improved. 7.6 7.6.1 A version of the algorithm with adaptive updates Adaptive updating We have seen in Section 7.5 that when the property 1 δ(x, s; µ) ≤ τ = √ 2 (7.23) is maintained after the update of the barrier parameter, p the values of the barrier update parameter θ are limited by the upper bound θ < 2/n, and therefore, the iteration bound cannot be better than the ‘ideal’ bound  r nµ0 n . log 2 ε Thus, larger updates of the barrier parameter are possible only when abandoning the idea that property (7.23) must hold after each update of the barrier parameter. To make clear how this can be done without losing the iteration bound of Theorem II.53, we briefly recall the idea behind the proof of this theorem. After each Newton step we have a primal-dual pair (x, s) and µ > 0 such that τ2 . δ(x, s; µ) ≤ τ̄ = p 2(1 − τ 2 ) (7.24) Then we update µ to a smaller value µ+ = (1 − θ)µ such that δ(x, s; µ+ ) ≤ τ, (7.25) and we perform a Newton step to the µ+ -center, yielding a primal-dual pair (x+ , s+ ) such that δ(x+ , s+ ; µ+ ) ≤ τ̄ . Figure 7.4 illustrates this. Why does this scheme work? It works because every time we perform a Newton step the iterates x and s are such that xs is in the region around the µ-center where II.7 Primal-Dual Logarithmic Barrier Method 169 ... ...✛ T ... x s = nµ ... ❨ ... central path ... ... ... ... ... ... ... ... ... ... µ ... ... ✛ δ(x, s, µ) = τ̄ ... ... ... ... xs ... ... ... ... ... ... ... ... ... ... + . ... . µ = (1 − θ)µ ... + + ... ..✠ ... ...x s . ... ... ✛ δ(x, s, µ+ ) = τ ... ... ... ... ...✛ ... xT s = nµ+ ... Figure 7.4 The primal-dual full-step approach. Newton’s method behaves well. √ The theory guarantees that if the proximity does not exceed the parameter τ = 1/ 2 then we stay within this region. However, in practice the region where Newton’s method behaves well may be much larger. Thus we can adapt our strategy to this phenomenon and choose the smallest barrier parameter µ+ = (1 − θ)µ so that after the Newton step to the µ+ -center the iterates satisfy δ(x+ , s+ ; µ+ ) ≤ τ̄ . Therefore, let us consider the following problem: Given a primal-dual pair (x, s) and µ > 0 such that δ := δ(x, s; µ) ≤ τ̄ , find the largest θ such that after the Newton step at (x, s) with barrier parameter value µ+ = (1 − θ)µ we have δ + = δ(x+ , s+ ; µ+ ) ≤ τ̄ . Here we use the parameter τ̄ instead of τ , because until now τ referred to the proximity before the Newton step, whereas τ̄ is an upper bound for the proximity just after the Newton step. It is natural to take for τ̄ the value 1/2, because this is an upper bound √ for the proximity after the Newton step when the proximity before the step is 1/ 2. Our aim in this section is to investigate how deep the updates can be taken, so as to enhance the performance of the algorithm as much as possible. See Figure 7.5.14 Just as in the case of the dual method with adaptive updates, we need to introduce the so-called primal-dual affine-scaling and primal-dual centering directions at (x, s). 14 The idea of using adaptive updates of the barrier parameter in a primal-dual method can be found in, e.g., Jarre and Saunders [163]. 170 II Logarithmic Barrier Approach ... ...✛ T ... x s = nµ ... ❨ ... central path ... ... ... ... ... .. µ ..... 
✛ δ(x, s, µ) = τ̄ ... . xs..... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... µ+ = (1 − θ)µ ...... ... ... ... + + ...✎. x s .. ... ...✛ ... . Figure 7.5 7.6.2 ✛ δ(x, s, µ+ ) = τ xT s = nµ+ The full-step method with an adaptive barrier update. The primal-dual affine-scaling and centering direction We first recall some definitions and properties from Section 7.4. With r r x xs d= , u= , s µ the vectors x and s can be scaled by d to the vector u as follows: ds d−1 x √ = √ = u. µ µ The same scaling applied to the Newton steps ∆x and ∆s yields the scaled Newton steps dx and ds : d−1 ∆x d∆s dx = √ , ds = √ , µ µ and these satisfy dx + ds = u−1 − u. Moreover, the vectors dx and ds are orthogonal. They are the components of the vector u−1 − u in the null space of AD and the null space of HD−1 respectively: dx ds = = PAD (u−1 − u) PHD−1 (u −1 − u). (7.26) (7.27) II.7 Primal-Dual Logarithmic Barrier Method 171 In this section we work mainly with the scaled Newton steps dx and ds . The last expressions yield a natural way of separating these directions into a so-called affine-scaling component and a centering component. Remark Guoyong Gu: term centering component not natural!! The (scaled) centering directions are defined by dcx = PAD (u−1 ), dcs = PHD−1 (u−1 ), (7.28) and the (scaled) affine directions by dax = −PAD (u), das = −PHD−1 (u). (7.29) Now we have the obvious relations dx ds = = dcx + dax dcs + das and dcx + dcs = u−1 dax + das = −u. The unscaled centering and affine-scaling directions are defined in the obvious way: √ ∆a x := µddax , etc. For the sake of completeness we list these definitions below and we also give some alternative expressions which can straightforwardly be verified. ∆a x := ∆a s := √ √ √ µddax = − µDPAD (u) = −DPAD ( xs) √ √ −1 a √ µd ds = − µD−1 PHD−1 (u) = −D−1 PHD−1 ( xs) ∆c x := √ √ µddcx = µDPAD (u−1 ) = µDPAD ( √exs ) ∆c s := √ −1 c √ −1 µd ds = µD PHD−1 (u−1 ) = µD−1 PHD−1 ( √exs ). Note that the affine-scaling directions ∆a x and ∆a s depend only on the iterates x and s and not on the barrier parameter µ. For the centering directions we have that ∆c x/µ and ∆c s/µ depend only on the iterates x and s and not on the barrier parameter µ. Also note that if we are on the central path, i.e., if x = x(µ) and s = s(µ), then we have u = e. This implies u−1 − u = 0, whence dx = ds = 0. Hence, on the central path we have dax = −dcx and das = −dcs . For future reference we observe that the above definitions imply the obvious relations ∆x = ∆a x + ∆c x (7.30) ∆s = ∆a s + ∆c s, which show that the (unscaled) full Newton step (∆x, ∆s) — at (x, s) and for the barrier parameter value µ — can be nicely decomposed in its affine scaling and its centering component. 172 7.6.3 II Logarithmic Barrier Approach Condition for adaptive updates In this section we start to deal with the problem stated before. Let (x, s) be a positive primal-dual pair and µ > 0 such that δ = δ(x, s; µ) ≤ τ̄ . We want to investigate how large θ can be so that after the Newton step at (x, s) with barrier parameter value µ+ = (1 − θ)µ we have δ + = δ(x+ , s+ ; µ+ ) ≤ τ̄ . We derive a condition for the barrier update parameter θ that guarantees the desired behavior. The vector u, the scaled search directions dx and ds and their (scaled) centering components dcx , dcs and (scaled) affine components dax , das , have the same meaning as in the previous section; the entities u, dax and das depend on the given value µ of the barrier parameter. 
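To make the decomposition of Section 7.6.2 concrete, the following sketch (again an editorial illustration with invented function names) computes the scaled and unscaled affine-scaling and centering components. It forms P_{AD} as the orthogonal projection onto the null space of AD and uses the relations d^a_x + d^a_s = −u and d^c_x + d^c_s = u^{-1} for the complementary parts, which is consistent with P_{AD} + P_{HD^{-1}} = I (Exercise 49).

    import numpy as np

    def affine_and_centering_steps(A, x, s, mu):
        # Decomposition (7.30) of the full Newton step into its affine-scaling
        # and centering components, following (7.28) and (7.29).
        d = np.sqrt(x / s)
        u = np.sqrt(x * s / mu)
        AD = A * d                                     # the matrix A D
        def P_AD(v):                                   # projection onto the null space of AD
            return v - AD.T @ np.linalg.solve(AD @ AD.T, AD @ v)
        dax, dcx = -P_AD(u), P_AD(1.0 / u)             # scaled components in the null space of AD
        das = -(u + dax)                               # since d^a_x + d^a_s = -u
        dcs = 1.0 / u - dcx                            # since d^c_x + d^c_s = u^{-1}
        sq = np.sqrt(mu)
        return (sq * d * dax, sq / d * das,            # unscaled affine-scaling step (Delta^a x, Delta^a s)
                sq * d * dcx, sq / d * dcs)            # unscaled centering step (Delta^c x, Delta^c s)

Adding the affine-scaling and centering components returned by this routine reproduces the full Newton step, cf. (7.30).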
The scaled search directions at (x, s) with barrier parameter value + µ+ are denoted by d+ x and ds . Letting ∆x and ∆s denote the (unscaled) Newton + directions with respect to µ , we have s∆x + x∆s = µ+ e − xs, and therefore, also using (7.11),  + x+ s+ = µ+ e + ∆x∆s = µ+ e + d+ x ds . + By Lemma II.48, the step is feasible if e + d+ x ds ≥ 0, and this certainly holds if + d+ x ds ∞ ≤ 1. Moreover, from the proof of Theorem II.50 we recall that the proximity δ + := δ(x+ , s+ ; µ+ ) of the new pair (x+ , s+ ) with respect to the µ+ -center is given by d+ d+ 2δ + = p x s . + e + d+ x ds This implies that we have δ + ≤ τ̄ if and only if + d+ x ds p + e + d+ x ds 2 ≤ 4τ̄ 2 . In the sequel we use the weaker condition + d+ x ds 2 + ≤ 4τ̄ 2 1 − d+ x ds ∞  , (7.31) which we refer to as the condition for adaptive updating. A very important observation is that when this condition is satisfied, the Newton step is feasible. Because, if (7.31) holds, since the left-hand side expression is nonnegative, the right-hand side expression + must be nonnegative as well, and hence kd+ x ds k∞ ≤ 1. Thus, in the further analysis we may concentrate on the condition for adaptive updating (7.31). 7.6.4 Calculation of the adaptive update We proceed by deriving upper bounds for the 2-norm and the infinity norm of the + vector d+ x ds . It is convenient to introduce the vector r xs ū := . µ+ II.7 Primal-Dual Logarithmic Barrier Method We then have ū = Hence, using this and (7.26), and r xs = µ+ r 173 xs u . = √ (1 − θ)µ 1−θ  √  1 −1 − ū = 1 − θPAD u−1 − √ d+ PAD (u) x = PAD ū 1−θ   √ 1 −1 PHD−1 (u) . d+ − ū = 1 − θPHD−1 u−1 − √ s = PHD−1 ū 1−θ Now using (7.28) and (7.29) we obtain d+ x = d+ s = √ da 1 − θ dcx + √ x 1−θ √ da 1 − θ dcs + √ s . 1−θ Note that d+ x can be rewritten in the following way: √ da 1 − θdcx + √ x 1−θ  (7.32) (7.33)  √ 1 √ − 1 − θ dax 1−θ = √ 1 − θ (dcx + dax ) + = √ θ 1 − θdx + √ dax . 1−θ Since d+ s can be reformulated in exactly the same way we find d+ x = d+ s = √ θ 1 − θdx + √ dax 1−θ √ θ 1 − θds + √ das . 1−θ Multiplication of both expressions gives + a a d+ x ds = (1 − θ)dx ds + θ (dx ds + ds dx ) + θ2 a a d d . 1−θ x s (7.34) + At this stage we see how the coordinates of the vector d+ x ds depend on θ. The + + coordinates of (1 − θ)dx ds are quadratic functions of θ: + 2 a a 2 a a (1 − θ)d+ x ds = (1 − θ) dx ds + θ(1 − θ) (dx ds + ds dx ) + θ dx ds . When multiplying the condition (7.31) for adaptive updating by (1−θ)2 , this condition can be rewritten as + 4τ̄ 2 (1 − θ)2 − (1 − θ)d+ x ds 2 + ≥ 4τ̄ 2 (1 − θ) (1 − θ)d+ x ds ∞ . (7.35) Now denoting the left-hand side member by p(θ) and the i-th coordinate of the vector + (1 − θ)d+ x ds by qi (θ), with τ̄ given, we need to find the largest positive θ that satisfies the following inequalities: p(θ) p(θ) ≥ ≥ 4τ̄ 2 (1 − θ)qi (θ), 1 ≤ i ≤ n −4τ̄ 2 (1 − θ)qi (θ), 1 ≤ i ≤ n. 174 II Logarithmic Barrier Approach Since p(θ) is a polynomial of degree 4 in θ, and each qi (θ) is a polynomial of degree 2 in θ, the largest positive θ satisfying each single one of these 2n inequalities can be found straightforwardly by solving a polynomial equation of degree 4. The smallest of the 2n positive numbers obtained in this way (some of them may be infinite, but not all of them!) is the value of θ determined by the condition of adaptive updating. Thus we have shown that the largest θ satisfying the condition for adaptive updating can be found by solving 2n polynomial equations of degree 4.15 Below we deal with a second approach. 
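Before turning to that second approach, it may help to note that the first approach can also be carried out purely numerically: rather than solving the 2n quartic equations in closed form, one can locate a large admissible θ for condition (7.31) by bisection. The sketch below is an editorial illustration and not the text's procedure; it assumes that θ = 0 is admissible (which holds when δ(x, s; µ) ≤ τ̄), it always returns an admissible θ, and it returns the maximal one whenever the admissible set of θ is an interval.

    import numpy as np

    def adaptive_theta(dx, ds, dax, das, tau_bar=0.5, tol=1e-10):
        # Search for a large theta satisfying the condition for adaptive updating (7.31),
        # using d_x^+ = sqrt(1-theta) d_x + theta/sqrt(1-theta) d^a_x and the analogous
        # expression for d_s^+.
        def admissible(theta):
            a = np.sqrt(1.0 - theta)
            prod = (a * dx + theta / a * dax) * (a * ds + theta / a * das)
            return prod @ prod <= 4.0 * tau_bar**2 * (1.0 - np.max(np.abs(prod)))
        lo, hi = 0.0, 1.0 - 1e-12
        if admissible(hi):
            return hi
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if admissible(mid) else (lo, mid)
        return lo

Here dx and ds are the scaled Newton displacements (7.13) and (7.14) and dax, das the scaled affine-scaling components (7.29), all computed at the current iterate; the default τ̄ = 1/2 corresponds to the choice discussed above.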
We consider a further relaxation of the condition for adaptive updating that requires the solution of only one quadratic equation. Of course, this approach yields a smaller value of θ than the above procedure, which gives the exact solution of the condition (7.31) for adaptive updating. Before proceeding it is of interest to investigate the special case where we start at the µ-centers x = x(µ) and s = s(µ). 7.6.5 Special case: adaptive update at the µ-center When we start with x = x(µ) and s = s(µ), we established earlier that u = e, dx = ds = 0, dax = −dcx and das = −dcs . Substituting this in (7.34) we obtain + d+ x ds = θ2 a a d d . 1−θ x s Now we can use the first uv-lemma (Lemma C.4 in Appendix C) to estimate the 2-norm and the infinity norm of dax das . Since dax + das = −u = −e, we obtain n kdax das k ≤ √ , 2 2 kdax das k∞ ≤ n . 4 Substitution in (7.31) gives   θ2 n θ 4 n2 2 . 1 − ≤ 4τ̄ 8(1 − θ)2 4(1 − θ) This can be rewritten as which is equivalent to  or θ 4 n2 τ̄ 2 θ2 n + ≤ 4τ̄ 2 , 2 8(1 − θ) 1−θ √ θ2 n √ + τ̄ 2 2 2 2(1 − θ) 2 ≤ 4τ̄ 2 + 2τ̄ 4 , √ p √  θ2 n ≤2 2 4τ̄ 2 + 2τ̄ 4 − τ̄ 2 2 . 1−θ Substituting τ̄ = 1/2 gives θ2 n ≤ 2. 1−θ 15 In fact, more efficient procedures exist for solving the condition for adaptive updating, but here our only aim has been to show that there exists an efficient procedure for finding the maximal value of the parameter θ satisfying the condition for adaptive updating. II.7 Primal-Dual Logarithmic Barrier Method 175 This result has its own interest. The bound obtained is exactly the ‘ideal’ bound for θ derived in Section 7.5 for the hypothetical situation where the Newton step is exact. Here we obtained a better bound without this assumption, but under the more realistic assumption that we start at the µ-centers x = x(µ) and s = s(µ). 7.6.6 A simple version of the condition for adaptive updating We return to the general case, and show how a weakened version of the condition for adaptive updating  + 2 + d+ ≤ 4τ̄ 2 1 − d+ x ds x ds ∞ can be reduced to a quadratic inequality in θ. With + d+ := d+ x + ds , the first uv-lemma (Lemma C.4 in Appendix C) implies that 2 2 + d+ ≤ x ds kd+ k √ , 2 2 + d+ x ds ∞ ≤ kd+ k . 4 Substituting these bounds in the condition for adaptive updating we obtain the weaker condition ! 4 2 kd+ k kd+ k 2 1− . ≤ 4τ̄ 8 4 Rewriting this as  we obtain d+ d+ 2 2 + 4τ̄ 2 ≤ 2 ≤ 32τ̄ 2 + 16τ̄ 4 , p 32τ̄ 2 + 16τ̄ 4 − 4τ̄ 2 . Substituting τ̄ = 1/2 leads to the condition d+ 2 ≤ 2. d+ x (7.36) d+ s , From the expressions (7.32) and (7.33) for and and also using that and dax + das = −u, we find √ u d+ = 1 − θ u−1 − √ . 1−θ dcx +dcs = u−1 From this expression we can calculate the norm of d+ : d+ Since kuk2 = n and u−1 d+ 16 Since d+ namely 2 2 2 = (1 − θ) u−1 2 + kuk2 − 2n. 1−θ = n + 4δ 2 , where δ = δ(x, s; µ), we obtain  = (1 − θ) n + 4δ 2 + θ2 n 16 n − 2n = 4(1 − θ)δ 2 + . 1−θ 1−θ = 2δ(x, s : µ+ ) this analysis yields in a different way the same result as in Lemma II.54, δ(x, s; µ+ )2 = (1 − θ)δ2 + θ2 n . 4(1 − θ) 176 II Logarithmic Barrier Approach Putting this in (7.36) we obtain the following condition on θ: 4(1 − θ)δ 2 + θ2 n ≤ 2. 1−θ The largest θ satisfying this inequality is given by √ 2n + 1 − 4nδ 2 + 4δ 2 − 1 θ= . n + 4δ 2 (7.37) With this value of θ we are sure that when starting with δ(x, s; µ) = δ, after the Newton step with barrier parameter value µ+ = (1−θ)µ we have δ(x+ , s+ ; µ+ ) ≤ 1/2. If δ = 0, the above expression reduces to θ= and if δ = 1/2 to 1 2 √ ≤√ 1 + 2n + 1 2n 1 θ= √ , 17 n+1 as easily may be verified. 
Hence, when using cheap adaptive updates the actual value of θ varies from iteration to iteration but it always lies between √ the above two extreme 2. As a consequence, the values. The ratio between these extreme values is about √ speedup factor is bounded above by (approximately) 2. 7.6.7 Illustration of the algorithm with adaptive updates With the same example as in the previous illustrations, and the same initialization of the algorithm as in Section 7.5.2, we experiment in this section with two adaptiveupdate strategies. First we consider the most expensive strategy, and calculate the barrier update parameter θ from (7.35). In this case we need to solve 2n polynomial inequalities of degree four. The algorithm, with ε = 10−4 , then runs as shown in Table 7.3.. As before, Table 7.3. contains one entry (the first) of the vectors x and s. A new column shows the value of the barrier update parameter in each iteration. The fast increase of this parameter to almost 1 is surprising. It results in very fast convergence of the method: only 5 iterations yield the desired accuracy. When we calculate θ according to (7.37), the performance of the algorithm is as shown in Table 7.4.. Now 15 iterations are needed instead of 6. In this example in the final iterations θ seems to stabilize around the value 0.58486. This implies that the convergence rate for the duality gap is linear. This is in contrast with the other approach, where the convergence rate for the duality gap appears to be quadratic. Unfortunately, at this time no theoretical justification for a quadratic convergence rate of the adaptive version of the full-step method exists. For the moment we leave 17 We could have used this value of θ in Theorem II.53, leading to the iteration bound  √ nµ0 n + 1 log ε  for the Primal-Dual Logarithmic Barrier Algorithm with full Newton steps. II.7 Primal-Dual Logarithmic Barrier Method 177 It. nµ x1 y1 y2 s1 δ δ+ θ 0 1 2 3 4 5 4.000000 1.281509 0.197170 0.004586 0.000002 0.000000 2.000000 1.093836 1.010191 1.000224 1.000000 1.000000 0.000000 0.333333 0.888935 0.997391 0.999999 1.000000 0.000000 0.572830 0.934277 0.998471 0.999999 1.000000 1.000000 0.666667 0.111065 0.002609 0.000001 0.000000 0.2887 0.7071 0.7071 0.7071 0.7071 − 0.7071 0.7071 0.7071 0.7071 0.1472 − 0.679623 0.846142 0.976740 0.999460 0.999999 − Table 7.3. The primal-dual full-step algorithm with expensive adaptive updates. It. 
nµ x1 y1 y2 s1 δ δ+ θ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 4.000000 1.860612 0.866381 0.393584 0.177943 0.080352 0.036275 0.016375 0.007392 0.003337 0.001506 0.000680 0.000307 0.000139 0.000063 0.000028 2.000000 1.286871 1.138033 1.063707 1.029247 1.013307 1.006028 1.002726 1.001231 1.000556 1.000251 1.000113 1.000051 1.000023 1.000010 1.000005 0.000000 0.333333 0.698479 0.865026 0.939865 0.973046 0.987874 0.994534 0.997535 0.998887 0.999498 0.999773 0.999898 0.999954 0.999979 0.999991 0.000000 0.379796 0.711206 0.868805 0.940686 0.973216 0.987908 0.994542 0.997536 0.998888 0.999498 0.999773 0.999898 0.999954 0.999979 0.999991 1.000000 0.666667 0.301521 0.134974 0.060135 0.026954 0.012126 0.005466 0.002465 0.001113 0.000502 0.000227 0.000102 0.000046 0.000021 0.000009 0.2887 0.2934 0.1355 0.0670 0.0308 0.0140 0.0063 0.0028 0.0013 0.0006 0.0003 0.0001 0.0001 0.0000 0.0000 − 0.2934 0.1355 0.0670 0.0308 0.0140 0.0063 0.0028 0.0013 0.0006 0.0003 0.0001 0.0001 0.0000 0.0000 0.0000 − 0.534847 0.534357 0.545715 0.547890 0.548438 0.548554 0.548578 0.548583 0.548584 0.548584 0.548584 0.548584 0.548584 0.548584 0.548584 − Table 7.4. The primal-dual full-step algorithm with cheap adaptive updates. this topic with the conclusion that the above comparison between the ‘expensive’ and the ‘cheap’ adaptive update full-step method suggests that it is worth spending extra effort in finding as large values for θ as possible. We conclude the section with a graphical illustration of the adaptive updating strategy. Figure 7.6 shows on two graphs the progress of the algorithm with the expensive update. The graphs show the first two coordinates of the iterates in the w-space. The left graph has a linear scale and the right graph a logarithmic scale. Figure 7.7 concerns the case when cheap updates are used. 7.7 The predictor-corrector method In the previous section it became clear that the Newton step can be decomposed into an affine-scaling component and a centering component. Using the notations introduced 178 II Logarithmic Barrier Approach 101 2 w2 0 ✻10 w2 ✻ ✻ 1.5 10−1 central path 10−2 ✠ 1 δ(w, µ1 ) = τ 10−3 10−4 10−5 0.5 µ1 e 10−6 10−7 0 0 0.5 1 Figure 7.6 1.5 2 ✲ w1 10−7 10−6 10−5 10−4 10−3 10−2 10−1 100 101 ✲ w1 Iterates of the primal-dual algorithm with adaptive updates. there, we recall from Section 7.6.2 that the (scaled) centering components are given by dcx = PAD (u−1 ), dcs = PHD−1 (u−1 ), and the (scaled) affine components by dax = −PAD (u), where d= r x , s das = −PHD−1 (u), u= r xs . µ 101 3 w2 ✻ 2.5 w2 0 ✻10 ✻ central path 2 10−1 10−2 1.5 ✠ 10−3 δ(w, µ1 ) = τ 10−4 1 10−5 µ1 e 0.5 10−6 10−7 0 0 0.5 1 Figure 7.7 1.5 2 2.5 ✲ w1 3 10−7 10−6 10−5 10−4 10−3 10−2 10−1 100 101 ✲ w1 Iterates of the primal-dual algorithm with cheap adaptive updates. II.7 Primal-Dual Logarithmic Barrier Method 179 We also recall the relations dx ds dcx + dax dcs + das = = and dcx + dcs = u−1 dax + das = −u. The unscaled centering and affine-scaling components are given by ∆a x = √ µddax , ∆a s = √ −1 a µd ds ∆c x = √ µddcx , ∆c s = √ −1 c µd ds ; and as a consequence we have ∆a x∆a s = µdax das , ∆c x∆c s = µdcx dcs . It is interesting to consider the effect of moving along these directions. Let us define xa (θ) = x + θ∆a x xc = x + ∆c x sa (θ) = s + θ∆a s sc = s + ∆c s. We say that xa (θ) and sa (θ) result from an affine-scaling step of size θ at (x, s). In preparation for the next lemma we first establish the following two relations: x∆a s + s∆a x = x∆c s + s∆c x = −xs µe. 
(7.38) (7.39) These relations easily follow from the previous ones. We show this for the first of the two relations. We first write x∆a s = √ µxd−1 das = µudas , and s∆a x = √ µsddax = µudax . Adding the last two equalities we get (7.38): x∆a s + s∆a x = µu (dax + das ) = −µu2 = −xs. Now we can prove Lemma II.60 Let xT s = nµ. Assuming feasibility of the steps, the affine-scaling step reduces the duality gap by a factor 1 − θ and the step along the centering components doubles the duality gap. 180 II Logarithmic Barrier Approach Proof: We have xa (θ)sa (θ) = (x + θ∆a x) (s + θ∆a s) = xs + θ (x∆a s + s∆a x) + θ2 ∆a x∆a s. Using (7.38) we find xa (θ)sa (θ) = (1 − θ)xs + θ2 ∆a x∆a s. Using that ∆a x and ∆a s are orthogonal we obtain  T (xa (θ)) sa (θ) = eT (1 − θ)xs + θ2 ∆a x∆a s = (1 − θ)xT s, proving the first statement. For the second statement we write xc sc = (x + ∆c x) (s + ∆c s) = xs + (x∆c s + s∆c x) + ∆c x∆c s. Substitution of (7.39) gives xc sc = xs + µe + θ2 ∆c x∆c s. Thus we obtain  T (xc ) sc = eT xs + µe + θ2 ∆c x∆c s = xT s + µeT e = 2nµ. This completes the proof. ✷ Recall from (7.30) that the (unscaled) full Newton step (∆x, ∆s) at (x, s) — with barrier parameter value µ — can be decomposed in its affine scaling and its centering component. The above lemma makes clear that in the algorithms we dealt with before, the reduction in the duality gap during a (full) Newton step is delivered by the affinescaling component in the Newton step. The centering component in the Newton step forces the iterates to stay close to the central path. When solving a given LO problem, we wish to find a primal-dual pair with a duality gap close to zero. We want to reduce the duality gap as fast as possible to zero. Therefore, it becomes natural to consider algorithms that put more emphasis on the affine-scaling component. That is the underlying idea of the predictor-corrector method which is the subject of this section. Note that when the full affine-scaling step (with step-size 1) is feasible, it produces a feasible pair with duality gap zero, and hence it yields an optimal solution pair in a single step. This makes clear that the full affine step will be infeasible in general. In the predictor-corrector method, instead of combining the two directions in a single Newton step, we decompose the Newton step into two steps, an affine-scaling II.7 Primal-Dual Logarithmic Barrier Method 181 step first and, next, a so-called pure centering step.18 Since a full affine-scaling step is infeasible, we use a damping parameter θ. By taking θ small enough we enforce feasibility of the step, and at the same time gain control over the loss of proximity to the central path. The aim of the centering step is to restore the proximity to the central path. This is obtained by using a Newton step with barrier parameter value µ, where nµ is equal to the present duality gap. Such a step leaves the duality gap unchanged, by Lemma II.47. 7.7.1 The predictor-corrector algorithm In the description of the predictor-corrector algorithm below (page 182), ∆x and ∆s denote the full Newton step at (x, s) with the current value of the barrier parameter µ, and ∆a x and ∆a s denote the full affine-scaling step at the current iterate (x, s). Observe that according to Lemma II.60 the damping factor θ for the affine-scaling step can also be interpreted as an updating parameter for the barrier parameter µ. We have the following theorem. 
√ Theorem II.61 If τ = 1/2 and θ = 1/(2 n), then the Predictor-Corrector Algorithm requires at most   √ nµ0 2 n log ε iterations. The output is a primal-dual pair (x, s) such that xT s ≤ ε. The proof of this result is postponed to Section 7.7.3. It requires a careful analysis of the affine-scaling step, which is the √ subject of the next section. Let us note now that the iteration bound is a factor 2 worse than the bound in Theorem II.53 for the algorithm with full Newton steps. Moreover, each major iteration in the predictorcorrector algorithm consists of two steps: the centering step (also called the corrector step) and the affine-scaling step (also called the predictor step). 7.7.2 Properties of the affine-scaling step The purpose of this section is to analyze the effect of an affine-scaling step with size θ on the proximity measure. As before, (x, s) denotes a positive primal-dual pair. We 18 The idea of breaking down the Newton direction into its affine-scaling and its centering component seems to be due to Mehrotra [205]. The method considered in this chapter was proposed first by Mizuno, Todd and Ye [217]; they were the first to use the name predictor-corrector method. The analysis in this chapter closely resembles their analysis. Like them we alternate (single) primaldual affine-scaling steps and (single) primal-dual centering steps. An earlier paper of Sonnevend, Stoer and Zhao [258] is based on similar ideas, except that they use multiple centering steps. It soon appeared that one could prove that the method asymptotically has a quadratic convergence rate (see, e.g., Mehrotra [206, 205], Ye et al. [317], Gonzaga and Tapia [126, 127], Ye [309] and Luo and Ye [188].). Quadratic convergence of the primal-dual predictor-corrector method is the subject in Section 7.7.6. A dual version of the predictor-corrector method was considered by Barnes, Chopra and Jensen [36]; they showed polynomial-time convergence with an O(nL) iteration bound. Mehrotra’s variant of the primal-dual predictor-corrector method will be discussed in Chapter 20. It significantly cuts down the computational effort to achieve the greatest practical efficiency among all interior-point methods. See, e.g., Lustig, Marsten and Shanno [192]. As a consequence the method has become very popular. 182 II Logarithmic Barrier Approach Predictor-Corrector Algorithm Input: A proximity parameter τ , 0 ≤ τ < 1; an accuracy parameter ε > 0; (x0 , s0 ) ∈ P × D, µ0 > 0 with (x0 )T s0 = nµ0 , δ(x0 , s0 ; µ0 ) ≤ τ ; a barrier update parameter θ, 0 < θ < 1. begin x := x0 ; s := s0 ; µ := µ0 ; while nµ ≥ (1 − θ)ε do begin x := x + ∆x; s := s + ∆s; x := x + θ∆a x; s := s + θ∆a s; µ := (1 − θ)µ; end end assume that µ > 0 is such that xT s = nµ, and δ := δ(x, s; µ). Recall from (7.16) that δ= 1 −1 1 e − u2 u −u = , 2 2 u where u= r xs . µ We need a simple bound on the coordinates of the vector u. √ Lemma II.62 Let ρ(δ) := δ + 1 + δ 2 . Then 1 ≤ ui ≤ ρ(δ), ρ(δ) 1 ≤ i ≤ n. Proof: Since ui is positive for each i, we have −2δui ≤ 1 − u2i ≤ 2δui . This implies u2i − 2δui − 1 ≤ 0 ≤ u2i + 2δui − 1. Rewriting this as 2 2 (ui − δ) − 1 − δ 2 ≤ 0 ≤ (ui + δ) − 1 − δ 2 II.7 Primal-Dual Logarithmic Barrier Method 183 we obtain (ui − δ)2 ≤ 1 + δ 2 ≤ (ui + δ)2 , which implies ui − δ ≤ |ui − δ| ≤ Thus we arrive at p 1 + δ 2 ≤ ui + δ. p p 1 + δ 2 ≤ ui ≤ δ + 1 + δ 2 = ρ(δ). −δ + For the left-hand expression we write −δ + p 1 + δ2 = 1 1 √ = . 2 ρ(δ) δ+ 1+δ This proves the lemma. ✷ Now we can prove the following. 
Lemma II.63 Let the pair (x+ , s+ ) result from an affine-scaling step at (x, s) with step-size θ. If xT s = nµ and δ := δ(x, s; µ) < τ , then we have δ + := δ(x+ , s+ ; (1 − θ)µ) ≤ τ if θ satisfies the inequality √ θ2 n ≤2 2 τ 1−θ s ! √ √ 4 + 4δρ(δ) 2 + 2τ 2 − 2δρ(δ) − τ 2 2 . ρ(δ)2 (7.40) For fixed τ , the right-hand side expression in (7.40) is a monotonically decreasing function of δ. Proof: From the proof of Lemma II.60 we recall that x+ s+ = (1 − θ)xs + θ2 ∆a x∆a s. This can be rewritten as  x+ s+ = µ (1 − θ)u2 + θ2 dax das . Defining + u := we thus have u+ 2 s x+ s+ , (1 − θ)µ = u2 + θ2 dax das . 1−θ The proximity after the affine-scaling step satisfies δ+ = 1 2 u+ −1  e − u+ 2  ≤ 1 2 u+ −1 ∞ e − u+ 2 . 184 II Logarithmic Barrier Approach We proceed by deriving bounds for the last two norms. First we consider the second norm: e − u+ 2 θ2 θ2 dax das kda da k ≤ e − u2 + 1−θ 1−θ x s θ2 n + √ . 2 2(1 − θ) = e − u2 − ≤ e − u2 For the last inequality we applied the first uv-lemma (Lemma C.4 in Appendix C) to the vectors dax and das and further utilized kuk2 = n. From Lemma II.62, we further obtain e − u2 e − u2 ≤ kuk∞ ≤ 2δρ(δ). e − u2 = u u u −1 For the estimate of (u+ ) we write, using Lemma II.62 and the first uv-lemma ∞ once more, 2 θ2 θ2 n 1 2 a a u+ ≥ u − − kd d k ≥ . i i 1−θ x s ∞ ρ(δ)2 4(1 − θ) We conclude, by substitution of these estimates, that 2 θ n 2δρ(δ) + 2√2(1−θ) δ ≤ q . 1 θ2n 2 ρ(δ) 2 − 4(1−θ) + Hence, δ + ≤ τ holds if 2  4τ 2 θ2 nτ 2 θ2 n ≤ − . 2δρ(δ) + √ 2 ρ(δ) 1−θ 2 2(1 − θ) (7.41) This can be rewritten as   2  √ √ θ2 n θ2 n 4τ 2 2δρ(δ) + √ + 4τ 2 δρ(δ) 2, + 2τ 2 2 2δρ(δ) + √ ≤ 2 ρ(δ) 2 2(1 − θ) 2 2(1 − θ) or equivalently,   √ 2 √ θ2 n 4τ 2 2δρ(δ) + √ + τ2 2 ≤ + 4τ 2 δρ(δ) 2 + 2τ 4 . 2 ρ(δ) 2 2(1 − θ) By taking the square root we get √ θ2 n + τ2 2 ≤ τ 2δρ(δ) + √ 2 2(1 − θ) s √ 4 + 4δρ(δ) 2 + 2τ 2 . 2 ρ(δ) By rearranging terms this can be rewritten as s √ √ 4 θ2 n √ + 4δρ(δ) 2 + 2τ 2 − 2δρ(δ) − τ 2 2. ≤τ 2 ρ(δ) 2 2(1 − θ) II.7 Primal-Dual Logarithmic Barrier Method 185 This implies the first statement in the lemma. For the proof of the second statement we observe that the inequality (7.40) in the lemma is equivalent to the inequality (7.41). We can easily verify that the left-hand side expression in (7.41) is increasing in both δ and θ and the right-hand side expression is decreasing in both δ and θ. Hence, if θ satisfies (7.41) for some value of δ, then the same value of θ satisfies (7.41) also for smaller values of δ. Since the inequalities (7.40) and (7.41) are equivalent, the last inequality has the same property: if θ satisfies (7.40) for some value of δ, then the same value of θ satisfies (7.40) also for smaller values of δ. This implies the second statement in the lemma and completes the proof. ✷ 2 1.5 upper bound for θ2n 1−θ ✠ 1 0.5 0 0 0.1 0.2 0.3 0.4 ✲ δ = δ(x, s; µ) Figure 7.8 The right-hand side of (7.40) for τ = 1/2. Figure 7.8 shows the graph of the right-hand side of (7.40) as a function of δ. With the above lemma the analysis of the predictor-corrector algorithm can easily be accomplished. We do this in the next section. At the end of this section we apply the lemma to the special case where we start the affine-scaling step at the µ-centers. Then δ = 0 and ρ(δ) = 1. Substitution of these values in the lemma yields that the proximity after the step does not exceed τ if √  p √  θ2 n ≤ 2 2 τ 4 + 2τ 2 − τ 2 2 . 
1−θ Note that this bound coincides with the corresponding bound obtained in Section 7.6.5 for an adaptive update at the µ-center with the full-step method. 7.7.3 Analysis of the predictor-corrector algorithm √ In this section we provide the proof of Theorem II.61. Taking τ = 1/2 and θ = 1/(2 n) we show that each iteration starts with x, s and µ such that δ(x, s; µ) ≤ τ. This makes the algorithm well defined, and implies the result of the theorem. 186 II Logarithmic Barrier Approach The corrector step is simply a Newton step to the µ-center. By Theorem II.50 (on page 156) the result is a pair (x, s) such that 1 1 = √ . δ := δ(x, s; µ) ≤ q 4 24 2(1 − 14 ) Now we apply Lemma II.63 to this pair (x, s). This lemma states that the affine step with step-size θ leaves the proximity with respect to the barrier parameter (1 − θ)µ smaller than (or equal to) τ if θ satisfies (7.40) and, moreover, that√for fixed τ the right-hand side of (7.40) is monotonically decreasing in δ. For δ = 1/ 24 we have r r 1 3 1 = . ρ(δ) = √ + 1 + 24 2 24 Substitution of the given values in the√ right-hand side of (7.40) yields the value 0.612626 (cf. Figure 7.8, with δ = 1/ 24 = 0.204124). Hence (7.40) is certainly satisfied if θ2 n ≤ 0.612626. 1−θ √ If θ = 1/(2 n) this condition is satisfied for each n ≥ 1. This proves Theorem II.61. ✷ Remark II.64 In the above analysis we could also have used the improved quadratic convergence result of Theorem II.52. However, this does not give a significant change. After the centering step the proximity satisfies 1 δ := δ(x, s; µ) ≤ p 4 2(1 − 1 ) 16 1 = √ , 30 and the condition on θ becomes a little weaker, namely: θ2 n ≤ 0.768349. 1−θ 7.7.4 • An adaptive version of the predictor-corrector algorithm As stated before, the predictor-corrector method is the most popular interior-point method for solving LO problems in practice. But this is not true for the version we dealt with in the previous section. √ When we update the barrier parameter each time by the factor 1 − θ, with θ = 1/(2 n), as in that algorithm, the required number of iterations will be as predicted by Theorem II.61. That is, each iteration reduces the duality gap by the constant factor 1 − θ and hence the duality √ gap reaches the desired accuracy in a number of iterations that is proportional to n. The obvious way to reduce the number of iterations is to use adaptive updates of the barrier parameter. The following lemma is crucial. Lemma II.65 Let the pair (x+ , s+ ) result from an affine-scaling step at (x, s) with step-size θ. If xT s = nµ and δ := δ(x, s; µ) < τ , then δ + := δ(x+ , s+ ; µ(1 − θ)) ≤ τ if s ! θ2 1 a a 2 + 2δρ(δ) + τ − τ − 2δρ(δ). (7.42) kd d k ≤ 2τ 1−θ x s ρ(δ)2 II.7 Primal-Dual Logarithmic Barrier Method 187 Proof: The proof is a slight modification of the proof of Lemma II.63. We recall from that proof that the proximity after the affine-scaling step satisfies 2 −1  −1 2  1 1 , e − u+ u+ u+ ≤ e − u+ δ+ = 2 2 ∞ where, as before, u+ 2 = u2 + θ2 dax das , 1−θ dax and das denote the scaled affine-scaling components, and u = some estimates: 2 θ2 kda da k , ≤ e − u2 + e − u+ 1−θ x s and e − u2 ≤ 2δρ(δ). Moreover, p xs/µ. We also recall θ2 θ2 1 − kdax das k∞ ≥ kda da k . 2 1−θ ρ(δ) 1−θ x s By substitution of these estimates we obtain u+ i 2 ≥ u2i − 2 θ 2δρ(δ) + 1−θ kdax das k . δ ≤ q θ2 1 a a 2 ρ(δ) 2 − 1−θ kdx ds k + Hence, δ + ≤ τ holds if  2 θ2 4θ2 τ 2 a a 4τ 2 a a 2δρ(δ) + − kdx ds k ≤ kd d k . 
1−θ ρ(δ)2 1−θ x s This can be rewritten as 2    4τ 2 θ2 θ2 a a a a 2 + 8τ 2 δρ(δ), kdx ds k + 4τ 2δρ(δ) + kdx ds k ≤ 2δρ(δ) + 1−θ 1−θ ρ(δ)2 or equivalently,  2 θ2 4τ 2 a a 2 2δρ(δ) + + 8τ 2 δρ(δ) + 4τ 4 . kdx ds k + 2τ ≤ 1−θ ρ(δ)2 By taking the square root we get θ2 kda da k + 2τ 2 ≤ 2τ 2δρ(δ) + 1−θ x s which reduces to θ2 kda da k ≤ 2τ 1−θ x s s s 1 + 2δρ(δ) + τ 2 , ρ(δ)2 1 + 2δρ(δ) + τ 2 − τ ρ(δ)2 This completes the proof. From this lemma we derive the next theorem. ! − 2δρ(δ). ✷ 188 II Logarithmic Barrier Approach Theorem II.66 If τ = 1/3 then the property δ(x, s; µ) ≤ τ is maintained in each iteration if θ is taken equal to θ= 2 p . 1 + 1 + 13 kdax das k Proof: We only need to show that when we start some iteration with x, s and µ such that δ(x, s; µ) ≤ τ, then after this iteration the property δ(x, s; µ) ≤ τ is maintained. By Theorem II.50 (on page 156) the result of the corrector step is a pair (x, s) such that 1 1 = . δ := δ(x, s; µ) ≤ q 9 12 2(1 − 1 ) 9 Now we apply Lemma II.65 to (x, s). By this lemma the affine step with step-size θ leaves the proximity with respect to the barrier parameter (1 − θ)µ smaller than (or equal to) τ if θ satisfies (7.42). For δ = 1/12 we have ρ(δ) = 1.0868. Substitution of the given values in the right-hand side expression yields 0.308103, which is greater than 4/13. The right-hand side is monotonic in δ, as can be verified by elementary means, so smaller values of δ yield larger values than 4/13. Thus the proximity after the affine-scaling step does not exceed τ if θ satisfies 4 θ2 kda da k ≤ . 1−θ x s 13 We may easily verify that the value in the theorem satisfies this condition with equality. Hence the proof is complete. ✷ 7.7.5 Illustration of adaptive predictor-corrector algorithm With the same example as in the previous illustrations, and the same initialization, the adaptive predictor-corrector algorithm, with ε = 10−4 , runs as shown in Table 7.5. (page 189). Each iteration consists of two steps: the corrector step (with θ = 0) and the affine-scaling step (with θ as given by Theorem II.66). Table 7.5. shows that only 7 iterations yield the desired accuracy. After the corrector step the proximity is always very small, especially in the final iterations. This is the same phenomenon as observed previously, namely that the Newton process is almost exact. For the affine-scaling steps we see the same behavior as in the full-step method with adaptive updates. The value of the barrier update parameter increases very quickly to 1. As a result the duality gap goes very quickly to zero. This is not accidental. It is a property of the predictorcorrector method with adaptive updates, as shown in the next section. Figure 7.9 (page 190) shows on two graphs the progress of the algorithm in the w-space. 7.7.6 Quadratic convergence of the predictor-corrector algorithm It is clear that the rate of convergence in the predictor-corrector method depends on the values taken by the barrier update parameter θ. We show in this section that the II.7 Primal-Dual Logarithmic Barrier Method It. nµ x1 1 1 2 2 3 3 4 4 5 5 6 6 7 7 4.000000 4.000000 1.595030 1.595030 0.593303 0.593303 0.146755 0.146755 0.013557 0.013557 0.000138 0.000138 0.000000 0.000000 2.000000 2.000000 1.278509 1.334918 1.088217 1.108991 1.019821 1.025085 1.001775 1.002265 1.000018 1.000023 1.000000 1.000000 Table 7.5. 
y1 y2 189 s1 δ θ 0.000000 0.000000 1.000000 0.2887 0.000000 0.333333 −0.333333 0.666667 0.0000 0.601242 0.493665 0.468323 0.506335 0.1576 0.000000 0.606483 0.468323 0.393517 0.0085 0.628030 0.780899 0.802232 0.219101 0.1486 0.000000 0.822447 0.802232 0.177553 0.0031 0.752648 0.941805 0.951082 0.058195 0.1543 0.000000 0.952333 0.951082 0.047667 0.0008 0.907623 0.994513 0.995481 0.005487 0.1568 0.000000 0.995492 0.995481 0.004508 0.0001 0.989826 0.999944 0.999954 0.000056 0.1575 0.000000 0.999954 0.999954 0.000046 0.0000 0.999894 1.000000 1.000000 0.000000 0.1576 0.000000 1.000000 1.000000 0.000000 0.0000 1.000000 The adaptive predictor-corrector algorithm. rate of convergence eventually becomes quadratic. To achieve a quadratic convergence rate it must be true that in the limit, (1−θ)µ is of the order O(µ2 ), so that 1−θ = O(µ). In this section we show that the value of θ in Theorem II.66 has this property. The following lemma makes clear that for our purpose it is sufficient to concentrate on the magnitude of the norm of the vector dax das . Lemma II.67 The value of the barrier update parameter θ in Theorem II.66 satisfies 1−θ ≤ 13 a a kdx ds k . 4 Hence, the rate of convergence for the adaptive predictor-corrector method is quadratic if kdax das k = O(µ). Proof: The lemma is an easy consequence of properties of the function f : [0, ∞) → IR+ defined by 2 √ . f (x) = 1 − 1 + 1 + 13x The derivative is given by 13 f ′ (x) = √ 2 √ 1 + 13x 1 + 1 + 13x and the second derivative by  √ −169 1 + 3 1 + 13x f (x) = 3 . √ 3 2 (1 + 13x) 2 1 + 1 + 13x ′′ 190 II Logarithmic Barrier Approach 3 w2 ✻ 2.5 101 w2 100 ✻ ✻ 10−1 central path 2 10−2 10−3 1.5 10−4 µ1 e 10−5 1 10−6 ■ δ(w, µ1 ) = τ 10−7 0.5 10−8 0 0 0.5 1 1.5 Figure 7.9 2 2.5 ✲ w1 3 10−8 10−7 10−6 10−5 10−4 10−3 10−2 10−1 100 101 ✲ w1 The iterates of the adaptive predictor-corrector algorithm. This implies that f is monotonically increasing and concave. Since f ′ (0) = 13/4, it follows that f (x) ≤ 13x/4 for each x ≥ 0. Putting x = kdax das k gives the lemma. ✷ We need one more basic fact in the analysis below. This concerns the optimal sets P ∗ and D∗ of the primal and dual problems. Defining the index sets B := {i : xi > 0 for some x ∈ P ∗ } and N := {i : si > 0 for some s ∈ D∗ } , we know that these sets are disjoint, because xT s = 0 whenever x ∈ P ∗ and s ∈ D∗ . We need the far from obvious fact that each index i, 1 ≤ i ≤ n, belongs either to B or N .19 As a consequence, the sets B and N form a partition of the index set {i : 1 ≤ i ≤ n}. This partition is called the optimal partition of the problems (P ) and (D). The behavior of the components of the vectors dax and das strongly depends on whether a component belongs to one set of the optimal partition or to the complementary set. Table 7.6. summarizes some facts concerning the order of magnitude of the components of various vectors of interest. From this table we read, for example, that xB = Θ(1) and ∆a xN = O(µ). According to the definition of the symbols Θ and O this means that there exist positive constants c1 , c2 , c3 such that c1 e ≤ xB ≤ c2 e and ∆a xN ≤ c3 µ.20 In our case it is important to stress that 19 20 This is the content of the Goldman–Tucker Theorem (Theorem II.3), an early result in the theory of Linear Optimization that has often been considered exotic. The original proof was based on Farkas’ lemma (see, e.g., Schrijver [250], pp. 95–96). 
In Part I of this book we have shown that the corresponding result for the self-dual model is a natural byproduct of the limiting behavior of the central path. We also refer the reader to Güler et al. [134], who derived the Goldman–Tucker Theorem from the limiting behavior of the central path for the standard format. Güler and Ye [135] showed that interior-point algorithms — in a wide class — keep the iterates so close to the central path that these algorithms yield the optimal partition of the problem. See Section 1.7.4 for definitions of the order symbols O and Θ. II.7 Primal-Dual Logarithmic Barrier Method Vector Table 7.6. B 191 N 1 x Θ(1)∗ Θ(µ) 2 s Θ(µ) Θ(1)∗ 3 u Θ(1) 4 d Θ( √1µ ) Θ(1) √ Θ( µ) 5 dax 6 das 7 ∆a x 8 ∆a s O(µ)∗ O(1) O(µ)∗ O(µ) O(1) O(µ)∗ O(µ) O(µ)∗ Asymptotic orders of magnitude of some relevant vectors. these constants are independent of the iterates x, s and of the value µ of the barrier parameter. They depend only on the problem data A, b and c. Some of the statements in the table are almost trivial; the more difficult ones are indicated by an asterisk. Below we present the relevant proofs. Let us temporarily postpone the proof of the statements in Table 7.6. and show that the order estimates given in the table immediately imply quadratic convergence of the adaptive predictor-corrector method. Theorem II.68 The adaptive predictor-corrector method is asymptotically quadratically convergent. Proof: From Table 7.6. we deduce that each component of the vector dax das is bounded by O(µ). From our conventions this implies that dax das = O(µ). Hence the result follows from Lemma II.67. ✷ The rest of this section is devoted to proving the estimates in Table 7.6.. Note that at the start of an affine-scaling step we have δ = δ(x, s; µ) ≤ 1/12, from the proof of Theorem II.66. This property will be used several times in the sequel. We start with line 3 in the table. Line 3: With δ ≤ 1/12, Lemma II.62 implies that each component ui of u satisfies 0.92013 ≤ This proves that u = Θ(1). 1 ≤ ui ≤ ρ(δ) ≤ 1.0868. ρ(δ) 192 II Logarithmic Barrier Approach Lines 1 and 2: We start with the estimates for xB and sB . We need the following two positive numbers:21 σp := min max{xi : x ∈ P ∗ } σd := min max{si : s ∈ D∗ }. i∈B i∈N Note that these numbers depend only on the data of the problem and not on the iterates. Moreover, due to the existence of a strictly complementary optimal solution pair, both numbers are positive. Now let i ∈ B and let x̄ ∈ P ∗ be such that x̄i is maximal. Then, using that x̄i ≥ σp > 0, we may write si = sT x̄ sT x̄ si x̄i ≤ ≤ . x̄i x̄i σp Since x̄ is optimal, cT x̄ ≤ cT x. Hence, with y such that s = c − AT y, we have sT x̄ = cT x̄ − bT y ≤ cT x − bT y = sT x = nµ, so that si ≤ nµ , σp ∀i ∈ B. This implies that sB = O(µ). From the third line in Table 7.6. we derive that xB sB = µu2B = Θ(µ). The last two estimates imply that (xB )−1 = O(µ) sB = = O(1). Θ(µ) Θ(µ) This implies that xB is bounded away from zero. On the other hand, since the pair (x, s) has duality gap nµ and hence, by Theorem II.9 (on page 100), belongs to a bounded set, we have xB = O(1). Thus we may conclude that xB = Θ(1). Since we also have xB sB = Θ(µ), it follows that sB = Θ(µ). In exactly the same way we derive that sN = Θ(1) and xN = Θ(µ). Line 4: The estimates in the fourth line follow directly from the definition of d and the estimates for x and s in the first two lines. Line 5 and 6: We obtain an order estimate for (dax )N and (das )B by the following simple argument. 
By its definition dax is the component of√the vector −u in the null space of the matrix AD. Hence we have kdax k ≤ kuk = n. Therefore, dax = O(1). Since (dax )N is a subvector of dax , we must also have (dax )N = O(1). A similar argument applies to (das )B . The estimates for (dax )B and (das )N are much more difficult to obtain. We only deal with the estimate for (dax )B ; the result for (das )N can be obtained in a similar way. 21 These quantities were introduced by Ye [311]. See also Vavasis and Ye [280]. The numbers σp s x for the self-dual model, as introduced in and σSP and σd closely resemble the numbers σSP Section 3.3.2 of Part I. According to the definition of the condition number σSP for the self-dual model, the smallest of the two numbers σp and σd is a natural candidate for a condition number for the standard problems (P ) and (D). We refer the reader to the above-mentioned papers for a discussion of other condition numbers and their mutual relations. II.7 Primal-Dual Logarithmic Barrier Method 193 The main force in the derivation below is the observation that dax can be written as the projection on the null space of AD of a vector that vanishes on the index set B.22 This can be seen as follows. We may write √ 1 1 dax = −PAD (u) = − √ PAD ( xs) = − √ PAD (ds). µ µ Now let (ỹ, s̃) be any dual optimal pair. Then s = c − AT y = AT ỹ + s̃ − AT y = s̃ + AT (ỹ − y), so we have ds = ds̃ + (AD)T (ỹ − y). This means that ds − ds̃ belongs to the row space of AD. The row space being orthogonal to the null space of AD, it follows that PAD (ds) = PAD (ds̃). Thus we obtain 1 dax = − √ PAD (ds̃). µ (7.43) Since s̃ is dual optimal, all its positive coordinates belong to the index set N , and hence we have s̃B = 0. Now we can rewrite (7.43) in the following way:   √ 2 − µ dax = argminh kds̃ − hk : ADh = 0 , or equivalently,   √ 2 2 − µ dax = argminh kdB s̃B − hB k + kdN s̃N − hN k : AB DB hB + AN DN hN = 0 . This means that the solution of the last minimization problem is given by hB = √ √ − µ (dax )B and hN = − µ (dax )N . Hence, substituting the optimal value for hN as above, and also using that s̃B = 0, we obtain   √ √ 2 − µ (dax )B = argminhB khB k : AB DB hB = µ AN DN (dax )N . √ Stated otherwise, − µ (dax )B can be characterized as the vector of smallest norm in the affine space √ S = {ξ : AB DB ξ = µ AN DN (dax )N } . √ Now consider the least norm solution of the equation AB z = µ AN DN (dax )N . This solution is given by √ a z ∗ = µ A+ B AN DN (dx )N , 22 We kindly acknowledge that the basic idea of the analysis below was communicated privately to us by our colleague Gonzaga. We also refer the reader to Gonzaga and Tapia [127] and Ye et al. [317]; these papers deal with the asymptotically quadratic convergence rate of the predictor-corrector method. 194 II Logarithmic Barrier Approach −1 ∗ 23 where A+ B denotes the pseudo-inverse √of the matrix AB . It is obvious that DB z a belongs to the affine space S. Hence, − µ(dx )B being the vector of smallest norm in S, we obtain √ √ −1 ∗ −1 + k µ (dax )B k ≤ DB z = µ DB AB AN DN (dax )N , or, dividing both sides by √ µ, −1 + k(dax )B k ≤ DB AB AN DN (dax )N . This implies −1 k(dax )B k ≤ DB a A+ B kAN k kDN k k(dx )N k . Since, by convention, A+ B and kAN k are bounded by O(1), and the order of magnitudes of the other norms on the right-hand side multiply to O(µ), we obtain that k(dax )B k = O(µ). This implies the entry (dax )B = O(µ) in the table. Line 7 and 8: These lines are not necessary for the proof of Theorem II.68. 
We only add them because of their own interest. They immediately follow from the previous lines in the table and the relations ∆x = √ µ ddx , ∆s = √ −1 µ d ds . This completes the proof of all the entries in Table 7.6.. 7.8 A version of the algorithm with large updates The primal-dual methods considered so far share the property that the iterates stay close to the central path. More precisely, each generated primal-dual pair (x, s) belongs to the region of quadratic convergence around some µ-center. In this section we consider an algorithm in which the iterates may temporarily get quite far from the central path, because of a large, but fixed, update of the barrier parameter. Then, by using damped Newton steps, we return to the neighborhood of the point of the central path corresponding to the new value of the barrier parameter. The algorithm is the natural primal-dual analogue of the dual algorithm with large updates in Section 6.9. Just as in the dual case, when the iterates leave the neighborhood of the central path the proximity measure for the full-step method, δ(x, s; µ), becomes less relevant as a measure for closeness to the central path. It will be of no surprise that in the primaldual case the primal-dual logarithmic barrier function φµ (x, s) is a perfect tool for this job. Recall from (6.23), on page 133, that φµ (x, s) is given by φµ (x, s) = Ψ  xs −e µ  = eT   X n xj sj xs log −e − , µ µ j=1 (7.44) and from Section 6.9 (page 130) that φµ (x, s) is nonnegative on its domain (the set of all positive primal-dual pairs), is strictly convex, has a (unique) minimizer, namely 23 See Appendix B. II.7 Primal-Dual Logarithmic Barrier Method 195 (x, s) = (x(µ), s(µ)) and, finally that φµ (x(µ), s(µ)) = 0.24 The algorithm is described below (page 195). As usual, ∆x and ∆s denote the Newton step at the current pair (x, s) with the barrier parameter equal to its current value µ. The first while-loop in the algorithm is called the outer loop and the second Primal-Dual Logarithmic Barrier Algorithm with Large Updates Input: A proximity parameter τ ; an accuracy parameter ε > 0; a variable damping factor α; a fixed barrier update parameter θ, 0 < θ < 1; (x0 , s0 ) ∈ P × D and µ0 > 0 such that δ(x0 , s0 ; µ0 ) ≤ τ . begin x := x0 ; s := s0 ; µ := µ0 ; while nµ ≥ ε do begin µ := (1 − θ)µ; while δ(x, s; µ) ≥ τ do begin x := x + α∆x; s := s + α∆s; (The damping factor α must be such that φµ (x, s) decreases sufficiently. Lemma II.72 gives a default value for α.) end end end while-loop the inner loop. Each execution of the outer loop is called an outer iteration and each execution of the inner loop an inner iteration. The required number of outer iterations depends only on the dimension n of the problem, on µ0 and ε, and on the (fixed) barrier update parameter θ. This number immediately follows from Lemma I.36 and is given by   nµ0 1 . log θ ε Just as in the dual case, the main task in the analysis of the algorithm is the estimation of the number of iterations between two successive updates of the barrier parameter. 24 Exercise 54 Let the positive primal-dual pair (x, s) be given. We want to find µ > 0 such that φµ (x, s) is minimal. Show that this happens if µ = xT s/n and verify that for this value of µ we have n   xT s X nxs − e = n log − log xj sj . φµ (x, s) = Ψ xT s n j=1 196 II Logarithmic Barrier Approach This is the purpose of the next sections. We first derive some estimates of φµ (x, s) in terms of the proximity measure δ(x, s; µ). 
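Before turning to these estimates, the outer and inner loops of the algorithm may be made concrete by a small computational sketch. The following Python fragment is only illustrative: it assumes a strictly feasible starting triple (x, y, s) with δ(x, s; µ0) ≤ τ, forms the Newton direction through the normal equations with the dense matrix A diag(x/s) A^T, uses the default damping factor of Lemma II.72 rather than a line search, and takes τ as prescribed by Lemma II.75 for the chosen θ.

import numpy as np

def proximity(x, s, mu):
    # delta(x, s; mu) = 0.5 * || u^{-1} - u ||  with  u = sqrt(x*s/mu)
    u = np.sqrt(x * s / mu)
    return 0.5 * np.linalg.norm(1.0 / u - u)

def newton_direction(A, x, s, mu):
    # Solve  A dx = 0,  A^T dy + ds = 0,  s*dx + x*ds = mu*e - x*s
    D = x / s
    dy = np.linalg.solve(A @ (D[:, None] * A.T), A @ (x - mu / s))
    ds = -A.T @ dy
    dx = mu / s - x - D * ds
    return dx, dy, ds

def large_update_pd(A, x, y, s, mu, theta=0.5, eps=1e-4):
    # Primal-dual logarithmic barrier algorithm with large updates (sketch).
    n = len(x)
    R = theta * np.sqrt(n) / (1.0 - theta)
    tau = np.sqrt(R) / (2.0 * np.sqrt(1.0 + np.sqrt(R)))        # Lemma II.75
    while n * mu >= eps:
        mu *= 1.0 - theta                                        # outer iteration
        while proximity(x, s, mu) >= tau:                        # inner iterations
            dx, dy, ds = newton_direction(A, x, s, mu)
            delta = proximity(x, s, mu)
            omega = np.sqrt(np.sum((dx / x) ** 2) + np.sum((ds / s) ** 2))
            alpha = 1.0 / omega - 1.0 / (omega + 4.0 * delta ** 2)  # Lemma II.72
            x, y, s = x + alpha * dx, y + alpha * dy, s + alpha * ds
    return x, y, s, mu

Run on the small three-variable sample problem used throughout the illustrations, started from x = (2, 1, 1), y = (0, 0), s = (1, 1, 1) and µ = 4/3, such a routine exhibits the qualitative behavior reported in Section 7.8.4 below; the exact iterates may differ from Tables 7.7–7.10, since the default damping factor is used here instead of a line search.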
7.8.1 Estimates of barrier function values The estimates in this section are of the same type as the estimates in Section 6.9.1 for the dual case.25 Many of these estimates there were given in terms of the function ψ : (−1, ∞) → IR determined by (5.5): ψ(t) = t − log(1 + t), which is nonnegative on its domain, strictly convex and zero at t = 0. For z ∈ IRn , with z + e > 0, we defined in (6.22), page 133, Ψ(z) = n X ψ(zj ). (7.45) j=1 The estimates in Section 6.9.1 were given in terms of the dual proximity measure δ(y, µ). Our aim is to derive similar estimates, but now in terms of the primal-dual proximity measure δ(x, s; µ). Let (x, s) be any positive primal-dual pair and µ > 0. Then, with u as usual: r xs , u= µ we may write n   X log u2j = Ψ u2 − e . φµ (x, s) = eT u2 − e − j=1 Using this we prove the next lemma. √ Lemma II.69 Let δ := δ(x, s; µ) and ρ(δ) := δ + 1 + δ 2 . Then   −2δ ψ ≤ φµ (x, s) ≤ ψ (2δρ(δ)) . ρ(δ) The first (second) inequality holds with equality if and only if one of the coordinates of u attains the value ρ(δ) (1/ρ(δ)) and all other coordinates are equal to 1.  Proof: Fixing δ, we consider the behavior of Ψ u2 − e on the set  T := u ∈ IRn : u−1 − u = 2δ, u ≥ 0 . Note that this set is invariant under inverting coordinates of u. Because of the inequality   1 ψ(t − 1) > ψ − 1 , t > 1, (7.46) t 25 The estimates in this section are new and dramatically improve existing estimates from the literature. See, e.g., Monteiro and Adler [218], Mizuno and Todd [216], Jansen et al. [157] and den Hertog [140]. II.7 Primal-Dual Logarithmic Barrier Method 197 whose elementary proof is left as an exercise 26 , this implies that u ≥ e if u maximizes Ψ(u2 − e) on T and u ≤ e if u minimizes Ψ(u2 − e) on T . Consider first the case where u is a maximizer of Ψ on the set T . The first-order optimality conditions are  e  u2 − e , 2u = 2λ u − u2 u3 (7.47) where λ ∈ IR. This can be rewritten as    u2 u2 − e = λ u2 − e u2 + e . It follows that each coordinate of u satisfies ui = 1  u2i = λ u2i + 1 . or Since u > 0, we may conclude from this that the coordinates of u that differ from 1 are mutually equal. Suppose that u has k such coordinates, and that their common value is ν. Note that k > 0, unless δ = 0, in which case the lemma is trivial. Therefore, we may assume that k ≥ 1. Now, since u ∈ T , 2  1 −ν = 4δ 2 , k ν which gives 1 2δ −ν = √ . ν k Since u is a maximizer, we have ν ≥ 1, and hence   δ ν=ρ √ . k Therefore, using that ρ(t)2 − 1 = 2tρ(t), we obtain  2 2  t ∈ IR, Ψ u − e = kψ ν − 1 = kψ  2δ √ ρ k (7.48)  δ √ k  . The expression on the right-hand side is decreasing as a function of k.27 Hence the maximal value is attained if k = 1, and this value equals ψ (2δ ρ (δ)). The second inequality in the lemma follows. The first inequality is obtained in the same way. If u is a minimizer of Ψ on the set T , then the first-order optimality conditions (7.47) imply in the same way as before 26 Exercise 55 Derive (7.46) from the inequalities in Exercise 42 (page 137). 27 Exercise 56 Let δ and ρ(δ) be as defined in Lemma II.69, and let k ≥ 1. Prove that kψ  2δ √ ρ k  δ √ k  and that this expression is maximal if k = 1. = kψ   √ 2δ2 + 2δ δ2 + k k 198 II Logarithmic Barrier Approach that the coordinates of u that differ from 1 are mutually equal. Assuming that u has k such coordinates, and that their common value is ν again, we now have ν ≤ 1, and hence 1 ν =  . ρ √δk Using (7.48), it follows that 1 − ρ(t)2 −2tρ(t) −2t 1 − 1 = = = . 2 2 2 ρ(t) ρ(t) ρ(t) ρ(t) Hence we may write     −2δ  . 
Ψ u2 − e = kψ ν 2 − 1 = kψ  √ k ρ √δk The expression on the right-hand side is increasing as a function of k.28 Hence the minimal value is attained if k = 1, and this value equals ψ (−2δ/ρ (δ)). Thus the proof of the lemma is complete. ✷ 4 3 ψ (2δρ(δ)) ❯ 2 ❑ 1 ψ 0 0 0.5  −2δ ρ(δ) 1  1.5 2 ✲ δ = δ(x, s; µ) Figure 7.10 28 Bounds for ψµ (x, s). Exercise 57 Let δ and ρ(δ) be as defined in Lemma II.69, and let k ≥ 1. Prove that  kψ  √ −2δ kρ  √δ k    = kψ and that this expression is minimal if k = 1.  −2δ √ δ + δ2 + k  II.7 Primal-Dual Logarithmic Barrier Method 199 Figure 7.10 shows the graphs of the bounds in Lemma II.69 for φµ (x, s) as a function of the proximity δ. Remark II.70 It may be worthwhile to discuss the quality of these bounds. Both bounds are valid for all (nonnegative) values of the proximity. Especially for the upper bound this is worth noting. Proximity measures known in the literature do not have this feature. For example, with the popular measure xs −e µ all known upper bounds grow to infinity if the measure approaches 1. The upper bound of Lemma II.69 goes to infinity only if our proximity measure goes to infinity. The lower bound goes to infinity as well if if our proximity measure goes to infinity, due to the fact that −2δ/ρ(δ) converges to -1 if δ goes to infinity. This is a new feature, which will be used below in the analysis of the large-update method. On the other hand, it must be noted that the lower bound grows very slowly if δ increases. For example, if δ = 1, 000, 000 then the lower bound is only 28.0168. • 7.8.2 Decrease of barrier function value Suppose again that (x, s) is any positive primal-dual pair and µ > 0. In this section we analyze the effect on the barrier function value of a damped Newton step at (x, s) to the µ-center. With u as defined before, the Newton displacements ∆x and ∆s satisfy x∆s + s∆x = µe − xs. Let x+ and s+ result from a damped Newton step of size α at (x, s). Then we have x+ = x + α∆x, s+ = s + α∆s. Using the scaled displacements dx and ds , defined in (7.5), page 154, we can also write √ √ x+ = µ d (u + αdx ) , s+ = µ d−1 (u + αds ) . As a consequence,   x+ s+ = µ (u + αdx ) (u + αds ) = µ u2 + α e − u2 + α2 dx ds . Here we used that u (dx + ds ) = e − u2 , which follows from dx + ds = u−1 − u. Now, defining u+ := s (7.49) x+ s+ , µ it follows that u+ 2  = (u + αdx ) (u + αds ) = u2 + α e − u2 + α2 dx ds . Subtracting e we get u+ 2  − e = (1 − α) u2 − e + α2 dx ds . (7.50) 200 II Logarithmic Barrier Approach Note that the orthogonality of dx and ds implies that eT dx ds = 0. Using this we find the following expression for φµ (x+ , s+ ): φµ (x+ , s+ ) eT =  u+ 2 n  X 2 log u+ −e − j j=1 n  X 2 log u+ (1 − α) eT u2 − e − . j = j=1 The next lemma provides an expression for the decrease of the barrier function value during a damped Newton step. Lemma II.71 Let δ = δ(x, s; µ) and let α be such that the pair (x+ , s+ ) resulting from the damped Newton step of size α is feasible. Then we have     αds αdx + + 2 −Ψ . φµ (x, s) − φµ (x , s ) = 4αδ − Ψ u u Proof: For the moment let us denote ∆ := φµ (x, s) − φµ (x+ , s+ ). Then we have ∆ = = e T  2 u −e − n X log u2j j=1 n  X log αeT u2 − e + j=1 Since u+ we may write  u+ u Substituting this we obtain ∆ = = αe T 2 2 =  n X  αeT u2 − e + ! + 2 uj uj 2  u −e + n X j=1 2 .    dx ds e+α e+α . u u j=1 n X log u+ j uj !2  X    n (ds )j (dx )j + log 1 + α . 
log 1 + α uj uj j=1 j=1 Observe that, by the definition of Ψ,       n X (dx )j dx αdx −Ψ log 1 + α = αeT uj u u j=1 and, similarly,       (ds )j ds αds T log 1 + α −Ψ . = αe uj u u j=1 n X log u+ j = (u + αdx ) (u + αds ) 2 u −e + − (1 − α)e T II.7 Primal-Dual Logarithmic Barrier Method 201 Substituting this in the last expression for ∆ we arrive at          ds αdx αds dx T 2 T T + αe −Ψ −Ψ . ∆ = αe u − e + αe u u u u Using (7.49) once more, the coefficients of α in the first three terms can be taken together as follows:   2  dx + ds 2 T u −e+ e = eT u2 − e + u−2 − e = eT u−1 − u . u Thus we obtain ∆ = α u−1 − u 2 −Ψ  αdx u Since u−1 − u = 2δ, the lemma follows.29,30  −Ψ  αds u  . ✷ We proceed by deriving a lower bound for the expression in the above lemma. The next lemma also specifies a value of the damping parameter α for which the decrease in the barrier function value attains the lower bound. Lemma II.72 Let δ = δ(x, s; µ) and let α = 1/ω − 1/(ω + 4δ 2 ), where s s 2 2 2 2 ∆x dx ∆s ds ω := + = + . x s u u Then the pair (x+ , s+ ) resulting from the damped Newton step of size α is feasible. Moreover, the barrier function value decreases by at least ψ(2δ/ρ(δ)). In other words,   2δ φµ (x, s) − φµ (x+ , s+ ) ≥ ψ . ρ(δ) 29 30 Exercise 58 Verify that ∆x dx ∆s ds = , = . x u s u Exercise 59 Using Lemma II.71, show that the decrease in the primal-dual barrier function value after a damped step of size α can be written as: ∆ := φµ (x, s) − φµ (x+ , s+ ) = α kdx k2 + α kds k2 − Ψ  αdx u  −Ψ Now let z be the concatenation of the vectors dx and ds . Then we may write   . z 2 u z + αu  ! ,   ∆ = α kzk2 − Ψ αz u  αds u  . Using this, show that the decrease is maximal for the unique step-size ᾱ determined by the equation T 2 e z =e T α e and that for this value the decrease is given by Ψ  −ᾱz u + ᾱz  =Ψ  −ᾱdx u + ᾱdx +Ψ −ᾱds u + ᾱds  . 202 II Logarithmic Barrier Approach Proof: Assuming feasibility of the damped step with size α, we know from Lemma II.71 that the decrease in the barrier function value is given by     αds αdx 2 −Ψ . ∆ := 4αδ − Ψ u u We now apply the right-hand side inequality in (6.24), page 134, to the vector in IR2n obtained by concatenating the vectors αdx /u and αds /u. Note that the norm of this vector is given by αω, with ω as defined in the lemma, and that αω < 1 for the value of α specified in the lemma. Then we obtain ∆ ≥ 4αδ 2 − ψ (−αω) = 4αδ 2 + αω + log (1 − αω) . (7.51) As a function of α, the derivative of the right-hand side expression is given by ω 4δ 2 (1 − αω) − αω 2 = . 1 − αω 1 − αω From this we see that the right-hand side expression in (7.51) is increasing for 4δ 2 + ω − 0 ≤ α ≤ ᾱ := 4δ 2 1 1 , = − 2 ω (ω + 4δ ) ω ω + 4δ 2 and decreasing for larger values of α. Hence it attains its maximal value at α = ᾱ, which is the value specified in the lemma. Moreover, since the barrier function is finite for 0 ≤ α ≤ ᾱ, the damped Newton step of size ᾱ is feasible. Substitution of α = ᾱ in (7.51) yields the following bound for ∆:   2  4δ 4δ 2 ω 4δ 2 4δ 2 = ψ . = + log − log 1 + ∆≥ ω ω + 4δ 2 ω ω ω In this bound we may replace ω by a larger value, since ψ(t) is monotonically increasing for t nonnegative. An upper bound for ω can be obtained as follows: s q 2 2 ds dx 2 2 ω= + ≤ u−1 ∞ kdx k + kds k = u−1 ∞ u−1 − u . u u Since u−1 ∞ ≤ ρ(δ), by Lemma II.62, page 182, and u−1 − u = 2δ we obtain ω ≤ 2δρ(δ). Substitution of this bound yields ∆≥ψ completing the proof.31 31  2δ ρ(δ)  (7.52) , ✷ Exercise 60 With ω as defined in Lemma II.72, show that ω≥ 2δ . 
ρ(δ) Using this and (7.52), prove that the step-size α specified in Lemma II.72 satisfies δ2 ρ(δ)2 1 ≤α= ≤ . 2 2ρ(δ) (2ρ(δ) + δ) ω (ω + δ ) 2 (2 + δρ(δ)) II.7 Primal-Dual Logarithmic Barrier Method 203 Remark II.73 The same analysis as in Lemma II.72 can be applied to the case where different step-sizes are taken for the x-space and the s-space. Let x+ = x + α∆x and s+ = s + β∆s, with α and β such that both steps are feasible. Then the decrease in the primal-dual barrier function value is given by ∆ := φµ (x, s) − φµ (x+ , s+ ) = α kdx k2 − Ψ  αdx u  + β kds k2 − Ψ  βds u  . Defining ω1 := kdx /uk, the x-part of the right-hand side can be bounded by  ∆1 := α kdx k2 − Ψ αdx u  ≥ψ   kdx k2 ω1 , and this bound holds with equality if α = ᾱ := 1 1 − . ω1 ω1 + kdx k2 Similarly, defining ω2 := kds /uk, the s-part of the right-hand side can be bounded by  ∆2 := β kds k2 − Ψ βds u  ≥ψ  kds k2 ω2  , and this bound holds with equality if β = β̄ := 1 1 − . ω2 ω2 + kds k2 Hence, ∆ = ∆1 + ∆2 ≥ ψ  kdx k2 ω1  +ψ  kds k2 ω2  . We can easily verify that ω1 ≤ ρ(δ) kdx k , ω2 ≤ ρ(δ) kds k . Using the monotonicity of ψ, it follows that ∆1 ≥ ψ  kdx k ρ(δ)  , ∆2 ≥ ψ  kds k ρ(δ)  . We obtain in this way ∆ = ∆1 + ∆2 ≥ ψ  kdx k ρ(δ)  +ψ  kds k ρ(δ)  . Finally, applying the left inequality in (6.24) to the right-hand side expression, we can easily derive that   s ∆ ≥ ψ kdx k2 + kds k2  =ψ ρ(δ)2  2δ ρ(δ)  . 204 II Logarithmic Barrier Approach Note that this is exactly the same bound as obtained in Lemma II.72. Thus, different stepsizes in the x-space and s-space give in this analysis no advantage over equal step-sizes in both spaces. This contradicts an earlier (and incorrect) result of Roos and Vial in [246].32 • For our goal it is of interest to√derive the following two conclusions from the above lemma. First, if δ(x, s; µ) = 1/ 2 then a damped Newton step reduces the barrier function by at least 0.182745, which is larger than 1/6. On the other hand for larger values of δ(x, s; µ) the lower bound for the reduction in the barrier function value seems to be rather poor. It seems reasonable to expect that the reduction grows to infinity if δ goes to infinity. However, if δ goes to infinity then 2δ/ρ(δ) goes to 1, and hence the lower bound in the lemma is bounded by the rather small constant ψ(1) = 1 − log 2.33 7.8.3 A bound for the number of inner iterations As before, we assume that we have an iterate (x, s) and µ > 0 such that (x, s) belongs to the region around the µ-center determined by δ = δ(x, s; µ) ≤ τ, for some positive τ . Starting at (x, s) we count the number of inner iterations needed to reach the corresponding region around the µ+ -center, with µ+ = (1 − θ)µ. Implicitly it is assumed that θ is so large that (x, s) lies outside the region of quadratic convergence around the µ+ -center, but this is not essential for the analysis below. Recall that the target centers x(µ+ ) and s(µ+ ) are the (unique) minimizers of the primal-dual logarithmic barrier function φµ+ (x, s), and that the value of this function is an indicator for the ‘distance’ from (x, s) to (x(µ+ ), s(µ+ )). We start by considering the effect of an update of the barrier parameter to µ+ = (1 − θ)µ with 0 ≤ θ < 1, on the barrier function value. Note that Lemma II.69 gives the answer if θ = 0: φµ (x, s) ≤ ψ(2δρ(δ)). 32 Exercise 61 In this exercise we consider the case where different step-sizes are taken for the xspace and the s-space. Let x+ = x + α∆x and s+ = s + β∆s, with α and β such that both steps are feasible. 
Prove that the decrease in the primal-dual barrier function value is given by ∆ := φµ (x, s) − φµ (x+ , s+ ) = α kdx k2 + β kds k2 − Ψ  αdx u  −Ψ  βds u  . Using this, show that the decrease is maximal for the unique step-sizes ᾱ and β̄ determined by the equations ! ! T 2 e (dx ) = e α T dx u 2 , e + α dux T 2 e (ds ) = e β T ds u 2 e + β dus , and that for these values of α and β the decrease is given by Ψ 33  −ᾱdx u + ᾱdx  +Ψ  −β̄ds u + β̄ds  . We want to explicitly show the inherent weakness of the lower bound in Lemma II.72 in the hope that it will stimulate the reader to look for a stronger result. II.7 Primal-Dual Logarithmic Barrier Method 205 For the general case, with θ > 0, we have the following lemma. Lemma II.74 Using the above notation, we have   √ θ 2δρ(δ)θ n . + nψ φµ+ (x, s) ≤ φµ (x, s) + 1−θ 1−θ Proof: The proof is more or less straightforward. The vector u is defined as usual.  2  X n u2j u φµ+ (x, s) = eT log −e − 1−θ 1−θ j=1   2 n  X u 2 2 T 2 T − u + n log(1 − θ) log uj + e = e u −e − 1−θ j=1 θeT u2 + n log(1 − θ) 1−θ  θn θ uT u − u−1 + + n log(1 − θ). = φµ (x, s) + 1−θ 1−θ The second term in the last expression can be bounded by using  √ uT u − u−1 ≤ kuk u − u−1 ≤ 2δρ(δ) n. = φµ (x, s) + The first inequality is simply the Cauchy–Schwarz √ inequality and √ the second inequality follows from u−1 − u = 2δ and kuk ≤ n kuk∞ ≤ nρ(δ), where we used Lemma II.62, page 182. We also have      θn θ θ θ = nψ . + n log(1 − θ) = n − log 1 + 1−θ 1−θ 1−θ 1−θ Substitution yields   √ 2δρ(δ)θ n θ φµ+ (x, s) ≤ φµ (x, s) + , + nψ 1−θ 1−θ and hence the lemma has been proved. ✷ Now we are ready to estimate the number of (inner) iterations between two successive updates of the barrier parameter. Lemma II.75 For given θ (0 < θ < 1), let Then, when √ θ n R := . 1−θ √ R τ= p √ , 2 1+ R the number of (inner) iterations between two successive updates of the barrier parameter is not larger than   4  s √ θ n    . 2 1 +  1−θ    206 II Logarithmic Barrier Approach Proof: Suppose that δ = δ(x, s; µ) ≤ τ . Then it follows from Lemma II.74 that after the update of the barrier parameter to µ+ = (1 − θ)µ we have   √ θ 2δρ(δ)θ n . + nψ φµ+ (x, s) ≤ φµ (x, s) + 1−θ 1−θ By Lemma II.69 we have φµ (x, s) ≤ ψ (2δρ(δ)). Using the monotonicity of ψ and, since δ ≤ τ , 2δρ(δ) ≤ 2τ ρ(τ ) we obtain   √ 2τ ρ(τ )θ n θ φµ+ (x, s) ≤ ψ (2τ ρ(τ )) + + nψ . 1−θ 1−θ Application of the inequality ψ(t) ≤ t2 /2 for t ≥ 0 to the first and the third terms yields φ µ+ 2  √ √ √ 2τ ρ(τ )θ n nθ2 θ n √ . (x, s) ≤ 2τ ρ(τ ) + = τ ρ(τ ) 2 + + 1−θ 2(1 − θ)2 2(1 − θ) 2 2 The algorithm repeats damped Newton steps until the iterate (x, s) satisfies δ = δ(x, s; µ+ ) ≤ τ . Each damped step decreases the barrier function value by at least ψ (2τ /ρ(τ )). Hence, after    2 √ √ 1 n θ     τ ρ(τ ) 2 + √ (7.53)  2τ 2(1 − θ)   ψ ρ(τ  ) iterations the value of the barrier function will have reached (or bypassed) the value ψ (2τ /ρ(τ )). From Lemma II.69, using that ψ (2τ /ρ(τ )) < ψ (−2τ /ρ(τ )), the iterate (x, s) then certainly satisfies δ(x, s; µ+ ) ≤ τ , and hence (7.53) provides an upper bound for the number of inner iterations between two successive updates of the barrier parameter. The rest of the proof consists in manipulating this expression. First, using ψ(t) ≥ t2 /(2(1 + t)) and 0 ≤ 2τ /ρ(τ ) ≤ 1, we obtain ψ  2τ ρ(τ )  ≥ 4τ 2 ρ(τ )2  2 1+ 2τ ρ(τ )  = τ2 τ2 2 ≥ . 2τ ρ(τ )2 ρ(τ )2 1 + ρ(τ ) Substitution reduces the upper bound (7.53) to & 2 ' &   √ √ 2 ' √ θ n θρ(τ ) ρ(τ )2 n = 2 ρ(τ )2 + . 
τ ρ(τ ) 2 + √ τ2 2τ (1 − θ) 2(1 − θ) For fixed θ the number of inner iterations is a function of τ . Note that this function goes to infinity if τ goes to zero or to infinity. Our aim is to determine τ such that this function is minimized. To this end we consider T (τ ) := ρ(τ )2 + ρ(τ )R , 2τ II.7 Primal-Dual Logarithmic Barrier Method 207 with R as given in the lemma. The derivative of T (τ ) with respect to τ can be simplified to 4τ 2 ρ(τ )2 − R √ . T ′ (τ ) = 2τ 2 1 + τ 2 Hence T (τ ) is minimal if 2τ ρ(τ ) = √ R. We can solve this equation for τ . It can be rewritten as √ ρ(τ )2 − 1 = R, which gives ρ(τ ) = Hence, 1 τ= 2  ρ(τ ) − 1 ρ(τ )  1 = 2 q √ 1 + R. ! √ q √ 1 R = p 1+ R− p √ √ . 1+ R 2 1+ R (7.54) Substitution of this value in T (τ ) gives  √   R R 1 + √ 2 √ ρ(τ )R √ T (τ ) = ρ(τ )2 + = 1+ R . =1+ R+ 2τ R For the value of τ given by (7.54) the number of inner iterations between two successive updates of the barrier parameter will not be larger than   4  s    √  √ 4 θ n   = 2 1 + 2 1+ R ,  1−θ    which proves the lemma. ✷ √ Remark II.76 Note that for small values of θ, so that θ n is bounded by a constant, the above lemma implies that the number of inner iterations between two successive updates of √ the barrier parameter is bounded by a constant. For example, with θ = 1/ 2n, which gives (for large values of n) τ = 0.309883, this number is given by &  2 1+ r 1 √ 2 4 ' = 23. Unfortunately the constant is rather large. Because, if τ = 0.309883 then we know that after √ an update with θ = 1/ 2n one full Newton step will be sufficient to reach the vicinity of the new target. In fact, it turns out that the bound has the same weakness as the bound in Theorem II.41 for the dual case. As discussed earlier, this weak result is due to the poor analysis. • In practice the number of inner iterations is much smaller than the number predicted by the lemma. This is illustrated by some examples in the next section. But first we 208 II Logarithmic Barrier Approach formulate the main conclusion of this section, namely that the primal-dual logarithmic barrier method with large updates is polynomial. This is the content of our final result in this section. Theorem II.77 The following expression is an upper bound for the total number of iterations required by the logarithmic barrier algorithm with line searches:     4  s √ 0 θ n  nµ  1   .  log  2 1 + θ  1−θ  ε      Here it is assumed that τ is chosen as in Lemma II.75: √ √ θ n R . τ= p √ , where R = 1−θ 2 1+ R √ If θ ≤ n/(n + n) the output is a primal-dual pair (x, s) such that xT s ≤ 2ε. Proof: The number of outer iterations follows from Lemma I.36. The bound in the theorem is obtained by multiplying this number by the bound of Lemma II.75 for the number of inner iterations per outer iteration and rounding the product, if not integral, to the smallest integer above it. Finally, for the last statement we use the inequality   2δρ(δ) nµ, xT s ≤ 1 + √ n where δ = δ(x, s; µ); the elementary proof of this inequality is left as an exercise.34,35 For the output pair (x, s) we may apply this inequality with δ ≤ τ . Since √ q √ R p τ= √ , ρ(τ ) = 1 + R, 2 1+ R √ √ we have 2τ ρ(τ ) = R. Now θ ≤ n/(n + n) implies that R ≤ n, and hence we obtain T that x s ≤ 2nµ ≤ 2ε. ✷ Just as in the dual case, we draw two conclusions from the last theorem. If we take for θ a fixed constant (independent of n), for example θ = 1/2, the algorithm is called a large-update algorithm and the iteration bound of Theorem II.77 becomes   nµ0 . 
O n log ε 34 Exercise 62 Let (x, s) be a positive primal-dual pair and µ > 0. If δ = δ(x, s; µ), prove that xT s − nµ = µ uT u − u−1 35  ≤ 2δρ(δ) nµ. √ n √ Exercise 63 The bound in Exercise 62 is based on the estimate kuk ≤ ρ(δ) n. Prove the sharper estimate  √ δ kuk ≤ ρ √ n. n II.7 Primal-Dual Logarithmic Barrier Method 209 √ If we take θ = ν/ n for some fixed constant ν (independent of n), the algorithm is called a medium-update algorithm and the iteration bound of Theorem II.77 becomes   √ nµ0 , O n log ε provided that n is large enough (n ≥ ν 2 say). 7.8.4 Illustration of the algorithm with large updates We use the same sample problem as in the numerical examples given before, and solve this problem using the primal-dual logarithmic barrier algorithm with large updates. We use the same initialization as before, namely x = (2, 1, 1), y = (0, 0), s = (1, 1, 1) and µ = 4/3. We do this for the values 0.5, 0.9, 0.99 and 0.999 of the barrier update parameter θ. It may be interesting to mention the values of the parameter τ , as given by Lemma II.75, for these values of θ. With n = 3, these values are respectively 0.43239, 0.88746, 1.74397 and 3.18671. The progress of the algorithm for the three successive values of θ is shown in Tables 7.7. (page 210), 7.8., 7.9. and 7.10. (page 211). The tables need some explanation. They show only the first coordinates of x and of s. As in the corresponding tables for the dual case, the tables not only have lines for the inner iterations, but also for the outer iterations, which multiply the value of the barrier parameter by the fixed factor 1−θ. The last column shows the proximity to the current µ-center. The proximity value δ increases in the outer iterations and decreases in the inner iterations. The tables clearly demonstrate the advantages of the large-update strategy. The number of inner iterations between two successive updates of the barrier parameter is never more than two. In the last table (with θ = 0.999) the sample problem is solved in only 3 iterations, which is the best result obtained so far. The practical behavior is significantly better than the theoretical analysis justifies. This is typical, and the same phenomenon occurs for larger problems than the small sample problem. We conclude this section with a graphical illustration of the algorithm, in Figure 7.11 (page 212). 210 II Logarithmic Barrier Approach Outer Inner 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Table 7.7. 
nµ x1 y1 y2 s1 δ 0 4.000000 2.000000 1 2.000000 1.000000 2 1.000000 0.500000 3 0.500000 0.250000 4 0.250000 0.125000 5 0.125000 0.062500 6 0.062500 0.031250 7 0.031250 0.015625 8 0.015625 0.007812 9 0.007812 0.003906 10 0.003906 0.001953 11 0.001953 0.000977 12 0.000977 0.000488 13 0.000488 0.000244 14 0.000244 0.000122 15 0.000122 0.000061 16 0.000061 2.000000 2.000000 1.372070 1.372070 1.158784 1.158784 1.082488 1.082488 1.041691 1.041691 1.020805 1.020805 1.010423 1.010423 1.005201 1.005201 1.002606 1.002606 1.001300 1.001300 1.000651 1.000651 1.000325 1.000325 1.000163 1.000163 1.000081 1.000081 1.000041 1.000041 1.000020 1.000020 1.000010 0.000000 0.000000 0.313965 0.313965 0.649743 0.649743 0.835475 0.835475 0.916934 0.916934 0.958399 0.958399 0.979157 0.979157 0.989597 0.989597 0.994789 0.994789 0.997399 0.997399 0.998697 0.998697 0.999350 0.999350 0.999674 0.999674 0.999837 0.999837 0.999919 0.999919 0.999959 0.999959 0.999980 0.000000 0.000000 0.313965 0.313965 0.666131 0.666131 0.835249 0.835249 0.916776 0.916776 0.958395 0.958395 0.979156 0.979156 0.989598 0.989598 0.994789 0.994789 0.997399 0.997399 0.998697 0.998697 0.999350 0.999350 0.999674 0.999674 0.999837 0.999837 0.999919 0.999919 0.999959 0.999959 0.999980 1.000000 1.000000 0.686035 0.686035 0.350257 0.350257 0.164525 0.164525 0.083066 0.083066 0.041601 0.041601 0.020843 0.020843 0.010403 0.010403 0.005211 0.005211 0.002601 0.002601 0.001303 0.001303 0.000650 0.000650 0.000326 0.000326 0.000163 0.000163 0.000081 0.000081 0.000041 0.000041 0.000020 0.2887 0.6455 0.2334 0.6838 0.1559 0.6237 0.0587 0.6031 0.0281 0.6115 0.0147 0.6111 0.0073 0.6129 0.0039 0.6111 0.0019 0.6129 0.0015 0.6111 0.0007 0.6129 0.0012 0.6112 0.0006 0.6129 0.0011 0.6112 0.0005 0.6129 0.0012 0.6112 0.0005 Progress of the primal-dual algorithm with large updates, θ = 0.5. II.7 Primal-Dual Logarithmic Barrier Method Outer Inner 0 1 2 3 4 5 Table 7.8. nµ x1 y1 y2 s1 δ 0 4.000000 0.400000 1 0.400000 2 0.400000 0.040000 3 0.040000 0.004000 4 0.004000 0.000400 5 0.000400 0.000040 6 0.000040 2.000000 2.000000 1.051758 1.078981 1.078981 1.004551 1.004551 1.000621 1.000621 1.000066 1.000066 1.000007 0.000000 0.000000 0.263401 0.875555 0.875555 0.976424 0.976424 0.998596 0.998596 0.999867 0.999867 0.999987 0.000000 0.000000 0.684842 0.861676 0.861676 0.983729 0.983729 0.998677 0.998677 0.999868 0.999868 0.999987 1.000000 1.000000 0.736599 0.124445 0.124445 0.023576 0.023576 0.001404 0.001404 0.000133 0.000133 0.000013 0.2887 2.4664 1.1510 0.0559 2.5417 0.3661 2.7838 0.0447 2.4533 0.0070 2.4543 0.0027 Progress of the primal-dual algorithm with large updates, θ = 0.9. Outer Inner 0 1 2 3 Table 7.9. nµ x1 y1 y2 s1 δ 0 4.000000 0.040000 1 0.040000 2 0.040000 0.000400 3 0.000400 0.000004 4 0.000004 2.000000 2.000000 2.000000 1.004883 1.004883 1.007772 1.007772 1.000038 0.000000 0.000000 0.000000 0.251292 0.251292 0.987570 0.987570 0.999743 0.000000 0.000000 0.000000 0.743825 0.743825 0.986233 0.986233 0.999834 1.000000 1.000000 1.000000 0.748708 0.748708 0.012430 0.012430 0.000257 0.2887 8.5737 4.2530 0.0816 8.7620 0.4532 9.5961 0.0392 Progress of the primal-dual algorithm with large updates, θ = 0.99. Outer Inner 0 1 2 Table 7.10. 
211 nµ x1 y1 y2 0 4.000000 0.004000 1 0.004000 2 0.004000 0.000004 3 0.000004 2.000000 2.000000 1.000977 1.000481 1.000481 1.000000 0.000000 0.000000 0.250006 0.999268 0.999268 0.999998 0.000000 0.000000 0.749018 0.998990 0.998990 0.999999 s1 δ 1.000000 0.2887 1.000000 27.3587 0.749994 13.6684 0.000732 0.3722 0.000732 22.4872 0.000002 0.2066 Progress of the primal-dual algorithm with large updates, θ = 0.999. 212 w2 II Logarithmic Barrier Approach 102 w2 ✻ ✻ 100 100 10−2 10−2 10−4 10−4 10−6 10−8 10−8 w2 102 10−6 θ = 0.5 τ = 0.433 10−6 10−4 10−2 100 ✲ w1 10−8 10−8 102 102 w2 ✻ θ = 0.9 τ = 0.887 10−6 10−4 10−2 100 ✲ w1 102 102 ✻ 100 100 10−2 10−2 10−4 10−4 10−6 10−8 10−8 10−6 θ = 0.99 τ = 1.744 10−6 10−4 Figure 7.11 10−2 100 ✲ w1 102 10−8 10−8 θ = 0.999 τ = 3.187 10−6 10−4 10−2 100 ✲ w1 102 The iterates when using large updates with θ = 0.5, 0.9, 0.99 and 0.999. 8 Initialization All the methods of this part of the book assume the availability of a starting point on or close to the central path of the problem. Sometimes such a point is known, but more often we have no foreknowledge of the problem under consideration. For these cases we provide in this chapter a transformation of the problem yielding an equivalent problem for which a point on the central path is available. This transformation is based on results in Part I and is described below in detail.1 Suppose that we want to solve the problem (P ) in standard format:  (P ) min cT x : Ax = b, x ≥ 0 , where A is an m × n matrix of rank m, c, x ∈ IRn , and b ∈ IRm . Let I be a subset of the full index set {1, 2, . . . , n} such that the submatrix AI of A has size m × m and is nonsingular. Thus, AI is a basis for (P ). After reordering the columns of A, we may write A = (AI AJ ) , where J denotes the complement of I. Now Ax = b can be rewritten as AI xI + AJ xJ = b, which is equivalent to As a consequence we have xI = AI−1 (b − AJ xJ ) . T T −1 T −T cT x = cTI xI + cTJ xJ = cTI A−1 I (b − AJ xJ ) + cJ xJ = cI AI b + cJ − AJ AI cI Hence, omitting the constant cTI A−1 I b we can reformulate (P ) as n o T (P c ) min cJ − ATJ AI−T cI xJ : AI−1 (b − AJ xJ ) ≥ 0, xJ ≥ 0 , T xJ . or equivalently, (P c ) 1 min n cJ − ATJ AI−T cI T o xJ : −AI−1 AJ xJ ≥ −A−1 b, x ≥ 0 . J I We want to point out an advantage of the approach in this chapter over the approach in the existing literature. The technique of embedding a given standard form problem in a homogeneous and self–dual problem was introduced by Ye, Todd and Mizuno [316] in 1994. See also Wu, Wu and Ye [299]; their final model contains free variables. In our approach the occurrence of free variables is avoided by first reducing the given standard problem to a canonical problem. For a different approach to the initialization problem we refer to, e.g., Lustig [189, 190]. 214 II Logarithmic Barrier Approach Thus we have transformed (P ) to the equivalent problem (P c ), which is in canonical format. Chapter 4 describes how we can embed any canonical model in a self-dual model so that a strictly complementary solution of the latter model either yields a strictly complementary solution of the canonical problem or makes clear that the canonical problem is infeasible or unbounded. Moreover, for the embedding problem we have a point on its central path available. If we apply such an embedding to (P c ), the resulting self-dual model may be given by  (SP c ) min q T ξ : M ξ ≥ −q, ξ ≥ 0 , where M is skew-symmetric and q ≥ 0. Let ξ(µ) be a given point on the central path of (SP c ) for some positive µ. 
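Before continuing with the embedding, note that the first step of this construction, the elimination of the basic variables that produced (P^c), is a purely mechanical computation. The following Python sketch is illustrative only; it assumes that the basis index set I is given, that A_I is indeed nonsingular, and it uses a dense inverse purely for clarity.

import numpy as np

def standard_to_canonical(A, b, c, basis):
    # Eliminate x_I = A_I^{-1}(b - A_J x_J) from (P): min c^T x, Ax = b, x >= 0.
    # Returns the data of (P^c): min cbar^T x_J  s.t.  G x_J >= g, x_J >= 0,
    # together with the omitted constant c_I^T A_I^{-1} b.
    n = A.shape[1]
    I = np.asarray(basis)
    J = np.setdiff1d(np.arange(n), I)
    AI_inv = np.linalg.inv(A[:, I])
    cbar = c[J] - A[:, J].T @ (AI_inv.T @ c[I])   # reduced cost c_J - A_J^T A_I^{-T} c_I
    G = -AI_inv @ A[:, J]                         # canonical constraint matrix -A_I^{-1} A_J
    g = -AI_inv @ b                               # canonical right-hand side  -A_I^{-1} b
    const = c[I] @ (AI_inv @ b)
    return cbar, G, g, const, J

Applied to the data of Example II.78 below, with the first and third columns of A as basis (basis = [0, 2] in zero-based indexing), this reproduces the canonical problem derived there: reduced cost 2, constraint matrix (1, 0)^T, right-hand side (-1, -1)^T and constant 2.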
Now (SP c ) can be written in the standard form by associating the surplus vector σ(ξ) := M ξ + q with any ξ. We then may rewrite (SP c ) as  (SSP c ) min q T ξ : M ξ − σ = −q, ξ ≥ 0, σ ≥ 0 , and we have ξ(µ)σ(ξ(µ)) = µe, where e is an all-one vector of appropriate size. Note that (SSP c ) is in the standard format. We can rewrite it as  (P̄ ) min c̄T x̄ : Āx̄ = b̄, x̄ ≥ 0 , with Ā = h M −I i , c̄ = " q 0 # , b̄ = −q. The problem (P̄ ) is in the standard format and hence the methods of this chapter can be used to yield an ε-solution of (P̄ ) provided that we have a solution on or close to its central path. We now show that this condition is satisfied by showing that the µ-center of (P̄ ) is known. To this end we need to consider also the dual problem of (P̄ ), namely  (D̄) max b̄T ȳ : ĀT ȳ + s̄ = c̄, s̄ ≥ 0 . For the slack vector s̄ we have T s̄ = c̄ − Ā ȳ = " q − M T ȳ ȳ # = " q + M ȳ ȳ # . Here we used that M T = −M . Now with the definition " # ξ(µ) , ȳ =: ξ(µ), x̄ := σ(ξ(µ)) x̄ is feasible for (P̄ ) and ȳ is feasible for (D̄). The feasibility of ȳ follows by considering its slack vector: " # " # " # q + M ȳ q + M ξ(µ) σ(ξ(µ)) s̄ = = = . ȳ ξ(µ) ξ(µ) II.8 Initialization 215 For the product of x̄ and s̄ we have " #" # " # " # ξ(µ) σ(ξ(µ)) ξ(µ)σ(ξ(µ)) µe x̄s̄ = = = . σ(ξ(µ)) ξ(µ) σ(ξ(µ))ξ(µ) µe This proves that x̄ is the µ-center of (P̄ ), as required. By way of example we apply the above transformation to the sample problem used throughout this part of the book. Example II.78 Taking A and c as in Example II.7 (page 97), and b = (1, 1)T , we have   " # " # 1 1 −1 0 1   A= , b= , c =  1 , 0 0 1 1 1 and (P ) is the problem (P ) min {x1 + x2 + x3 : x1 − x2 = 1, x3 = 1, x ≥ 0} . The first and the third column of A form a basis. With the index set I defined accordingly, the matrix AI is the identity matrix. Then we express x1 and x3 in terms of x2 : # # " " 1 + x2 x1 . = 1 x3 Using this we eliminate x1 and x3 from (P ) and we obtain the canonical problem (P c ): # ) " ( " # −1 1 c , x2 ≥ 0 . x2 ≥ (P ) min 2x2 + 2 : −1 0 Being unrealistic, but just to demonstrate the transformation process for this simple case, we do not assume any foreknowledge and embed this problem in a self-dual problem as described in Section 4.3.2 Taking 1 for x0 and s0 , and for y 0 and t0 the all-one vector of length 2, the self-dual embedding problem is given by (SP c ) with     0 0 0 1 1 −1     0  0 0 0 1 0     ,  0 . q = M = −1 0 0 2 0         0 5 0  −1 −1 −2 5 1 0 0 −5 0 Now the all-one vector is feasible for (SP c ) and its surplus vector is also the all-one vector, as easily can be verified. It follows that the all-one vector is the point on the central path for µ = 1. Adding surplus variables to this problem we get a problem in the standard format with 5 equality constraints and 10 variables. Solving this problem 2 Exercise 64 The canonical problem (P c ) contains an empty row. Remove this row and then perform the embedding. Show that this leads to the same solution of (P c ). 216 II Logarithmic Barrier Approach with the large-update logarithmic barrier method (with θ = 0.999 and ε = 10−4 ), we find in 4 iterations the strictly complementary solution 4 4 8 4 ξ = (0, 0, 0, , 0, , , , 0, 1). 5 5 5 5 The slack vector is 4 4 4 8 σ(ξ) = ( , , , 0, 1, 0, 0, 0, , 0). 5 5 5 5 Note that the first five coordinates of ξ are equal to the last five coordinates of σ(ξ) and vice versa. In fact, the first five coordinates of ξ form a solution of the self-dual embedding (SP c ) of (P c ). 
The homogenizing variable, the fourth entry in ξ, is positive. Therefore, we have found an optimal solution of (P c ). The optimal value of x2 in (P c ), the third coordinate in the vector ξ, is given by x2 = 0. Hence x = (1, 0, 1) is optimal for the original problem (P ). ♦ A clear disadvantage of the above embedding procedure seems to be that it increases the size of the problem. If the constraint matrix A of (P ) has size m × n then the final standard form problem that we have to solve has size (n + 2) × 2(n + 2). However, when the procedure is implemented efficiently the amount of extra computation can be reduced significantly. In fact, the computation of the search direction for the larger problem can be organized in such a way that it requires the solution of three linear systems with the same matrix of size (m+2)×(m+2). This is explained in Chapter 20. Part III The Target-following Approach 9 Preliminaries 9.1 Introduction In this part we deal again with the problems (P ) and (D) in the standard form:  (P ) min cT x : Ax = b, x ≥ 0 , (D) max  T b y : AT y ≤ c . As before, the matrix A is of size m × n with full row rank and the vectors c and x are in IRn and b in IRm . Assuming that the interior-point condition is satisfied we recall from Theorem II.4 that the KKT system (5.3) Ax AT y + s = = b, c, xs = µe x ≥ 0, s ≥ 0, (9.1) has a unique solution for every positive value of µ. These solutions are called the µcenters of (P ) and (D). The above result is fundamental for the algorithms analyzed in Part II. When µ runs through the positive real line then the solutions of the KKT system run through the central paths of (P ) and (D); the methods in Part II just approximately follow the central path to the optimal sets of (P ) and (D). These methods were called logarithmic barrier methods because the points on the central path are minimizers of the logarithmic barrier functions for (P ) and (D). For obvious reasons they have also become known under the name central-path-following methods. In each (outer) iteration of such a method the value of the parameter µ is fixed and starting at a given feasible solution of (P ) and/or (D) a good approximation is constructed of the µ-centers of (P ) and (D). Numerically the approximate solutions are obtained either by using Newton’s method for solving the KKT system or by using Newton’s method for minimizing the logarithmic barrier function of (P ) and (D). In the first case Newton’s method provides displacements for both (P ) and (D); then we speak of a primal-dual method. In the second case Newton’s method provides a displacement for either (P ) or (D), depending on whether the logarithmic barrier function of (P ) or (D) is used in Newton’s method. This gives the so-called primal methods and dual methods respectively. In all cases the result of an (outer) iteration is a primal-dual pair approximating the µ-centers and such that the duality gap is approximately nµ. 220 III Target-following Approach In this part we present a generalization of the above results. The starting point is the observation that if the vector µe on the right-hand side of the KKT system (9.1) is replaced by any positive vector w then the resulting system still has a (unique) solution. Thus, for any positive vector w the system Ax = b, AT y + s xs = = c, w x ≥ 0, s ≥ 0, (9.2) has a unique solution, denoted by x(w), y(w), s(w).1 This result is interesting in itself. 
It means that we can associate with each positive vector w the primal-dual pair (x(w), s(w)).2 The map ΦP D associating with any w > 0 the pair (x(w), s(w)) will be called the target map associated with (P ) and (D). In the next section we discuss its existence and also some interesting properties. In the present context, it is convenient to refer to the interior of the nonnegative orthant in IRn as the w-space. Any (possibly infinite) sequence of positive vectors wk (k = 1, 2, . . .) in the w-space is called a target sequence. If a target sequence converges to the origin, then the duality gap eT wk for the corresponding pair in the sequence ΦP D (wk ) converges to zero. We are especially interested in target sequences of this type for which the sequence ΦP D (wk ) is convergent as well, and for which the limiting primal-dual pair is strictly complementary. In Section 9.3 we derive a sufficient condition on target sequences (converging to the origin) that yields this property. We also give a condition such that the limiting pair consists of so-called weighted-analytic centers of the optimal sets of (P ) and (D). With any central-path-following method we can associate a target sequence on the central path by specifying the values of the barrier parameter µ used in the successive (outer) iterations. The central-path-following method can be interpreted as a method that takes the points on the central path as intermediate targets on the way to the origin. Thus it becomes apparent how the notion of central-path-following methods can be generalized to target-following methods, which (approximately) follow arbitrary target sequences. To develop this idea further we need numerical procedures that can be used to obtain a good approximation of the primal-dual pair corresponding to some specified positive target vector. Chapters 10, 12 and 13 are devoted to such procedures. The basic principle is again Newton’s method. Chapter 10 describes a primal-dual method, Chapter 12 a dual method, and Chapter 13 a primal method. The target-following approach offers a very general framework for the analysis of almost all known interior-point methods. In Chapter 11 we analyze some of the methods of Part II in this framework. We also deal with some other applications, including a target-following method that is based on the Dikin direction, as introduced in Appendix E. Finally, in Chapter 14 we deal with the so-called method of centers. This method will be described and after putting it into the target-following framework we provide a new and relatively easy analysis of the method. 1 This result, which establishes a one-to-one correspondence between primal-dual pairs (x, s) and positive vectors in IRn , was proved first in Kojima et al. [175]. Below we present a simple alternative proof. Mizuno [212, 214] was the first to use this property in the design of an algorithm. 2 Here, as before, we use that any dual feasible pair (y, s) can be uniquely represented by either y or s. This is due to the assumption that A has full row rank. III.9 Preliminaries 9.2 221 The target map and its inverse Our first aim in this section is to establish that the target map ΦP D is well defined. That is, we need to show that for any positive vector w ∈ IRn the system (9.2) has a unique solution. To this end we use a modification of the primal-dual logarithmic barrier as given by (6.23). 
Replacing the role of the vector µe in this function by the vector w, we consider the modified primal-dual logarithmic barrier function defined by   n X xj sj 1 wj ψ −1 . (9.3) φw (x, s) = max (w) j=1 wj Here the function ψ has its usual meaning (cf. (5.5), page 92). The scaling factor 1/ max (w) serves to scale φw (x, s) in such a way that φw (x, s) coincides with the primal-dual logarithmic barrier function (7.44) in Section 7.8 (page 194) if w is on the central path.3 Note that φw (x, s) is defined for all positive primal-dual pairs (x, s). Moreover, φw (x, s) ≥ 0 and the equality holds if and only if xs = w. Hence, the weighted KKT system (9.2) has a solution if and only if the minimal value of φw is 0. By expanding φw (x, s) we get max (w) φw (x, s) n X = j=1 n X = j=1 wj xj sj xj sj − 1 − log wj wj n X xj sj − xT s − =  n X j=1 j=1 wj − n X  wj log xj sj + n X wj log wj j=1 j=1 wj log xj sj − eT w + n X wj log wj . (9.4) j=1 Neglecting for the moment the constant part, that is the part that does not depend on x and s, we are left with the function xT s − n X wj log xj sj . (9.5) j=1 This function is usually called a weighted primal-dual logarithmic barrier function with the coefficients of the vector w as weighting coefficients. Since xT s = cT x − bT y, the first term in (9.5) is linear on the domain of φw (x, s). The second term, called the barrier term, is strictly convex and hence it follows that φw (x, s) is strictly convex on its domain. 3 If w = µe then max (w) = µ and hence φw (x, s) = 1 µ n X j=1 µψ x j sj µ  −1 = n x s X j j ψ j=1 µ  −1 =Ψ   xs −e ; µ this is precisely the primal-dual logarithmic barrier function φµ (x, s) as given by (6.23) and (7.44), and that was used in the analysis of the large-update central-path-following logarithmic barrier method. 222 III Target-following Approach In the sequel we need a quantity to measure the distance from a positive vector w to the central path of the w-space. Such a measure was introduced in Section 3.3.4 in (3.20). We use the same measure here, namely δc (w) := max (w) . min (w) (9.6) Now we are ready to derive the desired result by adapting Theorem II.4 and its proof to the present case. With w fixed, for given K ∈ IR the level set LK of φw is defined by  LK = (x, s) : x ∈ P + , s ∈ D+ , φw (x, s) ≤ K . Theorem III.1 Let w ∈ IRn and w > 0. Then the following statements are equivalent: (i) (P ) and (D) satisfy the interior-point condition. (ii) There exists K ≥ 0 such that the level set LK is nonempty and compact. (iii) There exists a (unique) primal-dual pair (x, s) minimizing φw with x and s both positive. (iv) There exist (unique) x, s ∈ IRn and y ∈ IRm satisfying (9.2); (v) For each K ≥ 0 the level set LK is nonempty and compact. Proof: (i) ⇒ (ii): Assuming (i), there exists a positive x0 ∈ P + and  a positive s0 ∈ D+ . With K = φw x0 , s0 the level set LK contains the pair x0 , s0 . Thus, LK is not empty, and we need to show that LK is compact. Let (x, s) ∈ LK . Then, by the definition of LK ,   n X xi si wi ψ − 1 ≤ K max (w). wi i=1 Since each term in the sum is nonnegative, this implies   xi si K max (w) ψ −1 ≤ = Kδc (w), wi min (w) 1 ≤ i ≤ n. Since ψ is strictly convex on its domain and goes to infinity at its boundaries, there exist unique positive numbers a and b, with a < 1, such that ψ(−a) = ψ(b) = Kδc (w). We conclude that −a ≤ xi si − 1 ≤ b, wi 1 ≤ i ≤ n, which gives wi (1 − a) ≤ xi si ≤ wi (1 + b), 1 ≤ i ≤ n. (9.7) From the right-hand side inequality we deduce that xT s ≤ (1 + b)eT w. 
We proceed by showing that this and (i) imply that the coordinates of x and s can be bounded above. Since A(x − x0 ) = 0, the vector x − x0 belongs to the null space of III.9 Preliminaries 223 A. Similarly, s − s0 = AT (y 0 − y) implies that s − s0 lies in the row space of A. The row space and the null space of A are orthogonal and hence we have (x − x0 )T (s − s0 ) = 0. (9.8) Writing this as xT s0 + sT x0 = xT s + (x0 )T (s0 ) and using xT s ≤ (1 + b)eT w, we find xT s0 + sT x0 ≤ (1 + b)eT w + (x0 )T (s0 ). (9.9) Since sT x0 ≥ 0, x ≥ 0, and s0 > 0, this implies for each index i that xi s0i ≤ xT s0 + sT x0 ≤ (1 + b)eT w + (x0 )T (s0 ), whence xi ≤ (1 + b)eT w + (x0 )T (s0 ) , s0i proving that the coordinates of the vector x are bounded above. The coordinates of the vector s are bounded above as well. This can be derived from (9.9) in exactly the same way as for the coordinates of x. Using xT s0 ≥ 0, s ≥ 0, and x0 > 0, we obtain for each index i that (1 + b)eT w + (x0 )T (s0 ) . si ≤ x0i Thus we have shown that the level set LK is bounded. We proceed by showing that LK is compact. Each si being bounded above, the left inequality in (9.7) implies that xi is bounded away from zero. In fact, we have xi ≥ (1 − a)wi (1 − a)x0i wi ≥ . si (1 + b)eT w + (x0 )T (s0 ) In the same way we derive that for each i, si ≥ (1 − a)s0i wi (1 − a)wi ≥ . xi (1 + b)eT w + (x0 )T (s0 ) We conclude that for each i there exist positive numbers αi and βi with 0 < αi ≤ βi , such that αi ≤ xi , si ≤ βi , 1 ≤ i ≤ n. Thus we have proved the inclusion LK ⊆ n Y i=1 [αi , βi ] × [αi , βi ]. The set on the right-hand side lies in the positive orthant of IRn × IRn , and being the Cartesian product of closed intervals, it is compact. Since φw is continuous, and well defined on this set, it follows that LK is compact. Thus we have shown that (ii) holds. 224 III Target-following Approach (ii) ⇒ (iii): Suppose that (ii) holds. Then, for some nonnegative K the level set LK is nonempty and compact. Since φw is continuous, it follows that φw has a minimizer (x, s) in LK . Moreover, since φw is strictly convex, this minimizer is unique. Finally, from the definition of φw , ψ ((xi si /wi ) − 1) must be finite, and hence xi si > 0 for each i. This implies that x > 0 and s > 0, proving (iii). (iii) ⇒ (iv): Suppose that (iii) holds. Then φw has a (unique) minimizer. Since the domain P + × D+ of φw is open, (x, s) ∈ P + × D+ is a minimizer of φw if and only if the gradient of φw is orthogonal to the linear space parallel to the smallest affine space containing P + × D+ (cf. Proposition A.1). This linear space is determined by the affine system Ax = 0, Hs = 0, where H is a matrix such that its row space is the null space of A and vice versa. The gradient of φw with respect to the coordinates of x satisfies max (w)∇x φw (x, s) = s − w , x and with respect to the coordinates of s we have max (w)∇s φw (x, s) = x − w . s Application of Proposition A.1 yields that ∇x φw (x, s) must lie in the row space of A and ∇s φw (x, s) must lie in the row space of H. These two spaces are orthogonal, and hence we obtain  w w T  x− = 0. s− x s This can be rewritten as   w T w s− XS −1 s − = 0. x x Since XS −1 is a diagonal matrix with positive elements on the diagonal, this implies s− Hence, w = 0. x w = 0, x whence xs = w. This proves that (x, s) is a minimizer of φw if and only if (x, s) satisfies (9.2). Hence (iv) follows from (iii). (iv) ⇒ (i): Let (x, s) be a solution of (9.1). Since w > 0 and x and s are nonnegative, both x and s are positive. 
This proves that (P) and (D) satisfy the interior-point condition. Thus it has been shown that statements (i) to (iv) in the theorem are equivalent. We finally prove that statement (v) is equivalent with each of these statements. Obviously (v) implies (ii). On the other hand, assuming that statements (i) to (iv) hold, let x and s solve (9.2). Then we have x > 0, s > 0 and xs = w. This implies that φw(x, s) = 0, as easily follows by substitution. Now let K be any nonnegative number. Then the level set LK contains the pair (x, s) and hence it is nonempty. Finally, from the above proof of the implication (i) ⇒ (ii) it is clear that LK is compact. This completes the proof of the theorem. ✷

If the interior-point condition is satisfied, then the target map ΦPD provides a tool for representing any positive primal-dual pair (x, s) by the positive vector xs, which is the inverse image of the pair (x, s). The importance of this feature cannot be overestimated. It means that the interior of the nonnegative orthant in IRn can be used to represent all positive primal-dual pairs. As a consequence, the behavior of primal-dual methods that generate sequences of positive primal-dual pairs can be described in the nonnegative orthant in IRn. Obviously, the central paths of (P) and (D) are represented by the bisector {µe : µ > 0} of the w-space; in the sequel we refer to the bisector as the central path of the w-space. See Figure 9.1.

[Figure 9.1  The central path in the w-space (n = 2): the axes are w1 and w2, the central path is the bisector, and a line along which the duality gap eT w is constant is also shown.]

For central-path-following methods the target sequence is a sequence on this path converging to the origin. The iterates of these methods are positive primal-dual pairs 'close' to the target points on the central path, in the sense of some proximity measure. In the next sections we deal with target sequences that are not necessarily on the central path.

Remark III.2 We conclude this section with an interesting observation, namely that the target map of (P) and (D) contains so much information that we can reconstruct the data A, b and c from the target map.4 This can be shown as follows. We take partial derivatives with respect to the coordinates of w in the weighted KKT system (9.2). Denoting the Jacobians of x, y and s simply by x′, y′ and s′ respectively, we have

x′ = ∂x/∂w,   y′ = ∂y/∂w,   s′ = ∂s/∂w,

where the (i, j) entry of x′ is the partial derivative ∂xi/∂wj, etc. Note that x′ and s′ are n × n matrices and y′ is an m × n matrix. Thus we obtain

Ax′ = 0,   A^T y′ + s′ = 0,   Xs′ + Sx′ = I,   (9.10)

where I denotes the identity matrix of size n × n.5 The third equation is equivalent to

s′ = X^{-1} (I − Sx′).

Using also the second equation we get

A^T y′ = X^{-1} (Sx′ − I).   (9.11)

Since y′ is an m × n matrix of rank m there exists an n × m matrix R such that y′R = I_m. Multiplying (9.11) from the right by R we obtain

A^T = A^T I_m = A^T y′ R = X^{-1} (Sx′ − I) R,

which determines the matrix A uniquely. Finally, for any positive w, the vectors b and c follow from b = Ax(w) and c = A^T y(w) + s(w). •

4 This result was established by Crouzeix and Roos [57] in an unpublished note.

9.3 Target sequences

Let us consider a target sequence

w0, w1, w2, . . . , wk, . . .   (9.12)

which converges to the origin. The vectors wk in the sequence are positive and wk → 0 as
k→∞ As a consequence, for the duality gap eT wk at wk we have limk→∞ eT wk = 0; this implies that the accumulation points of the sequence     (9.13) ΦP D w 0 , ΦP D w 1 , ΦP D w 2 , . . . , ΦP D w k , . . . are optimal primal-dual pairs.6 In the sequel (x∗ , s∗ ) denotes any such optimal primaldual pair. 5 Since the matrix of system (9.10) is nonsingular, the implicit function theorem (cf. Proposition A.2 in Appendix A) implies the existence of all the relevant partial derivatives. 6 Exercise 65 By definition, an accumulation point of the sequence (9.13) is a primal-dual pair that is the limiting point of some convergent subsequence of (9.13). Verify the existence of such a convergent subsequence. III.9 Preliminaries 227 We are especially interested in target sequences for which the accumulation pairs (x∗ , s∗ ) are strictly complementary. We prove below that this happens if the target sequence lies in some cone neighborhood of the central path defined by δc (w) ≤ τ, where τ is fixed and τ ≥ 1. Recall that δc (w) ≥ 1, with equality if and only if w is on the central path. Also, δc (w) is homogeneous in w: for any positive scalar λ and for any positive vector w we have δc (λw) = δc (w). As a consequence, the inequality δc (w) ≤ τ determines a cone in the w-space. In Theorem I.20 we showed for the self-dual model that the limiting pairs of any target sequence on the central path are strictly complementary optimal solutions. Our next result not only implies an analogous result for the standard format but it extends it to target sequences lying inside a cone around the central path in the w-space. Theorem III.3 Let τ ≥ 1 and let the target sequence (9.12) be such that δc (wk ) ≤ τ for each k. Then every accumulation pair (x∗ , s∗ ) of the sequence (9.13) is strictly complementary. Proof: For each k = 1, 2, . . ., let (xk , sk ) := ΦP D (wk ). Then we have xk sk = wk , k = 1, 2, . . . . Now let (x∗ , s∗ ) be any accumulation point of the sequence (9.13). Then there exists a subsequence of the given sequence whose primal-dual pairs converge to (x∗ , s∗ ). Without loss of generality we assume that the given sequence itself is such a subsequence. Since xk − x∗ and sk − s∗ belong respectively to the null space and the row space of A, these vectors are orthogonal. Hence, xk − x∗ T  sk − s∗ = 0. Expanding the product and rearranging terms, we get T T (x∗ ) sk + (s∗ ) xk = sk Using that sk T T T xk + (s∗ ) x∗ . xk = eT wk and (x∗ )T s∗ = 0, we arrive at X x∗j skj + X s∗j xkj = eT wk , k = 1, 2, . . . . j∈σ(s∗ ) j∈σ(x∗ ) Here σ(x∗ ) denotes the support of x∗ and σ(s∗ ) the support of s∗ .7 Using that xk sk = wk , we can write the last equation as X j∈σ(x∗ ) 7 wjk x∗j xkj + X wjk s∗j j∈σ(s∗ ) skj = eT w k , k = 1, 2, . . . . The support of a vector is defined in Section 2.8, Definition I.19, page 36. 228 III Target-following Approach Now let ε be a (small) positive number such that 1+ε > τ. nε Then, since (x∗ , s∗ ) is the limit of the sequence (xk , sk )∞ k=0 , there exists a natural number K such that x∗j s∗j ≤ 1 + ε and ≤1+ε k xj skj for each j (1 ≤ j ≤ n) and for all k ≥ K. Hence, for k ≥ K we have   X X wjk  . wjk + eT wk ≤ (1 + ε)  j∈σ(x∗ ) j∈σ(s∗ ) If the pair (x∗ , s∗ ) is not strictly complementary, there exists an index i that does not belong to the union σ(x∗ ) ∪ σ(s∗ ) of the supports of x∗ and s∗ . Then we have X X wjk ≤ eT wk − wik . wjk + j∈σ(x∗ ) Substitution gives This implies j∈σ(s∗ )  eT wk ≤ (1 + ε) eT wk − wik . (1 + ε)wik ≤ εeT wk . 
The average value of the elements of wk is eT wk /n. Since δc (wk ) ≤ τ , the average value does not exceed τ wik . Hence, eT wk ≤ nτ wik . Substituting this we obtain (1 + ε)wik ≤ nετ wik . Now dividing both sides by wik we arrive at the contradiction 1 + ε ≤ nετ. This proves that (x∗ , s∗ ) is strictly complementary. ✷ If a target sequence satisfies the condition in Theorem III.3 for some τ ≥ 1, it is clear that the ratios between the coordinates of the vectors wk are bounded. In fact, 1 wk ≤ ik ≤ τ τ wj for all k and for all i and j. For target sequences on the central path these ratios are all equal to one, so the limits of the ratios exist if k goes to infinity. In general we are interested in target sequences for which the limits of these ratios exist when k goes to infinity. Since the ratios between the coordinates do not change if wk is multiplied by a positive constant, this happens if and only if there exists a positive vector w∗ such that nwk lim T k = w∗ , (9.14) k→∞ e w III.9 Preliminaries 229 and then the limiting values of the ratios are given by the ratios between the coordinates of w∗ . Note that we have eT w∗ = n, because the sum of the coordinates of each vector nwk /eT wk is equal to n. Also note that if a target sequence satisfies (9.14), we may find a τ ≥ 1 such that δc (wk ) ≤ τ for each k. In fact, we may take τ = max i,j,k wik . wjk Hence, by Theorem III.3, any accumulation pair (x∗ , s∗ ) for such a sequence is strictly complementary. Our next result shows that if (9.14) holds then the limiting pair (x∗ , s∗ ) is unique and can be characterized as a weighted-analytic center of the optimal sets of (P ) and (D). Let us first define this notion. Definition III.4 (Weighted-analytic center) Let the nonempty and bounded set T be the intersection of an affine space in IRp with the nonnegative orthant of IRp . We define the support σ(T ) of T as the subset of the full index set {1, 2, . . . , p} given by σ(T ) = {i : ∃x ∈ T such that xi > 0} . If w is any positive vector in IRp then the corresponding weighted-analytic center of T is defined as the zero vector if σ(T ) is empty, otherwise it is the vector in T that maximizes the product Y i x∈T. (9.15) xw i , i∈σ(T ) If the support of T is not empty then the convexity of T implies the existence of a vector x ∈ T such that xσ(T ) > 0. Moreover, if σ(T ) is not empty then the maximum value of the product (9.15) exists since T is bounded. Since the product (9.15) is strictly concave, the maximum value is attained at a unique point of T . The above definition generalizes the notion of analytic center, as defined by Definition I.29 and it uniquely defines the weighted-analytic center (for any positive weighting vector w) for any bounded subset that is the intersection of an affine space in IRp with the nonnegative orthant of IRn . 8 Below we apply this notion to the optimal sets of (P ) and (D). If a target sequence satisfies (9.14) then the next result states that the sequence of its primal-dual pairs converges to the pair of weighted-analytic centers of the optimal sets of (P ) and (D). Theorem III.5 Let the target sequence (9.12) be such that (9.14) holds for some w∗ , and let (x∗ , s∗ ) be an accumulation point of the sequence (9.13). Then x∗ is the weighted-analytic center of P ∗ with respect to w∗ , and s∗ is the weighted-analytic center of D∗ with respect to w∗ . Proof: We have already established that the limiting pair (x∗ , s∗ ) is strictly complementary, from Theorem III.3. 
As a consequence, the support of the optimal set P ∗ 8 Exercise 66 Let w be any positive vector in IRp and let the bounded set T be the intersection of an affine space in IRp with the nonnegative orthant of IRp . Show that the weighted-analytic center (with w as weighting vector) of T coincides with the analytic center of T if and only if w is a scalar multiple of the all-one vector. 230 III Target-following Approach of (P ) is equal to the support σ(x∗ ) of x∗ , and the support of the optimal set D∗ of (D) is equal to the support σ(s∗ ) of s∗ . Now let x̄ be optimal for (P ) and s̄ for (D). Applying the orthogonality property to the pairs (x̄, s̄) and (xk , sk ) := ΦP D (wk ) we obtain (xk − x̄)T (sk − s̄) = 0. Expanding the product and rearranging terms, we get T T T T (x̄) sk + (s̄) xk = sk xk + (s̄) x̄. Since sk T xk = eT wk and (x̄)T s̄ = 0, we get X X s̄j xkj = eT wk , x̄j skj + k = 1, 2, . . . . j∈σ(s∗ ) j∈σ(x∗ ) Here we have also used that σ(x̄) ⊂ σ(x∗ ) and σ(s̄) ⊂ σ(s∗ ). Using xk sk = wk we have X wjk s̄j X wjk x̄j + = eT wk , k = 1, 2, . . . . k k x s j j ∗ ∗ j∈σ(s ) j∈σ(x ) Multiplying both sides by n/eT wk we get X j∈σ(x∗ ) X nwjk s̄j nwjk x̄j + = n, eT wk xkj eT wk skj ∗ k = 1, 2, . . . . j∈σ(s ) Letting k → ∞, it follows that X j∈σ(x∗ ) X wj∗ s̄j wj∗ x̄j + = n. x∗j s∗j ∗ j∈σ(s ) At this stage we apply the geometric inequality,9 which states that for any two positive vectors α and β in IRn , !Pn βi Pn   n β j=1 i Y αj j=1 αi . ≤ Pn βj j=1 βi j=1 We apply this inequality with β = w∗ and αj = wj∗ x̄j x∗j (j ∈ σ(x∗ )), αj = wj∗ s̄j s∗j (j ∈ σ(s∗ )). Thus we obtain, using that the sum of the weights wj∗ equals n, n !wj∗ !wj∗   X wj∗ x̄j X wj∗ s̄j Y Y s̄j x̄j 1  = 1. ≤  ∗ ∗ ∗ + ∗ x s n x s j j j j ∗ ∗ ∗ ∗ j∈σ(x ) 9 j∈σ(s ) j∈σ(x ) j∈σ(s ) When β is the all-one vector e, the geometric inequality reduces to the arithmetic-geometric-mean inequality. For a proof of the geometric inequality we refer to Hardy, Littlewood and Pólya [139]. III.9 Preliminaries 231 Substituting s̄ = s∗ in the above inequality we get Y x̄j j ≤ Y s̄j j ≤ j∈σ(x∗ ) w∗ Y x∗j wj , Y s∗j wj . ∗ j∈σ(x∗ ) and substituting x̄ = x∗ gives j∈σ(s∗ ) w∗ ∗ j∈σ(s∗ ) This shows that x∗ maximizes the product Y xj j Y sj j w∗ i∈σ(x∗ ) and s∗ the product w∗ i∈σ(s∗ ) over the optimal sets of (P ) and (D) respectively. Hence the proof is complete. 9.4 ✷ The target-following scheme We are ready to describe more formally the main idea of the target-following approach. Assume we are given some positive primal-dual feasible pair (x0 , s0 ). Put w0 := x0 s0 and assume that we have a sequence w0 , w1 , . . . , wk , . . . , wK (9.16) of points wk in the w-space with the following property: Given the primal-dual pair for wk , with 0 ≤ k < K, it is ‘easy’ to compute the primal-dual pair for wk+1 . We call such a sequence a traceable target sequence of length K. If a traceable sequence of length K is available, then we can solve the given problem pair (P ) and (D), up to the precision level eT wK , in K iterations. The k-th iteration in this method would consist of the computation of the primal-dual targetpair corresponding to the target point wk . Conceptually, the algorithm is described as follows (page 232). Some remarks are in order. Firstly, in practice the primal-dual pair (x(w̄), s(w̄)) corresponding to an intermediate target w̄ is not computed exactly. Instead we compute it approximately, but so that the approximating pair is close to w̄ in the sense of a suitable proximity measure. 
Conceptual Target-following Algorithm

Input:
  A positive primal-dual pair (x0, s0);
  a final target vector w̃.

begin
  w := x0 s0;
  while w is not 'close' to w̃ do
  begin
    choose an 'intermediate' target w̄;
    compute x(w̄) and s(w̄);
    w := x(w̄) s(w̄);
  end
end

Secondly, the target sequence is not necessarily prescribed beforehand. It may be generated in the course of the algorithm. Both cases occurred in Chapter 7. For example, the primal-dual logarithmic barrier algorithm with full Newton steps in Section 7.5 uses intermediate targets of the form w = µe, and each subsequent target is given by (1 − θ)w, with θ fixed. The same is true for the primal-dual logarithmic barrier algorithm with large updates in Section 7.8. In contrast, the primal-dual logarithmic barrier algorithm with adaptive updates (cf. Section 7.6.1) defines its target points adaptively.

Thirdly, if we say that the primal-dual pair corresponding to a given target can be computed 'easily', we mean that we have an efficient numerical procedure for finding this primal-dual pair, at least approximately. The numerical method is always Newton's method, either for solving the KKT system defining the primal-dual pair, or for finding the minimizer of a suitable barrier function. When full Newton steps are taken, the target must be close to where we are, and one step must yield a sufficiently accurate approximation of the primal-dual pair for this target. In the literature, methods of this type are usually called short-step methods when the target sequence is prescribed, and adaptive-step methods if the target sequence is defined adaptively. We call them full-step methods. If subsequent targets are at a greater distance we are forced to use damped Newton steps. The number of Newton steps necessary to reach the next target (at least approximately) may then become larger than one. To achieve polynomiality we need to guarantee that this number can be bounded either by a constant or by some suitable function of n, e.g., O(√n) or O(n). We refer to such methods as multistep methods. They appear in the literature as medium-step methods and large-step methods.

In general, a primal-dual target-following algorithm is based on some finite underlying target sequence w0, w1, . . . , wK = w̃. The final target w̃ is a vector with small duality gap eT w̃ if we are optimizing, but other final targets are allowable as well; examples of both types of target sequence are given in Chapter 11 below. The general structure is as follows.

Generic (Primal-Dual) Target-following Algorithm

Input:
  A positive primal-dual pair (x0, s0) such that x0 s0 = w0;
  a final target vector w̃.

begin
  x := x0; s := s0; w := w0;
  while w is not 'close' to w̃ do
  begin
    replace w by the next target in the sequence;
    while xs is not 'close' to w do
    begin
      apply Newton steps at (x, s) with w as target
    end
  end
end

For each target in the sequence the next target can be prescribed (in advance), but it can also be defined adaptively. If it is close to the present target then a single (full) Newton step may suffice to reach the next target, otherwise we apply a multistep method, using damped Newton steps. The target-following approach is more general than the standard central-path-following schemes that appear in the literature.
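As a small, self-contained illustration of the generic scheme just stated, the following sketch traces the target sequence wk+1 = (1 − θ)wk of Section 11.2 by full Newton steps on a randomly generated standard-form problem. The problem data, the helper names and the tolerances are illustrative and not taken from the book; the inner Newton direction is written out in the explicit form given in Exercise 67 of Chapter 10 for system (10.2), so this is a sketch of the idea rather than a production implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny standard-form pair (P), (D) with a known strictly feasible point:
# choose x0, y0, s0 first and define b, c so that they are feasible (hypothetical data).
m, n = 3, 6
A = rng.standard_normal((m, n))            # full row rank with probability one
x, y, s = np.ones(n), np.zeros(m), np.ones(n)
b, c = A @ x, A.T @ y + s                  # then Ax = b, A^T y + s = c, x > 0, s > 0

def delta(xs, w):
    # proximity measure (10.4): ||(w - xs)/sqrt(xs)|| / (2 sqrt(min w))
    return np.linalg.norm((w - xs) / np.sqrt(xs)) / (2.0 * np.sqrt(w.min()))

def newton_direction(x, y, s, w):
    # primal-dual Newton direction for target w, i.e. the solution of system (10.2),
    # computed via the positive definite matrix A X S^{-1} A^T (cf. Exercise 67)
    M = A @ np.diag(x / s) @ A.T
    dy = np.linalg.solve(M, b - A @ (w / s))
    ds = -A.T @ dy
    dx = w / s - x - (x / s) * ds
    return dx, dy, ds

# Generic target-following loop: targets w^{k+1} = (1 - theta) w^k,
# each traced approximately by full Newton steps.
w = x * s                                  # w^0 = x^0 s^0, here the all-one vector
theta, eps = 1.0 / np.sqrt(n), 1e-6
while w.sum() > eps:                       # e^T w is the duality gap at the target
    w = (1.0 - theta) * w                  # next intermediate target
    while delta(x * s, w) > 1e-8:          # bring the pair 'close' to the target
        dx, dy, ds = newton_direction(x, y, s, w)
        x, y, s = x + dx, y + dy, s + ds   # full step; feasible while delta <= 1

print("final duality gap x^T s:", x @ s)
```

Because the starting pair lies on the central path, each target update with θ = 1/√n keeps the proximity below 1/√2 (for n ≥ 4), so the full Newton steps in the inner loop remain feasible and converge quadratically; this is exactly the situation analyzed in Chapter 10 below.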
The vast majority of these standard central-path-following schemes use target sequences on the central path.10 We show below, in Chapter 11, that many classical results in the literature can be put in the target-following scheme and that this scheme often dramatically simplifies the analysis. First, we derive the necessary numerical tools in the next chapter. This amounts to generalizing results obtained before in Part II for the case where the target is on the central path to the case where it is off the central path. We first analyze the full primal-dual Newton step method and the damped primal-dual Newton step method for computing the primal-dual pair corresponding to a given target vector. To this end we introduce a proximity measure, and we show that the full Newton step method is quadratically convergent. For the damped Newton method we show that a single step reduces the primal-dual barrier function by at least a constant, provided that the proximity measure is bounded below by a constant. We then have the basic ingredients for the analysis of primal-dual target-following methods. The results of the next chapter are used in Chapter 11 for the analysis of several interesting algorithms. There we restrict ourselves to full Newton step methods because they give the best complexity results. Later we show that the target-following concept is also useful when dealing with dual or primal methods. We also show that the primal-dual pair belonging to a target vector can be efficiently computed by such methods. This is the subject of Chapters 12 and 13.

10 There are so many papers on the subject that it is impossible to give an exhaustive list. We mention a few of them. Short-step methods along the central path can be found in Renegar [237], Gonzaga [118], Roos and Vial [245], Monteiro and Adler [218] and Kojima et al. [178]. We also refer the reader to the excellent survey of Gonzaga [124]. The concept of target-following methods was introduced by Jansen et al. [159]. Closely related methods, using so-called α-sequences, were considered before by Mizuno for the linear complementarity problem in [212] and [214]. The first results on multistep methods were those of Gonzaga [121, 122] and Roos and Vial [244]. We also mention den Hertog, Roos and Vial [146] and Mizuno, Todd and Ye [217]. The target-following scheme was applied first to multistep methods by Jansen et al. [158].

10 The Primal-Dual Newton Method

10.1 Introduction

Suppose that a positive primal-dual feasible pair (x, s) is given as well as some target vector w > 0. Our aim is to find the primal-dual pair (x(w), s(w)). Recall that to the dual feasible slack vector s belongs a unique y such that A^T y + s = c. The vector in the y-space corresponding to s(w) is denoted by y(w). In this chapter we define search directions ∆x, ∆y, ∆s at the given pair (x, s) that are aimed to bring us closer to the target pair (x(w), s(w)) corresponding to w. The search directions are obtained by applying Newton's method to the weighted KKT system (9.2), page 220. The approach closely resembles the treatment in Chapter 7. There the target was on the central path, but now the target may be any positive vector w. It will become clear that the results of Chapter 7 can be generalized almost straightforwardly to the present case. To avoid tiresome repetitions we omit detailed arguments when they are similar to arguments used in Chapter 7.
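Before the Newton step is derived in the next section, it may help to fix what 'finding (x(w), s(w))' means computationally. A small utility along the following lines (a sketch with illustrative names, not code from the book) measures how far a candidate triple (x, y, s) is from satisfying the weighted KKT system (9.2), that is, Ax = b, A^T y + s = c and xs = w with x, s > 0:

```python
import numpy as np

def weighted_kkt_residuals(A, b, c, x, y, s, w):
    """Residual norms of the weighted KKT system (9.2):
    Ax = b, A^T y + s = c, xs = w (positivity of x and s is checked separately)."""
    r_primal = A @ x - b           # primal feasibility
    r_dual   = A.T @ y + s - c     # dual feasibility
    r_target = x * s - w           # complementarity shifted to the target w
    return np.linalg.norm(r_primal), np.linalg.norm(r_dual), np.linalg.norm(r_target)
```

The pair (x, s) is the image of w under the target map precisely when all three residuals vanish and x, s are positive; the Newton step defined next drives the third residual to zero while keeping the first two equal to zero.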
10.2 Definition of the primal-dual Newton step We want the iterates x + ∆x, y + ∆y, s + ∆s to satisfy the weighted KKT system (9.2) with respect to the target w. So we want ∆x, ∆y and ∆s to satisfy A(x + ∆x) A (y + ∆y) + s + ∆s = = b, c, (x + ∆x)(s + ∆s) = w. T x + ∆x ≥ 0, s + ∆s ≥ 0, Neglecting the inequality constraints, we can rewrite this as follows: A∆x A ∆y + ∆s = = 0, 0, s∆x + x∆s + ∆x∆s = w − xs. T (10.1) Newton’s method amounts to linearizing this system by neglecting the second-order term ∆x∆s in the third equation. Thus we obtain the linear system A∆x A ∆y + ∆s = = 0, 0, s∆x + x∆s = w − xs. T (10.2) 236 III Target-following Approach Comparing this system with (7.2), page 150, in Chapter 7, we see that the only difference occurs in the third equation, where the target vector w replaces the target µe on the central path. In particular, both systems have the same matrix. Since this matrix is nonsingular (cf. Theorem II.42, page 150, and Exercise 46, page 151), system (10.2) determines the displacements ∆x, ∆y and ∆s uniquely. We call them the primaldual Newton directions at (x, s) corresponding to the target w.1,2,3 It may be worth pointing out that computation of the displacements ∆x, ∆y and ∆s amounts to solving a positive definite system with the matrix AXS −1 AT , just like when the target is on the central path. 10.3 Feasibility of the primal-dual Newton step In this section we investigate the feasibility of the (full) Newton step. As before, the result of the Newton step at (x, y, s) is denoted by (x+ , y + , s+ ), so x+ = x + ∆x, y + = y + ∆y, s+ = s + ∆s. Since the new iterates satisfy the affine equations we only have to deal with the question of whether x+ and s+ are nonnegative or not. We have x+ s+ = (x + ∆x)(s + ∆s) = xs + (s∆x + x∆s) + ∆x∆s. Since s∆x + x∆s = w − xs this leads to x+ s+ = w + ∆x∆s. (10.3) Hence, x+ and s+ are feasible only if w + ∆x∆s is nonnegative. The converse is also true. This is the content of the next lemma. Lemma III.6 The primal-dual Newton step at (x, s) to the target w is feasible if and only if w + ∆x∆s ≥ 0. Proof: The proof uses exactly the same arguments as the proof of Lemma II.46; we simply need to replace the vector µe by w. We leave it to the reader to verify this. ✷ Note that Newton’s method is exact when the second-order term ∆x∆s vanishes. In that case we have x+ s+ = w. This means that the pair (x+ , s+ ) is the image of w under the target map, whence x+ = x(w) and s+ = s(w). In general ∆x∆s will not be zero and Newton’s method will not be exact. However, the duality gap always assumes the correct value eT w after the Newton step. 1 Exercise 67 Prove that the system (10.2) has a unique solution, namely ∆y 2 3 = AXS −1 AT −1 b − AW s−1  ∆s = −AT ∆y ∆x = ws−1 − x − xs−1 ∆s. Exercise 68 When w = 0 in (10.2), the resulting directions coincide with the primal-dual affinescaling directions introduced in Section 7.6.2. Verify this. Exercise 69 When w = µe and µ = xT s/n in (10.2), the resulting directions coincide with the primal-dual centering directions introduced in Section 7.6.2. Verify this. III.10 Primal-Dual Newton Method 237 T Lemma III.7 If the primal-dual Newton step is feasible then (x+ ) s+ = eT w. Proof: This is immediate from (10.3) because the vectors ∆x and ∆s are orthogonal. ✷ In the following sections we further analyze the primal-dual Newton method. This requires a quantity for measuring the progress of the Newton iterates on the way to the pair ΦP D (w). As may be expected, two cases could occur. 
In the first case the present pair (x, s) is ‘close’ to ΦP D (w) and full Newton steps are feasible. In that case the full Newton step method is (hopefully, and locally) quadratically convergent. In the second case the present pair (x, s) is ‘far’ from ΦP D (w) and the full Newton step may not be feasible. Then we are forced to take damped Newton steps and we may expect no more than a linear convergence rate. In both cases we need a new quantity for measuring the proximity of the current iterate to the target vector w. The next section deals with the first case and the second case is considered in Section 10.5. It will be no surprise that we use the weighted primal-dual barrier function φw (x, s) in Section 10.5 to measure progress of the method. 10.4 Proximity and local quadratic convergence Recall from (7.16), page 156, that in the analysis of the central-path-following primaldual method we measured the distance of the pair (x, s) to the target µe by the quantity r r µe xs 1 − . δ(x, s; µ) = 2 xs µe This can be rewritten as 1 µe − xs √ . δ(x, s; µ) = √ 2 µ xs Note that the right-hand side measures, in some way, the distance in the w-space between the inverse image µe of the pair of µ-centers (x(µe), s(µe)) and the primaldual pair (x, s).4 For a general target vector w we adapt this measure to 1 w − xs √ δ(xs, w) := p . xs 2 min (w) (10.4) The quantity on the right measures the distance from the coordinatewise product xs to w. It is defined for (ordered) pairs of vectors in the w-space. Therefore, and because it will be more convenient in the future, we express this feature by using the notation 4 This observation makes clear that the proximity measure δ(x, s; µ) ignores the actual data of the problems (P ) and (D), which is contained in A, b and c. Since the behavior of Newton’s method does depend on these data, it follows that the effect of a (full) Newton step on the proximity measure depends on the data of the problem. This reveals the weakness of the analysis of the full-step method (cf. Chapter 6.7). It ignores the actual data of the problem and only provides a worst-case analysis. In contrast, with adaptive updates (cf. Chapter 6.8) the data of the problem are taken into account and, as a result, the performance of the method is improved. 238 III Target-following Approach δ(xs, w) instead of the alternative notation δ(x, s; w). We prove in this section that the Newton method is quadratically convergent in terms of this proximity measure.5 As before we use scaling vectors d and u. The definition of u needs to be adapted to the new situation: r r x xs , u := . (10.5) d := s w Note that xs = w if and only if u = e. We also introduce a vector v according to √ v = xs. With d we can rescale both x and s to the vector v:6 d−1 x = v, ds = v. Rescaling ∆x and ∆s similarly: d−1 ∆x =: dx , d∆s =: ds , (10.6) we see that ∆x∆s = dx ds . Consequently, the orthogonality of ∆x and ∆s implies that the scaled displacements dx and ds are orthogonal as well. Now we may reduce the left-hand side in the third equation of the KKT system as follows: s∆x + x∆s = sdd−1 ∆x + xd−1 d∆s = v (dx + ds ) , so the third equation can be restated simply as dx + ds = v −1 (w − xs) . 5 Exercise 70 The definition (10.4) of the primal-dual proximity measure δ = δ(xs, w) implies that 2δ(xs, w) ≥ w − xs √ √ w xs Using this and Lemma II.62, prove 6 = q w − xs q xs w . q xi s i 1 ≤ ≤ ρ(δ), 1 ≤ i ≤ n. ρ(δ) wi Here we deviate from the approach in Chapter 7. 
The natural generalization of the approach there would be to rescale x and s to u: d−1 x = u, √ w ds √ =: u, w and then rescale ∆x and ∆s accordingly to d−1 ∆x =: dx , √ w d∆s √ =: ds . w But then we have ∆x∆s = wdx ds and we lose the orthogonality of dx and ds with respect to the standard inner product. This could be resolved by changing the inner product in such a way that orthogonality is preserved. We leave it as an (interesting) exercise to the reader to work this out. Here the difficulty is circumvented by using a different scaling. III.10 Primal-Dual Newton Method 239 On the other hand, the first and second equations can be reformulated as ADdx = 0 and (AD)T dy + ds = 0, where dy = ∆y. We conclude that the scaled displacements dx , dy and ds satisfy ADdx = (AD)T dy + ds = dx + ds = 0 0 v −1 (w − xs) . (10.7) Using the same arguments as in Chapter 7 we conclude that dx and ds form the components of v −1 (w − xs) in the null space and the row space of AD, respectively. Note that w − xs represents the move we want to make in the w-space. Therefore we denote it as ∆w. It is also convenient to use a scaled version dw of ∆w, namely dw := v −1 (w − xs) = v −1 ∆w. (10.8) dx + ds = dw (10.9) Then we have and, since dx and ds are orthogonal, 2 2 2 kdx k + kds k = kdw k . (10.10) This makes clear that the scaled displacements dx , ds (and also dy ) are zero if and only if dw = 0. In that case x, y and s coincide with their values at w. An immediate consequence of the definition (10.4) of the proximity δ(xs, w) is kdw k . δ(xs, w) = p 2 min (w) (10.11) The next lemma contains upper bounds for the 2-norm and the infinity norm of the second-order term dx ds . Lemma III.8 We have kdx ds k∞ ≤ 1 4 2 kdw k and kdx ds k ≤ 1 √ 2 2 2 kdw k . Proof: The lemma follows immediately by applying the first uv-lemma (Lemma C.4) to the vectors dx and ds . ✷ Lemma III.9 The Newton step is feasible if δ(xs, w) ≤ 1. Proof: Lemma III.6 guarantees feasibility of the Newton step if w + ∆x∆s ≥ 0. Since ∆x∆s = dx ds this certainly holds if the infinity norm of the quotient dx ds /w does not exceed 1. Using Lemma III.8 and (10.11) we may write dx ds w This implies the lemma. ∞ ≤ 2 kdx ds k∞ kdw k ≤ = δ(xs, w)2 ≤ 1. min (w) 4 min (w) ✷ We are ready for the main result of this section, which is a perfect analogue of Theorem II.50, where the target is on the central path. 240 III Target-following Approach Theorem III.10 If δ := δ(xs; w) ≤ 1, then the primal-dual Newton step is feasible and (x+ )T s+ = eT w. Moreover, if δ < 1 then δ2 . δ(x+ s+ , w) ≤ p 2(1 − δ 2 ) Proof: The first part of the theorem is a restatement of Lemma III.9 and Lemma III.7. We proceed with the proof of the second statement. By definition, δ(x+ s+ , w)2 = w − x+ s+ 1 √ 4 min (w) x+ s+ 2 . Recall from (10.3) that x+ s+ = w + ∆x∆s = w + dx ds . Using also Lemma III.8 and (10.11), we write   1 2 min x+ s+ ≥ min (w) − kdx ds k∞ ≥ min (w) − kdw k = min (w) 1 − δ 2 . 4 Thus we find, by substitution, 2 δ(x+ s+ , w)2 ≤ 2 kw − x+ s+ k kdx ds k = . 2 2 4 (1 − δ ) min (w) 4 (1 − δ 2 ) min (w)2 Finally, using the upper bound for kdx ds k in Lemma III.8 and also using (10.11) once more, we obtain 4 δ(x+ s+ , w)2 ≤ kdw k δ4 = . 32 (1 − δ 2 ) min (w)2 2 (1 − δ 2 ) This implies the theorem.7 ✷ It is clear that the above result has value only if the given pair (x, s) is close enough to the target vector w. It guarantees quadratic convergence p to the target √ if δ(xs, w) ≤ 1/ 2. Convergence is guaranteed only if δ(xs, w) ≤ 2/3. 
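The quantities driving this analysis are straightforward to compute. The sketch below (illustrative helper names; ψ(t) = t − log(1 + t) is assumed, which is consistent with the expansion (9.4)) collects the proximity measure δ(xs, w) of (10.4), the centrality measure δc(w) of (9.6), the weighted barrier φw of (9.3), and the quadratic-convergence bound of Theorem III.10.

```python
import numpy as np

def delta(xs, w):
    # proximity measure (10.4): ||(w - xs)/sqrt(xs)|| / (2 sqrt(min w))
    return np.linalg.norm((w - xs) / np.sqrt(xs)) / (2.0 * np.sqrt(w.min()))

def delta_c(w):
    # measure (9.6) of the distance from w to the central path of the w-space
    return w.max() / w.min()

def phi_w(x, s, w):
    # weighted barrier (9.3)/(9.4), assuming psi(t) = t - log(1 + t)
    t = x * s / w - 1.0
    return float(np.sum(w * (t - np.log1p(t))) / w.max())

def quadratic_bound(d):
    # Theorem III.10: after a feasible full Newton step the new proximity
    # is at most d^2 / sqrt(2 (1 - d^2)), provided d < 1
    return d**2 / np.sqrt(2.0 * (1.0 - d**2))
```

For example, quadratic_bound(1/np.sqrt(2)) evaluates to 0.5, strictly below 1/√2, which is the contraction behind the quadratic-convergence threshold quoted above.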
For larger values of δ(xs, w) we need a different analysis. Then we measure progress of the iterates in terms of the barrier function φw (x, s) and we use damped Newton steps. This is the subject of the next section. 10.5 The damped primal-dual Newton method As before, we are given a positive primal-dual pair (x, s) and a target vector w > 0. Let x+ and s+ result from a damped Newton step of size α at (x, s). In this section 7 Recall from Lemma C.6 in Section 7.4.1 that we have the better estimate δ(x+ s+ ; w) ≤ p δ2 2(1 − δ4 ) if the target w is on the central path. We were not able to get the same result if w is off the central path. We leave this as a topic for further research. III.10 Primal-Dual Newton Method 241 we analyze the effect of a damped Newton step — at (x, s) and for the target w — on the value of the barrier function φw (x, s) (as defined on page 221). We have x+ = x + α∆x, s+ = s + α∆s, where α denotes the step-size, and 0 ≤ α ≤ 1. Using the scaled displacements dx and ds as defined in (10.6), we may also write s+ = d−1 (v + αds ) , x+ = d (v + αdx ) , where v = √ xs. As a consequence, x+ s+ = (v + αdx ) (v + αds ) = v 2 + αv (dx + ds ) + α2 dx ds . Since v (dx + ds ) = w − xs = w − v 2 , we obtain Now, defining  x+ s+ = v 2 + α w − v 2 + α2 dx ds . v + := we have v+ and v+ 2 2 (10.12) √ x+ s+ , = (v + αdx ) (v + αds ) (10.13)  − v 2 = α w − v 2 + α2 dx ds . (10.14) The next theorem provides a lower bound for the decrease of the barrier function value during a damped Newton step. The bound coincides with the result of Lemma II.72 if w is on the central path and becomes worse if the ‘distance’ from w to the central path increases. Theorem III.11 Let δ = δ(xs, w) and let α = 1/ω − 1/(ω + 4δ 2 /δc (w)), where8 s s 2 2 2 2 ∆s ds ∆x dx + = + . ω := x s v v Then the pair (x+ , s+ ) resulting from the damped Newton step of size α is feasible. Moreover,   2δ + + . φw (x, s) − φw (x , s ) ≥ ψ δc (w)ρ(δ) Proof: It will √ be convenient to express max (w) φw (x, s), given by (9.4), page 221, in terms of v = xs. We obviously have max (w) φw (x, s) = eT v 2 − 8 n X j=1 wj log vj2 − eT w + Exercise 71 Verify that ∆x dx = , x v ∆s ds = . s v n X j=1 wj log wj . 242 III Target-following Approach Hence we have the following expression for max (w) φw (x+ , s+ ): max (w) φw (x+ , s+ ) = eT v + 2 − n X wj log vj+ j=1 2 − eT w + n X wj log wj . j=1 With ∆ := φw (x, s) − φw (x+ , s+ ), subtracting both expressions yields 2 n  vj+   X 2 + 2 wj log + . max (w) ∆ = e v − v vj2 j=1 T Substitution of (10.13) and (10.14) gives    X  n n (dx )j (ds )j  X wj log 1 + α max (w) ∆ = −αeT w − v 2 + wj log 1 + α + . vj vj j=1 j=1 Here we took advantage of the orthogonality of dx and ds in omitting the term containing eT dx ds . The definition of ψ implies     (dx )j (dx )j (dx )j =α −ψ α log 1 + α vj vj vj and a similar result for the terms containing entries of ds . Substituting this we obtain      wds wdx T T 2 T + αe max (w) ∆ = −αe w − v + αe v v      n X (dx )j (ds )j wj ψ α +ψ α . − vj vj j=1 2 The contribution of the terms on the left of the sum can be reduced to α kdw k . This follows because   w (dx + ds ) w − v 2 dw wdw vd2 − w − v2 + = −vdw + = = w = d2w . v v v v It can easily be understood that the sum attains its maximal value if all the coordinates of the concatenation of the vectors αdx /v and αds /v are zero except one, and the nonzero coordinate, for which wj must be maximal, is equal to minus the norm of this concatenated vector. 
The norm of the concatenation of αdx /v and αds /v being αω, we arrive at max (w) ∆ ≥ = 2 α kdw k − max (w) ψ (−αω) 4αδ 2 min (w) − max (w) ψ (−αω) . This can be rewritten as ∆≥ 4αδ 2 4αδ 2 − ψ (−αω) = + αω + log (1 − αω) . δc (w) δc (w) (10.15) III.10 Primal-Dual Newton Method 243 The derivative of the right-hand side expression with respect to α is ω 4δ 2 +ω− , δc (w) 1 − αω and it vanishes only for the value of α specified in the lemma. As in the proof of Lemma II.72 (page 201) we conclude that the specified value of α maximizes the lower bound for ∆ in (10.15), and, as a consequence, the damped Newton step of the specified size is feasible. Substitution in (10.15) yields, after some elementary reductions, the following bound for ∆:     4δ 2 4δ 2 4δ 2 ∆≥ =ψ . − log 1 + ωδc (w) ωδc (w) ωδc (w) In this bound we may replace ω by a larger value, since ψ(t) is monotonically increasing for t nonnegative. An upper bound for ω can be obtained as follows: s p 2 2 2δ min (w) dx kdw k ds = . + ≤ ω= v v min (v) min (v) Let the index k be such that min (v) = vk . Then we may write p p r √ 2δ wk 2δ min (w) 2δ min (w) wk ≤ = 2δ = 2δu−1 = k , min (v) vk vk xk sk where u denotes the vector defined in (10.5). The coordinates of u can be bounded nicely by using the function ρ(δ) defined in Lemma II.62 (page 182). This can be achieved by reducing δ = δ(xs, w), as given in (10.4), in the following way: r  r r w w xs 1 −1 1 1 w − xs √ ≥ − = u −u . δ= p 2 min (w) xs w 2 xs 2 min (w) Hence we have u−1 − u ≤ 2δ. Applying Lemma II.62 it follows that the coordinates of u and u−1 are bounded above by ρ(δ) (cf. Exercise 70, page 238). Hence we may conclude that ω ≤ 2δρ(δ). Substitution of this bound in the last lower bound for ∆ yields   2δ ∆≥ψ , δc (w)ρ(δ) completing the proof. ✷ √ The damped Newton method will be used only if δ = δ(xs, w) ≥ 1/ 2, because for smaller values √ of δ full Newton steps give quadratic convergence to the target. For δ = δ(xs, w) ≥ 1/ 2 we have √ √ 2δ 2 2 √ = 3 − 1 = 0.73205, q ≥ = ρ(δ) 1+ 3 √1 + 1 + 12 2 244 III Target-following Approach so outside the region of quadratic convergence around the target w, a damped Newton step reduces the barrier function value by at least   0.73205 . ψ δc (w) 0.2 0.16 0.12 0.08 ψ 0.04  0.73205 δc (w)  ☛ 0 0 1 2 3 4 5 6 7 8 9 10 ✲ δc (w) Figure 10.1 Lower bound for the decrease in φw during a damped Newton step. The graph in Figure 10.1 depicts this function for 1 ≤ δc (w) ≤ 10. Remark III.12 The above analysis is based on the barrier function φw (x, s) defined in (9.3). We showed in (9.4) and (9.5) that, up to a constant factor max (w), the variable part in this function is given by the weighted primal-dual logarithmic barrier function xT s − n X wj log xj sj . j=1 In this function the weights occur in the barrier term. We want to point out that there exists an alternative way to analyze the damped Newton method by using a barrier function for which the weights occur in the objective term. Consider φ̄w (x, s) := e T  xs w  −e − n X j=1 X x j sj = ψ log wj n j=1   x j sj −1 wj =Ψ  xs w  −e . (10.16) Clearly φ̄w (x, s) is defined for all positive primal-dual pairs (x, s). Moreover, φw (x, s) ≥ 0 and the equality holds if and only if xs = w. Hence, the solution of the weighted KKT system (9.2) is characterized by the fact that it satisfies the equation φ̄w (x, s) = 0. The variable part of φ̄w (x, s) is given by n xs X eT − log xj sj , w j=1 III.10 Primal-Dual Newton Method 245 which has the weights in the objective term. 
It has recently been shown by de Klerk, Roos and Terlaky [172] that this function can equally well serve in the analysis of the damped Newton method. In fact, Theorem III.11 remains true if φw is replaced by φ̄w . This might be surprising because, whereas φw is strictly convex on its domain, φ̄w is not convex unless w is on the central path.9 • 9 Exercise 72 Let (x, s) be any positive primal-dual pair. Show that φ̄w (x, s) ≤ φw (x, s). 11 Applications 11.1 Introduction In this Chapter we present some examples of traceable target sequences. The examples are chosen to cover the most prominent primal-dual methods and results in the literature. We restrict ourselves to sequences that can be traced by full Newton steps.1 To keep the presentation simple, we make a further assumption, namely that Newton’s method is exact in its region of quadratic convergence. In other words, we assume that the algorithm generates exact primal-dual pairs for the respective targets in the target sequence. In a practical algorithm the generated primal-dual pairs will never exactly match their respective targets. However, our assumption does not change the order of magnitude for the obtained iteration bounds. In fact, at the cost of a little more involved analysis we can obtain the same iteration bounds for a practical algorithm, except for a small constant factor. This can be understood from the following theorem, where we assume that we are given a ‘good’ approximation for the primal-dual pair (x(w), s(w)) corresponding to the target w and we consider the effect of an update of the target to w̄. We make clear that δ(xs, w̄) ≈ δ(w, w̄) if δ(xs, w) is small. Thus, we assume that the proximity δ = δ(xs, w) is small. Recall that the quadratic √ convergence property of Newton’s method justifies this assumption. If δ ≤ 1/ 2 then in no more than 6 full Newton steps we are sure that a primal-dual pair (x, s) is obtained for which δ(xs, w) ≤ 10−10 . Thus, if K denotes the length of the target sequence, 6K additional Newton steps are sufficient to work with ‘exact’ primal-dual pairs, at least from a computational point of view. Theorem III.13 Let the primal-dual pair (x, s) and the target w be such that δ = δ(xs, w). Then, for any other target vector w̄ we have δ(xs, w̄) ≤ 1 s min (w) δ + ρ(δ) δ(w, w̄). min (w̄) The motivation for this choice is that full Newton steps give the best iteration bounds. The results in the previous chapter for the damped Newton step provide the ingredients for the analysis of target-following methods using the multistep strategy. Target sequences for multistep methods were treated extensively by Jansen in [151]. See also Jansen et al. [158]. 248 III Target-following Approach Proof: We may write 1 1 xs − w̄ xs − w + w − w̄ √ √ = p . δ(xs, w̄) = p xs xs 2 min (w̄) 2 min (w̄) Using the triangle inequality we get This implies 1 1 xs − w w − w̄ √ √ + p . δ(xs, w̄) ≤ p xs xs 2 min (w̄) 2 min (w̄) δ(xs, w̄) ≤ s min (w) 1 δ(xs, w) + p min (w̄) 2 min (w̄) r w w − w̄ √ . xs w From the result of Exercise 70 on page 238, this can be reduced to s min (w) δ + ρ(δ)δ(w, w̄), δ(xs, w̄) ≤ min (w̄) completing the proof. ✷ In the extreme case where δ(xs, w) = 0, we have xs = w and hence δ(xs, w̄) = δ(w, w̄). In that case the bound in the lemma is sharp, since δ = 0 and ρ(0) = 1. If δ is small, then the first term in the bound for δ(xs, w̄) will be small compared to the second term. This follows by noting that the square root can be bounded by s r min (w) wk ≤ ρ (δ(w, w̄)) . 
(11.1) ≤ min (w̄) w̄k Here the index k is such that min (w̄) = w̄k .2 Since ρ(δ) ≈ 1 if δ ≈ 0, we conclude that δ(xs, w̄) ≈ δ(w, w̄) if δ is small. 11.2 Central-path-following method Central-path-following methods were investigated extensively in Part II. The aim of this section is twofold. It provides a first (and easy) illustration of the use of the targetfollowing approach, and it yields one of the main results of Part II in a relatively cheap way. The target points have the form w = µe, µ > 0. When at the target w, we let the next target point be given by w̄ = (1 − θ)w, 0 < θ < 1. 2 When combining the bounds in Theorem III.13 and (11.1) one gets the bound δ(xs, w̄) ≤ ρ (δ(w, w̄)) δ(xs, w) + ρ (δ(xs, w)) δ(w, w̄), which has a nice symmetry, but which is weaker than the bound of Theorem III.13. III.11 Applications 249 Then some straightforward calculations yield δ(w, w̄): √ √ θ µe w̄ − w θ n 1 √ . = p = √ δ(w, w̄) = p w 2 1−θ 2 min (w̄) 2 (1 − θ)µ Assuming that n ≥ 4 we find that 1 δ(w, w̄) ≤ √ 2 if 1 θ=√ . n Hence, by Lemma I.36, a full Newton step method needs √ nµ0 n log ε iterations3 to generate an ε-solution when starting at w0 = µ0 e. 11.3 Weighted-path-following method With a little extra effort, we can also analyze the case where the target sequence lies on the half line w = µw0 , µ > 0, for some fixed positive vector w0 . This half line is a socalled weighted path in the w-space. The primal-dual pairs on it converge to weightedanalytic centers of the optimal sets of (P ) and (D), due to Theorem III.5. Note that when using a target sequence of this type we can start the algorithm everywhere in the w-space. However, as we shall see, not using the central path diminishes the efficiency of the algorithm. Letting the next target point be given by w̄ = (1 − θ)w, 0 < θ < 1, (11.2) we obtain √ r kθ wk w̄ − w θ w 1 √ = p . = √ δ(w, w̄) = p w min (w) 2 1−θ 2 min (w̄) 2 (1 − θ) min (w) Using δc (w), as defined in (9.6), page 222, which measures the proximity of w to the central path, we may write s r w max (w) p = nδc (w). ≤ kek min (w) min (w) Thus we obtain 3 p θ nδc (w) δ(w, w̄) ≤ √ . 2 1−θ Formally, we should round the iteration bound to the smallest integer exceeding it. For simplicity we omit the corresponding rounding operator in the iteration bounds in this chapter; this is common practice in the literature. 250 III Target-following Approach Assuming n ≥ 4 again, we find that 1 δ(w, w̄) ≤ √ 2 if 1 θ= p . nδc (w) Hence, when starting at w0 , we are sure that the duality gap is smaller than ε after at most p eT w 0 nδc (w0 ) log (11.3) ε iterations. Here we used the obvious identity δc (w0 ) = δc (w). Comparing this result with a factor p the iteration bound of the previous section we observe that we introduce 4 0 δc (w ) > 1 into the iteration bound by not using the central path. The above result indicates that in some sense the central path is the best path to follow to the optimal set. When starting further from the central path the iteration bound becomes worse. This result gives us evidence of the very special status of the central path among all possible weighted paths to the optimal set. 11.4 Centering method If we are given a primal-dual pair (x0 , s0 ) such that w0 = x0 s0 is not on the central path, then instead of following the weighted path through w0 to the origin, we can use an alternative strategy. The idea is first to move to the central path and then follow the central path to the origin. We know already how to follow the central path. 
But the other problem, moving from some point w0 in the w-space to the central path, is new. This problem has become known as the centering problem.5,6 The centering problem can be solved by using a target sequence starting at w0 and ending on the central path. We shall propose a target sequence that converges in √ n log δc (w0 ) (11.4) iterations.7 The iteration bound (11.4) can be obtained as follows. Let w̄ be obtained from some point w outside the central path by replacing each entry wi such that wi < (1 + θ) min (w) 4 Primal-dual weighted-path-following methods were first proposed and discussed by Megiddo [200]. Later they were also analyzed by Ding and Li [67]. A primal version was studied by Roos and den Hertog [241]. 5 The centering approach presented here was proposed independently by den Hertog [140] and Mizuno [212]. Exercise 73 The centering problem includes the problem of finding the analytic center of a polytope. Why? Note that the quantity δc (w 0 ) appears under a logarithm. This is very important from the viewpoint of complexity analysis. If the weights were initially determined from a primal-dual feasible pair (x0 , s0 ), we can say that δc (w 0 ) has the same input length as the two points. It is reasonable to assume that this input length is at most equivalent to the input length of the data of the problem, but there is no real reason to state that it is strictly smaller. Since an algorithm is claimed to be polynomial only when the bound on the number of iterations is a function of the logarithm of the length of the input data, it is better to have the quantity δc (w 0 ) under the logarithm. 6 7 III.11 Applications 251 by (1 + θ) min (w), where θ is some positive constant satisfying 1 + θ ≤ δc (w). It then follows that δc (w) . δc (w̄) = 1+θ Using that 0 ≤ w̄i − wi ≤ θ min (w) we write 1 1 w̄ − w θ min (w) e √ √ δ(w, w̄) = p ≤ p . w w 2 min (w̄) 2 (1 + θ) min (w) This implies θ δ(w, w̄) ≤ p 2 (1 + θ) so we have p √ √ min (w) e θ n θ θ n √ ≤ ≤ p , kek = √ w 2 2 1+θ 2 (1 + θ) 1 δ(w, w̄) ≤ √ 2 if √ 2 θ=√ . n At each iteration, δc (w) decreases by the factor 1 + θ. Thus, when starting at w0 , we certainly have reached the central path if the iteration number k satisfies (1 + θ)k ≥ δc (w0 ). Substituting the value of θ and then taking logarithms, we obtain √ ! 2 ≥ log δc (w0 ). k log 1 + √ n If n ≥ 3, this inequality is satisfied if8 k √ ≥ log δc (w0 ). n Thus we find that no more than √ n log δc (w0 ) (11.5) iterations bring the iterate onto the central path. This proves the iteration bound (11.4) for the centering problem. The above-described target sequence ends at the point max (w) e on the central path. From there on we can follow the central path as described in Section 11.2 and we reach an ε-solution after a total of   √ n max (w0 ) n log δc (w0 ) + log (11.6) ε 8 If n ≥ 3 then we have log  √ 2 1+ √ n  1 ≥ √ . n 252 III Target-following Approach iterations. Note that this bound for a strategy that first centralizes and then optimizes is better than the one we obtained for the more direct strategy (11.2) of following a sequence along the weighted path. In fact the bound (11.6) is the best one known until now when the starting point lies away from the central path. Remark III.14 The above centering strategy pushes the small coordinates of w0 upward to max (w0 ). We can also consider the more obvious strategy of moving the large coordinates of w0 downward to min (w0 ). Following a similar analysis we obtain δ(w, w̄) ≤ θ Hence, 1 δ(w, w̄) ≤ √ 2 if p nδc (w0 ) . 2 √ 2 . 
θ= p nδc (w0 ) As a consequence, in the resulting iteration bound, which is proportional to 1/θ, the quantity δc (w0 ) does not appear under the logarithm. This makes clear that we get a slightly worse result than (11.5) in this case.9 • 11.5 Weighted-centering method The converse of the centering problem consists in finding a primal-dual pair (x, s) such that the ratios between the coordinates of xs are prescribed, when a point on the central path is given. If w1 is a positive vector whose coordinates have the prescribed weights, then we want to find feasible x and s such that xs = λw1 for some positive λ. In fact, the aim is not to solve this √ problem exactly; it is enough if we find a primaldual pair such that δ(xs, λw1 ) ≤ 1/ 2 for some positive λ. This problem is known as the weighted-centering problem.10 Let the primal-dual pair be given for the point w0 = µe on the central path, with µ > 0. We first rescale the given vector w1 by a positive scalar factor in such a way 9 Exercise 74 Another strategy for reaching the central path from a given vector w 0 can be defined as follows. When at w, we define w̄ as follows. As long as max(w) > (1+θ) min(w) do the following: w̄i =    min(w) + θ min(w), max(w) − θ min(w), wi , if wi < (1 + θ) min(w), if wi ≥ (1 + θ) min(w) and wi > max(w) − θ min(w), otherwise. Analyze this strategy and show that the iteration bound is the same as (11.5), but when the central path is reached the duality gap is (in general) smaller, yielding a slight improvement of (11.6). 10 The treatment of the weighted-centering problem presented here was first proposed by Mizuno [214]. It closely resembles our approach to the centering problem. See also Jansen et al. [159, 158] and Jansen [151]. A special case of the weighted-centering problem was considered by Atkinson and Vaidya [29] and later also by Freund [85] and Goffin and Vial [102]. Their objective was to find the weighted-analytic center of a polytope. Our approach generates the weighted-analytic center of the primal polytope P if we take c = 0, and the weighted-analytic center of the dual polytope D if we take b = 0. The approach of Atkinson and Vaidya was put into the target-following framework by Jansen et al. [158]. See also Jansen [151]. The last two references use two nested traceable target sequences. The result is a significantly simpler analysis as well as a better iteration bound than Atkinson and Vaidya’s bound. III.11 Applications 253 that max (w1 ) = µ, and we construct a traceable target sequence from w0 to w1 . When we put w := w0 , the coordinates of w corresponding to the largest coordinates of w1 have their correct value. We gradually decrease the other coordinates of w to their correct value by using a similar technique as in the previous section. Let w̄ be obtained from w by redefining each entry wi according to  w̄i := max wi1 , wi − (1 − θ) min (w) , where θ is some positive constant smaller than one. Note that w̄i can never become smaller than wi1 and if it has reached this value then it remains constant in subsequent target vectors. Hence, this process leaves the ‘correct’ coordinates of w — those have the larger values — invariant, and it decreases the other coordinates by (1−θ) min (w), or less if undershooting should occur. Thus, we have min (w̄) ≥ (1 − θ) min (w), with equality, except possibly for the last point in the target sequence, and 0 ≤ wi − w̄i ≤ θ min (w). To make the sequence traceable, θ cannot be taken too large. 
Using the last two inequalities we write 1 w̄ − w θ min (w) e 1 √ √ ≤ p . δ(w, w̄) = p w w 2 min (w̄) 2 (1 − θ) min (w) This gives θ δ(w, w̄) ≤ p 2 (1 − θ) p √ min (w) e θ θ n √ . ≤ p kek = √ w 2 1−θ 2 (1 − θ) As before, assuming n ≥ 4 we get 1 δ(w, w̄) ≤ √ 2 if 1 θ=√ . n Before the final iteration, which puts all entries of w at their correct values, each iteration increases δc (w) by the factor 1/ (1 − θ). We certainly have reached w1 if the iteration number k satisfies 1 ≥ δc (w1 ). (1 − θ)k Taking logarithms, this inequality becomes −k log (1 − θ) ≥ log δc (w1 ) and this certainly holds if kθ ≥ log δc (w1 ), √ since θ ≤ − log (1 − θ). Substitution of θ = 1/ n yields that no more than √ n log δc (w1 ) iterations bring the iterate to w1 . 254 11.6 III Target-following Approach Centering and optimizing together In Section 11.4 we discussed a two-phase strategy for the case where the initial primaldual feasible pair (x0 , s0 ) is not on the central path. The first phase is devoted to centralizing and the second phase to optimizing. Although this strategy achieves the best possible iteration bound obtained so far, it is worth considering an alternative strategy that combines the two phases at the same time. Let w0 := x0 s0 and consider the function f : IR+ → IRn+ defined by f (θ) := w0 , θ ≥ 0. e + θw0 (11.7) The image of f defines a path in the w-space starting at f (0) = w0 and converging to the origin when θ goes to infinity. See Figure 11.1. w2 ✻ central path ❘ w0 ■ Dikin-path ✲ w1 Figure 11.1 A Dikin-path in the w-space (n = 2). We refer to this path as the Dikin-path in the w-space starting at w0 .11 It may easily be checked that if w1 lies on the Dikin-path starting at w0 , then the Dikin-path 11 Dikin, well known for his primal affine-scaling method for LO, did not consider primal-dual methods. Nevertheless, the discovery of this path in the w-space has been inspired by his work. Therefore, we gave his name to it. The relation with Dikin’s work is as follows. The direction of the tangent to the Dikin-path is obtained by differentiating f (θ) with respect to θ. This yields −(w 0 )2 df (θ) = = −f (θ)2 . dθ (e + θw 0 )2 This implies that the Dikin-path is a trajectory of the vector field −w 2 in the w-space. Without going further into it we refer the reader to Jansen, Roos and Terlaky [156] where this field was used to obtain the primal-dual analogue of the so-called primal affine-scaling direction of Dikin [63]. This is precisely the direction used in the Dikin Step Algorithm, in Appendix E. III.11 Applications 255 starting at w1 is just the continuation of the path starting at w0 .12 Asymptotically, the Dikin-path becomes tangent to the central path, because for very large values of θ we have e f (θ) ≈ . θ We can easily establish that along the path the proximity to the central path is improving. This goes as follows. Let w := f (θ). Then, using that f preserves the ordering of the coordinates,13 we may write δc (w) = max (w 0 ) 1+θ max (w 0 ) 0 min (w ) 1+θ min (w 0 ) = δc (w0 ) 1 + θ min (w0 ) ≤ δc (w0 ). 1 + θ max (w0 ) (11.8) The last inequality is strict if δc (w0 ) > 1. Also, the duality gap is decreasing. This follows because w0 eT w 0 eT w := eT ≤ < eT w 0 . 0 e + θw 1 + θ min (w0 ) Consequently, the Dikin-path achieves the two goals that were assigned to it. It centralizes and optimizes at the same time. Let us now try to devise a traceable target sequence along the Dikin-path. Suppose that w is a point on this path. 
Without loss of generality we may assume that w = f (0) = w0 . Let w̄ := f (θ) for some positive θ. Then we have 1 w̄ − w 1 √ = p δ(w, w̄) = p w 2 min (w̄) 2 min (w̄) w e+θw −w √ , w which can be simplified to 3 θw 2 1 . δ(w, w̄) = p 2 min (w̄) e + θw Using that f preserves the ordering of the coordinates we further deduce p p 3 1 + θ min (w) max (w) θw 2 θw √ p δ(w, w̄) = , ≤ p e + θw e + θw 2 min (w) 2 min (w) which gives p δc (w) θw √ δ(w, w̄) ≤ . 2 e + θw Finally, since e + θw > e, we get δ(w, w̄) ≤ 1 p θ δc (w) kwk . 2 12 Exercise 75 Show that if w 1 lies on the Dikin-path starting at w 0 , then the Dikin-path starting at w 1 is just the continuation of the path starting at w 0 . 13 0 and w := f (θ), with f (θ) as defined in (11.7). Prove that Exercise 76 Let w10 ≤ w20 ≤ . . . ≤ wn for each positive θ we have w1 ≤ w2 ≤ . . . ≤ wn . 256 III Target-following Approach So we have 1 δ(w, w̄) ≤ √ 2 if √ 2 p θ= . kwk δc (w) We established above that the duality gap is reduced by at least the factor 1+θ min (w). Replacing θ by its value defined above, we have √ √ √ 2 min (w) 2 min (w) 2 p p p = 1+ . 1 + θ min (w) = 1 + ≥ 1+ kwk δc (w) max (w) nδc (w) δc (w) nδc (w) Using δc (w) < δc (w0 ), we deduce in the usual way that after p eT w 0 δc (w0 ) nδc (w0 ) log ε (11.9) iterations the duality gap is smaller than ε. For large values of δc (w0 ) this bound is significantly worse than the bounds obtained in the previous sections when starting off the central path. It is even worse — by a factor δc (w0 ) — than the bound for the weighted-path-following method in Section 11.3. The reason for this weak result is that in the final step, just before (11.9), we replaced δc (w) by δc (w0 ). Thus we did not fully explore the centralizing effect of the Dikin-path, which implies that in the final iterations δc (w) tends to 1. To improve the bound we shall look at the process in a different way. Instead of directly estimating the number of target moves until a suitable duality gap is achieved, we shall concentrate on the number of steps that are required to get close to the central path, a state that can be measured for instance by δc (w) < 2. Using (11.8) and substituting the value of θ, we obtained p √ kwk δc (w) + min (w) 2 1 + θ min (w) p = δc (w) δc (w̄) = δc (w) √ . 1 + θ max (w) kwk δc (w) + max (w) 2 This can be written as δc (w̄) = δc (w) ! √ 2 (max (w) − min (w)) p . 1− √ kwk δc (w) + max (w) 2 √ Using that kwk ≤ max (w) n and max (w) = δc (w) min (w) we obtain   √ 2 (δ (w) − 1) p c δc (w̄) ≤ δc (w) 1 − √ . δc (w) nδc (w) + 2 Now assuming n ≥ 6 and δc (w) ≥ 2 we get √ 2 (δ (w) − 1) 1 p c . √  ≥ p 2 nδc (w) δc (w) nδc (w) + 2 This can be verified by elementary means. As a consequence, under these assumptions, ! 1 . δc (w̄) ≤ δc (w) 1 − p 2 nδc (w) III.11 Applications 257 Hence, using that δc (w) ≤ δc (w0 ), after k iterations we have δc (w̄) ≤ 1 1− p 2 nδc (w0 ) !k δc (w0 ). By the usual arguments, it follows that δc (w̄) ≤ 2 after at most 2 p δc (w0 ) nδc (w0 ) log 2 iterations. The proximity to the central path is then at most 2. Now from (11.9) it follows that the number of iterations needed to reach an ε-solution does not exceed √ eT w 0 . 2 2n log ε By adding the two numbers, we obtain the iteration bound   p √ √ δc (w0 ) eT w 0 0 n 2 2 log . + 2 δc (w ) log ε 2 Note that this bound is better than the previous bound (11.9) and also better than the bound (11.3) for following the weighted central path. But it is still worse than the bound (11.6) for the two-phase strategy. 
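The target moves along the Dikin-path can be mimicked numerically as follows (a sketch with arbitrary data; the instance and code are not from the text). Each move uses the step size θ = √2/(∥w∥ √δc(w)) derived above; the primal-dual proximity of each move stays below 1/√2 while δc(w) drifts toward 1 and the gap decreases.

import numpy as np

def prox(w_old, w_new):
    # delta(w_old, w_new) = ||(w_new - w_old)/sqrt(w_old)|| / sqrt(2*min(w_new))
    return np.linalg.norm((w_new - w_old) / np.sqrt(w_old)) / np.sqrt(2.0 * np.min(w_new))

def dikin_target_move(w):
    # One target move along the Dikin-path starting at w, with the step size
    # theta = sqrt(2) / (||w|| * sqrt(delta_c(w))) used in the analysis above.
    delta_c = np.max(w) / np.min(w)
    theta = np.sqrt(2.0) / (np.linalg.norm(w) * np.sqrt(delta_c))
    return w / (1.0 + theta * w)

w = np.array([3.0, 0.5, 1.0])
for _ in range(5):
    w_new = dikin_target_move(w)
    print(f"delta(w,w_bar)={prox(w, w_new):.3f}  gap={w_new.sum():.4f}  "
          f"delta_c={w_new.max()/w_new.min():.3f}")
    w = w_new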
11.7 Adaptive and large target-update methods The complexity bounds derived in the previous sections are based on a worst-case analysis of full Newton step methods. Each target step is chosen to be short enough so that, in any possible instance, proximity will remain under control. Moreover, the target step is not at all influenced by the particular primal-dual feasible pair. As a consequence, for an implementation of a full-step target-following method the required running time may give rise to some disappointment. It then becomes tempting to take larger target-updates. An obvious improvement would be to relate the target move to the primal-dual feasible pair and to make the move as large as possible while keeping proximity to the primal-dual feasible pair under control; in that case a full Newton step still yields a new primal-dual feasible pair closer to the target and the process may be repeated. This enhancement of the full-step strategy into the so-called adaptive step or maximal step strategy does not improve the overall theoretical complexity bound, but it has a dramatic effect on the efficiency, especially on the asymptotic convergence rate.14 Despite this nice asymptotic result, the steps in the adaptive-step method may in general be too short to produce a really efficient method. In practical applications it is often wise to work with larger target-updates. One obvious shortcoming of a large 14 In a recent paper [125], Gonzaga showed that the maximal step method — with some additional safeguard steps — is asymptotically quadratically convergent; i.e., in the final iterations the duality gap converges to zero quadratically. Gonzaga also showed that the iterates converge to the analytic centers of the optimal sets of (P ) and (D). 258 III Target-following Approach target-update is that the full Newton step may cause infeasibility. To overcome this difficulty one must use a damped Newton step. The progress is then measured by the primal-dual barrier logarithmic function φw (x, s) analyzed in Section 10.5. Using the results of that section, iteration bounds for the damped Newton method can be derived for large-update versions of the target sequences dealt with in this chapter. In accordance with the results in Chapter 7 for the logarithmic √ barrier central-pathfollowing method, the iteration bounds are always a factor n worse than those for the full-step methods. We feel that it goes beyond the aim of this chapter to give a detailed report of the results obtained in this direction. We refer the reader to the references mentioned in the course of this chapter.15 15 In this connection it may be useful to mention again the book of Jansen [151], which contains a thorough treatment of the target-following approach. Jansen also deals with methods using large target-updates. He provides some additional examples of traceable target sequences that can be used to simplify drastically the analysis of existing methods, such as the cone-affine-scaling method of Sturm and Zhang [260] and the shifted barrier method of Freund [84]. These results can also be found in Jansen et al. [158]. 12 The Dual Newton Method 12.1 Introduction The results in the previous sections have made clear that the image of a given target vector w > 0 under the target map ΦP D (w) can be computed provided that we are given some positive primal-dual pair (x, s). If the given pair (x, s) is such that xs is close to w, Newton’s method can be applied to the weighted KKT system (9.2). 
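To fix ideas, here is a minimal sketch of one such Newton step toward a target w, assuming the standard linearization of the weighted KKT system, that is A∆x = 0, Aᵀ∆y + ∆s = 0, s∆x + x∆s = w − xs, together with the proximity measure recalled below; the instance, the helper names and the code are illustrative assumptions, not taken from the text.

import numpy as np

def prox(x, s, w):
    # delta(xs, w) = ||(w - xs)/sqrt(xs)|| / sqrt(2*min(w))
    return np.linalg.norm((w - x * s) / np.sqrt(x * s)) / np.sqrt(2.0 * np.min(w))

def pd_newton_step(A, x, y, s, w):
    # Full primal-dual Newton step toward the target w:
    #   A dx = 0,  A^T dy + ds = 0,  s dx + x ds = w - xs.
    rhs = (w - x * s) / s
    d2 = x / s
    dy = -np.linalg.solve(A @ np.diag(d2) @ A.T, A @ rhs)   # normal equations
    ds = -A.T @ dy
    dx = rhs - d2 * ds
    return x + dx, y + dy, s + ds

# tiny feasible instance: A x = b with x > 0, and s = c - A^T y > 0
A = np.array([[1.0, 1.0, 1.0]])
x, y, s = np.array([1.0, 2.0, 1.0]), np.zeros(1), np.array([1.0, 2.0, 3.0])
w = np.full(3, 2.0)                                         # target vector
for _ in range(3):
    print(f"delta(xs, w) = {prox(x, s, w):.2e}")
    x, y, s = pd_newton_step(A, x, y, s, w)
print(f"delta(xs, w) = {prox(x, s, w):.2e}")

Since ∆x lies in the null space of A and ∆s = −Aᵀ∆y, both feasibility conditions are preserved exactly; only the complementarity equation xs = w is linearized.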
Starting at (x, s) this method generates a sequence of primal-dual pairs converging to ΦP D (w). The distance from the pair (x, s) to w is measured by the proximity measure δ(xs, w) in (10.4): 1 w − xs √ δ(xs, w) := p . xs 2 min (w) √ If δ(xs, w) ≤ 1/ 2 then the primal-dual method converges quadratically to ΦP D (w). For larger values of δ(xs, w) we could realize a linear convergence rate by using damped Newton steps of appropriate size. The sketched approach is called primal-dual because it uses search steps in both the x-space and the s-space at each iteration. The aim of this chapter and the next is to show that the same goal can be realized by moving only in the primal space or the dual space. Assuming that we are given a positive primal feasible solution x, a primal method moves in the primal space until it reaches x(w). Similarly, a dual method starts at some given dual feasible solution (y, s) with s > 0, and moves in the dual space until it reaches (y(w), s(w)). We deal with dual methods in the next sections, and consider primal methods in the next chapter. In both cases the search direction is obtained by applying Newton’s method to a suitable weighted logarithmic barrier function. The general framework of a dual target-following algorithm is described on page 260. The underlying target sequence starts at w0 and ends at w̃. 12.2 The weighted dual barrier function The search direction in a dual method is obtained by applying Newton’s method to the weighted dual logarithmic barrier function φdw (y), given by φdw (y) −1 := min (w) T b y+ n X i=1 wi log si ! , (12.1) 260 III Target-following Approach Generic Dual Target-following Algorithm Input:   A dual feasible pair (y 0 , s0 ) such that y 0 = y w0 ; s0 = s w0 ; a final target vector w̃. begin y := y 0 ; s = s0 ; w := w0 ; while w is not ‘close’ to w̃ do begin replace w by the next target in the sequence; while (y, s) is not ‘close’ to (y(w), s(w)) do begin apply Newton steps at (y, s) to the target w end end end with s = c − AT y. In this section we prove that φdw (y) attains its minimal value at y(w). In the next section it turns out that φdw (y) is strictly convex. The first property can easily be derived from the primal-dual logarithmic barrier function φw used in Section 10.5. With w fixed, we consider φw at the pair (x(w), s). Starting from (9.4), page 221, and using x(w)T s = cT x(w) − bT y and x(w)s(w) = w we write max (w) φw (x(w), s) = = = = T x(w) s − x(w)T s − n X j=1 n X j=1 T wj log xj (w)sj − e w + wj log sj − eT w + cT x(w) − bT y − n X j=1 n X n X wj log wj j=1 wj log sj (w) j=1 wj log sj − eT w + min (w) φdw (y) + cT x(w) − eT w + n X n X wj log sj (w) j=1 wj log sj (w). j=1 Since w is fixed, this shows that min (w) φdw (y) and max (w)φw (x(w), s) differ by a constant. Since φw (x(w), s) attains its minimal value at s(w), it follows that φdw (y) must attain its minimal value at y(w).1 1 Exercise 77 For each positive primal-dual pair (x, s), prove that φw (x, s) = φw (x, s(w)) + φw (x(w), s). III.12 Dual Newton Method 12.3 261 Definition of the dual Newton step d Let y be dual feasible and w > 0. We denote the gradient of φdw (y) at y by gw (y) and d the Hessian by Hw (y). These are d gw (y) := and  −1 b − AW s−1 min (w) Hwd (y) := 1 AW S −2 AT , min (w) as can be easily verified. Note that Hwd (y) is positive definite. It follows that φdw (y) is a strictly convex function. The Newton step at y is given by  −1 d (12.2) b − AW s−1 . 
∆y = −Hwd (y)−1 gw (y) = AW S −2 AT Since y(w) is the minimizer of φdw (y) we have ∆y = 0 if and only if y = y(w). We measure the proximity of y with respect to y(w) by a suitable norm of ∆y, namely the norm induced by the positive definite matrix Hwd (y): δ d (y, w) := k∆ykHwd (y) . We call this the Hessian norm of ∆y. We show below that it is an appropriate generalization of the proximity measure used in Section 6.5 (page 114) for the analysis of the dual logarithmic barrier approach. More precisely, we find that both measures coincide if w is on the central path. d Using the definition of the Hessian norm of ∆y = −Hwd (y)−1 gw (y) we may write q q d (y)T H d (y)−1 g d (y). δ d (y, w) = ∆y T Hwd (y)∆y = gw (12.3) w w Remark III.15 The dual proximity measure δ d (y, w) can be characterized in a different way as follows: where 1 min δ d (y, w) = p min (w) x n  d−1 x − w s  o : Ax = b , √ w . (12.4) s We want to explain this here, because later on, for the primal method this characterization provides a natural way of defining a primal proximity measure. Let x satisfy Ax = b. We do not require x to be nonnegative. Replacing b by Ax in the above expression (12.2) for ∆y and using d from (12.4), we obtain d := ∆y = AD2 AT This can be rewritten as ∆y = AD2 AT −1 −1  Ax − AW s−1 .  ADd−1 x − ws−1 = AD2 AT −1 AD sx − w √ . w 262 III Target-following Approach The corresponding displacement in the slack space is given by ∆s = −AT ∆y. This implies d∆s = − (AD)T AD2 AT −1 AD sx − w √ . w √ This makes clear that −d∆s is equal to the orthogonal projection of the vector (sx − w) / w into the row space of AD. Hence, we have d∆s = − where x(s, w) = argminx Lemma III.16 below implies The claim follows. 12.4 sx(s, w) − w √ , w  sx − w √ w : Ax = b  . 1 kd∆sk . δ d (y, w) = p min (w) • Feasibility of the dual Newton step Let y + result from the Newton step at y: y + := y + ∆y. If we define ∆s := −AT ∆y, the slack vector for y + is just s + ∆s, as easily follows. The Newton step is feasible if and only if s + ∆s ≥ 0. It is convenient to introduce the vector v according to r w . (12.5) v := min (w) Note that v ≥ e and v = e if and only if w is on the central path. Now we can prove the next lemma. From this lemma it becomes clear that δ d (y, w) coincides with the proximity measure δ(y, µ), defined in (6.7), page 114, if w = µe. Lemma III.16 δ d (y, w) = v∆s ∆s ∆s ≥ ≥ s s s . ∞ If δ d (y, w) ≤ 1 then y ∗ = y + ∆y is dual feasible. Proof: Using (12.3) and the above expression for Hwd (y), we write δ d (y, w)2 = ∆y T Hwd (y)∆y = 1 ∆y T AW S −2 AT ∆y. min (w) III.12 Dual Newton Method 263 Replacing AT ∆y by −∆s and also using the definition (12.5) of v, we get v∆s s δ d (y, w)2 = ∆sT V 2 S −2 ∆s = 2 . Thus we obtain δ d (y, w) = v∆s ∆s ∆s ≥ ≥ s s s . ∞ The first inequality follows because v ≥ e, and the second inequality is trivial. This proves the first part of the lemma. For the second part, assume δ d (y, w) ≤ 1. Then we derive from the last inequality in the first part of the lemma that |∆s| ≤ s, which implies s + ∆s ≥ 0. The lemma is proved. ✷ 12.5 Quadratic convergence The aim of this section is to generalize the quadratic convergence result of the dual Newton method in Theorem II.21, page 114, to the present case.2 Theorem III.17 δ d (y + , w) ≤ δ d (y, w)2 . Proof: By definition d d δ d (y + , w)2 = gw (y + )T Hwd (y + )−1 gw (y + ). d The main part of the proof consists of the calculation of Hwd (y + ) and gw (y + ). 
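As a numerical aside, the Newton step (12.2), the proximity measure of Lemma III.16 and the quadratic decrease claimed by Theorem III.17 can be observed on a tiny example; the instance and the code below are illustrative assumptions, not taken from the text.

import numpy as np

def dual_newton_step(A, b, s, w):
    # Dual Newton step (12.2):  dy = (A W S^-2 A^T)^(-1) (b - A W s^-1),
    # with ds = -A^T dy and proximity delta^d(y,w) = ||v*ds/s||, v = sqrt(w/min(w)).
    H = A @ np.diag(w / s**2) @ A.T
    dy = np.linalg.solve(H, b - A @ (w / s))
    ds = -A.T @ dy
    v = np.sqrt(w / np.min(w))
    return dy, ds, np.linalg.norm(v * ds / s)

A = np.array([[1.0, 1.0, 1.0]])
b = np.array([4.0])
c = np.array([1.0, 2.0, 3.0])
y, s = np.zeros(1), c.copy()              # dual feasible: s = c - A^T y > 0
w = np.full(3, 2.0)                       # target vector
for _ in range(4):
    dy, ds, delta = dual_newton_step(A, b, s, w)
    print(f"delta^d(y, w) = {delta:.2e}")
    y, s = y + dy, s + ds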
It is convenient to work with the matrix −1 B := AV (S + ∆S) Using B we write −2 Hwd (y + ) = AV 2 (S + ∆S) . AT = BB T . d Note that BB T is nonsingular because A has full row rank. For gw (y + ) we may write d gw (y + ) = =  −1 b − AW (s + ∆s)−1 min (w)    −1 e e b − AW s−1 + AW . − min (w) s s + ∆s d The first two terms form gw (y). Replacing W in the third term by min (w) V 2 , we obtain ∆s d d gw (y + ) = gw (y) − AV 2 . s (s + ∆s) 2 An alternative proof of Theorem III.17 can be given by generalizing the proof of Theorem II.21; this approach is followed in Jansen et al. [157] and also in the next chapter, where we deal with the analogous result for primal target-following methods. The proof given here seems to be new, and is more straightforward. 264 III Target-following Approach Since d gw (y) = −Hwd (y)∆y = −AV 2 S −2 AT ∆y = AV 2 S −2 ∆s we get d gw (y + ) = AV 2  ∆s ∆s − 2 s s (s + ∆s)  2 = AV (∆s) 2 s (s + ∆s) 2 ! . The definition of B enables us to rewrite this as  2 ∆s d + gw (y ) = BV . s d Substituting the derived expressions for Hwd (y + ) and gw (y + ) in the expression for d + 2 δ (y , w) we find  2 !T 2   ∆s ∆s d + 2 T −1 T δ (y , w) = V BV B BB . s s Since B T BB T −1 B is a projection matrix,3 this implies  2 !T 2 2   ∆s ∆s ∆s d + 2 δ (y , w) ≤ V V = V s s s whence δ d (y + , w) ≤ V  ∆s s 2 ≤ ∆s s ∞ , V ∆s . s Finally, using Lemma III.16, the theorem follows. 12.6 2 ✷ The damped dual Newton method In this section we consider a damped Newton step to a target vector w > 0 at an arbitrary dual feasible y with positive slack vector s = c − AT y. We use the damping factor α and move from y to y + = y+α∆y. The resulting slack vector is s+ = c−AT y + . Obviously s+ = s + α∆s, where ∆s = −AT ∆y. We prove the following generalization of Lemma II.38. Theorem III.18 Let δ = δ d (y, w). If α = 1/(δc (w)+ δ) then the damped Newton step of size α is feasible and   δ d d + φw (y) − φw (y ) ≥ δc (w) ψ . δc (w) 3 It may be worth mentioning here how the proof can be adapted to the case where A does not have full row rank. First, δd (y, w) can be redefined by replacing the inverse of the Hessian matrix d (y) in (12.3) by its generalized inverse. Then, in the proof of Theorem III.17 we may use the Hw generalized inverse of BB T instead of its inverse. We then also have that B T BB T + B is a projection matrix and hence we can proceed in the same way. III.12 Dual Newton Method 265 Proof: Defining ∆ := φdw (y) − φdw (y + ), we have −1 ∆= min (w) n X s+ wi log i b y−b y − si i=1 T T + ! , or equivalently, −1 ∆= min (w)  ! α∆si wi log 1 + −αb ∆y − . si i=1 T n X Using the definition of the function ψ, we can write this as  !  n X α∆si −1 α∆si T wi ∆= −αb ∆y − . −ψ min (w) si si i=1 Thus we obtain 1 ∆= min (w) n ∆s X αbT ∆y + αwT − wi ψ s i=1  α∆si si ! . The first two terms between the outer brackets can be reduced to α min (w) δ 2 . To this end we write bT ∆y + wT T ∆s d (y)T ∆y. = b − AW s−1 ∆y = − min (w) gw s d Since ∆y = −Hwd (y)−1 gw (y), we get bT ∆y + wT ∆s = min (w)δ 2 , s proving the claim. Using the same argument as in the proof of Theorem III.11, it can easily be understood that the sum between the brackets attains its maximal value if all the coordinates of the vector α∆s/s are zero except one, and the nonzero coordinate, for which wj must be maximal, is equal to minus the norm of this vector. Thus we obtain    ∆s 1 2 α min (w) δ − max (w) ψ −α ∆ ≥ min (w) s   ∆s = α δ 2 − δc (w) ψ −α . 
s Now also using Lemma III.16 and the monotonicity of ψ we obtain ∆ ≥ α δ 2 − δc (w) ψ (−αδ) = αδ 2 + δc (w) (αδ + log (1 − αδ)) . It is easily verified that the right-hand side expression is maximal if α = 1/(δc (w) + δ). Substitution of this value yields     δ δ = δ − δc (w) log 1 + . ∆ ≥ δ + δc (w) log 1 − δc (w) + δ δc (w) 266 III Target-following Approach This can be written as ∆ ≥ δc (w)      δ δ δ = δc (w) ψ , − log 1 + δc (w) δc (w) δc (w) completing the proof. 12.7 ✷ Dual target-updating When analysing a dual target-following method we need to quantify the effect of an update of the target on the proximity measure. We derive the dual analogue of Theorem III.13 in this section. We assume that (y, s) is dual feasible and δ = δ d (y, w) for some target vector w, and letting w∗ be any other target vector we derive an upper bound for δ d (y, w∗ ). We have the following result, in which δ (w∗ , w) measures the ‘distance’ from w∗ to w according to the primal-dual proximity measure introduced in (10.4): 1 w − w∗ √ δ(w∗ , w) := p . (12.6) w∗ 2 min (w) Theorem III.19 p  r min(w) w δ (y, w ) ≤ p w∗ min(w∗ ) ∗ d ∞  δ d (y, w) + 2δ (w∗ , w) . ∗ d Proof: By definition δ (y, w ) satisfies d δ d (y, w∗ ) = gw ∗ (y) d (y)−1 Hw ∗ = This implies δ d (y, w∗ ) =  −1 b − AW ∗ s−1 ∗ min (w ) 1 b − AW s−1 − A (W ∗ − W ) s−1 min (w∗ ) . d (y)−1 Hw ∗ d (y)−1 Hw ∗ . Using the triangle inequality we derive from this δ d (y, w∗ ) ≤ min (w) g d (y) min (w∗ ) w d (y)−1 Hw ∗ + 1 A (W ∗ − W ) s−1 min (w∗ ) d (y)−1 Hw ∗ . We have4 Hwd ∗ (y) =  = 4 1 1 W ∗ W −2 T AW ∗ S −2 AT = A S A ∗ ∗ min (w ) min (w ) W  ∗ w 1 AW S −2 AT min min (w∗ ) w  ∗ min (w) w Hwd (y). min min (w∗ ) w The meaning of the symbol ‘’ below is as follows. For any two square matrices P and Q we write P  Q (or P  Q) if the matrix P − Q is positive semidefinite. If this holds and Q is nonsingular then P must also be nonsingular and Q−1  P −1 . This property is used here. III.12 Dual Newton Method 267 Hence Hwd ∗ (y)−1  min (w∗ ) w min (w) w∗ ∞ Hwd (y)−1 . We use this inequality to estimate the first term in the above estimate for δ d (y, w∗ ): min (w) g d (y) min (w∗ ) w ≤ d (y)−1 Hw ∗ = s min (w) min(w∗ ) w g d (y) min (w∗ ) min(w) w∗ ∞ w s min (w) w δ d (y, w). min (w∗ ) w∗ ∞ d (y)−1 Hw For the second term it is convenient to use the positive vector v ∗ defined by s w∗ ∗ , v = min (w∗ ) and the matrix B defined by B = AS −1 . Then we have 2 Hwd ∗ (y) = B (V ∗ ) B T and A (W ∗ − W ) s−1 = B (w∗ − w) , so we may write A (W ∗ − W ) s−1 2 d (y)−1 Hw ∗ = = where −1  B (w − w∗ ) (B (w − w∗ ))T B (V ∗ )2 B T  T (V ∗ )−1 (w − w∗ ) H (V ∗ )−1 (w − w∗ ) , −1  BV ∗ . H = (BV ∗ )T B (V ∗ )2 B T Clearly, H = H 2 . Thus, H is a projection matrix, whence H  I. Therefore, A (W ∗ − W ) s−1 2 d (y)−1 Hw ∗ −1 ≤ (V ∗ ) (w − w∗ ) 2 = min (w∗ ) w − w∗ √ w∗ The last equality follows by using the definition of v ∗ . Thus we obtain 1 A (W ∗ − W ) s−1 min (w∗ ) d (y)−1 Hw ∗ 1 ≤ p min (w∗ ) Substituting the obtained bounds we arrive at s min (w) w 1 d ∗ δ (y, w ) ≤ δ d (y, w) + p min (w∗ ) w∗ ∞ min (w∗ ) w − w∗ √ . w∗ w − w∗ √ . w∗ 2 . 268 III Target-following Approach Finally, using the definition of the primal-dual proximity measure δ (w∗ , w), according to (10.4), we may write p 2δ (w∗ , w) min (w) 1 w − w∗ p √ p = , (12.7) w∗ min (w∗ ) min (w∗ ) and the theorem follows. ✷ In the special case where w∗ = (1 − θ)w the above result reduces to     d 1 θ kwk θ kwk 1 δ (y, w) d d ∗ √ = δ (y, w) + . 
+√ δ (y, w ) ≤ √ 1−θ min (w) 1−θ 1−θ 1 − θ min (w) Moreover, if w = µe, this gives δ d (y, w∗ ) ≤ √  1 δ d (y, w) + θ n . 1−θ 13 The Primal Newton Method 13.1 Introduction The aim of this chapter is to show that the idea of a target-following method can also be realized by moving only in the primal space. Starting at a given positive primal feasible solution x a primal method moves in the primal space until it reaches x(w) where w denotes an intermediate (positive) target vector. The search direction follows by applying Newton’s method to a weighted logarithmic barrier function. This function is introduced in the next section. Its minimizer is precisely x(w). Hence, by taking (full or damped) Newton steps with respect to this function we can (approximately) compute x(w). The general framework of a primal target-following algorithm is described below. Generic Primal Target-following Algorithm Input:  A primal feasible vector x0 such that x0 = x w0 ; a final target vector w̃. begin x := x0 ; w := w0 ; while w is not ‘close’ to w̃ do begin Replace w by the next target in the sequence; while x is not ‘close’ to x(w) do begin Apply Newton steps at x to the target w end end end The underlying target sequence starts at w0 and ends — via some intermediate target vectors — at w̃. 270 13.2 III Target-following Approach The weighted primal barrier function The search direction in a primal method is obtained by applying Newton’s method to the weighted primal barrier function given by   n X 1  cT x − wj log xj  . (13.1) φpw (x) := min (w) j=1 We first establish that φpw (x) attains its minimal value at x(w). This easily follows by using the barrier function φw in the same way as for the dual weighted barrier function. Starting from (9.4), on page 221, and using xT s(w) = cT x − bT y(w) and x(w)s(w) = w we write max (w) φw (x, s(w)) = = = = T x s(w) − xT s(w) − cT x − n X j=1 n X j=1 n X j=1 T wj log xj sj (w) − e w + wj log xj − eT w + n X n X wj log wj j=1 wj log xj (w) j=1 wj log xj − bT y(w) − eT w + min (w) φpw (x) − bT y(w) − eT w + n X n X wj log xj (w) j=1 wj log xj (w). j=1 This implies that x(w) is a unique minimizer of φpw (x). 13.3 Definition of the primal Newton step p (x) Let x be primal feasible and let w > 0. We denote the gradient of φpw (x) at x by gw p and the Hessian by Hw (x). These are  1 w p gw (x) := c− min (w) x and Hwp (x) := 1 W X −2 = V 2 X −2 , min (w) where V = diag (v), with v as defined in (12.5) in the previous chapter. Note that Hwp (x) is positive definite. It follows that φpw (x) is a strictly convex function. The calculation of the Newton step ∆x is a little complicated by the fact that we want x + ∆x to stay in the affine space Ax = b. This means that ∆x must satisfy A∆x = 0. The Newton step at x is then obtained by minimizing the second-order Taylor polynomial at x subject to this constraint. Thus, ∆x is the solution of   1 T p T p min ∆x gw (x) + ∆x Hw (x)∆x : A∆x = 0 . ∆x 2 III.13 Primal Newton Method 271 The optimality conditions for this minimization problem are p gw (x) + Hwp (x)∆x A∆x = = AT u 0, where the coordinates of u ∈ IRm are Lagrange multipliers. We introduce the scaling vector d according to x d := √ . w Observe that Hwp (x) = D−2 / min (w). The optimality conditions can be rewritten as  w −d−1 ∆x + min (w) (AD)T u = d c− x AD(d−1 ∆x) = 0, which shows that −d−1 ∆x is the orthogonal projection of d (c − w/x) into the null space of AD:     xc − w w  −1 √ =⇒ ∆x = −DPAD . (13.2) −d ∆x = PAD d c − x w √ Remark III.20 When w = µe we have d = x/ µ. 
Since AD and AX have the same null space, we have PAD = PAX . Therefore, in this case the Newton step is given by 1 ∆x = − √ XPAX µ  xc − µe √ µ  = −XPAX   Xc −e . µ This search direction is used in the so-called primal logarithmic barrier method, which is obtained by applying the results of this chapter to the case where the targets are on the central path. It is the natural analogue of the dual logarithmic barrier method treated in Chapter 6. • We introduce the following proximity measure to quantify the distance from x to x(w): o n  1 w δ p (x, w) = p : AT y + s = c . (13.3) min d s − x min (w) y,s This measure is inspired by the measure (6.8) for the dual logarithmic barrier method, introduced in Section 6.5.1 Let us denote by s(x, w) the minimizing s in (13.3). Lemma III.21 We have δ p (x, w) = 1 x s(x, w) − w v∆x √ =p . x w min(w) Proof: For the proof of the first equality we eliminate s in (13.3) and write o n o  n  w w − DAT y . : AT y + s = c = min d c− min d s − y y,s x x 1 Similar proximity measures were used in Roos and Vial [245], and Hertog and Roos [142] for primal methods, and in Mizuno [212, 214] and Jansen et al. [159] for primal-dual methods. 272 III Target-following Approach Let ȳ denote the solution of the last minimization problem. Then    w  w = DAT ȳ + PAD (d c − . d c− x x Thus we obtain From (13.2),    w  w d c− − DAT ȳ = PAD d c − . x x   w  = −d−1 ∆x. PAD d c − x Hence we get p 1 −1 δ (x, w) = p kd min (w) √ w∆x v∆x ∆xk = p = , x x min (w) 1 proving the first equality in the lemma. The second equality in the lemma follows from the definition of s(x, w).2 ✷ From the above proof and (13.2) we deduce that d−1 ∆x = − xs(x, w) − w √ . w (13.4) Also observe that the lemma implies that, just as in the dual case, the proximity measure is equal to the ‘Hessian–norm’ of the Newton step: δ p (x, w) = k∆xkHwp (x) . 13.4 Feasibility of the primal Newton step Let x+ result from the Newton step at x: x+ := x + ∆x. The Newton step is feasible if and only if x + ∆x ≥ 0. Now we can prove the next lemma. Lemma III.22 If δ p (x, w) ≤ 1 then x∗ = x + ∆x is primal feasible. Proof: From Lemma III.21 we derive δ p (x, w) = ∆x ∆x v∆x ≥ ≥ x x x . ∞ Hence, if δ p (x, w) < 1, then |∆x| ≤ x, which implies x + ∆x ≥ 0. The lemma follows. ✷ 2 Exercise 78 If δp (x, w) ≤ 1 then s(x, w) is dual feasible. Prove this. III.13 Primal Newton Method 13.5 273 Quadratic convergence We proceed by showing that the primal Newton method is quadratically convergent. Theorem III.23 δ p (x+ , w) ≤ δ p (x, w)2 . Proof: Using the definition of δ p (x+ , w) we may write δ p (x+ , w) = ≤ ≤ x+ s(x+ , w) − w √ w min(w) 1 x+ s(x, w) − w p √ w min(w) 1 + p 2 kx s(x, w) − wk. min(w) p 1 Denote s̄ := s(x, w). From (13.4) we obtain x s̄ − w x s̄(x s̄ − w) s̄∆x = s̄dd−1 ∆x = −ds̄ √ =− . w w This implies kx+ s̄ − wk = k(x + ∆x)s̄ − wk = xs̄ − w − (xs̄ − w)2 xs̄(xs̄ − w) = . w w Combining the above relations, we get δ p (x+ , w) ≤ p 1 min(w) 2  xs̄ − w √ w 2 ≤ xs̄ − w p √ w min(w) 1 This completes the proof. 13.6 !2 = δ p (x, w)2 . ✷ The damped primal Newton method In this section we consider a damped primal Newton step to a target vector w > 0 at an arbitrary positive primal feasible x. The damping factor is again denoted by α and we move from x to x+ = x + α∆x. After Theorem III.18 it will be no surprise that we have the following result. Theorem III.24 Let δ = δ p (x, w). If α = 1/(δc (w) + δ) then the damped Newton step of size α is feasible and   δ p p + . 
φw (x) − φw (x ) ≥ δc (w) ψ δc (w) 274 III Target-following Approach Proof: Defining ∆ := φpw (x) − φpw (x+ ), we have n X x+ wi log i c x−c x + xi i=1 1 ∆= min (w) T T + ! , or equivalently, 1 ∆= min (w) n X  α∆xi wi log 1 + −αc ∆x + xi i=1 T ! . Using the definition of the function ψ, this can be rewritten as 1 ∆= min (w) T −αc ∆x + n X wi i=1  α∆xi −ψ xi  α∆xi xi ! . Thus we obtain n ∆x X wi ψ −αc ∆x + αw − x i=1 1 ∆= min (w) T T  α∆xi xi ! . We reduce the first two terms between the outer brackets to α min (w) δ 2 : −cT ∆x + wT and from (13.2),  w T ∆x − c− x = = √ Since d = x/ w this implies  w T ∆x ∆x, =− c− x x     w  w T d c− PAD d c − x x   w  2 PAD d c − = d−1 ∆x x 2 .  w T ∆x = min (w) δ 2 , − c− x proving the claim. The sum between the brackets can be estimated in the same way as for the dual method. Thus we obtain    ∆x 1 α min (w) δ 2 − max (w) ψ −α ∆ ≥ min (w) x   ∆x = α δ 2 − δc (w) ψ −α , x yielding exactly the same lower bound for ∆ as in the dual case. Hence we can use the same arguments as we did there to complete the proof. ✷ III.13 Primal Newton Method 13.7 275 Primal target-updating We derive the primal analogue of Theorem III.19 in this section. We assume that x is primal feasible and δ = δ p (x, w) for some target vector w. For any other target vector w∗ we need to derive an upper bound for δ p (x, w∗ ). The result is completely similar to Theorem III.19, but the proof must be adapted to the primal context. Theorem III.25 p  r min(w) w p δ (x, w ) ≤ ∗ ∗ w min(w ) p ∗ p ∗  δ (x, w) + 2δ (w , w) . ∞ Proof: By Lemma III.21, x s(x, w∗ ) − w∗ 1 √ , δ p (x, w∗ ) = p w∗ min (w∗ ) where s(x, w∗ ) satisfies the affine dual constraint AT y +s = c and minimizes the above norm. Hence, since s(x, w) satisfies the affine dual constraint, replacing s(x, w∗ ) by s(x, w) we obtain δ p (x, w∗ ) ≤ = 1 x s(x, w) − w∗ p √ w∗ min (w∗ ) 1 x s(x, w) − w + w − w∗ p √ . w∗ min (w∗ ) Using the triangle inequality we derive from this x s(x, w) − w w − w∗ 1 1 √ p √ + . δ p (x, w∗ ) ≤ p w∗ w∗ min (w∗ ) min (w∗ ) The second term can be reduced by using (12.7) and then the theorem follows if the first term on the right satisfies s min (w) w x s(x, w) − w 1 p √ ≤ δ p (x, w). (13.5) ∗ ∗ min (w∗ ) w∗ ∞ w min (w ) This inequality can be obtained by writing x s(x, w) − w 1 p √ ∗ w∗ min (w ) = ≤ = Hence the theorem follows. √ w x s(x, w) − w 1 p √ √ ∗ ∗ w w min (w ) r x s(x, w) − w 1 w p √ ∗ ∗ w ∞ w min (w ) s min (w) w δ p (x, w). min (w∗ ) w∗ ∞ ✷ 14 Application to the Method of Centers 14.1 Introduction Shortly after Karmarkar published his projective algorithm for linear optimization, some authors pointed out possible links with earlier literature. Gill et al. [97] noticed the close similarity between the search directions in Karmarkar’s algorithm and in the logarithmic barrier approach extensively studied by Fiacco and√McCormick [77]. At the same time, Renegar [237] proposed an algorithm with O( nL) iterations, an improvement over Karmarkar’s algorithm. Renegar’s scheme was a clever implementation of Huard’s method of centers [148]. Again, there were clear similarities, but equivalence was not established. For a while, the literature seemed to develop in three approximately independent directions. The first stream dealt with extensions of Karmarkar’s algorithm and was identified with the notion of projective transformation and projective space.1 This is the topic of the next chapter. The second stream of research was a revival and a new interpretation of the logarithmic approach. 
We amply elaborated on that approach in Part II of this book. The third stream prolonged Renegar’s contribution. Not so much has been done in this framework.2 After a decade of active research, it has become apparent that the links between the three approaches are very tight. They only reflect different ways of looking at the same thing. From one point of view, the similarity between the method of centers and the logarithmic barrier approach is striking. In both cases, the progress towards optimality is triggered by a parameter that is gradually shifted to its optimal value. The iterations are performed in the primal, dual or primal-dual spaces; they are made of Newton steps or damped Newton steps that aim to catch up with the parameter variation. The parameter updates are either small enough √ to allow full Newton steps and the method is of a path-following type with an O( nL) iteration bound; or, the updates are large and the method performs line searches along Newton’s direction with the aim of reducing a certain potential. The parameter in the logarithmic barrier approach is the penalty coefficient attached to the logarithm; in the method of centers, the parameter is a bound on the optimal objective function value. In the logarithmic barrier approach, the parameter is gradually moved to zero. In the method of centers, 1 For survey papers, we refer the reader to Anstreicher [17, 24], Goldfarb and Todd [109], Gonzaga [123, 124], den Hertog and Roos [142] and Todd [265]. 2 In this connection we cite den Hertog, Roos and Terlaky [143] and den Hertog [140]. 278 III Target-following Approach the parameter is monotonically shifted to the optimal value of the LO problem. A similar link exists between Renegar’s method of centers and the variants of Karmarkar’s method introduced by de Ghellinck and Vial [95] and Todd and Burrell [266]. Those variants use a parameter — a lower bound in case of a minimization problem — that is triggered to its optimal value. If this parameter is kept fixed, the projective algorithm computes an analytic center3 that is the dual of the center used by Renegar. Consequently, there also exist path-following schemes for the projective algorithm, see Shaw and Goldfarb [254], and Goffin and Vial [103]; these are very close to Renegar’s method. In this chapter we concentrate on the method of centers. Our aim is to show that the method can be described and analyzed quite well in the target-following framework.4 14.2 Description of Renegar’s method The method of centers (or center method) can easily be described by considering the barrier function used by Renegar.5 Assuming the knowledge of a strict lower bound z for the optimal value of the dual problem (D) he considers the function φR (y, z) := −q log(bT y − z) − n X log si , i=1 where q is some positive number and s = c − AT y. His method consists of finding (an approximation of) the minimizer y(z) of this barrier function by using Newton’s method. Then the lower bound z is enlarged to z̄ = z + θ(bT y(z) − z) (14.1) 3 The computation of analytic centers can be performed via variants of the projective algorithm. In this connection, we cite Atkinson [29] and Goffin and Vial [102]. 4 The method of centers has an interest of its own. First, the approach formalizes Huard’s scheme and supports Huard’s intuition of an efficient interior-point algorithm. There are also close links with Karmarkar’s method that are made explicit in Vial [285]. Second, the method of centers offers a natural framework for cutting plane methods. 
Cutting plane methods could be described in short as a way to solve an LO problem with so many (possibly infinite) inequality constraints that we cannot even enumerate them in a reasonable computational time. The only possibility is to generate them one at a time, as they seem needed to insure feasibility eventually. Generating cuts from a center, and in particular, from an analytic center, appears to be sound from both the theoretical and the practical point of views. The idea of using analytic centers in this context was alluded to by Sonnevend [257] and fully worked out by Goffin, Haurie and Vial [99]. See du Merle [209] and Gondzio et al. [115] for a detailed description of the method, and e.g., Bahn et al. [31] and Goffin et al. [98] for results on large scale programs. Let us mention that the complexity analysis of a conceptual method of analytic centers was given first by Atkinson and Vaidya [30] and Nesterov [225]. An implementable version of the method using approximate analytic centers is analyzed by Goffin, Luo and Ye [100], Luo [186], Ye [312], Goffin and Sharifi-Mokhtarian [101], Altman and Kiwiel [7], Kiwiel [168], and Goffin and Vial [104]. Besides, to highlight the similarity between the method of centers and the logarithmic barrier approach it is worth noting that logarithmic barrier methods also allow a natural cutting plane scheme based on adding and deleting constraints. We refer the reader to den Hertog [140], den Hertog, Roos and Terlaky [145], den Hertog et al. [141] and Kaliski et al. [164]. For a complexity analysis of a special variant of this method we refer the reader to Luo, Roos and Terlaky [187]. 5 The notation used here differs from the notation of Renegar. This is partly due to the fact that Renegar dealt with a solution method for the primal problem whereas we apply his approach to the dual problem. III.14 Method of Centers 279 for some positive θ such that z̄ is again a strict lower bound for the optimal value and the process is repeated. Renegar showed that this scheme can be used to construct an ε-solution of (D) in at most   √ bT y 0 − z 0 n log O ε iterations, where the superscript 0 refers to initial values, as usual. In this way he was the first to obtain this iteration bound. The algorithm can be described as follows. Renegar’s Method of Centers Input: A strict lower bound z 0 for the optimal value of (D); a dual feasible y 0 such √ that y 0 is ‘close’ to y(z 0 ); a positive number q ≥ n; an update parameter θ, 0 < θ < 1. begin y := y 0 ; z := z 0 ; while bT y − z ≥ ε do begin  z = z + θ bT y − z ; while y is not ‘close’ to y(z) do begin Apply Newton steps at y aiming at y(z) end end end 14.3 Targets in Renegar’s method Let us now look at how this approach fits into the target-following concept. First we observe that φR can be considered as the barrier term in a weighted barrier function for the dual problem when we add the constraint bT y ≥ z to the dual constraints and give the extra constraint the weight q. Giving the extra constraint the index 0, and indexing the other constraints by 1 to n as usual, we have the vector of weights w = (q, 1, 1, . . . , 1). The second observation is that Renegar’s barrier function is exactly the weighted dual barrier function φdw (cf. (12.1) on page 259) for the problem  (DR) max 0T y : AT y + s = c, −bT y + s0 = −z, s ≥ 0, s0 ≥ 0 . 280 III Target-following Approach The feasible region of this problem is just the feasible region of (D) cut by the objective constraint bT y ≥ z. 
Since the objective function is trivial, each feasible point is optimal. As a consequence, the weighted central path of (DR) is a point and hence this point, which is the minimizer of φR , is just the weighted-analytic center (according to w) of the feasible region of (D) cut by the objective constraint (cf. Theorem III.5 on page 229). The dual problem of (DR) is the following homogeneous problem:  (P R) min cT x̃ − x̃0 z : Ax̃ − x̃0 b = 0, x̃ ≥ 0, x̃0 ≥ 0 . Applying Theorem III.1 (page 222), we see that the optimality conditions for φR (y, z) = φdw (y) are given by Ax̃ − x̃0 b AT y + s b T y − s0 x̃s x̃ s 0 0 = 0, = = c, z, = = e, q. x̃, x0 ≥ 0, s ≥ 0, s0 ≥ 0, (14.2) The third and fifth equations imply x̃0 = q q = T . 0 s b y−z x := x̃ bT y − z = x̃ x̃0 q Hence, defining (14.3) we get Ax AT y + s = = b, c, xs = µz e, where µz := bT y(z) − z , q x ≥ 0, s ≥ 0, (14.4) (14.5) with y(z) denoting the minimizer of Renegar’s barrier function φR (y). We conclude that y(z) can be characterized in two ways. First, it is the weighted-analytic center of the feasible region of (D) cut by the objective constraint bT y ≥ z and, second, it is the point on the central path of (D) corresponding to the above barrier parameter value µz . Figure 14.1 depicts the situation. In the course of the center method the lower bound z is gradually updated to the optimal value of (D) and after each update of the lower bound the corresponding minimizer y(z) is (approximately) computed. Since y(z) represents the dual part of the primal-dual pair belonging to the vector µz e in the w-space, we conclude that the center method can be considered as a central-path-following method. III.14 Method of Centers 281 ✻ b y(z) bT y ≥ z ❄ Figure 14.1 14.4 The center method according to Renegar. Analysis of the center method It will be clear that in the analysis of his method Renegar had to deal with the question of how far the value of the lower bound z can be enlarged — according to (14.1) — so that the minimizer ȳ of φR (y, z̄) can be computed efficiently; hereby it may be assumed that the minimizer y of φR (y, z) is known.6 The answer to this question determines the speed of convergence of the method. As we know, the answer depends on the proximity δ(µz e, µz̄ e) of the present target vector µz e to the new target vector µz̄ e. Thus, we have to estimate the proximity δ(µz e, µz̄ e), where z̄ is given by (14.1). Further analysis below is a little complicated by the fact that the new target vector µz̄ e is not known, since µz̄ = bT y(z̄) − z̄ q depends on the unknown minimizer y(z̄) of φR (y, z̄). To cope with this complication we need some further estimates. Let (x(z), y(z), s(z)) denote the solution of (14.4), so it is the point on the central path of (P ) and (D) corresponding to the strict lower bound z for the optimal value. Then the duality gap at this point is given by  n bT y(z) − z c x(z) − b y(z) = nµz = . q T 6 T As far as the numerical procedure for the computation of the minimizer of Renegar’s barrier function is concerned, it may be clear that there are a lot of possible choices. Renegar presented a dual method in [237]. His search direction is the Newton direction for minimizing φR . In our framework this amounts to applying the dual Newton method for the computation of the primaldual pair corresponding to the target vector w for the problems (P R) and (DR); this method has been discussed in Section 12.2. 
Obviously, the same goal can be achieved by using any efficient computational — primal, dual or primal-dual — method for the computation of the primal-dual pair corresponding to the target vector µz e for (P ) and (D). 282 III Target-following Approach This identity can be written as n+q n cT x(z) − z = =1+ . bT y(z) − z q q (14.6) Denoting the optimal value by z ∗ we have cT x(z) ≥ z ∗ . Hence    n bT y(z) − z . z∗ − z ≤ 1 + q Also observe that when we know x(z) and y(z) then the lower bound z can be reconstructed: solving z from (14.6) and (14.5) respectively we get z= (n + q) bT y(z) − q cT x(z) = bT y(z) − qµz . n For the updated lower bound z̄ we thus find the expression  z̄ = bT y(z) − qµz + θ bT y(z) − z = bT y(z) − qµz + θqµz = bT y(z) − (1 − θ) qµz . Since bT y(z) is a lower bound for the optimal value, this relation makes clear that we are able to guarantee that z̄ is a strict lower bound for the optimal value only if θ < 1. Lemma III.26 The dual objective value bT y(z) is monotonically increasing, whereas the primal objective value cT x(z) and bT y(z) − z are monotonically decreasing if z increases.7 Proof: We first prove the second part of the lemma. To this end we use the weighted primal barrier function for (P R), φpw,z (x̃, x̃0 ) = cT x̃ − x̃0 z − q log x̃0 − n X log x̃i . i=1 The dependence of this function on the lower bound z is expressed by the corresponding subindex. Now let z and z̄ be two strict  lower bounds for the optimal value of (P ) and (D) and z̄ > z. Since x̃(z), x̃0 (z) minimizes φpw,z (x̃, x̃0 ) and x̃(z̄), x̃0 (z̄) minimizes φpw,z̄ (x̃, x̃0 ) we have     φpw,z x̃(z), x̃0 (z) ≤ φpw,z x̃(z̄), x̃0 (z̄) , φpw,z̄ x̃(z̄), x̃0 (z̄) ≤ φpw,z̄ x̃(z), x̃0 (z) . Adding these inequalities, we get     φpw,z x̃(z), x̃0 (z) + φpw,z̄ x̃(z̄), x̃0 (z̄) ≤ φpw,z x̃(z̄), x̃0 (z̄) + φpw,z̄ x̃(z), x̃0 (z) . Evaluating the expressions in these inequalities and omitting the common terms on both sides — the terms in which the parameters z and z̄ do not occur — we find −x̃0 (z)z − x̃0 (z̄)z̄ ≤ −x̃0 (z̄)z − x̃0 (z)z̄, 7 This lemma is taken from den Hertog [140]. The proof below is a slight variation on his proof. The proof technique is due to Fiacco and McCormick [77] and can be applied to obtain monotonicity of the objective value along the central path in a much wider class of convex problems. We refer the reader to den Hertog, Roos and Terlaky [144] and den Hertog [140]. III.14 Method of Centers or equivalently, 283  (z̄ − z) x̃0 (z̄) − x̃0 (z) ≥ 0. This implies x̃0 (z̄) − x̃0 (z) ≥ 0, or x̃0 (z̄) ≥ x̃0 (z). By (14.3) this is equivalent to bT y(z̄) − z̄ ≤ bT y(z) − z. Thus we have shown that bT y(z) − z is monotonically decreasing if z increases. This implies that µz is also monotonically decreasing if z increases. The rest of the lemma follows because along the central path the dual objective value is increasing and the primal objective value is decreasing. The proof of this property of the central path can be found in Remark II.6 (page 95). ✷ Now let z̄ be given by (14.1). Then we may write  cT x(z̄) − z − θ bT y(z) − z cT x(z̄) − z̄ = . cT x(z) − z cT x(z) − z By the above lemma we have cT x(z̄) ≤ cT x(z). Hence, using also (14.6) we get  θ bT y(z) − z cT x(z̄) − z̄ θq ≤1− =1− . cT x(z) − z cT x(z) − z n+q Using (14.6) once more we derive bT y(z̄) − z̄ cT x(z̄) − z̄ = , bT y(z) − z cT x(z) − z and so bT y(z̄) − z̄ θq ≤1− . T b y(z) − z n+q Therefore we obtain the following relation between µz̄ and µz :   θq µz̄ ≤ 1 − µz . 
n+q (14.7) For the moment we deviate from Renegar’s approach by taking as a new target the vector   θq w̄ := 1 − w, (14.8) n+q where w = µz e. Instead of Renegar’s target vector µz̄ e we use w̄ as a target vector. Due to the inequality (14.7) this means that we slow down the progress to optimality compared with Renegar’s approach. We show, however, that the modified strategy √ still yields an O( nL) iteration bound, just as Renegar’s approach. Assuming n ≥ 4, the argument used in Section 11.2 implies that 1 δ(µz e, w̄) ≤ √ 2 if θq 1 = √ . n+q n 284 III Target-following Approach Hence, when θ= n+q √ , q n (14.9) the primal-dual pair belonging to the target w̄ can be computed efficiently, to any desired accuracy. Since the barrier parameter, and hence the duality gap, at the new target is reduced by the factor 1 − θq/ (n + q) we obtain an ε-solution after at most √ eT w 0 n+q eT w 0 log = n log θq ε ε iterations. Here w0 denotes the initial point in the w-space. Note that the parameter q disappeared in the iteration bound. In fact, the above analysis, based on the updating scheme (14.8), works for every positive value of q and gives the same iteration bound for each value of q. On the other hand, when using Renegar’s scheme, the update goes via the strict lower bound z. As we established before, it is then √ necessary to keep θ < 1. So Renegar’s approach only works if q satisfies n+q < q n. This amounts to the following condition on q: √ n > n. q≥√ n−1 √  Renegar, in [237], recommended q = n and θ = 1/ 13 q . Den Hertog [140], who √ √  simplified the analysis significantly, used q ≥ 2 n and θ = 1/ 8 q . In both cases the iteration bound is of the same order of magnitude as the bound derived above.8 14.5 Adaptive- and large-update variants of the center method In the logarithmic barrier approach, we used a penalty parameter to trigger the algorithm. By letting the parameter go to zero in a controlled way, we could drive the pairs of dual solutions to optimality. The crux of the analysis was the updating scheme: small, adaptive or large updates, with results of variable complexity. Small or adaptive updates allow relatively small reductions of the duality gap — by a factor √ 1 − O (1/ n) — in O(1)√Newton steps between two successive updates, and achieve global convergence in O( nL) iterations. Large updates allow sharp decreases of the duality gap — by a factor 1 − Θ (1) — but require more Newton steps (usually as many as O(n)) between two successive updates and lead to global convergence in O(nL) iterations. A similar situation occurs for target-following methods, where the algorithm is triggered by the targets; the target sequence can be designed such that similar convergence results arise for small, adaptive and large updates respectively. The method of this chapter, the (dual) center method of Renegar, has a different triggering mechanism: a lower bound on the optimal objective value. The idea is to 8 √ √ √ For q = n we obtain from (14.9) θ = 2/ n and for q ≥ 2 n we get θ ≤ 1/2 + 1/ n. These values for θ are larger than the respective values used by Renegar and Den Hertog. We should note however that this is, at least partly, due to the fact that the analysis of both Renegar and den Hertog is based on the use of approximate central solutions whereas we made the simplifying assumption that exact central solutions are computed for each value of µz . III.14 Method of Centers 285 move this bound up to the point where the objective is set near to its optimal value. 
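A back-of-the-envelope sketch of this trade-off between the reduction per update and the number of updates is given below (the values of n, the initial gap and ε are arbitrary assumptions; the code is not from the text). The first call uses the small-update value (14.9), the second a large update; the large update needs far fewer updates of the lower bound, but, as discussed next, each such update must then be followed by a number of damped Newton steps.

import numpy as np

def center_update_bound(n, q, theta, gap0, eps):
    # Each update of the lower bound reduces the duality gap by at least the
    # factor 1 - theta*q/(n+q), cf. (14.7); hence at most
    #   (n+q)/(theta*q) * log(gap0/eps)
    # updates bring the gap below eps.
    factor = 1.0 - theta * q / (n + q)
    bound = (n + q) / (theta * q) * np.log(gap0 / eps)
    return factor, int(np.ceil(bound))

n, gap0, eps = 10000, 1.0e4, 1.0e-6
q = n
print(center_update_bound(n, q, theta=(n + q) / (q * np.sqrt(n)), gap0=gap0, eps=eps))  # choice (14.9)
print(center_update_bound(n, q, theta=0.5, gap0=gap0, eps=eps))                         # a large update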
For any such lower bound z the dual polytope AT y ≤ c is cut by the objective constraint bT y ≥ z and the (ideal) new iterate is a weighted-analytic center of the cut polytope. The weighting vector treats all the constraints in AT y ≤ c equally but it gives extra emphasis to the objective constraint by the factor q. Enlarging q, pushes the new iterate in the direction of the optimal set. This opens the way to adaptive- and largeupdate versions of Renegar’s method. Appropriate values for q can easily be found. To see this it suffices to recall from (14.7) that the duality gap between two successive updates of the lower bound reduces by at least the factor 1− θq . n+q For example, q = n and θ = 1/2 give a reduction of the duality gap by at least 3/4. It is clear that the reduction factor for the duality gap can be made arbitrarily small by choosing appropriate values for q and θ (0 < θ < 1). We then get out of the domain of quadratic convergence, but by using damped Newton steps we can reach the new weighted-analytic center in a controlled number of steps. From this it will be clear that the updates of the lower bound can be designed in such a way that adaptive- or large-update versions of the center method arise and that the complexity results will be similar to those for the logarithmic barrier method. These ideas can be worked out easily in the target-following framework. In fact, if Renegar’s method is modified according to the updating scheme (14.8), the results immediately follow from the corresponding results for the logarithmic barrier approach.9 9 Adaptive and large-update variants of the center method are analyzed by den Hertog [140]. Part IV Miscellaneous Topics 15 Karmarkar’s Projective Method 15.1 Introduction It has been pointed out before that recent research in interior-point methods for LO has been motivated by the appearance of the seminal paper [165] of Karmarkar in 1984. Despite its extraordinary power of stimulation of the scientific community, Karmarkar’s so-called projective method seemed to remain a very particular method, remotely related to the huge literature to which it gave rise. Significantly many papers appeared on the projective algorithm itself,1 but the link with other methods, in particular Renegar’s, has not drawn much attention up to recently.2 The decaying interest for the primal projective method is also due to a poorer behavior on solving practical optimization problems.3 In this chapter we provide a simplified description and analysis of the projective method and we also relate it to the other methods described in this book. Karmarkar considered the very special problem  (P K) min cT x : Ax = 0, eT x = n, x ≥ 0 , where, as before, A is an m × n matrix of rank m, and e denotes the all-one vector. Karmarkar made two seemingly restrictive assumptions, namely that the optimal value cT x∗ of the problem is known and has value zero, and secondly, that the all-one vector e is feasible for (P K). Note that the problem (P K) is trivial if cT e = 0. Then the all-one vector e is an optimal solution. So we assume throughout that this case is excluded. As a consequence we have cT e > 0. 
(15.1) 1 Papers in that stream were written by Anstreicher [14, 15, 16, 18, 19, 20, 21, 22, 23, 24], Freund [83, 85], de Ghellinck and Vial [95, 96], Goffin and Vial [102, 103], Goldfarb and Mehrotra [105, 106, 107], Goldfarb and Xiao [110], Goldfarb and Shaw [108], Shaw and Goldfarb [254], Gonzaga [117, 119], Roos [239], Vial [282, 283, 284], Xu, Yao and Chen [300], Yamashita [301], Ye [304, 305, 306, 307], Ye and Todd [315] and Todd and Burrell [266]. We also refer the reader to the survey papers Anstreicher [17, 24], Goldfarb and Todd [109], Gonzaga [123, 124], den Hertog and Roos [142] and Todd [265]. 2 See Vial [285, 286]. 3 In their comparison between the primal projective method and a primal-dual method, Fraley and Vial [80, 81] concluded to the superiority of the later for solving optimization problems. However, it is worth mentioning that the projective algorithm has been used with success in the computation of analytic centers in an interior-point cutting plane algorithm; in particular, Bahn et al. [31] and Goffin et al. [98] could solve very large decomposition problems with this approach. 290 IV Miscellaneous Topics Later on it is made clear that the model (P K) is general enough for our purpose. If it can be solved in polynomial time then the same is true for every LO problem. 15.2 The unit simplex Σn in IRn The feasible region of (P K) is contained in the unit simplex in IRn . This simplex plays a crucial role in the projective method. We denote it by Σn :  Σn = x ∈ IRn : eT x = n, x ≥ 0 . Obviously4 the all-one vector e belongs to Σn and lies at the heart of it. The sphere in IRn centered at e and with radius ρ is denoted by B(e, ρ). The analysis of the projective method requires knowledge of the smallest sphere B(e, R) containing Σn as well as the largest sphere B(e, r) whose intersection with the hyperplane eT x = n is contained in Σn . It can easily be understood that R is equal to the Euclidean distance from the center e of Σn to the vertex (n, 0, . . . , 0). See Figure 15.1, which depicts Σ3 . We have x3 (0, 0, 3) ✕ (0, 32 , 23 ) r ( 23 , 0, 32 ) e = (1, 1, 1) (0, 0, 0) (0, 3, 0) ( 23 , 32 , 0) (3, 0, 0) x2 R ❘ x1 Figure 15.1 R= The simplex Σ3 . p p (n − 1)2 + (n − 1)12 = n(n − 1). Similarly, r is equal to the Euclidean distance from e to the center of one of the faces 4 It might be worthwhile to indicate that the dimension of the polytope Σn is n − 1, since this is the dimension of the hyperplane eT x = n, which is the smallest affine space containing Σn . IV.15 Karmarkar’s Projective Method 291 n n , . . . , n−1 ), and therefore of Σn , such as (0, n−1 r= s 1 + (n − 1) Assuming n > 1, we thus have 15.3  n −1 n−1 2 = r n . n−1 1 r = . R n−1 The inner-outer sphere bound As usual, let P denote the feasible region of the given problem (P K). Then we may write P as P = Ω ∩ {x ∈ IRn : x ≥ 0} , where Ω is the affine space determined by  Ω = x ∈ IRn : Ax = 0, eT x = n . Now consider the minimization problem  min cT x : x ∈ Ω ∩ B(e, r) . This problem can be solved explicitly. Since Ω is an affine space containing the center e of the sphere B(e, r), the intersection of the two sets is a sphere of radius r in a lower-dimensional space. Hence the minimum value of cT x over Ω ∩ B(e, r) occurs uniquely at the point z 1 := e − rp, where p is the vector of unit length whose direction is obtained by projecting the vector c into the linear space parallel to Ω. Similarly, when x runs through Ω ∩ B(e, R), the minimal value will be attained uniquely at the point z 2 := e − Rp. 
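The construction of z¹ (and, with R in place of r, of z²) can be reproduced in a few lines. The sketch below is for illustration only, with an arbitrary instance, and assumes that the matrix obtained by appending the row eᵀ to A has full row rank.

import numpy as np

def inner_sphere_point(A, c, alpha=1.0):
    # Minimizer of c^T x over Omega ∩ B(e, alpha*r):  z = e - alpha*r*p, where p
    # is the unit vector along the projection of c onto the space parallel to
    # Omega, i.e. onto {x : Ax = 0, e^T x = 0}, and r = sqrt(n/(n-1)).
    m, n = A.shape
    e = np.ones(n)
    r = np.sqrt(n / (n - 1.0))
    M = np.vstack([A, e])
    p = c - M.T @ np.linalg.solve(M @ M.T, M @ c)    # projection of c
    p /= np.linalg.norm(p)
    return e - alpha * r * p

# tiny instance with A e = 0, so that e is feasible
A = np.array([[1.0, -2.0, 1.0]])
c = np.array([2.0, 1.0, 0.0])
z1 = inner_sphere_point(A, c)
print(z1, A @ z1, z1.sum(), c @ z1)   # A z1 = 0, e^T z1 = n, z1 >= 0, and c^T z1 < c^T e

Replacing alpha = 1 by a damping factor alpha in (0, 1) keeps the new point strictly positive, as discussed below.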
Since Ω ∩ B(e, r) ⊆ P ⊆ Ω ∩ B(e, R), and the minimal value over P is given as zero, we must have cT z 2 ≤ 0 ≤ cT z 1 . This can be rewritten as cT e − RcT p ≤ 0 ≤ cT e − rcT p. The left inequality and (15.1) imply cT p ≥ cT e > 0. R 292 IV Miscellaneous Topics Hence, cT z 1 = cT e − rcT p ≤ cT e − r T c e= R  1− 1 n−1  cT e. Thus, starting at the feasible point e we may construct in this way the new feasible point z 1 whose objective value, compared with the value at e, is reduced by the factor 1 − 1/(n − 1). At this stage we note that we want the new point to be positive. The above procedure may end at the boundary of the simplex. This can be prevented by introducing a stepsize α ∈ (0, 1) and using the point z := e − αrp as the new iterate. Below α ≈ 1/2 will turn out to be a good choice. The objective value is then reduced by the factor α . 1− n−1 It is clear that the above procedure can be used only once. The reduction factor for the objective value is 1 − r/R, where r/R is the ratio between the radius of the largest inscribed sphere and the radius of the smallest circumscribed sphere for the feasible region. This ratio is maximal at the center e of the feasible region. If we approach the boundary of the region the ratio goes to zero and the reduction factor goes to 1 and we cannot make enough progress to get an efficient method. Here Karmarkar made a brilliant contribution. His idea is to transform the problem to an equivalent problem by using a projective transformation that maps the new iterate back to the center e of the simplex Σn . We describe this transformation in the next section. After the transformation the procedure can be repeated and the objective value is reduced by the same factor. After sufficiently many iterations, a feasible point can be obtained with objective value as close to zero as we wish. 15.4 Projective transformations of Σn Let d > 0 be any positive vector. With IRn+ denoting the set of nonnegative vectors in IRn , the projective transformation Td : IRn+ \ {0} → Σn is defined by Td : x 7→ ndx ndx = T . dT x e (dx) Note that Td can be decomposed into two transformations: a coordinate-wise scaling x 7→ dx and a global scaling x 7→ nx/eT x. The first transformation is defined for each x, and is linear; the second transformation — which coincides with Te — is only defined if eT x is nonzero, and is nonlinear. As a consequence, Td is a nonlinear transformation. It may easily be verified that Td maps the simplex Σn into itself and that it is invertible on Σn ; the inverse on Σn is simply Td−1 : x 7→ nd−1 x . eT (d−1 x) IV.15 Karmarkar’s Projective Method 293 The projective transformation has some important properties. Proposition IV.1 For each d > 0 the projective transformation Td is a one-to-one map of the simplex Σn onto itself. The intersection of Σn with the linear subspace {x : Ax = 0} is mapped to the intersection of Σn with another subspace of the same dimension, namely x : AD−1 x = 0 . Besides, the transformation is positively homogeneous of degree zero; that is, for any λ > 0, Td (λx) = Td (x). Proof: The first statement is immediate. To prove the second statement, let x ∈ Σn . Then Ax = 0 if and only if Ad−1 dx = 0, which is equivalent to AD−1 Td (x) = 0. This implies the second statement. The last statement is immediate from the definition. ✷ Now let z be a feasible and positive point. For any nonzero x ∈ P there exists a unique ξ ∈ Σn such that x = Tz (ξ). We have Ax = 0 if and only if AZξ = 0 and T cT x = cT Tz (ξ) = cT n (Zc) ξ nzξ = T . 
eT (zξ) e (zξ) Hence the problem (P K) can be reformulated as min ( ) T n (Zc) ξ T : AZξ = 0, e ξ = n, ξ ≥ 0 . eT (zξ) Note that the objective of this problem is nonlinear. But we know that the optimal T value is zero and this can happen only if (Zc) ξ = 0. So we may replace the nonlinear T objective by the linear objective (Zc) ξ and, changing the variable ξ back to x, we are left with the linear problem (P KS) min n o T (Zc) x : AZx = 0, eT x = n, x ≥ 0 . Note that the feasibility of z implies Az = 0, whence AZe = 0, showing that e is feasible for the new problem. Thus we can use the procedure described in Section 15.3 to construct a new feasible point for the transformed problem so that the objective value is reduced by a factor 1 − α/ (n − 1). The new point is obtained by minimizing the objective over the inscribed sphere with radius αr: min 15.5 n o T (Zc) x : AZx = 0, eT x = n, kx − ek ≤ αr . The projective algorithm We can now describe the algorithm as follows. 294 IV Miscellaneous Topics Projective Algorithm Input: An accuracy parameter ε > 0. begin x := e; while cT x ≥ ε do begin n o z := argminξ (Xc)T ξ : AXξ = 0, eT ξ = n, kξ − ek ≤ αr ; x := Tx (z); end end As long as the objective value at the current iterate x is larger than the threshold value ε, the problem is rescaled by the projective transformation Tx−1 . This makes the all-one vector feasible. Then the new iterate z for the transformed problem is obtained by minimizing the objective value over the inscribed sphere with radius αr. After this the inverse of the map Tx−1 — that is Tx — is applied to z and we get a point that is feasible for the original problem (P K) again. This is repeated until the objective value is small enough. Figure 15.2 depicts one iteration of the algorithm. c ✻ ✠ Σn xk xk+1 ✒ Figure 15.2 ✠ ✠ T −1 ✲x ✛ Tx e optimal solution ✕Xc Ax = 0 kξ − ek = αr e z ■ ■ Σn ■ optimal solution AXξ = 0 One iteration of the projective algorithm (x = xk ). In the next section we derive an iteration bound for the algorithm. Unfortunately, the analysis of the algorithm cannot be based on the reduction of the objective value in each iteration. This is because the objective value is not preserved under the projective transformation. This is the price we pay for the linearization of the nonlinear problem IV.15 Karmarkar’s Projective Method 295 after each projective transformation. Here, again, Karmarkar proposed an elegant solution. The progress of the method can be measured by a suitable potential function. We introduce this function in the next section. 15.6 The Karmarkar potential Karmarkar used the following potential function in the analysis of his method. φK (x) = n log cT x − n X log xi . i=1 The usefulness of this function depends on two lemmas. Lemma IV.2 If x ∈ Σn then cT x ≤ exp  φK (x) n  . Proof: Since eT x = n, using the geometric-arithmetic-mean inequality we may write n X i=1 log xi ≤ n log Therefore φK (x) = n log cT x − which implies the lemma. eT x = n log 1 = 0. n n X i=1 log xi ≥ n log cT x, ✷ Lemma IV.3 Let x and z be positive vectors in Σn and y = Tx (z). Then T φK (x) − φK (y) = n log (Xc) e (Xc)T z + n X log zi . i=1 Proof: First we observe that φK (x) is homogeneous of degree zero in x. In other words, for each positive λ we have φK (λx) = φK (x). As a consequence we have φK (y) = φK (Tx (z)) = φK  nxz T e (xz)  = φK (xz) , as follows by taking λ = n/eT (xz). 
Therefore, n φK (x) − φK (y) = φK (x) − φK (xz) = n log X xi cT x log , − T c (xz) i=1 xi zi 296 IV Miscellaneous Topics from which the lemma follows. ✷ Applying the above lemma with z = e − αrp we can prove that each iteration of the projective algorithm decreases the potential by at least 0.30685 when choosing α appropriately. Lemma IV.4 Taking α = 1/(1 + r), each iteration of the projective algorithm decreases the potential function value by at least 1 − log 2 = 0.30685. Proof: By Lemma IV.3, at any iteration the potential function value decreases by the amount n (Xc)T e X log zi . ∆ = n log + T (Xc) z i=1 Recall that Xc is the objective vector in the transformed problem. Since the objective value of the transformed problem is reduced by at least a factor 1 − αr/R and z = e − αrp, we obtain n  αr  X log (1 − αrpi ) . + ∆ ≥ −n log 1 − R i=1 (15.2) For the first term we write   αr  αnr  αr   αr  αr  αr  −n log 1 − +ψ − + nψ − =n = = αr2 + nψ − . R R R R R R Here, and below we use the function ψ as defined in (5.5), page 92. The second term in (15.2) can be written as n X i=1 log (1 − αrpi ) = −αreT p − n X i=1 ψ (−αrpi ) = − n X ψ (−αrpi ) . i=1 Here we have used the fact that eT p = 0. By the right-hand side inequality in (6.24), on page 134, the above sum can be bounded above by ψ (−αr kpk). Since kpk = 1 we obtain  αr  ∆ ≥ αr2 + nψ − − ψ (−αr) . R Omitting the second term, which is nonnegative, we arrive at ∆ ≥ αr2 − ψ (−αr) = αr2 + αr + log (1 − αr) . The right-hand side expression is maximal if α = 1/ (1 + r). Substitution of this value yields   r ∆ ≥ r + log 1 − = r − log (1 + r) = ψ (r) . 1+r p Since r = n/(n − 1) > 1 we have ψ (r) > ψ (1) = 1−log 2, and the proof is complete. ✷ IV.15 Karmarkar’s Projective Method 15.7 297 Iteration bound for the projective algorithm The convergence result is as follows. Theorem IV.5 After no more than cT e n log ψ(1) ε iterations the algorithm stops with a feasible point x such that cT x ≤ ε. Proof: After k iterations the iterate x satisfies φK (x) − φK (e) < −kψ(1). Since φK (e) = n log cT e, φK (x) < n log cT e − kψ(1). Using Lemma IV.2, we obtain     φK (x) n log cT e − kψ(1) cT x ≤ exp < exp . n n The stopping criterion is thus certainly met as soon as   n log cT e − kψ(1) ≤ ε. exp n Taking logarithms of both sides we get n log cT e − kψ(1) ≤ n log ε, or equivalently, k≥ n cT e log , ψ(1) ε which yields the bound in the theorem. 15.8 ✷ Discussion of the special format The problem (P K) solved by the Projective Method of Karmarkar has a special format that is called the Karmarkar format. Except for the so-called normalizing constraint eT x = n, the constraints in (P K) are homogeneous. Furthermore, it is assumed that the optimal value is zero and that some positive feasible vector is given.5 We may 5 In fact, Karmarkar assumed that the all-one vector e is feasible, but it is sufficient if some given positive vector w is feasible. In that case we can use the projective transformation Tw−1 as defined in Section 15.4, to transform the problem to another problem in the Karmarkar format and for which the all-one vector is feasible. 298 IV Miscellaneous Topics wonder how the Projective Method could be used to solve an arbitrary LO problem that is not given in the Karmarkar format.6 Clearly problem (P K) is in the standard format and, since its feasible region is contained in the unit simplex Σn in IRn , the feasible region is bounded. Finally, since the all-one vector is feasible, (P K) satisfies the interior-point condition. 
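Before discussing these reformulations it may help to see the method of the preceding sections in compact computational form. The following sketch (Python with NumPy; the names are chosen by us, the data are assumed to be in the Karmarkar format with Ae = 0 and optimal value zero, and floating-point issues are ignored) implements the Projective Algorithm of Section 15.5 with the step size α = 1/(1 + r) of Lemma IV.4, together with the potential φK of Section 15.6.

import numpy as np

def projective_algorithm(A, c, eps=1e-6, max_iter=1000):
    # min { c^T x : Ax = 0, e^T x = n, x >= 0 } in the Karmarkar format,
    # i.e. Ae = 0 and the optimal value is zero.
    n = A.shape[1]
    r = np.sqrt(n / (n - 1.0))              # radius of the inscribed sphere
    alpha = 1.0 / (1.0 + r)                 # step size of Lemma IV.4
    x = np.ones(n)
    for _ in range(max_iter):
        if c @ x < eps:
            break
        Xc, AX = x * c, A * x               # objective and constraint matrix of the transformed problem
        B = np.vstack([AX, np.ones((1, n))])
        p = Xc - B.T @ np.linalg.pinv(B @ B.T) @ (B @ Xc)   # project Xc onto null(B)
        p = p / np.linalg.norm(p)
        z = np.ones(n) - alpha * r * p      # minimizer over the inscribed sphere of radius alpha*r
        x = n * x * z / (x @ z)             # x := T_x(z), back to the original variables
    return x

def karmarkar_potential(c, x):
    # phi_K(x) = n log(c^T x) - sum_i log x_i
    return x.size * np.log(c @ x) - np.sum(np.log(x))

By Lemma IV.4 each pass through the loop decreases karmarkar_potential(c, x) by at least 1 − log 2, which is easily verified by evaluating the potential before and after the update of x.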
In this section we first show that a problem (P ) in standard format can easily be reduced to the Karmarkar format whenever the feasible region P of (P ) is bounded and the interior-point condition is satisfied. Secondly, we discuss how a general LO problem can be put in the format of (P K). Thus, let the feasible region P of the standard problem  (P ) min cT x : Ax = b, x ≥ 0 be bounded and let it contain a positive vector. Now let the pair (ȳ, s̄) be optimal for the dual problem  (D) max bT y : AT y + s = c, s ≥ 0 . Then we have, for any primal feasible x, s̄T x = cT x − bT ȳ. So s̄T x and cT x differ by the constant bT ȳ and hence the problem  (P ′ ) min s̄T x : Ax = b, x ≥ 0 has the same optimal set as (P ). Since s̄ is dual optimal, the optimal value of (P ′ ) is zero. Since the feasible region P is bounded, we deduce from Corollary II.14 that the row space of the constraint matrix A contains a positive vector. That is, there exists a λ ∈ IRm such that v := AT λ > 0. Now, defining ν := bT λ, we have for any feasible x, 6 T v T x = AT λ x = λT Ax = λT b = ν. The first assumption on a known optimal value for a problem in the Karmarkar format was removed by Todd and Burrell [266]. They used a simple observation that for any ζ, the objective cT x − ζ is equivalent to (c − (ζ/n) e)T x. If ζ = ζ ∗ , the optimal value of problem (P K), the assumption of a zero optimal value is verified for the problem with the new objective. If ζ < ζ ∗ , Todd and Burrell were able to show that the algorithm allows an update of the lower bound ζ by a simple linear ratio test after finitely many iterations; the overall procedure has the same complexity as the original algorithm of Karmarkar. The second assumption of a known interior feasible solution was removed by Ghellinck and Vial [95] by using a different projective embedding. They also used the same parametrization as Todd and Burrell and thus produced the first combined phase I – phase II interior-point algorithm, simultaneously resolving optimality and feasibility. They also pointed out that the projective algorithm was truly a Newton method. The update of the bound in their method is done by an awkward quadratic test. Fraley [79] was able to replace the quadratic test by a simpler linear ratio test. To remain consistent with Part I of the book, we shall not dwell upon those approaches, but rather use a homogeneous self-dual embedding, and analyze the behavior of Karmarkar algorithm on the embedding problem. IV.15 Karmarkar’s Projective Method 299 Since there exists a positive primal feasible x and v is positive, it follows that ν = v T x > 0. We may write    νAx = νb = v T x b = b v T x = bv T x. Hence, Defining  νA − bv T x = 0. A′ := νA − bv T , we conclude that n o T P = x : A′ x = 0, v T x = ν , and hence (P ′ ) can be reformulated as  (P ′ ) min s̄T x : A′ x = 0, v T x = ν, x ≥ 0 , where ν > 0. This problem can be rewritten as n o  T (P ′′ ) min s̄v −1 x̄ : A′ V −1 x̄ = 0, eT x̄ = n, x̄ ≥ 0 , where the new variable x̄ relates to the old variable x according to x̄ = nvx/ν. Since (P ) satisfies the interior-point condition, this condition is also satisfied by (P ′ ). Hence, the problem (P ′′ ) is not only equivalent to the given standard problem (P ), but it satisfies all the conditions of the Karmarkar format: except for the normalizing constraint the constraints are homogeneous, the optimal value is zero, and some positive feasible vector is given. 
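In computational form the reduction reads as follows. The sketch below (Python with NumPy, names ours) assumes that a positive dual optimal slack s̄ and a vector λ with Aᵀλ > 0 are given, and returns the data of (P'') together with the inverse change of variables.

import numpy as np

def to_karmarkar_format(A, b, s_bar, lam):
    # (P'): min s_bar^T x  s.t.  Ax = b, x >= 0  (bounded feasible region, zero
    # optimal value) is rewritten as (P''): min c_new^T xb  s.t.  A_new xb = 0,
    # e^T xb = n, xb >= 0, where xb = n*v*x/nu.
    v = A.T @ lam                       # positive by the choice of lam
    nu = b @ lam                        # equals v^T x for every feasible x, hence positive
    A_prime = nu * A - np.outer(b, v)   # A'x = 0 for every feasible x
    return A_prime / v, s_bar / v       # constraint matrix A'V^{-1} and objective s_bar*v^{-1}

def original_from_karmarkar(x_bar, A, b, lam):
    # invert the change of variables xb = n*v*x/nu
    v, nu, n = A.T @ lam, b @ lam, x_bar.size
    return nu * x_bar / (n * v)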
Thus we have shown that any standard primal problem for which the feasible set is bounded has a representation in the Karmarkar format.7 Our second goal in this section is to point out that any given LO problem can be transformed to a problem in the Karmarkar format. Here we use some results from Chapter 2. First, the given problem can be put in the canonical format, where all constraints are inequality constraints and the variables are nonnegative (see Appendix D.1). Then we can embed the resulting canonical problem — and its dual problem — in a homogeneous self-dual problem, as described in Section 2.5 (cf. (2.15)). Thus we arrive at a problem of the form  min 0T x : M x ≥ 0, x ≥ 0 , where M is skew-symmetric (M = −M T ) and we need to find a strictly complementary solution for this problem. We proceed by reducing this problem to the Karmarkar format. First we use the procedure described in Section 2.5 to embed the above self-dual problem in a self-dual problem that satisfies the interior-point condition. As before, let the vector r be defined by r := e − M e. 7 It should be noted that this statement has only theoretical value; to reduce a given standard problem with bounded feasible region to the Karmarkar format we need a dual feasible pair (ȳ, s̄) with s̄ > 0; in general such a pair will not be available beforehand. 300 IV Miscellaneous Topics Now consider the self-dual model in IRn+1 given by  " #T " # " #" # " # " # " #   0 x M r x 0 0 x : + ≥ , ≥ 0 . min   n+1 ξ −rT 0 ξ n+1 0 ξ Taking (x, ξ) = (e, 1) , we get " M r −rT 0 #" x ξ # + " 0 n+1 # = " Me + r −rT e + n + 1 # = " e 1 # , as can easily be verified. By introducing the surplus vector (s, η), we can write the inequality constraints as equality constraints and get the equivalent problem ( " #" # " # " # " # " # ) M r x s 0 x s min ξ : − = , , ≥ 0 . (15.3) −rT 0 ξ η −n − 1 ξ η We replaced the objective (n + 1)ξ by ξ; this is allowed since the optimal objective is 0. Note that the all-one vector ((x, ξ, s, η) = (e, 1, e, 1)) is feasible for (15.3) and the optimal value is zero. When summing up all the constraints we obtain eT M x + eT rξ − eT s − rT x − η = −n − 1. Since r = e − M e and eT M e = 0, this reduces to eT x + eT s + ξ + η = (n + 1)(1 + ξ). (15.4) We can replace the last equality constraint in (15.3) by (15.4). Thus we arrive at the problem       x x     # # " "         M r −I 0  ξ  0  ξ  . (15.5) , = ≥ 0 min ξ :        s  eT 1 eT 1  s  (n + 1)(1 + ξ)       η η Instead of this problem we consider       x     " # " # x      ξ   ξ 0 M r −I 0      . ≥ 0 = , min ξ :        s  2(n + 1) eT 1 eT 1  s        η η (15.6) We established above that the all-one vector is feasible for (15.5); obviously this implies that the all-one vector is also feasible for (15.6). It follows that the problem (15.6) is in the Karmarkar format and hence it can be solved by the projective method. Any optimal solution (x∗ , ξ ∗ , s∗ , η ∗ ) of (15.6) has ξ ∗ = 0. It is easily verified that (x∗ , ξ ∗ , s∗ , η ∗ ) /2 is feasible for (15.5) and also optimal. IV.15 Karmarkar’s Projective Method 301 Thus we have shown how any given LO problem can be embedded into a problem that has the Karmarkar format and for which the all-one vector is feasible. We should note however that solving the given problem by solving the embedding problem requires a strictly complementary solution of the embedding problem. 
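Setting up the embedding (15.6) is equally mechanical. The sketch below (Python with NumPy, names ours) returns only the homogeneous part of the constraints and the objective; together with the normalizing constraint eᵀ(x, ξ, s, η) = 2(n + 1) and the nonnegativity conditions this constitutes (15.6), and the all-one vector is feasible by construction, since Me + r − e = 0.

import numpy as np

def karmarkar_embedding(M):
    # Embedding of  min { 0 : Mx >= 0, x >= 0 }  (M skew-symmetric) into the
    # Karmarkar format, following (15.3)-(15.6); the variable is (x, xi, s, eta).
    n = M.shape[0]
    e = np.ones(n)
    r = e - M @ e
    A_hom = np.hstack([M, r.reshape(-1, 1), -np.eye(n), np.zeros((n, 1))])
    c = np.zeros(2 * n + 2)
    c[n] = 1.0                          # the objective is the coordinate xi
    assert np.allclose(A_hom @ np.ones(2 * n + 2), 0.0)   # the all-one vector is feasible
    return A_hom, c

In exact arithmetic the pair (A_hom, c) can be handed directly to the projective_algorithm sketch given earlier, since the normalizing value 2(n + 1) equals the dimension of the new variable.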
Thus we are left with an important question, namely, does the Projective Method yield a strictly complementary solution? A positive answer to this question has been given by Muramatsu and Tsuchiya [223]. Their proof uses the fact that there is a close relation between Karmarkar’s method and the primal affine-scaling method of Dikin8 when applied to the homogeneous problem obtained by omitting the normalizing constraint in the Karmarkar format. The next two sections serve to highlight this relation. We first derive an explicit expression for the search direction in the Projective Method. The result is that this direction can be interpreted as a primal logarithmic barrier direction for the homogeneous problem. Then we show that the homogeneous problem has optimal value zero and that any strictly complementary solution of the homogeneous problem yields a solution of the Karmarkar format. 15.9 Explicit expression for the Karmarkar search direction It may be surprising that in the discussion of Karmarkar’s approach there is no mention of some issues that were crucial in the methods discussed in the rest of this book. The most striking example of this is the complete absence of the central path in Karmarkar’s approach. Also, whereas the search direction in all the other methods is obtained by applying Newton’s method — either to a logarithmic barrier function or to the centering conditions — the search direction in the Projective Method is obtained from a different perspective. The aim of this section is to derive an explicit expression for the search direction in the Projective Method. In this way we establish a surprising relation with the Newton direction in the primal logarithmic method for the homogeneous problem arising when the normalizing constraint in the Karmarkar format is neglected. Let x be a positive vector that is feasible for (P K). Recall from Section 15.5 that the new iterate x+ in the Projective Algorithm is obtained from x+ = Tx (z) where n o T z = argminξ (Xc) ξ : AXξ = 0, eT ξ = n, kξ − ek ≤ αr . Here r denotes the radius of the maximal inscribed sphere in the simplex Σn and α is the step-size. From this we can easily derive that9 z = e + α∆z, where ∆z = argmin∆ξ 8 9 n o T (Xc) ∆ξ : AX∆ξ = 0, eT ∆ξ = 0, k∆ξk = r . For a brief description of the primal affine-scaling method of Dikin we refer to the footnote on page 339. We assume throughout that cT x is not constant on the feasible region of (P K). With this assumption the vector z is uniquely defined. 302 IV Miscellaneous Topics By writing down the first-order optimality conditions for this minimization problem we obtain AX∆z eT ∆z = = 0 0 Xc k∆zk = = XAT y + σe + η∆z r, (15.7) where σ, η ∈ IR and y ∈ IRm . Multiplying the third equation from the left by AX and using the first equation and AXe = Ax = 0, (15.8) we get AX 2 c = AX 2 AT y, whence y = AX 2 AT −1 AX 2 c. Substituting this in the third equation of (15.7) gives σe + η∆z = = = −1 AX 2 c Xc − XAT AX 2 AT   −1 T I − (AX) AX 2 AT AX Xc PAX (Xc) . (15.9) Taking the inner product with e on both sides, while using eT ∆z = 0 and eT e = n, we get nσ = eT PAX (Xc) . Since AXe = 0, according to (15.8), e belongs to the null space of AX and hence PAX (e) = e. (15.10) Using this we write T T eT PAX (Xc) = (Xc) PAX (e) = (Xc) e = cT x. (15.11) Thus we obtain nσ = cT x. Substituting this in (15.9) we get   cT x cT x e = PAX Xc − e . η∆z = PAX (Xc) − n n The second equality follows by using (15.10) once more. Up to its sign, the value of the factor η now follows from k∆zk = r. 
This implies   T PAX Xc − c nx e  .  ∆z = ±r (15.12) T PAX Xc − c nx e Here we assumed that the vector   cT x cT x e = PAX (Xc) − e χ := PAX Xc − n n (15.13) IV.15 Karmarkar’s Projective Method 303 is nonzero. This is indeed true. We leave this fact as an exercise to the reader.10The T sign in (15.12) follows by using that we are minimizing (Xc) ∆z. So we must have T (Xc) ∆z ≤ 0. In this respect the following observation is crucial. By using the Cauchy–Schwarz inequality we may write √ T T cT x = (Xc) e = (Xc) PAX (e) = eT PAX (Xc) ≤ n kPAX (Xc)k . Note that this inequality holds with equality only if PAX (Xc) is a scalar multiple of e. This would imply that ∆z is a scalar multiple of e. Since eT ∆z = 0 and k∆zk = r > 0 this case cannot occur. Thus we obtain cT x kPAX (Xc)k > √ . n As a consequence,   cT x T e (Xc) PAX Xc − n = = cT x T (Xc) PAX (e) n 2 cT x 2 kPAX (Xc)k − > 0. n T (Xc) PAX (Xc) − T We conclude from this that (Xc) ∆z ≤ 0 holds only for the minus sign in (15.12). Thus we find rχ ∆z = − . (15.14) kχk We proceed by deriving an expression for x+ . We have   nx (e + α∆z) nx (e + α∆z) nxz =x+ − x . x+ = Tx (z) = T = T x z x (e + α∆z) xT (e + α∆z) So the displacement in the x-space is given by the expression between the brackets. This expression can be reduced as follows. We have nx (e + α∆z) nx (e + α∆z) − xT (e + α∆z) x nx (∆z) − xT (∆z) x −x= =α . T T x (e + α∆z) x (e + α∆z) xT (e + α∆z) Here we used that eT x = n. Hence we may write x+ = x + α∆x, where ∆x = nx (∆z) − xT (∆z) x . xT (e + α∆z) (15.15) Using (15.14) the enumerator in the last expression can be reduced as follows:    rx nrxχ r xT χ x T xT χ e − nχ . + = nx (∆z) − x (∆z) x = − kχk kχk kχk 10 Exercise 79 Show that the assumption (15.1) implies that cT x is positive on the (relative) interior of the feasible region of (P K). Derive from this that the vector χ is nonzero, for any feasible x with x > 0. 304 IV Miscellaneous Topics Using the definition (15.13) of χ and eT x = n, we may write    xT χ e − nχ = xT PAX (Xc) − cT x e − nPAX (Xc) + cT x e  = xT PAX (Xc) e − nPAX (Xc)   Xc = nµ PAX e − µ where µ= So we have xT PAX (Xc) . n rnµ XPAX nx (∆z) − x (∆z) x = kχk T (15.16)   Xc e− . µ Substituting this relation in the above expression (15.15) for ∆x gives   rnµ Xc ∆x = XPAX e − kχk xT (e + α∆z) µ (15.17) Thus we have found an explicit expression for the search direction ∆x used in the Projective Method of Karmarkar.11 Note that this direction is a scalar multiple of   Xc −XPAX −e µ and that this is precisely the primal logarithmic barrier direction12 at x for the barrier parameter value µ, given by (15.16), for the homogeneous problem  (P KH) min cT x : Ax = 0, x ≥ 0 . Note also that problem (P KH) arises when the normalizing constraint in (P K) is neglected. We consider the problem (P KH) in more detail in the next section. 15.10 The homogeneous Karmarkar format In this section we want to point out a relation between the primal logarithmic barrier method when applied to the homogeneous problem (P KH) and the Projective Method of Karmarkar. It is assumed throughout that (P K) satisfies the assumptions of the Karmarkar format. Recall that (P KH) is given by  (P KH) min cT x : Ax = 0, x ≥ 0 . We first show that the optimal value of (P KH) is zero. Otherwise there exists a nonnegative vector x satisfying Ax = 0 such that cT x < 0. But then nx Te (x) = T e x 11 12 Show that cT ∆x, with ∆x given by (15.17), is negative if and only if  cT x xT PAX (Xc) > n kPAX (Xc)k2 . See Remark III.20 on page 271. 
IV.15 Karmarkar’s Projective Method 305 is feasible for (P K) and satisfies cT Te (x) < 0, contradicting the fact that the optimal value of (P K) is zero. The claim follows.13 It is clear that any optimal solution of (P K) is nonzero and optimal for (P KH). So (P KH) will have a nonzero optimal solution x. Now, if x is optimal then λx is optimal as well for any nonnegative λ. Therefore, since (P KH) has a nonzero optimal solution, the optimal set of (P KH) is unbounded. This implies, by Corollary II.12 (page 102), that the dual problem (DKH) of (P KH), given by  (DKH) max 0T y : AT y + s = c, s ≥ 0 , does not contain a strictly feasible solution. Thus, (DKH) cannot satisfy the interiorpoint condition. As a consequence, the central paths of (P KH) and (DKH) do not exist. Note that any nonzero feasible solution x of (P KH) can be rescaled to Te (x) so that it becomes feasible for (P K). All scalar multiples λx, with λ ≥ 0, are feasible for (P KH), so we have a one-to-one correspondence between feasible solutions of (P K) and feasible rays in (P KH). Therefore, we can neglect the normalizing constraint in (P K) and just look for a nonzero optimal solution of (P KH). The behavior of the affine-scaling direction on (P KH) has been carefully analyzed by Tsuchiya and Muramatsu [273]. The results of this paper form the basis of the paper [223] by the same authors in which they prove that the Projective Method yields a strictly complementary solution of (P K).14 13 A different proof of the claim can be obtained as follows. The dual problem of (P K) is (DK) max  0T y + nζ : AT y + ζe + s = c, s ≥ 0 . This problem has an optimal solution and, due to Karmarkar’s assumption, its optimal value is zero. Thus it follows that (y, ζ) is optimal for (DK) if and only if ζ = 0 and y is an optimal solution of  (DKH) max 0T y : AT y + s = c, s ≥ 0 . By dualizing (DKH) we regain the problem (P KH), and hence, by the duality theorem the optimal value of (P KH) must be zero. 14 Exercise 80 Let x be feasible for (P KH) and positive and let µ > 0. Then, defining the number δ(x, µ) by n xs o δ(x, µ) := min − e : AT y + s = c , µ y,s we have δ(x, µ) ≥ 1. Prove this. 16 More Properties of the Central Path 16.1 Introduction In this chapter we reconsider the self-dual problem (SP ) min  T q x : M x ≥ −q, x ≥ 0 , where the matrix M is of size n × n and skew-symmetric, and the vector q is nonnegative. We assume that the central path of (SP ) exists, and our aim is to further investigate the behavior of the central path, especially as µ tends to 0. As usual, we denote the µ-center by x(µ) and its surplus vector by s(µ) = s (x(µ)). From Theorem I.30 (on page 45) we know that the central path converges to the analytic center of the optimal set SP ∗ of (SP ). The limit point x∗ and s∗ := s(x∗ ) form a strictly complementary optimal solution pair, and hence determine the optimal partition of (SP ), which is denoted by π = (B, N ). We first deal with the derivatives of x(µ) and s(µ) with respect to µ. In the next section we prove their existence. In Section 16.2.2 we show that the derivatives are bounded, and we also investigate the limits of the derivatives when µ approaches zero. In a final section we show that there exist two homothetic ellipsoids that are centered at the µ-center and which respectively contain, and are contained in, an appropriate level set of the objective value q T x. 
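For numerical experiments with the results of this chapter one needs µ-centers of (SP) to high accuracy. A minimal Newton sketch for this purpose (Python with NumPy; names ours; a strictly feasible starting point x0 > 0 with Mx0 + q > 0 is assumed to be available) is the following.

import numpy as np

def mu_center(M, q, mu, x0, tol=1e-10, max_iter=100):
    # Newton's method on  s = Mx + q,  xs = mu*e;  returns (x(mu), s(mu)).
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        s = M @ x + q
        if np.linalg.norm(x * s - mu) <= tol:
            break
        # with s = Mx + q kept exact we have ds = M dx, and the centering
        # condition linearizes to  (S + X M) dx = mu*e - x*s
        dx = np.linalg.solve(np.diag(s) + np.diag(x) @ M, mu - x * s)
        alpha = 1.0                     # damp the step so that x and s(x) stay positive
        while np.any(x + alpha * dx <= 0) or np.any(M @ (x + alpha * dx) + q <= 0):
            alpha *= 0.5
        x = x + alpha * dx
    return x, M @ x + q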
16.2 Derivatives along the central path 16.2.1 Existence of the derivatives A fundamental result in the theory of interior point methods is the existence and uniqueness of the solution of the system F (w, x, s) = " Mx + q − s xs − w # = 0, x ≥ 0, s ≥ 0 308 IV Miscellaneous Topics for all positive w.1 The solution is denoted by x(w) and s(w). Remark IV.6 It is possible to give an elementary proof of the fact that the equation F (w, x, s) = 0 cannot have more than one solution. This goes as follows. Let x1 , s1 and x2 , s2 denote two solutions of the equation. Define ∆x := x2 − x1 , and ∆s := s2 − s1 . Then it follows from M x1 + s1 = M x2 + s2 = q that M ∆x = ∆s. Since M is skew symmetric it follows that ∆xT ∆s = ∆xT M ∆x = 0, so eT (∆x∆s) = 0. 1 1 2 2 x1j x2j From x s = x s = w we derive that if = holds for some j then also versa. In other words, (∆x)j = 0 ⇔ (∆s)j = 0, j = 1, · · · , n. (16.1) s1j = s2j , and vice (16.2) Also, if x1j ≤ x2j for some j, then s1j ≥ s2j , and if x1j ≥ x2j then s1j ≤ s2j . Thus we have (∆x)j (∆s)j ≤ 0, j = 1, · · · , n. Using (16.1) we obtain (∆x)j (∆s)j = 0, j = 1, · · · , n. This, together with (16.2) yields that (∆x)j = 0 and (∆s)j = 0, for each j. Hence we conclude that ∆x = ∆s = 0. Thus we have shown x2 = x1 and s2 = s1 , proving the claim.2 • With z = (x, s), the gradient matrix (or Jacobian) of F (w, x, s) with respect to z is # " M −I , ∇z F (w, x, s) = S X where S and X are the diagonal matrices corresponding to x and s, respectively. This matrix is independent of w and depends continuously on x, s and is nonsingular (cf. Exercise 8). Hence we may apply the implicit function theorem.3 Since F (w, x, s) is infinitely many times differentiable the same is true for x(w) and s(w), and we have # " #−1 " # " ∂x M −I 0 ∂w = ∂s S X I ∂w On the central path we have w = µe, with µ ∈ (0, ∞). Let us consider the more general situation where w is a function of a parameter t, such that w(t) > 0 for all t ∈ T with T an open interval T ⊆ IR. Moreover, we assume that w is in the class C ∞ of infinitely differentiable functions. Then the first-order derivatives of x(t) and s(t) with respect to t are given by " # " #−1 " # x′ (t) M −I 0 = . (16.3) s′ (t) S(t) X(t) w′ (t) 1 2 3 This result follows from Theorem II.4 if w = µe. For arbitrary w > 0 a proof similar to that of Theorem III.1 can be given. A more general result, for the case where M is a so-called P0 -matrix, is proven in Kojima et al. [175]. Cf. Proposition A.2. IV.16 More Theoretical Results 309 Changing notation, and denoting x′ (t) by x(1) , and similar for s and w, using induction we can easily obtain the higher-order derivatives. Actually, we have " x(k) s(k) # = " M −I S(t) X(t) where w̃ = w(k) − k−1 X i=1 k i  #−1 " 0 w̃ # x(k−i) s(i) . If w is analytic in t then so are x and s.4 When applying the above results to the case where x = x(µ) and s = s(µ), with µ ∈ (0, ∞), it follows that all derivatives with respect to µ exist and that x and s depend analytically on µ. 16.2.2 Boundedness of the derivatives Recall that the point x(µ) and its surplus vector s(µ) are uniquely determined by the system of equations Mx + q xs = = s, x ≥ 0, s ≥ 0, µe. (16.4) Taking derivatives with respect to µ in (16.4) we find, as a special case of (16.3), M ẋ xṡ + sẋ = = ṡ e. (16.5) The derivatives of x(µ) and s(µ) with respect to µ are now denoted by ẋ and ṡ respectively. 
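Since (16.5) is linear in (ẋ, ṡ) once the µ-center is known, the derivatives can be computed directly. The sketch below (Python with NumPy, names ours) solves (16.5) as one block system; the pair (x, s) is assumed to lie on the central path, so that the coefficient matrix is nonsingular.

import numpy as np

def central_path_derivatives(M, x, s):
    # Solve (16.5):  M*xdot = sdot,  x*sdot + s*xdot = e.
    n = x.size
    J = np.block([[M, -np.eye(n)],
                  [np.diag(s), np.diag(x)]])
    rhs = np.concatenate([np.zeros(n), np.ones(n)])
    sol = np.linalg.solve(J, rhs)
    return sol[:n], sol[n:]             # xdot, sdot

As observed in Section 16.2.1, the higher-order derivatives satisfy systems with the same coefficient matrix and modified right-hand sides, so a single factorization of J can be reused for all of them.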
In this section we derive bounds for the derivatives.5 These bounds are used in the next section to study the asymptotic behavior of the derivatives when µ approaches zero. Since we are interested only in the asymptotic behavior, we assume in this section that µ is bounded above by some fixed positive number µ̄. Table 16.1. (page 310) summarizes some facts concerning the order of magnitude of the components of various vectors of interest. We are interested in the dependence on µ. All other problem dependent data (like the condition number σSP , the dimension n of the problem, etc.) are considered as constants in the analysis below. From Table 16.1. we read that, e.g., xB (µ) = Θ(1) and ẋN (µ) = O(1). For the meaning of the symbols Θ and O we refer to Section 1.7.4. See also page 190. It is important to stress that the constants hidden in the order symbols are independent 4 This follows from an extension of the implicit function theorem. We refer the reader to, e.g., Fiacco [76], Theorem 2.4.2, page 36. See also Halická [137], Wechs [290] and Zhao and Zhu [321]. 5 We restrict ourselves to first-order derivatives. The asymptotic behavior of the derivatives has been considered by Adler and Monteiro [3], Witzgall, Boggs and Domich [294] and Ye et al. [313]. We also mention Güler [131], who also considers the higher-order derivatives and their asymptotic behavior, both when µ goes to zero and when µ goes to infinity. A very interesting result in his paper is that all the higher-order derivatives vanish if µ approaches infinity, which indicates that the central path is asymptotically linear at infinity. 310 IV Miscellaneous Topics Vector Table 16.1. B N 1 x(µ) Θ(1) Θ(µ) 2 s(µ) Θ(µ) 3 d(µ) Θ( √1µ ) Θ(1) √ Θ( µ) 4 ẋ(µ) O(1) O(1) 5 ṡ(µ) O(1) O(1) Asymptotic orders of magnitude of some relevant vectors. of the vectors x, s and of the value µ of the barrier parameter. They depend only on the problem data M and q and the upper bound µ̄ for µ. The statements in the first two lines of the table almost immediately follow from Lemma I.43 on page 57. For example, for i ∈ B the lemma states nxi (µ) ≥ σSP , where σSP is the condition number of (SP ). This means that xi (µ) is bounded below by a constant. But, since xi (µ) is bounded on the finite section 0 < µ ≤ µ̄ of the central path, as a consequence of Lemma I.9 (page 24), xi (µ) is also bounded above by a constant. This justifies the statement xi (µ) = Θ(1). Since, xi (µ)si (µ) = µ, this also implies si (µ) = Θ(µ). This explains the first two lines for the B-part. The estimates for the N -parts of xi (µ) and si (µ) are derived in the same way. The third line shows order estimates for the vector d(µ), given by s x(µ) . d(µ) = s(µ) These estimates immediately follow from the first two lines of the table. It remains to deal with the last two lines in the table, which concern the derivatives. In the rest of this section we omit the argument µ and write simply x instead of x(µ). This gives no rise to confusion. We start √ by writing the second equation in (16.5) in a different way. Dividing both sides by xs, and using that xs = µe we get e dṡ + d−1 ẋ = √ . µ (16.6) Note that the orthogonality of ẋ and ṡ — which is immediate from the first equation in (16.5) since M is skew-symmetric — implies that the vectors dṡ and d−1 ẋ are orthogonal as well. Hence we have kdṡk2 + d−1 ẋ Consequently √ n kdṡk ≤ √ , µ 2 e = √ µ −1 d 2 = n . µ √ n ẋ ≤ √ . µ IV.16 More Theoretical Results This implies 311 √ n ≤ d−1 ẋ √ . N N µ √ The third line in the table gives dB = Θ(1/ µ). 
This together with the left-hand side inequality implies ṡB = O(1). Similarly, the right-hand side inequality implies that ẋN = O(1). Thus we have settled the derivatives of the small coordinates. It remains to deal with the estimates for the derivatives of the large coordinates, ẋB and ṡN . This is the harder part. We need to characterize the scaled derivatives dṡ and d−1 ẋ in a different way. √ n kdB ṡB k ≤ √ , µ Lemma IV.7 Let x̃ be any vector in IRn and s̃ = s(x̃). Then " # " # 1 d−1 ẋ ds̃ = P(MD −D−1 ) . µ dṡ d−1 x̃ Here P(MD −D−1 ) denotes the orthogonal projection onto the null space of the matrix  M D − D−1 , where D is the diagonal matrix of d.6 Proof: Letting I denote the identity matrix of size n × n, we multiply both sides in (16.6) by DM D − I. This gives  e (DM D − I) dṡ + d−1 ẋ = (DM D − I) √ . µ By expanding the products we get e e DM ẋ − dṡ + DM D2 ṡ − d−1 ẋ = DM D √ − √ . µ µ With M ẋ = ṡ this simplifies to e e DM D2 ṡ − d−1 ẋ = DM D √ − √ , µ µ and this can be rewritten as e √ − d−1 ẋ µ = = 6 e DM D √ − DM D2 ṡ = DM D µ   e T −DM D √ − dṡ . µ  e √ − dṡ µ  Exercise 81 Using the notation of Lemma IV.7, let x̃ run through all vectors in IRn . Then the vector   ds̃ d−1 x̃ runs through an affine space parallel to the row space of the matrix intersects the null space of a vector x̃ in IRn such that  MD −D −1 µẋs(µ) µṡx(µ) where s̃ = s(x̃).   M D −D −1  . This space in a unique point. Using this, prove that there exists = x(µ)s̃ = s(µ)x̃, 312 IV Miscellaneous Topics Using this we write " # √e − d−1 ẋ µ √e µ − dṡ " #   −DM T e D − d ṡ √ µ D−1  h iT  e − M D −D−1 D √ − dṡ . µ = = This shows that the vector on the left belongs to the row space of the matrix  M D − D−1 . Observing that, on the other hand, " # i d−1 ẋ h = 0, M D −D−1 dṡ which means that the vector of scaled derivatives " # d−1 ẋ dṡ (16.7)  belongs to the null space of the matrix M D − D−1 , we conclude that the vector (16.7) can be characterized as the orthogonal projection of the vector " e # √ µ √e µ  into the null space of M D − D−1 . In other words, " e # " # √ d−1 ẋ µ . = P(MD −D−1 ) √e dṡ µ p √ Since xs = µe, we may replace the vector e by xs/µ. Now using that xs = ds = d−1 x, we get " # " # 1 d−1 ẋ ds = P(MD −D−1 ) . µ dṡ d−1 x Finally, let x̃ be any vector in IRn and s̃ = s(x̃). Then we may write " # " # " # " # ds̃ ds d (s̃ − s) DM (x̃ − x) − = = d−1 x d−1 (x̃ − x) d−1 (x̃ − x) d−1 x̃ " # h iT −DM T (x̃ − x) −1 = = − (x̃ − x) . M D −D d−1 (x̃ − x) The last vector is in the row space of P(MD −D−1 ) " h ds d−1 x i M D −D−1 , and hence we have # = P(MD −D−1 ) " ds̃ d−1 x̃ # , IV.16 More Theoretical Results 313 proving the lemma. ✷ Using Lemma IV.7 with x̃ = x∗ and s̃ = s∗ we have  "  " # # 2 " #   h i h d−1 ẋ ds∗ − h −1 µ = argminh,k∈IRn : = 0 . M D −D   dṡ d−1 x∗ − k k Hence, the unique solution of the above least squares problem is given by h = µd−1 ẋ, k = µdṡ. The left-hand side of the constraint in the problem can be split as follows: " " # # i h i h h h N B −1 −1 = 0. + MN DN −DB MB DB −DN kB kN Substituting the optimal values for hN and kB we find that hB and kN need to satisfy # # " " h i d−1 ẋ h i h N B N −1 −1 = −µ MN DN −DB MB DB −DN dB ṡB kN # " h i ẋ N . = −µ MN −IB ṡB Since x∗N = 0 and s∗B = 0 we obtain the following characterization of the derivatives for the large coordinates: " # d−1 B ẋB µ = dN ṡN  " # " # " # 2   h i ẋ h i h h N B B −1 . 
= −µ MN −IB : MB DB −DN argminhB ,kN  kN ṡB  kN (16.8) Now let z = (zB , zN ) be the least norm solution of the equation " # " # h i z h i ẋ B N = −µ MN −IB . MB −IN zN ṡB Then we have " zB zN # h = −µ MB −IN i+ h MN −IB i " ẋN ṡB # (16.9) h i+ h i where MB −IN denotes the pseudo-inverse7 of the matrix MB −IN . It is obvious that −1 hB = dB zB , kN = dN zN 7 See Appendix B. 314 IV Miscellaneous Topics is feasible for (16.8). It follows that # # " " −1 µd−1 ẋ d z B B B B . ≤ µdN ṡN dN zN √ √ From Table 16.1. we know that d−1 B = Θ( µ) and dN = Θ( µ), so it follows that # " # " zB µd−1 √ B ẋB . ≤ Θ( µ) zN µdN ṡN Moreover, we have already established that ẋN = O(1), ṡB = O(1). Hence, using also (16.9), z = µ O(1), where the constant in the order symbol now also contains the norm of the matrix + (MB − IN ) (MN − IB ). Note that this matrix, and hence also its norm, depends only on the data of the problem. Substitution yields, after dividing both sides by µ, " # d−1 √ B ẋB = Θ( µ) O(1). dN ṡN √ √ Using once more d−1 B = Θ( µ) and dN = Θ( µ), we finally obtain " # ẋB = O(1), ṡN completing the proof of the estimates in the table. 16.2.3 Convergence of the derivatives Consider the second equation in (16.5): xṡ + sẋ = e. Recall that ẋ and ṡ are orthogonal. Since xs = µe, the vectors xṡ and sẋ are orthogonal as well, so this equation represents an orthogonal decomposition of the all-one vector e. It is interesting to consider this decomposition as µ goes to zero. This is done in the next theorem. Its proof uses the results of the previous section, which are summarized in Table 16.1.. Theorem IV.8 If µ approaches zero, then xṡ and sẋ converge to complementary {0, 1}-vectors. The supports of their limits are B and N , respectively. Proof: For each index i we have xi ṡi + si ẋi = 1. IV.16 More Theoretical Results 315 Now let i ∈ B and let µ approach zero. Then si → 0. Since ẋi is bounded, it follows that si ẋi → 0. Therefore, xi ṡi → 1. Similarly, if i ∈ N , then xi → 0. Since ṡi is bounded, xi ṡi → 0 and hence si ẋi → 1. This implies the theorem. ✷ The next theorem is an immediate consequence of Theorem IV.8 and requires no further proof. It establishes that the derivatives of the small variables converge if µ approaches zero.8 Theorem IV.9 We have limµ↓0 ẋN = (s∗N )−1 and limµ↓0 ṡB = (x∗B )−1 . 16.3 ✷ Ellipsoidal approximations of level sets In this section we discuss another property of µ-centers. Namely, that there exist two homothetic ellipsoids that are centered at the µ-center and which respectively contain, and are contained in, an appropriate level set of the objective function q T x. In this section we keep µ > 0 fixed. For any K ≥ 0 we define the level set  MK := x : x ≥ 0, s(x) = M x + q ≥ 0, q T x ≤ K . (16.10) Since q T x(µ) = nµ, we have x(µ) ∈ MK if and only if nµ ≤ K. Note that M0 represents the set of optimal solutions of (SP ), since q T x ≤ 0 if and only if q T x = 0. Hence M0 = SP ∗ . For any number r ≥ 0 we also define the ellipsoid ) ( 2 2 s(x) x 2 −e + −e ≤r . E(µ, r) := x : x(µ) s(µ) Note that the norms in the defining inequality of this ellipsoid vanish if x = x(µ), so the analytic center x(µ) is the center of the ellipsoid E(µ, r). Theorem IV.10 E(µ, 1) ⊆ Mµ(n+√n) and M0 ⊆ E(µ, n). Proof: Assume x ∈ E(µ, 1). We denote s(x) simply by s. To√prove the first inclusion we need to show that x ≥ 0, s = s(x) ≥ 0 and q T x ≤ µ (n + n). 
To simplify the notation we make use once more of the vectors hx and hs introduced in Section 6.9.2, namely x s hx = − e, hs = − e, (16.11) x(µ) s(µ) or equivalently, hx = 8 x − x(µ) , x(µ) hs = s − s(µ) . s(µ) Theorem IV.9 gives only the limiting values of the derivatives of the small variables and says nothing about convergence of the derivatives for the large coordinates. For this we refer to Güler [131], who shows that all derivatives converge when µ approaches zero along a weighted path. In fact, he extends this result to all higher-order derivatives and he gets similar results for the case where µ approaches infinity. 316 IV Miscellaneous Topics Obviously, hx and hs are orthogonal. Hence, defining h := hx + hs , we find khk2 = khx k2 + khs k2 ≤ 1. Hence khx k ≤ 1. We easily see that this implies x√≥ 0. Similarly, khs k ≤ 1 implies s ≥ 0. Thus it remains to show that q T x ≤ µ (n + n). Since xs xs = , (hx + e) (hs + e) = x(µ)s(µ) µ and on the other hand (hx + e) (hs + e) = hx hs + hx + hs + e = hx hs + h + e, we get xs − e − hx hs . (16.12) µ Taking the inner product of both sides with the all-one vector, while using once more that hx and hs are orthogonal, we arrive at h= eT h = This gives xT s qT x − eT e = − n. µ µ (16.13)  q T x = µ n + eT h . Finally, applying the Cauchy–Schwarz inequality to eT h, while using khk ≤ 1, we get √  q T x ≤ µ (n + kek) = µ n + n , proving the first inclusion in the theorem. To prove the second inclusion, let x be optimal for (SP ). Then q T x = 0, and hence, from (16.13), eT h = −n. Since x ≥ 0 and s ≥ 0, (16.11) gives hx ≥ −e and hs ≥ −e. Thus we find h ≥ −2e. Now consider the maximization problem n o 2 max khk : eT h = −n, h ≥ −2e , (16.14) h and let h̄ be a solution of this problem. Then, for arbitrary i and j, with 1 ≤ i < j ≤ n, h̄i and h̄j solve the problem  max h2i + h2j : hi + hj = h̄i + h̄j , hi ≥ −2, hj ≥ −2 , hi ,hj We easily understand that this implies that either hi = −2 or hj = −2. Thus, h̄ must have n − 1 coordinates equal to −2 and the remaining coordinate equal to −n − (n − 1)(−2) = n − 2, and hence, h̄ 2 = (n − 1)4 + (n − 2)2 = n2 . Therefore, khk ≤ n. This means that x ∈ E(µ, n), and hence the theorem follows.9 ✷ 9 Exercise 82 Using the notation of this section, prove that Mnµ ⊆ E(µ, 2n). 17 Partial Updating 17.1 Introduction In this chapter we deal with a technique that can be applied to almost √ every interior-point algorithm to enhance the theoretical efficiency by a factor n. The technique is called partial updating, and was introduced by Karmarkar in [165]. His projective algorithm, as presented in Chapter 15, needs O(nL) iterations and O(n3 ) arithmetic operations per iteration. Thus, in total the projective algorithm requires O(n4 L) arithmetic operations. Karmarkar showed that this complexity bound can be reduced to O(n3.5 L) arithmetic operations by using partial updating. It has since become apparent that the same technique can be applied to many other interior-point √ algorithms with the same effect: a reduction of the complexity bound by a factor n.1 The partial updating technique can be described as follows. In an interior-point method for solving the problems (P ) and (D) — in the standard format of Part II — each search direction is obtained by solving a linear system involving a matrix of the form AD2 AT , where the scaling matrix D = diag (d) is a positive diagonal matrix depending on the method. In a primal-dual method we have D2 = XS −1 , in a primal method D = X, and in a dual method D = S −1 . 
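In all three cases the dominant cost per iteration is the factorization of AD²Aᵀ, as the following sketch makes explicit (Python with NumPy; names ours; A is assumed to have full row rank, so that AD²Aᵀ is positive definite).

import numpy as np

def normal_equation_solve(A, d, rhs):
    # Form A D^2 A^T for a positive scaling vector d and solve A D^2 A^T y = rhs.
    # Refactorizing from scratch in every iteration costs O(n^3) arithmetic
    # operations; avoiding this is the purpose of partial updating.
    AD = A * d                          # column i of A scaled by d_i
    L = np.linalg.cholesky(AD @ AD.T)   # Cholesky factor of A D^2 A^T
    return np.linalg.solve(L.T, np.linalg.solve(L, rhs))

# the three scalings mentioned above (with x, s > 0):
#   primal-dual:  d = np.sqrt(x / s)
#   primal:       d = x
#   dual:         d = 1.0 / s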
The matrix D varies from iteration to iteration, due to the variations in x and/or s. We assume that A is m × n with rank m. Without partial updating the computation of the search directions requires at each iteration O(n3 ) arithmetic operations for factorization of the matrix AD2 AT and only O(n2 ) operations for all the other required arithmetic operations. Although the matrix AD2 AT varies from iteration to iteration, it seems reasonable to expect that the variations are not too large, and that the matrix at the next iteration is related in some sense to the current matrix. In other words, the calculation of the search direction in the next iteration might benefit from earlier calculations. In some way, that goal is achieved by the use of partial updating. To simplify the discussion we assume for the moment that at some iteration the scaling matrix is the identity matrix I and at the next iteration D. Then, if ai denotes the i-th column of A, we may write AD2 AT = n X i=1 1 d2i ai aTi = n X i=1 ai aTi + n X i=1  d2i − 1 ai aTi . See for example Anstreicher [20], Anstreicher and Bosch [25, 26], Bosch [46], Bosch and Anstreicher [47], den Hertog, Roos and Vial [146], Gonzaga [118], Kojima, Mizuno and Yoshise [177], Mehrotra [204], Mizuno [213], Monteiro and Adler [219], Roos [240], Vaidya [276] and Ye [306]. 318 IV Miscellaneous Topics Hence AD2 AT = AAT + n X i=1  d2i − 1 ai aTi ,  showing that AD A arises by adding the n rank-one matrices d2i − 1 ai aTi to AAT . Now consider the hypothetical situation that di = 1 for every i, except for i = 1. Then we have  AD2 AT = AAT + d21 − 1 a1 aT1 2 T and AD2 AT is a so-called rank-one modification of AAT . By the well known ShermanMorrison formula2 we then have −1 −1 −1 −1  AAT a1 aT1 AAT . AD2 AT = AAT − d21 − 1 −1 1 + (d21 − 1) aT1 (AAT ) a1 This expression makes clear that the inverse of AD2 AT is equal to the inverse of AAT plus a scalar multiple of the rank-one matrix vv T , where v = AAT −1 a1 . −1 −1 We say that AD2 AT is a rank-one update of AAT . The computation of a rank-one update requires O(n2 ) arithmetic operations, as may easily be verified. In the general situation, when all the entries of d differ from 1, the inverse of the matrix AD2 AT can be obtained by applying n rank-one updates to the inverse of AAT . This still requires O(n3 ) arithmetic operations. The underlying idea for the partial updating technique is to perform only those rank-one updates that correspond to coordinates i of d for which d2i − 1 exceeds some threshold value. A partial updating algorithm maintains an approximation d˜ e 2 AT instead of AD2 AT ; the value of d˜i is updated to its correct of d and uses AD value if it deviates too much from di . Each update of an entry in d˜ necessitates e 2 AT . But each such modification modification of the inverse (or factorization) of AD can be accomplished by a rank-one update, and this requires only O(n2 ) arithmetic operations.3 The success of the partial updating technique comes from the fact that it can reduce the total number of rank-one updates in the course of an algorithm by √ a factor n. The analysis of an interior-point algorithm with partial updating consists of two parts. 
First we need to show that the modified search directions, obtained by using the scaling matrix d˜ instead of d, are sufficiently accurate to maintain the polynomial complexity of the original algorithm; this amounts to showing that the modified algorithm has a worst-case iteration count of the same order of magnitude as the 2 Exercise 83 Let Q, R, S, T be matrices such that the matrices Q and Q + RS T are nonsingular and R and S are n × k matrices of rank k ≤ n. Prove that (Q + RS T )−1 = Q−1 − Q−1 R(I + S T Q−1 R)−1 S T Q−1 . The Sherman-Morrison formula arises by taking R = S = a, where a is a nonzero vector [136]. 3 We refer the reader to Shanno [251] for more details of rank-one updates of a Cholesky factorization of a matrix of the form AD 2 AT . IV.17 Partial Updating 319 original algorithm. Then, secondly, we have to count the total number of rank-one updates in the modified algorithm. As indicated above, the partial updating technique can be applied to a wide class of interior-point algorithms. Below we demonstrate its use only for the dual logarithmic barrier method with full Newton steps, which was analyzed in Chapter 6. 17.2 Modified search direction Recall from Exercise 35 (page 111) that the search direction in the dual logarithmic barrier method is given by   −1 b − AS −1 e . ∆y = AS −2 AT µ More precisely, this is the search direction at y, with s = c − AT y > 0, and for the barrier parameter value µ. In the sequel we use instead   −1  b − AS −1 e , ∆y = ASe−2 AT µ where s̃ is such that s̃ = λs with λi ∈   1 ,σ , σ 1 ≤ i ≤ n, (17.1) for some fixed real constant σ > 1. The corresponding displacement in the s-space is given by   −1  b −1 T T −2 T e − AS e . (17.2) ∆s = −A ∆y = −A AS A µ Letting x̄ be such that Ax̄ = b we may write    T  −1 sx̄ s̃−1 ∆s = − ASe−1 ASe−2 AT ASe−1 Λ −e , µ showing that −s̃−1 ∆s equals the orthogonal projection of the vector   sx̄ Λ −e µ into the row space of the matrix ASe−1 . Since the row space of the matrix ASe−1 is e where H is the same matrix as used in Chapter 6 — equal to the null space of H S, and defined in Section 5.8, page 111 — we have    sx̄ s̃−1 ∆s = −PH Se Λ −e . (17.3) µ Note that if λ = e then the above expression coincides with the expression for the dual Newton step in (6.1). Defining     sx −e : Ax = b , (17.4) x̃(s, µ) = argminx Λ µ 320 IV Miscellaneous Topics and using the same arguments as in Section 6.5 we can easily verify that      s̃x̃(s, µ) sx̃(s, µ) sx̄ −e =Λ −e = − λ, PH Se Λ µ µ µ yielding the following expression for the modified Newton step:   sx̃(s, µ) . s̃−1 ∆s = Λ e − µ 17.3 (17.5) Modified proximity measure The proximity of s to s(µ) is measured by the quantity δ̃(s, µ) := ∆s . s̃ (17.6) From (17.5) it follows that the modified Newton step ∆s vanishes if and only if sx̃(s, µ) = µe, which holds if and only if x̃(s, µ) = x(µ) and s = s(µ). As a consequence we have δ̃(s, µ) = 0 ⇐⇒ s = s(µ). An immediate consequence of (17.4) and (17.5) is      sx sx̃(s, µ) = min Λ −e δ̃(s, µ) = Λ e − x µ µ  : Ax = b . (17.7) The next lemma shows that the modified proximity δ̃(s, µ) has a simple relation with the standard proximity measure δ(s, µ). Lemma IV.11 δ(s, µ) ≤ δ̃(s, µ) ≤ σδ(s, µ). σ Proof: Using (17.7) and max (λ) ≤ σ we may write     sx̃(s, µ) sx(s, µ) δ̃(s, µ) = Λ e− ≤ Λ e− µ µ sx(s, µ) ≤ σδ(s, µ). ≤ kdk∞ e − µ On the other hand we have δ(s, µ) = ≤   sx(s, µ) sx̃(s, µ) −1 e− ≤ Λ Λ e− µ µ   sx̃(s, µ) d−1 ∞ Λ e − ≤ σ δ̃(s, µ). µ This implies the lemma. The next lemma generalizes Lemma II.26 in Section 6.7. 
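Before turning to that lemma we indicate how the updating scheme of Section 17.1 looks in computational form. The sketch below (Python with NumPy; names ours; it works with a generic scaling vector d and is therefore a schematic of the bookkeeping rather than a transcription of the dual algorithm analyzed in this chapter) performs a Sherman-Morrison rank-one update only for those coordinates whose ratio has left the interval (1/σ, σ).

import numpy as np

def rank_one_update(Ninv, a, delta):
    # Sherman-Morrison:  (N + delta * a a^T)^{-1}
    #   = Ninv - delta * (Ninv a)(Ninv a)^T / (1 + delta * a^T Ninv a).
    u = Ninv @ a
    return Ninv - (delta / (1.0 + delta * (a @ u))) * np.outer(u, u)

def partial_update(A, Ninv, d_tilde, d, sigma):
    # Ninv is the inverse of A diag(d_tilde)^2 A^T.  Only coordinates i whose
    # ratio d_tilde_i/d_i has left (1/sigma, sigma) are reset to their true
    # value; each reset is one O(m^2) rank-one update of Ninv instead of a
    # refactorization from scratch.
    d_tilde = d_tilde.copy()
    for i in range(A.shape[1]):
        ratio = d_tilde[i] / d[i]
        if ratio <= 1.0 / sigma or ratio >= sigma:
            Ninv = rank_one_update(Ninv, A[:, i], d[i] ** 2 - d_tilde[i] ** 2)
            d_tilde[i] = d[i]
    return Ninv, d_tilde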
✷ IV.17 Partial Updating 321 Lemma IV.12 Assuming δ̃(s, µ) ≤ 1, let s+ be obtained from s by moving along the modified Newton step ∆s at s for the barrier parameter value µ, and let µ+ = (1 − θ)µ. Assuming that s+ is feasible, we have s  σ 2 − 1 δ̃(s, µ) θ2 n + + 4 + . δ̃(s , µ ) ≤ σ δ̃(s, µ) + 1−θ (1 − θ)2 Proof: By definition, + + δ(s , µ ) = min x  s+ x e− + µ  : Ax = b . Substituting for x the vector x̃(s, µ) and replacing µ+ by µ(1 − θ) we obtain the following inequality: s+ x̃(s, µ) δ(s+ , µ+ ) ≤ e − . (17.8) µ(1 − θ) Simplifying the notation by using h := we may rewrite (17.5) as Using this and (17.9) we get s+ x̃(s, µ) = = ∆s ∆s = , s̃ λs (17.9)  sx̃(s, µ) = µ e − λ−1 h . (17.10) (s + ∆s) x̃(s, µ) = (s + λsh) x̃(s, µ)  (e + λh) sx̃(s, µ) = µ (e + λh) e − λ−1 h . Substituting this into (17.8) we obtain (e + λh) e − λ−1 h δ(s , µ ) ≤ e − 1−θ + +   e − h2 + λ − λ−1 h = e− . 1−θ This can be rewritten as   λ − λ−1 h θ e − h2 − . δ(s , µ ) ≤ h − 1−θ 1−θ + + 2 The triangle inequality now yields θ e − h2 δ(s , µ ) ≤ h − 1−θ + + 2  +  λ − λ−1 h . 1−θ (17.11) The first norm resembles (6.12) and, since khk ≤ 1, can be estimated in the same way. This gives  2 θ e − h2 θ2 n 4 2 ≤ khk + h − 2. 1−θ (1 − θ) 322 IV Miscellaneous Topics For the second norm in (17.11) we write  λ − λ−1 h ≤ λ − λ−1 1−θ ∞  σ − σ −1 khk khk ≤ . 1−θ 1−θ Substituting the last two bounds in (17.11), while using khk = δ̃(s, µ), we find s  σ − σ −1 δ̃(s, µ) θ2 n + + 4 + δ(s , µ ) ≤ δ̃(s, µ) + . 1−θ (1 − θ)2 Finally, Lemma IV.11 gives δ̃(s+ , µ+ ) ≤ σδ(s+ , µ+ ) and the bound in the lemma follows. ✷ Lemma IV.13 √ Let n ≥ 3. Using the notation of Lemma IV.12 and taking σ = 9/8 and θ = 1/(6 n), we have δ̃(s, µ) ≤ 1 1 ⇒ δ̃(s+ , µ+ ) ≤ , 2 2 and the new iterate s+ is feasible. Proof: The implication in the lemma follows by substituting the given values in the bound for δ̃(s+ , µ+ ) in Lemma IV.12. If n ≥ 3 this gives δ̃(s+ , µ+ ) ≤ 0.49644 < 0.5, yielding the desired result. By Lemma IV.11 this implies δ(s+ , µ+ ) ≤ σ/2 = 9/16. From this the feasibility of s+ follows. ✷ The above lemma shows that for the specified values of the parameters σ and θ the modified Newton steps keep the iterates close to the central path. The value of the barrier update parameter θ in Lemma IV.13 is a factor of two smaller than in the algorithm of Section 6.7. Hence we must expect that the iteration bound for an algorithm based on these parameter values will be a factor of two worse. This is the price we pay for using the modified Newton direction. On the other hand, in terms of the number of arithmetic operations required to reach an ε-solution, the gain is much larger. This will become clear in the next section. The modified algorithm is described on page 323. Note that in this algorithm the vector λ may be arbitrarily at each iteration, subject to (17.1). The next theorem specifies values for the parameters τ , θ and σ for which the algorithm is well defined and has a polynomial iteration bound. √ Theorem IV.14 If τ = 1/2, θ = 1/(6 n) and σ = 9/8, then the Dual Logarithmic Barrier Algorithm with Modified Full Newton Steps requires at most √ nµ0 6 n log ε iterations. The output is a primal-dual pair (x, s) such that xT s ≤ 2ε. IV.17 Partial Updating 323 Dual Log. Barrier Algorithm with Modified Full Newton Steps Input: A proximity parameter τ , 0 ≤ τ < 1; an accuracy parameter ε > 0; (y 0 , s0 ) ∈ D and µ0 > 0 such that δ(s0 , µ0 ) ≤ τ ; a barrier update parameter θ, 0 < θ < 1; a threshold value σ, σ > 1. 
begin s := s0 ; µ := µ0 ; while nµ ≥ ε do begin Choose any λ satisfying (17.1); s := s + ∆s, ∆s from (17.2); µ := (1 − θ)µ; end end Proof: According to Lemma IV.13 the algorithm is well defined. The iteration bound is an immediate consequence of Lemma I.36. Finally, the duality gap of the final iterate can be estimated as follows. For the final iterate s we have δ̃(s, µ) ≤ 1/2, with nµ ≤ ε. Taking x = x̃(s, µ) it follows from (17.10) that sT x̃(s, µ) = nµ − µhT λ−1 . Since √ hT λ−1 ≤ λ−1 khk ≤ σ δ̃(s, µ) n ≤ 9n/16 ≤ n, we obtain sT x̃(s, µ) ≤ 2nµ ≤ 2ε. The proof is complete. 17.4 ✷ Algorithm with rank-one updates We now present a variant of the algorithm in the previous section in which the vector λ used in the computation of the modified Newton step is prescribed. See page 324. Note that at each iteration the vector s̃ is updated in such a way that the vector λ used in the computation of the modified Newton step satisfies (17.1). As a consequence, the iteration bound for the algorithm is √ given by Theorem IV.14. Hence, the algorithm yields an exact solution of (D) in O ( nL) iterations. Without using partial updates — which corresponds to giving the threshold parameter σ the value 1 — the bound for 324 IV Miscellaneous Topics Full Step Dual Log. Barrier Algorithm with Rank-One Updates Input: A proximity parameter τ , τ = 1/2; an accuracy parameter ε > 0; (y 0 , s0 ) ∈ D and µ0 > 0 such that δ(s0 ,√ µ0 ) ≤ τ ; a barrier update parameter θ, θ = 1/(6 n); a threshold value σ, σ = 9/8. begin s := s0 ; µ := µ0 ; s̃ = s; while nµ ≥ ε do begin λ := s̃s−1 ; s := s + ∆s, ∆s from (17.2); for i := 1 to n do begin  if s̃sii ∈ / σ1 , σ then s̃i := si end µ := (1 − θ)µ; end end  the total number of arithmetic operations becomes O n3.5 L . Recall that the extra factorn3 can be interpreted as being due to n rank-one updates per iteration, with O n2 arithmetic operations per rank-one update. The total number of rank-one updates in the above algorithm is equal to the number of times that a coordinate of the vector s̃ is updated. We estimate this √ number in the next section, and we show that on the average it is not more than O ( n) per iteration, instead of n. Thus  the overall bound for the total number of arithmetical operations becomes O n3 L . 17.5 Count of the rank-one updates We need to count (or estimate) the number of times that a coordinate of the vector s̃ changes. Let sk and s̃k denote the values assigned to s and to s̃, respectively, at iteration k of the algorithm. We use also the superscript k to refer to values assigned to other relevant entities during the k-th iteration. For example, the value assigned to λ at iteration k is denoted by λk and satisfies λk = s̃k−1 , sk−1 k ≥ 1. IV.17 Partial Updating 325 Moreover, denoting the modified Newton step on iteration k by ∆sk , we have ∆sk = sk − sk−1 = s̃k−1 hk = λk sk−1 hk , k ≥ 1. (17.12) Note that the algorithm is initialized so that s0 = s̃0 and these are the values of s and s̃ just before the first iteration. Now consider the i-th coordinate of s̃. Suppose that s̃i is updated at iteration k1 ≥ 0 and next updated at iteration k2 > k1 . Then the updating rule implies that the sequence ski 2 −1 ski 2 ski 1 +1 ski 1 +2 , , . . . , , s̃ki 1 s̃ki 1 +1 s̃ki 2 −2 s̃ik2 −1 has the property that the last entry lies outside the interval (1/σ, σ) whereas all the other entries lie inside this interval. Since ski 1 = s̃ki 1 = s̃ki 1 +1 = . . . = s̃ki 2 −1 we can rewrite the above sequence as ski 1 +1 ski 1 +2 ski 2 −1 ski 2 , , . . . , , . 
ski 1 ski 1 ski 1 ski 1 (17.13) Hence, with pj := sik1 +j , 0 ≤ j ≤ K := k2 − k1 , (17.14) the sequence p0 , p1 , . . . , pK has the property pj ∈ p0 and   1 ,σ , σ pK ∈ / p0  1≤j<K  1 ,σ . σ (17.15) (17.16) (17.17) Our estimate of the number of rank-one updates in the algorithm depends on a technical lemma on such sequences. The proof of this lemma (Lemma IV.15 below) requires another technical lemma that can be found in Appendix C (Lemma C.3). Lemma IV.15 Let σ ≥ 1 and let p0 , p1 , . . . , pK be a finite sequence of positive numbers satisfying (17.16) and (17.17). Then K−1 X j=0 pj+1 − pj 1 ≥1− . pj σ Proof: We start with K = 1. Then we need to show p1 − p0 1 p1 = −1 ≥1− . p0 p0 σ (17.18) 326 IV Miscellaneous Topics If p1 /p0 ≤ 1/σ then p1 1 p1 −1 =1− ≥1− p0 p0 σ and if p1 /p0 ≥ σ then p1 σ−1 1 p1 −1 = −1≥σ−1≥ =1− . p0 p0 σ σ We proceed with K ≥ 2. It is convenient to denote the left-hand side expression on (17.18) by g(p0 , p1 , . . . , pK ). We start with an easy observation: if pj+1 = pj for some j (0 ≤ j < K) then g(p0 , p1 , . . . , pK ) does not change if we remove pj+1 from the sequence. So without loss of generality we may assume that no two subsequent elements in the given sequence p0 , p1 , . . . , pK are equal. Now let the given sequence p0 , p1 , . . . , pK be such that g(p0 , p1 , . . . , pK ) is minimal. For 0 < j < K we consider the two terms in g(p0 , p1 , . . . , pK ) that contain pj . The contribution of these two terms is given by pj+1 − pj pj pj − pj−1 + = 1− + 1− pj−1 pj pj−1 pj+1 pj−1 pj pj−1 . (17.19) Since p0 , p1 , . . . , pK minimizes g(p0 , p1 , . . . , pK ), when fixing pj−1 and pj+1 , pj must minimize (17.19). If pj+1 ≤ pj−1 then Lemma C.3 (page 437) implies that pj pj pj+1 = 1 or = . pj−1 pj−1 pj−1 This means that pj = pj−1 or pj = pj+1 . Hence, in this case the sequence has two subsequent elements that are equal, which has been excluded above. We conclude that pj+1 > pj−1 . Applying Lemma C.3 once more, we obtain r pj+1 pj = . pj−1 pj−1 Thus it follows that pj−1 < pj = √ pj−1 pj+1 < pj+1 for each j, 0 < j < K, showing that the sequence p0 , p1 , . . . , pK is strictly increasing and each entry pj in the sequence, with 0 < j < K, is the geometric mean of the surrounding entries. This implies that the sequence pj /p0 , 1 ≤ j ≤ K, is geometric and we have pj = αj , 1 ≤ j ≤ K, p0 where r p2 α= > 1. p0 In that case we must have pK p0 g(p0 , p1 , . . . , pK ) = ≥ σ and hence α satisfies αK ≥ σ. Since K−1 X j=0 K−1 X pj+1 (α − 1) = K (α − 1) , −1 = pj j=0 IV.17 Partial Updating 327 the inequality in the lemma follows if K (α − 1) ≥ 1 − 1 . αK This inequality holds for each natural number K and for each real number α ≥ 1. This can be seen by reducing the right-hand side as follows:  (α − 1) αK−1 + . . . + α + 1 αK − 1 1 = = 1− K α αK αK  −1 −2 = (α − 1) α + α . . . + α−K < K (α − 1) . This completes the proof. ✷ Now the next lemma follows easily. Lemma IV.16 Suppose that the component s̃i of s̃ is updated at iteration k1 and next updated at iteration k2 > k1 . Then kX 2 −1 k=k1 ∆sk+1 1 i ≥1− , k σ si where ∆sk+1 denotes the i-th coordinate of the modified Newton step at iteration k +1. i Proof: Applying Lemma IV.15 to the sequence p0 , p1 , . . . , pK defined by (17.14) we get kX 2 −1 1 sk+1 − ski i ≥1− . k σ s i k=k 1 Since s k+1 k − s = ∆s k+1 by definition, the lemma follows. ✷ Theorem IV.17 Let N denote the total number of iterations of the algorithm and ni the total number of updates of s̃i . Then n X i=1 √ ni ≤ 6N n. 
Proof: Recall from (17.12) that ∆sk+1 = λk+1 sk hk+1 . Hence, for 1 ≤ i ≤ n, ∆sk+1 i = λk+1 hk+1 . i i ski Now Lemma IV.16 implies N X k=1 λki hki = N −1 X k=0 λk+1 hk+1 i i   1 ≥ ni 1 − . σ 328 IV Miscellaneous Topics Taking the sum over i we obtain n X i=1 N ni ≤ n σ XX k k λi hi . σ−1 i=1 k=1 The inner sum can be bounded above by n X i=1 λki hki ≤ σ n X hki = σ hk i=1 1 ≤ σ hk √ n. Since hk = δ̃(sk , µk ) ≤ τ we obtain n X i=1 ni ≤ √ N N σ2 τ n σ X √ . στ n = σ−1 σ−1 k=1 Substituting the values of σ and τ specified in the algorithm proves the theorem. ✷ Finally, using the iteration bound of Theorem IV.14 and that each rank-one update requires O(n2 ) arithmetic operations, we may state our final result without further proof. Theorem IV.18 The Full Step Dual Logarithmic Barrier Algorithm with Rank-One Updates requires at most nµ0 36n3 log ε arithmetic operations. The output is a primal-dual pair (x, s) such that xT s ≤ 2ε. 18 Higher-Order Methods 18.1 Introduction In a target-following method the Newton directions ∆x and ∆s to a given target point w in the w-space,1 and at a given positive primal-dual pair (x, s), are obtained by solving the system (10.2): A∆x = 0, A ∆y + ∆s s∆x + x∆s = = 0, ∆w, T (18.1) where ∆w = w − xs. Recall that this system was obtained by neglecting the secondorder term ∆x∆s in the third equation of the nonlinear system (10.1), given by A∆x A ∆y + ∆s = = 0, 0, s∆x + x∆s + ∆x∆s = ∆w. T (18.2) An exact solution — (∆e x, ∆e y, ∆e s) say — of (18.2) would yield the primal-dual pair corresponding to the target w, because (x + ∆e x) (s + ∆e s) = w, as can easily be verified. Unfortunately, finding an exact solution of the nonlinear system (18.2) is hard from a computational point of view. Therefore, following a classical approach in mathematics when dealing with nonlinearity, we linearize the system, and use the solutions of the linearized system (18.1). Denoting its solution simply by (∆x, ∆y, ∆s), the primal-dual pair (x + ∆x, s + ∆s) satisfies (x + ∆x) (s + ∆s) = w − ∆x∆s, and hence, the ‘error’ after the step is given by ∆x∆s. Thus, this error represents the price we have to pay for using a solution of the linearized system (18.1). We refer to it henceforth as the second-order effect. 1 We defined the w-space in Section 9.1, page 220; it is simply the interior of the nonnegative orthant in IRn . 330 IV Miscellaneous Topics Clearly, the second-order effect strongly depends on the actual data of the problem under consideration.2 It would be very significant if we could eliminate the above described second-order effect, or at least minimize it in some way or another. One way to do this is to use so-called higher-order methods.3 The Newton method used so far is considered to be a first-order method. In the next section the search directions for higher-order methods are introduced. Then we devote a separate section (Section 18.3) to the estimate of the (higher-order) error term E r (α), where r ≥ 1 denotes the order of the search direction and α the step-size. The results of Section 18.3 are applied in two subsequent sections. In Section 18.4 we first discuss and extend the definition of the primal-dual Dikin direction, as introduced in Appendix E for the self-dual problem, to a primal-dual Dikin direction for the problems (P ) and (D) in standard format. Then we consider a higher-order version of √ this direction, and we show that the iteration bound can be reduced by the factor n without increasing the complexity per iteration. 
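Before proceeding, it may help to see the second-order effect of this section numerically. The following sketch uses hypothetical data (a random matrix A and positive vectors x and s); it solves the linearized system (18.1) for a given target w and checks that the deviation from the target after the step is exactly the product ∆x∆s (cf. Lemma IV.19 below with r = 1). It is only an illustration and not part of the methods analyzed in this chapter.

```python
import numpy as np

def newton_step(A, x, s, dw):
    """Solve the linearized system (18.1):
       A dx = 0,  A^T dy + ds = 0,  s*dx + x*ds = dw."""
    m, n = A.shape
    K = np.zeros((m + 2 * n, m + 2 * n))
    K[:m, :n] = A                      # A dx = 0
    K[m:m + n, n:n + m] = A.T          # A^T dy + ds = 0
    K[m:m + n, n + m:] = np.eye(n)
    K[m + n:, :n] = np.diag(s)         # s dx + x ds = dw
    K[m + n:, n + m:] = np.diag(x)
    rhs = np.concatenate([np.zeros(m + n), dw])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:n + m], sol[n + m:]

# hypothetical interior pair for a random problem (b := A x, c := A^T y + s)
rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
x = rng.uniform(0.5, 1.5, n)
s = rng.uniform(0.5, 1.5, n)

w_target = np.full(n, 1.0)             # some target in the w-space
dw = w_target - x * s
dx, dy, ds = newton_step(A, x, s, dw)

# after the step the deviation from the target is the second-order term dx*ds
print(np.allclose((x + dx) * (s + ds), w_target + dx * ds))   # True
```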
Then, in Section 18.5 we apply the results of Section 18.3 to the primal-dual logarithmic barrier method, as considered in Chapter 7 of Part II. This section is based on a paper of Zhao [320]. Here the use of higher-order search directions does not improve the iteration bound when compared with the (first-order) full Newton step method. Recall that in the full Newton step method the iterates stay very close to the central path. This can be expressed by saying this method keeps the iterates in a ‘narrow cone’ around the central path. We shall see that a higher-order method allows the iterates to stay further away from the central path, which makes such a method a ‘wide cone’ method. 18.2 Higher-order search directions Suppose that we are given a positive primal-dual pair (x, s) and we want to find the primal-dual pair corresponding to w̄ := xs + ∆w for some displacement ∆w in the w-space. Our aim is to generate suitable search directions ∆x and ∆s at (x, s). One way to derive such directions is to consider the linear line segment in the w-space connecting xs with w̄. A parametric representation of this segment is given by xs + α∆w, 2 3 0 ≤ α ≤ 1. In the w-space the ideal situation is that the curve (x + α∆x) (s + α∆s) , 0 ≤ α ≤ 1, moves from xs in a straight line to the target w. As a matter of fact, the second-order effect ‘blows’ the curve away from this straight line segment. Considering α as a time parameter, we can think of the iterate (x + α∆x) (s + α∆s) as the trajectory of a vessel sailing from xs to w and of the secondorder effect as a wind blowing it away from its trajectory. To reach the target w the bargeman can put over the helm now and then, which in our context is accomplished by updating the search direction. In practice, a bargeman will anticipate the fact that the wind is (locally) constant and he can put the helm in a fixed position that prevents the vessel being driven from its correct course. It may be interesting to mention a computer game called Schiet OpT M , designed by Brinkhuis and Draisma, that is based on this phenomenon [51]. It requires the player to find an optimal path in the w-space to the origin. The idea of using higher-order search directions as presented in this chapter is due to Monteiro, Adler and Resende [220], and was later investigated by Zhang and Zhang [318], Hung and Ye [150], Jansen et al. [160] and Zhao [320]. The idea has been applied also in the context of a predictorcorrector method by Mehrotra [202, 205]. IV.18 Higher-Order Methods 331 To any point of this segment belongs a primal-dual pair and we denote this pair by (x(α), s(α)).4 Since x(α) and s(α) depend analytically5 on α there exist x(i) and s(i) , with i = 0, 1, . . ., such that x(α) = ∞ X x(i) αi , s(α) = ∞ X s(i) αi , 0 ≤ α ≤ 1. i=0 i=0 (18.3) Obviously, x(0) = x = x(0) and s(0) = s = s(0) . From Ax(α) = b, for each α ∈ [0, 1], we derive Ax(0) = b, Ax(i) = 0, i ≥ 1. (18.4) Similarly, there exist unique y (i) and s(i) , i = 0, 1, . . ., such that AT y (0) + s(0) = c, AT y (i) + s(i) = 0, i ≥ 1. (18.5) Furthermore, from x(α)s(α) = xs + α∆w, by expanding x(α) and s(α) and then equating terms with equal powers of α, we get the following relations: (0) (1) x s k X x(0) s(0) = xs (18.6) (0) (1) = ∆w (18.7) x(i) s(k−i) = 0, +s x k = 2, 3, . . . . (18.8) i=0 The first relation implies once more that x(0) = x and s(0) = s. Using this and (18.4), (18.5) and (18.7) we obtain Ax(1) = 0 (1) = = 0 ∆w. 
T (1) A y +s sx(1) + xs(1) 4 5 (18.9) In other chapters of this book x(α) denotes the α-center on the primal central path. To avoid any misunderstanding it might be appropriate to emphasize that in this chapter x(α) — as well as s(α) — has a different meaning, as indicated. Note that x(α) and s(α) are uniquely determined by the relations Ax(α) = b, AT y(α) + s(α) x (α) s(α) = = c, s > 0, xs + α∆w. x > 0, Obviously, the right-hand sides in these relations depend linearly (and hence analytically) on α. Since the Jacobian matrix of the left-hand side is nonsingular, the implicit function theorem (cf. Proposition A.2 in Appendix A) implies that x(α), y(α) and s(α) depend analytically on α. See also Section 16.2.1. 332 IV Miscellaneous Topics This shows that x(1) and s(1) are just the primal-dual Newton directions at (x, s) for the target w̄ = xs + ∆w.6 Using (18.4), (18.5) and (18.8) we find that the higher-order coefficients x(k) , y (k) and s(k) , with k ≥ 2, satisfy the linear system T (k) A y Ax(k) + s(k) = = sx(k) + xs(k) 0 0 (18.10) − = k−1 X x(i) s(k−i) , k = 2, 3, . . . , i=1 thus finding a recursive expression for the higher-order coefficients. The remarkable thing here is that the coefficient matrix in (18.10) is the same as in (18.9). This has the important consequence that as soon as the standard (first-order) Newton directions x(1) and s(1) have been calculated from the linear system (18.9), the second-order terms x(2) and s(2) can be computed from a linear system with the same coefficient matrix. Having x(2) and s(2) , we can compute x(3) and s(3) , and so on. Hence, from a computational point of view the higher-order terms x(k) and s(k) , with k ≥ 2, can be obtained relatively cheaply. Assuming that the computation of the Newton directions requiresO(n3 ) arithmetic operations, the computation of each subsequent pair x(k) , s(k) of higher-order coefficients requires O(n2 ) arithmetic operations. For example, if we compute the  (k) (k) pairs x , s for k = 1, 2, . . . , n, this doubles the computational cost per iteration. There is some reason to expect, however, that we will obtain a more accurate search direction; this may result in a speedup that justifies the extra computational burden in the computation. By truncating the expansion (18.3), we define the primal-dual Newton directions of order r at (x, s) with step-size α by ∆r,α x := r X x(i) αi , ∆r,α s := r X s(i) αi . (18.11) i=1 i=1 Moving along these directions we arrive at xr (α) := x + ∆r,α x, sr (α) := s + ∆r,α s. Recall that we started this section by taking w̄ = xs + ∆w as the target point in the w-space. Now that we have introduced the step-size α it is more natural to consider w̄(α) := xs + α∆w as the target. In the following lemma we calculate xr (α) sr (α), which is the next iterate in the w-space, and hence obtain an expression for the deviation from the target w̄(α) after the step. 6 Exercise 84 Verify that y (1) can be solved from (18.9) by the formula y (1) = − AXS −1 AT −1 AS −1 ∆w. This generalizes the expression for the logarithmic barrier direction in Exercise 35, page 111. Given y (1) , s(1) and x(1) follow from s(1) = −AT y (1) ,  x(1) = S −1 ∆w − xs(1) . IV.18 Higher-Order Methods 333 Lemma IV.19 r 2r X r x (α) s (α) = xs + α∆w + r X k α k=r+1 (i) (k−i) x s i=k−r ! . Proof: We may write xr (α) := x + ∆r,α x = r X x(i) αi , i=0 and we have a similar expression for sr (α). Therefore, ! ! r r X X s(i) αi . 
x(i) αi xr (α) sr (α) = (18.12) i=0 i=0 The right-hand side can be considered as a polynomial in α of degree 2r. We consider the coefficient of αk for 0 ≤ k ≤ 2r. If 0 ≤ k ≤ r then the coefficient of αk is given by k X x(i) s(k−i) , i=0 By (18.8), this expression vanishes if k ≥ 2. Furthermore, if k = 1 the expression is equal to ∆w, by (18.7) and if k = 0 it is equal to xs, by (18.6). So it remains to consider the coefficient of αk on the right-hand side of (18.12) for r + 1 ≤ k ≤ 2r. For these values of k the corresponding coefficient in (18.12) is given by r X x(i) s(k−i) . i=k−r Hence, collecting the above results, we get xr (α) sr (α) = xs + α∆w + 2r X αk k=r+1 r X x(i) s(k−i) i=k−r ! . This completes the proof. (18.13) ✷ In the next section we further analyze the error term r E (α) := 2r X k=r+1 k α r X (i) (k−i) x s i=k−r ! . (18.14) We conclude this section with two observations. First, taking r = 1 we get E 1 (α) = α2 x(1) s(1) = α2 ∆x∆s, where ∆x and ∆s are the standard primal-dual Newton directions at (x, s). This is in accordance with earlier results (see, e.g., (10.12)). If we use a first-order Newton step 334 IV Miscellaneous Topics then the error is of order two in α. In the general case, of a step of order r, the error term E r (α) is of order r + 1 in α. The second observation concerns the orthogonality of the search directions in the xand s-spaces. It is immediate from the first two equations in (18.9) and (18.10) that As a consequence,  x(i) T s(j) = 0, ∀i ≥ 1, ∀j ≥ 1. T (∆r,α x) ∆r,α s = 0, and also, from Lemma IV.19, T (xr (α)) sr (α) = eT (xs + α∆w) = eT w̄(α). Thus, after the step with size α, the duality gap is equal to the gap at the target w̄(α). Figure 18.1 illustrates the use of higher-order search directions. w2 4 ✻ ✻ start 3 ■ ■ 2 ■ ■ ■ 1 ✻ r=1 r=2 r=3 r=4 r=5 target 0 0 1 2 3 4 ✲ w1 Figure 18.1 Trajectories in the w-space for higher-order steps with r = 1, 2, 3, 4, 5. IV.18 Higher-Order Methods 18.3 335 Analysis of the error term The main task in the analysis of the higher-order method is to estimate the error term E r (α), given by (18.14). Our first estimation is very loose. We write kE r (α)k ≤ 2r X r X αk k=r+1 i=k−r x(i) s(k−i) ≤ 2r X αk k=r+1 r X x(i) s(k−i) , (18.15) i=k−r and we concentrate on estimating the norms in the last sum. We use the vectors d and v introduced in Section 10.4: r √ x (18.16) , v = xs. d= s Then the third equation in (18.9) can be rewritten as ∆w v d−1 x(1) + ds(1) = (18.17) and the third equation in (18.10) as d−1 x(k) + ds(k) = −v −1 k−1 X x(i) s(k−i) , k = 2, 3, . . . . (18.18) i=1 Since x(k) and s(k) are orthogonal for k ≥ 1, the vectors d−1 x(k) and ds(k) are orthogonal as well. Therefore, d−1 x(k) 2 + ds(k) 2 = d−1 x(k) + ds(k) 2 , k ≥ 1. Hence, defining q (k) := d−1 x(k) + ds(k) , k ≥ 1, (18.19) we have for each k ≥ 1, d−1 x(k) ≤ q (k) , ds(k) ≤ q (k) , and as a consequence, for 1 ≤ i ≤ k − 1 we may write x(i) s(k−i) = d−1 x(i) ds(k−i) ≤ d−1 x(i) ds(k−i) ≤ q (i) q (k−i) . (18.20) Substitution of these inequalities in the bound (18.15) for E r (α) yields kE r (α)k ≤ 2r X k=r+1 αk r X q (i) q (k−i) . i=k−r We proceed by deriving upper bounds for q (k) , k ≥ 1. (18.21) 336 IV Miscellaneous Topics Lemma IV.20 For each k ≥ 1, q (k) ≤ ϕk v −1 k−1 ∞ k q (1) , (18.22) where the integer sequence ϕ1 , ϕ2 , . . . is defined recursively by ϕ1 = 1 and ϕk = k−1 X ϕi ϕk−i . (18.23) i=1 Proof: The proof is by induction on k. Note that (18.22) holds trivially if k = 1. Assume that (18.22) holds for q (ℓ) if 1 ≤ ℓ < k. 
We complete the proof by deducing from this assumption that the lemma is also true for q (k) . For k ≥ 2 we obtain from the definition (18.19) of q (k) and (18.18) that q (k) = −v −1 k−1 X x(i) s(k−i) . i=1 Hence, using (18.20), q (k) ≤ v −1 ∞ k−1 X q (i) q (k−i) . i=1 At this stage we apply the induction hypothesis to the last two norms, yielding q (k) ≤ v −1 ∞ k−1 X ϕi v −1 i=1 i−1 ∞ q (1) i ϕk−i v −1 k−i−1 ∞ q (1) k−i , which can be simplified to q (k) ≤ v −1 k−1 ∞ q (1) X k k−1 ϕi ϕk−i . i=1 Finally, using (18.23) the lemma follows. ✷ The solution of the recursion (18.23) with ϕ1 = 1 is given by7   2k − 2 1 ϕk = . k k−1 (18.24) This enables us to prove our next result. Lemma IV.21 For each k = r + 1, . . . , 2r, r X i=k−r 7 q (i) q (k−i) ≤ 22k−3 −1 v k k−2 ∞ q (1) k . Exercise 85 Prove that (18.24) is the solution of the recursion in (18.23) satisfying ϕ1 = 1 (cf., e.g., Liu [184]). IV.18 Higher-Order Methods 337 Proof: Using Lemma IV.20 we may write r X i=k−r q (i) r X q (k−i) ≤ ϕi v −1 i=k−r i−1 ∞ q (1) i ϕk−i v −1 k−i−1 ∞ q (1) k−i , which is equivalent to r X q (k−i) ≤ v −1 q (i) i=k−r k−2 ∞ q (1) k r X ϕi ϕk−i . i=k−r For the last sum we use again a loose bound: r X i=k−r ϕi ϕk−i ≤ k−1 X ϕi ϕk−i = ϕk . i=1 Using (18.24 ) and k ≥ 2 we can easily derive that ϕk ≤ 22k−3 , k k ≥ 2. Substituting this we obtain r X q (i) i=k−r q (k−i) ≤ 22k−3 −1 v k k−2 ∞ q (1) k , proving the lemma. ✷ Now we are ready for the main result of this section. Theorem IV.22 We have 2r X 1 αk 22k−3 v −1 kE (α)k ≤ r+1 r k=r+1 k−2 ∞ q (1) k . Proof: From (18.21) we recall that kE r (α)k ≤ 2r X k=r+1 αk r X q (i) q (k−i) . i=k−r Replacing the second sum by the upper bound in Lemma IV.21 and using that k ≥ r+1 in the first sum, we obtain the result. ✷ 18.4 Application to the primal-dual Dikin direction 18.4.1 Introduction The Dikin direction, described in Appendix E, is one of the directions that can be used for solving the self-dual problem. In the next section we show that its definition can 338 IV Miscellaneous Topics easily be adapted to problems (P ) and (D) in standard format. It will become clear that the analysis of the self-dual model also applies to the standard model and vice versa. Although we don’t work it out here, we mention that use of the (first-order) Dikin direction leads to an algorithm for solving the standard model that requires at most T x0 s0 τ n log ε iterations, where (x0 , s0 ) denotes the initial primal-dual pair, ε is an upper bound for the duality gap upon termination of the algorithm and τ an upper bound for the distance of the iterates to the central path.8 The complexity per iteration is O(n3 ), as usual. This is in accordance with the bounds in Appendix E for the self-dual model. By using higher-order versions of the Dikin direction the complexity can be improved by a r−1 √ factor (τ n) 2r . Note that this factor goes to τ n if r goes to infinity. The complexity per iteration is O(n3 + rn2 ). Hence, when taking r = n, the complexity per iteration remains√ O(n3 ). In that case we show that the iteration bound is improved by the factor τ n. When τ = O(1), which can be assumed without loss of generality, we obtain the iteration bound O √ n log x0 T ε s0 ! , which is the best iteration bound for interior point methods known until now. 18.4.2 The (first-order) primal-dual Dikin direction Let (x, s) be a positive primal-dual pair for (P ) and (D) and let ∆x and ∆s denote displacements in the x-space and the s-space. Moving along ∆x and ∆s we arrive at x+ := x + ∆x, s+ := s + ∆s. 
The new iterates will be feasible if A∆x = 0, AT ∆y + ∆s = 0, where ∆y represents the displacement in the y-space corresponding to ∆s, and both x+ and s+ are nonnegative. Since ∆x and ∆s are orthogonal, the new duality gap is given by x+ 8 T s+ = xT s + xT ∆s + sT ∆x. Originally, the Dikin direction was introduced for the standard format. See Jansen, Roos and Terlaky [156]. IV.18 Higher-Order Methods 339 Replicating Dikin’s idea, just as in Section E.2, we replace the nonnegativity conditions by the condition9 ∆x ∆s ≤ 1. + x s This can be rewritten as x+ − x s+ − s + ≤ 1, x s showing that the new iterates are sought within an ellipsoid, called the Dikin ellipsoid at the given pair (x, s). Since our aim is to minimize the duality gap, we consider the optimization problem   ∆x ∆s min eT (s∆x + x∆s) : A∆x = 0, AT ∆y + ∆s = 0, ≤ 1 . (18.25) + x s The crucial observation is that (18.25) determines the displacements ∆x and ∆s uniquely. The arguments are almost the same as in Section E.2. Using the vectors d and v in (18.16), x and s can be rescaled to the same vector v: d−1 x = v, ds = v. As usual, we rescale ∆x and ∆s accordingly to dx := d−1 ∆x, Then dx ∆x = , x v ds := d∆s. (18.26) ∆s ds = , s v and moreover, ∆x∆s = dx ds . 9 Dikin introduced the so-called primal affine-scaling direction at a primal feasible x (x > 0) by minimizing the primal objective cT (x + ∆x) over the ellipsoid ∆x x ≤ 1, subject to A∆x = 0. So the primal affine-scaling direction is determined as the unique solution of min n cT ∆x : A∆x = 0, ∆x x o ≤1 . Dikin showed convergence of the primal affine-scaling method ([63, 64, 65]) under some nondegeneracy assumptions. Later, without nondegeneracy assumptions, Tsuchiya [268, 270] proved convergence of the method with damped steps. Dikin and Roos [66] proved convergence of a fullstep method for the special case that the given problem is homogeneous. Despite many attempts, until now it has not been possible to show that the method is polynomial. For a recent survey paper we refer the reader to Tsuchiya [272]. The approach in this section seems to be the natural generalization to the primal-dual framework. 340 IV Miscellaneous Topics Also, the scaled displacements dx and ds are orthogonal. Now the vector occurring in the ellipsoidal constraint in (18.25) can be reduced to dx + ds ∆x ∆s + = . x s v Moreover, the variable vector in the objective of problem (18.25) can be written as   ∆x ∆s = v (dx + ds ) . s∆x + x∆s = xs + x s With dw := dx + ds , (18.27) the vectors dx and ds are uniquely determined as the orthogonal components of dw in the null space and row space of AD, so we have dx = PAD (dw ) (18.28) ds = dw − dx . (18.29) Thus we can solve the problem (18.25) by solving the much simpler problem   dw T min v dw : ≤1 . (18.30) v The solution of (18.30) is given by 3 dw = − (xs) 2 v3 =− . 2 kv k kxsk It follows that dx and ds are uniquely determined by the system ADdx = 0 (AD) dy + ds dx + ds = = 0 dw . T In terms of the unscaled displacements this can be rewritten as A∆x AT ∆y + ∆s = = 0 0 s∆x + x∆s = ∆w, (18.31) where 2 v4 (xs) =− . (18.32) kv 2 k kxsk We conclude that the solution of the minimization problem (18.25) is uniquely determined by the linear system (18.31). Hence the (first-order) Dikin directions ∆x and ∆s are the Newton directions at (x, s) corresponding to the displacement ∆w in the w-space, as given by (18.32). We therefore call ∆w the Dikin direction in the w-space. In the next section we consider an algorithm using higher-order Dikin directions. 
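First, as a small illustration of the construction just described, the following sketch computes the Dikin direction ∆w from (18.32) and the corresponding displacements from (18.31) for hypothetical data. It is a numerical aid only; the algorithm itself is stated in the next section.

```python
import numpy as np

def dikin_direction(A, x, s):
    """Primal-dual Dikin direction: dw from (18.32), then (dx, dy, ds) from (18.31)."""
    m, n = A.shape
    xs = x * s
    dw = -xs**2 / np.linalg.norm(xs)                  # (18.32)
    K = np.block([
        [A,                np.zeros((m, m)), np.zeros((m, n))],  # A dx = 0
        [np.zeros((n, n)), A.T,              np.eye(n)],         # A^T dy + ds = 0
        [np.diag(s),       np.zeros((n, m)), np.diag(x)],        # s dx + x ds = dw
    ])
    rhs = np.concatenate([np.zeros(m + n), dw])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:n + m], sol[n + m:]

# hypothetical interior pair, as before
rng = np.random.default_rng(1)
m, n = 3, 6
A = rng.standard_normal((m, n))
x = rng.uniform(0.5, 1.5, n)
s = rng.uniform(0.5, 1.5, n)

dx, dy, ds = dikin_direction(A, x, s)
# the step lies on the boundary of the Dikin ellipsoid ...
print(np.isclose(np.linalg.norm(dx / x + ds / s), 1.0))
# ... and with a unit step the duality gap drops to x^T s - ||xs||
print(np.isclose((x + dx) @ (s + ds), x @ s - np.linalg.norm(x * s)))
```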
Using the estimates of the error term E r (α) in the previous section we analyze this algorithm in subsequent sections. ∆w = vdw = − IV.18 Higher-Order Methods 18.4.3 341 Algorithm using higher-order Dikin directions For the rest of this section, ∆w denotes the Dikin direction in the w-space as given by (18.32). For r = 1, 2, . . . and for some fixed step-size α that is specified later, the corresponding higher-order Newton steps of order r at (x, s) are given by (18.11). The iterates after the step depend on the step-size α. To express this dependence we denote them as x(α) and s(α) as in Section 18.2. We consider the following algorithm. Higher-Order Dikin Step Algorithm for the Standard Model Input: An accuracy parameter ε > 0; a step-size parameter α, 0 < α ≤ 1;  a positive primal-dual pair x0 , s0 . begin x := x0 ; s := s0 ; while xT s ≥ ε do begin x := x(α) = x + ∆r,α x; s := s(α) = s + ∆r,α s end end Below we analyze this algorithm. Our aim is to keep the iterates (x, s) within some cone max (xs) δc (xs) = ≤τ min (xs) around the central path, for some fixed τ > 1; τ is chosen such that δc (x0 s0 ) ≤ τ. 18.4.4 Feasibility and duality gap reduction As before, we use the superscript step of size α at (x, s). Thus, + to refer to entities after the higher-order Dikin x+ := x(α) = x + ∆r,α x, s+ := s(α) = s + ∆r,α s, and from Lemma IV.19, x+ s+ = x(α)s(α) = xs + α∆w + E r (α), (18.33) where the higher-order error term E r (α) is given by (18.14). The step-size α is feasible if the new iterates are positive. Using the same (simple continuity) argument as in the proof of Lemma E.2, page 455, we get the following result. 342 IV Miscellaneous Topics Lemma IV.23 If ᾱ is such that x(α)s(α) > 0 for all α satisfying 0 ≤ α ≤ ᾱ, then the step-size ᾱ is feasible. Lemma IV.23 implies that the step-size ᾱ is feasible if xs + α∆w + E r (α) ≥ 0, 0 ≤ α ≤ ᾱ. At the end of Section 18.2 we established that after the step the duality gap attains the value eT (xs + α∆w). This leads to the following lemma. Lemma IV.24 If the step-size α is feasible then   T α xT s. x+ s+ ≤ 1 − √ n Proof: We have 2 T x+ s+ = eT (xs) xs − α kxsk ! = xT s − α kxsk . The Cauchy–Schwarz inequality implies 1 eT (xs) xT s kxsk = √ kek kxsk ≥ √ = √ , n n n and the lemma follows. 18.4.5 ✷ Estimate of the error term By Theorem IV.22 the error term E r (α) satisfies kE r (α)k ≤ 2r X 1 αk 22k v −1 8 (r + 1) k=r+1 k−2 ∞ q (1) k . In the present case we have, from (18.19), (18.17) and (18.32), q (1) = Hence q (1) = Therefore, v −1 ∞ q (1) ∆w v3 =− 2 . v kv k v2 v3 ≤ kvk∞ = kvk∞ = max (v). 2 kv k kv 2 k max (v) ≤ = min (v) Substituting this we get kE r (α)k ≤ 1 v −1 8 (r + 1) −2 ∞ 2r X k=r+1 s √ max (xs) p = δc (xs) ≤ τ . min (xs) αk 22k 2r √ k min (xs) X τ = 8 (r + 1) k=r+1 √ k 4α τ . (18.34) IV.18 Higher-Order Methods 18.4.6 343 Step size Assuming δc (x, s) ≤ τ , with τ > 1, we establish a bound for the step-size α such that this property is maintained after a higher-order Dikin step. The analysis follows the same lines as the analysis in Section E.4 of the algorithm for the self-dual model with first-order Dikin steps. As there, we derive from δc (x, s) ≤ τ the existence of positive numbers τ1 and τ2 such that τ1 e ≤ xs ≤ τ2 e, with τ2 = τ τ1 . (18.35) Without loss of generality we take τ1 = min (xs). The following lemma generalizes Lemma E.4. Lemma IV.25 Let τ > 1. Suppose that δc (xs) ≤ τ and let τ1 and τ2 be such that (18.35) holds. 
If the step-size α satisfies s ( √ ) 1 1 r 2τ1 τ kxsk , , √ , √ α ≤ min 2τ2 4 τ 4 τ kxsk then we have δc (x+ s+ ) ≤ τ . Proof: Using (18.33) and the definition of ∆w we obtain 2 x+ s+ = x(α)s(α) = xs + α∆w + E r (α) = xs − α (xs) + E r (α). kxsk Using the first bound on α in the lemma, we can easily verify that the map t 7→ t − αt2 kxsk is an increasing function for t ∈ [0, τ2 ]. Application of this map to each component of the vector xs gives     2 α (xs) ατ22 ατ12 e ≤ xs − e. ≤ τ2 − τ1 − kxsk kxsk kxsk It follows that     ατ12 ατ22 r + + τ1 − e + E (α) ≤ x s ≤ τ2 − e + E r (α). kxsk kxsk Hence, assuming for the moment that the Dikin step of size α is feasible, we certainly have δ(x+ s+ ) ≤ τ if      ατ22 ατ12 r e + E (α) ≥ τ2 − e + E r (α). τ τ1 − kxsk kxsk 344 IV Miscellaneous Topics Since τ2 = τ τ1 , this reduces to α  τ22 − τ τ12 kxsk  e + (τ − 1)E r (α) ≥ 0. Since τ22 − τ τ12 = (τ − 1) τ1 τ2 we can divide by τ − 1, thus obtaining ατ1 τ2 e + E r (α) ≥ 0. kxsk This inequality is certainly satisfied if kxsk kE r (α)k ≤ ατ1 τ2 . Using the upper bound (18.34) for E r (α) it follows that we have δ(x+ s+ ) ≤ τ if α is such that 2r √ k kxsk min (xs) X 4α τ ≤ ατ1 τ2 . 8 (r + 1) k=r+1 Since min (xs) = τ1 , this inequality simplifies to 2r X kxsk 8 (r + 1) k=r+1 √ k 4α τ ≤ ατ2 . √ The second bound in the lemma implies that 4α τ < 1. Therefore, the last sum is bounded above by 2r X √ r+1 √ k . 4α τ ≤ r 4α τ k=r+1 Substituting this we arrive at the inequality √ r+1 r kxsk (4α τ ) ≤ ατ2 . 8 (r + 1) Omitting the factor r/(r + 1), we can easily check that this inequality certainly holds if s √ 1 r 2τ1 τ , α≤ √ 4 τ kxsk which is the third bound on α in the lemma. Thus we have shown that for each stepsize α satisfying the bounds in the lemma, we have δ(x+ s+ ) ≤ τ . But this implies that the coordinates of x+ s+ do not vanish for any of these step-sizes. By Lemma IV.23 this also implies that the given step-size α is feasible. Hence the lemma follows. ✷ IV.18 Higher-Order Methods 18.4.7 345 Convergence analysis With the result of the previous section we can now derive an upper bound for the number of iterations needed by the algorithm. Lemma IV.26 Let 4/n ≤ τ ≤ 4n. Then, with the step-size s 1 r 2 √ , α= √ 4 τ τn the Higher-Order Dikin Step Algorithm for the Standard Model requires at most r√ T √ r τn x0 s0 log 4 τn 2 ε iterations.10 The output is a feasible primal-dual pair (x, s) such that δc (xs) ≤ τ and xT s ≤ ε. Proof: Initially we are given a feasible primal-dual pair (x0 , s0 ) such that δc (x0 s0 ) ≤ τ. The given step-size α guarantees that these properties are maintained after each iteration. This can be deduced from Lemma IV.25, as we now show. It suffices to show that the specified value of α meets the bound in Lemma IV.25. Since τ n ≥ 4 we have s 1 1 r 2 √ ≤ √ , α= √ 4 τ τn 4 τ √ showing that α meets the second bound. Since kxsk ≤ τ2 n we have √ √ √ 2τ1 τ 2τ1 τ 2 τ 2 √ = √ = √ , ≥ kxsk τ2 n τ n τn which implies that α also meets the third bound in Lemma IV.25. Finally, for the first bound in the lemma, we may write √ √ n kxsk τ1 n 1 ≥ = ≥ √ . 2τ2 2τ2 2τ 4 τ The last inequality follows because τ ≤ 4n. Thus we have shown that α meets the bounds in Lemma IV.25. As a consequence, the property δc (xs) ≤ τ is maintained during the course of the algorithm. This also implies that the algorithm is well defined and, hence, the only remaining task is to derive the iteration bound in the lemma. By Lemma IV.24, each iteration reduces the duality gap by a factor 1 − θ, where s 1 r 2 α √ . 
θ= √ = √ n 4 τn τn 10 When r = 1 the step-size becomes α= 2τ 1 √ , n which is a factor of 2 smaller than the step-size in Section E.5. As a consequence the iteration bound is a factor of 2 worse than in Section E.5. This is due to a weaker estimate of the error term. 346 IV Miscellaneous Topics Hence, by Lemma I.36, the duality gap satisfies xT s ≤ ε after at most r√ T √ r τn x0 s0 1 nµ0 log = 4 τn log θ ε 2 ε iterations. This completes the proof. ✷  2 Recall that each iteration requires O n3 + rn arithmetic operations. In the rest of this section we take the order r of the search direction equal to r = n. Then the complexity per iteration is still O n3 just as in the case of a first-order method. The iteration bound of Lemma IV.26 then becomes r√ T √ n τn x0 s0 log . 4 τn 2 ε Now, assuming τ ≤ 4n, we have r√ √ τn n ≤ n n. 2 The last expression is maximal for n = 3 and is then equal to 1.44225. Thus we may state without further proof the following theorem. Theorem IV.27 Let 4/n ≤ τ ≤ 4n and r = n. Then the Higher-Order Dikin Step Algorithm for the Standard Model stops after at most T √ x0 s0 6 τ n log ε iterations. Each iteration requires O(n3 ) arithmetic operations. For τ = 2, which can be taken without loss of generality, the iteration bound of Theorem IV.27 becomes T ! √ x0 s0 , n log O ε which is the best obtainable bound. 18.5 Application to the primal-dual logarithmic barrier method 18.5.1 Introduction In this section we apply the higher-order approach to the (primal-dual) logarithmic barrier method. If the target value of the barrier parameter is µ, then the search direction in the w-space at a given primal-dual pair (x, s) is given by ∆w = µe − xs. We measure the proximity from (x, s) to the target µe by the usual measure r r 1 1 xs µe µe − xs √ − = √ . (18.36) δ(xs, µ) = 2 µe xs 2 µ xs IV.18 Higher-Order Methods 347 In this chapter we also use an infinity-norm based proximity of the central path, namely r r √ µ µe µe = max = max . (18.37) δ∞ (xs, µ) := i i xs ∞ xs vi Recall from Lemma II.62 that we always have δ∞ (xs, µ) ≤ ρ (δ(xs, µ)) . (18.38) Just as in the previous section, where we used the Dikin direction, our aim is to consider a higher-order logarithmic barrier method that keeps the iterates within some cone around the central path. The cone is obtained by requiring that the primal-dual pairs (x, s) generated by the method are such that there exists a µ > 0 such that δ(xs, µ) ≤ τ, and δ∞ (xs, µ) ≤ ζ (18.39) where τ and ζ denote some fixed positive numbers that specify the ‘width’ of the cone around the central path in which the iterates are allowed to move. When ζ = ρ(τ ) it follows from (18.38) that δ(xs, µ) ≤ τ ⇒ δ∞ (xs, µ) ≤ ζ. Hence, the logarithmic barrier methods considered in Part II fall within the present framework √ with ζ = ρ(τ ). The full Newton step method considered in Part II uses τ = 1/ 2. In the large-update methods of Part II the updates of the barrier parameter µ reduce µ by a factor √ 1 − θ, where θ = O(1). As a consequence, after a barrier update we have δ(xs, µ) = O( n). Hence, we may say that the full Newton step methods in Part II keep the iterates in√a cone with τ = O(1), and the large-update methods in a wider cone with τ = O( n). Recall that the method using the wider cone — the large-update methods — are multistep methods. Each single step is a damped (firstorder) Newton step and the progress is measured by the decrease of the (primal-dual) logarithmic barrier function. 
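Both proximity measures are inexpensive to evaluate. The following sketch, with an arbitrary positive vector standing in for xs, shows how (18.36), (18.37) and the cone condition (18.39) might be checked for a given value of µ; the data is hypothetical and serves only as an illustration.

```python
import numpy as np

def delta(xs, mu):
    """Proximity measure (18.36): (1/2) || sqrt(xs/mu) - sqrt(mu/xs) ||."""
    u = np.sqrt(xs / mu)
    return 0.5 * np.linalg.norm(u - 1.0 / u)

def delta_inf(xs, mu):
    """Infinity-norm proximity (18.37): || sqrt(mu/xs) ||_inf."""
    return np.max(np.sqrt(mu / xs))

def in_cone(xs, mu, tau, zeta):
    """Cone condition (18.39) for a given value of mu."""
    return delta(xs, mu) <= tau and delta_inf(xs, mu) <= zeta

# an arbitrary positive vector in the w-space and a barrier value
xs = np.array([0.8, 1.3, 0.9, 1.1])
mu = 1.0
print(delta(xs, mu), delta_inf(xs, mu))
print(in_cone(xs, mu, tau=np.sqrt(len(xs)), zeta=2.0))   # True for this data
```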
In this √ section we consider a method that works within a ‘wide’ cone, with τ = O( n) and ζ = O(1), but we use higher-order Newton steps instead of damped first-order steps. The surprising feature of the method is that progress can be controlled by using the proximity measures δ(xs, µ) and δ∞ (xs, µ). We show that after an update of the barrier parameter a higher-order step reduces the proximity δ(xs, µ) by a factor smaller than one and keeps the proximity δ∞ (xs, µ) under a fixed threshold value ζ ≥ 2. Then the barrier parameter value can be decreased to a smaller value while respecting the cone condition  we obtain a ‘wide-cone method’ √ (18.39). In this way whose iteration bound is O( n log log x0 )T s0 /ε . Each iteration consists of a single higher-order Newton step. Below we need to analyze the effect of a higher-order Newton step on the proximity measures. For that purpose the error term must be estimated. 18.5.2 Estimate of the error term Recall from Lemma IV.19 that the error term E r (α) is given by kE r (α)k ≤ 2r X 1 αk 22k v −1 8 (r + 1) k=r+1 k−2 ∞ q (1) k , (18.40) 348 IV Miscellaneous Topics where v = √ xs. In the present case, (18.19) and (18.17) give q (1) = µe − xs ∆w = √ . v xs Hence, using (18.36) and denoting δ(xs, µ) by δ, we find √ √ q (1) = 2 µ δ ≤ 2 µ τ. Furthermore by using (18.37) and putting δ∞ := δ∞ (xs, µ) we have v −1 ∞ 1 =√ µ r µ xs ∞ δ∞ ζ = √ ≤ √ . µ µ Substituting these in (18.40) we get11 kE r (α)k ≤ 2r 2r X X µ µ k k (8αδδ ) ≤ (8ατ ζ) . ∞ 2 8(r + 1)δ∞ 8(r + 1)ζ 2 k=r+1 (18.41) k=r+1 Below we always make the natural assumption that α ≤ 1. Moreover, δ and δ∞ always denote δ(xs, µ) and δ∞ (xs, µ) respectively. Lemma IV.28 Let the step-size be such that α ≤ 1/ (8δδ∞ ). Then r+1 kE r (α)k ≤ rµ (8αδδ∞ ) 2 r+1 8δ∞ . Proof: Since 8αδδ∞ ≤ 1, we have 2r X k=r+1 k (8αδδ∞ ) ≤ r (8αδδ∞ ) r+1 . Substitution in (18.41) gives the lemma. ✷ Corollary IV.29 Let δ ≤ τ , δ∞ ≤ ζ and α ≤ 1/ (8τ ζ). Then kE r (α)k ≤ 11 rµ (8ατ ζ) r+1 8ζ 2 r+1 . For r = 1 the derived bound for the error term gives E 1 (1) ≤ 4µδ2 , as follows easily. It is interesting to compare this bound with the error bound in Section 7.4 (cf. Lemma II.49), which √ √ amounts to E 1 (1) ≤ µδ2 2. Although the present bound is weaker by a factor of 2 2 for r = 1, √ it is sharp enough for our present purpose. It is also sharp enough to derive an O( n) complexity bound for r = 1 with some worse constant than before. Our main interest here is the case where r > 1. IV.18 Higher-Order Methods 18.5.3 349 Reduction of the proximity after a higher-order step Recall from (18.13) that after a higher-order step of size α we have xr (α)sr (α) = xs + α∆w + E r (α) = xs + α(µe − xs) + E r (α). We consider w̄(α) := xs + α(µe − xs) as the (intermediate) target during the step. The new iterate in the w-space is denoted by w(α), so w(α) = xr (α)sr (α). As a consequence, w(α) = w̄(α) + E r (α). (18.42) The proximities of the new iterate with respect to the µ-center are given by s r w(α) µ 1 − δ(w(α), µ) = 2 µ w(α) and δ∞ (w(α), µ) = r µ w(α) . ∞ Ideally the proximities after the step would be δ(w̄(α), µ) and δ∞ (w̄(α), µ). We first derive an upper bound for δ(w̄(α), µ) and δ∞ (w̄(α), µ) respectively in terms of τ , ζ and the step-size α. Lemma IV.30 We have √ (i) δ(w̄(α), µ) ≤ 1 − α δ, p 2 . (ii) δ∞ (w̄(α), µ) ≤ α + (1 − α)δ∞ Proof: It is easily verified that for any positive vector w, by their definitions (18.36) and (18.37), both δ(w, µ)2 and δ∞ (w, µ)2 are convex functions of w. 
Since w̄(α) = xs + α (µe − xs) = α (µe) + (1 − α)xs, 0 ≤ α ≤ 1, w̄(α) is a convex combination of µe and xs. Hence, by the convexity of δ(w, µ)2 , δ(w̄(α), µ)2 ≤ α δ(µe, µ)2 + (1 − α) δ(xs, µ)2 . Since δ(µe, µ) = 0, the first statement of the lemma follows. The proof of the second claim is analogous. The convexity of δ∞ (xs, µ)2 gives δ∞ (w̄(α), µ)2 ≤ α δ∞ (µe, µ)2 + (1 − α) δ∞ (xs, µ)2 . Since δ∞ (µe, µ) = 1, the lemma follows. ✷ It is very important for our purpose that when the pair (x, s) satisfies the cone condition (18.39) for µ > 0, then after a higher-order step at (x, s) to the µ-center, 350 IV Miscellaneous Topics the new iterates also satisfy the cone condition. The next corollary of Lemma IV.30 is a first step in this direction. It shows that w̄(α) satisfies the cone condition. Recall that w̄(α) = w(α) if the higher-order step is exact. Later we deal with the case where the higher-order step is not exact (cf. Theorem IV.35 below). This requires careful estimation of the error term E r (α). Corollary IV.31 Let δ ≤ τ and δ∞ ≤ ζ, with ζ ≥ 2. Then we have √ (i) δ(w̄(α), µ) ≤ 1 − α τ ≤ (1 − α2 )τ ; p  (ii) δ∞ (w̄(α), µ) ≤ α + (1 − α)ζ 2 ≤ 1 − 3α 8 ζ ≤ ζ. Proof: √ The first claim is immediate from the first part of Lemma IV.30, since δ ≤ τ and 1 − α ≤ 1 − α/2. For the proof of the second statement we write, using the second part of Lemma IV.30 and ζ ≥ 2, r p p ζ2 2 α + (1 − α)δ∞ ≤ α + (1 − α)ζ 2 ≤ α + (1 − α)ζ 2 4 s    3α 3α 2 = ζ ≤ 1− ζ ≤ ζ. 1− 4 8 Thus the corollary has been proved. ✷ The next lemma provides an expression for the ‘error’ in the proximities after the step. We use the following relation, which is an obvious consequence of (18.42):   w(α) E r (α) w̄(α) e+ . (18.43) = µ µ w̄(α) Lemma IV.32 Let α be such that E r (α) w̄(α) ∞ √ 5−1 ≤ . 2 Then we have p r (α) 1 + δ(w̄(α), µ)2 Ew̄(α) ;   r (α) . (ii) δ∞ (w(α), µ) ≤ δ∞ (w̄(α), µ) 1 + Ew̄(α) (i) δ(w(α), µ) ≤ δ(w̄(α), µ) + ∞ Proof: Using (18.43) we may write 1 δ(w(α), µ) = 2 s w̄(α) µ   s  −1 µ E r (α) E r (α) e+ − e+ . w̄(α) w̄(α) w̄(α) To simplify the notation we omit the argument α in the rest of the proof and we introduce the notation E r (α) λ := , w̄(α) IV.18 Higher-Order Methods so that 351 r 1 δ(w(α), µ) = 2 Since q w̄ µ − w̄ (e + λ) − µ r µ −1 (e + λ) . w̄ q −1 µ (e + λ) − w̄ (e + λ) =  p   q  1 − 12 µ 2 (e + λ) (e + λ) + w̄ − e − − e , µ w̄ q pµ w̄ w̄ µ application of the triangle inequality gives r   rµ   1 1 w̄ −1 (e + λ) 2 − e − (e + λ) 2 − e . (18.44) δ(w(α), µ) ≤ δ(w̄(α), µ) + 2 µ w̄ Denoting the i-th coordinate of the vector under the last norm by zi , we have r   rµ   1 1 w̄i 2 (1 + λi ) − 1 − (1 + λi )− 2 − 1 . zi = µ w̄i This implies |zi | ≤ r 1 w̄i (1 + λi ) 2 − 1 + µ r 1 µ (1 + λi )− 2 − 1 . w̄i √ The hypothesis of the lemma implies |λi | ≤ ( 5 − 1)/2. Now using some elementary inequalities,12 we get r r r r  w̄i µ w̄i µ |λi | . |λi | = |λi | + + |zi | ≤ µ w̄i µ w̄i Since r w̄i + µ and r r µ w̄i 2 w̄ − µ =4+ r µ w̄ we conclude that Hence r 2 ∞ ≤ r r w̄ − µ µ w̄i r 2 µ w̄ p |zi | ≤ 2 1 + δ(w̄(α), µ)2 |λi | , kzk ≤ 2 12 w̄i − µ ≤4+ r w̄ − µ 2 = 4δ(w̄(α), µ)2 , 1 ≤ i ≤ n. p 1 + δ(w̄(α), µ)2 kλk . Exercise 86 Prove the following inequalities: 1 (1 + λ) 2 − 1 1 (1 + λ)− 2 − 1 ≤ |λ| , ≤ |λ| , r −1 ≤ λ ≤ 1, √ 1− 5 ≤ λ ≤ 1. 2 µ w̄ 2 ∞ 352 IV Miscellaneous Topics Substituting this in (18.44) proves the first statement of the lemma. The proof of the second statement in the lemma is analogous. 
We write s  −1 r µ µ E r (α) e+ = δ∞ (w(α), µ) = w(α) ∞ w̄(α) w̄(α) ∞ r µ −1 (e + λ) = w̄(α) ∞ r r  1 µ µ  = (e + λ)− 2 − e + w̄(α) w̄(α) ∞ r r µ µ − 21 + (e + λ) − e ≤ . w̄(α) ∞ w̄(α) ∞ ∞ Using again the results of Exercise 86 we can simplify this to δ∞ (w(α), µ) ≤ = δ∞ (w̄(α), µ) + δ∞ (w̄(α), µ) kλk∞   E r (α) δ∞ (w̄(α), µ) 1 + , w̄(α) ∞ proving the lemma. ✷ The following corollary easily follows from Lemma IV.32 and Corollary IV.31. Corollary IV.33 Let δ ≤ τ and δ∞ ≤ ζ, with ζ ≥ 2. If α is such that √ 5−1 E r (α) ≤ , w̄(α) ∞ 2 then we have √ r (α) ; 1 + τ 2 Ew̄(α)    r (α) ζ. (ii) δ∞ (w(α), µ) ≤ 1 − 38 α 1 + Ew̄(α) (i) δ(w(α), µ) ≤ (1 − α2 )τ + ∞ We proceed by finding a step-size α that satisfies the hypothesis of Lemma IV.32 and Corollary IV.33. Lemma IV.34 With δ and ζ as in Corollary IV.33, let the step-size α be such that α ≤ 1/(8τ ζ). Then E r (α) r r+1 (8ατ ζ) ≤ w̄(α) 8(r + 1) and α satisfies the hypothesis of Lemma IV.32 and Corollary IV.33. Proof: We may write E r (α) e ≤ w̄(α) w̄(α) 1 kE (α)k = µ ∞ r r µe w̄(α) 2 ∞ kE r (α)k ≤ ζ2 kE r (α)k , µ IV.18 Higher-Order Methods 353 where the last inequality follows from Corollary IV.31. Now using Corollary IV.29 we have r E r (α) r+1 (8ατ ζ) , ≤ w̄(α) 8(r + 1) proving the first part of the lemma. The second part follows from the first part by using 8ατ ζ ≤ 1: √ E r (α) E r (α) r 5−1 < , ≤ ≤ w̄(α) ∞ w̄(α) 8(r + 1) 2 completing the proof. ✷ Equipped with the above results we can prove the next theorem. Theorem IV.35 Let δ ≤ τ , δ∞ ≤ ζ, with ζ ≥ 2, and α ≤ 1/(8τ ζ). Then √  r+1 r 1 + τ 2 (8ατ ζ) ; (i) δ(w(α), µ) ≤ 1 − α2 τ + 8(r+1)   r+1 r (ii) δ∞ (w(α), µ) ≤ 1 − 83 α 1 + 8(r+1) (8ατ ζ) ζ. Proof: For the given step-size the hypothesis of Corollary IV.33 is satisfied, by Lemma IV.34. From Lemma IV.34 we also deduce the second inequality in E r (α) w̄(α) ∞ ≤ E r (α) r r+1 (8ατ ζ) . ≤ w̄(α) 8(r + 1) Substituting these inequalities in Corollary IV.33 yields the theorem. 18.5.4 ✷ The step-size In the sequel the step-size α is given the value α= 1 q , √ r 8τ ζ (r + 1)ζ 1 + τ 2 (18.45) where δ = δ(xs, µ) ≤ τ and δ∞ = δ∞ (xs, µ) ≤ ζ. It is assumed that ζ ≥ 2. The next theorem makes clear that after a higher-order step with the given step-size α the proximity δ is below a fixed fraction of τ and the proximity δ∞ below a fixed fraction of ζ. Theorem IV.36 If the step-size is given by (18.45) then ! α(r2 + 1) τ. δ(w(α), µ) ≤ 1 − 2 (r + 1)2 Moreover,  α δ∞ (w(α), µ) ≤ 1 − ζ. 8 354 IV Miscellaneous Topics Proof: The proof uses Theorem IV.35. This theorem applies because for the given value of α we have 1 1 q ≤ , α= √ 8τ ζ r 8τ ζ (r + 1)ζ 1 + τ 2 whence 8ατ ζ ≤ 1. Hence, by the first statement in Theorem IV.35,  p r α τ+ δ(w(α), µ) ≤ 1 − 1 + τ 2 (8ατ ζ)r+1 . 2 8(r + 1) (18.46) The second term on the right can be reduced by using the definition of α:  p 8ατ ζ α r √ 1 + τ2 δ(w(α), µ) ≤ 1− τ+ 2 8(r + 1) (r + 1)ζ 1 + τ 2   rα α τ = 1− + 2 (r + 1)2   r2 + 1 = 1− α τ. 2(r + 1)2 This proves the first statement. The second claim follows in a similar way from the second statement in Theorem IV.35:    3 r r+1 ζ δ∞ (w(α), µ) ≤ 1− α 1+ (8ατ ζ) 8 8(r + 1)    3 8ατ ζ r √ = 1− α 1+ ζ 8 8(r + 1) (r + 1)ζ 1 + τ 2    ατ r 3 √ ζ 1+ = 1− α 8 (r + 1)2 1 + τ 2    3 rα < 1− α ζ 1+ 8 (r + 1)2    3 1 ≤ 1− α 1+ α ζ 8 4  α ζ. ≤ 1− 8 In the last but one inequality we used that r/(r + 1)2 is monotonically decreasing if r increases (for r ≥ 1). ✷ 18.5.5 Reduction of the barrier parameter In this section we assume that δ = δ(xs, µ) ≤ τ , where τ is any positive number. 
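The step-size (18.45) and the contraction factors of Theorem IV.36 are explicit expressions in r, τ and ζ, so they can be tabulated directly. The sketch below does this for the parameter choice τ = √n, ζ = 2 and r = n that is used later, in Section 18.5.7; it is purely illustrative.

```python
import numpy as np

def step_size(r, tau, zeta):
    """Step-size (18.45)."""
    return 1.0 / (8 * tau * zeta * ((r + 1) * zeta * np.sqrt(1 + tau**2)) ** (1.0 / r))

for n in (10, 100, 1000):
    tau, zeta, r = np.sqrt(n), 2.0, n
    alpha = step_size(r, tau, zeta)
    # contraction factors from Theorem IV.36 for delta and delta_inf
    contr_delta = 1 - alpha * (r**2 + 1) / (2 * (r + 1) ** 2)
    contr_delta_inf = 1 - alpha / 8
    print(n, alpha, contr_delta, contr_delta_inf)
```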
After a higher-order step with step-size α, given by (18.45), we have by Theorem IV.36, δ(w(α), µ) ≤ (1 − β) δ, where β= α(r2 + 1) . 2(r + 1)2 (18.47) IV.18 Higher-Order Methods 355 Below we investigate how far µ can be decreased after the step while keeping the proximity δ less than or equal to τ . Before doing this we observe that r µe δ∞ (xs, µ) = xs ∞ is monotonically decreasing as µ decreases. Hence, we do not have to worry about δ∞ when µ is reduced. Defining µ+ := (1 − θ)µ, we first deal with a lemma that later gives an upper bound for δ(w(α), µ+ ).13 Lemma IV.37 Let (x, s) be a positive primal-dual pair and suppose µ > 0. If δ := δ(xs, µ) and µ+ = (1 − θ)µ then δ(xs, µ+ ) ≤ √ 2δ + θ n √ . 2 1−θ Proof: By the definition of δ(xs, µ+ ), 1 δ(xs, µ ) = 2 + r xs − µ+ e To simplify the notation in the proof we use u = δ(xs, µ+ ) = r µ+ e . xs p xs/µ. Then we may write √  1 u θu 1 √ √ . 1 − θ u − u−1 + √ − 1 − θu−1 = 2 2 1−θ 1−θ Using the triangle inequality and14 also kuk ≤ u − u−1 + √ √ n = 2δ + n, we get √ √ √ √ 2δ + θ n θ kuk θ (2δ + n) √ δ(xs, µ ) ≤ δ 1 − θ + √ ≤δ 1−θ+ = √ , 2 1−θ 2 1−θ 2 1−θ + proving the lemma. 13 14 ✷ A similar result was derived in Lemma II.54, but under the assumption that xT s = nµ. This assumption will in general not be satisfied in the present context, and hence we have a weaker bound. Exercise 87 For each positive number ξ we have |ξ| ≤ ξ − 1 + 1. ξ Prove this and derive that for each positive vector u the following inequality holds: kuk ≤ u − u−1 + √ n. 356 IV Miscellaneous Topics Theorem IV.38 Let δ = δ(xs, µ) ≤ τ , δ∞ = δ∞ (xs, µ) ≤ ζ, with ζ ≥ 2. Taking first a higher-order step at (x, s), with α according to (18.45), and then updating the barrier parameter to µ+ = (1 − θ)µ, where θ= ατ (r2 + 1) 2βτ √ = √ , 2τ + n (r + 1)2 (2τ + n) (18.48) we have δ(w(α), µ+ ) ≤ τ and δ∞ (w(α), µ+ ) ≤ ζ. Proof: The second part of Theorem IV.36 implies that after a step of the given size δ∞ (w(α), µ) ≤ ζ. We established earlier that δ∞ monotonically decreases when µ decreases. As a result we have δ∞ (w(α), µ+ ) ≤ ζ. Now let us estimate δ(w(α), µ+ ). After a higher-order step with step-size α as given by (18.45), we have by the first part of Theorem IV.36,   α(r2 + 1) δ(w(α), µ) ≤ 1 − δ(xs, µ) = (1 − β) δ, 2(r + 1)2 with β as defined in (18.47). Also using Lemma IV.37 we obtain δ(w(α), µ+ ) ≤ √ √ 2δ(w(α), µ) + θ n 2 (1 − β) δ + θ n √ √ ≤ . 2 1−θ 2 1−θ Since δ ≤ τ , we certainly have δ(w(α), µ+ ) ≤ τ if √ 2(1 − β)τ + θ n √ ≤ τ. 2 1−θ This inequality can be rewritten as √ √ 2(1 − β)τ + θ n ≤ 2τ 1 − θ. Using √ 1 − θ ≥ 1 − θ the above inequality certainly holds if √ 2(1 − β)τ + θ n ≤ 2τ (1 − θ). It is easily verified that the value of θ in (18.48) satisfies this inequality with equality. Thus the proof is complete. ✷ 18.5.6 A higher-order logarithmic barrier algorithm Formally the logarithmic barrier algorithm using higher-order Newton steps can be described as below. IV.18 Higher-Order Methods 357 Higher-Order Logarithmic Barrier Algorithm Input: A natural number r, the order of the search directions; a positive number τ , specifying the cone;  a primal-dual pair x0 , s0 and µ0 > 0 such that δ(x0 s0 , µ0 ) ≤ τ . 
ζ := max 2, δ∞ (x0 s0 , µ0 ) ; a step-size parameter α, from (18.45); an update parameter θ, from (18.48); an accuracy parameter ε > 0; begin x := x0 ; s := s0 ; µ := µ0 ; while xT s ≥ ε do begin x := x(α) = x + ∆r,α x; s := s(α) = s + ∆r,α s; µ := (1 − θ)µ end end A direct consequence of the specified values of the step-size α and update parameter θ is that the properties δ(xs, µ) ≤ τ and δ∞ (xs, µ) ≤ ζ are maintained in the course of the algorithm. This follows from Theorem IV.38 and makes the algorithm well-defined. 18.5.7 Iteration bound In the further analysis of the algorithm we choose √ τ = n and r = n. At the end of each iteration of the algorithm we have √ δ(xs, µ) ≤ τ = n. As a consequence (cf. Exercise 62),   √  √  2τ ρ(τ ) xT s ≤ 1 + √ nµ = 1 + 2ρ n nµ ≤ 4 1 + n nµ. n Hence xT s ≤ ε holds if 4 1+ or µ≤ √  n nµ ≤ ε, ε √ . 4n (1 + n) 358 IV Miscellaneous Topics Recall that at each iteration the barrier parameter is reduced by a factor 1 − θ, with θ= α α(n2 + 1) ατ (r2 + 1) √ ≥ . = 2 2 (r + 1) (2τ + n) 3(n + 1) 6 (18.49) The last inequality holds for all n ≥ 1. Using Lemma I.36 we find that the number of iterations does not exceed √ 4 (1 + n) nµ0 6 log . α ε √ Substituting α in (18.45) and τ = n, we get q √ √ n 6 = 48ζ n (n + 1)ζ 1 + n. α For n ≥ 1 we have q √ √ n (n + 1) 1 + n ≤ 2 2 = 2.8284, with equality only if n = 1. Thus we find n+1 √ 6 ≤ 136 ζ n n. α Thus we may state the next theorem without further proof. Theorem IV.39 The Higher-Order Logarithmic Barrier Algorithm needs at most √ n+1 √ 4 (1 + n) nµ0 n 136 ζ n log ε iterations. Each iteration requires O(n3 ) arithmetic operations. The output is a primaldual pair (x, s) such that xT s ≤ ε. T When starting the algorithm on the central path, with µ0 = x0 s0 /n, we have ζ = 2. In that case δ∞ (xs, µ) ≤ 2 at each iteration and the iteration bound of Theorem IV.39 becomes   √ √ √ √ nnµ0 4 (1 + n) nµ0 544 n log . (18.50) =O n log ε ε In fact, as long as ζ = O(1) the iteration bound is given by the right-hand expression in (18.50). Note that this bound has the same order of magnitude as the best known iteration complexity bound. When (x0 , s0 ) is far from the central path, the value of ζ may be so large that the iteration bound of Theorem IV.39 becomes  very poor. Note that ζ can be as large as n+1 ρ(τ ), which would give an extra factor O n 2n in (18.50). However, a more careful analysis yields a much better bound, as we show in the next section. 18.5.8 Improved iteration bound In this section we consider the situation where the algorithm starts with a high value √ of ζ. Recall from the previous section that if τ = n then ζ is always bounded by IV.18 Higher-Order Methods 359 √ √ ζ ≤ ρ ( n) = O ( n). Now the second part of Theorem IV.36 implies that after a higher-order step at (x, s) to the µ-center we have  α ζ. δ∞ (w(α), µ) ≤ 1 − 8 Reducing µ to µ+ = (1 − θ)µ we get  α ζ. δ∞ (w(α), µ+ ) ≤ (1 − θ) 1 − 8 Now using the lower bound (18.49) for θ it follows that  α  α δ∞ (w(α), µ+ ) ≤ 1 − 1− ζ. 6 8    Since 0 ≤ α ≤ 1 we have 1 − α6 1 − α8 ≤ 1 − α4 . Hence  α ζ. δ∞ (w(α), µ+ ) ≤ 1 − 4 Substituting the value of α, while using q q √ √ √ n 8 (n + 1) ζ 1 + n ≤ 8 n (n + 1) ρ n 1 + n ≤ 55, we obtain 1 ζ √ , =ζ− 220τ ζ 220 n √ showing that δ∞ (xs, µ) decreases by at least 1/ (220 n) in one iteration. Obviously, we can redefine ζ according to    1 √ ζ := max 2, δ∞ (w(α), µ+ ) ≤ max 2, ζ − 220 n δ∞ (w(α), µ+ ) ≤ ζ − in the next iteration and continue the algorithm with this new value. 
In this way ζ reaches the value 2 in no more than  √ √  220 n ζ 0 − 2 = O ζ 0 n iterations, where ζ 0 = δ∞ (x0 s0 , µ0 ). From then on ζ keeps the value 2, and the number of additional iterations is bounded by (18.50). Hence we may state the following improvement of Theorem IV.39 without further proof. Theorem IV.40 The Higher-Order Logarithmic Barrier Algorithm needs at most   √ √ 4 nnµ1 0√ O ζ n + n log . ε iterations. Each iteration requires O(n3 ) arithmetic operations. The output is a primaldual pair (x, s) such that xT s ≤ ε. In this theorem µ1 denotes the value of the barrier parameter attained at the first iteration for which ζ = 2. Obviously, µ1 ≤ µ0 . 19 Parametric and Sensitivity Analysis 19.1 Introduction Many commercial optimization packages for solving LO problems not only solve the problem at hand, but also provide additional information on the solution. This added information concerns the sensitivity of the solution produced by the package to perturbations in the data of the problem. In this chapter we deal with a problem (P ) in standard format:  (P ) min cT x : Ax = b, x ≥ 0 . The dual problem (D) is written as  (D) max bT y : AT y + s = c, s ≥ 0 . The input data for both problems consists of the matrix A, which is of size m × n, and the vectors b ∈ IRm and c ∈ IRn . The optimal value of (P ) and (D) is denoted by zA (b, c), with zA (b, c) = −∞ if (P ) is unbounded and (D) infeasible, and zA (b, c) = ∞ if (D) is unbounded and (P ) infeasible. If (P ) and (D) are both infeasible then zA (b, c) is undefined. We call zA the optimal-value function for the matrix A. The extra information provided by solution packages concerns only changes in the vectors b and c. We also restrict ourselves to such changes. It will follow from the results below that zA (b, c) depends continuously on the vectors b and c. In contrast, the effect of changes in the matrix A is not necessarily continuous. The next example provides a simple illustration of this phenomenon.1 Example IV.41 Consider the problem min {x2 : αx1 + x2 = 1, x1 ≥ 0, x2 ≥ 0} , where α ∈ IR. In this example we have A = (α 1), b = (1) and c = (0 1)T . We can easily verify that zA (b, c) = 0 if α > 0 and zA (b, c) = 1 if α ≤ 0. Thus, if zA (b, c) is considered a function of α, a discontinuity occurs at α = 0. ♦ Thus. the dependence of zA (b, c) on the entries in b and c is more simple than the dependence of zA (b, c) on the entries in A. 1 For some results on the effect of changes in A we refer the reader to Mills [210] and Gal [89]. 362 IV Miscellaneous Topics We develop some theory in this chapter for the analysis of one-dimensional parametric perturbations of the vectors b and c. Given a pair of optimal solutions for (P ) and (D), we present an algorithm in Section 19.4.5 for the computation of the optimal-value function under such a perturbation. Then, in Section 19.5 we consider the special case of sensitivity analysis, also called postoptimal analysis. This classical topic is treated in almost all (text-)books on LO and implemented in almost all commercial optimization packages for LO. We show in Section 19.5.1 that the socalled ranges and shadow prices of the coefficients in b and c can be obtained by solving auxiliary LO problems. In Section 19.5.3 we briefly discuss the classical approach to sensitivity analysis, which is based on the use of an optimal basic solution and the corresponding optimal basis. 
Although the classical approach is much cheaper from a computational point of view, it yields less information and can easily be misinterpreted. This is demonstrated in Section 19.5.4, where we provide a striking example of the inherent weaknesses of the classical approach. 19.2 Preliminaries The feasible regions of (P ) and (D) are denoted by P D := := {x : Ax = b, x ≥ 0} ,  (y, s) : AT y + s = c, s ≥ 0 . Assuming that (P ) and (D) are both feasible, the optimal sets of (P ) and (D) are denoted by P ∗ and D∗ . We define the index sets B and N by B := {i : xi > 0 for some x ∈ P ∗ } , N := {i : si > 0 for some (y, s) ∈ D∗ } . The Duality Theorem (Theorem II.2) implies that B ∩ N = ∅, and the Goldman– Tucker Theorem (Theorem II.3) that B ∪ N = {1, 2, . . . , n}. Thus, B and N form a partition of the full index set. This (ordered) partition, denoted by π = (B, N ), is the optimal partition of problems (P ) and (D). It is obvious that the optimal partition depends on b and c. 19.3 Optimal sets and optimal partition In the rest of this chapter we assume that b and c are such that (P ) and (D) have optimal solutions, and π = (B, N ) denotes the optimal partition of both problems. By definition, the optimal partition is determined by the sets of optimal solutions for (P ) and (D). In this section it is made clear that, conversely, the optimal partition provides essential information on the optimal solution sets P ∗ and D∗ . The next lemma follows immediately from the Duality Theorem and is stated without proof. IV.19 Parametric and Sensitivity Analysis 363 Lemma IV.42 Let x∗ ∈ P ∗ and (y ∗ , s∗ ) ∈ D∗ . Then  P∗ = x : x ∈ P, xT s∗ = 0 ,  D∗ = (y, s) : (y, s) ∈ D, sT x∗ = 0 . As before, we use the notation xB and xN to refer to the restriction of a vector x ∈ IRn to the coordinate sets B and N respectively. Similarly, AB denotes the restriction of A to the columns in B and AN the restriction of A to the columns in N . Now the sets P ∗ and D∗ can be described in terms of the optimal partition. Lemma IV.43 Given the optimal partition (B, N ) of (P ) and (D), the optimal sets of both problems are given by P∗ D∗ {x : x ∈ P, xN = 0} , {(y, s) : (y, s) ∈ D, sB = 0} . = = Proof: Let x∗ , s∗ be any strictly complementary pair of solutions of (P ) and (D), and (x, s) an arbitrary pair of feasible solutions. Then, from Lemma IV.42, x is optimal for (P ) if and only if xT s∗ = 0. Since s∗B = 0 and s∗N > 0, we have xT s∗ = 0 if and only if xN = 0, thus proving that P ∗ consists of all primal feasible x for which xN = 0. Similarly, if (y, s) ∈ D then this pair is optimal if and only if sT x∗ = 0. Since x∗B > 0 and x∗N = 0, this occurs if and only if sB = 0, thus proving that D∗ consists of all dual feasible s for which sB = 0. ✷ To illustrate the meaning of Lemma IV.43 we give an example. Example IV.44 Figure 19.1 shows a network with given arc lengths, and we ask for a shortest path from node s to node t. Denoting the set of nodes in this network by V and the set of arcs by E, any path from s to t can be represented by a 0-1 vector x of length |E|, whose coordinates are indexed by the arcs, such that xe = 1 if and only if arc e belongs to the path. The length of the path is then given by X ce xe , (19.1) e∈E 4 ✒ 4 s 5 ✲ ✒ 4 3 ✲ 5 5 ✲ ✒ 3 5 6 7 ❘ ✲ ✒ 3 ❘ 6 4 6 ❘ ✲ ✒ 3 4 Figure 19.1 ❘ ✲ 3 2 5 ❘ ✲ A shortest path problem. ❘t ✲ ✒ 364 IV Miscellaneous Topics where ce denote the length of arc e, for all e ∈ E. 
Furthermore, denoting e = (v, w) if arc e points from node v to node w (with v ∈ V and w ∈ V ), and denoting xe by xvw , x will satisfy the following balance equations: X xsv = 1 v∈V X xvu v∈V X X = u ∈ V \ {s, t} xuv , v∈V xvt = (19.2) 1 v∈V Now consider the LO problem consisting of minimizing the linear function (19.1) subject to the linear equality constraints in (19.2), with all variables xe , e ∈ E, nonnegative. This problem has the standard format: it is a minimization problem with equality constraints and nonnegative variables. Solving this problem with an interior-point method we find a strictly complementary solution, and hence the optimal partition of the problem. In this way we have computed the optimal partition (B, N ) of the problem. Since in this example there is a 1-to-1 correspondence between the arcs and the variables, we may think of B and N as a partition of the arcs in the network. 4 ✒ 4 s 5 ✲ ✒ 4 3 ✲ 5 6 ❘ ✲ ✒ 3 Figure 19.2 5 ✲ ✒ 3 5 7 ❘ 6 4 6 ❘ ✲ ✒ 3 4 ❘ ✲ 3 ❘t ✲ ✒ 2 5 ❘ ✲ The optimal partition of the shortest path problem in Figure 19.1. In Figure 19.2 we have drawn the network once more, but now with the arcs in B solid and the arcs in N dashed. The meaning of Lemma IV.43 is that any path from s to t using only solid arcs is a shortest path, and all shortest paths use exclusively solid arcs. In other words, the set B consists of all arcs in the network which occur in some shortest path from s to t and the set N contains arcs in the network which do not belong to any shortest path from s to t.2 ♦ 2 Exercise 88 Consider any network with node set V and arc set E and let s and t be two distinct nodes in this network. If all arcs in the network have positive length, then the set B, consisting of all arcs in the network which occur in at least one shortest path from s to t, does not contain a (directed) circuit. Prove this. IV.19 Parametric and Sensitivity Analysis 365 The next result deals with the dimensions of the optimal sets P ∗ and D∗ . Here, as usual the (affine) dimension of a subset of IRk is the dimension of the smallest affine subspace in IRk containing the subset. Lemma IV.45 We have dim P ∗ dim D∗ = = |B| − rank (AB ) m − rank (AB ). Proof: The optimal set of (P ) is given by P ∗ = {x : Ax = b, xB ≥ 0, xN = 0} , and hence the smallest affine subspace of IRn containing P ∗ is given by {x : AB xB = b, xN = 0} . The dimension of this affine space is equal to the dimension of the null space of AB . Since this dimension is given by |B| − rank (AB ), the first statement follows. For the proof of the second statement we use that the dual optimal set can be described by  D∗ = (y, s) : AT y + s = c, sB = 0, sN ≥ 0 . This is equivalent to  D∗ = (y, s) : ATB y = cB , ATN y + sN = cN , sB = 0, sN ≥ 0 . The smallest affine subspace containing this set is  (y, s) : ATB y = cB , ATN y + sN = cN , sB = 0 . Obviously sN is uniquely determined by y, and any y satisfying ATB y = cB yields a point in this affine space. Hence the dimension of the affine space is equal to the dimension of the null space of ATB . Since m is the number of columns of ATB , the dimension of the null space of ATB equals m − rank (AB ). This completes the proof. ✷ Lemma IV.45 immediately implies that (P ) has a unique solution if and only if rank (AB ) = |B|. Clearly this happens if and only if the columns in AB are linearly independent. 
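These rank conditions are easy to verify once a strictly complementary pair is available, for instance from an interior-point solver. The sketch below determines the optimal partition induced by a given pair (x*, s*) and evaluates the dimension formulas of Lemma IV.45; the small example at the end is hypothetical.

```python
import numpy as np

def partition_info(A, x_star, s_star, tol=1e-9):
    """Optimal partition from a strictly complementary pair and the
    dimensions of the optimal sets (Lemma IV.45)."""
    B = np.where(x_star > tol)[0]          # B = {i : x*_i > 0}
    N = np.where(s_star > tol)[0]          # N = {i : s*_i > 0}
    m = A.shape[0]
    r = np.linalg.matrix_rank(A[:, B]) if B.size else 0
    return {
        "B": B, "N": N,
        "dim_P_star": B.size - r,          # |B| - rank(A_B)
        "dim_D_star": m - r,               # m  - rank(A_B)
        "P_unique": B.size == r,           # columns of A_B independent
        "D_unique": r == m,                # rows of A_B independent
    }

# tiny hypothetical example: min x1 + x2  s.t.  x1 + x2 + x3 = 1, x >= 0
A = np.array([[1.0, 1.0, 1.0]])
x_star = np.array([0.0, 0.0, 1.0])         # optimal primal solution
s_star = np.array([1.0, 1.0, 0.0])         # strictly complementary dual slack (y = 0)
print(partition_info(A, x_star, s_star))
# B consists of the third variable only; both optimal sets are single points
```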
Also, (D) has a unique solution if and only if rank (AB ) = m, which happens if and only if the rows in AB are linearly independent.3 3 It has become common practice in the literature to call the problem (P ) degenerate if (P ) or (D) have multiple optimal solutions. Degeneracy is an important topic in LO. In the context of the Simplex Method it is well known as a source of difficulties. This is especially true when dealing with sensitivity analysis. See, e.g., Gal [90] and Greenberg [128]. But also in the context of interior-point methods the occurrence of degeneracy may influence the behavior of the method. We mention some references: Gonzaga [120], Güler et al. [132], Todd [263], Tsuchiya [269, 271], Hall and Vanderbei [138]. 366 19.4 IV Miscellaneous Topics Parametric analysis In this section we start to investigate the effect of changes in b and c on the optimalvalue function zA (b, c). We consider one-dimensional parametric perturbations of b and c. So we want to study zA (b + β∆b, c + γ∆c) as a function of the parameters β and γ, where ∆b and ∆c are given perturbation vectors. From now on the vectors b and c are fixed, and the variations come from the parameters β and γ. In fact, we restrict ourselves to the cases that the variations occur only in one of the two vectors b and c. In other words, taking γ = 0 we consider variations in β and taking β = 0 we consider variations in γ. If γ = 0, then (Pβ ) will denote the perturbed primal problem and (Dβ ) its dual. The feasible regions of these problems are denoted by Pβ and Dβ . Similarly, if β = 0, then (Dγ ) will denote the perturbed dual problem and (Pγ ) its dual and the feasible regions of these problems are Dγ and Pγ . Observe that the feasible region of (Dβ ) is simply D and the feasible region of (Pγ ) is simply P. We use the superscript ∗ to refer to the optimal set of each of these problems. We assume that b and c are such that (P ) and (D) are both feasible. Then zA (b, c) is well defined and finite. It is convenient to introduce the following notations: b(β) := b + β∆b, f (β) := zA (b(β), c), c(γ) := c + γ∆c, g(γ) := zA (b, c(γ)). Here the domain of the parameters β and γ is taken as large as possible. Let us consider the domain of f . This function is defined as long as zA (b(β), c) is well defined. Since the feasible region of (Dβ ) is constant when β varies, and since we assume that (Dβ ) is feasible for β = 0, it follows that (Dβ ) is feasible for all values of β. Therefore, f (β) is well defined if the dual problem (Dβ ) has an optimal solution and f (β) is not defined (or infinity) if the dual problem (Dβ ) is unbounded. By the Duality Theorem it follows that f (β) is well defined if and only if the primal problem (Pβ ) is feasible. In exactly the same way it can be understood that the domain of g consists of all γ for which (Dγ ) is feasible (and (Pγ ) bounded). Lemma IV.46 The domains of f and g are convex. Proof: We give the proof for f . The proof for g is similar and therefore omitted. Let β1 , β2 ∈ dom (f ) and β1 < β < β2 . Then f (β1 ) and f (β2 ) are finite, which means that both Pβ1 and Pβ2 are nonempty. Let x1 ∈ Pβ1 and x2 ∈ Pβ2 . Then x1 and x2 are nonnegative and Ax1 = b + β1 ∆b, Ax2 = b + β2 ∆b. Now consider x := x1 +  (β2 − β) x1 + (β − β1 ) x2 β − β1 . x2 − x1 = β2 − β1 β2 − β1 IV.19 Parametric and Sensitivity Analysis 367 Note that x is a convex combination of x1 and x2 and hence x is nonnegative. We proceed by showing that x ∈ Pβ . 
Using that A x2 − x1 = (β2 − β1 ) ∆b this goes as follows: Ax = = = =  β − β1 A x2 − x1 β2 − β1 β − β1 (β2 − β1 ) ∆b b + β1 ∆b + β2 − β1 b + β1 ∆b + (β − β1 ) ∆b Ax1 + b + β∆b. This proves that (Pβ ) is feasible and hence β ∈ dom (f ), completing the proof. ✷ The domains of f and g are in fact closed intervals on the real line. This follows from the above lemma, and the fact that the complements of the domains of f and g are open subsets of the real line. The last statement is the content of the next lemma. Lemma IV.47 The complements of the domains of f and g are open subsets of the real line. Proof: As in the proof of the previous lemma we omit the proof for g because it is similar to the proof for f . We need to show that the complement of dom (f ) is open. Let β ∈ / dom (f ). This means that (Dβ ) is unbounded. This is equivalent to the existence of a vector z such that AT z ≤ 0, (b + β∆b)T z > 0. Fixing z and considering β as a variable, the set of all β satisfying the strict inequality T (b + β∆b) z > 0 is an open interval. For all β in this interval (Dβ ) is unbounded. Hence the domain of f is open. This proves the lemma. ✷ A consequence of the last two lemmas is the next theorem, which requires no further proof. Theorem IV.48 The domains of f and g are closed intervals on the real line.4 ✷ Example IV.49 Let (D) be the problem max {y2 : y2 ≤ 1} . y=(y1 ,y2 ) In this case b = (0, 1) and c = (1). Note that (D) is feasible and bounded. The set of all optimal solutions consists of all (y1 , 1) with y1 ∈ IR. Now let ∆b = (1, 0), and consider the effect of replacing b by b + β∆b, and let f (β) be as defined above. Then f (β) = 4 max {y2 + βy1 : y2 ≤ 1} . y=(y1 ,y2 ) To avoid misunderstanding we point out that a singleton {a} (a ∈ IR) is also considered as a closed interval. 368 IV Miscellaneous Topics We can easily verify that the perturbed problem is unbounded for all nonzero β. Hence the domain of f is the singleton {0}.5 ♦ 19.4.1 The optimal-value function is piecewise linear In this section we show that the functions f (β) and g(γ) are piecewise linear on their domains. We start with g(γ). Theorem IV.50 g(γ) is continuous, concave and piecewise linear. Proof: By definition,  g(γ) = min c(γ)T x : x ∈ P . For each γ the minimum value is attained at the central solution of the perturbed problem (Pγ ). This solution is uniquely determined by the optimal partition of (Pγ ). Since the number of partitions of the full index set {1, 2, . . . , n} is finite, we may write  g(γ) = min c(γ)T x : x ∈ T , where T is a finite subset of P. For each x ∈ T we have c(γ)T x = cT x + γ∆cT x, which is a linear function of γ. Thus, g(γ) is the minimum of a finite set of linear functions.6 This implies that g(γ) is continuous, concave and piecewise linear, proving the theorem. ✷ Theorem IV.51 f (β) is continuous, convex and piecewise linear. Proof: The proof goes in the same way as for Theorem IV.50. By definition,  f (β) = max b(β)T y : y ∈ D . For each β the maximum value is attained at a central solution (y ∗ , s∗ ) of (D). Now s∗ is uniquely determined by the optimal partition of (D) and b(β)T y ∗ is constant for all optimal y ∗ . Associating one particular y ∗ with any possible slack s∗ arising in this way, we obtain that  f (β) = max b(β)T y : y ∈ S , where S is a finite subset of D. For each y ∈ S, we have b(β)T y = bT y + β∆bT y, 5 6 Exercise 89 With (D) and f (β) as defined in Example IV.49 we consider the effect on the domain of f when some constraints are added. 
When the constraint y1 ≥ 0 is added to (D), the domain of f becomes (−∞, 0]. When the constraint y1 ≤ 0 is added to (D), the domain of f becomes [0, ∞) and when both constraints are added the domain of f becomes (−∞, ∞). Prove this. Exercise 90 Prove that the minimum of a finite family of linear functions, each defined on the same closed interval, is continuous, concave and piecewise linear. IV.19 Parametric and Sensitivity Analysis 369 which is a linear function of β. This makes clear that f (β) is the maximum of a finite set of linear functions. Therefore, f (β) is continuous, convex and piecewise linear, as required. ✷ The values of β where the slope of the optimal-value function f (β) changes are called break points of f , and any interval between two successive break points of f is called a linearity interval of f . In a similar way we define break points and linearity intervals for g. Example IV.52 For any γ ∈ IR consider the problem (Pγ ) defined by (Pγ ) min x1 + (3 + γ)x2 + (1 − γ)x3 s.t. x1 + x2 + x3 = 4, x1 , x2 , x3 ≥ 0. In this case b is constant and the perturbation vector for c = (1, 3, 1) is ∆c = (0, 1, −1). The dual problem is (Dγ ) max {4y : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ} . From this it is obvious that the optimal value is given by g(γ) = 4 min (1, 3 + γ, 1 − γ) . The graph of the optimal-value function g(γ) is depicted in Figure 19.3. Note that 5 4 g(γ) ✻ 3 2 1 0 −1 −2 −3 −4 −5 −5 −4 −3 Figure 19.3 −2 −1 0 ✲1 γ 2 3 The optimal-value function g(γ). g(γ) is piecewise linear and concave. The break points of g occur for γ = −2 and γ = 0. ♦ 370 19.4.2 IV Miscellaneous Topics Optimal sets on a linearity interval For any β in the domain of f we denote the optimal set of (Pβ ) by Pβ∗ and the optimal set of (Dβ ) by Dβ∗ . Theorem IV.53 If f (β) is linear on the interval [β1 , β2 ], where β1 < β2 , then the dual optimal set Dβ∗ is constant (i.e. invariant) for β ∈ (β1 , β2 ). Proof: Let β̄ ∈ (β1 , β2 ) be arbitrary and let ȳ ∈ Dβ̄∗ be arbitrary as well. Since ȳ is optimal for (Dβ̄ ) we have f (β̄) = b(β̄)T ȳ = bT ȳ + β̄∆bT ȳ, and, since ȳ is dual feasible for all β, b (β1 )T ȳ = bT ȳ + β1 ∆bT ȳ ≤ f (β1 ), b (β2 )T ȳ = bT ȳ + β2 ∆bT ȳ ≤ f (β2 ). Hence we find  f (β1 ) − f (β̄) ≥ β1 − β̄ ∆bT ȳ, The linearity of f on [β1 , β2 ] implies  f (β2 ) − f (β̄) ≥ β2 − β̄ ∆bT ȳ. f (β2 ) − f (β̄) f (β̄) − f (β1 ) = . β̄ − β1 β2 − β̄ Now using that β2 − β̄ > 0 and β1 − β̄ < 0 we obtain ∆bT ȳ ≤ f (β̄) − f (β1 ) f (β2 ) − f (β̄) ≤ ∆bT ȳ. = β̄ − β1 β2 − β̄ Hence, the last two inequalities are equalities, and the slope of f on the closed interval [β1 , β2 ] is just ∆bT ȳ. This means that the derivative of f with respect to β on the open interval (β1 , β2 ) satisfies f ′ (β̄) = ∆bT ȳ, ∀β ∈ (β1 , β2 ) , or equivalently, f (β) = bT ȳ + β∆bT ȳ = b (β)T ȳ, ∀β ∈ (β1 , β2 ) . We conclude that ȳ is optimal for any (Dβ ) with β ∈ (β1 , β2 ). Since ȳ was arbitrary in Dβ̄∗ , it follows that Dβ̄∗ ⊆ Dβ∗ , ∀β ∈ (β1 , β2 ) . Since β̄ was arbitrary in the open interval (β1 , β2 ), the above argument applies to any β̃ ∈ (β1 , β2 ); so we also have Dβ̃∗ ⊆ Dβ∗ , ∀β ∈ (β1 , β2 ) . We may conclude that Dβ̄∗ ⊆ Dβ̃∗ and Dβ̃∗ ⊆ Dβ̄∗ , which gives Dβ̄∗ = Dβ̃∗ . The theorem follows. ✷ The above proof reveals that ∆bT y must have the same value for all y ∈ Dβ∗ and for all β ∈ (β1 , β2 ). So we may state the following. IV.19 Parametric and Sensitivity Analysis 371 Corollary IV.54 Under the hypothesis of Theorem IV.53, f ′ (β) = ∆bT y, ∀β ∈ (β1 , β2 ) , ∀y ∈ Dβ∗ . 
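Theorem IV.50 and Example IV.52 are easy to check numerically. The following sketch is ours, not the book's, and assumes SciPy's linprog with the HiGHS backend; it evaluates g(γ) for the data of Example IV.52 on a coarse grid and prints the slope between neighboring grid points. The slope changes only at the break points γ = −2 and γ = 0, in agreement with g(γ) = 4 min(1, 3 + γ, 1 − γ).

import numpy as np
from scipy.optimize import linprog

# data of Example IV.52: min x1 + (3+gamma)*x2 + (1-gamma)*x3, x1 + x2 + x3 = 4, x >= 0
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([4.0])
c = np.array([1.0, 3.0, 1.0])
dc = np.array([0.0, 1.0, -1.0])

def g(gamma):
    res = linprog(c + gamma * dc, A_eq=A, b_eq=b, method="highs")
    return res.fun if res.status == 0 else np.inf

gammas = np.arange(-5.0, 3.0 + 0.5, 0.5)
values = np.array([g(t) for t in gammas])
slopes = np.diff(values) / np.diff(gammas)
for t, s in zip(gammas[:-1], slopes):
    print(f"slope of g on [{t:5.1f}, {t + 0.5:5.1f}] = {s:5.1f}")
# the printed slopes are 4 for gamma < -2, 0 on (-2, 0) and -4 for gamma > 0

We now return to the function f.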
By continuity we may write T f (β) = bT ȳ + β∆bT ȳ = b (β) ȳ, ∀β ∈ [β1 , β2 ]. This immediately implies another consequence. ∗ Corollary IV.55 Under the hypothesis of Theorem IV.53 let D(β := Dβ∗ for 1 ,β2 ) arbitrary β ∈ (β1 , β2 ). Then ∗ ⊆ Dβ∗1 , D(β 1 ,β2 ) ∗ ⊆ Dβ∗2 . D(β 1 ,β2 ) In the next result we deal with the converse of the implication in Theorem IV.53. Theorem IV.56 Let β1 and β2 > β1 be such that Dβ∗1 = Dβ∗2 . Then Dβ∗ is constant for all β ∈ [β1 , β2 ] and f (β) is linear on the interval [β1 , β2 ]. Proof: Let ȳ ∈ Dβ∗1 = Dβ∗2 . Then f (β1 ) = b (β1 )T ȳ, f (β2 ) = b (β2 )T ȳ. Consider the linear function h: T T h(β) = b (β) ȳ = (b + β∆b) ȳ, ∀β ∈ [β1 , β2 ]. Then h coincides with f at β1 and β2 . Since f is convex this implies f (β) ≤ h(β), ∀β ∈ [β1 , β2 ]. Now ȳ is feasible for all β ∈ [β1 , β2 ]. Since f (β) is the optimal value of (Dβ ), it follows that T f (β) ≥ b(β)T ȳ = (b + β∆b) ȳ = h(β). Therefore, f coincides with h on [β1 , β2 ]. As a consequence, f is linear on [β1 , β2 ] and ȳ is optimal for (Dβ ) whenever β ∈ [β1 , β2 ]. Since ȳ is arbitrary in Dβ∗1 = Dβ∗2 this implies that Dβ∗1 = Dβ∗2 is a subset of Dβ∗ for any β ∈ (β1 , β2 ). By Theorem IV.53, and Corollary IV.55 we also have the converse inclusion. The dual optimal set on (β1 , β2 ) is therefore constant, and the proof is complete. ✷ Each of the above results about f (β) has its analogue for g(γ). We state these results without further proof.7 The omitted proofs are straightforward modifications of the above proofs. Theorem IV.57 If g(γ) is linear on the interval [γ1 , γ2 ], where γ1 < γ2 , then the primal optimal set Pγ∗ is constant for γ ∈ (γ1 , γ2 ). 7 Exercise 91 Prove Theorem IV.57, Corollary IV.58, Corollary IV.59 and Theorem IV.60. 372 IV Miscellaneous Topics Corollary IV.58 Under the hypothesis of Theorem IV.57, g ′ (γ) = ∆cT x, ∀γ ∈ (γ1 , γ2 ) , ∀x ∈ Pγ∗ . ∗ Corollary IV.59 Under the hypothesis of Theorem IV.57 let P(γ := Pγ∗ for 1 ,γ2 ) arbitrary γ ∈ (γ1 , γ2 ). Then ∗ P(γ ⊆ Pγ∗1 , 1 ,γ2 ) ∗ P(γ ⊆ Pγ∗2 . 1 ,γ2 ) Theorem IV.60 Let γ1 and γ2 > γ1 be such that Pγ∗1 = Pγ∗2 . Then Pγ∗ is constant for all γ ∈ [γ1 , γ2 ] and g(γ) is linear on the interval [γ1 , γ2 ]. 19.4.3 Optimal sets in a break point Returning to the function f , we established in the previous section that if β ∈ dom (f ) is not a break point of f then the quantity ∆bT y is constant for all y ∈ Dβ∗ . In this section we will see that this property is characteristic for ‘nonbreak’ points. If the domain of f has a right extreme point then we may consider the right derivative at this point to be ∞, and if the domain of f has a left extreme point the left derivative at this point may be taken as −∞. Then β is a break point of f if and only if the left and the right derivatives of f at β are different. This follows from ′ the definition of a break point. Denoting the left and the right derivatives by f− (β) ′ and f+ (β) respectively, the convexity of f implies that at a break point β we have ′ ′ f− (β) < f+ (β). If dom (f ) has a right extreme point, it is convenient to consider the open interval at the right of this point as a linearity interval where both f and its derivative are ∞. Similarly, if dom (f ) has a left extreme point, we may consider the open interval at the left of this point as a linearity interval where f is ∞ and its derivative −∞. Obviously, these extreme linearity intervals are characterized by the fact that on the intervals the primal problem is infeasible and the dual problem unbounded. 
The dual problem is unbounded if and only if the set Dβ∗ of optimal solutions is empty. Lemma IV.61 8 Let β, β − and β + belong to the interior of dom (f ) such that β + belongs to the open linearity interval just to the right of β and β − to the open linearity interval just to the left of β. Moreover, let y + ∈ Dβ∗ + and y − ∈ Dβ∗ − . Then ′ f− (β) = ′ f+ (β) =  min ∆bT y : y ∈ Dβ∗ = ∆bT y − y  max ∆bT y : y ∈ Dβ∗ = ∆bT y + . y ′ ′ Proof: We give the proof for f+ (β). The proof for f− (β) goes in the same way and + ∗ is omitted. Since y is optimal for Dβ + we have (b + β + ∆b)T y + = f (β + ) ≥ (b + β + ∆b)T y, ∀y ∈ Dβ∗ . 8 This lemma can also be obtained as a special case of a result of Mills [210]. His more general result gives the directional derivatives of the optimal-value function with respect to any ‘admissible’ perturbation of A, b and c; when only b is perturbed it gives the same result as the lemma. IV.19 Parametric and Sensitivity Analysis 373 We also have y + ∈ Dβ∗ , from Theorem IV.53 and Corollary IV.55. Therefore, T T (b + β∆b) y + = (b + β∆b) y, ∀y ∈ Dβ∗ . Subtracting both sides of this equality from the corresponding sides in the last inequality gives   β + − β ∆bT y + ≥ β + − β ∆bT y, ∀y ∈ Dβ∗ . Dividing both sides by the positive number β + − β we get ∆bT y + ≥ ∆bT y, ∀y ∈ Dβ∗ , thus proving that  max ∆bT y : y ∈ Dβ∗ = ∆bT y + . y Since ′ f+ (β) T + = ∆b y , from Corollary IV.54, the lemma follows. ✷ The above lemma admits a nice generalization that is also valid if β is an extreme point of the domain of f . Theorem IV.62 Let β ∈ dom (f ) and let x∗ be any optimal solution of (Pβ ). Then the derivatives at β satisfy  ′ f− (β) = min ∆bT y : AT y + s = c, s ≥ 0, sT x∗ = 0 y,s  ′ f+ (β) = max ∆bT y : AT y + s = c, s ≥ 0, sT x∗ = 0 . y,s ′ Proof: As in the previous lemma, we give the proof for f+ (β) and omit the proof for ′ f− (β). Consider the optimization problem  max ∆bT y : AT y + s = c, s ≥ 0, sT x∗ = 0 . (19.3) y,s First we establish that if β belongs to the interior of dom (f ) then this is exactly the same problem as the maximization problem in Lemma IV.61. This follows because if AT y + s = c, s ≥ 0, then (y, s) is optimal for (Dβ ) if and only if sT x∗ = 0, since x∗ is an optimal solution of the dual problem (Pβ ) of (Dβ ). If β belongs to the interior of dom (f ) then the theorem follows from Lemma IV.61. Hence it remains to deal with the case where β is an extreme point of dom (f ). It is easily verified that if β is the left extreme point of dom (f ) then we can repeat the arguments in the proof of Lemma IV.61. Thus it remains to prove the theorem if β is the right extreme point of ′ dom (f ). Since f+ (β) = ∞ in that case, we need to show that the above maximization problem (19.3) is unbounded. Let β be the right extreme point of dom (f ) and suppose that the problem (19.3) is not unbounded. Let us point out first that (19.3) is feasible. Its feasible region is just the optimal set of the dual (Dβ ) of (Pβ ). Since (Pβ ) has as an optimal solution, (Dβ ) has an optimal solution as well. This implies that (Dβ ) is feasible. Therefore, (19.3) is feasible as well. Hence, if (19.3) is not unbounded, the problem itself and its dual have optimal solutions. The dual problem is given by  min cT ξ : Aξ = ∆b, ξ + λx∗ ≥ 0 . ξ,λ 374 IV Miscellaneous Topics We conclude that there exists a vector ξ ∈ IRn and a scalar λ such that Aξ = ∆b, ξ + λx∗ ≥ 0. This implies that we cannot have ξi < 0 and x∗i = 0. In other words, x∗i = 0 ⇒ ξi ≥ 0. 
Hence, there exists a positive ε such that x̄ := x∗ + εξ ≥ 0. Now we have Ax̄ = A (x∗ + εξ) = Ax∗ + εAξ = b + (β + ε) ∆b. Thus we find that (Pβ+ε ) admits x̄ as a feasible point. This contradicts the assumption that β is the right extreme point of dom (f ). We conclude that (19.3) is unbounded, proving the theorem. ✷ The picture becomes more complete now. Note that Theorem IV.62 is valid for any value of β in the domain of f . The theorem reestablishes that at a ‘nonbreak’ point, where the left and right derivative of f are equal, the value of ∆bT y is constant when y runs through the dual optimal set Dβ∗ . But it also makes it clear that at a break point, where the two derivatives are different, ∆bT y is not constant when y runs through the dual optimal set Dβ∗ . Then the extreme values of ∆bT y yield the left and the right derivatives of f at β; the left derivative is the minimum and the right derivative the maximal value of ∆bT y when y runs through the dual optimal set Dβ∗ . It is worth pointing out another consequence of Lemma IV.61 and Theorem IV.62. Using the notation of the lemma we have the inclusions Dβ∗ − ⊆ Dβ∗ , Dβ∗ + ⊆ Dβ∗ , which follow from Corollary IV.55 if β is not an extreme point of dom (f ). If β is the right extreme point then Dβ∗ + is empty, and if it is the left extreme point then Dβ∗ − is empty as well; hence the above inclusions hold everywhere. Now suppose that β is a nonextreme break point of f . Then letting y run through the set Dβ∗ − we know that ∆bT y is constant and equal to the left derivative of f at β, and if y runs through Dβ∗ + then ∆bT y is constant and equal to the right derivative of f at β and, finally, if y runs through Dβ∗ then ∆bT y is not constant. Thus the three sets must be mutually different. As a consequence, the above inclusions must be strict. Moreover, since the left and the right derivatives at β are different, the sets Dβ∗ − and Dβ∗ + are disjoint. Thus we may state the following. Corollary IV.63 Let β be a nonextreme break point of f and let β + and β − be as defined in Lemma IV.61. Then we have Dβ∗ − ⊂ Dβ∗ , Dβ∗ + ⊂ Dβ∗ , Dβ∗ − ∩ Dβ∗ + = ∅, where the inclusions are strict.9 9 Exercise 92 Using the notation of Lemma IV.61 and Corollary IV.63, we have Dβ∗ − ∪ Dβ∗ + ⊆ Dβ∗ . Show that the inclusion is always strict. (Hint: use the central solution of (Dβ ).) IV.19 Parametric and Sensitivity Analysis 375 Two other almost obvious consequences of the above results are the following corollaries.10 Corollary IV.64 Let β be a nonextreme break point of f and let β + and β − be as defined in Lemma IV.61. Then  Dβ∗ − = y ∈ Dβ∗ : ∆bT y = ∆bT y − ,  Dβ∗ + = y ∈ Dβ∗ : ∆bT y = ∆bT y + . Corollary IV.65 Let β be a nonextreme break point of f and let β + and β − be as defined in Lemma IV.61. Then dim Dβ∗ − < dim Dβ∗ , dim Dβ∗ + < dim Dβ∗ . Remark IV.66 It is interesting to consider the dual optimal set Dβ∗ when β runs from −∞ to ∞. To the left of the smallest break point (the break point for which β is minimal) the set Dβ∗ is constant. It may happen that Dβ∗ is empty there, due to the absence of optimal solutions for these small values of β. This occurs if (Dβ ) is unbounded (which means that (Pβ ) is infeasible) for the values of β on the farthest left open linearity interval. Then, at the first break point, the set Dβ∗ increases to a larger set, and as we pass to the next open linearity interval the set Dβ∗ becomes equal to a proper subset of this enlarged set. 
This process repeats itself at every new break point: at a break point of f the dual optimal set expands itself, and as we pass to the next open linearity interval it shrinks to a proper subset of the enlarged set. Since the derivative of f is monotonically increasing when β runs from −∞ to ∞, every new dual optimal set arising in this way differs from all previous ones. In other words, every break point of f and every linearity interval of f has its own dual optimal set.11 • We state the dual analogues of Lemma IV.61 and Theorem IV.62 and their corollaries without further proof.12 Lemma IV.67 Let γ, γ − and γ + belong to the interior of dom (g), γ + to the open linearity interval just to the right of γ, and γ − to the open linearity interval just to the left of γ. Moreover, let x+ ∈ Pγ∗+ and x− ∈ Pγ∗− . Then ′ g− (γ) = ′ g+ (γ) =  max ∆cT x : x ∈ Pγ∗ = ∆cT x− x  min ∆cT x : x ∈ Pγ∗ = ∆cT x+ . x Theorem IV.68 Let γ ∈ dom (g) and let (y ∗ , s∗ ) be any optimal solution of (Dγ ). Then the derivatives at γ satisfy ′ g− (γ) = ′ g+ (γ) =  max ∆cT x : Ax = b, x ≥ 0, xT s∗ = 0 x  min ∆cT x : Ax = b, x ≥ 0, xT s∗ = 0 . x 10 Exercise 93 Prove Corollary IV.64 and Corollary IV.65. 11 Exercise 94 The dual optimal sets belonging to two different open linearity intervals of f are disjoint. Prove this. (Hint: use that the derivatives of f on the two intervals are different.) Exercise 95 Prove Lemma IV.67, Theorem IV.68, Corollary IV.69, Corollary IV.70 and Corollary IV.71. 12 376 IV Miscellaneous Topics Corollary IV.69 Let γ be a nonextreme break point of g and let γ + and γ − be as defined in Lemma IV.67. Then Pγ∗− ⊂ Pγ∗ , Pγ∗+ ⊂ Pγ∗ , Pγ∗− ∩ Pγ∗+ = ∅, where the inclusions are strict.13 Corollary IV.70 Let γ be a nonextreme break point of g and let γ + and γ − be as defined in Lemma IV.67. Then   Pγ∗− = x ∈ Pγ∗ : ∆cT x = ∆cT x− , Pγ∗+ = x ∈ Pγ∗ : ∆cT x = ∆cT x+ . Corollary IV.71 Let γ be a nonextreme break point of g and let γ + and γ − be as defined in Lemma IV.67. Then dim Pγ∗− < dim Pγ∗ , dim Pγ∗+ < dim Pγ∗ . The next example illustrates the results of this section. Example IV.72 We use the same problem as in Example IV.52. For any γ ∈ IR the problem (Pγ ) is defined by (Pγ ) min x1 + (3 + γ)x2 + (1 − γ)x3 s.t. x1 + x2 + x3 = 4, x1 , x2 , x3 ≥ 0, and the dual problem is (Dγ ) max {4y : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ} . The perturbation vector for c = (1, 3, 1) is ∆c = (0, 1, −1). The graph of g is depicted in Figure 19.3 (page 369). The break points of g occur at γ = −2 and γ = 0. For γ < −2 the optimal solution of (Pγ ) is x = (0, 4, 0), and then ∆cT x = 4. At the break point γ = −2 the primal optimal solution set is given by {x = (x1 , x2 , 0) : x1 + x2 = 4, x1 ≥ 0, x2 ≥ 0} . The extreme values of ∆cT x on this set are 4 and 0. The maximal value occurs for x = (0, 4, 0) and the minimal value for x = (4, 0, 0). Hence, the left and right derivatives of g at γ = −2 are given by these values. If −2 < γ < 0 then the optimal solution of the primal problem is given by x = (4, 0, 0) and ∆cT x = 0, so the derivative of g is 0 in this region. At the break point γ = 0 the primal optimal solution set is given by {x = (x1 , 0, x3 ) : x1 + x3 = 4, x1 ≥ 0, x3 ≥ 0} . The extreme values of ∆cT x on this set are 0 and −4. The left and right derivatives of g at γ = 0 are given by these values. The maximal value occurs for x = (4, 0, 0) and the minimal value for x = (0, 0, 4). 
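The one-sided derivatives just computed by hand can be obtained directly from Theorem IV.68: given any optimal solution (y∗, s∗) of (Dγ), minimize and maximize ∆cT x over the primal optimal set. The sketch below is ours, not the book's; it assumes SciPy's linprog (HiGHS backend), and the helper name and tolerance are arbitrary. It reproduces the values at the break points γ = −2 and γ = 0 of Example IV.72.

import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 1.0]])
b = np.array([4.0])
c = np.array([1.0, 3.0, 1.0])
dc = np.array([0.0, 1.0, -1.0])

def one_sided_derivatives(gamma, tol=1e-8):
    # Left and right derivatives of g at gamma, following Theorem IV.68.
    m, n = A.shape
    # any optimal solution (y*, s*) of (D_gamma): max{b'y : A'y <= c + gamma*dc}
    dual = linprog(-b, A_ub=A.T, b_ub=c + gamma * dc,
                   bounds=[(None, None)] * m, method="highs")
    s = c + gamma * dc - A.T @ dual.x
    # x >= 0 and x's* = 0 simply force x_j = 0 whenever s*_j > 0
    bnds = [(0.0, 0.0) if s[j] > tol else (0.0, None) for j in range(n)]
    left = linprog(-dc, A_eq=A, b_eq=b, bounds=bnds, method="highs")   # max dc'x
    right = linprog(dc, A_eq=A, b_eq=b, bounds=bnds, method="highs")   # min dc'x
    return -left.fun, right.fun

print(one_sided_derivatives(-2.0))   # expected ( 4.0,  0.0)
print(one_sided_derivatives( 0.0))   # expected ( 0.0, -4.0)
print(one_sided_derivatives(-1.0))   # not a break point: both derivatives are 0.0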
Observe that in this example the primal optimal solution set at every break point has dimension 1, whereas in the open linearity intervals the optimal solution is always unique. ♦ 13 Exercise 96 Find an example where Pγ∗− = ∅ and Pγ∗ 6= ∅. IV.19 Parametric and Sensitivity Analysis 19.4.4 377 Extreme points of a linearity interval In this section we assume that β̄ belongs to the interior of a linearity interval [β1 , β2 ]. Given an optimal solution of (Dβ̄ ) we show how the extreme points β1 and β2 of the linearity interval containing β̄ can be found by solving two auxiliary LO problems. Theorem IV.73 Let β̄ be arbitrary and let (y ∗ , s∗ ) be any optimal solution of (Dβ̄ ). Then the extreme points of the linearity interval [β1 , β2 ] containing β̄ follow from  β1 = min β : Ax = b + β∆b, x ≥ 0, xT s∗ = 0 β,x  β2 = max β : Ax = b + β∆b, x ≥ 0, xT s∗ = 0 . β,x Proof: We only give the proof for β1 .14 Consider the minimization problem  min β : Ax = b + β∆b, x ≥ 0, xT s∗ = 0 . (19.4) β,x We first show that this problem is feasible. Since (Dβ̄ ) has an optimal solution, its dual problem (Pβ̄ ) has an optimal solution as well. Letting x̄ be optimal for (Pβ̄ ), we can easily verify that β = β̄ and x = x̄ are feasible for (19.4). We proceed by considering the case where (19.4) is unbounded. For any β ≤ β̄ there exists a vector x that satisfies Ax = b + β∆b, x ≥ 0, xT s∗ = 0. Now (y ∗ , s∗ ) is feasible for (Dβ ) and x is feasible for (Pβ ). Since xT s∗ = 0, x is optimal for (Pβ ) and (y ∗ , s∗ ) is optimal for (Dβ ). The optimal value of both problems is given by b(β)T y ∗ = bT y ∗ + β∆bT y ∗ . This means that β belongs to the linearity interval containing β̄. Since this holds for any β ≤ β̄, the left boundary of this linearity interval is −∞, as it should be. It remains to deal with the case where (19.4) has an optimal solution, say (β ∗ , x∗ ). We then have Ax∗ = b + β ∗ ∆b = b(β ∗ ), so x∗ is feasible for (Pβ ∗ ). Since (y ∗ , s∗ ) is feasible for (Dβ ∗ ) and x∗ T s∗ = 0 it follows that x∗ is optimal for (Pβ ∗ ) and (y ∗ , s∗ ) is optimal for (Dβ ∗ ). The optimal value of both problems is given by b(β ∗ )T y ∗ = bT y ∗ + β ∗ ∆bT y ∗ . This means that β ∗ belongs to the linearity interval containing β̄, and it follows that β ∗ ≥ β1 . On the other hand, from Corollary IV.55 the pair (y ∗ , s∗ ) is optimal for (Dβ1 ). Now let x̄ be optimal for (Pβ1 ). Then we have Ax̄ = b(β1 ) = b + β1 ∆b, x ≥ 0, x̄T s∗ = 0, which shows that the pair (β1 , x̄) is feasible for the above minimization problem. This implies that β ∗ ≤ β1 . Hence we obtain that β ∗ = β1 . This completes the proof. ✷ If β̄ is not a break point then there is only one linearity interval containing β̄, and hence this must be the linearity interval [β1 , β2 ], as given by Theorem IV.73. It is worth pointing out that if β̄ is a break point there are three linearity intervals containing β̄, namely the singleton interval [β̄, β̄] and the two surrounding linearity intervals. In the singleton case, the linearity interval [β1 , β2 ] given by Theorem IV.73 may be any of these three intervals, and which one it is depends on the given optimal 14 Exercise 97 Prove the second part (on β2 ) of Theorem IV.73. 378 IV Miscellaneous Topics solution (y ∗ , s∗ ) of (Dβ̄ ). It can easily be understood that the linearity interval at the right of β̄ will be found if (y ∗ , s∗ ) happens to be optimal on the right linearity ′ interval. This occurs when ∆bT y ∗ = f+ (β̄), due to Corollary IV.64. 
Similarly, the linearity interval at the left of β̄ will be found if (y ∗ , s∗ ) is optimal on the left linearity ′ interval and this occurs when ∆bT y ∗ = f− (β̄), also due to Corollary IV.64. Finally, if ′ ′ f− (β̄) < ∆bT y ∗ < f+ (β̄), (19.5) then we have β1 = β2 = β̄ in Theorem IV.73. The last situation seems to be most informative. It clearly indicates that β̄ is a break point of f , which is not apparent in the other two situations. Knowing that β̄ is a break point of f we can find the two one-sided derivatives of f at β̄ as well as optimal solutions for the two intervals surrounding β̄ from Theorem IV.62. In the light of this discussion the following result is of interest. It shows that the above ambiguity can be avoided by the use of strictly complementary optimal solutions. Theorem IV.74 Let β̄ be a break point and let (y ∗ , s∗ ) be a strictly complementary optimal solution of (Dβ̄ ). Then the numbers β1 and β2 given by Theorem IV.73 satisfy β1 = β2 = β̄. Proof: If (y ∗ , s∗ ) is a strictly complementary optimal solution of (Dβ̄ ) then it uniquely determines the optimal partition of (Dβ̄ ) and this partition differs from the optimal partitions corresponding to the optimal sets of the linearity intervals surrounding β̄. Hence (y ∗ , s∗ ) does not belong to the optimal sets of the linearity intervals surrounding β̄. From Corollary IV.64 it follows that ∆bT y ∗ satisfies (19.5), and the theorem follows. ✷ It is not difficult to state the corresponding results for g. We do this below, omitting the proofs, and then provide an example of their use.15 Theorem IV.75 Let γ̄ be arbitrary and let x∗ be any optimal solution of (Pγ̄ ). Then the extreme points of the linearity interval [γ1 , γ2 ] containing γ̄ follow from  γ1 = min γ : AT y + s = c + γ∆c, s ≥ 0, sT x∗ = 0 γ,y,s  γ2 = max γ : AT y + s = c + γ∆c, s ≥ 0, sT x∗ = 0 . γ,y,s Theorem IV.76 Let γ̄ be a break point and let x∗ be a strictly complementary optimal solution of (Pγ̄ ). Then the numbers γ1 and γ2 given by Theorem IV.75 satisfy γ1 = γ2 = γ̄. Example IV.77 We use the same problem as in Example IV.72. Using the notation of Theorem IV.75 we first determine the linearity interval for γ̄ = −1. We can easily verify that x = (4, 0, 0) is optimal for (P−1 ). Hence the extreme points γ1 and γ2 of the linearity interval containing γ̄ follow by minimizing and maximizing γ over the region {γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 4(1 − y) = 0} . 15 Exercise 98 Prove Theorem IV.75 and Theorem IV.76. IV.19 Parametric and Sensitivity Analysis 379 The last constraint implies y = 1, so the other constraints reduce to 1 ≤ 3 + γ and 1 ≤ 1 − γ, which gives −2 ≤ γ ≤ 0. Hence the linearity interval containing γ̄ = −1 is [−2, 0]. When γ̄ = 1, x = (0, 0, 4) is optimal for (P1 ), and the linearity interval containing γ̄ follows by minimizing and maximizing γ over the region {γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 4(1 − γ − y) = 0} . The last constraint implies y = 1 − γ. Now the other constraints reduce to 1 − γ ≤ 1 and 1 − γ ≤ 3 + γ, which is equivalent to γ ≥ 0. So the linearity interval containing γ̄ = 1 is [0, ∞). When γ̄ = −3, x = (0, 4, 0) is optimal for (P−3 ), and the linearity interval containing γ̄ follows by minimizing and maximizing γ over the region {γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 4(3 + γ − y) = 0} . The last constraint implies y = 3+γ, and the other constraints reduce to 3+γ ≤ 1 and 3 + γ ≤ 1 − γ, which is equivalent to γ ≤ −2. Thus, the linearity interval containing γ̄ = −3 is (−∞, −2]. 
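Theorem IV.75 translates directly into two small LPs in the variables (γ, y). The sketch below is ours, not the book's, and assumes SciPy's linprog with the HiGHS backend; the helper name and tolerance are arbitrary. It reproduces the intervals just calculated: [−2, 0] for γ̄ = −1 with x∗ = (4, 0, 0), [0, ∞) for γ̄ = 1 with x∗ = (0, 0, 4), and (−∞, −2] for γ̄ = −3 with x∗ = (0, 4, 0).

import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 1.0]])
c = np.array([1.0, 3.0, 1.0])
dc = np.array([0.0, 1.0, -1.0])

def linearity_interval(x_opt, tol=1e-8):
    # Extreme points of the linearity interval of g containing gamma_bar,
    # given an optimal solution x_opt of (P_gamma_bar), following Theorem IV.75.
    # Decision vector z = (gamma, y); the system A'y + s = c + gamma*dc, s >= 0,
    # s'x_opt = 0 becomes A'y - gamma*dc <= c, with equality where x_opt_j > 0.
    m, n = A.shape
    G = np.hstack([-dc.reshape(-1, 1), A.T])          # row j: -gamma*dc_j + A_j'y
    act = x_opt > tol
    kw = dict(A_ub=G[~act] if (~act).any() else None,
              b_ub=c[~act] if (~act).any() else None,
              A_eq=G[act] if act.any() else None,
              b_eq=c[act] if act.any() else None,
              bounds=[(None, None)] * (1 + m), method="highs")
    obj = np.zeros(1 + m); obj[0] = 1.0               # objective: gamma
    lo = linprog(obj, **kw)
    hi = linprog(-obj, **kw)
    gamma1 = lo.fun if lo.status == 0 else -np.inf    # status 3 means unbounded
    gamma2 = -hi.fun if hi.status == 0 else np.inf
    return gamma1, gamma2

print(linearity_interval(np.array([4.0, 0.0, 0.0])))  # (-2.0, 0.0)
print(linearity_interval(np.array([0.0, 0.0, 4.0])))  # ( 0.0, inf)
print(linearity_interval(np.array([0.0, 4.0, 0.0])))  # (-inf, -2.0)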
Observe that the linearity intervals just calculated agree with Figure 19.3. Finally we demonstrate the use of Theorem IV.76 at a break point. Taking γ̄ = 0, we see that x = (4, 0, 0) is optimal for (P0 ), and we need to minimize and maximize γ over the region {γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 4(1 − y) = 0} . This gives −2 ≤ γ ≤ 0 and we find the linearity interval [−2, 0] left from 0. This is because x = (4, 0, 0) is also optimal on this interval. Recall from Example IV.72 that the optimal set at γ = 0 is given by {x = (x1 , 0, x3 ) : x1 + x3 = 4, x1 ≥ 0, x3 ≥ 0} . Thus, instead of the optimal solution x = (4, 0, 0) we may equally well use the strictly complementary solution x = (2, 0, 2). Then we need to minimize and maximize γ over the region {γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 2(1 − y) + 2(1 − γ − y) = 0} . The last constraint amounts to γ = 2 − 2y. Substitution in the third constraint yields y ≤ −1 + 2y or y ≥ 1. Because of the first constraint we get y = 1, from which it follows that γ = 0. Thus, γ1 = γ2 = 0 in accordance with Theorem IV.76. ♦ 19.4.5 Running through all break points and linearity intervals Using the results of the previous sections, we present in this section an algorithm that yields the optimal-value function for a one-dimensional perturbation of the vector b or the vector c. We first deal with a one-dimensional perturbation of b by a scalar multiple of the vector ∆b; we state the algorithm for the calculation of the optimalvalue function and then prove that the algorithm finds all break points and linearity 380 IV Miscellaneous Topics intervals. It will then be clear how to treat a one-dimensional perturbation of c; we state the corresponding algorithm and its convergence result without further proof. We provide examples for both cases. Assume that we are given optimal solutions x∗ of (P ) and (y ∗ , s∗ ) of (D). In the notation of the previous sections, the problem (Pβ ) and its dual (Dβ ) arise by replacing the vector b by b(β) = b+β∆b; the optimal value of these problems is denoted by f (β). So we have f (0) = cT x∗ = bT y ∗ . The domain of the optimal-value function is (−∞, ∞) and f (β) = ∞ if and only if (Dβ ) is unbounded. Recall from Theorem IV.51 that f (β) is convex and piecewise linear. Below we present an algorithm that determines f on the nonnegative part of the real line. We leave it to the reader to find some straightforward modifications of the algorithm, yielding an algorithm that generates f on the other part of the real line.16 The algorithm is as follows.17 The Optimal Value Function f (β), β ≥ 0 Input: An optimal solution (y ∗ , s∗ ) of (D); a perturbation vector ∆b. begin k := 1; y 0 := y ∗ ; s0 = s∗ ; ready:=false; while not ready do begin  Solve maxβ,x β : Ax = b + β∆b, x ≥ 0, xT sk−1 = 0 ; if this problem is unbounded: ready:=true else let (βk , xk ) be an optimal solution; begin  Solve maxy,s ∆bT y : AT y + s = c, s ≥ 0, sT xk = 0 ; if this problem is unbounded: ready := true else let (y k , sk ) be an optimal solution; k := k + 1; end end end The next theorem states that the above algorithm finds the successive break points of f on the nonnegative part of the real line, as well as the slopes of f on the successive linearity intervals. Theorem IV.78 The algorithm terminates after a finite number of iterations. If K is the number of iterations upon termination then β1 , β2 , . . . 
, βK are the successive 16 17 Exercise 99 When the two maximization problems in the algorithm are changed into minimization problems, the algorithm yields the break points and linearity intervals for negative values of β. Prove this. After the completion of this section the same algorithm appeared in a recent paper of Monteiro and Mehrotra [221] and the authors became aware of the fact that these authors already published the algorithm in 1992 [207]. IV.19 Parametric and Sensitivity Analysis 381 break points of f on the nonnegative real line. The optimal value at βk (1 ≤ k ≤ K) is given by cT xk and the slope of f on the interval (βk , βk+1 ) (1 ≤ k < K) by ∆bT y k . Proof: In the first iteration the algorithm starts by solving  max β : Ax = b + β∆b, x ≥ 0, xT s0 = 0 , β,x where s0 is the slack vector in the given optimal solution (y 0 , s0 ) = (y ∗ , s∗ ) of (D) = (D0 ). This problem is feasible, because (P ) has an optimal solution x∗ and (β, x) = (0, x∗ ) satisfies the constraints. Hence the first auxiliary problem is either unbounded or it has an optimal solution (β1 , x1 ). By Theorem IV.73 β1 is equal to the extreme point at the right of the linearity interval containing 0. If the problem is unbounded (when β1 = ∞) then f is linear on (0, ∞) and the algorithm stops; otherwise β1 is the first break point to the right of 0. (Note that it may happen that β1 = 0. This certainly occurs if 0 is a break point of f and the starting solution (y ∗ , s∗ ) is strictly complementary.) Clearly x1 is primal feasible at β = β1 . Since (y 1 , s1 ) is dual feasible at β = β1 and (x1 )T s1 = 0 we see that x1 is optimal for (Pβ1 ). Hence f (β1 ) = cT x1 . Also observe that (y 1 , s1 ) is dual optimal at β1 . (This also follows from Corollary IV.55.) Assuming that the second half of the algorithm occurs, when the above problem has an optimal solution, the algorithm proceeds by solving a second auxiliary problem, namely  max ∆bT y : AT y + s = c, s ≥ 0, sT x1 = 0 . y,s By Theorem IV.62 the maximal value is equal to the right derivative of f at β1 . If the problem is unbounded then β1 is the largest break point of f on (0, ∞) and f (β) = ∞ for β > β1 . In that case we are done and the algorithm stops. Otherwise, when the problem is bounded, the optimal solution (y 1 , s1 ) is such that ∆bT y 1 is equal to the slope on the linearity interval to the right of β1 , by Lemma IV.61. Moreover, from Corollary IV.64, (y 1 , s1 ) is dual optimal on the open linearity interval to the right of β1 . Hence, at the start of the second iteration (y 1 , s1 ) is an optimal solution at the open interval to the right of the first break point on [0, ∞). Thus we can start the second iteration and proceed as in the first iteration. Since each iteration produces a linearity interval, and f has only finitely many such intervals, the algorithm terminates after a finite number of iterations. ✷ Example IV.79 Consider the primal problem (P ) min {x1 + x2 + x3 : x1 − x2 = 0, x3 = 1, x = (x1 , x2 , x3 ) ≥ 0} and its dual max {y2 : −1 ≤ y1 ≤ 1, y2 ≤ 1} . (D) Hence, in this case we have A= " 1 −1 0 0 0 1 # ,  1   c =  1 , 1  b= " 0 1 # . 382 IV Miscellaneous Topics We perturb the vector b by a scalar multiple of " # 1 ∆b = −1 to b(β) = b + β∆b = " 0 1 # +β " 1 −1 # = " β 1−β # , and use the algorithm to find the break points and linearity intervals of f (β) = z (b(β), c). Optimal solutions of (P ) and (D) are given by x∗ = (0, 0, 1), y ∗ = (0, 1), s∗ = (1, 1, 0). 
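Before stepping through the two iterations by hand, we give a small computational sketch of the algorithm above. The sketch is ours, not part of the original text; it assumes SciPy's linprog with the HiGHS backend, and the tolerance and helper name are arbitrary. Started from the optimal dual solution (y∗, s∗) = ((0, 1), (1, 1, 0)) given above, it reproduces the break points β1 = 0 and β2 = 1 and the slope 0 on the interval in between; the algorithm for g(γ) can be implemented analogously, with the directions of the two auxiliary problems reversed.

import numpy as np
from scipy.optimize import linprog

def breakpoints_f(A, b, c, db, y0, tol=1e-8, max_iter=100):
    # Break points of f(beta) on the nonnegative real line, following the
    # algorithm above.  Input: problem data (A, b, c), a perturbation db and
    # an optimal dual solution y0 of (D).  Returns the break points with
    # their f-values, and the slopes on the successive linearity intervals.
    m, n = A.shape
    s = c - A.T @ y0
    breakpoints, slopes = [], []
    for _ in range(max_iter):
        # auxiliary problem 1: max beta s.t. Ax - beta*db = b, x >= 0,
        #                      x_j = 0 whenever s_j > 0
        bnds = [(None, None)] + [(0.0, 0.0) if s[j] > tol else (0.0, None)
                                 for j in range(n)]
        obj = np.r_[-1.0, np.zeros(n)]                 # maximize beta
        A_eq = np.hstack([-db.reshape(-1, 1), A])
        p1 = linprog(obj, A_eq=A_eq, b_eq=b, bounds=bnds, method="highs")
        if p1.status != 0:        # unbounded: f is linear to the right
            break
        beta, x = p1.x[0], p1.x[1:]
        breakpoints.append((float(beta), float(c @ x)))   # f(beta) = c'x
        # auxiliary problem 2: max db'y s.t. A'y <= c, with equality in the
        #                      components j where x_j > 0
        act = x > tol
        p2 = linprog(-db,
                     A_ub=A.T[~act] if (~act).any() else None,
                     b_ub=c[~act] if (~act).any() else None,
                     A_eq=A.T[act] if act.any() else None,
                     b_eq=c[act] if act.any() else None,
                     bounds=[(None, None)] * m, method="highs")
        if p2.status != 0:        # unbounded: the last break point is reached
            break
        y = p2.x
        slopes.append(float(db @ y))
        s = c - A.T @ y
    return breakpoints, slopes

A = np.array([[1.0, -1.0, 0.0], [0.0, 0.0, 1.0]])
b = np.array([0.0, 1.0])
c = np.array([1.0, 1.0, 1.0])
db = np.array([1.0, -1.0])
print(breakpoints_f(A, b, c, db, y0=np.array([0.0, 1.0])))
# expected: ([(0.0, 1.0), (1.0, 1.0)], [0.0])

We now carry out the same two iterations by hand.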
Thus, entering the first iteration of the algorithm we consider max {β : x1 − x2 = β, x3 = 1 − β, x ≥ 0, x1 + x2 = 0} . β,x From x ≥ 0, x1 + x2 = 0 we deduce that x1 = x2 = 0 and hence β = 0. Thus we find the first break point and the optimal value at this break point: β1 = 0, x1 = (0, 0, 1), f (β1 ) = cT x1 = 1. We proceed with the second auxiliary problem: max {y1 − y2 : −1 ≤ y1 ≤ 1, y2 ≤ 1, 1 − y2 = 0} . y It follows that y2 = 1 and y1 − y2 = y1 − 1 is maximal if y1 = 1. Thus we find an optimal solution (y 1 , s1 ) for the linearity interval just to the right of β1 and the slope of f on this interval: y 1 = (1, 1), s1 = (0, 2, 0), ′ f+ (β1 ) = ∆bT y1 = 0. In the second iteration the first auxiliary problem is max {β : x1 − x2 = β, x3 = 1 − β, x ≥ 0, 2x2 = 0} , β,x which is equivalent to max {β : β = x1 , β = 1 − x3 , x ≥ 0, x2 = 0} . β,x Clearly the maximum value of β is attained at x1 = 1 and x3 = 0. Thus we find the second break point and the optimal value at this break point: β2 = 1, x1 = (1, 0, 0), The second auxiliary problem becomes f (β2 ) = cT x2 = 1. IV.19 Parametric and Sensitivity Analysis 383 9 8 f (β) 7 ✻ 6 5 4 3 2 1 0 −1 −4 −3 −2 Figure 19.4 −1 0 ✲ β 1 2 The optimal-value function f (β). max {y1 − y2 : −1 ≤ y1 ≤ 1, y2 ≤ 1, 1 − y1 = 0} , y which is equivalent to max {1 − y2 : y2 ≤ 1, y1 = 1} . y ′ (β2 ) = ∞ and we are done. For larger Clearly this problem is unbounded. Hence f+ values of β the primal problem (Pβ ) becomes infeasible and the dual problem (Dβ ) unbounded. We proceed by calculating f (β) for negative values of β. Using Exercise 99 (page 380, the first auxiliary problem, in the first iteration, becomes simply min {β : x1 − x2 = β, x3 = 1 − β, x ≥ 0, x1 + x2 = 0} . β,x We can easily verify that this problem has the same solution as its counterpart, when we maximize β. This is due to the fact that β = 0 is a break point of f . We find, as before, β1 = 0, x1 = (0, 0, 1), f (β1 ) = cT x1 = 1. We proceed with the second auxiliary problem: min {y1 − y2 : −1 ≤ y1 ≤ 1, y2 ≤ 1, 1 − y2 = 0} . y Since y2 = 1 we have y1 − y2 = y1 − 1 and this is minimal if y1 = −1. Thus we find an optimal solution (y 1 , s1 ) for the linearity interval just to the left of β1 = 0 and the slope of f on this interval: y 1 = (−1, 1), s1 = (2, 0, 0), ′ f− (β1 ) = ∆bT y1 = −2. 384 IV Miscellaneous Topics In the second iteration the first auxiliary problem becomes min {β : x1 − x2 = β, x3 = 1 − β, x ≥ 0, 2x1 = 0} , β,x which is equivalent to min {β : β = −x2 , β = 1 − x3 , x ≥ 0, x1 = 0} . β,x Obviously this problem is unbounded. This means that f (β) is linear on the negative real line, and we are done. Figure 19.4 (page 383) depicts the optimal-value function f (β) as just calculated. ♦ When the vector c is perturbed by a scalar multiple of ∆c to c(γ) = c + γ∆c, the algorithm for the calculation of the optimal value function g(γ) can be stated as follows. Recall that g is concave. That is why the second auxiliary problem in the algorithm is a minimization problem.18 The Optimal Value Function g(γ), γ ≥ 0 Input: An optimal solution x∗ of (P ); a perturbation vector ∆c. 
begin ready:=false; k := 1; x0 := x∗ ; while not ready do begin  Solve maxγ,y,s γ : AT y + s = c + γ∆c, s ≥ 0, sT xk−1 = 0 ; if this problem is unbounded: ready:=true else let (γk , y k , sk ) be an optimal solution; begin  Solve minx ∆cT x : Ax = b, x ≥ 0, xT sk = 0 ; if this problem is unbounded: ready:=true else let xk be an optimal solution; k := k + 1; end end end The above algorithm finds the successive break points of g on the nonnegative real line as well as the slopes of g on the successive linearity intervals. The proof uses 18 Exercise 100 When the maximization problem in the algorithm is changed into a minimization problem and the minimization into a maximization problem, the algorithm yields the break points and linearity intervals for negative values of γ. Prove this. IV.19 Parametric and Sensitivity Analysis 385 arguments similar to the arguments in the proof of Theorem IV.78 and is therefore omitted. Theorem IV.80 The algorithm terminates after a finite number of iterations. If K is the number of iterations upon termination then γ1 , γ2 , . . . , γK are the successive break points of g on the nonnegative real line. The optimal value at γk (1 ≤ k ≤ K) is given by bT y k and the slope of g on the interval (γk , γk+1 ) (1 ≤ k < K) by ∆cT xk . ✷ The next example illustrates the use of the above algorithm. Example IV.81 In Example IV.72 we considered the primal problem min {x1 + 3x2 + x3 : x1 + x2 + x3 = 4, x1 , x2 , x3 ≥ 0} (P ) and its dual problem (D) max {4y : y ≤ 1, y ≤ 3, y ≤ 1} , with the perturbation vector ∆c = (0, 1, −1) and we calculated the linearity intervals from Lemma IV.67. This required the knowledge of an optimal primal solution for each interval. Theorem IV.80 enables us to find these intervals from the knowledge of an optimal solution x∗ of (P ) only. Entering the first iteration of the above algorithm with x∗ = (4, 0, 0) we consider max {γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 4(1 − y) = 0} . γ,y We can easily see that y = 1 is optimal with γ = 0. Thus we find the first break point and the optimal value at this break point: γ1 = 0, y 1 = 1, s1 = (0, 2, 0), g(γ1 ) = bT y 1 = 4. The second auxiliary problem is now given by: min {x2 − x3 : x1 + x2 + x3 = 4, x1 , x2 , x3 ≥ 0, 2x2 = 0} . x It follows that x2 = 0 and x2 − x3 = −x3 is minimal if x3 = 4 and x1 = 0. Thus we find an optimal solution x1 for the linearity interval just to the right of γ1 and the slope of g on this interval: x1 = (0, 0, 4), ′ g+ (γ1 ) = ∆cT x1 = −4. In the second iteration the first auxiliary problem is max {γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 4(1 − γ − y) = 0} . γ,y 386 IV Miscellaneous Topics It follows that y = 1 − γ and the problem becomes equivalent to max {γ : 1 − γ ≤ 1, 1 − γ ≤ 3 + γ, y = 1 − γ} . γ,y Clearly this problem is unbounded. Hence g is linear for values of γ larger than γ1 = 0. We proceed by calculating g(γ) for negative values of γ. Using Exercise 100 (page 384), the first auxiliary problem, in the first iteration, becomes simply min {γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 4(1 − y) = 0} . γ,y Since y = 1 this is equivalent to min {γ : −2 ≤ γ ≤ 0, y = 1} , γ,y so the first break point and the optimal value at this break point are given by γ1 = −2, y 1 = 1, s1 = (0, 0, 2), g(γ1 ) = bT y 1 = 4. The second auxiliary problem is now given by: max {x2 − x3 : x1 + x2 + x3 = 4, x1 , x2 , x3 ≥ 0, 2x3 = 0} , x which is equivalent to max {x2 : x1 + x2 = 4, x1 , x2 ≥ 0, x3 = 0} . 
x Since x2 is maximal if x1 = 0 and x2 = 4 we find an optimal solution x1 for the linearity interval just to the left of γ1 and the slope of g on this interval: x1 = (0, 4, 0), ′ g− (γ1 ) = ∆cT x1 = 4. In the second iteration the first auxiliary problem is min {γ : y ≤ 1, y ≤ 3 + γ, y ≤ 1 − γ, 4(3 + γ − y) = 0} . γ,y It follows that y = 3 + γ and the problem becomes equivalent to min {γ : 3 + γ ≤ 1, 3 + γ ≤ 1 − γ, y = 3 + γ} . γ,y Clearly this problem is unbounded. Hence g is linear for values of γ smaller than γ1 = −2. This completes the calculation of the optimal-value function g(γ) for the present example. We can easily check that the above results are in accordance with the graph of g(γ) in Figure 19.3 (page 369).19 ♦ 19 Exercise 101 In Example IV.81 the algorithm for the computation of the optimal-value function g(γ) was initialized by the optimal solution x∗ = (4, 0, 0) of (P ). Execute the algorithm once more now using the optimal solution x∗ = (2, 0, 2) of (P ). IV.19 Parametric and Sensitivity Analysis 19.5 387 Sensitivity analysis Sensitivity analysis is the special case of parametric analysis where only one coefficient of b, or c, is perturbed. This means that the perturbation vector is a unit vector. The derivative of the optimal-value function to a coefficient is called the shadow price and the corresponding linearity interval the range of the coefficient. When dealing with sensitivity analysis the aim is to find the shadow prices and ranges of all coefficients in b and c. Of course, the current value of a coefficient may or may not be a break point. In the latter case, when the current coefficient is not a break point, it belongs to an open linearity interval and the range of the coefficient is just this closed linearity interval and its shadow price the slope of the optimal-value function on this interval. If the coefficient is a break point, then we have two shadow prices, the left-shadow price, which is the left derivative of the optimal-value function at the current value, and the right-shadow price, the right derivative of the optimal-value function at the current value.20 19.5.1 Ranges and shadow prices Let x∗ be an optimal solution of (P ) and (y ∗ , s∗ ) an optimal solution of (D). With ei denoting the i-th unit vector (1 ≤ i ≤ m), the range of the i-th coefficient bi of b is simply the linearity interval of the optimal-value function zA (b + βei , c) that contains zero. From Theorem IV.73, the extreme points of this linearity interval follow by minimizing and maximizing β over the set  β : Ax = b + βei , x ≥ 0, xT s∗ = 0 . With bi considered as a variable, its range of bi follows by minimizing and maximizing bi over the set  bi : Ax = b, x ≥ 0, xT s∗ = 0 . (19.6) The variables in this problem are x and bi . For the shadow prices of bi we use Theorem IV.62. The left- and right-shadow prices of bi follow by minimizing and maximizing eTi y = yi over the set  yi : AT y + s = c, s ≥ 0, sT x∗ = 0 . (19.7) Similarly, the range of the j-th coefficient cj of c is equal to the linearity interval of the optimal-value function zA (b, c + γej ) that contains zero. Changing cj into a variable and using Theorem IV.75, we obtain the extreme points of this linearity interval by minimizing and maximizing cj over the set  cj : AT y + s = c, s ≥ 0, sT x∗ = 0 . (19.8) 20 Sensitivity analysis is an important topic in the application oriented literature on LO. 
Some relevant references, in chronological order, are Gal [89], Gauvin [93], Evans and Baker [72, 73], Akgül [6], Knolmayer [173], Gal [90], Greenberg [128], Rubin and Wagner [247], Ward and Wendell [288], Adler and Monteiro [4], Mehrotra and Monteiro [207], Jansen, Roos and Terlaky [153], Jansen, de Jong, Roos and Terlaky [152] and Greenberg [129]. It is surprising that in the literature on sensitivity analysis it is far from common to distinguish between left- and right-shadow prices. One of the early exceptions was Gauvin [93]; this paper, however, is not mentioned in the historical survey on sensitivity analysis of Gal [90]. 388 IV Miscellaneous Topics In this problem the variables are the vectors y and s and also cj . For the shadow prices of cj we use Theorem IV.68. The left- and right-shadow prices of cj follow by minimizing and maximizing eTj x = xj over the set  xj : Ax = b, x ≥ 0, xT s∗ = 0 . (19.9) Some remarks are in order. If bi is not a break point, which becomes evident if the extreme values in (19.6) both differ from bi , then we know that the left- and rightshadow prices of bi are the same and these are given by yi∗ . In that case there is no need to solve (19.7). On the other hand, when bi is a break point, it is clear from the discussion following the proof of Theorem IV.73 that there are three possibilities. When the range of bi is determined by solving (19.6) the result may be one of the two linearity intervals surrounding bi ; in that case yi∗ is the shadow price of bi on this interval. This happens if and only if the given optimal solution y ∗ is such that yi∗ is an extreme value in the set (19.7). The third possibility is that the extreme values in the set (19.6) are both equal to bi . This certainly occurs if y ∗ is a strictly complementary solution of (D). In each of the three cases it becomes clear after (19.6) is solved, that bi is a break point, and the left- and right-shadow prices at bi can be found by determining the extreme values of (19.7). Clearly similar remarks apply to the ranges and shadow prices of the coefficients of the vector c. 19.5.2 Using strictly complementary solutions The formulas for the ranges and shadow prices of the coefficients of b and c can be simplified when the given optimal solutions x∗ of (P ) and (y ∗ , s∗ ) of (D) are strictly complementary. Let (B, N ) denote the optimal partition of (P ) and (D). Then we have x∗B > 0, x∗N = 0 and s∗B = 0, s∗N > 0. As a consequence, we have xT s∗ = 0 in (19.6) and (19.9) if and only if xN = 0. Similarly, sT x∗ = 0 holds in (19.7) and (19.8) if and only if sB = 0. Using this we can reformulate (19.6) as {bi : Ax = b, xB ≥ 0, xN = 0} , (19.10) and (19.7) as  yi : AT y + s = c, sB = 0, sN ≥ 0 . (19.11) Similarly, (19.8) can be rewritten as  cj : AT y + s = c, sB = 0, sN ≥ 0 , (19.12) and (19.9) as {xj : Ax = b, xB ≥ 0, xN = 0} . (19.13) IV.19 Parametric and Sensitivity Analysis 389 We proceed with an example.21 Example IV.82 Consider the (primal) problem (P ) defined by min s.t. x1 + 4x2 + x3 −2x1 + x2 + x3 + x2 − x3 x1 + 2x4 + + 2x5 + x5 x4 −x6 =0 −x7 =1 x1 , x2 , x3 , x4 , x5 , x6 , x7 ≥ 0. The dual problem (D) is max y2 −2y1 s.t. + y2 y1 + y2 y1 − y2 y2 y1 −y1 −y2 ≤ 1 (1) 4 (2) ≤ 1 (3) ≤ 2 (4) ≤ 2 (5) ≤ 0 (6) ≤ 0 (7) ≤ Problem (D) can be solved graphically. Its feasible region is shown in Figure 19.5 (page 390). Since we are maximizing y2 in (D), the figure makes clear that the set of optimal solutions is given by D∗ = {(y1 , y2 ) : 0.5 ≤ y1 ≤ 2, y2 = 2} , and hence the optimal value is 2. 
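This optimal set is easy to confirm numerically. The following sketch is ours, not the book's; it assumes SciPy's linprog (HiGHS backend) and uses the coefficient data read off from the dual constraints (1)–(7) above. It solves (D) and then minimizes and maximizes y1 over the dual optimal face, recovering the optimal value 2 and D∗ = {(y1, 2) : 0.5 ≤ y1 ≤ 2}.

import numpy as np
from scipy.optimize import linprog

# data of this example, read off from the dual constraints (1)-(7)
A = np.array([[-2.0, 1.0,  1.0, 0.0, 1.0, -1.0,  0.0],
              [ 1.0, 1.0, -1.0, 1.0, 0.0,  0.0, -1.0]])
b = np.array([0.0, 1.0])
c = np.array([1.0, 4.0, 1.0, 2.0, 2.0, 0.0, 0.0])

free = [(None, None)] * 2
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=free, method="highs")
z_star = -dual.fun
print("optimal value:", z_star)                       # 2.0

# dual optimal set: A'y <= c and b'y = z*; minimize and maximize y1 over it
lo = linprog(np.array([ 1.0, 0.0]), A_ub=A.T, b_ub=c,
             A_eq=b.reshape(1, -1), b_eq=[z_star], bounds=free, method="highs")
hi = linprog(np.array([-1.0, 0.0]), A_ub=A.T, b_ub=c,
             A_eq=b.reshape(1, -1), b_eq=[z_star], bounds=free, method="highs")
print("y1 ranges over [", lo.fun, ",", -hi.fun, "] with y2 = 2")   # [0.5, 2.0]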
Note that all slack values can be positive at an optimal solution except the slack value of the constraint y2 ≤ 2. This means that the set N in the optimal partition (B, N) equals N = {1, 2, 3, 5, 6, 7}. Hence, B = {4}. Therefore, at optimality only the variable x4 can be positive. It follows that

P∗ = {x ∈ P : x1 = x2 = x3 = x5 = x6 = x7 = 0} = {(0, 0, 0, 1, 0, 0, 0)},

and (P) has a unique optimal solution: x = (0, 0, 0, 1, 0, 0, 0).

21 Exercise 102 The ranges and shadow prices can also be found by solving the corresponding dual problems. For example, the maximal value of bi in (19.10) can be found by solving min {bT y : ATB y ≥ 0, yi = −1} and the minimal value by solving max {bT y : ATB y ≤ 0, yi = −1}. Formulate the dual problems for the other six cases.

Figure 19.5 The feasible region of (D).

The next table shows the result of a complete sensitivity analysis. It shows the ranges and shadow prices for all coefficients of b and c, where these vectors have their usual meaning. For each coefficient that is a break point we give the shadow price as a closed interval; the extreme values of this interval are the left- and right-shadow prices of the coefficient. In this example this happens only for b1. The range of a break point consists of the point itself; the table gives this point. On the other hand, for ‘nonbreak points’ the range is a proper interval and the shadow price is a number.

Coefficient    Range          Shadow prices
b1 = 0         0              [1/2, 2]
b2 = 1         [0, ∞)         2
c1 = 1         [−2, ∞)        0
c2 = 4         [5/2, ∞)       0
c3 = 1         [−3/2, ∞)      0
c4 = 2         [0, 3]         1
c5 = 2         [1/2, ∞)       0
c6 = 0         [−2, ∞)        0
c7 = 0         [−2, ∞)        0

We perform the sensitivity analysis here for b1 and c4.

Range and shadow prices for b1. Using (19.10) the range of b1 follows by minimizing and maximizing b1 over the system

0 = b1,   x4 = 1.

The solution of this system is unique: x4 = 1 and b1 = 0, so the range of b1 is the interval [0, 0]. This means that b1 = 0 is a break point. The left- and right-shadow prices of b1 follow by minimizing and maximizing y1 over y ∈ D∗. The minimal value is 0.5 and the maximal value 2, so the left- and right-shadow prices are 0.5 and 2.

Range and shadow price for c4. The range of c4 is found by using (19.12). This amounts to minimizing and maximizing c4 over the system

−2y1 + y2 ≤ 1,   y1 + y2 ≤ 4,   y1 − y2 ≤ 1,   y2 = c4,   y1 ≤ 2,   y1 ≥ 0,   y2 ≥ 0.

This optimization problem can easily be solved by using Figure 19.5. It amounts to the question of which values of y2 are feasible when the fourth constraint is removed in Figure 19.5. We can easily verify that exactly the values of y2 in the closed interval [0, 3] are feasible. Therefore, the range of c4 is this interval. The shadow price of c4 is given by eT4 x = x4 = 1. ♦

19.5.3 Classical approach to sensitivity analysis

Commercial optimization packages for the solution of LO problems usually offer the possibility of doing sensitivity analysis. The sensitivity analysis in many existing commercial optimization packages is based on the naive approach presented in first-year textbooks. As a result, the outcome of the sensitivity analysis is often confusing. We explain this below.
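Before turning to the classical approach, we record a small sketch that reproduces the b1 and c4 rows of the table above from the optimal-partition formulas: the range of b1 from (19.10) and the range of c4 from (19.12). The sketch is ours, not the book's, and assumes SciPy's linprog (HiGHS backend); the helper name is arbitrary. The shadow prices follow in the same way from (19.11) and (19.13); for b1 they coincide with the extreme values of y1 over D∗ computed earlier.

import numpy as np
from scipy.optimize import linprog

A = np.array([[-2.0, 1.0,  1.0, 0.0, 1.0, -1.0,  0.0],
              [ 1.0, 1.0, -1.0, 1.0, 0.0,  0.0, -1.0]])
b = np.array([0.0, 1.0])
c = np.array([1.0, 4.0, 1.0, 2.0, 2.0, 0.0, 0.0])
B = [3]                                     # optimal partition B = {4}, 0-indexed
N = [j for j in range(7) if j not in B]
m, n = A.shape

def extremes(obj, **kw):
    # minimum and maximum of obj'z over the polyhedron described by kw
    lo = linprog(np.asarray(obj, float), method="highs", **kw)
    hi = linprog(-np.asarray(obj, float), method="highs", **kw)
    return (lo.fun if lo.status == 0 else -np.inf,
            -hi.fun if hi.status == 0 else np.inf)

# range of b1, formula (19.10): variables z = (b1, x), with x_N = 0 and the
# first equation rewritten as (Ax)_1 - b1 = 0
A_eq = np.hstack([np.array([[-1.0], [0.0]]), A])
bounds = [(None, None)] + [(0, None) if j in B else (0, 0) for j in range(n)]
print(extremes(np.r_[1.0, np.zeros(n)], A_eq=A_eq, b_eq=[0.0, 1.0], bounds=bounds))
# -> (0.0, 0.0): the range of b1 is the single point 0, a break point

# range of c4, formula (19.12): variables z = (c4, y); the fourth constraint
# becomes the equality y2 - c4 = 0, the others stay as inequalities
G = np.hstack([np.zeros((n, 1)), A.T])
G[3, 0] = -1.0
print(extremes(np.r_[1.0, np.zeros(m)], A_eq=G[B], b_eq=[0.0],
               A_ub=G[N], b_ub=c[N], bounds=[(None, None)] * (1 + m)))
# -> (0.0, 3.0): the range of c4 is [0, 3]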
The ‘classical’ approach to sensitivity analysis is based on the Simplex Method for solving LO problems.22 The Simplex Method produces a so-called basic solution of 22 With the word ‘classical’ we want to refer to the approach which dominates the literature, especially well known textbooks dealing with parametric and/or sensitivity analysis. This approach has led to the existing misuse of parametric optimization in commercial packages. This misuse is however a shortcoming of the packages and by no means a shortcoming in the whole existing theoretical literature. In this respect we want to refer the reader to Nožička, Guddat, Hollatz and Bank [228]. In this book the parametric issue is correctly handled in terms of the Simplex Method, polyhedrons, faces of polyhedra etc. Besides parameterizing either the objective vector or the right-hand side vector, much more general parametric issues are also discussed. The following citation is taken from this book: Den qualitativen Untersuchungen in den meisten erschienenen Aufsätzen und Büchern liegt das Simplexverfahren zugrunde. Zwangsläufig unterliegen alle derartig gewonnenen Aussagen den Schwierigkeiren, die bei Beweisführungen mit Hilfe der Simplexmethode im Falle der Entartung auftreten. In einigen Arbeiten wurde ein rein algebraischer Weg verfolgt, der in gewisse Spezialfällen zu Resultaten führte, im allgemeinen aber bisher keine qualitative Analyse erlieferte. 392 IV Miscellaneous Topics the problem. It suffices for our purpose to know that such a solution is determined by an optimal basis. Assuming that A is of size m × n and rank (A) = m, a basis is a nonsingular m × m submatrix AB ′ of A and the corresponding basic solution x is determined by AB ′ xB ′ = b, xN ′ = 0, where N ′ consists of the indices not in B ′ . Defining a vector y by ATB ′ y = cB ′ , and denoting the slack vector of y by s, we have sB ′ = 0. Since xN ′ = 0, it follows that xs = 0, proving that x and s are complementary vectors. Hence, if xB ′ and sN ′ are nonnegative then x is optimal for (P ) and (y, s) is optimal for (D). In that case AB ′ is called an optimal basis for (P ) and (D). A main result in the Simplex based approach to LO is that such an optimal basis always exists — provided the assumption that rank (A) = m is satisfied — and the Simplex Method generates such a basis. For a detailed description of the Simplex Method and its underlying theory we refer the reader to any (text-)book on LO.23 Any optimal basis leads to a natural division of the indices into m basic indices and n − m nonbasic indices, thus yielding a partition (B ′ , N ′ ) of the index set. We call this the optimal basis partition induced by the optimal basis B ′ . Obviously, an optimal basis partition need not be an optimal partition. In fact, this observation is crucial as we show below. The classical approach to sensitivity analysis amounts to applying the ‘formulas’ (19.10) – (19.13) for the ranges and shadow prices, but with the optimal basis partition (B ′ , N ′ ) instead of the optimal partition (B, N ). It is clear that in general (B ′ , N ′ ) is not necessarily the optimal partition because (P ) and (D) may have more than one optimal basis. The outcome of the classical analysis will therefore depend on the optimal basis AB ′ . Hence, correct implementations of the classical approach may give rise to different ‘ranges’ and ‘shadow prices’.24 The next example illustrates this phenomenon. 
In a subsequent section a further example is given, where we apply several commercial optimization packages to a small transportation problem.

23 See, e.g., Dantzig [59], Papadimitriou and Steiglitz [231], Chvátal [55], Schrijver [250], Fang and Puthenpura [74] and Sierksma [256].
24 We put the words range and shadow price between quotes if they refer to ranges and shadow prices obtained from an optimal basis partition (which may differ from the unique optimal partition).

Example IV.83 For problems (P) and (D) in Example IV.82 we have three optimal bases. These are given in the table below. The column at the right gives the 'ranges' for c4 for each of these bases.

   Basis   B'       'Range' for c4
   1       {1, 4}   [1, 3]
   2       {4, 5}   [1, 2]
   3       {2, 4}   [2, 3]

We get three different 'ranges', depending on the optimal basis. Let us do the calculations for the first optimal basis in the table. The 'range' of c4 is found by using (19.12) with (B, N) such that B = B' = {1, 4}. This amounts to minimizing and maximizing c4 over the system
   −2y1 + y2 = 1
   y1 + y2 ≤ 4
   y1 − y2 ≤ 1
   y2 = c4
   y1 ≤ 2
   y1 ≥ 0
   y2 ≥ 0.
Using Figure 19.5 we can easily solve this problem. The question now is which values of y2 are feasible when the fourth constraint is removed in Figure 19.5 and the first constraint is active. We can easily verify that this leads to 1 ≤ y2 ≤ 3, thus yielding the closed interval [1, 3] as the 'range' for c4. The other two 'ranges' can be found in the same way by keeping the fifth and the second constraints active, respectively. A commercial optimization package provides the user with one of the three ranges in the table, depending on the optimal basis found by the package. Observe that each of the three ranges is a subrange of the correct range, which is [0, 3]. Note that the current value 2 of c4 lies in the interior of the correct range, whereas for two of the 'ranges' in the table, 2 is an extreme point. This might lead to the wrong conclusion that 2 is a break point of the optimal-value function. ♦

It can easily be understood that the 'range' obtained from an optimal basis partition is always a subinterval of the whole linearity interval. Of course, sometimes the subinterval may coincide with the whole interval. For the shadow prices a similar statement holds. At a 'nonbreak point' an optimal basis partition yields the correct shadow price. At a break point, however, an optimal basis partition yields one 'shadow price', which may be any number between the left- and the right-shadow price. The example in the next section demonstrates this behavior very clearly.

Before proceeding with the next section we must note that from a computational point of view, the approach using an optimal basis partition is much cheaper than using the optimal partition. In the latter case we need to solve some auxiliary LO problems — in the worst case four for each coefficient. When the optimal partition (B, N) is replaced by an optimal basis partition (B', N'), however, it becomes computationally very simple to determine the 'ranges' and 'shadow prices'. For example, consider the 'range' problem for bi. This amounts to minimizing and maximizing bi over the set
   { bi : Ax = b, x_B' ≥ 0, x_N' = 0 }.
Since A_B' is nonsingular, it follows that x_B' = A_B'^{-1} b, and hence the condition x_B' ≥ 0 reduces to
   A_B'^{-1} b ≥ 0.
This is a system of m linear inequalities in the coefficient bi, with i fixed, and hence its solution can be determined straightforwardly; a small sketch of the corresponding ratio test is given below.
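For illustration, here is a minimal dense sketch of that ratio test. The names (A_B for the basis matrix, b for the right-hand side) and the use of dense linear algebra are simplifying assumptions; a real implementation would reuse the factorization of the basis matrix.

```python
# Classical 'range' of a right-hand-side coefficient b_i for a fixed optimal
# basis: the values t for b_i keeping the basic solution nonnegative, i.e.
# A_B^{-1}(b + (t - b_i) e_i) >= 0.  Illustrative dense sketch only.
import numpy as np

def classical_rhs_range(A_B, b, i):
    x_basic = np.linalg.solve(A_B, b)                        # current basic solution
    direction = np.linalg.solve(A_B, np.eye(len(b))[:, i])   # derivative of x_B w.r.t. b_i
    lower, upper = -np.inf, np.inf
    for xk, dk in zip(x_basic, direction):
        if dk > 0:        # x_k >= 0 forces t >= b[i] - xk/dk
            lower = max(lower, b[i] - xk / dk)
        elif dk < 0:      # x_k >= 0 forces t <= b[i] - xk/dk
            upper = min(upper, b[i] - xk / dk)
    return lower, upper
```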
Note that the system is feasible, because the current value of bi satisfies it. Hence, the solution is a closed interval containing the current value of bi.

19.5.4 Comparison of the classical and the new approach

For the comparison we use a simple problem, arising when transporting commodities (of one type) from three distribution centers to three warehouses. The supply values at the three distribution centers are 2, 6 and 5 units respectively, and the demand value at each of the three warehouses is just 3. We assume that the cost of transporting one unit of commodity from a distribution center to a warehouse is independent of the distribution center and the warehouse, and that this cost is equal to one (unit of currency). The aim is to meet the demand at the warehouses at minimal cost. This problem is depicted in Figure 19.6 by means of a network. The left three nodes in this network represent the distribution centers and the right three nodes the three warehouses. The arcs represent the transportation routes from the distribution centers to the warehouses. The supply and demand values are indicated at the respective nodes.

Figure 19.6 A transportation problem. (The network has the three supply nodes, with a1 = 2, a2 = 6 and a3 = 5, on the left, the three demand nodes, with b1 = b2 = b3 = 3, on the right, and an arc from every supply node to every demand node.)

The transportation problem consists of assigning 'flow' values to the arcs in the network so that the demand is met and the supply values are respected; this must be done in such a way that the cost of the transportation to the demand nodes is minimized. Because of the choice of cost coefficients, the total cost is simply the sum of all arc flow values. Since the total demand is 9, this is also the optimal value for the total cost. Note that there are many optimal flows; this is due to the fact that all arcs are equally expensive. So far, everything is trivial.

Sensitivity to demand and supply values

Now we want to determine the sensitivity of the optimal value to perturbations of the supply and demand values. Denoting the supply values by a = (a1, a2, a3) and the demand values by b = (b1, b2, b3), we can determine the ranges of these values by hand. For example, when b1 is changed, the total demand becomes 6 + b1 and this is the optimal value as long as such a demand can be met by the present supply. This leads to the condition 6 + b1 ≤ 2 + 6 + 5 = 13, which yields b1 ≤ 7. For larger values of b1 the problem becomes infeasible. When b1 = 0, the arcs leading to the first demand node have zero flow value in any optimal solution. This means that 0 is a break point, and the range of b1 is [0, 7]. Because of the symmetry in the network for the demand nodes, the range for b2 and b3 will be the same interval. When a1 is changed, the total supply becomes 11 + a1 and this will be sufficient as long as 11 + a1 ≥ 9, which yields a1 ≥ −2. The directed arcs can only handle nonnegative supply values, and hence the range of a1 is [0, ∞). Similarly, the range for a2 follows from 7 + a2 ≥ 9, which yields the range [2, ∞) for a2, and the range for a3 follows from 8 + a3 ≥ 9, yielding the range [1, ∞) for a3.

To compare these ranges with the 'ranges' provided by the classical approach, we made a linear model of the above problem, solved it using five well-known commercial optimization packages, and performed a sensitivity analysis with these packages. We used the following linear standard model:

   min  Σ_{i=1}^{3} Σ_{j=1}^{3} xij
   s.t.
        x11 + x12 + x13 + s1 = 2
        x21 + x22 + x23 + s2 = 6
        x31 + x32 + x33 + s3 = 5
        x11 + x21 + x31 − d1 = 3
        x12 + x22 + x32 − d2 = 3
        x13 + x23 + x33 − d3 = 3
        xij, si, dj ≥ 0,  i, j = 1, 2, 3.

The meaning of the variables is as follows:
   xij: the amount of transport from supply node i to demand node j,
   si: excess supply at supply node i,
   dj: shortage of demand at node j,
where i and j run from 1 to 3.

The result of the experiment is shown in the table below.25 The columns correspond to the supply and the demand coefficients. Their current values are put between brackets. The rows in the table corresponding to the five packages26 CPLEX, LINDO, PC-PROG, XMP and OSL show the 'ranges' produced by these packages. The last row contains the ranges calculated before by hand.27

   'Ranges' of supply and demand values

   LO package      a1 (2)    a2 (6)    a3 (5)     b1 (3)   b2 (3)   b3 (3)
   CPLEX           [0,3]     [4,7]     [1,∞)      [2,7]    [2,5]    [2,5]
   LINDO           [1,3]     [2,∞)     [4,7]      [2,4]    [1,4]    [1,7]
   PC-PROG         [0,∞)     [4,∞)     [3,6]      [2,5]    [0,5]    [2,5]
   XMP             [0,3]     [6,7]     [1,∞)      [2,3]    [2,3]    [2,7]
   OSL             [0,3]     [4,7]     (−∞,∞)     [2,7]    [2,5]    [2,5]
   Correct range   [0,∞)     [2,∞)     [1,∞)      [0,7]    [0,7]    [0,7]

25 The dual problem has a unique solution in this example. These are the shadow prices for the demand and supply values. All packages return this unique solution, namely 0 for the supply values — due to the excess of supply — and 1 for the demand values.
26 For more information on these packages we refer the reader to Sharda [253].
27 The 'range' provided by the IBM package OSL (Optimization Subroutine Library) for a3 is not a subrange of the correct range; this must be due to a bug in OSL. The correct 'range' for the optimal basis partition used by OSL is [1, ∞).

The table clearly demonstrates the weaknesses of the classical approach. Sensitivity analysis is considered to be a tool for obtaining information about the bottlenecks and degrees of freedom in the problem. The information provided by the commercial optimization packages is confusing and hardly allows a solid interpretation. For example, in our example problem there is obvious symmetry between the demand nodes. None of the five packages gives evidence of this symmetry.

Remark IV.84 As stated before, the 'ranges' and 'shadow prices' provided by the classical approach arise by applying the formulas (19.10) – (19.13) for the ranges and shadow prices, but replacing the optimal partition (B, N) by the optimal basis partition (B', N'). Indeed, the 'ranges' in the table can be reconstructed in this way. We will not do this here, but to enable the interested reader to perform the relevant calculations we give the optimal basis partitions used by the packages. If the optimal basis partition is (B', N'), it suffices to know the variables in B' for each of the five packages. These 'basic variables' are given in the next table.

   LO package   Basic variables
   CPLEX        x12  x21  x22  x23  x31  s3
   LINDO        x11  x23  x31  x32  x33  s2
   PC-PROG      x22  x23  x31  x33  s1   s2
   XMP          x13  x21  x22  x23  x33  s3
   OSL          x12  x21  x22  x23  x31  s3

Note that CPLEX and OSL use the same optimal basis. The output of their sensitivity analysis differs, however. As noted before, the explanation of this phenomenon is that the OSL implementation of the classical approach must contain a bug. •

The sensitivity analysis for the cost coefficients cij is considered next. The results are similar, as we shall see.

Sensitivity to cost coefficients

The current values of the cost coefficients cij are all 1.
As a consequence, each feasible flow on the network is optimal if the sum of the flow values xij equals 9. When one of the arcs becomes more expensive, the flow on this arc can be rerouted over the other arcs and the optimal value remains 9. Hence the right-shadow price of each cost coefficient equals 0; indeed, for each arc there exists an optimal solution of the problem in which the flow value on that arc is zero. On the other hand, if one of the arcs becomes cheaper, then it becomes attractive to let this arc carry as much flow as possible. The maximal flow values for the arcs are 2 for the arcs emanating from the first supply node and 3 for the other arcs. Hence a decrease of 1 in the cost coefficient for the arcs emanating from the first supply node leads to a decrease of 2 in the total cost, and for the other arcs the decrease in the total cost is 3. Thus we have found the left- and right-shadow prices of the cost coefficients. Since the left- and right-shadow prices are all different, the current value of each of the cost coefficients is a break point. Obviously, the linearity interval to the left of this break point is (−∞, 1] and the linearity interval to the right of it is [1, ∞).

In the next table the 'shadow prices' provided by the five commercial optimization packages are given. The last row in the table contains the correct values of the left- and right-shadow prices, as just calculated.

   'Shadow prices' of cost coefficients

   LO package       c11    c12    c13    c21    c22    c23    c31    c32    c33
   CPLEX            0      2      0      2      1      3      1      0      0
   LINDO            2      0      0      0      0      2      1      3      1
   PC-PROG          0      0      0      0      3      1      3      0      2
   XMP              0      0      2      3      3      0      0      0      1
   OSL              0      2      0      2      1      3      1      0      0
   Correct values   [2,0]  [2,0]  [2,0]  [3,0]  [3,0]  [3,0]  [3,0]  [3,0]  [3,0]

Note that in all cases the 'shadow price' of a package lies in the interval between the left- and right-shadow prices.

The last table shows the 'ranges' of the packages and the correct left- and right-hand side ranges for the cost coefficients.28 It is easy to understand the correct ranges. For example, if c11 increases then the corresponding arc becomes more expensive than the other arcs, and hence will not be used in an optimal solution. On the other hand, if c11 decreases then it becomes attractive to use this arc as much as possible; due to the limited supply value (i.e., 2) at the first supply node a flow of value 2 will be sent along this arc whatever the value of c11 is. Considering c21, we see the same behavior if c21 increases: the arc will not be used. But if c21 ∈ [0, 1], then the arc will be preferred above the other arcs, and its flow value will be 3. If c21 became negative, then it would become attractive to send even a flow of value 6 along this arc, despite the fact that the first demand node then receives oversupply. So c21 = 0 is a break point. Note that if a 'shadow price' of a package is equal to the left- or right-shadow price then the 'range' provided by the package must be a subinterval of the correct range. Moreover, if the 'shadow price' of a package is not equal to the left- or right-shadow price then the 'range' provided by the package must be the singleton [1, 1]. The results of the packages are consistent in this respect, as follows easily by inspection.

28 In this table we use shorthand notation for the infinite intervals [1, ∞) and (−∞, 1]. The interval [1, ∞) is denoted by [1, ) and the interval (−∞, 1] by ( , 1].
   'Ranges' of the cost coefficients

   LO package    c11     c12     c13     c21     c22     c23     c31     c32     c33
   CPLEX         [1, )   ( ,1]   [1, )   [1,1]   [1,1]   [0,1]   [1,1]   [1, )   [1, )
   LINDO         ( ,1]   [1, )   [1, )   [1, )   [1, )   [1,1]   [1,1]   [0,1]   [1,1]
   PC-PROG       [1, )   [1, )   [1, )   [1, )   [0,1]   [1,1]   [0,1]   [1, )   [1,1]
   XMP 29        –       –       ( ,1]   [0,1]   [0,1]   [1,1]   –       –       [1,1]
   OSL 30        [1, )   [1,1]   [1, )   [1,1]   [1,1]   [1,1]   [1,1]   [1, )   [1, )
   Left range    ( ,1]   ( ,1]   ( ,1]   [0,1]   [0,1]   [0,1]   [0,1]   [0,1]   [0,1]
   Right range   [1, )   [1, )   [1, )   [1, )   [1, )   [1, )   [1, )   [1, )   [1, )

29 For some unclear reason XMP did not provide all ranges. The missing entries in its row are all equal to [1, ∞).
30 In Remark IV.84 it was established that OSL and CPLEX use the same optimal basis; nevertheless their 'ranges' for c12 and c23 are different. One may easily verify that these 'ranges' are (−∞, 1] and [0, 1] respectively. Thus, the CPLEX 'ranges' are consistent with this optimal basis and the OSL 'ranges' are not.

19.6 Concluding remarks

In this chapter we developed the theory necessary for the analysis of one-dimensional parametric perturbations of the vectors b and c in the standard formulation of the primal problem (P) and its dual problem (D). Given a pair of optimal solutions for these problems, we presented algorithms in Section 19.4.5 for the computation of the optimal-value function under such a perturbation. In Section 19.5 we concentrated on the special case of sensitivity analysis. In Section 19.5.1 we showed that the ranges and shadow prices of the coefficients of b and c can be obtained by solving auxiliary LO problems. We also discussed how the ranges obtained in this way can be ambiguous, but that the ambiguity can be avoided by using strictly complementary solutions.

We proceeded in Section 19.5.3 by discussing the classical approach to sensitivity analysis, based on the use of an optimal basic solution and the corresponding optimal basis. We showed that this approach is much cheaper from a computational point of view. On the other hand, much less information is usually obtained and the information is often confusing. In the previous section we provided a striking example by presenting the sensitivity information provided by five commercial optimization packages for a simple transportation problem.

The shortcomings of the classical approach are well known among experts in the field. At several places in the literature these experts raised their voices to warn of the possible implications of using the classical approach. By way of example we include a citation of Rubin and Wagner [247]:

   Managers who build their own microcomputer linear programming models are apt to misuse the resulting shadow prices and shadow costs. Fallacious interpretations of these values can lead to expensive mistakes, especially unwarranted capital investments.

As a result of the unreliability of the sensitivity information provided by computer packages, the reputation of sensitivity analysis as a tool for obtaining information about the bottlenecks and degrees of freedom has suffered a lot. Many potential users of such information do not use it, because they want to avoid the pitfalls that are inherent in the classical approach. The theory developed in this chapter provides a solid base for reliable sensitivity modules in future generations of computer packages for LO.

20 Implementing Interior Point Methods

20.1 Introduction

Several polynomial interior-point algorithms were discussed in the previous chapters.
Interior point algorithms not only provide the best theoretical complexity for LO problems but allow highly efficient implementations as well. Obviously not all polynomial algorithms are practically efficient. In particular, all full Newton step methods (see, e.g., Section 6.7) are inefficient in practice. However variants like the predictor-corrector method (see Section 7.7) and large-update methods (see Section 6.9) allow efficient implementations. The aim of this chapter is to give some hints on how some of these interior point algorithms can be converted into efficient implementations. To reach this goal several problems have to be dealt with. Some of these problems have been at least partially discussed earlier (e.g., the embedding problem in Chapter 2) but need further elaboration. Some other topics (e.g., methods of sparse numerical linear algebra, preprocessing) have not yet been touched. By reviewing the various interior-point methods we observe that they are all based on similar assumptions and are built up from similar ingredients. We can extract the following essential elements of interior-point methods (IPMs). Appropriate problem form. All algorithms assume that the LO problem satisfies certain assumptions. The problem must be in an appropriate form (e.g., the canonical form or the standard form). In the standard form the coefficient matrix A must have full row rank. Techniques to bring a given LO problem to the desired form, and at the same time to eliminate redundant constraints and variables, are called preprocessing and are discussed in Section 20.3. Search direction. The search direction in interior-point methods is always a Newton direction. To calculate this direction we have to solve a system of linear equations. Except for the right-hand side and the scaling, this system is the same for all the methods. Computationally the solution of the system amounts to factorizing a square matrix and then solving the triangular systems by forward or backward substitution. The factorization is the most expensive part of an iteration. Without efficient sparse linear algebra routines, interior-point methods would not be practical. Various elements of sparse matrix techniques are discussed in Section 20.4. A straightforward idea for reducing the computational 402 IV Miscellaneous Topics cost is to reuse the same factorization. This leads to the idea of second- and higher-order methods discussed in Section 20.4.3. Interior point. The interior-point assumption is presupposed, i.e. that both the primal and the dual problem have a strictly positive (preferably centered) initial solution. Most LO problems do not have such a solution, but still have to be solved. A theoretically appealing and at the same time practical method is to embed the problem in a self-dual model, as discussed in Chapter 2. The embedding model is revisited and elaborated on in Section 20.5. Reoptimization. In practice it often happens that several variants of the same LO problem need to be solved successively. One might expect that the solution of an earlier version would be a good starting point for a slightly modified problem. For this so-called warm start problem the embedding model also provides a good solution as discussed in Section 20.5.2. Parameters: Step size, stopping criteria. The iterates in IPMs should stay in some neighborhood of the central path. Theoretically good step-sizes can result in hopelessly slow convergence in practice. A practical step-size selection rule is discussed. 
At some point, when the duality gap is small, the calculation is terminated. The theoretical criteria are typically far beyond today's machine precision. A practical criterion is presented in Section 20.6.

Optimal basis identification. It is not an essential element of interior-point methods, but sometimes it still might be important1 to find an optimal basis. Then we need to provide the ability to 'cross over' from an interior solution to a basic solution. An elegant strongly polynomial algorithm is presented in Section 20.7.

1 Here we might think about linear integer optimization when cutting planes are to be generated to cut off the current nonintegral optimal solution.

20.2 Prototype algorithm

In most practical LO problems, in addition to the equality and inequality constraints, the variables have lower and upper bounds. Thus we deal with the primal problem in the following form:

   min_x { c^T x : Ax ≥ b, x ≤ b_u, x ≥ 0 },      (20.1)

where c, x, b_u ∈ IR^n, b ∈ IR^m, and the matrix A is of size m × n. Now its dual is

   max_{y,y_u} { b^T y − b_u^T y_u : A^T y − y_u ≤ c, y ≥ 0, y_u ≥ 0 },      (20.2)

where y ∈ IR^m and y_u ∈ IR^n. Let us denote the slack variables in the primal problem (20.1) by

   z = Ax − b,   z_u = b_u − x

and in the dual problem (20.2) by

   s = c + y_u − A^T y,

respectively. Here we assume not only that the problem pair satisfies the interior-point assumption, but also that a strictly positive solution (x, z_u, z, s, y_u, y) > 0 is given, satisfying all the constraints in (20.1) and (20.2) respectively. How to solve these problems without the explicit knowledge of an interior point is discussed in Section 20.5.

The central path of the pair of problems given in (20.1) and (20.2) is defined as the set of solutions (x(µ), z_u(µ), z(µ)) > 0 and (s(µ), y_u(µ), y(µ)) > 0, for µ > 0, of the system

   Ax − z = b,
   x + z_u = b_u,
   A^T y − y_u + s = c,      (20.3)
   x s = µe,
   z_u y_u = µe,
   z y = µe,

where e is the vector of all ones with appropriate dimension. Observe that the first three of the above equations are linear and force primal and dual feasibility of the solution. The last three equations are nonlinear. They become the complementarity conditions when µ = 0, which together with the feasibility constraints provide optimality of the solutions. The actual duality (or complementarity) gap g can easily be computed:

   g = x^T s + z_u^T y_u + z^T y,

which equals (2n + m)µ on the central path.

One iteration of a primal-dual algorithm makes one step of Newton's method applied to system (20.3) with a given µ; then µ is reduced and the procedure is repeated as long as the duality gap is larger than a predetermined tolerance. Given a solution (x, z_u, z) > 0 of (20.1) and (s, y_u, y) > 0 of (20.2), the Newton direction for (20.3) is obtained by solving a system of linear equations. This system can be written as follows, where the ordering of the variables is chosen so that the structure of the coefficient matrix becomes apparent.

   [ 0     A     −I    0     0     0  ] [ ∆y   ]   [ 0            ]
   [ A^T   0      0   −I     0     I  ] [ ∆x   ]   [ 0            ]
   [ 0     I      0    0     I     0  ] [ ∆z   ] = [ 0            ]      (20.4)
   [ 0     S      0    0     0     X  ] [ ∆y_u ]   [ µe − xs      ]
   [ 0     0      0    Z_u   Y_u   0  ] [ ∆z_u ]   [ µe − z_u y_u ]
   [ Z     0      Y    0     0     0  ] [ ∆s   ]   [ µe − zy      ]

In making a step, in order to preserve the positivity of (x, z_u, z) and (s, y_u, y), a step-size α usually smaller than one (a damped Newton step) is chosen. Let us have a closer look at the Newton system.
From the last four equations in (20.4) we can easily derive

   ∆z_u = −∆x,
   ∆s = x^{-1}(µe − xs − s∆x),
   ∆y_u = z_u^{-1}(µe − z_u y_u − y_u ∆z_u) = z_u^{-1}(µe − z_u y_u + y_u ∆x),      (20.5)
   ∆z = y^{-1}(µe − yz − z∆y).

With these relations, (20.4) reduces to

   [ D^2    A       ] [ ∆y ]   [ r ]
   [ A^T   −D̄^{-2}  ] [ ∆x ] = [ h ],      (20.6)

where

   D^2 = Z Y^{-1},
   D̄^{-2} = S X^{-1} + Y_u Z_u^{-1},
   r = y^{-1}(µe − yz),
   h = z_u^{-1}(µe − z_u y_u) − x^{-1}(µe − xs).

The solution of the reduced Newton system (20.6) is the computationally most involved step of any interior-point method. The system (20.6) in this form is a symmetric indefinite system and is referred to as the augmented system. If the second equation in the augmented system is multiplied by −1, a system with a positive definite (but unsymmetric) matrix is obtained. The augmented system (20.6) is equivalent to

   ∆x = D̄^2 (A^T ∆y − h)

and

   (A D̄^2 A^T + D^2) ∆y = r + A D̄^2 h.      (20.7)

The last equation is referred to as the normal equation.2 The way to solve the systems (20.6) and (20.7) efficiently is discussed in detail in Section 20.4. After system (20.6) or (20.7) is solved, using formulas (20.5) we obtain the solution of (20.4). Now the maximal feasible step lengths αP for the primal (x, z, z_u) and αD for the dual (s, y, y_u) variables are calculated. Then these step-sizes are slightly reduced by a factor α0 < 1 to avoid reaching the boundary. The new iterate is computed as

   x^{k+1} := x^k + α0 αP ∆x,          s^{k+1} := s^k + α0 αD ∆s,
   z_u^{k+1} := z_u^k + α0 αP ∆z_u,    y_u^{k+1} := y_u^k + α0 αD ∆y_u,      (20.9)
   z^{k+1} := z^k + α0 αP ∆z,          y^{k+1} := y^k + α0 αD ∆y.

After the step, the parameter µ is updated and the process is repeated. A prototype algorithm can be summarized as follows.

2 Exercise 103 Show that if first ∆y is calculated from the system (20.6) as a function of ∆x, the following formulas arise:

   ∆y = −D^{-2}(A∆x − r)

and

   (A^T D^{-2} A + D̄^{-2}) ∆x = A^T D^{-2} r − h.      (20.8)

Observe that this symmetric formulation allows for further utilization of the structure of the normal equation. We are free to choose between (20.7) and (20.8) depending on which has a nicer sparsity structure.

   Prototype Primal–Dual Algorithm

   Input:
     An accuracy parameter ε > 0;
     interior solutions (x^0, z_u^0, z^0) and (s^0, y_u^0, y^0) for (20.1) and (20.2);
     a parameter α0 < 1; µ0 > 0.

   begin
     (x, z_u, z, s, y_u, y) := (x^0, z_u^0, z^0, s^0, y_u^0, y^0); µ := µ0;
     while (2n + m)µ ≥ ε do
     begin
       reduce µ;
       solve (20.4) to obtain (∆x, ∆z_u, ∆z, ∆s, ∆y_u, ∆y);
       determine αP and αD;
       update (x, z_u, z, y_u, y, s) by (20.9)
     end
   end

Before discussing all the ingredients in more detail we make an important observation. Solving a problem with individual upper bounds on the variables does not require significantly more computational effort than solving the same problem without such upper bounds. In both cases the augmented system (20.6) and the normal equation (20.7) have the same size. The extra costs per iteration arising from the upper bounds are just O(n), namely some extra ratio tests to determine the maximal possible step sizes and some extra vector manipulations (see equations (20.5)).3

20.3 Preprocessing

An important issue for all implementations is to transform the problem into an appropriate form, e.g., to the canonical form with upper bounded variables (20.1), and to reduce the problem size in order to reach a minimal representation of the problem. This aim is quite plausible.
A smaller problem needs less memory to store, usually fewer iterations of the algorithm, and if the transformation reduces the number of nonzero coefficients or improves the sparsity structure then fewer arithmetic operations per iteration are needed. A minimal representation should be free of redundancies, implied variables and inequalities. In general it is not realistic to strive to find the minimal representation of a given problem. But by analysing the structure of the problem it is often possible to reduce the problem size significantly. In fact, almost all large-scale LO problems contain redundancies in practice. The use of modeling languages and matrix generators easily allows the generation of huge models. Modelers choose to formulate models that are easy to understand and modify; this often leads to the introduction 3 Exercise 104 Check that the computational cost per iteration increases just by O(n) if individual upper bounds are imposed on the variables. 406 IV Miscellaneous Topics of superfluous variables and redundant constraints. To remove at least most of these redundancies is, however, a nontrivial task; this is the aim of preprocessing. As we have already indicated, computationally the most expensive part of an interior-point iteration is calculating the search direction, to solve the normal equation (20.7) or the augmented system (20.6). With a compact formulation the speedup can be significant.4 20.3.1 Detecting redundancy and making the constraint matrix sparser By analysing the sparsity pattern of the matrix A, one can frequently reduce the problem size. The aim of the sparsity analysis is to reduce the number of nonzero elements in the constraint matrix A; this is done by elementary matrix operations. In fact, as a consequence, the sparsity analysis mainly depends on just the nonzero structure of the matrix A and it is largely independent of the magnitude of the coefficients. 1. First we look for pairs of constraints with the same nonzero pattern. If we have two (in-)equality constraints which are identical — up to a scalar multiplier — then one of these constraints is removed from the problem. If one of them is an equality constraint and the other an inequality constraint then the inequality constraint is dropped. If they are opposite inequalities then they are replaced by one equality constraint. 2. Linearly dependent constraints are removed. (Dependency can easily be detected by using elimination.) 3. Duplicate columns are removed. 4. To improve the sparsity pattern of the constraint matrix A further we first put the constraints into equality form. Then by adding and subtracting constraints with appropriate multipliers, we can eliminate several nonzero entries.5 During this process we have to make sure that the resulting sparser system is equivalent to the original one. Mathematically this means that we look for a nonsingular matrix Q ∈ IRm×m such that the matrix QA is as sparse as possible. Such sparser constraints in the resulting equivalent formulation QAx = Qb might be much more suitable for a direct application of the interior-point solver.6 4 5 6 Preprocessing is not a new idea, but has enjoyed much attention since the introduction of interiorpoint methods. This is due to the fact that the realized speedup is often larger than for the Simplex Method. For further reading we refer the reader to, e.g., Brearley, Mitra and Williams [49], Adler et al. 
[1], Lustig, Marsten and Shanno [191], Andersen and Andersen [9], Gondzio [113], Andersen [8], Bixby [42], Lustig, Marsten and Shanno [193] and Andersen et al. [10]. As an illustration let us consider two constraints aT x = bk and aT j x = bj where σ(ak ) ⊆ σ(aj ). k (Recall that σ(x) = { i | xi 6= 0 }.) Now if we define aj = aj + λak and bj = bj + λbk , where λ T is chosen so that σ(aj ) ⊂ σ(aj ), then the constraint aT j x = bj can be replaced by aj x = bj while the number of nonzero coefficients is reduced by at least one. Exact solution of this Sparsity Problem is an NP-complete problem (Chang and McCormick [54]) but efficient heuristics (Adler et al. [1], Chang and McCormick [54] and Gondzio [113]) usually produce satisfactory nonzero reductions in A. The algorithm of Gondzio [113], for example, looks for a row of A that has a sparsity pattern that is a subset of the sparsity pattern of other rows and uses it to eliminate nonzero elements from these rows. IV.20 Implementing Interior Point Methods 20.3.2 407 Reducing the size of the problem In general, finding all redundancies in an LO problem is a more difficult problem than solving the problem; hence, preprocessing procedures use a great variety of simple inspection techniques to detect obvious redundancies. These techniques are very cheap and fast, and are applied repeatedly until the problem cannot be reduced by these techniques any more. Here we discuss a small collection of commonly used reduction procedures. 1. Empty rows and columns are removed. 2. A fixed variable (xj = uj ) can be substituted out of the problem. 3. A row with a single variable defines a simple bound; after an appropriate bound update the row can be removed. 4. We call variable xj a free column singleton if it contains a single nonzero coefficient and there are neither lower nor upper bounds imposed on it. In this case the variable xj can be substituted out of the problem. As a result both the variable xj and the constraint in which it occurs are eliminated. The same holds for so-called implied free variables, i.e., for variables for which implied bounds (discussed later on) are at least as tight as the original bounds. 5. All the free variables can be eliminated by making them a free singleton column by eliminating all but one coefficient in their columns. Here we recall the techniques that were discussed in Theorem D.1 in which the LO problem was reduced to canonical form. In the elimination steps we have to pay special attention to the sparsity, by choosing elements in the elimination steps that reduce the number of nonzero coordinates in A or, at least, produce the smallest amount of new nonzero elements. 6. Trivial lower and upper bounds for each constraint i are determined. If X X (20.10) aij buj , and bi = aij buj , bi = {j:aij <0} {j:aij >0} then clearly bi ≤ X j aij xj ≤ bi . (20.11) Observe that due to the nonnegativity of x, for the bounds we have bi ≤ 0 ≤ bi . If the inequalities (20.11) are at least as tight as the original constraint, then the constraint i is redundant. If one of them contradicts the original i-th constraint, then the problem is infeasible. In some special cases (e.g.: ‘less than or equal to’ row with bi = bi , ‘greater than or equal to’ row with bi = bi , or equality row for which bi is equal to one of the limits bi or bi ) the constraint in the optimization problem becomes a forcing one. This means that the only way to satisfy the constraint is to fix all variables that appear in it on their appropriate bounds. 
Then all of these variables can be substituted out of the problem. 7. From the constraint limits (20.10), implied variable bounds can be derived (remember, we have 0 ≤ x ≤ bu ). Assume that for an inequality constraint the bounds (20.11) are derived. Then for each k such that aik > 0 we have X aij xj ≤ bi bi + aik xk ≤ j 408 IV Miscellaneous Topics and for each k such that aik < 0 we have bi + aik (xk − uk ) ≤ X j aij xj ≤ bi . Now the new implied bounds from row i are easily derived as ′ xk ≤ uk = (bi − bi )/aik for all k : aik > 0, xk ≥ lk = uk + (bi − bi )/aik for all k : aik < 0. ′ If these bounds are tighter than the original ones, then the variable bounds are improved. 8. Apply the same techniques to the dual problem. The application of all presolve techniques described so far often results in impressive reductions of the initial problem formulation. Once the solution for the reduced problem is found, we have to recover the complete primal and dual solutions for the original problem. This phase is called postprocessing. 20.4 Sparse linear algebra As became clear in Section 20.2, the computationally most intensive part of an interiorpoint algorithm is to solve either the augmented system (20.6):      D2 A ∆y r = , (20.12) ∆x h AT −D̄−2 or the normal equation (20.7): 2  AD̄2 AT + D2 ∆y = q, (20.13) where q = r + AD̄ h. At each iteration one of the systems (20.12) or (20.13) has to be solved. In the subsequent iterations only the diagonal scaling matrices D and D̄ and the right-hand sides are changing. The nonzero structure of the augmented and normal matrices remains the same in all the iterations. For an efficient implementation it is absolutely necessary to design numerical routines that make use of this constant sparsity structure. 20.4.1 Solving the augmented system To solve the augmented system (20.12) a well-established technique, the Bunch– Parlett factorization7 may be used. Observe that the coefficient matrix of (20.12) is nonsingular, symmetric and indefinite. The Bunch–Parlett factorization of the symmetric indefinite matrix has the form   D2 A P T = LΛLT , (20.14) P AT −D̄−2 7 For the original description of the algorithm we refer to Bunch and Parlett [53]. For further application to solving least squares problems we refer the reader to Arioli, Duff and de Rijk [27], Björk [44] and Duff [68]. IV.20 Implementing Interior Point Methods 409 for some permutation matrix P , where Λ is an indefinite block diagonal matrix with 1 × 1 and 2 × 2 blocks and L is a lower triangular matrix. The factorization is basically an elimination (Gaussian) algorithm, in which we have to specify at each stage which row and which column is used for the purpose of elimination. In the Bunch–Parlett factorization, to produce a sparse and numerically stable L and Λ at each iteration the system is dynamically analyzed. Thus it may well happen that at each iteration structurally different factors are generated. This means that in the choice of the element that is used for the elimination, both the sparsity and stability of the triangular factor are considered. Within these stability and sparsity considerations we have a great deal of freedom in this selection; we are not restricted to the diagonal elements (one possible trivial choice) of the coefficient matrix. The efficiency depends strongly on the heuristics used in the selection strategy. 
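As a small illustration of the augmented-system approach, the following dense sketch assembles the matrix of (20.12) and solves it with a symmetric indefinite solver. SciPy's assume_a='sym' route uses LAPACK's Bunch–Kaufman (diagonal pivoting) LDL^T factorization, a close relative of the Bunch–Parlett factorization discussed above; production interior-point codes use sparse analogues with pivot-selection heuristics of the kind just described. The function and variable names are illustrative only.

```python
# Dense sketch of one augmented-system solve (20.12); d2 and dbar2inv are the
# diagonals of D^2 and of D-bar^{-2}, and r, h are the right-hand sides of (20.6).
import numpy as np
from scipy.linalg import solve

def augmented_system_step(A, d2, dbar2inv, r, h):
    m, n = A.shape
    K = np.block([[np.diag(d2), A],
                  [A.T, -np.diag(dbar2inv)]])   # symmetric indefinite matrix
    rhs = np.concatenate([r, h])
    sol = solve(K, rhs, assume_a='sym')         # Bunch-Kaufman LDL^T solve
    return sol[:m], sol[m:]                     # (delta_y, delta_x)
```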
The relatively expensive so-called analyze phase is frequently skipped and the same structure is reused in subsequent iterations and updated only occasionally when the numerical properties make it necessary. A popular selection rule is detecting ‘dense’ columns and rows (with many nonzero coefficients) and eliminating first in the diagonal positions of D2 and D̄−2 in the augmented matrix (20.12) corresponding to sparse rows and columns. The dense structure is pushed to the last stage of factorization as a dense window. In general it is unclear what threshold density should be used to separate dense and sparse structures. When the number of nonzeros in dense columns is significantly larger than the average number of entries in sparse columns then it is easy to determine a fixed threshold value. Whenever more complicated sparsity structures appear, more sophisticated heuristics are needed.8 20.4.2 Solving the normal equation The other popular method for calculating the search direction is to solve the normal equation (20.13). The method of choice in this case is the sparse Cholesky factorization:  P̄ AD̄2 AT + D2 P̄ T = L̄ΛL̄T , (20.15) for some permutation matrix P̄ , where L̄ is a lower triangular matrix and Λ is a positive definite diagonal matrix. It should be clear from the derivation of the normal equation that the normal equation approach can be considered as a special implementation of the augmented system approach. More concretely this means that we first eliminate either ∆x or ∆y by using all the diagonal entries of either D̄−2 or D2 . Thus the normal equation approach is less flexible but, on the other hand, the coefficient matrix to be factorized is symmetric positive definite, and both the matrix and its factors have a constant sparsity structure. The Cholesky factorization of (20.15) exists for any positive D2 and D̄2 . The sparsity structure of L̄ is independent of these diagonal matrices and hence is constant in all iterations if the same elimination steps are performed. Consequently it is sufficient to analyze the structure just once and determine a good ordering of the rows and 8 To discuss these heuristics is beyond the scope of this chapter. The reader can find detailed discussion of the advantages and disadvantages of the normal equation approach in the next section and in the papers Andersen et al. [10], Duff et al. [69], Fourer and Mehrotra [78], Gondzio and Terlaky [116], Maros and Mészáros [195], Turner [275], Vanderbei [277] and Vanderbei and Carpenter [278]. 410 IV Miscellaneous Topics columns in order to obtain sparse factors. To determine such an ordering involves considerable computational effort, but it is the basis of a successful implementation of the Cholesky factorization in interior-point methods. This is the analyze phase. More formally, we have to find a permutation matrix P such that the Cholesky factor of P (AD̄2 AT + D2 )P T is the sparsest possible. Due to the difficulty of this problem, heuristics are used in practice to find such a good permutation.9 Two efficient heuristics, namely the minimum degree and the minimum local fill-in orderings, are particularly useful in interior-point method implementations. These heuristics are described briefly below. Minimum degree ordering Since the matrix to be factorized is positive definite and symmetric the elimination can be restricted to the diagonal elements. This limitation preserves the symmetry and positive definiteness of the Schur complement. 
Let us assume that in the k-th step of the Gaussian elimination the i-th row of the Schur complement contains ni nonzero entries. If this row is used for the elimination, then the elimination requires fi = 1 (ni − 1)2 , 2 (20.16) floating point operations (flops). The number fi estimates the computational effort and gives an overestimate of the fill-in that can result from the elimination. The best choice of row i at step k is the one that minimizes fi .10 Minimum local fill-in ordering Let us observe that, in general, fi in (20.16) considerably overestimates the number of fill-ins at a given iteration of the elimination process because it does not take into account the fact that in many positions of the predicted fill-in, nonzero entries already exist. It is possible that another candidate that seems to be worse in terms of (20.16) would produce less fill-in because in the elimination, mainly existing nonzero entries would be updated. The minimum local fill-in ordering takes locally the real fill-in into account. As a consequence, each step is more expensive but the resulting factors are sparser. This higher cost has to be paid once in the analyze phase. Disadvantages of the normal equations approach The normal equations approach shows uniformly good performance when applied to the solution of the majority of linear programs. Unfortunately, it suffers from a serious drawback. The presence of dense columns in A might be catastrophic if they are not treated with extra care. A dense column of A with k nonzero elements creates a k × k dense submatrix (dense window) of the normal matrix (20.13). Such dense columns do not seriously influence the efficiency of the augmented system approach. 9 Yannakakis [302] proved that finding the optimal permutation is an NP-complete problem. 10 The function fi is Markowitz’s merit function [194]. Interpreting this process in terms of the elimination graph (cf. George and Liu [94]), we can see that it is equivalent to the choice of the node in the graph that has the minimum degree (this gave the name to this heuristic). IV.20 Implementing Interior Point Methods 411 In order to handle dense columns efficiently the first step is to identify them. This typically means to choose a threshold value. If the number of nonzeros in a column is larger than this threshold, the column is considered to be dense, the remaining columns as sparse. Denoting the matrix of the sparse columns in A by As and the matrix of the dense columns by Ad , the equation (20.12) can be written as follows.  D2 Ad As  ∆y   r      T   Ad −D̄d−2 0   ∆xd  =  hd  . hs ∆xs ATs 0 −D̄s−2 (20.17) After eliminating ∆xs = −D̄s−2 (hs − ATs dy ) we get the equation  D2 + As D̄s−2 ATs ATd Ad −D̄d−2   ∆y ∆xd =  r + As D̄s−2 hs hd  . (20.18) Here the left-upper block of the coefficient matrix is positive definite symmetric and sparse, thus it is easy to factorize efficiently. As the reader can easily see, this approach tries to combine the advantages of the normal equation approach and the augmented system approach.11,12 20.4.3 Second-order methods An attempt to reduce the computational cost of interior-point methods is based on trying to reuse the same factorization of either the normal matrix or the augmented system. Both in theory and in practice, factorization is much more expensive than backsolve of triangular systems; so we can do additional backsolves in each iteration with different right-hand sides if these reduce the total number of interior-point iterations. 
This is the essential idea of higher-order methods. Our discussion here follows the present computational practice; so we consider only the second-order 11 An appealing advantage of the symmetric formulation of the LO problem is that in (20.18) the matrix D 2 + As D̄s−2 AT s is nonsingular. If one would use the standard Ax = b, x ≥ 0 form, then we would have just As D̄s−2 AT s which might be singular. To handle this unpleasant situation an extra trick is needed. For this we refer the reader to Andersen et al. [13] and also to Exercise 105. 12 Exercise 105 Verify that (∆y, ∆xd ) is the solution of  Ad As D̄s−2 AT s AT − D̄d−2 d  ∆y ∆xd  =  r + As D̄s−2 hs hd  if and only if (∆y, ∆xd , u) solves    T Ad Q As D̄s−2 AT ∆y s + QQ  AT −D̄d−2 0   ∆xd  = d u QT 0 I   r + As D̄s−2 hs   hd 0 with any matrix Q having appropriate dimension. Observe that by choosing Q properly (e.g. T diagonal) the matrix As D̄s−2 AT s + QQ is nonsingular. 412 IV Miscellaneous Topics predictor-corrector method that is implemented in several codes with great success.13 Predictor-corrector technique This predictor-corrector method has two components. The first is an adaptive choice of the barrier parameter µ; the other is a second-order approximation of the central path . The first step in the predictor-corrector algorithm is to compute the primal-dual affine-scaling (predictor) direction. This is the solution of the Newton system (20.4) with µ = 0 and is indicated by ∆a . It is easy to see that if a step of size α is taken in the affine-scaling direction, then the duality gap is reduced by α; i.e. if a large step can be made in this direction then significant progress is made in the optimization. If the feasible step-size in the affine-scaling direction is small, we expect that the current point is close to the boundary; thus centering is needed and µ should not be reduced too much. In the predictor-corrector algorithm, first the predicted duality gap is calculated that results from a step along the primal-dual affine-scaling direction. To this end, when the affine-scaling direction is computed, the maximum primal (αaP ) and dual (αaD ) feasible step sizes are determined that preserve nonnegativity of (x, zu , z) and (s, yu , y). Then the predicted duality gap ga = (x + αaP ∆a x)T (s + αaD ∆a s) + (zu + αaP ∆a zu )T (yu + αaD ∆a yu ) +(z + αaP ∆a z)T (y + αaD ∆a y) is computed and is used to determine a target point µ=  g 2 g a a g n (20.19) on the central path . Here ga /n relates to the central point with the same duality gap that the predictor affine step would produce, and the factor (ga /g)2 pushes the target further towards optimality in a way that depends on the achieved reduction of the predictor step. Now the second-order component of the predictor-corrector direction is computed. Ideally we would like to compute a step such that the next iterate is perfectly centered, i.e., 13 (x + ∆x)(s + ∆s) (zu + ∆zu )(yu + ∆yu ) = = µe, µe, (z + ∆z)(y + ∆y) = µe, The second-order predictor-corrector technique presented here is due to Mehrotra [205]; from a computational point of view the method is very successful. The higher than order 2 methods — discussed in Chapter 18 — are implementable too, but to date computational results with methods of order higher than 2 are quite disappointing. See Andersen et al. [10]. 
Mehrotra was motivated by the paper of Monteiro, Adler and Resende [220], who were the first to introduce the primal-dual affine-scaling direction and higher-order versions of the primal-dual affine-scaling direction; they elaborated on a computational paper of Adler, Karmarkar, Resende and Veiga [2] that uses the dual affine-scaling direction and higher-order versions of it. IV.20 Implementing Interior Point Methods 413 or equivalently x∆s + s∆x = zu ∆yu + yu ∆zu z∆y + y∆z = = −xs + µe − ∆x∆s, −zu yu + µe − ∆zu ∆yu , −zy + µe − ∆z∆y. Usually, in the computation of the Newton direction the second-order terms ∆x∆s, ∆zu ∆yu , ∆z∆y are neglected (recall (20.4)). Instead of neglecting the second-order term, the affine directions ∆a x, ∆a s, ∆a zu ∆a yu , ∆a z∆a y are used as the predictions of the second-order effect. One step of the algorithm can be summarized as follows. • Solve (20.4) with µ = 0, resulting in the affine step (∆a x, ∆a zu , ∆a z) and (∆a s, ∆a yu , ∆a y). • Calculate the maximal feasible step lengths αaP and αaD . • Calculate the predicted duality gap ga and µ by (20.19). • Solve the corrected Newton system A∆x − ∆z ∆x + ∆zu = = 0 0 AT ∆y − ∆yu + ∆s x∆s + s∆x = = 0 −xs + µe − ∆a x∆a s, zu ∆yu + yu ∆zu = z∆y + y∆z = a (20.20) a −zu yu + µe − ∆ zu ∆ yu , −zy + µe − ∆a z∆a y. • Calculate the maximal feasible step lengths αP and αD and make a damped step by using (20.9).14 Finally, observe that a single iteration of this second-order predictor-corrector primaldual method needs two solves of the same large, sparse linear system (20.4) and (20.20) for two different right-hand sides. Thus the same factorization can be used twice. 20.5 Starting point The self-dual embedding problem is an elegant theoretical construction for handling the starting point problem. At the same time it can also be the basis of an efficient implementation. In this section we show that solving the slightly larger embedding 14 This presentation of the algorithm follows the paper of Mehrotra [205]. It differs from the 2-order method of Chapter 18. 414 IV Miscellaneous Topics problem does not increase the computational cost significantly.15 Before presenting the embedding problem, we summarize some of its surprisingly nice properties. 1. The embedding problem is self-dual: the dual problem is identical to the primal one. 2. It is always feasible. Furthermore, the interior of the feasible set of the embedding problem is also nonempty; hence the optimal faces are bounded (from Theorem II.10). So interior-point methods always converge to an optimal solution. 3. Optimality of the original problem is detected by convergence, independently of the boundedness or unboundedness of the optimal faces of the original problem. 4. Infeasibility of the original problem is detected by convergence as well.16 Primal, dual or primal and dual rays for the original problems are identified to prove dual, primal or dual and primal infeasibility (cf. Theorem I.26). 5. For the embedding problem a perfectly centered initial pair can always be constructed. 6. It allows an elegant handling of the warm start problem. 7. The embedding problem can be solved with any method that generates a strictly complementary solution; if the chosen method is polynomial, it solves the original problem with essentially the same complexity bound. Thus we can achieve the best possible complexity bounds for solving an arbitrary problem. Self-dual embedding We consider problems (20.1) and (20.2). 
To formulate the embedding problem we need to introduce some further vectors in a way similar to that of Chapter 2. We start with

   x^0 > 0, z_u^0 > 0, z^0 > 0, s^0 > 0, y_u^0 > 0, y^0 > 0, κ^0 > 0, ϑ^0 > 0, ρ^0 > 0, ν^0 > 0,

where x^0, z_u^0, s^0, y_u^0 ∈ IR^n, y^0, z^0 ∈ IR^m and κ^0, ϑ^0, ρ^0, ν^0 ∈ IR are arbitrary. Then we define b̄ ∈ IR^m and b̄_u, c̄ ∈ IR^n, the scaled errors at the arbitrary initial interior solutions (recall the construction in Section 4.3), and parameters β, γ ∈ IR as follows:

   b̄_u = (b_u κ^0 − x^0 − z_u^0) / ϑ^0,
   b̄   = (b κ^0 − A x^0 + z^0) / ϑ^0,
   c̄   = (c κ^0 + y_u^0 − A^T y^0 − s^0) / ϑ^0,
   β   = (c^T x^0 − b^T y^0 + b_u^T y_u^0 + ν^0) / ϑ^0,
   γ   = β κ^0 + b̄^T y^0 − b̄_u^T y_u^0 − c̄^T x^0 + ρ^0
       = [ (x^0)^T s^0 + (y_u^0)^T z_u^0 + (y^0)^T z^0 + κ^0 ρ^0 ] / ϑ^0 + ρ^0 > 0.

15 Such embedding was first introduced by Ye, Todd and Mizuno [316] using the standard form problems (20.29) and (20.30). They discussed most of the advantages of this embedding and showed that Mizuno, Todd and Ye's [217] predictor-corrector algorithms solve the LO problem in O(√n L) iterations, yielding the first infeasible IPM with this complexity. Somewhat later Jansen, Roos and Terlaky [155] presented the self-dual problem for the symmetric form primal-dual pair in a concise introduction to the theory of LO based on IPMs.
16 The popular so-called infeasible-start methods detect unboundedness or infeasibility of the original problem by divergence of the iterates.

It is worth noting that if x^0 is strictly feasible for (20.1), κ^0 = 1, z^0 = Ax^0 − b and z_u^0 = b_u − x^0, then b̄ = 0 and b̄_u = 0. Also, if (y^0, y_u^0) is strictly feasible for (20.2), κ^0 = 1 and s^0 = c − A^T y^0 + y_u^0, then c̄ = 0. In some sense the vectors b̄, b̄_u and c̄ measure the amount of scaled infeasibility of the given vectors x^0, z^0, z_u^0, s^0, y^0, y_u^0. Now consider the following self-dual LO problem:

   (SP)   min  γϑ
          s.t.                    − x          + b_u κ  − b̄_u ϑ  ≥ 0,
                              A x              − b κ    + b̄ ϑ    ≥ 0,
                  y_u − A^T y                  + c κ    − c̄ ϑ    ≥ 0,
                 −b_u^T y_u + b^T y − c^T x             + β ϑ    ≥ 0,
                  b̄_u^T y_u − b̄^T y + c̄^T x   − β κ              ≥ −γ,
                  y_u ≥ 0,  y ≥ 0,  x ≥ 0,  κ ≥ 0,  ϑ ≥ 0.

Let us denote the slack variables for the problem (SP) by z_u, z, s, ν and ρ, respectively. By construction the positive solution

   x = x^0, z = z^0, z_u = z_u^0, s = s^0, y = y^0, y_u = y_u^0, κ = κ^0, ϑ = ϑ^0, ν = ν^0, ρ = ρ^0

is interior feasible for problem (SP). Also note that if, e.g., we choose x = x^0 = e, z = z^0 = e, z_u = z_u^0 = e, s = s^0 = e, y = y^0 = e, y_u = y_u^0 = e, κ = κ^0 = 1, ϑ = ϑ^0 = 1, ν = ν^0 = 1, ρ = ρ^0 = 1, then this solution with µ = 1 is a perfectly centered initial solution for problem (SP). The following theorem follows easily in the same way as Theorem I.20.17

Theorem IV.85 The embedding (SP) of the given problems (20.1) and (20.2) has the following properties:
(i) The self-dual problem (SP) is feasible and hence both primal and dual feasible. Thus it has an optimal solution.
(ii) For any optimal solution of (SP), ϑ* = 0.
(iii) (SP) always has a strictly complementary optimal solution (y_u*, y*, x*, κ*, ϑ*).
(iv) If κ* > 0, then x*/κ* and (y*, y_u*)/κ* are strictly complementary optimal solutions of (20.1) and (20.2) respectively.
(v) If κ* = 0, then either (20.1) or (20.2) or both are infeasible.

Solving the embedding model needs just slightly more computation per iteration than solving problem (20.1). This small extra effort is the cost of having several important advantages: a centered initial starting point, detection of infeasibility by convergence, and applicability of any IPM without degrading the theoretical complexity.
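A small sketch of this construction for the all-ones starting point (so that µ = 1 gives a perfectly centered solution); the routine simply evaluates the definitions above and is purely illustrative.

```python
# Construct the embedding data (b-bar_u, b-bar, c-bar, beta, gamma) from the
# all-ones starting point x0 = z_u0 = s0 = y_u0 = e (length n), y0 = z0 = e
# (length m), kappa0 = theta0 = nu0 = rho0 = 1.
import numpy as np

def embedding_data(A, b, b_u, c):
    m, n = A.shape
    x0 = zu0 = s0 = yu0 = np.ones(n)
    y0 = z0 = np.ones(m)
    kappa0 = theta0 = nu0 = rho0 = 1.0
    bbar_u = (b_u * kappa0 - x0 - zu0) / theta0
    bbar = (b * kappa0 - A @ x0 + z0) / theta0
    cbar = (c * kappa0 + yu0 - A.T @ y0 - s0) / theta0
    beta = (c @ x0 - b @ y0 + b_u @ yu0 + nu0) / theta0
    gamma = beta * kappa0 + bbar @ y0 - bbar_u @ yu0 - cbar @ x0 + rho0
    return bbar_u, bbar, cbar, beta, gamma
```

For this particular choice of starting point one can check that gamma evaluates to m + 2n + 2, in agreement with the second expression for γ above.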
The rest of this section is devoted to showing that the computation of the Newton direction for the embedding problem (SP ) reduces to almost the same sized augmented (20.6) or normal equation (20.7) systems as in the case of (20.1). 17 Exercise 106 Prove this theorem. 416 IV Miscellaneous Topics In Chapter 3 the self-dual problem (SP )  T q̃ x̃ : M x̃ ≥ −q̃, x̃ ≥ 0 , min was solved, where M is of size n × n and skew-symmetric and q̃ ∈ IRn+ . Given an initial positive solution (x̃, s̃) > 0, where s̃ = M x̃ + q̃, a Newton step for problem (SP ) with a value µ > 0 was given as f = M ∆x, f ∆s f is the solution of the system where ∆x e −1 S) e ∆x f = µx̃−1 − s̃. (M + X (20.21) Now we have to analyze how the positive definite system (20.21) can be efficiently solved in the case of problem (SP). For this problem we have x̃ = (yu , y, x, κ, ϑ), s̃ = (zu , z, s, ν, ρ) and  0   0   M =  I   −bTu  b̄Tu −I 0 0 −AT b T −b̄T bu −b A 0 −c −b̄u c T 0 c̄T −β   0       0          −c̄  and q̃ =  0  .     0  β     0 γ b̄ Hence the Newton equation (20.21) can be written as           Yu−1 Zu 0 0 Y −1 Z I −AT −bTu b̄Tu b T −b̄ T −I A X −1 S −c T c̄T bu −b c ν κ −β −b̄u  ∆yu   µyu−1 − zu      ∆y   µy −1 − z       −1    −c̄   ∆x   =  µx − s    1    β    ∆κ   µ κ − ν ρ µ ϑ1 − ρ ∆ϑ ϑ b̄ From the first and the second equation it easily follows that ∆yu = Yu Zu−1 (∆x − bu ∆κ + b̄u ∆ϑ + µyu−1 − zu ) and ∆y = Y Z −1 (−A∆x + b∆κ − b̄∆ϑ + µy −1 − z). We simplify the notation by introducing Wu := Zu−1 Yu .      .     (20.22) IV.20 Implementing Interior Point Methods 417 Then, by substituting the value of ∆yu in (20.22) we find18     ∆y r1 Y −1 Z A −b b̄          −AT X −1 S + W c − Wu bu −c̄ + Wu b̄u   ∆x   r2 u  =       bT −cT − bTu Wu νκ + bTu Wu bu β − bTu Wu b̄u    ∆κ   r3  r4 −b̄T c̄T + b̄Tu Wu −β − b̄Tu Wu bu ϑρ + b̄Tu Wu b̄u ∆ϑ      , (20.23)   where for simplicity the right-hand side elements are denoted by r1 , . . . , r4 . Now if we multiply the second block of equations (corresponding to the right-hand side r2 ) in (20.23) by −1, a system analogous to the augmented system (20.6) of problem (20.1) is obtained. The difference is that here we have two additional constraints and variables. For the solution of this system, the factorization of the matrix may happen in the same way, but the last two rows and columns (these are typically dense) should be left to the last two steps of the factorization. A 2 × 2 dense window for (∆κ, ∆ϑ) then remains. If we further simplify (20.23) by substituting the value of ∆y, the analogue to the normal equation system of the problem (SP ) is produced. For simplicity the scalars here are denoted by η1 , . . . , η8 and r5 , r6 , r7 .19,20      r5 ∆x AT Z −1 Y A + X −1 S + Zu−1 Yu η1 η2           (20.24) η3 η4 η5    ∆κ  =  r6  .  r7 ∆ϑ η6 η7 η8 18 Exercise 107 Verify that  19 Exercise 108 Verify that    r1 r2 r3 r4     =   µy −1 − z −1 µx−1 − s − (µzu − yu ) −1 µ κ1 − ν + bT u (µzu − yu ) −1 1 µϑ − ρ − b̄T u (µzu − yu )   . 
 c − Wu bu − AT Z −1 Y b η1 = η2 η3 = = η4 = T −1 νκ−1 + bT Yb u Wu bu + b Z η5 = η6 = T −1 β − bT Y b̄ u Wu b̄u − b Z T −1 c̄T + b̄T YA u Wu + b̄ Z η7 = η8 = T −1 −β − b̄T Yb u Wu bu − b̄ Z r5 = r6 = −1 µx−1 − s − (µzu − yu ) + AT (µz −1 − y) r7 = −c̄ + Wu b̄u + AT Z −1 Y b̄ T −1 −cT − bT YA u Wu − b Z T −1 ρϑ−1 + b̄T Y b̄ u Wu b̄u + b̄ Z and 20 −1 T −1 µκ−1 − ν + bT − y) u (µzu − yu ) − b (µz −1 T −1 µϑ−1 − ρ − b̄T − y). u (µzu − yu ) + b̄ (µz Exercise 109 Develop similar formulas for the normal equation if ∆x is eliminated instead of ∆y. Compare the results with (20.7) and (20.8). 418 20.5.1 IV Miscellaneous Topics Simplifying the Newton system of the embedding model As mentioned with respect to the augmented system, we easily verify that the difference between the normal equations of problem (20.1) and the embedding problem (SP ) is that here two additional constraints and variables are present. Note that the last two rows and columns in (20.23) and (20.24) are neither symmetric nor skewsymmetric. The reader might think that these two extra columns deteriorate the efficiency of the algorithm (it requires two additional back-solves for the computation of the Newton direction) and hence make the embedding approach less attractive in practice. However, the computational cost can easily be reduced by a simple observation. First, note that for any interior solution (yu , y, x, κ, ϑ) the duality gap (see also Exercise 10 on page 35) is equal to 2γϑ. Second, remember that in Lemma II.47 we have proved that in a primal-dual method the target duality gap is always reached after a full Newton step. Since the duality gap on the central path with the value µ equals to 2(m + 2n + 2)µ and thus, the target duality gap is determined by the target value µ+ = (1 − θ)µ, the step ∆ϑ can directly be calculated. ∆ϑ = ϑ+ − ϑ = θµ µ+ − µ (m + 2n + 2) = − (m + 2n + 2) γ γ As a result we conclude that the value of ∆ϑ in (20.24) is known, thus it can simply be substituted in the Newton system and the system (20.23) reduces to almost the original size. This simplification allows to implement IPMs based on the self-dual embedding model efficiently, the cost per iteration is only one extra back-solve. 20.5.2 Notes on warm start Many practical problems need the solution of a sequence of similar linear programs where small perturbations are made to b and/or c (possibly also in A). As long as these perturbations are small, we naturally expect that the optimal solutions are not far from each other and restarting the optimization from the solution of the old problem (warm start) should be more efficient than solving the problem from scratch.21 The difficulty in the IPM warm start comes from the fact that the old optimal solution is very close to the boundary (this is a necessity since all optimal solutions in an LO problem are on the boundary of the feasible set) and well centered. This point, in the perturbed problem, still remains close to the boundary or becomes infeasible, but even if it remains feasible it is very poorly centered. Consequently, the IPM makes a long sequence of short steps because the iterates cannot get away from the boundary. Therefore for an efficient warm start we need a well-centered point close to 21 Some early attempts to solve such problems are due to Freund [84] who uses shifted barriers, and Polyak [234] who applies modified barrier functions. For further elaboration of the literature see, e.g., Lustig, Marsten and Shanno [193], Gondzio and Terlaky [116] and Andersen et al. [10]. 
IV.20 Implementing Interior Point Methods 419 the old optimal one or an efficient centering method (to get far from the boundary) to overcome these difficulties. These two possibilities are discussed briefly below. Independent of the approach chosen it would be wise to save a well-centered almost optimal solution (say, with 10−2 relative duality gap) that is still sufficiently far away from the boundary. • Centered solutions for warm start in (SP ) embedding. Among the spectacular properties of the (SP ) embedding listed in the previous section, the ability to construct always perfectly centered initial interior points was mentioned. The old well-centered almost optimal solution x∗ , z ∗ , zu∗ , s∗ , y ∗ , yu∗ , κ∗ , ϑ∗ , ρ∗ , ν ∗ can be used as the initial point for embedding the perturbed problem. As we have seen in Section 20.5, b̄, c̄, β and γ can always be redefined so that the above solution stays well centered. The construction allows simultaneous perturbations of b, bu , c and even the matrix A. Additionally, it extends to handling new constraints or variables added to the problem (e.g., in buildup or cutting plane schemes). In these cases, we can keep the solution for the old coordinates (let µ be the actual barrier parameter) and set the initial value of the new complementary variables equal to √ µ. This results in a perfectly centered initial solution. • Efficient centering. If the old solution remains feasible, but is badly centered, we might proceed with this solution without making a new embedding. The common approach is to use a path-following method for the recentering process; it uses targets on the central path . Because of the weak performance of Newton’s method far off the central path, this approach is too optimistic for a warm start. The targetfollowing method discussed in Part III (Section 11.4) offers much more flexibility in choosing achievable targets, thus leading to efficient ways of centering. A target sequence that improves centrality allows larger steps and therefore speeds up the centering and, as a consequence, the optimization process.22 20.6 Parameters: step-size, stopping criteria 20.6.1 Target-update The easiest way to ensure that all iterates remain close to the central path is to decrease µ by a very small amount at each iteration. This provides the best theoretical worst-case complexity, as we have seen in discussing full Newton step methods. These methods demonstrate hopelessly slow convergence in practice and their theoretical worst-case complexity is identical to their practical performance. In large-update methods the barrier parameter is reduced much faster than the theory suggests. To preserve polynomial convergence of these methods in theory, several Newton steps are computed between two reductions of µ (update of the target) until the iterate is in a sufficiently small neighborhood of the central path . In practice this multistep strategy is ignored and at each reduction of µ, at each target-update, only one Newton step is made. A drawback of this strategy is that the iterates might get 22 Computational results based on centering target sequences are presented in Gondzio [114] and Andersen et al. [10]. 420 IV Miscellaneous Topics far away from the central path or from the target point, and the efficiency of the Newton method might deteriorate. A careful strategy for updating µ and for steplength selection reduces the danger of this negative scenario. 
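The practical pull towards large updates can be quantified with a little arithmetic. The sketch below counts how many target updates are needed to reduce the duality gap by a factor of 10^12 (roughly, going from a centered start with µ = 1 on a problem with 10^4 variables to 8 digits of accuracy), assuming one full Newton step per update and that each update multiplies the gap by (1 − θ). The particular θ values are only illustrative.

```python
import math

def updates_needed(theta, reduction=1e12):
    # Number of target updates if each update multiplies the duality gap by (1 - theta).
    return math.ceil(math.log(reduction) / -math.log(1.0 - theta))

n = 10**4
print(updates_needed(1 / (2 * math.sqrt(2 * n))))  # cautious, full-Newton-step style update: thousands
print(updates_needed(0.75))                        # large update with theta = 3/4: about 20
```

With a tiny θ the iteration count runs into the thousands, while θ = 3/4 brings it down to about twenty; this is why implementations accept the risk of drifting away from the central path.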
At an interior iterate the current duality gap is given by g = x^T s + z_u^T y_u + z^T y, which equals (2n + m)µ if the iterate is on the central path. The central point with the same duality gap as the current iterate belongs to the value

    µ = (x^T s + z_u^T y_u + z^T y) / (2n + m).

The target µ value is chosen so that the target duality gap is significantly smaller, but does not put the target too far away. Thus we take

    µ_new = (1 − θ) (x^T s + z_u^T y_u + z^T y) / (2n + m),        (20.25)

where θ ∈ [0, 1]. The value θ = 0 corresponds to pure centering, while a positive θ aims to reduce the duality gap. A solid but still optimistic update is θ = 3/4.^23

23 In the published literature, iteration counts larger than 50 almost never occur and most frequently iteration numbers around 20 are reported. Taking this number as a target iteration count and assuming that (in contrast to the theoretical worst-case analysis) Newton's method always provides iterates close to the target point, we can calculate how large the target-update (how small (1 − θ)) should be to reach the desired accuracy within the required number of iterations. Thus, for a problem with 10^4 variables, a centered initial solution with µ = 1 and a solution with 8 digits of accuracy as the goal, we have to reduce the duality gap by a factor of 10^12 in 20 iterations. By a straightforward calculation we can easily verify that the value θ = 3/4 is an appropriate choice for this purpose.

20.6.2 Step size

Although there is not much supporting theory, current implementations use very large and different step-sizes in the primal and dual spaces.^24 All implementations use a variant of the following strategy. First the maximum possible step-sizes are computed:

    α_P := max {α > 0 : (x, z, z_u) + α(∆x, ∆z, ∆z_u) ≥ 0},
    α_D := max {α > 0 : (s, y, y_u) + α(∆s, ∆y, ∆y_u) ≥ 0},

and these step-sizes are slightly reduced by a factor α_0 = 0.99995 to ensure that the new point is strictly positive. Although this aggressive, i.e. very large, choice of α_0 is frequently reported to be the best, we must be careful and include a safeguard to handle the case when α_0 = 0.99995 turns out to be too aggressive.

24 Kojima, Megiddo and Mizuno [174] proved global convergence of a primal-dual method that allows such large step-sizes in most iterations.

20.6.3 Stopping criteria

Interior point algorithms terminate when the duality gap is small enough and the current solution is feasible for the original problems (20.1) and (20.2), or when the infeasibility is small enough. The practical tolerances are larger than the theoretical bounds that guarantee identification of an exact solution; this is a common drawback of all numerical algorithms for solving LO problems. To obtain a sensible solution the duality gap and the measure of infeasibility should be related to the problem data. Relative primal infeasibility is related to the length of the vectors b and b_u, dual infeasibility is related to the length of the vector c, and the duality gap is related to the actual objective value. A solution with p digits relative accuracy is guaranteed by the following stopping criteria:

    ||Ax − z − b|| / (1 + ||b||) ≤ 10^(−p)   and   ||x + z_u − b_u|| / (1 + ||b_u||) ≤ 10^(−p),        (20.26)
    ||c − A^T y + y_u − s|| / (1 + ||c||) ≤ 10^(−p),        (20.27)
    |c^T x − (b^T y − b_u^T y_u)| / (1 + |c^T x|) ≤ 10^(−p).        (20.28)

An 8-digit solution (p = 8) is typically required in the literature.
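The pieces just described, namely the target update (20.25), the damped maximal step sizes and the relative stopping measures (20.26)-(20.28), fit in a few lines of NumPy. This is only a sketch: the cap of the step size at 1 and all function and variable names are choices made here, not prescriptions of the text.

```python
import numpy as np

def target_mu(x, s, zu, yu, z, y, theta=0.75):
    # Target barrier value according to (20.25).
    gap = x @ s + zu @ yu + z @ y
    return (1.0 - theta) * gap / (2 * len(x) + len(y))

def max_step(v, dv, damping=0.99995):
    # Damped largest alpha with v + alpha*dv >= 0 (capped at 1 in this sketch).
    blocking = dv < 0
    if not np.any(blocking):
        return 1.0
    return min(1.0, damping * np.min(-v[blocking] / dv[blocking]))

def relative_residuals(A, b, bu, c, x, z, zu, s, y, yu):
    # Left-hand sides of the stopping criteria (20.26)-(20.28).
    prim1 = np.linalg.norm(A @ x - z - b) / (1.0 + np.linalg.norm(b))
    prim2 = np.linalg.norm(x + zu - bu) / (1.0 + np.linalg.norm(bu))
    dual = np.linalg.norm(c - A.T @ y + yu - s) / (1.0 + np.linalg.norm(c))
    gap = abs(c @ x - (b @ y - bu @ yu)) / (1.0 + abs(c @ x))
    return prim1, prim2, dual, gap
```

For the primal step size α_P one would apply max_step to the concatenated vectors (x, z, z_u) and (∆x, ∆z, ∆z_u), and analogously for α_D; the iterate is accepted as a p-digit solution once all four residuals are below 10^(−p).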
Let us observe that conditions (20.26–20.28) still depend on the scaling of the problem and somehow use the assumption that the coefficients of the vectors b, bu , c are about the same magnitude as those of the matrix A — preferably near 1. √ An important note is needed here. The theoretical worst-case bound O( n log 1ε ) is still far from computational practice. It is still extremely pessimistic; in practice the number of iterations is something like O(log n). It is rare that the current implementations of interior-point methods use more than 50 iterations to reach an 8-digit optimal solution. 20.7 Optimal basis identification 20.7.1 Preliminaries An optimal basis identification procedure is an algorithm that generates an optimal basis and the related optimal basic solutions from an arbitrary primal-dual optimal solution pair. In this section we briefly recall the notion of an optimal basis. In order to ease the discussion we use the standard format:  min cT x : Ax = b, x ≥ 0 , (20.29) where c, x, ∈ IRn , b ∈ IRm , and the matrix A is of size m × n. The dual problem is  max bT y : AT y + s = c, s ≥ 0 , (20.30) where y ∈ IRm and s ∈ IRn . We assume that A has rank m. A basis AB is a nonsingular rank m submatrix of A, where the set of column indices of AB is denoted by B. A basic solution of the primal problem (20.29) is a vector x where all the coordinates in N = {1, . . . , n} − B are set to zero and the basis coordinates form the unique solution of the equation AB xB = b. The corresponding dual basic solution is defined as the unique solution of ATB y = cB , along with sB = 0 and sN = cN − ATN y. It is clear from 422 IV Miscellaneous Topics this definition that a primal-dual pair (x, s) of basic solutions is always complementary, and hence, if both x and s are feasible, they are primal and dual optimal, respectively. A basic solution is called primal (dual) degenerate if at least one component of xB (sN ) is zero. There might be two reasons in practice to require an optimal basic solution for an LO problem. 1. If the given problem is a mixed integer LO problem then some or all of the variables must be integer. After solving the continuous relaxation we have to generate cuts to cut off the nonintegral optimal solution. To date, such cuts can be generated only if an optimal basic solution is available.25 Up till now there has been only one attempt to design a cut generation procedure within the interior-point setting (see Mitchell [211]). 2. In practical applications of LO, a sequence of slightly perturbed problems often has to be solved. This is the case in combinatorial optimization when new cuts are added to the problem or if a branch and bound algorithm is applied. Also if, e.g., in production planning models the optimal solutions for different scenarios are calculated and compared, we need to solve a sequence of slightly perturbed problems. When such closely related problems are solved, we expect that the previous optimal solution can help to solve the new problem faster. Although some methods for potentially efficient warm start were discussed in Section 20.5.2, in some cases it might be advantageous in practice to use Simplex type solvers initiated with an old optimal basis. In this section we describe how an optimal basic solution can be obtained from any optimal solution pair of the problem. 20.7.2 Basis tableau and orthogonality We introduce briefly the notions of basis tableau and pivot transformation and we show how orthogonal vectors can be obtained from a basis tableau. 
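Before the tableau machinery is developed, the basic-solution definitions of Section 20.7.1 can be made concrete. The sketch below builds the primal and dual basic solutions belonging to a basis of a small invented instance of (20.29)-(20.30) and checks that they are complementary; since both happen to be feasible here, the basis is optimal. The data and indices are purely illustrative.

```python
import numpy as np

def basic_solutions(A, b, c, B):
    # Primal and dual basic solutions of (20.29)-(20.30) for the basis index set B.
    m, n = A.shape
    x = np.zeros(n)
    x[B] = np.linalg.solve(A[:, B], b)    # A_B x_B = b, x_N = 0
    y = np.linalg.solve(A[:, B].T, c[B])  # A_B^T y = c_B
    s = c - A.T @ y                       # then s_B = 0 automatically
    return x, y, s

A = np.array([[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0, 1.0]])
b = np.array([3.0, 2.0])
c = np.array([1.0, 1.0, 3.0, 4.0])
x, y, s = basic_solutions(A, b, c, [0, 1])
print(x, s)                               # x = (3, 2, 0, 0), s = (0, 0, 1, 1)
print(x @ s)                              # complementarity: 0
print((x >= 0).all() and (s >= 0).all())  # both feasible, so the basis is optimal
```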
Let A be the constraint matrix, with columns aj for 1 ≤ j ≤ n, and let AB be a basis chosen from the columns of A. The basis tableau QB corresponding to B is defined by the equation AB QB = A. (20.31) Because this gives no rise to confusion we write below Q instead of QB . The rows of Q are naturally indexed by the indices in B and the columns by 1, 2, . . . , n. If i ∈ B and j = 1, 2, . . . , n the corresponding element of Q is denoted by qij . See Figure 20.1 (page 423). It is clear that qij is the coefficient of ai in the unique basis representation of the vector aj : X aj = qij ai . i∈B For j ∈ B this implies qij = 25 ( 1 if i = j, 0 otherwise, The reader may consult the books of Schrijver [250] and Nemhauser and Wolsey [224] to learn about combinatorial optimization. IV.20 Implementing Interior Point Methods i∈B                            423 ··· j . . . . . . ··· i qij . . . ··· ··· . . . Figure 20.1 Basis tableau. Thus, if j ∈ B, the corresponding column in Q is a unit vector with its 1 in the row corresponding to j. Hence, by a suitable reordering of columns QB — the submatrix of Q consisting of the columns indexed by B — becomes an identity matrix. It is convenient for the reasoning if this identity matrix occupies the first m columns. Therefore, by permuting the columns of Q by a permutation matrix P we write h i QP = I QN , (20.32) where QN denotes the submatrix of Q arising when the columns of QB are deleted. In the next section, where we present the optimal basis identification procedure, we will need a well-known orthogonality property of basis tableaus.26 This property follows from the obvious matrix identity   h i QN  = 0. I QN  −I Because of (20.32) this can be written as   QN  = 0. QP  −I Defining  R := P  QN −I  , (20.33) (20.34) we have rank Q = m, rank R = n − m and QR = 0. We associate with each index a vector in IRn as follows. If i ∈ B, q (i) will denote the corresponding row of Q and if j ∈ N then q(j) is the corresponding column of R. Clearly, the vectors q (i) , i ∈ B, span the row space of Q = QB and the vectors q(j) , j ∈ N , span the column space of R. Since these spaces are orthogonal, they are each 26 See, e.g., Rockafellar [238] or Klafszky and Terlaky [171]. 424 IV Miscellaneous Topics others orthogonal complement. Note that the row space of Q is the same as the row space of A. We thus see that the above spaces are independent of the basis B. ′ denote the vector associated with Now let AB′ be another basis for A and let q(j) ′ an index j 6∈ B . Then the aforementioned orthogonality property states that ′ q (i) ⊥ q(j) for all i ∈ B and j 6∈ B ′ . This is an obvious consequence of the observation in the previous paragraph. It is well known that the basis tableau for B ′ can be obtained from the tableau for B by performing a sequence of pivot operations. A pivot operation replaces a basis vector ai , i ∈ B by a nonbasic vector aj , j ∈ / B, where qij 6= 0.27 Example IV.86 For better understanding let us consider a simple numerical example. The following two basic tableaus can be transformed into each other by a single pivot. 
a1 a2 a3 a4 a5 a5 2 1 3 0 1 a4 –1 –1 4 1 0 a1 a2 a3 a4 a5 a2 2 1 3 0 1 a4 1 0 7 1 1 It is easy to check that for the first tableau q(3) = (0, 0, −1, 4, 3) and for the second tableau q (4) = (1, 0, 7, 1, 1), and that these vectors are orthogonal.28,29 ✷ ♦ 20.7.3 The optimal basis identification procedure Given any complementary solution, the algorithm presented below constructs an optimal basis in at most n iterations.30 Since the iteration count and thus the number of necessary arithmetic operations depends only on the dimension of the problem and is independent of the actual problem data, the algorithm is called strongly polynomial. The algorithm can be initialized with any optimal (and thus complementary) solution pair (x, s). This pair defines a partition of the index set as follows: B = {i | xi > 0}, 27 28 29 30 N = {i : si > 0}, T = {i : xi = si = 0}. Exercise 110 Let i ∈ B, where AB is a basis. For any j 6∈ B such that qij 6= 0 show that B′ = (B \ {i}) ∪ {j} also defines a basis, and the tableau for B′ can be obtained from the tableau for B by one pivot operation. Exercise 111 For each of the tableaus in Example IV.86, give the permutation matrix P and the matrix R according to (20.33) and (20.34). Exercise 112 For each of the tableaus in Example IV.86, give a full bases of the row space of the tableau and of its orthogonal complement. The algorithm discussed here was proposed by Megiddo [201]. He has also proved that an optimal basis cannot be constructed only from a primal or dual optimal solution in strongly polynomial time unless there exists a strongly polynomial algorithm for solving the LO problem. The problem of constructing a vertex solution from an interior-point solution has also been considered by Mehrotra [203]. IV.20 Implementing Interior Point Methods 425 As we have seen in Section 3.3.6, interior-point methods produce a strictly complementary optimal solution and hence such a solution gives a partition with T = ∅. But below we deal with the general case and we allow T to be nonempty. The optimal basis identification procedure consists of three phases. In the first phase a so-called maximal basis is constructed. A basis of A is called maximal with respect to (x, s) if • it has the maximum possible number of columns from AB , • it has the maximum possible number of columns from (AB , AT ). Then, in the second and third phases, independently of each other, primal and dual elimination procedures are applied to produce primal and dual feasible basic solutions respectively. Note that a maximal basis is not unique and not necessarily primal and/or dual feasible. A maximal basis can be found by the following simple pivot algorithm. Because of the assumption rank (A) = m, all the artificial basis vectors {e1 , · · · , em } Initial basis Input: Optimal solution pair (x, s) and the related partition (B, N, T ); artificial basis an+1 = e1 , · · · , an+m = em ; B = {n + 1, · · · , n + m}. Output: A maximal basis B ⊆ {1, · · · , n}. begin while qij 6= 0, i > n, j ∈ AB do begin pivot on position (i, j) (ai leaves and aj enters the basis); B := (B \ {i}) ∪ {j} . end while qij 6= 0, i > n, j ∈ AT do begin pivot on position (i, j) (ai leaves and aj enters the basis); B := (B \ {i}) ∪ {j} . end while qij 6= 0, i > n, j ∈ AN do begin pivot on position (i, j) (ai leaves and aj enters the basis); B := (B \ {i}) ∪ {j} . end end are eliminated from the basis at termination. 
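Example IV.86 and the pivot mechanics used by the Initial basis procedure above can be checked numerically. In the sketch below the matrix A is simply taken equal to the first tableau, so that the starting basis {a5, a4} has A_B = I; this choice is only one of many matrices consistent with the example, and 0-based NumPy indices replace the 1-based indices of the text.

```python
import numpy as np

# A equals the first tableau of Example IV.86, so the basis {a5, a4} has A_B = I.
A = np.array([[ 2.0,  1.0, 3.0, 0.0, 1.0],    # row of a5
              [-1.0, -1.0, 4.0, 1.0, 0.0]])   # row of a4

def tableau(A, B):
    # Basis tableau Q_B defined by A_B Q_B = A, see (20.31); rows ordered as B.
    return np.linalg.solve(A[:, B], A)

B1 = [4, 3]            # columns a5, a4 (0-based indices)
Q1 = tableau(A, B1)
B2 = [1, 3]            # after the pivot: a2 replaces a5
Q2 = tableau(A, B2)
print(Q2)              # reproduces the second tableau of Example IV.86

# q_(3): column of R for the nonbasic index a3 in the first tableau, built as in
# (20.33)-(20.34): tableau entries in the basic positions, -1 in position 3, zeros elsewhere.
q_col3 = np.zeros(5)
q_col3[B1] = Q1[:, 2]
q_col3[2] = -1.0
# q^(4): the row of the second tableau belonging to the basic index a4.
q_row4 = Q2[B2.index(3)]
print(q_col3, q_row4, q_col3 @ q_row4)   # (0,0,-1,4,3), (1,0,7,1,1), inner product 0
```

The second tableau of the example is reproduced exactly, and the printed inner product is zero, in accordance with the orthogonality property.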
Since the AB part is investigated first, the number of basis vectors from AB is maximal; similarly the number of basis vectors from [AB , AT ] is also maximal. In a practical implementation, special attention must 426 IV Miscellaneous Topics be given to the selection of the pivot elements in the above algorithm. Typically there is lot of freedom in the pivot selection, since a large number of leaving and/or entering variables could be selected at each iteration.31 The structure of the basis tableau resulting from the algorithm is visualized in Figure 20.2. Note that the tableau is never computed in practice; just the basis, in a factorized form. The tableau form is used just to ease the explanation and understanding. AB AT AN 1 i∈B∩B .. . 1 1 i∈B∩T 0 .. . 1 1 i∈B∩N 0 0 .. . 1 Figure 20.2 Tableau for a maximal basis. We proceed by a primal and a dual phase, performed independently of each other. They make the basis primal and dual feasible, respectively. Observe that in the elimination step of the first while-loop of the primal phase the columns of ABe are dependent. Hence there exists a nonzero solution of ABex̄B = 0.32 In the elimination step the ‘maximal’ property of the basis is lost, but it isrestoredin the second while-loop. As we can see, the Primal phase works only on the ABe , ATe part of the matrix A. In fact it reduces the ABe part to an independent set of column vectors. At termination the maximal basis is primal feasible and x̃ is the corresponding primal feasible basic solution, i.e., x̃B = A−1 B b ≥ 0 and x̃N = 0. The number of eliminations in the first while-loop is at most |B| − rank (B) and the number of pivots in the second while-loop is also at most |B| − rank (B). The Dual phase presented below works on the (AT , AN ) part. It reduces AN and extends AT so that no vector from AN remains in the basis. Note that in the elimination step of the first while-loop the rank of [AB , ATe] is less than m.33 In the elimination step the ‘maximal’ property of the basis is 31 32 33 We would like to choose always the pivot element that produces the least fill-in in the inverse basis. For this pivot selection problem many heuristics are possible. It can at least locally be optimized by, e.g., the heuristic Markowitz rule (recall Section 20.4). Implementation issues related to optimal basis identification procedures are discussed in Andersen and Ye [11], Andersen et al. [10] and Bixby and Saltzman [43]. In fact, an appropriate x̄ can be read from the tableau. Because of the orthogonality property any e − B can be used. In a practical implementation the tableau is not available; vector q(j) for j ∈ B only the (factorized) basis matrix QB is available. But then a vector q(j) can be obtained by computing a single nonbasic column of the tableau. e ∩ B; so only one row of the tableau For an appropriate s̄ we can choose any vector q (i) for i ∈ N has to be computed at each execution of the first while-loop. IV.20 Implementing Interior Point Methods 427 Primal phase Input: e N, Te); Optimal solution pair (x̃, s) and the related partition (B, maximal basis B. Output: A maximal basis B ⊆ {1, · · · , n}; e N, Te) with B e ⊂ B. optimal solution (x̃, s), partition (B, begin e 6⊆ B do while B begin begin let x̄ be such that ABex̄Be = 0, x̄Te∪N = 0, x̄ 6= 0; eliminate a(t least one) coordinate ofx̃, let x̃ := x̃ − ϑx̄ ≥ 0;  e := σ(x̃), Te := {1, . . . , n} \ B e∪N ; B end while qij 6= 0, i ∈ Te ∩ B, j ∈ B do begin pivot on position (i, j) (ai leaves, aj enters the basis); B := (B \ {i}) ∪ {j} . 
end end end lost but is restored in the second while-loop. At termination the maximal basis is dual feasible and s̃ is the corresponding dual feasible basic solution, i.e., s̃N = T cN − ATN (A−1 B ) cB and s̃B = 0. The number of eliminations in the first while-loop is at most m − rank (AB , AT ) and the number of pivots in the second while-loop is also at most m − rank (AB , AT ). To summarize, by first constructing a maximal basis and then performing the primal and dual phases, the above algorithm generates an optimal basis after at most n iterations. First we need at most m pivots to construct the maximal basis, then in the primal phase |B| − rank (B) and in the dual phase m − rank (AB , AT ) pivots follow. Finally, to verify the n-step bound, observe that after the initial maximal basis is constructed, each variable might enter the basis at most once. 20.7.4 Implementation issues of basis identification In the above basis identification algorithm it is assumed that a pair of exact primal/dual optimal solutions is known. This is never the case in practice. Interior point algorithms generate only a sequence converging to optimal solutions and because of the finite precision of computations the solutions are neither exactly feasible nor complementary. Somehow we have to make a decision about which variables are 428 IV Miscellaneous Topics Dual phase Input: e , Te); Optimal solutions (x, s̃), partition (B, N maximal basis B. Output: A maximal basis B ⊆ {1, · · · , n}; e , Te) with N e ∩ B = ∅. optimal solution (x, s̃), partition (B, N begin e ∩ B 6= ∅ do while N begin begin let s̄ be such that s̄ = AT y, s̄B∪Te = 0, s̄ 6= 0; eliminate a(t least one) coordinate ofs, let s̃ := s̃ − ϑs̄ ≥ 0;  e e e ; N := σ(s̃), T := {1, · · · , n} \ B ∪ N end e ∩ B, j ∈ Te do while qij 6= 0, i ∈ N begin pivot on position (i, j) (ai leaves, aj enters the basis); B := (B \ {i}) ∪ {j} . end end end positive and which are zero at the optimum. Let (x̄, ȳ, s̄) be feasible and (x̄)T s̄ ≤ ε. Let us make a guess for the optimal partition of the problem as B = { i | x̄i ≥ s̄i } and N = { i | x̄i < s̄i }. Now we can define the following perturbed problem34  minimize c̄T x : Ax = b̄, x ≥ 0 , (20.35) where b̄ = AB x̄B , c̄B = ATB ȳ and c̄N = ATN ȳ + sN . Now the vectors (x, y, s), where y = ȳ and     x̄ i ∈ B, i and si = xi =   0 i∈N 34 This approach was proposed by Andersen and Ye [11]. 0 i ∈ B, s̄i i∈N (20.36) IV.20 Implementing Interior Point Methods 429 are strictly complementary optimal solutions of (20.35).35 If ε is small enough, then the partition (B, N ) is the optimal partition of (20.29) (recall the results of Theorem I.47 and observe that the proof does not depend on the specific algorithm, just on the centrality condition and the stopping precision). Thus problems (20.29) and (20.35) share the same partition and the same set of optimal bases. As an optimal complementary solution for (20.35) is available, the above basis identification algorithm can be applied to this perturbed problem. The resulting optimal basis, within a small margin of error (depending on ε), is an optimal basis for (20.29). 20.8 Available software After twenty years of intensive research, IPMs are now well understood both in theory and practice. As a result a number of sophisticated implementations exist of IPMs for LO. Below we give a list of some of these codes; some of them contain both a Simplex and an IPM solver. 
They are capable of solving, within a few minutes on a PC, linear problems that were hardly solvable on a supercomputer fifteen years ago.

CPLEX (CPLEX/BARRIER) (CPLEX Optimization, Inc.). For information contact http://www.cplex.com. CPLEX is leading the market at this moment. It is a most complete and robust package. It contains a primal and a dual Simplex solver, an efficient interior-point implementation with cross-over,36 a good mixed-integer code, a network and a quadratic programming solver. It is supported by most modelling languages and available for most platforms.

XPRESS-MP (DASH Optimization). For information contact the vendor's Web page: http://www.dashoptimization.com. An excellent package including Simplex and IPM solvers. It is almost as complete as CPLEX.

CLP (The LO solver on COIN-OR). For more information contact http://www.coin-or.org/cgi-bin/cvsweb.cgi/COIN/Clp/. COIN-OR's LO package is written by the IBM LO team. Like CPLEX, CLP contains both Simplex and IPM solvers. It is capable of solving linear and quadratic optimization problems.

LOQO. Available from http://www.princeton.edu/~rvdb/. LOQO is developed by Vanderbei (Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA). It is a robust implementation of a primal-dual infeasible-start IPM for convex quadratic optimization. LOQO is a commercial package, like CPLEX and OSL, but it is available for academic purposes for a modest license fee.

35 Producing a reliable guess for the optimal partition is a nontrivial task. The simple method presented by (20.36) seems to work reasonably well in practice. See El-Bakry, Tapia and Zhang [71, 70]. However, Andersen and Ye [11] report good results by using a more sophisticated indicator to predict the optimal partition (B, N) based on the primal-dual search direction.

36 Close to optimality the solver rounds the IPM solution to a (not necessarily optimal) basic solution and switches to the Simplex solver, which generates an optimal basic solution.

HOPDM. Available from http://www.maths.ed.ac.uk/~gondzio/software/hopdm.html. HOPDM is developed by Gondzio (School of Mathematics, The University of Edinburgh, Edinburgh, Scotland). It implements a higher-order primal-dual method. It is in the public domain, in the form of FORTRAN source files, for academic purposes.

BPMPD. Available from http://www.sztaki.hu/~meszaros/bpmpd/. Mészáros' BPMPD is an implementation of a primal-dual predictor-corrector IPM including both the normal and augmented system approach. The code is available as an executable file for academic purposes.

LIPSOL. Available from http://www.caam.rice.edu/~yzhang/. Zhang's LIPSOL is written in MATLAB and FORTRAN. It is an implementation of the primal-dual predictor-corrector method. One of its features is the use of the MATLAB programming language, which makes it relatively easy to use.

PCx. Available from http://www-fp.mcs.anl.gov/otc/Tools/PCx/. This code was developed by Czyzyk, Mehrotra and Wright at the Argonne National Lab, Chicago. It is a stand-alone C implementation of an infeasible primal-dual predictor-corrector IPM. PCx is freely available, but is not public domain software.

McIPM. Available from http://www.cas.mcmaster.ca/~oplab/index.html. This code was developed at the Advanced Optimization Lab, McMaster University, by Zhu, Peng and Terlaky. McIPM is written in MATLAB and C.
It is a unique implementation of a Self-Regular primal-dual predictor-corrector IPM and it is based on the self-dual embedding model. The use of the MATLAB makes its use relatively easy. It is freely available under an open source license. More information about codes for linear optimization, either for commercial or research purposes, are available at the World Wide Web site of LP FAQ (LP Frequently Asked Questions) at • http://www-unix.mcs.anl.gov/otc/Guide/faq/linear-programming-faq.html • ftp://rtfm.mit.edu/pub/usenet/sci.answers/linear-programming-faq Appendix A Some Results from Analysis In Part II we need a result from convex analysis. We include its elementary proof in this appendix for the sake of completeness. A closely related result can be found in Bazaraa et al. [37] (Theorem 3.4.3 and Corollary 1, pp. 101–102). Recall that a subset C of IRk is called relatively open if C is open in the smallest affine subspace of IRk containing C. Proposition A.1 Let f : D → IR be a differentiable function, where D ⊆ IRk is an open set, and let C be a relatively open convex subset of D such that f is convex on C. Moreover, let L denote the subspace parallel to the smallest affine space containing C. Then, x∗ ∈ C minimizes f over C iff ∇f (x∗ ) ⊥ L. (A.1) Proof: Since f is convex on C, we have for any x, x∗ ∈ C, f (x) ≥ f (x∗ ) + ∇f (x∗ )T (x − x∗ ). Since x − x∗ ∈ L, the sufficiency of condition (A.1) follows immediately. To prove the necessity of (A.1), consider xt = x∗ + t(x− x∗ ), with t ∈ IR. The convexity of C implies that if 0 ≤ t ≤ 1, then xt ∈ C. Moreover, since C is open, we also have xt ∈ C when t ≥ −a for some positive a. Since f is differentiable, we have T ∇f (x∗ ) (x − x∗ ) = lim t↓0 f (xt ) − f (x∗ ) f (xt ) − f (x∗ ) = lim . t↑0 t t Now let x∗ ∈ C minimize f . Since f (xt ) ≥ f (x∗ ), letting t → 0 we have that the first limit must be nonnegative, and the second nonpositive. Hence both limits are zero. So T we have ∇f (x∗ ) (x − x∗ ) = 0, ∀x ∈ C. Thus (A.1) follows. ✷ At several places in the book we mention the implicit function theorem. There exists many forms of this theorem. See, e.g., Franklin [82], Buck [52], Fiacco [76] or Rudin [248]. We cite here a version of Bertsekas [40] (Proposition A.25, pp. 554).1 Proposition A.2 (Implicit Function Theorem) Let f : IRn+m → IRm be a function of w ∈ IRn and z ∈ IRm such that: 1 In fact, Proposition A.25 in Bertsekas [40] contains a typo. It says that f : IRn+m → IRn instead of f : IRn+m → IRm . 432 Some Results from Analysis (i) There exist w̄ ∈ IRn and z̄ ∈ IRm such that f (w̄, z̄) = 0. (ii) f is continuous and has a continuous and nonsingular gradient matrix (or Jacobian) ∇z f (w, z) in an open set containing (w̄, z̄). Then there exists open sets Sw̄ ⊆ IRn and Sz̄ ⊆ IRm containing w̄ and z̄, respectively, and a continuous function φ : Sw̄ → Sz̄ such that z̄ = φ(w̄) and f (w, φ(w)) = 0 for all w ∈ Sw̄ . The function φ is unique in the sense that if w ∈ Sw̄ , z ∈ Sz̄ , and f (w, z) = 0, then z = φ(w). Furthermore, if for some p > 0, f is p times continuously differentiable the same is true for φ, and we have −1 ∇φ(w) = − (∇z f (w, φ(w))) ∇w f (w, φ(w)) . Appendix B Pseudo-inverse of a Matrix We are interested in the least norm solution of the linear system of equations Ax = b, where A is an m × n matrix of rank r, and b ∈ IRm . We assume that a solution exists, i.e., b belongs to the column space of A. First we consider the case where r = n. Then the columns of A are linearly independent and hence the solution is unique. 
It is obtained by premultiplication of the system by AT : AT Ax = AT b. Since AT A is nonsingular we find x = (AT A)−1 AT b (r = n). We proceed with the case where r = m < n. Then Ax = b has multiple solutions. The least norm solution is characterized by the fact that it is orthogonal to the null space of A. So in this case the solution belongs to the row space of A and hence can be written as x = AT λ, λ ∈ IRm . This implies that AAT λ = b. This time AAT is nonsingular, and we obtain that λ = (AAT )−1 b, whence x = AT (AAT )−1 b (r = m). Finally we consider the general case, without making any assumption on the rank of A. We start by decomposing A as follows: A = A1 A2 , where A1 is an m × r matrix of rank r, and A2 is an r × n matrix of rank r. There are many ways to realize such a decomposition. One way is the well-known LU decomposition of A.1 Now Ax = b can be rewritten as A1 A2 x = b. Since A1 has full column rank we are in the first situation, and hence it follows that A2 x = (AT1 A1 )−1 AT1 b. Thus our problem is reduced to finding a least norm solution of the last system. Since A2 has full row rank we are now in the second situation, and hence it follows that x = AT2 (A2 AT2 )−1 (AT1 A1 )−1 AT1 b. 1 See, e.g., the book of Strang [259]. 434 Pseudo-inverse of a Matrix Thus we have found the least norm solution of Ax = b. Defining the matrix A+ according to A+ = AT2 (A2 AT2 )−1 (AT1 A1 )−1 AT1 , (B.1) we conclude that the least norm solution of Ax = b is given by x = A+ b. The matrix A+ is called the pseudo-inverse of A. We can easily verify that A+ satisfies the following relations: AA+ A + + A AA (AA+ )T (A+ A)T = A, (B.2) = = + A , AA+ , (B.3) (B.4) = A+ A. (B.5) Theorem B.1 The equations (B.2) to (B.5) determine A+ uniquely. Proof: We already have seen that a solution exists. Suppose that we have two solutions, X1 and X2 say. From (B.2) and (B.5) we derive that X1 AAT = AT , and X2 AAT = AT . So (X1 − X2 )AAT = 0. This implies (X1 − X2 )AAT (X1 − X2 )T = 0, and hence we must have (X1 − X2 )A = 0. This means that the columns of X1 − X2 belong to the left null space of A. On the other hand (B.3) and (B.4) imply that AX1 X1T = X1T , and AX2 X2T = X2T . Hence A(X1 X1T − X2 X2T ) = X1T − X2T . This means that the columns of X1 − X2 belong to the column space of A. Since the column space and the left null space of A are orthogonal this implies that X1 = X2 . ✷ There is another interesting way to describe the pseudo-inverse A+ of A, which uses the so-called singular value decomposition (SV D) of A. Let r denote the rank of A, and let λ1 , λ2 , · · · , λr denote the nonzero (hence positive) eigenvalues of AAT . Furthermore, let Q1 and Q2 denote orthogonal matrices such that the first r columns of Q1 constitute a basis of the column space of A, and the first r columns of Q2 constitute a basis of the row space of A. Then, if Σ denotes the m × n matrix whose only nonzero elements are Σ11 , Σ22 , · · · , Σrr , with p Σii = σi := λi , 1 ≤ i ≤ r, then we have A = Q1 ΣQT2 . This is the SV D of A, and the numbers σi , 1 ≤ i ≤ r are called the singular values of A. Using Theorem B.1 we can easily verify that Σ+ is the n × m matrix whose only nonzero elements are the first r diagonal elements, and these are the inverses of the singular values of A. Then, using Theorem B.1 once more, we can easily check that A+ is given by A+ = Q2 Σ+ QT1 . 
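A quick numerical check of the construction above (a sketch with random data): build A from a rank factorization A = A1 A2, form A+ by formula (B.1), verify the four defining relations (B.2)-(B.5), and compare with the SVD-based pseudo-inverse computed by NumPy. The dimensions and the random data are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# A rank-deficient matrix: m = 4, n = 5, rank r = 2, built from a rank factorization.
A1 = rng.standard_normal((4, 2))   # m x r, full column rank (with probability 1)
A2 = rng.standard_normal((2, 5))   # r x n, full row rank
A = A1 @ A2

# Pseudo-inverse via formula (B.1).
A_plus = A2.T @ np.linalg.inv(A2 @ A2.T) @ np.linalg.inv(A1.T @ A1) @ A1.T

# It satisfies the defining relations (B.2)-(B.5) ...
print(np.allclose(A @ A_plus @ A, A),
      np.allclose(A_plus @ A @ A_plus, A_plus),
      np.allclose((A @ A_plus).T, A @ A_plus),
      np.allclose((A_plus @ A).T, A_plus @ A))
# ... and agrees with the SVD-based pseudo-inverse.
print(np.allclose(A_plus, np.linalg.pinv(A)))

# Least norm solution of Ax = b for a consistent right-hand side b.
b = A @ rng.standard_normal(5)
x = A_plus @ b
print(np.allclose(A @ x, b))
```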
Appendix C Some Technical Lemmas Lemma C.1 Let A be an m × n matrix with columns Aj and b a vector of dimension m such that the set S := {x : Ax = b, x ≥ 0} is bounded and contains a positive vector. Moreover, let all the entries in A and b be integral. Then for each i, with 1 ≤ i ≤ n, 1 . j=1 kAj k max {xi : x ∈ S} ≥ Qn x Proof: Observe that each column Aj of A must be nonzero, due to the boundedness of S. Fixing the index i, let x ∈ S be such that xi is maximal. Note that such an x exists since S is bounded. Moreover, since S contains a positive vector, we must have xi > 0. Let J be the support of x: J = {j : xj > 0} . We assume that x is such that the cardinality of its support is minimal. Then the columns of the submatrix AJ of A are linearly independent. This can be shown as follows. Let there exist a nonzero vector λ ∈ IRn such that X λj Aj = 0, j∈J and λk = 0 for each k ∈ / J. Then Aλ = 0. Hence, if ε is small enough, x ± ελ has the same support as x and is positive on J. Moreover, x ± ελ ∈ S. Since the i-th coordinate cannot exceed xi it follows that λi = 0. Since S is bounded, at least one of the coordinates of λ must be negative, because otherwise S would contain the ray x + ελ, ε > 0. By increasing the value of ε until one of its coordinates reaches zero we get a vector in S with less than |J| nonzero coordinates and for which the i-th coordinate still has value xi . This contradicts the assumption that x has minimal support among such vectors, thus proving that the columns of the submatrix AJ of A are linearly independent. Now let AKJ be any nonsingular submatrix of AJ . Here K denotes a suitable subset of the row indices 1, 2, · · · , m of A. Then we have AKJ xJ = bK , 436 Some Technical Lemmas since the coordinates xj of x with j ∈ / J are zero. We can solve xi from this equation by using Cramer’s rule. 1 This yields det A′KJ , det AKJ xi = (C.1) where A′KJ denotes the matrix arising from AKJ by replacing the i-th column by bK . We know that xi > 0. This implies |det A′KJ | > 0. Since all the entries in the matrix A′KJ are integral the absolute value of its determinant is at least 1. Thus we find xi ≥ 1 . |det AKJ | Now using that |det AKJ | is bounded above by the product of the norms of its columns, due to the well-known Hadamard inequality2 for determinants, we find3 1 1 1 . ≥ Q ≥ Qn kA k kA k Kj j j∈J j∈J j=1 kAj k xi ≥ Q The second inequality is obvious and the last inequality follows since A has no zero columns and hence the norm of each column of A is at least 1. This proves the lemma. ✷ We proceed with a proof of the two basic inequalities in (6.24) on page 134. The proof uses standard techniques for proving elementary inequalities.4 Lemma C.2 Let z ∈ IRn , and α ≥ 0. Then each of the two inequalities ψ (α kzk ) ≤ Ψ (αz) ≤ ψ (−α kzk) holds whenever the involved expressions are well defined. The left (right) inequality holds with equality if and only if one of the coordinates of z equals kzk ( − kzk, respectively) and the remaining coordinates are zero. Proof: Fixing z we introduce g(α) := ψ (α kzk) and G(α) := Ψ (αz) = n X ψ (αzi ) , i=1 where α is such that αz > −e and α kzk > −1. Both functions are twice differentiable with respect to α. Using that ψ ′ (t) = 1 − 1/t we obtain α kzk2 g (α) = , 1 + α kzk ′ ′ G (α) = n X i=1 αzi2 1 + αzi 1 The idea of using Cramer’s rule in this way was applied first by Khachiyan [167]. 2 cf. Section 1.7.3. 
3 The idea of using Hadamard’s inequality for deriving bounds on the coordinates of xi from (C.1) was applied earlier by Klafszky and Terlaky [170] in a similar context. 4 The proof is due to Jiming Peng [232]. Some Technical Lemmas 437 and 2 ′′ g (α) = kzk ′′ 2, G (α) = (1 + α kzk) n X i=1 zi2 2. (1 + αzi ) Now consider the case where α ≥ 0. Then using zi ≤ kzk we may write G′′ (α) = n X i=1 zi2 (1 + αzi )2 ≥ n X i=1 2 zi2 = (1 + α kzk)2 kzk = g ′′ (α). (1 + α kzk)2 So G(α) − g(α) is convex for α ≥ 0. Since g ′ (0) = G′ (0) = 0 g(0) = G(0) = 0, (C.2) it follows that G(α) ≥ g(α) if α ≥ 0. This proves the left hand side inequality in the lemma. The right inequality follows in the same way. Let α ≥ 0 be such that e + αz > 0 and 1 − α kzk > 0. Using 1 + αzi ≥ 1 − α kzk > 0 we may write G′′ (α) = n X i=1 zi2 (1 + αzi ) 2 ≤ n X i=1 2 zi2 2 (1 − α kzk) = kzk 2 (1 − α kzk) = g ′′ (−α). This implies that G(α) − g(−α) is concave for α ≥ 0. Using (C.2) once more we obtain G(α) ≤ g(−α) if α ≥ 0, which is the right hand side inequality in the lemma. 2 Note that in both cases equality occurs only if zi2 = kzk for some i. Since the remaining coordinates are zero in that case, the lemma follows. ✷ We proceed with another technical lemma that is used in the proof of Lemma IV.15 in Chapter 17 (page 325). Lemma C.3 Let p be a positive number and let f : IR+ → IR+ be defined by f (x) := |1 − x| + 1 − If p ≥ 1 then f attains its minimal value at x = its minimal value at x = 1 and at x = p. p . x √ p, and if 0 < p ≤ 1 then f attains Proof: First consider the case p ≥ 1. If x ≤ 1 then we have f (x) = 1 − x + p p − 1 = − x. x x Hence, if x ≤ 1 the derivative of f is given by f ′ (x) = − p − 1 < 0. x2 Thus, f is monotonically decreasing if x ≤ 1. If x ≥ p then we have f (x) = x − 1 + 1 − p p =x− x x 438 Some Technical Lemmas and the derivative of f is given by f ′ (x) = 1 + p > 0, x2 proving that f is monotonically increasing if x ≥ p. For 1 ≤ x ≤ p we have f (x) = x − 1 + p p − 1 = x + − 2. x x Now the derivative of f is given by f ′ (x) = 1 − p x2 and the second derivative by 2p > 0. x3 √ Hence f is convex if x ∈ [1, p]. Putting f ′ (x) = 0 we get x = p, proving the first part of the lemma. The case p ≤ 1 is treated as follows. If x ≤ p then f ′′ (x) = f (x) = 1 − x + p p − 1 = − x, x x and, as before, f is monotonically decreasing. If x ≥ 1 then f (x) = x − 1 + 1 − p p =x− x x and f is monotonically increasing. Now let p ≤ x ≤ 1. Then f (x) = 1 − x + 1 − p p =2−x− . x x Hence f is concave if x ∈ [p, 1], and f has local minima at x = p and x = 1. Since f (1) = f (p) = 1 − p the second part of the lemma follows. ✷ The rest of this appendix is devoted to some properties of the componentwise product uv of two orthogonal vectors u and v in IRn . The first two lemmas give some upper bounds for the 2-norm and the infinity norm of uv. Lemma C.4 (First uv-lemma) If u and v are orthogonal in IRn , then √ 1 2 kuvk∞ ≤ ku + vk2 , kuvk ≤ ku + vk2 . 4 4 Proof: We may write  1 (u + v)2 − (u − v)2 . 4 From this we derive the componentwise inequality uv = 1 1 − (u − v)2 ≤ uv ≤ (u + v)2 . 4 4 (C.3) Some Technical Lemmas 439 This implies 1 1 2 2 − ku − vk e ≤ uv ≤ ku + vk e. 4 4 Since u and v are orthogonal, the vectors u − v and u + v have the same norm, and hence the first inequality in the lemma follows. For the second inequality we derive from (C.3) that 2  1 T 1 T 2 2 kuvk = eT (uv) = e (u + v)2 − (u − v)2 ≤ e (u + v)4 + (u − v)4 . 16 16 4 Since eT z 4 ≤ kzk for any z ∈ IRn , we obtain  1  4 4 2 ku + vk + ku − vk . 
kuvk ≤ 16 Using again that ku − vk = ku + vk, we confirm the second inequality. (C.4) ✷ The next lemma provides a second upper bound for kuvk. Lemma C.5 (Second uv-lemma) 5 If u and v are orthogonal in IRn , then kuvk ≤ √1 kuk kvk . 2 Proof: Recall from (C.4) that  1  4 4 ku + vk + ku − vk . 16 Now first assume that u and v are unit vectors, i.e., kuk = kvk = 1. Then the 4 4 2 orthogonality of u and v implies that ku + vk = ku − vk = 4, whence kuvk ≤ 1/2. In the general case, if u or v is not a unit vector, then if one of the two vectors is the zero vector, the lemma is obvious. Else we may write 2 kuvk ≤ kuvk = kuk kvk u v . kuk kvk Now applying the above result for the case of unit vectors to u/ kuk and v/ kvk we obtain the lemma. ✷ The bound for kuvk in Lemma C.5 is stronger than the corresponding bound in Lemma C.4. This easily follows by using ab ≤ 21 a2 + b2 with a = kuk and b = kvk. It may be noted that the last inequality provides also an alternative proof for the bound for kuvk∞ in Lemma C.5. For the proof of the third uv-lemma we need the next lemma. Lemma C.6 6 Let γ be a vector in IRp such that γ > −e and eT γ = σ. Then if either γ ≥ 0 or γ ≤ 0, p X −σ −γi ≤ ; 1 + γ 1 +σ i i=1 equality holds if and only if at most one of the coordinates of γ is nonzero. 5 6 For the case in which u and v are unit vectors, this lemma has been found by several authors. See, e.g., Mizuno [214], Jansen et al. [154], Gonzaga [125]. The extension to the general case in Lemma C.5 is due to Gonzaga (private communication). We will refer to this lemma as the second uv-lemma. This lemma and the next lemma are due to Ling [182, 183]. 440 Some Technical Lemmas Proof: The lemma is trivial if γ = 0, so we may assume that γ is nonzero. For the proof of the lemma we use the function f : (−1, ∞)p → IR defined by p X −γi . f (γ) := 1 + γi i=1 We can easily verify that f is convex (its Hessian is positive definite). Observe that P p i=1 γi /σ = 1 and, since either γ ≥ 0 or γ ≤ 0, γi /σ ≥ 0. Therefore we may write !   p p p X X X γi γi −σ γi −σ = f (γ) = f σei ≤ f (σei ) = , σ σ σ 1 + σ 1 +σ i=1 i=1 i=1 where ei denotes the i-th unit vector in IRp . This proves the inequality in the lemma. Note that the inequality holds with equality if γ = σei , for some i, and that in all other cases the inequality is strict since the Hessian of f is positive definite. Thus the lemma has been proved. ✷ Using the above lemmas we prove the next lemma. Lemma C.7 (Third uv-lemma) Let u and v be orthogonal in IRn , and suppose ku + vk = 2r with r < 1. Then   e 2r4 eT . −e ≤ e + uv 1 − r4 Proof: The first uv-lemma implies that kuvk∞ ≤ r2 < 1. Hence, putting β := uv we have eT β = 0 and −e < β < e. Now let {i : βi > 0}, I+ := {i : βi < 0}. I− := Then X i∈I+ βi = − X βi . i∈I− Let σ denote this common value. Using Lemma C.6 twice, with respectively γi = bi for i ∈ I+ and γi = bi for i ∈ I− , we obtain    X  n −βi e e T T −e = e −e = e e + uv e+β 1 + βi i=1 X −βi X −βi = + 1 + βi 1 + βi i∈I+ ≤ i∈I− σ 2σ 2 −σ . + = 1+σ 1−σ 1 − σ2 The last expression is monotonically increasing in σ. Hence we may replace it by an upper bound, which can be obtained as follows: n σ= n n  1 1X 1X 1X 2 2 |βi | = |ui vi | ≤ ui + vi2 = ku + vk = r2 . 2 i=1 2 i=1 4 i=1 4 Some Technical Lemmas 441 Substitution of this bound for σ yields the lemma. ✷ n Lemma C.8 √ and v be orthogonal in IR and suppose √ (Fourth uv-lemma) Let u ku + vk ≤ 2 and δ = ku + v + uvk ≤ 1/ 2. Then q p kuk ≤ 1 − 1 − 2δ 2 . 
Proof: It is convenient for the proof to introduce the vector z = u + v, and to denote r := kzk. Since u and v are orthogonal there exists a ϕ, 0 ≤ ϕ ≤ π/2, such that kuk = r cos ϕ, kvk = r sin ϕ. (C.5) √ Note that if the angle ϕ equals π/4 then r = kzk ≤ 2 implies that kuk = kvk < 1. But for the general case we only know that 0 ≤ ϕ ≤ π/2 and hence at first sight we should expect that the norms of kuk and kvk may well exceed 1. However, it will turn √ out below that the second condition in the lemma, namely δ = ku + v + uvk ≤ 1/ 2, restricts the values of ϕ to a small neighborhood of π/4, depending on δ, thus yielding the tighter upper bound for kuk in the lemma. Of course, the symmetry with respect to u and v implies the same upper bound for kvk. We have δ = ku + v + uvk ≥ ku + vk − kuvk = kzk − kuvk . (C.6) Applying the second uv-lemma (Lemma C.5) we find 1 1 r2 sin 2ϕ √ kuvk ≤ √ kuk kvk = √ r2 cos ϕ sin ϕ = . 2 2 2 2 Substituting this in (C.6) we obtain δ≥r− r2 sin 2ϕ √ . 2 2 (C.7) The lemma is trivial if either ϕ = 0 or ϕ = π/2, because then either u = 0 or u = z. In the latter case, v = 0, whence kuk = δ. Since (cf. Figure 6.12, page 138) q p δ ≤ 1 − 1 − 2δ 2 , the claim follows. Therefore, from now on it is assumed that 0<ϕ< π . 2 Thus, sin 2ϕ > 0 and (C.7) is equivalent to √ √ (sin 2ϕ) r2 − 2r 2 + 2ϕ 2 ≥ 0. 442 Some Technical Lemmas The left-hand side expression is quadratic in r and vanishes if √   q √ 2 1 ± 1 − δ 2 sin 2ϕ . r= sin 2ϕ √ The plus sign gives a value larger than 2. Thus we obtain √   q √ 2δ 2 p 1 − 1 − δ 2 sin 2ϕ = . r≤ √ sin 2ϕ 1 + 1 − δ 2 sin 2ϕ Consequently, using 0 < ϕ < π/2, kuk = r cos ϕ ≤ We proceed by considering the function f (ϕ) := 2δ cos ϕ p . √ 1 + 1 − δ 2 sin 2ϕ 2δ cos ϕ p , √ 1 + 1 − δ 2 sin 2ϕ 0 ≤ ϕ ≤ π/2, √ with δ 2 ≤ 1. Clearly this function is nonnegative and differentiable on the interval [0, π/2]. Moreover, f (0) = δ and f (π/2) = 0. On the open interval (0, π/2) the derivative of f with respect to ϕ is zero if and only if √   q √ δ 2 cos ϕ cos 2ϕ p = 0. − sin ϕ 1 + 1 − δ 2 sin 2ϕ + √ 1 − δ 2 sin 2ϕ This reduces to q   √ √ √ − sin ϕ 1 − δ 2 sin 2ϕ − sin ϕ 1 − δ 2 sin 2ϕ + δ 2 cos ϕ cos 2ϕ = 0, which can be rewritten as q √ √ δ 2 cos ϕ − sin ϕ = sin ϕ 1 − δ 2 sin 2ϕ. Taking squares we obtain √ √ 2δ 2 cos2 ϕ + sin2 ϕ − δ 2 sin 2ϕ = sin2 ϕ − δ 2 sin2 ϕ sin 2ϕ, which simplifies to √ √  2δ 2 cos2 ϕ = δ 2 sin 2ϕ 1 − sin2 ϕ = δ 2 sin 2ϕ cos2 ϕ. √ Dividing by δ 2 cos2 ϕ we find the surprisingly simple expression √ sin 2ϕ = δ 2. √ We assume that δ is positive, because if δ = 0 the lemma is trivial. Then sin 2ϕ = δ 2 admits two values for ϕ on the interval [0, π/2], one at each side of π/4. Since we are Some Technical Lemmas 443 maximizing f we have to take the value to the left of π/4. For this value, cos 2ϕ is positive. Therefore we may write 2δ cos ϕ 2δ cos ϕ δ 2δ cos ϕ p = = = . 2ϕ 2 1 + cos 2ϕ 2 cos cos ϕ 1 + 1 − sin 2ϕ √ Now cos ϕ can be solved from the equation 2 cos ϕ sin ϕ = δ 2. Taking the larger of the two roots we obtain q p 1 cos ϕ = √ 1 + 1 − 2δ 2 . 2 f (ϕ) = For this value of ϕ we have √ q √ q p p δ 2 δ 2 2 =√ f (ϕ) = p 1 − 1 − 2δ = 1 − 1 − 2δ 2 . √ 2δ 2 1 + 1 − 2δ 2 Clearly this value is larger than the values at the boundary points ϕ = 0 and ϕ = π/2. Hence it gives the maximum value of r cos ϕ on the whole interval [0, π/2]. Thus the lemma follows. ✷ Appendix D Transformation to canonical form D.1 Introduction It is almost obvious that every LO problem can be rewritten in the canonical form given by (P ). 
To see this, some simple observations are sufficient. First, any maximization problem can be turned into a minimization problem by multiplying the objective function by −1. Second, any equality constraint aT x = b can be replaced by the two inequality constraints aT x ≤ b, aT x ≥ b, and any inequality constraint aT x ≤ b is equivalent to −aT x ≥ −b. Third, any free variable x, with no sign requirements, can be written as x = x+ − x− , with x+ and x− nonnegative. By applying these transformations to any given LO problem, we get an equivalent problem that has the canonical form of (P ). The new problem is equivalent to the given problem in the sense that the new problem is feasible if and only if the given problem is feasible, and unbounded if and only if the given problem is unbounded, and, moreover, if the given problem has (one or more) optimal solutions then these can be found from the optimal solution(s) of the new problem. The approach just sketched is quite popular in textbooks,1 despite the fact that in practice, when dealing with solution methods, it has a number of obvious shortcomings. First, it increases the number of constraints and/or variables in the problem description. Each equality constraint is removed at the cost of an extra constraint, and each free variable is removed at the cost of an extra variable. Especially when the given problem is a large-scale problem it may be desirable to keep the dimensions of the problem as small as possible. Apart from this shortcoming the approach is even more inappropriate when dealing with an interior-point solution method. It will become clear later on that it is then essential to have a feasible region with a nonempty interior so that the level sets for the duality gap are bounded. However, when an equality constraint is replaced by two inequality constraints, these two inequalities cannot have positive slack values for any feasible point. This means that the interior of the feasible region is empty after the transformation. Moreover, the nonnegative variables introduced by eliminating a free variable are unbounded: when the same constant is added to the two new variables their difference remains the same. Hence, if in the original problem the level sets of the duality gap were bounded, we would lose this property in the new formulation of the problem. For deriving theoretical results, the above properties of the described transformations may give no problems at all. In fact, an example of an application of this type is 1 See, e.g., Schrijver [250], page 91, and Padberg [230], page 23. 446 Transformation to canonical form given in Section 2.10. However, when it is our aim to solve a given LO problem, the approach cannot be recommended, especially if the solution method is an interior-point method. The purpose of this section is to show that there exists an alternative approach that has an opposite effect on the problem size: it reduces the size of the problem. Moreover, if the original problem has a nonempty interior feasible region then so has the transformed problem, and if the level sets in the original problem are bounded then they are bounded after the transformation as well. In this approach, outlined below, each equality constraint and each free variable in the original problem reduces the number of variables or the number of constraints by one. Stated more precisely, we have the following result. Theorem D.1 Let (P ) be an LO problem with m constraints and n variables. 
Moreover let (P ) have m0 equality constraints and n0 free variables. Then there exists an equivalent canonical problem for which the sum of the number of constraints and the number of variables is not more than n + m − n0 − m0 . Proof: In an arbitrary LO problem we distinguish between the following types of variable: nonnegative variables, free variables and nonpositive variables.2 Similarly, three types of constraints can occur: equality constraints, inequality constraints of the less-than-or-equal-to (≤) type and inequality constraints of the greater-than-or-equalto (≥) type. It is clear that nonpositive variables can be replaced by nonnegative variables at no cost by taking the opposites as new variables. Also, inequality constraints of the less-than-or-equal-to type can be turned into inequality constraints of the greater-than-or-equal-to type through multiplication by −1. In this way we can transform the problem to the following form at no cost: (    ) T c0 x0 A0 x0 + A1 x1 = b0 1 (P ) min : , x ≥0 , B0 x0 + B1 x1 ≥ b1 c1 x1 where, for i = 0, 1, Ai and Bi are matrices and bi , ci and xi are vectors. The vector x0 contains the n0 free variables, and there are m0 equality constraints. The variables in x1 are nonnegative, and their number is n − n0 , whereas the number of inequality constraints is m − m0 . The sizes of the matrices and the vectors in (P ) are such that all expressions in the problem are well defined and need no further specification. D.2 Elimination of free variables In this section we discuss the elimination of free variables, thus showing how to obtain a problem in which all variables are nonnegative. We may assume that the matrix 2 A variable xi in (P ) is called a nonnegative variable if (P ) contains an explicit constraint xi ≥ 0 and a nonpositive variable if there is a constraint xi ≤ 0 in (P ); all remaining variables are called free variables. For the moment this classification of the variables is sufficient for our goal. But it may be useful to discuss the role of bounds on the variables. In this proof we consider any constraint of the form xi ≥ ℓ or xi ≤ u, with ℓ and u nonzero, as an inequality constraint. If the problem requires a variable xi to satisfy ℓ ≤ xi ≤ u then we can save one constraint by a simple shift of xi : defining x′i := xi − ℓ, the new variable is nonnegative and is bounded above by x′i ≤ u − ℓ. Transformation to canonical form 447 [A0 A1 ] has full row rank. Otherwise the set of equality constraints is redundant or inconsistent. If the system is not inconsistent, we can eliminate some of these constraints until the above condition on the rank is satisfied, i.e., rank (A0 A1 ) = m0 . Introducing a surplus vector x2 , we can write the inequality constraints as B0 x0 + B1 x1 − x2 = b1 , x2 ≥ 0. The constraints in the problem are then represented by the equality system     x0  0 A0 A1 0  x1  = b1 , x1 ≥ 0, x2 ≥ 0, B0 B1 −Im−m0 b x2 where Im−m0 denotes the identity matrix of size (m − m0 ) × (m − m0 ). We now have m equality constraints and n + m − m0 variables. Grouping together the nonnegative variables, we may write the last system as  0  0  1 x b x [F G] = 1 , z= ≥ 0, z b x2 where x0 contains the free variables, as before, and the variables in z are nonnegative. Note that, as a consequence of the above rank condition, the matrix [F G] has full row rank. The size of F is m × n0 and the size of G is m × (n − n0 + m − m0 ). Let us denote the rank of F by r. The we obviously have r ≤ n0 . 
Then, using Gaussian elimination, we can express r free variables in the remaining variables. We simply have to pivot on free variables as long as possible. So, as long as free variables occur in the problem formulation we choose a free variable and a constraint in which it occurs. Then, using this (equality) constraint, we express the free variable in the other variables and by substitution eliminate it from the other constraints and from the objective function. Since F has rank r, we can do this r times, and after reordering variables and equations if necessary, the constraints get the form     x̄0  r  0 Ir H Dr  0  d x̄ x̃ = , x1 = , z ≥ 0, (D.1) 0 0 D d x̃0 z where Ir is the r × r identity matrix, which is multiplied with x̄0 , the vector of the eliminated free variables, and H is an r × (n0 − r) matrix, which is multiplied with x̃0 , the vector of free variables that are not eliminated; the columns of Dr and D correspond to the nonnegative variables in z. Moreover, since the variables x̄0 have been eliminated from the objective function, there exist vectors cH and cD such that the objective function has the form cTH x̃0 + cTD z. (D.2) We are left with m equalities. The first r equalities express the free variables in x̄0 in the remaining variables, while the remaining m − r equalities contain no free variables. Observe that the first r equalities do not impose a condition on the feasibility of the vector z; they simply tell us how the values of the free variables in x̄0 can be calculated from the remaining variables. 448 Transformation to canonical form We conclude that the problem is feasible if and only if the system Dz = d, z≥0 (D.3) is feasible. Assuming this, for an any z satisfying (D.3) we can choose the vector x̃0 arbitrarily and then compute x̄0 such that the resulting vector satisfies (D.1). So fixing z, and hence also fixing its contribution cTD z to the objective function (D.2), we can make the objective value arbitrary small if the vector cH is nonzero. Since the variables in x̄0 do not occur in the objective function, it follows from this that the problem is unbounded if cH is nonzero. So, if the problem is not unbounded then cH = 0. In that case it remains to solve the problem  (P ′ ) min cTD z : Dz = d, z ≥ 0 , where D is an (m − r) × (n − n0 + m − m0 ) matrix and this matrix has rank m − r. Note that (P ′ ) is in standard format. D.3 Removal of equality constraints We now show how problem (P ′ ) can be reduced to canonical form. This goes by using the same pivoting procedure as above. Choose a variable and an equality constraint in which it occurs. Use the constraint to express the chosen variable in the other variables and then eliminate this variable from the other constraints and the objective function. Since A has rank m − r we can repeat this process m − r times and then we are left with expressions for the m − r eliminated variables in the remaining (nonnegative) variables. The number of the remaining variables is n − n0 + m − m0 − (m − r) = n − n0 + r − m0 . Now the nonnegativity conditions on the m − r eliminated variables result in m − r inequality constraints for the remaining n − n0 + r − m0 variables. So we are left with m − r inequality constraints that contain n − n0 + r − m0 variables. The sum of these numbers being n + m − n0 − m0 , the theorem has been proved. ✷ Before giving an example of the above reduction we make some observations. 
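A first, computational, observation is that the pivoting procedure used in the proof is straightforward to implement. The following sketch is our own illustration, under the assumption that the data F, G, b and the objective coefficients are available as dense numpy arrays; it is not the presolve routine of any particular solver. It carries out the free-variable elimination of Section D.2 and returns the reduced system (D.3); the removal of the remaining equality constraints in Section D.3 proceeds by the same pivoting, applied to nonnegative variables.

import numpy as np

def eliminate_free_variables(F, G, b, c_free, c_z, tol=1e-10):
    # Pivot the free variables x0 out of the system  F x0 + G z = b,  z >= 0,
    # with objective  c_free^T x0 + c_z^T z  (cf. Section D.2).
    F = np.array(F, dtype=float); G = np.array(G, dtype=float)
    b = np.array(b, dtype=float)
    c_free = np.array(c_free, dtype=float); c_z = np.array(c_z, dtype=float)
    used_rows = []
    for j in range(F.shape[1]):                      # try to eliminate x0_j
        row = next((i for i in range(F.shape[0])
                    if i not in used_rows and abs(F[i, j]) > tol), None)
        if row is None:
            continue                                  # x0_j occurs in no remaining equality
        used_rows.append(row)
        piv = F[row, j]
        F[row] /= piv; G[row] /= piv; b[row] /= piv   # normalize the pivot row
        for i in range(F.shape[0]):                   # eliminate x0_j from the other rows
            if i != row and abs(F[i, j]) > tol:
                f = F[i, j]
                F[i] -= f * F[row]; G[i] -= f * G[row]; b[i] -= f * b[row]
        coef = c_free[j]                              # substitute x0_j into the objective;
        c_free = c_free - coef * F[row]               # the constant coef*b[row] only shifts
        c_z = c_z - coef * G[row]                     # the objective and is dropped
    keep = [i for i in range(F.shape[0]) if i not in used_rows]
    D, d = G[keep], b[keep]                           # the system (D.3):  D z = d, z >= 0
    c_H = c_free                                      # coefficients of the surviving free variables
    # If c_H is (numerically) nonzero the problem is unbounded; otherwise it
    # remains to solve  min { c_z^T z : D z = d, z >= 0 }, cf. problem (P').
    return D, d, c_z, c_H

In exact arithmetic the number of pivots equals r; in floating point a tolerance is needed to decide whether a candidate pivot is nonzero.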
Remark D.2 When dealing with an LO problem, it is most often desirable to have an economical representation of the problem. Theorem D.1 implies that whenever the model contains equality constraints or free variables, then the size of the constraint matrix can be reduced by transforming the problem to a canonical form. As a consequence, when we consider the dimension of the constraint matrix as a measure of the size of the model, then any minimal representation of the problem has a canonical form. Of course, here it is assumed that in any such representation, nonpositive variables are replaced by nonnegative variables and ≤ inequalities by ≥ inequalities; these transformations do not change the dimension of the constraint matrix. In this connection it may be useful to point out that the representation obtained by the transformation in the proof of Theorem D.1 may be far from a minimal representation. Any claim of this type is poorly founded. For example, if the given problem is infeasible Transformation to canonical form 449 then a representation with one constraint and one variable exists. But to find out whether the problem is infeasible one really has to solve it. Remark D.3 It may happen that after the above transformations we are left with a canonical problem  (P ) min cT x : Ax ≥ b, x ≥ 0 , for which the matrix A has a zero row. In that case we can reduce the problem further. If the i-th row of A is zero and bi ≤ 0 then the i-th row of A and the i-th entry of b can be removed. If bi > 0 then we may decide that the problem is infeasible. Remark D.4 Also if A has a zero column further reduction is possible. If the j-th column of A is zero and cj > 0 then we have xj = 0 in any optimal solution and this column and the corresponding entry of c can be deleted. If cj < 0 then the problem is unbounded. Finally, if cj = 0 then xj may be given any (nonnegative) value. For the further analysis of the problem we may delete the j-th column of A and the entry cj in c. Example D.5 By way of example we consider the problem max {y1 + y2 : −1 ≤ y1 ≤ 1, y2 ≤ 1} . (EP ) (D.4) This problem has two variables and three constraints, so the constraint matrix has size 3 × 2. Since the two variables are free (cf. Footnote 2), Theorem D.1 guarantees the existence of a canonical description of the problem for which the sum of the numbers of rows and columns in the constraint matrix is at most 3 (= 5 − 2). Following the scheme of the proof of Theorem D.1 we construct such a canonical formulation. First, by introducing nonnegative slack variables for the three inequality constraints, we change all constraints into equality constraints: −y1 + s1 y1 = 1 + s2 y2 = 1 + s3 = 1. The free variables y1 and y2 can be eliminated by using y1 y2 = = s1 − 1 1 − s3 , and since y1 + y2 = s1 − s3 we obtain the equivalent problem max {s1 − s3 : s1 + s2 = 2, s1 , s2 , s3 ≥ 0} . By elimination of s2 this reduces to max {s1 − s3 : s1 ≤ 2, s1 , s3 ≥ 0} . (D.5) The problem is now reduced to the dual canonical form, as given by (2.2), with the following constraint matrix A, right-hand side vector c and objective vector b: " # " # h i 1 1 A= , c= 2 , b= . 0 −1 450 Transformation to canonical form Note that the constraint matrix in this problem has size 2 × 1, and the sum of the dimensions is 3, as expected. ♦ In the above example the optimal solution y = (1, 1) is unique. We consider below two modifications of the sample problem (EP ) by changing the objective function. 
In the first modification we use the objective function y1 ; then the optimal set consists of all y = (1, y2 ) with y2 ≤ 1. The optimal solution is no longer unique. The second modification has objective function y1 − y2 ; then the problem is unbounded, as can easily be seen. Example D.6 In this example we consider the problem max {y1 : −1 ≤ y1 ≤ 1, y2 ≤ 1} . (D.6) As in the previous example we can introduce nonnegative slack variables s1 , s2 and s3 and then eliminate the variables y1 , y2 and s2 , arriving at the canonical problem max {s1 : s1 ≤ 2, s1 , s3 ≥ 0} . (D.7) Here we have replaced the objective y1 = s1 − 1 simply by s1 , thereby omitting the constant −1, which is irrelevant for the optimization. The dependence of the eliminated variables on the variables in this problem is the same as in the previous example: y1 y2 = = s2 = s1 − 1 1 − s3 2 − s1 . The constraint matrix A and the right-hand side vector c in the dual canonical formulation are the same as before; only the objective vector b has changed: A= " 1 0 # , c= h 2 i , b= " 1 0 # . ♦ Example D.7 Finally we consider the unbounded problem max {y1 − y2 : −1 ≤ y1 ≤ 1, y2 ≤ 1} . (D.8) In this case the optimal set is empty. To avoid repetition we immediately state the canonical model: max {s1 + s3 : s1 ≤ 2, s3 ≥ 0} . (D.9) The dependence of the eliminated variables on the variables in this problem is the same as in the previous example. The matrix A and vectors c and b are now A= " 1 0 # , c= h 2 i , b= " 1 1 # . ♦ Appendix E The Dikin step algorithm E.1 Introduction In this appendix we reconsider the self-dual problem (SP ) min  q T z : M z ≥ −q, z ≥ 0 . (E.1) as given by (2.16) and we present a simple algorithm for solving (SP ) different from the full-Newton step algorithm of Section 3. Recall that we may assume without loss of generality that x = e is feasible and s(e) = M e + q = e, so e is the point on the central path of (SP ) corresponding to the value 1 of the barrier parameter. Moreover, at this point the objective value equals n, the order of the skew-symmetric matrix M . To avoid the trivial case n = 1 (when M = 0), we assume below that n ≥ 2. The algorithm can be described roughly as follows. Starting at x0 = e the algorithm approximately follows the central path until the objective value reaches some (small) target value ε. This is achieved by moving from x0 along a direction — more or less tangent to the central path — to the next iterate x1 , in such a way that x1 is close to the central path again, but with a smaller objective value. Then we repeat the same procedure until the objective has become small enough. In the next section we define the search direction used in the algorithm.1 Then, in Section E.3 the algorithm is defined and in subsequent sections the algorithm is analyzed. This results in an iteration bound, in Section E.5. E.2 Search direction Let x be a positive solution of (SP ) such that its surplus vector s = s(x) is positive, and let ∆x denote a displacement in the x-space. For the moment we neglect the nonnegativity conditions in (SP ). Then, the new iterate x+ is given by x+ := x + ∆x, 1 After the appearance of Karmarkar’s paper in 1984, Barnes [34] and Vanderbei, Meketon and Freedman [279] proposed a simplified version of Karmarkar’s algorithm. Later, their algorithm appeared to be just a rediscovery of the primal affine-scaling method proposed by Dikin [63] in 1967. See also Barnes [35]. 
The search direction used in this chapter can be considered as a primaldual variant of the affine-scaling direction of Dikin (cf. the footnote on page 339) and is therefore named the Dikin direction. It was first proposed by Jansen, Roos and Terlaky [156]. 452 The Dikin step algorithm and the new surplus vector s+ follows from s+ := s(x+ ) = M (x + ∆x) + q = s + M ∆x. The displacement ∆s in the s-space is simply given by ∆s = s+ − s = M ∆x, and, hence, the two displacements are related by M ∆x − ∆s = 0. (E.2) This implies, by the orthogonality property (2.22), that ∆x and ∆s are orthogonal: T T (∆x) ∆s = (∆x) M ∆x = 0. (E.3) The inequality constraints in (SP ) require that x + ∆x ≥ 0, s + ∆s ≥ 0. In fact, we want to stay in the interior of the feasible region, so we need to find displacements ∆x and ∆s such that x + ∆x > 0, s + ∆s > 0. Following an idea of Dikin [63, 65], we replace the nonnegativity conditions by requiring that the next iterates (x + ∆x, s + ∆s) belong to a suitable ellipsoid. We define this ellipsoid by requiring that ∆x ∆s + ≤ 1, (E.4) x s and call this ellipsoid in IR2n the Dikin ellipsoid. Remark E.1 It may be noted that when there are no additional conditions on the displacements ∆x and ∆s, then the Dikin ellipsoid is highly degenerate in the sense that it contains a linear space. For then the equation s∆x + x∆s = 0 determines an n-dimensional linear space that is contained in it. However, when intersecting the Dikin ellipsoid with the linear space (E.2), we get a bounded set. This can be seen as follows. The pair (∆x, ∆s) belongs to the Dikin ellipsoid if and only if (E.4) holds. Now (E.4) can be rewritten as s∆x + x∆s xs ≤ 1. By substitution of ∆s = M ∆x this becomes s∆x + xM ∆x xs ≤ 1, which is equivalent to (XS)−1 (S + XM ) ∆x ≤ 1. The matrix (XS)−1 (S + XM ) is nonsingular, and hence ∆x is bounded. See also Exercise 9 (page 29) and Exercise 113 (page 453). • The Dikin step algorithm 453 Our aim is to minimize the objective value q T x = xT s. The new objective value is (x + ∆x)T (s + ∆s) = xT s + xT ∆s + sT ∆x. Here we have used that ∆x and ∆s are orthogonal, from (E.3). Now minimizing the new objective value over the Dikin ellipsoid amounts to solving the following optimization problem:   ∆x ∆s T T + min s ∆x + x ∆s : M ∆x − ∆s = 0, ≤1 . (E.5) x s We proceed by showing that this problem uniquely determines the search direction vectors. For this purpose we rewrite (E.5) as follows.     ∆x ∆s ∆x ∆s T + + : M ∆x − ∆s = 0, ≤1 . (E.6) min (xs) x s x s The vector ∆x ∆s + x s must belong to the unit ball. When we neglect the affine constraint ∆s = M ∆x in (E.6) we get the relaxation n o T min (xs) ξ : kξk ≤ 1 . ξ := This problem has a trivial (and unique) solution, namely ξ=− xs . kxsk Thus, if we can find ∆x and ∆s such that ∆x ∆s + x s ∆s xs kxsk M ∆x − = = (E.7) (E.8) then ∆x and ∆s will solve (E.5). Multiplying both sides of (E.7) with xs yields s∆x + x∆s = − x2 s2 . kxsk (E.9) Now substituting (E.8) we get2,3 (S + XM ) ∆x = − 2 3 x2 s2 . kxsk As usual, X = diag (x) and S = diag (s). Exercise 113 If we define d := p x/s then show that the Dikin step ∆x can be rewritten as ∆x = −D (I + DM D)−1 3 3 x2 s2 . kxsk 454 The Dikin step algorithm Thus we have found the solution of (E.5), namely −1 ∆x = − (S + XM ) ∆s = M ∆x. x2 s2 kxsk (E.10) (E.11) We call ∆x the Dikin direction or Dikin step at x for the self-dual model (SP ). 
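Formulas (E.10) and (E.11) translate directly into code. The sketch below is a minimal illustration (dense numpy linear algebra; the function name is our own choice), not a production implementation, which would exploit the structure of S + XM when solving.

import numpy as np

def dikin_direction(M, q, x):
    # Dikin step at x for the self-dual model (SP), following (E.10)-(E.11):
    #   s = M x + q,   dx = -(S + X M)^{-1} (x^2 s^2) / ||x s||,   ds = M dx,
    # where X = diag(x) and S = diag(s); x and s are assumed to be positive.
    s = M @ x + q
    xs = x * s
    rhs = -(x * x * s * s) / np.linalg.norm(xs)
    dx = np.linalg.solve(np.diag(s) + np.diag(x) @ M, rhs)
    ds = M @ dx
    return dx, ds, s

Since M is skew-symmetric and x, s are positive, the matrix S + XM is nonsingular (cf. Remark E.1), so the linear solve above is well defined.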
In the next section we present an algorithm that is based on the use of this direction, and in subsequent sections we prove that this algorithm solves (SP ) in polynomial time. E.3 Algorithm using the Dikin direction The reader should be aware that we have so far not discussed whether the Dikin step yields a feasible point. Before stating our algorithm we need to deal with this. For the moment it suffices to point out that in the algorithm we use a step-size parameter α. Starting at x we move in the direction along the Dikin step ∆x to x + α∆x. The value of α is specified later on. The algorithm can now be described as follows. Dikin Step Algorithm for the Self-dual Model Input: An accuracy parameter ε > 0; a step-size parameter α, 0 < α ≤ 1; x0 > 0 such that s(x0 ) > 0. begin x := x0 ; s := s(x); while xT s ≥ ε do begin x := x + α∆x (with ∆x from (E.10)); s := s(x); end end Below we analyze this algorithm and provide a default value for the step-size parameter α for which the Dikin step is always feasible. This makes the algorithm well defined. In the analysis of the algorithm we need a measure for the ‘distance’ of an iterate x to the central path . To this end, for each positive feasible vector x with s(x) > 0, we use the number δc (x) as introduced earlier in (3.20): δc (x) := max (xs(x)) . min (xs(x)) (E.12) Below, in Theorem E.8 we show that the algorithm needs no more than O(τ n) iterations to produce a solution x with xT s(x) ≤ ε, where τ depends on x0 according The Dikin step algorithm to 455  τ = max 2, δc (x0 ) . Recall that it may be assumed without loss of generality that x0 lies on the central path, in which case δc (x0 ) = 1 and τ = 2. E.4 Feasibility, proximity and step-size We proceed by a condition on the step-size that guarantees the feasibility of the new iterates. Let us say that the step-size α is feasible if the new iterate and its surplus vector are positive. Then we may state the following result. In this lemma, and further on, we simply write s for s(x). Lemma E.2 Let α ≥ 0, xα = x + α∆x and sα = s + α∆s. If ᾱ is such that xα sα > 0 for all α satisfying 0 ≤ α ≤ ᾱ, then the step-size ᾱ is feasible. Proof: If ᾱ satisfies the hypothesis of the lemma then the coordinates of xα and sα cannot vanish for any α ∈ [0, ᾱ]. Hence, since x0 s0 > 0, by continuity, xα and sα must be positive for any such α. ✷ We use the superscript + to refer to entities after the Dikin step of size α at x: x+ s+ := := x + α∆x, s + α∆s. Consequently, x+ s+ = (x + α∆x)(s + α∆s) = xs + α (x∆x + s∆s) + α2 ∆x∆s. Using (E.9), we obtain x+ s+ = xs − α x2 s2 + α2 ∆x∆s. kxsk (E.13) Recall that the objective value is given by q T x = xT s. In the next lemma we investigate the reduction of the objective value during a feasible Dikin step with size α. Lemma E.3 If the step-size α is feasible then    α + T + xT s. s ≤ 1− √ x n Proof: Using (E.13) and the fact that ∆x and ∆s are orthogonal, the objective value T (x+ ) s+ after the step can be expressed as follows. x+ T s+ = xT s − αeT x2 s2 = xT s − α kxsk . kxsk The Cauchy–Schwarz inequality implies xT s = eT (xs) ≤ kek kxsk = √ n kxsk . 456 The Dikin step algorithm Substitution gives x+ Hence the lemma follows. T s+ ≤   α xT s. 1− √ n ✷ Now let τ ≥ 1 be some constant. We assume that we are given a feasible x such that δc (x) ≤ τ . Our aim is to find a step-size α such that these properties are maintained after the Dikin step. 
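Before doing so, we note that the guaranteed decrease of Lemma E.3 is easy to check numerically. The fragment below is illustrative only (the random skew-symmetric matrix and the seed are our own choices): it performs one Dikin step from the central point x = e with τ = 2 and compares the new duality gap with the bound of Lemma E.3.

import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
M = A - A.T                            # a random skew-symmetric matrix
x = np.ones(n)
q = np.ones(n) - M @ x                 # chosen so that s(e) = M e + q = e
s = M @ x + q

xs = x * s
rhs = -(x * x * s * s) / np.linalg.norm(xs)
dx = np.linalg.solve(np.diag(s) + np.diag(x) @ M, rhs)   # the Dikin step (E.10)
ds = M @ dx

alpha = 1.0 / (2.0 * np.sqrt(n))       # step size 1/(tau*sqrt(n)) with tau = 2
xp, sp = x + alpha * dx, s + alpha * ds
print(xp @ sp, (1 - alpha / np.sqrt(n)) * (x @ s))
# The first printed value does not exceed the second, as Lemma E.3 asserts for a
# feasible step; here x lies on the central path, so the two values coincide.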
One easily verifies that δc (x) ≤ τ holds if and only if there exist positive numbers τ1 and τ2 such that τ1 e ≤ xs ≤ τ2 e, with τ2 = τ τ1 . Obvious choices are τ1 = min(xs) and τ2 = max(xs). Only then one has δc (x) = τ . In the sequel we assume that x is feasible and δc (x) ≤ τ . Note that this implies x > 0 and s > 0. Lemma E.4 If the step-size α satisfies 1 α≤ √ , τ n (E.14) then the map t 7→ t − α is monotonically increasing for t ∈ [0, τ2 ]. t2 kxsk (E.15) Proof: The lemma holds if the derivative of the map in the lemma is nonnegative for t ∈ [0, τ2 ]. So we need to show that 1 − 2αt/ kxsk ≥ 0 for t ∈ [0, τ2 ]. Since n ≥ 2 we have √ kτ1 ek kxsk τ1 τ1 n 1 = ≤ . (E.16) α≤ √ = √ ≤ 2τ2 2τ2 2τ2 τ n τ2 n Hence, using t ∈ [0, τ2 ], we may write 2αt 2ατ2 ≤ ≤ 1. kxsk kxsk This implies the lemma. ✷ In the sequel we assume that α satisfies (E.14). By Lemma E.4 the map (E.15) is then monotonically increasing for t ≤ τ2 . Since τ1 e ≤ xs ≤ τ2 e, by applying this map to the components of xs we get     x2 s2 τ22 τ12 e ≤ xs − α e. ≤ τ2 − α τ1 − α kxsk kxsk kxsk Substitution into (E.13) gives     τ12 τ22 2 + + τ1 − α e + α ∆x∆s ≤ x s ≤ τ2 − α e + α2 ∆x∆s, kxsk kxsk (E.17) The Dikin step algorithm 457 thus yielding lower and upper bounds for the entries of x+ s+ . A crucial part of the analysis consists of finding a bound for the last term in (E.17). For that purpose we use Lemma C.4 in Appendix C. The first statement in this lemma p (with u = d−1 ∆x and v = d∆s, where d = x/s) gives  d−1 ∆x (d∆s) k∆x∆sk∞ = ∞ ≤ 1 −1 d ∆x + d∆s 4 2 = 2 1 s∆x + x∆s √ 4 xs . Now using (E.9) we get k∆x∆sk∞ ≤ 1 √ xs xs 4 kxsk 2 ≤ 1 xs kxsk∞ 4 kxsk 2 = 1 τ2 kxsk∞ ≤ . 4 4 Lemma E.5 When assuming (E.14), the step size α is feasible if p 2 (2τ − 1) α≤ . τ (E.18) (E.19) Proof: Using (E.18) we derive from the left-hand side inequality in (E.17) that   τ2 τ12 + + e − α2 e. (E.20) x s ≥ τ1 − α kxsk 4 The step size α is feasible if and only if x+ s+ is a positive vector. Due to (E.14) we + + have α ≤ kxsk 2τ2 (cf. (E.16)). Hence x s certainly is positive if     1 τ2 τ τ2 > 0. − α2 τ1 − 1 − α2 = τ1 1 − 2τ2 4 2τ 4 One easily verifies that the last inequality is equivalent to the inequality in the lemma. ✷ Lemma E.6 4 Assuming (E.14), (E.19), and δc (x) ≤ τ , let α≤ 4(τ − 1) 1 √ . τ +1 τ n Then x+ is feasible and δc (x+ ) ≤ τ . Proof: Obviously, (E.20) yields a lower bound for min(x+ s+ ). In the same way, by using (E.18) once more, the right-hand side inequality in (E.17) yields an upper bound, namely   τ2 τ22 + + e + α2 e. (E.21) x s ≤ τ2 − α kxsk 4 Now it follows from (E.20) and (E.21) that δc (x+ ) ≤ 4 τ2 − τ1 − ατ22 kxsk ατ12 kxsk + α2 τ2 4 − α2 τ2 4  ατ2 α2 1 − + τ2 kxsk 4 = τ 1 + = 2 ατ1 τ1 1 − kxsk − α4 τ 2 α(τ1 −τ2 ) + α (τ4+1) kxsk 2 ατ1 − α4 τ 1 − kxsk  . In the previous edition of this book our estimate for δc (x+ ) contained a technical error. We kindly acknowledge the authors Zoltán and Szabolcs of [322] for pointing this out. 458 The Dikin step algorithm This makes clear that δc (x+ ) ≤ τ holds if α(τ1 − τ2 ) α2 (τ + 1) + ≤ 0. kxsk 4 √ Using τ1 ≤ τ2 and kxsk ≤ τ2 n, one easily obtains the upper bound for α in the lemma. ✷ Note that if τ ≥ 2 then 4(τ − 1)/ (τ + 1) > 1. Hence the bound in Lemma E.6 is then weaker than the bounds in Lemma E.4 and Lemma E.5. Therefore, without further proof we may state the main result of this section. √ Theorem E.7 Let τ ≥ 2 and α = 1/(τ n). Then α is feasible, and δc (x) ≤ τ implies δc (x∗ ) ≤ τ . 
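With the step size of Theorem E.7 the algorithm of Section E.3 is completely specified. The following sketch is our own minimal implementation (dense numpy, starting from x0 = e with s(e) = e, as may be assumed for the self-dual model); it also reports the proximity measure (E.12), which by Theorem E.7 stays below τ during the run.

import numpy as np

def delta_c(x, s):
    # proximity measure (E.12):  max(x s) / min(x s)
    xs = x * s
    return xs.max() / xs.min()

def dikin_step_algorithm(M, q, eps=1e-2, tau=2.0):
    # Dikin Step Algorithm for the self-dual model (SP) of Section E.3,
    # with the default step size alpha = 1/(tau*sqrt(n)) of Theorem E.7.
    # It is assumed that x0 = e is feasible with s(e) = M e + q = e.
    n = len(q)
    alpha = 1.0 / (tau * np.sqrt(n))
    x = np.ones(n)
    s = M @ x + q
    iters = 0
    while x @ s >= eps:
        xs = x * s
        rhs = -(x * x * s * s) / np.linalg.norm(xs)
        dx = np.linalg.solve(np.diag(s) + np.diag(x) @ M, rhs)  # Dikin step (E.10)
        x = x + alpha * dx
        s = M @ x + q
        iters += 1
    return x, s, iters, delta_c(x, s)

The iteration bound that justifies the stopping rule is derived in the next section.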
E.5 Convergence analysis

We are now ready for deriving an upper bound for the number of iterations needed by the algorithm.

Theorem E.8 Let $\tau := \max\{2, \delta_c(x^0)\}$ and $\alpha = 1/(\tau\sqrt{n})$. Then the Dikin Step Algorithm for the Self-dual Model requires at most
$$\left\lceil \tau n \log \frac{q^T x^0}{\varepsilon} \right\rceil$$
iterations. The output is a feasible solution $x$ such that $\delta_c(x) \le \tau$ and $q^T x \le \varepsilon$.

Proof: Initially we are given a feasible $x = x^0 > 0$ such that $\delta_c(x) \le \tau$. Since $\tau \ge 2$ these properties are maintained during the execution of the algorithm, by Theorem E.7. Initially the objective value equals $q^T x^0$. Each iteration reduces the objective value by a factor $1 - 1/(n\tau)$, by Lemma E.3. Hence, after $k$ iterations the objective value is smaller than $\varepsilon$ if
$$\left(1 - \frac{1}{n\tau}\right)^k q^T x^0 \le \varepsilon.$$
Taking logarithms, this becomes
$$k \log\left(1 - \frac{1}{n\tau}\right) + \log(q^T x^0) \le \log \varepsilon.$$
Since
$$\log\left(1 - \frac{1}{n\tau}\right) \le -\frac{1}{n\tau},$$
this certainly holds if $k$ satisfies
$$-\frac{k}{n\tau} \le -\log(q^T x^0) + \log \varepsilon = -\log \frac{q^T x^0}{\varepsilon}.$$
This implies the theorem. ✷

Example E.9 In this example we demonstrate the behavior of the Dikin Step Algorithm by applying it to the problem $(SP)$ in Example I.7, as given in (2.19) (page 23). The same problem was solved earlier by the Full-Newton Step Algorithm in Example I.38. We initialize the algorithm with $z = e$. Then Theorem E.8, with $\tau = 2$ and $n = 5$, yields that the algorithm requires at most
$$\left\lceil 10 \log \frac{5}{\varepsilon} \right\rceil$$
iterations. For $\varepsilon = 10^{-2}$ we have $\log(5/\varepsilon) = \log 500 = 6.2146$, and we get 63 as an upper bound for the number of iterations. When running the algorithm with this $\varepsilon$ the actual number of iterations is 58. The output of the algorithm is
$$z = (1.5985,\ 0.0025,\ 0.7998,\ 0.8005,\ 0.0020), \qquad s(z) = (0.0012,\ 0.8005,\ 0.0025,\ 0.0025,\ 1.0000).$$
The left plot in Figure E.1 shows how the coordinates of the vector $z$ develop in the course of the algorithm; the right plot does the same for the coordinates of the surplus vector $s = s(z)$. [Figure E.1: Output of the Dikin Step Algorithm for the problem in Example I.7 — coordinates of $z$ (left) and of $s(z)$ (right) against the iteration number.] Observe that $z$ and $s(z)$ converge to the same solution as found in Example I.38 by the Full-Newton Step Algorithm, but the number of iterations is higher. ♦

Bibliography

[1] I. Adler, N.K. Karmarkar, M.G.C. Resende, and G. Veiga. Data structures and programming techniques for the implementation of Karmarkar's algorithm. ORSA J. on Computing, 1:84–106, 1989.
[2] I. Adler, N.K. Karmarkar, M.G.C. Resende, and G. Veiga. An implementation of Karmarkar's algorithm for linear programming. Mathematical Programming, 44:297–335, 1989. Errata in Mathematical Programming, 50:415, 1991.
[3] I. Adler and R.D.C. Monteiro. Limiting behavior of the affine scaling continuous trajectories for linear programming problems. Mathematical Programming, 50:29–51, 1991.
[4] I. Adler and R.D.C. Monteiro. A geometric view of parametric linear programming. Algorithmica, 8:161–176, 1992.
[5] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1974.
[6] M. Akgül. A note on shadow prices in linear programming. J. Operational Research Society, 35:425–431, 1984.
[7] A. Altman and K.C. Kiwiel. A note on some analytic center cutting plane methods for convex feasibility and minimization problems. Computational Optimization and Applications, 5, 1996.
[8] E.D. Andersen. Finding all linearly dependent rows in large-scale linear programming.
Optimization Methods and Software, 6:219–227, 1995. [9] E.D. Andersen and K.D. Andersen. Presolving in linear programming. Mathematical Programming, 71:221–245, 1995. [10] E.D. Andersen, J. Gondzio, Cs. Mészáros, and X. Xu. Implementation of interior point methods for large scale linear programming. In T. Terlaky, editor, Interior Point Methods of Mathematical Programming, pp. 189–252. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1996. [11] E.D. Andersen and Y. Ye. Combining interior-point and pivoting algorithms for linear programming. Management Science, 42(12):1719–1731, 1996. [12] K.D. Andersen. A modified Schur complement method for handling dense columns in interior-point methods for linear programming. ACM Transactions Mathematical Software, 22(3):348–356, 1996. [13] E.D. Andersen, C. Roos, T. Terlaky, T. Trafalis and J.P. Warners. The use of low-rank updates in interior-point methods. In: Ed. Y. Yuan, Numerical Linear Algebra and Optimization, pp. 1–12. Science Press, Beijing, China, 2004. [14] K.M. Anstreicher. A monotonic projective algorithm for fractional linear programming. Algorithmica, 1(4):483–498, 1986. 462 Bibliography [15] K.M. Anstreicher. A strengthened acceptance criterion for approximate projections in Karmarkar’s algorithm. Operations Research Letters, 5:211–214, 1986. [16] K.M. Anstreicher. A combined Phase I – Phase II projective algorithm for linear programming. Mathematical Programming, 43:209–223, 1989. [17] K.M. Anstreicher. Progress in interior point algorithms since 1984. SIAM News, 22:12– 14, March 1989. [18] K.M. Anstreicher. The worst-case step in Karmarkar’s algorithm. Mathematics of Operations Research, 14:294–302, 1989. [19] K.M. Anstreicher. Dual ellipsoids and degeneracy in the projective algorithm for linear programming. Contemporary Mathematics, 114:141–149, 1990. [20] K.M. Anstreicher. A standard form variant and safeguarded linesearch for the modified Karmarkar algorithm. Mathematical Programming, 47:337–351, 1990. [21] K.M. Anstreicher. A combined phase I – phase II scaled potential algorithm for linear programming. Mathematical Programming, 52:429–439, 1991. [22] K.M. Anstreicher. On the performance of Karmarkar’s algorithm over a sequence of iterations. SIAM J. on Optimization, 1(1):22–29, 1991. [23] K.M. Anstreicher. On interior algorithms for linear programming with no regularity assumptions. Operations Research Letters, 11:209–212, 1992. [24] K.M. Anstreicher. Potential reduction algorithms. In T. Terlaky, editor, Interior Point Methods of Mathematical Programming, pp. 125–158. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1996. [25] K.M. Anstreicher and R.A. Bosch. Long steps in a O(n3 L) algorithm for linear programming. Mathematical Programming, 54:251–265, 1992. [26] K.M. Anstreicher and R.A. Bosch. A new infinity-norm path following algorithm for linear programming. SIAM J. on Optimization, 5:236–246, 1995. [27] M. Arioli, I.S. Duff, and P.P.M. de Rijk. On the augmented system approach to sparse least-squares problems. Numer. Math., 55:667–684, 1989. [28] M.D. Asić, V.V. Kovačević-Vujčić, and M.D. Radosavljević-Nikolić. A note on limiting behavior of the projective and the affine rescaling algorithms. Contemporary Mathematics, 114:151–157, 1990. [29] D.S. Atkinson and P.M. Vaidya. A scaling technique for finding the weighted analytic center of a polytope. Mathematical Programming, 57:163–192, 1992. [30] D.S. Atkinson and P.M. Vaidya. A cutting plane algorithm that uses analytic centers. 
Mathematical Programming, 69(69), 1995. [31] O. Bahn, J.-L. Goffin, O. du Merle, and J.-Ph. Vial. A cutting plane method from analytic centers for stochastic programming. Mathematical Programming, 69(1):45–73, 1995. [32] Y.Q. Bai, M. Elghami, and C. Roos. A comparative study of kernel functions for primal-dual interior-point algorithms in linear optimization. SIAM J. on Optimization, 15(1):101–128, 2004. [33] M.L. Balinski and A.W. Tucker. Duality theory of linear programs: a constructive approach with applications. SIAM Review, 11:499–581, 1969. [34] E.R. Barnes. A variation on Karmarkar’s algorithm for solving linear programming problems. Mathematical Programming, 36:174–182, 1986. Bibliography 463 [35] E.R. Barnes. Some results concerning convergence of the affine scaling algorithm. Contemporary Mathematics, 114:131–139, 1990. [36] E.R. Barnes, S. Chopra, and D.J. Jensen. The affine scaling method with centering. Technical Report, Dept. of Mathematical Sciences, IBM T. J. Watson Research Center, P. O. Box 218, Yorktown Heights, NY 10598, USA, 1988. [37] M.S. Bazaraa, H.D. Sherali, and C.M. Shetty. Nonlinear Programming: Theory and Algorithms. John Wiley & Sons, New York (second edition), 1993. [38] R. Bellman. Introduction to Matrix Analysis. Volume 12 of SIAM Classics in Applied Mathematics. SIAM, Philadelphia, 1995. [39] A. Ben-Israel and T.N.E. Greville. Generalized Inverses: Theory and Applications. John Wiley & Sons, New York, USA, 1974. [40] D.P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, Massachusetts, 1995. [41] G. Birkhoff and S. MacLane. A Survey of Modern Algebra. Macmillan, New York, 1977. [42] R.E. Bixby. Progress in linear programming. ORSA J. on Computing, 6:15–22, 1994. [43] R.E. Bixby and M.J. Saltzman. Recovering an optimal LP basis from an interior point solution. Operations Research Letters, 15(4):169–178, 1994. [44] Å. Björk. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996. [45] J.F. Bonnans and F.A. Potra. Infeasible path-following algorithms for linear complementarity problems. Mathematics of Operations Research, 22(2), 378–407, 1997. [46] R.A. Bosch. On Mizuno’s rank one updating algorithm for linear programming. SIAM J. on Optimization, 3:861–867, 1993. [47] R.A. Bosch and K.M. Anstreicher. On partial updating in a potential reduction linear programming algorithm of Kojima, Mizuno and Yoshise. Algorithmica, 9(1):184–197, 1993. [48] S.E. Boyd and L. Vandenberghe. Semidefinite programming. SIAM Review, 38(1):49– 96, 1996. [49] A.L. Brearley, G. Mitra, and H.P. Williams. Analysis of mathmetical programming problems prior to applying the simplex algorithm. Mathematical Programming, 15:54–83, 1975. [50] M.G. Breitfeld and D.F. Shanno. Computational experiene with modified log-barrier methods for nonlinear programming. Annals of Operations Research, 62:439–464, 1996. [51] J. Brinkhuis and G. Draisma. Schiet OpT M . Special Report, Econometric Institute, Erasmus University, Rotterdam, 1996. [52] R.C. Buck. Advanced Calculus. International Series in Pure and Apllied Mathematics. Mac-Graw Hill Book Company, New York (third edition), 1978. [53] J.R. Bunch and B.N. Parlett. Direct methods for solving symmetric indefinit systems of linear equations. SIAM J. on Numerical Analysis, 8:639–655, 1971. [54] S.F. Chang and S.T. McCormick. A hierachical algorithm for making sparse matrices sparse. Mathematicaal Programming, 56:1–30, 1992. [55] V. Chvátal. Linear programming. W.H. Freeman and Company, New York, USA, 1983. [56] S.A. Cook. 
The complexity of theorem-proving procedures. In Proceedings of Third Annual ACM Symposium on Theory of Computing, pp. 151–158. ACM, New York, 1971. 464 Bibliography [57] J.P. Crouzeix and C. Roos. On the inverse target map of a linear programming problem. Unpublished Manuscript, University of Clermont, France, 1994. [58] I. Csiszár. I-divergence geometry of probability distributions and minimization problems. Annals of Probability, 3:146–158, 1975. [59] G.B. Dantzig. Linear Programming and Extensions. Princeton Univ. Press, Princeton, New Jersey, 1963. [60] G.B. Dantzig. Linear programming. In J.K. Lenstra, A.H.G. Rinnooy Kan, and A. Schrijver, editors, History of mathmetical programming. A collection of personal reminiscences. CWI, North–Holland, The Netherlands, 1991. [61] A. Deza, E. Nematollahi, R. Peyghami and T. Terlaky. The central path visits all the vertices of the Klee-Minty cube. AdvOl-Report #2004/11. McMaster Univ., Hamilton, Ontario, Canada. [62] A. Deza, E. Nematollahi and T. Terlaky. How good are interior point methods? KleeMinty cubes tighten iteration-complexity bounds. AdvOl-Report #2004/20. McMaster Univ., Hamilton, Ontario, Canada. [63] I.I. Dikin. Iterative solution of problems of linear and quadratic programming. Doklady Akademii Nauk SSSR, 174:747–748, 1967. Translated in Soviet Mathematics Doklady, 8:674–675, 1967. [64] I.I. Dikin. On the convergence of an iterative process. Upravlyaemye Sistemi, 12:54–60, 1974. (In Russian). [65] I.I. Dikin. Letter to the editor. Mathematical Programming, 41:393–394, 1988. [66] I.I. Dikin and C. Roos. Convergence of the dual varaibles for the primal affine scaling method with unit steps in the homogeneous case. J. of Optimization Theory and Applications, 95:305–321, 1997. [67] J. Ding and T.Y. Li. An algorithm based on weighted logarithmic barrier functions for linear complementarity problems. Arabian J. for Science and Engineering, 15(4):679– 685, 1990. [68] I.S. Duff. The solution of large-scale least-squares problems on supercomputers. Annals of Operations Research, 22:241–252, 1990. [69] I.S. Duff, N.I.M. Gould, J.K. Reid, J.A. Scott, and K. Turner. The factorization of sparse symmetric indefinite matrices. IMA J. on Numerical Analysis, 11:181–204, 1991. [70] A.S. El-Bakry, R.A. Tapia, and Y. Zhang. A study of indicators for identifying zero variables in interior-point methods. SIAM Review, 36(1):45–72, 1994. [71] A.S. El-Bakry, R.A. Tapia, and Y. Zhang. On the convergence rate of Newton interiorpoint methods in the absence of strict complementarity. Computational optimization and Applications, 6:157-167, 1996. [72] J.R. Evans and N.R. Baker. Degeneracy and the (mis)interpretation of sensitivity analysis in linear programming. Decision Sciences, 13:348–354, 1982. [73] J.R. Evans and N.R. Baker. Reply to ‘On ranging cost coefficients in dual degenerate linear programming problems’. Decision Sciences, 14:442–443, 1983. [74] S.-Ch. Fang and S. Puthenpura. Linear optimization and extensions: theory and algorithms. Prentice Hall, Englewood Cliffs, New Jersey 07632, 1993. [75] J. Farkas. Theorie der einfachen Ungleichungen. J. Reine und Angewandte Mathematik, 124:1–27, 1902. Bibliography 465 [76] A.V. Fiacco. Introduction to Sensitivity and Stability Analysis in Nonlinear Programming, volume 165 of Mathematics in Science and Engineering. Academic Press, New York, 1983. [77] A.V. Fiacco and G.P. McCormick. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. John Wiley & Sons, New York, 1968. 
Reprint: Volume 4 of SIAM Classics in Applied Mathematics, SIAM Publications, Philadelphia, PA 19104– 2688, USA, 1990. [78] R. Fourer and S. Mehrotra. Solving symmetric indefinite systems in an interior-point method for linear programming. Mathematical Programming, 62:15–40, 1993. [79] C. Fraley. Linear updates for a single-phase projective method. Operations Research Letters, 9:169–174, 1990. [80] C. Fraley and J.-Ph. Vial. Numerical study of projective methods for linear programming. In S. Dolecki, editor, Optimization: Proceedings of the 5th French-German Conference in Castel-Novel, Varetz, France, October 1988, volume 1405 of Lecture Notes in Mathematics, pp. 25–38. Springer Verlag, Berlin, West-Germany, 1989. [81] C. Fraley and J.-Ph. Vial. Alternative approaches to feasibility in projective methods for linear programming. ORSA J. on Computing, 4:285–299, 1992. [82] P. Franklin. A Treatise on Advanced Calculus. John Wiley & Sons, New York (fifth edition), 1955. [83] R.M. Freund. An analogous of Karmarkar’s algorithm for inequality constrained linear programs, with a ’new’ class of projective transformations for centering a polytope. Operations Research Letters, 7:9–13, 1988. [84] R.M. Freund. Theoretical efficiency of a shifted barrier function algorithm for linear programming. Linear Algebra and Its Applications, 152:19–41, 1991. [85] R.M. Freund. Projective transformation for interior-point algorithms, and a superlinearly convergent algorithm for the w-center problem. Mathematical Programming, 58:385–414, 1993. [86] K.R. Frisch. Principles of linear programming—the double gradient form of the logarithmic potential method. Memorandum, Institute of Economics, University of Oslo, Oslo, Norway, October 1954. [87] K.R. Frisch. La resolution des problemes de programme lineaire par la methode du potential logarithmique. Cahiers du Seminaire D’Econometrie, 4:7–20, 1956. [88] K.R. Frisch. The logarithmic potential method for solving linear programming problems. Memorandum, University Institute of Economics, Oslo, 1955. [89] T. Gal. Postoptimal analyses, parametric programming and related topics. Mac-Graw Hill Inc., New York/Berlin, 1979. [90] T. Gal. Shadow prices and sensitivity analysis in linear programming under degeneracy, state-of-the-art-survey. OR Spektrum, 8:59–71, 1986. [91] D. Gale. The Theory of Linear Economic Models. McGraw–Hill, New York, USA, 1960. [92] M.R. Garey and D.S. Johnson. Computers and Intractability: a Guide to the Theory of NP-completeness. Freeman, San Francisco, 1979. [93] J. Gauvin. Quelques precisions sur les prix marginaux en programmation lineaire. INFOR, 18:68–73, 1980. (In French). [94] A. George and J.W.-H. Liu. Computing Solution of Large Sparse Positive Definite Systems. Prentice-Hall, Englewood Cliffs, NJ, 1981. 466 Bibliography [95] G. de Ghellinck and J.-Ph. Vial. A polynomial Newton method for linear programming. Algorithmica, 1(4):425–453, 1986. [96] G. de Ghellinck and J.-Ph. Vial. An extension of Karmarkar’s algorithm for solving a system of linear homogeneous equations on the simplex. Mathematical Programming, 39:79–92, 1987. [97] P.E. Gill, W. Murray, M.A. Saunders, J.A. Tomlin, and M.H. Wright. On projected Newton barrier methods for linear programming and an equivalence to Karmarkar’s projective method. Mathematical Programming, 36:183–209, 1986. [98] J.-L. Goffin, J. Gondzio, R. Sarkissian, and J.-Ph. Vial. Solving nonlinear multicommodity flow problems by the analytic center cutting plane method. Mathematical Programming, 76:131–154, 1997. 
[99] J.-L. Goffin, A. Haurie, and J.-Ph. Vial. Decomposition and nondifferentiable optimization with the projective algorithm. Management Science, 38(2):284–302, 1992. [100] J.-L. Goffin, Z.-Q. Luo, and Y. Ye. Complexity analysis of an interior cutting plane for convex feasibility problems. SIAM J. on Optimization, 6(3), 1996. [101] J.-L. Goffin and F. Sharifi-Mokhtarian-Mokhtarian. Primal-dual-infeasible Newton approach for the analytic center deep-cutting plane method. J. Optim. Theory Appl., 101(1):35–58, 1999. [102] J.-L. Goffin and J.-Ph. Vial. On the computation of weighted analytic centers and dual ellipsoids with the projective algorithm. Mathematical Programming, 60:81–92, 1993. [103] J.-L. Goffin and J.-Ph. Vial. Short steps with Karmarkar’s projective algorithm for linear programming. SIAM J. on Optimization, 4:193–207, 1994. [104] J.-L. Goffin and J.-Ph. Vial. Shallow, deep and very deep cuts in the analytic center cutting plane method. Math. Program., 84(1, Ser. A):89–103, 1999. [105] D. Goldfarb and S. Mehrotra. Relaxed variants of Karmarkar’s algorithm for linear programs with unknown optimal objective value. Mathematical Programming, 40:183– 195, 1988. [106] D. Goldfarb and S. Mehrotra. A relaxed version of Karmarkar’s method. Mathematical Programming, 40:289–315, 1988. [107] D. Goldfarb and S. Mehrotra. A self-correcting version of Karmarkar’s algorithm. SIAM J. on Numerical Analysis, 26:1006–1015, 1989. [108] D. Goldfarb and D.X. Shaw. On the complexity of a class of projective interior point methods. Mathematics of Operations Research, 20:116–134, 1995. [109] D. Goldfarb and M.J. Todd. Linear Programming. In G.L. Nemhauser, A.H.G. Rinnooy Kan, and M.J. Todd, editors, Optimization, volume 1 of Handbooks in Operations Research and Management Science, pp. 141–170. North Holland, Amsterdam, The Netherlands, 1989. [110] D. Goldfarb and D. Xiao. A primal projective interior point method for linear programming. Mathematical Programming, 51:17–43, 1991. [111] A.J. Goldman and A.W. Tucker. Theory of linear programming. In H.W. Kuhn and A.W. Tucker, editors, Linear Inequalities and Related Systems, Annals of Mathematical Studies, No. 38, pp. 53–97. Princeton University Press, Princeton, New Jersey, 1956. [112] G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore (second edition), 1989. [113] J. Gondzio. Presolve analysis of linear programs prior to applying the interior point method. INFORMS J. on Computing, 9:73–91,1997. Bibliography 467 [114] J. Gondzio. Multiple centrality corrections in a primal-dual method for linear programming. Computational Optimization and Applications, 6:137–156, 1996. [115] J. Gondzio, O. du Merle, R. Sarkissian, and J.-Ph. Vial. ACCPM - a library for convex optimization based on an analytic center cutting plane method. European J. of Operational Research, 94:206–211, 1996. [116] J. Gondzio and T. Terlaky. A computational view of interior point methods for linear programming. In J.E. Beasley, editor, Advances in Linear and Integer Programming, pp. 103–185. Oxford University Press, Oxford, Great Britain, 1996. [117] C.C. Gonzaga. A simple representation of Karmarkar’s algorithm. Technical Report, Dept. of Systems Engineering and Computer Science, COPPE Federal University of Rio de Janeiro, 21941 Rio de Janeiro, RJ, Brazil, May 1988. [118] C.C. Gonzaga. An algorithm for solving linear programming problems in O(n3 L) operations. In N. 
Megiddo, editor, Progress in Mathematical Programming: Interior Point and Related Methods, pp. 1–28. Springer Verlag, New York, 1989. [119] C.C. Gonzaga. Conical projection algorithms for linear programming. Mathematical Programming, 43:151–173, 1989. [120] C.C. Gonzaga. Convergence of the large step primal affine-scaling algorithm for primal nondegenerate linear programs. Technical Report ES–230/90, Dept. of Systems Engineering and Computer Science, COPPE Federal University of Rio de Janeiro, 21941 Rio de Janeiro, RJ, Brazil, September 1990. [121] C.C. Gonzaga. Large step path-following methods for linear programming, Part I: Barrier function method. SIAM J. on Optimization, 1:268–279, 1991. [122] C.C. Gonzaga. Large step path-following methods for linear programming, Part II: Potential reduction method. SIAM J. on Optimization, 1:280–292, 1991. [123] C.C. Gonzaga. Search directions for interior linear programming methods. Algorithmica, 6:153–181, 1991. [124] C.C. Gonzaga. Path-following methods for linear programming. SIAM Review, 34(2): 167–227, 1992. [125] C.C. Gonzaga. The largest step path following algorithm for monotone linear complementarity problems. Mathematical Programming, 76(2):309–332, 1997. [126] C.C. Gonzaga and R.A. Tapia. On the convergence of the Mizuno–Todd–Ye algorithm to the analytic center of the solution set. SIAM J. on Optimization, 7: 47–65, 1997. [127] C.C. Gonzaga and R.A. Tapia. On the quadratic convergence of the simplified Mizuno– Todd–Ye algorithm for linear programming. SIAM J. on Optimization, 7:66–85, 1997. [128] H.J. Greenberg. An analysis of degeneracy. Naval Research Logistics Quarterly, 33:635– 655, 1986. [129] H.J. Greenberg. The use of the optimal partition in a linear programming solution for postoptimal analysis. Operations Research Letters, 15:179–186, 1994. [130] O. Güler, 1994. Private communication. [131] O. Güler. Limiting behavior of the weighted central paths in linear programming. Mathematical Programming, 65(2):347–363, 1994. [132] O. Güler, D. den Hertog, C. Roos, T. Terlaky, and T. Tsuchiya. Degeneracy in interior point methods for linear programming: A survey. Annals of Operations Research, 46:107–138, 1993. 468 Bibliography [133] O. Güler, C. Roos, T. Terlaky, and J.-Ph. Vial. Interior point approach to the theory of linear programming. Cahiers de Recherche 1992.3, Faculte des Sciences Economique et Sociales, Universite de Geneve, Geneve, Switzerland, 1992. [134] O. Güler, C. Roos, T. Terlaky, and J.-Ph. Vial. A survey of the implications of the behavior of the central path for the duality theory of linear programming. Management Science, 41:1922–1934, 1995. [135] O. Güler and Y. Ye. Convergence behavior of interior-point algorithms. Mathematical Programming, 60(2):215–228, 1993. [136] W. W. Hager. Updating the inverse of a matrix. SIAM Review, 31(2):221–239, June 1989. [137] M. Halická. Analytical properties of the central path at the boundary point in linear programming. Mathematical Programming, 84:335-355, 1999. [138] L.A. Hall and R.J. Vanderbei. Two-third is sharp for affine scaling. Operations Research Letters, 13:197–201, 1993. [139] G.H. Hardy, J.E. Littlewood, and G. Pólya. Inequalities. Cambridge University Press, Cambridge, Cambridge, 1934. [140] D. den Hertog. Interior Point Approach to Linear, Quadratic and Convex Programming, volume 277 of Mathematics and its Applications. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1994. [141] D. den Hertog, J.A. Kaliski, C. Roos, and T. Terlaky. 
A logarithmic barrier cutting plane method for convex programming. Annals of Operations Reasearch, 58:69–98, 1995. [142] D. den Hertog and C. Roos. A survey of search directions in interior point methods for linear programming. Mathematical Programming, 52:481–509, 1991. [143] D. den Hertog, C. Roos, and T. Terlaky. A potential reduction variant of Renegar’s short-step path-following method for linear programming. Linear Algebra and Its Applications, 152:43–68, 1991. [144] D. den Hertog, C. Roos, and T. Terlaky. On the monotonicity of the dual objective along barrier paths. COAL Bulletin, 20:2–8, 1992. [145] D. den Hertog, C. Roos, and T. Terlaky. Adding and deleting constraints in the logarithmic barrier method for LP. In D.-Z. Du and J. Sun, editors, Advances in Optimization and Approximation, pp. 166–185. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1994. [146] D. den Hertog, C. Roos, and J.-Ph. Vial. A complexity reduction for the long-step path-following algorithm for linear programming. SIAM J. on Optimization, 2:71–87, 1992. [147] R.A. Horn and C.R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, UK, 1985. [148] P. Huard. Resolution of mathmetical programming with nonlinear constraints by the method of centers. In J. Abadie, editor, Nonlinear Programming, pp. 207–219. North Holland, Amsterdam, The Netherlands, 1967. [149] P. Huard. A method of centers by upper-bounding functions with applications. In J.B. Rosen, O.L. Mangasarian, and K. Ritter, editors, Nonlinear Programming: Proceedings of a Symposium held at the University of Wisconsin, Madison, Wisconsin, USA, May 1970, pp. 1–30. Academic Press, New York, USA, 1970. Bibliography 469 √ [150] P. Hung and Y. Ye. An asymptotically O( nL)-iteration path-following linear programming algorithm that uses long steps. SIAM J. on Optimization, 6:570–586, 1996. [151] B. Jansen. Interior Point Techniques in Optimization. Complexity, Sensitivity and Algorithms, volume 6 of Applied Optimization. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1997. [152] B. Jansen, J.J. de Jong, C. Roos, and T. Terlaky. Sensitivity analysis in linear programming: just be careful! European Journal of Operations Research, 101:15–28, 1997. [153] B. Jansen, C. Roos, and T. Terlaky. An interior point approach to postoptimal and parametric analysis in linear programming. Technical Report 92–21, Faculty of Technical Mathematics and Computer Science, TU Delft, NL–2628 CD Delft, The Netherlands, April 1992. [154] B. Jansen, C. Roos, and T. Terlaky. A family of polynomial affine scaling algorithms for positive semi-definite linear complementarity problems. SIAM J. on Optimization, 7(1):126–140, 1997. [155] B. Jansen, C. Roos, and T. Terlaky. The theory of linear programming : Skew symmetric self-dual problems and the central path. Optimization, 29:225–233, 1994. [156] B. Jansen, C. Roos, and T. Terlaky. A polynomial Dikin-type primal-dual algorithm for linear programming. Mathematics of Operations Research, 21:341–353, 1996. [157] B. Jansen, C. Roos, T. Terlaky, and J.-Ph. Vial. Primal-dual algorithms for linear programming based on the logarithmic barrier method. J. of Optimization Theory and Applications, 83:1–26, 1994. [158] B. Jansen, C. Roos, T. Terlaky, and J.-Ph. Vial. Long-step primal-dual target-following algorithms for linear programming. Mathematical Methods of Operations Research, 44:11–30, 1996. [159] B. Jansen, C. Roos, T. Terlaky, and J.-Ph. Vial. Primal-dual target-following algorithms for linear programming. 
Annals of Operations Research, 62:197–231, 1996. [160] B. Jansen, C. Roos, T. Terlaky, and Y. Ye. Improved complexity using higher-order correctors for primal-dual Dikin affine scaling. Mathematical Programming, 76:117–130, 1997. [161] F. Jarre. Interior-point methods for classes of convex programs. In T. Terlaky, editor, Interior Point Methods of Mathematical Programming, pp. 255–296. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1996. [162] F. Jarre, M. Kocvara, and J. Zowe. Optimal truss design by interior-point methods. SIAM J. on Optimization, 8(4):1084–1107, 1998. [163] F. Jarre and M.A. Saunders. An adaptive primal-dual method for linear programming. COAL Newsletter, 19:7–16, August 1991. [164] J. Kaliski, D. Haglin, C. Roos, and T. Terlaky. Logarithmic barrier decomposition methods for semi-infinite programming. International Transactions in Operations Research 4:285–303, 1997. [165] N.K. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4:373–395, 1984. [166] R.M. Karp. Reducibility among combinatorial problems. In R.E. Miller and J.W. Thatcher, editors, Complexity of computer computations, pp. 85–103. Plenum Press, New York, 1972. 470 Bibliography [167] L.G. Khachiyan. A polynomial algorithm in linear programming. Doklady Akademiia Nauk SSSR, 244:1093–1096, 1979. Translated into English in Soviet Mathematics Doklady 20, 191–194. [168] K.C. Kiwiel. Complexity of some cutting plane methods that use analytic centers. Mathematical Programming, 74(1), 1996. [169] E. Klafszky, J. Mayer, and T. Terlaky. Linearly constrained estimation by mathmetical programming. European J. of Operational Research, 34:254–267, 1989. [170] E. Klafszky and T. Terlaky. On the ellipsoid method. Szigma, 20(2–3):196–208, 1988. In Hungarian. [171] E. Klafszky and T. Terlaky. The role of pivoting in proving some fundamental theorems of linear algebra. Linear Algebra and its Applications, 151:97–118, 1991. [172] E. de Klerk, C. Roos, and T. Terlaky. A nonconvex weighted potential function for polynomial target following methods. Annals of Operations Reasearch, 81:3–14, 1998. [173] G. Knolmayer. The effects of degeneracy on cost-coefficient ranges and an algorithm to resolve interpretation problems. Decision Sciences, 15:14–21, 1984. [174] M. Kojima, N. Megiddo, and S. Mizuno. A primal-dual infeasible-interior-point algorithm for linear programming. Mathematical Programming, 61:263–280, 1993. [175] M. Kojima, N. Megiddo, T. Noma, and A. Yoshise. A unified approach to interior point algorithms for linear complementarity problems, volume 538 of Lecture Notes in Computer Science. Springer Verlag, Berlin, Germany, 1991. [176] M. Kojima, S. Mizuno, and T. Noma. Limiting behavior of trajectories by a continuation method for monotone complementarity problems. Mathematics of Operations Research, 15(4):662–675, 1990. [177] M. Kojima, S. Mizuno, and A. Yoshise. A polynomial-time algorithm for a class of linear complementarity problems. Mathematical Programming, 44:1–26, 1989. [178] M. Kojima, S. Mizuno, and A. Yoshise. A primal-dual interior point algorithm for linear programming. In N. Megiddo, editor, Progress in Mathematical Programming: Interior Point and Related Methods, pp. 29–47. Springer Verlag, New York, 1989. [179] E. Kranich. Interior point methods for mathmetical programming: A bibliography. Discussion Paper 171, Institute of Economy and Operations Research, FernUniversität Hagen, P.O. Box 940, D–5800 Hagen 1, West–Germany, May 1991. 
Available through NETLIB, see Kranich [180].
[180] E. Kranich. Interior-point methods bibliography. SIAG/OPT Views-and-News, A Forum for the SIAM Activity Group on Optimization, 1:11, 1992.
[181] P. Lancaster and M. Tismenetsky. The Theory of Matrices with Applications. Academic Press, Orlando (second edition), 1985.
[182] P.D. Ling. A new proof of convergence for the new primal-dual affine scaling interior-point algorithm of Jansen, Roos and Terlaky. Working paper, University of East Anglia, Norwich, England, 1993.
[183] P.D. Ling. A predictor-corrector algorithm. Working paper, University of East Anglia, Norwich, England, 1993.
[184] C.L. Liu. Introduction to Combinatorial Mathematics. McGraw-Hill Book Company, New York, 1968.
[185] F.A. Lootsma. Numerical Methods for Nonlinear Optimization. Academic Press, London, UK, 1972.
[186] Z.-Q. Luo. Analysis of a cutting plane method that uses weighted analytic center and multiple cuts. SIAM J. on Optimization, 7(3):697–716, 1997.
[187] Z.-Q. Luo, C. Roos, and T. Terlaky. Complexity analysis of a logarithmic barrier decomposition method for semi-infinite linear programming. Applied Numerical Mathematics, 29:379–394, 1999.
[188] Z.-Q. Luo and Y. Ye. A genuine quadratically convergent polynomial interior point algorithm for linear programming. In D.-Z. Du and J. Sun, editors, Advances in Optimization and Approximation, pp. 235–246. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1994.
[189] I.J. Lustig. An analysis of an available set of linear programming test problems. Computers and Operations Research, 16:173–184, 1989.
[190] I.J. Lustig. Phase 1 search directions for a primal-dual interior point method for linear programming. Contemporary Mathematics, 114:121–130, 1990.
[191] I.J. Lustig, R.E. Marsten, and D.F. Shanno. Computational experience with a primal-dual interior point method for linear programming. Linear Algebra and Its Applications, 152:191–222, 1991.
[192] I.J. Lustig, R.E. Marsten, and D.F. Shanno. On implementing Mehrotra’s predictor-corrector interior point method for linear programming. SIAM J. on Optimization, 2:435–449, 1992.
[193] I.J. Lustig, R.E. Marsten, and D.F. Shanno. Interior point methods for linear programming: Computational state of the art. ORSA J. on Computing, 6(1):1–14, 1994.
[194] H.M. Markowitz. The elimination form of the inverse and its application to linear programming. Management Science, 3:255–269, 1957.
[195] I. Maros and Cs. Mészáros. The role of the augmented system in interior point methods. European J. of Operational Research, 107(3):720–736, 1998.
[196] R.E. Marsten, D.F. Shanno, and E.M. Simantiraki. Interior point methods for linear and nonlinear programming. In I.A. Duff and A. Watson, editors, The State of the Art in Numerical Analysis (York, 1996), volume 63 of Inst. Math. Appl. Conf. Ser. New Ser., pp. 339–362. Oxford Univ. Press, New York, 1997.
[197] L. McLinden. The analogue of Moreau’s proximation theorem, with applications to the nonlinear complementarity problem. Pacific J. of Mathematics, 88:101–161, 1980.
[198] L. McLinden. The complementarity problem for maximal monotone multifunctions. In R.W. Cottle, F. Giannessi, and J.L. Lions, editors, Variational Inequalities and Complementarity Problems, pp. 251–270. John Wiley and Sons, New York, 1980.
[199] K.A. McShane, C.L. Monma, and D.F. Shanno. An implementation of a primal-dual interior point method for linear programming. ORSA J. on Computing, 1:70–83, 1989.
[200] N. Megiddo. Pathways to the optimal set in linear programming. In N. Megiddo, editor, Progress in Mathematical Programming: Interior Point and Related Methods, pp. 131–158. Springer Verlag, New York, 1989. Identical version in Proceedings of the 6th Mathematical Programming Symposium of Japan, Nagoya, Japan, pp. 1–35, 1986.
[201] N. Megiddo. On finding primal- and dual-optimal bases. ORSA J. on Computing, 3:63–65, 1991.
[202] S. Mehrotra. Higher order methods and their performance. Technical Report 90–16R1, Dept. of Industrial Engineering and Management Science, Northwestern University, Evanston, IL 60208, USA, 1990. Revised July 1991.
[203] S. Mehrotra. Finding a vertex solution using an interior point method. Linear Algebra and Its Applications, 152:233–253, 1991.
[204] S. Mehrotra. Deferred rank-one updates in O(n³L) interior point algorithm. J. of the Operations Research Society of Japan, 35:345–352, 1992.
[205] S. Mehrotra. On the implementation of a (primal-dual) interior point method. SIAM J. on Optimization, 2(4):575–601, 1992.
[206] S. Mehrotra. Quadratic convergence in a primal-dual method. Mathematics of Operations Research, 18:741–751, 1993.
[207] S. Mehrotra and R.D.C. Monteiro. Parametric and range analysis for interior point methods. Mathematical Programming, 74:65–82, 1996.
[208] S. Mehrotra and Y. Ye. On finding the optimal facet of linear programs. Mathematical Programming, 62:497–515, 1993.
[209] O. du Merle. Mise en œuvre et développements de la méthode de plans coupants basés sur les centres analytiques. PhD thesis, Faculté des Sciences Economiques et Sociales, Université de Genève, 1995. In French.
[210] H.D. Mills. Marginal values of matrix games and linear programs. In H.W. Kuhn and A.W. Tucker, editors, Linear Inequalities and Related Systems, Annals of Mathematical Studies, No. 38, pp. 183–193. Princeton University Press, Princeton, New Jersey, 1956.
[211] J.E. Mitchell. Fixing variables and generating classical cutting planes when using an interior point branch and cut method to solve integer programming problems. European J. of Operational Research, 97:139–148, 1997.
[212] S. Mizuno. An O(n³L) algorithm using a sequence for linear complementarity problems. J. of the Operations Research Society of Japan, 33:66–75, 1990.
[213] S. Mizuno. A rank-one updating interior algorithm for linear programming. Arabian J. for Science and Engineering, 15(4):671–677, 1990.
[214] S. Mizuno. A new polynomial time method for a linear complementarity problem. Mathematical Programming, 56:31–43, 1992.
[215] S. Mizuno. A primal-dual interior point method for linear programming. Proceedings of the Institute of Statistical Mathematics, 40(1):27–44, 1992. In Japanese.
[216] S. Mizuno and M.J. Todd. An O(n³L) adaptive path following algorithm for a linear complementarity problem. Mathematical Programming, 52:587–595, 1991.
[217] S. Mizuno, M.J. Todd, and Y. Ye. On adaptive-step primal-dual interior-point algorithms for linear programming. Mathematics of Operations Research, 18:964–981, 1993.
[218] R.D.C. Monteiro and I. Adler. Interior path-following primal-dual algorithms: Part I: Linear programming. Mathematical Programming, 44:27–41, 1989.
[219] R.D.C. Monteiro and I. Adler. Interior path-following primal-dual algorithms: Part II: Convex quadratic programming. Mathematical Programming, 44:43–66, 1989.
[220] R.D.C. Monteiro, I. Adler, and M.G.C. Resende. A polynomial-time primal-dual affine scaling algorithm for linear and convex quadratic programming and its power series extension. Mathematics of Operations Research, 15:191–214, 1990.
[221] R.D.C. Monteiro and S. Mehrotra. A general parametric analysis approach and its implications to sensitivity analysis in interior point methods. Mathematical Programming, 72:65–82, 1996.
[222] R.D.C. Monteiro and T. Tsuchiya. Limiting behavior of the derivatives of certain trajectories associated with a monotone horizontal linear complementarity problem. Mathematics of Operations Research, 21(4):793–814, 1996.
[223] M. Muramatsu and T. Tsuchiya. Convergence analysis of the projective scaling algorithm based on a long-step homogeneous affine scaling algorithm. Mathematical Programming, 72:291–305, 1996.
[224] G.L. Nemhauser and L.A. Wolsey. Integer and Combinatorial Optimization. J. Wiley & Sons, New York, 1988.
[225] Y. Nesterov. Cutting plane algorithms from analytic centers: efficiency estimates. Mathematical Programming, 69(1), 1995.
[226] Y. Nesterov and A.S. Nemirovskii. Interior Point Polynomial Methods in Convex Programming: Theory and Algorithms. SIAM Publications, SIAM, Philadelphia, USA, 1993.
[227] J. von Neumann. On a maximization problem. Manuscript, Institute for Advanced Studies, Princeton University, Princeton, NJ 08544, USA, 1947.
[228] F. Nožička, J. Guddat, H. Hollatz, and B. Bank. Theorie der linearen parametrischen Optimierung. Akademie-Verlag, Berlin, 1974. In German.
[229] M.R. Osborne. Finite Algorithms in Optimization and Data Analysis. John Wiley & Sons, New York, USA, 1985.
[230] M. Padberg. Linear Optimization and Extensions, volume 12 of Algorithms and Combinatorics. Springer Verlag, Berlin, 1995.
[231] C.H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1982.
[232] J. Peng. Private communication.
[233] J. Peng, C. Roos, and T. Terlaky. Self-Regularity: A New Paradigm for Primal-Dual Interior Point Methods. Princeton University Press, 2002.
[234] R. Polyak. Modified barrier functions (theory and methods). Mathematical Programming, 54:177–222, 1992.
[235] F.A. Potra. A quadratically convergent predictor-corrector method for solving linear programs from infeasible starting points. Mathematical Programming, 67(3):383–406, 1994.
[236] M.V. Ramana and P.M. Pardalos. Semidefinite programming. In T. Terlaky, editor, Interior Point Methods of Mathematical Programming, pp. 369–398. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1996.
[237] J. Renegar. A polynomial-time algorithm, based on Newton’s method, for linear programming. Mathematical Programming, 40:59–93, 1988.
[238] R.T. Rockafellar. The elementary vectors of a subspace of IR^N. In R.C. Bose and T.A. Dowling, editors, Combinatorial Mathematics and Its Applications: Proceedings North Carolina Conference, Chapel Hill, 1967, pp. 104–127. The University of North Carolina Press, Chapel Hill, North Carolina, 1969.
[239] C. Roos. New trajectory-following polynomial-time algorithm for linear programming problems. J. of Optimization Theory and Applications, 63:433–458, 1989.
[240] C. Roos. An O(n³L) approximate center method for linear programming. In S. Dolecki, editor, Optimization: Proceedings of the 5th French–German Conference in Castel–Novel, Varetz, France, October 1988, volume 1405 of Lecture Notes in Mathematics, pp. 147–158. Springer Verlag, Berlin, West–Germany, 1989.
[241] C. Roos and D. den Hertog. A polynomial method of approximate weighted centers for linear programming. Technical Report 89–13, Faculty of Mathematics and Computer Science, TU Delft, NL–2628 BL Delft, The Netherlands, 1989.
[242] C. Roos and T. Terlaky. Advances in linear optimization. In M. Dell’Amico, F. Maffioli, and S. Martello, editors, Annotated Bibliographies in Combinatorial Optimization, pp. 95–114. John Wiley & Sons, New York, USA, 1997.
[243] C. Roos and J.-Ph. Vial. Analytic centers in linear programming. Technical Report 88–74, Faculty of Mathematics and Computer Science, TU Delft, NL–2628 BL Delft, The Netherlands, 1988.
[244] C. Roos and J.-Ph. Vial. Long steps with the logarithmic penalty barrier function in linear programming. In J. Gabszewicz, J.F. Richard, and L. Wolsey, editors, Economic Decision–Making: Games, Economics and Optimization, dedicated to J.H. Drèze, pp. 433–441. Elsevier Science Publishers B.V., Amsterdam, The Netherlands, 1989.
[245] C. Roos and J.-Ph. Vial. A polynomial method of approximate centers for linear programming. Mathematical Programming, 54:295–305, 1992.
[246] C. Roos and J.-Ph. Vial. Achievable potential reductions in the method of Kojima et al. in the case of linear programming. Revue RAIRO–Operations Research, 28:123–133, 1994.
[247] D.S. Rubin and H.M. Wagner. Shadow prices: tips and traps for managers and instructors. Interfaces, 20:150–157, 1990.
[248] W. Rudin. Principles of Mathematical Analysis. McGraw-Hill Book Company, New York, 1978.
[249] R. Saigal. Linear Programming: A Modern Integrated Analysis. International Series in Operations Research & Management. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1995.
[250] A. Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, New York, 1986.
[251] D.F. Shanno. Computing Karmarkar projections quickly. Mathematical Programming, 41:61–71, 1988.
[252] D.F. Shanno, M.G. Breitfeld, and E.M. Simantiraki. Implementing barrier methods for nonlinear programming. In T. Terlaky, editor, Interior Point Methods of Mathematical Programming, pp. 369–398. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1996.
[253] R. Sharda. Linear programming software for personal computers: 1995 survey. OR/MS Today, pp. 49–57, October 1995.
[254] D.X. Shaw and D. Goldfarb. A path-following projective interior point method for linear programming. SIAM J. on Optimization, 4:65–85, 1994.
[255] N.Z. Shor. Quadratic optimization problems. Soviet J. of Computer and System Sciences, 25:1–11, 1987.
[256] G. Sierksma. Linear and Integer Programming: Theory and Practice, volume 245 of Monographs and Textbooks in Pure and Applied Mathematics. Marcel Dekker Inc., New York, second edition, 2002. With one IBM-PC floppy disk (“INTPM, a version of Karmarkar’s Interior Point Method”) by J. Gjaltema and G.A. Tijssen (3.5 inch; HD).
[257] Gy. Sonnevend. An “analytic center” for polyhedrons and new classes of global algorithms for linear (smooth, convex) programming. In A. Prékopa, J. Szelezsán, and B. Strazicky, editors, System Modelling and Optimization: Proceedings of the 12th IFIP–Conference held in Budapest, Hungary, September 1985, volume 84 of Lecture Notes in Control and Information Sciences, pp. 866–876. Springer Verlag, Berlin, West–Germany, 1986.
[258] Gy. Sonnevend, J. Stoer, and G. Zhao. On the complexity of following the central path by linear extrapolation in linear programming. Methods of Operations Research, 62:19–31, 1990.
[259] G. Strang. Linear Algebra and its Applications. Harcourt Brace Jovanovich, Orlando, Florida, USA, 1988.
[260] J.F. Sturm and S. Zhang. An O(√n L) iteration bound primal-dual cone affine scaling algorithm. Mathematical Programming, 72:177–194, 1996.
[261] K. Tanabe. Centered Newton methods and Differential Geometry of Optimization. Cooperative Research Report 89, The Institute of Statistical Mathematics, Tokyo, Japan, 1996. (Contains 38 papers related to the subject.)
[262] M.J. Todd. Recent developments and new directions in linear programming. In M. Iri and K. Tanabe, editors, Mathematical Programming: Recent Developments and Applications, pp. 109–157. Kluwer Academic Press, Dordrecht, The Netherlands, 1989.
[263] M.J. Todd. The effects of degeneracy, null and unbounded variables on variants of Karmarkar’s linear programming algorithm. In T.F. Coleman and Y. Li, editors, Large-Scale Numerical Optimization, volume 46 of SIAM Proceedings in Applied Mathematics, pp. 81–91. SIAM, Philadelphia, PA, USA, 1990.
[264] M.J. Todd. A lower bound on the number of iterations of primal-dual interior-point methods for linear programming. In G.A. Watson and D.F. Griffiths, editors, Numerical Analysis 1993, volume 303 of Pitman Research Notes in Mathematics, pp. 237–259. Longman Press, Harlow, 1994. See also [267].
[265] M.J. Todd. Potential-reduction methods in mathematical programming. Mathematical Programming, 76(1):3–45, 1997.
[266] M.J. Todd and B.P. Burrell. An extension of Karmarkar’s algorithm for linear programming using dual variables. Algorithmica, 1(4):409–424, 1986.
[267] M.J. Todd and Y. Ye. A lower bound on the number of iterations of long-step and polynomial interior-point linear programming algorithms. Annals of Operations Research, 62:233–252, 1996.
[268] T. Tsuchiya. Global convergence of the affine scaling methods for degenerate linear programming problems. Mathematical Programming, 52:377–404, 1991.
[269] T. Tsuchiya. Degenerate linear programming problems and the affine scaling method. Systems, Control and Information, 34(4):216–222, April 1990. In Japanese.
[270] T. Tsuchiya. Global convergence property of the affine scaling methods for primal degenerate linear programming problems. Mathematics of Operations Research, 17(3):527–557, 1992.
[271] T. Tsuchiya. Quadratic convergence of Iri–Imai algorithm for degenerate linear programming problems. J. of Optimization Theory and Applications, 87(3):703–726, 1995.
[272] T. Tsuchiya. Affine scaling algorithm. In T. Terlaky, editor, Interior Point Methods of Mathematical Programming, pp. 35–82. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1996.
[273] T. Tsuchiya and M. Muramatsu. Global convergence of the long-step affine scaling algorithm for degenerate linear programming problems. SIAM J. on Optimization, 5(3):525–551, 1995.
[274] A.W. Tucker. Dual systems of homogeneous linear relations. In H.W. Kuhn and A.W. Tucker, editors, Linear Inequalities and Related Systems, Annals of Mathematical Studies, No. 38, pp. 3–18. Princeton University Press, Princeton, New Jersey, 1956.
[275] K. Turner. Computing projections for the Karmarkar algorithm. Linear Algebra and Its Applications, 152:141–154, 1991.
[276] P.M. Vaidya. An algorithm for linear programming which requires O((m + n)n² + (m + n)^1.5 nL) arithmetic operations. Mathematical Programming, 47:175–201, 1990.
[277] R.J. Vanderbei. Symmetric quasi-definite matrices. SIAM J. on Optimization, 5(1):100–113, 1995.
[278] R.J. Vanderbei and T.J. Carpenter. Symmetric indefinite systems for interior point methods. Mathematical Programming, 58:1–32, 1993.
[279] R.J. Vanderbei, M.S. Meketon, and B.A. Freedman. A modification of Karmarkar’s linear programming algorithm. Algorithmica, 1(4):395–407, 1986.
[280] S.A. Vavasis and Y. Ye. Condition numbers for polyhedra with real number data. Operations Research Letters, 17:209–214, 1995.
[281] S.A. Vavasis and Y. Ye. A primal-dual interior point method whose running time depends only on the constraint matrix. Mathematical Programming, 74:79–120, 1996.
[282] J.-Ph. Vial. A fully polynomial time projective method. Operations Research Letters, 7(1), 1988.
[283] J.-Ph. Vial. A unified approach to projective algorithms for linear programming. In S. Dolecki, editor, Optimization: Proceedings of the 5th French–German Conference in Castel–Novel, Varetz, France, October 1988, volume 1405 of Lecture Notes in Mathematics, pp. 191–220. Springer Verlag, Berlin, West–Germany, 1989.
[284] J.-Ph. Vial. A projective algorithm for linear programming with no regularity condition. Operations Research Letters, 12(1), 1992.
[285] J.-Ph. Vial. A generic path-following algorithm with a sliding constraint and its application to linear programming and the computation of analytic centers. Technical Report 1996.8, LOGILAB/Management Studies, University of Geneva, Switzerland, 1996.
[286] J.-Ph. Vial. A path-following version of the Todd-Burrell procedure for linear programming. Mathematical Methods of Operations Research, 46(2):153–167, 1997.
[287] G.R. Walsh. An Introduction to Linear Programming. John Wiley & Sons, New York, USA, 1985.
[288] J.E. Ward and R.E. Wendell. Approaches to sensitivity analysis in linear programming. Annals of Operations Research, 27:3–38, 1990.
[289] D.S. Watkins. Fundamentals of Matrix Computations. John Wiley & Sons, New York, 1991.
[290] M. Wechs. The analyticity of interior-point-paths at strictly complementary solutions of linear programs. Optimization Methods and Software, 9:209–243, 1998.
[291] A.C. Williams. Boundedness relations for linear constraint sets. Linear Algebra and Its Applications, 3:129–141, 1970.
[292] A.C. Williams. Complementarity theorems for linear programming. SIAM Review, 12:135–137, 1970.
[293] H.P. Williams. Model Building in Mathematical Programming. John Wiley & Sons, New York, USA (third edition), 1990.
[294] C. Witzgall, P.T. Boggs, and P.D. Domich. On the convergence behavior of trajectories for linear programming. Contemporary Mathematics, 114:161–187, 1990.
[295] S.J. Wright. An infeasible-interior-point algorithm for linear complementarity problems. Mathematical Programming, 67(1):29–52, 1994.
[296] S.J. Wright and D. Ralph. A superlinear infeasible-interior-point algorithm for monotone nonlinear complementarity problems. Mathematics of Operations Research, 21(4):815–838, 1996.
[297] S.J. Wright. A path-following infeasible-interior-point algorithm for linear complementarity problems. Optimization Methods and Software, 2:79–106, 1993.
[298] S.J. Wright. Primal-Dual Interior-Point Methods. SIAM, Philadelphia, 1996.
[299] F. Wu, S. Wu, and Y. Ye. On quadratic convergence of the O(√n L)-iteration homogeneous and self-dual linear programming algorithm. Annals of Operations Research, 87:393–406, 1999.
[300] S.R. Xu, H.B. Yao, and Y.Q. Chen. An improved Karmarkar algorithm for linear programming and its numerical tests. Mathematica Applicata, 5(1):14–21, 1992. (In Chinese, English summary.)
[301] H. Yamashita. A polynomially and quadratically convergent method for linear programming. Working paper, Mathematical Systems Institute, Inc., Tokyo, Japan, 1986.
[302] M. Yannakakis. Computing the minimum fill-in is NP-complete. SIAM J. on Algebraic and Discrete Methods, pp. 77–79, 1981.
[303] Y. Ye. Interior algorithms for linear, quadratic, and linearly constrained convex programming. PhD thesis, Dept. of Engineering Economic Systems, Stanford University, Stanford, CA 94305, USA, 1987.
[304] Y. Ye. Karmarkar’s algorithm and the ellipsoid method. Operations Research Letters, 6:177–182, 1987.
[305] Y. Ye. A class of projective transformations for linear programming. SIAM J. on Computing, 19:457–466, 1990.
[306] Y. Ye. An O(n³L) potential reduction algorithm for linear programming. Mathematical Programming, 50:239–258, 1991.
[307] Y. Ye. Extensions of the potential reduction algorithm for linear programming. J. of Optimization Theory and Applications, 72(3):487–498, 1992.
[308] Y. Ye. On the finite convergence of interior-point algorithms for linear programming. Mathematical Programming, 57:325–335, 1992.
[309] Y. Ye. On the q-order of convergence of interior-point algorithms for linear programming. In Wu Fang, editor, Proceedings Symposium on Applied Mathematics. Chinese Academy of Sciences, Institute of Applied Mathematics, 1992.
[310] Y. Ye. A potential reduction algorithm allowing column generation. SIAM J. on Optimization, 2:7–20, 1992.
[311] Y. Ye. Toward probabilistic analysis of interior-point algorithms for linear programming. Mathematics of Operations Research, 19:38–52, 1994.
[312] Y. Ye. Complexity analysis of the analytic center cutting plane method that uses multiple cuts. Mathematical Programming, 76(1):211–221, 1997.
[313] Y. Ye, O. Güler, R.A. Tapia, and Y. Zhang. A quadratically convergent O(√n L)-iteration algorithm for linear programming. Mathematical Programming, 59:151–162, 1993.
[314] Y. Ye and P.M. Pardalos. A class of linear complementarity problems solvable in polynomial time. Linear Algebra and Its Applications, 152:3–17, 1991.
[315] Y. Ye and M.J. Todd. Containing and shrinking ellipsoids in the path-following algorithm. Mathematical Programming, 47:1–10, 1990.
[316] Y. Ye, M.J. Todd, and S. Mizuno. An O(√n L)-iteration homogeneous and self-dual linear programming algorithm. Mathematics of Operations Research, 19:53–67, 1994.
[317] Y. Ye, O. Güler, R.A. Tapia, and Y. Zhang. A quadratically convergent O(√n L)-iteration algorithm for linear programming. Mathematical Programming, 59:151–162, 1993.
[318] L. Zhang and Y. Zhang. On polynomiality of the Mehrotra-type predictor-corrector interior-point algorithms. Mathematical Programming, 68:303–318, 1995.
[319] Y. Zhang and R.A. Tapia. Superlinear and quadratic convergence of primal-dual interior-point methods for linear programming revisited. J. of Optimization Theory and Applications, 73(2):229–242, 1992.
[320] G. Zhao. Interior point algorithms for linear complementarity problems based on large neighborhoods of the central path. SIAM J. on Optimization, 8(2):397–413, 1998.
[321] G. Zhao and J. Zhu. Analytical properties of the central trajectory in interior point methods. In D.-Z. Du and J. Sun, editors, Advances in Optimization and Approximation, pp. 362–375. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1994.
[322] M. Zoltán and T. Szabolcs. Iterációfüggetlen lépéshossz és lépésbecslés a Dikin-algoritmus alkalmazásában a lineáris programozási feladatra. Alkalmazott Matematikai Lapok, 26:365–379, 2009. In Hungarian.
Author Index Abadie, J., 468 Adler, I., 44, 165, 196, 233, 309, 317, 330, 387, 406, 412, 461, 472 Aho, A.V., 48, 461 Akgül, M., 387, 461 Altman, A., 278, 461 Andersen, E.D., xxiii, 62, 406, 409, 412, 418, 419, 426, 428, 429, 461 Andersen, K.D., 406, 411, 461 Anstreicher, K.M., 277, 289, 317, 461–463 Arioli, M., 408, 462 Asić, M.D., 44, 462 Atkinson, D.S., 252, 278, 462 Bahn, O., 278, 289, 462 Baker, N.R., 387, 464 Balinski, M.L., 17, 462 Bank, B., 391, 473 Barnes, E.R., 181, 451, 462, 463 Bazaraa, M.S., 4, 431, 463 Beasley, R., 466 Bellman, R., 8, 463 Ben-Israel, A., 8, 463 Bertsekas, D.P., 431, 463 Birkhoff, G., 8, 463 Bixby, R.E., xix, 406, 426, 463 Björk, Å., 408, 463 Boggs, P.T., 44, 309, 476 Bonnans, J.F., xxi, 463 Bosch, R.A., 317, 462, 463 Bose, R.C., 473 Boyd, S.E., xx, xxi, 463 Brearley, A.L., 406, 463 Breitfeld, M.G., xx, xxi, 463, 474 Brinkhuis, J., xxiii, 330, 463 Buck, R.C., 431, 463 Bunch, J.R., 408, 463 Burrell, B.P., 278, 289, 298, 475 Carpenter, T.J., 409, 475 Chang, S.F., 406, 463 Chen, Y.Q., 289, 477 Chopra, S., 181, 463 Chvátal, V., 15, 392, 463 Coleman, T.F., 475 Cook, S.A., 48, 463 Cottle, R.W., 471 Crouzeix, J.P., 225, 464 Csiszár, I., 93, 464 Czyzyk, J., 430 Dantzig, G.B., xix, 15, 16, 392, 464 Dell’Amico, M., 473 Dikin, I.I., xx, xxi, 4, 70, 220, 254–256, 301, 330, 337–341, 343, 345–347, 451–456, 458, 464 Ding, J., 250, 464 Dolecki, S., 465, 473, 476 Domich, P.D., 44, 309, 476 Dowling, T.A., 473 Drèze, J.H., 473 Draisma, G., 330, 463 Du, D.-Z., 468, 470 Duff, I.S., 408, 409, 462, 464, 471 El-Bakry, A.S., 429, 464 Evans, J.R., 387, 464 Fang, S.-Ch., 4, 392, 464 Farkas, J., 15, 89, 190, 464 Fiacco, A.V., 16, 44, 87, 95, 108, 277, 282, 309, 431, 464 Fourer, R., 409, 465 Fraley, C., 289, 298, 465 Franklin, P., 431, 465 Freedman, B.A., 451, 475 Freund, R.M., 252, 258, 289, 418, 465 Frisch, K.R., xx, 87, 90, 465 480 Gabszevwicz, J., 473 Gal, T., 361, 365, 387, 465 Gale, D., 15, 465 Garey, M.R., 48, 465 Gauvin, J., 387, 465 George, A., 410, 465 Ghellinck, G. de, 278, 289, 298, 465 Giannessi, F., 471 Gill, P.E., 277, 465 Goffin, J.L., 252, 278, 289, 462, 466 Goldfarb, D., 277, 278, 289, 466, 474 Goldman, A.J., 2, 17, 36, 89, 466 Golub, G.H., 8, 466 Gondzio, J., 278, 406, 409, 418, 419, 430, 461, 466 Gonzaga, C.C., 137, 165, 181, 193, 233, 257, 277, 289, 317, 365, 439, 467 Gould, N.I.M., 464 Greenberg, H.J., xxiii, 365, 387, 467 Greville, T.N.E., 8, 463 Griffiths, D.F., 475 Guddat, J., 391, 473 Güler, O., 16, 44, 103, 190, 309, 315, 365, 467, 477 Haglin, D., 469 Halická, M., 44, 309, 468 Hall, L.A., 365, 468 Hardy, G.H., 230, 468 Haurie, A., 278, 466 Hertog, D. den, xxi, 95, 134, 165, 196, 233, 250, 271, 277, 278, 282, 284, 285, 289, 317, 467, 468, 473 Holder, A., xxiii Hollatz, H., 391, 473 Hopcroft, J.E., 48, 461 Horn, R.A., 8, 11, 468 Huard, P., xx, 4, 277, 468 Hung, P., 330, 468 Illés, T., xxiii Iri, M., 475 Jansen, B., 70, 156, 196, 233, 247, 252, 254, 258, 263, 271, 330, 338, 387, 414, 439, 451, 468, 469 Jarre, F., xx, xxi, xxiii, 169, 469 Jensen, D.J., 181, 463 Johnson, C.R., 8, 11 Johnson, D.S., 48, 465, 468 Jong, J.J. de, 387, 468 Author Index Kaliski, J.A., 278, 468, 469 Karmarkar, N.K., xix, xx, xxiii, 1, 4, 5, 7, 87, 277, 278, 289, 292, 295, 297– 301, 304, 317, 412, 451, 461, 469 Karp, R.M., 48, 469 Khachiyan, L.G., xix, xx, 436, 469 Kiwiel, K.C., 278, 461, 469 Klafszky, E., 93, 423, 436, 469 Klerk, E. 
de , xxiii, 245, 469 Knolmayer, G., 387, 470 Kocvara, M., xx, 469 Kojima, M., 44, 165, 220, 233, 308, 317, 420, 470 Koopmans, Tj.C., xix Kovačević-Vujčić, V.V., 44, 462 Kranich, E., xx, 470 Kuhn, H.W., 466, 472, 475 Lancester, P., 8, 470 Lenstra, J.K., 464 Li, T.Y., 250, 464, 475 Ling, P.D., 70, 121, 439, 470 Lions, J.L., 471 Littlewood, J.E., 230, 468 Liu, C.L., 336, 470 Liu, J.W.-H., 410, 465 Lootsma, F.A., 87, 470 Luo, Z.-Q., 181, 278, 466, 470 Lustig, I.J., 181, 213, 406, 418, 470, 471 Mészáros, Cs., 409, 430, 461, 471 Mac Lane, S., 8, 463 Maffioli, F., 473 Mangasarian, O.L., 468 Markowitz, H.M., 410, 426, 471 Maros, I., 409, 471 Marsten, R.E., 165, 181, 406, 418, 471 Martello, S., 473 Mayer, J., xxiii, 93, 469 McCormick, S.T., 16, 44, 87, 95, 108, 277, 282, 406, 463, 464 McLinden, L., 44, 100, 471 McShane, K.A., 165, 471 Megiddo, N., 16, 44, 100, 250, 420, 424, 467, 470, 471 Mehrotra, S., 62, 165, 181, 289, 317, 330, 380, 387, 409, 412, 413, 424, 430, 465, 466, 471, 472 Meketon, M.S., 451, 475 Merle, O. du, 278, 462, 466, 472 Miller, R.E., 469 Author Index Mills, H.D., 361, 372, 472 Minkowski, H., 15 Mitchell, J.E., 422, 472 Mitra, G., 406, 463 Mizuno, S., 44, 165, 181, 196, 213, 220, 233, 250, 252, 271, 317, 414, 439, 470, 472, 477 Monma, C.L., 165, 471 Monteiro, R.D.C., 44, 165, 196, 233, 309, 317, 330, 380, 387, 412, 461, 471, 472 Muramatsu, M., 301, 305, 472, 475 Murray, W., 465 Nemhauser, G.L., 15, 422, 466, 472 Nemirovski, A.S., xx, xxi, 472 Nesterov, Y.E., xx, xxi, 278, 472 Neumann, J. von, 89, 473 Newton, I., 3–6, 29–32, 34, 48–52, 54, 58, 59, 61–63, 68–70, 72, 75, 80, 87, 109–116, 118, 120, 121, 123, 125, 127, 128, 130, 131, 140, 142–144, 149, 150, 152–154, 156–165, 167–172, 175–177, 180, 181, 186, 188, 194, 195, 199–202, 204, 206, 207, 219, 220, 231– 241, 243–245, 247, 249, 257–259, 261–264, 269–273, 277, 278, 281, 284, 285, 298, 301, 319, 320, 322, 325, 329, 330, 332, 333, 340, 341, 347, 401, 403, 404, 412, 413, 415, 416, 418–420 Nožička, F., 391, 473 Noma, T., 44, 470 Osborne, M.R., 15, 473 Pólya, G., 230, 468 Padberg, M., xix, 4, 445, 473 Papadimitriou, C.H., 15, 392, 473 Pardalos, P.M., xxi, xxiii, 54, 473, 477 Parlett, B.N., 408, 463 Peng, J., 430, 436, 473 Polyak, R., 418, 473 Potra, F.A., xxi, 463, 473 Prékopa, A., 474 Puthenpura, S., 4, 392, 464 Radosavljević-Nikolić, M.D., 44, 462 Ralph, D., xxi, 476 Ramana, M.V., xxi, 473 Reid, J.K., 464 481 Renegar, J., 4, 7, 233, 277–281, 283–285, 289, 473 Resende, M.G.C., 330, 412, 461, 472 Richard, J.F., 473 Rijk, P.P.M. 
de, 408, 462 Rinnooy Kan, A.H.G., 464, 466 Ritter, K., 468 Rockafellar, R.T., 423, 473 Roos, C., xx, 95, 121, 125, 128, 204, 225, 233, 245, 250, 254, 271, 277, 278, 282, 289, 317, 338, 339, 387, 414, 451, 461, 464, 467–470, 473, 474 Rosen, J.B., 468 Rubin, D.S., 387, 399, 474 Rudin, W., 431, 474 Saigal, R., 4, 15, 474 Saltzman, M.J., 426, 463 Sarkissian, R., 466 Saunders, M.A., xxiii, 169, 465, 469 Schrijver, A., 15, 48, 190, 392, 422, 445, 464, 474 Scott, J.A., 464 Shanno, D.F., xx, xxi, 165, 181, 318, 406, 418, 463, 471, 474 Sharda, R., 396, 474 Sharifi-Mokhtarian, F., 278, 466 Shaw, D.X., 278, 289, 466, 474 Sherali, H.D., 4, 463 Shetty, C.M., 4, 463 Shor, N.Z., xix, 474 Sierksma, G., 392 Simantiraki, E.M., xxi, 165, 471, 474 Sonnevend, Gy., 44, 181, 278, 474 Steiglitz, K., 15, 392, 473 Stoer, J., 181, 474 Strang, G., 8, 433, 474 Strazicky, B., 474 Sturm, J., xxiii, 258, 474 Sun, J., 165, 468, 470 Szelezsán, J., 474 Tanabe, T., 44, 474, 475 Tapia, R.A., 165, 181, 193, 429, 464, 467, 477 Terlaky, T., xx, 93, 245, 254, 277, 278, 282, 338, 387, 409, 414, 418, 423, 430, 436, 451, 461, 462, 466–470, 473–475 Thatcher, J.W., 469 Tismenetsky, M., 8, 470 482 Todd, M.J., 165, 181, 196, 213, 233, 277, 278, 289, 298, 365, 414, 466, 472, 475, 477 Tomlin, J.A., 465 Trafalis, T., 461 Tsuchiya, T., xxi, 4, 44, 301, 305, 339, 365, 467, 472, 475 Tucker, A.W., 2, 16, 17, 36, 89, 462, 466, 472, 475 Turner, K., 409, 464, 475 Ullman, J.D., 48, 461 Vaidya, P.M., 252, 278, 317, 462, 475 Van Loan, C.F., 8, 466 Vandenberghe, L., xx, xxi, 463 Vanderbei, R.J., 365, 409, 429, 451, 468, 475 Vavasis, S.A., 54, 58, 192, 476 Veiga, G., 412, 461 Vial, J.P., 95, 121, 125, 128, 204, 233, 252, 271, 278, 289, 298, 317, 462, 465–469, 473, 474, 476 Wagner, H.M., 387, 399, 474 Walsh, G.R., 15, 476 Ward, J.E., 387, 476 Warners, J.P., xxiii, 461 Watkins, D.S., 8, 476 Watson, A., 471, 475 Wechs, M., 44, 309, 476 Wendell, R.E., 387, 476 Weyl, H., 15 Williams, A.C., 103, 476 Williams, H.P., 1, 406, 463 Witzgall, C., 44, 309, 476 Wolsey, L.A., 15, 422, 472, 473 Wright, M.H., 465 Wright, S.J., xxi, 430, 476 Wu, F., 213, 476 Wu, S., 213, 476 Xiao, D., 289, 466 Xu, S.R., 289, 477 Xu, X., 461 Yamashita, H., 289, 477 Yannakakis, M., 410, 477 Yao, H.B., 289, 477 Ye, Y., 44, 54, 58, 62, 125, 128, 181, 190, 192, 193, 213, 233, 278, 289, 309, Author Index 317, 330, 414, 426, 428, 429, 461, 466–472, 475–477 Yoshise, A., 44, 165, 317, 470 Zhang, L., 330, 477 Zhang, S., 258, 474 Zhang, Y., 165, 330, 429, 430, 464, 477 Zhao, G., 44, 181, 309, 330, 474, 478 Zhu, J., 44, 309, 430, 478 Zowe, J., xx, 469 Subject Index 1-norm, 9, see Symbol Index, k.k1 2-norm, 9, see Symbol Index, k.k2 p-norm, 9, see Symbol Index, k.kp ∞-norm, 9, see Symbol Index, k.k∞ µ-center, 28, see Symbol Index, x(µ), y(µ), z(µ) and s(µ) adaptive-step methods, see Target-following Methods adaptive-update strategy dual case, 125 primal-dual case, 169 affine-scaling component, 171, see affinescaling direction affine-scaling direction dual, 127 primal-dual, 171, 179 affine-scaling step of size θ, 179 algorithms Conceptual Logarithmic Barrier Algorithm, 108, 107–109 Conceptual Target-following Algorithm, 232 Dikin Step Algorithm for Self-dual Model, 454 Dual Logarithmic Barrier Algorithm, 107–149 with adaptive updates, 123–129 with full Newton steps, 120, 120– 123 with large updates, 131, 130–149 Dual Logarithmic Barrier Algorithm with Modified Full Newton Steps, 323 Full Step Dual Logarithmic Barrier Algorithm with Rank-One Updates, 324, 317–328 Full-Newton Step Algorithm for 
Selfdual Model, 50, 47–70 Generic Dual Target-following Algorithm, 260 Generic Primal Target-following Algorithm, 269 Generic Target-following Algorithm, 233 Higher-Order Dikin Step Algorithm for the Standard Model, 341, 337–346 Higher-Order Logarithmic Barrier Algorithm, 357, 346–359 Karmarkar’s Projective Method, 294, 289–305 Method of Centers, 277–285 Predictor-Corrector Algorithm, 182, 177–194 Primal-Dual Logarithmic Barrier Algorithm, 149–209 with adaptive updates, 168–177 with full Newton steps, 160, 150– 168 with large updates, 195, 194–209 Renegar’s Method of Centers, 277– 285 Target-following Methods, 235–275 all-one vector, see e analytic center, 43 definition, 44 dual feasible region, 128 level set, 46 limit of central path, 45 analyticity of the central path, see central path analyze phase, see implementation aspects arithmetic-geometric-mean inequality, 133 asymptotic behavior, 2 asymptotic behavior of central path, 4, see central path 484 backward dual Newton step, 113 barrier parameter, 132 standard problem, 90 barrier term, 221 basic indices, 392 basic solution, 2, 391, see implementation aspects basis for (P ), 213 basis identification procedure, see implementation aspects basis tableau, see implementation aspects binary encoding, 48, see complexity theory bounded dual feasible region, 103 bounded level set, 100, 103, 222, 445 bounded primal feasible region, 103 bounded problem, 15 BPMPD, 430 break points, see Parametric Analysis Bunch–Parlett factorization, see implementation aspects canonical form see canonical problem, 16 canonical model see canonical problem, 16 canonical problem, 17, 18 approximate solutions, 76, 83 central path, 75 definition, 16, 18 dual problem, 18, 71 duality gap, 19 duality theorem, 39 embedding if interior solutions are known, 72 in general, 78 homogenizing variable, see Symbol Index, κ KKT conditions, 74 normalizing variable, see Symbol Index, ϑ primal problem, 18, 71 strictly complementary solution, 17, 37, 38 strong duality property, 19, 39 strong duality theorem, 39 transformation into, 445 weak duality property, 18 Cauchy–Schwarz inequality, 9, 120, 136, 205, 303, 316, 342, 456 Subject Index centering component, 171, see centering direction centering condition, 91 centering direction dual, 127 primal-dual, 171, 179 centering method, 4, see Target-following Methods centering problem, 250 central path, 1, 16, 27, 28 algorithmic proof, 29 analyticity, 309 asymptotic behavior, 4, 309 canonical model, 73–76, 79–82 derivatives, 226, 307, 309, 315 differentiability, 4, 307 existence, 29–35, 90–99 general, xxi, 1–5, 7 implementation aspects, 403, 412, 418–420, 451, 454, 455 Karmarkar format, 301, 305 self-dual problem, 16, 17, 23, 27, 28, 31, 35, 36, 43–46, 52, 57–60, 70, 307–310, 322 standard model, 87, 95–99, 107, 117, 123, 128, 129, 149, 158, 159, 164, 171, 180, 181, 190, 194, 213– 215, 219–222, 225, 227, 228, 233, 235, 236, 239–241, 245, 249–252, 254–257, 261, 262, 271, 280–283, 330, 331, 338, 341, 347, 358 straight, 97, 128 uniqueness, 28 central-path-following methods, 219 Cholesky factorization, see implementation aspects CLP, 429 column sum norm, 10 Combinatorial Optimization, xix complementary vectors, 35 complete separation, 58 complexity, 2, 5, 70, 234, 284, 298, 318, 401, 415, 419 complexity analysis, 250, 278 complexity bounds, see iteration bounds, xx, xxi, 5, 257, 317, 338, 348, 358, 414 complexity theory, xix binary encoding, 48 polynomial time, 47 size of a problem instance, 47 Subject Index solvable in polynomial time, 48 
Conceptual Logarithmic Barrier Algorithm, 108, 107–109 iteration bound, 108 condition for adaptive updating, 172 condition number, 48, 54 cone neighborhood, 227 cone-affine-scaling, 258 constraint matrix, 18 corrector step, 181, see predictor-corrector method CPLEX, xix, xx, 4, 87, 396–398, 429 cutting plane methods, 278 damped Newton step, 131 damped-step methods, 4 damping parameter, 181 degenerate problem, 365 dense columns and rows, see implementation aspects derivatives of x(µ) and s(µ), see central path differentiability of central path, 4, see central path Dikin direction, 451, 454 Dikin ellipsoid, 339, 452 Dikin step, 454 Dikin Step Algorithm for Self-dual Model, 454 duality gap reduction, 455 feasible step-size, 455 high-order variant, 337 iteration bound for ε-solution, 458 proximity measure, 454 search direction, 453 Dikin-path, 254 Dikin-path-following method, 4, see Targetfollowing Methods dimension optimal sets, 365, see standard problem directional derivatives, see Parametric Analysis Discrete Optimization, xix distance to the central path, see proximity measure domain, 15 dual canonical problem, 18, see canonical problem definition, 18 dual level set, 102 485 Dual Logarithmic Barrier Algorithm, 107– 149 with adaptive updates, 123–129 affine-scaling direction, 127 centering direction, 127 illustration, 129 with full Newton steps, 120, 120–123 convergence analysis, 121–122 illustration, 122–123 iteration bound, 120 Newton step ∆s, 111 proximity measure, 114 quadratic convergence, 114–119 scaled Newton step, 112 with large updates, 131, 130–149 illustrations, 144–149 iteration bound, 143 step-size, 140, 143 Dual Logarithmic Barrier Algorithm with Modified Full Newton Steps, 323 iteration bound, 322 dual methods, 219 dual of general LO problem, 40 dual problem, 15 dual standard problem, see standard problem Dual Target-following Method, see Targetfollowing Methods duality gap, 19 duality in LO, 15 Duality Theorem, 89, 362, 366 dualizing scheme, 43 elimination of free variables, 446 ellipsoid method, xix equality constraints, 15 examples calculation of central path, 97 classical sensitivity analysis, 392 condition number, 54 Dikin Step Algorithm, 458, 459 Dual Logarithmic Barrier Algorithm with adaptive updates, 129 with full Newton steps, 122 with large update, 144 dual Newton process, 116 initialization, 215 Newton step Algorithm, 52 optimal partition, 62, 363 optimal set, 363 optimal-value function, 361, 369 486 Subject Index at a break point, 378 computation, 381, 385 domain, 367 Predictor-Corrector Algorithm, 188 Primal-Dual Logarithmic Barrier Algorithm with adaptive updates, 176 with full Newton steps, 162 with large updates, 209 primal-dual Newton process, 157 quadratic convergence Newton process, 116 quadratic convergence primal-dual Newton process, 157 reduction to canonical format, 449, 450 rounding procedure, 63 self-dual embedding, 23, 26, 27, 30, 32, 46, 55, 449, 450 sensitivity analysis, 389 shadow prices, 376 shortest path problem, 363 Farkas’ lemma, 15, 40, 89 feasible problem, 15 feasible set, 15 feasible solution, 15 feasible step-size Dikin Step Algorithm, 455 finite termination, 15, 16, 62 first-order method, 330 floating point operations, see implementation aspects flops, see floating point operations free variables, 446 Frobenius norm, 10 full index set, 27 Full Step Dual Logarithmic Barrier Algorithm with Rank-One Updates, 324, 317–328 modified proximity measure, 320–323 modified search direction, 319–320 required number of arithmetic operations, 328 
Full-Newton Step Algorithm for Self-dual Model, 50, 47–70 iteration bound for ε-solution, 52 iteration bound for exact solution, 68 iteration bound for optimal partition, 61 polynomiality, 69 proximity measure, 49, 59 rounding procedure, 62–65 search direction, 49 full-step methods, 4, see Target-following Methods Gaussian elimination, see implementation aspects generalized inverse, 65, 264, see pseudoinverse geometric inequality, 230 Goldman–Tucker Theorem, 2, 89, 190, 362 gradient matrix, 308, see Jacobian Hadamard inequality, 11, 436 Hadamard product, 11 Hessian norm, 261 Higher-Order Dikin Step Algorithm for the Standard Model, 341, 337– 346 bound for the error term, 342 convergence analysis, 345–346 duality gap reduction, 342 feasible step-sizes, 342, 343 first-order direction, 340, 338–340 iteration bound, 338, 346 Higher-Order Logarithmic Barrier Algorithm, 357, 346–359 barrier parameter update, 356 bound for the error term, 348 convergence analysis, 357–359 improved iteration bound, 359 iteration bound, 358 proximity after a step, 353, 349–354 step-size, 353 higher-order methods, 5, 329–359 Schiet OpT M , 330 search directions, 330–334 analysis of error term, 335–337 error term, 333 illustration, 334 second-order effect, 329 upper bound for error term, 337 homogeneous, 22 homogenizing variable, 19 HOPDM, 430 implementation aspects, 401–430 analyze phase, 410 augmented system Subject Index definition, 404 solution of, 408 basic solution dual degeneracy, 422 primal degeneracy, 422 basis tableau, 422 Bunch–Parlett factorization, 408 Cholesky factorization, 409 dense columns and rows, 409 floating point operations, 410 Gaussian elimination, 410 Markowitz’s merit function, 410 maximal basis, 425 normal equation advantages and disadvantages, 409 definition, 404 solution of, 409 structure, 404 optimal basis, 421 optimal basis identification, 421–430 ordering minimum degree, 410 minimum local fill-in, 410 pivot transformation, 422 preprocessing, 405–408 detecting redundancy, 406 reduction of the problem size, 407 Schur complement, 410 second-order predictor-corrector method, 411 simplify the Newton system, 418 sparse linear algebra, 408–413 starting point, 413–419 self-dual embedding, 414 step-size, 420 stopping criteria, 420–421 warm start, 418–419 implicit function theorem, 226, 308, 309, 331, 431 inequality constraints, 15 infeasible problem, 15, 38 infinity norm, 9 inner iteration, 132, 195 inner loop, 131, 195 input size of an LO problem, see L interior-point condition, 16, 20 standard problem, 94 interior-point method, 20 interior-point methods, xix, 16 IPC, 20 IPM, 20 487 iteration bounds, 3, 5, 48, 122, 125, 144, 145, 150, 162, 167, 168, 247, 250–252, 254, 257, 258, 277, 284, 294, 318, 322, 330, 338, 345, 347 Conceptual Logarithmic Barrier Algorithm, 108 Dikin Step Algorithm, 70, 458 Dual Logarithmic Barrier Algorithm with full Newton steps, 120, 125 with large updates, 143 Dual Logarithmic Barrier Algorithm with Modified Full Newton Steps, 322 Full-Newton Step Algorithm, 52, 68 Higher-Order Dikin Step Algorithm for the Standard Model, 346 Higher-Order Logarithmic Barrier Algorithm, 358, 359 Karmarkar’s Projective Method, 297 Newton Step Algorithm, 69, 70 Primal-Dual Logarithmic Barrier Algorithm with full Newton steps, 161, 168 with large updates, 208 Renegar’s Method of Centers, 279 Jacobian, 226, 308, 331, 432 Karmarkar format, see Symbol Index, (P K), 297 definition, 289 discussion, 297–301 dual homogeneous version, 305 dual version, 305 homogeneous version, see Symbol Index, 
(P KH), 304–305 Karmarkar’s Projective Method, 294, 289–305 decrease potential function, 296 iteration bound, 297 potential function, 295 search direction, 304, 301–304 step-size, 296 unit simplex in IRn , see Symbol Index, Σn illustration for n = 3, 290 inner-outer sphere bound, 292 inverse of the transformation Td , 293 projective transformation, see Symbol Index, Td 488 properties of Td , 293 radius largest inner sphere, see Symbol Index, r radius smallest outer sphere, see Symbol Index, R Karush–Kuhn–Tucker conditions, 91, see KKT conditions KKT conditions canonical problem, 74 standard problem, 91 uniqueness of solution, 92, 222 large coordinates, 54, 57 large updates, 144 large-step methods, 4, see Target-following Methods large-update algorithm, 208 large-update strategy, 125 left-shadow price, see Sensitivity Analysis level set ellipsoidal approximation, 315 of φw (x, s), 222 of g̃µ (x), 92 of duality gap, 100, 103, 445 of primal objective, 102 LINDO, 396–398 linear constraints, 1, 15 linear function, 1 linear optimization, see LO linear optimization problem, 15 Linear Programming, xix linearity interval, see Parametric Analysis LIPSOL, 430 LO, xix logarithmic barrier function, 87 standard dual problem, 105 standard primal problem, 90 logarithmic barrier method, xx, 3, 219 dual method, 107 Newton step, 111 primal method, 271 Newton step, 271 primal-dual method, 149, 150 Newton step, 150 see also Target-following Methods, 219 long-step methods, 4 LOQO, 429 lower bound for σSP , 56 Markowitz’s merit function, see implementation aspects Subject Index Mathematical Programming, xix matrix norm, 10 maximal basis, see implementation aspects maximal step, see adaptive-step methods McIPM, 430 medium updates, 144 medium-step methods, see Target-following Methods medium-update algorithm, 209 Method of Centers, 277–285 minimum degree, see implementation aspects minimum local fill-in, see implementation aspects µ-center (P ) and (D), 95 multipliers, 16 multistep-step methods, see Target-following Methods Newton direction, 29–31, 49 self-dual problem, 29 definition, 29 feasibility, 32 quadratic convergence, 31, 32 Newton step to µ-center dual case, 110 primal-dual case, 161 to target w dual case, 261 primal case, 271 primal-dual case, 236 nonbasic indices, 392 nonnegative variables, 446 nonpositive variables, 446 normal equation, see implementation aspects normalizing constraint, 297 normalizing variable, 24 objective function, 15 objective vector, 18 optimal basic solution, 362 optimal basis, 362, 392, see implementation aspects optimal basis identification, see implementation aspects optimal basis partition, see Sensitivity Analysis Subject Index optimal partition, 2, 27, 36, see standard problem standard problem, 190 optimal set, 15 optimal-value function, see Parametric Analysis optimizing, 15 orthogonality property, 24 OSL, xx, 4, 87, 396–398 outer iteration, 132, 195 outer iteration bound, 108 outer loop, 131, 195 Parametric Analysis, 361–386 optimal-value function, see Symbol Index, zA (b, c), f (β) and c(γ) algorithm for f (β), 380 algorithm for g(γ), 384 break points, 369 directional derivatives, 372 domain, 367 examples, 361, 367, 369, 376, 378, 381, 385 extreme points of linearity interval, 377, 378 linearity interval, 369 one-sided derivatives, 372, 373, 375 piecewise linearity, 368 perturbation vectors, see Symbol Index, ∆b and ∆c perturbed problems, see Symbol Index, (Pβ ) and (Dγ ) dual problem of (Dγ ), see Symbol Index, (Pγ ) dual problem of (Pβ ), see Symbol Index, (Dβ ) 
feasible region (Dγ ), see Symbol Index, Dγ feasible region (Pβ ), see Symbol Index, Pβ partial updating, 5, 317–328 Dual Logarithmic Barrier Algorithm with Modified Full Newton Steps, 323 Full Step Dual Logarithmic Barrier Algorithm with Rank-One Updates, 324 rank-one modification, 318 rank-one update, 318 Sherman-Morrison formula, 318 path-following method, 4 489 central path, 248 Dikin-path, 254 primal or dual, see logarithmic barrier method and center method weighted path, 249 PC-PROG, 396–398 PCx, 430 perturbed problems, see Parametric Analysis pivot transformation, see implementation aspects polynomial time, see complexity theory, 48, see complexity theory polynomially solvable problems, xix positive definite matrix, 8 positive semi-definite matrix, 8 postoptimal analysis, see Sensitivity Analysis potential reduction methods, 4 predictor step, 181, see predictor-corrector method Predictor-Corrector Algorithm, 182, 177– 194 adaptive version, 186–194 convergence analysis, 185–194 illustration, 188 iteration bound, 181 second-order version, see implementation aspects predictor-corrector method, 150, see PredictorCorrector Algorithm preprocessing, see implementation aspects primal affine-scaling, 339 primal affine-scaling method, 339, 451 primal canonical problem, 18, see canonical problem definition, 18 primal level set, 102 primal logarithmic barrier method, 304 primal methods, 219 primal standard problem, see standard problem, see standard problem Primal Target-following Method, see Targetfollowing Methods primal-dual affine-scaling, 169 primal-dual algorithms, 150 primal-dual centering, 169 Primal-Dual Logarithmic Barrier Algorithm, 149–209 duality gap after Newton step, 153 example Newton process, 159 490 feasibility of Newton step, 152, 154 initialization, 213–216 local quadratic convergence, 156, 159 Newton step, 150, 150–154 proximity measure, 156 with adaptive updates, 168–177 affine-scaling direction, 171, 179 centering direction, 171, 179 cheap adaptive update, 176 condition for adaptive updating, 172, 173 illustration, 176–177 with full Newton steps, 160, 150–168 classical analysis, 165–168 convergence analysis, 161–162 illustration, 162–164 iteration bound, 161 with large updates, 195, 194–209 illustrations, 209 iteration bound, 208 step-size, 201 primal-dual logarithmic barrier function, 132 primal-dual method, 219 primal-dual pair, 99 Primal-Dual Target-following Method, see Target-following Methods Projective Method, 277, see Karmarkar’s Projective Method proximity measures, 31, 59 δc (w), 222, 227 δc (x), 454 δc (z), 59 δ d (y, w), 261 δ p (x, w), 271, 272 δ(w∗ , w), 266 δ(z, µ), 49 δ(x, s; µ), 156, 237 δ(xs, w), 237 δ(s, µ), 114 σ(x, s; µ), 165 pseudo-inverse, 194, 313, 433–434 quadratic convergence dual case, 114 primal-dual case, 156 ranges, see Sensitivity and/or Parametric Analysis rank-one modification, see partial updating Subject Index rank-one update, see partial updating reliable sensitivity modules, 399 removal of equality constraints, 448 Renegar’s method, see Renegar’s Method of Centers Renegar’s Method of Centers, 279 adaptive and large-update variants, 284–285 analysis, 281–284 as target-following method, 279–280 barrier function, see Symbol Index, φR (y, z) description, 278 iteration bound, 279 lower bound update, 278 right-hand side vector, 18 right-shadow price, see Sensitivity Analysis rounding procedure, 3, 54 row sum norm, 10 scaled Newton step, 114 scaling matrix, 151, 317 scheme for dualizing, 43 Schiet OpT M , see higher-order methods Schur 
complement, see implementation aspects search direction, 451 second-order effect higher-order methods, 329 self-dual embedding, 22 self-dual model, see self-dual problem self-dual problem, 13, 16, 24 central path convergence, 43, 45 derivatives, 309–315 condition number, see Symbol Index, σSP definition, 22, 71, 72, 451 ellipsoidal approximations of level sets, 315–316 limit central path, 36 objective value, 24, 25, 48, 50, 61, 66, 454, 455 optimal partition, 36 polynomial algorithm, 50, 47–70, 454 proximity measure, 31 strictly complementary solution, 35– 37 strong duality theorem, 38 Subject Index Semidefinite Optimization, xix Sensitivity Analysis, 387–399 classical approach, 391–399 computationally cheap, 393 optimal basis partition, 392 pitfalls, 399 ranges depend on optimal basis, 392 results of 5 commercial packages, 394–398 definition, 387 example, 389 left- and right-shadow prices of bi , 387, 388 left- and right-shadow prices of cj , 388 left-shadow price, 387 range of bi , 387, 388 range of cj , 387, 388 range of a coefficient, 387 right-shadow price, 387 shadow price of a coefficient, 387 shadow prices, see Sensitivity and/or Parametric Analysis Sherman-Morrison formula, 318, see partial updating shifted barrier method, 258 short-step methods, 4, see Target-following Methods Simplex Method, xix, xx, 1–3, 6, 7, 15, 16, 87, 365, 391, 392, 406 singular value decomposition, 434 size of a problem instance, see complexity theory skew-symmetric matrix, 18, 20–22, 24, 28, 29, 47, 214, 299, 307, 310, 416 slack vector, 22, 47 small coordinates, 54, 57 solvable in polynomial time, see complexity theory solvable problem, 38 sparse linear algebra, see implementation aspects spectral matrix norm, 10 standard dual problem logarithmic barrier function, 105 standard format, 87, see standard problem, 448 standard primal problem logarithmic barrier function, 90 standard problem 491 barrier parameter, 90 barrier term, 90 central path definition, 95 duality gap, 107 examples, 96–99 monotonicity, 95 classical duality results complementarity, 89 strong duality, 89 weak duality, 88, 89 coordinatewise duality, 103 dual adaptive-update algorithm, 123– 129 illustration, 129 dual algorithms, 107–149 dual barrier function, see Symbol Index, kµ (y, s) decrease after step, 140, 140–142 effect of an update, 140, 138–140 dual full-step algorithm, 120, 120– 123 dual large-update algorithm, 131, 130–149 dual problem, 88, 103, 107 duality gap close to central path, 119 on central path, 89, 99 estimates of dual objective values, 138, 135–138 interior-point condition, 94 equivalent conditions, 100 KKT conditions, 91 optimal partition, see Symbol Index, π = (B, N ) optimal sets, 100, see Symbol Index, P ∗ and D∗ determined by dual optimal solution, 363 determined by optimal partition, 363 dimensions, 365 example, 363 orthogonality property, 99 predictor-corrector algorithm, 182, 177–194 primal barrier function, 90, see Symbol Index, g̃µ (x) primal problem, 87, 103 primal-dual adaptive-update algorithm, 168–177 492 primal-dual algorithms, 149–209 primal-dual barrier function, see Symbol Index, φµ (x, s) decrease after step, 201, 199–204 effect of an update, 205 primal-dual full-step algorithm, 160, 150–168 primal-dual large-update algorithm, 195, 194–209 strictly complementary solution, 89 symmetric formulation, 103–105 starting point, see implementation aspects step of size α damped Newton step, 140, 154, 199, 202, 232, 240, 241, 258, 403 decrease barrier function, 140, 199, 201, 202, 241, 296, 347 Dikin step, 455 
feasibility, 152, 154, 236, 239, 262, 272, 342, 343, 455 higher-order Dikin step, 341, 349 step-size, see implementation aspects stopping criteria, see implementation aspects strict complementarity standard format, 89 strictly complementary solution, 2 strictly complementary vectors, 35 strictly feasible, 4 strong duality property, 19 strong duality theorem, 39 support of a vector, 36 target map, see Symbol Index, ΦP D , see Target-following Methods target pair, see Target-following Methods target sequence, 4, see Target-following Methods target vector, see Target-following Methods Target-following Method, 4 Target-following Methods, 235–275 adaptive and large target-update, 257–258 adaptive-step methods, 232 dual method, 260, 259–268 barrier function, 259 effect of target update, 266 feasibility of Newton step, 262 linear convergence for damped step, 264 Subject Index local quadratic convergence, 263 Newton step, 261 proximity measure, 261 examples, 247–285 centering method, 250–252 central-path-following, 248–249 Dikin-path-following method, 254– 257 method of centers, 277–285 Renegar’s method of centers, 277– 285 weighted-centering method, 252– 253 weighted-path-following, 249–250 full-step methods, 232 large-step methods, 232 medium-step methods, 232 multistep-step methods, 232 primal method, 269, 269–275 barrier function, 270 effect of target update, 275 feasibility of Newton step, 272 linear convergence for damped step, 273 local quadratic convergence, 273 Newton step, 271 proximity measure, 271, 272 primal-dual method, 233, 235–245 barrier function, 221 duality gap after Newton step, 237 feasibility of Newton step, 236, 239 linear convergence for damped steps, 241 local quadratic convergence, 240 Newton step, 235, 236 proximity measure, 237, 266 proximity measure, 222 short-step methods, 232 target map, 220 target pair, 235 target sequence, 220 properties, 226–231 target vector, 235 traceable target sequence, 231 theorems of the alternatives, 40 traceable target sequence, see Targetfollowing Methods types of constraint equality, 446 inequality greater-than-or-equal-to, 446 Subject Index less-than-or-equal-to, 446 types of variable free, 446 nonnegative, 446 nonpositive, 446 unbounded problem, 15, 38 unit ball in IRn , 10 unsolvable problem, 38 vanishing duality gap, 19, 37 variance vector, 31, 49, 59 warm start, see implementation aspects weak duality, 18 weak duality property, 18 weighted dual logarithmic barrier function, 259, see Symbol Index, φdw (y) weighted path, 249 weighted primal barrier function, 270 weighted primal logarithmic barrier function, see Symbol Index, φpw (x) weighted primal-dual logarithmic barrier function, 221, see Symbol Index, φw (x, s) weighted-analytic center, 4, 220, 229 definition, 229 limit of target sequence, 229 weighted-centering problem, 252 weighted-path-following method, 4, see Target-following Methods weighting coefficients, 221 w-space, 220 XMP, 396–398 XPRESS-MP, 429 493 Symbol Index (D′ ), 82 (D) canonical form, 18, 71 standard form, 88, 103, 107, 219, 298, 361 (DK), 305 (DKH), 305 (D′ ), 104 (D̄), 214 (Dβ ), 366 (Dγ ), 366 (EP ), 449 (P ′′ ), 299 (P ′ ), 82, 298, 299, 448 (P ) canonical form, 18, 71, 449 standard form, 87, 103, 213, 219, 298, 361 (P K), 289 (P KH), 304 (P KS), 293 (P ′ ), 104 (P̄ ), 214 (Pβ ), 366 (P c ), 213, 214 (Pγ ), 366 (Pµ ), 91 (SP ), 22, 47, 72, 88, 307, 416, 451 (SP0 ), 71 (SP1 ), 73 (SP2 ), 78 (SP0 ), 22 (SSP ), 88 (SP c ), 214 (SSP c ), 214 A canonical form, 18 Karmarkar form, 289 standard form, 87, 298, 361 k.k1 
, 9 k.k2 , 9 k.kp , 9 k.k∞ , 9 B, 24, 190 b canonical form, 18 standard form, 87, 298, 361 b(β), 366 B ∗ , 65 c canonical form, 18 Karmarkar form, 289 standard form, 87, 298, 361 c(γ), 366 d, 170, 238 ds , 170, 238 das , 171 dcs , 171 dx , 170, 238 dax , 171 dcx , 171 D, 88 Dβ , 366 Dγ , 366 D+ , 88 D∗ , 89, 190, 362 dimension, 365 from optimal solution of (P ), 363 ∆b, 366 ∆c, 366 ∆s, 49, 150, 452 ∆a s, 171 496 ∆c s, 171 ∆x, 150, 451 ∆z, 49 ∆a x, 171 ∆c x, 171 ∆y, 150 δc (w), 222, 227 δc (x), 454 δc (z), 59 δ d (y, w), 261 δ d (y, w), 261 δ p (x, w), 271, 272 ∆s, 29 δ(w∗ , w), 266 δ(z, µ), 49 δ(x, s; µ), 156, 237 δ(xs, w), 237 ∆z, 29 δ(s, µ), 114 δ(x, µ), 305 e, 9 E (µ, r), 315 f (β), 366 g(γ), 366 g̃µ (x), 90 scaled, see gµ (x) gµ (x), 132 H, 111 h̃µ (s), 105, 110 scaled, see hµ (s) hµ (s), 132 κ, 19 kµ (y, s), 105, 110 see also h̃µ (s), 105 L, 48, 70 L, 104 L⊥ , 104 M , 21 M̄ , 20, 23, 71 MBB , 55 MBN , 55 MIJ , 55 MK , 315 MNB , 55 MNN , 55 Symbol Index N , 24, 190 N (A), 91 n̄, 21 O, 11 Ω, 11 ω, 65 P, 88 Pβ , 366 Pγ , 366 P + , 88 P ∗ , 89, 190, 362 dimension, 365 from optimal solution of (D), 363 φdµ (s), 132 φpµ (x), 132 φdµ (s) properties, 133, 134 φdµ (s) properties, 132 φpµ (x) properties, 132 φµ (x, s), 132 properties, 132–134 ΦP D , 220 existence, 222, 221–226 φpµ (x) properties, 133 φR (y, z), 278 φdw (y), 260 φpw (x), 270 φw (x, s), 221 πB , 65 π = (B, N ), 362 PQ , 111 ψ graph, 93 graphs of ψ(δ) and ψ(−δ), 135 properties, 93, 133, 137, 197, 198 Ψ properties, 134 q, 21 qB , 55 qN , 55 R, 290, see Σn r, 21, 291, see Σn ρ(δ), 182 s(µ), 95 Symbol Index sα , 158, 455 sB , 55 sB (z̃), 53 sB (z), 53 σSP , 54 lower bound, 56 σ(x, s; µ), 165 σd , 192 σp , 192 Σn , 290 illustration for n = 3, 290 s σSP , 54 σ(z), 36 z σSP , 54 sN , 55 sN (z̃), 53 SP, 54 s+ , 455 SP ∗ , 44, 54 s(z̃), 53 s(z), 53 s(z), 22 Td , 292 properties, 293 Θ, 11 ϑ, 21 z̃B , 53 z̃N , 53 u, 170, 238 v, 238 w-space, 220 x(µ), 95 xα , 158, 455 x+ , 455 y(µ), 95 z, 21 z(µ), 28 zA (b, c), 361 zB , 55 z̄, 20, 23, 71 zI , 53 zN , 53, 55 497