Optimization:
Principles and Algorithms
Michel Bierlaire

Companion website: optimizationprinciplesalgorithms.com

EPFL Press
Optimization:
Principles and Algorithms

Michel Bierlaire

Every engineer and decision scientist must have a good mastery of optimization, an essential
element in their toolkit. Thus, this articulate introductory textbook will certainly be welcomed
by students and practicing professionals alike. Drawing from his vast teaching experience,
the author skillfully leads the reader through a rich choice of topics in a coherent, fluid and
tasteful blend of models and methods anchored on the underlying mathematical notions
(only prerequisites: first-year calculus and linear algebra). Topics range from the classics to some of the most recent developments in smooth unconstrained and constrained optimization, like descent methods, conjugate gradients, Newton and quasi-Newton methods, linear programming and the simplex method, and trust region and interior point methods.
Furthermore, elements of discrete and combinatorial optimization, like network optimization, integer programming, and heuristic local search methods, are also presented.
This book presents optimization as a modeling tool that, beyond supporting problem formulation and the design and implementation of efficient algorithms, is also a language suited for interdisciplinary human interaction. Readers further become aware that, while the roots of mathematical optimization go back to the work of giants like Newton, Lagrange, Cauchy, Euler, and Gauss, it did not become a discipline of its own until World War Two, and that its present momentum really resulted from its symbiosis with modern computers, which made it possible to routinely solve problems with millions of variables and constraints.
With his witty, entertaining, yet precise style, Michel Bierlaire captivates his readers and awakens their desire to try out the presented material in a creative mode. One of the outstanding assets of this book is the unified, clear, and concise rendering of the various algorithms, which makes them easily readable and translatable into any high-level programming language. This is an addictive book that I am very pleased to recommend.

Prof. Thomas M. Liebling

MICHEL BIERLAIRE holds a PhD in mathematics from the University of Namur, Belgium. He is a full professor at the School of Architecture, Civil and Environmental Engineering at the Ecole Polytechnique Fédérale de Lausanne, Switzerland. He has been teaching optimization and operations research at EPFL since 1998.



Cover Illustrations: Abstract cubes background, © ProMotion and Architectural detail of modern
roof structure, © Lucian Milasan – Fotolia.com

Michel Bierlaire

Optimization:
Principles and Algorithms

EPFL Press
This book is published under the editorial direction of
Professor Robert Dalang (EPFL).

The publisher and author express their thanks to the Ecole Polytechnique Fédérale de Lausanne (EPFL) for its generous support towards the publication of this book.

The EPFL Press is the English-language imprint of the Foundation of the Presses polytechniques et
universitaires romandes (PPUR). The PPUR publishes mainly works of teaching and research of the Ecole
polytechnique fédérale de Lausanne (EPFL), of universities and other institutions of higher education.

Presses polytechniques et universitaires romandes, EPFL – Rolex Learning Center,


Post office box 119, CH-1015 Lausanne, Switzerland
E-mail: ppur@epfl.ch
Phone: 021 / 693 21 30
Fax: 021 / 693 40 27

www.epflpress.org

© 2018, Second edition, EPFL Press


ISBN 978-2-88915-279-7

© 2015, First edition, EPFL Press


ISBN 978-2-940222-78-0 (EPFL Press)
ISBN 978-1-4822-0345-5 (CRC Press)
Printed in Italy
All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprint, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publisher.

To Patricia

Preface

Optimization algorithms are important tools for engineers, but difficult to use. In
fact, none of them is universal, and a good understanding of the different methods is
necessary in order to identify the most appropriate one in the context of specific applications. Designed to teach undergraduate engineering students about optimization,
this book also provides professionals employing optimization methods with significant
elements to identify the methods that are appropriate for their applications, and to
understand the possible failures of certain methods on their problem. The content
is meant to be formal, in the sense that the results presented are proven in detail,
and all described algorithms have been implemented and tested by the author. In addition, the many numerical and graphical illustrations constitute a significant basis for understanding the methods.
The book features eight parts. The first part focuses on the formulation and the
analysis of the optimization problem. It describes the modeling process that leads to
an optimization problem, as well as the transformations of the problem into an equivalent formulation. The properties of the problem and corresponding hypotheses are
discussed independently of the algorithms. Subsequently, the optimality conditions,
the theoretical foundations that are essential for properly mastering the algorithms,
are analyzed in detail in Part II. Before explaining the methods for unconstrained
continuous optimization in Part IV, algorithms for solving systems of non linear
equations, based on Newton’s method, are described in Part III. The algorithms for
constrained continuous optimization constitute the fifth part. Part VI addresses optimization problems based on network structures, elaborating more specifically on
the shortest path problem and the maximum flow problem. Discrete optimization
problems, where the variables are constrained to take integer values, are introduced
in Part VII, where both exact methods and heuristics are presented. The last part is
an appendix containing the definitions and theoretical results used in the book.
Several chapters include exercises. Chapters related to algorithms also propose
projects involving an implementation. It is advisable to use a mathematical programming language, such as Octave (Eaton, 1997) or Matlab (Moler, 2004). If a language
such as C, C++, or Fortran is preferred, a library managing the linear algebra, such
as LAPACK (Anderson et al., 1999), can be useful. When time limits do not allow
a full implementation of the algorithms by the students, the teaching assistant may
prepare the general structure of the program, including the implementation of optimization problems (objective function, constraints, and derivatives) in order for the students to focus on the key points of the algorithms. The examples described in detail in this book enable the implementations to be verified.
Optimization is an active research field that is permanently stimulated by the needs of modern applications. Many aspects are absent from this book. “Every choice entails the rejection of what might have been better,” said André Gide. Among the important topics not covered in this book, we can mention


• the numerical aspects related to the implementation, particularly important and
tricky in this area (see, e.g., Dennis and Schnabel, 1996);
• the convergence analysis of the algorithms (see, e.g., Ortega and Rheinboldt, 1970,
Dennis and Schnabel, 1996, Conn et al., 2000, and many others);
• automatic differentiation, allowing the automatic generation of analytical derivatives of a function (Griewank, 1989, Griewank, 2000);
• techniques to deal with problems of large size, such as updates with limited memory (Byrd et al., 1994) or partially separable functions (Griewank and Toint, 1982);
• homotopy methods (Forster, 1995);
• semidefinite optimization (Gärtner and Matousek, 2012);
• the vast field of convex optimization (Ben-Tal and Nemirovski, 2001, Boyd and
Vandenberghe, 2004, Calafiore and El Ghaoui, 2014);
• stochastic programming (Birge and Louveaux, 1997, Shapiro et al., 2014);
• robust optimization (Ben-Tal et al., 2009), where uncertainty of the data is explicitly accounted for.
This book is the fruit of fifteen years of teaching optimization to undergraduate students in engineering at the Ecole Polytechnique Fédérale de Lausanne. Except for the parts of the book related to networks and discrete optimization, which are new, the material in the book has been translated from Bierlaire (2006), a textbook in French.
The main sources of inspiration are the following books:
• Bertsimas and Weismantel (2005)
• de Werra et al. (2003)
• Bonnans et al. (2003)
• Conn et al. (2000)
• Nocedal and Wright (1999)
• Bertsekas (1999)
• Wolsey (1998)
• Bertsekas (1998)
• Bertsimas and Tsitsiklis (1997)
• Wright (1997)
• Dennis and Schnabel (1996)
• Ahuja et al. (1993).

There are many books on optimization. Within the vast literature, we may cite the following books in English: Beck (2014), Calafiore and El Ghaoui (2014), Ben-Tal et al. (2009), Boyd and Vandenberghe (2004), Ben-Tal and Nemirovski (2001), Conn et al. (2000), Kelley (1999), Kelley (1995), Birge and Louveaux (1997), Dennis and Schnabel (1996), Axelsson (1994), Polyak (1987), Scales (1985), Coleman (1984), McCormick (1983), Gill et al. (1981), Fletcher (1980), Fletcher (1981), Ortega and Rheinboldt (1970), and Fiacco and McCormick (1968). Several books are also available in French. Among them, we can cite Korte et al. (2010), Dodge (2006), Cherruault (1999), Breton and Haurie (1999), Hiriart-Urruty (1998), Bonnans et al. (1997), and Gauvin (1992).
The bibliographic source for the biographies of Jacobi, Hesse, Lagrange, Fermat,
Newton, Al Khwarizmi, Cauchy, and Lipschitz is Gillispie (1990). The information
about Tucker (Gass, 2004), Dantzig (Gass, 2003), Little (Larson, 2004), Fulkerson
(Bland and Orlin, 2005), and Gomory (Johnson, 2005) comes from the series IFORS’
Operational Research Hall of Fame. The source of information for Euler is his
biography by Finkel (1897). Finally, the information on Davidon was taken from his
web page

www.haverford.edu/math/wdavidon.html

and from Nocedal and Wright (1999). The selection of persons described in this work
is purely arbitrary. Many other mathematicians have contributed significantly to the
field of optimization and would deserve a place herein. I encourage the reader to read,
in particular, the articles of the series IFORS’ Operational Research Hall of Fame
published in International Transactions in Operational Research, expressly
dealing with Morse, Bellman, Kantorovich, Erlang, and Kuhn.

Online material
The book has a companion website:

www.optimizationprinciplesalgorithms.com

The algorithms presented in the book are coded in GNU Octave, a high-level interpreted language (www.gnu.org/software/octave), primarily intended for numerical computations. The code for the algorithms, as well as examples of optimization problems, is provided. All the examples have been run on GNU Octave, version 3.8.1, on a MacBook Pro running OS X Yosemite 10.10.2. If you use these codes, Michel Bierlaire, the author, grants you a nonexclusive license to run, display, reproduce, distribute, and prepare derivative works of this code. The code has not been thoroughly tested under all conditions. The author, therefore, does not guarantee or imply its reliability, serviceability, or function. The author provides no program services for the code.

Acknowledgments
I would like to thank EPFL Press for their support in the translation of the French
version and the publication of this book. The financial support of the School of
Architecture, Civil and Environmental Engineering at EPFL is highly appreciated.

I am grateful to my PhD advisor, Philippe Toint, who passed on his passion for
optimization to me. Among the many things he taught me, the use of geometric
interpretations of certain concepts, especially algorithmic ones, proved particularly
useful for my research and my teaching. I hope that they will now benefit the readers
of this book.
I also thank Thomas Liebling who put his trust in me by welcoming me into his
group at EPFL back in 1998. Among other things, he asked me to take care of the
optimization and operations research courses, which year after year have enabled me
to build the material that is gathered in this book. But I am especially grateful for
his friendship, and all the things that he has imparted to me.
I would like to thank my doctoral students, teaching assistants, and postdocs, not
only for the pleasure of working with them, but also for their valuable help in the
optimization course over the years, and their comments on various versions of this
book.
Earlier versions of the manuscript were carefully read by the members of the Transport and Mobility Laboratory at EPFL, in particular Stefan Binder, Anna Fernandez Antolin, Flurin Hänseler, Yousef Maknoon, Iliya Markov, Marija Nikolic, Tomas Robenek, Riccardo Scarinci, and Shadi Sharif. Thomas Liebling provided a great deal of comments, with excellent suggestions to improve the style and the content of the book. He caught several errors and imprecisions, which greatly improved the quality of the text. If there are still mistakes (and there are always some that escape scrutiny), I clearly take full responsibility. Errata will be published on the website as mistakes are caught.
I am so proud of my children, Aria and François, who have recently started the
difficult challenge of obtaining a university degree. Finally, I dedicate this book to
Patricia. I am really lucky to know such a wonderful person. Her love is a tremendous
source of joy and energy for me.

Contents

I Formulation and analysis of the problem 1


1 Formulation 5
1.1 Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.1 Projectile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.2 Swisscom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.3 Château Laupt-Himum . . . . . . . . . . . . . . . . . . . . . . 9
1.1.4 Euclid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1.5 Agent 007 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1.6 Indiana Jones . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.1.7 Geppetto . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.2 Problem transformations . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.1 Simple transformations . . . . . . . . . . . . . . . . . . . . . . 16
1.2.2 Slack variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3 Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.4 Problem definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2 Objective function 29
2.1 Convexity and concavity . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2 Differentiability: the first order . . . . . . . . . . . . . . . . . . . . . . 31
2.3 Differentiability: the second order . . . . . . . . . . . . . . . . . . . . 39
2.4 Linearity and non linearity . . . . . . . . . . . . . . . . . . . . . . . . 42
2.5 Conditioning and preconditioning . . . . . . . . . . . . . . . . . . . . . 45
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3 Constraints 51
3.1 Active constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2 Linear independence of the constraints . . . . . . . . . . . . . . . . . . 56
3.3 Feasible directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.3.1 Convex constraints . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.3.2 Constraints defined by equations-inequations . . . . . . . . . . 62
3.4 Elimination of constraints . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.5 Linear constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

3.5.1 Polyhedron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.5.2 Basic solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.5.3 Basic directions . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

4 Introduction to duality 93
4.1 Constraint relaxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.2 Duality in linear optimization . . . . . . . . . . . . . . . . . . . . . . . 102
4.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

II Optimality conditions 111

5 Unconstrained optimization 115


5.1 Necessary optimality conditions . . . . . . . . . . . . . . . . . . . . . . 115
5.2 Sufficient optimality conditions . . . . . . . . . . . . . . . . . . . . . . 120
5.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

6 Constrained optimization 127


6.1 Convex constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.2 Lagrange multipliers: necessary conditions . . . . . . . . . . . . . . . . 133
6.2.1 Linear constraints . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.2.2 Equality constraints . . . . . . . . . . . . . . . . . . . . . . . . 137
6.2.3 Equality and inequality constraints . . . . . . . . . . . . . . . . 142
6.3 Lagrange multipliers: sufficient conditions . . . . . . . . . . . . . . . . 152
6.3.1 Equality constraints . . . . . . . . . . . . . . . . . . . . . . . . 153
6.3.2 Inequality constraints . . . . . . . . . . . . . . . . . . . . . . . 154
6.4 Sensitivity analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
6.5 Linear optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
6.6 Quadratic optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 171
6.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

III Solving equations 177

7 Newton’s method 181


7.1 Equation with one unknown . . . . . . . . . . . . . . . . . . . . . . . . 181
7.2 Systems of equations with multiple unknowns . . . . . . . . . . . . . . 192
7.3 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

8 Quasi-Newton methods 201


8.1 Equation with one unknown . . . . . . . . . . . . . . . . . . . . . . . . 201
8.2 Systems of equations with multiple unknowns . . . . . . . . . . . . . . 208
8.3 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216

IV Unconstrained optimization 217


9 Quadratic problems 221
9.1 Direct solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
9.2 Conjugate gradient method . . . . . . . . . . . . . . . . . . . . . . . . 222

9.3 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

10 Newton’s local method 235


10.1 Solving the necessary optimality conditions . . . . . . . . . . . . . . . 235
10.2 Geometric interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . 236
10.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244

11 Descent methods and line search 245


11.1 Preconditioned steepest descent . . . . . . . . . . . . . . . . . . . . . . 246
11.2 Exact line search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
11.2.1 Quadratic interpolation . . . . . . . . . . . . . . . . . . . . . . 252
11.2.2 Golden section . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
11.3 Inexact line search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
11.4 Steepest descent method . . . . . . . . . . . . . . . . . . . . . . . . . . 277
11.5 Newton method with line search . . . . . . . . . . . . . . . . . . . . . 277
11.6 The Rosenbrock problem . . . . . . . . . . . . . . . . . . . . . . . . . 281
11.7 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
11.8 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288

12 Trust region 291


12.1 Solving the trust region subproblem . . . . . . . . . . . . . . . . . . . 294
12.1.1 The dogleg method . . . . . . . . . . . . . . . . . . . . . . . . . 294
12.1.2 Steihaug-Toint method . . . . . . . . . . . . . . . . . . . . . . 298
12.2 Calculation of the radius of the trust region . . . . . . . . . . . . . . . 300
12.3 The Rosenbrock problem . . . . . . . . . . . . . . . . . . . . . . . . . 308
12.4 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309

13 Quasi-Newton methods 311


13.1 BFGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
13.2 Symmetric update of rank 1 (SR1) . . . . . . . . . . . . . . . . . . . . 317
13.3 The Rosenbrock problem . . . . . . . . . . . . . . . . . . . . . . . . . 320
13.4 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
13.5 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

14 Least squares problem 329


14.1 The Gauss-Newton method . . . . . . . . . . . . . . . . . . . . . . . . 334
14.2 Kalman filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
14.3 Orthogonal regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
14.4 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343

15 Direct search methods 347


15.1 Nelder-Mead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
15.2 Torczon’s multi-directional search . . . . . . . . . . . . . . . . . . . . . 354
15.3 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357

V Constrained optimization 359


16 The simplex method 363
16.1 The simplex algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
16.2 The simplex tableau . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
16.3 The initial tableau . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
16.4 The revised simplex algorithm . . . . . . . . . . . . . . . . . . . . . . 394
16.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
16.6 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396

17 Newton’s method for constrained optimization 399


17.1 Projected gradient method . . . . . . . . . . . . . . . . . . . . . . . . 399
17.2 Preconditioned projected gradient . . . . . . . . . . . . . . . . . . . . 405
17.3 Dikin’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
17.4 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412

18 Interior point methods 415


18.1 Barrier methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
18.2 Linear optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
18.3 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443

19 Augmented Lagrangian method 445


19.1 Lagrangian penalty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
19.2 Quadratic penalty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
19.3 Double penalty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
19.4 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460

20 Sequential quadratic programming 463


20.1 Local sequential quadratic programming . . . . . . . . . . . . . . . . . 464
20.2 Globally convergent algorithm . . . . . . . . . . . . . . . . . . . . . . . 471
20.3 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484

VI Networks 487
21 Introduction and definitions 491
21.1 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
21.2 Cuts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
21.3 Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
21.4 Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498

21.5 Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501


21.5.1 Flows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
21.5.2 Capacities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
21.5.3 Supply and demand . . . . . . . . . . . . . . . . . . . . . . . . 504
21.5.4 Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507

21.5.5 Network representation . . . . . . . . . . . . . . . . . . . . . . 508


21.6 Flow decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
21.7 Minimum spanning trees . . . . . . . . . . . . . . . . . . . . . . . . . . 520
21.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524

22 The transhipment problem 529


22.1 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529
22.2 Optimality conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 535
22.3 Total unimodularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
22.4 Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
22.4.1 The shortest path problem . . . . . . . . . . . . . . . . . . . . 539
22.4.2 The maximum flow problem . . . . . . . . . . . . . . . . . . . . 541
22.4.3 The transportation problem . . . . . . . . . . . . . . . . . . . . 544
22.4.4 The assignment problem . . . . . . . . . . . . . . . . . . . . . . 546
22.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549

23 Shortest path 551


23.1 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552
23.2 The shortest path algorithm . . . . . . . . . . . . . . . . . . . . . . . . 558
23.3 Dijkstra’s algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
23.4 The longest path problem . . . . . . . . . . . . . . . . . . . . . . . . . 571
23.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574

24 Maximum flow 577


24.1 The Ford-Fulkerson algorithm . . . . . . . . . . . . . . . . . . . . . . . 577
24.2 The minimum cut problem . . . . . . . . . . . . . . . . . . . . . . . . 583
24.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 588

VII Discrete optimization 591


25 Introduction to discrete optimization 595
25.1 Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595
25.2 Classical problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607
25.2.1 The knapsack problem . . . . . . . . . . . . . . . . . . . . . . . 607
25.2.2 Set covering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
25.2.3 The traveling salesman problem . . . . . . . . . . . . . . . . . 610
25.3 The curse of dimensionality . . . . . . . . . . . . . . . . . . . . . . . . 614
25.4 Relaxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616
25.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 619

26 Exact methods for discrete optimization 625


26.1 Branch and bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626
26.2 Cutting planes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 637
26.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644
26.4 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645

27 Heuristics 647
27.1 Greedy heuristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648
27.1.1 The knapsack problem . . . . . . . . . . . . . . . . . . . . . . . 648
27.1.2 The traveling salesman problem . . . . . . . . . . . . . . . . . 649
27.2 Neighborhood and local search . . . . . . . . . . . . . . . . . . . . . . 656
27.2.1 The knapsack problem . . . . . . . . . . . . . . . . . . . . . . . 662
27.2.2 The traveling salesman problem . . . . . . . . . . . . . . . . . 665
27.3 Variable neighborhood search . . . . . . . . . . . . . . . . . . . . . . . 669
27.3.1 The knapsack problem . . . . . . . . . . . . . . . . . . . . . . . 670
27.3.2 The traveling salesman problem . . . . . . . . . . . . . . . . . 672
27.4 Simulated annealing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 674
27.4.1 The knapsack problem . . . . . . . . . . . . . . . . . . . . . . . 677
27.4.2 The traveling salesman problem . . . . . . . . . . . . . . . . . 679
27.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 682
27.6 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 682

VIII Appendices 685


A Notations 687

B Definitions 689

C Theorems 695

D Projects 699
D.1 General instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
D.2 Performance analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 700

References 703

Part I

Formulation and analysis of the problem

No one trusts a model except the man who wrote it; everyone trusts an observation, except the man who made it.

Harlow Shapley

Modeling is a necessity before any optimization process. How do we translate a specific problem statement into a mathematical formulation that allows its analysis and its resolution? In this first part, we introduce modeling in the field of optimization. Then, we identify the properties of the optimization problem that are useful in the development of the theory and the algorithms.

Chapter 1

Formulation

Contents
1.1 Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.1 Projectile . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.2 Swisscom . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.3 Château Laupt-Himum . . . . . . . . . . . . . . . . . . . 9
1.1.4 Euclid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1.5 Agent 007 . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1.6 Indiana Jones . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.1.7 Geppetto . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.2 Problem transformations . . . . . . . . . . . . . . . . . . . 16
1.2.1 Simple transformations . . . . . . . . . . . . . . . . . . . 16
1.2.2 Slack variables . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3 Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.4 Problem definition . . . . . . . . . . . . . . . . . . . . . . . 21
1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.1 Modeling
The need to optimize is a direct result of the need to organize. Optimizing consists in
identifying an optimal configuration or an optimum in a system in the broadest sense
of the term. We use here Definition 1.1, given by Oxford University Press (2013).

Definition 1.1 (Optimum). (In Latin optimum, the best). The most favorable
or advantageous condition, value, or amount, especially under a particular set of
circumstances.

As part of a scientific approach, this definition requires some details. How can we
judge that the condition is favorable, and how can we formally describe the set of
circumstances?
6 Modeling

The answer to these questions is an essential step in any optimization: mathematical modeling (Definition 1.2 by Oxford University Press, 2013). The modeling process consists of three steps:
1. The identification of the decision variables. They are the components of the system that describe its state, and that the analyst wants to determine. Or, they represent configurations of the system that are possible to modify in order to improve its performance. In general, if these variables are n in number, they are represented by a (column)¹ vector of R^n, often denoted by x = (x1 . . . xn)^T, i.e., the column vector with components x1, . . . , xn.

In practice, this step is probably the most complicated and most important. The
most complicated because only experience in modeling and a good knowledge of
the specific problem can guide the selection. The most important because the rest
of the process depends on it. An inappropriate selection of decision variables can
generate an optimization problem that is too complicated and impossible to solve.
2. The description of the method to assess the state of the system in question, given
a set of decision variables. In this book, we assume that the person performing
the modeling is able to identify a formula, a function, providing a measure of the
state of the system, a value that she wants to make the smallest or largest possible.
This function, called objective function, is denoted by f and the aforementioned
measure obtained for the decision variables x is a real number denoted by f(x).
3. The mathematical description of the circumstances or constraints, specifying the
values that the decision variables can take.

Definition 1.2 (Mathematical model). Mathematical representation of a physical,


economic, human phenomenon, etc., conducted in order to better study it.

The modeling process is both exciting and challenging. Indeed, there is no uni-
versal recipe, and the number of possible models for a given problem is only limited
by the imagination of the modeler. However, it is essential to master optimization
tools and to understand the underlying assumptions in order to develop the adequate
model for the analysis in question. In this chapter, we provide some simple examples
of modeling exercises. In each case, we present a possible modeling.

1.1.1 Projectile
We start with a simple problem. A projectile is launched vertically at a rate of 50 meters per second, in the absence of wind. After how long and at which altitude does it start to fall?

¹ See Appendix A about the mathematical notations used throughout the book.
The modeling process consists of three steps.
Decision variables A single decision variable is used. Denoted by x, it represents
the number of seconds from the launch of the projectile. Note that in this case,

the decision variables represent a state of the system that the analyst wants to
calculate.
Objective function We seek to identify the maximum altitude reached by the ob-
ject. We must thus express the altitude as a function of the decision variable. Since
we are dealing with the uniformly accelerating movement of an object subjected
to gravity, we have
g 9.81 2
f(x) = − x2 + v0 x + x0 = − x + 50 x ,
2 2
where g = 9.81 is the acceleration experienced by the projectile, v0 = 50 is its
initial velocity, and x0 = 0 is its initial altitude.
Constraints Time only goes forward. Therefore, we impose x ≥ 0.
We obtain the optimization problem

max_{x∈R} −(9.81/2) x² + 50 x,   (1.1)

subject to (s.t.)

x ≥ 0.   (1.2)
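The concave quadratic in (1.1) can be maximized in closed form: setting f'(x) = −9.81 x + 50 = 0 gives x* = 50/9.81 ≈ 5.1 seconds, at an altitude of about 127.4 meters. A minimal numerical check in plain Python (the variable names are ours, not part of the model):

```python
# Projectile problem (1.1)-(1.2): maximize f(x) = -(g/2) x^2 + v0 x + x0
g, v0 = 9.81, 50.0

def f(x):
    return -0.5 * g * x ** 2 + v0 * x

# Vertex of the concave parabola: f'(x) = -g x + v0 = 0
x_star = v0 / g        # ~5.097 seconds; feasible, since x_star >= 0
altitude = f(x_star)   # ~127.42 meters

print(x_star, altitude)
```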

1.1.2 Swisscom
The company Swisscom would like to install an antenna to connect four important
new customers to its network. This antenna must be as close as possible to each
client, giving priority to the best customers. However, to avoid the proliferation of
telecommunication antennas, the company is not allowed to install the new antenna
at a distance closer than 10 km from the other two antennas, respectively located at
coordinates (−5, 10) and (5, 0) and represented by the symbol ✪ in Figure 1.1. The
coordinates are expressed in kilometers from Swisscom’s headquarters. Swisscom
knows the geographic situation of each customer as well as the number of hours of
communication that the customer is supposed to consume per month. This data is
listed in Table 1.1. At which location should Swisscom install the new antenna?
The modeling process consists of three steps.
Decision variables Swisscom must identify the ideal location for the antenna, i.e.,
the coordinates of that location. We define two decision variables x1 and x2
representing these coordinates in a given reference system.
Objective function The distance di(x1, x2) between a customer i located at the coordinates (ai, bi) and the antenna is given by

di(x1, x2) = √((x1 − ai)² + (x2 − bi)²).   (1.3)

[Figure 1.1: Swisscom problem. The two existing antennas (✪) and the four customers, with the candidate antenna location (x1, x2).]

Table 1.1: Data for Swisscom customers


Customer Coord. Hours
1 (5, 10) 200
2 (10, 5) 150
3 (0, 12) 200
4 (12, 0) 300

To take into account the communication time, we measure the sum of the distances weighted by the number of consumed hours:

f(x1, x2) = 200 d1(x1, x2) + 150 d2(x1, x2) + 200 d3(x1, x2) + 300 d4(x1, x2)
          = 200 √((x1 − 5)² + (x2 − 10)²)
          + 150 √((x1 − 10)² + (x2 − 5)²)
          + 200 √(x1² + (x2 − 12)²)
          + 300 √((x1 − 12)² + x2²).   (1.4)

Constraints The constraints on the distances between the antennas can be expressed as

√((x1 + 5)² + (x2 − 10)²) ≥ 10   (1.5)

and

√((x1 − 5)² + x2²) ≥ 10.   (1.6)

We can combine the various stages of modeling to obtain the following optimization problem:

min_{x∈R²} f(x1, x2) = 200 √((x1 − 5)² + (x2 − 10)²)
                     + 150 √((x1 − 10)² + (x2 − 5)²)
                     + 200 √(x1² + (x2 − 12)²)
                     + 300 √((x1 − 12)² + x2²)   (1.7)

subject to (s.t.)

√((x1 + 5)² + (x2 − 10)²) ≥ 10
√((x1 − 5)² + x2²) ≥ 10.   (1.8)
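Problem (1.7)–(1.8) has no closed-form solution. As a rough illustration only, a brute-force grid search already locates an approximate feasible minimizer; the grid bounds, the step, and the names below are our own choices, and this is not how such a problem would be solved in practice:

```python
import math

# Data of Table 1.1 and the two existing antennas
CUSTOMERS = [((5, 10), 200), ((10, 5), 150), ((0, 12), 200), ((12, 0), 300)]
ANTENNAS = [(-5, 10), (5, 0)]

def f(x1, x2):
    # weighted sum of distances, objective (1.4)
    return sum(w * math.hypot(x1 - a, x2 - b) for (a, b), w in CUSTOMERS)

def feasible(x1, x2):
    # constraints (1.5)-(1.6): at least 10 km from each existing antenna
    return all(math.hypot(x1 - a, x2 - b) >= 10 for a, b in ANTENNAS)

# Brute-force search over an arbitrary bounding box, step 0.2 km
best = min(
    (f(i / 5, j / 5), i / 5, j / 5)
    for i in range(-50, 126)
    for j in range(-50, 126)
    if feasible(i / 5, j / 5)
)
print(best)  # (value, x1, x2): an approximate minimizer
```

Later chapters develop methods that solve such constrained problems far more efficiently and accurately.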

1.1.3 Château Laupt-Himum
The Château Laupt-Himum produces rosé wine and red wine by buying grapes from local producers. This year they can buy up to one ton of Pinot (a red grape) from a winegrower, paying € 3 per kilo. They can then vinify the grapes in two ways: either as a white wine to obtain a rosé wine or as a red wine to get Pinot Noir, a full-bodied red wine. The vinification of the rosé wine costs € 2 per kilo of grapes, while that of the Pinot Noir costs € 3.50 per kilo of grapes.

In order to take into account economies of scale, the Château wants to adjust the price of its wine to the quantity produced. The price for one liter of the rosé is € 15 minus a rebate of € 2 per hundred liters produced. Thus, if they produce 100 liters of rosé, they sell it for € 13 per liter. If they produce 200, they sell it for € 11 per liter. Similarly, they sell the Pinot Noir at a price of € 23 per liter, minus a rebate of € 1 per hundred liters produced.

How should the Château Laupt-Himum be organized in order to maximize its profit, when a kilo of grapes produces 1 liter of wine?
The modeling process consists of three steps.

Decision variables The strategy of the Château Laupt-Himum is to decide how


many liters of rosé wine and Pinot Noir to produce each year, and the number of
kilos of grapes to buy from the winegrower. Therefore, we define three decision
variables:
• x1 is the number of liters of rosé wine to produce each year,
• x2 is the number of liters of Pinot Noir to produce,
• x3 is the number of kilos of grapes to buy.
Objective function The objective of the Château Laupt-Himum is to maximize its
profit. This gain is the income from the wine sales minus the costs.

Each liter of rosé wine that is sold gives (in €)

15 − (2/100) x1,

taking into account the reduction. Similarly, each liter of Pinot Noir gives (in €)

23 − (1/100) x2.

The revenues corresponding to the production of x1 liters of rosé wine and x2 liters of Pinot Noir are equal to

x1 (15 − (2/100) x1) + x2 (23 − (1/100) x2).

It costs 3 x3 to purchase the grapes. To produce a liter of wine, they need one kilo of vinified grapes, which costs € 2 for the rosé and € 3.50 for the Pinot Noir. The total costs are

2 x1 + 3.5 x2 + 3 x3.

The objective function that the Château Laupt-Himum should maximize is

x1 (15 − (2/100) x1) + x2 (23 − (1/100) x2) − (2 x1 + 3.5 x2 + 3 x3).
Constraints The Château cannot buy more than 1 ton of grapes from the wine-
grower, i.e.,
x3 ≤ 1000.
Moreover, they cannot produce more wine than is possible with the amount of
grapes purchased. As one kilo of grapes produces one liter of wine, we have
x1 + x2 ≤ x3 .
It is necessary to add constraints which are, although apparently trivial at the
application level, essential to the validity of the mathematical model. These con-
straints specify the nature of the decision variables. In the case of Château Laupt-
Himum, negative values of these variables would have no valid interpretation. It
is necessary to impose
x1 ≥ 0 , x2 ≥ 0 , x3 ≥ 0 . (1.9)
We combine the modeling steps to obtain the following optimization problem:

max_{x∈R³} f(x) = x1 (15 − (2/100) x1) + x2 (23 − (1/100) x2) − (2 x1 + 3.5 x2 + 3 x3)   (1.10)

subject to

x1 + x2 ≤ x3
x3 ≤ 1000
x1 ≥ 0   (1.11)
x2 ≥ 0
x3 ≥ 0.
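Since excess grapes only add cost, x3 = x1 + x2 holds at the optimum, and the remaining concave quadratic can be checked by complete enumeration of integer production plans. First-order conditions (with the supply constraint x1 + x2 = 1000 binding) give x1 = 225 liters of rosé and x2 = 775 liters of Pinot Noir, for a profit of € 8,018.75; the sketch below (names are ours) confirms this:

```python
def profit(x1, x2):
    # revenue with volume rebates minus vinification and grape costs,
    # buying exactly what is needed: x3 = x1 + x2
    x3 = x1 + x2
    revenue = x1 * (15 - 2 * x1 / 100) + x2 * (23 - x2 / 100)
    costs = 2 * x1 + 3.5 * x2 + 3 * x3
    return revenue - costs

# Enumerate all integer plans with x1 + x2 <= 1000 (the grape supply limit)
best = max(
    (profit(x1, x2), x1, x2)
    for x1 in range(1001)
    for x2 in range(1001 - x1)
)
print(best)  # (8018.75, 225, 775)
```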

1.1.4 Euclid
In about 300 BC, the Greek mathematician Euclid was interested in the following geometry problem: what is the rectangle with the greatest area among the rectangles with given perimeter L? This is considered one of the first known optimization problems in history. We write it in three steps.


Decision variables The decision variables are the length x1 and the height x2 of the
rectangle, expressed in any arbitrary unit.
Objective function We are looking for the rectangle with maximum area. There-
fore, the objective function is simply equal to x1 x2 .
Constraints The total length of the edges of the rectangle must be equal to L, that
is
2x1 + 2x2 = L. (1.12)
Moreover, the dimensions x1 and x2 must be non negative:
x1 ≥ 0 and x2 ≥ 0. (1.13)
Combining everything, we obtain the following optimization problem:

max_{x∈R²} x1 x2   (1.14)

subject to

2 x1 + 2 x2 = L
x1 ≥ 0   (1.15)
x2 ≥ 0.
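Substituting x2 = L/2 − x1 from (1.12) reduces the problem to maximizing the concave parabola x1 (L/2 − x1), whose maximizer x1 = L/4 yields the square. A two-line check with L = 20 (the value is our own choice):

```python
# Euclid's problem (1.14)-(1.15): eliminate x2 = L/2 - x1 using (1.12)
L = 20.0

def area(x1):
    return x1 * (L / 2 - x1)

x1_star = L / 4  # vertex of the concave parabola: the square
assert all(area(x1_star) >= area(x1_star + d) for d in (-1.0, -0.1, 0.1, 1.0))
print(x1_star, area(x1_star))  # 5.0 25.0
```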

1.1.5 Agent 007


James Bond, secret agent 007, has a mission to defuse a nuclear bomb on a yacht
moored 100 meters from shore. Currently, James Bond is 80 meters from the nearest
point to the yacht on the beach. He is capable of running on the beach at 20 km/h
and swimming at 6 km/h. Given that he needs 30 seconds to defuse the bomb, and
that the bomb is programmed to explode in 102 seconds, will James Bond have the
time to save the free world? This crucial issue, illustrated in Figure 1.2, can be solved
by an optimization problem.
The modeling process consists of three steps.
Decision variables The decision that James Bond must make in order to arrive as
fast as possible on the yacht is to choose when to stop running on the beach and
start swimming towards the yacht. We define a decision variable x representing
the distance in meters to run on the beach before jumping into the water.
Objective function Since the objective is to minimize the time to get to the yacht,
the objective function associates a decision x with the corresponding time in sec-
onds. Since James Bond runs at 20 km/h, the x meters on the beach are covered
in 3.6 x/20 seconds, i.e., 18 x/100 seconds. From there, he swims a distance of

√(100² + (80 − x)²)   (1.16)

[Figure 1.2: The setting for James Bond. He runs x meters along the beach (an 80 m stretch), then swims to the yacht moored 100 m offshore.]

at a speed of 6 km/h. This takes him

(3.6/6) √(100² + (80 − x)²) = 0.6 √(100² + (80 − x)²)   (1.17)

seconds. The objective function is

f(x) = (18/100) x + 0.6 √(100² + (80 − x)²).   (1.18)
Note that the trivial decisions such as x = 0 and x = 80 have disastrous conse-
quences for the future of the planet. An optimization approach is essential.
Constraints A secret agent like James Bond suffers no constraint! However, it makes
sense to require that he does not run backward or beyond the yacht, that is,

0 ≤ x ≤ 80 .

We can combine the different steps of the modeling to obtain the following optimization problem:

min_{x∈R} f(x) = (18/100) x + 0.6 √(100² + (80 − x)²)   (1.19)

subject to

x ≥ 0
x ≤ 80.   (1.20)
This example is inspired by Walker (1999).
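The objective (1.19) is convex on [0, 80], so a simple ternary search finds the minimizer: x* ≈ 48.6 meters, for a travel time of about 71.6 seconds. Adding the 30 seconds needed to defuse the bomb gives roughly 101.6 seconds, just under the 102-second deadline: 007 saves the free world, barely. A standard-library sketch:

```python
import math

def f(x):
    # time in seconds to reach the yacht, objective (1.19)
    return 18 * x / 100 + 0.6 * math.sqrt(100 ** 2 + (80 - x) ** 2)

# f is convex on [0, 80]: ternary search converges to the minimizer
lo, hi = 0.0, 80.0
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if f(m1) < f(m2):
        hi = m2
    else:
        lo = m1
x_star = (lo + hi) / 2
total = f(x_star) + 30  # add the 30 seconds needed to defuse the bomb
print(x_star, f(x_star), total)
```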

1.1.6 Indiana Jones


During his quest to find the cross of Coronado, the famous archaeologist Indiana
Jones gets stuck facing a huge room filled with Pseudechis porphyriacus, venomous
snakes. This room is 10 meters long and 5 high. Given the aversion the adventurer

has for these reptiles, it is impossible for him to wade through them, and he considers
passing over them. However, the roof is not strong enough, so he cannot walk on it.
Ever ingenious, he places the end of a ladder on the ground, blocked by a boulder,
leans it on the wall, and uses it to reach the other end of the room (Figure 1.3). Once
there, he uses his whip to get down to the floor on the other side of the snake room.
Where on the floor must he place the end of the ladder, so that the length used is as
small as possible, and the ladder thus less likely to break under his weight? We write
the optimization problem that would help our hero.

[Figure 1.3: The setting for Indiana Jones. The room is ℓ = 10 m long and h = 5 m high; the ladder touches the floor at x1 and reaches height x2 on the other side.]

The modeling process consists of three steps.


Decision variables The decision variable is the position on the floor of the end of
the ladder. To facilitate our modeling work, let us also use a decision variable for
the other end of the ladder. Thus
• x1 is the position on the floor of the ladder’s end,
• x2 is the height of the other end of the ladder at the other end of the room.
Objective function Since the objective is to find the smallest possible ladder, we minimize its length, i.e.,

f(x) = √(x1² + x2²).
Constraints The positions x1 and x2 should be such that the ladder is leaned exactly
on the edge of the wall of the room. By using similar triangles, this constraint
can be expressed as
x2 / x1 = h / (x1 − ℓ) = (x2 − h) / ℓ,

or

x1 x2 − h x1 − ℓ x2 = 0.
Finally, the ends of the ladder must be outside the room, and

x1 ≥ ℓ and x2 ≥ h .

We can combine the modeling steps to obtain the following optimization problem:

min_{x∈R²} √(x1² + x2²)   (1.21)

subject to

x1 x2 − h x1 − ℓ x2 = 0
x1 ≥ ℓ   (1.22)
x2 ≥ h.
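Eliminating x2 via the constraint (x2 = h x1 / (x1 − ℓ)) leaves a one-dimensional function of x1 that is unimodal for x1 > ℓ, so a ternary search minimizes it. The result, x1* ≈ 16.30 m and a minimal ladder of about 20.81 m, agrees with the classical closed form (h^(2/3) + ℓ^(2/3))^(3/2). A sketch in plain Python:

```python
import math

h, ell = 5.0, 10.0  # height and length of the room

def length(x1):
    # ladder length after eliminating x2 = h x1 / (x1 - ell) from (1.22)
    x2 = h * x1 / (x1 - ell)
    return math.hypot(x1, x2)

# length() is unimodal for x1 > ell: ternary search on a bracket
lo, hi = ell + 1e-6, 100.0
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if length(m1) < length(m2):
        hi = m2
    else:
        lo = m1
x1_star = (lo + hi) / 2
print(x1_star, length(x1_star))  # ~16.30 m on the floor, ladder ~20.81 m
```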

1.1.7 Geppetto
The company Geppetto Inc. produces wooden toys. It specializes in the manufacture
of soldiers and trains. Each soldier is sold for € 27 and costs € 24 in raw material. To produce a soldier, 1 hour of carpentry labor is required, as well as 2 hours of finishing. Each train is sold for € 21 and costs € 19 in raw material. To produce a train, it takes 1 hour of carpentry labor and 1 hour of finishing. Geppetto has two carpenters and two finishing specialists, each working 40 hours per week. He himself puts in 20 hours per week on finishing work. The trains are popular, and he knows that he can always sell his entire production. However, he is not able to sell more than 40 soldiers per week. What should Geppetto do to optimize his income?
The modeling process is organized in three steps.
Decision variables Geppetto’s strategy is to determine the number of soldiers and
trains to produce per week. Thus, we define two decision variables:
• x1 is the number of soldiers to produce per week,
• x2 is the number of trains to produce per week.
Objective function The objective of Geppetto is to make as much money as pos-
sible. The quantity to maximize is the profit. Geppetto’s gains consist of the
income from the sales of the toys, minus the cost of the raw material. Geppetto’s
income is the sum of the income from the sales of the soldiers and the sales of the
trains. The income (in e ) from the sales of the soldiers is the number of soldiers
sold multiplied by the selling price of one soldier, i.e., 27 x1 . The income from
the sales of the trains is 21 x2 . The total income for Geppetto is 27 x1 + 21 x2 .
Similarly, we evaluate the material costs as 24 x1 + 19 x2. The gain is

(27 x1 + 21 x2) − (24 x1 + 19 x2),   (1.23)

or
f(x) = 3x1 + 2x2 . (1.24)

Constraints The production is subject to three main constraints. First, the available
labor does not allow more than 100 finishing hours (performed by two workers and
Geppetto) and 80 hours of carpentry (performed by two carpenters). Furthermore,
to avoid unsold objects, they should not produce more than 40 soldiers per week.
The number of hours per week of finishing is the number of hours of finishing

for the soldiers, multiplied by the number of soldiers produced, plus the hours of
finishing for the trains, multiplied by the number of trains produced. This number
may not exceed 100, and we can express the following constraint:

2x1 + x2 ≤ 100 . (1.25)

A similar analysis of the carpentry resources leads to the following constraint:

x1 + x2 ≤ 80 . (1.26)

Finally, the constraint to avoid unsold products can simply be written as:

x1 ≤ 40 . (1.27)

At this stage, it seems that all the constraints of the problem have been described
mathematically. However, it is necessary to add constraints that are, although
apparently trivial at the application level, essential to the validity of the mathe-
matical model. These constraints specify the nature of the decision variables. In
the case of Geppetto, it is not conceivable to produce parts of trains or soldiers.
The decision variables must absolutely take integer values, so that

x1 ∈ N , x2 ∈ N . (1.28)

We combine the different stages of the modeling to obtain the following optimization problem:

max_{x∈N²} f(x) = 3 x1 + 2 x2,   (1.29)

subject to

2 x1 + x2 ≤ 100
x1 + x2 ≤ 80   (1.30)
x1 ≤ 40.

Interestingly, the constraints (1.28), albeit seemingly trivial, significantly complicate the optimization methods. Most of the book is devoted to problems where the integrality of the variables is not imposed. An introduction to discrete optimization is provided in Part VII. This example is inspired by Winston (1994).
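Because the feasible region of (1.29)–(1.30) is small, the integer program can be solved by complete enumeration, an approach that does not scale but is instructive here: the optimum is x1 = 20 soldiers and x2 = 60 trains, for a weekly gain of € 180.

```python
# Enumerate every integer production plan satisfying (1.30)
best = max(
    (3 * x1 + 2 * x2, x1, x2)      # weekly gain (1.29)
    for x1 in range(41)            # at most 40 soldiers
    for x2 in range(81)
    if 2 * x1 + x2 <= 100 and x1 + x2 <= 80
)
print(best)  # (180, 20, 60)
```

Note that both the finishing constraint and the carpentry constraint are active at this solution: 2(20) + 60 = 100 and 20 + 60 = 80.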

1.2 Problem transformations

Even though the modeling step is completed, we are not yet out of the woods. In-
deed, there are many ways to write a given problem mathematically. Algorithms and

software often require a special formulation based on which they solve the problem.
In this chapter, we study techniques that help us comply with these requirements.

Obtaining the mathematical formulation of a problem does not necessarily end the modeling process. In fact, the obtained formulation may not be adequate. In particular, software packages capable of solving optimization problems often require the problems to be formulated in a specific way, not necessarily corresponding to the result of the approach described in Section 1.1. We review some rules for transforming an optimization problem into another equivalent problem.

Definition 1.3 (Equivalence). Two optimization problems P1 and P2 are said to be


equivalent if we can create a feasible point (i.e., satisfying the constraints) in P2 from
a feasible point in P1 (and vice versa) with the same value of the objective function.
In particular, the two problems have the same optimal cost, and we can obtain a
solution of P2 from a solution of P1 (and vice versa).

1.2.1 Simple transformations


Here are some simple transformations that are often used in modeling.
1. Consider the optimization problem

min_{x∈X⊆R^n} f(x),

where X is a subset of R^n. Consider a function g : R → R that is strictly increasing on Im(f) = {z | ∃x ∈ X such that z = f(x)}, i.e., for any z1, z2 ∈ Im(f), g(z1) > g(z2) if and only if z1 > z2. Thus,²

argmin_{x∈X⊆R^n} f(x) = argmin_{x∈X⊆R^n} h(x),   (1.31)

where h(x) = g(f(x)), and

min_{x∈X⊆R^n} g(f(x)) = g(min_{x∈X⊆R^n} f(x)).   (1.32)

In particular, adding or subtracting a constant to the objective function of an optimization problem does not change its solution:

∀c ∈ R,   argmin_{x∈X⊆R^n} f(x) = argmin_{x∈X⊆R^n} (f(x) + c).   (1.33)

² The operator argmin identifies the values of the decision variables that reach the minimum, while the operator min identifies the value corresponding to the objective function. See Appendix A.

Similarly, if the function f generates only positive values, taking the logarithm of the objective function or taking its square does not change its solution:

argmin_{x∈X⊆R^n} f(x) = argmin_{x∈X⊆R^n} log f(x),   (1.34)

and

argmin_{x∈X⊆R^n} f(x) = argmin_{x∈X⊆R^n} f(x)²,   (1.35)

as g(x) = x² is strictly increasing for x ≥ 0. Note that the log transformation is typically used in the context of maximum likelihood estimation of unknown parameters in statistics, where the objective function is a probability and, therefore, is positive. The square transform is notably relevant when f(x) is expressed as a square root (see the example on Indiana Jones in Section 1.1.6). In this case, the square root can be omitted.
2. A maximization problem whose objective function is f(x) is equivalent to a minimization problem whose objective function is −f(x):

argmax_x f(x) = argmin_x −f(x),   (1.36)

and

max_x f(x) = − min_x −f(x).   (1.37)

Similarly, we have

argmin_x f(x) = argmax_x −f(x),   (1.38)

and

min_x f(x) = − max_x −f(x).   (1.39)

3. A constraint defined by a lower inequality can be multiplied by −1 to get an upper inequality:

g(x) ≤ 0 ⟺ −g(x) ≥ 0.   (1.40)

4. A constraint defined by an equality can be replaced by two constraints defined by inequalities:

g(x) = 0 ⟺ g(x) ≤ 0 and g(x) ≥ 0.   (1.41)
Note that this transformation is primarily used when constraints are linear. When
g(x) is non linear, this transformation is generally not recommended.
5. Some software require that all decision variables be non negative. However, for
problems such as the Swisscom example described in Section 1.1.2, such restric-
tions are not relevant. If a variable x can take any real value, it is then replaced
by two artificial variables denoted by x+ and x− , such that

x = x+ − x− . (1.42)

In this case, we can meet the requirements of the software and impose x+ ≥ 0 and
x− ≥ 0, without loss of generality.

6. In the presence of a constraint x ≥ a, with a ∈ R, a simple change of variable

x = x̃ + a   (1.43)

transforms the constraint into

x̃ ≥ 0.   (1.44)
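The invariances (1.31)–(1.35) are easy to observe numerically: over any grid, f, log f, and f² (for a positive f) are minimized at the same point, even though the minimal values differ. A small illustration (the quadratic f below is our own choice):

```python
import math

def f(x):
    # a positive objective: f(x) = (x - 3)^2 + 1 > 0
    return (x - 3) ** 2 + 1

grid = [i / 100 for i in range(-500, 1001)]  # the grid contains x = 3

argmin_f = min(grid, key=f)
argmin_log = min(grid, key=lambda x: math.log(f(x)))
argmin_sq = min(grid, key=lambda x: f(x) ** 2)

# the same argmin for the three objectives, as in (1.31)-(1.35)
print(argmin_f, argmin_log, argmin_sq)  # 3.0 3.0 3.0
```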

To illustrate these transformations, let us consider the following optimization problem:

max_{x,y} −x² + sin y   (1.45)

subject to

6x − y² ≥ 1   (1.46)
x² + y² = 3   (1.47)
x ≥ 2   (1.48)
y ∈ R,   (1.49)

and transform it in such a way as to obtain a minimization problem, in which all the decision variables are non negative, and all constraints are defined by lower inequalities. We get the following problem:

min_{x̃, y⁺, y⁻} (x̃ + 2)² − sin(y⁺ − y⁻)   (1.50)

subject to

−6(x̃ + 2) + (y⁺ − y⁻)² + 1 ≤ 0   (1.51)
(x̃ + 2)² + (y⁺ − y⁻)² − 3 ≤ 0   (1.52)
−(x̃ + 2)² − (y⁺ − y⁻)² + 3 ≤ 0   (1.53)
x̃ ≥ 0   (1.54)
y⁺ ≥ 0   (1.55)
y⁻ ≥ 0,   (1.56)

where
(1.50) is obtained by applying (1.37) to (1.45),
(1.51) is obtained by applying (1.40) to (1.46),
(1.52) and (1.53) are obtained by applying (1.41) to (1.47),
(1.54) is obtained by applying (1.43) to (1.48),
(1.55) and (1.56) are obtained by applying (1.42) to (1.49).

Note that the transformed problem has more decision variables (3 instead of 2 for the original problem) and more constraints (6 instead of 3 for the original problem).

When the solution x̃*, y⁺*, y⁻* of (1.50)–(1.56) is available, it is easy to obtain the solution to the original problem (1.45)–(1.49) by applying the inverse transformations, i.e.,

x* = x̃* + 2
y* = y⁺* − y⁻*.   (1.57)

1.2.2 Slack variables


A slack variable is introduced to replace an inequality constraint by an equality con-
straint. Such a variable should be non negative. There are several ways to define a
slack variable.

• The slack variable y is introduced directly in the specification, and its value is explicitly restricted to be non negative:

g(x) ≤ 0 ⟺ g(x) + y = 0, y ≥ 0.   (1.58)
The above specification does not completely eliminate inequality constraints. How-
ever, it simplifies considerably the nature of these constraints.
• The slack variable is introduced indirectly using a specification enforcing its non negativity. For example, the slack variable can be defined as y = z², and

g(x) ≤ 0 ⟺ g(x) + z² = 0.   (1.59)

• The slack variable can also be introduced indirectly using an exponential, that is, y = exp(z), and

g(x) ≤ 0 ⟺ g(x) + e^z = 0.   (1.60)

The limitation of this approach is that there is no value of z such that g(x) + e^z = 0 when the constraint is active, that is, when g(x) = 0. Strictly speaking, the two specifications are not equivalent. However, the slack variable exp(z) can be made as close to zero as desired by decreasing the value of z, as

lim_{z→−∞} e^z = 0.   (1.61)

So, loosely speaking, we can say that the two specifications are “asymptotically equivalent.”
The slack variable (z², e^z, or y) thus introduced measures the distance between the constraint and the point x. The special status of such variables can be effectively exploited for solving optimization problems.

Definition 1.4 (Slack variable). A slack variable is a decision variable introduced


in an optimization problem to transform an inequality constraint into an equality
constraint, possibly with a non negativity constraint.

For example, we consider the following optimization problem:

min_{x1, x2} x1² − x2²   (1.62)

subject to

sin x1 ≤ π/2   (1.63)
ln(e^{x1} + e^{x2}) ≥ √e   (1.64)
x1 − x2 ≤ 100.   (1.65)

We introduce the slack variable z1 for the constraint (1.63), the slack variable z2 for the constraint (1.64), and the slack variable y3 for the constraint (1.65). The obtained optimization problem is

min_{x1, x2, z1, z2, y3} x1² − x2²,   (1.66)

subject to

sin x1 + z1² = π/2   (1.67)
ln(e^{x1} + e^{x2}) − e^{z2} = √e   (1.68)
x1 − x2 + y3 = 100   (1.69)
y3 ≥ 0.   (1.70)

Note that the objective function is not affected by the introduction of slack variables.
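To see that the slack variables simply absorb the gap in each inequality, take any point that is feasible for (1.62)–(1.65) and recover the corresponding slacks; the point (x1, x2) = (1, 1) below is our own choice:

```python
import math

x1, x2 = 1.0, 1.0  # feasible: sin 1 <= pi/2, ln(2e) > sqrt(e), 0 <= 100

# Recover the slack variables so that (1.67)-(1.70) hold
z1 = math.sqrt(math.pi / 2 - math.sin(x1))
z2 = math.log(math.log(math.exp(x1) + math.exp(x2)) - math.sqrt(math.e))
y3 = 100 - (x1 - x2)

assert abs(math.sin(x1) + z1 ** 2 - math.pi / 2) < 1e-12
assert abs(math.log(math.exp(x1) + math.exp(x2)) - math.exp(z2)
           - math.sqrt(math.e)) < 1e-12
assert x1 - x2 + y3 == 100 and y3 >= 0
print(z1, z2, y3)
```

Note how z2 must be very negative when the constraint (1.64) is nearly active, in line with the asymptotic equivalence discussed in Section 1.2.2.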

1.3 Hypotheses
The methods and algorithms presented in this book are not universal. Each approach
is subject to assumptions about the structure of the underlying problem. Specific
assumptions are discussed for each method. However, there are three important
assumptions that concern (almost) the whole book: the continuity hypothesis, the
differentiability hypothesis, and the determinism hypothesis.
The continuity hypothesis consists in only considering problems for which the
objective to optimize and the constraints are modeled by continuous functions of
decision variables. This hypothesis excludes problems with integer variables (see
discussion in Section 1.1.7). Such problems are treated by discrete optimization
or integer programming.³ An introduction to discrete optimization is provided in
Part VII of this book. We also refer the reader to Wolsey (1998) for an introduction to
combinatorial optimization, as well as to Schrijver (2003), Bertsimas and Weismantel
(2005), and Korte and Vygen (2007).
The differentiability hypothesis (which obviously implies the continuity hypoth-
esis) also requires that the functions involved in the model are differentiable. Non
differentiable optimization is the subject of books such as Boyd and Vandenberghe
(2004), Bonnans et al. (2006), and Dem’Yanov et al. (2012).
The determinism hypothesis consists in ignoring possible errors in the data for the
problem. Measurement errors, as well as modeling errors, can have a non negligible
impact on the outcome. Stochastic optimization (Birge and Louveaux, 1997) enables
the use of models in which some pieces of data are represented by random variables.
Robust optimization (see Ben-Tal and Nemirovski, 2001 and Ben-Tal et al., 2009)
produces solutions that are barely modified by slight disturbances in the data of the
problem.
3 The term “programming”, used in the sense of optimization, was introduced during the Second
World War.

It is crucial to be aware of these hypotheses in the modeling stage. The use of


inappropriate techniques can lead to erroneous results. For instance, it is shown in
Section 25.4 that solving a discrete optimization problem by solving a continuous
version (called the relaxation) and then rounding the solutions to the closest integer
is inappropriate.

1.4 Problem definition


We now outline the main concepts that define the optimization problems, and analyze
the desired properties.
We consider the following optimization problem:

    min_{x∈Rn} f(x)    (1.71)

subject to

    h(x) = 0    (1.72)
    g(x) ≤ 0    (1.73)

and

    x ∈ X,    (1.74)

where f is a function from Rn to R, h is a function from Rn to Rm, g is a function from Rn to Rp, and X ⊆ Rn is a convex set (Definition B.2). We say that this is an optimization problem with n decision variables, m equality constraints, and p inequality constraints. We assume that n > 0, i.e., that the problem involves at least one decision variable. However, we also consider problems where m and p are zero, as well as problems where X = Rn. By employing the transformations described in Section 1.2, it is possible to express any optimization problem satisfying the hypotheses described in Section 1.3 in the form (1.71)–(1.74).
We consider two types of solutions to this problem: local minima, which are at least as good as every feasible point in their neighborhood, and global minima, which are at least as good as every feasible point.


Definition 1.5 (Local minimum). Let Y = {x ∈ Rn | h(x) = 0, g(x) ≤ 0 and x ∈ X} be the feasible set, that is, the set of vectors satisfying all the constraints. The vector x∗ ∈ Y is called a local minimum of the problem (1.71)–(1.74) if there exists ε > 0 such that

    f(x∗) ≤ f(x),  ∀x ∈ Y such that ‖x − x∗‖ < ε.    (1.75)

The notation ‖ · ‖ denotes a vector norm on Rn (Definition B.1).




Definition 1.6 (Strict local minimum). Let Y = {x ∈ Rn | h(x) = 0, g(x) ≤ 0 and x ∈ X} be the feasible set, that is, the set of vectors satisfying all the constraints. The vector x∗ ∈ Y is called a strict local minimum of the problem (1.71)–(1.74) if there exists ε > 0 such that

    f(x∗) < f(x),  ∀x ∈ Y, x ≠ x∗, such that ‖x − x∗‖ < ε.    (1.76)


Definition 1.7 (Global minimum). Let Y = {x ∈ Rn | h(x) = 0, g(x) ≤ 0 and x ∈ X} be the feasible set, that is, the set of vectors satisfying all the constraints. The vector x∗ ∈ Y is called a global minimum of the problem (1.71)–(1.74) if

    f(x∗) ≤ f(x),  ∀x ∈ Y.    (1.77)


Definition 1.8 (Strict global minimum). Let Y = {x ∈ Rn | h(x) = 0, g(x) ≤ 0 and x ∈ X} be the feasible set, that is, the set of vectors satisfying all the constraints. The vector x∗ ∈ Y is called a strict global minimum of the problem (1.71)–(1.74) if

    f(x∗) < f(x),  ∀x ∈ Y, x ≠ x∗.    (1.78)

The notions of local maximum, strict local maximum, global maximum and strict
global maximum are defined in a similar manner.
Example 1.9 (Local minimum and maximum). Figure 1.4 illustrates these definitions for the function

    f(x) = −(5/6) x^3 + (7/2) x^2 − (11/3) x + 3.    (1.79)

The point x∗ ≃ 0.6972 is a local minimum of f. Indeed, there is an interval [x∗ − 1/2, x∗ + 1/2], represented by the dotted lines in the figure, such that f(x∗) ≤ f(x) for any x in the interval. Similarly, the point x̃∗ ≃ 2.1024 is a local maximum. Indeed, there exists an interval [x̃∗ − 1/2, x̃∗ + 1/2], represented by the dotted lines, such that f(x̃∗) ≥ f(x) for any x in the interval.
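The stationary points of (1.79) can be checked numerically. The following sketch (plain Python; the quadratic formula is applied to f′(x) = −(5/2)x² + 7x − 11/3 = 0, and the variable names are ours) verifies that the smaller root is a local minimum and the larger one a local maximum in the sense of Definition 1.5:

```python
import math

# f from Example 1.9: f(x) = -(5/6)x^3 + (7/2)x^2 - (11/3)x + 3
def f(x):
    return -5.0 / 6.0 * x**3 + 7.0 / 2.0 * x**2 - 11.0 / 3.0 * x + 3.0

# Stationary points solve f'(x) = -(5/2)x^2 + 7x - 11/3 = 0
a, b, c = -5.0 / 2.0, 7.0, -11.0 / 3.0
disc = math.sqrt(b * b - 4.0 * a * c)
x_min = (-b + disc) / (2.0 * a)  # smaller root, approx 0.70: the local minimum
x_max = (-b - disc) / (2.0 * a)  # larger root, approx 2.10: the local maximum

# Check Definition 1.5 on a small neighborhood of each stationary point
h = 1e-3
assert f(x_min) <= min(f(x_min - h), f(x_min + h))
assert f(x_max) >= max(f(x_max - h), f(x_max + h))
```

The neighborhood check only probes two nearby points; it is an illustration of the definition, not a proof of local optimality.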

Example 1.10 (Binary optimization). Let f : Rn → R be differentiable, and consider the optimization problem

    min f(x)

with constraints x ∈ {0, 1}n, i.e., such that each variable can take only the value 0 or 1. However, we can express the constraints in the following manner:

    hi(x) = xi (1 − xi) = 0,  i = 1, . . . , n.
Figure 1.4: Local minimum and maximum for Example 1.9

We thus get n differentiable constraints and the hypotheses are satisfied. However,
since each feasible point is isolated, each of them is a local minimum of the problem.
Therefore, algorithms for continuous optimization designed to identify local minima
are useless for this type of problem.

In general, it is desirable to exploit the particular structure of the problem, because an excess of generality penalizes optimization algorithms. We analyze special cases of the problem (1.71)–(1.74).
It is important to note that the existence of a solution is not always guaranteed.
For example, the problem min_{x∈R} f(x) = x has neither a minimum nor a maximum, whether local or global. In fact, this function is not bounded, in the sense that it can take on values that are arbitrarily large and arbitrarily small.

Definition 1.11 (Function bounded from below). The function f : Rn → R is bounded from below on Y ⊆ Rn if there exists a real M such that

    f(x) ≥ M,  ∀x ∈ Y.    (1.80)

The function f(x) = x is unbounded on R. However, it is bounded on compact subsets of R. Among all the bounds M of Definition 1.11, the largest is called the infimum of f.

Definition 1.12 (Infimum). Let f : Rn → R be a function bounded from below on the set Y ⊆ Rn. The infimum of f on Y is denoted by

    inf_{y∈Y} f(y)    (1.81)

and is such that

    inf_{y∈Y} f(y) ≤ f(x),  ∀x ∈ Y    (1.82)

and

    ∀ M > inf_{y∈Y} f(y),  ∃x ∈ Y such that f(x) < M.    (1.83)

Example 1.13 (Infimum). Consider f(x) = e^x and Y = R. We have

    inf_{y∈Y} f(y) = 0.

We verify Definition 1.12. We have

    0 = inf_{y∈Y} f(y) ≤ f(x) = e^x,  ∀x ∈ R,

and (1.82) is satisfied. Consider an arbitrary M > 0, and take x = ln(M/2). In this case,

    f(x) = M/2 < M,

and the definition is satisfied.
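The two conditions of Definition 1.12 can be probed numerically for this example. A small sketch (plain Python; the sample points and values of M are arbitrary choices of ours):

```python
import math

def f(x):
    return math.exp(x)

# Condition (1.82): the infimum 0 bounds f from below (and is never attained)
for x in [-50.0, -1.0, 0.0, 7.5]:
    assert f(x) > 0.0

# Condition (1.83): for any M > 0, the point x = ln(M/2) gives f(x) = M/2 < M
for M in [1e-8, 0.5, 3.0, 1e6]:
    x = math.log(M / 2.0)
    assert f(x) < M
```

Finitely many samples cannot establish (1.82) for all x, of course; the sketch merely illustrates the two conditions.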

This example shows that an optimization problem is not always well defined, and that a solution may not exist. Theorem 1.14 identifies cases where the problem is well defined, that is, where the infimum of a function on a set is reached by at least one point of the set. In this case, that point is a global minimum of the corresponding optimization problem.

Theorem 1.14 (Weierstrass theorem). Let Y ⊂ Rn be a closed and non-empty subset of Rn, and let f : Y → R be a lower semi-continuous function (Definition B.21) on Y. If Y is compact (Definition B.23) or if f is coercive (that is, if it goes to +∞ when x goes to +∞ or −∞, see Definition B.22), there exists x∗ ∈ Y such that

    f(x∗) = inf_{y∈Y} f(y).

Proof. Consider a sequence (xk)k of elements of Y such that

    lim_{k→∞} f(xk) = inf_{y∈Y} f(y).

If Y is compact, it is bounded and the sequence has at least one limit point x∗. If f is coercive (Definition B.22), the sequence (xk)k is bounded and also has at least one limit point x∗. In both cases, due to the lower semi-continuity of f at x∗, we have

    f(x∗) ≤ lim_{k→∞} f(xk) = inf_{y∈Y} f(y),

and therefore

    f(x∗) = inf_{y∈Y} f(y).

Note that some functions may have local minima and yet no global minimum, as in Example 1.9, shown in Figure 1.4, where the function (1.79) is unbounded from below, since

    lim_{x→+∞} −(5/6) x^3 + (7/2) x^2 − (11/3) x + 3 = −∞.

In some cases, an optimization problem with constraints can be simplified and the constraints ignored. Before detailing such simplifications in Section 3.1, we now state a general theoretical result for the case where the optimum is an interior point of the feasible set.

Definition 1.15 (Interior point). Consider Y ⊂ Rn and y ∈ Y. We say that y is in the interior of Y if there exists a neighborhood of y contained in Y. Formally, y is in the interior of Y if there exists ε > 0 such that all points in a neighborhood of size ε of y belong to Y, that is, such that

    ‖z − y‖ ≤ ε ⟹ z ∈ Y.    (1.84)

Theorem 1.16 (Solution in the interior of constraints). Let x∗ be a local minimum of the optimization problem (1.71)–(1.74). Let Y = {x ∈ Rn | h(x) = 0, g(x) ≤ 0 and x ∈ X} be the feasible set, and let Y = Y1 ∩ Y2 be such that x∗ is an interior point of Y1. Then x∗ is a local minimum of the problem

    min_{x∈Y2} f(x).

In particular, if x∗ is an interior point of Y, then the theorem applies with Y2 = Rn, and x∗ is a solution of the unconstrained optimization problem.

Proof. According to Definition 1.15, there exists ε1 > 0 such that

    y ∈ Y1,  ∀y such that ‖y − x∗‖ ≤ ε1.    (1.85)

Since x∗ is a local minimum (Definition 1.5), there exists ε2 > 0 such that

    f(x∗) ≤ f(y),  ∀y ∈ Y such that ‖y − x∗‖ ≤ ε2.    (1.86)

Then, if ε = min(ε1, ε2), any point y ∈ Y2 such that ‖y − x∗‖ ≤ ε also belongs to Y1 and is therefore feasible. Consequently, x∗ is at least as good, in the sense of the objective function, as any such y. Formally, we get

    f(x∗) ≤ f(y),  ∀y ∈ Y2 such that ‖y − x∗‖ ≤ ε,    (1.87)

which is exactly (1.75) with Y replaced by Y2, and x∗ is a local minimum of the problem with only the Y2 constraints.

This result is used in particular in the development of optimality conditions for problems with constraints (Chapter 6), as well as in the development of algorithms called interior point methods. Note that the feasible set of a problem containing equality constraints has no interior points.
Before detailing algorithms that enable an optimization problem to be solved, it is

essential to properly understand the nature of this problem. In Chapter 2, we analyze


the objective function in detail. Constraints are discussed in Chapter 3. Finally, in
Chapter 4, we analyze the ways in which it is possible to combine the objective
function and constraints.

1.5 Exercises
Exercise 1.1 (Geometry). We want to determine a parallelepiped with unit volume and minimal surface area.
1. Formulate this problem as an optimization problem by determining
(a) the decision variables,
(b) the objective function,
(c) the constraint(s).
2. Formulate this problem as a minimization problem with only lower inequality
constraints.
Exercise 1.2 (Finance). A bank wants to determine how to invest its assets for the
year to come. Currently, the bank has a million euros that it can invest in bonds, real
estate loans, leases, or personal loans. The annual interest rates of the different investment types are 6% for bonds, 10% for real estate loans, 8% for leases, and 13% for personal loans.
To minimize risk, the portfolio selected by the bank must satisfy the following
restrictions:
• The amount allocated to personal loans must not exceed half of that invested in
bonds.
• The amount allocated to real estate loans must not exceed that allocated to leases.
• At most 20% of the total invested amount can be allocated to personal loans.
1. Formulate this problem as an optimization problem by determining
(a) the decision variables,
(b) the objective function,
(c) the constraint(s).
2. Formulate this problem as a minimization problem with only lower inequality
constraints.
3. Formulate this problem as a maximization problem with equality constraints and
with non-negativity constraints on the decision variables.

Exercise 1.3 (Stock management). The company Daubeliou sells oil and wants to
optimize the management of its stock. The annual demand is estimated at 100,000
liters and is assumed to be homogeneous throughout the year. The cost of storage
is €40 per thousand liters per year. When the company orders oil to replenish its stock, this costs €80. Assuming that the order arrives instantly, how many orders must the company Daubeliou place each year to satisfy demand and minimize costs?
Formulate this problem as an optimization problem by determining
1. the decision variables,
2. the objective function,
3. the constraint(s).
Advice:
• Set the amount of oil to order each time as a decision variable.
• Represent in a graph the evolution of the stock as a function of time.
Exercise 1.4 (Measurement errors). Mr. O. Beese is obsessed with his weight. He
owns 10 scales and weighs himself each morning on each of them. This morning he
got the following measurement results:

    100.8  99.4  101.3  97.6  102.5  102.4  104.6  102.6  95.1  96.6

He wishes to determine an estimate of his weight while minimizing the sum of the
squares of measurement errors from the 10 scales. Formulate the optimization prob-
lem that he needs to solve.
Exercise 1.5 (Congestion). Every day, 10,000 people commute from Divonne to
Geneva. By train, the journey takes 40 minutes. By road, the travel time depends on
the level of congestion. It takes 20 minutes when the highway is completely deserted.
When there is traffic, travel time increases by 5 minutes per thousand people using
the highway (assuming that there is one person per car). If 4,000 people take the car
and 6,000 take the train, the travel time by road is equal to 20 + 5 × 4 = 40 minutes,
which is identical to the travel time by train. In this situation, it would be of no
interest to change one’s mode of transport. We say that the system is at equilibrium.
However, from the point of view of the average travel time per person, is this situation
optimal? Formulate an optimization problem to answer the question.

Chapter 2

Objective function

Before attempting to understand how to solve a problem, we first try to understand
the problem itself. In this chapter, we identify the properties of the objective function
which are useful in the development of theory and algorithms. A crucial part of optimization is to understand the geometry of the problem. Derivatives play a central
role in this analysis. We also identify what constitutes a good or a bad geometry for
a problem.

Contents
2.1 Convexity and concavity . . . . . . . . . . . . . . . . . . . 29
2.2 Differentiability: the first order . . . . . . . . . . . . . . . 31
2.3 Differentiability: the second order . . . . . . . . . . . . . 39
2.4 Linearity and non linearity . . . . . . . . . . . . . . . . . . 42
2.5 Conditioning and preconditioning . . . . . . . . . . . . . 45
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

Several concepts can be used to characterize the objective function. It is important
to identify the characteristics of the objective function because each optimization
algorithm is based on specific hypotheses. When the hypotheses of an algorithm have
not been verified for a given problem, there is no guarantee that the algorithm can
be used to solve this problem.

2.1 Convexity and concavity


A function f : Rn → R is said to be convex if, for any vectors x and y of Rn, the graph of f between x and y is not above the line segment connecting (x, f(x)) and (y, f(y)) in Rn+1. Definition 2.1 establishes this property formally.

Definition 2.1 (Convex function). A function f : Rn → R is said to be convex if, for any x, y ∈ Rn and for any λ ∈ [0, 1], we have

    f(λx + (1 − λ)y) ≤ λ f(x) + (1 − λ) f(y).    (2.1)

Definition 2.1 is illustrated in Figure 2.1. The line segment connecting the points (x, f(x)) and (y, f(y)) is never below the graph of f. The point z = λx + (1 − λ)y lies somewhere between x and y when 0 ≤ λ ≤ 1. The point with coordinates (z, λf(x) + (1 − λ)f(y)) is on the line segment between the points (x, f(x)) and (y, f(y)). In order for the function to be convex, this point must never (i.e., for all x, y and 0 ≤ λ ≤ 1) be below the graph of the function.

Figure 2.1: Illustration of Definition 2.1
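Definition 2.1 can also be probed numerically: sampling random pairs x, y and weights λ can disprove convexity when a violation of (2.1) is found, although it can never prove it. A small sketch (plain Python; the helper name, sampling scheme, and tolerance are ours):

```python
import random

def violates_convexity(f, lo, hi, trials=2000, seed=1):
    # Search for x, y, lambda violating (2.1); finding one disproves
    # convexity, finding none merely fails to disprove it.
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        lam = rng.random()
        z = lam * x + (1.0 - lam) * y
        if f(z) > lam * f(x) + (1.0 - lam) * f(y) + 1e-9:
            return True
    return False

assert not violates_convexity(lambda x: x * x, -5.0, 5.0)  # quadratic: convex
assert violates_convexity(lambda x: x ** 3, -5.0, 5.0)     # cubic: not convex
```

The cubic fails because it is concave on the negative half-line, so many sampled chords lie below its graph there.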

Figure 2.2 shows a nonconvex function, for which there exist x and y such that the line segment connecting (x, f(x)) and (y, f(y)) is partially located below the graph of the function.
The notion of convexity of a function can be strengthened, giving strict convexity.

Definition 2.2 (Strictly convex function). A function f : Rn → R is said to be strictly convex if, for all x, y ∈ Rn, x ≠ y, and for all λ ∈ ]0, 1[, we have

    f(λx + (1 − λ)y) < λ f(x) + (1 − λ) f(y).    (2.2)

The convexity of the function is an important feature in optimization. In general, when the function is not convex, it is particularly difficult to identify a global minimum of the problem (1.71)–(1.74).

Note that the importance of convexity is linked to minimization problems. When we study maximization problems, the concept of concavity should be used, as shown by Definition 2.3. As discussed in Section 1.2.1, a maximization problem can always be easily transformed into a minimization problem using (1.37).

Figure 2.2: Illustration of a counterexample to Definition 2.1

Definition 2.3 (Concave function). A function f : Rn → R is said to be concave if −f is a convex function, i.e., if for all x, y ∈ Rn and for all λ ∈ [0, 1], we have

    f(λx + (1 − λ)y) ≥ λ f(x) + (1 − λ) f(y).    (2.3)

Note that convexity and concavity are not complementary properties. A function
may be neither convex nor concave. This is the case of the function represented in
Figure 2.2.

2.2 Differentiability: the first order


An important assumption in this book is that the objective function f and the func-
tions g and h describing the constraints are continuous (Definition B.5) and differen-
tiable. We summarize here the main concepts related to differentiability.

Definition 2.4 (Partial derivative). Let f : Rn → R be a continuous function. The function ∇i f(x) : Rn → R, also written as ∂f(x)/∂xi, is called the ith partial derivative of f and is defined as

    lim_{α→0} [ f(x1, . . . , xi + α, . . . , xn) − f(x1, . . . , xi, . . . , xn) ] / α.    (2.4)

This limit may not exist.

If the partial derivatives ∂f(x)/∂xi exist for all i, the gradient of f is defined as
follows.

Definition 2.5 (Gradient). Let f : Rn → R be a differentiable function. The function denoted by ∇f(x) : Rn → Rn is called the gradient of f and is defined as

    ∇f(x) = ( ∂f(x)/∂x1, . . . , ∂f(x)/∂xn )^T.    (2.5)

The gradient plays a key role in the development and analysis of optimization
algorithms.
Example 2.6 (Gradient). Consider f(x1, x2, x3) = e^x1 + x1^2 x3 − x1 x2 x3. The gradient of f is given by

    ∇f(x1, x2, x3) = ( e^x1 + 2 x1 x3 − x2 x3,  −x1 x3,  x1^2 − x1 x2 )^T.    (2.6)
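The gradient (2.6) can be verified against finite differences, a standard sanity check when coding derivatives by hand. A sketch (plain Python; the step size h and the test point are arbitrary choices of ours):

```python
import math

def f(x):
    x1, x2, x3 = x
    return math.exp(x1) + x1**2 * x3 - x1 * x2 * x3

def grad(x):
    # Analytical gradient (2.6)
    x1, x2, x3 = x
    return [math.exp(x1) + 2.0 * x1 * x3 - x2 * x3,
            -x1 * x3,
            x1**2 - x1 * x2]

def grad_fd(f, x, h=1e-6):
    # Central finite differences, one coordinate at a time
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g

x = [1.0, 2.0, 3.0]
for exact, approx in zip(grad(x), grad_fd(f, x)):
    assert abs(exact - approx) < 1e-4
```

At the point (1, 2, 3), the analytical gradient is (e, −3, −1), and the finite-difference approximation agrees to several digits.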

The analysis of the behavior of the function in certain directions is also important
for optimization methods. We introduce the concept of a directional derivative.

Definition 2.7 (Directional derivative). Let f : Rn → R be a continuous function. Consider x ∈ Rn and d ∈ Rn. The directional derivative of f at x in the direction d is given by

    lim_{α→0+} [ f(x + αd) − f(x) ] / α,    (2.7)

if the limit exists. In addition, when the gradient exists, the directional derivative is the scalar product between the gradient of f and the direction d, i.e.,

    ∇f(x)^T d.    (2.8)

Definition 2.8 (Differentiable function). Let f : Rn → R be a continuous function.
If, for any d ∈ Rn , the directional derivative of f in the direction d exists, the function
f is said to be differentiable.

This concept is sometimes called “Gâteaux differentiability,” in the sense that other types of differentiability can be defined (such as Fréchet differentiability). In this book, we deal with continuously differentiable functions, for which the distinction is unnecessary.
Example 2.9 (Directional derivative). Consider f(x1, x2, x3) = e^x1 + x1^2 x3 − x1 x2 x3, and

    d = ( d1, d2, d3 )^T.    (2.9)

The directional derivative of f in the direction d is

    ∇f(x1, x2, x3)^T d = d1 (e^x1 + 2 x1 x3 − x2 x3) − d2 x1 x3 + d3 (x1^2 − x1 x2),    (2.10)

where ∇f(x1, x2, x3) is given by (2.6).

A numerical illustration of the directional derivative is given in Example 2.14. Note that the partial derivatives are in fact directional derivatives in the directions of the coordinate axes, and

    ∂f(x)/∂xi = ∇f(x)^T ei,    (2.11)

where ei is the ith column of the identity matrix, i.e., a vector with all entries equal to 0, except the one at position i, which is 1.
The directional derivative provides information about the slope of the function in
the direction d, just as the derivative gives information about the slope of a function of one variable. In particular, the function increases in the direction d if the directional
derivative is positive and decreases if it is negative. In the latter case, we say that it
is a descent direction.

Definition 2.10 (Descent direction). Let f : Rn → R be a differentiable function. Consider x, d ∈ Rn. The direction d is a descent direction at x if

    d^T ∇f(x) < 0.    (2.12)

The terminology “descent direction” is justified by Theorem 2.11. It shows the
decrease of the function along the descent direction. The theorem also states that
the decrease is proportional to the slope, that is, the directional derivative, in the
neighborhood of x.

Theorem 2.11 (Descent direction). Let f : Rn → R be a differentiable function. Consider x ∈ Rn such that ∇f(x) ≠ 0, and d ∈ Rn. If d is a descent direction, in the sense of Definition 2.10, there exists η > 0 such that

    f(x + αd) < f(x),  ∀ 0 < α ≤ η.    (2.13)

Moreover, for any β < 1, there exists η̂ > 0 such that

    f(x + αd) < f(x) + α β ∇f(x)^T d,    (2.14)

for all 0 < α ≤ η̂.

Proof. We use (C.1) from Taylor's theorem (Theorem C.1) to evaluate the function at x + αd by employing the information at x. We have

    f(x + αd) = f(x) + α d^T ∇f(x) + o(α ‖d‖)    (Taylor)
              = f(x) + α d^T ∇f(x) + o(α)    (‖d‖ does not depend on α).

The result follows from the fact that d^T ∇f(x) < 0 by assumption, and that o(α) can be made as small as needed. More formally, according to the definition of the Landau notation o(·) (Definition B.17), for any ε > 0, there exists η such that

    |o(α)|/α < ε,  ∀ 0 < α ≤ η.

We choose ε = −d^T ∇f(x), which is positive according to Definition 2.10. Then,

    o(α)/α ≤ |o(α)|/α < −d^T ∇f(x),  ∀ 0 < α ≤ η.    (2.15)

We now need only multiply (2.15) by α to obtain

    α d^T ∇f(x) + o(α) < 0,  ∀ 0 < α ≤ η,

and f(x + αd) < f(x), ∀ 0 < α ≤ η. The result (2.14) is obtained in a similar manner, by choosing ε = (β − 1) ∇f(x)^T d in the definition of the Landau notation. We have ε > 0 because β < 1 and ∇f(x)^T d < 0.
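Theorem 2.11 is easy to observe numerically: along a descent direction, the function value drops for all sufficiently small steps. A sketch on a simple quadratic (plain Python; the quadratic, the point, and the direction are arbitrary illustrations of ours):

```python
def f(x):
    # Simple quadratic used only for illustration: f(x) = 0.5*x1^2 + 2*x2^2
    return 0.5 * x[0] ** 2 + 2.0 * x[1] ** 2

def grad(x):
    return [x[0], 4.0 * x[1]]

x = [1.0, 1.0]
d = [-1.0, -1.0]  # d^T grad f(x) = -5 < 0, so d is a descent direction

slope = sum(di * gi for di, gi in zip(d, grad(x)))
assert slope < 0.0

# Theorem 2.11: f(x + alpha*d) < f(x) for small enough alpha > 0
for alpha in [0.5, 0.1, 0.01]:
    y = [x[0] + alpha * d[0], x[1] + alpha * d[1]]
    assert f(y) < f(x)
```

The decrease is only guaranteed for sufficiently small steps; for a large enough α the function may increase again along d.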

Among all directions d from a point x, the one in which the slope is the steepest
is the direction of the gradient ∇f(x). To show this, we consider all directions d that
have the same norm (the same length) as the gradient and compare the directional
derivative for each of them.

Theorem 2.12 (Steepest ascent). Let f : Rn → R be a differentiable function. Consider x ∈ Rn and d∗ = ∇f(x). Then, for all d ∈ Rn such that ‖d‖ = ‖∇f(x)‖, we have

    d^T ∇f(x) ≤ d∗^T ∇f(x) = ∇f(x)^T ∇f(x).    (2.16)

Proof. Let d be any direction. We have

    d^T ∇f(x) ≤ ‖d‖ ‖∇f(x)‖    (Cauchy-Schwartz, Theorem C.13)
             = ‖∇f(x)‖^2    (assumption ‖d‖ = ‖∇f(x)‖)
             = ∇f(x)^T ∇f(x)    (definition of the scalar product)
             = d∗^T ∇f(x)    (definition of d∗).

Since d∗^T ∇f(x) = ‖∇f(x)‖^2 ≥ 0, the function increases in the direction d∗, which corresponds to the steepest ascent.

If the direction of the gradient corresponds to the steepest ascent of the function at x, we need only consider the direction opposite to the gradient, −∇f(x), to find the steepest descent.

Corollary 2.13 (Steepest descent). Let f : Rn → R be a differentiable function. Consider x ∈ Rn and d∗ = −∇f(x). Then, for any d ∈ Rn such that ‖d‖ = ‖∇f(x)‖, we have

    −∇f(x)^T ∇f(x) = d∗^T ∇f(x) ≤ d^T ∇f(x)    (2.17)

and the direction opposite to the gradient is that in which the function has its steepest descent.

Proof. Let d be any direction. We have

    −d^T ∇f(x) ≤ ∇f(x)^T ∇f(x)    (by applying Theorem 2.12 to −d)
             = −d∗^T ∇f(x)    (according to the definition of d∗)

and hence −d^T ∇f(x) ≤ −d∗^T ∇f(x). We get (2.17) by multiplying this last inequality by −1. Since d∗^T ∇f(x) ≤ 0, the function is decreasing in the direction d∗, which corresponds to the steepest descent.
Example 2.14 (Steepest ascent). Consider f(x) = (1/2) x1^2 + 2 x2^2, and x = (1, 1)^T. We consider three directions

    d1 = ∇f(x) = (1, 4)^T,  d2 = (1, 1)^T,  and  d3 = (−1, 3)^T.

The directional derivative of f in each direction equals

    d1^T ∇f(x) = 17,  d2^T ∇f(x) = 5,  d3^T ∇f(x) = 11.

The shape of the function in each of these directions is shown in Figure 2.3. For each direction di, the function f(x + α di) is represented for values of α between 0 and 1.
Figure 2.3: Shape of the function (1/2) x1^2 + 2 x2^2 at point (1, 1)^T in several directions

Example 2.15 (Steepest descent). Consider f(x) = (1/2) x1^2 + 2 x2^2, and x = (1, 1)^T. We consider three directions

    d1 = −∇f(x) = (−1, −4)^T,  d2 = (−1, −1)^T,  and  d3 = (1, −3)^T.

The directional derivative of f in each direction equals

    d1^T ∇f(x) = −17,  d2^T ∇f(x) = −5,  d3^T ∇f(x) = −11.

The shape of the function in each of these directions is shown in Figure 2.4. For each direction di, the function f(x + α di) is represented for values of α between 0 and 1. It is important to note in this figure that the function does not constantly decrease along a descent direction. The descent feature is local, i.e., valid in a neighborhood of x. Even if the function increases later on, the steepest descent is locally observed in the direction −∇f(x).
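The directional derivatives of Examples 2.14 and 2.15 can be reproduced in a few lines (plain Python; a sketch using the gradient (x1, 4 x2) of this quadratic):

```python
def grad(x):
    # Gradient of f(x) = 0.5*x1^2 + 2*x2^2 is (x1, 4*x2)
    return [x[0], 4.0 * x[1]]

def directional_derivative(x, d):
    # Formula (2.8): the scalar product grad f(x)^T d
    g = grad(x)
    return g[0] * d[0] + g[1] * d[1]

x = [1.0, 1.0]  # the gradient there is (1, 4)

# Example 2.14: ascent directions
assert directional_derivative(x, [1.0, 4.0]) == 17.0
assert directional_derivative(x, [1.0, 1.0]) == 5.0
assert directional_derivative(x, [-1.0, 3.0]) == 11.0

# Example 2.15: the opposite directions, with opposite slopes
assert directional_derivative(x, [-1.0, -4.0]) == -17.0
assert directional_derivative(x, [-1.0, -1.0]) == -5.0
assert directional_derivative(x, [1.0, -3.0]) == -11.0
```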

In addition to providing information on the slope of the function, the gradient also makes it possible to verify whether the function is convex or concave.

Theorem 2.16 (Convexity according to the gradient). Let f : X ⊆ Rn → R be a differentiable function on an open convex set X. f is convex on X if and only if

    f(y) − f(x) ≥ (y − x)^T ∇f(x),  ∀x, y ∈ X.    (2.18)

f is strictly convex on X if and only if

    f(y) − f(x) > (y − x)^T ∇f(x),  ∀x, y ∈ X, x ≠ y.    (2.19)

Figure 2.4: Shape of the function (1/2) x1^2 + 2 x2^2 at point (1, 1)^T in several directions

Proof. Necessary condition. We first show that (2.18) is a necessary condition. Let x, y ∈ X be arbitrary and let us consider d = y − x. We evaluate the directional derivative of f in the direction d and obtain

    (y − x)^T ∇f(x) = d^T ∇f(x)    (definition of d)
                    = lim_{α→0+} [ f(x + αd) − f(x) ] / α    (Definition 2.7)
                    = lim_{α→0+} [ f(x + α(y − x)) − f(x) ] / α    (definition of d).

Since the limit is taken for α → 0, we can assume without loss of generality that α < 1. We obtain, by convexity of f, applying Definition 2.1 with λ = 1 − α,

    (y − x)^T ∇f(x) ≤ lim_{α→0+} [ (1 − α) f(x) + α f(y) − f(x) ] / α.    (2.20)

We now need only simplify (2.20) to obtain (2.18).

Sufficient condition. We now assume that (2.18) is satisfied, and let us demonstrate the convexity of f. Let x, y ∈ X be arbitrary, and let us take z = λx + (1 − λ)y. We have z ∈ X because X is convex. We apply (2.18) first for z and x, and then for z and y:

    f(x) − f(z) ≥ (x − z)^T ∇f(z)
    f(y) − f(z) ≥ (y − z)^T ∇f(z).    (2.21)

We multiply the first inequality by λ and the second by (1 − λ) and sum them up to obtain

    λ f(x) + (1 − λ) f(y) − f(z) ≥ ( λx + (1 − λ)y − z )^T ∇f(z).    (2.22)

According to the definition of z, the right-hand side is zero, and we obtain the characterization (2.1) of Definition 2.1, so f is convex.

The proof for the strictly convex case is identical.

If we write (2.18) in a slightly different way:

    f(y) ≥ f(x) + (y − x)^T ∇f(x),  ∀x, y ∈ Rn,    (2.23)

the right-hand side is nothing else than the equation of the hyperplane that is tangent to the function f at point x. We thus see that a function is convex if and only if its graph is never below the tangent hyperplane. Figure 2.5 illustrates this idea in the case of a function of one variable.

Figure 2.5: Hyperplane tangent to a convex function

We conclude this section by generalizing the notion of the gradient to functions from Rn to Rm. In this case, the various partial derivatives are arranged in a matrix called the gradient matrix. Each column of this matrix is the gradient of the corresponding component of f.

Definition 2.17 (Gradient matrix). Let f : Rn → Rm be such that fi : Rn → R is differentiable, for i = 1, . . . , m. In this case, f is differentiable, and the function ∇f(x) : Rn → Rn×m is called the gradient matrix; it is defined by

    ∇f(x) = ( ∇f1(x) · · · ∇fm(x) ),    (2.24)

i.e., the n × m matrix whose entry in row i and column j is ∂fj(x)/∂xi.

The gradient matrix is often used in its transposed form, in which case it is called the Jacobian matrix of f.

Definition 2.18 (Jacobian matrix). Consider f : Rn → Rm. The function J(x) : Rn → Rm×n is called the Jacobian matrix and is defined as

    J(x) = ∇f(x)^T,    (2.25)

i.e., the m × n matrix whose ith row is ∇fi(x)^T.
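For vector-valued functions, the Jacobian of Definition 2.18 can likewise be approximated column by column with finite differences. A sketch (plain Python; the map F, the test point, and the step size are illustrative choices of ours, not from the text):

```python
import math

def F(x):
    # A vector-valued map R^2 -> R^2 chosen for illustration
    return [x[0] ** 2 + x[1], math.sin(x[0]) * x[1]]

def jacobian_fd(F, x, h=1e-6):
    # J[i][j] = dF_i/dx_j, by central finite differences (Definition 2.18)
    m = len(F(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        Fp, Fm = F(xp), F(xm)
        for i in range(m):
            J[i][j] = (Fp[i] - Fm[i]) / (2.0 * h)
    return J

x = [1.0, 2.0]
J = jacobian_fd(F, x)
# Analytical Jacobian: [[2*x1, 1], [x2*cos(x1), sin(x1)]]
assert abs(J[0][0] - 2.0) < 1e-5
assert abs(J[0][1] - 1.0) < 1e-5
assert abs(J[1][0] - 2.0 * math.cos(1.0)) < 1e-5
assert abs(J[1][1] - math.sin(1.0)) < 1e-5
```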

Born in Potsdam (Germany) on December 10, 1804, and died in Berlin on February 18, 1851, Jacobi taught at Königsberg with Neumann and Bessel. He contributed significantly to the theory of elliptic functions, in competition with Abel. His work on first-order partial differential equations and determinants is of prime importance. Although introduced by Cauchy in 1815, the determinant function is called the Jacobian thanks to a long thesis published by Jacobi in 1841. The determinant of the Jacobian matrix (Definition 2.18) is called the Jacobian.

Figure 2.6: Carl Gustav Jacob Jacobi

2.3 Differentiability: the second order


We can perform the same differentiability analysis on each of the functions ∇i f(x) of Definition 2.4 as was done for the function f in Section 2.2. The jth partial derivative of ∇i f(x) is the second derivative of f with respect to the variables i and j, because

    ∂∇i f(x)/∂xj = ∂( ∂f(x)/∂xi )/∂xj = ∂²f(x)/∂xi ∂xj.    (2.26)

It is common to organize the second derivatives in an n × n matrix in which the element in row i and column j is ∂²f(x)/∂xi ∂xj. This matrix is called the Hessian and gets its name from the German mathematician Otto Hesse (Figure 2.7).

Definition 2.19 (Hessian matrix). Let f : Rn → R be a twice differentiable function. The function ∇²f(x) : Rn → Rn×n is called the Hessian matrix, or Hessian, of f and is defined as the n × n matrix whose entry in row i and column j is ∂²f(x)/∂xi ∂xj:

    ∇²f(x) = [ ∂²f(x)/∂xi ∂xj ]_{i,j=1,...,n}.    (2.27)

The Hessian matrix is always symmetric.

Note that the Hessian of f is the gradient matrix and the Jacobian matrix of ∇f.
Example 2.20 (Hessian). Consider f(x1, x2, x3) = e^x1 + x1^2 x3 − x1 x2 x3. The Hessian of f is given by

    ∇²f(x1, x2, x3) = ( e^x1 + 2 x3    −x3          2 x1 − x2
                        −x3            0            −x1
                        2 x1 − x2      −x1          0 ).    (2.28)
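As with the gradient in Example 2.6, the Hessian (2.28) can be checked against second-order finite differences (plain Python; a sketch with an arbitrary test point and step size; the accuracy of such differences is limited, hence the loose tolerance):

```python
import math

def f(x):
    x1, x2, x3 = x
    return math.exp(x1) + x1**2 * x3 - x1 * x2 * x3

def hessian(x):
    # Analytical Hessian (2.28)
    x1, x2, x3 = x
    return [[math.exp(x1) + 2.0 * x3, -x3, 2.0 * x1 - x2],
            [-x3, 0.0, -x1],
            [2.0 * x1 - x2, -x1, 0.0]]

def hessian_fd(f, x, h=1e-4):
    # Second-order central differences for each pair (i, j)
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            xpp, xpm, xmp, xmm = list(x), list(x), list(x), list(x)
            xpp[i] += h; xpp[j] += h
            xpm[i] += h; xpm[j] -= h
            xmp[i] -= h; xmp[j] += h
            xmm[i] -= h; xmm[j] -= h
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4.0 * h * h)
    return H

x = [1.0, 2.0, 3.0]
Ha, Hn = hessian(x), hessian_fd(f, x)
for i in range(3):
    for j in range(3):
        assert abs(Ha[i][j] - Hn[i][j]) < 1e-3
        assert abs(Ha[i][j] - Ha[j][i]) < 1e-12  # symmetry
```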

Just like the gradient, the Hessian gives us information about the convexity of the
function.

Theorem 2.21 (Convexity by the Hessian). Let f : X ⊆ Rn → R be a twice


differentiable function on an open convex set X. If ∇2 f(x) is positive semidefinite
(resp. positive definite) for all x in X, then f is convex (resp. strictly convex)
on X.

Proof. Consider x and y in X. We utilize (C.4) of Taylor’s theorem (Theorem C.2)


to evaluate the function in y by using the information in x. By writing d = y − x, we
have

f(y) = f(x) + (y − x)T ∇f(x) + ½ dT ∇2 f(x + αd) d .   (2.29)
If 0 < α ≤ 1, x+αd ∈ X by convexity (Definition B.2) of X, and the matrix ∇2 f(x+αd)
is positive semidefinite (Definition B.8). Therefore,

dT ∇2 f(x + αd)d ≥ 0 . (2.30)

Then, we have
f(y) ≥ f(x) + (y − x)T ∇f(x) . (2.31)
We now need only invoke Theorem 2.16 to prove the convexity of f. The strict
convexity is proven in a similar manner.
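Theorem 2.21 can be illustrated numerically; this sketch (an example of ours, not the book's) uses f(x1, x2) = x1² + x1 x2 + x2², whose constant Hessian has strictly positive eigenvalues, so f is strictly convex:

```python
import numpy as np

# The function f(x1, x2) = x1^2 + x1*x2 + x2^2 has the constant Hessian
# below.  Its eigenvalues are 1 and 3, both strictly positive, so the
# Hessian is positive definite everywhere and f is strictly convex.
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals = np.linalg.eigvalsh(H)
print(np.allclose(eigvals, [1.0, 3.0]))  # True
print(bool(np.all(eigvals > 0)))         # True: f is strictly convex
```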

It is interesting to analyze the second order information along a given direction


d. If we have a twice differentiable function f, a point x and a direction d, we can
calculate the derivatives in this direction by considering the function of one variable

g : R+ −→ R : α ↦ f(x + αd) .   (2.32)



According to the chain differentiation rule, we have

g ′ (α) = dT ∇f(x + αd) . (2.33)

Note that g ′ (0) is the directional derivative (Definition 2.7) of f at x along d. We


also have

g ′′ (α) = dT ∇2 f(x + αd)d . (2.34)

Since the second derivative of a function of one variable gives us curvature informa-
tion, (2.34) provides us with information about the curvature of the function f in
the direction d. In particular, when α = 0, this expression gives information on the
curvature of f in x. To avoid the length of the direction influencing the notion of
curvature, it is important to normalize. We obtain the following definition.

Definition 2.22 (Curvature). Let f : Rn → R be a twice differentiable function.


Consider x, d ∈ Rn . The quantity

dT ∇2 f(x) d / dT d   (2.35)
represents the curvature of the function f in x in the direction d.

In linear algebra, the quantity (2.35) is often called the Rayleigh quotient of ∇2 f(x)
in the direction d. One should immediately note that the curvature of the function
in x in the direction −d is exactly the same as in the direction d.

Example 2.23 (Curvature). Consider f(x) = ½ x1² + 2x2² , and x = (1, 1)T , as in
Examples 2.14 and 2.15. The curvature of the function in different directions is given
below:

      d              −d            dT ∇2 f(x)d / dT d
  (1, 4)T        (−1, −4)T              3.8235
  (1, 1)T        (−1, −1)T              2.5
  (−1, 3)T       (1, −3)T               3.7
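The curvature values of this table can be reproduced with a few lines of NumPy (an illustrative sketch; the Hessian of f(x) = ½x1² + 2x2² is the constant matrix diag(1, 4)):

```python
import numpy as np

# Curvature (2.35): the Rayleigh quotient of the Hessian in a direction d.
H = np.diag([1.0, 4.0])

def curvature(H, d):
    d = np.asarray(d, dtype=float)
    return d @ H @ d / (d @ d)

print(round(curvature(H, [1, 4]), 4))   # 3.8235
print(curvature(H, [1, 1]))             # 2.5
print(curvature(H, [-1, 3]))            # 3.7
# The curvature in -d equals the curvature in d:
print(curvature(H, [1, -3]) == curvature(H, [-1, 3]))  # True
```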

Otto Hesse was born in Königsberg (currently Kaliningrad, Rus-


sia) on April 22, 1811, and died in Munich (Germany) on August
4, 1874. He was a student of Jacobi and in 1845 was appointed
Extraordinary Professor at Königsberg, where he taught for 16
years. Kirchkoff and Lipschitz attended his courses. Hesse was

also affiliated with the University of Halle, in Heidelberg and


with Munich Polytechnicum. He worked mainly on the theory
of algebraic functions and the theory of invariants.
Figure 2.7: Ludwig Otto Hesse

2.4 Linearity and non linearity


A function f : Rn → R is said to be linear if its value is a linear combination of
variables.

Definition 2.24 (Linear function). A function f : Rn → R is said to be linear if it is


defined as

f(x) = cT x = Σi=1..n ci xi ,   (2.36)

where c ∈ Rn is a constant vector, i.e., independent of x. A function f : Rn → Rm is
linear if each of its components fi : Rn → R, i = 1, . . . , m, is linear. In this case, it
can be written as

f(x) = Ax ,   (2.37)

where A ∈ Rm×n is a matrix of constants.

When a constant term is added to a linear function, the result is said to be affine.

Definition 2.25 (Affine function). A function f : Rn → R is said to be affine if it is


written as

f(x) = cT x + d = Σi=1..n ci xi + d ,   (2.38)

where c ∈ Rn is a vector of constants and d ∈ R. A function f : Rn → Rm is affine
if each of its components fi : Rn → R, i = 1, . . . , m, is affine. In this case, it can be
written as

f(x) = Ax + b ,   (2.39)

where A ∈ Rm×n is a matrix and b ∈ Rm is a vector.

Note that minimizing (2.38) is equivalent to minimizing (2.36). Note that all
linear functions are affine. By abuse of language, a non linear function is actually a
function that is not affine.

Definition 2.26 (Non linear function). Any function that is not affine is said to be
non linear.

The set of non linear functions is vast, and one needs to be a little more precise

in their characterization. Intuitively, the function shown in Figure 2.8 seems more
non linear than the one in Figure 2.9. The slope of the former changes quickly with
x, which is not the case for the latter. This is formally captured by the Lipschitz
continuity (Definition B.16) of the gradient.

[plot of f(x) = e^cos(e^−x) for −5 ≤ x ≤ 0]
Figure 2.8: Example of a non linear function

[plot of f(x) = x²/100 + 3x + 1 for −10 ≤ x ≤ 10]
Figure 2.9: Example of another non linear function

Definition 2.27 (Lipschitz continuity of the gradient). Consider f : X ⊆ Rn → Rm .


The gradient matrix of the function is Lipschitz continuous on X if there exists a
constant M > 0 such that, for all x, y ∈ X, we have

‖∇f(x) − ∇f(y)‖n×m ≤ M ‖x − y‖n ,   (2.40)

where ‖ · ‖n×m is a norm on Rn×m and ‖ · ‖n is a norm on Rn . The constant M is
called the Lipschitz constant.

Intuitively, the definition says that the slopes of the function at two close points
are close as well. And the more so when M is small. Actually, when f is linear, the
slope is the same at any point, and (2.40) is verified with M = 0. The value of M
for the function represented in Figure 2.9 is low, while it is large for the function
illustrated in Figure 2.8, where the slope varies dramatically with small modifications
of x.
The constant M can be interpreted as an upper bound on the curvature of the
function. The greater M is, the larger the curvature can be. If M = 0, the curvature is
zero, and the function is linear. Note that this constant is essentially theoretical and
that it is generally difficult to obtain a value for it.
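A rough empirical lower bound on M can be obtained by sampling; this small experiment (ours, not the book's) uses the function of Figure 2.9, whose gradient f′(x) = x/50 + 3 is Lipschitz continuous with M = 1/50:

```python
import numpy as np

# For f(x) = x^2/100 + 3x + 1 the gradient is f'(x) = x/50 + 3, so the
# constant of (2.40) is M = 1/50 = 0.02.  Sampling pairs of points and
# taking the largest observed ratio |f'(x) - f'(y)| / |x - y| recovers it,
# because the ratio is constant for a quadratic function.
rng = np.random.default_rng(0)

def grad(x):
    return x / 50.0 + 3.0

xs = rng.uniform(-10.0, 10.0, size=1000)
ys = rng.uniform(-10.0, 10.0, size=1000)
ratios = np.abs(grad(xs) - grad(ys)) / np.abs(xs - ys)
print(np.isclose(ratios.max(), 0.02))  # True
```

For a general non linear function, such sampling only gives a lower bound on M, which is consistent with the remark that the exact constant is rarely available in practice.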
Among the non linear functions, quadratic functions play an important role in
optimization algorithms.

Definition 2.28 (Quadratic function). A function f : Rn → R is said to be quadratic


if it can be written as

f(x) = ½ xT Qx + gT x + c = ½ Σi=1..n Σj=1..n Qij xi xj + Σi=1..n gi xi + c ,   (2.41)

where Q is a symmetric n × n matrix, g ∈ Rn and c ∈ R. We have

∇f(x) = Qx + g and ∇2 f(x) = Q . (2.42)

The presence of the factor ½ enables a simplification of the expression of ∇f(x).
Note also that the fact that Q is symmetric is not restrictive. Indeed, if Q were not
symmetric, we would have

xT Qx = Σi=1..n Σj=1..n Qij xi xj = Σi=1..n Σj=1..n ½ (Qij + Qji ) xi xj .

We now define the symmetric matrix Q ′ such that Qij′ = Qji′ = ½ (Qij + Qji ). We
obtain xT Qx = xT Q ′ x, and the same function can be written using a symmetric
matrix.
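The symmetrization argument can be checked numerically (an illustrative sketch with a random matrix):

```python
import numpy as np

# x^T Q x is unchanged when Q is replaced by its symmetric part
# Q' = (Q + Q^T)/2.
rng = np.random.default_rng(1)
Q = rng.normal(size=(4, 4))        # non-symmetric in general
Q_sym = (Q + Q.T) / 2.0

x = rng.normal(size=4)
print(np.isclose(x @ Q @ x, x @ Q_sym @ x))  # True
```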

2.5 Conditioning and preconditioning


In linear algebra, the notion of conditioning is related to the analysis of numerical
errors that can occur when solving a linear system (see Golub and Van Loan, 1996).

Definition 2.29 (Condition number). Let A ∈ Rn×n be a non singular symmetric


matrix. The condition number for A is

κ(A) = ‖A‖ ‖A−1 ‖ .   (2.43)

If the matrix norm used is the norm 2, we have


κ2 (A) = ‖A‖2 ‖A−1 ‖2 = σ1 / σn ,   (2.44)
where σ1 is the largest singular value of A (Definition B.28) and σn is the smallest. By
extension, the condition number of a singular matrix (i.e., such that σn = 0) is +∞.
If A is symmetric positive semidefinite, the singular values of A are its eigenvalues
(Definition B.7).

We propose a geometric interpretation of the condition number. For this, we


consider a non linear function f : Rn → R and a vector x ∈ Rn . We assume that
the matrix ∇2 f(x) is positive definite1 , and let λ1 be its largest eigenvalue and λn its
smallest. Let d1 be an eigenvector corresponding to λ1 . We have

∇2 f(x) d1 = λ1 d1 .   (2.45)

By premultiplying by dT1 and normalizing, we obtain

λ1 = d1T ∇2 f(x) d1 / d1T d1 .   (2.46)

According to Definition 2.22, the eigenvalue λ1 corresponds to the curvature of


the function in the direction of the eigenvector d1 . In addition, according to the
Rayleigh-Ritz theorem (Theorem C.4), this is the greatest curvature among all possi-
ble directions. A similar reasoning for the smallest eigenvalue allows us to determine
λn as the smallest curvature of the function in all possible directions. The condition
number of the Hessian matrix at a point x is the ratio between the largest and the
smallest curvature among the directions when starting from x.

Definition 2.30 (Conditioning). Let f : Rn → R be a twice differentiable function,


and let us take a vector x ∈ Rn . The conditioning of f at x is the condition number
of ∇2 f(x).

1 ∇2 f(x) is always symmetric.



By using this interpretation related to curvature, an ill-conditioned function is


characterized by a large difference in curvature between two directions. In the case of
a quadratic function with two dimensions (Figure 2.10(a)), this translates into level
curves forming elongated ellipses (Figure 2.10(b)). A well conditioned function is
characterized by a homogeneous curvature in the various directions. In the case of a

quadratic function with two dimensions (Figure 2.11(a)), this translates into nearly
circular level curves (Figure 2.11(b)).
Example 2.31 (Conditioning). The quadratic function

f(x1 , x2 ) = 2x1² + 9x2²   (2.47)

is such that its condition number is 9/2 for all x, because


 
∇2 f(x1 , x2 ) = [ 4   0 ]
                 [ 0  18 ] .   (2.48)

We now apply the change of variables

[ x1′ ]   [ 2    0  ] [ x1 ]
[ x2′ ] = [ 0  3√2  ] [ x2 ] ,   (2.49)

i.e.,

[ x1 ]   [ 1/2    0   ] [ x1′ ]
[ x2 ] = [ 0    √2/6  ] [ x2′ ] .   (2.50)
We obtain
f(x1′ , x2′ ) = ½ x1′² + ½ x2′² ,   (2.51)
for which the Hessian is the identity matrix, and the condition number is 1, for all
(x1′ , x2′ ).
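The computations of Example 2.31 can be verified with NumPy (an illustrative check; `np.linalg.cond` uses the 2-norm by default):

```python
import numpy as np

# The Hessian of f(x1, x2) = 2*x1^2 + 9*x2^2 is diag(4, 18), hence the
# condition number is 18/4 = 4.5.  The change of variables (2.49) with
# M = diag(2, 3*sqrt(2)) transforms the Hessian into M^{-T} H M^{-1} = I.
H = np.diag([4.0, 18.0])
print(np.isclose(np.linalg.cond(H), 4.5))  # True

M = np.diag([2.0, 3.0 * np.sqrt(2.0)])
Minv = np.linalg.inv(M)
H_new = Minv.T @ H @ Minv
print(np.allclose(H_new, np.eye(2)))       # True: condition number 1
```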

We see that it is possible to improve the conditioning by a change of variables. In
general, a change of variables is defined by an invertible matrix M.

Definition 2.32 (Change of variables). Consider x ∈ Rn . Let M ∈ Rn×n be an


invertible matrix. The change of variables is the linear application defined by M that
transforms x into x ′ = Mx.

Consider a function f(x) and apply to it the change of variables x ′ = Mx to obtain


the function f̃(x ′ ). We have, according to the chain differentiation rule (Theorem C.3),

f̃(x ′ ) = f(M−1 x ′ )
∇f̃(x ′ ) = M−T ∇f(M−1 x ′ )                             (2.52)
∇2 f̃(x ′ ) = M−T ∇2 f(M−1 x ′ ) M−1 = M−T ∇2 f(x) M−1 .

[surface plot of the function and its level curves: the level curves form elongated ellipses]
Figure 2.10: Function (2.47) of Example 2.31

The conditioning of f̃ at x ′ is the condition number of the matrix M−T ∇2 f(x) M−1
(where M−T is the inverse of the transpose of M). Choosing a change of variables
such that the conditioning is as close as possible to 1 is called preconditioning.

Definition 2.33 (Preconditioning). Let f : Rn → R be a twice differentiable function,


and let us take a vector x ∈ Rn . The preconditioning of f in x involves defining
a change of variables x ′ = Mx and a function f̃(x ′ ) = f(M−1 x ′ ), such that the
conditioning of f̃ in Mx is better than the conditioning of f in x.

In the context of optimization, the matrix for the change of variables should be
positive definite in order to preserve the nature of the problem.

[surface plot of the function and its level curves: the level curves are nearly circular]
Figure 2.11: Function (2.51) of Example 2.31

If ∇2 f(x) is positive definite, we can calculate the Cholesky decomposition (Definition B.18)

∇2 f(x) = LLT ,   (2.53)
where L is a lower triangular matrix. We now choose the change of variables

x ′ = LT x ⇐⇒ x = L−T x ′ . (2.54)

In this case

∇2 f̃(x ′ ) = L−1 ∇2 f(x) L−T     according to (2.52), with M = LT
           = L−1 LLT L−T         according to (2.53)
           = I .


The conditioning of the function f̃ in x ′ is 1. According to Definition 2.29,
κ2 (∇2 f̃(x ′ )) ≥ 1, so the obtained conditioning is the best possible. The best
preconditioning of f in x consists in defining the change of variables based on the
Cholesky factorization of ∇2 f(x).
Example 2.34 (Preconditioning). Consider

f(x1 , x2 ) = ½ x1² + (25/2) x2² + 3 x1 x2 − 12 x1 − √π x2 − 6 .   (2.55)

We have

∇f(x1 , x2 ) = [ x1 + 3x2 − 12    ]
               [ 3x1 + 25x2 − √π ]   (2.56)

and

∇2 f(x1 , x2 ) = [ 1   3 ]  =  [ 1  0 ] [ 1  0 ]T
                 [ 3  25 ]     [ 3  4 ] [ 3  4 ]   .   (2.57)

We define

[ x1′ ]   [ 1  3 ] [ x1 ]
[ x2′ ] = [ 0  4 ] [ x2 ] ,   (2.58)

i.e.,

[ x1 ]   [ 1  −3/4 ] [ x1′ ]
[ x2 ] = [ 0   1/4 ] [ x2′ ] .   (2.59)

We obtain

f̃(x1′ , x2′ ) = ½ (x1′ − ¾ x2′)² + (25/2)(¼ x2′)² + 3 (x1′ − ¾ x2′)(¼ x2′)
                − 12 (x1′ − ¾ x2′) − (√π/4) x2′ − 6   (2.60)
              = ½ x1′² + ½ x2′² − 12 x1′ + (9 − √π/4) x2′ − 6 .

It is easy to verify that ∇2 f̃(x1′ , x2′ ) = I. Note that there are no longer any cross
terms in x1′ x2′ .
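The Cholesky-based preconditioning of Example 2.34 can be checked numerically (an illustrative sketch):

```python
import numpy as np

# The Hessian H = [[1, 3], [3, 25]] factorizes as H = L L^T with
# L = [[1, 0], [3, 4]], and the change of variables x' = L^T x turns
# the Hessian into L^{-1} H L^{-T} = I.
H = np.array([[1.0, 3.0],
              [3.0, 25.0]])
L = np.linalg.cholesky(H)          # lower triangular factor
print(np.allclose(L, [[1.0, 0.0], [3.0, 4.0]]))   # True

Linv = np.linalg.inv(L)
print(np.allclose(Linv @ H @ Linv.T, np.eye(2)))  # True
```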

2.6 Exercises
Exercise 2.1. Among the following functions, which are convex and which are con-
cave? Justify your answer.
1. f(x) = 1 − x².
2. f(x) = x² − 1.
3. f(x1 , x2 ) = √(x1² + x2²).
4. f(x) = x³.
5. f(x1 , x2 , x3 ) = sin(a) x1 + cos(b) x2 + e^−c x3 , a, b, c ∈ R.
Exercise 2.2. For each of the following functions:
• Calculate the gradient.
• Calculate the Hessian.
• Specify (and justify) whether the function is convex, concave, or neither.

• Calculate the curvature of the function in a direction d at the specified point x̄.
• Make a change of variables to precondition the function, using the Hessian at the
specified point x̄. Please note that the matrix for a change of variables must be
positive definite.
1. f(x1 , x2 ) = ½ x1² + (9/2) x2² , x̄ = (0, 0)T .
2. f(x1 , x2 ) = ⅓ x1³ + x2³ − x1 − x2 , x̄ = (9, 1)T .
3. f(x1 , x2 ) = (x1 − 2)⁴ + (x1 − 2)² x2² + (x2 + 1)² , x̄ = (2, −1)T .
4. f(x1 , x2 ) = x1² + 2 x1 x2 + 2 x2² , x̄ = (1, 1)T .
5. f(x1 , x2 ) = x1² − x1 x2 + 2 x2² − 2 x1 + e^(x1 +x2 ) , x̄ = (0, 0)T .
Exercise 2.3. Consider f : Rn → R, a point x ∈ Rn and a direction d ∈ Rn , d ≠ 0,
such that f(x + αd) = f(x) for all α ∈ R. What is the curvature of f in x in the
direction d? What is the conditioning of f in x?

Chapter 3

Constraints

Life would be easier without constraints. Or would it? In this chapter, we investigate
ways to remove some of them, or even all of them. And when some remain, they need
to be properly understood in order to be verified. As algorithms need to move along
directions that are compatible with the constraints, such directions are characterized
in various contexts. We put a special emphasis on linear constraints, for which the
analysis simplifies significantly.

Contents
3.1 Active constraints . . . . . . . . . . . . . . . . . . . . . . . 52
3.2 Linear independence of the constraints . . . . . . . . . . 56
3.3 Feasible directions . . . . . . . . . . . . . . . . . . . . . . . 60
3.3.1 Convex constraints . . . . . . . . . . . . . . . . . . . . . . 60
3.3.2 Constraints defined by equations-inequations . . . . . . . 62
3.4 Elimination of constraints . . . . . . . . . . . . . . . . . . 75
3.5 Linear constraints . . . . . . . . . . . . . . . . . . . . . . . 78
3.5.1 Polyhedron . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.5.2 Basic solutions . . . . . . . . . . . . . . . . . . . . . . . . 83
3.5.3 Basic directions . . . . . . . . . . . . . . . . . . . . . . . . 87
3.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

In this chapter, we analyze the constraints of the optimization problem:


minx∈Rn f(x) subject to h(x) = 0, g(x) ≤ 0, x ∈ X. A vector satisfying all constraints
is called a feasible point.

Definition 3.1 (Feasible point). Consider the optimization problem (1.71)–(1.74).


A point x ∈ Rn is said to be feasible if it satisfies all constraints (1.72)–(1.74). Note
that, in the literature, this concept is sometimes called a feasible solution or feasible
vector.

3.1 Active constraints


The concept of active constraints is relevant mainly for inequality constraints. We
introduce the concept with an example.
Example 3.2 (Inequality constraints). Consider the optimization problem

minx∈R x²   (3.1)

subject to
x ≤ 4
(3.2)
x ≥ −10.
It is illustrated in Figure 3.1, where the inequality constraints are represented by
vertical lines, associated with an arrow pointed towards the feasible domain. The
solution to this problem is x∗ = 0. In fact, one could also choose to ignore the
constraints and still obtain the same solution. We say that the constraints are inactive
at the solution. When using the notation (1.73), the problem can be written as

minx∈R x²   (3.3)

subject to
g1 (x) = x−4 ≤ 0
(3.4)
g2 (x) = −x − 10 ≤ 0.
We have g1 (x∗ ) = −4 and g2 (x∗ ) = −10, and g1 (x∗ ) < 0 and g2 (x∗ ) < 0. The fact
that the inequality constraints are strictly verified characterizes inactive constraints.

Figure 3.1: Illustration of Example 3.2



Example 3.3 (Inequality constraints II). Consider the optimization problem


minx∈R x²   (3.5)

subject to
x ≤ 4
(3.6)
x ≥ 1.
It is illustrated in Figure 3.2, where the inequality constraints are represented by
vertical lines, associated with an arrow pointed towards the feasible domain. The
solution to this problem is x∗ = 1. In this case, we can ignore the constraint x ≤ 4.
However, the constraint x ≥ 1 cannot be ignored without modifying the solution. It
is said to be active. Using the notation (1.73), the problem can be written as
minx∈R x²   (3.7)

subject to
g1 (x) = x−4 ≤ 0
(3.8)
g2 (x) = 1−x ≤ 0.
We have g1 (x∗ ) = −3 and g2 (x∗ ) = 0, and g1 (x∗ ) < 0 and g2 (x∗ ) = 0. The first
constraint is verified with strict inequality and is inactive. The second constraint is
verified with equality and is active.


Figure 3.2: Illustration of Example 3.3

Definition 3.4 (Active constraints). Consider g : Rn → Rp and h : Rn → Rm . For


1 ≤ i ≤ p, an inequality constraint

gi (x) ≤ 0 (3.9)

is said to be active in x∗ if
gi (x∗ ) = 0, (3.10)
and inactive in x∗ if
gi (x∗ ) < 0. (3.11)

By extension, for 1 ≤ i ≤ m, an equality constraint

hi (x) = 0 (3.12)

is said to be active at x∗ if it is satisfied in x∗ , i.e.,

hi (x∗ ) = 0. (3.13)

The set of indices of the active constraints in x∗ is denoted by A(x∗ ).

This concept of active constraints is attractive because in x∗ the active constraints


can be considered equality constraints, while the inactive constraints can be ignored.
In Example 3.2, the unconstrained optimization problem minx∈R x² has exactly the
same solution as (3.1)–(3.2). In Example 3.3, the constrained optimization problem
minx∈R x² subject to x = 1 has exactly the same solution as (3.5)–(3.6).
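Identifying the active set numerically is straightforward; this sketch (ours, not the book's) applies Definition 3.4 to Example 3.3, using a tolerance since equalities rarely hold exactly in floating point:

```python
import numpy as np

# Active set A(x*) for Example 3.3, where g1(x) = x - 4 and
# g2(x) = 1 - x, evaluated at x* = 1.
def g(x):
    return np.array([x - 4.0, 1.0 - x])

def active_set(g_values, tol=1e-8):
    """Indices i such that g_i(x) = 0 up to the tolerance."""
    return [i for i, gi in enumerate(g_values) if abs(gi) <= tol]

x_star = 1.0
print(bool(np.all(g(x_star) <= 0.0)))  # True: x* is feasible
print(active_set(g(x_star)))           # [1]: only g2 is active
```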

Theorem 3.5 (Active constraints). Take a vector x∗ ∈ Rn . Consider the following


optimization problem P1
minx∈Rn f(x)   (3.14)

subject to
g(x) ≤ 0, (3.15)

x ∈ Y ⊆ Rn , (3.16)
where g : Rn −→ Rp is continuous and Y is a subset of Rn . If x∗ is feasible, i.e.,
g(x∗ ) ≤ 0, and if A(x∗ ) ⊆ {1, . . . , p} is the set of indices of the active constraints
in x∗ , i.e.,
A(x∗ ) = {i|gi (x∗ ) = 0}, (3.17)
we consider the following optimization problem P2

minx∈Rn f(x)   (3.18)

subject to
gi (x) = 0, i ∈ A(x∗ ), (3.19)

x ∈ Y ⊆ Rn . (3.20)
Then, x∗ is a local minimum of P1 if and only if x∗ is a local minimum of P2 .

Proof. Sufficient condition. By continuity of g, for each inactive constraint j, there


is a neighborhood of size εj around x∗ such that the constraint j is strictly verified in
the neighborhood. More formally, if gj (x∗ ) < 0, then there exists εj > 0 such that

gj (x) < 0 ∀x such that kx − x∗ k < εj , (3.21)



as illustrated in Figure 3.3. Consider two feasible neighborhoods around x∗ . The first
one is defined as

Y1 (ε) = {x|g(x) ≤ 0, x ∈ Y and kx − x∗ k < ε}, (3.22)

and contains neighbors of x∗ that are feasible for the problem P1 . The second is
defined as

Y2 (ε) = {x|gi (x) = 0 ∀i ∈ A(x∗ ), x ∈ Y and kx − x∗ k < ε}, (3.23)

and contains neighbors of x∗ that are feasible for the problem P2 .


Since x∗ is a local minimum of P1 , according to Definition 1.5, there exists ε̂ > 0
such that f(x∗ ) ≤ f(x), ∀x ∈ Y1 (ε̂).
Consider the smallest neighborhood, and define

ε̃ = min( ε̂ , min_{j ∉ A(x∗ )} εj ) .   (3.24)

We show that
Y2 (ε̃) ⊆ Y1 (ε̂) .   (3.25)

Indeed, take any x in Y2 (ε̃). In order to show that it belongs to Y1 (ε̂), we need to
show that g(x) ≤ 0, x ∈ Y, and ‖x − x∗ ‖ ≤ ε̂. Since ε̃ ≤ ε̂, we have ‖x − x∗ ‖ < ε̃ ≤ ε̂,
and the third condition is immediately verified. The second condition (x ∈ Y) is
inherited from the definition of Y2 (ε̃). We need only to demonstrate that g(x) ≤ 0.
To do this, consider the constraints of A(x∗ ) separately from the others. By definition
of Y2 , we have gi (x) = 0 for i ∈ A(x∗ ), which implies gi (x) ≤ 0. Take j ∉ A(x∗ ).
Since ε̃ ≤ εj , we have gj (x) < 0 according to (3.21), which implies gj (x) ≤ 0. This
completes the proof that g(x) ≤ 0, so that x ∈ Y1 (ε̂).
As a result of (3.25), since x∗ is the best element (in the sense of the objective
function) of Y1 (ε̂) (according to Definition 1.5 of the local minimum), and since x∗
belongs to Y2 (ε̃), it is also the best element of this set, and a local minimum of P2 .

Necessary condition. Let X1 be the set of feasible points of P1 and X2 the set of
feasible points of P2 . We have X2 ⊆ X1 . Let x∗ be a local minimum of P2 . We assume
by contradiction that it is not a local minimum of P1 . Then, for any ε > 0, there
exists x ∈ X1 such that ‖x − x∗ ‖ ≤ ε and f(x) < f(x∗ ). Since x∗ is a local minimum
of P2 , x cannot be feasible for P2 , and x ∉ X2 ⊆ X1 , leading to the contradiction.

We have managed to eliminate some constraints. Unfortunately, this required know-


ing the solution x∗ . This simplification is thus relevant mainly in theory.

[diagram: a ball of radius εj around x∗ contained in the region gj (x) < 0]

Figure 3.3: Strictly feasible neighborhood of x∗ when the inequality constraint j is inactive

3.2 Linear independence of the constraints


The analysis of the structure of the constraints and their management in algorithms
is complex. It is therefore necessary to introduce assumptions that are general enough
not to be restrictive from an operational point of view, and that render it possible
to avoid pathological cases. In particular, the linear independence of the constraints
plays an important role. Even though this concept is relatively intuitive when the
constraints are linear, it is necessary to define it strictly for non linear constraints.
Start with the case where all constraints are defined by affine functions (see Def-
inition 2.25). In this case, when using the techniques of Section 1.2, it is always
possible to express the optimization problem as

min f(x) (3.26)

subject to

Ax = b (3.27)
x ≥ 0 (3.28)

by writing h(x) = b − Ax in (1.72), with A ∈ Rm×n , x ∈ Rn and b ∈ Rm , and


g(x) = −x in (1.73).

Since the inequality constraints are simple, we analyze in more details the system
of equations Ax = b. Like any linear system, three possibilities may arise:
• the system is incompatible, and there is no x such that Ax = b;
• the system is underdetermined, and there is an infinite number of x such that

Ax = b;
• the system is non singular, and there is a unique x that satisfies Ax = b.
In an optimization context, incompatible and non singular systems have little rele-
vance because they leave no degree of freedom to optimize any objective. We thus
only consider underdetermined systems.
If the system is compatible, the rank of A (Definition B.29) gives us information
on the relevance of the various constraints. If the rank is deficient, this means that
certain rows of A form a linear combination of the others, and the corresponding
constraints are redundant. This is formalized by Theorem 3.6 and illustrated by
Example 3.7.

Theorem 3.6 (Redundant constraints). Consider a compatible system of linear


equality constraints Ax = b, with A ∈ Rm×n , m ≤ n. If the rank of A is
deficient, i.e., rank(A)=r < m, then there exists a matrix à ∈ Rr×n of full rank
(i.e., rank(Ã) = r), composed exclusively of rows ℓ1 , . . . , ℓr of A such that

Ãx = b̃ ⇐⇒ Ax = b, (3.29)

where b̃ is composed of elements ℓ1 , . . . , ℓr of b.

Proof. Since the rank of A is r, this signifies that m − r rows of A can be written as
linear combinations of r other rows. Without loss of generality, we can assume that
the last m − r rows are linear combinations of the first r rows. By denoting aTk the
kth row of A, we have
ak = Σj=1..r λjk aj ,   k = r + 1, . . . , m,   and there exists j such that λjk ≠ 0.   (3.30)

Moreover, since by hypothesis the system Ax = b is compatible, for each k = r +


1, . . . , m we have
bk = akT x = Σj=1..r λjk ajT x = Σj=1..r λjk bj .   (3.31)

Denote à the matrix composed of the first r rows of A, and b̃ the vector composed
of the first r components of b. Then, ℓi = i, i = 1, . . . , r.
=⇒ Consider x such that Ãx = b̃. According to the definition of Ã, we have that x
satisfies the first r equations of the system, i.e., aTi x = bi for i = 1, . . . , r. Select
k as an arbitrary index between r + 1 and m, and demonstrate that x satisfies the

corresponding equation. We have


akT x = Σj=1..r λjk ajT x     according to (3.30)
      = Σj=1..r λjk bj        because Ãx = b̃
      = bk                    according to (3.31).

⇐= Let x be such that Ax = b. x satisfies all equations of the system, particularly


the first r ones. Then, it satisfies Ãx = b̃.

Example 3.7 (Redundant system). Take the constraints

x1 + x2 + x3 = 1
x1 − x2 + x4 = 1 (3.32)
x1 − 5x2 − 2x3 + 3x4 = 1

i.e.,

A = [ 1   1   1  0 ]        b = [ 1 ]
    [ 1  −1   0  1 ]            [ 1 ]
    [ 1  −5  −2  3 ] ,          [ 1 ] .   (3.33)
The rank of A is equal to 2, but there are 3 rows (the determinant of any square
submatrix of dimension 3 is zero). This means that one of the rows is a linear
combination of the others. Since the system is compatible (for instance, x1 = 2/3,
x2 = 0,x3 = 1/3, x4 = 1/3 is feasible), one of the constraints must be redundant. In
this case, if aTi represents the ith row of A, we have

a3 = −2a1 + 3a2 . (3.34)

We can remove the 3rd constraint, and the system

x1 + x2 + x3 = 1
(3.35)
x1 − x2 + x4 = 1

is equivalent to the constraint system (3.32).
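The redundancy of Example 3.7 can be detected numerically with the matrix rank (an illustrative sketch):

```python
import numpy as np

# The redundant row of A is revealed by a rank deficiency.
A = np.array([[1.0,  1.0,  1.0, 0.0],
              [1.0, -1.0,  0.0, 1.0],
              [1.0, -5.0, -2.0, 3.0]])
b = np.array([1.0, 1.0, 1.0])

print(np.linalg.matrix_rank(A))  # 2: one of the three rows is redundant
# The third row is the combination -2*a1 + 3*a2 of (3.34), and the same
# combination holds for b, so the system is compatible:
print(np.allclose(A[2], -2.0 * A[0] + 3.0 * A[1]))  # True
print(np.isclose(b[2], -2.0 * b[0] + 3.0 * b[1]))   # True
```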

To generalize this result to non linear constraints h(x) = 0, we must linearize


them by invoking Taylor’s theorem (Theorem C.1) around a point x+ , i.e.,

h(x) = h(x+ ) + ∇h(x+ )T (x − x+ ) + o(‖x − x+ ‖).

In this case, by ignoring the term o of (C.1), h(x) = 0 can be approximated by

h(x+ ) + ∇h(x+ )T x − ∇h(x+ )T x+ = 0



or by
∇h(x+ )T x = ∇h(x+ )T x+ − h(x+ ).

Therefore, the gradients of equality constraints play a similar role as the rows of
the matrix A in (3.27). As for the inequality constraints, we saw in Section 3.1 that

those that are inactive at x+ can be ignored, and that those that are active can be
considered equality constraints. Consequently, we can define the condition of linear
independence as follows.

Definition 3.8 (Linear independence of the constraints). Consider the optimization


problem (1.71)–(1.73) minx∈Rn f(x) subject to h(x) = 0 and g(x) ≤ 0, and x+ a feasi-
ble point. The linear independence of the constraints is satisfied in x+ if the gradients
of the equality constraints and the gradients of the active inequality constraints in
x+ are linearly independent. By abuse of language, it is sometimes simply said that
the constraints are linearly independent.

Example 3.9 (Linear independence of constraints). Take an optimization problem


in R2 with the inequality constraint

g(x) = x21 + (x2 − 1)2 − 1 ≤ 0 (3.36)

and the equality constraint


h(x) = x2 − x21 = 0. (3.37)

We have
   
∇g(x) = (2x1 , 2x2 − 2)T    and    ∇h(x) = (−2x1 , 1)T .

Figure 3.4: Linear independence of the constraints

Consider the point xa = (1, 1)T , which is feasible, and for which the constraint
(3.36) is active. We have

∇g(xa ) = (2, 0)T    and    ∇h(xa ) = (−2, 1)T .

These two vectors are linearly independent, and the linear independence of the
constraints is satisfied in xa . Figure 3.4 represents the normalized vectors

∇g(xa ) / ‖∇g(xa )‖ = (1, 0)T    and    ∇h(xa ) / ‖∇h(xa )‖ = (−2√5/5, √5/5)T .

Consider the point xb = (0, 0)T , which is also feasible, and for which the constraint
(3.36) is active. We have

∇g(xb ) = (0, −2)T    and    ∇h(xb ) = (0, 1)T .

These vectors are represented as normalized in Figure 3.4. They are linearly depen-
dent because ∇g(xb ) = −2∇h(xb ), and the linear independence of the constraints is
not satisfied in xb .
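The rank test of Example 3.9 can be automated (an illustrative sketch: the gradients are stacked as rows and the rank compared with the number of active constraints):

```python
import numpy as np

# Linear independence of the constraint gradients of Example 3.9.
def grad_g(x):
    return np.array([2.0 * x[0], 2.0 * x[1] - 2.0])

def grad_h(x):
    return np.array([-2.0 * x[0], 1.0])

def independent(x):
    G = np.vstack([grad_g(x), grad_h(x)])
    return np.linalg.matrix_rank(G) == G.shape[0]

print(independent(np.array([1.0, 1.0])))  # True: satisfied at xa
print(independent(np.array([0.0, 0.0])))  # False: violated at xb
```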

3.3 Feasible directions


A major difficulty when we develop optimization algorithms is to move within the
feasible set. The analysis of the feasible directions helps us in this task.

Definition 3.10 (Feasible direction). Consider the general optimization problem


(1.71)–(1.74), and the feasible point x ∈ Rn . A direction d is said to be feasible in x
if there exists η > 0 such that x + αd is feasible for any 0 < α ≤ η.

In short, it is a direction that can be followed, at least a little bit, while staying
within the feasible set. Some examples are provided in Figure 3.5, where the feasible
set is the polygon represented by thin lines, feasible directions are represented with
thick plain lines, and infeasible directions with thick dashed lines.

3.3.1 Convex constraints


When the set X of the constraints is convex, the identification of a feasible direction
in x ∈ X depends on the identification of a feasible point y ∈ X, other than x.

Figure 3.5: Feasible (plain) and infeasible (dashed) directions

Theorem 3.11 (Feasible direction in a convex set). Let X be a convex set, and
consider x, y ∈ X, y ≠ x. The direction d = y − x is a feasible direction in x, and
x + αd = x + α(y − x) is feasible for any 0 ≤ α ≤ 1.

Proof. This follows directly from Definition B.2 of a convex set: for any 0 ≤ α ≤ 1,
x + α(y − x) = (1 − α)x + αy belongs to X.

Figure 3.6: Feasible direction in a convex set

Corollary 3.12 (Feasible directions in an interior point). Let X ⊆ Rn and let x be
an interior point of X. Then, any direction d ∈ Rn is feasible in x.

Proof. According to the definition of an interior point (Definition 1.15), there exist
ε > 0 and a (convex) neighborhood V = {z such that ‖z − x‖ ≤ ε} with V ⊆ X.
Consider an arbitrary direction d, and let y = x + εd/‖d‖ be the point where the
direction intersects the border of the neighborhood. Since ‖y − x‖ = ε, we have y ∈ V.
Since V is convex, Theorem 3.11 is invoked to demonstrate that d is feasible.
This result is particularly important. The fact that all directions are feasible at
an interior point gives freedom to algorithms in the selection of the direction. This
is what motivates the method of interior points, as described in Chapter 18.
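This freedom can be checked numerically. The following Python sketch (the set X, the point x, and the random directions are invented for illustration) verifies Corollary 3.12 on the Euclidean ball of radius 2: at an interior point, every direction admits a positive step that stays feasible.

```python
import numpy as np

# Minimal numeric sketch of Corollary 3.12 (the set X, the point x and the
# directions are hypothetical): X is the Euclidean ball of radius 2.
def in_X(z):
    return np.linalg.norm(z) <= 2.0

x = np.array([0.5, -0.5])                     # an interior point: ||x|| < 2
rng = np.random.default_rng(0)
for _ in range(100):
    d = rng.normal(size=2)                    # an arbitrary direction
    # any step up to eta keeps x + alpha * d inside X (triangle inequality;
    # the factor 0.99 only guards against floating-point rounding)
    eta = 0.99 * (2.0 - np.linalg.norm(x)) / np.linalg.norm(d)
    assert all(in_X(x + alpha * d) for alpha in np.linspace(eta / 10, eta, 10))
```

Here `eta` plays the role of η in Definition 3.10.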

3.3.2 Constraints defined by equations and inequations


Here we consider the problem (1.71)–(1.73). Since we have the analytical expressions
of the constraints at our disposal, we characterize the set of feasible directions
directly from these expressions. When the constraints are linear, the characterization
is simple.

Theorem 3.13 (Feasible directions: linear case). Consider the optimization problem
(3.26)–(3.28) min f(x) subject to Ax = b and x ≥ 0, and let x+ be a feasible
point. A direction d is feasible in x+ if and only if
1. Ad = 0, and
2. di ≥ 0 when x+i = 0.

Proof. ⇒ Direct implication. Let d be a feasible direction in x+. According to
Definition 3.10, there exists η > 0 such that x+ + αd is feasible for any 0 < α ≤ η.
It satisfies the constraint (3.27), i.e.,

b = A(x+ + αd) = Ax+ + αAd = b + αAd.

We have that Ad = 0 because α > 0. Now consider i such that x+i = 0. Since x+ + αd
is feasible, we have

x+i + αdi = αdi ≥ 0,

and the second condition is satisfied.


⇐ Inverse implication. Let d be a direction satisfying the two conditions. Consider
the point x+ + αd. It satisfies the constraints (3.27) because Ad = 0. For the
constraints (3.28), we consider three types of indices:
– Consider i such that x+i = 0. In this case, x+i + αdi is non negative because
di ≥ 0 by hypothesis.
– Consider i such that x+i > 0 and di ≥ 0. The same conclusion holds.
– Consider i such that x+i > 0 and di < 0. These are the only indices for which
x+i + αdi may become infeasible. In this case, we have to determine the step
that can be taken in direction d while staying feasible. Define

η = min { −x+i/di | x+i > 0 and di < 0 } > 0.

If we choose α ≤ η, we have

α ≤ η ≤ −x+i/di for all i such that x+i > 0 and di < 0,

and

x+i + αdi ≥ 0

because di < 0.

Then, x+ + αd is feasible if α ≤ η. Since η is positive, d is a feasible direction
according to Definition 3.10.
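The two conditions of Theorem 3.13, as well as the maximum step η computed in the proof, are straightforward to check numerically. A sketch on a small invented instance (the matrix A, the point x+ and the directions are hypothetical):

```python
import numpy as np

# Numeric check of Theorem 3.13 on an invented instance of
# min f(x) s.t. Ax = b, x >= 0. The point x is feasible with x_3 = 0 active.
A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 0.0]])
b = np.array([1.0, 0.0])
x = np.array([0.5, 0.5, 0.0])

def is_feasible_direction(A, x, d, tol=1e-10):
    if np.linalg.norm(A @ d) > tol:            # condition 1: Ad = 0
        return False
    active = x <= tol                          # indices with x_i = 0
    return bool(np.all(d[active] >= -tol))     # condition 2: d_i >= 0 there

d_bad = np.array([1.0, 1.0, -2.0])    # Ad = 0, but d_3 < 0 while x_3 = 0
d_good = np.array([-1.0, -1.0, 2.0])  # Ad = 0 and d_3 >= 0

assert not is_feasible_direction(A, x, d_bad)
assert is_feasible_direction(A, x, d_good)

# maximum step from the proof: eta = min over {i : x_i > 0, d_i < 0} of -x_i/d_i
mask = (x > 0) & (d_good < 0)
eta = np.min(-x[mask] / d_good[mask])          # 0.5 here
z = x + eta * d_good                           # still feasible: Az = b, z >= 0
assert np.allclose(A @ z, b) and np.all(z >= -1e-10)
```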

Corollary 3.14 (Combination of feasible directions: linear case). Consider the
optimization problem (3.26)–(3.28) min f(x) subject to Ax = b and x ≥ 0. Let
x+ be a feasible point and d1, . . . , dk feasible directions in x+. Then, the convex
cone generated by these directions contains feasible directions. That is, any
linear combination with non negative coefficients of these directions, i.e.,

d = α1 d1 + · · · + αk dk, αj ≥ 0 ∀j, (3.38)

is a feasible direction.

Proof. The two conditions of Theorem 3.13 are trivially satisfied for d.
To switch to the non linear case, we must use the gradients of the constraints.
Before doing so, we interpret Theorem 3.13 in terms of gradients.
The first condition of the theorem concerns equality constraints. We have seen
that the rows of the matrix A are the gradients of the constraints, i.e. ∇hi (x) = ai ,
with
hi (x) = aTi x − bi = 0.

In the linear case, the first condition can be written as

∇hi (x)T d = 0 i = 1, . . . , m.

The inequality constraints can be written as

gi(x) = −xi ≤ 0, i = 1, . . . , p,

and

∇gi(x) = (0, . . . , 0, −1, 0, . . . , 0)T,

where the −1 appears in position i, so that ∇gi(x)T d = −di.
The second condition of the theorem can be expressed as follows: “If the constraint
gi (x) is active at x+ , then ∇gi (x+ )T d ≤ 0.” We should also note that if an inequality
constraint is not active at x+ , it does not involve any condition on the direction for
the latter to be feasible.

Unfortunately, the generalization of these results to the non linear case is not
trivial. We develop it in two steps. We first see how to characterize feasible directions
for an inequality constraint. We treat the equality constraint later.
We observe that the gradient of an inequality constraint at a point where it is
active points toward the outside of the constraints, as shown by Example 3.15.

Example 3.15 (Constraint gradients). Consider the subset of R2 defined by the
constraint

(1/2)(x1 − 1)² + (1/2)(x2 − 1)² ≤ 1/2,

represented in Figure 3.7. Considering the formulation (1.73), we have

g(x) = (1/2)(x1 − 1)² + (1/2)(x2 − 1)² − 1/2,

and

∇g(x) = (x1 − 1, x2 − 1)T.
If we evaluate the gradient at different points along the border of the constraint, we
obtain directions pointing towards the outside of the feasible domain, as shown in
Figure 3.7.

Figure 3.7: Constraint gradient
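The observation of Example 3.15 can be verified numerically: at any boundary point, a small step along the gradient leaves the feasible set. A sketch with the same constraint (the sampled boundary points are arbitrary):

```python
import math

# Numeric companion to Example 3.15: the gradient of the active constraint
# g(x) = 0.5(x1-1)^2 + 0.5(x2-1)^2 - 0.5 <= 0 points outward.
def g(x1, x2):
    return 0.5 * (x1 - 1) ** 2 + 0.5 * (x2 - 1) ** 2 - 0.5

def grad_g(x1, x2):
    return (x1 - 1, x2 - 1)

for t in [0.0, 0.5 * math.pi, 1.0, 2.5]:
    # boundary point: (1 + cos t, 1 + sin t) satisfies g = 0
    x1, x2 = 1 + math.cos(t), 1 + math.sin(t)
    g1, g2 = grad_g(x1, x2)
    assert abs(g(x1, x2)) < 1e-12
    # a small step along the gradient is infeasible (g becomes positive)
    assert g(x1 + 1e-3 * g1, x2 + 1e-3 * g2) > 0
```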



Intuitively, the gradient direction and the directions that form an acute angle with
it cannot be considered as feasible directions.

Theorem 3.16 (Feasible directions: an inequality constraint). Let g : Rn → R be
a differentiable function, and consider x+ ∈ Rn such that g(x+) ≤ 0.
1. If the constraint g(x) ≤ 0 is inactive at x+, all directions are feasible in x+.
2. If the constraint is active at x+, and ∇g(x+) ≠ 0, a direction d is feasible in
x+ if
∇g(x+)T d < 0. (3.39)

Proof. 1. If the constraint is inactive at x+, then x+ is an interior point, and
Corollary 3.12 applies.
2. Consider the case where the constraint is active at x+, that is g(x+) = 0. Let d be
a direction satisfying (3.39). According to Definition 2.10, d is a descent direction
for g in x+. We can apply Theorem 2.11 to determine that there exists η > 0 such
that
g(x+ + αd) < g(x+) = 0, ∀ 0 < α ≤ η,
and conclude that d is feasible.

It is important to note that (3.39) is a sufficient condition for a feasible direction
when the constraint is active, but it is not a necessary condition. Indeed, in the
linear case, we have a similar condition with a non strict inequality. We still have
to discuss the case where ∇g(x+)T d = 0, which occurs when the gradient is zero, or
when the direction d is perpendicular to the gradient. In this case, we invoke
Theorem C.1 (Taylor's theorem) to obtain

g(x+ + αd) = g(x+) + αdT ∇g(x+) + o(α‖d‖) = o(α).

In this case, nothing guarantees the existence of an α such that x+ + αd is feasible.
However, the infeasibility can be made as small as desired by choosing a sufficiently
small α.
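Condition (3.39) and the limiting case ∇g(x+)T d = 0 can both be illustrated numerically with the constraint of Example 3.15, active at the (arbitrarily chosen) point x+ = (0, 1):

```python
# Sketch of Theorem 3.16 with the constraint of Example 3.15,
# g(x) = 0.5(x1-1)^2 + 0.5(x2-1)^2 - 0.5 <= 0, active at x+ = (0, 1),
# where grad g(x+) = (-1, 0).
def g(x1, x2):
    return 0.5 * (x1 - 1) ** 2 + 0.5 * (x2 - 1) ** 2 - 0.5

alphas = [10 ** (-k) for k in range(1, 6)]

# d = (1, 0) satisfies (3.39): grad g(x+)^T d = -1 < 0, so small steps stay
# feasible
assert all(g(0 + a, 1) < 0 for a in alphas)

# d = (0, 1) gives grad g(x+)^T d = 0: the theorem is silent, and here every
# small step is slightly infeasible, with an o(alpha) violation: g = 0.5 a^2
assert all(0 < g(0, 1 + a) < a for a in alphas)
```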
We now analyze the feasible directions for an equality constraint

h(x) = 0.

We can express this constraint in an equivalent manner as

h(x) ≤ 0
−h(x) ≤ 0.

If x+ is feasible, these two inequality constraints are active. However, no direction d
can simultaneously satisfy (3.39) for the two inequalities. This condition is therefore
unusable for equality constraints.

Since the equality constraints pose a problem, we address the problem in the other
direction. Instead of positioning ourselves on a feasible point x+ and wondering how
to reach another one, we attempt to identify the ways to reach x+ while remaining
feasible. To do this, we introduce the concept of feasible sequences.

Definition 3.17 (Feasible sequences). Consider the optimization problem (1.71)–
(1.74), and a feasible point x+ ∈ Rn. A sequence (xk)k∈N, with xk ∈ Rn for any k,
is said to be a feasible sequence in x+ if the following conditions are satisfied:
1. limk→∞ xk = x+,
2. there exists k0 such that xk is feasible for any k ≥ k0,
3. xk ≠ x+ for all k.
The set of feasible sequences in x+ is denoted by S(x+).

Example 3.18 (Feasible sequence). Consider the constraint in R2

h(x) = x1² − x2 = 0,

and the feasible point x+ = (0, 0)T. The sequence defined by

xk = (1/k, 1/k²)T

satisfies the three conditions of Definition 3.17 and belongs to S(x+). It is illustrated
in Figure 3.8.

Figure 3.8: Example of a feasible sequence

Given that the sequence (xk)k is feasible in x+, we consider the normalized
directions connecting x+ to xk:

dk = (xk − x+)/‖xk − x+‖. (3.40)

We should keep in mind that these directions are generally not feasible directions.
We are looking at what happens at the limit.
Example 3.19 (Feasible direction at the limit). We consider once again Example 3.18.
We have xk − x+ = xk,

‖xk − x+‖ = √(k² + 1)/k²,

and

dk = (k/√(k² + 1), 1/√(k² + 1))T.

At the limit, we obtain

d = limk→∞ dk = (1, 0)T.

These directions are illustrated in Figure 3.9. Note that

∇h(x+) = (0, −1)T and ∇h(x+)T d = 0.

Figure 3.9: Example of a feasible direction at the limit

Unfortunately, it is not always possible to position oneself at the limit, as shown
in Example 3.20, where the sequence is not convergent and has two accumulation
points. In this case, we should consider the limit of subsequences in order to identify
the feasible directions at the limit.

Example 3.20 (Feasible direction at the limit). As for Example 3.18, we consider
the constraint in R2

h(x) = x1² − x2 = 0,

and the feasible point x+ = (0, 0)T. The sequence defined by

xk = ((−1)^k/k, 1/k²)T

satisfies the three conditions of Definition 3.17 and belongs to S(x+). The calculation
of the directions gives

dk = ((−1)^k k/√(k² + 1), 1/√(k² + 1))T,

and limk→∞ dk does not exist. However, if we consider the subsequences defined by
the even and odd indices, respectively, we obtain

d′ = limk→∞ d2k = (1, 0)T

and

d′′ = limk→∞ d2k+1 = (−1, 0)T,

as illustrated in Figure 3.10. Note once again that ∇h(x+)T d′ = ∇h(x+)T d′′ = 0.

Figure 3.10: Example of a feasible sequence
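The computations of Examples 3.18–3.20 can be reproduced numerically; the following sketch checks the feasibility of the sequence and the two limit directions of the even and odd subsequences:

```python
import math

# Numeric companion to Examples 3.18-3.20: for h(x) = x1^2 - x2 = 0 and
# x+ = (0, 0), the sequence xk = ((-1)^k / k, 1 / k^2) is feasible and its
# normalized directions have the two limit points d' = (1, 0), d'' = (-1, 0).
def d_k(k):
    xk = ((-1) ** k / k, 1 / k ** 2)
    assert abs(xk[0] ** 2 - xk[1]) < 1e-12     # each xk satisfies h(xk) = 0
    norm = math.hypot(xk[0], xk[1])
    return (xk[0] / norm, xk[1] / norm)

d_even = d_k(10 ** 6)          # even subsequence -> close to d'  = ( 1, 0)
d_odd = d_k(10 ** 6 + 1)       # odd subsequence  -> close to d'' = (-1, 0)
assert abs(d_even[0] - 1) < 1e-5 and abs(d_even[1]) < 1e-5
assert abs(d_odd[0] + 1) < 1e-5 and abs(d_odd[1]) < 1e-5
# both limit directions are orthogonal to grad h(x+) = (0, -1)
assert abs(0 * d_even[0] + (-1) * d_even[1]) < 1e-5
```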

We can now formally define the notion of a feasible direction at the limit.

Definition 3.21 (Feasible direction at the limit). Consider the optimization problem
(1.71)–(1.74), and a feasible point x+ ∈ Rn. Let (xk)k∈N be a feasible sequence in
x+. The direction d ≠ 0 is a feasible direction at the limit in x+ for the sequence
(xk)k∈N if there exists a subsequence (xki)i∈N such that

d/‖d‖ = limi→∞ (xki − x+)/‖xki − x+‖. (3.41)

Note that any feasible direction d is also a feasible direction at the limit. Just
take the feasible sequence xk = x+ + (1/k)d in Definition 3.21.

Definition 3.22 (Tangent cone). A feasible direction at the limit is also called a
tangent direction. The set of all tangent directions in x+ is called the tangent cone
and denoted by T (x+ ).

We can now make the connection between this concept and the constraint gradient.
According to Theorem 3.16 and the associated comments, we consider all directions
that form an obtuse angle with the active inequality constraint gradients and those
that are orthogonal to the equality constraint gradients.

Definition 3.23 (Linearized cone). Consider the optimization problem (1.71)–(1.74),
and a feasible point x+ ∈ Rn. We call the linearized cone in x+, denoted by D(x+),
the set of directions d such that

dT ∇gi (x+ ) ≤ 0, ∀i = 1, . . . , p such that gi (x+ ) = 0, (3.42)

and
dT ∇hi (x+ ) = 0, i = 1, . . . , m, (3.43)
and of all their multiples, i.e.,

{αd|α > 0 and d satisfies (3.42) and (3.43)} . (3.44)
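Definition 3.23 translates directly into a membership test. A sketch (the helper function and its arguments are hypothetical), applied to the constraint of Example 3.20 at x+ = (0, 0):

```python
import numpy as np

# Membership test for the linearized cone D(x+) of Definition 3.23, sketched
# for the single equality constraint h(x) = x1^2 - x2 of Example 3.20, at
# x+ = (0, 0). No inequality constraint is active here.
def in_linearized_cone(d, grad_h_list, grad_g_active_list, tol=1e-10):
    ok_h = all(abs(np.dot(d, gh)) <= tol for gh in grad_h_list)     # (3.43)
    ok_g = all(np.dot(d, gg) <= tol for gg in grad_g_active_list)   # (3.42)
    return ok_h and ok_g

grad_h = np.array([0.0, -1.0])   # grad h(x+) for h(x) = x1^2 - x2

# the two limit directions of Example 3.20 belong to D(x+) ...
assert in_linearized_cone(np.array([1.0, 0.0]), [grad_h], [])
assert in_linearized_cone(np.array([-1.0, 0.0]), [grad_h], [])
# ... while a direction not tangent to the parabola does not
assert not in_linearized_cone(np.array([0.0, 1.0]), [grad_h], [])
```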

Theorem 3.24 (Feasible directions at the limit). Consider the optimization
problem (1.71)–(1.74), and a feasible point x+ ∈ Rn. Any feasible direction at the
limit in x+ belongs to the linearized cone in x+, that is

T (x+ ) ⊆ D(x+ ). (3.45)



Proof. Let d be a normalized feasible direction at the limit, and (xk)k a feasible
sequence such that

d = limk→∞ (xk − x+)/‖xk − x+‖. (3.46)

1. Consider an active inequality constraint, i.e.,

gi(x+) = 0.

For k sufficiently large that xk is feasible, we can write

gi(x+) + (xk − x+)T ∇gi(x+) + o(‖xk − x+‖) = gi(xk) ≤ 0,

invoking Theorem C.1 (Taylor's theorem), and

(xk − x+)T ∇gi(x+)/‖xk − x+‖ + o(‖xk − x+‖)/‖xk − x+‖ ≤ 0.

It then suffices to take the limit, and to use (3.46) and Definition B.17, to
obtain (3.42).
2. Consider an equality constraint,

hi (x+ ) = 0.

This constraint is equivalent to two inequality constraints

hi (x+ ) ≤ 0
−hi (x+ ) ≤ 0

which are both active at x+ . According to the first point already demonstrated,
we have 0 ≤ dT ∇hi (x+ ) and 0 ≤ −dT ∇hi (x+ ), and obtain (3.43).

Example 3.25 (Illustration of Theorem 3.24). Returning to Example 3.20, we can
observe in Figure 3.11 that the two feasible directions at the limit are orthogonal to
the constraint gradient in x+, i.e.,

∇h(x+) = (0, −1)T.

Theorem 3.24 does not yet provide a characterization of feasible directions at the
limit. Nevertheless, such a characterization is important, especially since the notion
of linearized cone is easier to handle than the concept of feasible direction at the limit.
Unfortunately, such a characterization does not exist in the general case. Therefore,
it is useful to assume that any element of the linearized cone is a feasible direction at
the limit. This hypothesis is called a constraint qualification.1

1 Several constraint qualifications have been proposed in the literature. This one is sometimes
called the Abadie Constraint Qualification, from the work by Abadie (1967).

Figure 3.11: Gradient and feasible directions at the limit

Definition 3.26 (Constraint qualification). Consider the optimization problem
(1.71)–(1.74), and let x+ be a feasible point. The constraint qualification condition
is satisfied if any element of the linearized cone in x+ is a feasible direction at
the limit in x+, that is if

T(x+) = D(x+). (3.47)

This hypothesis, seemingly restrictive, is satisfied in a number of cases. In
particular, when the constraints are defined solely by equations and inequations,
each of the following conditions is sufficient for the constraint qualification to hold.
• If the constraints (1.71)–(1.73) are linear, the constraint qualification is satisfied
at all feasible points.
• If the constraints are linearly independent in x+ (Definition 3.8), the constraint
qualification is satisfied at x+.
• If there exists a vector d ∈ Rn such that
1. ∇hi(x+)T d = 0, for any i = 1, . . . , m,
2. ∇gi(x+)T d < 0, for any i = 1, . . . , p such that gi(x+) = 0,
and such that the equality constraints are linearly independent in x+, then the
constraint qualification is satisfied at x+ (Mangasarian and Fromovitz, 1967).
• If there is no equality constraint, the functions gi are convex, and there exists a
vector x− such that

gi(x−) < 0 for any i = 1, . . . , p such that gi(x+) = 0,

then the constraint qualification is satisfied at x+ (Slater, 1950).


We develop the proof for the first two conditions.
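The second sufficient condition, linear independence of the constraint gradients, can be tested numerically by stacking the gradients as rows and checking the rank of the resulting matrix. A sketch (the helper function is hypothetical), reusing the gradients computed earlier at the points xa and xb:

```python
import numpy as np

# Linear independence of active constraint gradients: stack them as rows
# and check that the matrix has full row rank.
def constraints_linearly_independent(gradients):
    G = np.vstack(gradients)                    # one gradient per row
    return np.linalg.matrix_rank(G) == G.shape[0]

# at xa: grad g = (2, 0) and grad h = (-2, 1) -> independent
assert constraints_linearly_independent([[2.0, 0.0], [-2.0, 1.0]])
# at xb: grad g = (0, -2) = -2 grad h -> dependent
assert not constraints_linearly_independent([[0.0, -2.0], [0.0, 1.0]])
```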

Theorem 3.27 (Characterization of feasible directions at the limit – I). Consider
the optimization problem (1.71)–(1.73), and a feasible point x+ ∈ Rn such that
all constraints active in x+ are linear. A direction d such that ‖d‖ = 1 is
feasible at the limit in x+ if and only if it belongs to the linearized cone D(x+),
that is

T(x+) = D(x+). (3.48)

Proof. Theorem 3.24 shows that T (x+ ) ⊆ D(x+ ). To demonstrate that D(x+ ) ⊆
T (x+ ), consider a normalized direction2 d that belongs to the linearized cone D(x+ ).
We need to create a feasible sequence (xk )k , such that (3.41) is satisfied.
For each inequality constraint i active at x+ , we have

gi (x) = aTi x − bi , (3.49)

and for each equality constraint, we have

hi (x) = āTi x − b̄i . (3.50)

Following the arguments developed in Theorem 3.5, there exists ε > 0 such that all
constraints that are inactive at x+ are also inactive at any point of the sphere of
radius ε centered in x+. Consider the sequence (xk)k with

xk = x+ + (ε/k)d, k = 1, 2, . . . (3.51)
Each xk is situated in the sphere mentioned above, and satisfies the inequality
constraints that are inactive at x+. For the inequality constraints that are active
at x+, we have

gi(xk) = gi(xk) − gi(x+)            because gi(x+) = 0
       = aTi xk − bi − aTi x+ + bi  according to (3.49)
       = aTi (xk − x+)
       = (ε/k) aTi d                according to (3.51).

Since d is in the linearized cone at x+, ∇gi(x+)T d = aTi d ≤ 0, and xk is feasible for
any inequality constraint. For the equality constraints, we obtain in a similar manner

hi(xk) = (ε/k) āTi d.

Since d is in the linearized cone at x+, ∇hi(x+)T d = āTi d = 0, and xk is feasible
for any equality constraint. The sequence (xk)k is indeed a feasible sequence, in the
sense of Definition 3.17.
2 i.e., such that ‖d‖ = 1.

Finally, we need only deduce from ‖d‖ = 1 that, for any k,

(xk − x+)/‖xk − x+‖ = (ε/k)d/(ε/k) = d,

to conclude that d is indeed a feasible direction at the limit.



Theorem 3.28 (Characterization of feasible directions at the limit – II). Consider
the optimization problem (1.71)–(1.73), and a feasible point x+ ∈ Rn at which
the constraints are linearly independent. Any d such that ‖d‖ = 1 is feasible at
the limit at x+ if and only if it belongs to the linearized cone D(x+), that is

T(x+) = D(x+). (3.52)

Proof. Theorem 3.24 shows that T(x+) ⊆ D(x+). To demonstrate that D(x+) ⊆
T(x+), consider a normalized direction d that belongs to the linearized cone D(x+).
We create a feasible sequence (xk)k such that (3.41) is satisfied. We create it
implicitly and not explicitly.
To simplify the notations, we first assume that all constraints are equality
constraints. We consider the Jacobian matrix of the constraints in x+,
∇h(x+)T ∈ Rm×n, whose rows are the constraint gradients in x+ (see Definition 2.18).
Since the constraints are linearly independent, the Jacobian matrix is of full rank.
Consider a matrix Z ∈ Rn×(n−m) whose columns form a basis of the kernel of
∇h(x+)T, i.e., such that ∇h(x+)T Z = 0. We apply the implicit function theorem
(Theorem C.6) to the parameterized function F : R × Rn → Rn defined by

F(µ, x) = ( h(x) − µ∇h(x+)T d ; ZT(x − x+ − µd) ). (3.53)

The assumptions of Theorem C.6 are satisfied for µ = 0 and x = x+ . Indeed,

∇x F(µ, x) = (∇h(x) Z)

is non singular since the columns of Z are orthogonal to those of ∇h(x), and since
the two submatrices are of full rank. Then, we have a function φ such that

x+ = φ(0) (3.54)

and, for µ sufficiently close to zero,

F(µ, φ(µ)) = ( h(φ(µ)) − µ∇h(x+)T d ; ZT(φ(µ) − x+ − µd) ) = 0. (3.55)

Since d is in the linearized cone, we deduce from the first part of (3.55) that

h(φ(µ)) = µ∇h(x+)T d = 0, (3.56)



and φ(µ) is feasible. We use φ to build a feasible sequence. To do so, we show that
φ(µ) ≠ x+ when µ ≠ 0. Assume by contradiction that φ(µ) = x+. In this case,

F(µ, x+) = ( h(x+) − µ∇h(x+)T d ; ZT(x+ − x+ − µd) ) = ( −µ∇h(x+)T d ; −µZT d ) = 0. (3.57)

If µ ≠ 0, since the matrices ∇h(x+)T and Z are of full rank, we deduce that d = 0,
which is impossible since ‖d‖ = 1. Then, if µ ≠ 0, we necessarily have φ(µ) ≠ x+.
We are now able to generate a feasible sequence. To do so, we consider a sequence
(µk)k, with µk ≠ 0, such that limk→∞ µk = 0. Then, the sequence

xk = φ(µk) (3.58)

satisfies all the conditions of a feasible sequence (see Definition 3.17). We now need
to demonstrate that d is a feasible direction at the limit.
For k sufficiently large, such that µk is sufficiently close to zero, we use a Taylor
series of h around x+,

h(xk) = h(x+) + ∇h(x+)T(xk − x+) + o(‖xk − x+‖)
      = ∇h(x+)T(xk − x+) + o(‖xk − x+‖),

in (3.55) to obtain

0 = F(µk, xk)
  = ( ∇h(x+)T(xk − x+) + o(‖xk − x+‖) − µk ∇h(x+)T d ; ZT(xk − x+ − µk d) )
  = ( ∇h(x+)T(xk − x+ − µk d) + o(‖xk − x+‖) ; ZT(xk − x+ − µk d) )
  = ( ∇h(x+)T ; ZT )(xk − x+ − µk d) + o(‖xk − x+‖).

Dividing by ‖xk − x+‖, we obtain

0 = ( ∇h(x+)T ; ZT ) ( (xk − x+)/‖xk − x+‖ − (µk/‖xk − x+‖)d ) + o(‖xk − x+‖)/‖xk − x+‖.

Then, since the matrix stacking ∇h(x+)T and ZT is non singular, we have

limk→∞ ( (xk − x+)/‖xk − x+‖ − (µk/‖xk − x+‖)d ) = 0. (3.59)

Define

d̃ = limk→∞ (xk − x+)/‖xk − x+‖  and  c = limk→∞ µk/‖xk − x+‖,

so that (3.59) is written as

d̃ = cd.

Since ‖d̃‖ = ‖d‖ = 1, we have c = 1 and

limk→∞ (xk − x+)/‖xk − x+‖ = d.

Then, d is indeed a feasible direction at the limit.

To be completely accurate, we must also consider the case with inequality constraints.
For those that are active at x+, the reasoning is identical, with the exception of
(3.56), which becomes

g(φ(µ)) = µ∇g(x+)T d ≤ 0,

from which we deduce the feasibility of φ(µ). Inactive constraints do not pose a
problem, since there is a sphere around x+ in which all points satisfy these
constraints. Since Definition 3.17 is asymptotic, we can always choose k sufficiently
large such that xk belongs to this sphere.

Feasible directions at the limit are an extension of the concept of a feasible direction.
They enable us to identify the directions in which an infinitesimal displacement
remains feasible. Unfortunately, the definition is too complex to be operational. The
linearized cone, based on the constraint gradients, is directly accessible to
calculation. We usually assume that the constraint qualification is satisfied.

3.4 Elimination of constraints


Optimization problems without constraints are simpler than those with constraints.
We now analyze techniques to eliminate constraints.
We start with the optimization problem (3.26)–(3.28) min f(x) subject to Ax = b
and x ≥ 0, where the constraints are linear. We assume that we have a system
of constraints of full rank, obtained after eliminating any redundant constraint (see
Theorem 3.6). It is then possible to simplify the problem by eliminating certain
variables, as shown in Example 3.29.
Example 3.29 (Elimination of variables). Consider the following optimization
problem:

min f(x1, x2, x3, x4) = x1² + sin(x3 − x2) + x4 + 1 (3.60)

subject to

x1 + x2 + x3 = 1
x1 − x2 + x4 = 1. (3.61)

We can rewrite the constraints in the following manner:

x3 = 1 − x1 − x2
x4 = 1 − x1 + x2. (3.62)

Thus, the optimization problem can be rewritten so as to depend only on the two
variables x1 and x2:

min f(x1, x2) = x1² + sin(−x1 − 2x2 + 1) − x1 + x2 + 2, (3.63)

without constraint.

To generalize Example 3.29, we consider the constraints

Ax = b (3.64)

with A ∈ Rm×n, m ≤ n, x ∈ Rn, b ∈ Rm, and rank(A) = m. We choose m
linearly independent columns of A, corresponding to the variables that we wish to
eliminate. Apply a permutation P ∈ Rn×n of the columns of A in such a way that
the m selected columns are the leftmost ones:

AP = (B N), (3.65)

where B ∈ Rm×m contains the first m columns of AP, and N ∈ Rm×(n−m) contains
the last n − m. Recalling that PPT = I, we write (3.64) in the following manner:

Ax = AP(PT x) = BxB + NxN = b, (3.66)

where xB ∈ Rm contains the first m components of PT x, and xN ∈ Rn−m contains
the last n − m. Since the first m columns of AP are linearly independent, the
matrix B is invertible. We can write

xB = B−1(b − NxN). (3.67)

By adopting this convention, we consider the optimization problem with linear
equality constraints

min_{xB,xN} f( P (xB ; xN) ) (3.68)

subject to

BxB + NxN = b. (3.69)

It is equivalent to the unconstrained problem

min_{xN} f( P (B−1(b − NxN) ; xN) ). (3.70)

The variables xB are called basic variables, and the variables xN non basic variables.
Example 3.30 (Elimination of variables – II). In Example 3.29, we have

A = ( 1   1  1  0 )      b = ( 1 )
    ( 1  −1  0  1 ),         ( 1 ).

The variables to eliminate are x3 and x4. They correspond to the last two columns
of the constraint matrix, and we choose the permutation matrix that makes them the
first two, that is

P = ( 0 0 1 0 )
    ( 0 0 0 1 )
    ( 1 0 0 0 )
    ( 0 1 0 0 ),

to obtain

AP = (B|N),   B = ( 1 0 )    N = ( 1   1 )
                  ( 0 1 ),       ( 1  −1 ),

and

xB = ( x3 ) = B−1(b − NxN) = ( 1 − x1 − x2 )
     ( x4 )                  ( 1 − x1 + x2 ),

which is exactly (3.62).
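The computation of Example 3.30 can be reproduced numerically; a sketch (the value chosen for xN is arbitrary):

```python
import numpy as np

# Example 3.30 redone numerically: eliminate x3, x4 from Ax = b by splitting
# the columns of A into B (basic) and N (non basic), then computing
# xB = B^{-1}(b - N xN), as in (3.67).
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, -1.0, 0.0, 1.0]])
b = np.array([1.0, 1.0])
B = A[:, [2, 3]]            # columns of x3, x4 (the identity here)
N = A[:, [0, 1]]            # columns of x1, x2

def eliminate(x1, x2):
    xN = np.array([x1, x2])
    return np.linalg.solve(B, b - N @ xN)   # (x3, x4)

x3, x4 = eliminate(0.2, 0.3)
assert np.isclose(x3, 1 - 0.2 - 0.3)        # x3 = 1 - x1 - x2
assert np.isclose(x4, 1 - 0.2 + 0.3)        # x4 = 1 - x1 + x2
# the reconstructed point satisfies the original constraints
assert np.allclose(A @ np.array([0.2, 0.3, x3, x4]), b)
```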

It is relatively easy to remove linear equality constraints. However, note that the
calculation of the matrix B−1 can be tedious, especially when m is large, and it is
sometimes preferable to keep the constraints explicitly in the problem.
The elimination of non linear constraints can be problematic. An interesting
example, proposed as an exercise by Fletcher (1983) and again by Nocedal and Wright
(1999), illustrates this difficulty.
Example 3.31 (Elimination of non linear constraints). Consider the problem

min_x f(x1, x2) = x1² + x2²

subject to

(x1 − 1)³ = x2²,

shown in Figure 3.12.

Figure 3.12: The problem in Example 3.31

The solution to this problem is (1, 0). If we eliminate x2, we obtain an optimization
problem without constraint:

min_{x1} f̃(x1) = x1² + (x1 − 1)³.

However, this new problem has no solution, since f̃ is unbounded, i.e.,

lim_{x1→−∞} f̃(x1) = −∞,

as shown in Figure 3.13. The problem is that the substitution can only be performed
if x1 ≥ 1, since x2² must necessarily be non negative. This implicit constraint of the
original problem should be explicitly incorporated in the problem after elimination.
It plays a crucial role, since it is active at the solution.

Figure 3.13: The problem without constraint in Example 3.31

One must thus be cautious when eliminating non linear constraints.
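The pitfall of Example 3.31 is easy to observe numerically: the problem obtained after elimination is unbounded below, while on the legitimate domain x1 ≥ 1 the minimum value f̃(1) = 1 is attained.

```python
# Numeric illustration of Example 3.31: after eliminating x2 via
# x2^2 = (x1 - 1)^3, the function f~(x1) = x1^2 + (x1 - 1)^3 is unbounded
# below, because the implicit constraint x1 >= 1 was dropped.
def f_tilde(x1):
    return x1 ** 2 + (x1 - 1) ** 3

# f~ decreases without bound as x1 -> -infinity ...
assert f_tilde(-10) > f_tilde(-100) > f_tilde(-1000)
# ... while on the legitimate domain x1 >= 1 the minimum is at x1 = 1
assert all(f_tilde(x) >= f_tilde(1) for x in [1 + 0.01 * k for k in range(500)])
```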

3.5 Linear constraints


When the constraints are linear, a more detailed analysis can be performed. We first
give a geometric description of the constraints. Then, we show that the geometric
concepts have algebraic counterparts.

3.5.1 Polyhedron
We analyze the linear constraints (3.27)–(3.28) from a geometrical point of view. The
central concept in this context is the polyhedron.

Definition 3.32 (Polyhedron). A polyhedron is a set of points of Rn delimited by
hyperplanes, i.e.,

{x ∈ Rn | Ax ≤ b}, (3.71)

with A ∈ Rm×n and b ∈ Rm.

By employing the techniques discussed in Section 1.2, it is always possible to


transform an optimization problem with general linear constraints into a problem
with the constraints Ax ≤ b. Thus, the set of feasible points in an optimization
problem with linear constraints is a polyhedron. To make the most of the technique
of elimination of variables mentioned above, it is helpful to use the representation of
a polyhedron called representation in standard form.

Definition 3.33 (Polyhedron represented in standard form). A polyhedron
represented in standard form is a polyhedron defined in the following manner:

{x ∈ Rn | Ax = b, x ≥ 0}, (3.72)

where A ∈ Rm×n and b ∈ Rm.

Note that according to Theorem 3.6, the matrix A is assumed to be of full rank
without loss of generality.
The identification of vertices or extreme points of a polyhedron is possible thanks
to the technique of elimination of variables described above. We begin by formally
defining a vertex.

Definition 3.34 (Vertex). Let P be a polyhedron. A vector x ∈ P is a vertex of P
if it is impossible to find two vectors y and z in P, both different from x, such that
x is a convex combination (Definition B.3) of y and z, i.e., such that there exists a
real number 0 < λ < 1 with

x = λy + (1 − λ)z. (3.73)

Definition 3.34 is illustrated by Figure 3.14, where x is a vertex. If we choose
y ∈ P, it is impossible to find a z in P such that x is a convex combination of y and
z. On the other hand, x̃ is not a vertex: it is a convex combination of ỹ
and z̃.
We can identify the vertices of a polyhedron represented in standard form by using
the following procedure:
1. Choose m variables to eliminate.
2. Identify the matrix B that contains the corresponding columns of A.
3. Take xN = 0.
4. In this case, (3.67) gives xB = B−1b. If xB ≥ 0, then x = (xTB xTN)T is a
vertex of the polyhedron.

Figure 3.14: Illustration of Definition 3.34

We formalize this result with the following theorem.

Theorem 3.35 (Identification of vertices). Let P = {x ∈ Rn | Ax = b, x ≥ 0} be
a polyhedron represented in standard form, with A ∈ Rm×n of full rank, b ∈ Rm,
and n ≥ m. Consider m linearly independent columns of A, call B the matrix
containing these m columns, and N the matrix containing the remaining n − m
columns, so that

AP = (B|N), (3.74)

where P is the appropriate permutation matrix. Consider the vector

x = P ( B−1b ; 0Rn−m ). (3.75)

If B−1b ≥ 0, then x is a vertex of P.

Proof. Without loss of generality, and to simplify the notations in the proof, we
assume that the m chosen columns are the first m, so that the permutation matrix
P = I. We assume by contradiction that there exist y, z ∈ P, y ≠ x, z ≠ x, and
0 < λ < 1 such that

x = λy + (1 − λ)z. (3.76)

After decomposition, we obtain

xB = λyB + (1 − λ)zB (3.77)

and

xN = λyN + (1 − λ)zN. (3.78)

Since y and z are in P, we have yN ≥ 0 and zN ≥ 0. Since 0 < λ < 1, the only way
to have xN = 0 is that yN = zN = 0. Then,

yB = B−1(b − NyN) = B−1b = xB, (3.79)

and
zB = B−1 (b − NzN ) = B−1 b = xB . (3.80)
We obtain x = y = z, which contradicts the fact that y and z are different from x,
and proves the result.
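The procedure above, formalized in Theorem 3.35, can be turned into a brute-force vertex enumeration for small instances. A sketch (the data are invented for illustration; enumerating all bases is exponential in general and only meant as an illustration):

```python
import numpy as np
from itertools import combinations

# Vertex enumeration following Theorem 3.35, on an invented standard-form
# polyhedron {x | Ax = b, x >= 0}: enumerate the m-column bases B of A and
# keep those with B^{-1} b >= 0.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, -1.0, 0.0, 1.0]])
b = np.array([1.0, 1.0])
m, n = A.shape

vertices = set()
for cols in combinations(range(n), m):
    B = A[:, cols]
    if abs(np.linalg.det(B)) < 1e-10:
        continue                      # columns not linearly independent
    xB = np.linalg.solve(B, b)
    if np.all(xB >= -1e-10):          # basic solution is feasible: a vertex
        x = np.zeros(n)
        x[list(cols)] = xB
        vertices.add(tuple(np.round(x, 9)))

# three distinct vertices for this instance
assert vertices == {(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 2.0), (0.0, 0.0, 1.0, 1.0)}
```

Note that several bases can yield the same (degenerate) vertex, which is why the results are collected in a set.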

It then appears that the vertices can be characterized by the set of active con-
straints.

Theorem 3.36 (Vertices and active constraints). Let P = {x ∈ Rn | Ax = b, x ≥ 0}
be a polyhedron represented in standard form, with A ∈ Rm×n and b ∈ Rm and
n ≥ m. Let x∗ ∈ P, and let

A(x∗) = {i | x∗i = 0}

be the set of indices of the active constraints. Then x∗ is a vertex of P if and only if
the linear manifold

L(x∗) = {x ∈ Rn | Ax = b and xi = 0 ∀i ∈ A(x∗)} (3.81)

is zero-dimensional, i.e., L(x∗) = {x∗}.

Proof. =⇒ Direct implication. Let x∗ be a vertex. Assume by contradiction that
L(x∗) is not zero-dimensional. There is then a straight line in L(x∗) characterized
by the equation

x∗ + λd, λ ∈ R,

with d ∈ Rn, d ≠ 0 and Ad = 0 such that di = 0 for any i ∈ A(x∗) (see
Theorem 3.13). For every i such that di ≠ 0, we define

αi = −x∗i/di.

According to Definition (3.81) of the linear manifold, αi ≠ 0 for any such i. Indeed,
if di ≠ 0, then i ∉ A(x∗), and x∗i > 0. We are now able to find two points of the
polyhedron such that x∗ is a convex combination of these points, contradicting
the fact that it is a vertex.
Consider α1 = min_i {αi | αi > 0}. If no αi is positive, we take α1 = 1. Similarly,
α2 = max_i {αi | αi < 0}. If no αi is negative, we take α2 = −1. Then, the points
y = x∗ + α1 d and z = x∗ + α2 d belong by construction to the polyhedron P.
Moreover, x∗ = λy + (1 − λ)z, with

λ = −α2/(α1 − α2).

Since α1 > 0 and α2 < 0, we have 0 < λ < 1, and x∗ is a convex combination of
y and z.
⇐= Inverse implication. Consider x∗ ∈ P such that L(x∗) = {x∗}. We assume by
contradiction that x∗ is not a vertex of the polyhedron P. There then exist y
and z in P such that x∗ = (y + z)/2, by arbitrarily taking λ = 1/2 in Definition 3.34.
For all indices i such that x∗i = 0, the corresponding components of y and z are also
necessarily zero, because y ≥ 0 and z ≥ 0. Then, y and z belong to L(x∗), which
contradicts the fact that x∗ is its only element.

The characterization of vertices by active constraints is particularly useful when


developing algorithms. A more explicit representation than linear manifold is desir-
able. This is the concept of a feasible basic solution. Before introducing this notion,
we demonstrate that a non empty polyhedron represented in standard form always
contains at least one vertex.

Theorem 3.37 (Existence of a vertex). Let P = {x ∈ Rn |Ax = b, x ≥ 0} be a


polyhedron represented in standard form, with A ∈ Rm×n and b ∈ Rm and
n ≥ m. If P is non empty, it has at least one vertex.

Proof. We construct a finite number of points belonging to linear manifolds (defined
by (3.81)) of decreasing dimension. The last one is a vertex of P, thus proving its
existence.
Since P is non empty, there exists x0 ∈ P. If dim L(x0) = 0, x0 is a vertex
according to Theorem 3.36. Otherwise, there exists a straight line contained in L(x0)
characterized by

x0 + λd, λ ∈ R,

with d ∈ Rn, d ≠ 0 and Ad = 0 such that di = 0 for any i ∈ A(x0). For each i such
that di ≠ 0, we define

αi = −(x0)i/di.

According to Definition (3.81) of the linear manifold, αi ≠ 0 for all such i. Indeed,
if di ≠ 0, then i ∉ A(x0), and (x0)i > 0. Without loss of generality, we can assume
that there exists at least one αi > 0 (if this is not the case, they are all non positive,
and we can utilize the same approach using the straight line defined by −d). We define

α∗ = min_{i | di > 0} αi

and j an index for which the minimum is reached, i.e., α∗ = αj. The point
x1 = x0 + α∗ d belongs to the polyhedron by construction. Moreover, (x1)j = 0 and
(x0)j > 0. Then, the dimension of L(x1) is strictly smaller than that of L(x0). We
now need only repeat the procedure to obtain, after a certain number of iterations k
at most equal to the dimension of L(x0), a point xk such that dim L(xk) = 0.
According to Theorem 3.36, xk is a vertex of P. This proof is illustrated in
Figure 3.15, where the linear manifold {x | Ax = b} is shown. In this example, L(x0)
is the represented plane, L(x1) is the straight line corresponding to the second
coordinate axis, and L(x2) = {x2}.
Figure 3.15: Illustration of the proof of Theorem 3.37 (showing the points x0, x1, and x2 = xk)

3.5.2 Basic solutions


The notion of a vertex is a purely geometric concept. By invoking Theorem 3.35, it
is possible to characterize it algebraically. In this case, we speak of a feasible basic
solution.

Definition 3.38 (Basic solution). Let P = {x ∈ Rn | Ax = b, x ≥ 0} be a polyhedron
represented in standard form, with A ∈ Rm×n and b ∈ Rm and n ≥ m. A vector
x ∈ Rn such that Ax = b, together with a set of indices j1, . . . , jm, is said to be a
basic solution of P if
1. the matrix B = (Aj1 · · · Ajm) composed of columns j1, . . . , jm of the matrix A is
non singular, and
2. xi = 0 for every i ∉ {j1, . . . , jm}.
If, moreover, xB = B−1 b ≥ 0, the vector x is called a feasible basic solution.

It is common to say that the variables j1 , . . . , jm in Definition 3.38 are basic


variables, and that the others are non basic variables. Example 3.39 identifies the
basic solutions of a polygon, written in the form of a polyhedron represented in
standard form.
Example 3.39 (Basic solutions). Consider a polyhedron represented in standard
form

P = { x = (x1, x2, x3, x4)T | Ax = b, x ≥ 0 } (3.82)

with

A = (1 1 1 0; 1 −1 0 1), b = (1, 1)T, (3.83)

where a semicolon separates the rows of a matrix.

x2 ✻

3
d1
Ce document est la propriété exclusive de Kavyaa Kannan (kk392@snu.edu.in) - jeudi 18 avril 2024 à 07h48

d3
❘ x1 + x2 = 1
❄ ✠

c4
■d
1 ✲
c3
d 2 x1

■ x −x =1
1 2

Figure 3.16: Feasible domain of Example 3.39

In order to view it in R2, we represent the polygon

P̃ = { (x1, x2)T | x1 + x2 ≤ 1, x1 − x2 ≤ 1, x1 ≥ 0, x2 ≥ 0 } (3.84)

in Figure 3.16. Note that if (x1, x2, x3, x4)T ∈ P, then (x1, x2)T ∈ P̃. Furthermore,
if (x1, x2)T ∈ P̃, then (x1, x2, 1 − x1 − x2, 1 − x1 + x2)T ∈ P. The variables x3 and x4
are slack variables (Definition 1.4).
Each basic solution is obtained by selecting 2 variables out of 4 to be in the basis.
There is a total of 6 possible selections of basic variables.
1. Basic solution with x1 and x2 in the basis (j1 = 1, j2 = 2):

B = (1 1; 1 −1), B−1 = (1/2 1/2; 1/2 −1/2), xB = B−1 b = (1, 0)T, x = (1, 0, 0, 0)T.

This basic solution is feasible and corresponds to point 2 in Figure 3.16.
2. Basic solution with x1 and x3 in the basis (j1 = 1, j2 = 3):

B = (1 1; 1 0), B−1 = (0 1; 1 −1), xB = B−1 b = (1, 0)T, x = (1, 0, 0, 0)T.

This basic solution is feasible and also corresponds to point 2 in Figure 3.16.
3. Basic solution with x1 and x4 in the basis (j1 = 1, j2 = 4):

B = (1 0; 1 1), B−1 = (1 0; −1 1), xB = B−1 b = (1, 0)T, x = (1, 0, 0, 0)T.

This basic solution is feasible and also corresponds to point 2 in Figure 3.16.
4. Basic solution with x2 and x3 in the basis (j1 = 2, j2 = 3):

B = (1 1; −1 0), B−1 = (0 −1; 1 1), xB = B−1 b = (−1, 2)T, x = (0, −1, 2, 0)T.

This basic solution is not feasible because B−1 b ≱ 0. It corresponds to point 4 in
Figure 3.16.
5. Basic solution with x2 and x4 in the basis (j1 = 2, j2 = 4):

B = (1 0; −1 1), B−1 = (1 0; 1 1), xB = B−1 b = (1, 2)T, x = (0, 1, 0, 2)T.

This basic solution is feasible and corresponds to point 3 in Figure 3.16.
6. Basic solution with x3 and x4 in the basis (j1 = 3, j2 = 4):

B = B−1 = (1 0; 0 1), xB = B−1 b = (1, 1)T, x = (0, 0, 1, 1)T.

This basic solution is feasible and corresponds to point 1 in Figure 3.16.
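The six cases above can be checked mechanically. The sketch below (our own code, using the data (3.83)) enumerates all 2-out-of-4 column selections, computes xB = B−1 b by Cramer's rule, and reports feasibility; it recovers the five feasible basic solutions and the single infeasible one of case 4.

```python
from itertools import combinations

# Data of Example 3.39, see (3.83).
A = [[1, 1, 1, 0], [1, -1, 0, 1]]
b = [1, 1]

def basic_solution(basis):
    """Basic solution for the given pair of (0-based) column indices,
    or None if the two columns are linearly dependent."""
    B = [[row[j] for j in basis] for row in A]
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    if det == 0:
        return None
    xB = [(b[0] * B[1][1] - b[1] * B[0][1]) / det,
          (b[1] * B[0][0] - b[0] * B[1][0]) / det]
    x = [0.0] * 4
    x[basis[0]], x[basis[1]] = xB
    return x

feasible = []
for basis in combinations(range(4), 2):   # the 6 possible selections
    x = basic_solution(basis)
    if x is not None and min(x) >= 0:
        feasible.append((basis, x))
print(len(feasible), "feasible basic solutions out of 6")
```

Note that three of the five feasible basic solutions coincide as points of R4, which is exactly the degeneracy at point 2 discussed below.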

The notion of a basic solution (Definition 3.38) enables us to analyze the poly-
hedron in terms of active constraints of the optimization problem (Definition 3.4).
Let x be a feasible basic solution such that xB = B−1 b > 0. We say that it is non
degenerate. In this case, there are exactly n active constraints in x: the m equal-
ity constraints and the n − m non basic variables which are 0, and which make the

constraints of type xi ≥ 0 active. The constraints xi ≥ 0 corresponding to the basic


variables are all inactive because xB > 0. According to Theorem 3.5, the feasible
basic solution is defined by n equations. If A is of full rank and xB > 0, there is then
a bijective relationship between the vertices of the polyhedron and the feasible basic
solutions. This equivalence between an algebraic and a geometric concept is useful

when developing algorithms.

Theorem 3.40 (Equivalence between vertices and feasible basic solutions). Let P =
{x ∈ Rn |Ax = b, x ≥ 0} be a polyhedron. The point x∗ ∈ P is a vertex of P if and
only if it is a feasible basic solution.

Proof. =⇒ Let x∗ be a vertex of P. We assume by contradiction that x∗ is not a
feasible basic solution.
We can assume without loss of generality that the matrix A is of full rank (by
removing redundant constraints). As x∗ is not a feasible basic solution, there are
strictly more than m non zero components in x∗. Consider m linearly independent
columns of A, corresponding to non zero components of x∗, which form an
invertible matrix B, and let the remaining n − m columns form a matrix N.
Here, x∗ can be decomposed (see Section 3.4) into a basic component xB and
a non basic component xN such that

xB = B−1 (b − NxN).

Since x∗ is not a feasible basic solution, there exists at least one component k of
xN that is not zero. We construct the direction d whose basic component is

dB = −B−1 Ak,

where Ak is the kth column of A, and whose non basic components are all zero,
except the kth one, which equals 1. Therefore,

Ad = B dB + N dN = −B B−1 Ak + Σ_{j non basic} Aj dj = −Ak + Ak = 0.

Then, for all α,

A(x∗ + αd) = Ax∗ + αAd = Ax∗ = b.

Since x∗k > 0, it is possible to choose α1 > 0 and α2 > 0 sufficiently small so that
x1 = x∗ + α1 d and x2 = x∗ − α2 d are in P. We take

λ = α2/(α1 + α2).

We have 0 < λ < 1 and x∗ = λx1 + (1 − λ)x2, which contradicts the fact that x∗
is a vertex of the polyhedron.
⇐= This is exactly Theorem 3.35.

It is important to note that Theorem 3.40 does not guarantee a bijective rela-
tionship between the vertices of the polyhedron and the feasible basic solutions in all

cases. Indeed, when some of the components of xB = B−1 b are zero, there are more
than n active constraints, and the feasible basic solution is defined by more than n
equations in a space of n dimensions. We say in this case that we are dealing with a
degenerate feasible basic solution.

Definition 3.41 (Degenerate feasible basic solution). Let P = {x ∈ Rn |Ax = b, x ≥


0} be a polyhedron represented in standard form, with A ∈ Rm×n and b ∈ Rm and
n ≥ m. A basic solution x ∈ Rn is said to be degenerate if more than n constraints
are active at x, i.e., if more than n − m components of x are zero.

In the presence of degeneracy, a vertex may correspond to multiple feasible basic


solutions. In Example 3.39, three constraints are active at vertex 2 (Figure 3.16), even
though we only need two constraints to characterize it. Then, the first three basic
solutions identified in the example all correspond to this vertex and are degenerate.

3.5.3 Basic directions


If x is a feasible basic solution, the feasible directions in x (there are infinitely many)
can be characterized by a finite number of directions, called basic directions, and
which correspond to the edges of the polyhedron of the constraints adjacent to the
vertex corresponding to the feasible basic solution x.
To define these basic directions, we consider a feasible basic solution
x = (xB; xN) = (B−1 b; 0) (3.85)
where we assume, without loss of generality, that the indices of the basic variables are
the m first ones, and that B consists of the m first columns of the matrix A. Consider
a non basic variable, for instance the variable with index p, and define a direction
that gives positive values to this non basic variable, all the while maintaining the
other non basic variables at zero. Then
d = (dB; dN) = (d1, . . . , dm, 0, . . . , 0, 1, 0, . . . , 0)T, (3.86)

where the components d1, . . . , dm form dB, the single entry 1 is the component dp,
and all the other non basic components are zero.
Since part dN of the direction is defined, we now need only define dB . For this, we
invoke Theorem 3.13. To ensure that such a direction is feasible, the first condition

is that Ad = 0. Then, by denoting Aj the jth column of A, we obtain

Ad = B dB + N dN = B dB + Σ_{j=m+1}^{n} Aj dj = B dB + Ap = 0, (3.87)

that is,

dB = −B−1 Ap . (3.88)
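As a quick numerical illustration of (3.88) (our own sketch, not from the book), take the data of Example 3.39 with x2 and x4 basic and x1 the entering non basic variable; the computation gives dB = (−1, −2)T, consistent with the basic direction d1 derived in Example 3.43 below.

```python
# d_B = -B^{-1} A_p for Example 3.39: basis {x2, x4}, entering variable x1.
A = [[1, 1, 1, 0], [1, -1, 0, 1]]
basis, p = (1, 3), 0          # 0-based column indices

B = [[row[j] for j in basis] for row in A]
Ap = [row[p] for row in A]
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
# Solve B y = A_p by Cramer's rule, then flip the sign.
y = [(Ap[0] * B[1][1] - Ap[1] * B[0][1]) / det,
     (Ap[1] * B[0][0] - Ap[0] * B[1][0]) / det]
dB = [-v for v in y]
print(dB)  # -> [-1.0, -2.0]
```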

Definition 3.42 (Basic direction). Let P = {x ∈ Rn |Ax = b, x ≥ 0} be a polyhedron


represented in standard form, with A ∈ Rm×n and b ∈ Rm and n ≥ m and let x ∈ Rn
be a feasible basic solution of P. A direction d is called the pth basic direction in x
if p is the index of a non basic variable, and

dp = P (dBp; dNp) (3.89)

where P is the permutation matrix corresponding to the basic solution x,
dBp = −B−1 Ap, and dNp is such that

PT ep = (0; dNp), (3.90)

i.e., that all the elements of dNp are zero, except the one corresponding to the variable
p, which is 1.

Note that these directions are not always feasible, as discussed later. But first, we
illustrate the concept with the polyhedron of Example 3.39.
Example 3.43 (Basic directions). Consider the polygon in Example 3.39, and the
feasible basic solution where x2 and x4 are in the basis. Then

x = (0, 1, 0, 2)T and P = (0 0 1 0; 1 0 0 0; 0 0 0 1; 0 1 0 0).

The basic direction corresponding to the non basic variable x1 is

d1 = P (−B−1 A1; 1; 0) = P (−1, −2, 1, 0)T = (1, −1, 0, −2)T,

and the basic direction corresponding to the non basic variable x3 is

d3 = P (−B−1 A3; 0; 1) = P (−1, −1, 0, 1)T = (0, −1, 1, −1)T.

These two directions are shown in Figure 3.16, from the feasible basic solution 3.
We now consider the feasible basic solution where x1 and x4 are in the basis (point
2 in Figure 3.16). Then

x = (1, 0, 0, 0)T and P = (1 0 0 0; 0 0 1 0; 0 0 0 1; 0 1 0 0).

The basic direction corresponding to the non basic variable x2 is

d̃2 = P (−B−1 A2; 1; 0) = P (−1, 2, 1, 0)T = (−1, 1, 0, 2)T,

and the basic direction corresponding to the non basic variable x3 is

d̃3 = P (−B−1 A3; 0; 1) = P (−1, 1, 0, 1)T = (−1, 0, 1, 1)T.

These directions are not represented in Figure 3.16. Finally, consider the feasible
basic solution where x1 and x2 are in the basis. Then

x = (1, 0, 0, 0)T and P = I.

The basic direction corresponding to the non basic variable x3 is

d̂3 = (−B−1 A3; 1; 0) = (−1/2, −1/2, 1, 0)T,

and the basic direction corresponding to the non basic variable x4 is

d̂4 = (−B−1 A4; 0; 1) = (−1/2, 1/2, 0, 1)T.

These two directions are shown in Figure 3.16. Note that d̂4 is a feasible direction,
whereas d̂3 is not.
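The directions of this example can be reproduced and checked against the condition Ad = 0 of Theorem 3.13. The sketch below (our own code, 0-based indices) assembles the basic direction for the basis {x2, x4} with x1 entering, and recovers d1 = (1, −1, 0, −2)T without building the permutation matrix explicitly: it simply scatters dB and the entry 1 into their positions.

```python
A = [[1, 1, 1, 0], [1, -1, 0, 1]]

def basic_direction(basis, p):
    """Full basic direction: d_p = 1, other non basic components 0,
    d_B = -B^{-1} A_p (0-based indices, 2x2 basis matrix)."""
    B = [[row[j] for j in basis] for row in A]
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    Ap = [row[p] for row in A]
    dB = [-(Ap[0] * B[1][1] - Ap[1] * B[0][1]) / det,
          -(Ap[1] * B[0][0] - Ap[0] * B[1][0]) / det]
    d = [0.0] * 4
    d[p] = 1.0
    for value, j in zip(dB, basis):
        d[j] = value
    return d

d1 = basic_direction((1, 3), 0)
print(d1)  # -> [1.0, -1.0, 0.0, -2.0]
# Feasibility condition of Theorem 3.13: A d = 0.
assert all(sum(a * v for a, v in zip(row, d1)) == 0 for row in A)
```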

Theorem 3.44 (Feasible basic directions). Let P = {x ∈ Rn |Ax = b, x ≥ 0} be


a polyhedron represented in standard form, with A ∈ Rm×n and b ∈ Rm and
n ≥ m, and let x ∈ Rn be a feasible basic solution of P. If x is non degenerate
(in the sense of Definition 3.41), then any basic direction is a feasible direction

in x.

Proof. Let k be an arbitrary index of a non basic variable. According to Defini-


tion 3.42, we have Adk = 0. Moreover, since x is non degenerate, only its non
basic components are zero. The corresponding components of dk are non negative by
definition. Theorem 3.13 can be applied to prove the feasibility of dk .
The following theorem enables us to consider only the feasible basic directions in
order to characterize any feasible direction.

Theorem 3.45 (Combination of basic directions). Let P = {x ∈ Rn |Ax = b, x ≥ 0}


be a polyhedron represented in standard form, with A ∈ Rm×n and b ∈ Rm and
n ≥ m, and let x ∈ Rn be a feasible basic solution of P. Any feasible direction d
in x can be written as a linear combination of the basic directions, i.e.,

d = Σ_{j∈N} (d)j dj, (3.91)

where N is the set of indices of the non basic variables, dj ∈ Rn the jth basic
direction, and (d)j ∈ R, the jth component of d.

Proof. Consider a feasible direction d, and assume without loss of generality that the
basic variables are the m first ones. According to Theorem 3.13, we have

Ad = BdB + NdN = 0

and

dB = −B−1 N dN = −Σ_{j=m+1}^{n} (d)j B−1 Aj, (3.92)

where (d)j ∈ R is the jth component of the vector d. By decomposing dN in the
canonical basis, we can also write

dN = Σ_{j=m+1}^{n} (d)j ej−m, (3.93)

where ek ∈ Rn−m is a vector for which all the components are zero, except the kth
one, which is 1. According to Definition 3.42, (3.92) and (3.93) are written as

d = (dB; dN) = Σ_{j=m+1}^{n} (d)j (−B−1 Aj; ej−m) = Σ_{j=m+1}^{n} (d)j dj, (3.94)

where dj is the jth basic direction. We obtain (3.91).
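Theorem 3.45 is easy to verify numerically. In the sketch below (our own code, using the data of Example 3.39 with x3 and x4 basic, so that B is the identity), a feasible direction d at the vertex (0, 0, 1, 1)T is recombined exactly from the two basic directions, with its non basic components (d)1 = 2 and (d)2 = 3 as coefficients.

```python
A = [[1, 1, 1, 0], [1, -1, 0, 1]]
basis, nonbasic = (2, 3), (0, 1)   # x3, x4 basic: B is the identity

def basic_direction(p):
    """Basic direction for entering variable p when B = I: d_B = -A_p."""
    d = [0.0] * 4
    d[p] = 1.0
    for row_index, j in enumerate(basis):
        d[j] = -A[row_index][p]
    return d

# A feasible direction at the non degenerate vertex (0, 0, 1, 1), with A d = 0 ...
d = [2.0, 3.0, -5.0, 1.0]
assert all(sum(a * v for a, v in zip(row, d)) == 0 for row in A)
# ... is the combination (3.91) of the basic directions, with the
# non basic components of d as coefficients.
recombined = [sum(d[j] * basic_direction(j)[i] for j in nonbasic)
              for i in range(4)]
assert recombined == d
```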



The proof of Theorems 3.27 and 3.28 is inspired by Nocedal and Wright (1999).
That of Theorems 3.36 and 3.37 is inspired by de Werra et al. (2003).

3.6 Exercises

Exercise 3.1. Take the feasible set defined by the constraints

x1 − x2 + 2x3 − 2x4 + 3x5 = 3


x1 + x3 + 2x5 = 1
x2 − x3 + 2x4 − x5 = −2.

1. Identify a feasible point.


2. Verify whether the constraints are linearly independent.
Exercise 3.2. Take the feasible set defined by the constraints
 
h(x1, x2) = ( (x1 − 1)² + x2² − 1 ; (x1 − 2)² + x2² − 4 ) = 0,
g(x1, x2) = −e^(sin(x1)+cos(x2)) − x1² − x2² ≤ 0.

1. Determine the set of feasible points.


2. For each one, determine the active constraints.
3. For each one, verify whether the condition of independence of the constraints
(Definition 3.8) is satisfied.
Exercise 3.3. Take the feasible set defined by the constraints

x1 + x2 ≤ 3
x1 + x3 ≤ 7
x1 ≥ 0
x2 ≥ 0
x3 ≥ 0.

1. Take the point x = (3 0 4)T . Characterize the linearized cone in x.


2. Express the constraints in standard form.
3. Identify the basic solutions, and among them, those that are feasible.
4. For each point corresponding to a feasible basic solution,
(a) characterize the linearized cone,
(b) identify the basic directions,
(c) verify whether the basic directions are in the linearized cone.
Exercise 3.4. Take the feasible set defined by the constraints

−x1 + x2 ≤1
x1 + 2x2 ≤ 4.

1. Provide a graphic representation of this feasible set.



2. Express the constraints in standard form.


3. List the basic solutions, and represent them on the graph.
4. For each feasible basic solution, list the basic directions and represent them on
the graph.

Exercise 3.5. Consider the feasible set defined by the following constraints:

x1 − x2 ≥ −2
2x1 + x2 ≤ 8
x1 + x2 ≤ 5
x1 + 2x2 ≤ 10
x1 ≥ 0
x2 ≥ 0

1. Provide a graphic representation of this feasible set.


2. Enumerate the vertices of D.
3. List the basic solutions and represent them on the graph.
4. For each feasible basic solution, list the basic directions and represent them on
the graph.
5. Reformulate the same set of constraints using a minimum number of constraints
(use the graphical representation to identify them).
Exercise 3.6. Take the feasible set defined by the constraints

(x1 + 1)2 + x22 ≤ 1


(x1 − 1)2 + x22 ≤ 1.

For each x and d below,


1. verify that x is feasible,
2. specify whether the direction d is feasible in x (justify!),
3. specify whether the direction d is feasible at the limit in x (justify!).
x          d             x          d
(0, 0)T    (−1, −1)T     (1, 0)T    (−1, 0)T
(0, 0)T    (1, 1)T       (0, 1)T    (−1, 1)T
(1, 0)T    (0, 1)T       (0, 1)T    (−1, −1)T
(1, 0)T    (0, −1)T      (0, 1)T    (1, 1)T
(1, 0)T    (1, 0)T       (0, 1)T    (1, −1)T
Exercise 3.7. Take the optimization problem min_{x∈R2} x1 + x2 subject to
x1² + x2² = 2, and the point x̄ = (−√2, 0)T. Identify the feasible directions at the
limit in x̄ by employing the following sequences:

xk = ( −√(2 − 1/k²), 1/k )T and xk = ( −√(2 − 1/k²), −1/k )T.

First verify that they are indeed feasible sequences.



Chapter 4

Introduction to duality

There are those who are subject to constraints and others who impose them. We
now take the point of view of the second category in order to analyze an optimization
problem from a different angle.

Contents
4.1 Constraint relaxation
4.2 Duality in linear optimization
4.3 Exercises

In Section 3.4, we attempted to explicitly eliminate the constraints of the optimiza-


tion problem by expressing one set of variables as a function of the others. In this
chapter, we use another technique to remove constraints. This technique, called con-
straint relaxation, plays an important role both theoretically and algorithmically. In
addition, it enables us to introduce the concept of duality.

4.1 Constraint relaxation


Here we introduce the concept of constraint relaxation with a simple example.
Example 4.1 (The mountaineer and the billionaire). A billionaire decides to offer a
mountaineer a prize linked to the altitude he manages to climb, at a rate of €1 per
meter. However, for reasons only he knows, the billionaire requires the mountaineer
to stay in the Alps. The mountaineer immediately adopts the optimal strategy: he
climbs Mont Blanc and pockets €4,807 (Figure 4.1(a)).
However, the mountaineer loves freedom and does not easily accept constraints.
After some negotiation, the billionaire allows the climber to go elsewhere than the
Alps if he so desires, but he must then pay a fine. A problem arises for the billionaire.
If the fine is too low, the climber will want to go to the Himalayas and climb Mount
Figure 4.1: Solutions for Example 4.1: (a) Mont Blanc (4,807 m); (b) Mount Everest (8,848 m)

Everest, culminating at 8,848 meters (Figure 4.1(b)). The billionaire thus decides to
set the fine at €4,041. In this case, climbing Everest would give the mountaineer
8,848 − 4,041 = €4,807, which is exactly the same amount as if he decided to climb
Mont Blanc. Therefore, the mountaineer has no interest in violating the constraint,
and the final solution to the problem is the same as with the constraint.
We can model the problem of the climber by calling his position (longitude/latitude)
x and the corresponding altitude f(x). The first problem is a constrained optimization
problem:

max_x f(x)

subject to

x ∈ Alps .

The fine imposed by the billionaire is denoted by a(x), and depends on the position
x. In particular, a(x) = 0 if x ∈ Alps. The optimization problem is now without
constraints and can be expressed as

max_x f(x) − a(x).

Although somewhat imaginative, Example 4.1 shows us that an optimization
problem can be seen from two points of view: from the viewpoint of the one solving
the problem (the mountaineer) and from that of the one who defines the rules of the
game (the billionaire). If we want to relax constraints in order to remove them, we
must put ourselves in the place of the billionaire, so that the new rules are consistent
with the old ones. We now apply the same approach to another simple optimization
problem.

János von Neumann was born on December 28, 1903, in Budapest. Originally, his
name did not have the noble prefix “von”. In 1913, his father bought a title of
nobility. János took the name John von Neumann when he became an American in
1937. Although holder of a chemistry degree from ETH (Swiss Federal Institute of
Technology) in Zürich, he quickly turned to mathematics. The results by Gödel on
incompleteness gave him the incentive to abandon his work on the axiomatization
of set theory. Within the framework of quantum mechanics, he unified the theories
of Schrödinger and Heisenberg. He is considered the father of game theory. It is in
this context that he developed the principle of duality. One of his most famous
quotes is “If people do not believe that mathematics is simple, it is only because
they do not realize how complicated life is!” Von Neumann died on February 8,
1957, in Washington, DC.
Figure 4.2: John von Neumann

Example 4.2 (Constraint relaxation). Consider the optimization problem

min_{x∈R2} 2x1 + x2 (4.1)

subject to
1 − x1 − x2 = 0
x1 ≥ 0 (4.2)
x2 ≥ 0

for which the solution is x∗ = (0, 1)T with an optimal value 1. We now relax the
constraint 1 − x1 − x2 = 0 and introduce a fine that is proportional to the violation
of the constraint, with a proportionality factor λ. This way, the fine is zero when the
constraint is satisfied. We obtain the following problem:

min_{x∈R2} 2x1 + x2 + λ(1 − x1 − x2) (4.3)

subject to
x1 ≥ 0
(4.4)
x2 ≥ 0 .

We examine different values of λ.

• If λ = 0, (4.3) becomes 2x1 + x2 and the solution to the problem is x∗ = (0, 0)T
with an optimal value of 0 (Figure 4.3). This solution violates the constraint of
the original problem, and the optimal value is lower. It is a typical case where
the penalty value is ill-suited, and where it becomes interesting to violate the
constraint.

Figure 4.3: Objective function of Example 4.2: λ = 0

Figure 4.4: Objective function of Example 4.2: λ = 2

• If λ = 2, (4.3) becomes 2 − x2 and the problem is unbounded, because the more


the value of x2 increases, the more the objective function decreases (Figure 4.4).
It is imperative to avoid such values of the penalty parameter, which generate
unbounded problems.
• Finally, if λ = 1, (4.3) becomes x1 + 1, and each x such that x1 = 0 is a solution
to the problem, with an optimal value of 1 (Figure 4.5). In this case, regardless of
the value of x2 , there is no way to get a better value than the optimal value of the
initial problem. The penalty parameter acts as a deterrent, and there is nothing
to be gained by violating the constraint.
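The three cases can be summarized compactly: the relaxed objective is (2 − λ)x1 + (1 − λ)x2 + λ, and its infimum over x1, x2 ≥ 0 equals λ when both coefficients are non negative, and −∞ otherwise. A minimal sketch (our own code, not from the book):

```python
# Infimum over x1, x2 >= 0 of (2 - lam) x1 + (1 - lam) x2 + lam,
# i.e., the value of the relaxed problem of Example 4.2.
def dual_value(lam):
    if 2 - lam >= 0 and 1 - lam >= 0:
        return lam            # minimum reached at x = (0, 0)
    return float("-inf")      # a coefficient is negative: unbounded problem

for lam in (0, 1, 2):
    print(lam, dual_value(lam))
```

The loop reproduces the three bullets above: the value 0 for λ = 0, the value 1 (the original optimal value) for λ = 1, and an unbounded problem for λ = 2.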

Figure 4.5: Objective function of Example 4.2: λ = 1

Consider the optimization problem (1.71)–(1.73). We generalize this idea to in-


corporate constraints in the objective function. The function thus obtained is called
Lagrangian or the Lagrangian function.

Definition 4.3 (Lagrangian function). Consider the optimization problem (1.71)–


(1.73) min f(x) subject to h(x) = 0 and g(x) ≤ 0, and consider the vectors λ ∈ Rm
and µ ∈ Rp . The function L : Rn+m+p → R defined by

L(x, λ, µ) = f(x) + λT h(x) + µT g(x) = f(x) + Σ_{i=1}^{m} λi hi(x) + Σ_{j=1}^{p} µj gj(x) (4.5)

is called Lagrangian or the Lagrangian function of the problem (1.71)–(1.73).

As we did in Example 4.2, we can minimize the Lagrangian function for each fixed
value of the parameters λ and µ. Indeed, the Lagrangian function now depends only
on x. The function that associates a set of parameters to the optimal value of the
associated problem is called a dual function.

Definition 4.4 (Dual function). Consider the optimization problem (1.71)–(1.73)


and its Lagrangian function L(x, λ, µ) defined by (4.5). The function q : Rm+p → R
defined by
q(λ, µ) = min_{x∈Rn} L(x, λ, µ) (4.6)

is the dual function of the problem (1.71)–(1.73). The parameters λ and µ are called
dual variables. In this context, the variables x are called primal variables.

If we take Example 4.1, −q(λ, µ) represents the mountaineer’s prize1 if the bil-
lionaire imposes a fine for violation of the constraints λT h(x) + µT g(x).
For inequality constraints, since only positive values of g(x) correspond to a violation
and should result in a fine, it is essential that µ ≥ 0. Indeed, the term µT g(x) is non
negative, and thus penalizing, only when g(x) > 0.

Theorem 4.5 (Bound from dual function). Let x∗ be the solution to the opti-
mization problem (1.71)–(1.73), and let q(λ, µ) be the dual function to the same
problem. Consider λ ∈ Rm and µ ∈ Rp , µ ≥ 0. Then,

q(λ, µ) ≤ f(x∗ ) , (4.7)

and the dual function provides lower bounds on the optimal value of the problem.

Proof.

q(λ, µ) = min_{x∈Rn} L(x, λ, µ)          according to (4.6)
        ≤ L(x∗, λ, µ)
        = f(x∗) + λT h(x∗) + µT g(x∗)    according to (4.5)
        = f(x∗) + µT g(x∗)               since h(x∗) = 0
        ≤ f(x∗)                          since g(x∗) ≤ 0 and µ ≥ 0.

Corollary 4.6 (Objective functions of the primal and dual). Let x be a feasible
solution of the optimization problem (1.71)–(1.73), and let q(λ, µ) be the dual
function to the same problem. Consider λ ∈ Rm and µ ∈ Rp , µ ≥ 0. Then,

q(λ, µ) ≤ f(x). (4.8)

Proof. Denote by x∗ the optimal solution of the primal problem. As x is primal feasible,
we have f(x∗) ≤ f(x). The result then follows from Theorem 4.5.

If we take the point of view of the billionaire, the problem is to define these fines
in such a manner that the mountaineer wins as little as possible with the new system.
He tries to optimize the dual function, ensuring that the considered parameters λ and
µ ≥ 0 do not generate an unbounded problem. This optimization problem is called
the dual problem.

1 The sign of q is changed because the problem with the mountaineer is one of maximization and
not minimization.

Definition 4.7 (Dual problem). Consider the optimization problem (1.71)–(1.73)


and its dual function q(λ, µ) defined by (4.6). Let Xq ⊆ Rm+p be the domain of q,
i.e.,

Xq = { (λ, µ) | q(λ, µ) > −∞ }. (4.9)

The optimization problem


max_{λ,µ} q(λ, µ) (4.10)

subject to
µ≥0 (4.11)
and
(λ, µ) ∈ Xq (4.12)
is the dual problem of the problem (1.71)–(1.73). In this context, the original problem
(1.71)–(1.73) is called the primal problem.

Example 4.8 (Dual problem). Take again Example 4.2:

min_{x∈R2} 2x1 + x2 (4.13)

subject to
h1 (x) = 1 − x1 − x2 = 0 (λ)
g1 (x) = − x1 ≤0 (µ1 ) (4.14)
g2 (x) = − x2 ≤ 0 (µ2 ) .
The Lagrangian function of this problem is

L(x1 , x2 , λ, µ1 , µ2 ) = 2x1 + x2 + λ(1 − x1 − x2 ) − µ1 x1 − µ2 x2


= (2 − λ − µ1 )x1 + (1 − λ − µ2 )x2 + λ .

In order for the dual function to be bounded, the coefficients of x1 and x2 have to be
zero, and
2 − λ − µ1 = 0 , 1 − λ − µ2 = 0 ,

or
µ1 = 2 − λ , µ2 = 1 − λ . (4.15)

Therefore, we can eliminate µ1 and µ2 so that



Xq = {λ | λ ≤ 1} ,

and the dual function becomes


q(λ) = λ .

The dual problem is written as


max λ

subject to
λ ≤ 1,

for which the solution is λ∗ = 1. According to the equalities (4.15), we have µ∗1 = 1
and µ∗2 = 0.
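The primal of Example 4.8 is small enough to check numerically. The following is a minimal sketch, assuming SciPy's linprog (with its default HiGHS backend) is available; it solves the primal and confirms that its optimal value coincides with the dual optimum λ∗ = 1.

```python
# Numerical check of Example 4.8 (an illustrative sketch, not part of the text).
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 1.0])         # objective: 2 x1 + x2
A_eq = np.array([[1.0, 1.0]])    # equality constraint x1 + x2 = 1
b_eq = np.array([1.0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None), (0, None)])
print(res.x)    # optimal solution: x1 = 0, x2 = 1
print(res.fun)  # optimal value 1, equal to the dual optimum lambda* = 1
```

The primal optimum f(x∗) = 1 equals the dual optimum q(λ∗) = λ∗ = 1, in agreement with the strong duality property of linear optimization discussed below.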

As a direct consequence of Theorem 4.5, the optimal value of this problem can
never exceed the optimal value of the original problem. This result is called the weak
duality theorem.

Theorem 4.9 (Weak duality). Let x∗ be the optimal solution to the primal prob-
lem (1.71)–(1.73) and let (λ∗ , µ∗ ) be the optimal solution to the associated dual
problem (4.10)–(4.12). Then

q(λ∗ , µ∗ ) ≤ f(x∗ ) . (4.16)

Proof. This theorem is a special case of Theorem 4.5 for λ = λ∗ and µ = µ∗ .

Corollary 4.10 (Duality and feasibility). Consider the primal problem (1.71)–
(1.73) and the associated dual problem (4.10)–(4.12).
• If the primal problem is unbounded, then the dual problem is not feasible.
• If the dual problem is unbounded, then the primal problem is not feasible.

Proof. If the optimal value of the primal problem is −∞, there is no dual variable
(λ, µ) that satisfies (4.16) and the dual problem is not feasible. Similarly, if the
optimal value of the dual problem is +∞, there is no primal variable x that satisfies
(4.16) and the primal problem is not feasible.

Corollary 4.11 (Optimality of the primal and the dual). Let x∗ be a feasible so-
lution of the primal problem (1.71)–(1.73) and let (λ∗ , µ∗ ) be a feasible solution
of the associated dual problem (4.10)–(4.12). If q(λ∗ , µ∗ ) = f(x∗ ), then x∗ is
optimal for the primal, and (λ∗ , µ∗ ) is optimal for the dual.

Proof. Consider any x feasible for the primal. From Theorem 4.5, we have

f(x) ≥ q(λ∗ , µ∗ ) = f(x∗ ),



proving the optimality of x∗ . Similarly, consider any (λ, µ) feasible for the dual. From
the same theorem, we have

q(λ, µ) ≤ f(x∗ ) = q(λ∗ , µ∗ ),



proving the optimality of (λ∗ , µ∗ ).

Corollary 4.12 (Duality and feasibility (II)). Consider the primal problem (1.71)–
(1.73) and the associated dual problem (4.10)–(4.12).
• If the primal problem is infeasible, then the dual problem is either unbounded
or infeasible.
• If the dual problem is infeasible, then the primal problem is either unbounded
or infeasible.

Proof. We show the contrapositive. If the dual problem is bounded and feasible, it
has an optimal solution. From Corollary 4.11, the primal problem also has an optimal
solution, and is therefore feasible. The second statement is shown in a similar way.

The dual problem has interesting geometric properties. Indeed, the objective
function to maximize is concave, and the domain Xq is convex.

Theorem 4.13 (Concavity-convexity of a dual problem). Let (4.10)–(4.12) be the


dual problem of an optimization problem. The objective function (4.10) is con-
cave, and the domain of the dual function (4.9) is convex.

Proof. Consider x ∈ Rn , γ = (λ, µ) and γ̄ = (λ̄, µ̄) ∈ Rm+p , such that µ, µ̄ ≥ 0,


γ, γ̄ ∈ Xq and γ ≠ γ̄. Consider also α ∈ R such that 0 ≤ α ≤ 1. According to
Definition 4.3, we have

L(x, αγ + (1 − α)γ̄) = αL(x, γ) + (1 − α)L(x, γ̄) .

Taking the minimum, we obtain



min_x L(x, αγ + (1 − α)γ̄) ≥ α min_x L(x, γ) + (1 − α) min_x L(x, γ̄) (4.17)

or

q(αγ + (1 − α)γ̄) ≥ αq(γ) + (1 − α)q(γ̄) , (4.18)

which demonstrates the concavity of q (Definition 2.3). Since γ and γ̄ are in Xq , we have
q(γ) > −∞ and q(γ̄) > −∞. According to (4.18), we also have q(αγ + (1 − α)γ̄) > −∞,
so that αγ + (1 − α)γ̄ is in Xq , proving the convexity of Xq (Definition B.2).
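The concavity of the dual function can be observed numerically on a small instance. The sketch below assumes SciPy is available; the test problem min (x − 2)² subject to x ≤ 1 is our own illustrative choice, not taken from the text. It evaluates q(µ) = min_x L(x, µ) and checks the concavity inequality (4.18) at sampled points.

```python
# Numerical illustration of Theorem 4.13 (an illustrative sketch; the test
# problem min (x - 2)^2 s.t. x <= 1 is an assumption chosen for this example).
import numpy as np
from scipy.optimize import minimize_scalar

def q(mu):
    # dual function: minimize the Lagrangian L(x, mu) = (x - 2)^2 + mu (x - 1)
    return minimize_scalar(lambda x: (x - 2.0) ** 2 + mu * (x - 1.0)).fun

mus = np.linspace(0.0, 4.0, 9)
for a in mus:
    for b in mus:
        for alpha in (0.25, 0.5, 0.75):
            mid = alpha * a + (1.0 - alpha) * b
            # concavity inequality (4.18), up to numerical tolerance
            assert q(mid) >= alpha * q(a) + (1.0 - alpha) * q(b) - 1e-9
print("q is concave on the sampled points")
```

In closed form, q(µ) = µ − µ²/4 here, a concave parabola; its maximum q(2) = 1 equals the primal optimum f(1) = 1, an instance of strong duality for this convex problem.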

Giuseppe Lodovico Lagrangia, born in Turin on January 25,


1736, is often considered a French mathematician, despite his
Italian origin, and is known under the name Joseph-Louis La-
grange. It was he himself who in his youth took the French ver-
sion of the name. In 1766, he succeeded Euler as director of the

mathematics section of the Academy of Sciences in Berlin. He


was the first professor of analysis at the Ecole Polytechnique in
Paris (founded in 1794 under the name “Ecole Centrale des Travaux Publics”).
Publics.” He was a member of the Bureau des Longitudes, created on June 25, 1795.
Napoleon presented him with the Legion of Honor in 1808 and the Grand Cross of
the Imperial Order of Reunion on April 3, 1813, a few days before his death. He
contributed significantly to diverse areas, such as calculus, astronomy, analytical me-
chanics, probability, fluid mechanics and number theory. He oversaw the introduction
of the metric system, working with Lavoisier. He died on April 10, 1813, and is buried
in the Pantheon, in Paris. The funeral oration was given by Laplace.
Figure 4.6: Joseph-Louis Lagrange

4.2 Duality in linear optimization


We now analyze the dual problem in the context of linear optimization. We consider
the following (primal) problem:
min_x cT x (4.19)

subject to
Ax = b
(4.20)
x≥0

and we have
h(x) = b − Ax and g(x) = −x .

Therefore, the Lagrangian function (4.5) can be written as

L(x, λ, µ) = cT x + λT (b − Ax) − µT x
= (c − AT λ − µ)T x + λT b . (4.21)

The Lagrangian function is linear in x. The only possibility for it to be bounded is if


it is constant, i.e.,
c − AT λ − µ = 0 .

In this case, the dual function is q(λ, µ) = λT b and the dual problem is written as

max_{λ,µ} λT b (4.22)

subject to
µ≥0
(4.23)
µ = c − AT λ .
By eliminating µ, renaming λ as x, and changing the maximization to a minimization,
we obtain

min_x −bT x (4.24)

subject to
AT x ≤ c . (4.25)
This is another linear optimization problem. We calculate its dual problem. Since
there are no equality constraints, we have

L(x, µ) = −bT x + µT (AT x − c)


= (−b + Aµ)T x − µT c .

Again, for this linear function to be bounded, it has to be constant, i.e., −b + Aµ = 0.


The dual function is q(µ) = −µT c and the dual problem is written as

max_µ −µT c

subject to
µ≥0
Aµ = b .
By replacing µ by x, and transforming the maximization into a minimization, we
obtain the original problem (4.19)–(4.20). The dual of the dual problem is the primal
problem. We can now generalize these results.

Theorem 4.14 (Dual of a linear problem). Consider the following linear problem:

min_x cT1 x1 + cT2 x2 + cT3 x3 (4.26)

subject to
A1 x1 + B1 x2 + C1 x3 = b1
A2 x1 + B2 x2 + C2 x3 ≤ b2
A3 x1 + B3 x2 + C3 x3 ≥ b3
(4.27)
x1 ≥ 0
x2 ≤ 0
x3 ∈ Rn3 ,
where x1 ∈ Rn1 , x2 ∈ Rn2 , x3 ∈ Rn3 , b1 ∈ Rm , b2 ∈ Rpi and b3 ∈ Rps . The
matrices Ai , Bi , Ci , i = 1, 2, 3, have appropriate dimensions. The dual of this
problem is
max_γ γT b = γT1 b1 + γT2 b2 + γT3 b3 (4.28)

subject to
(γ1 ∈ Rm )
γ2 ≤ 0
γ3 ≥ 0

AT1 γ1 + AT2 γ2 + AT3 γ3 = AT γ ≤ c1
BT1 γ1 + BT2 γ2 + BT3 γ3 = BT γ ≥ c2
CT1 γ1 + CT2 γ2 + CT3 γ3 = CT γ = c3 (4.29)

with γ = (γT1 , γT2 , γT3 )T ∈ Rm+pi +ps and A = (AT1 , AT2 , AT3 )T ∈
R(m+pi +ps )×n1 . The matrices B and C are defined in a similar manner.

Proof. The Lagrangian function is written as

L(x, λ, µ2 , µ3 , µx1 , µx2 ) = cT1 x1 + cT2 x2 + cT3 x3


+ λT (b1 − A1 x1 − B1 x2 − C1 x3 )
+ µT2 (A2 x1 + B2 x2 + C2 x3 − b2 )
+ µT3 (b3 − A3 x1 − B3 x2 − C3 x3 )
− µTx1 x1
+ µTx2 x2
with µ2 , µ3 , µx1 , µx2 ≥ 0. By combining the terms, we obtain

L(x, λ, µ2 , µ3 , µx1 , µx2 ) = λT b1 − µT2 b2 + µT3 b3
+ (c1 − AT1 λ + AT2 µ2 − AT3 µ3 − µx1 )T x1
+ (c2 − BT1 λ + BT2 µ2 − BT3 µ3 + µx2 )T x2
+ (c3 − CT1 λ + CT2 µ2 − CT3 µ3 )T x3 .

Define γ1 = λ, γ2 = −µ2 and γ3 = µ3 . We immediately deduce that γ1 ∈ Rm ,


γ2 ≤ 0 and γ3 ≥ 0. We obtain the Lagrangian function

L(x, γ, µx1 , µx2 ) = γT b
+ (c1 − AT γ − µx1 )T x1
+ (c2 − BT γ + µx2 )T x2
+ (c3 − CT γ)T x3 .

This is a linear function. For it to be bounded, it needs to be constant, and

µx1 = c1 − AT γ
µx2 = BT γ − c2
CT γ = c3 .

We now need only use µx1 ≥ 0 and µx2 ≥ 0 to obtain the result.

Note that the problem (4.26)–(4.27) combines all the possibilities of writing the
constraints of a linear problem: equality, lower inequality, upper inequality, non pos-
itivity, and non negativity. The result can be summarized as follows:
• For each constraint of the primal there is a dual variable

Constraint of the primal Dual variable


= free
≤ ≤0
≥ ≥0
• For each primal variable there is a dual constraint
Primal variable Dual constraint
≥0 ≤
≤0 ≥
free =

Theorem 4.15 (The dual of the dual is the primal). Consider a (primal) linear
optimization problem. If the dual is converted into a minimization problem, and
we calculate its dual, we obtain a problem equivalent to the primal problem.

Proof. In the problem (4.28)–(4.29) of Theorem 4.14, we replace γ by −x and the


maximization by a minimization:

min_{x1 ,x2 ,x3} xT1 b1 + xT2 b2 + xT3 b3

subject to
x1 ∈ Rm
x2 ≥ 0
x3 ≤ 0
AT1 x1 + AT2 x2 + AT3 x3 ≥ −c1
BT1 x1 + BT2 x2 + BT3 x3 ≤ −c2
CT1 x1 + CT2 x2 + CT3 x3 = −c3 .
According to Theorem 4.14, the dual of this problem is

max_{γ1 ,γ2 ,γ3} −cT1 γ1 − cT2 γ2 − cT3 γ3

subject to
A1 γ1 + B1 γ2 + C1 γ3 = b1
A2 γ1 + B2 γ2 + C2 γ3 ≤ b2
A3 γ1 + B3 γ2 + C3 γ3 ≥ b3
γ1 ≥ 0
γ2 ≤ 0
γ3 ∈ Rn3 .

We now need only replace γ by x, and convert the maximization into a minimization
to obtain (4.26)–(4.27) and prove the result.

Example 4.16 (Dual of a linear problem). Consider the linear optimization problem

min x1 + 2x2 + 3x3

subject to
−x1 + 3x2 = 5
2x1 − x2 + 3x3 ≥ 6
x3 ≤ 4
x1 ≥ 0
x2 ≤ 0
x3 ∈ R .

The dual problem is


max 5γ1 + 6γ2 + 4γ3

subject to
γ1 ∈ R
γ2 ≥ 0
γ3 ≤ 0
−γ1 + 2γ2 ≤1
3γ1 − γ2 ≥2
3γ2 + γ3 = 3 .

This is also a linear problem. We write it as a minimization problem and rename the
variables x.
min −5x1 − 6x2 − 4x3

subject to
x1 ∈ R
x2 ≥ 0
x3 ≤ 0
x1 − 2x2 ≥ −1
−3x1 + x2 ≤ −2
−3x2 − x3 = −3 .

We can calculate its dual:


max −γ1 − 2γ2 − 3γ3

subject to
γ1 − 3γ2 = −5
−2γ1 + γ2 − 3γ3 ≤ −6
−γ3 ≥ −4

γ1 ≥ 0
γ2 ≤ 0
γ3 ∈ R .
It is easy to verify that this problem is equivalent to the original problem.

We conclude this chapter with an important result in linear optimization, called


the strong duality theorem.

Theorem 4.17 (Strong duality). Consider a linear optimization problem and its
dual. If one problem has an optimal solution, so does the other one, and the
optimal value of their objective functions are the same.

Proof. Consider A ∈ Rm×n , b ∈ Rm , c ∈ Rn , x ∈ Rn and λ ∈ Rm . Consider the


primal problem
min cT x
subject to
Ax = b, x ≥ 0,
and the dual problem
max bT λ
subject to
AT λ ≤ c.
Assume that the dual problem has an optimal solution λ∗ . Therefore, for each λ
that is dual feasible, we have bT λ ≤ bT λ∗ . Equivalently, for any ε > 0, we have
bT λ < bT λ∗ + ε. Therefore, there is no λ which verifies both the dual constraints
AT λ ≤ c and bT λ ≥ bT λ∗ + ε or, equivalently, −bT λ ≤ −bT λ∗ − ε. In other words,
the system of n + 1 linear inequalities, with m variables
   
( AT ; −bT ) λ ≤ ( c ; −bT λ∗ − ε )

is incompatible. According to Farkas’ lemma (Lemma C.10), there exists a vector


 
( y ; r ) ,

where y ∈ Rn , y ≥ 0, and r ∈ R, r ≥ 0, such that


 
( yT r ) ( AT ; −bT ) = 0 ,

that is
Ay − rb = 0, (4.30)
and
( yT r ) ( c ; −bT λ∗ − ε ) < 0 ,

that is
cT y − rbT λ∗ − rε < 0. (4.31)
We distinguish two cases.
Case r = 0. In this case, (4.30) is Ay = 0 and (4.31) is cT y < 0. Applying Farkas' lemma
to the compatible system AT λ ≤ c (it is verified at least by λ∗ ), we obtain that
cT y ≥ 0, for each y ≥ 0 such that Ay = 0, contradicting (4.30)–(4.31). Therefore
r ≠ 0.
Case r > 0. Divide (4.30) by r, and define x∗ = y/r to obtain

Ax∗ − b = 0. (4.32)

As y ≥ 0 and r > 0, x∗ is feasible for the primal problem. Dividing also (4.31) by
r, we obtain
cT x∗ − bT λ∗ − ε < 0. (4.33)
Denote δ = cT x∗ − bT λ∗ . By Corollary 4.6, as x∗ is primal feasible and λ∗ dual
feasible, we know that δ = cT x∗ − bT λ∗ ≥ 0. Therefore, (4.33) is written as

0 ≤ δ < ε.

As this must be true for any arbitrarily small ε, we obtain δ = 0, and cT x∗ = bT λ∗ .

From Corollary 4.11, x∗ is the optimal solution of the primal problem.
As the dual of the dual is the primal (Theorem 4.15), the result holds in the other
direction as well.
Another proof, based on the optimality conditions, is presented in Theorem 6.33.
Note that the strong duality result does not hold in general for all optimization
problems. Yet, it holds if the objective function is convex and the constraints linear
(see Bertsekas, 1999, Proposition 5.2.1).
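The strong duality theorem can be illustrated numerically. The sketch below assumes SciPy's linprog is available; it builds a random linear problem that is feasible for both the primal and the dual by construction, solves both, and compares the optimal values.

```python
# Strong duality checked numerically (an illustrative sketch; the random
# instance and the construction of b and c are our own assumptions).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.normal(size=(m, n))
b = A @ rng.uniform(0.5, 1.5, size=n)   # b = A x0 with x0 > 0: primal feasible
lam0 = rng.normal(size=m)
c = A.T @ lam0 + rng.uniform(0.1, 1.0, size=n)  # c = A^T lam0 + s, s > 0: dual feasible

# primal: min c^T x subject to Ax = b, x >= 0
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * n)
# dual: max b^T lam subject to A^T lam <= c (linprog minimizes, hence the sign flip)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * m)

print(primal.fun, -dual.fun)   # the two optimal values coincide
```

Since both problems are feasible by construction, weak duality makes both bounded, so both admit optimal solutions and the two values agree as the theorem asserts.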

4.3 Exercises
Exercise 4.1. Consider the optimization problem

min_{x∈R2} x1² + x2² subject to x1 = 1 .

1. Write the Lagrangian of this problem.


2. Write the dual function.
3. Write and solve the dual problem.

Exercise 4.2. Same questions as for Exercise 4.1 for the problem
min_{x∈R2} (1/2)(x1² + x2²) subject to x1 ≥ 1 .

Exercise 4.3. Consider a matrix A ∈ Rn×n such that AT = −A, and the vector

c ∈ Rn . Consider the optimization problem

min_{x∈Rn} cT x

subject to
Ax ≥ −c
x ≥ 0.
Demonstrate that this problem is self-dual, i.e., that the dual problem is equivalent
to the primal problem.
Exercise 4.4. Consider the optimization problem

min_{x∈R2} −3x1 + 2x2

subject to
x1 − x2 ≤ 2
−x1 + x2 ≤ −3
x1 , x2 ≥ 0.
1. Write the Lagrangian.
2. Write the dual function.
3. Write the dual problem.
4. Represent graphically the feasible set of the primal problem.
5. Represent graphically the feasible set of the dual problem.
Exercise 4.5. Same questions for the following problem.

min_{x∈R2} −x1 − x2

subject to
−x1 + x2 ≤ 1
x1 − (1/2) x2 ≤ 0
x1 , x2 ≥ 0 .

Part II

Optimality conditions

As far as the laws of mathematics


refer to reality, they are not certain;
and as far as they are certain, they
do not refer to reality.

Albert Einstein

Before developing algorithms that enable us to identify solutions to an optimization


problem, we must be able to decide whether a given point is optimal or not. These
optimality conditions have three key roles in the development of algorithms:
1. they provide a theoretical analysis of the problem,
2. they directly inspire ideas for algorithms,
3. they render it possible to determine a stopping criterion for iterative algorithms.
We analyze them in detail and in a gradual manner, starting with the simplest
ones.

Chapter 5

Unconstrained optimization

Contents
5.1 Necessary optimality conditions . . . . . . . . . . . . . . . 115
5.2 Sufficient optimality conditions . . . . . . . . . . . . . . . 120
5.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

5.1 Necessary optimality conditions


Consider the problem of unconstrained optimization (1.71) minx∈Rn f(x) and a local
minimum x∗ , as defined in Definition 1.5. We attempt to characterize the minimum
by using the results developed in Chapter 2. The first optimality condition is a
generalization of a well-known result, attributed to Fermat.

Theorem 5.1 (Necessary optimality conditions). Let x∗ be a local minimum of a


function f : Rn → R. If f is differentiable in an open neighborhood V of x∗ , then,

∇f(x∗ ) = 0 . (5.1)

If, in addition, f is twice differentiable on V, then

∇2 f(x∗ ) is positive semidefinite . (5.2)

Condition (5.1) is said to be a first-order necessary condition, and condition


(5.2) is said to be a second-order necessary condition.

Proof. We recall that −∇f(x∗ ) is the direction of the steepest descent in x∗ (Theorem
2.13) and assume by contradiction that ∇f(x∗ ) ≠ 0. We can then use Theorem 2.11
with the descent direction d = −∇f(x∗ ) to obtain η such that

f(x∗ − α∇f(x∗ )) < f(x∗ ) , ∀α ∈ ]0, η] ,

which contradicts the optimality of x∗ and demonstrates the first-order condition. To


demonstrate the second-order condition, we invoke Taylor’s theorem (Theorem C.2)
in x∗ , with an arbitrary direction d and an arbitrary step α > 0 such that x∗ +αd ∈ V.
As
f(x∗ + αd) − f(x∗ ) = α dT ∇f(x∗ ) + (1/2) α² dT ∇2 f(x∗ )d + o(‖αd‖²) ,

we have

f(x∗ + αd) − f(x∗ ) = (1/2) α² dT ∇2 f(x∗ )d + o(‖αd‖²)   from (5.1)
                    = (1/2) α² dT ∇2 f(x∗ )d + o(α²)       ‖d‖ does not depend on α
                    ≥ 0                                    x∗ is optimal .

When we divide by α², we get

(1/2) dT ∇2 f(x∗ )d + o(α²)/α² ≥ 0. (5.3)
Intuitively, as the second term can be made as small as desired, the result must hold.
More formally, let us assume by contradiction that dT ∇2 f(x∗ )d is negative and that
its value is −2ε, with ε > 0. According to the Landau notation o(·) (Definition B.17),
for all ε > 0, there exists η such that

|o(α²)/α²| < ε, ∀ 0 < α ≤ η ,

and

(1/2) dT ∇2 f(x∗ )d + o(α²)/α² ≤ (1/2) dT ∇2 f(x∗ )d + |o(α²)/α²| < (1/2)(−2ε) + ε = 0 ,
which contradicts (5.3) and proves that dT ∇2 f(x∗ )d ≥ 0. Since d is an arbitrary
direction, ∇2 f(x∗ ) is positive semidefinite (Definition B.8).
From a geometrical point of view, the second-order condition means that f is
locally convex at x∗ (Theorem 2.21).
Example 5.2 (Affine function). Consider an affine function (see Definition 2.25):

f(x) = cT x + d, (5.4)

where c ∈ Rn is a vector of constants and d ∈ R. Then, ∇f(x) = c and ∇2 f(x) = 0.


Therefore, the necessary optimality conditions are verified for every x if c = 0, and
for no x if c 6= 0. The geometric interpretation is that an affine function is bounded
only if it is constant. We have used this property in Section 4.1 to derive the dual
problem.

Example 5.3 (Necessary optimality condition – I). Consider the function


f(x1 , x2 ) = 100 (x2 − x1²)² + (1 − x1)²

illustrated in Figure 5.1 (see Section 11.6 for a discussion of this function). The point
(1, 1)T is a local minimum of the function. We have
 
∇f(x1 , x2 ) = ( 400 x1³ − 400 x1 x2 + 2x1 − 2 ; 200 x2 − 200 x1² ) ,

which is indeed zero at (1, 1)T . Moreover,
 
∇2 f(x1 , x2 ) = ( 1200 x1² − 400 x2 + 2 , −400 x1 ; −400 x1 , 200 ) ,
which, at (1, 1)T , is:

∇2 f(1, 1) = ( 802 , −400 ; −400 , 200 ) ,

for which the eigenvalues are positive (0.39936 and 1,001.6) and the Hessian matrix
is positive semidefinite. Note that the conditioning of f at (1, 1)T is high (2,508)
and that the function is ill-conditioned at the solution (Section 2.5).


Figure 5.1: Function of Example 5.3
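The computations of Example 5.3 are easy to verify numerically. The following is a minimal sketch, assuming only that NumPy is available.

```python
# Numerical check of Example 5.3: gradient zero at (1, 1), Hessian positive
# definite there, and a large condition number (about 2,508).
import numpy as np

def gradient(x1, x2):
    return np.array([400 * x1**3 - 400 * x1 * x2 + 2 * x1 - 2,
                     200 * x2 - 200 * x1**2])

def hessian(x1, x2):
    return np.array([[1200 * x1**2 - 400 * x2 + 2, -400 * x1],
                     [-400 * x1, 200.0]])

print(gradient(1.0, 1.0))               # the zero vector
eig = np.linalg.eigvalsh(hessian(1.0, 1.0))
print(eig)                              # about 0.39936 and 1001.6, both positive
print(eig.max() / eig.min())            # condition number, about 2508
```

The eigenvalue ratio reproduces the conditioning of 2,508 quoted in the text.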

It is important to emphasize that the necessary optimality conditions are not


sufficient, as shown by Examples 5.4 and 5.5.
Example 5.4 (Necessary optimality condition – II). Consider the function

f(x1 , x2 ) = −x1⁴ − x2⁴



illustrated in Figure 5.2. The point (0, 0)T satisfies the necessary optimality
conditions. Indeed,
∇f(x1 , x2 ) = ( −4x1³ ; −4x2³ )
is zero at (0, 0)T . Moreover,
∇2 f(x1 , x2 ) = ( −12 x1² , 0 ; 0 , −12 x2² )
is positive semidefinite at (0, 0)T . However, this is not a local minimum. To
demonstrate this, consider a non zero arbitrary direction d = (d1 , d2 )T and take
a step α > 0 of any length from the point (0, 0)T . We have

0 = f(0, 0) > f(αd1 , αd2 ) = −(αd1 )⁴ − (αd2 )⁴


and (0, 0)T turns out to be a local maximum. From a geometrical point of view,
the function is in fact concave and not convex at (0, 0)T .


Figure 5.2: Function of Example 5.4

Example 5.5 (Necessary optimality condition – III). Consider the function

f(x1 , x2 ) = 50 x1² − x2³

illustrated in Figure 5.3. The point (0, 0)T satisfies the necessary optimality
conditions. Indeed,
∇f(x1 , x2 ) = ( 100 x1 ; −3x2² )
is zero at (0, 0)T . Moreover,
∇2 f(x1 , x2 ) = ( 100 , 0 ; 0 , −6x2 )
is positive semidefinite at (0, 0)T . However, it is not a local minimum. To show
this, consider the direction d = (0, 1)T and take any step α > 0 from the
point (0, 0)T . We have

0 = f(0, 0) > f(0, α) = −α³

and (0, 0)T is not a local minimum. However, if we consider the direction d = (0, −1)T ,
we obtain

0 = f(0, 0) < f(0, −α) = α³ .

Then, (0, 0)T is not a local maximum either. From a geometrical viewpoint, the
function is neither concave nor convex at (0, 0)T .


Figure 5.3: Function of Example 5.5
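The situation of Example 5.5 can also be checked numerically: at (0, 0)T the Hessian is positive semidefinite, yet the function decreases along d = (0, 1)T and increases along d = (0, −1)T. A minimal sketch, assuming NumPy:

```python
# Numerical companion to Example 5.5: (0, 0) is a saddle point even though it
# satisfies the necessary optimality conditions.
import numpy as np

def f(x1, x2):
    return 50 * x1**2 - x2**3

hess0 = np.array([[100.0, 0.0], [0.0, 0.0]])   # Hessian of f at (0, 0)
print(np.linalg.eigvalsh(hess0))               # eigenvalues 0 and 100: semidefinite

alpha = 0.1
print(f(0.0, alpha))    # negative: descent along d = (0, 1)
print(f(0.0, -alpha))   # positive: ascent along d = (0, -1)
```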

In practice, the second-order necessary condition is difficult to check, as this
requires calculations of the second derivatives and analyses of the eigenvalues of the
Hessian matrix. The first-order necessary optimality condition plays a central role in
optimization. The vectors x that satisfy this condition are called critical points or
stationary points. Among them, there are local minima, local maxima, and points
that are neither (Example 5.5). The latter are called saddle points.

Definition 5.6 (Critical point). Let f : Rn → R be a differentiable function. Any


vector x ∈ Rn such that ∇f(x) = 0 is said to be a critical point or stationary point
of f.

5.2 Sufficient optimality conditions

Theorem 5.7 (Sufficient optimality conditions). Consider a function f : Rn →


R twice differentiable in an open subset V of Rn and let x∗ ∈ V satisfy the
conditions
∇f(x∗ ) = 0 (5.5)
and
∇2 f(x∗ ) is positive definite . (5.6)

In this case, x∗ is a local minimum of f.

Proof. We assume by contradiction that there exists a direction d and η > 0 such
that, for any 0 < α ≤ η, f(x∗ + αd) < f(x∗ ). With an identical approach to the proof
of Theorem 5.1, we have

( f(x∗ + αd) − f(x∗ ) ) / α² = (1/2) dT ∇2 f(x∗ )d + o(α²)/α²

and
(1/2) dT ∇2 f(x∗ )d + o(α²)/α² < 0
or
(1/2) dT ∇2 f(x∗ )d + o(α²)/α² + ε = 0
with ε > 0. According to the definition of the Landau notation o( · ) (Definition
B.17), there exists η̄ such that

|o(α²)/α²| < ε, ∀α, 0 < α ≤ η̄ ,

and then, for any α ≤ min(η, η̄), we have

−o(α²)/α² ≤ |o(α²)/α²| < ε,

such that
(1/2) dT ∇2 f(x∗ )d = −o(α²)/α² − ε < 0 ,
which contradicts the fact that ∇2 f(x∗ ) is positive definite.

Example 5.8 (Optimality conditions). Consider the function

f(x1 , x2 ) = (1/2) x1² + x1 cos x2
illustrated in Figure 5.4. We use the optimality conditions to identify the minima of

this function. We have


 
∇f(x1 , x2 ) = ( x1 + cos x2 ; −x1 sin x2 ) .


Figure 5.4: Function of Example 5.8: the surface

This gradient is zero for x∗k = ((−1)^(k+1) , kπ)T , k ∈ Z, and for x̄k = (0, π/2 + kπ)T ,
k ∈ Z, as illustrated in Figure 5.5. We also have
 
∇2 f(x1 , x2 ) = ( 1 , − sin x2 ; − sin x2 , −x1 cos x2 ) .

By evaluating this matrix at x∗k , we get for any k

∇2 f(x∗k ) = ( 1 , 0 ; 0 , 1 ) .

Since this matrix is positive definite, each point x∗k satisfies the sufficient optimality
conditions and is a local minimum of the function.
By evaluating the Hessian matrix at x̄k , we get for any k

∇2 f(x̄k ) = ( 1 , (−1)^(k+1) ; (−1)^(k+1) , 0 ) .


Figure 5.5: Function of Example 5.8: stationary points

Regardless of k, this matrix is not positive semidefinite. Therefore, there is no x̄k


that satisfies the necessary optimality conditions. None of them can then be a local
minimum.
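The stationary points of Example 5.8 can be verified numerically for a few values of k. A minimal sketch, assuming NumPy; the tolerances absorb the rounding of sin(kπ) in floating point.

```python
# Verifying Example 5.8 for a few k: the points x_k^* = ((-1)^(k+1), k*pi)
# are stationary and their Hessian is the identity matrix.
import numpy as np

def grad(x1, x2):
    return np.array([x1 + np.cos(x2), -x1 * np.sin(x2)])

def hess(x1, x2):
    return np.array([[1.0, -np.sin(x2)], [-np.sin(x2), -x1 * np.cos(x2)]])

for k in range(-2, 3):
    xk = ((-1.0) ** (k + 1), k * np.pi)
    assert np.allclose(grad(*xk), 0.0, atol=1e-12)
    assert np.allclose(hess(*xk), np.eye(2), atol=1e-12)
print("all sampled x_k^* are stationary with identity Hessian")
```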

We now present a sufficient condition for a local minimum to also be a global


minimum.

Theorem 5.9 (Sufficient global optimality conditions). Consider a continuous


function f : Rn → R and let x∗ ∈ Rn be a local minimum of f. If f is a convex
function, then x∗ is a global minimum of f. If, moreover, f is strictly convex, x∗
is the only global minimum of f.

Proof. We assume by contradiction that there exists another local minimum x+ 6= x∗ ,


such that f(x+ ) < f(x∗ ). By the convexity of f (Definition 2.1), we have

f(αx∗ + (1 − α)x+ ) ≤ αf(x∗ ) + (1 − α)f(x+ ),

where 0 ≤ α ≤ 1. Since f(x+ ) < f(x∗ ), we have for each α ∈ [0, 1[



f(αx∗ + (1 − α)x+ ) < αf(x∗ ) + (1 − α)f(x∗ ) = f(x∗ ) . (5.7)

It means that any point strictly between x∗ and x+ is also strictly better than x∗ .
Consider an arbitrary ε > 0, and demonstrate that Definition 1.5 of the local minimum
is contradicted. If ε ≥ ‖x∗ − x+ ‖, (1.75) is not satisfied for x = x+ , when taking α = 1
in (5.7). If ε < ‖x∗ − x+ ‖, consider 0 < η < 1 such that ‖ηx∗ + (1 − η)x+ ‖ = ε. In
this case, (1.75) is not satisfied for x = αx∗ + (1 − α)x+ with η ≤ α < 1 according to
(5.7). Since η < 1, such α always exist.

We now consider a strictly convex function, and assume that x∗ and y∗ are two
distinct global minima, so that x∗ ≠ y∗ and f(x∗ ) = f(y∗ ). According to Definition
2.2, we have

f(αx∗ + (1 − α)y∗ ) < αf(x∗ ) + (1 − α)f(y∗ ) = f(x∗ ) = f(y∗ ) , ∀α ∈ ]0, 1[ ,

which contradicts the fact that x∗ and y∗ are global minima.

Pierre de Fermat was born in Beaumont-de-Lomagne close to


Montauban on August 20, 1601, and died in Castres on January
12, 1665. With the exception of a few isolated articles, Fermat
never published and never gave any publicity to his methods.
Some of his most important results were written in the mar-
gin of books, the most often without proof. For instance, his
“observations on Diophante,” an important part of his work on
number theory, was published by his son on the basis of margin
notes in a copy of Arithmetica. Fermat’s conjecture is probably
the most famous of his intuitions. It affirms that when n ≥ 3,
there exist no non-zero integers x, y, and z such that
xⁿ + yⁿ = zⁿ . He wrote the following note in the margin of
Arithmetica by Diophante: “I have a marvelous demonstration, but this margin is
too narrow to contain it.” This conjecture, called Fermat’s last theorem, was proven
by Wiles (1995). Fermat’s body was transferred from Castres to the Augustinian
Convent in Toulouse in 1675.
Figure 5.6: Pierre de Fermat

We conclude this chapter with a discussion of the optimality conditions for quadratic
problems (Definition 2.28).

Theorem 5.10 (Optimality conditions for quadratic problems). We consider the


problem
min_{x∈Rn} f(x) = (1/2) xT Qx + gT x + c , (5.8)
where Q ∈ Rn×n is a symmetric matrix, g ∈ Rn and c ∈ R.
1. If Q is not positive semidefinite, then the problem (5.8) has no solution, i.e.,
there is no x ∈ Rn that is a local minimum of (5.8).
2. If Q is positive definite, then

x∗ = −Q−1 g (5.9)

is the only global minimum of (5.8).


3. If Q is positive semidefinite, but not positive definite,

either the problem is not bounded or there is an infinite number of global


minima. More precisely, we consider the Schur decomposition of Q (see
Theorem C.5):
Q = UΛUT , (5.10)
Ce document est la propriété exclusive de Kavyaa Kannan (kk392@snu.edu.in) - jeudi 18 avril 2024 à 07h48

where U is an orthogonal matrix (composed of the eigenvectors of Q organized


in columns) and Λ is a diagonal matrix with the eigenvalues of Q as entries.
As Q is positive semidefinite but not positive definite, it means that it is rank
deficient, so that r eigenvalues are positive and n − r are zero. We can sort
the indices in such a way that the r first eigenvalues on the diagonal are
positive, and the n − r last are zero:
   
Λ = ( Λr , 0 ; 0 , Λn−r ) = ( Λr , 0 ; 0 , 0 ) = diag(λ1 , . . . , λr , 0, . . . , 0). (5.11)

We decompose the vectors x and g as follows:


   
UT x = ( yr ; yn−r )   and   UT g = ( gr ; gn−r ) , (5.12)

where yr , gr ∈ Rr and yn−r , gn−r ∈ Rn−r . Therefore, if gn−r ≠ 0, the problem
is unbounded. If gn−r = 0, then for any yn−r ∈ Rn−r ,

x∗ = U ( −Λr⁻¹ gr ; yn−r ) (5.13)

is a global minimum of (5.8).

Proof. We have ∇f(x) = Qx + g and ∇2 f(x) = Q.


1. We assume by contradiction that there exists a local minimum x∗ of (5.8). Accord-
ing to (5.2) of Theorem 5.1, ∇2 f(x) = Q is positive semidefinite, which contradicts
the hypothesis.
2. Since Q is positive definite, the point x∗ in (5.9) is well defined and
∇f(x∗ ) = −QQ−1 g + g = 0 .
The sufficient optimality conditions (5.5) and (5.6) are satisfied and x∗ is a lo-
cal minimum of f. Moreover, according to Theorem 2.21, f is strictly convex.
According to Theorem 5.9, x∗ is the only global minimum.
3. Using the Schur decomposition, and the fact that U is orthogonal (so that UUT =
I), we write the objective function of (5.8) as
f(x) = (1/2) xT Qx + gT x + c = (1/2) xT UΛUT x + gT UUT x + c. (5.14)
Using (5.12), we obtain
f(yr , yn−r ) = (1/2) yTr Λr yr + gTr yr + gTn−r yn−r + c. (5.15)

The gradient is
∇f(y) = ( Λr yr + gr ; gn−r ) .
If gn−r ≠ 0, the gradient is different from zero for any value of y, and the necessary
optimality condition is never verified. Now, if gn−r = 0, the variables yn−r do

not affect the objective function. We fix yn−r to any arbitrary value and solve
the problem for yr :
min f(yr ) = (1/2) yTr Λr yr + gTr yr . (5.16)
As Λr is positive definite, the first result of this theorem applies, and

y∗r = −Λr⁻¹ gr . (5.17)

We obtain (5.13) using (5.12).

The last result of Theorem 5.10 has a geometric interpretation. The Schur de-
composition actually identifies one subspace where the quadratic function is strictly
convex (the subspace corresponding to the positive eigenvalues), and one subspace
where the function is linear (the subspace corresponding to zero eigenvalues). In this
latter subspace, in order to guarantee that the function is bounded, the linear part
must be constant, which corresponds to the condition gn−r = 0 (see Example 5.2).
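For the positive definite case, formula (5.9) is directly implementable. A minimal sketch, assuming NumPy; the matrix Q and vector g below are arbitrary illustrative choices.

```python
# Theorem 5.10, case 2, in code: for a symmetric positive definite Q, the
# unique global minimum of (1/2) x^T Q x + g^T x + c is x* = -Q^{-1} g.
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])     # symmetric, with positive eigenvalues
g = np.array([1.0, 2.0])

assert np.all(np.linalg.eigvalsh(Q) > 0)   # Q is positive definite
x_star = -np.linalg.solve(Q, g)            # solve Q x = -g instead of forming Q^{-1}
print(Q @ x_star + g)                      # gradient at x*: the zero vector
```

Solving the linear system Qx = −g is preferable in practice to explicitly inverting Q, both for cost and for numerical accuracy.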

5.3 Exercises
For the following optimization problems:
1. Calculate the gradient and the Hessian of the objective function.
2. Identify the critical points.
3. Eliminate those that do not satisfy the necessary optimality conditions.
4. Identify those that satisfy the sufficient optimality conditions.
Exercise 5.1. min_{x∈R2} x1² + x2² .

Exercise 5.2. min_{x∈R2} (1/3) x1³ + x2³ − x1 − x2 .

Exercise 5.3. min_{x∈R} x² + 1/(x − 2) .

Exercise 5.4. min_{x∈R2} x1⁶ − 3x1⁴ x2² + 3x1² x2⁴ − x2⁶ .

Exercise 5.5. min_{x∈R2} f(x), where f is defined by one of the functions of Exercise 2.2.
Ce document est la propriété exclusive de Kavyaa Kannan (kk392@snu.edu.in) - jeudi 18 avril 2024 à 07h48
Ce document est la propriété exclusive de Kavyaa Kannan (kk392@snu.edu.in) - jeudi 18 avril 2024 à 07h48

Chapter 6

Constrained optimization

Contents
6.1 Convex constraints . . . . . . . . . . . . . . . . . . . . . . 128
6.2 Lagrange multipliers: necessary conditions . . . . . . . . 133
6.2.1 Linear constraints . . . . . . . . . . . . . . . . . . . . . . 133
6.2.2 Equality constraints . . . . . . . . . . . . . . . . . . . . . 137
6.2.3 Equality and inequality constraints . . . . . . . . . . . . . 142
6.3 Lagrange multipliers: sufficient conditions . . . . . . . . 152
6.3.1 Equality constraints . . . . . . . . . . . . . . . . . . . . . 153
6.3.2 Inequality constraints . . . . . . . . . . . . . . . . . . . . 154
6.4 Sensitivity analysis . . . . . . . . . . . . . . . . . . . . . . 159
6.5 Linear optimization . . . . . . . . . . . . . . . . . . . . . . 165
6.6 Quadratic optimization . . . . . . . . . . . . . . . . . . . . 171
6.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

The development of optimality conditions in the presence of constraints is based on


the same intuition as in the unconstrained case: it is impossible to descend from a
minimum. However, we can no longer apply the optimality conditions described in
Chapter 5, as illustrated by the following example.
Example 6.1 (Necessary optimality condition without constraint). Consider the
problem
min f(x) = x2
subject to
x ≥ 1.
The solution to the problem is x = 1. And yet, f'(1) = 2 \neq 0.

Here, instead of verifying that no direction is a descent direction, we must only


take into account the feasible directions and, if there is none, the feasible directions
at the limit (see Definition 3.21 and the discussions of Section 3.3). Theorem 6.2
expresses that if x∗ is a local minimum, no feasible direction at the limit is a descent
direction.
128 Convex constraints

Theorem 6.2 (Necessary optimality conditions for general constraints). Let x^* be a local minimum for the optimization problem \min_{x \in \mathbb{R}^n} f(x) subject to h(x) = 0, g(x) \leq 0 and x \in X defined in Section 1.4. Then,

\nabla f(x^*)^T d \geq 0 (6.1)

for any feasible direction d at the limit in x^*.

Proof. We assume by contradiction that there exists a feasible direction at the limit d such that \nabla f(x^*)^T d < 0, and let us consider a feasible (sub-)sequence (x_k)_{k \in \mathbb{N}} of Definition 3.21. According to Taylor's theorem (Theorem C.1) with d = x_k - x^*, we have

f(x_k) = f(x^*) + (x_k - x^*)^T \nabla f(x^*) + o(\| x_k - x^* \|). (6.2)

Since d = \lim_k (x_k - x^*), and \nabla f(x^*)^T d < 0, there exists an index K such that (x_k - x^*)^T \nabla f(x^*) < 0 for all k \geq K. In addition, the term o(\| x_k - x^* \|) can be made as small as desired by making k sufficiently large (see Theorem 2.11 or Theorem 5.1 for a more formal analysis of this result). Therefore, there exists an index k large enough that f(x_k) < f(x^*), which contradicts the local optimality of x^*.

This general result does not take into account a possible structure in the con-
straints. We now propose optimality conditions for specific problems.

6.1 Convex constraints


We now consider the optimization problem min f(x) subject to x ∈ X, where X is a
closed non empty convex set. We obtain a specific version of Theorem 6.2.

Theorem 6.3 (Necessary optimality conditions for convex constraints). Let x^* be a local minimum of the optimization problem

\min_{x \in X} f(x),

where f : \mathbb{R}^n \to \mathbb{R} is differentiable on X and X is a non empty convex set. Then,

\nabla f(x^*)^T (x - x^*) \geq 0, \qquad \forall x \in X. (6.3)
Proof. We assume by contradiction that (6.3) is not satisfied. In this case, according
to Definition 2.10, the direction d = x − x∗ is a descent direction. According to
Theorem 2.11, there exists η > 0 such that

f(x∗ + αd) < f(x∗ ) , ∀α ∈ [0, η] . (6.4)



Moreover, according to Theorem 3.11, d is a feasible direction and x∗ + αd is feasible


for any 0 < α ≤ 1. Then, for each 0 < α ≤ min(η, 1), we have x∗ + αd ∈ X and
f(x∗ + αd) < f(x∗ ). This contradicts the local optimality of x∗ .
The condition (6.3) signifies geometrically that any feasible direction should form

an acute angle with the gradient, as illustrated in Figure 6.1. When the convex set
has a particular structure, the necessary optimality conditions can be simplified, as
shown in Example 6.4.

Figure 6.1: Illustration of the necessary optimality condition (the convex set X, the gradient \nabla f(x^*), and a direction x - x^* at the point x^*)

Example 6.4 (Bound constraints). Consider the optimization problem

\min_{x \in X \subset \mathbb{R}^n} f(x)

with

X = \{ x \mid \ell_i \leq x_i \leq u_i, \ i = 1, \ldots, n \}, (6.5)

where \ell_i \neq u_i for any i = 1, \ldots, n. Let x^* be a local minimum of this problem. Since the condition (6.3) should be satisfied for all x \in X, we select some specific values to derive necessary conditions. Each time, we select an arbitrary index i and choose x \in \mathbb{R}^n such that x_j = x_j^* for all j \neq i. For such x, the condition (6.3) simplifies to

\frac{\partial f(x^*)}{\partial x_i} (x_i - x_i^*) \geq 0. (6.6)

We now need to specify x_i and verify that \ell_i \leq x_i \leq u_i in order to obtain a feasible point and apply the necessary optimality condition. We consider three cases.

1. x_i^* = \ell_i. If we choose

x_i = \ell_i + \frac{u_i - \ell_i}{2} = \frac{u_i + \ell_i}{2},

then x_i is located exactly halfway between \ell_i and u_i, and x is feasible. Moreover, x_i - x_i^* = (u_i - \ell_i)/2 > 0 because u_i > \ell_i. The condition (6.6) implies that

\frac{\partial f(x^*)}{\partial x_i} \geq 0.
2. x_i^* = u_i. If we choose

x_i = u_i - \frac{u_i - \ell_i}{2} = \frac{u_i + \ell_i}{2},

x is feasible. Moreover, x_i - x_i^* = -(u_i - \ell_i)/2 < 0. The condition (6.6) implies that

\frac{\partial f(x^*)}{\partial x_i} \leq 0.
3. \ell_i < x_i^* < u_i. If we choose

x_i = \frac{u_i + x_i^*}{2},

x is feasible. Moreover, x_i - x_i^* = (u_i - x_i^*)/2 > 0 because x_i^* < u_i. The condition (6.6) implies that

\frac{\partial f(x^*)}{\partial x_i} \geq 0.

If we choose

x_i = \frac{\ell_i + x_i^*}{2},

x is feasible. Moreover, x_i - x_i^* = (\ell_i - x_i^*)/2 < 0 because \ell_i < x_i^*. The condition (6.6) implies that

\frac{\partial f(x^*)}{\partial x_i} \leq 0.

By combining these two results, we get

\frac{\partial f(x^*)}{\partial x_i} = 0.

Then, in the case of bound constraints defined by (6.5), the necessary optimality conditions can be written as

\frac{\partial f(x^*)}{\partial x_i} \geq 0, \quad \text{if } x_i^* = \ell_i,
\frac{\partial f(x^*)}{\partial x_i} \leq 0, \quad \text{if } x_i^* = u_i,
\frac{\partial f(x^*)}{\partial x_i} = 0, \quad \text{if } \ell_i < x_i^* < u_i,

for any i such that \ell_i < u_i. Finally, let us note that, in the case where \ell_i = u_i, each feasible x is such that x_i = \ell_i = u_i = x_i^*, and the condition (6.6) is trivially satisfied, regardless of the value of \partial f(x^*)/\partial x_i.
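These conditions are easy to check numerically. The sketch below (Python; a quick illustration, not part of the original text) uses f(x) = x_1^2 + x_2^2 on the box [0.7, 2] x [-1, 1] with candidate point x^* = (0.7, 0), the same data as in the bound-constraint illustration discussed next:

```python
import numpy as np

# Sketch: check the bound-constraint optimality conditions for
# f(x) = x1^2 + x2^2 on [0.7, 2] x [-1, 1] at x* = (0.7, 0).
grad = lambda x: 2.0 * x                   # gradient of f
lo = np.array([0.7, -1.0])
up = np.array([2.0, 1.0])
x_star = np.array([0.7, 0.0])

g = grad(x_star)
ok = True
for i in range(2):
    if np.isclose(x_star[i], lo[i]):
        ok = ok and g[i] >= 0              # lower bound active
    elif np.isclose(x_star[i], up[i]):
        ok = ok and g[i] <= 0              # upper bound active
    else:
        ok = ok and np.isclose(g[i], 0.0)  # strictly inside the bounds
print(ok)   # True
```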

Figure 6.2(b) illustrates the problem

\min f(x) = x_1^2 + x_2^2

subject to

0.7 \leq x_1 \leq 2
-1 \leq x_2 \leq 1.

The solution is x^* = (0.7, 0)^T and \nabla f(x^*) = (1.4, 0)^T. Since x_1^* = \ell_1 = 0.7, we have \partial f(x^*)/\partial x_1 \geq 0. Since \ell_2 < x_2^* < u_2, we have \partial f(x^*)/\partial x_2 = 0.

Figure 6.2(a) illustrates the problem

\min f(x) = x_1^2 + x_2^2

subject to

-2 \leq x_1 \leq -0.7
-1 \leq x_2 \leq 1.

The solution is x^* = (-0.7, 0)^T and \nabla f(x^*) = (-1.4, 0)^T. Since x_1^* = u_1 = -0.7, we have \partial f(x^*)/\partial x_1 \leq 0. Since \ell_2 < x_2^* < u_2, we have \partial f(x^*)/\partial x_2 = 0.

Figure 6.2: Illustration of the necessary optimality condition for bound constraints ((a) upper bound active; (b) lower bound active)

Theorem 6.5 (Sufficient optimality conditions for convex constraints – I). Consider the optimization problem

\min_{x \in X} f(x),

where X is a closed non empty convex set, and f : \mathbb{R}^n \to \mathbb{R} is differentiable and convex on X. Then, (6.3) is a sufficient condition for x^* to be a global minimum of f in X.
132 Convex constraints

Proof. According to Theorem 2.16, we have

f(x) - f(x^*) \geq (x - x^*)^T \nabla f(x^*), \qquad \forall x \in X.

If (6.3) is satisfied, then

f(x) - f(x^*) \geq 0, \qquad \forall x \in X,

and x^* is a global minimum (Definition 1.7).


Example 6.6 (Projection on a convex set). Let X be a closed non empty convex subset of \mathbb{R}^n and let us take z \in \mathbb{R}^n. The projection of z on X, denoted by [z]^P, is defined as the unique solution of the following optimization problem:

\min_x f(x) = \frac{1}{2} (x - z)^T (x - z) \quad \text{subject to } x \in X.

Since f is convex and \nabla f(x) = x - z, a necessary and sufficient condition for x^* to be the projection of z on X is

(x^* - z)^T (x - x^*) \geq 0, \qquad \forall x \in X. (6.7)

Note that if z \in X, then (6.7) implies that x^* = z.
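The variational inequality (6.7) can be checked numerically. In the sketch below (Python; the box X = [0, 1]^2 and the point z are illustrative choices, not taken from the text), the projection on a box reduces to a componentwise clip:

```python
import numpy as np

# Projection of z on the box X = [0,1]^2, and a check of (6.7).
z = np.array([2.0, -0.5])
proj = np.clip(z, 0.0, 1.0)      # for a box, [z]^P is a componentwise clip

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.uniform(0.0, 1.0, size=2)          # random feasible point
    assert (proj - z) @ (x - proj) >= -1e-12   # (x* - z)^T (x - x*) >= 0
print(proj)
```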

It is interesting to characterize the optimality condition (6.3) by using the projection operator.

Theorem 6.7 (Optimality conditions for convex constraints – II). Consider the optimization problem

\min_{x \in X} f(x),

where X is a closed non empty convex set and f : \mathbb{R}^n \to \mathbb{R} is differentiable. If x^* is a local minimum, then

x^* = [x^* - \alpha \nabla f(x^*)]^P, \qquad \forall \alpha > 0. (6.8)

If, moreover, f is convex, (6.8) is sufficient for x^* to minimize f over X.

Proof. Consider z(\alpha) = x^* - \alpha \nabla f(x^*). According to (6.7), we have [z(\alpha)]^P = x^* for all \alpha > 0 if and only if

(x^* - z(\alpha))^T (x - x^*) \geq 0, \qquad \forall x \in X, \ \forall \alpha > 0,

or

(x^* - x^* + \alpha \nabla f(x^*))^T (x - x^*) \geq 0, \qquad \forall x \in X, \ \forall \alpha > 0.

The latter equation is equivalent to the optimality condition (6.3).
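The fixed-point condition (6.8) can be illustrated with a short numerical sketch (Python; the box [0.7, 2] x [-1, 1] and f(x) = x_1^2 + x_2^2 are the same illustrative data as the bound-constraint example above):

```python
import numpy as np

# Sketch of the fixed-point condition (6.8) for f(x) = x1^2 + x2^2 over the
# box [0.7, 2] x [-1, 1].
lo, up = np.array([0.7, -1.0]), np.array([2.0, 1.0])
proj = lambda z: np.clip(z, lo, up)        # projection on the box
grad = lambda x: 2.0 * x

x_star = np.array([0.7, 0.0])              # the minimum
for alpha in (0.1, 1.0, 10.0):
    assert np.allclose(proj(x_star - alpha * grad(x_star)), x_star)

# At a non-optimal feasible point, the fixed-point property fails.
x = np.array([1.5, 0.5])
print(proj(x - 0.1 * grad(x)))             # differs from x
```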



6.2 Lagrange multipliers: necessary conditions


Theorem 6.2 is based on the notion of feasible directions at the limit, which form
the tangent cone (Definition 3.22). As we discussed in the last part of Section 3.3,
this notion is too complex and the linearized cone (Definition 3.23) is much easier

to handle. The most common cases, where the linearized cone is equivalent to the
tangent cone, are optimization problems with linear constraints (Theorem 3.27) and
linearly independent constraints (Theorem 3.28). Therefore, we present the results for
only these cases. It is also possible to develop optimality conditions by considering the
linearized cone from a general point of view. The details are described in Mangasarian
(1979) and Nocedal and Wright (1999). The necessary optimality conditions are
generally called Karush-Kuhn-Tucker conditions or KKT conditions. In fact, for
many years, they were called Kuhn-Tucker conditions, following the article by Kuhn
and Tucker (1951). It later turned out that Karush (1939) had already formulated
them independently. John (1948) proposed a generalization a decade later (Theorem
6.12).
Note that the theory of Lagrange multipliers extends beyond the optimality conditions presented in this book, and that it can also be adapted to non differentiable optimization. We refer the interested reader to Bertsekas (1982) and Rockafellar (1993).
In this text, we adopt the approach of Bertsekas (1999), who first presents these
conditions in the case of linear constraints, and then for problems including linearly
independent equality constraints. The proof provides intuitions that are reused in
the development of algorithms. Subsequently, we generalize the result for problems
that also include inequality constraints.

6.2.1 Linear constraints


Consider the problem
min f(x) (6.9)
x∈Rn

subject to
Ax = b (6.10)

with A \in \mathbb{R}^{m \times n} and b \in \mathbb{R}^m. According to Theorem 3.6, the matrix A can be considered of full rank without loss of generality. In this case, the Karush-Kuhn-Tucker conditions are formulated in the following manner.

Theorem 6.8 (Karush-Kuhn-Tucker: linear case). Let x^* be a local minimum of the problem \min_{x \in \mathbb{R}^n} f(x) subject to Ax = b, where f : \mathbb{R}^n \to \mathbb{R} is differentiable and A \in \mathbb{R}^{m \times n} is of full rank. Then there exists a unique vector \lambda^* \in \mathbb{R}^m such that

\nabla_x L(x^*, \lambda^*) = \nabla f(x^*) + A^T \lambda^* = 0, (6.11)

Albert William Tucker was born on November 25, 1905 in Ontario, Canada, and died in Hightstown, New Jersey on January 25, 1995. He received his PhD in 1932 from Princeton University, where he spent most of his career. He is particularly known for his work on linear programming and game theory. After Dantzig visited von Neumann in 1948, Tucker drove Dantzig back to the train station in Princeton. It was during this short ride that Dantzig exposed linear programming to Tucker. In the context of game theory, von Neumann is generally cited as the inventor of the duality theorem, and Gale, Kuhn, and Tucker as the first to propose a rigorous proof. In 1950, Tucker proposed the example of the prisoner's dilemma to illustrate the difficulty of non zero-sum games. John Nash, Nobel Laureate in Economics 1994, was one of his students at Princeton. The story goes that von Neumann did not agree with the approach of Nash's game theory, while Tucker encouraged him to develop his ideas and supervised his doctoral thesis.
Figure 6.3: Albert William Tucker

where L is the Lagrangian function (Definition 4.3). If f is twice differentiable, then

y^T \nabla^2_{xx} L(x^*, \lambda^*) y \geq 0, \qquad \forall y \in \mathbb{R}^n \text{ such that } Ay = 0. (6.12)

Proof. We employ the technique for the elimination of constraints described in Section 3.4 to convert the optimization problem with constraints into an optimization problem without constraint (3.70). To simplify the proof, we can assume that the variables are arranged in a way that the m variables to eliminate are the m first ones. Then, P = I in (3.70) and the minimization problem is

\min_{x_N \in \mathbb{R}^{n-m}} g(x_N) = f \begin{pmatrix} B^{-1} (b - N x_N) \\ x_N \end{pmatrix}. (6.13)

If x^* = (x_B^*, x_N^*) is a local minimum of the problem with constraints, then x_N^* is a local minimum of the problem without constraints and the necessary condition (5.1) applies to (6.13). By using chain rule differentiation (see (C.6) of Theorem C.3), we obtain

\nabla g(x_N^*) = -N^T B^{-T} \nabla_B f(x^*) + \nabla_N f(x^*) = 0, (6.14)

where \nabla_B f(x^*) and \nabla_N f(x^*) represent the gradient of f with regard to the variables x_B and x_N, respectively. If we define

\lambda^* = -B^{-T} \nabla_B f(x^*), (6.15)

which can also be written as

\nabla_B f(x^*) + B^T \lambda^* = 0, (6.16)

the condition (6.14) is expressed as

\nabla_N f(x^*) + N^T \lambda^* = 0. (6.17)

Equations (6.16) and (6.17) form (6.11), which proves the first-order result.
To demonstrate the second-order result, we first differentiate (6.11) to obtain

\nabla^2_{xx} L(x, \lambda) = \nabla^2 f(x). (6.18)

We consider a vector y \in \mathbb{R}^n such that Ay = 0, and then B y_B + N y_N = 0, or y_B = -B^{-1} N y_N. Then, if d \in \mathbb{R}^{n-m} is an arbitrary vector, the vector

y = \begin{pmatrix} y_B \\ y_N \end{pmatrix} = \begin{pmatrix} -B^{-1} N d \\ d \end{pmatrix} (6.19)

is such that Ay = 0. According to the necessary optimality conditions (5.2) in the unconstrained case and Definition B.8 of a positive semidefinite matrix, we have that

d^T \nabla^2 g(x_N^*) d \geq 0, \qquad \forall d \in \mathbb{R}^{n-m}. (6.20)

However, when differentiating (6.14), we obtain

\nabla^2 g(x_N^*) = N^T B^{-T} \nabla^2_{BB} f(x^*) B^{-1} N - N^T B^{-T} \nabla^2_{BN} f(x^*) - \nabla^2_{NB} f(x^*) B^{-1} N + \nabla^2_{NN} f(x^*), (6.21)

where \nabla^2 f(x^*) is decomposed into

\nabla^2 f(x^*) = \begin{pmatrix} \nabla^2_{BB} f(x^*) & \nabla^2_{BN} f(x^*) \\ \nabla^2_{NB} f(x^*) & \nabla^2_{NN} f(x^*) \end{pmatrix}. (6.22)

Then,

d^T \nabla^2 g(x_N^*) d = d^T N^T B^{-T} \nabla^2_{BB} f(x^*) B^{-1} N d - d^T N^T B^{-T} \nabla^2_{BN} f(x^*) d - d^T \nabla^2_{NB} f(x^*) B^{-1} N d + d^T \nabla^2_{NN} f(x^*) d \quad [from (6.21)]
= y_B^T \nabla^2_{BB} f(x^*) y_B + y_B^T \nabla^2_{BN} f(x^*) y_N + y_N^T \nabla^2_{NB} f(x^*) y_B + y_N^T \nabla^2_{NN} f(x^*) y_N \quad [from (6.19)]
= y^T \nabla^2 f(x^*) y \quad [from (6.22)]
= y^T \nabla^2_{xx} L(x^*, \lambda^*) y \quad [from (6.18)]

and (6.12) is equivalent to (6.20).



Example 6.9 (Karush-Kuhn-Tucker: linear case). We consider the following optimization problem:

\min f(x_1, x_2, x_3, x_4) = x_1^2 + x_2^2 + x_3^2 + x_4^2

subject to

x_1 + x_2 + x_3 = 1
x_1 - x_2 + x_4 = 1.

The solution to this problem is

x^* = \begin{pmatrix} 2/3 \\ 0 \\ 1/3 \\ 1/3 \end{pmatrix} \quad \text{and} \quad \nabla f(x^*) = \begin{pmatrix} 4/3 \\ 0 \\ 2/3 \\ 2/3 \end{pmatrix}.

By decomposing

A = \begin{pmatrix} B & N \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & -1 & 0 & 1 \end{pmatrix},

we obtain

\lambda^* = -B^{-T} \nabla_B f(x^*) = -\begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix} \begin{pmatrix} 4/3 \\ 0 \end{pmatrix} = \begin{pmatrix} -2/3 \\ -2/3 \end{pmatrix},

and (6.11) is expressed as

\begin{pmatrix} 4/3 \\ 0 \\ 2/3 \\ 2/3 \end{pmatrix} + \begin{pmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} -2/3 \\ -2/3 \end{pmatrix} = \begin{pmatrix} 4/3 \\ 0 \\ 2/3 \\ 2/3 \end{pmatrix} + \begin{pmatrix} -4/3 \\ 0 \\ -2/3 \\ -2/3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.

Any vector of the form

y = \begin{pmatrix} -\frac{1}{2} y_3 - \frac{1}{2} y_4 \\ -\frac{1}{2} y_3 + \frac{1}{2} y_4 \\ y_3 \\ y_4 \end{pmatrix}

is such that Ay = 0, and (6.12) is written as

y^T \nabla^2 f(x^*) y = y^T \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix} y = 3 y_3^2 + 3 y_4^2 \geq 0.
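These computations can be reproduced numerically; the following sketch (Python with NumPy, a quick check rather than part of the original text) recovers \lambda^* from (6.15) and verifies (6.11):

```python
import numpy as np

# Numerical check of Example 6.9.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, -1.0, 0.0, 1.0]])
b = np.array([1.0, 1.0])
x_star = np.array([2/3, 0.0, 1/3, 1/3])
grad_f = 2.0 * x_star                       # gradient of the sum of squares

B = A[:, :2]                                # basic columns of A
lam = -np.linalg.solve(B.T, grad_f[:2])     # lambda* = -B^{-T} grad_B f(x*)
print(lam)                                  # lambda* = (-2/3, -2/3)

assert np.allclose(A @ x_star, b)           # x* is feasible
assert np.allclose(grad_f + A.T @ lam, 0)   # KKT condition (6.11)
```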

6.2.2 Equality constraints


We consider here the problem with equality constraints (1.71)–(1.72). In this case,
the Karush-Kuhn-Tucker conditions are formulated in the following way. Thanks to
the Lagrangian function (Definition 4.3), their expression presents similarities with

conditions without constraint.

Theorem 6.10 (Karush-Kuhn-Tucker: equality constraints). Let x^* be a local minimum of the problem \min_{x \in \mathbb{R}^n} f(x) subject to h(x) = 0, with f : \mathbb{R}^n \to \mathbb{R} and h : \mathbb{R}^n \to \mathbb{R}^m continuously differentiable. If the constraints are linearly independent in x^* (in the sense of Definition 3.8), there exists a unique vector \lambda^* \in \mathbb{R}^m such that

\nabla L(x^*, \lambda^*) = 0, (6.23)

where L is the Lagrangian function (Definition 4.3). If f and h are twice differentiable, then

y^T \nabla^2_{xx} L(x^*, \lambda^*) y \geq 0, \qquad \forall y \in D(x^*), (6.24)

where D(x^*) is the linearized cone¹ in x^* (Definition 3.23). Moreover,

\lambda^* = -\left( \nabla h(x^*)^T \nabla h(x^*) \right)^{-1} \nabla h(x^*)^T \nabla f(x^*). (6.25)

Proof. We generate a sequence of optimization problems without constraint approaching the original problem. The idea is to penalize the violation of constraints by introducing a penalty term. The objective function of the problem without constraints is defined by

F_k(x) = f(x) + \frac{k}{2} \| h(x) \|^2 + \frac{\alpha}{2} \| x - x^* \|^2, (6.26)

where k \in \mathbb{N}, x^* is a local minimum of the problem \min_{x \in \mathbb{R}^n} f(x) subject to h(x) = 0, and \alpha > 0 is arbitrary. Since x^* is a local minimum (Definition 1.5), there exists \varepsilon such that

f(x^*) \leq f(x), \qquad \forall x \text{ such that } h(x) = 0 \text{ and } x \in S_\varepsilon, (6.27)

where S_\varepsilon is the sphere defined by S_\varepsilon = \{ x \mid \| x - x^* \| \leq \varepsilon \}. According to the Weierstrass theorem (Theorem 1.14), the problem

\min F_k(x) (6.28)

subject to

x \in S_\varepsilon (6.29)

has a solution in S_\varepsilon, denoted by x_k. One should keep in mind that the problem (6.28)–(6.29) is subject to constraints. Nevertheless, we demonstrate that, for a sufficiently large k, the solution lies strictly inside S_\varepsilon and is therefore a solution to the problem (6.28) without constraint (according to Theorem 3.5). The role of S_\varepsilon is to ensure that no local minima other than x^* are found.

¹ Since the constraints are linearly independent, the constraint qualification is satisfied and the linearized cone corresponds to the tangent cone.
We have

F_k(x_k) \leq f(x^*). (6.30)

Indeed, F_k(x_k) \leq F_k(x^*) because x_k is the solution to (6.28)–(6.29) and x^* \in S_\varepsilon. Also, according to (6.26),

F_k(x^*) = f(x^*) + \frac{k}{2} \| h(x^*) \|^2 + \frac{\alpha}{2} \| x^* - x^* \|^2 = f(x^*),

because h(x^*) = 0. Then, when k \to \infty, the value of F_k(x_k) remains bounded. We show by contradiction that this implies that

\lim_{k \to \infty} h(x_k) = 0. (6.31)

Indeed, if (6.31) is not satisfied, then the term \frac{k}{2} \| h(x_k) \|^2 tends towards +\infty. Since F_k(x_k) remains bounded, this signifies that either f(x_k) or \| x_k - x^* \|^2 tends towards -\infty. However, f(x_k) is bounded from below by f(x^*) over S_\varepsilon (according to (6.27)) and \| x_k - x^* \|^2 is positive, which leads to a contradiction and proves (6.31).
Let \hat{x} be a limit point of the sequence (x_k)_k (Definition B.20). According to (6.31), we have h(\hat{x}) = 0, so \hat{x} is feasible for the original problem and thus f(x^*) \leq f(\hat{x}). Moreover, according to (6.30),

\lim_{k \to \infty} F_k(x_k) = f(\hat{x}) + \frac{\alpha}{2} \| \hat{x} - x^* \|^2 \leq f(x^*). (6.32)

Then,

f(\hat{x}) + \frac{\alpha}{2} \| \hat{x} - x^* \|^2 \leq f(\hat{x}). (6.33)

As a result, \frac{\alpha}{2} \| \hat{x} - x^* \|^2 = 0 and \hat{x} = x^*. The sequence (x_k)_k converges to x^*. According to Definition B.19, there exists \hat{k} such that

\| x_k - x^* \| \leq 0.9 \, \varepsilon < \varepsilon, \qquad \forall k \geq \hat{k}, (6.34)

where \varepsilon is the radius of the sphere S_\varepsilon involved in the definition of the local minimum (6.27). The point x_k is inside S_\varepsilon when k is sufficiently large.
According to Theorem 1.16, x_k is a local minimum of the unconstrained problem (6.28). We can apply the necessary optimality conditions of an unconstrained problem, given by Theorem 5.1:

\nabla F_k(x_k) = \nabla f(x_k) + k \nabla h(x_k) h(x_k) + \alpha (x_k - x^*) = 0 (6.35)

and \nabla^2 F_k(x_k) is positive semidefinite, with

\nabla^2 F_k(x_k) = \nabla^2 f(x_k) + k \sum_{i=1}^{m} h_i(x_k) \nabla^2 h_i(x_k) + k \nabla h(x_k) \nabla h(x_k)^T + \alpha I. (6.36)

Figure 6.4: Illustration of the proof of Theorem 6.10 (the iterates x_k approach x^* inside the sphere of radius 0.9 \varepsilon)

By multiplying (6.35) with \nabla h(x_k)^T, we get

\nabla h(x_k)^T \nabla f(x_k) + k \nabla h(x_k)^T \nabla h(x_k) h(x_k) + \alpha \nabla h(x_k)^T (x_k - x^*) = 0. (6.37)

Since the constraints are linearly independent by hypothesis, the matrix \nabla h(x^*)^T \nabla h(x^*) is of full rank and invertible. By continuity of \nabla h (indeed, h is continuously differentiable), there exists a k sufficiently large such that \nabla h(x_k)^T \nabla h(x_k) is also invertible. By multiplying (6.37) by \left( \nabla h(x_k)^T \nabla h(x_k) \right)^{-1}, we obtain

k h(x_k) = -\left( \nabla h(x_k)^T \nabla h(x_k) \right)^{-1} \nabla h(x_k)^T \left( \nabla f(x_k) + \alpha (x_k - x^*) \right). (6.38)

When k \to \infty, we define

\lambda^* = \lim_{k \to \infty} k h(x_k) (6.39)

to obtain (6.25). By letting k \to \infty in (6.35), we get

\nabla f(x^*) + \nabla h(x^*) \lambda^* = 0. (6.40)

According to Definition 4.3, (6.40) is equivalent to \nabla_x L(x^*, \lambda^*) = 0. Since 0 = h(x^*) = \nabla_\lambda L(x^*, \lambda^*), we get (6.23).
To demonstrate the second-order condition (6.24), let us consider y in the linearized cone (and thus \nabla h(x^*)^T y = 0) and, for a sufficiently large k, let us consider its projection y_k on the kernel of \nabla h(x_k)^T. According to the theorem of projection on the kernel of a matrix (Theorem C.7), we have

y_k = y - \nabla h(x_k) \left( \nabla h(x_k)^T \nabla h(x_k) \right)^{-1} \nabla h(x_k)^T y. (6.41)

We also have

\lim_{k \to \infty} y_k = y - \nabla h(x^*) \left( \nabla h(x^*)^T \nabla h(x^*) \right)^{-1} \nabla h(x^*)^T y = y, (6.42)

because \nabla h(x^*)^T y = 0. Then, from (6.36),
y_k^T \nabla^2 F_k(x_k) y_k = y_k^T \left( \nabla^2 f(x_k) + k \sum_{i=1}^{m} h_i(x_k) \nabla^2 h_i(x_k) \right) y_k + k \, y_k^T \nabla h(x_k) \nabla h(x_k)^T y_k + \alpha \, y_k^T y_k.

As \nabla h(x_k)^T y_k = 0,

y_k^T \nabla^2 F_k(x_k) y_k = y_k^T \left( \nabla^2 f(x_k) + k \sum_{i=1}^{m} h_i(x_k) \nabla^2 h_i(x_k) \right) y_k + \alpha \, y_k^T y_k.

As \nabla^2 F_k(x_k) is positive semidefinite, this last quantity is non negative. By letting k \to \infty and using \lambda^* = \lim_{k \to \infty} k h(x_k), we get

y^T \left( \nabla^2 f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla^2 h_i(x^*) \right) y + \alpha \, y^T y \geq 0. (6.43)

According to Definition 4.3 of the Lagrangian function, (6.43) is equivalent to

y^T \nabla^2_{xx} L(x^*, \lambda^*) y + \alpha \, y^T y \geq 0. (6.44)

If (6.24) is not satisfied, (6.44) is not valid for all \alpha > 0. Indeed, if y^T \nabla^2_{xx} L(x^*, \lambda^*) y < 0, (6.44) is not satisfied for the values of \alpha such that

\alpha < -\frac{y^T \nabla^2_{xx} L(x^*, \lambda^*) y}{y^T y}.

Since \alpha can be arbitrarily chosen, this concludes the proof.
Note that for linear constraints, h(x) = Ax - b, \nabla h(x) = A^T, and (6.23) is written as

\begin{pmatrix} \nabla f(x^*) + A^T \lambda^* \\ A x^* - b \end{pmatrix} = 0,

which is equivalent to (6.11) of Theorem 6.8.
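The penalty construction used in this proof can be illustrated numerically. In the sketch below (Python; the problem min x_1 + x_2 subject to x_1^2 + x_2^2 - 2 = 0 is an illustrative choice, not taken from the text, and the term in \alpha is omitted for simplicity), the multiplier estimate k h(x_k) of (6.39) converges to \lambda^* = 1/2:

```python
# Illustrative problem: min x1 + x2 s.t. h(x) = x1^2 + x2^2 - 2 = 0,
# whose solution is x* = (-1, -1) with multiplier lambda* = 1/2.
# By symmetry we minimize the penalized objective F_k along x1 = x2 = t,
# i.e. phi(t) = 2 t + (k/2) (2 t^2 - 2)^2, locating the stationary point
# of phi near t = -1 by bisection on its derivative phi'.
def penalty_minimizer(k):
    dphi = lambda t: 2.0 + 4.0 * k * t * (2.0 * t**2 - 2.0)
    a, b = -1.5, -1.0                 # dphi(a) < 0 < dphi(b): sign change
    for _ in range(200):              # plain bisection
        m = 0.5 * (a + b)
        if dphi(m) < 0.0:
            a = m
        else:
            b = m
    return 0.5 * (a + b)

for k in (10.0, 100.0, 10000.0):
    t_k = penalty_minimizer(k)
    lam_k = k * (2.0 * t_k**2 - 2.0)  # k h(x_k), cf. (6.39)
    print(k, t_k, lam_k)              # lam_k approaches lambda* = 1/2
```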
Example 6.11 (Karush-Kuhn-Tucker: equality constraints). Consider the optimization problem

\min_{x \in \mathbb{R}^2} x_1 + x_2 (6.45)

subject to

h(x) = \begin{pmatrix} x_1^2 + (x_2 - 1)^2 - 1 \\ -x_1^2 + x_2 \end{pmatrix} = 0. (6.46)

The set of constraints is illustrated in Figure 6.5. We have

\nabla f(x) = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad \nabla h(x) = \begin{pmatrix} 2 x_1 & -2 x_1 \\ 2 x_2 - 2 & 1 \end{pmatrix}. (6.47)

The Lagrangian function of the problem is

L(x, \lambda) = x_1 + x_2 + \lambda_1 \left( x_1^2 + (x_2 - 1)^2 - 1 \right) + \lambda_2 \left( -x_1^2 + x_2 \right) (6.48)

and

\nabla L(x, \lambda) = \begin{pmatrix} 1 + 2 \lambda_1 x_1 - 2 \lambda_2 x_1 \\ 1 + 2 \lambda_1 (x_2 - 1) + \lambda_2 \\ x_1^2 + (x_2 - 1)^2 - 1 \\ -x_1^2 + x_2 \end{pmatrix}. (6.49)

Figure 6.5: Illustration of the KKT conditions (the constraints h_1(x) = x_1^2 + (x_2 - 1)^2 - 1 = 0 and h_2(x) = x_2 - x_1^2 = 0, with the points x^a and x^b)

The point x^a = (1, 1)^T is a local minimum of the problem. The constraints are linearly independent in x^a because the matrix

\nabla h(x^a) = \begin{pmatrix} 2 & -2 \\ 0 & 1 \end{pmatrix} (6.50)

is of full rank. Thanks to the relations (6.25) and (6.47), we get

\lambda^* = \begin{pmatrix} -3/2 \\ -1 \end{pmatrix}. (6.51)

The condition \nabla L(x^a, \lambda^*) = 0 is satisfied. Note that the linearized cone is empty in x^a and that the second-order condition is trivially satisfied.
The point x^b = (0, 0)^T is also a local minimum (in fact, we are dealing with a global minimum of the problem). We have

\nabla L(x^b, \lambda) = \begin{pmatrix} 1 \\ 1 - 2 \lambda_1 + \lambda_2 \\ 0 \\ 0 \end{pmatrix}, (6.52)

which cannot be zero for any \lambda. The necessary condition is not satisfied in this case. Indeed, the constraints are not linearly independent in x^b because the matrix

\nabla h(x^b) = \begin{pmatrix} 0 & 0 \\ -2 & 1 \end{pmatrix} (6.53)

is not of full rank.
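The multiplier computation at x^a can be reproduced numerically; the sketch below (Python with NumPy, a quick check) applies the least-squares formula (6.25) and verifies the first-order condition:

```python
import numpy as np

# Numerical check of Example 6.11 at the local minimum x^a = (1, 1).
x = np.array([1.0, 1.0])
grad_f = np.array([1.0, 1.0])
grad_h = np.array([[2*x[0], -2*x[0]],
                   [2*(x[1] - 1), 1.0]])   # columns: grad h_1, grad h_2

# lambda* from the least-squares formula (6.25)
lam = -np.linalg.solve(grad_h.T @ grad_h, grad_h.T @ grad_f)
print(lam)                                 # lambda* = (-3/2, -1), cf. (6.51)

assert np.allclose(grad_f + grad_h @ lam, 0)   # nabla_x L(x^a, lambda*) = 0
```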



We now present the result of John (1948) for equality constraints, which constitutes a generalization of Theorem 6.10.

Theorem 6.12 (Fritz John: equality constraints). Let x^* be a local minimum of the problem \min_{x \in \mathbb{R}^n} f(x) subject to h(x) = 0, where f : \mathbb{R}^n \to \mathbb{R} and h : \mathbb{R}^n \to \mathbb{R}^m are continuously differentiable. Then, there exist \mu_0^* \in \mathbb{R} and a vector \lambda^* \in \mathbb{R}^m such that

\mu_0^* \nabla f(x^*) + \nabla h(x^*) \lambda^* = 0 (6.54)

and \mu_0^*, \lambda_1^*, \ldots, \lambda_m^* are not all zero.

Proof. In the case where the constraints are linearly independent, Theorem 6.10 applies and (6.54) is trivially obtained with \mu_0^* = 1. In the case where the constraints are linearly dependent, there exist \lambda_1^*, \ldots, \lambda_m^*, not all zero, such that

\sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*) = 0,

and (6.54) is obtained with \mu_0^* = 0.

6.2.3 Equality and inequality constraints


We now present the necessary Karush-Kuhn-Tucker optimality conditions for a gen-
eral case including equality and inequality constraints. As is often the case, the
approach consists in returning to an already studied case, in this case the problem
with only equality constraints.

Theorem 6.13 (Karush-Kuhn-Tucker). Let x^* be a local minimum of the problem \min_{x \in \mathbb{R}^n} f(x) subject to h(x) = 0, g(x) \leq 0, where f : \mathbb{R}^n \to \mathbb{R}, h : \mathbb{R}^n \to \mathbb{R}^m and g : \mathbb{R}^n \to \mathbb{R}^p are continuously differentiable. If the constraints are linearly independent in x^* (in the sense of Definition 3.8), there exist a unique vector \lambda^* \in \mathbb{R}^m and a unique vector \mu^* \in \mathbb{R}^p such that

\nabla_x L(x^*, \lambda^*, \mu^*) = 0, (6.55)

\mu_j^* \geq 0, \quad j = 1, \ldots, p, (6.56)

and

\mu_j^* g_j(x^*) = 0, \quad j = 1, \ldots, p, (6.57)

where L is the Lagrangian function (Definition 4.3). If f, h and g are twice differentiable, then

y^T \nabla^2_{xx} L(x^*, \lambda^*, \mu^*) y \geq 0, \qquad \forall y \neq 0 \text{ such that } y^T \nabla h_i(x^*) = 0, \ i = 1, \ldots, m, \text{ and } y^T \nabla g_i(x^*) = 0 \text{ for all } i = 1, \ldots, p \text{ such that } g_i(x^*) = 0. (6.58)

Proof. We consider the active inequality constraints at the solution as equality constraints and ignore the other constraints, in order to obtain the problem \min_{x \in \mathbb{R}^n} f(x) subject to h(x) = 0, g_i(x) = 0, for any i \in A(x^*), where A(x^*) is the set of active constraints in x^* (Definition 3.4). According to Theorem 3.5, where Y = \{ x \mid h(x) = 0 \}, x^* is a local minimum of the optimization problem with equality constraints. According to Theorem 6.10, there exist Lagrange multipliers \lambda^* \in \mathbb{R}^m and \mu_i^*, with i \in A(x^*), such that

\nabla f(x^*) + \nabla h(x^*) \lambda^* + \sum_{i \in A(x^*)} \mu_i^* \nabla g_i(x^*) = 0. (6.59)

We associate a zero multiplier to each inequality constraint that is not active in x^*, to obtain

\nabla f(x^*) + \nabla h(x^*) \lambda^* + \sum_{i=1}^{p} \mu_i^* \nabla g_i(x^*) = 0 (6.60)

with \mu_i^* = 0 if i \notin A(x^*). We thus get (6.55). Similarly, the second-order condition of Theorem 6.10 implies that

y^T \nabla^2_{xx} L(x^*, \lambda^*, \mu^*) y \geq 0, \qquad \forall y \text{ such that } y^T \nabla h_i(x^*) = 0, \ i = 1, \ldots, m, \text{ and } y^T \nabla g_i(x^*) = 0 \text{ for all } i = 1, \ldots, p \text{ such that } g_i(x^*) = 0, (6.61)

and (6.58) is satisfied. We note that (6.57) is trivially satisfied. Indeed, if the constraint g_j(x^*) \leq 0 is active, we have g_j(x^*) = 0. If on the other hand it is not, we have \mu_j^* = 0. We now need only demonstrate (6.56).
We take the same proof as Theorem 6.10, by defining the penalty function for inequality constraints with

g_i^+(x) = \max(0, g_i(x)), \quad i = 1, \ldots, p. (6.62)

In this case, the function (6.26) becomes

F_k(x) = f(x) + \frac{k}{2} \| h(x) \|^2 + \frac{k}{2} \sum_{i=1}^{p} g_i^+(x)^2 + \frac{\alpha}{2} \| x - x^* \|^2. (6.63)

Since g_i^+(x)^2 is differentiable and

\nabla g_i^+(x)^2 = 2 \, g_i^+(x) \nabla g_i(x), (6.64)

we can invoke the same development as in the proof of Theorem 6.10. Since we have obtained (6.39), we have

\mu_i^* = \lim_{k \to \infty} k \, g_i^+(x_k), \quad i = 1, \ldots, p. (6.65)

And since g_i^+(x) \geq 0, we get (6.56).

Example 6.14 (Karush-Kuhn-Tucker: inequality constraints – I). Consider the problem

\min_{x \in \mathbb{R}^2} x_1 + x_2 (6.66)

subject to

(x_1 - 3)^2 + x_2^2 \leq 9
x_1^2 + (x_2 - 3)^2 \leq 9 (6.67)
x_1^2 \leq 1 + x_2,

illustrated in Figure 6.6. Re-arranging the equations of the constraints, we obtain

g_1(x) = x_1^2 - 6 x_1 + x_2^2
g_2(x) = x_1^2 - 6 x_2 + x_2^2 (6.68)
g_3(x) = x_1^2 - x_2 - 1,

and the Lagrangian function is written as

L(x, \mu) = x_1 + x_2 + \mu_1 (x_1^2 - 6 x_1 + x_2^2) + \mu_2 (x_1^2 - 6 x_2 + x_2^2) + \mu_3 (x_1^2 - x_2 - 1). (6.69)

Figure 6.6: Karush-Kuhn-Tucker optimality conditions (the feasible set defined by (x_1 - 3)^2 + x_2^2 \leq 9, x_1^2 + (x_2 - 3)^2 \leq 9 and x_1^2 \leq 1 + x_2)
The point x^* = (0, 0)^T is a local minimum of this problem. The constraints g_1(x) \leq 0 and g_2(x) \leq 0 are active in x^*, whereas the constraint g_3(x) \leq 0 is not. The point x^* is also a local minimum of the problem where the two active inequality constraints are replaced by equality constraints, and the inactive constraint is ignored, that is

\min_{x \in \mathbb{R}^2} x_1 + x_2 (6.70)

subject to

(x_1 - 3)^2 + x_2^2 = 9
x_1^2 + (x_2 - 3)^2 = 9, (6.71)

or, equivalently,

h_1(x) = x_1^2 - 6 x_1 + x_2^2 = 0
h_2(x) = x_1^2 - 6 x_2 + x_2^2 = 0. (6.72)

The gradient of the constraints is written as

\nabla h(x) = \begin{pmatrix} 2 x_1 - 6 & 2 x_1 \\ 2 x_2 & 2 x_2 - 6 \end{pmatrix}. (6.73)

According to the KKT conditions (Theorem 6.10), \lambda^* is given by (6.25) and we have

\lambda^* = -\left( \begin{pmatrix} -6 & 0 \\ 0 & -6 \end{pmatrix}^T \begin{pmatrix} -6 & 0 \\ 0 & -6 \end{pmatrix} \right)^{-1} \begin{pmatrix} -6 & 0 \\ 0 & -6 \end{pmatrix}^T \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/6 \\ 1/6 \end{pmatrix}.

Then, the necessary first-order KKT condition for the initial problem is written as

\nabla_x L(x^*, \mu^*) = \begin{pmatrix} 1 - 6 \mu_1^* + 2 (\mu_1^* + \mu_2^* + \mu_3^*) x_1^* \\ 1 - 6 \mu_2^* + 2 (\mu_1^* + \mu_2^*) x_2^* - \mu_3^* \end{pmatrix} = 0,

where L is defined by (6.69), x^* = (0, 0)^T and

\mu^* = \begin{pmatrix} \lambda_1^* \\ \lambda_2^* \\ 0 \end{pmatrix} = \begin{pmatrix} 1/6 \\ 1/6 \\ 0 \end{pmatrix}.

Since the linearized cone is empty in x^*, the necessary second-order KKT condition is trivially satisfied.
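The KKT conditions at x^* can be verified with a short numerical sketch (Python; the gradients are written out by hand from (6.68)):

```python
import numpy as np

# Numerical check of Example 6.14 at x* = (0, 0) with mu* = (1/6, 1/6, 0).
x = np.array([0.0, 0.0])
mu = np.array([1/6, 1/6, 0.0])

g_val = np.array([x[0]**2 - 6*x[0] + x[1]**2,
                  x[0]**2 - 6*x[1] + x[1]**2,
                  x[0]**2 - x[1] - 1])
grad_g = np.array([[2*x[0] - 6, 2*x[0], 2*x[0]],
                   [2*x[1], 2*x[1] - 6, -1.0]])   # columns: grad g_1..g_3
grad_f = np.array([1.0, 1.0])

assert np.all(g_val <= 0)                      # feasibility
assert np.all(mu >= 0)                         # condition (6.56)
assert np.allclose(mu * g_val, 0)              # complementarity (6.57)
assert np.allclose(grad_f + grad_g @ mu, 0)    # stationarity (6.55)
print("KKT conditions verified")
```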

Example 6.15 (Karush-Kuhn-Tucker: inequality constraints – II). Consider the optimization problem

\min_{x \in \mathbb{R}^2} \frac{1}{2} (x_1^2 - x_2^2) (6.74)

subject to

x_2 \leq 1, (6.75)

for which the objective function is illustrated in Figure 6.7. We have

L(x, \mu) = \frac{1}{2} x_1^2 - \frac{1}{2} x_2^2 + \mu (x_2 - 1).

The point x^* = (0, 1)^T is a local minimum of the problem. The constraint is active at this point. The first-order condition (6.55) is expressed as

\nabla_x L(x^*, \mu^*) = \begin{pmatrix} x_1^* \\ -x_2^* + \mu^* \end{pmatrix} = \begin{pmatrix} 0 \\ -1 + \mu^* \end{pmatrix} = 0

Figure 6.7: Objective function of Examples 6.15 and 6.21 ((a) surface; (b) level curves)

and is satisfied with µ∗ = 1, which is positive. The second-order condition (6.58) is
written as
    ( y1  y2 ) [ 1   0 ] [ y1 ]
               [ 0  −1 ] [ y2 ] = y1² − y2² ≥ 0               (6.76)
for any y such that
    ∇g(x∗)T y = ( 0  1 ) [ y1 ; y2 ] = y2 = 0
and is satisfied. Note that if we choose a feasible direction y, for instance
y = (0, −1)T, the condition (6.76) is not satisfied. It is only valid for y
orthogonal to the active constraints.
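These conditions can be checked numerically. The sketch below (plain Python; names are ours) verifies the first-order condition at x∗ = (0, 1)T with µ∗ = 1 and evaluates the quadratic form of (6.76) on both kinds of directions.

```python
# Checking Example 6.15 numerically (a sketch; function names are ours).
# f(x) = (x1^2 - x2^2)/2 and g(x) = x2 - 1 <= 0, with x* = (0, 1), mu* = 1.

def grad_L(x, mu):
    # gradient w.r.t. x of L(x, mu) = (x1^2 - x2^2)/2 + mu (x2 - 1)
    return [x[0], -x[1] + mu]

print(grad_L([0.0, 1.0], 1.0))      # first-order condition: [0.0, 0.0]

def quad(y):
    # y^T (Hess_x L) y with Hess_x L = diag(1, -1), cf. (6.76)
    return y[0] ** 2 - y[1] ** 2

# direction orthogonal to the active constraint (y2 = 0): condition holds
print(quad([1.0, 0.0]))             # 1.0

# a feasible direction with y2 != 0 violates (6.76), as noted in the text
print(quad([0.0, -1.0]))            # -1.0
```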
Example 6.16 (Physical interpretation of KKT conditions). We consider the
optimization problem
    min_{x∈R²} x1
subject to
    h1(x) = x1 − sin x2 = 0 .
The equality constraint is represented in Figure 6.8(a). The point xa = (−1, 3π/2)T
is a local minimum of the problem. We have

    ∇f(x) = [ 1 ; 0 ] ,    ∇h(x) = [ 1 ; − cos x2 ]

and the necessary optimality condition is written as

    −∇f(xa) − λ∗1 ∇h(xa) = 0 .


Figure 6.8: Interpretation of the KKT conditions: (a) equality constraint
(with λ∗1 = −1); (b) inequality constraint

Using the fact that −∇f(xa ) is the direction with the steepest descent (Theorem
2.13), we can interpret this condition as an equilibrium between the “force” −∇f(xa ),
which drives the solution to lower values of the objective function, and the force
−λ1 ∇h(xa ), which maintains the solution on the constraint. If xa is optimal, this
signifies that the forces are balanced and that their sum is zero. In our example, since
the “force” −∇f(xa ) acts in the same direction as −∇h(xa ), the multiplier λ∗1 should
be negative so that the two “forces” can compensate each other.
If we now take the point xb = (sin 3, 3)T, the "forces" −∇f(xb) and −λ∇h(xb)
are not balanced. This is not only the case when λ = λ∗1, as shown in Figure 6.8(a),
but for all λ.
We now consider the problem
    min_{x∈R²} x1
subject to
    g1(x) = x1 − sin x2 ≤ 0 .
The inequality constraint is represented in Figure 6.8(b). The point xa = (−1, 3π/2)
is not a local minimum of the problem. In fact, the direction −∇f(xa ) is feasible and
the associated “force” drives the solution towards feasible values.
Note that the constraint is active in xa and that the equation

−∇f(xa ) − µ1 ∇g1 (xa ) = 0

is satisfied for µ1 = −1. The interpretation of the forces signifies that the “force”
−µ1 ∇g1 (xa ) drives the solution to the right and prevents it from going inside the
feasible domain, which is incompatible with the definition of the inequality constraint.
The condition (6.56) in Theorem 6.13, µ∗ ≥ 0, signifies that, for an inequality
constraint, the "force" can only act in a single direction, so as to prevent the
points from leaving the feasible domain, but not from going inside. For equality
constraints, the "force" can act in both directions and there is no condition on
the sign of the multipliers.
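The equilibrium of "forces" is easy to verify numerically; the sketch below (plain Python; `residual` is our name) evaluates −∇f(x) − λ∇h(x) at both points.

```python
import math

# Force balance in Example 6.16 (a sketch; `residual` is our name).
# f(x) = x1 and h(x) = x1 - sin x2, so grad f = (1, 0) and
# grad h = (1, -cos x2).

def residual(x, lam):
    # components of -grad f(x) - lam * grad h(x)
    return [-1.0 - lam, lam * math.cos(x[1])]

# at xa = (-1, 3 pi / 2), the forces balance with lambda* = -1
xa = [-1.0, 3.0 * math.pi / 2.0]
print(residual(xa, -1.0))           # approximately [0, 0]

# at xb = (sin 3, 3), the first component forces lam = -1, but the second
# component then equals -cos 3, about 0.99: no lam balances both at once
xb = [math.sin(3.0), 3.0]
print(residual(xb, -1.0))
```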

Example 6.17 (Slack variables). Consider problem (P1)
    min_{x∈Rn} f(x)                                           (6.77)
subject to
    gi(x) ≤ 0 ,   i = 1, . . . , m .                          (6.78)
The Lagrangian of (P1) is
    L(x, µ) = f(x) + Σ_{i=1}^{m} µi gi(x) .                   (6.79)

The first derivative is
    ∂L/∂xj (x, µ) = ∂f/∂xj (x) + Σ_{i=1}^{m} µi ∂gi/∂xj (x) ,            (6.80)
for j = 1, . . . , n, and the second derivative is
    ∂²L/∂xj∂xk (x, µ) = ∂²f/∂xj∂xk (x) + Σ_{i=1}^{m} µi ∂²gi/∂xj∂xk (x) ,  (6.81)
for j, k = 1, . . . , n.
Let x∗ be a local optimum of problem P1. Therefore, the first order necessary
optimality (KKT) conditions say that, under appropriate assumptions, there is a
unique µ∗ ∈ Rm, µ∗ ≥ 0, such that
    ∂L/∂xj (x∗, µ∗) = ∂f/∂xj (x∗) + Σ_{i=1}^{m} µ∗i ∂gi/∂xj (x∗) = 0 ,
                                                              (6.82)
    µ∗i gi(x∗) = 0 ,   i = 1, . . . , m .

Assume that the first p constraints are active at x∗, and the others not, that is
    gi(x∗) = 0 ,   i = 1, . . . , p ,                         (6.83)
and
    gi(x∗) < 0 ,   i = p + 1, . . . , m .                     (6.84)
Therefore, we obtain
    ∂L/∂xj (x∗, µ∗) = ∂f/∂xj (x∗) + Σ_{i=1}^{p} µ∗i ∂gi/∂xj (x∗) = 0 ,   (6.85)
and
    µ∗i ≥ 0 ,   i = 1, . . . , p ,
                                                              (6.86)
    µ∗i = 0 ,   i = p + 1, . . . , m .
Moreover, for each d ∈ Rn such that, for each i = 1, . . . , p,
    Σ_{k=1}^{n} dk ∂gi/∂xk (x∗) = 0 ,                         (6.87)
we have
    Σ_{j=1}^{n} Σ_{k=1}^{n} ( ∂²f/∂xj∂xk (x∗) + Σ_{i=1}^{p} µ∗i ∂²gi/∂xj∂xk (x∗) ) dj dk ≥ 0 .   (6.88)

Consider now problem (P2), obtained from problem (P1) by transforming the
inequality constraints into equality constraints using slack variables, as suggested in
Section 1.2.2:
    min_{x∈Rn, y∈Rm} f(x)                                     (6.89)
subject to
    hi(x, y) = gi(x) + yi² = 0 ,   i = 1, . . . , m .         (6.90)
For each i = 1, . . . , m, the first derivatives of the constraint are
    ∂hi/∂xk = ∂gi/∂xk ,   k = 1, . . . , n ,                  (6.91)
    ∂hi/∂yi = 2yi ,                                           (6.92)
and
    ∂hi/∂yℓ = 0 ,   ℓ = 1, . . . , m , ℓ ≠ i .                (6.93)
The Lagrangian of (P2) is
    L(x, y, λ) = f(x) + Σ_{i=1}^{m} λi (gi(x) + yi²) .        (6.94)

The first derivatives are
    ∂L/∂xj (x, y, λ) = ∂f/∂xj (x) + Σ_{i=1}^{m} λi ∂gi/∂xj (x) ,         (6.95)
for j = 1, . . . , n, and
    ∂L/∂yi (x, y, λ) = 2λi yi ,                               (6.96)
for i = 1, . . . , m. The second derivatives are
    ∂²L/∂xj∂xk (x, y, λ) = ∂²f/∂xj∂xk (x) + Σ_{i=1}^{m} λi ∂²gi/∂xj∂xk (x) ,  (6.97)
for j, k = 1, . . . , n,
    ∂²L/∂yi² (x, y, λ) = 2λi ,                                (6.98)
for i = 1, . . . , m,
    ∂²L/∂yi∂yℓ (x, y, λ) = 0 ,                                (6.99)
for i, ℓ = 1, . . . , m, i ≠ ℓ, and
    ∂²L/∂xj∂yi (x, y, λ) = 0 ,                                (6.100)
for j = 1, . . . , n, i = 1, . . . , m.
Let x∗ and y∗ be local optima of problem P2. Therefore, the first order necessary
optimality (KKT) conditions say that there is a unique λ∗ ∈ Rm such that
    ∂L/∂xj (x∗, y∗, λ∗) = ∂f/∂xj (x∗) + Σ_{i=1}^{m} λ∗i ∂gi/∂xj (x∗) = 0 ,   (6.101)
and
    2λ∗i y∗i = 0 ,   i = 1, . . . , m .                       (6.102)
Moreover, for each
    d = (dx ; dy) ∈ Rn+m
such that, for each i = 1, . . . , m,
    Σ_{k=1}^{n} (dx)k ∂gi/∂xk (x∗) + 2(dy)i y∗i = 0 ,         (6.103)
we have
    Σ_{j=1}^{n} Σ_{k=1}^{n} ( ∂²f/∂xj∂xk (x∗) + Σ_{i=1}^{m} λ∗i ∂²gi/∂xj∂xk (x∗) ) (dx)j (dx)k
        + 2 Σ_{i=1}^{m} λ∗i (dy)i² ≥ 0 .                      (6.104)

Now, assume that the constraints are numbered so that y∗i = 0 for i = 1, . . . , p,
and y∗i ≠ 0 for i = p + 1, . . . , m. As x∗ verifies the constraints (6.90), we also have
gi(x∗) = 0 for i = 1, . . . , p, and gi(x∗) < 0 for i = p + 1, . . . , m. Then, from (6.102),
we have λ∗i = 0, i = p + 1, . . . , m. The first order condition (6.101) becomes
    ∂L/∂xj (x∗, y∗, λ∗) = ∂f/∂xj (x∗) + Σ_{i=1}^{p} λ∗i ∂gi/∂xj (x∗) = 0 .   (6.105)

For the second order conditions, for each
    d = (dx ; dy) ∈ Rn+m
such that, for each i = 1, . . . , p,
    Σ_{k=1}^{n} (dx)k ∂gi/∂xk (x∗) = 0 ,                      (6.106)
and for each i = p + 1, . . . , m,
    Σ_{k=1}^{n} (dx)k ∂gi/∂xk (x∗) + 2(dy)i y∗i = 0 ,         (6.107)

we have
    Σ_{j=1}^{n} Σ_{k=1}^{n} ( ∂²f/∂xj∂xk (x∗) + Σ_{i=1}^{p} λ∗i ∂²gi/∂xj∂xk (x∗) ) (dx)j (dx)k
        + 2 Σ_{i=1}^{p} λ∗i (dy)i² ≥ 0 .                      (6.108)
In particular, consider d such that dx = 0 and (dy)i = 0, i = p + 1, . . . , m. It clearly
verifies conditions (6.106) and (6.107). We have, for any value of (dy)i, i = 1, . . . , p,
    2 Σ_{i=1}^{p} λ∗i (dy)i² ≥ 0 .                            (6.109)

In particular, select k between 1 and p, and set (dy)i = 0 for each i ≠ k, and
(dy)k = 1. Therefore, (6.109) implies λ∗k ≥ 0, for any k = 1, . . . , p.
Based on these results, we can prove that (x∗, µ∗) verifies the KKT conditions
of problem P1 if and only if (x∗, y∗, λ∗) verifies the KKT conditions of problem P2,
where λ∗ = µ∗ and y∗i = √(−gi(x∗)), i = 1, . . . , m.
P1 =⇒ P2 Consider (x∗, µ∗) that verifies the KKT conditions of problem P1, such
that gi(x∗) = 0 for i = 1, . . . , p and gi(x∗) < 0 for i = p + 1, . . . , m. Define y∗
such that
    y∗i = 0 ,              i = 1, . . . , p ,
                                                              (6.110)
    y∗i = √(−gi(x∗)) ,     i = p + 1, . . . , m ,
and define λ∗ = µ∗. Then (x∗, y∗, λ∗) verifies the KKT conditions of problem P2.

• Constraints (6.90) are trivially verified from the definition of y∗ .


• The first order conditions (6.105) are exactly the same as (6.85).
• The second order KKT conditions are also trivially verified. Consider a direction
  d that verifies (6.106) and (6.107). As (6.106) is equivalent to (6.87), we
  deduce from (6.88) that
    Σ_{j=1}^{n} Σ_{k=1}^{n} ( ∂²f/∂xj∂xk (x∗) + Σ_{i=1}^{p} λ∗i ∂²gi/∂xj∂xk (x∗) ) (dx)j (dx)k ≥ 0 .   (6.111)
  Now, (6.108) results from (6.111) and (6.86), which says that λ∗i ≥ 0, for each i.
P2 =⇒ P1 Consider (x∗, y∗, λ∗) that verifies the KKT conditions of problem P2, such
that y∗i = 0 for i = 1, . . . , p and y∗i ≠ 0 for i = p + 1, . . . , m. Then (x∗, µ∗), where
µ∗ = λ∗, verifies the KKT conditions of problem P1.
• The constraints (6.78) are direct consequences of (6.90).
• The first order conditions (6.105) are exactly the same as (6.85).
• The conditions (6.86) on the Lagrange multipliers are verified, as for P2, λ∗i ≥ 0,
  for i = 1, . . . , p and λ∗i = 0, i = p + 1, . . . , m (see the discussion above).
• Consider d that verifies (6.87). Define dx = d, and define dy such that (dy)i = 0,
  i = 1, . . . , p, and
    (dy)i = − (1 / 2y∗i) Σ_{k=1}^{n} (dx)k ∂gi/∂xk (x∗)       (6.112)
  for i = p + 1, . . . , m. By definition, dx and dy verify (6.106) and (6.107).
  Therefore, (6.108) holds. As (dy)i = 0, i = 1, . . . , p, we obtain (6.88).
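The correspondence between the multipliers of (P1) and (P2) can be illustrated on a single constraint; the sketch below (plain Python; the function name is ours) checks the complementarity conditions of both formulations.

```python
import math

# Slack-variable correspondence from Example 6.17 (a sketch; the function
# name is ours).  Given g_i(x*) and mu_i*, set y_i* = sqrt(-g_i(x*)) and
# lambda_i* = mu_i*; both complementarity conditions must then hold.

def slack_pair(g_val, mu):
    y = math.sqrt(-g_val)            # y_i* as in (6.110)
    lam = mu                         # lambda* = mu*
    assert abs(mu * g_val) < 1e-12   # mu_i* g_i(x*) = 0 in (P1)
    assert abs(2 * lam * y) < 1e-12  # 2 lambda_i* y_i* = 0, cf. (6.102)
    return y, lam

print(slack_pair(-4.0, 0.0))   # inactive constraint: y* = 2, lambda* = 0
print(slack_pair(0.0, 1.5))    # active constraint:   y* = 0, lambda* = 1.5
```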

6.3 Lagrange multipliers: sufficient conditions


Similarly to the approach presented in Section 6.2, we start with problems with
equality constraints. We then generalize the result for general problems.
The demonstration of the sufficient optimality condition utilizes what is called an
augmented Lagrangian, which is also used for algorithms.

Definition 6.18 (Augmented Lagrangian). Consider the optimization problem with
equality constraints (1.71)–(1.72), minx∈Rn f(x) subject to h(x) = 0, and let us take a
parameter c ∈ R, c > 0, called the penalty parameter. The Lagrangian function of the
problem
    min_{x∈Rn} f(x) + (c/2) ‖h(x)‖²   subject to h(x) = 0     (6.113)
is called the augmented Lagrangian function of the problem (1.71)–(1.72) and is
expressed as
    Lc(x, λ) = L(x, λ) + (c/2) ‖h(x)‖²
                                                              (6.114)
             = f(x) + λT h(x) + (c/2) ‖h(x)‖² .

The idea of the objective function (6.113) is to penalize points x that violate the
constraints, hence the name penalty parameter for c .
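The penalization effect can be seen on a one-dimensional instance (a sketch; the instance and its multiplier λ∗ = −2 come from Example 6.25 later in the chapter).

```python
# Penalization at work on min x^2 s.t. x = 1 (a sketch; the instance and
# its multiplier lambda* = -2 are from Example 6.25).  The augmented
# Lagrangian is Lc(x, lam) = x^2 + lam (x - 1) + (c/2) (x - 1)^2.

def argmin_Lc(lam, c):
    # stationarity: 2x + lam + c (x - 1) = 0  =>  x = (c - lam) / (2 + c)
    return (c - lam) / (2.0 + c)

# with the optimal multiplier, the unconstrained minimizer is x* = 1
# for every penalty parameter c > 0:
print([argmin_Lc(-2.0, c) for c in (1.0, 10.0, 100.0)])   # [1.0, 1.0, 1.0]

# with a wrong multiplier, the minimizer only approaches x* as c grows:
print([round(argmin_Lc(0.0, c), 3) for c in (1.0, 10.0, 100.0)])
```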

6.3.1 Equality constraints


Thanks to the Lagrangian function, the sufficient conditions are similar to those for
the unconstrained case. However, we should note the role of the linearized cone.

Theorem 6.19 (Sufficient optimality conditions: equality constraints). Let f : Rn →
R and h : Rn → Rm be twice differentiable functions. Consider x∗ ∈ Rn and
λ∗ ∈ Rm such that
    ∇L(x∗, λ∗) = 0                                            (6.115)
and
    yT ∇²xx L(x∗, λ∗) y > 0 ,   ∀y ∈ D(x∗) , y ≠ 0 ,          (6.116)
where L is the Lagrangian function of the optimization problem minx∈Rn f(x)
subject to h(x) = 0 and D(x∗) is the linearized cone in x∗. Then, x∗ is a strict
local minimum of the optimization problem.

Proof. We first note that any solution to the augmented problem (6.113) is also a
solution to the original problem. We go back to a problem of unconstrained
optimization thanks to the augmented Lagrangian function, by showing that x∗ is
a strict local minimum of the problem
    min_{x∈Rn} Lc(x, λ∗)                                      (6.117)
for sufficiently large c. Indeed,

    ∇x Lc(x∗, λ∗) = ∇f(x∗) + ∇h(x∗)(λ∗ + c h(x∗))   by derivation of (6.114)
                  = ∇f(x∗) + ∇h(x∗) λ∗              because x∗ is feasible
                  = ∇x L(x∗, λ∗)                    according to Definition 4.3
                  = 0                               from (6.115) .

Similarly, we obtain
    ∇²xx Lc(x∗, λ∗) = ∇²xx L(x∗, λ∗) + c ∇h(x∗) ∇h(x∗)T .     (6.118)
By applying the theorem for the formation of a positive definite matrix (Theorem
C.18), there exists ĉ such that (6.118) is positive definite for all c > ĉ. According to
Theorem 5.7, x∗ is a strict local minimum of the unconstrained problem (6.117) for
sufficiently large c.
According to Definition 1.6, there exists ε > 0 such that
    Lc(x∗, λ∗) < Lc(x, λ∗) ,   ∀x ∈ Rn , x ≠ x∗ such that ‖x − x∗‖ < ε .    (6.119)
According to Definition 6.18 of Lc, we get
    f(x∗) < f(x) ,   ∀x ∈ Rn , x ≠ x∗ such that ‖x − x∗‖ < ε and h(x) = 0 .  (6.120)
According to Definition 1.6, x∗ is a strict local minimum of the problem.

Notes
• Theorem 6.10 can be demonstrated by using the same constraint elimination tech-
nique as for Theorem 6.8. The logic behind the demonstration is the same, but
the proof is more technical (see Bertsekas, 1999).
• No constraint qualifications appear in the sufficient optimality conditions, neither
linear independence nor any other.
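The role of the penalty term c∇h(x∗)∇h(x∗)T in the proof can be illustrated with made-up data (a sketch; the matrices below are ours, chosen for simplicity): an indefinite ∇²xxL = diag(1, −1) with ∇h(x∗) = (0, 1)T yields the matrix (6.118) equal to diag(1, −1 + c), positive definite exactly when c exceeds the threshold ĉ = 1.

```python
# Threshold effect in (6.118), on made-up data (a sketch): with
# Hess_x L = diag(1, -1) and grad h = (0, 1)^T, the matrix (6.118) is
# diag(1, -1 + c), which is positive definite exactly when c > 1.

def is_pd_diag(d):
    # a diagonal matrix is positive definite iff all entries are positive
    return all(v > 0 for v in d)

for c in (0.5, 1.0, 2.0, 10.0):
    print(c, is_pd_diag([1.0, -1.0 + c]))
# only penalty parameters above the threshold give True
```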

6.3.2 Inequality constraints

Theorem 6.20 (Sufficient optimality conditions). Let f : Rn → R, h : Rn → Rm
and g : Rn → Rp be twice differentiable functions. Consider x∗ ∈ Rn, λ∗ ∈ Rm
and µ∗ ∈ Rp such that

    ∇x L(x∗, λ∗, µ∗) = 0                                      (6.121)
    h(x∗) = 0                                                 (6.122)
    g(x∗) ≤ 0                                                 (6.123)
    µ∗ ≥ 0                                                    (6.124)
    µ∗j gj(x∗) = 0 ,   j = 1, . . . , p                        (6.125)
    µ∗j > 0 ,   ∀j ∈ A(x∗)                                     (6.126)
    yT ∇²xx L(x∗, λ∗, µ∗) y > 0 ,   ∀y ≠ 0 such that
        yT ∇hi(x∗) = 0 ,   i = 1, . . . , m                    (6.127)
        yT ∇gi(x∗) = 0 ,   i = 1, . . . , p such that gi(x∗) = 0 ,

where L is the Lagrangian function of the optimization problem minx∈Rn f(x)
subject to h(x) = 0 and g(x) ≤ 0. Then, x∗ is a strict local minimum of the
optimization problem.
Proof. We use slack variables (Definition 1.4) to obtain the following optimization
problem with equality constraints
    min_{x∈Rn, z∈Rp} f(x)                                     (6.128)
subject to
    hi(x) = 0 ,         i = 1, . . . , m
                                                              (6.129)
    gi(x) + zi² = 0 ,   i = 1, . . . , p ,
and let us define
    z∗i = √(−gi(x∗))                                          (6.130)
such that gi(x∗) + (z∗i)² = 0 is trivially satisfied.
The Lagrangian function of this problem is
    L̂(x, z, λ, µ) = f(x) + Σ_{i=1}^{m} λi hi(x) + Σ_{i=1}^{p} µi (gi(x) + zi²)   (6.131)
and
    ∇x L̂(x, z, λ, µ) = ∇f(x) + Σ_{i=1}^{m} λi ∇hi(x) + Σ_{i=1}^{p} µi ∇gi(x) = ∇x L(x, λ, µ)   (6.132)
and
    ∂L̂/∂zi (x, z, λ, µ) = 2µi zi ,   i = 1, . . . , p .       (6.133)
Moreover, by expressing x̂ = (x ; z), we have
    ∇²x̂x̂ L̂(x, z, λ, µ) = [ ∇²xx L(x, λ, µ)          0               ]
                          [        0          diag(2µ1, . . . , 2µp) ] .   (6.134)

From the hypothesis (6.125) and as per (6.130), we have µ∗i z∗i = 0. Moreover,
from the hypothesis (6.121), we have
    ∇x̂ L̂(x∗, z∗, λ∗, µ∗) = 0 .                                (6.135)

Consider a non zero vector
    (y ; w) ∈ Rn+p
in the linearized cone at (x∗ ; z∗) for the problem (6.128)–(6.129), i.e., such that
    yT ∇hi(x∗) = 0 ,   i = 1, . . . , m ,                      (6.136)
and
    yT ∇gi(x∗) + 2z∗i wi = 0 ,   i = 1, . . . , p .            (6.137)
Note that if i ∈ A(x∗), then z∗i = 0 and (6.137) is written as yT ∇gi(x∗) = 0. The
vector y thus satisfies the conditions of the hypothesis (6.127). We have
    (yT  wT) ∇²x̂x̂ L̂(x∗, z∗, λ∗, µ∗) (y ; w)
        = yT ∇²xx L(x∗, λ∗, µ∗) y + 2 Σ_{i=1}^{p} µ∗i wi²        from (6.134)
        = yT ∇²xx L(x∗, λ∗, µ∗) y + 2 Σ_{i∈A(x∗)} µ∗i wi²        from (6.125) .

From the hypothesis (6.127), the first term is positive if y ≠ 0. From the hypothesis
(6.124), each term of the sum in the second term is non negative. If y = 0, there is
necessarily an i such that wi ≠ 0. In order for (6.137) to be satisfied, we then have
z∗i = 0 and thus i ∈ A(x∗). From (6.126), the corresponding term µ∗i wi² is positive.
The sufficient optimality conditions of Theorem 6.19 are satisfied for x∗, z∗, λ∗
and µ∗, and (x∗ ; z∗) is a strict local minimum of (6.128)–(6.129). Consequently, x∗
is a strict local minimum of the initial problem.
The condition (6.126) is called the strict complementarity condition. The fol-
lowing example illustrates its importance.
Example 6.21 (Importance of the strict complementarity condition). Consider the
problem
    min_{x∈R²} (1/2)(x1² − x2²)                               (6.138)
subject to
    x2 ≤ 0                                                    (6.139)
for which the objective function is illustrated in Figure 6.7. We demonstrate that all
sufficient optimality conditions except (6.126) are satisfied for x∗ = (0, 0)T and
µ∗ = 0. We have
    L(x, µ) = (1/2)(x1² − x2²) + µ x2 .                       (6.140)
Then,
    ∇x L(x, µ) = [ x1 ; −x2 + µ ]
and (6.121) is satisfied when x∗ = (0, 0)T and µ∗ = 0. The other conditions are
trivially satisfied, and (6.126) is not satisfied because the constraint is active in x∗
and µ∗ = 0.
We now consider the point (0, −α), with α > 0. It is feasible and the objective
function is −α²/2, which is strictly smaller than the value in x∗, which is therefore
not a local minimum.
To understand why it is not a local optimum, let us take the proof of Theorem
6.20 and transform the problem into a problem with equality constraints:
    min_{x∈R²} (1/2)(x1² − x2²)                               (6.141)
subject to
    x2 + z² = 0 .                                             (6.142)
We have
    z∗ = √(−g(x∗)) = 0 .                                      (6.143)
The Lagrangian function of this problem is
    L̂(x, z, µ) = (1/2)(x1² − x2²) + µ(x2 + z²) .              (6.144)
The Hessian ∇²x̂x̂ L̂(x∗, z∗, µ∗) used in the proof is
    ∇²x̂x̂ L̂(x, z, µ) = [ 1   0   0  ]
                       [ 0  −1   0  ]
                       [ 0   0   2µ ]
and
    ∇²x̂x̂ L̂(x∗, z∗, µ∗) = [ 1   0   0 ]
                          [ 0  −1   0 ]
                          [ 0   0   0 ] ,
and it is singular. As in the proof, let us take a vector belonging to the linearized
cone at (x∗ ; z∗) of the problem with equality constraints, i.e.,
    (y ; w) = (0, 0, γ)T .
For all γ, we have
    ( 0  0  γ ) [ 1   0   0 ] [ 0 ]
                [ 0  −1   0 ] [ 0 ] = 0
                [ 0   0   0 ] [ γ ]
and the sufficient condition for the problem with equality constraints is not satisfied,
which prevents us from proving Theorem 6.20.
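That x∗ = (0, 0)T is not a local minimum can also be confirmed directly (a small sketch in plain Python): every feasible point (0, −α) with α > 0 has a strictly smaller objective value.

```python
# Example 6.21, checked numerically (a sketch).  f(x) = (x1^2 - x2^2)/2
# with the constraint x2 <= 0; the points (0, -alpha) are feasible and
# beat x* = (0, 0) for every alpha > 0.

def f(x):
    return 0.5 * (x[0] ** 2 - x[1] ** 2)

def feasible(x):
    return x[1] <= 0.0

for alpha in (1.0, 0.1, 0.01):
    x = [0.0, -alpha]
    assert feasible(x)
    print(alpha, f(x), f(x) < f([0.0, 0.0]))   # always strictly smaller
```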
We conclude this section with two examples of the use of optimality conditions
to identify critical points. They both lead to the resolution of a system of equations,
the topic of the next section of this book.
Example 6.22 (Identification of critical points – I). Consider the problem
    min_{x∈R²} 3x1² + x2²
subject to
    2x1 + x2 = 1
illustrated in Figure 6.9. The necessary optimality condition (6.23) is written as
    6x1 + 2λ = 0
    2x2 + λ = 0
    2x1 + x2 − 1 = 0 .
This is a system of three linear equations, with three unknowns, for which the
solution is
    x∗1 = 2/7 ,   x∗2 = 3/7 ,   λ∗ = −6/7 .
We now need only ensure that this point satisfies the sufficient second-order
conditions in order to determine that it is indeed a solution.

Figure 6.9: Problem for Example 6.22
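Because the system is linear, it can be solved by substitution; the short sketch below reproduces the computation in plain Python.

```python
# Solving the linear system of Example 6.22 by substitution (a sketch):
# the first two equations give x1 = -lam/3 and x2 = -lam/2; inserting
# them into 2 x1 + x2 = 1 yields -7 lam / 6 = 1, i.e. lam = -6/7.

lam = -6.0 / 7.0
x1, x2 = -lam / 3.0, -lam / 2.0
print(x1, x2, lam)                  # 2/7, 3/7, -6/7

# residuals of the three equations (all vanish):
print(6 * x1 + 2 * lam, 2 * x2 + lam, 2 * x1 + x2 - 1)
```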



Example 6.23 (Identification of critical points – II). Consider the problem
    min_{x∈R²} 3x1² + x2²
subject to
    x1² + 4x1 − x2 + 3 = 0
illustrated in Figure 6.10. The necessary optimality condition (6.23) is expressed as
    6x1 + 2λx1 + 4λ = 0
    2x2 − λ = 0
    x1² + 4x1 + 3 = 0 .
This is a system of three non linear equations, with three unknowns. One solution is
x1 = −1, x2 = 1.5, and λ = 3. It is not necessarily straightforward to calculate it.

Figure 6.10: Problem of Example 6.23
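The reported values can at least be checked by substitution (a sketch in plain Python): plugging x1 = −1, x2 = 1.5, λ = 3 into the three equations of the printed system gives zero residuals.

```python
# Verifying the critical point reported for Example 6.23 (a sketch):
# substitute x1 = -1, x2 = 1.5, lam = 3 into the three equations.

x1, x2, lam = -1.0, 1.5, 3.0
r1 = 6 * x1 + 2 * lam * x1 + 4 * lam    # first equation
r2 = 2 * x2 - lam                       # second equation
r3 = x1 ** 2 + 4 * x1 + 3               # third equation
print(r1, r2, r3)                       # 0.0 0.0 0.0
```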

6.4 Sensitivity analysis


When the data of an optimization problem is slightly disturbed, the solution to the
perturbed problem generally does not differ fundamentally from that of the unper-
turbed problem. We first analyze this relation for problems with equality constraints.

Theorem 6.24 (Sensitivity analysis: equality constraints). Let f : Rn → R and
h : Rn → Rm be twice continuously differentiable functions. Consider the
optimization problem (1.71)–(1.72)
    min_{x∈Rn} f(x)
subject to
    h(x) = 0 .
Moreover, let x∗ be a local minimum and let λ∗ satisfy the sufficient optimality
conditions (6.115) and (6.116), such that the constraints are linearly independent
in x∗, according to Definition 3.8. Consider a perturbation of the data
characterized by δ ∈ Rm, and the perturbed optimization problem
    min_{x∈Rn} f(x)
subject to
    h(x) = δ .
There thus exists a sphere S ⊂ Rm centered in 0 such that if δ ∈ S, there exist x(δ)
and λ(δ) satisfying the sufficient optimality conditions of the perturbed problem.
The functions x : Rm → Rn : δ ↦ x(δ) and λ : Rm → Rm : δ ↦ λ(δ) are
continuously differentiable in S, with x(0) = x∗ and λ(0) = λ∗. Moreover, for all
δ ∈ S, we have
    ∇p(δ) = −λ(δ) ,                                           (6.145)
where
    p(δ) = f(x(δ)) .                                          (6.146)

Proof. We note that
    γ = (x ; λ) ∈ Rn+m
and consider the function F : Rm+n+m → Rn+m defined by
    F(δ, γ) = [ ∇f(x) + ∇h(x)λ ]   [ ∇x L(x, λ) ]
              [ h(x) − δ       ] = [ h(x) − δ   ] .           (6.147)
We first demonstrate that the gradient matrix
    ∇γ F(δ, γ∗) = [ ∇²xx L(x∗, λ∗)   ∇h(x∗) ]
                  [ ∇h(x∗)T          0      ]
is non singular. We assume by contradiction that this is not the case. There then
exist y ∈ Rn and z ∈ Rm, non zero, such that
    ∇γ F(δ, γ∗) (y ; z) = 0 ,
i.e.,
    ∇²xx L(x∗, λ∗) y + ∇h(x∗) z = 0                           (6.148)
    ∇h(x∗)T y = 0 .                                           (6.149)
We have
    yT ∇²xx L(x∗, λ∗) y = −yT ∇h(x∗) z    from (6.148)
                        = 0               from (6.149) .

Since the sufficient optimality condition (6.116) is satisfied, ∇²xx L(x∗, λ∗) is positive
definite and y = 0. Then, according to (6.148), ∇h(x∗) z = 0. By assumption, the
constraints are linearly independent at x∗ and the matrix ∇h(x∗) is of full rank.
Then, z = 0, which contradicts the fact that y and z are non zero. The matrix
∇γ F(δ, γ∗) is indeed non singular and we can apply the theorem of implicit functions
(Theorem C.6): there exist neighborhoods V0 ⊆ Rm around δ = 0 and Vγ∗ ⊆ Rn+m
around γ∗, as well as a continuous function
    φ : V0 → Vγ∗ : δ ↦ γ(δ) = (x(δ) ; λ(δ))
such that
    F(δ, γ(δ)) = 0 ,   ∀δ ∈ V0 ,
i.e.,
    [ ∇f(x(δ)) + ∇h(x(δ)) λ(δ) ]
    [ h(x(δ)) − δ              ] = 0 .                        (6.150)
We now demonstrate that, for δ sufficiently close to 0, the sufficient optimality
conditions of the perturbed problem are satisfied. Assuming (by contradiction) that
this is not the case, there exists a sequence (δk)k such that limk→∞ δk = 0 and a
sequence (yk)k, with yk ∈ Rn, ‖yk‖ = 1 and ∇h(x(δk))T yk = 0, for all k, such that
    ykT ∇²xx L(x(δk), λ(δk)) yk ≤ 0 ,   ∀k .
Consider a subsequence of (yk)k converging toward ȳ ≠ 0. When we take the limit,
we obtain by continuity of ∇²xx L (as a result of the continuity of ∇²xx f and ∇²xx hi,
i = 1, . . . , m) that
    ȳT ∇²xx L(x∗, λ∗) ȳ ≤ 0 ,
which contradicts the sufficient optimality condition (6.116) in x∗ and λ∗.
By differentiating the second row of (6.150), we obtain
    ∇δ h(x(δ)) = ∇x(δ) ∇h(x(δ)) = I .                         (6.151)
When multiplying the first row of (6.150) by ∇x(δ), we get
    0 = ∇x(δ) ∇f(x(δ)) + ∇x(δ) ∇h(x(δ)) λ(δ) = ∇x(δ) ∇f(x(δ)) + λ(δ) ,
where the second equality comes from (6.151). Therefore,
    ∇p(δ) = ∇δ f(x(δ)) = ∇x(δ) ∇f(x(δ)) = −λ(δ) ,
which demonstrates (6.145).

Example 6.25 (Sensitivity). Consider the problem minx∈R x² subject to x = 1, for
which the solution is x∗ = 1, λ∗ = −2. We consider the perturbed problem minx∈R x²
subject to x = 1 + δ, for which the solution is x(δ) = 1 + δ and λ(δ) = −2δ − 2. We
have f(x(δ)) = δ² + 2δ + 1 and
    (d/dδ) f(x(δ)) = 2δ + 2 = −λ(δ) .


The quantity ∇p(δ) represents the marginal modification of the objective function
for a perturbation δ of the constraints. When δ is small, we use Taylor's theorem
(Theorem C.1) to obtain
    p(δ) = p(0) + δT ∇p(0) + o(‖δ‖) .
Neglecting the last term, we obtain
    f(x(δ)) ≈ f(x∗) − δT λ∗ .
Note that if p(δ) is linear, we have exactly
    f(x(δ)) = f(x∗) − δT λ∗ .                                 (6.152)

This result has significant practical consequences. Indeed, it becomes possible to


measure the impact of a perturbation of the constraint on the objective function,
without re-optimizing.
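Relation (6.145) is easy to verify on Example 6.25 by finite differences (a sketch; the step size is arbitrary).

```python
# Sensitivity check for Example 6.25 (a sketch; the step size is
# arbitrary).  For min x^2 s.t. x = 1 + delta: x(delta) = 1 + delta and
# lambda(delta) = -2 (1 + delta), so p(delta) = (1 + delta)^2.

def p(delta):
    return (1.0 + delta) ** 2

def lam_of(delta):
    return -2.0 * (1.0 + delta)

delta, h = 0.3, 1e-6
dp = (p(delta + h) - p(delta - h)) / (2.0 * h)   # central finite difference
print(dp, -lam_of(delta))    # both approximately 2.6, cf. (6.145)
```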
Example 6.26 (Sensitivity analysis). We consider a company manufacturing two
products. Each unit of the first product brings in e 6,000, while each unit of the
second product brings in e 5,000. A total of 10 machine-hours and 15 tons of raw
material are available daily. Each unit of the first product requires 2 machine-hours
and 1 ton of raw material. Each unit of the second product requires 1 machine-hour
and 3 tons of raw material. In thousands of euros, the optimal production that the
company should consider is obtained by solving the optimization problem

    max_{x1,x2} 6x1 + 5x2
subject to
    2x1 + x2 ≤ 10
    x1 + 3x2 ≤ 15
    x1, x2 ≥ 0 .
We omit the non negativity constraints to maintain a simple formulation (these
constraints are inactive at the solution). We first express the problem in the form
(1.71)–(1.72) by changing the maximization problem into a minimization problem,
and by including slack variables (see Section 1.2):
    min_{x1,x2,x3,x4} f(x) = −6x1 − 5x2
subject to
    h1(x) = 2x1 + x2 + x3² − 10 = 0
    h2(x) = x1 + 3x2 + x4² − 15 = 0 ,
where x3 and x4 are the slack variables. The solution to the problem is x∗ =
(3, 4, 0, 0)T and λ∗ = (13/5, 4/5)T, enabling the company to bring in e 38,000
per day.

Figure 6.11: Graphical analysis of Example 6.26: (a) δ = 5, with x(δ) = (2, 6);
(b) δ = 16, with x(δ) = (0, 10)

In order to increase its production, the company wishes to invest by purchasing
an additional quantity δ of raw material per day. In this case, the constraints would
become
    h1(x) = 2x1 + x2 + x3² − 10 = 0
    h2(x) = x1 + 3x2 + x4² − 15 = δ .
To determine what this investment would bring in, we use (6.152) with
    δ = (0 ; δ)
to obtain
    f(x(δ)) = f(x∗) − δT λ∗ = −38 − (4/5) δ .                 (6.153)
Therefore, the purchase of 5 tons of raw material per day would enable the company
to bring in e 4,000. If this purchase costs less than e 4,000, it is worth going through
with the investment. Otherwise, the investment is of no use.
Note that this result is only valid for small values of δ. If δ = 16, such that
31 tons of raw material are available each day, the company no longer has enough
machine-hours to use up all of the raw material. Therefore, the second constraint
is no longer active (here, x4 is positive). The company should thus produce only
the second product, which consumes half as many machine-hours. In this case, the

purchase of additional raw material would not enable the company to earn more, and
the company should thus rather invest in new machines.

The inspiration for this example came from de Werra et al. (2003).
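The prediction (6.153) can be compared with an exact re-optimization (a sketch in plain Python; `exact_profit` is our name, and its closed form assumes both constraints remain active, which holds for small δ).

```python
# Sensitivity prediction vs. re-optimization for Example 6.26 (a sketch;
# the closed-form re-optimization assumes both constraints remain active,
# which holds for small delta).

def predicted_profit(delta):
    # from (6.153): profit = 38 + (4/5) delta, in thousands of euros
    return 38.0 + 0.8 * delta

def exact_profit(delta):
    # intersection of 2 x1 + x2 = 10 and x1 + 3 x2 = 15 + delta
    x1 = (15.0 - delta) / 5.0
    x2 = (20.0 + 2.0 * delta) / 5.0
    return 6.0 * x1 + 5.0 * x2

for delta in (0.0, 5.0):
    print(delta, predicted_profit(delta), exact_profit(delta))
# the prediction is exact here: 38.0 for delta = 0 and 42.0 for delta = 5
```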
Corollary 6.27 (Sensitivity analysis). Let f : Rn → R, g : Rn → Rp and
h : Rn → Rm be twice differentiable functions. Consider the optimization problem
(1.71)–(1.73)
    min_{x∈Rn} f(x)
subject to
    h(x) = 0
    g(x) ≤ 0 .
Moreover, let x∗ be a local minimum and let λ∗, µ∗ satisfy the sufficient optimality
conditions (6.121)–(6.126), such that the constraints are linearly independent
in x∗, according to Definition 3.8. Consider perturbations δh ∈ Rm and δg ∈ Rp,
and the perturbed optimization problem
    min_{x∈Rn} f(x)
subject to
    h(x) = δh
    g(x) ≤ δg .
Then, there exists a sphere S ⊂ Rm+p centered in 0 such that if δ = (δhT ; δgT)T ∈
S, there exist x(δ), λ(δ) and µ(δ) satisfying the sufficient optimality conditions
of the perturbed problem. The functions x : Rm+p → Rn, λ : Rm+p → Rm and
µ : Rm+p → Rp are continuously differentiable in S, with x(0, 0) = x∗, λ(0, 0) = λ∗
and µ(0, 0) = µ∗. Moreover, for all δ ∈ S, we have
    ∇δh p(δ) = −λ(δ)
                                                              (6.154)
    ∇δg p(δ) = −µ(δ) ,
with
    p(δ) = f(x(δ)) .                                          (6.155)

Proof. From Theorem 3.5, x∗, λ∗ and µ∗ satisfy the optimality conditions of the
problem
    min_{x∈Rn} f(x)
subject to
    h(x) = 0
    gi(x) = 0 ,   ∀i ∈ A(x∗) .
From Theorem 6.24, there exist x(δ), λ(δ) and µ(δ) satisfying the sufficient
optimality conditions of the problem
    min_{x∈Rn} f(x)                                           (6.156)
subject to
    h(x) = δh                                                 (6.157)
    gi(x) = (δg)i ,   ∀i ∈ A(x∗) .                            (6.158)

The functions x : Rm+p → Rn, λ : Rm+p → Rm and µi : Rm+p → R are continuously
differentiable in S, with x(0, 0) = x∗, λ(0, 0) = λ∗ and µi(0, 0) = µ∗i, i ∈ A(x∗).
Moreover, for all δ ∈ S, we have
    ∇δh p(δ) = −λ(δ)
    (∇δg p(δ))i = −µi(δ) ,   i ∈ A(x∗) .

We now need only verify the result for inequality constraints that are inactive in
x∗, i.e.,
    gi(x∗) < 0 .
In this case, if δi is sufficiently close to 0, gi(x(δ)) < δi, and the constraint i is also
inactive at the solution to the perturbed problem (by continuity of gi). Then,
according to Theorem 3.5, the perturbed problem is equivalent to the problem
(6.156)–(6.158). If we take µi(δ) = 0 for all i ∉ A(x∗), then x(δ), λ(δ) and µ(δ)
satisfy the sufficient optimality conditions. Moreover, regardless of the value of δi
(small enough for the constraint of the problem to remain inactive), the value of
x(δ) remains constant, since it is determined by the problem (6.156)–(6.158), which
does not depend on δi, if i ∉ A(x∗). Therefore,
    ∂p/∂δi = Σ_{j=1}^{n} ∂f/∂xj ∂xj/∂δi = 0 = −µi(δ) ,
which concludes the proof.


We emphasize the importance of the condition requiring that the inactive con-
straints in the initial problem remain so in the perturbed problem. This is illustrated
for Example 6.26 in Figure 6.11. When δ = 5, the solution x(δ) is such that the
constraints x1 ≥ 0 and x2 ≥ 0, inactive in x∗ , remain inactive in x(δ). However, when
δ = 16, the constraint x1 ≥ 0 becomes active in x(δ) and we leave the domain of
application of the sensitivity theorem.

6.5 Linear optimization


We now analyze in greater detail the optimality conditions for the linear optimization
problem
    min_{x∈Rn} cT x                                           (6.159)
subject to
    Ax = b
                                                              (6.160)
    x ≥ 0 ,
where A ∈ Rm×n, b ∈ Rm, c ∈ Rn, for which the Lagrangian function is
    L(x, λ, µ) = cT x + λT (b − Ax) − µT x ,                  (6.161)
with λ ∈ Rm and µ ∈ Rn. By directly applying Theorem 6.13, the necessary first-
order optimality condition is expressed as
    ∇x L(x, λ, µ) = c − AT λ − µ = 0
                                                              (6.162)
    µ ≥ 0 .

These conditions represent exactly the constraints (4.23) of the dual problem
described in Section 4.2. The second-order conditions are trivial, because
∇²xx L(x, λ, µ) = 0, for all (x, λ, µ). The complementarity condition (6.57) is simply
written as
    µi xi = 0 ,   i = 1, . . . , n ,                           (6.163)
or, equivalently,
    ( ci − Σ_{j=1}^{m} aji λj ) xi = 0 ,   i = 1, . . . , n .  (6.164)

We show below that this condition happens to also be sufficient for optimality.
We can also utilize the necessary conditions of Theorem 5.1. In particular, if we
consider the jth basic direction dj (Definition 3.42), the directional derivative of the
objective function in the direction dj is given by
    ∇f(x∗)T dj = cT dj = cBT (dj)B + cNT (dj)N = −cBT B⁻¹ Aj + cj .   (6.165)

In the context of linear optimization, this quantity is often called reduced cost.

Definition 6.28 (Reduced costs). Consider the linear optimization problem (6.159)–
(6.160) and let x be a feasible basic solution of the constraint polyhedron. The reduced
cost of xj is
c̄j = cj − cTB B−1 Aj , j = 1, . . . , n . (6.166)
In matrix form, we have
c̄ = c − AT B−T cB . (6.167)

The reduced costs can be decomposed into their basic and non basic components,
as follows:
c̄B = cB − BT B−T cB = 0, (6.168)
and
c̄N = cN − NT B−T cB . (6.169)

Therefore, for any basis B, the basic components of the reduced costs are always
0. This, together with the geometric interpretation of the non basic components, is
formalized in the next theorem.

Theorem 6.29 (Reduced costs). Consider the linear problem (6.159)–(6.160)
and let x be a basic solution of the constraint polyhedron. When j is the index
of a non basic variable, the jth reduced cost is the directional derivative of the
objective function in the jth basic direction. When j is the index of a basic
variable, the jth reduced cost is zero.

Proof. In the case where j is non basic, the proof can be found above (see (6.165)).
For basic j, we see that B−1 B = I and B−1 Aj is the jth column of the identity matrix.
Then, cTB B−1 Aj = cj and the reduced cost is zero.
The concept of reduced costs now enables us to state the optimality conditions
for linear optimization.

Theorem 6.30 (Necessary optimality conditions, linear optimization). Consider the
linear problem (6.159)–(6.160) and let x∗ be a non degenerate basic solution of
the constraint polyhedron. If x∗ is the solution to (6.159)–(6.160), then c̄ ≥ 0.

Proof. Consider the basic direction dk. According to Theorem 3.44, the non
degeneracy of x∗ ensures that dk is feasible. Therefore, given the convexity of the
set of constraints, the necessary condition of Theorem 6.3 applies and

    ∇f(x∗)T dk = c̄k ≥ 0 ,

by using (6.166) and Theorem 6.29.

Theorem 6.31 (Sufficient conditions, linear optimization). Consider the linear
problem (6.159)–(6.160) and let x∗ be a feasible basic solution of the constraint
polyhedron. If c̄ ≥ 0, then x∗ is optimal.

Proof. Let y be an arbitrary feasible point and w = y − x∗. Since the feasible set
is convex, w is a feasible direction (Theorem 3.11). If dj is the jth basic direction2
(Definition 3.42), we have

    cT w = ∑_{j∈N} (w)j cT dj                   from Theorem 3.45
         = ∑_{j∈N} (w)j (−cTB B−1 Aj + cj)      from Definition 3.42
         = ∑_{j∈N} (w)j c̄j                      from Definition 6.28.

2 In this proof, dj is a vector of Rn, while (w)j is a scalar, representing the jth entry
of the vector w.

Since x∗ is a basic solution, j ∈ N implies that x∗j = 0 according to Definition 3.38.
Therefore, (w)j = yj − x∗j = yj ≥ 0 by feasibility of y. Then, as the reduced costs
are non negative, we obtain

    cT y − cT x∗ = cT w = ∑_{j∈N} (w)j c̄j ≥ 0 ,

which proves the optimality of x∗.
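To make Definition 6.28 and Theorem 6.31 concrete, the following sketch (not from the book: the LP data, the variable names, and the choice of basis are invented for illustration) computes the reduced costs of a basic feasible solution with NumPy and checks the sufficient optimality condition c̄ ≥ 0:

```python
import numpy as np

# Hypothetical standard-form LP (data invented for illustration):
# min -3 x1 - 5 x2  s.t.  x1 + x3 = 4,  x2 + x4 = 6,  x >= 0.
c = np.array([-3.0, -5.0, 0.0, 0.0])
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])

basic = [0, 1]                       # candidate basis: columns of x1 and x2
B = A[:, basic]
x_B = np.linalg.solve(B, b)          # basic variables: here (4, 6), both >= 0

# Reduced costs, (6.166)-(6.167): c_bar = c - A^T B^{-T} c_B
lam = np.linalg.solve(B.T, c[basic])
c_bar = c - A.T @ lam

print(x_B)     # [4. 6.]
print(c_bar)   # [0. 0. 3. 5.]: all non negative, so the basis is optimal
```

As Theorem 6.29 predicts, the basic components of c̄ vanish, and the non basic components are the directional derivatives along the basic directions.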

Note that the sufficient conditions do not assume that x∗ is non degenerate. To
understand why the necessary conditions may not hold when x∗ is degenerate, we go
back to the example illustrated in Figure 3.16. In this example, the vertex number
2 corresponds to a degenerate solution, and the basic direction d̂3 is not feasible.
Therefore, if this vertex happens to be the optimal solution of an optimization
problem, it is not necessary for the basic direction d̂3 to correspond to an ascent
direction. It does not matter if it is a descent direction, as the direction is not
feasible anyway. If it happens to be a descent direction, the reduced cost is negative,
and the necessary condition is not verified although we are at the optimal solution.
We now characterize the optimal solution of the dual problem given the optimal
solution of the primal. We obtain an important result for linear optimization, called
strong duality: the optimal value of the primal coincides with the optimal value
of the dual. Moreover, this result provides a direct link between the reduced costs
and the dual variables.

Corollary 6.32 (Optimality of the dual). Consider the primal linear problem
(6.159)–(6.160) and let B be a basis such that B−1 b ≥ 0 and c̄ ≥ 0. Consider
also the dual problem
    max_{λ∈Rm} λT b                                        (6.170)
subject to
    AT λ ≤ c.                                              (6.171)
Then, the primal vector x∗ with basic variables

    x∗B = B−1 b                                            (6.172)

and non basic variables x∗N = 0, is optimal for the primal problem, the dual
vector
    λ∗ = B−T cB                                            (6.173)
is optimal for the dual problem, and the objective functions are equal, that is

    (λ∗)T b = cT x∗ .                                      (6.174)

Proof. The optimality of x∗ is guaranteed by Theorem 6.31. We have also

    (λ∗)T b = cTB B−1 b    from (6.173)
            = cTB x∗B      from (6.172)
            = cT x∗        as x∗N = 0,

proving (6.174).
The vector λ∗ is feasible for the dual. Indeed, from (6.167), we have

    AT λ∗ = AT B−T cB = c − c̄.

As c̄ ≥ 0, we obtain (6.171).
Consider now any dual feasible λ. By the weak duality theorem (Theorem 4.9),

    λT b ≤ cT x∗ = (λ∗)T b,

which proves the optimality of λ∗ for the dual problem.


The above result leads to an important theorem called strong duality. Consider x∗
an optimal solution of the primal problem. If x∗ is non degenerate, the condition
c̄ ≥ 0 is a necessary and sufficient condition for its optimality. If it is degenerate,
it can be shown that there exists a basis B such that x∗B = B−1 b ≥ 0 and
c̄ = c − AT B−T cB ≥ 0. The idea is that the simplex algorithm described in
Chapter 16, combined with appropriate rules attributed to Bland (1977), terminates
in a finite number of iterations with an optimal basis and non negative reduced costs.

Theorem 6.33 (Strong duality). Consider the primal linear problem
    min_{x∈Rn} cT x
subject to
    Ax = b ,   x ≥ 0 ,
where A ∈ Rm×n, b ∈ Rm, c ∈ Rn, and the dual problem
    max_{λ∈Rm} λT b
subject to
    AT λ ≤ c.
If either the primal or the dual problem has an optimal solution, so does the
other, and the optimal objective values are equal.

Proof. From the discussion above, if x∗ is a solution to the primal problem, there
exists a basis B such that x∗B = B−1 b ≥ 0 and c̄ = c − AT B−T cB ≥ 0. Therefore,
Corollary 6.32 applies, the optimal solution of the dual is B−T cB , and the objective
functions are equal.

If λ∗ is a solution to the dual problem, the fact that the primal problem is the
dual of the dual (Theorem 4.15) is used to prove the result.
Note that another proof of this result, exploiting Farkas’ lemma, is presented in
Theorem 4.17. Finally, we show that the complementarity condition (6.164) is a
sufficient and necessary optimality condition.

Theorem 6.34 (Complementarity slackness). Consider the primal linear problem
    min_{x∈Rn} cT x
subject to
    Ax = b ,   x ≥ 0 ,
where A ∈ Rm×n, b ∈ Rm, c ∈ Rn, and the dual problem
    max_{λ∈Rm} λT b
subject to
    AT λ ≤ c.
Consider x∗ primal feasible and λ∗ dual feasible. Then x∗ is optimal for the primal
and λ∗ is optimal for the dual if and only if

    (ci − ∑_{j=1}^m aji λ∗j) x∗i = 0 ,   i = 1, . . . , n. (6.175)

Proof. Conditions (6.175) are KKT necessary optimality conditions (see Theorem 6.13
and the discussion at the beginning of the section). To show that they are sufficient,
consider the equation

    (c − AT λ∗)T x∗ = ∑_{i=1}^n (ci − ∑_{j=1}^m aji λ∗j) x∗i = 0.

Therefore,
    cT x∗ = (λ∗)T Ax∗ .
As x∗ is primal feasible, we have Ax∗ = b and

    cT x∗ = (λ∗)T b.

Consequently, the objective function of the primal at x∗ equals the objective function
of the dual at λ∗. We apply Theorem 4.11 to prove the optimality of x∗ and λ∗.
Conditions (6.175) are called complementarity slackness conditions because the
activity of the constraints must be complementary. At the optimal solution, if a
primal variable is positive, that is if x∗i > 0, the corresponding dual constraint must
be active, that is ci = ∑_{j=1}^m aji λ∗j. Symmetrically, if a dual constraint is
inactive, that is if ci > ∑_{j=1}^m aji λ∗j, the corresponding primal variable must be
equal to 0.
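The conclusions of Corollary 6.32 and Theorem 6.34 are easy to verify numerically. The sketch below (illustrative only: the LP data and the claimed primal/dual solutions are invented, not taken from the book) checks feasibility of both vectors, complementarity slackness, and the strong duality equality (6.174):

```python
import numpy as np

# Hypothetical standard-form LP (data invented for illustration):
# min -3 x1 - 5 x2  s.t.  x1 + x3 = 4,  x2 + x4 = 6,  x >= 0.
c = np.array([-3.0, -5.0, 0.0, 0.0])
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])

x_star = np.array([4.0, 6.0, 0.0, 0.0])   # candidate primal solution
lam_star = np.array([-3.0, -5.0])         # candidate dual solution, lambda* = B^{-T} c_B

# Primal and dual feasibility
assert np.allclose(A @ x_star, b) and (x_star >= 0).all()
assert (A.T @ lam_star <= c + 1e-12).all()

# Complementarity slackness (6.175): (c - A^T lambda*)_i x*_i = 0 for every i
slack = c - A.T @ lam_star
print(slack * x_star)              # [0. 0. 0. 0.]

# Strong duality (6.174): equal objective values
print(lam_star @ b, c @ x_star)    # -42.0 -42.0
```

By Theorem 6.34, these checks certify that both vectors are optimal, without running any optimization algorithm.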

6.6 Quadratic optimization


We now consider a case of quadratic optimization with equality constraints:

    min_{x∈Rn} (1/2) xT Qx + cT x                          (6.176)

subject to
    Ax = b ,                                               (6.177)

where Q ∈ Rn×n, c ∈ Rn, A ∈ Rm×n and b ∈ Rm. The Lagrangian function is

    L(x, λ) = (1/2) xT Qx + cT x + λT (b − Ax) ,           (6.178)

with λ ∈ Rm. By directly applying Theorem 6.13, the necessary first-order optimality
condition is written as

    ∇x L(x, λ) = Qx + c − AT λ = 0 .                       (6.179)

By combining (6.177) and (6.179), we obtain the linear system

    [ Q  −AT ] [ x ]   [ −c ]
    [ A   0  ] [ λ ] = [  b ] .                            (6.180)

We now identify a case where this system has a unique solution.

Lemma 6.35. Consider the quadratic problem (6.176)–(6.177), with A of full
rank. Let Z ∈ Rn×(n−m) be a matrix for which the columns form a basis of
the null space of A, i.e., AZ = 0, and Z is of full rank. If the reduced Hessian
matrix ZT QZ is positive definite, then the system (6.180) is non singular and
has a unique solution (x∗, λ∗).

Proof. Consider x and λ such that

    [ Q  −AT ] [ x ]   [ 0 ]
    [ A   0  ] [ λ ] = [ 0 ] ,

i.e., Qx = AT λ and Ax = 0. We demonstrate that x and λ are zero in order to prove
that the matrix is non singular. Since Ax = 0, we have

    0 = [ xT  λT ] [ Q  −AT ] [ x ]
                   [ A   0  ] [ λ ]  = xT Qx .

Since Z is of full rank, there exists y such that x = Zy. Therefore,

    yT ZT QZy = 0 .

Since ZT QZ is positive definite, then y = 0. As a result, x = Zy = 0 and the first
equation is written as

    Qx − AT λ = −AT λ = 0 .

Since A is of full rank, then λ = 0.
We calculate the analytical solution to this problem.

Lemma 6.36. Consider the quadratic problem (6.176)–(6.177) with Q = I and
b = 0, i.e.,
    min_x (1/2) xT x + cT x
subject to
    Ax = 0 ,
where A is of full rank. The solution to this problem is

    x∗ = AT (AAT)−1 Ac − c                                 (6.181)
    λ∗ = (AAT)−1 Ac .                                      (6.182)

Proof. The system (6.180) is written as

    [ I  −AT ] [ x∗ ]   [ −c ]
    [ A   0  ] [ λ∗ ] = [  0 ] .

By multiplying the first equation

x∗ − AT λ∗ = −c , (6.183)

by A, we obtain
Ax∗ − AAT λ∗ = −Ac .
Since Ax∗ = 0 and A is of full rank, we obtain (6.182). We now need only introduce
(6.182) in (6.183) to obtain (6.181).

Lemma 6.37. Consider the quadratic problem (6.176)–(6.177) with Q = I, i.e.,
    min_x (1/2) xT x + cT x
subject to
    Ax = b ,
where A is of full rank. The solution to this problem is

    x∗ = AT (AAT)−1 (Ac + b) − c                           (6.184)
    λ∗ = (AAT)−1 (Ac + b) .                                (6.185)

Proof. Consider x0 such that Ax0 = b and let y = x − x0, i.e., x = y + x0. The
problem becomes

    min_y (1/2) (yT y + xT0 x0 + 2yT x0) + cT y + cT x0

subject to
    Ay + Ax0 = b .
By removing the constant terms of the objective function and using Ax0 = b, we
obtain

    min_y (1/2) yT y + (c + x0)T y

subject to
    Ay = 0 .
According to Lemma 6.36, the solution to this problem is

    y∗ = AT (AAT)−1 A(c + x0) − (c + x0)
    λ∗ = (AAT)−1 A(c + x0) .

We now need only use Ax0 = b and y∗ = x∗ − x0 to obtain the result.

Theorem 6.38 (Analytical solution of a quadratic problem). Consider the
quadratic problem (6.176)–(6.177), min_{x∈Rn} (1/2) xT Qx + cT x subject to
Ax = b, where A is of full rank. If the matrix Q is positive definite, then the
system (6.180) is non singular and has a unique solution (x∗, λ∗) given by

    x∗ = Q−1 (AT λ∗ − c)                                   (6.186)

and
    λ∗ = (AQ−1 AT)−1 (AQ−1 c + b) .                        (6.187)

Proof. Let Z ∈ Rn×(n−m) be a matrix where the columns form a basis of the null
space of A, i.e., such that AZ = 0. Since Q is positive definite, then so is ZT QZ,
and Lemma 6.35 applies to demonstrate the non singularity of the system and the
uniqueness of the solution. Let L be a lower triangular matrix such that Q = LLT
and let us take y = LT x. The problem (6.176)–(6.177) is thus written as

    min_y (1/2) yT L−1 LLT L−T y + cT L−T y = (1/2) yT y + cT L−T y

subject to
    AL−T y = b .
The solution to this problem is given by Lemma 6.37, by replacing c with L−1 c and
A with AL−T. Then,

    λ∗ = (AL−T L−1 AT)−1 (AL−T L−1 c + b) = (AQ−1 AT)−1 (AQ−1 c + b)

and
    y∗ = L−1 AT λ∗ − L−1 c .
We now need only take y∗ = LT x∗ to obtain the result.
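As a numerical sanity check (a sketch with invented data, not from the book), the closed form (6.186)–(6.187) can be compared with a direct solve of the KKT system (6.180):

```python
import numpy as np

# Hypothetical equality-constrained QP (data chosen for illustration):
# min (1/2) x^T Q x + c^T x  s.t.  A x = b, with Q positive definite.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
c = np.array([1.0, -2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
n, m = 2, 1

# Closed form (6.186)-(6.187)
Qinv = np.linalg.inv(Q)
lam = np.linalg.solve(A @ Qinv @ A.T, A @ Qinv @ c + b)
x = Qinv @ (A.T @ lam - c)

# Direct solve of the KKT system (6.180) for comparison
K = np.block([[Q, -A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-c, b]))

print(np.allclose(sol[:n], x), np.allclose(sol[n:], lam))   # True True
print(np.allclose(A @ x, b))                                # True
```

Both routes give the same (x∗, λ∗), as Theorem 6.38 guarantees when Q is positive definite and A has full rank.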
The presentation of the proof of Theorems 6.10, 6.13, 6.19, 6.20, and 6.24 was
inspired by Bertsekas (1999). That of the proof of Theorem 6.31 was inspired by
Bertsimas and Tsitsiklis (1997).

6.7 Exercises

Exercise 6.1.
Identify the local optima of the following optimization problems, and verify the
optimality conditions.
1. min_{x∈Rn} ‖x‖₂², subject to ∑_{i=1}^n xi = 1.
2. min_{x∈Rn} ∑_{i=1}^n xi, subject to ‖x‖₂² = 1.
3. min_{x∈R2} −x1² − x2², subject to (x1/2)² + (x2/2)² ≤ 1 (Hint: plot the level
   curves and the constraints).
4. min_{x∈R2} −x1² − x2², subject to −x1² + x2² ≤ 1, and −5 ≤ x1 ≤ 5 (Hint: plot
   the level curves and the constraints).
5. The Indiana Jones problem (Section 1.1.6): min_{x∈R2} x1² + x2², subject to
   x1 x2 − hx1 − ℓx2 = 0, x1 ≥ ℓ, x2 ≥ h.
Exercise 6.2.
An electricity company must supply a town that consumes 100 MWh daily. Three
plants are used to generate the energy: a gas plant, producing at the cost of
€800/MWh, a coal plant, producing at the cost of €1,500/MWh, and a hydroelectric
plant producing at the cost of €300/MWh. The amount of available water limits the
production of the latter plant to 40 MWh per day. Moreover, due to ecological
concerns, the two other plants are each limited to produce no more than 80 MWh
per day.
1. Formulate a linear optimization problem that would optimize the costs of the
   company.
2. Formulate the dual problem.

3. Prove, using the optimality conditions, that the optimal solution is to produce 60
MWh per day with the gas plant, 40 MWh per day with the hydroelectric plant,
and not to use the coal plant.
4. Deduce the optimal values of the dual variables.
5. Use sensitivity analysis to propose profitable investments to the company.

Exercise 6.3. Consider the optimization problem min_{x∈Rn} f(x) subject to

    ∑_{i=1}^n xi = 1   and   x ≥ 0.

Let x∗ be a local minimum of f.
1. Prove that, if x∗i > 0, then

       ∂f(x∗)/∂xi ≤ ∂f(x∗)/∂xj   ∀j.                      (6.188)

   (Hint: refer to Example 6.4).
2. Show that, if f is convex, condition (6.188) is sufficient. (Hint: define
   ∆ = min_i ∂f(x∗)/∂xi).
Exercise 6.4 (Slack variables). Consider problem (P1)

    min_{x∈Rn} f(x)                                       (6.189)

subject to
    gi(x) ≤ 0,   i = 1, . . . , m,
and problem (P2)

    min_{x∈Rn, y∈Rm} f(x)

subject to
    gi(x) + yi = 0,   i = 1, . . . , m,
    yi ≥ 0,   i = 1, . . . , m.
1. Write the necessary optimality conditions (KKT) for problem (P1), both first
   and second order.
2. Write the necessary optimality conditions (KKT) for problem (P2), both first
   and second order.
3. Prove that (x∗, µ∗) verifies the KKT conditions of problem (P1) if and only if
   (x∗, y∗, λ∗) verifies the KKT conditions of problem (P2), where λ∗ = µ∗ and
   y∗i = −gi(x∗), i = 1, . . . , m (Hint: refer to Example 6.16).

Part III

Solving equations

Equations are more important to me, because politics is for the present, but an
equation is something for eternity.

                                        Albert Einstein

We have seen that the necessary optimality conditions enable us to identify the critical
points (Definition 5.6) that are candidates for the solution to an optimization problem.
In the case of optimization without constraint, we use condition (5.1). For constrained
optimization, we use conditions (6.11), (6.23), and (6.55)–(6.57). One way to address
the problem is to solve the system of equations defined by these conditions. This
is how the problem in Example 5.8 was solved, as well as Example 6.22. In these
two cases, the system of equations is easily solved. This is not always the case, as
illustrated in Example 6.23.
We now consider numerical methods enabling us to solve such systems of non
linear equations. Even though these are not directly utilized for optimization, they
are the basis for the main algorithms.

Chapter 7

Newton’s method

Contents

7.1 Equation with one unknown . . . . . . . . . . . . . . . . . 181


7.2 Systems of equations with multiple unknowns . . . . . . 192
7.3 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

Newton’s method plays a crucial role in the context of solving non linear equations
and, by extension, in that of non linear optimization. Isaac Newton was inspired
by a method from Vieta, and the method was later on improved by Raphson (and
sometimes called “Newton-Raphson.”) We refer the reader to Deuflhard (2012) for
a historical perspective of the method. We introduce it for the simple problem of
solving one equation of one unknown, i.e., by deriving numerical methods to find
x ∈ R such that F(x) = 0.

7.1 Equation with one unknown

Let F : R → R be a real differentiable function of one variable. In order to solve


the equation F(x) = 0, the main idea of Newton’s method consists in simplifying the
problem. Since a non linear equation is complicated to solve, it is replaced by a linear
equation. The concept of replacing a difficult problem with a simpler one is used
throughout this book. We use the term model when referring to a function that is a
simplification of another.
To obtain this simplified equation, we invoke Taylor’s Theorem C.1 which ensures
that a differentiable function can be approximated at a point by a straight line and
that the magnitude of the error decreases with the distance to this point.

Isaac Newton was born prematurely and fatherless on December 25, 1642, in
Woolsthorpe, England. (As 11 days were dropped in September 1752 to adjust the
calendar, the date of his birth in the "new style" calendar, that is, January 4, 1643,
is sometimes reported.) He is considered as the father of modern analysis, especially
thanks to his study on differentiable functions, and infinitesimal calculus (that he
called "fluxions"). His most famous work, published in Philosophiae naturalis
principia mathematica, concerns the theory of gravitation and associated principles
(inertia, action-reaction, tides, etc.) He is considered as the founder of celestial
mechanics. Newton claimed that the fall of an apple inspired in him the concept of
gravitation. Some dispute him being the father of these findings, rather attributing
the fundamental ideas to Robert Hooke. Newton accused Leibniz (apparently
wrongfully) of having plagiarized his work. He was the first British scientist to be
knighted, on April 16, 1705, by Queen Anne. He died on March 20, 1727, in London.
One of his most famous quotes is "If I have seen further than others, it is by standing
upon the shoulders of giants." He is buried in Westminster Abbey, with the following
inscription on his grave: "Hic depositum est, quod mortale fuit Isaaci Newtoni"
(Here lies that which was mortal of Isaac Newton).

Figure 7.1: Sir Isaac Newton

Example 7.1 (Linear model). Take the function

    F(x) = x² − 2

and the point x̂ = 2. According to Taylor's theorem, for any d ∈ R, we have

    F(x̂ + d) = F(x̂) + dF′(x̂) + o(|d|)
             = x̂² − 2 + 2x̂d + o(|d|)
             = 2 + 4d + o(|d|) .

The linear model is obtained by ignoring the error o(|d|):

    m(x̂ + d) = 2 + 4d .

Defining x = x̂ + d, we get

    m(x) = 2 + 4(x − 2) = 4x − 6 .

The function and the model are presented in Figure 7.2(a). The zoom in Figure 7.2(b)
illustrates the good agreement between the model and the function around x̂ = 2.

[Figure: plots of f(x) = x² − 2 and its linear model m(x); panel (a) around x̂ = 2,
panel (b) a zoom near x = 2.]

Figure 7.2: Linear model of x² − 2

We can now provide a general definition of the linear model of a non linear function.

Definition 7.2 (Linear model of a function with one variable). Let F : R → R be a
differentiable function. The linear model of F in x̂ is a function mx̂ : R → R defined
by
    mx̂(x) = F(x̂) + (x − x̂) F′(x̂) .                       (7.1)

From a first approximation x̂, the main idea of Newton's method in order to find
the root of the function F consists in
1. calculating the linear model in x̂,
2. calculating the root x+ of this linear model,
3. if x+ is not the root of F, considering x+ as a new approximation and starting
   over.

According to Definition 7.2, the root of the linear model is the solution to

    F(x̂) + (x − x̂) F′(x̂) = 0 ,                           (7.2)

i.e., if F′(x̂) ≠ 0,

    x+ = x̂ − F(x̂)/F′(x̂) ,                                (7.3)

which summarizes the first two steps presented above.
which summarizes the first two steps presented above.

We also need to specify the third step. How do we conclude that x+ is a root
of the function, i.e., F(x+) = 0, and that we can stop the iterations? Seemingly
innocuous, this question is far from simple to answer. Indeed, computers operating
in finite arithmetic are not capable of representing all real numbers (which form an
uncountable infinity). Therefore, it is possible, and even common, that the method
never generates a point x+ such that F(x+) = 0 exactly. Then, we must often settle
for a solution x+ such that F(x+) is "sufficiently close" to 0. In practice, the user
provides a measure of this desired proximity, denoted by ε, and the algorithm is
interrupted when

    |F(x+)| ≤ ε .                                          (7.4)

A typical value for ε is √εM, where εM is the machine epsilon, that is, an upper
bound on the relative error due to rounding in floating point arithmetic. A simple
way to compute εM is Algorithm 7.1. The loop stops when εM is so small that, when
added to 1, the result is also 1.

Algorithm 7.1: Machine epsilon


1 Objective
2 Find the machine epsilon εM .
3 Initialization
4 εM := 1.
5 while 1 + εM 6= 1 do
6 εM := εM /2.

Typical values are
• εM = 5.9605 · 10⁻⁸ for single precision floating point (so that ε = √εM =
  2.4414 · 10⁻⁴), and
• εM = 1.1102 · 10⁻¹⁶ for double precision floating point (so that ε = √εM =
  1.0537 · 10⁻⁸).
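The loop of Algorithm 7.1 translates directly into a few lines of Python (an illustrative sketch; the function name machine_epsilon is ours):

```python
# Sketch of Algorithm 7.1: halve eps until adding it to 1 no longer
# changes the result in floating point arithmetic.
def machine_epsilon():
    eps = 1.0
    while 1.0 + eps != 1.0:
        eps /= 2.0
    return eps

eps_M = machine_epsilon()
print(eps_M)          # 1.1102230246251565e-16 in IEEE double precision
print(eps_M ** 0.5)   # about 1.05e-08, the typical stopping tolerance
```

Python floats are IEEE double precision, so the loop reproduces the second value quoted above.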

We now have all the elements in order to write Newton’s algorithm to solve an
equation with one unknown (Algorithm 7.2).

Abu Ja‘far Muhammad ibn Musa Al-Khwarizmi was a Persian mathematician born
before AD 800. Only a few details about his life can be gleaned from Islamic
literature. His name appears to indicate that he was from the State of Khwarazm or
Khorezm (currently Khiva in Uzbekistan). However, other sources suggest that he
was born between the Tigris and Euphrates in the Baghdad area. Al Khwarizmi was
an astronomer in the House of Wisdom (Dar al-Hikma) of caliph Abd Allah al
Mahmoun. He is primarily known for his treatise al Kitab almukhtasar fi hisab
al-jabr w'al muqabala (which can be translated as "The Compendious Book on
Calculation by Completion and Balancing"), which provides the origin of the word
algebra (al-Jabr, used in the sense of transposition, became algebra). He explained
in Arabic the system of Indian decimal digits applied to arithmetic operations. The
Latin translation of this work, entitled Algoritmi de numero Indorum, gave rise to
the word algorithm. Al Khwarizmi died after AD 847.

Figure 7.3: Al Khwarizmi

Algorithm 7.2: Newton’s method: one variable


1 Objective
2 Find (an approximation of) a solution to the equation F(x) = 0.
3 Input
4 The function F : R → R.
5 The derivative of the function F ′ : R → R.
6 A first approximation of the solution x0 ∈ R.
7 The required precision ε ∈ R, ε > 0.
8 Output
9 An approximation x∗ ∈ R to the solution.
10 Initialization
11 k := 0.
12 Repeat
13 xk+1 := xk − F(xk )/F ′ (xk ),
14 k := k + 1.
15 Until |F(xk )| ≤ ε
16 x∗ = xk
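Algorithm 7.2 can be sketched in Python as follows (illustrative only; the iteration cap maxiter is our safeguard against non-convergent cases such as Example 7.5, and is not part of the algorithm as stated):

```python
# Sketch of Algorithm 7.2: Newton's method for one equation in one unknown.
# `maxiter` is our addition, guarding against non-convergence.
def newton(F, dF, x0, eps=1e-15, maxiter=100):
    x = x0
    k = 0
    while abs(F(x)) > eps and k < maxiter:
        x = x - F(x) / dF(x)   # root of the linear model, (7.3)
        k += 1
    return x, k

# Example 7.3: F(x) = x^2 - 2, F'(x) = 2x, x0 = 2
root, k = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 2.0)
print(root, k)   # converges to sqrt(2) in 5 iterations, as in Table 7.1
```

Each pass of the loop performs the update (7.3), and the stopping test implements (7.4).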

Example 7.3 (Newton’s method: one variable – I). Take the equation

F(x) = x2 − 2 = 0 .

We have F ′ (x) = 2x. We apply Newton’s method (Algorithm 7.2) with x0 = 2,


and ε = 10−15 . The iterations are listed in Table 7.1. The first two iterations are
portrayed in Figure 7.4. Figure 7.4(a) represents the first iteration, where x0 = 2.

The linear model at x0 is represented by a dotted line. It intersects the x-axis at
x1 = 1.5. Figure 7.4(b) represents the second iteration, where x1 = 1.5. The linear
model at x1 is represented by a dotted line. It intersects the x-axis at x2 ≈ 1.4167.

Table 7.1: Iterations with Newton’s method for Example 7.3



k xk F(xk ) F ′ (xk )
0 +2.00000000E+00 +2.00000000E+00 +4.00000000E+00
1 +1.50000000E+00 +2.50000000E-01 +3.00000000E+00
2 +1.41666667E+00 +6.94444444E-03 +2.83333333E+00
3 +1.41421569E+00 +6.00730488E-06 +2.82843137E+00
4 +1.41421356E+00 +4.51061410E-12 +2.82842712E+00
5 +1.41421356E+00 +4.44089210E-16 +2.82842712E+00

[Figure: panel (a) shows the first iteration and panel (b) the second iteration of
Newton's method on F(x) = x² − 2.]

Figure 7.4: Newton's method for Example 7.3

According to Example 7.3, Newton’s method seems quite fast, as only 5 iterations
were necessary to converge. We characterize this speed below. Before that, however,
we illustrate by other examples that the method does not always work that well.
Example 7.4 (Newton’s method: one variable – II). Take the equation
F(x) = x − sin x = 0 .
We have F ′ (x) = 1 − cos x. We apply Newton’s method (Algorithm 7.2) with x0 = 1
and ε = 10−15 . The iterations are listed in Table 7.2. The number of iterations is
much larger than for the previous example. Note how the derivative F ′ (xk ) is getting
closer and closer to 0 as the iterations proceed. Actually, the root of this equation is
x∗ = 0, and the value of the derivative at the root is 0. As Newton’s method divides
by F ′ (xk ) at each iteration, the fact that F ′ (x∗ ) = 0 is the source of the slow behavior
of the method. The first two iterations are portrayed in Figure 7.5(a). The linear
model at the starting point x0 = 1 is represented by a dotted line and intersects the
x-axis at 0.65, which is the first iterate.

Table 7.2: Iterations with Newton’s method for Example 7.4


k xk F(xk ) F ′ (xk )
0 +1.00000000E+00 +1.58529015E-01 +4.59697694E-01
1 +6.55145072E-01 +4.58707860E-02 +2.07040452E-01

2 +4.33590368E-01 +1.34587380E-02 +9.25368255E-02


3 +2.88148401E-01 +3.97094846E-03 +4.12282985E-02
4 +1.91832312E-01 +1.17439692E-03 +1.83434616E-02
..
.
25 +3.84171966E-05 +9.44986548E-15 +7.37940486E-10
26 +2.56114682E-05 +2.79996227E-15 +3.27973648E-10
27 +1.70743119E-05 +8.29617950E-16 +1.45766066E-10

[Figure: panel (a) shows two iterations of Newton's method on F(x) = x − sin x;
panel (b) is a zoom.]

Figure 7.5: Newton's method for Example 7.4

The linear model at that point is also represented by a dotted line, and intersects
the x-axis at 0.43, which is the second iterate. Figure 7.5(b) is a zoom on the same
figure.

Even though Newton’s method has managed to provide the desired precision in
5 iterations for Example 7.3, more than 5 times as many iterations are necessary for
Example 7.4. In the following example, we see that the method may sometimes not
work at all.
Example 7.5 (Newton's method: one variable – III). Take the equation

F(x) = arctan x = 0 .

We have F ′ (x) = 1/(1 + x2 ). We apply Newton’s method (Algorithm 7.2) with


x0 = 1.5 and ε = 10−15 . The first 10 iterations are listed in Table 7.3. We note that
the absolute value of xk increases with each iteration, that the value of F(xk ) seems
to oscillate, and that the value of F ′ (xk ) closes in on 0. Therefore, not only does the
algorithm not approach the solution, but when the derivative approaches 0, the main
iteration cannot be performed due to the division by 0. The first three iterations are
portrayed in Figure 7.6.

Table 7.3: The ten first iterations with Newton’s method for Example 7.5
k xk F(xk ) F ′ (xk )
0 +1.50000000E+00 +9.82793723E-01 +3.07692308E-01
1 -1.69407960E+00 -1.03754636E+00 +2.58404230E-01
2 +2.32112696E+00 +1.16400204E+00 +1.56552578E-01
3 -5.11408784E+00 -1.37769453E+00 +3.68271300E-02
4 +3.22956839E+01 +1.53984233E+00 +9.57844131E-04
5 -1.57531695E+03 -1.57016153E+00 +4.02961851E-07
6 +3.89497601E+06 +1.57079607E+00 +6.59159364E-14
7 -2.38302890E+13 -1.57079633E+00 +1.76092712E-27
8 +8.92028016E+26 +1.57079633E+00 +1.25673298E-54
9 -1.24990460E+54 -1.57079633E+00 +6.40097701E-109
10 +2.45399464E+108 +1.57079633E+00 +1.66055315E-217
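The divergence reported in Table 7.3 is easy to reproduce (an illustrative sketch of the iteration (7.3) applied to this example):

```python
import math

# Newton iteration (7.3) on F(x) = arctan(x), F'(x) = 1/(1+x^2),
# starting from x0 = 1.5 as in Example 7.5.
x = 1.5
iterates = [x]
for _ in range(6):
    x = x - math.atan(x) * (1.0 + x * x)
    iterates.append(x)

print(iterates[1])        # about -1.694, matching the first row of Table 7.3
print(abs(iterates[-1]))  # about 3.9e+06: the iterates blow up
```

The sign of the iterates alternates while their magnitude grows, exactly as in the table.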

We now analyze in detail the aspects that influence the efficiency of the method.
The main result can be stated as follows:
• if the function is not too non linear,
• if the derivative of F at the solution is not too close to 0,
• if x0 is not too far from the root,
• then Newton’s method converges quickly toward the solution.
The central idea of the analysis is to measure the error that is committed when
the non linear function is replaced by the linear model. Intuitively, if the function
is almost linear, the error is small, while if the function is highly non linear, the
error is more significant. We use here the Lipschitz continuity of the derivative of
F to characterize the non linearity, as discussed in Section 2.4 (Definition 2.27).
Theorem 7.6 considers a linear model at x̂, and provides an upper bound on the error

[Figure: the first iterates x1, x2, x3 of Newton's method on F(x) = arctan x,
oscillating away from the root.]

Figure 7.6: Newton's method for Example 7.5

at a point x+. This bound depends on the distance between x̂ and x+, and on the
Lipschitz constant that characterizes the non linearity of the function.

Theorem 7.6 (Error of the linear model: one variable). Consider an open interval
X ⊆ R and a function F for which the derivative is Lipschitz continuous over X,
where M is the Lipschitz constant. Then, for all x̂, x+ ∈ X,

    |F(x+) − mx̂(x+)| ≤ M (x+ − x̂)² / 2 .                  (7.5)

Proof. We have

    ∫_{x̂}^{x+} (F′(z) − F′(x̂)) dz
        = ∫_{x̂}^{x+} F′(z) dz − ∫_{x̂}^{x+} F′(x̂) dz      linearity of the integral
        = F(x+) − F(x̂) − F′(x̂)(x+ − x̂)
        = F(x+) − mx̂(x+)                                  from (7.1) .

We take z = x̂ + t(x+ − x̂) and dz = (x+ − x̂) dt to obtain

    F(x+) − mx̂(x+) = ∫₀¹ (F′(x̂ + t(x+ − x̂)) − F′(x̂)) (x+ − x̂) dt .
190 Equation with one unknown

Therefore,

|F(x⁺) − mx̂(x⁺)|
  = | ∫₀¹ ( F′(x̂ + t(x⁺ − x̂)) − F′(x̂) ) (x⁺ − x̂) dt |
  ≤ ∫₀¹ | F′(x̂ + t(x⁺ − x̂)) − F′(x̂) | |x⁺ − x̂| dt    from Theorem C.12
  = |x⁺ − x̂| ∫₀¹ | F′(x̂ + t(x⁺ − x̂)) − F′(x̂) | dt
  ≤ |x⁺ − x̂| ∫₀¹ M |t(x⁺ − x̂)| dt    from Definition 2.27
  = M |x⁺ − x̂|² ∫₀¹ t dt
  = (M/2) |x⁺ − x̂|² .
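Before turning to the convergence result, the bound (7.5) can be checked numerically. The following sketch (Python; the choice F = sin and the test points are illustrative, not from the book) uses M = 1, since |F″| = |sin| ≤ 1 implies that F′ = cos is Lipschitz continuous with constant 1:

```python
import math

# F = sin: F' = cos is Lipschitz continuous with constant M = 1,
# since |F''| = |sin| <= 1.
F, dF, M = math.sin, math.cos, 1.0

def model(xhat, x):
    # Linear model of F at xhat: m(x) = F(xhat) + F'(xhat) (x - xhat)
    return F(xhat) + dF(xhat) * (x - xhat)

xhat = 0.3
for xp in (0.5, 1.0, 2.0, -1.5):
    error = abs(F(xp) - model(xhat, xp))
    bound = M * (xp - xhat) ** 2 / 2
    assert error <= bound   # the bound (7.5) holds at every test point
```

The error indeed grows at most quadratically with the distance |x⁺ − x̂|, as (7.5) predicts.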

We now use this bound on the error to demonstrate the convergence of Newton’s
method.

Theorem 7.7 (Convergence of Newton’s method: one variable). Consider an open
interval X ⊆ R and a continuously differentiable function F such that its derivative
is Lipschitz continuous over X, where the Lipschitz constant is M.
Assume that there exists ρ > 0 such that

|F′(x)| ≥ ρ ,  ∀x ∈ X .    (7.6)

Assume that there exists x∗ ∈ X such that F(x∗) = 0. There then exists η > 0
such that, if

|x0 − x∗| < η    (7.7)

with x0 ∈ X, the sequence (xk)k defined by

xk+1 = xk − F(xk) / F′(xk) ,  k = 0, 1, . . . ,    (7.8)

is well defined and converges toward x∗. Moreover,

|xk+1 − x∗| ≤ (M / (2ρ)) |xk − x∗|² .    (7.9)


Proof. We provide a proof by induction. For k = 0, x1 is well defined as F′(x0) ≠ 0
by assumption (7.6), as x0 ∈ X. We have

x1 − x∗ = x0 − F(x0)/F′(x0) − x∗    from (7.8)
        = x0 − x∗ − (F(x0) − F(x∗)) / F′(x0)    because F(x∗) = 0
        = (1/F′(x0)) (F(x∗) − mx0(x∗))    from (7.1) .

Then

|x1 − x∗| ≤ (1/|F′(x0)|) |F(x∗) − mx0(x∗)|
          ≤ (M / (2 |F′(x0)|)) |x0 − x∗|²    from (7.5)
          ≤ (M / (2ρ)) |x0 − x∗|²    from (7.6) ,

which proves the result for k = 0.
We now need technical constants. Take τ such that 0 < τ < 1 and let r be the
radius of the largest interval contained in X and centered in x∗. We then create

η = min(r, (2ρ/M) τ) .    (7.10)

Therefore, based on the hypothesis (7.7), we have

|x0 − x∗| ≤ η ≤ (2ρ/M) τ    (7.11)

and

|x1 − x∗| ≤ (M/(2ρ)) |x0 − x∗|² ≤ (M/(2ρ)) (2ρ/M) τ |x0 − x∗| = τ |x0 − x∗| < η ,

where the last inequality is the result of the fact that τ < 1 and |x0 − x∗| < η. Since
|x1 − x∗| < η, we also have that |x1 − x∗| < r (according to (7.10)) and x1 ∈ X. Thus,
x1 satisfies the same assumptions as x0. We can now apply the recurrence using
the same arguments for x2, x3, and so forth.
We now comment on the summarized version of the result of Theorem 7.7:
If the function is not too non linear This assumption is related to the Lipschitz
continuity. The closer M is to 0, the less non linear the function is.
If the derivative of F is not too close to 0 This is hypothesis (7.6). If this as-
sumption is not satisfied, the method may not be well defined (division by zero),
may not converge, or may converge slowly, as illustrated by Example 7.4.
If x0 is not too far from the root This is hypothesis (7.7). If x0 is too far from
the root, the method may not converge, as shown in Example 7.5. It is interesting

to take a close look at the definition (7.10) of η, assuming that r (a technical
parameter) is sufficiently large such that η = 2ρτ/M. If the function is close to being
linear, then M is small and η is large. It means that the set of starting points
such that the method converges is large, and we can afford to start from a point
x0 farther away from x∗. In practice, as x∗ is not known, it increases the chance
of finding a valid starting point.


Newton’s method converges quickly toward the solution The speed is char-
acterized by (7.9). At each iteration, the new distance to the solution is of the
order of the square of the former. For instance, if the initial error is of the order
of 10⁻¹, it only takes three iterations for it to become of the order of 10⁻⁸. This
is illustrated in Example 7.3, for which the iterations are described in Table 7.1.
The method is said to converge q-quadratically.


Definition 7.8 (q-quadratic convergence). Take a sequence (xk)k in Rⁿ that converges
toward x∗. The sequence is said to converge q-quadratically toward x∗ if there
exist c ≥ 0 and k̂ ∈ N such that

‖xk+1 − x∗‖ ≤ c ‖xk − x∗‖² ,  ∀k ≥ k̂ .    (7.12)

In Definition 7.8, the prefix q signifies quotient. In practice, the other types of
convergence are rarely used, and the prefix could be omitted. More details can be
found in Ortega and Rheinboldt (1970).
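A minimal numerical sketch (Python; not part of the book) makes the q-quadratic behavior visible on the equation F(x) = x² − 2 of Example 7.3, whose root is x∗ = √2:

```python
import math

# Newton iterates for F(x) = x**2 - 2, F'(x) = 2x, starting from x0 = 2.
x = 2.0
errors = []
for _ in range(5):
    errors.append(abs(x - math.sqrt(2)))
    x = x - (x * x - 2) / (2 * x)

# q-quadratic behavior: each error is bounded by the square of the
# previous one (here the constant c = M/(2 rho) of (7.9) is below 1).
for e_prev, e_next in zip(errors, errors[1:]):
    assert e_next <= e_prev ** 2
```

Each error is bounded by the square of the previous one, so the number of correct digits roughly doubles at every iteration.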

7.2 Systems of equations with multiple unknowns


We now generalize Newton’s method for systems of non linear equations with multiple
unknowns. The concepts are exactly the same. We start with the definition of the
linear model. Although, in our context, the function F maps Rⁿ into Rⁿ, we provide
the most general definition for a function from Rⁿ to Rᵐ.

Definition 7.9 (Linear model of a function with n variables). Let F : Rⁿ → Rᵐ
be a continuously differentiable function. The linear model of F in x̂ is a function
mx̂ : Rⁿ → Rᵐ defined by

mx̂(x) = F(x̂) + ∇F(x̂)ᵀ (x − x̂) = F(x̂) + J(x̂)(x − x̂) ,    (7.13)

where ∇F(x̂) is the n × m gradient matrix of F in x̂ (Definition 2.17) and J(x̂) =
∇F(x̂)ᵀ is the Jacobian matrix, of dimensions m × n (Definition 2.18).

As in the case with one variable, we determine a bound for the error committed
when replacing the function F by the linear model, and find a result similar to that
of Theorem 7.6. The proof is essentially the same.

Theorem 7.10 (Error of the linear model: n variables). Let F : Rⁿ → Rᵐ be
a continuously differentiable function over an open convex set X ⊂ Rⁿ. The
Jacobian matrix of F is Lipschitz continuous over X (Definition 2.27), where M
is the Lipschitz constant and the matrix norm is induced by the vector norm
(Definition B.27). Then, for all x̂, x⁺ ∈ X,

‖F(x⁺) − mx̂(x⁺)‖ ≤ M ‖x⁺ − x̂‖² / 2 .    (7.14)

Proof. The structure of the proof is identical to that of Theorem 7.6. We have

F(x⁺) − mx̂(x⁺)
  = F(x⁺) − F(x̂) − J(x̂)(x⁺ − x̂)    from (7.13)
  = ∫₀¹ J(x̂ + t(x⁺ − x̂)) (x⁺ − x̂) dt − J(x̂)(x⁺ − x̂)    Theorem C.11
  = ∫₀¹ ( J(x̂ + t(x⁺ − x̂)) − J(x̂) ) (x⁺ − x̂) dt .

Then,

‖F(x⁺) − mx̂(x⁺)‖
  ≤ ∫₀¹ ‖ J(x̂ + t(x⁺ − x̂)) − J(x̂) ‖ ‖x⁺ − x̂‖ dt    Theorem C.12
  ≤ ∫₀¹ M ‖t(x⁺ − x̂)‖ ‖x⁺ − x̂‖ dt    Definition 2.27
  = M ‖x⁺ − x̂‖² ∫₀¹ t dt
  = M ‖x⁺ − x̂‖² / 2 .
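The bound (7.14) can also be verified numerically. A sketch (Python with NumPy; the function F(x) = (x1², x1x2) and the constant M = √5 are illustrative choices, not from the book):

```python
import numpy as np

# Check (7.14) for F(x) = (x1^2, x1*x2). Its Jacobian
# J(x) = [[2*x1, 0], [x2, x1]] is linear in x, and
# ||J(a) - J(b)||_2 <= ||J(a) - J(b)||_F <= sqrt(5) * ||a - b||_2,
# so M = sqrt(5) is a valid Lipschitz constant for the 2-norm.
def F(x):
    return np.array([x[0] ** 2, x[0] * x[1]])

def J(x):
    return np.array([[2 * x[0], 0.0], [x[1], x[0]]])

M = np.sqrt(5)
rng = np.random.default_rng(42)
for _ in range(100):
    xhat, xp = rng.standard_normal(2), rng.standard_normal(2)
    error = np.linalg.norm(F(xp) - F(xhat) - J(xhat) @ (xp - xhat))
    assert error <= M * np.linalg.norm(xp - xhat) ** 2 / 2
```

For this quadratic F the linearization error can be computed exactly, which makes it easy to see that the bound is never violated.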

Newton’s method for systems of equations is also essentially the same as for a
single variable. It is described by Algorithm 7.3. System (7.16), solved at step 13 of
the algorithm, is often called the Newton equations. Note that we have intentionally
not written this step as

dk+1 = −J(xk)⁻¹ F(xk) .

Indeed, from a numerical point of view, the calculation of dk+1 must be performed
by solving the system of linear equations, not by inverting the Jacobian matrix.

Algorithm 7.3: Newton’s method: n variables

1 Objective
2 To find (an approximation of) a solution to the system of equations

F(x) = 0 .    (7.15)

3 Input
4 The function F : Rⁿ → Rⁿ.
5 The Jacobian matrix of the function J : Rⁿ → R^(n×n).
6 A first approximation of the solution x0 ∈ Rⁿ.
7 The required precision ε ∈ R, ε > 0.
8 Output
9 An approximation x∗ ∈ Rⁿ of the solution.
10 Initialization
11 k := 0.
12 Repeat
13 Calculate dk+1, solution of

J(xk) dk+1 = −F(xk) .    (7.16)

14 xk+1 := xk + dk+1.
15 k := k + 1.
16 Until ‖F(xk)‖ ≤ ε
17 x∗ = xk

Example 7.11 (Newton’s method: n variables). Consider the system of equations

(x1 + 1)² + x2² = 2
e^x1 + x2³ = 2 .    (7.17)

We apply Newton’s method with

F(x) = ( (x1 + 1)² + x2² − 2 ,  e^x1 + x2³ − 2 )ᵀ

and

J(x) = [ 2(x1 + 1)   2x2
         e^x1        3x2² ] .

If x0 = (1, 1)ᵀ, we have

F(x0) = (3, e − 1)ᵀ ≈ (3, 1.7183)ᵀ

and

J(x0) = [ 4   2
          e   3 ] .

The iterations of Newton’s method are described in Table 7.4, with ε = 10⁻¹⁵, where
the first column reports the iteration number, the second column the current iterate,
the third the value of the function at the current iterate, and the last its norm. The
quadratic convergence of the method is well illustrated in this example. Indeed, the
value of xk converges rapidly to the solution (0, 1)ᵀ, and the values of ‖F(xk)‖ decrease
rapidly toward zero.

Table 7.4: Iterations of Newton’s method for Example 7.11


k  xk  F(xk)  ‖F(xk)‖
0 1.00000000e+00 3.00000000e+00 3.45723768e+00
1.00000000e+00 1.71828182e+00
1 1.52359213e-01 7.56629795e-01 1.15470870e+00
1.19528157e+00 8.72274931e-01
2 -1.08376809e-02 5.19684443e-02 1.14042557e-01
1.03611116e+00 1.01513475e-01
3 -8.89664601e-04 1.29445248e-03 3.94232975e-03
1.00153531e+00 3.72375572e-03
4 -1.37008875e-06 3.13724882e-06 8.07998556e-06
1.00000293e+00 7.44606181e-06
5 -5.53838974e-12 1.05133679e-11 2.88316980e-11
1.00000000e+00 2.68465250e-11
6 -1.53209346e-16 -2.22044604e-16 2.22044604e-16
1.00000000e+00 0.00000000e+00
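The iterations of Table 7.4 can be reproduced with a short script. A sketch (Python with NumPy; not the book's implementation) that solves the Newton equations (7.16) with a linear solver rather than inverting J:

```python
import numpy as np

# Newton's method for the system (7.17), starting from x0 = (1, 1).
def F(x):
    return np.array([(x[0] + 1) ** 2 + x[1] ** 2 - 2,
                     np.exp(x[0]) + x[1] ** 3 - 2])

def J(x):
    return np.array([[2 * (x[0] + 1), 2 * x[1]],
                     [np.exp(x[0]), 3 * x[1] ** 2]])

x = np.array([1.0, 1.0])
for k in range(100):
    Fx = F(x)
    if np.linalg.norm(Fx) <= 1e-15:
        break
    # Solve the Newton equations (7.16); never invert the Jacobian.
    x = x + np.linalg.solve(J(x), -Fx)
```

Each pass solves the linear system (7.16) with a factorization-based solver, which is both cheaper and numerically safer than forming J(xk)⁻¹ explicitly.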

We now analyze the impact of the starting point on the solution of Newton’s
method.
Example 7.12 (Newton fractal). Consider the system of equations

F(x) = ( x1³ − 3 x1 x2² − 1 ,  x2³ − 3 x1² x2 )ᵀ = 0 .

It has three roots:

x∗(b) = (1, 0)ᵀ ,  x∗(g) = (−1/2, √3/2)ᵀ ,  x∗(w) = (−1/2, −√3/2)ᵀ .
We apply Newton’s method to this problem, starting from different points. To visu-
alize the process, we take on the following convention:
• if Newton’s method, when starting from the point x0 , converges toward the solu-
tion x∗ (b), the point x0 is colored in black;
• if Newton’s method, when starting from the point x0 , converges toward the solu-
tion x∗ (g), the point x0 is colored in gray;
• if Newton’s method, when starting from the point x0, converges toward the solution
x∗(w), the point x0 is colored in white.
[Figure 7.7: Newton’s method: relation between the starting point and the solution.
(a) −2 ≤ x1 ≤ 2, −2 ≤ x2 ≤ 2; (b) −0.001 ≤ x1 ≤ 0.001, −0.001 ≤ x2 ≤ 0.001]

The result is presented in Figure 7.7(a), where the three roots are represented
by a + sign. We see that there is no direct relationship between the position of
the starting point and the root identified by the method. For example, look at the
gray areas at the bottom right of Figure 7.7(a). Although these starting points are
closer to the roots x∗ (b) and x∗ (w), Newton’s method converges towards x∗ (g) when
started from these areas. But the most noticeable feature of this figure is the shape
of the borders between each region. This type of configuration is called a fractal
(see Mandelbrot, 1982). The zoom presented in Figure 7.7(b) shows that two points
that are very close may be colored differently. This is an illustration of a chaotic
system, which exhibits a significantly different outcome when the starting conditions
are perturbed just a little bit.

We now generalize Theorem 7.7 for the case of n equations and n variables.

Theorem 7.13 (Convergence of Newton’s method: n variables). Consider an open
convex set X ⊆ Rⁿ and a function F : X → Rⁿ. We assume that there exist
x∗ ∈ X, a sphere B(x∗, r) centered in x∗ with radius r, and a constant ρ > 0 such
that F(x∗) = 0, B(x∗, r) ⊂ X, J(x∗) is invertible,

‖J(x∗)⁻¹‖ ≤ 1/ρ    (7.18)

and J is Lipschitz continuous over B(x∗, r), where M is the Lipschitz constant.
There thus exists η > 0 such that if

x0 ∈ B(x∗, η) ,    (7.19)

the sequence (xk)k defined by

xk+1 = xk − J(xk)⁻¹ F(xk) ,  k = 0, 1, . . . ,    (7.20)

is well defined and converges toward x∗. Moreover,

‖xk+1 − x∗‖ ≤ (M/ρ) ‖xk − x∗‖² .    (7.21)

Proof. In order for the sequence to be well defined, the matrix J(xk) always has to
be invertible. By assumption, it is the case at x∗. We choose η such that J(x) is
invertible for all x in a sphere B(x∗, η) of radius η around x∗. We take

η = min(r, ρ/(2M)) .    (7.22)

We first demonstrate that J(x0) is invertible, by using the theorem about the inverse
of a perturbed matrix (Theorem C.16), with A = J(x∗) and B = J(x0). The hypothesis
(C.28) on which the theorem is based is satisfied. Indeed,

‖J(x∗)⁻¹ (J(x0) − J(x∗))‖ ≤ ‖J(x∗)⁻¹‖ ‖J(x0) − J(x∗)‖
                          ≤ (1/ρ) ‖J(x0) − J(x∗)‖    from (7.18)
                          ≤ (M/ρ) ‖x0 − x∗‖    Lipschitz
                          ≤ (M/ρ) η    from (7.19)
                          ≤ 1/2    from (7.22) .

Therefore, J(x0) is invertible, and x1 given by (7.20) is well defined. By using this
result, Theorem C.16, and by noting that if y ≤ 1/2, then 1/(1 − y) ≤ 2, we obtain

‖J(x0)⁻¹‖ ≤ ‖J(x∗)⁻¹‖ / ( 1 − ‖J(x∗)⁻¹ (J(x0) − J(x∗))‖ )    (7.23)
          ≤ 2 ‖J(x∗)⁻¹‖ ≤ 2/ρ .

We have

x1 − x∗ = x0 − J(x0)⁻¹ F(x0) − x∗    according to (7.20)
        = x0 − J(x0)⁻¹ (F(x0) − F(x∗)) − x∗    because F(x∗) = 0
        = J(x0)⁻¹ ( F(x∗) − F(x0) − J(x0)(x∗ − x0) )
        = J(x0)⁻¹ ( F(x∗) − mx0(x∗) )    from (7.13) .

Consequently,

‖x1 − x∗‖ ≤ ‖J(x0)⁻¹‖ ‖F(x∗) − mx0(x∗)‖
          ≤ (2/ρ) ‖F(x∗) − mx0(x∗)‖    from (7.23)
          ≤ (2/ρ) M ‖x0 − x∗‖² / 2    from (7.14) ,

which proves (7.21) for k = 0. Since

‖x0 − x∗‖ ≤ η    from (7.19)
          ≤ ρ/(2M)    from (7.22) ,

we have

‖x1 − x∗‖ ≤ (M/ρ) (ρ/(2M)) ‖x0 − x∗‖ = (1/2) ‖x0 − x∗‖

and x1 ∈ B(x∗, η). The same reasoning can be applied recursively to prove the result
for k = 1, 2, 3, . . .
Newton’s method constitutes an effective tool that is central in optimization prob-
lems. However, it has two undesirable features:
1. it must be started close to the solution (which is not known in practice) and,
therefore, does not work from any starting point (assumption (7.19));
2. it requires calculating the matrix of the derivatives at each iteration, which can
involve a great deal of calculations in solving real problems.
Techniques that permit us to address the first issue are called globalization tech-
niques. A global algorithm exhibits convergence when started from any point. We
study such methods directly in the context of optimization in later chapters, and refer
the interested reader to Dennis and Schnabel (1996) for a comprehensive description
of these techniques in the context of solving systems of equations. In Chapter 8, we
address the second issue by presenting methods based on the same idea as Newton’s
method, but without using the derivatives. Such methods are called quasi-Newton
methods.
The presentation of the proofs of Theorems 7.6, 7.7, and 7.13 is inspired by Dennis
and Schnabel (1996).

7.3 Project
The general organization of the projects is described in Appendix D.

Objective
To analyze the impact of the starting point on the convergence of Newton’s method,
with inspiration taken from Example 7.12.

Approach
To create drawings similar to those of Figure 7.7, we use the following convention:
• Associate a specific color with each solution. For instance, when we have three
solutions, the RGB codes (255, 0, 0), (0, 255, 0) and (0, 0, 255) can be utilized.
• Define a maximum number of iterations K.
• Apply Newton’s method from a starting point x0.
• If the method converges in k iterations toward the first solution, associate the color
(255 − 255k/K, 0, 0) with the point x0. Similarly, associate the color (0, 255 −
255k/K, 0) or (0, 0, 255 − 255k/K) if the algorithm converges toward the
second or third solution, respectively.
• If the method does not converge, associate the color black (0, 0, 0) with the point x0.

Algorithm
Algorithm 7.3.
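The coloring convention above can be sketched as follows (Python with NumPy; the tolerances, the cap K, and the function name basin_color are illustrative choices, not part of the project's specification). F and its Jacobian are those of Example 7.12:

```python
import numpy as np

def F(x):
    return np.array([x[0] ** 3 - 3 * x[0] * x[1] ** 2 - 1,
                     x[1] ** 3 - 3 * x[0] ** 2 * x[1]])

def J(x):
    return np.array([[3 * x[0] ** 2 - 3 * x[1] ** 2, -6 * x[0] * x[1]],
                     [-6 * x[0] * x[1], 3 * x[1] ** 2 - 3 * x[0] ** 2]])

ROOTS = [np.array([1.0, 0.0]),
         np.array([-0.5, np.sqrt(3) / 2]),
         np.array([-0.5, -np.sqrt(3) / 2])]

def basin_color(x0, K=30):
    """RGB color of starting point x0: channel i for root i, black otherwise."""
    x = np.array(x0, dtype=float)
    for k in range(K):
        if np.linalg.norm(F(x)) <= 1e-10:
            break
        try:
            x = x + np.linalg.solve(J(x), -F(x))
        except np.linalg.LinAlgError:
            return (0, 0, 0)   # singular Jacobian: treat as non-convergent
    for i, r in enumerate(ROOTS):
        if np.linalg.norm(x - r) < 1e-6:
            c = int(255 - 255 * k / K)
            return tuple(c if j == i else 0 for j in range(3))
    return (0, 0, 0)           # no convergence: black
```

Evaluating basin_color on a grid of starting points and writing the colors to an image reproduces drawings like Figure 7.7.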

Problems
Exercise 7.1. The system

x2 = x1²
x1² + (x2 − 2)² = 4

has three roots: (0, 0)ᵀ, (−√3, 3)ᵀ, (√3, 3)ᵀ. Note that there are three
intersections between a circle and a parabola (draw the sketch). Note also that the
Jacobian is singular when x1 = 0.
Exercise 7.2. The system

3x1² + 2x2² = 35
4x1² − 3x2² = 24

has four solutions: (−3, −2)ᵀ, (−3, 2)ᵀ, (3, −2)ᵀ, (3, 2)ᵀ. Note that
there are four intersections between an ellipse and a hyperbola (draw the sketch).
Exercise 7.3. The system

x1² − x1x2 + x2² = 21
x1² + 2x1x2 − 8x2² = 0

has four solutions: (−2√7, −√7)ᵀ, (2√7, √7)ᵀ, (−4, 1)ᵀ, (4, −1)ᵀ.
Warning: when implementing these systems, one must not confuse the Jacobian
and the gradient matrix. Each row of the Jacobian corresponds to an equation and
each column to a variable.

Chapter 8

Quasi-Newton methods

“You cannot have your cake and eat it too,” says a popular proverb. In this chapter,
however, we try! The method developed here has an effectiveness close to that of
Newton’s method, without requiring the calculation of derivatives.

Contents
8.1 Equation with one unknown . . . . . . . . . . . . . . . . . 201
8.2 Systems of equations with multiple unknowns . . . . . . 208
8.3 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216

When conditions so permit, Newton’s method proves to be fast. However, it requires
that the Jacobian matrix be explicitly calculated at each iteration. There are
a number of cases where the function F is not specified by formulas, but rather by ex-
periments or determined by software. In these cases, the analytical expression of the
derivative is unavailable. Even if the problem happens to have an analytical
formulation, the calculation of the derivatives can be prohibitive, or even impossible,
when their analytical derivation and implementation require excessively long work,
into which errors can easily slip.
In this chapter, we see that it is possible to use the ideas from Newton’s method,
without using the derivatives. This is of course done at the expense of performance.
However, this expense is often small compared with what we gain by not having to cal-
culate Jacobian matrices. We introduce the main ideas regarding the simple problem
of one equation with one unknown, before generalizing for systems of equations.

8.1 Equation with one unknown


The main idea is based on the definition of the derivative:

F′(x) = lim (s→0) ( F(x + s) − F(x) ) / s .    (8.1)

Sister Caasi Newton, or Quasi Newton, is the twin sister of Sir Isaac
Newton. Caasi Newton tried to follow in the footsteps of her illustri-
ous brother, but was never able to understand the complex concept of
derivatives. Her striking resemblance to her brother and the complete
absence of any writings cast doubt on her existence.

Figure 8.1: Sister Caasi Newton.

To obtain a good approximation of the value of the derivative, we simply choose a
value of s that is close enough to zero and obtain

as(x) = ( F(x + s) − F(x) ) / s .    (8.2)
s

Geometrically, the derivative at x is the slope of the tangent to the function at x. The
above approximation replaces the tangent by a secant intersecting the function at x
and x + s, as illustrated in Figure 8.2. The model obtained from this approximation
is therefore called the secant linear model of the function.

[Figure 8.2: Secant linear model]

Definition 8.1 (Secant linear model of a function with one variable). Let F : R → R
be a differentiable function. The secant linear model of F in x̂ is a function mx̂;s :
R → R defined by

mx̂;s(x) = F(x̂) + ( ( F(x̂ + s) − F(x̂) ) / s ) (x − x̂) ,    (8.3)

where s ≠ 0.

We can now utilize the same principle as in Newton’s method, replacing the derivative
in (7.3) by its secant approximation, and obtain

x⁺ = x̂ − F(x̂) / as(x̂) .    (8.4)

To obtain an algorithm, we now need only define the value of s. As said above, a
natural idea is to choose s small so as to obtain a good approximation of the derivative.
For example, s can be defined as

s = τ x̂  if |x̂| ≥ 1 ,  s = τ  otherwise ,    (8.5)

where τ is small, for instance equal to 10⁻⁷. For a more sophisticated calculation of τ,
taking into account the machine epsilon and the precision obtained when calculating F,
we refer the reader to Dennis and Schnabel (1996, Algorithm A5.6.3). The algorithm
based on this definition of s is called the finite difference Newton’s method and is
presented as Algorithm 8.1.
presented as Algorithm 8.1.

Algorithm 8.1: Finite difference Newton’s method: one variable

1 Objective
2 To find (an approximation of) a solution to the equation

F(x) = 0 .

3 Input
4 The function F : R → R.
5 A first approximation of the solution x0 ∈ R.
6 A parameter τ > 0.
7 The required precision ε ∈ R, ε > 0.
8 Output
9 An approximation of the solution x∗ ∈ R.
10 Initialization
11 k := 0.
12 Repeat
13 if |xk| ≥ 1 then
14   s := τ xk
15 else
16   s := τ
17 xk+1 := xk − s F(xk) / ( F(xk + s) − F(xk) ).
18 k := k + 1.
19 Until |F(xk)| ≤ ε
20 x∗ = xk.

The iterations of this algorithm applied to Example 7.3, with τ = 10⁻⁷, are
described in Table 8.1. The differences with the iterations of Newton’s method (Table
7.1) are almost imperceptible. What is even more interesting is that a higher value
of τ may still enable the algorithm to converge, even if this convergence is slow.
Table 8.2 contains the iterations of the algorithm applied to Example 7.3, with τ = 0.1.
The first two iterations of this algorithm are illustrated in Figure 8.3. Intuitively, we
expect it to work well, with a relatively large s, when the function is not too non linear.

[Figure 8.3: Finite difference Newton’s method for Example 7.3]

Table 8.1: Iterations for finite difference Newton’s method (τ = 10⁻⁷) for Example 7.3

k  xk  F(xk)
0 +2.00000000E+00 +2.00000000E+00
1 +1.50000003E+00 +2.50000076E-01
2 +1.41666667E+00 +6.94446047E-03
3 +1.41421569E+00 +6.00768206E-06
4 +1.41421356E+00 +4.81081841E-12
5 +1.41421356E+00 +4.44089210E-16
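Algorithm 8.1 is compact enough to sketch directly. A possible implementation (Python; not the book's code, and the function name newton_fd is illustrative), applied to Example 7.3, that is, F(x) = x² − 2:

```python
# Finite difference Newton's method (Algorithm 8.1, sketched).
def newton_fd(F, x, tau=1e-7, eps=1e-15, maxit=100):
    for _ in range(maxit):
        if abs(F(x)) <= eps:
            break
        # Step size (8.5): s = tau*x if |x| >= 1, s = tau otherwise.
        s = tau * x if abs(x) >= 1 else tau
        # Secant step (8.4) with the finite difference slope (8.2).
        x = x - s * F(x) / (F(x + s) - F(x))
    return x

root = newton_fd(lambda x: x * x - 2, 2.0)   # converges to sqrt(2)
```

With τ = 10⁻⁷ the iterates match those of Table 8.1; raising τ to 0.1 still converges, but slowly, as Table 8.2 shows.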

In practice, there is no reason to take τ = 0.1 because, even if this choice provides
results, it slows down the convergence. The only motivation to take a larger s would
be to save on function evaluations. This is the idea of the secant method, which uses
a step based on the last two iterates, that is

s = xk−1 − xk ,

in such a way that (8.2) is written as

as(xk) = ( F(xk−1) − F(xk) ) / ( xk−1 − xk ) .

Therefore, no additional evaluation of the function is required, because F(xk−1) has
already been calculated during the previous iteration. The secant method is described

Table 8.2: Iterations for finite difference Newton’s method (τ = 0.1) for Example 7.3

k  xk  F(xk)
0 +2.00000000E+00 +2.00000000E+00
1 +1.52380952E+00 +3.21995465E-01

2 +1.42318594E+00 +2.54582228E-02
3 +1.41466775E+00 +1.28485582E-03
4 +1.41423526E+00 +6.13706622E-05
5 +1.41421460E+00 +2.92283950E-06
6 +1.41421361E+00 +1.39183802E-07
7 +1.41421356E+00 +6.62780186E-09
8 +1.41421356E+00 +3.15609761E-10
9 +1.41421356E+00 +1.50284230E-11
10 +1.41421356E+00 +7.15427717E-13
11 +1.41421356E+00 +3.41948692E-14
12 +1.41421356E+00 +1.33226763E-15
13 +1.41421356E+00 +4.44089210E-16

as Algorithm 8.2. Note that this technique does not work at the first iteration (k = 0),
as xk−1 is not defined. For the first iteration, an arbitrary value for a0 is therefore
selected.

Table 8.3 shows the iterations of the secant method, with a0 = 1. Figure 8.4
illustrates the first two iterations of the method for this example. At the iterate
x0 = 2, a first arbitrary linear model, with slope 1, is first considered. It intersects
the x-axis at 0, which becomes the next iterate x1 . Then the secant method can start.
The secant intersecting the function at x0 and x1 is considered. It intersects the x-
axis at 1, which becomes iterate x2 . The next iteration is illustrated in Figure 8.5.
The secant intersecting the function at x1 and x2 crosses the x-axis at x3 = 2.
Interestingly, by coincidence, it happens to be the same value as x0 . But it does
not mean that iteration 3 is the same as iteration 0. Indeed, between the two, the
algorithm has collected information about the function, and accumulated it into ak .
While a0 = 1 is an arbitrary value, not containing information about F, the value a3 = 3
used for the next secant model has been calculated using explicit measures of the
function F. As a consequence, the secant intersecting the function at x2 and x3
crosses the x-axis at x4 , that happens not to be too far from the zero of the function.
Therefore, the convergence of the method from iteration 4 is pretty fast, as can be
seen in Table 8.3. Indeed, during the last iterations, xk and xk−1 are closer and closer
and s = xk−1 − xk is smaller and smaller. Geometrically, it means that the secant is
closer and closer to the actual tangent, and the method becomes similar to the finite
difference Newton’s method. The rate of convergence is fast, and is characterized as
superlinear.

Algorithm 8.2: Secant method: one variable

1 Objective
2 To find (an approximation of) a solution to the equation

F(x) = 0 .

3 Input
4 The function F : R → R.
5 A first approximation of the solution x0 ∈ R.
6 A first approximation of the derivative a0 (by default: a0 = 1).
7 The required precision ε ∈ R, ε > 0.
8 Output
9 An approximation of the solution x∗ ∈ R.
10 Initialization
11 k := 0.
12 Repeat
13 Update the current iterate

xk+1 := xk − F(xk) / ak .

14 Update the approximation of the derivative

ak+1 := ( F(xk) − F(xk+1) ) / ( xk − xk+1 ) .

15 k := k + 1.
16 Until |F(xk)| ≤ ε
17 x∗ = xk


Definition 8.2 (Superlinear convergence). Consider a sequence (xk)k in Rⁿ that
converges toward x∗. The sequence is said to converge superlinearly toward x∗ if

lim (k→∞) ‖xk+1 − x∗‖ / ‖xk − x∗‖ = 0 .    (8.6)

Table 8.3: Iterations for the secant method (a0 = 1) for Example 7.3

k  xk  F(xk)  as(xk)
0 +2.00000000E+00 +2.00000000E+00 +1.00000000E+00
1 +0.00000000E+00 -2.00000000E+00 +2.00000000E+00

2 +1.00000000E+00 -1.00000000E+00 +1.00000000E+00


3 +2.00000000E+00 +2.00000000E+00 +3.00000000E+00
4 +1.33333333E+00 -2.22222222E-01 +3.33333333E+00
5 +1.40000000E+00 -4.00000000E-02 +2.73333333E+00
6 +1.41463415E+00 +1.18976800E-03 +2.81463415E+00
7 +1.41421144E+00 -6.00728684E-06 +2.82884558E+00
8 +1.41421356E+00 -8.93145558E-10 +2.82842500E+00
9 +1.41421356E+00 +8.88178420E-16 +2.82842706E+00
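Algorithm 8.2 can be sketched in a few lines (Python; not the book's code, and the function name secant is illustrative). Applied to Example 7.3 with the default a0 = 1, it follows the iterates of Table 8.3:

```python
# Secant method (Algorithm 8.2, sketched).
def secant(F, x, a=1.0, eps=1e-14, maxit=100):
    for _ in range(maxit):
        Fx = F(x)
        if abs(Fx) <= eps:
            break
        x_next = x - Fx / a                   # step 13: update the iterate
        a = (Fx - F(x_next)) / (x - x_next)   # step 14: update the slope
        x = x_next
    return x

root = secant(lambda x: x * x - 2, 2.0)   # follows the iterates of Table 8.3
```

Only one new evaluation of F is needed per iteration, since F(xk) is reused when the slope approximation is updated.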

[Figure 8.4: Iterations 0 and 1 of the secant method for Example 7.3]

[Figure 8.5: Iterations 2 and 3 of the secant method for Example 7.3]

8.2 Systems of equations with multiple unknowns


We now generalize the concepts of Section 8.1 for systems of n equations with n
unknowns. Again, the ideas are based on a linear model.

Algorithm 8.3: Finite difference Newton’s method: n variables

1 Objective
2 To find (an approximation of) a solution to the system of equations

F(x) = 0 .    (8.7)

3 Input
4 The function F : Rⁿ → Rⁿ.
5 A first approximation of the solution x0 ∈ Rⁿ.
6 A parameter τ > 0.
7 The required precision ε ∈ R, ε > 0.
8 Output
9 An approximation of the solution x∗ ∈ Rⁿ.
10 Initialization
11 k := 0.
12 Repeat
13 for j = 1, . . . , n do
14   if |(xk)j| ≥ 1 then
15     sj := τ (xk)j
16   else if 0 ≤ (xk)j ≤ 1 then
17     sj := τ
18   else
19     sj := −τ
20 Form the matrix Ak with columns

(Ak)j := ( F(xk + sj ej) − F(xk) ) / sj ,  j = 1, . . . , n ,

where (Ak)j is the jth column of Ak, and ej ∈ Rⁿ is the jth canonical
vector, composed of 0, except at the jth place containing 1 instead.
21 Calculate dk+1, solution of Ak dk+1 = −F(xk).
22 xk+1 := xk + dk+1.
23 k := k + 1.
24 Until ‖F(xk)‖ ≤ ε
25 x∗ = xk.
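A sketch of Algorithm 8.3 (Python with NumPy; not the book's code, and the function name newton_fd_n is illustrative), applied to Example 7.11:

```python
import numpy as np

# Finite difference Newton's method for n variables (Algorithm 8.3, sketched).
def newton_fd_n(F, x0, tau=1e-7, eps=1e-12, maxit=50):
    x = np.array(x0, dtype=float)
    n = x.size
    for _ in range(maxit):
        Fx = F(x)
        if np.linalg.norm(Fx) <= eps:
            break
        A = np.empty((n, n))
        for j in range(n):
            # Step size s_j, with the sign of (x_k)_j when |(x_k)_j| < 1.
            s = tau * x[j] if abs(x[j]) >= 1 else (tau if x[j] >= 0 else -tau)
            e = np.zeros(n)
            e[j] = s
            A[:, j] = (F(x + e) - Fx) / s   # jth column: forward difference
        x = x + np.linalg.solve(A, -Fx)     # solve A d = -F(x); never invert
    return x

def F(x):
    return np.array([(x[0] + 1) ** 2 + x[1] ** 2 - 2,
                     np.exp(x[0]) + x[1] ** 3 - 2])

sol = newton_fd_n(F, [1.0, 1.0])   # converges to (0, 1); compare Table 8.4
```

Note the cost: each iteration evaluates F at n + 1 points to build Ak, which motivates the secant model discussed next.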

Definition 8.3 (Linear secant model for a function with n variables). Let F : Rⁿ →
Rᵐ be a Lipschitz continuous function and A an m × n matrix. The linear secant
model of F in x̂ is a function mx̂;A : Rⁿ → Rᵐ defined by

mx̂;A(x) = F(x̂) + A(x − x̂) .    (8.8)

When m = n, Definition 8.3 is similar to Definition 7.9, where J(x̂) is replaced
by A. As we did for problems with one variable, we now consider two methods to
determine A: the approximation of J(x̂) by finite difference and the secant method
based on previous iterates.
Algorithm 8.3 describes the method based on finite difference approximation. The
comments related to the problems with one variable remain valid. When τ is small, the
differences with the original Newton method are small (compare Table 7.4 and Table
8.4). When τ is large, the method still works, but with a much slower convergence
speed. Table 8.5 describes the iterations for τ = 0.1. We note that the choice of
τ = 0.1 is given only as an illustration. In practice, if the finite difference method is
adopted, a small value of τ should be used (see Dennis and Schnabel, 1996, for more
details).

Table 8.4: Iterations for the finite difference Newton’s method for Example 7.11 (τ = 10⁻⁷)

k  xk  F(xk)  ‖F(xk)‖
0 +1.00000000e+00 +3.00000000e+00 +3.45723769e+00
+1.00000000e+00 +1.71828183e+00
1 +1.52359228e-01 +7.56629845e-01 +1.15470878e+00
+1.19528158e+00 +8.72274977e-01
2 -1.08376852e-02 +5.19684806e-02 +1.14042632e-01
+1.03611119e+00 +1.01513541e-01
3 -8.89667761e-04 +1.29445824e-03 +3.94234579e-03
+1.00153532e+00 +3.72377069e-03
4 -1.37016733e-06 +3.13751967e-06 +8.08060994e-06
+1.00000294e+00 +7.44662523e-06
5 -5.68344146e-12 +1.09472431e-11 +2.98662028e-11
+1.00000000e+00 +2.77875500e-11
6 -9.93522913e-17 +0.00000000e+00 +4.44089210e-16
+1.00000000e+00 +4.44089210e-16

The main disadvantage of this method is that it uses n + 1 evaluations of the
function per iteration. This turns out to be prohibitive when n is large. Therefore,
we use the same idea as in the case involving a single variable: force the linear model
in xk to interpolate the function F in xk and in xk−1. We immediately observe that
mxk;Ak(xk) = F(xk) by Definition 8.3. We now need only impose

mxk;Ak(xk−1) = F(xk) + Ak(xk−1 − xk) = F(xk−1)    (8.9)

Table 8.5: Iterations for the finite difference Newton’s method for Example 7.11 (τ = 0.1)

k  xk  F(xk)  ‖F(xk)‖
0 +1.00000000e+00 +3.00000000e+00 +3.45723769e+00

+1.00000000e+00 +1.71828183e+00
1 +1.64629659e-01 +8.02103265e-01 +1.21852778e+00
+1.20238971e+00 +9.17300554e-01
2 -1.45741083e-02 +8.85985792e-02 +1.88972898e-01
+1.05713499e+00 +1.66916290e-01
3 -5.72356301e-03 +8.21459268e-03 +2.52536228e-02
+1.00976678e+00 +2.38802414e-02
4 -4.76896360e-04 +1.48824845e-03 +3.51842725e-03
+1.00122016e+00 +3.18817297e-03
..
.
14 -2.17152295e-13 +5.45341550e-13 +1.36591792e-12
+1.00000000e+00 +1.25233157e-12
15 -2.49137961e-14 +6.26165786e-14 +1.56919411e-13
+1.00000000e+00 +1.43884904e-13
16 -2.79620466e-15 +7.10542736e-15 +1.79018084e-14
+1.00000000e+00 +1.64313008e-14
17 -2.35536342e-16 +8.88178420e-16 +1.98602732e-15
+1.00000000e+00 +1.77635684e-15
18 -5.13007076e-17 +0.00000000e+00 +0.00000000e+00
+1.00000000e+00 +0.00000000e+00

or
Ak (xk − xk−1 ) = F(xk ) − F(xk−1 ) .

This equation is called the secant equation.

Definition 8.4 (Secant equation). A linear model satisfies the secant equation in xk
and xk−1 if the matrix A defining it is such that

A(xk − xk−1 ) = F(xk ) − F(xk−1 ) . (8.10)

By taking

dk−1 = xk − xk−1 ,   yk−1 = F(xk) − F(xk−1) ,   (8.11)

it is written as

A dk−1 = yk−1 .   (8.12)

Given xk , xk−1 , F(xk ) and F(xk−1 ), the linear secant model is based on a ma-
trix A satisfying the system of equations (8.10) or (8.12). This system of n linear

equations has n² unknowns (the elements of A). Therefore, when n > 1, it is always
underdetermined and has an infinite number of solutions. From a geometrical point
of view, there are infinitely many hyperplanes passing through the two points.
The idea proposed by Broyden (1965) is to choose, among the infinite number of
linear models verifying the secant equation, the one that is the closest to the model
established during the previous iteration, thereby conserving to the largest possible
extent what has already been calculated. We now calculate the difference between two
successive models, that is mxk−1 ;Ak−1 (x), the model of the function in the previous
iterate xk−1 , and mxk ;Ak (x), the model of the function in the current iterate xk .

Lemma 8.5. Let mxk ;Ak (x) and mxk−1 ;Ak−1 (x) be linear secant models of a func-
tion F : Rn → Rn in xk and xk−1 , respectively. If these models satisfy the secant
equation, we can characterize the difference between the two models by

mxk ;Ak (x) − mxk−1 ;Ak−1 (x) = (Ak − Ak−1 )(x − xk−1 ) . (8.13)

Proof. The proof exploits the definition (8.8) of the secant model, and the secant
equation (8.10).
mxk;Ak(x) − mxk−1;Ak−1(x)
  = F(xk) + Ak(x − xk) − F(xk−1) − Ak−1(x − xk−1)                        from (8.8)
  = F(xk) + Ak(x − xk) − F(xk−1) − Ak−1(x − xk−1) + Ak xk−1 − Ak xk−1
  = F(xk) − F(xk−1) − Ak(xk − xk−1) + (Ak − Ak−1)(x − xk−1)
  = (Ak − Ak−1)(x − xk−1)                                                from (8.10) .

We now need only establish which matrix Ak minimizes this difference.

Theorem 8.6 (Broyden update). Let mxk−1 ;Ak−1 (x) be the linear secant model of
a function F : Rn → Rn in xk−1 and let us take xk ∈ Rn, xk ≠ xk−1. The linear
secant model of F in xk that satisfies the secant equation (8.10) and is as close
as possible to mxk−1 ;Ak−1 (x) is

mxk ;Ak (x) = F(xk ) + Ak (x − xk ) , (8.14)

with

Ak = Ak−1 + (yk−1 − Ak−1 dk−1) dTk−1 / (dTk−1 dk−1) ,   (8.15)

where dk−1 = xk − xk−1 and yk−1 = F(xk) − F(xk−1).

Proof. According to Lemma 8.5, the difference between the two linear models is
(Ak − Ak−1 )(x − xk−1 ) . (8.16)
The secant equation imposes the behavior of the linear model solely in the direction
dk−1 . Therefore, the degrees of freedom should be explored in the directions that are
orthogonal to dk−1 . For all x, we can decompose


x − xk−1 = αdk−1 + s , (8.17)
where s ∈ Rn is such that dTk−1 s = 0. Therefore, (8.16) is written as
α(Ak − Ak−1 )dk−1 + (Ak − Ak−1 )s . (8.18)
The secant equation imposes that the first term should be
α(Ak − Ak−1 )dk−1 = α(yk−1 − Ak−1 dk−1 ) .
It does not depend on Ak and no degrees of freedom are available here. However,
dk−1 is not involved in the second term, and the secant equation is irrelevant for this
term. We choose Ak such that this second term disappears and that the gap between
the two models is minimal. This is the case if Ak − Ak−1 is defined by
Ak − Ak−1 = udTk−1 , (8.19)
because dTk−1 s = 0. In this way, the choice of Ak depends on the choice of u. Once
again, it is the secant equation that enables its definition. We have
udTk−1 dk−1 = (Ak − Ak−1 )dk−1 = yk−1 − Ak−1 dk−1 .
Therefore,

u = (yk−1 − Ak−1 dk−1) / (dTk−1 dk−1)   (8.20)

and (8.15) is obtained directly from (8.19) and (8.20).
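The update (8.15) is thus a rank-one correction of Ak−1. A minimal Python sketch (the numerical data below are arbitrary illustrations, not taken from the book):

```python
import numpy as np

def broyden_update(A_prev, d, y):
    """Rank-one Broyden update (8.15): the result satisfies the secant equation A d = y."""
    d = np.asarray(d, dtype=float)
    y = np.asarray(y, dtype=float)
    return A_prev + np.outer(y - A_prev @ d, d) / (d @ d)

# Arbitrary data: the updated matrix verifies the secant equation (8.12).
A_prev = np.eye(2)
d = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
A = broyden_update(A_prev, d, y)
print(np.allclose(A @ d, y))  # True
```

By construction the correction vanishes on any s orthogonal to d, which is exactly the minimality argument used in the proof above.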
We now show that this update indeed generates the matrix satisfying the secant
equation that is the closest to Ak−1 .

Theorem 8.7 (Broyden optimality). Consider Ak−1 ∈ Rn×n, dk−1 and yk−1 ∈ Rn, dk−1 ≠ 0. Let S = {A | A dk−1 = yk−1} be the set of matrices satisfying the secant equation. Then (8.15), i.e.,

Ak = Ak−1 + (yk−1 − Ak−1 dk−1) dTk−1 / (dTk−1 dk−1) ,

is the solution to

min_{A∈S} ‖A − Ak−1‖2

and the unique solution to

min_{A∈S} ‖A − Ak−1‖F .

Proof. Let A be an arbitrary matrix in S. We have

‖Ak − Ak−1‖2 = ‖(yk−1 − Ak−1 dk−1) dTk−1‖2 / (dTk−1 dk−1)      from (8.15)
             = ‖(A dk−1 − Ak−1 dk−1) dTk−1‖2 / (dTk−1 dk−1)     because A ∈ S
             = ‖(A − Ak−1) dk−1 dTk−1‖2 / (dTk−1 dk−1)
             ≤ ‖A − Ak−1‖2 ‖dk−1 dTk−1‖2 / (dTk−1 dk−1)         from (C.22)
             = ‖A − Ak−1‖2                                       from (C.25) .

Similarly, we have

‖Ak − Ak−1‖F = ‖(yk−1 − Ak−1 dk−1) dTk−1‖F / (dTk−1 dk−1)      from (8.15)
             = ‖(A dk−1 − Ak−1 dk−1) dTk−1‖F / (dTk−1 dk−1)     because A ∈ S
             = ‖(A − Ak−1) dk−1 dTk−1‖F / (dTk−1 dk−1)
             ≤ ‖A − Ak−1‖F ‖dk−1 dTk−1‖2 / (dTk−1 dk−1)         from (C.23)
             = ‖A − Ak−1‖F                                       from (C.25) .

The uniqueness follows from the strict convexity of the Frobenius norm and the
convexity of the set S.
Algorithm 8.4 describes the secant method for n variables. Table 8.6 describes the
iterations for the secant method for Example 7.11. It is noteworthy that the method
converges, but toward another solution than that of Newton’s method. Table 8.7
compares matrix Ak for some iterations with the corresponding Jacobian matrix.
Clearly, these matrices are different for the first iterations of the algorithm. We
can see, even for the last iterations, that matrix Ak is a poor approximation of the
Jacobian matrix. This is one of the strengths of the secant method: it is not necessary
to have an asymptotically good approximation of the Jacobian matrix for the method
to work well.

Algorithm 8.4: Secant method: n variables


1 Objective
2 To find (an approximation of) a solution to the system of equations

F(x) = 0 . (8.21)

3 Input
4 The function F : Rn → Rn .
5 A first approximation of the solution x0 ∈ Rn .
6 A first approximation of the Jacobian matrix A0 (by default A0 = I).
7 The required precision ε ∈ R, ε > 0.
8 Output
9 An approximation of the solution x∗ ∈ Rn .
10 Initialization
11 x1 := x0 − A0⁻¹ F(x0).
12 d0 := x1 − x0 .
13 y0 := F(x1 ) − F(x0 ).
14 k := 1.
15 Repeat
16 Broyden update:

Ak := Ak−1 + (yk−1 − Ak−1 dk−1) dTk−1 / (dTk−1 dk−1) .

Calculate dk, the solution of Ak dk = −F(xk).


17 xk+1 := xk + dk .
18 yk := F(xk+1 ) − F(xk ).
19 k := k + 1.
20 Until ‖F(xk)‖ ≤ ε
21 x∗ = xk .
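Algorithm 8.4 can be transcribed almost literally. The following Python sketch is illustrative only: the test system F and its starting point are hypothetical (they are not Example 7.11), and the initial matrix A0 is here taken as a crude forward-difference Jacobian rather than the default identity.

```python
import numpy as np

def secant_method(F, x0, A0, eps=1e-10, maxiter=100):
    """Sketch of Algorithm 8.4: Broyden's secant method for F(x) = 0."""
    x = np.asarray(x0, dtype=float)
    A = np.asarray(A0, dtype=float)
    x_new = x - np.linalg.solve(A, F(x))           # x1 := x0 - A0^{-1} F(x0)
    d, y = x_new - x, F(x_new) - F(x)
    x = x_new
    for _ in range(maxiter):
        if np.linalg.norm(F(x)) <= eps:
            break
        A = A + np.outer(y - A @ d, d) / (d @ d)   # Broyden update (8.15)
        d = np.linalg.solve(A, -F(x))              # solve A_k d_k = -F(x_k)
        x_new = x + d
        y = F(x_new) - F(x)
        x = x_new
    return x

# Hypothetical test system with root (1, 0); A0 is a crude forward-difference Jacobian.
F = lambda x: np.array([x[0]**3 + x[1] - 1.0, x[1]**3 - x[0] + 1.0])
x0 = np.array([1.5, 0.5])
h = 1e-7
A0 = np.column_stack([(F(x0 + h * e) - F(x0)) / h for e in np.eye(2)])
print(secant_method(F, x0, A0))  # ≈ [1. 0.]
```

Only one evaluation of F per iteration is needed once the method is started, in contrast with the n + 1 evaluations of the finite difference Newton's method.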

Table 8.6: Iterations for the secant method for Example 7.11
k xk F(xk) ‖F(xk)‖
0 1.00000000e+00 3.00000000e+00 3.45723768e+00
1.00000000e+00 1.71828182e+00
1 -2.00000000e+00 -4.84071214e-01 2.28706231e+00
-7.18281828e-01 -2.23524698e+00
2 -1.66450025e+00 -8.68008706e-01 1.51117836e+00
8.30921595e-01 -1.23702099e+00
3 -2.42562564e-01 2.72598221e+00 7.74156513e+00
2.03771213e+00 7.24574714e+00
4 -1.24155582e+00 -1.34676047e+00 1.83898030e+00
7.71291329e-01 -1.25223192e+00
5 -5.80521668e-01 -1.64577514e+00 2.13825933e+00
4.22211781e-01 -1.36512898e+00
...
15 -1.71374738e+00 -1.15696833e-07 1.85422885e-07
1.22088678e+00 -1.44899582e-07
16 -1.71374741e+00 -2.43091768e-10 3.89249065e-10
1.22088682e+00 -3.04008596e-10
17 -1.71374741e+00 8.17124146e-14 1.30803685e-13
1.22088682e+00 1.02140518e-13
18 -1.71374741e+00 -2.22044604e-16 2.22044604e-16
1.22088682e+00 0.00000000e+00

Table 8.7: Jacobian matrix and Broyden matrix for the secant method for Example
7.11
k J(xk ) Ak
0 4.00000000e+00 2.00000000e+00 1.00000000e+00 0.00000000e+00
2.71828182e+00 3.00000000e+00 0.00000000e+00 1.00000000e+00
1 -2.00000000e+00 -1.43656365e+00 1.12149881e+00 6.95897342e-02
1.35335283e-01 1.54778635e+00 5.61032855e-01 1.32133752e+00
2 -1.32900051e+00 1.66184319e+00 1.00559588e+00 -4.65603572e-01
1.89285227e-01 2.07129209e+00 3.95856681e-01 5.58620104e-01
3 1.51487487e+00 4.07542427e+00 2.12000014e+00 4.80185068e-01
7.84614657e-01 1.24568122e+01 3.35797853e+00 3.07255629e+00
4 -4.83111643e-01 1.54258265e+00 2.63710364e+00 1.13571564e+00
2.88934337e-01 1.78467094e+00 3.83878676e+00 3.68207545e+00
...
17 -1.42749482e+00 2.44177364e+00 -1.06423011e+00 2.70386672e+00
1.80189282e-01 4.47169389e+00 6.34731480e-01 4.79964153e+00
18 -1.42749482e+00 2.44177364e+00 -1.06870996e+00 2.71006685e+00
1.80189282e-01 4.47169389e+00 6.34731480e-01 4.79964153e+00

8.3 Project
The general organization of the projects is described in Appendix D.

Objective
The aim of the present project is to solve a fixed point problem, i.e., given a function
T : Rn → Rn , to identify x ∈ Rn such that T (x) = x. This is of course equivalent
to solving the system of equations F(x) = 0 defined by F(x) = T (x) − x. Even if
the example that we consider is relatively simple, we assume that the derivatives are
unavailable.

Approach
• Implement the Banach fixed-point algorithm xk+1 = T (xk ), as well as the secant
method (Algorithm 8.4).
• From the starting point x = (1 1 1 1 1 1 1)T, solve the problem with the fixed-point algorithm.
• Solve the system of equations T (x)−x = 0 by using the secant method (Algorithm
8.4) from the same starting point.
• Solve the system of equations x −T (x) = 0 by using the secant method (Algorithm
8.4) from the same starting point.
• Compare the obtained solutions.
• Compare the number of iterations required to obtain the solution.

Algorithms
The simplest algorithm to solve fixed-point problems consists in applying Banach
iterations, i.e., xk+1 = T (xk ). The secant method is used to solve F(x) = T (x)−x = 0.
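The Banach iterations amount to a one-line loop around T. A minimal sketch, tested on a hypothetical scalar contraction (not the T of Exercise 8.1):

```python
import numpy as np

def banach_iterations(T, x0, eps=1e-10, maxiter=1000):
    """Banach fixed-point iterations x_{k+1} = T(x_k), stopped when successive
    iterates differ by at most eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxiter):
        x_next = np.asarray(T(x))
        if np.linalg.norm(x_next - x) <= eps:
            return x_next
        x = x_next
    return x

# Hypothetical contraction: T(x) = cos(x) has a unique fixed point near 0.739085.
print(banach_iterations(np.cos, [1.0]))  # ≈ [0.73908513]
```

Convergence is only linear, with rate given by the contraction factor of T, which is why comparing the iteration counts with the secant method is instructive.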

Problem
Exercise 8.1. Find x∗ ∈ R7 such that T (x∗ ) = x∗ , with
 
T(x) = ( 1 + x1 − x1²/4
         (1/2) x2 + (3/10) x4 + (1/2) x6
         1 + x3 − x3²/3
         (1/4) x2 + (2/5) x4 + (1/5) x6
         √(2 x5)
         (1/4) x2 + (3/10) x4 + (3/10) x6
         8/(2 + x7) ) .

Part IV

Unconstrained optimization

All constraint, except what wisdom lays on evil men, is evil.

William Cowper

We discuss here the algorithms for solving unconstrained optimization problems. The chosen approach is the following:
1. First, in Chapter 9, we study quadratic problems, because they often appear as
subproblems in various algorithms.
2. Based on the necessary optimality conditions described in Chapter 5, we use in
Chapter 10 Newton’s method and its variants presented in Part III to solve the
system of equations (5.1). We show with examples that this approach does not
always work.
3. In Chapter 11, we define a class of methods called descent methods, specifically
designed for minimization problems. We demonstrate that Newton’s method can,
once adapted, be part of this class.
4. The methods known as trust region methods, described in Chapter 12, constitute
an interesting alternative to descent methods. Again, we show that Newton’s
method can be adapted also to this context.
5. Finally, we describe quasi-Newton methods, similar to those presented in Chapter 8, now in the context of optimization.

Chapter 9

Quadratic problems

Contents
9.1 Direct solution
9.2 Conjugate gradient method
9.3 Project

Before developing algorithms for general non linear problems, let us study the case of
quadratic problems (Definition 2.28). These indeed turn up regularly as subproblems
in the algorithms. In this chapter, we solve the problem
min_{x∈Rn} f(x) = (1/2) xT Q x + bT x + c ,   (9.1)
where Q is a symmetric n × n matrix, positive definite, b ∈ Rn and c ∈ R. According
to Theorem 5.10, if Q is not positive definite or semidefinite, the problem has no
solution. The case where Q is positive semidefinite and singular is discussed in
Theorem 5.10, but is not dealt with here. One should immediately note that the
value of c has no impact on the solution to the problem (9.1). Therefore, we focus
on the problem
min_{x∈Rn} f(x) = (1/2) xT Q x + bT x .   (9.2)
The value of c is added to the optimal value of the objective function of (9.2) to
obtain the optimal value of the objective function of (9.1).
By employing Theorem 5.10, the unique global minimum of (9.2) can be easily
obtained by solving the system of linear equations

Qx = −b . (9.3)

9.1 Direct solution


Classical linear algebra algorithms can be used to solve (9.3). The solution details
can be found in the literature for linear algebra (see in particular Golub and Van
Loan, 1996). Typically, the solution algorithm has the following structure.

Algorithm 9.1: Quadratic problems: direct solution


1 Objective
2 To find the global minimum of (9.2).
3 Input
4 The symmetric and positive definite matrix Q ∈ Rn×n .


5 The vector b ∈ Rn .
6 Output
7 The solution x∗ ∈ Rn .
8 Calculate the Cholesky factorization Q = LLT .
9 Calculate y∗ , the solution to the lower triangular system Ly = −b.
10 Calculate x∗ , the solution to the upper triangular system LT x = y∗ .
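A direct transcription of Algorithm 9.1 in Python might look as follows. It is a sketch: np.linalg.solve is used for the two triangular systems for brevity (a dedicated triangular solver would normally be preferred), and the 2 × 2 problem is an arbitrary illustration.

```python
import numpy as np

def quadratic_direct(Q, b):
    """Sketch of Algorithm 9.1: minimize (1/2) x'Qx + b'x by solving Qx = -b
    through the Cholesky factorization Q = L L^T."""
    L = np.linalg.cholesky(Q)        # Q = L L^T, L lower triangular
    y = np.linalg.solve(L, -b)       # forward substitution: L y = -b
    return np.linalg.solve(L.T, y)   # back substitution: L^T x = y

# Small illustrative problem: the minimum of (1/2) x'Qx + b'x is x = (1, 1).
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([-5.0, -4.0])
print(quadratic_direct(Q, b))  # ≈ [1. 1.]
```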

We refer the reader to Higham (1996) for a discussion about the numerical issues
associated with the direct method. Note that the above algorithm does not preserve
the sparsity of the matrix. Indeed, if n is large, and the number of non zero entries
of the matrix Q is significantly less than n2 , it is convenient to adopt data structures
that store only those elements (see, for instance, Dongarra, 2000 and Montagne and
Ekambaram, 2004). Unfortunately, even if Q is sparse1, the matrix L resulting from
the factorization is not, and these data structures cannot be used. The algorithm
presented in the next section is able to exploit the sparsity of the matrix.

9.2 Conjugate gradient method


The conjugate gradient method is an iterative method used to solve (9.2). It was
independently discovered by Stiefel (1952) and Hestenes (1951), who completed and
published it together (Hestenes and Stiefel, 1952). Quite unpopular during the 1950s
and 1960s, the method generated interest during the 1970s, when the size of problems
to solve increased significantly (see Golub and O’Leary, 1989, for more historical
details).
We describe this method in two steps, first presenting the conjugate directions
method in a general manner.

Definition 9.1 (Conjugate directions). Let Q ∈ Rn×n be a positive definite matrix.


The non zero vectors of Rn d1 , . . . , dk are said to be Q-conjugate if

dTi Qdj = 0 , ∀i, j such that i 6= j . (9.4)

1 A matrix is said to be sparse if most of its elements are zero.



Note that if Q is the identity matrix I, the conjugate directions are orthogonal. If not, we may define the inner product

⟨di, dj⟩Q = dTi Q dj ,   (9.5)

so that di is Q-conjugate with dj if and only if di is orthogonal to dj with respect to the inner product ⟨·, ·⟩Q. We can derive the following result directly from the definition.

Theorem 9.2 (Independence of conjugate directions). Let Q ∈ Rn×n be a positive definite matrix and d1, . . . , dk be a set of non zero and Q-conjugate directions. Then, the vectors d1, . . . , dk are linearly independent.

Proof. We assume by contradiction that there exist λ1 , . . . , λk−1 , not all zero, such
that
dk = λ1 d1 + · · · + λk−1 dk−1 .
Therefore,
dTk Qdk = λ1 dTk Qd1 + · · · + λk−1 dTk Qdk−1 = 0 ,
because the directions are Q-conjugate. This is impossible because dk is non zero
and Q is positive definite.
An immediate corollary is that, in Rn , the maximum number of Q-conjugate
directions is n.
The idea behind the conjugate directions method is to define an iterative algorithm
using n conjugate directions d1 , . . . , dn , with the following structure:

xk+1 = xk + αk dk , k = 1, . . . , n ,

where αk is chosen to minimize the function in the direction dk , that is

αk = argminα f(xk + αdk ) .

We can identify some of the properties of this type of method.

Lemma 9.3. Let Q ∈ Rn×n be a positive definite matrix, f(x) = (1/2) xT Q x + bT x, and d1, . . . , dn be a set of Q-conjugate directions in Rn. Let x1, . . . , xn+1 be the iterates generated by a conjugate directions method. Then,
1. for all k = 1, . . . , n, the step αk is defined by

αk = − dTk (Q xk + b) / (dTk Q dk) = − dTk ∇f(xk) / (dTk Q dk) ;   (9.6)

2. for all k = 1, . . . , n, ∇f(xk) is orthogonal to d1, . . . , dk−1, i.e.,

∇f(xk)T di = 0 ,   i = 1, . . . , k − 1 ;   (9.7)

3. ∇f(xn+1 ) = 0;
4. let us take k such that ∇f(xk ) = 0 ; then,

∇f(xi ) = 0 , i = k, . . . , n + 1 . (9.8)
Proof. 1. Since αk is the minimum of the function in the direction dk , its value
corresponds to a zero directional derivative of f in the direction dk (Definition 2.7),
i.e.,
dTk ∇f(xk + αk dk ) = dTk ∇f(xk+1 ) = 0 (9.9)
and, applying the formula (2.42) of the gradient of a quadratic function, we obtain

0 = dTk ∇f(xk + αk dk)
  = dTk (Q(xk + αk dk) + b)
  = dTk Q xk + αk dTk Q dk + dTk b ,

which we solve for αk to obtain (9.6).
2. Since xk+1 = xk + αk dk, we have, for any i = 1, . . . , k − 1,

xk = xk−1 + αk−1 dk−1
   = xk−2 + αk−2 dk−2 + αk−1 dk−1
   ...
   = xi+1 + Σ_{j=i+1}^{k−1} αj dj .   (9.10)

Therefore, for i = 1, . . . , k − 1,

dTi ∇f(xk) = dTi (Q xk + b)                                      according to (2.42)
           = dTi (Q (xi+1 + Σ_{j=i+1}^{k−1} αj dj) + b)          according to (9.10)
           = dTi Q xi+1 + dTi b + Σ_{j=i+1}^{k−1} αj dTi Q dj
           = dTi (Q xi+1 + b)                                    according to (9.4)
           = ∇f(xi+1)T di                                        according to (2.42)
           = 0                                                   according to (9.9) .

3. Let d ≠ 0 be an arbitrary vector of Rn. Since d1, . . . , dn is a set of n linearly independent vectors in Rn, it is a basis, and d can be written as

d = Σ_{i=1}^{n} λi di .

Therefore,

∇f(xn+1)T d = Σ_{i=1}^{n} λi ∇f(xn+1)T di = 0

by point 2. Since d is arbitrary, we obtain ∇f(xn+1) = 0.


4. If ∇f(xk) = 0, then αk = 0, according to (9.6). The result follows by simple induction on k.

The most important result related to the conjugate directions methods is that they identify the global minimum of the problem in at most n iterations. In fact, they solve the problem over affine subspaces of increasing dimension.

Theorem 9.4 (Conjugate directions method). Let Q ∈ Rn×n be positive definite.


Let d1, . . . , dℓ, ℓ ≤ n, be a set of Q-conjugate directions, let us take x1 ∈ Rn and let

Mℓ = x1 + ⟨d1, . . . , dℓ⟩ = { x | x = x1 + Σ_{k=1}^{ℓ} λk dk , λ ∈ Rℓ }

be the affine subspace spanned by the directions d1, . . . , dℓ. Then, the global minimum of the problem

min_{x∈Mℓ} f(x) = (1/2) xT Q x + bT x   (9.11)

is

xℓ+1 = x1 + Σ_{k=1}^{ℓ} αk dk   (9.12)

with

αk = argmin_α f(xk + α dk) = − dTk (Q xk + b) / (dTk Q dk) .   (9.13)

Proof. We consider the function

g : Rℓ → R : λ ↦ g(λ) = f(x1 + Σ_{i=1}^{ℓ} λi di)

that enables us to transform problem (9.11) into the unconstrained problem

min_{λ∈Rℓ} g(λ) ,

such that

∂g/∂λi (λ) = dTi ∇f(x1 + Σ_{j=1}^{ℓ} λj dj) .

According to Lemma 9.3, when the coefficients λ are replaced by the steps αk defined by (9.13) (that is, (9.6)), we have

∂g/∂λi (α1, . . . , αℓ) = dTi ∇f(xℓ+1) = 0 ,   ∀i .

Then, ∇g(α1, . . . , αℓ) = 0. Moreover,

∂²g/(∂λi ∂λj) (α1, . . . , αℓ) = dTi Q dj .

As the directions di are Q-conjugate, the second derivatives matrix of g is a diagonal matrix with positive eigenvalues. It is therefore positive definite. We now need only use the sufficient optimality conditions (Theorems 5.7 and 5.9) to demonstrate that (α1, . . . , αℓ) is the global minimum of g and that xℓ+1 defined by (9.12) is the global minimum of (9.11).
The specific case ℓ = n is particularly important.

Corollary 9.5 (Convergence of the conjugate directions method). Let Q ∈ Rn×n be positive definite, let d1, . . . , dn be a set of Q-conjugate directions, and let x1 ∈ Rn be arbitrary. The algorithm based on the recurrence

xk+1 = xk + αk dk

with

αk = − dTk (Q xk + b) / (dTk Q dk)

identifies the global minimum of the problem

min_{x∈Rn} f(x) = (1/2) xT Q x + bT x

in at most n iterations.

Proof. We apply Theorem 9.4 with ℓ = n to demonstrate that xℓ+1 is the global
minimum. As the conjugate directions are linearly independent (Theorem 9.2), n
directions span the entire space Rn , that is Mℓ = Mn = Rn .
This result makes the conjugate directions methods particularly attractive. It
remains to show how to obtain Q-conjugate directions. We proceed in two steps.
First, we start from an arbitrary set of linearly independent vectors and apply the
Gram-Schmidt orthogonalization procedure to obtain Q-conjugate directions. Indeed,
as discussed above, two directions are Q-conjugate if they are orthogonal with respect
to the inner product h·iQ defined by (9.5). Second, we identify a specific set of linearly
independent vectors, which simplifies considerably the formulation.
Consider the set of ℓ vectors ξ1, . . . , ξℓ, that are linearly independent. The Q-conjugate vectors are defined by induction in such a way that, at each step i of the induction, i = 1, . . . , ℓ, the vector subspace spanned by ξ1, . . . , ξi is the same as the subspace spanned by d1, . . . , di, i.e.,

⟨ξ1, . . . , ξi⟩ = ⟨d1, . . . , di⟩ .   (9.14)

We initiate the induction with d1 = ξ1. Then, for any given i ≥ 2, we assume that we have Q-conjugate vectors d1, . . . , di−1, such that

⟨ξ1, . . . , ξi−1⟩ = ⟨d1, . . . , di−1⟩ .

We thus choose di of the form

di = ξi + Σ_{k=1}^{i−1} αik dk .   (9.15)

We calculate the coefficients αik in order for di to be Q-conjugate with d1, . . . , di−1. Let 1 ≤ j ≤ i − 1 be any arbitrary index. Then

0 = dTj Q di
  = dTj Q ξi + Σ_{k=1}^{i−1} αik dTj Q dk
  = dTj Q ξi + αij dTj Q dj ,

because all the other terms of the sum are zero by Q-conjugation. Then,

αij = − dTj Q ξi / (dTj Q dj)

and (9.15) is written as

di = ξi − Σ_{k=1}^{i−1} (dTk Q ξi / (dTk Q dk)) dk .   (9.16)

The calculation of di is well-defined. Indeed, the denominator dTk Q dk is non zero because Q is positive definite. Since the vectors ξ1, . . . , ξi are linearly independent, ξi is linearly independent from any direction in the subspace ⟨ξ1, . . . , ξi−1⟩. From (9.14), it is also independent from any direction in the subspace ⟨d1, . . . , di−1⟩. Consequently,

ξi ≠ Σ_{k=1}^{i−1} (dTk Q ξi / (dTk Q dk)) dk ,

and di is not zero.
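The procedure (9.16) can be checked numerically. A minimal sketch, with an arbitrary positive definite Q and the canonical basis playing the role of the linearly independent vectors ξi:

```python
import numpy as np

def q_conjugate_directions(Q, xis):
    """Gram-Schmidt with respect to the inner product <d, e>_Q = d' Q e,
    i.e., a direct transcription of equation (9.16)."""
    ds = []
    for xi in xis:
        d = np.array(xi, dtype=float)
        for dk in ds:
            d -= ((dk @ Q @ xi) / (dk @ Q @ dk)) * dk  # remove the Q-projection on d_k
        ds.append(d)
    return ds

# Arbitrary symmetric positive definite Q; start from the canonical basis.
rng = np.random.default_rng(42)
M = rng.standard_normal((4, 4))
Q = M @ M.T + 4.0 * np.eye(4)
ds = q_conjugate_directions(Q, list(np.eye(4)))
off = max(abs(ds[i] @ Q @ ds[j]) for i in range(4) for j in range(i))
print(off < 1e-10)  # True: the directions are Q-conjugate
```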


The Gram-Schmidt procedure described above can be applied to any set of linearly independent vectors. We see now that a judicious choice of the vectors ξi allows us to greatly simplify (9.16). The method called the conjugate gradient method utilizes

ξi = −∇f(xi) = −Q xi − b .

In order to apply the Gram-Schmidt procedure, we must verify that the vectors ∇f(xi), i = 1, . . . , n, are linearly independent. Actually, Theorem 9.6 proposes a stronger result: they are orthogonal.

Theorem 9.6 (Orthogonal gradients). We consider the conjugate directions method where each direction di is generated by the Gram-Schmidt method applied to the directions −∇f(x1), . . . , −∇f(xi), i.e.,

di = −∇f(xi) + Σ_{k=1}^{i−1} (dTk Q ∇f(xi) / (dTk Q dk)) dk .   (9.17)

Then,

⟨∇f(x1), . . . , ∇f(xi)⟩ = ⟨d1, . . . , di⟩   (9.18)

and

∇f(xi)T ∇f(xk) = 0 ,   k = 1, . . . , i − 1 .   (9.19)

Proof. i = 1: (9.18) is trivially satisfied because d1 = −∇f(x1), and (9.19) does not apply.
i = 2: We have

d2 = −∇f(x2) + (dT1 Q ∇f(x2) / (dT1 Q d1)) d1

and (9.18) is satisfied. Moreover, according to Lemma 9.3,

0 = ∇f(x2)T d1 = −∇f(x2)T ∇f(x1)

and (9.19) is satisfied.
i > 2: We now assume that the result is satisfied for i − 1. Since the vectors ∇f(x1), . . . , ∇f(xi−1) are orthogonal, they are linearly independent. Therefore, (9.17) directly implies that (9.18) is satisfied for i. According to Lemma 9.3, we have that

∇f(xi)T dk = 0 ,   k = 1, . . . , i − 1 ,   (9.20)

and ∇f(xi) is orthogonal to the subspace ⟨d1, . . . , di−1⟩. Since (9.18) is satisfied for i − 1, ∇f(xi) is orthogonal to the subspace ⟨∇f(x1), . . . , ∇f(xi−1)⟩, and (9.19) is satisfied for i.
We now demonstrate a proposition that enables us to simplify the conjugate gra-
dient method.

Theorem 9.7 (Conjugate gradients). We consider the conjugate directions method where each direction di is generated by the Gram-Schmidt method applied to the directions −∇f(x1), . . . , −∇f(xi), i.e., according to (9.17). If ∇f(xi) ≠ 0, then

di = −∇f(xi) + βi di−1   (9.21)

with

βi = ∇f(xi)T ∇f(xi) / (∇f(xi−1)T ∇f(xi−1)) .   (9.22)

Proof. For all k = 1, . . . , i − 1, we have

∇f(xk+1) − ∇f(xk) = Q xk+1 + b − Q xk − b = Q(xk + αk dk − xk) = αk Q dk .

Since ∇f(xi) ≠ 0 by assumption, then ∇f(xk) ≠ 0, k = 1, . . . , i − 1 (item 4 of Lemma 9.3), and αk ≠ 0, so that

Q dk = (1/αk) (∇f(xk+1) − ∇f(xk)) .

Then, from the orthogonality of the gradients (Theorem 9.6), we have

∇f(xi)T Q dk = (1/αk) ∇f(xi)T (∇f(xk+1) − ∇f(xk))
             = (1/αi−1) ∇f(xi)T ∇f(xi)    if k = i − 1 ,
             = 0                           if k = 1, . . . , i − 2 .

Similarly, we have

dTk Q dk = (1/αk) dTk (∇f(xk+1) − ∇f(xk)) .

Therefore, (9.17) simplifies into

di = −∇f(xi) + (dTi−1 Q ∇f(xi) / (dTi−1 Q di−1)) di−1 = −∇f(xi) + βi di−1   (9.23)

with

βi = dTi−1 Q ∇f(xi) / (dTi−1 Q di−1) = ∇f(xi)T ∇f(xi) / (dTi−1 (∇f(xi) − ∇f(xi−1))) .   (9.24)

Since

di−1 = −∇f(xi−1) + βi−1 di−2 ,

the denominator is written as

dTi−1 (∇f(xi) − ∇f(xi−1)) = −∇f(xi−1)T (∇f(xi) − ∇f(xi−1)) + βi−1 dTi−2 (∇f(xi) − ∇f(xi−1))
                          = −∇f(xi−1)T ∇f(xi)        (= 0)
                            + ∇f(xi−1)T ∇f(xi−1)
                            + βi−1 dTi−2 ∇f(xi)      (= 0)
                            − βi−1 dTi−2 ∇f(xi−1)    (= 0) ,

where the three indicated terms are zero according to (9.7) and (9.19). And we obtain (9.22) from (9.24).
All these results are combined to obtain Algorithm 9.2, the conjugate gradient
method. An important characteristic of the conjugate gradient algorithm is that the
matrix Q defining the problem is never needed as such. There is not even the need
to store it. It is used exclusively to calculate the matrix-vector products Qxk or
Qdk . This is particularly interesting for problems of large size, for which the matrix
Q is generally sparse. In this case, the matrix-vector products can be efficiently
implemented without ever explicitly forming the matrix Q.

Algorithm 9.2: Conjugate gradient method


1 Objective
2 To find the global minimum of (9.2), i.e., min_{x∈Rn} (1/2) xT Q x + bT x.
3 Input
4 A first approximation x1 of the solution.


5 The symmetric positive definite matrix Q ∈ Rn×n .
6 The vector b ∈ Rn .
7 Output
8 The solution x∗ ∈ Rn .
9 Initialization
10 k := 1,
11 d1 := −Qx1 − b.
12 Repeat
13 αk := − dTk (Q xk + b) / (dTk Q dk).
14 xk+1 := xk + αk dk .
15 βk+1 := ∇f(xk+1)T ∇f(xk+1) / (∇f(xk)T ∇f(xk)) = (Q xk+1 + b)T (Q xk+1 + b) / ((Q xk + b)T (Q xk + b)).
16 dk+1 := −Qxk+1 − b + βk+1 dk .
17 k := k + 1.
18 Until ∇f(xk ) = 0 or k = n + 1
19 x∗ = xk .
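A direct Python transcription of Algorithm 9.2 is a short loop. This sketch uses the data of Example 9.8, whose minimum is x∗ = (1, 1, 1, 1); note that Q appears only through matrix-vector products, as emphasized above.

```python
import numpy as np

def conjugate_gradient(Q, b, x=None):
    """Sketch of Algorithm 9.2 for min (1/2) x'Qx + b'x, Q symmetric positive definite."""
    n = len(b)
    x = np.zeros(n) if x is None else np.asarray(x, dtype=float)
    g = Q @ x + b                  # gradient of f at x
    d = -g                         # d1 := -Q x1 - b
    for _ in range(n):             # at most n iterations (Corollary 9.5)
        if np.allclose(g, 0):
            break
        alpha = -(d @ g) / (d @ (Q @ d))   # step length (9.6)
        x = x + alpha * d
        g_new = Q @ x + b
        beta = (g_new @ g_new) / (g @ g)   # conjugate gradient coefficient (9.22)
        d = -g_new + beta * d
        g = g_new
    return x

# Data of Example 9.8, started from the same point as Table 9.1.
Q = np.array([[1.0, 1, 1, 1], [1, 2, 2, 2], [1, 2, 3, 3], [1, 2, 3, 4]])
b = np.array([-4.0, -7, -9, -10])
print(conjugate_gradient(Q, b, x=[5.0, 5, 5, 5]))  # ≈ [1. 1. 1. 1.]
```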

Example 9.8 (Conjugate gradient method). We apply Algorithm 9.2 to the quadratic
problem (9.2) defined by
  

Q = ( 1 1 1 1
      1 2 2 2
      1 2 3 3
      1 2 3 4 ) ,    b = ( −4, −7, −9, −10 )T .

The iterations are detailed in Table 9.1. The algorithm converges after having generated 4 directions, as predicted by Corollary 9.5. It is easy to verify that the directions generated by the algorithm are indeed Q-conjugate and that the gradients are orthogonal to each other, as stated in Theorem 9.6.
Table 9.1: Iterations for the conjugate gradient method for Example 9.8
k xk ∇f(xk ) dk αk βk
1 +5.00000e+00 +1.60000e+01 -1.60000e+01 +1.20766e-01
+5.00000e+00 +2.80000e+01 -2.80000e+01
+5.00000e+00 +3.60000e+01 -3.60000e+01
+5.00000e+00 +4.00000e+01 -4.00000e+01
2 +3.06775e+00 +1.50810e+00 -1.52579e+00 +1.02953e+00 +1.10547e-03
+1.61856e+00 +9.48454e-01 -9.79407e-01
+6.52430e-01 -2.29750e-01 +1.89953e-01
+1.69367e-01 -1.06038e+00 +1.01616e+00
3 +1.49690e+00 +1.70656e-01 -1.97676e-01 +2.37172e+00 +1.77089e-02
+6.10224e-01 -1.55585e-01 +1.38241e-01
+8.47993e-01 -9.20500e-02 +9.54138e-02
+1.21554e+00 +1.23492e-01 -1.05497e-01


4 +1.02806e+00 +5.77796e-03 -8.27569e-03 +3.39118e+00 +1.26355e-02
+9.38093e-01 -1.65085e-02 +1.82552e-02
+1.07429e+00 +2.31118e-02 -2.19062e-02
+9.65332e-01 -1.15559e-02 +1.02229e-02
5 +1.00000e+00 -1.66356e-12
+1.00000e+00 -3.12639e-12
+1.00000e+00 -4.21174e-12
+1.00000e+00 -4.78906e-12

9.3 Project
The general organization of the projects is described in Appendix D.

Objective
The aim of the present project is to solve several quadratic problems and compare
the direct solution with the conjugate gradient method for ill-conditioned problems.

Approach

Perform the following experiments.


1. Generate a problem of dimension 10 for which the eigenvalues are randomly dis-
tributed between 1 and 3, and solve it with Algorithms 9.1 and 9.2. Compare the
solutions. After how many iterations does the conjugate gradient algorithm iden-
tify an iterate such that the norm of the gradient is below 10−6 ? Is this consistent
with theory?
2. Carry out the same approach for a problem of dimension 100.
3. Generate a vector of 100 eigenvalues randomly distributed between 0 and 1. Subsequently, multiply the last 50 of them by 10,000 and generate a quadratic problem by using the procedure described below. After how many iterations does the
conjugate gradient algorithm identify an iterate with the norm of the gradient
below 10−6 ? Is this consistent with theory?
4. Generate a quadratic problem defined by a Hilbert matrix of dimension 10 and
another of dimension 100 (see Exercise 9.2 for the definition of a Hilbert matrix).
Apply Algorithms 9.1 and 9.2. Compare the solutions. After how many iterations
does the conjugate gradient algorithm identify an iterate with the norm of the
gradient below 10−6 ? Is this consistent with theory?

Algorithms

Algorithms 9.1 and 9.2.

Problems

Exercise 9.1. Use the following procedure to generate quadratic problems for which
the solution and conditioning are known.
a) Consider any randomly defined matrix A ∈ R^{n×n}, for instance

A = [  0.071744   0.039717   0.868964   0.880528   0.969800
       0.085895   0.145339   0.832277   0.691063   0.621372
       0.857871   0.357765   0.151824   0.396765   0.258813
       0.412037   0.521116   0.348378   0.632816   0.416459
       0.806180   0.110585   0.332506   0.986633   0.476912 ].

b) Carry out a QR factorization of A to obtain an orthogonal matrix B:

B = [ -0.057291  -0.025777   0.733559  -0.072330  -0.672839
      -0.068592  -0.258524   0.619722   0.339972   0.654846
      -0.685055  -0.035490  -0.204420   0.657980  -0.233910
      -0.329033  -0.830212  -0.119684  -0.433105  -0.024104
      -0.643777   0.491924   0.147388  -0.508596   0.251334 ].

c) Choose nonnegative eigenvalues λ₁, . . . , λₙ and define D as a diagonal matrix containing these values on the diagonal. For instance,

D = [ 0.1   0    0    0     0
      0     1    0    0     0
      0     0    10   0     0
      0     0    0    100   0
      0     0    0    0     1000 ].

d) Define the matrix Q = BDBᵀ:

Q = [  458.618  -438.512   151.130    18.496  -164.356
      -438.512   444.290  -132.059   -31.033   148.085
       151.130  -132.059    98.474   -22.563   -92.529
        18.496   -31.033   -22.563    20.182    15.406
      -164.356   148.085   -92.529    15.406    89.536 ].
e) Choose a vector x∗, for instance x∗ = (1 . . . 1)ᵀ, and define b = −Qx∗:

b = [ -25.37482
        9.22945
       -2.45361
       -0.48793
        3.85804 ].

Then, x∗ is the solution to min ½ xᵀQx + bᵀx, and the eigenvalues of Q are the same as those of D. The same goes for the conditioning.
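This procedure is a few lines in Python/NumPy. The sketch below is ours; the random seed is arbitrary, so the generated matrices differ from the ones displayed above, but the properties just stated hold by construction:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5
A = rng.random((n, n))                        # step a) a random matrix
B, _ = np.linalg.qr(A)                        # step b) orthogonal factor of A
eigenvalues = np.array([0.1, 1.0, 10.0, 100.0, 1000.0])
D = np.diag(eigenvalues)                      # step c) chosen eigenvalues
Q = B @ D @ B.T                               # step d) Q = B D B^T
x_star = np.ones(n)                           # step e) chosen solution
b = -Q @ x_star

# x* solves min (1/2) x^T Q x + b^T x (its gradient Q x* + b vanishes),
# and the spectrum of Q, hence its conditioning, is that of D.
print(np.linalg.norm(Q @ x_star + b))
print(np.linalg.eigvalsh(Q))
```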
Exercise 9.2. We also consider the matrix Hₙ ∈ R^{n×n} for which the elements Hₙ(i, j) are defined by

Hₙ(i, j) = 1 / (i + j − 1).

This matrix is called the Hilbert matrix of dimension n. It is symmetric, positive definite, but extremely ill-conditioned (calculate its eigenvalues).
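A quick numerical check of these claims (Python/NumPy; `hilbert` is our own helper, written from the definition above):

```python
import numpy as np

def hilbert(n):
    """Hilbert matrix of dimension n: H(i, j) = 1 / (i + j - 1), 1-based."""
    i, j = np.indices((n, n)) + 1
    return 1.0 / (i + j - 1)

H = hilbert(10)
eigenvalues = np.linalg.eigvalsh(H)           # real, since H is symmetric
print(eigenvalues.min())                      # tiny but positive: H is positive definite
print(eigenvalues.max() / eigenvalues.min())  # condition number, of the order of 1e13
```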

Chapter 10

Newton’s local method

We now apply Newton's method in the context of optimization. In this chapter, we do it blindly, and Algorithm 10.1 does not work in general! Its only utility is to provide inspiration for the definition of algorithms that achieve the same rate of convergence as Newton's method.

Contents
10.1 Solving the necessary optimality conditions
10.2 Geometric interpretation
10.3 Exercises

10.1 Solving the necessary optimality conditions

The idea behind Newton's local method is simply to use Algorithm 7.3 to solve the system of equations (5.1),

∇f(x∗) = 0,

which defines the necessary optimality conditions. The algorithm, applied to F(x) = ∇f(x) and J(x) = ∇²f(x), is described as Algorithm 10.1.
It inherits all the properties of Algorithm 7.3. In particular,
1. the method converges q-quadratically under favorable conditions (Theorem 7.13),
2. the method can diverge if the starting point is too far from the solution,
3. the method is not defined if the matrix ∇²f(xk) is singular.
When employed in the context of optimization, Newton's local method presents a further disadvantage. Indeed, solving the first-order necessary optimality conditions does not guarantee that the identified solution is a minimum. Newton's method has no mechanism enabling it to discern minima from maxima and saddle

Algorithm 10.1: Newton's local method
1  Objective
2  To find (an approximation of) a solution to the system
       ∇f(x) = 0.  (10.1)
3  Input
4  The gradient of the function ∇f : Rⁿ → Rⁿ.
5  The Hessian of the function ∇²f : Rⁿ → R^{n×n}.
6  A first approximation of the solution x0 ∈ Rⁿ.
7  The required precision ε ∈ R, ε > 0.
8  Output
9  An approximation of the solution x∗ ∈ Rⁿ.
10 Initialization
11 k := 0.
12 Repeat
13   Calculate dk, solution of ∇²f(xk) dk = −∇f(xk).
14   xk+1 := xk + dk.
15   k := k + 1.
16 Until ‖∇f(xk)‖ ≤ ε
17 x∗ := xk.

points. For instance, applying Algorithm 10.1 to minimize the function of Example 5.8, with x0 = (1, 1)ᵀ, Newton's local method converges rapidly toward

x∗ = ( 0, π/2 )ᵀ,  ∇f(x∗) = ( 0, 0 )ᵀ,  ∇²f(x∗) = [ 1  -1 ; -1  0 ],

which does not satisfy the second-order necessary optimality conditions (Theorem 5.1) and is consequently not a local minimum. It is actually a saddle point. The iterations of the method are illustrated in Figure 10.1 and in Table 10.1.
Since Newton’s local method cannot be used as is, we develop alternative methods
in the following chapters. However, the fast rate of convergence of Newton’s method
prompts us to use it when appropriate. We conclude this chapter with a geometric
interpretation of Newton’s local method in the context of optimization.
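Algorithm 10.1 takes only a few lines to implement. The sketch below (Python/NumPy; ours) applies it to a function consistent with the gradients and Hessians reported in Table 10.1, namely f(x) = x₁²/2 + x₁ cos x₂ — our reconstruction of Example 5.8 — and reproduces the convergence to the saddle point (0, π/2)ᵀ:

```python
import numpy as np

def newton_local(gradient, hessian, x0, eps=1e-15, max_iter=100):
    """Algorithm 10.1: solve grad f(x) = 0 by Newton iterations, blindly."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = gradient(x)
        if np.linalg.norm(g) <= eps:
            break
        d = np.linalg.solve(hessian(x), -g)   # Newton's equations
        x = x + d
    return x

# f(x) = x1^2 / 2 + x1 cos(x2): consistent with the values of Table 10.1
gradient = lambda x: np.array([x[0] + np.cos(x[1]), -x[0] * np.sin(x[1])])
hessian = lambda x: np.array([[1.0, -np.sin(x[1])],
                              [-np.sin(x[1]), -x[0] * np.cos(x[1])]])

x = newton_local(gradient, hessian, [1.0, 1.0])
print(x)  # close to (0, pi/2): a saddle point, not a minimum
```

Note that the iteration stops because the gradient vanishes, with no check whatsoever that the limit point is a minimum.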

10.2 Geometric interpretation

The main idea of Newton's method when solving nonlinear equations is to replace a complicated nonlinear function by a simpler model. In the context of equations, this model is linear (Definition 7.9). Newton's local method, applied in the context of optimization, can be motivated in a similar manner. In this case, the model is no

Table 10.1: Newton's local method for the minimization of Example 5.8

k   xk                                  ∇f(xk)                               ‖∇f(xk)‖         f(xk)
0    1.00000000e+00   1.00000000e+00     1.54030230e+00  -8.41470984e-01     1.75516512e+00   1.04030231e+00
1   -2.33845128e-01   1.36419220e+00    -2.87077027e-02   2.28871986e-01     2.30665381e-01   7.53121618e-02
2    1.08143752e-02   1.58483641e+00    -3.22524807e-03  -1.08133094e-02     1.12840544e-02  -9.33543838e-05
3   -2.13237666e-06   1.57079327e+00     9.22828706e-07   2.13237666e-06     2.32349801e-06   8.79175320e-12
4    1.99044272e-17   1.57079632e+00     8.11347449e-17  -1.99044272e-17     8.35406072e-17   1.35248527e-25
[Figure 10.1: Iterates of Newton's local method for Example 5.8; (a) iterates, (b) zoom around x∗]

longer linear, but quadratic. It is obtained thanks to Taylor's second-order theorem (Theorem C.2).

Definition 10.1 (Quadratic model of a function). Let f : Rⁿ → R be a twice differentiable function. The quadratic model of f in x̂ is a function m_x̂ : Rⁿ → R defined by

m_x̂(x) = f(x̂) + (x − x̂)ᵀ∇f(x̂) + ½ (x − x̂)ᵀ∇²f(x̂)(x − x̂),  (10.2)

where ∇f(x̂) is the gradient of f in x̂ (Definition 2.5) and ∇²f(x̂) is the Hessian matrix of f in x̂ (Definition 2.19). Defining d = x − x̂, we obtain the equivalent formulation

m_x̂(x̂ + d) = f(x̂) + dᵀ∇f(x̂) + ½ dᵀ∇²f(x̂)d.  (10.3)

Note that Definition 10.1 is consistent with Definition 2.28, with Q = ∇²f(x̂), g = ∇f(x̂), and c = f(x̂). If we minimize the model instead of the function, we get the problem

min_{d∈Rⁿ} m_x̂(x̂ + d) = f(x̂) + dᵀ∇f(x̂) + ½ dᵀ∇²f(x̂)d.  (10.4)

The sufficient first-order optimality condition (Theorem 5.7) for (10.4) is written as

∇m_x̂(x̂ + d) = ∇f(x̂) + ∇²f(x̂)d = 0,  (10.5)

i.e.,

d = −∇²f(x̂)⁻¹∇f(x̂)  (10.6)

or

x = x̂ − ∇²f(x̂)⁻¹∇f(x̂).  (10.7)

The sufficient second-order optimality condition requires that the matrix ∇²f(x̂) be positive definite.
Note also that (10.7) is exactly the main formula of Newton's local method (Algorithm 10.1).

When the Hessian matrix of the function is positive definite in xk, an iteration of Newton's local method corresponds to minimizing the quadratic model of the function in xk and thus defining

xk+1 = argmin_{x∈Rⁿ} m_{xk}(x).  (10.8)

Algorithm 10.2: Newton's local method by quadratic modeling
1  Objective
2  To find (an approximation of) a solution to the system
       ∇f(x) = 0.  (10.9)
3  Input
4  The gradient of the function ∇f : Rⁿ → Rⁿ.
5  The Hessian of the function ∇²f : Rⁿ → R^{n×n}.
6  A first approximation of the solution x0 ∈ Rⁿ.
7  The required precision ε ∈ R, ε > 0.
8  Output
9  An approximation of the solution x∗ ∈ Rⁿ.
10 Initialization
11 k := 0.
12 Repeat
13   Create the quadratic model
       m_{xk}(xk + d) = f(xk) + dᵀ∇f(xk) + ½ dᵀ∇²f(xk)d.  (10.10)
14   Calculate
       dk = argmin_d m_{xk}(xk + d)  (10.11)
     using the direct method (Algorithm 9.1) or the conjugate gradient algorithm (Algorithm 9.2).
15   xk+1 := xk + dk.
16   k := k + 1.
17 Until ‖∇f(xk)‖ ≤ ε
18 x∗ := xk.

Algorithm 10.2 is the version of Algorithm 10.1 using the quadratic model. It is important to note that Algorithms 10.1 and 10.2 are equivalent only when the Hessian matrix at the current iterate is positive definite, that is, when the function is locally convex at the current iterate. When solving Example 5.8, illustrated in Table 10.1, this interpretation is not valid. Indeed, we have

 
∇²f(x0) = [  1.00000000e+00  -8.41470984e-01 ; -8.41470984e-01  -5.40302305e-01 ],

for which the eigenvalues are -9.10855416e-01 and 1.37055311e+00. This matrix is not positive definite. Therefore, the necessary optimality condition for the quadratic problem is never satisfied, and there exists no solution to the minimization problem of the quadratic model in x0. The model is shown in Figure 10.2(b). It is not bounded from below. Therefore, Algorithm 10.2 cannot be applied.

[Figure 10.2: Quadratic model for Example 5.8; (a) objective function f(x1, x2), (b) quadratic model m_{x0}(x), unbounded from below]

We illustrate with Example 10.2 the limitations of Newton’s local method by using
the geometric interpretation (Algorithm 10.2).

Example 10.2 (Quadratic model). Consider the function

f(x) = −x⁴ + 12x³ − 47x² + 60x.  (10.12)

We consider three different points and apply Newton's local method.

1. xk = 3. The quadratic model is

m₃(x) = 7x² − 48x + 81,

for which the minimum is xk+1 = 24/7 ≈ 3.4286. Moreover, f(xk) = 0 and f(xk+1) ≈ −1.32, so that f(xk+1) < f(xk). This is a favorable case. The model is shown in Figure 10.3. It can be seen that the model agrees well with the function in the neighborhood of the iterates xk and xk+1.

2. xk = 4. The quadratic model is

m₄(x) = x² − 4x,

for which the minimum is xk+1 = 2. In this case, f(xk) = 0 and f(xk+1) = 12. The iterate generated by the method is worse (in terms of the value of the objective function) than the current iterate. The model is illustrated in Figure 10.4. It can be seen that the iterate xk+1 lies in a region where the model is a poor approximation of the function. Recall that Taylor's theorem guarantees a good fit only in a neighborhood of xk, without specifying the size of that neighborhood. Here, the iterate xk+1 clearly lies outside it.

[Figure 10.3: Illustration of Example 10.2 with xk = 3: f(x) and the convex model m₃(x)]

[Figure 10.4: Illustration of Example 10.2 with xk = 4: f(x) and the model m₄(x)]

[Figure 10.5: Illustration of Example 10.2 with xk = 5: f(x) and the concave model m₅(x)]

3. xk = 5. The quadratic model is

m₅(x) = −17x² + 160x − 375.

This model is concave (its second derivative is negative) and it is not bounded from below. It is not possible to minimize it, and Algorithm 10.2 does not work. Applying Newton's local method (Algorithm 10.1) in xk = 5 corresponds to maximizing this quadratic model, which goes against the desired effect.
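The three cases can be checked numerically. The sketch below (Python; ours) evaluates f′ and f″ at each point; when f″(xk) > 0, the minimizer of the quadratic model is the Newton iterate xk − f′(xk)/f″(xk):

```python
f = lambda x: -x**4 + 12 * x**3 - 47 * x**2 + 60 * x
df = lambda x: -4 * x**3 + 36 * x**2 - 94 * x + 60     # f'
d2f = lambda x: -12 * x**2 + 72 * x - 94               # f''

def newton_step(xk):
    """Minimizer of the quadratic model in one dimension (valid only if f''(xk) > 0)."""
    return xk - df(xk) / d2f(xk)

print(d2f(3.0), newton_step(3.0))  # 14.0 and 24/7: convex model, f decreases
print(d2f(4.0), newton_step(4.0))  # 2.0 and 2.0: the step leaves the good neighborhood
print(d2f(5.0))                    # -34.0: concave model, unbounded from below
```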

We conclude this chapter by defining two particular points that play a role later
on. On the one hand, the point obtained during the iteration of Newton’s local
method is often called Newton’s point.

Definition 10.3 (Newton's point). Let f : Rⁿ → R be a twice differentiable function and let us take xk ∈ Rⁿ such that ∇²f(xk) is positive definite. Newton's point of f in xk is the point

xN = xk + dN,  (10.13)

where dN is the solution to the system of equations

∇²f(xk) dN = −∇f(xk).  (10.14)

The system (10.14) is often called Newton's equations.

Newton's point minimizes the quadratic model of the function in xk. If ∇²f(xk) is positive definite, we have a minimum of the quadratic model in xk. On the other hand, the point minimizing the quadratic model in the direction of steepest descent is called the Cauchy point.¹

Augustin-Louis Cauchy was born in Paris on August 21, 1789.


Cauchy was a pioneer in the study of analysis. In 1814, he
published a thesis on definite integrals that became the basis
of complex functions theory. One year later, he was appointed
professor of analysis at Ecole Polytechnique. In his work, he tried
to demonstrate the proposals that had been put forward so far
as evident and for which there was no proof. Cauchy was the
first to provide rigorous conditions for the convergence of infinite
series and he also gave a precise definition of the integral. He
was a prolific researcher (he wrote approximately 800 mathematical articles), and was disliked by most of his colleagues. He was a convinced royalist and legitimist, and
spent some time in Italy after having refused to pledge allegiance. He resumed his
chair at the Sorbonne in 1848 after the abdication of Louis-Philippe. He kept it until
his death in Sceaux, on May 22, 1857.
Figure 10.6: Augustin-Louis Cauchy

Definition 10.4 (Cauchy's point). Let f : Rⁿ → R be a twice differentiable function and let us take xk ∈ Rⁿ. The Cauchy point of f in xk is the point xC that minimizes the quadratic model of f in the direction of steepest descent, i.e.,

xC = xk − αC ∇f(xk),  (10.15)

where

αC ∈ argmin_{α≥0} m_{xk}(xk − α∇f(xk)).  (10.16)

1 We here refer to Dennis and Schnabel (1996, page 139). Other references (particularly Conn et al., 2000, page 124) define Cauchy's point as the minimum of the quadratic model along the arc obtained by projecting the steepest descent direction onto the trust region.

It is well defined if f is convex in the direction of the gradient. In this case, there is only one minimizer. Using (10.3), we obtain

αC = ∇f(xk)ᵀ∇f(xk) / (∇f(xk)ᵀ∇²f(xk)∇f(xk)).  (10.17)
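As an illustration (the example is our choice, taken from Exercise 10.1 below with n = 2, so f(x) = x₁² + 2x₂² and x̄ = (1, 1)ᵀ), the following Python/NumPy sketch computes both points and compares the objective values:

```python
import numpy as np

def cauchy_and_newton_points(Q, g, x):
    """Cauchy point (10.15)-(10.17) and Newton point (10.13)-(10.14)
    for a quadratic model with Hessian Q and gradient g at x."""
    alpha_c = (g @ g) / (g @ Q @ g)       # (10.17)
    x_c = x - alpha_c * g                 # Cauchy point
    x_n = x + np.linalg.solve(Q, -g)      # Newton point: Q d_N = -g
    return x_c, x_n

f = lambda x: x[0]**2 + 2 * x[1]**2           # Exercise 10.1 with n = 2
xbar = np.array([1.0, 1.0])
g = np.array([2.0 * xbar[0], 4.0 * xbar[1]])  # gradient at xbar
Q = np.array([[2.0, 0.0], [0.0, 4.0]])        # (constant) Hessian

x_c, x_n = cauchy_and_newton_points(Q, g, xbar)
print(f(xbar), f(x_c), f(x_n))  # f decreases: f(x_n) <= f(x_c) <= f(xbar)
```

For a strictly convex quadratic, the Newton point is the global minimum, while the Cauchy point only minimizes along the steepest descent direction.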
10.3 Exercises

For each of the following problems, determine Cauchy's point xC and Newton's point xN in x̄. Each time, compare the values of the objective function at the three points x̄, xC, and xN.

Exercise 10.1. min_{x∈Rⁿ} Σ_{i=1}^{n} i xᵢ²,  x̄ = (1, . . . , 1)ᵀ.

Exercise 10.2. min_{x∈Rⁿ} Σ_{i=1}^{n} xᵢ²,  any x̄.

Exercise 10.3. min_{x∈R²} 2x₁x₂ e^{−(4x₁²+x₂²)/8},  x̄ = (0, 1)ᵀ.

Exercise 10.4. min_{x∈R²} 2x₁x₂ e^{−(4x₁²+x₂²)/8},  x̄ = (4, 4)ᵀ.

Exercise 10.5. min_{x∈R²} 100 (x₂ − x₁²)² + (1 − x₁)²,  x̄ = (−1, −1)ᵀ.

Chapter 11

Descent methods and line search

Newton’s local method may be fast, but it fails regularly. We address the problem in
a different manner. Intuitively, in order to identify iterates with a lower value of the
objective function, we choose to follow the direction with the steepest descent given
by the opposite of the gradient. This idea turns out to be functional, but disastrously
slow. We demonstrate in this chapter how to correct this shortcoming, and how to
combine the two approaches in order to obtain a method that is both fast and robust.

Contents
11.1 Preconditioned steepest descent
11.2 Exact line search
  11.2.1 Quadratic interpolation
  11.2.2 Golden section
11.3 Inexact line search
11.4 Steepest descent method
11.5 Newton method with line search
11.6 The Rosenbrock problem
11.7 Convergence
11.8 Project

We leave Newton's method aside for now (we return to it in Section 11.5) and focus on specific methods for an optimization problem. The main idea is simple. Since we seek the minimum of a function, we attempt to descend, i.e., to generate a sequence of iterates (xk)k such that

f(xk+1) ≤ f(xk),  k = 1, 2, . . .

Theorem 2.11 ensures that such an iterate can be found in a direction d such that ∇f(xk)ᵀd < 0. The methods presented here, often called descent methods, consist of a process involving three main steps:
1. Find a direction dk such that ∇f(xk)ᵀdk < 0.
2. Find a step αk such that f(xk + αk dk) < f(xk).
3. Calculate xk+1 = xk + αk dk and verify a stopping criterion.

11.1 Preconditioned steepest descent

The first idea that comes to mind to define a concrete descent method is to invoke the theorem of the steepest descent (Theorem 2.13) and to choose dk = −∇f(xk). Indeed, it is in this direction that the function has its steepest descent. We often refer to this method as the steepest descent method. An iteration of this method consists in

xk+1 = xk − αk ∇f(xk).  (11.1)

When it comes to the step αk, we choose for the moment one that gives the largest reduction of the function in the direction dk, i.e.,

αk ∈ argmin_{α≥0} f(xk + αdk).  (11.2)

In the presence of multiple minima, it is common to use the first one, that is, αk is the smallest element of argmin_{α≥0} f(xk + αdk). Example 11.1 illustrates this method for a simple case.
for a simple case.
Example 11.1 (Steepest descent method). We minimize the function f : R² → R defined by

f(x) = ½ x₁² + (9/2) x₂²,  (11.3)

by using the steepest descent method. Let xk be the current iterate. The direction of the steepest descent is

dk = −∇f(xk) = ( −(xk)₁, −9(xk)₂ )ᵀ.

To calculate the step αk, we solve the problem in one dimension

min_α f(xk − α∇f(xk)) = min_α ½ ((xk)₁ − α(xk)₁)² + (9/2) ((xk)₂ − 9α(xk)₂)²,

for which the optimal solution is

α = ( (xk)₁² + 81 (xk)₂² ) / ( (xk)₁² + 729 (xk)₂² ).

At each iteration, the steepest descent method generates the point

xk+1 = xk + ( ( (xk)₁² + 81 (xk)₂² ) / ( (xk)₁² + 729 (xk)₂² ) ) ( −(xk)₁, −9(xk)₂ )ᵀ.

By applying this algorithm, starting from the point x0 = (9, 1)ᵀ, we obtain the iterations illustrated in Figure 11.1 and listed (in part) in Table 11.1.
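The iteration of Example 11.1 takes a few lines (Python/NumPy; ours, using the step formula derived above) and reproduces the entries of Table 11.1:

```python
import numpy as np

def steepest_descent_quadratic(x0, n_iter):
    """Exact steepest descent for f(x) = x1^2/2 + 9 x2^2/2 (Example 11.1)."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x]
    for _ in range(n_iter):
        g = np.array([x[0], 9.0 * x[1]])                                 # gradient
        alpha = (x[0]**2 + 81.0 * x[1]**2) / (x[0]**2 + 729.0 * x[1]**2)  # exact step
        x = x - alpha * g
        trajectory.append(x)
    return trajectory

xs = steepest_descent_quadratic([9.0, 1.0], 5)
print(xs[1])  # close to (7.2, -0.8), with alpha = 0.2, as in Table 11.1
print(xs[5])  # close to (2.949120, -0.327680): each iteration shrinks by 0.8
```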
[Figure 11.1: Steepest descent method, illustration of Example 11.1; (a) iterations, (b) zoom around x∗]

In Example 11.1, it is remarkable how slow the steepest descent method is, even though the function to minimize is simple. The zigzag behavior illustrated in Figure 11.1 is characteristic. We show next that the performance can be improved by preconditioning the function (the concepts of conditioning and preconditioning are discussed in Section 2.5).
Table 11.1: Steepest descent method for Example 11.1

k   (xk)1          (xk)2          ∇f(xk)1        ∇f(xk)2        αk   f(xk)
0   +9.000000E+00  +1.000000E+00  +9.000000E+00  +9.000000E+00  0.2  +4.500000E+01
1   +7.200000E+00  -8.000000E-01  +7.200000E+00  -7.200000E+00  0.2  +2.880000E+01
2   +5.760000E+00  +6.400000E-01  +5.760000E+00  +5.760000E+00  0.2  +1.843200E+01
3   +4.608000E+00  -5.120000E-01  +4.608000E+00  -4.608000E+00  0.2  +1.179648E+01
4   +3.686400E+00  +4.096000E-01  +3.686400E+00  +3.686400E+00  0.2  +7.549747E+00
5   +2.949120E+00  -3.276800E-01  +2.949120E+00  -2.949120E+00  0.2  +4.831838E+00
...
20  +1.037629E-01  +1.152922E-02  +1.037629E-01  +1.037629E-01  0.2  +5.981526E-03
21  +8.301035E-02  -9.223372E-03  +8.301035E-02  -8.301035E-02  0.2  +3.828177E-03
22  +6.640828E-02  +7.378698E-03  +6.640828E-02  +6.640828E-02  0.2  +2.450033E-03
23  +5.312662E-02  -5.902958E-03  +5.312662E-02  -5.312662E-02  0.2  +1.568021E-03
24  +4.250130E-02  +4.722366E-03  +4.250130E-02  +4.250130E-02  0.2  +1.003534E-03
25  +3.400104E-02  -3.777893E-03  +3.400104E-02  -3.400104E-02  0.2  +6.422615E-04
...
50  +1.284523E-04  +1.427248E-05  +1.284523E-04  +1.284523E-04  0.2  +9.166662E-09
51  +1.027618E-04  -1.141798E-05  +1.027618E-04  -1.027618E-04  0.2  +5.866664E-09
52  +8.220947E-05  +9.134385E-06  +8.220947E-05  +8.220947E-05  0.2  +3.754665E-09
53  +6.576757E-05  -7.307508E-06  +6.576757E-05  -6.576757E-05  0.2  +2.402985E-09
54  +5.261406E-05  +5.846007E-06  +5.261406E-05  +5.261406E-05  0.2  +1.537911E-09
55  +4.209125E-05  -4.676805E-06  +4.209125E-05  -4.209125E-05  0.2  +9.842628E-10
Example 11.2 (Preconditioned steepest descent method). We minimize the function f : R² → R defined by

f(x) = ½ x₁² + (9/2) x₂²,  (11.4)

by using the steepest descent method and the preconditioning technique from Section 2.5. We have

∇f(x) = ( x₁, 9x₂ )ᵀ  and  ∇²f(x) = [ 1 0 ; 0 9 ] = [ 1 0 ; 0 3 ] [ 1 0 ; 0 3 ]ᵀ.

We use the equations (2.53) and (2.54) to define the change of variables

x₁′ = x₁,  x₂′ = 3x₂,

and we obtain the function

f̃(x′) = ½ x₁′² + (9/2)(x₂′/3)² = ½ x₁′² + ½ x₂′².

Therefore, the direction of the steepest descent is

d = −∇f̃(x′) = ( −x₁′, −x₂′ )ᵀ.

To calculate the step α, we solve the problem in one dimension

min_α f̃(x′ − α∇f̃(x′)) = min_α ½ (x₁′ − αx₁′)² + ½ (x₂′ − αx₂′)²,

for which the optimal solution is α = 1. Then, regardless of the current iterate x′, the steepest descent method always generates the point

( x₁′, x₂′ )ᵀ + ( −x₁′, −x₂′ )ᵀ = 0,

which is the optimal solution to the problem. In this case, the method identifies the minimum of the function in a single iteration.

Clearly, the performance of the steepest descent method can be significantly improved when the function is preconditioned. We can generalize this idea. Let Hk be a symmetric positive definite matrix such that Hk = Lk Lkᵀ. We use Lk to define a change of variables, according to Definition 2.32, i.e.,

x′ = Lkᵀ x.  (11.5)

The steepest descent method for the variables x′ is written as

x′k+1 = x′k − αk ∇f̃(x′k).  (11.6)
By using (2.52), (11.6) is expressed as

x′k+1 = x′k − αk Lk⁻¹ ∇f(Lk⁻ᵀ x′k).  (11.7)

In the original variables, we obtain, by using (11.5),

Lkᵀ xk+1 = Lkᵀ xk − αk Lk⁻¹ ∇f(xk)  (11.8)

or, by multiplying by Lk⁻ᵀ,

xk+1 = xk − αk Lk⁻ᵀ Lk⁻¹ ∇f(xk) = xk − αk Hk⁻¹ ∇f(xk).  (11.9)

Therefore, the preconditioned steepest descent method gives

xk+1 = xk + αk dk  (11.10)

with

dk = −Hk⁻¹ ∇f(xk).  (11.11)

If we denote Dk = Hk⁻¹, we obtain in a similar manner

dk = −Dk ∇f(xk).  (11.12)

It is indeed a descent method: when ∇f(xk) ≠ 0,

∇f(xk)ᵀ dk = −∇f(xk)ᵀ Dk ∇f(xk) < 0,

because Hk is positive definite, and so is Dk. We note that the index k of Dk enables us to precondition the method differently for each iteration.
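The derivation can be checked on Example 11.1 (Python/NumPy; ours): taking Hk equal to the constant Hessian of f, the direction (11.11) with αk = 1 reaches the minimum in a single step, consistently with Example 11.2:

```python
import numpy as np

grad = lambda x: np.array([x[0], 9.0 * x[1]])   # gradient of f(x) = x1^2/2 + 9 x2^2/2
H = np.array([[1.0, 0.0], [0.0, 9.0]])          # Hessian, used as preconditioner H_k

x = np.array([9.0, 1.0])
d = -np.linalg.solve(H, grad(x))                # d_k = -H_k^{-1} grad f(x_k), (11.11)
x_next = x + 1.0 * d                            # with step alpha_k = 1
print(x_next)  # the minimum x* = (0, 0), reached in one preconditioned step
```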
It is important to note that Algorithm 11.1 is not complete. Indeed, nothing is specified regarding the manner in which to generate the positive definite matrices Dk. Moreover, the suggested method to calculate αk at step 15 is not trivial to implement. Finally, certain additional assumptions are necessary in order to ensure that the method converges.
Section 11.2 describes algorithms that are designed to identify (an approximation of) a local minimum of the function along the selected direction dk, that is,

αk ∈ argmin_{α≥0} f(xk + αdk).  (11.13)

However, it is not necessary to select a step αk that minimizes the function along dk. In order to save computing time, we propose in Section 11.3 a characterization of steps that are "acceptable," together with an inexact line search algorithm based on this characterization. After including the line search approach into the steepest descent algorithm in Section 11.4, we propose in Section 11.5 a way to define the preconditioning matrices Dk, inspired by Newton's method.
Algorithm 11.1: Preconditioned steepest descent method
1  Objective
2  To find (an approximation of) a local minimum of the problem
       min_{x∈Rⁿ} f(x).  (11.14)
3  Input
4  The differentiable function f : Rⁿ → R.
5  The gradient of the function ∇f : Rⁿ → Rⁿ.
6  A family of preconditioners (Dk)k such that Dk is positive definite for all k.
7  An initial solution x0 ∈ Rⁿ.
8  The required precision ε ∈ R, ε > 0.
9  Output
10 An approximation of the optimal solution x∗ ∈ Rⁿ.
11 Initialization
12 k := 0.
13 Repeat
14   dk := −Dk ∇f(xk).
15   Determine αk, for instance αk ∈ argmin_{α≥0} f(xk + αdk).
16   xk+1 := xk + αk dk.
17   k := k + 1.
18 Until ‖∇f(xk)‖ ≤ ε
19 x∗ := xk.

11.2 Exact line search

As suggested in Algorithm 11.1, the step to perform along the direction dk may be obtained from solving (11.13). We call this way of calculating the step size an "exact line search," referring to the fact that we are seeking the exact minimum.
The optimization problem (11.13) is a problem with one variable, α, and can be written as

min_{α≥0} h(α) = f(xk + αdk),  (11.15)

where xk is the current iterate and dk is a descent direction. From Theorem 2.11, we know that α = 0 is not a local minimum of this function. Therefore, the constraint α ≥ 0 is inactive at the optimal solution and can be ignored (see Theorem 3.5).
Clearly, Newton's method can be used to solve the problem, if a good approximation of the local optimum is known. The derivatives of h are

h′(α) = dh(α)/dα = ∇f(xk + αdk)ᵀ dk,  (11.16)
and

h″(α) = d²h(α)/dα² = dkᵀ ∇²f(xk + αdk) dk.  (11.17)

We describe two other techniques: the quadratic interpolation method and the golden section method.
11.2.1 Quadratic interpolation

The quadratic interpolation method requires that the function h is continuous and uses only the value of the function, not its derivatives.
Consider three distinct points a < b < c such that h(a) > h(b) and h(c) > h(b), so that, by continuity of h, a local minimum of the function lies in the interval [a, c]. Such points can be generated by Algorithm 11.2. Note that the condition h(δ) < h(0) guarantees that the algorithm does not stop at the first iteration, that is, when only two points have been generated.

Algorithm 11.2: Initialization of the exact line search
1  Objective
2  Find a, b, and c such that a < b < c, h(a) > h(b), and h(c) > h(b).
3  Input
4  A continuous function h : R → R such that the function decreases at 0.
5  δ such that h(δ) < h(0).
6  Initialization
7  x0 := 0.
8  x1 := δ.
9  k := 1.
10 Repeat
11   xk+1 := 2xk.
12   k := k + 1.
13 Until h(xk) > h(xk−1)
14 a := xk−2.
15 b := xk−1.
16 c := xk.

We interpolate a parabola q at the three points. To do so, we identify the parameters β₁, β₂, and β₃ of the quadratic function

q(x) = β₁(x − a)(x − b) + β₂(x − a) + β₃(x − b),  (11.18)

such that q(a) = h(a), q(b) = h(b), and q(c) = h(c). As q(a) = h(a), we obtain immediately that

β₃ = h(a) / (a − b).  (11.19)
Similarly, as q(b) = h(b), we have

β₂ = h(b) / (b − a).  (11.20)

From the last interpolation condition, q(c) = h(c), we obtain after some straightforward derivation

β₁ = ( (b − c)h(a) + (c − a)h(b) + (a − b)h(c) ) / ( (a − b)(c − a)(c − b) ).  (11.21)

As q(a) > q(b) and q(c) > q(b), the quadratic q is convex, and its minimum x∗ corresponds to the point where the derivative is 0. As

q′(x∗) = β₁(2x∗ − a − b) + β₂ + β₃ = 0,  (11.22)

we have

x∗ = ( β₁(a + b) − β₂ − β₃ ) / (2β₁).  (11.23)

The numerator β₁(a + b) − β₂ − β₃ is equal to

( h(a)(b² − c²) + h(b)(c² − a²) + h(c)(a² − b²) ) / ( (a − b)(c − a)(c − b) ),

so that

x∗ = ½ ( h(a)(b² − c²) + h(b)(c² − a²) + h(c)(a² − b²) ) / ( h(a)(b − c) + h(b)(c − a) + h(c)(a − b) ).  (11.24)
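Formula (11.24) is easy to check numerically (Python; the helper name is ours): when h is itself a parabola, the interpolation is exact, so (11.24) must return the true minimizer.

```python
def quadratic_minimizer(a, b, c, h):
    """Minimizer (11.24) of the parabola interpolating h at a, b, and c."""
    num = h(a) * (b**2 - c**2) + h(b) * (c**2 - a**2) + h(c) * (a**2 - b**2)
    den = h(a) * (b - c) + h(b) * (c - a) + h(c) * (a - b)
    return 0.5 * num / den

h = lambda x: (x - 1.5)**2 + 2.0     # a parabola whose minimizer is 1.5
x_star = quadratic_minimizer(0.0, 1.0, 4.0, h)
print(x_star)  # 1.5: the interpolation recovers the exact minimizer
```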
Now, we need to generate a new set of 3 points a⁺, b⁺, c⁺, with the same properties (a⁺ < b⁺ < c⁺, h(a⁺) > h(b⁺), h(c⁺) > h(b⁺)), and such that the interval [a⁺, c⁺] is strictly smaller than [a, c]. We assume that h(x∗) ≠ h(b). If it happens not to be the case, perturb x∗ by a small amount to enforce h(x∗) ≠ h(b). Note that assuming h(x∗) ≠ h(b) implies that x∗ ≠ b.
Suppose first that x∗ lies between b and c, that is a < b < x∗ < c.
• If h(x∗) > h(b), we set a⁺ = a, b⁺ = b, and c⁺ = x∗. The condition a⁺ < b⁺ < c⁺ is trivially verified. The condition h(a⁺) > h(b⁺) is h(a) > h(b), which is verified by assumption, and the condition h(c⁺) > h(b⁺) is h(x∗) > h(b), which is the condition of the case being treated.
• If h(x∗) < h(b), we set a⁺ = b, b⁺ = x∗, and c⁺ = c. The condition a⁺ < b⁺ < c⁺ is trivially verified. The condition h(a⁺) > h(b⁺) is h(b) > h(x∗), which is the condition of the case being treated. The condition h(c⁺) > h(b⁺) is h(c) > h(x∗), which is verified because h(c) > h(b) > h(x∗).
Suppose next that x∗ lies between a and b, that is a < x∗ < b < c.
• If h(x∗) > h(b), we set a⁺ = x∗, b⁺ = b, and c⁺ = c. The condition a⁺ < b⁺ < c⁺ is trivially verified. The condition h(a⁺) > h(b⁺) is h(x∗) > h(b), which is the condition of the case being treated. The condition h(c⁺) > h(b⁺) is h(c) > h(b), verified by assumption.
• If h(x∗) < h(b), we set a⁺ = a, b⁺ = x∗, and c⁺ = b. The condition a⁺ < b⁺ < c⁺ is trivially verified. The condition h(c⁺) > h(b⁺) is h(b) > h(x∗), which is the condition of the case being treated. The condition h(a⁺) > h(b⁺) is h(a) > h(x∗), which is verified because h(a) > h(b) > h(x∗).
The complete procedure is described as Algorithm 11.3.
Algorithm 11.3: Exact line search: quadratic interpolation

1 Objective
2    Find a local minimum of minα≥0 h(α).
3 Input
4    A continuous function h : R → R such that the function decreases at 0.
5    A step δ such that h(δ) < h(0).
6    The desired precision ε > 0.
7 Output
8    α∗, a local minimum of minα≥0 h(α).
9 Initialization
10   Compute a, b, and c such that a < b < c, h(a) > h(b), and h(c) > h(b),
     using Algorithm 11.2.
11 Repeat
12     x∗ := (1/2) [h(a)(b² − c²) + h(b)(c² − a²) + h(c)(a² − b²)] / [h(a)(b − c) + h(b)(c − a) + h(c)(a − b)]
       while h(x∗) = h(b) do    x∗ is perturbed to avoid being stalled
13       if b − a < c − b then
14         x∗ := x∗ + ε/2
15       else
16         x∗ := x∗ − ε/2
17     if x∗ > b then
18       if h(x∗) > h(b) then    the new triplet is a, b, x∗
19         c := x∗
20       else    the new triplet is b, x∗, c
21         a := b
22         b := x∗
23     else if x∗ < b then
24       if h(x∗) > h(b) then    the new triplet is x∗, b, c
25         a := x∗
26       else    the new triplet is a, x∗, b
27         c := b
28         b := x∗
29 Until max(h(a), h(c)) − h(b) ≤ ε or c − a ≤ ε
30 α∗ := b
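Algorithm 11.3 translates almost line by line into code. The following Python function is a sketch of the procedure; the function name is ours, the initial bracket a < b < c (produced by Algorithm 11.2, not shown in this excerpt) is assumed to be supplied by the caller, and the iteration cap and the guard against a vanishing denominator are safeguards added for this sketch.

```python
import math

def quadratic_interpolation(h, a, b, c, eps=1e-3, maxiter=100):
    """Sketch of Algorithm 11.3: line search by quadratic interpolation.

    Assumes a bracket a < b < c with h(a) > h(b) and h(c) > h(b).
    """
    ha, hb, hc = h(a), h(b), h(c)
    for _ in range(maxiter):
        den = ha * (b - c) + hb * (c - a) + hc * (a - b)
        if den == 0.0:
            break  # degenerate parabola: safeguard added in this sketch
        # Minimizer of the parabola interpolating the three bracket points.
        xs = 0.5 * (ha * (b**2 - c**2) + hb * (c**2 - a**2)
                    + hc * (a**2 - b**2)) / den
        hs = h(xs)
        tries = 0
        while hs == hb and tries < 10:  # perturb xs to avoid being stalled
            xs += eps / 2 if b - a < c - b else -eps / 2
            hs = h(xs)
            tries += 1
        if xs > b:
            if hs > hb:                         # new triplet (a, b, xs)
                c, hc = xs, hs
            else:                               # new triplet (b, xs, c)
                a, b, ha, hb = b, xs, hb, hs
        elif xs < b:
            if hs > hb:                         # new triplet (xs, b, c)
                a, ha = xs, hs
            else:                               # new triplet (a, xs, b)
                b, c, hb, hc = xs, b, hs, hb
        if max(ha, hc) - hb <= eps or c - a <= eps:
            break
    return b
```

On the function of Example 11.3, `quadratic_interpolation(lambda x: (2 + x) * math.cos(2 + x), 0.0, 6.0, 12.0)` should reproduce the behavior of Table 11.2 and return a point close to 7.52933.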
Example 11.3 (Exact line search with quadratic interpolation). Consider the func-
tion

h(x) = (2 + x) cos(2 + x). (11.25)


In order to identify a local minimum of h, we apply Algorithm 11.3 with δ = 6 and
ε = 10^{−3}. Note that −1.1640 = h(δ) < h(0) = −0.83229. This value of δ has been
chosen to make the example illustrative. In practice, a smaller value is used (try with
δ = 2).
The first four iterations are illustrated in Figures 11.2–11.5, and all iterations are
reported in Table 11.2.

Figure 11.2: Quadratic interpolation – Iteration 1

Figure 11.3: Quadratic interpolation – Iteration 2


Table 11.2: Iterates of Example 11.3


Iter.   a        b        c        x∗        h(a)       h(b)       h(c)       h(x∗)
 1    0.0      6.0      12.0     3.58364   -0.832294  -1.164      1.91432    4.27225
 2    3.58364  6.0      12.0     8.21855    4.27225   -1.164      1.91432   -7.16487
 3    6.0      8.21855  12.0     8.69855   -1.164     -7.16487    1.91432   -3.13122
 4    6.0      8.21855  8.69855  7.43782   -1.164     -7.16487   -3.13122   -9.43702
 5    6.0      7.43782  8.21855  7.45558   -1.164     -9.43702   -7.16487   -9.45109
 6    7.43782  7.45558  8.21855  7.52836   -9.43702   -9.45109   -7.16487   -9.47729
 7    7.45558  7.52836  8.21855  7.52898   -9.45109   -9.47729   -7.16487   -9.47729
 8    7.52836  7.52898  8.21855  7.52933   -9.47729   -9.47729   -7.16487   -9.47729
 9    7.52898  7.52933  8.21855  7.52933   -9.47729   -9.47729   -7.16487   -9.47729
10    7.52933  7.52933  8.21855  7.52933   -9.47729   -9.47729   -7.16487   -9.47729
11    7.52933  7.52933  8.21855  7.52933   -9.47729   -9.47729   -7.16487   -9.47729
12    7.52933  7.52933  8.21855  7.52933   -9.47729   -9.47729   -7.16487   -9.47729
13    7.52933  7.52933  7.52933  7.52933   -9.47729   -9.47729   -9.47729   -9.47729
Figure 11.4: Quadratic interpolation – Iteration 3

Figure 11.5: Quadratic interpolation – Iteration 4

11.2.2 Golden section


The golden section method requires that the function h be strictly unimodal on an interval [0, T] (see Definition B.6); let α∗ be the global minimum of h on [0, T] (Definition 1.7). The method generates a sequence of intervals [ℓk, uk] such that for
each k,
• [ℓk+1 , uk+1 ] ⊂ [ℓk , uk ], and
• α∗ ∈ [ℓk , uk ].
Consider two points α_1^k and α_2^k such that ℓk < α_1^k < α_2^k < uk. If h(α_1^k) > h(α_2^k), then h is decreasing from α_1^k to α_2^k. Therefore, the global minimum α∗ cannot be smaller than α_1^k (due to the strict unimodality of h). Therefore, α∗ ∈ [α_1^k, uk], which is the next interval of the sequence: ℓk+1 = α_1^k and uk+1 = uk. If h(α_1^k) < h(α_2^k), then h is increasing from α_1^k to α_2^k. Therefore, the global minimum α∗ cannot be greater than α_2^k (due to the strict unimodality of h). Therefore, α∗ ∈ [ℓk, α_2^k], which is the next interval of the sequence: ℓk+1 = ℓk and uk+1 = α_2^k. These two cases are illustrated
in Figure 11.6. If it happens that h(α_1^k) = h(α_2^k), then the strict unimodality of h guarantees that α∗ ∈ [α_1^k, α_2^k], so that it becomes the next interval, and ℓk+1 = α_1^k and uk+1 = α_2^k.
Figure 11.6: Next interval of the golden section method. (a) Case h(α_1^k) > h(α_2^k); (b) case h(α_1^k) < h(α_2^k).

We now define specific rules to choose α_1^k and α_2^k. First, we impose a symmetric reduction of the intervals, that is,

    α_1^k − ℓk = uk − α_2^k = ρ(uk − ℓk),                     (11.26)

where ρ < 1/2 is the shrinking factor of the interval, which is constant across itera-
tions. Second, we choose ρ in order to save on function evaluations. At each iteration
where only one of the two values becomes the bound of the next interval, we recycle
the other value for the next reduction, as illustrated in Figure 11.7.
During iteration k, suppose that the next interval happens to be [ℓk+1, uk+1] = [ℓk, α_2^k]. We select α_2^{k+1} to be equal to α_1^k, so that there is no need to recalculate the value of the

Figure 11.7: Golden section method: recycling function evaluations

function at α_2^{k+1}. Denote by λ the length of the interval:

    λ = uk − ℓk.                                              (11.27)

By symmetry (11.26), we have

    α_1^k − ℓk = uk − α_2^k = ρ(uk − ℓk) = ρλ,                (11.28)

and for the next iteration,

    α_1^{k+1} − ℓk+1 = uk+1 − α_2^{k+1} = ρ(uk+1 − ℓk+1).     (11.29)

We now exploit the fact that ℓk+1 = ℓk, α_2^{k+1} = α_1^k, and uk+1 = α_2^k (see Figure 11.7) to obtain

    α_1^{k+1} − ℓk = α_2^k − α_1^k = ρ(α_2^k − ℓk).           (11.30)

We first derive

    α_2^k − α_1^k = α_2^k − α_1^k + ℓk − ℓk + uk − uk
                  = −(α_1^k − ℓk) − (uk − α_2^k) + uk − ℓk
                  = −ρλ − ρλ + λ          from (11.27) and (11.28)
                  = λ(1 − 2ρ).                                (11.31)

Then, we derive

    α_2^k − ℓk = α_2^k − ℓk + uk − uk
               = −(uk − α_2^k) + (uk − ℓk)
               = −ρλ + λ                  from (11.27) and (11.28)
               = λ(1 − ρ).                                    (11.32)

Using (11.31) and (11.32) in (11.30), we obtain

    λ(1 − 2ρ) = ρλ(1 − ρ),                                    (11.33)

or equivalently,

    ρ² − 3ρ + 1 = 0.                                          (11.34)
This equation has two solutions:
    (3 + √5)/2   and   (3 − √5)/2.                            (11.35)
As the shrinking factor ρ has to be less than 1/2, we select
    ρ = (3 − √5)/2.                                           (11.36)
Example 11.4 (Exact line search with golden section). Consider the function

h(x) = (2 + x) cos(2 + x). (11.37)

The function is strictly unimodal in the interval [5, 10]. We apply Algorithm 11.4 with ε = 10^{−3} to identify the global minimum of h in this interval.

Algorithm 11.4: Exact line search: golden section

1 Objective
2    Find (an approximation of) the global minimum of h(α) on [ℓ, u].
3 Input
4    An interval [ℓ, u] ⊂ R.
5    A function h : R → R strictly unimodal on [ℓ, u].
6    The desired precision ε > 0.
7 Output
8    α∗, an approximation of the global minimum of h(α) on [ℓ, u].
9 Initialization
10   k := 1
11   ℓ1 := ℓ
12   u1 := u
13   ρ := (3 − √5)/2
14   α_1^1 := ℓ1 + ρ(u1 − ℓ1)
15   α_2^1 := u1 − ρ(u1 − ℓ1).
16 Repeat
17   if h(α_1^k) = h(α_2^k) then
18     ℓk+1 := α_1^k
19     uk+1 := α_2^k
20     α_1^{k+1} := ℓk+1 + ρ(uk+1 − ℓk+1)
21     α_2^{k+1} := uk+1 − ρ(uk+1 − ℓk+1).
22   else if h(α_1^k) > h(α_2^k) then
23     ℓk+1 := α_1^k
24     uk+1 := uk
25     α_1^{k+1} := α_2^k
26     α_2^{k+1} := uk+1 − ρ(uk+1 − ℓk+1).
27   else
28     ℓk+1 := ℓk
29     uk+1 := α_2^k
30     α_1^{k+1} := ℓk+1 + ρ(uk+1 − ℓk+1)
31     α_2^{k+1} := α_1^k.
32   k := k + 1.
33 Until uk − ℓk ≤ ε
34 α∗ := (ℓk + uk)/2
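A possible Python transcription of Algorithm 11.4 is sketched below (the function name is ours). Note how each branch recycles one of the two function evaluations, so that a single new evaluation of h is needed per iteration, except when h(α_1^k) = h(α_2^k).

```python
import math

def golden_section(h, l, u, eps=1e-3):
    """Sketch of Algorithm 11.4: golden section search on [l, u].

    h is assumed strictly unimodal on [l, u].
    """
    rho = (3 - math.sqrt(5)) / 2          # shrinking factor, eq. (11.36)
    a1 = l + rho * (u - l)
    a2 = u - rho * (u - l)
    h1, h2 = h(a1), h(a2)
    while u - l > eps:
        if h1 == h2:
            l, u = a1, a2                 # both points become the new bounds
            a1 = l + rho * (u - l)
            a2 = u - rho * (u - l)
            h1, h2 = h(a1), h(a2)
        elif h1 > h2:
            l = a1
            a1, h1 = a2, h2               # recycle the evaluation at a2
            a2 = u - rho * (u - l)
            h2 = h(a2)
        else:
            u = a2
            a2, h2 = a1, h1               # recycle the evaluation at a1
            a1 = l + rho * (u - l)
            h1 = h(a1)
    return (l + u) / 2
```

On Example 11.4, `golden_section(lambda x: (2 + x) * math.cos(2 + x), 5.0, 10.0)` should return a point within ε of the minimizer 7.52933 visible in Table 11.3.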
The intervals generated during the first iterations are represented in Figure 11.8.
The details of each iteration are reported in Table 11.3.
Figure 11.8: Intervals of the first iterations of the golden section method on Example 11.4

Table 11.3: Iterates of Example 11.4


k    ℓk        α_1^k     α_2^k     uk        h(α_1^k)   h(α_2^k)
1 5.0 6.90983 8.09017 10.0 -7.75439 -7.93768
2 6.90983 8.09017 8.81966 10.0 -7.93768 -1.89353
3 6.90983 7.63932 8.09017 8.81966 -9.41833 -7.93768
4 6.90983 7.36068 7.63932 8.09017 -9.34146 -9.41833
5 7.36068 7.63932 7.81153 8.09017 -9.41833 -9.08684
6 7.36068 7.53289 7.63932 7.81153 -9.47723 -9.41833
7 7.36068 7.46711 7.53289 7.63932 -9.45863 -9.47723
8 7.46711 7.53289 7.57354 7.63932 -9.47723 -9.4678
9 7.46711 7.50776 7.53289 7.57354 -9.47504 -9.47723
10 7.50776 7.53289 7.54842 7.57354 -9.47723 -9.47553
11 7.50776 7.52329 7.53289 7.54842 -9.47712 -9.47723
12 7.52329 7.53289 7.53882 7.54842 -9.47723 -9.47686
13 7.52329 7.52922 7.53289 7.53882 -9.47729 -9.47723
14 7.52329 7.52696 7.52922 7.53289 -9.47727 -9.47729
15 7.52696 7.52922 7.53062 7.53289 -9.47729 -9.47729
16 7.52696 7.52836 7.52922 7.53062 -9.47729 -9.47729
17 7.52836 7.52922 7.52976 7.53062 -9.47729 -9.47729
18 7.52836 7.52889 7.52922 7.52976 -9.47729 -9.47729
19 7.52889 7.52922 7.52943 7.52976 -9.47729 -9.47729

The name of the method comes from the golden ratio. Two quantities a and b
are in the golden ratio if

    (a + b)/a = a/b = φ,                                      (11.38)
where

    φ = (1 + √5)/2 ≈ 1.618                                    (11.39)
is the golden ratio. In geometry, a golden rectangle is a rectangle that can be cut
into a square and a rectangle similar to the original one (see Figure 11.9). Its side
lengths are in the golden ratio. The golden rectangle has been used in architecture for its aesthetic properties.
Figure 11.9: A golden rectangle

In Algorithm 11.4, the distances of the two points α1 and α2 to the lower bound of the interval, that is α1 − ℓ and α2 − ℓ, are in the golden ratio. Indeed,

    (α2 − ℓ)/(α1 − ℓ) = (α2 − α1 + α1 − ℓ)/(α1 − ℓ)
                      = (λ(1 − 2ρ) + ρλ)/(ρλ)     from (11.31)
                      = (1 − ρ)/ρ
                      = (1 + √5)/2                from (11.36)
                      = φ                         from (11.39).
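These identities are straightforward to check numerically. The snippet below verifies that ρ solves (11.34), that the ratio derived above equals φ, and, as an additional observation not stated in the text, that ρ = 1/φ².

```python
import math

rho = (3 - math.sqrt(5)) / 2   # shrinking factor, equation (11.36)
phi = (1 + math.sqrt(5)) / 2   # golden ratio, equation (11.39)

# rho solves rho^2 - 3 rho + 1 = 0, equation (11.34).
assert abs(rho**2 - 3 * rho + 1) < 1e-12
# The ratio (alpha2 - l) / (alpha1 - l) computed above equals phi.
assert abs((1 - rho) / rho - phi) < 1e-12
# Observation (not stated in the text): rho = 1 / phi^2.
assert abs(rho - 1 / phi**2) < 1e-12
```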
11.3 Inexact line search

We want to spend as little effort as possible calculating the step. Instead of trying to
solve a one-dimensional optimization problem as in the previous section, we consider
here a trial-and-error approach, where various values are tested for the step α, and
the first one that is suitable is accepted. It means that we need formal conditions that
distinguish acceptable from unacceptable steps. In principle, to maintain consistency
with theory, small steps should be used. Indeed, Taylor’s theorem guarantees that
performing a small step along a descent direction decreases the value of the function.
However, we would also like our algorithm to progress rapidly, which encourages us to consider large steps. In order to reconcile these two contradictory objectives, we
establish a kind of contract: large steps are acceptable provided that the reduction
that is achieved is substantial. If not, they are rejected, and smaller steps should be
considered. In order to formally characterize this “contract,” we introduce the notion
of sufficient decrease of the function.

Solving the problem
    αk ∈ argmin_{α≥0} f(xk + αdk),                            (11.40)
at each iteration using a technique like those described in Section 11.2 may be unnecessarily demanding in terms of computational effort.
Instead of an exact line search method, we describe here an inexact line search,
based on the characterization of what is acceptable and what is not. Once these
characterization conditions are defined, “candidates” are generated for step lengths,
thanks to simple and computationally cheap algorithms, until an acceptable step is
produced.
We start by illustrating the fact that the condition f(xk + αk dk ) < f(xk ) is not
sufficient for αk to be considered an acceptable step.
Example 11.5 (Descent method: too large steps). Consider the one-variable function

f(x) = x².

We apply Algorithm 11.1 with x0 = 2 and

    Dk = 1/(2|xk|) = sgn(xk)/(2xk),
    αk = 2 + 3(2^{−k−1}).

Note that Dk is positive (definite) for all k. Since ∇f(xk ) = 2xk , we have dk =
−Dk ∇f(xk ) = − sgn(xk ). In this case, the method is written as

    xk+1 = xk − 2 − 3(2^{−k−1})   if xk ≥ 0,
         = xk + 2 + 3(2^{−k−1})   if xk < 0.                  (11.41)
which gives the sequence of iterates listed in Table 11.4 and illustrated in Figure 11.10.
We show by induction that, in this case,

    xk = (−1)^k (1 + 2^{−k})                                  (11.42)

and

    |xk+1| < |xk|.                                            (11.43)

Table 11.4: Iterates of Example 11.5


k xk dk αk
0 +2.000000e+00 -1 +3.500000e+00
1 -1.500000e+00 1 +2.750000e+00
2 +1.250000e+00 -1 +2.375000e+00
3 -1.125000e+00 1 +2.187500e+00
4 +1.062500e+00 -1 +2.093750e+00
5 -1.031250e+00 1 +2.046875e+00
...
46 +1.000000e+00 -1 +2.000000e+00
47 -1.000000e+00 1 +2.000000e+00
48 +1.000000e+00 -1 +2.000000e+00
49 -1.000000e+00 1 +2.000000e+00
50 +1.000000e+00 -1 +2.000000e+00

Figure 11.10: Iterates of Example 11.5

The cases k = 0 and k = 1 are verified numerically (Table 11.4). We now assume
that k is even and that (11.42) and (11.43) are verified for k. We note that the parity
of k and (11.42) ensure that xk > 0. Then,

    xk+1 = xk − 2 − 3(2^{−k−1})                   from (11.41)
         = (−1)^k (1 + 2^{−k}) − 2 − 3(2^{−k−1})  from (11.42)
         = (1 + 2^{−k}) − 2 − 3(2^{−k−1})         because k is even
         = 1 + 2^{1−(k+1)} − 2 − 3(2^{−(k+1)})
         = −1 − 2^{−(k+1)}
         = (−1)^{k+1} (1 + 2^{−(k+1)})            because k is even.

Since k is even, xk > 0 and xk+1 < 0. Therefore,

    |xk| − |xk+1| = xk + xk+1
                  = 1 + 2^{−k} − 1 − 2^{−(k+1)}
                  = (1/2) 2^{−k} > 0.
The case where k is odd is demonstrated in a similar manner. We deduce directly from (11.43) that (xk+1)² < (xk)² and

f(xk+1 ) < f(xk ) ,

demonstrating that it is indeed a descent method generating iterates, each of which is strictly better than the previous one, not only because the objective function strictly decreases, but also because each iterate is closer to the minimum than the previous one. However, the sequence (xk)k does not converge and has two accumulation points, at −1 and 1. Neither of these points is a local minimum of the function.
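The oscillation is easy to reproduce numerically. The following sketch (the function name is ours) iterates the update (11.41) and exhibits the two accumulation points at ±1:

```python
def too_long_steps(x0=2.0, iters=50):
    """Iterates of Example 11.5: steps that are too long for f(x) = x^2."""
    x = x0
    for k in range(iters):
        d = -1.0 if x >= 0 else 1.0          # descent direction d_k = -sgn(x_k)
        alpha = 2.0 + 3.0 * 2.0 ** (-k - 1)  # step alpha_k, too long
        x = x + alpha * d
    return x

# Even iterates approach +1 and odd iterates approach -1:
# two accumulation points, neither of which minimizes f(x) = x^2.
```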

The reason that the presented algorithm fails in Example 11.5 is the disproportion
between the length of the step and the resulting decrease of the objective function.
Indeed, the notion of a descent direction (Definition 2.10) is based on Taylor’s the-
orem, which is valid only in a neighborhood of the current iterate. As soon as we
select a larger step than η of Theorem 2.11, the fact that the direction is a descent
direction is no longer relevant, and the fact that the new iterate happens to be better
is coincidental. This is the case with Example 11.5, where the next iterate actually
lies in a region where the function is increasing along the followed direction.
To avoid this inconvenience, it is necessary to impose a condition characterizing
the notion of sufficient decrease of the function. One idea is to consider a decrease
of the function to be sufficient if the improvement of the objective function is pro-
portional to the length of the step. Concretely, we select γ > 0, and consider a step
αk to be acceptable if
f(xk ) − f(xk + αk dk ) ≥ αk γ ,
or
f(xk + αk dk ) ≤ f(xk ) − αk γ . (11.44)
The factor γ cannot be chosen arbitrarily. In particular, it should vary from one
direction to another. Returning to Example 11.2, Figure 11.11 illustrates the shape of
the function f(x0 + αd) when going from x0 = (10, 1)^T in two different normalized directions, as well as the straight line f(x0) − αγ, with γ = 6.
According to Figure 11.11(a), a sufficient decrease of the function in the direction d = (−10/√181, −9/√181)^T is obtained for several values of α, especially between 0 and 3.25478. However, it can be seen in Figure 11.11(b) that no value of α allows a sufficient decrease in the direction d = (−2/√5, 1/√5)^T with regard to the condition (11.44). Indeed, the straight line is too steep, while the function is relatively flat in this direction. The requirement associated with this value of γ is too strong.
Instead of using an arbitrary fixed value for γ, it is more appropriate to define it
proportional to the slope of the function in xk in the direction dk :

γ = −β∇f(xk )T dk ,

with 0 < β < 1. Then, the closer the directional derivative ∇f(xk )T dk is to zero, the
smaller the slope of the line, and vice versa. Note that, if β = 0, (11.44) would collapse
to f(xk + αk dk ) ≤ f(xk ), which we have shown to be inappropriate. Geometrically, the
line setting the threshold for the objective function value is horizontal in this case.
So the value β = 0 is excluded. The value β = 1 corresponds to the tangent line. If
the function happens to be convex at xk (which is the case close to a local minimum),
the tangent lies entirely below the function (see Theorem 2.16), and no value of αk
verifies (11.44). Again, this value of β is excluded, justifying the condition 0 < β < 1.

Definition 11.6 (Sufficient decrease: the first Wolfe condition). Consider the differ-
entiable function f : Rn → R, a point xk ∈ Rn , a (descent) direction dk ∈ Rn such
that ∇f(xk )T dk < 0 and a step αk ∈ R, αk > 0. We say that the function f decreases
sufficiently in xk + αk dk compared with xk if

f(xk + αk dk ) ≤ f(xk ) + αk β1 ∇f(xk )T dk , (11.45)

with 0 < β1 < 1. The condition (11.45) is called the first Wolfe condition after Wolfe
(1969), or the Armijo condition, after Armijo (1966).

It is important to note that (2.14) in the theorem on descent directions (Theorem
2.11) guarantees that there always exist steps satisfying the condition (11.45). The
condition (11.45) is illustrated in Figure 11.12 with β1 = 0.5 and in Figure 11.13 with
β1 = 0.1. In each case, there exist steps ensuring a sufficient decrease.
The condition (11.45) enables us to reject steps that, due to being too large, do
not provide a sufficient decrease of the function. Having solved the problem of too
large steps, we now consider steps that are too small. These may cause a problem,
as shown in Example 11.7.
Figure 11.11: Decrease of the function of Example 11.2. (a) dk = (−10/√181, −9/√181)^T; (b) dk = (−2/√5, 1/√5)^T.


Figure 11.12: Condition (11.45) with β1 = 0.5. (a) dk = (−10/√181, −9/√181)^T; (b) dk = (−2/√5, 1/√5)^T.


Figure 11.13: Condition (11.45) with β1 = 0.1. (a) dk = (−10/√181, −9/√181)^T; (b) dk = (−2/√5, 1/√5)^T.


Example 11.7 (Descent method: too small steps). Consider the one-variable func-
tion
f(x) = x².
We apply Algorithm 11.1 with x0 = 2 and

    Dk = 1/(2xk),
    αk = 2^{−k−1}.

Note that Dk is positive (definite) for all k if xk > 0, which is the case in this example. Since ∇f(xk) = 2xk, we have dk = −Dk ∇f(xk) = −1. In this case, the method is written as

    xk+1 = xk − 2^{−k−1}.

The sequence of iterates (xk)k is listed in Table 11.5. We show by induction that it is defined by

    xk = 1 + 2^{−k}.                                          (11.46)

Table 11.5: Iterates of Example 11.7


k xk dk αk
0 +2.000000e+00 -1 +5.000000e-01
1 +1.500000e+00 -1 +2.500000e-01
2 +1.250000e+00 -1 +1.250000e-01
3 +1.125000e+00 -1 +6.250000e-02
4 +1.062500e+00 -1 +3.125000e-02
5 +1.031250e+00 -1 +1.562500e-02
...
46 +1.000000e+00 -1 +7.105427e-15
47 +1.000000e+00 -1 +3.552714e-15
48 +1.000000e+00 -1 +1.776357e-15
49 +1.000000e+00 -1 +8.881784e-16
50 +1.000000e+00 -1 +4.440892e-16

The cases k = 0 and k = 1 are numerically verified (Table 11.5). If we assume that
(11.46) is verified for k, then
    xk+1 = xk − 2^{−k−1} = 1 + 2^{−k} − 2^{−k−1}
         = 1 + 2^{−k−1}(2 − 1) = 1 + 2^{−(k+1)}
and the recurrence is verified. From Equation (11.46), we immediately deduce xk+1 <
xk and, consequently, f(xk+1 ) < f(xk ). We thus have a descent method. However,
the sequence (xk )k converges toward 1, which is not a local minimum of the function.
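This failure mode can also be reproduced in a few lines (the function name is ours):

```python
def too_short_steps(x0=2.0, iters=60):
    """Iterates of Example 11.7: steps that vanish too quickly for f(x) = x^2."""
    x = x0
    for k in range(iters):
        x = x - 2.0 ** (-k - 1)   # x_{k+1} = x_k - 2^(-k-1)
    return x

# The iterates decrease monotonically but stall at 1 + 2^(-iters),
# far from the minimizer x* = 0 of f(x) = x^2.
```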
In this case, the reason the method fails is due to the degeneracy of the steps αk .
Although these steps are positive, they are closer and closer to 0 and at some point,
the method can no longer progress. A technique aiming to prevent this again exploits
the derivative of the function in the direction dk . At the point xk , the directional
derivative ∇f(xk )T dk is negative, because dk is a descent direction. If we were performing an exact line search (see Section 11.2) where the step α∗ corresponds to a local
minimum of the function in the direction dk, we would have ∇f(xk + α∗ dk )T dk = 0.
Then, between xk and xk + α∗ dk , the derivative of the function increases compared
to its initial negative value. To ensure sufficiently large steps, the idea is to obtain a
step such that the directional derivative increases sufficiently.

Definition 11.8 (Sufficient progress: the second Wolfe condition). Let f : Rn → R
be a differentiable function, and let us take a point xk ∈ Rn , a (descent) direction
dk ∈ Rn such that ∇f(xk )T dk < 0 and a step αk ∈ R, αk > 0. We say that the point
xk + αk dk enables sufficient progress compared with xk if

∇f(xk + αk dk )T dk ≥ β2 ∇f(xk )T dk (11.47)

with 0 < β2 < 1. The condition (11.47) is called the second Wolfe condition, after
Wolfe (1969).
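The two Wolfe conditions can be packaged as a small test. The sketch below, in Python with NumPy, uses a function name and interface of our own design:

```python
import numpy as np

def wolfe_conditions(f, grad, x, d, alpha, beta1=1e-4, beta2=0.99):
    """Check the Wolfe conditions (11.45) and (11.47) for the step alpha.

    Requires 0 < beta1 < beta2 < 1 and grad(x) @ d < 0 (descent direction).
    Returns the pair (sufficient_decrease, sufficient_progress).
    """
    slope = grad(x) @ d                  # directional derivative, negative
    x_new = x + alpha * d
    decrease = f(x_new) <= f(x) + alpha * beta1 * slope   # condition (11.45)
    progress = grad(x_new) @ d >= beta2 * slope           # condition (11.47)
    return decrease, progress
```

For instance, on f(x) = x² starting from x = 2 with d = −1, β1 = 0.3 and β2 = 0.7, the too-long step α = 3.5 of Example 11.5 violates (11.45), while a tiny step such as α = 0.1 violates (11.47).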

The condition (11.47) can also be written as
    ∇f(xk + αk dk)^T dk / (∇f(xk)^T dk) ≤ β2.                 (11.48)

This is illustrated in Figure 11.14, where the straight dotted lines represent the ratio on the left-hand side of (11.48). By choosing, for instance, β2 = 0.5, the step α should be
such that α ≥ 1.4687 in Figure 11.14(a) and α ≥ 0.94603 in Figure 11.14(b).
The conditions (11.45) and (11.47) are called the Wolfe conditions, after Wolfe
(1969) and Wolfe (1971). They are sometimes called Armijo-Goldstein conditions,
making reference to the work of Armijo (1966), and Goldstein and Price (1967).
As the two conditions have conflicting goals (one forbidding long steps, one for-
bidding short steps), it may happen that they are incompatible, and that no step
verifies both. The next theorem shows that if the parameter β1 of the first condition,
and the parameter β2 of the second are chosen such that 0 < β1 < β2 < 1, the two
conditions are compatible.

Theorem 11.9 (Validity of the Wolfe conditions). Let f : Rn → R be a differentiable
function, and let us take a point xk ∈ Rn and a (descent) direction dk ∈ Rn such
that ∇f(xk )T dk < 0. We assume that f is bounded from below in the direction
dk , i.e., that there exists f0 such that f(xk + αdk ) ≥ f0 for all α ≥ 0. If 0 < β1 <
1, there exists η such that the first Wolfe condition (11.45) is satisfied for all
αk ≤ η. Moreover, if 0 < β1 < β2 < 1, there exists α2 > 0 such that the two
Wolfe conditions (11.45) and (11.47) are both satisfied.
Figure 11.14: Sufficient progress of the function of Example 11.2 (β2 = 0.5). (a) dk = (−10/√181, −9/√181)^T; (b) dk = (−2/√5, 1/√5)^T.

Proof. Since dk is a descent direction, we invoke the theorem of descent directions
(Theorem 2.11). Note that (2.14) is equivalent to (11.45), which proves the first part
of the theorem. Note also that there exist steps such that the condition (11.45) is
not satisfied. Indeed, as the condition is defined by a decreasing line, that is, an
unbounded function, the fact that f is bounded from below guarantees that, for some
large steps, the line lies below the function. In particular, when

    α > (f0 − f(xk)) / (β1 ∇f(xk)^T dk),                      (11.49)
we have (the direction of the inequality changes because ∇f(xk )T dk < 0)

f(xk ) + αβ1 ∇f(xk )T dk < f0 ≤ f(xk + αdk ) .

By continuity of f, there exists α1 such that
f(xk + α1 dk ) = f(xk ) + α1 β1 ∇f(xk )T dk , (11.50)

i.e., such that the straight line f(xk ) + αβ1 ∇f(xk )T dk intersects the function.

Figure 11.15: Validity of the Wolfe conditions: illustration of the proof

We invoke the mean value theorem (Theorem C.1) that says that there is a step
α2 between 0 and α1 such that the function has the same slope in xk + α2 dk as the
line (see Figure 11.15). Formally, we use Equation (C.2) with d = α1 dk . There exists
0 ≤ α ′ ≤ 1 such that

f(xk + α1 dk ) = f(xk ) + α1 dTk ∇f(xk + α ′ α1 dk ) . (11.51)

If we take α2 = α ′ α1 , we combine (11.50) and (11.51) to obtain

f(xk ) + α1 β1 ∇f(xk )T dk = f(xk ) + α1 dTk ∇f(xk + α2 dk )

or

    β1 = dk^T ∇f(xk + α2 dk) / (∇f(xk)^T dk).

Then, since β2 > β1, we have

    β2 > dk^T ∇f(xk + α2 dk) / (∇f(xk)^T dk),

and (11.47) is satisfied for α2.


An inexact line search method enables us to identify a step that satisfies the Wolfe
conditions (11.45) and (11.47). Algorithm 11.5 is attributed to Fletcher (1980) and
Lemaréchal (1981). The idea is simple: a trial step is tested. If it is too long, that is,
if it violates the first Wolfe condition, it is shortened. If it is too short, that is, if it
violates the second Wolfe condition, it is made longer. This process is repeated until
a step verifying both conditions is found. Theorem 11.9 guarantees that such a step
exists when 0 < β1 < β2 < 1.

Algorithm 11.5: Line search

1 Objective
2    To find a step α∗ such that the Wolfe conditions (11.45) and (11.47) are
     satisfied.
3 Input
4    The continuously differentiable function f : Rn → R.
5    The gradient of the function ∇f : Rn → Rn.
6    A vector x ∈ Rn.
7    A descent direction d such that ∇f(x)T d < 0.
8    An initial solution α0 > 0 (e.g., α0 = 1).
9    Parameters β1 and β2 such that 0 < β1 < β2 < 1 (e.g., β1 = 10^{−4} and
     β2 = 0.99).
10   A parameter λ > 1 (e.g., λ = 2).
11 Output
12   A step α∗ such that the conditions (11.45) and (11.47) are satisfied.
13 Initialization
14   i := 0.
15   αℓ := 0.
16   αr := +∞.
17 Repeat
18   if αi violates (11.45) then    the step is too long
19     αr := αi
20     αi+1 := (αℓ + αr)/2.
21   if αi does not violate (11.45) but violates (11.47) then    the step is too
     short
22     αℓ := αi
23     if αr < +∞ then
24       αi+1 := (αℓ + αr)/2
25     else
26       αi+1 := λαi
27   i := i + 1.
28 Until αi satisfies the conditions (11.45) and (11.47)
29 α∗ := αi
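A direct Python transcription of Algorithm 11.5 might look as follows; the function name is ours, and the iteration cap is a safeguard added to the sketch (Theorem 11.10 guarantees termination under the stated assumptions):

```python
import numpy as np

def line_search(f, grad, x, d, alpha0=1.0, beta1=1e-4, beta2=0.99,
                lam=2.0, maxiter=100):
    """Sketch of Algorithm 11.5: find a step satisfying both Wolfe conditions.

    Assumes d is a descent direction and f is bounded below along d.
    """
    slope = grad(x) @ d
    assert slope < 0, "d must be a descent direction"
    alpha, alpha_l, alpha_r = alpha0, 0.0, np.inf
    for _ in range(maxiter):
        if f(x + alpha * d) > f(x) + alpha * beta1 * slope:
            alpha_r = alpha                   # violates (11.45): too long
            alpha = (alpha_l + alpha_r) / 2
        elif grad(x + alpha * d) @ d < beta2 * slope:
            alpha_l = alpha                   # violates (11.47): too short
            alpha = ((alpha_l + alpha_r) / 2 if np.isfinite(alpha_r)
                     else lam * alpha)
        else:
            return alpha                      # both conditions satisfied
    return alpha
```

On a simple quadratic such as f(x) = ‖x‖², the step returned from any starting point along a descent direction satisfies both conditions (11.45) and (11.47).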
Table 11.6 lists the steps of the algorithm applied to the function of Example 11.2, with

    x = (10, 1)^T,   d = (−2/√5, 1/√5)^T,
    α0 = 10^{−3},  β1 = 0.3,  β2 = 0.7,  λ = 20.
Note that, for reasons of implementation, the quantity +∞ is represented by
9.99999000e+05. The values of the parameters used in this example have been chosen to illustrate all the cases and are not appropriate in practice. The value of β1 should
be close to 0 (for instance, β1 = 10−4 ) and the value of β2 should be close to 1 (for
instance, β2 = 0.99). A smaller value for λ (such as λ = 2) is also more appropriate
in practice.

Table 11.6: Illustration of the line search for Example 11.2


αi αℓ αr Violated cond.
1.00000000e-03 0.00000000e+00 9.99999000e+05 (11.47)
2.00000000e-02 1.00000000e-03 9.99999000e+05 (11.47)
4.00000000e-01 2.00000000e-02 9.99999000e+05 (11.47)
8.00000000e+00 4.00000000e-01 9.99999000e+05 (11.45)
4.20000000e+00 4.00000000e-01 8.00000000e+00 (11.45)
2.30000000e+00 4.00000000e-01 4.20000000e+00 —

Theorem 11.10 (Finiteness of the line search algorithm). Following the same as-
sumptions as those of Theorem 11.9, the line search (Algorithm 11.5) ends after
a finite number of iterations.

Proof. We first assume, by contradiction, that limi→∞ αi = +∞. This signifies that
αr permanently keeps its initial value αr = ∞, and that condition on line 21 of the
algorithm is always verified. Therefore, αi never violates (11.45). This is impossible,
as was discussed in the proof of Theorem 11.9. Indeed, the condition (11.45) is
violated as soon as αi is sufficiently large, i.e., as soon as (11.49) is satisfied. Then,
after a finite number of iterations, αr < ∞, and the following iterations all consist in
a reduction of the step
αℓ + αr
αi+1 = ,
2
either at line 20 or 24 of the algorithm.
We now assume (by contradiction) that the algorithm performs an infinite number
of iterations. In this case,
    lim_{i→∞} (αir − αiℓ) = 0,

where αir and αiℓ denote the values of αr and αℓ, respectively, at iteration i. Indeed, regardless of the case that applies, the interval is divided by two at each iteration, and we always have

    αri+1 − αℓi+1 = (αir − αiℓ)/2.
Then, there exists α∗ such that
    α∗ = lim_{i→∞} αir = lim_{i→∞} αiℓ = lim_{i→∞} αi.
As αℓ is updated only when the condition on line 21 of the algorithm is verified, it
means that the condition (11.45) is satisfied for all αiℓ , that is,
f(xk + αiℓ dk ) ≤ f(xk ) + αiℓ β1 ∇f(xk )T dk for each i.


Taking the limit i → ∞, we obtain
f(xk + α∗ dk ) ≤ f(xk ) + α∗ β1 ∇f(xk )T dk . (11.52)
Similarly, as αr is updated only when the condition on line 18 of the algorithm is
verified, the condition (11.45) is not satisfied for any αir , that is,
f(xk + αir dk ) > f(xk ) + αir β1 ∇f(xk )T dk for each i. (11.53)
At the limit,
f(xk + α∗ dk ) ≥ f(xk ) + α∗ β1 ∇f(xk )T dk . (11.54)
Note that the equality is not satisfied for any αir ,
but can be reached at the limit.
Actually, by combining (11.52) and (11.54), we observe that it is reached at the limit,
as we have
f(xk + α∗ dk ) = f(xk ) + α∗ β1 ∇f(xk )T dk . (11.55)
Therefore, the limit value α∗ does not violate the Wolfe condition (11.45), while
every αir does. Consequently, α∗ has to be different than any αir , that is, αir > α∗ or
αir − α∗ > 0. In (11.53), we replace f(xk ) by its value derived from (11.55) to obtain
f(xk + αir dk ) > f(xk + α∗ dk ) + (αir − α∗ )β1 ∇f(xk )T dk for each i.
By dividing by αir − α∗ , which is positive, we obtain for each i
f(xk + αir dk ) − f(xk + α∗ dk )
> β1 ∇f(xk )T dk .
αir − α∗
We take the limit i → ∞ to obtain to the left the directional derivative of f in
xk + α∗ dk in the direction dk , and
∇f(xk + α∗ dk )T dk ≥ β1 ∇f(xk )T dk .
Since β2 > β1 and ∇f(xk )T dk < 0 (by assumption), we get
∇f(xk + α∗ dk )T dk > β2 ∇f(xk )T dk . (11.56)
As αℓ is updated only when the condition on line 21 of the algorithm is verified, it
means that the condition (11.47) is violated for all αiℓ , that is,
∇f(xk + αiℓ dk )T dk < β2 ∇f(xk )T dk ,
and at the limit i → ∞,
∇f(xk + α∗ dk )T dk ≤ β2 ∇f(xk )T dk . (11.57)
Since the two conclusions (11.56) and (11.57) are contradictory, the assumption that
the algorithm performs an infinite number of iterations is incorrect, which proves the
result.
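As an illustration only, the bracketing scheme analyzed in this proof can be sketched in Python. The two update rules below (shrink αr when (11.45) fails, grow αℓ when (11.47) fails) follow the description above; the parameter values and the doubling/bisection details are illustrative assumptions rather than the exact statement of Algorithm 11.5.

```python
import numpy as np

def wolfe_line_search(f, grad, x, d, alpha0=1.0, beta1=1e-4, beta2=0.99,
                      max_iter=100):
    """Bracketing line search in the spirit of Algorithm 11.5: the right
    bound shrinks when (11.45) fails, the left bound grows when (11.47)
    fails -- exactly the two updates analyzed in the proof above."""
    fx = f(x)
    slope = grad(x) @ d           # grad f(x)^T d, negative for a descent d
    a_l, a_r, alpha = 0.0, np.inf, alpha0
    for _ in range(max_iter):
        if f(x + alpha * d) > fx + alpha * beta1 * slope:
            a_r = alpha           # sufficient decrease (11.45) violated
            alpha = 0.5 * (a_l + a_r)
        elif grad(x + alpha * d) @ d < beta2 * slope:
            a_l = alpha           # curvature condition (11.47) violated
            alpha = 2.0 * alpha if np.isinf(a_r) else 0.5 * (a_l + a_r)
        else:
            return alpha          # both Wolfe conditions hold
    return alpha

# On a convex quadratic, the unit step already satisfies both conditions.
f = lambda x: 0.5 * (x @ x)
grad = lambda x: x
x0 = np.array([4.0, 0.0])
alpha = wolfe_line_search(f, grad, x0, -grad(x0))
```

On this quadratic the initial step α0 = 1 is returned unchanged, since it already satisfies both Wolfe conditions.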
11.4 Steepest descent method


The steepest descent method is certainly among the least efficient algorithms for
unconstrained optimization, even though it is regularly reinvented. It is presented
here solely for the sake of comparison with the other methods; in practice, it should
not be used. Algorithm 11.6 is a full working version combining the preconditioned
steepest descent method (Algorithm 11.1) and line search (Algorithm 11.5), where
the matrix Dk of Algorithm 11.1 is the identity matrix.

Algorithm 11.6: Steepest descent method

1 Objective
2 To find (an approximation of) a local minimum of the problem
      min_{x∈Rn} f(x) . (11.58)
3 Input
4 The differentiable function f : Rn → R.
5 The gradient of the function ∇f : Rn → Rn .
6 An initial solution x0 ∈ Rn .
7 The required precision ε ∈ R, ε > 0.
8 Output
9 An approximation of the optimal solution x∗ ∈ Rn .
10 Initialization
11 k := 0.
12 Repeat
13 dk := −∇f(xk ).
14 Determine αk by applying the line search (Algorithm 11.5) with α0 = 1.
15 xk+1 := xk + αk dk .
16 k := k + 1.
17 Until ‖∇f(xk )‖ ≤ ε
18 x∗ := xk .
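A minimal Python sketch of Algorithm 11.6 follows. For brevity, a simple Armijo backtracking rule (an assumption on our part, not the Wolfe-based Algorithm 11.5) is used to choose the step, and the ill-conditioned quadratic of Example 11.1 serves as test function.

```python
import numpy as np

def steepest_descent(f, grad, x0, eps=1e-6, max_iter=10000):
    """Sketch of Algorithm 11.6; an Armijo backtracking rule stands in
    for the Wolfe-based line search of Algorithm 11.5."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:      # stopping test of line 17
            break
        d = -g                            # steepest descent direction
        alpha = 1.0                       # start with alpha0 = 1
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * g.dot(d):
            alpha *= 0.5                  # backtrack until sufficient decrease
        x = x + alpha * d
    return x

# Ill-conditioned quadratic of Example 11.1: f(x) = x1^2/2 + 9 x2^2/2.
f = lambda x: 0.5 * x[0]**2 + 4.5 * x[1]**2
grad = lambda x: np.array([x[0], 9.0 * x[1]])
x_star = steepest_descent(f, grad, [9.0, 1.0])
```

Even on this small quadratic the iterates zigzag; the method needs many more iterations than Newton-type methods to reach the same precision.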

11.5 Newton method with line search


We now provide a complete working version of the preconditioned steepest descent
method (Algorithm 11.1), by combining the local Newton method (Algorithm 10.1)
and line search (Algorithm 11.5). An iteration of the local Newton method is

xk+1 = xk − ∇2 f(xk )−1 ∇f(xk ) (11.59)

and an iteration of the preconditioned steepest descent method is

xk+1 = xk − αk Dk ∇f(xk ) , (11.60)



where Dk is positive definite. Therefore, if ∇2 f(xk ) is positive definite and the step
αk = 1 is acceptable (according to the Wolfe conditions), an iteration of the local
Newton method represents exactly an iteration of the preconditioned steepest descent
method with Dk = ∇2 f(xk )−1 .
If the step αk = 1 is not acceptable, it suffices to apply Algorithm 11.5 to obtain
a step satisfying the Wolfe conditions. However, when the Hessian matrix ∇2 f(xk )
is not positive definite, it is necessary to choose another preconditioner Dk . Several
possibilities exist.
One of them involves choosing Dk diagonal, with entries

    Dk (i, i) = [ max( ε, ∂²f/∂x²i (xk ) ) ]⁻¹ , (11.61)

with ε > 0. Then, each diagonal element of the matrix being inverted (i.e., each of
its eigenvalues) is greater than or equal to ε, which guarantees the positive
definiteness of Dk , all the while incorporating information related to the second
derivatives.
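A sketch of the preconditioner (11.61), assuming the diagonal of the Hessian is available as a vector:

```python
import numpy as np

def diagonal_preconditioner(hessian_diag, eps=1e-2):
    """Preconditioner (11.61): floor the Hessian diagonal at eps, then
    invert, so that Dk is diagonal and positive definite."""
    return np.diag(1.0 / np.maximum(eps, hessian_diag))

# Diagonal of some Hessian, including a negative curvature entry.
D = diagonal_preconditioner(np.array([4.0, -1.0, 0.5]), eps=1e-2)
```

The negative entry −1 is replaced by ε before inversion, so the resulting diagonal matrix is always positive definite.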

Algorithm 11.7: Modified Cholesky factorization

1 Objective
2 To modify a matrix in order to make it positive definite.
3 Input
4 A symmetric matrix A ∈ Rn×n .
5 Output
6 A lower triangular matrix L and τ ≥ 0 such that A + τI = LLT is positive
definite.
7 Initialization
8 k := 0.
9 if mini aii > 0 then
10 τk := 0
11 else
12 τk := (1/2) ‖A‖F .
14 Repeat
15 Calculate the Cholesky factorization LLT of A + τk I.
16 if the factorization is not successful then
17 τk+1 := max( 2τk , (1/2) ‖A‖F ).
18 k := k + 1.
19 Until the factorization is successful
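A minimal sketch of Algorithm 11.7 in Python, relying on numpy's Cholesky routine to detect failure (the LinAlgError exception plays the role of the unsuccessful factorization):

```python
import numpy as np

def modified_cholesky(A):
    """Sketch of Algorithm 11.7: increase tau until A + tau*I admits a
    Cholesky factorization, i.e., becomes positive definite."""
    n = A.shape[0]
    tau = 0.0 if np.min(np.diag(A)) > 0 else 0.5 * np.linalg.norm(A, 'fro')
    while True:
        try:
            # Successful factorization: A + tau*I is positive definite.
            return np.linalg.cholesky(A + tau * np.eye(n)), tau
        except np.linalg.LinAlgError:
            # Failed factorization: increase the shift and try again.
            tau = max(2.0 * tau, 0.5 * np.linalg.norm(A, 'fro'))

A = np.array([[1.0, 2.0], [2.0, 1.0]])   # indefinite: eigenvalues 3 and -1
L, tau = modified_cholesky(A)
```

For this indefinite matrix, the first factorization attempt (with τ = 0) fails, and the shift τ = ‖A‖F/2 already makes A + τI positive definite.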

In general, the most widely used technique consists in generating a matrix E such
that

    Dk = ( ∇2 f(xk ) + E )⁻¹

is positive definite. In particular, this is always possible if E is a multiple of the
identity.1
Algorithm 11.7 proposes a simple method to obtain E as well as a Cholesky
factorization of ∇2 f(xk ) + E. Note that this algorithm is simplistic and
computationally demanding: several Cholesky factorizations may be required before a
positive definite matrix is found. More sophisticated and effective methods have been
proposed, among others by Gill and Murray (1974), Gill et al. (1981), and Schnabel
and Eskow (1999). Putting everything together, Algorithm 11.8 describes the Newton
algorithm with line search.
Algorithm 11.8: Newton algorithm with line search

1 Objective
2 To find (an approximation of) a local minimum of the problem
      min_{x∈Rn} f(x) . (11.62)
3 Input
4 The twice differentiable function f : Rn → R.
5 The gradient of the function ∇f : Rn → Rn .
6 The Hessian of the function ∇2 f : Rn → Rn×n .
7 An initial solution x0 ∈ Rn .
8 The required precision ε ∈ R, ε > 0.
9 Output
10 An approximation of the optimal solution x∗ ∈ Rn .
11 Initialization
12 k := 0.
13 Repeat
14 Calculate a lower triangular matrix Lk and τ such that
      Lk LTk = ∇2 f(xk ) + τI ,
   by using for instance the modified Cholesky factorization (Algorithm 11.7).
15 Find zk by solving the triangular system Lk zk = ∇f(xk ).
16 Find dk by solving the triangular system LTk dk = −zk .
17 Determine αk by applying line search (Algorithm 11.5) with α0 = 1.
18 xk+1 := xk + αk dk .
19 k := k + 1.
20 Until ‖∇f(xk )‖ ≤ ε
21 x∗ := xk .

1 Apply Theorem C.18 with A = ∇2 f(xk ), B = I.



This algorithm is operational in the sense that all steps are well defined. To
compare with the local Newton method from Chapter 10, we apply the Newton
method with line search (Algorithm 11.8) to Example 5.8, starting from the same
point x0 = (1, 1)T . In this case, it converges to

    x∗ = (1, π)T , ∇f(x∗ ) = (0, 0)T , ∇2 f(x∗ ) = [1 0; 0 1] ,

which is a local minimum since it satisfies the sufficient optimality conditions (see
Theorem 5.7 and the discussions for Example 5.8). The iterations are illustrated in
Figure 11.16, and it is interesting to compare with Figure 10.1.

Figure 11.16: Iterates of the Newton method with line search for Example 5.8
((a) iterates; (b) zoom)

Table 11.7 lists the values of αk and of τ employed in each iteration. Note that,
starting from iteration 4, the algorithm performs the exact same steps as the local
Newton method, as αk = 1 and τ = 0, and it achieves quadratic convergence.

Table 11.7: Illustration of the Newton method with line search (Algorithm 11.8) for
Example 5.8

k   f(xk)             ‖∇f(xk)‖         αk   τ
0   1.04030231e+00    1.75516512e+00
1   2.34942031e-01    8.88574897e-01   1    1.64562250e+00
2   4.21849003e-02    4.80063696e-01   1    1.72091923e+00
3   -4.52738278e-01   2.67168927e-01   3    8.64490594e-01
4   -4.93913638e-01   1.14762780e-01   1    0.00000000e+00
5   -4.99982955e-01   5.85174623e-03   1    0.00000000e+00
6   -5.00000000e-01   1.94633135e-05   1    0.00000000e+00
7   -5.00000000e-01   2.18521663e-10   1    0.00000000e+00
8   -5.00000000e-01   1.22460635e-16   1    0.00000000e+00

11.6 The Rosenbrock problem

To illustrate the algorithms, we consider a problem proposed by Rosenbrock (1960)
in order to illustrate the superiority of his algorithm (an improvement of the
steepest descent method based on an orthogonalization procedure), compared with the
steepest descent method. It is defined by

    min_{x∈R²} f(x1 , x2 ) = 100 (x2 − x1²)² + (1 − x1 )² . (11.63)

It is actually the function used in Example 5.3 to illustrate the necessary optimality
conditions. It has a valley that follows the parabola x2 = x1², which forces all descent
methods to follow a curved trajectory. Figure 11.17 plots the function for x1 between
−2 and 2, and x2 between −4 and 4.
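As a sketch, the Rosenbrock function (11.63), its derivatives, and a run of a Newton method in the spirit of Algorithm 11.8 can be coded as follows. Two simplifications are assumptions made for brevity: a plain Armijo backtracking stands in for the Wolfe line search of Algorithm 11.5, and the τ loop mirrors the modified Cholesky factorization of Algorithm 11.7.

```python
import numpy as np

# Rosenbrock function (11.63), its gradient, and its Hessian
def f(x):
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0]**2)])

def hess(x):
    return np.array([[1200.0 * x[0]**2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

def newton_line_search(x0, eps=1e-8, max_iter=200):
    """Newton method in the spirit of Algorithm 11.8 (Armijo backtracking
    instead of the Wolfe line search, as a simplification)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        H, tau = hess(x), 0.0
        while True:                 # shift until positive definite (Alg. 11.7)
            try:
                L = np.linalg.cholesky(H + tau * np.eye(2))
                break
            except np.linalg.LinAlgError:
                tau = max(2.0 * tau, 0.5 * np.linalg.norm(H, 'fro'))
        z = np.linalg.solve(L, g)       # Lk zk = grad f(xk)
        d = np.linalg.solve(L.T, -z)    # Lk^T dk = -zk
        alpha = 1.0
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * g.dot(d):
            alpha *= 0.5                # backtrack until sufficient decrease
        x = x + alpha * d
    return x

x_star = newton_line_search([-1.5, 1.5])   # starting point used below
```

The iterates follow the curved valley of the function and converge to the minimum (1, 1), in sharp contrast with the steepest descent method discussed next.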

Figure 11.17: The Rosenbrock function

Figure 11.18: Level curves of the Rosenbrock function

Figure 11.19: Steepest descent method ((a) stopped at 200 iterations; (b) zoom)

Figure 11.18 represents the level curves 0 to 6 of the function, as well as the
location of the starting point x0 = (−1.5, 1.5)T and the location of the optimal
solution x∗ = (1, 1)T .
The steepest descent algorithm (Algorithm 11.6) has a hard time solving this
problem. The zigzag trajectory of the iterates, already illustrated in Example 11.1, is
unacceptable here (Figure 11.19). With two exceptions, the steps made by the method
are small, which hinders the algorithm from progressing. The algorithm was
interrupted after 200 iterations, without having converged. A large step could be
taken after two iterations thanks to the line search strategy (Algorithm 11.5), which
starts by attempting large steps and sometimes succeeds.
The superiority of the Newton method with line search (Algorithm 11.8) is illus-
trated in Figure 11.20. The algorithm exploits pretty well the information about the
curvature of the function provided by the second derivatives. The iterations follow
the valley pretty smoothly, along the parabola that defines it.

Figure 11.20: Newton method with line search ((a) 23 iterations; (b) zoom)

11.7 Convergence

The local Newton method (Algorithm 10.1) has a quadratic convergence rate
(Theorem 7.13). However, it only works when the starting point is sufficiently close
to the optimal solution. In practice, it is clearly not possible to guarantee that this
hypothesis is verified. Moreover, the more nonlinear and ill-conditioned the function
is, the closer the starting point needs to be to the optimal solution (Eq. (7.21)). The
main motivation for developing the Newton method with line search (Algorithm 11.8)
is to obtain an algorithm that converges regardless of the starting point given by the
user. We call such an algorithm globally convergent.

Definition 11.11 (Global convergence). Consider an iterative algorithm that
generates a sequence (xk )k in Rn , in order to solve the unconstrained minimization
problem

    min_{x∈Rn} f(x) ,

where f : Rn → R is a continuously differentiable function. The algorithm is said to
be globally convergent if

    lim_{k→∞} ∇f(xk ) = 0 , (11.64)

regardless of x0 ∈ Rn .

Care should be taken not to confuse “global convergence” and “global minimum.”
An algorithm can be globally convergent and converge toward a local minimum.
As line search guarantees sufficient decrease and sufficient progress along a descent
direction, the only way to stall a descent direction algorithm is for the directions
to become asymptotically orthogonal to the gradient. Then, even if the algorithm
guarantees that

    ∇f(xk )T dk < 0 ,

it is necessary to also guarantee that

    lim inf_k ∇f(xk )T dk < 0 .

In other words, the cosine of the angle between the descent direction and the direction
of the steepest slope cannot approach 0. We denote this angle θk , i.e.,

    cos θk = −∇f(xk )T dk / ( ‖∇f(xk )‖ ‖dk ‖ ) . (11.65)

In order to demonstrate this, we need the following theorem, attributed to Zoutendijk.

Theorem 11.12 (Zoutendijk’s theorem). Consider a function f : Rn → R that
is bounded from below, differentiable, and whose gradient is Lipschitz continuous
(Definition B.16), i.e., there exists M > 0 such that

    ‖∇f(x) − ∇f(y)‖ ≤ M‖x − y‖ , ∀x, y ∈ Rn . (11.66)

Consider an algorithm generating the sequence (xk )k , defined by the iterations

    xk+1 = xk + αk dk , k = 0, 1, . . . , (11.67)

with dk a descent direction (i.e., ∇f(xk )T dk < 0) and αk that satisfies the Wolfe
conditions (11.45) and (11.47). In this case, the series

    Σ_{k=0}^{+∞} cos² θk ‖∇f(xk )‖² , (11.68)

where

    cos θk = −∇f(xk )T dk / ( ‖∇f(xk )‖ ‖dk ‖ ) (11.69)

is convergent.

Proof. Take an arbitrary k. We have

    ∇f(xk + αk dk )T dk ≥ β2 ∇f(xk )T dk              from (11.47),
    ∇f(xk+1 )T dk ≥ β2 ∇f(xk )T dk                    from (11.67),
    ( ∇f(xk+1 ) − ∇f(xk ) )T dk ≥ (β2 − 1) ∇f(xk )T dk .

Moreover, we have

    ( ∇f(xk+1 ) − ∇f(xk ) )T dk ≤ ‖∇f(xk+1 ) − ∇f(xk )‖ ‖dk ‖
                                ≤ M‖xk+1 − xk ‖ ‖dk ‖     from (11.66)
                                ≤ Mαk ‖dk ‖²              from (11.67).

By grouping these two results, we have

    (β2 − 1) ∇f(xk )T dk ≤ Mαk ‖dk ‖²

or

    αk ≥ ( (β2 − 1)/M ) ∇f(xk )T dk / ‖dk ‖² . (11.70)

The first Wolfe condition (11.45) ensures that

    f(xk + αk dk ) − f(xk ) ≤ αk β1 ∇f(xk )T dk .

Therefore, since β1 ∇f(xk )T dk < 0, we get

    f(xk + αk dk ) − f(xk ) ≤ β1 ( (β2 − 1)/M ) ( ∇f(xk )T dk )² / ‖dk ‖²
                            = −β̂ cos² θk ‖∇f(xk )‖² ,

by using (11.69) and defining β̂ = β1 (1 − β2 )/M > 0. Consequently, for an arbitrary
K, we have

    Σ_{k=0}^{K} ( f(xk + αk dk ) − f(xk ) ) ≤ −β̂ Σ_{k=0}^{K} cos² θk ‖∇f(xk )‖²
    f(xK+1 ) − f(x0 ) ≤ −β̂ Σ_{k=0}^{K} cos² θk ‖∇f(xk )‖² .

Multiplying this last inequality by −1, we obtain

    β̂ Σ_{k=0}^{K} cos² θk ‖∇f(xk )‖² ≤ f(x0 ) − f(xK+1 ) .

Since f is bounded from below, there exists f0 such that f(x) ≥ f0 , for all x. Therefore,

    f(x0 ) − f(xK+1 ) ≤ f(x0 ) − f0 , ∀K ,

and then

    β̂ Σ_{k=0}^{K} cos² θk ‖∇f(xk )‖² ≤ f(x0 ) − f0 , ∀K .

Taking the limit K → ∞, we conclude that the series is convergent, i.e.,

    β̂ Σ_{k=0}^{+∞} cos² θk ‖∇f(xk )‖² ≤ f(x0 ) − f0 . (11.71)

A necessary condition for the series of Zoutendijk’s theorem to be convergent
is that

    lim_{k→∞} cos² θk ‖∇f(xk )‖² = 0 .

In the context of global convergence (Definition 11.11), we want

    lim_{k→∞} ‖∇f(xk )‖² = 0 .

This would be the case if the sequence cos² θk did not approach zero, i.e., if there
exists δ > 0 such that

    −∇f(xk )T dk / ( ‖∇f(xk )‖ ‖dk ‖ ) ≥ δ , ∀k .

A sequence of directions (dk )k satisfying this property is said to be gradient related
with the sequence of iterates (xk )k .

Definition 11.13 (Gradient related directions). Consider a function f : Rn → R
that is bounded from below and differentiable. Consider also an iterative algorithm
that generates a sequence (xk )k in Rn , defined by x0 and the iterations

    xk+1 = xk + αk dk , k = 0, 1, . . .

The sequence (dk )k is said to be gradient related with the sequence (xk )k if, for all
subsequences (xk )k∈K converging toward a non stationary point, i.e., all subsequences
such that

    ∇f( lim_{k∈K} xk ) ≠ 0 ,

the corresponding subsequence (dk )k∈K is bounded and satisfies

    lim sup_{k∈K} ∇f(xk )T dk < 0 . (11.72)

Then, if the sequence (dk )k is gradient related with the sequence (xk )k , the angle
between dk and ∇f(xk ) does not come too close to 90 degrees.

Corollary 11.14 (Global convergence). Consider a function f : Rn → R that
is bounded from below, differentiable, and whose gradient is Lipschitz continuous
(Definition B.16), i.e., there exists M > 0 such that

    ‖∇f(x) − ∇f(y)‖ ≤ M‖x − y‖ , ∀x, y ∈ Rn . (11.73)

Consider an algorithm generating the sequence (xk )k , defined by x0 and the
iterations

    xk+1 = xk + αk dk , k = 0, 1, . . . ,

with (dk )k gradient related with (xk )k (according to Definition 11.13), and αk
satisfying the Wolfe conditions (11.45) and (11.47). Then, regardless of x0 ∈ Rn ,

    lim_{k→∞} ∇f(xk ) = 0 . (11.74)

Proof. It is an immediate consequence of Zoutendijk’s theorem.

In the context of the preconditioned steepest descent method, in order for the
sequence of Newton directions to be gradient related with the iterates, it suffices that
the conditioning of the matrices Dk is bounded, i.e., that there exists C > 0 such
that

    ‖Dk ‖2 ‖Dk⁻¹ ‖2 ≤ C , ∀k .

According to the Rayleigh-Ritz theorem (Theorem C.4), and since 1/‖Dk⁻¹ ‖2 is
the smallest eigenvalue of Dk , we get

    −∇f(xk )T dk = ∇f(xk )T Dk ∇f(xk ) ≥ ‖∇f(xk )‖² / ‖Dk⁻¹ ‖2 .

Then, by using (11.69),

    cos θk ≥ ‖∇f(xk )‖² / ( ‖Dk⁻¹ ‖2 ‖∇f(xk )‖ ‖dk ‖ ) .

Since

    ‖dk ‖ ≤ ‖Dk ‖2 ‖∇f(xk )‖ ,

we get

    cos θk ≥ 1 / ( ‖Dk ‖2 ‖Dk⁻¹ ‖2 ) ≥ 1/C > 0 .

Then, the cosine of the angle is bounded below by a positive constant, and the
directions do not degenerate by becoming asymptotically orthogonal to the gradient.
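The bound cos θk ≥ 1/(‖Dk‖2 ‖Dk⁻¹‖2) derived above can be checked numerically on an arbitrary symmetric positive definite preconditioner; the matrix below is a randomly generated example.

```python
import numpy as np

# Check cos(theta_k) >= 1 / (||Dk||_2 ||Dk^{-1}||_2) for d = -Dk grad f(xk),
# on an arbitrary symmetric positive definite preconditioner Dk.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
D = M @ M.T + 4.0 * np.eye(4)   # symmetric positive definite by construction
g = rng.standard_normal(4)      # stands for grad f(xk)
d = -D @ g                      # preconditioned steepest descent direction
cos_theta = -(g @ d) / (np.linalg.norm(g) * np.linalg.norm(d))
bound = 1.0 / (np.linalg.norm(D, 2) * np.linalg.norm(np.linalg.inv(D), 2))
```

Since the inequality holds for any symmetric positive definite D, the check succeeds whatever the random seed.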
The presentation of the proof of Theorem 11.12 was inspired by Nocedal and
Wright (1999). Examples 11.5 and 11.7 were inspired by Dennis and Schnabel (1996).

11.8 Project
The general organization of the projects is described in Appendix D.

Objective
The objective of the present project is to analyze the behavior of the descent methods,
and to understand the role of the preconditioner and the role of the line search
parameters.

Approach

1. Implement the preconditioned steepest descent method (Algorithm 11.1) with line
search (Algorithm 11.5) and the following preconditioners:
(a) Dk = I, to obtain the steepest descent method.
(b) Dk diagonal matrix, with Dk (i, i) = [ max( 1, ∂²f(xk )/∂x²i ) ]⁻¹ .
(c) Dk diagonal matrix, with Dk (i, i) = [ max( 1, ∂²f(x0 )/∂x²i ) ]⁻¹ .
(d) Dk diagonal matrix, with Dk (i, i) = 1 / max( 1, |(xk )i | ) .
(e) Dk diagonal matrix, with Dk (i, i) = 1 / max( 1, |(x0 )i | ) .

Utilize several starting points. Each time, verify that the optimality conditions
are satisfied at the final solution and compare the number of iterations.

2. In the line search (Algorithm 11.5), vary the parameters, by using (for instance)
the following values:
α0 = 0.1 ; 1.0 ; 10.0.
β1 = 0.1 ; 0.5 ; 0.9.
β2 = 0.1(β1 − 1) + 1 ; 0.5(β1 − 1) + 1 ; 0.9(β1 − 1) + 1.

Algorithms

Algorithms 11.1 and 11.5.

Problems

Exercise 11.1. The James Bond problem, described in Section 1.1.5.


Exercise 11.2. The problem

    min_{x∈R²} 2x1 x2 e^{−(4x1² + x2²)/8} .

Advice: draw the function and the level curves with a software such as Gnuplot,
visually identify the stationary points, and then choose the starting points, either
close to or far from the stationary points.
Exercise 11.3. The problem

    min_{x∈Rn} Σ_{i=1}^{n} i^α x²i ,

x̄ = (1, . . . , 1)T , with various values of n and α.

Exercise 11.4. The problem

    min_{x∈R²} 3x1² + x2⁴ .

Recommended starting point: (1, −2)T .

Exercise 11.5. The Rosenbrock problem

    min_{x∈R²} 100 (x2 − x1²)² + (1 − x1 )²    (Section 11.6).

Recommended starting points: (1.2, 1.2)T and (−1.2, 1)T .

Exercise 11.6. The problem

    min_{x∈R⁶} Σ_{i=1}^{m} ( −e^{−0.1i} + 5e^{−i} − 3e^{−0.4i}
                             + x3 e^{−0.1i x1} − x4 e^{−0.1i x2} + x6 e^{−0.1i x5} )² ,

with various values of m.
Recommended starting point: (1, 2, 1, 1, 4, 3)T .
Chapter 12

Trust region

Contents
12.1 Solving the trust region subproblem
12.1.1 The dogleg method
12.1.2 Steihaug-Toint method
12.2 Calculation of the radius of the trust region
12.3 The Rosenbrock problem
12.4 Project

In Chapter 11, we addressed a class of optimization methods enabling us to get
around the shortcomings of Newton’s local method, all the while maintaining its
essential qualities when possible. In particular, the line search approach allows for
global convergence, that is, the guarantee that the algorithm converges to a local
minimum whatever the starting point, while reaching quadratic convergence when
the iterates come close to a local minimum. The so-called trust region methods
target the same objective, through a different approach.
Newton’s local method (Algorithm 10.2) consists in minimizing a quadratic model
of the function at each iteration. Taylor’s theorem (Theorem C.2) shows us that the
quadratic model of a function is a good approximation of the latter close to the point
where it is defined. It is legitimate to define a region around the current iterate xk
within which we can have trust in the quadratic model. This region is called the trust
region. It is defined by its radius ∆k and a point x belongs to this region if

kxk − xk ≤ ∆k , (12.1)

where k · k is a norm on Rn .
Assuming that we knew the value of ∆k , the wisest thing to do would be to
minimize the quadratic model within this region, rather than over all of Rn . The
minimization problem (10.11) in Algorithm 10.2 can be replaced by the following
problem, called the trust region subproblem.

Definition 12.1 (Trust region subproblem). Let f : Rn → R be a twice differentiable
function, x̂ ∈ Rn , mx̂ the quadratic model of f in x̂ (Definition 10.1), and ∆k > 0.
The trust region subproblem is the following minimization problem:

    min_d mx̂ (x̂ + d) = f(x̂) + dT ∇f(x̂) + (1/2) dT ∇2 f(x̂)d (12.2)

subject to

    ‖d‖ ≤ ∆k . (12.3)

It is interesting to analyze the optimality conditions of the trust region subproblem
for the Euclidean norm. To simplify this analysis, we rewrite (12.3) as

    (1/2) ( ‖d‖₂² − ∆k² ) ≤ 0 . (12.4)

The Lagrangian (Definition 4.3) of this problem is

    L(d, µ) = f(x̂) + dT ∇f(x̂) + (1/2) dT ∇2 f(x̂)d + (µ/2) ( ‖d‖₂² − ∆k² ) . (12.5)

If d∗ is the optimal solution to the trust region subproblem, the necessary
optimality conditions guarantee that there exists µ∗ ∈ R such that

    ∇d L(d∗ , µ∗ ) = ∇f(x̂) + ∇2 f(x̂)d∗ + µ∗ d∗ = 0 (12.6)
    µ∗ ≥ 0 (12.7)
    µ∗ ( ‖d∗ ‖₂² − ∆k² ) = 0 . (12.8)
If d∗ is strictly within the trust region, i.e., if ‖d∗ ‖ < ∆k , (12.8) guarantees
that µ∗ = 0. Therefore, (12.6) can be simplified and corresponds to the necessary
optimality conditions of the unconstrained problem. The constraint of the trust
region, inactive in d∗ , can be ignored. We obtain an iteration of Newton’s local
method.
If d∗ is at the border of the trust region, i.e., ‖d∗ ‖₂ = ∆k , then (12.6) is written
as

    ( ∇2 f(x̂) + µ∗ I ) d∗ = −∇f(x̂) .

It is possible to demonstrate (see Conn et al., 2000, Theorem 7.2.1, page 172) that
the matrix ∇2 f(x̂) + µ∗ I is positive semi-definite. The analogy with the technique
used in Newton’s method with line search (Algorithm 11.8) is interesting. Indeed,
in both cases, a multiple of the identity is added to the Hessian matrix to correct
potential problems of Newton’s local method. The following example illustrates the
relationship between the trust region problem and the unconstrained minimization
of a perturbed quadratic model.
Example 12.2 (Trust region subproblem). Consider Example 10.2 and create the
trust region subproblem in xk = 4 with ∆k = 1:

    min_x x² − 4x subject to (1/2)(x − 4)² ≤ 1/2 . (12.9)

Figure 12.1: Illustration of Example 12.2 ((a) trust region; (b) perturbed quadratic
model)

The optimal solution to this problem is xk+1 = x∗ = 3, as illustrated in Figure 12.1(a).
The trust region constraint is active at the solution. The Lagrangian of the problem
is

    L(x, µ) = x² − 4x + (µ/2) ( (x − 4)² − 1 )

and, as x∗ = 3,

    ∇x L(x∗ , µ∗ ) = 2x∗ − 4 + µ∗ (x∗ − 4) = 2 − µ∗ = 0 .

Then, µ∗ = 2. The same result can be obtained by minimizing, without constraint,
L(x, 2), i.e.,

    2x² − 12x + 15 .

Equivalently, we can minimize any model of the form

    2x² − 12x + c ,

where c is a constant, and obtain the same result. By choosing c = 16, the quadratic
function interpolates f in xk . This is illustrated in Figure 12.1(b). It shows that
imposing a trust region constraint can be equivalent to modifying the quadratic model
and optimizing it without constraint.
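The conclusion of Example 12.2 can be checked numerically: minimizing the perturbed model 2x² − 12x + 16 without any constraint recovers the constrained solution x∗ = 3. A quick grid-based check:

```python
import numpy as np

# Example 12.2: the perturbed model 2x^2 - 12x + 16, minimized without
# any constraint, recovers the solution x* = 3 of the constrained problem.
xs = np.linspace(1.0, 7.0, 601)
x_star = xs[np.argmin(2.0 * xs**2 - 12.0 * xs + 16.0)]
# Stationarity of the Lagrangian at x* = 3 with mu* = 2:
# grad_x L = 2x - 4 + mu (x - 4) evaluated at (3, 2).
stationarity = 2.0 * 3.0 - 4.0 + 2.0 * (3.0 - 4.0)
```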

In order to render the trust region method operational, we need only clarify two
things:
1. How to solve the trust region subproblem (12.2)–(12.3).
2. How to determine the value of ∆k .

12.1 Solving the trust region subproblem

In Chapter 11, we justified the inexact line search algorithm (Algorithm 11.5) by
noticing that it was useless and computationally too demanding to solve exactly the
minimization subproblem at each iteration. The same argument applies here. The
trust region subproblem (12.2)–(12.3) is solved approximately.
There are many ways to solve this problem. Here we present two methods. The
first is called the dogleg method. It is valid when the Hessian matrix at the current
point ∇2 f(x̂) is positive definite. The second is based on the conjugate gradient
method (Algorithm 9.2), and is therefore appropriate for large scale problems.

12.1.1 The dogleg method

The main idea of the dogleg method is the following:
• If the trust region is small, the first-order Taylor approximation of the function
is probably already good, and the quadratic term plays only a minor role. It is
therefore wise to follow the steepest descent direction toward the Cauchy point
(Definition 10.4).
• If the trust region is larger, the second-order term becomes significant, and the
Newton point (Definition 10.3) becomes the preferred target.
• In order to combine these two directions, the dogleg method consists in following
a path that leads first to the Cauchy point, and then takes the Newton direction.
This path is continued to the Newton point, or to the border of the trust region.
If the Newton point is reached without leaving the trust region, this means that
a local Newton iteration can be carried out.

We first assume that ∇2 f(x̂) is positive definite. Formally, we define a trajectory
from the current point x̂, defined by x̂ + p(α), with

    p(α) = α dC                         if 0 ≤ α ≤ 1,
           dC + (α − 1)(xd − xC )       if 1 ≤ α ≤ 2,     (12.10)
           ( η(3 − α) + α − 2 ) dN      if 2 ≤ α ≤ 3,

where
• the Cauchy (i.e., steepest descent) direction (Definition 10.4) is defined by

    dC = − ( ∇f(x̂)T ∇f(x̂) / ( ∇f(x̂)T ∇2 f(x̂) ∇f(x̂) ) ) ∇f(x̂) ,

• xC = x̂ + dC is the Cauchy point,
• the Newton direction (Definition 10.3) is defined by

    dN = −∇2 f(x̂)⁻¹ ∇f(x̂) ,

if ∇2 f(x̂) is positive definite,
• the dogleg point is defined by

    xd = x̂ + η dN ,

where η ≤ 1 defines the position of the dogleg point in the Newton direction; the
recommended value in the literature is η = 0.8 ‖dC ‖/‖dN ‖ + 0.2.

As illustrated in Figure 12.2(a), this trajectory connects x̂ (α = 0), xC (α = 1), xd
(α = 2), and xN (α = 3). Note that p(1) = dC , p(2) = xd − x̂ = η dN , and p(3) = dN .
This trajectory is followed either all the way (to the Newton point), or to the border
of the trust region.

Example 12.3 (Dogleg path). Consider the function

    f(x) = (1/2) x1² + (9/2) x2²

from Example 11.1, as well as the point x̂ = (9, 1)T . The Cauchy point (xC ), the
Newton point (xN ), and the dogleg point (xd ), where the path joins the Newton
direction, are

    xC = (7.2, −0.8)T , xN = (0, 0)T , xd = (4.608, 0.512)T ,

as illustrated in Figure 12.2(a). We consider three trust regions of radii ∆ = 1, 4, 8.
The approximate solution to the trust region subproblem for each radius is

    x1 = (8.29289, 0.29289)T , x4 = (5.06523, 0.28056)T , x8 = (1.04893, 0.11655)T ,

as illustrated in Figure 12.2(b).
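The quantities of Example 12.3 can be reproduced with a few lines of Python. The weight η is computed here with a Dennis-Schnabel style formula, η = 0.2 + 0.8 (∇fᵀ∇f)² / ((∇fᵀ∇²f∇f)(∇fᵀ(∇²f)⁻¹∇f)); this particular choice is an assumption on our part, made because it reproduces the dogleg point of the example.

```python
import numpy as np

def dogleg_points(g, H, x_hat):
    """Cauchy, Newton, and dogleg points of the path (12.10), assuming
    H is positive definite.  The weight eta uses a Dennis-Schnabel style
    formula (an assumption chosen to match Example 12.3)."""
    gHg = g @ H @ g
    d_C = -(g @ g) / gHg * g              # Cauchy (steepest descent) step
    d_N = -np.linalg.solve(H, g)          # Newton step
    gamma = (g @ g) ** 2 / (gHg * (g @ -d_N))
    eta = 0.2 + 0.8 * gamma
    return x_hat + d_C, x_hat + d_N, x_hat + eta * d_N

x_hat = np.array([9.0, 1.0])
H = np.diag([1.0, 9.0])                   # Hessian of x1^2/2 + 9 x2^2/2
g = H @ x_hat                             # gradient at x_hat
x_C, x_N, x_d = dogleg_points(g, H, x_hat)
```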


Figure 12.2: Illustration of the dogleg method ((a) definition of the path;
(b) intersections with the trust regions)

We need to calculate the intersection of the trajectory with the border of the trust
region. In the steepest descent direction, we have

    ‖ −( ∆k /‖∇f(xk )‖ ) ∇f(xk ) ‖ = ∆k ,

and the point

    x̂ − ( ∆k /‖∇f(xk )‖ ) ∇f(xk )

is located at the border of the trust region. The technique is identical in the Newton
direction, where the point

    x̂ + ( ∆k /‖dN ‖ ) dN

is located at the border of the trust region. To find where the segment xd − xC
intersects the border of the trust region, we need to find the value of λ that solves
the equation

    ‖ xC + λ(xd − xC ) − x̂ ‖₂ = ∆k .

Lemma 12.4. Consider ∆ > 0, xC located in the trust region centered in x̂, i.e.,
such that ‖dC ‖ = ‖xC − x̂‖ ≤ ∆, and xd outside the trust region, i.e., such that
‖dd ‖ = ‖xd − x̂‖ > ∆. The step λ such that

    ‖ dC + λ(dd − dC ) ‖₂ = ‖ xC − x̂ + λ(xd − xC ) ‖₂ = ∆ (12.11)

is given by

    λ = ( −b + √(b² − 4ac) ) / (2a) (12.12)

with

    a = ‖dd − dC ‖₂²
    b = 2 dTC (dd − dC )
    c = ‖dC ‖₂² − ∆² .

Proof. The result is obtained by denoting d = xd − xC = dd − dC and by calculating
the roots of the equation

    ‖dC + λd‖₂² = ∆²

or

    dT d λ² + 2 dTC d λ + dTC dC − ∆² = 0 .

The coefficient of λ² is a = dT d, that of λ is b = 2 dTC d, and the independent term
is c = dTC dC − ∆² . The discriminant of this equation is

    b² − 4ac = 4 ( (dTC d)² − dT d dTC dC + dT d ∆² ) .

Since dTC dC ≤ ∆² , then dT d (∆² − dTC dC ) ≥ 0 and the discriminant is non
negative. The equation has two solutions:

    ( −b + √(b² − 4ac) ) / (2a) and ( −b − √(b² − 4ac) ) / (2a) .

We demonstrate that the first is always non negative and the second non positive.
The first corresponds to the intersection with the trust region in the direction d from
xC ; the second root corresponds to the direction −d. We discuss the sign of b.

b = 0: in this case, it is trivial to show that the first root is positive and the
second negative.

b > 0: since b = 2 dTC d > 0, the second solution is negative because a > 0. For
the first, we have

    √(b² − 4ac) = 2 √( (dTC d)² − dT d (dTC dC − ∆²) )
                ≥ 2 √( (dTC d)² )      because ∆² − dTC dC ≥ 0
                = 2 dTC d              because dTC d > 0 .

Therefore,

    −b + √(b² − 4ac) ≥ −2 dTC d + 2 dTC d = 0 ,

and the first root is positive as a > 0.

b < 0: since b = 2 dTC d < 0, the first solution is positive because a > 0. For the
second, we have

    √(b² − 4ac) = 2 √( (dTC d)² − dT d (dTC dC − ∆²) )
                ≥ 2 √( (dTC d)² )      because ∆² − dTC dC ≥ 0
                ≥ −2 dTC d             because dTC d < 0 .

Therefore,

    −b − √(b² − 4ac) ≤ −2 dTC d + 2 dTC d = 0 ,

and the second root is negative.
Algorithm 12.1: Intersection with the trust region

1 Objective
2 To find the intersection between a direction and the border of the trust
region.
3 Input
4 dC ∈ Rn such that ‖dC ‖ ≤ ∆.
5 d = dd − dC ∈ Rn such that d ≠ 0.
6 ∆ ∈ R such that ∆ > 0.
7 Output
8 The step λ such that ‖dC + λd‖ = ∆.
9 a := dT d.
10 b := 2 dTC d.
11 c := dTC dC − ∆².
12 λ := ( −b + √(b² − 4ac) ) / (2a).
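Algorithm 12.1 translates directly into code; the positive root of the quadratic is the step that crosses the border in the direction d.

```python
import numpy as np

def trust_region_intersection(d_C, d, delta):
    """Algorithm 12.1: positive root lambda with ||d_C + lambda d|| = delta,
    assuming ||d_C|| <= delta, d != 0, and delta > 0."""
    a = d @ d
    b = 2.0 * (d_C @ d)
    c = d_C @ d_C - delta ** 2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

# Step from an interior point (0.6, 0) along (3, 4) to the border of radius 10.
lam = trust_region_intersection(np.array([0.6, 0.0]),
                                np.array([3.0, 4.0]), 10.0)
```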

For the method to work, we have to define the safeguards when the matrix ∇2 f(x̂)
is not positive definite. This is simple. If the function is concave in the steepest
descent direction, i.e., if

    ∇f(x̂)T ∇2 f(x̂) ∇f(x̂) ≤ 0 ,

then the quadratic function is unbounded from below in this direction, and decreases
towards −∞. It can therefore be followed until the border of the trust region. If the
function is concave in the Newton direction, then it is ignored and the Cauchy point
is selected. Algorithm 12.2 describes the dogleg method to solve approximately the
trust region subproblem.

12.1.2 Steihaug-Toint method

The conjugate gradient method, presented in Section 9.2, is designed to minimize
strictly convex quadratic problems. Steihaug (1983) and Toint (1981) proposed an
adaptation of this method to solve the trust region subproblem.
The basic idea is the following. At each iteration of the conjugate gradient method,
we first test whether the quadratic model is convex in the direction dk . If this is not
the case, we follow this direction until the border of the trust region and stop the
iterations. Furthermore, as soon as an iterate is outside the trust region, we follow the
last calculated direction until the border of the trust region and stop the algorithm.
In all other cases, the method is applied in its original version. We thus obtain
Algorithm 12.3, which should be used with Q = ∇2 f(xk ) and b = ∇f(xk ).
Algorithm 12.2: Dogleg method

1  Objective
2  To find an approximate solution to the trust region subproblem
       min_{d ∈ R^n} d^T ∇f(x̂) + (1/2) d^T ∇²f(x̂) d   subject to ‖d‖₂ ≤ Δ.
3  Input
4  The gradient at the current point: ∇f(x̂) ∈ R^n, ∇f(x̂) ≠ 0.
5  The Hessian at the current point: ∇²f(x̂) ∈ R^{n×n}.
6  The radius of the trust region: Δ > 0.
7  Output
8  Approximate solution d*.
9  Cauchy point
10 β := ∇f(x̂)^T ∇²f(x̂) ∇f(x̂)      (curvature in the steepest descent direction)
12 if β ≤ 0 then the model is not convex
13     STOP with d* = −(Δ / ‖∇f(x̂)‖) ∇f(x̂).
14 α := ∇f(x̂)^T ∇f(x̂).
15 d_C := −(α/β) ∇f(x̂)      (using (10.17))
17 if ‖d_C‖ ≥ Δ then Cauchy point outside the trust region
18     STOP with d* := (Δ / ‖d_C‖) d_C.
19 Newton point
20 Calculate d_N by solving ∇²f(x̂) d_N = −∇f(x̂).
21 if d_N^T ∇²f(x̂) d_N ≤ 0 then the model is not convex
22     STOP with the Cauchy point, d* = d_C.
23 if ‖d_N‖ ≤ Δ then Newton point within the trust region
24     STOP with d* = d_N.
25 Dogleg point
26 Calculate d_d := (0.2 + 0.8 α² / (β |∇f(x̂)^T d_N|)) d_N.
27 if ‖d_d‖ ≤ Δ then dogleg point within the trust region
28     STOP with d* = (Δ / ‖d_N‖) d_N.
29 Between Cauchy and dogleg
30 Use Algorithm 12.1 to calculate λ* such that d_C + λ*(d_d − d_C) is the
   intersection point between the segment connecting the Cauchy point and
   the dogleg point, with the border of the trust region.
31 STOP with d* = d_C + λ*(d_d − d_C).
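The dogleg logic can be sketched as follows, here specialized to n = 2 so that the Newton system can be solved with an explicit 2×2 inverse; this is a sketch for illustration, not a full implementation (all names are ours):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def dogleg(g, H, delta):
    """Sketch of Algorithm 12.2 for n = 2; g is the gradient, H the Hessian
    (nested list), delta the trust region radius."""
    Hg = [H[0][0] * g[0] + H[0][1] * g[1], H[1][0] * g[0] + H[1][1] * g[1]]
    beta = dot(g, Hg)                    # curvature along steepest descent
    if beta <= 0.0:                      # model concave: go to the border
        s = delta / norm(g)
        return [-s * g[0], -s * g[1]]
    alpha = dot(g, g)
    dC = [-(alpha / beta) * g[0], -(alpha / beta) * g[1]]   # Cauchy point
    if norm(dC) >= delta:
        s = delta / norm(dC)
        return [s * dC[0], s * dC[1]]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    dN = [-(H[1][1] * g[0] - H[0][1] * g[1]) / det,  # Newton point: solves
          -(H[0][0] * g[1] - H[1][0] * g[0]) / det]  # H dN = -g (2x2 inverse)
    HdN = [H[0][0] * dN[0] + H[0][1] * dN[1],
           H[1][0] * dN[0] + H[1][1] * dN[1]]
    if dot(dN, HdN) <= 0.0:              # concave along the Newton direction
        return dC
    if norm(dN) <= delta:                # Newton point inside the region
        return dN
    eta = 0.2 + 0.8 * alpha * alpha / (beta * abs(dot(g, dN)))
    dd = [eta * dN[0], eta * dN[1]]      # dogleg point
    if norm(dd) <= delta:                # border crossed between dd and dN
        s = delta / norm(dN)
        return [s * dN[0], s * dN[1]]
    d = [dd[0] - dC[0], dd[1] - dC[1]]   # border crossed between dC and dd:
    a, b, c = dot(d, d), 2.0 * dot(dC, d), dot(dC, dC) - delta ** 2
    lam = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return [dC[0] + lam * d[0], dC[1] + lam * d[1]]

# With a large radius, the dogleg step is simply the Newton step -H^{-1} g.
d = dogleg([1.0, 2.0], [[2.0, 0.0], [0.0, 4.0]], 100.0)
```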
Algorithm 12.3: Steihaug-Toint truncated conjugate gradient method

1  Objective
2  To find an approximate solution to the trust region subproblem
       min_x (1/2) x^T Q x + x^T b   subject to ‖x‖₂ ≤ Δ.
3  Input
4  Q ∈ R^{n×n}
5  b ∈ R^n
6  Radius of the trust region Δ
7  Output
8  The approximate solution x* ∈ R^n
9  Initialization
10 k := 1
11 x₁ := 0
12 d₁ := −b
13 Repeat
14 if d_k^T Q d_k ≤ 0 then the function is not convex along d_k
15     x* = x_k + λ d_k, where λ is obtained by Algorithm 12.1
16 Calculate the step α_k := −d_k^T (Q x_k + b) / (d_k^T Q d_k).
17 Calculate the next iterate: x_{k+1} := x_k + α_k d_k.
18 if ‖x_{k+1}‖ > Δ then
19     x* = x_k + λ d_k, where λ is obtained by Algorithm 12.1
20 Calculate β_{k+1} := ∇f(x_{k+1})^T ∇f(x_{k+1}) / (∇f(x_k)^T ∇f(x_k))
       = (Q x_{k+1} + b)^T (Q x_{k+1} + b) / ((Q x_k + b)^T (Q x_k + b)).
   Calculate the new direction d_{k+1} := −Q x_{k+1} − b + β_{k+1} d_k. k := k + 1.
21 Until ∇f(x_k) = 0 or k = n + 1
22 x* := x_k
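The truncated conjugate gradient iterations above can be sketched in plain Python with list-based linear algebra (the helper names are ours, not the book's):

```python
import math

def matvec(Q, x):
    return [sum(qij * xj for qij, xj in zip(row, x)) for row in Q]

def boundary_step(x, d, delta):
    # Algorithm 12.1: lambda >= 0 such that ||x + lambda d|| = delta
    a = sum(di * di for di in d)
    b = 2.0 * sum(xi * di for xi, di in zip(x, d))
    c = sum(xi * xi for xi in x) - delta * delta
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def steihaug_toint(Q, b, delta):
    """Sketch of Algorithm 12.3: truncated conjugate gradients for
    min 0.5 x'Qx + b'x subject to ||x|| <= delta."""
    n = len(b)
    x = [0.0] * n
    d = [-bi for bi in b]
    for _ in range(n):
        Qd = matvec(Q, d)
        dQd = sum(di * qi for di, qi in zip(d, Qd))
        if dQd <= 0.0:                # negative curvature: go to the border
            lam = boundary_step(x, d, delta)
            return [xi + lam * di for xi, di in zip(x, d)]
        g = [qi + bi for qi, bi in zip(matvec(Q, x), b)]      # Qx + b
        alpha = -sum(di * gi for di, gi in zip(d, g)) / dQd
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        if math.sqrt(sum(xi * xi for xi in x_new)) > delta:   # left the region
            lam = boundary_step(x, d, delta)
            return [xi + lam * di for xi, di in zip(x, d)]
        g_new = [qi + bi for qi, bi in zip(matvec(Q, x_new), b)]
        if all(abs(gi) < 1e-12 for gi in g_new):              # converged
            return x_new
        beta = sum(gi * gi for gi in g_new) / sum(gi * gi for gi in g)
        d = [-gi + beta * di for gi, di in zip(g_new, d)]
        x = x_new
    return x

# With a large radius, the iterations recover the unconstrained minimizer
# -Q^{-1} b of the quadratic model.
x = steihaug_toint([[2.0, 0.0], [0.0, 4.0]], [2.0, -4.0], 100.0)
```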
12.2 Calculation of the radius of the trust region


The radius of the trust region is determined by trial and error. At the first iteration,
an arbitrary value is chosen (∆ = 10, for instance). Subsequently, we evaluate the
quality of the approximate solution to the trust region subproblem and the radius of
the trust region is adjusted according to the evaluation.
We assume that the optimal solution (possibly approximate) to the trust region
subproblem is d*. In this case, we can compare the reduction of the model
\[
m_{\hat{x}}(\hat{x}) - m_{\hat{x}}(\hat{x} + d^*)
\]
with the reduction of the function
\[
f(\hat{x}) - f(\hat{x} + d^*).
\]
If the model is reliable, these two quantities should be close. We calculate the ratio
\[
\rho = \frac{f(\hat{x}) - f(\hat{x} + d^*)}{m_{\hat{x}}(\hat{x}) - m_{\hat{x}}(\hat{x} + d^*)}. \tag{12.13}
\]
We consider three cases:
1. ρ is close to 1, or larger, and the model is very good;
2. ρ is close to 0, or smaller, and the model is poor;
3. ρ is in between, and the model is just good.
These cases are characterized by the constants η₁ and η₂ such that 0 < η₁ ≤ η₂ < 1.
Typically, we take η₁ = 0.01 and η₂ = 0.9.
ρ ≥ η₂: the fit between the model and the function seems to be very good, in the
sense that the reduction predicted by the model has practically been reached or even
exceeded.
η₁ ≤ ρ < η₂: the fit between the model and the function is not perfect, but the
model has nevertheless made it possible to reduce the value of the function. We refer
to it as good.
ρ < η₁: the fit between the model and the function is poor, in the sense that
either the reduction of the function is negligible compared to the prediction made
based on the model, or the value of the function has increased.
Several strategies for updating the trust region by using ρ have been proposed in
the literature. Here is one of the simplest:
• If the fit is very good, the radius of the trust region is doubled.
• If the fit is good, the radius remains unchanged.
• If the fit is poor, the radius is reduced to (1/2)‖d*‖.
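This simple strategy can be sketched as a small helper (the function name and the tuple interface are illustrative choices of ours):

```python
def update_radius(rho, delta, step_norm, eta1=0.01, eta2=0.9):
    """One simple radius-update strategy, driven by the ratio rho (12.13).

    Returns (accepted, new_delta): the step is rejected when rho < eta1.
    """
    if rho < eta1:                 # poor fit: reject the step and shrink
        return False, 0.5 * step_norm
    if rho >= eta2:                # very good fit: accept and double
        return True, 2.0 * delta
    return True, delta             # good fit: accept, keep the radius

# A very good fit (rho close to 1) doubles the radius.
accepted, new_delta = update_radius(0.95, 10.0, 3.0)
```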

Putting everything together, we obtain Newton's method with trust region (Algorithm 12.4).
Algorithm 12.4: Newton's method with trust region

1  Objective
2  To find (an approximation of) a local minimum of the problem
       min_{x ∈ R^n} f(x).
3  Input
4  The twice differentiable function f : R^n → R.
5  The gradient of the function ∇f : R^n → R^n.
6  The Hessian of the function ∇²f : R^n → R^{n×n}.
7  An initial solution x₀ ∈ R^n.
8  The radius of the first trust region Δ₀ (by default, Δ₀ = 10).
9  The required precision ε ∈ R, ε > 0.
10 The parameters 0 < η₁ ≤ η₂ < 1 (by default, η₁ = 0.01 and η₂ = 0.9).
11 Output
12 An approximation of the optimal solution x* ∈ R^n.
13 Initialization
14 k := 0.
15 Repeat
16 Calculate d_k by solving (approximately) the trust region subproblem
   (12.2)–(12.3), with the dogleg method (Algorithm 12.2) or the
   Steihaug-Toint truncated conjugate gradient method (Algorithm 12.3).
17 Calculate ρ = (f(x_k) − f(x_k + d_k)) / (m_{x_k}(x_k) − m_{x_k}(x_k + d_k)).
18 if ρ < η₁ then failure
19     x_{k+1} := x_k
20     Δ_{k+1} := (1/2)‖d_k‖
21 else success
22     x_{k+1} = x_k + d_k
23     if ρ ≥ η₂ then very good
24         Δ_{k+1} = 2Δ_k
25     else just good
26         Δ_{k+1} = Δ_k
27 k := k + 1.
28 Until ‖∇f(x_k)‖ ≤ ε
29 x* := x_k.

As an illustration, we apply this method to Example 5.8, i.e.,
\[
\min_{x \in \mathbb{R}^2} \frac{1}{2} x_1^2 + x_1 \cos x_2,
\]
from the same starting point x₀ = (1, 1)^T. In this case, the algorithm converges to
\[
x^* = \begin{pmatrix} -1 \\ 0 \end{pmatrix}, \qquad
\nabla f(x^*) = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad
\nabla^2 f(x^*) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\]
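As a quick check of the reported solution, here is a sketch of the objective of Example 5.8 and its hand-derived gradient (the function names are ours); the gradient vanishes at (−1, 0) and f there equals −1/2, matching the last rows of Table 12.1:

```python
import math

def f(x1, x2):
    # Objective of Example 5.8: (1/2) x1^2 + x1 cos(x2)
    return 0.5 * x1 * x1 + x1 * math.cos(x2)

def grad(x1, x2):
    # Gradient derived by hand: (x1 + cos(x2), -x1 sin(x2))
    return [x1 + math.cos(x2), -x1 * math.sin(x2)]

g = grad(-1.0, 0.0)
val = f(-1.0, 0.0)
```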
which is a local minimum because it satisfies the sufficient optimality conditions
(Theorem 5.7 and the discussions of Example 5.8). The iterations are illustrated
in Figure 12.3, which is interesting to compare with Figure 10.1 and Figure 11.16.
Table 12.1 shows, for each iteration,
• the iterate x_k,
• the value of the function,
• the gradient norm,
• the radius of the trust region Δ_k,
• the ratio ρ defined by (12.13),
• the manner in which the dogleg method (Algorithm 12.2) is ended (1: partial
  Cauchy step, 2: pure Newton step, 3: partial Newton step, 4: dogleg between
  Cauchy and Newton, −2: Cauchy point due to the negative curvature of the
  Newton direction),
• the state of the iteration: poor (−), good (+), very good (++).
We notice that after two iterations where a negative curvature has been detected, the
iterations are the same as Newton's local method starting from iteration 4.
[Figure: (a) Iterates, (b) Zoom; axes x₁ and x₂.]
Figure 12.3: Iterates of Newton's method with trust region and dogleg for Example
5.8 (Δ₀ = 10)

To further illustrate the trust region method, it is worth trying the same algorithm
with a smaller initial radius (∆0 = 1). The iterations are illustrated in Figure 12.4.
Table 12.2 is similar to Table 12.1. We note that the iterations 3 to 11 are actually
equivalent to the steepest descent method. Indeed, since the curvature of the function
is negative in the Newton direction, the dogleg method could not be applied. At
iteration 13, the dogleg method generated a point located on the arc between the
Cauchy point and the dogleg point. As of iteration 14, the iterations are those of
Newton’s local method, and a rapid convergence is achieved.
Table 12.1: Newton's method with trust region for the minimization of Example 5.8 (Δ₀ = 10)

k  (x_k)₁        (x_k)₂        f(x_k)        ‖∇f(x_k)‖     Δ_k           ρ             end  state
0  +1.00000e+00  +1.00000e+00  +1.04030e+00  +1.75517e+00  +1.00000e+01
1  -2.33845e-01  +1.36419e+00  -2.06286e-02  +2.30665e-01  +2.00000e+01  +9.61445e-01  2    ++
2  -1.39549e-01  +6.12415e-01  -1.04451e-01  +6.83438e-01  +4.00000e+01  +9.59237e-01  -2   ++
3  -9.34497e-01  +5.18458e-01  -3.75047e-01  +4.67749e-01  +8.00000e+01  +9.89241e-01  -2   ++
4  -1.24534e+00  -2.41828e-01  -4.33668e-01  +4.05285e-01  +8.00000e+01  +3.53577e-01  2    +
5  -1.01925e+00  -3.99531e-02  -4.99001e-01  +4.53782e-02  +1.60000e+02  +1.06883e+00  2    ++
6  -1.00077e+00  -7.03374e-04  -4.99999e-01  +1.04323e-03  +3.20000e+02  +1.01414e+00  2    ++
7  -1.00000e+00  -5.40691e-07  -5.00000e-01  +5.94432e-07  +6.40000e+02  +1.00035e+00  2    ++
Table 12.2: Newton's method with trust region for the minimization of Example 5.8 (Δ₀ = 1)

k   (x_k)₁        (x_k)₂        f(x_k)        ‖∇f(x_k)‖     Δ_k           ρ             end  state
0   +1.00000e+00  +1.00000e+00  +1.04030e+00  +1.75517e+00  +1.00000e+00
1   +1.22417e-01  +1.47943e+00  +1.86628e-02  +2.45993e-01  +2.00000e+00  +9.47588e-01  1    ++
2   -1.01629e-03  +1.57003e+00  -2.61464e-07  +1.04679e-03  +4.00000e+00  +9.97536e-01  2    ++
3   -5.36408e-04  +1.56809e+00  -1.30949e-06  +2.23824e-03  +8.00000e+00  +1.00000e+00  -2   ++
4   -5.08985e-03  +1.56696e+00  -6.55830e-06  +5.24259e-03  +1.60000e+01  +9.99998e-01  -2   ++
5   -2.68657e-03  +1.55723e+00  -3.28448e-05  +1.12089e-02  +3.20000e+01  +1.00000e+00  -2   ++
6   -2.54882e-02  +1.55160e+00  -1.64466e-04  +2.62487e-02  +6.40000e+01  +9.99957e-01  -2   ++
7   -1.34638e-02  +1.50289e+00  -8.22887e-04  +5.60207e-02  +1.28000e+02  +1.00002e+00  -2   ++
8   -1.27230e-01  +1.47480e+00  -4.10176e-03  +1.30473e-01  +2.56000e+02  +9.98929e-01  -2   ++
9   -6.84750e-02  +1.23764e+00  -2.00488e-02  +2.66527e-01  +5.12000e+02  +1.00051e+00  -2   ++
10  -5.88466e-01  +1.10750e+00  -8.98399e-02  +5.45134e-01  +1.02400e+03  +9.77015e-01  -2   ++
11  -4.02533e-01  +4.16075e-01  -2.87173e-01  +5.37370e-01  +2.04800e+03  +1.01116e+00  -2   ++
12  -4.02533e-01  +4.16075e-01  -2.87173e-01  +5.37370e-01  +1.09534e+00  -2.88565e+00  2    -
13  -1.09350e+00  -4.33824e-01  -3.94333e-01  +4.95902e-01  +1.09534e+00  +2.99489e-01  4    +
14  -1.10395e+00  +3.38629e-02  -4.93964e-01  +1.11009e-01  +2.19067e+00  +9.35399e-01  2    ++
15  -1.00047e+00  +3.16268e-03  -4.99995e-01  +3.19902e-03  +4.38135e+00  +1.00813e+00  2    ++
16  -1.00000e+00  +1.44712e-06  -5.00000e-01  +5.20201e-06  +8.76269e+00  +1.00045e+00  2    ++
17  -1.00000e+00  +7.23075e-12  -5.00000e-01  +7.30618e-12  +1.75254e+01  +1.00001e+00  2    ++
Table 12.3: Newton's method with trust region and Algorithm 12.3 for the minimization of Example 5.8 (Δ₀ = 10)

k  (x_k)₁        (x_k)₂        f(x_k)        ‖∇f(x_k)‖     Δ_k           ρ             end  state
0  +1.00000e+00  +1.00000e+00  +1.04030e+00  +1.75517e+00  +1.00000e+01
1  +1.00000e+00  +1.00000e+00  +1.04030e+00  +1.75517e+00  +5.00000e+00  -7.51975e-02  3    -
2  +1.00000e+00  +1.00000e+00  +1.04030e+00  +1.75517e+00  +2.50000e+00  -1.23991e-01  3    -
3  +5.50230e-01  +3.45921e+00  -3.71332e-01  +4.35121e-01  +2.50000e+00  +4.19624e-01  3    +
4  +1.16790e+00  +2.76142e+00  -4.02518e-01  +4.95063e-01  +2.50000e+00  +1.70028e-01  1    +
5  +1.06365e+00  +3.12536e+00  -4.97834e-01  +6.60783e-02  +5.00000e+00  +1.04357e+00  1    ++
6  +1.00012e+00  +3.14062e+00  -5.00000e-01  +9.75276e-04  +1.00000e+01  +1.00343e+00  1    ++
7  +1.00000e+00  +3.14159e+00  -5.00000e-01  +4.81675e-07  +2.00000e+01  +1.00011e+00  1    ++
[Figure: (a) Iterates, (b) Zoom; axes x₁ and x₂.]
Figure 12.4: Iterates of Newton's method with trust region and dogleg for Example
5.8 (Δ₀ = 1)

[Figure: (a) Iterates, (b) Zoom; axes x₁ and x₂.]
Figure 12.5: Iterates of Newton's method with trust region and Steihaug-Toint for
Example 5.8 (Δ₀ = 10)

Finally, we apply the trust region method by using the truncated conjugate gradient
method (Algorithm 12.3) to solve the trust region subproblem. Table 12.3 lists the
iterations. The penultimate column gives the reasons why Algorithm 12.3 is stopped.
Either it converges toward the unconstrained minimum of the quadratic model (1),
or it generates an iterate outside the trust region (2), or it detects a direction in
which the model has a negative curvature (3). We notice that, as of iteration 4, the
constraint of the trust region subproblem no longer plays a role. In this case, the
iterations are those of Newton’s local method, and a rapid convergence is achieved.

12.3 The Rosenbrock problem


Following on the analysis performed in Section 11.6, we apply the algorithms pre-
sented in this chapter to the Rosenbrock problem. Qualitatively, we reach the same
conclusions: the exploitation of the second derivatives by Newton’s method (here
with trust region) makes it significantly superior to the steepest descent algorithm,
as illustrated in Figure 12.6. Compared to the line search approach, the number of
iterations is roughly the same (23 for line search, 29 for trust region).

[Figure: (a) 29 iterations, (b) Zoom; axes x₁ and x₂.]
Figure 12.6: Newton's method with trust region

12.4 Project
The general organization of the projects is described in Appendix D.

Objective
The aim of the present project is to analyze the behavior of trust region methods
when solving the following problems and to compare it with that of descent methods.

Approach
Implement Algorithm 12.4, once with the dogleg method (Algorithm 12.2) and once
with the Steihaug-Toint truncated conjugate gradient method (Algorithm 12.3), in
order to solve the trust region subproblem. Test several variations by varying the
following parameters:
• ∆0 = 0.1 ; 1.0 ; 10.0 ; 100.0 ;
• η1 = 0.1 ; 0.5 ; 0.9 ;
• η2 = 0.1 (η1 − 1) + 1 ; 0.5 (η1 − 1) + 1 ; 0.9 (η1 − 1) + 1.
Compare these algorithms with the descent method in Chapter 11. Analyze the
results by using the method described in Section D.2.

Algorithms
Algorithms 12.2, 12.3, and 12.4.

Problems
Exercise 12.1. The James Bond problem, described in Section 1.1.5.
Exercise 12.2. The problem
\[
\min_{x \in \mathbb{R}^2} 2 x_1 x_2\, e^{-(4x_1^2 + x_2^2)/8}.
\]
Advice: draw the function and the level curves with a software such as Gnuplot,
visually identify the stationary points, and then choose the starting points, either
close to or far from the stationary points.
Exercise 12.3. The problem
\[
\min_{x \in \mathbb{R}^n} \sum_{i=1}^{n} i^{\alpha} x_i^2, \qquad \bar{x} = (1, \ldots, 1)^T,
\]
with various values for n and for α.


Exercise 12.4. The problem
\[
\min_{x \in \mathbb{R}^2} 3 x_1^2 + x_2^4.
\]
Recommended starting point: (1, −2)^T.

Exercise 12.5. The Rosenbrock problem
\[
\min_{x \in \mathbb{R}^2} 100 \left( x_2 - x_1^2 \right)^2 + (1 - x_1)^2 \quad \text{(Section 12.3)}.
\]
Recommended starting points: (1.2, 1.2)^T and (−1.2, 1)^T.
Exercise 12.6. The problem
\[
\min_{x \in \mathbb{R}^6} \sum_{i=1}^{m} \left( -e^{-0.1 i} + 5 e^{-i} - 3 e^{-0.4 i}
+ x_3 e^{-0.1 i x_1} - x_4 e^{-0.1 i x_2} + x_6 e^{-0.1 i x_5} \right)^2.
\]
Recommended starting point: (1, 2, 1, 1, 4, 3)^T.
Chapter 13

Quasi-Newton methods

Contents
13.1 BFGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
13.2 Symmetric update of rank 1 (SR1) . . . . . . . . . . . . . 317
13.3 The Rosenbrock problem . . . . . . . . . . . . . . . . . . . 320
13.4 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
13.5 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

Newton’s method, either with line search (Algorithm 11.8) or with trust region (Al-
gorithm 12.4), requires the use of the Hessian matrix of the function at each iteration.
This matrix poses some practical problems. First, the analytical calculation of the
second derivatives, as well as their implementation, is often tedious and error-prone.
Moreover, once the work has been done, the calculation of this matrix at each itera-
tion of the algorithm is time-consuming and can be detrimental to the effectiveness
of the algorithms. We therefore adapt the quasi-Newton methods of Chapter 8 to
optimization problems in order to maintain the structure of the algorithm without
using the Hessian matrix.

13.1 BFGS
Keeping in mind that Newton's method aims at solving the system of equations
∇f(x) = 0, it is natural to be directly inspired by the secant methods for systems of
equations presented in Chapter 8 and to propose to approximate the matrix ∇²f(x̂)
by using the Broyden update (8.15), i.e., with a matrix H_k defined by
\[
H_k = H_{k-1} + \frac{(y_{k-1} - H_{k-1} d_{k-1})\, d_{k-1}^T}{d_{k-1}^T d_{k-1}}, \tag{13.1}
\]
with
\[
d_{k-1} = x_k - x_{k-1}, \qquad y_{k-1} = \nabla f(x_k) - \nabla f(x_{k-1}), \tag{13.2}
\]
which is (8.11) where F has been replaced by ∇f.

This matrix satisfies the secant equation (8.10), i.e., the quadratic model formed
from H_k has the same gradient as the function at x_{k−1} and at x_k. The main problem
with this method is that the matrix H_k is generally neither symmetric nor positive
definite, as can be seen for iteration 18 in Table 8.7.
William Cooper Davidon was born in Fort Lauderdale in


Florida, on March 18, 1927. A physicist by education, he
is currently professor emeritus of mathematics at Haverford
College, in Pennsylvania. Nocedal and Wright (1999) tell the
following story. In the middle of the 1950s, Davidon attempted
to solve an optimization problem at the Argonne National
Laboratory. The computers at the time were not too stable.
They always stopped before the algorithm was done. Stimulated
by this frustration, Davidon developed a faster method in order to solve his problem.
This first quasi-Newton algorithm was one of the most revolutionary ideas in nonlinear optimization. The irony of the story is that the original article by Davidon
(1959) was not accepted for publication at the time. It wasn’t published until 1991,
in the first issue of SIAM Journal on Optimization (Davidon, 1991).
Figure 13.1: William C. Davidon

In the context of optimization, as the matrix is approximating the second derivatives
matrix, it makes sense to force it to be symmetric. Moreover, as Newton's method
has to be modified when the second derivative matrix is not positive definite, it
makes sense to also enforce positive definiteness, in order to avoid these
modifications.
The process can be initialized with a symmetric positive definite matrix (the
identity matrix, for instance), but these properties must be maintained by the update
formula that generates Hk from Hk−1 . The following procedure is proposed:
1. Consider Hk−1 symmetric positive definite.
2. Calculate the Cholesky factorization (Definition B.18) of Hk−1 = Lk−1 LTk−1 .
3. Perform an update of Lk−1 , in order to obtain a matrix Ak .
4. Take Hk = Ak ATk , in order to obtain a symmetric and positive definite matrix.
The secant equation (8.10) is written as
\[
A_k A_k^T d_{k-1} = y_{k-1} \tag{13.3}
\]
and can be decomposed into two equations:
\[
A_k x = y_{k-1}, \tag{13.4}
\]
\[
A_k^T d_{k-1} = x. \tag{13.5}
\]

Based on the principles of the secant method, we calculate A_k so that it satisfies
(13.4) by being as close as possible to L_{k−1}. We use the Broyden update formula
(8.15). In order to simplify the notations, we temporarily abandon the indices k and
k − 1, to obtain
\[
A = L + \frac{(y - Lx)\, x^T}{x^T x}. \tag{13.6}
\]
We must now establish the value of x in order for Equation (13.5) to also be satisfied.
By combining (13.5) and (13.6), we get
\[
x = A^T d = L^T d + \frac{(y - Lx)^T d}{x^T x}\, x. \tag{13.7}
\]
The latter equation can only have a solution if L^T d is a multiple of x, i.e., if there
exists α ∈ R such that
\[
x = \alpha L^T d. \tag{13.8}
\]
We immediately note that
\[
x^T x = \alpha^2 d^T L L^T d = \alpha^2 d^T H d. \tag{13.9}
\]
By combining (13.7), (13.8), and (13.9), we obtain
\[
\begin{aligned}
\alpha L^T d &= L^T d + \frac{\alpha}{\alpha^2 d^T H d} (y - \alpha H d)^T d\, L^T d \\
&= L^T d + \frac{1}{\alpha\, d^T H d} \left( y^T d - \alpha\, d^T H d \right) L^T d \\
&= L^T d + \frac{y^T d}{\alpha\, d^T H d} L^T d - L^T d \\
&= \frac{y^T d}{\alpha\, d^T H d} L^T d.
\end{aligned}
\]
Then,
\[
\alpha^2 L^T d = \frac{y^T d}{d^T H d} L^T d,
\]
or
\[
\alpha^2 = \frac{y^T d}{d^T H d}. \tag{13.10}
\]
It is important to note that (13.10) only makes sense if y^T d > 0.
At this point, the equations (13.6), (13.8), and (13.10) define the matrix A:
\[
A = L + \frac{1}{y^T d} \left( \alpha\, y d^T L - \frac{y^T d}{d^T H d}\, H d d^T L \right).
\]
The calculation of AA^T is tedious, but direct.
\[
\begin{aligned}
A A^T = H &+ \frac{\alpha}{y^T d} H d y^T - \frac{\alpha^2}{y^T d} H d d^T H \\
&+ \frac{\alpha}{y^T d} y d^T H + \frac{\alpha^2}{(y^T d)^2} y d^T H d\, y^T - \frac{\alpha^3}{(y^T d)^2} y d^T H d\, d^T H \\
&- \frac{\alpha^2}{y^T d} H d d^T H - \frac{\alpha^3}{(y^T d)^2} H d d^T H d\, y^T + \frac{\alpha^4}{(y^T d)^2} H d d^T H d\, d^T H.
\end{aligned}
\]
By simplifying, and reintegrating the indices, we obtain an update formula for H_k:
\[
A_k A_k^T = H_k = H_{k-1} + \frac{y_{k-1} y_{k-1}^T}{y_{k-1}^T d_{k-1}} - \frac{H_{k-1} d_{k-1} d_{k-1}^T H_{k-1}}{d_{k-1}^T H_{k-1} d_{k-1}}. \tag{13.11}
\]
This update formula was discovered independently in the late 1960s by the mathematicians C. G. Broyden, R. Fletcher, D. Goldfarb, and D. F. Shanno (Figure 13.2),
and is now called the "BFGS" update, which is the acronym of their names.

Figure 13.2: C. G. Broyden, R. Fletcher, D. Goldfarb, and D. F. Shanno

Definition 13.1 (BFGS update). Consider the differentiable function f : R^n → R
and two iterates x_{k−1} and x_k such that d_{k−1}^T y_{k−1} > 0, with d_{k−1} = x_k − x_{k−1}
and y_{k−1} = ∇f(x_k) − ∇f(x_{k−1}). Consider a symmetric positive definite matrix
H_{k−1} ∈ R^{n×n}. The BFGS update is defined by
\[
H_k = H_{k-1} + \frac{y_{k-1} y_{k-1}^T}{y_{k-1}^T d_{k-1}} - \frac{H_{k-1} d_{k-1} d_{k-1}^T H_{k-1}}{d_{k-1}^T H_{k-1} d_{k-1}}. \tag{13.12}
\]
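The update (13.12) can be sketched directly (the function name is ours); a quick numerical check confirms that the result is symmetric and satisfies the secant equation H_k d_{k−1} = y_{k−1}:

```python
def bfgs_update(H, d, y):
    """BFGS update (13.12) for H stored as a nested list (n x n).

    Requires d'y > 0; the result then satisfies the secant equation."""
    n = len(H)
    Hd = [sum(H[i][j] * d[j] for j in range(n)) for i in range(n)]
    dy = sum(di * yi for di, yi in zip(d, y))
    dHd = sum(di * hi for di, hi in zip(d, Hd))
    return [[H[i][j] + y[i] * y[j] / dy - Hd[i] * Hd[j] / dHd
             for j in range(n)] for i in range(n)]

# Secant check on arbitrary data with d'y = 5 > 0: H1 d must equal y.
H1 = bfgs_update([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], [3.0, 1.0])
H1d = [H1[0][0] * 1.0 + H1[0][1] * 2.0, H1[1][0] * 1.0 + H1[1][1] * 2.0]
```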

It is important to emphasize that the symmetric and positive definite secant equation (13.3) does not always have a solution.

Lemma 13.2. Consider d, y ∈ R^n, d ≠ 0. Then there exists a nonsingular
matrix A ∈ R^{n×n} such that
\[
A A^T d = y
\]
if and only if
\[
d^T y > 0.
\]

Proof. Sufficient condition. The above development, and in particular the equations
(13.6), (13.8), and (13.10), ensure that if d^T y > 0, the following matrix is a solution
to the secant equation:
\[
A = L + \frac{1}{y^T d} \left( \alpha\, y d^T L - \frac{y^T d}{d^T H d}\, H d d^T L \right),
\]
with H = L L^T and α such that (13.10) is satisfied.
Necessary condition. If A A^T d = y, then d^T A A^T d = d^T y. Since A A^T is positive
definite, then d^T y > 0.
The condition d^T y > 0 is always satisfied if the second Wolfe condition (Definition 11.8) is used. Indeed,
\[
\begin{aligned}
\nabla f(x_{k-1} + \alpha_{k-1} d_{k-1})^T d_{k-1} &\geq \beta_2 \nabla f(x_{k-1})^T d_{k-1} && \text{from (11.47)} \\
\nabla f(x_k)^T d_{k-1} - \nabla f(x_{k-1})^T d_{k-1} &\geq (\beta_2 - 1) \nabla f(x_{k-1})^T d_{k-1} \\
y_{k-1}^T d_{k-1} &\geq (\beta_2 - 1) \nabla f(x_{k-1})^T d_{k-1} && \text{from (13.2).}
\end{aligned}
\]
If d_{k−1} is a descent direction, then ∇f(x_{k−1})^T d_{k−1} < 0 (Definition 2.10). Since
β₂ < 1 (Definition 11.8), we have
\[
y_{k-1}^T d_{k-1} \geq (\beta_2 - 1) \nabla f(x_{k-1})^T d_{k-1} > 0.
\]

We can thus adapt Newton's method with line search (Algorithm 11.8), by replacing
the Hessian of f by the BFGS approximation. Note that, unlike the Hessian of f,
we are certain that the matrix H_k is positive definite, which significantly simplifies
the algorithm. The direction d_k of the algorithm is calculated by solving the system
of equations
\[
H_k d_k = -\nabla f(x_k).
\]
In order to avoid solving this system at each iteration, it may be appropriate to
analytically calculate H_k^{-1} and obtain d_k by a simple matrix-vector product
\[
d_k = -H_k^{-1} \nabla f(x_k).
\]
We need only¹ apply to (13.12) the Sherman-Morrison-Woodbury formula (Theorem C.17) to obtain
\[
H_k^{-1} = \left( I - \frac{d_{k-1} y_{k-1}^T}{d_{k-1}^T y_{k-1}} \right) H_{k-1}^{-1} \left( I - \frac{y_{k-1} d_{k-1}^T}{d_{k-1}^T y_{k-1}} \right) + \frac{d_{k-1} d_{k-1}^T}{d_{k-1}^T y_{k-1}}. \tag{13.13}
\]
The method is described in Algorithm 13.1.

The iterations of the BFGS method applied to Example 5.8 are listed in Table 13.1
and shown in Figure 13.3. The values of H_k^{-1} and d_{k−1}^T y_{k−1} at each iteration are
given in Table 13.2.
1 Again, this is tedious, but straightforward.
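The inverse update (13.13) can be sketched in the same way (the function name is ours); the inverse secant equation H_k^{-1} y_{k−1} = d_{k−1} provides a quick check:

```python
def inv_bfgs_update(Hinv, d, y):
    """Inverse BFGS update (13.13); Hinv is H_{k-1}^{-1} as a nested list."""
    n = len(Hinv)
    dy = sum(di * yi for di, yi in zip(d, y))
    # A = I - d y' / d'y; the second factor of (13.13) is A'.
    A = [[(1.0 if i == j else 0.0) - d[i] * y[j] / dy for j in range(n)]
         for i in range(n)]
    AH = [[sum(A[i][k] * Hinv[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # A Hinv A' + d d' / d'y
    return [[sum(AH[i][k] * A[j][k] for k in range(n)) + d[i] * d[j] / dy
             for j in range(n)] for i in range(n)]

# Inverse secant check: H_k^{-1} y must equal d.
Hinv1 = inv_bfgs_update([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], [3.0, 1.0])
r = [Hinv1[0][0] * 3.0 + Hinv1[0][1] * 1.0,
     Hinv1[1][0] * 3.0 + Hinv1[1][1] * 1.0]
```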

Algorithm 13.1: Quasi-Newton BFGS method

1  Objective
2  To find (an approximation of) a local minimum of the problem
       min_{x ∈ R^n} f(x).
3  Input
4  The continuously differentiable function f : R^n → R.
5  The gradient of the function ∇f : R^n → R^n.
6  An initial solution x₀ ∈ R^n.
7  A first approximation of the inverse of the Hessian H₀⁻¹ ∈ R^{n×n} which is
   symmetric positive definite. By default, H₀⁻¹ = I.
8  The required precision ε ∈ R, ε > 0.
9  Output
10 An approximation of the optimal solution x* ∈ R^n.
11 Initialization
12 k := 0.
13 Repeat
14 d_k := −H_k⁻¹ ∇f(x_k).
15 Determine α_k by applying a line search (Algorithm 11.5) with α₀ = 1.
16 x_{k+1} := x_k + α_k d_k.
17 k := k + 1.
18 Update H_k⁻¹:
       H_k⁻¹ := (I − d̄_{k−1} y_{k−1}^T / (d̄_{k−1}^T y_{k−1})) H_{k−1}⁻¹ (I − y_{k−1} d̄_{k−1}^T / (d̄_{k−1}^T y_{k−1})) + d̄_{k−1} d̄_{k−1}^T / (d̄_{k−1}^T y_{k−1}),
   with d̄_{k−1} = α_{k−1} d_{k−1} = x_k − x_{k−1} and y_{k−1} = ∇f(x_k) − ∇f(x_{k−1}).
19 Until ‖∇f(x_k)‖ ≤ ε
20 x* := x_k
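On a strictly convex quadratic, Algorithm 13.1 with an exact line search (a simplification standing in for the Wolfe-based search of the text) reaches the minimizer in at most n iterations; a sketch (all names are ours):

```python
def bfgs_quadratic(Q, b, x0):
    """Sketch of Algorithm 13.1 on f(x) = 0.5 x'Qx + b'x, with the exact
    step alpha = -g'd / d'Qd standing in for the line search."""
    n = len(x0)
    Hinv = [[float(i == j) for j in range(n)] for i in range(n)]
    x = list(x0)
    for _ in range(n):
        g = [sum(Q[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]
        d = [-sum(Hinv[i][j] * g[j] for j in range(n)) for i in range(n)]
        Qd = [sum(Q[i][j] * d[j] for j in range(n)) for i in range(n)]
        alpha = -(sum(gi * di for gi, di in zip(g, d))
                  / sum(di * qi for di, qi in zip(d, Qd)))
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [sum(Q[i][j] * x_new[j] for j in range(n)) + b[i]
                 for i in range(n)]
        db = [a - c for a, c in zip(x_new, x)]    # step x_k - x_{k-1}
        y = [a - c for a, c in zip(g_new, g)]     # gradient difference
        s = sum(a * c for a, c in zip(db, y))     # db'y = db'Q db > 0 here
        # inverse BFGS update (13.13): (I - db y'/s) Hinv (I - y db'/s) + db db'/s
        A = [[float(i == j) - db[i] * y[j] / s for j in range(n)]
             for i in range(n)]
        AH = [[sum(A[i][k] * Hinv[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        Hinv = [[sum(AH[i][k] * A[j][k] for k in range(n)) + db[i] * db[j] / s
                 for j in range(n)] for i in range(n)]
        x = x_new
    return x

# Quadratic termination: n = 2 iterations reach -Q^{-1} b = (-1, 1).
x = bfgs_quadratic([[2.0, 0.0], [0.0, 4.0]], [2.0, -4.0], [0.0, 0.0])
```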

[Figure: (a) Iterates, (b) Zoom; axes x₁ and x₂.]
Figure 13.3: Iterates of the quasi-Newton BFGS method for Example 5.8

Table 13.1: Iterates of the BFGS method (Algorithm 13.1) for Example 5.8
k   (x_k)₁            (x_k)₂            f(x_k)            ‖∇f(x_k)‖₂
0 1.00000000e+00 1.00000000e+00 1.04030231e+00 1.75516512e+00
1 2.29848847e-01 1.42073549e+00 6.07772543e-02 4.42214869e-01
Ce document est la propriété exclusive de Kavyaa Kannan (kk392@snu.edu.in) - jeudi 18 avril 2024 à 07h48

2 -1.82864218e-02 1.58305828e+00 3.91418158e-04 3.56023470e-02


3 2.08564945e-04 1.57097473e+00 -1.54584949e-08 2.10734922e-04
4 1.46166453e-01 3.85753135e+00 -9.95969816e-02 6.15829050e-01
5 1.28297837e-01 3.57470602e+00 -1.08221093e-01 7.81223552e-01
6 3.46702460e-01 2.67154972e+00 -2.49000880e-01 5.67023829e-01
7 7.50084904e-01 3.33664226e+00 -4.54548154e-01 2.72899395e-01
8 9.24914367e-01 3.14934086e+00 -4.97153311e-01 7.53969637e-02
9 1.02337559e+00 3.15574887e+00 -4.99624251e-01 2.75857811e-02
10 1.00059290e+00 3.13396355e+00 -4.99970706e-01 7.65884832e-03
11 9.98441459e-01 3.14306370e+00 -4.99997705e-01 2.14077500e-03
12 1.00009954e+00 3.14162418e+00 -4.99999995e-01 1.04416861e-04
13 9.99996251e-01 3.14158835e+00 -5.00000000e-01 5.70573905e-06
14 9.99999984e-01 3.14159270e+00 -5.00000000e-01 4.86624055e-08

Table 13.2: Secant approximation of the BFGS method (Algorithm 13.1) for Example 5.8
k   (H_k⁻¹)₁,₁        (H_k⁻¹)₂,₂        (H_k⁻¹)₁,₂        d_{k−1}^T y_{k−1}
1 7.33361383e-01 9.35044922e-01 1.32282407e-01 1.15252889e+00
2 6.98321680e-01 9.20088043e-01 1.55175319e-01 1.41567951e-01
3 7.00564006e-01 9.15084679e-01 1.58271872e-01 7.89012355e-04
4 2.08781169e+00 1.16981556e+02 1.47289281e+01 1.31040555e-01
5 4.60491698e-01 1.26014080e+01 -1.44703795e+00 1.49596079e-02
6 5.06374127e-01 3.75733536e+00 -4.72552140e-01 2.41674525e-01
7 1.42353095e+00 2.34847423e+00 -1.43694279e-01 3.27748268e-01
8 1.33482497e+00 1.62667185e+00 2.40692428e-01 5.31426899e-02
9 1.00323127e+00 1.58983802e+00 -5.30801141e-02 9.74843071e-03
10 1.17686642e+00 1.17697397e+00 -1.85946123e-01 1.00256605e-03
11 1.08720063e+00 1.00549956e+00 2.39576346e-02 8.75212743e-05
12 1.02776154e+00 1.03763150e+00 3.12543311e-02 4.81635307e-06
13 1.00185313e+00 1.01535053e+00 -5.35546070e-03 1.19529543e-08
14 1.00741129e+00 1.00546900e+00 -6.36327477e-03 3.28322670e-11

13.2 Symmetric update of rank 1 (SR1)
The BFGS update is an update of rank 2, i.e., the matrix H_k − H_{k−1} is a matrix of
rank 2. It is also possible to define a symmetric update of rank 1, i.e., such that
\[
H_k = H_{k-1} + \beta v v^T, \tag{13.14}
\]

where v ∈ R^n and β = 1 or −1. The secant equation is then written as
\[
y_{k-1} = H_k d_{k-1} = H_{k-1} d_{k-1} + \beta v v^T d_{k-1},
\]
i.e., as v^T d_{k−1} is a scalar,
\[
y_{k-1} - H_{k-1} d_{k-1} = \beta v^T d_{k-1}\, v = \frac{1}{\gamma}\, v, \tag{13.15}
\]
or
\[
v = \gamma \left( y_{k-1} - H_{k-1} d_{k-1} \right) \tag{13.16}
\]
with
\[
\frac{1}{\gamma} = \beta v^T d_{k-1}. \tag{13.17}
\]
By replacing (13.16) in (13.17), we obtain
\[
d_{k-1}^T \left( y_{k-1} - H_{k-1} d_{k-1} \right) = \frac{1}{\beta \gamma^2}. \tag{13.18}
\]
According to (13.16), we have
\[
\beta v v^T = \beta \gamma^2 \left( y_{k-1} - H_{k-1} d_{k-1} \right) \left( y_{k-1} - H_{k-1} d_{k-1} \right)^T. \tag{13.19}
\]
We need only combine (13.14), (13.18), and (13.19) to get
\[
H_k = H_{k-1} + \frac{\left( y_{k-1} - H_{k-1} d_{k-1} \right) \left( y_{k-1} - H_{k-1} d_{k-1} \right)^T}{d_{k-1}^T \left( y_{k-1} - H_{k-1} d_{k-1} \right)}. \tag{13.20}
\]

Definition 13.3 (SR1 update). Consider the differentiable function f : R^n → R
and the two iterates x_{k−1} and x_k. Let H_{k−1} ∈ R^{n×n} be a symmetric matrix. The
symmetric rank one (SR1) update is defined by
\[
H_k = H_{k-1} + \frac{\left( y_{k-1} - H_{k-1} d_{k-1} \right) \left( y_{k-1} - H_{k-1} d_{k-1} \right)^T}{d_{k-1}^T \left( y_{k-1} - H_{k-1} d_{k-1} \right)}, \tag{13.21}
\]
with d_{k−1} = x_k − x_{k−1} and y_{k−1} = ∇f(x_k) − ∇f(x_{k−1}).

Note that this update is well defined only if d_{k−1}^T (y_{k−1} − H_{k−1} d_{k−1}) ≠ 0. Also,
it does not necessarily generate a positive definite matrix, even if H_{k−1} is one. For
this reason, it is preferable to use BFGS when dealing with algorithms based on line
search. However, in the context of trust region methods, the SR1 update has proven
effective. We use the SR1 update in Newton's method with trust region to obtain
Algorithm 13.2.
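The SR1 update (13.21) can be sketched as follows (the function name is ours); unlike BFGS it accepts data with d^T y < 0, at the price of possibly losing positive definiteness:

```python
def sr1_update(H, d, y):
    """SR1 update (13.21); in practice the update is skipped when the
    denominator d'(y - Hd) is (close to) zero."""
    n = len(H)
    Hd = [sum(H[i][j] * d[j] for j in range(n)) for i in range(n)]
    r = [yi - hi for yi, hi in zip(y, Hd)]              # y - Hd
    denom = sum(di * ri for di, ri in zip(d, r))        # d'(y - Hd)
    return [[H[i][j] + r[i] * r[j] / denom for j in range(n)]
            for i in range(n)]

# Secant check with d'y = -2 < 0: H1 d still equals y, but H1 need not be
# positive definite.
H1 = sr1_update([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], [-2.0, 1.0])
H1d = [H1[0][0], H1[1][0]]   # H1 times d = (1, 0) is its first column
```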
The iterations of Algorithm 13.2 applied to Example 5.8 are listed in Table 13.4
and shown in Figure 13.4. The values of Hk at each iteration, as well as its two
eigenvalues, are given in Table 13.3.
We notice that the matrix is not positive definite during some iterations (3, 5, 7,
etc.).

Algorithm 13.2: Quasi-Newton SR1 method

1  Objective
2  To find (an approximation of) a local minimum of the problem
       min_{x ∈ R^n} f(x).
3  Input
4  The continuously differentiable function f : R^n → R.
5  The gradient of the function ∇f : R^n → R^n.
6  An initial solution x₀ ∈ R^n.
7  A first approximation of the symmetric Hessian H₀ ∈ R^{n×n} (by default,
   H₀ = I).
8  The radius of the first trust region Δ₀ (by default, Δ₀ = 10).
9  The required precision ε ∈ R, ε > 0.
10 The parameters 0 < η₁ ≤ η₂ < 1 (by default η₁ = 0.01 and η₂ = 0.9).
11 Output
12 An approximation of the optimal solution x* ∈ R^n.
13 Initialization
14 k := 0.
15 Repeat
16 Calculate d_k by solving (approximately) the trust region subproblem by
   using the Steihaug-Toint truncated conjugate gradient method (Algorithm 12.3).
17 ρ := (f(x_k) − f(x_k + d_k)) / (m_{x_k}(x_k) − m_{x_k}(x_k + d_k)).
18 if ρ < η₁ then failure
19     x_{k+1} := x_k
20     Δ_{k+1} := (1/2)‖d_k‖
21 else success
22     x_{k+1} := x_k + d_k.
23     if ρ ≥ η₂ then very good
24         Δ_{k+1} := 2Δ_k
25     else just good
26         Δ_{k+1} := Δ_k.
27 k := k + 1.
28 Define d̄_{k−1} := x_k − x_{k−1} and y_{k−1} := ∇f(x_k) − ∇f(x_{k−1}).
29 if |d̄_{k−1}^T (y_{k−1} − H_{k−1} d̄_{k−1})| ≥ 10⁻⁸ ‖d̄_{k−1}‖ ‖y_{k−1} − H_{k−1} d̄_{k−1}‖ then the
   denominator is non zero
30     H_k := H_{k−1} + (y_{k−1} − H_{k−1} d̄_{k−1})(y_{k−1} − H_{k−1} d̄_{k−1})^T / (d̄_{k−1}^T (y_{k−1} − H_{k−1} d̄_{k−1}))
31 else
32     H_k := H_{k−1}.
33 Until ‖∇f(x_k)‖ ≤ ε
34 x* := x_k
Figure 13.4: Iterates from the quasi-Newton SR1 method for Example 5.8 ((a) iterates; (b) zoom)

13.3 The Rosenbrock problem


We come back to the Rosenbrock problem introduced in Sections 11.6 and 12.3.
The quasi-Newton BFGS method (Figure 13.5) behaves similarly to Newton's method,
taking a few more iterations, but without the calculation of the second derivatives.
The SR1 method (Figure 13.6), on the other hand, encounters more difficulties. The
trust region remains relatively small, which explains the large number of iterations
(Figure 13.8). In this case, the method often behaves like the steepest descent method.
It is important to note that the SR1 method used with line search (Figure 13.7) is
much more effective.
Clearly, these observations cannot be generalized. In particular, the trust region
algorithm with SR1 can be effective for other problems. We encourage the reader to
carry out the project of Section 13.5 for a more systematic analysis of the performance
of these algorithms.

13.4 Comments
We conclude this chapter with a few comments.
• The BFGS update can also be combined with a trust region algorithm. In this
case, we cannot guarantee that the condition of Theorem 13.2 is satisfied. The
technique that consists in not performing an update in this case can be ineffective.
We refer the reader to Powell (1977), and Nocedal and Wright (1999, page 540)
for a description of an alternative update.

Table 13.3: Secant approximation of the SR1 method (Algorithm 13.2) for Example 5.8

k (Hk)1,1 (Hk)2,2 (Hk)1,2 λ1 λ2
1 +1.38780e+00 +1.16113e+00 -2.49977e-01 +1.00000e+00 +1.54894e+00
2 +1.50852e+00 +1.18087e+00 -2.01166e-01 +1.08526e+00 +1.60413e+00
3 +9.92963e-01 -1.42553e-02 -9.86122e-01 -6.17922e-01 +1.59663e+00
4 +1.56592e+00 +5.95949e-01 -3.94837e-01 +4.55548e-01 +1.70632e+00
5 +9.81728e-01 +8.85544e-03 -9.80476e-01 -5.99219e-01 +1.58980e+00
6 +1.33046e+00 +1.28506e+00 -3.13357e-01 +9.93580e-01 +1.62194e+00
7 +9.95038e-01 +3.68905e-02 -9.60395e-01 -5.57287e-01 +1.58922e+00
8 +1.87015e+00 +3.76238e-01 -4.15446e-01 +2.68479e-01 +1.97791e+00
9 +1.07848e+00 +2.20504e-01 -7.66576e-01 -2.28956e-01 +1.52794e+00
10 +1.86165e+00 +2.21270e-01 -7.42082e-01 -6.46135e-02 +2.14753e+00
11 +5.29518e-01 -4.83122e-01 +2.26599e-01 -5.31516e-01 +5.77911e-01
12 +5.29604e-01 -2.67251e-03 +2.20180e-01 -8.19450e-02 +6.08876e-01
13 +5.56284e-01 +1.50541e-01 +1.56244e-01 +9.73485e-02 +6.09477e-01
14 +5.85590e-01 +3.94032e-01 +2.40717e-01 +2.30739e-01 +7.48883e-01
15 +8.04073e-01 +1.08584e+00 -1.48061e-01 +7.40579e-01 +1.14934e+00
16 +1.30324e+00 +1.09341e+00 -8.66119e-02 +1.06228e+00 +1.33437e+00
17 +1.00670e+00 +1.08398e+00 -3.37434e-02 +9.94037e-01 +1.09664e+00
18 +9.99334e-01 +9.54417e-01 -2.85711e-03 +9.54236e-01 +9.99515e-01
19 +9.99555e-01 +9.99873e-01 +3.11648e-04 +9.99364e-01 +1.00006e+00
20 +1.00066e+00 +1.00000e+00 -6.64095e-05 +9.99995e-01 +1.00067e+00

Table 13.4: Quasi-Newton SR1 method for the minimization of Example 5.8 (∆0 = 10)

k xk f(xk) ‖∇f(xk)‖ ∆k
0 +1.00000e+00 +1.00000e+00 +1.04030e+00 +1.75517e+00 +1.00000e+01
1 -5.40302e-01 +1.84147e+00 +2.90430e-01 +9.60942e-01 +1.00000e+01 +
2 -1.88588e-02 +1.50535e+00 -1.05553e-03 +5.02012e-02 +2.00000e+01 ++
3 -5.26023e-02 +1.48367e+00 -3.19396e-03 +6.26948e-02 +4.00000e+01 ++
4 -5.26023e-02 +1.48367e+00 -3.19396e-03 +6.26948e-02 +2.00000e+01 -
5 -1.05608e-01 +1.36062e+00 -1.64571e-02 +1.45885e-01 +4.00000e+01 ++
6 -1.05608e-01 +1.36062e+00 -1.64571e-02 +1.45885e-01 +2.00000e+01 -
7 -2.07848e-01 +1.25531e+00 -4.28899e-02 +2.22561e-01 +4.00000e+01 ++
8 -2.07848e-01 +1.25531e+00 -4.28899e-02 +2.22561e-01 +2.00000e+01 -
9 -4.35002e-01 +4.79315e-01 -2.91369e-01 +4.94801e-01 +4.00000e+01 ++
10 -4.35002e-01 +4.79315e-01 -2.91369e-01 +4.94801e-01 +2.00000e+01 -
11 -4.35002e-01 +4.79315e-01 -2.91369e-01 +4.94801e-01 +1.00000e+01 -
12 -4.35002e-01 +4.79315e-01 -2.91369e-01 +4.94801e-01 +5.00000e+00 -
13 -4.35002e-01 +4.79315e-01 -2.91369e-01 +4.94801e-01 +2.50000e+00 -
14 -1.05435e+00 -2.10470e-01 -4.75256e-01 +2.33154e-01 +2.50000e+00 +
15 -1.05435e+00 -2.10470e-01 -4.75256e-01 +2.33154e-01 +3.26811e-01 -
16 -9.18547e-01 +1.09084e-02 -4.96628e-01 +8.20075e-02 +3.26811e-01 +
17 -9.81944e-01 -3.27718e-03 -4.99832e-01 +1.83348e-02 +6.53623e-01 ++
18 -9.99794e-01 -8.64122e-04 -5.00000e-01 +8.88152e-04 +1.30725e+00 ++
19 -9.99997e-01 +4.04748e-05 -5.00000e-01 +4.05730e-05 +2.61449e+00 ++
20 -1.00000e+00 -3.34769e-09 -5.00000e-01 +3.39719e-09 +5.22898e+00 ++

• The SR1 update can also be combined with a line search algorithm. In this
case, the matrix is not necessarily positive definite, and a modified Cholesky
factorization is necessary, just as for Newton’s method with line search (Algorithm
11.8).

• Other update formulas have been proposed, including that of Davidon, studied by
Fletcher and Powell. It is called DFP, after the initials of the names of the three
researchers. The subject is vast and, in this book, we have chosen to include only
the two formulas that seem the most effective in practice.

Figure 13.5: BFGS method with line search ((a) 34 iterations; (b) zoom)



Figure 13.6: SR1 method with trust region ((a) 97 iterations; (b) zoom)



Figure 13.7: SR1 method with line search ((a) 28 iterations; (b) zoom)



Figure 13.8: Evolution of the radius of the trust region for SR1 (∆k, log scale, versus iteration k)

13.5 Project
The general organization of the projects is described in Appendix D.

Objective

The aim of the present project is to analyze the behavior of quasi-Newton methods
and compare them to descent methods and trust region methods.

Approach

Implement the BFGS method (Algorithm 13.1) and the SR1 method (Algorithm
13.2). Also implement a version of BFGS with trust region and a version of SR1 with
line search, by taking into account the comments in Section 13.4.
Compare these algorithms with the descent methods and the trust region methods.
Analyze the results by means of the method described in Section D.2. Choose as
performance index on the one hand the resolution time and on the other hand the
number #f + n#g, i.e., the number of evaluations of the function plus n times the
number of gradient evaluations.

Algorithms

Algorithms 13.1 and 13.2.

Problems

Exercise 13.1. The James Bond problem, described in Section 1.1.5.



Exercise 13.2. The problem

min_{x∈R2} 2 x1 x2 e^{−(4 x1^2 + x2^2)/8}.

Advice: draw the function and the level curves with a software such as Gnuplot,
visually identify the stationary points, and then choose the starting points, either
close to or far from the stationary points.
Exercise 13.3. The problem

min_{x∈Rn} Σ_{i=1}^{n} i^α x_i^2,   x̄ = (1, . . . , 1)^T,

with different values of n and of α.


Exercise 13.4. The problem

min_{x∈R2} 3 x1^2 + x2^4.

Recommended starting point: (1, −2)^T.
Exercise 13.5. The Rosenbrock problem

min_{x∈R2} 100 (x2 − x1^2)^2 + (1 − x1)^2   (Section 11.6).

Recommended starting points: (1.2, 1.2)^T and (−1.2, 1)^T.
Exercise 13.6. The problem

min_{x∈R6} Σ_{i=1}^{m} ( −e^{−0.1 i} + 5 e^{−i} − 3 e^{−0.4 i} + x3 e^{−0.1 i x1} − x4 e^{−0.1 i x2} + x6 e^{−0.1 i x5} )^2.

Recommended starting point: (1, 2, 1, 1, 4, 3)^T.

Chapter 14

Least squares problem

Contents
14.1 The Gauss-Newton method . . . . . . . . . . . . . . . . . 334
14.2 Kalman filter . . . . . . . . . . . . . . . . . . . . . . . . . . 337
14.3 Orthogonal regression . . . . . . . . . . . . . . . . . . . . . 341
14.4 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343

Least squares problems are optimization problems expressed in the form

min_{x∈Rn} f(x) = (1/2) ‖g(x)‖^2 = (1/2) g(x)^T g(x) = (1/2) Σ_{i=1}^{m} gi(x)^2,   (14.1)

where g : Rn → Rm is a differentiable function. In particular, these problems arise
when one wants to calibrate the parameters of a mathematical model by using data.
Example 14.1 (Unemployment in Switzerland). We are interested in analyzing
the relationship between the number of men and the number of women unemployed
in Switzerland. Table 14.1 reports the number of unemployed people from Jan-
uary 2012 to December 2013 (source: Swiss State Secretariat for Economic Affairs,
www.amstat.ch). It is postulated that, at any point in time, the number β of unem-
ployed women is related to the number α of unemployed men in the following way:

β = m(α; x1 , x2 ) = x1 α + x2 , (14.2)
where x1 and x2 are unknown parameters to be determined.1 Data in Table 14.1
reports for each month t the pair (αt , βt ) of observed number of men and women
who were unemployed at time t. Note that it is impossible to find parameters x1
1 In statistics, the quantity α is called an explanatory or independent variable, and the quantity
β is the dependent variable. Note that in the statistics literature, it is common to denote by
y the dependent variable, by x the independent variable, and by Greek letters the parameters.
However, in this textbook, we focus on the optimization problem that identifies the values of
the parameters such that the model fits the data well. To be consistent with the rest of the
book, we use x to denote the variables of this optimization problem, which are the unknown
parameters of the model.
and x2 such that (14.2) is verified for each data point. The model (14.2) is only an
approximation of the real relationship between the two quantities. Moreover, errors
are always present in any measurement. Therefore, it is assumed that the model is

β = m(α; x1 , x2 ) = x1 α + x2 + ε, (14.3)

where ε is a random variable.

Table 14.1: Unemployment data in Switzerland


Men Women
January 2012 77,005 57,312
February 2012 76,315 56,839
March 2012 70,891 55,501
April 2012 67,667 55,491
May 2012 64,643 54,217
June 2012 61,770 53,098
July 2012 61,593 54,701
August 2012 63,227 56,596
September 2012 63,684 56,663
October 2012 66,914 58,622
November 2012 72,407 59,660
December 2012 82,413 59,896
January 2013 86,515 61,643
February 2013 84,896 61,105
March 2013 79,660 59,333
April 2013 75,827 60,024
May 2013 72,606 58,684
June 2013 69,423 57,075
July 2013 69,690 58,826
August 2013 69,744 60,212
September 2013 70,418 60,654
October 2013 71,998 61,445
November 2013 77,268 61,805
December 2013 87,299 62,138

The least squares estimation of the unknown parameters is performed by solving
the following optimization problem:

min_{x1,x2} Σ_{t=1}^{24} (m(αt; x1, x2) − βt)^2 = Σ_{t=1}^{24} (x1 αt + x2 − βt)^2.   (14.4)

The optimal solution is x1∗ = 0.253 and x2∗ = 39,982. The fitted model and the data
are presented in Figure 14.1.
Figure 14.1: Unemployment in Switzerland: data and fitted linear model (number of unemployed women versus number of unemployed men)

To generalize the formulation of Example 14.1, let us consider a system where
each configuration i defined by the input values αi produces the output values βi.
We have at our disposal a mathematical model enabling us to predict the output values
based on the input values, i.e.,

βi + εi = m(αi; x),   (14.5)

where x represents the parameters of the model and εi is a random variable with
mean zero and variance σ^2 corresponding to the modeling and measurement errors.
Formally, the least squares problem can be derived from the theory of maximum like-
lihood estimation, assuming that the error terms εi are normally distributed. Here,
we prefer to interpret it using an optimization argument, that is, as the identification
of the parameters x that minimize the error, i.e.,

min_{x,ε} Σ_i εi^2

with the constraint

βi + εi = m(αi; x),   i = 1, . . . , m,

or, by using the constraint to eliminate ε,

min_x Σ_i (m(αi; x) − βi)^2,   (14.6)

which is a least squares problem, with gi(x) = m(αi; x) − βi. In the context of
Example 14.1, the mathematical model is linear. Example 14.2 presents a more
complex model.

Example 14.2 (Neural networks). The training of neural networks can also be viewed
as a least squares problem. We here provide a brief introduction to the concept of
neural networks. The reader is referred to the abundant literature on the subject
for more details, such as Haykin (2008) or Iovine (2012). A neural network, in the
mathematical sense of the term, is inspired by its biological analog. The idea is to
develop a complex entity by connecting a large number of single units in a network,
each performing a limited task, using information provided by the other units.
Consider a network organized in N layers of neurons. A neuron j in layer k uses
the information provided by the neurons in layer k − 1, processes it and produces a
result νj,k:

νj,k = φ( (xjk)_0 + Σ_{i=1}^{n_{k−1}} (xjk)_i νi,k−1 ),   (14.7)

where nk−1 is the number of neurons in layer k − 1, xjk is a vector of 1 + nk−1
parameters weighting the importance of the information received by the different
neurons from the previous layer, and φ : R → R is a continuously differentiable
function, called an activation function. Two typical examples of activation functions
are the sigmoidal function

φ(α) = 1 / (1 + e^{−α})   (14.8)

and the hyperbolic tangent function

φ(α) = (e^α − e^{−α}) / (e^α + e^{−α}).   (14.9)

Note that these functions are continuous approximations of "switches," that is, of
functions with discrete output: 0 and 1 for (14.8) (see Figure 14.2(a)) and −1 and 1
for (14.9) (see Figure 14.2(b)).
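The two activation functions can be written down directly (a minimal sketch assuming NumPy; the function names are ours):

```python
import numpy as np

def sigmoid(a):
    """Sigmoidal activation (14.8): maps R onto (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def tanh_activation(a):
    """Hyperbolic tangent activation (14.9): maps R onto (-1, 1)."""
    # np.tanh computes exactly (e^a - e^-a) / (e^a + e^-a).
    return np.tanh(a)
```

Evaluating them far from the origin illustrates the "switch" behavior: the sigmoid saturates at 0 and 1, the hyperbolic tangent at −1 and 1.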

Figure 14.2: Examples of activation functions ((a) sigmoid; (b) hyperbolic tangent)



The first layer uses information from the exterior of the system, denoted by νj,0,
j = 1, . . . , n0. The neural network can be seen as a model (14.5), where the input
parameters αi correspond to the information νj,0, j = 1, . . . , n0, the output parameters
βi correspond to the information produced by the last layer, i.e., νj,N, j = 1, . . . , nN,
and the unknown parameters of the model are all the weightings xjk, j = 1, . . . , nk−1,
k = 1, . . . , N. The calibration of the weights xjk is sometimes called the training
phase of the neural network.
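A forward pass implementing (14.7) layer by layer can be sketched as follows (assuming NumPy; the function names and the weight-matrix layout, with the bias in the first column, are our conventions):

```python
import numpy as np

def layer_output(nu_prev, X, phi):
    """Output of one layer, following (14.7).

    nu_prev : vector of n_{k-1} outputs from layer k-1
    X       : (n_k, 1 + n_{k-1}) weight matrix; row j holds the
              parameters x_{jk} of neuron j, bias component first
    phi     : activation function
    """
    return phi(X[:, 0] + X[:, 1:] @ nu_prev)

def network_output(nu0, weights, phi):
    """Forward pass through all N layers: the model m(alpha; x)."""
    nu = nu0
    for X in weights:
        nu = layer_output(nu, X, phi)
    return nu
```

The weights stacked in the matrices of `weights` are exactly the unknown parameters x of the least squares problem; training adjusts them so that `network_output` fits the observed outputs.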

It is important to note that this least squares problem is significantly more complex
to solve than that of Example 14.1. Bertsekas (1999) uses the following example to
illustrate that complexity:

min_{x1,x2} (1/2) Σ_{i=1}^{5} (βi − φ(x1 αi + x2))^2,

where φ is the hyperbolic tangent function (14.9). The five pairs (αi, βi) are listed
in Table 14.2. The resulting objective function is shown in Figure 14.3.

Table 14.2: Data for the example on a neural network

αi βi
1.165 1
0.626 −1
0.075 −1
0.351 1
−0.696 1

Figure 14.3: Example of an objective function for the training of a neural network (surface plot over x1 and x2)

Now that we have defined and illustrated the problem, we present an algorithm
that exploits its specific structure.

14.1 The Gauss-Newton method


We consider applying Newton's method to the least squares problem (14.1), focusing
first on the local version (Chapter 10). To do so, one must calculate the
gradient and the Hessian matrix of the problem.

We derive

f(x) = (1/2) g(x)^T g(x)

to obtain

∇f(x) = ∇g(x) g(x) = Σ_{i=1}^{m} ∇gi(x) gi(x),

where ∇g(x) ∈ Rn×m is the gradient matrix of g (Definition 2.17) and ∇gi(x) ∈ Rn
is the gradient of gi. By differentiating again, we get

∇²f(x) = Σ_{i=1}^{m} ( ∇gi(x) ∇gi(x)^T + ∇²gi(x) gi(x) )
       = ∇g(x) ∇g(x)^T + Σ_{i=1}^{m} ∇²gi(x) gi(x).

The second term in the Hessian matrix is generally computationally
demanding in practice, since it involves the second derivatives of the functions
gi. It is therefore wise to ignore it and approximate the Hessian matrix using only the
first term, i.e., ∇g(x)∇g(x)^T. Note that this matrix is always positive semidefinite.
We obtain the Gauss-Newton method, presented as Algorithm 14.1, that uses only
the first derivatives of g, not the second.
Note that, if ∇g(xk)∇g(xk)^T is non-singular, it is positive definite and we have
a descent method. If this is not the case, the technique presented in Section 11.5,
based on a modified Cholesky factorization (Algorithm 11.7), is appropriate. The
name Levenberg-Marquardt is often used in this context, in reference to the work of
Levenberg (1944) and Marquardt (1963). Moreover, the method is locally convergent,
i.e., it converges if x0 is not too far from x∗. The adaptations that enable the global
convergence of Newton's local method, namely line search (Algorithm 11.8) and
trust region (Algorithm 12.4), can of course be used to render the Gauss-Newton
method globally convergent.
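The core of the local Gauss-Newton iteration can be sketched as follows (a minimal sketch assuming NumPy; the function names are ours, and `jac(x)` is assumed to return the m × n Jacobian of g, so that the gradient matrix ∇g(x) of the text is its transpose):

```python
import numpy as np

def gauss_newton(g, jac, x0, eps=1e-8, maxiter=100):
    """Gauss-Newton sketch for min (1/2) g(x)^T g(x).

    Each step solves (J^T J) d = -J^T g(x), i.e., equation (14.10)
    with grad g(x) = J(x)^T.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(maxiter):
        J, gx = jac(x), g(x)
        grad = J.T @ gx                 # grad f(x) = grad g(x) g(x)
        if np.linalg.norm(grad) <= eps:
            break
        x = x + np.linalg.solve(J.T @ J, -grad)
    return x
```

For a linear function g(x) = Ax − b the Jacobian is constant and the method stops after a single step, consistently with the discussion of the normal equations.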
In the case where the function g is linear, i.e.,

g(x) = Ax − b

with A ∈ Rm×n, we have ∇g(x) = A^T and the Gauss-Newton equation (14.10) is
written as

A^T A dk+1 = −A^T (A xk − b).

Since dk+1 = xk+1 − xk, we obtain

A^T A xk+1 = A^T b,

and this regardless of xk. This system of equations is called the system of normal
equations of the linear least squares problem.

Algorithm 14.1: Gauss-Newton method

1 Objective
2 To find (an approximation of) a local optimum of the least squares problem min_{x∈Rn} f(x) = (1/2) g(x)^T g(x).
3 Input
4 The differentiable function g : Rn → Rm.
5 The gradient matrix of g: ∇g : Rn → Rn×m.
6 An initial solution x0 ∈ Rn.
7 The required precision ε ∈ R, ε > 0.
8 Output
9 An approximation of the optimal solution x∗ ∈ Rn.
10 Initialization
11 k := 0.
12 Repeat
13 Calculate dk+1 that solves
∇g(xk) ∇g(xk)^T dk+1 = −∇g(xk) g(xk).   (14.10)
14 xk+1 := xk + dk+1
15 k := k + 1
16 Until ‖∇g(xk) g(xk)‖ ≤ ε
17 x∗ := xk

Definition 14.3 (Normal equations). Consider the linear least squares problem

min_{x∈Rn} (1/2) ‖Ax − b‖₂^2,   (14.11)

with A ∈ Rm×n and b ∈ Rm. The equations of the system

A^T A x = A^T b   (14.12)

are called the normal equations of (14.11).

Theorem 14.4 (Normal equations). Consider A ∈ Rm×n and b ∈ Rm. x∗ solves
the normal equations (14.12) if and only if x∗ is an optimal solution to the linear
least squares problem (14.11). Moreover, if A is of full rank, x∗ is the unique
optimal solution to (14.11).

Proof. The objective function of (14.11) is

f(x) = (1/2) (Ax − b)^T (Ax − b),

so that

∇f(x) = A^T A x − A^T b

and

∇²f(x) = A^T A.

The normal equations are therefore equivalent to the first order optimality conditions
∇f(x∗) = 0, and are necessary (Theorem 5.1). The eigenvalues of A^T A are the
singular values of A squared. Therefore, ∇²f(x) is positive semidefinite for all x
and f is convex. Moreover, if A is of full rank, ∇²f(x) is positive definite and f is
strictly convex. The sufficient global optimality conditions are therefore verified, and
Theorem 5.9 applies to prove the result.
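As a quick numerical illustration of Theorem 14.4 (a sketch assuming NumPy; the data is arbitrary), solving the normal equations (14.12) directly gives the same minimizer as a dedicated least squares routine:

```python
import numpy as np

# A small overdetermined system: m = 4 observations, n = 2 parameters.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Solve the normal equations (14.12): A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# A dedicated least squares routine returns the same minimizer.
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
```

In practice, explicitly forming A^T A squares the condition number of the problem, so QR- or SVD-based routines such as `np.linalg.lstsq` are usually preferred numerically; the normal equations remain the right conceptual tool.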
From this result, it appears that the Gauss-Newton method identifies the optimal
solution in a single iteration when the function g is linear. We now give another
interpretation of the method that supports this observation.
To do this, we use a simple model of the function at each iteration. We already
used this idea when presenting Newton's local method, where a quadratic model was
used (Algorithm 10.2). Here, we model g and not f, and we utilize a linear model
(Definition 7.9).
Consider the least squares problem (14.1) and an iterate xk. Replace g(x) by the
linear model in xk

m_{xk}(x) = g(xk) + ∇g(xk)^T (x − xk).

The problem then becomes

min_x M(x) = min_x (1/2) ‖m_{xk}(x)‖^2 = min_x (1/2) m_{xk}(x)^T m_{xk}(x).

We have

M(x) = (1/2) m_{xk}(x)^T m_{xk}(x)
     = (1/2) g(xk)^T g(xk) + (x − xk)^T ∇g(xk) g(xk) + (1/2) (x − xk)^T ∇g(xk) ∇g(xk)^T (x − xk).

The gradient of this expression is

∇M(x) = ∇g(xk) ∇g(xk)^T (x − xk) + ∇g(xk) g(xk).

It is zero for

xk+1 = xk − (∇g(xk) ∇g(xk)^T)^{−1} ∇g(xk) g(xk),

if ∇g(xk) ∇g(xk)^T is invertible. This equation is equivalent to (14.10). An iteration
of the Gauss-Newton method thus amounts to linearizing the function g around
xk and solving the least squares problem thus obtained. It is therefore evident that, if
g is already linear, the method converges in one iteration (if ∇g(xk) ∇g(xk)^T = A^T A
is invertible).

14.2 Kalman filter


We now consider the linear least squares problem when the data is organized into
blocks. There are several motivations for this.
Nowadays, more and more data is available. In the era of “big data,” the size
of problem (14.11) can be huge. More precisely, m may be much larger than the
number of parameters n. As a consequence, it may happen that the matrix A cannot
be stored as such in memory, and has to be split into blocks just to be handled.
Another context is when data is collected by different sources. In this case, it
is convenient not to wait until data is available from all sources to start solving
the problem. Suppose that a couple of sources have not reported yet, and that the
problem is solved with the available data. When the last pieces of data are finally
made available, it is desirable to update the previous solution, rather than solving
the whole problem from the beginning. In this context, each block of data would
correspond to one of the sources.
The situation is similar when data is available over time, such as in real-time
applications. In this case, each time period corresponds to a block of data. And,
again, it is desirable to calculate a solution with the available data, and update it
when more data is made available.
The matrices involved in the least squares problem (14.11) are represented in
Figure 14.4. The matrix A and the vector b are sliced into J blocks, possibly of
different sizes m1 ,. . . ,mJ .
We want to obtain a first approximation of the calibrated parameters with the
first block of data and subsequently update it as other blocks become available. We
write the least squares problem

min_{x∈Rn} ‖Ax − b‖₂^2

as

min_{x∈Rn} Σ_{j=1}^{J} ‖Aj x − bj‖₂^2,

where Aj ∈ Rmj×n and bj ∈ Rmj. The optimal solution to this problem is generated
incrementally, starting with the optimal solution to the subproblem corresponding to
the first block of data:

min_{x∈Rn} ‖A1 x − b1‖₂^2.

We call x1 ∈ Rn the solution of this problem. Assume that the matrix A1 is of full
rank. In this case, according to Theorem 14.4, x1 is given by the normal equations

A1^T A1 x1 = A1^T b1.   (14.13)

Since A1 is of full rank, A1^T A1 is invertible and

x1 = (A1^T A1)^{−1} A1^T b1.

Figure 14.4: Structures of the matrix and vectors involved in the least squares problem organized by blocks (A, x, and b, with A and b sliced into J blocks of sizes m1, . . . , mJ)

Consider now the problem composed of the first two blocks:

min_{x∈Rn} ‖A1 x − b1‖₂^2 + ‖A2 x − b2‖₂^2,

that we can also write, with A1, A2 and b1, b2 stacked vertically,

min_{x∈Rn} ‖ (A1; A2) x − (b1; b2) ‖₂^2.

We denote x2 ∈ Rn the optimal solution of this problem. It verifies the normal
equations

(A1^T A1 + A2^T A2) x2 = A1^T b1 + A2^T b2.

From (14.13), we obtain

(A1^T A1 + A2^T A2) x2 = A1^T b1 + A2^T b2
  = A1^T A1 x1 + A2^T b2
  = A1^T A1 x1 + A2^T A2 x1 − A2^T A2 x1 + A2^T b2
  = (A1^T A1 + A2^T A2) x1 + A2^T (b2 − A2 x1).

Since A1^T A1 is invertible, A1^T A1 + A2^T A2 is too, and we get

x2 = x1 + (A1^T A1 + A2^T A2)^{−1} A2^T (b2 − A2 x1).

The same technique is used for the following blocks:

x3 = x2 + (A1^T A1 + A2^T A2 + A3^T A3)^{−1} A3^T (b3 − A3 x2),
x4 = x3 + (A1^T A1 + A2^T A2 + A3^T A3 + A4^T A4)^{−1} A4^T (b4 − A4 x3),

and so on. For the sake of completeness, we can also obtain an incremental form for
the first block. Indeed, for any x0 ∈ Rn,

x1 = x0 − x0 + (A1^T A1)^{−1} A1^T b1
   = x0 − (A1^T A1)^{−1} A1^T A1 x0 + (A1^T A1)^{−1} A1^T b1
   = x0 + (A1^T A1)^{−1} A1^T (b1 − A1 x0).

If we write Hj = Σ_{k=1}^{j} Ak^T Ak, we obtain the formula

xj = xj−1 + Hj^{−1} Aj^T (bj − Aj xj−1),   (14.14)

where

Hj = Hj−1 + Aj^T Aj.   (14.15)

Note that only data from block j, that is, Aj and bj, is used here, irrespective of the
number of blocks already treated. This is the Kalman filter algorithm, presented as
Algorithm 14.2.

Algorithm 14.2: The Kalman filter method

1 Objective
2 To find, in an incremental manner, the optimal solution x∗ to a linear least squares problem
min_{x∈Rn} Σ_{j=1}^{J} ‖Aj x − bj‖₂^2.   (14.16)
3 Input
4 The matrices Aj ∈ Rmj×n, j = 1, . . . , J.
5 The vectors bj ∈ Rmj, j = 1, . . . , J.
6 An initial solution x0 ∈ Rn (default: x0 = 0).
7 An initial matrix H0 ∈ Rn×n (default: H0 = 0).
8 Output
9 The optimal solution x∗ ∈ Rn.
10 Initialization
11 j := 0.
12 Repeat
13 j := j + 1
14 Hj := Hj−1 + Aj^T Aj
15 Calculate dj solving Hj dj = Aj^T (bj − Aj xj−1)
16 xj := xj−1 + dj
17 Until j = J
18 x∗ := xJ
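Algorithm 14.2 can be sketched as follows (a minimal sketch assuming NumPy; the function name is ours). With a full-rank first block, the incremental solution coincides with the batch solution of (14.16):

```python
import numpy as np

def kalman_blocks(blocks, x0=None, H0=None):
    """Incremental least squares over data blocks (Algorithm 14.2).

    blocks is a list of pairs (A_j, b_j).  Implements the updates
    (14.14)-(14.15): H_j = H_{j-1} + A_j^T A_j and
    x_j = x_{j-1} + H_j^{-1} A_j^T (b_j - A_j x_{j-1}).
    """
    n = blocks[0][0].shape[1]
    x = np.zeros(n) if x0 is None else x0
    H = np.zeros((n, n)) if H0 is None else H0
    for A, b in blocks:
        H = H + A.T @ A
        d = np.linalg.solve(H, A.T @ (b - A @ x))
        x = x + d
    return x
```

Each step touches only the current block (A_j, b_j); the memory of all previous blocks is carried by the n × n matrix H and the vector x.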

Example 14.5 (Kalman filter). We consider again the unemployment data of Ex-
ample 14.1 (Table 14.1). At the end of 2012, a model is calibrated using the 2012
data. Then, as a new piece of data arrives every month, the estimated value of the
parameters is updated with the new information. The estimated parameters at each
stage of this process are reported in Table 14.3.

Block x1 x2
1 0.210758 41998.1
2 0.23963 40074.4
3 0.249038 39451.5
4 0.249355 39431.6
5 0.255092 39122.6
6 0.255453 39157.6
7 0.254998 39200.6
8 0.251087 39579.7
9 0.24501 40172.3
10 0.24095 40617.4
11 0.241631 40726.7
12 0.253012 40011.7
13 0.253436 39982.5

Table 14.3: Kalman filter: estimated parameters of Example 14.5

Figure 14.5 compares the model estimated on data from 2012 only with the model
estimated on the entire data set, 2012 and 2013.

Figure 14.5: Unemployment in Switzerland: data and fitted linear models (model estimated on the 2012 data versus model estimated on the 2012-2013 data)



The Kalman filter method only works if the matrix defining the first block is of full
rank. To achieve this, it usually suffices to put enough data in the first block, in order
for a first model to be estimated. Indeed, if n parameters have to be estimated, at
least n pieces of data are necessary, and even more if there is collinearity. Where this is
not possible, one must define H0 such that the matrix H1 = H0 + A1^T A1 is invertible.
Typically, we choose H0 to be a multiple of the identity. Using this technique naturally
affects the result: the thus-perturbed Kalman filter only gives an approximation of
the optimal solution. However, if the number of blocks J is large, the impact of the
arbitrary matrix H0 should be sufficiently small for all practical purposes.

Kalman filter in real time


The Kalman filter is particularly well adapted for real-time calibration models, i.e.,
when the value of the estimated parameters must be updated as new data becomes
available. Suppose, for example, that we are interested in calibrating the parameters
of a model for vehicular traffic on a highway, using measurements of the flow of cars
on each lane of the highway. Data arrives every minute, say, and the Kalman filter
method is used to update the estimation of the parameters. During the afternoon,
the data collected during the morning peak hour is less relevant to reflect the current
state of the traffic, compared to the data collected 5 minutes ago. In this case, it is
necessary to give greater weight to recent data as opposed to old data. Assume that
the data becomes available in regular intervals and that the pair (Aj , bj ) corresponds
to data from the time interval j. We introduce a discount parameter λ, with 0 < λ ≤ 1,
that represents the relative importance of the data from one time interval j compared
to the previous interval j − 1. When time moves forward, we multiply the weight of
all past data sets by λ. Consequently, the weight associated with the data from two
intervals ago is λ2 , and the weight associated with data from interval i ≤ j is λj−i .
The least squares problem that we have to solve is therefore

min_{x∈Rn} Σ_{j=1}^{J} λ^{J−j} ‖Aj x − bj‖₂^2,

where J is the current time interval. We apply the Kalman filter update in this
context, and obtain Algorithm 14.3. Note that the whole memory of the process is
accumulated in the matrix HJ−1 and the vector xJ−1, irrespective of the number of
past iterations.
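One update of the discounted scheme can be sketched as follows (assuming NumPy; the function name is ours):

```python
import numpy as np

def discounted_update(x_prev, H_prev, A, b, lam=0.9):
    """One step of the real-time Kalman filter (Algorithm 14.3).

    Old information accumulated in H_prev is discounted by lam
    (0 < lam <= 1) before the new block (A, b) is absorbed.
    """
    H = lam * H_prev + A.T @ A
    d = np.linalg.solve(H, A.T @ (b - A @ x_prev))
    return x_prev + d, H
```

With λ = 1, the update reduces to one iteration of Algorithm 14.2, so the recursion reproduces the undiscounted batch solution; with λ < 1, recent blocks dominate the estimate.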

14.3 Orthogonal regression


When estimating the parameters of a mathematical model (14.5) by using the least
squares method, we assume that the observations βi (in statistics called dependent
variables) are subject to random errors, while the observations αi (the independent
variables) are accurately known.

Algorithm 14.3: The Kalman filter method in real time

1 Objective
2 To update the parameters of a linear model as new data becomes
  available. At each time interval J, the previous solution of the time
  interval J − 1 is updated to obtain the solution of the problem

      min_{x ∈ R^n}  Σ_{j=1}^{J} λ^{J−j} ‖A_j x − b_j‖² .    (14.17)

3 Input
4 The matrix A_J ∈ R^{m_J × n}.
5 The vector b_J ∈ R^{m_J}.
6 The previous solution x_{J−1} ∈ R^n.
7 The previous matrix H_{J−1} ∈ R^{n×n}.
8 A discount factor λ such that 0 < λ ≤ 1.
9 Output
10 x_J and H_J
11 Update
12 H_J := λH_{J−1} + A_J^T A_J
13 Calculate d_J by solving H_J d_J = A_J^T (b_J − A_J x_{J−1})
14 x_J := x_{J−1} + d_J
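The two update lines are compact enough to transcribe directly. Below is a Python sketch of one step of Algorithm 14.3; only the update itself is specified by the algorithm, so the initialization of H before the first data block arrives (a small multiple of the identity is used in practice) is an assumption of this sketch.

```python
import numpy as np

def kalman_discounted_update(x_prev, H_prev, A_J, b_J, lam):
    """One step of Algorithm 14.3: update (x, H) with the data block (A_J, b_J).

    lam is the discount factor, 0 < lam <= 1; lam = 1 recovers the
    ordinary (undiscounted) recursive least squares update.
    """
    H_J = lam * H_prev + A_J.T @ A_J
    d_J = np.linalg.solve(H_J, A_J.T @ (b_J - A_J @ x_prev))
    return x_prev + d_J, H_J
```

With exact data b_J = A_J x̄ and any 0 < λ ≤ 1, the iterates recover the true parameters x̄ as soon as enough data has accumulated for H_J to be non singular.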

The hypothesis concerning the accuracy of α_i is often too strong. In Example
14.1, there is no reason to assume that there are errors in the measurement of female
unemployment and not in the measurement of male unemployment.
Assume now that both αi and βi are subject to measurement errors. The model
is now written as
βi + εi = m(αi + ξi ; x) , (14.18)
where εi and ξi are random variables assumed to be independent and with mean zero.
In this case, the least squares problem amounts to finding the value of the parameters
that make the errors as small as possible, that is
    min_{x,ε,ξ}  Σ_i ( ε_i² + ξ_i² )

with the constraint

β_i + ε_i = m(α_i + ξ_i ; x) ,    i = 1, . . . , m ,

or, by using the constraint to eliminate ε,


    min_{x,ξ}  Σ_i ( ( m(α_i + ξ_i ; x) − β_i )² + ξ_i² ) .    (14.19)

It is important to note that even if m is linear, we no longer have a standard linear least
squares problem that can be solved with the normal equations. Moreover, the number of
unknowns is in this case n + m. Therefore, when the number m of pieces of data is large,
this type of approach can considerably increase the size of the problem.
Geometrically, problem (14.19) amounts to minimizing the distance between
the observations (α_i, β_i) and the points predicted by the model, (α_i + ξ_i, m(α_i + ξ_i ; x)),
as illustrated in Figure 14.6.

Figure 14.6: Orthogonal regression (the observation (α_i , β_i ) and the model point (α_i + ξ_i , m(α_i + ξ_i ; x)))
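When the model m is linear, problem (14.19) coincides with what is known as total least squares, which admits a direct solution through a singular value decomposition; this shortcut applies only to the linear case, not to the general nonlinear problem above. A minimal sketch for the one-parameter model m(α; x) = αx:

```python
import numpy as np

def orthogonal_regression_1d(alpha, beta):
    """Fit beta ~ alpha * x by minimizing orthogonal distances (total least squares).

    The fitted line through the origin is orthogonal to the right singular
    vector of the stacked data matrix [alpha beta] associated with the
    smallest singular value. Assumes the fitted line is not vertical.
    """
    M = np.column_stack([alpha, beta])
    _, _, Vt = np.linalg.svd(M)
    v = Vt[-1]            # direction of smallest variance, normal to the line
    return -v[0] / v[1]   # from v[0]*alpha + v[1]*beta = 0, the slope x
```

On noise-free data the orthogonal fit and the classic least squares fit coincide; they differ as soon as α is perturbed as well.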

14.4 Project
The general organization of the projects is described in Appendix D.

Objectives
To compare the Gauss-Newton method with the quasi-Newton methods, analyze the
Kalman filter method, and understand the role of orthogonal regression.

Approach
1. Compare the performances of the quasi-Newton methods (BFGS and SR1) and
the Gauss-Newton method for several non linear problems. Utilize the technique
described in Appendix D to analyze the results.

2. Generate a large sample with a linear model and apply the Kalman filter method.
After how many iterations does the approximation of the optimal solution be-
come stabilized? If synthetic data is used, compare with the “real” value of the
parameters. Perform this analysis with samples combining data of varying quality.
3. Generate a sample by introducing perturbations for α and β. Solve the classic least
squares problem and compare with the optimal solution obtained by orthogonal
regression and with the “real” value.

Algorithms

Algorithms 13.1, 13.2, 14.1 and 14.2.

Problems

As part of this project, we recommend working with real data to test the algorithms.
If such data is unavailable, here is how to generate synthetic data.
Apply a model: choose a mathematical model m(α, x), with α ∈ R^p and x ∈ R^n.
For instance, a simple linear model

m(α, x) = αx , p = 1, n = 1 ,

or a non linear model


    m(α, x) = 1 / (1 + e^{−α^T x}) ,    p = n .
Generate data: generate a sample by randomly choosing values for α for each
“observation.” For instance, with p = 2,
Obs. α1 α2
1 0.007021 6.760832
2 0.017676 4.377047
3 0.866839 0.883049
4 3.235088 0.329827
5 4.699759 0.186086
6 1.080447 0.275139
7 2.482403 0.492628
8 3.544106 1.038803
Choose the parameters: choose the values for the elements of x. In our example, let
us take x1 = 1 and x2 = −2.
Generate dependent values: we use two methods.
1. Choose a value of variance σ2 . For each “observation” in the sample, pick a random
number r according to the normal law N(0, σ2 ) and generate

    β = m(α, x) + r .

In our example, by taking m(α, x) = 1/(1 + e^{−α^T x}) and σ = 0.05, we get

Obs. α1 α2 m(α, x) r β
1 0.007021 6.760832 1.351E-06 0.072012 0.072013
2 0.017676 4.377047 0.0001606 -0.06527 -0.06511
3 0.866839 0.883049 0.28920265 0.032485 0.321687
4 3.235088 0.329827 0.92926373 0.022983 0.952247
5 4.699759 0.186086 0.9869726 -0.02112 0.965848
6 1.080447 0.275139 0.62952263 0.051388 0.680911


7 2.482403 0.492628 0.8171486 0.012699 0.829847
8 3.544106 1.038803 0.81252489 -0.03009 0.782435
2. Choose a diagonal matrix Σ_α ∈ R^{p×p} and a value of variance σ_β². For each
“observation,” pick a random vector q according to the multivariate normal law
N(0, Σ_α) and a random number r according to the normal law N(0, σ_β²) and generate

β = m(α + q, x) + r .

In our example, let us take


 
    Σ_α = ( 0.02  0 ; 0  0.01 ) ,

to obtain
Obs. α1 α2 q1 q2
1 0.007021 6.760832 0.020367 0.003594
2 0.017676 4.377047 0.015533 0.004155
3 0.866839 0.883049 -0.01972 0.000366
4 3.235088 0.329827 -0.03204 0.007443
5 4.699759 0.186086 -0.0363 -0.01052
6 1.080447 0.275139 0.038858 -0.01315
7 2.482403 0.492628 0.026068 0.006862
8 3.544106 1.038803 0.008163 0.002044
and
Obs. m(α + q, x) r β
1 1.36895E-06 0.072012 0.072013
2 0.000161766 -0.06527 -0.06511
3 0.285016732 0.032485 0.317501
4 0.926116236 0.022983 0.949099
5 0.986775082 -0.02112 0.96565
6 0.644587833 0.051388 0.695976
7 0.818985987 0.012699 0.831685
8 0.813144895 -0.03009 0.783055
Variants: it is interesting to combine, in a single sample, several samples, generated
with different variances. This simulates a process of data collection for which one
subsample is more accurate than another.
The problem to solve is
    min_{x ∈ R^n}  (1/2) Σ_j ( β(j) − m(α(j), x) )² .
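The data-generation recipe above is straightforward to script. The following Python sketch implements the second generation method (perturbing both α and β); for simplicity it takes Σ_α = σ_α² I instead of a general diagonal matrix, and the sample ranges and random seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1234)  # arbitrary seed

def model(alpha, x):
    # non linear model m(alpha, x) = 1 / (1 + exp(-alpha^T x)), with p = n
    return 1.0 / (1.0 + np.exp(-alpha @ x))

def generate_sample(n_obs, x_true, sigma_alpha, sigma_beta):
    """Method 2: perturb both the independent and the dependent variables."""
    p = len(x_true)
    alpha = rng.uniform(0.0, 5.0, size=(n_obs, p))      # the "observed" alpha
    q = rng.normal(0.0, sigma_alpha, size=(n_obs, p))   # perturbation of alpha
    r = rng.normal(0.0, sigma_beta, size=n_obs)         # perturbation of beta
    beta = np.array([model(a + qi, x_true) for a, qi in zip(alpha, q)]) + r
    return alpha, beta
```

Combining several calls with different variances produces the mixed-quality samples suggested under "Variants."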

Chapter 15

Direct search methods

Contents

15.1 Nelder-Mead . . . . . . . . . . . . . . . . . . . . . . . . . . 348


15.2 Torczon’s multi-directional search . . . . . . . . . . . . . 354
15.3 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357

In many applications, it is problematic to calculate the derivatives, either because they
require long calculation times, or because their analytical formula is not available. It
is conceivable to use finite-difference approximations of first derivatives (Algorithm
8.3) and then utilize quasi-Newton methods (Chapter 13) to approximate the second
derivatives. Alternatively, it is possible to use automatic differentiation methods
(Griewank, 2000), that automatically provide the derivatives of a function specified by
a computer program, so that only the objective function needs to be formulated and
coded by the user. These techniques may require a significant amount of calculation,
though.
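To make the computational trade-off concrete, here is a minimal forward finite-difference gradient in Python; the step size h is an arbitrary illustrative choice, and Algorithm 8.3 itself may differ in its details.

```python
import numpy as np

def fd_gradient(f, x, h=1e-6):
    """Forward finite-difference approximation of the gradient of f at x.

    Requires n + 1 evaluations of f for x in R^n, which is exactly why
    this approach becomes impractical when each evaluation of f is a
    costly experiment or simulation.
    """
    x = np.asarray(x, dtype=float)
    fx = f(x)
    g = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (f(x + step) - fx) / h
    return g
```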

Sometimes, the value of the function is obtained by experiments or simulation
tools, and the above methods cannot be used anymore. Indeed, when the value of the
function is obtained by concrete experiments, it is often difficult to exactly reproduce
the same experiment, with one parameter slightly altered, in order to calculate the
finite-difference approximation.

Several methodologies have been proposed that require only the value of the objec-
tive function calculated at the requested iterates, and do not attempt to approximate
the derivatives. Derivative-free methods utilize models of the objective function
generated from interpolation techniques. We refer the reader to Conn et al. (2009)
for an introduction. In this chapter, we introduce direct search methods that use
geometrical objects to investigate the solution space.

15.1 Nelder-Mead
The best known of these approaches is the simplex method¹ by Nelder and Mead
(1965).
Definition 15.1 (Simplex). A k-dimensional simplex is the convex hull of k + 1
affinely independent vectors x_1, . . . , x_{k+1} of R^n, k ≤ n, i.e., the k vectors x_1 − x_{k+1},
x_2 − x_{k+1}, . . . , x_k − x_{k+1} are linearly independent. For example, three non aligned
points in R², or four non coplanar points in R³, are affinely independent and define
2- and 3-dimensional simplices, respectively.

The idea behind the Nelder-Mead method is to define an n-dimensional simplex
in R^n from n + 1 affinely independent vectors. We assume that these points are sorted
such that
f(x1 ) ≤ f(x2 ) ≤ . . . ≤ f(xn+1 ).
Then, the worst of these points, i.e., xn+1 is replaced by a better one.
In order to determine this better point, we calculate the center of gravity of the
simplex formed by the other points, i.e.,
    x_c = (1/n) Σ_{i=1}^{n} x_i

and consider the direction pointing from xn+1 towards xc , that is,
d = xc − xn+1 .
The method then tries several iterates in this direction,
x(α) = xn+1 + αd = (1 − α)xn+1 + αxc .
Note that x(1) = x_c. If 0 < α < 1, x(α) lies between x_{n+1} and x_c, and if α > 1,
x(α) is beyond x_c. The values of α tested by the algorithm are 1/2, 1, 3/2, 2, and 3,
as illustrated in Figure 15.1.

Figure 15.1: Nelder-Mead method (candidate points x^− = x(1/2), x_c = x(1), x_m = x(3/2), x_r = x(2), and x_e = x(3))

¹ Not to be confused with the simplex algorithm in linear optimization, described in Chapter 16.

The best of these five vectors is then chosen to replace xn+1 , which thus forms a
new simplex. The method is described as Algorithm 15.1.

Algorithm 15.1: Nelder-Mead

1 Objective
2 To find (an approximation of) a local minimum of the optimization problem

       min_{x ∈ R^n} f(x) .    (15.1)

3 Input
4 The continuously differentiable function f : R^n → R.
5 The affinely independent vectors x_1^0, . . . , x_{n+1}^0 of R^n, such that
  f(x_i^0) ≤ f(x_{i+1}^0), i = 1, . . . , n.
6 The required precision ε ∈ R, ε > 0.
7 Output
8 An approximation of the optimal solution x∗ ∈ R^n.
9 Initialization
10 k := 0
11 Repeat
12   x_c := (1/n) Σ_{i=1}^{n} x_i^k
13   Define d^k := x_c − x_{n+1}^k
14   x_r := x_{n+1}^k + 2d^k = 2x_c − x_{n+1}^k
15   if f(x_r) < f(x_1^k) then search beyond x_r
16     x_e := x_{n+1}^k + 3d^k = 2x_r − x_c
17     if f(x_e) < f(x_r) then
18       x̂ := x_e
19     else
20       x̂ := x_r
21   if f(x_n^k) > f(x_r) ≥ f(x_1^k) then
22     x̂ := x_r
23   if f(x_r) ≥ f(x_n^k) then search before x_r
24     if f(x_r) ≥ f(x_{n+1}^k) then
25       x̂ := x_{n+1}^k + (1/2)d^k = (1/2)(x_{n+1}^k + x_c)
26     else
27       x̂ := x_{n+1}^k + (3/2)d^k = (1/2)(x_r + x_c)
28   x_{n+1}^{k+1} := x̂
29   x_i^{k+1} := x_i^k, i = 1, . . . , n
30   k := k + 1
31   Renumber to get f(x_i^k) ≤ f(x_{i+1}^k), i = 1, . . . , n
32 Until ‖d^k‖ ≤ ε
33 x∗ = x_1^k
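A direct Python transcription of Algorithm 15.1 might look as follows; the iteration cap is an added safeguard, not part of the algorithm.

```python
import numpy as np

def nelder_mead(f, simplex, eps=1e-7, max_iter=10_000):
    """Algorithm 15.1: simplex is a list of n + 1 affinely independent points."""
    xs = sorted((np.asarray(p, dtype=float) for p in simplex), key=f)
    for _ in range(max_iter):
        x_best, x_n, x_worst = xs[0], xs[-2], xs[-1]
        x_c = np.mean(xs[:-1], axis=0)          # center of gravity of the n best
        d = x_c - x_worst
        x_r = x_worst + 2.0 * d                 # reflection, x(2)
        if f(x_r) < f(x_best):                  # search beyond x_r
            x_e = x_worst + 3.0 * d             # expansion, x(3)
            x_hat = x_e if f(x_e) < f(x_r) else x_r
        elif f(x_r) < f(x_n):
            x_hat = x_r
        elif f(x_r) >= f(x_worst):              # search before x_r
            x_hat = x_worst + 0.5 * d           # x(1/2)
        else:
            x_hat = x_worst + 1.5 * d           # x(3/2)
        xs[-1] = x_hat                          # replace the worst point
        xs.sort(key=f)
        if np.linalg.norm(d) <= eps:
            break
    return xs[0]
```

Applied to Example 15.2 below with the simplex (1, 1), (2, 1.1), (1.1, 2), this sketch reproduces the first iterates of Table 15.1 (x_r = (1.9, 0.1), then (0.9, 0)).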

Example 15.2 (Nelder-Mead method). Apply the Nelder-Mead method to the problem

    min_{x ∈ R²} f(x) = (1/2)x1² + (9/2)x2² ,

by using as a starting simplex the one generated by the points

    (1, 1)^T ,   (2, 1.1)^T ,   (1.1, 2)^T .

The first four iterations are illustrated in Figure 15.2 and all of the generated simplices
are shown in Figure 15.3(a). The details of the first four and the last two iterations
are listed in Table 15.1, where the 3 points of the simplex, the point x_r and the new
point x̂ are given, in addition to the value of the associated function.

Figure 15.2: First four iterations of the Nelder-Mead method for Example 15.2 ((a) k = 0, (b) k = 1, (c) k = 2, (d) k = 3)

Figure 15.3: Iterations of the Nelder-Mead method for Example 15.2 ((a) all iterations, (b) zoom)

It is important to note that there is no guarantee that the Nelder-Mead method
converges, as shown in the following example, proposed by McKinnon (1998). It is
therefore a heuristic algorithm (see Definition 27.1).

Table 15.1: Iterations of the Nelder-Mead method for Example 15.2

     Simplex                                          x_r           x̂
1 1.0000e+00 2.0000e+00 1.1000e+00 1.9000e+00 1.9000e+00
1.0000e+00 1.1000e+00 2.0000e+00 1.0000e-01 1.0000e-01
f 5.0000e+00 7.4450e+00 1.8605e+01 1.8500e+00 1.8500e+00


2 1.0000e+00 2.0000e+00 1.9000e+00 9.0000e-01 9.0000e-01
1.0000e+00 1.1000e+00 1.0000e-01 -4.4409e-16 -4.4409e-16
f 5.0000e+00 7.4450e+00 1.8500e+00 4.0500e-01 4.0500e-01
3 1.0000e+00 9.0000e-01 1.9000e+00 1.8000e+00 1.2000e+00
1.0000e+00 -4.4409e-16 1.0000e-01 -9.0000e-01 5.2500e-01
f 5.0000e+00 4.0500e-01 1.8500e+00 5.2650e+00 1.9603e+00
4 1.2000e+00 9.0000e-01 1.9000e+00 1.6000e+00 1.3000e+00
5.2500e-01 -4.4409e-16 1.0000e-01 -4.2500e-01 2.8750e-01
f 1.9603e+00 4.0500e-01 1.8500e+00 2.0928e+00 1.2170e+00
...
42 -2.7372e-06 -1.3400e-05 6.6939e-06 1.7357e-05 -5.7107e-06
-4.7526e-06 2.8202e-06 -6.4183e-07 -8.2146e-06 6.1485e-08
f 1.0539e-10 1.2557e-10 2.4258e-11 4.5428e-10 1.6323e-11
43 -2.7372e-06 -5.7107e-06 6.6939e-06 3.7204e-06 2.1060e-06
-4.7526e-06 6.1485e-08 -6.4183e-07 4.1723e-06 1.9410e-06
f 1.0539e-10 1.6323e-11 2.4258e-11 8.5255e-11 1.9172e-11

Example 15.3 (The McKinnon example). Consider the function

    f(x) = 360x1² + x2 + x2²   if x1 ≤ 0 ,
    f(x) = 6x1² + x2 + x2²     if x1 ≥ 0 ,

for which the minimum is (0, −1/2)^T. Apply the Nelder-Mead algorithm with the
initial simplex formed with the points

 √ 
  1+ 33  
1  8  0
 
1
,  1 − √33  , 0
.

In this case, the algorithm converges toward the point (0, 0)^T (which is not
stationary), as shown in Figure 15.4.
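For experimentation, the McKinnon function is easy to code. Its two branches agree in value and first derivatives along x1 = 0, and the minimum value is f((0, −1/2)) = −1/4.

```python
def mckinnon(x):
    """McKinnon (1998) function: two smooth pieces glued along x1 = 0."""
    a = 360.0 if x[0] <= 0 else 6.0
    return a * x[0] ** 2 + x[1] + x[1] ** 2
```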

Figure 15.4: Iterations of the Nelder-Mead method for the McKinnon example

Table 15.2: Iterations of the Nelder-Mead method for the McKinnon example
     Simplex                                          x_r           x̂
1 1.0000e+00 8.4307e-01 0.0000e+00 -1.5693e-01 7.1077e-01
1.0000e+00 -5.9307e-01 0.0000e+00 -1.5931e+00 3.5173e-01
f 8.0000e+00 4.0233e+00 0.0000e+00 9.8105e+00 3.5066e+00
2 7.1077e-01 8.4307e-01 0.0000e+00 -1.3230e-01 5.9923e-01
3.5173e-01 -5.9307e-01 0.0000e+00 9.4480e-01 -2.0860e-01
f 3.5066e+00 4.0233e+00 0.0000e+00 8.1389e+00 1.9894e+00
3 7.1077e-01 5.9923e-01 0.0000e+00 -1.1154e-01 5.0519e-01
3.5173e-01 -2.0860e-01 0.0000e+00 -5.6033e-01 1.2372e-01
f 3.5066e+00 1.9894e+00 0.0000e+00 4.2325e+00 1.6703e+00
4 5.0519e-01 5.9923e-01 0.0000e+00 -9.4037e-02 4.2591e-01
1.2372e-01 -2.0860e-01 0.0000e+00 3.3232e-01 -7.3372e-02
f 1.6703e+00 1.9894e+00 0.0000e+00 3.6262e+00 1.0204e+00
...
63 2.5325e-05 2.1351e-05 0.0000e+00 -3.9743e-06 1.8000e-05
8.5622e-15 -5.0780e-15 0.0000e+00 -1.3640e-14 3.0116e-15
f 3.8483e-09 2.7352e-09 0.0000e+00 5.6862e-09 1.9441e-09
64 1.8000e-05 2.1351e-05 0.0000e+00 -3.3506e-06 1.5176e-05
3.0116e-15 -5.0780e-15 0.0000e+00 8.0896e-15 -1.7861e-15
f 1.9441e-09 2.7352e-09 0.0000e+00 4.0416e-09 1.3818e-09
65 1.8000e-05 1.5176e-05 0.0000e+00 -2.8248e-06 1.2794e-05
3.0116e-15 -1.7861e-15 0.0000e+00 -4.7977e-15 1.0593e-15
f 1.9441e-09 1.3818e-09 0.0000e+00 2.8726e-09 9.8214e-10
66 1.2794e-05 1.5176e-05 0.0000e+00 -2.3815e-06 1.0786e-05
1.0593e-15 -1.7861e-15 0.0000e+00 2.8454e-15 -6.2823e-16
f 9.8214e-10 1.3818e-09 0.0000e+00 2.0418e-09 6.9807e-10

Figure 15.5: Iterations of the Nelder-Mead method for the McKinnon example ((a) first four iterations, (b) iteration 20)

15.2 Torczon’s multi-directional search


The main reason for the failure of the Nelder-Mead method when it comes to the
McKinnon example is that the simplex degenerates with the iterations. Indeed, the
three vertices of the simplex become almost collinear. To remedy this, Torczon (1989)
proposed a multi-directional search method that maintains the geometry of the sim-
plex throughout the iterations and guarantees that no degeneration appears.
Contrary to the Nelder-Mead method, the Torczon method revolves around the
best point of the simplex. Consider the simplex generated by the set of points S =
{x1 , . . . , xn+1 }. We note
    f(S) = min_{i=1,...,n+1} f(x_i) .

We assume, without loss of generality, that the minimum is achieved at the first
point, that is f(S) = f(x_1). The main idea is to consider the simplex obtained by
reflection around the best point x_1, i.e., the simplex generated by S^r = {x_1^r, . . . , x_{n+1}^r},
with x_1^r = x_1 and x_i^r = 2x_1 − x_i, i = 2, . . . , n + 1 (see the illustration in Figure 15.6).
• If the best point of S^r is not x_1, i.e., if f(S^r) < f(S), then S^r is better than S. In
this case, we try to go further and expand the search even more by considering
the simplex generated by S^e = {x_1^e, . . . , x_{n+1}^e}, with x_1^e = x_1 and x_i^e = 3x_1 − 2x_i,
i = 2, . . . , n + 1 (e for “expansion”). If the expansion provides a better result,
that is, if f(S^e) < f(S^r), then we choose S^e for the next iteration. Otherwise, we
choose S^r.
• In the case where S^r is not better than S, i.e., if f(S^r) ≥ f(S), then we have
to contract the simplex and use in the next iteration S^c = {x_1^c, . . . , x_{n+1}^c}, with
x_1^c = x_1 and x_i^c = (1/2)(x_1 + x_i), i = 2, . . . , n + 1.
These different simplices are illustrated in Figure 15.6, and Algorithm 15.2 describes
the method.

Figure 15.6: Simplices from Torczon’s method (the reflected points x_i^r, expanded points x_i^e, and contracted points x_i^c around the best point x_1)

Algorithm 15.2: Torczon’s multi-directional method

1 Objective
2 To find (an approximation of) a local minimum of the optimization problem

       min_{x ∈ R^n} f(x) .    (15.2)

3 Input
4 The function f : R^n → R.
5 The set of affinely independent vectors S^0 = {x_1, . . . , x_{n+1}} of R^n such that
  f(x_1) ≤ f(x_{i+1}) and f(x_{n+1}) ≥ f(x_i), i = 1, . . . , n.
6 The required precision ε ∈ R, ε > 0.
7 Output
8 An approximation of the optimal solution x∗ ∈ R^n.
9 Initialization
10 k := 0.
11 Repeat
12   Calculate S^r composed of x_1 and x_i^r = 2x_1 − x_i, i = 2, . . . , n + 1
13   if f(S^r) < f(x_1) then
14     Calculate S^e composed of x_1 and x_i^e = 3x_1 − 2x_i = 2x_i^r − x_1,
       i = 2, . . . , n + 1.
15     if f(S^e) < f(S^r) then
16       S^{k+1} := S^e
17     else
18       S^{k+1} := S^r
19   else
20     Calculate S^{k+1} composed of x_1 and x_i^c = (x_1 + x_i)/2, i = 2, . . . , n + 1.
21   k := k + 1
22   Renumber such that x_1 is the best point and x_{n+1} is the least good in the
     new simplex
23 Until ‖x_{n+1} − x_1‖ ≤ ε
24 x∗ = x_1
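A compact Python transcription of Algorithm 15.2 might look as follows; again, the iteration cap is an added safeguard, not part of the algorithm.

```python
import numpy as np

def torczon(f, simplex, eps=1e-7, max_iter=10_000):
    """Algorithm 15.2: multi-directional search; simplex is a list of n + 1 points."""
    S = sorted((np.asarray(p, dtype=float) for p in simplex), key=f)
    for _ in range(max_iter):
        x1 = S[0]                                           # best point
        S_r = [x1] + [2.0 * x1 - x for x in S[1:]]          # reflection
        if min(f(x) for x in S_r) < f(x1):
            S_e = [x1] + [3.0 * x1 - 2.0 * x for x in S[1:]]  # expansion
            S = S_e if min(f(x) for x in S_e) < min(f(x) for x in S_r) else S_r
        else:
            S = [x1] + [0.5 * (x1 + x) for x in S[1:]]      # contraction
        S.sort(key=f)
        if np.linalg.norm(S[-1] - S[0]) <= eps:
            break
    return S[0]
```

Note that reflection, expansion, and contraction all preserve the shape of the simplex up to scaling, which is exactly what prevents the degeneration observed with Nelder-Mead.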

The iterations of Torczon’s multi-directional method applied to the McKinnon
example are presented in Figure 15.7 and detailed in Table 15.3, which indicates the
best point of the current simplex as well as the type of transformation ([C]ontraction,
[R]eflection, [E]xpansion). We find that, contrary to the Nelder-Mead method, the
Torczon method converges toward the optimal solution of the problem. It is actually
the first direct search method that is associated with a proof of convergence (Torczon,
1991).

Table 15.3: Iterations for Torczon’s method for the McKinnon example
(x_1)_1        (x_1)_2        f(x_1)        transformation
0.0000e+00 0.0000e+00 0.0000e+00 C
0.0000e+00 0.0000e+00 0.0000e+00 C
0.0000e+00 0.0000e+00 0.0000e+00 C
1.0538e-01 -7.4134e-02 -2.0035e-03 E
6.6151e-02 -4.7240e-01 -2.2298e-01 C
6.6151e-02 -4.7240e-01 -2.2298e-01 C
6.6151e-02 -4.7240e-01 -2.2298e-01 R
3.6514e-03 -5.3490e-01 -2.4870e-01 C
3.6514e-03 -5.3490e-01 -2.4870e-01 C
3.6514e-03 -5.3490e-01 -2.4870e-01 C
3.6514e-03 -5.3490e-01 -2.4870e-01 C
3.6514e-03 -5.3490e-01 -2.4870e-01 R
3.5813e-04 -5.3258e-01 -2.4894e-01 E
1.5841e-03 -5.2014e-01 -2.4958e-01 R
2.8102e-03 -5.0769e-01 -2.4989e-01 C
2.8102e-03 -5.0769e-01 -2.4989e-01 R
3.4232e-03 -5.0147e-01 -2.4993e-01 C
1.4700e-03 -5.0342e-01 -2.4998e-01 R
-1.7658e-04 -5.0226e-01 -2.4998e-01 R
1.2992e-04 -4.9915e-01 -2.5000e-01 C
-2.3332e-05 -5.0071e-01 -2.5000e-01 C
5.3294e-05 -4.9993e-01 -2.5000e-01 C
5.3294e-05 -4.9993e-01 -2.5000e-01 C
5.3294e-05 -4.9993e-01 -2.5000e-01 C
4.3716e-05 -5.0003e-01 -2.5000e-01 C
1.7987e-05 -5.0001e-01 -2.5000e-01 C
1.7987e-05 -5.0001e-01 -2.5000e-01 R
5.1230e-06 -5.0000e-01 -2.5000e-01 C
5.1230e-06 -5.0000e-01 -2.5000e-01 C

Since the doctoral work of V. Torczon in 1989, numerous researchers have shown
interest in direct search methods. The interested reader is in particular referred to
Wright (1996) or Lewis et al. (2000).

Figure 15.7: Torczon’s algorithm for McKinnon’s example

15.3 Project
The general organization of the projects is described in Appendix D.

Objective

The aim of this project is to compare the quasi-Newton algorithms with the
derivative-free algorithms of this chapter.

Approach

Each problem is to be solved with each algorithm. The gradient used by the quasi-
Newton methods should be calculated by finite differences, in order for the contexts
to be comparable. The performance indices are the execution time on the one hand,
and the number of evaluations of the function on the other hand. The latter index
is particularly important when the function to be optimized requires much computa-
tional effort to be evaluated. The method described in Section D.2 is to be used to
analyze the results.

Algorithms

Algorithms 13.1, 13.2, 15.1, and 15.2.

Problems

Exercise 15.1. The problem

    min_{x ∈ R²} 2 x1 x2 e^{−(4x1² + x2²)/8} .

Advice: draw the function and the level curves with a software such as Gnuplot,
visually identify the stationary points, and then choose the starting points, either
close to or far from the stationary points.
Exercise 15.2. The problem

    min_{x ∈ R^n} Σ_{i=1}^{n} i^α x_i² ,    x̄ = (1, . . . , 1)^T ,

with various values of n and of α.


Exercise 15.3. The problem

    min_{x ∈ R²} 3x1² + x2⁴ .

Recommended starting point: (1, −2)^T.
Exercise 15.4. The Rosenbrock problem

    min_{x ∈ R²} 100 (x2 − x1²)² + (1 − x1)²

(Section 11.6). Recommended starting points: (1.2, 1.2)^T and (−1.2, 1)^T.
Exercise 15.5. The problem

    min_{x ∈ R⁶} Σ_{i=1}^{m} ( −e^{−0.1 i} + 5 e^{−i} − 3 e^{−0.4 i} + x3 e^{−0.1 i x1} − x4 e^{−0.1 i x2} + x6 e^{−0.1 i x5} )² .

Recommended starting point: (1, 2, 1, 1, 4, 3)^T.
Part V

Constrained optimization

The more constraints one imposes, the more one frees one’s self. And the
arbitrariness of the constraint serves only to obtain precision of execution.

Igor Stravinsky

We now address the description of algorithms for solving constrained optimization
problems. Chapter 16 describes one of the most famous algorithms, the simplex
method for linear optimization. A version of Newton’s method for constrained prob-
lems, based on projection operators of these constraints, is described in Chapter 17.
Dikin’s method in Section 17.3 for linear optimization problems constitutes a tran-
sition towards a presentation of interior point methods (Chapter 18). Finally, aug-
mented Lagrangian algorithms and sequential quadratic programming are described
in Chapters 19 and 20, respectively.

Chapter 16

The simplex method

Contents
16.1 The simplex algorithm . . . . . . . . . . . . . . . . . . . . 363
16.2 The simplex tableau . . . . . . . . . . . . . . . . . . . . . . 376
16.3 The initial tableau . . . . . . . . . . . . . . . . . . . . . . . 385
16.4 The revised simplex algorithm . . . . . . . . . . . . . . . 394
16.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
16.6 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396

16.1 The simplex algorithm


The simplex method (Wood and Dantzig, 1949, Dantzig, 1949, Dantzig, 1963) is prob-
ably the most famous optimization algorithm, designed to solve linear optimization
problems (6.159)–(6.160), i.e.,
    min_{x ∈ R^n} c^T x

subject to
Ax = b
x ≥ 0,
where A ∈ Rm×n , b ∈ Rm , c ∈ Rn . A central idea of the method is based on the fact
that if an optimal solution exists, then there is one that is a vertex (Definition 3.34)
of the constraint polyhedron.¹ This property is trivial when n = 1. For instance,
the linear optimization problem in one variable minx∈R cx subject to x ≥ 0 has an
optimal solution if and only if c ≥ 0, because the problem is unbounded if c < 0.
When c > 0, x∗ = 0 is the only optimal solution to the problem. If c = 0, then
any x ∈ R is optimal, in particular x∗ = 0. Note that we have used this property
in Chapter 4 when deriving the conditions for the dual function to be bounded. We
formalize this result for general bound constraints on x.
¹ We suggest that the reader carefully read Sections 3.5 and 6.5 before proceeding with this
chapter.

Lemma 16.1. The optimal solution to

    min_{x ∈ R, x_ℓ ≤ x ≤ x_u}  ax + b

is x∗ = x_ℓ if a > 0 (Figure 16.1(a)) and x∗ = x_u if a < 0 (Figure 16.1(b)). If
a = 0, every x_ℓ ≤ x ≤ x_u is optimal, in particular x_ℓ and x_u (Figure 16.1(c)).
As a corollary, if there exists an optimal solution x∗ different from x_ℓ and x_u,
then a = 0 and any feasible point is optimal.

Proof. If a > 0 (respectively a < 0), the function is strictly increasing (respectively
decreasing), and the minimum is reached at the smallest (respectively largest) feasible
value of x.

Figure 16.1: Three possible cases for Lemma 16.1 ((a) a > 0, x∗ = x_ℓ; (b) a < 0, x∗ = x_u; (c) a = 0, any feasible point optimal)



Theorem 16.2 (Vertex solution). If the linear optimization problem (6.159)–(6.160)
has an optimal solution, there exists an optimal vertex of the constraint polyhedron.

Proof. Let P = {x ∈ R^n | Ax = b, x ≥ 0} be the constraint polyhedron and let
x∗ ∈ P be an optimal solution with f∗ = c^T x∗. Consider

    Q = {x ∈ P | c^T x = f∗} = {x ∈ R^n | Ax = b, c^T x = f∗, x ≥ 0} ,

i.e., the set of optimal solutions to the problem. The set Q is also a polyhedron
represented in standard form. It is non empty (x∗ ∈ Q) and contains at least one
vertex y∗ (Theorem 3.37). We assume by contradiction that y∗ is not a vertex of P.
Therefore, from the definition of a vertex (Definition 3.34), there exist y and z,
y ≠ z, in P and 0 < λ∗ < 1 such that

    y∗ = λ∗ y + (1 − λ∗) z .    (16.1)

According to Lemma 16.1, the one-dimensional linear optimization problem

    min_{0 ≤ λ ≤ 1}  λ c^T y + (1 − λ) c^T z

has an optimal solution with a value 0 < λ∗ < 1 (corresponding to y∗). Therefore,
any feasible value of λ is also optimal, in particular λ = 0 and λ = 1, so that
f∗ = c^T y∗ = c^T y = c^T z, and y and z belong to Q, contradicting the fact that y∗
is a vertex of Q. This shows that the optimal solution y∗ is a vertex of P.

George Bernard Dantzig was born on November 8, 1914, in Portland,
Oregon (USA). He was a student of Neyman at Berkeley.
Arriving late to one of Neyman’s classes, Dantzig took as
exercises some unresolved statistical problems that Neyman had
written on the blackboard and solved them. Even though he
finished his dissertation in 1941, he did not receive his PhD until
1946 because of World War II. George Dantzig’s fundamental
contribution is the simplex method in optimization. He is often
referred to as the “father of linear programming,” and his work is considered the
foundation of operations research as a science. Developed within the scope of a task
for the US Air Force, the simplex method was first used to solve a diet problem. Linear
programming and the simplex method are nowadays embedded in every decision
support system. They represent the most important contribution of mathematics to
the daily operations of businesses and industries around the world. The strength of
the method always impressed George Dantzig himself. Dantzig had been professor
of operations research and computer science at Stanford University since 1966. He
died on Friday, May 13, 2005.
Figure 16.2: George B. Dantzig

Before proposing algorithms, let us see how to solve the problem graphically for
a simple example.
Example 16.3 (Graphical method). Consider the optimization problem

    min_{x ∈ R²} −x1 − 2x2

subject to

    x1 + x2 ≤ 1
    x1 − x2 ≤ 1
    x1 ≥ 0
    x2 ≥ 0 .

Put in standard form, the constraint polyhedron of this problem is identical to the
polyhedron in Example 3.39. Figure 16.3 represents the feasible domain in the space
of variables (x1 , x2 ).

Figure 16.3: Graphical method to find the optimal solution to a linear optimization problem (level lines f = −0.5, −1, −1.5, −2, −2.5)

The level lines corresponding to the different values of the objective function are also
shown. The steepest descent direction, i.e.,

    −∇f(x) = −c = (1, 2)^T ,

is displayed with an arrow on each level line. In order to graphically identify the
optimal solution to the problem, one must

1. draw an arbitrary level line intersecting the feasible domain,

2. move this line parallel to itself as far as possible in the direction of the vector −c,
as long as it intersects the feasible domain.

All the points at the intersection of this line and the feasible domain are optimal.

In Example 16.3, the optimal solution is (0, 1), with an optimal value of −2.
The result of Theorem 16.2 enables us to propose a quite simple algorithm. To find
the optimal solution to the problem, we need only go through all the vertices of
the constraint polyhedron and choose the best one. This algorithm can be easily
implemented thanks to Theorem 3.40, that says that the choice of a vertex of the
constraint polyhedron amounts to the choice of the n − m inequality constraints that
are active at this vertex, that is to choose n − m variables that are set to 0 (indeed,
as the polyhedron is represented in standard form, the inequality constraints are
non negativity constraints). These variables are said to be non basic. The m other
variables are said to be basic. If the m × m matrix B gathering the columns of A
corresponding to basic variables is non singular, xB = B−1 b provides the value of the
basic variables at the considered vertex, if xB ≥ 0 (see the discussions in Section 3.5
for more details). The vertex enumeration method detailed in Algorithm 16.1 exploits
this characterization.

Example 16.4 (Vertex enumeration). We apply Algorithm 16.1 with

    A = ( 1  1  1  0 ; 1  −1  0  1 ) ,    b = (1, 1)^T ,    c = (−1, −2, 0, 0)^T .

The vertex enumeration for this problem is carried out in the description of Example
3.39. We now need only calculate the values of the objective function to find the
optimal solution.

Algorithm 16.1: Vertex enumeration

1 Objective
2 To find the global minimum of a linear optimization problem in standard
  form (6.159)–(6.160)
3 Input
4 The matrix A ∈ R^{m×n}.
5 The vector b ∈ R^m.
6 The vector c ∈ R^n.
7 Output
8 The set J∗ of basic variable indices of the optimal solution
9 Initialization
10 C := C_m(1, . . . , n), the set of all combinations of m indices among n.
11 k := 1
12 f∗ := +∞
13 Repeat
14   Choose a potential basis, that is a set of m indices J_k = (j_1^k, . . . , j_m^k) ∈ C
15   Let B = (A_{j_1^k}, . . . , A_{j_m^k}) be the matrix formed by the columns of A
     corresponding to the indices of J_k
16   if B is invertible and B^{−1}b ≥ 0 then
17     f_k := c_B^T B^{−1} b, where c_B contains the basic components of c
18   else
19     f_k := +∞
20   if f_k < f∗ then
21     J∗ := J_k
22     f∗ := f_k
23   C := C \ J_k
24   k := k + 1
25 Until C = ∅

k    xk                fk
1    (1, 0, 0, 0)T     −1
2    (1, 0, 0, 0)T     −1
3    (1, 0, 0, 0)T     −1
4    (0, −1, 2, 0)T    B−1 b ≱ 0
5    (0, 1, 0, 2)T     −2
6    (0, 0, 1, 1)T     0

The optimal solution is x∗ = (0, 1, 0, 2)T.

Algorithm 16.1 identifies the optimal solution to a linear optimization problem in
a finite number of iterations, which is the total number of possible ways to choose m
variables among n, that is,

n! / ((n − m)! m!).

This number becomes prohibitively large when n and m are large, and the algorithm
is then inapplicable. We are facing a combinatorial optimization problem2 (see
Definition 25.5). We refer the reader to Avis and Fukuda (1992) for an algorithm
enumerating the vertices of a polyhedron in arbitrary dimension.
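For small instances, the enumeration of Algorithm 16.1 can be sketched in a few lines of Python with NumPy. This is only an illustration, not the book's code; the function name and the numerical tolerances are our own choices. The data are those of Example 16.4, with 0-based indices, so the book's basis {2, 4} appears as (1, 3).

```python
import itertools
import numpy as np

def enumerate_vertices(A, b, c):
    """Sketch of Algorithm 16.1: try every combination of m columns as a basis."""
    m, n = A.shape
    f_best, J_best = np.inf, None
    for J in itertools.combinations(range(n), m):   # all candidate bases
        cols = list(J)
        B = A[:, cols]
        if abs(np.linalg.det(B)) < 1e-12:
            continue                                # B singular: not a basis
        xB = np.linalg.solve(B, b)
        if np.any(xB < -1e-12):
            continue                                # B^{-1} b has a negative entry
        f = c[cols] @ xB                            # objective value at this vertex
        if f < f_best:
            f_best, J_best = f, J
    return J_best, f_best

# Data of Example 16.4
A = np.array([[1., 1., 1., 0.], [1., -1., 0., 1.]])
b = np.array([1., 1.])
c = np.array([-1., -2., 0., 0.])
J_star, f_star = enumerate_vertices(A, b, c)
print(J_star, f_star)   # (1, 3) -2.0
```

The combinatorial blow-up is visible directly: the loop runs n!/((n − m)! m!) times, which is why this sketch is only usable for toy problems.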
The simplex method, which we now describe, also goes through the vertices of the
constraint polyhedron, but does it intelligently, to avoid having to enumerate them
all. It uses a strategy similar to the descent methods presented in Chapter 11: at
each iteration, a descent direction is identified, and a step is calculated. As discussed
below, the step may happen to be zero, so that several iterations may not produce
any progress, and special care needs to be taken to avoid the algorithm being stalled.
The algorithm exploits the equivalence between vertices of the constraint polyhedron
and feasible basic solutions (Theorem 3.40).
The geometric interpretation of the simplex algorithm can be summarized as fol-
lows:
• The algorithm starts from a vertex of the constraint polyhedron.
• An edge of the polyhedron along which the objective function decreases is identi-
fied. If no such edge exists, the current vertex is an optimal solution.
• The edge is followed until the next vertex is reached.
A simple illustration is provided in Figure 16.4.

Figure 16.4: Geometric illustration of the simplex algorithm (the path visits the
vertices x0, x1, x2, and x∗)
2 Linear optimization combines the features of continuous optimization and combinatorial opti-
mization. In addition to its important role in many concrete applications, these features are
widely exploited in theoretical developments.

The analysis provided in Sections 3.5 and 6.5 allows the geometrical concepts to
be translated into algebraic concepts that are combined in Algorithm 16.2.
• The vertices of the polyhedron are basic feasible solutions, characterized by the
set of indices corresponding to the basic variables (Theorem 3.40). Note that one
vertex may correspond to several basic feasible solutions.


• The edges of the polyhedron are characterized by the basic directions (Defini-
tion 3.42), and are used to go from one vertex to another.
• The reduced costs (Definition 6.28) represent the directional derivative of the
function in the basic directions. They are utilized to identify the descent directions
and verify the optimality of the current iterate.

Comments

• The algorithm must be initialized with an arbitrary feasible basic solution. For
problems in standard form, such a feasible solution always exists if the polyhedron
is non empty (Theorems 3.37 and 3.40). However, it is not necessarily simple to
find such a feasible solution. We address this problem in Section 16.3, where
such a basic feasible solution, or a proof that none exists, is furnished by the
simplex algorithm applied to an auxiliary problem.
• According to Theorem 6.29, the reduced costs for the basic indices are zero. For
this reason, in step 15, only the non basic indices are considered.
• In practice, the dN part of the direction is never formed.
• Step 22 calculates the maximum step αq that can be taken along the basic direc-
tion dp , while remaining feasible. Geometrically, it is the step that corresponds
to the first constraint activated along the basic direction. Therefore, the
corresponding variable drops to 0 and becomes non basic (see Lemma 16.5).
• The reduced cost represents the directional derivative of the (linear) function in
the basic direction. When it is negative, the basic direction is a descent direction
and the new value of the objective function is

cT xk+1 = cT xk + αq c̄p . (16.2)

• Step 18 identifies the index of the non basic variable that is entering the basis,
and step 22 identifies the index of the basic variable that is leaving the basis.
In the presence of a degenerate basic feasible solution, several candidates may
be possible. The algorithm then selects the smallest index among those that
satisfy the condition. This allows for a systematic enumeration of the basic
feasible solutions corresponding to the same vertex of the constraint polyhedron,
and avoids the algorithm being stalled in an endless cycle. This guarantees that
the algorithm terminates after a finite number of iterations. This choice is called
Bland’s rule, from the work of Bland (1977). Other strategies are suggested in the
projects of Section 16.6.

Algorithm 16.2: Simplex method

1 Objective
2 To find the global minimum of a linear optimization problem in standard
form (6.159)–(6.160).
3 Input
4 The matrix A ∈ Rm×n .
5 The vector b ∈ Rm .
6 The vector c ∈ Rn .
7 J0 = (j01 , . . . , j0m ), the set of indices of the basic variables corresponding to a
basic feasible solution.
8 Output
9 A Boolean indicator U detecting an unbounded problem.
10 If U is false, J∗ = (j∗1 , . . . , j∗m ) is the set of indices of an optimal basic
feasible solution.
11 Initialization
12 k := 0.
13 Repeat
14 Let B = (Ajk1 , . . . , Ajkm ) be the matrix formed by the columns of A
corresponding to the indices of Jk .
15 if c̄j = cj − cTB B−1 Aj ≥ 0 for all j ∉ Jk then optimal solution
16 J∗ = Jk , U=FALSE, STOP.
17 else
18 p := smallest index such that c̄p < 0.
19 Calculate the basic variables xB = B−1 b.
20 Calculate the basic components of the pth basic direction dB = −B−1 Ap .
21 For each i = 1, . . . , m, calculate the distance to the non negativity
constraint, i.e.,

αi := −(xB )i /(dB )i if (dB )i < 0, and αi := +∞ otherwise.     (16.3)

22 Let q be the smallest index such that αq = mini αi .
23 if αq = +∞ then the problem is unbounded and has no optimal solution
24 U=TRUE. STOP.
25 Jk+1 := Jk ∪ {p} \ {jkq }.
26 k := k + 1.
27 Until STOP

• At each iteration, if the basic feasible solution is not degenerate, the basic direction
is feasible (Theorem 3.44). Then, no αi in (16.3) is zero. Therefore, the step αq
is positive and (16.2) guarantees that

cT xk+1 < cT xk .

• For the algorithm to be valid, we need to demonstrate that xk+1 is again a basic
feasible solution and that the matrix B̄, obtained by replacing column jq by Ap ,
is non singular.

Lemma 16.5. After one iteration of the simplex method (Algorithm 16.2), the
new set of indices defines a basic feasible solution.

Proof. We assume without loss of generality that the numbering is such that the basic
variables come first, that is, jki = i for all i. We consider the matrix B̄ corresponding to
the new set of indices and first demonstrate that it is non singular. We assume,
by contradiction, that this is not the case. There exist coefficients λ1 , . . . , λm , not all
zero, such that

Σmi=1 λi B̄i = Σi≠q λi B̄i + λq B̄q = Σi≠q λi Ai + λq Ap = 0,

as B̄ has been obtained from B by removing column q and replacing it by Ap , all
other columns being the same. Multiplying by B−1 , we obtain

Σi≠q λi B−1 Ai + λq B−1 Ap = 0,

and these vectors are linearly dependent. However, for all i ≠ q, Bi = Ai and

B−1 Ai = B−1 Bi = ei ,

where ei is the ith column of the identity matrix. These vectors are linearly indepen-
dent and their qth component is zero. The vector B−1 Ap is exactly −dB (see step 20
or (3.88)). The qth component of −dB is not zero, as the index q is chosen among
the indices for which (dp )i < 0 (step 21). Therefore, B−1 Ap is linearly independent
from all the B−1 Ai , creating the contradiction.
So, the first part of Definition 3.38 is satisfied. We now demonstrate that all the
non basic variables are zero. As a basic direction is followed during the iteration, only
the non basic variable p is modified by the iteration (see Definition 3.42). It enters
the basis and may become positive. All the others remain at zero, out of the basis.
We analyze the qth variable, that exits the basis during the iteration. Its value at the
end of the iteration is

(xk+1 )q = (xk )q + αq (dp )q = (xk )q − ((xk )q /(dp )q )(dp )q = 0.

The new iterate is therefore indeed a basic solution. We demonstrate that it is feasible.
• All the non basic variables are non negative, because they are zero.
• Let i be a basic index. We have

(xk+1 )i = (xk )i + αq (dp )i .

Since xk is feasible and (xk )i ≥ 0, only the indices i such that (dp )i < 0 may
cause problems. However, according to step 22 of the algorithm, we have αq ≤ αi
for such indices. Therefore,

(xk+1 )i = (xk )i + αq (dp )i ≥ (xk )i + αi (dp )i = (xk )i − ((xk )i /(dp )i )(dp )i = 0,

and xk+1 is feasible.

Example 16.6 (The simplex method – I). We apply the simplex method to the same
problem as in Example 16.4, with

A = [1  1  1  0; 1  −1  0  1],   b = (1, 1)T,   c = (−1, −2, 0, 0)T.
Iteration 1
1. Consider J0 = {3, 4} and B = [1  0; 0  1].
2. Current iterate: x0 = (0, 0, 1, 1)T, cT x0 = 0.
3. Reduced costs:
c̄1 = −1
c̄2 = −2
c̄3 = 0
c̄4 = 0 .
The index p = 1 is chosen to enter the basis. Indeed, it is the smallest index
corresponding to a negative reduced cost.
4. Basic direction: dB1 = (−1, −1)T, d1 = (1, 0, −1, −1)T, p = 1.

5. Distances to the constraint:
α3 = 1
α4 = 1.
The index q = 3 is chosen to leave the basis. In fact, since the two values of αi
are equal, the smallest index is chosen.
6. Index 1 replaces index 3 in the basis, and J1 = {1, 4}.

Figure 16.5: Iterations for Example 16.6 (panels (a) Iteration 1 and (b) Iteration 2,
with axes x1 and x2 and the constraints x1 + x2 = 1 and x1 − x2 = 1)

Iteration 2
1. J1 = {1, 4} and B = [1  0; 1  1].
2. Current iterate: x1 = (1, 0, 0, 0)T, cT x1 = −1.
3. Reduced costs:
c̄1 = 0
c̄2 = −1
c̄3 = 1
c̄4 = 0.
The index p = 2 is chosen to enter the basis.

4. Basic direction: dB2 = (−1, 2)T, d2 = (−1, 1, 0, 2)T, p = 2.
5. Distances to the constraint:
α1 = 1
α4 = +∞.
The index q = 1 is chosen to leave the basis.
6. Index 2 replaces index 1 in the basis, and J2 = {2, 4}.

Iteration 3
1. J2 = {2, 4} and B = [1  0; −1  1].
2. Reduced costs:
c̄1 = 1
c̄2 = 0
c̄3 = 2
c̄4 = 0.
The point is optimal.
3. Optimal solution: x∗ = (0, 1, 0, 2)T, cT x∗ = −2.
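The iterations above can be reproduced with a compact sketch of Algorithm 16.2. This is an illustration, not the book's implementation; the function name is ours, and Bland's rule is implemented by choosing the smallest eligible entering index and breaking ratio-test ties on the variable index.

```python
import numpy as np

def simplex(A, b, c, J):
    """Sketch of Algorithm 16.2 with Bland's rule. J is a list of 0-based
    basic indices; it is updated in place at each change of basis."""
    m, n = A.shape
    while True:
        B = A[:, J]
        xB = np.linalg.solve(B, b)                 # values of the basic variables
        y = np.linalg.solve(B.T, c[J])             # y = B^{-T} c_B
        c_bar = c - A.T @ y                        # reduced costs (step 15)
        entering = [j for j in range(n) if j not in J and c_bar[j] < -1e-9]
        if not entering:
            return J, c[J] @ xB                    # optimal basis and value
        p = min(entering)                          # Bland: smallest eligible index
        dB = -np.linalg.solve(B, A[:, p])          # basic part of the basic direction
        steps = [(xB[i] / -dB[i], J[i], i)         # ratio test; ties broken on the
                 for i in range(m) if dB[i] < -1e-9]  # variable index (Bland)
        if not steps:
            raise ValueError("unbounded problem")
        alpha, _, q = min(steps)
        J[q] = p                                   # change of basis (step 25)

# Problem of Examples 16.4 and 16.6, starting from the basis {3, 4} (0-based: 2, 3)
A = np.array([[1., 1., 1., 0.], [1., -1., 0., 1.]])
b = np.array([1., 1.])
c = np.array([-1., -2., 0., 0.])
J_star, f_star = simplex(A, b, c, J=[2, 3])
print(sorted(J_star), f_star)   # [1, 3] -2.0
```

The same function handles the degenerate steps of Example 16.7: one iteration takes a zero step, but the change of basis still occurs and the method terminates.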

Example 16.7 (The simplex method – II). We apply the simplex method to the
following problem:
min −10 x1 − 12 x2 − 12 x3

subject to
x1 + 2x2 + 2x3 ≤ 20
2x1 + x2 + 2x3 ≤ 20
2x1 + 2x2 + x3 ≤ 20 ,

and
x1 , x2 , x3 ≥ 0.

By adding slack variables in order to obtain a problem in standard form, we get a
problem with n = 6 variables and m = 3 constraints, defined by

A = [1  2  2  1  0  0; 2  1  2  0  1  0; 2  2  1  0  0  1],
b = (20, 20, 20)T,   c = (−10, −12, −12, 0, 0, 0)T.
The choice of J0 = {4, 5, 6} produces an initial feasible basic solution, from which we
can apply the simplex method. The details of the iterations are given in Table 16.1.

Table 16.1: Iterations with the simplex method for Example 16.7

k | Jk    | c̄ (non basic) | x1 x2 x3 x4 x5 x6 | dB            | αq | p | q | cT x
0 | 4 5 6 | −10 −12 −12   | 0  0  0 20 20 20  | −1 −2 −2      | 10 | 1 | 5 | 0
1 | 1 4 6 | −7 −2 5       | 10 0  0 10 0  0   | −1.5 −0.5 −1  | 0  | 2 | 6 | −100
2 | 1 2 4 | −9 −2 7       | 10 0  0 10 0  0   | −2.5 −1.5 1   | 4  | 3 | 4 | −100
3 | 1 2 3 | 3.6 1.6 1.6   | 4  4  4 0  0  0   |               |    |   |   | −136

Note that the basic feasible solution for iteration 1 is degenerate. Indeed, x6 is
zero even though it is in the basis. We also note that at this iteration, the method
cannot progress. The step αq is zero. However, the algorithm still performs a change
of basis (the variable 6 is replaced by the variable 2). During iteration 2, the feasible
basic solution is also degenerate (x2 is zero in the basis), but the algorithm can now
progress (αq = 4).

16.2 The simplex tableau


Algorithm 16.2 requires a significant computational effort for the linear algebra,
mainly due to the need for the matrix B−1 :
• Step 15: calculating reduced costs cT − cTB B−1 A.
• Step 19: calculating the current iterate B−1 b.
• Step 20: calculating the direction −B−1 Ap .
To improve this, we regroup all the important quantities used by the algorithm in a
table, called the simplex tableau.

Definition 16.8 (Simplex tableau). Consider a linear optimization problem in stan-
dard form min cT x subject to Ax = b, x ≥ 0, and let us take a basic matrix B
corresponding to a basic feasible solution x̃. The table

B−1 A           | B−1 b
cT − cTB B−1 A  | −cTB B−1 b          (16.4)

is called the simplex tableau corresponding to this basic feasible solution. In more detail,
the first m rows are (B−1 A1 · · · B−1 An ) with last column (x̃j1 , . . . , x̃jm )T, and the
last row is (c̄1 · · · c̄n ) with last entry −cT x̃,     (16.5)

where c̄i is the reduced cost of the variable i.

To illustrate the concept, let us consider the table corresponding to the optimal
solution of Example 16.6:
x1 x2 x3 x4
1 1 1 0 1 x2
2 0 1 1 2 x4
1 0 2 0 2 −cT x

Basic variables
• Each column to the left side of the tableau corresponds to a variable of the prob-
lem.
• The columns with the basic variables contain the columns of the identity matrix.
• Each row corresponds to a basic variable. The correspondence is defined by the
specific structure of the column corresponding to this basic variable: it is a column
of the identity matrix, that is, all its elements are 0, except one which is 1. The
row where element 1 is situated is the row corresponding to the basic variable.
In the example, the only 1 in the column corresponding to x2 is in the first
row. Therefore, the first row is associated with variable x2 . Following the same
reasoning, we see that the second row is associated with variable x4 .
• The m first rows of the last column contain the values of the basic variables. The
other variables, that is, the non basic variables, are always zero.
• The last element of the last column contains the value of the objective function,
with opposite sign. Indeed,

−cT x = −cTB xB − cTN xN = −cTB B−1 b − 0 .

If a tableau is available, an iteration of Algorithm 16.2 greatly simplifies, as the
quantities required for steps 15, 19, and 20 can now be read directly in the tableau
instead of being calculated. The questions are now: how do we generate the first
tableau, and how do we update the tableau from one iteration to the next in an
efficient way? We now address the second question and come back to the first one in
Section 16.3.
Clearly, in order for the algorithm to be effective, one must avoid recalculating
the tableau at each iteration. Given that only one variable is replaced in the basis,
only one column of B is modified from one iteration to the next. Therefore, we do

not expect the tableau in the next iteration to be too different from the tableau in
the current iteration.
Let B be the basic matrix at the start of the iteration and let B̄ be this matrix at
the end of the iteration, obtained by replacing one column of B by another column
from A. In Example 16.7, we have

 
1 2 2 1 0 0
A= 2 1 2 0 1 0 .
2 2 1 0 0 1

The first basic matrix, corresponding to indices 4, 5, and 6, is


 
1 0 0
B = (A4 A5 A6 ) =  0 1 0 .
0 0 1

After the first iteration, the variable 5 is replaced by variable 1 in the basis, so that
 
1 1 0
B̄ = (A4 A1 A6 ) =  0 2 0 .
0 2 1

We would like to find a simple transformation of B−1 to obtain B̄−1 , i.e., a matrix
Q such that
QB−1 = B̄−1

or, equivalently,
QB−1 B̄ = I .

It means that the matrix Q transforms the matrix B−1 B̄ into the identity matrix.
Since B−1 B = I, and B and B̄ have the same columns except one, the matrix B−1 B̄ is
already “almost” the identity matrix, i.e.,
 
B−1 B̄ =
    1  0  · · ·  u1  · · ·  0
    0  1  · · ·  u2  · · ·  0
    ·  ·         ·          ·
    0  0  · · ·  uq  · · ·  0          (16.6)
    ·  ·         ·          ·
    0  0  · · ·  um  · · ·  1

(the qth column is the vector u), where the vector u is defined by u = −dp = B−1 Ap .
We must transform (16.6) into an identity matrix. The matrix Q that takes care of
this is the composition of elementary row operations.

Definition 16.9 (Elementary row operations). Consider a matrix A. An elementary
row operation on A consists in multiplying a row j of A by a constant β and adding
the result to the row i:

ai := ai + βaj ,

where ai denotes the ith row of A. This operation amounts to multiplying A by the
matrix Qij , which is the identity matrix (of dimension the number of rows of A), of
which element (i, j) is replaced by β.

Example 16.10 (Elementary row operations). Consider the matrix

A = [1  2; 3  4; 5  6].

We choose i = 2, j = 1, β = 4. We multiply the first row by 4 and add the result to
the second row to obtain

Ā = [1  2; 7  12; 5  6].

We have Ā = Qij A, with

Qij = [0  0  0; 4  0  0; 0  0  0] + I = [1  0  0; 4  1  0; 0  0  1].
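The elementary row operation of Example 16.10 can be checked numerically; the short sketch below (an illustration, not from the book) builds Qij explicitly and applies it.

```python
import numpy as np

# Elementary row operation of Definition 16.9: a_i := a_i + beta * a_j,
# expressed as multiplication by Q_ij (identity with element (i, j) set to beta).
A = np.array([[1., 2.], [3., 4.], [5., 6.]])
i, j, beta = 1, 0, 4.0           # 0-based rows: i = 2, j = 1 in the book's numbering
Q = np.eye(3)
Q[i, j] = beta
A_bar = Q @ A
print(A_bar)                     # rows (1, 2), (7, 12), (5, 6), as in Example 16.10
```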

In order to transform the matrix (16.6) into an identity matrix, we must apply to
it the following elementary row operations:
• For each row i ≠ q, we add the row q multiplied by −ui /uq to the ith row. Note
that uq = −(dp )q is not zero (see the proof of Lemma 16.5).
• The row q is divided by uq .
Example 16.11 (Basis change). Consider again Example 16.7. At the first iteration,
we have B−1 = I (the 3 × 3 identity matrix). The second column of B (corresponding
to the variable 5) is replaced by A1 . Since B−1 A1 = (1, 2, 2)T, we have

B−1 B̄ = [1  1  0; 0  2  0; 0  2  1].

In order for this matrix to be an identity matrix, we must apply elementary row
operations to the following rows:
1. a1 := a1 − (u1 /u2 )a2 , that is, a1 := a1 − 0.5 a2 .
2. a3 := a3 − (u3 /u2 )a2 , that is, a3 := a3 − a2 .
3. a2 := a2 /u2 , that is, a2 := a2 /2.
Note that the modification of row 2 must be performed last, as this row is involved
in the update of rows 1 and 3. By applying these operations to B−1 , we get

B̄−1 = [1  −0.5  0; 0  0.5  0; 0  −1  1].

Indeed,

B̄−1 = QB−1 = Q22 Q12 Q32 B−1 = Q22 Q12 Q32 ,

where

Q22 = [1  0  0; 0  0.5  0; 0  0  1],  Q12 = [1  −0.5  0; 0  1  0; 0  0  1],
Q32 = [1  0  0; 0  1  0; 0  −1  1].

Note that the matrix Q22 is applied last.
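The three row operations of Example 16.11 can be carried out generically. The sketch below is an illustrative helper with our own name; note that, as in the example, row q is divided last, because it is used unmodified to update the other rows.

```python
import numpy as np

def update_inverse(B_inv, u, q):
    """Update B^{-1} after column q of the basis is replaced,
    given u = B^{-1} A_p (a sketch of the basis-change update)."""
    Bbar_inv = B_inv.astype(float).copy()
    for i in range(B_inv.shape[0]):
        if i != q:
            Bbar_inv[i] -= (u[i] / u[q]) * Bbar_inv[q]   # a_i := a_i - (u_i/u_q) a_q
    Bbar_inv[q] /= u[q]                                  # a_q := a_q / u_q, done last
    return Bbar_inv

# Example 16.11: B^{-1} = I, u = B^{-1} A_1 = (1, 2, 2)^T, q = 1 (0-based)
B_inv_new = update_inverse(np.eye(3), np.array([1., 2., 2.]), 1)
print(B_inv_new)
# rows: (1, -0.5, 0), (0, 0.5, 0), (0, -1, 1), matching Example 16.11
```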

We now see that the same operations can be applied to the simplex tableau. In-
deed, for the first part of the tableau, since the elementary operations are represented
by the matrix Q = B̄−1 B, we have

QB−1 A QB−1 b = B̄−1 A B̄−1 b

We show later on that the same is true for the last row. We use the following update
procedure, called pivoting.
Example 16.12 (Pivoting). Consider the following simplex tableau. We want to
extract the variable x6 from the basis (line 3 of the tableau) and enter the variable
x2 (column 2 of the tableau).

     x1    x2    x3   x4   x5    x6
     0     1.5    1    1   −0.5   0    10   x4
T =  1     0.5    1    0    0.5   0    10   x1
     0     1     −1    0   −1     1     0   x6
     0    −7     −2    0    5     0   100

Basic variables

Algorithm 16.3: Tableau pivoting

1 Objective
2 To update the simplex tableau during an iteration of the simplex method.
3 Input
4 The simplex tableau T .
5 The index p of the pivot column, i.e., the column corresponding to the non
basic variable that enters the basis.
6 The index q of the pivot row, i.e., the row corresponding to the basic
variable that leaves the basis.
7 Output
8 The simplex tableau T̄ corresponding to the new basis.
9 Initialization
10 if T (q, p) = 0 then Impossible to carry out the pivoting
11 STOP
12 for i = 1, . . . , m + 1, i ≠ q do
13 T (i, k) := T (i, k) − T (i, p)T (q, k)/T (q, p), k = 1, . . . , n + 1
14 T (q, k) := T (q, k)/T (q, p), k = 1, . . . , n + 1

By applying the pivoting (Algorithm 16.3), we get the following tableau:

     x1   x2   x3    x4   x5    x6
     0    0    2.5    1    1   −1.5   10   x4
T̄ = 1    0    1.5    0    1   −0.5   10   x1
     0    1   −1      0   −1    1      0   x2
     0    0   −9      0   −2    7    100

Basic variables
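Algorithm 16.3 translates directly into a short routine. The sketch below (function name and tolerance ours) reproduces the pivoting of Example 16.12: x6 (row 2, 0-based) leaves the basis and x2 (column 1, 0-based) enters it.

```python
import numpy as np

def pivot(T, q, p):
    """Sketch of Algorithm 16.3: pivot tableau T on row q, column p (0-based)."""
    if abs(T[q, p]) < 1e-12:
        raise ValueError("zero pivot: pivoting impossible")
    T_bar = T.astype(float).copy()
    for i in range(T.shape[0]):
        if i != q:
            T_bar[i] = T[i] - T[i, p] * T[q] / T[q, p]   # eliminate column p
    T_bar[q] = T[q] / T[q, p]                            # normalize the pivot row
    return T_bar

# Tableau of Example 16.12
T = np.array([
    [0., 1.5,  1., 1., -0.5, 0.,  10.],
    [1., 0.5,  1., 0.,  0.5, 0.,  10.],
    [0., 1.,  -1., 0., -1.,  1.,   0.],
    [0., -7., -2., 0.,  5.,  0., 100.],
])
T_bar = pivot(T, q=2, p=1)
```

After the call, `T_bar` matches the tableau displayed above, with x2 in the basis.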

We now need only demonstrate that the last row of the new tableau corresponds
to Definition 16.8. In the tableau T , this last row is of the type

Tm+1 = (cT | 0) − dT (A | b) with dT = cTB B−1 .

The pivot row is of the type


Tq = gT (A | b) ,

where gT is the qth row of B−1 . During the elementary operation on the last row, we
have
T̄m+1 = Tm+1 + βTq

and this row takes the form

T̄m+1 = (cT | 0) − dT (A | b) + βgT (A | b) = (cT | 0) + hT (A | b) , (16.7)

with hT = −dT + βgT . We consider a column k of the tableau T̄. We have

T̄ (m + 1, k) = T (m + 1, k) − T (m + 1, p) T (q, k)/T (q, p).     (16.8)
By definition of the tableau, T (m+1, k) is the reduced cost associated with the variable
k. Assume first that k was in the basis before the pivoting, so that T (m + 1, k) = 0.
The column k of T contains zero values, except in the row corresponding to the basic
variable k. As k remains in the basis, it does not correspond to the pivot row q and
T (q, k) = 0. Therefore, T̄ (m + 1, k) = T (m + 1, k) = 0.
Assume now that k is the column of the pivot, that is, k = p. It therefore corre-
sponds to a basic variable in the tableau T̄. By replacing k by p in (16.8), we obtain
that the reduced cost is

T̄ (m + 1, p) = T (m + 1, p) − T (m + 1, p) T (q, p)/T (q, p) = 0.

Then, all the elements of the last row of T̄ corresponding to the basic variables B̄ are
zero. Consequently, by taking only these columns in (16.7),

cTB̄ + hT B̄ = 0 ,

which gives
hT = −cTB̄ B̄−1 .
Including it into (16.7), we obtain

T̄m+1 = (cT | 0) + hT (A | b) = (cT | 0) − cTB̄ B̄−1 (A | b),

which corresponds exactly to Definition 16.8.


We are now able to redefine Algorithm 16.2 by using the tableau, and we obtain
Algorithm 16.4.
Example 16.13 (Simplex algorithm). We apply the simplex algorithm to the fol-
lowing tableau:
x1 x2 x3 x4 x5 x6

1 2 2 1 0 0 20 α4 = 20
2 1 2 0 1 0 20 α5 = 10
2 2 1 0 0 1 20 α6 = 10
−10 −12 −12 0 0 0 0
We start examining the last row from left to right until we identify a negative
reduced cost. There is one in the first column, so that it is selected as the pivot
column, and x1 is scheduled to enter the basis. The value of α is calculated for
each row of the upper part of the tableau such that the entry in the pivot column is
positive.

Algorithm 16.4: Simplex algorithm

1 Objective
2 To find the global minimum of a linear optimization problem in standard
form (6.159)–(6.160).
3 Input
4 T0 , the simplex tableau corresponding to a basic feasible solution.
5 Output
6 Boolean indicator U identifying the unbounded problem.
7 If U is false, T ∗ , the simplex tableau corresponding to an optimal basic
feasible solution.
8 Initialization
9 k := 0.
10 Repeat
11 Examine the reduced costs in the last row of Tk . If they are all non
negative, then the tableau is optimal. T ∗ = Tk , U=FALSE. STOP.
12 Let p be the index of the column corresponding to the negative reduced
cost that is the furthest to the left in the tableau.
13 For each i, calculate the distance to the constraint xi ≥ 0, i.e.,

αi := T (i, n + 1)/T (i, p) if T (i, p) > 0, and αi := +∞ otherwise.

14 Let q be the smallest index such that αq = mini αi .
15 If αq = +∞, the problem is unbounded and no optimal solution exists.
U=TRUE. STOP.
16 The index p is integrated in the basis and the index q is removed. Apply
the pivoting (Algorithm 16.3) to the tableau Tk to obtain Tk+1 .
17 k := k + 1.
18 Until STOP

In this case, the three entries are positive. The smallest α is 10, so that both rows
2 and 3 are candidates to be the pivot row. Applying Bland’s rule, we select row 2,
corresponding to x5 . The pivot (circled) is therefore in column 1 and row 2. After
pivoting, we obtain the following tableau.
x1 x2 x3 x4 x5 x6

0 1.5 1 1 −0.5 0 10 α4 = 20/3


1 0.5 1 0 0.5 0 10 α1 = 20
0 1 −1 0 −1 1 0 α6 = 0
0 −7 −2 0 5 0 100

The first column of the tableau is now a column of the identity matrix. As the 1
is at the second row, the second row of the tableau corresponds to the variable x1 .
Note that the reduced cost of the variable x1 was −10. Remember that it is the slope
of the objective function in the corresponding basic direction. As a step of length
α = 10 is performed, the value of the objective function decreases by 100 units during
this first iteration. This is reflected in the rightmost cell of the last row, which contains
the value of the objective function with the opposite sign. We indeed observe that
the objective function moved from 0 to −100.
Applying the same procedure to the new tableau, we obtain a pivot in column 2
and row 3 (circled), so that x2 enters the basis and x6 leaves it. Note that we have
here a degenerate basic feasible solution, and the basic direction happens not to be
feasible. The maximum step that can be performed is the smallest value of α, that
is α6 = 0. After pivoting, we obtain the following tableau.

x1 x2 x3 x4 x5 x6

0 0 2.5 1 1 −1.5 10 α4 = 4

1 0 1.5 0 1 −0.5 10 α1 = 20/3


0 1 −1 0 −1 1 0 α2 = +∞
0 0 −9 0 −2 7 100

Note that the value of the objective function is still −100. Indeed, the algorithm
has performed a step of length 0. Even if the tableau is different, it corresponds to
the same vertex of the constraint polyhedron.
The leftmost negative reduced cost is in column 3, so that x3 enters the basis.
Note that there are only two positive entries in the column, so that only two α’s are
calculated. The smallest one is in row 1, corresponding to x4 , that leaves the basis.
After pivoting, we obtain the following tableau.

x1 x2 x3 x4 x5 x6

0 0 1 0.4 0.4 −0.6 4 x3


1 0 0 −0.6 0.4 0.4 4 x1
0 1 0 0.4 −0.6 0.4 4 x2
0 0 0 3.6 1.6 1.6 136

A step of length 4 in a direction of slope −9 has been performed, so that the
value of the objective function has decreased by 36 units. All reduced costs in the
last tableau are non negative. We have an optimal solution. The values of the basic
variables can be read in the last column of the tableau: x∗ = (4, 4, 4, 0, 0, 0)T, with
cT x∗ = −136.
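Combining the pivot selection with the pivoting rule, Algorithm 16.4 can be sketched as follows. This is an illustration with our own names, not the book's code; it is run on the initial tableau of Example 16.13 and reaches the same optimal tableau.

```python
import numpy as np

def tableau_simplex(T):
    """Sketch of Algorithm 16.4; T has m+1 rows and n+1 columns."""
    T = T.astype(float).copy()
    m = T.shape[0] - 1
    while True:
        reduced = T[-1, :-1]
        negative = np.where(reduced < -1e-9)[0]
        if negative.size == 0:
            return T                                # all reduced costs >= 0: optimal
        p = negative[0]                             # leftmost negative reduced cost
        ratios = [(T[i, -1] / T[i, p], i) for i in range(m) if T[i, p] > 1e-9]
        if not ratios:
            raise ValueError("unbounded problem")
        _, q = min(ratios)                          # ratio test, smallest row on ties
        row_q = T[q].copy()
        T -= np.outer(T[:, p], row_q) / row_q[p]    # eliminate the pivot column ...
        T[q] = row_q / row_q[p]                     # ... and normalize the pivot row

# Initial tableau of Example 16.13
T0 = np.array([
    [1.,   2.,   2.,  1., 0., 0., 20.],
    [2.,   1.,   2.,  0., 1., 0., 20.],
    [2.,   2.,   1.,  0., 0., 1., 20.],
    [-10., -12., -12., 0., 0., 0.,  0.],
])
T_star = tableau_simplex(T0)
print(T_star[-1, -1])   # 136.0: the optimal value -136 with opposite sign
```

The intermediate tableaus, including the degenerate step of length 0, are exactly those displayed in Example 16.13.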

16.3 The initial tableau


Algorithm 16.4 enables us to solve a linear optimization problem provided that an
initial basic feasible solution is known and the associated tableau is calculated. In
some cases, this task is simple. In others, it can be quite difficult.

For instance, the identification of a first tableau is simple when the linear opti-
mization problem is given in the form

min cT x

subject to
Ax ≤ b
x≥0
and when the vector b only contains non negative elements. In this case, we
transform the problem into standard form by adding slack variables.
The problem then becomes

min cT x + 0T xs

subject to
Ax + Ixs = b
x≥0
xs ≥ 0 ,
where xs ∈ Rm is the vector of slack variables. The point x = 0, xs = b is a basic
feasible solution (because b ≥ 0), where the slack variables xs are in the basis and
the associated basic matrix is the identity matrix. As B = I and cB = 0, the tableau
(16.4) simplifies into
A    b
cT   0          (16.9)

It is illustrated with the following example.


Example 16.14 (Initial tableau). Consider the problem

min −2x1 − x2

subject to
x1 − x2 ≤ 2
x1 + x2 ≤ 6
x1 , x2 ≥ 0 .
The problem in standard form is obtained after including the slack variables xs1 = x3
and xs2 = x4 .
min −2x1 − x2

subject to
x1 − x2 + x3 = 2
x1 + x2 + x4 = 6
x1 , x2 , x3 , x4 ≥ 0 .

The slack variables are selected to be in the basis, and the first tableau is

          x1   x2   x3   x4
A   b      1   −1    1    0   2   α3 = 2
       =   1    1    0    1   6   α4 = 6
cT  0     −2   −1    0    0   0

Basic variables

We can now apply the simplex algorithm. During the first iteration, x1 enters the
basis, replacing x3 .
x1 x2 x3 x4
1 −1 1 0 2 α1 = +∞
0 2 −1 1 4 α4 = 2
0 −3 2 0 4
During the second iteration, x2 enters the basis instead of x4 .
x1 x2 x3 x4
1 0 0.5 0.5 4 x1
0 1 −0.5 0.5 2 x2
0 0 0.5 1.5 10

Basic variables

The final tableau is optimal, as all reduced costs (in the last row) are non negative.
The optimal values of the basic variables are available in the last column, so that
x∗ = (4, 2, 0, 0)T and cT x∗ = −10.

It is important to note that the condition b ≥ 0 is restrictive. If b happens to


contain a negative element, the corresponding constraint should be multiplied by −1
to obtain a positive element. As the constraints are inequality constraints, the sign of
the inequality would therefore change, and the modified constraints are not consistent
with the requested format. For instance, the following set of constraints cannot be
transformed into the requested form (try it):

x1 − x2 ≤ 2
x1 + x2 ≤ −6
x1 , x2 ≥ 0 .

In order to identify an initial tableau for any problem, that is, a feasible basic
solution, we consider an auxiliary optimization problem. We design it in a way that
its initial tableau is easy to construct. We consider the problem in standard form

min cT x

subject to
Ax = b
x ≥ 0,

and call it problem (P). Here, as we have equality constraints, we can assume without
loss of generality that b ≥ 0. If one of the components of b happens to be negative,
we need only multiply the corresponding constraint by −1.
We now create the auxiliary problem. In order to easily obtain a first tableau, we
need to select basic variables corresponding to the column of the identity matrix, so
that B = B−1 = I. As such columns may not be present in problem (P), we enforce
it and introduce an auxiliary variable for each constraint. We obtain the following
constraints for the auxiliary problem, where the identity matrix appears explicitly:

Ax + Ixa = b
(16.10)
x, xa ≥ 0 ,

where xa ∈ Rm is the vector of auxiliary variables. Recall that the objective of


the auxiliary problem is to identify a first valid tableau. Therefore, the objective
function of problem (P) can be ignored here. Instead, our objective is to get rid
of these auxiliary variables, that were added to artificially enforce feasibility. If we
denote e ∈ Rm the vector of dimension m consisting only of ones, we obtain the
following objective function

min xa1 + xa2 + · · · + xam = 0T x + eT xa ,          (16.11)

where the variables of problem (P) play no role. The objective is to give the smallest
possible value (that is, 0) to all auxiliary variables. The auxiliary problem with
objective function (16.11) and constraints (16.10) is called problem (A). Note that, by
construction, the value of the objective function is the sum of the auxiliary variables
and is thus always non negative.
We consider x0 , a feasible point of problem (P). As Ax0 = b, it is easy to check
that the point x = x0 and xa = 0 is a feasible point of the auxiliary problem (A).
The value of the objective function of this point in (A) is the sum of the variables
xa , that is 0. Since zero is also the smallest possible value of the objective function
of (A), we are dealing with an optimal solution of problem (A).
If x0 is a feasible point of (P), then x = x0 , xa = 0 is an optimal solution to
(A) with a zero value of the objective function. The contrapositive statement is as
follows. If the optimal solution to (A) corresponds to a nonzero (hence positive) value
of the objective function, then (P) has no feasible solution. As discussed later,
this provides us with a convenient way to detect an infeasible problem.

In order to solve (A), we use the simplex algorithm. The point x = 0, xa = b is
a basic feasible solution of (A) (because b ≥ 0), with the auxiliary variables in the
basis, so that B = B−1 = I. The initial tableau is

        A      I        b              A     I       b
                              =
     −c̃TB A    0    −c̃TB b         −eT A    0    −eT b

where c̃B = ( 1  1  . . .  1 )T = e is the vector of the coefficients of the basic
variables (here, the auxiliary variables xa ) in the objective function of (A). The last
row of the tableau is simply the sum of the elements of the corresponding column,
with the sign changed.
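Since the last row of T0 is just the negated column sums of [A | b], constructing the initial tableau of (A) is mechanical and easily scripted. Below is a minimal sketch, assuming NumPy; the function name `phase_one_tableau` is ours, and the data are those of Example 16.15 below, so the last row can be compared with the printed tableau.

```python
import numpy as np

def phase_one_tableau(A, b):
    """Build the initial tableau of the auxiliary problem (A).

    Assumes b >= 0. The basis consists of the auxiliary variables,
    so B = I: the last row is -e^T A under the original columns,
    zeros under the auxiliary columns, and -e^T b in the corner.
    """
    m, n = A.shape
    T = np.zeros((m + 1, n + m + 1))
    T[:m, :n] = A
    T[:m, n:n + m] = np.eye(m)      # auxiliary (identity) columns
    T[:m, -1] = b
    T[-1, :n] = -A.sum(axis=0)      # reduced costs -e^T A
    T[-1, -1] = -b.sum()            # minus the objective value: -e^T b
    return T

# Data of Example 16.15
A = np.array([[1., 2, 3, 0], [-1, 2, 6, 0], [0, 4, 9, 0], [0, 0, 3, 1]])
b = np.array([3., 2, 5, 1])
T0 = phase_one_tableau(A, b)
print(T0[-1])   # last row: 0, -8, -21, -1, 0, 0, 0, 0, -11
```

The printed last row coincides with the one of the first tableau of Example 16.15.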
Problem (A) can be solved by the simplex algorithm (Algorithm 16.4). The
algorithm cannot detect an unbounded problem at step 15, as the objective function
of (A) is bounded below by 0. It always produces an optimal solution x∗ , xa∗ .
Consequently, one of the following two possibilities occurs:
1. The optimal value of (A) is zero. As it is the sum of the auxiliary variables, which
are non negative, each of them has to be zero: xa∗ = 0. As Ax∗ + xa∗ = Ax∗ = b,
x∗ is a feasible solution of (P).
2. The optimal value of (A) is positive. This signifies that there is no feasible solution
for (P).
Note that the auxiliary problem (A) is always feasible and bounded, with 0 as a lower
bound.
In summary, solving the auxiliary problem (A) either provides a feasible solution
of (P), or provides a certificate of infeasibility. We illustrate this in an example.
Example 16.15 (Initial point). Consider the problem

min x1 + x2 + x3

subject to
x1 + 2x2 + 3x3 = 3
−x1 + 2x2 + 6x3 = 2
4x2 + 9x3 = 5
3x3 + x4 = 1
x1 , x2 , x3 , x4 ≥ 0 .
The auxiliary problem is written as

min xa1 + xa2 + xa3 + xa4

subject to
x1 + 2x2 + 3x3 + xa1 = 3
−x1 + 2x2 + 6x3 + xa2 = 2
4x2 + 9x3 + xa3 = 5
3x3 + x4 + xa4 = 1
x1 , x2 , x3 , x4 , xa1 , xa2 , xa3 , xa4 ≥ 0 .

The initial tableau for the auxiliary problem and the iterations of the simplex algo-
rithm (Algorithm 16.4) are listed below.
x1 x2 x3 x4 xa1 xa2 xa3 xa4
1 2 3 0 1 0 0 0 3 3/2

−1 2 6 0 0 1 0 0 2 1
0 4 9 0 0 0 1 0 5 5/4
0 0 3 1 0 0 0 1 1 +∞
0 −8 −21 −1 0 0 0 0 −11

x1 x2 x3 x4 xa1 xa2 xa3 xa4

2 0 −3 0 1 −1 0 0 1 1/2
−1/2 1 3 0 0 1/2 0 0 1 +∞
2 0 −3 0 0 −2 1 0 1 1/2
0 0 3 1 0 0 0 1 1 +∞
−4 0 3 −1 0 4 0 0 −3

x1 x2 x3 x4 xa1 xa2 xa3 xa4


1 0 −3/2 0 1/2 −1/2 0 0 1/2 +∞
0 1 9/4 0 1/4 1/4 0 0 5/4 5/9
0 0 0 0 −1 −1 1 0 0 +∞
0 0 3 1 0 0 0 1 1 1/3
0 0 −3 −1 2 2 0 0 −1

x1 x2 x3 x4 xa1 xa2 xa3 xa4


1 0 0 1/2 1/2 −1/2 0 1/2 1 x1
0 1 0 −3/4 1/4 1/4 0 −3/4 1/2 x2
0 0 0 0 −1 −1 1 0 0 xa3
0 0 1 1/3 0 0 0 1/3 1/3 x3
0 0 0 0 2 2 0 1 0

Basic variables
The last tableau is optimal since all the reduced costs are non negative. The
optimal solution is x1 = 1, x2 = 1/2, x3 = 1/3, x4 = 0, xa1 = xa2 = xa3 = xa4 = 0. The
optimal value of the objective function and all the auxiliary variables are zero. It is
easy to verify that (x1 , x2 , x3 , x4 ) is a feasible point of the initial problem.

The optimal solution to the auxiliary problem makes it possible to identify a


feasible point of the initial problem, if such a point exists. However, in order to apply
the simplex algorithm (Algorithm 16.4), it is necessary to use the associated simplex
tableau. In the case where all the auxiliary variables are non basic, we need only
remove the corresponding columns and calculate the reduced costs in order to obtain
an initial tableau for the initial problem. It may happen (as in Example 16.15)
that the basis corresponding to the optimal tableau of the auxiliary problem contains
auxiliary variables. Note that the feasible basic solution is then necessarily degenerate,
because this (auxiliary) basic variable is zero. Therefore, it can be exchanged with
a variable of the original problem that is non basic and, therefore, equal to zero as
well. To do so, we choose a column corresponding to a variable from the original
problem such that the pivot is non zero and carry out the pivoting which exchanges
the two variables in the basis. Since the auxiliary basic variable is zero, the pivoting

does not affect the values of the last column of the tableau. Then, the new tableau
corresponds exactly to the same feasible solution as the former. Only the basis has
changed.
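The pivoting invoked here (Algorithm 16.3 in the text) is a plain Gauss-Jordan elimination step on the tableau. A minimal sketch, assuming NumPy and 0-based row and column indices (the function name `pivot` is ours):

```python
import numpy as np

def pivot(T, r, c):
    """Pivot the tableau T on row r, column c (Gauss-Jordan step).

    The pivot element T[r, c] must be non zero. After the call,
    column c is the r-th column of the identity matrix.
    """
    T = T.astype(float).copy()
    T[r] /= T[r, c]                     # scale the pivot row
    for i in range(T.shape[0]):
        if i != r:
            T[i] -= T[i, c] * T[r]      # eliminate column c in the other rows
    return T
```

When the leaving basic variable is zero, the last-column entry of the pivot row is zero, so the pivot leaves the last column of the tableau unchanged: this is exactly the degenerate exchange described above.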
In the case where the matrix A of the initial problem is of full rank, such pivoting
is always possible. In Example 16.15, this hypothesis is not satisfied. The third
constraint is the sum of the first two and is redundant. If we want to remove the
variable xa3 from the basis, the only possible candidate to take its place in the basis
is x4 , the only variable of the original problem that is out of the basis. But the
corresponding pivot is zero and the procedure cannot be applied.

x1 x2 x3 x4 xa1 xa2 xa3 xa4


1 0 0 1/2 1/2 −1/2 0 1/2 1 x1
0 1 0 −3/4 1/4 1/4 0 −3/4 1/2 x2
0 0 0 0 −1 −1 1 0 0 xa3
0 0 1 1/3 0 0 0 1/3 1/3 x3
0 0 0 0 2 2 0 1 0

The impossibility of removing an auxiliary variable from the basis is therefore a


convenient way to identify a redundant constraint. Since such a constraint can be
ignored without modifying the problem, the corresponding row of the matrix can
simply be removed, as well as the column of the corresponding auxiliary variable:

x1 x2 x3 x4 xa1 xa2 xa4


1 0 0 1/2 1/2 −1/2 1/2 1 x1
0 1 0 −3/4 1/4 1/4 −3/4 1/2 x2
0 0 1 1/3 0 0 1/3 1/3 x3
0 0 0 0 2 2 1 0

When there are no longer any auxiliary variables in the basis, the corresponding
columns can be removed to obtain a tableau.

x1 x2 x3 x4
1 0 0 1/2 1 x1
0 1 0 −3/4 1/2 x2
0 0 1 1/3 1/3 x3
− − − − −

To obtain a feasible tableau from the initial problem, we need to calculate the
elements of the last row. These are reduced costs, defined by (6.166) and the value
of the objective function with the opposite sign. We thus obtain an algorithm in two
phases (Algorithm 16.5).

Algorithm 16.5: Simplex algorithm in two phases


1 Objective
2 To find the global minimum of a linear optimization problem in standard
form (6.159)–(6.160).

3 Input
4 The matrix A ∈ Rm×n .
5 The vector b ∈ Rm .
6 The vector c ∈ Rn .
7 Output
8 Boolean indicator U identifying an unbounded problem.
9 Boolean indicator F identifying an infeasible problem.
10 If U and F are false, T ∗ , the simplex tableau corresponding to an optimal
basic feasible solution.
11 Phase I
12 By multiplying the relevant constraints by −1, modify the problem such
that b ≥ 0.
13 Introduce the auxiliary variables xa1 , . . . , xam and define

         x1 . . . xn   xa1 . . . xam
  T0 =       A              I            b
          −eT A             0         −eT b

where e is the vector of Rm for which all components are 1.


14 Solve the auxiliary problem by using the simplex algorithm (Algorithm
16.4) to obtain T0∗ .
15 If the optimal value of the auxiliary problem is non zero, then F=TRUE.
STOP. Otherwise, F=FALSE.
16 for each basic auxiliary variable do
17 Pivot the tableau with Algorithm 16.3 to exchange it with an original
variable.
18 If all the potential pivots are zero, remove the row corresponding to this
basic variable. The associated constraint is redundant.
19 When there are no more basic auxiliary variables, remove the
corresponding columns of the tableau to obtain the tableau T̄0∗ .
20 Phase II
21 Calculate the last row of T̄0∗ : for j = 1, . . . , n + 1,

                      cj − cTB B−1 Aj    if j is non basic
  T̄0∗ (m + 1, j) :=   0                  if j is basic
                      −cTB B−1 b         if j = n + 1.

22 Solve the problem by using the simplex algorithm (Algorithm 16.4) to


obtain U and T ∗ .

Example 16.16 (Simplex algorithm in two phases). Consider the problem

min 2x1 + 3x2 + 3x3 + x4 − 2x5


x∈R5

subject to
x1 + 3x2 + 4x4 + x5 = 2

x1 + 2x2 − 3x4 + x5 = 2
−x1 − 4x2 + 3x3 = 1
x1 , x2 , x3 , x4 , x5 ≥ 0.
We apply the simplex algorithm in two phases (Algorithm 16.5) with

         1   3   0    4   1            2
  A =    1   2   0   −3   1   ,  b =   2   ,  c = ( 2  3  3  1  −2 )T .
        −1  −4   3    0   0            1
Phase I
1. Initial tableau of the auxiliary problem:
x1 x2 x3 x4 x5 xa1 xa2 xa3

1 3 0 4 1 1 0 0 2 2
1 2 0 −3 1 0 1 0 2 2
−1 −4 3 0 0 0 0 1 1
−1 −1 −3 −1 −2 0 0 0 −5
2. The simplex algorithm is applied to solve the auxiliary problem:
x1 x2 x3 x4 x5 xa1 xa2 xa3
1 3 0 4 1 1 0 0 2
0 −1 0 −7 0 −1 1 0 0
0 −1 3 4 1 1 0 1 3 1
0 2 −3 3 −1 1 0 0 −3
x1 x2 x3 x4 x5 xa1 xa2 xa3
1 3 0 4 1 1 0 0 2 x1
0 −1 0 −7 0 −1 1 0 0 xa2
0 −1/3 1 4/3 1/3 1/3 0 1/3 1 x3
0 1 0 7 0 2 0 1 0
The last tableau is optimal, and the optimal value is zero. We have identified a
basic feasible solution: x1 = 2, x2 = 0, x3 = 1, x4 = 0, x5 = 0.
3. The auxiliary variable xa2 is basic. It is exchanged with x2 .
x1 x2 x3 x4 x5 xa1 xa2 xa3
1 3 0 4 1 1 0 0 2
0 −1 0 −7 0 −1 1 0 0
0 −1/3 1 4/3 1/3 1/3 0 1/3 1
0 1 0 7 0 2 0 1 0

x1 x2 x3 x4 x5 xa1 xa2 xa3


1 0 0 −17 1 −2 3 0 2
0 1 0 7 0 1 −1 0 0
0 0 1 3.67 1/3 2/3 −1/3 1/3 1
0 0 0 0 0 1 1 1 0

No auxiliary variable is left in the basis. The tableau is ready to be cleaned.


4. Remove the columns corresponding to the auxiliary variables.
x1 x2 x3 x4 x5
1 0 0 −17 1 2
0 1 0 7 0 0
0 0 1 3.67 1/3 1
0 0 0 0 0 0

Phase II

1. Calculate the last row of the tableau. The vector c is reported above the tableau
to facilitate the calculations:

• c̄4 = c4 − cTB (B−1 A4 ) = 1 − (2 · (−17) + 3 · 7 + 3 · 3.67) = 3,


• c̄5 = c5 − cTB (B−1 A5 ) = −2 − (2 · 1 + 3 · 0 + 3 · 1/3) = −5.
x1 x2 x3 x4 x5
c= 2 3 3 1 −2
1 0 0 −17 1 2
0 1 0 7 0 0
0 0 1 3.67 1/3 1
0 0 0 3 −5 −7

2. Iterations for phase II :


x1 x2 x3 x4 x5

1 0 0 −17 1 2 2
0 1 0 7 0 0
0 0 1 3.67 1/3 1 3
0 0 0 3 −5 −7

x1 x2 x3 x4 x5
1 0 0 −17 1 2
0 1 0 7 0 0 0
−1/3 0 1 9.33 0 1/3 0.04
5 0 0 −82 0 3

x1 x2 x3 x4 x5
1 2.43 0 0 1 2 x5
0 0.14 0 1 0 0 x4
−1/3 −1.33 1 0 0 1/3 x3
5 11.71 0 0 0 3

The last tableau is optimal, because all the reduced costs are non negative. The
optimal solution is

x1 = 0 , x2 = 0 , x3 = 1/3 , x4 = 0 , x5 = 2

and the optimal value is −3. Note that the optimal solution was already reached at
the previous iteration, illustrating that the non negative reduced costs are sufficient
but not necessary for optimality (see Theorems 6.30 and 6.31).
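The optimality of this solution can also be checked independently of the tableau: with the final basis {x5 , x4 , x3 }, the reduced costs c − AT B−T cB must all be non negative, and they reproduce the values 5 and 82/7 ≈ 11.71 of the last tableau. A sketch with NumPy (the variable names are ours):

```python
import numpy as np

c = np.array([2., 3, 3, 1, -2])
A = np.array([[1., 3, 0, 4, 1],
              [1., 2, 0, -3, 1],
              [-1., -4, 3, 0, 0]])
b = np.array([2., 2, 1])
x_star = np.array([0., 0, 1/3, 0, 2])

basis = [4, 3, 2]                       # x5, x4, x3, as 0-based indices
B = A[:, basis]
y = np.linalg.solve(B.T, c[basis])      # dual vector y = B^{-T} c_B
reduced_costs = c - A.T @ y             # zero on basic, >= 0 on non basic

print(np.allclose(A @ x_star, b), c @ x_star)   # True -3.0
print(np.round(reduced_costs, 2))               # all non negative
```

Feasibility, the objective value −3, and non negative reduced costs together certify optimality.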

The presentation of the proof of Theorems 16.1 and 16.2 is inspired by Bertsimas
and Tsitsiklis (1997).

16.4 The revised simplex algorithm


The motivation for developing the tableau in Section 16.2 was to deal with the com-
putational burden of the simplex algorithm, associated with the involvement of the
matrix B−1 at several stages of the algorithm. Another way to deal with this com-
plexity is to exploit an LU factorization of the matrix B, that is, PB = LU where
P ∈ Rm×m is a permutation matrix, L ∈ Rm×m is a lower triangular matrix, and
U ∈ Rm×m an upper triangular matrix. Consider each step of Algorithm 16.2 where
B−1 is involved.
Step 15 Calculation of the reduced costs c − AT B−T cB . We know from Section 6.5
that the reduced costs of basic variables are zero (Theorem 6.29). The reduced
costs for the non basic variables are given by (6.169):

c̄N = cN − NT B−T cB .

They can be computed in the following way using the LU factorization of B.


1. Define z ∈ Rm as the solution of the triangular system

UT z = cB .

2. Define yP ∈ Rm as the solution of the triangular system

LT yP = z.

3. Permute the vector yP to obtain y ∈ Rm :

y = PT yP .

4. Then,
c̄N = cN − NT y.
Step 19 Calculation of the current iterate xB = B−1 b. Using the LU factorization of
B, the procedure is as follows.

1. Define y ∈ Rm as the solution of the triangular system

Ly = Pb.

2. Define xB ∈ Rm as the solution of the triangular system



UxB = y.

Step 20 Calculation of the basic component of the pth basic direction dB = −B−1 Ap .
Using the LU factorization of B, the procedure is as follows.
1. Define y ∈ Rm as the solution of the triangular system

Ly = −PAp .

2. Define dB ∈ Rm as the solution of the triangular system

UdB = y.

In order for the method to be efficient, the LU factorization of B must not be


performed at each iteration. Instead, the factors L and U must be efficiently updated
from one iteration to the next. Like in Section 16.2, this also exploits the fact that
only one column of the matrix B is modified at each iteration. Various methods in
linear algebra may be used to do that. We refer the interested reader to Bartels and
Golub (1969), Forrest and Tomlin (1972), Suhl and Suhl (1993), Nocedal and Wright
(1999, Chapter 13) for the technical details.
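The three solves above can be written compactly with a standard LU routine. A sketch assuming SciPy's `lu_factor`/`lu_solve` (which handle the permutation matrix P internally, and whose `trans=1` option solves the transposed system, replacing the pair of triangular solves with U^T and L^T); the function name `simplex_solves` is ours:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def simplex_solves(B, N, cB, cN, b, Ap):
    """The three linear systems of one revised-simplex iteration,
    all reusing a single LU factorization of the basis matrix B."""
    lu = lu_factor(B)                 # factorizes PB = LU once

    # Step 15: reduced costs, via B^T y = cB (trans=1).
    y = lu_solve(lu, cB, trans=1)
    reduced_costs = cN - N.T @ y

    # Step 19: current iterate, via B xB = b.
    xB = lu_solve(lu, b)

    # Step 20: basic part of the p-th basic direction, via B dB = -Ap.
    dB = lu_solve(lu, -Ap)
    return reduced_costs, xB, dB
```

In a real implementation the factorization would of course be updated, not recomputed, as discussed above.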

16.5 Exercises
Exercise 16.1. Consider the following optimization problem

min −3x1 − 2x2


x∈R2

subject to
x1 − x2 ≥ −2
2x1 + x2 ≤8
x1 + x2 ≤5
x1 + 2x2 ≤ 10
x1 ≥0
x2 ≥ 0.
1. Provide a graphic representation of the feasible set (see Exercise 3.5).
2. Solve the problem graphically.
3. Solve the problem using Algorithm 16.5.
4. Reformulate the same problem with a minimum number of constraints (see Exer-
cise 3.5).
5. Solve the new formulation using Algorithm 16.5.

Exercise 16.2. Solve the following optimization problem using Algorithm 16.5.

min −9x1 − 4x2


x∈R2

5x1 + 2x2 ≤ 31

−3x1 + 2x2 ≤ 5
−2x1 − 3x2 ≤ −1
x1 ≥ 0
x2 ≥ 0.

Exercise 16.3. Consider the following simplex tableau:


x1 x2 x3 x4 x5
−2/3 0 δ 0 1/6 46
−1/8 0 0 1 5/2 γ
α 1 ε η −1/6 4
β γ ζ θ 1/2 π
where α, β, γ, δ, ε, ζ, η, θ, and π are parameters. The objective function of the
optimization problem is cT x, where

cT = (µ ρ 0 0 0),

and µ and ρ are also parameters.


1. What are the basic variables in this tableau?
2. Give the values of γ, δ, ε, ζ, η, θ, π and ρ that make it a valid tableau, and
explain why.
3. What are the conditions on the remaining parameters for the optimization problem
to be bounded?
4. What are the conditions on the remaining parameters if the tableau corresponds
to a degenerate basic solution?
5. What are the conditions on the remaining parameters if the tableau corresponds
to a unique optimal solution of the problem?
6. What are the conditions on the remaining parameters if the tableau corresponds
to an optimal solution of the problem, and there exists an infinite number of
optimal solutions?

16.6 Project
The general organization of the projects is described in Appendix D.

Objective
The objective of this project is to implement the simplex algorithm in two phases
and test the various pivoting rules.

Approach
Apply the algorithms to the problem described below, for different values of n, and
compare the number of iterations. For the algorithms comprising random decisions,
run the same problem several times to obtain an average performance index.

Algorithms
Algorithms 16.3, 16.4, and 16.5. The following versions of phase 2 of Algorithm 16.4
are tested.
1. Choose an index of the column corresponding to the negative reduced cost that is
the furthest to the left in the tableau (rule already described in Algorithm 16.4).
2. Choose the index of the column corresponding to the most negative reduced cost.
3. Carry out a pivoting for each variable corresponding to a negative reduced cost
and select the one that generates the most significant reduction in the objective
function.
4. Randomly select an index corresponding to a negative reduced cost, by attributing
the same probability to each of these indexes.
5. Randomly select an index corresponding to a negative reduced cost, for which the
probability of selecting index j is

 −c̄j

 X if c̄j < 0
−c̄k

 {k|c̄k <0}

0 otherwise .

Problems
The following problems are inspired by the ideas of Klee and Minty (1972) to demon-
strate that the complexity of the simplex algorithm cannot be polynomial.
Exercise 16.4. The problem

min −( 2^(n−1) x1 + 2^(n−2) x2 + · · · + 2 xn−1 + xn )

subject to
x1 ≤ 5
4x1 + x2 ≤ 25
8x1 + 4x2 + x3 ≤ 125
..
.
2^n x1 + 2^(n−1) x2 + · · · + 4xn−1 + xn ≤ 5^n
x1 , . . . , xn ≥ 0 .

The optimal solution to this problem is ( 0  0  . . .  0  5^n )T .
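For the project, the Klee-Minty data above are easy to generate programmatically for any n. A sketch assuming NumPy (the function name `klee_minty` is ours; the cross-check values are for n = 3, where the claimed optimum gives objective −5³ = −125):

```python
import numpy as np

def klee_minty(n):
    """Objective c and inequality data (A x <= b) of the problem above."""
    c = -np.array([2.0 ** (n - i) for i in range(1, n + 1)])
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 1.0
        for j in range(i):
            A[i, j] = 2.0 ** (i - j + 1)   # row i: 2^(i-j+1) x_j, then x_i
    b = np.array([5.0 ** (i + 1) for i in range(n)])
    return c, A, b

c, A, b = klee_minty(3)
x_star = np.array([0.0, 0.0, 5.0 ** 3])
print(np.all(A @ x_star <= b), c @ x_star)   # True -125.0
```

Feeding these data to the two-phase simplex implementation (after adding slack variables) reproduces the exponential number of iterations under the classical pivoting rule.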

Exercise 16.5. The problem


min −xn
subject to
ε ≤ x1 ≤ 1

εxi−1 ≤ xi ≤ 1 − εxi−1 , i = 2, . . . , n ,
where 0 < ε < 1/2.

Chapter 17

Newton’s method for


constrained optimization

Contents
17.1 Projected gradient method
17.2 Preconditioned projected gradient
17.3 Dikin’s method
17.4 Project

In this chapter, Newton’s method for unconstrained optimization is adapted to
problems with convex constraints:
min f(x)
x
subject to
x ∈ X ⊆ Rn ,
where X is a closed convex subset of Rn .
As demonstrated in Section 11.5, Newton’s method can be seen as a precondi-
tioned version of the steepest descent method. We proceed in a similar manner for
constrained problems: we first present the method by using the steepest descent
method and then precondition it in an appropriate manner.

17.1 Projected gradient method


The basic idea is to follow the steepest descent direction, as in the unconstrained
case. When we obtain an infeasible point, we project it on the set X. Denote [ · ]P the
projection operator. At each iteration, we generate a feasible point yk from a feasible
point xk :
yk = [ xk − γk ∇f(xk ) ]P ,
with γk > 0 as the step length. The direction dk = yk − xk is feasible, as both xk
and yk are feasible, and the feasible set is convex. We show that, if it is non zero, it
is a descent direction for any value of γk > 0.

Lemma 17.1. Let f : Rn → R be a differentiable function and X ⊆ Rn a closed
convex set. Take xk ∈ X and x(γ) = xk − γ∇f(xk ), with γ > 0. If the direction

d(γ) = [ x(γ) ]P − xk          (17.1)

is non zero, it is a descent direction, for all γ > 0.

Proof. Projecting on a convex set consists in solving an optimization problem. In
Example 6.6, we have derived the optimality conditions of this problem:

( [x(γ)]P − x(γ) )T ( x − [x(γ)]P ) ≥ 0 ,  ∀x ∈ X .

Since xk ∈ X, we have

( [x(γ)]P − x(γ) )T ( xk − [x(γ)]P ) = −( [x(γ)]P − xk + γ∇f(xk ) )T d(γ)
                                     = −( d(γ) + γ∇f(xk ) )T d(γ)
                                     = −d(γ)T d(γ) − γ d(γ)T ∇f(xk )
                                     ≥ 0.

Then, as γ > 0,
d(γ)T ∇f(xk ) ≤ − d(γ)T d(γ) / γ ≤ 0 .

Since d(γ) ≠ 0 by assumption, we have

d(γ)T ∇f(xk ) < 0

and d(γ) = [x(γ)]P − xk is a descent direction.


The direction d(γ) = [x(γ)]P − xk has the following properties:

• If d(γ) = 0, then xk is a stationary point because the necessary optimality
condition (6.8) is satisfied.

• If d(γ) ≠ 0, then d(γ) is a descent direction according to Lemma 17.1.

• Since xk and [x(γ)]P are feasible, the convexity of X ensures that xk + αd(γ) ∈ X
for any 0 ≤ α ≤ 1 (Theorem 3.11).

It is thus easy to generalize the steepest descent algorithm (Algorithm 11.6) to


obtain Algorithm 17.1.

Algorithm 17.1: Projected gradient method


1 Objective
2 To find (an approximation of) a local minimum of the problem

min f(x) (17.2)



x∈X⊆Rn

where X is closed, convex and non empty.


3 Input
4 The differentiable function f : Rn → R.
5 The gradient of the function ∇f : Rn → Rn .
6 The projection operator on X: [ · ]P .
7 An initial solution x0 ∈ Rn .
8 A parameter γ > 0 (for instance γ = 1).
9 The required precision ε ∈ R, ε > 0.
10 Output
11 An approximation of the optimal solution x∗ ∈ R.
12 Initialization
13 k := 0.
14 Repeat
15 yk := [ xk − γ∇f(xk ) ]P .
16 dk := yk − xk .
17 Determine αk by applying a line search (Algorithm 11.5) with α0 = 1.
18 xk+1 := xk + αk dk .
19 k := k + 1.
20 Until ‖dk ‖ ≤ ε
21 x∗ := xk .

It is important to note that step 15 of Algorithm 17.1 can sometimes be as difficult


as the initial problem. In fact, yk is obtained by solving the problem
yk = argminx∈X (1/2) ‖ xk − γ∇f(xk ) − x ‖2                          (17.3)
   = argminx∈X (1/2) ‖ x − xk ‖2 + γ∇f(xk )T (x − xk ) .
This is an optimization problem on a convex set of constraints, with a convex objective
function. In some cases, this problem is easy to solve.
In particular, the projection on bound constraints is trivial. Take ℓ, u ∈ Rn and
X = { x | ℓ ≤ x ≤ u }. Then,

                         ℓi                     if ( xk − γ∇f(xk ) )i ≤ ℓi
[ xk − γ∇f(xk ) ]Pi  =   ( xk − γ∇f(xk ) )i     if ℓi ≤ ( xk − γ∇f(xk ) )i ≤ ui
                         ui                     if ( xk − γ∇f(xk ) )i ≥ ui .
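This componentwise projection is a single clip operation in code. A minimal sketch, assuming NumPy (the function name is ours):

```python
import numpy as np

def projected_gradient_step(x, grad, lower, upper, gamma=1.0):
    """One projection step [x - gamma*grad]^P onto the box {l <= x <= u}."""
    return np.clip(x - gamma * grad, lower, upper)

# Example: the gradient step leaves the unit box and is clipped back onto it
y = projected_gradient_step(np.array([0.5, 0.9]),
                            np.array([2.0, -3.0]),
                            lower=0.0, upper=1.0)
print(y)   # [0. 1.]
```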

If the set X is defined by linear equality constraints Ax = b, we take z = x − xk
and (17.3) is written as

min z|Az=b−Axk (1/2) zT z + γ∇f(xk )T z .

The optimal solution is given by Theorem 6.37:

z∗ = AT (AAT )−1 ( b − A( xk − γ∇f(xk ) ) ) − γ∇f(xk )

and then yk = xk + z∗ , i.e.,

yk = xk − γ∇f(xk ) + AT (AAT )−1 ( b − A( xk − γ∇f(xk ) ) ) .          (17.4)
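In code, (17.4) amounts to one linear solve with the m × m matrix AAT (solving rather than inverting is preferable numerically). A sketch assuming NumPy and a full-row-rank A; both function names are ours:

```python
import numpy as np

def project_affine(p, A, b):
    """Orthogonal projection of p onto {x | Ax = b} (A has full row rank)."""
    correction = A.T @ np.linalg.solve(A @ A.T, b - A @ p)
    return p + correction

def equality_projected_step(x, grad, A, b, gamma=1.0):
    """y_k of equation (17.4): project the gradient step back onto Ax = b."""
    return project_affine(x - gamma * grad, A, b)
```

For instance, projecting p = (0, −20) onto the constraint −x1 + x2 = −1 of Example 17.2 gives (−9.5, −10.5), a feasible point.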

We finally note that, in practice, it is also possible to carry out a line search along
the projection arc [ xk − γ∇f(xk ) ]P , rather than in the direction dk , as we presented
in Algorithm 17.1 (see Bertsekas, 1976).

Example 17.2 (Projected gradient algorithm). We apply Algorithm 17.1 to the


problem

1 2 9 2
min x + x
x∈R2 2 1 2 2

subject to

−x1 + x2 = −1 .

Table 17.1 lists the iterates of the method. The first column contains the iteration
number. The second contains the (infeasible) point obtained by following the steepest
descent direction (except for iteration 0, for which the point (5, 1) was arbitrarily
chosen and where the starting point x0 is obtained by projecting the former on the
constraint). The third column contains the current iterate (always feasible). Finally,
the last column lists the norm of dk , which is used in the stopping criterion.
In order to be able to draw the iterates, the algorithm was also run with γ = 0.1.
The iterates are shown in Table 17.2. Figure 17.1 illustrates the iterations. The
typical zigzagging of the steepest descent method clearly appears on this example,
justifying the need for preconditioning, as discussed in the next section.
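The first iteration of Table 17.1 can be reproduced numerically, with the projection computed as in (17.4). A sketch assuming NumPy (the variable names are ours):

```python
import numpy as np

# Data of Example 17.2: f(x) = 0.5*x1^2 + 4.5*x2^2, constraint -x1 + x2 = -1
A = np.array([[-1.0, 1.0]])
b = np.array([-1.0])
grad = lambda x: np.array([x[0], 9.0 * x[1]])

x0 = np.array([3.5, 2.5])
p = x0 - 1.0 * grad(x0)                               # gradient step, gamma = 1
y0 = p + A.T @ np.linalg.solve(A @ A.T, b - A @ p)    # projection (17.4)
d0 = y0 - x0

print(p)                    # [  0. -20.], first row of Table 17.1
print(np.linalg.norm(d0))   # 18.3847763..., the tabulated value of ||d_k||
```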

Table 17.1: Projected gradient algorithm applied to Example 17.2 (γ = 1)


k    xk−1 − γ∇f(xk−1 )    xk    ‖dk ‖
0 5.00000000e+00 1.00000000e+00 3.50000000e+00 2.50000000e+00
1 0.00000000e+00 -2.00000000e+01 2.50000000e-01 -7.50000000e-01 1.83847763e+01
2 0.00000000e+00 6.00000000e+00 1.06250000e+00 6.25000000e-02 4.59619408e+00
3 0.00000000e+00 -5.00000000e-01 8.59375000e-01 -1.40625000e-01 1.14904852e+00
4 0.00000000e+00 1.12500000e+00 9.10156250e-01 -8.98437500e-02 2.87262130e-01
5 0.00000000e+00 7.18750000e-01 8.97460938e-01 -1.02539062e-01 7.18155325e-02
6 0.00000000e+00 8.20312500e-01 9.00634766e-01 -9.93652344e-02 1.79538831e-02
7 0.00000000e+00 7.94921875e-01 8.99841309e-01 -1.00158691e-01 4.48847078e-03
8 0.00000000e+00 8.01269531e-01 9.00039673e-01 -9.99603271e-02 1.12211769e-03
9 0.00000000e+00 7.99682617e-01 8.99990082e-01 -1.00009918e-01 2.80529424e-04
10 0.00000000e+00 8.00079346e-01 9.00002480e-01 -9.99975204e-02 7.01323559e-05
11 0.00000000e+00 7.99980164e-01 8.99999380e-01 -1.00000620e-01 1.75330890e-05
12 0.00000000e+00 8.00004959e-01 8.99999380e-01 -1.00000620e-01 4.38327225e-06

Table 17.2: Projected gradient algorithm applied to Example 17.2 (γ = 0.1)


k    xk−1 − γ∇f(xk−1 )    xk    ‖dk ‖
0 5.00000000e+00 1.00000000e+00 3.50000000e+00 2.50000000e+00
1 3.15000000e+00 2.50000000e-01 2.20000000e+00 1.20000000e+00 1.83847763e+00
2 1.98000000e+00 1.20000000e-01 1.55000000e+00 5.50000000e-01 9.19238816e-01
3 1.39500000e+00 5.50000000e-02 1.22500000e+00 2.25000000e-01 4.59619408e-01
4 1.10250000e+00 2.25000000e-02 1.06250000e+00 6.25000000e-02 2.29809704e-01
5 9.56250000e-01 6.25000000e-03 9.81250000e-01 -1.87500000e-02 1.14904852e-01
6 8.83125000e-01 -1.87500000e-03 9.40625000e-01 -5.93750000e-02 5.74524260e-02
7 8.46562500e-01 -5.93750000e-03 9.20312500e-01 -7.96875000e-02 2.87262130e-02
8 8.28281250e-01 -7.96875000e-03 9.10156250e-01 -8.98437500e-02 1.43631065e-02
9 8.19140625e-01 -8.98437500e-03 9.05078125e-01 -9.49218750e-02 7.18155325e-03
10 8.14570312e-01 -9.49218750e-03 9.02539063e-01 -9.74609375e-02 3.59077662e-03
11 8.12285156e-01 -9.74609375e-03 9.01269531e-01 -9.87304687e-02 1.79538831e-03
12 8.11142578e-01 -9.87304687e-03 9.00634766e-01 -9.93652344e-02 8.97694156e-04
Projected gradient method

13 8.10571289e-01 -9.93652344e-03 9.00317383e-01 -9.96826172e-02 4.48847078e-04


14 8.10285645e-01 -9.96826172e-03 9.00158691e-01 -9.98413086e-02 2.24423539e-04
15 8.10142822e-01 -9.98413086e-03 9.00079346e-01 -9.99206543e-02 1.12211769e-04
16 8.10071411e-01 -9.99206543e-03 9.00039673e-01 -9.99603271e-02 5.61058847e-05
17 8.10035706e-01 -9.99603271e-03 9.00019836e-01 -9.99801636e-02 2.80529424e-05
18 8.10017853e-01 -9.99801636e-03 9.00009918e-01 -9.99900818e-02 1.40264712e-05
19 8.10008926e-01 -9.99900818e-03 9.00009918e-01 -9.99900818e-02 7.01323559e-06

[Figure: panel (a) Iterations and panel (b) Zoom — the iterates xk in the (x1 , x2 ) plane, moving from x0 toward x∗ with the typical zigzag pattern.]

Figure 17.1: Projected gradient algorithm: illustration for Example 17.2 with γ = 0.1

17.2 Preconditioned projected gradient


We now apply the projected gradient method, but only after first carrying out a
change of variables (Definition 2.32).
Take the original problem
min f(x)
x∈X
and a positive definite matrix H with its Cholesky factorization H = LLT (Definition
B.18). We define

x ′ = LT x ⇐⇒ x = L−T x ′ .

With the new variables, the problem is written as

min x ′ ∈X ′ g(x ′ ) = f(L−T x ′ )

with X ′ = { x ′ | L−T x ′ ∈ X }. By using Equation (17.3), step 15 of Algorithm 17.1 is
written as

yk′ = argminx ′ ∈X ′ (1/2) ‖ x ′ − xk′ ‖2 + γ∇g(xk′ )T ( x ′ − xk′ ) .

In order to write this expression in the original variables, we note that

∇g(xk′ ) = L−1 ∇f(L−T xk′ ) = L−1 ∇f(xk ) ,

which gives

yk = argminx∈X (1/2) ( LT x − LT xk )T ( LT x − LT xk ) + γ∇f(xk )T L−T ( LT x − LT xk )

or

yk = argminx∈X (1/2) (x − xk )T H(x − xk ) + γ∇f(xk )T (x − xk ) .          (17.5)

Again, the calculation of yk can be difficult. In the case where X is defined solely
by linear equations, we have a quadratic problem for which the analytical solution is
given by Theorem 6.38.
In the particular case where H = ∇2 f(xk ) + τI, with τ chosen so that H is
positive definite, we obtain Newton’s method for constrained optimization
(Algorithm 17.2). By applying it to Example 17.2, we find convergence in 2 iterations
(Table 17.3).

Table 17.3: Newton’s method for constrained optimization applied to Example 17.2
(γ = 1)
k    yk    xk    ‖dk ‖
0 5.0 1.0 3.5 2.5
1 0.0 -20.0 0.9 -0.1 3.67695526e+00
2 0.0 0.8 0.9 -0.1 3.46944695e-16
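For a quadratic objective and linear equality constraints, the step of (17.5) with H equal to the Hessian can be computed from the KKT system of that subproblem; one step from a feasible point then lands on the minimizer, which is consistent with the fast convergence shown in Table 17.3. A sketch assuming NumPy (the function name is ours; H positive definite, A full row rank):

```python
import numpy as np

def newton_projected_step(x, grad, H, A, b, gamma=1.0):
    """Solve min 0.5*(y-x)^T H (y-x) + gamma*grad^T (y-x)  s.t.  Ay = b
    through its KKT system."""
    n, m = H.shape[0], A.shape[0]
    K = np.block([[H, -A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([H @ x - gamma * grad, b])
    sol = np.linalg.solve(K, rhs)
    return sol[:n]                 # y_k; sol[n:] holds the multipliers

# Example 17.2: Hessian diag(1, 9), from the feasible point (3.5, 2.5)
H = np.diag([1.0, 9.0])
A = np.array([[-1.0, 1.0]])
b = np.array([-1.0])
x = np.array([3.5, 2.5])
y = newton_projected_step(x, np.array([x[0], 9.0 * x[1]]), H, A, b)
print(y)   # [ 0.9 -0.1], the optimal solution x*
```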

Algorithm 17.2: Preconditioned projected gradient method


1 Objective
2 To find (an approximation of) a local minimum of the problem

min f(x) , (17.6)



x∈X⊆Rn

where X is convex, closed and non empty.


3 Input
4 The differentiable function f : Rn → R.
5 The gradient of the function ∇f : Rn → Rn .
6 A family of preconditioners (Hk )k such that Hk is positive definite for any
k.
7 An initial solution x0 ∈ Rn .
8 A parameter γ > 0 (for instance γ = 1).
9 The required precision ε ∈ R, ε > 0.
10 Output
11 An approximation of the optimal solution x∗ ∈ R.
12 Initialization
13 k := 0.
14 Repeat
15 Calculate yk by solving

1
yk = argminx∈X (x − xk )T Hk (x − xk ) + γ∇f(xk )T (x − xk ) .
2

16 dk := yk − xk .
17 Determine αk by applying a line search (Algorithm 11.5) with α0 = 1.
18 xk+1 := xk + αk dk .
19 k := k + 1.
20 Until ‖dk ‖ ≤ ε
21 x∗ := xk .

17.3 Dikin’s method


Consider the linear optimization problem

min cT x
x

subject to
Ax = b
x≥0

and let us assume that the problem is bounded, and that there exists a feasible vector
x such that x > 0. It is possible to apply the ideas of the preconditioned projected
gradient method to this problem. It is important to note that the constraint x ≥ 0
complicates the problem giving it a combinatorial dimension. Contrary to the simplex
method that tries to identify which variables are zero at the optimal solution, we here

work solely with strictly feasible iterates, i.e., such that x > 0.
Take xk feasible and positive, and a positive definite matrix H. We apply an itera-
tion of the preconditioned projected gradient method. We use (17.5) with ∇f(xk ) = c,
and obtain

yk = argminx|Ax=b (1/2) (x − xk )T H(x − xk ) + γcT (x − xk ) ,

for which the optimal solution is given by Theorem 6.38:

yk = xk − γH−1 (c − AT λ)

with
λ = ( AH−1 AT )−1 AH−1 c .

The step length γ is chosen sufficiently small so that yk > 0 and yk is then strictly fea-
sible. An iteration of the preconditioned projected gradient proceeds in the direction
yk − xk with a step of length ᾱ, that is,

xk+1 = xk + ᾱk (yk − xk ) = xk − ᾱk γH−1 (c − AT λ)


= xk − αk H−1 (c − AT λ) ,

where αk = ᾱk γ.
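The closed-form step above is easy to check numerically. A minimal numpy sketch (the helper name `projected_step` is ours, not from the book) computes λ and yk and lets us verify that yk remains feasible for the equality constraints:

```python
import numpy as np

def projected_step(A, b, c, xk, Hinv, gamma=1.0):
    """One preconditioned projected gradient step for min c^T x s.t. Ax = b.

    Implements lambda = (A Hinv A^T)^{-1} A Hinv c and
    yk = xk - gamma * Hinv (c - A^T lambda)."""
    lam = np.linalg.solve(A @ Hinv @ A.T, A @ Hinv @ c)
    yk = xk - gamma * Hinv @ (c - A.T @ lam)
    return yk, lam

# Small check with A = [1 1 1], b = 1, H = I, at xk = (1/3, 1/3, 1/3):
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0, 3.0])
xk = np.array([1 / 3, 1 / 3, 1 / 3])
yk, lam = projected_step(A, b, c, xk, np.eye(3))
```

Here λ = 2 and yk = (4/3, 1/3, −2/3)T: feasible for Ax = b but not for x ≥ 0, which illustrates why γ must be chosen small enough in Dikin's method.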
In order to guarantee that xk+1 > 0, we choose αk = βαmax , with 0 < β < 1 and

αmax = max { α | xk − αH−1 (c − AT λ) ≥ 0 } .

In practice, β is often selected between 0.9 and 0.999, but other values are also
possible. Below, we illustrate the algorithm with β = 0.9 and β = 0.5.
For the method to work, we need to select the matrix H. Since the objective
function is linear, the choice H = ∇2 f(xk ) is not appropriate, as ∇2 f(xk ) = 0. The
method proposed by Dikin (1967) consists in choosing
 2 
xk 1
0
 
 .. 
H−1 = diag(xk )2 =  . .
 
2
0 xk n

We obtain Algorithm 17.3.


Newton’s method for constrained optimization 409

Algorithm 17.3: Dikin’s method


1 Objective
2 To find the global minimum of a linear optimization problem in standard
form (6.159)–(6.160).
3 Input
4 The matrix A ∈ Rm×n .
5 The vector b ∈ Rm .
6 The vector c ∈ Rn .
7 An initial solution x0 such that Ax0 = b and x0 > 0.
8 A parameter β such that 0 < β < 1 (by default, β = 0.9).
9 The required precision ε ∈ R .
10 Output
11 A Boolean indicator U identifying an unbounded problem.
12 If U is false, an approximation of the optimal solution x∗ .
13 Initialization
14 k := 0.
15 Repeat
16 H−1 := diag(xk )2 .
17 λ := (AH−1 AT )−1 AH−1 c .
18 d := −H−1 (c − AT λ).
19 For each i = 1, . . . , n, calculate

αi := −(xk )i /di if di < 0 , and αi := +∞ otherwise .

20 αmax := mini αi .
21 if αmax = ∞ then
22 the problem is unbounded. U = TRUE. STOP.
23 xk+1 := xk + βαmax d.
24 k := k + 1.
25 Until kdk ≤ ε
26 x∗ := xk .
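Algorithm 17.3 translates almost line by line into code. The sketch below is a minimal numpy implementation (the function name, iteration cap, and test instance are our own choices, not from the book):

```python
import numpy as np

def dikin(A, b, c, x0, beta=0.9, eps=1e-8, maxiter=500):
    """Dikin's method for min c^T x s.t. Ax = b, x >= 0 (Algorithm 17.3).

    x0 must be strictly feasible: Ax0 = b and x0 > 0."""
    x = np.array(x0, dtype=float)
    for _ in range(maxiter):
        Hinv = np.diag(x ** 2)                 # H^{-1} = diag(x_k)^2
        lam = np.linalg.solve(A @ Hinv @ A.T, A @ Hinv @ c)
        d = -Hinv @ (c - A.T @ lam)
        if np.linalg.norm(d) <= eps:           # stopping criterion ||d|| <= eps
            break
        neg = d < 0
        if not neg.any():                      # alpha_max = +infinity
            raise ValueError("the problem is unbounded")
        alpha_max = np.min(-x[neg] / d[neg])
        x = x + beta * alpha_max * d           # stays strictly feasible (beta < 1)
    return x

# Small instance: min x1 + 2 x2 + 3 x3 on the simplex x1 + x2 + x3 = 1, x >= 0.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0, 3.0])
x_star = dikin(A, b, c, [1 / 3, 1 / 3, 1 / 3])
```

On this instance the iterates converge to the vertex (1, 0, 0)T while remaining strictly positive throughout.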

Example 17.3 (Dikin’s method). Consider the problem

min x1 + 2x2 + 3x3

subject to
x1 + x2 + x3 = 1
x ≥ 0.

The optimal solution is x∗ = (1, 0, 0)T . The iterations of Dikin's method starting
from x0 = (1/3, 1/3, 1/3)T are shown in Figure 17.2 for β = 0.9 and Figure 17.3
for β = 0.5.

Figure 17.2: Dikin's method for Example 17.3 (β = 0.9)

Figure 17.3: Dikin's method for Example 17.3 (β = 0.5)

Finally, Table 17.4 represents the iterations of Dikin’s method for Example 16.7.
Dikin’s method is a precursor for interior point methods. In fact, all the iterates
of this method are interior points (Definition 1.15) in the subspace defined by Ax = b.
Then, the directions obtained are automatically feasible for the constraints x ≥ 0. In
the following chapter, we study these methods in more detail.

Table 17.4: Iterations of Dikin’s method for Example 16.7


x1 x2 x3 x4 x5 x6 kdk
3.000000e+00 3.000000e+00 3.000000e+00 5.000000e+00 5.000000e+00 5.000000e+00
3.103540e+00 4.099115e+00 4.099115e+00 5.000000e-01 1.495575e+00 1.495575e+00 9.137113e+01
3.956006e+00 3.979477e+00 3.979477e+00 1.260874e-01 1.495575e-01 1.495575e-01 5.325931e+00
3.944714e+00 4.010669e+00 4.010669e+00 1.260874e-02 7.856374e-02 7.856374e-02 7.977189e-02
3.998821e+00 3.998167e+00 3.998167e+00 8.510000e-03 7.856374e-03 7.856374e-03 1.607856e-02
3.996546e+00 4.000651e+00 4.000651e+00 8.510000e-04 4.955183e-03 4.955183e-03 3.282717e-04
3.999937e+00 3.999877e+00 3.999877e+00 5.550455e-04 4.955183e-04 4.955183e-04 6.386205e-05
3.999778e+00 4.000042e+00 4.000042e+00 5.550455e-05 3.185682e-04 3.185682e-04 1.388727e-06
3.999996e+00 3.999992e+00 3.999992e+00 3.592153e-05 3.185682e-05 3.185682e-05 2.638521e-07
Newton’s method for constrained optimization
411
412 Project

17.4 Project
The general organization of the projects is described in Appendix D.

Objective

The aim of this project is, first, to implement and analyze the preconditioned gradient
method and, second, to implement Dikin’s method and compare it with the simplex
method.

Approach

Projected gradient. The algorithm of the preconditioned projected gradient (Algorithm 17.2) is applied to the non linear problem described below. The idea is to
analyze the behavior of the algorithm for the following families of preconditioners:
1. Hk = I for any k.
2. Hk = (∇2 f(xk ) + τI)−1 , where τ is selected so that Hk is positive definite (use the
modified Cholesky factorization, Algorithm 11.7).
    
3. Hk = diag( min(1, 1/(xk )1 ), . . . , min(1, 1/(xk )n ) ) is a diagonal matrix.

It is interesting to modify the value of the step γ and test, for instance, γ = 0.1, 1, 10.

Dikin. Dikin’s method (Algorithm 17.3) is implemented with several values of β,


for instance, β = 0.1, 0.5, 0.9, 0.99 and 0.999. Compare the performance of Dikin’s
method and of the simplex method for different values of n in the linear problems
described below. In particular, try a value of n that is as large as possible.

Algorithms

Algorithms 17.2, 17.3, and 11.7.

Problems

Exercise 17.1. Non linear (Jansson and Knüppel, 1992):

min   x1² − 12x1 + 10 cos(πx1 /2) + 8 sin(5πx1 ) − exp(−(x2 − 5)² /2)/√5
x∈R²

subject to
−30 ≤ x1 ≤ 30
−10 ≤ x2 ≤ 10 .

Exercise 17.2. Linear (i):

min − ∑_{i=1}^{n} 2^{n−i} xi

subject to
x1 ≤5
4x1 + x2 ≤ 25
8x1 + 4x2 + x3 ≤ 125
. . .
2^n x1 + 2^{n−1} x2 + · · · + 4xn−1 + xn ≤ 5^n
x1 , x2 , . . . , xn ≥ 0 .
Use the starting point (1, 1, . . . , 1)T . The optimal solution to this problem is
(0, 0, . . . , 0, 5^n )T .
Exercise 17.3. Linear (ii):
min −xn
subject to
ε ≤ x1 ≤ 1
εxi−1 ≤ xi ≤ 1 − εxi−1 , i = 2, . . . , n .
Use the starting point

x1 = (1 + ε)/2
xi = 1/2 , i = 2, . . . , n ,

with 0 < ε < 1/2.

Chapter 18

Interior point methods

Contents
18.1 Barrier methods . . . . . . . . . . . . . . . . . . . . . . . . 415
18.2 Linear optimization . . . . . . . . . . . . . . . . . . . . . . 422
18.3 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443

18.1 Barrier methods


The simplex method moves from one vertex of the constraint polyhedron to the next.
It can, in extreme cases, require a large number of iterations. In the context of
the theory of algorithm complexity, we speak of an exponentially large number of
iterations, because the number of iterations in the worst case increases exponentially
with the size of the problem to be solved. Interior point methods, as their name
indicates, methodically avoid the border of the feasible set and thus do not suffer
from the combinatorial aspect inherent to the simplex method. Khachiyan (1979)
was the first to propose an algorithm which he showed to be polynomial, i.e., for
which the number of iterations increases polynomially with the size of the problem.
Unfortunately, this algorithm proved ineffective in practice and it took the work of
Karmarkar (1984) to create enthusiasm around interior point methods.
Currently, the importance of these methods exceeds the framework of linear op-
timization and they are widely used in the context of convex optimization thanks to
the work by Nesterov and Nemirovsky (1994). The reader is referred to Boyd and
Vandenberghe (2004) for more details.
Here, we motivate the development of interior point methods through barrier
methods, as they provide an intuitive interpretation.
Consider the problem

min f(x) subject to x ∈ X , g(x) ≤ 0 , (18.1)



where f : Rn → R, g : Rn → Rm and X is a closed set. The set of feasible points is

F = { x ∈ Rn | x ∈ X , g(x) ≤ 0 } . (18.2)

In this context, we call the set of interior points the set S defined by

S = { x ∈ Rn | x ∈ X , g(x) < 0 } , (18.3)

and we assume that it is not empty. Note that the points in S are not interior points
of the set F (in the sense of Definition 1.15). They are interior within the set X, with
respect to the constraint g(x) ≤ 0.
We assume that
• S ≠ ∅,
• any feasible point can be arbitrarily well approximated by an interior point, i.e.,
for any x ∈ F and any ε > 0, there exists x̃ ∈ S such that ‖x̃ − x‖ ≤ ε.
If X is a convex set and g is a convex function, this hypothesis is always satisfied.

Lemma 18.1. Let X ⊂ Rn be a closed convex set. Let g : Rn → Rm be a convex
function. Let F be defined by (18.2) and S defined by (18.3). For all x ∈ F and
for all ε > 0, there exists x̃ ∈ S such that

‖x̃ − x‖ ≤ ε .

Proof. If x ∈ S, the property is trivially satisfied with x̃ = x. Take x ∈ F \ S, i.e.,
so that x ∈ X and so that there exist indices i1 , . . . , ik , k ≤ n, such that g(x)ij = 0
for j = 1, . . . , k. Without loss of generality, we assume that k = n. Indeed, the other
indices do not pose a problem, and we can always choose x̃i = xi for them. Then,
g(x) = 0. Take y ∈ S. By convexity of X, λx + (1 − λ)y ∈ X for any 0 ≤ λ ≤ 1.
Moreover, by convexity of g, we have

g(λx + (1 − λ)y) ≤ λg(x) + (1 − λ)g(y) = (1 − λ)g(y) .

Therefore, since g(y) < 0, we have g(λx + (1 − λ)y) < 0 for any λ < 1 and x̃ =
λx + (1 − λ)y ∈ S. To obtain ε ≥ ‖x̃ − x‖, we need

ε² ≥ ‖(λ − 1)x + (1 − λ)y‖² = (1 − λ)² ‖x − y‖² .

Since 1 − λ > 0, this is equivalent to

1 − λ ≤ ε / ‖x − y‖

or

λ ≥ 1 − ε / ‖x − y‖ .

Since x ≠ y, for all ε > 0, it is possible to find a λ < 1 such that x̃ = λx + (1 − λ)y ∈ S.

The interior point methods employ functions known as barrier functions in order
to force the algorithms to remain in S. A barrier function is defined on S and tends
to infinity as x approaches the border of the set.

Definition 18.2 (Barrier function). Let X ⊂ Rn be a closed set. Let g : Rn → Rm
be a convex function. Let S be defined by (18.3). A function B : S → R is a barrier
function if it is continuous and if

lim x∈S, g(x)→0 B(x) = +∞ . (18.4)

Example 18.3 (Barrier function). The most used barrier functions are the logarithmic function

B(x) = − ∑_{j=1}^{m} ln(−gj (x)) (18.5)

and the inverse function

B(x) = − ∑_{j=1}^{m} 1/gj (x) . (18.6)

Consider the constraints defined by 1 ≤ x ≤ 3. They are defined by g : R → R2 ,
with g1 (x) = 1 − x and g2 (x) = x − 3. The logarithmic barrier function for these
constraints is written as

− ln(x − 1) − ln(3 − x)

and the inverse barrier function is

1/(x − 1) + 1/(3 − x) .
By multiplying this function by a parameter ε, we can control the height of the
barrier, as illustrated in Figure 18.1 for the logarithmic barrier and in Figure 18.2 for
the inverse barrier.

A barrier method consists in combining the objective function of the problem with
the barrier function and progressively decreasing the height of the latter. We define
a set of parameters (εk )k such that
• 0 < εk+1 < εk , k = 0, 1, . . .,
• limk εk = 0.
At each iteration, the following problem is solved:

xk ∈ argminx∈S f(x) + εk B(x) . (18.7)

At first glance, this technique seems ineffective. In fact, we need to solve a non linear
constrained problem at each iteration. However, the structure of this problem, and

Figure 18.1: Logarithmic barrier (εB(x) for ε = 1, 10, 100)

Figure 18.2: Inverse barrier (εB(x) for ε = 1, 10, 100)

especially the presence of the barrier function, enable us to employ effective methods.
For instance, if X = Rn , the problem to solve is written as

xk ∈ argming(x)<0 f(x) + εk B(x) . (18.8)

It is possible to solve this problem by using the unconstrained optimization meth-


ods described in Part IV. Indeed, since these methods are descent methods, the
presence of a sufficiently high barrier and an appropriate selection of the step along
the current descent direction prevent the generation of iterates outside of S and the
constraints can be ignored. Similarly, if X is a convex set, the methods described in
Chapter 17 can be used.
An essential element for the proper functioning of this type of method is the speed
at which the sequence (εk )k tends to 0, i.e., the speed at which we decrease the height of
the barrier. If it is reduced too fast, an unconstrained algorithm (or an algorithm for
convex constraints) may generate a point outside of S. Good interior point methods
are based on a reduction of εk that is neither too large, to avoid generating infeasible
iterates, nor too small, to avoid a slow convergence of the method.
Example 18.4 (Barrier method). Consider the problem

min f(x) = (1/2)(x1² + x2²)

with the constraint
x1 ≥ 2 ,

for which the optimal solution is x∗ = (2, 0)T . By taking the logarithmic barrier,
(18.7) is written as

xk ∈ argminx1 >2 (1/2)(x1² + x2²) − εk ln(x1 − 2) . (18.9)

We zero the gradient of the objective function:

( x1 − εk /(x1 − 2) , x2 )T = 0 .

The first component is zero if x1² − 2x1 − εk = 0, i.e., if

x1 = 1 + √(1 + εk )   or   x1 = 1 − √(1 + εk ) .

Only the first value is feasible. The second component is zero if x2 = 0. Therefore,
the minimum is unique and we obtain

xk = ( 1 + √(1 + εk ) , 0 )T .

We clearly see that

lim k→∞ xk = (2, 0)T

because

lim k→∞ εk = 0 .

The function to minimize in (18.9) is shown in Figure 18.3 for ε = 0.3. We can follow
the evolution of the level curves of this function for different values of ε, as well as
the value of the first component of xk :

ε       (xk )1         Figure
0.300   2.140175425    18.4(a)
0.150   2.072380529    18.4(b)
0.095   2.046422477    18.4(c)
0.030   2.014889157    18.4(d)
0.003   2.001498877    18.4(e)
0.000   2.000000000    18.4(f)
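The closed form (xk )1 = 1 + √(1 + εk ) can be checked directly against the table above; a short sketch (helper name is ours):

```python
import math

def barrier_minimizer_x1(eps):
    """First component of argmin (1/2)(x1^2 + x2^2) - eps*ln(x1 - 2), x1 > 2."""
    return 1.0 + math.sqrt(1.0 + eps)

# Reproduces the (xk)1 column of the table above:
values = {eps: barrier_minimizer_x1(eps)
          for eps in (0.300, 0.150, 0.095, 0.030, 0.003, 0.0)}
```

For ε = 0 the formula gives exactly 2, the first component of the optimal solution x∗.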

Figure 18.3: ε = 0.3

Theorem 18.5 (Convergence of the barrier method). Let f : Rn → R be differentiable, X ⊆ Rn closed and g : Rn → Rm differentiable. Consider the sets F and
S defined by (18.2) and (18.3), respectively. We assume that S ≠ ∅ and that
any feasible point can be approximated arbitrarily closely by an interior point,
i.e., for any x ∈ F and for any ε > 0, there exists x̃ ∈ S such that

‖x̃ − x‖ ≤ ε .

Consider the sequence (xk )k such that

xk ∈ argminx∈S f(x) + εk B(x) ,

where B : S → R is a barrier function, according to Definition 18.2, and (εk )k
is such that εk > εk+1 , ∀k, with limk→∞ εk = 0. Then, any limit point of the
sequence (xk )k is a global minimum of the optimization problem

min f(x)

subject to
x ∈ F .

Figure 18.4: Level curves of the function (18.9): (a) ε = 0.3, (b) ε = 0.15, (c) ε = 0.095, (d) ε = 0.03, (e) ε = 0.003, (f) ε = 0




Proof. Let x̄ be the limit of a convergent subsequence (xk )k∈K and thus a limit point
of the sequence (xk )k . If x̄ ∈ S, then B(x̄) < ∞ and

lim k∈K εk B(xk ) = 0 .

If x̄ ∉ S, then limk∈K B(xk ) = +∞. In both cases, we have

lim inf k∈K εk B(xk ) ≥ 0 . (18.10)

Then,

lim inf k∈K ( f(xk ) + εk B(xk ) ) = f(x̄) + lim inf k∈K εk B(xk ) ≥ f(x̄) .

Since X is closed and x̄ is the limit of points in S, x̄ is feasible. We assume by
contradiction that it is not a global minimum. Therefore, there exists x∗ ∈ F such
that f(x∗ ) < f(x̄). By assumption, x∗ may be approached arbitrarily closely by an
interior point. Therefore, there exists x̃ ∈ S such that f(x̃) < f(x̄).
By definition of xk , we have

f(xk ) + εk B(xk ) ≤ f(x̃) + εk B(x̃) .

When k → ∞, k ∈ K, we obtain

f(x̄) + lim inf k∈K εk B(xk ) ≤ f(x̃) .

According to (18.10), we have f(x̄) ≤ f(x̃), which leads to a contradiction and proves
the result.
In practice, this type of method is rarely used. We now develop these ideas in the
specific case of linear optimization.

18.2 Linear optimization


The barrier methods, or interior point methods, have proven particularly effective in
the context of convex optimization in general and in the context of linear optimization
in particular.
Consider the linear problem
min cT x
subject to
Ax = b
x ≥ 0.
This is a version of (18.1) with f(x) = cT x, X = {x | Ax = b} and g(x) = −x. The set
of interior points is defined by S = {x | Ax = b , x > 0}, which is assumed to be non
empty. By using the logarithmic barrier function, we define
xε = argminx∈S cT x − ε ∑_{i=1}^{n} ln xi . (18.11)

When ε tends toward infinity, the objective function cT x plays no role and only
the barrier on the constraints is minimized. We then obtain a point x∞ called the
analytical center of the constraint polyhedron.

Definition 18.6 (Analytical center). Consider a polyhedron represented in standard
form with a non empty interior P = {x | Ax = b , x ≥ 0}. The analytical center x∞
of P is defined by

x∞ = argminAx=b, x>0 − ∑_{i=1}^{n} ln xi . (18.12)

Example 18.7 (Analytical center). Consider the linear problem:

min x1 + 2x2 + 3x3

subject to
x1 + x2 + x3 = 1
xi ≥ 0 , i = 1, 2, 3 .

The analytical center of the constraint polyhedron is given by

x∞ = argminx1 +x2 +x3 =1, x>0 − ln x1 − ln x2 − ln x3 .

By replacing x3 with 1 − x1 − x2 , we obtain

x∞ = argminx>0 − ln x1 − ln x2 − ln(1 − x1 − x2 ) .

The gradient of the function is

( −1/x1 + 1/(1 − x1 − x2 ) , −1/x2 + 1/(1 − x1 − x2 ) )T ,

which is zero at x1 = 1/3 and x2 = 1/3. Since x3 = 1−x1 −x2 , we also have x3 = 1/3.
Since all the components are positive, we have an analytical center. It is shown in
Figure 18.5.
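The stationarity computation of Example 18.7 can be verified numerically by evaluating the gradient of the reduced objective at (1/3, 1/3); a minimal sketch using central finite differences (helper names are ours):

```python
import math

def reduced_objective(x1, x2):
    """-ln x1 - ln x2 - ln(1 - x1 - x2): the barrier after eliminating x3."""
    return -math.log(x1) - math.log(x2) - math.log(1.0 - x1 - x2)

def numerical_gradient(f, x1, x2, h=1e-6):
    """Central finite differences; accurate enough to check stationarity."""
    g1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2.0 * h)
    g2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2.0 * h)
    return g1, g2

g1, g2 = numerical_gradient(reduced_objective, 1.0 / 3.0, 1.0 / 3.0)
```

Both components of the gradient vanish at (1/3, 1/3), and the objective value there is lower than at nearby interior points, consistent with x∞ = (1/3, 1/3, 1/3)T.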

Then, from Theorem 18.5, when ε decreases towards 0, the point xε approaches
the optimal solution to the linear problem. The trajectory followed by point xε is
called a central path. We note that this concept is used in the following in a broader
context, involving dual problems. Therefore, we call it here the primal central path.
For Example 18.7, the primal central path is shown in Figure 18.6.

Figure 18.5: Analytical center of Example 18.7

Figure 18.6: Central path for Example 18.7



Definition 18.8 (Primal central path). Consider the linear problem

min cT x

subject to
Ax = b
x ≥ 0.

The primal central path is the curve described by

xε = argminx∈S cT x − ε ∑_{i=1}^{n} ln xi (18.13)

and parameterized by ε ≥ 0.

It is important to note that the identification of the central path is a difficult


problem. For each value of ε, we must indeed solve a non linear optimization problem.
Therefore, the interior point algorithms use the central path as an indicator of the
direction to progress toward the optimal solution, but without trying to follow it
exactly. It means that, for a given ε, the corresponding non linear optimization
problem is solved approximately.
We now compare the optimality conditions of the linear problem with those of the
barrier problem. By taking the results of Section 6.5, we get the optimality conditions
of the linear problem:

Ax − b = 0 primal constraint
x≥0 primal constraint
T
A λ+µ−c=0 dual constraint (Eq. (6.162))
µ≥0 dual constraint (Eq. (6.162))
xi µi = 0 complementarity constraint (Eq. (6.163)) .

If we denote e = (1, 1, . . . , 1)T and

X = diag(x1 , . . . , xn ) ,   S = diag(µ1 , . . . , µn ) , (18.14)

we obtain
Ax − b = 0
T
A λ+µ−c=0
XSe = 0 (18.15)
x≥0

µ ≥ 0.
We now write the optimality conditions for the problem
n
X
min cT x − ε ln xi
i=1

subject to
Ax = b
x ≥ 0.
The Lagrangian is

L(x, λ, µ) = cT x − ε ∑_{i=1}^{n} ln xi + λT (Ax − b) − µT x .

The first-order optimality condition is written as

∇L(x, λ, µ) = c − εX−1 e + AT λ − µ = 0 , (18.16)
and the complementarity condition is
XSe = 0 . (18.17)
Incidentally, the optimal solution to the barrier problem is always such that x > 0.
Therefore, the multipliers µi are always zero at the optimum. We keep them in the
development in order to draw a parallel with the conditions (18.15). We now define
µ̄ = µ + εX−1 e and S̄ = S + εX−1 . (18.18)
The condition (18.16) is written as
c + AT λ − µ̄ = 0 .
Since
XS̄e = XSe + εe ,
the condition (18.17) is written as
XS̄e = εe .
We thus obtain the following conditions for the barrier problem:
Ax − b = 0
T
A λ + µ̄ − c = 0
XS̄e = εe (18.19)
x≥0
µ̄ ≥ 0 .

We note the similarities between the optimality conditions for the original problem
(18.15) and those of the barrier problem (18.19). The conditions are the same, except
for the third one, where the right hand side is εe instead of 0. In the following, we
abandon the notation µ̄ and S̄ and use µ and S.
We can thus characterize the optimal solution to the barrier problem which uses

dual variables. We consider the primal and dual variables together and work in the
space Rn+m+n with the variables (x, λ, µ). In this space, the feasible set is

F = { (x, λ, µ) | Ax = b , AT λ + µ = c , x ≥ 0 , µ ≥ 0 } (18.20)

and the set of interior points is

S = { (x, λ, µ) | Ax = b , AT λ + µ = c , x > 0 , µ > 0 } , (18.21)
again assumed to be non empty. The central path concept is also extended.

Definition 18.9 (Primal-dual central path). Consider the linear problem

min cT x

subject to
Ax = b
x ≥ 0.
The primal-dual central path is the curve described by (xε , λε , µε ) with ε ≥ 0, where
(xε , λε , µε ) solves (18.19), that is such that

Axε − b = 0
T
A λε + µε − c = 0
Xε Sε e = εe
xε ≥ 0
µε ≥ 0 .

Some elements of the primal-dual central path for Example 18.7 are listed in Table
18.1.
The system (18.19) includes two sets of linear equations, a set of slightly non
linear equations (XSe = εe), and two sets of inequalities. Based on the same idea as
Dikin’s method (Algorithm 17.3), we proceed in the following manner:
1. Consider only the iterates in S.
2. Ignore the inequalities and apply (partially) Newton’s method (Algorithm 7.3) to
the following system of equations, in order to identify a direction for the algorithm
to follow:

F(x, λ, µ) = ( Ax − b , AT λ + µ − c , XSe − εe )T = 0 . (18.22)

Table 18.1: Elements of the primal-dual central path for Example 18.7
ε x1 x2 x3 µ1 µ2 µ3 λ
10000 3.3335e-01 3.3333e-01 3.3332e-01 2.9999e+04 3.0000e+04 3.0001e+04 -2.9998e+04
1000 3.3344e-01 3.3333e-01 3.3322e-01 2.9990e+03 3.0000e+03 3.0010e+03 -2.9980e+03
100 3.3445e-01 3.3333e-01 3.3222e-01 2.9900e+02 3.0000e+02 3.0100e+02 -2.9800e+02
10 3.4457e-01 3.3309e-01 3.2235e-01 2.9022e+01 3.0022e+01 3.1022e+01 -2.8022e+01
1 4.5157e-01 3.1118e-01 2.3724e-01 2.2146e+00 3.2149e+00 4.2147e+00 -1.2146e+00
0.1 8.6295e-01 8.9598e-02 4.7454e-02 1.1585e-01 1.1156e+00 2.1158e+00 8.8426e-01
0.01 9.8484e-01 1.0176e-02 4.9833e-03 1.0151e-02 1.0098e+00 2.0101e+00 9.9008e-01


0.001 9.9840e-01 8.4649e-04 7.5807e-04 1.0020e-03 1.0010e+00 2.0014e+00 9.9931e-01
0.0001 9.9902e-01 9.0048e-04 7.9656e-05 1.0602e-04 1.0007e+00 2.0008e+00 9.9940e-01
0 1.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 1.0000e+00 2.0000e+00 1.0000e+00

3. Calculate the step along the direction such that no iterate leaves S.
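For Example 18.7 the conditions of Definition 18.9 collapse to a single scalar equation: µi = ci − λ and xi µi = ε, so λ solves ∑i ε/(ci − λ) = 1. Solving it by bisection reproduces the rows of Table 18.1; a sketch (our own code, specific to the constraint eT x = 1):

```python
import numpy as np

def central_path_point(c, eps, lo=-1e8):
    """Primal-dual central path point for min c^T x, e^T x = 1, x >= 0.

    From (18.19): mu = c - lambda*e and x_i mu_i = eps, so we solve
    sum_i eps / (c_i - lambda) = 1 for lambda by bisection."""
    hi = min(c) - 1e-12               # keep mu = c - lambda*e > 0
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        # sum_i eps/(c_i - lam) is increasing in lam on (-inf, min c)
        if sum(eps / (ci - lam) for ci in c) > 1.0:
            hi = lam
        else:
            lo = lam
    mu = np.array(c) - lam
    return eps / mu, lam, mu

# The row eps = 1 of Table 18.1:
x, lam, mu = central_path_point([1.0, 2.0, 3.0], eps=1.0)
```

By construction every product xi µi equals ε exactly, and the computed (x, λ) agrees with the tabulated row to the printed precision.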
The Jacobian matrix of the system (18.22) is

J(x, λ, µ) = ∇F(x, λ, µ)T =
    [ A   0    0 ]
    [ 0   AT   I ]
    [ S   0    X ] .

If (x, λ, µ)T ∈ S, then this point is feasible and

F(x, λ, µ) = ( 0 , 0 , XSe − εe )T . (18.23)

Therefore, the Newton equations (7.16) for an iterate (x, λ, µ) are written as

    [ A   0    0 ] [ dx ]   [ 0          ]
    [ 0   AT   I ] [ dλ ] = [ 0          ] (18.24)
    [ S   0    X ] [ dµ ]   [ −XSe + εe  ]

and an iteration consists in

    [ x+ ]   [ x ]       [ dx ]
    [ λ+ ] = [ λ ] + α   [ dλ ] , (18.25)
    [ µ+ ]   [ µ ]       [ dµ ]

where 0 < α ≤ 1 is chosen such that (x+ , λ+ , µ+ )T ∈ S.
For a given value of ε, i.e., for a given barrier height, the Newton iterations can
be applied until convergence. In this case, we identify the (primal-dual) central path
element corresponding to this value of ε.
We can thus present a generic primal-dual algorithm of interior points to solve a
linear optimization problem: Algorithm 18.1.
For the algorithm to be well defined, we need to specify two more things:
1. the calculation of the step αk (step 16) and
2. the handling of the barrier height εk (step 17).
These two things are actually closely related, as explained below. But first, we an-
alyze the stopping criterion and the distance to the central path. The necessary
and sufficient condition for (xk , λk , µk ) to be an optimal solution to the initial prob-
lem is that it solves the system (18.15). As all the iterates are in S, Axk − b = 0,
AT λk + µk − c = 0, xk > 0, µk > 0 for any k. Therefore, the algorithm identifies an
optimal solution when (xk )i (µk )i = 0, for i = 1, . . . , n. The stopping criterion is in
a sense based on the average distance to optimality
νk = (1/n) ∑_{i=1}^{n} (xk )i (µk )i = (1/n) xTk µk ,

as every term is positive (as all the iterates are in the interior set S), and zero at the
optimal solution. We call this quantity the duality measure.

Algorithm 18.1: Generic interior points algorithm


1 Objective
2 To find the global minimum of a linear optimization problem in standard
form (6.159)–(6.160), i.e.,
minx∈Rn cT x subject to Ax = b , x ≥ 0 .

3 Input
4 The matrix A ∈ Rm×n .
5 The vector b ∈ Rm .
6 The vector c ∈ Rn .
7 An initial feasible solution (x0 , λ0 , µ0 )T such that Ax0 = b, AT λ0 + µ0 = c,
x0 > 0 and µ0 > 0.
8 An initial value for the height of the barrier ε0 > 0.
9 The required precision ε̄ ∈ R.
10 Output
11 An approximation of the optimal solution x∗ .
12 Initialization
13 k := 0.
14 Repeat
15 Calculate (dx , dλ , dµ ) by solving

    [ A    0    0  ] [ dx ]   [ 0                ]
    [ 0    AT   I  ] [ dλ ] = [ 0                ] (18.26)
    [ Sk   0    Xk ] [ dµ ]   [ −Xk Sk e + εk e  ]

where Xk and Sk are defined by (18.14).


16 Calculate a step 0 < αk ≤ 1 such that

    [ xk+1 ]   [ xk ]        [ dx ]
    [ λk+1 ] = [ λk ] + αk   [ dλ ] (18.27)
    [ µk+1 ]   [ µk ]        [ dµ ]

is strictly feasible, i.e., in S.


17 Update the height of the barrier by defining εk+1 .
18 k := k + 1.
19 Until (1/n) xTk µk ≤ ε̄.
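Step 15 of Algorithm 18.1 amounts to a single linear solve. A numpy sketch of the direction computation (18.26), with our own helper name and the unknowns ordered as (dx, dλ, dµ):

```python
import numpy as np

def newton_direction(A, x, mu, eps):
    """Solve the primal-dual Newton system (18.26) for (dx, dlambda, dmu)."""
    m, n = A.shape
    X, S = np.diag(x), np.diag(mu)
    J = np.block([
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [np.zeros((n, n)), A.T,              np.eye(n)       ],
        [S,                np.zeros((n, m)), X               ],
    ])
    rhs = np.concatenate([np.zeros(m), np.zeros(n),
                          -x * mu + eps * np.ones(n)])
    d = np.linalg.solve(J, rhs)
    return d[:n], d[n:n + m], d[n + m:]

# Direction at x0 = (0.6, 0.2, 0.2), lambda = 0, mu = c, with sigma = 0 (eps = 0):
A = np.array([[1.0, 1.0, 1.0]])
dx, dlam, dmu = newton_direction(A, np.array([0.6, 0.2, 0.2]),
                                 np.array([1.0, 2.0, 3.0]), eps=0.0)
```

The solution satisfies Adx = 0 and AT dλ + dµ = 0, so the iterates stay feasible for the linear constraints; the directions tabulated in Example 18.10 appear to be normalized to unit length, and dividing the raw solution by its norm indeed reproduces the σ = 0 column of Table 18.2.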

In order to discuss the choice of the barrier parameter ε, we note that it is wise
to have high barriers when we are far from the optimal solution, while they should
be lower when we are close to it. It is appropriate to define the height of the barrier
as a function of the duality measure, used as a stopping criterion in the algorithm.
Then, we define

ε = σν , (18.28)

where ν = xT µ/n and σ, called the centering parameter, is such that 0 ≤ σ ≤ 1. In


order to understand the role of this parameter, let us analyze its two extreme values.
σ = 0 The barrier is absent in this case, as ε = 0. The direction obtained in
the algorithm aims to directly solve the optimality conditions (18.15) of the initial
problem. It is thus necessary to calculate the step in order to stay within S. This
method is a version of Dikin’s method (Algorithm 17.3) in the primal-dual space.
σ = 1 If ν is fixed, the generic algorithm converges toward the point of the central
path corresponding to ε = ν. Each iteration with σ = 1 enables the central path to
be approached.
The method faces a dilemma. The further the iterates are from the constraints,
the more flexible the method is, as it can make longer steps in any direction. At the
same time, the further they are from the optimal solution, which lies at a vertex on
the border. The role of the parameter σ is to handle this dilemma.

Example 18.10 (Centering parameter). Consider Example 18.7 again and calculate
the direction (dx , dλ , dµ ) in several points, with two values for σ. For each point, the
values of the dual variables are λ = 0 and µ = c.
The results are presented in Tables 18.2, 18.3, and 18.4, and illustrated in Figures
18.9, 18.7, and 18.8. The case presented in Figure 18.8 illustrates particularly well the
difference between the two extreme values of the σ parameter. While the direction
generated with σ = 0 points more or less in the direction of the optimal solution of
the problem, the direction generated with σ = 1 points toward the central path.

Table 18.2: Primal-dual directions at x0 = (0.6, 0.2, 0.2)T (ν = 0.53333)

                  σ=1                                    σ=0
   dx          dµ          dλ            dx          dµ          dλ
  -0.049275   -0.028986                  0.069739   -0.498138
   0.069565   -0.028986    0.028986     -0.026567   -0.498138    0.498138
  -0.020290   -0.028986                 -0.043172   -0.498138

Table 18.3: Primal-dual directions at x0 = (0.2, 0.6, 0.2)T (ν = 0.66667)

                  σ=1                                      σ=0
   dx           dµ           dλ             dx          dµ          dλ
   0.4061814    0.4102842                   0.043272   -0.499296
  -0.4020786    0.4102842   -0.4102842     -0.019972   -0.499296    0.499296
  -0.0041028    0.4102842                  -0.023300   -0.499296

Table 18.4: Primal-dual directions at x0 = (0.2, 0.2, 0.6)T (ν = 0.8)

                  σ=1                              σ=0
   dx          dµ          dλ            dx      dµ      dλ
   0.208312    0.470382                  0.2     -2
   0.053758    0.470382   -0.470382      0.0     -2       2
  -0.262070    0.470382                 -0.2     -2

Figure 18.7: Newton directions for the point (0.2, 0.6, 0.2)T

Figure 18.8: Newton directions for the point (0.2, 0.2, 0.6)T

Figure 18.9: Newton directions for the point (0.6, 0.2, 0.2)T

In order to “follow” the central path, the algorithm needs a measure of the distance
from one iterate to the path. For ε > 0, the point on the central path corresponding
to ε is such that each product xi µi is equal to the barrier parameter, i.e., that x1 µ1 =
x2 µ2 = . . . = xn µn = ε (see Eq. (18.19)). Therefore, an indicator of the proximity to
the central path is the difference between each individual product and their average
value ν:

(1/ν) ‖ (x1 µ1 , . . . , xn µn )T − (ν, . . . , ν)T ‖ = (1/ν) ‖ XSe − νe ‖ , (18.29)

where ‖ · ‖ is a norm in Rn . Thanks to this measure, we can define the neighborhoods
around the central path. By using the norm 2, we obtain a restricted neighborhood.

Definition 18.11 (Restricted neighborhood of the central path). Consider a linear
problem

min cT x

subject to
Ax = b
x ≥ 0.

The θ-restricted neighborhood of the primal-dual central path is the set

V2 (θ) = { (x, λ, µ) ∈ S | (1/ν) ‖ XSe − νe ‖2 ≤ θ } , (18.30)

where S is the set of primal-dual interior points (18.21) and 0 ≤ θ < 1.

We can define a large neighborhood using the ∞-norm. In this case, we obtain

    V∞ (θ) = { (x, λ, µ) ∈ S | −θν ≤ xi µi − ν ≤ θν , i = 1, . . . , n }
           = { (x, λ, µ) ∈ S | (1 − θ)ν ≤ xi µi ≤ (1 + θ)ν , i = 1, . . . , n } .

It is possible to extend this interval even further, in order to give the algorithms more flexibility. Indeed, it does not matter if a product xi µi takes large values. However, we wish to avoid these products becoming too small relative to their average value ν, and xi and µi approaching 0 too rapidly. In that case, the iterate would be too close to the constraints, and the algorithm would no longer benefit from being in the interior. We define a large neighborhood by ignoring the upper bound and setting γ = 1 − θ. As θ lies between 0 and 1, so does γ. The designation V−∞ is used here to highlight the fact that, contrary to V∞ , only the lower bound is taken into account.

Definition 18.12 (Large neighborhood of the central path). Consider the linear problem

    min cT x

subject to

    Ax = b
    x ≥ 0.

The γ-large neighborhood of the primal-dual central path is the set

    V−∞ (γ) = { (x, λ, µ) ∈ S | xi µi ≥ γν , i = 1, . . . , n } ,   (18.31)

where S is the set of primal-dual interior points (18.21) and 0 < γ < 1.
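These definitions translate directly into a few lines of code. The following Python sketch (helper names and sample points are our own, chosen for illustration) computes the proximity measure (18.29) and tests membership in V2 (θ) and V−∞ (γ):

```python
import numpy as np

def proximity(x, mu):
    """Proximity measure (18.29): (1/nu) * ||XSe - nu e||_2."""
    nu = x @ mu / x.size            # nu = x^T mu / n
    return np.linalg.norm(x * mu - nu) / nu

def in_V2(x, mu, theta):
    """Restricted neighborhood V2(theta) of Definition 18.11."""
    return (x > 0).all() and (mu > 0).all() and proximity(x, mu) <= theta

def in_Vminf(x, mu, gamma):
    """Large neighborhood V-inf(gamma) of Definition 18.12."""
    nu = x @ mu / x.size
    return (x > 0).all() and (mu > 0).all() and (x * mu >= gamma * nu).all()

# A point whose products x_i mu_i are all equal lies on the central path:
x, mu = np.array([0.5, 0.25]), np.array([2.0, 4.0])    # products (1, 1)
print(proximity(x, mu))                                # 0.0

# An unbalanced point (products 0.9 and 0.1) is rejected by V2(0.4)
# but accepted by the much more permissive V-inf(1e-3):
y, s = np.array([0.9, 0.1]), np.array([1.0, 1.0])
print(in_V2(y, s, 0.4), in_Vminf(y, s, 1e-3))
```

The second point illustrates the difference between the two neighborhoods: the large one only forbids products that are very small relative to their average.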

Many variants of this algorithm have been proposed in the literature. We present here three algorithms that follow the primal-dual central path, using different strategies.

Restricted step algorithm This algorithm, presented as Algorithm 18.2, chooses


the centering parameter so that the step αk = 1 in Algorithm 18.1 generates only
iterates located in the restricted neighborhood of the central path. The values of
θ and σ guarantee that each Newton step generates an iterate in the neighborhood
V2 (θ). Compared to the generic algorithm (Algorithm 18.1), we have εk = νk σ,
and the step along the direction is always αk = 1.
Prediction-correction algorithm This algorithm (Algorithm 18.3) combines the
two extreme cases discussed above:

• the prediction step investigates where the optimal solution can be, by setting
the centering parameter σ to 0. Still, the step length is calculated so that the
next iterate lies in the restricted neighborhood of the central path;
• the correction step is an iteration of the restricted step algorithm (Algo-
rithm 18.2), that focuses on moving the iterations back toward the central
path using the value σ = 1 for the centering parameter.

Long step algorithm This algorithm fixes the centering parameter to an interme-
diary value (not 0, not 1) and selects the step αk in Algorithm 18.1 so that each
iterate is situated in the large neighborhood of the central path.

The theoretical foundation of these algorithms is technical and beyond the scope
of this book. We refer the interested reader to the excellent presentation proposed by
Wright (1997). We present an illustration of these algorithms on Example 18.7. On
such small examples, the interior point methods are definitely not efficient. But the
iterations can be represented on a figure, in order to have some insight on their general
behavior. Comparing the three versions, the flexibility of the long step algorithm
seems to pay off on this example. Although it cannot be formally generalized, it is
the version that should be preferred in general.

Algorithm 18.2: Interior point algorithm with restricted steps


1 Objective
2 To find a global minimum of a linear optimization problem in standard
form (6.159)–(6.160), i.e.,

    minx∈Rn cT x subject to Ax = b , x ≥ 0 .

3 Input
4 The matrix A ∈ Rm×n .
5 The vector b ∈ Rm .
6 The vector c ∈ Rn .
7 θ = 0.4.
8 An initial feasible solution (x0 , λ0 , µ0 ) ∈ V2 (θ).

9 σ = 1 − θ/√n.
10 The required precision ε̄ ∈ R.
11 Output
12 An approximation of the optimal solution (x∗ , λ∗ , µ∗ ).
13 Initialization
14 k := 0.
15 Repeat
16 νk = xkT µk /n .
17 Calculate (dx , dλ , dµ )T by solving

    [ A    0    0  ] [ dx ]   [           0           ]
    [ 0    AT   I  ] [ dλ ] = [           0           ] ,   (18.32)
    [ Sk   0    Xk ] [ dµ ]   [ −Xk Sk e + νk σe ]

where Xk and Sk are defined by (18.14).


18 (xk+1 , λk+1 , µk+1 )T := (xk , λk , µk )T + (dx , dλ , dµ )T .
19 k := k + 1.
20 Until νk ≤ ε̄.
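As an illustration, the following Python sketch implements the Newton system (18.32) and the restricted step iteration on a small strictly feasible toy LP of our own (minimize cT x over the simplex x1 + x2 + x3 = 1, x ≥ 0; this instance is an assumption for illustration, not necessarily the book's Example 18.7). A few centering steps (σ = 1) are applied first to reach V2 (θ), as recommended in the project description of Section 18.3:

```python
import numpy as np

def newton_direction(A, x, mu, sigma):
    """Solve the primal-dual Newton system (18.32) for (dx, dlam, dmu)."""
    n, m = x.size, A.shape[0]
    nu = x @ mu / n
    K = np.zeros((2 * n + m, 2 * n + m))
    K[:m, :n] = A                        # A dx = 0
    K[m:m + n, n:n + m] = A.T            # A^T dlam + dmu = 0
    K[m:m + n, n + m:] = np.eye(n)
    K[m + n:, :n] = np.diag(mu)          # S dx + X dmu = -XSe + nu sigma e
    K[m + n:, n + m:] = np.diag(x)
    rhs = np.concatenate([np.zeros(m + n), -x * mu + nu * sigma])
    d = np.linalg.solve(K, rhs)
    return d[:n], d[n:n + m], d[n + m:]

A = np.array([[1.0, 1.0, 1.0]])          # feasible set: the unit simplex
c = np.array([1.0, 2.0, 3.0])
x, lam, mu = np.full(3, 1 / 3), np.zeros(1), c.copy()   # strictly feasible
theta = 0.4
sigma = 1 - theta / np.sqrt(x.size)

for _ in range(20):                      # centering steps until in V2(theta)
    nu = x @ mu / x.size
    if np.linalg.norm(x * mu - nu) / nu <= theta:
        break
    dx, dlam, dmu = newton_direction(A, x, mu, 1.0)
    x, lam, mu = x + dx, lam + dlam, mu + dmu

for _ in range(500):                     # restricted steps: alpha_k = 1
    nu = x @ mu / x.size
    if nu <= 1e-8:
        break
    dx, dlam, dmu = newton_direction(A, x, mu, sigma)
    x, lam, mu = x + dx, lam + dlam, mu + dmu
```

Each full step multiplies νk exactly by σ (because dxT dµ = 0), so νk decreases geometrically; on this instance the iterates approach the optimal vertex (1, 0, 0)T.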

Algorithm 18.3: Predictor-corrector interior point algorithm


1 Objective
2 To find the global minimum of a linear optimization problem in standard
form (6.159)–(6.160), i.e., minx∈Rn cT x subject to Ax = b, x ≥ 0 .

3 Input
4 A ∈ Rm×n , b ∈ Rm , c ∈ Rn .
5 θpred = 0.5, θcorr = 0.25.
6 An initial feasible solution (x0 , λ0 , µ0 ) ∈ V2 (θcorr ).
7 The required precision ε̄ ∈ R.
8 Output
9 An approximation of the optimal solution (x∗ , λ∗ , µ∗ ).
10 Initialization
11 k := 0.
12 Repeat
13 Prediction: σk = 0, no barrier.
14 νk = xkT µk /n .
15 Calculate (dx , dλ , dµ )T by solving

    [ A    0    0  ] [ dx ]   [      0      ]
    [ 0    AT   I  ] [ dλ ] = [      0      ] ,
    [ Sk   0    Xk ] [ dµ ]   [ −Xk Sk e ]

where Xk and Sk are defined by (18.14).


16 α := 1.
17 Repeat
18 (xk+1 , λk+1 , µk+1 )T := (xk , λk , µk )T + α(dx , dλ , dµ )T .
19 α := α/2.
20 Until (xk+1 , λk+1 , µk+1 )T ∈ V2 (θpred )
21 k := k + 1.
22 Correction: σk = 1
23 νk = xkT µk /n .
24 Calculate (dx , dλ , dµ )T by solving

    [ A    0    0  ] [ dx ]   [        0        ]
    [ 0    AT   I  ] [ dλ ] = [        0        ] ,
    [ Sk   0    Xk ] [ dµ ]   [ −Xk Sk e + νk e ]

where Xk and Sk are defined by (18.14).


25 (xk+1 , λk+1 , µk+1 )T := (xk , λk , µk )T + (dx , dλ , dµ )T .
26 k := k + 1.
27 Until νk ≤ ε̄.
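A compact sketch of the predictor-corrector strategy in Python, on a small strictly feasible toy LP of our own (min cT x subject to x1 + x2 + x3 = 1, x ≥ 0; an assumption for illustration). Centering steps first bring the iterate into V2 (θcorr ):

```python
import numpy as np

def newton_direction(A, x, mu, sigma):
    """Primal-dual Newton system with right-hand side -XSe + nu*sigma*e."""
    n, m = x.size, A.shape[0]
    nu = x @ mu / n
    K = np.zeros((2 * n + m, 2 * n + m))
    K[:m, :n] = A
    K[m:m + n, n:n + m] = A.T
    K[m:m + n, n + m:] = np.eye(n)
    K[m + n:, :n] = np.diag(mu)
    K[m + n:, n + m:] = np.diag(x)
    rhs = np.concatenate([np.zeros(m + n), -x * mu + nu * sigma])
    d = np.linalg.solve(K, rhs)
    return d[:n], d[n:n + m], d[n + m:]

def in_V2(x, mu, theta):
    """Membership in the restricted neighborhood V2(theta)."""
    nu = x @ mu / x.size
    return (x > 0).all() and (mu > 0).all() and \
        np.linalg.norm(x * mu - nu) / nu <= theta

A = np.array([[1.0, 1.0, 1.0]])
c = np.array([1.0, 2.0, 3.0])
x, lam, mu = np.full(3, 1 / 3), np.zeros(1), c.copy()
th_pred, th_corr = 0.5, 0.25

for _ in range(20):                      # centering until in V2(th_corr)
    if in_V2(x, mu, th_corr):
        break
    dx, dlam, dmu = newton_direction(A, x, mu, 1.0)
    x, lam, mu = x + dx, lam + dlam, mu + dmu

for _ in range(200):
    if x @ mu / x.size <= 1e-8:
        break
    # prediction (sigma = 0): halve alpha until the trial point is in V2(th_pred)
    dx, dlam, dmu = newton_direction(A, x, mu, 0.0)
    alpha = 1.0
    while not in_V2(x + alpha * dx, mu + alpha * dmu, th_pred):
        alpha /= 2
    x, lam, mu = x + alpha * dx, lam + alpha * dlam, mu + alpha * dmu
    # correction (sigma = 1): full step back toward the central path
    dx, dlam, dmu = newton_direction(A, x, mu, 1.0)
    x, lam, mu = x + dx, lam + dlam, mu + dmu
```

A prediction step with length α reduces νk by the factor (1 − α), while a correction step leaves νk unchanged and only recenters the iterate.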

Example 18.13 (Restricted step algorithm). Consider Example 18.7 again and apply Algorithm 18.2 from the starting point x = (0.6, 0.2, 0.2)T , λ = 0 and µ = c. A few iterations are listed in Table 18.5. We observe how slow the method is, due to its inability to take large steps.

Table 18.5: Iterations for the interior point algorithm with restricted steps (Algorithm
18.2) for Example 18.7
k    ‖Xk Sk e − νk e‖2 /νk    ‖dk ‖    νk
0 3.061862e-01 5.333333e-01
1 4.433431e-02 6.494725e-01 4.101653e-01
2 3.892141e-02 4.203992e-01 3.154417e-01
3 3.940146e-02 2.830158e-01 2.425935e-01
4 3.305109e-02 1.976729e-01 1.865690e-01
5 2.615063e-02 1.416567e-01 1.434827e-01
...
20 4.730746e-04 2.303030e-03 2.793843e-03
21 3.635129e-04 1.770192e-03 2.148633e-03
22 2.793793e-04 1.360809e-03 1.652427e-03
23 2.147504e-04 1.046205e-03 1.270815e-03
24 1.650912e-04 8.043948e-04 9.773332e-04
25 1.269267e-04 6.185101e-04 7.516277e-04
...
35 9.178012e-06 4.473980e-05 5.440069e-05
36 7.058322e-06 3.440722e-05 4.183739e-05
37 5.428202e-06 2.646100e-05 3.217546e-05
38 4.174570e-06 2.034997e-05 2.474486e-05
39 3.210470e-06 1.565027e-05 1.903028e-05
...
55 4.807511e-08 2.343568e-07 2.849756e-07
56 3.697263e-08 1.802344e-07 2.191633e-07
57 2.843417e-08 1.386111e-07 1.685497e-07
58 2.186758e-08 1.066002e-07 1.296248e-07
59 1.681748e-08 8.198194e-08 9.968925e-08

Example 18.14 (Predictor-corrector algorithm). Consider Example 18.7 again and apply Algorithm 18.3 from the starting point1 x = (0.5, 0.3, 0.2)T , λ = 0 and µ = c. A few iterations are listed in Table 18.6. It is interesting to note that the lengths of the step and of the correction are of the same order of magnitude as the distance to the central path. Moreover, the prediction step has the effect of moving the iterates further from the central path, which requires a correction iteration.

Table 18.6: Iterations of the predictor-corrector interior point algorithm (Algorithm 18.3) for Example 18.7
k    ‖Xk Sk e − νk e‖2 /νk    ‖αdk ‖    νk    α    type
0 1.440876e-01 5.666667e-01
1 4.423867e-01 1.400602e+00 2.833333e-01 0.5 pred
2 5.054768e-02 2.736081e-01 2.833333e-01 1.0 corr
3 1.986016e-01 5.684548e-01 1.416667e-01 0.5 pred
4 4.375390e-03 5.605598e-02 1.416667e-01 1.0 corr
5 1.392922e-01 2.300908e-01 7.083333e-02 0.5 pred
6 1.031609e-03 1.890433e-02 7.083333e-02 1.0 corr
...
25 1.294342e-04 1.894595e-04 6.917318e-05 0.5 pred
26 8.047593e-13 1.631807e-08 6.917318e-05 1.0 corr
27 6.471134e-05 9.472452e-05 3.458659e-05 0.5 pred
28 1.005798e-13 4.079063e-09 3.458659e-05 1.0 corr
29 3.235423e-05 4.736095e-05 1.729329e-05 0.5 pred
30 1.257720e-14 1.019709e-09 1.729329e-05 1.0 corr
...
39 1.011026e-06 1.479990e-06 5.404154e-07 0.5 pred
40 1.959217e-16 9.957556e-13 5.404154e-07 1.0 corr
41 5.055127e-07 7.399946e-07 2.702077e-07 0.5 pred
42 0.000000e+00 2.489387e-13 2.702077e-07 1.0 corr
43 2.527563e-07 3.699972e-07 1.351039e-07 0.5 pred
44 2.770751e-16 6.223464e-14 1.351039e-07 1.0 corr
45 1.263781e-07 1.849986e-07 6.755193e-08 0.5 pred

1 The starting point x = (0.6, 0.2, 0.2)T , used in the previous example, does not belong to V2 (0.25).

Example 18.15 (Long step algorithm). Consider Example 18.7 again and apply Algorithm 18.4 from the starting point x = (0.6, 0.2, 0.2)T , λ = 0 and µ = c. The iterations are listed in Table 18.7 and illustrated in Figures 18.10, 18.11 and 18.12. This algorithm clearly performs better than the two others on this example. The fact that it only loosely follows the central path provides it with more flexibility to move toward the optimal solution.

Algorithm 18.4: Long step interior point algorithm


1 Objective
2 To find the global minimum of a linear optimization problem in standard
form (6.159)–(6.160), i.e., minx∈Rn cT x subject to Ax = b, x ≥ 0 .
3 Input
4 A ∈ Rm×n , b ∈ Rm , c ∈ Rn .
5 γ > 0 (for instance, γ = 10−3 ).
6 0 < σ < 1 (for instance σ = 0.1).
7 An initial feasible solution (x0 , λ0 , µ0 ) ∈ V−∞ (γ).
8 The required precision ε̄ ∈ R.
9 Output
10 An approximation of the optimal solution (x∗ , λ∗ , µ∗ ).
11 Initialization
12 k := 0.
13 Repeat
1 T
14 νk = x µk
n k
15 Calculate (dx , dλ , dµ )T by solving
    
A 0 0 dx 0
 0 AT I   dλ  =  0 ,
Sk 0 Xk dµ −Xk Sk e + νk σe

where Xk and Sk are defined by (18.14).


16 α := 1
17 Repeat
18 (xk+1 , λk+1 , µk+1 )T := (xk , λk , µk )T + α(dx , dλ , dµ )T .
19 α := α/2.
20 Until (xk+1 , λk+1 , µk+1 )T ∈ V−∞ (γ).
21 k := k + 1.
22 Until νk ≤ ε̄.
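A sketch of the long step strategy in Python, on a small strictly feasible toy LP of our own (min cT x subject to x1 + x2 + x3 = 1, x ≥ 0; an assumption for illustration):

```python
import numpy as np

def newton_direction(A, x, mu, sigma):
    """Primal-dual Newton system with right-hand side -XSe + nu*sigma*e."""
    n, m = x.size, A.shape[0]
    nu = x @ mu / n
    K = np.zeros((2 * n + m, 2 * n + m))
    K[:m, :n] = A
    K[m:m + n, n:n + m] = A.T
    K[m:m + n, n + m:] = np.eye(n)
    K[m + n:, :n] = np.diag(mu)
    K[m + n:, n + m:] = np.diag(x)
    rhs = np.concatenate([np.zeros(m + n), -x * mu + nu * sigma])
    d = np.linalg.solve(K, rhs)
    return d[:n], d[n:n + m], d[n + m:]

def in_Vminf(x, mu, gamma):
    """Membership in the large neighborhood V-inf(gamma) of (18.31)."""
    nu = x @ mu / x.size
    return (x > 0).all() and (mu > 0).all() and (x * mu >= gamma * nu).all()

A = np.array([[1.0, 1.0, 1.0]])
c = np.array([1.0, 2.0, 3.0])
x, lam, mu = np.full(3, 1 / 3), np.zeros(1), c.copy()
gamma, sigma = 1e-3, 0.1

for _ in range(500):
    if x @ mu / x.size <= 1e-8:
        break
    dx, dlam, dmu = newton_direction(A, x, mu, sigma)
    alpha = 1.0                          # halve until inside V-inf(gamma)
    while not in_Vminf(x + alpha * dx, mu + alpha * dmu, gamma):
        alpha /= 2
    x, lam, mu = x + alpha * dx, lam + alpha * dlam, mu + alpha * dmu
```

A step of length α reduces νk by the factor 1 − α(1 − σ); with the loose constraint γ = 10−3 , long steps (often α = 1) are accepted, which explains the fast decrease observed in Table 18.7.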

Table 18.7: Iterations of the long step interior point algorithm (Algorithm 18.4) for
Example 18.7
k    (maxi=1,...,n µi xi )/νk    ‖αdk ‖    νk    α
0 1.125000e+00 5.333333e-01

1 1.217712e+00 1.180976e+00 2.933333e-01 0.5


2 1.278088e+00 5.030498e-01 1.613333e-01 0.5
3 1.294039e+00 2.119436e-01 8.873333e-02 0.5
4 1.561739e+00 2.068096e-01 8.873333e-03 1.0
5 1.014971e+00 2.363846e-02 8.873333e-04 1.0
6 1.007229e+00 2.198411e-03 8.873333e-05 1.0
7 1.000716e+00 2.180253e-04 8.873333e-06 1.0
8 1.000072e+00 2.186378e-05 8.873333e-07 1.0

Figure 18.10: Iterations of the long step interior point algorithm (Algorithm 18.4) for Example 18.7 (the iterates, drawn on the simplex with axes x1 , x2 , x3 , converge to the point marked x∞ )

Figure 18.11: Iterations of the long step interior point algorithm (Algorithm 18.4) for Example 18.7, with x0 = (0.1, 0.1, 0.8)T

Figure 18.12: Iterations of the long step interior point algorithm (Algorithm 18.4) for Example 18.7, with x0 = (0.1, 0.8, 0.1)T

The algorithms described in this chapter require an initial feasible solution (x0 , λ0 , µ0 )T such that Ax0 = b, AT λ0 + µ0 = c, x0 > 0 and µ0 > 0. This means that a "phase I" procedure is required, similar to the procedure described for the simplex algorithm in Section 16.3, where an auxiliary problem is solved to generate a feasible solution of the original problem (see Huhn, 1999, for an analysis of such techniques in the context of interior point methods). Note that some implementations do not require beginning with a feasible starting point. These so-called infeasible interior point methods are more complex, but may be quite effective in practice (see for instance Zhang, 1994).

18.3 Project
The general organization of the projects is described in Appendix D.

Objective

The aim of this project is to implement interior point algorithms and compare them
with the simplex method (Chapter 16) and Dikin’s method (Section 17.3).

Approach

Apply the algorithms to the problems described below, for different values of n. Caution! If the starting point is not in the neighborhood corresponding to the algorithm, first apply centering iterations (σ = 1) to obtain a point in the neighborhood.
The following variations of the algorithms can be tested.
Algorithm 18.2. Vary the value of θ = 0.1, 0.4, 0.8.
Algorithm 18.3. θpred = 0.1, 0.5, 0.9 and θcorr = θpred /2.
Algorithm 18.4. γ = 10−5 , 10−3 , 1, σ = 0.1, 0.5, 0.9.

Algorithms

Algorithms 16.5, 18.2, 18.3, and 18.4.

Problems

Exercise 18.1. The problem

    min − Σi=1..n 2^(n−i) xi

subject to

    x1 ≤ 5
    4x1 + x2 ≤ 25
    8x1 + 4x2 + x3 ≤ 125
    ...
    2^n x1 + 2^(n−1) x2 + · · · + 4xn−1 + xn ≤ 5^n
    x1 , x2 , . . . , xn ≥ 0 .

Use the starting point (1, 1, . . . , 1)T . The optimal solution to this problem is (0, 0, . . . , 0, 5^n)T .
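The constraint matrix follows a simple pattern: the coefficient of xj in row i is 2^(i−j+1) for j < i, and 1 for j = i. A sketch (helper names are our own) that builds the data in standard form, with slack variables appended:

```python
import numpy as np

def build_problem(n):
    """Data of Exercise 18.1 in standard form: min c^T x, Ax = b, x >= 0."""
    c = -np.array([2.0 ** (n - i) for i in range(1, n + 1)])
    A = np.zeros((n, n))
    for i in range(1, n + 1):
        A[i - 1, i - 1] = 1.0                  # diagonal coefficient 1
        for j in range(1, i):
            A[i - 1, j - 1] = 2.0 ** (i - j + 1)
    b = np.array([5.0 ** i for i in range(1, n + 1)])
    # append one slack variable per row: [A I][x; s] = b
    return np.hstack([A, np.eye(n)]), b, np.concatenate([c, np.zeros(n)])

A, b, c = build_problem(3)
# the announced optimum (0, 0, 5^3)^T, with slacks (5, 25, 0):
x_opt = np.array([0.0, 0.0, 125.0, 5.0, 25.0, 0.0])
print(c @ x_opt)   # -125.0, i.e., -5^3
```

Checking that the announced optimal solution is feasible with objective value −5^n is a useful sanity test before running the interior point algorithms.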
Exercise 18.2. The problem

    min −xn

subject to

    ε ≤ x1 ≤ 1
    εxi−1 ≤ xi ≤ 1 − εxi−1 , i = 2, . . . , n ,

where 0 < ε < 1/2. Use the starting point

    x1 = (1 + ε)/2 ,
    xi = 1/2 , i = 2, . . . , n .

Chapter 19

Augmented Lagrangian method

A major difficulty with constrained optimization problems is to combine two goals


that are often conflicting:
1. to reduce the value of the objective function and
2. to satisfy the constraints.
Interior point methods start by giving significant weight to the second criterion at
the expense of the first, and then rebalance the two during the iterations by lowering
the barrier. The augmented Lagrangian method described in this chapter works in
the opposite way. It tries to reduce the objective function, possibly by violating the
constraints. Subsequently, the feasibility is restored progressively as the iterations
proceed.

Contents
19.1 Lagrangian penalty . . . . . . . . . . . . . . . . . . . . . . 447
19.2 Quadratic penalty . . . . . . . . . . . . . . . . . . . . . . . 449
19.3 Double penalty . . . . . . . . . . . . . . . . . . . . . . . . . 450
19.4 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460

In this chapter, and in the next, we consider the optimization problem (1.71)–(1.72), i.e.,

    minx∈Rn f(x)    (19.1)

subject to

    h(x) = 0 ,    (19.2)

where f is a function from Rn to R and h is a function from Rn to Rm . We keep in mind that it is always possible to transform an inequality constraint

    gi (x) ≤ 0

with gi : Rn → R into an equality constraint, by introducing slack variables zi ∈ R (see Section 1.2.2 and Example 6.17), to obtain

    gi (x) + zi² = 0 .
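A tiny numerical illustration of this transformation (the constraint g below is a toy example of our own):

```python
import numpy as np

def g(x):                      # inequality constraint g(x) <= 0
    return x[0] ** 2 + x[1] - 2.0

def h(x, z):                   # equivalent equality with slack variable z
    return g(x) + z ** 2

x = np.array([1.0, -3.0])      # g(x) = -4 <= 0: feasible
z = np.sqrt(-g(x))             # z = 2 makes the equality hold exactly
print(h(x, z))                 # 0.0
```

Conversely, any (x, z) with h(x, z) = 0 satisfies g(x) = −z² ≤ 0, so the two formulations have the same feasible x.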

The augmented Lagrangian method is directly inspired by the Karush-Kuhn-


Tucker optimality conditions and especially the proof of Theorem 6.10. The basic
idea consists in transforming a constrained problem into a sequence of unconstrained
problems, by penalizing more and more the possible violation of the constraints. It is
interesting to note that this approach is the opposite of the one used in the context
of interior point methods. Indeed, the latter mainly focuses on the generation of
feasible iterates. During the first iterations, this comes at the expense of optimality.
As the height of the barriers decreases, the interior point algorithms converge toward
the minimum. For the augmented Lagrangian method, it is the opposite. The main
focus is on the identification of optimal solutions to the subproblems, if required by
violating the constraints. Successive iterations try to restore the feasibility of the
iterates.
Two types of penalties for violating the constraints are considered:
• a Lagrangian penalty, or Lagrangian relaxation, as presented in the introduction
of duality, discussed in Chapter 4,
• a quadratic penalty, as in the proof of Theorem 6.10.
This involves combining the Lagrangian (Definition 4.3) of the problem, i.e.,

    L(x, λ) = f(x) + λT h(x) ,

with a quadratic penalty

    (c/2) ‖h(x)‖2 ,

where c ∈ R, c > 0. We obtain the augmented Lagrangian (Definition 6.18)

    Lc (x, λ) = f(x) + λT h(x) + (c/2) ‖h(x)‖2 ,    (19.3)
for which the derivatives with respect to x are

    ∇x Lc (x, λ) = ∇f(x) + ∇h(x)λ + c ∇h(x)h(x)    (19.4)

and

    ∇2xx Lc (x, λ) = ∇2 f(x) + Σi=1..m λi ∇2 hi (x) + c ∇h(x)∇h(x)T + c Σi=1..m hi (x) ∇2 hi (x) .    (19.5)

We start by analyzing each penalty separately.



19.1 Lagrangian penalty


As described in the proof of Theorem 6.19, if x∗ and λ∗ satisfy the sufficient optimality conditions (6.115)

    ∇L(x∗ , λ∗ ) = 0

and (6.116)

    yT ∇2xx L(x∗ , λ∗ )y > 0 , ∀y ∈ D(x∗ ) , y ≠ 0 ,

where D(x∗ ) is the linearized cone at x∗ (Definition 3.23), then x∗ is a strict local minimum of the problem

    minx∈Rn Lc (x, λ∗ )    (19.6)

with a sufficiently large c.


Example 19.1 (Lagrangian penalty). Consider the problem

    minx∈R2 (1/2)(−x1² + x2²)

subject to

    x1 = 1 .

The Lagrangian is

    L(x, λ) = (1/2)(−x1² + x2²) + λ(x1 − 1) ,

and

    ∇x L(x, λ) = ( −x1 + λ , x2 )T ,

    ∇2xx L(x, λ) = [ −1   0
                      0   1 ] .

Then, x∗ = (1, 0)T and λ∗ = 1 satisfy the sufficient optimality conditions. Indeed, ∇x L(x∗ , λ∗ ) = 0 and (6.23) are satisfied. A direction y is in the linearized cone D(x∗ ) if and only if yT ∇h(x∗ ) = 0, that is, if y1 · 1 + y2 · 0 = y1 = 0. Therefore, if y ≠ 0,

    ( 0  y2 ) [ −1   0
                 0   1 ] ( 0 , y2 )T = y2² > 0 ,

and the second order KKT condition (6.24) is satisfied. The augmented Lagrangian is

    Lc (x, λ) = (1/2)(−x1² + x2²) + λ(x1 − 1) + (c/2)(x1 − 1)² .
We have

    ∇x Lc (x, λ) = ( (c − 1)x1 + λ − c , x2 )T ,

which is zero at

    x = ( (c − λ)/(c − 1) , 0 )T

and

    ∇2xx Lc (x, λ) = [ c − 1   0
                         0     1 ] .    (19.7)
We immediately note that x is not defined if c = 1. Moreover, if c < 1, the matrix of second derivatives is not positive semidefinite, the necessary optimality conditions are not satisfied, and the point x is not a local minimum of the augmented Lagrangian. If c > 1, then x is a strict local minimum of the augmented Lagrangian. Now, consider λ = λ∗ = 1. For any c ≠ 1, we get

    x = ( (c − λ∗ )/(c − 1) , 0 )T = ( (c − 1)/(c − 1) , 0 )T = (1, 0)T = x∗ .

This illustrates that if the optimal value of the dual variable λ∗ is known, and if the parameter c is large enough, minimizing the augmented Lagrangian also identifies a local minimum of the constrained problem. The constrained problem is represented in Figure 19.1(a). Figures 19.1(b) and 19.2(b) display the level curves of the augmented Lagrangian (with λ = λ∗ ) for values of c above 1. We note that the unconstrained minimum is x∗ . However, when c < 1 (Figure 19.2(a)), x∗ is a saddle point of the augmented Lagrangian, not a minimum.
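The claims of this example are easy to verify numerically. A minimal sketch (helper names are our own):

```python
import numpy as np

# Gradient of Lc(x, lam) = (1/2)(-x1^2 + x2^2) + lam(x1 - 1) + (c/2)(x1 - 1)^2
def grad_Lc(x, lam, c):
    return np.array([(c - 1.0) * x[0] + lam - c, x[1]])

lam_star = 1.0
c = 2.0
x = np.array([(c - lam_star) / (c - 1.0), 0.0])  # stationary point: (1, 0) = x*
print(x, grad_Lc(x, lam_star, c))
# The Hessian is diag(c - 1, 1): eigenvalues c - 1 and 1, so the stationary
# point is a strict minimum for c > 1 and a saddle point for c < 1.
```

The point (1, 0) is stationary for every c (including c < 1), but only for c > 1 is it a minimizer of the augmented Lagrangian.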

Figure 19.1: Level curves for Example 19.1: (a) constrained problem; (b) augmented Lagrangian with c = 2

Then, if the value of λ∗ is known, the constrained optimization problem (19.1)–


(19.2) reduces to the unconstrained optimization problem (19.6) and the methods
presented in Part IV can be used. Unfortunately, in practice, the value of λ∗ is as
complicated to obtain as the value of x∗ , and this method cannot be directly used.
Consequently, the second penalty plays an important role.

Figure 19.2: Level curves for Example 19.1: (a) augmented Lagrangian with c = 1/2; (b) augmented Lagrangian with c = 10

19.2 Quadratic penalty


The quadratic penalty amounts to making c sufficiently large that an infeasible point of the initial problem cannot be optimal when minimizing the augmented Lagrangian. This idea is the basis for the proof of Theorem 6.10.
If we take Example 19.1, the minimum of the augmented Lagrangian is

    x = ( (c − λ)/(c − 1) , 0 )T .

When c tends toward infinity, x tends toward x∗ = (1, 0)T , regardless of the value of λ. The level curves for several values of c are presented in Figure 19.3.
A possible algorithm would consist in solving a sequence of unconstrained problems

    xk ∈ argminx∈Rn Lck (x, λ) ,    (19.8)

where λ is given and (ck )k is a sequence of real numbers such that limk→∞ ck = +∞. In general, the final solution to the problem at step k serves as the starting point for the calculation of the optimal solution at step k + 1. Note that if several minima belong to argminx∈Rn Lck (x, λ), we consider only one, namely the solution produced by the unconstrained optimization algorithm that is used.
Such an algorithm would work regardless of the value of λ (this is proved below). Unfortunately, the unconstrained minimization problem becomes increasingly ill-conditioned as ck increases. This can be seen in Example 19.1. Indeed, the level curves become increasingly stretched (Figure 19.3(d)).
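Since the minimizer of each subproblem is known in closed form for Example 19.1, the behavior of the pure quadratic penalty can be reproduced in a few lines (the sequence of values ck below is our own choice):

```python
# Quadratic penalty on Example 19.1 with lam = 0: for each c > 1 the
# minimizer of Lc(x, 0) is x = ((c - lam)/(c - 1), 0).
lam = 0.0
for c in (2.0, 10.0, 100.0, 1000.0):
    x1 = (c - lam) / (c - 1.0)     # first component of the minimizer
    cond = (c - 1.0) / 1.0         # conditioning of the Hessian diag(c-1, 1)
    print(c, x1, cond)
# x1 -> 1 (the solution), but the conditioning degrades linearly with c
```

The trade-off is visible: feasibility improves at the rate 1/(c − 1), while the subproblem becomes ever harder to solve numerically.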

Figure 19.3: Level curves for Example 19.1, with λ = 0: (a) c = 2; (b) c = 5; (c) c = 10; (d) c = 100

We can also see that one of the eigenvalues of the Hessian matrix (19.7) is c−1, which
tends toward infinity with c. This situation causes significant numerical difficulties
for solving the unconstrained problems. Therefore, it is necessary to combine the two
types of penalties in order to obtain an efficient algorithm.

19.3 Double penalty


By using both penalty types, we can obtain the advantages of the two approaches. It is important to note that the quadratic penalty method works better when the value of λ is close to λ∗ . This is shown in Figure 19.4, where the first component of xk , obtained by solving (19.8), is reported for various values of c and λ. Clearly, when c is increased, the value tends faster toward 1 when λ is close to λ∗ = 1.
It is therefore important to be able to obtain a good approximation of λ∗ . The algorithm presented here is based not only on a sequence of penalty parameters (ck )k such that ck → ∞, but also on a sequence of vectors (λk )k approximating λ∗ . From a theoretical point of view, it is sufficient that the sequence (λk )k is bounded in order for the method to work.

Figure 19.4: Efficiency of the quadratic penalty as a function of λ ((xk )1 plotted against c, for λ = 0, 0.2, 0.4, 0.6, 0.8 and 1)

Theorem 19.2 (Augmented Lagrangian method). Let f : Rn → R and h : Rn → Rm be two continuous functions. Let X ⊂ Rn be a closed subset of Rn such that the set { x ∈ X | h(x) = 0 } is non empty. Consider a sequence (ck )k such that, for all k, ck ∈ R, 0 < ck < ck+1 , and limk→∞ ck = +∞. Consider a bounded sequence (λk )k such that λk ∈ Rm for all k. Let xk be a global minimum of the augmented Lagrangian, that is,

    xk ∈ argminx∈X Lck (x, λk ) = f(x) + λkT h(x) + (ck /2) ‖h(x)‖2 .    (19.9)

Then, each limit point of the sequence (xk )k is a global minimum of the problem minx∈Rn f(x) subject to h(x) = 0 and x ∈ X.

Proof. Let f∗ be the optimal value of the constrained problem and let k be arbitrary:

    f∗ = min{h(x)=0, x∈X} f(x)
       = min{h(x)=0, x∈X} f(x) + λkT h(x) + (ck /2) ‖h(x)‖2
       = min{h(x)=0, x∈X} Lck (x, λk ) .

Since X is closed, { x ∈ X | h(x) = 0 } is non empty and f is continuous, f∗ is finite.

By definition of xk , a global minimum of (19.9), we have

    Lck (xk , λk ) ≤ Lck (x, λk ) , ∀x ∈ X .    (19.10)

Then, taking the minimum,

    Lck (xk , λk ) ≤ min{h(x)=0, x∈X} Lck (x, λk ) = f∗ ,

and this for all k. When k tends toward infinity, Lck (xk , λk ) remains finite since this is the case for f∗ . Let x̄ be a limit point of the sequence (xk )k and λ̄ a limit point of the sequence (λk )k (it exists because the sequence is bounded). Then, when going to the upper limit and taking into account the fact that the functions f and h are continuous, we get

    lim supk→∞ Lck (xk , λk ) ≤ f∗
    lim supk→∞ f(xk ) + λkT h(xk ) + (ck /2) ‖h(xk )‖2 ≤ f∗
    f(x̄) + λ̄T h(x̄) + lim supk→∞ (ck /2) ‖h(xk )‖2 ≤ f∗ .

In order for the left term to remain finite while (ck )k → ∞, we require (‖h(xk )‖)k → 0 and then h(x̄) = 0. The above expression simplifies to

    f(x̄) ≤ f∗ .

Moreover, since X is closed, we also have x̄ ∈ X. Therefore, x̄ is indeed a global minimum of the problem.
Note that if x∗ is a local minimum of the constrained problem (19.1)–(19.2), then there exists a closed neighborhood X ⊆ Rn such that x∗ is a global minimum of the problem minx∈X f(x) subject to h(x) = 0, and the theorem applies. We have thus shown that the quadratic penalty enables us to solve the problem. The following result enables us to find an approximation of λ∗ .

Theorem 19.3 (Approximation of Lagrange multipliers). Let f and h be continuously differentiable. Consider a sequence (ck )k such that, for all k, ck ∈ R and 0 < ck < ck+1 . Moreover, let us assume limk→∞ ck = +∞. Let (λk )k be a bounded sequence such that λk ∈ Rm for all k. Let (εk )k be a sequence such that εk > 0 for all k and limk→∞ εk = 0. Let (xk )k be a sequence such that

    ‖∇x Lck (xk , λk )‖ ≤ εk .    (19.11)

Let (xk )k∈K be a subsequence of the sequence (xk )k converging toward x∗ . If ∇h(x∗ ) is of full rank, then

    limk∈K, k→∞ λk + ck h(xk ) = λ∗ ,    (19.12)

where x∗ and λ∗ satisfy the necessary first-order optimality conditions (6.23), i.e.,

    ∇f(x∗ ) + ∇h(x∗ )λ∗ = 0    (19.13)
    h(x∗ ) = 0 .    (19.14)


Proof. We assume, without loss of generality, that the sequence (xk )k converges toward x∗ (by eliminating all terms such that k ∉ K). We denote

    ℓk = λk + ck h(xk ) .    (19.15)

From (19.3), we have

    ∇x Lck (xk , λk ) = ∇f(xk ) + ∇h(xk )λk + ck ∇h(xk )h(xk )
                      = ∇f(xk ) + ∇h(xk )(λk + ck h(xk ))

and, using (19.15),

    ∇x Lck (xk , λk ) = ∇f(xk ) + ∇h(xk )ℓk .    (19.16)

By continuity, since ∇h(x∗ ) is of full rank, ∇h(xk ) is also of full rank for sufficiently large k. Therefore,

    ∇x Lck (xk , λk ) = ∇f(xk ) + ∇h(xk )ℓk
    ∇h(xk )T ∇x Lck (xk , λk ) = ∇h(xk )T ∇f(xk ) + ∇h(xk )T ∇h(xk )ℓk

and

    ℓk = (∇h(xk )T ∇h(xk ))−1 ∇h(xk )T (∇x Lck (xk , λk ) − ∇f(xk )) .

According to (19.11), since εk → 0, we have ∇x Lck (xk , λk ) → 0. When k tends toward infinity, we obtain

    λ∗ = limk→∞ ℓk = −(∇h(x∗ )T ∇h(x∗ ))−1 ∇h(x∗ )T ∇f(x∗ ) .

By making k tend toward infinity in (19.16), as ∇x Lck (xk , λk ) → 0, we obtain (19.13),

    ∇f(x∗ ) + ∇h(x∗ )λ∗ = 0 .

Since ℓk = λk + ck h(xk ) → λ∗ and (λk )k is bounded, (ck h(xk ))k is also bounded. Since ck → ∞, then h(xk ) → 0 and we get (19.14),

    h(x∗ ) = 0 .

The arguments used in this proof are similar to those used to prove Theorem 6.10.
This result enables us to define the sequence (λk )k as follows:

    λk+1 = λk + ck h(xk ) .    (19.17)



Algorithm 19.1: Augmented Lagrangian algorithm


1 Objective
2 To find a local minimum of the problem (1.71)–(1.72):
minx∈Rn f(x) subject to h(x) = 0.

3 Input
4 The twice differentiable function f : Rn → R.
5 The gradient of the function ∇f : Rn → Rn .
6 The Hessian of the function ∇2 f : Rn → Rn×n .
7 The twice differentiable constraint h : Rn → Rm .
8 The gradient matrix of the constraint ∇h : Rn → Rn×m .
9 The Hessian ∇2 hi : Rn → Rn×n of each constraint i = 1, . . . , m.
10 An initial feasible solution (x0 , λ0 ).
11 An initial penalty parameter c0 (by default c0 = 10).
12 The required precision ε > 0.
13 Output
14 An approximation of the optimal solution (x∗ , λ∗ ).
15 Initialization
16 k := 0.
17 η̂0 := 0.1258925. Value chosen so that η0 = 0.1.
18 τ := 10.
19 α := 0.1.
20 β := 0.9.
21 εk := 1/c0 .
22 ηk := η̂0 /c0^α .

23 Repeat
24 Use Newton’s method with line search (Algorithm 11.8) or with trust
region (Algorithm 12.4) to solve
    xk+1 ∈ argminx∈Rn Lck (x, λk ) = f(x) + λkT h(x) + (ck /2) ‖h(x)‖2 ,    (19.18)

by using xk as the starting point and εk as the precision.


25 if ‖h(xk )‖ ≤ ηk then we update the multipliers
26 λk+1 := λk + ck h(xk ).
27 ck+1 := ck , ck is not modified.
28 εk+1 := εk /ck , precision is increased.
29 ηk+1 := ηk /ck^β , feasibility requirement is increased.
30 else we update the penalty parameter
31 λk+1 := λk , λk is not modified.
32 ck+1 := τck , penalty is increased.
33 εk+1 := 1/ck+1 , precision reset.
34 ηk+1 := η̂0 /ck+1^α , feasibility requirement reset.
35 k := k + 1.
36 Until ‖∇L(xk , λk )‖ ≤ ε and ‖h(xk )‖ ≤ ε.
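The following Python sketch applies the update rules of Algorithm 19.1 to Example 19.5. The inner solver is a simple damped Newton method of our own, standing in for Algorithm 11.8; the parameter values follow the algorithm:

```python
import numpy as np

# f, h and their derivatives for Example 19.5
f = lambda x: 2.0 * (x[0] ** 2 + x[1] ** 2 - 1.0) - x[0]
grad_f = lambda x: np.array([4.0 * x[0] - 1.0, 4.0 * x[1]])
h = lambda x: x[0] ** 2 + x[1] ** 2 - 1.0
grad_h = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])

def Lc(x, lam, c):
    return f(x) + lam * h(x) + 0.5 * c * h(x) ** 2

def grad_Lc(x, lam, c):
    return grad_f(x) + (lam + c * h(x)) * grad_h(x)

def hess_Lc(x, lam, c):
    gh = grad_h(x)
    return (4.0 + 2.0 * (lam + c * h(x))) * np.eye(2) + c * np.outer(gh, gh)

def inner(x, lam, c, eps):
    """Damped Newton with positive definite correction (our own stand-in)."""
    for _ in range(200):
        g = grad_Lc(x, lam, c)
        if np.linalg.norm(g) <= eps:
            break
        H = hess_Lc(x, lam, c)
        tau = 0.0                        # shift until positive definite
        while np.linalg.eigvalsh(H + tau * np.eye(2))[0] <= 1e-8:
            tau = max(2.0 * tau, 1e-3)
        d = np.linalg.solve(H + tau * np.eye(2), -g)
        alpha = 1.0                      # Armijo backtracking line search
        while Lc(x + alpha * d, lam, c) > Lc(x, lam, c) + 1e-4 * alpha * (g @ d) \
                and alpha > 1e-12:
            alpha /= 2.0
        x = x + alpha * d
    return x

x, lam, c = np.array([-1.0, 0.1]), 0.0, 10.0
eta_hat, tau_pen, a, b = 0.1258925, 10.0, 0.1, 0.9
eps_k, eta_k = 1.0 / c, eta_hat / c ** a
for _ in range(30):
    x = inner(x, lam, c, eps_k)
    if abs(h(x)) <= eta_k:               # multiplier update
        lam += c * h(x)
        eps_k, eta_k = eps_k / c, eta_k / c ** b
    else:                                # penalty update
        c *= tau_pen
        eps_k, eta_k = 1.0 / c, eta_hat / c ** a
    if np.linalg.norm(grad_f(x) + lam * grad_h(x)) <= 1e-8 and abs(h(x)) <= 1e-8:
        break
print(x, lam)   # close to x* = (1, 0) and lam* = -1.5
```

On this problem the sketch reproduces the qualitative behavior of Tables 19.1 and 19.2: one penalty increase at the first iteration, followed by multiplier updates converging to λ∗ = −1.5.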

Example 19.4 (Lagrangian penalty – cont.). Consider Example 19.1 again and apply the update (19.17) to obtain

    λk+1 = λk + ck ( (ck − λk )/(ck − 1) − 1 ) = (−λk + ck )/(ck − 1) .

We examine the convergence of this sequence toward λ∗ = 1:

    λk+1 − λ∗ = (−λk + ck − λ∗ (ck − 1))/(ck − 1) = (−λk + λ∗ )/(ck − 1) .

Therefore, for λk+1 to be closer to λ∗ than λk , so that the sequence converges, we need ck > 2. We see again that, in order for the method to work, the value of ck should be sufficiently large.
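The contraction can be observed numerically (fixing c = 10, a choice of ours, the error shrinks by the factor 1/(c − 1) = 1/9 at each update):

```python
# Multiplier update of Example 19.4: lam_{k+1} = (c - lam_k)/(c - 1).
lam_star = 1.0
lam, c = 0.0, 10.0
errors = []
for _ in range(8):
    lam = (c - lam) / (c - 1.0)
    errors.append(abs(lam - lam_star))
print(errors[-1])   # roughly (1/9)**8
```

With c ≤ 2 the same recursion would not contract, consistent with the condition ck > 2 derived above.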

Theorem 19.3 is now used as a basis to define the algorithm. Indeed, we need to specify the sequences of penalty parameters (ck )k , of multipliers (λk )k , and of precisions (εk )k to obtain an algorithm. At each iteration,
1. we solve minx Lck (x, λk ) at a precision εk , by using an appropriate algorithm from Chapters 11, 12 or 13, to obtain xk+1 ;
2. if xk+1 is "sufficiently" feasible, we update the Lagrange multipliers λk by using (19.12);
3. otherwise, we increase the penalty parameter ck .
The values of the parameters proposed in Algorithm 19.1 are taken from the LANCELOT software (Conn et al., 1992).

Example 19.5 (Augmented Lagrangian). Consider the problem

    minx∈R2 2(x1² + x2² − 1) − x1

subject to

    x1² + x2² = 1 ,

shown in Figure 19.5. Table 19.1 lists the values of the iterates, as well as the norm of the gradient of the Lagrangian. Table 19.2 lists the values of ck , ‖∇x Lck (xk , λk )‖, εk , ‖h(xk )‖ and ηk during the iterations. The last column gives the number of iterations required to solve the problem (19.18). We can see that the penalty parameter is increased during the first iteration, because the constraint satisfaction was insufficient (‖h(xk )‖ = 1.45292e-01, while ηk = 1e-01). At the subsequent iterations, the value of the multiplier λk has been updated.
The path of the algorithm is presented in Figure 19.6, in solid lines starting from x0 = (−1, 0.1)T and in dashed lines starting from x̄0 = (0, −0.1)T . It is interesting to note the way it approximately "follows" the constraint. Figure 19.7 shows the evolution of the level curves of the augmented Lagrangian around the optimal solution x∗ = (1, 0)T during the 4 first iterations.
Figure 19.5: Problem for Example 19.5. (a) Level curves; (b) the function.



Table 19.1: Iterates of the augmented Lagrangian method for Example 19.5
 k  x1            x2            λ             ‖∇x L(xk , λk )‖  ‖h(xk )‖
 0  -1.00000e+00  1.00000e-01   0.00000e+00   5.01597e+00       1.00000e-02
 1  9.24487e-01   5.51255e-03   0.00000e+00   2.69804e+00       1.45292e-01
 2  9.92491e-01   -1.83076e-05  -1.49616e+00  1.07375e-04       1.49616e-02
 3  9.99981e-01   1.03829e-07   -1.49999e+00  5.44820e-07       3.82637e-05
 4  1.00000e+00   -7.98650e-10  -1.50000e+00  2.18509e-07       9.70011e-08
 5  1.00000e+00   1.52816e-14   -1.50000e+00  1.36610e-12       1.33171e-09

Table 19.2: Iterates of the augmented Lagrangian method for Example 19.5 (cont.)
 k  ck   ‖∇x Lck (xk , λk )‖  εk     ‖h(xk )‖     ηk           iterations
 1  10   1.30108e-02          1e-01  1.45292e-01  1.00000e-01  15
 2  100  1.07375e-04          1e-02  1.49616e-02  7.94328e-02  8
 3  100  5.44820e-07          1e-04  3.82637e-05  1.25892e-03  4
 4  100  2.18509e-07          1e-06  9.70011e-08  1.99526e-05  1
 5  100  1.36628e-12          1e-08  1.33171e-09  3.16228e-07  1
 6  100  1.33227e-15          1e-10  3.32778e-12  5.01187e-09  1

Figure 19.6: Augmented Lagrangian: iterations for Example 19.5, from x0 (solid
lines) and x̄0 (dashed lines) to x∗ .



Figure 19.7: Level curves of the augmented Lagrangian for Example 19.5.
(a) k = 1, c = 10.0, λ = 0; (b) k = 2, c = 100.0, λ = 0;
(c) k = 3, c = 100.0, λ = −1.49616; (d) k = 4, c = 100.0, λ = −1.49998.

Example 19.6 (Augmented Lagrangian with the constrained Rosenbrock problem).
The Rosenbrock problem (Section 11.6) is difficult. We consider a constrained version
of this problem:

    min_{x∈R2} 100 (x2 − x1^2)^2 + (1 − x1)^2

subject to

    x1 − x2^2 − 1/2 = 0 ,

shown in Figure 19.8. The iterations are listed in Table 19.3 and the evolution of the
parameters in Table 19.4.

Figure 19.8: Augmented Lagrangian: iterations for Example 19.6, from x0 to x∗ .

Table 19.3: Iterates of the augmented Lagrangian method for Example 19.6
k x1 x2 λ ‖∇x L(xk , λk )‖ ‖h(xk )‖
0 -1.00000e+00 0.00000e+00 0.00000e+00 4.50795e+02 1.50000e+00
1 7.56394e-01 5.68279e-01 -6.65475e-01 1.90681e-02 6.65475e-02
2 7.23548e-01 5.17764e-01 -6.65475e-01 6.43621e-01 4.45321e-02
3 6.80054e-01 4.49567e-01 -2.87105e+00 1.41250e-04 2.20558e-02
4 6.69717e-01 4.29750e-01 -2.87105e+00 1.97370e+00 1.49681e-02
5 6.64093e-01 4.10635e-01 -7.39946e+00 1.78647e-04 4.52841e-03
6 6.63957e-01 4.06281e-01 -7.39946e+00 1.42593e+00 1.10665e-03
7 6.64017e-01 4.05166e-01 -8.82418e+00 1.44475e-08 1.42472e-04
8 6.64029e-01 4.05011e-01 -8.86988e+00 7.57362e-13 4.56917e-06

Table 19.4: Iterates of the augmented Lagrangian method for Example 19.6 (cont.)
k ck ‖∇x Lck (xk , λk )‖ εk ‖h(xk )‖ ηk iterations
1 10 1.90681e-02 1e-01 6.65475e-02 1.00000e-01 11
2 10 2.74456e-03 1e-02 4.45321e-02 1.25892e-02 2
3 100 1.41250e-04 1e-02 2.20558e-02 7.94328e-02 3
4 100 2.61580e-05 1e-04 1.49681e-02 1.25892e-03 2
5 1000 1.78647e-04 1e-03 4.52841e-03 6.30957e-02 3
6 1000 1.30511e-08 1e-06 1.10665e-03 1.25892e-04 3
7 10000 1.44475e-08 1e-04 1.42472e-04 5.01187e-02 3
8 10000 7.57636e-13 1e-08 4.56917e-06 1.25892e-05 3

It appears that the algorithm quickly identifies the neighborhood of the optimal so-
lution, but that it cannot satisfy the constraint with high precision. It must thus
increase ck , which affects the conditioning of the problem. The close-packed level
curves of the augmented Lagrangian, shown in Figure 19.9(b), highlight this phe-
nomenon.

Figure 19.9: Level curves of the augmented Lagrangian for Example 19.6.
(a) ck = 10, λk = 0; (b) ck = 10 000, λk = −8.86988.

The presentation of the proof for Theorems 19.2 and 19.3, as well as Example
19.1, was inspired by Bertsekas (1999).

19.4 Project
The general organization of the projects is described in Appendix D.

Objective
The aim of this project is to implement the augmented Lagrangian to solve various
problems and analyze the role of different parameters on its efficiency.

Approach
Analyze the impact of the following parameters.
1. Penalty parameter: use the initial values c0 = 1, 10, and 100 and the
augmentation rates τ = 1, 2, 10, and 100. Analyze the impact on the algorithm
itself and also on the behavior of the unconstrained algorithm used to solve the
subproblem of step 11.
2. Precision of the constraints: use the values β = 0.1, 0.5, and 0.9. Analyze the
impact on the approximation of the dual variables.

Algorithm
Algorithm 19.1.

Problems
Exercise 19.1. The problem

    min e^(x1 − 2 x2)

subject to

    sin(−x1 + x2 − 1) = 0
    −2 ≤ x1 ≤ 2
    −1.5 ≤ x2 ≤ 1.5 .
Exercise 19.2. The problem

    min_{x∈R10} Σ_{i=1}^{10} e^(xi) ( ci + xi − ln( Σ_{k=1}^{10} e^(xk) ) )

subject to

    e^(x1) + 2 e^(x2) + 2 e^(x3) + e^(x6) + e^(x10) = 2
    e^(x4) + 2 e^(x5) + e^(x6) + e^(x7) = 1
    e^(x3) + e^(x7) + e^(x8) + 2 e^(x9) + e^(x10) = 1
    −100 ≤ xi ≤ 100 , i = 1, . . . , 10 ,

with
c1 = −6.089 c6 = −14.986
c2 = −17.164 c7 = −24.1
c3 = −34.054 c8 = −10.708
c4 = −5.914 c9 = −26.662
c5 = −24.721 c10 = −22.179 .
Solution:
x∗1 = −3.40629 x∗6 = −4.44283
x∗2 = −0.596369 x∗7 = −1.41264
x∗3 = −1.1912 x∗8 = −21.6066
x∗4 = −4.62689 x∗9 = −2.26867
x∗5 = −1.0011 x∗10 = −1.40346 .
Proposed by Hock and Schittkowski (1981).
Exercise 19.3. The problem

    min_{x∈R2} ln(1 + x1^2) − x2

subject to

    (1 + x1^2)^2 + x2^2 = 4
    −4 ≤ x1 ≤ 4
    −4 ≤ x2 ≤ 4 .
Please note: do not forget to transform the formulation of the problems so that
they are compatible with (19.1)–(19.2).

Chapter 20

Sequential quadratic
programming

Contents
20.1 Local sequential quadratic programming . . . . . . . . . 464
20.2 Globally convergent algorithm . . . . . . . . . . . . . . . 471
20.3 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484

In this chapter, we consider the optimization problem (1.71)-(1.72), i.e.,

min f(x) (20.1)


x∈Rn

subject to
h(x) = 0 , (20.2)

where f is a function from Rn to R and h is a function from Rn to Rm . Remember that it is


always possible to transform an inequality constraint

gi (x) ≤ 0

with gi : Rn → R into an equality constraint by introducing slack variables zi ∈ R


(see Section 1.2.2 and Example 6.17), to get

gi (x) + z2i = 0 .

The basic idea of the algorithm that we develop is simple. Just as for unconstrained
optimization, the necessary optimality conditions (6.23) constitute a system of
nonlinear equations (Theorem 6.10). The methods presented in Chapters 7 and 8
are relevant in this context. We start by applying Newton's local method.

20.1 Local sequential quadratic programming


We need to find primal variables x∗ and dual variables λ∗ such that the gradient of
the Lagrangian of the problem (Definition 4.3) is zero, i.e.,

    ∇L(x∗ , λ∗ ) = 0 ,

with

    L(x, λ) = f(x) + λT h(x) .

We have

    ∇L(x, λ) = ( ∇f(x) + ∇h(x)λ , h(x) )T = ( ∇x L(x, λ) , h(x) )T        (20.3)

and

    ∇2 L(x, λ) = [ ∇2 f(x) + Σ_{i=1}^{m} λi ∇2 hi (x)   ∇h(x) ]
                 [ ∇h(x)T                                0     ]

                = [ ∇2xx L(x, λ)   ∇h(x) ]
                  [ ∇h(x)T         0     ] .

Then, an iteration k of Newton's method consists in finding a direction d ∈ Rn+m
such that

    ∇2 L(x, λ) d = −∇L(x, λ) ,

i.e., finding dx ∈ Rn and dλ ∈ Rm such that

    ∇2xx L(x, λ) dx + ∇h(x) dλ = −∇x L(x, λ)
    ∇h(x)T dx = −h(x) .                                   (20.4)
It is interesting to note that Equations (20.4) are the optimality conditions of the
following quadratic optimization problem:

    min_d ∇x L(xk , λk )T d + (1/2) dT ∇2xx L(xk , λk ) d          (20.5)

subject to

    ∇h(xk )T d + h(xk ) = 0 .                                      (20.6)

If ℓ ∈ Rm is the vector of dual variables for the above problem, the Lagrangian of
(20.5)–(20.6) is

    LPQ (d, ℓ) = ∇x L(xk , λk )T d + (1/2) dT ∇2xx L(xk , λk ) d + ℓT ( ∇h(xk )T d + h(xk ) )

and the necessary optimality conditions are

    ∇d LPQ (d∗ , ℓ∗ ) = ∇x L(xk , λk ) + ∇2xx L(xk , λk ) d∗ + ∇h(xk ) ℓ∗ = 0
    ∇ℓ LPQ (d∗ , ℓ∗ ) = ∇h(xk )T d∗ + h(xk ) = 0 .

We now need to take d∗ = dx and ℓ∗ = dλ to obtain (20.4).
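As a concrete check, the Newton system (20.4) can be assembled and solved as one linear system. A small NumPy sketch (the function name is ours), evaluated at x = (1, −1), λ = 1 for the problem min x1 + x2 subject to x1^2 + (x2 − 1)^2 − 1 = 0, which is studied in the examples below:

```python
import numpy as np

def newton_kkt_step(grad_L_x, hess_L_xx, jac_h, h_val):
    # Solve the Newton system (20.4):
    #   [ H    A ] [dx]     [ -grad_x L ]
    #   [ A^T  0 ] [dl]  =  [    -h     ]
    # where H = hess_xx L(x, lam) and A = grad h(x), an n x m matrix.
    n, m = jac_h.shape
    K = np.block([[hess_L_xx, jac_h],
                  [jac_h.T, np.zeros((m, m))]])
    d = np.linalg.solve(K, -np.concatenate([grad_L_x, h_val]))
    return d[:n], d[n:]

# Data at x = (1, -1), lam = 1:
H = 2.0 * np.eye(2)               # hess_xx L = lam * hess h = 2 * lam * I
A = np.array([[2.0], [-4.0]])     # grad h = (2 x1, 2 (x2 - 1))
gL = np.array([3.0, -3.0])        # grad f + lam * grad h
hv = np.array([4.0])              # h(x) = 1 + 4 - 1
dx, dl = newton_kkt_step(gL, H, A, hv)
# dx = (-1, 0.5) and dl = -0.5, so the next iterate is x = (0, -0.5)
# with multiplier lam + dl = 0.5.
```

The resulting step matches the first iteration reported later in Table 20.1.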

In the context of unconstrained optimization, we have seen that the calculation of
the Newton step amounted to the optimization of a quadratic model of the function
(Algorithm 10.2). In the context of constrained optimization, the calculation of the
Newton step amounts to the optimization of a quadratic function with linear
constraints. This is the problem (20.5)–(20.6).

We now simplify the formulation somewhat. By using (20.3), the Newton
equations (20.4) can be written as

    ∇2xx L(xk , λk ) dx + ∇h(xk ) dλ = −∇f(xk ) − ∇h(xk )λk
    ∇h(xk )T dx = −h(xk ) .

Define d̂λ = dλ + λk to obtain

    ∇2xx L(xk , λk ) dx + ∇h(xk ) d̂λ = −∇f(xk )          (20.7)
    ∇h(xk )T dx = −h(xk ) .                               (20.8)

We are here also dealing with optimality conditions for a quadratic problem

    min_d ∇f(xk )T d + (1/2) dT ∇2xx L(xk , λk ) d          (20.9)

subject to

    ∇h(xk )T d + h(xk ) = 0 .                               (20.10)

According to Theorem 6.38, the optimal solution to this quadratic problem is

    λ∗ = H−1 ( h(xk ) − ∇h(xk )T ∇2xx L(xk , λk )−1 ∇f(xk ) ) ,          (20.11)

with H = ∇h(xk )T ∇2xx L(xk , λk )−1 ∇h(xk ), and

    x∗ = −∇2xx L(xk , λk )−1 ( ∇h(xk )λ∗ + ∇f(xk ) ) .                   (20.12)

It is important to note that, in practice, specialized algorithms should be used to
solve the quadratic problem.1 The analytic solution is computationally intense, and
numerical issues may occur.
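For illustration only, formulas (20.11) and (20.12) can be coded directly; this is exactly the analytic route that the warning above is about, acceptable only for tiny, well-conditioned instances. A NumPy sketch (names ours), checked on min x1 + x2 subject to x1^2 + (x2 − 1)^2 − 1 = 0 at the point (1, −1):

```python
import numpy as np

def qp_step(grad_f, hess_L, jac_h, h_val):
    # Analytic solution (20.11)-(20.12) of the quadratic subproblem
    # (20.9)-(20.10). hess_L = hess_xx L(xk, lamk), jac_h = grad h(xk).
    W_inv = np.linalg.inv(hess_L)
    H = jac_h.T @ W_inv @ jac_h
    lam = np.linalg.solve(H, h_val - jac_h.T @ W_inv @ grad_f)   # (20.11)
    d = -W_inv @ (jac_h @ lam + grad_f)                          # (20.12)
    return d, lam

# At x = (1, -1) with lam = 1: W = 2 I, grad h = (2, -4), h = 4, grad f = (1, 1).
d, lam = qp_step(np.array([1.0, 1.0]), 2.0 * np.eye(2),
                 np.array([[2.0], [-4.0]]), np.array([4.0]))
# d = (-1, 0.5) and lam = 0.5: the same step as the Newton system (20.7)-(20.8).
```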
Example 20.1 (Quadratic problem in SQP). Consider the problem

    min f(x) = x1 + x2

subject to

    h(x) = x1^2 + (x2 − 1)^2 − 1 = 0 .

The Lagrangian is

    L(x, λ) = x1 + x2 + λ x1^2 + λ x2^2 − 2λ x2 .
1 We refer the interested reader to more detailed discussions of SQP methods in the literature,
such as Gould and Toint (2000) and Gill and Wong (2012).

Algorithm 20.1: Local SQP algorithm


1 Objective
2 To find a local minimum of the problem (1.71)–(1.72),

    min_{x∈Rn} f(x)  subject to  h(x) = 0 .

3 Input
4 The twice differentiable function f : Rn → R.
5 The gradient of the function ∇f : Rn → Rn .
6 The Hessian of the function ∇2 f : Rn → Rn×n .
7 The differentiable constraint h : Rn → Rm .
8 The gradient matrix of the constraint ∇h : Rn → Rn×m .
9 The Hessian ∇2 hi : Rn → Rn×n of each constraint i = 1, . . . , m.
10 An initial solution (x0 , λ0 ).
11 The required precision ε > 0.
12 Output
13 An approximation of the optimal solution (x∗ , λ∗ ).
14 Initialization
15 k := 0.
16 Repeat
17 Calculate ∇2xx L(xk , λk ) = ∇2 f(xk ) + Σ_{i=1}^{m} (λk )i ∇2 hi (xk ) .
18 Obtain dx and dλ by solving the quadratic problem

       min_d ∇f(xk )T d + (1/2) dT ∇2xx L(xk , λk ) d

   subject to

       ∇h(xk )T d + h(xk ) = 0 ,

   with an appropriate algorithm. To illustrate the algorithm, we can use
   (20.11) and (20.12).
19 xk+1 := xk + dx .
20 λk+1 := dλ .
21 k := k + 1.
22 Until ‖∇L(xk , λk )‖ ≤ ε.

Then,

    ∇L(x, λ) = ( 1 + 2λ x1 ,  1 + 2λ x2 − 2λ ,  x1^2 + x2^2 − 2 x2 )T

    ∇2 L(x, λ) = [ 2λ       0         2 x1
                   0        2λ        2 x2 − 2
                   2 x1     2 x2 − 2  0        ] .

The quadratic problem to be solved at each iteration is

    min_{d∈R2} d1 + d2 + λk d1^2 + λk d2^2

subject to

    2 x1 d1 + (2 x2 − 2) d2 + x1^2 + (x2 − 1)^2 − 1 = 0 .

Since solving the Karush-Kuhn-Tucker equations by Newton's method consists in
solving a sequence of quadratic problems, or quadratic programs, the resulting
algorithm is called the sequential quadratic programming (SQP) algorithm. Evidently,
this method has the same characteristics as Newton's method and is not globally
convergent. We add the adjective “local” to describe it in Algorithm 20.1.

Example 20.2 (Local SQP algorithm – I). Consider the problem

    min f(x) = x1 + x2

subject to

    h(x) = x1^2 + (x2 − 1)^2 − 1 = 0 .

Figure 20.1 and Tables 20.1 and 20.2 demonstrate the application of the local SQP
algorithm to this problem. We note that the algorithm quickly finds an optimal
solution. It has the speed of convergence of Newton’s method. The algorithm also
suffers from the drawbacks of this method.

2 x0 2
1.5 1.5
1 1
0.5 0.5
x∗ x∗
0 0
-0.5 -0.5
x0 -1 -1
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2

T T
(a) x0 = 1 − 1 (b) x0 = −3/2 2

Figure 20.1: Illustrations of the local SQP algorithm for Example 20.2

Table 20.1: Illustration of the local SQP algorithm for Example 20.2; x0 = (1, −1)T
 k  x1            x2            λ            ‖∇L(xk , λk )‖
 0  1.00000e+00   -1.00000e+00  1.00000e+00  5.83095e+00
 1  0.00000e+00   -5.00000e-01  5.00000e-01  1.67705e+00
 2  -1.00000e+00  -8.33333e-02  4.72222e-01  1.17515e+00
 3  -7.74009e-01  2.49726e-01   6.06718e-01  1.94849e-01
 4  -7.07425e-01  2.88997e-01   6.98180e-01  1.53511e-02
 5  -7.07135e-01  2.92911e-01   7.07075e-01  7.14908e-05
 6  -7.07107e-01  2.92893e-01   7.07107e-01  2.38157e-09

Figure 20.2: Illustrations of the local SQP algorithm for Example 20.2.
(a) x0 = (−0.1, 1)T ; (b) x0 = (0.1, 1)T .

Table 20.2: Illustration of the local SQP algorithm for Example 20.2; x0 = (−3/2, 2)T
 k  x1            x2            λ            ‖∇L(xk , λk )‖
 0  -1.50000e+00  2.00000e+00   1.00000e+00  4.25000e+00
 1  -1.36538e+00  1.07692e+00   4.23077e-01  1.38412e+00
 2  -1.11784e+00  -1.85423e-01  4.42901e-01  1.65558e+00
 3  -8.03520e-01  2.16153e-01   5.71828e-01  2.91415e-01
 4  -7.09901e-01  2.86072e-01   6.88886e-01  3.05735e-02
 5  -7.07179e-01  2.92927e-01   7.06965e-01  2.72185e-04
 6  -7.07107e-01  2.92893e-01   7.07107e-01  2.34061e-08

For instance, in the center of the circle of constraints, i.e., at the point (0, 1), the
matrix ∇h(x) is zero. Then, the matrix H in (20.11) is zero and consequently not
invertible. Note that, when we start the algorithm from a point close to (0, 1), it
has a tendency to take big steps (Figure 20.2(a) and Table 20.3). Finally, note that
there is no guarantee that a minimum can be found, as shown in Figure 20.2(b) and
Table 20.4. Indeed, the only objective of the algorithm is to zero the gradient of the
Lagrangian.

Table 20.3: Illustration of the local SQP algorithm for Example 20.2; x0 = (−0.1, 1)T
k x1 x2 λ ‖∇L(xk , λk )‖
0 -1.00000e-01 1.00000e+00 1.00000e+00 1.61867e+00
1 -5.05000e+00 5.00000e-01 -4.45000e+01 4.53418e+02
2 -2.62404e+00 7.50317e-01 -2.12782e+01 1.13424e+02
3 -1.50286e+00 8.78262e-01 -8.90106e+00 2.79633e+01
4 -1.08612e+00 9.63643e-01 -2.13558e+00 5.75895e+00
5 -1.01047e+00 1.19247e+00 3.11609e-01 1.18100e+00
6 -1.33383e+00 -6.56144e-01 3.95100e-01 3.53584e+00
7 -9.63788e-01 1.09118e-01 4.84472e-01 7.38361e-01
8 -7.22734e-01 2.53867e-01 6.39958e-01 1.17880e-01
9 -7.08899e-01 2.93445e-01 7.04068e-01 5.65591e-03
10 -7.07103e-01 2.92887e-01 7.07103e-01 1.19518e-05
11 -7.07107e-01 2.92893e-01 7.07107e-01 7.90184e-11

Table 20.4: Illustration of the local SQP algorithm for Example 20.2; x0 = (0.1, 1)T
k x1 x2 λ ‖∇L(xk , λk )‖
0 1.00000e-01 1.00000e+00 1.00000e+00 1.84935e+00
1 5.05000e+00 5.00000e-01 -5.45000e+01 5.52800e+02
2 2.62404e+00 7.50277e-01 -2.62802e+01 1.37776e+02
3 1.50282e+00 8.77817e-01 -1.14197e+01 3.35627e+01
4 1.08576e+00 9.59069e-01 -3.50192e+00 6.73105e+00
5 1.00831e+00 1.11015e+00 -7.10297e-01 9.48330e-01
6 9.26496e-01 1.72824e+00 -5.53513e-01 4.35129e-01
7 7.02912e-01 1.74580e+00 -6.73242e-01 7.35796e-02
8 7.08697e-01 1.70662e+00 -7.05786e-01 3.01674e-03
9 7.07106e-01 1.70711e+00 -7.07105e-01 5.18652e-06
10 7.07107e-01 1.70711e+00 -7.07107e-01 1.55080e-11
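The iterations shown in these tables can be reproduced with a few lines of code. A hedged sketch (NumPy; instead of a proper QP solver, the subproblem is solved through the equivalent linear system (20.7)–(20.8), and the function name is ours):

```python
import numpy as np

def local_sqp(x, lam, n_iter=15):
    # Local SQP (Algorithm 20.1) on the problem of Example 20.2:
    #   min x1 + x2   s.t.   h(x) = x1^2 + (x2 - 1)^2 - 1 = 0.
    # Each iteration solves (20.7)-(20.8) and sets lambda_{k+1} := d_lambda.
    grad_f = np.array([1.0, 1.0])
    for _ in range(n_iter):
        hval = x[0] ** 2 + (x[1] - 1.0) ** 2 - 1.0
        A = np.array([[2.0 * x[0]], [2.0 * (x[1] - 1.0)]])  # grad h
        W = 2.0 * lam * np.eye(2)      # hess_xx L = lam * hess h here
        K = np.block([[W, A], [A.T, np.zeros((1, 1))]])
        sol = np.linalg.solve(K, -np.concatenate([grad_f, [hval]]))
        x = x + sol[:2]                # primal step
        lam = sol[2]                   # new multiplier
    return x, lam

x, lam = local_sqp(np.array([1.0, -1.0]), 1.0)
# Converges to x* = (-sqrt(2)/2, 1 - sqrt(2)/2) and lam* = sqrt(2)/2,
# reproducing the last row of Table 20.1.
```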

Example 20.3 (Local SQP algorithm – II). We apply the local SQP algorithm to
the problem in Example 19.5, i.e.,

    min_{x∈R2} 2 (x1^2 + x2^2 − 1) − x1

subject to

    x1^2 + x2^2 = 1 ,

shown in Figure 19.5. The iterations are listed in Table 20.5 and presented in
Figure 20.3(a). It is interesting to compare the iterations with those of the augmented
Lagrangian method (Figure 20.3(b)).

Figure 20.3: SQP and augmented Lagrangian: iterations for Example 20.3.
(a) SQP; (b) augmented Lagrangian.

Table 20.5: Illustration of the local SQP algorithm for Example 20.3; x0 = (0.5, 1.3)T
k x1 x2 λ ‖∇L(xk , λk )‖
0 5.00000e-01 1.30000e+00 1.00000e+00 8.10701e+00
1 5.24055e-01 9.29210e-01 -1.14433e+00 1.59951e+00
2 9.35594e-01 6.22819e-01 -1.71786e+00 6.44709e-01
3 1.38229e+00 -2.59531e-01 -1.60029e+00 1.00534e+00
4 1.08314e+00 3.14974e-02 -1.55178e+00 1.78831e-01
5 1.00374e+00 -3.25035e-03 -1.50552e+00 1.09864e-02
6 1.00001e+00 3.61321e-05 -1.50003e+00 5.99917e-05
7 1.00000e+00 -1.92868e-09 -1.50000e+00 2.50640e-09
8 1.00000e+00 2.67876e-18 -1.50000e+00 2.67876e-18

We note that the latter attempts to follow the constraint, thereby requiring more
iterations. The SQP method is much faster in this case. However, let us keep in mind
that the local SQP method is not globally convergent.

Example 20.4 (SQP with the constrained Rosenbrock problem). Consider again the
problem of Example 19.6, i.e.,

    min_{x∈R2} 100 (x2 − x1^2)^2 + (1 − x1)^2

subject to

    x1 − x2^2 − 1/2 = 0 ,

and use the SQP method to solve it. The iterations are listed in Table 20.6 and
illustrated in Figure 20.4.

Table 20.6: Illustration of the local SQP algorithm for Example 20.4; x0 = (−1, 0)T , λ0 = 1
k x1 x2 λ ‖∇L(xk , λk )‖
0 -1.00000e+00 0.00000e+00 1.00000e+00 4.49901e+02
1 5.00000e-01 -2.02020e+00 -5.90919e+02 2.84494e+03
2 4.86136e-01 -1.00667e+00 -2.34944e+02 7.21646e+02
3 3.95205e-01 -4.51284e-01 -7.00969e+01 1.86409e+02
4 3.54255e-01 -6.41642e-02 -1.84751e+01 4.09255e+01
5 4.73514e-01 1.74309e-01 -1.30510e+01 7.15123e+00
6 5.71075e-01 2.91031e-01 -5.93423e+00 3.76956e+00
7 6.39843e-01 3.85770e-01 -4.56241e+00 1.42789e+00
8 6.76129e-01 4.21168e-01 -8.56764e+00 5.16695e-01
9 6.63972e-01 4.05248e-01 -8.74417e+00 5.46174e-02
10 6.64028e-01 4.05004e-01 -8.87130e+00 6.31475e-05
11 6.64029e-01 4.05006e-01 -8.87139e+00 7.79619e-10
12 6.64029e-01 4.05006e-01 -8.87139e+00 8.88178e-15
13 6.64029e-01 4.05006e-01 -8.87139e+00 0.00000e+00

It is interesting to note that, contrary to the augmented Lagrangian method (Example


19.6), the SQP method is able to solve the problem with high precision.

20.2 Globally convergent algorithm


In order to render the SQP method globally convergent, we take inspiration from the
descent methods of Chapter 11. In the context of unconstrained optimization, the aim
was to identify a descent direction and calculate an appropriate step in this direction,
in order for the new iterate to be “significantly better” than the last one. In this
context, the concept of “significantly better” corresponds to a sufficient decrease of
the objective function. In the context of constrained optimization, it is not so simple.
Indeed, an iterate can be better than the previous one for two reasons: the value of
the objective function is lower, or the iterate is closer to the feasible set. These two
objectives are often conflicting, in the sense that we generally have to increase the

Figure 20.4: Iterations of the SQP method for Example 20.4, from x0 to x∗ .

value of the objective function in order to satisfy the constraints. To identify whether
an iterate is “significantly better,” we have to combine the two aspects in a function
called the merit function. This is similar to the idea developed in the context of the
augmented Lagrangian algorithm. It is referred to as exact if the optimal solution to
the constrained optimization problem (20.1)–(20.2) is a local minimum of the merit
function.

Definition 20.5 (Exact merit function). Consider the constrained optimization
problem (1.71)–(1.74). A function φ : Rn → R is an exact merit function of the
problem if each local minimum x∗ of the problem (1.71)–(1.74) is also a local
minimum of the unconstrained function φ.

For the problem (20.1)–(20.2), the exact merit function that is used the most is

    φc (x) = f(x) + c ‖h(x)‖1 = f(x) + c Σ_{i=1}^{m} |hi (x)| .        (20.13)

We demonstrate that this is an exact merit function, at least when c is sufficiently
large.

Theorem 20.6 (Exact merit function ℓ1 ). Let f : Rn → R and h : Rn → Rm
be twice differentiable and let us take the optimization problem (20.1)–(20.2),
minx∈Rn f(x) subject to h(x) = 0. Let x∗ and λ∗ satisfy the sufficient optimality
conditions (6.23)–(6.24). If

    c > max_{i=1,...,m} |λ∗i | ,                              (20.14)

the function (20.13) is an exact merit function for this problem.

Proof. Take ε > 0 such that f(x∗ ) ≤ f(x) for all x such that h(x) = 0 and
‖x − x∗ ‖ ≤ ε. We define the following optimization problems:
Perturbed problem. Take δ ∈ Rm . The perturbed problem is

    min_{x∈Rn} f(x)

subject to

    h(x) = δ
    ‖x − x∗ ‖ ≤ ε ,

for which the optimal value is denoted by p(δ). According to the sensitivity theorem
(Theorem 6.24), we have

    ∇p(0) = −λ∗ .                                            (20.15)

Relaxed problem. Take δ ∈ Rm and c > 0. The relaxed problem is

    min_{x∈Rn} φc (x) = f(x) + c Σ_{i=1}^{m} |hi (x)|

subject to

    h(x) = δ
    ‖x − x∗ ‖ ≤ ε ,

for which the optimal value is denoted by pc (δ). We can also write the objective
function

    min_{x∈Rn} f(x) + c Σ_{i=1}^{m} |δi | .

As c Σ_{i=1}^{m} |δi | does not depend on x, the relaxed problem is equivalent to the
perturbed problem, up to a shift of the objective function. Therefore,

    pc (δ) = p(δ) + c Σ_{i=1}^{m} |δi | .                    (20.16)

In particular, we have

    pc (0) = p(0) .                                          (20.17)


Auxiliary problem. Take c > 0 and ∆(ε) = δ | ∃x such that h(x) = δ and kx−x∗k <
ε . The auxiliary problem is
minm pc (δ)
δ∈R

subject to δ ∈ ∆(ε).
Ce document est la propriété exclusive de Kavyaa Kannan (kk392@snu.edu.in) - jeudi 18 avril 2024 à 07h48

We first demonstrate that δ = 0 is an optimal solution to the auxiliary problem.


Using Taylor’s theorem (Theorem C.2), we have for any δ feasible for the auxiliary
problem
    pc (δ) = p(δ) + c Σ_{i=1}^{m} |δi |                                      from (20.16)
           = p(0) + δT ∇p(0) + (1/2) δT ∇2 p(ᾱδ) δ + c Σ_{i=1}^{m} |δi |     using (C.4)
           = p(0) − δT λ∗ + (1/2) δT ∇2 p(ᾱδ) δ + c Σ_{i=1}^{m} |δi | ,      from (20.15)

where 0 ≤ ᾱ ≤ 1. Take γ > 0 and c ≥ max_{i=1,...,m} |λ∗i | + γ. Then,

    c Σi |δi | ≥ max_i |λ∗i | Σi |δi | + γ Σi |δi |
               ≥ Σi δi λ∗i + γ Σi |δi |
               = δT λ∗ + γ Σi |δi | .

Then,

    pc (δ) ≥ p(0) − δT λ∗ + (1/2) δT ∇2 p(ᾱδ) δ + δT λ∗ + γ Σi |δi |

or

    pc (δ) ≥ p(0) + (1/2) δT ∇2 p(ᾱδ) δ + γ Σi |δi | .

We can make δ sufficiently close to 0, so that γ Σi |δi | dominates (1/2) δT ∇2 p(ᾱδ) δ,
so that

    (1/2) δT ∇2 p(ᾱδ) δ + γ Σi |δi | > 0 ,

and

    pc (δ) > p(0) .

Using (20.17), we have

    pc (δ) > p(0) = pc (0) ,                                 (20.18)

and δ = 0 is the optimal solution of the auxiliary problem.



We assume that δ ≠ 0, but sufficiently close to 0, and x such that h(x) = δ (then,
x is infeasible for the initial problem) and satisfying ‖x − x∗ ‖ < ε. Since pc (δ) is the
optimal value of the relaxed problem, we have

    pc (δ) ≤ f(x) + c Σ_{i=1}^{m} |hi (x)| .

According to (20.18), we have

    φc (x∗ ) = f(x∗ ) = pc (0) < f(x) + c Σ_{i=1}^{m} |hi (x)| = φc (x) ,

and φc (x∗ ) is better than all the infeasible x in a neighborhood of x∗ .
When x is feasible, i.e., h(x) = 0, then φc (x) = f(x). Since x∗ is a local minimum
of the initial problem, we have φc (x∗ ) = f(x∗ ) ≤ f(x) = φc (x) if ‖x − x∗ ‖ < ε, and x∗
is indeed a local minimum of φc (x).
It is interesting to graphically analyze this function. The level curves of the
merit function for the problem in Example 20.2 are shown in Figure 20.5. When
c > |λ∗ | = √2/2 ≈ 0.707, the minimum of the merit function (xm in the graph)
corresponds to the optimal solution of the initial problem

    x∗ = (−√2/2 , 1 − √2/2)T .

The level curves of the merit function for the problem in Example 20.3 are shown in
Figure 20.6. When c > |λ∗ | = 1.5, the minimum of the merit function corresponds to
the optimal solution of the initial problem x∗ = (1, 0)T .
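The role of the bound (20.14) can also be checked numerically on Example 20.3, where λ∗ = −1.5 and x∗ = (1, 0). A minimal sketch (plain Python, names ours):

```python
def f(x): return 2.0 * (x[0] ** 2 + x[1] ** 2 - 1.0) - x[0]
def h(x): return x[0] ** 2 + x[1] ** 2 - 1.0

def merit(x, c):
    # l1 exact merit function (20.13), single constraint.
    return f(x) + c * abs(h(x))

x_star = (1.0, 0.0)
inside = (1.0 - 1e-3, 0.0)   # slightly infeasible point inside the circle
```

With c = 2 > |λ∗ | the merit of the nearby infeasible point exceeds φc (x∗ ); with c = 1 < |λ∗ | the small step inside the circle already decreases the merit, so x∗ is no longer a local minimum of φc .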
In order to render Algorithm 20.1 globally convergent, we use the same ideas as
in the unconstrained case, where the merit function plays the role of the objective
function when the notion of “significantly better” is required. The line search methods
(Chapter 11) based on the Wolfe conditions and the trust region methods (Chapter
12) can be used in this context. We give a detailed description of the algorithm based
on line search.
The Wolfe conditions (11.45) and (11.47) should here be translated as

φc (xk + αk dk ) ≤ φc (xk ) + αk β1 ∇φc (xk )T dk

and
∇φc (xk + αk dk )T dk ≥ β2 ∇φc (xk )T dk
with 0 < β1 < β2 < 1. Unfortunately, the merit function (20.13) is not differentiable,
especially when x is feasible. It is not permitted to use ∇φc (xk ), which does not exist
everywhere. However, we do not need the gradient itself, but only the directional
derivative. And it is important that the latter is negative, in order for dk to be a
descent direction for the merit function.

Figure 20.5: Merit function for Example 20.2.
(a) c = 0; (b) c = 0.3; (c) c = 0.6; (d) c = 0.9; (e) c = 1.2; (f) c = 1.5.


Figure 20.6: Merit function for Example 20.3.
(a) c = 0; (b) c = 0.6; (c) c = 1.1; (d) c = 1.6; (e) c = 2.1; (f) c = 2.6.



Theorem 20.7 (Directional derivative of the merit function). Let dx and dλ satisfy
the conditions (20.7) and (20.8). Then, the directional derivative of φc in the
direction dx is

    φ′c (xk ; dx ) = ∇f(xk )T dx − c ‖h(xk )‖1 .

Proof. According to Taylor's theorem (Theorem C.2), there exist αf and αhi such
that

    f(xk + αdx ) = f(xk ) + αdTx ∇f(xk ) + (1/2) α2 dTx ∇2 f(xk + ααf d) dx

and, for all i = 1, . . . , m,

    hi (xk + αdx ) = hi (xk ) + αdTx ∇hi (xk ) + (1/2) α2 dTx ∇2 hi (x + ααhi d) dx .

Let M be an upper bound on the eigenvalues of ∇2 f(xk + ααf d) and ∇2 hi (x + ααhi d),
i = 1, . . . , m, such that

    −M‖dx ‖2 ≤ dTx ∇2 f(xk + ααf d) dx ≤ M‖dx ‖2           (20.19)

and

    −M‖dx ‖2 ≤ dTx ∇2 hi (xk + ααhi d) dx ≤ M‖dx ‖2 .      (20.20)
Then,

    φc (xk + αdx ) − φc (xk ) = f(xk + αdx ) + c ‖h(xk + αdx )‖1 − f(xk ) − c ‖h(xk )‖1
        ≤ f(xk ) + α∇f(xk )T dx + (1/2) α2 M‖dx ‖2
          + c ‖h(xk ) + α∇h(xk )T dx ‖1 + (1/2) α2 M‖dx ‖2
          − f(xk ) − c ‖h(xk )‖1
        = α∇f(xk )T dx + α2 M‖dx ‖2 + c ‖h(xk ) + α∇h(xk )T dx ‖1 − c ‖h(xk )‖1 .

By using (20.8), we obtain

    φc (xk + αdx ) − φc (xk ) ≤ α∇f(xk )T dx + α2 M‖dx ‖2
                                + c ‖h(xk ) − αh(xk )‖1 − c ‖h(xk )‖1 .

If we assume that α < 1, such that 1 − α > 0, we get

    φc (xk + αdx ) − φc (xk ) ≤ α ( ∇f(xk )T dx − c ‖h(xk )‖1 ) + Mα2 ‖dx ‖2 .

By using the lower bounds of (20.19) and (20.20) instead of the upper bounds, we
similarly obtain

    φc (xk + αdx ) − φc (xk ) ≥ α ( ∇f(xk )T dx − c ‖h(xk )‖1 ) − Mα2 ‖dx ‖2 .

Therefore,

    ∇f(xk )T dx − c ‖h(xk )‖1 − Mα ‖dx ‖2
        ≤ ( φc (xk + αdx ) − φc (xk ) ) / α
        ≤ ∇f(xk )T dx − c ‖h(xk )‖1 + Mα ‖dx ‖2 .

When α → 0, we obtain

    φ′c (xk ; dx ) = ∇f(xk )T dx − c ‖h(xk )‖1 .           (20.21)

Therefore, if the parameter c is chosen such that

    c > ∇f(xk )T dx / ‖h(xk )‖1 ,

the direction dx is a descent direction for the merit function. Unfortunately, this
condition may generate large values for c. We perform a detailed analysis of this
directional derivative in order to find another value of c that ensures that dx is a
descent direction for φc .
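Formula (20.21) can be verified by finite differences. On the problem of Example 20.3, at x = (0.5, 1.3) we have h(x) = 0.94 and ∇h(x) = (1, 2.6); the direction d = (−0.94, 0) satisfies condition (20.8), and with c = 2 the predicted derivative is ∇f(x)T d − c ‖h(x)‖1 = −0.94 − 1.88 = −2.82. A quick check (plain Python, names ours):

```python
def f(x): return 2.0 * (x[0] ** 2 + x[1] ** 2 - 1.0) - x[0]
def h(x): return x[0] ** 2 + x[1] ** 2 - 1.0
def phi(x, c): return f(x) + c * abs(h(x))   # merit function (20.13)

x, c = (0.5, 1.3), 2.0
d = (-h(x) / (2.0 * x[0]), 0.0)   # satisfies grad h(x)^T d = -h(x)
a = 1e-7                          # finite-difference step
fd = (phi((x[0] + a * d[0], x[1] + a * d[1]), c) - phi(x, c)) / a
# fd is approximately -2.82, matching (20.21).
```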

Theorem 20.8 (Descent direction for the merit function). Let dx and dλ satisfy
the conditions (20.7) and (20.8). Then, the directional derivative of φc in the
direction dx , denoted by φ′c (xk ; dx ), is such that

    φ′c (xk ; dx ) ≤ −dTx ∇2xx L(xk , λk ) dx − ( c − ‖dλ ‖∞ ) ‖h(xk )‖1 .

Proof. According to Theorem 20.7, we have

    φ′c (xk ; dx ) = ∇f(xk )T dx − c ‖h(xk )‖1 .

By using (20.7), we obtain

    ∇f(xk )T dx = −dTx ∇2xx L(xk , λk ) dx − dTx ∇h(xk ) dλ .

According to (20.8), this gives

    ∇f(xk )T dx = −dTx ∇2xx L(xk , λk ) dx + h(xk )T dλ

and

    φ′c (xk ; dx ) = −dTx ∇2xx L(xk , λk ) dx + h(xk )T dλ − c ‖h(xk )‖1 .

Applying the Cauchy-Schwartz inequality (C.20),

    h(xk )T dλ ≤ ‖h(xk )‖1 ‖dλ ‖∞ ,

produces the result.



Algorithm 20.2: Globalized SQP algorithm


1 Objective
2 To find a local minimum of the problem (1.71)–(1.72), minx∈Rn f(x)
subject to h(x) = 0.

3 Input
4 f : Rn → R, ∇f : Rn → Rn , ∇2 f : Rn → Rn×n .
5 h : Rn → Rm , ∇h : Rn → Rn×m , ∇2 hi : Rn → Rn×n , i = 1, . . . , m.
6 A parameter 0 < β1 < 1 (by default: β1 = 0.3).
7 A parameter c̄ > 0 (by default: c̄ = 0.1).
8 An initial solution (x0 , λ0 ).
9 The required precision ε > 0.
10 Output
11 An approximation of the optimal solution (x∗ , λ∗ ).
12 Initialization
13 k := 0.
14 c0 := ‖λ0 ‖∞ + c̄.
15 Repeat
m
X
16 Calculate ∇2xx L(xk , λk ) = ∇2 f(xk ) + (λk )i ∇2 hi (xk ) .
i=1
17 Find a positive definite approximation Hk of ∇2xx L(xk , λk ) (e.g. using
Algorithm 11.7).
18 Obtain dx and dλ by solving the quadratic problem
   min_d ∇f(xk )T d + (1/2) dT Hk d subject to ∇h(xk )T d + h(xk ) = 0 with an
   appropriate algorithm (to illustrate the method, we may use (20.11) and
   (20.12)).
19 c+ := ‖dλ ‖∞ + c̄.
20 Update the penalty parameter
21 If ck ≥ 1.1 c+ , ck+1 := 12 (ck + c+ ).
22 If c+ ≤ ck < 1.1 c+ , ck+1 := ck .
23 If ck < c+ , ck+1 := max(1.5 ck , c+ ).
24 φc′ k (xk ; dx ) := ∇f(xk )T dx − ck h(xk ) 1 .
25 Calculate the step
26 i = 0, αi = 1.
27 while φck (xk + αi dk ) > φck (xk ) + αi β1 φc′ k (xk ; dx ) do
28 αi+1 := αi /2
29 i := i + 1
30 α := αi .
31 xk+1 := xk + αdx .
32 λk+1 := dλ .
33 k := k + 1.
34 Until ∇L(xk , λk ) ≤ ε.

Then, dx is a descent direction for the merit function if

c > ‖dλ‖∞ − (dTx ∇2xx L(xk , λk )dx) / ‖h(xk)‖1 .

If the matrix ∇2xx L(xk , λk ) is positive definite, we need only

c > ‖dλ‖∞ . (20.22)
This condition is consistent with (20.14).
In practice, the matrix ∇2xx L(xk , λk ) is not always positive definite. Therefore, as
presented in Chapter 11, we replace this matrix with a positive definite approxima-
tion, using for instance a modified Cholesky factorization (Algorithm 11.7).
In practice, the choice of c is delicate. We here adopt the procedure presented
by Bonnans et al. (1997), where the parameter ck is updated at each iteration. Take
c+ = ‖dλ‖∞ + c̄, where c̄ is a positive constant. The value of the parameter is chosen
in line with (20.22).
• If ck−1 ≥ 1.1 c+ , then the parameter is too large, and is reduced to the average
value between ck−1 and c+ , i.e.,

ck = (ck−1 + c+ )/2 .
• If 1.1 c+ ≥ ck−1 ≥ c+ , then the value of the parameter is good, and we leave it as
it is, i.e.,
ck = ck−1 .
• In the other cases, the parameter has to be increased. In order to significantly
increase it, we impose a minimum augmentation of 50 %, i.e.,
ck = max(1.5 ck−1 , c+ ) .
The algorithm is described as Algorithm 20.2.
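The three update rules above can be written as one small function. A minimal sketch (the function name is ours; c_plus stands for c+ = ‖dλ‖∞ + c̄):

```python
# Update rule for the penalty parameter c_k (steps 21-23 of Algorithm 20.2),
# following the procedure of Bonnans et al. (1997).

def update_penalty(c_prev, c_plus):
    if c_prev >= 1.1 * c_plus:
        # Parameter too large: move halfway back toward c_plus.
        return 0.5 * (c_prev + c_plus)
    if c_prev >= c_plus:
        # Parameter in the acceptable band [c_plus, 1.1 c_plus): keep it.
        return c_prev
    # Parameter too small: increase it by at least 50 %.
    return max(1.5 * c_prev, c_plus)

print(update_penalty(10.0, 5.0))  # too large: (10 + 5) / 2 = 7.5
print(update_penalty(5.2, 5.0))   # acceptable band: kept at 5.2
print(update_penalty(2.0, 5.0))   # too small: max(3.0, 5.0) = 5.0
```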

Comments
• The second Wolfe condition has not been used here, first of all in order to sim-
plify the description of the algorithm and secondly because it is not necessary in
practice. Moreover, it requires calculations of the directional derivative for each
candidate.
• The matrix Hk can also be constructed by using the update formulas defined in
Chapter 13. If the BFGS method is used, it is important to note that the condition
dT y > 0 (Theorem 13.2) is not automatically satisfied in this context.
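The step computation in lines 26–30 of Algorithm 20.2 is a plain backtracking line search on the merit function, using only the first Wolfe (Armijo) condition. A minimal sketch, with phi the merit function restricted to the search direction (the one-dimensional profile used below is hypothetical illustration data):

```python
# Backtracking line search of Algorithm 20.2 (steps 26-30): halve alpha until
# phi(alpha) <= phi(0) + alpha * beta1 * phi'(0), the Armijo condition.
# Requires dphi0 < 0 (a descent direction) for the loop to terminate.

def backtrack(phi, phi0, dphi0, beta1=0.3):
    alpha = 1.0
    while phi(alpha) > phi0 + alpha * beta1 * dphi0:
        alpha /= 2.0
    return alpha

# Hypothetical merit profile along the direction: phi(alpha) = 1 - alpha + 2 alpha^2,
# so phi(0) = 1 and the directional derivative at 0 is -1.
alpha = backtrack(lambda a: 1.0 - a + 2.0 * a * a, 1.0, -1.0)
print(alpha)  # 0.25: alpha = 1 and alpha = 0.5 fail the Armijo test
```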
We apply this algorithm to Example 20.2. We keep in mind that the local SQP
algorithm does not always converge toward a local minimum, as illustrated in Fig-
ure 20.2. The globalized algorithm, on the other hand, converges toward a local
minimum for the two starting points (Figure 20.7). Tables 20.7 and 20.8 provide a
detailed list of the iterations, where the parameter τ indicates the multiple of the
identity that had to be added to the matrix ∇2xx L for it to be positive definite. It
is interesting to note that, during the last iterations, τ = 0 and α = 1 and these
iterations are thus equivalent to those of the local SQP method.

Table 20.7: Illustration of the globalized SQP algorithm for Example 20.2; x0 = (−0.1, 1)T

k   x1            x2            λ             c        α        τ        ‖∇L(xk , λk )‖
0   -1.00000e-01  1.00000e+00   1.00000e+00   4.5e+01  2.5e-01  0.0e+00  1.61867e+00
1   -1.33750e+00  8.75000e-01  -4.45000e+01   2.5e+01  1.0e+00  1.3e+02  1.20651e+02
2   -1.03707e+00  8.78487e-01   4.51420e+00   1.3e+01  1.0e+00  0.0e+00  8.36410e+00
3   -9.82831e-01  7.87058e-01   7.18209e-01   6.7e+00  1.2e-01  0.0e+00  8.07145e-01
4   -9.68037e-01  7.22096e-01   5.95218e-01   3.7e+00  1.2e-01  0.0e+00  6.86454e-01
5   -9.47328e-01  6.53182e-01   6.18375e-01   2.2e+00  2.5e-01  0.0e+00  5.96563e-01
6   -9.03900e-01  5.40943e-01   6.41192e-01   1.5e+00  5.0e-01  0.0e+00  4.41901e-01
7   -8.20325e-01  3.91504e-01   6.71728e-01   1.1e+00  1.0e+00  0.0e+00  2.13531e-01
8   -7.11368e-01  2.80115e-01   6.98734e-01   9.8e-01  1.0e+00  0.0e+00  2.56963e-02
9   -7.07221e-01  2.92880e-01   7.06945e-01   8.9e-01  1.0e+00  0.0e+00  2.84644e-04
10  -7.07107e-01  2.92893e-01   7.07107e-01   8.5e-01  1.0e+00  0.0e+00  3.93262e-08
11  -7.07107e-01  2.92893e-01   7.07107e-01   8.5e-01  1.0e+00  0.0e+00  1.42178e-15

Table 20.8: Illustration of the globalized SQP algorithm for Example 20.2; x0 = (0.1, 1)T

k   x1            x2            λ             c        α        τ        ‖∇L(xk , λk )‖
0    1.00000e-01  1.00000e+00   1.00000e+00   5.5e+01  2.5e-01  0.0e+00  1.84935e+00
1    1.33750e+00  8.75000e-01  -5.45000e+01   3.0e+01  1.0e+00  1.5e+02  1.45526e+02
2    1.03710e+00  8.78856e-01   4.69637e+00   1.5e+01  1.0e+00  0.0e+00  1.07425e+01
3    9.80472e-01  7.66569e-01  -2.25676e-01   7.7e+00  1.6e-02  6.4e-01  1.23808e+00
4    9.57039e-01  6.68675e-01  -3.66978e-01   4.1e+00  3.1e-02  1.0e+00  1.27856e+00
5    9.13886e-01  5.45237e-01  -3.03114e-01   2.2e+00  6.2e-02  8.6e-01  1.35205e+00
6    7.64062e-01  2.47039e-01  -2.17779e-01   1.1e+00  6.2e-02  6.2e-01  1.49377e+00
7    4.17063e-01 -9.88200e-02   1.08475e-03   7.5e-01  2.0e-03  0.0e+00  1.46372e+00
8   -6.68621e-01 -5.10558e-01   2.46922e-01   6.6e-01  5.0e-01  0.0e+00  1.87138e+00
9   -1.03459e+00 -6.24439e-02   4.77505e-01   9.9e-01  1.0e+00  0.0e+00  1.19931e+00
10  -7.66609e-01  2.40944e-01   6.06968e-01   9.0e-01  1.0e+00  0.0e+00  1.94510e-01
11  -7.08587e-01  2.90278e-01   6.98162e-01   8.5e-01  1.0e+00  0.0e+00  1.50533e-02
12  -7.07117e-01  2.92897e-01   7.07078e-01   8.5e-01  1.0e+00  0.0e+00  5.43054e-05
13  -7.07107e-01  2.92893e-01   7.07107e-01   8.5e-01  1.0e+00  0.0e+00  6.60568e-10
14  -7.07107e-01  2.92893e-01   7.07107e-01   8.5e-01  1.0e+00  0.0e+00  4.96507e-16

[Figure: two panels showing the iterates of the globalized SQP algorithm converging
from the starting point x0 to the local minimum x∗]

(a) x0 = (−0.1, 1)T   (b) x0 = (0.1, 1)T

Figure 20.7: Illustrations of the globalized SQP algorithm for Example 20.2

20.3 Project
The general organization of the projects is described in Appendix D.

Objective
The aim of this project is to implement the SQP algorithm, to test it on several
problems, and to compare the local version with its globalized counterpart.

Approach
1. Solve the problems with Algorithm 20.2. Let x∗ be a local optimum.
2. Randomly generate several starting points in a ball centered at x∗ of radius ε with,
for instance, ε = 1, 10, 100, 1,000.
3. For each value of ε, perform statistics on the number of times that the local
algorithm converges.
4. For each starting point from which the local algorithm converges, compare the
number of iterations for the two algorithms. What is the impact of the globaliza-
tion on the efficiency of the method?
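Step 2 of the approach requires sampling starting points uniformly in a ball around x∗. One standard recipe (a hypothetical helper, not prescribed by the book): draw a Gaussian direction, normalize it to the unit sphere, and scale by ε u^(1/n). The example below uses the solution of Example 20.2 from Table 20.7 as x∗.

```python
import math
import random

# Draw a point uniformly in the ball of radius eps centered at x_star:
# normalized Gaussian direction (uniform on the sphere), radius eps * u^(1/n).

def sample_in_ball(x_star, eps, rng=random):
    n = len(x_star)
    d = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(di * di for di in d))
    r = eps * rng.random() ** (1.0 / n)
    return [xi + r * di / norm for xi, di in zip(x_star, d)]

x_star = [-0.707107, 0.292893]  # solution of Example 20.2 (Table 20.7)
x0 = sample_in_ball(x_star, 10.0)
dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x0, x_star)))
print(dist <= 10.0)  # the sample always lies inside the ball
```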

Algorithms
Algorithms 20.1 and 20.2.

Problems
Exercise 20.1. The problem
min e^{x1 −2x2}

subject to
sin(−x1 + x2 − 1) = 0
−2 ≤ x1 ≤ 2
−1.5 ≤ x2 ≤ 1.5 .
Ce document est la propriété exclusive de Kavyaa Kannan (kk392@snu.edu.in) - jeudi 18 avril 2024 à 07h48

Exercise 20.2. The problem

min_{x∈R10} Σ_{i=1}^{10} e^{xi} ( ci + xi − ln ( Σ_{k=1}^{10} e^{xk} ) )

subject to

e^{x1} + 2e^{x2} + 2e^{x3} + e^{x6} + e^{x10} = 2
e^{x4} + 2e^{x5} + e^{x6} + e^{x7} = 1
e^{x3} + e^{x7} + e^{x8} + 2e^{x9} + e^{x10} = 1
−100 ≤ xi ≤ 100 , i = 1, . . . , 10 ,

with
c1 = −6.089 c6 = −14.986
c2 = −17.164 c7 = −24.1
c3 = −34.054 c8 = −10.708
c4 = −5.914 c9 = −26.662
c5 = −24.721 c10 = −22.179 .
Solution:
x∗1 = −3.40629 x∗6 = −4.44283
x∗2 = −0.596369 x∗7 = −1.41264
x∗3 = −1.1912 x∗8 = −21.6066
x∗4 = −4.62689 x∗9 = −2.26867
x∗5 = −1.0011 x∗10 = −1.40346 .
Proposed by Hock and Schittkowski (1981).
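The reported solution can be sanity-checked against the three equality constraints; a small sketch (the residuals should vanish up to the precision of the printed digits):

```python
import math

# Check that the solution reported for Exercise 20.2 satisfies the three
# equality constraints, up to the precision of the six printed digits.

x = [-3.40629, -0.596369, -1.1912, -4.62689, -1.0011,
     -4.44283, -1.41264, -21.6066, -2.26867, -1.40346]
e = [math.exp(xi) for xi in x]

# Residuals of the three equality constraints (indices shifted by one:
# e[0] is e^{x1}, ..., e[9] is e^{x10}).
g1 = e[0] + 2 * e[1] + 2 * e[2] + e[5] + e[9] - 2.0
g2 = e[3] + 2 * e[4] + e[5] + e[6] - 1.0
g3 = e[2] + e[6] + e[7] + 2 * e[8] + e[9] - 1.0

print(g1, g2, g3)  # all close to zero
```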
Exercise 20.3. The problem

min ln(1 + x21 ) − x2


x∈R2

subject to
2
1 + x21 + x22 = 4
−4 ≤ x1 ≤ 4
−4 ≤ x2 ≤ 4 .
Please note: do not forget to transform the formulation of the problems so that
they are compatible with (20.1)–(20.2).

Part VI

Networks

The richest people in the world look


for and build networks, everyone
else just looks for work.

Robert Kiyosaki

Our daily life is full of networks. We drive on a network of roads and highways. We
receive water and electricity at home through the corresponding supply networks.
Our houses are connected to a network of sewers for the evacuation of waste water.
Our computers communicate over the internet, and our wireless phones are connected
through a network of antennas. We exchange messages and pictures with our friends
on social networks. We participate in professional meetings for the sake of “network-
ing.” Our brains generate ideas and emotions from a network of neurons. In this
book, we define a network as a mathematical object, with interesting properties that
are exploited to solve optimization problems. The analogy with “real” networks allows
us to develop intuitions about these properties. But the mathematical abstraction is
also useful for applications that have nothing to do with networks in real life.

Chapter 21

Introduction and definitions

Contents
21.1 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
21.2 Cuts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
21.3 Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
21.4 Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
21.5 Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
21.5.1 Flows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
21.5.2 Capacities . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
21.5.3 Supply and demand . . . . . . . . . . . . . . . . . . . . . 504
21.5.4 Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
21.5.5 Network representation . . . . . . . . . . . . . . . . . . . 508
21.6 Flow decomposition . . . . . . . . . . . . . . . . . . . . . . 510
21.7 Minimum spanning trees . . . . . . . . . . . . . . . . . . . 520
21.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524

Generally speaking, a network is a system of interconnected people or things. The


main feature of most networks is that the global complexity of the network is high,
while the local complexity is low. For instance, you need a simple connection to the
nearest WiFi antenna to connect your smartphone to the entire World Wide Web.
When your distribution network operator has connected your new house to the grid,
you can consume electricity that may have been produced in a different country. If
you create a funny video and post it on YouTube, you can create a “buzz” and reach
potentially hundreds of thousands of people around the world.
The mathematical object called “network” shares the same feature: it is able to
capture complex structures using simple elements. The analogy between the math-
ematical object and real networks is useful to develop intuitions. However, a con-
sequence is that the vocabulary and definitions used in the literature vary slightly
from one reference to the next. In order to avoid any ambiguity, we provide here the
definitions of the concepts that are used in this part of the book.

21.1 Graphs
The element defining the structure of a network is called a graph, composed of
vertices (or nodes) and edges (or arcs). The vertices are the entities that are
interconnected, and the edges represent the connections. For instance, in the water supply

network, the vertices are the houses of the customers, the tanks where the water
is stored, the water treatment plants, the pumping stations, and so on. The edges
are the physical pipes connecting these entities. On Facebook, the vertices are the
registered individuals, and an edge represents a friendship connection between two
persons. The relationship between the edges and the vertices that they connect is
captured by a function called the incidence function. A graph is defined by a set of
vertices, a set of edges, and an incidence function.

Definition 21.1 (Graph). A graph is a triple (V, E, φ), where V is a finite set of
elements called vertices, E is a finite set of elements called edges, and φ : E → P2 (V)
is the incidence function, mapping the set of edges into the set P2 (V) of all 2-element
subsets of vertices.

It is common to represent a graph using a picture where vertices are represented


by dots or circles, and edges by lines or arcs connecting the vertices. Consider the
graph with
V = {v1 , v2 , v3 , v4 , v5 },
E = {e1 , e2 , e3 , e4 , e5 , e6 , e7 , e8 }
and the incidence function defined by

φ(e1 ) = {v1 , v2 }, φ(e2 ) = {v2 , v3 },


φ(e3 ) = {v1 , v3 }, φ(e4 ) = {v3 , v5 },
φ(e5 ) = {v2 , v4 }, φ(e6 ) = {v2 , v4 },
φ(e7 ) = {v4 , v5 }, φ(e8 ) = {v4 , v5 }.

The graph is illustrated in Figure 21.1.
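Definition 21.1 maps directly onto a small data structure: two sets and a dictionary for the incidence function. The following sketch (our own encoding, not from the book) represents the example graph above; φ maps each edge to a frozenset, since the 2-element subset of vertices is unordered.

```python
# The graph of Definition 21.1 as a triple (V, E, phi): phi maps each edge to
# the 2-element set of vertices it connects (frozenset: the pair is unordered).

V = {"v1", "v2", "v3", "v4", "v5"}
E = {"e1", "e2", "e3", "e4", "e5", "e6", "e7", "e8"}
phi = {
    "e1": frozenset({"v1", "v2"}), "e2": frozenset({"v2", "v3"}),
    "e3": frozenset({"v1", "v3"}), "e4": frozenset({"v3", "v5"}),
    "e5": frozenset({"v2", "v4"}), "e6": frozenset({"v2", "v4"}),
    "e7": frozenset({"v4", "v5"}), "e8": frozenset({"v4", "v5"}),
}

# Degree of a vertex: number of edges incident to it.
def degree(v):
    return sum(1 for e in E if v in phi[e])

print(degree("v4"))  # 4: edges e5, e6, e7, e8 are incident to v4
```

Note that nothing prevents two edges (here e5 and e6) from sharing the same pair of vertices: the incidence function of this graph is not injective.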


A graph defined by a subset of vertices and edges of another graph is called a
subgraph.

Definition 21.2 (Subgraph). The graph (V ′ , E ′ , φ ′ ) is a subgraph of (V, E, φ) if


• V ′ ⊆ V,

• E ⊆ E,

• φ ′ (e) = φ(e), for each e ∈ E ′ ,


• for each e ∈ E ′ , if φ ′ (e) = {i, j}, then i and j both belong to V ′ .

In the definition 21.1 of a graph, an edge connects two vertices, and their order is
not specified. The names “arc” and “node,” used instead of “edge” and “vertex,” imply

[Figure: the five vertices v1, . . . , v5 connected by the eight edges e1, . . . , e8
defined by the incidence function above]

Figure 21.1: An example of a graph

that the underlying connection is directed, meaning that the arc connecting node i to
node j is different from the arc that connects node j to node i (if there is one). The
graph is then said to be directed. In this book, we consider mainly directed graphs,
with the exception of Section 21.7 on minimum spanning trees, where undirected
graphs are considered.

Definition 21.3 (Directed graph). A directed graph is a triple (N , A, φ), where N


is a finite set of elements called nodes, A is a finite set of elements called arcs, and
φ : A → N × N is the incidence function, mapping the set of arcs into the set of
pairs of nodes.

Note that this definition potentially allows several arcs to connect the same pair
of nodes. In this book, we focus on networks such that the incidence function is
injective.1 It means that if we select two distinct arcs a1 , a2 ∈ A, a1 ≠ a2 , then the
pairs of nodes incident to these two arcs must be different, that is φ(a1 ) ≠ φ(a2 ).
In other words, when the incidence function is injective, each ordered pair of nodes
is connected by either 0 or 1 arc. In this case, we can use the notation (i, j) without
ambiguity to identify the arc representing the connection between node i and node
j. When we mention the arc (i, j), we refer to the arc a such that φ(a) = (i, j). As
φ is injective, if such an arc exists, it is unique. We say that i is the upstream node
of arc (i, j), and j its downstream node. It is said that the arc (i, j) is incident to
node i and to node j, irrespectively of the orientation of the arc. The degree di of
node i is the number of incident arcs. The indegree d⁻i of node i is the number of
arcs incident to i such that i is their downstream node. The outdegree d⁺i of node
i is the number of arcs incident to i such that i is their upstream node. For each
node i, we have di = d⁻i + d⁺i . Also, if the arc (i, j) exists, nodes i and j are said
1 The incidence function of the graph represented in Figure 21.1 is not injective, as there are two
edges connecting the vertices v2 and v4 , as well as the vertices v3 and v5 .

to be adjacent to each other, irrespectively of the orientation of the arc. Figure 21.2
represents a directed graph with 8 nodes and 10 arcs. The indegree, outdegree, and
degree of each node are:

d⁻1 = 1, d⁻2 = 2, d⁻3 = 1, d⁻4 = 2, d⁻5 = 2, d⁻6 = 1, d⁻7 = 1, d⁻8 = 0,
d⁺1 = 1, d⁺2 = 2, d⁺3 = 2, d⁺4 = 2, d⁺5 = 1, d⁺6 = 1, d⁺7 = 1, d⁺8 = 0,
d1 = 2, d2 = 4, d3 = 3, d4 = 4, d5 = 3, d6 = 2, d7 = 2, d8 = 0.
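These degrees can be computed mechanically from the arc list. The sketch below encodes the directed graph of Figure 21.2; the arc list is our reconstruction from the path and cut examples discussed in this chapter, consistent with the stated 8 nodes and 10 arcs.

```python
# In/out degrees of the directed graph of Figure 21.2. The arc list below is
# reconstructed from the path and cut examples given in the text.

A = [(1, 2), (2, 3), (2, 4), (4, 2), (4, 5), (5, 4),
     (3, 1), (3, 5), (6, 7), (7, 6)]
N = range(1, 9)

# Indegree: arcs with i as downstream node; outdegree: i as upstream node.
indeg = {i: sum(1 for (u, v) in A if v == i) for i in N}
outdeg = {i: sum(1 for (u, v) in A if u == i) for i in N}
deg = {i: indeg[i] + outdeg[i] for i in N}

print(indeg)
print(outdeg)
print(deg)
```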

[Figure: a directed graph on the nodes 1, . . . , 8]

Figure 21.2: An example of a directed graph

21.2 Cuts
Just as cities can be separated into two banks by a river, it may be convenient to
separate a directed graph into two sets of nodes. Such a separation is called a cut.

Definition 21.4 (Cut). Consider a directed graph where N is the set of nodes. A
cut Γ is an ordered partition of the nodes into two non empty subsets:

Γ = (M, N \ M), (21.1)

where M ⊂ N is a subset of all the nodes of the graph.

We say that the cut Γ separates i from j if i ∈ M and j ∉ M. Using the analogy
of a city divided by a river, M can be considered as the left bank and N \ M as the
right bank of the river. The arcs with their upstream node in the left bank, and their
downstream node in the right bank may represent bridges on the river. The bridges
connecting the left bank to the right bank constitute the set of forward arcs of the
cut:
Γ → = {(i, j) ∈ A | i ∈ M, j ∉ M}. (21.2)

Similarly, the bridges connecting the right bank to the left form the set of backward
arcs of the cut:
Γ ← = {(i, j) ∈ A | i ∉ M, j ∈ M}. (21.3)
Note that one or both of these sets may happen to be empty. When convenient to do
so, we say that (i, j) ∈ Γ if (i, j) ∈ Γ → ∪ Γ ← .

Note from these definitions that the cut based on the partition (M, N \ M) is
different from the cut based on the partition (N \ M, M). This is what is meant by
ordered partition in Definition 21.4. Using the bridge analogy again, it means that
we explicitly distinguish the left bank from the right bank of the river.
To illustrate the concept, Figure 21.3 represents the cut based on the subset of
nodes M = {1, 2, 3, 5, 7}, and the cut is

Γ = ({1, 2, 3, 5, 7}, {4, 6, 8}) .

The forward arcs of the cut are

Γ → = {(2, 4), (5, 4), (7, 6)}.

The backward arcs of the cut are

Γ ← = {(4, 2), (4, 5), (6, 7)}.
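The forward and backward arcs of a cut follow directly from definitions (21.2) and (21.3). A sketch for the cut above (the arc list of Figure 21.2 is our reconstruction from the examples in the text):

```python
# Forward and backward arcs of the cut Gamma = (M, N \ M) for the graph of
# Figure 21.2 (arc list reconstructed from the examples in the text).

A = [(1, 2), (2, 3), (2, 4), (4, 2), (4, 5), (5, 4),
     (3, 1), (3, 5), (6, 7), (7, 6)]
M = {1, 2, 3, 5, 7}

forward = {(i, j) for (i, j) in A if i in M and j not in M}    # (21.2)
backward = {(i, j) for (i, j) in A if i not in M and j in M}   # (21.3)

print(forward)   # the arcs (2,4), (5,4), (7,6), in some order
print(backward)  # the arcs (4,2), (4,5), (6,7), in some order
```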

[Figure: the cut separating M = {1, 2, 3, 5, 7} from N \ M = {4, 6, 8} in the
graph of Figure 21.2]

Figure 21.3: Example of a cut

21.3 Paths
A path in a graph (V, E, φ) is a finite sequence of edges e1 , . . . , ep−1 for which there
is a sequence v1 , . . . , vp of vertices such that φ(ek ) = {vk , vk+1 }, for k = 1, . . . , p − 1.
Similarly, a path in a directed graph (N , A, φ) is a sequence of arcs a1 , . . . , ap−1
for which there is a sequence i1 , . . . , ip of nodes such that φ(ak ) = (ik , ik+1 ) or
φ(ak ) = (ik+1 , ik ), for k = 1, . . . , p − 1.

In the context of a directed graph with an injective incidence function, we denote


a path by the sequence of nodes, each pair of consecutive nodes being directed either
forward (→) or backward (←). If a pair i → j is in the path, it means that the two
nodes are connected by the arc (i, j) in the forward direction. If the pair i ← j is
in the path, it means that the two nodes are connected by arc (j, i) in the backward

direction. To be a valid path, each arc in the path must belong to A. The first node
of a path P is called its origin. The last node is called its destination. A path such
that its origin coincides with its destination is called a cycle. A path is simple if it
contains no repeated nodes. A cycle is simple if it contains no repeated nodes, with
the exception of the origin and the destination that coincide. We denote P→ the set
of forward arcs of path P and P← the set of backward arcs. A path is a forward path
if its set of backward arcs is empty. Here are some examples of paths in the directed
graph represented in Figure 21.2:
• 1 → 2 → 4 → 5 is a simple forward path from node 1 to node 5 containing only
forward arcs [P→ = (1, 2), (2, 4), (4, 5), P← = ∅];
• 1 → 2 → 4 → 2 → 3 is a forward path from node 1 to node 3 (note that node
2 is repeated, so that it is not a simple path) [P→ = (1, 2), (2, 4), (4, 2), (2, 3),
P← = ∅];
• 1 → 2 ← 4 → 5 is a simple path from node 1 to node 5 that uses arc (4, 2) in the
reverse direction [P→ = (1, 2), (4, 5), P← = (4, 2) ];
• 1 → 2 ← 4 → 5 ← 3 → 1 is a simple cycle [P→ = (1, 2), (4, 5), (3, 1), P← =
(4, 2), (3, 5)];
• 4 → 6 → 7 is an invalid path as arc (4, 6) does not exist.
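The validity rule illustrated by these examples is simple to check in code: each forward step i → j must use an arc (i, j) ∈ A, and each backward step i ← j must use an arc (j, i) ∈ A. A sketch, with each step encoded as a triple (i, j, forward) and the arc list of Figure 21.2 reconstructed from the examples above:

```python
# Validity check of a path in a directed graph. A step (i, j, True) encodes
# i -> j and requires arc (i, j); a step (i, j, False) encodes i <- j and
# requires arc (j, i), used in the reverse direction.

A = {(1, 2), (2, 3), (2, 4), (4, 2), (4, 5), (5, 4),
     (3, 1), (3, 5), (6, 7), (7, 6)}  # arcs of Figure 21.2 (reconstructed)

def is_valid_path(steps):
    return all(((i, j) in A) if fwd else ((j, i) in A)
               for (i, j, fwd) in steps)

print(is_valid_path([(1, 2, True), (2, 4, True), (4, 5, True)]))   # 1->2->4->5
print(is_valid_path([(1, 2, True), (2, 4, False), (4, 5, True)]))  # 1->2<-4->5
print(is_valid_path([(4, 6, True), (6, 7, True)]))                 # arc (4,6) missing
```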

Lemma 21.5. [Longest simple path] Consider a directed graph with m nodes.
The number of arcs in any simple path is no more than m − 1.

Proof. A simple path that visits all the nodes of the graph has m − 1 arcs. If this
path is extended by one more arc, the downstream node of this arc would be visited
twice by the path, which would not be simple.

Lemma 21.6. [Finite number of simple paths] Consider a directed graph with
m nodes, m ≥ 2, and two nodes o and d. There is a finite number of simple
paths between o and d.

Proof. Consider k such that 2 ≤ k ≤ m. The number of simple paths containing k


nodes is finite. Indeed, the number of simple paths is bounded above by the number
of permutations of k − 2 nodes (some of these permutations do not correspond to a
valid path). As the value of k is bounded above by m, the total number of simple
paths is bounded above.
When every pair of nodes in the directed graph is connected with a path, the graph
is said to be connected . If every pair of nodes is connected with a path containing

only forward arcs, the graph is said to be strongly connected . A graph containing
a single node and no arc is considered to be strongly connected, too. The graph
represented in Figure 21.2 is not connected, as there are several pairs of nodes with
no path connecting them, such as nodes 1 and 8, for instance. Actually, with respect
to connectivity, this graph appears to have three connected subgraphs, defined by the

sets of nodes: {1, 2, 3, 4, 5}, {6, 7}, and {8}, and the arcs they are incident to. These three
connected subgraphs are called connected components. The definition of connected
components is based on equivalence classes on the set of nodes. Indeed, the relation “is
connected with” defines an equivalence relation on the set of nodes, as it is reflexive,
symmetric, and transitive (see Definition B.24). Note that the relation “is strongly
connected with” does not define an equivalence relation, as it is not symmetric.

Definition 21.7 (Connected component). The subgraph G ′ = (N ′ , A ′ , φ ′ ) of the


graph G = (N , A, φ) is a connected component of G if
• N ′ is an equivalence class on N for the relation “is connected with,”
′ ′
• for each (i, j) ∈ A, if i ∈ N then (i, j) ∈ A , and

• φ ′ (a) = φ(a), for each a ∈ A ′ .

In the second part of the definition, the existence of the arc (i, j) implies that i
and j belong to the same equivalence class. Consequently, if i ∈ N ′ then j belongs to
N ′ too.
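Since "is connected with" ignores arc orientation, the connected components can be computed by a breadth-first search on the arcs treated as undirected. A sketch recovering the three components listed above (arc list reconstructed from the examples in the text):

```python
from collections import deque

# Connected components of the graph of Figure 21.2: equivalence classes of the
# symmetric relation "is connected with", ignoring arc orientation.

A = [(1, 2), (2, 3), (2, 4), (4, 2), (4, 5), (5, 4),
     (3, 1), (3, 5), (6, 7), (7, 6)]  # reconstructed arc list
N = set(range(1, 9))

neighbors = {i: set() for i in N}
for (i, j) in A:
    neighbors[i].add(j)
    neighbors[j].add(i)

def components():
    seen, comps = set(), []
    for s in sorted(N):
        if s in seen:
            continue
        comp, queue = set(), deque([s])
        while queue:                      # breadth-first search from s
            i = queue.popleft()
            if i in comp:
                continue
            comp.add(i)
            queue.extend(neighbors[i] - comp)
        seen |= comp
        comps.append(comp)
    return comps

print(components())  # node sets {1,2,3,4,5}, {6,7}, and {8}
```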

[Figure: the three connected components, with node sets {1, 2, 3, 4, 5}, {6, 7},
and {8}]

(a) First (b) Second (c) Third

Figure 21.4: Connected components of the graph represented in Figure 21.2



21.4 Trees
There is a family of graphs called trees that are particularly useful both for the
algorithms that we describe in this book, and in various other applications.

Definition 21.8 (Tree). A tree is a connected graph without cycles.

Figure 21.5 represents an example of a tree. A leaf is a node of degree 1 in a tree.

Lemma 21.9. Every tree with at least one arc has at least two leaves.

Proof. Consider the path P of the tree with the largest number of arcs. Call its origin
o and its destination d. By construction, the degree of o and d is larger or equal to
1. If the degree of o is strictly larger than 1, it means that there is another arc, not in
the path, and incident to o. Therefore, a path with one more arc than P exists, which
is not possible by the definition of P. Therefore, o is a leaf. The same argument is
used to determine that d is a leaf, too.
A tree can be characterized in several different ways.

[Figure: a tree with five nodes]

Figure 21.5: Example of a tree

Theorem 21.10 (Characterization of a tree). Let G = (N , A, φ) be a directed graph


with m nodes and n arcs. The following statements are all equivalent.
1. G is a tree;
2. G is connected and without cycles;
3. there is a unique simple path connecting any two nodes;
4. G has no cycles, and a simple cycle is formed if any arc is added;
5. G is connected and the removal of any single arc disconnects the graph;
6. G is connected and n = m − 1;
7. G has no simple cycles and n = m − 1.

Proof. 1 ⇐⇒ 2 By Definition 21.8.


2 =⇒ 3 Consider nodes i and j. As the graph is connected, there is at least one
path connecting i to j. Suppose by contradiction that there are two distinct
paths connecting i to j. Together, these two paths form a cycle, contradicting the
assumption. Therefore, there is exactly one path between i and j. As the graph

does not contain any cycle, the path is simple.


3 =⇒ 4 Assume that the graph contains a cycle involving nodes i and j. Then there
are two paths connecting i and j, which contradicts the assumption, and proves
that the graph has no cycle. Consider now any two nodes i and j that are not
connected by an arc. By assumption, there is a simple path connecting i and j.
Therefore, if the arc (i, j) is added, it closes the path and forms a simple cycle.
3 =⇒ 5 Consider any arc (i, j). It is the only path connecting i to j. Therefore, if
the arc is removed, node i is disconnected from j.
5 =⇒ 2 Assume by contradiction that the graph contains a cycle. Removing an arc
from the cycle does not disconnect the graph, contradicting this assumption, and
proving the result.
4 =⇒ 3 Consider two nodes i and j in the graph. If the arc (i, j) exists, there is one
path connecting i and j. As there are no cycles, it is the only one. If arc (i, j) does
not exist, add it to the graph. By assumption, it forms a simple cycle. Therefore,
the path obtained by removing arc (i, j) from the cycle is a simple path between
i and j. As the original graph has no cycle, it is unique.
4 =⇒ 2 We need to show that the graph is connected. But as condition 4 implies
condition 3 (see above), any pair of nodes is connected.
1 =⇒ 6 and 1 =⇒ 7 We show that n = m − 1 by induction. If there is only one arc
in the tree, that is n = 1, Lemma 21.9 states that there are at least two nodes. If
there were 3 nodes or more, one of them would be disconnected, as there is only
one arc. As this is not possible in a tree, the tree has exactly 2 nodes. Suppose the
property to be true for a tree with m nodes, and consider a tree with m ′ = m + 1
nodes. We must show that this tree has n ′ = m ′ − 1 = m arcs. Consider one leaf
of this tree (it exists by Lemma 21.9). If we remove the leaf as well as the unique
incident arc, we obtain a tree with m ′ − 1 = m nodes and n ′ − 1 = n arcs. As
n = m − 1, we have n ′ = 1 + n = 1 + m − 1 = m.
7 =⇒ 6 Consider the K connected components of the graph. Each of them is
connected and has no cycles, and is therefore a tree. As a consequence, condition 6
is verified for each component (see proof above). If mk is the number of nodes
in the connected component k, then mk − 1 is the number of arcs in the component.
Therefore, the total number of arcs is

n = Σ_{k=1}^{K} (mk − 1) = Σ_{k=1}^{K} mk − K = m − K.

As n = m − 1, then K = 1, meaning that there is only one connected component,


and the graph is connected.

6 =⇒ 2 and 6 =⇒ 7 It is immediate for m = 1 and m = 2 that the graph has no


cycle. Assume now by induction that it is true for m−1, and consider a connected
graph with m nodes and m − 1 arcs. As each arc is incident to two nodes, the
average degree in this graph is
(1/m) Σ_{i=1}^{m} di = 2n/m = 2(m − 1)/m = 2 − 2/m.

This is strictly lower than 2 for any m. Therefore, there is at least one node with
degree strictly less than 2. Moreover, as the graph is connected, no node with
degree 0 exists. Therefore, there is at least one node i with degree 1. This node
cannot be part of a cycle. If we remove i and the incident arc, the remaining
graph has no cycle either, by induction. Therefore, the graph has no cycle.

In order to combine these implications, construct a directed graph where each


node corresponds to one of the conditions, and each arc to an implication proved
above (see Figure 21.6). The equivalence of all the conditions is equivalent to the
strong connectivity of this graph, which can easily be verified.

[Figure: a directed graph with one node per condition of Theorem 21.10 and one
arc per implication proved above; the graph is strongly connected]

Figure 21.6: Proven implications for Theorem 21.10

Trees play an important role in network optimization. In particular, it is common


to construct a tree that connects all the nodes of a network. Such a tree is called a
spanning tree.

Definition 21.11 (Spanning tree). Consider the graph (V, E, φ). The subgraph
(V, E ′ , φ ′ ), where E ′ ⊆ E, and φ ′ (e) = φ(e), for each e ∈ E ′ , is a spanning tree of
(V, E, φ) if it is a tree.
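Characterization 6 of Theorem 21.10 gives a cheap computational test: a graph is a tree if and only if it is connected and has exactly m − 1 arcs. A sketch (the example node and arc lists are hypothetical, not the trees drawn in the figures):

```python
from collections import deque

# Tree test based on characterization 6 of Theorem 21.10: a graph with m nodes
# is a tree if and only if it is connected and has exactly m - 1 arcs.
# Arc orientation is ignored for connectivity.

def is_tree(nodes, arcs):
    if len(arcs) != len(nodes) - 1:
        return False
    neighbors = {i: set() for i in nodes}
    for (i, j) in arcs:
        neighbors[i].add(j)
        neighbors[j].add(i)
    # Breadth-first search from an arbitrary node; the graph is connected
    # if and only if every node is reached.
    seen, queue = set(), deque([next(iter(nodes))])
    while queue:
        i = queue.popleft()
        if i in seen:
            continue
        seen.add(i)
        queue.extend(neighbors[i] - seen)
    return seen == set(nodes)

# Hypothetical examples: a tree on 5 nodes, then the same arcs plus one more,
# which closes a cycle (n is no longer m - 1).
print(is_tree({1, 2, 3, 4, 5}, [(1, 3), (1, 2), (2, 4), (2, 5)]))          # True
print(is_tree({1, 2, 3, 4, 5}, [(1, 3), (1, 2), (2, 4), (2, 5), (4, 5)]))  # False
```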

Leonhard Euler was born in Basel, Switzerland, on April
15, 1707, and died in St. Petersburg, Russia, on November 18,
1783, of apoplexy. He studied mathematics under the direction
of John Bernoulli, and became friends with his two sons, Daniel
and Nicholas. In 1727, he joined the Academy of Sciences in
St. Petersburg upon the invitation of Empress Catherine I, and
in 1741, he became a member of the Academy of Sciences in
Berlin, at the request of Frederick the Great. During a discussion with
the Queen Mother, she found him particularly timid and reserved. “Why, then, will
you not talk to me?” she said. “Because Madam,” he replied, “I have just come
from a country where people are hanged if they talk.” His masterpiece is probably
Introductio in analysin infinitorum (Euler, 1748). The number of things named
after Euler is impressively high: conjectures, equations, formulas, theorems, numbers,
laws. Euler’s identity eiπ + 1 = 0 is an example of mathematical beauty as it involves
five fundamental constants, and three basic arithmetic operations appearing exactly
once each. He introduced the concept of graphs, when solving the problem known as
the Seven Bridges of Königsberg in 1736, that consists in finding a path or a cycle in
a graph that uses each edge exactly once.
Figure 21.7: Leonhard Euler

21.5 Networks
It is often useful to associate quantities to nodes and arcs. For instance, in a water
network, each house may be associated with a daily consumption of water, each
treatment plant may be associated with a daily quantity of water treated, each tank
may be associated with a quantity of stored water, and each pipe has a length and a
cross section. When quantities are associated with the graph, we call it a network.

Definition 21.12 (Network). A network is a 5-uple (N , A, φ, fN , fA ) such that


(N , A, φ) is a directed graph, φ is an injective incidence function, fN : N → Rp ,
p ≥ 0, is a function associating a set of p values with each node, and fA : A → Rq ,
q ≥ 0, is a function associating a set of q values with each arc.

In order to simplify the notation, we refer to a network simply as (N , A). If the
arcs are represented by the pair (i, j), the incidence function is implicit. Moreover,
the quantities associated with the nodes and the arcs can be represented by vectors
of Rm and Rn , respectively. This section discusses some of these quantities associated
with networks.
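As an illustration, Definition 21.12 can be sketched in code. The containers below (sets, lists, and dictionaries) are implementation choices of this sketch, not prescribed by the definition; the nodes and arcs are those of the network of Figure 21.2.

```python
# A minimal network representation: node set, arc list, and the value maps
# f_N (p = 1 value per node) and f_A (q = 1 value per arc).

nodes = {1, 2, 3, 4, 5, 6, 7, 8}
arcs = [(1, 2), (2, 3), (2, 4), (3, 1), (3, 5),
        (4, 2), (4, 5), (5, 4), (6, 7), (7, 6)]

# Since arcs are stored as distinct pairs (i, j), the incidence function
# is implicit and injective, as required.
f_N = {i: 0.0 for i in nodes}    # e.g., supply or demand at each node
f_A = {a: 0.0 for a in arcs}     # e.g., the flow x_ij on each arc

assert len(set(arcs)) == len(arcs)   # injectivity of the incidence function
assert all(i in nodes and j in nodes for (i, j) in arcs)
```

Storing the arcs as pairs (i, j) is exactly the convention adopted in the text when the incidence function is left implicit.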

21.5.1 Flows
A network is often used to transport objects or information. The exact nature of
these items varies with the application. The definitions provided here are generic and

do not assume anything about the nature of what is transported. A typical quantity
associated with each arc (i, j) is the flow on the arc, denoted by xij . The quantity
xij ∈ R is the amount of “things” (water, electricity, information, etc.) that traverses
the arc during a given period of time. Note that the concept of flow presented here
is static, in the sense that we assume that it represents the total number of “things”
traversing the network during a time horizon that is sufficiently large so that all units
of flow depart and arrive during this horizon, and the time dimension is irrelevant.
The representation of dynamic flows, varying over time, is more complex and outside
the scope of this book.
For mathematical convenience, we allow xij to take on any real value, including
negative values. If xij < 0, the interpretation is that the arc (i, j) transports −xij
units of flow from j to i, that is in the opposite direction of the arc. The vector
x ∈ Rn such that each entry contains the flow on the corresponding arc is called the
flow vector. Figure 21.8 provides an example of a flow vector, where the flow on each
arc is shown next to it. For instance, there are 2.3 units of flow transported from
node 1 to node 2. There are 3 units of flow transported from node 4 to node 2 on arc
(4, 2), and 2.1 units of flow transported from node 4 to node 2 on arc (2, 4).

Figure 21.8: A flow vector (x12 = 2.3, x23 = −1, x24 = −2.1, x31 = 4, x35 = 0,
x42 = 3, x45 = −5, x54 = −5, x67 = 3, x76 = 2.5)

Consider now a cut Γ = (M, N \ M). The flow through the cut Γ is defined as

X(Γ ) = Σ_{(i,j)∈Γ →} xij − Σ_{(i,j)∈Γ ←} xij , (21.4)

where Γ → is the set of forward arcs, and Γ ← the set of backward arcs of the cut (see
Section 21.2). If both Γ → and Γ ← are empty, the flow is 0.
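Equation (21.4) computes directly from a flow vector stored as a dictionary. In this sketch, the flows are those of Figure 21.8, and M = {1, 2, 3, 5, 7} is an assumed reading of the cut of Figure 21.3, inferred from the forward and backward arcs used later in the text.

```python
# Flow through a cut, equation (21.4): forward-arc flows minus backward-arc flows.

x = {(1, 2): 2.3, (2, 3): -1.0, (2, 4): -2.1, (3, 1): 4.0, (3, 5): 0.0,
     (4, 2): 3.0, (4, 5): -5.0, (5, 4): -5.0, (6, 7): 3.0, (7, 6): 2.5}

def flow_through_cut(x, M):
    forward = sum(f for (i, j), f in x.items() if i in M and j not in M)
    backward = sum(f for (i, j), f in x.items() if i not in M and j in M)
    return forward - backward

M = {1, 2, 3, 5, 7}
print(round(flow_through_cut(x, M), 1))  # -5.6
```

When both arc sets of the cut are empty (for instance, M contains every node), the two sums are empty and the function returns 0, matching the convention above.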
Paths may also be associated with flows. Suppose that a flow f follows a simple
path P from its origin to its destination. The flow vector representing this flow is called
a simple path flow. It is a vector x ∈ Rn such that each component corresponding to
a forward arc of the path is equal to f, each component corresponding to a backward
arc is equal to −f, and all other components are 0, that is,

xij = f if (i, j) ∈ P→ , −f if (i, j) ∈ P← , 0 otherwise. (21.5)

When the path is a cycle, we refer to a simple cycle flow. Figure 21.9 represents a
simple path flow for path 1 → 2 ← 4 → 5. It represents f units of flow transported
from origin 1 to destination 5 along P.

Figure 21.9: Simple path flow for path 1 → 2 ← 4 → 5 (x12 = f, x42 = −f,
x45 = f, all other arc flows 0)
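Equation (21.5) translates directly into code. The following sketch builds the simple path flow of Figure 21.9, with the forward and backward arc lists of the path spelled out by hand and an assumed value f = 2.

```python
# Simple path flow, equation (21.5): f on the forward arcs of the path,
# -f on its backward arcs, 0 everywhere else.

arcs = [(1, 2), (2, 3), (2, 4), (3, 1), (3, 5),
        (4, 2), (4, 5), (5, 4), (6, 7), (7, 6)]

def simple_path_flow(arcs, forward, backward, f):
    x = {a: 0.0 for a in arcs}
    for a in forward:
        x[a] = f
    for a in backward:
        x[a] = -f
    return x

# Path 1 -> 2 <- 4 -> 5: forward arcs (1, 2) and (4, 5), backward arc (4, 2).
x = simple_path_flow(arcs, forward=[(1, 2), (4, 5)], backward=[(4, 2)], f=2.0)
print(x[(1, 2)], x[(4, 2)], x[(2, 3)])  # 2.0 -2.0 0.0
```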

21.5.2 Capacities
In many practical applications, the value of the flow cannot exceed some value deter-
mined by physical characteristics of the system represented by an arc. For example,
the maximum quantity of water per unit of time that a pipe can transport depends on
the diameter of the pipe. The maximum number of cars that a highway can transport
per unit of time depends on the number and width of lanes. The maximum value
of the flow on the arc is called its capacity. As we allow xij to take negative values,
both a lower bound ℓij and an upper bound uij on the flow are required. Therefore,
we obtain for each arc (i, j) the following constraint:

ℓij ≤ xij ≤ uij . (21.6)

There are two common configurations of these bounds in practice. In applications
where the direction of flow is constrained to respect the direction of the arc (e.g., one-way
streets in urban road networks), the value of ℓij is set to zero to forbid negative
values, and the value of uij is set to the physical capacity. If the flow is allowed to
move in any direction (e.g., in a network transporting electricity), we set ℓij = −uij ,
where uij is set to the physical capacity. However, the framework is general enough
to accommodate any values for ℓij and uij such that ℓij ≤ uij .

As we have defined the flow through a cut, the concept of capacity is relevant here
as well. Consider a cut Γ = (M, N \ M). The capacity of the cut Γ is

U(Γ ) = Σ_{(i,j)∈Γ →} uij − Σ_{(i,j)∈Γ ←} ℓij . (21.7)

If both Γ → and Γ ← are empty, the capacity is 0. For any cut Γ , we always have that
the flow through the cut is bounded from above by its capacity, that is

X(Γ ) ≤ U(Γ ). (21.8)

If X(Γ ) = U(Γ ), the cut is said to be saturated, in the sense that no more flow can be
sent from set M (the left bank) to set N \ M (the right bank).
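Equation (21.7) mirrors (21.4), with upper bounds on forward arcs and lower bounds on backward arcs. A small sketch on a three-arc toy network; the bounds below are illustrative values chosen for this example, not taken from the text.

```python
# Capacity of a cut, equation (21.7): upper bounds u_ij on forward arcs
# minus lower bounds l_ij on backward arcs.

bounds = {(1, 2): (0.0, 4.0), (2, 4): (-3.0, 3.0), (4, 2): (0.0, 3.0)}

def cut_capacity(bounds, M):
    fwd = sum(u for (i, j), (l, u) in bounds.items() if i in M and j not in M)
    bwd = sum(l for (i, j), (l, u) in bounds.items() if i not in M and j in M)
    return fwd - bwd

M = {1, 2}
print(cut_capacity(bounds, M))  # 3.0: u24 = 3 minus l42 = 0; arc (1, 2) is internal
```

Any flow vector respecting the bounds ℓij ≤ xij ≤ uij then satisfies X(Γ) ≤ U(Γ), as stated in (21.8).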

21.5.3 Supply and demand
Nodes can also be associated with quantities. For instance, the total flow from and
towards the node represents the supply and the demand, respectively, of the flows
transported on the network. To introduce these quantities, consider a flow vector
x ∈ Rn , and a node i in the network. The quantity of flow leaving node i is given by

Σ_{j|(i,j)∈A} xij , (21.9)

and the quantity of flow that enters node i is given by

Σ_{k|(k,i)∈A} xki . (21.10)

The difference between these two quantities is called the divergence of node i.

Definition 21.13 (Divergence). Consider a network with m nodes and n arcs, and
a flow vector x ∈ Rn . For each node i, the divergence of x at node i is defined as
the total quantity of flow that leaves the node, minus the total quantity of flow that
enters the node:

div(x)i = Σ_{j|(i,j)∈A} xij − Σ_{k|(k,i)∈A} xki . (21.11)

If this quantity is positive, it means that there are more units leaving the node
than units entering it. Units of flow are created at node i. This node is therefore
supplying the network with flow. It is a supply node. Similarly, if the divergence is
negative, it means that there are fewer units leaving the node than units entering it.
It is therefore a node where units of flow are consumed. This is a demand node. If
the divergence is zero, no flow is generated or consumed at the node. It is a transit
node. The divergence associated with the flow vector represented in Figure 21.8 is
reported in Figure 21.10, where the divergence of node i is denoted by yi .

Figure 21.10: Divergence (y1 = −1.7, y2 = −8.4, y3 = 5, y4 = 5.1, y5 = 0,
y6 = 0.5, y7 = −0.5, y8 = 0)

It is seen that nodes 3, 4, and 6 are supply nodes, nodes 1, 2, and 7 are demand
nodes, and nodes 5 and 8 are transit nodes.
From (21.11), we obtain that the sum of all divergences is always zero, for any
flow vector. Indeed,

Σ_{i∈N} div(x)i = Σ_{i∈N} Σ_{j|(i,j)∈A} xij − Σ_{i∈N} Σ_{k|(k,i)∈A} xki
= Σ_{(i,j)∈A} xij − Σ_{(k,i)∈A} xki (21.12)
= 0.

In other words, every unit of flow that is generated somewhere is consumed somewhere
else. A flow vector such that its divergence at each node is zero is called a circulation,
as illustrated in Figure 21.11. In this case, no flow is generated or consumed anywhere.
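The divergence of Definition 21.13 is straightforward to compute for a flow vector stored as a dictionary. The following sketch uses the flows of Figure 21.8 and also checks the zero-sum property (21.12).

```python
# Divergence at each node, equation (21.11), and the zero-sum property (21.12).

x = {(1, 2): 2.3, (2, 3): -1.0, (2, 4): -2.1, (3, 1): 4.0, (3, 5): 0.0,
     (4, 2): 3.0, (4, 5): -5.0, (5, 4): -5.0, (6, 7): 3.0, (7, 6): 2.5}

def divergence(x, node):
    out_flow = sum(f for (i, j), f in x.items() if i == node)
    in_flow = sum(f for (i, j), f in x.items() if j == node)
    return out_flow - in_flow

nodes = range(1, 9)
div = {i: round(divergence(x, i), 1) for i in nodes}
print(div)  # matches Figure 21.10: div at node 2 is -8.4, at node 4 is 5.1, ...

# Every unit generated somewhere is consumed somewhere else, equation (21.12):
assert abs(sum(divergence(x, i) for i in nodes)) < 1e-9
```

A circulation is a flow vector for which this function returns (numerically) zero at every node.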

Figure 21.11: Example of a circulation

The following result relates the flow through a cut with the divergence at the
nodes.

Theorem 21.14 (Flow through a cut and divergences). Consider a network with
a set N of m nodes and a set A of n arcs, a subset of nodes M ⊂ N , a cut
Γ = (M, N \ M) and a flow vector x ∈ Rn . If Γ → ∪ Γ ← ≠ ∅, then

X(Γ ) = Σ_{i∈M} div(x)i . (21.13)

Proof. From the definition of the flow through a cut, and of the sets Γ → and Γ ← , we
have

X(Γ ) = Σ_{(i,j)|i∈M, j∉M} xij − Σ_{(j,i)|j∉M, i∈M} xji .

Note that we have inverted the indices of the second term so that node i always
belongs to M in this expression. Consequently, we can also write

X(Γ ) = Σ_{i∈M} ( Σ_{j|(i,j)∈A, j∉M} xij − Σ_{j|(j,i)∈A, j∉M} xji ). (21.14)

Now, from (21.11), we have

Σ_{i∈M} div(x)i = Σ_{i∈M} ( Σ_{j|(i,j)∈A} xij − Σ_{j|(j,i)∈A} xji ). (21.15)

Consider an arc (k, ℓ) such that both k and ℓ belong to M. In (21.15), the flow xkℓ
appears twice, once in the term corresponding to node k with a positive sign, and
once for node ℓ with a negative sign:

Σ_{i∈M} div(x)i = · · · + ( Σ_{j|(k,j)∈A} xkj − Σ_{j|(j,k)∈A} xjk ) + · · ·
+ ( Σ_{j|(ℓ,j)∈A} xℓj − Σ_{j|(j,ℓ)∈A} xjℓ ). (21.16)

Therefore, these two terms cancel out. It means that it is sufficient to consider only
j ∉ M in (21.15). Therefore, Equation (21.16) is identical to (21.14), proving the
result.
Consider the cut presented in Figure 21.3 with the flow vector and its divergences
presented in Figures 21.8 and 21.10. The flow through the cut is

x24 + x54 + x76 − x42 − x45 − x67 = −2.1 − 5 + 2.5 − 3 + 5 − 3 = −5.6.

The sum of the divergences at the nodes in M is

y1 + y2 + y3 + y5 + y7 = −1.7 − 8.4 + 5 + 0 − 0.5 = −5.6.

21.5.4 Costs
A quantity often associated with an arc (i, j) is a cost, which may depend on the
amount of flow that traverses the arc. In this book, we focus on linear costs, which
are proportional to the flow. The cost to move one unit of flow on arc (i, j) is denoted
by cij , so that the total cost of the arc is cij xij . The unit of the cost is usually
irrelevant, as long as it is the same for every arc in the network. For instance, it
can be the actual cost that has to be paid to traverse the arc, expressed in currency
units (e.g., a toll road). It can also be the time spent by a unit of flow to traverse
the arc. Sometimes, a generalized cost is needed, for instance when both the monetary
cost to traverse the arc and the time to traverse it are relevant. In this case, all
quantities involved have to be translated into the same unit, for example, a monetary
unit, so that they can be added. The valuation of non-market resources such as
time is referred to by economists as contingent valuation. For instance, the value of
one hour of travel for commuting car drivers in Switzerland is (on average) CHF 30
(Axhausen et al., 2008). Using this value, if the toll on a road is CHF 10, and the
travel time 30 minutes, the total generalized cost per unit of flow would be
CHF 10 + 0.5 · 30 = CHF 25.
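The generalized cost in the example above is simply the toll plus the value of time multiplied by the travel time; as a one-line check:

```python
# Generalized cost of the example: CHF 10 toll, 30 minutes of travel,
# valued at CHF 30 per hour (Axhausen et al., 2008).

toll = 10.0              # CHF
travel_time = 0.5        # hours
value_of_time = 30.0     # CHF per hour

generalized_cost = toll + value_of_time * travel_time
print(generalized_cost)  # 25.0
```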
The cost of a path is defined as the sum of the costs of its arcs, that is

C(P) = Σ_{(i,j)∈P→} cij xij − Σ_{(i,j)∈P←} cij xij . (21.17)

If x is a simple path flow for path P, we have

C(P) = Σ_{(i,j)∈P→} f cij − Σ_{(i,j)∈P←} f cij
= f ( Σ_{(i,j)∈P→} cij − Σ_{(i,j)∈P←} cij ). (21.18)
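Equation (21.18) can be sketched as follows; the unit costs below are illustrative values chosen for this example, not from the text.

```python
# Cost of a simple path flow, equation (21.18): f times the difference between
# the forward-arc costs and the backward-arc costs.

c = {(1, 2): 2.0, (4, 2): 1.0, (4, 5): 3.0}

def simple_path_flow_cost(c, forward, backward, f):
    return f * (sum(c[a] for a in forward) - sum(c[a] for a in backward))

# Path 1 -> 2 <- 4 -> 5 carrying f = 2 units of flow:
print(simple_path_flow_cost(c, [(1, 2), (4, 5)], [(4, 2)], 2.0))  # 8.0
```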

As an aside, note that this link-additive assumption may not always correspond to
the situation in a real network. For example, if you fly from Geneva Airport (GVA) to
Bangkok (BKK) with a transfer at Zurich Airport (ZRH), it costs CHF 2,412. If you
fly directly from ZRH to BKK, the cost is CHF 2,407. However, if you buy a ticket
from GVA to ZRH, it costs CHF 570. If we use the network representation illustrated
in Figure 21.12(a), and send one unit of flow along the path GVA → ZRH → BKK,
the associated cost is 2,977, as a consequence of the link-additivity assumption. It
does not correspond to reality. Another way to model this situation, while keeping
the link-additive assumption, is represented in Figure 21.12(b). In this case, the path
GVA → BKK, with a cost of 2,412, represents passengers buying a ticket from GVA to
BKK (regardless of the number of transfers). The path GVA → ZRH → BKK, with a
cost of 2,977, represents passengers that have bought two separate tickets. But that
representation ignores the fact that travelers from GVA to ZRH and travelers from
GVA to BKK share the same flight (with a limited number of seats) between GVA
and ZRH, which may not be satisfactory either. This illustrates that it is important
to keep assumptions such as the link-additive assumption in mind when creating the
network representation of a real problem.

Figure 21.12: Network representation for an airline problem. (a) First model: arcs
GVA → ZRH (cost 570) and ZRH → BKK (cost 2,407). (b) Second model: the same
two arcs, plus a direct arc GVA → BKK (cost 2,412).

21.5.5 Network representation


In real life, it is common to use a map to look at a network or a representation, as
in Figure 21.2. It gives an overview of the overall topology of the network. When
dealing with network algorithms, the computer does not have access to this bird's-eye
view of the network. Instead, it has access to the set of nodes, the set of arcs, and
a representation of the incidence function. From the point of view of the computer,
the network is more like a labyrinth, where only local information is available. A
common representation of a network is the adjacency matrix. The adjacency matrix
A ∈ Rm×m of a network with m nodes is an m × m square matrix. Each entry is
defined as

A(i, j) = 1 if (i, j) ∈ A, and 0 otherwise. (21.19)
Note that this representation is valid because we assume that the incidence function of
a network is injective. The adjacency matrix of the network represented in Figure 21.2
is

      0 1 0 0 0 0 0 0
      0 0 1 1 0 0 0 0
      1 0 0 0 1 0 0 0
A =   0 1 0 0 1 0 0 0     (21.20)
      0 0 0 1 0 0 0 0
      0 0 0 0 0 0 1 0
      0 0 0 0 0 1 0 0
      0 0 0 0 0 0 0 0
In order to associate quantities with the arcs, an arc numbering convention must
be adopted. For instance, arcs can be numbered sequentially as they appear in the
adjacency matrix read row by row. In our example, arc 1 would be (1, 2), and arc
10 would be (7, 6). In many practical applications, as well as in our simple example,
the adjacency matrix is sparse, as it contains a large number of zero entries. There
are several techniques for the efficient storage of sparse matrices (see, for instance,
Dongarra, 2000 and Montagne and Ekambaram, 2004). A simple one consists in
storing adjacency lists for each node. In this configuration, each node i is associated
with a list of length equal to its outdegree d_i^+ . Each element of the list corresponds
to an arc (i, j) going out of i. The vector fA (i, j) ∈ Rq of values associated with the
corresponding arc may also be stored in the list. The adjacency lists of the simple
network represented in Figure 21.2, where each arc (i, j) is associated with a quantity
fA (i, j) = xij , are illustrated in Figure 21.13. Each element in list i is associated with
an arc (i, j) and contains: the number j of the downstream node of the corresponding
arc, a vector of quantities associated with this arc (here, xij ), and a pointer toward
the next element in the list. In our example, node 1 has only one outgoing arc:
(1, 2). Therefore, the list contains only one element, with three entries: the number
2 referring to node 2, the value x12 associated with the arc (1, 2), and a null pointer
(illustrated by a dot). Node 2 has two outgoing arcs: (2, 3) and (2, 4). Therefore,
the list associated with node 2 has two elements. The first one corresponds to the
arc (2, 3) and contains the number 3 referring to node 3, the value x23 , and a pointer
to the next element in the list. The second element corresponds to the arc (2, 4)
and contains the number 4 referring to node 4, the value x24 , and a null pointer
characterizing the last element of the list. As node 8 is not associated with any
outgoing arc, the associated list is empty, and the corresponding pointer is null.

Figure 21.13: Representation of the network in Figure 21.2 using adjacency lists
(list 1: [2, x12 ]; list 2: [3, x23 ] → [4, x24 ]; list 3: [1, x31 ] → [5, x35 ]; list 4:
[2, x42 ] → [5, x45 ]; list 5: [4, x54 ]; list 6: [7, x67 ]; list 7: [6, x76 ]; list 8: empty)
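Both representations are straightforward to build from the arc list of Figure 21.2. In this sketch, plain Python lists stand in for the pointer-based adjacency lists of Figure 21.13 (a design choice of the sketch, not of the book).

```python
# Adjacency matrix (21.19)-(21.20) and adjacency lists for the network
# of Figure 21.2 (nodes are 1-based, the matrix is 0-based).

arcs = [(1, 2), (2, 3), (2, 4), (3, 1), (3, 5),
        (4, 2), (4, 5), (5, 4), (6, 7), (7, 6)]
m = 8

A = [[0] * m for _ in range(m)]
for (i, j) in arcs:
    A[i - 1][j - 1] = 1

# Adjacency lists: for each node, its outgoing arcs (only the downstream node
# is stored here; the arc values f_A(i, j) could be stored alongside).
adj = {i: [] for i in range(1, m + 1)}
for (i, j) in arcs:
    adj[i].append(j)

print(A[0])    # [0, 1, 0, 0, 0, 0, 0, 0]: node 1 only points to node 2
print(adj[2])  # [3, 4]: arcs (2, 3) and (2, 4)
print(adj[8])  # []: node 8 has no outgoing arc
```

The list of a node has length equal to its outdegree, so a sparse network is stored in O(m + n) space instead of the O(m^2) of the full matrix.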

21.6 Flow decomposition
Consider the network represented in Figure 21.2 and a path flow vector that sends
flows along simple paths and cycles, as shown in Table 21.1.

Table 21.1: Flows on simple paths and cycles


Path number Path Flow
1 1→2→4→5 1.5
2 1→2←4→5 1.5
3 1→2←4→5←3→1 1
4 3→5→4 1
5 6→7 2

The procedure to calculate the resulting flow vector and associated divergences is
called network loading. The flow vector is obtained by simply summing up for each
arc the flow transported by paths containing the arc, and its divergence is defined
by (21.11). It is represented in Figure 21.14. There are 1.5 + 1.5 + 1 = 4 units of flow
leaving node 1 using paths 1, 2, and 3, and one unit of flow arriving at node 1 from
path 3. So there is a net total of 3 units of flow leaving node 1, which corresponds to
its divergence. Note that the flow on arc (3, 5) is zero, as one unit of flow traverses
the arc in the forward direction along path 4, and one unit of flow in the backward
direction along path 3.
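The network loading procedure is just a sum of simple path flows. The following sketch reproduces the loading of Table 21.1, with each path entered by hand as its forward and backward arcs, and yields the flow vector of Figure 21.14.

```python
# Network loading: summing the simple path flows of Table 21.1 into a flow vector.

arcs = [(1, 2), (2, 3), (2, 4), (3, 1), (3, 5),
        (4, 2), (4, 5), (5, 4), (6, 7), (7, 6)]

paths = [  # (forward arcs, backward arcs, flow)
    ([(1, 2), (2, 4), (4, 5)], [], 1.5),                # 1 -> 2 -> 4 -> 5
    ([(1, 2), (4, 5)], [(4, 2)], 1.5),                  # 1 -> 2 <- 4 -> 5
    ([(1, 2), (4, 5), (3, 1)], [(4, 2), (3, 5)], 1.0),  # 1 -> 2 <- 4 -> 5 <- 3 -> 1
    ([(3, 5), (5, 4)], [], 1.0),                        # 3 -> 5 -> 4
    ([(6, 7)], [], 2.0),                                # 6 -> 7
]

x = {a: 0.0 for a in arcs}
for forward, backward, f in paths:
    for a in forward:
        x[a] += f
    for a in backward:
        x[a] -= f

print(x[(1, 2)], x[(4, 2)], x[(3, 5)])  # 4.0 -2.5 0.0  (as in Figure 21.14)
```

Note how the flow on arc (3, 5) cancels out: one unit forward along path 4 and one unit backward along path 3.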

Figure 21.14: Assigning simple path flows on a network (x12 = 4, x23 = 0,
x24 = 1.5, x31 = 1, x35 = 0, x42 = −2.5, x45 = 4, x54 = 1, x67 = 2, x76 = 0;
divergences y1 = 3, y2 = 0, y3 = 1, y4 = −1, y5 = −3, y6 = 2, y7 = −2, y8 = 0)

The inverse procedure, consisting of reconstituting the path flows from the flow
vector, is called flow decomposition. It is particularly important in applications. For
instance, consider the case where the flow represents trucks that are transporting
goods on the network. As discussed in Chapter 22, we may want to transport these
goods at minimum cost. The result of the optimization algorithm is a flow vector.
However, the instructions to the drivers of the trucks should be expressed in terms
of path flows. These are obtained from the flow decomposition procedure described
next. The procedure is more complex, and composed of three steps: (i) transform the
flow vector into a circulation by adding artificial nodes and arcs, (ii) decompose the
circulation into simple cycle flows, and (iii) remove the artificial nodes and obtain the
simple path flows for the original network. We describe it first on the same example,
using the flow vector and associated divergences obtained by applying the network
loading procedure.

Step (i) First, we transform the flow vector into a circulation. We add an artificial
node (call it a) and for each node i such that its divergence is non zero, we add
an arc (a, i) with a flow equal to its divergence, as illustrated by Figure 21.15.

Figure 21.15: Adding an artificial node and arcs to transform a flow vector into a
circulation (ya1 = 3, ya3 = 1, ya4 = −1, ya5 = −3, ya6 = 2, ya7 = −2)
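Step (i) can be sketched in a few lines: add the artificial node (labelled 'a' here) and one arc (a, i) per node with nonzero divergence, then check that the extended vector is a circulation. The flow values below are those obtained by the network loading of Table 21.1 (as in Figure 21.14).

```python
# Step (i): extend a flow vector with an artificial node a and arcs (a, i)
# carrying div(x)_i, and check that the result is a circulation.

x = {(1, 2): 4.0, (2, 3): 0.0, (2, 4): 1.5, (3, 1): 1.0, (3, 5): 0.0,
     (4, 2): -2.5, (4, 5): 4.0, (5, 4): 1.0, (6, 7): 2.0, (7, 6): 0.0}

def divergence(v, node):
    return (sum(f for (i, j), f in v.items() if i == node)
            - sum(f for (i, j), f in v.items() if j == node))

nodes = list(range(1, 9))
y = dict(x)
for i in nodes:
    d = divergence(x, i)
    if abs(d) > 1e-9:
        y[('a', i)] = d   # the artificial arc carries the divergence of node i

# Every node of the extended network, including a, now has zero divergence.
assert all(abs(divergence(y, i)) < 1e-9 for i in nodes + ['a'])
print(y[('a', 1)], y[('a', 5)])  # 3.0 -3.0
```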

Step (ii) We generate a simple cycle flow z such that

zij = xij for at least one arc (i, j),
0 ≤ zij ≤ xij for each (i, j) with xij ≥ 0, (21.21)
0 ≥ zij ≥ xij for each (i, j) with xij ≤ 0,

using Algorithm 21.1. It means that the simple cycle flow that we generate
transports the entire flow on at least one arc and part of the total flow on each
arc in a consistent way (that is, in the right direction).

Algorithm 21.1: Generation of a simple cycle flow from a circulation

1 Objective
2 Generate a simple cycle flow z such that
zij = xij for at least one arc (i, j),
0 ≤ zij ≤ xij for each (i, j) with xij ≥ 0, (21.22)
0 ≥ zij ≥ xij for each (i, j) with xij ≤ 0.
3 Input
4 A network (N , A) of m nodes and n arcs.
5 A circulation x ∈ Rn .
6 Output
7 A simple cycle flow z ∈ Rn verifying (21.22).
8 Initialization
9 Select an arc (k, ℓ) such that xkℓ > 0 or an arc (ℓ, k) such that xℓk < 0.
10 S0 := {ℓ}, Cℓ := S0 , t := 0, P := {k}.
11 Repeat
12 t := t + 1, St := ∅.
13 for i = 1, . . . , m, i ∉ Cℓ do
14 if ∃(j, i) such that j ∈ St−1 and xji > 0, or ∃(i, j) such that j ∈ St−1
and xij < 0, then St := St ∪ {i}.
15 Cℓ := Cℓ ∪ St .
16 Until St = ∅.
17 T := index such that k ∈ ST , γ := k, f := +∞.
18 for t = T − 1, . . . , 0 do
19 if ∃i ∈ St such that xiγ > 0 then
20 P := {i →} ∪ P
21 if xiγ < f then f := xiγ
22 else
23 Select i ∈ St such that xγi < 0.
24 P := {i ←} ∪ P
25 if −xγi < f then f := −xγi
26 γ := i.
27 for (i, j) ∈ A do
28 if (i, j) ∈ P→ then zij := f
29 else if (i, j) ∈ P← then zij := −f
30 else zij := 0

The algorithm works as follows. We first select an arc (k, ℓ) transporting a positive
amount of flow, such as arc (1, 2), for example. We group the nodes into layers
using a recursive procedure. The first layer S0 = {ℓ} contains only node ℓ. Layer
St is built from layer St−1 in the following way: node i belongs to layer St if it
does not belong to any previous layer S0 , . . . , St−1 , and there is an arc carrying

flow between a node j in St−1 and node i, that is, at least one of the two conditions
is verified (one condition for forward flows, one for backward):
1. there is an arc (j, i) such that j ∈ St−1 and xji > 0, or
2. there is an arc (i, j) such that j ∈ St−1 and xij < 0.

Intuitively, the nodes in layer St are the next step for the flow going out of the
nodes in layer St−1 . The recursive procedure is interrupted if St is empty. The set
of nodes covered by the flow going out of node ℓ, that is Cℓ = ∪t St , is “isolated”
from the rest of the nodes (N \ Cℓ ), in the sense that there is no flow from one set
to the other. Consider the cut Γ = (Cℓ , N \ Cℓ ).
As we are dealing with a circulation, the flow through the cut is 0. Indeed, if some
units of flow were transferred from set Cℓ to N \ Cℓ , there would be at least one
arc (i, j) transporting positive flow such that i ∈ Cℓ and j ∉ Cℓ . This is not possible,
as the procedure would have included j into one of the sets St and, therefore, it
would belong to Cℓ . If some units of flow were transferred from set N \ Cℓ to set
Cℓ , as we have a circulation, the same amount of flow must also be transferred
in the other direction, which is not possible according to the previous argument.
Therefore, the flow through the cut Γ is zero.
Consequently, as the arc (k, ℓ) was selected such that it transports positive flow,
it cannot be in the cut. This guarantees that node k belongs to Cℓ , and more
specifically, to one ST such that T ≥ 1, as node ℓ is the only node in S0 .
In our example where arc (1, 2) is selected, S0 = {2}. There are four arcs incident
to node 2. Only two of them are transporting flow out of node 2: arc (2, 4)
transporting 1.5 units of flow (forward), and arc (4, 2) transporting 2.5 units of
flow (backward). Therefore, S1 = {4}. From node 4, arc (a, 4) is transporting one
unit of flow in the backward direction, and arc (4, 5) is transporting 4 units of flow
in the forward direction. Therefore, S2 = {a, 5}. From a, arcs (a, 1), (a, 3), and
(a, 6) are transporting positive quantities of flows. From node 5, only arc (5, 4) is
transporting a positive flow. However, as node 4 has already been included in a
layer, it does not qualify for the next. Therefore, S3 = {1, 3, 6}. Finally, we obtain
S4 = {7} and S5 = ∅. It is seen that node 1 belongs to S3 , and that node 8 has
not been included in any set St . Indeed, there is no path from node 2 to node 8.
Starting from node k ∈ ST , we select a sequence of nodes iT −1 ∈ ST −1 , iT −2 ∈
ST −2 , . . . i0 ∈ S0 such that there is an arc transporting positive flow (either
forward or backward) between it−1 and it (note that, by construction, such arcs
always exist, and i0 = ℓ). Together with the arc (k, ℓ), the sequence of nodes
and associated arcs form a simple cycle P such that all its forward arcs have
positive flow, all its backward arcs have negative flow. Consider the minimum

Figure 21.16: Set of nodes (layers S0 = {2}, S1 = {4}, S2 = {a, 5}, S3 = {1, 3, 6},
S4 = {7})

flow f transported by all these arcs, that is

f = min_{(i,j)∈P} |xij | > 0. (21.23)

We define a simple cycle flow z as follows:

zij = f if (i, j) ∈ P→ , −f if (i, j) ∈ P← , 0 otherwise. (21.24)

By construction, the simple cycle flow verifies the properties (21.21).


In our example, the organization of the nodes into layers is illustrated in Figure 21.17,
together with the arc (k, ℓ) that enables us to create a simple cycle:

1 → 2 → 4 ← a → 1. (21.25)

The minimum amount of flow transported by one arc is 1 (arc (a, 4), backward),
so that
z12 = z24 = −za4 = za1 = 1,

and all other arc flows are zero.



Figure 21.17: Construction of a simple cycle

We then subtract the obtained simple cycle flow from the original flow vector:
x+ = x − z. From properties (21.21), the flow on each arc carrying flow both
for x and x+ , that is each arc (i, j) with xij x+ij ≠ 0, has the same sign for x
and x+ . Moreover, there is at least one arc (i, j) such that xij ≠ 0 and x+ij = 0.
Note that this arc is not necessarily the arc (k, ℓ) that was chosen to initiate the
procedure. The procedure is repeated until x+ = 0. Note that it is guaranteed
to happen, as each time the procedure is applied, at least one arc transporting
positive flow before the identification of the simple cycle flow has zero flow after.
So the maximum number of times that the procedure is applied equals the number
of arcs with non zero flow in the original flow vector.
If we start the process again, we generate the following simple cycle flows:

Cycle Flow
1→2→4←a→1 1
1→2→4→5←a→1 0.5
1→2←4→5←a→1 1.5
1→2←4→5←a→3→1 1
4→5→4 1
6→7←a→6 2

Figure 21.18: New flow vector after subtracting the simple cycle flow (x12 = 3,
x24 = 0.5, ya1 = 2, ya4 = 0, all other flows unchanged)

Algorithm 21.2: Circulation decomposition

1 Objective
2 Decomposition of a circulation into consistent simple cycle flows.
3 Input
4 A network (N , A) of m nodes and n arcs.
5 A circulation x ∈ Rn .
6 Output
7 A list of simple cycle flows consistent with x.
8 Initialization
9 w := x, list := ∅.
10 Repeat
11 Obtain a simple cycle flow z from w using Algorithm 21.1.
12 list := list ∪ {z}.
13 w := w − z.
14 Until w = 0.
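The decomposition loop of Algorithm 21.2 can be made executable. In this sketch, the layered cycle search of Algorithm 21.1 is replaced by a simpler greedy walk (follow any arc still carrying flow until a node repeats); this is a simplification of mine, not the book's construction, but it relies on the same fact used in the text: in a circulation, flow entering a node can always leave it.

```python
# A compact executable sketch of Algorithm 21.2.

def next_step(x, i, eps=1e-9):
    """Return (next node, arc, direction) for an arc carrying flow out of node i."""
    for (k, l), f in x.items():
        if k == i and f > eps:
            return l, (k, l), 1      # traverse arc (k, l) forward
        if l == i and f < -eps:
            return k, (k, l), -1     # traverse arc (k, l) backward
    raise ValueError("no flow leaves node %s: x is not a circulation" % i)

def decompose_circulation(x, eps=1e-9):
    """Decompose a circulation into simple cycle flows satisfying (21.22)."""
    x = dict(x)
    cycles = []
    while True:
        start = next((a for a, f in x.items() if abs(f) > eps), None)
        if start is None:
            return cycles
        i = start[0] if x[start] > 0 else start[1]
        position, steps = {i: 0}, []
        while True:
            j, arc, d = next_step(x, i)
            steps.append((arc, d))
            if j in position:                # node repeats: a cycle is closed
                cycle = steps[position[j]:]
                break
            position[j] = len(steps)
            i = j
        f = min(abs(x[a]) for a, _ in cycle)         # equation (21.23)
        cycles.append({a: d * f for a, d in cycle})  # equation (21.24)
        for a, d in cycle:                           # w := w - z
            x[a] -= d * f

# A small circulation: cycle 1 -> 2 -> 3 -> 1 plus a backward arc (1, 3).
x = {(1, 2): 2.0, (2, 3): 2.0, (3, 1): 1.0, (1, 3): -1.0}
cycles = decompose_circulation(x)
print(len(cycles))  # 2
```

Each extracted cycle transports the minimum residual flow along its arcs, so at least one arc drops to zero per iteration and the loop terminates, as argued in the proof of Theorem 21.16.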

Step (iii) It is now sufficient to remove the artificial node a to obtain the paths from
the original network:

Path Flow
1 → 2 → 4 1
1 → 2 → 4 → 5 0.5
1 → 2 ← 4 → 5 1.5
3 → 1 → 2 ← 4 → 5 1
4 → 5 → 4 1
6 → 7 2
Note that the decomposition is not unique. The above simple path flows are not
the same as the ones described in Table 21.1:
Path Flow
1→2→4→5 1.5
1→2←4→5 1.5
1→2←4→5←3→1 1
3→5→4 1
6→7 2
However, when each of them is loaded on the network, the same flow vector is
generated.

Algorithm 21.3: Flow decomposition

1 Objective
2 Decomposition of a flow into consistent simple path flows.
3 Input
4 A network (N , A) of m nodes and n arcs.
5 A flow vector x ∈ Rn .
6 Output
7 A list of simple path flows consistent with x.
8 Initialization
9 N + := N ∪ {a}.
10 A+ := A ∪ {(a, i) | i ∈ N }.
11 yij := xij for each (i, j) ∈ A.
12 yai := div(x)i for each i ∈ N .
13 Apply Algorithm 21.2 on network (N + , A+ ) and circulation y.
14 For each generated cycle containing node a, remove a and the incident arcs.

We see that each of these simple path flows is either a simple cycle flow, or starts
from a supply node and ends at a demand node. Also, the flows are transported in
the same direction as the original flow on each arc. We say that they are consistent
with the original flow x.

Definition 21.15 (Consistent simple path flow). Let (N , A) be a network with m
nodes and n arcs. Let x ∈ Rn be a flow vector, and z ∈ Rn be a simple path flow. z
is said to be consistent with x if
• xij > 0 for each (i, j) such that zij > 0,
• xij < 0 for each (i, j) such that zij < 0,
• one of these two conditions holds:
– div(z) = 0, that is, z is a cycle, or
– div(x)i div(z)i > 0 for each i such that div(z)i ≠ 0, that is, the origin of the
simple path is a supply node for x, and the destination of the simple path is a
demand node.
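Definition 21.15 can be turned into a small checking function. A Python sketch, tested on the flow vector of Figure 21.14 and the simple path flow of path 1 → 2 → 4 → 5; the function names are choices of this sketch.

```python
# Checking Definition 21.15: z is consistent with x if it only uses arcs in the
# same direction as x, and is either a cycle or goes from a supply node of x
# to a demand node of x.

def divergence(v, node):
    return (sum(f for (i, j), f in v.items() if i == node)
            - sum(f for (i, j), f in v.items() if j == node))

def is_consistent(x, z, eps=1e-9):
    for a, f in z.items():
        if f > eps and x.get(a, 0.0) <= eps:
            return False
        if f < -eps and x.get(a, 0.0) >= -eps:
            return False
    nodes = {i for arc in list(x) + list(z) for i in arc}
    dz = {i: divergence(z, i) for i in nodes}
    if all(abs(d) < eps for d in dz.values()):
        return True            # z is a simple cycle flow
    return all(divergence(x, i) * d > eps
               for i, d in dz.items() if abs(d) > eps)

x = {(1, 2): 4.0, (2, 3): 0.0, (2, 4): 1.5, (3, 1): 1.0, (3, 5): 0.0,
     (4, 2): -2.5, (4, 5): 4.0, (5, 4): 1.0, (6, 7): 2.0, (7, 6): 0.0}
z = {(1, 2): 1.5, (2, 4): 1.5, (4, 5): 1.5}    # path 1 -> 2 -> 4 -> 5
print(is_consistent(x, z))                     # True
print(is_consistent(x, {(2, 3): 1.0}))         # False: x23 = 0
```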

We now provide a formal analysis of the above procedure.

Theorem 21.16 (Cycle flow decomposition). Let (N , A) be a network with m
nodes and n arcs. Let x ∈ Rn be a circulation, that is, div(x)i = 0 for i =
1, . . . , m. Then the circulation can be decomposed into T simple cycle flows
z1 , . . . , zT consistent with x such that x = Σ_{t=1}^T zt and T ≤ n.

Proof. Let 0 < p0 ≤ n be the number of arcs transporting a non zero amount of
flow. If we apply Algorithm 21.1, we obtain a simple cycle flow z consistent with x.
Indeed, the conditions of Definition 21.15 are directly obtained from (21.22). Now, let
us consider the flow vector x+ = x − z, and let p1 be the number of arcs transporting
a non zero amount of flow. As xij = zij for at least one arc, we have that p1 ≤ p0 − 1.
Moreover, conditions (21.22) guarantee that the non zero flows of x+ have the exact
same sign as the corresponding flow of x. If we repeat the same process several times,
at each iteration t such that pt > 0, we have pt+1 ≤ pt − 1. The process is stopped
at iteration T when no arc is transporting a non zero amount of flow, that is when
pT = 0. We have

0 = pT ≤ pT −1 − 1 ≤ pT −2 − 2 ≤ . . . ≤ pT −k − k ≤ . . .

for any 0 ≤ k ≤ T . For k = T , we have

0 ≤ p0 − T, that is, T ≤ p0 ≤ n.

Therefore, there are at most n simple cycle flows generated from x.

Corollary 21.17 (Flow decomposition). Let (N , A) be a network with m nodes
and n arcs. Let x ∈ Rn be a flow vector. Then it can be decomposed into T simple
path flows z1 , . . . , zT consistent with x, such that x = Σ_{t=1}^T zt and T ≤ n + m.

Proof. Consider an extended network obtained from (N , A) by adding one node a.
For each node i in the original network, add an arc (a, i). Now consider the flow vector
y such that yij = xij if (i, j) is an arc from the original network, and yai = div(x)i for
the newly added arcs (see Figure 21.15). This network has m + 1 nodes and n + m arcs.
The flow vector y is a circulation. Indeed, for the nodes from the original network,

div(y)i = Σ_{j|(i,j)∈A} yij − Σ_{k|(k,i)∈A} yki − yai
= Σ_{j|(i,j)∈A} xij − Σ_{k|(k,i)∈A} xki − div(x)i
= div(x)i − div(x)i
= 0.

For node a, as all arcs are going out of the node, we have

div(y)a = Σ_{j∈N} yaj = Σ_{j∈N} div(x)j = 0,

where the last result is obtained from (21.12).

From Theorem 21.16, the circulation in the new network can be decomposed into
T ≤ n + m simple cycle flows consistent with y. If such a simple cycle flow does
not contain the node a, it is also a simple cycle flow consistent with x in the original
network. Now, if node a belongs to a simple cycle transporting f > 0 units of flow,
it must be in the following configuration: · · · d ← a → o · · · , as a is the upstream
node of all arcs incident to a in the modified network. If we remove node a from the
cycle, we obtain a simple path flow starting from node o and reaching node d. The
arc (a, o) is transporting f > 0 units of flow in the simple cycle. As it is consistent
with y, it means that yao = div(x)o > 0, and o is a supply node for x. Similarly, arc
(a, d) is transporting −f < 0 units of flow in the simple cycle, as it appears backward.
Again, from consistency of the cycle, we have that yad = div(x)d < 0, and d is a
demand node for x. Therefore, the simple path flow is consistent with x.

Corollary 21.18 (Integer flow decomposition). Let (N , A) be a network with m


nodes and n arcs. Let x ∈ Zn be a flow vector containing only integer values.
Then it can be decomposed into T simple path integer flows z1 , . . . , zT consistent
with x, such that x = Σ_{t=1}^{T} zt and T ≤ n + m.

Proof. In the decomposition of a circulation into simple cycle flows (Algorithm 21.1),
the flow on each simple cycle flow is the smallest flow transported by an arc of the
generated cycle. Therefore, if the flow on each arc is integer, the simple cycle flow
is also integer. And when it is subtracted from the original flow to generate the next
cycle, the flow obviously remains integer-valued.

The above result has important practical implications. If the flow vector represents
physical units (trucks, containers, etc.), it can be decomposed into simple paths
transporting flows of these units.
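The constructive argument behind Corollaries 21.17 and 21.18 can be sketched in code. The following Python function is an illustrative sketch, not code from the book: it assumes all arc flows are nonnegative (the general case also handles negative flows), repeatedly walks along arcs carrying positive flow starting from a supply node when one remains (or from any arc when only a circulation is left), and subtracts the smallest flow met along the walk.

```python
def decompose(flow, supply):
    """Sketch of flow decomposition, assuming every arc flow is >= 0.

    flow:   dict (i, j) -> arc flow
    supply: dict node -> divergence (positive supply, negative demand)
    Returns a list of (node sequence, amount) pairs: each sequence is a
    simple path from a supply to a demand node, or a simple cycle.
    """
    flow = {a: f for a, f in flow.items() if f > 0}
    supply = dict(supply)
    components = []
    while flow:
        starts = [i for i, s in supply.items() if s > 0]
        node = starts[0] if starts else next(iter(flow))[0]
        path, seen = [node], {node: 0}
        while True:
            # follow any outgoing arc carrying positive flow
            node = next(j for (i, j) in flow if i == node)
            if node in seen:                 # the walk closed a cycle
                path = path[seen[node]:] + [node]
                break
            path.append(node)
            seen[node] = len(path) - 1
            if supply.get(node, 0) < 0:      # reached a demand node
                break
        amount = min(flow[a] for a in zip(path, path[1:]))
        if path[0] != path[-1]:              # a path consumes supply/demand
            amount = min(amount, supply[path[0]], -supply[path[-1]])
            supply[path[0]] -= amount
            supply[path[-1]] += amount
        for a in zip(path, path[1:]):
            flow[a] -= amount
            if flow[a] == 0:                 # at least one arc is saturated,
                del flow[a]                  # so the process terminates
        components.append((path, amount))
    return components
```

On the three-arc network (1, 2), (2, 3), (3, 1) with flows (5, 5, 2) and supplies (3, 0, −3), the sketch extracts the path 1 → 2 → 3 carrying 3 units, then the cycle 1 → 2 → 3 → 1 carrying 2 units, and the components sum back to the original flow vector.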

Figure 21.19: Example of an integer flow vector, where the value on each arc represents the arc flow

Example 21.19 (Decomposition of an integer flow vector). Consider the network


presented in Figure 21.19, where the value on each arc represents the arc flow.
Applying Algorithm 21.3 produces the following simple path flows:
Path Flow
1→2→3→6→9 8
1→4→5→2←1 3
1→4→7→8→5→2←1 4
1→2→5→6→9 2
Again, the decomposition is not unique. The following simple path flows are also
consistent with the flow vector:
Path Flow
1→2→3→6→9 1
1→2→5→6→9 2
1→4→5→2→3→6→9 3
1→4→7→8→5→2→3→6→9 4

21.7 Minimum spanning trees


As characterized by condition 5 of Theorem 21.10, a spanning tree can be seen as the
smallest structure that keeps all nodes of a network connected. Indeed, the removal
of a single arc from a tree disconnects it. In this chapter, we analyze the problem of
finding a spanning tree of a network that is associated with the smallest cost. As a

motivation, consider a telecommunication company that must install an optical fiber


infrastructure to connect a set of cities. The company has identified for each pair
of cities if it is feasible to build a connection, and, if so, at what cost. The cities
(vertices) and these potential connections (edges) form a network with an underlying
undirected graph, as the orientations of the arcs are ignored. The problem is to decide

which connections have to be built in order for all cities to be connected, at minimal
cost.
Example 21.20 (Network for the minimum spanning tree problem). A telecommuni-
cation company must connect 7 cities with optical fiber. The potential connections to
be built, together with the associated costs, are modeled by the network represented
in Figure 21.20.
Figure 21.20: Network for the minimum spanning tree (Example 21.20)

A minimum spanning tree, with a total cost of 22, is represented in Figure 21.21,
where arcs represented by a plain line are part of the tree. Note that including arc
(5,7) instead of (4,7) would give another spanning tree with the same cost.

Figure 21.21: Minimum spanning tree for Example 21.20

The following theorem characterizes a minimum spanning tree.



Theorem 21.21 (Minimum spanning tree). Let (N , E) be a network with m nodes


and n edges (that is, undirected arcs). Let c ∈ Rn be the vector of edge costs.
Let E ′ ⊆ E be such that T ∗ = (N , E ′ ) is a spanning tree. If (i, j) ∈ T ∗ , removing
it disconnects the tree into two connected components. Let Ni be the set of nodes

in the connected component containing node i, and Nj be the set of nodes in the
other one. In the network (N , E), consider the cut Γij = (Ni , N \ Ni ) = (Ni , Nj ).
T ∗ is a minimum spanning tree if and only if T ∗ verifies the cut condition, that
is cij ≤ ckℓ , for each (i, j) ∈ T ∗ and each (k, ℓ) ∈ Γij .

Figure 21.22: Illustration of Theorem 21.21

Proof. Figure 21.22 illustrates the edges involved in this theorem. Edge (i, j) belongs
to the spanning tree. If removed, it defines the cut Γij = (Ni , Nj ) with Ni = {1, i, k}
and Nj = {j, ℓ, 6, 7}.
Necessary condition Assume first that T ∗ is a minimum spanning tree. Assume
by contradiction that cij > ckℓ . Then removing arc (i, j) from T ∗ disconnects
the tree. As it belongs to the cut, adding arc (k, ℓ) reconnects it. And the total
cost of the new tree is lower than the cost of T ∗ , contradicting the fact that T ∗
is optimal.
Sufficient condition Assume that T ∗ verifies the cut condition. Consider a mini-
mum spanning tree T̂ . From the necessary condition above, it also verifies the
cut condition. If T̂ = T ∗ , the proof is finished. If not, consider (i, j) ∈ T ∗ such
that (i, j) ∉ T̂ . Adding (i, j) to T̂ creates a cycle (condition 4 of Theorem 21.10).
This cycle must contain an edge (k, ℓ) ∈ Γij . Note that, by construction, the cut
obtained by removing (i, j) from T ∗ is exactly the same as the cut obtained by
removing (k, ℓ) from T̂ . As T ∗ verifies the cut condition, we have cij ≤ ckℓ . As T̂
verifies the cut condition, we have that ckℓ ≤ cij . Consequently, cij = ckℓ . Now
replace edge (k, ℓ) by edge (i, j) in T̂ . The new tree is also optimal, as the cost
has not changed. If this new tree is equal to T ∗ , we are done. Otherwise, we repeat
the process as many times as needed to obtain T ∗ , showing that it is optimal.

This can happen only a finite number of times, as each time a new arc from T ∗
is included in T̂ , and there are exactly m − 1 of them.

The optimality condition provided by Theorem 21.21 suggests a simple algorithm



that constructs step by step the spanning tree, making sure that the cut condition
is always verified. At each iteration of this algorithm, we have a partial tree. We
consider the set M of nodes connected by this partial tree, and the set of all other
nodes N \M. Among all edges belonging to the cut Γ = (M, N \M), we select the one
with the minimum cost, and add it to the partial tree. Such a constructive algorithm,
which considers a locally minimum strategy at each step without reconsidering any
previous decision, is called a greedy algorithm.

Definition 21.22 (Greedy algorithm). A greedy algorithm is an algorithm that


always takes the best immediate, or local, move while finding an answer, without
considering the possible impact of the immediate decisions on later ones.

The greedy algorithm for the minimum spanning tree problem described above is
called the Jarnik-Prim algorithm, from the work of Jarník (1930) and Prim (1957).
It is formally defined as Algorithm 21.4.

Algorithm 21.4: Jarnik-Prim algorithm for minimum spanning tree


1 Objective
2 Calculate a minimum spanning tree.
3 Input
4 A network (N , E) of m nodes and n edges.
5 A vector c ∈ Rn with the cost of each edge.
6 Output
7 The list T ∗ of edges belonging to the minimum spanning tree.
8 Initialization
9 M = {i} where i is any node.
10 T ∗ = ∅.
11 Repeat
12 c := +∞.
13 for (i, j) ∈ Γ (M, N \ M) do
14 if cij < c then
15 c := cij
16 (ei , ej ) := (i, j)

17 T ∗ := T ∗ ∪ {(ei , ej )}.
18 M := M ∪ {ej }.
19 Until M = N .

The iterations of Algorithm 21.4 applied on Example 21.20 are reported in Ta-
ble 21.2.

Table 21.2: Iterations of Algorithm 21.4 on Example 21.20



M T∗ (i, j) cij
1 ∅ (1,2) 6
1,2 (1,2) (2,3) 2
1,2,3 (1,2),(2,3) (3,5) 3
1,2,3,5 (1,2),(2,3),(3,5) (4,5) 1
1,2,3,4,5 (1,2),(2,3),(3,5), (4,5) (4,6) 5
1,2,3,4,5,6 (1,2),(2,3),(3,5), (4,5), (4,6) (4,7) 5
1,2,3,4,5,6,7 (1,2),(2,3),(3,5), (4,5), (4,6), (4,7) — —

The Jarnik-Prim algorithm for the minimum spanning tree problem is an example
where a greedy algorithm provides an optimal solution of a discrete optimization
problem, as shown by Theorem 21.21. For other problems, greedy algorithms may
not necessarily provide an optimal solution. Still, due to their simplicity, they can
also be used as heuristics, as described in Section 27.1.
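Algorithm 21.4 translates almost line by line into code. The following Python sketch is illustrative, not taken from the book: it assumes an edge list stored as a dictionary of costs keyed by node pairs, the network is assumed connected, and it uses the same naive scan of the cut (M, N \ M) as the pseudocode rather than the faster heap-based variants.

```python
def jarnik_prim(nodes, cost):
    """Greedy MST construction following Algorithm 21.4.

    nodes: iterable of node labels (assumed to form a connected network)
    cost:  dict (i, j) -> edge cost; edges are undirected and stored once
    Returns (list of tree edges, total cost).
    """
    nodes = set(nodes)
    M = {min(nodes)}                  # initialization: M = {i}, i arbitrary
    tree, total = [], 0
    while M != nodes:
        # scan the cut (M, N \ M) for the edge with minimum cost
        best_edge, best_cost = None, float("inf")
        for (i, j), c in cost.items():
            if (i in M) != (j in M) and c < best_cost:
                best_edge, best_cost = (i, j), c
        i, j = best_edge
        tree.append(best_edge)
        total += best_cost
        M.add(j if i in M else i)     # the endpoint outside M joins the tree
    return tree, total
```

On a small hypothetical network with edges {(1,2): 1, (2,3): 2, (3,4): 3, (1,4): 4, (1,3): 5}, the algorithm selects the three cheapest cut edges for a total cost of 6, regardless of the starting node.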

21.8 Exercises
Exercise 21.1. Consider the network represented in Figure 21.23, where the number
associated with each arc represents the amount of flow traversing it.
1. What is the indegree, the outdegree, and the degree of each node?
2. Give the adjacency matrix of the network.
3. Represent the network using an adjacency list that also stores the flows.
4. Is the network connected?
5. Is the network strongly connected?
6. Enumerate all simple paths from node a to node g.
7. Enumerate all simple forward paths from node a to node g.
8. Give the divergence of the flow vector at each node. What are the supply nodes?
What are the demand nodes?
9. Consider the cut Γ = (M, N \ M), defined by the set M = {a, b, c}.
(a) What are the forward arcs of the cut?
(b) What are the backward arcs of the cut?
(c) What is the flow through the cut? Check that Theorem 21.14 is verified.
(d) Assume that the capacities on each arc are -3 for the lower bound and 5 for
the upper bound. What is the capacity of the cut?
10. Decompose the flow vector into consistent simple path/cycle flows.
Figure 21.23: Network with a flow vector for Exercise 21.1

Exercise 21.2. Consider the network represented in Figure 21.24, where the number
associated with each arc represents the amount of flow traversing it. Answer the same
questions as Exercise 21.1.

Figure 21.24: Network with a flow vector for Exercise 21.2

Exercise 21.3. Consider the network represented in Figure 21.25, where each arc
(i, j) is associated with its lower bound ℓij , its flow xij and its upper bound uij in the
following way: (ℓij , xij , uij ). Identify at least 4 cuts Γ = (M, N \ M) separating o
from d, such that Γ → contains exactly 4 arcs. For each of them, give the flow through
the cut and the capacity of the cut.

Figure 21.25: Network for Exercises 21.3, 22.5, and 24.2, where each arc (i, j) is
associated with (ℓij , xij , uij ), that is, lower bound, flow, and upper bound.

Exercise 21.4. Determine the minimum spanning tree for the network represented
in Figure 21.26, where the value associated with each edge is its cost. Apply Algo-
rithm 21.4 starting with M = {a}.

Figure 21.26: Network for Exercise 21.4, with cost associated with each edge

Exercise 21.5. A travel agent organizes hiking routes in the Alps for families. For
each possible origin/destination pair, he wants to identify an itinerary that avoids high
altitudes as much as possible. The network represented in Figure 21.27 represents
various locations that serve as the origin or destination of the routes. Each edge
represents a hiking trail between two of these locations. The value associated with

each edge represents the highest altitude along the trail. Solve the problem for the
travel agent.

Figure 21.27: Network of hiking trails in the Alps. The value associated with each
edge is the maximum altitude along the trail.

Chapter 22

The transhipment problem

Contents
22.1 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 529
22.2 Optimality conditions . . . . . . . . . . . . . . . . . . . . . 535
22.3 Total unimodularity . . . . . . . . . . . . . . . . . . . . . . 536
22.4 Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
22.4.1 The shortest path problem . . . . . . . . . . . . . . . . . 539
22.4.2 The maximum flow problem . . . . . . . . . . . . . . . . . 541
22.4.3 The transportation problem . . . . . . . . . . . . . . . . . 544
22.4.4 The assignment problem . . . . . . . . . . . . . . . . . . . 546
22.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549

As discussed in Chapter 21, networks are usually designed to transport flow. As


transporting flow has a cost, we are looking at the least expensive way to transport
the flow from the places where it is produced to the places where it is consumed. This
optimization problem is called the transhipment problem or the minimum cost flow
problem.

22.1 Formulation
We assume that we have a network (N , A) of m nodes and n arcs. The set of nodes
is partitioned into three subsets:
• a set N s of supply nodes representing the places where the flow is produced,
• a set N d of demand nodes representing the places where the flow is consumed,
• a set N t of transhipment nodes where the flow may just be transiting.
We also have access to the following data:
• a vector s ∈ Rm representing the supply/demand of flow, such that
– si > 0 for each i ∈ N s represents the quantity of flow produced at supply node
i,

– si < 0 for each i ∈ N d is such that −si represents the quantity of flow
consumed at demand node i,
– si = 0 for each i ∈ N t ,
• a vector c ∈ Rn representing the cost of transporting a unit of flow on each arc,

• a vector ℓ ∈ Rn representing the lower capacity of each arc,
• a vector u ∈ Rn representing the upper capacity of each arc.
Note that we assume that Σ_{i=1}^{m} si = 0, and ℓij ≤ uij , for each (i, j) ∈ A. Other-
wise, there is no feasible flow vector verifying the constraints. Note also that these
conditions are not sufficient to guarantee feasibility.
As described in Section 1.1, in order to obtain an optimization problem, we need
to define the decision variables, the objective function, and the constraints.
Decision variables The decision variables are the flow on each arc, denoted by x ∈
Rn . From Corollary 21.17, it is always possible to decompose the flow vector
into simple path flows (possibly cycle flows), so that concrete instructions can be
derived from the flow vector to transport the flow from the production sites to
the consumption sites.
Objective function The objective is to minimize the total cost, that is
min_{x∈Rn} Σ_{(i,j)∈A} cij xij .

Note that it is a linear function in the decision variables.


Constraints Two types of constraints have to be verified. First, the flow vector must
be consistent with the given demand and supply. It means that the divergence on
each node must correspond to the value of si , that is

div(x)i = si ∀i ∈ N ,

or, from (21.11),


Σ_{j|(i,j)∈A} xij − Σ_{k|(k,i)∈A} xki = si , ∀i ∈ N .

Second, the value of the flow on each arc must verify the capacity constraints,
that is
ℓij ≤ xij ≤ uij , ∀(i, j) ∈ A.
Therefore, we obtain the following optimization problem:
min_{x∈Rn} Σ_{(i,j)∈A} cij xij (22.1)

subject to

Σ_{j|(i,j)∈A} xij − Σ_{k|(k,i)∈A} xki = si , ∀i ∈ N , (22.2)

ℓij ≤ xij ≤ uij , ∀(i, j) ∈ A. (22.3)



It is a linear optimization problem and can be solved using the simplex method
described in Chapter 16. It is therefore appropriate to transform it into a linear
problem in standard form (6.159)–(6.160). To do this, we apply the transformation
techniques described in Section 1.2. First, in order to set the lower bounds to 0, we
define new variables
x′ij = xij − ℓij . (22.4)
We therefore obtain the following formulation
min_{x′∈Rn} Σ_{(i,j)∈A} cij x′ij + Σ_{(i,j)∈A} cij ℓij

subject to

Σ_{j|(i,j)∈A} x′ij − Σ_{k|(k,i)∈A} x′ki = si + Σ_{k|(k,i)∈A} ℓki − Σ_{j|(i,j)∈A} ℓij , ∀i ∈ N ,

and

0 ≤ x′ij ≤ uij − ℓij , ∀(i, j) ∈ A.
As Σ_{(i,j)∈A} cij ℓij is a constant, it can be omitted in the objective function. Defining

u′ij = uij − ℓij , ∀(i, j) ∈ A, (22.5)

s′i = si + Σ_{k|(k,i)∈A} ℓki − Σ_{j|(i,j)∈A} ℓij , ∀i ∈ N , (22.6)

we obtain

min_{x′∈Rn} Σ_{(i,j)∈A} cij x′ij

subject to

Σ_{j|(i,j)∈A} x′ij − Σ_{k|(k,i)∈A} x′ki = s′i , ∀i ∈ N ,

and

0 ≤ x′ij ≤ u′ij , ∀(i, j) ∈ A.
Next, we transform the upper bound constraints into equality constraints. For each
arc (i, j) we include a slack variable yij (Definition 1.4) and the problem is written as
min_{x′∈Rn} Σ_{(i,j)∈A} cij x′ij (22.7)

subject to

Σ_{j|(i,j)∈A} x′ij − Σ_{k|(k,i)∈A} x′ki = s′i , ∀i ∈ N , (22.8)

x′ij + yij = u′ij , ∀(i, j) ∈ A, (22.9)

and

x′ij ≥ 0, yij ≥ 0. (22.10)
The constraint (22.9) can actually be interpreted as a supply/demand constraint and
included in the set of constraints (22.8). We illustrate this with Examples 22.1 and
22.2.

Figure 22.1: Convention for Example 22.1

Example 22.1 (Transhipment problem in standard form – I). Consider the simple
example represented in Figure 22.2(a), where we adopt the convention depicted in
Figure 22.1, and we report the flow and its bounds on the top of the arc, the cost
on the bottom, and the supply of each node is represented by a dotted line. It is
a network with two nodes and one arc. Node 1 supplies 3 units of flow that are
consumed at node 2 (equivalently, node 2 supplies −3 units of flow). Arc (1, 2) has
lower capacity −5, upper capacity 5, and cost 1. It transports 3 units of flow from
node 1 to node 2. In order to set the lower bound to 0, we perform the change of

variable (22.4), that is x′12 = x12 − (−5) = 3 + 5 = 8. From (22.5), the upper bound
becomes now 5 − (−5) = 10. From (22.6), we obtain that s1′ = 3 − (−5) = 8 and
s2′ = −3 − 5 = −8, as illustrated in Figure 22.2(b). Figure 22.2(c) represents the
modification to the network that accounts for the slack variable.

(a) Original formulation: arc (1, 2) with x12 = 3, bounds (−5, 5), cost c12 = 1; supplies s1 = 3 and s2 = −3.

(b) Change of variable to obtain zero lower bounds: arc (1, 2) with x′12 = 8, bounds (0, 10), cost c12 = 1; supplies s′1 = 8 and s′2 = −8.

(c) Slack variable to remove the upper bounds: new node 3 with supply 10, arc (3, 1) with y31 = 2, bounds (0, +∞), cost c31 = 0, and arc (3, 2) with x32 = 8, bounds (0, +∞), cost c32 = 1; supplies of nodes 1 and 2 are −2 and −8.

Figure 22.2: Transforming a simple transhipment problem into standard form



Indeed, a new variable means a new arc transporting the corresponding flow. The
arc (1, 2) is replaced by a node (node 3) and two arcs. The supply of the new node
corresponds to the upper bound on the flow of the original arc. The new arc (3, 2)
takes the role of the original arc and transports the flow from the original problem
(here, 8) at the same cost. By design, as the supply of node 3 is equal to the upper

bound, the flow on arc (3, 2) never exceeds that value. If it happens to transport less,
the excess flow (that is, the slack) is transported by the new arc (3, 1), at zero cost
and exits the network at node 1. Therefore, the supply of node 1 must be decreased
by the amount of extra flow that has been injected into the network. In this example,
as the upper bound is 10, the supply of node 3 is 10, and the new supply of node 1
is 8 − 10 = −2.
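The change of variables (22.4)–(22.6), applied by hand in Example 22.1, can be sketched in a few lines of Python. The function name and data layout below are illustrative assumptions, not code from the book.

```python
def shift_lower_bounds(arcs, supply):
    """Change of variables (22.4)-(22.6): x' = x - l sets every lower
    bound to zero, adjusting upper bounds and node supplies accordingly.

    arcs:   dict (i, j) -> (lower, upper, cost)
    supply: dict node -> s_i (every node present)
    Returns (arcs with bounds (0, u - l), adjusted supplies s').
    """
    new_arcs, new_supply = {}, dict(supply)
    for (i, j), (l, u, c) in arcs.items():
        new_arcs[(i, j)] = (0, u - l, c)
        # s'_i = s_i + sum of l_ki (incoming) - sum of l_ij (outgoing):
        # arc (i, j) contributes -l to its tail and +l to its head
        new_supply[i] -= l
        new_supply[j] += l
    return new_arcs, new_supply
```

On the data of Example 22.1 (one arc (1, 2) with bounds (−5, 5), cost 1, supplies 3 and −3), the function reproduces the values obtained in the text: bounds (0, 10), s′1 = 8, s′2 = −8.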

Example 22.2 (Transhipment problem in standard form – II). Consider the network
represented in Figure 21.2, with the data presented in Table 22.1. The left part of the
table contains the lower bound, the upper bound, the cost of each arc, and a feasible
flow vector. The right part contains the supply for each node. Note that the values
of the supply sum up to 0 and the bounds are compatible, that is, the lower bound is
lower than the upper bound. The network representation of the transformed problem
(22.11)–(22.13) is depicted in Figure 22.3 and the corresponding data in Table 22.2.

Table 22.1: Data for Example 22.2

Arcs Nodes
(i, j) ℓij uij cij xij i si
(1, 2) −1.1 2.5 1 2.2 1 −1.7
(2, 3) −2.2 2.5 1 −2.2 2 −8.4
(2, 4) −3.3 3.5 1 −3.3 3 5.0
(3, 1) −4.5 4.5 1 3.9 4 5.1
(3, 5) −5.5 5.5 1 −1.1 5 0.0
(4, 2) −6.6 6.5 1 0.7 6 0.5
(4, 5) −7.7 7.5 1 −7.7 7 −0.5
(5, 4) −8.8 8.5 1 −8.8 8 0.0
(6, 7) −9.9 9.5 1 −9.5
(7, 6) −10.0 10.5 1 −10.0

Figure 22.3: Transformed network for Example 22.2

Table 22.2: Data for the transformed network of Example 22.2

(i, j) cij xij i si


(12, 2) 1 3.3 1 −8.7
(12, 1) 0 0.0 2 −22.1
(23, 3) 1 0.0 3 −7.2
(23, 2) 0 8.4 4 −21.0
(24, 4) 1 4.4 5 −21.7
(24, 2) 0 7.3 6 −19.0
(31, 1) 1 0.0 7 −20.9
(31, 3) 0 0.0 8 0.0
(35, 5) 1 0.4 12 3.6
(35, 3) 0 0.0 23 4.7
(42, 2) 1 0.3 24 6.8
(42, 4) 0 4.7 31 9.0
(45, 5) 1 6.8 35 11.0
(45, 4) 0 0.6 42 13.1
(54, 4) 1 6.6 45 15.2
(54, 5) 0 5.8 54 17.3
(67, 7) 1 15.2 67 19.4
(67, 6) 0 17.3 76 20.5
(76, 6) 1 19.0
(76, 7) 0 20.5

We can assume without loss of generality that the transhipment problem is given
in standard form, that is

min_{x∈Rn} Σ_{(i,j)∈A} cij xij (22.11)

subject to

Σ_{j|(i,j)∈A} xij − Σ_{k|(k,i)∈A} xki = si , ∀i ∈ N , (22.12)

and

xij ≥ 0, ∀(i, j) ∈ A. (22.13)

In matrix form, we have

min_{x∈Rn} cT x (22.14)

subject to

Ax = s (22.15)
x ≥ 0, (22.16)

where A ∈ Rm×n is the incidence matrix of the network. Its columns correspond to
the arcs and the rows to the nodes of the network. The column corresponding to arc
(i, j) contains only 0, except for the entry corresponding to node i that contains 1,
and the entry corresponding to node j that contains −1.
Note that these transformations of the problem into a standard form are exactly
the same as the transformations described in Section 1.2. We have simply given a
concrete interpretation in the network context.
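As an illustrative sketch (the function names and the list-of-lists matrix representation are assumptions, not from the book), the incidence matrix described above can be built and checked: multiplying it by a flow vector must return the divergence of each node.

```python
def incidence_matrix(nodes, arcs):
    """Node-arc incidence matrix A: the column of arc (i, j) holds +1 in
    the row of the upstream node i and -1 in the row of the downstream
    node j, all other entries being 0."""
    row = {v: r for r, v in enumerate(nodes)}
    A = [[0] * len(arcs) for _ in nodes]
    for col, (i, j) in enumerate(arcs):
        A[row[i]][col] = 1
        A[row[j]][col] = -1
    return A

def divergence(A, x):
    """Matrix-vector product A x, i.e. div(x) at every node."""
    return [sum(a * xk for a, xk in zip(r, x)) for r in A]
```

For the three-node cycle with arcs (1, 2), (2, 3), (3, 1) and flows (5, 5, 2), the product A x gives divergences (3, 0, −3): node 1 is a supply node and node 3 a demand node.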

22.2 Optimality conditions


As discussed in Part II, optimality conditions provide a theoretical analysis of the
optimization problem that is an important starting point for the design of algorithms.
They characterize the optimal solution, and therefore provide a stopping criterion for
the iterative methods. We investigate these conditions in the specific case of the
transhipment problem.
The Karush-Kuhn-Tucker conditions (see Theorem 6.13) are significantly simpler
for the transhipment problem (22.11)–(22.13). The Lagrangian is written as
 
X X X X X
L(x, λ, µ) = cij xij + λi  xij − xki − si  − µij xij ,
(i,j)∈A i∈N j|(i,j)∈A k|(k,i)∈A (i,j)∈A
(22.17)
where x ∈ Rn , λ ∈ Rm and µ ∈ Rn .
The derivative with respect to the flow variable xij is

∂L
= cij + λi − λj − µij . (22.18)
∂xij

Therefore, the necessary optimality conditions (6.55) and (6.56) are written as

cij + λi − λj ≥ 0, ∀(i, j) ∈ A. (22.19)

Moreover, from condition (6.57), for each arc transporting flow, we have µij = 0 and,

therefore,
cij + λi − λj = 0, ∀(i, j) such that xij > 0. (22.20)
Conditions (22.19) and (22.20) are the complementarity slackness conditions pre-
sented in Theorem 6.34. They are therefore sufficient and necessary optimality con-
ditions for the transhipment problem. There is a dual variable λi associated with
each node i. When the complementarity slackness conditions are verified, λ is also an
optimal solution of the dual problem. Note that only differences of the dual variables
are involved in these conditions. Therefore, if λ ∈ Rm verifies conditions (22.19) and
(22.20), so does any vector such that all values of λ are shifted by a quantity σ, that
is λ + σe, where e ∈ Rm is a vector composed only of 1, for any σ ∈ R.
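Conditions (22.19) and (22.20) give a direct way to test whether a candidate flow is optimal, given node potentials λ. The sketch below is illustrative (the function name and dictionary-based data layout are assumptions): it computes the reduced cost cij + λi − λj of every arc and checks nonnegativity, with equality required on arcs carrying positive flow.

```python
def verifies_optimality(cost, flow, lam, tol=1e-9):
    """Complementarity slackness check for the standard-form problem:
    condition (22.19): c_ij + lam_i - lam_j >= 0 on every arc;
    condition (22.20): equality on every arc with x_ij > 0."""
    for (i, j), c in cost.items():
        reduced = c + lam[i] - lam[j]   # reduced cost of arc (i, j)
        if reduced < -tol:
            return False                # violates (22.19)
        if flow[(i, j)] > tol and abs(reduced) > tol:
            return False                # violates (22.20)
    return True
```

For instance, with arcs (1, 2), (1, 3), (3, 2) of costs 1, 5, 1 and potentials λ = (0, 1, 1), sending the flow on the direct arc (1, 2) satisfies both conditions, while routing it through node 3 uses arc (1, 3), whose reduced cost 4 is positive, so (22.20) fails.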

22.3 Total unimodularity


As described in Section 22.1, the incidence matrix A involved in the constraints of
the transhipment problem has a special structure. It has as many columns as arcs
in the network and as many rows as nodes. For each arc (i, j), there are exactly two
entries in the corresponding column: 1 at row i and −1 at row j. Therefore, the
sum of all rows of the matrix is 0. It has an interesting property when the vector of
supply/demand is integer.
Remember that the optimal solution of the linear optimization problem is a feasi-
ble basic solution (see Definition 3.38). It means that it has the form x∗ = (x∗B , x∗N ),
where x∗N = 0 and

x∗B = B−1 s, (22.21)

where B is a square invertible matrix. From Cramer's rule (C.26), we have

x∗B = (1/det(B)) C(B)T s, (22.22)

where C(B) is the cofactor matrix of B (see Definition B.12). As each entry of C(B)
is a determinant of a matrix containing only 0, 1, and −1, they are all integers.
Therefore, if the supply vector s is integer, the vector C(B)T s is also integer. Now, if
the determinant of B happens to be either 1 or −1 (it cannot be 0, as B is invertible),
we obtain the nice property that x∗B (and, consequently, x∗ ) is integer. In this case,
B is said to be a unimodular matrix (see Definition B.14).
It is particularly valuable to obtain integer solutions without including explicit
integrality constraints in the optimization problem. Indeed, as discussed during the
presentation of the example in Section 1.1.7, constraining the variables to have only
integer values dramatically complicates the optimization problem, and methods such
as the simplex algorithm cannot be used anymore. Therefore, the property described

above is particularly important, as it allows us to handle these constraints implicitly


and not explicitly. Indeed, it suffices to verify that the property applies to the problem
at hand, and to make sure that the data is integer to obtain an integer optimal solution
from the simplex algorithm.

Definition 22.3 (Total unimodularity). The matrix A ∈ Zm×n is totally unimodular


if the determinant of each square submatrix of A is 0, −1 or +1. In particular, every
entry of the matrix is 0, −1 or +1.

If the matrix A of the constraint is totally unimodular, each basis B has determi-
nant −1 or 1. Indeed, as it is non singular, the determinant cannot be 0. Therefore,
using the argument discussed above, any feasible basic solution (hence, any vertex of
the constraint polyhedron) is integer, including the optimal basic solutions.

Theorem 22.4 (Integrality of the basic solutions). Consider the polyhedron rep-
resented in standard form {x ∈ Rn |Ax = b, x ≥ 0}, where A ∈ Zm×n and b ∈ Zm .
If A is totally unimodular, then every basic solution is integer.

Proof. According to Definition 3.38, every basic solution is decomposed into xN = 0


and
xB = B−1 b,
where the matrix B composed of m columns of the matrix A is non singular. Therefore
det(B) ≠ 0. We use Cramer's rule to write

xB = (1/det(B)) C(B)T b, (22.23)

where C(B) is the cofactor matrix of B, and is integer. As A is totally unimodular,
det(B) = ±1. As b is integer, so is xB .
Example 22.5 (Totally unimodular matrices). Consider the incidence matrix of the
network represented in Figure 22.2(c):

      ⎡ −1   0 ⎤
  A = ⎢  0  −1 ⎥ .
      ⎣  1   1 ⎦

Each square submatrix of size 1, that is each element of the matrix, is 0, −1 or +1.
Each square submatrix of size 2 is unimodular:

  det ⎡ −1   0 ⎤ = 1,   det ⎡ −1   0 ⎤ = −1,   det ⎡  0  −1 ⎤ = 1.
      ⎣  0  −1 ⎦            ⎣  1   1 ⎦             ⎣  1   1 ⎦

Therefore, A is totally unimodular. Note that it is not sufficient to have entries 0,
−1 or +1 to be totally unimodular. For example, the matrix

  ⎡ 1  −1 ⎤
  ⎢ 0   0 ⎥
  ⎣ 1   1 ⎦

is not totally unimodular, as

  det ⎡ 1  −1 ⎤ = 2.
      ⎣ 1   1 ⎦
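Definition 22.3 can also be checked mechanically by enumerating every square submatrix. The sketch below is illustrative and exponential in cost, so it is only usable on small matrices; it reproduces the two conclusions of Example 22.5.

```python
from itertools import combinations

def det_int(M):
    """Integer determinant by cofactor expansion along the first row
    (fine for the tiny submatrices enumerated here)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               det_int([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def is_totally_unimodular(A):
    """Brute-force check of Definition 22.3: every square submatrix of A
    must have determinant 0, -1 or +1."""
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                sub = [[A[i][j] for j in cols] for i in rows]
                if det_int(sub) not in (-1, 0, 1):
                    return False
    return True
```

Applied to the two matrices of Example 22.5, the first (an incidence matrix) passes the check, while the second fails because of its 2 × 2 submatrix of determinant 2.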

Theorem 22.6 (Total unimodularity of the incidence matrix). Let (N , A) be a
network with m nodes and n arcs. Let A ∈ {−1, 0, 1}m×n be the incidence matrix
of the network, where each entry aik is defined as

         ⎧  1  if i is the upstream node of arc k,
  aik =  ⎨ −1  if i is the downstream node of arc k,        (22.24)
         ⎩  0  otherwise.

Then the matrix A is totally unimodular.

Proof. Assume by contradiction that A is not totally unimodular. There are, there-
fore, submatrices of A such that their determinant is not 0, 1, or −1. Among them,
consider one submatrix with minimum size k and call it B, with det(B) 6∈ {−1, 0, 1}.
As A has at most two non zero entries in each column, each column of B can have
0, 1, or 2 non zero entries.
Assume first that one column of B contains only 0. If it were the case, B would be
singular, and its determinant would be 0, which is not possible. Consequently, each
column of B contains at least one non zero entry.
Assume that one column of B contains exactly one non zero entry. Without loss
of generality, assume this entry to be b11 . It means that B is of the form
 
      ⎡ b11  b12  · · ·  b1k ⎤
  B = ⎢  0                   ⎥
      ⎢  ⋮         B′        ⎥
      ⎣  0                   ⎦

Therefore,
det(B) = b11 det(B ′ ).

As b11 is either 1 or −1, then det(B ′ ) is the same as det(B), up to the sign. In
particular, det(B ′ ) is not 0, 1, or −1. As B ′ is strictly smaller than B, this is not
possible as B is the smallest submatrix with such a property. Consequently, each
column of B contains exactly two non zero entries. As they are subvectors of the
column of A, one of these entries is 1 and the other is −1.
Therefore, if we sum up all rows of the matrix B, we obtain 0, and the rows are
linearly dependent. It means that B is singular and det(B) = 0, which is not possible.
This contradicts the fact that A is not totally unimodular, and proves the result.

Corollary 22.7 (Integer optimal solution of the transhipment problem). Consider


the transhipment problem (22.11)–(22.13). If the supply vector s is integer, that
is, if s ∈ Zm , and if the problem is bounded, then it has an optimal solution
which is integer.

Proof. Direct consequence of Theorems 22.4 and 22.6.

The concept of totally unimodular matrices goes beyond the transhipment prob-
lem, and plays an important role in discrete optimization. We refer the reader to
Nemhauser and Wolsey (1988), Wolsey (1998), or Bertsimas and Weismantel (2005)
for a more comprehensive description of the topic.

22.4 Modeling
The transhipment model embeds a variety of other network problems. In general,
these problems are associated with a dedicated algorithm that solves them more effi-
ciently than the simplex method applied to the transhipment formulation. However,
as they are instances of the transhipment problem, they also have its properties. In
particular, the integrality of an optimal solution is guaranteed if the supply/demand
is integer. In this section, we formulate these problems and show why they are
transhipment problems. Two of them are treated in greater detail in later chapters,
where specific algorithms are presented.

22.4.1 The shortest path problem

The pervasiveness of GPS navigation systems and of online map services allows ev-
erybody to compute the fastest or the shortest itinerary between two points from
a computer or a navigation device (see Figure 22.4). The problem of finding such
itineraries is usually referred to as the shortest path problem and can be defined in
different ways. The classical definition is as follows.

Definition 22.8 (Shortest path problem: single origin-single destination). Consider


a network (N , A) of m nodes and n arcs, and a vector c ∈ Rn representing the cost
of traversing each arc. Consider a node o called the origin and a node d called the
destination. The shortest path problem consists in finding a simple forward path
with origin o and destination d, and with the smallest cost.

From an algorithmic point of view, it is convenient to solve the problem for all
destinations at once.

Figure 22.4: The fastest path between EPFL and Zinal computed by
OpenRouteService.org

Definition 22.9 (Shortest path problem: single origin-multiple destinations). Con-


sider a network (N , A) of m nodes and n arcs, and a vector c ∈ Rn representing the
cost of traversing each arc. Consider a node o called the origin. The shortest path
problem consists in finding for each node i ≠ o in the network a simple forward path
with origin o and destination i, and with the smallest cost.

Definition 21.17 of the path cost simplifies here. Indeed, we are looking only at
forward paths, and the set P← is empty. Moreover, we assume that only one unit of
flow is following the path, so that the cost of path P is
$$C(P) = \sum_{(i,j) \in P} c_{ij}. \qquad (22.25)$$

As discussed in Section 21.5.4, the concept of cost is relatively general. Even if the
name of the problem refers to the “shortest” path, the cost does not need to be the
physical length of the arc. In the example of the online tools for itineraries, the
cost can be the travel time to traverse each arc. In this case, the solution of the
shortest path problem is actually the fastest path between o and d. As discussed in
Section 21.3, the only requirement is that the cost of a path is the sum of the cost of
its arcs.
The single origin-single destination shortest path problem is a transhipment prob-
lem where
• the cost on each arc is cij ,
• the supply for the origin is 1, that is so = 1,

• the demand for the destination is 1, that is sd = −1,


• the supply for any other node is 0,
• the lower bound on each arc is 0,
• there is no upper bound.

The transhipment problem (22.11)–(22.13) becomes

$$\min_{x \in \mathbb{R}^n} \sum_{(i,j) \in \mathcal{A}} c_{ij} x_{ij} \qquad (22.26)$$

subject to

$$\sum_{j | (o,j) \in \mathcal{A}} x_{oj} - \sum_{k | (k,o) \in \mathcal{A}} x_{ko} = 1,$$

$$\sum_{j | (d,j) \in \mathcal{A}} x_{dj} - \sum_{k | (k,d) \in \mathcal{A}} x_{kd} = -1,$$

$$\sum_{j | (i,j) \in \mathcal{A}} x_{ij} - \sum_{k | (k,i) \in \mathcal{A}} x_{ki} = 0, \quad \forall i \in \mathcal{N},\ i \neq o,\ i \neq d,$$

$$x_{ij} \geq 0, \quad \forall (i,j) \in \mathcal{A}.$$

The properties of the shortest path problem, as well as specific algorithms to
solve it, are discussed in Chapter 23.
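A quick numerical sanity check of this formulation: pushing one unit of flow along any forward path from o to d gives divergence +1 at o, −1 at d, and 0 everywhere else, exactly the right-hand sides above. The sketch below does this on a toy network of our own (the nodes o, a, b, d and the arcs are assumptions for illustration, not taken from the text).

```python
def divergence(nodes, flow):
    """div(x)_i: flow leaving node i minus flow entering it."""
    div = {i: 0 for i in nodes}
    for (i, j), x in flow.items():
        div[i] += x
        div[j] -= x
    return div

# Toy network (our own example): one unit of flow on the path o -> a -> d.
nodes = ["o", "a", "b", "d"]
flow = {("o", "a"): 1, ("a", "d"): 1, ("o", "b"): 0, ("b", "d"): 0}
div = divergence(nodes, flow)
print(div)  # {'o': 1, 'a': 0, 'b': 0, 'd': -1}
```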

22.4.2 The maximum flow problem


The maximum flow problem consists in pushing as much flow as possible through
a network with given capacities on the arcs. It was first motivated by the analysis
performed by the American army of the railway network operated by the Soviet Union
across Eastern Europe during the cold war (Harris and Ross, 1955, Schrijver, 2002).

Definition 22.10 (Maximum flow problem). Consider a network (N , A) of m nodes


and n arcs, and a vector u ∈ Rn representing the capacity of each arc. Consider a
node o called the origin (or the source), and a node d called the destination (or the
sink). The maximum flow problem consists in identifying a feasible flow vector that
transports as much flow as possible from o to d. More formally, it seeks a flow vector
x such that
• div(x)i = 0 for all i ≠ o, i ≠ d,
• div(x)o = − div(x)d is maximized.

Consider the network represented in Figure 22.5, where the value of the upper
capacity of each arc is shown next to it. It may represent a railway network, with
the maximum number of trains per hour that can proceed between two cities. Or it
may represent a network of pipelines, where the capacity is the maximum number of
megaliters of oil that the pipe can transport per hour.

Figure 22.5: An example of a maximum flow problem. The value on each arc is its
capacity.

In this simple example, there are only 3 paths that can be used to transport the
flow:
• path 1: o → 2 → 3 → d,
• path 2: o → 3 → d,
• path 3: o → 2 → 4 → d.
Path 1 cannot transport more than 2 units of flow, which is the capacity of its first
and last arcs. If we send 2 units along path 1, path 2 cannot be used. Indeed,
path 2 includes arc (3, d), which cannot accommodate more than the 2 units already
transported along path 1. Similarly, path 3 cannot be used, as it includes arc (o, 2),
which is also at capacity. This strategy transports 2 units of flow from node o to node
d. It is possible to do better with the following reasoning. Path 2 cannot transport
more than 2 units of flow, which is the capacity of its last arc. If we send 2 units of
flow along path 2, path 1 cannot be used, as they share arc (3, d), which is at capacity.
But no arc in path 3 has been used. We can send a maximum of 1 unit along path
3, which is the capacity of its last arc. With this strategy, a total of 3 units are sent
from o to d. The associated flow vector is represented in Figure 22.6, where the flow
is shown next to each arc, together with the arc capacity (in square brackets).


Figure 22.6: A solution of a maximum flow problem. The values on each arc are the
flows x and the capacities u in format x[u].

It happens to be the maximum possible. Indeed, all arcs arriving at node d are full
and, whatever is possible upstream, no more flow can arrive there. Clearly, such
enumerations cannot be done on real networks. Specialized algorithms are described
in Chapter 24.
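On a network this small, the strategy enumeration above can be automated: try every integer combination of flow on the three paths, keep those respecting the arc capacities, and record the best total. This brute-force sketch (only workable for toy instances; Chapter 24 presents real algorithms) recovers the value 3 for the network of Figure 22.5.

```python
from itertools import product

# Arc capacities of Figure 22.5.
cap = {("o", 2): 2, (2, 3): 3, (3, "d"): 2,
       ("o", 3): 3, (2, 4): 4, (4, "d"): 1}

paths = [
    [("o", 2), (2, 3), (3, "d")],   # path 1: o -> 2 -> 3 -> d
    [("o", 3), (3, "d")],           # path 2: o -> 3 -> d
    [("o", 2), (2, 4), (4, "d")],   # path 3: o -> 2 -> 4 -> d
]

best = 0
max_units = max(cap.values())
for f in product(range(max_units + 1), repeat=len(paths)):
    load = {arc: 0 for arc in cap}
    for units, path in zip(f, paths):
        for arc in path:
            load[arc] += units
    if all(load[arc] <= cap[arc] for arc in cap):
        best = max(best, sum(f))
print(best)  # 3
```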
The maximum flow problem can be modeled as a transhipment problem. There is

no cost associated with the arcs, and the quantity that is optimized is the divergence
of a node. The idea is to include in the network an artificial arc that takes the role of
a counter. This arc, connecting d to o, sends any unit of flow reaching the destination
back to the origin, with a cost of −1. This creates a circulation (the divergence of
each node is zero). As the real arcs have all zero costs, the total cost represents the
number of units of flow that are able to reach d from o through the “real” network
(with the opposite sign, as we need to maximize). This is illustrated in Figure 22.7,
where the cost on each arc and the upper capacity are shown.


Figure 22.7: Modeling a maximum flow problem as a transhipment problem. The


values on each arc are the cost c and the upper bound u in format [c; u].

The maximum flow problem is therefore a transhipment problem where

• the cost on the artificial arc is −1,


• the cost on every other arc is 0,
• the supply/demand for each node is 0,
• the lower bound on each arc is 0,
• there is no upper bound on the artificial arc,
• the upper bound on every other arc is given by the problem definition.

The optimization problem can then be written as follows:

$$\min_{x \in \mathbb{R}^{n+1}} -x_{do} \qquad (22.27)$$

subject to

$$\sum_{j | (i,j) \in \mathcal{A} \cup (d,o)} x_{ij} - \sum_{k | (k,i) \in \mathcal{A} \cup (d,o)} x_{ki} = 0, \quad \forall i \in \mathcal{N}, \qquad (22.28)$$

$$x_{ij} \leq u_{ij}, \quad \forall (i,j) \in \mathcal{A}, \qquad (22.29)$$

$$x_{ij} \geq 0, \quad \forall (i,j) \in \mathcal{A}, \qquad (22.30)$$

$$x_{do} \geq 0, \qquad (22.31)$$

where the arc (d, o) is the artificial arc added to create a circulation. In the example
presented in Figure 22.6, this arc would transport 3 units of flow.
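This construction can be checked numerically on the flow of Figure 22.6: with x_do = 3 on the artificial arc, the divergence of every node is zero, and the transhipment cost is −3, the number of units reaching d with the opposite sign. A verification sketch:

```python
# Flows of Figure 22.6, plus the artificial arc (d, o) carrying 3 units.
flow = {("o", 2): 1, (2, 3): 0, (3, "d"): 2,
        ("o", 3): 2, (2, 4): 1, (4, "d"): 1,
        ("d", "o"): 3}
cost = {arc: 0 for arc in flow}
cost[("d", "o")] = -1  # the artificial arc acts as a counter

# Divergence: flow out minus flow in, at every node.
div = {}
for (i, j), x in flow.items():
    div[i] = div.get(i, 0) + x
    div[j] = div.get(j, 0) - x

is_circulation = all(v == 0 for v in div.values())
total_cost = sum(cost[arc] * flow[arc] for arc in flow)
print(is_circulation, total_cost)  # True -3
```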
Note that the maximum flow problem can be used to model a wide variety of
problems. An interesting example is the problem of locating n queens on a chessboard
so that they are not attacking each other (Gardner and Nicolio, 2008).

22.4.3 The transportation problem


The transportation problem is a special case of the transhipment problem where the
flow generated at the origin is directly sent to the destination, without transhipment.

Definition 22.11 (Transportation problem). Consider mo suppliers and md cus-


tomers. Supplier i produces si units of flow, i = 1, . . . , mo , and customer j consumes
tj units of flow, j = 1, . . . , md . Each supplier is associated with a list of customers
that can be served. It costs cij to transport the flow from supplier i to customer j.
The transportation problem consists in deciding how much flow each supplier must
send to each customer to satisfy the demand at minimum cost.

Note that a necessary condition for the problem to be feasible is that the total
supply $\sum_i s_i$ must equal the total demand $\sum_j t_j$. As it appears from the definition,
this problem is not directly related to a physical network. The next example is about
the distribution of electricity.

Example 22.12 (Provision of electricity in Switzerland). Consider an electricity


company which needs to serve 4 cities (Zürich, Geneva, Lausanne and Bern) using 3
nuclear plants: Mühleberg, producing 3,110 gigawatt-hours (GWh) per year, Beznau,
with 3,198 GWh and Leibstadt, producing 10,205 GWh. The cities consume 8,961,
3,777, 2,517, and 1,258 GWh per year, respectively. Using an arbitrary unit, the cost
of transporting 1 GWh from a given plant to a given city is as follows:

Zürich Geneva Lausanne Bern


Mühleberg 18 6 10 9
Beznau 9 16 13 7
Leibstadt 14 9 16 5
Note that, in this example, each supplier can potentially serve each client.

The optimal solution, for a total cost of 173,138, suggests serving Lausanne entirely
from Mühleberg and serving Bern entirely from Leibstadt. The entire production of
Beznau is dedicated to Zürich. The rest is distributed to match the demand and
supply constraints, as described on the following table:
Zürich Geneva Lausanne Bern

Mühleberg 0 593 2,517 0


Beznau 3,198 0 0 0
Leibstadt 5,763 3,184 0 1,258
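The tables above are easy to verify: each row of the solution must sum to the plant's production, each column to the city's consumption, and the total cost must come to 173,138. A verification sketch in Python:

```python
# Rows: Mühleberg, Beznau, Leibstadt. Columns: Zürich, Geneva, Lausanne, Bern.
supply = [3110, 3198, 10205]          # GWh per year produced by each plant
demand = [8961, 3777, 2517, 1258]     # GWh per year consumed by each city
cost = [[18,  6, 10, 9],
        [ 9, 16, 13, 7],
        [14,  9, 16, 5]]
x = [[   0,  593, 2517,    0],        # optimal flows from the table above
     [3198,    0,    0,    0],
     [5763, 3184,    0, 1258]]

# Supply and demand constraints of the transportation problem.
assert all(sum(row) == s for row, s in zip(x, supply))
assert all(sum(x[i][j] for i in range(3)) == d for j, d in enumerate(demand))
total = sum(cost[i][j] * x[i][j] for i in range(3) for j in range(4))
print(total)  # 173138
```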

In order to obtain a transhipment problem formulation, it is necessary to model


the problem using a mathematical network. This is done in the following way:
• a (supply) node is associated with each supplier,
• a (demand) node is associated with each customer,
• for each supplier i, an arc (i, j) connects the supplier with each customer j that
can be served,
• the cost associated with each arc is the cost of the associated transportation.
The network for Example 22.12 is represented in Figure 22.8. The transportation
problem is therefore a transhipment problem where
• the cost on each arc is cij ,
• the supply for the node corresponding to supplier i is si ,
• the supply for the node corresponding to customer j is −tj ,
• the lower bound on each arc is 0,
• there is no upper bound.

Figure 22.8: Modeling of Example 22.12 using a network representation



The optimization problem can then be written as follows:

$$\min_{x \in \mathbb{R}^n} \sum_{i=1}^{m_o} \sum_{j=1}^{m_d} c_{ij} x_{ij}$$

subject to

$$\sum_{j=1}^{m_d} x_{ij} = s_i, \quad i = 1, \ldots, m_o,$$

$$\sum_{i=1}^{m_o} x_{ij} = t_j, \quad j = 1, \ldots, m_d,$$

$$x_{ij} \geq 0, \quad i = 1, \ldots, m_o,\ j = 1, \ldots, m_d,$$

$$x_{ij} = 0, \quad \text{if } i \text{ does not serve } j.$$

22.4.4 The assignment problem


The assignment problem can be seen as a version of the transportation problem, where
each supplier sends one unit of flow and each customer receives one. It is defined as
follows.

Definition 22.13 (Assignment problem). Consider n resources and n tasks to be


performed. The cost of assigning resource i to task j is cij . The problem consists in
assigning the n resources to the n tasks at minimal total cost.

This is again an example of a problem that has no a priori relationship with a


network. Still, it can be modeled as a transhipment problem.
Example 22.14 (Assignment). After the death of her husband, my grandmother
discovered four masterpieces in her attic (see Figure 22.9). She does not want to keep
them and would like to sell each of them to one of her four children. Each child made
an offer for the masterpieces of interest, in the following way (in kEuros):
Botticelli Bruegel Kandinsky Bierlaire
Harry 8,000 11,000 — —
Ron 9,000 13,000 12,000 —
Hermione 9,000 — 11,000 0.01
Ginny — 14,000 12,000 —
Which masterpiece should she sell to which child? The best deal is obtained by selling
the Botticelli to Harry, the Bruegel to Ginny, the Kandinsky to Ron, and the last one
to Hermione, for a total of 34,000.01 kEuros. Note that Definition 22.13 mentions
minimizing the costs and not maximizing the profit. Here, we have a maximization
problem. Therefore, the cost for an assignment is the proposed price with the opposite
sign.

(a) Botticelli, 1485 (b) Bruegel, 1558

(c) Kandinsky, 1923


(d) Bierlaire, 1971

Figure 22.9: Masterpieces
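With four children and four masterpieces there are only 4! = 24 possible assignments, so the best deal can be confirmed by simple enumeration, skipping assignments where a child gets a masterpiece they made no offer for. The sketch below maximizes the offers directly (equivalently, it minimizes their opposites, as in the transhipment formulation):

```python
from itertools import permutations

pieces = ["Botticelli", "Bruegel", "Kandinsky", "Bierlaire"]
offers = {  # in kEuros; missing entries mean no offer was made
    "Harry":    {"Botticelli": 8000, "Bruegel": 11000},
    "Ron":      {"Botticelli": 9000, "Bruegel": 13000, "Kandinsky": 12000},
    "Hermione": {"Botticelli": 9000, "Kandinsky": 11000, "Bierlaire": 0.01},
    "Ginny":    {"Bruegel": 14000, "Kandinsky": 12000},
}

children = list(offers)
best_value, best_plan = None, None
for perm in permutations(pieces):
    # Keep only assignments where every child actually made an offer.
    if all(p in offers[c] for c, p in zip(children, perm)):
        value = sum(offers[c][p] for c, p in zip(children, perm))
        if best_value is None or value > best_value:
            best_value, best_plan = value, dict(zip(children, perm))
print(best_plan)
print(best_value)  # 34000.01
```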

This problem is a typical example of a combinatorial optimization problem. Indeed,
the objective is to identify the best combination of resources and tasks. Such
problems are in general highly complicated, and require advanced techniques to be
solved (some of them are described in Part VII of this book). However, in the case of
the assignment problem, we can model it as a transhipment problem. The network
model is built in a way similar to the transportation problem:
• a node is associated with each resource,
• a node is associated with each task,
• for each resource i, an arc (i, j) connects the resource with each task j if the
assignment is feasible,
• the cost associated with each arc is the benefit of the associated assignment, with
the opposite sign.
The network for Example 22.14 is represented in Figure 22.10. The assignment
problem is therefore a transportation problem, that is a transhipment problem where
• the cost on each arc is cij ,
• the supply for each node corresponding to a resource is 1,


Figure 22.10: Modeling of Example 22.14 using a network representation

• the supply for each node corresponding to a task is −1,


• the lower bound on each arc is 0,
• the upper bound on each arc is 1.
The optimization problem can then be written as follows:

$$\min_{x \in \mathbb{R}^{n^2}} \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij}$$

subject to

$$\sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, \ldots, n, \qquad (22.32)$$

$$\sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, \ldots, n, \qquad (22.33)$$

$$x_{ij} \geq 0, \quad i = 1, \ldots, n,\ j = 1, \ldots, n, \qquad (22.34)$$

$$x_{ij} \leq 1, \quad i = 1, \ldots, n,\ j = 1, \ldots, n. \qquad (22.35)$$

The variable xij is meant to take the value 1 if resource i is assigned to task j, and 0
otherwise. The last two constraints should be written as

xij ∈ {0, 1}, i = 1, . . . , n, j = 1, . . . , n.

However, Corollary 22.7 guarantees that there is an optimal solution that is integer,
as the supply vector is integer. Therefore, the entries of that optimal vector x are

guaranteed to be 0 or 1, without an explicit constraint imposing it. The constraints


(22.32) guarantee that each resource is assigned to exactly one task. Similarly, the
constraints (22.33) guarantee that each task is assigned to exactly one resource.

22.5 Exercises

Exercise 22.1. Consider a version of the assignment problem (Definition 22.13)


where there are m resources and n tasks, where m > n. Write it as a transhipment
problem.
Exercise 22.2. The coach of a swimming team needs to assign swimmers to a 200-
meters medley relay team to send to the Junior Olympics. Since most of his best
swimmers are very fast in more than one stroke, it is not clear which swimmer should
be assigned to each of the four strokes. The five fastest swimmers and the best time
(in seconds) they have achieved in each of the strokes (for 50 meters) are presented
in Table 22.3.

Table 22.3: Best time for each swimmer and each stroke (Exercise 22.2)
Stroke Anna Eva Marija Shadi Marianne
Backstroke 37.7 32.9 33.8 37.0 35.4
Breaststroke 43.4 33.1 42.2 34.7 41.8
Butterfly 33.3 28.5 38.9 30.4 33.6
Freestyle 29.2 26.4 29.6 28.5 31.1

Transform the problem into an assignment problem (Definition 22.13) and provide
its mathematical formulation as a transhipment problem. Find an optimal solution
using the simplex algorithm (Algorithm 16.5).
Exercise 22.3. During a wedding dinner gathering p families, the guests are invited
to sit at q tables. Denote by ai the number of members of family i, and by bj the
number of seats at table j. In order to encourage social exchanges, two members of
the same family cannot sit at the same table. Moreover, the first and the second
family do not talk to each other anymore, and do not want to be seated at the same
table.
1. Formulate a network model that helps to seat all the guests and respect the above
mentioned conditions.
2. Investigate the existence of a solution.
3. Solve the problem with p = 6, q = 6, a1 = 3, a2 = 2, a3 = 5, a4 = 4, a5 = 3,
a6 = 1, bj = 3, j = 1, . . . , 6 using the simplex algorithm (Algorithm 16.5).
Exercise 22.4. After spending a weekend with friends, Aria has determined the
amount of money that they owe each other, as presented in Table 22.4.
François requests a simple solution that minimizes the sum of transfers. Model
that problem as a transhipment problem and solve it using the simplex algorithm

Table 22.4: Money owed by each friend (Exercise 22.4).


Who How much To whom
Aria 6.- Monia
Aria 16.- Gabriel

Monia 0. -
Gabriel 8.- Monia
François 10.- Aria
François 16.- Monia

(Algorithm 16.5).
Exercise 22.5. Consider the network represented in Figure 21.25, where each arc
(i, j) is associated with its lower bound ℓij , its flow xij , and its upper bound uij in
the following way: (ℓij , xij , uij ).
1. Write the mathematical formulation of the maximum flow problem, as a tranship-
ment problem.
2. Solve it using the simplex algorithm (Algorithm 16.5).

Chapter 23

Shortest path

Contents

23.1 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552


23.2 The shortest path algorithm . . . . . . . . . . . . . . . . . 558
23.3 Dijkstra’s algorithm . . . . . . . . . . . . . . . . . . . . . . 566
23.4 The longest path problem . . . . . . . . . . . . . . . . . . 571
23.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574

The shortest path problem is defined in Section 22.4.1. In this chapter, we focus on
solving the problem.
When looking at a map, it may look relatively easy to identify the shortest path.
But the shortest path problem for general networks does not inherit the intuitive
properties of a map. First, an algorithm does not benefit from the bird’s eye view
of the network. As discussed in Section 21.5.5, data structures such as adjacency
matrices or adjacency lists are used in general. They provide a myopic view of the
network topology, similar to what would be the case in a labyrinth, in the sense
that, given a node, we have direct access only to the adjacent nodes. Second, on
geographical maps, the triangle inequality is verified, so that the distance as the crow
flies between two nodes can be used as a reference to compare with the length of paths.
The path that deviates the least from the straight line is likely to be the shortest,
and the length of the detours can be roughly estimated. In general networks, the
nodes may not necessarily correspond to geographical locations, and the arc cost may
not represent the Euclidean distance. Consequently, the triangle inequality may not
necessarily hold, and the intuition inspired by geographical maps cannot be used. For
instance, consider the network represented in Figure 23.1, where the number next to
each arc is its cost, and the number associated with each node is its identifier. The
cost to go from node 9 to node 10 through nodes 13 and 14 is less than the cost of
the direct arc, violating the triangle inequality.

Table 23.1 enumerates all simple paths connecting node 1 and node 16 in this
network. There are 20 of them. The shortest one is represented in bold, both in
Figure 23.1 and Table 23.1. Obviously, a straight line between node 1 and node 16
does not provide any intuition about the shortest path.

Figure 23.1: Triangle inequality does not hold

23.1 Properties
Clearly, the path enumeration is not appropriate in practice to identify the shortest
path. The number of paths between two nodes can grow exponentially with the size
of the network.
Instead, we exploit the fact that the shortest path problem is a transhipment
problem (see Section 22.4.1). The optimality conditions (22.19)–(22.20) have a nice
interpretation in that case. For each arc (i, j) in the network,

λj ≤ cij + λi . (23.1)

For each arc on the shortest path (that is, in terms of the transhipment problem, for
each arc transporting flow), we have

λj = cij + λi . (23.2)

As the optimality conditions of the shortest path problem are derived directly from
the complementarity slackness conditions, the next theorem does not need a proof.
However, we provide a proof to obtain some insight in the interpretation of the dual
variables.

Table 23.1: List of simple forward paths between node 1 and node 16 in the network
from Figure 23.1
Path Cost Sequence of nodes
1 13 1 2 3 4 8 12 16

2 15 1 5 6 2 3 4 8 12 16
3 29 1 5 6 7 3 4 8 12 16
4 27 1 5 6 7 8 12 16
5 17 1 5 9 10 6 2 3 4 8 12 16
6 31 1 5 9 10 6 7 3 4 8 12 16
7 29 1 5 9 10 6 7 8 12 16
8 38 1 5 9 10 11 7 3 4 8 12 16
9 36 1 5 9 10 11 7 8 12 16
10 27 1 5 9 10 11 12 16
11 12 1 5 9 13 14 10 6 2 3 4 8 12 16
12 26 1 5 9 13 14 10 6 7 3 4 8 12 16
13 24 1 5 9 13 14 10 6 7 8 12 16
14 33 1 5 9 13 14 10 11 7 3 4 8 12 16
15 31 1 5 9 13 14 10 11 7 8 12 16
16 22 1 5 9 13 14 10 11 12 16
17 40 1 5 9 13 14 15 11 7 3 4 8 12 16
18 38 1 5 9 13 14 15 11 7 8 12 16
19 29 1 5 9 13 14 15 11 12 16
20 20 1 5 9 13 14 15 16

Theorem 23.1 (Optimality conditions for the shortest path problem). Consider a
network (N , A) with n arcs and m nodes, and the cost vector c ∈ Rn . Consider
a vector λ ∈ Rm such that

λj ≤ λi + cij , ∀(i, j) ∈ A. (23.3)

Consider a path P between a node o and a node d. If

λj = λi + cij , ∀(i, j) ∈ P, (23.4)

then P is a shortest path between o and d.

Proof. Consider any path Q between o and d, composed of ℓ + 1 arcs, ℓ ≥ 0:

Q = o → j1 → j2 . . . jℓ → d.

The total cost C(Q) of Q is the sum of the cost of each arc on Q:

C(Q) = coj1 + cj1 j2 + . . . + cjℓ d .



From (23.3), cij ≥ λj − λi and we obtain

C(Q) ≥ (λj1 − λo ) + (λj2 − λj1 ) + . . . + λd − λjℓ = λd − λo , (23.5)

as for each node i different from o and d, both λi and −λi are involved in the sum,

and cancel out.


Assume that (23.4) holds. The path P is composed of k + 1 arcs, k ≥ 0:

P = o → i1 → i2 . . . ik → d.

The total cost C(P) of P is the sum of the cost of each arc on P:

C(P) = coi1 + ci1 i2 + . . . + cik d .

From (23.4), cij = λj − λi and we obtain

C(P) = (λi1 − λo ) + (λi2 − λi1 ) + . . . + λd − λik = λd − λo . (23.6)

From (23.5), we obtain for any path Q that C(Q) ≥ C(P), proving that P is the path
with minimum cost.
The dual variable λi is usually called the label of node i. Equation (23.6) shows
that the length of the shortest path is the difference between the label of the des-
tination and the label of the origin. As discussed in Section 22.2, the optimality
conditions are not affected if all labels are shifted by the same constant. Therefore,
it is always possible to impose λo = 0. In that case, the label λi can be interpreted
as the cost of the shortest path from o to i.
Figure 23.2 represents the same network as Figure 23.1, where each node is asso-
ciated with a label. It can easily be verified that condition (23.3) is verified for each
arc in the network. Each arc (i, j) such that λj = λi + cij is represented in bold, and
each arc such that λj < λi + cij is represented with a dotted line. The subnetwork
consisting of all arcs in bold is a spanning tree (as discussed later, a solution of the
single origin-multiple destination shortest path problem is always a spanning tree).
Therefore, from characterization 3 of Theorem 21.10, there is a single path with
bold arcs between node o and any node i. As all arcs of this path verify the optimality
conditions, it is the shortest path.
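Checking conditions (23.3) and (23.4) for a candidate set of labels is inexpensive. The sketch below does so on a small toy network of our own (the arcs, costs, and labels are assumptions for illustration, not taken from Figure 23.2):

```python
# Toy network (our own example): arc costs c and candidate labels lam.
c = {("o", "a"): 1, ("a", "d"): 2, ("o", "b"): 4, ("b", "d"): 1, ("a", "b"): 1}
lam = {"o": 0, "a": 1, "b": 2, "d": 3}

# Condition (23.3): lambda_j <= lambda_i + c_ij for every arc.
feasible = all(lam[j] <= lam[i] + c[i, j] for (i, j) in c)

# Condition (23.4): every arc of the candidate path must be tight.
path = [("o", "a"), ("a", "d")]
tight = all(lam[j] == lam[i] + c[i, j] for (i, j) in path)

# With lambda_o = 0, the label of d is the cost of the shortest path to d.
print(feasible and tight, lam["d"] - lam["o"])  # True 3
```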
We now prove some properties of the shortest path. The first result states that
no shortest path exists if the network contains a negative cost cycle. Intuitively, such
a cycle could be followed as many times as needed to reach any arbitrarily low value
for the cost of the path. From the point of view of the transhipment problem, the
linear optimization problem is not bounded.

Theorem 23.2 (Negative cost cycle). Consider a network (N , A) with n arcs


and m nodes, the cost vector c ∈ Rn , and two nodes o and d. Assume that
there exists a forward path from o to d containing a cycle with negative cost.
Therefore, no forward path is the shortest path between o and d.


Figure 23.2: Shortest path tree, with node labels and arc costs

Proof. Consider the forward path P between o and d containing a negative cost cycle.
Denote Cc < 0 the cost of the cycle, and Cr = C(P) − Cc the cost of the rest of the
path. A new path Pk between o and d can be created by including the cycle k times
instead of only once. The cost of Pk is Cr + kCc . Assume by contradiction that there
exists a shortest path Q between o and d. Select

C(Q) − Cr
k> .
Cc

Then, the cost of Pk is less than the cost of Q. Indeed, as Cc < 0, we have

C(Q) − Cr
C(Pk ) = Cr + kCc < Cr + Cc = C(Q).
Cc

This contradicts the fact that Q is a shortest path.

In principle, this should not be a problem as we are only considering simple


paths, where cycles are not allowed. Unfortunately, the shortest simple path problem
in the presence of negative cost cycles is much more difficult than in their absence.
More advanced methods, such as those presented in Part VII of this book, must be
considered. In this chapter, we assume that no negative cost cycle exists. It happens
to be a sufficient condition for the existence of a shortest path. To show this, we start
with a simple lemma.

Lemma 23.3 (Shorter simple paths). Consider a network (N , A), and two nodes
o and d. Consider a forward path P between o and d that does not contain a
negative cost cycle. Then there exists a simple forward path Q from o to d such
that C(Q) ≤ C(P).

Proof. If the path P is simple, the result is immediate. Assume that path P contains
node j several times, that is

P = o → i1 → i2 . . . ik → j → ik+1 . . . iℓ → j → iℓ+1 . . . → im → d,

where the first occurrence of node j is after node ik , and the last occurrence before
node iℓ+1 . The cost of P is

C(P) = C1 + C2 + C3 ,

where

$$C_1 = c_{o i_1} + c_{i_1 i_2} + \ldots + c_{i_k j},$$

$$C_2 = c_{j i_{k+1}} + \ldots + c_{i_\ell j},$$

$$C_3 = c_{j i_{\ell+1}} + \ldots + c_{i_m d}.$$

By assumption, P does not contain a negative cost cycle, and the cost C2 of the cycle
is non negative, that is C2 ≥ 0. Therefore, the path obtained by removing the cycle
is
P ′ = o → i1 → i2 . . . ik → j → iℓ+1 . . . → im → d
and has lower cost
C(P ′ ) = C1 + C3 ≤ C(P).
Note that node j appears exactly once in path P ′ . If P ′ is simple, the result is obtained.
Otherwise, P ′ contains another cycle that can be removed in the same way. Cycles
are removed until a simple path is obtained.
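The cycle-removal argument in the proof translates directly into a procedure: scan the node sequence and, whenever a node reappears, cut out the portion between its two occurrences. A sketch, assuming the path is given as a list of nodes:

```python
def remove_cycles(path):
    """Remove cycles from a node sequence by cutting back to the first
    occurrence of any repeated node, as in the proof of Lemma 23.3."""
    simple, position = [], {}
    for node in path:
        if node in position:
            # Cut out the cycle: drop everything after the first occurrence.
            cut = position[node]
            for dropped in simple[cut + 1:]:
                del position[dropped]
            simple = simple[:cut + 1]
        else:
            position[node] = len(simple)
            simple.append(node)
    return simple

print(remove_cycles(["o", "a", "b", "a", "c", "d"]))  # ['o', 'a', 'c', 'd']
```

If no cycle has negative cost, each cut can only decrease (or preserve) the cost of the path, as the lemma states.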
The lemma motivates the use of simple paths when shortest paths are considered.
Indeed, any cycle can be removed without increasing the cost (until the path becomes
simple), unless one of those cycles has a negative cost. The existence of a shortest path can
be deduced directly from this result.

Corollary 23.4 (Existence of a shortest path). Consider a network (N , A) with n


arcs and m nodes, the cost vector c ∈ Rn , and two nodes o and d. A shortest
path between o and d exists if and only if there is at least one path between o
and d, and no path between o and d contains a negative cost cycle.

Proof. The necessary condition is Theorem 23.2. For the sufficient condition, assume
that there is at least one path, and no path with a negative cycle. Consider all simple
paths from o to d. From Lemma 21.6, there is a finite number of them. As there

is at least one path between o and d, Lemma 23.3 guarantees that there is also a
simple path, so that the set of simple paths is not empty. Therefore, the simple path
with minimum cost can be identified. Call it P. Take an arbitrary path Q from o
to d. From Lemma 23.3, there is a simple path Q ′ such that C(Q ′ ) ≤ C(Q). By
definition of P, we have also C(P) ≤ C(Q ′ ). Consequently, C(P) ≤ C(Q), proving

that P is a shortest path.

As an immediate corollary of Lemma 23.3, the shortest path can be found among
the simple paths.

Corollary 23.5 (Simple shortest path). Consider a network (N , A) with n arcs


and m nodes, the cost vector c ∈ Rn , and two nodes o and d. If there is a
shortest path from o to d, there is one that is a simple path.

Proof. Let P be a shortest path from o to d. From Theorem 23.2, the path does not
contain a negative cost cycle. From Lemma 23.3, there exists a simple path Q such
that C(Q) ≤ C(P). As P is a shortest path, we must have C(Q) = C(P), proving the
result.

The next corollary gives a lower bound on the length of the shortest path. It is
used in the algorithms to detect negative cost cycles.

Corollary 23.6 (Lower bound on the length of the shortest path). Consider a
network (N , A) with n arcs and m nodes, the cost vector c ∈ Rn , two nodes o
and d, and P a simple shortest path between o and d. If c ≥ 0, then C(P) ≥ 0. If
c ≱ 0, then
C(P) ≥ (m − 1) min_{(i,j)∈A} cij . (23.7)

Proof. If c ≥ 0, the result is obvious. If c ≱ 0, then cmin = min_{(i,j)∈A} cij < 0.
Therefore,

C(P) = ∑_{(i,j)∈P} cij ≥ ℓP cmin ,

where ℓP is the number of arcs in P. From Lemma 21.5, ℓP ≤ m − 1, proving the
result.

The next property may look obvious. It happens to be an important property
that allows us to decompose complex problems into simple ones. The optimization
methodology known as dynamic programming 1 relies on the principle of optimality.
We formulate it for the shortest path problem.

1 Dynamic programming is not covered in this book (see Bellman, 1957, or the more recent
edition Bellman, 2010, among many references).

Theorem 23.7 (Principle of optimality). Consider a network (N , A), and two
nodes o and d. Let P = o → i1 → i2 → · · · → ik → d be a shortest path from o to d.
Then, for any ℓ = 1, . . . , k, the subpath Poℓ = o → . . . → iℓ is a shortest path from
o to iℓ and Pℓd = iℓ → . . . → d is a shortest path from iℓ to d.

Proof. As there is a shortest path, there is no path with negative cost cycle. Assume
by contradiction that there exists a path Q between o and iℓ such that C(Q) < C(Poℓ ).
The path from o to d obtained by merging Q and Pℓd has cost

C(Q) + C(Pℓd ) < C(Poℓ ) + C(Pℓd ) = C(P),

which contradicts the fact that P is a shortest path. The proof that Pℓd is shortest is
similar.

Edsger Wybe Dijkstra [ˈdɛikstra] was born on May 11, 1930, in
Rotterdam, and died on August 6, 2002, in Nuenen (The Netherlands).
The algorithm that bears his name (Algorithm 23.2) was
published in 1959 in a two-page article (Dijkstra, 1959), together
with an algorithm for the shortest spanning tree problem, while
he was working at the Computation Department of the Mathematical
Center in Amsterdam, where he was hired as a programmer.
He was a member of the team that designed the computer
language ALGOL-60, and designed a computer operating system
while he was Professor of Mathematics at the Eindhoven University of Technology. He
finished his career as the holder of the Schlumberger Centennial Chair in Computer
Science at the University of Texas at Austin, and retired in 1999.
Figure 23.3: Edsger Wybe Dijkstra

23.2 The shortest path algorithm


Although the shortest path problem can be modeled as a transhipment problem and,
therefore, can be solved using the simplex method, this is not efficient if
the structure of the problem is not properly exploited. We present here an algorithm
that solves the single origin-multiple destination version of the shortest path problem.
This algorithm can solve any problem that does not involve a negative cost cycle.
In Section 23.3, we present a specific version of this algorithm for the common case
where all the arc costs are non negative.
The main idea of the algorithm comes from the interpretation of the dual variables
at the optimal solution. As discussed above, if the dual variable at the origin o is set
to 0, the value of the dual variable λi at optimality is the length of the shortest path
between o and i.
The algorithm maintains a vector of labels λ ∈ Rm . Then, it analyzes each arc
and checks if the optimality conditions of Theorem 23.1 are verified. If there is an
arc (i, j) such that
λj > λi + cij , (23.8)
then the value of λj is updated:

λj = λi + cij .

The interpretation is as follows. At each point in time, the value of λi represents
the length of a forward path between o and i. Therefore, if (23.8) is verified, that
is, if the optimality condition is violated for arc (i, j), it means that the path with
length λj , whatever it is, is strictly longer than the path from o to i associated with
λi , followed by the arc (i, j), as illustrated in Figure 23.4. Therefore, this latter path
is shorter, and its length (λi + cij ) becomes the new label of node j.

Figure 23.4: Interpretation of the labels as the length of a path from o

Using the labyrinth analogy suggested in Section 21.5.5, the algorithm is like an
explorer systematically exploring all the corridors of this labyrinth (the arcs), while
recording their length. The mileage counter is set to 0 at the starting point (the
origin o). When the explorer reaches an intersection (a node), it records the current
mileage, which is the mileage reached at the previous node (λi ) plus the length of the
corridor (cij ). Now two things may happen. If it is the first time that node j is visited,
the mileage is simply recorded and written on a wall of the intersection. Otherwise,
there is already a mileage written on a wall, that is the length of a previous path
that has been used to reach j. Interestingly, it does not matter what path it is. Its
length is the only relevant information here. If the new mileage (λi + cij ) is greater than or
equal to λj , nothing is done. The new path is not shorter. Otherwise, the new path
is shorter, the value of λj is erased and replaced by λi + cij . It is also convenient to
record that the predecessor of node j along the new path is node i.
The key difficulty here is to guarantee the systematic exploration of the network.
The advantage of the algorithm compared to an explorer in a labyrinth is that it
can be teleported to any location in the network. We need to identify the list of
these locations that must be visited. We maintain a set S of nodes that must be
“treated,” where the treatment consists in the label updating described above for
each arc leaving that node. Once the node has been treated, it is removed from S.
During the treatment, the label of other nodes may be updated. Such nodes are
included in the set S in order to be treated later. Before the algorithm starts, the set
S contains only the origin. All labels are initialized to +∞, except the label of the
origin, which is set to 0.


The algorithm terminates if the set of nodes to be treated is empty. This may
never happen if the network contains a negative cost cycle. If that were the case, the
algorithm would follow this cycle forever, following paths with lower and lower cost,
and the labels along the cycle would keep on decreasing. Therefore, a specific stopping
criterion must be included to detect such cases.
This is described in Algorithm 23.1.
Algorithm 23.1: The shortest path algorithm
1 Objective
2 Calculate a shortest path between a node o and all other nodes.
3 Input
4 A network (N , A) of m nodes and n arcs.
5 A vector c ∈ Rn with the cost of each arc.
6 The origin o.
7 Output
8 A Boolean U that is true if the problem is unbounded.
9 A vector λ ∈ Rm containing the optimal labels of the nodes.
10 A vector π ∈ N m such that πi contains the node preceding node i in the
shortest path if λi ≠ ∞ and i ≠ o.
11 Initialization
12 λo := 0.
13 λi := +∞ ∀i ∈ N , i 6= o .
14 S := {o}.
15 U := false.
16 Repeat
17 Select node i ∈ S.
18 for all j such that (i, j) ∈ A do
19 if λj > λi + cij then
20 λj := λi + cij
21 if λj < 0 and λj < (m − 1) min_{(i,j)∈A} cij then
22 U := true
23 STOP: negative cost cycle detected (see Corollary 23.6)
24 πj := i
25 S := S ∪ {j}

26 S := S \ {i}.
27 Until S = ∅.
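The treatment loop above translates almost line for line into code. Below is a minimal Python sketch of Algorithm 23.1; the function name, the dictionary representation of the arcs, and the small four-node network used to exercise it are illustrative choices, not taken from the book.

```python
from collections import defaultdict
from math import inf

def shortest_path(n_nodes, arcs, origin):
    """Sketch of Algorithm 23.1: label correcting, with the bound of
    Corollary 23.6 used to detect negative cost cycles.

    `arcs` maps an arc (i, j) to its cost c_ij; nodes are 0..n_nodes-1.
    Returns (unbounded, labels, predecessors)."""
    adjacent = defaultdict(list)
    for (i, j), cost in arcs.items():
        adjacent[i].append((j, cost))
    lam = [inf] * n_nodes            # labels, initialized to +infinity
    pred = [None] * n_nodes          # predecessor of each node
    lam[origin] = 0
    bound = (n_nodes - 1) * min(arcs.values())   # Corollary 23.6
    to_treat = {origin}              # the set S
    while to_treat:
        i = to_treat.pop()           # treat an arbitrary node of S
        for j, cost in adjacent[i]:
            if lam[j] > lam[i] + cost:
                lam[j] = lam[i] + cost
                if lam[j] < 0 and lam[j] < bound:
                    return True, lam, pred   # negative cost cycle detected
                pred[j] = i
                to_treat.add(j)
    return False, lam, pred

# A small illustrative network with four nodes.
arcs = {(0, 1): 8, (0, 2): 1, (2, 1): 2, (1, 3): 1, (2, 3): 5}
unbounded, lam, pred = shortest_path(4, arcs, 0)
print(unbounded, lam)  # False [0, 3, 1, 4]
```

The set `to_treat` plays the role of S; `pop` selects an arbitrary node of the set, which is precisely the freedom that the strategy of Section 23.3 exploits.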
Example 23.8 (The shortest path algorithm). We apply Algorithm 23.1 on the
network represented in Figure 23.1, where the origin is node 1. Table 23.2 reports
the set S, the treated node i and the value of all labels at each iteration. Table 23.3
reports the predecessors vector π at each iteration. During the first iteration, only
node 1 is in set S and is therefore treated. There are only two arcs originating from
Ce document est la propriété exclusive de Kavyaa Kannan (kk392@snu.edu.in) - jeudi 18 avril 2024 à 07h48

node 1: (1, 2) and (1, 5). As the labels of nodes 2 and 5 have been initialized to ∞,
condition (23.8) is trivially verified, and the labels of nodes 2 and 5 are updated to
0 + 8 and 0 + 1 respectively. Meanwhile, π2 and π5 are set to 1, as node 1 is the
predecessor in the current path that has been used to reach these two nodes.
During iteration 8, node 10 is treated. Its label is 10. The algorithm considers arc
(10, 6) with cost 1. As λ6 = 9 ≤ 10 + 1, nothing is done. For arc (10, 11), the label
λ11 is still ∞, so that it is updated to 10 + 8 = 18, π11 is set to 10, and node 11 is
included in S. Node 10 is then removed from S.
At iteration 14, node 10 is treated again. Its label is now 5. The algorithm
considers arc (10, 6) with cost 1. As λ6 = 9 > 5 + 1, the label of node 6 is updated,
π6 is set to 10, and node 6 included into S. The same happens for arc (10, 11), as
λ11 = 18 > 5 + 8. Note that, by coincidence, the value of π11 used to be 10, and is
therefore not updated.
The final labels correspond to those reported in Figure 23.2. The final value of π
allows us to reconstruct the shortest paths using a backward procedure. We illustrate
it for the shortest path from o to node 11. As π11 = 10, the predecessor of node 11
is node 10. As π10 = 14, the predecessor of node 10 is node 14, and so on. Following
this scheme, one obtains the path

o → 5 → 9 → 13 → 14 → 10 → 11.
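This backward procedure is easy to automate. A possible Python sketch (the function name is an illustrative choice), using the final predecessor values of Table 23.3, with node 1 playing the role of o:

```python
def reconstruct_path(pred, origin, dest):
    """Walk the predecessor labels backwards from dest to origin."""
    path = [dest]
    while path[-1] != origin:
        path.append(pred[path[-1]])
    path.reverse()
    return path

# Final predecessors from Table 23.3 (-1 means "no predecessor").
pi = {1: -1, 2: 6, 3: 2, 4: 3, 5: 1, 6: 10, 7: 6, 8: 4,
      9: 5, 10: 14, 11: 10, 12: 8, 13: 9, 14: 13, 15: 14, 16: 12}
print(reconstruct_path(pi, 1, 11))  # [1, 5, 9, 13, 14, 10, 11]
```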

We now present some properties of Algorithm 23.1.

Theorem 23.9 (Invariants of the shortest path algorithm). The following properties
hold at the end of each iteration of Algorithm 23.1:
1. If i ∈ S, then λi ≠ ∞.
2. For each node i, the value of λi does not increase during the iteration.
3. If λi ≠ ∞, it is the length of one path from o to i.
4. If i ∉ S, then λi = ∞ or λj ≤ λi + cij , for all j such that (i, j) ∈ A.

Proof. 1. During the initialization, node o is included in S at step 14, and λo is set
to 0 (step 12). Any other node in S has been included at step 25, which means
that the condition at step 19 was verified, so that λi ≠ ∞. The update at step 20
then gives a finite label to j before including it in S, proving the result.
2. It is a direct consequence of the condition at step 19 and the label update statement
at step 20.
Table 23.2: Description of the iterations of the shortest path algorithm for Example 23.8

Iter. S             i   λ1 λ2 λ3 λ4 λ5 λ6 λ7 λ8 λ9 λ10 λ11 λ12 λ13 λ14 λ15 λ16
  0   {1}           1    0  ∞  ∞  ∞  ∞  ∞  ∞  ∞  ∞   ∞   ∞   ∞   ∞   ∞   ∞   ∞
  1   {2 5}         2    0  8  ∞  ∞  1  ∞  ∞  ∞  ∞   ∞   ∞   ∞   ∞   ∞   ∞   ∞
  2   {5 3}         5    0  8  9  ∞  1  ∞  ∞  ∞  ∞   ∞   ∞   ∞   ∞   ∞   ∞   ∞
  3   {3 6 9}       3    0  8  9  ∞  1  9  ∞  ∞  2   ∞   ∞   ∞   ∞   ∞   ∞   ∞
  4   {6 9 4}       6    0  8  9 10  1  9  ∞  ∞  2   ∞   ∞   ∞   ∞   ∞   ∞   ∞
  5   {9 4 7}       9    0  8  9 10  1  9 17  ∞  2   ∞   ∞   ∞   ∞   ∞   ∞   ∞
  6   {4 7 10 13}   4    0  8  9 10  1  9 17  ∞  2  10   ∞   ∞   3   ∞   ∞   ∞
  7   {7 10 13 8}   7    0  8  9 10  1  9 17 11  2  10   ∞   ∞   3   ∞   ∞   ∞
  8   {10 13 8}    10    0  8  9 10  1  9 17 11  2  10   ∞   ∞   3   ∞   ∞   ∞
  9   {13 8 11}    13    0  8  9 10  1  9 17 11  2  10  18   ∞   3   ∞   ∞   ∞
 10   {8 11 14}     8    0  8  9 10  1  9 17 11  2  10  18   ∞   3   4   ∞   ∞
 11   {11 14 12}   11    0  8  9 10  1  9 17 11  2  10  18  12   3   4   ∞   ∞
 12   {14 12}      14    0  8  9 10  1  9 17 11  2  10  18  12   3   4   ∞   ∞
 13   {12 10 15}   12    0  8  9 10  1  9 17 11  2   5  18  12   3   4  12   ∞
 14   {10 15 16}   10    0  8  9 10  1  9 17 11  2   5  18  12   3   4  12  13
 15   {15 16 6 11} 15    0  8  9 10  1  6 17 11  2   5  13  12   3   4  12  13
 16   {16 6 11}    16    0  8  9 10  1  6 17 11  2   5  13  12   3   4  12  13
 17   {6 11}        6    0  8  9 10  1  6 17 11  2   5  13  12   3   4  12  13
 18   {11 2 7}     11    0  7  9 10  1  6 14 11  2   5  13  12   3   4  12  13
 19   {2 7}         2    0  7  9 10  1  6 14 11  2   5  13  12   3   4  12  13
 20   {7 3}         7    0  7  8 10  1  6 14 11  2   5  13  12   3   4  12  13
 21   {3}           3    0  7  8 10  1  6 14 11  2   5  13  12   3   4  12  13
 22   {4}           4    0  7  8  9  1  6 14 11  2   5  13  12   3   4  12  13
 23   {8}           8    0  7  8  9  1  6 14 10  2   5  13  12   3   4  12  13
 24   {12}         12    0  7  8  9  1  6 14 10  2   5  13  11   3   4  12  13
 25   {16}         16    0  7  8  9  1  6 14 10  2   5  13  11   3   4  12  12
 26   {}                 0  7  8  9  1  6 14 10  2   5  13  11   3   4  12  12
Table 23.3: Value of π for each iteration of the shortest path algorithm for Example 23.8
Iter. π1 π2 π3 π4 π5 π6 π7 π8 π9 π10 π11 π12 π13 π14 π15 π16
0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
1 -1 1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
2 -1 1 2 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
3 -1 1 2 -1 1 5 -1 -1 5 -1 -1 -1 -1 -1 -1 -1
4 -1 1 2 3 1 5 -1 -1 5 -1 -1 -1 -1 -1 -1 -1
5 -1 1 2 3 1 5 6 -1 5 -1 -1 -1 -1 -1 -1 -1
6 -1 1 2 3 1 5 6 -1 5 9 -1 -1 9 -1 -1 -1
7 -1 1 2 3 1 5 6 4 5 9 -1 -1 9 -1 -1 -1
8 -1 1 2 3 1 5 6 4 5 9 -1 -1 9 -1 -1 -1
9 -1 1 2 3 1 5 6 4 5 9 10 -1 9 -1 -1 -1
10 -1 1 2 3 1 5 6 4 5 9 10 -1 9 13 -1 -1
11 -1 1 2 3 1 5 6 4 5 9 10 8 9 13 -1 -1
12 -1 1 2 3 1 5 6 4 5 9 10 8 9 13 -1 -1
13 -1 1 2 3 1 5 6 4 5 14 10 8 9 13 14 -1
14 -1 1 2 3 1 5 6 4 5 14 10 8 9 13 14 12
15 -1 1 2 3 1 10 6 4 5 14 10 8 9 13 14 12
16 -1 1 2 3 1 10 6 4 5 14 10 8 9 13 14 12
17 -1 1 2 3 1 10 6 4 5 14 10 8 9 13 14 12
18 -1 6 2 3 1 10 6 4 5 14 10 8 9 13 14 12
19 -1 6 2 3 1 10 6 4 5 14 10 8 9 13 14 12
20 -1 6 2 3 1 10 6 4 5 14 10 8 9 13 14 12
21 -1 6 2 3 1 10 6 4 5 14 10 8 9 13 14 12
22 -1 6 2 3 1 10 6 4 5 14 10 8 9 13 14 12
23 -1 6 2 3 1 10 6 4 5 14 10 8 9 13 14 12
24 -1 6 2 3 1 10 6 4 5 14 10 8 9 13 14 12
25 -1 6 2 3 1 10 6 4 5 14 10 8 9 13 14 12
26 -1 6 2 3 1 10 6 4 5 14 10 8 9 13 14 12
3. Consider the first iteration. Before the iteration starts, all labels are set to ∞
(step 13), except λo = 0 (step 12). Note that the value 0 can be interpreted as the
length of a path from o to o. During the first iteration, all nodes i such that the
arc (o, i) exists have their label set to λo + coi = coi (step 20). This is obviously
the length of the direct path from o to i. All other labels remaining at ∞, the
result is proved for the first iteration. Assume now by induction that the result is
true at the beginning of an iteration where the treated node (selected at step 17)
is i. From property 1, we have λi ≠ ∞ and, by induction assumption, λi is the
length of a path P between o and i. When the label of a node j is updated at step
20, its value is therefore set to the length of the path composed of path P extended
by arc (i, j), proving the result for all labels updated during this iteration. As the
other labels are untouched, the result holds for all nodes.
4. There are two reasons for i not to be in S. First, if i has never been treated by
the algorithm. In that case, its label has not been updated and remains at the
initial value ∞. Second, i has been treated and removed from S. Because of step
19, just after i has been treated, we have λj ≤ λi + cij for all (i, j) ∈ A. Then
the label of i is not touched anymore while it is out of S. As soon as the label is
updated at step 20, i comes back into S (step 25). In the meantime, any other
label can only decrease (see property 2 above), so that the condition λj ≤ λi + cij
is verified whenever i is not in S.

Theorem 23.10 (Termination of the shortest path algorithm). Algorithm 23.1 ter-
minates after a finite number of iterations.

Proof. The algorithm terminates if S = ∅. Suppose that this never happens. It means
that some nodes are added to the set S an infinite number of times. Each time, their
label is updated to a strictly lower value. Therefore, the label of these nodes goes
to −∞. This is not possible because of the condition at step 21, which interrupts the
algorithm when a label becomes lower than a finite bound.

Theorem 23.11 (Bellman’s equation). If Algorithm 23.1 terminates with S = ∅,
then λj = ∞ if and only if there is no path from o to j. If λj ≠ ∞, then λj is the
length of the shortest path from o to j, λo = 0 and, for all j ≠ o,

λj = min_{(i,j)∈A} (λi + cij ). (23.9)

Equation (23.9) is called Bellman’s equation.

Proof. Assume that λj = ∞ and that there is a path from o to j. As S = ∅, property
4 of Theorem 23.9 guarantees that for each arc, if the upstream node has a finite
label, so does the downstream node. As λo = 0, the next node along the path has a
finite label. This property propagates through the path until j, which must also have
a finite label. This contradicts the assumption, proving the sufficient condition. The
necessary condition is shown by the contrapositive of property 3 of Theorem 23.9,
which states that if there is no path from o to j, then λj = ∞.
Consider now the case where λj ≠ ∞. By property 4 of Theorem 23.9, we have,
for each i such that λi ≠ ∞,

λj ≤ λi + cij , ∀(i, j) ∈ A. (23.10)

Consider a node ℓ. From property 3 of the same theorem, λℓ is the length of a path
from o to ℓ. Call it Pℓ . Take any path Q from o to ℓ. Using the same argument as in
the proof of Theorem 23.1, we have

C(Q) ≥ λℓ − λo .

From property 2 of Theorem 23.9, as λo is initialized to 0, at the end of the algorithm
we have λo ≤ 0. Therefore,
C(Q) ≥ λℓ − λo ≥ λℓ = C(Pℓ ),
showing that Pℓ is a shortest path from o to ℓ.
From property 3 of the same theorem, λo < 0 would mean that there is a negative
cost cycle from o to o. In that case, the algorithm would follow this cycle until the
stopping criterion at step 21 is verified. This contradicts the assumption that the
algorithm terminates with S = ∅. Therefore, λo = 0.
Finally, Bellman’s equation (23.9) is a direct consequence of the optimality
conditions (23.3) and (23.4). Indeed, assume that

λj > min_{(i,j)∈A} (λi + cij ).

It means that there exists (i, j) ∈ A such that λj > λi + cij , contradicting (23.3).
Assume next that

λj < min_{(i,j)∈A} (λi + cij ).

Then, λj ≠ λi + cij for all (i, j) ∈ A, contradicting (23.4).


Assume that Algorithm 23.1 terminates with S = ∅. Consider the subnetwork
(N , A ′ ), where N is the set of all nodes of the network, and A ′ is generated as
follows. For each node j different from the origin, select one arc (i, j) such that (23.9)
is verified (if Bellman’s equation is verified for several arcs, we arbitrarily select one
of them). We call it Bellman’s subnetwork. By construction, the number of arcs
in the subnetwork is m − 1, where m is the number of nodes. Therefore, if the
subnetwork does not contain any cycle, it is a spanning tree according to condition
7 of Theorem 21.10, and Definition 21.11. For any node d, there is only one path in
the spanning tree connecting the origin o to d (condition 3 of Theorem 21.10). As
each arc of the path verifies Bellman’s equation, it verifies (23.4) and Theorem 23.1
guarantees that it is a shortest path from o to d. Therefore, the subnetwork is called
a shortest path spanning tree.
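The construction of Bellman's subnetwork can be sketched in a few lines of Python (the function name and the small example network are illustrative, not from the book): for each node other than the origin, the first incoming arc verifying Bellman's equation is kept.

```python
def bellman_subnetwork(arcs, lam, origin):
    """Select, for each node j != origin, one arc (i, j) verifying
    Bellman's equation lam[j] == lam[i] + c_ij."""
    chosen = {}
    for (i, j), cost in sorted(arcs.items()):   # deterministic tie-breaking
        if j != origin and j not in chosen and lam[i] + cost == lam[j]:
            chosen[j] = (i, j)
    return set(chosen.values())

# Optimal labels of a small illustrative network rooted at node 0.
arcs = {(0, 1): 8, (0, 2): 1, (2, 1): 2, (1, 3): 1, (2, 3): 5}
lam = [0, 3, 1, 4]
tree = bellman_subnetwork(arcs, lam, 0)   # three arcs, one per non-origin node
```

With four nodes, the three selected arcs form a spanning tree, as predicted by the discussion above.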
Theorem 23.12 (Shortest path spanning tree). Consider a network (N , A) of m
nodes and n arcs, and a vector c ∈ Rn representing the cost of traversing each
arc, such that every cycle (if any) in the network has positive length. Then,
Bellman’s subnetwork is a shortest path spanning tree.
Proof. As the network does not contain a cycle with negative length, Algorithm 23.1
terminates with S = ∅. Assume by contradiction that Bellman’s subnetwork contains
a cycle i1 , . . . , iℓ . Its length is

ci1 i2 + ci2 i3 + · · · + ciℓ−1 iℓ + ciℓ i1 = λi2 − λi1 + λi3 − λi2 + · · · + λiℓ − λiℓ−1 + λi1 − λiℓ = 0,

which is not possible by assumption. Therefore, Bellman’s subnetwork does not
contain any cycle and is a spanning tree. As discussed above, the optimality conditions
of Theorem 23.1 apply, and each path in the tree is a shortest path.

23.3 Dijkstra’s algorithm


The shortest path Algorithm 23.1 does not specify how the node to be treated during
a given iteration has to be chosen within the set S (step 17). In Example 23.8, we have
always selected the first node in the set, but any other choice would have produced
a shortest path, too. While the selection strategy of the treated node does not affect
the outcome of the algorithm, it can heavily affect its performance. For instance, if
you run the algorithm on Example 23.8 and select the last node instead of the first
one, it would require 31 iterations instead of 26.
In this section, we present a strategy that is a little more sophisticated, and
particularly efficient, but restricted to the case where all costs are non negative. It
consists in selecting the node in S associated with the smallest label. Therefore, in
Algorithm 23.1, statement 17 is replaced by “Select node i ∈ S such that λi ≤ λj ,
for all j ∈ S.”
If we apply this version of the algorithm on Example 23.8, we obtain the results
presented in Tables 23.4 and 23.5. We note that the algorithm has identified the same
optimal solution as before, but in only 16 iterations, which means that each of the 16
nodes has only been treated once. As all the nodes are reachable from node 1, it is the
minimum possible number of iterations. It happens that this version of the shortest
path algorithm, called Dijkstra’s algorithm and presented as Algorithm 23.2, never
treats any node more than once, when all the arcs in the network have a non negative
cost.
There are only a few differences between Algorithm 23.2 and Algorithm 23.1:
1. the cost vector must be non negative,
2. the node to be treated is the node in S with the smallest label,
3. the identification of a negative cost cycle is no longer necessary, as all arcs have
non negative cost.
Table 23.4: Description of the iterations of the Dijkstra algorithm for Example 23.8

Iter. S             i   λ1 λ2 λ3 λ4 λ5 λ6 λ7 λ8 λ9 λ10 λ11 λ12 λ13 λ14 λ15 λ16
  0   {1}           1    0  ∞  ∞  ∞  ∞  ∞  ∞  ∞  ∞   ∞   ∞   ∞   ∞   ∞   ∞   ∞
  1   {2 5}         5    0  8  ∞  ∞  1  ∞  ∞  ∞  ∞   ∞   ∞   ∞   ∞   ∞   ∞   ∞
  2   {2 6 9}       9    0  8  ∞  ∞  1  9  ∞  ∞  2   ∞   ∞   ∞   ∞   ∞   ∞   ∞
  3   {2 6 10 13}  13    0  8  ∞  ∞  1  9  ∞  ∞  2  10   ∞   ∞   3   ∞   ∞   ∞
  4   {2 6 10 14}  14    0  8  ∞  ∞  1  9  ∞  ∞  2  10   ∞   ∞   3   4   ∞   ∞
  5   {2 6 10 15}  10    0  8  ∞  ∞  1  9  ∞  ∞  2   5   ∞   ∞   3   4  12   ∞
  6   {2 6 15 11}   6    0  8  ∞  ∞  1  6  ∞  ∞  2   5  13   ∞   3   4  12   ∞
  7   {2 15 11 7}   2    0  7  ∞  ∞  1  6 14  ∞  2   5  13   ∞   3   4  12   ∞
  8   {15 11 7 3}   3    0  7  8  ∞  1  6 14  ∞  2   5  13   ∞   3   4  12   ∞
  9   {15 11 7 4}   4    0  7  8  9  1  6 14  ∞  2   5  13   ∞   3   4  12   ∞
 10   {15 11 7 8}   8    0  7  8  9  1  6 14 10  2   5  13   ∞   3   4  12   ∞
 11   {15 11 7 12} 12    0  7  8  9  1  6 14 10  2   5  13  11   3   4  12   ∞
 12   {15 11 7 16} 15    0  7  8  9  1  6 14 10  2   5  13  11   3   4  12  12
 13   {11 7 16}    16    0  7  8  9  1  6 14 10  2   5  13  11   3   4  12  12
 14   {11 7}       11    0  7  8  9  1  6 14 10  2   5  13  11   3   4  12  12
 15   {7}           7    0  7  8  9  1  6 14 10  2   5  13  11   3   4  12  12
 16   {}                 0  7  8  9  1  6 14 10  2   5  13  11   3   4  12  12

Table 23.5: Value of π for each iteration of the Dijkstra algorithm for Example 23.8
Iter. π1 π2 π3 π4 π5 π6 π7 π8 π9 π10 π11 π12 π13 π14 π15 π16
0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
1 -1 1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
2 -1 1 -1 -1 1 5 -1 -1 5 -1 -1 -1 -1 -1 -1 -1
3 -1 1 -1 -1 1 5 -1 -1 5 9 -1 -1 9 -1 -1 -1
4 -1 1 -1 -1 1 5 -1 -1 5 9 -1 -1 9 13 -1 -1
5 -1 1 -1 -1 1 5 -1 -1 5 14 -1 -1 9 13 14 -1
6 -1 1 -1 -1 1 10 -1 -1 5 14 10 -1 9 13 14 -1
7 -1 6 -1 -1 1 10 6 -1 5 14 10 -1 9 13 14 -1
8 -1 6 2 -1 1 10 6 -1 5 14 10 -1 9 13 14 -1
9 -1 6 2 3 1 10 6 -1 5 14 10 -1 9 13 14 -1
10 -1 6 2 3 1 10 6 4 5 14 10 -1 9 13 14 -1
11 -1 6 2 3 1 10 6 4 5 14 10 8 9 13 14 -1
12 -1 6 2 3 1 10 6 4 5 14 10 8 9 13 14 12
13 -1 6 2 3 1 10 6 4 5 14 10 8 9 13 14 12
14 -1 6 2 3 1 10 6 4 5 14 10 8 9 13 14 12
15 -1 6 2 3 1 10 6 4 5 14 10 8 9 13 14 12
16 -1 6 2 3 1 10 6 4 5 14 10 8 9 13 14 12
Algorithm 23.2: Dijkstra’s algorithm
1 Objective
2 Calculate a shortest path between a node o and all nodes.
3 Input
4 A network (N , A) of m nodes and n arcs.
5 A vector c ∈ Rn with the cost of each arc, c ≥ 0.
6 The origin o.
7 Output
8 A vector λ ∈ Rm containing the optimal labels of the nodes.
9 A vector π ∈ N m such that πi contains the node preceding node i in the
shortest path if λi ≠ ∞ and i ≠ o.
10 Initialization
11 λo := 0.
12 λi := +∞ ∀i ∈ N , i ≠ o.
13 S := {o}.
14 Repeat
15 Select node i ∈ S such that λi ≤ λj , for all j ∈ S.
16 for all j such that (i, j) ∈ A do
17 if λj > λi + cij then
18 λj := λi + cij
19 πj := i
20 S := S ∪ {j}

21 S := S \ {i}.
22 Until S = ∅.
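The selection rule can be implemented with a priority queue. Here is a compact Python sketch of Algorithm 23.2, where the standard `heapq` module plays the role of the set S (the function name and the example network are illustrative, not from the book):

```python
import heapq
from math import inf

def dijkstra(n_nodes, arcs, origin):
    """Sketch of Algorithm 23.2. Arc costs must be nonnegative;
    `arcs` maps an arc (i, j) to its cost; nodes are 0..n_nodes-1."""
    adjacent = [[] for _ in range(n_nodes)]
    for (i, j), cost in arcs.items():
        adjacent[i].append((j, cost))
    lam = [inf] * n_nodes
    pred = [None] * n_nodes
    lam[origin] = 0
    heap = [(0, origin)]             # the set S, ordered by label
    while heap:
        label, i = heapq.heappop(heap)
        if label > lam[i]:
            continue                 # stale entry: i was re-inserted with a smaller label
        for j, cost in adjacent[i]:
            if lam[j] > lam[i] + cost:
                lam[j] = lam[i] + cost
                pred[j] = i
                heapq.heappush(heap, (lam[j], j))
    return lam, pred

lam, pred = dijkstra(4, {(0, 1): 8, (0, 2): 1, (2, 1): 2, (1, 3): 1, (2, 3): 5}, 0)
print(lam)  # [0, 3, 1, 4]
```

An entry popped with a label larger than the current one is a leftover from an earlier update and is simply skipped; this mimics the removal of a node from S after its treatment.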

Dijkstra’s algorithm has the same properties as the shortest path algorithm, described
in Section 23.2. It also has the following properties.

Theorem 23.13 (Termination of Dijkstra’s algorithm). Algorithm 23.2 terminates
after a finite number of iterations.

Proof. The algorithm terminates if S = ∅. Suppose that it never happens. It means
that some nodes are added to the set S an infinite number of times. Each time, their
label is updated to a strictly lower value. Therefore, the label of these nodes goes to
−∞. From property 3 of Theorem 23.9, λi is the length of a path from node o to
i. As all the costs are non negative, the length of any path is non negative, and λi
cannot go below 0.
Theorem 23.14 (Invariants of Dijkstra’s algorithm). Consider the set

T = {i | λi ≠ ∞ and i ∉ S}. (23.11)

The following properties hold at the end of each iteration of Algorithm 23.2:
1. If i ∈ T and j ∉ T , then λi ≤ λj .
2. If i ∈ T at the beginning of the iteration, then the label λi is not modified
during the iteration.
3. If i ∈ T at the beginning of the iteration, then i ∉ S at the end of the iteration.
4. If i ∈ T , then λi is the length of the shortest path from o to i.

Proof. 1. We prove properties 1 and 2 by induction. Property 1 holds for the first
iteration, where node o is treated, with λo = 0. It is the only node in T at the
end of the iteration. All other labels are equal to ∞, except for nodes j such that
(o, j) ∈ A. As λj = coj ≥ λo = 0, the property holds for these labels. As T = ∅
at the beginning of the iteration, property 2 trivially holds for the first iteration.
Consider now another iteration, and assume that property 1 is true at the begin-
ning of the iteration, that is λi ≤ λj , ∀i ∈ T , ∀j ∉ T . The iteration is treating
node ℓ. According to the rule of node selection,

λℓ ≤ λj , ∀j ∉ T . (23.12)

As ℓ ∈ S, ℓ ∉ T at the beginning of the iteration. When the iteration treats arc
(ℓ, m), two cases must be considered: m ∈ T and m ∉ T .
m ∈ T Using the assumption of the induction, we have λm ≤ λℓ . As cℓm ≥ 0, we
have also λm ≤ λℓ + cℓm . It means that no node in T will see its label updated
during the iteration. This proves property 2. As no label has been updated
by the algorithm in this case, property 1 continues to hold.
m ∉ T If the label of node m is not updated, nothing changes, and property 1
continues to hold. If the label is updated, we have, at the end of the iteration,
λm = λℓ + cℓm . Take any node i ∈ T . We have, at the end of the iteration,
λi ≤ λℓ by the assumption of the induction and the fact that the label of i has
not been updated by the iteration. As cℓm ≥ 0, λi ≤ λℓ + cℓm = λm , and the
property holds after the iteration for all nodes that were in T at the beginning
of the iteration.
Finally, as ℓ is in T at the end of the iteration, we need to show that λℓ ≤ λj ,
∀j ∉ T . This is guaranteed by the rule of node selection (23.12), and the fact that
λℓ has not been modified by the iteration.
2. Proved with the previous property.
3. This is an immediate corollary of the previous property. Indeed, if the label of i
is not modified, i is not included in S.
4. From property 2, the label of i will not be modified anymore by any iteration.
Therefore, when the algorithm terminates (Theorem 23.13), the final value is λi .
From Theorem 23.11, it is the length of the shortest path from o to i.

We conclude this section with some comments about the algorithm.


• The efficiency of Dijkstra’s algorithm relies on an efficient procedure to identify
the node with the smallest label in set S. Most implementations rely on a data
structure known as a heap, which maintains a partial order of the set (see Brassard
and Bratley, 1996, Section 5.7).
• Theorem 23.14 is invoked when applying Dijkstra’s algorithm to the single origin-
single destination problem. Indeed, as soon as the destination node is treated, its
label is final, so the shortest path has been identified and the algorithm can be
interrupted.
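The interruption for the single origin-single destination case can be sketched as a small variant of a heap-based implementation (names and example network are illustrative): as soon as the destination is popped from the heap, its label is final and can be returned.

```python
import heapq
from math import inf

def dijkstra_to(n_nodes, arcs, origin, dest):
    """Dijkstra interrupted as soon as `dest` is treated; by Theorem
    23.14, a treated node's label is final."""
    adjacent = [[] for _ in range(n_nodes)]
    for (i, j), cost in arcs.items():
        adjacent[i].append((j, cost))
    lam = [inf] * n_nodes
    lam[origin] = 0
    heap = [(0, origin)]
    while heap:
        label, i = heapq.heappop(heap)
        if i == dest:
            return label             # final label of the destination
        if label > lam[i]:
            continue                 # stale heap entry
        for j, cost in adjacent[i]:
            if lam[j] > lam[i] + cost:
                lam[j] = lam[i] + cost
                heapq.heappush(heap, (lam[j], j))
    return inf                       # dest not reachable from origin

print(dijkstra_to(4, {(0, 1): 8, (0, 2): 1, (2, 1): 2, (1, 3): 1, (2, 3): 5}, 0, 3))  # 4
```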

23.4 The longest path problem


The longest path problem is defined in a similar way to the shortest path problem.

Definition 23.15 (Longest path problem: single origin-single destination). Consider
a network (N , A) of m nodes and n arcs, and a vector c ∈ Rn representing the cost
of traversing each arc. Consider one node o called the origin and one node d called
the destination. The longest path problem consists in finding a simple forward path
with origin o and destination d, and with the largest cost.

It can also be written as a transhipment problem:

max_{x∈Rn} ∑_{(i,j)∈A} cij xij (23.13)

subject to

∑_{j|(o,j)∈A} xoj − ∑_{k|(k,o)∈A} xko = 1,
∑_{j|(d,j)∈A} xdj − ∑_{k|(k,d)∈A} xkd = −1,
∑_{j|(i,j)∈A} xij − ∑_{k|(k,i)∈A} xki = 0, ∀i ∈ N , i ≠ o, i ≠ d,
xij ≥ 0, ∀(i, j) ∈ A.

As discussed in Section 1.2.1 and as with any optimization problem, it can be
transformed into a minimization problem by changing the sign of the objective function:

max_{x∈Rn} ∑_{(i,j)∈A} cij xij ⇐⇒ min_{x∈Rn} − ∑_{(i,j)∈A} cij xij = min_{x∈Rn} ∑_{(i,j)∈A} (−cij ) xij . (23.14)
Consequently, it is equivalent to the shortest path problem where the sign of each cost
in the network has been changed. Note that, in many applications, this transformed
problem is likely to contain negative cost cycles. In this case, the problem posed as a
transhipment problem is unbounded, and the shortest path algorithm does not work.
Other modeling frameworks related to combinatorial optimization (see Part VII) must
be considered.
A concrete application of the longest path problem is the program evaluation and
review technique (PERT) used in project management. Consider a project composed
of m tasks of a given duration. Each task i is associated with a list of other tasks that
must be completed before task i can start. They are the predecessors of task i. For
example, consider a household that decides to renovate the bathroom in the house.
The list of tasks, together with their duration (in days) and list of predecessors, is
reported in Table 23.6.

Table 23.6: List of tasks to renovate a bathroom

Tasks                                                Duration  Predecessors
1  Design the overall setup of the future bathroom      1
2  Select the bathroom furniture                        1      1
3  Quotation for the bathroom furniture                 3      1,2
4  Order the bathroom furniture                         3      3,6
5  Select the tiles                                     2      2
6  Quotation from the installer                         5      3
7  Quotation from the tiler                             5      4,5
8  Confirm the installer                                6      6
9  Confirm the tiler                                    1      7
10 Remove existing furniture                            2      8
11 Tiling                                               3      9,10
12 Installation of the furniture                        2      11
13 Dispose of the old furniture                         1      10

The relevant questions for project management are: what is the minimum possible
duration of the project? What are the tasks that do not tolerate any delay without
delaying the completion of the whole project? Such tasks are called critical.
We construct a network in the following way:
• a node is associated with each task;
• a node o represents the beginning of the project;
• a node d represents the end of the project;
• for each task j, insert an arc (i, j) for each predecessor i;
• for each task j without predecessor, insert an arc (o, j);
• for each task i without successor, that is, each task which is the predecessor of no
other task, insert an arc (i, d);
• for each task i, the cost of each arc (i, j) is the duration of task i;
• the cost of each arc (o, j) is 0.

An arc (i, j) means that “task i must be completed before task j is started.” Note
that, thanks to the network structure, arcs are needed only for direct predecessors.
In our example, task 11 (tiling) cannot start if task 5 (select the tiles) has not been
completed. However, there is no need to insert an arc between these two tasks. By
transitivity, the precedence is captured by the presence of a forward path between
the two tasks. The network representation for the bathroom project is illustrated in
Figure 23.5. To answer the two questions for the project management, a longest path
problem must be solved. Note that such a network does not contain any forward cycle.
Indeed, as each arc (i, j) means “task i must be completed before starting task j,” a
forward cycle would mean that, for each node i on the cycle, task i must be completed
before starting task i, which is nonsensical. If such a cycle does appear, there must
be a mistake in the problem definition. Consequently, when solving the longest path
problem with the shortest path algorithm, although all costs are negative, no negative
cost cycle exists, and the algorithm terminates with a valid solution. The optimal
labels provide, after changing their sign, the earliest time when each task can be
started. The label of node d gives, after the sign change, the minimum project
duration. For our example, task 7, say, cannot start before day 13, and the project
cannot last less than 24 days (see
Table 23.7). The arcs in the longest path tree are depicted in bold in Figure 23.5. It
can be seen that the longest path is the sequence of tasks 1, 2, 3, 6, 4, 7, 9, 11, and
12, taking a total of 24 days. These tasks are critical tasks, and this path is called
a critical path. If any of these tasks were delayed just a little, the whole project
would be delayed as well. Consequently, the pressure should be put on the tiler. The
installer can afford some delay (how much?) without affecting the end of the whole
project. Note that, in the presence of several longest paths, the algorithm does not
report all critical tasks.
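The earliest start times of Table 23.7 can be reproduced in a few lines. The sketch below does not negate the costs and run a shortest path algorithm; instead, it relaxes the tasks in an order compatible with the precedences, which is equivalent on a network without forward cycles. The data are those of Table 23.6.

```python
# Earliest start times for the bathroom renovation project of Table 23.6.
duration = {1: 1, 2: 1, 3: 3, 4: 3, 5: 2, 6: 5, 7: 5,
            8: 6, 9: 1, 10: 2, 11: 3, 12: 2, 13: 1}
predecessors = {1: [], 2: [1], 3: [1, 2], 4: [3, 6], 5: [2], 6: [3],
                7: [4, 5], 8: [6], 9: [7], 10: [8], 11: [9, 10],
                12: [11], 13: [10]}

def earliest_starts(duration, predecessors):
    """Earliest start time of each task and the minimum project duration."""
    start = {}
    while len(start) < len(duration):
        for task in duration:
            if task not in start and all(p in start for p in predecessors[task]):
                # A task can start as soon as all its predecessors are finished.
                start[task] = max((start[p] + duration[p]
                                   for p in predecessors[task]), default=0)
    # The project ends when every task has finished.
    makespan = max(start[t] + duration[t] for t in duration)
    return start, makespan

start, makespan = earliest_starts(duration, predecessors)
# Task 7 cannot start before day 13; the project takes at least 24 days.
```

The optimal labels of Table 23.7 are recovered by changing the sign of each start time.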

Figure 23.5: Network representation of the project organization



Table 23.7: Optimal labels of the critical paths

  i    λi      i    λi      i    λi
  o     0      5    −2     10   −16
  1     0      6    −5     11   −19
  2    −1      7   −13     12   −22
  3    −2      8   −10     13   −18
  4   −10      9   −18      d   −24

Many variants of the algorithms presented in this chapter have been proposed in
the literature. Namely, some algorithms such as the A∗ algorithm (Hart et al., 1968)
are designed to solve the shortest path problem for road networks, where the Eu-
clidean distance can be exploited as a proxy to the shortest path distance. Algo-
rithms designed to be efficient for navigation systems have also been proposed (see,
for instance, Abraham et al., 2011).

23.5 Exercises

Exercise 23.1. Consider the network represented in Figure 23.6, where the cost of
each arc is shown next to it. Apply Dijkstra’s algorithm (Algorithm 23.2) to identify
the shortest path from node o to any other node.

Figure 23.6: Network for Exercise 23.1. The cost of each arc is shown next to it

Exercise 23.2. Consider the network represented in Figure 23.7, where the value as-
sociated with each arc represents its length. Determine, when they exist, the shortest
paths from node o to all other nodes in the network.

Figure 23.7: Network for Exercise 23.2, where the value associated with each arc
represents its length

Exercise 23.3. Modify Dijkstra’s algorithm to generate all shortest path trees. Hint:
maintain several labels at each node.
Exercise 23.4. A museum hires attendants during opening hours, from 9:00 to 17:00.
It has the possibility of hiring several people, each of them being available for several
hours during the day, at a given cost, as reported in Table 23.8. Identify who the
museum should hire so that there is at least one attendant present in the museum at
any time, in order to minimize the total cost. Hint: model the problem as a shortest
path problem.

Table 23.8: Availabilities and costs of the museum attendants for Exercise 23.4
Name from to cost
Durin 9:00 13:00 30
Isumbras 9:00 11:00 18
Hamfast 12:00 15:00 14
Isengrim 12:00 17:00 38
Arathorn 14:00 17:00 16
Bilbo 13:00 16:00 22
Gelmir 16:00 17:00 9

Exercise 23.5. Consider the six Swiss cities represented in Figure 23.8. The travel
time by car (C) and by public transportation (PT) is shown next to each arc con-
necting two cities.
1. Identify the fastest itinerary from Orbe to any other city by car.
2. Identify the fastest itinerary from Orbe to any other city by public transportation.

3. Thanks to the car sharing service, the traveler can change from car to public
transportation, or the other way around, in any city. In this context, identify the
fastest itinerary from Orbe to any other city.
4. Thanks to the car sharing service, the traveler can change from car to public trans-
portation, or the other way around, but only in Lausanne, Bern, and Fribourg.
In this context, identify the fastest itinerary from Orbe to any other city.

Figure 23.8: Six cities in Switzerland, with the travel time by car (C) and by public
transportation (PT) (for Exercise 23.5)

Exercise 23.6. In the program evaluation and review technique (PERT) presented
in Section 23.4, the longest path problem allows us to identify the earliest possible ter-
mination of each task. Design an algorithm to identify the latest possible termination
of each task. Hint: start from node d and proceed backward.
Chapter 24

Maximum flow

Contents
24.1 The Ford-Fulkerson algorithm
24.2 The minimum cut problem
24.3 Exercises

24.1 The Ford-Fulkerson algorithm


The maximum flow problem is defined in Section 22.4.2, where it is shown that it
can be modeled as a transhipment problem. In this chapter, we present a dedicated
algorithm originally proposed by Ford and Fulkerson (1956).

Delbert Ray Fulkerson was born on August 14, 1924, in Tamms,


Illinois, USA. He obtained a Ph.D. in Mathematics from the
University of Wisconsin in 1951. He then joined the Mathemat-
ics Department at the Rand Corporation, where George Dantzig
and Richard Bellman were also working. He went to the Oper-
ations Research Department of Cornell University in 1971. He
died on January 10, 1976, in Ithaca, New York, USA. Together
with Dantzig and Johnson, Fulkerson published on the traveling
salesman problem applied to a salesman living in Washington
DC and visiting the capitals of the 48 states of the USA. They write about the prob-
lem: “The origin of the problem is somewhat obscure. It appears to have been dis-
cussed informally among mathematicians at mathematics meetings for many years.”
They invented the concept of subtour elimination. But Fulkerson is best known for
his work on network flows. The Ford-Fulkerson algorithm (Algorithm 24.2) was moti-
vated by a military project aiming at assessing the capacity of the Eastern European
rail network to support a conventional war.
Figure 24.1: Delbert Ray Fulkerson

The algorithm is based on two concepts: unsaturated paths and saturated cuts.
We illustrate these ideas using the small example presented in Section 22.4.2. As
discussed there, we start by sending 2 units of flow on the path o → 2 → 3 → d
(the maximum that can be transported) to obtain the flow pattern illustrated in
Figure 24.2(a). No more flow can be sent on this path, which is said to be saturated.
Figure 24.2: Finding the maximum flow. On each arc, the flow xij and the capacity
cij are shown as xij [cij ]. (a) 2 units sent; (b) unsaturated path; (c) 1 more unit sent

In Section 22.4.2, we discussed the fact that the paths o → 3 → d and o → 2 → 4 → d


are also saturated, as one of their arcs was at capacity. Note that the maximum flow
problem imposes lower capacities of 0 so that arcs cannot be followed backward at
the final solution. But during the course of the algorithm, nothing prevents us from

considering paths with backward arcs in order to update the current flow vector.
In this example, the path o → 3 ← 2 → 4 → d represented in Figure 24.2(b) is
unsaturated, in the sense that some flow can be sent along this path from o to d.
Indeed, arc (o, 3) can transport up to 3 units of flow. Arc (2, 3) is traversed backward.
It means that a unit of flow traversing it is decreasing the current value, which is 2. As
the lower bound is 0, the arc is not at capacity and can transport up to 2 units of flow
backward. Arc (2, 4) can transport 4 units of flow, and arc (4, d) only 1. Therefore,
1 unit of flow can be sent along this path to obtain the flow pattern represented in
Figure 24.2(c). If we decompose the flow (Algorithm 21.3, but it is easy to do it by
hand here), we obtain that one unit of flow is sent along path o → 2 → 3 → d, one
unit along path o → 3 → d, and one unit along path o → 2 → 4 → d. Note that it is
not the same optimal solution as the one proposed in Section 22.4.2, but it achieves
the same objective, that is, transporting 3 units of flow from o to d.
Consider now the cut
Γ = ({o, 2, 3, 4}, {d}).

We have Γ → = {(3, d), (4, d)}, and Γ ← = ∅. Therefore, the flow through the cut is
X(Γ ) = 1 + 2 = 3, and its capacity is U(Γ ) = 1 + 2 = 3. The cut is therefore saturated,
and there is no way to send more flow from the first set of nodes to the second. As
the cut separates o from d, there is therefore no way to send more flow from o to d,
and the solution is optimal.
The above example provides the intuition of the concept of a saturated path. The
formal definition follows.

Definition 24.1 (Saturated path). Consider a network (N , A), with m nodes and
n arcs, a vector of lower capacities ℓ ∈ Rn , a vector of upper capacities u ∈ Rn , a
feasible flow vector x ∈ Rn , and a path P. The path P is saturated with respect to x
if
∃(i, j) ∈ P→ with xij = uij , or ∃(i, j) ∈ P← with xij = ℓij . (24.1)
The path is said to be unsaturated if

xij < uij , ∀(i, j) ∈ P→ and xij > ℓij , ∀(i, j) ∈ P← . (24.2)

In order to find an unsaturated path, we proceed by layers, similarly to the flow


decomposition algorithm presented in Section 21.6. The first layer S0 = {o} contains
only the origin. Layer St is built from layer St−1 in the following way: node j belongs
to layer St if it does not belong to any previous layer S0 , . . . , St−1 , and at least one
of the two conditions is verified (one condition for forward arcs, one for backward):
1. there is an arc (i, j) such that i ∈ St−1 and xij < uij , or
2. there is an arc (j, i) such that i ∈ St−1 and xji > ℓji .
The recursive procedure is interrupted if d is found or if St is empty. In Algorithm 24.1
when a node is included in a layer (steps 15 and 17), a label is associated, which

records the connected node in the previous layer and the direction of the connecting
arc. The notation j[i →] means that node j has been reached by following arc (i, j) in
the forward direction, while j[← i] means that node j has been reached by following
arc (j, i) in the backward direction. This is useful to reconstruct a path from o to
d, starting from d and following back the track across layers until node o is reached.
Step 23 identifies the upstream node of the forward arcs thanks to the labels, and
step 24 identifies the downstream node of the backward arcs.

Algorithm 24.1: Generation of an unsaturated path


1 Objective
2 Generate a simple path flow along an unsaturated path.
3 Input
4 A network (N , A) of m nodes and n arcs.
5 Flow vector x ∈ Rn , lower bounds ℓ ∈ Rn , upper bounds u ∈ Rn .
6 Origin o, destination d.
7 Output
8 A simple path flow z ∈ Rn or a saturated cut.
9 Initialization
10 S0 := {o}, M := S0 , t := 1.
11 Repeat
12 St := ∅.
13 forall i ∈ St−1 do
14 forall j such that (i, j) ∈ A, j ∉ M and xij < uij do
15 St := St ∪ {j[i →]}, M := M ∪ {j}
16 forall j such that (j, i) ∈ A, j ∉ M and xji > ℓji do
17 St := St ∪ {j[← i]} , M := M ∪ {j}

18 t := t + 1.
19 Until d ∈ St−1 or St−1 = ∅.
20 if St−1 = ∅ then return M (no unsaturated path exists)
21 j := d, s := t − 1, P := {d}.
22 Repeat
23 if j[k →] then P := {k →} ∪ P, j := k
24 if j[← k] then P := {k ←} ∪ P, j := k
25 Until j = o.
26 f := min( min(i,j)∈P→ (uij − xij), min(i,j)∈P← (xij − ℓij) ).
27 forall (i, j) ∈ A do
28 if (i, j) ∈ P → then zij = f
29 if (i, j) ∈ P ← then zij = −f
30 if (i, j) ∉ P then zij = 0
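The layer construction with labels can be sketched as follows. The flow vector is stored as a dictionary mapping each arc (i, j) to the triple (xij, ℓij, uij), an encoding chosen for this example. Applied to the flow of Figure 24.2(a), the sketch returns the unsaturated path of Figure 24.2(b).

```python
def unsaturated_path(arcs, o, d):
    """Layered search in the spirit of Algorithm 24.1: return an unsaturated
    path from o to d as a list of (arc, direction) pairs, or the set M of
    a saturated cut. arcs maps each arc (i, j) to (x_ij, l_ij, u_ij)."""
    label = {o: None}            # label[j] = (previous node, direction)
    layer = [o]
    while layer and d not in label:
        next_layer = []
        for i in layer:
            for (a, b), (x, low, up) in arcs.items():
                if a == i and b not in label and x < up:
                    label[b] = (i, 'forward')    # forward arc below capacity
                    next_layer.append(b)
                elif b == i and a not in label and x > low:
                    label[a] = (i, 'backward')   # backward arc above lower bound
                    next_layer.append(a)
        layer = next_layer
    if d not in label:
        return None, set(label)  # the labeled nodes M define a saturated cut
    path, j = [], d              # follow the labels back from d to o
    while label[j] is not None:
        i, direction = label[j]
        path.append(((i, j) if direction == 'forward' else (j, i), direction))
        j = i
    return list(reversed(path)), None

# Flow of Figure 24.2(a): 2 units already sent along o -> 2 -> 3 -> d.
flow = {('o', 2): (2, 0, 2), (2, 3): (2, 0, 3), (3, 'd'): (2, 0, 2),
        ('o', 3): (0, 0, 3), (2, 4): (0, 0, 4), (4, 'd'): (0, 0, 1)}
path, cut = unsaturated_path(flow, 'o', 'd')
# path is o -> 3 <- 2 -> 4 -> d, the unsaturated path of Figure 24.2(b).
```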

If an unsaturated path P has been found, a flow can be sent along it. As we want
to send as much flow as possible, we calculate, for each arc, the maximum additional
flow that it can transport. For forward arcs, it is the difference uij − xij between the
upper bound and the current flow, as sending additional flow along the path increases
the flow on this arc. For backward arcs, it is the difference xij −ℓij between the current
flow and the lower bound, as sending additional flow along the path decreases the
flow on this arc. The quantity of flow that is feasible to send along the path is the
minimum of these values across all arcs of the path, that is,

 
f := min( min(i,j)∈P→ (uij − xij), min(i,j)∈P← (xij − ℓij) ).    (24.3)

If node d has not been reached by the algorithm that generates the layers, it means
that no unsaturated path exists. In this case, the algorithm is interrupted because
no further arc with residual capacity can be found. The set M defines the cut
Γ = (M, N \ M). By design of the algorithm, any arc with one node i in M and one
node j not in M must be saturated. Indeed, if it were not the case, node j would have
been included into a layer by the algorithm and would then be in M. Therefore, the
cut Γ is saturated. This is the Ford-Fulkerson algorithm, presented as Algorithm 24.2.

Algorithm 24.2: Ford-Fulkerson algorithm


1 Objective
2 Identify the maximum flow through a network.
3 Input
4 A network (N , A) of m nodes and n arcs.
5 Upper bounds u ∈ Rn .
6 Origin o, destination d.
7 Output
8 A flow vector x ∈ Rn such that div(x)o is maximum.
9 Initialization
10 x := 0.
11 Repeat
12 Use Algorithm 24.1 with ℓ = 0 to find a feasible simple path flow z ∈ Rn
13 if a saturated cut has not been found then
14 x := x + z
15 Until a saturated cut has been found.

Example 24.2 (Maximum flow). Consider the network represented in Figure 24.3,
where the capacities are shown in square brackets. The iterations of the Ford-
Fulkerson algorithm are reported in Table 24.1. During each iteration, layers are
built in order to identify an unsaturated path. They are described in Table 24.2.
Figure 24.3: Network for Example 24.2, with flows and [capacities]

Table 24.1: Iterations of the Ford-Fulkerson algorithm for Example 24.2

Iter (o,2) (o,3) (o,4) (2,3) (2,6) (3,4) (3,6) (4,3) (4,5) (5,6) (5,d) (6,d) Flow Unsaturated path
  1    0     0     0     0     0     0     0     0     0     0     0     0    3   o→2→6→d
  2    3     0     0     0     3     0     0     0     0     0     0     3    3   o→3→6→d
  3    3     3     0     0     3     0     3     0     0     0     0     6    2   o→4→5→d
  4    3     3     2     0     3     0     3     0     2     0     2     6    2   o→2→3→6→d
  5    5     3     2     2     3     0     5     0     2     0     2     8    1   o→4→5→6→d
  6    5     3     3     2     3     0     5     0     3     1     2     9

Note that Algorithm 24.1 identifies an unsaturated path with the minimum pos-
sible number of arcs. This is needed to guarantee that the Ford-Fulkerson algorithm
converges. Indeed, if another strategy is used, it may fail to terminate when some
capacities on the arc are irrational (see Zwick, 1995 for simple examples). The version
of the Ford-Fulkerson algorithm presented in this text is not efficient. Indeed, at each
iteration, the layers are reconstructed from the beginning. An algorithm proposed by
Dinic (1970) is based on a more efficient implementation.

Table 24.2: Construction of the layers during the iterations of the Ford-Fulkerson
algorithm for Example 24.2

Iteration 1
S0 = {o}
S1 = {2[o →], 3[o →], 4[o →]}
S2 = {6[2 →], 5[4 →]}
S3 = {d[6 →]}
Iteration 2
S0 = {o}
S1 = {2[o →], 3[o →], 4[o →]}
S2 = {6[3 →], 5[4 →]}
S3 = {d[6 →]}
Iteration 3
S0 = {o}
S1 = {2[o →], 4[o →]}
S2 = {3[2 →], 5[4 →]}
S3 = {6[3 →], d[5 →]}
Iteration 4
S0 = {o}
S1 = {2[o →], 4[o →]}
S2 = {3[2 →], 5[4 →]}
S3 = {6[3 →]}
S4 = {d[6 →]}
Iteration 5
S0 = {o}
S1 = {4[o →]}
S2 = {3[4 →], 5[4 →]}
S3 = {2[← 3], 6[5 →]}
S4 = {d[6 →]}
Iteration 6
S0 = {o}
S1 = {4[o →]}
S2 = {3[4 →], 5[4 →]}
S3 = {2[← 3]}
S4 = ∅
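The pieces can be assembled into a short program in which a breadth-first search plays the role of the layers (all lower capacities are 0, as in Algorithm 24.2). The capacities below are those of the network of Figure 24.3; the values 4 and 5 for the arcs (o, 4) and (4, 5) are read from the figure and should be treated as an assumption of this sketch.

```python
from collections import deque

def max_flow(capacity, o, d):
    """Ford-Fulkerson with shortest unsaturated paths (Algorithm 24.2),
    all lower capacities being 0."""
    flow = {arc: 0 for arc in capacity}
    while True:
        label = {o: None}                 # breadth-first layer construction
        queue = deque([o])
        while queue and d not in label:
            i = queue.popleft()
            for (a, b), u in capacity.items():
                if a == i and b not in label and flow[(a, b)] < u:
                    label[b] = ((a, b), +1)      # forward arc
                    queue.append(b)
                elif b == i and a not in label and flow[(a, b)] > 0:
                    label[a] = ((a, b), -1)      # backward arc
                    queue.append(a)
        if d not in label:                # saturated cut: the flow is maximal
            return sum(flow[(i, j)] for (i, j) in capacity if i == o), flow
        path, j = [], d                   # trace the labels back from d
        while label[j] is not None:
            arc, s = label[j]
            path.append((arc, s))
            j = arc[0] if s == +1 else arc[1]
        f = min(capacity[arc] - flow[arc] if s == +1 else flow[arc]
                for arc, s in path)       # residual capacity of the path
        for arc, s in path:
            flow[arc] += s * f

capacity = {('o', 2): 5, ('o', 3): 3, ('o', 4): 4,
            (2, 3): 3, (2, 6): 3, (3, 4): 3, (3, 6): 5,
            (4, 3): 1, (4, 5): 5, (5, 6): 1, (5, 'd'): 2, (6, 'd'): 9}
value, flow = max_flow(capacity, 'o', 'd')
```

With these capacities, the returned value matches the total flow sent over the iterations of Table 24.1, that is, 3 + 3 + 2 + 2 + 1 = 11 units.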

24.2 The minimum cut problem


The Ford-Fulkerson algorithm (Algorithm 24.2) terminates when a saturated cut Γ ∗ =
(M∗ , N \ M∗ ) has been identified. This cut can be seen as the “bottleneck” of the
network. Indeed, the arcs of the cut are the only way to connect nodes in M∗ to nodes

not in M∗ , and they are all saturated. Using the analogy discussed in Section 21.2,
the cut can be seen as a partition of the nodes into those on the left bank and those
on the right bank of a river separating the city, and the arcs of the cut as the bridges
from one bank to the other. If all bridges are saturated, there is no possibility to
move more flow across the river. It happens that, among all the possible cuts in the
network, Γ ∗ has the smallest capacity. It is the optimal solution of the minimum cut
problem.

Definition 24.3 (Minimum cut problem). Consider a network (N , A) of m nodes


and n arcs, and a vector u ∈ Rn representing the capacity of each arc. Consider a
node o called the origin (or the source), and a node d called the destination (or the
sink). Consider any cut Γ(o, d) = (M, N \ M), where o ∈ M and d ∉ M, separating
o from d. The minimum cut problem consists in finding, among these cuts, one with
the minimum capacity, that is the cut Γ ∗ (o, d) such that

U(Γ ∗ (o, d)) ≤ U(Γ (o, d)) ∀Γ (o, d). (24.4)

The minimum cut problem is intimately related to the maximum flow problem.
Actually, both problems are dual to each other. In particular, their optimal value is
the same.

Theorem 24.4 (Maximum flow/minimum cut theorem). Consider the maximum


flow problem (Definition 22.10), and an optimal solution x∗ . Consider the min-
imum cut problem (Definition 24.3) and an optimal solution Γ ∗ (o, d). Then,

div(x∗)o = U(Γ ∗ (o, d)). (24.5)

Proof. Consider any arbitrary cut Γ (o, d) = (M, N \ M) separating o from d. From
Theorem 21.14, we have
X(Γ(o, d)) = ∑i∈M div(x∗)i = div(x∗)o ,

as o and d are the only nodes with a non zero divergence, and d is not in M. From
(21.8), we also have
div(x∗ )o = X(Γ (o, d)) ≤ U(Γ (o, d)). (24.6)
Now, if we apply Algorithm 24.1, as x∗ is optimal, no unsaturated path can be found,
and the algorithm stops with a saturated cut that separates o from d. Call it Γ ∗ (o, d).
Again, the flow through the cut is
X(Γ ∗ (o, d)) = div(x∗ )o . (24.7)

As the cut is saturated, we also have

X(Γ∗(o, d)) = U(Γ∗(o, d)). (24.8)



Combining (24.6), (24.7), and (24.8), we obtain


U(Γ ∗ (o, d)) ≤ U(Γ (o, d)).
Consequently, Γ ∗ (o, d) is the cut with the minimum capacity, and its capacity is equal
to the maximum flow from o to d.
In order to investigate further the dual relationship between the two problems,
we consider the formulation (22.27)–(22.31) of the maximum flow problem as a tran-
shipment problem (see Section 22.4.2). To each supply constraint (22.28), that is to
each node i, we associate a dual variable pi . To each capacity constraint (22.29), that
is to each arc (i, j), we associate a dual variable λij . Using the techniques presented
in Chapter 4, we write the dual problem as follows
min ∑(i,j)∈A λij uij    (24.9)

subject to
λij − pi + pj ≥ 0, ∀(i, j) ∈ A, (24.10)
po − pd ≥ 1, (24.11)
λij ≥ 0, ∀(i, j) ∈ A. (24.12)
In the above formulation, the dual variables p are involved only with respect to their
differences. Moreover, they do not appear in the objective function. Therefore, if p
is feasible, p + αe is also feasible for any α ∈ R (e is a vector with all entries equal to
1), and the objective function is not affected. Therefore, without loss of generality,
one of the dual variables can be normalized to an arbitrary value. Later, we set
po = 1. Note that the matrix of the constraints of the dual problem is the transpose
of the matrix from the primal, which is totally unimodular by Theorem 22.6. The
determinant of each square submatrix is 0, −1, or +1. This holds as well for the
transposed matrix, which is also totally unimodular. Therefore, Theorem 22.4 applies
for the dual. As the right hand side of the constraints involves only 0 and 1, there is
an integer optimal solution even if the capacities are not integer.
We now show the relationship between this formulation and the minimum cut
problem. The following result shows how to generate the values of the dual variables
given a cut.

Lemma 24.5. Consider the minimum cut problem and an arbitrary cut Γ (o, d) =
(M, N \ M) separating o from d. Consider the vector p ∈ Rm defined as

    pi = 1 if i ∈ M, and 0 otherwise,    (24.13)

and the vector λ ∈ Rn defined as

    λij = 1 if (i, j) ∈ Γ→, and 0 otherwise.    (24.14)
586 The minimum cut problem

Then, p and λ verify the constraints (24.10)–(24.12), and the objective function
(24.9) is the capacity of the cut.

Proof. • The constraints (24.12) are verified by definition of λ.


• Consider an arc (i, j) where i and j are both in M, or both not in M. In this
case, pi = pj and the constraints (24.10) become λij ≥ 0, and are verified.
• Consider an arc (i, j) where i is in M and j not. Therefore pi = 1 and pj = 0.
As (i, j) is an arc of the cut, we also have λij = 1, so that (24.10) is written as
1 − 1 + 0 ≥ 0, and is verified.
• From (24.13), po = 1 and pd = 0, and (24.11) is verified.
• From (21.7), U(Γ) = ∑(i,j)∈Γ→ uij . From the definition of λ, we have

    U(Γ) = ∑(i,j)∈A λij uij .

From a feasible dual solution, many cuts can be generated.

Lemma 24.6. Consider the minimum cut problem. Consider also a vector p ∈
Rm and a vector λ ∈ Rn verifying the constraints (24.10)–(24.12). For any
0 ≤ γ < 1, consider the set

Mγ = {j|po − pj ≤ γ}. (24.15)

Then, Γγ = (Mγ , N \ Mγ ) is a cut separating o from d. Also, there exists γ∗
such that

    U(Γγ∗ ) ≤ ∑(i,j)∈A λij uij .    (24.16)

Proof. Node o is trivially in Mγ for any 0 ≤ γ < 1. From (24.11), po − pd ≥ 1, and
d cannot be in Mγ for any 0 ≤ γ < 1. Therefore, the cut Γγ separates o from d,
for any 0 ≤ γ < 1. For the second result, we need a probability argument.
Consider a random variable X uniformly distributed between 0 and 1. For each
realization γ of X, we can generate a cut Γγ . The expected value of the capacity of
this cut is

    E[U(ΓX)] = ∑(i,j)∈A uij Pr((i, j) ∈ ΓX→) .

We show now that, for each (i, j), the probability that the arc is in the cut is bounded
by the value of the dual variable, that is

Pr ((i, j) ∈ ΓX→ ) ≤ λij . (24.17)



The probability that (i, j) is in the cut is the probability that i is in MX and j is not.
By definition (24.15), we have

Pr ((i, j) ∈ ΓX→ ) = Pr(po − pi ≤ X < po − pj ).

If the inequalities are not compatible, that is, if po − pj ≤ po − pi , po − pi > 1, or
po − pj ≤ 0, the probability is 0. Then (24.17) results from the fact that λij ≥ 0,
that is, from (24.12). If they are compatible, then, as X is uniformly distributed,

    Pr(po − pi ≤ X < po − pj) = pi − pj ,

and (24.17) holds from (24.10). Therefore,

    E[U(ΓX)] ≤ ∑(i,j)∈A uij λij .

By definition of the expected value, there always exists a value 0 ≤ γ∗ < 1 such that
the realization is not greater than the expected value, that is U(Γγ∗ ) ≤ E[U(ΓX )],
proving the result.
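Lemma 24.6 can be illustrated numerically on the small network of Figure 24.2. The fractional dual solution below is chosen by hand for the sake of the example (an assumption, not a solution computed in the text): sweeping γ over [0, 1) generates only two distinct threshold cuts, and the better of the two satisfies (24.16).

```python
# Capacities of the small network of Figure 24.2, and a feasible dual
# solution (p, lambda) chosen by hand for this illustration.
capacity = {('o', 2): 2, ('o', 3): 3, (2, 3): 3,
            (2, 4): 4, (3, 'd'): 2, (4, 'd'): 1}
p = {'o': 1.0, 2: 0.5, 3: 0.5, 4: 0.5, 'd': 0.0}
lam = {('o', 2): 0.5, ('o', 3): 0.5, (2, 3): 0.0,
       (2, 4): 0.0, (3, 'd'): 0.5, (4, 'd'): 0.5}

# Dual feasibility, constraints (24.10)-(24.12).
assert all(lam[(i, j)] - p[i] + p[j] >= 0 for (i, j) in capacity)
assert p['o'] - p['d'] >= 1
dual_value = sum(lam[arc] * capacity[arc] for arc in capacity)

def cut_capacity(M):
    return sum(u for (i, j), u in capacity.items() if i in M and j not in M)

# Threshold cuts M_gamma of (24.15), for a sweep of gamma over [0, 1).
threshold_cuts = {frozenset(j for j in p if p['o'] - p[j] <= gamma)
                  for gamma in (k / 10 for k in range(10))}
best = min(cut_capacity(M) for M in threshold_cuts)
# gamma < 0.5 gives M = {o} (capacity 5); gamma >= 0.5 gives {o, 2, 3, 4}
# (capacity 3), which satisfies (24.16): its capacity is below dual_value.
```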
We have now all the elements to show that the linear optimization problem (24.9)–
(24.12) is the minimum cut problem.

Theorem 24.7 (Minimum cut problem as a linear optimization). Consider the


minimum cut problem and a cut Γ∗(o, d) = (M∗, N \ M∗) separating o from d.
Consider the vector p∗ ∈ Rm defined as

    p∗i = 1 if i ∈ M∗, and 0 otherwise,    (24.18)

and the vector λ∗ ∈ Rn defined as

    λ∗ij = 1 if (i, j) ∈ Γ∗→, and 0 otherwise.    (24.19)

The cut Γ∗(o, d) solves the minimum cut problem if and only if (p∗, λ∗) is an
optimal solution of the linear optimization problem (24.9)–(24.12):

    min ∑(i,j)∈A λij uij

subject to

    λij − pi + pj ≥ 0,    ∀(i, j) ∈ A,
    po − pd ≥ 1,
    λij ≥ 0,    ∀(i, j) ∈ A.

Proof. If M∗ is the optimal cut, we invoke Lemma 24.5, that states that (p∗ , λ∗ ) is
feasible, and that the objective function (that is, the capacity of the cut) is minimal.
For the other direction, we exploit the fact that there is an integer optimal solution
to (24.9)–(24.12). Also, as discussed earlier, only the differences of the dual variables
matter, so that one of them can be normalized to an arbitrary value. Therefore, we
can assume without loss of generality that p∗o = 1.


Consider now Lemma 24.6 and the definition of the set Mγ . As p is integer,
po = 1, and γ < 1, the condition po − pj ≤ γ to be in the set becomes pj ≥ 1 for any
value of γ. Similarly, as γ ≥ 0, the condition po − pj > γ to be out of the set becomes
pj ≤ 0 for any value of γ. Therefore, for any value of γ, the set Mγ of Lemma 24.6
is
Mγ = {i|p∗i ≥ 1} = {i|p∗i = 1},
which is exactly the set M∗ . Therefore, (24.16) is verified for M∗ and
    U(Γ∗) ≤ ∑(i,j)∈A λ∗ij uij .

From the strong duality theorem (Theorems 4.17 and 6.32), we know that the
optimal value of the dual problem is the same as that of the primal. Therefore,
∑(i,j)∈A λ∗ij uij has the same value as the maximum flow problem which, from
Theorem 24.4, is the optimal value of the minimum cut problem, so that

    U(Γ∗) ≥ ∑(i,j)∈A λ∗ij uij .

Consequently, we have

    U(Γ∗) = ∑(i,j)∈A λ∗ij uij ,

showing that Γ∗ is a minimum cut.
The material presented in Chapters 21 to 24 is based on lecture notes, inspired
by Ahuja et al. (1993) and Bertsekas (1998).

24.3 Exercises
Exercise 24.1. Consider the network represented in Figure 24.4, where the lower
capacity of each arc is 0, and the upper capacity is shown next to the arc.
1. Apply the Ford-Fulkerson algorithm (Algorithm 24.2) to obtain the maximum
flow through the network from node o to node d, as a function of the parameter
α, where α ≥ 0.
2. What are the values of α, α ≥ 0, such that arc (b, a) belongs to the minimum
cut? And what is the minimum cut?

Figure 24.4: Network for Exercise 24.1, where the value of each arc represents its
upper capacity

Exercise 24.2. Consider the network represented in Figure 21.25, where each arc
(i, j) is associated with its lower bound ℓij , its flow xij , and its upper bound uij in
the following way: (ℓij , xij , uij ).
1. Apply the Ford-Fulkerson algorithm (Algorithm 24.2) to obtain the maximum
flow through the network from node o to node d.
2. Write the mathematical formulation of the minimum cut problem on this net-
work. Solve it using the simplex algorithm (Algorithm 16.5), and verify that
Theorem 24.7 applies in this case.

Part VII

Discrete optimization

The Good Lord made all the integers; the rest is man's doing.

                                                  Leopold Kronecker

To address continuous optimization problems, both from a theoretical and an
algorithmic viewpoint, until now we have relied on the solid theories of analysis. But not
every problem can be modeled using continuous and differentiable functions. In this
part of the book, we focus on problems where the decision variables must take integer
values, or where their values represent the decision to take or not some actions. We
have already encountered such problems in the context of networks, as discussed in
the previous part of the book. For instance, in the assignment problem discussed in
Section 22.4.4, the actions consist in selling a masterpiece to potential buyers. In that
case, the mathematical properties of the network optimization problem guarantee the
integrality of the solution without any specific treatment. This does not always apply.
We now address the category of problems where integrality must be enforced in order
to generate meaningful results.

Chapter 25

Introduction to discrete
optimization

Contents
25.1 Modeling
25.2 Classical problems
     25.2.1 The knapsack problem
     25.2.2 Set covering
     25.2.3 The traveling salesman problem
25.3 The curse of dimensionality
25.4 Relaxation
25.5 Exercises

25.1 Modeling
We focus now on optimization problems where the variables are discrete and take
integer values. Such constraints are relevant in contexts where the decisions to be
taken concern a number of items, or entities, that are indivisible. The example
presented in Section 1.1.7 involves the production of toys, where it is not an option
to produce parts of toys. It is also relevant in situations where the set of feasible
solutions consists in all possible combinations of values of several discrete variables.
In this case, it is referred to as “combinatorial optimization.” Other examples are
presented in Section 25.2.
A particularly useful type of variables in this context are binary variables, which
can only take the value 0 or 1. A value 1 may refer to an action that is taken, a decision
to do something, or a switch that is set to “on.” The value 0 corresponds to an action
not taken, a decision not to do something, a switch set to “off”. Binary variables
provide a powerful modeling tool to translate logical conditions into a mathematical
formulation that can be used in an optimization framework. The constraints of the
optimization problem correspond to logical conditions, and a solution is feasible if
all these conditions are verified. The possibilities are endless, and the modeling
of optimization problems is more of an art than a science. We provide below a few
examples of common modeling techniques using binary variables. In order to illustrate
each of them, we consider the modeling of the following problem.

Example 25.1 (Locating plants for the supply of energy). A company delivers gas
and electricity, and has to decide on the location of its plants. n sites have been
identified where plants could be located to cover the distribution to m cities that are
clients of the company. For each potential plant i, the local authority decides whether
the plant is allowed to open or not. Moreover, there is a fixed cost p_i to open it.
There is also a cost p^g_i for each unit of gas produced. The unit production cost
for electricity is p^e_i. For each city j, we have the following data:

• g_j = 1 if city j buys gas from the company, and 0 otherwise,
• d^g_j is the quantity of gas needed by city j,
• e_j = 1 if city j buys electricity from the company, and 0 otherwise,
• d^e_j is the quantity of electricity needed by city j.

The cost of infrastructure (wires, pipes, etc.) to allow the transport of the gas (resp.
the electricity) from plant i to city j is c^g_{ij} (resp. c^e_{ij}). Finally, each plant must be
dedicated to either gas or electricity, but not both. The objective of the problem
is to identify which plants must be opened, to assign each of them to either gas or
electricity, and to decide the quantities to be produced and the cities to be served.
More specifically, consider an instance with n = 10 potential sites (all approved
by the authorities) and m = 3 cities. The data is reported in Table 25.1.

Table 25.1: Data for Example 25.1

Plants            1   2   3   4   5   6   7   8   9  10
p_i               1   1   1   1   1   1   1   1   1   1
p^g_i             1   1   1   2   2   1   1   1   2   2
p^e_i             2   2   2   1   1   2   1   2   2   2
c^g_{ij}, City 1  5   4   3   2   1   5   4   3   2   1
c^g_{ij}, City 2  2   2   2   2   2   1   2   2   2   2
c^g_{ij}, City 3  1   2   3   4   5   1   2   3   4   5
c^e_{ij}, City 1  5   4   3   8   1   5   4   3   2   1
c^e_{ij}, City 2  2   2   2  10  12   1   2   2   2   2
c^e_{ij}, City 3  1   2   3   1   7   1   2   3   4   5

Cities            1   2   3
g_j               1   0   1
e_j               1   1   1
d^g_j            50   0  30
d^e_j            30  20  10

A solution consists in opening plants 6 (30 units) and 8 (50 units) for gas, and
plants 4 (10 units), 5 (30 units), and 7 (20 units) for electricity. City 1 receives gas
from plant 8 and electricity from plant 5. City 2 receives electricity from plant 7.
City 3 receives gas from plant 6 and electricity from plant 4. The total cost is 153.
It happens to be an optimal solution.
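The total cost of this solution can be checked directly from the data of Table 25.1. The following sketch (the variable names are ours) adds up the opening, production, and infrastructure costs:

```python
# Data from Table 25.1 (plants indexed 1-10, cities 1-3).
p = {i: 1 for i in range(1, 11)}                           # fixed opening costs
pg = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 1, 7: 1, 8: 1, 9: 2, 10: 2}  # unit gas cost
pe = {1: 2, 2: 2, 3: 2, 4: 1, 5: 1, 6: 2, 7: 1, 8: 2, 9: 2, 10: 2}  # unit electricity cost
cg = {1: [5, 4, 3, 2, 1, 5, 4, 3, 2, 1],   # gas infrastructure cost, by city
      2: [2, 2, 2, 2, 2, 1, 2, 2, 2, 2],
      3: [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]}
ce = {1: [5, 4, 3, 8, 1, 5, 4, 3, 2, 1],   # electricity infrastructure cost, by city
      2: [2, 2, 2, 10, 12, 1, 2, 2, 2, 2],
      3: [1, 2, 3, 1, 7, 1, 2, 3, 4, 5]}

# The solution described in the text.
opened = [4, 5, 6, 7, 8]
gas_production = {6: 30, 8: 50}
elec_production = {4: 10, 5: 30, 7: 20}
gas_links = [(8, 1), (6, 3)]      # (plant, city)
elec_links = [(5, 1), (7, 2), (4, 3)]

total_cost = (
    sum(p[i] for i in opened)
    + sum(pg[i] * q for i, q in gas_production.items())
    + sum(pe[i] * q for i, q in elec_production.items())
    + sum(cg[j][i - 1] for i, j in gas_links)
    + sum(ce[j][i - 1] for i, j in elec_links)
)
```

The five opening costs contribute 5, the production costs 80 (gas) and 60 (electricity), and the infrastructure costs 4 + 4, for a total of 153.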

We start by presenting how the common logical operations translate into binary
variables or constraints for the optimization problem.

Logical identity As described above, a logical identity is characterized by a binary
decision variable x that takes the value 1 if the proposition P is true, and 0 if it is
false, as formalized in the following truth table:

P x
True 1
False 0
Typically, this associates the logical propositions with binary parameters and vari-
ables. Considering Example 25.1, we may have the following:

• a_i is a binary parameter that is 1 if the local authorities allow a plant to be
opened on site i, and 0 otherwise;
• x_i is a binary variable that is 1 if it is decided to open plant i, and 0 otherwise;
• y^g_i is a binary variable that is 1 if plant i is dedicated to gas, and 0 otherwise;
• y^e_i is a binary variable that is 1 if plant i is dedicated to electricity, and 0
otherwise;
• z^g_{ij} is a binary variable that is 1 if plant i serves gas to city j, and 0 otherwise;
• z^e_{ij} is a binary variable that is 1 if plant i serves electricity to city j, and 0
otherwise.

Logical negation If a proposition P is characterized by a binary variable x, the
negation ¬P is true if P is false, and false if P is true. It is characterized by the
binary variable z = 1 − x, as shown in the following truth table:

P ¬P x 1−x
True False 1 0
False True 0 1

In the above example, the quantity 1 − x_i is associated with the proposition "plant
i is not opened."
Logical conjunction If we have two propositions P and Q characterized by two
binary variables x and y, their conjunction P ∧ Q is true if both P and Q are true,
and false otherwise. It is characterized by the binary variable z = xy, as shown in
the following truth table:

P Q P∧Q x y xy
True True True 1 1 1
True False False 1 0 0
False True False 0 1 0
False False False 0 0 0

In our example, a plant is available on site i if the local authorities allow its
construction (that is, a_i = 1) and if the company decides to construct it (that
is, x_i = 1). The availability of the plant can therefore be modeled by a binary
variable x^a_i = a_i x_i. Note that, in general, this type of condition can be used to
preprocess the problem and simplify its definition. In this example, it is easier
simply to ignore all the sites that have not received the authorization and work
only with the remaining ones. This is what we assume in the remainder of the
presentation of the example.
When this formulation is used with two variables, it introduces a non linear
relationship between them (xy = 1), which is in general undesirable. In the
context of optimization, it is more appropriate to model the conjunction as
two constraints (x = 1 and y = 1), as all constraints have to be verified to achieve
feasibility.
Therefore, most of the time it is not necessary to model explicitly the conjunction
by a product. We have included it for the sake of completeness.
Logical disjunction If we have two propositions P and Q characterized by two bi-
nary variables x and y, their disjunction P ∨ Q is true if P or Q is true, and is
false if both are false. It is characterized by the constraint x + y ≥ 1, as shown in
the following truth table:
P Q P∨Q x y x+y≥1
True True True 1 1 Yes
True False True 1 0 Yes
False True True 0 1 Yes
False False False 0 0 No
This can be generalized to several propositions P_1, ..., P_r, characterized by binary
variables x_1, ..., x_r. The disjunction P_1 ∨ ... ∨ P_r is true if at least one of the
propositions P_1, ..., P_r is true. It is characterized by the constraint

∑_{i=1}^{r} x_i ≥ 1.    (25.1)

Note that if the variable z = x + y or the variable z = ∑_{i=1}^{r} x_i is included in the
model, it is not a binary variable.
In our example, each city must receive its gas from plant 1, or plant 2, or ..., or
plant n. This is modeled as

∑_{i=1}^{n} z^g_{ij} ≥ 1,  j = 1, ..., m.    (25.2)

Logical exclusive disjunction If we have two propositions P and Q characterized
by two binary variables x and y, their exclusive disjunction P ⊕ Q is true if P or
Q is true, but not both. It is characterized by the constraint x + y = 1, as shown
in the following truth table:
P Q P⊕Q x y x+y=1
True True False 1 1 No
True False True 1 0 Yes
False True True 0 1 Yes
False False False 0 0 No
Logical implication If we have two propositions P and Q characterized by two bi-
nary variables x and y, their implication P ⇒ Q is true if Q is true or P is false.
It is characterized by the constraint x ≤ y, as shown in the following truth table:
P Q P⇒Q x y x≤y
True True True 1 1 Yes
True False False 1 0 No
False True True 0 1 Yes
False False True 0 0 Yes
In our example, the fact that plant i produces gas (characterized by y^g_i) implies
that plant i is open (characterized by x_i). This is modeled as

y^g_i ≤ x_i,  ∀i.    (25.3)

Similarly, the fact that a plant serves gas to a city obviously implies that it
produces gas. We obtain the constraints

z^g_{ij} ≤ y^g_i,  ∀i, j.    (25.4)

Logical equivalence If we have two propositions P and Q characterized by two
binary variables x and y, their equivalence P ⇔ Q is true if both propositions
have the same truth value. It is therefore characterized by the constraint x = y,
as shown in the following truth table:
P Q P⇔Q x y x=y
True True True 1 1 Yes
True False False 1 0 No
False True False 0 1 No
False False True 0 0 Yes
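As a sanity check, the encodings above can be verified mechanically by enumerating all combinations of truth values; a minimal sketch:

```python
from itertools import product

def check(encoding, logic):
    # Compare an arithmetic encoding with the logical operation it
    # represents, on every combination of binary values.
    return all(encoding(x, y) == logic(x, y) for x, y in product((0, 1), repeat=2))

negation_ok    = all((1 - x == 1) == (not x) for x in (0, 1))
conjunction_ok = check(lambda x, y: x * y == 1, lambda x, y: bool(x) and bool(y))
disjunction_ok = check(lambda x, y: x + y >= 1, lambda x, y: bool(x) or bool(y))
exclusive_ok   = check(lambda x, y: x + y == 1, lambda x, y: bool(x) != bool(y))
implication_ok = check(lambda x, y: x <= y,     lambda x, y: (not x) or bool(y))
equivalence_ok = check(lambda x, y: x == y,     lambda x, y: bool(x) == bool(y))
```

Each flag confirms that the corresponding constraint holds exactly on the rows of its truth table.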
Optional constraints (I) Consider an optimization problem with the constraint
f(x) ≥ a, where a > 0. We need to model the fact that the constraint must
sometimes be verified, sometimes not. More specifically, there is a binary variable
z in the problem such that the constraint f(x) ≥ a must be verified if z = 1. If
z = 0, it does not matter if it is verified or not. In order to model this, we need a
lower bound on the value of f(x). Without loss of generality, we can assume that
f(x) ≥ 0 for each feasible x (if the lower bound is ℓ < 0, consider instead the function
f̃(x) = f(x) − ℓ, which is such that f̃(x) ≥ 0). In this case, we write

f(x) ≥ az. (25.5)

Indeed, if z = 1, (25.5) becomes f(x) ≥ a, and the original constraint applies. If
z = 0, a does not play a role anymore, and (25.5) is written f(x) ≥ 0, which is
always verified.
In our example, if city j buys gas from the company (that is, if g_j = 1), it must
be served by at least one plant, that is

∑_i z^g_{ij} ≥ 1.    (25.6)

As ∑_i z^g_{ij} is always non negative, it plays the role of f(x) in the above discussion.
Setting a = 1 and z = g_j, (25.5) is written as

∑_i z^g_{ij} ≥ g_j.    (25.7)

This technique is also illustrated in Section 25.2.3.


Optional constraints (II) The previous modeling technique can be used for
lower-than inequality constraints, too. In this case, an upper bound is needed.
Suppose that we have a function g such that g(x) ≤ M for each x. We have a binary
variable z, and we want to model the fact that, if z = 1, the constraint g(x) ≤ b,
where b < M, must be verified. If z = 0, it does not matter if the constraint is
verified or not. It can be modeled using the constraint

g(x) ≤ bz + (1 − z)M. (25.8)

Indeed, if we define f(x) = M − g(x) and a = M − b, we obtain the same
configuration as in the previous case, and (25.5) is equivalent to (25.8). This
technique is sometimes called the "big-M" model.
In our example, if a plant is not dedicated to gas, then the quantity of gas produced
must be equal to zero. Denote by q^g_i ≥ 0 the variable characterizing the quantity
of gas produced by plant i. As the variable q^g_i must be constrained to be non
negative in any case, the optional constraint q^g_i = 0 can be written as q^g_i ≤ 0. We
now need to find a value M such that q^g_i ≤ M in any circumstance. As there is
no point producing more than needed, the quantity of gas produced by any plant
never exceeds the total demand of gas. Therefore, we can define

M = ∑_{j=1}^{m} g_j d^g_j.    (25.9)

The constraint q^g_i ≤ 0 must be verified if plant i is not dedicated to gas, that is, if
y^g_i = 0. Therefore, we apply (25.8) with g(x) = q^g_i, b = 0, z = 1 − y^g_i and obtain

q^g_i ≤ y^g_i ∑_{j=1}^{m} g_j d^g_j.    (25.10)
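With the data of Table 25.1, the bound (25.9) and the behavior of constraint (25.10) can be spelled out numerically; a small sketch:

```python
# Gas data for the three cities of Example 25.1 (Table 25.1).
g = [1, 0, 1]      # g_j: 1 if city j buys gas from the company
dg = [50, 0, 30]   # d^g_j: quantity of gas needed by city j

# Big-M of (25.9): no plant ever needs to produce more than the total demand.
M = sum(gj * dgj for gj, dgj in zip(g, dg))

def gas_upper_bound(y_gi):
    # Right-hand side of constraint (25.10) as a function of y^g_i.
    return y_gi * M

bound_if_gas = gas_upper_bound(1)      # plant dedicated to gas: q^g_i may reach M
bound_if_not_gas = gas_upper_bound(0)  # otherwise: q^g_i is forced to zero
```

Here M = 50 + 30 = 80: a plant dedicated to gas may produce up to 80 units, while a plant not dedicated to gas may produce none.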

Disjunctive constraints The modeling techniques seen above can be generalized to
model disjunctive constraints. Suppose that we have two functions f and g, such
that f(x) ≥ 0 and g(x) ≥ 0 for each x. We need to model that one of the two
constraints

f(x) ≥ a  or  g(x) ≥ b    (25.11)

must be verified, but not necessarily both. We introduce a variable z that is 1 if
the first constraint is enforced, and 0 if it is the second one. In that case, (25.11)
can be replaced by

f(x) ≥ az  and  g(x) ≥ b(1 − z).    (25.12)

Indeed, if z = 1, (25.12) is written as

f(x) ≥ a  and  g(x) ≥ 0,

which is verified if f(x) ≥ a, as the second term is always verified. Similarly, if
z = 0, (25.12) is written as

f(x) ≥ 0  and  g(x) ≥ b,

which is verified if g(x) ≥ b, as the first term is always verified.
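The equivalence between (25.11) and (25.12) can be checked by enumeration on a grid of non negative values for f(x) and g(x); a minimal sketch, with illustrative thresholds a = 2 and b = 3:

```python
def disjunction_holds(f_val, g_val, a, b):
    # (25.11): at least one of the two constraints must be verified.
    return f_val >= a or g_val >= b

def encoded_feasible(f_val, g_val, a, b):
    # (25.12) is feasible if some z in {0, 1} satisfies both constraints
    # (f and g are assumed non negative, as in the text).
    return any(f_val >= a * z and g_val >= b * (1 - z) for z in (0, 1))

a, b = 2, 3
agree = all(
    disjunction_holds(fv, gv, a, b) == encoded_feasible(fv, gv, a, b)
    for fv in range(5) for gv in range(5)
)
```

The two formulations agree on every point of the grid, as expected from the argument above.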


Linearization As we discuss later, it is highly desirable to have a specification of
the model that is linear in the variables. A non linear specification that happens
often in practice is
xy = z, (25.13)

where x, y, and z are binary variables. This non linear constraint is equivalent to
the following set of linear constraints:

x+y≤1+z
z≤x (25.14)
z ≤ y,

as can be seen from the following truth table:

x y z x+y≤1+z z≤x z ≤y xy = z
1 1 1 Yes Yes Yes Yes
1 1 0 No Yes Yes No
1 0 1 Yes Yes No No
1 0 0 Yes Yes Yes Yes
0 1 1 Yes No Yes No
0 1 0 Yes Yes Yes Yes
0 0 1 Yes No No No
0 0 0 Yes Yes Yes Yes
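The equivalence between the product constraint (25.13) and the linear system (25.14) amounts to checking the eight rows of the truth table; a minimal sketch:

```python
from itertools import product

def linear_system(x, y, z):
    # The three linear constraints (25.14).
    return x + y <= 1 + z and z <= x and z <= y

# The linear system holds exactly for the triples where z = xy.
equivalent = all(
    linear_system(x, y, z) == (x * y == z)
    for x, y, z in product((0, 1), repeat=3)
)
```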

One way to model Example 25.1 is as follows.

Decision variables:
• x_i is a binary variable that is 1 if it is decided to open plant i, and 0 otherwise;
• y^g_i is a binary variable that is 1 if plant i is dedicated to gas, and 0 otherwise;
• y^e_i is a binary variable that is 1 if plant i is dedicated to electricity, and 0
otherwise;
• z^g_{ij} is a binary variable that is 1 if plant i serves gas to city j, and 0 otherwise;
• z^e_{ij} is a binary variable that is 1 if plant i serves electricity to city j, and 0
otherwise;
• q^g_i ∈ R represents the quantity of gas to be produced by plant i;
• q^e_i ∈ R represents the quantity of electricity to be produced by plant i.
Objective function: the costs involved in this problem are
• the fixed costs associated with the opening of the plants: ∑_{i=1}^{n} p_i x_i,
• the production costs of gas: ∑_{i=1}^{n} p^g_i q^g_i,
• the production costs of electricity: ∑_{i=1}^{n} p^e_i q^e_i,
• the cost of the transportation infrastructure for gas

∑_{i=1}^{n} ∑_{j=1}^{m} c^g_{ij} z^g_{ij},    (25.15)

as z^g_{ij} is 1 if gas has to be delivered to city j from plant i, and
• the cost of the transportation infrastructure for electricity

∑_{i=1}^{n} ∑_{j=1}^{m} c^e_{ij} z^e_{ij}.    (25.16)

Therefore, the objective function is

∑_{i=1}^{n} p_i x_i + ∑_{i=1}^{n} p^g_i q^g_i + ∑_{i=1}^{n} p^e_i q^e_i + ∑_{i=1}^{n} ∑_{j=1}^{m} c^g_{ij} z^g_{ij} + ∑_{i=1}^{n} ∑_{j=1}^{m} c^e_{ij} z^e_{ij}.    (25.17)

Constraints:
• If plant i produces gas, then plant i is open:

y^g_i ≤ x_i,  ∀i.    (25.18)

• If plant i produces electricity, then plant i is open:

y^e_i ≤ x_i,  ∀i.    (25.19)

• If plant i produces gas, it cannot produce electricity. Using the logical
implication and the logical negation, we obtain y^g_i ≤ 1 − y^e_i. Similarly, if plant i
produces electricity, it cannot produce gas, which gives y^e_i ≤ 1 − y^g_i. Both
constraints are equivalent and can be written as

y^g_i + y^e_i ≤ 1,  ∀i.    (25.20)

• Plant i must produce gas in sufficient quantity to satisfy the total demand
associated with it. The demand for city j is d^g_j g_j, that is, d^g_j if city j buys gas
from the company:

q^g_i ≥ ∑_j d^g_j g_j z^g_{ij}.    (25.21)

• Plant i must produce electricity in sufficient quantity to satisfy the total
demand associated with it. The demand for city j is d^e_j e_j, that is, d^e_j if city j
buys electricity from the company:

q^e_i ≥ ∑_j d^e_j e_j z^e_{ij}.    (25.22)

• If plant i is not dedicated to gas, then the quantity of gas produced must be
equal to zero. From the discussion above, the constraint is (25.10):

q^g_i ≤ y^g_i ∑_{j=1}^{m} g_j d^g_j,  ∀i.    (25.23)

• If plant i is not dedicated to electricity, then the quantity of electricity
produced must be equal to zero:

q^e_i ≤ y^e_i ∑_{j=1}^{m} e_j d^e_j,  ∀i.    (25.24)

• If city j buys gas from the company (that is, if g_j = 1), it must be served by
at least one plant. As discussed above, the constraint is (25.7):

∑_i z^g_{ij} ≥ g_j,  ∀j.    (25.25)

• If city j buys electricity from the company (that is, if e_j = 1), it must be served
by at least one plant:

∑_i z^e_{ij} ≥ e_j,  ∀j.    (25.26)

• If city j receives gas from plant i, it implies that plant i produces gas:

z^g_{ij} ≤ y^g_i,  ∀i, j.    (25.27)

• If city j receives electricity from plant i, it implies that plant i produces
electricity:

z^e_{ij} ≤ y^e_i,  ∀i, j.    (25.28)

• If city j receives gas from plant i, it implies that city j buys gas from the
company:

z^g_{ij} ≤ g_j,  ∀i, j.    (25.29)

• If city j receives electricity from plant i, it implies that city j buys electricity
from the company:

z^e_{ij} ≤ e_j,  ∀i, j.    (25.30)

• Variables x_i, y^g_i, y^e_i, z^g_{ij}, and z^e_{ij} take the value 0 or 1.
• Variables q^g_i and q^e_i are non negative real numbers.

Putting everything together, the optimization problem is written as follows:

min ∑_{i=1}^{n} p_i x_i + ∑_{i=1}^{n} p^g_i q^g_i + ∑_{i=1}^{n} p^e_i q^e_i + ∑_{i=1}^{n} ∑_{j=1}^{m} c^g_{ij} z^g_{ij} + ∑_{i=1}^{n} ∑_{j=1}^{m} c^e_{ij} z^e_{ij}

subject to

y^g_i ≤ x_i,  ∀i,
y^e_i ≤ x_i,  ∀i,
y^g_i + y^e_i ≤ 1,  ∀i,
q^g_i ≥ ∑_j d^g_j g_j z^g_{ij},  ∀i,
q^e_i ≥ ∑_j d^e_j e_j z^e_{ij},  ∀i,
q^g_i ≤ y^g_i ∑_{j=1}^{m} g_j d^g_j,  ∀i,
q^e_i ≤ y^e_i ∑_{j=1}^{m} e_j d^e_j,  ∀i,
∑_i z^g_{ij} ≥ g_j,  ∀j,
∑_i z^e_{ij} ≥ e_j,  ∀j,
z^g_{ij} ≤ y^g_i,  ∀i, j,
z^e_{ij} ≤ y^e_i,  ∀i, j,
z^g_{ij} ≤ g_j,  ∀i, j,
z^e_{ij} ≤ e_j,  ∀i, j,
x_i, y^g_i, y^e_i, z^g_{ij}, z^e_{ij} ∈ {0, 1},  ∀i, j,
q^g_i, q^e_i ≥ 0,  ∀i.

Note that the above formulation is certainly not the only possible way to model
the problem, and probably not the best one. Several simplifications can be made
(for instance, constraints (25.29) and (25.30) can be used to reduce the number of
variables). Still, it illustrates various aspects of modeling that appear in many
applications.
Once the modeling step has been finalized, we obtain a mixed integer optimization
problem.

Definition 25.2 (Integer optimization problem). An optimization problem is an
integer optimization problem if all of its variables are restricted to take integer values.
If some variables are allowed to take non integer values, the optimization problem is
called a mixed integer optimization problem.

In this book, we focus only on integer linear optimization problems.

Definition 25.3 (Integer linear optimization problem). An optimization problem is
an integer linear optimization problem if the objective function and the constraints
are linear functions of the decision variables, and if all of its variables are restricted
to take integer values. If some variables are allowed to take non integer values, the
optimization problem is called a mixed integer linear optimization problem. Using
the techniques described in Section 1.2, such a problem can always be written as

min_{x ∈ R^{n_x}, z ∈ N^{n_z}}  c_x^T x + c_z^T z    (25.31)

subject to

A_x x + A_z z = b,
x ≥ 0,    (25.32)
z ∈ N^{n_z},

where A_x ∈ R^{m×n_x}, A_z ∈ R^{m×n_z}, and b ∈ R^m.

The special case where all variables are binary is called a binary linear optimization
problem.

Definition 25.4 (Binary linear optimization problem). An optimization problem is
a binary linear optimization problem if the objective function and the constraints
are linear functions of the decision variables, and if all the variables are restricted to
take the values 0 or 1. Such a problem can be written as

min_{x ∈ N^n}  c^T x    (25.33)

subject to

Ax = b,
x ∈ {0, 1}^n,    (25.34)

where A ∈ R^{m×n} and b ∈ R^m.

Note that it is possible to transform an integer variable into several binary variables,
if the variable is bounded and can take only a finite number of values. Indeed,
any number can be converted into binary notation. This is exactly what happens in
a computer anyway. More specifically, consider an integer variable x that can take
any value up to u, that is, x ∈ {0, 1, ..., u}. We define K binary variables z_i, where K
is the smallest integer such that

u ≤ 2^K − 1,    (25.35)

that is,

K = ⌈log_2(u + 1)⌉.    (25.36)

Then, we replace x by

∑_{i=0}^{K−1} 2^i z_i.    (25.37)

Note that this transformation may allow x to exceed u. Indeed, suppose that u is 8.
Then K = 4 and 4 variables are used. If they are all set to 1, the corresponding value
is 15, which is above 8. Therefore, the constraint

∑_{i=0}^{K−1} 2^i z_i ≤ u    (25.38)

must also be included.
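The computation of K in (25.36), the decoding (25.37), and the need for the extra constraint (25.38) can be illustrated on the u = 8 case discussed above; a minimal sketch:

```python
from math import ceil, log2

def expansion_size(u):
    # Smallest K such that u <= 2**K - 1, i.e. (25.36).
    return ceil(log2(u + 1))

def decode(z):
    # Value represented by the binary variables z_0, ..., z_{K-1}, as in (25.37).
    return sum(2**i * zi for i, zi in enumerate(z))

K = expansion_size(8)             # u = 8 requires K = 4 binary variables
overshoot = decode([1, 1, 1, 1])  # all variables at 1 decode to 15 > 8,
                                  # hence constraint (25.38) is needed
```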


Another possible way to transform an integer variable into binary variables is as
follows. Assume that x is an integer variable such that ℓ ≤ x ≤ u. We introduce u − ℓ
binary variables zi , i = 1, . . . , u − ℓ, and define
u−ℓ
X
x=ℓ+ zi . (25.39)
i=1

If all variables z_i are 0, the value of x is ℓ. If all variables z_i are 1, the value of x is
ℓ + (u − ℓ) = u. Any other combination of 0 and 1 for the variables z_i corresponds to
an integer value of x between ℓ and u. There is a major shortcoming to this approach,
though. Indeed, the same value of x may correspond to several combinations of the
variables z_i. Consider an example where ℓ = 3 and u = 6. We introduce 3 binary
variables z_1, z_2, and z_3, and we associate each combination with a value of x as
represented in Table 25.2.
The values 4 and 5 of x can each be represented in 3 different ways. This artificially
increases the size of the feasible set. In order to avoid that, additional constraints
must be introduced. For instance, we may impose the use of the binary variables in
the order in which they appear. It means that we may set a binary variable z_k to 1
only if the previous variable z_{k−1} is 1. Using the modeling of logical implications
(z_k = 1 implies z_{k−1} = 1), we obtain the constraints

z_k ≤ z_{k−1},  k = 2, ..., u − ℓ.    (25.40)

In our example, these constraints exclude the redundant combinations in
Table 25.2, and there is now a bijection between the feasible combinations of z_k and
the feasible values of x. Constraints (25.40) are called symmetry breaking constraints.
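For the example with ℓ = 3 and u = 6, a brute-force enumeration confirms that the symmetry breaking constraints (25.40) leave exactly one combination per feasible value of x; a minimal sketch:

```python
from itertools import product

l, u = 3, 6  # bounds on the integer variable x

all_combos = list(product((0, 1), repeat=u - l))

# Keep only the combinations satisfying z_k <= z_{k-1}, i.e. (25.40).
feasible = [z for z in all_combos
            if all(z[k] <= z[k - 1] for k in range(1, u - l))]

# Each remaining combination corresponds to a distinct value of x.
values = sorted(l + sum(z) for z in feasible)
```

Of the 2^3 = 8 combinations, only 4 survive, one for each value of x in {3, 4, 5, 6}.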
We conclude this section by mentioning that integer optimization is strongly related
to combinatorial optimization.

x  z_1 z_2 z_3
3   0   0   0
4   0   0   1  *
4   0   1   0  *
5   0   1   1  *
4   1   0   0
5   1   0   1  *
5   1   1   0
6   1   1   1

Table 25.2: Coding an integer x, 3 ≤ x ≤ 6, with binary variables (the combinations
marked with * are excluded by the symmetry breaking constraints (25.40))

Definition 25.5 (Combinatorial optimization). A combinatorial optimization problem
consists in identifying the optimal element of a large finite set.

It is named as such because large finite sets are often generated by listing the
combinations of given elements. For instance, there are 26^5 possible words with 5
letters. Even if integer optimization problems may happen to have an infinite feasible
set, the use of upper bounds on the objective function makes it possible to transform
them into problems with finite sets.

25.2 Classical problems


We describe in this section some classical combinatorial optimization problems and
discuss their formulation as integer linear optimization problems, to illustrate the
strong link between the two types of problems.

25.2.1 The knapsack problem

Example 25.6 (The knapsack problem). Patricia is about to spend several days on
a long hike in the mountains to climb the Bishorn. She is now preparing her knapsack
and thinking about which items to take. The alpine guides strongly recommend
carrying no more than W kg (say, W = 15). Therefore, Patricia has to decide which
items to carry and which items to leave at home. Each item has a different level of
importance. While it is critical to carry water and food, it is less critical to carry a
laptop. For each item i considered by Patricia, she knows its weight w_i ≥ 0 and its
level of importance or utility, u_i ∈ R. The problem that Patricia has to solve consists
in deciding which items to include in her knapsack in order to maximize the
total utility while satisfying the maximum weight constraint.

The name of the knapsack problem comes from Example 25.6. Definition 25.7
provides a more general definition of the problem. Definition 25.8 describes the 0-1
knapsack problem, another version of the problem that forbids multiple selection of
the same item.

Definition 25.7 (The knapsack problem). Consider a set of n items. Each item i is
associated with a value characterizing its utility u_i ∈ R and a weight w_i ≥ 0. The
knapsack problem consists in deciding the number of times that each item must be
selected so that the total weight of the selected items does not exceed an upper bound
W and the total utility is maximized.

Definition 25.8 (The 0-1 knapsack problem). Consider a set of n items. Each
item i is associated with a value u_i ∈ R and a weight w_i ≥ 0. The 0-1 knapsack
problem consists in selecting a subset of the items so that the total weight of the
selected items does not exceed an upper bound W and the total utility is maximized.

This problem can be modeled as an integer linear optimization problem. As
described in Section 1.1, we model the problem in three steps:
1. Decision variables: for each item i that can potentially be carried, we define a
variable x_i ∈ N that represents the number of items of type i that are carried in
the knapsack.
2. Objective function: for each item i, the contribution to the utility of the knapsack
is u_i x_i. Therefore, the total utility of the knapsack as a function of the decision
variables is

∑_{i=1}^{n} u_i x_i,    (25.41)

where n is the total number of items.
3. Constraints: similarly, for each item i, its contribution to the weight is w_i x_i.
Therefore, as the maximum weight is W, the constraint is written as

∑_{i=1}^{n} w_i x_i ≤ W.    (25.42)

Moreover, by definition of the variables, the following constraints must also be
verified:

x_i ∈ N,  i = 1, ..., n.    (25.43)

Note that both the objective function and the constraints are linear functions of the
decision variables.
If you consider the 0-1 version of the problem (Definition 25.8), the variable x_i
is binary and corresponds to a yes/no decision, where x_i = 1 means that item i is
included in the knapsack, while x_i = 0 means that it is not. Constraints (25.43) are
then replaced by

x_i ∈ {0, 1},  i = 1, ..., n.    (25.44)
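Because the feasible set of the 0-1 version is finite, a tiny instance can be solved by enumerating all 2^n subsets; a minimal brute-force sketch on an illustrative instance (the utilities, weights, and capacity below are invented for the example):

```python
from itertools import product

def knapsack_01(utilities, weights, W):
    # Brute-force solution of the 0-1 knapsack problem: enumerate all
    # subsets and keep the best one satisfying the weight constraint.
    best_value, best_subset = 0, ()
    for x in product((0, 1), repeat=len(utilities)):
        weight = sum(wi * xi for wi, xi in zip(weights, x))
        value = sum(ui * xi for ui, xi in zip(utilities, x))
        if weight <= W and value > best_value:
            best_value, best_subset = value, x
    return best_value, best_subset

# Four items, capacity 15.
value, selection = knapsack_01([10, 7, 5, 3], [9, 8, 6, 4], 15)
```

Enumeration is of course only viable for very small n; Section 25.3 discusses why.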

The knapsack problem has many applications and variants. Consider a portfolio
of stocks, where each stock has a price w_i and a potential return on investment u_i.
The question of how to invest a total budget of W in order to maximize the expected
profit is also a "knapsack problem."
Ce document est la propriété exclusive de Kavyaa Kannan (kk392@snu.edu.in) - jeudi 18 avril 2024 à 07h48

25.2.2 Set covering


The set covering problem is illustrated by Example 25.9. Definition 25.10 provides a
more general definition of the problem.
Example 25.9 (Set covering). After the FIFA World Cup, Camille wants to complete
her collection of stickers representing the players of each competing team. She has the
possibility of buying collections of stickers from her schoolmates. In each collection,
there are stickers that she needs, but also stickers that she does not need. However,
schoolmates do not agree to sell stickers individually. The whole collection has to be
purchased. Camille must decide which collections to purchase, in order to complete
her own album, at a minimum price.

Definition 25.10 (Set covering). Consider a set U of elements and a list of n subsets
of U, denoted by S_i, i = 1, ..., n, each associated with a cost c_i, such that

∪_{i=1}^{n} S_i = U.    (25.45)

The set covering problem consists in selecting a sublist S_{i_j}, j = 1, ..., J, such that
their union includes all the elements in U, that is

∪_{j=1}^{J} S_{i_j} = U,    (25.46)

and the total cost ∑_{j=1}^{J} c_{i_j} is minimal.

Note that if condition (25.45), which requires the union of all S_i to be equal to
U, is not verified, the problem is not feasible. Note also that the problem is often
presented in the literature with c_i = 1 for each subset i, so that the covering has to
involve the minimum number of subsets.
In the above example, U is the set of stickers that Camille is missing, and each
subset S_i corresponds to the collection of one of her schoolmates. We denote by
j = 1, ..., m the missing stickers, and by i = 1, ..., n the available collections. For each i,
the price of collection i is c_i. For each i and j, the parameter a_{ij} is equal to 1 if
sticker j belongs to collection i, and to 0 otherwise. These parameters characterize
the subsets S_i.
Decision variables: for each i, we define a binary variable x_i with value 1 if it is
decided to purchase collection i, and 0 otherwise.

Objective function: for each i, the associated cost is c_i x_i, that is, c_i if collection i is
purchased, and 0 otherwise. Therefore, the total cost associated with the decisions
characterized by the vector x is

∑_{i=1}^{n} c_i x_i.    (25.47)

Constraints: we have to guarantee that each missing sticker is available in at least
one purchased collection. For a sticker j and a collection i, the quantity a_{ij} x_i is
equal to 1 if Camille obtains sticker j, as collection i is purchased and sticker j
belongs to collection i. Therefore, the constraint is written as

∑_{i=1}^{n} a_{ij} x_i ≥ 1,  j = 1, ..., m.    (25.48)

Indeed, for each sticker j, at least one of the terms a_{ij} x_i must be 1, so that the sum
over all collections must be at least 1. Moreover, by definition of the variables,
the following constraints must also be verified:

x_i ∈ {0, 1},  i = 1, ..., n.    (25.49)

Putting everything together, the optimization problem is written as

min_{x ∈ N^n} ∑_{i=1}^{n} c_i x_i,

subject to

∑_{i=1}^{n} a_{ij} x_i ≥ 1,  j = 1, ..., m,

x_i ∈ {0, 1},  i = 1, ..., n.
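As with the knapsack problem, small set covering instances can be solved by enumeration; a minimal sketch on an illustrative instance (the universe, collections, and prices below are invented for the example):

```python
from itertools import product

def set_cover(universe, subsets, costs):
    # Brute-force set covering: enumerate all selections of subsets and
    # keep the cheapest one whose union covers the universe.
    best_cost, best_pick = None, None
    for x in product((0, 1), repeat=len(subsets)):
        covered = set().union(*(S for S, xi in zip(subsets, x) if xi))
        cost = sum(ci * xi for ci, xi in zip(costs, x))
        if covered >= universe and (best_cost is None or cost < best_cost):
            best_cost, best_pick = cost, x
    return best_cost, best_pick

U = {1, 2, 3, 4, 5}                           # missing stickers
collections = [{1, 2, 3}, {2, 4}, {3, 4, 5}]  # collections for sale
prices = [5, 3, 4]
cost, picked = set_cover(U, collections, prices)
```

Here the cheapest cover purchases the first and third collections: the second one is cheaper, but buying it never removes the need for the other two.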

25.2.3 The traveling salesman problem

The traveling salesman problem is probably the most famous problem in combinatorial
optimization. Easy to state, it is particularly difficult to solve.

Definition 25.11 (The traveling salesman problem). A salesman must visit n − 1
customers. Starting from home, he has to plan a tour, that is, a sequence of customers
to visit, in order to minimize the travel distance.

The problem is modeled using a network with n nodes, corresponding to the
home of the salesman and the locations of the n − 1 customers. Each pair of nodes is
connected by an arc, and the cost of the arc represents the travel cost (the distance,
or the travel time, for example).

Example 25.12 (The traveling salesman problem: 4 cities). A salesman living in
Lausanne (L) must visit 3 customers during the day: one in Geneva (G), one in Bern
(B), and one in Zürich (Z). Starting from home, he has to plan a tour, that is, a
sequence of customers to visit, in order to minimize the travel distance. We model
the problem with the graph represented in Figure 25.1 (see Chapter 21 for the
definition of a graph).

[Figure 25.1 represents the complete graph on the nodes L, G, B, and Z, with edge
lengths L-G: 64, L-B: 104, L-Z: 228, G-B: 158, G-Z: 279, B-Z: 125.]

Figure 25.1: Traveling salesman problem: 4 cities (distances are in kilometers)

There are several ways to model this problem as an integer optimization problem.
Here is one of them.
Decision variables: for each pair of nodes we define the binary variable xij , which
is 1 if the salesman visits node j just after node i, and 0 otherwise.
Objective function: the total length of the tour must be minimized. For n cities,
it is
        \min \sum_{(i,j) \in A} c_{ij} x_{ij} = \min \sum_{i=1}^{n} \sum_{j \neq i} c_{ij} x_{ij},   (25.50)

which for our example is


64xLG + 104xLB + 228xLZ
+64xGL + 158xGB + 279xGZ
+104xBL + 158xBG + 125xBZ
+228xZL + 279xZG + 125xZB .
Constraints:
• Each city must have exactly one successor in the tour:
        \sum_{j \mid (i,j) \in A} x_{ij} = 1, \qquad \forall i \in N,   (25.51)

that is,
xLG + xLB + xLZ = 1,
xGL + xGB + xGZ = 1,
xBL + xBG + xBZ = 1,

xZL + xZG + xZB = 1.


• Each city must have exactly one predecessor in the tour:
        \sum_{i \mid (i,j) \in A} x_{ij} = 1, \qquad \forall j \in N,   (25.52)

that is,
xGL + xBL + xZL = 1,
xLG + xBG + xZG = 1,
xLB + xGB + xZB = 1,
xLZ + xGZ + xBZ = 1.
Unfortunately, these constraints are not sufficient. Indeed, the optimal solution
of the above problem is xLG = xGL = xBZ = xZB = 1, and all other variables set
to 0. The interpretation is to travel from Lausanne to Geneva and back, and to
travel from Bern to Zürich and back. Instead of one tour, the optimization prob-
lem proposes two subtours that verify the constraints (each city has exactly one
predecessor and one successor) and minimize the distance. Additional constraints
must be included to eliminate these subtours.
One possible idea is to explicitly keep track of the order of the customers along the
tour, starting from home. We introduce new variables representing the position
of each customer in the tour. If customer j is visited after customer i in the tour,
the position of customer j must be strictly larger than the position of customer
i. It is sufficient to impose that constraint for each pair of successive customers,
so that it is verified for all customers in the tour. Denoting yi the position of
customer i, we impose that

xij = 1 =⇒ yj ≥ yi + 1, (25.53)

where i and j are customers, that is any node except home. It is important to
exclude home, because the ordering must end at the last customer, and the above
constraint does not hold for the last leg, from the last customer back to home.
It could have been included for the first leg, from home to the first customer,
but it is not necessary as the objective is to remove subtours that do not include
home. This is actually the target of this constraint. If there is a tour that does
not include home, there is no way to position its nodes. Without a reference node,
it is not possible to decide if node i comes before or after node j in the loop. In
the above example, one subtour includes the arcs ZB and BZ. The constraints
yB ≥ yZ + 1 and yZ ≥ yB + 1 are incompatible, as the first imposes that Bern
comes after Zürich, while the second imposes exactly the opposite.

We now need to transform the condition (25.53) into a constraint for the opti-
mization problem. We apply the technique for optional constraints described in
Section 25.1. It requires finding a version of the constraint that is always valid.
In our case, the value yj − yi is always greater than or equal to 2 − n. Indeed, as the
variables represent the position of the customer in the tour, the largest difference

happens when customer j is visited just after home (yj = 2) and customer i is
visited last (yi = n). Therefore, the technique of optional constraints can be
applied with f(y) = yj − yi + n − 2. Indeed, f(y) is greater than or equal to zero for any
y. The constraint yj ≥ yi + 1 is equivalent to yj − yi + n − 2 ≥ n − 1, that is,
f(y) ≥ n − 1. Therefore, we use a = n − 1 in (25.5) to obtain

yj − yi + n − 2 ≥ xij (n − 1), (25.54)

or, equivalently,
xij (n − 1) + yi − yj ≤ n − 2. (25.55)
Therefore, if xij = 1, we have yj − yi + n − 2 ≥ n − 1, that is yj − yi ≥ 1, which
is the required constraint. If xij = 0, we have yi − yj ≤ n − 2, which is always
verified.
In our example, we must add the constraints

xGB (n − 1) + yG − yB ≤ n − 2,
xBG (n − 1) + yB − yG ≤ n − 2,
xGZ (n − 1) + yG − yZ ≤ n − 2,
xZG (n − 1) + yZ − yG ≤ n − 2,
xBZ (n − 1) + yB − yZ ≤ n − 2,
xZB (n − 1) + yZ − yB ≤ n − 2.

Note that the solution proposed above, where both xBZ and xZB are equal to 1 is
not feasible anymore, as there is no value of yB and yZ such that both constraints
(n − 1) + yB − yZ ≤ n − 2 and (n − 1) + yZ − yB ≤ n − 2 are verified.
Therefore, the optimization problem for the traveling salesman problem with n cities
is written as:
        \min_{x \in \mathbb{Z}^{n(n-1)},\ y \in \mathbb{Z}^{n}} \sum_{i=1}^{n} \sum_{j \neq i} c_{ij} x_{ij}   (25.56)

subject to

        \sum_{j \neq i} x_{ij} = 1, \qquad \forall i = 1, \ldots, n,

        \sum_{i \neq j} x_{ij} = 1, \qquad \forall j = 1, \ldots, n,

        x_{ij}(n-1) + y_i - y_j \le n - 2, \qquad \forall i, j = 2, \ldots, n,\ i \neq j,   (25.57)

        x_{ij} \in \{0, 1\}, \qquad \forall i, j = 1, \ldots, n,\ i \neq j,

        y_i \ge 0, \qquad \forall i = 1, \ldots, n.
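For an instance as small as Example 25.12, the model can be cross-checked by brute force: enumerate the (n − 1)! = 6 possible orderings of the customers and keep the shortest tour. The sketch below (an illustration, not from the book) uses the distances of Figure 25.1.

```python
from itertools import permutations

# Distances (km) from Figure 25.1; the graph is symmetric.
d = {("L", "G"): 64, ("L", "B"): 104, ("L", "Z"): 228,
     ("G", "B"): 158, ("G", "Z"): 279, ("B", "Z"): 125}

def dist(a, b):
    return d.get((a, b)) or d[(b, a)]

best_len, best_tour = float("inf"), None
for order in permutations(["G", "B", "Z"]):
    # Each tour starts and ends at home (Lausanne).
    tour = ("L",) + order + ("L",)
    length = sum(dist(tour[k], tour[k + 1]) for k in range(len(tour) - 1))
    if length < best_len:
        best_len, best_tour = length, tour

print(best_len, best_tour)
```

The optimum, 572 km, is attained by the tour Lausanne, Geneva, Zürich, Bern, Lausanne (or its reverse); factorial enumeration of this kind is of course hopeless beyond a few dozen cities.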

There are many other combinatorial optimization problems that are not described
in this book, such as vehicle routing problems, scheduling problems, bin packing prob-
lems, or facility location problems, to cite the most classical. We refer the reader to
Papadimitriou and Steiglitz (1998) and Pardalos et al. (2013), among other references.

25.3 The curse of dimensionality


We have seen in Section 22.3 that some problems have the property that the integrality
constraints can be safely ignored, as the optimal solution of the problem is guaranteed
to have only integer values. Unfortunately, the family of such problems is rather
restricted, and it does not correspond to the most general case.
The key difficulty for discrete optimization is that there is no optimality condition
for the global optimum of the problem. For all other optimization problems analyzed
so far in this book, the optimality conditions are the starting point for the develop-
ment of algorithms. We do not have this opportunity for discrete optimization.
Example 25.13 (A simple integer optimization problem). Consider the following
integer optimization problem:

        \min_{x \in \mathbb{N}^2} -3x_1 - 13x_2   (25.58)

subject to

        2x_1 + 9x_2 \le 29,
        11x_1 - 8x_2 \le 79.   (25.59)
The feasible set is defined as the intersection between the polyhedron

        \{x \in \mathbb{R}^2 \mid 2x_1 + 9x_2 \le 29,\ 11x_1 - 8x_2 \le 79,\ x \ge 0\}   (25.60)

and the set N2 . The polyhedron is represented in Figure 25.2, together with the
points of N2 with x1 ranging from 0 to 9 and x2 ranging from 0 to 4. The feasible
points are the 24 points of this lattice that are inside the polyhedron. The list of
these points is reported in Table 25.3. The level curves of the objective function are
represented by dotted lines, and the arrow identifies the direction of descent. In order
to identify the optimal solution, we can enumerate the feasible points, calculate the
value of the objective function for each of them, and select the point corresponding to
the smallest value. From the values of the objective function reported in Table 25.3,
the optimal solution is x∗ = (1, 3), corresponding to the value -42.
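The enumeration of Table 25.3 is easy to reproduce programmatically; the following sketch scans a bounding box of lattice points, keeps the feasible ones, and confirms the optimum.

```python
# Enumerate the integer points of Example 25.13 inside a bounding box
# large enough to contain the polyhedron.
feasible = [(x1, x2)
            for x1 in range(10) for x2 in range(4)
            if 2 * x1 + 9 * x2 <= 29 and 11 * x1 - 8 * x2 <= 79]

best = min(feasible, key=lambda x: -3 * x[0] - 13 * x[1])
print(len(feasible), best, -3 * best[0] - 13 * best[1])  # 24 (1, 3) -42
```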

The enumeration method suggested in Example 25.13 is in general not appro-


priate. Remember that we have already proposed an enumeration method in Algo-
rithm 16.1, where the optimal solution of a linear optimization problem is one of
the vertices of the constraint polyhedron. Those vertices are enumerated to identify
the best one. The combinatorially large number of such vertices was the motivation

Figure 25.2: The feasible set and the level curves for Example 25.13

Table 25.3: Enumeration of the feasible solutions of Example 25.13 with the corre-
sponding value of the objective function

x1  x2  c^T x     x1  x2  c^T x     x1  x2  c^T x
0 0 0 2 0 -6 4 2 -38
0 1 -13 2 1 -19 5 0 -15
0 2 -26 2 2 -32 5 1 -28
0 3 -39 3 0 -9 5 2 -41
1 0 -3 3 1 -22 6 0 -18
1 1 -16 3 2 -35 6 1 -31
1 2 -29 4 0 -12 7 0 -21
1 3 -42 4 1 -25 7 1 -34

for abandoning the vertex enumeration method and for developing the simplex al-
gorithm. Integer optimization problems suffer from the same issue, which precludes
applying an enumeration technique.
Consider the knapsack problem introduced in Example 25.6, where all variables
are binary, that is, each item potentially carried is available only once. In order
to solve the problem by enumeration, each configuration of the variables x has to
be analyzed, that is, the corresponding weight \sum_i w_i x_i must be calculated
and, if smaller than the capacity W, the objective function \sum_i u_i x_i must
be recorded. If there are n items to be considered, there are 2^n combinations
of values for x. Each of them needs about 2n floating point operations to
compute the weight and the
objective function. Assume that we are using a processor with 1 Teraflops, that is a
processor able to perform 10^12 floating point operations per second.
• If n = 34, it takes about 1 second to solve the problem by complete enumeration.
• If n = 40, it takes about 1 minute.
• If n = 45, it takes about 1 hour.

• If n = 50, it takes about 1 day.


• If n = 58, it takes about 1 year.
• If n = 69, it takes about 2,583 years, that is, more than the duration of the
Christian Era.
• If n = 78, it takes about 1,500,000 years, that is, about the time that has elapsed
since “homo erectus” appeared on earth.


• If n = 91, it takes about 10^10 years, that is, about the age of the universe.
For all practical purposes, this method is not applicable for problems of size larger
than n = 50.
Now, assume that we have a processor that is 1,000 times more powerful, that is,
a processor able to perform 10^15 floating point operations per second. In that case,
a full enumeration could be applied to problems of size up to n = 59, which would
take about a day to be solved. This is only 9 more variables. For the problem with
n = 69, the time would decrease to about 3 years. If n = 78, it would be about 1,500
years, and for n = 91, about 14 million years.
Clearly, performing a full enumeration is not a viable option, except maybe for
problems of small size. With a careful implementation of the method, as well as the
availability of faster processors, the size of problems that may be solved by enumera-
tion increases a little bit. But the fact that the running time increases exponentially
with the size of the problem does not allow us to solve problems of realistic size. The
huge explosion of the running time as a function of the dimension of the problem is
sometimes referred to as the curse of dimensionality.
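The running times quoted above follow directly from the operation count 2^n · 2n divided by the processor speed; a two-line sketch reproduces the estimates.

```python
def enumeration_time(n, flops=1e12):
    """Seconds needed to enumerate all 2**n knapsack configurations,
    at about 2*n floating point operations per configuration."""
    return 2 ** n * 2 * n / flops

print(enumeration_time(34))                       # about one second
print(enumeration_time(45) / 3600)                # about one hour
print(enumeration_time(58) / (3600 * 24 * 365))   # about one year
```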

25.4 Relaxation
As complete enumeration is not an operational algorithm, other methods have to be
investigated. In particular, it would be convenient to transform the problem into a
continuous optimization problem by ignoring the integrality constraints and using a
relevant algorithm to solve the continuous optimization problem. This problem is
called the relaxation of the integer optimization problem.

Definition 25.14 (Relaxation). Consider the mixed integer optimization problem


P:
        \min_{x \in \mathbb{R}^{n_x},\ y \in \mathbb{Z}^{n_y},\ z \in \mathbb{N}^{n_z}} f(x, y, z)   (25.61)

subject to

        g(x, y, z) \le 0
        h(x, y, z) = 0
        y \in \mathbb{Z}^{n_y}   (25.62)
        z \in \{0, 1\}^{n_z},

where f : \mathbb{R}^{n_x} \times \mathbb{R}^{n_y} \times \mathbb{R}^{n_z} \to \mathbb{R},
g : \mathbb{R}^{n_x} \times \mathbb{R}^{n_y} \times \mathbb{R}^{n_z} \to \mathbb{R}^m, and
h : \mathbb{R}^{n_x} \times \mathbb{R}^{n_y} \times \mathbb{R}^{n_z} \to \mathbb{R}^p.

The optimization problem

        \min_{x \in \mathbb{R}^{n_x},\ y \in \mathbb{R}^{n_y},\ z \in \mathbb{R}^{n_z}} f(x, y, z)   (25.63)

subject to

        g(x, y, z) \le 0
        h(x, y, z) = 0   (25.64)
        0 \le z \le 1

is called the relaxation of P and denoted by R(P).

By definition, the feasible set of the original problem is included in the feasible
set of its relaxation. In other words, if (x, y, z) is feasible for P, it is also feasible for
R(P). Therefore, the optimal value of R(P) is a lower bound on the optimal value of
P.

Theorem 25.15 (Lower bound from the relaxation). Let P be a mixed integer
optimization problem and R(P) its relaxation. Denote (x∗ , y∗ , z∗ ) the optimal so-
lution of problem P and (x∗R , y∗R , z∗R ) the optimal solution of problem R(P). Then,

f(x∗R , y∗R , z∗R ) ≤ f(x∗ , y∗ , z∗ ). (25.65)

Proof. It is an immediate consequence of the feasibility of (x∗ , y∗ , z∗ ) for
problem R(P).
The above result is useful only if the global minimum of the relaxation can be
identified. It is the case when the objective function and the feasible set are convex.
And in particular, it is the case when the objective function and the constraints are
linear.

Definition 25.16 (Linear relaxation). Consider the mixed integer linear optimization
problem P:
        \min_{x \in \mathbb{R}^{n_x},\ y \in \mathbb{N}^{n_y},\ z \in \mathbb{N}^{n_z}} c_x^T x + c_y^T y + c_z^T z   (25.66)

subject to

        A_x x + A_y y + A_z z = b
        x, y, z \ge 0
        y \in \mathbb{N}^{n_y}   (25.67)
        z \in \{0, 1\}^{n_z},

where c_x \in \mathbb{R}^{n_x}, c_y \in \mathbb{R}^{n_y}, c_z \in \mathbb{R}^{n_z},
A_x \in \mathbb{R}^{m \times n_x}, A_y \in \mathbb{R}^{m \times n_y}, A_z \in \mathbb{R}^{m \times n_z},
and b \in \mathbb{R}^m. The linear optimization problem

        \min_{x \in \mathbb{R}^{n_x},\ y \in \mathbb{R}^{n_y},\ z \in \mathbb{R}^{n_z}} c_x^T x + c_y^T y + c_z^T z   (25.68)

subject to

        A_x x + A_y y + A_z z = b
        x, y, z \ge 0   (25.69)
        z \le 1

is called the relaxation of P and denoted by R(P).

The relaxation is a linear optimization problem and can be solved using any
appropriate algorithm, such as the simplex method. Except if the matrix of the
constraints is totally unimodular (see Definition 22.3), the optimal solution of the
relaxation may not provide an integer solution.
Unfortunately, it is not sufficient to round up or round down the fractional solution
of the relaxation in order to identify the optimal solution of the integer optimization
problem. Consider again Example 25.13. The relaxation of the problem is

        \min_{x \in \mathbb{R}^2} -3x_1 - 13x_2   (25.70)

subject to

        2x_1 + 9x_2 \le 29
        11x_1 - 8x_2 \le 79   (25.71)
        x_1, x_2 \ge 0.
The optimal solution of the relaxation is (8.2, 1.4), as illustrated in Figure 25.3, where
the level curves of the objective function are represented by dotted lines, and the arrow
identifies the direction of descent.


Figure 25.3: The feasible set of the relaxation and the level curves for Example 25.13

There are two ways to round a real number to an integer: round down or round
up. As there are two real numbers to round, there are four ways to round the optimal
solution of the relaxation: (8, 1), (8, 2), (9, 1), and (9, 2) as depicted in Figure 25.4.


Figure 25.4: Example 25.13: rounding the optimal solution of the relaxation

In this example, none of these integer solutions obtained by rounding the fractional
solution of the relaxation in one way or the other is feasible. Moreover, even if one of
them was, it appears that the fractional solution of the relaxation does not lie in the
vicinity of the optimal solution of the integer optimization problem, which is (1, 3).
The problem relaxation plays an important role in the algorithms to solve integer
optimization problems. But as it appears from the example above, there is more to it
than simply rounding the optimal solution. In Chapter 26, we present methods that
use the bound provided by the relaxation (see Theorem 25.15) to avoid an explicit
complete enumeration of the feasible space.
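The numbers in this example are easy to verify. The sketch below (illustrative) recovers the relaxed optimum (8.2, 1.4) by solving the 2 × 2 system of the two active constraints with Cramer's rule, and then checks that all four rounded points violate a constraint.

```python
# Active constraints at the relaxed optimum:
#   2 x1 + 9 x2 = 29,   11 x1 - 8 x2 = 79.
det = 2 * (-8) - 9 * 11                  # determinant: -115
x1 = (29 * (-8) - 9 * 79) / det          # 8.2
x2 = (2 * 79 - 29 * 11) / det            # 1.4

def feasible(p):
    return (2 * p[0] + 9 * p[1] <= 29
            and 11 * p[0] - 8 * p[1] <= 79
            and min(p) >= 0)

rounded = [(8, 1), (8, 2), (9, 1), (9, 2)]
print((x1, x2), [feasible(p) for p in rounded])  # all four are infeasible
```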

25.5 Exercises
Exercise 25.1. In order to participate in a cooking competition called “Top Chef,”
Benoît must build a team of excellent chefs. He has access to a pool of candidates
(see Table 25.4). For each of them, he knows the age, the restaurant where they come
from, and their cooking specialties (some candidates have only one, some two). He
also knows the number of points that each of them has been able to collect in previous
similar competitions. And he wants to build a team with the maximum number of
such points. But several constraints apply.
1. The team must contain at least 3 chefs who are skilled in preparing appetizers.
2. The team must contain at least 4 chefs who are skilled in preparing fish.
3. The team must contain at least 4 chefs who are skilled in preparing meat.
4. The team must contain at least 3 chefs who are skilled in preparing dessert.
5. To promote young people, the team must contain at least 2 chefs who are 20 or
less.
6. Yoda and Obi-Wan cannot stand each other, and it is not possible to have both
of them in the team.

Table 25.4: Potential candidates for the Top Chef competition for Exercise 25.1

Name Age Restaurant Points First specialty Second specialty


Anakin 49 Lausanne 124 Fish Meat
Chewbacca 39 Noirmont 105 Fish Meat

Dooku 49 Crissier 118 Meat Dessert


Gial 33 Mézières 134 Dessert —
Han 35 Crissier 160 Appetizer —
Jabba 47 Crissier 184 Fish —
Jubnuk 28 Neuchatel 101 Fish —
Lando 17 Lucens 116 Dessert —
Leia 18 Sierre 187 Appetizer Fish
Lorth 41 Baulmes 132 Dessert —
Luke 37 Vufflens 181 Appetizer —
Obi-Wan 22 Vufflens 199 Fish Meat
Padme 38 Cossonay 192 Appetizer —
Qui-Gon 19 Cossonay 136 Fish —
Sebulba 43 Valeyres 123 Meat —
Sheev 21 Crissier 127 Meat —
Teebo 32 Sierre 132 Meat —
Watto 20 Sierre 131 Meat —
Yoda 39 Noirmont 102 Appetizer Fish
Zuckuss 42 Vufflens 123 Dessert —

7. Han and Sheev have so much experience in working together that if one of the
two is included in the team, the other one must be too.
8. For the sake of fairness, not more than 3 chefs from the same restaurant should
be hired in the team.
Write an integer optimization problem to help Benoît build his team.

Exercise 25.2 (Scheduling). The journal Millenium needs to schedule the staff for
the printing workshop for the five days of the week. During each day, there are eight
one-hour time slots. Four employees are available for the tasks. Each employee has
reported his or her preference for each time slot and each day on a scale from 0 to 10,
where 10 corresponds to the highest preference and 0 corresponds to unavailability
(see Table 25.5). The following constraints must be verified.
1. Each of the 40 slots must be covered by exactly one employee.
2. An employee cannot be assigned to a time slot if she/he is not available.
3. Every person must take a lunch break either between 12:00 and 13:00, or between
13:00 and 14:00.

4. Because of the noisy work environment, every person can work only two consec-
utive time slots. A break of at least one hour must be taken after that.
5. No one can work more than 20 hours per week.
Write an integer optimization problem to help Millenium schedule the workshop
employees in order to maximize their satisfaction according to the stated preferences,


while verifying the constraints.

Table 25.5: Preference of each worker for each time slot (Exercise 25.2)

Name Slot Mo Tu We Th Fr
Mikael 9–10 10 10 10 10 10
Mikael 10–11 9 9 9 9 9
Mikael 11–12 8 8 8 8 8
Mikael 12–13 1 1 1 1 1
Mikael 13–14 1 1 1 1 1
Mikael 14–15 1 1 1 1 1
Mikael 15–16 1 1 1 1 1
Mikael 16–17 1 1 1 1 1
Lisbeth 9–10 10 9 8 7 6
Lisbeth 10–11 10 9 8 7 6
Lisbeth 11–12 10 9 8 7 6
Lisbeth 12–13 10 3 3 3 3
Lisbeth 13–14 1 1 1 1 1
Lisbeth 14–15 1 2 3 4 5
Lisbeth 15–16 1 2 3 4 5
Lisbeth 16–17 1 2 3 4 5
Harriet 9–10 10 10 10 10 10
Harriet 10–11 9 9 9 9 9
Harriet 11–12 8 8 8 8 8
Harriet 12–13 0 0 0 0 0
Harriet 13–14 1 1 1 1 1
Harriet 14–15 1 1 1 1 1
Harriet 15–16 1 1 1 1 1
Harriet 16–17 1 1 1 1 1
Alexander 9–10 10 9 8 7 6
Alexander 10–11 10 9 8 7 6
Alexander 11–12 10 9 8 7 6
Alexander 12–13 10 3 3 3 3
Alexander 13–14 1 1 1 1 1
Alexander 14–15 1 2 3 4 5
Alexander 15–16 1 2 3 4 5
Alexander 16–17 1 2 3 4 5

Exercise 25.3 (Graph coloring). Consider the map of Belgium (Figure 25.5). We
want to color each province of the map so that two provinces with a common border
have different colors. What is the minimum number of colors that must be used?
Write an integer optimization problem to answer this question and to provide a valid
coloring.

Figure 25.5: The provinces of Belgium (for Exercise 25.3)

Exercise 25.4 (Bin packing). You have 10 objects of different heights (in cm): 80,
70, 60, 50, 40, 40, 20, 20, 10, and 10. As you are moving, you need to stack them
in boxes 1 meter high. Assuming that the cross sections of the objects do not allow
storing them side by side, but only stacking them one on top of another, how do
you arrange the objects
into the boxes to use a minimum number of them? Write an integer optimization
problem to answer this question.

Exercise 25.5 (Vehicle routing). The Nespresso factory in Orbe must deliver 18,000
capsules to Lausanne, 26,000 to Neuchatel, 11,000 to Fribourg, 30,000 to Berne and
21,000 to Sierre. The travel time between each city is reported in Table 25.6. The
company has vehicles that can transport at most Q capsules each. How does the
company have to organize the delivery of the capsules to satisfy the demand of each
city, while minimizing the total time traveled? Write an integer optimization problem
to answer this question.

Table 25.6: Travel time between each pair of cities for Exercise 25.5.

Orbe Lausanne Neuchatel Fribourg Berne Sierre


Orbe 28 39 50 57 83
Lausanne 52 49 74 72

Neuchatel 48 48 111
Fribourg 35 85
Bern 105

Chapter 26

Exact methods for discrete optimization

Contents

26.1 Branch and bound . . . . . . . . . . . . . . . . . . . . . . . 626


26.2 Cutting planes . . . . . . . . . . . . . . . . . . . . . . . . . 637
26.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644
26.4 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645

In the context of discrete optimization, the absence of optimality conditions
complicates the design of algorithms. Even proving that a given solution is optimal is
difficult. The enumeration of feasible solutions would provide such a proof, but is
often not feasible due to the curse of dimensionality discussed earlier.
The exact methods presented in this chapter guarantee that an optimal solution
is found if the algorithm terminates in a reasonable time. We present here two such
methods. The first performs an implicit enumeration of the feasible set, by ruling
out large sets of feasible points using mathematical properties of the optimization
problem. This is achieved by partitioning the feasible set into smaller subsets, using
a so-called “branching” strategy, and by using bounds to identify subsets that are
guaranteed not to contain an optimal solution. This “branch and bound” method is
presented in Section 26.1. The second method modifies the formulation of the problem
so that one of its optimal solutions coincides with the solution of its relaxation. It can
therefore be identified using an algorithm for continuous optimization. The cutting
planes method presented in Section 26.2 consists in cutting the constraint polyhedron
using appropriate hyperplanes.

26.1 Branch and bound


Consider the combinatorial optimization problem P defined as

        \min_x f(x)   (26.1)

subject to

        x \in F,   (26.2)

where F represents the feasible set. The idea is to partition it into smaller subsets:

F = F1 ∪ . . . ∪ FK .

In the context of integer optimization, a common branching strategy consists in
dividing the feasible set into two parts, by selecting one integer variable xi and a threshold
integer value c for this variable. The feasible set is then partitioned into a subset con-
taining all feasible solutions such that xi ≤ c, and another one containing all feasible
solutions such that xi ≥ c + 1.

Born on February 1, 1928, in Boston, Massachusetts, USA, John


D. C. Little was the first American to obtain a Ph.D. in Opera-
tions Research (OR), under the supervision of Philip M. Morse
(MIT, 1955). Among his many contributions to OR, Little pro-
vided the first proof of what is now known as Little’s law in
queueing theory. It states that the time-average number of cus-
tomers in the system L is equal to the average arrival rate of
customers accepted into the system λ multiplied by the average
time that each spends in the system W: L = λW. It is so famous
that funny T-shirts were printed for an OR conference with the following statement:
“It may be Little, but It’s the Law.” With his co-authors Murty, Sweeney, and Karel
(Little et al., 1963), he introduced the name “branch and bound” in the OR literature.
He has been Institute Professor at the MIT Sloan School of Management since 1989.
Figure 26.1: John D. C. Little

Call Pk the optimization problem associated with the subset Fk , that is

        \min_x f(x)   (26.3)

subject to

        x \in F_k,   (26.4)

and call x∗k a (global) optimal solution of Pk . As Fk ⊆ F , x∗k is feasible for P. The
key idea is that each optimal solution of P is one of the optimal solutions x∗k .

Theorem 26.1 (Optimal solution of a partitioned problem). Consider the optimiza-


tion problem P defined by (26.1)–(26.2), and consider a partition F = F1 ∪. . .∪FK
of the feasible set into K subsets. For each k = 1, . . . , K, let x∗k be an optimal
solution of the optimization problem Pk defined as minx f(x) subject to x ∈ Fk .

Let i be such that


f(x∗i ) ≤ f(x∗k ), k = 1, . . . , K. (26.5)
Then, x∗i is an optimal solution of the optimization problem P.

Proof. Let y∗ be an optimal solution of P, meaning that f(y∗ ) ≤ f(x), ∀x ∈ F , and


in particular
f(y∗ ) ≤ f(x∗i ). (26.6)

As we consider a partition, y∗ belongs to one of the subsets, say Fk . As x∗k is an
optimal solution of problem Pk , we have f(x∗k ) ≤ f(x), for each x ∈ Fk . In particular

f(x∗k ) ≤ f(y∗ ). (26.7)

Combining (26.5), (26.6), and (26.7), we have

f(y∗ ) ≤ f(x∗i ) ≤ f(x∗k ) ≤ f(y∗ ).

Consequently, f(y∗ ) = f(x∗i ) and x∗i is indeed an optimal solution of the problem.

Corollary 26.2 (Optimal solution of a partitioned problem). Consider the op-


timization problem P defined by (26.1)–(26.2), and consider a partition F =
F1 ∪ . . . ∪ Fm ∪ Fm+1 ∪ . . . ∪ FK of the feasible set into K subsets. For each
k = 1, . . . , m, let x∗k be an optimal solution of the optimization problem Pk de-
fined as minx f(x) subject to x ∈ Fk . For each k = m + 1, . . . , K let ℓ(Pk ) be a
lower bound on Pk , that is

ℓ(Pk ) ≤ f(xk ) ∀xk ∈ Fk . (26.8)

Let i be such that


f(x∗i ) ≤ f(x∗k ), k = 1, . . . , m, (26.9)
and
f(x∗i ) ≤ ℓ(Pk ), k = m + 1, . . . , K. (26.10)
Then, x∗i is an optimal solution of the optimization problem P.

Proof. As ℓ(Pk ) ≤ f(x∗k ) for each k, Theorem 26.1 applies.


Corollary 26.2 is the motivation for the implicit enumeration. Indeed, it is not
necessary to solve each subproblem in order to find an optimal solution of P. For
several problems, the availability of lower bounds is sufficient to exclude them as
candidates to produce an optimal solution. In practice, it is much easier to calculate
lower bounds than to solve the subproblem to optimality.

The fact that problem Pk is simpler than problem P does not mean that it is
simple. To solve Pk , it may again be necessary to partition its feasible set into smaller
subsets. And they can be partitioned themselves, if needed. This decomposition can
be depicted as a tree, as represented in Figure 26.2. The “root” of the tree corresponds
to the original problem P. Each subproblem corresponds to a “branch,” which can be
divided into other branches. Clearly, the number of nodes in this tree, that is, the
number of subproblems, increases rather quickly with the size of the problem, and
the curse of dimensionality is again in the way. To deal with this, we need to avoid
constructing all possible branches of the tree.

                          P

       P1          P2          ...          PK

             P21        P22        P23        ...

                   P231       P232       P233       ...

Figure 26.2: Decomposition of the optimization problem

Consider the optimization problem P defined by (26.1)–(26.2), and suppose that


we have a feasible solution x0 ∈ F . Using a branching strategy, the problem is
partitioned into K problems P1 , . . . , PK . Consider one subproblem Pk , and suppose
that we can obtain a lower bound ℓ(Pk ) on the optimal value, that is, we have ℓ(Pk )
such that ℓ(Pk ) ≤ f(x), ∀x ∈ Fk . Whatever the optimal value of Pk is, it cannot be
better than ℓ(Pk ). Now, if it happens that f(x0 ) ≤ ℓ(Pk ), it means that the optimal
value of Pk , whatever it is, is not better than f(x0 ). Consequently, there is no need
to solve Pk . As Pk may be a difficult problem to solve, discarding it may save a lot
of effort. This is the main idea of the branch and bound algorithm, described as
Algorithm 26.1.

Note also that a branching strategy may generate empty subsets Fk . This is why
it is necessary to verify that the problem is feasible (step 17) before trying to solve
it. Finally, the exact procedure to calculate a good lower bound (step 20) is problem
dependent.

Algorithm 26.1: Branch and bound


1 Objective
2 Find a global minimum of the problem P: minx f(x) subject to x ∈ F .


3 Input
4 The objective function f : Rn → R.
5 The feasible set F .
6 If available, an initial feasible solution x0 ∈ F .
7 Output
8 A global minimum x∗ .
9 Initialization
10 P := {P}.
11 if x0 is available then
12 f∗ := f(x0 ), x∗ := x0
13 else
14 f∗ := +∞

15 Repeat
16 Select a problem Pk in P.
17 if Pk is infeasible then
18 Discard Pk from P
19 else
20 Calculate a lower bound ℓ(Pk )
21 if f∗ > ℓ(Pk ) then
22 if Pk is easy to solve then
23 Calculate x∗k a global optimal solution of Pk
24 if f(x∗k ) < f∗ then
25 x∗ := x∗k , f∗ := f(x∗k )
26 else
27 Create a list of subproblems {Pk1 , . . . , PkK }
28 P := P ∪ {Pk1 , . . . , PkK }

29 Discard from P each Pi such that f∗ ≤ ℓ(Pi ).


30 Until P = ∅.
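As an illustration (not taken from the book), Algorithm 26.1 can be specialized to the binary knapsack problem of Example 25.6. Since the knapsack maximizes utility, the relaxation provides an upper bound at each node: the remaining capacity is filled greedily by decreasing utility/weight ratio, the last item taken fractionally. A node is pruned when this bound does not exceed the best value found so far. The instance data below are invented.

```python
def knapsack_bb(utilities, weights, capacity):
    """Branch and bound for the binary knapsack problem (maximization)."""
    n = len(utilities)
    # Process items by decreasing utility/weight ratio: this makes the
    # greedy LP-relaxation bound easy to compute at every node.
    order = sorted(range(n), key=lambda i: utilities[i] / weights[i], reverse=True)
    u = [utilities[i] for i in order]
    w = [weights[i] for i in order]
    best = 0

    def bound(k, value, room):
        # Upper bound: fill the remaining room greedily with items k..n-1,
        # the last one fractionally (the LP relaxation of the subproblem).
        for i in range(k, n):
            if w[i] <= room:
                room -= w[i]
                value += u[i]
            else:
                return value + u[i] * room / w[i]
        return value

    def branch(k, value, room):
        nonlocal best
        best = max(best, value)            # every node is a feasible solution
        if k == n or bound(k, value, room) <= best:
            return                         # leaf, or pruned by the bound
        if w[k] <= room:
            branch(k + 1, value + u[k], room - w[k])   # branch: take item k
        branch(k + 1, value, room)                     # branch: skip item k

    branch(0, 0, capacity)
    return best

print(knapsack_bb([10, 10, 12, 18], [2, 4, 6, 9], 15))  # 38
```

On this instance the optimum is 38 (the items of weights 2, 4, and 9); the bound discards whole subtrees without enumerating them.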

Example 26.3 (Tasks assignment for Geppetto). Geppetto has hired four workers
with skills in carpentry and finishing to take care of the production of his toys (trains
and soldiers, see Section 1.1.7). Each worker is able to work on any task, with the
same efficiency. The number of hours that each worker needs to perform each task is
reported in the following table.

                              Tasks
                 Carpentry   Finishing   Carpentry   Finishing
Workers          trains      trains      soldiers    soldiers
Pinocchio            9           2           7           8
Jiminy               6           4           3           7
Lampwick             5           8           1           8
Figaro               7           6           9           4
Geppetto wants to assign each task to a worker, in such a way that the total time is
minimized.

We illustrate Algorithm 26.1 on Example 26.3. In this case, a lower bound on the
optimal value can easily be derived from the data:
• the minimum number of hours that Pinocchio would work is 2,
• the minimum number of hours that Jiminy would work is 3,
• the minimum number of hours that Lampwick would work is 1,
• the minimum number of hours that Figaro would work is 4.
Consequently, the total number of hours cannot be lower than 2 + 3 + 1 + 4 = 10. As
an initial feasible solution, we consider an arbitrary assignment:
Worker       Task                  Time
Pinocchio    Carpentry trains        9
Jiminy       Finishing trains        4
Lampwick     Carpentry soldiers      1
Figaro       Finishing soldiers      4
Total                               18

We partition the set of feasible solutions according to the task assigned to Pinocchio.
P1 consists in assuming that Pinocchio takes care of the carpentry for trains, and
in assigning the three other tasks to the three other workers. Problems P2 , P3 , and
P4 are defined similarly, where Pinocchio is assigned to the finishing of trains, the
carpentry of soldiers, and the finishing of soldiers, respectively.
A lower bound is derived for each subproblem, in the same way as above:
• the minimum number of hours that Jiminy would work is 3,
• the minimum number of hours that Lampwick would work is 1,
• the minimum number of hours that Figaro would work is 4.
For the three of them, the total time cannot be lower than 3 + 1 + 4 = 8 hours.
• For P1 , Pinocchio is working 9 hours so that ℓ(P1 ) = 9 + 8 = 17.
• For P2 , Pinocchio is working 2 hours so that ℓ(P2 ) = 2 + 8 = 10.
• For P3 , Pinocchio is working 7 hours so that ℓ(P3 ) = 7 + 8 = 15.
• For P4 , Pinocchio is working 8 hours so that ℓ(P4 ) = 8 + 8 = 16.
P[10]
P1 [17]   P2 [10]   P3 [15]   P4 [16]

Figure 26.3: First branching for Example 26.3, with bounds

The branching and the calculation of the bounds are summarized in Figure 26.3, where
the number in square brackets is the bound for the corresponding problem.
The initial feasible solution corresponds to a total of 18 hours. Therefore, no
subproblem can be discarded, as each lower bound is better than this value.
We now solve one of the subproblems. We select P2 , as it has the smallest lower
bound. It corresponds to the decision that Pinocchio is assigned to finishing trains.
We branch now based on the assignment of Jiminy and create the following problems:
• P21 , where Jiminy is assigned to carpentry for trains,
• P22 , where Jiminy is assigned to finishing for trains,
• P23 , where Jiminy is assigned to carpentry for soldiers,
• P24 , where Jiminy is assigned to finishing for soldiers.
We immediately note that problem P22 is not feasible, as Pinocchio and Jiminy are
not allowed to perform the same task. It is therefore discarded. A lower bound is
derived for each remaining subproblem:
• the minimum number of hours that Lampwick would work is 1,
• the minimum number of hours that Figaro would work is 4.
For the two of them, the total time cannot be lower than 1 + 4 = 5 hours.
• For P21 , Jiminy is working 6 hours so that ℓ(P21 ) = 2 + 6 + 5 = 13.
• For P23 , Jiminy is working 3 hours so that ℓ(P23 ) = 2 + 3 + 5 = 10.
• For P24 , Jiminy is working 7 hours so that ℓ(P24 ) = 2 + 7 + 5 = 14.
The branching and the calculation of the bounds are summarized in Figure 26.4.
Now we select problem P23 that corresponds to the smallest lower bound. As
Pinocchio and Jiminy have already been assigned for a total of 2 + 3 = 5 hours, the
problem can be solved by complete enumeration.
• Lampwick is assigned to carpentry for trains: 5 + 5 + 4 = 14, or
• Lampwick is assigned to finishing for soldiers: 5 + 8 + 7 = 20.
The optimal solution of P23 assigns Pinocchio to finishing for trains, Jiminy to carpentry for soldiers, Lampwick to carpentry for trains, and Figaro to finishing for soldiers.
The total number of hours is 14. It is better than the initial feasible solution that has
a value of 18. Therefore, it becomes the candidate to be the optimal solution. We
write x∗ := [P(FT ), J(CS), L(CT ), F(FS)], and f∗ := 14.
P[10]
P1 [17]   P2 [10]   P3 [15]   P4 [16]
P21 [13]   P23 [10]   P24 [14]

Figure 26.4: Second branching for Example 26.3, with bounds

We can now remove all problems with a lower bound equal to or higher than f∗ = 14,
discarding P1 , P3 , P4 , and P24 . As P1 , P3 , and P4 are discarded, we know that
each optimal solution of P is an optimal solution of P2 . In particular, we know that
Pinocchio is assigned to the finishing of trains in any optimal solution, as each optimal
solution of problem P2 is also an optimal solution of problem P. The current status
of the algorithm is illustrated in Figure 26.5, where the round shape corresponds to
a solved problem, and the value in parentheses is its optimal value.

P[10]

P1 [17] P2 [10] P3 [15] P4 [16]

P21 [13] P23 (14) P24 [14]

Figure 26.5: Branch & bound for Example 26.3: one subproblem is solved

The next problem to solve is problem P21 . As Pinocchio and Jiminy have already
been assigned for a total of 2 + 6 = 8 hours, the problem can be solved by complete
enumeration.
• Lampwick is assigned to carpentry for soldiers: 8 + 1 + 4 = 13, or
• Lampwick is assigned to finishing for soldiers: 8 + 8 + 9 = 25.
The optimal solution to P21 is therefore x∗21 = [P(FT ), J(CT ), L(CS), F(FS)]. As its
value is lower than the current value of f∗ , it becomes the new candidate for the
optimal solution: x∗ := x∗21 and f∗ := 13. All the subproblems of problem P2 have been either discarded or solved to optimality. Therefore, x∗21 is also an optimal solution of P2 , that is x∗2 = x∗21 . The same reasoning applies to the original problem P. Therefore, we have found an optimal solution of the problem. Note that even though we did not enumerate all possible combinations of the assignments, we guarantee the optimality of the solution. The final status of the algorithm is illustrated by Figure 26.6.

P(13)
P1 [17]   P2 (13)   P3 [15]   P4 [16]
P21 (13)   P23 (14)   P24 [14]

Figure 26.6: Branch & bound for Example 26.3: final tree with all subproblems solved or discarded
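The run just described can be reproduced in a few lines of code. The sketch below is a minimal depth-first implementation of Algorithm 26.1 for this assignment problem, using the same lower bound (hours already committed, plus each unassigned worker's cheapest task); function and variable names are illustrative, not from the book.

```python
# Hours per (worker, task). Rows: Pinocchio, Jiminy, Lampwick, Figaro.
# Columns: carpentry trains, finishing trains, carpentry soldiers, finishing soldiers.
HOURS = [
    [9, 2, 7, 8],
    [6, 4, 3, 7],
    [5, 8, 1, 8],
    [7, 6, 9, 4],
]

def assignment_branch_and_bound(hours):
    n = len(hours)
    best_cost, best_assign = float("inf"), None
    # Each node is a partial solution: the tasks already given to workers 0..k-1.
    nodes = [([], 0)]
    while nodes:
        assign, cost = nodes.pop()
        k = len(assign)
        # Lower bound: committed hours plus each remaining worker's cheapest task.
        bound = cost + sum(min(hours[w]) for w in range(k, n))
        if bound >= best_cost:
            continue  # this subproblem cannot beat the incumbent: discard it
        if k == n:
            best_cost, best_assign = cost, assign  # new incumbent
            continue
        # Branch on the task given to worker k (one subproblem per free task).
        for task in range(n):
            if task not in assign:
                nodes.append((assign + [task], cost + hours[k][task]))
    return best_cost, best_assign

cost, assign = assignment_branch_and_bound(HOURS)
print(cost, assign)  # 13 [1, 0, 2, 3]: P->FT, J->CT, L->CS, F->FS
```

The returned assignment matches the one obtained by hand above: Pinocchio does the finishing of trains, Jiminy the carpentry of trains, Lampwick the carpentry of soldiers, and Figaro the finishing of soldiers, for a total of 13 hours.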

We now discuss Algorithm 26.1 in the specific case of an integer optimization problem P. For the calculation of bounds, Theorem 25.15 suggests using the relaxation
R(Pk ) (see Definition 25.14) of the subproblem. Also, if an optimal solution x∗R of
the relaxation happens to be integer, it is also an optimal solution of the subproblem
and, consequently, a feasible solution of P, candidate to be an optimal solution. If it
is not integer, we use the following branching mechanism:
• select one index i such that (x∗R )i is not integer,
• define an integer threshold c as the largest integer strictly lower than (x∗R )i , that
is c = ⌊(x∗R )i ⌋, obtained by rounding down the value of (x∗R )i ,
• partition the feasible set into a subset containing all feasible solutions such that
xi ≤ c = ⌊(x∗R )i ⌋, and another one containing all feasible solutions such that
xi ≥ c + 1. Note that, in this case, c + 1 is the smallest integer strictly larger than
(x∗R )i , that is,
c + 1 = ⌈(x∗R )i ⌉,

obtained by rounding up the value of (x∗R )i .


For instance, if (x∗R )i = 3.4, the two constraints are xi ≤ 3 and xi ≥ 4. An interesting
property of this branching scheme is that x∗R no longer belongs to the feasible set of
any subproblem, as it violates both xi ≤ c and xi ≥ c + 1. Therefore, we have the
guarantee that future fractional solutions generated by the algorithm will be different
from x∗R .
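With the standard floor and ceiling functions, the two branching thresholds can be computed as follows (a minimal sketch with an illustrative helper name):

```python
import math

def branching_thresholds(x):
    """Thresholds of the two branches created from a fractional value x:
    one branch imposes x_i <= floor(x), the other x_i >= floor(x) + 1."""
    c = math.floor(x)
    return c, c + 1

print(branching_thresholds(3.4))  # (3, 4): branches x_i <= 3 and x_i >= 4
```

Since x is fractional at a branching step, floor(x) + 1 coincides with the ceiling of x, so the two branches together cover every integer value while excluding x itself.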

Using these ingredients, the branch and bound algorithm for integer optimization
is described in Algorithm 26.2.

Algorithm 26.2: Branch and bound for integer optimization



1 Objective
2 Find a global minimum of the problem P0 : minx f(x) subject to g(x) ≤ 0,
h(x) = 0, x ∈ Zn .
3 Input
4 The functions f : Rn → R, g : Rn → Rm , h : Rn → Rp .
5 Output
6 A global minimum x∗ .
7 Initialization
8 P := {P0 }.
9 f∗ := +∞.
10 Repeat
11 Select a problem Pk in P.
12 if Pk is infeasible then
13 discard Pk from P
14 else
15 Calculate the global minimum x∗R of the relaxation R(Pk )
16 ℓ(Pk ) := f(x∗R )
17 if f∗ > ℓ(Pk ) then
18 if x∗R is integer then
19 if f(x∗R ) < f∗ then
20 x∗ := x∗R , f∗ := f(x∗R )
21 else
22 Select i such that (x∗R )i is not integer
23 Create subproblem Pkℓ by adding the constraint xi ≤ ⌊(x∗R )i ⌋ to Pk
24 Create subproblem Pkr by adding the constraint xi ≥ ⌈(x∗R )i ⌉ to Pk
25 P := P ∪ {Pkℓ , Pkr } \ Pk

26 Discard from P each Pi such that f∗ ≤ ℓ(Pi ).


27 Until P = ∅.

Example 26.4 (Simple integer linear optimization problem). Define problem P0 as

min x1 − 2x2    (26.11)

subject to

−4x1 + 6x2 ≤ 5
x1 + x2 ≤ 5
x1 , x2 ≥ 0
x1 , x2 ∈ N.
The feasible set of P0 is represented in Figure 26.7(a).

Figure 26.7: Example 26.4 and its relaxation. (a) Feasible set of P0 ; (b) feasible set and level curves of R(P0 ).

Example inspired by Bertsimas and Weismantel (2005)

We now illustrate Algorithm 26.2 on Example 26.4. Call the original problem P0
and initialize f∗ = +∞ and P := {P0 }. The optimal solution of the relaxation R(P0 )
is x∗0 = (2.5, 2.5), with value f(x∗0 ) = −2.5. The feasible set of the relaxation R(P0 ),
some level curves of the objective function, and the optimal solution are represented
in Figure 26.7(b). As the optimal solution is not integer, we decide to branch on x2 .
In the first subproblem P1 = P0ℓ , we include the constraint x2 ≤ ⌊2.5⌋ = 2. In the
second one P2 = P0r , we include the constraint x2 ≥ ⌈2.5⌉ = 3.

We now have P = {P1 , P2 }:

P1 = P0ℓ : min x1 − 2x2 subject to −4x1 + 6x2 ≤ 5, x1 + x2 ≤ 5, x1 , x2 ≥ 0, x1 , x2 ∈ N, and x2 ≤ 2.

P2 = P0r : min x1 − 2x2 subject to −4x1 + 6x2 ≤ 5, x1 + x2 ≤ 5, x1 , x2 ≥ 0, x1 , x2 ∈ N, and x2 ≥ 3.

These two additional constraints are depicted in Figure 26.8(a). It appears clearly
that problem P2 is infeasible as no feasible solution of the original problem is such
that x2 is 3 or larger. Remember that the simplex algorithm is designed to detect
infeasible problems (see Section 16.3). It is immediately discarded and P = {P1 }. The
feasible set of problem P1 is represented in Figure 26.8(b).

Figure 26.8: First branching for Example 26.4. (a) Additional branching constraints; (b) feasible set of problem P1 .



It is seen that all feasible (that is, integer) points of the original problem are also
feasible for P1 . Therefore, the optimal solution of P1 is also the optimal solution of
P0 . What has changed is the feasible set of the relaxation. The polygon has shrunk.
Moreover, the optimal solution of the relaxation, (2.5, 2.5), is now excluded from the feasible set of R(P1 ). Note that one more vertex of the polygon now has integer coordinates: (3, 2). As the optimal solution is to be found at a vertex, we increase our chances of finding an integer optimal solution by solving the relaxation.
However, the optimal solution of R(P1 ) is not integer: x∗1 = (1.75, 2), with value
f(x∗1 ) = −2.25. Here, we branch on x1 , which is the only fractional variable. In the
first subproblem P11 = P1ℓ , we include the constraint x1 ≤ ⌊1.75⌋ = 1. In the second
one P12 = P1r , we include the constraint x1 ≥ ⌈1.75⌉ = 2. We now have P = {P11 , P12 }.

P11 : min x1 − 2x2 subject to −4x1 + 6x2 ≤ 5, x1 + x2 ≤ 5, x1 , x2 ≥ 0, x1 , x2 ∈ N, x2 ≤ 2, and x1 ≤ 1.

P12 : min x1 − 2x2 subject to −4x1 + 6x2 ≤ 5, x1 + x2 ≤ 5, x1 , x2 ≥ 0, x1 , x2 ∈ N, x2 ≤ 2, and x1 ≥ 2.
These additional constraints are illustrated in Figure 26.9(a). The feasible set of
R(P12 ), which is the polygon on the right of Figure 26.9(b), appears to be such that
each of its vertices corresponds to an integer solution. Therefore, it is guaranteed
that one of its optimal solutions is integer and, therefore, also an optimal solution
of P12 . In this case, the optimal solution is unique and is x∗12 = (2, 2), with value
f(x∗12 ) = −2. It therefore becomes the current (and first) candidate to be the optimal solution of P: x∗ = (2, 2) and f∗ = −2. Hence, P12 is discarded from P, which now comprises only {P11 }, as Algorithm 26.2 removes a problem from P when it is branched.
We now treat P11 . The optimal solution of R(P11 ) is x∗11 = (1, 1.5), with value ℓ(P11 ) = f(x∗11 ) = −2. It is not an integer solution and, therefore, not a feasible solution of P11 . Because f∗ ≤ ℓ(P11 ), the problem can be discarded without being solved: its optimal value cannot be better than −2. Consequently, the optimal solution of P1 (and of P0 ) is the optimal solution of P12 . We now have P = ∅, and the algorithm terminates. An optimal solution of P is x∗ = (2, 2) with value f∗ = −2.
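Since the feasible set of Example 26.4 is small, the outcome of the algorithm can be double-checked by brute-force enumeration of the integer points; this is a sanity check, not part of Algorithm 26.2:

```python
# Enumerate all integer points with -4*x1 + 6*x2 <= 5 and x1 + x2 <= 5, x >= 0.
# The constraint x1 + x2 <= 5 bounds both variables by 5, so range(6) suffices.
feasible = [
    (x1, x2)
    for x1 in range(6)
    for x2 in range(6)
    if -4 * x1 + 6 * x2 <= 5 and x1 + x2 <= 5
]
best = min(feasible, key=lambda p: p[0] - 2 * p[1])
print(best, best[0] - 2 * best[1])  # (2, 2) -2
```

The enumeration confirms that (2, 2), with objective value −2, is the integer optimum found by the branch and bound algorithm.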

26.2 Cutting planes


As illustrated by Example 26.4, it is possible to shrink the constraint polyhedron without modifying the feasible set of the integer optimization problem. This is due to the fact that there are infinitely many polyhedra that characterize the same feasible set
of integer solutions.

Figure 26.9: Second branching for Example 26.4. (a) Additional branching constraints; (b) feasible set of problems P11 and P12 , with the relaxation optima x∗11 and x∗12 .

Figure 26.10 represents four different polyhedra corresponding to exactly the same feasible set of integer solutions. Clearly, the last of them is the most appealing. Indeed, each vertex is integer, so that an optimal solution of the relaxation is also an optimal solution of the original problem. This polyhedron is the convex hull of the feasible solutions (see Definition B.3).
The idea of cutting planes methods is to start from the original formulation, include additional constraints that shrink the polyhedron without modifying the feasible
set, and force some vertices of the new polyhedron to be integer. These additional
constraints are called valid inequalities.

Definition 26.5 (Valid inequality). Let F ⊆ Rn be a set. The inequality aT x ≥ b is a valid inequality for F if it is satisfied by all x ∈ F .
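For a finite set F, the definition can be checked directly. The sketch below (with illustrative names) tests whether aT x ≥ b holds at every point of a given set:

```python
def is_valid_inequality(a, b, points):
    """True if a^T x >= b for every x in the finite set `points`."""
    return all(sum(ai * xi for ai, xi in zip(a, x)) >= b for x in points)

# x1 + x2 >= 0 is valid for any set of non-negative points;
# x1 >= 3 is not valid for a set containing (2, 2).
F = [(0, 0), (1, 1), (2, 2)]
print(is_valid_inequality([1, 1], 0, F))  # True
print(is_valid_inequality([1, 0], 3, F))  # False
```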

It is good practice to include valid inequalities in the original formulation, by exploiting the properties of the problem. The modeling step should be designed to
obtain a formulation that is as tight as possible. But it is usually not possible to
identify the convex hull of the feasible set in that way. Cutting planes methods
derive valid inequalities from the optimal solution of the relaxation. The most popular method has been proposed by Gomory (1958) and exploits the simplex tableau introduced in Section 16.2.

Figure 26.10: The same feasible set characterized by different polyhedra.
Consider the (mixed) integer linear optimization problem P. We solve its relaxation using the simplex algorithm in two phases (Algorithm 16.5), and we obtain the optimal tableau

B−1 A    B−1 b
cT − cTB B−1 A    −cTB B−1 b

where B contains the columns of A corresponding to the basic variables. The top part of the tableau contains a transformed version of the equality constraints. Separating the variables into basic variables xB and non basic variables xN , and denoting N the columns corresponding to non basic variables (see Section 3.4), it is written as

B−1 Ax = B−1 b,
B−1 BxB + B−1 NxN = B−1 b,    (26.12)
xB + B−1 NxN = B−1 b.

To simplify the following equations, let us assume that the variables are numbered in
such a way that the m first variables are basic, so that xi = (xB )i . Denote αij the
entry in row i and column j of the matrix B−1 A, which is obtained directly from the

Ralph Gomory was born on May 7, 1929, in Brooklyn Heights, New York, USA. He received his Ph.D. in mathematics from Princeton University in 1954. From 1957 to 1959, he was Assistant Professor at Princeton, where he interacted with Kuhn and Tucker. The cutting plane algorithm (Algorithm 26.3) was created during a project for the Navy, who insisted on obtaining integer solutions, while linear optimization provided fractional solutions. Gomory was responsible for IBM’s Research Division between 1970 and 1986, when he became IBM Senior Vice President for Science and Technology. Since 1989, he has been the president of the Alfred P. Sloan Foundation. He is currently a Research Professor at New York University.

Figure 26.11: Ralph E. Gomory

tableau, remembering that B−1 b = x∗B . Therefore, the ith constraint is written as

xi + Σ_{j non basic} αij xj = (x∗B )i .    (26.13)

As x is feasible, it is non negative. Therefore, if we round down all coefficients αij in the left hand side of (26.13), its value cannot increase and we obtain a valid inequality for all feasible solutions of the relaxation:

xi + Σ_{j non basic} ⌊αij ⌋xj ≤ xi + Σ_{j non basic} αij xj = (x∗B )i .    (26.14)

Rounding down the two sides of this inequality, we obtain another valid inequality for the feasible solutions of the relaxation:

⌊ xi + Σ_{j non basic} ⌊αij ⌋xj ⌋ ≤ ⌊(x∗B )i ⌋.    (26.15)

Now, consider only the x that are integer. Therefore,

⌊ xi + Σ_{j non basic} ⌊αij ⌋xj ⌋ = xi + Σ_{j non basic} ⌊αij ⌋xj    (26.16)

and (26.15) is written as

xi + Σ_{j non basic} ⌊αij ⌋xj ≤ ⌊(x∗B )i ⌋,    (26.17)

which is a valid inequality for all feasible solutions of the integer optimization problem.
Equation (26.17) is called a Gomory cut.
We now show that the optimal solution x∗ of the relaxation does not satisfy (26.17). Indeed, all non basic components of x∗ are zero, and (26.17) is written as

x∗i ≤ ⌊(x∗B )i ⌋ = ⌊x∗i ⌋, (26.18)



where (x∗B )i = x∗i because of our numbering convention. Inequality (26.18) is satisfied
by x∗ only if x∗i is integer. Therefore, in order to generate a valid inequality that
excludes x∗ from the relaxation polyhedron, the index i must be chosen such that x∗i
is fractional. Algorithm 26.3 describes how Gomory cuts are used to solve an integer
linear optimization problem.

We illustrate the method in Example 26.4. Note that in order for the relaxation
to be solved by Algorithm 16.5, the problem must first be transformed into standard
form by adding two slack variables:

min x1 − 2x2

subject to
−4x1 + 6x2 + x3 =5
x1 + x2 + x4 =5
x1 , x2 , x3 , x4 ≥0

The optimal tableau is


x1 x2 x3 x4
0 1 0.1 0.4 2.5 x2
1 0 −0.1 0.6 2.5 x1
0 0 0.3 0.3 2.5
The first constraint of the tableau corresponds to a fractional value of x2 . It is
written as
x2 + 0.1x3 + 0.4x4 = 2.5, (26.19)

and the valid inequality (26.17) is written as

x2 + ⌊0.1⌋x3 + ⌊0.4⌋x4 ≤ ⌊2.5⌋, (26.20)

that is
x2 ≤ 2. (26.21)

The second constraint of the tableau corresponds to a fractional value of x1 . It is


written as
x1 − 0.1x3 + 0.6x4 = 2.5, (26.22)

and generates the valid inequality

x1 − x3 ≤ 2, (26.23)

as ⌊−0.1⌋ = −1, ⌊0.6⌋ = 0, and ⌊2.5⌋ = 2. As x3 = 5 + 4x1 − 6x2 , the valid inequality
in the original variables is
−3x1 + 6x2 ≤ 7. (26.24)

These valid inequalities are illustrated in Figure 26.12.
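The rounding operation that produces these cuts is easy to automate. The sketch below (the function name is illustrative) applies it to the two rows of the optimal tableau and recovers the cuts (26.21) and (26.23):

```python
import math

def gomory_cut(row, rhs):
    """Coefficients and right hand side of the cut (26.17):
    sum_j floor(row[j]) * x_j <= floor(rhs)."""
    return [math.floor(a) for a in row], math.floor(rhs)

# Rows of the optimal tableau of Example 26.4 (variables x1, x2, x3, x4).
print(gomory_cut([0, 1, 0.1, 0.4], 2.5))   # ([0, 1, 0, 0], 2): x2 <= 2
print(gomory_cut([1, 0, -0.1, 0.6], 2.5))  # ([1, 0, -1, 0], 2): x1 - x3 <= 2
```

In floating point arithmetic, coefficients that are integer up to rounding error should be snapped to the nearest integer before taking the floor; this safeguard is omitted here for brevity.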



Figure 26.12: Gomory cuts for Example 26.4. (a) Gomory cut on x1 ; (b) Gomory cut on x2 .

We now introduce the cut on x2 in the formulation to obtain, after including another slack variable:

min x1 − 2x2

subject to

−4x1 + 6x2 + x3 = 5
x1 + x2 + x4 = 5
x2 + x5 = 2
x1 , x2 , x3 , x4 , x5 ≥ 0.

The optimal tableau is

x1 x2 x3 x4 x5
0 1 0 0 1 2 x2
0 0 0.25 1 −2.5 1.25 x4
1 0 −0.25 0 1.5 1.75 x1
0 0 0.25 0 0.5 2.25

Variables x1 and x4 are fractional. Therefore, the following valid inequalities can
be generated:
x4 − 3x5 ≤ 1 (26.25)
for x4 and
x1 − x3 + x5 ≤ 1 (26.26)
for x1 . In the original variables, these are
−x1 + 2x2 ≤ 2 (26.27)
and
−3x1 + 5x2 ≤ 4. (26.28)
These cuts are illustrated in Figure 26.13. They both have the property that they cut
the polygon at (2, 2), which becomes a vertex. Actually, the optimal solution of the relaxation of either of the two resulting problems is (2, 2), which is also the optimal solution of the integer optimization problem. No more cuts are necessary.
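A quick numerical check, in the original variables, confirms that each cut keeps the integer point (2, 2) while excluding the fractional relaxation optima (2.5, 2.5) and (1.75, 2):

```python
def satisfies(cut, point):
    """cut = (a1, a2, b) encodes the inequality a1*x1 + a2*x2 <= b."""
    a1, a2, b = cut
    x1, x2 = point
    return a1 * x1 + a2 * x2 <= b

cut_27 = (-1, 2, 2)  # -x1 + 2*x2 <= 2, inequality (26.27)
cut_28 = (-3, 5, 4)  # -3*x1 + 5*x2 <= 4, inequality (26.28)

print(satisfies(cut_27, (2, 2)), satisfies(cut_27, (2.5, 2.5)))  # True False
print(satisfies(cut_28, (2, 2)), satisfies(cut_28, (1.75, 2)))   # True False
```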

Figure 26.13: More Gomory cuts for Example 26.4. (a) Gomory cut on x1 ; (b) Gomory cut on x4 .

Algorithm 26.3: Gomory cuts for integer linear optimization


1 Objective
2 Find a global minimum of the integer linear optimization problem P:
minx cT x subject to Ax = b, x ≥ 0, x ∈ Zn .
3 Input
4 The matrix A ∈ Rm×n .
5 The vector b ∈ Rm .
6 The vector c ∈ Rn .
7 Output
8 A global minimum x∗ .
9 Repeat
10 Solve the relaxation using Algorithm 16.5 with A, b and c.
11 Call x∗R the optimal solution and T ∗ the optimal tableau.
12 if x∗R is integer then
13 x∗ = x∗R
14 else
15 Let i ≤ m be such that T ∗ (i, n + 1) is fractional
16 Let γ := (⌊T ∗ (i, 1)⌋, . . . , ⌊T ∗ (i, n)⌋) be the first n elements of row i of T ∗ rounded down
17 Let A := [A 0; γ 1], b := [b; ⌊T ∗ (i, n + 1)⌋], and c := [c; 0], that is, append the cut as a new constraint with a new slack variable
18 m := m + 1, n := n + 1.
19 Until x∗ is integer.

We provide another illustration of the Gomory cuts with Example 25.13. The feasible set is represented in Figure 25.2. After the introduction of the slack variables, the relaxation is written as

min −3x1 − 13x2    (26.29)

subject to

2x1 + 9x2 + x3 = 29
11x1 − 8x2 + x4 = 79    (26.30)
x1 , x2 , x3 , x4 ≥ 0.
The optimal tableau of the relaxation is
x1 x2 x3 x4
0 1 0.10 −0.02 1.4 x2
1 0 0.07 0.08 8.2 x1
0 0 1.45 0.01 42.8
The two variables are fractional, so two valid inequalities can be generated:

x2 − x4 ≤ 1 (26.31)

and
x1 ≤ 8. (26.32)
As x4 = 79 − 11x1 + 8x2 , the first inequality in the original variables is written as

11x1 − 7x2 ≤ 80. (26.33)

These two cuts are illustrated in Figure 26.14. It appears in this example that the part of the polyhedron that is cut off can be small. In this case, we expect the algorithm to converge slowly: depending on the cut selected for inclusion at each iteration, the algorithm may need more than 50 iterations to find the optimal solution of this problem.
In practice, the cutting plane method is usually combined with the branch and bound algorithm described in Section 26.1: besides the constraints generated by the branching strategy, valid inequalities such as Gomory cuts are also included. Such a method is called branch and cut.

26.3 Exercises
Exercise 26.1. Find better bounds for Example 26.3.
Exercise 26.2. Consider the assignment problem presented as Exercise 22.2.
1. Solve it using the branch and bound algorithm (Algorithm 26.1).
2. Consider its mathematical formulation as a transhipment problem (see Exercise 22.2). Solve it with the simplex algorithm (Algorithm 16.5).

Figure 26.14: The feasible set for Example 25.13 with Gomory cuts

26.4 Project
The general organization of the projects is described in Appendix D.

Objective

The objective of the project is to analyze how different exact methods handle different
optimization problems.

Approach

1. For each problem,


• apply the branch and bound algorithm (Algorithm 26.2) using different strategies to select the next problem to solve (step 11 of the algorithm):
(a) select the problem associated with the best bound,
(b) select the last problem that has been introduced in P (called last-in-first-out or depth-first strategy),
(c) select the first problem that has been introduced in P (called first-in-first-out or breadth-first strategy);
• apply the Gomory cut algorithm (Algorithm 26.3).
2. Report, for each run, the number of problems that have been processed and the
running time.

Algorithms

Algorithms 26.2 and 26.3.



Problems
Exercise 26.3. Solve the instance of the problem of locating plants for the supply
of energy described in Example 25.1, with 10 sites and 3 cities, using the data in
Table 25.1.

Exercise 26.4. Solve the knapsack problem presented in Example 27.2.


Exercise 26.5. Write and solve the traveling salesman problem with 16 cities presented in Example 27.3 as an optimization problem using the method described in Section 25.2.3.
Exercise 26.6. Solve the task assignment problem of Exercise 25.1.
Exercise 26.7. Solve the scheduling problem of Exercise 25.2.
Exercise 26.8. Solve the graph coloring problem of Exercise 25.3. Write also the
version of the problem where you must color the cantons in Switzerland.
Exercise 26.9. Solve the bin packing problem of Exercise 25.4.
Exercise 26.10. Solve the vehicle routing problem of Exercise 25.5 with Q =
100,000, Q = 150,000, and Q = 200,000.

Chapter 27

Heuristics

Contents
27.1 Greedy heuristics . . . . . . . . . . . . . . . . . . . . . . . 648
27.1.1 The knapsack problem . . . . . . . . . . . . . . . . . . . . 648
27.1.2 The traveling salesman problem . . . . . . . . . . . . . . 649
27.2 Neighborhood and local search . . . . . . . . . . . . . . . 656
27.2.1 The knapsack problem . . . . . . . . . . . . . . . . . . . . 662
27.2.2 The traveling salesman problem . . . . . . . . . . . . . . 665
27.3 Variable neighborhood search . . . . . . . . . . . . . . . . 669
27.3.1 The knapsack problem . . . . . . . . . . . . . . . . . . . . 670
27.3.2 The traveling salesman problem . . . . . . . . . . . . . . 672
27.4 Simulated annealing . . . . . . . . . . . . . . . . . . . . . . 674
27.4.1 The knapsack problem . . . . . . . . . . . . . . . . . . . . 677
27.4.2 The traveling salesman problem . . . . . . . . . . . . . . 679
27.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 682
27.6 Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 682

Due to the combinatorial nature of integer optimization problems, the methods
described in Chapter 26 may fail to identify the optimal solution in a reasonable
amount of time. Also, the number of subproblems to consider in a branch and bound
tree may be so high that the tree may sometimes not fit into the memory of the
computer. Under such circumstances, it is hopeless to find the optimal solution
of the problem. Instead, for all practical purposes, we are interested in efficient
techniques that identify a “good” solution, meaning a solution that is feasible and
(hopefully) significantly better than a solution that would have been designed “by
hand” by a human being, expert in the problem. Such techniques are called heuristic
algorithms.

Definition 27.1 (Heuristic algorithm). A heuristic algorithm is a method that explores the set of feasible solutions of an optimization problem, exploiting the structure of the problem to identify quickly good feasible solutions.

The above definition is relatively vague, as it involves vague terms such as “intuitive,” “quickly,” and “good.” Designing heuristics is more of an art than a science. Indeed, the success of a heuristic depends on the level of intuition that can be developed about a problem, on the time that is available to obtain a practical solution, and on the concrete characterization of what “good” means in the given context.
For these reasons, the heuristics introduced in this chapter are described for specific problems, in order to illustrate the concepts. We refer the reader to the large
literature for additional examples (see, among others, Gendreau and Potvin, 2010).

27.1 Greedy heuristics


A greedy heuristic usually refers to a method that constructs a feasible solution step
by step in a way that each step is locally optimal (see Definition 21.22). We illustrate
the idea on the knapsack problem and the traveling salesman problem.

27.1.1 The knapsack problem


An example of a greedy algorithm to solve the knapsack problem consists in sorting
the items by decreasing order of their relative value, that is, by decreasing value of
the ratio ui /wi . The items are considered in that sequence and, for each of them, if
there is enough capacity left in the knapsack, it is added.

Example 27.2 (A larger knapsack problem). Consider a knapsack problem with 12 items. The utilities and the weights are as follows:

i 1 2 3 4 5 6 7 8 9 10 11 12
u 80 31 48 17 27 84 34 39 46 58 23 67
w 84 27 47 22 21 96 42 46 54 53 32 78
The capacity of the knapsack is 300.

Consider Example 27.2. We sort the items according to their relative value:

i       5     2     10    3     1     6     12    9     8     7     4     11
ui      27    31    58    48    80    84    67    46    39    34    17    23
wi      21    27    53    47    84    96    78    54    46    42    22    32
ui /wi  1.29  1.15  1.09  1.02  0.95  0.88  0.86  0.85  0.85  0.81  0.77  0.72

We proceed through the list and include items 5, 2, 10, 3, and 1, for a total weight of 232 and a total utility of 244. Items 6 and 12 cannot be included, as either would violate the capacity constraint. Item 9 can be included, increasing the weight to 286 and the utility to 290. None of the remaining items can be included.
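The greedy procedure just described can be sketched as follows (the function name is illustrative); on the data of Example 27.2 it reproduces the selection above:

```python
def greedy_knapsack(utilities, weights, capacity):
    # Consider the items by decreasing ratio utility/weight and pack greedily.
    order = sorted(range(len(utilities)),
                   key=lambda i: utilities[i] / weights[i], reverse=True)
    chosen, total_w, total_u = [], 0, 0
    for i in order:
        if total_w + weights[i] <= capacity:
            chosen.append(i + 1)  # 1-based item numbers, as in the table
            total_w += weights[i]
            total_u += utilities[i]
    return chosen, total_w, total_u

u = [80, 31, 48, 17, 27, 84, 34, 39, 46, 58, 23, 67]
w = [84, 27, 47, 22, 21, 96, 42, 46, 54, 53, 32, 78]
print(greedy_knapsack(u, w, 300))  # ([5, 2, 10, 3, 1, 9], 286, 290)
```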

27.1.2 The traveling salesman problem

An example of a greedy algorithm for the traveling salesman problem is to start from home and, at each step, select the closest city as the next one, as described in Algorithm 27.1.

Algorithm 27.1: Nearest neighbor greedy algorithm


1 Objective
2 Find a good tour for the traveling salesman problem.
3 Input
4 The number of cities n.
5 The distance d(i, j), i = 1, . . . , n, j = 1, . . . , n, i ≠ j.
6 Output
7 A sequence T of cities.
8 Initialization
9 T := {1}.
10 S := {2, . . . , n}.
11 c := 1.
12 Repeat
13 Select i ∈ argminj∈S d(c, j).
14 c := i.
15 T := T ∪ {i}.
16 S := S \ {i}.
17 Until S = ∅.

To illustrate the algorithm, we consider an instance of the TSP with 16 cities, as described in Example 27.3.

Example 27.3 (The traveling salesman problem: 16 cities). A salesman must visit
15 customers during the day. Starting from home, he has to plan a tour, that is, a
sequence of customers to visit, in order to minimize the travel distance. We model it
with a graph, where each vertex represents either the home place or a customer. There
is an edge between each pair of vertices, and the cost of the edge is the Euclidean distance between the two vertices. The location of the home place (vertex 1) and
of the 15 customers are reported in Table 27.1 and illustrated in Figure 27.1.

Table 27.1: Traveling salesman problem: position of the home (vertex 1) and of 15
customers to visit
Vertex x-coord y-coord   Vertex x-coord y-coord
1       9.14    3.92     9       3.41   27.54
2      18.46    1.17     10     13.63   22.11
3      28.35    9.72     11     24.41   22.10
4      39.41    7.39     12     39.90   22.64
5       8.87   10.33     13      3.37   30.48
6      12.56   17.55     14     14.52   39.34
7      27.29   13.97     15     21.75   37.64
8      32.31   10.78     16     32.48   30.06

Figure 27.1: Traveling salesman problem: position of the home (vertex 1) and of 15
customers to visit

We obtain an itinerary of length 158.5, illustrated in Figure 27.2.

Figure 27.2: Feasible solution provided by the nearest neighbor greedy algorithm for
Example 27.3 (length: 158.5)

Another greedy heuristic for the traveling salesman problem consists in improving
an existing subtour by inserting a vertex. An example of insertion is illustrated in
Figure 27.3, where a subtour of length 134.4 is constructed by inserting city 14 into a
subtour of 12 cities (of length 108.1). Note that it is not the best possible insertion,
which would consist in inserting city 15 after city 16 in the tour and obtaining a
subtour of length 125.6. The greedy algorithm consists in selecting the best possible
insertion at each step, as described in Algorithm 27.2.
The iterations of the insertion greedy algorithm (Algorithm 27.2) on Example 27.3
are reported in Table 27.2, and the final tour illustrated in Figure 27.4.

Figure 27.3: Example of a subtour involving 12 cities, of length 108.1, with insertion
of city 14

Algorithm 27.2: Insertion greedy algorithm


1 Objective
2 Find a good tour for the traveling salesman problem.
3 Input
4 The number of cities n.
5 The distance d(i, j), i = 1, . . . , n, j = 1, . . . , n, i ≠ j.
6 ℓ(T, i, j) calculates the length of the tour obtained by inserting city i after j in T .
7 Initial subtour: sequence of cities T , including home (vertex 1).
8 Output
9 A sequence T of cities.
10 Initialization
11 S = {1, . . . , n} \ T .
12 Repeat
13 L = ∞.
14 for i ∈ S do
15 for j ∈ T do
16 if ℓ(T, i, j) < L then
17 i∗ := i, j∗ := j
18 L := ℓ(T, i, j)

19 Insert vertex i∗ after vertex j∗ in T


20 S := S \ {i∗ }.
21 Until S = ∅.
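A compact Python sketch of Algorithm 27.2 follows; here `tour_length` plays the role of ℓ(T, i, j) by evaluating the tour obtained after a candidate insertion. The four-city instance at the end is illustrative and not from the book:

```python
import math

def tour_length(tour, d):
    # length of the closed tour (returning to the first city)
    return sum(d(tour[k], tour[(k + 1) % len(tour)]) for k in range(len(tour)))

# Insertion greedy algorithm (Algorithm 27.2): at each step, insert the
# (city, position) pair that yields the shortest resulting tour.
def insertion_greedy(n, d, tour):
    remaining = set(range(1, n + 1)) - set(tour)
    while remaining:
        i, pos = min(((i, pos) for i in remaining for pos in range(len(tour))),
                     key=lambda ip: tour_length(
                         tour[:ip[1] + 1] + [ip[0]] + tour[ip[1] + 1:], d))
        tour = tour[:pos + 1] + [i] + tour[pos + 1:]
        remaining.remove(i)
    return tour

# Four cities at the corners of the unit square; start from home (city 1).
coords = {1: (0, 0), 2: (0, 1), 3: (1, 1), 4: (1, 0)}
d = lambda i, j: math.dist(coords[i], coords[j])
t = insertion_greedy(4, d, [1])
print(t, tour_length(t, d))  # the optimal square tour has length 4.0
```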

Table 27.2: Iterations of the insertion greedy algorithm (Algorithm 27.2) on Example 27.3

Length Subtour
12.8 1 5 1
28.6 1 6 5 1
37.9 1 6 10 5 1
50.9 1 2 6 10 5 1
64.2 1 2 3 6 10 5 1
66.1 1 2 3 7 6 10 5 1
71.8 1 2 3 8 7 6 10 5 1
78.0 1 2 3 8 7 11 6 10 5 1
93.1 1 2 3 4 8 7 11 6 10 5 1
110.0 1 2 3 4 8 7 11 6 10 9 5 1
114.6 1 2 3 4 8 7 11 6 10 13 9 5 1
132.9 1 2 3 4 8 7 11 6 10 14 13 9 5 1
140.5 1 2 3 4 8 7 11 6 10 15 14 13 9 5 1
156.6 1 2 3 4 8 7 11 6 10 16 15 14 13 9 5 1
172.9 1 2 3 4 8 7 11 6 10 12 16 15 14 13 9 5 1


Figure 27.4: Result of the insertion greedy algorithm (Algorithm 27.2) on Example 27.3 (length: 172.9)

27.2 Neighborhood and local search


As emphasized in Definition 27.1, heuristic algorithms should explore the large set
of feasible solutions. This is not really achieved by the greedy algorithms, which
generate only one (hopefully good) feasible solution. Such an exploration requires a
proper tool that generates a sequence of feasible solutions. Potentially, it should be
able to generate all feasible solutions. Such a tool is called a neighborhood structure.

Definition 27.4 (Neighborhood structure). A neighborhood structure N is a function
that associates a solution x of the optimization problem with a set N(x) of other
solutions (not necessarily feasible). Each element of N(x) is called a neighbor of x.

In general, the neighborhood of a feasible solution x is defined by a set of simple
modifications of x, each of them generating a neighbor.
For instance, consider an integer optimization problem and x ∈ Zn a feasible
solution. For each index k, we generate two neighbors by increasing and decreasing
the value of xk by 1. The two neighbors, denoted by y^{k+} and y^{k−}, are defined as

y^{k+}_i = y^{k−}_i = x_i , ∀i ≠ k,   y^{k+}_k = x_k + 1,   y^{k−}_k = x_k − 1. (27.1)

For example, if k = 2,

x = (3, 5, 2, 8) y2+ = (3, 6, 2, 8) y2− = (3, 4, 2, 8).

The simple modifications of x characterize a neighborhood of size 2n. It is illustrated,
for n = 2, by Figure 27.5.


Figure 27.5: A simple neighborhood structure

There is an infinite number of possible neighborhood structures, and the choice
of a specific structure should be motivated by the properties of the optimization
problem. Creativity is required here. Another example of a neighborhood for an
integer optimization problem is illustrated in Figure 27.6, where the neighbors are
generated using moves inspired by the knights of a chess game. Other examples
of neighborhood structures are provided later in this chapter. When dealing with
practical applications, it is good practice to get inspiration from experts in the
field when defining a neighborhood structure. In particular, a good way to define a
neighborhood is to mimic how an expert would modify an existing feasible solution
in order to try to improve it.

Figure 27.6: Another neighborhood structure

An important property of a good neighborhood structure is that solving an optimization
problem within the neighborhood of a feasible solution x should be an easy
task. Typically, it should be feasible to perform an exhaustive enumeration of its
elements. Algorithm 27.3, called local search, directly exploits this feature.

Algorithm 27.3: Local search


1 Objective
2 Find a good feasible solution of the optimization problem minx f(x) subject
to x ∈ F .
3 Input
4 The objective function f : Rn → R.
5 The feasible set F .
6 A neighborhood structure N.
7 An initial feasible solution x0 ∈ F such that N(x0 ) ∩ F ≠ ∅.
8 Output
9 A feasible solution x∗ .
10 Repeat
11 Select xc ∈ argminx∈N(xk )∩F f(x)
12 if f(xc ) < f(xk ) then
13 xk+1 := xc
14 k := k + 1
15 Until f(xc ) = f(xk ) or N(xk ) ∩ F = ∅.
16 x∗ := xk .
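A minimal Python sketch of the local search scheme of Algorithm 27.3, together with the integer neighborhood of (27.1), may be useful; the small quadratic test problem at the end is ours, not from the book:

```python
# Local search (in the spirit of Algorithm 27.3): repeatedly move to the
# best feasible neighbor until no neighbor improves the objective.
def local_search(f, feasible, neighbors, x0):
    x = x0
    while True:
        cand = [y for y in neighbors(x) if feasible(y)]
        if not cand:
            return x
        best = min(cand, key=f)
        if f(best) >= f(x):
            return x          # x is a local minimum for this neighborhood
        x = best

# Integer neighborhood of (27.1): change one coordinate by +1 or -1.
def neighbors(x):
    for k in range(len(x)):
        for d in (1, -1):
            y = list(x)
            y[k] += d
            yield tuple(y)

# Example: minimize (x1 - 3)^2 + (x2 + 1)^2 over the nonnegative integers.
f = lambda x: (x[0] - 3) ** 2 + (x[1] + 1) ** 2
feasible = lambda x: all(v >= 0 for v in x)
print(local_search(f, feasible, neighbors, (0, 5)))  # → (3, 0)
```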

The idea is simple. At each iteration, the current iterate is replaced by its best
feasible neighbor, until the current feasible solution is the best in the neighborhood.
Therefore, Algorithm 27.3 generates a local minimum with respect to the given
neighborhood structure. Note that this concept of local minimum is a generalization of the
one introduced for continuous optimization (Definition 1.5), where the neighborhood
was defined as all points within a ball around x.
The main issue with heuristic methods is that there is no theoretical support for
their validity. Only empirical evidence can be obtained. However, we can suggest some
properties that a neighborhood should have in order to be used in Algorithm 27.3.

• An element x belongs to its own neighborhood: for all x, x ∈ N(x). This property
is important only to characterize a local minimum as x∗ ∈ argminx∈N(x∗)∩F f(x).

• The neighborhood structure is symmetric, that is, for all x and y, x ∈ N(y) if and
only if y ∈ N(x).

• The neighborhood structure should allow any feasible point to be reached from
any other feasible point in a finite number of steps.

• The size of the neighborhood structure should not grow too fast with the size
of the problem. The optimization problem solved at each iteration within the
neighborhood structure must be tractable.

There are several variants of the local search method. One of them consists in
evaluating the neighbors in a given order, and accepting as next iterate the first one
that is better than the current iterate (Algorithm 27.4). This may allow computational
time to be saved in early iterations, when many neighbors are better than
the current feasible solution. It means that the neighborhood structure should be
associated with an order of its elements.
For large neighborhoods, some variants propose to randomly select a fixed number
of candidates in the neighborhood. If none of these candidates is better than the
current iterate, the algorithm is interrupted. The advantage of this approach is that
the computational complexity of the algorithm can be controlled independently of
the size of the neighborhood.
Consider Example 25.13, together with the neighborhood structure defined by
(27.1) and illustrated in Figure 27.5. We apply Algorithm 27.3 with x0 = (6, 0). The
iterations are described in Table 27.3 and illustrated in Figure 27.7. The starting point
x0 = (6, 0) has four neighbors. The point (6, −1) is infeasible. Among the others,
(6, 1) is associated with the lowest value of the objective function and is selected as
the next iterate. Among its four neighbors, (6, 2) is infeasible and (7, 1) is selected as
the next iterate. Among its neighbors, two are infeasible and two have a higher value
of the objective function. Therefore, (7, 1) is a local minimum for this neighborhood
structure.

Algorithm 27.4: Local search: a variant


1 Objective
2 Find a good feasible solution of the optimization problem minx f(x) subject
to x ∈ F .
3 Input
4 The objective function f : Rn → R.
5 The feasible set F .
6 A neighborhood structure N such that for each x, N(x) is an ordered
sequence of solutions.
7 An initial feasible solution x0 ∈ F such that N(x0 ) ∩ F ≠ ∅.
8 Output
9 A feasible solution x∗ .
10 Repeat
11 improvement :=FALSE .
12 for xc ∈ N(xk ) ∩ F do
13 if f(xc ) < f(xk ) then
14 improvement :=TRUE
15 xk+1 := xc
16 k := k + 1
17 break // The “for” loop is interrupted

18 Until improvement = FALSE or N(xk ) ∩ F = ∅.


19 x∗ := xk .

Table 27.3: Local search on Example 25.13 with x0 = (6, 0)

xk Neighbors
x (6,0) (7,0) (5,0) (6,1) (6,-1)
f(x) -18 -21 -15 -31 —
x (6,1) (7,1) (5,1) (6,2) (6,0)
f(x) -31 -34 -28 — -18
x (7,1) (8,1) (6,1) (7,2) (7,0)
f(x) -34 — -31 — -21

Figure 27.7: Local search on Example 25.13 with x0 = (6, 0)

If another starting point is selected, a different local minimum can be reached. The
iterations starting from (2, 0) are reported in Table 27.4 and illustrated in Figure 27.8.

Table 27.4: Local search on Example 25.13 with x0 = (2, 0)

xk Neighbors
x (2,0) (3,0) (1,0) (2,1) (2,-1)
f(x) -6 -9 -3 -19 —
x (2,1) (3,1) (1,1) (2,2) (2,0)
f(x) -19 -22 -16 -32 -6
x (2,2) (3,2) (1,2) (2,3) (2,1)
f(x) -32 -35 -29 — -19
x (3,2) (4,2) (2,2) (3,3) (3,1)
f(x) -35 -38 -32 — -22
x (4,2) (5,2) (3,2) (4,3) (4,1)
f(x) -38 -41 -35 — -25
x (5,2) (6,2) (4,2) (5,3) (5,1)
f(x) -41 — -38 — -28

Similarly, if a different neighborhood structure is selected, the algorithm may
also end up at a different local minimum. The iterations starting from (2, 0), using
the neighborhood inspired by the chess knights (see Figure 27.6), are reported in
Table 27.5 and illustrated in Figure 27.9.

Figure 27.8: Local search on Example 25.13 with x0 = (2, 0)

Table 27.5: Local search on Example 25.13 with the “knight” neighborhood and x0 =
(2, 0)

xk Neighbors
x (2,0) (4,1) (4,-1) (0,1) (0,-1) (3,2) (3,-2) (1,2) (1,-2)
f(x) -6 -25 — -13 — -35 — -29 —
x (3,2) (5,3) (5,1) (1,3) (1,1) (4,4) (4,0) (2,4) (2,0)
f(x) -35 — -28 -42 -16 — -12 — -6
x (1,3) (3,4) (3,2) (-1,4) (-1,2) (2,5) (2,1) (0,5) (0,1)
f(x) -42 — -35 — — — -19 — -13

Figure 27.9: Local search on Example 25.13 with the “knight” neighborhood and
x0 = (2, 0)

27.2.1 The knapsack problem


We illustrate the local search method described by Algorithm 27.4 on the 0–1 knapsack
problem. To do so, we need to define a feasible starting point and a neighborhood
structure. There is an obvious feasible solution that can be considered as a starting
point: the empty sack. It consists in carrying no item, that is, xi = 0, i = 1, . . . , n.
Consider now a configuration of the knapsack characterized by the vector x ∈ {0, 1}n .
A neighbor of x is obtained by selecting one item i and changing the decision
with respect to it. If the item belongs to the knapsack, we remove it. Otherwise, we
include it. Therefore, we define N(x) = {x, y^i , i = 1, . . . , n}, where y^i is defined as

y^i_j = x_j , ∀j ≠ i,   y^i_i = 1 − x_i .
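This single-flip neighborhood can be sketched in a few lines of Python (ours, not from the book), with the knapsack configuration stored as a 0/1 list:

```python
# Neighborhood for the 0-1 knapsack: N(x) = {x, y^1, ..., y^n}, where the
# neighbor y^i flips the decision on item i and keeps all other items.
def knapsack_neighbors(x):
    yield list(x)                 # x belongs to its own neighborhood
    for i in range(len(x)):
        y = list(x)
        y[i] = 1 - x[i]           # remove item i if present, else include it
        yield y

print(list(knapsack_neighbors([1, 0])))  # → [[1, 0], [0, 0], [1, 1]]
```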

The iterations of the local search algorithm (Algorithm 27.4) for Example 27.2 are
reported in Table 27.6. Each block represents an iteration with the list of neighbors
that have been considered, the last one being selected for the next iteration. In the
last block, all neighbors are rejected, so that the current iterate is a local minimum.
The interpretation of these iterations is simple: each item is included one by one into
the knapsack until the next one does not fit. A total of 6 items can fit, for a total
weight of 297 and a total utility of 287. Note that the greedy algorithm presented in
Section 27.1.1 found a feasible solution with weight 286 and utility 290.
The simple neighborhood structure presented above can be generalized. We define
a neighborhood of size k by selecting k items, and modifying the decision about them.
In particular, if k = 1, there are n neighbors, and the neighborhood structure is the
one used earlier. If k = n, there is only 1 neighbor, obtained by changing the status
of all the items. The size of this neighborhood is

n! / (k! (n − k)!) ≈ 2^{n+1/2} / √(πn), (27.2)

where the approximation holds when n is large and k = n/2. Therefore, the size
of this neighborhood grows exponentially with the size of the problem, which is
not desirable. In order to avoid that, the neighborhood is defined by the random
selection of a fixed number of neighbors, as defined in Algorithm 27.5. In this case,
the size of the neighborhood is bounded above by M, irrespective of the values
of k and n. Note the randomization of the procedure, which prevents the items from
always being considered in the same order. Note also that only feasible solutions are
considered in the neighborhood. This illustrates the flexibility of the neighborhood definition.
This procedure may generate the same neighbor several times, as the random
draws are made with replacement. The exact size of the neighborhood varies from
run to run, as infeasible combinations are discarded. It may also generate an empty
sequence, if the feasibility test at step 19 fails for each selected item. The use of this
neighborhood is illustrated in Section 27.3.1.

Algorithm 27.5: Neighborhood of size k for the knapsack problem


1 Objective
2 Generate an ordered list of feasible neighbors for the knapsack problem.
3 Input
4 The number of items n.
5 For each item i = 1, . . . , n, its status xi ∈ {0, 1}.
6 The vector of weights w.
7 The capacity W.
8 The size of the neighborhood k.
9 The maximum number of trials M.
10 Output
11 An ordered list of feasible neighbors.
12 Initialization
13 N = ∅.
14 m = 1.
15 Repeat
16 xc := x.
17 Select randomly k items i1 , . . . , ik .
18 (xc )ij := 1 − (xc )ij , j = 1, . . . , k.
19 if xTc w ≤ W then
20 N := N ∪ {xc }
21 m := m + 1.
22 Until m = M.

Table 27.6: Iterations of Algorithm 27.4 on Example 27.2



k 1 2 3 4 5 6 7 8 9 10 11 12 wT xc uT xc uT x∗
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0 0 84 80 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 80
2 1 1 0 0 0 0 0 0 0 0 0 0 111 111 80
0 1 0 0 0 0 0 0 0 0 0 0 27 31 111
1 0 0 0 0 0 0 0 0 0 0 0 84 80 111
3 1 1 1 0 0 0 0 0 0 0 0 0 158 159 111
0 1 1 0 0 0 0 0 0 0 0 0 74 79 159
1 0 1 0 0 0 0 0 0 0 0 0 131 128 159
1 1 0 0 0 0 0 0 0 0 0 0 111 111 159
4 1 1 1 1 0 0 0 0 0 0 0 0 180 176 159
0 1 1 1 0 0 0 0 0 0 0 0 96 96 176
1 0 1 1 0 0 0 0 0 0 0 0 153 145 176
1 1 0 1 0 0 0 0 0 0 0 0 133 128 176
1 1 1 0 0 0 0 0 0 0 0 0 158 159 176
5 1 1 1 1 1 0 0 0 0 0 0 0 201 203 176
0 1 1 1 1 0 0 0 0 0 0 0 117 123 203
1 0 1 1 1 0 0 0 0 0 0 0 174 172 203
1 1 0 1 1 0 0 0 0 0 0 0 154 155 203
1 1 1 0 1 0 0 0 0 0 0 0 179 186 203
1 1 1 1 0 0 0 0 0 0 0 0 180 176 203
6 1 1 1 1 1 1 0 0 0 0 0 0 297 287 203
0 1 1 1 1 1 0 0 0 0 0 0 213 207 287
1 0 1 1 1 1 0 0 0 0 0 0 270 256 287
1 1 0 1 1 1 0 0 0 0 0 0 250 239 287
1 1 1 0 1 1 0 0 0 0 0 0 275 270 287
1 1 1 1 0 1 0 0 0 0 0 0 276 260 287
1 1 1 1 1 0 0 0 0 0 0 0 201 203 287
1 1 1 1 1 1 1 0 0 0 0 0 339 321 287
1 1 1 1 1 1 0 1 0 0 0 0 343 326 287
1 1 1 1 1 1 0 0 1 0 0 0 351 333 287
1 1 1 1 1 1 0 0 0 1 0 0 350 345 287
1 1 1 1 1 1 0 0 0 0 1 0 329 310 287
7 1 1 1 1 1 1 0 0 0 0 0 1 375 354 287

27.2.2 The traveling salesman problem


We illustrate the local search method in Example 27.3 involving 16 cities. In order
to initiate the iterations, we generate a random permutation of the customers and
obtain

1, 7, 10, 3, 16, 15, 11, 4, 6, 14, 8, 2, 12, 13, 5, 9,

illustrated in Figure 27.10, with length 358.6.

Figure 27.10: Randomly generated initial feasible solution (length: 358.6)

The neighborhood structure that we consider for the local search algorithm is
called 2-OPT. It consists in selecting two customers and deciding to swap their
positions in the tour, inverting the sequence of visits between the two. If a and b are the
two customers selected to be swapped, the 2-OPT neighbor of

h, i1 , . . . , im , a, j1 , j2 , . . . , jn , b, k1 , . . . , kp

Table 27.7: Iterations of Algorithm 27.3 on Example 27.3

Tour Length 2-OPT


1 7 10 3 16 15 11 4 6 14 8 2 12 13 5 9 358.63
1 7 10 3 16 15 11 4 6 14 8 2 12 13 9 5 322.80 5 9
1 7 10 3 16 15 11 4 12 2 8 14 6 13 9 5 287.84 6 12
1 7 10 3 8 2 12 4 11 15 16 14 6 13 9 5 257.76 16 8
1 2 8 3 10 7 12 4 11 15 16 14 6 13 9 5 231.67 7 2
1 2 8 3 4 12 7 10 11 15 16 14 6 13 9 5 213.51 10 4
1 2 8 3 4 12 7 10 11 16 15 14 6 13 9 5 196.29 15 16
1 2 8 3 4 12 7 10 11 16 15 14 9 13 6 5 180.68 6 9
1 2 3 8 4 12 7 10 11 16 15 14 9 13 6 5 173.45 8 3
1 2 3 8 4 12 7 10 11 16 15 14 13 9 6 5 169.16 9 13
1 2 3 8 4 12 7 10 9 13 14 15 16 11 6 5 169.11 11 9
1 2 3 8 4 12 7 11 16 15 14 13 9 10 6 5 153.82 10 11

is
h, i1 , . . . im , b, jn , . . . j2 , j1 , a, k1 , . . . , kp .

In our example, the 2-OPT neighbor based on cities 13 and 16 of

1, 7, 10, 3, 16, 15, 11, 4, 6, 14, 8, 2, 12, 13, 5, 9

is
1, 7, 10, 3, 13, 12, 2, 8, 14, 6, 4, 11, 15, 16, 5, 9.
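The 2-OPT move itself amounts to reversing a segment of the tour, which can be sketched in Python (ours, not from the book) and checked against the swap of cities 16 and 13 above:

```python
# 2-OPT move: reverse the segment of the tour between the two selected
# cities, inclusive. The tour is a list of city numbers.
def two_opt(tour, a, b):
    i, j = tour.index(a), tour.index(b)
    if i > j:
        i, j = j, i
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

t = [1, 7, 10, 3, 16, 15, 11, 4, 6, 14, 8, 2, 12, 13, 5, 9]
print(two_opt(t, 16, 13))
# → [1, 7, 10, 3, 13, 12, 2, 8, 14, 6, 4, 11, 15, 16, 5, 9]
```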

The neighborhood therefore consists of a set of tours generated using this procedure
for any pair of cities. We now apply Algorithm 27.3 using this neighborhood structure.
The iterations are reported in Table 27.7. Each row corresponds to an iteration. For
each iteration, the current tour and its length are reported, as well as the two cities
involved in the 2-OPT neighbor.


Figure 27.11: Feasible solution provided by the local search algorithm, started from
the feasible solution provided by the greedy algorithm for Example 27.3 (length:
150.7)

This feasible solution is a little bit better than the feasible solution provided by
the greedy algorithm presented in Section 27.1.2 (which is 158.5, see Figure 27.2),
but it involves a significantly higher computational effort. Therefore, it may make
sense to initiate the local search with the feasible solution provided by the greedy
algorithm as a starting point, instead of a randomly generated initial tour. For our
example, it performs only one iteration, by applying the 2-OPT operator on cities 9
and 11, to obtain the tour represented in Figure 27.11, with length 150.7.

As local search methods get trapped in local minima, it is good practice to apply
them from different starting points, in order to increase the chance of finding different
local minima. The methods presented in the next sections are also designed to escape
from local minima.


Figure 27.12: Feasible solution provided by Algorithm 27.3 on Example 27.3 (length:
153.8)

The feasible solution provided by the local search algorithm (Algorithm 27.3) is

1, 2, 3, 8, 4, 12, 7, 11, 16, 15, 14, 13, 9, 10, 6, 5.

Its length is 153.8, and it is illustrated in Figure 27.12.



27.3 Variable neighborhood search


Introduced by Mladenović and Hansen (1997), the heuristic called “Variable
Neighborhood Search” (VNS) is a simple but powerful extension of the local search that is
designed to escape from local optima. The idea is to consider a collection of different
neighborhood structures. Once a local minimum has been reached for a neighborhood
structure, use it as a starting point for another local search using another
neighborhood structure. If a better feasible solution is found, restart the process from the first
neighborhood. Otherwise, try the next structure until all neighborhood structures
have been considered. The procedure is described as Algorithm 27.6.

Algorithm 27.6: Variable Neighborhood Search


1 Objective
2 Find a good feasible solution of the optimization problem minx f(x) subject
to x ∈ F .
3 Input
4 The objective function f : Rn → R.
5 The feasible set F .
6 An initial feasible solution x0 ∈ F .
7 K neighborhood structures N1 , N2 , . . . , NK .
8 Output
9 A feasible solution x∗ .
10 Initialization
11 x∗ := x0 .
12 k := 1.
13 Repeat
14 Apply a local search from x∗ using neighborhood Nk : x+ := LS(f, F , x∗ , Nk )
if f(x+ ) < f(x∗ ) then
15 x∗ := x+
16 k := 1
17 else
18 k := k + 1
19 Until k = K.
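The control loop of Algorithm 27.6 can be sketched in Python. Here `local_search(x, k)` is assumed to run a local search from x with the k-th neighborhood structure; the toy problem at the end (a discrete function with two local minima) is ours, not from the book:

```python
# Variable Neighborhood Search (in the spirit of Algorithm 27.6): restart
# from the first neighborhood structure after each improvement.
def vns(f, x0, local_search, K):
    x_best, k = x0, 1
    while k <= K:
        x_plus = local_search(x_best, k)
        if f(x_plus) < f(x_best):
            x_best, k = x_plus, 1   # improvement: back to the first structure
        else:
            k += 1                  # no improvement: try the next structure
    return x_best

# Toy problem: minimize f(i) = values[i] over the indices 0..8.
values = [5, 4, 3, 4, 5, 2, 1, 2, 9]
f = lambda i: values[i]

def ls(x, k):
    # local search with the neighborhood "indices at distance at most k"
    while True:
        best = min(range(max(0, x - k), min(len(values), x + k + 1)), key=f)
        if f(best) >= f(x):
            return x
        x = best

print(vns(f, 0, ls, K=3))  # → 6 (escapes the local minimum at index 2)
```

Starting from index 0, the size-1 neighborhood gets trapped at the local minimum 2; the size-3 neighborhood then reaches the basin of the global minimum at index 6.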

27.3.1 The knapsack problem


We illustrate the method on the knapsack problem in Example 27.2. In order to
apply the VNS method (Algorithm 27.6), we need to define a series of neighborhood
structures. We use Algorithm 27.5 to generate a neighborhood of size k.
The iterations of the VNS algorithm on Example 27.2 using the neighborhood
structures defined by Algorithm 27.5 (with M = 1,000) are illustrated in Figure 27.13.


Figure 27.13: Iterations of run 1 of Algorithm 27.6 on Example 27.2 using the neighborhood structures defined by Algorithm 27.5.

The solid line shows how the size k of the neighborhood changes from iteration to
iteration. Note that it is reset to k = 1 each time a better feasible solution is found.
The dashed line provides the value of the objective function for the best feasible
solution found so far at each iteration. The first local search allows a value of 284 to
be reached. The mechanism of the VNS allows us to escape from this local minimum,
and reach an objective value of 300, obtained by the selection of items 1, 2, 3, 4, 5, 8,
and 10. There is no guarantee that it is the optimal solution. As it is a randomized
algorithm, it may be appropriate to run the same algorithm several times. In this
case, the algorithm was run 10 times. The final feasible solution was always the same.
The iterations of runs 2 and 3 of this experiment are reported in Figures 27.14 and
27.15.


Figure 27.14: Iterations of run 2 of Algorithm 27.6 on Example 27.2 using the neighborhood structures defined by Algorithm 27.5


Figure 27.15: Iterations of run 3 of Algorithm 27.6 on Example 27.2 using the neighborhood structures defined by Algorithm 27.5

27.3.2 The traveling salesman problem


In order to illustrate the VNS method on the traveling salesman problem, we define a
family of neighborhoods. We propose a neighborhood structure based on the insertion
greedy algorithm (Algorithm 27.2). Given a tour t, a member of the neighborhood
of size k is obtained as follows:


• consider the subtour s consisting of the first k cities of t,
• generate a 2-OPT neighbor s+ of s,
• generate a complete tour t+ using the insertion greedy algorithm (Algorithm 27.2)
starting from s+ .

Figure 27.16 illustrates a tour t of length 156.2 obtained using the insertion greedy
algorithm (Algorithm 27.2) starting with subtour s=1–2–3–7–8–4–12–16–11–1.


Figure 27.16: Tour (length: 156.2) obtained using the insertion greedy algorithm
(Algorithm 27.2) starting with subtour 1–2–3–7–8–4–12–16–11–1

To obtain a neighbor of t, we apply a 2-OPT to s. For instance, we swap cities 16
and 11 to obtain the subtour s+ = 1–2–3–7–8–4–12–11–16–1. Applying the insertion
greedy algorithm (Algorithm 27.2) to s+ , we obtain a tour t+ of length 151.6,
illustrated in Figure 27.17. Note that t+ is shorter than t, although s+ is longer than s
(118.1 instead of 101.8). Such a neighborhood is defined for 2 ≤ k ≤ n.


Figure 27.17: Tour (length: 151.6) obtained using the insertion greedy algorithm
(Algorithm 27.2) starting with subtour 1–2–3–7–8–4–12–11–16–1

This neighborhood structure can be used in Algorithm 27.6 (where the neighborhood
of size 1 is undefined and therefore skipped by the algorithm). The iterations in
the case of Example 27.3 are illustrated in Figure 27.18. The algorithm succeeds in
improving the objective function with a neighborhood of size 2 in the first iteration.
The next improvement is obtained with a neighborhood of size 6, and the last one
with a neighborhood of size 9.


Figure 27.18: Iterations of Algorithm 27.6 on Example 27.3 using the neighborhood
structures introduced in Section 27.3.2 (best feasible solution: 149.2)

The above heuristic is certainly not efficient and has been designed to illustrate
the concepts introduced in this book. Many other neighborhood structures are pos-
sible. In particular, a natural structure for the VNS method applied to the traveling
salesman problem would be a generalization of the 2-OPT neighborhood, such as the
k-OPT neighborhood, where k is the number of cities that are re-organized in the
tour (see Helsgaun, 2009). In their paper, Mladenović and Hansen (1997) illustrate
the efficiency of the VNS heuristic on the traveling salesman problem. They combine
the VNS idea with the GENIUS heuristic proposed by Gendreau et al. (1992).

27.4 Simulated annealing


In metallurgy, annealing is a technique consisting in heating a metal and then
cooling it down slowly in order to improve its properties. By analogy, simulated
annealing is an extension of local search that may accept iterates with a higher value
of the objective function. The “temperature” is a parameter that controls the
probability to accept such iterates. The higher the temperature, the higher the probability
of accepting them.
More specifically, consider the current iterate xk and a neighbor y ∈ N(xk ) of xk .
If f(y) ≤ f(xk ), then y is immediately accepted as the next iterate. If f(y) > f(xk ),
the algorithm accepts y as the next iterate with probability

exp( −(f(y) − f(xk )) / T ), (27.3)

where T > 0 is a parameter. Note that the probability is close to 1 if f(y) is close to
f(xk ) and decreases if the difference between the two values increases. Intuitively, the
algorithm accepts from time to time to proceed uphill to escape from local optima, still
being discouraged by steep paths. The parameter T is the “temperature” parameter
mentioned above. This mechanism is illustrated in Figure 27.19, where the acceptance
probability of various neighbors as a function of the temperature T is represented,
assuming that the value of the objective function at the current iterate is f(xk ) = 3.


Figure 27.19: Simulated annealing: probability for a neighbor y of xk to be accepted, with f(xk ) = 3
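The acceptance rule (27.3) is short enough to verify numerically. The following Python sketch (ours, not from the book) reproduces a few points of the curves of Figure 27.19:

```python
import math

# Acceptance rule (27.3) of simulated annealing: improving neighbors are
# always accepted; worse neighbors are accepted with a probability that
# grows with the temperature T and shrinks with the increase f(y) - f(x_k).
def accept_prob(f_y, f_xk, T):
    if f_y <= f_xk:
        return 1.0
    return math.exp(-(f_y - f_xk) / T)

# A few points of Figure 27.19, where f(x_k) = 3, here at T = 2:
for f_y in (3.5, 4.0, 6.0, 8.0):
    print(f_y, round(accept_prob(f_y, 3.0, 2.0), 3))
# → 3.5 0.779 / 4.0 0.607 / 6.0 0.223 / 8.0 0.082
```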

Note that this extension may be applied to any version of the local search algorithm.
Algorithm 27.7 proposes a version of the simulated annealing method where
the candidate neighbor is selected randomly in the neighborhood.
The performance of the algorithm varies with the values of its parameters: K, that
is, the number of trials for each level of temperature, and T0 , Tf and the sequence
(Tℓ )ℓ , controlling the temperature reduction.
With respect to the temperature reduction mechanism, it is easier to interpret it
in terms of probability of acceptance. Let δt be a typical increase of the objective
function in the neighborhood structure N. In the beginning of the algorithm, we
would like such an increase to be accepted with high probability p0 = 0.999, say. At
the end, it should be accepted with low probability pf = 0.00001, say. If we allow the
algorithm to modify the temperature M times, then for ℓ = 0, . . . , M we have

    Tℓ = −δt / ln( p0 + (pf − p0) ℓ / M ).    (27.4)

As pf − p0 < 0, an increase of m corresponds to a strict decrease of the denominator


and, consequently, a strict decrease of the temperature. Moreover, as 0 ≤ ℓ ≤ M,
the logarithm at the denominator is always negative, and the temperature always
positive, as requested by the algorithm. Here, Tf is defined as −δt / ln pf . In this
way, the decrease of the acceptance probability is linear in ℓ for a given level δt , as
illustrated in Figure 27.20.
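The schedule (27.4) is easy to generate and check numerically. A short Python sketch (the function name is ours):

```python
import math

def temperature_schedule(delta_t, M, p0=0.999, pf=1e-5):
    """Temperatures T_0, ..., T_M of (27.4): the acceptance probability of a
    typical increase delta_t goes linearly from p0 down to pf."""
    return [-delta_t / math.log(p0 + (pf - p0) * l / M) for l in range(M + 1)]

T = temperature_schedule(delta_t=1.0, M=1000)
print(round(T[0], 1), round(T[-1], 4))  # T_0 = -1/ln(0.999), T_f = -1/ln(1e-5)
# At level l, the acceptance probability is exactly p0 + (pf - p0) * l / M:
print(round(math.exp(-1.0 / T[500]), 4))
```

With δt = 1 and M = 1000, T0 ≈ 999.5 and Tf ≈ 0.087, consistent with Figure 27.20.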
Algorithm 27.7: Simulated annealing

 1  Objective
 2    Find a good feasible solution of the optimization problem minx f(x)
      subject to x ∈ F.
 3  Input
 4    The objective function f : Rn → R.
 5    The feasible set F.
 6    An initial feasible solution x0 ∈ F.
 7    A neighborhood structure N.
 8    Initial temperature T0 > 0, final temperature 0 < Tf < T0.
 9    A sequence of temperatures Tℓ, such that 0 < Tℓ+1 < Tℓ for each ℓ, and
      such that there exists L with Tℓ ≤ Tf for each ℓ ≥ L.
10    Number of iterations per level of temperature K.
11  Output
12    A feasible solution x∗.
13  Initialization
14    xc := x0, x∗ := x0, ℓ := 0.
15  Repeat
16    for k := 1 to K do
17      Select randomly y ∈ N(xc) ∩ F
18      δ := f(y) − f(xc)
19      if δ < 0 then
20        xc := y
21      else
22        Select randomly r ∈ R between 0 and 1
23        if r < exp(−δ/Tℓ) then
24          xc := y
25      if f(xc) < f(x∗) then
26        x∗ := xc
27    Reduce the temperature: ℓ := ℓ + 1
28  Until Tℓ ≤ Tf.
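Algorithm 27.7 translates almost line by line into code. The following Python sketch is generic: the neighborhood is passed as a function drawing a random feasible neighbor, and the toy objective at the end is ours, used only to exercise the routine.

```python
import math
import random

def simulated_annealing(f, x0, neighbor, temperatures, K, seed=0):
    """Generic simulated annealing in the spirit of Algorithm 27.7.
    f: objective to minimize; x0: feasible starting point;
    neighbor(x, rng): draws a random feasible neighbor of x;
    temperatures: decreasing sequence T0 > T1 > ... > Tf > 0;
    K: number of iterations per temperature level."""
    rng = random.Random(seed)
    xc, fc = x0, f(x0)
    xbest, fbest = xc, fc
    for T in temperatures:
        for _ in range(K):
            y = neighbor(xc, rng)
            fy = f(y)
            delta = fy - fc
            # accept downhill moves; accept uphill moves with prob. exp(-delta/T)
            if delta < 0 or rng.random() < math.exp(-delta / T):
                xc, fc = y, fy
                if fc < fbest:
                    xbest, fbest = xc, fc
    return xbest, fbest

# Toy test: minimize a paraboloid over integer points with unit moves.
f = lambda x: (x[0] - 3) ** 2 + (x[1] + 1) ** 2
def neighbor(x, rng):
    y = list(x)
    y[rng.randrange(2)] += rng.choice((-1, 1))
    return tuple(y)

Ts = [10.0 * 0.9 ** l for l in range(50)]
xbest, fbest = simulated_annealing(f, (0, 0), neighbor, Ts, K=20)
print(xbest, fbest)
```

The geometric schedule `Ts` is only one possible choice; the schedule (27.4) can be plugged in instead.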
[Figure 27.20: Illustration of the temperature reduction function (27.4) with δt = 1, M = 1000, p0 = 0.999 and pf = 0.00001: the temperature (log scale) decreases while the acceptance probability decreases linearly from p0 to pf.]

27.4.1 The knapsack problem
We illustrate the simulated annealing algorithm on the knapsack problem of Exam-
ple 27.2. We consider the neighborhood structure generated by Algorithm 27.5 with
k = 1, which consists in randomly selecting one item, changing its status from chosen
to unchosen, or the other way around. Algorithm 27.7 is applied with K = 50,
and the temperature defined by (27.4), where δt = 20, M = 100, p0 = 0.999,
and pf = 0.00001. Note that the algorithm is written for a minimization problem,
while the knapsack problem is a maximization problem. The number of iterations is
MK = 5,000. The feasible solution provided by the algorithm consists in including
items 1, 2, 3, 4, 5, 8, and 10, for a total utility of 300 and a total weight of 300.
The iterations of the algorithm are illustrated in Figure 27.21, where the plain line represents the value of the objective function at each iteration, the dashed line is the temperature T, and the dotted line is the best value of the objective function identified so far at each iteration. It clearly appears that the method frequently accepts states with a low utility when the temperature is high, and does so less frequently when T is low. Note also that the best value is reached at iteration 1,421.
The algorithm stays there for 3 iterations and then goes downhill to escape from
that local maximum. It returns to that feasible solution later on, during iteration
3,493. It escapes again from that local maximum and does not find any better feasible
solution afterwards. Eventually, the algorithm converges to a feasible solution with
value 273 and stops there.
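The 1-flip neighborhood itself is simple to code. The sketch below uses a small knapsack instance of our own invention (not the data of Example 27.2); a flip that violates the capacity is rejected and redrawn, and the utility is negated because Algorithm 27.7 minimizes.

```python
import random

# Illustrative instance (not the data of Example 27.2).
utility = [12, 7, 9, 5, 14]
weight = [4, 3, 5, 2, 6]
capacity = 10

def f(x):
    """Objective for a minimization algorithm: minus the total utility."""
    return -sum(u * xi for u, xi in zip(utility, x))

def neighbor_1flip(x, rng):
    """Flip the status of one random item, redrawing if the flip is infeasible."""
    while True:
        i = rng.randrange(len(x))
        y = list(x)
        y[i] = 1 - y[i]
        if sum(w * yi for w, yi in zip(weight, y)) <= capacity:
            return tuple(y)

rng = random.Random(1)
x = (0, 0, 0, 0, 0)
for _ in range(200):
    x = neighbor_1flip(x, rng)  # a random walk in the feasible 1-flip graph
print(x, -f(x))  # a feasible selection and its total utility
```

Removing an item is always feasible, so the redraw loop terminates.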
For the sake of comparison, the algorithm is run with K = 1,000 and M = 5. That
is, the temperature is modified only 5 times, and each time a total number of 1,000
candidates are tested. The iterations are illustrated in Figure 27.22, using the same
convention as before.
[Figure 27.21: Iterations of the simulated annealing algorithm on the knapsack problem (K = 50, M = 100): current value uT xc, best value uT x∗, and temperature T.]

[Figure 27.22: Iterations of the simulated annealing algorithm on the knapsack problem (K = 1,000, M = 5).]

The same feasible solution is obtained, but is reached earlier (iteration 282), when
the temperature is equal to T = 89.22. When the temperature has reached 1.74, the
algorithm cannot escape from the local maximum anymore, at the value 257. It is in
general not a good idea to drop the temperature too quickly.
Finally, the algorithm is run with K = 5 and M = 1,000. That is, the temperature
is modified 1,000 times, and each time 5 candidates are tested. The iterations are
illustrated in Figure 27.23, using the same convention as before. The same feasible
solution is obtained and reached at iteration 4,206, when the temperature is equal to
T = 16.6.
It is good practice to test different values of the parameters on small instances of
a problem before running it on large instances.
[Figure 27.23: Iterations of the simulated annealing algorithm on the knapsack problem (K = 5, M = 1,000).]

27.4.2 The traveling salesman problem

We illustrate the simulated annealing algorithm on the traveling salesman problem of Example 27.3. We consider the 2-OPT neighborhood structure described in Section 27.2.2. Algorithm 27.7 is applied with K = 50, and the temperature defined
by (27.4), where δt = 5, M = 100, p0 = 0.999 and pf = 0.00001. The number of
iterations is MK = 5,000. The evolution of the value of the objective function at the
current iterate and the best iterate, as well as the temperature, are represented in
Figure 27.24.
[Figure 27.24: Iterations of the simulated annealing algorithm on the traveling salesman problem (K = 50, M = 100): current value f(xc), best value f(x∗), and temperature T.]
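A 2-OPT move reverses the segment of the tour between two randomly chosen positions. A Python sketch on a small symmetric instance of our own (not the 16 cities of Example 27.3):

```python
import random

def two_opt_neighbor(tour, rng):
    """Reverse a randomly chosen segment of the tour (a 2-OPT move)."""
    n = len(tour)
    i, j = sorted(rng.sample(range(n), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def tour_length(tour, d):
    """Length of the closed tour under the distance matrix d."""
    return sum(d[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

# Small symmetric instance of ours (4 cities).
d = [[0, 2, 9, 4],
     [2, 0, 6, 3],
     [9, 6, 0, 5],
     [4, 3, 5, 0]]
rng = random.Random(0)
t = (0, 1, 2, 3)
print(tour_length(t, d), tour_length(two_opt_neighbor(t, rng), d))
```

Every 2-OPT neighbor is again a permutation of the cities, so the move preserves feasibility.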
The feasible solution provided by the algorithm has length 149.2 and is represented
in Figure 27.25.
The algorithm has also been run with K = 1,000 and M = 5 (Figure 27.26) and
with K = 5 and M = 1,000 (Figure 27.27). In both cases, the feasible solution reached
length 150.7.
[Figure 27.25: A feasible solution provided by the simulated annealing algorithm on Example 27.3 (length: 149.2).]

Note that these results are based on only one execution of the algorithm. As the
algorithm is randomized, its outcome varies from run to run. Applying the algorithm
100 times with K = 50, M = 100, we obtain 53 times the value 149.2, and 26 times
the value 150.7. The complete results are reported in Table 27.8.
Simulated annealing can of course be applied to any type of neighborhood. For
instance, Alfa et al. (1991) use a 3-OPT neighborhood in this context.
[Figure 27.26: Iterations of the simulated annealing algorithm on the traveling salesman problem (K = 1,000, M = 5).]

[Figure 27.27: Iterations of the simulated annealing algorithm on the traveling salesman problem (K = 5, M = 1,000).]

f(x∗)    No. of runs        f(x∗)    No. of runs
149.2    53                 151.7     1
149.9     5                 152.3     1
150.7    26                 154.1     3
151.1     7                 154.4     1
151.1     3

Table 27.8: Value of the objective function for 100 runs of the simulated annealing algorithm
27.5 Conclusion
Heuristics play an important role in optimization because, for some problems, they are the only possible way of tackling them. This is the only family of methods presented in this book that is not supported by a rigorous theoretical framework. Instead, we have presented various examples of problems and algorithms to illustrate the main concepts. Creativity, as well as intense experimentation, is necessary to obtain methods that are useful to practitioners.

27.6 Project
The general organization of the projects is described in Appendix D.

Objective

The objective of the project is to analyze how different heuristics handle different
optimization problems.

Approach

1. For each problem,
• design several neighborhood structures (at least 3),
• identify a feasible solution as starting point,
• apply two versions of the local search algorithm (Algorithms 27.3 and 27.4),
• apply the variable neighborhood search method with the neighborhood struc-
tures defined above (Algorithm 27.6),
• apply the simulated annealing method with each of the neighborhood struc-
tures defined above (Algorithm 27.7).
2. Report, for each run, the evolution of the objective function with the iterations,
and the best value found.
3. Compare it to the optimal solution (when available).
4. Select the algorithm that has found the worst solution, and propose some variants to improve it.

Algorithms

Algorithms 27.3, 27.4, 27.6 and 27.7.

Problems

Exercise 27.1. Solve the instance of the problem of locating plants for the supply
of energy described in Example 25.1, with 10 sites and 3 cities, using the data in
Table 25.1.
Exercise 27.2. Solve the knapsack problem presented in Example 27.2.
Exercise 27.3. Solve the traveling salesman problem with 16 cities presented in Example 27.3.
Exercise 27.4. Solve the task assignment problem of Exercise 25.1.
Exercise 27.5. Solve the scheduling problem of Exercise 25.2.
Exercise 27.6. Solve the graph coloring problem of Exercise 25.3. Also write the version of the problem where you must color the cantons of Switzerland.
Exercise 27.7. Solve the bin packing problem of Exercise 25.4.
Exercise 27.8. Solve the vehicle routing problem of Exercise 25.5 with Q = 100,000, Q = 150,000, and Q = 200,000.

Part VIII

Appendices

Appendix A

Notations

The book uses standard notations in linear algebra and analysis. We provide here
some further details that the reader may find useful.
Positive / negative A number x is positive if x > 0. A number x is negative if x < 0. Zero is neither positive nor negative. We refer to a non negative number if x ≥ 0, and to a non positive number if x ≤ 0.
Vectors Vectors of Rn are column vectors, represented with lowercase letters, such as

    x = (x1, . . . , xn)T,

that is, a column whose entries are x1, . . . , xn. The notation xk refers to the kth entry of vector x. When included in the core of the text, the notation x = (x1 . . . xn)T is used, where the superscript T refers to “transposed.” The inner product of x ∈ Rn and y ∈ Rn is denoted by xT y.
Matrices Matrices of Rm×n have m rows and n columns and are represented by uppercase letters, such as A = (aij), i = 1, . . . , m, j = 1, . . . , n. The notation aij refers to the entry at row i and column j. The multiplication of two matrices A ∈ Rm×p and B ∈ Rp×n is denoted by AB ∈ Rm×n, and is such that

    (AB)ij = Σ_{k=1}^{p} Aik Bkj ,   i = 1, . . . , m, j = 1, . . . , n.

As a vector x ∈ Rn is a column vector, it is considered as a matrix of dimension n × 1.
The inverse of matrix A is denoted by A−1, the transpose of matrix A is denoted by AT, and the inverse of the transpose of matrix A is denoted by A−T.
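The entrywise definition of the product translates directly into code; a pure-Python sketch (function names are ours):

```python
def matmul(A, B):
    """(AB)_ij = sum_k A_ik B_kj for A in R^{m x p}, B in R^{p x n}."""
    m, p, n = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(n)]
            for i in range(m)]

def inner(x, y):
    """Inner product x^T y of column vectors x, y in R^n."""
    return sum(xi * yi for xi, yi in zip(x, y))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))           # [[19, 22], [43, 50]]
print(inner([1, 2], [5, 7]))  # 19
```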
min and argmin Consider the function f : Rn → R and the set of constraints X ⊆ Rn. The equation

    f∗ = min_{x∈X} f(x)    (A.1)

means that

    f∗ ≤ f(x), ∀x ∈ X,    (A.2)

and f∗ ∈ R represents the value of the objective function at a minimum. The equation

    x∗ ∈ argmin_{x∈X} f(x)    (A.3)

means that

    x∗ ∈ {x ∈ X | f(x) ≤ f(y), ∀y ∈ X}    (A.4)

and x∗ belongs to the set of minima. In an algorithmic context, x∗ usually refers to the minimum returned by the algorithm under consideration. When there is a unique minimum, the equation can be written

    x∗ = argmin_{x∈X} f(x).    (A.5)
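On a finite set, the distinction between min and argmin is immediate to illustrate in code (the function and the set here are illustrative):

```python
X = [-2, -1, 0, 1, 2]
f = lambda x: (abs(x) - 1) ** 2

f_star = min(f(x) for x in X)              # the value, as in (A.1)
argmin = [x for x in X if f(x) == f_star]  # the set of minima, as in (A.3)
print(f_star, argmin)  # 0 [-1, 1]: two minima, so argmin is not a singleton
```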

Iterations Most algorithms presented in the book are iterative algorithms. In this
context, the notation xk ∈ Rn is used to refer to the value of the iterate at
iteration k. It is a vector of dimension n. In general, there is no ambiguity about
the meaning of the notation xk as representing the kth entry of x or iterate k of
the algorithm. In the former case, it is a real number, in the latter, a vector of
Rn . In the event of a possible ambiguity, the exact meaning is made explicit.
Appendix B

Definitions

Definition B.1 (Vector norms). A vector norm on Rn is a function ‖ · ‖ : Rn → R satisfying the following conditions.
1. ‖x‖ ≥ 0 for all x ∈ Rn.
2. ‖x‖ = 0 if and only if x = 0.
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖, for all x, y ∈ Rn.
4. ‖αx‖ = |α| ‖x‖, for all α ∈ R, x ∈ Rn.
Consider x ∈ Rn and p ≥ 1. The p-norm of x is defined by

    ‖x‖p = ( Σ_{i=1}^{n} |xi|^p )^{1/p} .

When p tends toward infinity, we get

    ‖x‖∞ = max_{i=1,...,n} |xi| .

When p = 2, the norm ‖ · ‖2 is called the Euclidean norm.
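A small Python check of these definitions (function names are ours):

```python
import math

def p_norm(x, p):
    """||x||_p = (sum_i |x_i|^p)^(1/p), for p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def inf_norm(x):
    """||x||_inf = max_i |x_i|, the limit of ||x||_p as p tends to infinity."""
    return max(abs(xi) for xi in x)

x = [3, -4]
print(p_norm(x, 1))   # 7.0
print(p_norm(x, 2))   # 5.0 (Euclidean norm)
print(inf_norm(x))    # 4
print(p_norm(x, 50))  # already very close to the infinity norm
```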
Definition B.2 (Convex set). A set X ⊆ Rn is called convex if for all x ∈ X and for all y ∈ X, we have

    λx + (1 − λ)y ∈ X,

for all 0 ≤ λ ≤ 1.
Definition B.3 (Convex combination). Consider the vectors y1, . . . , yk of Rn. We say that a vector x of Rn is a convex combination of y1, . . . , yk if there exist λ1, . . . , λk ∈ R such that

    x = Σ_{j=1}^{k} λj yj ,    (B.1)

with λj ≥ 0, j = 1, . . . , k, and

    Σ_{j=1}^{k} λj = 1 .    (B.2)
The set of all convex combinations of y1 , . . . , yk is called the convex hull of the
vectors y1 , . . . , yk .
Definition B.4 (Convex cone). The set C ⊆ Rn is a convex cone if, for any x, y ∈ C and any αx, αy > 0, we have αx x + αy y ∈ C.
Definition B.5 (Continuous function). Consider f : X ⊆ Rn → Rm and x0 ∈ X. The function f is continuous at x0 if and only if

    lim_{x→x0} f(x) = f(x0) ,    (B.3)

i.e., if and only if, for all ε ∈ R, ε > 0, there exists η > 0 such that

    ‖x − x0‖ < η and x ∈ X =⇒ ‖f(x) − f(x0)‖ < ε .    (B.4)

Definition B.6 (Strictly unimodal function). Consider f : [0, T] → R. The function f is strictly unimodal on the interval [0, T] if it has a unique global minimum x∗ in [0, T], and if the following conditions are verified:
1. for each x1, x2 such that x1 < x2 < x∗, we have f(x1) > f(x2) > f(x∗),
2. for each x1, x2 such that x∗ < x1 < x2, we have f(x∗) < f(x1) < f(x2),
that is, the function decreases on the left of x∗ and increases on its right.
Definition B.7 (Eigenvalues and eigenvectors). Consider a square matrix A ∈ Rn×n .
The eigenvalues of A are the roots of its characteristic polynomial
p(z) = det (zI − A) ,
where I is the identity matrix of dimension n. If λ is an eigenvalue of A, the non zero
vectors x ∈ Rn such that
Ax = λx
are called eigenvectors.
Definition B.8 (Positive semidefinite matrix). The square matrix A ∈ Rn×n is
called positive semidefinite when
xT Ax ≥ 0 , ∀x ∈ Rn . (B.5)
If, moreover, A is symmetric, then none of its eigenvalues is negative.
Definition B.9 (Positive definite matrix). The square matrix A ∈ Rn×n is called
positive definite when
xT Ax > 0 , ∀x ∈ Rn , x ≠ 0 .    (B.6)
If, moreover, A is symmetric, all its eigenvalues are positive.
Definition B.10 (Orthogonal matrix). The square matrix A ∈ Rn×n is orthogonal
if and only if
AT A = AAT = I. (B.7)
Equivalently, its transpose is equal to its inverse
AT = A−1 . (B.8)
Definition B.11 (Minor). Consider the element aij of a square matrix A ∈ Rn×n .
The minor of aij is the matrix obtained by removing row i and column j from A.
Definition B.12 (Cofactor matrix). Consider the element aij of a square matrix
A ∈ Rn×n . The cofactor of aij is the determinant of the minor of aij multiplied by
(−1)i+j . The cofactor matrix is the matrix such that its element (i, j) is the cofactor
of aij .
Definition B.13 (Determinant). The determinant of a square matrix A ∈ Rn×n is defined as

    det(A) = Σ_{σ∈Pn} sgn(σ) Π_{i=1}^{n} aiσi ,    (B.9)

where Pn is the set of all permutations of the set {1, . . . , n}, and the sign of a permutation σ is

    sgn(σ) = (−1)^M ,    (B.10)

where M is the number of pairwise swaps that are required to obtain σ from {1, . . . , n}. The following recursive definition is more adequate to calculate the determinant. If A ∈ R1×1, that is, if A contains only one element a11, then det(A) = a11. If A ∈ Rn×n, then

    det(A) = Σ_{j=1}^{n} (−1)^{1+j} a1j det(A1j) = Σ_{j=1}^{n} a1j C1j ,    (B.11)

where A1j is the minor of a1j, and C1j is the cofactor of a1j (see Definition B.12).
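The recursive definition (B.11) gives a direct, if inefficient, implementation. A Python sketch (names are ours):

```python
def det(A):
    """Determinant by cofactor expansion along the first row, as in (B.11)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Minor of a_{1j}: remove row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # (-1)**j is the sign (-1)^{1+j} with 0-based column index j.
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[2]]))                              # 2
print(det([[1, 2], [3, 4]]))                   # -2
print(det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))  # 24
```

The cost grows as n!, so this is only for illustration; practical codes use factorizations instead.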
Definition B.14 (Unimodular matrix). A unimodular matrix is a square integer
matrix with determinant equal to 1 or −1.

Rudolf Otto Sigismund Lipschitz was born on May 14, 1832, in Bönkein, close to Königsberg, now Kaliningrad (Russia), and died in Bonn (Germany) on October 7, 1903. Lipschitz was a student of Dirichlet in Berlin. He contributed significantly to the progress of knowledge in fields as diverse as number theory, the theory of Bessel functions and Fourier series, ordinary and partial differential equations, analytical mechanics, and the theory of harmonic functions. He is particularly known for the condition that bears his name (Definition B.15).

Figure B.1: Rudolf Otto Sigismund Lipschitz

Definition B.15 (Lipschitz condition). In a metric space E, a function f satisfies the Lipschitz condition of order a > 0, with a constant k > 0, if for all (x, y)

    d( f(x), f(y) ) ≤ k d(x, y)^a ,

where d(x, y) is the distance between x and y. When a = 1, the function is called a Lipschitz function. If, moreover, k < 1, the function is called contracting. A Lipschitz function is uniformly continuous on E.
Definition B.16 (Lipschitz continuity). Consider f : X ⊆ Rn → Rm. The function f is Lipschitz continuous on X if there exists a constant M > 0 such that, for all x, y ∈ X, we have

    ‖f(x) − f(y)‖m ≤ M ‖x − y‖n ,    (B.12)

where ‖ · ‖m is a norm on Rm and ‖ · ‖n is a norm on Rn. If M = 0, then f is constant on X, i.e., there exists c such that f(x) = c, ∀x ∈ X.
Definition B.17 (Landau notation o( · )). Let f and g be two functions of R → R, with f(x) ≠ 0, ∀x. The Landau notation g(x) = o(f(x)) signifies that

    lim_{x→0} g(x)/f(x) = 0 .    (B.13)

By abuse of language, we say that g(x) tends toward zero faster than f(x).
Definition B.18 (Cholesky decomposition). Let A ∈ Rn×n be a positive definite
symmetric matrix. The Cholesky decomposition of A is

A = LLT , (B.14)

where L ∈ Rn×n is a lower triangular matrix.
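The factor L can be computed column by column from (B.14). A Python sketch of the standard recurrence (names are ours; it assumes A is symmetric positive definite, and `math.sqrt` raises a domain error otherwise):

```python
import math

def cholesky(A):
    """Lower triangular L with A = L L^T, for A symmetric positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)  # fails if A is not positive definite
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

A = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky(A)
print(L)  # approximately [[2.0, 0.0], [1.0, 1.4142...]]
```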
Definition B.19 (Convergence of a sequence). Let (xk)k be a sequence of points of Rn. We say that the sequence (xk)k converges toward x if for all ε > 0, there exists an index k̂ such that

    ‖xk − x‖ ≤ ε , ∀k ≥ k̂ .    (B.15)

We thus write

    lim_{k→+∞} xk = x .    (B.16)

Definition B.20 (Limit point of a sequence). Let (xk)k be a sequence of points of Rn. We say that x is an accumulation point or a limit point of the sequence if there exists a subsequence (xki)i that converges toward x.
Definition B.21 (Semi-continuity). Consider X ⊆ Rn and let f : X → R be a function of real values. f is called lower semi-continuous in x ∈ X if for all sequences (xk)k of elements of X converging toward x, we have

    f(x) ≤ lim inf_{k→∞} f(xk) .    (B.17)

Definition B.22 (Coercive function). Consider X ⊆ Rn and let f : X → R be a function of real values. f is called coercive if for all sequences (xk)k of elements of X such that ‖xk‖ → +∞ for any norm, we have

    lim_{k→∞} f(xk) = +∞ .    (B.18)
Definition B.23 (Compact set). Let S be a subset of a metric set. S is compact if for all sequences (xk)k of elements of S, there exists a subsequence converging toward an element of S. If the metric set is of finite dimension (as Rn), S is compact if and only if S is closed and bounded.
Definition B.24 (Equivalence relation). Let X be a set and R a relation, that is, a collection of ordered pairs of elements of X. The relation R is an equivalence relation if the following properties are satisfied.
1. Reflexivity: (x, x) ∈ R, for all x ∈ X.
2. Symmetry: (x, y) ∈ R =⇒ (y, x) ∈ R, for all x, y ∈ X.
3. Transitivity: (x, y) ∈ R and (y, z) ∈ R =⇒ (x, z) ∈ R, for all x, y, z ∈ X.
If R is an equivalence relation, the notation x ≡ y means that (x, y) ∈ R.

Definition B.25 (Equivalence class). Let X be a set and R be an equivalence relation on X. An equivalence class is a subset of the form {x ∈ X | (x, r) ∈ R}, where r is a representative element of the class. Note that any element of the class can be its representative, by symmetry of the equivalence relation.
Definition B.26 (Frobenius norm). Consider A ∈ Rm×n. The Frobenius norm of A is

    ‖A‖F = ( Σ_{i=1}^{m} Σ_{j=1}^{n} aij^2 )^{1/2} .

Definition B.27 (Induced norm). Let ‖ · ‖ be a vector norm on Rn. The matrix norm ‖ · ‖m×n on Rm×n defined by

    ‖A‖m×n = max_{x∈Rn, x≠0} ‖Ax‖ / ‖x‖    (B.19)

is the matrix norm induced by the vector norm.
Definition B.28 (Singular values). Let A ∈ Rm×n. There exist orthogonal matrices U ∈ Rm×m and V ∈ Rn×n such that

    UT A V = diag(σ1, . . . , σp) ,

where p = min(m, n). The σi are called singular values of A.


Definition B.29 (Rank of a matrix). Let A ∈ Rm×n be a matrix and

Im(A) = y ∈ Rm | ∃x ∈ Rn t.q. y = Ax

the subspace generated by the matrix A. The rank of A is the dimension of this
subspace. It is equal to the number of singular values of A that are non zero.

Appendix C

Theorems

Theorem C.1 (First-order Taylor theorem). Let f : Rn → R be a continuously differentiable function on an open sphere S centered in x. Then,
• for all d such that x + d ∈ S, we have

    f(x + d) = f(x) + dT ∇f(x) + o(‖d‖) ,    (C.1)

• for all d such that x + d ∈ S, there exists α ∈ [0, 1] such that

    f(x + d) = f(x) + dT ∇f(x + αd) .    (C.2)

The result (C.2) is also called the mean value theorem.


Theorem C.2 (Second-order Taylor theorem). Let f : Rn → R be a twice differen-
tiable function on an open sphere S centered in x. Then,
• for all d such that x + d ∈ S, we have
1 T 2 
f(x + d) = f(x) + dT ∇f(x) + d ∇ f(x)d + o kdk2 , (C.3)
2
• for all d such that x + d ∈ S, there exists α ∈ [0, 1] such that

1 T 2
f(x + d) = f(x) + dT ∇f(x) + d ∇ f(x + αd)d . (C.4)
2
Theorem C.3 (Chain rule differentiation). Consider f : Rm → Rn, g : Rn → Rp and h : Rm → Rp such that h(x) = g(f(x)). Then,

    ∇h(x) = ∇f(x) ∇g(f(x)) , ∀x ∈ Rm ,    (C.5)

where ∇f : Rm → Rm×n, ∇g : Rn → Rn×p and ∇h : Rm → Rm×p. When f is linear, i.e., when f(x) = Ax with A ∈ Rn×m, we have

    ∇h(x) = AT ∇g(Ax) .    (C.6)
Theorem C.4 (Rayleigh-Ritz theorem). Let A ∈ Rn×n be a real symmetric matrix. Let λ1 be the largest eigenvalue of A and λn the smallest. Then,

    λ1 = max_{x≠0} (xT Ax)/(xT x)    (C.7)

and

    λn = min_{x≠0} (xT Ax)/(xT x) .    (C.8)

Theorem C.5 (Symmetric Schur decomposition). Let A ∈ Rn×n be a symmetric matrix. Then there exists an orthogonal matrix Q ∈ Rn×n such that

    QT A Q = Λ = diag(λ1, . . . , λn) .    (C.9)

Also, for each column Qk of Q,

    A Qk = λk Qk ,    (C.10)

so that λk is an eigenvalue of A, and Qk the corresponding eigenvector.


Theorem C.6 (Implicit functions). Let f : Rn × Rm → Rn be a continuous func-
tion. Consider x+ ∈ Rn and y+ ∈ Rm such that

f(x+ , y+ ) = 0 (C.11)

and such that the gradient matrix ∇y f(x, y) is continuous and non singular in
a neighborhood of (x+ , y+ ). Then, there exist neighborhoods Vx+ and Vy+ of x+
and y+ , respectively, as well as a continuous function

φ : Vx+ −→ Vy+ (C.12)

such that
y+ = φ(x+ ) (C.13)
and 
f x, φ(x) = 0 , ∀x ∈ Vx+ . (C.14)
The function φ is unique in the sense that any (x, y) ∈ Vx+ × Vy+ such that
f(x, y) = 0 also satisfies y = φ(x). If, moreover, f is differentiable, then so is φ
and  −1
∇φ(x) = −∇x f x, φ(x) ∇y f x, φ(x) , ∀x ∈ Vx+ . (C.15)

Theorem C.7 (Projection on the kernel of a matrix). Let A ∈ Rm×n be a matrix of full rank. Then, the matrix

    P = I − AT (A AT)−1 A    (C.16)

is the projection operator on the kernel of A, i.e., we have A P y = 0 for all y ∈ Rn.
Theorem C.8 (Convexity of polyhedra). All polyhedra are convex sets.

Lemma C.9 (Farkas' lemma). Consider A ∈ Rm×n and b ∈ Rm. Then, exactly one of the following two statements holds:
1. There exists x ∈ Rn, x ≥ 0, such that Ax = b.
2. There exists p ∈ Rm such that AT p ≥ 0 and pT b < 0.

Lemma C.10 (Farkas' lemma (equivalent formulation)). Consider the linear system of inequalities Ax ≤ b, where A ∈ Rm×n, x ∈ Rn and b ∈ Rm. The system has a solution if and only if λT b ≥ 0 for all λ ∈ Rm such that λ ≥ 0 and λT A = 0.
Theorem C.11 (Newton’s theorem). Let f : Rn → Rm be a continuously differ-
x, x+ ∈ X,
entiable function on an open convex X ⊂ Rn . For all b
Z1 Z x+
+ +
T
f(x ) − f(b
x) = ∇f b x) (x+ − b
x + t(x − b x) dt = ∇f(z) dz . (C.17)
0 b
x

Theorem C.12 (Bound on an integral). Let f : X ⊂ Rn → Rm×n, where X is an open convex set, and consider x and x + d in X. Then, if f is integrable on [x, x + d],

    ‖ ∫_0^1 f(x + td) d dt ‖ ≤ ∫_0^1 ‖f(x + td) d‖ dt ≤ ∫_0^1 ‖f(x + td)‖ ‖d‖ dt ,    (C.18)

where ‖ · ‖ is a norm on Rm×n.


Theorem C.13 (Cauchy-Schwarz inequalities). Consider x and y ∈ Rn . Then,

xT y ≤ kxk2 kyk2 , (C.19)


T
x y ≤ kxk1 kyk∞ . (C.20)

Theorem C.14 (Matrix norms). The matrix norms ‖ · ‖2 (Definition B.27) and ‖ · ‖F (Definition B.26) satisfy the following properties:
1. Consider A and B ∈ Rn×n. Then ‖AB‖F ≤ ‖A‖F ‖B‖F .    (C.21)
2. Consider A and B ∈ Rn×n. Then ‖AB‖2 ≤ ‖A‖2 ‖B‖2 .    (C.22)
3. Consider A and B ∈ Rn×n. Then ‖AB‖F ≤ min( ‖A‖2 ‖B‖F , ‖A‖F ‖B‖2 ) .    (C.23)
4. Consider A ∈ Rn×n and x ∈ Rn. Then ‖Ax‖2 ≤ ‖A‖F ‖x‖2 .    (C.24)
5. Consider x and y ∈ Rn. Then ‖xyT‖F = ‖xyT‖2 = ‖x‖2 ‖y‖2 .    (C.25)

Theorem C.15 (Cramer’s rule). Consider an invertible square matrix A ∈ Rn×n .


Ce document est la propriété exclusive de Kavyaa Kannan (kk392@snu.edu.in) - jeudi 18 avril 2024 à 07h48

Then,
1
A−1 = C(A)T , (C.26)
det(A)
where C(A)T is the cofactor matrix of A (see Definition B.12).
Theorem C.16 (Inverse of a perturbed matrix). Let ‖ · ‖ be a norm on Rn×n satisfying the conditions

    ‖AB‖ ≤ ‖A‖ ‖B‖ and ‖I‖ = 1 .    (C.27)

Let A be a non singular matrix and let us take B such that

    ‖A−1(B − A)‖ < 1 .    (C.28)

Then, B is non singular and

    ‖B−1‖ ≤ ‖A−1‖ / ( 1 − ‖A−1(B − A)‖ ) .    (C.29)

Theorem C.17 (Sherman-Morrison-Woodbury formula). Let A ∈ Rn×n be a square non singular matrix, and let us take U and V ∈ Rn×p, with 1 ≤ p ≤ n. Then, the matrix

    B = A + UVT    (C.30)

is invertible and

    B−1 = A−1 − A−1U ( I + VT A−1U )−1 VT A−1 .    (C.31)
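For p = 1, the formula reduces to the classical Sherman-Morrison rank-one update. A pure-Python check on a 2 × 2 example of our own (function names are ours):

```python
def inv2(A):
    """Inverse of a 2x2 matrix via Cramer's rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def sherman_morrison(Ainv, u, v):
    """(A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u),
    the p = 1 case of (C.31), for 2x2 matrices."""
    Au = [sum(Ainv[i][k] * u[k] for k in range(2)) for i in range(2)]
    vA = [sum(v[k] * Ainv[k][j] for k in range(2)) for j in range(2)]
    denom = 1.0 + sum(v[k] * Au[k] for k in range(2))
    return [[Ainv[i][j] - Au[i] * vA[j] / denom for j in range(2)] for i in range(2)]

A = [[3.0, 1.0], [1.0, 2.0]]
u, v = [1.0, 0.0], [0.0, 1.0]   # the rank-one update adds 1 to entry (1, 2)
B = [[3.0, 2.0], [1.0, 2.0]]    # B = A + u v^T
Binv = sherman_morrison(inv2(A), u, v)
print(Binv)
```

The point of the update is that, once A−1 is known, B−1 is obtained without refactorizing B.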

Theorem C.18 (Positive definite matrix). Let A and B be two symmetric matrices
such that B is positive semidefinite and A is positive definite in the kernel of B,
i.e., that xT Ax > 0 for all non zero x such that xT Bx = 0. Then, there exists
c̄ ∈ R such that A + cB is positive definite for all c > c̄.
Appendix D

Projects

D.1 General instructions
The aim of the projects is to implement the different algorithms described in the book
and test them on various problems. It is advisable to use a mathematical program-
ming language. The Octave language (Eaton, 1997) has been used for all examples
presented in this book. If a language such as C, C++, or Fortran is preferred, it is
useful to have a library to manage the linear algebra, such as LAPACK (Anderson
et al., 1999).
Here is some general advice for preparing these projects.
• It is important to define the interface conventions (format of the data, transfer of
vectors and matrices, etc.) between algorithms and problems, and to follow them
strictly during the projects. The implementation of these interfaces depends on
the programming language.
• Each description of an algorithm explicitly identifies the input and the output. It
is wise to be guided by this information for the implementation.
• Certain problems to solve and certain algorithms appear in several projects. It is
therefore recommended to isolate the different modules of the programs in order
to be able to reuse them.
• It is recommended to first perform a test on the examples described in the book
to debug your programs. The details of the iterations are given for each example.
• It is often instructive to vary the parameters of the different algorithms in order
to understand their role. During the implementation, it is inadvisable to define
the value of these parameters in the code. It is better to read these values in a
file, which can easily be modified during testing. The extra time devoted to the
programming is largely outweighed by the time saved by these tests.
• Whether it is to identify a programming error or to analyze the behavior of the
algorithm, it is important to keep note in a file of the information related to
each iteration of the algorithms (current iterate, value of the objective function,
gradient norm, constraint norm, etc.) It is advisable to keep the presentation of
this file neat so that it can be easily read or easily imported into another software
(spreadsheet, database, visualization software, etc.)
• The use of a visualization software for the functions and level curves is recom-
mended. The freeware Gnuplot (www.gnuplot.info) has been used in this book.
• One must be attentive when it comes to numeric problems. Computers operate
with what is called finite arithmetic, in the sense that only a finite set of real
numbers can be represented. One of the consequences is that the result of the
operation 1 + ε can be 1, even if ε > 0. The smallest value of ε such that 1 + ε ≠ 1
is called the ε-machine and depends on the representation of real numbers in the
employed processor. Typically, for a representation in double precision, the ε-
machine is on the order of 10−16 (see Algorithm 7.1 and the discussions about
it).
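The finite-arithmetic remark above can be checked directly. The following minimal sketch, in the spirit of (but not identical to) Algorithm 7.1, estimates the machine epsilon by successive halving; Python floating-point numbers are IEEE double precision, and the function name is only illustrative.

```python
def machine_epsilon():
    # Halve eps until adding half of it to 1.0 no longer changes the result:
    # the last value with 1.0 + eps != 1.0 is the machine epsilon.
    eps = 1.0
    while 1.0 + eps / 2.0 != 1.0:
        eps /= 2.0
    return eps

eps = machine_epsilon()
print(eps)  # 2.220446049250313e-16 in IEEE double precision
```

Any tolerance used in a stopping criterion should be chosen well above this value.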
• Error handling is important. For example, if the algorithm tries to invert a
  singular matrix, this situation must be properly detected and an adequate error
  message should be displayed.
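The advice on error handling can be sketched on a tiny example (a minimal Python sketch; the function name and the messages are purely illustrative):

```python
def solve_2x2(a, b, c, d, e, f):
    """Solve the system [[a, b], [c, d]] x = (e, f) by Cramer's rule,
    raising a clear error when the matrix is singular."""
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("singular matrix: the system cannot be solved")
    return ((e * d - b * f) / det, (a * f - c * e) / det)

try:
    solve_2x2(1.0, 2.0, 2.0, 4.0, 1.0, 1.0)  # second row is twice the first
except ValueError as err:
    # Report the problem explicitly instead of letting the program crash
    # or, worse, continue the iterations with meaningless numbers.
    message = f"Error detected: {err}"
print(message)
```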

D.2 Performance analysis


It is instructive to compare the performances of various algorithms on several different
problems. However, the analysis and synthesis of the results can be tedious. Here
we present an analysis method proposed by Dolan and Moré (2002), which enables a
large number of results to be synthesized.
The first thing to do is to choose which performance measure to consider. In
general, this includes the calculation time, the number of iterations, or the number
of evaluations of the objective function. It is essential to choose a measure that
is comparable from one algorithm to another and from one problem to another. For
instance, if two algorithms converge toward different solutions, comparing the number
of iterations makes little sense. Moreover, if one algorithm utilizes derivatives and
the other does not, comparing the number of function evaluations is not representative
of their relative performance.
Let τp,a be the performance of the algorithm a for solving the problem p. Without
loss of generality, we take as a convention here that τp,a < τp,b signifies that algorithm
a is better than algorithm b for solving problem p. Define τp,a = +∞ if algorithm a is
not able to solve problem p. For each problem, we can identify the best performance,
i.e.,
Tp = min_a τp,a.

If Tp = +∞, no algorithm can solve this problem. We then normalize the performance
indices by defining

ρp,a = τp,a / Tp  if τp,a ≠ +∞,  and  ρp,a = R  otherwise,
where R is sufficiently large, in the sense that R > ρp,a for any p and a such that
τp,a 6= +∞. The quantity ρp,a represents the performance of algorithm a on problem
p, compared to the best algorithm among those tested. For each algorithm, we
consider the performance function, defined by

Pa : [1, ∞[ → [0, 1] : π ↦ Pr(ρp,a ≤ π),

where Pr(ρp,a ≤ π) is the proportion of problems for which ρp,a ≤ π. If π = 1, it
is the proportion of problems for which algorithm a is the best, which enables us to
measure the pure performance. If π ≥ R, it is the proportion of problems that have
been solved by algorithm a, independently of the performance, which enables us to
measure the robustness of the method. The intermediate values of π enable us to
analyze the trade-off between efficiency and robustness. The faster this function
increases, the better the algorithm.

We illustrate these concepts with an example where two algorithms are tested on
10 problems. The performances (τp,a ) are listed in Table D.1.

Table D.1: Example of performances for two algorithms: τp,a


                          Problems
Algorithms    1    2    3    4    5    6    7    8    9   10
Algo A       20   10    ∞   10    ∞   20   10   15   25    ∞
Algo B       10   30   70   60   70   80   60   75    ∞    ∞
Tp           10   10   70   10   70   20   10   15   25    ∞

After normalization, the relative performances (ρp,a ) are given in Table D.2. Note
that R = 10 in this case. We could have chosen any value such that R > 6. Finally, the
function Pa for each algorithm is shown in Figure D.1. In this example, algorithm A
turns out to be more efficient than algorithm B, but the latter is slightly more robust.

Table D.2: Example of performances for two algorithms: ρp,a


                          Problems
Algorithms    1    2    3    4    5    6    7    8    9   10
Algo A        2    1   10    1   10    1    1    1    1   10
Algo B        1    3    1    6    1    4    6    5   10   10
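The computation of Table D.2 and of the performance functions can be sketched as follows (a minimal Python sketch; the data are those of Table D.1, with R = 10 as in the text, and the variable and function names are only illustrative):

```python
import math

INF = math.inf
# Performance measures tau_{p,a} from Table D.1 (10 problems, 2 algorithms);
# +infinity encodes a failure of the algorithm on the problem.
tau = {
    "Algo A": [20, 10, INF, 10, INF, 20, 10, 15, 25, INF],
    "Algo B": [10, 30, 70, 60, 70, 80, 60, 75, INF, INF],
}
n = 10
R = 10  # any value larger than every finite ratio tau_{p,a} / T_p works

# Best performance on each problem: T_p = min_a tau_{p,a}.
T = [min(tau[a][p] for a in tau) for p in range(n)]

# Normalized performances rho_{p,a} of Table D.2.
rho = {a: [tau[a][p] / T[p] if tau[a][p] != INF else R for p in range(n)]
       for a in tau}

def profile(a, pi):
    """P_a(pi): proportion of problems for which rho_{p,a} <= pi."""
    return sum(1 for r in rho[a] if r <= pi) / n

print(profile("Algo A", 1))  # 0.6: A is the best on 6 of the 10 problems
print(profile("Algo B", 6))  # 0.8: B solves 8 of the 10 problems
```

Plotting `profile(a, pi)` for π ∈ [1, R] reproduces the performance profiles of Figure D.1.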
[Figure D.1: Example of a performance profile — Pr(ρp,a ≤ π) as a function of π for Algo. A and Algo. B.]

Bibliography

Abadie, J. (1967). On the Kuhn-Tucker Theorem, in J. Abadie (ed.), Nonlinear Programming.
Abraham, I., Delling, D., Goldberg, A. V. and Werneck, R. F. (2011). A Hub-
Based Labeling Algorithm for Shortest Paths in Road Networks, Experimental
Algorithms, Springer, pp. 230–241.
Ahuja, R. K., Magnanti, T. L. and Orlin, J. B. (1993). Network Flows. Theory,
Algorithms and Applications, Prentice-Hall Inc.
Alfa, A. S., Heragu, S. S. and Chen, M. (1991). A 3-OPT Based Simulated Annealing
Algorithm for Vehicle Routing Problems, Computers & Industrial Engineering
21(1–4): 635–639.
Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz,
J., Greenbaum, A., Hammarling, S., McKenney, A. and Sorensen, D. (1999). LA-
PACK Users’ Guide, 3rd edn, Society for Industrial and Applied Mathematics,
Philadelphia, PA.
Armijo, L. (1966). Minimization of Functions Having Continuous Partial Derivatives,
Pacific J. Math. 16: 1–3.
Avis, D. and Fukuda, K. (1992). A Pivoting Algorithm for Convex Hulls and Ver-
tex Enumeration of Arrangements and Polyhedra, Discrete & Computational
Geometry 8(1): 295–313.
Axelsson, O. (1994). Iterative Solution Methods, Cambridge University Press, Cam-
bridge, UK.
Axhausen, K., Hess, S., Koenig, A., Abay, G., Bates, J. and Bierlaire, M. (2008).
Income and Distance Elasticities of Values of Travel Time Savings: New Swiss
Results, Transport Policy 15(3): 173–185.
Bartels, R. H. and Golub, G. H. (1969). The Simplex Method of Linear Programming
Using LU Decomposition, Commun. ACM 12(5): 266–268.
Beck, A. (2014). Introduction to Nonlinear Optimization: Theory, Algorithms,
and Applications with MATLAB, MPS-SIAM Series on Optimization, SIAM,
Philadelphia, PA.
Bellman, R. E. (1957). Dynamic Programming, Princeton University Press, Prince-
ton, NJ.
Bellman, R. E. (2010). Dynamic Programming, Princeton University Press, Prince-
ton, NJ.

Ben-Tal, A., El Ghaoui, L. and Nemirovski, A. (2009). Robust Optimization, Princeton Series in Applied Mathematics, Princeton University Press, Princeton, NJ.
Ben-Tal, A. and Nemirovski, A. (2001). Lectures on Modern Convex Optimization:
Analysis, Algorithms, and Engineering Applications, MPS Series on Optimization, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.
Bertsekas, D. P. (1976). On the Goldstein-Levitin-Polyak Gradient Projection
Method, IEEE Transactions on Automatic Control 21: 174–184.
Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Meth-
ods, Academic Press, London.
Bertsekas, D. P. (1998). Network Optimization – Continuous and Discrete Models,
Athena Scientific, Belmont, MA.
Bertsekas, D. P. (1999). Nonlinear Programming, 2nd edn, Athena Scientific, Bel-
mont, MA.
Bertsimas, D. and Tsitsiklis, J. N. (1997). Introduction to Linear Optimization,
Athena Scientific, Belmont, MA.
Bertsimas, D. and Weismantel, R. (2005). Optimization over Integers, Athena
Scientific, Belmont, MA.
Bierlaire, M. (2006). Introduction à l’optimisation différentiable, Presses polytechniques et universitaires romandes, Lausanne, Switzerland. In French.
Birge, J. R. and Louveaux, F. (1997). Introduction to Stochastic Programming,
Springer.
Bland, R. G. (1977). New Finite Pivoting Rules for the Simplex Method, Mathemat-
ics of Operations Research 2(2): 103–107.
Bland, R. G. and Orlin, J. B. (2005). IFORS’ Operational Research Hall of Fame:
Delbert Ray Fulkerson, International Transactions in Operational Research
12: 367–372.
Bonnans, J. F., Gilbert, J.-C., Lemaréchal, C. and Sagastizábal, C. (1997). Optimi-
sation numérique – Aspects théoriques et pratiques, number 27 in Mathéma-
tiques et applications, Springer Verlag, Berlin.
Bonnans, J. F., Gilbert, J.-C., Lemaréchal, C. and Sagastizábal, C. (2003). Numerical
Optimization: Theoretical and Numerical Aspects, Springer Verlag, Berlin.
Bonnans, J., Gilbert, J., Lemarechal, C. and Sagastizábal, C. (2006). Numerical
Optimization: Theoretical and Practical Aspects, Universitext, Springer.
Boyd, S. and Vandenberghe, L. (2004). Convex Optimization, Cambridge University
Press, Cambridge, UK.
Brassard, G. and Bratley, P. (1996). Fundamentals of Algorithmics, Prentice Hall,
Englewood Cliffs, NJ.
Breton, M. and Haurie, A. (1999). Initiation aux techniques classiques de
l’optimisation, Modulo, Montréal, CA.
Broyden, C. G. (1965). A Class of Methods for Solving Nonlinear Simultaneous Equations, Mathematics of Computation 19: 577–593.

Byrd, R., Nocedal, J. and Schnabel, R. B. (1994). Representation of Quasi-Newton Matrices and Their Use in Limited Memory Methods, Mathematical Programming 63: 129–136.
Calafiore, G. and El Ghaoui, L. (2014). Optimization Models, Control Systems and
Optimization, Cambridge University Press, Cambridge, UK.

Cherruault, Y. (1999). Optimisation – Méthodes locales et globales, Presses Universitaires de France, Paris, FR.
Coleman, T. F. (1984). Large Sparse Numerical Optimization, Springer Verlag,
Berlin. Lecture Notes in Computer Sciences 165.
Conn, A. R., Gould, N. I. M. and Toint, P. L. (1992). LANCELOT: A Fortran
Package for Large-Scale Nonlinear Optimization (Release A), number 17 in
Springer Series in Computational Mathematics, Springer Verlag, Heidelberg,
DE.
Conn, A. R., Gould, N. I. M. and Toint, P. L. (2000). Trust Region Methods,
MPS–SIAM Series on Optimization, SIAM, Philadelphia, PA.
Conn, A. R., Scheinberg, K. and Vicente, L. N. (2009). Introduction to Derivative-
Free Optimization, Vol. 8 of MPS-SIAM Series on Optimization, SIAM.
Dantzig, G. B. (1949). Programming of Interdependent Activities. II. Mathematical
Model, Econometrica 17: 200–211.
Dantzig, G. B. (1963). Linear Programming and Extensions, Princeton University
Press, Princeton, NJ.
Davidon, W. C. (1959). Variable Metric Method for Minimization, Report ANL-
5990(Rev.), Argonne National Laboratory, Research and Development.
Davidon, W. C. (1991). Variable Metric Method for Minimization, SIAM Journal
on Optimization 1: 1–17.
de Werra, D., Liebling, T. M. and Hêche, J.-F. (2003). Recherche opérationnelle
pour ingénieurs I, Presses polytechniques et universitaires romandes, Lausanne,
CH.
Dem’Yanov, V., Vasil’Ev, L. and Sasagawa, T. (2012). Nondifferentiable Optimiza-
tion, Translations Series in Mathematics and Engineering, Springer, London.
Dennis, J. E. and Schnabel, R. B. (1996). Numerical Methods for Unconstrained
Optimization and Nonlinear Equations, Society for Industrial and Applied
Mathematics (SIAM), Philadelphia, PA.
Deuflhard, P. (2012). A Short History of Newton’s Method, Documenta Math Extra
Volume: Optimization Stories: 25–30.
Dijkstra, E. W. (1959). A Note on Two Problems in Connexion with Graphs, Nu-
merische Mathematik 1: 269–271.
Dikin, I. I. (1967). Iterative Solution of Problems of Linear and Quadratic Program-
ming, Soviet Math. Doklady 8: 674–675.
Dinic, E. A. (1970). Algorithm for Solution of a Problem of Maximum Flow in
Networks with Power Estimation, Soviet Math. Doklady 11: 1277–1280.

Dodge, Y. (2006). Optimisation appliquée, Statistiques et probabilités appliquées, Springer, Philadelphia, PA.
Dolan, E. D. and Moré, J. J. (2002). Benchmarking Optimization Software with
Performance Profiles, Mathematical Programming, Serie A 91: 201–213.
Dongarra, J. (2000). Sparse Matrix Storage Formats, in Z. Bai, J. Demmel, J. Dongarra, A. Ruhe and H. van der Vorst (eds), Templates for the Solution of
Algebraic Eigenvalue Problems: A Practical Guide.
Eaton, J. W. (1997). GNU Octave: A High Level Interactive Language for Numerical
Computations, www.octave.org.
Euler, L. (1748). Introductio in analysin infinitorum, auctore Leonhardo Eu-
lero..., apud Marcum-Michaelem Bousquet, Lausanne.
Fiacco, A. V. and McCormick, G. P. (1968). Nonlinear Programming: Sequen-
tial Unconstrained Minimization Techniques, J. Wiley and Sons, New York.
Reprinted as Classics in Applied Mathematics 4, SIAM, Philadelphia, PA
(1990).
Finkel, B. F. (1897). Biography: Leonhard Euler, The American Mathematical
Monthly 4(12): 297–302.
Fletcher, R. (1980). Practical Methods of Optimization: Unconstrained Optimiza-
tion, J. Wiley and Sons, New York.
Fletcher, R. (1981). Practical Methods of Optimization: Constrained Optimiza-
tion, J. Wiley and Sons, New York.
Fletcher, R. (1983). Penalty Functions, in A. Bachem, M. Groetschel and B. Korte
(eds), Mathematical Programming: The State of the Art, Springer Verlag,
Berlin.
Ford, L. R. and Fulkerson, D. R. (1956). Maximal Flow Through a Network, Cana-
dian Journal of Mathematics 8: 399–404.
Forrest, J. J. and Tomlin, J. A. (1972). Updated Triangular Factors of the Basis to
Maintain Sparsity in the Product Form Simplex Method, Mathematical Pro-
gramming 2(1): 263–278.
Forster, W. (1995). Homotopy Methods, Handbook of Global Optimization, Kluwer,
Dordrecht, The Netherlands, pp. 669–750.
Gardner, L. and Nicolio, O. (2008). A Maximum Flow Algorithm to Locate Non-
Attacking Queens on an NxN Chessboard, Congressus Numerantium 191: 129–
141.
Gärtner, B. and Matousek, J. (2012). Approximation Algorithms and Semidefinite
Programming, Springer.
Gass, S. I. (2003). IFORS’ Operational Research Hall of Fame: George B. Dantzig,
International Transactions in Operational Research 10(2): 191.
Gass, S. I. (2004). IFORS’ Operational Research Hall of Fame: Albert William
Tucker, International Transactions in Operational Research 11(2): 239.
Gauvin, J. (1992). Théorie de la programmation mathématique non convexe, Les
publications CRM, Montréal.

Gendreau, M., Hertz, A. and Laporte, G. (1992). New Insertion and Postopti-
mization Procedures for the Traveling Salesman Problem, Operations Research
40(6): 1086–1094.
Gendreau, M. and Potvin, J. (2010). Handbook of Metaheuristics, International
Series in Operations Research & Management Science, Springer.

Gill, P. E. and Murray, W. (1974). Newton-Type Methods for Unconstrained and Linearly Constrained Optimization, Mathematical Programming 28: 311–350.
Gill, P. E., Murray, W. and Wright, M. H. (1981). Practical Optimization, Academic
Press, London.
Gill, P. E. and Wong, E. (2012). Sequential Quadratic Programming Methods, in
J. Lee and S. Leyffer (eds), Mixed Integer Nonlinear Programming, Vol. 154 of
The IMA Volumes in Mathematics and its Applications, Springer, pp. 147–
224.
Gillispie, C. C. (ed.) (1990). Dictionary of Scientific Biography, Charles Scribner’s Sons, New York.
Goldstein, A. A. and Price, J. F. (1967). An Effective Algorithm for Minimization,
Numerische Mathematik 10: 184–189.
Golub, G. H. and O’Leary, D. P. (1989). Some History of the Conjugate Gradient
and Lanczos Algorithms: 1949–1976, SIAM Rev. 31: 50–102.
Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations, 3rd edn, Johns
Hopkins University Press, Baltimore.
Gomory, R. E. (1958). Outline of an Algorithm for Integer Solutions to Linear Pro-
grams, Bulletin of the American Mathematical Society 64: 275–278.
Gould, N. I. and Toint, P. L. (2000). SQP Methods for Large-Scale Nonlinear Pro-
gramming, in M. Powell and S. Scholtes (eds), System Modelling and Opti-
mization, Vol. 46 of IFIP — The International Federation for Information
Processing, Springer, USA, pp. 149–178.
Griewank, A. (1989). On Automatic Differentiation, in M. Iri and K. Tanabe (eds),
Mathematical Programming: Recent Developments and Applications, Kluwer
Academic Publishers, Dordrecht, The Netherlands, pp. 83–108.
Griewank, A. (2000). Evaluating Derivatives. Principles and Techniques of Algo-
rithmic Differentiation, Frontiers in Applied Mathematics, Society for Indus-
trial and Applied Mathematics (SIAM), Philadelphia, PA.
Griewank, A. and Toint, P. L. (1982). On the Unconstrained Optimization of Partially
Separable Functions, in M. J. D. Powell (ed.), Nonlinear Optimization 1981,
Academic Press, London, pp. 301–312.
Harris, T. and Ross, F. (1955). Fundamentals of a Method for Evaluating Rail Net
Capacities, Research Memorandum RM-1573, The RAND Corporation, Santa
Monica, CA.
Hart, P., Nilsson, N. and Raphael, B. (1968). A Formal Basis for the Heuristic De-
termination of Minimum Cost Paths, Systems Science and Cybernetics, IEEE
Transactions on 4(2): 100–107.

Haykin, S. O. (2008). Neural Networks and Learning Machines, 3rd edn, Prentice
Hall.
Helsgaun, K. (2009). General k-OPT Submoves for the Lin–Kernighan TSP Heuristic,
Mathematical Programming Computation 1(2-3): 119–163.
Hestenes, M. R. (1951). Iterative Methods for Solving Linear Equations, NAML Report 52-9, National Bureau of Standards, Los Angeles, CA. Reprinted in J. Optim. Theory Appl. 11: 323–334 (1973).
Hestenes, M. R. and Stiefel, E. (1952). Methods of Conjugate Gradients for Solving
Linear Systems, J. Res. N.B.S. 49: 409–436.
Higham, N. J. (1996). Accuracy and Stability of Numerical Algorithms, SIAM,
Philadelphia, PA.
Hiriart-Urruty, J.-B. (1998). Optimisation et analyse convexe, Presses Universi-
taires de France, Paris, FR.
Hock, W. and Schittkowski, K. (1981). Test Examples for Nonlinear Programming
Codes, Springer Verlag, Berlin. Lectures Notes in Economics and Mathematical
Systems 187.
Huhn, P. (1999). A phase-1-Algorithm for Interior-Point-Methods: Worst-Case and
Average-Case Behaviour, in P. Kall and H.-J. Luethi (eds), Operations Research
Proceedings 1998, Springer, Berlin, pp. 103–112.
Iovine, J. (2012). Understanding Neural Networks: The Experimenter’s Guide,
2nd edn, Images.
Jansson, C. and Knüppel, O. (1992). A Global Minimization Method: The
Multi-Dimensional Case, Technical Report 92.1, Forschungsschwerpunkt
Informations- und Kommunikationstechnik, TU Hamburg-Harburg.
Jarník, V. (1930). O jistém problému minimálním [About a certain minimal problem],
Práce Moravské Přírodovědecké Společnosti 6: 57–63.
John, F. (1948). Extremum Problems with Inequalities as Side Conditions, in
K. O. Friedrichs, O. E. Neugebauer and J. J. Stoker (eds), Studies and Essays,
Courant Anniversary Volume, Wiley Interscience, New York.
Johnson, E. L. (2005). IFORS’ Operational Research Hall of Fame: Ralph E. Gomory,
International Transactions in Operational Research 12: 539–543.
Karmarkar, N. (1984). A new Polynomial-Time Algorithm for Linear Programming,
Combinatorica 4: 373–395.
Karush, W. (1939). Minima of Functions of Several Variables with Inequalities
as Side Conditions, Master’s thesis, University of Chicago, Chicago, IL.
Kelley, C. T. (1995). Iterative Methods for Linear and Nonlinear Equations,
Frontiers in Applied Mathematics, SIAM, Philadephia, PA.
Kelley, C. T. (1999). Iterative Methods for Optimization, Frontiers in Applied
Mathematics, SIAM, Philadelphia, PA.
Khachiyan, L. G. (1979). A Polynomial Algorithm in Linear Programming, Doklady Akademii Nauk SSSR 244: 1093–1096. In Russian.

Klee, V. and Minty, G. J. (1972). How Good is the Simplex Algorithm?, in O. Shisha
(ed.), Inequalities III, Academic Press, New York, pp. 159–175.
Korte, B., Fonlupt, J. and Vygen, J. (2010). Optimisation combinatoire: Théorie
et algorithmes, Collection IRIS, Springer, France.
Korte, B. and Vygen, J. (2007). Combinatorial Optimization: Theory and Algorithms, 4th edn, Springer Publishing Company, Incorporated.


Kuhn, H. W. and Tucker, A. W. (1951). Nonlinear Programming, in J. Neyman (ed.),
Proceedings of the Second Berkeley Symposium on Mathematical Statistics
and Probability, University of California Press, Berkeley, CA, pp. 481–492.
Larson, R. C. (2004). IFORS’ Operational Research Hall of Fame: John D. C. Little,
International Transactions in Operational Research 11: 361–364.
Lemaréchal, C. (1981). A View of Line-Searches, in A. Auslender, W. Oettli and
J. Stoer (eds), Optimization and Optimal Control, Vol. 30 of Lecture notes in
control and information science, Springer Verlag, Heidelberg, pp. 59–78.
Levenberg, K. (1944). A Method for the Solution of Certain Problems in Least
Squares, Quarterly Journal on Applied Mathematics 2: 164–168.
Lewis, R. M., Torczon, V. and Trosset, M. W. (2000). Direct Search Methods: Then
and Now, Journal of Computational and Applied Mathematics 124: 191–207.
Little, J. D. C., Murty, K. G., Sweeney, D. W. and Karel, C. (1963). An Algorithm
for the Traveling Salesman Problem, Operations Research 11(6): 972–989.
Mandelbrot, B. B. (1982). The Fractal Geometry of Nature, W. H. Freeman.
Mangasarian, O. L. (1979). Nonlinear Programming, McGraw-Hill, New York.
Mangasarian, O. L. and Fromovitz, S. (1967). The Fritz-John Necessary Optimality
Conditions in the Presence of Equality and Inequality Constraints, Journal of
Mathematical Analysis and Applications 17: 37–47.
Marquardt, D. (1963). An Algorithm for Least-Squares Estimation of Nonlinear Pa-
rameters, SIAM Journal on Applied Mathematics 11(2): 431–441.
McCormick, G. P. (1983). Nonlinear Programming: Theory, Algorithms and Ap-
plications, Academic Press, New York.
McKinnon, K. I. (1998). Convergence of the Nelder-Mead Simplex Method to a
Nonstationary Point, SIOPT 9(1): 148–158.
Mladenović, N. and Hansen, P. (1997). Variable Neighborhood Search, Computers
& Operations Research 24(11): 1097–1100.
Moler, C. B. (2004). Numerical Computing with MATLAB, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.
Montagne, E. and Ekambaram, A. (2004). An Optimal Storage Format for Sparse
Matrices, Information Processing Letters 90(2): 87–92.
Nelder, J. A. and Mead, R. (1965). A Simplex Method for Function Minimization,
Computer Journal 7: 308–313.
Nemhauser, G. L. and Wolsey, L. A. (1988). Integer and Combinatorial Optimiza-
tion, J. Wiley and Sons, New York.

Nesterov, Y. and Nemirovsky, A. (1994). Interior Point Polynomial Methods in Convex Programming: Theory and Algorithms, SIAM, Philadelphia, PA.
Nocedal, J. and Wright, S. J. (1999). Numerical Optimization, Operations Research,
Springer Verlag, New York.
Ortega, J. M. and Rheinboldt, W. C. (1970). Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York.


Oxford University Press (2013). OED online,
http://www.oed.com/view/Entry/132080.
Papadimitriou, C. H. and Steiglitz, K. (1998). Combinatorial Optimization: Algo-
rithms and Complexity, Dover Publications.
Pardalos, P., Du, D.-Z. and Graham, R. L. (eds) (2013). Handbook of Combinatorial
Optimization, 2nd edn, Springer.
Polyak, B. (1987). Introduction to Optimization, Optimization Software Inc., New
York.
Powell, M. J. D. (1977). A Fast Algorithm for Nonlinearly Constrained Optimiza-
tion Calculations, in G. A. Watson (ed.), Dundee Conference on Numerical
Analysis, Vol. 7, Springer Verlag, Berlin. Lecture Notes in Mathematics 630.
Prim, R. C. (1957). Shortest Connection Networks and Some Generalizations, Bell
System Technical Journal 36: 1389–1401.
Rockafellar, R. T. (1993). Lagrange Multipliers and Optimality, SIAM Review
35(2): 183–238.
Rosenbrock, H. (1960). An Automatic Method for Finding the Greatest or Least
Value of a Function, The Computer Journal 3: 175–184.
Scales, L. E. (1985). Introduction to Non-Linear Optimization, Springer Verlag,
Heidelberg.
Schnabel, R. B. and Eskow, E. (1999). A Revised Modified Cholesky Factorization,
SIAM Journal on Optimization 9: 1135–1148.
Schrijver, A. (2002). On the History of the Transportation and Maximum Flow
Problems, Mathematical Programming 91(3): 437–445.
Schrijver, A. (2003). Combinatorial Optimization – Polyhedra and Efficiency,
Springer Verlag, Berlin.
Shapiro, A., Dentcheva, D. and Ruszczyński, A. (2014). Lectures on Stochastic
Programming: Modeling and Theory, MPS-SIAM Series on Optimization, 2nd
edn, SIAM, Philadelphia, PA.
Slater, M. (1950). Lagrange Multipliers Revisited: A Contribution to Non-Linear
Programming, Cowles Commission Discussion Paper. Math 403.
Steihaug, T. (1983). The Conjugate Gradient Method and Trust Regions in Large
Scale Optimization, SIAM Journal on Numerical Analysis 20(3): 626–637.
Stiefel, E. (1952). Ueber einige Methoden der Relaxationsrechnung, Z. Angew. Math.
Phys. 3: 1–33.
Suhl, L. M. and Suhl, U. H. (1993). A Fast LU Update for Linear Programming,
Annals of Operations Research 43(1): 33–47.

Toint, P. L. (1981). Towards an Efficient Sparsity Exploiting Newton Method for Minimization, in I. S. Duff (ed.), Sparse Matrices and Their Uses, Academic Press, London, pp. 57–88.
Torczon, V. J. (1989). Multi-Directional Search: A Direct Search Algorithm for
Parallel Machine, PhD thesis, Rice University, Houston, TX.

Torczon, V. J. (1991). On the Convergence of the Multidirectional Search Algorithm, SIAM Journal on Optimization 1(1): 123–145.
Walker, R. C. (1999). Introduction to Mathematical Programming, Prentice-Hall.
Wiles, A. (1995). Modular Elliptic Curves and Fermat’s Last Theorem, Annals of
Mathematics. Second Series 141(3): 443–551.
Winston, W. L. (1994). Operations Research. Applications and Algorithms,
Duxbury Press.
Wolfe, P. (1969). Convergence Conditions for Ascent Methods, SIAM Review
11: 226–235.
Wolfe, P. (1971). Convergence Conditions for Ascent Methods II: Some Corrections,
SIAM Review 13: 185–188.
Wolsey, L. A. (1998). Integer Programming, Interscience Series in Discrete Mathe-
matics and Optimization, John Wiley and Sons, inc.
Wood, M. K. and Dantzig, G. B. (1949). Programming of Interdependent Activities.
I. General Discussion, Econometrica 17: 193–199.
Wright, M. H. (1996). Direct Search Methods: Once Scorned, Now Respectable, in
D. F. Griffiths and G. A. Watson (eds), Numerical Analysis 1995, Addison
Wesley, Longman, Harlow, UK, pp. 191–208. Proceedings of the 1995 Dundee
Biennial Conference in Numerical Analysis.
Wright, S. J. (1997). Primal-Dual Interior-Point Methods, Society for Industrial
and Applied Mathematics (SIAM), Philadelphia, PA.
Zhang, Y. (1994). On the Convergence of a Class of Infeasible Interior-Point Methods
for the Horizontal Linear Complementarity Problem, SIAM J. Optim. 4(1): 208–
227.
Zwick, U. (1995). The Smallest Networks on Which the Ford-Fulkerson Maxi-
mum Flow Procedure May Fail to Terminate, Theoretical Computer Science
148(1): 165–170.
Index

Activation function, 332 primal, 425


Active constraints, 52–54, 81 primal-dual, 427
Adjacency matrix, 508 Chain rule, 695
Adjacent, 494 Change of variables, 46, 405
Affine function, 42 Cholesky, 278, 692
Al Khwarizmi, 185 Circulation, 505, 511, 512, 516, 519,
Analytical center, 423 543
Arc, 492 decomposition, 516
Assignment problem, 546 Coercive function, 692
Augmented Lagrangian, 152, 445, 451, Cofactor, 536, 537, 691, 698
454 Combinatorial optimization, 607
Compact set, 693
Barrier Complementarity slackness, 170, 536,
function, 417 552
method, 415, 420 Concavity, 29
Basic dual problem, 101
direction, 87, 88, 90 function, 31
solution, 83, 86, 87, 537 Condition
Bellman equation, 564 number, 45
BFGS, 311, 314
Binary optimization, 605 Conditioning, 45
Bounded function, 23 Cone
Branch and bound, 626, 629, 634 convex, 690
Broyden linearized, 69
Charles George, 314 tangent, 69
optimality, 212 Conjugate
update, 211 directions, 222, 223, 225
gradient, 222, 230, 300, 319
Capacity, 503, 504, 530, 541, 578, 581, Connected, 496
584–586, 648 component, 497
Cauchy strongly, 497
Augustin-Louis, 243 Consistent simple path flow, 518
point, 243 Constraint, 51
Cauchy-Schwarz, 697 active, 52–54, 81
Central path, 423, 434 convex, 60, 128, 131

elimination, 75 minimum, 583, 584, 587


equality, 137, 142, 153, 159 saturated, 504, 578
inequality, 142, 154 Cutting planes, 637
linear, 78, 133 Cycle, 496, 512
linear independence, 59 flow, 512

qualification, 71 flow decomposition, 518


redundant, 57 negative cost, 554
relaxation, 93
symmetry breaking, 606 Dantzig, George B., 365
Continuity, 20 Davidon, William C., 312
function, 690 Decomposition
Lipschitz, 44, 188, 191, 193, 196, circulation, 516
209, 692 flow, 510, 517, 518
semi, 692 integer flow, 519
Convergence, 190, 196, 284, 420 Schur, 124, 696
global, 284, 471 Degenerate feasible basic solution, 87
quadratic, 192 Degree, 493
sequence, 692 Demand, 504
superlinear, 206 Derivative
Convex directional, 32
combination, 689 partial, 31
cone, 690 Descent
constraint, 60, 128, 131 direction, 33, 479
function, 30 methods, 245
hull, 690 Destination, 496, 541, 571, 584
set, 61, 689 Determinant, 691
Convexity, 29 Differentiability, 31, 39
dual problem, 101 Differentiable function, 32
gradient, 36 Dijkstra
Hessian, 40 algorithm, 566, 569, 570
polyhedron, 697 Edsger Wybe, 558
Cost, 507 Dikin, 407, 409
generalized, 507 Direct search, 347
reduced, 166, 167 Directed graph, 493
Cramer’s rule, 698 Direction
Critical basic, 87, 88, 90
path, 573 conjugate, 222, 223, 225
point, 119, 120, 158, 179 descent, 33, 479
tasks, 572, 573 feasible, 60–62, 65, 69, 72, 73
Curse of dimensionality, 614 feasible at the limit, 69
Curvature, 41, 44–46, 299 Directional derivative, 32
negative, 303, 307 Discrete optimization, 625
Cut, 494, 506, 579, 581 Divergence, 504, 506
Gomory, 640, 644 Dogleg, 294, 299

Double penalty, 450 Fractal, 195


Downstream node, 493 Fritz John, 142
Dual, 105 Frobenius, norm, 693
constraint, 425 Fulkerson
function, 97, 98, 363 Delbert Ray, 577

problem, 99, 101, 103, 116, 423, Ford-, 577, 581


584, 585 Function
Duality, 93–109, 446 activation, 332
linear optimization, 102 affine, 42
measure, 429 barrier, 417
strong, 107, 168, 169, 588 coercive, 692
theorem, 134 continuous, 690
weak, 100 differentiable, 32
dual, 97, 98
Eigenvalue, 690
implicit, 696
Eigenvector, 690
Lagrangian, 97
Elementary row operations, 379
linear, 42
Elimination of constraints, 75
merit, 472, 473, 478, 479
Epsilon, machine, 184
non linear, 43
Equality constraint, 137, 142, 153, 159
objective, 29
Equation
quadratic, 44
Bellman, 564
normal, 335
Gauss-Newton, 334, 335
secant, 210
Geppetto, 14, 629
Equivalence, 16
Global
Euclid, 11
convergence, 284, 471
Euler, Leonhard, 501
minimum, 22
Exact methods, 625
optimum, 122
Farkas’ lemma, 697 Golden section, 257, 260
Feasible Goldfarb, Donald, 314
direction, 60–62, 65 Gomory
direction at the limit, 69, 72, 73 cut, 640, 643, 644
point, 51 Ralph E., 640
sequences, 66 Gradient, 32, 36, 44, 59
Fermat, Pierre de, 123 conjugate, 222, 230, 300, 319
Finite difference, 203, 208 matrix, 38, 192, 199
Fletcher, Roger, 314 projected, 399, 401, 402, 405, 407,
Flow, 501, 506 412
cycle, 512, 518 related, 287
decomposition, 510, 517, 518 Graph, 492
integer decomposition, 519 directed, 493
maximum, 577, 584 Greedy
simple cycle, 512 algorithm, 523
Ford-Fulkerson, 577, 581 heuristic, 648

Greedy algorithm, 523, 649, 653 Likelihood


maximum, 17, 331
Hess, Ludwig Otto, 42 Limit point, 692
Hessian Line search, 245, 252, 254, 274, 275,
convexity, 40 277, 279

matrix, 39 exact, 251, 260


Heuristic, 647, 648 inexact, 263
Linear
Implicit functions, 696 constraint, 78, 133
Incidence matrix, 538 function, 42
Incident, 493 independence, 56
Indegree, 493 model, 183, 189, 192, 193
Indiana Jones, 13 optimization, 165, 167, 422
Induced norm, 693 problem, 103
Inequality constraint, 142, 154
relaxation, 617
Infimum, 23
Linearity, 42
Insertion, 653
Linearized cone, 69
Integer optimization, 605
Lipschitz, 42, 691
Integrality, 537
condition, 691
Interior, 25
continuity, 44, 188, 191, 193, 196,
Interior point, 25, 61, 410, 415, 430,
209, 692
436, 437, 440
Rudolf Otto, 691
Little, John D. C. , 626
Jacobi, Carl Gustav Jacob, 39
Local
Jacobian matrix, 39
minimum, 21, 22
James Bond, 11
Newton, 235, 236, 239
Kalman filter, 337, 339 search, 656, 657, 659
real time, 341, 342 SQP, 464, 466
Karush-Kuhn-Tucker, 133, 137, 142 Longest path problem, 571
Kernel, 696 Lower bound, 617
Knapsack problem, 607, 608, 648, 662,
663, 670, 677 Machine epsilon, 184
Matrix
Lagrange adjacency, 508
Joseph-Louis, 102 cofactor, 536, 537, 691, 698
multipliers, 133, 152, 235, 452 determinant, 691
Lagrangian eigenvalue, 690
augmented, 152, 445, 451, 454 eigenvector, 690
function, 97 gradient, 38
penalty, 447 Hessian, 39
Landau, notation, 692 incidence, 538
Large neighborhood, 435 Jacobian, 39
Laupt-Himum, 9 minor, 691
Least squares problem, 329 orthogonal, 690
Index 716

positive definite, 690 local, 235, 236, 239


positive semidefinite, 690 method, 302
rank, 693 point, 243
unimodular, 691 solving equations, 181, 185, 190,
Maximum 194, 196
Ce document est la propriété exclusive de Kavyaa Kannan (kk392@snu.edu.in) - jeudi 18 avril 2024 à 07h48

flow, 541, 577, 584 theorem, 697


likelihood, 17, 331 trust region, 302
Mean value theorem, 695 Node, 492
Merit function, 472, 473, 478, 479 demand, 504
Minimum downstream, 493
cost flow, 529 supply, 504
cut, 583, 584, 587 transit, 504
spanning tree, 520, 523 upstream, 493
Minor, 691 Non linear function, 43
Model, 6 Norm
linear, 183, 189, 192, 193 Frobenius, 693
quadratic, 238 induced, 693
secant, 202, 209 matrix, 697
Modeling, 5, 239, 331, 539, 595 vector, 689
Multipliers, Lagrange, 452 Normal equations, 335
Number, condition, 45
Nearest neighbor, 649
Objective function, 29
Necessary optimality conditions, 115,
Optimality conditions, 123, 131, 132,
128, 133, 167, 235
153, 535, 553
Neighborhood, 656, 663, 669
constraints, 127
large, 435
necessary, 115, 128, 133, 167, 235
restricted, 434
sufficient, 120, 122, 152, 154, 167
structure, 656
Optimum, 5
Nelder-Mead, 348, 349
Origin, 496, 541, 571, 584
Network, 491, 501, 610
Orthogonal
loading, 510
matrix, 690
neural, 332
regression, 341
representation, 508
Outdegree, 493
Newton, 427
algorithm, 185, 194, 203, 208, 236, Partial derivative, 31
279 Partitioned problem, 627
Caasi, 202 Path, 495
constrained, 399, 406 central, 423, 434
convergence, 190, 196 critical, 573
direction, 294 dogleg, 294
fractal, 195 flow, 518
Gauss, 334, 335 consistent simple, 518
Isaac, 182 forward, 496
line search, 277, 279 longest, 571
Index 717

primal central, 425 set covering, 609


primal-dual central, 427 shortest path, 539, 540, 553
saturated, 579 transhipment, 529
shortest, 539, 551, 560 transportation, 544
simple, 496 traveling salesman, 610, 649, 665,
Ce document est la propriété exclusive de Kavyaa Kannan (kk392@snu.edu.in) - jeudi 18 avril 2024 à 07h48

unsaturated, 578–582, 584 672, 679


Penalty Projected gradient, 399, 401, 405, 407
double, 450 Projectile, 6
Lagrangian, 447
quadratic, 449 Quadratic
PERT, 572 convergence, 192
Pivoting, 381 function, 44
Point interpolation, 252, 254
Cauchy, 243 model, 238
critical, 120 optimization, 171
feasible, 51 penalty, 449
interior, 25, 61, 410, 415, 430, 436, problem, 123, 173, 221, 222
437, 440 Quasi-Newton, 201, 311
Newton, 243 BFGS, 316
Polyhedron, 78, 79 SR1, 319
convexity, 697
Rank, 693
standard form, 79
Rayleigh-Ritz, theorem, 696
Positive
Reduced costs, 166, 167
definite matrix, 690 Redundant constraints, 57
semidefinite matrix, 690 Region, trust, 298
Preconditioned steepest descent, 246 Relaxation, 616, 617
Preconditioning, 45, 47 linear, 617
Primal, 105 Restricted neighborhood, 434
central path, 425 Rosenbrock problem, 281, 308, 320
Primal-dual central path, 427 Row operations, elementary, 379
Principle of optimality, 558
Problem Saturated
assignment, 546 cut, 504, 578
dual, 99, 101, 103 path, 579
knapsack, 607, 608, 648, 662, 663, Schur decomposition, 124, 696
670, 677 Secant
least squares, 329 equation, 210
linear, 103 method, 206, 214
longest path, 571 model, 202, 209
maximum flow, 541 Semi-continuity, 692
minimum cut, 584 Sensitivity analysis, 159
partitioned, 627 Sequences, feasible, 66
quadratic, 123, 173, 221, 222 Sequential Quadratic Programming, 463,
Rosenbrock, 281, 308, 320 464, 466, 480
Index 718

Set optimality conditions, 120, 122, 131,


compact, 693 152–154, 167
convex, 61 progress, 271
covering, 609 Superlinear convergence, 206
covering problem, 609 Supply, 504
Ce document est la propriété exclusive de Kavyaa Kannan (kk392@snu.edu.in) - jeudi 18 avril 2024 à 07h48

Shanno, David F., 314 Swisscom, 7


Sherman-Morrison-Woodbury formula, Symmetry breaking constraints, 606
698
Shortest path, 551, 560, 566 Tableau
algorithm, 558, 560, 561, 564 pivoting, 381
problem, 539, 540, 553 simplex, 376, 385
spanning tree, 565 Tangent cone, 69
Simple Taylor, theorem, 695
algorithm, 363 Theorem
cycle flow, 512 mean value, 695
path flow, consistent, 518 Newton, 697
Simplex, 348 Rayleigh-Ritz, 696
algorithm, 363, 371, 383, 391 Taylor, 695
tableau, 376, 385 Torczon, 354, 355
Simulated annealing, 674, 676 Total unimodularity, 536, 537
Transhipment problem, 529
Singular values, 693
Transit, 504, 529
Sink, 541, 584
Transportation problem, 544
Slack variables, 19, 148
Traveling salesman problem, 610, 649,
Slackness, complementarity, 170, 536,
665, 672, 679
552
Tree, 498, 628
Solution, basic, 83, 86, 537
characterization, 498
Source, 541, 584
spanning, 500, 520, 522, 523, 566
Spanning tree, 500, 520, 522, 523, 566
Trust region, 291, 294, 298, 300
SQP, 463, 464, 466, 480
subproblem, 292
SR1, 317, 318
Tucker, Albert William, 134
Standard form, polyhedron, 79
Stationary points, 119 Unconstrained optimization, 115
Steepest ascent, 34 Unimodular, 536, 691
Steepest descent strictly, 258, 260, 690
algorithm, 251, 277 totally, 536–538, 618
preconditioned, 246 Unsaturated path, 578–582, 584
Strong duality, 107, 169 Update
Strongly connected, 497 BFGS, 314
Structure, neighborhood, 656 SR1, 318
Subgraph, 492 Upstream node, 493
Subnetwork, 492
Subproblem, trust region, 292 Variable Neighborhood Search, 669
Sufficient Variables
decrease, 266 change of, 46, 405
Index 719

slack, 19, 148


Vertex, 79–82, 86, 365, 492
enumeration, 368
Vertices, see Vertex
von Neumann, John, 95
Ce document est la propriété exclusive de Kavyaa Kannan (kk392@snu.edu.in) - jeudi 18 avril 2024 à 07h48

Weak duality, 100


Weierstrass theorem, 24
Wolfe condition, 266, 271

Zoutendijk, 284