Nonlinear Optimization: Overview of Methods. The Newton Method With Line Search
© 2007 Niclas Börlin, CS, Umeå University
Most deterministic methods for unconstrained optimization have the following features:
- They are iterative, i.e. they start with an initial guess x_0 of the variables and try to find better points {x_k}, k = 1, 2, ....
- They are descent methods, i.e. at each iteration k, f(x_{k+1}) < f(x_k) is (at least) required.
- At each iteration k, the nonlinear objective function f is replaced by a simpler model function m_k that approximates f around x_k. The next iterate x_{k+1} = x_k + p is sought as the minimizer of m_k.
The model function m_k is usually defined to be a quadratic function of the form

m_k(x_k + p) = f_k + p^T ∇f_k + (1/2) p^T B_k p,

where f_k = f(x_k), ∇f_k = ∇f(x_k), and B_k is a matrix, usually a positive definite approximation of the Hessian ∇²f(x_k). If B_k is positive definite, a minimizer of m_k may be found by solving ∇_p m_k(x_k + p) = 0 for p. If the minimizer of m_k does not produce a better point, the step p is modified to produce a point x_{k+1} = x_k + p that is better. The modifications come in two major flavours: line search and trust-region.
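As a concrete illustration, here is a minimal Python sketch (not from the original slides) of minimizing the quadratic model by solving B_k p = −∇f_k; the numeric values of ∇f_k and B_k are assumed purely for the example.

```python
import numpy as np

def model_minimizer(grad_fk, Bk):
    """Step p minimizing m_k(x_k + p) = f_k + p^T grad_fk + 0.5 p^T Bk p.

    Setting the gradient of the model to zero gives Bk p = -grad_fk,
    which has a unique solution when Bk is positive definite.
    """
    return np.linalg.solve(Bk, -grad_fk)

# Assumed values for illustration only.
grad_fk = np.array([2.0, -1.0])
Bk = np.array([[4.0, 1.0],
               [1.0, 3.0]])          # symmetric positive definite
p = model_minimizer(grad_fk, Bk)     # step to the model minimizer x_k + p
print(p)
```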
Line search
In the line search strategy, the algorithm chooses a search direction p_k and tries to solve the following one-dimensional minimization problem:

min_{α > 0} f(x_k + α p_k).

In theory we would like optimal step lengths, but in practice it is more efficient to test trial step lengths until we find one that gives us a good enough point.
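For the special case of a quadratic objective, the optimal step length is available in closed form, which the following sketch illustrates; the matrix A, vector b, and starting point are assumed values for illustration only. For a general nonlinear f no such formula exists, which is why trial step lengths are used in practice.

```python
import numpy as np

def exact_step_quadratic(A, b, x, p):
    """Exact minimizer of phi(alpha) = f(x + alpha p) for f(x) = 0.5 x^T A x - b^T x."""
    g = A @ x - b                     # gradient of the quadratic at x
    return -(p @ g) / (p @ A @ p)     # alpha* = -p^T g / (p^T A p)

A = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([1.0, 1.0])
x = np.array([2.0, -1.0])
p = -(A @ x - b)                      # steepest-descent direction
alpha = exact_step_quadratic(A, b, x, p)
print(alpha, x + alpha * p)
```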
Trust-region
In the trust-region strategy, the algorithm defines a region of trust around x_k where the current model function m_k is trusted. The region of trust is usually defined as

‖p‖₂ ≤ Δ,

where the scalar Δ is called the trust-region radius. A candidate step p is found by approximately solving the following subproblem:

min_p m_k(x_k + p)  s.t.  ‖p‖₂ ≤ Δ.

If the candidate step does not produce a good enough new point, we shrink the trust-region radius and re-solve the subproblem.
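The slides do not prescribe a particular subproblem solver; as one possible sketch, the Cauchy point minimizes the model along the steepest-descent direction within the trust region:

```python
import numpy as np

def cauchy_point(grad_fk, Bk, delta):
    """Approximate trust-region step: model minimizer along -grad_fk with ||p|| <= delta."""
    gnorm = np.linalg.norm(grad_fk)
    ps = -(delta / gnorm) * grad_fk          # step to the trust-region boundary
    gBg = grad_fk @ Bk @ grad_fk
    if gBg <= 0.0:
        tau = 1.0                            # model decreases all the way to the boundary
    else:
        tau = min(gnorm**3 / (delta * gBg), 1.0)
    return tau * ps                          # candidate step with ||p|| <= delta

# If f(x_k + p) does not decrease enough compared to the model prediction,
# the radius delta is shrunk (e.g. halved) and the subproblem is re-solved.
```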
In the line search strategy, the direction is chosen first, followed by the distance. In the trust-region strategy, the maximum distance is chosen first, followed by the direction.
Convergence rate
In order to compare different iterative methods, we need an efficiency measure. Since we do not know the number of iterations in advance, the computational complexity measure used by direct methods cannot be used. Instead, the concept of a convergence rate is defined.

Assume we have a sequence {x_k} that converges to a solution x*. Define the sequence of errors as e_k = x_k − x* and note that lim_{k→∞} e_k = 0. We say that the sequence {x_k} converges to x* with rate r and rate constant C if

lim_{k→∞} ‖e_{k+1}‖ / ‖e_k‖^r = C

and C < ∞.
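As a numerical illustration (with an assumed error sequence), the ratios ‖e_{k+1}‖ / ‖e_k‖^r reveal the convergence rate:

```python
import numpy as np

# Assumed tail of an error sequence ||e_k|| for illustration.
err = np.array([1e-1, 1e-2, 1e-4, 1e-8, 1e-16])

for r in (1, 2):
    ratios = err[1:] / err[:-1] ** r
    print(f"r = {r}: ratios -> {ratios}")
# For r = 1 the ratios tend to 0 (super-linear); for r = 2 they stay near the
# finite constant C = 1, indicating quadratic convergence.
```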
We speak of:
- linear convergence, for r = 1 and 0 < C < 1;
- quadratic convergence, for r = 2;
- super-linear convergence, for r = 1 and C = 0.

For r = 1, C = 0.1 and ‖e_0‖ = 1, the norm of the error sequence becomes 1, 10⁻¹, 10⁻², ..., reaching 10⁻⁷ after 7 iterations. Thus the constant C is of major importance for a method with linear convergence.
For r = 2, C = 3 and ‖e_0‖ = 0.1, the sequence becomes 0.1, 0.03, 0.0027, ..., i.e. it converges despite C > 1. For quadratic convergence, the constant C is thus of lesser importance. Instead it is important that the initial approximation is close enough to the solution, i.e. that ‖e_0‖ is small.

Local vs. global convergence

A method is called locally convergent if it produces a sequence converging toward a minimizer x* provided a close enough starting approximation. A method is called globally convergent if it produces a sequence converging toward a minimizer x* from any starting approximation. Note that global convergence does not imply convergence toward a global minimizer.
Globalization strategies

The line search and trust-region methods are sometimes called globalization strategies, since they modify a core method (typically only locally convergent) to become globally convergent. There are two efficiency requirements on any globalization strategy:
- Far from the solution, they should stop the method from going out of control.
- Close to the solution, where the core method is efficient, they should interfere as little as possible.

Descent directions

Consider the Taylor expansion of the objective function along a search direction p,

f(x_k + αp) = f(x_k) + α p^T ∇f_k + (1/2) α² p^T ∇²f(x_k + tp) p,

for some t ∈ (0, α). Any direction p such that p^T ∇f_k < 0 will produce a reduction of the objective function for a short enough step. A direction p such that p^T ∇f_k < 0 is called a descent direction.
Since cos θ = −p^T ∇f_k / (‖p‖ ‖∇f_k‖), where θ is the angle between the search direction and the negative gradient, descent directions lie in the same half-plane as the negative gradient. The search direction corresponding to the negative gradient, p = −∇f_k, is called the direction of steepest descent.
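A small sketch of checking the descent condition and the angle to the negative gradient, with assumed values for p and ∇f_k:

```python
import numpy as np

grad_fk = np.array([1.0, 2.0])        # assumed gradient at x_k
p = np.array([-1.0, -1.0])            # assumed trial direction

is_descent = p @ grad_fk < 0          # True: p is a descent direction
cos_theta = -(p @ grad_fk) / (np.linalg.norm(p) * np.linalg.norm(grad_fk))
steepest = -grad_fk                   # direction of steepest descent
print(is_descent, cos_theta)
```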
Line search
Each iteration of a line search method computes a search direction p_k and then decides how far to move along that direction. The next iterate is given by x_{k+1} = x_k + α_k p_k, where the step length α_k > 0 is chosen along p_k.

We will require p_k to be a descent direction. This ensures that the objective function decreases, f(x_k + α_k p_k) < f(x_k), for some small enough α_k > 0. Define φ(α) = f(x_k + α p_k).

Ideally we would like to find the global minimizer of φ in every iteration. This is called an exact line search. However, it is possible to construct inexact line search methods that produce an adequate reduction of f at minimal cost. Inexact line search methods construct a number of candidate values for α and stop when certain conditions are satisfied.
Mathematically, the descent condition f(x_k + α p_k) < f(x_k) is not enough to guarantee convergence. Instead, the sufficient decrease condition is formulated from the linear Taylor approximation of φ(α), i.e. φ(α) ≈ φ(0) + α φ'(0), or

f(x_k + α p_k) ≈ f(x_k) + α ∇f_k^T p_k.

The sufficient decrease condition states that the new point must produce at least a fraction 0 < c_1 < 1 of the decrease predicted by the Taylor approximation, i.e.

f(x_k + α p_k) ≤ f(x_k) + c_1 α ∇f_k^T p_k.
Backtracking
The sufficient decrease condition alone is not enough to guarantee convergence, since it is satisfied for arbitrarily small values of α. The sufficient decrease condition therefore has to be combined with a strategy that favours large step lengths over small ones. A simple such strategy is called backtracking: accept the first element of the sequence

1, 1/2, 1/4, ..., 2^{-i}, ...

that satisfies the sufficient decrease condition. Such a step length always exists, and large step lengths are tested before small ones, so the step length will not be too small. This technique works well for Newton-type algorithms.
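A minimal sketch of backtracking against the sufficient decrease condition; the callables f and grad_f and the constant c1 = 1e-4 are assumptions for the example:

```python
import numpy as np

def backtracking(f, grad_f, xk, pk, c1=1e-4):
    """Halve the step length until the sufficient decrease condition holds."""
    fk = f(xk)
    slope = grad_f(xk) @ pk            # directional derivative phi'(0), < 0 for descent
    alpha = 1.0
    while f(xk + alpha * pk) > fk + c1 * alpha * slope:
        alpha /= 2.0                   # try 1, 1/2, 1/4, ...
    return alpha
```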
The Curvature Condition

Another way to rule out unacceptably short steps is to (approximately) solve φ'(α) = 0, which is relaxed to the condition

|φ'(α_k)| ≤ c_2 |φ'(0)|,

where c_2 is a constant with c_1 < c_2 < 1. Since φ'(α) = p_k^T ∇f(x_k + α p_k), we get

|p_k^T ∇f(x_k + α_k p_k)| ≤ c_2 |p_k^T ∇f(x_k)|.
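A small sketch of testing the curvature condition for a trial step length; grad_f is an assumed user-supplied gradient and c2 = 0.9 is a typical value for Newton-type methods:

```python
import numpy as np

def curvature_ok(grad_f, xk, pk, alpha_k, c2=0.9):
    """Check |phi'(alpha_k)| <= c2 * |phi'(0)| along the direction pk."""
    dphi_0 = grad_f(xk) @ pk                      # phi'(0)
    dphi_a = grad_f(xk + alpha_k * pk) @ pk       # phi'(alpha_k)
    return abs(dphi_a) <= c2 * abs(dphi_0)
```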
The Wolfe Conditions

The sufficient decrease condition and the curvature condition together, with constants 0 < c_1 < c_2 < 1, are collectively called the strong Wolfe conditions. Step length methods that use the Wolfe conditions are more complicated than backtracking. Several popular implementations of nonlinear optimization routines are based on the Wolfe conditions, notably the BFGS quasi-Newton method.

The Newton-Raphson method in ℝ

Consider the nonlinear problem f(x) = 0, where f: ℝ → ℝ. The Newton-Raphson method for solving this problem is based on the linear Taylor approximation of f around x_k,

f(x_k + p) ≈ f(x_k) + p f'(x_k).

If f'(x_k) ≠ 0, we solve the linear equation f(x_k) + p f'(x_k) = 0 for p and get p = −f(x_k)/f'(x_k).
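A minimal sketch of the Newton-Raphson iteration in one variable; f and df are assumed user-supplied callables:

```python
def newton_raphson(f, df, x0, tol=1e-10, max_iter=50):
    """Iterate x_{k+1} = x_k - f(x_k)/f'(x_k) until |f(x_k)| is small."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x -= fx / df(x)               # p = -f(x_k) / f'(x_k)
    return x

# Example: the square root of 2 as the root of f(x) = x^2 - 2.
print(newton_raphson(lambda x: x**2 - 2, lambda x: 2*x, 1.0))
```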
The Classical Newton minimization method in ℝⁿ

In order to use Newton's method to find a minimizer, we apply it to the first-order necessary condition on the function f,

∇f(x) = 0   (in one variable, f'(x) = 0).

This results in the Newton sequence

x_{k+1} = x_k − (∇²f(x_k))^{-1} ∇f(x_k)   (in one variable, x_{k+1} = x_k − f'(x_k)/f''(x_k)).

This is often written as x_{k+1} = x_k + p_k, where p_k is the solution of the Newton equation

∇²f(x_k) p_k = −∇f(x_k).

This formulation emphasizes that a linear system of equations is solved at each step, usually by other means than calculating an inverse.

Geometrical interpretation; the model function

The approximation of the nonlinear function ∇f(x) by the polynomial, linear in p,

∇f(x_k + p) ≈ ∇f(x_k) + ∇²f(x_k) p,

corresponds to approximating the nonlinear function f(x) by the quadratic (in p) Taylor expansion

m_k(x_k + p) = f_k + p^T ∇f_k + (1/2) p^T ∇²f(x_k) p,

i.e. B_k = ∇²f(x_k). Newton's method can thus be interpreted as follows: at each iteration k, f is approximated by the quadratic Taylor expansion m_k around x_k, and x_{k+1} is calculated as the minimizer of m_k.
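A minimal sketch of the classical Newton iteration for minimization, solving the Newton equation as a linear system at each step; grad_f and hess_f are assumed user-supplied callables:

```python
import numpy as np

def classical_newton(grad_f, hess_f, x0, tol=1e-8, max_iter=50):
    """Full-step Newton iteration: solve hess_f(x) p = -grad_f(x) each step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(hess_f(x), -g)   # Newton equation, no explicit inverse
        x = x + p                            # full step, no line search
    return x
```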
Properties of the Newton method

Disadvantages:
- It does not necessarily converge toward a minimizer.
- It may diverge if the starting approximation is too far from the solution.
- It will fail if ∇²f(x_k) is not invertible for some k.
- It requires second-order information, ∇²f(x_k).
Newton's method is rarely used in its classical formulation. However, many methods may be seen as approximations of Newton's method.
Ensuring a descent direction

If ∇²f_k is not positive definite, the Newton direction p^N may not be a descent direction. In that case we choose B_k as a positive definite approximation of ∇²f_k. Performed in a proper way, this modified algorithm will converge toward a minimizer. Furthermore, close to the solution the Hessian is usually positive definite, so the modification will typically only be performed far from the solution.
The positive definite approximation B_k of the Hessian may be found with minimal extra effort. The search direction p is calculated as the solution of ∇²f(x) p = −∇f(x). To solve this system, a factorization

∇²f(x) = L D L^T

may be used, where the diagonal elements of D are positive. If ∇²f(x) is not positive definite, at some point during the factorization a diagonal element will satisfy d_ii ≤ 0. In this case, the element may be replaced by a suitable positive entry. Finally, the modified factorization is used to calculate the search direction from

(L D L^T) p = −∇f(x).

The modified Newton algorithm with line search

At each iteration k: if ‖∇f(x_k)‖ < ε, stop. Otherwise, compute the modified LDL^T factorization of the Hessian, solve

(L D L^T) p_k^N = −∇f(x_k)

for the search direction p_k^N, perform a line search along p_k^N to obtain a step length α_k, and set x_{k+1} = x_k + α_k p_k^N.
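A sketch of a modified Newton method with a backtracking line search. Instead of the modified LDL^T factorization described above, this sketch uses a simpler stand-in: adding a multiple of the identity to the Hessian until a Cholesky factorization succeeds, which likewise produces a positive definite B_k. The callables f, grad_f and hess_f are assumed to be user-supplied:

```python
import numpy as np

def make_positive_definite(H, beta=1e-3):
    """Return H + tau*I with tau >= 0 chosen so the result is positive definite."""
    tau = 0.0 if np.all(np.diag(H) > 0) else beta
    while True:
        try:
            np.linalg.cholesky(H + tau * np.eye(H.shape[0]))
            return H + tau * np.eye(H.shape[0])
        except np.linalg.LinAlgError:
            tau = max(2 * tau, beta)         # increase the shift and retry

def modified_newton(f, grad_f, hess_f, x0, tol=1e-8, max_iter=100, c1=1e-4):
    """Modified Newton method with backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:          # stopping test
            break
        B = make_positive_definite(hess_f(x))
        p = np.linalg.solve(B, -g)           # positive definite B gives a descent direction
        alpha, fx = 1.0, f(x)
        while f(x + alpha * p) > fx + c1 * alpha * (g @ p):
            alpha /= 2.0                     # backtracking on the sufficient decrease condition
        x = x + alpha * p
    return x
```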