Line Search and Trust-Region
Nonlinear Optimization
Trust-region methods
Niclas Börlin
Department of Computing Science
Umeå University
niclas.borlin@cs.umu.se

December 3, 2007

◮ Line search and trust-region are two examples of global
  strategies that modify a (usually) locally convergent algorithm,
  e.g. Newton, to become globally convergent.
◮ At every iteration k, both global strategies enforce the descent
  condition

      f(x_{k+1}) < f(x_k)

  by controlling the length and direction of the step.
[Figure: illustration of a step p restricted by the trust-region radius ∆_k]

◮ The trust-region strategy restricts the step p to satisfy

      ‖p‖ ≤ ∆_k.

  This will limit the length of the step from x_k to x_{k+1}.
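As an illustration, here is a minimal Python sketch of how a trust-region iteration might enforce the descent condition by accepting or rejecting a candidate step and adjusting ∆_k. The particular shrink/grow factors and the boundary test are assumptions made for the sketch, not something prescribed by these slides.

    import numpy as np

    def accept_step(f_old, f_new, step, delta, shrink=0.5, grow=2.0):
        """Enforce f(x_{k+1}) < f(x_k); return (accepted, new delta).
        The shrink/grow factors are illustrative choices."""
        if f_new < f_old:
            # Accepted step; grow the region if the step was limited
            # by the current radius, i.e. ||p|| = delta_k.
            if np.isclose(np.linalg.norm(step), delta):
                delta *= grow
            return True, delta
        # Rejected step; shrink the region and retry from x_k.
        return False, shrink * delta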
The trust-region subproblem

◮ At iteration k of a trust-region method, the following subproblem
  must be solved:

      min_p  m_k(p) = f_k + p^T g_k + (1/2) p^T B_k p,
      s.t.   ‖p‖ ≤ ∆_k.

◮ It can be shown that the solution p* of this constrained problem
  is the solution of the linear equation system

      (B_k + λI) p* = −g_k

  for some λ ≥ 0 such that the matrix (B_k + λI) is positive
  semidefinite.
◮ Furthermore,

      λ(∆_k − ‖p*‖) = 0.

◮ Note that if B_k = ∇²f(x_k) is positive definite and ∆_k is big
  enough, the solution of the trust-region subproblem is the solution
  of

      ∇²f(x_k) p = −∇f(x_k),

  i.e. p is a Newton direction.
◮ Otherwise,

      ∆_k ≥ ‖p_k‖ = ‖(∇²f(x_k) + λI)^{-1} ∇f(x_k)‖,

  so if ∆_k → 0, then λ → ∞ and

      p_k → −(1/λ) ∇f(x_k).

◮ When λ varies between 0 and ∞, the corresponding search
  direction p_k(λ) will vary between the Newton direction and a
  multiple of the negative gradient.
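To make the characterization above concrete, the following Python sketch finds λ ≥ 0 and p with (B_k + λI)p = −g_k and ‖p‖ ≤ ∆_k by plain bisection on λ. It assumes B is symmetric, ignores the so-called hard case, and uses dense linear algebra throughout; it is a sketch of the idea, not a production solver.

    import numpy as np

    def solve_tr_subproblem(B, g, delta):
        """Solve min_p g'p + 0.5 p'Bp s.t. ||p|| <= delta via the
        characterization (B + lam*I)p = -g, lam >= 0,
        lam*(delta - ||p||) = 0, B + lam*I positive semidefinite.
        Plain bisection on lam; the 'hard case' is ignored."""
        B = np.asarray(B, dtype=float)
        g = np.asarray(g, dtype=float)
        I = np.eye(g.size)

        def p_of(lam):
            return np.linalg.solve(B + lam * I, -g)

        eig_min = np.linalg.eigvalsh(B)[0]

        # If B is positive definite and the full (Newton-type) step
        # fits inside the region, then lam = 0.
        if eig_min > 0:
            p = p_of(0.0)
            if np.linalg.norm(p) <= delta:
                return p, 0.0

        # Otherwise find lam > max(0, -eig_min) with ||p(lam)|| = delta.
        lo = max(0.0, -eig_min) + 1e-12
        hi = lo + 1.0
        while np.linalg.norm(p_of(hi)) > delta:
            hi *= 10.0                      # grow until the step fits
        for _ in range(100):                # bisect on ||p(lam)|| = delta
            lam = 0.5 * (lo + hi)
            if np.linalg.norm(p_of(lam)) > delta:
                lo = lam
            else:
                hi = lam
        return p_of(hi), hi

As λ grows, the returned p shrinks toward a multiple of −g, which matches the interpolation between the Newton direction and the negative gradient described above.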
The Levenberg-Marquardt algorithm

◮ The first trust-region algorithm was developed for least squares
  problems by Levenberg (1944) and Marquardt (1963).
◮ The original algorithm uses the approximation B_k = J_k^T J_k and
  solves

      (B_k + λ_k I) p = −g_k

  for different values of λ_k.
◮ The original algorithm adapts by modifying the λ value, i.e. if the
  reduction produced by p is good enough, λ_{k+1} = (1/10) λ_k;
  otherwise λ_{k+1} = 10 λ_k and the step is rejected.
◮ The Levenberg-Marquardt algorithm was put into the trust-region
  framework (∆-parameterized) in the early 1980s (Moré, 1981).
◮ The ∆ version of Levenberg-Marquardt has a number of
  advantages over the λ version:
  ◮ λ is nontrivially related to the problem. ∆ is related to the size
    of x. E.g. ∆_0 = ‖x_0‖ is often a reasonable choice.
  ◮ The transition to λ = 0 is handled transparently.
  ◮ The λ algorithm needs to re-solve

        (B_k + λ_k I) p = −g_k

    when a step is rejected and λ is increased. The ∆ algorithm has
    ways to avoid that.
◮ However, many popular implementations of Levenberg-Marquardt
  still use the original, λ-parameterized, formulation.
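The following Python sketch spells out the original λ-parameterized iteration described above for a least-squares problem, with B_k = J_k^T J_k and g_k = J_k^T r_k. The starting value of λ, the stopping test, and the example problem (Rosenbrock's function in residual form) are illustrative choices, not part of the original algorithm specification.

    import numpy as np

    def levenberg_marquardt(residual, jacobian, x0, lam=1e-3,
                            max_iter=200, tol=1e-8):
        """Lambda-parameterized Levenberg-Marquardt for
        min_x 0.5*||r(x)||^2 with B_k = J_k^T J_k, g_k = J_k^T r_k.
        Lambda is divided by 10 after a good step and multiplied
        by 10 when the step is rejected."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            r = residual(x)
            J = jacobian(x)
            g = J.T @ r                     # gradient of 0.5*||r||^2
            if np.linalg.norm(g) < tol:
                break
            B = J.T @ J                     # Gauss-Newton approximation
            p = np.linalg.solve(B + lam * np.eye(x.size), -g)
            if np.sum(residual(x + p)**2) < np.sum(r**2):
                x = x + p                   # accept: descent achieved
                lam /= 10.0                 # behave more like Gauss-Newton
            else:
                lam *= 10.0                 # reject: behave more like gradient descent
        return x

    # Example use on Rosenbrock's function written as a residual problem.
    r = lambda x: np.array([10.0 * (x[1] - x[0]**2), 1.0 - x[0]])
    J = lambda x: np.array([[-20.0 * x[0], 10.0], [-1.0, 0.0]])
    print(levenberg_marquardt(r, J, np.array([-1.2, 1.0])))  # converges to approx. [1, 1]

Note how a rejected step forces a new solve of (B_k + λ_k I)p = −g_k with a larger λ_k at the same point x_k, which is exactly the extra work the ∆-parameterized formulation is designed to reduce.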