Levenberg-Marquardt algorithms
vs
Trust Region algorithms
Frank Vanden Berghen
IRIDIA, Université Libre de Bruxelles
fvandenb@iridia.ulb.ac.be
For an in-depth explanation and more references about this subject, see my thesis, section 2.1,
available at: http://iridia.ulb.ac.be/~fvandenb/mythesis/index.html
Let’s assume that we want to find x∗, the minimum of the objective function F(x) : ℝⁿ → ℝ.
Newton’s method proceeds as follows:
1. build Q(δ), a quadratic approximation of F(x) around the current point xk:
   Q(δ) = F(xk) + δT gk + ½ δT Bk δ,
   where gk is the gradient of F at xk and Bk is an approximation of the Hessian of F at xk. The minimum δk of Q satisfies
   ∇Q(δk) = gk + Bk δk = 0 ⟺ Bk δk = −gk (1)
2. solve Bk δk = −gk (go to the minimum of the current quadratic approximation of F).
3. set xk+1 = xk + δk and return to step 1.
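As an illustration, here is a minimal sketch of this iteration in Python (the callables grad and hess, standing for the gradient and Hessian of F, are assumptions added for the example, not part of the original text):

    import numpy as np

    def newton(x, grad, hess, tol=1e-8, max_iter=100):
        # Newton's method: repeatedly solve Bk δk = −gk (equation (1))
        # and move to the minimum of the quadratic model Q.
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:
                break
            B = hess(x)                      # here Bk = the true Hessian Hk
            delta = np.linalg.solve(B, -g)   # Newton step from equation (1)
            x = x + delta
        return x

    # Usage example: minimize F(x) = x1^2 + 2 x2^2 (minimum at the origin)
    x_min = newton(np.array([3.0, -2.0]),
                   grad=lambda x: np.array([2 * x[0], 4 * x[1]]),
                   hess=lambda x: np.array([[2.0, 0.0], [0.0, 4.0]]))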
Newton’s method is VERY fast: when xk is close to x∗ (when we are near the optimum), this method has quadratic convergence speed:
‖xk+1 − x∗‖ < c ‖xk − x∗‖² with c < 1.
Unfortunately, it does NOT always converge to the minimum x∗ of F(x). To have convergence, we need to be sure that Bk is always positive definite, i.e. that the curvature of F(x) is always positive.
PROOF: the step δ is a descent direction if and only if
δT g < 0 (2)
Substituting g = −Bδ from equation (1) into (2) gives
−δT Bδ < 0 ⟺ δT Bδ > 0 (3)
which holds for every δ ≠ 0 exactly when B is positive definite. END OF PROOF.
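In practice, positive definiteness of Bk can be checked cheaply with a Cholesky factorization. A small illustrative check (not from the original text):

    import numpy as np

    def is_positive_definite(B):
        # A symmetric matrix is positive definite if and only if
        # its Cholesky factorization exists.
        try:
            np.linalg.cholesky(B)
            return True
        except np.linalg.LinAlgError:
            return False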
So, we must always construct the Bk matrix so that it is a positive definite approximation of Hk, the real Hessian matrix of F(x). Newton’s algorithm cannot exploit negative curvature (when Hk is negative definite) inside F(x). See figure 1 for an illustration of positive/negative curvature.
One possibility to solve this problem is to take Bk = I (I = identity matrix), which is a very bad approximation of the Hessian H but which is always positive definite. We then simply have δk = −gk: we follow the slope downhill. This algorithm is called the “steepest descent algorithm”. It is very slow: it has only a linear speed of convergence, ‖xk+1 − x∗‖ < c ‖xk − x∗‖ with c < 1.
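A minimal sketch of steepest descent, for comparison with the Newton iteration above (the fixed step length is an assumption added for the example; practical implementations use a line search):

    import numpy as np

    def steepest_descent(x, grad, step=0.1, tol=1e-8, max_iter=10000):
        # Bk = I, so the search direction is simply δk = −gk,
        # here scaled by a fixed step length.
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:
                break
            x = x - step * g
        return x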
The Levenberg-Marquardt step δk is the solution of the perturbed system
(Hk + λI) δk = −gk (4)
for λ = 0 we recover the Newton step, while for large λ the step approaches the steepest descent direction −gk/λ. The old Levenberg-Marquardt algorithm uses a technique which adapts the value of λ during the optimization. If the iteration was successful (F(xk + δk) < F(xk)), we decrease λ to exploit more the curvature information contained inside Hk. If the previous iteration was unsuccessful (F(xk + δk) > F(xk)), the quadratic model does not fit the real function properly, and we must rely only on the “basic” gradient information: we increase λ in order to follow the gradient closely (“steepest descent algorithm”). This old algorithm is the basis for the update of the trust region radius ∆k in Trust Region algorithms.
For intermediate values of λ, we will thus follow a direction which is a mixture of the “steepest descent step” and the “Newton step”. This direction is based on a perturbed Hessian matrix Bnew,k = Hk + λI and can sometimes be disastrous (there is no geometrical meaning of the perturbation λI on Hk).
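The λ-adaptation scheme described above can be sketched as follows (a minimal sketch: the callables F, grad and hess, and the conventional factor-of-10 update of λ, are assumptions; the original text does not specify an update factor):

    import numpy as np

    def levenberg_marquardt(x, F, grad, hess, lam=1e-3, tol=1e-8, max_iter=200):
        # Old-style Levenberg-Marquardt: solve (Hk + λI) δk = −gk
        # (equation (4)) and adapt λ after each iteration.
        n = len(x)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:
                break
            H = hess(x)
            delta = np.linalg.solve(H + lam * np.eye(n), -g)
            if F(x + delta) < F(x):      # successful step:
                x = x + delta
                lam /= 10.0              # trust the curvature information more
            else:                        # unsuccessful step:
                lam *= 10.0              # fall back towards steepest descent
        return x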
In a Trust Region algorithm, the step δk is instead the solution of the trust region subproblem
δk = arg min { Q(δ) : ‖δ‖ ≤ ∆k } (5)
where ∆k is the trust region radius, updated at each iteration in the same spirit as λ above. When a negative curvature is encountered (Hk not positive definite):
• Levenberg-Marquardt algorithms must increase λ until Hk + λI becomes positive definite, which produces a short step and slows the progress down.
• Trust Region algorithms will perform a long step (‖δk‖ = ∆k) and “move” quickly to a more interesting area (see equation (5)).
Trust Region algorithms will thus perform better each time a negative curvature is encountered, and therefore outperform all the Levenberg-Marquardt algorithms. Unfortunately, the computation of δk for a Trust Region algorithm involves a constrained minimization of a quadratic subject to one non-linear constraint (see equation (5)). This is not a trivial problem to solve at all. The algorithmic complexity of Trust Region algorithms is much higher. This explains why they are not encountered very often despite their better performance.
The solution of equation (5) can be computed very efficiently using the fast algorithm from Moré
and Sorensen.
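A simplified sketch of that computation is given below, using an eigendecomposition plus bisection on the multiplier λ. The real Moré-Sorensen algorithm instead uses Cholesky factorizations with a Newton iteration on λ and also handles the so-called “hard case”, which is ignored here:

    import numpy as np

    def solve_trust_region_subproblem(g, H, radius):
        # Minimize Q(d) = g.d + 0.5 d.H.d subject to ||d|| <= radius.
        w, V = np.linalg.eigh(H)              # H = V diag(w) V^T
        a = V.T @ g                           # gradient in the eigenbasis
        if w.min() > 0:                       # H positive definite:
            d = -V @ (a / w)                  # try the plain Newton step
            if np.linalg.norm(d) <= radius:
                return d
        # Otherwise the solution lies on the boundary: find λ with
        # ||(H + λI)^{-1} g|| = radius and λ > max(0, -smallest eigenvalue).
        def boundary_norm(lam):
            return np.linalg.norm(a / (w + lam))
        lo = max(0.0, -w.min()) + 1e-12
        hi = lo + 1.0
        while boundary_norm(hi) > radius:     # bracket the multiplier
            hi *= 2.0
        for _ in range(200):                  # plain bisection
            lam = 0.5 * (lo + hi)
            if boundary_norm(lam) > radius:
                lo = lam
            else:
                hi = lam
        return -V @ (a / (w + 0.5 * (lo + hi)))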
There is one last point which must still be taken into account: how can we obtain Hk? Usually, we don’t have an analytical expression for Hk, so it must be approximated numerically. Hk is usually constructed iteratively, based on information gathered from old evaluations F(xj) for j = 1, . . . , k. This iterative construction can, for example, be based on a quasi-Newton update formula such as the BFGS update.
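For instance, the BFGS formula builds the next Hessian approximation from the current one using only gradient differences. A sketch of this update (one possible quasi-Newton formula among several; its use here is an illustration, not a claim about the original text):

    import numpy as np

    def bfgs_update(H, s, y):
        # BFGS update of the Hessian approximation H, where
        #   s = x_{k+1} - x_k   (the last step)
        #   y = g_{k+1} - g_k   (the change in gradient)
        # The update keeps H symmetric and, as long as y.s > 0,
        # positive definite.
        Hs = H @ s
        return (H
                - np.outer(Hs, Hs) / (s @ Hs)
                + np.outer(y, y) / (y @ s))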
To summarize:
• Levenberg-Marquardt algorithms and Trust Region algorithms are both Newton step-based methods (they are called “restricted Newton step methods”). Thus they both exhibit quadratic convergence speed near x∗.
• When we are far from the solution (xk far from x∗), we can encounter a negative curvature (Hk negative definite). If this happens, Levenberg-Marquardt algorithms will slow down dramatically. By contrast, Trust Region methods will perform a very long step δk and “move” quickly to a more interesting area.
To summarize again: Trust Region methods are an evolution of the Levenberg-Marquardt algorithms. Trust Region methods are able to follow the negative curvature of the objective function; Levenberg-Marquardt algorithms are NOT able to do so and are therefore slower.