Levenberg-Marquardt algorithms
vs
Trust Region algorithms
Frank Vanden Berghen
IRIDIA, Université Libre de Bruxelles
fvandenb@iridia.ulb.ac.be
For an in-depth explanation and more references about this subject, see my thesis, section 2.1,
available at: http://iridia.ulb.ac.be/~fvandenb/mythesis/index.html
Let’s assume that we want to find x∗, the minimum of the objective function F(x) : ℝⁿ → ℝ.
Newton’s method proceeds as follows:
1. build Q(δ), a quadratic approximation of F(x) around the current point xk:
   Q(δ) = F(xk) + δT gk + ½ δT Bk δ,
   where gk is the gradient of F at xk and Bk is an approximation of the Hessian of F at xk. The minimum δk of Q satisfies
   ∇Q(δk) = gk + Bk δk = 0 ⟺ Bk δk = −gk (1)
2. solve Bk δk = −gk (go to the minimum of the current quadratic approximation of F).
3. set xk+1 = xk + δk and return to step 1.
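As an illustration, here is a minimal sketch of this iteration in Python (the callables grad and hess, standing for the gradient and Hessian of F, are assumptions added for the example, not part of the original text):

    import numpy as np

    def newton(x, grad, hess, tol=1e-8, max_iter=100):
        # Newton's method: repeatedly solve Bk δk = −gk (equation (1))
        # and move to the minimum of the quadratic model Q.
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:
                break
            B = hess(x)                      # here Bk = the true Hessian Hk
            delta = np.linalg.solve(B, -g)   # Newton step from equation (1)
            x = x + delta
        return x

    # Usage example: minimize F(x) = x1^2 + 2 x2^2 (minimum at the origin)
    x_min = newton(np.array([3.0, -2.0]),
                   grad=lambda x: np.array([2 * x[0], 4 * x[1]]),
                   hess=lambda x: np.array([[2.0, 0.0], [0.0, 4.0]]))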
Newton’s method is VERY fast: when xk is close to x∗ (when we are near the optimum), this method has quadratic convergence speed:
‖xk+1 − x∗‖ < c ‖xk − x∗‖² with c < 1.
Unfortunately, it does NOT always converge to the minimum x∗ of F(x). To have convergence, we need to be sure that Bk is always positive definite, i.e. that the curvature of F(x) is always positive.
PROOF: the step δ is a descent direction if and only if
δT g < 0 (2)
Substituting g = −Bδ from equation (1) into (2) gives
−δT Bδ < 0 ⟺ δT Bδ > 0 (3)
which holds for every δ ≠ 0 exactly when B is positive definite. END OF PROOF.
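In practice, positive definiteness of Bk can be checked cheaply with a Cholesky factorization. A small illustrative check (not from the original text):

    import numpy as np

    def is_positive_definite(B):
        # A symmetric matrix is positive definite if and only if
        # its Cholesky factorization exists.
        try:
            np.linalg.cholesky(B)
            return True
        except np.linalg.LinAlgError:
            return False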
So, we must always construct the Bk matrix so that it is a positive definite approximation of Hk, the real Hessian matrix of F(x). Newton’s algorithm cannot exploit negative curvature (when Hk is negative definite) inside F(x). See figure 1 for an illustration of positive/negative curvature.
One possibility to solve this problem is to take Bk = I (I = identity matrix), which is a very bad approximation of the Hessian H but which is always positive definite. We then simply have δk = −gk: we follow the slope downhill. This algorithm is called the “steepest descent algorithm”. It is very slow: it has only a linear speed of convergence, ‖xk+1 − x∗‖ < c ‖xk − x∗‖ with c < 1.
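A minimal sketch of steepest descent, for comparison with the Newton iteration above (the fixed step length is an assumption added for the example; practical implementations use a line search):

    import numpy as np

    def steepest_descent(x, grad, step=0.1, tol=1e-8, max_iter=10000):
        # Bk = I, so the search direction is simply δk = −gk,
        # here scaled by a fixed step length.
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:
                break
            x = x - step * g
        return x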
The Levenberg-Marquardt step δk is the solution of the perturbed system
(Hk + λI) δk = −gk (4)
for λ = 0 we recover the Newton step, while for large λ the step approaches the steepest descent direction −gk/λ. The old Levenberg-Marquardt algorithm uses a technique which adapts the value of λ during the optimization. If the iteration was successful (F(xk + δk) < F(xk)), we decrease λ to exploit more the curvature information contained inside Hk. If the previous iteration was unsuccessful (F(xk + δk) > F(xk)), the quadratic model does not fit the real function properly, and we must rely only on the “basic” gradient information: we increase λ in order to follow the gradient closely (“steepest descent algorithm”). This old algorithm is the basis for the update of the trust region radius ∆k in Trust Region algorithms.
For intermediate values of λ, we will thus follow a direction which is a mixture of the “steepest descent step” and the “Newton step”. This direction is based on a perturbed Hessian matrix Bnew,k = Hk + λI and can sometimes be disastrous (there is no geometrical meaning of the perturbation λI on Hk).
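The λ-adaptation scheme described above can be sketched as follows (a minimal sketch: the callables F, grad and hess, and the conventional factor-of-10 update of λ, are assumptions; the original text does not specify an update factor):

    import numpy as np

    def levenberg_marquardt(x, F, grad, hess, lam=1e-3, tol=1e-8, max_iter=200):
        # Old-style Levenberg-Marquardt: solve (Hk + λI) δk = −gk
        # (equation (4)) and adapt λ after each iteration.
        n = len(x)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:
                break
            H = hess(x)
            delta = np.linalg.solve(H + lam * np.eye(n), -g)
            if F(x + delta) < F(x):      # successful step:
                x = x + delta
                lam /= 10.0              # trust the curvature information more
            else:                        # unsuccessful step:
                lam *= 10.0              # fall back towards steepest descent
        return x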
In a Trust Region algorithm, the step δk is instead the solution of the trust region subproblem
δk = arg min { Q(δ) : ‖δ‖ ≤ ∆k } (5)
where ∆k is the trust region radius, updated at each iteration in the same spirit as λ above. When a negative curvature is encountered (Hk not positive definite):
• Levenberg-Marquardt algorithms must increase λ until Hk + λI becomes positive definite, which produces a short step and slows the progress down.
• Trust Region algorithms will perform a long step (‖δk‖ = ∆k) and “move” quickly to a more interesting area (see equation (5)).
Trust Region algorithms will thus perform better each time a negative curvature is encountered, and therefore outperform all the Levenberg-Marquardt algorithms. Unfortunately, the computation of δk for a Trust Region algorithm involves a constrained minimization of a quadratic subject to one non-linear constraint (see equation (5)). This is not a trivial problem to solve at all. The algorithmic complexity of Trust Region algorithms is much higher. This explains why they are not encountered very often despite their better performance.
The solution of equation (5) can be computed very efficiently using the fast algorithm from Moré
and Sorensen.
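A simplified sketch of that computation is given below, using an eigendecomposition plus bisection on the multiplier λ. The real Moré-Sorensen algorithm instead uses Cholesky factorizations with a Newton iteration on λ and also handles the so-called “hard case”, which is ignored here:

    import numpy as np

    def solve_trust_region_subproblem(g, H, radius):
        # Minimize Q(d) = g.d + 0.5 d.H.d subject to ||d|| <= radius.
        w, V = np.linalg.eigh(H)              # H = V diag(w) V^T
        a = V.T @ g                           # gradient in the eigenbasis
        if w.min() > 0:                       # H positive definite:
            d = -V @ (a / w)                  # try the plain Newton step
            if np.linalg.norm(d) <= radius:
                return d
        # Otherwise the solution lies on the boundary: find λ with
        # ||(H + λI)^{-1} g|| = radius and λ > max(0, -smallest eigenvalue).
        def boundary_norm(lam):
            return np.linalg.norm(a / (w + lam))
        lo = max(0.0, -w.min()) + 1e-12
        hi = lo + 1.0
        while boundary_norm(hi) > radius:     # bracket the multiplier
            hi *= 2.0
        for _ in range(200):                  # plain bisection
            lam = 0.5 * (lo + hi)
            if boundary_norm(lam) > radius:
                lo = lam
            else:
                hi = lam
        return -V @ (a / (w + 0.5 * (lo + hi)))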
There is one last point which must still be taken into account: how can we obtain Hk? Usually, we don’t have an analytical expression for Hk, so it must be approximated numerically. Hk is usually constructed iteratively, based on information gathered from old evaluations F(xj) for j = 1, . . . , k. This iterative construction can, for example, be based on a quasi-Newton update formula such as the BFGS update.
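For instance, the BFGS formula builds the next Hessian approximation from the current one using only gradient differences. A sketch of this update (one possible quasi-Newton formula among several; its use here is an illustration, not a claim about the original text):

    import numpy as np

    def bfgs_update(H, s, y):
        # BFGS update of the Hessian approximation H, where
        #   s = x_{k+1} - x_k   (the last step)
        #   y = g_{k+1} - g_k   (the change in gradient)
        # The update keeps H symmetric and, as long as y.s > 0,
        # positive definite.
        Hs = H @ s
        return (H
                - np.outer(Hs, Hs) / (s @ Hs)
                + np.outer(y, y) / (y @ s))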
To summarize:
• Levenberg-Marquardt algorithms and Trust Region algorithms are both Newton step-based methods (they are called “restricted Newton step methods”). Thus they both exhibit quadratic convergence speed near x∗.
• When we are far from the solution (xk far from x∗), we can encounter a negative curvature (Hk negative definite). If this happens, Levenberg-Marquardt algorithms will slow down dramatically. By contrast, Trust Region methods will perform a very long step δk and “move” quickly to a more interesting area.
To summarize again: Trust Region methods are an evolution of the Levenberg-Marquardt algorithms. Trust Region methods are able to follow the negative curvature of the objective function; Levenberg-Marquardt algorithms are NOT able to do so and are therefore slower.