Numerical Optimization: Lecture Notes #18 Quasi-Newton Methods - The BFGS Method
Numerical Optimization: Lecture Notes #18 Quasi-Newton Methods - The BFGS Method
Numerical Optimization: Lecture Notes #18 Quasi-Newton Methods - The BFGS Method
BFGS Variants
Numerical Optimization
Lecture Notes #18
Quasi-Newton Methods The BFGS Method
Peter Blomgren,
hblomgren.peter@gmail.comi
Fall 2016
Outline
1 Quasi-Newton Methods
Introduction
The BFGS Method
2 BFGS Variants
Limited-memory BFGS
Quasi-Newton Methods
We derive the DFP (a close relative) and BFGS methods; and look
at some properties and practical implementation details.
k = Bk1 f (
p xk ).
where we require that the step length k satisfies e.g. the Wolfe
conditions:
f (
xk +
pk ) f ( pT
xk ) + c1 k f (
x), c1 (0, 1)
T
pk f (
xk + T
k f (
pk ) c 2 p xk ), c2 (c1 , 1).
So far we have not really done anything new the key difference
compared with the linesearch Newton method is that we are using
an approximate Hessian Bk 6= 2 f (
xk ).
Instead to computing a completely new Bk in each iteration, we
will update
Bk+1 = Bk + something,
mk+1 (
0) = f (
xk+1 ).
mk+1 (k p
k ) = f (
xk+1 ) k Bk+1 p
k = f (
xk ).
k Bk+1 p
k = f(
xk+1 ) f(
xk ).
sk =
xk+1
xk k p
k
yk = f (
xk+1 ) f (
xk )
Bk+1sk =
yk .
sT Bk+1sk = sT
kyk sT
kyk > 0.
|k {z }
>0
xk+1 )T sk c2 f (
f ( xk )T sk ,
subject to B = B T , Bsk =
yk .
With this weighting matrix and norm, the unique solution of the
MMP is
1
Bk+1 = I k yk sT
k B k I k
s k
yk
T
+ k ykT , k = T .
yk
yk sk
yk sT
Note that k is a scalar, and k , ykT , and
sk ykT are rank-one
yk
matrices.
This is the original Davidon-Fletcher-Powell (DFP) method suggested by
W.C. Davidon in 1959.
The original paper describing this revolutionary idea the first
quasi-Newton method was not accepted for publication. It later
appeared in 1991 in the first issue the the SIAM Journal on Optimization.
Fletcher and Powell demonstrated that this algorithm was much faster
and more reliable than existing methods (at the time). This
revolutionized the field of non-linear optimization.
Peter Blomgren, hblomgren.peter@gmail.comi Quasi-Newton Methods The BFGS Method (12/24)
Quasi-Newton Methods Introduction
BFGS Variants The BFGS Method
Hk = Bk1
and use
Sherman-Morrison-Woodbury formula
If A Rnn is non-singular and Rn , and if
a, b
B = A+T
ab
then
A1 T A1
ab
B 1 = A1 .
1+b T A1
a
Peter Blomgren, hblomgren.peter@gmail.comi Quasi-Newton Methods The BFGS Method (13/24)
Quasi-Newton Methods Introduction
BFGS Variants The BFGS Method
Hk ykT Hk
yk sk sT
k
Hk+1 = Hk + .
y T Hk
yk ykT sk
| {z } | {z }
Update #1 Update #2
Hk+1
yk = sk , compare: Bk+1sk =
yk
subject to H = H T , H
yk = sk
BksksT
k Bk
ykykT
Bk+1 = Bk + .
sT
k Bksk ykTsk
BFGS:
1
Hk+1 = I k sk yk sT
ykT Hk I k sk sT
k + k k , k = ,
ykT sk
Bk sk sT
k Bk ykT
yk
Bk+1 = Bk + .
sT
k Bk sk ykT sk
DFP:
1
yk sT
Bk+1 = I k k Bk I k ykT + k
sk ykT ,
yk k = .
ykT sk
Hk ykT Hk
yk sk sT
Hk+1 = Hk T
+ Tk .
y Hk yk
yk sk
The initial value for the iteration can be selected in different ways
A finite difference approximation at
x0 .
The self-correcting properties stand and fall with the quality of the line
search! The Wolfe conditions ensure that the model captures
appropriate curvature information.
The DFP method is less effective at self-correcting bad Hessian
approximations.
Practical Implementation Details:
The linesearch should always test = 1 first, because this step length
will eventually be accepted, thus creating super-linear convergence.
The linesearch can be somewhat sloppy: c1 = 104 and c2 = 0.9
are commonly used values in the Wolfe conditions.
The initial matrix H0 should not be too large, if H0 = I , then the first
step is p
0 = f ( x0 ) which may be too long if is large, often H0
is rescaled before the update H1 is computed:
ykT sk
H0 I.
ykT
yk
L-BFGS
1
v = f ( xk )
2 T
j = j sj
v,
v=
v j
yj , j = k 1, . . . , k m.
3 w
= Hk v
4 yjT w,
j = j w =w + sj (j j ), j = k m, . . . , k 1
5 Now, use p k = w
( Hk f ( xk )).
References:
1 Matthies, H.; Strang, G. (1979). The solution of non linear finite
element equations. International Journal for Numerical Methods in
Engineering 14 (11): 16131626. doi:10.1002/nme.1620141104
2 Nocedal, J. (1980). Updating Quasi-Newton Matrices with Limited
Storage. Mathematics of Computation 35 (151): 773782.
doi:10.1090/S0025-5718-1980-0572855-7
Peter Blomgren, hblomgren.peter@gmail.comi Quasi-Newton Methods The BFGS Method (24/24)