10.1 Golden Section Search in One Dimension: Broyden-Fletcher-Goldfarb-Shanno (BFGS) Algorithm. These Methods
10.1 Golden Section Search in One Dimension: Broyden-Fletcher-Goldfarb-Shanno (BFGS) Algorithm. These Methods
10.1 Golden Section Search in One Dimension: Broyden-Fletcher-Goldfarb-Shanno (BFGS) Algorithm. These Methods
397
one-dimensional sub-minimization. Turn to 10.6 for detailed discussion and implementation. The second family goes under the names quasi-Newton or variable metric methods, as typied by the Davidon-Fletcher-Powell (DFP) algorithm (sometimes referred to just as Fletcher-Powell) or the closely related Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. These methods require of order N 2 storage, require derivative calculations and onedimensional sub-minimization. Details are in 10.7. You are now ready to proceed with scaling the peaks (and/or plumbing the depths) of practical optimization.
CITED REFERENCES AND FURTHER READING: Dennis, J.E., and Schnabel, R.B. 1983, Numerical Methods for Unconstrained Optimization and Nonlinear Equations (Englewood Cliffs, NJ: Prentice-Hall). Polak, E. 1971, Computational Methods in Optimization (New York: Academic Press). Gill, P.E., Murray, W., and Wright, M.H. 1981, Practical Optimization (New York: Academic Press). Acton, F.S. 1970, Numerical Methods That Work; 1990, corrected edition (Washington: Mathematical Association of America), Chapter 17. Jacobs, D.A.H. (ed.) 1977, The State of the Art in Numerical Analysis (London: Academic Press), Chapter III.1. Brent, R.P. 1973, Algorithms for Minimization without Derivatives (Englewood Cliffs, NJ: PrenticeHall). Dahlquist, G., and Bjorck, A. 1974, Numerical Methods (Englewood Cliffs, NJ: Prentice-Hall), Chapter 10.
Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America).
398
Chapter 10.
2 4 1 5 4 6
Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America).
Figure 10.1.1. Successive bracketing of a minimum. The minimum is originally bracketed by points 1,3,2. The function is evaluated at 4, which replaces 2; then at 5, which replaces 1; then at 6, which replaces 4. The rule at each stage is to keep a center point that is lower than the two outside points. After the steps shown, the minimum is bracketed by points 5,3,6.
contrariwise, if f (b) > f (x), then the new bracketing triplet is (b, x, c). In all cases the middle point of the new triplet is the abscissa whose ordinate is the best minimum achieved so far; see Figure 10.1.1. We continue the process of bracketing until the distance between the two outer points of the triplet is tolerably small. How small is tolerably small? For a minimum located at a value b, you might naively think that you will be able to bracket it in as small a range as (1 )b < b < (1 + )b, where is your computers oating-point precision, a number like 3 10 8 (for float) or 10 15 (for double). Not so! In general, the shape of your function f (x) near b will be given by Taylors theorem 1 f (x) f (b) + f (b)(x b)2 2 (10.1.1)
The second term will be negligible compared to the rst (that is, will be a factor smaller and will act just like zero when added to it) whenever |x b| < |b| 2 |f (b)| b2 f (b) (10.1.2)
The reason for writing the right-hand side in this way is that, for most functions, the nal square root is a number of order unity. Therefore, as a rule of thumb, it times its central is hopeless to ask for a bracketing interval of width less than value, a fractional width of only about 10 4 (single precision) or 3 10 8 (double precision). Knowing this inescapable fact will save you a lot of useless bisections! The minimum-nding routines of this chapter will often call for a user-supplied argument tol, and return with an abscissa whose fractional precision is about tol (bracketing interval of fractional size about 2tol). Unless you have a better
399
estimate for the right-hand side of equation (10.1.2), you should set tol equal to (not much less than) the square root of your machines oating-point precision, since smaller values will gain you nothing. It remains to decide on a strategy for choosing the new point x, given (a, b, c). Suppose that b is a fraction w of the way between a and c, i.e. ba =w ca cb =1w ca
Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America).
(10.1.3)
Also suppose that our next trial point x is an additional fraction z beyond b, xb =z ca (10.1.4)
Then the next bracketing segment will either be of length w + z relative to the current one, or else of length 1 w. If we want to minimize the worst case possibility, then we will choose z to make these equal, namely z = 1 2w (10.1.5)
We see at once that the new point is the symmetric point to b in the original interval, namely with |b a| equal to |x c|. This implies that the point x lies in the larger of the two segments (z is positive only if w < 1/2). But where in the larger segment? Where did the value of w itself come from? Presumably from the previous stage of applying our same strategy. Therefore, if z is chosen to be optimal, then so was w before it. This scale similarity implies that x should be the same fraction of the way from b to c (if that is the bigger segment) as was b from a to c, in other words, z =w 1w Equations (10.1.5) and (10.1.6) give the quadratic equation 3 5 2 0.38197 w 3w + 1 = 0 yielding w= 2 (10.1.6)
(10.1.7)
In other words, the optimal bracketing interval (a, b, c) has its middle point b a fractional distance 0.38197 from one end (say, a), and 0.61803 from the other end (say, b). These fractions are those of the so-called golden mean or golden section, whose supposedly aesthetic properties hark back to the ancient Pythagoreans. This optimal method of function minimization, the analog of the bisection method for nding zeros, is thus called the golden section search, summarized as follows: Given, at each stage, a bracketing triplet of points, the next point to be tried is that which is a fraction 0.38197 into the larger of the two intervals (measuring from the central point of the triplet). If you start out with a bracketing triplet whose segments are not in the golden ratios, the procedure of choosing successive points at the golden mean point of the larger segment will quickly converge you to the proper, self-replicating ratios. The golden section search guarantees that each new function evaluation will (after self-replicating ratios have been achieved) bracket the minimum to an interval
400
Chapter 10.
just 0.61803 times the size of the preceding interval. This is comparable to, but not quite as good as, the 0.50000 that holds when nding roots by bisection. Note that the convergence is linear (in the language of Chapter 9), meaning that successive signicant gures are won linearly with additional function evaluations. In the next section we will give a superlinear method, where the rate at which successive signicant gures are liberated increases with each successive function evaluation.
Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America).
Switch roles of a and b so that we can go downhill in the direction from a to b. First guess for c. Keep returning here until we bracket. Compute u by parabolic extrapolation from a, b, c. TINY is used to prevent any possible division by zero.
401
(2.0*SIGN(FMAX(fabs(q-r),TINY),q-r)); ulim=(*bx)+GLIMIT*(*cx-*bx); We wont go farther than this. Test various possibilities: if ((*bx-u)*(u-*cx) > 0.0) { Parabolic u is between b and c: try it. fu=(*func)(u); if (fu < *fc) { Got a minimum between b and c. *ax=(*bx); *bx=u; *fa=(*fb); *fb=fu; return; } else if (fu > *fb) { Got a minimum between between a and u. *cx=u; *fc=fu; return; } u=(*cx)+GOLD*(*cx-*bx); Parabolic t was no use. Use default magfu=(*func)(u); nication. } else if ((*cx-u)*(u-ulim) > 0.0) { Parabolic t is between c and its fu=(*func)(u); allowed limit. if (fu < *fc) { SHFT(*bx,*cx,u,*cx+GOLD*(*cx-*bx)) SHFT(*fb,*fc,fu,(*func)(u)) } } else if ((u-ulim)*(ulim-*cx) >= 0.0) { Limit parabolic u to maximum u=ulim; allowed value. fu=(*func)(u); } else { Reject parabolic u, use default magnicau=(*cx)+GOLD*(*cx-*bx); tion. fu=(*func)(u); } SHFT(*ax,*bx,*cx,u) Eliminate oldest point and continue. SHFT(*fa,*fb,*fc,fu) } }
Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America).
(Because of the housekeeping involved in moving around three or four points and their function values, the above program ends up looking deceptively formidable. That is true of several other programs in this chapter as well. The underlying ideas, however, are quite simple.)
402
Chapter 10.
x0=ax; At any given time we will keep track of four x3=cx; points, x0,x1,x2,x3 . if (fabs(cx-bx) > fabs(bx-ax)) { Make x0 to x1 the smaller segment, x1=bx; x2=bx+C*(cx-bx); and ll in the new point to be tried. } else { x2=bx; x1=bx-C*(bx-ax); } f1=(*f)(x1); The initial function evaluations. Note that f2=(*f)(x2); we never need to evaluate the function while (fabs(x3-x0) > tol*(fabs(x1)+fabs(x2))) { at the original endpoints. if (f2 < f1) { One possible outcome, SHFT3(x0,x1,x2,R*x1+C*x3) its housekeeping, SHFT2(f1,f2,(*f)(x2)) and a new function evaluation. } else { The other outcome, SHFT3(x3,x2,x1,R*x2+C*x0) SHFT2(f2,f1,(*f)(x1)) and its new function evaluation. } } Back to see if we are done. if (f1 < f2) { We are done. Output the best of the two *xmin=x1; current values. return f1; } else { *xmin=x2; return f2; } }
Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America).
as you can easily derive. This formula fails only if the three points are collinear, in which case the denominator is zero (minimum of the parabola is innitely far