John L. Troutman
Variational Calculus and Optimal Control
Editors
S. Axler
F.W. Gehring
P.R. Halmos
Springer
New York
Berlin
Heidelberg
Barcelona
Budapest
Hong Kong
London
Milan
Paris
Santa Clara
Singapore
Tokyo
Undergraduate Texts in Mathematics
Second Edition
With 87 Illustrations
Springer
John L. Troutman
Department of Mathematics
Syracuse University
Syracuse, NY 13210
USA
Editorial Board:
Sheldon Axler F.W. Gehring
Department of Mathematics Department of Mathematics
Michigan State University University of Michigan
East Lansing, MI 48824 Ann Arbor, MI 48109
USA USA
Paul R. Halmos
Department of Mathematics
Santa Clara University
Santa Clara, CA 95053
USA
Production coordinated by Brian Howe and managed by Henry Krell; manufacturing supervised by Jeffrey Taub.
Typeset by Asco Trade Typesetting Ltd., Hong Kong.
Printed and bound by R.R. Donnelley & Sons, Harrisonburg, VA.
Printed in the United States of America.
9 8 7 6 5 4 3 2 1
1 Even the perennial question of how a falling cat rights itself in midair can be cast as a control problem in geometric robotics! See Dynamics and Control of Mechanical Systems: The Falling Cat and Related Problems, edited by Michael Enos, American Mathematical Society, 1993.
Preface
this text we will view optimal control as a special form of variational calculus,
although with proper interpretation, these distinctions can be reversed.
In either field, most initial work consisted of finding (necessary) conditions
that characterize an optimal solution tacitly assumed to exist. These condi-
tions were not easy to justify mathematically, and the subsequent theories
that gave (sufficient) conditions guaranteeing that a candidate solution does
optimize were usually substantially harder to implement. (Conditions that
ensure existence of an optimizing solution were, and are, far more difficult
to investigate, and they cannot be considered at the introductory level of this
text. See [Ce].) Now, in any of these directions, the statements of most later
theoretical results incorporate some form of convexity in the defining func-
tions (at times in a disguised form). Of course, convexity was to be expected
in view of its importance in characterizing extrema of functions in ordinary
calculus, and it is natural to employ this central theme as the basis for an
introductory treatment.
The present book is both a refinement and an extension of the author's
earlier text, Variational Calculus with Elementary Convexity (Springer-Verlag,
1983) and its supplement, Optimal Control with Elementary Convexity (1986).
It is addressed to the same audience of junior to first-year graduate students
in the sciences who have some background in multidimensional calculus and
differential equations. The goal remains to solve problems completely (and
exactly) whenever possible at the mathematical level required to formulate
them. To help achieve this, the book incorporates a sliding scale of difficulty
that allows its user to become gradually more sophisticated, both technically
and theoretically. The few starred (*) sections, examples, and problems out-
side this scheme can usually be overlooked or treated lightly on first reading.
For our purposes, a convex function is a differentiable real-valued func-
tion whose graph lies above its tangent planes. In application, it may be
enough that a function of several variables have this behavior only in some
of the variables, and such "elementary" convexity can often be inferred
through pattern recognition. Moreover, with proper formulation, many more
problems possess this convexity than is popularly supposed. In fact, using
only standard calculus results, we can solve most of the problems that moti-
vated development of the variational calculus, as well as many problems of
interest in optimal control.
The paradigm for our treatment is as follows: Elementary convexity sug-
gests simple sufficiency conditions that can often lead to direct solution, and
they in turn inform the search for necessary conditions that hold whether or
not such convexity is present. For problems that can be formulated on a fixed
interval (or set) this statement remains valid even when fixed-endpoint condi-
tions are relaxed, or certain constraints (isoperimetric or Lagrangian) are
imposed. Moreover, sufficiency arguments involving elementary convexity
are so natural that even multidimensional generalizations readily suggest
themselves.
Acknowledgments
and Andy Vogel, who taught from this material at Syracuse and made valu-
able suggestions for its improvement. And of course, I feel deep gratitude to
and for my many students over the years, without whose evident enjoyment
and expressed appreciation the current work would not have been under-
taken. It is a pleasure to recognize those responsible for transforming this
work from manuscript to printed page: the principal typists, Louise Capra,
Esther Clark, and Steve Everson; the editors and staff at Springer-Verlag,
in particular, Jenny Wolkowicki and the late Walter Kaufmann-Bühler.
Finally, I wish to thank my wife, Patricia Brookes, for her patience and
understanding during the years of revision.
Preface vii
CHAPTER 0
Review of Optimization in ℝ^d 1
Problems 7
PART ONE
BASIC THEORY 11
CHAPTER 1
Standard Optimization Problems 13
1.1. Geodesic Problems 13
(a) Geodesics in ℝ^d 14
(b) Geodesics on a Sphere 15
(c) Other Geodesic Problems 17
1.2. Time-of-Transit Problems 17
(a) The Brachistochrone 17
(b) Steering and Control Problems 20
1.3. Isoperimetric Problems 21
1.4. Surface Area Problems 24
(a) Minimal Surface of Revolution 24
(b) Minimal Area Problem 25
(c) Plateau's Problem 26
1.5. Summary: Plan of the Text 26
Notation: Uses and Abuses 29
Problems 31
CHAPTER 2
Linear Spaces and Gateaux Variations 36
2.1. Real Linear Spaces 36
2.2. Functions from Linear Spaces 38
2.3. Fundamentals of Optimization 39
Constraints 41
Rotating Fluid Column 42
2.4. The Gateaux Variations 45
Problems 50
CHAPTER 3
Minimization of Convex Functions 53
3.1. Convex Functions 54
3.2. Convex Integral Functions 56
Free End-Point Problems 60
3.3. [Strongly] Convex Functions 61
3.4. Applications 65
(a) Geodesics on a Cylinder 65
(b) A Brachistochrone 66
(c) A Profile of Minimum Drag 69
(d) An Economics Problem 72
(e) Minimal Area Problem 74
3.5. Minimization with Convex Constraints 76
The Hanging Cable 78
Optimal Performance 81
3.6. Summary: Minimizing Procedures 83
Problems 84
CHAPTER 4
The Lemmas of Lagrange and Du Bois-Reymond 97
Problems 101
CHAPTER 5
Local Extrema in Normed Linear Spaces 103
5.1. Norms for Linear Spaces 103
5.2. Normed Linear Spaces: Convergence and Compactness 106
5.3. Continuity 108
5.4. (Local) Extremal Points 114
5.5. Necessary Conditions: Admissible Directions 115
5.6*. Affine Approximation: The Fréchet Derivative 120
Tangency 127
5.7. Extrema with Constraints: Lagrangian Multipliers 129
Problems 139
CHAPTER 6
The Euler-Lagrange Equations 145
6.1. The First Equation: Stationary Functions 147
6.2. Special Cases of the First Equation 148
PART TWO
ADVANCED TOPICS 195
CHAPTER 7
Piecewise C¹ Extremal Functions 197
7.1. Piecewise C¹ Functions 198
(a) Smoothing 199
(b) Norms for Ĉ¹ 201
7.2. Integral Functions on Ĉ¹ 202
7.3. Extremals in Ĉ¹[a, b]: The Weierstrass-Erdmann Corner Conditions 204
A Sturm-Liouville Problem 209
7.4. Minimization Through Convexity 211
Internal Constraints 212
7.5. Piecewise C¹ Vector-Valued Extremals 215
Minimal Surface of Revolution 217
Hilbert's Differentiability Criterion* 220
7.6*. Conditions Necessary for a Local Minimum 221
(a) The Weierstrass Condition 222
(b) The Legendre Condition 224
Bolza's Problem 225
Problems 227
CHAPTER 8
Variational Principles in Mechanics 234
8.1. The Action Integral 235
8.2. Hamilton's Principle: Generalized Coordinates 236
Bernoulli's Principle of Static Equilibrium 239
CHAPTER 9*
Sufficient Conditions for a Minimum 282
9.1. The Weierstrass Method 283
9.2. [Strict] Convexity of f(x, Y, Z) 286
9.3. Fields 288
Exact Fields and the Hamilton-Jacobi Equation* 293
9.4. Hilbert's Invariant Integral 294
The Brachistochrone* 296
Variable End-Point Problems 297
9.5. Minimization with Constraints 300
The Wirtinger Inequality 304
9.6*. Central Fields 308
Smooth Minimal Surface of Revolution 312
9.7. Construction of Central Fields with Given Trajectory:
The Jacobi Condition 314
9.8. Sufficient Conditions for a Local Minimum 319
(a) Pointwise Results 320
Hamilton's Principle 320
(b) Trajectory Results 321
9.9*. Necessity of the Jacobi Condition 322
9.10. Concluding Remarks 327
Problems 329
PART THREE
OPTIMAL CONTROL 339
CHAPTER 10*
Control Problems and Sufficiency Considerations 341
10.1. Mathematical Formulation and Terminology 342
CHAPTER 11
Necessary Conditions for Optimality 378
11.1. Necessity of the Minimum Principle 378
(a) Effects of Control Variations 380
(b) Autonomous Fixed Interval Problems 384
Oscillator Energy Problem 389
(c) General Control Problems 391
11.2. Linear Time-Optimal Problems 397
Problem Statement 398
A Free Space Docking Problem 401
11.3. General Lagrangian Constraints 404
(a) Control Sets Described by Lagrangian Inequalities 405
(b)* Variational Problems with Lagrangian Constraints 406
(c) Extensions 410
Problems 413
Appendix
A.0. Compact Sets in ℝ^d 419
A.1. The Intermediate and Mean Value Theorems 421
A.2. The Fundamental Theorem of Calculus 423
A.3. Partial Integrals: Leibniz' Formula 425
A.4. An Open Mapping Theorem 427
A.5. Families of Solutions to a System of Differential Equations 429
A.6. The Rayleigh Ratio 435
A.7*. Linear Functionals and Tangent Cones in ℝ^d 441
Bibliography 445
Historical References 450
Index 457
CHAPTER 0
Review of Optimization in ℝ^d
This chapter presents a brief summary of the standard terminology and basic
results related to characterizing the maximal and minimal values of a real
valued function f defined on a set D in Euclidean space. With the possible
exception of the remarks concerning convexity ((0.8) and (0.9)), this material
is covered in texts on multidimensional calculus; the notation is explained at
the end of §1.5.
I((!)) = 0,
is again unbounded.
for given real numbers aⱼ ≤ bⱼ, j = 1, 2, …, d, is compact. However, the
interval (−1, +1) is not compact. (See §A.0.)
f: D → ℝ is continuous at X₀ ∈ D iff for each ε > 0 there is a δ > 0 such that when
X ∈ D and |X − X₀| < δ, then |f(X) − f(X₀)| < ε; and f is continuous on D
iff it is continuous at each point X₀ ∈ D.
The previous examples show that neither compactness nor continuity can
alone assure the existence of extremal values.
(0.4) The maximum value of f is the minimum value of - f and vice versa.
=0.
[The bracketed quotient reverses sign as the sign of ε is changed. The existence and continuity of the partial derivatives ensures the existence of the limit, which must therefore be zero.]
Introducing the gradient vector ∇f ≝ (f_{x₁}, f_{x₂}, …, f_{x_d}), we may also express ∂_U f(X₀) = ∇f(X₀)·U, and conclude that at such an interior extremal point X₀,

∇f(X₀) = 𝒪. (4)
(0.6) The points Xo at which (4) holds, called stationary points (or critical
points) of f, need not give either a maximum or a minimum value of f.
(0.7) A stationary point X₀ may be (only) a local extremal point for f; i.e., one
for which f(X) ≤ f(X₀) (or f(X) ≥ f(X₀)) for all X ∈ D which are in some
neighborhood of X₀.
Figure 0.1
(0.9) When f is strictly convex on D; i.e., when (5) holds at each X₀ ∈ D with
equality iff X = X₀, then f can have at most one stationary point, and hence, at
most one interior minimum point, in D.
graph of f in ℝ³; for the general case, (7) may be used as a definition for
tangency.]
We can give the stationarity requirement ∇f(X₀) = 𝒪 the geometric interpretation that the graph of f has at the point (X₀, f(X₀)) ∈ ℝ^{d+1} a
"horizontal" tangent hyperplane; i.e., a d-dimensional subset parallel to ℝ^d.
(See §5.6.) Thus for d = 2, a marble "balanced" at (X₀, f(X₀)) should not roll
but remain "stationary." By (5), we see that a convex differentiable function
is one whose graph lies "above" its tangent hyperplanes.
NOTE: The existence of the partial derivatives of f in a neighborhood of X₀
together with their continuity at X₀ (as in (0.5)) is sufficient to guarantee
that (7) holds. (See A.7 and [Ed].)
If X₀ is a stationary point of f in D, and for each U = (u₁, u₂, …, u_d) ∈ ℝ^d:
(0.13) Remarks. If V = cU ∈ ℝ^d, then q(V) = c²q(U). Hence (8) holds iff the
quadratic form q(V) > 0, ∀ V ∈ ℝ^d, V ≠ 𝒪.
When (8) holds, the (symmetric) Hessian matrix f_{XX}(X₀), whose elements
are the second partial derivatives f_{x_i x_j}(X₀) = f_{x_j x_i}(X₀), i, j = 1, 2, …, d
(arranged in natural order), is said to be positive definite. Conditions which
characterize the positive definiteness of such matrices are known. (See Problem 0.10.) For the present, we observe that when (8) holds, the matrix f_{XX}(X₀)
is invertible. [If f_{XX}(X₀)V = 𝒪 for some V ∈ ℝ^d, then by the laws of matrix
multiplication,
(0.14) Unless D is open, i.e., has only interior points, it is also necessary to
consider the extremal values of f on ∂D, the boundary of D.
PROBLEMS
0.1. (a) Establish the Cauchy inequality (1). (Hint: If Y = 𝒪, the inequality obviously
holds; assume Y ≠ 𝒪, set μ = (X·Y)/|Y|², and consider |X − μY|².)
(b) What can you deduce about the relationship between X and Y if equality
holds in (1)?
(c) Use the Cauchy inequality (1) to prove the triangle inequality (2a).
(d) Conclude that the reverse triangle inequality (2b) holds.
0.2. (a) Derive the inequality

|X||Y| − X·Y ≤ |X − Y|².

(Hint: Show that |X||Y| ≤ ½(|X|² + |Y|²) and add ½|X − Y|² to the right
side of this last inequality.)
(b) Use the result of part (a) to verify that
I~
IXI
_~I <
IYI -
.J2 IX -
JiXTIYi YI
0.3. Find the maximum and minimum values (and the points at which they occur)
for
on
D = {X ∈ ℝ²: |xⱼ| ≤ 2, j = 1, 2}.
0.4. Find the maximum and minimum values (and the points at which they occur)
for
on
D = {X ∈ ℝ²: |X| ≤ 1}.
0.5. Which of the following functions are convex on D = ℝ²? Which are strictly
convex?
(a) f(X) = x₁² − x₂². (f) f(X) = x₁² + x₂.
(b) f(X) = x₁ − x₂. (g) f(X) = x₁⁴.
(c) f(X) = x₁² + x₂² − 2x₁. (h) f(X) = sin(x₁ + x₂).
(d) f(X) = e^{x₁} + x₂². (i) f(X) = x₁² − 2x₁x₂ + x₂².
(e) f(X) = x₁x₂. (j) f(X) = ax₁ + bx₂ + c, a, b, c ∈ ℝ.
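Convexity claims like those in Problem 0.5 can be probed numerically. The sketch below (in Python; the sampling box, trial count, and sample functions are illustrative choices, not from the text) tests the chord inequality f(tX + (1 − t)Y) ≤ tf(X) + (1 − t)f(Y) at random points: a single violation refutes convexity, while passing only suggests it.

```python
import random

# Sketch: a numerical convexity probe. Convexity requires
#   f(t*X + (1-t)*Y) <= t*f(X) + (1-t)*f(Y)
# for all X, Y in the plane and t in [0, 1]. Random sampling can only
# *refute* convexity (one violation suffices); passing merely suggests
# it. The tolerance guards against round-off.
def seems_convex(f, trials=2000, box=3.0, tol=1e-9):
    rng = random.Random(0)            # fixed seed for reproducibility
    for _ in range(trials):
        X = (rng.uniform(-box, box), rng.uniform(-box, box))
        Y = (rng.uniform(-box, box), rng.uniform(-box, box))
        t = rng.random()
        Z = (t*X[0] + (1 - t)*Y[0], t*X[1] + (1 - t)*Y[1])
        if f(*Z) > t*f(*X) + (1 - t)*f(*Y) + tol:
            return False              # definite counterexample found
    return True

# (i) x1^2 - 2*x1*x2 + x2^2 = (x1 - x2)^2 passes the probe (it is
# convex, though not strictly); (a) x1^2 - x2^2 is refuted.
assert seems_convex(lambda x1, x2: (x1 - x2)**2)
assert not seems_convex(lambda x1, x2: x1**2 - x2**2)
```

Such a probe cannot replace the proofs the problem asks for, but it is a quick way to sort candidates before attempting one.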
0.6. (a) Prove that the sum of two [strictly] convex functions on D ⊆ ℝ^d is [strictly]
convex on D.
(b) Is the sum of a convex function and a strictly convex function strictly
convex?
(c) Let f be strictly convex and c > 0. Show that cf is strictly convex.
(d) Give an example to show that the product of two convex functions need not
be convex.
0.7*. Suppose that f: ℝ^d → ℝ is differentiable. Show that f is convex on ℝ^d iff for each
X, X₀ ∈ ℝ^d,
divide both sides of the resulting inequality by t, and let t approach zero.
For the "only if" part, set Y = tX + (1 - t)Xo, 0 < t < 1, and establish the
inequalities

f(X₀) ≥ f(Y) + ∇f(Y)·(X₀ − Y),

f(X) ≥ f(Y) − ((1 − t)/t) ∇f(Y)·(X₀ − Y).

Then, combine these last two inequalities to get the result.)
0.8*. Assume that f: ℝ^d → ℝ has continuous second-order partial derivatives. Show
that f is convex on ℝ^d iff for each X₀ ∈ ℝ^d, the matrix of second partial derivatives, f_{XX}(X₀), is positive semidefinite, i.e.,

Σ_{i,j=1}^d f_{x_i x_j}(X₀) uᵢuⱼ ≥ 0, ∀ U ∈ ℝ^d.

(Hint: Use Taylor's theorem for f̄(t) ≝ f(X₀ + tU) where U = X − X₀, t ∈ ℝ.)
(a) Verify that the partial derivatives ∂f/∂x₁ and ∂f/∂x₂ both exist at X = 𝒪.
(b) Show that f is not continuous at X = 𝒪!
0.10. Let f: ℝ² → ℝ have continuous second-order partial derivatives and let U =
(u₁, u₂) be a unit vector.
(a) Verify that at X₀:
where
A = f_{x₁x₁}(X₀), B = f_{x₁x₂}(X₀), C = f_{x₂x₂}(X₀).
(b) If X₀ is a stationary point of f, with both AC − B² > 0 and A > 0,
conclude that X₀ is a strict local minimum point for f. (Hint: let u₁ = cos θ,
u₂ = sin θ.)
(c) Write the conditions of (b) in terms of subdeterminants of the 2 × 2 matrix
f_{XX}, and conjecture a form for a corresponding set of conditions for the
general d × d matrix.
0.11. Is f(X) = |X| [strictly] convex on ℝ^d − {𝒪}?
PART ONE
BASIC THEORY
Groningen,
January 1, 1697
AN ANNOUNCEMENT
"I, Johann Bernoulli, greet the most clever mathematicians in the world. Noth-
ing is more attractive to intelligent people than an honest, challenging problem
whose possible solution will bestow fame and remain as a lasting monument.
Following the example set by Pascal, Fermat, etc., I hope to earn the gratitude
of the entire scientific community by placing before the finest mathematicians
of our time a problem which will test their methods and the strength of their
intellect. If someone communicates to me the solution of the proposed problem,
I shall then publicly declare him worthy of praise."
CHAPTER 1
Standard Optimization Problems
ble) to take this route because of natural obstacles, and then it is necessary to
consider the more complicated problem of finding the geodesic curves (i.e.,
those of least length) among those constrained to a given "hyper" surface. In
particular, we might wish to characterize in ℝ³ the geodesics on the surface
of a sphere, on a cylinder, or on a cone.
Figure 1.0
= ∫₀¹ [(B − A)·Y′(t)] dt ≤ ∫₀¹ |B − A||Y′(t)| dt,
or
|B − A|² ≤ |B − A| ∫₀¹ |Y′(t)| dt,
and for A ≠ B, the desired inequality follows upon division by |B − A|. The
reader should verify each step in this chain. In obtaining the inequality
between the integrals, we utilized at each t ∈ (0, 1) the Cauchy inequality in
the form

[(B − A)·Y′(t)] ≤ |B − A||Y′(t)|,

and this pointwise inequality in turn implies that of the corresponding integrals. [See A.10.]
Can a nonstraight curve also be of minimal length? (See Problem 6.39.)
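The inequality just established, that every curve joining A to B has length at least |B − A|, is easy to probe numerically. A sketch in Python (the test curve is an arbitrary choice, not from the text):

```python
import math

# Sketch: numerically compare the length of a parametric curve Y(t),
# t in [0, 1], with the chord length |B - A|, illustrating the geodesic
# inequality of this section. The sample curve is an arbitrary choice.
def curve_length(Y, n=20000):
    """Polygonal approximation to the arc length of Y: [0, 1] -> R^2."""
    total = 0.0
    prev = Y(0.0)
    for k in range(1, n + 1):
        cur = Y(k / n)
        total += math.hypot(cur[0] - prev[0], cur[1] - prev[1])
        prev = cur
    return total

A, B = (0.0, 0.0), (1.0, 1.0)

def wiggle(t):                     # a nonstraight curve from A to B
    return (t, t + 0.2 * math.sin(math.pi * t))

chord = math.hypot(B[0] - A[0], B[1] - A[1])
assert curve_length(wiggle) > chord                      # strictly longer
assert abs(curve_length(lambda t: (t, t)) - chord) < 1e-9   # equality for the segment
```

The polygonal lengths here are themselves lower bounds for the true arc lengths, so the strict inequality for the wiggly curve is not a numerical artifact.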
Figure 1.1
Figure 1.2
of the circular arc shown in Figure 1.2 might not offer a faster time of transit
to a bead sliding down it under the action of gravity than would a straight
wire joining the same two points.
In 1696, Johann Bernoulli challenged mathematicians to find the bra-
chistochrone, that is, the planar curve which would provide the least time of
transit. His own solution was derived from an optical analogy, [see Problem
1.1]; and solutions were provided by his brother Jakob, by Newton, by Euler,
and by Leibniz. Although all of these reached the same conclusion-that the
brachistochrone is not the circular arc, but a cycloid-none of their solutions
is entirely satisfactory; however, that of Jakob Bernoulli admitted refinement
and far-reaching generalization: the variational calculus.
If we use the coordinate system shown in Figure 1.2, in which the initial
point A is the origin and the positive y axis is taken as vertically downward,
then a typical curve which might represent the brachistochrone joining A to
a lower point B = (x₁, y₁), where x₁ and y₁ are positive, can be represented as
the graph of a continuous function y = y(x), x ∈ [0, x₁], with y(0) = 0 and
y(x₁) = y₁. (Here we are abandoning the parametric form of representing
curves used in the previous section in favor of one less general but more
convenient. Although it is reasonable that the class of curves needed should
be so representable (Why?), the reader should consider whether something
essential is lost with this restriction.)
Assuming sufficient differentiability, this curve has length l, and the time
required to travel along it is given through pure kinematic considerations as

T = T(y) = ∫₀ˡ ds/v,
where v = ds/dt is the speed of travel at a distance s along the curve.
Now from calculus, for each x ∈ [0, x₁], s = s(x) = ∫₀ˣ √(1 + y′(ξ)²) dξ is
the arc length corresponding to the horizontal position x, and we may regard
v = v(x) as the associated speed. Thus with these substitutions,

T = T(y) = ∫₀^{x₁} √(1 + y′(x)²) / v(x) dx.
Figure 1.3
where α is the angle between the tangent line to the curve and the y axis at
this point (Figure 1.3). (We use the Newtonian notation of a dot to signify a
derivative with respect to time.)
Thus v v̇ = g ẏ, or upon integrating with respect to time,

v² = 2gy + const.

But at A, v = y = 0, so that in general,

v = √(2gy) or v(x) = √(2g y(x)).
Thus finally,

T = T(y) = (1/√(2g)) ∫₀^{x₁} ((1 + y′(x)²)/y(x))^{1/2} dx,  (2)
and we have the problem of minimizing T over all functions y = y(x) which
validate the above analysis. However, we may consider also the mathemati-
cal problem of minimizing the integral
∫₀^{x₁} ((1 + y′(x)²)/y(x))^{1/2} dx

over all functions y continuous on [0, x₁] for which y(0) = 0, y(x₁) = y₁, and
the integral is defined. This last condition requires that y have a derivative
integrable on [0, x₁] and that y be ≥ 0 with

∫₀^{x₁} (y(x))^{−1/2} dx < +∞.
Figure 1.4
Figure 1.5
where α(x) = (1 − r²(x))^{−1/2}. (In order that the crossing be possible we must
have 0 ≤ r(x) < 1.) (Why?) We are required to minimize this rather complicated integral over all those functions y which are continuously differentiable
on [0, x₁] and satisfy y(0) = 0, y(x₁) = y₁. The methods of Chapter 3 will
provide access to a solution. (See Problem 3.20.)
We may also consider the more natural problems in which the river banks
are represented by curves, and permit the crossing points to vary. Finally, we
can also permit the current to vary with y as well as x. In fact, in 1931,
Zermelo investigated the two-dimensional version of this problem, which
could be equally significant to the piloting of a submarine or a light aircraft
[C]. And when we ask how to operate our craft so that it travels along an
optimal path, we enter the realm of optimal control problems, first consid-
ered by Minorsky around 1920.
(Problems 1.1-1.3, 1.8)
Figure 1.6
smooth horizontal surface and filled with water as in Figure 1.6. Under the
action of gravity, the water will seek its lowest level and if we assume none
lost by leakage at the base, then this will be accomplished by a movement of
the walls of the cylinder until the effects of hydrostatic pressure are equalized.
Since this pressure is exerted uniformly at each depth, the final configuration
must have constant curvature, i.e., it is that of a right circular cylinder.
However, the configuration which provides least depth must clearly be that
having maximal cross-sectional area. Thus the circle encloses a greater area
than any noncircular isoperimetric curve.
One mathematical formulation of this problem is as follows: We suppose
that a smooth simple closed curve of length l is represented parametrically by
Y(t) = (x(t), y(t)), t ∈ [0, 1], with Y(0) = Y(1) for closure (Figure 1.7).
Then according to Green's theorem [Ed], the area of the domain D
bounded by the curve is
Figure 1.7

we must maximize A(Y) over all functions Y(t) having continuously differentiable components on [0, 1], which meet the closure condition Y(0) =
Y(1), and for which the resulting curve satisfies the isoperimetric condition
Remark. This problem has received much attention and there have been
other less restrictive formulations. In particular, the German geometer Jakob
Steiner (c. 1840) devised several techniques to attack it [P]. One well-known
analytical solution utilizes properties of Fourier series (see Problem 1.6),
but in §8.8, we will present a recently discovered solution that seems more
natural.
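The Green's-theorem area functional can be checked numerically. One common form is A(Y) = ½ ∮ (x y′ − y x′) dt (an assumption here, since the displayed formula is not legible in this copy); for the unit circle it should return π. A Python sketch:

```python
import math

# Sketch: estimate the enclosed area of a closed curve Y(t) = (x(t), y(t)),
# t in [0, 1], via one common Green's-theorem form,
#   A(Y) = (1/2) * integral of (x*y' - y*x') dt.
# (This form is an assumption; the text's displayed formula is missing
# from the scan.) For the unit circle the result should be pi.
def enclosed_area(x, y, xp, yp, n=100000):
    """Midpoint-rule estimate of (1/2) * closed integral of (x y' - y x')."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += x(t) * yp(t) - y(t) * xp(t)
    return 0.5 * total * h

two_pi = 2 * math.pi
area = enclosed_area(lambda t: math.cos(two_pi * t),
                     lambda t: math.sin(two_pi * t),
                     lambda t: -two_pi * math.sin(two_pi * t),
                     lambda t: two_pi * math.cos(two_pi * t))
assert abs(area - math.pi) < 1e-6
```

The same routine applied to noncircular curves of length 2π gives areas below π, which is what the isoperimetric claim of this section asserts.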
1 This problem was first proposed by Galileo (who believed that a parabolic shape would be
optimal) and it was then attacked mathematically by the Bernoullis in 1701. (See Goldstine.)
Find the surface of minimum area that spans fixed closed curves in ℝ³.
mentary calculus to minimize the surface area function
Figure 1.8
Figure 1.9
made as close as we please to the area of the bounding disks-and that these
probably represent the "least" area-but the associated "boundary" curve is
not of the form admitted.
We have two alternatives: First, we can simply predict that in this case the
problem as posed has no solution and attempt to substantiate this. Or, we
can seek a reformulation of the problem in which such cornered curves
remain admissible. We shall adopt the second alternative when we return to
this problem in §7.5. For the present, we note that a framework large enough
to include this alternative must accommodate an accurate description of cornered curves.
S(u) = ∫_D √(1 + u_x² + u_y²) dA,

where dA denotes the two-dimensional element of integration over D; the
boundary of D is denoted by ∂D, and in this text, we suppose it to be so
well-behaved that Riemann integration of continuous functions can be defined over D or D̄ and over ∂D [Ed].
We would then seek the minimum for S(u) over all functions u which are
continuous on D̄ = D ∪ ∂D, continuously differentiable inside D, and have
given continuous boundary values u|_{∂D} = γ, say. We shall obtain some partial results for this problem, which need not have a solution, in §3.4(e), under
the assumption that D is a Green's domain (one whose boundary smoothness
admits use of Green's theorem [Fl].)
Figure 1.10
usually by means of an integral of the form
for some given real valued function f. Here Y is a real [or possibly a vector]
valued function [each component of] which is continuous on [a, b] with a
continuous derivative in (a, b); and 𝒟, in general, consists of those functions
theoretical foundation for the laws that govern the operation of our universe.
We shall examine this aspect of the subject in Chapter 8 (which is essentially
independent of Chapters 7 and 9).
In the concluding part (Part III), we turn to problems of optimal control,
where the vector state Y of a system at time t is governed by a dynamical
system of differential equations of the vector form

Y′(t) = G(t, Y(t), U(t)),

dependent upon a vector "control" function U(t). The task is to determine a
"path" Y₀ and a control U₀ that optimize some performance assessment
integral of the form

F(Y, U) = ∫ F(t, Y(t), U(t)) dt
PROBLEMS
1.1. The Brachistochrone. (The following optical analogy was used by Johann
Bernoulli, in 1696, to solve the brachistochrone problem.) In a nonuniform me-
dium, the speed of light is not constant, but varies inversely with the index of
refraction. According to Fermat's principle of geometric optics, a light ray travel-
ling between two points in such a medium follows the "quickest" path joining the
points. Bernoulli thus concluded that finding the brachistochrone is equivalent to
finding the path of a light ray in a planar medium whose index of refraction is
inversely proportional to √y.
The optics problem can be solved by using Snell's law which states that at
each point along the path of a light ray, the sine of the angle which the path
makes with the y-axis is inversely proportional to the index of refraction (and
hence proportional to the speed). Therefore, the brachistochrone should satisfy
c sin α = √y, (11)
y(θ) = (c²/2)(1 − cos θ),

satisfies the differential equation found in part (a) and has (0, 0) as one end
point. (It will be shown in §3.4 that c and θ₁ can always be chosen to make
(x₁, y₁) the other end point, and that the resulting curve is expressible in the
form y = y(x).)
(Although this does not constitute a mathematically rigorous solution to the
problem, it illustrates an important parallel between geometric optics and parti-
cle mechanics which led to the works of Hamilton.)
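Bernoulli's conclusion can be spot-checked numerically: along the cycloid, relation (11) should hold identically. In the Python sketch below, y(θ) is as given above, while x(θ) = (c²/2)(θ − sin θ) is the standard cycloid companion (an assumption here, since it does not appear in this copy), and c is an arbitrary positive constant.

```python
import math

# Sketch: check Snell's relation c*sin(alpha) = sqrt(y), equation (11),
# along the cycloid. Here y(theta) = (c^2/2)(1 - cos(theta)) as in the
# text; x(theta) = (c^2/2)(theta - sin(theta)) is the standard cycloid
# parametrization (an assumption). alpha is the angle between the
# tangent and the y-axis, so sin(alpha) = (dx/dtheta)/ds with
# ds = sqrt((dx/dtheta)^2 + (dy/dtheta)^2).
c = 1.3                       # arbitrary positive constant for the check

def snell_residual(theta):
    dx = (c**2 / 2) * (1 - math.cos(theta))
    dy = (c**2 / 2) * math.sin(theta)
    sin_alpha = dx / math.hypot(dx, dy)
    y = (c**2 / 2) * (1 - math.cos(theta))
    return c * sin_alpha - math.sqrt(y)

# The residual should vanish all along the arch 0 < theta < pi.
assert all(abs(snell_residual(0.1 + 0.2 * k)) < 1e-12 for k in range(15))
```

Analytically the identity holds because sin α = sin(θ/2) on the cycloid while √y = c sin(θ/2), so the residual is exactly zero up to round-off.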
1.2. A Brachistochrone. (See §1.2(a).) Let x₁ = y₁ = 1.
(a) Use equation (2) to compute T(y) for the straight line path y = x.
(b) Use equation (2) and the trigonometric substitution 1 − x = cos θ to show
that for the circular arc y = √(1 − (x − 1)²),

T(y) = (2g)^{−1/2} ∫₀^{π/2} (sin θ)^{−1/2} dθ.
(c) Use a table of definite integrals to conclude that the circular arc in part (b)
provides a smaller transit time than the line segment in part (a).
(d) Use the inequality sin θ ≤ θ, θ ≥ 0, to obtain a lower estimate for the transit
time of the circular arc of (b);
(e)* Find similar (but more precise) upper estimates which lead to the same
conclusion as in (c) without obtaining a numerical value for the integral in
(b).
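Parts (a) to (c) can be confirmed numerically from equation (2) with x₁ = y₁ = 1: for the line y = x the integral evaluates to T = 2/√g exactly, while the circular arc's integral from part (b) can be estimated by a midpoint rule (its endpoint singularity is integrable). The value g = 9.81 and the grid size are assumptions of this sketch.

```python
import math

# Sketch: compare transit times from equation (2) with x1 = y1 = 1.
# For the line y = x, T = 2/sqrt(g) exactly; for the circular arc,
# T = (2g)^(-1/2) * integral of (sin theta)^(-1/2) over [0, pi/2],
# estimated by a midpoint rule. g = 9.81 is an assumed value.
g = 9.81

def midpoint(f, a, b, n=200000):
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

T_line = 2 / math.sqrt(g)
T_arc = midpoint(lambda th: math.sin(th) ** -0.5, 0.0, math.pi / 2) \
        / math.sqrt(2 * g)

assert T_arc < T_line       # the arc beats the straight wire, as in part (c)
```

The midpoint rule avoids the singular endpoint θ = 0, and its (slight) underestimate there does not affect the strict comparison.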
1.3. Transit Time of a Boat. (See §1.2(b).) Use the following steps to derive equation
(3):
(a) Show that the x- and y-components of the velocity of the boat are given
respectively by cos u and r + sin u, where u is the steering angle of the boat
relative to the x-axis shown in Figure 1.5.
(b) Prove that the crossing time is given by

T = ∫₀^{x₁} sec u dx,
1.4. Chaplygin's Problem. A small airplane in level flight with constant unit airspeed
flies along a simple smooth closed loop in one hour in the presence of a wind with
constant direction and speed w ≤ 1. Suppose that its ground position at time t is
given by Y(t) = (x(t), y(t)), where the wind is in the direction of the positive x-axis.
(a) Argue that (x′(t) − w)² + y′(t)² = 1, t ∈ [0, 1], while A(Y) = ∫₀¹ x(t)y′(t) dt
represents the area enclosed by the flight path.
(b) Formulate the problem of finding the flight path(s) maximizing the area
enclosed. (We return to this formulation in Problem 9.19.)
(c) When w = 0, show why a solution of the problem in (b) would solve the
classical isoperimetric problem of §1.3.
(d) As formulated in (b) Chaplygin's problem is not isoperimetric. (Why not?)
Recast it as an isoperimetric problem in terms of u(t), the steering angle at
time t between the wind direction and the longitudinal axis of the plane.
(Hint: Take Y(0) = Y(1) = 𝒪 and conclude that x(t) = wt + ∫₀ᵗ cos u(τ) dτ,
while y(t) = ∫₀ᵗ sin u(τ) dτ.)
1.5. Queen Dido's Conjecture. According to Virgil, Dido (of Carthage), when told
that her province would consist only of as much land as could be enclosed by the
hide of a bull, tore the hide into thin strips and used them to form a long rope of
length I with ends anchored to the "straight" Mediterranean coast as shown in
Figure 1.11. The rope itself was arranged in a semicircle which she believed would
result in the maximum "enclosed" province. And thus was Carthage founded-in
mythology.
Problems 33
Figure 1.11
For simplicity, suppose that l = π; then use the arc length s as the parameter.
(a) Show that the conjectured inequality for the vector valued function Y =
(x, y) with y(0) = y(π) = 0, is
(d) Show that the integral inequality in (a) is violated for the function y₁(s) =
sin(s/2). Is ∫₀^π y₁(s) ds = 0?
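Part (d) can be verified numerically. Assuming the inequality in (a) reduces to the Wirtinger-type bound ∫₀^π y′(s)² ds ≥ ∫₀^π y(s)² ds (an assumption; the displayed inequality is not legible in this copy), the sketch below shows that y₁(s) = sin(s/2) violates it, and that the mean of y₁ over [0, π] is not zero.

```python
import math

# Sketch: Problem 1.5(d) numerically. Assume the inequality in (a)
# reduces to the Wirtinger-type bound
#   integral of y'(s)^2 over [0, pi] >= integral of y(s)^2 over [0, pi]
# (an assumption; the displayed inequality is missing from the scan).
# For y1(s) = sin(s/2) the bound fails, and y1 also has nonzero mean,
# so it escapes both standard hypotheses of the Wirtinger inequality.
def midpoint(f, a, b, n=100000):
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

y1 = lambda s: math.sin(s / 2)
y1p = lambda s: 0.5 * math.cos(s / 2)

lhs = midpoint(lambda s: y1p(s) ** 2, 0.0, math.pi)   # pi/8
rhs = midpoint(lambda s: y1(s) ** 2, 0.0, math.pi)    # pi/2
mean = midpoint(y1, 0.0, math.pi)                     # 2, not 0

assert lhs < rhs             # the inequality is violated
assert abs(mean) > 1e-6      # and the mean of y1 is nonzero
```

Exact values are π/8, π/2, and 2 respectively, so the failure is by a wide margin, not a numerical accident.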
Figure 1.12
(a) Show that the time of transit along this path is, for some positive constant p,
given by T(y) = p ∫ₐᵇ (√(1 + y′(x)²)/y(x)) dx. (For further analysis of this
problem, see Problem 8.29 and §9.3.)
(b) When the x- and y-axes are interchanged, if this path can be described as the
graph of a function y = y(x) as in Figure 1.12(b), show that the time of
transit integral is, for a < b, given by T(y) = p ∫ₐᵇ (√(1 + y′(x)²)/x) dx. This formulation is examined further in Problem 3.23.
where y = y(t) is continuously differentiable on [0, T], and 0 ≤ y(t), with
y(0) = y(T) = 0.
The analysis is continued in Problems 6.43, 8.27 and 9.12.
CHAPTER 2
Linear Spaces and Gateaux Variations
on S for j = 1, 2, …, d, so that F(x) = (f₁(x), f₂(x), …, f_d(x)) and G(x) =
(g₁(x), g₂(x), …, g_d(x)), ∀ x ∈ S, then

(F + G)(x) ≝ (f₁(x) + g₁(x), f₂(x) + g₂(x), …, f_d(x) + g_d(x)) = F(x) + G(x)

and

(cF)(x) ≝ (cf₁(x), cf₂(x), …, cf_d(x)) = cF(x), ∀ x ∈ S.
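These pointwise definitions can be transcribed directly. The Python sketch below (sample functions are arbitrary choices, not from the text) shows that sums and scalar multiples of d-dimensional vector valued functions are again such functions, which is the closure property behind the linear-space claim.

```python
# Sketch: the pointwise operations above for d-dimensional vector
# valued functions on a set S. Closure under + and scalar
# multiplication is the linear-space property claimed in the text.
def add(F, G):
    """(F + G)(x) = F(x) + G(x), componentwise."""
    return lambda x: tuple(f + g for f, g in zip(F(x), G(x)))

def scale(c, F):
    """(cF)(x) = c * F(x), componentwise."""
    return lambda x: tuple(c * f for f in F(x))

F = lambda x: (x, x * x)        # sample functions, arbitrary choices
G = lambda x: (1.0, -x)

H = add(F, scale(2.0, G))       # H(x) = (x + 2, x*x - 2*x)
assert H(3.0) == (5.0, 3.0)
```

Note that `add` and `scale` return new functions, mirroring the fact that the operations are defined on the function space itself, not on particular values.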
It follows that each subspace of these spaces, i.e., each subset which is closed
under the defining operations of addition and scalar multiplication, is itself a
real linear space.
In particular, if continuity is definable on S, then C(S) (= C⁰(S)), the set of
continuous real valued functions on S, will be a real linear space since the
sum of continuous functions, or the multiple of a continuous function by a
real constant, is again a continuous function. Similarly, for each open subset
D of Euclidean space and each m = 1, 2, ... , cm(D), the set of functions on D
having continuous partial derivatives of order :5:.m, is a real linear space,
since the laws of differentiation guarantee that the sum or scalar multiple of
such functions will be another. In addition, if D is bounded with boundary
aD, and 15 = D u aD, then C m (15), the subset of Cm(D) 11 C(15) consisting of
those functions whose partial derivatives of order :5:. m each admit continuous
extension to 15, is a real linear space.
For example, when a, b ∈ ℝ, then the closure (a, b)‾ = [a, b] is a closed and bounded interval. A function y which is continuous on [a, b] is in C¹[a, b]¹ if it is continuously differentiable in (a, b) and its derivative y′ has finite limiting values from the right at a (denoted y′(a⁺)) and from the left at b (denoted y′(b⁻)). When no confusion can arise we shall use the simpler notations y′(a) and y′(b), respectively, for these values, with a similar convention for higher derivatives at a, b, when present. Observe that y₀(x) = x^{3/2} does define a function in C¹[0, 1] while y₁(x) = x^{1/2} does not.
Finally, for d = 1, 2, ..., [C(S)]^d, [C^m(D)]^d, and [C^m(D̄)]^d, the sets of d-dimensional vector valued functions whose components are in C(S), C^m(D), and C^m(D̄), respectively, also form real linear spaces.
We know that subsets 𝒟 of these spaces provide natural domains for optimization of the real valued functions in Chapter 1. However, these subsets do not in general constitute linear spaces themselves. For example,
𝒟 = {y ∈ C[a, b]: y(a) = 0, y(b) = 1}
is not a linear space, since if y ∈ 𝒟 then 2y ∉ 𝒟 (2y(b) = 2(1) = 2 ≠ 1). However,
𝒟₀ = {y ∈ C[a, b]: y(a) = y(b) = 0}
is a linear space. (Why?)
In the sequel we shall assume the presence of a real linear space 𝒴 consisting of points (or vectors) y, in which are defined the operations of (vector) addition and (real) scalar multiplication obeying the usual commutative, associative, and distributive laws. In particular, there is a unique zero vector Θ such that cΘ = 0y = Θ, ∀ y ∈ 𝒴, c ∈ ℝ; we also adopt the standard abbreviations 1y = y and (−1)y = −y, ∀ y ∈ 𝒴.
¹ We abbreviate C((a, b)) by C(a, b), C¹([a, b]) by C¹[a, b], etc.
Example 1.
J(y) = ∫ₐᵇ [sin³x + y²(x)] dx
is defined on all of 𝒴 = C[a, b], since each continuous function y ∈ 𝒴 results in a continuous integrand, sin³x + y²(x), whose integral is finite.
Example 2.
J(y) = ∫ₐᵇ p(x)√(1 + y′(x)²) dx, with p ∈ C[a, b],
is defined for each y ∈ 𝒴 = C¹[a, b], since the assumption that y has a derivative on (a, b) which has a continuous extension to [a, b] again results in a continuous integrand.
(Actually, J remains defined on 𝒴 when p is (Riemann) integrable over [a, b].)
Example 3. The brachistochrone transit time
T(y) = (1/√(2g)) ∫₀^{x₁} √((1 + y′(x)²)/y(x)) dx
is not defined on 𝒴 = C¹[0, x₁] because of the presence of the term √(y(x)) in the denominator of the integrand. It is defined on the subset
𝒟* = {y ∈ C¹[0, x₁]: y(x) ≥ 0, ∀ x ∈ (0, x₁), and ∫₀^{x₁} (y(x))^{−1/2} dx < +∞},
is defined on 𝒴 = C¹[a, b], since for each y ∈ 𝒴 the composite function f[y(x)] ≝ f(x, y(x), y′(x)) ∈ C[a, b]. However, if f ∈ C([a, b] × D), where D is a domain in ℝ², then F is defined only on a subset of
𝒟* = {y ∈ C¹[a, b]: (y(x), y′(x)) ∈ D, ∀ x ∈ [a, b]}.
Example 6. If J and Ĵ are real valued functions defined on a subset 𝒟* of any linear space 𝒴, then for c, ĉ ∈ ℝ,
cJ, cJ + ĉĴ, JĴ, e^J, sin J
are also defined on 𝒟*; but
1/J, √J, tan J
need not be defined. Thus
S(u) = ∫∫_D √(1 + u_x² + u_y²) dA
is defined ∀ u ∈ C¹(D̄).
(Problems 2.1-2.3)
∀ y ∈ 𝒟. For y₀ ∈ 𝒟 and y₀ + v ∈ 𝒟 we should examine
J(y₀ + v) − J(y₀) = ∫ₐᵇ v′(x)² dx + 2 ∫ₐᵇ y₀′(x)v′(x) dx ≥ 2 ∫ₐᵇ y₀′(x)v′(x) dx. (Why?)
Now y₀(a) = (y₀ + v)(a) = y₀(a) + v(a), so that v(a) = 0, and similarly, v(b) = 0. By inspection, y₀′ = const. makes
∫ₐᵇ y₀′(x)v′(x) dx = const. ∫ₐᵇ v′(x) dx = const. (v(b) − v(a)) = 0,
so that such a y₀ minimizes J.
¹ This is to be considered as two assertions; the first is made by deleting the bracketed expressions throughout, while the second requires their presence.
Constraints
If we seek to minimize J on 𝒟, when it is further restricted to a level set of one or more similar functions G, then as was known to Lagrange and Euler, it may suffice to minimize an augmented function without constraints.
(2.3) Proposition. If functions J and G₁, G₂, ..., G_N are defined on 𝒟, and for some constants λ₁, ..., λ_N, y₀ minimizes J̃ = J + λ₁G₁ + λ₂G₂ + ··· + λ_N G_N on 𝒟 [uniquely], then y₀ minimizes J on 𝒟 [uniquely] when further restricted to the set G_{y₀} = {y ∈ 𝒟: G_j(y) = G_j(y₀), j = 1, 2, ..., N}.
PROOF. For each y ∈ 𝒟, J̃(y) ≥ J̃(y₀); but when y ∈ G_{y₀}, then J(y) ≥ J(y₀), since the terms involving the G_j will have the same values on each side of the inequality. Uniqueness is clearly preserved if present. □
Remark. The hope is that the λ_j can be found so that in addition G_j(y₀) = ℓ_j for prescribed values ℓ_j, j = 1, 2, ..., N. Reinforcement for this possibility will be given in the discussion of the method of Lagrangian multipliers (§5.7).
Actually, y₀ minimizes J automatically on a much larger set.
If J̃(y) = J̃(y₀) under these conditions, then λ_jG_j(y₀) = λ_jG_j(y), j = 1, 2, ..., N (Why?), so that J(y) = J(y₀). [With uniqueness, it follows that y = y₀.] □
Figure 2.1
and this last integral vanishes ∀ y + v ∈ 𝒟 when the bracketed term ≡ 0; i.e., when
(4)
and thus would be the same for another (perfect) fluid. (The actual minimal value of J (= J(y₀)), although calculable, is of little interest.)
PROOF. If y ∈ 𝒟, then
so that
F(y) − F(y₀) ≥ ∫ₐᵇ λ(x)(g[y₀(x)] − g[y(x)]) dx
Observe that both y and the direction v are fixed while the limit as ε → 0 is taken in 2.6. The existence of the limit presupposes that:
(i) J(y) is defined; (6)
(ii) J(y + εv) is defined for all sufficiently small ε; and then
(iii) the resulting limit exists.
Example 2. If 𝒴 = C[a, b], then as in §2.2, Example 1, J(y) = ∫ₐᵇ [sin³x + y²(x)] dx is defined on 𝒴 (a linear space), so that
(J(y + εv) − J(y))/ε = (1/ε) ∫ₐᵇ [(y + εv)²(x) − y²(x)] dx = 2 ∫ₐᵇ y(x)v(x) dx + ε ∫ₐᵇ v²(x) dx.
Each of the integrals in this last expression is a constant (Why?), and the limit as ε → 0 exists. Hence from Definition 2.6, we conclude that
δJ(y; v) = 2 ∫ₐᵇ y(x)v(x) dx.
Alternatively, using (7), we could compute for fixed y, v the derivative
(d/dε) J(y + εv) at ε = 0.
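Definition 2.6 can also be checked numerically. In this sketch (my own; the choices y = sin and v = exp are arbitrary) the difference quotient for J(y) = ∫₀¹ y²(x) dx is compared with the value 2∫₀¹ y(x)v(x) dx obtained above.

```python
# Numerical sketch (not from the text): the Gateaux variation of
# J(y) = integral of y(x)^2 over [0, 1] as a limit of difference quotients.
import math

def integral(f, a=0.0, b=1.0, n=4000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h  # midpoint rule

y, v = math.sin, math.exp          # a sample point y and direction v

def J(func):
    return integral(lambda x: func(x) ** 2)

eps = 1e-6
quotient = (J(lambda x: y(x) + eps * v(x)) - J(y)) / eps
variation = 2.0 * integral(lambda x: y(x) * v(x))   # the formula above
```

The two values agree to within ε·∫v², exactly as the expansion of the difference quotient predicts.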
Example 3. When p ∈ C[a, b], the function
J(y) = ∫ₐᵇ p(x)√(1 + y′(x)²) dx
is defined on 𝒴 = C¹[a, b]. Hence, using the second method, we form for fixed y, v ∈ 𝒴,
(d/dε) J(y + εv) = ∫ₐᵇ p(x)(y + εv)′(x)v′(x)/√(1 + (y + εv)′(x)²) dx
(which is justified by the continuity of this last integrand on [a, b] × ℝ), and evaluate at ε = 0 to obtain
δJ(y; v) = ∫ₐᵇ p(x)y′(x)v′(x)/√(1 + y′(x)²) dx.
Example 4. When f ∈ C([a, b] × ℝ²), the function F(y) = ∫ₐᵇ f(x, y(x), y′(x)) dx is defined on 𝒴 = C¹[a, b], and we may compute δF(y; v) by differentiating under the integral sign, to see that F has the variation
δF(y; v) = ∫ₐᵇ [f_y(x, y(x), y′(x))v(x) + f_z(x, y(x), y′(x))v′(x)] dx.
Example 5. The function
J(y) = ∫₀^π √(1 − y²(x)) dx
is defined on 𝒟 = {y ∈ C[0, π]: |y(x)| ≤ 1}, so that, for example, y₁(x) = sin x, x ∈ [0, π], is a function in 𝒟, but y₂(x) = x is not.
(d/dε) J(y + εv) = ∫₀^π (∂/∂ε)√(1 − (y + εv)²(x)) dx = −∫₀^π ((y + εv)(x)v(x)/√(1 − (y + εv)²(x))) dx,
which for ε = 0 should give the value
δJ(y; v) = −∫₀^π (y(x)v(x)/√(1 − y²(x))) dx.
Example 7. The area function
A(Y) = ∫₀¹ x(t)y′(t) dt
of §1.3 is defined ∀ Y = (x, y) ∈ 𝒴 = (C¹[0, 1])², and for V = (u, v) ∈ 𝒴 it has the variation
δA(Y; V) = ∫₀¹ [u(t)y′(t) + x(t)v′(t)] dt.
Example 8. When f ∈ C([a, b] × ℝ^{2d}), the function F(Y) = ∫ₐᵇ f(x, Y(x), Y′(x)) dx is defined for all vector valued functions Y ∈ 𝒴 = (C¹[a, b])^d, (d = 2, 3, ...). Its Gateaux variation at Y in the direction V ∈ 𝒴 is given by
δF(Y; V) = ∫ₐᵇ [f_Y[Y(x)]·V(x) + f_Z[Y(x)]·V′(x)] dx, (12)
where f_{z_j}[Y(x)] ≝ f_{z_j}(x, Y(x), Y′(x)), j = 1, 2, ..., d.
(See Problem 2.6.) This result together with the notation used to express it
should be compared with the one-dimensional case of Example 4.
The surface area function
S(u) = ∫∫_D √(1 + u_x² + u_y²) dA
has the Gateaux variation
δS(u; v) = ∫∫_D ((u_x v_x + u_y v_y)/√(1 + u_x² + u_y²)) dA, (13)
which exists ∀ u, v ∈ 𝒴 = C¹(D̄), since the denominator can never vanish. (See Problem 2.7.)
(Problems 2.4-2.13)
PROBLEMS
2.1. Give an example of a nonconstant function in each of the following sets and determine whether the set is a subspace of 𝒴 = C¹[a, b].
(a) C[a, b].
(b) 𝒟 = {y ∈ 𝒴: y(a) = 0}.
(c) 𝒟 = {y ∈ 𝒴: y′(a) = 0, y(b) = 1}.
(d) C²[a, b].
(e) 𝒟 = {y ∈ 𝒴: y(a) = y′(a)}.
(f) (C¹[a, b])².
(g) 𝒟 = {y ∈ 𝒴: ∫ₐᵇ y(x) dx = 0}.
(h) 𝒟 = {y ∈ 𝒴: y′(x) = y(x), x ∈ (a, b)}.
2.2. Which of the following functions are defined on: (a) C¹[a, b]? (b) C[a, b]?
2.5. Let 𝒴 = C¹[a, b] and find δJ(y; v) for y, v ∈ 𝒴, when
(a) J(y) = y(a)³.
(b) J(y) = ∫ₐᵇ [y(x)³ + xy′(x)²] dx.
(c) J(y) = ∫ₐᵇ √(2 + x² − sin y′(x)) dx.
(d) J(y) = ∫ₐᵇ [e^x y(x) − 3y′(x)⁴] dx + 2y′(a)².
(e) J(y) = ∫ₐᵇ [x²y(x)² + e^{y′(x)}] dx.
(f) J(y) = sin y′(a) + cos y(b).
(g) J(y) = (∫ₐᵇ [2y²(x) + x²y(x)] dx)(∫ₐᵇ [1 + y′(x)]² dx).
(h) J(y) = ∫ₐᵇ y(x) dx / ∫ₐᵇ [1 + y′(x)²] dx.
2.6. In Example 8 of §2.4, verify equation (12) by formal differentiation under the
integral sign.
2.7. Let 𝒴 = C¹(D̄) where D is a bounded domain in the x-y plane with a nice boundary.
(a) For J(u) = ½ ∫∫_D (u_x² + u_y²) dA, show that
2.8. Let 𝒴 = (C¹[a, b])², Y = (y₁, y₂), V = (v₁, v₂), and find δJ(Y; V) for Y, V ∈ 𝒴 when
(a) J(Y) = Y(a)·Y(b).
(b) J(Y) = ∫ₐᵇ [y₁(x)² + y₂′(x)³] dx.
(c) J(Y) = ∫ₐᵇ [e^{y₁(x)} − x²y₁(x)y₂′(x)] dx.
(d) J(Y) = ∫ₐᵇ [sin² y₁(x) + xy₂(x) + y₁′(x)y₂(x)] dx.
2.9. Let 𝒴 = C²[a, b] and F(y) = ∫ₐᵇ f(x, y(x), y′(x), y″(x)) dx = ∫ₐᵇ f[y(x)] dx, where f = f(x, y, z, r) in C¹([a, b] × ℝ³) is given. For v, y ∈ 𝒴, prove that
2.12. (a) For p, β ∈ 𝒴 = C[0, b], with p > 0, find a function y₀ which minimizes
δL(Y; V) = ∫₀¹ (Y′(t)/|Y′(t)|)·V′(t) dt,
if Y ∈ 𝒟* = {Y ∈ 𝒴: Y′(t) ≠ Θ}. Is 𝒟* a subspace of 𝒴?
CHAPTER 3
Minimization of Convex Functions
¹ The definitions of functional convexity employed in this book incorporate, for convenience, some presupposed differentiability of the functions. For a convex set 𝒟, less restrictive formulations are available, but they are more difficult to utilize. (See Problem 0.7, [F1], and [E-T].)
(3.2) Proposition. If J and J̃ are convex functions on a set 𝒟, then for each c ∈ ℝ, c²J and J + J̃ are also convex. Moreover, the latter functions will be strictly convex with J (for c ≠ 0).
PROOF.
Hence J(y) ≥ J(y₀) [with equality iff y = y₀] and this is the desired conclusion. □
Example 1.
J(y) = ∫ₐᵇ (sin³x + y²(x)) dx,
for which
J(y + v) − J(y) = 2 ∫ₐᵇ y(x)v(x) dx + ∫ₐᵇ v²(x) dx ≥ 2 ∫ₐᵇ y(x)v(x) dx = δJ(y; v),
with equality iff ∫ₐᵇ v²(x) dx = 0, which by A.9 is possible for the continuous function v² iff v²(x) ≡ 0; i.e., v = Θ. Thus, by Proposition 3.3, each y ∈ 𝒴 which makes
δJ(y; v) = 2 ∫ₐᵇ y(x)v(x) dx = 0, ∀ v ∈ 𝒴,
minimizes J on 𝒴 uniquely. Clearly, y = Θ accomplishes this and hence it is the unique minimizing function.
On the other hand, to minimize J on
𝒟 = {y ∈ C[a, b]: y(a) = a₁, y(b) = b₁},
we would again try to have δJ(y; v) = 0, but now only for those y, y + v ∈ 𝒟, i.e., only for those v ∈ 𝒟₀ where
𝒟₀ = {v ∈ C[a, b]: v(a) = v(b) = 0}. (Why?)
Again y = Θ would make δJ(y; v) = 0 but now it is not in 𝒟 (unless a₁ = b₁ = 0).
Example 2.
F(y) = ∫ₐᵇ y′(x)² dx,
for which
δF(y; v) = 2 ∫ₐᵇ y′(x)v′(x) dx, ∀ y, v ∈ 𝒴 = C¹[a, b],
is also convex on 𝒴 (Why?), but now the equality F(y + v) − F(y) = δF(y; v) is possible iff ∫ₐᵇ v′(x)² dx = 0, and this occurs whenever v(x) = const. Thus F is not strictly convex on 𝒴. However, F is strictly convex on
F on 𝒟 uniquely. By inspection, y′ = const. will accomplish this, since then
δF(y; v) = 2y′ ∫ₐᵇ v′(x) dx = 2y′(v(b) − v(a)) = 0 for all admissible v.
[a, b] × ℝ², then as in Example 4 of §2.4, we know that the integral function
F(y) = ∫ₐᵇ f(x, y(x), y′(x)) dx
has, ∀ y, v ∈ C¹[a, b], the variation
δF(y; v) = ∫ₐᵇ [f_y[y(x)]v(x) + f_z[y(x)]v′(x)] dx.
Observe that the underlined variable(s) (if any) are held fixed in the inequality while partial derivatives of f are required only for the remaining variables (y and z). Clearly, if f itself is convex on ℝ³ as in §0.8 then f(x̲, y, z) will be convex as above. Moreover, if as in §0.9, f is strictly convex, then f(x̲, y, z) will be strongly convex. However, in general, strong convexity is weaker than strict convexity. (Why?) Also, f(y, z) is [strongly] convex on D ⊆ ℝ² precisely when f(x̲, y, z) ≝ f(y, z) is [strongly] convex on [a, b] × D.
If f(x̲, y, z) is [strongly] convex on [a, b] × D, then (7) gives
Example 1. To minimize
(3.6) Remarks. The differential equation (8) whose solutions in 𝒟 minimize our convex F is known as the Euler-Lagrange equation. It is a fundamental tool of the variational calculus and we will examine it thoroughly in Chapter 6. For the present, note that if f ∈ C²([a, b] × D), then we may use the chain rule (formally) on the left side of (8) and seek a minimizing y ∈ 𝒟 ∩ C²[a, b] which satisfies the second-order differential equation
f_zx[y(x)] + f_zy[y(x)]y′(x) + f_zz[y(x)]y″(x) = f_y[y(x)], (9)
(with the obvious abbreviations). Although there are standard existence theorems which provide conditions for a solution y to (9) in a neighborhood of x = a which satisfies y(a) = a₁, these theorems do not guarantee that such solutions can be extended to [a, b], or that when extendable they can meet the second end point condition y(b) = b₁. (In Problem 3.20, we have an example for which even the simpler equation f_z(x, y′(x)) = const. cannot be satisfied in 𝒟.) Thus we do not have a proof for the existence of a function which minimizes F on 𝒟, and indeed as we shall see, such functions need not exist. Our condition (8) is at best sufficient, and we must consider each application independently.
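As an illustration of (9) (my own sketch, not from the text), take f(x, y, z) = y² + z², for which (9) reduces to y″ = y; a shooting method then recovers the solution of the two-point problem y(0) = 0, y(1) = 1, whose exact solution is y(x) = sinh x / sinh 1.

```python
# Numerical sketch (illustrative, not from the text): for the convex integrand
# f(x, y, z) = y^2 + z^2, equation (9) reduces to y'' = y. A shooting method
# (RK4 integration plus bisection on the initial slope) solves the boundary
# value problem y(0) = 0, y(1) = 1.
import math

def terminal_value(slope, n=1000):
    # Integrate y'' = y from x = 0 with y(0) = 0, y'(0) = slope; return y(1).
    h = 1.0 / n
    y, z = 0.0, slope          # z = y'
    for _ in range(n):
        # One RK4 step for the system (y' = z, z' = y).
        k1y, k1z = z, y
        k2y, k2z = z + 0.5 * h * k1z, y + 0.5 * h * k1y
        k3y, k3z = z + 0.5 * h * k2z, y + 0.5 * h * k2y
        k4y, k4z = z + h * k3z, y + h * k3y
        y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
        z += h * (k1z + 2 * k2z + 2 * k3z + k4z) / 6
    return y

# Bisection on the unknown initial slope so that y(1) = 1
# (terminal_value is increasing in the slope, since the ODE is linear).
lo, hi = 0.0, 2.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if terminal_value(mid) < 1.0:
        lo = mid
    else:
        hi = mid
slope = 0.5 * (lo + hi)
exact_slope = 1.0 / math.sinh(1.0)   # y(x) = sinh(x)/sinh(1) gives y'(0)
```

The computed initial slope matches y′(0) = 1/sinh 1 of the exact solution, showing how (9) can be attacked numerically even though the text's existence caveats still apply.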
There are similar simplifications when f = f(x, y), but the associated
integrands occur less frequently in application. (See Problem 3.18.)
(3.10) Proposition. If f = f(x, z) and f_zz are continuous on [a, b] × I and for each x ∈ [a, b], f_zz(x, z) > 0 (except possibly at a finite set of z values), then f(x̲, z) is strongly convex on [a, b] × I.
PROOF. For fixed x ∈ [a, b], let g(z) = f(x, z), so that g″(z) = f_zz(x, z) > 0 on I (with a possible finite set of exceptional values). Then integrating by parts gives for distinct z, ζ ∈ I:
since the last integral is strictly positive by the hypothesis and A.9, independently of whether z < ζ or ζ < z. (Why?)
Thus with w = ζ − z, recalling the definition of g, we conclude that f(x, z + w) − f(x, z) > f_z(x, z)w when w ≠ 0, and this establishes the strong convexity of f(x̲, z). □
Remark. If at some x ∈ [a, b], f_zz(x, z) ≡ 0 on [z₁, z₂] ⊆ I, then f_z(x, z) increases with z, but not strictly, so that f(x̲, z) is only convex on [a, b] × I.
f_zz(x, z) = 2 > 0.
Example 2. f(x̲, z) = e^x(sin³x + z²) is also strongly convex on ℝ × ℝ since f_zz(x, z) = 2e^x > 0.
(In fact the product of a [strongly] convex function by a positive continuous
function p = p(x) is again [strongly] convex. See Problem 3.3.)
f_zz(x, z) = 1/√(r² + z²) − z²/(r² + z²)^{3/2} = r²/(r² + z²)^{3/2} > 0.
Example 7. f(x̲, z) = x² + (sin x)z², with f_zz(x, z) = 2 sin x, becomes convex only when sin x ≥ 0, e.g., on [0, π] × ℝ; and is strongly convex only when sin x > 0, e.g., on (0, π) × ℝ.
When all variables are present in the function f(x, y, z), there are no
simplifications such as those just considered. (There is again a second deriva-
tive condition which guarantees [strict] convexity, but it is awkward to
apply. See Problem 3.5.)
The following general observations (whose proofs are left to Problems 3.2
and 3.3) will be of value:
(3.11) Fact 1. The sum of a [strongly] convex function and one (or more) convex functions is again [strongly] convex.
Fact 2. The product of a [strongly] convex function f(x̲, y, z) by a continuous function p(x) ≥ 0 [p(x) > 0] is again [strongly] convex on the same set.
Fact 3. f(x̲, y, z) = α(x̲) + β(x̲)y + γ(x̲)z is (only) convex for any continuous functions α, β, γ.
Fact 4. Each [strongly] convex function f(x̲, z) (or f(x̲, y)) is also [strongly] convex when considered as a function f̃(x̲, y, z) on an appropriate set.
f(x̲, y, z) = p(x̲)√(1 + y² + z²)
is strongly convex on [a, b] × ℝ² (Fact 2 and Example 10).
which are discontinuous at the origin. However, on the restricted set ℝ² \ {(0, 0)} this function is again convex but not strongly convex. (See Problem 3.24.)
Example 13. When α ≠ 0, f(y, z) = (z + αy)² is only convex in ℝ², since (5) holds, but with equality when (w + αv)² = 0. (See Problem 3.16b.)
(Problems 3.1-3.19)
§3.4. Applications
In this section we show that convexity is present in problems from several
diverse fields-at least after suitable formulation-and use previous results
to characterize their solutions. Applications, presented in order of increasing
difficulty-and/or sophistication, are given which characterize geodesics on
a cylinder, a version of the brachistochrone, Newton's profile of minimum
drag, an optimal plan of production, and a form of the minimal surface.
Other applications in which convexity can be used with profit will be found
in Problems 3.20 et seq.
Figure 3.1
(If the cylinder were "unrolled," this would correspond to the straight line
joining the points.) Plants take helical paths when climbing around cylindri-
cal supporting stakes toward the sun [Li].
(b) A Brachistochrone
For our next application, we return to the brachistochrone of §1.2(a). As
formulated there, the function T(y) is not of the form covered by Theorem
3.5. (Why not?) However, if we interchange the roles of x and y and consider
those curves which admit representation as the graph of a function y ∈ 𝒟 = {y ∈ C¹[0, x₁]: y(0) = 0, y(x₁) = y₁} (with x₁ and y₁ both positive) as in Figure 3.2, then in the new coordinates, the same analysis as before gives for each such curve the transit time
T(y) = ∫₀^{x₁} √((1 + y′(x)²)/(2gx)) dx,
which has the strongly convex integrand function of §3.3, Example 4, with
Figure 3.2
r = 1 and p(x) = (2gx)^{−1/2} on (0, x₁]. Now p(x) is positive and integrable on [0, x₁], and although it is not continuous (at 0), Theorem 3.7 remains valid. (See Problem 3.21.)
Thus we know that among such curves, the minimum transit time would
be given uniquely by each y ∈ 𝒟 which makes
y′(x)²/(1 + y′(x)²) = x/c², so 1 + y′(x)² = c²/(c² − x),
and
y′(x) = (√x/c)√(1 + y′(x)²) = √(x/(c² − x)) ≥ 0. (12)
Thus y'(0) = 0.
If we introduce the new independent variable θ through the relation x(θ) = (c²/2)(1 − cos θ) = c² sin²(θ/2), then θ = 0 when x = 0, and for θ < π, θ increases with x. Also, c² − x(θ) = (c²/2)(1 + cos θ). By the chain rule,
dy/dθ = y′(x)x′(θ) = y′(x)·(c²/2) sin θ = √((1 − cos θ)/(1 + cos θ))·(c²/2) sin θ = (c²/2)(1 − cos θ).
Hence y(θ) = (c²/2)(θ − sin θ) + c₁, and the requirement y(0) = 0 shows that c₁ = 0.
Upon replacing the unspecified constant c by √2·c, we see that the minimum transit time would be given parametrically by a curve of the form
x(θ) = c²(1 − cos θ),
y(θ) = c²(θ − sin θ), (13)
provided that c² and θ₁ can be found to make x(θ₁) = x₁, y(θ₁) = y₁. The curve described by these equations is the cycloid with cusp at (0, 0) which would be traced by a point on the circumference of a disk of radius c² as it rolls along the y axis from "below" as shown in Figure 3.3.
For θ > 0, the ratio y(θ)/x(θ) = (θ − sin θ)/(1 − cos θ) has the limiting value +∞ as θ ↑ 2π, and by L'Hôpital's rule it has the limiting value 0 as θ ↓ 0. Its derivative is
[(1 − cos θ)² − sin θ(θ − sin θ)]/(1 − cos θ)² = [2(1 − cos θ) − θ sin θ]/(1 − cos θ)²
Figure 3.3
and thus is positive for 0 < θ < 2π. (Why?) y(θ)/x(θ) is positive, increases strictly from 0 to +∞ as θ increases from 0 to 2π, and hence from continuity (through the intermediate value theorem of §A.1) assumes each positive value precisely once. In particular, there is a unique θ₁ ∈ (0, 2π) for which y(θ₁)/x(θ₁) = y₁/x₁, and for this θ₁, choosing c² = x₁/(1 − cos θ₁) will guarantee the existence of a (unique) cycloid joining (0, 0) to (x₁, y₁).
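The determination of θ₁ and c² is easy to carry out numerically; the following sketch (mine, not from the text; the endpoint (1, 1) is an arbitrary choice) uses bisection on the strictly increasing ratio.

```python
# Numerical sketch (not from the text): the cycloid through (x1, y1).
# Bisection solves (theta - sin theta)/(1 - cos theta) = y1/x1 on (0, 2*pi);
# then c^2 = x1/(1 - cos theta1) fixes the parametrization (13).
import math

def ratio(theta):
    return (theta - math.sin(theta)) / (1.0 - math.cos(theta))

def cycloid_through(x1, y1):
    lo, hi = 1e-9, 2.0 * math.pi - 1e-9
    for _ in range(100):               # ratio increases strictly on (0, 2*pi)
        mid = 0.5 * (lo + hi)
        if ratio(mid) < y1 / x1:
            lo = mid
        else:
            hi = mid
    theta1 = 0.5 * (lo + hi)
    c2 = x1 / (1.0 - math.cos(theta1))  # the constant c^2 in (13)
    return theta1, c2

theta1, c2 = cycloid_through(1.0, 1.0)
x_end = c2 * (1.0 - math.cos(theta1))   # should reproduce x1
y_end = c2 * (theta1 - math.sin(theta1))  # should reproduce y1
```

The recovered endpoint matches (x₁, y₁) to bisection accuracy, confirming the intermediate-value argument above.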
Unfortunately, as Figure 3.3 shows, the associated curve can be represented in the form y = y(x) only when θ₁ ≤ π, i.e., when y₁/x₁ ≤ π/2. Moreover, the associated function y ∈ C¹[0, x₁] only when y₁/x₁ < π/2, since the tangent line to the cycloid must be horizontal at the lowest point on the arch. Nevertheless, we do have a nontrivial result:
(3.13) When y₁/x₁ < π/2, among all curves representable as the graph of a function y ∈ C¹[0, x₁] which join (0, 0) to (x₁, y₁), the cycloid provides uniquely the least time of descent.
on
F(y) = ∫ x(1 + y′(x)²)^{−1} dx,
Figure 3.4
¹ See the article "On Newton's Problem of Minimal Resistance," by G. Buttazzo and B. Kawohl, The Mathematical Intelligencer, Vol. 15, No. 4, 1993, pp. 7-12. Springer-Verlag, New York.
Figure 3.5
y(x) = ∫ₓ¹ (u²(cξ)/(cξ)) dξ = (1/c) ∫_{cx}^{c} (u²(s)/s) ds (cx ≥ s₀), (14′)
Figure 3.6. Minimum drag profiles.
¹ Newton himself believed that his results might be applicable in the design of a ship's hull. However, his resistance law is more appropriate to missiles traveling hypersonically in the thin air of our upper atmosphere. See [Fu].
which takes into account the deviations in both inventory I and associated production rate P from their "ideal" counterparts. (β is a constant which adjusts proportions.) This is rather a crude measure of cost, but it possesses analytical advantages.
Moreover, if we introduce the inventory deviation function y = I − ℐ, then we see from (15) and (15′) that the associated production deviation is given by
P − 𝒫 = I′ − ℐ′ + α(I − ℐ) = y′ + αy. (16′)
Therefore the cost may be regarded as
From (16′) we see that condition (18) simply requires that P(T) = 𝒫(T). Is this reasonable? Why?
If we differentiate in (18′) and cancel the αy′ terms from each side, then the equation reduces to
y″(t) = γ²y(t), (19)
when we substitute
γ² = α² + β². (19′)
The general solution of the differential equation (19) is
y₀(t) = c₁e^{γt} + c₂e^{−γt}, with y₀′(t) = γ(c₁e^{γt} − c₂e^{−γt}), (20)
and we must try to find the constants c₁ and c₂ so that y₀(t) satisfies the boundary conditions. We require that
y₀(0) = a₀ = c₁ + c₂,
and
0 = y₀′(T) + αy₀(T) = c₁(γ + α)e^{γT} + c₂(−γ + α)e^{−γT},
or that
0 = c₁(γ + α)e^{2γT} − c₂(γ − α).
From this last equation, the ratio
ρ ≝ ((γ + α)/(γ − α))e^{2γT} = c₂/c₁ (21)
is specified, and for this ρ the choices c₁ = a₀/(1 + ρ), c₂ = a₀ρ/(1 + ρ) will satisfy both conditions. This gives the desired conclusion:
(3.14) Among all inventory functions I ∈ C¹[0, T] with α, β, and I(0) = I₀ prescribed, that given by (22), with ρ, γ determined by (21) and (19′), respectively, will provide uniquely the minimum cost of operation as assessed by (16). The associated optimal production rate is
P₀(t) = 𝒫(t) + (1 + ρ)^{−1}(I₀ − ℐ₀)(γ + α)[e^{γt} − e^{γ(2T−t)}]. (23)
Moreover, in this case the minimum cost can easily be computed. Indeed from (17),
C(y₀) = ∫₀^T [β²y₀² + (y₀′ + αy₀)²] dt,
and we recall that y₀ satisfies (18) and (18′). We see that the integrand is just [y₀(y₀′ + αy₀)]′(t), and since y₀(0) = a₀, we conclude that
C(y₀) = [y₀(y₀′ + αy₀)]₀^T = −αa₀² − a₀y₀′(0),
where y₀′(0) = I′(0) − ℐ′(0) can be obtained by differentiating (22).
Finally, recalling that a₀ = I₀ − ℐ₀, we find that
and this expression shows the effects of various choices of α, β, T, and ℐ(0) on the minimum cost of operation. Observe that it is independent of the sign of the initial inventory deviation.
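These conditions are easy to verify numerically; the following sketch (mine, not from the text; the values of α, β, T, a₀ are arbitrary samples) checks that the constants from (21) meet both boundary conditions.

```python
# Numerical sketch (not from the text): checking the production-planning
# solution. The constants c1, c2 built from (21) give y0(0) = a0 and the
# terminal condition y0'(T) + alpha*y0(T) = 0 of (18).
import math

alpha, beta, T, a0 = 0.5, 1.2, 2.0, 3.0      # illustrative sample values
gamma = math.sqrt(alpha**2 + beta**2)         # (19')

rho = (gamma + alpha) / (gamma - alpha) * math.exp(2.0 * gamma * T)  # (21)
c1 = a0 / (1.0 + rho)
c2 = a0 * rho / (1.0 + rho)

def y0(t):
    return c1 * math.exp(gamma * t) + c2 * math.exp(-gamma * t)

def dy0(t):
    return gamma * (c1 * math.exp(gamma * t) - c2 * math.exp(-gamma * t))
```

By construction y₀ also solves (19), since each exponential satisfies y″ = γ²y.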
√(1 + (u_x + v_x)² + (u_y + v_y)²) − √(1 + u_x² + u_y²) ≥ (u_x v_x + u_y v_y)/√(1 + u_x² + u_y²),
with equality iff v_x = v_y = 0. Hence, from the assumed continuity:
S(u + v) − S(u) ≥ δS(u; v) = ∫∫_D ((u_x v_x + u_y v_y)/√(1 + u_x² + u_y²)) dx dy
U ≝ u_x/√(1 + u_x² + u_y²) and W ≝ u_y/√(1 + u_x² + u_y²)
are in C¹(D), so that the integrand of δS(u; v) may be rewritten as Uv_x + Wv_y = (Uv)_x + (Wv)_y − (U_x + W_y)v. Now, if we assume that Green's theorem holds for the domain D ([Fl]), then
and for v ∈ 𝒟₀, the line integral vanishes. Thus for v ∈ 𝒟₀,
§3.5. Minimization with Convex Constraints
(3.16) Theorem. If D is a domain in ℝ² such that for some constants λ_j, j = 1, 2, ..., N, f(x̲, y, z) and λ_j g_j(x̲, y, z) are convex on [a, b] × D [and at least one of these functions is strongly convex on this set], let f̃ = f + λ₁g₁ + ··· + λ_N g_N. Then f̃(x̲, y, z) is [strongly] convex on [a, b] × D, so that by Theorem 3.5, y₀ minimizes
F̃(y) = ∫ₐᵇ f̃[y(x)] dx,
and we conclude that each solution y₀ ∈ 𝒟 of the differential equation for the new f̃ minimizes F on 𝒟 [uniquely] under the pointwise constraining relations
g_j[y(x)] = g_j[y₀(x)], j = 1, 2, ..., N,
of Lagrangian form.
Although, in general not even one such gj[Yo(x)] may be specifiable a
priori (Why?), the vector-valued version does permit minimization with given
Lagrangian constraints. (See Problem 3.35 et seq.)
Corresponding applications involving inequality constraints are consid-
ered in Problem 3.31 and in §7.4.
Example 1. To minimize
F(y) = ∫₀¹ y′(x)² dx
on
𝒟 = {y ∈ C¹[0, 1]: y(0) = 0, y(1) = 0},
when restricted to the set
{y ∈ C¹[0, 1]: G(y) ≝ ∫₀¹ y(x) dx = 1},
we observe that f(x̲, y, z) = z² is strongly convex, while g(x̲, y, z) = y is (only) convex, on ℝ × ℝ². Hence, we set f̃(x, y, z) = z² + λy and try to find λ for which λg(x̲, y, z) remains convex while the differential equation
(d/dx) f̃_z[y(x)] = f̃_y[y(x)]
has a solution y₀ ∈ 𝒟 for which G(y₀) = 1. Now since g is linear in y (and z), λg(x, y, z) = λy is convex for each real λ. Upon substitution for f̃, the differential equation becomes
(d/dx)(2y′(x)) = λ or y″(x) = λ/2,
which has the general solution
and since −24g(x̲, y, z) = −24y remains convex, we have found the unique solution to our problem.
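The computation can be checked numerically. In this sketch (mine, not from the text) the minimizer implied by λ = −24 and the end conditions, y₀(x) = 6x(1 − x), is tested against a competitor on the same level set of G.

```python
# Numerical sketch (not from the text): y0(x) = 6x(1 - x) follows from
# y'' = lambda/2 with lambda = -24 and y(0) = y(1) = 0. It meets the
# constraint G(y0) = 1 and minimizes F(y) = integral of y'(x)^2 there.
import math

def integral(f, n=4000):
    h = 1.0 / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h  # midpoint rule

y0 = lambda x: 6.0 * x * (1.0 - x)
dy0 = lambda x: 6.0 - 12.0 * x

G0 = integral(y0)                      # constraint value; should be 1
F0 = integral(lambda x: dy0(x) ** 2)   # equals 12 exactly

# Competitor y0 + w, where w = sin(2*pi*x) vanishes at 0 and 1 and has
# zero mean on [0, 1], so the competitor stays on the same level set of G.
dw = lambda x: 2.0 * math.pi * math.cos(2.0 * math.pi * x)
F1 = integral(lambda x: (dy0(x) + dw(x)) ** 2)
```

As expected, the competitor costs strictly more than F(y₀) = 12.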
(3.18) Remark. In this example we can find λ to force y₀ into any level set of G we wish, since λg(x̲, y, z) = λy is always convex for each value of λ. This is not the case in general, and our approach will work only for a restricted class of level sets of G. (See Problem 3.29.)
Example 2 (The catenary problem). Let's determine the shape which a long
inextensible cable (or chain) will assume under its own weight when sus-
pended freely from its end points at equal heights as shown in Figure 3.7.
We utilize the coordinate system shown, and invoke Bernoulli's principle
that the shape assumed will minimize the potential energy of the system.
(See §8.3.)
Figure 3.7
We suppose the cable to be of length L and weight per unit length W, and that the supports are separated a distance H < L. Then, utilizing the arclength s along the cable as the independent variable, a shape is specified by a function y ∈ 𝒴 = C¹[0, L] with y(0) = y(L) = 0, which has associated with it the potential energy given, within an additive reference constant, by the center-of-mass integral
F(y) = W ∫₀^L y(s) ds.
However, in order to span the supports, the function y must satisfy the constraining relation
G(y) = ∫₀^L √(1 − y′(s)²) ds = ∫₀^L dx(s) = H,
where x(s) denotes the horizontal displacement of the point at a distance s along the cable, since then, as elementary geometry shows, x′(s)² + y′(s)² = 1. Clearly |y′(s)| ≤ 1, and if |y′(s₁)| = 1, then the cable would have a cusp at s₁, since x′(s₁) = 0.
Now f(s̲, y, z) = Wy is (only) convex on [0, L] × ℝ² while g(s̲, y, z) = −√(1 − z²) is by §3.3, Example 5, strongly convex on [0, L] × ℝ × (−1, 1). Thus by 3.11(1), the modified function f̃(s̲, y, z) = Wy − λ√(1 − z²) is strongly convex when λ > 0. Hence by 3.16, for λ > 0 we should seek a solution y of the differential equation
(d/ds) f̃_z[y(s)] = f̃_y[y(s)] on (0, L)
that is in 𝒟; i.e.,
(d/ds)(λy′(s)/√(1 − y′²(s))) = W
or
λy′(s)/√(1 − y′²(s)) = s + c, (27)
where we have replaced the unspecified constant λ by Wλ and introduced a new constant c.
We know that each y ∈ 𝒟 which satisfies this equation for λ > 0 must be the unique shape sought. Hence we can make further simplifying assumptions about y if they do not preclude solution. We could, for example, suppose y′ = const., but it is seen that this could not solve (27). And we can suppose that y is symmetric about L/2, which accords with our physical intuition about the shape assumed by the cable. If we set l = L/2, it follows that y′(l) = 0, so that from (27), c = −l; also, we need only determine y on [0, l], where we would expect that y′ ≤ 0.
Thus from (27) we should have that
y′(s)² = (s − l)²/(λ² + (s − l)²) on [0, l], (28)
∫₀^l √(1 − y′(s)²) ds = H/2.
Upon substitution from (28), this becomes
∫₀^l √(1 − (l − s)²/(λ² + (l − s)²)) ds = ∫₀^l (λ/√(λ² + (l − s)²)) ds = H/2.
With the hyperbolic substitution (l − s) = λ sinh θ, we can evaluate the integral and find that
h(α) ≝ (sinh α)/α = L/H, where α ≝ sinh⁻¹(l/λ).
Now, h(α) ≝ (sinh α)/α is continuous and positive on (0, ∞) and has, by L'Hôpital's rule, as α ↓ 0 and α ↑ ∞ the same limits as does cosh α, viz., 1 and ∞, respectively. Thus by the intermediate value theorem (§A.1), h assumes each value on (1, ∞) at least once on (0, ∞). Hence ∃ α ∈ (0, ∞) for which h(α) = L/H, and for this α, λ = l/sinh α will provide the y(s) sought.
The resulting curve is defined parametrically on [0, l] by
y(s) = √(λ² + (l − s)²) − √(λ² + l²), (29)
(3.19) Among all curves of length L joining the supports, the catenary of (29)
will have (uniquely) the minimum potential energy and should thus represent
the shape actually assumed by the cable.
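The solution of h(α) = L/H and the span condition can be verified numerically; a sketch (mine, not from the text; L and H are arbitrary sample values):

```python
# Numerical sketch (not from the text): solve sinh(alpha)/alpha = L/H by
# bisection, set lambda = l/sinh(alpha) with l = L/2, and check the span
# constraint: integral over [0, l] of sqrt(1 - y'(s)^2) ds = H/2, where
# y'(s)^2 = (s - l)^2/(lambda^2 + (s - l)^2) as in (28).
import math

L, H = 2.0, 1.5                      # cable length and support separation
target = L / H

lo, hi = 1e-9, 50.0                  # sinh(a)/a increases from 1 toward infinity
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if math.sinh(mid) / mid < target:
        lo = mid
    else:
        hi = mid
alpha = 0.5 * (lo + hi)

l = L / 2.0
lam = l / math.sinh(alpha)

def span(n=20000):
    # midpoint rule for the horizontal half-span of the catenary
    h = l / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        total += math.sqrt(1.0 - (s - l) ** 2 / (lam ** 2 + (s - l) ** 2))
    return total * h
```

The computed half-span reproduces H/2, matching the intermediate-value argument of the text.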
Optimal Performance
Example 3. (A simple optimal control problem). A rocket of mass m is to be
accelerated vertically upward from rest at the earth's surface (assumed sta-
tionary) to a height h in time T, by the thrust (mu) of its engine. If we
suppose h is so small that both m and g, the gravitational acceleration,
remain constant during flight, then we wish to control the thrust to minimize
the fuel consumption as measured by, say,
F(u) = ∫₀^T u²(t) dt. (30)
Figure 3.8
h = y(T) = ∫₀^T (T − t)u(t) dt − gT²/2.
Hence
G(u) ≝ ∫₀^T (T − t)u(t) dt = h + gT²/2 = k, say, (32)
k = ∫₀^T (T − t)u₀(t) dt = −(λ/2) ∫₀^T (T − t)² dt = −λT³/6,
or
−λ/2 = 3k/T³,
so that
u₀(t) = 3k(T − t)/T³. (33)
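The optimality of (33) can be illustrated numerically; a sketch (mine, not from the text; h, g, T are sample values) checks the constraint (32) and compares the fuel cost against a constant thrust meeting the same constraint.

```python
# Numerical sketch (not from the text): the thrust program u0(t) of (33)
# meets G(u0) = k from (32), and a constant thrust with the same G-value
# uses strictly more fuel as measured by the integral of u^2.
h, g, T = 100.0, 9.8, 10.0
k = h + g * T**2 / 2.0                          # (32)

def u0(t):
    return 3.0 * k * (T - t) / T**3             # (33)

def integral(f, n=20000):
    dt = T / n
    return sum(f((i + 0.5) * dt) for i in range(n)) * dt  # midpoint rule

G0 = integral(lambda t: (T - t) * u0(t))        # should equal k
fuel0 = integral(lambda t: u0(t) ** 2)          # = 3*k^2/T^3 analytically

# Competitor: constant thrust u1 = 2k/T^2, which also satisfies G(u1) = k.
u1 = 2.0 * k / T**2
fuel1 = integral(lambda t: u1 ** 2)             # = 4*k^2/T^3 analytically
```

The comparison 3k²/T³ < 4k²/T³ reflects the unique optimality asserted below.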
we may now use simple calculus to minimize this expression with respect to T and thus obtain an optimal flight time T₀ = (6h/g)^{1/2}.
(3.20) Remark. We know that (33) provides the unique solution to our problem. However, observe that from (32), the corresponding maximum thrust is
u₀(0) = 3k/T² = 3h/T² + 1.5g;
§3.6. Summary: Minimizing Procedures
First. Show that when y, y + v ∈ 𝒟, then F(y + v) − F(y) ≥ I(y; v), where I(y; v) is some new expression which admits further analysis.
Second. If possible, characterize those v which permit the equality F(y + v) − F(y) = I(y; v). (Ideally, equality at y ⇒ v = 0.)
Third. Note the restrictions on v which occur when y, y + v ∈ 𝒟, and transform I(y; v) so that conditions (on y) under which it vanishes for all such v can be discerned.
Fourth. Show that there is a y = y₀ ∈ 𝒟 which meets these conditions.
Remarks. To obtain the basic inequality, elementary facts such as (y″ + v″)² − (y″)² ≥ 2y″v″ may suffice. It is not essential to recognize I(y; v) as δF(y; v), or indeed even to consider this variation.
In transforming I(y; v), we may make further simplifying assumptions about y (which do not exclude y from 𝒟). In particular, we may assume that y has as many derivatives as required to integrate an expression such as ∫ₐᵇ y″(x)v″(x) dx by parts as often as desired. "Natural" boundary conditions for a solution y may arise in this process.
III. If a usable basic inequality cannot be obtained for F (or for F), it may
be possible to reformulate the problem-or consider a restricted version
of the problem-(perhaps expressed in other coordinates) in terms of new
functions which admit the analysis of I or II. Also, the solution of one
minimization problem usually solves some associated problems involving
constraints. □
PROBLEMS
3.1. For which of the following functions f is f(x̲, y, z) convex on [a, b] × ℝ²? For which will f(x̲, y, z) be strongly convex on this set?
(a) f(x, y, z) = x + y − z, [a, b] = [0, 1].
(b) f(x, y, z) = x³ + y² + 2z³, [a, b] = [0, 1].
(c) f(x, y, z) = … + x²y², [a, b] = [0, 1].
(d) f(x, y, z) = (x sin x)[y⁴ + z²], [a, b] = [−π/2, π/2].
3.4*. Show that if f, f_y, and f_z are continuous on [a, b] × ℝ², then f(x, y, z) is convex on [a, b] × ℝ² iff
f(x, ty₁ + (1 − t)y₂, tz₁ + (1 − t)z₂) ≤ tf(x, y₁, z₁) + (1 − t)f(x, y₂, z₂),
∀ x ∈ [a, b]; t ∈ (0, 1); yⱼ, zⱼ ∈ ℝ, j = 1, 2.
(Hint: See Problem 0.7.)
3.5*. (a) Show that if f, f_yy, f_yz, and f_zz are continuous on [a, b] × ℝ², then f(x, y, z) is convex on [a, b] × ℝ² iff the matrix of second derivatives
[ f_yy  f_yz ]
[ f_yz  f_zz ]
is positive semidefinite on [a, b] × ℝ², i.e.,
(u  v) [ f_yy  f_yz ; f_yz  f_zz ] (u  v)ᵀ = f_yy u² + 2f_yz uv + f_zz v² ≥ 0, ∀ x ∈ [a, b]; y, z, u, v ∈ ℝ.
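The positive-semidefiniteness test of Problem 3.5* is easy to probe numerically with finite differences. The sample functions below are illustrative choices, not taken from the problem list:

```python
def hessian_yz(f, x, y, z, h=1e-4):
    # central-difference second derivatives in (y, z) at a sample point
    fyy = (f(x, y + h, z) - 2*f(x, y, z) + f(x, y - h, z)) / h**2
    fzz = (f(x, y, z + h) - 2*f(x, y, z) + f(x, y, z - h)) / h**2
    fyz = (f(x, y + h, z + h) - f(x, y + h, z - h)
           - f(x, y - h, z + h) + f(x, y - h, z - h)) / (4 * h**2)
    return fyy, fyz, fzz

def psd2(fyy, fyz, fzz, tol=1e-3):
    # a symmetric 2x2 matrix is positive semidefinite iff both diagonal
    # entries and the determinant are nonnegative
    return fyy >= -tol and fzz >= -tol and fyy * fzz - fyz**2 >= -tol

convex_example = lambda x, y, z: y**2 + z**2     # PSD Hessian everywhere
saddle_example = lambda x, y, z: y * z           # indefinite Hessian
ok1 = psd2(*hessian_yz(convex_example, 0.3, 0.7, -0.2))
ok2 = psd2(*hessian_yz(saddle_example, 0.3, 0.7, -0.2))
```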
3.6–3.15. In these problems, verify that the integrand function is strongly convex (on the appropriate set) and find the unique minimizing function for F
(a) on 𝒟; (b) on 𝒟₁; (c) on 𝒟₂.
3.11. F(y) = ∫₁² x⁻¹√(1 + y′(x)²) dx,
𝒟 = {y ∈ C¹[1, 2]: y(1) = √8, y(2) = √5},
𝒟₁ = {y ∈ C¹[1, 2]: y(1) = √8}.
3.12. F(y) = ∫₋₁² e^{y′(x)} dx,
𝒟 = {y ∈ C¹[−1, 2]: y(−1) = 2, y(2) = 11}.
3.15. F(y) = ∫₁⁸ [y′(x)⁴ − 4y(x)] dx,
𝒟 = {y ∈ C¹[1, 8]: y(1) = 2, y(8) = −37/4},
𝒟₁ = {y ∈ C¹[1, 8]: y(1) = 2}.
(b) Can you find more than one function which minimizes F?
… the integral defining y(x) in part (c) is bounded. (This demonstrates that we cannot always choose c to meet the boundary conditions. Explain why physically.)
3.21. (a) Verify that Theorem 3.7 remains valid for integrands of the form f(x, z) = p(x)√(1 + z²), where p is continuous on (a, b], p(x) > 0, and ∫ₐᵇ p(x) dx < ∞. (For example, p(x) = x^{−1/2} on (0, 1].)
(b)* More generally, suppose that f(x, y, z) is [strongly] convex on (a, b) × ℝ², and y₀ ∈ C[a, b] is a C¹ solution of the differential equation (d/dx)f_z[y(x)] = f_y[y(x)] on (a, b). Show that for v = y − y₀,
F(y) ≥ ∫ₐᵇ f[y₀(x)] dx + v(x)f_z[y₀(x)] |ₐᵇ.
Thus when max |f_z[y₀(x)]| ≤ M < +∞, conclude that y₀ minimizes F(y) = ∫ₐᵇ f[y(x)] dx [uniquely] on 𝒟* ⊆ 𝒴 = C[a, b] ∩ C¹(a, b), where
𝒟* = {y ∈ 𝒴: y(a) = y₀(a), y(b) = y₀(b); F(y) exists as an improper Riemann integral}.
(This extension of Theorem 3.5 to improper integral functions F also permits consideration of functions y₀ whose derivative "blows up" at the end points.)
(c)* Make similar extensions of Theorems 3.7 and 3.9.
(d)* Suppose f_z[y₀(x)] is bounded near x = a, but only (b − x)f_z[y₀(x)] is bounded near x = b. Show that we can reach the same conclusion as in part (b) on 𝒟′ = {y ∈ 𝒟*: y′(x) → b₁ as x ↗ b}. Hint: Use the mean value theorem on v near b.
3.22*. A Brachistochrone. (See §3.4(b).)
(a) Show that the time of transit along the cycloid joining the origin to the end point (x₁, y₁) for y₁/x₁ < π/2 is given by T_min = √(x₁/(2g)) [θ₁/sin(θ₁/2)], where θ₁ is the parameter angle of the end point. Hint: Use equations (12), (13).
(b) For the case x₁ = y₁ = 1, compute θ₁ and T_min. Compare with the answers obtained for the straight line and the quarter circle in Problem 1.2.
(c)* Use the results of Problem 3.21 to extend the analysis in §3.4(b) to the case when y₁/x₁ = π/2.
(d)** Can you use the methods of this chapter to establish the minimality of the cycloid when y₁/x₁ > π/2 for some class of curves?
3.23. A Seismic Wave Problem. (See Problem 1.8(b) and Figure 1.12(b).)
(a) Show that the integrand function f(x, z) = √(1 + z²)/x is strongly convex on [a, b] × ℝ when a > 0.
(b) Conclude that each y ∈ 𝒟 = {y ∈ C¹[a, b]: y(a) = 0, y(b) = b₁ > 0} which satisfies, for some positive constant r, the equation y′(x)/√(1 + y′(x)²) = x/r on (a, b) will minimize the time-of-travel integral T(y) = ∫ₐᵇ [√(1 + y′(x)²)/x] dx on 𝒟 uniquely.
(c) Show that the associated path is along a circular arc of radius r with center on the y axis.
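Part (c) can be checked directly: for a circle of radius r centered on the y axis, y′/√(1 + y′²) reduces to x/r identically. A minimal sketch (the radius and center below are arbitrary sample values):

```python
import math

r, c = 2.0, 2.0
y = lambda x: c - math.sqrt(r*r - x*x)   # circular arc centered on the y axis
yp = lambda x: x / math.sqrt(r*r - x*x)  # its derivative

def lhs(x):
    # the left side of the equation from part (b)
    return yp(x) / math.sqrt(1.0 + yp(x)**2)

# the identity lhs(x) = x/r holds at every sample point
checks = [abs(lhs(x) - x / r) for x in (0.3, 0.8, 1.5)]
```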
Figure 3.9
(d) Make the substitution r′(θ) = tan φ(θ) in (c), and conclude that the resulting equation is satisfied when φ′(θ) = 1/b = sin α, or when
r(θ) = c₁ sec(b⁻¹θ + c)
for constants c, c₁.
(e) Suppose that θ₁ = 0, and 0 < β ≝ b⁻¹θ₂ < π. Prove that an r ∈ 𝒟* of the form in (d) can be found if c₁ = r₁ cos c, where
tan c = (cos β − (r₁/r₂))/sin β,
and argue that this is possible.
(f)* Show that even though f(y, z) is not strongly convex, L is strictly convex on 𝒟*, so that the minimizing function found in (e) is unique. Hint: Prove that if r, r + v ∈ 𝒟*, then L(r + v) − L(r) ≥ δL(r; v), with equality iff vr′ = v′r, or v = const.·r.
(g) Conclude that when r₁ = r₂, the circular arc r(θ) = const. is not the geodesic, as might have been conjectured.
Geodesics on a Cone: II.
Consider the right circular cone shown in Figure 3.9. To find geodesic curves of the form θ = θ(r) joining points (r₁, θ₁) and (r₂, θ₂), we assume without loss of generality that r₂ > r₁ > 0, θ₁ = 0, and 0 ≤ θ₂ ≤ π.
(h) Supposing that θ ∈ C¹[r₁, r₂], derive the length function L(θ). Is it convex?
(i) When θ₂ ≠ 0, prove that L(θ) is minimized uniquely on
𝒟 = {θ ∈ C¹[r₁, r₂]: θ(r₁) = 0, θ(r₂) = θ₂}
by θ(r) = b sec⁻¹(r/c) − b sec⁻¹(r₁/c), provided that the constant c can be chosen to make θ(r₂) = θ₂.
(j) What happens when θ₂ = 0?
3.25*. Beam Deflection. When a cantilevered beam of length L is subjected to a distributed load (force per unit length) p(x) as shown in Figure 3.10(a), it will deflect to a new position which can be described by a function y ∈ C²[0, L]. According to the theory of linear elasticity, the potential energy is approximated by
U(y) = ∫₀ᴸ [½μy″(x)² − p(x)y(x)] dx.
Figure 3.10 (a), (b)
(a) Prove that each y ∈ 𝒟 ∩ C⁴[0, L] which satisfies the differential equation μy⁗(x) = p(x) and the "natural" boundary conditions y″(L) = y‴(L) = 0 is the unique minimizing function for U on 𝒟. (The physical meaning of the natural boundary conditions is that both the bending moment and shear force are zero at the free end of the beam.) Hint: Show that U is strictly convex on 𝒟 and integrate the y″v″ term of δU(y; v) by parts twice.
(b) Solve the differential equation from part (a) when p(x) = w = const., selecting the constants of integration so that the solution is in 𝒟 and satisfies the given natural boundary conditions. (This would be the case for deflection of a beam under its own weight.) What is the maximum deflection, and where does it occur?
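A sketch of part (b) under the stated assumptions (μy⁗ = w with y(0) = y′(0) = 0 and y″(L) = y‴(L) = 0; the values of w, μ, L below are illustrative): integrating four times gives y(x) = (w/μ)(x⁴/24 − Lx³/6 + L²x²/4), whose maximum deflection wL⁴/(8μ) occurs at the free end x = L:

```python
# deflection of a cantilever under uniform load w, obtained by integrating
# mu*y'''' = w with y(0) = y'(0) = 0 and y''(L) = y'''(L) = 0
w, mu, L = 1.0, 1.0, 2.0        # illustrative sample values

def y(x):
    return (w / mu) * (x**4 / 24 - L * x**3 / 6 + L**2 * x**2 / 4)

def y4(x, h=0.1):
    # fourth central difference to confirm mu*y'''' = w (exact for a quartic)
    return (y(x - 2*h) - 4*y(x - h) + 6*y(x) - 4*y(x + h) + y(x + 2*h)) / h**4

tip = y(L)                      # maximum deflection, expected w*L^4/(8*mu)
```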
(c) If at x = L, the beam is pinned and has a concentrated moment M applied as shown in Figure 3.10(b), the potential energy is approximated by
F(y) = ∫₀¹ xy(x) dx
on
𝒟 = {y ∈ C¹[0, 1]: y(0) = y(1) = 0},
when restricted to the set where
G(y) ≝ ∫₀¹ y′(x)² dx = 1.
F(y) = ∫₀¹ √(1 + y′(x)²) dx
on
𝒟 = {y ∈ C¹[0, 1]: y(0) = 0, y(1) = √2 − 1},
when constrained to the set where
∫₀¹ y²(x) dx = 1?
(b) Formulate the problem using x as the independent variable and conclude that this results in an energy function U which is not convex on
𝒟 = {y ∈ C¹[0, H]: y(0) = y(H) = 0}. (Hint: Use v = −y to show that U(y + v) − U(y) is not always greater than or equal to δU(y; v) when y, y + v ∈ 𝒟.)
(c)* Use the arc length s as a parameter to reformulate the problem of finding the minimal surface of revolution (as in §1.4(a)) among all curves of fixed length L joining the required points. (Take a = 0 and a₁ = 1 ≤ b₁.)
(d) Conclude that the problem in (c) is identical to that of a hanging cable for an appropriate W, and hence there can be at most one minimizing surface. (See §3.5.)
subject to the constraint L(y) = ∫₀ᵀ e^{−αt}y(t) dt ≤ I, where α, β, and I are positive constants. Hint: Problem 3.18, with 2.4, 3.17. (This may be given the interpretation of finding that consumption rate function y which maximizes a measure of utility (or satisfaction) U subject to a savings-investment constraint L(y) ≤ I. See [Sm], p. 80. y is positive when I is sufficiently large relative to α, β, and T.)
3.32*. Dido's Problem.
Convexity may be used to provide partial substantiation of Dido's conjecture from Problem 1.5, in the reformulation suggested by Figure 3.11. Verify her conjecture to the following extent:
(a) If b > l/π, prove that the function representing a circular arc (uniquely) maximizes
A(y) = ∫_{−b}^{b} y(x) dx
on
𝒟 = {y ∈ C¹[−b, b]: y(b) = y(−b) = 0},
when further constrained to the l-level set of L(y) ≝ ∫_{−b}^{b} √(1 + y′(x)²) dx.
(b) If b = l/π, show that the function representing the semicircle accomplishes the same purpose for a suitably chosen 𝒟* (see Problem 3.21).
Figure 3.11
(c)* In parts (a) and (b), compute the maximal area as a function of p, the angle subtended by the arc; show that this function increases with p on (0, π].
(d) Why does this not answer the problem completely? Can you extend the analysis to do so?
3.33. Let I be an interval in ℝ and D be a domain of ℝ^{2d}. For x ∈ ℝ, and Y = (y₁, ..., y_d), Z = (z₁, ..., z_d) ∈ ℝ^d, a function f(x, Y, Z) is said to be [strongly] convex on I × D if f, f_Y = (f_{y₁}, ..., f_{y_d}), and f_Z = (f_{z₁}, ..., f_{z_d}) are defined and continuous on this set and satisfy the inequality
f(x, Y + V, Z + W) − f(x, Y, Z) ≥ f_Y(x, Y, Z)·V + f_Z(x, Y, Z)·W,
∀ (x, Y, Z), (x, Y + V, Z + W) ∈ I × D.
(a) Show that if f(x, Y, Z) is [strongly] convex on [a, b] × ℝ^{2d}, then F(Y) = ∫ₐᵇ f[Y(x)] dx is [strictly] convex on
𝒟 = {Y ∈ (C¹[a, b])^d: Y(a) = A, Y(b) = B},
where A, B ∈ ℝ^d are given.
(b) If f(x, Y, Z) is strongly convex on [a, b] × ℝ^{2d}, then prove that each Y ∈ 𝒟 which satisfies the vector differential equation (d/dx)f_Z[Y(x)] = f_Y[Y(x)] (i.e., (d/dx)f_{z_j}[Y(x)] = f_{y_j}[Y(x)], j = 1, 2, ..., d) on (a, b) is the unique minimizing function for F on 𝒟.
3.34. Use the results in Problem 3.33 to formulate and prove analogous vector
valued versions of: (a) Theorem 3.7, (b) Corollary 3.8, and (c) Proposition 3.9
in §3.2.
3.35. (a) Formulate and prove a vector valued version of Theorem 3.16 in §3.5.
(b) Modify Theorem 3.16 to cover the case of a single Lagrangian constraint,
and prove your version. Hint: Proposition 2.5 with 3.17.
(c) Formulate a vector valued version of the modified problem in (b) that
covers both isoperimetric and Lagrangian constraints.
3.36. (a) Show that y₀(x) = sinh x = (eˣ − e⁻ˣ)/2 minimizes ∫₀¹ (y₁(x)² + y₁′(x)²) dx uniquely for y₁ ∈ C¹[0, 1] when y₁(0) = 0, y₁(1) = sinh 1.
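A quick numerical comparison supporting part (a): the value of ∫₀¹ (y² + y′²) dx at y = sinh is sinh(2)/2 ≈ 1.813, which is smaller than the value at the straight line through the same endpoints (the competitor below is an arbitrary illustrative choice):

```python
import math

def F(yf, ypf, n=2000):
    # midpoint-rule approximation of ∫_0^1 (y^2 + y'^2) dx
    s, h = 0.0, 1.0 / n
    for i in range(n):
        x = (i + 0.5) * h
        s += (yf(x)**2 + ypf(x)**2) * h
    return s

F_sinh = F(math.sinh, math.cosh)                  # candidate minimizer
F_line = F(lambda x: x * math.sinh(1.0),          # competitor with the same
           lambda x: math.sinh(1.0))              # endpoint values
```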
(b) Apply 3.16 as extended above to the problem of minimizing
3.37. Minimal Area with Free Boundary. (See §3.4(e) and §6.9.)
(a) When u ∈ C¹(D̄) and v ∈ C¹(D̄), verify that formally
δS(u; v) = −∬_D (U_x + W_y)v dA + ∮_{∂D} [(∇u·N)/√(1 + u_x² + u_y²)] v dσ,
where N is the outward normal to ∂D and dσ is the element of arc length along ∂D. Hint: Green's theorem in its divergence form.
(b) Conclude that if U_x + W_y ≡ 0 in D and ∇u·N = 0 on a subarc K ⊆ ∂D, then u provides uniquely the minimal surface area among those competing functions which agree with it on K̃, that part of the boundary complementary to K.
3.38*. An Optimal Control Problem with Damping.
If the motion of the rocket discussed in Example 3 of §3.5 is opposed by its speed through the atmosphere, then the equation of motion becomes approximately
ÿ = u − g − αẏ, for a constant α > 0.
(a) Rewrite the last equation as (d/dt)(e^{αt}ẏ) = e^{αt}(u − g) and integrate it as in the text under the same conditions to obtain the new isoperimetric condition
G(u) ≝ ∫₀ᵀ (1 − e^{−α(T−t)})(u(t) − g) dt = αh.
(b) Show that the function u₀(t) = λ₀(1 − e^{−α(T−t)}) will minimize F(u) = ∫₀ᵀ u²(t) dt on C[0, T] uniquely, subject to the constraint in (a), for an appropriate choice of λ₀.
(c) For α = 1, find λ₀ and determine F(u₀).
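A sketch of part (c) under the constraint from (a) (with illustrative values of T, h, g): since F(u) is quadratic and u₀ = λ₀(1 − e^{−(T−t)}), the multiplier λ₀ follows from one linear equation, λ₀∫w² dt − g∫w dt = h with w(t) = 1 − e^{−(T−t)}:

```python
import math

# Problem 3.38(c) with alpha = 1: u0(t) = lam0*(1 - exp(-(T - t))) must satisfy
# ∫_0^T (1 - e^{-(T-t)})(u0(t) - g) dt = h.  T, h, g below are illustrative.
T, h, g = 2.0, 10.0, 9.8

I2 = T - 1.0 + math.exp(-T)                                   # ∫ w dt
I1 = T - 2.0*(1.0 - math.exp(-T)) + 0.5*(1.0 - math.exp(-2.0*T))  # ∫ w^2 dt
lam0 = (h + g * I2) / I1
F_u0 = lam0**2 * I1                                           # F(u0) = ∫ u0^2 dt

# Riemann-sum cross-check that the isoperimetric condition holds
n = 20000
dt = T / n
G = sum((1 - math.exp(-(T - (i + 0.5) * dt)))
        * (lam0 * (1 - math.exp(-(T - (i + 0.5) * dt))) - g) * dt
        for i in range(n))
```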
3.39. A heavy rocket-propelled sled is accelerated from rest over a straight horizontal track of length l under velocity-proportional air resistance as in the previous problem.
(a) To optimize the fuel consumption over time T, show that we might consider minimizing
F(v) = ∫₀ᵀ (v̇ + αv)² dt
on
𝒟 = {v ∈ C¹[0, T]: v(0) = 0}
under the isoperimetric condition G(v) ≝ ∫₀ᵀ v(t) dt = l, where v(t) is the velocity at time t and α is a positive constant.
(b) Find a minimizing velocity function v₀. Is it unique? Hint: f(y, z) = (z + αy)² is convex! (Why?)
(c) When α = 1 and l = h, compare F(v₀) with F(u₀) in part (c) of the previous problem under zero gravity conditions. Should they be the same? Explain.
3.40*. (Newton's parametrization.) In (14), let t = −y′ so that
cx = (1 + t²)²/t = 1/t + 2t + t³ = cx(t), say, for c > 0.
(a) Show that for t > 1/√3, dx/dt > 0, so that t can be used as an independent variable when t ≥ 1/√3.
y = −c(log t − t² − 3t⁴/4 + 7/4) + y₁    (t ≥ 1/√3),
and …
Hint: Show that for T > 1, H′(T) > 0 and that H assumes each positive value. (Recall the argument in §3.4(b).)
F(y) = ∫₀¹ [x/(1 + y′(x)²)] dx
on
𝒟* = {y ∈ C¹[0, 1]: y(0) = h, y(1) = 0, y′(x) ≤ 0},
and we are now seeking the profile of minimum drag for an entire body of revolution, not just a shoulder.
(a) Show that if we remove the last restriction from 𝒟* and admit zig-zag profiles y with large slopes, then we could obtain arbitrarily small values for F(y)!
(b) When h = 1, compare the drag values of the profiles for a cone C, a hemisphere H, and a truncated cone T in which y(x) = h when x ≤ ½.
(c) When h = 2, compare the drag values of the profiles for a cone C, a paraboloid of revolution P, and a truncated cone T in which y(x) = h when x ≤ ½.
(d) When h = ½, repeat part (c) and conjecture about the superiority of truncated cones or other flattened objects. In particular, using m = h/(1 − a), can you find the "best" truncated cone T₀ for given h?
(e) Show that with initial flattening permitted, we would need to minimize the Bolza-type function G(y, a) = a²/2 + F(y) on 𝒟 × [0, 1], where F and 𝒟 are as in §3.4(c), and we require y′ ≤ 0. Can our previous results from convexity be used in attacking this problem? How?
CHAPTER 4
The Lemmas of Lagrange and du Bois-Reymond

Set v(x) ≝ ∫ₐˣ (h(t) − c) dt; then v(a) = v(b) = 0, and
∫ₐᵇ (h(x) − c)² dx = ∫ₐᵇ (h(x) − c)v′(x) dx = ∫ₐᵇ h(x)v′(x) dx − c v(x)|ₐᵇ = 0.
Hence, from A.9, it follows that on [a, b] the continuous integrand (h(x) − c)² ≡ 0, or h(x) = c = const., as asserted. □
(A.8). Hence integrating the first term of the integral by parts gives
∫ₐᵇ g(x)v(x) dx = 0,
∀ v ∈ 𝒟₀ = {v ∈ Cᵐ[a, b]: v⁽ᵏ⁾(a) = v⁽ᵏ⁾(b) = 0, k = 0, 1, 2, ..., m}, m = 1, 2, ....
(4.5) Proposition. If h ∈ C[a, b] and for some m,
∫ₐᵇ h(x)v⁽ᵐ⁾(x) dx = 0, ∀ v ∈ 𝒟₀,
where
𝒟₀ = {v ∈ Cᵐ[a, b]: v⁽ᵏ⁾(a) = v⁽ᵏ⁾(b) = 0, k = 0, 1, 2, ..., m − 1},
then on [a, b], h is a polynomial of degree < m.
Proof*. By a translation we may assume that a = 0. The function
H(x) ≝ ∫₀ˣ dt₁ ∫₀^{t₁} dt₂ ··· ∫₀^{t_{m−1}} h(t) dt,    (1)
…
so that
v⁽ᵐ⁾(x) = h(x) − p(x).
We must next show that with the proper choice of q we can make v⁽ᵏ⁾(b) = 0 for k = 0, 1, 2, ..., m − 1, and this is possible. (See Problem 4.6*.) Assuming that this choice has been made, the resulting v ∈ 𝒟₀, and it follows from repeated partial integrations that
∫₀ᵇ p(x)v⁽ᵐ⁾(x) dx = −∫₀ᵇ p′(x)v⁽ᵐ⁻¹⁾(x) dx = ··· = 0,
since the boundary terms vanish and p has degree < m. Thus, finally, from the hypothesis and construction:
0 ≤ ∫₀ᵇ (h(x) − p(x))² dx = ∫₀ᵇ (h(x) − p(x))v⁽ᵐ⁾(x) dx = ∫₀ᵇ h(x)v⁽ᵐ⁾(x) dx = 0,
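Proposition 4.5 can be illustrated numerically: with m = 2, any h of degree < 2 integrates to zero against v″ for v ∈ 𝒟₀. The particular h and v below are illustrative choices, not from the text:

```python
def v(x):
    # v(x) = x^2 (1-x)^2 satisfies v(0) = v(1) = v'(0) = v'(1) = 0 on [0, 1]
    return x*x*(1 - x)*(1 - x)

def vpp(x):
    # exact second derivative of v
    return 2 - 12*x + 12*x*x

def h(x):
    # a polynomial of degree < m = 2
    return 2 - 3*x

# midpoint rule for ∫_0^1 h(x) v''(x) dx, which should vanish
n = 10000
integral = sum(h((i + 0.5)/n) * vpp((i + 0.5)/n) / n for i in range(n))
```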
Proof. If we restrict attention to those V = (v, 0, 0, ..., 0) ∈ 𝒴, then the integral condition reduces to
∫ₐᵇ H(x)·V′(x) dx = 0,
PROBLEMS
4.1. Carry out the steps in the proof of Proposition 4.2 in the special case that g(x) = sin x.
4.2. If h ∈ C[a, b] and
∫ₐᵇ [g(x)v(x) + h(x)v′(x)] dx = 0,
5. Local Extrema in Normed Linear Spaces
Thus ‖·‖ is simply a real valued function on 𝒴 which by (1a) is positive definite, by (1b) is positive homogeneous, and by (1c) satisfies the triangle inequality. Each function with these properties is called a norm for 𝒴. There may be more than one norm for a linear space, although in a specific example, one may be more natural or more useful than another. Every norm also satisfies the so-called reverse triangle inequality.
Example 1. For 𝒴 = ℝ^d with y = (y₁, y₂, ..., y_d), the choice ‖y‖ = |y| = (Σⱼ₌₁^d yⱼ²)^{1/2} defines a norm, called the Euclidean norm, but the verification of the triangle inequality (1c) is not trivial (Problem 5.2).
The choice ‖y‖₁ = Σⱼ₌₁^d |yⱼ| also defines a norm, and now all of the properties are easily verified. In particular, for (1c):
‖y + ȳ‖₁ = Σⱼ₌₁^d |yⱼ + ȳⱼ| ≤ Σⱼ₌₁^d (|yⱼ| + |ȳⱼ|),
or
‖y + ȳ‖₁ ≤ Σⱼ₌₁^d |yⱼ| + Σⱼ₌₁^d |ȳⱼ| = ‖y‖₁ + ‖ȳ‖₁.
Still another norm is the maximum norm ‖y‖_M = maxⱼ₌₁,₂,...,d |yⱼ|. However, the simpler choice ‖y‖ = |y₁| does not yield a norm for ℝ^d if d ≥ 2, because it is not positive definite. Indeed, the nonzero vector (0, 1, 1, ..., 1) would have a zero "norm" with this assignment.
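The three norms of Example 1, and the failure of |y₁| to be a norm, can be checked directly (the sample vectors below are arbitrary):

```python
import math

def norm2(y):  return math.sqrt(sum(t*t for t in y))   # Euclidean norm
def norm1(y):  return sum(abs(t) for t in y)           # 1-norm
def normM(y):  return max(abs(t) for t in y)           # maximum norm

y, yb = (3.0, -4.0, 1.0), (-1.0, 2.0, 5.0)
s = tuple(a + b for a, b in zip(y, yb))
# the triangle inequality holds for all three norms
triangle = [f(s) <= f(y) + f(yb) for f in (norm2, norm1, normM)]

# |y_1| alone is not positive definite for d >= 2:
bad = abs((0.0, 1.0, 1.0)[0])      # "norm" of a nonzero vector is zero
```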
Example 2. For 𝒴 = C[a, b], it is useful to think of the values y(x) as the "components" of the "vector" y ∈ 𝒴. Then the choice ‖y‖_M = max|y(x)| = max_{x∈[a,b]} |y(x)| determines a norm, the so-called maximum norm. That it is even defined and finite is not obvious, and it requires the knowledge that:
(i) y ∈ C[a, b] ⇒ |y| ∈ C[a, b];
(ii) [a, b] is compact;
(iii) a continuous function (|y|) on a compact interval ([a, b]) assumes a maximum value (‖y‖_M).
(i) is a consequence of the (reverse) triangle inequality in ℝ¹, while (ii) is established in §A.0; (iii) will be established in Proposition 5.3.
§5.1. Norms for Linear Spaces
i.e., y = θ;
(b) positive homogeneous:
‖cy‖_M = max|cy(x)| = max|c||y(x)| = |c| max|y(x)| = |c|‖y‖_M;
and
(c) satisfies the triangle inequality:
|(y + ȳ)(x)| = |y(x) + ȳ(x)| ≤ |y(x)| + |ȳ(x)|.
For 𝒴 = C¹[a, b] we may use
‖y‖_M = max_{x∈[a,b]} (|y(x)| + |y′(x)|)
and
‖y‖₁ = ∫ₐᵇ (|y(x)| + |y′(x)|) dx.
To establish the triangle inequality for ‖·‖_M, observe that for x ∈ [a, b]:
|(y + ȳ)(x)| + |(y + ȳ)′(x)| ≤ |y(x)| + |ȳ(x)| + |y′(x)| + |ȳ′(x)|
= (|y(x)| + |y′(x)|) + (|ȳ(x)| + |ȳ′(x)|)
≤ ‖y‖_M + ‖ȳ‖_M.
Now maximizing over x yields the inequality
‖y + ȳ‖_M = max{|(y + ȳ)(x)| + |(y + ȳ)′(x)|} ≤ ‖y‖_M + ‖ȳ‖_M,
as desired.
Observe that for x ∈ [a, b]: |y(x)| ≤ ‖y‖_M. Thus ‖y‖_M = 0 ⇒ y = θ; i.e., ‖·‖_M is positive definite, as is ‖·‖₁, since ‖y‖₁ ≤ (b − a)‖y‖_M. Can you devise corresponding norms for C²[a, b]? For Cᵐ[a, b]?
(5.0) Remark. Since for each m = 2, 3, ..., Cᵐ[a, b] ⊆ C¹[a, b] ⊆ C[a, b], each norm for C[a, b] from the previous example will serve also as a norm for C¹[a, b] (or for Cᵐ[a, b]). However, these norms do not take cognizance of the differential properties of the functions and supply control only over their continuity.
Example 4. If ‖·‖ is a norm for the linear space 𝒴, then ‖y‖ + |t| is a norm for the linear space 𝒴 × ℝ consisting of the pairs (y, t) with componentwise addition and scalar multiplication. (See Problem 5.8.)
Example 5. 𝒴 = C(a, b] is a linear space which has no obvious choice for a norm. Now, max|y(x)| can easily be infinite, as can ∫ₐᵇ |y(x)| dx; the function y(x) = 1/(x − a), x ≠ a, is in 𝒴 and realizes both of these possibilities. In fact, this space cannot be normed usefully.
Example 6. 𝒴 = (C[a, b])^d, the space of d-dimensional vector functions with components in C[a, b], is a linear space with a norm for Y = (y₁, y₂, ..., y_d) (i.e., Y(x) = (y₁(x), y₂(x), ..., y_d(x)), x ∈ [a, b]), given by
‖Y‖_M = max|Y(x)|,
or by
‖Y‖ = Σⱼ₌₁^d max|yⱼ(x)|,
or by
‖Y‖ = Σⱼ₌₁^d ∫ₐᵇ |yⱼ(x)| dx,
or by
‖Y‖₁ = ∫ₐᵇ (Σⱼ₌₁^d yⱼ(x)²)^{1/2} dx = ∫ₐᵇ |Y(x)| dx.
The verification is left to Problem 5.6.
Example 7. 𝒴 = (C¹[a, b])^d, the space of d-dimensional vector functions with components in C¹[a, b], is a linear space with a norm
‖Y‖_M = max(|Y(x)| + |Y′(x)|),
or
‖Y‖ = Σⱼ₌₁^d max(|yⱼ(x)| + |yⱼ′(x)|).
Can you think of other norms for this space?
As these examples show, the discovery of a suitable norm for a given linear space is not immediate, is seldom trivial, and may not be possible. Fortunately, the spaces of interest to us in this text do have standard norms, and these have been given in the examples above.
(Problems 5.1–5.8)
First, we define the "distance" between vectors y and ȳ by ‖y − ȳ‖. The triangle inequality shows that for any three vectors x, y, z ∈ 𝒴: ‖x − z‖ ≤ ‖x − y‖ + ‖y − z‖, and this has the familiar geometrical interpretation.
Next, we introduce the concept of convergence by declaring that if yₙ ∈ 𝒴, n = 1, 2, ..., then the sequence {yₙ}₁^∞ has the limit y₀ ∈ 𝒴 (denoted lim_{n→∞} yₙ = y₀, or yₙ → y₀ as n → ∞) iff ‖yₙ − y₀‖ → 0 as n → ∞. A given sequence need not have a limit, but if it does, its limit is unique.
[Indeed, were y₀ and ȳ₀ both limits of the same sequence {yₙ} from 𝒴, then by the triangle inequality, for each n, 0 ≤ ‖y₀ − ȳ₀‖ ≤ ‖y₀ − yₙ‖ + ‖yₙ − ȳ₀‖. The right side can be made as small as we wish by choosing n sufficiently large; thus ‖y₀ − ȳ₀‖ = 0; but by the positive definiteness of the norm, this means that y₀ − ȳ₀ = θ, or y₀ = ȳ₀.]
Alternatively, we can introduce the (open) spherical δ-neighborhood of y₀ for each δ > 0, defined by S_δ(y₀) = {y ∈ 𝒴: ‖y − y₀‖ < δ}, and note that for a sequence {yₙ}: lim_{n→∞} yₙ = y₀ iff the yₙ are eventually in each S_δ(y₀); i.e., for each δ > 0, ∃ N_δ such that n ≥ N_δ ⇒ yₙ ∈ S_δ(y₀).
Figure 5.1
Figure 5.2
(Problems 5.9-5.12)
§5.3. Continuity
If 𝒟 is a subset of 𝒴, then we can consider it as the domain of various kinds of functions. For example, a 𝒴-valued function ℱ: 𝒟 → 𝒴 could be defined by requiring ℱ(y) = θ, ∀ y ∈ 𝒟. For our purposes, the most important functions are those which are real valued, i.e., those of the form J: 𝒟 → ℝ, of which we have already encountered many examples in the previous chapters.
When 𝒴 is supplied with a norm ‖·‖, we simply adopt the standard ε–δ definition for the continuity of a real valued function. (See §0.3.)
(5.1) Definition. In a normed linear space (𝒴, ‖·‖), if 𝒟 ⊆ 𝒴, a function J: 𝒟 → ℝ is said to be continuous at y₀ ∈ 𝒟 iff for each ε > 0, ∃ a δ > 0 such that
|J(y) − J(y₀)| < ε, ∀ y ∈ 𝒟 with ‖y − y₀‖ < δ.
Example 1. In any normed linear space (𝒴, ‖·‖), the norm function J(y) = ‖y‖ is always continuous on 𝒴 and hence on any subset 𝒟 of 𝒴. Indeed, from the reverse triangle inequality (1d), |J(y) − J(y₀)| = |‖y‖ − ‖y₀‖| ≤ ‖y − y₀‖, and hence making ‖y − y₀‖ small (< δ) makes |J(y) − J(y₀)| at least as small. (In fact, ‖·‖ is uniformly continuous on 𝒴. See Lemma 5.2 below.)
Example 2. For 𝒴 = C[a, b] with the maximum norm ‖·‖_M of §5.1, Example 2, the function
J(y) = ∫ₐᵇ [sin³x + y²(x)] dx
is continuous at each y₀, since
|J(y) − J(y₀)| ≤ ∫ₐᵇ |y(x)² − y₀(x)²| dx.
Example 3. With a given α ∈ C[a, b], the function of §2.2, Example 2, viz., J(y) = ∫ₐᵇ α(x)√(1 + y′(x)²) dx, is defined ∀ y ∈ 𝒴 = C¹[a, b]. Direct examination of its continuity with respect to the maximum norm ‖·‖_M is facilitated by the following uniform estimate for f(z) = √(1 + z²):
|f(z) − f(z₀)| ≤ |z − z₀|, z, z₀ ∈ ℝ.
[This is an immediate consequence of the mean value theorem and the fact that
|f′(z)| = |z|/√(1 + z²) ≤ 1, z ∈ ℝ.]
To obtain uniform estimates of the type used in this last example, we shall make frequent appeal to the following technical
(5.2) Lemma. If K is a compact set in a normed linear space (𝒴, ‖·‖), then a continuous function F: K → ℝ is uniformly continuous on K; i.e., given ε > 0, ∃ δ > 0 such that y, ȳ ∈ K and ‖y − ȳ‖ < δ ⇒ |F(y) − F(ȳ)| < ε.
Proof. We shall establish the contrapositive implication. Suppose the lemma does not hold. Then, for some ε₀ > 0 and each n = 1, 2, ..., ∃ points yₙ, ȳₙ ∈ K with ‖yₙ − ȳₙ‖ < 1/n for which |F(yₙ) − F(ȳₙ)| ≥ ε₀. However, since K is compact, there is a subsequence y_{n_j} → y₀ ∈ K as j → ∞, and for each j = 1, 2, ...:
‖ȳ_{n_j} − y₀‖ ≤ ‖ȳ_{n_j} − y_{n_j}‖ + ‖y_{n_j} − y₀‖ ≤ ‖y_{n_j} − y₀‖ + 1/n_j.
Example 4. When f ∈ C([a, b] × ℝ²), the function J(y) = ∫ₐᵇ f(x, y(x), y′(x)) dx is defined ∀ y ∈ C¹[a, b]. When ‖y − y₀‖_M < 1,
|y(x)|, |y′(x)| ≤ |y(x)| + |y′(x)| ≤ ‖y‖_M < 1 + ‖y₀‖_M = c₀, say.
Then with c = c₀, it follows from the aforementioned uniform continuity that given ε > 0, ∃ a δ ∈ (0, 1) such that ‖y − y₀‖_M < δ (< 1) ⇒
|f[y(x)] − f[y₀(x)]| = |f(x, y(x), y′(x)) − f(x, y₀(x), y₀′(x))| < ε, ∀ x ∈ [a, b].
This uniform estimate gives |J(y) − J(y₀)| ≤ ∫ₐᵇ |f[y(x)] − f[y₀(x)]| dx ≤ ε(b − a).
Figure 5.3
For n > (b − a)⁻¹,
‖yₙ‖₁ = (½)(1/n)(1) = 1/(2n) → 0 as n → ∞,
while yₙ(a) = 1, ∀ n.
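The spike functions yₙ sketched in Figure 5.3 can be reproduced numerically (taking a = 0, b = 1 for concreteness): the integral norms shrink like 1/(2n) while the maximum norm stays at 1, so yₙ → 0 in ‖·‖₁ but not in ‖·‖_M:

```python
def y_n(n, x):
    # spike of height 1 supported on [0, 1/n] (cf. Figure 5.3 with a = 0, b = 1)
    return max(0.0, 1.0 - n * x)

def int_norm(n, m=100000):
    # midpoint rule for ∫_0^1 |y_n(x)| dx, expected value 1/(2n)
    return sum(y_n(n, (i + 0.5)/m)/m for i in range(m))

def max_norm(n, m=100000):
    # grid approximation of max |y_n(x)|, expected value 1 (attained at x = 0)
    return max(y_n(n, i/m) for i in range(m + 1))

ints = [int_norm(n) for n in (10, 100, 1000)]
maxes = [max_norm(n) for n in (10, 100, 1000)]
```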
Figure 5.4
For example, C[0, 1] is not compact with the maximum norm, since the continuous function J(y) = y(1) is unbounded on C[0, 1]. To see this, we consider the functions yₙ(x) = nx, x ∈ [0, 1], for which J(yₙ) = n → +∞.
More generally, in a nontrivial normed linear space (𝒴, ‖·‖), 𝒴 itself is never compact (Problem 5.17). In particular, there is no clever assignment of a norm to C[a, b] that makes this space compact.
Similarly, 𝒟 = {y ∈ C¹[a, b]: y(a) = 0, y(b) = 1}, which is one of the sets of concern in Chapter 2, is not compact with respect to the maximum norm ‖·‖_M or the integral norm ‖·‖₁ of §5.1, Example 3. For as we have seen in Example 1, the norm function J(y) = ‖y‖ is always continuous on 𝒴 and hence on any subset 𝒟 of 𝒴. But J_M(y) = ‖y‖_M is unbounded on the set 𝒟 above, as is shown by the sequence of parabolic functions sketched in Figure 5.4, which have maximum values as large as desired while remaining in 𝒟. Clearly ‖y‖_M ≥ max|y(x)|, and so J_M(y) will be unbounded on this sequence. It is also plausible graphically that the integral norm function J₁(y) = ‖y‖₁ (≥ ∫ₐᵇ |y(x)| dx) is unbounded on this sequence, since the area under the curve can be made as large as we wish. In particular, the apparently reasonable problem of finding the maximum of the function J(y) = ∫ₐᵇ |y(x)| dx on 𝒟 has no solution. (In fact, the problem of minimizing J on 𝒟 has no solution. See Problem 5.19.)
Thus, the results of Chapter 3, in which we actually obtained minimum values for rather complicated functions on sets such as 𝒟, become even more remarkable. (As we saw, an underlying convexity was responsible for our success in these cases.)
(5.4) Definition. In a normed linear space (𝒴, ‖·‖), a point y₀ ∈ 𝒟 ⊆ 𝒴 is said to be a local extremal point for J on 𝒟 if for some r > 0, y₀ is an extremal point for J on 𝒟ᵣ(y₀), where 𝒟ᵣ(y₀) = {y ∈ 𝒟: ‖y − y₀‖ < r}; i.e., either J(y) ≤ J(y₀), ∀ y ∈ 𝒟ᵣ(y₀) (y₀ is a local maximum point for J on 𝒟), or J(y) ≥ J(y₀), ∀ y ∈ 𝒟ᵣ(y₀) (y₀ is a local minimum point for J on 𝒟).
Of course, each extremal point is automatically a local extremal point whatever norm is used. However, y₀ may be a local extremal point with respect to one norm but not with respect to another. (See Problem 5.20.)
Now, the Gâteaux variations of Chapter 2 may also be formed without consideration of a norm, and when nonvanishing, they preclude local extremal behavior with respect to any norm.
Suppose, for example, that at a point y₀, the function J has a positive variation δJ(y₀; v) in the direction v ∈ 𝒴. From Definition 2.6, it follows that for ε sufficiently small, the ratio [J(y₀ + εv) − J(y₀)]/ε is also positive, so that J(y₀ + εv) − J(y₀) has the sign of ε. Hence,
J(y₀ − εv) < J(y₀) < J(y₀ + εv), ∀ small ε > 0,    (2)
and we say that at y₀, J increases strictly in the direction v (and decreases strictly in the opposite direction −v). When δJ(y₀; v) < 0, then δJ(y₀; −v) > 0 (Why?), so that the preceding inequalities and assertions are reversed. In either case, since as ε ↘ 0, ‖(y₀ ± εv) − y₀‖ = ε‖v‖ ↘ 0, the points y₀ ± εv in (2) are eventually in each norm neighborhood of y₀. Thus local extremal behavior of J at y₀ is not possible in the direction v.
§5.5. Necessary Conditions: Admissible Directions
(5.5) Proposition. In a normed linear space (𝒴, ‖·‖), if y₀ ∈ 𝒟 ⊆ 𝒴 is a (local) extremal point for a real valued function J on 𝒟, then
δJ(y₀; v) = 0, ∀ directions v which are 𝒟-admissible at y₀. □
…tion for it in the next section and return to its classical treatment in the next chapter.
(Note that Proposition 5.5 admits a local extremal point y₀ ∈ 𝒟 which has no nonzero 𝒟-admissible directions, and such points must also be considered as candidates for local extrema.)
Example 1. To characterize local extrema for the function (of §2.4, Example 2)
J(y) = ∫ₐᵇ y²(x) dx,
what is necessary for a (local) extremal point y₀ ∈ 𝒟 is that
δJ(y₀; v) = 2∫ₐᵇ y₀(x)v(x) dx = 0, ∀ 𝒟-admissible directions v.
This condition is surely fulfilled when y₀ ≡ 0, but this function is not in 𝒟 (unless a₁ = b₁ = 0). And as Lemma 4.4 of Lagrange shows, there is no other possibility; i.e., when a₁ ≠ 0 or b₁ ≠ 0 there are "too many" admissible directions to permit any function y₀ ∈ 𝒟 to satisfy all of the conditions δJ(y₀; v) = 0 necessary for a (local) extremum. Thus no such local extremum exists.
If, on the other hand, we attempt to minimize J over
Example 2*. Let's characterize local minima for the function J of §2.4, Example 3,
J(y) = ∫ₐᵇ α(x)√(1 + y′(x)²) dx, for which
δJ(y; v) = ∫ₐᵇ [α(x)y′(x)/√(1 + y′(x)²)] v′(x) dx = 0.
Thus from Lemma 4.1 of du Bois-Reymond, we know that this necessary condition is satisfied only by a y ∈ 𝒟 for which the continuous function
It remains to show that c ∈ (0, α₀) can be chosen to satisfy the other boundary condition y(b) = b₁; i.e., to make
h(c) ≝ ∫ₐᵇ (α²(x)/c² − 1)^{−1/2} dx = b₁.    (5)
When α(x) = const. = α₀ on [a, b], then h(c) = [(α₀/c)² − 1]^{−1/2}(b − a) is continuous and strictly increasing on (0, α₀), with the limiting values 0 (= lim_{c↘0} h(c)) and +∞ (= lim_{c↗α₀} h(c)). Hence in this case there is precisely one c for which (5) is satisfied and precisely one y ∈ 𝒟 which satisfies (3).
When α is not constant, it is more difficult to analyze h(c). (See Problem 5.32*.) However, it is important to realize that there may be no solution. For example, when a = 1, b = 2, and α(x) = x on [1, 2], then we may take α₀ = 1, and for c ∈ (0, 1):
α²(x)/c² − 1 = x²/c² − 1 ≥ x² − 1;
hence
h(c) = ∫₁² (α²(x)/c² − 1)^{−1/2} dx ≤ ∫₁² (x² − 1)^{−1/2} dx.
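The bound above is finite: ∫₁² (x² − 1)^{−1/2} dx = arccosh 2 = log(2 + √3) ≈ 1.317, so h(c) stays below this value for every c ∈ (0, 1), and (5) then has no solution when b₁ exceeds it. A numerical sketch:

```python
import math

def h_of_c(c, n=200000):
    # midpoint rule for h(c) = ∫_1^2 (x^2/c^2 - 1)^{-1/2} dx with alpha(x) = x,
    # 0 < c < 1 (the integrand is bounded since x/c > 1 on [1, 2])
    s = 0.0
    for i in range(n):
        x = 1.0 + (i + 0.5) / n
        s += 1.0 / math.sqrt(x*x/(c*c) - 1.0) / n
    return s

bound = math.log(2.0 + math.sqrt(3.0))   # ∫_1^2 (x^2 - 1)^{-1/2} dx = arccosh 2
vals = [h_of_c(c) for c in (0.3, 0.6, 0.9)]   # h is increasing in c, below bound
```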
Example 3. For 𝒴 = ℝ² with the standard Euclidean norm |·|, and
𝒟 = {y ∈ 𝒴: |y| = 1},
there are no 𝒟-admissible directions v ≠ θ at any y ∈ 𝒟 for any function J, because if y ∈ 𝒟 and θ ≠ v ∈ 𝒴, then y + εv ∉ 𝒟 except for at most one value of ε, as the simple geometry of Figure 5.5(a) shows.
On the other hand, at each point of the square 𝒟₁ = {y ∈ 𝒴: ‖y‖_M = 1} of Figure 5.5(b) there is always one possible nonzero 𝒟-admissible direction, except at the corner points, where again there are none.
Figure 5.5 (a), (b)
For the brachistochrone integrand
f(x, y, z) = (1/√(2g)) √((1 + z²)/y),
so that
f_y(x, y, z) = −√(1 + z²) / (2√(2g) y^{3/2})
and
f_z(x, y, z) = z / (√(2g) √y √(1 + z²)),
we take
𝒟₁ = {y ∈ 𝒟: ∫₀^{x₁} y(x)^{−3/2} dx < +∞};
then for each y ∈ 𝒟₁, δT(y; v) is defined by (6) ∀ directions v ∈ 𝒴 at y for which v(0) = v(x₁) = 0 and |v(x)| ≤ y(x) (or scalar multiples of these directions). We shall return to this problem in §6.2(c), Example 4.
In summary, as these examples show, there may be "too many" nontrivial admissible directions v to allow any y₀ ∈ 𝒟 to fulfill the necessary condition δJ(y₀; v) = 0; there may be just enough to permit this condition to determine y₀; or there may be many, but not readily usable; or there may be just one, or even none. Nevertheless, when present, they provide the most obvious approach to attacking problems in optimization, and should always be considered before investigating alternatives such as the method of Lagrangian multipliers to be introduced in §5.7. Finally, as with the brachistochrone function of Example 4*, they may be essential to the problem.
(Problems 5.21–5.32)
Now, for (𝒴, ‖·‖) a linear function L: 𝒴 → ℝ need not be continuous (§5.3, Example 5), and we must require this continuity. Accordingly, we make the following:
(5.6) Definition. In a normed linear space (𝒴, ‖·‖), a real valued function J is said to be differentiable (in the sense of Fréchet) at y₀ ∈ 𝒴 provided that J is defined in a sphere S(y₀) and there exists a continuous linear function L: 𝒴 → ℝ for which
J(y) = J(y₀) + L(y − y₀) + ‖y − y₀‖ ε(y − y₀),
where ε(y − y₀) is a real valued function (defined when y − y₀ ≠ θ by this equation) which has zero limit as y − y₀ → θ, or as ‖y − y₀‖ → 0.
(5.8) Proposition. In a normed linear space (𝒴, ‖·‖), if a real valued function J is differentiable at y₀ ∈ 𝒴, then it is continuous at y₀.
Proof. From Definition 5.6,
|J(y) − J(y₀)| ≤ |L(y − y₀)| + ‖y − y₀‖ |ε(y − y₀)|.
Now as y → y₀, from the linearity and continuity of L,
|L(y − y₀)| = |L(y) − L(y₀)| → 0;
also ‖y − y₀‖ → 0, and ε(y − y₀) → 0 from its definition. Thus as y → y₀,
|J(y) − J(y₀)| → 0,
and this establishes the continuity. □
As in ℝᵈ, the converses of these propositions need not hold: continuous
functions are seldom differentiable. Moreover, if J admits the Gateaux variation δJ(y₀; v) in each direction v ∈ 𝒴, the resulting function of v may be
neither linear nor continuous, and even these properties may not suffice
for differentiability. Some additional conditions are required.
Proposition 5.7 provides the key for establishing the differentiability of
a suitably defined function J at a point y₀ in a normed linear space (𝒴, ‖·‖).
First: Check that δJ(y₀; v) exists, ∀ v ∈ 𝒴.
Next: Prove that L(v) ≝ δJ(y₀; v) is linear and continuous in v.
(5.9) Theorem. In a normed linear space (𝒴, ‖·‖), if a real valued function J
has at each y ∈ S_r(y₀) Gateaux variations δJ(y; v), ∀ v ∈ 𝒴, and
(a) δJ(y₀; v) is linear and continuous in v;
(b) as y → y₀, |δJ(y; u) − δJ(y₀; u)| → 0 uniformly for u ∈ B = {u ∈ 𝒴: ‖u‖ = 1};
then J is differentiable at y₀.
PROOF*. From condition (a) we may express δJ(y₀; u) = L(u) for a linear
function L: 𝒴 → ℝ. Each y ∈ S_r(y₀) ∼ {y₀} may be expressed (uniquely) as
y = y₀ + tu for t = ‖y − y₀‖ < r and u ∈ B. (Why?)
Moreover, for each fixed u ∈ B, f(t) ≝ J(y₀ + tu) is differentiable on
(−r, r) since at t₁ ∈ (−r, r), with ε = t − t₁ ≠ 0 and y₁ = y₀ + t₁u, we have
y₀ + tu = y₁ + εu,
so that f(t) = J(y₁ + εu), and hence f′(t₁) = δJ(y₁; u).
Remark. This theorem is the most usable for our purposes. Other sufficient
conditions are known, but all involve some additional uniformity such as
that in condition (b). Without this uniformity, it is only possible to characterize the behavior as y = y₀ + tu → y₀ for fixed u. On the other hand, part of
condition (a) is superfluous: the linearity of δJ(y₀; v) is a consequence of
condition (b). See [Y].
Conditions (a) and (b) also imply a weak continuity of δJ at y₀ in the
sense of the following:
(5.10) Definition. In a normed linear space (𝒴, ‖·‖), the Gateaux variations
δJ(y; v) of a real valued function J are said to be weakly continuous at
y₀ ∈ 𝒴 provided that for each v ∈ 𝒴: δJ(y; v) → δJ(y₀; v) as y → y₀. [See
Problem 5.34.]
Example 1. The function J(y) = ∫ₐᵇ y(x)² dx is defined if y ∈ 𝒴 = C[a, b]; using the maximum norm ‖·‖_M of §5.1, Example 2, we know that J is continuous at each y₀ ∈ 𝒴. Moreover, from (8) in §2.4
we know that ∀ y, v ∈ 𝒴,
δJ(y; v) = 2 ∫ₐᵇ y(x)v(x) dx,
and the linearity in v is apparent. (Why?) Thus, to establish the continuity
of L(v) = δJ(y₀; v) in v, note that
|δJ(y₀; v)| = 2 |∫ₐᵇ y₀(x)v(x) dx| ≤ 2 ∫ₐᵇ |y₀(x)| |v(x)| dx ≤ 2 ‖y₀‖_M ‖v‖_M (b − a),
while for condition (b),
|δJ(y; u) − δJ(y₀; u)| = 2 |∫ₐᵇ [y(x) − y₀(x)] u(x) dx| ≤ 2 ‖y − y₀‖_M ‖u‖_M (b − a) = 2 ‖y − y₀‖_M (b − a). (8)
We observe that the last term → 0 as y → y₀ and is independent of u. Hence
the left side of (8) → 0 uniformly in u when ‖u‖_M = 1, as required.
It follows that J is differentiable at each y₀ ∈ 𝒴.
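Estimate (8) can be probed numerically. The following sketch is not from the text; the trapezoid-rule helper, the choice y₀ = sin, and the perturbation direction are assumptions for illustration. It checks the Fréchet property directly for J(y) = ∫₀¹ y(x)² dx: the remainder |J(y) − J(y₀) − L(y − y₀)|, divided by ‖y − y₀‖_M, should shrink as y → y₀.

```python
# Numerical sketch (not from the text): for J(y) = ∫_0^1 y(x)^2 dx the Fréchet
# derivative at y0 is L(v) = 2 ∫ y0 v dx, and the remainder ratio
# |J(y) - J(y0) - L(y - y0)| / ||y - y0||_M should tend to 0 as y -> y0.
import math

def integrate(f, a, b, n=2000):
    # composite trapezoid rule (assumed helper, good enough for smooth f)
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

a, b = 0.0, 1.0
y0 = math.sin                                   # hypothetical base point
J = lambda y: integrate(lambda x: y(x) ** 2, a, b)
L = lambda v: 2.0 * integrate(lambda x: y0(x) * v(x), a, b)

ratios = []
for t in (1.0, 0.1, 0.01):
    v = lambda x, t=t: t * math.cos(3 * x)      # perturbation; ||v||_M = t here
    y = lambda x, v=v: y0(x) + v(x)
    rem = abs(J(y) - J(y0) - L(v))              # equals ∫ v^2 dx analytically
    ratios.append(rem / t)                      # t plays the role of ||y - y0||_M

assert ratios[0] > ratios[1] > ratios[2]        # ratio decreases toward 0
```

The decrease is linear in t, reflecting the quadratic remainder of this particular J.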
Example 2. Similarly, for the function J of §5.3, Example 3: when ‖y − y₀‖_M < γ = ε/A, and since ε can be made as small as we please,
condition (b) is satisfied; the differentiability at an arbitrary y₀ ∈ 𝒴 follows
from Theorem 5.9.
δL(Y; V) = ∫₀¹ (Y′(t)/|Y′(t)|) · V′(t) dt, ∀ V ∈ 𝒴,
and the difference δL(Y; V) − δL(Y₀; V) admits a bound of the form
A ‖Y − Y₀‖_M ∫₀¹ |Y′(t)| dt,
so that
|L(Y) − L(Y₀) − δL(Y₀; Y − Y₀)| / ‖Y − Y₀‖_M ≤ A ‖Y − Y₀‖_M → 0 as Y → Y₀.
(5.11) Proposition. When f = f(x, y, z) with f_y and f_z ∈ C([a, b] × ℝ²), then
the Gateaux variations δF(y; v) of F(y) = ∫ₐᵇ f(x, y(x), y′(x)) dx are weakly
continuous at each y₀ ∈ 𝒴 = C¹[a, b] (with respect to ‖·‖_M).
PROOF. To verify that they satisfy condition (b) at y₀, note that
|δF(y; v) − δF(y₀; v)| ≤ ∫ₐᵇ |f_y[y(x)] − f_y[y₀(x)]| |v(x)| dx
+ ∫ₐᵇ |f_z[y(x)] − f_z[y₀(x)]| |v′(x)| dx
by standard estimates.
Now
f_y[y(x)] − f_y[y₀(x)] = f_y(x, y(x), y′(x)) − f_y(x, y₀(x), y₀′(x)),
and since f_y is uniformly continuous on each box [a, b] × [−c, c]² (Lemma
5.2), it follows that |f_y(x, y, z) − f_y(x, y₀, z₀)| < ε if |y|, |y₀|, |z|, |z₀| ≤ c and
|y − y₀| + |z − z₀| < r = r(ε).
Thus for the given y₀ ∈ 𝒴, we can choose c so large that ‖y − y₀‖_M ≤
1 ⇒ |y(x)|, |y₀(x)|, |y′(x)|, |y₀′(x)| ≤ c, ∀ x ∈ [a, b], and hence for a given ε > 0,
conclude that ∃ r > 0 such that
‖y − y₀‖_M < r ≤ 1 ⇒ |f_y[y(x)] − f_y[y₀(x)]| ≤ ε, ∀ x ∈ [a, b].
Similarly, for perhaps a smaller r, we can have that
|f_z[y(x)] − f_z[y₀(x)]|
= |f_z(x, y(x), y′(x)) − f_z(x, y₀(x), y₀′(x))| ≤ ε, ∀ x ∈ [a, b].
§S.6*. Affine Approximation: The Frechet Derivative 127
Thus
|δF(y; v) − δF(y₀; v)| ≤ ε ∫ₐᵇ (|v(x)| + |v′(x)|) dx ≤ ε (b − a) ‖v‖_M,
and condition (b) follows. □
Tangency
When J is differentiable at y₀ ∈ 𝒴, it admits the affine approximation
T(y) ≝ J(y₀) + J′(y₀)(y − y₀), (9)
which is defined ∀ y ∈ 𝒴. The approximation is "good" in the sense that for y
near y₀ and y ≠ y₀,
[J(y) − T(y)] / ‖y − y₀‖ = ε(y − y₀) → 0 as y → y₀.
so that if a sequence yₙ ∈ J_{y₀} (yₙ ≠ y₀), n = 1, 2, …, provides unit directions
τₙ = (yₙ − y₀)/‖yₙ − y₀‖ with a limit direction τ as yₙ → y₀, it follows from
the assumed continuity of J′(y₀) that
J′(y₀)τ = lim_{n→∞} J′(y₀)τₙ = 0.
By (9), T(y₀ + τ) = J(y₀) = T(y₀), and so any possible limit direction τ must
furnish a point y₀ + τ in the hyperplane T_{y₀}. Conversely, each τ such that
y₀ + τ ∈ T_{y₀} will make J′(y₀)τ = 0. (Why?) Accordingly, we make the following:
(5.12) Definition. In a normed linear space (𝒴, ‖·‖), if a real valued function
J is differentiable at y₀ ∈ 𝒴, then we introduce the tangent hyperplane
T_{y₀} ≝ {y = y₀ + τ ∈ 𝒴: J′(y₀)τ = 0}, and refer to such τ as the tangential
directions at y₀.¹
¹ Although this definition provides suggestive terminology, it avoids the deeper question of
whether each such tangential direction τ is geometrically tangent in that it is the limit of a
sequence {τₙ} of the type described above. This does hold under more stringent requirements on
J. See §A.7 and Liusternik's Theorem in [I-T].
§5.7. Extrema with Constraints: Lagrangian Multipliers 129
of the b₁-level set of the function G₂(y) = y(b). The set
Figure 5.6 (axes ρ, σ, with ρ₀ = J(y₀) marked)
If it also contains a full neighborhood of (ρ₀, σ₀), then there are preimage
points (r̄, s̄) and (r̃, s̃) and associated ȳ, ỹ for which the conditions (10) are
met. This is readily seen in Figure 5.6.
Finally, to have (r̄, s̄), (r̃, s̃) as near (0, 0) as we please, we would require
that each small neighborhood of (0, 0) map onto a set which contains a full
neighborhood of (ρ₀, σ₀). All of this is assured if the mapping F ≝ (𝒥, 𝒢)
has an inverse defined in a neighborhood of (ρ₀, σ₀) which is continuous at
(ρ₀, σ₀).¹
The simplest conditions which provide this continuous local inverse are
well known, and form the content of the inverse function theorem, which
we state without proof. (See [Ed].)
(5.13) Theorem. For X₀ ∈ ℝᵈ and τ > 0, if a vector valued function F:
S_τ(X₀) → ℝᵈ has continuous first partial derivatives in each component with
nonvanishing Jacobian determinant at X₀, then F provides a continuously
invertible mapping between a neighborhood of X₀ and a region containing a
full neighborhood of F(X₀). □
For F = (f₁, f₂, …, f_d) we require in this theorem that the matrix with
(continuous) elements ∂f_i/∂x_j, i, j = 1, 2, …, d, arranged in natural order,
have a nonzero determinant when evaluated at X₀. If F defines a linear
transformation of ℝᵈ into itself, then this becomes the familiar condition
for invertibility of the matrix representing the transformation.
Now, with y = y₀ + rv + sw, the partial derivative
then J cannot have a local extremal point at y₀ (even) when constrained to G_{y₀},
the level set of G through y₀.
Remark. The hypotheses of Proposition 5.14 also imply that G cannot have
a local extremal point at y₀ (even) when constrained to J_{y₀}, the level set of
J through y₀.
PROOF. Since the nonvanishing determinant of the hypothesis is precisely the
Jacobian determinant ∂(𝒥, 𝒢)/∂(r, s) evaluated at r = s = 0, we can apply the
inverse function theorem (5.13) to the vector valued function F = (𝒥, 𝒢)
provided that it has continuous partial derivatives in a neighborhood of
X₀ = (0, 0).
It suffices to establish the continuity of, say,
𝒥_r(r, s) = δJ(y₀ + rv + sw; v),
for fixed v, w, in a neighborhood of (0, 0).
But if r₁, s₁ are such that y₁ = y₀ + r₁v + s₁w is in S_τ(y₀), the neighborhood given by the hypothesis, then y = y₀ + rv + sw is within any given τ₁ of
y₁ if |r − r₁|, |s − s₁| < τ₁/[2(‖v‖ + ‖w‖)], since ‖y − y₁‖ ≤ |r − r₁| ‖v‖ +
|s − s₁| ‖w‖. (Why?) And by the continuity of δJ(y; v) at y₁ we know that
given ε₁ > 0, ∃ τ₁ > 0 such that |δJ(y; v) − δJ(y₁; v)| < ε₁ when ‖y − y₁‖ <
τ₁. □
With this preparation, it is easy to give conditions necessary for a local
extremal point in the presence of a constraint. We first recall Definition 5.10:
Definition. In a normed linear space (𝒴, ‖·‖), the Gateaux variations δJ(y; v)
of a real valued function J are said to be weakly continuous at y₀ ∈ 𝒴 provided that for each v ∈ 𝒴: δJ(y; v) → δJ(y₀; v) as y → y₀.
(5.15) Theorem (Lagrange). In a normed linear space (𝒴, ‖·‖), if real valued
functions J and G are defined in a neighborhood of y₀, a local extremal point
for J constrained to G_{y₀}, and have there weakly continuous Gateaux variations,
then either
(a) δG(y₀; w) ≡ 0, ∀ w ∈ 𝒴; or
(b) there exists a constant λ ∈ ℝ such that δJ(y₀; v) = λ δG(y₀; v), ∀ v ∈ 𝒴.
PROOF. If (a) does not hold, then ∃ w ∈ 𝒴 for which δG(y₀; w) ≠ 0. With
this w and any v ∈ 𝒴, by Proposition 5.14 we must have that the determinant
| δJ(y₀; v)   δJ(y₀; w) |
| δG(y₀; v)   δG(y₀; w) | = 0.
Hence with λ ≝ δJ(y₀; w)/δG(y₀; w), it follows that δJ(y₀; v) = λ δG(y₀; v),
∀ v ∈ 𝒴, as was to be proven. □
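The mechanism of this proof has an elementary finite-dimensional shadow that can be tested by machine. The sketch below is not from the text; the functions J, G and the constrained minimizer are hypothetical illustrations. It verifies that at a constrained minimizer the 2 × 2 determinant of variations vanishes for every pair of directions, which is exactly what forces δJ = λδG.

```python
# Hypothetical finite-dimensional analogue (not from the text): minimize
# J(p) = p1^2 + p2^2 subject to G(p) = p1 + p2 = 1. At the constrained
# minimizer p0 = (1/2, 1/2), the determinant
# | dJ(p0;v)  dJ(p0;w) |
# | dG(p0;v)  dG(p0;w) |
# must vanish for every pair of directions v, w, giving dJ = λ dG.
def dJ(p, v):
    # directional (Gateaux) derivative of J at p in direction v
    return 2 * p[0] * v[0] + 2 * p[1] * v[1]

def dG(p, v):
    return v[0] + v[1]

p0 = (0.5, 0.5)
pairs = [((1, 0), (0, 1)), ((1, 2), (3, -1)), ((0.3, -0.7), (2, 5))]
dets = [dJ(p0, v) * dG(p0, w) - dJ(p0, w) * dG(p0, v) for v, w in pairs]
lam = dJ(p0, (1, 0)) / dG(p0, (1, 0))   # the multiplier; λ = 1 here

assert all(abs(d) < 1e-12 for d in dets)
assert abs(dJ(p0, (2, -3)) - lam * dG(p0, (2, -3))) < 1e-12
```

The same two-by-two vanishing determinant is what Proposition 5.14 supplies in the normed-linear-space setting.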
Figure 5.7
minimizing
J(y) = ∫₀¹ [sin³ x + y(x)²] dx
over
𝒟 = {y ∈ 𝒴 = C[0, 1]: ∫₀¹ x [y(x)]^{4/3} dx = 1},
i.e., under the constraining relation
G(y) ≝ ∫₀¹ x [y(x)]^{4/3} dx = 1.
We know from §2.4, Example 2, that δJ(y; v) exists for all y, v ∈ 𝒴 and is
given by
δJ(y; v) = 2 ∫₀¹ y(x)v(x) dx.
Similarly, simple computation shows that δG(y; v) exists for y, v ∈ 𝒴 and
is given by
δG(y; v) = (4/3) ∫₀¹ x [y(x)]^{1/3} v(x) dx.
In the maximum norm, both δJ(y; v) and δG(y; v) are weakly continuous
by Proposition 5.11. Thus by Theorem 5.15, a point y₀ ∈ 𝒴 which minimizes
J|G_{y₀} must satisfy either
(a) δG(y₀; w) = (4/3) ∫₀¹ x [y₀(x)]^{1/3} w(x) dx ≡ 0, ∀ w ∈ 𝒴 [and this condition cannot hold, since by Lemma 4.4 it would imply that x [y₀(x)]^{1/3} ≡ 0, or y₀(x) ≡ 0
on [0, 1], while y₀ = 𝒪 is not in 𝒟]; or
(5.16) Theorem. In a normed linear space (𝒴, ‖·‖), let real valued functions J,
G₁, G₂, …, G_N be defined in a neighborhood of y₀, a local extremal point for J
constrained to G_{y₀} ≝ {y ∈ 𝒴: G_i(y) = G_i(y₀), i = 1, 2, …, N}, and have there
weakly continuous Gateaux variations.
Then either:
(a) the determinant
| δG₁(y₀; v₁)   δG₁(y₀; v₂)   ⋯   δG₁(y₀; v_N) |
| δG₂(y₀; v₁)   δG₂(y₀; v₂)   ⋯   δG₂(y₀; v_N) |
|      ⋮              ⋮                 ⋮       |
| δG_N(y₀; v₁)  δG_N(y₀; v₂)  ⋯   δG_N(y₀; v_N)|
vanishes for every choice of the directions v₁, v₂, …, v_N ∈ 𝒴; or
(b) there exist constants λ₁, λ₂, …, λ_N ∈ ℝ such that
δJ(y₀; v) = Σ_{i=1}^N λ_i δG_i(y₀; v), ∀ v ∈ 𝒴.
PROOF. If (a) does not hold, choose directions v₁, v₂, …, v_N ∈ 𝒴 for which
the determinant of condition (a) is nonvanishing, and suppose that for some
v ∈ 𝒴 the (N + 1) × (N + 1) determinant
| δJ(y₀; v)     δJ(y₀; v₁)    ⋯   δJ(y₀; v_N)  |
| δG₁(y₀; v)    δG₁(y₀; v₁)   ⋯   δG₁(y₀; v_N) |
|      ⋮              ⋮                 ⋮       |
| δG_N(y₀; v)   δG_N(y₀; v₁)  ⋯   δG_N(y₀; v_N)| (12)
(having the determinant of condition (a) in its lower right corner) is nonvanishing. Then the inverse function theorem in ℝ^{N+1} can be used as before
to find (N + 1)-tuples of scalars (r̄, s̄₁, s̄₂, …, s̄_N) and (r̃, s̃₁, s̃₂, …, s̃_N) as near
(0, 0, …, 0) as we wish for which the points
ȳ = y₀ + r̄v + Σ_{j=1}^N s̄_j v_j,
ỹ = y₀ + r̃v + Σ_{j=1}^N s̃_j v_j,
satisfy the conditions
J(ȳ) < J(y₀) < J(ỹ),
and
G_i(ȳ) = G_i(y₀) = G_i(ỹ), i = 1, 2, …, N.
We thereby exclude a local extremum for J constrained to G_{y₀}, contradicting
the hypothesis.
Thus for the specific set of directions v₁, v₂, …, v_N, the determinant (12)
must vanish for each v ∈ 𝒴, and if we expand it by minors of the first column,
we have, upon dividing by the cofactor of δJ(y₀; v) (see [N]), an equation
equivalent to condition (b), viz.,
δJ(y₀; v) − Σ_{i=1}^N λ_i δG_i(y₀; v) = 0, ∀ v ∈ 𝒴,
where for each i = 1, 2, …, N, the constant
λ_i = − [cofactor of δG_i(y₀; v)] / [cofactor of δJ(y₀; v)]
is defined, since the denominator is precisely the nonvanishing determinant
|δG_i(y₀; v_j)|, i, j = 1, 2, …, N. □
Remarks. Condition (a) holds if the constraining relations are locally
linearly dependent in that there exist constants μ_i, i = 1, 2, …, N, not all zero,
for which Σ_{i=1}^N μ_i G_i(y) = 0, ∀ y near y₀. Indeed, from the linearity of the
Gateaux variation (see §2.4) it would follow that Σ_{i=1}^N μ_i δG_i(y₀; v) = 0 for
each direction v ∈ 𝒴. Thus for each set of directions v₁, v₂, …, v_N ∈ 𝒴, the
rows of the determinant of condition (a) are linearly dependent, and so it must
vanish.
Conversely, if condition (a) is satisfied for any set of directions v₁, v₂, …,
v_N ∈ 𝒴, then in general the rows (and columns) of the determinant are linearly
dependent. Indeed, upon expanding it by the minors of the first column as in the
proof of Theorem 5.16, we would have that Σ_{i=1}^N μ_i δG_i(y₀; v_j) must always
vanish for j = 1, 2, …, N, since this represents the expansion of a determinant
having two identical columns. Thus the rows of the determinant are linearly
dependent (unless μ_i = 0 for i = 1, 2, …, N).
Similarly, Lagrange's condition (b) implies that the variations δJ(y₀; v),
δG₁(y₀; v), …, δG_N(y₀; v) are linearly dependent for each v ∈ 𝒴. Utilizing the
geometric language of ℝᵈ, we see that when all functions J, G_i, i = 1, 2, …, N,
are differentiable at y₀ as in §5.6, and τ ∈ 𝒴 is a direction simultaneously
tangent to each level set G_{i,y₀}, i = 1, 2, …, N, then it must also be tangential to
J_{y₀}, the level set of J unconstrained by the G_i at y₀, and thus δJ(y₀; τ) = 0 for
all such directions, as we should expect.
However, when N > 1, it is possible that δJ(y₀; v) = 0 for directions v
which are not tangential to any of the level sets G_{i,y₀} at y₀. Thus we cannot
assert that Lagrange's condition implies common tangency of the level sets at
y₀, as was the case for N = 1.
Observe that upon replacement of λ_i by −λ_i, condition (b) can also be
restated in the form δ(J + Σ_{i=1}^N λ_i G_i)(y₀; ·) ≡ 0, which suggests consideration
of the augmented function J + Σ_{i=1}^N λ_i G_i without constraints, again as in
§2.3.
G(y) ≝ ∫_{−1}^0 x y′(x) dx = −4/15, (13)
we may either characterize 𝒟 by means of the two additional constraining
relations G₁(y) ≝ y(−1) = 0, G₂(y) ≝ y(0) = 2/3 and apply Theorem 5.16 with
(a) δG(y; v) = ∫_{−1}^0 x v′(x) dx = 0, ∀ v ∈ 𝒴₀ [which would imply by Lemma 4.1
that the continuous function h(x) = x is constant on [−1, 0], and this is
false]; or
(b) ∃ λ such that δ(F + λG)(y; v) = ∫_{−1}^0 [3y′(x)² + λx] v′(x) dx = 0, ∀ v ∈ 𝒴₀.
Again by Lemma 4.1, we conclude that 3y′(x)² + λx is constant on
[−1, 0], or, upon replacing λ by −3λ, we have for an appropriate c that
y′(x)² = c + λx (≥ 0 on (−1, 0)). Thus on (−1, 0), either y′(x) = −√(c + λx),
which cannot satisfy (13) (Why?); or¹
y′(x) = √(c + λx). (14)
Similarly, the possibility that λ = 0 in (14) requires that y′(x) = √c, and to
satisfy (13) we must take √c = 8/15; but then y(x) = (8/15)x + const. cannot be in
𝒟. (Why?)
When λ ≠ 0, integration of (14) gives, for some constant c₁:
y(x) = (2/(3λ))(c + λx)^α + c₁, (15)
where α = 3/2. Now
y(0) = 2/3 ⇒ c₁ = 2/3 − 2c^α/(3λ),
while
y(−1) = 0 ⇒ c₁ = −(2/(3λ))(c − λ)^α. (16)
¹ y′ cannot change sign at a point x₀ ∈ (−1, 0) since y′(x₀)² = c + λx₀ cannot be zero.
We need to solve the nonlinear system, (17) and (18), for c ≥ 0 and λ ≤ c
with λ ≠ 0. By inspection, λ = c = 1 constitutes one such solution, and hence
from (15), (16):
y₀(x) = (2/3)(x + 1)^{3/2} provides a possible local extremal
point in 𝒟 under (13).
[In fact, there is no other admissible solution to this system. To establish
this, we use (17) to replace (c − λ)^α in (18). After simplification, and since λ ≠ 0, we obtain
λ = 3c^α − 2c, so that with α = 3/2,
c − λ = 3(c − c^{3/2}) = 3c(1 − √c) ≥ 0. (19)
Thus 0 ≤ c ≤ 1; but c = 1 in (19) leads to the case λ = c = 1 already
considered, while c = 0 gives λ = c = 0 = y′(x), which violates the constraining relation (13).
Upon substitution of (19) into (17), we obtain for 0 < c < 1:
3c^α − 2c = c^α − (3c)^α(1 − √c)^α, or 2c(√c − 1) = −(3c)^α(1 − √c)^α,
and with α = 3/2,
2 = 3^{3/2} √c (1 − √c)^{1/2},
so that
4 = 27c(1 − √c),
or finally, with t = √c, (3t − 2)²(3t + 1) = 0; the only root with 0 < c < 1 is
c = 4/9, for which λ = 3c^{3/2} − 2c = 0, and this contradicts λ ≠ 0.]
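The multiplier solution λ = c = 1 with y₀(x) = (2/3)(x + 1)^{3/2} can be checked by machine. A minimal sketch (not from the text; the trapezoid integrator is an assumed helper): here y₀′(x) = (x + 1)^{1/2}, so y₀′(x)² = c + λx = 1 + x, and the constraint (13) should evaluate to −4/15.

```python
# Sketch check (assumed values from the worked example: λ = c = 1 and
# y0(x) = (2/3)(x+1)^{3/2}): then y0'(x) = sqrt(x+1), y0'(x)^2 = 1 + x,
# and the constraint ∫_{-1}^0 x y0'(x) dx should equal -4/15.
import math

def integrate(f, a, b, n=4000):
    # composite trapezoid rule (assumed helper)
    h = (b - a) / n
    return (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))) * h

dy0 = lambda x: math.sqrt(x + 1.0)
G = integrate(lambda x: x * dy0(x), -1.0, 0.0)

assert abs(G + 4.0 / 15.0) < 1e-4                   # constraint (13) holds
assert abs(dy0(-0.5) ** 2 - (1.0 - 0.5)) < 1e-12    # y0'^2 = 1 + x at x = -1/2
```

The exact value −4/15 follows from the substitution u = x + 1 in the integral.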
PROBLEMS
5.1. Reverse Triangle Inequality. Show that if (𝒴, ‖·‖) is a normed linear space,
then
| ‖y‖ − ‖ŷ‖ | ≤ ‖y − ŷ‖, ∀ y, ŷ ∈ 𝒴.
5.2. Let 𝒴 = ℝᵈ with the Euclidean norm
‖y‖ = (Σ_{i=1}^d y_i²)^{1/2}, for y = (y₁, y₂, …, y_d).
Parts (a) and (c) do not hold for all norms on 𝒴.
(d) Verify that ‖y‖_M = max_{i=1,2,…,d} |y_i| is a norm for ℝᵈ which does not have
the properties in (a) and (c) above when d ≥ 2.
5.3. Let 𝒴 = C[a, b].
(a) Verify that ‖y‖₁ = ∫ₐᵇ |y(x)| dx defines a norm for 𝒴.
(b) Does ‖y‖ = |∫ₐᵇ y(x) dx| define a norm for 𝒴?
5.4. Show that ‖y‖ = max_{x∈[a,b]} |y′(x)| defines a norm for the linear space 𝒴₀ =
{y ∈ C¹[a, b]: ∫ₐᵇ y(x) dx = 0}, but does not define a norm for 𝒴 = C¹[a, b].
5.5. (a) Verify that ‖y‖ = |y(a)| + max_{x∈[a,b]} |y′(x)| defines a norm for 𝒴 =
C¹[a, b].
(b) Show that max_{x∈[a,b]} |y(x)| ≤ (1 + b − a) ‖y‖, ∀ y ∈ 𝒴. Hint:
y(x) = y(a) + ∫ₐˣ y′(t) dt.
5.6. (a) Verify that each of the functions ‖Y‖_M and ‖Y‖₁ given in §5.1, Example 6,
gives a norm for 𝒴 = (C[a, b])ᵈ.
(b) Show that the remaining functions in the example also give norms for 𝒴.
5.7. Suppose that both ‖·‖₁ and ‖·‖₂ are norms for the linear space 𝒴.
(a) Show that ‖y‖ = ‖y‖₁ + ‖y‖₂ defines a norm on 𝒴.
(b) Does ‖y‖ = ‖y‖₁ · ‖y‖₂ also define a norm for 𝒴?
5.8. (a) Verify the assertion of §5.1, Example 4.
(b) When (𝒴_j, ‖·‖_j) are each normed linear spaces for j = 1, 2, find a corresponding norm for the linear space 𝒴₁ × 𝒴₂.
5.9. With 𝒴 = C[0, 1] and {yₙ} = {(x/2)ⁿ}:
(a) Show that yₙ → 𝒪 as n → ∞, using ‖y‖₁ = ∫₀¹ |y(x)| dx.
(b) Show also that yₙ → 𝒪 as n → ∞, using ‖y‖_M = max_{x∈[0,1]} |y(x)|.
5.10. Let 𝒴 = C[0, 1] and {yₙ} = {xⁿ}; i.e., yₙ(x) = xⁿ, n = 1, 2, ….
(a) Establish that yₙ → 𝒪 as n → ∞, using ‖·‖₁, but
(b) yₙ ↛ 𝒪 using ‖·‖_M, where ‖·‖₁ and ‖·‖_M are as in Problem 5.9.
(Note: This shows that a sequence from 𝒴 may converge to y₀ ∈ 𝒴 with respect
to one norm, but not with respect to another.)
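The norm-dependence of convergence in Problem 5.10 is easy to see by direct computation. A small sketch (not from the text; the sampling grid used for the maximum is an assumption):

```python
# Illustration of Problem 5.10 (not from the text): y_n(x) = x^n on [0, 1]
# converges to the zero function in ||y||_1 = ∫_0^1 |y| dx, since
# ||y_n||_1 = 1/(n+1) -> 0, but not in ||y||_M = max |y|, since
# ||y_n||_M = y_n(1) = 1 for every n.
norm1 = lambda n: 1.0 / (n + 1)          # exact value of ∫_0^1 x^n dx
normM = lambda n: max(abs((k / 1000.0) ** n) for k in range(1001))  # grid max

assert norm1(100) < 0.01 and norm1(1000) < 0.001               # tends to 0
assert all(abs(normM(n) - 1.0) < 1e-12 for n in (1, 10, 100))  # stays 1
```
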
5.11. Let (𝒴, ‖·‖) be a normed linear space, and {yₙ}, {ŷₙ} be sequences from 𝒴.
Show that if yₙ → y₀ and ŷₙ → ŷ₀ as n → ∞, then (yₙ + ŷₙ) → (y₀ + ŷ₀) as
n → ∞.
5.12. Suppose that (𝒴, ‖·‖) is a normed linear space, and let {yₙ} be a sequence from
𝒴.
(a) Show that if yₙ → y₀ as n → ∞, then ‖yₙ‖ → ‖y₀‖ as n → ∞.
(b) Give an example to illustrate that the converse of (a) is false.
5.13. Use Definition 5.1 to prove that in a normed linear space (𝒴, ‖·‖), a real valued
function J is continuous at y₀ ∈ 𝒴 iff for each sequence {yₙ} from 𝒴,
lim_{n→∞} yₙ = y₀ ⇒ lim_{n→∞} J(yₙ) = J(y₀).
5.14. Let 𝒴 = C[a, b] and use Definition 5.1 to establish that J(y) = ∫ₐᵇ (sin x) y(x) dx
is continuous on 𝒴 using:
(a) ‖y‖_M = max_{x∈[a,b]} |y(x)|;
(b) ‖y‖₁ = ∫ₐᵇ |y(x)| dx.
Make a similar analysis for F(y) = ∫ₐᵇ sin(y(x)) dx. Hint: Use a mean value
inequality.
5.15. Let (𝒴, ‖·‖) be a normed linear space and L be a real valued linear function on
𝒴 (i.e., L(cy + ĉŷ) = cL(y) + ĉL(ŷ), ∀ y, ŷ ∈ 𝒴 and ∀ c, ĉ ∈ ℝ). Prove that L
is continuous on 𝒴 iff there exists a constant A such that |L(y)| ≤ A ‖y‖,
∀ y ∈ 𝒴.
Problems 141
5.16. Suppose that ‖·‖₁ and ‖·‖₂ are both norms for the linear space 𝒴 and there is
a constant A such that ‖y‖₁ ≤ A ‖y‖₂, ∀ y ∈ 𝒴.
(a) Show that if yₙ → y₀ as n → ∞ using ‖·‖₂, then also yₙ → y₀ using ‖·‖₁.
(b) Prove that if a real valued function J on 𝒴 is continuous with respect to
‖·‖₁, then it is also continuous with respect to ‖·‖₂.
5.17. Let (𝒴, ‖·‖) be a normed linear space.
(a) Show that if K is a compact subset of 𝒴, then K is bounded; i.e., there is a
constant k such that ‖y‖ ≤ k, ∀ y ∈ K.
(b) Conclude that if 𝒴 ≠ {𝒪}, then 𝒴 itself cannot be compact.
Let 𝒴 = C[a, b] and K = {y ∈ 𝒴: ∫ₐᵇ y(x) dx = 1}. Is K compact if we use:
(c) ‖y‖₁ = ∫ₐᵇ |y(x)| dx?
(d) ‖y‖_M = max_{x∈[a,b]} |y(x)|?
5.18. Let (𝒴, ‖·‖) be a normed linear space and J, G be real valued functions on 𝒴
which are continuous at y₀ ∈ 𝒴. Prove that for c ∈ ℝ, the following functions
are also continuous at y₀:
(a) cJ; (b) J + G; (c) JG.
Hint for JG: ab − a₀b₀ = (a − a₀)(b − b₀) + (a − a₀)b₀ + a₀(b − b₀).
5.19. Verify that J(y) = ∫₀¹ |y(x)| dx does not achieve a minimum value on
𝒟 = {y ∈ C[0, 1]: y(0) = 0, y(1) = 1},
although J is bounded below (i.e., J(y) ≥ 0) on 𝒟. Does Proposition 5.3 cover
this?
5.20*. Let 𝒴 = C[0, 1] and J(y) = 2y(0)³ − 3y(0)².
(a) Prove that y₀(x) ≡ 1 is a local minimum point for J on 𝒴 using ‖y‖_M =
max_{x∈[0,1]} |y(x)|. (Hint: Show that y ∈ S₁(y₀) ⇒ J(y) ≥ −1 = J(y₀). Consider minimizing the cubic polynomial p(t) = 2t³ − 3t² on ℝ.)
(b) Prove that y₀(x) ≡ 1 is not a local minimum point for J on 𝒴 using
‖y‖₁ = ∫₀¹ |y(x)| dx. (Hint: Consider the continuous function
y_ε(x) = { −1 + 2x/ε, 0 ≤ x ≤ ε,
         { 1,          ε < x ≤ 1,
for each fixed ε > 0 and show that ‖y_ε − y₀‖₁ can be made as small as we
please by choosing ε small, while J(y_ε) = −5 < J(y₀), ∀ ε > 0.)
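Both claims in the hint of Problem 5.20(b) can be confirmed mechanically. A sketch (not from the text; the Riemann-sum helper is an assumption):

```python
# Sketch for Problem 5.20(b) (not from the text): with y0 ≡ 1 and the
# perturbation y_eps (equal to -1 + 2x/eps on [0, eps] and 1 beyond),
# J(y) = 2y(0)^3 - 3y(0)^2 gives J(y_eps) = 2(-1)^3 - 3(-1)^2 = -5,
# while ||y_eps - y0||_1 = ∫_0^eps |y_eps - 1| dx = eps can be made small.
def J(y):
    return 2 * y(0.0) ** 3 - 3 * y(0.0) ** 2

def y_eps(eps):
    return lambda x: -1.0 + 2.0 * x / eps if x <= eps else 1.0

def dist1(eps, n=100000):
    # ||y_eps - y0||_1 by a midpoint Riemann sum; the exact value is eps
    h = 1.0 / n
    return sum(abs(y_eps(eps)(i * h + h / 2) - 1.0) * h for i in range(n))

assert J(y_eps(0.01)) == -5.0
assert abs(dist1(0.01) - 0.01) < 1e-3
```

So y_ε lies arbitrarily close to y₀ in ‖·‖₁ yet J(y_ε) = −5 < −1 = J(y₀), as the hint asserts.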
5.21. For Example 2 of §5.5, discuss what happens if α vanishes identically on a
subinterval of [a, b].
5.22. Let 𝒴 = C[a, b], J(y) = ∫ₐᵇ [sin³ x + y(x)²] dx, and 𝒟 = {y ∈ 𝒴: ∫ₐᵇ y(x) dx = 1}.
(a) What are the 𝒟-admissible directions for J?
(b) Find all possible (local) extremal points for J on 𝒟. (See Problem 4.2.)
(c)* Prove directly that J is differentiable at each y₀ ∈ 𝒴. (See §5.6, Example 3.)
5.23. Let 𝒴 = C¹[a, b], 𝒟 = {y ∈ 𝒴: y(a) = a₁, y(b) = b₁}, and J(y) = ∫ₐᵇ f(x, y′(x)) dx,
where f(x, z) and f_z(x, z) are continuous on [a, b] × ℝ.
(a) What are the 𝒟-admissible directions for J?
(b) Show that if y is a (local) extremal point for J on 𝒟, then f_{y′}(x) ≝
f_z(x, y′(x)) = const. on [a, b].
5.24. Let 𝒴 = C[a, b], 𝒟 = {y ∈ 𝒴: y(a) = a₁, y(b) = b₁}, and J(y) = ∫ₐᵇ f(x, y(x)) dx,
where f(x, y) and f_y(x, y) are continuous on [a, b] × ℝ.
(a) What are the 𝒟-admissible directions for J on 𝒟?
(b) Show that if y ∈ 𝒟 is a (local) extremal point for J on 𝒟, then f_y(x) ≝
f_y(x, y(x)) = 0 on [a, b].
(c)* Prove that the variations δJ(y; v) are weakly continuous in the maximum
norm. Hint: See the proof of Proposition 5.11.
(d) Conclude that if α ∈ C[a, b] and J(y) ≝ ∫ₐᵇ α(x) e^{y(x)} dx, then J cannot have
a (local) extremum on 𝒟 for any values of a₁, b₁, unless α ≡ 0.
In Problems 5.25-5.31, find all possible (local) extremal points for J (a) on 𝒴;
(b) on 𝒟.
5.32*. In Example 2 of §5.5, let α ∈ C[a, b] with α ≥ α₀ > 0 on [a, b].
(a) Show that there exists a δ₀ > 0 such that if 0 < b₁ − a₁ < δ₀, then there is
precisely one c ∈ (0, α₀) for which (5) is satisfied (and hence precisely one
y ∈ 𝒟 which satisfies (3)).
(b) What happens if a₁ = b₁?
5.33. Suppose that (𝒴, ‖·‖) is a normed linear space for which L: 𝒴 → ℝ is continuous and linear (i.e., L(cy + ĉŷ) = cL(y) + ĉL(ŷ), ∀ y, ŷ ∈ 𝒴, and c, ĉ ∈ ℝ). Show
that L is Fréchet differentiable at each y₀ ∈ 𝒴:
(a) by using Definition 5.6; and
(b) by using Theorem 5.9.
(c) If L ≢ 0, prove that ∃ v₁ ∈ 𝒴 with L(v₁) = 1, and thus L(τ) = 0 when τ =
y − L(y)v₁, if y ∈ 𝒴.
(d) In Definition 5.12, take L = J′(y₀) and conclude that "most" directions are
tangential.
5.34. If (𝒴, ‖·‖) is a normed linear space and J: 𝒴 → ℝ has at each y ∈ 𝒴 Gateaux
variations which satisfy conditions (a) and (b) of Theorem 5.9, verify that
δJ(y; v) is weakly continuous at y₀. Hint: Each v ∈ 𝒴 may be expressed as
v = ‖v‖ v₁, with ‖v₁‖ = 1.
5.35*. In Example 3 of §5.6*, use the vector inequality of Problem 0.2, viz.,
| A/|A| − B/|B| | ≤ √2 |A − B| / √(|A| |B|), 𝒪 ≠ A, B ∈ ℝᵈ.
5.36. Establish the linearity and continuity in v of the Gateaux variations δF(y; v)
utilized in the first part of the proof of Proposition 5.11.
In Problems 5.37-5.39, use the method of Lagrangian multipliers to determine all
possible (local) extremal points for J on 𝒟.
when v₀ ∈ 𝒟₀ and v₀(x) ≡ 0 on [a, x₀]. (See §1.5 for the notation.) This relaxation of conditions near an end point will be required for a careful analysis of
the brachistochrone, where
f(x, y, z) = √(1 + z²)/√(2gy).
(See Example 4* in §5.5, and Problems 6.14*, 6.15*.)
(b) Formulate and prove a vector valued analogue of this result.
5.41*. (a) For Example 2 of §5.7, prove that for λ ∈ ℝ, the function f̃(x, z) = z³ +
λxz is strongly convex on [−1, 0] × [0, ∞).
(b) Conclude that when y ∈ 𝒟 and y′(x) ≥ 0 on [−1, 0], then for an appropriate λ, F̃(y) > F̃(y₀) when y₀(x) = (2/3)(x + 1)^{3/2} and y ≠ y₀.
(c) Draw a sketch to show that in each ‖·‖_M neighborhood of y₀, ∃ y ∈ 𝒟
with F(y) > F(y₀) and G(y) = ∫_{−1}^0 x y′(x) dx = G(y₀).
(d) Can convexity be used to prove that y₀ is a local minimum point for this
problem? Explain.
(e) Use part (b) to conclude that the system (17), (18) has at most one solution λ, c
for which λx + c ≥ 0 on [−1, 0]. Hint: Each solution pair (λ, c) gives a
y₀ ∈ 𝒟* = {y ∈ 𝒟: y′ ≥ 0} that minimizes F̃ = F − 3λG on 𝒟* uniquely!
(f)* Redo the problem of this Example when −2/3 replaces −4/15 in (13).
5.42. When D is a bounded domain in ℝᵈ (for d ≥ 2) with a smooth boundary,
verify formally that ‖u‖_M ≝ max_{X∈D̄} (|u(X)| + |∇u(X)|) defines a norm for
𝒴 = C¹(D̄). (D̄ is compact.) See §6.9.
5.43. Find all possible functions that maximize J(y) = y²(1) on 𝒟 = {y ∈ C¹[0, 1]:
y(0) = 0} under the constraint G(y) ≝ ∫₀¹ y′(x)² dx = 1.
CHAPTER 6
The Euler-Lagrange Equations
146 6. The Euler-Lagrange Equations
Figure 6.1 (a), (b): y₀ together with comparison functions y ∈ 𝒟 and y ∈ 𝒟_b or 𝒟̃.
over a set
𝒟_τ ≝ {y ∈ C¹[x₁, x₂]: τ_j(x_j, y(x_j)) = 0; j = 1, 2},
where [x₁, x₂] ⊆ ℝ, and the τ_j are given functions.
integrals involving C¹ vector valued functions (§6.7), and finally (in §6.9)
to integrals over higher-dimensional space. Invariance of stationarity with
respect to change in coordinates is examined in §6.8.
Many of these results were obtained first by Lagrange (1736-1813), who
began his investigations in the subject at age sixteen¹; however, his successors
have added mathematical rigor to the original discoveries.
In this chapter, only those conditions necessary for a local extremum are
considered, and although the methods developed are applied to significant
problems of classical interest (including that of the brachistochrone), the final
disposition of such problems must await the discussion of sufficiency in
Chapter 9. It should be noted, however, that the initial investigators in these
fields often regarded a function which satisfied the necessary conditions as
the extremal function sought, and the practice continues today in elementary
treatments of the subject.
Throughout this chapter, we shall supply the space C¹[a, b] with the
maximum norm ‖y‖_M = max (|y(x)| + |y′(x)|) of §5.1, Example 3, and its
vector valued counterpart (C¹[a, b])ᵈ with the corresponding norm
‖Y‖_M = max (|Y(x)| + |Y′(x)|).
Other norms will be introduced as needed. However, for many of our considerations the particular norm in use is not significant.
Then for each y ∈ 𝒴 = C¹[a, b],
F(y) = ∫ₐᵇ f(x, y(x), y′(x)) dx
is defined. From Example 4 of §2.4, F has in each direction v the Gateaux
variation
δF(y; v) = ∫ₐᵇ [f_y(x)v(x) + f_{y′}(x)v′(x)] dx, (1)
where for the given y ∈ 𝒴, we use the compressed notation from §1.5:
f_y(x) ≝ f_y[y(x)] and f_{y′}(x) ≝ f_z[y(x)], (2)
so that
δF(y; v) = f_{y′}(x)v(x)|ₐᵇ, ∀ v ∈ 𝒴. (3′)
PROOF. The first assertions are a restatement of Proposition 4.2 for the continuous functions g(x) = f_y(x) and h(x) = f_{y′}(x). But then (3) permits the
integrand of (1) to be recognized as (d/dx)[f_{y′}(x)v(x)], and thus integrated
to produce (3′). □
(An old and rather entrenched tradition calls such functions extremal
functions, or simply extremals, although they may provide neither a local
maximum nor a local minimum for the problem.) Observe that we do not
require that a stationary function satisfy any particular boundary conditions,
although in each problem we might be interested only in those which meet
given boundary conditions.
Now, as in §5.5, certain functions f with their derivatives f_y and f_z are
defined only for a restricted class of functions y (e.g., y ≥ 0), so that variation
of F at y can be performed only for a reduced class of v (e.g., those for which
|v(x)| ≤ |y(x)|). As the preceding discussion shows, when y is stationary and
meets the restrictions, then δF(y; v) = 0, ∀ v ∈ 𝒟₀ for which the variation at y
is defined. However, there may also be nonstationary functions η which make
δF(η; v) = 0 for the reduced class of v, and these may provide the true
extremals. (See Problem 6.13.)
(Problem 6.1)
obtain a first integral of the differential equation. We shall analyze three such
cases in this section.
Consider minimizing
L(y) = ∫₀^{θ₂} √(1 + [y′(θ)]²) dθ
on
𝒟 = {y ∈ C¹[0, θ₂]: y(0) = y₁; y(θ₂) = y₂}.
Here
f = f(z) = √(1 + z²), f_z(z) = z/√(1 + z²);
hence, a necessary condition that a given y ∈ 𝒟 minimize L on 𝒟 is that y be
stationary, or that
y′/√(1 + y′²) = const., or (y′)² = const.,
so that y′ = const. In this case, the only stationary functions are the linear
functions y(θ) = c₁θ + c₂ corresponding to the circular helices on the cylinder. (From this analysis alone, however, we cannot say that a helix provides
the minimum sought. We would need, in addition, an argument such as that
used in §3.4(a).)
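The conclusion can at least be probed numerically: among C¹ functions joining the same end points on the cylinder, the linear (helical) one should give the smallest length. A sketch (not from the text; the end-point data, the quadrature helper, and the perturbation are assumptions):

```python
# Exploratory check (not from the text): on the unit cylinder the length
# functional is L(y) = ∫_0^{θ2} sqrt(1 + y'(θ)^2) dθ; the linear (helical)
# function should beat nearby perturbations with the same end points.
import math

theta2, y1, y2 = 1.0, 0.0, 1.0          # assumed end-point data

def length(dy, n=2000):
    # trapezoid rule applied to sqrt(1 + y'(θ)^2)
    h = theta2 / n
    f = lambda t: math.sqrt(1.0 + dy(t) ** 2)
    return (0.5 * (f(0.0) + f(theta2)) + sum(f(i * h) for i in range(1, n))) * h

slope = (y2 - y1) / theta2
L_helix = length(lambda t: slope)
# derivative of the perturbed curve slope·θ + 0.3 sin(πθ/θ2);
# the added term vanishes at both end points, so end conditions are kept
L_pert = length(lambda t: slope + 0.3 * (math.pi / theta2) * math.cos(math.pi * t / theta2))

assert L_helix < L_pert
```

Such a comparison is only evidence, not the separate minimization argument the text calls for.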
Here
f = f(φ, z) = R√(1 + z² sin² φ),
so that
f_z(φ, z) = R z sin² φ / √(1 + z² sin² φ).
Thus the stationary functions y are those for which
R y′(φ) sin² φ / √(1 + y′(φ)² sin² φ) = const. = 0 (at φ = 0);
i.e., the stationary functions in 𝒟₁ are those for which y′(φ) ≡ 0, which correspond to the great circles. Again, the fact that such a function minimizes L
requires separate analysis, as in §1.1(b).
= −y′(x)[(d/dx) f_{y′}(x) − f_y(x)],
and when y is stationary, the right side vanishes by (3). Thus on each interval
of stationarity of y:
f(x) − y′(x) f_{y′}(x) = const. (4)
Conversely, if (4) holds on an interval in which y′ does not vanish, then y
is stationary. (Why?) In this case stationarity is characterized by (4), which is
a first integral of (3).¹ The additional smoothness requirement that y be C²
can be removed if y is assumed to be a local extremal function. See the next
section.
Example 3. For the function f(y, z) = y²(1 − z)², where f_z(y, z) = 2y²(z − 1),
the (C²) stationary functions y = y(x) satisfy (4). Thus on each interval, for
some constant c,
¹ However, this integral is usually nonlinear in y′, while the original Euler-Lagrange equation is
sometimes linear.
§6.2. Special Cases of the First Equation 151
√(1 + y′²)/√y − y′ · (y′/(√y √(1 + y′²))) = const.,
or
1/(√y √(1 + y′²)) = const. = 1/c, say;
√(y/(c² − y)) y′ = 1. (6)
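The solutions of this first integral are the classical cycloids (the discussion resumes in §6.4). A numerical sketch (not from the text; the parameter value is an assumption) checks that the cycloid x = (c²/2)(θ − sin θ), y = (c²/2)(1 − cos θ), for which y′ = sin θ/(1 − cos θ), satisfies (6), in the form √(y/(c² − y)) y′ = 1, identically:

```python
# Sketch (not from the text): verify that the cycloid satisfies the
# first integral sqrt(y/(c^2 - y)) * y' = 1, where on the cycloid
# y = (c^2/2)(1 - cos θ) and y' = dy/dx = sin θ / (1 - cos θ).
import math

c = 1.3                                   # assumed parameter value
for theta in (0.5, 1.0, 2.0, 3.0):
    y = (c ** 2 / 2.0) * (1.0 - math.cos(theta))
    dydx = math.sin(theta) / (1.0 - math.cos(theta))
    lhs = math.sqrt(y / (c ** 2 - y)) * dydx
    assert abs(lhs - 1.0) < 1e-9
```

Analytically the left side reduces to tan(θ/2) · cot(θ/2) = 1, so the check is exact up to rounding.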
There is also a more subtle point to consider before regarding this stationary function as a candidate for representing the brachistochrone; it arises
from the proof of Lemma 4.1 as follows:
Our analysis that a minimizing function for T on 𝒟 must be a stationary
function utilized the fact that δT(y; v) = 0 for a particular v ∈ 𝒟₀. However,
as we know from the discussion in §5.5, Example 4, at a given y ∈ 𝒟, the only
v ∈ 𝒟₀ which are definitely 𝒟-admissible for variation are those for which
|v(x)| ≤ y(x) on [0, x₁] (or their scalar multiples). Unless the particular v used
to establish stationarity meets this condition, the analysis is not conclusive,
and the true brachistochrone in this class may be provided by a nonstationary function.
§6.3. The Second Equation 153
Figure 6.2
We shall return to this elusive problem, which so far remains just outside
the methods being used to analyze it, in §6.4, in §8.8, and in Chapter 9.
(See, however, Problem 6.15* for a reformulation which circumvents some of
the above difficulties.)
(Problems 6.2-6.16)
Figure 6.3 (showing y(x) = η(ξ) relative to slightly skewed axes)
This equation resembles (7), the integral form of the first, and, moreover, it
does not exhibit explicitly the C² requirement on y used in its derivation.
Hence we can hope to obtain it directly. This is indeed possible (for extremal
functions) but it is surprisingly complicated to do so in view of the simplicity
of the underlying strategy: viz., to conduct the original variational operations
in terms of coordinate axes which are skewed slightly with respect to the
original x, y axes as in Figure 6.3.
Let
F(y) = ∫ f(x, y(x), y′(x)) dx
and¹
y′(x) = η′(ξ) / (1 + cη′(ξ))   or   1 / (1 + cη′(ξ)) = 1 − c y′(x).    (9)

¹ Since 1 + cη′(ξ) > 0, we can take y(x) = η(ξ) for the unique ξ such that ξ + cη(ξ) = x.
where
f̃(ξ, η, ζ) ≝ f( ξ + cη, η, ζ/(1 + cζ) ) (1 + cζ),    (10)
while
f̃_η(ξ, η, ζ) = f_x( ξ + cη, η, ζ/(1 + cζ) ) (1 + cζ) c + f_y( ξ + cη, η, ζ/(1 + cζ) ) (1 + cζ),
Remark. When y is only stationary, this proof does not yield the second
equation unless y is C². (See Problem 6.35(c), (d).) However, if f_z is C¹, then y
is C² when f_zz is nonvanishing. (Theorem 7.14.)
To find conditions necessary to minimize
Transversal Conditions*
To obtain the natural boundary conditions associated with more general end
point constraints provided by transversals such as that illustrated in Figure
6.4, it is more convenient to use Lagrangian multipliers. Here we suppose the
integral
J(y, t) = ∫_a^t f(x, y(x), y′(x)) dx = ∫_a^t f[y(x)] dx    (13)
is to be minimized over
𝒟_t = {y ∈ C¹[a, t]: y(a) = a₁; τ(t, y(t)) = 0}.
Figure 6.4: the end point constraint curve τ(x, y) = 0, with left end point (a, a₁).
Assume that f and the constraining function τ are C¹ on domains large
enough to admit all functions of interest and that ∇τ ≠ 𝒪. If y ∈ C¹[a, t]
minimizes J, then varying y by functions v in 𝒟₀ = {v ∈ C¹[a, t]: v(a) =
v(t) = 0} shows as usual that y is a stationary function and thus is a solution
of (3), the Euler-Lagrange equation (d/dx) f_y′(x) = f_y(x) on (a, t).
For the proper natural boundary condition at the right end, we must
admit more general variations. To provide a convenient framework, we sup-
pose that the functions y are defined by extension on a fixed large interval
[a, b], and introduce the linear space
𝒴 = C¹[a, b] × ℝ,
with the norm ‖(y, t)‖ = ‖y‖_M + |t|. (See Problem 5.8.)
A general variation for J in this space in the "direction" (v, ξ) is obtained
by differentiating J(y + εv, t + εξ) with respect to ε and setting ε = 0. By
Leibniz's rule (A.14), we get, with the usual abbreviations:
The right end point constraint may be expressed as the zero level set of the
function
G(y, t) = τ(t, y(t)) = τ[y(t)],
so that upon differentiating G(y + εv, t + εξ) = τ(t + εξ, (y + εv)(t + εξ)) with respect to ε and
evaluating at ε = 0, we obtain
(It may be shown that the variations (14) and (14′) are weakly continuous.) Now,
let y be a local extremum point for J of (13). According to Theorem 5.15,
unless δG(y, t; ·, ·) ≡ 0 (which would require the vanishing of both τ_x[y(t)]
and τ_y[y(t)]; Why?), then ∃ λ ∈ ℝ such that
δ(J + λG)(y, t; ·, ·) ≡ 0.
Hence, restricting attention to those v ∈ 𝒟₀ as before which vanish at a and t,
we have from (14) and (14′) that
{f(t) + λ(τ_x[y(t)] + τ_y[y(t)] y′(t))} ξ = 0,   ∀ ξ sufficiently small.
Similarly, if we consider variations (v, 0) for which ξ = v(a) = 0, then
{f_y′(t) + λτ_y[y(t)]} v(t) = 0,   ∀ v(t) sufficiently small.
Dividing these last equations by ξ, v(t), respectively, and eliminating λ between
them shows that a local extremal point y for J on 𝒟_t is a stationary
function on (a, t) which meets the transversal condition
f(t) τ_y[y(t)] = f_y′(t) (τ_x[y(t)] + τ_y[y(t)] y′(t)).    (15)
(15) is the desired natural boundary condition. Note that when τ(x, y) =
b − x so that τ_y ≡ 0, then (15) reduces to f_y′(b) = 0 as obtained earlier.
Similarly, when τ(x, y) = y − b₁ for given b₁, the terminal value t of x is
unspecified, and at (t, b₁) an optimal solution should meet the transversal
condition:
f(t) − y′(t) f_y′(t) = 0.    (15′)
In economics, this would be called a free-horizon problem. If the terminal
value b₁ is also unspecified, then in addition to (15′), an optimal solution must
meet the free-end condition f_y′(t) = 0 at its terminal point (t, y(t)). Why?
If both end points lie on curves of this type, as in Figure 6.1 (b), then a local
extremal function will be stationary on an interval for which it satisfies (15)
at the right endpoint and the corresponding condition at the left.
Some other types of constraints amenable to the use of Lagrangian multi-
pliers will be treated in §6.7 in connection with vector valued extremals.
Figure 6.5: end point curves τ(x, y) = 0.
the stationarity of the possible extremal functions for integrals such as
F(y) = ∫_a^b f[y(x)] dx,
but do control the boundary conditions which the extremal function should
satisfy.
However, frequently present are other constraints which operate over the
entire interval [a, b]. When each of these can also be expressed in integral
form, we have the following: suppose y₀ is a local extremal function for
F(y) = ∫_a^b f[y(x)] dx
on
𝒟 = {y ∈ C¹[a, b]: y(a) = a₁, y(b) = b₁; G_i(y) ≝ ∫_a^b g_i[y(x)] dx = G_i(y₀), i = 1, 2, ..., N}.
Then either:
(a) the N × N determinant
| δG_i(y₀; v_j) |_{i,j = 1, ..., N} = 0,
whenever v_j ∈ 𝒟₀ = {v ∈ C¹[a, b]: v(a) = v(b) = 0}, j = 1, 2, ..., N;
or
(b) ∃ λ_i ∈ ℝ, i = 1, 2, ..., N, that make y₀ stationary for the modified function
f̃ = f + Σ_{i=1}^N λ_i g_i; i.e., y₀ is a solution of the equation
(d/dx) f̃_y′(x) = f̃_y(x) on (a, b).
PROOF. As noted, the hypotheses on f and the g_i assure that the variations
δF(y; v), δG_i(y; v) are linear in v and weakly continuous for all v in the
subspace 𝒟₀. Hence from Theorem 5.16 (and subsequent remarks), either
condition (a) holds ∀ v_j ∈ 𝒟₀, or ∃ λ_i ∈ ℝ, i = 1, 2, ..., N, for which
δF̃(y₀; v) = 0, ∀ v ∈ 𝒟₀, where
(Problems 6.23-6.24)
δF(y; v) ≝ (∂/∂ε) F(y + εv) |_{ε=0},
or, from (A.13),
δF(y; v) = ∫_a^b ( f_y(x)v(x) + f_y′(x)v′(x) + f_y″(x)v″(x) ) dx,    (17)
(see Problem 2.9); where f_y″(x) ≝ f_r(x, y(x), y′(x), y″(x)), when f = f(x, y, z, r).
g(x) = ∫_a^x f_y(t) dt   and   h(x) = ∫_a^x [ f_y′(t) − g(t) ] dt,    (18)
we have, upon integrating (17) by parts twice in succession, that for v ∈ 𝒟₀:
Here, the definition of h assures its vanishing at x = b, and hence the vanishing
of the boundary term h(x)v′(x)|_a^b for v ∈ 𝒟₀. However, since v ∈ 𝒟₀ ⇒
v(a) = v(b) = 0, the definition of g is less critical and g(x) ≝ ∫_a^x f_y(t) dt + const.
would also suffice. In any case, as a necessary condition that y be locally
extremal, we have that
In our case, when v ∈ 𝒟_a, the first term on the right in (21′) vanishes and the
second reduces to f_y″(b)v′(b). Hence from (19), we see that the appropriate
natural boundary condition is
f_y″(b) = 0;   (or f_r(b, y(b), y′(b), y″(b)) = 0).    (22)
Other boundary conditions are considered in Problems 6.25-6.28, together
with the derivation of the Euler-Lagrange equation for functions f involving
derivatives higher than the second. A corresponding "second" equation is
obtained in Problem 6.34.
μ ∫ k²(x) ds(x),

Figure 6.6 (a), (b)
each x at which y″ is defined. This work may also be regarded as the potential
energy of strain stored in the rod as it is bent from an initial unstressed
configuration (supposed straight). Bernoulli conjectured that when the rod is
bent by external forces, it will assume a shape which minimizes the potential
energy. We have already utilized this principle in the analysis of the catenary
problem of §3.5, and what we have thus far would suffice to describe a
situation in which other types of strain energy (work) can be considered
negligible. (See Problem 6.29.)
However, if the bending of a column is produced through buckling under
a longitudinal compressive force of magnitude P applied to the end as in
Figure 6.6(b), then work is also done in compressing the bar. If we regard the
bent bar as an elastic spring of the "new" length
If the bar is clamped at its lower end as in Figure 6.6(b), then y(0) =
y′(0) = 0, while if we suppose that the upper end remains essentially fixed so
that y(l) = 0, we would wish to minimize the potential energy
U(y) ≝ ∫_0^l [ μ y″(x)² / (1 + y′(x)²)^{5/2} − P( √(1 + y′(x)²) − 1 ) ] dx    (23)
on
or, with (24), it makes y″/(1 + y′²)^{5/2} (and hence y″) continuously differentiable,
and
2μ ( y″ / (1 + y′²)^{5/2} )′ + 5μ (y″)² y′ / (1 + y′²)^{7/2} + P y′ / (1 + y′²)^{1/2} = c.
The natural boundary condition (22) associated with the unspecified slope
at x = l is (from (24)) given by
2μ y″(l) / (1 + y′(l)²)^{5/2} = 0,   or   y″(l) = 0.    (26)
constant coefficients:
y‴ + ω²y′ = c,   where ω² = P/2μ,    (28)
subject to the homogeneous boundary conditions (27). Integrating and using
the conditions y(l) = y″(l) = 0 gives the second-order equation
y″ + ω²y = c(x − l),    (29)
whose general solution is known (see, for example, [B-diP]). It is given by
y(x) = A cos ωx + B sin ωx + Cx + D,    (30)
for constants A, B, C, and D to be found to satisfy (29) and the remaining
boundary conditions y(0) = y′(0) = 0.
From these last two conditions, we must have
0 = A + D   or   A = −D,
and
0 = Bω + C   or   B = −C/ω.
Differentiating (30) twice gives
y″(x) + ω²y(x) = ω²(Cx + D),
and the right side agrees with that of (29) if and only if
Cω² = c   and   Dω² = −cl.
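The constants just found can be checked numerically: with A = −D, B = −C/ω, Cω² = c, and Dω² = −cl, the function (30) satisfies both (29) and the conditions y(0) = y′(0) = 0. The particular values of ω, c, l below are illustrative only.

```python
import math

omega, c, l = 2.0, 3.0, 1.5          # sample values (illustrative)
C = c / omega**2
D = -c * l / omega**2
A, B = -D, -C / omega

def y(x):
    # the general solution (30) with the constants determined above
    return A * math.cos(omega * x) + B * math.sin(omega * x) + C * x + D

h = 1e-5
for x in [0.2, 0.7, 1.1]:
    # central second difference approximates y''
    ypp = (y(x + h) - 2 * y(x) + y(x - h)) / h**2
    assert abs(ypp + omega**2 * y(x) - c * (x - l)) < 1e-4   # equation (29)

assert abs(y(0.0)) < 1e-12                     # y(0) = 0
assert abs((y(h) - y(-h)) / (2 * h)) < 1e-8    # y'(0) = 0
```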
Thus the solution is given by a constant multiple (c) of the mode function in (31), and
P₁ = 2μω₁².
With ω = ω_n, (31) defines a sequence of stationary mode functions y_n in 𝒟,
each of which satisfies the natural boundary condition (26) under the additional
linearizing assumption that
max |y″| ≪ 1.
From (31), with ω = ω_n (so that ω_n l > nπ), it follows that
Figure 6.7
Thus the smallness of y″ for the actual deflection curve y(x) = c y_n(x) is
possible iff c itself is small, which means that the maximum deflection must
itself be small. Such linear analysis is usually termed small deflection theory,
and the approximations are often made in (23), the integral expression defining
U itself. (However, if the estimate max |y′|² ≪ 1 is used there uncritically,
the term involving P disappears. See Problem 6.30.)
Our linearization has resulted in another difficulty: since each multiple of
y₁ (or y_n) is another stationary function which meets the required boundary
conditions, it is not evident in what sense the potential energy U could be
minimized by such functions. What must be realized is that once buckling
has occurred with the critical load P₁ in the mode described by y₁, then
further bending can occur in this mode without additional load until the
nonlinear effects excluded by our analysis become prominent. In particular,
the assumptions of small deflection theory may be violated, even though they
are valid at the instant when buckling first occurs.
Another anomaly requires explanation; namely, whether buckling can
occur only at the critical loads P_n = 2μω_n². If, for example, the column is
encased in a more rigid structure before loading, loaded by P without
buckling, and then uncased, it is in unstable equilibrium at the critical loads
P_n, and buckling in the associated mode y_n can be induced, with the buckled
bar in static equilibrium. However, with the load P₂, say, the column cannot
buckle in a mode y_n for n > 2, since more energy would be required than can
be sustained by P₂. On the other hand, with this loading (or by any P > P₁),
buckling in the mode y₁ could not retain the static equilibrium of the bar (at
least as described by small deflection theory). Thus with moderate loading P,
buckling may be prevented by supporting the bar only at the points of maximum
deflection of the lower mode shapes.
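Applying the remaining condition y(l) = 0 to (30) with the constants found above leads, after simplification, to the characteristic equation tan ωl = ωl; this derivation step is ours and is not spelled out in the text, but it is consistent with the claim ω_n l > nπ (the roots of tan x = x lie just beyond the multiples of π). A quick numerical sketch:

```python
import math

# Roots of tan(x) = x with x = omega * l; each lies in (n*pi, n*pi + pi/2),
# where tan(x) - x increases from a negative value to +infinity.
def mode_root(n):
    lo, hi = n * math.pi + 1e-9, n * math.pi + math.pi / 2 - 1e-9
    for _ in range(200):                    # bisection
        mid = 0.5 * (lo + hi)
        if math.tan(mid) - mid > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

roots = [mode_root(n) for n in (1, 2, 3)]
for n, x in zip((1, 2, 3), roots):
    assert x > n * math.pi                  # omega_n * l > n*pi, as claimed
    assert abs(math.tan(x) - x) < 1e-6
assert abs(roots[0] - 4.4934) < 1e-3        # classical first root
```

The corresponding critical loads are then P_n = 2μω_n² with ω_n = (root n)/l.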
(Problems 6.25-6.34)
§6.7. Vector Valued Stationary Functions 169
of elements
Y = (y₁, y₂, ..., y_d)
having derivatives
Y′ = (y₁′, y₂′, ..., y_d′),
a suitable norm is given by
F(Y) ≝ ∫_a^b f(x, Y(x), Y′(x)) dx = ∫_a^b f[Y(x)] dx
on
As explained in §1.5, f_Y(x) is the vector valued function with components f_{y_j}(x) ≝
f_{y_j}[Y(x)], j = 1, 2, ..., d; and f_{Y′}(x) is the vector valued function with components
f_{z_j}(x) ≝ f_{z_j}[Y(x)], j = 1, 2, ..., d. (Here we regard f = f(x, Y, Z) =
f(x, y₁, y₂, ..., y_d, z₁, z₂, ..., z_d).) The dot denotes the ordinary scalar product
in ℝ^d, and is used for convenience in notation.
Y₀ ∈ 𝒴 = (C¹[a, b])^d is a (local) extremal function for
F(Y) = ∫_a^b f[Y(x)] dx
on
There is also an analogous second equation for local extremal Y (cf. §6.3);
viz.,
f(x) − Y′(x) · f_{Y′}(x) = ∫_a^x f_x(t) dt + c,   x ∈ (a, b),    (35)
where f(x) = f[Y(x)] and f_x(x) = f_x[Y(x)]. However, here the second equation
is scalar and cannot characterize stationarity of Y as in the one-dimensional
case (Problem 6.36).
As in §6.4, if, say, y_j(b) is left unspecified for some value of j = 1, 2, ..., d,
there results the associated natural boundary condition f_{z_j}(b) = 0. (See Problem
6.37.)
then Y₀ will be stationary for the modified function f + λg, and so should
satisfy the corresponding Euler-Lagrange equation(s):
(d/dx)(f + λg)_{Y′}(x) = (f + λg)_Y(x)    (36)
A(Y) ≝ ∫_0^1 x(t) y′(t) dt
δL(Y; V) = ∫_0^1 ( Y′(t) / |Y′(t)| ) · V′(t) dt,
Figure 6.8
Now, if for some Y ∈ 𝒟*, δL(Y; V) = 0, ∀ V ∈ 𝒟₀, then from Corollary 4.7
it would follow that the unit vector Y′/|Y′| = const., so that Y′ has a constant
direction; but such functions are not in 𝒟*. (The function Y(t) ≡ (0, 0) is in 𝒟
but it is not in 𝒟*.)
Hence from Theorem 5.15 and Remark 5.17, if Y₀ ∈ 𝒟* maximizes A
(locally) on 𝒟* when restricted to the l level set of L, then ∃ λ ∈ ℝ such that
δ(A + λL)(Y₀; V) = 0, ∀ V ∈ 𝒟₀; Y₀ is stationary for the function xy′ + λ|Y′|
and satisfies the associated Euler-Lagrange equation(s) (34):¹
(d/dt)( λx′ / |Y′| ) = y′   and   (d/dt)( x + λy′ / |Y′| ) = 0.
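These two equations can be verified numerically for the expected maximizer, a circle. The sketch below checks, with a finite-difference derivative, that a circle of radius R parametrized on [0, 1] satisfies both equations with the multiplier λ = −R; the value of λ is our assumption for this check, not a computation carried out in the text.

```python
import math

R = 2.0
lam = -R   # multiplier assumed for this check

def X(t):   # circle of radius R, one full turn on [0, 1]
    return (R * math.cos(2 * math.pi * t), R * math.sin(2 * math.pi * t))

def Xp(t):  # derivative Y'(t)
    return (-2 * math.pi * R * math.sin(2 * math.pi * t),
            2 * math.pi * R * math.cos(2 * math.pi * t))

def speed(t):
    vx, vy = Xp(t)
    return math.hypot(vx, vy)

h = 1e-6
for t in [0.1, 0.33, 0.77]:
    # first equation: d/dt( lam * x' / |Y'| ) = y'
    g = lambda s: lam * Xp(s)[0] / speed(s)
    lhs = (g(t + h) - g(t - h)) / (2 * h)
    assert abs(lhs - Xp(t)[1]) < 1e-6
    # second equation: x + lam * y' / |Y'| is constant (here identically 0)
    assert abs(X(t)[0] + lam * Xp(t)[1] / speed(t)) < 1e-12
```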
Lagrangian Constraints*
The method of Lagrangian multipliers may also be adapted to the case of
constraints of the form
g[Y(x)] ≝ g(x, Y(x), Y′(x)) ≡ 0,   ∀ x ∈ [a, b],
where g ∈ C¹(D) for a suitable domain D of ℝ^{2d+1}. We shall consider only the
simple case of a single constraint g(Y(x)) ≡ 0, ∀ x ∈ [a, b], which is required
for the discussion of Hamiltonian mechanics in §8.6. The general case will be
treated in §11.3.
locally on
g(Y(x)) ≡ 0,   ∀ x ∈ [a, b].
Then ∃ λ ∈ C[a, b] such that Y₀ is stationary for the modified function f + λg.
where
F̄(Ȳ) ≝ ∫_a^b f̄[Ȳ(x)] dx.
For Ȳ ∈ 𝒟̄, we may consider the unconstrained function
f̄(x, Ȳ, Z̄) = f(x; y, Ȳ; z, Z̄),   with y = ψ(Ȳ) and z = ∇ψ(Ȳ) · Z̄.
From the chain rule, it follows that in abbreviated form:
f̄_Z̄ = f_z ∇ψ + f_Z,
while
f̄_Ȳ = f_y ∇ψ + f_Y + f_z H,
where H(Ȳ, Z̄) ≝ (∇ψ(Ȳ) · Z̄)_Ȳ.
Now Ȳ₀ minimizes F̄ (locally) on 𝒟̄ (Why?), and hence as in 6.8, it is a
solution of the first equation in the form
(d/dx) f̄_Z̄[Ȳ(x)] = f̄_Ȳ[Ȳ(x)].
(d/dx) f_{Y′}(x) − f_Y(x) = − [ (d/dx) f_{y′}(x) − f_y(x) ] ∇ψ(Ȳ(x)).
Finally, since g_{y′} ≡ 0 and g_{Y′} ≡ 𝒪, while g_Ȳ ≡ −g_y ∇ψ (here, g_y ≡ 1), then for
each λ ∈ C[a, b],
(d/dx)(f + λg)_{Y′}(x) − (f + λg)_Y(x) = (d/dx) f_{Y′}(x) − f_Y(x) − λ(x) g_Y(x),
and we see that this λ in C[a, b] forces the vanishing of the bracketed terms on
the right side of the last equation, which in turn makes the left side vanish as
well. Upon combining these assertions we have for this λ that Y₀ is a solution
of the equations
(d/dx)(f + λg)_{Y′}(x) = (f + λg)_Y(x),
Figure 6.9
We now consider those Ȳ ∈ 𝒟̄ near Ȳ₀ which differ from Ȳ₀ only in a small
interval [α, β] containing ξ, and suppose that y(x) = ψ(Ȳ(x)) is defined in
[α, β]. Then we can set y(x) = y₀(x) outside [α, β] to obtain a Y = (y, Ȳ) ∈ 𝒟
as before. See Figure 6.9. (A = Y₀(a) and B = Y₀(b).)
For such Y, with f̄ defined as above, we have
Figure 6.10
178 6. The Euler-Lagrange Equations
Along Y₀ we may use the arc length s as the parameter. Then |Y₀′(s)| ≡ 1, and
for a new λ, the above equation becomes
Y₀″(s) = λ(s) ∇g(Y₀(s)),
which shows that in general the principal normal to a geodesic on a surface is
in the direction of the (nonvanishing) gradient and so is normal to the surface
at each point.
Observe that we have not established the existence of geodesics for a
general surface, but we have obtained valuable insight as to the manner in
which such geodesics should lie on the surface. (See Figure 6.10.)
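For the unit sphere, where g(Y) = |Y|² − 1 and ∇g = 2Y, the relation Y₀″(s) = λ(s)∇g(Y₀(s)) can be checked numerically along a great circle (which is a geodesic of the sphere); here λ ≡ −1/2 for unit-speed parametrization. The tilted circle below is an illustrative choice.

```python
import math

a = 0.6   # tilt angle, so that all three coordinates vary
def Y(s):
    # unit-speed great circle spanned by an orthonormal pair (u, v)
    u = (math.cos(a), 0.0, math.sin(a))
    v = (0.0, 1.0, 0.0)
    return tuple(u[i] * math.cos(s) + v[i] * math.sin(s) for i in range(3))

h = 1e-4
for s in [0.3, 1.2, 2.5]:
    p = Y(s)
    # central second difference for Y''(s)
    ypp = tuple((Y(s + h)[i] - 2 * p[i] + Y(s - h)[i]) / h**2 for i in range(3))
    grad = tuple(2 * p[i] for i in range(3))   # grad g = 2Y on g(Y) = |Y|^2 - 1
    for i in range(3):
        assert abs(ypp[i] - (-0.5) * grad[i]) < 1e-6   # Y'' = (-1/2) grad g
```

So the principal normal of the great circle points along ∇g, i.e., along the surface normal, as asserted.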
Figure 6.11
where
(x, Y(x), Y′(x)) = ( φ(ξ, H(ξ)), Ψ(ξ, H(ξ)), ((Ψ_ξ + Ψ_H H′)(ξ)) / u(ξ) ),    (40)
(Ψ_H is the Jacobian matrix having elements ∂ψ_i/∂η_j, i, j = 1, 2, ..., d, arranged
in natural order with its rows indexed by i).
Now, when f ∈ C¹([a, b] × ℝ³), then under the transformation x =
φ(ξ, H(ξ)) [so that (formally) dx = u(ξ) dξ], we have F(Y) = F̃(H), say, if we define
In (44), φ_H and Ψ_H must be understood as being retransformed into functions
of x; i.e., each is evaluated at the (ξ, η) corresponding to (x, Y(x)) under
the transformation (38).
Then Ψ_H V is a vector function in 𝒟₀, as is easily verified by matrix
multiplication, and the first term of (44) vanishes by hypothesis. Similarly,
w(x) ≝ (φ_H · V)(x) vanishes at a and b since V ∈ 𝒟₀; thus by (45) et seq.
the second term vanishes as well, and we see that for this particular V:
δF̃(H; V) = 0.
Conversely, each V̄ ∈ 𝒟₀ arises from that V ∈ 𝒟₀ given by V̄(x) ≝
V(φ(x, Y(x))). (Why?) Hence we conclude that δF̃(H; V̄) = 0, ∀ V̄ ∈ 𝒟₀.
PROOF*. From (43), ṽ(ξ) = v(φ(ξ, η(ξ))) so that ṽ′(ξ) = v′(x)u(ξ). Then
but from (41) and the chain rule with the usual abbreviations,
and
f̃_η[η(ξ)] = f(x) φ_η + f_y(x) ψ_η − f_y′(x) [ (ψ_ξ + ψ_η η′)/u ] φ_η
(since from (42), u_ζ(ξ, η, ζ) = φ_η(ξ, η)). The bracketed term in each case is seen
to be simply y'(x) by equation (39), and hence, under the transformation
x = φ(ξ, η(ξ)), dx = u(ξ) dξ, the integrals in (46) may be recognized as arising
from the integrals:
and similarly,
Then we recognize each bracketed term in the last integrals as the derivative
of a product, and we obtain δF̃(η; v) = δF(y; ψ_η v) + δ₂F(y; φ_η v) in view of
(45).
Observe that under the transformation (38) both Y and H are assumed to
be C¹, and from (39) and (40) it would follow that
obtained hitherto in this chapter. (Without this regularity, however, far more
sophisticated tools are required to handle the delicate questions concerning
behavior at the boundary. See [G-T].)
A typical point in ℝ^d will be denoted by X = (x₁, x₂, ..., x_d) in Cartesian
coordinates, and the d-dimensional element of integration by dX. D is
assumed to be a bounded Green's domain in ℝ^d; i.e., one for which the
boundary ∂D consists of (d − 1)-dimensional surfaces on which integration is
possible such that Green's theorem holds in the divergence form
∫_D (∇ · U) dX = ∫_{∂D} (U · N) dσ.    (47)
Here C¹(D̄) denotes the set of real valued functions u ∈ C(D̄) which in D
have first partial derivatives admitting continuous extensions to D̄; U =
(u₁, u₂, ..., u_d) is a d-tuple of such functions with the divergence
∇ · U ≝ Σ_{j=1}^d (u_j)_{x_j}.
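The divergence form (47) is easy to confirm numerically on a concrete Green's domain. The sketch below takes the unit disk (d = 2) with U(x, y) = (x³, y³), so ∇·U = 3x² + 3y² = 3r², and compares the interior integral, computed in polar coordinates, with the boundary flux; both equal 3π/2. The choice of U and domain is ours, purely for illustration.

```python
import math

n_r, n_t = 400, 400

# Interior integral of div U = 3 r^2 over the unit disk (area element r dr dtheta);
# the integrand is radial, so the angular integral contributes a factor 2*pi.
interior = 0.0
for i in range(n_r):
    r = (i + 0.5) / n_r          # midpoint rule in r
    interior += 3 * r**3 * (1.0 / n_r) * (2 * math.pi)

# Boundary flux of U . N over the unit circle, where N = (x, y).
flux = 0.0
for j in range(n_t):
    t = (j + 0.5) * 2 * math.pi / n_t
    x, y = math.cos(t), math.sin(t)
    flux += (x**3 * x + y**3 * y) * (2 * math.pi / n_t)

assert abs(interior - 1.5 * math.pi) < 1e-3
assert abs(flux - 1.5 * math.pi) < 1e-9      # trig quadrature is nearly exact
assert abs(interior - flux) < 1e-3
```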
Figure 6.12 (a), (b)
§6.9. Multidimensional Integrals 183
where f_u(X) = f_u(X, u(X), ∇u(X)) and f_{∇u}(X) is the vector-valued function
with components
f_{z_j}(X) = f_{z_j}(X, u(X), ∇u(X)),   j = 1, 2, ..., d;
(where f = f(X, u, Z) = f(X, u, z₁, z₂, ..., z_d)).
Next, we suppose that f ∈ C²(D̄ × ℝ × ℝ^d), and that u ∈ 𝒟 ∩ C²(D) (as in
§3.4(e)) so that we can integrate the second term of (48) by parts using Green's
theorem (47), for U(X) = v(X) f_{∇u}(X), as follows:
∫_D ∇v(X) · f_{∇u}(X) dX = ∫_{∂D} v(X) f_{∇u}(X) · N(X) dσ − ∫_D v(X) ∇ · f_{∇u}(X) dX.
∇ · ( ∇u / (1 + u_x² + u_y²)^{1/2} ) = 0,    (52)
which agrees with Equation 26 of §3.4. Thus in order that u₀ ∈ C²(D) have a
graph with a local extremal surface area among all such functions with the
same continuous boundary values, it is necessary and sufficient that u₀ satisfy
the minimal surface equation ((26) of §3.4) or its equivalent. As we have noted
in §3.4(e), this equation has a solution with arbitrarily prescribed continuous
boundary values iff the domain D is convex [Os].
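Expanding (52) gives the familiar form (1 + u_y²)u_xx − 2u_x u_y u_xy + (1 + u_x²)u_yy = 0, and a classical solution is Scherk's surface u = log(cos x / cos y); neither appears explicitly in this section, so both are offered here only as a numerical illustration of the equation.

```python
import math

def u(x, y):
    # Scherk's surface, defined where cos x and cos y are positive
    return math.log(math.cos(x) / math.cos(y))

h = 1e-4
for (x, y) in [(0.2, 0.3), (-0.5, 0.4), (0.9, -0.7)]:
    # central finite differences for the first and second partials
    ux = (u(x + h, y) - u(x - h, y)) / (2 * h)
    uy = (u(x, y + h) - u(x, y - h)) / (2 * h)
    uxx = (u(x + h, y) - 2 * u(x, y) + u(x - h, y)) / h**2
    uyy = (u(x, y + h) - 2 * u(x, y) + u(x, y - h)) / h**2
    uxy = (u(x + h, y + h) - u(x + h, y - h)
           - u(x - h, y + h) + u(x - h, y - h)) / (4 * h**2)
    residual = (1 + uy**2) * uxx - 2 * ux * uy * uxy + (1 + ux**2) * uyy
    assert abs(residual) < 1e-4   # minimal surface equation holds
```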
Multidimensional problems arise naturally when Hamilton's principle is
applied to obtain the equations governing the motions of elastic bodies. We
shall reserve further discussion until §8.9.
(Problem 6.41)
+ ∫_{∂D} v(X) ( f_{∇u}(X) · N(X) ) dσ
Consider the surface area function of the previous example when the
boundary values are specified only on a compact subarc K of the boundary.
Then a minimizing function u should satisfy the minimal surface equation
(52) in the domain D and have the prescribed values on K; however, on the
remainder of ∂D, we expect in general that
∇u · N / (1 + u_x² + u_y²)^{1/2} = 0   or   ∇u · N = 0;
i.e., that the derivative of u in the direction normal to the boundary curve
should vanish. Conversely, we may use convexity as in §3.4(e), to show that
such u will in fact minimize the surface area function uniquely under these
conditions (Problem 3.37).
PROBLEMS
6.2. Find the stationary functions for f which belong to 𝒟 if:
(a) f(x, y, z) = sin z, and 𝒟 = {y ∈ C¹[0, 1]: y(0) = −5, y(1) = 2}.
(b) f(x, y, z) = √(1 + z²)/x, and 𝒟 = {y ∈ C¹[0, 3]: y(0) = 0; y(3) = 3}.
(c) f(x, y, z) = y² − z², and 𝒟 = {y ∈ C¹[0, π/2]: y(0) = 0, y(π/2) = 1}.
(d) f(x, y, z) = y² − z², and 𝒟 = {y ∈ C¹[0, π]: y(0) = y(π) = 0}.
(e) f(x, y, z) = 2xy − y² + 3zy², and 𝒟 = {y ∈ C¹[0, 1]: y(0) = 0, y(1) = 1}.
(f) f(y, z) = √(1 + z²)/y, y > 0, and 𝒟 = {y ∈ C¹[0, 2]: y(0) = y(2) = 1, y(x) > 0}.
In Problems 6.3-6.12 find the possible local extremal functions for F on 𝒟.
6.10. F(y) = ∫_0^1 [2x y(x)³ + eˣ sin y(x) + 3x² y(x)² y′(x) + y′(x) eˣ cos y(x)] dx, and
(a) 𝒟 = {y ∈ C¹[0, 1]: y(0) = 0, y(1) = 1}.
(b) 𝒟 = {y ∈ C¹[0, 1]: y(0) = π, y(1) = √8}.
6.14*. With the same definitions as in Problem 5.40*, duplicate the analysis in §6.1 to
prove that if δF(y; v) = 0, ∀ v ∈ 𝒟₀*, then f_y′(x) + g(x) = c₀ on (a, b], where
g(x) = ∫_x^b f_y(t) dt when x ∈ (a, b]. Conclude that y is stationary for f on (a, b].
(Hint: On each interval [x₀, b], apply the result of Problem 4.3.)
6.15*. For the brachistochrone problem in Example 4* of §6.2, use Problem 6.14* to
show that the first equation (d/dx) f_y′(x) = f_y(x) can be integrated as it stands
upon multiplication by
f_y′(x) = y′(x) / ( √(y(x)) √(1 + y′(x)²) ).
Conclude that with an appropriate 𝒟*, the only possible minimizing function
for T on 𝒟* is that representing the cycloid given parametrically by equation
(6). Why is this an improvement?
In Problems 6.17-6.20, find all possible (local) extremal functions for F on 𝒟.
6.17. F(y) = ∫_0^{π/2} [y(x)² − y′(x)²] dx:
(a) 𝒟 = {y ∈ C¹[0, π/2]: y(0) = 0}.
(b) 𝒟 = {y ∈ C¹[0, π/2]: y(0) = 1}.
6.21. Consider the problem of finding a smooth curve of the form y(x) which will
provide the shortest distance from the origin to the parabola given by
y = x 2 - 1.
(a) What are the stationary functions for this problem?
(b) Show that there are precisely two points on the parabola
which satisfy the transversal condition (15).
(c) Find the associated curves which represent the possible extremals.
(d) Use a direct argument to show that a minimum is actually achieved for
each of the curves found in part (c).
(e) What happens if the parabola is replaced by the circle x 2 + y2 = 1?
6.22. Brachistochrone. (See §6.4, Example 1.) A brachistochrone joining the origin to
the straight line y = 1 - x is sought. Show that there is precisely one point
(t, y) on the line which satisfies the transversal condition (15) and find the
associated cycloid which might be the brachistochrone in question.
6.23. (a) Use the method of Lagrangian multipliers to find all possible (local)
extremal functions for
F(y) = ∫_0^1 y(x)² dx
on
𝒟 = {y ∈ C¹[0, 1]: y(0) = y(1) = 0},
when further constrained to {y ∈ C¹[0, 1]: ∫_0^1 y′(x)² dx = 1}.
(b) Can the convexity methods of Chapter 3 be used to conclude that a mini-
mum is achieved in part (a)? Explain.
6.24. (a) Use the method of Lagrangian multipliers to find all possible (local)
extremal functions for
F(y) = ∫_0^π [2(sin x) y(x) + y′(x)²] dx
on
𝒟 = {y ∈ C¹[0, π]: y(0) = y(π) = 0}
when further constrained to {y ∈ C¹[0, π]: ∫_0^π y(x) dx = 1}.
(b) Can the convexity methods of Chapter 3 be used to conclude that a mini-
mum is achieved in part (a)?
6.25. Let f ∈ C¹([a, b] × ℝ³). Show that the natural boundary conditions associated
with minimizing
F(y) = ∫_a^b f(x, y(x), y′(x), y″(x)) dx
locally on:
(i) 𝒟 = {y ∈ C²[a, b]: y(a) = a₁, y(b) = b₁} are f_y″(a) = f_y″(b) = 0.
(ii) 𝒟 = {y ∈ C²[a, b]: y(a) = a₁, y′(a) = a′} are f_y″(b) = 0 and
6.26. Let f E CI([a, b] x [R3). Find the natural boundary conditions associated with
minimizing
on
(a) Show that an Euler-Lagrange equation for a function which minimizes
6.28*. Let fE CI([a, b] x [Rn+1). Show that an Euler-Lagrange equation for a func-
r
tion which minimizes
Figure 6.13
6.29. A thin elastic rod of initial length I clamped at one end and pinned at the other
is deflected as shown in Figure 6.13 from its straight unstressed state. If the
center line of the rod is described by a smooth function y(x), 0 :::;; x :::;; I, then the
associated potential energy is given by
U(y) = μ ∫_0^l y″(x)² / [1 + y′(x)²]^{5/2} dx,
where μ is a constant. The physically imposed boundary conditions are y(0) =
y′(0) = 0, and y(l) = η.
(a) Assuming that the shape of the rod minimizes the potential energy, find a
third-order differential equation satisfied by y(x).
(b) What is the natural boundary condition at x = I?
(c) Find a suitable linearized version of the differential equation from part (a)
by supposing that both |y′(x)| and |y″(x)| are very small on [0, l].
(d) Solve the linear equation found in part (c), choosing the constants to
satisfy the boundary conditions.
6.30. Buckling of a Column. (See §6.6.) In small deflection theory, approximations
are often made in the potential energy, rather than in the differential equation.
(a) Show that the approximation of (23) by
on
F(Y) = ∫_a^b f[Y(x)] dx
6.37. Let f ∈ C¹([a, b] × ℝ^{2d}). Show that the natural boundary condition associated
with minimizing
F(Y) = ∫_a^b f(x, Y(x), Y′(x)) dx
locally on
6.38. Isoperimetric Problem. (See §6.7.) Show that A cannot achieve a (positive)
minimum value on 𝒟 or 𝒟*. [Reason geometrically.]
6.39. Prove that in ℝ^d, a geodesic curve which can be parametrized by Y ∈ (C¹[0, 1])^d
with Y′(t) ≠ 𝒪, is a straight line segment. Hint: See §1.1(a), and recall that the
unit vector Y′/|Y′| is tangent to the curve.
6.40. For each of the following functions f defined on a subset of ℝ^{2d+1}, write the
differential equations whose solutions Y will be stationary for f. Also, give an
example of an integral function F on a set 𝒟, which could have such Y as local
extrema.
(a) f(x, Y, Z) = x² + |Y|² + 3z₁, (d = 2).
(b) f(x, Y, Z) = x|Z|, (d ≥ 2).
(c) f(x, Y, Z) = y₁|Z| − (sin z₁)y₂, (d = 3).
(d) f(x, Y, Z) = |Z|/√y₁, (d = 2).
(Do not attempt to solve the differential equations.)
6.43. The Zenodoros Problem (in its nonisoperimetric formulation from Problem
1.9(b)).
(a) Show that a function Yo which maximizes
ADVANCED TOPICS

AN ADDRESS
DAVID HILBERT
At the Second International Congress of Mathematicians,¹ Paris, 1900

¹ These are fragments of the celebrated lecture in which Hilbert set forth 22 additional problems
which have challenged mathematicians of all disciplines in this century. The translation of the
complete text from which they were compiled will be found in Vol. 8 of the Bulletin of the A.M.S.
(1902), pp. 437-445, 478, 479.
CHAPTER 7
Piecewise C¹ Extremal Functions¹

¹ An older literature refers to these as "discontinuous" extremals. Extremals which exhibit actual
discontinuities have been investigated by Krotov. See [Pe].
198 7. Piecewise C 1 Extremal Functions
Figure 7.1

Figure 7.2
= ∫_{c_k}^x ŷ′(t) dt + Σ_{j=1}^{k} ∫_{c_{j−1}}^{c_j} ŷ′(t) dt = ∫_a^x ŷ′(t) dt.  □
(a) Smoothing
(7.4) Smoothing Lemma. For each ŷ ∈ Ĉ¹[a, b] and δ > 0, ∃ y ∈ C¹[a, b] such
that y = ŷ except in a δ-neighborhood of each corner point of ŷ, where
max |y′(x)| ≤ 4 max |ŷ′(x)|. Thus max |y(x) − ŷ(x)| ≤ Aδ for a constant A
determined by ŷ.
PROOF. Since ŷ has at most a finite number, N, of corner points, it suffices
to explain its modification in the given δ-neighborhood of a typical corner
point c. We suppose that δ is so small that this neighborhood excludes a
δ-neighborhood of other corner points and the end points a, b. In this neighborhood
we replace the discontinuous function ŷ′ by a continuous triangular
function, such as that shown in Figure 7.3, which is determined by its "height"
h at c and the values ŷ′(c ± δ).
For any choice(s) of h at the corner point(s), the resulting function denoted
y′ is continuous by construction, so that the function y defined by integration
has this y′ as its derivative. It remains to select the
value(s) of h to effect the required estimates, and this choice is most readily
understood when ŷ has only the single corner point c. Then, clearly, when
x ≤ c − δ: y′(x) = ŷ′(x), so that y(x) = y(a) + ∫_a^x y′(t) dt = ŷ(x); to have
y(x) = ŷ(x) for x ≥ c + δ, it is only necessary to make
Figure 7.3
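A concrete instance of this construction (ours, not the text's) is the smoothing of ŷ(x) = |x|, whose single corner is at c = 0: on [−δ, δ] the derivative is replaced by its linear (triangular) interpolant and integrated back up. For this symmetric corner, plain linear interpolation already matches the end values, which amounts to one admissible choice of the height h.

```python
def smooth_abs(x, delta):
    # smoothed version of |x|: unchanged outside [-delta, delta]
    if abs(x) >= delta:
        return abs(x)
    # inside, y'(t) = t/delta; integrate from -delta, where y(-delta) = delta
    return delta + (x * x - delta * delta) / (2 * delta)

delta = 0.1
for i in range(-1000, 1001):
    x = i / 1000.0
    y = smooth_abs(x, delta)
    if abs(x) >= delta:
        assert y == abs(x)              # y = y_hat outside the delta-neighborhood
    assert abs(y - abs(x)) <= delta     # max deviation is A*delta (here A = 1)
# the new derivative y'(x) = x/delta satisfies |y'| <= 1 = max |y_hat'|
```

The largest deviation occurs at the corner itself, where y(0) = δ/2, consistent with the estimate max |y − ŷ| ≤ Aδ.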
§7.1. Piecewise C 1 Functions 201
1 When d > 1, the curve represented parametrically by a Y with a simultaneous corner point in
all of its components need not exhibit a corner. See Problem 7.13.
integral
The above norms are not independent, and they satisfy inequalities such
as the following:
(Problems 7.1-7.2)
is defined and finite, since a partition of [a, b] reduces this integral to a finite
sum of integrals considered previously. Now in general F̂ is not continuous
on 𝒴̂ with respect to the strong norm or the ‖·‖₁ norm. (F̂ is continuous
with respect to the weak norm ‖·‖. See Problem 7.3*.) However, 𝒴 =
(C¹[a, b])^d ⊂ 𝒴̂, and the values of F̂ on 𝒴̂ can be approximated by those of F
on 𝒴 as shown in the next
1 Some authors reverse these designations for the norms, although there is uniform agreement
about that for the extremals.
§7.2. Integral Functions on C1 203
The case where Y₀ is a weak local minimum point is left to Problem 7.20(c).  □
(7.8) Remark. The previous characterizations of local C¹ extremals given in
Chapter 5 were with respect to an unspecified norm, but as there observed,
weak local extremals Y₀ need not be strong local extremals; see Bolza's example
in §7.6. However, in case Y₀ is a global extremal for F on 𝒟, then the norm
considerations are immaterial and we can assert that Y₀ is also a global
extremal for F̂ on 𝒟̂. In particular, all of the minima obtained for the convex
problems of Chapter 3 also minimize in the corresponding classes of piecewise
C¹ functions.
(Problem 7.3)
F̂(ŷ) = ∫_a^b f(x, ŷ(x), ŷ′(x)) dx = ∫_a^b f[ŷ(x)] dx
on 𝒟̂. We may differentiate
(∂/∂ε) F̂(ŷ + εv̂)
under the integral sign (A.13) to get upon reassembly that
δF̂(ŷ; v̂) = ∫_a^b ( f_y(x)v̂(x) + f_y′(x)v̂′(x) ) dx,    (1)
where
f_y(x) ≝ f_y(x, ŷ(x), ŷ′(x)) = f_y[ŷ(x)],
and
f_y′(x) ≝ f_z(x, ŷ(x), ŷ′(x)) = f_z[ŷ(x)]
are again piecewise continuous on [a, b] with at most (simple) discontinuities
at the corner points of ŷ. Thus g(x) ≝ ∫_a^x f_y(t) dt determines a function in
C¹[a, b], and upon integrating by parts, we have as before that
δF̂(ŷ; v̂) =
(7.9) Remark. Actually, the integral equation (2) could be used to characterize
stationarity for F̂ on 𝒟̂, and many of the subsequent properties obtained
in this chapter for extremal functions ŷ are also true for this larger class. See,
in particular, Theorem 7.12 and its proof. For example, each solution ŷ of (2)
satisfies (3) at a corner point, and therefore (see Problem 7.25(a)):
At a corner point c of a solution ŷ of (2), the double derivative f_zz(c, ŷ(c), z),
if defined, must vanish for some value of z.
[This is a simple consequence of the law of the mean (A.3) applied to the
function g(z) = f_z(c, ŷ(c), z), which by (3) has equal values at the distinct points
z = ŷ′(c ±) and so must have a vanishing derivative g′(z) = f_zz(c, ŷ(c), z) at an
intermediate point.]
Similarly, when f ∈ C¹([a, b] × ℝ²) we may duplicate from §6.3 the derivation
of the second Euler-Lagrange equation in integral form, to conclude
that a local extremal ŷ for F̂ on 𝒟̂ must satisfy
and since w =F z, we see that neither f(c, y(c),· (nor - f(c, y(c), .) can be
strictly convex. (Recall Definition 3.4). This fact can sometimes be used to
locate or even preclude corner points.
For example, the function f(x, y, z) = (x² + y²)z² is strictly convex (in z) except when x² + y² = 0. Therefore an associated local extremal y cannot have a corner point except possibly where c = y(c) = 0. The function f(y, z) = (1 + y²)z⁴ cannot have local extremals with corner points, i.e., each local extremal is C¹, because (1 + y²)z⁴ is strictly convex in z. In this case, the test of Remark 7.9 fails since f_zz(y, 0) = 0. On the other hand, by the same test, when f(y, z) = e^y z², every solution of (2) is C¹, whether or not it gives a local extremal. (Recall Example 3 of §3.3.)
With a generalization similar to that used to obtain Theorem 6.8, we get
the following:
on
F(y) = ∫_a^b f[y(x)] dx, the corner conditions require the continuity of
f̂_y′ = ŷ′/√(ŷ(1 + ŷ′²))  and  f̂ − ŷ′f̂_y′ = 1/√(ŷ(1 + ŷ′²)),
while the latter function is in fact constant since f_x ≡ 0. Thus we require the continuity of ŷ′, so that even on mathematical grounds this problem can have only C¹ extremal functions. (An exception occurs at x = 0, where ŷ(0) = 0 and ŷ′(0) = +∞. There, an extreme form of a corner is permitted at the cusp of the cycloid.)
Figure 7.4
§7.3. Extremals in C1 [a, b]: The Weierstrass-Erdmann Corner Conditions 209
Figure 7.5
Given functions p, q, ρ ∈ C[a, b], with ρ > 0 on [a, b], suppose that ŷ₀ ∈ Ĉ¹[a, b] minimizes
F(y) = ∫_a^b (p(x)y′²(x) + q(x)y²(x)) dx
on the subspace
𝒟₀ = {y ∈ Ĉ¹[a, b]: y(a) = y(b) = 0},
under the isoperimetric constraint
G(y) ≝ ∫_a^b ρ(x)y²(x) dx = 1.
mize F under the given conditions (Problem 7.19). The physical origin of the
Sturm-Liouville problem and the significance of this fact will be discussed in
§8.9.
(Problems 7.4-7.9)
is convex on 𝒟̂, where
F(ŷ) = ∫_a^b f[ŷ(x)] dx,
since
F(ŷ + v) − F(ŷ) − δF(ŷ; v)
= ∫_a^b {f[ŷ(x) + v(x)] − f[ŷ(x)] − (f_y[ŷ(x)]v(x) + f_z[ŷ(x)]v′(x))} dx ≥ 0,
∀ ŷ ∈ 𝒟̂ and v ∈ 𝒟̂₀ = {v ∈ Ĉ¹[a, b]: v(a) = v(b) = 0}.   (6)
Moreover, when f(x, y, z) is strongly convex, then F is strictly convex on 𝒟̂.
[Indeed, then equality in (6) is possible only if v(x) or v′(x) = 0, ∀ x except at the corner points of ŷ, v. This is seen by representing the integral in (6) as a finite sum of integrals with continuous nonnegative integrands to each of which may be applied the earlier argument from §3.2. Hence v²(x) is continuous and piecewise constant on [a, b]. It follows that v²(x) = const. = v²(a) = 0, ∀ v ∈ 𝒟̂₀.]
By an analogous argument we may extend Theorem 3.5 as follows:
each v ∈ 𝒴, the product f̂_y′ v is also in 𝒴, and by (1) and 7.2,
δF(y; v) = ∫_a^b (d/dx)[f̂_y′(x)v(x)] dx = f̂_y′(x)v(x)|_a^b = 0,
since v(a) = v(b) = 0, when y + v ∈ 𝒟₀. □
There is a corresponding version of Proposition 3.9. (Problem 7.12).
Internal Constraints
Convexity may play an important role in problems involving internal point
constraints such as that of the following.
on
F(y) = ∫_0^2 y′(x)² dx
over
𝒟 = {y ∈ 𝒴 = Ĉ¹[0, 2]: y(0) = y(2) = 1; y(1) = 0},
which has the internal constraint y(1) = 0, we recall from §3.3, Example 1, that f(z) = z² is strictly convex on ℝ.
Moreover, when y ∈ 𝒟 then y + v ∈ 𝒟 iff v ∈ 𝒟₀ = {v ∈ 𝒴: v(0) = v(2) = 0; v(1) = 0}, and then by the usual argument
ŷ₀(x) = { 1 − x, 0 ≤ x ≤ 1,
          x − 1, 1 ≤ x ≤ 2,
is the only local extremal function for F on 𝒟, and it minimizes F on 𝒟 uniquely, by 7.12.
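A quick numerical comparison (a sketch I am adding, not part of the text) supports this: F(ŷ₀) = 2 for the corner function above, while a smooth admissible competitor in 𝒟 such as y(x) = (x − 1)² gives a strictly larger value:

```python
# Compare F(y) = integral of y'(x)^2 over [0, 2] for the corner extremal y0
# (slope -1 then +1) and the smooth competitor y(x) = (x - 1)^2,
# which also satisfies y(0) = y(2) = 1 and y(1) = 0.

def F(dy, n=4000):
    # composite midpoint rule on [0, 2] applied to dy(x)**2
    h = 2.0 / n
    return sum(dy(h*(i + 0.5))**2 for i in range(n)) * h

F_corner = F(lambda x: -1.0 if x < 1 else 1.0)   # exact value 2
F_smooth = F(lambda x: 2*(x - 1))                # exact value 8/3 > 2
```

The midpoint rule reproduces both exact values, confirming that the polygonal extremal beats the smooth competitor.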
§7.4. Minimization Through Convexity 213
Specifically, we seek ŷ₀ ∈ 𝒟̂ which renders stationary
∫_a^b (f[y(x)] + λ(x)g[y(x)]) dx on 𝒟̂,
for a λ ≥ 0 with
λ(x)g[ŷ₀(x)] ≡ 0 on (a, b).
Then it would follow from Proposition 2.5 and the [strong] convexity of
f̃(x, y, z) = f(x, y, z) + λ(x)g(x, y)
that ŷ₀ minimizes F [uniquely] on 𝒟̂ under the given inequality constraint.
Figure 7.6
Example 2. Minimize the strictly convex distance function
F(y) = ∫_a^b √(1 + y′(x)²) dx
on 𝒟 as above under the constraint y(x) ≤ x². Then, as is evident graphically from Figure 7.6, for some locations of the end points we must permit a portion of the minimizing curve to lie along the parabola defined by g(x, y) = y − x² = 0. Clearly g(x, y) is convex (as is −g(x, y)), but the sign of λ is still important. Here f(x, y, z) = √(1 + z²) and f̃(x, y, z) = √(1 + z²) + λ(x)(y − x²).
For this (nonstationary) part of the curve, y(x) = x², and since g_y ≡ 1 while f_y ≡ 0, we use (7), the resulting stationarity equation for f̃, to define
1 The resulting λ is only piecewise continuous, but since f̃_z = f_z, the usual convexity arguments remain valid. See Problem 7.26.
§7.5. Piecewise C 1 Vector-Valued Extremals 215
guarantees that it is the unique minimizing function for the problem. (Alternatively, we may argue that since f̃_zz = f_zz > 0, ŷ₀ cannot have corner points by Theorem 7.10(iii).)
We have just shown how to prove that the natural conjecture for the curve
of least length joining fixed points in the presence of a parabolic barrier is
the correct one. The reader will find it instructive to analyze this geodesic
problem with barriers of other shapes where more than one subarc may be
required to lie along the barrier curve. It is more difficult to obtain necessary
conditions for problems of this type, and in particular to investigate the
behavior at the points of contact. See Problem 7.22 and the discussions in
[PeJ and [Sm].
(Problems 7.23, 7.24)
so that in the subintervals excluding its corner points, Ŷ is C¹ and stationary
216 7. Piecewise C 1 Extremal Functions
(8')
Observe that (8) and (8′) may be replaced by the equivalent vector-valued integral equation:
(9)
are well defined when x is not a corner point of Ŷ. The continuity in (10) may be used to conclude that at a corner point c, a (local) extremal Ŷ satisfies the second Weierstrass-Erdmann condition:
(f̂ − Ŷ′·f̂_Y′)(c−) = (f̂ − Ŷ′·f̂_Y′)(c+).   (11)
In each interval excluding corner points, the integrand in (10) is continuous so that Ŷ satisfies the second equation:
and suppose that Ŷ provides a local extremal value for
F(Y) = ∫_a^b f[Y(x)] dx
on 𝒟̂. Then except at its corner points, Ŷ is C¹ and satisfies the first and second Euler-Lagrange equations, (8′) and (12). At each corner point, c, Ŷ meets the Weierstrass-Erdmann conditions:
Observe that when c is a corner point for only one component ŷ_j, then as before we may infer from (i) that if defined, each f_{z_j z_j}(c, Y(c), Z) must vanish for some Z ∈ ℝᵈ; (ii) reduces to requiring the continuity of only (f̂ − ŷ_j′ f̂_{y_j′}) at c; and (iii) becomes a statement of nonconvexity in z_j only. Finally, when f_x ≡ 0, then (10) shows that (f̂ − Ŷ′·f̂_Y′) is constant.
PROOF. When Y₀ = Ŷ, the radius function r(x) used in proving Theorem 6.8 is piecewise continuous and so it again has a positive minimum value on [a, b], guaranteeing a sufficient supply of 𝒟̂-admissible directions at Y₀. For (iii) see Problem 7.25. □
Figure 7.7 ((a), (b), (c))
However, as we have seen with similar integrals, the minimizing curve (if it
exists) may well have corners, and indeed, it is almost intuitively clear that
when b ≫ b₁, the minimum area arises from the limit of curves such as shown
in Figure 7.7(b) and thus should be given by the degenerate three segment
curve of Figure 7.7(c).
Hence, ignoring the constant 2n, we are led to formulate the problem as
follows: Given positive numbers b and b₁ ≥ 1, minimize
F(Y) = ∫ y|Y′| dt,
for which the Euler-Lagrange equations are
(d/dt)(y x′/|Y′|) = 0  and  (d/dt)(y y′/|Y′|) = |Y′|.   (13)
Figure 7.8
(y′/x′)² = (y/c₀)² − 1.   (13′)
Since x′(t) is continuous and nonvanishing, it must be positive in order to have x(0) = 0 and x(1) = ∫_0^1 x′(t) dt = b > 0. It follows that x(t) has an inverse in C¹[0, b], and we can suppose the curve to be parametrized with respect to the variable x = x(t).
For this parameter, x′ = 1, and the resulting differential equation for y = y(x) is, with c = c₀,
y′(x)² = (y(x)/c)² − 1,   (14)
or, under the substitution y = c cosh η,
c²η′² = 1, for η ≠ 0.
Thus η(x) = c⁻¹x + μ, so that the solution of (14) is
y(x) = c cosh(c⁻¹x + μ),   (14′)
for some constants c ≠ 0 and μ to be determined. The boundary conditions for y = y(x) are:
(i) y(0) = 1; and
(ii) y(b) = b₁.
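Conditions (i) and (ii) generally require a numerical solution for c and μ. The following sketch (my own illustration; the data b = 1, b₁ = cosh 1 are chosen so that the answer c = 1, μ = 0 is known in advance) applies Newton's method with a finite-difference Jacobian to y(x) = c cosh(x/c + μ):

```python
import math

b, b1 = 1.0, math.cosh(1.0)   # end conditions y(0) = 1, y(b) = b1

def y(x, c, mu):
    return c * math.cosh(x / c + mu)

def residual(c, mu):
    # the two boundary conditions (i) and (ii)
    return (y(0.0, c, mu) - 1.0, y(b, c, mu) - b1)

c, mu = 1.1, 0.1              # initial guess
for _ in range(50):
    r1, r2 = residual(c, mu)
    h = 1e-7                  # finite-difference Jacobian entries
    j11 = (residual(c + h, mu)[0] - r1) / h
    j12 = (residual(c, mu + h)[0] - r1) / h
    j21 = (residual(c + h, mu)[1] - r2) / h
    j22 = (residual(c, mu + h)[1] - r2) / h
    det = j11 * j22 - j12 * j21
    c  -= (r1 * j22 - r2 * j12) / det   # Cramer's rule for the Newton step
    mu -= (j11 * r2 - j21 * r1) / det
```

For other end data there may be two catenary solutions or none (the degenerate Goldschmidt curve then takes over), so the choice of initial guess matters.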
while
(∂/∂x) ∫_a^x f̂_y(t) dt = f̂_y(x) = f_y(x, Ŷ(x), Ŷ′(x)),
and both of these functions are continuous near the noncorner point x₀. Finally, f_{z_i z_j}(x, Ŷ(x), Z) is continuous by hypothesis.]
Since Z(x) = Ŷ′(x) will surely be one local solution to equation (16), uniqueness guarantees that it is the only solution, and since Z is C¹, it follows that Ŷ is C² in this neighborhood. □
in order that a function minimize F. These conditions (in particular, (8), (9), and (10)) actually characterize stationarity and do not distinguish between maximal, minimal, or saddle point behavior, even locally.
The early efforts (notably by Legendre in 1786) to characterize local minimality utilized a second variation and resulted in conditions involving the second derivatives of f. However, in his lectures (c. 1879), Weierstrass showed that a condition involving the first derivatives was even more significant.
Figure 7.9 (graphs of W)
§7.6*. Conditions Necessary for a Local Minimum 223
where E = Ŷ′(0+).
If we now divide by ε and take the limit as ε ↘ 0, we can use the chain rule to see that
0 ≤ [f(0, Ŷ(0), E + W) − f(0, Ŷ(0), E)] − f_z(0, Ŷ(0), E)·W
be continuous on [a, b] × ℝ^{2d}. Suppose that Ŷ minimizes
F(Y) = ∫_a^b f[Y(x)] dx
locally on 𝒟̂.
(7.16) Remarks. Weierstrass' condition (19) also holds when, in the hypotheses, ℝ^{2d} is replaced by a subdomain D, but now only for those W for which it is defined. Further restrictions may be required. See Problem 7.27.
If we recall the definition 3.4 of partial convexity as extended in Problem 3.33, we see that (19) is satisfied automatically when the function f(x, Y, Z) is
locally with respect to the strong |·| norm, then Y satisfies the Weierstrass condition:
ℰ(x, Y(x), Y′(x), W) ≥ 0, ∀ x ∈ [a, b], W ∈ ℝᵈ.   (20)
PROOF. By Theorem 7.7 we know that Y also minimizes F on 𝒟 locally with respect to the |·| norm so that the above theorem is applicable. □
(7.18) Remark. The proofs of the local results given above fail if, in the hypotheses, the strong |·| norm is replaced by the weak ‖·‖ norm. [The Weierstrass construction of Ŷ + W is not possible within the given ‖·‖ neighborhood of Ŷ.] However, if Y actually minimizes F on 𝒟 or Ŷ actually minimizes F on 𝒟̂, then these norm distinctions are irrelevant and the appropriate Weierstrass conditions are satisfied.
at each x at which the coefficient functions f_{z_i z_j}(x, Y(x), Z) are defined.
PROOF. Were q(x, V) < 0 for some V, then the Weierstrass condition (19) would be violated at x for some values of W, contradicting Theorem 7.15. □
For example, when f(x, Y, Z) = x|Z|², so that
f_{z_i z_j}(x, Y, Z) = { 2x, i = j; 0, i ≠ j },  i, j = 1, 2, …, d,
The following example of Bolza (1902) makes clear the relative strengths of these conditions:
Bolza's Problem
Consider
f(z) = z²(z + 1)².
Here
f_z(z) = 4z³ + 6z² + 2z and f_zz(z) = 2(6z² + 6z + 1).   (23)
Clearly the linear function y₀(x) = mx + c is stationary for f since y₀′(x) = m = const. (§6.2(a)), and each
𝒟 = {y ∈ C¹[a, b]: y(a) = a₁, y(b) = b₁}
contains precisely one such y₀, that with m = (b₁ − a₁)/(b − a).
on
F(ŷ) = ∫_a^b f(ŷ′(x)) dx
Figure 7.10
Problems 227
But again, when m₋ < m < m₊, then −f_zz(m) > 0 and we can use the convexity of −f exactly as before to conclude that y₀ does provide a unique weak local maximum for F on 𝒟.
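The interval (m₋, m₊) on which −f is convex can be checked directly; the sketch below (my addition) evaluates f_zz(z) = 2(6z² + 6z + 1) from (23) at its roots m± = (−3 ± √3)/6 and in between:

```python
import math

def f_zz(z):
    # second derivative of Bolza's integrand f(z) = z**2 * (z + 1)**2, from (23)
    return 2 * (6*z**2 + 6*z + 1)

# roots of 6z**2 + 6z + 1 = 0, where f_zz changes sign
m_minus = (-3 - math.sqrt(3)) / 6
m_plus  = (-3 + math.sqrt(3)) / 6

mid = 0.5 * (m_minus + m_plus)      # = -1/2, midpoint of the interval
```

Between the roots f_zz < 0 (so −f is convex there), while outside them f_zz > 0, matching the case analysis for m in the text.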
PROBLEMS
is continuous with respect to the weak norm ‖·‖. (See Example 4 in §5.3.)
7.4. Show that the following functions admit only C¹ extremals for an associated integral function:
(a) f(x, y, z) = √(1 + z²).
(b) f(x, y, z) = eˣ(1 + y² + z²).
on
𝒟₁ = {ŷ ∈ Ĉ¹[0, 3]: ŷ(0) = 0, ŷ(3) = 2},
which have exactly one corner point.
(b) What is the value of F for each of these extremals?
(c) Are there any extremals with corner points for F on
𝒟₂ = {ŷ ∈ Ĉ¹[0, 3]: ŷ(0) = 0, ŷ(3) = 8}?
r
7.8. Discuss the possible local extremal functions for
F(P) = r
p, τ ∈ C[a, b], then when α < 0:
F(P) = r
Po uniquely minimizes
P/(X)2 dx
on
for given a₁, b₁, e_j and c_j with a < c₁ < c₂ < ⋯ < c_N < b. Must ŷ₀ have a corner point at each c_j?
(b) When f(x, y, z) is strongly convex on [a, b] × ℝ², and 𝒟̂ is as in (a), prove that each ŷ₀ ∈ 𝒟̂ which is stationary for f in the intervals excluding the c_j must minimize F(y) = ∫_a^b f[y(x)] dx on 𝒟̂ uniquely.
(c) Use (b) to conclude that a polygonal curve will supply the geodesic between fixed points which can be joined in order by a function in 𝒟̂.
𝒟̂₀ = {v ∈ Ĉ¹[a, b]: v(a) = v(b) = v(c_j) = 0, j = 1, 2, …, N}
where
= ∫_a^b [τ(y′ − (y₁′/y₁)y)²(x)] dx + r(x)|_a^b, where
r(x) ≝ (τ(y₁′/y₁)y²)(x), x ∈ (a, b).
(d) Since y ∈ 𝒴₀, we know that y(a) = y₁(a) = 0. Use L'Hôpital's rule on the factor y/y₁ to conclude that r(x) → 0 as x ↘ a.
(e) Since y₁ ∈ 𝒴₀ ⇒ y₁′(b) = 0, with τ(b) ≠ 0, think of a theoretical basis to claim that y₁(b) ≠ 0, so that r(x) also vanishes as x ↗ b. (If τ(b) = 0, but τ′(b) exists, show how to reach the same conclusion. Hint: See Problem 7.21.)
(f) Use (a), (c), (d), and (e) to conclude that when G(y) = G(y₁) = 1, say, then F(y) ≥ F(y₁), ∀ y ∈ 𝒴₀.
(g) Use theorems to extend the conclusion of (f) to y ∈ 𝒴̂₀ with G(y) = G(y₁).
(h) Give other sets of boundary conditions defining 𝒴₀ for which the conclusions of this problem will hold.
locally on 𝒟̂.
7.22*. To obtain necessary conditions associated with minimizing
F*(ŷ) =
λ₀(x)((d/dx)f̂_z[ŷ₀(x)] − f̂_y[ŷ₀(x)]) = 0.
(d) Can you draw any conclusions about the behavior of ŷ₀ at points of transition between the intervals described in (c)? Could these be corner points of ŷ₀?
7.23*. An Optimal Control Problem.
To solve the control problem presented in Example 3 of §3.5 under the additional Lagrangian inequality 0 ≤ u ≤ β as discussed in Remark 3.20:
(a) Explain why it would be appropriate to consider the convexity of the integrand
where λ₁ is constant.
(b) Argue that a u₀ ∈ 𝒴 = C[0, T] which solves the equation
(i) u(1 + λ) = λ₀(T − t) with λ₀ = −λ₁/2, λ ≥ 0, will minimize
F(u) = ∫_0^T u²(t) dt
u₀(t) = { β,
          λ₀(T − t),
F(ŷ) = ∫_0^1 x√(1 + ŷ′(x)²) dx
on
𝒟* = {ŷ ∈ Ĉ¹[0, 1]: ŷ(0) = h, ŷ(1) = 0, ŷ′ ≤ 0}.
1 This is best accomplished by representing the curve parametrically as in [P], but recall the footnote following Theorem 6.12.
CHAPTER 8
§8.1. The Action Integral 235
f_i = ∂L/∂y_i = −(∂U/∂y_i), i = 1, 2, 3,
which means that the vector force F should be derivable from the "scalar
potential" - U.
r
would be satisfied by a function Y which minimizes the action integral; and in 1835, Hamilton postulated that if the system occupies certain positions at times a and b, then between those times, it should move along those admissible trajectories which make "stationary" the action integral
A(Y) = ∫_a^b L(t, Y(t), Ẏ(t)) dt.   (4)
Figure 8.1
238 8. Variational Principles in Mechanics
Between fixed times a, b, the system should move along those trajectories represented by generalized coordinates Q ∈ (C¹[a, b])ⁿ with prescribed values at a, b, which make stationary the action integral A(Q) = ∫_a^b L(t, Q(t), Q̇(t)) dt.
(d/dt)L_Q̇(t) = L_Q(t)  or  (d/dt)(∂L/∂q̇_j) = ∂L/∂q_j, j = 1, 2, …, n,   (5)
If motion of the system does not occur, the system is in static equilibrium.
Then its kinetic energy T == 0, and if its potential energy, U, depends only on
position, Hamilton's principle reduces to Bernoulli's principle, which states
that this equilibrium state is one which makes stationary the potential en-
ergy, U, of the system. Here U = U(Q) may be regarded as the work done by
the external forces in bringing the system from a reference state to the posi-
tion Q. That this equilibrium state may not minimize U is evidenced by the
example of a marble balanced on top of a sphere. However, as with the case
of this marble, actual physical disturbances will transfer it from its state of
(unstable) equilibrium, to one of stable equilibrium in which it rests, say, on
a table supporting the sphere. And in this state its potential energy is mini-
mized, at least, locally, relative to small displacements. (With larger displace-
ments it could fall to the floor and further reduce its potential energy, or
alternatively, change its reference state.)
Thus we may expect that the stable equilibrium states, those capable of
being sustained indefinitely during small disturbances, should provide a local
minimum for the potential energy function U. It was appropriate to invoke this principle of minimum potential energy in §3.5, when we were seeking the
stable equilibrium shape of a cable hanging under its own weight. Moreover,
as we shall see in §8.9, some physical systems can be in equilibrium only when
they minimize their potential energy (relative to its reference value); i.e., they
cannot exhibit unstable equilibrium states.
= 2T - (T - U) = T + U,
and E is the sum of the kinetic and potential energies of the system.
In any case, we call E as defined by (6) the total energy of the system. When the Lagrangian, L, does not depend explicitly on time so that L_t ≡ 0, then the second part of (6) shows that motion can occur only along those trajectories represented by Q for which
E(t) = E(t, Q(t), Q̇(t)) = const.,
i.e., for which the total energy is conserved. In general, we have
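For instance (my own minimal illustration, not an example from the text), for the harmonic-oscillator Lagrangian L = ½q̇² − ½q² with L_t ≡ 0, the trajectory q(t) = cos t satisfies (5), and E = ½q̇² + ½q² is indeed constant along it:

```python
import math

def energy(t):
    # q(t) = cos t solves q'' = -q, the Euler-Lagrange equation for
    # L = (1/2) qdot^2 - (1/2) q^2; here E = T + U along that trajectory
    q, qdot = math.cos(t), -math.sin(t)
    return 0.5 * qdot**2 + 0.5 * q**2

values = [energy(0.1 * k) for k in range(100)]
drift = max(values) - min(values)   # should vanish: E = 1/2 throughout
```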
Figure 8.2
where the first term represents the work done in stretching the spring
(as given by elementary calculus) and the second term is the work done in
raising the pendulum mass M against gravity. (g is the familiar gravitational
constant.)
To obtain the kinetic energy function T it is best to express it first in terms of the rectangular coordinates (x₁, y₁) of the mass M as shown in Figure 8.2, and then to use the geometrical relations
it is seen that L does not depend on time explicitly so that L_t ≡ 0, and from (6), the total energy E = T + U is constant for this system. We now assume that k is continuous.
From Hamilton's principle, there are two equations of motion (5) for this system, one for each of the generalized coordinates x, θ.
That for x is
(d/dt)L_ẋ = L_x,
or from (11) and A.8:
(d/dt)[(m + M)ẋ + Ml(cos θ)θ̇] = −k(x);   (12)
or
(d/dt) Ml[(cos θ)ẋ + lθ̇] = −Ml sin θ(g + ẋθ̇).   (13)
which is now a first-order system in the 2n variables Q = (q₁, q₂, …, q_n), P = (p₁, p₂, …, p_n).
(19')
With
Q̇ = G(t, Q, P), or q̇_j = g_j(t, Q, P), j = 1, 2, …, n,
we have
L_{q_j}(t, Q, Q̇) = L_{q_j}(t, Q, G(t, Q, P)) = f_j(t, Q, P),
say, for j = 1, 2, …, n, and now the system (18) becomes
q̇_j(t) = g_j(t, Q(t), P(t)),
ṗ_j(t) = f_j(t, Q(t), P(t)),   j = 1, 2, …, n,   (20)
which is in the so-called normal form. Moreover, when expressed in these variables, the total energy function of (6) is given by
E(t, Q, Q̇) = P·Q̇ − L(t, Q, Q̇)
= P·G(t, Q, P) − L(t, Q, G(t, Q, P)) = H(t, Q, P),   (21)
say, if we use (21) to introduce the new function H called the Hamiltonian of the system. Observe that
H(t, Q, P) = Σ_{j=1}^n p_j g_j(t, Q, P) − L(t, Q, G(t, Q, P))   (21′)
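To make (21′) concrete, here is a small numerical sketch (my addition; the one-degree-of-freedom Lagrangian L = ½mq̇² − U(q) and the quartic potential are illustrative choices, not the text's pendulum system). Then p = mq̇, so g(t, q, p) = p/m, and (21′) gives H = p²/(2m) + U(q):

```python
m = 2.0

def U(q):            # potential energy (illustrative choice)
    return q**4

def L(q, qdot):      # Lagrangian L = (1/2) m qdot^2 - U(q)
    return 0.5 * m * qdot**2 - U(q)

def g(q, p):         # inverts p = dL/dqdot = m*qdot, as in (20)
    return p / m

def H(q, p):         # Hamiltonian via (21'): H = p*g(q, p) - L(q, g(q, p))
    return p * g(q, p) - L(q, g(q, p))

# H should equal kinetic plus potential energy, p^2/(2m) + U(q)
check = H(1.3, 0.8) - (0.8**2 / (2*m) + U(1.3))
```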
§8.4. The Canonical Equations 245
∂H/∂p_i = g_i + Σ_{j=1}^n (p_j − ∂L/∂q̇_j)(∂g_j/∂p_i)
and
∂H/∂q_i = −∂L/∂q_i + Σ_{j=1}^n (p_j − ∂L/∂q̇_j)(∂g_j/∂q_i),   i = 1, 2, …, n.   (22)
Now, since
p_j ≡ (∂L/∂q̇_j)(t, Q, Q̇) ≡ (∂L/∂q̇_j)(t, Q, G(t, Q, P)),
each summation in equations (22) vanishes; also
∂L/∂q_j = (∂L/∂q_j)(t, Q, G(t, Q, P)) = f_j(t, Q, P) for j = 1, 2, …, n, as above;
thus we obtain the following equations:
∂H/∂p_i = g_i and ∂H/∂q_i = −f_i,   i = 1, 2, …, n,
which complete the Legendre transformation. These equations do not express laws of dynamics, but with their help, the equations of motion (20) assume the form
q̇_j(t) = (∂H/∂p_j)(t, Q(t), P(t)),
ṗ_j(t) = −(∂H/∂q_j)(t, Q(t), P(t)),   j = 1, 2, …, n.   (23)
Thus, along a stationary trajectory where H(t, Q(t), P(t)) = H(t), say,
(dH/dt)(t) = ∂H/∂t + Σ_{j=1}^n ((∂H/∂q_j)q̇_j + (∂H/∂p_j)ṗ_j) = (∂H/∂t)(t, Q(t), P(t)),   (24)
in view of (23). This is the transformed version of the conservation law (7) and it may be considered as a (2n + 1)st equation of motion.
Q̇ = H_P, Ṗ = −H_Q, Ḣ = H_t,
or   (25)
q̇_j = H_{p_j}, ṗ_j = −H_{q_j}, Ḣ = H_t, j = 1, 2, …, n,
and they are known as the canonical equations of motion for the system. They are attributed usually to Hamilton (1835), although they first appear in work of Lagrange (1809) and were also used by Cauchy (1831). They show
Moreover, functions satisfying (25) make this integral stationary on sets such
(26)
j = 1,2, ... , n,
so that
1 The Hamiltonian can be defined purely geometrically when the Lagrangian L(t, Q, Z) is strictly convex. See Problem 8.22.
§8.5. Integrals of Motion in Special Cases 247
The canonical equations (25) can now be obtained. The first of these, namely q̇₁ = ẋ = H_{p₁} and q̇₂ = θ̇ = H_{p₂}, simply recover equations (28′). However, ṗ₁ = −H_{q₁} = −H_x = −k(x) and ṗ₂ = −H_{q₂} = −H_θ are new equations, the latter being rather complicated by the dependence of Δ on θ. These four first-order equations replace the two second-order equations (14) obtained earlier, and they constitute the canonical equations of motion for the system. (The final equation Ḣ = H_t ≡ 0 shows that H = E = const. along the stationary trajectories.)
The local solution of this first-order nonlinear system with given initial
conditions may be obtained by numerical approximation. See [I-K]. This is
one of the reasons that the canonical equations are superior to the Euler-
Lagrange equations (14).
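As a sketch of such a numerical treatment (my own, using the simplest stand-in H = ½(p² + q²) rather than the pendulum-spring Hamiltonian above), a classical fourth-order Runge-Kutta scheme applied to q̇ = H_p, ṗ = −H_q conserves H to high accuracy over a full period:

```python
import math

def rhs(q, p):
    # canonical equations for H = (p**2 + q**2)/2: q' = H_p = p, p' = -H_q = -q
    return p, -q

def rk4_step(q, p, dt):
    # one classical fourth-order Runge-Kutta step for the system (q', p') = rhs
    k1q, k1p = rhs(q, p)
    k2q, k2p = rhs(q + 0.5*dt*k1q, p + 0.5*dt*k1p)
    k3q, k3p = rhs(q + 0.5*dt*k2q, p + 0.5*dt*k2p)
    k4q, k4p = rhs(q + dt*k3q, p + dt*k3p)
    return (q + dt*(k1q + 2*k2q + 2*k3q + k4q)/6,
            p + dt*(k1p + 2*k2p + 2*k3p + k4p)/6)

q, p = 1.0, 0.0
n = 2000
dt = 2 * math.pi / n
for _ in range(n):                 # integrate over one full period 2*pi
    q, p = rk4_step(q, p, dt)

H0, H1 = 0.5, 0.5 * (p*p + q*q)    # initial and final energy
```

The trajectory returns close to its starting point (q, p) = (1, 0), and the energy drift is far below the truncation error per step.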
of some of the 2n + 1 variables t, q₁, q₂, …, q_n, p₁, p₂, …, p_n. For example, if H_{p_i} ≡ 0 for some i = 1, 2, …, n, then q_i = const. is one integral of motion, and the motion is confined to such hyperplanes. Similarly, if H_{q_i} ≡ 0 for some i = 1, 2, …, n, then p_i = const. = c_i, say, is an integral of motion. Moreover, in this case the ith variable q_i may be effectively deleted or "ignored" by considering the stationary functions for the modified action integral
Of special interest is the case when H t == O. Then H = H(Q, P), and from the
last of (25), H(Q(t), P(t)) = const. is an integral of motion of the system. Thus
the motion takes place along curves confined to hypersurfaces in the Q, P
space which conserve the total energy E as in Figure 8.3. In this case, t is
simply a parameter for the motion of the system and time becomes ignorable on replacing it with t = t(τ), where τ is a new variable awaiting specification.
Figure 8.3 (a trajectory from Q(α) to Q(β) in Q-space)
2T = Σ_{i,j=1}^n a_{ij}(Q)q̇_i q̇_j = (Σ_{i,j=1}^n a_{ij}(Q)q_i′q_j′)/(t′)².
A(Q, P) = ∫_α^β [Σ_{j=1}^n p_j(τ)q_j′(τ) − H(Q(τ); p₁(τ), …, p_n(τ))t′(τ)] dτ
or
For a general K of this form we may in principle choose τ so that the resulting equations of motion become
q_i′ = ∂K/∂p_i,  p_i′ = −∂K/∂q_i,   i = 0, 1, 2, …, n.   (36)
(See Problem 8.10.)
Observe that the integrand in (33) does not depend explicitly on τ and its form recalls Jacobi's action integral (31) from the preceding section. However, were we to attempt an analogous geodesic interpretation of the least-action principle, we would be led to examine a form
which is not positive definite and hence could only be associated with a
nonstandard metric. Such metrics were studied by Ricci (1892) and they form
a basis for the general relativity theory of Einstein (1916). The resulting
geodesics are not purely spatial in character as were Jacobi's, but occur in the
space-time world suggested by our new coordinates.
To obtain this coordinate system from an original one (Q, P), Hamilton
had proceeded indirectly by recognizing that the trajectories which could
make stationary the integral (33) with Q(α), Q(β) prescribed, subject to the constraint K(Q, P) = const., will also make stationary a modified integral
where S = S(Q, Q̄) is a single real-valued C¹ function of the variables (Q, Q̄) ∈ ℝ^{2n+2}; [the second term is actually a constant (after integration) so that S does not participate in the variations]. In particular, if we could find S with P = S_Q and define P̄ = −S_Q̄, then, by the chain rule:
(d/dτ)S(Q(τ), Q̄(τ)) = S_Q·Q′ + S_Q̄·Q̄′ = P·Q′ − P̄·Q̄′,
and the new integrand of (38) is just P·Q′. Thus our search for Jacobi's coordinates is reduced to finding an S = S(Q, Q̄) for which
K(Q, P) = K̄(Q̄, P̄), when P = S_Q = S_Q(Q, Q̄).
Moreover, since motion in the new system occurs with K ≡ 0, it suffices to find a complete solution S = S(Q) to the first-order partial differential equation
S_t + H(t, Q, S_Q) ≡ 0,
for which, in the example considered below,
b₁ = (∂S/∂c₁)(t, q; c₁) = 2c₁t − 2q.   (45)
Although motivated by physical considerations and couched in the terminol-
ogy of analytical mechanics-velocity, momenta, energy, etc.-this transfor-
mation is purely mathematical, and it is equally applicable to the problem of
finding stationary functions for any f = f(x, Y, Y'). (In the next chapter we
shall see the connection between the corresponding Hamilton-Jacobi equa-
tion and the existence of a field used in establishing the minimality of a given
stationary function for f.)
For example, the integrand f(x, y, y′) = y′² corresponds to the Lagrangian L(t, q, q̇) = q̇², which we examined at the conclusion of the preceding section. As a result, we know that the stationary functions for this f must be of the form corresponding to (45), namely,
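For this Lagrangian the Hamiltonian is H(q, p) = p²/4 (since p = 2q̇), and S(t, q; c) = cq − c²t/4 is one complete integral of S_t + H(t, q, S_q) = 0 (my own choice of complete integral, which may differ from the text's by signs and additive constants). A finite-difference check:

```python
def S(t, q, c=1.7):
    # candidate complete integral for the Hamilton-Jacobi equation
    # S_t + (S_q)**2 / 4 = 0, corresponding to L = qdot**2
    return c*q - c*c*t/4.0

def hj_residual(t, q, h=1e-6):
    # central differences for S_t and S_q, then the HJ left-hand side
    S_t = (S(t + h, q) - S(t - h, q)) / (2*h)
    S_q = (S(t, q + h) - S(t, q - h)) / (2*h)
    return S_t + S_q**2 / 4.0

res = max(abs(hj_residual(t, q)) for t in (0.0, 1.0, 2.5) for q in (-1.0, 0.0, 2.0))
```

Indeed S_t = −c²/4 and S_q = c, so the residual vanishes identically, and ∂S/∂c = const. recovers the linear stationary functions q = ct/2 + const.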
f(x, Y, Z) − f(x, Y₀, Z₀) − f_Y(x, Y₀, Z₀)·(Y − Y₀) − f_Z(x, Y₀, Z₀)·(Z − Z₀)
= [H(x, Y, P₀) − H(x, Y, P) − H_P(x, Y, P)·(P₀ − P)]
− [H(x, Y, P₀) − H(x, Y₀, P₀) − H_Y(x, Y₀, P₀)·(Y − Y₀)],   (47)
is a strict saddle function on [a, b] × ℝ^{2d}. Saddle functions have their own significance as we see in the next results.
on
A(Y, P) = ∫_a^b [P(x)·Y′(x) − H(x, Y(x), P(x))] dx.   (48)
Now, Y ∈ 𝒟_min ⇒ Y′ = H_P(x, Y, P), and with the convexity of H(x, Y, ·), it
(48′)
1 It does not seem possible to use (47) to characterize strong convexity of f. See Problem 8.28(b).
A(Y, P) − A(Y₀, P₀) ≥ −∫_a^b [H(x, Y, P₀) − H(x, Y₀, P₀) + P₀′·(Y − Y₀)] dx,   (48″)
since the boundary term P₀·(Y − Y₀)|_a^b = 0. But for the stationary function, P₀′ = −H_Y(x, Y₀, P₀) (Why?), and now the convexity of −H(x, ·, P₀) will ensure that the last integral is nonpositive. When strict convexity obtains in both cases, then from the usual arguments of Chapter 3, we conclude that P = P₀ and Y = Y₀. □
(8.2) Remarks. Theorem 8.1 does not require that H be a Hamiltonian associated with a Lagrangian f. In applications, we could take any function P ∈ (C¹[a, b])ᵈ, and seek a solution
Y ∈ 𝒟₀ = {Y ∈ (C¹[a, b])ᵈ: Y(a) = Y₀(a), Y(b) = Y₀(b)}
of the first-order system Y′(x) = H_P(x, Y(x), P(x)), to obtain a (Y, P) ∈ 𝒟_min, and hence an upper bound to A(Y₀, P₀). Similarly, choosing a Y ∈ 𝒟₀, we could use any solution P of the system P′(x) = −H_Y(x, Y(x), P(x)), to provide a lower bound for A(Y₀, P₀). For more information on this method of complementary inequalities, see [Ar].
Now, in general, it is difficult to find (Y, P) ∈ 𝒟_min unless H is the Hamiltonian for a Lagrangian function f = f(x, Y, Z). Then each Y ∈ 𝒟₀ provides a P(x) ≝ f̂_Y′(x) for which Y′(x) = H_P(x, Y(x), P(x)), since this equation is a consequence of the Legendre transformation (recall (46′)). This pair (Y, P) ∈ 𝒟_min, and from Theorem 8.1 we would conclude that
as we would expect from (47) and its convexity implications. However, this same Y also provides (in principle) a solution P₁ ≠ P of the first-order system P′(x) = −H_Y(x, Y(x), P(x)) for which (Y, P₁) ∈ 𝒟_max. Thus A(Y, P₁) is a lower bound for F(Y₀), and this could be of use in approximating an F(Y₀) that cannot be determined exactly.
z = H_p = √(2ȳ) p/√(1 + p²)  (which has the sign of p),
§8.8. Saddle Functions and Convexity; Complementary Inequalities 257
T(y) = ∫_0^{x₁} √(1 + y′(x)²)/√(2y(x)) dx
on
𝒟* = {y ∈ C[0, x₁]: y(0) = 0, y(x₁) = y₁; T(y) < +∞}.
The integrand function √((1 + z²)/y) is not convex, but if we let ȳ(x) = y²(x)/2, so that ȳ′ = y y′, the integral becomes
Figure 8.4
i.e., we can replace the given curve by a related curve that encloses an area at least as large for which x′ ≥ 0. Thus we only need to solve the nonisoperimetric problem of minimizing
on
𝒟 = {ȳ ∈ Ĉ¹[0, l]: ȳ(0) = ȳ(l) = 0, F(ȳ) < +∞},
and so asking whether f̃(y, z) = −√(2y − z²) is convex.
To find out, let's introduce the associated momentum
p = f̃_z(y, z) = z/√(2y − z²),  solve for z = √(2y) p/√(1 + p²),
and substitute in (46) to obtain the Hamiltonian
0 ≤ s ≤ l,
is unbounded near s = 0 and s = l. However, s f̃_z[ȳ₀(s)] and (l − s)f̃_z[ȳ₀(s)] are bounded near s = 0 and s = l, respectively, while to be in 𝒟, ȳ must have ȳ′(0) = ȳ′(l) = 0 (since ȳ′² < 2ȳ). Thus we can use Theorem 3.5 as extended in Problem 3.21(d) to conclude that the semicircle is the unique minimizing curve for F, and so we verify Dido's conjecture that it alone encloses the maximal area against the x-axis.
(8.3) Remarks. 1. By Theorem 7.7, the semicircle also gives the maximal area among polygonal or other piecewise C¹ curves (x, ŷ) with ŷ(s) > 0 on (0, l), and this conclusion remains valid when we permit curves (x, ŷ) with ŷ(s) = 0 at some points in (0, l). Indeed at such points ȳ′(s) = ȳ(s) = 0, where ȳ = ŷ²/2, and when ȳ₀ = ȳ₀(s) is as before,
f̃(ȳ, ȳ′) − f̃(ȳ₀, ȳ₀′) − f̃_y(ȳ₀, ȳ₀′)(ȳ − ȳ₀) − f̃_z(ȳ₀, ȳ₀′)(ȳ′ − ȳ₀′)
= √(2ȳ₀ − ȳ₀′²) − ȳ₀/√(2ȳ₀ − ȳ₀′²) + ȳ₀′²/√(2ȳ₀ − ȳ₀′²) = ȳ₀/√(2ȳ₀ − ȳ₀′²) > 0.
Thus the fundamental inequality holds even at these points.
2. It is possible to use these results to establish the isoperimetric inequal-
ity (Problem 8.26). (A different derivation of this inequality will be given in
§9.5, Example 3.) However, we can also use this approach to solve other
significant problems, including that of Zenodoros in Problem 1.9. See
Problems 8.27-8.30.
Figure 8.5 (displacement u(t, x) of the string)
T = ½ ∫_0^l ρ u_t² dx,   (50)
where ρ = ρ(t, x) is the (local) linear mass density of the string, which is assumed continuous but may be nonuniform. (T may be considered as the limit of the ordinary kinetic energy of a large finite number of masses m_i ≝ ρ(t, x_i)Δx_i located at x_i with 0 = x₀ < x₁ < ⋯ < x_n = l, and moving with vertical velocity u_t(t, x_i), when the Δx_i = x_i − x_{i−1} tend toward zero.)
at time t. We are also assuming that there is no external loading on the string
and thereby disregard the effects of gravity. (However, see Problem 8.17(c), (d), and [Se] for less restrictive assumptions.)
From (49) and (50) we obtain the Lagrangian
and Hamilton's principle now requires that between times 0 and b, say, at
which the positions are known, the motion will make stationary the action
integral
𝒟 = {u ∈ C²([0, b] × [0, l]): u(t, 0) = u(t, l) = 0; u(0, x), u(b, x) prescribed}.
Upon introducing
we see that this integrand function depends on X = (t, x) ∈ ℝ² through ρ and τ, and on ∇u = (u_t, u_x), but does not depend explicitly on u. (See §6.9.) Thus by Theorem 6.13, we should seek a C² solution u to the equation
∇·f_{∇u} = 0,
or upon substitution of (53), the equation
(ρu_t)_t − (τu_x(1 + u_x²)^{−1/2})_x = 0,   (54)
which meets the geometrical conditions
u(t, 0) = u(t, l) = 0, 0 ≤ t ≤ b.   (55)
In addition, we suppose that the initial position is prescribed by
u(0, x) = u₀(x), 0 ≤ x ≤ l,   (56)
so that from (55), u₀(0) = u₀(l) = 0. However, instead of attempting to specify the position at a later time b, we seek a solution valid for all later times, with the prescribed initial velocity
u_t(0, x) = v₀(x), 0 ≤ x ≤ l.   (57)
We observe that (54) is nonlinear, and for example, will not admit a pure time-oscillatory solution of the form u(t, x) = ū(x) cos ωt. Numerical methods must be used to obtain approximate solutions to this equation. However, as in §6.6, we can easily linearize it by supposing that the slopes |u_x| ≪ 1.
Then (54) reduces to
(ρu_t)_t = (τu_x)_x,   (58)
which for constant ρ, τ becomes the one-dimensional wave equation
u_tt = σ²u_xx, with σ² = τ/ρ.   (58′)
Because of its importance to the description of simple acoustical phenom-
ena, (58') was the first partial differential equation to receive serious atten-
tion. It admits the (unique) solution with given initial data first obtained by
d'Alembert:

u(t, x) = [u₀(x + σt) + u₀(x − σt)]/2 + (1/(2σ)) ∫_{x−σt}^{x+σt} v₀(s) ds. (59)

(Problem 8.14.)
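As an illustrative aside (not part of the text), d'Alembert's formula (59) is easy to evaluate numerically and to check against the wave equation (58′) by finite differences. The initial data u₀, v₀ and the value σ = 2 below are arbitrary choices for the sketch:

```python
import numpy as np

sigma = 2.0  # wave speed, sigma^2 = tau/rho

def u0(x):
    # illustrative initial shape, smooth on all of R
    return np.exp(-np.asarray(x, dtype=float) ** 2)

def v0(x):
    # illustrative initial velocity (zero here)
    return np.zeros_like(np.asarray(x, dtype=float))

def trap(y, x):
    # simple trapezoid rule, to keep the sketch self-contained
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def dalembert(t, x, n=2001):
    """d'Alembert's solution (59) of u_tt = sigma^2 u_xx."""
    s = np.linspace(x - sigma * t, x + sigma * t, n)
    integral = trap(v0(s), s)
    return 0.5 * (float(u0(x + sigma * t)) + float(u0(x - sigma * t))) \
        + integral / (2.0 * sigma)
```

With v₀ ≡ 0 the solution is just the average of the two travelling translates of u₀, as described below (59).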
Here we must suppose that the initial functions u₀ in C² and v₀ in C¹ are
given on all of ℝ. (However, to satisfy the end point conditions (55), the initial
data must be extended periodically.) In particular, when the initial velocity
v₀ ≡ 0, this solution at time t may be interpreted as the average of the
translate of the initial shape u₀ which is travelling to the right at velocity σ,
with that of the same shape travelling to the left at velocity σ. Moreover, for
constant ρ and τ, (58′) does admit time-oscillatory solutions in the form
u(t, x) = u₀(x) cos ωt, (60)
provided that u₀ in C² is a solution of the linear ordinary differential equation
u₀″ + (ω/σ)²u₀ = 0, (61)
as is easily verified by direct substitution.
Now, for constant μ = ω/σ, the general solution to (61) is well known to
be given by
u₀(x) = A cos μx + B sin μx.
In addition, u₀ must satisfy the end conditions u₀(0) = 0, u₀(l) = 0. The first
of these conditions requires that A = 0, so that u₀(x) = B sin μx. However,
the second can now only be satisfied (for u₀ ≢ 0) when sin μl = 0; i.e., for
μ = μ_n = nπ/l, n = 1, 2, ….
It follows that simple oscillatory solutions of the form (60) can occur only
for specific "natural" frequencies
ω_n = σμ_n = (nπ/l)√(τ/ρ), n = 1, 2, ….
Observe that these natural frequencies increase with the tension, τ, and
decrease as the density, ρ, or the length, l, are increased, facts confirmed by
experience with, say, a guitar string. Moreover, from experimental evidence,
Mersenne, in his Harmonie Universelle of 1636, predicted the exact square
root dependence on τ/ρ, while other properties of ω₁ were known to the
Pythagorean school.
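These scalings are directly computable. The sketch below (illustrative values; the tension, density, and length are arbitrary stand-ins, not data from the text) evaluates ω_n = (nπ/l)√(τ/ρ) and confirms that quadrupling the tension doubles every frequency and that the overtones are integer multiples of the fundamental:

```python
import math

def natural_frequencies(tau, rho, l, n_max=5):
    """Angular frequencies omega_n = (n*pi/l)*sqrt(tau/rho) of the
    linearized string (58'), n = 1, ..., n_max."""
    sigma = math.sqrt(tau / rho)  # wave speed
    return [n * math.pi * sigma / l for n in range(1, n_max + 1)]
```

For a string of fixed length, retuning by tightening thus raises the pitch like the square root of the tension, as Mersenne observed.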
The mode shape associated with oscillations at frequency ω_n is
u_n(x) = sin μ_n x = sin(nπx/l).
Now, what happens, if the string is not released in one of these pure
shapes, but is instead plucked; that is, released in a triangular (or some other)
shape? In 1753, D. Bernoulli argued that since overtones could be detected
audibly, it was reasonable to suppose that the resulting motion might be
represented by superposition of the natural modes in the form of an infinite
series. However, Bernoulli's arguments were criticized by Euler, D'Alembert,
Lagrange, and Laplace so severely, that when Fourier reawakened interest
in such series representation as possible solutions to problems in heat con-
duction (c. 1807), his work was also regarded with suspicion. The resulting
questions of Fourier series representation exerted a profound influence on
the subsequent development of mathematical analysis, and interest continues
to this day. We state without proof, the following result (which is not the best
possible):
(8.4) Proposition. If u₀ ∈ C⁴[0, l], with u₀(0) = u₀(l) = u₀″(0) = u₀″(l) = 0, then
the coefficients b_n ≝ (2/l) ∫₀^l u₀(x) sin(nπx/l) dx satisfy, for some constant M, an
estimate of the form |b_n| ≤ M/n⁴, n = 1, 2, …. Hence, the series
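The b_n of Proposition 8.4 are ordinary Fourier sine coefficients, and their rapid decay for smooth u₀ can be checked numerically. The sketch below (not part of the text; the choice u₀(x) = x³(l − x)³ is an arbitrary function satisfying the proposition's hypotheses) computes the coefficients by quadrature and verifies that the partial sine series reconstructs u₀:

```python
import numpy as np

l = 1.0
x = np.linspace(0.0, l, 4001)
u0 = x**3 * (l - x)**3  # C^4 on [0, l] with u0 = u0'' = 0 at both ends

def trap(y, t):
    # simple trapezoid rule, to keep the sketch self-contained
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)

def b(n):
    """Fourier sine coefficient b_n = (2/l) * integral of u0 sin(n pi x / l)."""
    return (2.0 / l) * trap(u0 * np.sin(n * np.pi * x / l), x)

def series(N):
    """Partial sum of the sine series through mode N."""
    return sum(b(n) * np.sin(n * np.pi * x / l) for n in range(1, N + 1))
```

For this u₀ the coefficients fall off fast enough that fifty terms already reproduce the shape to well within plotting accuracy.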
(8.5) Proposition. When ρ and τ are positive with ρ_t = τ_t = 0, there is at most
one C² solution u to the wave equation ρu_tt = (τu_x)_x on [0, ∞) × [0, l] with
given initial data (56) and (57), which satisfies the boundary conditions (55):
u(t, 0) = u(t, l) ≡ 0.
PROOF. If u₁ and u₂ are two solutions to this problem, then u ≝ u₁ − u₂
provides a solution to the same equation with zero initial data. Introducing
the associated total energy at time t
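In the standard form of this argument the total energy is E(t) = ½ ∫₀^l (ρu_t² + τu_x²) dx, which is conserved when ρ_t = τ_t = 0. As a numerical illustration (a sketch with arbitrary constants, not part of the text), the code below evaluates E(t) for a pure mode of (58′) and checks that it is constant in time:

```python
import numpy as np

rho, tau = 1.0, 4.0          # constant density and tension
sigma = np.sqrt(tau / rho)   # wave speed, sigma^2 = tau/rho
l = 1.0
omega = np.pi * sigma / l    # fundamental frequency omega_1
x = np.linspace(0.0, l, 2001)

def trap(y, t):
    # simple trapezoid rule, to keep the sketch self-contained
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)

def energy(t):
    """E(t) = (1/2) * integral of rho*u_t^2 + tau*u_x^2 for the mode
    u(t, x) = sin(pi x / l) cos(omega t) of (58')."""
    u_t = -omega * np.sin(np.pi * x / l) * np.sin(omega * t)
    u_x = (np.pi / l) * np.cos(np.pi * x / l) * np.cos(omega * t)
    return 0.5 * trap(rho * u_t**2 + tau * u_x**2, x)
```

Conservation of E is exactly what forces the difference u = u₁ − u₂, which starts with zero energy, to remain identically zero.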
If we again consider oscillatory solutions of the form (60), we see that the
mode shape u₀ is now required to satisfy the equation
(τy′)′ = −ω²ρy (63)
and the homogeneous boundary conditions
y(0) = y(l) = 0.
This is a problem of the Sturm–Liouville type encountered in §7.3 for a posi-
tive eigenvalue λ = ω². In Problem 7.19, it is shown that a solution u₁ which
is positive on (0, l) should minimize
F(y) = ∫₀^l τy′² dx
Figure 8.6 (loading; distributed tensile force)
§8.9. Continuous Media 267
local density and tensile stress may vary with time, as well as position, as
would occur when a kettle drum is tightened just after it is struck.
Let X = (x, y). Then we shall assign the membrane a local areal density
ρ = ρ(t, X) so that if u = u(t, X) denotes the vertical displacement at time t,
then the associated kinetic energy is given by
Here, the first term represents the work done in stretching the membrane
as in §3.4(e), while the second term is the work done in moving the distributed
load under deflection. In both cases we take as reference state that described
by u ≡ 0.
The resulting Lagrangian is, of course,
L = T- U,
and Hamilton's principle requires that between times t = 0 and t = b at
which the positions of the membrane are prescribed, it should execute a
motion which makes stationary the action integral
A(u) = ∫₀^b L dt = ∫₀^b dt ∫_D (½ρu_t² − τ(√(1 + u_x² + u_y²) − 1) + pu) dX (67)
on
𝒟 = {u ∈ C²([0, b] × D): u|_∂D = 0; u(0, X), u(b, X) prescribed},
Thus, denoting the integrand by
f = f((t, X), u, ∇u),
we have from Theorem 6.13, that u must satisfy the differential equation
∇·f_∇u = f_u; upon substituting, and subsequently utilizing the linearizing
assumption that u_x² + u_y² ≪ 1, it follows that u should satisfy (approximately)
the partial differential equation:
(ρu_t)_t = (τu_x)_x + (τu_y)_y + p, t ≥ 0, X ∈ D. (68)
(Problem 8.18).
268 8. Variational Principles in Mechanics
for constant ρ, τ, with given initial displacement u₀ and initial velocity v₀, to
obtain the resulting unforced motions (p ≡ 0) of the membrane.
There are associated series methods which may be employed to find forced
motions, making further use of the linearity of (68), but the actual computa-
tions, even for the homogeneous case, are quite difficult. They have been
1 See "One Cannot Hear the Shape of a Drum" by C. Gordon, D. Webb, and S. Wolpert,
Bulletin of the A.M.S., Volume 27, No.4, July, 1992 pp. 134-137.
carried out only for a few simple domains, D, such as the disk, the rectangle,
and the annulus. See [Tr], [Wei], and [C-H].
PROBLEMS
8.1. (a) Show that if the approximations
Mgl(1 − cos θ) ≈ Mglθ²/2
(for small θ) are made in the Lagrangian (11), and we suppose that
K(x) = κx, then the corresponding equations of motion are given by
(16).
(b) Explain why it is not inconsistent to use both of the approximations
cos θ ≈ 1 and cos θ ≈ 1 − θ²/2 in the same equation in part (a).
Q̈ = AQ, (72)
(b) Define Q₁(t) = V cos ωt and Q₂(t) = V sin ωt, where V ≠ 0 is a constant
vector and ω is a (real) constant. Show that Q₁ and Q₂ are solutions of (72)
iff AV = −ω²V; i.e., V is an eigenvector of A with corresponding eigen-
value −ω².
(c) Show that for any positive choices of m, M, l, κ, the eigenvalues λ of A are
always negative. Are the eigenvalues necessarily distinct? (An eigenvalue
λ is a root of the equation det[A − λI] = 0.)
(d) Conclude that if A has real eigenvalues −ω₁² < −ω₂² < 0 with correspond-
ing eigenvectors V₁ and V₂, then the solution of the initial value problem
Q̈ = AQ with Q(0) = Q₀ and Q̇(0) = Q̇₀ can always be expressed in the
form
Q(t) = (c₁ cos ω₁t + c̃₁ sin ω₁t)V₁ + (c₂ cos ω₂t + c̃₂ sin ω₂t)V₂
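The eigenvector computation in parts (b)–(d) is easy to check numerically. In the sketch below the symmetric matrix A is an arbitrary illustrative stand-in (not the A of the problem); its eigenvalues are negative, and a superposition of the corresponding modes is verified to satisfy Q̈ = AQ by finite differences:

```python
import numpy as np

# symmetric stand-in for A, with eigenvalues -3 and -7
A = np.array([[-5.0, 2.0],
              [2.0, -5.0]])

evals, evecs = np.linalg.eigh(A)   # eigenvalues are -omega_i^2
omegas = np.sqrt(-evals)           # the natural frequencies omega_i

def Q(t, c=(1.0, 0.5)):
    """Superposition of the normal modes c_i cos(omega_i t) V_i."""
    return sum(c[i] * np.cos(omegas[i] * t) * evecs[:, i] for i in range(2))
```

Since each mode satisfies the equation separately, any superposition does too, which is the content of part (d).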
8.3. A double pendulum consists of two light inextensible rods of length l and two
bobs of mass m₁ and m₂, respectively, which are constrained to move in a
vertical plane as shown in Figure 8.7. Assume that the pivots are frictionless.
Use the generalized coordinates θ₁ and θ₂.
(a) Express T, U, and L in terms of θ₁, θ₂, θ̇₁, θ̇₂.
(b) Determine the differential equations of motion.
(c)* Find solutions for the linearized equations by the method of the previous
problem.
Figure 8.7
8.4. (a) Verify that the canonical equations (25) are indeed the first Euler-
Lagrange equations for the generalized action integral (26).
(b) Show that the variation of A at (Q, P) in the direction (V, W) is given by
δA((Q, P); (V, W)) = ∫ₐᵇ [(Q̇ − H_P)·W − (Ṗ + H_Q)·V](t) dt + (P·V)(t)|ₐᵇ.
(c) Conclude that functions (Q, P) satisfying (25) make A stationary on a set
where only Q(a) and Q(b) are prescribed. Is the converse true?
Figure 8.8
8.6. Consider a dynamical system with one degree of freedom. If the Hamiltonian
does not depend explicitly on time, the Hamilton-Jacobi equation takes the
form
∂S/∂t + H(q, ∂S/∂q) = 0. (73)
∂S/∂t + e^(−q)(∂S/∂q)³ = 0.
8.7. Assuming that the sun is fixed at the origin, the planar motion of a single planet
of mass m about the sun may be specified by giving the polar coordinates (r, θ)
of its position at each time t. The potential energy function which recovers the
inverse square law is U = - mk/r for an appropriate constant k.
(a) Show that the associated kinetic energy function for Q ≝ (r, θ) is
T = (m/2)(ṙ² + r²θ̇²).
(b) Obtain the Hamiltonian H in terms of the conjugate momentum P ≝
(p_r, p_θ), and write the reduced Hamilton–Jacobi equation H(Q, S_Q) = E =
const.
H = (1/(2m))(p_r² + p_φ²/r² + p_θ²/(r² sin²φ)) − mk/r
in terms of P ≝ (p_r, p_φ, p_θ). Hint: Use (27).
(c)* Show that the reduced Hamilton–Jacobi equation H(Q, S_Q) = E = const.
may be separated successively as follows:
∂S/∂θ = α; (∂S/∂φ)² + α²/sin²φ = β²;
8.10. (a) Apply the result of Theorem 6.10 to obtain equations for the stationary
functions minimizing the action integral of equation (33) subject to the
general constraint K(Q, P) = 0, in the form
for a constant E.
(d) Consider for this f,
F(u) = ∫ f(X, u, ∇u) dX,
8.12. In the theory of special relativity in which we ignore the effects of gravitation
and postulate the constancy of c, the speed of light in a vacuum, we may
modify the Lagrangian so that the form of Hamilton's principle remains valid.
For the case of a single particle of constant (rest) mass mo, moving freely
in an electromagnetic field of vector potential U(t, X) and scalar potential
U = qφ = qφ(t, X), at a speed v = |Ẋ|, we may use as the Lagrangian
Conclude that even when at rest with U = 0, the particle should contain
the enormous store of energy E₀ = m₀c², which is potentially available
for release.
(d/dt)(m₀βẊ) = −q[U_t + ∇φ − (Ẋ × 𝔅)],
8.16. The vertical column of water rotating at constant angular velocity ω (from §2.3)
may be analyzed via Hamilton's principle.
(a) Using the coordinates of Figure 2.1, show that at time t, the kinetic energy
(of rotation) is T = ½ρ ∫₀^a 2πx(∫₀^y (ωx)² dz) dx = πρω² ∫₀^a x³y(t, x) dx, where
y = y(t, x) describes the profile of the upper free surface.
(b) Demonstrate that the potential energy (due to gravity) associated with the
deflected upper surface is U = ρg ∫₀^a 2πx(∫₀^y z dz) dx = πρg ∫₀^a xy²(t, x) dx.
(c) Assuming that the positions of the rotating system are specified at times
0 and b, apply Hamilton's principle to the action integral
A = A(y) = ∫₀^b L dt = ∫₀^b (T − U) dt,
where N is the outward pointing unit normal to the boundary curve from D.
(d) Find other boundary conditions which would also guarantee uniqueness
of solution, and describe the membranes so supported.
(e) Verify that when ρ and τ are constant while p = 0, and u₀ is a solution of
(69), then u(t, X) ≡ u₀(X) cos ωt will be a solution of (68) satisfying the
condition u_t(0, X) = v₀(X) = 0.
8.19. (a) Verify the strong convexity of f(X, u, ∇u) as defined in (71). (Problem
3.26).
(b) Obtain the associated (nonlinear) equation characterizing a function
which is stationary for this f (Theorem 6.13).
(c) Conclude that each solution u₀ ∈ 𝒟 of the equation found in (b) must in
fact minimize U on 𝒟 uniquely.
(d)* Can convexity be used to give a uniqueness argument for solutions to the
time dependent wave equation in any of the forms considered in §8.9?
8.20*. Transverse Motion of an Elastic Bar. For a horizontal thick elastic bar of
constant rectangular cross-section and length l in which the energy of stretch-
ing may be neglected in comparison with that of bending, let u(t, x) denote the
"vertical" position of the center line at time t.
(a) Argue that for a suitable density ρ there should be an associated kinetic
energy T = ½ ∫₀^l ρu_t² dx.
(b) If the bar is subjected to a distributed downward loading of intensity
p(t, x), argue that the resulting potential energy is approximately
U = ∫₀^l (½μu_xx² − pu) dx,
for a suitable material stiffness function μ = μ(t, x). Hint: See the discus-
sion of W_B in §6.6.
(c)* For the action integral A(u) = ∫₀^b (T − U) dt, use the definition δA(u; v) =
lim_{ε→0} (d/dε)A(u + εv) to prove that
δA(u; v) = ∫₀^b ∫₀^l [p − (ρu_t)_t − (μu_xx)_xx]v dx dt + ∫₀^l ρu_t v dx|₀^b
− ∫₀^b (μu_xx)v_x dt|₀^l + ∫₀^b (μu_xx)_x v dt|₀^l.
(e) Conclude that stationarity of A for prescribed u at times 0 and b, and, say,
the cantilever support conditions u(t, 0) = u_x(t, 0) = 0; u(t, l) = u_x(t, l) = 0,
requires that
(ρu_t)_t + (μu_xx)_xx = p. (75)
(f)* When ρ_t = μ_t = 0, use the total energy function E(t) = T + U to give a
uniqueness result modelled after Proposition 8.5 as extended in Problem
8.17.
(g) Study the expression in (d) to obtain alternate sets of boundary conditions
which would lead to the same differential equation as in (e).
(h) When ρ = μ = const. and p = 0, determine an ordinary differential equa-
tion for u₀ = u₀(x), if u(t, x) ≡ u₀(x) cos ωt is to be a solution of (75).
(i) Can you find or guess a nontrivial solution Uo of the equation in (h) which
will permit u to meet the cantilever conditions in (e) at least for certain
values of w? Hint: See §6.6.
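For parts (h) and (i): with ρ = μ = const. the mode shape satisfies u₀⁗ = β⁴u₀ with β⁴ = ρω²/μ, and in the classical beam analysis (standard theory, not derived in this text) a bar clamped at x = 0 and free at x = l admits such a mode only when βl solves cos(βl)cosh(βl) = −1. The smallest root is easily found by bisection, as in this sketch:

```python
import math

def char(z):
    """Clamped-free (cantilever) frequency equation: cos z cosh z + 1 = 0."""
    return math.cos(z) * math.cosh(z) + 1.0

def bisect(f, a, b):
    """Locate a sign change of f in [a, b] by repeated halving."""
    fa = f(a)
    for _ in range(200):
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0.0:
            b = m
        else:
            a, fa = m, fm
    return 0.5 * (a + b)

# char(1) > 0 and char(3) < 0, so the first root lies in (1, 3)
beta_l = bisect(char, 1.0, 3.0)
```

The corresponding frequency is ω = (βl)²/l² · √(μ/ρ); the classical value of the first root is βl ≈ 1.8751.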
8.21 *. Transverse Motion of a Uniform Plate. If the membrane discussed in §8.9(b) is
replaced by a plate of uniform thickness and material, we may neglect the
energy of stretching in comparison with that of bending which is now given
(approximately) by
U = (μ/2) ∫_D [(u_xx + u_yy)² − 2(1 − τ)(u_xx u_yy − u_xy²)] dX, (76)
where, of course, u(t, X) denotes the vertical position of a center section at time
t, and μ and τ < 1 are positive material constants.
(a) Argue that for an appropriate constant density ρ, the kinetic energy of
motion at time t should be approximately T = ½ ∫_D ρu_t² dX.
(b) Set A(u) = ∫₀^b (T − U) dt, and, neglecting external loading, reason that for
some boundary conditions, stationarity of A at u ∈ C⁴ requires that u
should satisfy the equation
ρu_tt + μΔ²u = 0, (77)
where
Δ²u = Δ(Δu) = Δ(u_xx + u_yy).
(c) Which equation is u₀ = u₀(X) required to satisfy in order that u(t, X) =
u₀(X) cos ωt be a solution of (77)?
(d) For static equilibrium of the loaded plate with pressure p = p(X), when
all functions are time independent, use convexity of the integrand of
Ũ ≝ U − ∫_D pu dX
to conclude that even for a nonplanar plate, only stable equilibrium is
possible, and it is uniquely characterized by a u₀ which satisfies the equa-
tion:
= −F(ȳ) ≤ l²/2π,
with equality iff on [0, l], (x, y) parametrizes a semicircle. Hint: ȳ ≝
y²/2 ∈ C¹(0, l). See Remarks 8.3.
(b) Define A₂ similarly and conclude that A ≤ A₁ + A₂ ≤ l²/π with equality
iff (x, y) parametrizes a circle.
(c) Can you extend the last result to piecewise C¹ curves?
Figure 8.9
F(y) = ∫₀^b (y′² − y²)(x) dx
on 𝒟 = {y ∈ C¹[0, b]: y(0) = 0, y(b) = sin b},
y(0) = y′(0) = 0; y(b) = (sin² b)/2.
(b) Show that f(y, z) = z²/2y − 2y is (only) convex on {y > 0}. Hint: Look at
its Hamiltonian H(x, y, p), which is strictly convex.
(c) When b < π, conclude that y(x) = (sin² x)/2 minimizes F on 𝒟 uniquely.
(d) When b = π, conclude that y(x) = (sin² x)/2 minimizes F on 𝒟 (but not
uniquely), and obtain the Wirtinger inequality.
Hint: We can assume that y(x) > 0 when 0 < x < π. (Why?)
(e)* Show that y(x) = (sin² x)/2 does not minimize F on 𝒟 when b > π, even
though it appears to satisfy the appropriate conditions.
8.29. (a) For the seismic wave problem 1.8(b), make a transformation (such as that
in Problem 8.28) for which ȳ′(x) = −y′(x)/y(x), and show that the new
integrand function f̃(ȳ, z) = √(e^(2ȳ) + z²) is strictly convex on ℝ².
(b) Conclude that a circular arc provides the (only) path of travel for such a
seismic wave.
8.30. (a) If g = g(y) is [strictly] convex and positive on an interval I, then show that
f(y, z) = √(g²(y) + z²) is [strictly] convex on I × ℝ.
(b) Under what conditions on g = g(x, y) will f(x, y, z) = −√(g²(x, y) − z²) be
[strictly] convex on a domain of definition? Give some nontrivial examples
of such g.
CHAPTER 9*
Sufficient Conditions for a Minimum
but not sufficient to characterize a minimum value for the integral function
on a set such as
𝒟 = {Y ∈ (C¹[a, b])^d: Y(a) = A, Y(b) = B},
since they are only conditions for the stationarity of F. However, in the
presence of [strong] convexity of f(x, Y, Z) these conditions do characterize
[unique] minimization. [Cf. §3.2, Problem 3.33 et seq.] Not all such functions
are convex, but we have also seen in §7.6 that a minimizing function Y₀
must necessarily satisfy the Weierstrass condition ℰ(x, Y₀(x), Y₀′(x), W) ≥ 0,
∀ W ∈ ℝ^d, x ∈ [a, b], where
§9.1. The Weierstrass Method 283
prove that a given stationary function Y₀ ∈ (C¹[a, b])^d minimizes
Figure 9.1
284 9*. Sufficient Conditions for a Minimum
so that F(Y) − F(Y₀) = σ(b) − σ(a). Were σ′(x) ≥ 0, it would follow by the
mean value theorem that F(Y) − F(Y₀) ≥ 0. Moreover, if also σ′ is continu-
ous on [a, b], then equality holds iff σ′ ≡ 0. (§A.1, §A.2.)
ψ(t; x) ≝ y(x) (sin t)/(sin x)
is the unique function which is stationary for f and satisfies ψ(0; x) = 0 with
ψ(x; x) = y(x). (See Figure 9.2.)
Thus, for this example, equation (2) becomes
Figure 9.2
and the integrand is, from the chain rule, (4), and stationarity, given by
(∂/∂x) f(t, Ψ(t; x), Ψ′(t; x)) = f_Y[Ψ(t; x)]·Ψ_x(t; x) + f_Z[Ψ(t; x)]·(Ψ′)_x(t; x)
= (∂/∂t){f_Z[Ψ(t; x)]·Ψ_x(t; x)}.
formal calculations.) Thus finally, we obtain Weierstrass' formula:
F(Y) − F(Y₀) = ∫ₐᵇ σ′(x) dx = ∫ₐᵇ ℰ(x, Y(x), Ψ′(x; x), Y′(x)) dx, (7)
which proves that ℰ ≥ 0 will imply that F(Y) ≥ F(Y₀), provided that an
appropriate family of stationary functions 'P{.; .) having all of the assumed
properties is available. Unfortunately, it is quite difficult to prescribe con-
ditions which ensure the existence of such families (one for each competing
Y E ~), and instead in §9.3 et seq. we shall concentrate on a less direct
approach of Hilbert, which yields Weierstrass' result even for piecewise C 1
functions Y.
(Problems 9.1-9.2, 9.5-9.6)
≥ f_Z(x, Y, Z)·(W − Z),
with equality at (x, Y) iff |W − Z|² = 0 or W = Z.
f(x, y, z) = √((1 + z²)/y)
is not convex, but f(x, y, z) is strictly convex on the half space
{(x, y, z) ∈ ℝ³: y > 0}
since f_zz(x, y, z) > 0. (See Proposition 3.10 and the next example.)
Example 3. For d = 2, when Y = (x, y), the function f(t, Y, Z) = √y|Z|² is
[strictly] convex on the half-space
{(t, x, y, Z) ∈ ℝ⁵: y ≥ 0}, [{(t, x, y, Z) ∈ ℝ⁵: y > 0}].
For, by the computation of Example 1,
f(t, Y, W) − f(t, Y, Z) = √y(|W − Z|² + 2Z·(W − Z))
≥ f_Z(t, Y, Z)·(W − Z),
[with equality for y > 0 iff W = Z].
(9.2) Proposition. If f = f(x, Y, Z) together with its partials f_Z and f_{z_i z_j},
i, j = 1, 2, …, d, is continuous in a Z-convex set S ⊆ ℝ^(2d+1) (one which contains
the segment joining each pair of its points (x, Y, Z₀), (x, Y, Z₁)) and the matrix
f_ZZ is positive semidefinite [positive definite] in S, then f(x, Y, Z) is [strictly]
convex in S.
PROOF. For (x, Y, Z₀) and (x, Y, Z₁) in S and t ∈ [0, 1], the point
Z_t ≝ (1 − t)Z₀ + tZ₁
lies on a segment contained in S by hypothesis. Integrating by parts, we get
+ ∫₀¹ (1 − t)(∑_{i,j=1}^d f_{z_i z_j}(x, Y, Z_t)v_i v_j) dt,
§9.3. Fields
The Weierstrass construction in §9.1, when possible, results in a family of
stationary trajectories (the graphs of the functions Ψ(·; x)) which is consistent
in that one and only one member of the family passes through a given point
(x, Y(x)). Suppose more generally, that for a given f we have a single family
of stationary functions whose trajectories cover a domain D of ℝ^(d+1) consis-
tently in that through each point (x, Y) ∈ D passes one and only one trajec-
tory of the family, say that represented by Ψ(·; (x, Y)) ∈ (C¹[a, b])^d. Then the
direction of the tangent line to the trajectory at (x, Y) given by
Φ(x, Y) ≝ Ψ′(x; (x, Y))
1 The literature in this subject does not provide a uniform definition for fields. That given here
seems most convenient for our purposes.
Figure 9.3
Example 1. The time of travel function for a seismic wave (Problem 1.8); viz.,
f(y, z) = √(1 + z²)/y, is C² for y > 0, and has as its stationary trajectories the
semicircles with centers on the x axis (Problem 9.3). Geometrically, as shown
in Figure 9.3(a), it is evident that those with fixed center (say the origin),
indexed by the radius λ, form a consistent stationary family in the upper
half plane D = {(x, y) ∈ ℝ²: y > 0}. Analytically, these are defined by
ψ(t, λ) = √(λ² − t²), |t| < λ;
and that passing through the point (x, y) is obtained for λ = √(x² + y²). Thus
the associated (stationary) field in D is φ(x, y) = ψ_t(t, λ)|_{t=x} = −x/y, with
h = [f − zf_z]|_{z=φ(x,y)} = 1/(y√(1 + x²/y²)) = 1/√(x² + y²).
Here, it may be verified directly that p = S_y and h = S_x when S(x, y) =
log(√(x² + y²) + x) − log y (or use (10)). Thus φ(x, y) = −x/y defines an
exact field for f in D. (Another consistent stationary family is given by those
semicircles with fixed (left) end point, as in Figure 9.3(b). The resulting field is
also exact, but it is defined only in a quarter plane (Problem 9.3).)
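The exactness claim — that the field functions h and p are the partials of the stated S — can be checked numerically as well as by hand. The sketch below (an illustrative aside, not part of the text) compares central finite differences of S against the closed forms h = 1/√(x² + y²) and p = −x/(y√(x² + y²)) obtained by evaluating f_z on the field φ(x, y) = −x/y:

```python
import math

def S(x, y):
    """Candidate potential for the exact field of the seismic-wave example."""
    return math.log(math.sqrt(x * x + y * y) + x) - math.log(y)

def h(x, y):
    """h = (f - z f_z) evaluated on the field; should equal S_x."""
    return 1.0 / math.hypot(x, y)

def p(x, y):
    """p = f_z evaluated on the field phi = -x/y; should equal S_y."""
    return -x / (y * math.hypot(x, y))

def partial(f, x, y, i, eps=1e-6):
    """Central finite-difference partial derivative of f in coordinate i."""
    if i == 0:
        return (f(x + eps, y) - f(x - eps, y)) / (2.0 * eps)
    return (f(x, y + eps) - f(x, y - eps)) / (2.0 * eps)
```

Agreement of S_x with h and S_y with p at a sample point in the upper half plane is exactly the exactness condition (10) for this field.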
In this example, stationary fields provided the exact fields. We examine the
intimate relations between these fields in the next two results.
(9.4) Proposition. When f_Z is C¹, then each exact field Φ for f is a stationary
field; i.e., each C¹ solution Y₀ of the field equation Y′(x) = Φ(x, Y(x)) is a
stationary function for f.
PROOF*. We have by (9) and the hypothesis that when Y₀′(x) = Φ[Y₀(x)], then
f_Z[Y₀(x)] = f_Z(x, Y₀(x), Φ[Y₀(x)]) = P(x, Y₀(x)) (11)
is C¹. Also both P and h are C¹ and again from (9)
h_Y = f_Y + Φ_Y^T f_Z − Φ_Y^T P − P_Y Φ
= f_Y − P_Y Φ,
when the partials of f are evaluated at (x, Y, Φ(x, Y)). (In the matrices P_Y and
Φ_Y, the rows are indexed by Y; moreover, by the second set of equations (10),
the Jacobian matrix P_Y is symmetric and so is equal to its transpose.) Thus
f_Y[Y₀(x)] = h_Y(x, Y₀(x)) + P_Y(x, Y₀(x))Φ(x, Y₀(x)), (12)
so that finally by substitution of (11) and (12), and the chain rule:
(d/dx) f_Z[Y₀(x)] − f_Y[Y₀(x)] = (d/dx) P(x, Y₀(x)) − f_Y[Y₀(x)]
= 0, by (10).
Thus Y₀ satisfies the Euler–Lagrange equation of §6.7 and so is stationary.
□
Hereafter we shall suppose f_Z is C¹ in all cases of interest.
The converse of Proposition 9.4 is not true in general. (See [C].) However,
if φ is a stationary field in a domain D of ℝ², then, by definition, each field
trajectory is a stationary trajectory; since the "matrix" P_y is trivially symmet-
ric, we may again conclude that equation (13) holds and the first exactness
condition p_x − h_y = 0 of (10) is met at each point (x, y₀) ∈ D. When d = 1
and D is simply connected, it is the only requirement and we have established
the following:
provides precisely one stationary curve joining the origin (0, 0) to a given
"lower" point (x, y); namely, the cycloid which is represented parametrically
by the equations
t = λ(τ − sin τ),
ψ = λ(1 − cos τ), (0 ≤ τ ≤ θ < 2π), (14)
where λ > 0 and θ are determined uniquely by the boundary conditions
x = λ(θ − sin θ),
y = λ(1 − cos θ). (15)
(The previous notation has been replaced by one more amenable to our
present requirements.) The resulting family of cycloids, denoted ψ(t, λ), is
shown in Figure 9.4. The associated field is defined by the direction of ψ(t, λ)
at t = x; i.e.,
φ(x, y) = ψ_t(t, λ)|_{t=x} = (λ sin τ)/(λ(1 − cos τ))|_{t=x} = (sin τ)/(1 − cos τ),
where τ is the parameter value for which t = x.
Figure 9.4
the quarter plane D = {(x, y) ∈ ℝ²: x > 0, y > 0}, which is simply connected.
Thus φ is exact, by Corollary 9.5. (Another exact field is discussed in Problem
9.9.)
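As a computational aside (not part of the text), the boundary conditions (15) can be solved numerically for the cycloid through a given point: the ratio x/y = (θ − sin θ)/(1 − cos θ) is monotone increasing on (0, 2π), so bisection determines θ, after which λ = y/(1 − cos θ):

```python
import math

def cycloid_through(x, y):
    """Find lambda > 0 and theta in (0, 2*pi) with
    x = lambda*(theta - sin theta), y = lambda*(1 - cos theta),
    by bisection on the monotone ratio (theta - sin theta)/(1 - cos theta)."""
    ratio = x / y
    lo, hi = 1e-9, 2.0 * math.pi - 1e-9

    def g(th):
        return (th - math.sin(th)) / (1.0 - math.cos(th)) - ratio

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    lam = y / (1.0 - math.cos(theta))
    return lam, theta
```

Uniqueness of the root is exactly the consistency of the cycloid family: one and only one trajectory passes through each point of the quarter plane.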
say, depends only on the end points of Γ; i.e., its value is invariant with the
particular curve in D which joins given end points. (This familiar conse-
quence of exactness is independent of particular choices for h, P [Ed].) I is
called Hilbert's invariant integral for Φ in D.
However, with our choices (9), we see that when Γ is represented as the
graph of a function Y ∈ (C¹[a, b])^d, then formally, on Γ, dY = Y′(x) dx and
(18) becomes
I(Γ) = ∫ₐᵇ [h(x, Y(x)) + P(x, Y(x))·Y′(x)] dx. (19)
Moreover, when another such curve Γ₀, say, is a trajectory of the field Φ,
represented by Y₀ ∈ (C¹[a, b])^d, then by definition Y₀′(x) = Φ(x, Y₀(x)) so that
(19) reduces to
It follows that when both Γ and Γ₀ have the same end points and lie in D
as shown in Figure 9.5, then by (20):
Figure 9.5
As we have stated, in §9.6 we will prove that every central field is exact.¹
To avoid duplication we anticipate this result in formulating the next
(9.7) Theorem (Hilbert). Let f(x, Y, Z) be [strictly] convex on D × ℝ^d where
D ⊆ ℝ^(d+1) is the domain of Φ, an exact field (a central field) for f, which
contains the stationary field trajectory Γ₀ represented by Y₀ ∈ (C¹[a, b])^d.
Then Y₀ minimizes
F(Y) = ∫ₐᵇ f[Y(x)] dx
[uniquely] on
𝒟 = {Y ∈ (C¹[a, b])^d: Y(a) = Y₀(a); Y(b) = Y₀(b); (x, Y(x)) ∈ D}.
PROOF. Since from convexity, ℰ ≥ 0, it follows from (21) that F(Y) ≥ F(Y₀)
with equality iff
ℰ(x, Y(x), Φ[Y(x)], Y′(x)) = 0 (where defined).
[With strict convexity this implies that where defined, Y′(x) = Φ(x, Y(x)), so
that Y ∈ (C¹[a, b])^d represents a field trajectory. However, the field tra-
jectory through the point (a, Y₀(a)) is unique, and thus Y ≡ Y₀.] □
f(x, y, z) = √((1 + z²)/y), with f_z(x, y, z) = z/(√y √(1 + z²)),
we know from Example 2 of §9.3 that we have an exact field φ in the
quadrant D = {(x, y) ∈ ℝ²: x > 0, y > 0} for which the field trajectories are
the cycloids with cusp at the origin. Moreover, from §3.3, Example 3, it
follows that f(x, y, z) is strictly convex in D × ℝ. Hence, from Hilbert's theo-
rem, we can conclude that the cycloid, Γ₀, joining the origin to a given point
(b, b₁) ∈ D is represented by y₀ ∈ C¹(0, b] which minimizes uniquely for each
a > 0, the time-of-descent integral given within a constant factor by
T_a(ŷ) = ∫ₐᵇ √((1 + ŷ′(x)²)/ŷ(x)) dx (22)
on
Figure 9.6
§9.4. Hilbert's Invariant Integral 297
ŷ ∈ C¹(0, b] for which T(ŷ) < +∞. Then for each a ∈ (0, b), let σ_a be the
segment (in D) joining the (possibly distinct) points (a, y₀(a)) and (a, ŷ(a)),
as shown in Figure 9.6(b).
Denote by Γ_a and Γ̂_a the parts of the curves Γ₀ and Γ̂, respectively, cor-
responding to x ≥ a, and observe that with proper orientation of σ_a, Γ̂_a + σ_a
constitutes a piecewise C¹ curve with the same end points as Γ_a. By the
invariance of I, Hilbert's integral (18) for φ, we obtain
I(Γ_a) = I(Γ̂_a + σ_a) = I(Γ̂_a) + I(σ_a).
Next, with T" as defined by (22) above, we may reproduce the analysis
leading to Hilbert's theorem, utilizing in particular (19) and (20), to conclude
that
T,,(P) - T,'(Yo) = r
C(x, P(x), cp[P(x)], P'(x)) dx - l(ua ) (23)
(Problem 9.10)
Since f ~ 0, and C ~ ° in D x 1R2, we have
J: 1 + P'(X)2 fb 1 + P'(X)2
T(P) = P(x) dx ~ a
P(x) dx = T,,(P) ~ T,,(Yo) - 1(ua )·
(23')
Now as a '" 0, T,,(Yo))" T(yo), while on ua : dx = 0, and p(a, y) =
fz(a, y, cp(a, y)), so that from (18),
which establishes the minimality of the cyloid. The uniqueness requires fur-
ther analysis; [see Problem 6.15].
(Problems 9.7-9.10, 9.13-9.15)
Figure 9.7
F(Ŷ, t) = ∫ₐᵗ f[Ŷ(x)] dx
on
𝒟_t = {Ŷ ∈ (C¹[a, t])^d: Ŷ(a) = A, (t, Ŷ(t)) ∈ T; (x, Ŷ(x)) ∈ D},
where the right end point is confined to the transversal T defined as the zero
level set of a C¹ function τ (whose gradient, ∇τ, is nonvanishing on T),
provided that T is C¹ arcwise connected in D.
For then, when Γ₀ is a trajectory of the field represented by Y₀ ∈ 𝒟_t, and Γ̂
is the curve represented by a competing function Ŷ ∈ 𝒟_t, under this assump-
tion, we may join their right end points by a C¹ arc σ_T in D ∩ T as illustrated
in Figure 9.7.
With proper orientation we may consider Γ₀ and Γ̂ + σ_T as piecewise C¹
curves in D having the same end points. Since Φ is exact we have as before
from (20)
and since (x′, Ŷ′) is a tangent vector to a curve in T (and hence to T itself),
which will vary with σ_T, we must demand the vanishing of the integrand in
(24′); i.e., that the field functions h and P defined by (9) provide a vector (h, P)
which is normal to T at each point. But, by the well-known argument (repro-
duced at the end of §5.6), the gradient ∇τ = (τ_x, τ_Y) is normal to T at each
point. Hence we should require that (h, P) is proportional to ∇τ at each point
of T, or equivalently, we require that the field Φ meet the transversal condition
(25)
When d = 1, recalling that y₀′(x) = φ(x, y₀(x)), we can verify that (25) is
precisely the transversal condition (equation (15) of §6.4) which is necessary
for a local extremal of this problem. (25) may be regarded as its vector-valued
generalization.
(9.9) Theorem. Let f(x, Y, Z) be [strictly] convex on D × ℝ^d where D ⊆ ℝ^(d+1)
is the domain of Φ, an exact field (a central field) for f, which meets the
transversal condition (25) on the set T where:
(i) T = {(x, Y) ∈ D: τ(x, Y) = 0}, for a given C¹ function τ with ∇τ|_T ≠ 𝒪; and
(ii) T is C¹ arcwise connected.
If Γ₀ is a field trajectory represented by
Y₀ ∈ 𝒟_t = {Y ∈ (C¹[a, t])^d: Y(a) = A, (t, Y(t)) ∈ T; (x, Y(x)) ∈ D},
then Y₀ minimizes F(Y, t) = ∫ₐᵗ f[Y(x)] dx on 𝒟_t [uniquely within the specifi-
cation of its interval of definition].
PROOF. Since the transversal condition (25) forces the vanishing of I(σ_T) for
all C¹ arcs σ_T ⊆ T, Hilbert's formula (24) is applicable. The convexity of
f(x, Y, Z) makes ℰ ≥ 0, and gives the minimizing inequality
F(Y, t) ≥ F(Y₀, b),
(where we suppose (b, Y₀(b)) ∈ T), with equality only if
ℰ(x, Y(x), Φ[Y(x)], Y′(x)) ≡ 0.
[With strict convexity this requires that Y′(x) = Φ(x, Y(x)). Hence Γ̂ = Γ₁ is
a trajectory of the field, and since both Γ₁ and Γ₀ pass through (a, A), we
know that their representing functions Y₁ and Y₀ will agree on each common
interval [a, x̄] of definition. As Figure 9.7 indicates, it is possible that the
transversal T cuts a given field trajectory more than once, and then we must
accept the consequence that F(Y₀, b) = F(Y₁, t₁) for b ≠ t₁, even in the pres-
ence of strict convexity (Problem 9.17).] □
The preceding arguments admit various simplifications with the form of f,
which will be taken up in Problems 9.16 and 9.17. However, it must be
admitted that field construction is difficult at best without the additional
complications accompanying the imposition of boundary conditions such as
(25). Hence our applications will be confined to those covered by Hilbert's
theorem 9.7. For a more complete discussion, see [S].
[uniquely] on
F(Y) = ∫ₐᵇ f[Y(x)] dx
under the constraining relations:
Remarks. In view of Proposition 3.2, the [strict] convexity for f̃ will follow
from the convexity of f(x, Y, Z) and μ_j(x, Y)g_j(x, Y, Z), j = 1, 2, …, N
[with the strict convexity of one of these terms]. Y₀ also minimizes on the
larger class of functions Y in 𝒟 which satisfy (26) when "≤" replaces the
equality sign as in Proposition 2.5.
In application, there are three separate cases of interest:
(9.10a) If the μ_j are constant, j = 1, 2, …, N, then we obtain [unique] minimi-
zation for F on 𝒟 under the isoperimetric constraints
G_j(Y) ≝ ∫ₐᵇ g_j[Y(x)] dx = G_j(Y₀), j = 1, 2, …, N. (27)
Since there are as many multipliers μ_j as constraining relations (27), there is
some hope that we may find the μ_j which retain the convexity while permit-
ting the G_j(Y₀) to be specified.
(9.10b) When the μ_j = μ_j(x), j = 1, 2, …, N, then we obtain [unique] minimi-
zation on 𝒟 under the Lagrangian form of constraining equations
j = 1, 2, …, N. (28)
Now, however, there is little reason to believe that the μ_j can be found which
permit preassigned functions g_j[Y₀(x)] (say g_j[Y₀(x)] ≡ 0), j = 1, 2, …, N.
Indeed, each Lagrangian constraint of the form g_j[Y(x)] = 0 in general
restricts the dimensionality of the Euclidean space available for the solution
trajectories, so that unless N < d (which precludes the case d = 1), we should
not suppose that even these constraining relations can be satisfied. However,
(9.10c) When the Jlj = JliX, Y) and glYo(x)] == 0, j = 1, 2, ... , N :5: d, then
we obtain [unique] minimization for Yo under the preassigned Lagrangian
constraints
j = 1,2, ... , N, v X E [a, b]. (29)
Observe that since Yo represents a trajectory of the field CI>, (29) yields
gix, Yo(x), CI>(x, Yo(x))) == 0 and this will hold a fortiori if we require of the
field that gj(x, Y, CI>(x, Y)) == 0, in D, for j = 1, 2, ... , N. As we shall see in
Example 3, this additional field requirement may actually assist in the deter-
mination of the multiplier functions Jlj.
Example 1. To minimize

    F(ŷ) = ∫_0^1 [ŷ′(x)² + ŷ(x)ŷ′(x)⁴] dx
on

    𝒟 = {ŷ ∈ C¹[0, 1]: ŷ(0) = 1, ŷ(1) = 0, ŷ ≥ 0},

when subject to the isoperimetric constraining relation

    G(ŷ) ≝ ∫_0^1 x ŷ′(x)⁴ dx = 1/2,
Example 2. We may also use the field of Example 1 to conclude that y₀(x) = 1 − x minimizes F [uniquely] on 𝒟 under the Lagrangian constraint

    g₀[ŷ(x)] ≡ ŷ′(x)⁴ = y₀′(x)⁴ ≡ 1,

or equivalently,

    g₁[ŷ(x)] ≡ ŷ′(x)⁴ − 1 ≡ 0,
if we take μ(x) = ½x in 9.10b.
PROOF. To transform the problem of minimizing

    F(Y) = ∫_a^b f[Y(x)] dx

on 𝒟, we proceed as follows:

    F*(Y*) = ∫_a^b f[Y(x)] dx = F(Y), (33)

    = K_j(b) − K_j(a) = l_j.

Thus Y₀ minimizes F on 𝒟 [uniquely] under the isoperimetric constraints (31).
Theorem 9.10(c) permits the construction of a more usable field for the
isoperimetric problem as reformulated than does Theorem 9.10(a). However,
it is more difficult to construct higher-dimensional exact fields, and more-
over, as we have observed in §3.5, it is especially difficult to use the freedom
in selecting μ effectively. Despite these complications, this approach will yield
a solution to the classical isoperimetric problem.
Example 3*. In Problem 1.6 it was shown that the classical isoperimetric
inequality would follow from the Wirtinger inequality
    F(y) = ∫_0^{2π} [y′(x)² − y(x)²] dx ≥ 0

on

    𝒟 = {y ∈ C¹[0, 2π]: y(0) = y(2π) = 0},

subject to the isoperimetric constraint

    G(y) ≡ ∫_0^{2π} y(x) dx = 0. (34)
on

    ∫_0^{2π} [ŷ′(x)² − ŷ(x)²] dx
where μ is as yet unspecified, and the factor of 2 is introduced for convenience. The stationary functions for f̃ satisfy the usual equations (§6.7), which in this case are
    (2y′)′ = −2y + 2μ + 2μ_y(y − y₁′),
and
Equations (38) determine the stationary family Ψ(x; Λ), indexed by the parameter Λ. By construction, Ψ(0; Λ) = 𝒪. Moreover, for each x ∈ (0, 2π),
the Jacobian matrix Ψ_Λ(x; Λ) is invertible, so that Λ = Λ(x, Y) is determined, and we may define the field from (40) by

    φ(x, Y) = ψ′(x; Λ(x, Y)) = μ(x, Y) sin x + λ(x, Y) cos x (41a)
             = [(sin x − x cos x)y + (cos x − 1)y₁]/Δ(x),
while in view of (35),
    φ₁(x, Y) = ψ₁′(x; Λ(x, Y)) = y. (41b)
In the next section we shall verify that this is a central field for f̃ in D, and from (36) we observe that f̃(x̲, Y̲, Z) is convex (but not strictly convex) in D, since this is true of the term z² while the remaining term is linear in z₁. Hence Hilbert's theorem (9.7) is applicable to f̃ in D × ℝ².
and hence

    |y(x)| = |∫_0^x ŷ′(t) dt| ≤ Mx,   |y₁(x)| ≤ M ∫_0^x t dt ≤ Mx².
Let Γ̂ be the graph of Ŷ, and for 0 < a < b < 2π, let Γ̂_a^b be that part of Γ̂ for x ∈ [a, b]. Finally, let σ_a and σ_b be the segments joining the end points of Γ̂_a^b to (a, 𝒪) and (b, 𝒪), respectively, as shown in Figure 9.8.
With proper orientation of these segments we may use Hilbert's integral I for the field Φ of f̃ (as in the analysis of the brachistochrone in §9.4) to conclude that

    ∫_a^b f̃[Ŷ(x)] dx ≥ ∫_a^b f̃[Y₀(x)] dx + I(σ_a) + I(σ_b),
or
§9.5. Minimization with Constraints 307
Figure 9.8
The desired inequality will follow if we prove that I(σ_a) → 0 when x = a ↘ 0, and when x = b ↗ 2π. We shall concentrate on the first and more difficult of these assertions. Since f̃_Z(x, Y, Z) = 2(z, −μ) by (36), then from (9), the field function P(x, Y) = f̃_Z(x, Y, Φ(x, Y)) = 2(φ, −μ)(x, Y), where μ and φ are given by (40) and (41a).
This last follows because 2(φ dy − μ dy₁) is (for fixed x) an exact differential of the function in braces, as may be verified by differentiation and comparison. Using the estimates for Y(x) obtained previously, we have for small x that
    |I(σ_x)| ≤ M²x²[(sin x − x cos x) + 2(1 − cos x)x + x² sin x]/Δ(x).

By using, say, L'Hôpital's rule, it can be shown that lim_{x↘0} Δ(x)/x⁴ is nonzero, while lim_{x↘0}([term in brackets]/x²) = 0. It follows that |I(σ_x)| → 0 as x ↘ 0.
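Both limiting claims can be spot-checked numerically. The closed form Δ(x) = 2 − 2 cos x − x sin x used below is what one computes for det Ψ_Λ from the family (38); since display (39) is not reproduced here, treat that formula as an assumption.

```python
import math

# Assumed closed form (computed from the family (38)):
# Delta(x) = det[[1 - cos x, sin x], [x - sin x, 1 - cos x]] = 2 - 2 cos x - x sin x
def delta(x):
    return 2.0 - 2.0 * math.cos(x) - x * math.sin(x)

# Bracketed term from the bound on |I(sigma_x)|
def bracket(x):
    return (math.sin(x) - x * math.cos(x)) + 2.0 * (1.0 - math.cos(x)) * x + x * x * math.sin(x)

x = 0.01
print(delta(x) / x ** 4)             # near 1/12, so the limit as x -> 0 is nonzero
print(bracket(x) / x ** 2)           # near 0
u = 0.01
print(delta(2 * math.pi - u) / u)    # near 2*pi, so the limit as x -> 2*pi is nonzero
```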
To establish the corresponding results as x ↗ 2π, we note that since Y(2π) = 𝒪 we have

    |Y(x)| ≤ |∫_x^{2π} Y′(t) dt| ≤ M(2π − x), while lim_{x↗2π} Δ(x)/(2π − x) is nonzero.
then by the methods of §6.7, we know that y₀ is stationary for f̃ with μ = const., and hence must be of the form y of (38) with ∫_0^{2π} y₀(x) dx = 2πμ = 0. Thus y₀(x) = λ sin x. Conversely, any function of this form gives F(y₀) = λ² ∫_0^{2π} (cos² x − sin² x) dx = 0. (A direct ad hoc proof of the Wirtinger inequality is given in [H-L-P], while a field for the isoperimetric problem itself is presented in Problem 9.20.)
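The equality case just described, and the inequality itself, are easy to test numerically; the midpoint-rule quadrature and the sample admissible functions below are our own illustrative choices.

```python
import math

def wirtinger_F(y, dy, n=4000):
    # F(y) = integral_0^{2 pi} [y'(x)^2 - y(x)^2] dx by the midpoint rule
    h = 2 * math.pi / n
    return h * sum(dy((i + 0.5) * h) ** 2 - y((i + 0.5) * h) ** 2 for i in range(n))

# y = sin x: zero endpoints and zero mean, with F(y) = 0 (the equality case)
F1 = wirtinger_F(math.sin, math.cos)
# y = sin 2x: also admissible, with F(y) = 3*pi > 0
F2 = wirtinger_F(lambda x: math.sin(2 * x), lambda x: 2 * math.cos(2 * x))
print(F1, F2)
```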
(Problems 9.18-9.20)
(9.13) Lemma. Let Ψ(·; Λ) be a central family for f. If the matrix Ψ_Λ with elements ∂ψ_i/∂λ_j, i, j = 1, 2, …, d, is invertible at (x₁, Λ₁), then in a neighborhood D₁ of (x₁, Y₁ ≝ Ψ(x₁; Λ₁)), for values of Λ near Λ₁, Λ = Λ(x, Y) is defined and C¹, and

    Φ(x, Y) = Ψ′(x; Λ(x, Y)) ≝ Ψ_t(t; Λ(x, Y))|_{t=x} (44)

is a central field in D₁.
This lemma shows that, however indexed, a central family will determine a central field in a neighborhood of each point associated with an invertible matrix Ψ_Λ. Outside this neighborhood, the associated trajectories may intersect in a quite complicated manner, and in particular, they must intersect at the center α. Moreover, other trajectories of the original family might intersect these even in this neighborhood. The situation is illustrated in Figure 9.9.
Example 1. In the last section, we saw that with Λ = (μ, λ) ∈ ℝ², the functions of equations (38); viz.,

    Ψ(x; Λ) = (μ(1 − cos x) + λ sin x, μ(x − sin x) + λ(1 − cos x)), x ∈ [0, 2π],

are stationary for a certain f̃, and clearly Ψ(0; Λ) = 𝒪 ∈ ℝ², while both Ψ and Ψ_x are C¹ everywhere. It follows that Ψ(·; Λ) is a central family for this f̃ with center α = (0, 𝒪).
Moreover, by the computation of (39), we know that for x ∈ (0, 2π) the Jacobian determinant Δ(x) = det Ψ_Λ(x; Λ) does not vanish.

Figure 9.9
We already know that each point (x, Y) ∈ D = (0, 2π) × ℝ² determines a unique Λ, so that the field Φ ≡ (φ, φ₁) is well defined by (41). It follows that Φ is a central field for f̃ in the domain D, which is that required for the application.
Lemma 9.13 shows that the field parameter function Λ is in C¹(D), and this can now be used to show that Φ is exact. More generally, we have the following.
PROOF. For a given f = f(x, Y, Z), let Φ be a central field for f in a domain D of ℝ^{d+1} with center α = (a, A), where we may suppose a < x, ∀ (x, Y) ∈ D. By definition, Φ is determined by a central family Ψ(·; Λ) (with center α), where by Lemma 9.13 the field parameter Λ ∈ C¹(D). Thus to each point (x, Y) ∈ D is associated the stationary function Ψ(·; Λ(x, Y)) ∈ (C¹[a, x])ᵈ;
then

    S(x, Y) ≝ ∫_a^x f[Ψ(t; Λ(x, Y))] dt (46)
is defined in D, and depends on (x, Y) only through Λ(x, Y) and the upper limit of the integral. We shall prove that Φ is exact by demonstrating that in D, S has partials of the correct form (9). It is here that we shall use the differentiability of Ψ and Λ guaranteed by 9.12 and 9.13.
Recall that f[Y(t)] ≡ f(t, Y(t), Y′(t)) and introduce the abbreviation t̃ = (t, Λ(x, Y)). Then by A.13 and the chain rule, using the prime to denote t-differentiation, we get
where Ψ_Λ and Λ_Y are the matrices of partial derivatives with, respectively, columns and rows indexed by Λ, and (Ψ′)_Λ is the matrix of t-derivatives corresponding to Ψ_Λ. Here S_Y, f_Y, and f_Z are row matrices.
Now, by assumption, (Ψ′)_Λ = (Ψ_t)_Λ = (Ψ_Λ)_t = (Ψ_Λ)′, and Ψ(t̃) = Ψ(t; Λ(x, Y)) is stationary on (a, x]. Hence f_Y[Ψ(t̃)] = (d/dt)f_Z[Ψ(t̃)], and so (47) becomes
But

    Ψ_Λ(ã) = 𝒪, since Ψ(a; Λ) = A, ∀ Λ;

and if I_d denotes the d × d identity matrix,

    Ψ_Λ(x̃)Λ_Y(x, Y) = I_d, since Ψ(x̃) = Ψ(x; Λ(x, Y)) = Y.
§9.6*. Central Fields 311
Finally, since

    Φ(x, Y) ≝ Ψ′(x; Λ(x, Y)) = Ψ′(x̃);

    S_Y(x, Y) = f_Z[Ψ(x̃)] = f_Z(x, Y, Φ(x, Y)) (48)
              = P(x, Y), as in (9).
Similarly, by A.14, from (46):
The reader should verify each step of this derivation in the simple case
d = 1, at least by formal operations.
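Taking up that invitation for d = 1, one can use the illustrative choice f(x, y, z) = z² (our assumption, not an example from the text). Its stationary functions are lines; with center α = (0, 0) the central family is ψ(x; λ) = λx, the field is φ(x, y) = y/x, and (46) gives S(x, y) = y²/x. Numerical differentiation confirms the partials demanded by (9): S_y = f_z(x, y, φ) = 2φ and S_x = f − φ f_z = −φ².

```python
def S(x, y):
    # S(x, y) = integral_0^x f[psi(t; y/x)] dt with f = z^2 and psi(t; lam) = lam*t
    return y * y / x

def phi(x, y):
    # central field of lines through the center (0, 0)
    return y / x

def d_dx(g, x, y, h=1e-6):
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, x, y, h=1e-6):
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

x0, y0 = 1.3, 0.7
print(d_dy(S, x0, y0), 2 * phi(x0, y0))       # S_y = f_z(x, y, phi) = 2*phi
print(d_dx(S, x0, y0), -phi(x0, y0) ** 2)     # S_x = f - phi*f_z = -phi^2
```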
Theorem 9.14 justifies the incorporation of the term "central field" in the
statements of Theorems 9.7, 9.9, and 9.10, which should now be reexamined.
Observe that both fields considered in Example 1 of §9.3 are exact, but only one, that obtained from the semicircles with fixed (left) end point α, is central (Problem 9.3); moreover, it is the one with smaller domain. (Attempts to enlarge this domain by including, say, the semicircles with the same fixed right end point α result in a pair of disjoint quarter planes.)
In order to apply Hilbert's theorem (9.7) for central fields, we must be able to find or construct them. For a given f, and center α = (a, A) ∈ ℝ^{d+1}, the collection of "all" stationary functions Ψ ∈ (C¹[a, b])ᵈ with Ψ(a) = A (and possibly variable b) constitutes a potential central family. It may always be parametrized by Λ = Ψ′(a), or some other choice may be more natural. However, whatever choice is made, the resulting family, Ψ = Ψ(x; Λ), with its derivative Ψ_x = Ψ_x(x; Λ), must be C¹ in all variables. In explicit cases this technical requirement is usually met automatically.
Alternatively, we may simply examine the matrix Ψ_Λ for (x, Y) domains of invertibility. Each point (x₁, Y₁) of invertibility provides, through Lemma 9.13, a local central field, but each domain of invertibility provides only a possible central field. Coverage of the domain is assured, but consistency must still be established (and need not be present).
An additional complication in applications is the fact that the center α must be located on the extension of a trajectory of interest but may lie outside the domain of a usable central field containing this trajectory. In our analysis of
the brachistochrone in §9.4 we have presented one method of confronting this
problem. However, the usual approach is to extend the given trajectory
slightly, make its new end point a, say, the center, and show that a resulting
central family can still produce a central field in a domain which contains the
original trajectory. We shall adopt this approach in the following application
and in the embedding construction of the next section.
Figure 9.10
For fixed x > a, u, and hence ψ(x; λ), increases strictly to +∞ as λ traverses the positive real axis. It follows that for each such x, ψ(x; λ) assumes each value ≥ ψ(x; 0) = a₁ cosh a₁(x − a) precisely once for some choice of λ ≥ 0. Thus for λ ≥ 0, ψ(·; λ) provides a central family of catenaries which remain consistent in the domain
Now fix λ₁ ≥ 0 and set y₁ = ψ(·; λ₁). Using Hilbert's theorem (9.7), we conclude with the following:
When λ₁ > 0, y₁ provides a strong local minimum for the surface area function, in the sense of Chapter 7. (Why?)
It is considerably more delicate to introduce trajectories for λ < 0 in order to enlarge D_a, since they may intersect each other, as well as those already present. However, although, as in Problem 7.15, some configurations may permit another catenary ŷ₁ to join these same end points, it is clear from the above that it cannot provide a lesser surface area if it is contained in D_a, or in any legitimate enlargement of D_a. For a complete discussion, see [Bl]. (A similar problem is analyzed in Example 1 of §9.9.)
Note that we still have not obtained a result for the given catenary Γ₀ unless y₀ = ψ(·; λ₀) for some λ₀. To ensure this, we must enlarge the interval of definition of y₀, and choose a₁ = y₀(a) for some a < 0. However, to construct the family with this center as above requires that y₀′(a) = sinh λ₀ ≥ 0. Geometrically, it is seen that this is assured if and only if y₀′(0) > 0. In the next section we shall construct such families for a general f.
(Problems 9.11-9.12)
where f_{ZY} is the matrix with elements f_{z_i y_j}, i, j = 1, 2, …, d, having its rows indexed by i, while the remaining matrices are columns.
Using the hypothesized invertibility of f_{ZZ}, we conclude that (Y₀, Z₀ = Y₀′) satisfies the system

    Y′(x) = Z(x),
    Z′(x) = G(x, Y(x), Z(x)), (56)

where

    G(x, Y, Z) = f_{ZZ}^{-1}[f_Y − f_{XZ} − f_{ZY}Z], (57)

with the partial derivatives of f evaluated at (x, Y, Z).
Of course, the trajectories of this family need not cover a domain consistently. They obviously intersect at α = (a, A), and we are indexing the family by their slopes at α. The situation is illustrated in Figure 9.11, where it is seen that the trajectories appear to remain distinct, at least for small x − a.
In view of Lemma 9.13, the determinant of the matrix Ψ_Λ plays a crucial role, and it suffices to consider it when Λ = Λ₀. Upon differentiating (56) with respect to Λ and using the chain rule, we obtain for the matrices Ψ_Λ and Z_Λ,
Figure 9.11 (graphs of Ψ(·; Λ), with Y₀ = Ψ(·; Λ₀), emanating from (a, A))
    T′(x) = [   𝒪       I_d   ]
            [ G_Y(x)  G_Z(x) ] T(x) = 𝒢(x)T(x), (60)
(9.17) Proposition. If for ε > 0 the coefficient functions in the matrix 𝒢 are continuous on an interval I_ε = [a − ε, b + ε], then on this interval, each solution T of the system (60) admits an estimate

    |T(x)| ≤ e^{M|x−a|}|T(a)|, for a constant M = M(ε) > 0. (61)
PROOF. Let N = 2d². Each coefficient function g_ij in the matrix 𝒢 is bounded on I_ε by M/N², say (Proposition 5.3). The dot product of (60) by T(x) = (t₁(x), t₂(x), …, t_N(x)) is

    (1/2)(d/dx)|T(x)|² = T(x)·T′(x) = Σ_{i,j=1}^N g_ij(x) t_i(x) t_j(x) ≤ M|T(x)|²,

since |t_j(x)| ≤ |T(x)|, j = 1, 2, …, N.
§9.7. Construction of Central Fields with Given Trajectory 317
so that
or
    |T(x)| ≤ e^{M(x−a)}|T(a)|.

A similar argument gives the desired inequality (61) when x < a. □
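The estimate (61) can be illustrated on a concrete system. The coefficient matrix below has entries bounded by 1, so taking M = 4 satisfies the proof's normalization |g_ij| ≤ M/N² with N = 2; both the matrix and the hand-rolled Runge-Kutta integrator are our own choices.

```python
import math

def G(x):
    # sample coefficient matrix with entries bounded by 1
    return [[0.0, 1.0], [-math.cos(x), 0.0]]

def rk4(T, a, b, n=2000):
    # classical Runge-Kutta for T' = G(x) T
    h = (b - a) / n
    def f(x, v):
        g = G(x)
        return [g[0][0] * v[0] + g[0][1] * v[1], g[1][0] * v[0] + g[1][1] * v[1]]
    x = a
    for _ in range(n):
        k1 = f(x, T)
        k2 = f(x + h / 2, [T[i] + h / 2 * k1[i] for i in range(2)])
        k3 = f(x + h / 2, [T[i] + h / 2 * k2[i] for i in range(2)])
        k4 = f(x + h, [T[i] + h * k3[i] for i in range(2)])
        T = [T[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(2)]
        x += h
    return T

def norm(v):
    return math.sqrt(v[0] ** 2 + v[1] ** 2)

M, a, b = 4.0, 0.0, 2.0
T0 = [1.0, 0.5]
Tb = rk4(T0, a, b)
print(norm(Tb) <= math.exp(M * (b - a)) * norm(T0))   # True: the estimate (61) holds
```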
Recall that our coefficient functions of f, and hence G, together with its partials G_Y and G_Z, are defined in a neighborhood of Γ₀. Thus we may suppose that Y₀, and, by Proposition 9.16, that Ψ(·; Λ), together with Ψ_Λ and Z_Λ, are defined on [a − ε, b + ε] for some ε > 0. Then in view of (59), Proposition 9.17 proves that each solution to (58) on this interval of length l = b − a + 2ε has components bounded by e^{Ml}, for M = M(ε). (Why?) Furthermore, each component b_ij(x) of the matrix

    B(x) ≝ G_Y(x)Ψ_Λ(x) + G_Z(x)Z_Λ(x) (62)

is also bounded by some M = M(ε). Integrating (58) and incorporating the initial conditions (59) gives
    Ψ_Λ(x) = Ψ_Λ(a) + ∫_a^x Z_Λ(t) dt,

or

    Ψ_Λ(x) = (x − a)I_d + ∫_a^x dt ∫_a^t B(τ) dτ, (63)
(9.18) Lemma. Δ(x, a) = det[Ψ_Λ(x)] ≠ 0, for 0 < |x − a| < 2/Md, where M = M(ε).

PROOF. If Δ(x, a) = 0 for some x ≠ a, then Ψ_Λ(x) is not invertible and there exists a U = (u₁, u₂, …, u_d) ∈ ℝᵈ with ‖U‖ = max_{j=1,…,d} |u_j| = 1, for which Ψ_Λ(x)U = 𝒪. From (63), we have

    |x − a| = |x − a| ‖U‖ ≤ ∫_a^x dt ∫_a^t ‖B(τ)U‖ dτ ≤ Md (x − a)²/2;
(9.19) Remark. Since the constants M = M(ε) used in the preceding arguments apply to each solution of (58) on the interval [a − ε, b + ε], it follows that the solution Y₀ can be extended to a larger interval [ã, b], where ã < a, such that a corresponding central family with center α̃ = (ã, Y₀(ã)) will have Δ(a, ã) ≠ 0. Hence, by Lemma 9.18, that portion of Γ₀ near α̃ = (ã, Y₀(ã)) is a trajectory of a central field for f.
(9.22) Definition. When Δ(x, a) ≠ 0, ∀ x ∈ (a, b], then Γ₀ is said to satisfy the Jacobi condition for f.
provides a unique weak local minimum for

    F(Y) = ∫_a^b f[Y(x)] dx

on 𝒟.
then [f̃_ZZ] = [L_{Q̇Q̇}] = [L_{q̇_i q̇_j}] is always positive definite. (See §8.4.) Thus L(t̲, Q̲, Q̇) is strictly convex on D × ℝⁿ for some domain D of ℝ^{n+1}, by 9.2. Hence, supposing that L is C³, it follows that each point on a stationary trajectory is contained in the domain of a usable central field. By Theorem
§9.8. Sufficient Conditions for a Local Minimum 321
9.23, each stationary trajectory provides a strong local minimum for the
action integral
    A(Q) ≡ ∫_a^b L(t, Q(t), Q̇(t)) dt,
when b - a is sufficiently small, among nearby trajectories with the same end
points at a, b.
Thus Hamilton's principle of stationary action becomes in this case a
principle of (strong) local minimal action, even though the total action might
be maximized by this same stationary function.
then Y₀ provides the unique weak local minimum for

    F(Y) = ∫_a^b f[Y(x)] dx

on 𝒟.
Figure 9.12
apply Hilbert's comparison F(Y) > F(Y₀) to all Y ∈ 𝒟 ∖ {Y₀} which are in a weak neighborhood of Y₀. If, in addition, f(x̲, Y, Z) is [strictly] convex in D₀ × ℝᵈ, then Hilbert's theorem (9.7) applies to all Y ∈ 𝒟 which are in a strong neighborhood of Y₀. That such a neighborhood exists requires an appeal to compactness. □
We defer application of this result until the end of the next section.
(9.25) Theorem (Jacobi). Let Y₀ be a weak local extremum for

    F(Y) = ∫_a^b f[Y(x)] dx

on 𝒟.
Set

    V(x) = Ψ_Λ(x; Λ₀)U, so that V(a) = V(a*) = 𝒪,

    (d/dx) f_Z[Ψ(x; Λ)] = f_Y[Ψ(x; Λ)]. (64)
where the double partials of f are matrices with columns indexed by the second subscript, and rows by the first, evaluated on Γ₀. Post-multiplying this equation by the constant column vector U and incorporating the properties of V as above, we obtain

    (d/dx)[f_{ZY}(x)V(x) + f_{ZZ}(x)V′(x)] = f_{YY}(x)V(x) + f_{YZ}(x)V′(x). (65)
It may be verified that (65) is the first Euler-Lagrange equation for the quadratic function

    q(x, V, W) ≝ V̄ f_{YY}(x) V + 2 V̄ f_{YZ}(x) W + W̄ f_{ZZ}(x) W, (66)

where V, W are (column) vectors in ℝᵈ and V̄, W̄ are the corresponding row vectors. (See Problem 9.22.) However, for our purposes, it suffices to observe that since V̄ f_{YZ}(x) W = W̄ f_{ZY}(x) V,
(9.27) Remark. (65) is called Jacobi's equation. Its linearity (with f_{ZZ} invertible on Γ₀) assures that for each j = 1, 2, …, d, it has a unique solution V_j on [a, b], with V_j(a) = 𝒪 and V_j′(a) = E_j, where E_j is the unit vector of ℝᵈ in the jth coordinate direction. The matrix having the V_j(x) as columns is precisely Ψ_Λ(x; Λ₀), and so Δ(x, a) = det[V₁(x) | V₂(x) | ⋯ | V_d(x)]. The Jacobi condition may be restated in terms of the linear independence of these V_j.
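For d = 1 and the illustrative integrand f = z² − y² (an assumption made here for concreteness), Jacobi's equation (65) reduces to v″ = −v, whose solution with v(a) = 0, v′(a) = 1 is v(x) = sin(x − a) = Δ(x, a); the first conjugate point falls at a + π. A quick numerical check:

```python
import math

def jacobi_delta(a, x, n=4000):
    # integrate v'' = -v (Jacobi's equation for f = z^2 - y^2, d = 1)
    # with v(a) = 0, v'(a) = 1, by classical RK4 on (v, w) where w = v'
    h = (x - a) / n
    v, w = 0.0, 1.0
    for _ in range(n):
        k1 = (w, -v)
        k2 = (w + h / 2 * k1[1], -(v + h / 2 * k1[0]))
        k3 = (w + h / 2 * k2[1], -(v + h / 2 * k2[0]))
        k4 = (w + h * k3[1], -(v + h * k3[0]))
        v += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        w += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return v

print(jacobi_delta(0.0, 1.0), math.sin(1.0))   # Delta(1, 0) agrees with sin 1
print(jacobi_delta(0.0, math.pi))              # ~0: conjugate point at pi
```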
Thus y′(x) = tan θ(x) = c₁(x − c₀), for a constant c₀, or finally, using (68) once again, we obtain
(69)
We shall consider those functions of this form capable of providing local (minimal) values for

    F(ŷ) = ∫_0^1 f[ŷ(x)] dx

on

    𝒟 = {ŷ ∈ C¹[0, 1]: ŷ(0) = 1, ŷ(1) = 1/2}.
To satisfy the condition ŷ(0) = 1, we must take c₁c₀² = 2c₁ − 1 in (69), and after some manipulation we obtain, for c₁ = (λ² + 1)/2, the central family of quadratic functions:

    ψ(x; λ) = (λ² + 1)x²/4 + λx + 1, (70)

in which the parameter λ has been chosen to have ψ′(0; λ) = λ. Observe that

    Δ(x, 0) = ψ_λ(x; λ) = λx²/2 + x = 0 when x = −2/λ. (71)
If one of the associated parabolas passes through the point (x, y), then from (70) we have, after completing the square, 4y = (λx + 2)² + x². Thus no curve of the family enters the region in which 0 < 4y < x², and exactly one curve of the family passes through each point on the boundary of this region where 4y = x². However, there will be two curves of the family passing through each point of the complementary region in which 4y > x². In particular, for the point (1, 1/2), there are the curves corresponding to the parameters λ = −1 and λ = −3 shown in Figure 9.13. From (71), ψ_λ(x; −1) ≠ 0 on (0, 1]. Hence the Jacobi condition is satisfied, and we are assured by Theorem 9.24 that y₁(x) = ψ(x; −1) = x²/2 − x + 1 provides a strong local minimum value for F on 𝒟. But also from (71), ψ_λ(x; −3) = 0 when x = 2/3. Thus by Theorem 9.25, the curve y₃(x) = 5x²/2 − 3x + 1 cannot provide even a weak local minimum (or maximum) value.
Figure 9.13
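The assertions made about the family (70) can be verified directly:

```python
def psi(x, lam):
    # the central family (70): psi(x; lam) = (lam^2 + 1) x^2 / 4 + lam*x + 1
    return (lam * lam + 1) * x * x / 4 + lam * x + 1

def psi_lam(x, lam):
    # its lambda-derivative (71): lam*x^2/2 + x
    return lam * x * x / 2 + x

# both lam = -1 and lam = -3 pass through (1, 1/2)
print(psi(1.0, -1.0), psi(1.0, -3.0))      # 0.5 0.5
# Jacobi condition: psi_lam(.; -1) does not vanish on (0, 1] ...
print(all(psi_lam(k / 1000, -1.0) > 0 for k in range(1, 1001)))   # True
# ... while psi_lam(.; -3) vanishes at x = 2/3
print(abs(psi_lam(2 / 3, -3.0)) < 1e-12)   # True
```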
on

    𝒟 = {Y = (x, y) ∈ (C¹[0, 1])²: Y(0) = (0, 1), Y(1) = (1, 1/2), y ≥ 0},
then, as in the application of §7.5, we must admit the Goldschmidt curve Y₀ consisting of the horizontal segment along the x axis and the two vertical segments at its ends. By direct calculation in which we utilize the arc length as the parameter t, we obtain
(In making this last computation we have used (69), with 2c₁ = 1 + λ², where λ = −1.) We can approximate the Goldschmidt function Y₀ by smooth functions Y for which the integral G(Y) = F(y) has values as near G(Y₀) as we wish (Proposition 7.6), and we conclude that y₁ gives only a strong local minimum value for F. While it is true that for this problem, the upper of two possible (stationary) parabolas through the same pair of points always
§9.10. Concluding Remarks 327
In this book, an effort has been made to present those results which can be
established without use of the Lebesgue integral. However, we have not
examined the special class of homogeneous integrand functions which yield
integrals independent of the particular parametrization of the underlying
trajectories. The study of the associated curve dependent integrals is pre-
sented in such works as [S], [G-F] and [Ak], and the relation between a
resulting field theory and the Hamilton-Jacobi equations for the trajectories
is investigated thoroughly in [C] and in [Ru]. In the latter work will also be
found a corresponding development for multidimensional integral functions.
Convexity plays a significant role in the examination of certain multidimen-
sional problems in which the underlying domain of integration is permitted
to vary. See [G-F] and [P-S].
The principal benefit derived from the introduction of the Lebesgue integral is the possibility of establishing existence of a minimizing function in a
larger class of functions than those admitted in this book. These methods
were first carried through successfully by Tonelli (c. 1922) and his results
will be found in [Ak]. They depend on the fact that integrands, f, convex
as in this chapter (§9.2), will produce integral functions, F, which are semicontinuous.
The foregoing remarks are merely suggestive and cannot do justice either
to the comprehensive scope of this field, or to its contributors. Despite classi-
cal origins, the principles of the variational calculus remain a vital force in
both mathematical and philosophical thought. Although the problems are
now clearly delineated, methods for their solution are far from exhausted,
and there remains a formidable gap between the known theoretical methods
and their satisfactory application in specific instances. New techniques will
be developed, and, it is to be hoped, they will admit expression in
even simpler forms than those now employed. The subject both needs and
deserves an elegance of expression which is comparable to the idealism
embodied in its concepts.
PROBLEMS
    F(y) = ∫_0^1 √(1 + y′(x)²) dx

on

    𝒟 = {y ∈ C¹[0, 1]: y(0) = 0, y(1) = c}.
(b) Use Hilbert's theorem (9.7) to reach the same conclusion.
(c) What problem are we solving?
9.3. (a) Show that the stationary trajectories for
    f(y, z) = √(1 + z²)/y, y > 0,
are semicircles with centers on the x axis.
(b) Prove directly that those having the origin as a common (left) end point
determine an exact field φ for f in a quarter plane.
(c) Use a theorem to reach the same conclusion as in (b).
9.4. (a) Examine Facts 3.11 and make appropriate vector-valued extensions for [strictly] convex functions f(x̲, Y̲, Z).
(b) Prove your results as in Problems 3.2 and 3.3.
9.5. By the method of §9.1, consider the function
(f) Show that f(x̲, y, z) is strictly convex on D × ℝ, and apply Hilbert's theorem to make some comparisons about curves in D.
(g)* Sketch a limiting argument in terms of the integral

for small positive a and b, where r(ŷ) denotes the related radii of the arcs defining the central family.
(i) Argue geometrically that as ε ↘ 0, |I(σ_ε)| → 0, while F_ε(ŷ) → −A(ŷ). Conclude that Dido's conjecture in the form A(ŷ) ≤ A(y₀), with equality iff ŷ = y₀, is true. (See Problem 1.5.)
(j) Compare this solution with the approach taken in Problem 6.42. In what
sense are both incomplete solutions of Dido's Problem?
9.12. The Zenodoros Problem. (See Problems 1.9, 6.43, and 8.27.)
(a) Use Problem 8.27(d) to show that a central family for f(x, y, z) = −(y(1 − z²))^{1/2} is given by the parabolic functions ψ(x; λ) = x − λx², for λ > 0.
(b) Graph three members of this family, noting that ψ′(0; λ) = 1.
(c) Show by direct computation that through each point (x, y) ∈ ℝ² with 0 < y < x passes precisely one trajectory of this family, generating the field φ(x, y) = 2(y/x) − 1.
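Parts (a) and (c) can be cross-checked numerically; the explicit family ψ(x; λ) = x − λx² used below is our reconstruction of the garbled display in (a) and should be treated as an assumption.

```python
def psi(x, lam):
    # reconstructed parabolic family (assumption): psi(x; lam) = x - lam*x^2
    return x - lam * x * x

def dpsi(x, lam):
    return 1 - 2 * lam * x

def phi(x, y):
    # the field claimed in part (c)
    return 2 * (y / x) - 1

# through each (x, y) with 0 < y < x passes the unique member with
# lam = (x - y)/x^2 > 0, and its slope there agrees with phi(x, y)
for (x, y) in [(0.5, 0.2), (1.0, 0.9), (2.0, 0.1)]:
    lam = (x - y) / (x * x)
    assert lam > 0
    assert abs(psi(x, lam) - y) < 1e-12
    assert abs(dpsi(x, lam) - phi(x, y)) < 1e-12
print("consistent")
```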
(d) Establish that this is a central field for f in D = {(x, y) ∈ ℝ²: 0 < y < x}.
(e) Argue that f(x̲, y, z) is strictly convex in D × (−1, 1), and apply Hilbert's comparison to related curves in D.
(f) For T > 0, let Γ₀ be the graph of the function y₀(x) = x − x²/T = ψ(x; T⁻¹) on [0, T], and let Γ̂ be the graph of a function ŷ ∈ 𝒟 = {ŷ ∈ C¹[0, T]: ŷ(0) = ŷ(T) = 0, with (x, ŷ(x), ŷ′(x)) ∈ D × (−1, 1)}. Finally, for a > 0, let σ_a be the segment joining these curves at x = a. Show that
    I(Γ) = ∫_Γ (h dx + P·dY)

has a value that depends only on the end points of C¹ curves Γ ⊆ D.
(a) Consider only those curves Γ which can be parametrized by Y ∈ 𝒴 = (C¹[a, b])ᵈ; argue that for fixed A, B ∈ ℝᵈ,

    F*(Y) ≝ ∫_a^b [h(x, Y(x)) + P(x, Y(x))·Y′(x)] dx

is constant on

    𝒟 = {Y ∈ 𝒴: Y(a) = A, Y(b) = B, (x, Y(x)) ∈ D},

and hence δF*(Y; V) = 0, ∀ V in a set 𝒟₀.
(b) Show that the resulting first equation for a typical Y ∈ 𝒟 is

    P′ − h_Y + (P_Y^T − P_Y)·Y′ = 𝒪,

where the terms are evaluated at (x, Y(x)), and P_Y is the Jacobian matrix with rows indexed by Y, while P_Y^T is its transpose.
(c) Since (a, A), (b, B) could be any points in D with a < b, conclude that h, P must satisfy the exactness conditions (10). Hint: Reason geometrically.
(d) Next, let Φ be a stationary field in D for a given f = f(x, Y, Z) (assumed C¹), and suppose that Y ∈ 𝒟 is a solution of the associated field equation (§9.3). Show that F(Y) = ∫_a^b f[Y(x)] dx = F*(Y) given in (a), if h is defined as in (9), where P is as yet unspecified.
(e) Use conditions (10) [obtained in (c)] to prove that when Y is as in (d) and V(x) ≝ P(x, Y(x)) − f_Z[Y(x)], then V′(x) = −Φ_Y(x, Y(x))V(x), with trivial solution V ≡ 𝒪. Now define P. Hint: See proof of 9.4.
9.14*. To obtain a multidimensional version of Problem 9.13, proceed as follows:
(d> 1)
(a) For arbitrary C 1 functions h = h(X, u), P = P(X, u), defined in a domain
D of ℝ^{d+1}, require that the integral
be constant on sets
9.16. (a) When f = f(x, z) ∈ C¹(ℝ²), and τ = τ(x, y) is C¹, with ∇τ ≠ 𝒪, show that the family of lines which cut T, the zero level set of τ, orthogonally will determine an exact field for f which meets the transversal condition (25) of §9.4 in any domain D which they cover consistently.
(b) Find such a family for the parabolic function τ(x, y) = y − x², and determine domains which they cover consistently.
(c) Make the construction of (b) for the function τ(x, y) = x² + y² − 1.
(d) For f(z) = z², with τ as in (b), what related minimization problems can you solve?
(e) Repeat part (d) when f(z) = √(1 + z²).
9.17. Suppose that Φ is an exact field for f in a domain D of ℝ^{d+1}, where f(x̲, Y̲, Z) is [strictly] convex in D × ℝᵈ.
(a) If Φ satisfies the transversal condition (25) for a C¹ arcwise connected transversal T ⊆ D, which cuts one field trajectory Γ represented by Y ∈ 𝒴,
where (t_j, Y(t_j)) ∈ T, j = 1, 2, and t₁ < t₂. Hint: Use the invariance of I on the corresponding subtrajectories Γ_j of Γ.
(b) If for some fixed b, the set of points (b, Y) ∈ D is convex, then it defines a transversal T. Show that the transversal condition (25) reduces to requiring that f_Z(b, Y, Φ(b, Y)) = 𝒪, ∀ (b, Y) ∈ T. How could this transversal condition have been anticipated? (See §6.4.) If Φ meets this condition, which minimization problems can you solve?
(c) What condition should Φ satisfy on T = {(x, B): x₁ ≤ x ≤ x₂} in order that Theorem 9.9 guarantee minimization for Y₀, if Y₀(t) = B, but x₁ < t < x₂?
(d)* Prove that the cycloid provides a minimum for Jakob Bernoulli's brachis-
tochrone problem from §6.4. Hint: Use a field from Problem 9.9(a), in
conjunction with the limiting analysis for the ordinary brachistochrone
problem given in §9.4.
9.18. Verify that the functions defined by (38) satisfy (for constant μ) the equations of stationarity for f̃ of (36), under the constraint (35).
9.19. Chaplygin's Problem.
(a) Using the formulation with Lagrangian constraint from Problem 1.4, show why it might be appropriate to consider the flight path of maximal area as being stationary for the modified function

    f̃(t, Y, Y′) = −xy′ + λ(t, Y)[(x′ − w)² + y′² − 1],

where Y = (x, y), and Y′ = (x′, y′) is regarded as a variable; λ is an unknown function which we may suppose C¹.
(b) Demonstrate that when λ > 0 on a domain D of ℝ³, then f̃(t̲, Y̲, Y′) is strictly convex on D × ℝ².
(c) Under the additional constraint (x′ − w)² + y′² ≡ 1, prove that the stationary functions for f̃ satisfy, with μ = 2λ, the equations

    μ(x′ − w) = −y + c₂;   μy′ = x − c₁,

for trajectory constants c₁, c₂.
(d) Show that μ² = (x − c₁)² + (y − c₂)², and differentiate along the trajectory to get

    μ′ = wy′, so that μ = wy + r,

for a trajectory constant r.
(e) Conclude that under the constraint in (c), each stationary trajectory must project onto an arc of an ellipse whose ratio of minor to major axes is the wind-constant √(1 − w²).
(In particular, an associated closed flight path through the origin would be
along the ellipse of the type in (e) fixed by the flight time and the initial flight
path angle. It is considerably more difficult to reach this conclusion with the
isoperimetric formulation of the problem. See [Sm] for details.)
(f) Explain what would be necessary to prove that flight along such an elliptical path would in fact maximize the enclosed area.
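The conclusions of (c)-(e) admit a numerical spot-check. The values w = 0.6, r = 1, c₁ = c₂ = 0 below are illustrative choices; completing the square in x² = (wy + r)² − y² gives an ellipse whose minor-to-major axis ratio is √(1 − w²).

```python
import math

w, r = 0.6, 1.0          # sample wind constant and trajectory constant
c1 = c2 = 0.0            # illustrative choice of trajectory constants

# Semi-axes of the conic x^2 = (w*y + r)^2 - y^2, by completing the square:
ax = r / math.sqrt(1 - w * w)    # semi-axis in x
ay = r / (1 - w * w)             # semi-axis in y
y0 = w * r / (1 - w * w)         # center offset in y

# Points of this ellipse satisfy mu = w*y + r together with
# mu^2 = (x - c1)^2 + (y - c2)^2, as in parts (c)-(d).
for k in range(12):
    t = 2 * math.pi * k / 12
    x, y = ax * math.cos(t), y0 + ay * math.sin(t)
    mu = w * y + r
    assert abs(mu * mu - (x - c1) ** 2 - (y - c2) ** 2) < 1e-9

print(ax / ay, math.sqrt(1 - w * w))   # minor/major ratio = sqrt(1 - w^2) = 0.8
```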
Problems 335
again first when d = 1, then for d > 1. Explain why Q(V) might be denoted δ²F(Y₀; V).
(e) Argue (with Legendre) that if r(x) = f_{zz}[y₀(x)] > 0 on [a, b], and a solution u₁ of the Riccati equation in (b) is available, then the stationary function y₀ will minimize F(y) = ∫_a^b f[y(x)] dx on a typical set 𝒟, in each direction v.
(f) Why would the conclusion in (e) not establish y₀ as a weak local minimum point for F on 𝒟? What might strengthen Legendre's approach into a proof for this minimality?
(g)* Attempt a formal extension of Legendre's argument to the case where
d = 2.
9.24. (a) For the Sturm-Liouville function f̃(x, y, z) ≝ τ(x)z² + [q(x) − λp(x)]y², with q, p, τ ∈ C[a, b] (as in §7.3), verify that the solutions of the Jacobi equation (65) will be stationary for f̃.
(b)* When τ > 0 on [a, b], relate Picard's argument of Problem 7.19, in terms of a nonvanishing stationary (eigen)function y₁, to that of Problem 9.23(b), in terms of a nonvanishing solution v₁ of Jacobi's equation.
(c) Explain how Picard's inequality for F of Problem 7.19(f) follows from
that of Legendre for Q in Problem 9.23(d).
9.27*. (Newton's drag profile minimizes.) In Problem 7.27 we saw that the only candidate for minimizing F(ŷ) on 𝒟* is the cornered curve y₀ for which y₀(x) = h, x ≤ a, and y₀′(a+) = −1. Moreover, for a ≤ x ≤ 1, y₀ is described by (14′) of §3.4(c), and so has a stationary extension above the line y = h (whose graph is indicated in Figure 3.6), as well as one below the x-axis. It is given parametrically in Problem 3.40(b) where c = a and y₁ = h.
(a) When y₁ = h, the family of curves for c > 0 in Problem 3.39(b) covers consistently a simply-connected domain D ⊇ {0 < x ≤ 1, 0 ≤ y ≤ h}. Explain, and graph several curves from this family.
(b) Explain why the slopes ({J of these curves define a field in D that is station-
ary, hence exact. Also explain why ({J(x, h) = -1 and why ({J ::;; 0 in D.
Conclude that when Y = h, the field functions in (9) (here denoted by p and
h) are, respectively, x/2 and x.
(c) Let P ∈ 𝒟*, and suppose that, like Yo, P(x) = h for 0 < x ≤ c, say. Let σ be the segment from (c, h) to (a, h) (parametrized in this direction), Γ the graph of P for x ≥ c, and Γ₀ the graph of Yo for x ≥ a. Then show that I(Γ) = I(Γ₀) + I(σ), where I is Hilbert's invariant integral for φ. Use previous
results to conclude that

I(σ) = ∫_c^a x dx = ∫_0^a x dx − ∫_0^c x dx,

so that, as in (21), F(P) ≥ F(Yo) as desired. Hint: Recall that ℰ ≥ 0 for the appropriate p, φ.
(d) If P ∈ 𝒟* is not constant on some initial x-interval, we can modify the previous construction by introducing the vertical segment σ_c from (c, P(c)) to (c, h) for small c < a, and letting Γ̃_c and Γ_c be the graphs of P and Yo, respectively, for x ≥ c. Show that I(Γ_c) = I(Γ̃_c) + I(σ_c) and that

|I(σ_c)| ≤ ∫_0^h [2c|φ(c, y)| / (1 + φ²(c, y))²] dy ≤ ch    (since |u|/(1 + u²) ≤ 1/2 when u ∈ ℝ).

Let c → 0 and conclude that, as before, F(P) ≥ F(Yo).
PART THREE
OPTIMAL CONTROL
AN OBSERVATION
"Since the fabric of the universe is most perfect, and is the work of a most
wise Creator, nothing whatsoever takes place in the universe in which some
form of maximum and minimum does not appear."
CHAPTER 10*
Control Problems and Sufficiency Considerations
The discipline now identified as optimal control emerged during the decade
1940-1950, from the efforts by engineers to design electromechanical appara-
tus which was efficiently self-correcting, relative to some targeted objective.
Such efficiency is clearly desirable in, say, the tracking of an aircraft near a
busy airport or in the consumption of its fuel, and other economically desir-
able objectives suggest themselves. The underlying mathematical problems
were attacked systematically in the next decade by Bellman [Be], by
Hestenes [He], and by a Russian group under Pontryagin [Po]. Their results
were quickly adapted to characterize optimal processes in other fields
(including economics itself) and the feasibility of optimal control is now a
standard consideration in contemporary design strategy.
In examining the associated idealized problems, it is natural to employ
the techniques of the variational calculus to obtain models for what can
occur. We have already attacked two such problems successfully by such
methods-the production problem of §3.4(d), and the fuel consumption prob-
lem in §3.5. Indeed, most (deterministic) problems in optimal control admit
formulation as one of steering a system so as to minimize a performance
integral over an interval (in time), in the presence of Lagrangian constraints,
with certain additional target conditions and control restrictions (§10.1). (We
shall not consider problems involving multidimensional integrals.) Control
constraints are usually a reflection of physical limitations, and although
their presence imposes severe complications on the theoretical derivation of
necessary conditions (§11.1), it seems far less inimical to sufficiency consider-
ations. (We avoid attempts at presenting general existence theory which for
optimal control problems is truly formidable; see [Ce] and [CIa]). In this
chapter we concentrate on developing effective sufficiency methods which can usually be attempted, and which, when successful, will lead to a solution, in many cases the unique solution, to the problem.
342 10*. Control Problems and Sufficiency Considerations
assessed by an integral of the form

F(Y, U) = ∫_0^T f[Y(t), U(t)] dt,

where for each (Y, U) ∈ 𝒟, each of the expressions [ ] and ⟨ ⟩ is an (integrable) real-valued function of t on [0, T].
Now, if for some (continuous) Lagrangian multiplier functions p and μ, (Yo, Uo) ∈ 𝒟 minimizes

F̃(Y, U) = F(Y, U) + ∫_0^T (p(t)[Y, U] + μ(t)⟨Y, U⟩) dt

on 𝒟 [uniquely], it follows that (Yo, Uo) will minimize F(Y, U) on 𝒟 [uniquely] under (a) and (b), provided that for t ∈ [0, T]:

(a′) [Yo, Uo] = 0; and
(b′) μ ≥ 0 with μ⟨Yo, Uo⟩ = 0;    (3)

(except possibly at a finite set of values of t).
Indeed, then ⟨Y, U⟩ ≤ 0 ⇒ μ⟨Y, U⟩ ≤ 0 = μ⟨Yo, Uo⟩, so that

F(Y, U) ≥ F̃(Y, U) ≥ F̃(Yo, Uo) = F(Yo, Uo).
For the vector-valued versions of (a) and (b) we simply add additional terms to f̃, each with its own multiplier function as in §2.3, resulting in a new integrand of the form

f̃ = f + P·[ ] + M·⟨ ⟩,

which can then be subjected to a similar analysis.
Of course, if the new integrand f̃ is convex in the sense of this book, then minimization of F can, in general, be obtained from a (Yo, Uo) ∈ 𝒟 which satisfies the Euler–Lagrange equations for f̃ together with the corresponding Weierstrass–Erdmann conditions of §7.5 at any corner points.
Also, sufficient strong convexity will guarantee uniqueness of the minimization, and this requires only convexity of each term in f̃ plus strong convexity of one of these terms. Moreover, as in §7.4, these arguments remain valid even when the terms of f̃[Y(t), U(t)] are only piecewise continuous, as can be seen by the usual partitioning of the integrals.
When the target time T is not fixed, it may be possible to solve the
problem as if it were, and then optimize over T as was done with the perfor-
mance problem in §3.5. If this is not possible-and it cannot be for time
optimal problems in which it is this T itself being minimized-then there
may be a transformation which replaces the problem by a (convex) one
over a fixed interval in some other independent variable. If this fails, then
there are certain other sufficiency theorems including some of the field theory
type of Chapter 9, but they are usually more difficult to implement.
Note also that the above simple device of setting Ẏ = U transforms most problems in variational calculus considered in the previous chapters into
those which have the appearance of problems in optimal control. This
supplies a convenient source of counterexamples.
For example, it is easily verified by the techniques of Chapter 3, that
the convex function
is minimized uniquely on

𝒟 = {y ∈ C¹[0, 1]: y(0) = 1, y(1) = e}

by yo(t) = e^t. Hence

uo(t) = ẏo(t) = e^t

provides the unique optimal control for minimizing

F(y, u) = ∫_0^1 (u² + 3(t − 1)y³u) dt − 1.
The methods employing convexity may extend also to a problem such as the
following of Bolza type in which the performance integral is augmented by a
function of the endpoint values.
¹ The requirement that y(t) ≥ 0 is superfluous for this problem. From the state equation, we see that ẏ ≥ −y when u ≥ −1, on any interval [0, t] in which y ≥ 0. But then y(t) ≥ y(0)e⁻ᵗ, so this interval cannot terminate.
Thus we want μẏ to be continuous with (μẏ)˙ = y except at corner points, and μ(ẏ² − 1) ≡ 0, with y(0) = 1 and ẏ(2) + μ(2)ẏ(2) = 0.
Suppose μ(0) > 0, so that either ẏ(0) = +1 or ẏ(0) = −1. In the former case μẏ increases (why?), so that μ can never vanish and ẏ(t) = +1; therefore y(t) = t + 1, but this violates the terminal condition. In the latter case, ẏ(t) = −1, and both y(t) = 1 − t and μ(t) = (1 − t)²/2 + c decrease until μ vanishes and permits a corner point at t = t₁, say. For example, if c = 0, then t₁ = 1, y(t₁) = 0, and we could take μ(t) = ẏ(t) = 0 for t > 1, since this satisfies all requirements. Thus, one possible solution is to take

yo(t) = 1 − t and μ(t) = (1 − t)²/2 for t ≤ 1;
yo(t) = 0 and μ(t) = 0 for t ≥ 1;
but are these the only choices? Observe that

F(yo + v) − F(yo) − δF(yo; v) = v²(2) + ∫_0^2 [v̇²(t) + μ(t)v̇²(t)] dt = 0 iff v ≡ 0.

Thus F(yo + v) = F(yo) iff v ≡ 0, so that the solution yo is unique, and

uo = ẏo = −1 on (0, 1);   uo = 0 on (1, 2);

is the unique optimal control.
Here the optimal control uo, although discontinuous, is not of the bang-bang type, since uo = 0 is not on the boundary of the control region 𝒰 = [−1, 1]. (However, it would be of this type on the smaller control region 𝒰 = [0, 1].)
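A minimal numerical check of the example above can be made with a few lines of code (a sketch only; Python with NumPy is assumed, and the piecewise formulas are those displayed for yo, uo, and μ):

```python
import numpy as np

# Candidate solution and multiplier from the worked example above
def y0(t):  return np.where(t <= 1, 1 - t, 0.0)        # y0 = 1 - t, then 0
def u0(t):  return np.where(t < 1, -1.0, 0.0)          # u0 = y0' = -1, then 0
def mu(t):  return np.where(t <= 1, (1 - t)**2 / 2, 0.0)

t = np.linspace(0, 2, 2001)
assert np.allclose(mu(t) * (u0(t)**2 - 1), 0)    # mu * (y'^2 - 1) == 0 pointwise
assert y0(0) == 1                                # initial condition y(0) = 1
assert abs(u0(2) + mu(2) * u0(2)) < 1e-12        # terminal condition at t = 2
assert abs(mu(1.0) - mu(1.0 + 1e-9)) < 1e-6      # mu is continuous at the corner
```

Each assertion corresponds to one of the stated requirements; the continuity check confirms that μ joins smoothly across the corner at t = 1.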
In order to find the minimum time T required to transfer this system from its initial state at the origin (X(0) = 𝒪) to a prescribed state (X(T) = B = (b, b₁)), we can transform this problem to one on a fixed interval as follows: Under admissible motions we have

1 = u² + u₁² = (ẋ − 1)² + ẋ₁²,

or 2ẋ = ẋ² + ẋ₁² ≥ 0, which implies, since x(0) = 0 and x(T) = b, that

b = ∫_0^T ẋ(t) dt ≥ 0,    (5)

and only such b can be permitted. Moreover, b = 0 ⇒ ẋ ≡ 0 on (0, T) (why?), so that ẋ₁ ≡ 0 as well. Therefore b₁ = x₁(T) = x₁(0) = 0, and the problem is trivial.
Thus for b > 0, we shall assume that ẋ > 0, and consider x as the new independent variable, while y = t(x) and y₁(x) = x₁(t(x)) will be the new state variables, governed by the new state equation

y₁′² = 2y′ − 1    (6)

on the fixed interval [0, b]. (Why?) Since

ẋy′ = 1,   y₁′ = ẋ₁y′,

we shall permit discontinuities in y₁′ provisionally. Then we have the simple
convex problem of minimizing
It follows that the minimum value so obtained is the desired minimum time T₀ required to reach state B under the mild restriction that ẋ > 0; it is achieved uniquely with the linear trajectory X₀(t) = (bt/T₀, b₁t/T₀).
where β is a given positive constant, and m = m(t) is the mass of the rocket and fuel at time t. Then −ṁ, the rate of fuel consumption, is nonnegative, controllable, and limited by the design of the engine; its effect in providing thrust is clearly visible in (8).
Since m(t) ≥ m_R, the mass of the rocket alone, we can divide (8) by m and obtain

ÿ = u − g,    (9)

where u = −βṁ/m ≥ 0 is a thrust control with, say, u(t) ≤ β; g is the gravitational acceleration, which we still assume constant. Note that (9) is identical to the dynamic law used in the earlier analysis, but our new model admits more realistic applications.
For example, if the initial mass of the fueled rocket is M o, what is the
maximum altitude which can be reached by this rocket with the consumption
of a fixed mass M of fuel during the first stage of ascent, and how should the
fuel be burned to achieve it?
For each program m(t) of fuel consumption over the fixed time interval [0, T], with m(T) = M₀ − M, there is a control u ∈ C[0, T], with

y(T) = ∫_0^T (T − t)ÿ(t) dt.

If we ignore the control constraints 0 ≤ u(t) ≤ β and examine, for constant λ, the modified integrand

f̃(t, u) = (t − T)u + λu

(which is convex), for that u₀ which makes 0 = f̃_u[u₀(t)] = t − T + λ, we see that this is not feasible. (Why?) Hence the inequality constraints on u appear to be an essential feature of this problem.
To take them into account most effectively, observe that the pair of inequalities 0 ≤ u ≤ β is equivalent to the single quadratic inequality u(u − β) ≤ 0, since the possibility of u < 0 < u − β is untenable.
Then, according to the approach taken in §10.1, we should consider the modified integrand

f̃(t, u) = (t − T + λ)u + μ(t)(u² − βu),

which is strongly convex on [0, T] × ℝ when μ ∈ C[0, T] is positive.
If we can find λ, μ, and u₀ ∈ Ĉ[0, T] such that

0 = f̃_u[u₀(t)] = (t − T + λ) + μ(t)(2u₀(t) − β),    (11a)

while

μ(t)u₀(t)[u₀(t) − β] = 0 with μ > 0,    (11b)

and (10a) is satisfied, we have a unique solution. (Why?) Observe that by (11a), μ(t) = 0 only when t = τ ≝ T − λ for λ < T, so that for any other t, (11b) requires u₀(t) = 0 or β. But then μ > 0 in (11a) requires that

u₀(t) = β, t ∈ (0, τ);   u₀(t) = 0, t ∈ (τ, T).
Finally, it remains to select λ, if possible, to make

y_max(T) = ∫_0^T (T − t)(u₀(t) − g) dt = β ∫_0^τ (T − t) dt − gT²/2
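The burn-then-coast structure can be checked numerically; in the sketch below the values of β, g, τ, and T are illustrative assumptions, and the closed form is the one displayed above:

```python
import numpy as np

beta, g, T, tau = 3.0, 1.0, 2.0, 1.2      # illustrative values (assumptions)

def u0(t):                                 # burn at full rate beta, then coast
    return np.where(t < tau, beta, 0.0)

t = np.linspace(0.0, T, 200001)
# y(T) = int_0^T (T - t) * ydotdot dt with ydotdot = u0 - g and y(0) = ydot(0) = 0
y_T = np.trapz((T - t) * (u0(t) - g), t)
closed_form = beta * (T * tau - tau**2 / 2) - g * T**2 / 2
assert abs(y_T - closed_form) < 1e-4
```

The agreement confirms that the altitude formula collapses to β(Tτ − τ²/2) − gT²/2 for the bang-bang program u₀.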
ẏ(t) = u(t)y(t),
allows the production rate to increase directly with the amount available for
investment.
This is clearly of the form which identifies it as a problem in optimal
control. However, the product u(t)y(t) which appears in the integrand makes
convexity arguments difficult. Fortunately, in this case, we can use the con-
straint to replace the product.
Moreover, since y ≥ 0, we can replace the control restriction 0 ≤ u ≤ 1 by the Lagrangian inequalities 0 ≤ ẏ(t) ≤ y(t), and we obtain the simpler problem of minimizing

F(y) = ∫_0^T [ẏ(t) − y(t)] dt
§10.2. Sample Problems 353
on

𝒟 = {y ∈ C¹[0, T]: y(0) = a₁, y ≥ 0}

under the pair of Lagrangian inequalities

ẏ(t) − y(t) ≤ 0,   −ẏ(t) ≤ 0,

expressed in the most useful form.
According to our general analysis in §7.4, we should introduce Lagrangian multiplier functions λ(t), μ(t), and try instead to minimize the modified integral

F̃(y) = F(y) + ∫_0^T [λ(t)(ẏ(t) − y(t)) − μ(t)ẏ(t)] dt.    (12)
λ(t) = −1 + (T − τ)e^{τ−t}, t ≤ τ;   λ(t) = 0, τ ≤ t ≤ T,
are nonnegative and continuous as required. Observe that in this latter case,
the optimal control function is bang-bang:

u₀(t) = ẏ₀(t)/y₀(t) = 1, t < τ;   u₀(t) = 0, τ < t ≤ T;

it is discontinuous, with values entirely on the boundary extremes of the control region 𝒰 = [0, 1]. It dictates that initially all of the output should be used
to improve plant production capability, after which all material produced
should be sold for profit. In a real situation, the work force might object to
this "ideal" solution when T is large!
It remains to consider uniqueness of the optimal solution. What we have actually established is that if y ∈ Ĉ¹[0, T] with y(0) = yo(0) = a₁, then with v = y − yo:

F̃(y) − F̃(yo) = ∫_0^T [(1 + λ − μ)v̇ − (1 + λ)v](t) dt.

Hence F̃(y) = F̃(yo) = F(yo), and under the required inequalities 0 ≤ ẏ ≤ y, since λ and μ are continuous and nonnegative, it follows that

F(y) = F(yo) iff λ(ẏ − y)(t) ≡ μẏ(t) ≡ 0.

However, when T ≤ 1, μ ≠ 0, so that ẏ ≡ 0 or y(t) = y(0) = yo(t); and similarly for T > 1, we conclude that

ẏ = y, 0 ≤ t ≤ τ;   ẏ = 0, τ ≤ t ≤ T,

which again means that y(t) ≡ yo(t). (Why?)
Thus we have obtained the unique solution yo to the given problem, and hence the unique optimal control uo. A related application, which arises in the fishing industry, is explored in Problem 10.29.
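The bang-bang policy can be compared against alternatives by direct simulation of ẏ = uy; the sketch below (Python assumed; the horizon T and the initial stock a₁ are illustrative values) uses the switching time τ = T − 1 implied by λ(τ) = 0:

```python
import numpy as np

T, a1 = 3.0, 1.0                 # illustrative horizon and initial stock (assumptions)
tau = T - 1.0                    # switching time from lambda(tau) = 0, i.e. T - tau = 1

def profit(u_func, n=100000):
    """Integrate ydot = u*y by forward Euler; return profit int_0^T (1 - u) y dt."""
    t, dt = np.linspace(0, T, n, retstep=True)
    u = np.array([u_func(s) for s in t])
    y = np.empty(n); y[0] = a1
    for i in range(n - 1):
        y[i + 1] = y[i] + dt * u[i] * y[i]
    return np.trapz((1 - u) * y, t)

bang = profit(lambda t: 1.0 if t < tau else 0.0)   # invest fully, then sell fully
for other in (lambda t: 0.0, lambda t: 1.0, lambda t: 0.5):
    assert profit(other) <= bang + 1e-3
```

The constant policies all yield strictly less profit than investing everything until τ and selling everything thereafter, in line with the analysis.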
as can be verified by using Leibniz' formula (A.14) to differentiate the integral at each point of continuity of u.
In particular, if u ≡ +1, then the substitution s = t − τ gives
under (13), we cannot use the above approach as successfully, since |yu| is not as easy to analyze. From (14) follows

ẏ(t) = ∫_0^t cos(t − τ)u(τ) dτ.    (16)
Moreover, for t ≤ T ≤ π/2, both sin t and cos t are nonnegative, and we see that when |u| ≤ 1, maximal energy E₀(T) is achievable with either of the controls u₀ ≡ +1 or u₀ ≡ −1. When T > π/2, this maximal energy problem is significantly more difficult than its predecessors. (A related time-optimal problem is solved completely in [Y].) We shall return to it in §11.1 after we have developed a more general approach for attacking it.
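For T ≤ π/2 the claim is easy to probe numerically: integrate ÿ + y = u from rest and compare the terminal energy for u ≡ ±1 with a few competing controls (a sketch only; the competitors chosen are assumptions):

```python
import numpy as np

T, n = np.pi / 2, 100001
t, dt = np.linspace(0.0, T, n, retstep=True)

def terminal_energy(u):
    """Integrate ydotdot + y = u from rest; return E(T) = (y^2 + ydot^2)/2."""
    y, ydot = 0.0, 0.0
    for i in range(n - 1):
        y, ydot = y + dt * ydot, ydot + dt * (u[i] - y)   # forward Euler
    return 0.5 * (y**2 + ydot**2)

E_plus = terminal_energy(np.ones(n))           # u = +1
E_minus = terminal_energy(-np.ones(n))         # u = -1
assert abs(E_plus - E_minus) < 1e-6            # the two extremes tie
for u in (np.zeros(n), 0.5 * np.ones(n), np.sin(t)):
    assert terminal_energy(u) <= E_plus + 1e-4
```

With u ≡ 1 the exact solution is y = 1 − cos t, so E₀(π/2) = 1, and every competitor tried here falls short of it.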
For the present, we shall convert it into the standard form by introducing the new variable y₁ = ẏ, so that

ẏ = y₁,   ẏ₁ = u − y,    (18)

and then minimize

F(Y, u) = −∫_0^T y₁u dt

on

𝒟 = {(Y, u) ∈ C¹[0, T] × C[0, T], with Y(0) = 𝒪 and |u(t)| ≤ 1},

under the new linear state equation (18), which takes the matrix form

Ẏ = [0 1; −1 0] Y + u [0; 1].
We see that this problem is autonomous, and that the terminal state Y(T)
is unspecified.
with |Y(0)| = |A|, |Y(T₀)| = 0, then by the same argument, T₀ will at least be the (unique) minimum time of transfer. Now we can surely do this if, say,

Y·Ψ(t, Y) = 0,    (22)

since then (21) is just (d/dt)|Y| = −1, with the trivial solution

|Y(t)| = |Y(0)| − t = |A| − t, so that T₀ = |A|.    (23)

Moreover, the optimal control U₀(t) is that which opposes the (unit) state vector Y(t)/|Y(t)| at each subsequent instant. (This provides an illustration of control synthesis: a state-dependent prescription for the optimal control at each instant.)
Condition (22) is realized in the problem where a spinning fully asymme-
trical body (say, a satellite) in space is to be brought to (spin) rest by the
application of a (vector) torque U.
Figure 10.1
Indeed, if the body has distinct moments of inertia Iⱼ about three body-fixed principal axes through its center of mass, as illustrated in Figure 10.1, then the associated angular momentum vector Y = (y₁, y₂, y₃) is governed by Euler's equations. (See [Ce].) The first requires gyroscopic coupling due to the assumed asymmetry I₂ ≠ I₃; it is

ẏ₁ = ((I₂ − I₃)/(I₂I₃)) y₂y₃ + u₁,

and the rest are obtained by cyclic permutation of the indices (1, 2, 3). Since the inertial coefficients are constant, it is easy to see that (22) requires only that

(I₂ − I₃)/(I₂I₃) + (I₃ − I₁)/(I₃I₁) + (I₁ − I₂)/(I₁I₂) = 0,

and the sum of the first two terms does cancel the last.
It follows from (23) that if all control torques |U| ≤ 1 are available, then the minimal time of transfer from a spin state A is T₀ = |A|, and it is given by that control U₀(t) = −Y(t)/|Y(t)| which opposes the spin-state direction at each subsequent instant. For further discussion of this problem when one axis of symmetry (I₂ = I₃) is allowed, see [A-F].
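A short simulation illustrates the synthesis: with illustrative inertias (an assumption), the gyroscopic terms are orthogonal to Y, so |Y| decays at unit rate under the opposing control:

```python
import numpy as np

I1, I2, I3 = 1.0, 2.0, 3.0               # distinct moments of inertia (illustrative)

def gyro(Y):
    """Gyroscopic terms of Euler's equations for the angular momentum Y."""
    y1, y2, y3 = Y
    return np.array([(I2 - I3) / (I2 * I3) * y2 * y3,
                     (I3 - I1) / (I3 * I1) * y3 * y1,
                     (I1 - I2) / (I1 * I2) * y1 * y2])

A = np.array([0.6, 0.8, 0.3])            # initial spin state (illustrative)
Y, t, dt = A.copy(), 0.0, 1e-4
while np.linalg.norm(Y) > 0.5:           # stop partway to rest
    U = -Y / np.linalg.norm(Y)           # control opposing the spin direction
    Y = Y + dt * (gyro(Y) + U)
    t += dt

# Since Y . gyro(Y) = 0, d|Y|/dt = -1, i.e. |Y(t)| = |A| - t
assert abs((np.linalg.norm(A) - t) - 0.5) < 1e-2
```

The final assertion checks the linear decay law (23) along the simulated trajectory.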
In addition to (22), other conditions may lead to a solution by this method in related problems where it is desired to drive a system from a given state A to the origin. Some of these are explored in Problems 10.7-10.9, while time-optimal problems for different control regions are investigated in §11.2. Observe, however, that the fundamental device utilized in the above analysis was that of choosing at each instant t an admissible control Uo(t) which minimizes a certain expression related to the given functions. The development of a suitable generalization (the Pontryagin principle) is taken up in §10.4.
§10.3. Sufficient Conditions Through Convexity 359
F(Y, U) = ∫_0^T f[Y(t), U(t)] dt

on 𝒟, under the state equation;

(d/dt) f̃_U̇ = f̃_U,  or  𝒪 = h_U(t, Y, U).    (26b)
(10.1) Theorem (Sufficiency). Suppose that D is an open set in ℝᵈ, 𝒰 ⊆ ℝᵏ, and h(t, Y, U) is [strictly] convex on [0, T] × D × 𝒰. Then each solution Yo, P ∈ C¹[0, T], Uo ∈ C[0, T], of the system (26a, b, c) minimizes F [uniquely] on the set of those (Y, U) ∈ 𝒟 for which

P(t)·(G[Y(t), U(t)] − Ẏ(t)) ≥ 0.    (27)
are of interest, we obtain minimization under the stated conditions among
those Y which satisfy Y(t) = G[Y(T), U(t)]. However, our formulation of
Theorem 10.1, permits consideration of certain state differential inequalities
provided that corresponding multipliers P(t) can be found with components
Pj of the correct signs. With further modifications of the integrand j by
addition of terms such as p.(t)I/J(t, Y, U), it is straightforward to formulate
corresponding theorems which permit such state-control inequalities as
I/J(t, Y(t), U(t)) :5: O. (See Problem 10.18.)
for appropriate matrix functions A and B, together with a "quadratic" performance function of the type associated with energy assessments:

f(t, Y, U) = U·Q(t)U + Y·S(t)Y.

For symmetric positive definite matrices Q and S with elements in C[0, T], f(t, Y, U) will be strictly convex on [0, T] × ℝ^{d+k}. (Why?)
Then

h(t, Y, U) = f(t, Y, U) + P(t)·(A(t)Y + B(t)U)

will be strictly convex on [0, T] × ℝ^{d+k} for any P in C¹, and the equations (26a) and (26b) are, respectively,

−Ṗ(t) = Y(t)S(t) + P(t)A(t),    (28a)
𝒪 = U(t)Q(t) + P(t)B(t),    (28b)

where P, Y, and U are row vectors to be determined to satisfy these equations together with the state equation and with certain initial/target conditions on Y, P.
Since Q is invertible (§0.13), we can rewrite (28b) as

U = −PBQ⁻¹,    (29)

and substitute into the state equation to obtain the first-order system which we abbreviate

Ẏ = AY − BQ⁻¹B̃P.    (30)
(33b), a Ỹ which will be an admissible Y(T) for the system (32), and hence provide a solution to the control problem. We conclude that P(T) = 𝒪 = M_l Λ, or, since M_l is of maximal rank l ≤ d, that Λ = 𝒪; i.e., M_{l0} is invertible.
Since M_{l0} is invertible, we can always solve (33c) for Λ, and find Ỹ from (33b), resulting in a compatible P(0) from (33a) which satisfies the system (32) with Y(T) = Ỹ. The resulting Yo(t), Uo(t) provide the unique solution to the control problem.
For other linear systems and split-boundary conditions, such arguments may not be possible; and even when they prevail, there is in general no closed method for obtaining the desired solutions to such two-point boundary value problems. For a survey of the required numerical "shooting" methods, see [Ke]. We note, however, that in principle the fundamental matrix W_T can be approximated quite accurately for explicit linear systems which are not too large, and thus used to obtain P(0) as an explicit solution to a set of simultaneous linear equations, from which follows
(10.2) Theorem. Let A, B, Q, and S be given matrix functions with elements in C[0, T], of sizes d × d, d × k, k × k, and d × d, respectively, where at each t, Q(t) is symmetric positive definite while S(t) is symmetric positive semidefinite. Then there exists a unique optimal control Uo ∈ C¹[0, T] and trajectory Yo ∈ C¹[0, T] that minimizes

F(Y, U) = ∫_0^T [U(t)·Q(t)U(t) + Y(t)·S(t)Y(t)] dt

under the state equation Ẏ(t) = A(t)Y(t) + B(t)U(t), on:

(ii) 𝒟_T = {(Y, U) ∈ C¹[0, T] × C[0, T], with Y(0) prescribed};
(iii) 𝒟₀ = {(Y, U) ∈ 𝒟_T with Y(T) prescribed};
(iv) 𝒟_M = {(Y, U) ∈ 𝒟_T with M_l Y(T) = −L},

where L ∈ ℝˡ is given and M_l is a given matrix of maximal rank l ≤ d.
PROOF. Only the weakened hypothesis on S requires comment. The integrand

f(t, Y, U) = U·Q(t)U + Y·S(t)Y

remains convex, and is in fact semi-strongly convex in that

f(t, Y, U) = f(t, Yo, Uo) ⇒ U = Uo (why?).
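In the scalar case (d = k = 1) the system (28)-(30) reduces to a single backward Riccati sweep. The sketch below uses standard column-vector conventions and illustrative coefficients (all assumptions); it constructs the feedback control and checks that perturbing it cannot lower the cost:

```python
import numpy as np

a, b, s, q, T, y0 = -0.5, 1.0, 1.0, 1.0, 2.0, 1.0   # illustrative scalar data
n = 20001
t, dt = np.linspace(0.0, T, n, retstep=True)

# Backward sweep: -Kdot = 2aK - (b^2/q)K^2 + s, K(T) = 0 (free right endpoint)
K = np.zeros(n)
for i in range(n - 1, 0, -1):
    K[i - 1] = K[i] + dt * (2 * a * K[i] - (b**2 / q) * K[i]**2 + s)

def cost(perturb):
    """Cost int (s y^2 + q u^2) dt along ydot = a y + b u, u = -(b/q) K y + perturb."""
    y, J = y0, 0.0
    for i in range(n - 1):
        u = -(b / q) * K[i] * y + perturb(t[i])
        J += dt * (s * y**2 + q * u**2)
        y += dt * (a * y + b * u)
    return J

J_opt = cost(lambda t: 0.0)
for extra in (lambda t: 0.1, lambda t: 0.2 * np.sin(3 * t)):
    assert cost(extra) >= J_opt - 1e-6
```

This is the same structure as the two-point boundary value problem above, except that the backward sweep replaces shooting; for larger linear systems one would instead approximate the fundamental matrix as described in the text.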
§10.4. Separate Convexity and the Minimum Principle 365
(10.4) Proposition. Suppose that for some Uo ∈ C[0, T] and P ∈ C¹[0, T], h(t, Y, Uo(t)) is convex on [0, T] × D × 𝒰, where D is open in ℝᵈ and 𝒰 ⊆ ℝᵏ; then each Yo ∈ C¹[0, T] which satisfies the adjoint equation

Ṗ(t) = −h_Y(t, Y(t), Uo(t))    (34)

makes h₀(t) ≝ h(t, Yo(t), Uo(t)) continuous on [0, T], and constant in the autonomous case if, additionally,
on (0, T).
PROOF. With f̃(t, Y, U, Z) = h(t, Y, U) − P(t)·Z, it follows that for Y ∈ C¹[0, T],

F(Y, Uo) − F(Yo, Uo) ≥ ∫_0^T {h_Y[Yo(t), Uo(t)]·(Y(t) − Yo(t)) − P(t)·(Ẏ(t) − Ẏo(t))} dt

if Y(0) = Yo(0) and Y(T) = Yo(T). Hence Yo satisfies the second Euler–Lagrange equation for f̃, which in the integral form given in §7.5 is that

h₀(t) = f̃[Yo(t), Uo(t)] − f̃_Z[Yo(t), Uo(t)]·Ẏo(t)
Figure 10.2
Discontinuities in Uo are limited both in location and values to those for which

h(t, Yo(t), Uo(t−)) = h(t, Yo(t), Uo(t+)).
In the context of the previous proposition, suppose that for each non-corner point t ∈ [0, T],

h_U(t, Yo(t), Uo(t)) = 𝒪.    (37)

Then under convexity of h(t, Yo(t), U) on 𝒰, it follows by §0.8 that this function of U is minimized at Uo(t), or that

min_{U ∈ 𝒰} h(t, Yo(t), U) = h₀(t) = h(t, Yo(t), Uo(t));    (38)

moreover, under continuity of h₀, this must hold for all t, since then the equivalent inequality

h(t, Yo(t), U) ≥ h₀(t),   ∀ U ∈ 𝒰,

persists. (Why?)
1 In fact, the first version of the principle appeared in 1950 in a then little-known work of
Hestenes (see [He]).
is defined, then we have the following result first considered by Arrow and
Kurtz. (For some generalizations, see [S-S].)
and this can hold for all Y near Yo, iff the derivative vectors on each side are
identical. D
(10.6) Remarks. Since the form of h* depends on the unknown adjoint func-
tion P, it is not easy to predict that h* will exhibit the desired convexity.
However, it clearly does so when
h(t, Y, U) = h'(t, Y) + h"(t, U) (43)
and h′(t, Y) is [strictly] convex on D.
Example 2. To minimize

F(y, u) = ∫_0^1 [y² − y − (1 − u)²](t) dt

on

𝒟 = {y ∈ C¹[0, 1]; u ∈ C[0, 1]: y(0) = 0}

under the state equation ẏ = u, where u(t) ∈ 𝒰 = [0, 2], we note that the problem is autonomous. Moreover, if p ∈ C¹[0, 1], then

h(t, y, u) ≝ y² − y − (1 − u)² + p(t)u = h′(y) + h″(t, u),    (44)

where h′(y) = y² − y is strictly convex, but

h″(t, u) = −(1 − u)² + p(t)u

is not convex.
In fact, h″(t, u) is a concave parabolic function of u, so that for 0 ≤ u ≤ 2 it is minimized at u = u₀ where u₀ = 0 or u₀ = 2. Thus, by Proposition 10.4, if y₀ ∈ C¹[0, 1] and ẏ₀ = u₀, then

h₀ = y₀² − y₀ − (1 − u₀)² + pu₀ = c₀, a constant,

when p satisfies the adjoint equation

−ṗ = h_y = 2y₀ − 1 (with p(1) = 0).    (45)

Since (1 − u₀)² = 1 when u₀ = 0 or 2, we see that y₀² − y₀ + pu₀ = c₀ + 1 is also constant, and we conclude that switching from u₀ = 0 to u₀ = 2 can occur only at points where p vanishes. Finally, we know that if it exists, the minimizing y₀ with y₀(0) = 0 is unique. Hence we can first try the simplest possibilities.
Case 1. u₀(t) ≡ 0. Then ẏ₀ = u₀ = 0, so that y₀(t) = y₀(0) = 0, and by (45), ṗ = 1 with p(1) = 0, so that p(t) = t − 1 ≤ 0. However, from (44) we see that for t < 1,

h(t, y₀, u) = −(1 − u)² + (t − 1)u

is minimized only when u = 2, and has the minimum value −1 + 2(t − 1) ≠ h₀(t), since h₀(t) is constant. This solution does not satisfy the minimum principle (38). It may be the optimal solution, but we cannot use our proposition to verify its optimality (or that of the other simple case u₀(t) ≡ 2). (See Problem 10.22.)
Case 2. We must look for a solution with at least one switching point a, say, where p vanishes. Since p(1) = 0, it is simplest to try p(t) = 0 for t ≥ a. Then 2y₀(t) − 1 = −ṗ = 0, or y₀(t) = 1/2, so that u₀(t) = ẏ₀(t) = 0 for t ≥ a. But then, for t ≤ a, we should try u₀(t) = 2, so that y₀(t) = 2t, and continuity of y₀ at a requires that a = 1/4. Finally, for t ≤ 1/4: ṗ(t) = −2y₀(t) + 1 = −4t + 1, so that p(t) = −2t² + t + c, and p(1/4) = 0 ⇒ c = −1/8, so that p(t) = −2(t − 1/4)² ≤ 0. Now for t ≤ 1/4, h(t, 2t, u) is minimized when u = u₀ = 2 (since p ≤ 0), and it has the minimum value h₀(t) = −5/4. Similarly, for t ≥ 1/4: h(t, 1/2, u) = (1/2)² − 1/2 − (1 − u)² is minimized when either u = 2 or u = u₀ = 0, and it has the minimum value h₀(t) = −5/4. Therefore, for all t ∈ [0, 1] this y₀ satisfies the minimum principle. Consequently, the unique solution to our problem is given by

y₀(t) = 2t and u₀(t) = 2 for t < 1/4;   y₀(t) = 1/2 and u₀(t) = 0 for t > 1/4,

and the optimal control is bang-bang.
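Direct discretization confirms the conclusion: evaluating F for the claimed control and several competitors (a crude sketch; the grid and the particular competitors are assumptions) shows the switch at t = 1/4 beating the alternatives:

```python
import numpy as np

n = 100001
t, dt = np.linspace(0.0, 1.0, n, retstep=True)

def F(u):
    """F(y, u) = int_0^1 [y^2 - y - (1 - u)^2] dt with ydot = u, y(0) = 0."""
    y = np.concatenate(([0.0], np.cumsum(u[:-1]) * dt))   # forward Euler for y
    return np.trapz(y**2 - y - (1 - u)**2, t)

u_opt = np.where(t < 0.25, 2.0, 0.0)                      # switch at t = 1/4
F_opt = F(u_opt)
competitors = (np.zeros(n), 2 * np.ones(n), np.ones(n),
               np.where(t < 0.5, 2.0, 0.0))               # wrong switching time
for u in competitors:
    assert F(u) >= F_opt + 1e-3
```

The exact minimum value works out to F_opt = −1/24 − 3/16 − 1 = −59/48, and every competitor is strictly worse.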
PROBLEMS
with p, μ continuous and μ ≥ 0 on [0, 2], such that

μ(t)(u₀²(t) − 1) ≡ 0.    (47)

(b) Examine the convexity of the associated integrand f̃(t, u, μ).
(c) Consider Δ = F(y, u) − F(y₀, u₀), and integrate the p(ẏ − ẏ₀) term by parts to show that if y also meets the above conditions, then
Y(O) = (1),
F(u) = ∫_0^100 dx / [x + (25 + x/4)u(x)]

on C[0, 100], with

G(u) = ∫_0^100 u(x) dx / [x + (25 + x/4)u(x)] = 1,

where 0 ≤ u(x) ≤ 1, show that it suffices to find u₀ which minimizes F + λG, with u₀(x) = 1 for x < ξ and u₀(x) = 0 for x > ξ, in order to have μ ≥ 0. Show that G(u₀) = 1 gives ξ = 20(e^{5/4} − 1), so that λ = 25/ξ + 1/4 is known.
(c)* For convexity, show that f̃_uu ≥ 0 when u ≥ 0. (This is immediate when x < ξ; for x > ξ, it is essential to use the fact that μ(x) = (xλ₁ − 25)/x².)
(d) Observe that the μ obtained in (b) is continuous, and conclude (through (3.10)) that u₀ is indeed the unique optimal "control" for this problem (which is given a physical origin in Problem 11.26).
10.16. Under the conditions of Problem 10.13, find P(2) to minimize
F*(Y, u) = |Y(2) − (5, 2)|² + F(Y, u).
10.17. (a) Transform the problem of minimizing
with, say, y(0) = ẏ(0) = 0, y(2π) = 1, into one for which Theorem 10.1 applies. Does Theorem 10.2 apply? Explain.
(b) Find an optimal trajectory for this problem. Is it unique? Hint: Eliminate u and transform to a Bolza problem with (ÿ + y)(2π) = 0.
10.18. (a) Explain how the hypotheses of Theorem 10.1 should be modified to admit a single state-control inequality such as

ψ(t, Y(t), U(t)) ≤ 0, where ψ is C¹.
with given y(O) and y(T) = ll!, show that we should choose
(b) This problem has a simple graphical interpretation. Use the result of
Problem 10.25 to contrast what happens when instead (e^y − y₁)(2) = 2.
10.27. (Optimal oil production.) An offshore oil field estimated to contain K barrels
is to be pumped out by w identical rigs over a large number of days, T. Each
oil rig costs R dollars to install and has a maximum production rate of m
barrels per day of oil that will sell at a fixed price of D dollars per barrel. At
discount rate r, the company should control both w and v = v(t), the fraction of its total productive capacity in use at time t, to maximize
§11.1. Necessity of the Minimum Principle 379
was also the case for the fixed interval problem in the variational calculus
(Chapters 3 and 6).
Specifically, we shall prove the necessity of the minimum principle, which
asserts that optimality of a control Uo guarantees its pointwise minimization
of some related function h, in that
h(t, Yo(t), U) ≥ h(t, Yo(t), Uo(t)),   ∀ U ∈ 𝒰,    (1)

where 𝒰 is the control set, and Yo is an associated optimal state function. Moreover, we expect that for performance and state functions f and G, h might take the form

h(t, Y, U) = f(t, Y, U) + P(t)·G(t, Y, U)    (2)

for some function P to be determined, but which probably satisfies the adjoint equation

Ṗ(t) = −h_Y(t, Yo(t), Uo(t))    (3)

along the optimal trajectory, with related target conditions.
Now, under classical assumptions that 𝒰 is open and h is differentiable in U, (1) implies that h_U(·, Yo, Uo) ≡ 𝒪. Then the minimum principle is just the Weierstrass necessary condition (7.15) for the modified integrand

f̃(t, Y, U, Ẏ) = h(t, Y, U) − P(t)·Ẏ,    (4)

if we think temporarily of U as U̇. Indeed, the relevant combination, for W = (U, V) ∈ ℝ^{k+d}, is the excess function given by

f̃(·, Yo, U, V) − f̃(·, Yo, Uo, Ẏo) − f̃_{(U, Ẏ)}(·, Yo, Uo, Ẏo)·(W − (Uo, Ẏo))
  = h(·, Yo, U) − h(·, Yo, Uo) − P·(V − Ẏo) + P·(V − Ẏo)
  = h(·, Yo, U) − h(·, Yo, Uo)   (assuming that h_U(·, Yo, Uo) = 𝒪)
the resultant states at later times. We use this in part (b) to establish the
minimum principle for autonomous problems on a fixed interval (albeit with
a slightly different h), and apply our results to the oscillator energy problem
from §10.2(f). Finally in part (c) we transform the general problem into the
case covered in part (b) and give illustrative applications which indicate the
techniques employed when using these conditions, as well as some of the
difficulties encountered in doing so.
We will sometimes refer to a state Y E C1 [0, T] as a state trajectory.
We may also assign a vector-valued function or a matrix-valued function any
smoothness property common to its elements.
Figure 11.1
equation Ẏ = G(Y, U_ε) on (0, T), with Y_ε(0) = Yo(0). Indeed, when t ≤ τ′, we just take Y_ε(t) = Yo(t). On [τ′, τ], the small interval of replacement, we can solve the equation Ẏ(t) = G(Y(t), V) with Y_ε(τ′) = Yo(τ′), by the methods of §A.5, and obtain
For small ε, Y_ε(τ) will be sufficiently near Yo(τ) to permit the embedding methods of Theorem A.18 to be applied to obtain the solution Y_ε on the successive subintervals of continuity of Uo over [τ, T]. In particular, the resulting target value Y_ε(T) is known, and it can be compared with Yo(T).

(11.1) Lemma. There are G-dependent positive constants M and δ₀ such that on [τ′, τ],

σ(t) = |Y_ε(t) − Yo(t)| ≤ Mεc, when εc ≤ δ₀,    (10)

if, say, εc ≤ η₁ = δ₀, and we replace e^{M₀} with M. □
and used the preceding lemma to estimate the second integrand uniformly by some γε. The final expression in (11) collects the error terms associated with these approximations into some ξ(ε) that approaches 0 with ε (εc = τ − τ′ is just the length of [τ′, τ]).
Next, we note that on the remaining interval [τ, T] both Y_ε and Yo satisfy the same state equation

Ẏ = G(Y, Uo),

but with initial value Y(τ) = A, for different A.
By the arguments used in Theorem A.20, we know that as a function of A, the resulting solution Y(t, A), say, is differentiable with respect to the components of A, and when evaluated at A₀ = Yo(τ), W_τ ≝ Y_A(·, A₀) is the unique solution to the linear system

Ẇ = G_Y(Yo, Uo)W on (τ, T)    (13)

with W_τ(τ) = 𝕀_d (the d × d identity matrix), when the columns of G_Y are indexed by Y. It follows that since A = Y_ε(τ) is given by (11), then on [τ, T]

(d/dε) Y_ε(·)|_{ε=0} = Y_A(·, A₀)·(dA/dε)|_{ε=0} = c W_τ(·) ΔG(τ, V);

i.e.,

Y_ε(t) = Yo(t) + εc W_τ(t) ΔG(τ, V) + ε ξ(t, ε)    (14)

for some function ξ(t, ε) that approaches 0 as ε → 0. When t = T, let W_τ = W_τ(T) and ξ(ε) = ξ(T, ε), so that

Y_ε(T) = Yo(T) + εc W_τ ΔG(τ, V) + ε ξ(ε).    (14′)
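The first-order expansion (14′) can be seen numerically in a scalar example (a sketch with illustrative data, not from the text): for ẏ = G(y, u) = −y + u with u₀ ≡ 0, we have G_y = −1, so W(T) = e^{−(T−τ)} and ΔG = G(y₀(τ), V) − G(y₀(τ), 0) = V.

```python
import numpy as np

T, tau, c, V, y_init = 2.0, 1.0, 1.0, 3.0, 1.0    # illustrative needle data

def terminal_value(eps, n=200001):
    """Simulate ydot = -y + u, replacing u0 = 0 by V on [tau - eps*c, tau)."""
    t, dt = np.linspace(0.0, T, n, retstep=True)
    y = y_init
    for i in range(n - 1):
        u = V if (tau - eps * c <= t[i] < tau) else 0.0
        y += dt * (-y + u)
    return y

y0_T = terminal_value(0.0)
predicted = c * np.exp(-(T - tau)) * V            # c * W(T) * DeltaG from (14')
for eps in (1e-2, 1e-3):
    rate = (terminal_value(eps) - y0_T) / eps
    assert abs(rate - predicted) < 0.05 * abs(predicted)
```

As ε shrinks, the difference quotient (Y_ε(T) − Y₀(T))/ε approaches c W_τ ΔG(τ, V), exactly as (14′) asserts.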
At this point, we digress to obtain a simple result.
forms an infinite pyramid in ℝ^d with apex at 𝒪, and "edges" in the directions
of the Xᵢ as indicated in Figure 11.2. This pyramid need not have an interior,
but when it does, we can establish the following:
Figure 11.2
384 11. Necessary Conditions for Optimality
but the correction term cannot be ignored. Instead, we consider the function
f which assigns to each X ∈ K₀ the target value from (16) with the same τᵢ, cᵢ,
so that f(X) = X + εℰ(ε). Since for ε₀ sufficiently small, and 0 ≤ cᵢ ≤ 1, we
can take a single function ℰ, it follows that this f: K₀ → ℝ^d is continuous.
Moreover, for X ∈ ∂K₀, the boundary of K₀,
|f(X) − X| = ε|ℰ(ε)| ≤ ε₀ < |X − ρX₀|,
if ε₀ is sufficiently small and ρ > 0 is even smaller. Then examining Figure
11.2 again, we can "see" that f does not move the boundary of K₀ far enough
to remove ρX₀ from f(K₀); i.e., some X ∈ K₀ has the target value f(X) = ρX₀
as required.
(To complete this argument rigorously we must appeal to a form of
Brouwer's Fixed Point Theorem as given, say, in [L-M]; this circumstance
gives analytical depth to the proposition, and to subsequent results dependent
upon it.) □
𝒦 = { X = Σ_{i=1}^n cᵢXᵢ : cᵢ ≥ 0, i ≤ n; n = 1, 2, ... }, (17)
and the Xᵢ are any of the vectors defined by ((12), (13), and (16)) for some τᵢ,
Vᵢ. For if X₀ is in the interior of 𝒦, then there must be d linearly independent
Xᵢ forming a subpyramid K containing X₀ in its interior. With a slight shift
if necessary, we can suppose these Vᵢ to be associated with distinct τᵢ, so that
the construction in the proposition is valid for K, and so for 𝒦.
To understand the relevance of the next result, note that the autonomous
optimal control problem from §10.1 admits (Mayer) formulation as that of
finding a Y = (y, Y) E C1 [0, T] with minimal y(T) among those which satisfy
Remark. In fact, P·G(Y₀, U₀) is constant so that (19′) holds at all t. (See
Remarks 11.11.)
PROOF. Assume that Y₀(T) = 𝒪, and consider first the simpler cases where the
target is either fixed (Φ(Y) ≡ Y) or free (Φ(Y) ≡ 𝒪). In these cases, the target
cone 𝒦 of 11.5 cannot contain in its interior the "downward" pointing
vector X₀ = (−1, 0, ..., 0). [For then by Proposition 11.4 (as extended), there
would be a modified control U_ε, with associated state target value Y_ε(T) =
(−ρ, 0, 0, ..., 0) for some ρ > 0. But then Y_ε(T) = Y₀(T) = 𝒪, while
y_ε(T) = −ρ < 0 = y₀(T), contradicting minimality.] Thus, there is a unit vector
N orthogonal to a d-dimensional subspace 𝒮 for which
N·X ≥ 0, ∀ X ∈ 𝒦, with N = (n, N̄) and n ≥ 0. (20)
Figure 11.3
(When Φ ≡ 𝒪, we can take N = (1, 0, ..., 0)) ([N]). In case d = 2, (20) guarantees
that each vector in the cone is within an angle of π/2 of the fixed
upward-slanted vector N, as illustrated in Figure 11.3.
(11.7) For the more general Φ, the requirement that 𝕄 = Φ_Y(Y₀(T)) (= Φ_Y(𝒪))
have maximal rank l ≤ d means that locally (through implicit function theory
[Ed]), 𝒮, the 𝒪-level set of Φ through 𝒪, can be represented as a (smooth)
r (= d − l)-dimensional submanifold in ℝ^d. Moreover, 𝒮 has an associated
r-dimensional tangent space 𝒯 which is orthogonal to the subspace spanned
by the l linearly independent (column) vectors of 𝕄. A similar argument to
that given above shows that 𝒦 cannot contain in its interior any downward
directed tangent vector X₀ ∈ ℝ × 𝒯.¹ We conclude that there is again a
vector N = (n, N̄) with N̄ orthogonal to ℝ × 𝒯 for which (20) holds. But
then N̄ = ΛΦ_Y(Y₀(T)) for some Λ ∈ ℝ^l.
In Figure 11.3, we illustrate the latter situation for d = 2, where the target
surface is defined by the single equation φ(y₁, y₂) = 0. Maximal rank at 𝒪
means that ∇φ(𝒪) ≠ 𝒪 and, as we know, say, by the discussion at the end of
§5.6, this vector is orthogonal to the r (= 1)-dimensional subspace 𝒯 (a line)
tangent to 𝒮, the 0-level set, at 𝒪. Since the interior of 𝒦 cannot contain
vectors in the "lower" half of the plane ℝ × 𝒯, then (20) holds for some
N = (n, N̄), where N̄ is orthogonal to 𝒯 and thus is given by some λ∇φ(𝒪).
¹ For if so, then 𝒦 also contains a small downward directed target vector Y_ε(T) which lies in
ℝ × 𝒮, and this is clearly inadmissible. (Why?)
Now let's see what these results tell us about our original autonomous
fixed interval problem. Note that in its Mayer formulation, when
Ẏ = G(Y, U) only, then with P = (p, P̄), we must have from (19) that
ṗ = −G_y·P = 0,
¹ See the first example in §10.2(a). Such problems are referred to as being abnormal because their
optimal criteria do not depend on the performance measure being optimized.
(11.9) Remarks. 1. If fixed (or more general) target values are defined by
requiring that Φ(Y(T)) = 𝒪 for some C¹ l-vector valued function Φ for which
the matrix Φ_Y(Y₀(T)) has maximal rank l ≤ d, then by (19),
P(T) = ΛΦ_Y(Y₀(T)) for some Λ ∈ ℝ^l. (21d)
Maximal rank is essential for this conclusion. See Example 1 below.
2. The Lagrange multiplier theorem (5.16) admits similar formulation for
λᵢ = pᵢ, with
Then alternative (a) permits λ₀ = 0, while if (a) is not satisfied, we must take
λ₀ ≠ 0, and, if desired, λ₀ = 1.
3. Observe that under classical smoothness conditions, we have just established
a Lagrange multiplier rule for Lagrangian constraints of the form
Ẏ = G(Y, U). It guarantees the Weierstrass condition, but for the modified
integrand f̃ = λ₀f + P·(G − Ẏ) (where now U is to be thought of as Ẏ). The
more general situation is discussed in §11.3.
uniquely on
𝒟 = {(y, u) ∈ C¹[0, 1]: y(0) = 1, y(1) = e}
under ẏ = u, with |u| ≤ 4.
Thus, for some p ∈ C¹[0, 1] and λ₀ (= 0 or 1),
h(t, y, u) = λ₀(y² + u²) + p(t)u
obeys the minimum principle for u₀(t) = eᵗ; hence,
∀ |u| ≤ 4:
Since |eᵗ| < 4 for t ∈ [0, 1], it follows that
h_u(t, y₀(t), eᵗ) = 0, or 2λ₀u₀(t) + p(t) = 0,
so that
p(t) = −2λ₀u₀(t) = −2λ₀eᵗ and p(1) = −2λ₀.
Since (λ₀, p(1)) ≠ (0, 0), we must take λ₀ = 1, even though this is a fixed
target problem. If we use φ(y) = y − e to fix the target, we can take λ = −2
and satisfy (21d). However, if we use φ(y) = (y − e)² to fix the target, then
φ_y(y₀(1)) = 2(y₀(1) − e) = 0, and we cannot find a suitable λ because the
maximal rank condition is not satisfied.
Note that
h₀(t) = (eᵗ)² + (eᵗ)² − 2eᵗ·eᵗ = 0.
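The cancellation in h₀ is easy to spot-check numerically (a sketch; the state y₀(t) = eᵗ, control u₀(t) = eᵗ, multiplier p(t) = −2eᵗ, and λ₀ = 1 are taken from the example above):

```python
import math

def h(t, y, u, lam0=1.0):
    """Hamiltonian of the example: h = lam0*(y^2 + u^2) + p(t)*u."""
    p = -2.0 * math.exp(t)      # multiplier found from h_u = 0 with lam0 = 1
    return lam0 * (y**2 + u**2) + p * u

# h0(t) = h(t, y0(t), u0(t)) should vanish identically on [0, 1].
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    y0 = u0 = math.exp(t)
    assert abs(h(t, y0, u0)) < 1e-9
```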
which are seen to be identical to those satisfied by Y₀. Since P(T) = 𝒪, we
conclude that
P(t) = −∫ₜᵀ ( sin(t − τ), cos(t − τ) ) u₀(τ) dτ. (25)
we see that in order that u₀(t) minimize this expression when |u| ≤ 1, we
should choose
u₀(t) = sgn(y₁(t) − p₁(t)). (Why?)
Thus, from (23) and (25), we find that the components
∫₀ᵀ sin τ · u₀(τ) dτ and ∫₀ᵀ cos τ · u₀(τ) dτ
are not both zero. If we exclude the nonoptimal case u₀ ≡ 0, then we must
find, if possible, a u₀ that makes these equations compatible. We see that
the optimal control is of the bang-bang type; in fact, it is constant, +1 or
−1, on successive intervals of length π, exactly as our preliminary analysis in
§10.2(f) suggested.
For example, when T = 2π, we can ask whether for a = π/2
u₀ = +1 on (0, π), −1 on (π, 2π),
could be optimal.
For this we need
∫₀^{3π/2} cos(t − τ) u₀(τ) dτ = 2 sin t − ∫_π^{3π/2} cos(t − τ) dτ = 3 sin t + cos t,
which equals −1 when t = π, and so by continuity, is negative in a neighborhood
of t = π, while u₀(t) = +1, for t ∈ (0, π).
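The closed form just obtained is easy to confirm by quadrature (a sketch; the midpoint-rule grid size is an arbitrary choice):

```python
import math

def u0(tau):
    """Candidate bang-bang control: +1 on (0, pi), -1 beyond pi."""
    return 1.0 if tau < math.pi else -1.0

def lhs(t, n=50_000):
    """Midpoint rule for the integral of cos(t - tau)*u0(tau) over (0, 3*pi/2)."""
    b = 1.5 * math.pi
    h = b / n
    return sum(math.cos(t - (k + 0.5) * h) * u0((k + 0.5) * h) for k in range(n)) * h

for t in [0.0, 1.0, math.pi, 4.0]:
    assert abs(lhs(t) - (3.0 * math.sin(t) + math.cos(t))) < 1e-3

# At t = pi the integral equals -1, although u0 = +1 just to the left of pi.
assert abs(lhs(math.pi) + 1.0) < 1e-3
```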
From Theorem 11.8, we also know that h₀(t) = h(t, Y₀(t), U₀(t)) is constant.
In particular, h₀(0) = h₀(T), and since Y₀(0) = P(T) = 𝒪, from (22), we
see that
This equation provides a simple test which possible optimal controls must
meet. For example, when T = π/2, the control
u = +1 on (0, π/6), −1 on (π/6, π/2),
cannot be optimal since ∫₀^{π/2} cos τ · u(τ) dτ = 0, while the corresponding integral
on the right side of (27) is nonvanishing. On the other hand, for T = 5π/4,
this condition shows that only
u₀ = +1 on (0, a), −1 on (a, 5π/4), where tan a = −(1 + √2), (27′)
could be optimal.
Unfortunately, without either a sufficiency theorem, or an existence theo-
rem, we still cannot guarantee the optimality of this sole survivor.
In Theorem 11.8, the interval [0, T] is fixed, which precludes direct application
to, say, time-optimal problems. However, it is straightforward to use
the theorem (and subsequent remarks) to obtain an extension to the nonautonomous
case where T also varies, and thereby strengthen the previous
results. It is less straightforward to present the new results.
and it is seen that if t₀(r) = rT₀ (so that w₀ = T₀), with corresponding substitutions
for X₀, V₀, then (t₀, X₀; w₀, V₀) minimizes
Hence by Theorem 11.8 and Remarks 11.9, with appropriate substitutions,
∃ λ₀ = 0 or 1, Λ ∈ ℝ^l, and functions (q₀, Q) ∈ C¹[0, 1] with
(λ₀; q₀(1), Q(1)) ≠ 𝒪, (32a)
such that
q₀(1) = Λ·Φ_t(T₀, X₀(1)); Q(1) = ΛΦ_X(T₀, X₀(1))
(since t₀(1) = T₀); and
q₀′(·) = −h_t(·, t₀, X₀, V₀)T₀, Q′(·) = −h_X(·, t₀, X₀, V₀)T₀, (32b)
where
h(r, t, X, V)w := [λ₀f(t, X, V) + Q(r)·G(t, X, V) + q₀(r)·1]w,
obeys the minimum principle on [0, 1] for the control pair (w, V), relative to
the trajectory t₀(r) = rT₀ and X₀(r) = Y₀(rT₀), at each point r of continuity
of V₀.
[0, T₀]. If, in addition, Φ_t ≡ 𝒪, then we can conclude that h₀ = h₀(T₀) = 0, so
that (λ₀, P(T₀)) ≠ 𝒪. However, for the fixed interval case, which requires, say,
Φ = (φ, Φ̄) with φ(t, Y) = t − T₀, then Φ_t = (1, Φ̄_t) ≠ 𝒪, and we know that
for such problems h₀ need not vanish (Example 2). Observe that all conclusions
about the behavior of h₀ were obtained under the assumption that the
various integrands are defined on some time interval (0, b) ⊇ (0, T₀]. For
certain nonautonomous fixed interval problems, this need not be valid. If,
for example, f(t, Y, U) = √(T₀² − t²) · f̂(Y, U), then more general methods are
required to obtain analogous results. See [He].
2. This theorem required boundedness of the control set 𝒰. When 𝒰
can in addition be described by Lagrangian inequalities of the form Ψ(U) ≤
𝒪 ∈ ℝᵐ, then under sufficient smoothness assumptions, we can replace the
minimum principle as stated by h̃_U(t, Y₀(t), U₀(t)) = 𝒪, where h̃ = h + M·Ψ
for suitable multiplier functions M on [0, T₀]. Recall that a similar sufficiency
formulation was made in §10.1 and subsequently used to attack several
of the problems in §10.2. This extension, as a necessary condition for
optimality, will be established in §11.3.
3. In this theorem, the optimal target time T₀ could be any positive number,
and this might not be realistic in certain applications. Problems in which
T₀ is required to lie in some given interval are explored in [S-S].
Example 3. Suppose that for Y = (y, y₁) and given B = (b, b₁) ∈ ℝ², we wish
to minimize
F(Y, u, T) = ∫₀ᵀ (1 + y₁²(t)) dt
on
𝒟 = { (Y, u) ∈ C¹[0, T] × C[0, T] with T > 0: Y(0) = 𝒪, Y(T) = B, |u(t)| ≤ 1 },
Figure 11.4
Figure 11.5
switches in value which an optimal control can exhibit. These results are
applied to obtain a complete solution to a docking problem, and other
applications are discussed.
Problem Statement
for an appropriate square matrix function e^{−tA} and row matrix V.¹ From (29)
we see that for each U ∈ 𝒰:
h(t, Y₀(t), U) = λ₀ + V e^{−tA}(A Y₀(t) + 𝔹U)
≥ h₀(t) = λ₀ + V e^{−tA}(A Y₀(t) + 𝔹U₀(t)),
or
V e^{−tA} 𝔹 U ≥ V e^{−tA} 𝔹 U₀(t), t ∈ [0, T₀], ∀ U ∈ 𝒰. (41)
Also h₀ = 0 (why?), so that by (30′), (λ₀, V) ≠ 𝒪.
For this linear autonomous case, there are additional conditions under
which the minimum principle in the form (41) is effectively sufficient.
(11.12) Theorem. Suppose that the origin is an interior point of the control set
𝒰 in ℝᵏ and that the matrix 𝕄 = (𝔹 : A𝔹 : A²𝔹 : ⋯ : A^{d−1}𝔹) has rank d. If for
some vector V ∈ ℝᵈ ∖ 𝒪, V e^{−tA} 𝔹 U ≥ V e^{−tA} 𝔹 U₀(t), t ∈ [0, T₀], ∀ U ∈ 𝒰, then
¹ Formally, e^{−tA} = Σ_{n=0}^∞ (−tA)ⁿ/n!, where the indicated series of matrices converges
uniformly in t on each compact interval. Also, e^{−tA} and e^{tA} are inverse matrices [C-L].
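The series of matrices mentioned in the footnote can be exercised numerically (a sketch using numpy; the test matrix A and the truncation order are arbitrary choices):

```python
import numpy as np

def expm_series(M, terms=30):
    """Partial sum of the matrix exponential series sum_n M^n / n!."""
    out = np.eye(M.shape[0])
    P = np.eye(M.shape[0])
    for n in range(1, terms):
        P = P @ M / n          # P now holds M^n / n!
        out = out + P
    return out

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
t = 0.7
E_minus = expm_series(-t * A)
E_plus = expm_series(t * A)

# e^{-tA} and e^{tA} are inverse matrices, as the footnote asserts.
assert np.allclose(E_minus @ E_plus, np.eye(2), atol=1e-12)
```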
§11.2. Linear Time-Optimal Problems 399
PROOF. With the same matrix e^{−tA} used above, the state equation may be
rewritten (d/dt)(e^{−tA}Y(t)) = e^{−tA}𝔹U(t), so that upon integration over [0, T],
with Y(T) = 𝒪 (and e^{−0A} = 𝕀_d), we obtain
V e^{−tA} 𝔹 U₀(t) ≤ 0, ∀ t; (43)
0 = ∫_T^{T₀} V e^{−tA} 𝔹 U₀(t) dt,
and by (43),
V e^{−tA} 𝔹 U₀(t) ≡ 0 on [T, T₀].
Using (41) again, we see that U ∈ 𝒰 ⇒ V e^{−tA} 𝔹 U ≥ 0, and where also −U ∈ 𝒰,
we get V e^{−tA} 𝔹 U ≡ 0 on [T, T₀]. Since 𝒪 is an interior point of 𝒰, this must
hold for all U near 𝒪 so that the row vector V e^{−tA} 𝔹 ≡ 𝒪 on [T, T₀]. By
translation, with t₁ = (T + T₀)/2,
φ(t) := V e^{−t₁A} e^{−tA} 𝔹,
[(d/dt) e^{−tA}]_{t=0} = [−A e^{−tA}]_{t=0} = −A,
(11.14) Corollary. Suppose that 𝒰 of Theorem 11.12 is a closed box (or a
compact convex polyhedron) in ℝᵏ with vertices Vⱼ. If the vectors 𝔹E, A𝔹E, ...,
A^{d−1}𝔹E are linearly independent for each edge vector E joining adjacent
vertices, then the optimal control U₀(t) is unique, and piecewise vertex-valued
on 𝒰.
PROOF*. For each t ∈ [0, T₀], V e^{−tA} 𝔹 U is linear in U and continuous on 𝒰.
Hence by Proposition 5.3 it is minimized at some U₀(t) ∈ 𝒰, and for the
hypothesized shapes of 𝒰, the minimum value must occur at some vertex Vⱼ.
(See Problem 11.7.) If this same minimum value occurs at another vertex, it
must do so at an adjacent vertex Vᵢ so that by subtraction, V e^{−tA} 𝔹 E = 0 for
the edge vector E = Vᵢ − Vⱼ.
Suppose this nonuniqueness occurs for more than a finite set of values of
t = tₙ. Then since the number of edges is finite we can assume that all tₙ
are associated with the same edge E, i.e., that φ(tₙ) := V e^{−tₙA} 𝔹 E = 0, ∀ n =
1, 2, .... From this it follows that the function φ(t) = V e^{−tA} 𝔹 E, which is
real analytic in t, vanishes identically [Rud]. But if φ(t) = V e^{−tA} 𝔹 E ≡ 0,
Remarks. This result shows that for such problems, an optimal control
should be of the bang-bang type since it is mostly vertex-valued. If we
replace 𝒰 by {U: |U| ≤ 1} then such distinguished vertices are not present,
and we have already considered related problems in §10.2(g).
If we set Y(t) = (y(t), y₁(t)) where y₁(t) is the velocity at time t, then the
state vector Y = (y, y₁) ∈ ℝ² obeys the linear law
Ẏ(t) = (y₁(t), u(t)) = A Y(t) + 𝔹u(t),
where
A = ( 0 1 ; 0 0 ) and 𝔹 = ( 0 ; 1 ).
Under the simple dynamics permitted here it is clear by time reversal that
this is equivalent to asking whether we can travel from the origin to a
given state A by such controls.
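The reachability question just raised is governed by the rank condition of Theorem 11.12; a quick check for these dynamics (a sketch; the matrices A and 𝔹 below are the ones determined by Ẏ = (y₁, u)):

```python
import numpy as np

# Double-integrator ("rocket car") dynamics: Ydot = A Y + B u.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

# Controllability matrix (B : AB : ... : A^{d-1} B) of Theorem 11.12.
d = A.shape[0]
M = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(d)])

assert np.linalg.matrix_rank(M) == d   # rank d: the condition is satisfied
```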
Obviously, we can do so when u₀(t) ≡ +1 for all states A which lie along
the upper parabolic arc with parametric equations
y(t) = t²/2, y₁(t) = t (so y = y₁²/2, y₁ > 0),
or along the lower parabolic arc y = −y₁²/2, y₁ < 0, obtained when u₀(t) =
−1, both shown in Figure 11.6. The other parabolic arcs shown in this figure
are those obtained by reversing the thrust direction at various points along
these arcs.
Figure 11.6
where σⱼ(t) = P(t)·Bⱼ is the switching function for the jth component of U₀(t),
so-called because, as (45) shows, if σⱼ(t) ≠ 0 then
σⱼ(t)uⱼ ≥ −mⱼ|σⱼ(t)| = σⱼ(t)(U₀)ⱼ(t),
which means that the jth component of U₀ is
−mⱼ, when σⱼ(t) > 0,
+mⱼ, when σⱼ(t) < 0.
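For the double-integrator dynamics of this section the switching function can be computed in closed form, since A is nilpotent; a small sketch (the row vector V is a hypothetical choice, not from the text):

```python
import numpy as np

# Double integrator: A is nilpotent (A^2 = 0), so e^{-tA} = I - tA exactly.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([0.0, 1.0])
V = np.array([1.0, 0.5])       # hypothetical nonzero multiplier row vector

def sigma(t):
    """Switching function sigma(t) = V e^{-tA} B."""
    return V @ (np.eye(2) - t * A) @ B

# sigma(t) = V.B - t * V.(AB) is linear in t, so it vanishes at most once:
# at most d - 1 = 1 switch for this all-real-eigenvalue A.
ts = np.linspace(0.0, 10.0, 1001)
vals = np.array([sigma(t) for t in ts])
switches = int(np.sum(vals[1:] * vals[:-1] < 0))
assert switches <= 1
```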
When the eigenvalues of A are all real, there can be at most d − 1 zeros of
each σⱼ (which is continuous), and so at most d − 1 switches in each component
of the optimal control U₀. Some details are given in Problem 11.10, and
A = ( 0 1 ; −1 0 )
has eigenvalues λ where
|A − λ𝕀| = | −λ 1 ; −1 −λ | = 0,
or λ² + 1 = 0, so that λ = ±i.
Here again
We suppose that all terms are as defined in Theorem 11.10. Then from
Theorem A.28, we can easily get the following:
(11.16) Corollary. In Theorem 11.10, suppose that the control set 𝒰 can be
described by Lagrangian inequalities Ψ(U) ≤ 𝒪 ∈ ℝᵐ, where Ψ is C¹ and
Ψ_U(U₀(t)) has maximal row rank m while f_U and G_U are continuous. Then there
exists a (unique) M = (μ₁, μ₂, ..., μₘ) ∈ C[0, T₀], with M ≥ 𝒪, for which
h̃_U(·, Y₀, U₀) = 𝒪 on [0, T₀], where
h̃(t, Y, U) = h(t, Y, U) + M(t)·Ψ(U). (47)
Moreover
μᵢ(t)ψᵢ(U₀(t)) = 0, i = 1, 2, ..., m.
PROOF. The minimum principle (29) implies that for each t, h(t, Y₀(t), U) is
minimized at U₀(t) under the constraint Ψ(U) ≤ 𝒪. By Theorem A.28, a
¹ In this case we will say that G has maximal row rank m (≤ n).
Note that when a component ψᵢ is inactive at U₀(t), in that ψᵢ(U₀(t)) < 0,
then μᵢ(t) = 0. In §10.1 and in several of the applications in §10.2, we have
seen how this extension affects the search for optimal solutions.
but then 𝒢(T, F(T)) = 𝒪 ⇒ F(T) = 𝒪, by local uniqueness of F; i.e.,
G(X, U) = U as desired. □
on
𝒟 = { Y ∈ C¹[0, T]: T > 0, with Y(0) = Y₀(0),
(Y(t), Ẏ(t)) ∈ D, and Φ(T, Y(T)) = 𝒪 ∈ ℝˡ },
under the Lagrangian constraint
𝒢(t, Y(t), Ẏ(t)) ≡ 𝒪 on (0, T).
Here D is a domain in ℝ²ᵈ while f̂ = f̂(t, Y, Z), f̂_Y and f̂_Z are assumed
continuous, and Φ = Φ(t, Y) is C¹ while Φ_{(t,Y)}(T₀, Y₀(T₀)) has maximal row
rank l ≤ d + 1. Similarly, we suppose that 𝒢 = 𝒢(t, Y, U) is C², and that
𝒢_U(t, Y, U) has maximal row rank m ≤ d whenever 𝒢(t, Y, U) = 𝒪, in a neighborhood
S₀ of Γ₀, the trajectory of Y₀ in ℝ^{2d+1}. According to Corollary 11.17,
with X = (t, Y), there exists a C¹ function G(t, Y, U) defined in a neighborhood
of S₀, such that
𝒢(t, Y, G(t, Y, U)) ≡ 𝒪 ∈ ℝᵐ, while G(t, Y, U) = U on S₀. (48)
Next, let
f(t, Y, U) := f̂(t, Y, G(t, Y, U)). (49)
We claim that with U₀ = Ẏ₀, the triple (Y₀, U₀, T₀) minimizes
i.e., Y₀ is stationary for f̃ in such intervals.
At corner points, h[Y₀(·)] is continuous (with P), while from the continuity
of h₀(t) = f̃[Y₀(t)] − f̃_Ẏ[Y₀(t)]·Ẏ₀(t) follows the second Weierstrass-Erdmann
condition of §7.5.
From equation (30) for h₀ in Theorem 11.10, we can show that Y₀ also
satisfies the second Euler-Lagrange equation for f̃. Similarly, the minimum
principle gives the Weierstrass necessary condition on f̃ of Theorem 7.15.
(Problem 11.20.)
(c) Extensions
(11.21) Various extensions of Theorem 11.20 are possible. For example,
suppose that instead of the fixed left end condition Y(0) = Y₀(0), the
competing Y are required to satisfy a condition of the form Φ(a, Y(a)) ≡ 𝒪 for
some C¹ vector valued function Φ whose Jacobian matrix at (a₀, Y₀(a₀)) has
maximal rank ≤ d. The f̃ defined as above on [a₀, T₀] will satisfy transversality
conditions analogous to (57) at (a₀, Y₀(a₀)) for an appropriate vector Λ.
It is also possible to consider minimization of a Bolza type functional in
which ℱ is replaced by
ℱ + φ₀(T, Y(T)), where φ₀ is C¹.
Then the theorem holds as stated if Λ·Φ is replaced by λ₀φ₀ + Λ·Φ in the
transversality conditions (57). (Details are indicated in Problem 11.22*.)
Corresponding problems with additional isoperimetric constraints may
be handled by replacing these with Lagrangian constraints in a higher-dimensional
vector space as in Proposition 9.11. The resulting multipliers
associated with such constraints will be constant as expected. A simple
example is discussed in Problem 11.23.
(11.22) Lagrangian inequalities of the form 𝒢(t, Y(t), Ẏ(t)) ≤ 𝒪 can be
treated with the device of slack variables introduced by Valentine [CCV],
and used in Problem 7.22*.
To accomplish this, we let vᵢ = wᵢ², i = 1, 2, ..., m, and for the new function
𝒢*(t, Y, Z, W) = 𝒢(t, Y, Z) + V (58)
we consider the problem of minimizing ℱ as before on
𝒟* = {(Y, W; T): (Y, T) ∈ 𝒟, W ∈ C[0, T]}
§11.3. General Lagrangian Constraints 411
(11.23) Finally, there is an optimal control analogue for each of the above
extensions which can be obtained by applying the same techniques. Let's
consider only one such problem, that for which (Y₀, U₀, T₀) minimizes
𝒢*_{(Ẏ, U)} = [ −𝕀_d : G_U ]
Finally, we have from the Weierstrass necessary condition for f̃ that with
V = (W, U), say, near (Ẏ₀(t), U₀(t)),
h(t, Y₀(t), V) − h(t, Y₀(t), Ẏ₀(t), U₀(t))
= f̃(t, Y₀(t), V) − f̃(t, Y₀(t), Ẏ₀(t), U₀(t)) + P(t)·(W − Ẏ₀(t))
≥ 0 (since P = −f̃_Ẏ and f̃_U = 𝒪 along the optimal trajectory).
As two special cases of the latter inequality, we obtain (when U = U₀(t))
that
h(t, Y₀(t), W, U₀(t)) ≥ h₀(t), ∀ W near Ẏ₀(t),
and (when W = Ẏ₀(t)) that
h(t, Y₀(t), Ẏ₀(t), U) ≥ h₀(t), ∀ U near U₀(t).
We have now closed the circle of ideas begun in §10.1, by proving that the
conditions shown there to be sufficient on a fixed interval with convexity
are indeed necessary for more general problems. In [He] other "smooth"
problems are considered wherein the various given functions are supposed
C² as required. Related problems with Ĉ¹ state variables are examined in
[S-S], and numerical methods that implement some of these results are
indicated in [G-L].
Associated nonsmooth problems are explored in [Ce], [Cla], [Lo], [I-T],
[R], [Wa], [Z] among others. Such problems demand far more sophisticated
tools, and constraints of the type considered here impose substantial diffi-
culties for the development of a governing theory. Nevertheless, after proper
interpretation, the results are essentially the same; viz., constrained optima
guarantee existence of associated multipliers for which a modified perfor-
mance function is optimized as though such constraints were not present. In
this theory, the role of the multipliers receives functional analytic clarifica-
tion, but the solution of any given problem, even when smooth, remains as
difficult as before. Each problem must be analyzed by methods peculiar to
itself. What are most needed are simple, effective sufficiency criteria, especially for
nonconvex problems on variable intervals, or a better understanding of the
transformations which might be used to reduce such problems to those for
which such criteria are already known. Peterson and Zalkind (see [Le])
supply a valuable comparative survey of earlier work, while recent results of
Zeidan (see [CIa] and [Lo]) provide hope for further success.
PROBLEMS
for appropriate e < d and polynomials qᵢ associated with the distinct λᵢ,
i = 1, 2, ..., e. Argue that the conclusion in part (b) applies to this case as
well.
for some locally defined y ∈ ℝᵐ such that y = ψ(Y) "solves" G(y, Y) = 0.
(b) Extend the argument in part (a) to cover the case when G = G(x, Y) for a
C² function G with G_Y[Y₀(x)] of maximal row rank m < d at each x.
(c)* Explain how you could use slack variables to consider Lagrangian
inequalities such as G(Y(x)) ≤ 𝒪 or G(x, Y(x)) ≤ 𝒪. Hint: See (11.22) in
§11.3.
11.16*. (Simple differential constraints: 𝒢(t, Y(t), Ẏ(t)) = 𝒪.)
(a) Suppose that for some choice of variables Y = (X, V) ∈ ℝ^{d−k} × ℝᵏ, the
constraint function 𝒢 takes the form 𝒢(t, Y, Ẏ) = G(t, Y, V̇) − Ẋ. Then
show that the problem of minimizing
where f(t, Y, U) := f̂(t, Y, G(t, Y, U), U) under the new state law Ẏ =
(G(·, Y, U), U) with the control set 𝒰 = ℝᵏ.
(b) Apply the minimum principle (Theorem 11.10) to a possible minimizing
(Y₀, U₀, T₀), to obtain an associated Lagrangian multiplier function
(P, Q), say, and a λ₀ = 0 or 1 for which (since 𝒰 is open): λ₀f_U + G_U P +
Q = 𝒪, when evaluated at (Y₀, U₀).
(c) Establish that (P, Q) satisfies the equations
(Ṗ, Q̇) = −(h_X, h_V) for h = λ₀f + P·G + Q·U.
(d) Introduce M = P + λ₀f̂_Ẋ[Y₀] and conclude that the above equations
combine to form the Euler-Lagrange equations for
f̃ := λ₀f̂ + M·(G − Ẋ) = λ₀f̂ + M·𝒢.
(e) Discuss the related transversal conditions, and formulate your result as a
theorem.
11.17. (a) Obtain conditions which are necessary for Y = (y, y₁) ∈ C¹[0, 1] to minimize
F(Y) = ∫₀¹ |Ẏ|² dt under ẏ₁ − ẏ = 0 with Y(0) = 𝒪, y(1) = 1, y₁(1) = 2.
(b) Show that these conditions will also be sufficient provided that an appropriate
multiplier, μ ≥ 0, can be found.
(c) Explain how to find an optimal solution from these conditions.
11.18. (a) Apply an analysis similar to that used in the previous problem to
minimize F(Y) = ∫₀¹ |Ẏ|² dt under ẏ₁² + ẏ = 0 with Y(0) = 𝒪 and
y(1) = 1.
(b) What is the minimum value?
11.19. (a) Obtain necessary conditions for the time-optimal problem of a system
whose state Y = (y, y₁) at time t is governed by the nonlinear state equation
Ẏ = (u, u²) where |u| ≤ 1 with Y(0), y₁(T) prescribed.
(b) Can you find a sufficiency argument?
11.20. In the arguments leading to Theorem 11.20, verify that f̃ as defined does
satisfy the second Euler-Lagrange equation and the Weierstrass necessary
condition.
11.21. (Bolza functionals, I)
(a) Suppose that for some C¹ function φ, (Y₀, U₀, T₀) minimizes
where U ∈ 𝒰, on
on a related 𝒟̃, under the above augmented state equations, provided that
y₀(0) = 0, y(T₀) = φ(Y(T₀)), u ∈ ℝ.
(b) Apply the minimum principle from Theorem 11.10 to
h̃ = λ₀(f + u) + P·G + pu,
noting that then
h̃_u(·, Y₀, U₀) = λ₀ + p = 0.
(c) Examine the associated transversality conditions for the augmented
Φ̃(Y) = (Φ(Y), φ(Y) − y), and verify that the conclusions of Theorem
11.10 are valid as stated if ΛΦ is replaced by λ₀φ + ΛΦ.
(d) Argue that the same is true when Φ = Φ(t, Y) and φ = φ(t, Y). Hint:
Introduce t as a new state variable as in the proof of Theorem 11.10.
11.22*. (Bolza Functionals, II)
Assume that Theorem 11.10 remains valid for Bolza functionals of the type
considered in Problem 11.21, when ΛΦ is replaced by λ₀φ + ΛΦ. Reexamine
the arguments leading to Theorem 11.20, and conclude that it, too, is valid for
420 Appendix
Now, for each n = 1, 2, ..., suppose that xₙ ∈ I has the binary expansion
xₙ = Σ_{k=1}^∞ b_{nk} 2⁻ᵏ.
We will select a subsequence x̄₁, x̄₂, ... of these xₙ which converges to a
limit point x̄ ∈ I.
If there is an infinite set of the xₙ with first binary coefficient b_{n1} = 0, then
choose x̄₁ to be that xₙ from this set with least index n₁. Otherwise, there is
an infinite set of the xₙ with b_{n1} = 1, and x̄₁ should be taken to be that from
among them with least index. In either case let b̄₁ be the first binary coefficient
of x̄₁.
Next, let x̄₂ be that xₙ of least index n₂ > n₁ for which b_{n1} = b̄₁ and
b_{n2} = 0, if there is an infinite set of such xₙ; otherwise, for which b_{n1} = b̄₁
and b_{n2} = 1. Let b̄₂ be the second binary coefficient of x̄₂.
This selection process can be continued indefinitely, since at each stage we
consider only those remaining xₙ which form an infinite set. There results a
subsequence x̄ₘ = x_{nₘ}, and an associated sequence of binary coefficients b̄ₘ,
m = 1, 2, ..., determining the number x̄ := Σ_{k=1}^∞ b̄ₖ 2⁻ᵏ ∈ I.
But by construction, x̄ and x̄ₘ will have the same initial m binary coefficients
b̄₁, b̄₂, ..., b̄ₘ, and the remaining binary coefficients cannot exceed 1.
Hence, by an easy estimate,
|x̄ − x̄ₘ| ≤ Σ_{k=m}^∞ 2·2⁻ᵏ = 4·2⁻ᵐ, which → 0 as m → ∞. □
Remarks. The selection process used in proving the lemma may be visualized
by successively bisecting (sub)intervals of I, and retaining at each stage the
leftmost interval which contains an infinite set of the xₙ. The b̄ₖ index these
selections, and, more importantly, define the limit point x̄ (as the sum of a
series of positive terms). We are assuming here (and elsewhere) that the set of
real numbers is complete in that it contains a sum for each convergent series
of its elements. For further discussion of this somewhat subtle point, see
[Ed].
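The bisection picture in the remark can be mimicked on a finite sample (a sketch; with finitely many points, "contains an infinite set" is replaced by "contains the larger share," and the sample sequence is an arbitrary choice):

```python
import math

def bisection_limit(xs, depth=10):
    """Repeatedly halve [lo, hi), keeping the leftmost half that holds the
    larger share of the remaining points (finite stand-in for the lemma's
    'infinitely many'); the surviving points cluster near the limit."""
    pool = list(xs)                      # all points assumed to lie in [0, 1)
    lo, hi = 0.0, 1.0
    for _ in range(depth):
        mid = (lo + hi) / 2.0
        left = [x for x in pool if x < mid]
        right = [x for x in pool if x >= mid]
        if len(left) >= len(right):      # prefer the leftmost viable half
            pool, hi = left, mid
        else:
            pool, lo = right, mid
    return lo, hi, pool

xs = [abs(math.sin(n)) for n in range(1, 5001)]    # a bounded sequence
lo, hi, pool = bisection_limit(xs)
assert hi - lo == 2.0 ** -10                       # width after 10 bisections
assert pool and all(lo <= x < hi for x in pool)    # a convergent cluster
```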
f(x₀) = lim_{k→∞} f(x_{n_k}) = 0 and
f′(x₀) = lim_{k→∞} (f(x_{n_k}) − f(x₀)) / (x_{n_k} − x₀) = 0.
Compactness is essential for this conclusion: f(x) = sin x and f′(x) = cos x
never vanish simultaneously even though f(nπ) = 0 for n = 1, 2, ....
f(x) ≥ f(x₀) − |f(x) − f(x₀)| > f(x₀)/2, when |f(x) − f(x₀)| < f(x₀)/2,
and this holds ∀ x in a neighborhood of x₀.]
A similar analysis shows that when f(x o) > c, then f > c in a neighbor-
hood of x o, and the corresponding result holds when f(x o) < c.
(A.2) Intermediate Value Theorem. If f ∈ C[a, b] then f assumes each value c
between f(a) and f(b).
PROOF. For definiteness, suppose that f(a) > c > f(b). Then by continuity (as
above) f > c on (a, x) for some x sufficiently near a. We wish to find the
largest such interval. To do so rigorously, we let P = {x ∈ (a, b): f > c on
(a, x)} and consider the (open) set I = ⋃_{x∈P} (a, x), which with x₁ and
x₂ > x₁ also contains the interval [x₁, x₂]. (Why?) Hence I is an open
interval of the form (a, x₀) for some x₀ ∈ (a, b] on which f > c. Were f(x₀) < c,
we could use continuity as above to conclude that f < c on some (x, x₀), and
this is a contradiction. Similarly, were f(x₀) > c, we could conclude that
f > c on a larger interval (a, x) with x > x₀, but then x ∈ P, and x ∈ (a, x₀],
which is another contradiction. Thus f(x₀) = c as desired. □
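A constructive counterpart of this proof is the bisection method, which hunts down the crossing point x₀ numerically (a sketch; the test function and tolerance are arbitrary choices):

```python
import math

def ivt_bisect(f, a, b, c, tol=1e-12):
    """Locate x0 in [a, b] with f(x0) = c, assuming f continuous and
    f(a) > c > f(b) as in the proof of A.2."""
    assert f(a) > c > f(b)
    while b - a > tol:
        m = (a + b) / 2.0
        if f(m) > c:
            a = m        # f still exceeds c at m, so the crossing lies right of m
        else:
            b = m
    return (a + b) / 2.0

# f(x) = cos x on [0, pi] takes the value 0.5 exactly at pi/3.
x0 = ivt_bisect(math.cos, 0.0, math.pi, 0.5)
assert abs(x0 - math.pi / 3.0) < 1e-9
```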
(A.5) Corollary. If f is differentiable on an interval I and f′(x) ≠ 0 in I, then
when [a, b] ⊆ I, f assumes each value between f(a) and f(b) precisely once on
[a, b].
PROOF. From A.3, we could not have f(a) = f(b), for any distinct points a,
b E I. But f is continuous so that A.2 applies. D
and ℰ(X − X₀) → 𝒪 as X → X₀, since |X* − X₀| ≤ |X − X₀|. Thus f is differentiable
at X₀. □
(1/|x − x₀|) |∫_{x₀}^x [f(t) − f(x₀)] dt| ≤ (1/|x − x₀|) ∫_{x₀}^x |f(t) − f(x₀)| dt
< (ε/|x − x₀|) ∫_{x₀}^x dt = ε;
reversing the limits on the integral shows that the same estimate is valid when
0 < x₀ − x < δ. Hence as x → x₀, the integral term in (1) approaches zero
and we conclude that
F′(x₀) = lim_{x→x₀} (F(x) − F(x₀)) / (x − x₀) = f(x₀).
Finally, when f ∈ C¹[a, b], then by the result just established, F(x) :=
∫_a^x f′(t) dt is defined on [a, b] with derivative F′(x) = f′(x), x ∈ (a, b). Hence
by A.4, f(b) − f(a) = F(b) − F(a) = ∫_a^b f′(x) dx. □
(A.10) Proposition. If f and g are in C(a, b) and f ≤ g, then ∫_a^b f(x) dx ≤
∫_a^b g(x) dx provided that both integrals exist and are finite; with equality iff
f(x) = g(x), ∀ x ∈ (a, b).
PROOF. 0 ≤ p := g − f is also in the linear space C(a, b). Thus as in Lemma
A.9,
g(y) := ∫_a^b f(x, y) dx (2)
for y, ȳ ∈ [u, v] and |y − ȳ| < δ. Thus for such y, ȳ:
|g(y) − g(ȳ)| = |∫_a^b [f(x, y) − f(x, ȳ)] dx| ≤ ε ∫_a^b dx = ε(b − a),
∫_u^v dy ∫_a^b f(x, y) dx = ∫_u^v g(y) dy
∫_a^b dx ∫_u^v f(x, y) dy
PROOF. By the above remarks, both iterated integrals exist and it remains
only to establish their equality.
We shall regard u as fixed and permit v to vary, say v ∈ [u, v₀]. Then from
Theorem A.8,
(d/dv) ∫_u^v dy ∫_a^b f(x, y) dx = ∫_a^b f(x, v) dx. (3)
The theorem will follow if we can show that the left side of (3), F(v), say,
has the same derivative, since both sides vanish at v = u. (Here we use the
well-known fact that functions with the same derivative on an interval differ
by a constant (see A.4).)
For small t ≠ 0, with v and v + t in [u, v₀], we may express the difference
quotient as follows:
(F(v + t) − F(v))/t = ∫_a^b dx (1/t) ∫_v^{v+t} [f(x, y) − f(x, v)] dy + ∫_a^b f(x, v) dx. (4)
Now, given ε > 0, ∃ δ₀ such that |f(x, y) − f(x, v)| < ε when |y − v| < δ₀ and
y, v ∈ [u, v₀], ∀ x ∈ [a, b], by the aforementioned uniform continuity of f on
[a, b] × [u, v₀] (Lemma 5.2). Thus when 0 < t < δ₀, the iterated integral in
(4) can be estimated by
as desired. □
Interchanging the order of integration as permitted by this theorem may
be interpreted as integrating "under" the integral sign. It may also be possible
to differentiate under the integral sign.
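That is, for g(y) = ∫_a^b f(x, y) dx one has g′(y) = ∫_a^b f_y(x, y) dx when f_y is continuous; a quick numerical spot-check (a sketch; f(x, y) = sin(xy) and the grid sizes are arbitrary choices):

```python
import math

def quad(g, a, b, n=4000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

f = lambda x, y: math.sin(x * y)
f_y = lambda x, y: x * math.cos(x * y)     # continuous partial derivative

g = lambda y: quad(lambda x: f(x, y), 0.0, 1.0)
y, eps = 0.8, 1e-6

# central difference of g versus the integral of f_y: they should agree
lhs = (g(y + eps) - g(y - eps)) / (2.0 * eps)
rhs = quad(lambda x: f_y(x, y), 0.0, 1.0)
assert abs(lhs - rhs) < 1e-5
```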
(A.13) Theorem. If f = f(x, y) and its partial derivative f_y are continuous on
[a, b] × [u, v] then
g(y) := ∫_a^b f(x, y) dx (5)
hence by the theorem just proven,
However, the last inner integration may be carried out by A.8, and the last
iterated integral becomes
Finally, from Theorem A.13 and the chain rule we obtain Leibniz' formula:
open neighborhood
S,(Xo) = {X E [Rd: IX - Xol < r},
so that $F'(X)$ is also invertible in this neighborhood. The continuity of the partial derivatives (at $X_0$) in conjunction with the mean value theorem ensures that each component $f_j$ is differentiable at $X_0$ as in A.7, and we may infer that $F$ is differentiable at $X_0$ in that
Now, $F'(X_0)U \ne \mathcal{O}$, since $F'(X_0)$ is invertible (§0.13), and hence the real-valued function $\rho(U) \stackrel{\text{def}}{=} |F'(X_0)U|$ is positive and continuous on the compact surface of the unit sphere, $\{U \in \mathbb{R}^d : |U| = 1\}$ (§A.0). Thus, by Proposition 5.3, $\rho$ assumes its necessarily positive minimum value, $4m$, say, so that from (10) and the reverse triangle inequality,
\[
\frac{|F(X) - F(X_0)|}{|X - X_0|} \ge |F'(X_0)U| - |\beta(X - X_0)| > 4m - m = 3m \;(> 2m), \tag{11}
\]
or, by (11),
\[
\mu(X_1) > 2m|X_1 - X_0| - mt = 2mt - mt = mt,
\]
which contradicts our knowledge that the minimum value
\[
\mu(X_1) \le \mu(X_0) = |F(X_0) - Y_1| = |Y_0 - Y_1| < mt.
\]
Thus the minimum value occurs at $X_1$ with $|X_1 - X_0| < t$, and $X_1$ must also give the minimum value of
§A.5. Families of Solutions to a System of Differential Equations 429
Figure A.1
tions, denoted $T(\cdot\,; L)$, have a Jacobian matrix $T_L$ with continuous elements, and to obtain this we shall require that $G$ have a Jacobian matrix $G_T$ with continuous elements (and columns indexed by $T$).
(A.16) Lemma. If the elements of $G_T$ are continuous on $K_0$, then for $(x, T_0)$ and $(x, T_1)$ in $K_0$:
\[
G(x, T_1) - G(x, T_0) = \Bigl(\int_0^1 G_T(x, T_u)\,du\Bigr)(T_1 - T_0), \tag{13}
\]
since, componentwise,
\[
g(x, T_1) - g(x, T_0) = \int_0^1 \frac{d}{du}\,g(x, T_u)\,du = \Bigl(\int_0^1 g_T(x, T_u)\,du\Bigr)\cdot(T_1 - T_0),
\]
Each solution $T \in \mathscr{T}$ of (12) with $T(a) = L_0$ also satisfies (15), and from continuity $(x, T(x)) \in K_0$ for a maximal interval $[a, \tilde b]$, with $\tilde b \le b$. Consequently, on $[a, \tilde b]$ we may use (14) to estimate $V(x) = T(x) - T_0(x)$, as follows:
It then follows that $T(a) = L$, and from the fundamental theorem of calculus that $T \in \mathscr{T}$, with $T'(x) = G(x, T(x))$, $x \in [a, b]$. The uniqueness is then a consequence of A.17.
Let $r = b - a$.
The solution to (16) is obtained by successive approximation from (15) as follows:
\[
T_0(x) = L_0 + \int_a^x G(s, T_0(s))\,ds; \qquad
T_{n+1}(x) = L + \int_a^x G(s, T_n(s))\,ds, \quad n = 0, 1, 2, \ldots. \tag{17}
\]
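As an illustration outside the text, the scheme (17) can be carried out numerically. The sketch below (all names ours) applies it to the scalar problem $T' = T$, $T(0) = 1$ on $[0, 1]$, whose solution is $e^x$, approximating each integral by the composite trapezoidal rule:

```python
import math

def picard(G, a, L, x_end, n_iters, n_steps=1000):
    """Successive approximations T_{n+1}(x) = L + integral from a to x
    of G(s, T_n(s)) ds, each integral computed by the trapezoidal rule."""
    h = (x_end - a) / n_steps
    xs = [a + i * h for i in range(n_steps + 1)]
    T = [L] * (n_steps + 1)                 # T_0: the constant initial guess
    for _ in range(n_iters):
        vals = [G(x, t) for x, t in zip(xs, T)]
        new_T, acc = [L], 0.0
        for i in range(n_steps):
            acc += 0.5 * (vals[i] + vals[i + 1]) * h
            new_T.append(L + acc)
        T = new_T
    return xs, T

xs, T = picard(lambda x, t: t, 0.0, 1.0, 1.0, n_iters=25)
print(abs(T[-1] - math.e))   # the iterates converge rapidly toward e^x
```

The factorial decay of the increments, as in (19), is what makes so few iterations suffice here.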
\[
\|T_2 - T_0\| \le \|T_1 - T_0\| + \|T_2 - T_1\| \le \delta(1 + \gamma r) \le \delta e^{\gamma r} < \delta_0.
\]
Thus $(x, T_2(x)) \in K_0$, and we may continue inductively to conclude that for $n = 0, 1, 2, \ldots$
\[
|T_{n+1}(x) - T_n(x)| \le \frac{[\gamma(x - a)]^n}{n!}\,\delta, \tag{19}
\]
while
\[
\|T_{n+1} - T_0\| \le \delta e^{\gamma r} < \delta_0, \quad\text{so that}\quad (x, T_{n+1}(x)) \in K_0.
\]
From (19), we see that when $n \ge 1$,
\[
T_n(x) = T_0(x) + \sum_{k=0}^{n-1} [T_{k+1}(x) - T_k(x)],
\]
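Summing the bounds (19) over these telescoping terms shows why the iterates converge uniformly (with $r = b - a$ as before):

```latex
\sum_{k=0}^{\infty} |T_{k+1}(x) - T_k(x)|
  \;\le\; \delta \sum_{k=0}^{\infty} \frac{[\gamma(x-a)]^{k}}{k!}
  \;\le\; \delta e^{\gamma r} \;<\; \delta_0 ,
```

so by the Weierstrass M-test the partial sums, i.e. the $T_n$, converge uniformly on $[a, b]$ to a continuous limit $T$.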
and we can conclude that the solution to (12), with $T(a) = L$, is uniquely determined in this neighborhood. (It is possible to give growth conditions on $G$ which permit the extension of this solution to the entire interval $[a, b]$. See [C-L] and [Ak].)
\[
V(x, h) - V(x) = \int_a^x \bigl\{[G(s, T(s)) - G(s, T_0(s))]\,h^{-1} - G_T(s, T_0(s))\,V(s)\bigr\}\,ds.
\]
Introducing
\[
T_u(s) = u\,T(s) + (1 - u)\,T_0(s) \quad\text{for } u \in [0, 1], \tag{24}
\]
From A.18, we know that when $|h| = |L - L_0| = \delta < \delta_0 e^{-\gamma(b-a)}$, then $\|T - T_0\| < \delta\kappa_0 = |h|\kappa_0$, where $\kappa_0 = e^{\gamma(b-a)}$, and by (24),
\[
\|T_u - T_0\| = |u|\,\|T(\cdot, h) - T_0\| < |h|\kappa_0 \quad\text{for } u \in [0, 1]. \tag{26}
\]
From the uniform continuity of the components of $G_T$ on the compact set $K_0$ (5.2), for a typical row $g_T$ of $G_T$ and each $\varepsilon > 0$, we can conclude by (26) that $|g_T(s, T_u(s)) - g_T(s, T_0(s))| \le \varepsilon$ when $|h|$ is sufficiently small.
Thus we may estimate (25) in the manner used to obtain (14), and see that for sufficiently small $h$ and an appropriately large $\gamma$,
\[
v_h(x) \stackrel{\text{def}}{=} |V(x, h) - V(x)| \le \gamma\Bigl(\int_a^x v_h(s)\,ds + \varepsilon\|V\|\Bigr) = a(x), \text{ say}.
\]
Then, as in the proof of A.17, $a'(x) = \gamma v_h(x) \le \gamma a(x)$, or $(e^{-\gamma x} a(x))' \le 0$, so that $v_h(x) \le a(x) \le e^{\gamma(x-a)} a(a) \le e^{\gamma(b-a)} \gamma\varepsilon\|V\|$. Since $\varepsilon$ may be made as small as we wish, we see that as $h \to 0$, $v_h(x) \to 0$, and in fact $\|V(\cdot, h) - V\| \to 0$.
Recalling the definitions (22) and (21) of $V(x, h)$ and $V(x)$, respectively, we have proved that for an appropriate $E$, the partial derivative $T_L = T_L(x; L_0)$ exists and provides a continuous solution to equation (23)
$y_1$ is, within a sign, the unique eigenfunction in $\mathscr{D}_0$ for this eigenvalue $\lambda_1$. [Otherwise, when $\lambda = \lambda_1$, the equation (28) would have two linearly independent solutions, both in $\mathscr{D}_0$. It would then follow that all solutions to this second-order linear equation on $[a, b]$ would be in $\mathscr{D}_0$, contradicting the fact that a solution $y$ can be constructed by the method of §A.5 with $y(a) \ne 0$. (See [C-L].)]
This argument is equally applicable to any eigenvalue $\lambda$, so that the associated eigenfunction $y$ is uniquely determined within a sign by the normalization $G(y) = 1$. In what follows we suppose the eigenfunctions to be so normalized. (We also suppress the argument $x$ in the integrands.)
PROOF. Combining Theorem 6.4b with Remark 5.17, it follows that if $y_2$ accomplishes this minimization then $\exists$ Lagrangian multipliers $\mu$ and $-\lambda_2$ such that $y_2$ is a $C^1$ solution of the equation
(32)
(The constraints eliminate the alternative conclusion 6.4a, since
\[
\begin{vmatrix} \delta G(y; y) & \delta G(y; y_1) \\ \delta G_1(y; y) & \delta G_1(y; y_1) \end{vmatrix}
= \begin{vmatrix} 2G(y) & 2G_1(y) \\ G_1(y) & G(y_1) \end{vmatrix}
= \begin{vmatrix} 2 & 0 \\ 0 & 1 \end{vmatrix} = 2.)
\]
§A.6. The Rayleigh Ratio 437
Here $\mu = 0$. [Indeed, with two integrations by parts repeatedly utilizing the constraint $G_1(y) = 0$, we have from (29) that
\[
= -\int_a^b [\tau y' y_1' + q\,y y_1]\,dx
= \int_a^b [(\tau y_1')' - q y_1]\,y\,dx = -\int_a^b \lambda_1 p\,y_1 y\,dx = 0,
\]
since $y_1$ is a solution of (28) for $\lambda = \lambda_1$. (The boundary terms vanish as usual with $y$ and $y_1$.)]
It follows that $y_2$ is an eigenfunction for the eigenvalue $\lambda_2$, and as in Problem 7.19(a), $\lambda_2 = R(y_2) \ge R(y_1) = \lambda_1$. (Why?) Moreover, $\lambda_2 > \lambda_1$, for otherwise $y_2$ would be an eigenfunction for $\lambda_1$ which is essentially different from $y_1$ (since $G_1(y_2) = 0$). Continuing in this manner, we can define $\lambda_3 = \min\{R(y) : y \in \mathscr{D},\ G_j(y) = 0,\ j = 1, 2\}$ where
(A.22) Proposition (Courant). For $j = 1, 2, \ldots, n - 1$, let $\varphi_j \in C[a, b]$. Then

PROOF. We may select constants $c_i$, not all zero, so that the function $y = \sum_{i=1}^n c_i y_i$ meets the orthogonality conditions $\Phi_j(y) = \sum_{i=1}^n c_i \Phi_j(y_i) = 0$, $j = 1, 2, \ldots,$
\[
G(y) = \int_a^b p y^2\,dx = \sum_{i,j=1}^{n} c_i c_j \int_a^b p\,y_i y_j\,dx = \sum_{i=1}^{n} c_i^2 \ne 0, \tag{34}
\]
or
\[
F(y) \le \lambda_n \sum_{i=1}^{n} c_i^2, \quad\text{since } \lambda_1 < \lambda_2 < \cdots < \lambda_n.
\]
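The bound just displayed can be read off from the eigenfunction expansion: assuming, consistently with (29) and (28), that $F(y) = \int_a^b (\tau y'^2 + q y^2)\,dx$ and that integration by parts gives $\int_a^b (\tau y_i' y_j' + q\,y_i y_j)\,dx = \lambda_i \int_a^b p\,y_i y_j\,dx = \lambda_i \delta_{ij}$, we get:

```latex
F(y) = \sum_{i,j=1}^{n} c_i c_j \int_a^b (\tau y_i' y_j' + q\, y_i y_j)\,dx
     = \sum_{i=1}^{n} \lambda_i c_i^2
     \;\le\; \lambda_n \sum_{i=1}^{n} c_i^2 .
```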
Now, when $p = p_0$ and $\tau = \tau_0$ are (positive) constants, while $q = 0$, then (28) has, for eigenvalues $\tilde\lambda_n > 0$, the elementary eigenfunctions
\[
y_n(x) = a_n \cos \omega_n x + b_n \sin \omega_n x, \quad\text{where } \omega_n = n\pi/l,\ \ l = b - a;
\]
and the coefficients $a_n$ and $b_n$ are selected to have $y_n \in \mathscr{D}$. (See §8.9(a).)
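Assuming (28) takes the constant-coefficient Sturm–Liouville form $\tau_0 y'' + \tilde\lambda p_0 y = 0$ in this case ($q = 0$), substitution of $y_n$ gives the eigenvalues explicitly:

```latex
\tau_0 y_n'' + \tilde\lambda_n p_0 y_n = 0, \qquad y_n'' = -\omega_n^2 y_n
\;\Longrightarrow\;
\tilde\lambda_n = \frac{\tau_0}{p_0}\,\omega_n^2 = \frac{\tau_0}{p_0}\Bigl(\frac{n\pi}{l}\Bigr)^{2},
```

which makes the growth $\tilde\lambda_n \to +\infty$ used in Corollary A.23 apparent.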
(A.23) Corollary. When $p_0 = \max p$ and $\tau_0 = \min \tau$, then $\tilde\lambda_n \le \lambda_n$, $n = 1, 2, \ldots$, so that $\lambda_n \to +\infty$ as $n \to \infty$.
PROOF. For this $p_0$, $\tau_0$ and $q = 0$, let $\tilde R$ be the associated Rayleigh ratio, which is defined since $p_0 > 0$. Observe that when $y \in \mathscr{D}$:
(A.24) Theorem. If $\lambda_n \to +\infty$ as $n \to \infty$, and $\varphi \in C[a, b]$, set
Thus $\sum_{n=1}^{N} c_n^2 \le G(\varphi)$ and (a) follows as $N \to \infty$. (Observe that we have used only the orthonormality of the $y_n$.)
If, in addition, $\varphi \in \mathscr{W}_0$, then we may suppose that $G(\varphi_N) \ne 0$, $\forall N$. [Otherwise we would have $\sum_{n=1}^{N} c_n^2 = G(\varphi)$, while $\varphi_N = 0$ (Why?), so that $c_j = G_j(\varphi) = \sum_{n=1}^{N} c_n G_j(y_n) = 0$ if $j > N$, and (b) would follow.] Then $\varphi_N \in \mathscr{W}_0$ and by (33) and (35), $\lambda_N \le R(\varphi_N) = F(\varphi_N)/G(\varphi_N)$, or by (31),
\[
\int_0^{2\pi} y(x)\sin(x/2)\,dx = 0,
\]
then
\[
F(y) = \int_0^{2\pi} [(y')^2 - y^2]\,dx \ge 0,
\]
\[
\int_0^{2\pi} y^2\,dx = \sum_{n=1}^{\infty} c_n^2, \quad\text{with } c_n = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} y(x)\sin\frac{nx}{2}\,dx, \tag{37}
\]
\[
\int_0^{2\pi} (y')^2\,dx = \sum_{n=1}^{\infty} \tilde c_n^2, \tag{38}
\]
where
\[
\tilde c_n = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} y'(x)\cos\frac{nx}{2}\,dx
= \frac{n}{2\sqrt{\pi}}\int_0^{2\pi} y(x)\sin\frac{nx}{2}\,dx = \frac{n}{2}\,c_n, \quad n = 1, 2, \ldots,
\]
since $y(0) = y(2\pi) = 0$. Thus combining (37) and (38) and recalling that $c_1 = 0$, we have
\[
\int_0^{2\pi} (y'^2 - y^2)\,dx = \sum_{n=2}^{\infty} (\tilde c_n^2 - c_n^2)
= \sum_{n=2}^{\infty} \Bigl[\Bigl(\frac{n}{2}\Bigr)^2 - 1\Bigr] c_n^2 \ge 0,
\]
with equality iff $c_n = 0$ when $n > 2$. But it then follows from (36) that $\varphi_2(x) = y(x) - c_2 \sin x$ satisfies $G(\varphi_2) = \int_0^{2\pi} (\varphi_2)^2\,dx = G(y) - (c_2)^2 = 0$ (by (37)), so that $\varphi_2 \equiv 0$; i.e., $y(x) = c_2 \sin x$. $\square$
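A numerical spot check of this inequality (our illustration, not the text's): $y(x) = \sin(3x/2)$ vanishes at $0$ and $2\pi$, satisfies the integral condition, and gives $F(y) = 5\pi/4 > 0$, while $y(x) = \sin x$ realizes the equality case:

```python
import math

TWO_PI = 2 * math.pi

def midpoint(fn, n=20000):
    """Midpoint-rule integral of fn over [0, 2*pi]."""
    h = TWO_PI / n
    return sum(fn((i + 0.5) * h) for i in range(n)) * h

def F(y, dy):
    """F(y) = integral of (y'^2 - y^2) over [0, 2*pi]."""
    return midpoint(lambda x: dy(x) ** 2 - y(x) ** 2)

def constraint(y):
    """The orthogonality condition: integral of y(x) sin(x/2)."""
    return midpoint(lambda x: y(x) * math.sin(x / 2))

y1, dy1 = (lambda x: math.sin(1.5 * x)), (lambda x: 1.5 * math.cos(1.5 * x))
print(constraint(y1), F(y1, dy1))                    # ~0 and 5*pi/4 > 0
print(constraint(math.sin), F(math.sin, math.cos))   # ~0 and ~0: equality case
```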
Remark. There is a similar proof for the standard Wirtinger inequality used in Problem 1.6, involving the general theory of Fourier series. (See [H-L-P].) As we observed there, some restriction on the functions in $\mathscr{D}_0$ is necessary for its validity. Fortunately, the integral condition $\int_0^{2\pi} y(x)\sin(x/2)\,dx = 0$ can always be achieved in the solution of the classical isoperimetric problem, without violating the additional requirements that $y(0) = y(2\pi) = 0$. [We suppose that the origin is placed on a typical bounding curve of length $2\pi$, and "load" the curve nonuniformly with the mass function $\sin(s/2)$, where $s$ is the arc length from the origin. Then, if the $x$ axis is taken through the centroid of the loaded curve, it follows that $\int_0^{2\pi} y(s)\sin(s/2)\,ds = 0$.]
§A.7. Linear Functionals and Tangent Cones in $\mathbb{R}^d$

We say that the $l_i$ are linearly independent when the $A_i$ are linearly independent, $i = 1, 2, \ldots, m \le d$. We write $A \ge \mathcal{O}$ when each component of $A$ is nonnegative, and give a corresponding componentwise interpretation to other vector-valued inequalities. Let $M = (\mu_1, \mu_2, \ldots, \mu_m) \in \mathbb{R}^m$.
PROOF. We know that if $T \in \mathscr{K}_0$, then $G'(X_0)T \ge \mathcal{O}$. Conversely, suppose that $G'(X_0)T \ge \mathcal{O}$ for some $T$ with $|T| = 1$, and let $l_i(X) = \nabla g_i(X_0)\cdot X$, $i = 1, 2, \ldots, m$. As $X$ ranges over $\mathbb{R}^d$, $L(X) = (l_1(X), \ldots, l_m(X))$ ranges over $\mathbb{R}^m$, since the $l_i$ are linearly independent by hypothesis. Thus there exist $X_j \in \mathbb{R}^d$, $j = 1, 2, \ldots, m$, for which the vectors $L(X_j)$ are linearly independent in $\mathbb{R}^m$, so that the $m \times m$ matrix $\mathbf{G}$ with elements $g_{ij} = \nabla g_i(X_0)\cdot X_j$, $i, j = 1, 2, \ldots, m$, is invertible.
Next, observe that $(0, \mathcal{O}) \in \mathbb{R}^{m+1}$ is in the $\mathcal{O}$-level set of
(since $G(X_0) = \mathcal{O}$), and the Jacobian matrix $F_V(0, \mathcal{O}) = \mathbf{G}$ (since $(\partial f_i/\partial v_j)(0, \mathcal{O}) = \nabla g_i(X_0)\cdot X_j$). Hence by implicit function theory [Ed], we can represent a neighborhood of this level set as the graph of a unique $C^1$ function $V(t)$ defined for $|t| < \varepsilon$, say; i.e., $F(t, V(t)) \equiv \mathcal{O}$ for $|t| < \varepsilon$, and $V(0) = \mathcal{O}$. Differentiating and evaluating at $t = 0$ we obtain
Thus by Proposition A.26(b) there exists a unique $M \in \mathbb{R}^m$ for which $M \ge \mathcal{O}$ and
\[
\nabla f(X_0) = -\sum_{i=1}^{m} \mu_i \nabla g_i(X_0),
\]
or
\[
\nabla \tilde f(X_0) = \mathcal{O}, \quad\text{if } \tilde f = f + M\cdot G.
\]
Now, were some $g_j(X_0) < 0$, then by continuity we should have $g_j < 0$ near $X_0$, and such constraints would be inactive in the minimization. It follows that we can restrict the $g_i$ to those for which $g_i(X_0) = 0$, and apply the part already established to these $g_i$. But then, by uniqueness, we can simply set $\mu_j = 0$ when $g_j(X_0) < 0$, and the conclusion follows. $\square$
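The multiplier rule can be checked on a small example of our own: minimize $f(X) = x^2 + y^2$ subject to the single constraint $g(X) = 1 - x - y \le 0$. The minimizer is $X_0 = (1/2, 1/2)$ with the constraint active, and $\nabla f(X_0) = -\mu \nabla g(X_0)$ holds with $\mu = 1 \ge 0$:

```python
def f(x, y):
    return x * x + y * y

def g(x, y):
    return 1.0 - x - y           # feasible set: g(x, y) <= 0

# crude grid search for the constrained minimum over [0, 2] x [0, 2]
N = 200
best = min(
    (f(i / N, j / N), i / N, j / N)
    for i in range(2 * N + 1)
    for j in range(2 * N + 1)
    if g(i / N, j / N) <= 0
)
fmin, x0, y0 = best
grad_f = (2 * x0, 2 * y0)        # = (1, 1) at the minimizer
grad_g = (-1.0, -1.0)
mu = -grad_f[0] / grad_g[0]      # multiplier from grad f = -mu * grad g
print(x0, y0, mu)
```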
Bibliography
[Bo] O. Bolza
Lectures on the Calculus of Variations, 3rd ed. Chelsea Publishing Com-
pany: New York, 1973.
Vorlesungen uber Variationsrechnung. B.G. Teubner: Leipzig and Berlin,
1909.
[B-dP] W. Boyce and R. DiPrima
Elementary Differential Equations and Boundary Value Problems, 2nd ed.
John Wiley and Sons: New York, 1969.
[CCV] Contributions to the Calculus of Variations, 4 vols. University of Chicago
Press: Chicago, 1930-1941.
[C] C. Carathéodory
Calculus of Variations and Partial Differential Equations of the First Order:
Vols. I, II. Holden-Day: San Francisco, 1965, 1967.
[Ca] L.B. Carll
A Treatise on the Calculus of Variations. John Wiley and Sons: New York,
1890.
[Ce] L. Cesari
Optimization: Theory and Applications. Springer-Verlag: New York, 1983.
[Cia] F. Clarke
Optimization and Nonsmooth Analysis. John Wiley and Sons: New York,
1983.
[Cl] J.C. Clegg
Calculus of Variations. Oliver and Boyd: Edinburgh, 1968.
[C-L] E. Coddington and N. Levinson
Theory of Ordinary Differential Equations. McGraw-Hill Book Company:
New York, 1955.
[Co] R. Courant
Calculus of Variations and Supplementary Notes and Exercises. New York
University Lecture Notes, 1957.
[C-H] R. Courant and D. Hilbert
Methods of Mathematical Physics, Vol. I. Interscience Publishers: New
York, 1953.
[Cr] B.D. Craven
Mathematical Programming and Control Theory. Chapman and Hall:
London, 1978.
[Ed] C.H. Edwards, Jr.
Advanced Calculus of Several Variables. Academic Press: New York,
1973.
[EI] L.E. El'sgol'c
Calculus of Variations. Addison-Wesley Publishing Company: Reading,
MA,1962.
[E-T] I. Ekeland and R. Temam
Convex Analysis and Variational Problems, North-Holland: Amsterdam,
1977.
[Ew] G.M. Ewing
Calculus of Variations with Applications. W.W. Norton: New York, 1969.
[FI] W.D. Fleming
Functions of Several Variables, 2nd ed. Springer-Verlag: New York, 1977.
[F-R] W.D. Fleming and R. Rishel
Deterministic and Stochastic Optimal Control. Springer-Verlag: New York,
1975.
[F] M.J. Forray
Variational Calculus in Science and Engineering. McGraw-Hill Book Com-
pany: New York, 1968.
[Le] G. Leitmann
The Calculus of Variations and Optimal Control. Plenum Press: New York,
1981.
[Li] L.A. Liusternik
Shortest Paths: Variational Problems. Macmillan: London, 1964.
[Lo] P. Loewen
Optimal Control via Nonsmooth Analysis. American Mathematical Society:
Providence, RI, 1991.
[M] K. Maurin
Calculus of Variations and Classical Field Theory, Lecture Notes Series
No. 34. Aarhus: Denmark, 1972.
[M-S] J. Macki and A. Strauss
Introduction to Optimal Control Theory. Springer-Verlag: New York, 1982.
[Mi] S.G. Mikhlin
Variational Methods in Mathematical Physics. Macmillan: New York, 1964.
The Problem of the Minimum of a Quadratic Functional. Holden-Day: San
Francisco, 1965.
The Numerical Performance of Variational Methods. Walters-Noordhoff:
Groningen, 1971.
[Mo] C.B. Morrey, Jr.
Multiple Integral Problems in the Calculus of Variations and Related
Topics. University of California Press: Berkeley, 1943.
[Mor] M. Morse
The Calculus of Variations in the Large. American Mathematical Society:
Providence, RI, 1934.
[Mu] F.D. Murnaghan
The Calculus of Variations. Spartan Books: Washington, DC, 1962.
[N] E.D. Nering
Linear Algebra and Matrix Theory, 2nd ed., John Wiley and Sons: New
York, 1970.
[Ne] L. Neustadt
Optimization: A Theory of Necessary Conditions. Princeton University
Press: Princeton, NJ, 1976.
[No] A.R.M. Noton
Variational Methods in Control Engineering. Pergamon Press: New York,
1965.
[O-R] J.T. Oden and J.N. Reddy
Variational Methods in Theoretical Mechanics. Springer-Verlag: Berlin,
1976.
[Os] R. Osserman
A Survey of Minimal Surfaces. Van Nostrand Reinhold: New York, 1969.
[P] L.A. Pars
An Introduction to the Calculus of Variations. Heinemann: London, 1962.
[Pe] J.P. Petrov
Variational Methods in Optimal Control Theory. Academic Press: New
York, 1968.
[Po] L.S. Pontryagin with V.G. Boltyanskii, R.S. Gamkrelidze, and E.F.
Mishchenko
The Mathematical Theory of Optimal Processes. Pergamon-Macmillan:
New York, 1964.
[P-S] G. Polya and M. Schiffer
Convexity of functionals by transplantation. J. d'Anal. Math., 1953/54,
247-345.
[R] K. Rektorys
Variational Methods in Mathematics, Science and Engineering. Reidel
Publishing Company: Boston, 1975.
[Ro] R.T. Rockafellar
Convex Analysis. Princeton University Press: Princeton, NJ, 1970.
[Rud] W. Rudin
Principles of Mathematical Analysis, 2nd ed., McGraw-Hill Book Com-
pany: New York, 1964.
[Ru] H. Rund
The Hamilton-Jacobi Theory in the Calculus of Variations. D. Van
Nostrand Company: Princeton, 1966.
[Rus] J.S. Rustagi
Variational Methods in Statistics. Academic Press: New York, 1979.
[S] H. Sagan
Introduction to the Calculus of Variations. McGraw-Hill Book Company:
New York, 1969.
[S-S] A. Seierstad and K. Sydsaeter
Optimal Control with Economic Applications. North-Holland: New York,
1987.
[Se] L.A. Segel
Mathematics Applied to Continuum Mechanics. Macmillan: New York,
1977.
[Sew] M.J. Sewell
Maximum and Minimum Principles. Cambridge University Press: New
York, 1987.
[Sm] D.R. Smith
Variational Methods in Optimization. Prentice-Hall: Englewood Cliffs, NJ,
1974.
[Smi] P. Smith
Convexity Methods in Variational Calculus. Research Studies Press Ltd.:
Letchworth, Hertfordshire, England, 1985.
[St] W. Stacey
Variational Methods in Nuclear Reactor Physics. Academic Press: New
York,1974.
[S-F] G. Strang and G.J. Fix
An Analysis of the Finite Element Method. Prentice-Hall: Englewood
Cliffs, NJ, 1973.
[T] L. Tonelli
Fondamenti di Calcolo delle Variazioni, Vols. I, II. N. Zanichelli: Bologna, 1921,
1923.
[Ti] V.M. Tikhomirov
Fundamental Principles of the Theory of Extremal Problems, John Wiley &
Sons: New York, 1986.
[To] G. Leitmann, Ed.
Topics in Optimization. Academic Press: New York, 1967.
[Tr] J.L. Troutman
Boundary Value Problems of Applied Mathematics. PWS Publishing Com-
pany, Boston, 1994.
[V] M.M. Vainberg
Variational Methods for the Study of Nonlinear Operations. Holden-Day:
San Francisco, 1964.
Variational Method and Method of Monotone Operators in the Theory of
Nonlinear Equations. John Wiley and Sons: New York, 1973.
[W] K. Washizu
Variational Methods in Elasticity and Plasticity. Pergamon Press: New
York, 1967.
[Wa] J. Warga
Optimal Control of Differential and Functional Equations. Academic Press:
New York, 1972.
[W-S] A. Weinstein and W. Stenger
Methods of Intermediate Problems for Eigenvalues. Academic Press: New
York, 1972.
[Wei] H. Weinberger
A First Course in Partial Differential Equations. Blaisdell Publishers: New
York, 1965.
[We] R. Weinstock
Calculus of Variations with Applications to Physics and Engineering.
McGraw-Hill Book Company: New York, 1952.
[Y] L.C. Young
Lectures on the Calculus of Variations and Optimal Control Theory. W.B.
Saunders: Philadelphia, PA, 1969.
[Y - M] W. Yourgrau and S. Mandelstam
Variational Principles in Dynamics and Quantum Theory. Dover Publica-
tions: New York, 1979.
[Z] E. Zeidler
Nonlinear Functional Analysis and its Applications, III: Variational Methods
and Optimization. Springer-Verlag: New York, 1985.
Historical References
Answers to Selected Problems
Chapter 0
Chapter 2
Chapter 3
3.25. (b) $\dfrac{wL^4}{2J}\Bigl(\dfrac{x_1^4}{12} - \dfrac{x_1^3}{6} + \dfrac{x_1^2}{4}\Bigr)$, where $x_1 = x/L$;
(d) $\dfrac{M}{4JL}\,x^2(1 - x_1)$, with max. at $x = 2L/3$.
Chapter 5
Chapter 6
6.41. (a) $u_{xx} + u_{yy} + x^2 + y^2 = 0$.
(c) $3u_x u_{xx} - 6u_y^2 u_{yy} - u = 0$.
Chapter 7
(Note: For 7.6 and 7.7, express the Weierstrass–Erdmann conditions in terms of $m$ and $n$, the distinct limiting values of $\hat y'$ at a potential corner point, and determine these values.)
7.9. (c) $\hat y(x) = \begin{cases} x, & x \le c, \\ e^{x-1}, & x \ge c, \end{cases}$ where $c = e^{c-1}$.
7.23. (d) $\tau = T$ if $T^2 \le 2h$ and $\tau = T - \sqrt{T^2 - 2h}$ otherwise; (e) $T_0 = 3\sqrt{2h}/5$.
Chapter 8
and
Chapter 10
Chapter 11
458 Index