Notes On Analytical Mechanics
“… to reduce the theory of mechanics, and the art of solving the associated problems, to
general formulae, whose simple development provides all the equations necessary for the
solution of each problem… to unite, and present from one point of view, the different
principles which have, so far, been found to assist in the solution of problems in
mechanics; by showing their mutual dependence and making a judgment of their validity
and scope possible. … No diagrams will be found in this work. The methods that I
explain in it require neither constructions nor geometrical or mechanical arguments, but
only the algebraic operations inherent to a regular and uniform process. Those who love
Analysis will, with joy, see mechanics become a new branch of it and will be grateful to
me for having extended its field.”
Newton’s 1st law of motion, the law of inertia, tells us that motion or lack of motion of a
particle is unchanged if there are no forces present; consequently, this is also a test for the
presence of a force.
Newton’s 2nd law of motion relates an applied force to the rate of change of momentum
of a body: F = dp/dt = ṗ = ma, where the latter is true whenever the mass is unchanged.
Thus if the forces are known a priori, the motions are determined, and vice versa.
By Newton’s 3rd law of motion, the reaction force is equal and opposite to the action
force. Thus F₁ + F₂ = dp₁/dt + dp₂/dt = d(p₁ + p₂)/dt = 0, and the total momentum of the system
is unchanged. This is easily extended to many particles, and hence momentum is
conserved in the absence of external forces.
The common notion of work is the application of a force through a distance, made precise
by a path integral: W = ∫_C F·dr. If the amount of work done is independent of the actual
path, depending only upon the end-points, then we have a conservative force. The
mathematical test is if the curl is zero: ∇ × F = 0. Gravity and static electric forces are
conservative; friction is non-conservative. For conservative forces, we have the work-energy theorem:
W = ∫_A^B F·dr = ∫_A^B (dp/dt)·dr = ∫_{p_A}^{p_B} (dr/dt)·dp = (1/m) ∫_{p_A}^{p_B} p·dp = (p_B² − p_A²)/2m.
So work is the change in kinetic energy. If the process is reversible, we can let the work
be performed on us, such as with a compressed spring, or a weight on a lever. Thus we
can store the work, and later convert it back to kinetic energy. This stored work is called
potential energy: U(P) = −∫_{R₀}^P F·dr, where P is any point, and R₀ is the point of reference.
If the potential is known, the force can be obtained by means of the gradient; this is
simply an application of the fundamental theorem of (multivariable) calculus:
F ( P ) = −∇U ( P). The gradient operator kills off the reference level associated with the
potential, so we get the true force.
Let the total energy be E(t) = T(t) + U(r(t)). Then the rate of change of energy with time is
dE/dt = d/dt(p²/2m) + d/dt U(r(t)) = (p/m)·ṗ + Σᵢ (∂U/∂xᵢ)(dxᵢ/dt) = v·ṗ + ∇U·v.
The first term is
instantaneous power, v·F, and the second term contains the gradient of the potential,
which is the negative of the force, followed by the rate of change of position with time,
which is the velocity. The two terms are equal and opposite, leaving dE/dt = 0, so that the
energy is conserved; we say that this is a conservative force. This would not be the case if
the potential depended upon time or velocity, as the additional terms from the time
derivative would not cancel.
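A minimal numerical sketch of this cancellation, assuming a harmonic potential U = ½kx² (the potential, parameter values, and leapfrog stepper are illustrative, not from the notes): the energy computed along a trajectory stays constant up to integrator error.

```python
import numpy as np

m, k = 1.0, 4.0            # assumed illustrative values
dt, steps = 1e-4, 100_000
x, v = 1.0, 0.0            # initial position and velocity

def force(x):
    return -k * x          # F = -dU/dx for U = 0.5*k*x**2

energies = []
for _ in range(steps):
    # leapfrog step: a symplectic method with good energy behavior
    v += 0.5 * dt * force(x) / m
    x += dt * v
    v += 0.5 * dt * force(x) / m
    energies.append(0.5 * m * v**2 + 0.5 * k * x**2)

print(max(energies) - min(energies))   # tiny: E = T + U is conserved
```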
The mechanics of Newton are carried out with forces, which are represented as vectors.
But the kinetic and potential energies are scalar quantities, and we have seen that forces
and the magnitude of the momentum can be recovered from them. In order to further
explore the expressions of mechanics in terms of energies, we must first review
derivatives, and the construction of generalized coordinate systems.
The multivariable chain rule: if z = f(x₁, …, xₙ) with xᵢ = xᵢ(u₁, …, uₘ), then
∂z/∂uⱼ = Σᵢ₌₁ⁿ (∂z/∂xᵢ)(∂xᵢ/∂uⱼ), for each j = 1 to m.
The proof can be found in advanced calculus texts; it is mostly keeping track of details,
which makes it lengthy. The fundamental idea is that you must compound the rates of
change through each extant parametric dependence; each of these is just the simple chain
rule.
Example: z = f(r, θ), with r² = x² + y² and tan(θ) = y/x,
so in the first quadrant r = √(x² + y²), θ = tan⁻¹(y/x).
Then ∂z/∂x = (∂z/∂r)(∂r/∂x) + (∂z/∂θ)(∂θ/∂x).
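A quick symbolic check of this example (a sketch; sympy is assumed to be available):

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)   # first quadrant
r = sp.sqrt(x**2 + y**2)
theta = sp.atan(y / x)
f = sp.Function('f')

z = f(r, theta)
# sympy applies dz/dx = (dz/dr)(dr/dx) + (dz/dtheta)(dtheta/dx) automatically:
print(sp.diff(z, x))
```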
If a function depends on only one variable, then the partial and ordinary derivatives are
the same.
Example with parametric form: z = f(r, θ), r = r(t), θ = θ(t).
Then dz/dt = (∂z/∂r)(dr/dt) + (∂z/∂θ)(dθ/dt),
with ∂r/∂t = dr/dt and ∂θ/∂t = dθ/dt, since r and θ each depend on the single variable t.
In the presence of constraints one or more of the independent variables may be held
constant. In that case the non-varying terms vanish. This is especially common in
thermodynamics, where the chemist is able to control the volume or the pressure. The
notation must be adapted to indicate the held variables to avoid ambiguities.
Consider the position vector r = r(x₁, …, xₙ, t), where the explicit time dependence allows for moving
or rotating coordinate systems:
ṙ = dr/dt = Σᵢ₌₁ⁿ (∂r/∂xᵢ)(dxᵢ/dt) + ∂r/∂t = Σᵢ₌₁ⁿ (∂r/∂xᵢ)ẋᵢ + ∂r/∂t,
and also its partial with respect to ẋₖ:
∂ṙ/∂ẋₖ = ∂/∂ẋₖ (Σᵢ₌₁ⁿ (∂r/∂xᵢ)ẋᵢ + ∂r/∂t) = Σᵢ₌₁ⁿ (∂r/∂xᵢ)(∂ẋᵢ/∂ẋₖ),
where we have recognized that there is no ẋₖ dependence in the ∂r/∂xᵢ. Independence of the
coordinates also means ∂ẋᵢ/∂ẋₖ = 0 except when i = k, so the summation collapses to
(2.1) ∂ṙ/∂ẋₖ = ∂r/∂xₖ;
this is called cancellation of the dots. We will use this later, along with
this interchange of operators:
(2.2) d/dt(∂r/∂xₖ) = ∂ṙ/∂xₖ.
This is established by starting from
d/dt(∂r/∂xₖ) = Σᵢ₌₁ⁿ (∂²r/∂xᵢ∂xₖ)ẋᵢ + ∂²r/∂t∂xₖ and ∂ṙ/∂xₖ = Σᵢ₌₁ⁿ (∂²r/∂xₖ∂xᵢ)ẋᵢ + ∂²r/∂xₖ∂t;
the two sides differ only in the order of the partial derivatives; Clairaut’s theorem tells us
that these can be interchanged as long as both of the resulting partial derivatives are
continuous, which provides the condition for the identity.
The chain rule for a composition: d/dx f(g(x)) = (df/dy)|_{y=g(x)} (dy/dx), or briefly,
df/dx = (df/dy)(dy/dx), where y = g(x).
For an implicitly defined function F(x, y) = 0:
0 = d/dx F(x, y) = ∂F/∂x + (∂F/∂y)(dy/dx) = Fₓ + F_y (dy/dx), so dy/dx = −Fₓ/F_y for F_y = ∂F/∂y ≠ 0.
Thus y(x) = ∫ (dy/dx) dx = −∫ (Fₓ/F_y) dx + C.
For inverse functions, let y = f(θ); then
1 = dθ/dθ = d/dθ f⁻¹(y)|_{y=f(θ)} = (d/dy f⁻¹(y)) (df(θ)/dθ) ⇒ d/dy f⁻¹(y) = 1/(df(θ)/dθ) = 1/(dy/dθ).
Just remember to convert all of the variables from θ to y when you are done.
Example: d/dy ⁿ√y. Let y = xⁿ, so that x = ⁿ√y = y^(1/n). Then
d/dy (y^(1/n)) = 1/(d/dx xⁿ) = 1/(n xⁿ⁻¹) = 1/(n y^((n−1)/n)) = (1/n) y^((1/n)−1).
Example: d/dy tan⁻¹(y). Let y = tan(θ), so d/dy tan⁻¹(y) = 1/(d tan(θ)/dθ). But
d/dθ tan(θ) = d/dθ (sin(θ)/cos(θ)) = cos(θ)/cos(θ) + sin²(θ)/cos²(θ) = 1 + tan²(θ) = 1 + y².
So d/dy tan⁻¹(y) = 1/(1 + y²).
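A symbolic confirmation of this inverse-derivative rule (a sketch; sympy assumed):

```python
import sympy as sp

theta, y = sp.symbols('theta y')
dy_dtheta = sp.diff(sp.tan(theta), theta)          # tan(theta)**2 + 1
inv_rule = (1 / dy_dtheta).subs(sp.tan(theta), y)  # convert variables to y
print(sp.simplify(inv_rule - sp.diff(sp.atan(y), y)))   # 0: the two forms agree
```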
1. Use the chain rule to find the derivatives of the inverses of the following functions:
   sin(θ)
   exp(x)
2. Use the chain rule to find the total derivatives of the following functions, with respect
to the parameter t:
   cos(x² + y²)
   x² + y² + z²
3. Repeat problem (2), but take the partial derivatives with respect to z.
4. Recall equation (2.1); use the same methods to find ∂r̈/∂ẍₖ.
Recall the definition of the ordinary derivative:
dF/dx (x₀) = lim_{h→0} [F(x₀ + h) − F(x₀)]/h.
Generalize first with N ≥ 1 so that the domain can support a surface (e.g., z = F(x, y)),
and evaluate the limit along the line x₀ + h n̂; this defines the directional derivative:
dF/dn̂ (x₀) = lim_{h→0} [F(x₀ + h n̂) − F(x₀)]/h = lim_{h→0} d/dh F(x₀ + h n̂).
If n̂ is one of the basis vectors, say ûₖ, only the uₖ variable is then subject to change.
Thus follows the definition and rule for partial derivatives: hold all of the uᵢ constant
except for uₖ, and you get dF/dûₖ = ∂F/∂uₖ.
If we find ∂F/∂uₖ continuous for each of the basis vectors at the point x₀ we have the slopes
of a hyper-plane tangent to that point. This plane represents a new type of derivative, the
gradient: ∇F = Σᵢ (∂F/∂uᵢ) ûᵢ, which is a vector pointing “up hill” on the tangent plane. The
gradient is identical to the directional derivative having the greatest magnitude, and is
independent of the coordinates used. Note then that ∇F·ûₖ = ∂F/∂uₖ for any basis.
Applying the chain rule to the definition of the directional derivative gives
dF/dn̂ (x₀) = lim_{h→0} d/dh F(x₀ + h n̂) = lim_{h→0} ∇F(x₀ + h n̂)·n̂ = ∇F(x₀)·n̂.
We get a linear
combination of the partial derivatives making up the gradient, with weightings from the
direction vector, n̂. If we change to the basis where n̂ = v̂ₖ then all of the above remains
true since vector relationships are independent of the basis chosen; this provides an
alternative proof of the gradient/unit vector method of evaluation:
dF/dv̂ₖ = ∂F/∂vₖ = Σᵢ (∂F/∂uᵢ)(∂uᵢ/∂vₖ) = ∇F·(∂U/∂vₖ).
Suppose instead that the point of evaluation moves along a path U(s); we can
parameterize via the arc length along the path, or by its parallel, time. The derivative
with respect to this parameterization is called the total derivative and is given by
dF/ds = Σᵢ (∂F/∂uᵢ)(duᵢ/ds) = ∇F·(dU/ds).
The total derivative includes both explicit and implicit
variations due to the parameterization. If this is time, it gives you the speed along the
path. We also see that ∂F/∂vₖ is a directional derivative in the direction of the unit vector
∂U/∂vₖ = v̂ₖ. Similarly, dF/ds is a directional derivative in the direction of dU/ds, but scaled by this magnitude; it is
the direction of the “motion”.
Example: a temperature map T(r) has the gradient ∇T(r) = (∂T/∂x, ∂T/∂y, ∂T/∂z). If the temperature map includes time, T = T(x, y, z, t), then
we can also find the temporal variation, ∂T/∂t. Changing the spatial coordinate system
does not change the gradient, though it does change the vector components. Now consider the
temperature map as parameterized by the bicyclist’s path:
dT/dt = ∂T/∂t + Σₖ (∂T/∂xₖ)(dxₖ/dt) = ∂T/∂t + v·∇T.
This is also known as the convective derivative in fluid flow. Note that the time is what
ties the bicyclist to a specific temperature, and that the second term is a directional
derivative, in the direction of travel, scaled by speed.
Now let F be a vector of M functions of the point x₀:
F = (f₁(x₀), f₂(x₀), …, f_M(x₀))ᵀ.
Then the derivative of F is the matrix where row k is the gradient of fₖ(x₀). This is
called the Jacobian of F, and can be written as:
JF(x₀) = [∂fₖ(x₀)/∂uⱼ] = (∇f₁; ∇f₂; …; ∇f_M) = ∂(f₁, f₂, …, f_M)/∂(u₁, u₂, …, u_N).
Then it is easy to show that the change of F at x₀ is: ΔF(x₀) = JF(x₀)·Δx₀.
As the best linear estimator for F, this is what we want for a generalized derivative.
These properties make the determinant of the Jacobian matrix the best estimator for
volume changes, hence its use as the volume adjustment for change of variables in an
integral.
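The “best linear estimator” property is easy to see numerically. Below is a sketch (the map F and the step sizes are illustrative assumptions) comparing ΔF with JF·Δx, using a finite-difference Jacobian:

```python
import numpy as np

def F(u):
    x, y = u                       # an assumed map from R^2 to R^3
    return np.array([x * y, x + y, np.sin(x)])

def jacobian_fd(F, u0, h=1e-6):
    """Rows are the gradients of the components f_k; columns are d/du_j."""
    f0 = F(u0)
    J = np.zeros((f0.size, u0.size))
    for j in range(u0.size):
        du = np.zeros_like(u0)
        du[j] = h
        J[:, j] = (F(u0 + du) - f0) / h
    return J

u0 = np.array([1.0, 2.0])
delta = np.array([1e-3, -2e-3])
J = jacobian_fd(F, u0)
print(F(u0 + delta) - F(u0))   # the actual change
print(J @ delta)               # the linear estimate: nearly identical
```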
Divergence
The divergence also has a coordinate-free definition, as the net outflow per unit volume:
∇·E = lim_{V→0} (1/V) ∮_{∂V} E·dS, which is independent of
the coordinate system. By Gauss’s theorem, ∫∫∫_V ∇·E dV = ∮_{∂V} E·dS, or the volume
integral of the divergence of a field is equal to the net flow of the field through the
boundary of the volume. A fluid is incompressible if the divergence of its flow is zero.
Curl
The curl has an analogous coordinate-free limit definition, again independent of the
coordinate system. By Stokes’ theorem, ∫∫_R ∇×E·dS = ∮_{∂R} E·dℓ, where R is an open
surface, and ∂R is a closed loop about its opening; dℓ is tangent to the curve.
Let X ∈ ℝᴺ so that F : X → ℝ defines a surface in ℝᴺ⁺¹, and let {uᵢ} be a basis set for X.
The stationary points must be examined further to determine which are the local minima
and maxima of F; if there are boundaries, these must be examined as well.
Note that since the gradient vanishes at a stationary point, so must the directional
derivative, because dF/dn̂ = ∇F·n̂. If we apply a side condition, or constraint, to the space
in which we are working, then we must search for stationary points in the constrained
space.
Let there be m constraints gᵢ on the surface defined by F. Let S be the space: S = {X : gᵢ(X) = 0, for all i = 1 to m}.
Assume that x₀ ∈ S is such a stationary point. Let u(s) be a parameterized curve in
S which passes through x₀ at s = s₀. Then du/ds (s₀) is tangent to the curve there, and each
gᵢ(u(s)) = 0. If we looked at all of the possible curves passing through x₀, and their
derivatives there, we would have found the tangent plane at x₀, for S. By the chain rule,
0 = d/ds gᵢ(u(s)) = ∇gᵢ·(du/ds), so each constraint gradient,
∇gᵢ, is orthogonal to the tangent plane to S. This does not mean that they are parallel to
each other, for we would expect the constraints to be independent of each other.
Lagrange’s theorem states that ∇F = Σᵢ λᵢ∇gᵢ for a unique set of multipliers, the λᵢ, for
each stationary point in the constrained space, S. There are two things going on: the
∇gᵢ are orthogonal to the level curves defined by their corresponding equations of
constraint, which results in m independent directions orthogonal to the space S; and the
gradient of F is a linear combination of those directions at each of the stationary points.
This allows us to rewrite the constrained function as F′ = F − Σᵢ λᵢgᵢ, with ∇F′ = 0.
The directional derivative gives the rate of change in the specified direction:
dF/dû = ∇F·û.
The maximum rate of change is when the direction vector points in the same direction as
the gradient vector; the gradient gives the maximum rate of change.
An isocontour, or constant contour plot, is what you get when you set F ( x, y, z ) = Ck for
even increments ΔCk . Several examples are shown here. The plot above is shown with
simple contour lines, and with color shading. If you “walk” the cyan, you never have to
climb over the hills, though the land is not quite flat!
When the terrain is very steep, you see many contour lines very close together. Their
“density” is proportional to the directional derivative in the direction viewed.
Note that the directional derivative along a contour line is zero, because the contour line
is “instantaneously” level in that direction; this is true in higher dimensional spaces also.
But the directional derivative is the inner product of the gradient with the direction
vector, so they must be orthogonal.
This means that the gradient is normal to every level curve and level surface. Look
closely at the figures above, and find the steepest gradients, where the level curves are
closest together. At these points it will be most obvious that the gradient must be running
perpendicular to the level curves, but it is true everywhere.
If you look for minima or maxima of a function, a plot of the level curves helps you to
find the local extrema quickly. Now consider any directional derivative at these extreme
points: it must be zero in every direction, because you are at a stationary point of the
surface in every direction.
This means that the gradient must be zero at each stationary point: ∇F = 0.
An alternative, coordinate-free, or geometric definition of the gradient is as follows:
∇F = lim_{ΔV→0} (1/ΔV) ∮_{∂V} F dS,
where V is a volume containing the point of interest, ∂V is the boundary of that volume,
and we take the limit as the volume shrinks to zero. The integral is a surface integral,
which uses the function F to weight each of the surface normal vectors, which are then
summed. The result is the net weight of the changes in F for every direction at the point
enclosed by the shrinking volume.
Now consider the process required to find stationary points subject to constraints on the
function, such as g(x) = 0. These are generalized level curves, because the “level” is
buried on the left-hand side of the equation. So we know that the gradient of g is normal
to this surface.
The plots here show a series of level curves of f(x, y), and the constraint
g(x, y) − k = 0. The level curve of f which intersects the level curve of g at the highest
point is a local maximum. But where two curves just touch (“osculate”) they are by
necessity tangent to each other, and so their normals are parallel. We already know that
∇f is normal to the level curve of f, and ∇g is normal to the level curve of g, so they
satisfy Lagrange’s equation:
∇f = λ∇g,
where λ is called a Lagrange multiplier. This is a necessary condition for the location of
a stationary point subject to a level-curve constraint.
Each constraint removes a degree of
freedom from the problem, and each one represents an independent direction for its
gradient, ∇gᵢ, which must be removed from the solution space. The Lagrange equation
for multiple constraints is then a linear combination of these gradients, each with its own
multiplier:
∇F = Σᵢ λᵢ∇gᵢ.
It is often convenient to define a new function with all of the degrees of freedom restored:
F′ = F − Σᵢ λᵢgᵢ,
and then the solution is ∇F′ = 0, and the Lagrange multipliers are found by means of the
side conditions. Often the values of the multipliers are not required, and for this reason
the method is known as the Lagrange method of undetermined multipliers.
1. Solve the following system of equations:
∇f(x, y, z) = λ∇g(x, y, z),
g(x, y, z) = 0.
2. Plug all solutions (x, y, z) from the first step into f(x, y, z) and identify the
extreme values.
Note that we actually have four equations in four unknowns (x, y, z, λ):
∂f/∂x = λ ∂g/∂x, ∂f/∂y = λ ∂g/∂y, ∂f/∂z = λ ∂g/∂z, g(x, y, z) = 0.
First we must formulate the object function, f(x, y, z), which is the volume of the box.
So f(x, y, z) = xyz. The constraint is that the surface area of the box is fixed, so since
opposite sides are the same we have g(x, y, z) = 2(xy + yz + zx) − k = 0, where k is the
fixed value for the surface area. We will also require each side to have a positive
length, because this is a real box.
∇g(x, y, z) = ∇[2(xy + yz + zx) − k] = 2(y + z, x + z, y + x).
Applying Lagrange’s theorem results in the following three equations:
yz = λ·2(y + z),
xz = λ·2(x + z),
xy = λ·2(y + x),
along with the constraint, 2(xy + yz + zx) − k = 0.
If you multiply the first by x, the second by y, and the third by z, you end up with the same
expression on the left hand side of all three, so the right hand sides are all equal. Noting
that λ = 0 must be rejected if these are to be useful boxes, we can divide out the common
factor 2λ, leaving:
x(y + z) = y(x + z) = z(y + x), or expanding each of these, we have
xy + xz = yx + yz = zy + zx, and working through these pair-wise gives
x = y = z.
So the largest volume rectangular box with a given surface area is (surprise!) a cube.
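The same result can be obtained mechanically; here is a sketch using sympy (the solver call and the positivity assumptions are illustrative):

```python
import sympy as sp

x, y, z, lam, k = sp.symbols('x y z lam k', positive=True)
f = x * y * z                          # volume: the object function
g = 2 * (x*y + y*z + z*x) - k          # fixed surface area constraint

eqs = [sp.Eq(sp.diff(f, v), lam * sp.diff(g, v)) for v in (x, y, z)]
eqs.append(sp.Eq(g, 0))
print(sp.solve(eqs, [x, y, z, lam], dict=True))   # x = y = z = sqrt(k/6)
```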
Answer: f(−10, 6) = −68, f(10, −6) = 68.
Example 3: Find the maximum and minimum values of f(x, y, z) = xyz subject to the
constraint x + y + z = 1.
Answer: f(1/3, 1/3, 1/3) = 1/27.
Example 4: Find the maximum and minimum values of f(x, y) = 4x² + 10y² on the disk
x² + y² ≤ 4. Note that the constraint here is the inequality for the disk.
Now proceed with Lagrange multipliers and treat the constraint as an equality instead of
an inequality. We deal with the inequality when finding the critical points. The three
equations are:
8x = 2λx,
20y = 2λy,
x² + y² = 4.
You will find four points: (0, 2), (0, −2), (2, 0), (−2, 0). These are all on the boundary.
To find the maximum and minimum we need to simply plug these four points along with
the critical point into the function f:
f(0, 0) = 0,
f(2, 0) = f(−2, 0) = 16,
f(0, 2) = f(0, −2) = 40.
The minimum is in the interior, and the maximum is on the boundary of the disk.
Suppose ingredient X costs $4 per gram and ingredient Y costs $3 per gram. Find the
maximum number of doses that can be made if no more than $7,000 can be spent on raw
materials.
Answer:
Let a particle be subject to a force f = (f₁, f₂, f₃), so that the work done over a distance
dr = (dx₁, dx₂, dx₃) is
(4.1) dW = f·dr = f₁dx₁ + f₂dx₂ + f₃dx₃ = Σᵢ fᵢdxᵢ.
If there is a constraint F(x₁, x₂, x₃) = 0 which leaves only 2 degrees of freedom (DOF) for
the motion then we can find generalized coordinates uᵢ = uᵢ(x₁, x₂, x₃) for each degree of
freedom with implicitly defined inverses: xᵢ = xᵢ(u₁, u₂). For example, we could be
constrained to the surface of a sphere.
(4.2) dW = f·dr = Σⱼ f·(∂r/∂uⱼ) duⱼ = Σⱼ Gⱼduⱼ, with Gⱼ = f·(∂r/∂uⱼ).
Substituting Newton’s 2nd law for the force gives
(4.3) Gⱼ = m (dṙ/dt)·(∂r/∂uⱼ) = m Σᵢ (dẋᵢ/dt)(∂xᵢ/∂uⱼ).
We can rearrange each term as the difference of two derivatives:
(4.4) (dẋᵢ/dt)(∂xᵢ/∂uⱼ) = d/dt(ẋᵢ ∂xᵢ/∂uⱼ) − ẋᵢ d/dt(∂xᵢ/∂uⱼ) = d/dt(ẋᵢ ∂xᵢ/∂uⱼ) − ẋᵢ ∂ẋᵢ/∂uⱼ,
where we have applied our special result shown earlier (2.2) to interchange the partial
and total derivatives of xᵢ in the final term. Also recalling the theorem for cancellation of
the dots from the section on total derivatives (2.1), we now replace ∂xᵢ/∂uⱼ in the leading
term by ∂ẋᵢ/∂u̇ⱼ, and substitute (4.4) back into (4.3) where we recognize quadratic forms:
(4.5) Gⱼ = m Σᵢ [d/dt(ẋᵢ ∂ẋᵢ/∂u̇ⱼ) − ẋᵢ ∂ẋᵢ/∂uⱼ] = ½ m Σᵢ [d/dt(∂ẋᵢ²/∂u̇ⱼ) − ∂ẋᵢ²/∂uⱼ].
If we carry out the summations of the ẋᵢ² we have derivatives of the velocity squared, so we obtain
the Euler-Lagrange equations of motion in terms of generalized forces:
(4.7) Gⱼ = d/dt(∂T/∂q̇ⱼ) − ∂T/∂qⱼ, where Gⱼ = Σᵢ₌₁ᴺ Fᵢ·(∂rᵢ/∂qⱼ).
Note that rᵢ = rᵢ(q₁, q₂, …, qₙ, t), as the particle positions may depend upon any or all of
the generalized coordinates, including explicit dependence upon time.
Conservative Forces
Conservative forces are independent of time, and can be derived from a work function
U = U(q₁, q₂, …, qₙ) such that dW = −dU. Note that we have also assumed that there is
no dependence upon the coordinate velocities, the q̇ᵢ. Then the generalized forces are
Gⱼ = −∂U/∂qⱼ, with ∂U/∂q̇ⱼ = 0. This gives
Gⱼ = −∂U/∂qⱼ = d/dt(∂T/∂q̇ⱼ) − ∂T/∂qⱼ, or d/dt(∂(T − U)/∂q̇ⱼ) − ∂(T − U)/∂qⱼ = 0.
We can therefore define the Lagrangian L = T − U, in terms of which the equations of motion
are d/dt(∂L/∂q̇ⱼ) − ∂L/∂qⱼ = 0.
It is customary to define pⱼ = ∂L/∂q̇ⱼ, the momentum conjugate to qⱼ, or more simply, the
momentum.
Illustration: Let L = ½mq̇² − U, where U = U(q). Then p = ∂L/∂q̇ = mq̇, the mechanical
momentum. We can thus identify a conjugate momentum operator for the Lagrangian:
p̂ⱼ = ∂/∂q̇ⱼ.
In this chapter we have looked at a minimal set of generalized coordinates, having used
the equations of constraint to remove coordinates beyond those required for the number
of degrees of freedom natural to the problem. However, it is not always easy or
convenient to find generalized coordinates which match the DOF; in those cases we will
use a clever method due to Lagrange for working with a surplus of coordinates, the
method of undetermined multipliers.
Simon Stevin (1548-1620), Flemish Engineer and Mathematician, made a detailed study
of mechanical systems. He had the figure of a continuous chain of balls which rests on
an asymmetric ramp engraved on his tomb. Will the unbalanced weights cause the chain
to move? Consider a virtual displacement, where we move each ball clockwise by one
position. Since the configuration is unchanged by this action, there was no net work
performed … thus the chain is in equilibrium, and nothing moves. This reasoning is an
application of an early version of the Principle of Virtual Work.
This principle was later given precise mathematical form by Lagrange in essentially the
following form: Let δR be a virtual displacement which is arbitrary, but consistent with
the constraints; then at equilibrium the impressed forces do no net virtual work, F_I·δR = 0.
D’Alembert noted that if we have both impressed forces and forces of constraint, then Newton’s 2nd law of
motion can be written: F = F_I + F_C = mr̈ ⇒ F_I − mr̈ = −F_C, where the so-called
inertial forces have been moved over with the impressed forces, isolating the unknown
forces of constraint. D’Alembert’s expression for the forces of constraint in a moving system gives
Σᵢ (Fᵢ − mᵢr̈ᵢ)·δrᵢ = 0.
In order to minimize a known function we start by examining its stationary points; these
are the places where the derivatives vanish. The calculus of variations provides tools for
a related task: identification of functions which make an integral stationary over a path.
Instead of operating in a space of points, we now operate in a space of functions.
Suppose we have a trial function, y = f ( x ) , then we will need to vary it in order to see if
the integral is stationary; if it is, then the variation of the integral will be zero.
We will only consider weak variations, which are defined as arbitrary functions
constrained only by continuity conditions and a requirement that they vanish at the end-points.
A function which vanishes at the boundaries, but is otherwise arbitrary, provides
the variations; ε is the parameter which connects the family together, and will be of
interest primarily as we take the limit as it passes towards zero. The end result will be a
differential equation in y and its derivatives; the solutions of this equation are
the functions which make the integral stationary under weak variations.
Let the varied function be y = f(x, ε). Then dy = (∂f/∂x)dx + (∂f/∂ε)dε, but our immediate interest is not in dy, which
contains the change in the function due to a small change in position, dx, but rather in the latter term,
which contains the arbitrary variations in y due to the parameter ε and the arbitrary
function φ(x) via
(∂f/∂ε)dε = ∂/∂ε [f(x, 0) + ε·φ(x)] dε = φ(x)dε.
We thus introduce a
new type of differential, the δ-process, for which only the dependent variables are varied:
(5.1) δx = 0
(5.2) δy = f(x, ε) − f(x, 0) = ε·φ(x)
(5.3) δ(dy/dx) = d/dx(δy).
Now consider working with definite integrals with fixed end-points. Then if the integrand
has a known form, say F(x, y, y′), we can use our δ-process tools on the following:
(5.4) I(ε) = ∫_A^B F(x, y(x, ε), y′(x, ε)) dx.
Since the unknown is a function, this type of integral is called a functional.
The stationary points are determined in the usual way, by setting the derivative of the
functional to zero:
(5.5) 0 = dI/dε (ε → 0) = ∫_A^B (∂F/∂ε) dx = ∫_A^B [(∂F/∂x)(∂x/∂ε) + (∂F/∂y)(∂y/∂ε) + (∂F/∂y′)(∂y′/∂ε)] dx.
But ∂x/∂ε = 0, and the other partial derivatives with respect to the parameter ε can be
reduced to the forms arrived at above, ∂y/∂ε = φ(x), ∂y′/∂ε = φ′(x), so these can be put back
into (5.5), and the final term can be integrated by parts to get:
(5.6) ∫_A^B (∂F/∂y′)φ′(x) dx = (∂F/∂y′)φ(x)|_A^B − ∫_A^B [d/dx (∂F/∂y′)]φ(x) dx = −∫_A^B [d/dx (∂F/∂y′)]φ(x) dx,
where the integrated term vanishes due to the boundary conditions imposed on φ(x).
Putting this back into (5.5) and rearranging slightly gives:
(5.7) 0 = ∫_A^B [∂F/∂y − d/dx (∂F/∂y′)]φ(x) dx.
Since φ(x) is arbitrary we conclude that the bracketed expression must be zero in order
for this to be a stationary point. Thus the solutions to the problem must be solutions of
the resulting differential equation, the Euler-Lagrange differential equation:
(5.8) ∂F/∂y − d/dx (∂F/∂y′) = 0.
More briefly, we can use the δ-process to carry out the procedure:
(5.9) 0 = δI = δ∫_A^B F(x, y, y′) dx = ∫_A^B δF(x, y, y′) dx = ∫_A^B [(∂F/∂y)δy + (∂F/∂y′)δy′] dx,
and the final term can again be integrated by parts using (5.3), δy′ = d/dx(δy),
where the integrated term vanishes because the variation vanishes at the boundaries.
Putting this back into (5.9) gives the equivalent of (5.7), but with δy in place of φ(x).
Hamilton’s Principle
Hamilton’s Principle can be succinctly stated:
(5.11) 0 = δ∫L dt,
where the Lagrangian L depends upon the generalized coordinates qₖ(t), their
derivatives, q̇ₖ(t), and the time, t. The beginning and ending positions are fixed via the
definite times of the integration. We follow the weak variations method so that δt = 0.
Example: a mass m moving vertically in uniform gravity.
Let height = y.
Let speed = ẏ.
Then kinetic energy is T = ½mẏ² and potential energy is U = mgy.
So the Lagrangian is L = T − U = ½mẏ² − mgy, and Hamilton’s Principle reads
0 = δ∫L dt = δ∫(½mẏ² − mgy) dt.
Carrying out the variation gives
∫(½mδ(ẏ²) − mgδy) dt = ∫(mẏδẏ − mgδy) dt.
The first term can be rewritten as a differential, then integrated by parts:
∫mẏ (d/dt)(δy) dt = mẏδy|_A^B − ∫(d/dt)(mẏ)δy dt = −∫mÿδy dt,
where the integrated term vanishes due to the lack of variation at the boundaries. This leaves
0 = −∫(mÿ + mg)δy dt.
Since δy is arbitrary, and could be held positive, its factor must be zero everywhere: mÿ + mg = 0.
We would get the same result by direct application of Lagrange’s equations of motion:
∂L/∂ẏ = ∂/∂ẏ(½mẏ² − mgy) = mẏ.
∂L/∂y = ∂/∂y(½mẏ² − mgy) = −mg.
0 = d/dt(∂L/∂ẏ) − ∂L/∂y = d/dt(mẏ) − (−mg) = mÿ + mg.
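This bookkeeping can be delegated to a computer algebra system; the following sketch (sympy assumed) derives the same equation of motion:

```python
import sympy as sp

t, m, g = sp.symbols('t m g', positive=True)
y = sp.Function('y')

L = sp.Rational(1, 2) * m * sp.diff(y(t), t)**2 - m * g * y(t)
# Lagrange's equation: d/dt(dL/dydot) - dL/dy = 0
eom = sp.diff(sp.diff(L, sp.diff(y(t), t)), t) - sp.diff(L, y(t))
print(sp.simplify(eom))   # m*(g + y''(t)), i.e. y'' = -g
```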
When kinematical or other constraints apply we can no longer freely vary all of the
coordinates at all times t; note that the time is not varied, as it will be our independent variable.
For constraints fⱼ(q₁, …, qₙ) = 0, j = 1 to m, we adjoin the constraints with Lagrange multipliers:
(6.1) δL′ = δL + Σⱼ₌₁ᵐ λⱼδfⱼ = δL + Σⱼ₌₁ᵐ λⱼ Σₖ₌₁ⁿ (∂fⱼ/∂qₖ)δqₖ.
Carrying out the variations in L, and regrouping in terms around the δqₖ, gives
(6.2) δL′ = Σₖ₌₁ⁿ [∂L/∂qₖ − d/dt(∂L/∂q̇ₖ) + Σⱼ₌₁ᵐ λⱼ(∂fⱼ/∂qₖ)]δqₖ.
Note that each of the δqₖ includes the entire set of m Lagrange multipliers; the original
problem had (n − m) DOF, and we have added in m DOF via the Lagrange multipliers.
This is enough to treat all n of the δqₖ as independent, and we thus can set each of the
terms independently to zero. The Lagrange multipliers can be determined with the aid of
the equations of constraint, if required. Note that the λⱼ are functions of time; this is
because they must satisfy the conditions at each point in time. The Lagrange equation of
motion for the generalized coordinate qₖ and subject to m constraints is often written:
(6.3) d/dt(∂L/∂q̇ₖ) − ∂L/∂qₖ = Σⱼ₌₁ᵐ λⱼAⱼₖ.
Note that each of the constraints appears with qₖ in the form of λⱼ, but only with the
constraint coefficients due to δqₖ, these being the Aⱼₖ = ∂fⱼ/∂qₖ.
In the case of holonomic constraints, we can use Hamilton’s principle directly. With
nonholonomic constraints we go back to the Euler-Lagrange equations from chapter four,
where we had Gₖ = d/dt(∂T/∂q̇ₖ) − ∂T/∂qₖ = Σᵢ₌₁ᴺ Fᵢ·(∂rᵢ/∂qₖ). If the force
functions are derivable from a conservative potential, Gₖ = −∂U/∂qₖ, then we can define
L = T − U for as many of the generalized forces as this holds. For polygenic forces and
nonholonomic constraints we instead end up with:
(6.4) d/dt(∂T/∂q̇ₖ) − ∂T/∂qₖ = Gₖ + Σⱼ₌₁ᵐ λⱼAⱼₖ.
Recall from the previous lesson:
ψ(t = 0) = φ(t = 0) = 0
s = R₁φ = R₂ψ ⇒ ψ = (R₁/R₂)φ
φ = π/2 − θ ⇒ ψ = (R₁/R₂)(π/2 − θ)
φ̇ = −θ̇ and ψ̇ = −(R₁/R₂)θ̇
Vector relationships:
r̂ = cos(θ)x̂ + sin(θ)ŷ, θ̂ = −sin(θ)x̂ + cos(θ)ŷ;
x̂ = cos(θ)r̂ − sin(θ)θ̂, ŷ = sin(θ)r̂ + cos(θ)θ̂;
r̂·ŷ = sin(θ), θ̂·ŷ = cos(θ).
Having stated all of the forces in polar coordinates, do the same for the acceleration of
the center of the rolling ball, which gives:
−mg(sin(θ)r̂ + cos(θ)θ̂) + Nr̂ + fθ̂ = m[(r̈ − rθ̇²)r̂ + (rθ̈ + 2ṙθ̇)θ̂].
Matching coefficients of the component vectors allows isolation of the unknown forces:
N = m(r̈ − rθ̇²) + mg sin(θ),
f = m(rθ̈ + 2ṙθ̇) + mg cos(θ).
But the radius vector is constrained to move along a circle of radius r = R₁ + R₂, a
constant, so ṙ = r̈ = 0, and the unknown force expressions simplify to:
N = −m(R₁ + R₂)θ̇² + mg sin(θ),
f = m(R₁ + R₂)θ̈ + mg cos(θ).
The additional concept required is that of the torque of the rolling ball, lumped at the
center of mass. This can be expressed in two ways: in terms of the force and the moment
arm, Λ = −R₂r̂ × fθ̂ = −R₂f(r̂ × θ̂) = −R₂f ẑ; and in the form of the 2nd law of motion
for rotating bodies, derived from the rate of change of angular momentum, Λ = αI, where
α = −(φ̈ + ψ̈) = +((R₁ + R₂)/R₂)θ̈
is the angular acceleration with respect to the −ẑ axis, and
we have compounded the motions with respect to both spheres; and I = (2/5)m(R₂)² is the
moment of inertia of the rolling ball. This gives us
−R₂f = +((R₁ + R₂)/R₂)θ̈ · (2/5)m(R₂)² ⇒ f = −(2/5)m(R₁ + R₂)θ̈.
Equating this with the previous expression for the frictional force gives
m(R₁ + R₂)θ̈ + mg cos(θ) = −(2/5)m(R₁ + R₂)θ̈ ⇒
θ̈ = −(5g cos(θ))/(7(R₁ + R₂)) = −k cos(θ), k = 5g/(7(R₁ + R₂)).
We do not know the form of θ(t) yet, so multiply both sides by the integrating factor θ̇,
and integrate from t = 0, for a rolling ball initially at rest.
For the LHS we get: ∫₀ᵗ θ̈θ̇ dt = ∫₀^θ̇ θ̇ dθ̇ = ½θ̇²,
and recalling that θ = π/2 when t = 0, for the RHS we get:
−∫₀ᵗ k cos(θ)θ̇ dt = −∫_{π/2}^θ k cos(θ) dθ = k(1 − sin(θ)).
Combining these expressions gives θ̇² = 2k(1 − sin(θ)), and plugging this into the
equation for the normal force gives:
N = −m(R₁ + R₂)·2k(1 − sin(θ)) + mg sin(θ).
Recalling the definition of the constant k, we get:
N = −m(R₁ + R₂)·(10g/(7(R₁ + R₂)))(1 − sin(θ)) + mg sin(θ) = (mg/7)(17 sin(θ) − 10).
The rolling ball leaves the surface of the stationary ball when the normal force is zero.
This occurs when sin(θ) = 10/17, or a little bit more than 36°. This position is marked on
the diagram with a dot.
We could integrate again to get θ(t), which would allow us to determine the time of
departure of the rolling ball, and its speed and direction, but that is left as an exercise for
the reader.
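A numerical check of the departure angle (a sketch; scipy is assumed, and the radii and initial nudge are illustrative):

```python
import numpy as np
from scipy.integrate import solve_ivp

g, R1, R2 = 9.81, 1.0, 0.25
k = 5 * g / (7 * (R1 + R2))

def rhs(t, s):
    theta, omega = s
    return [omega, -k * np.cos(theta)]       # theta'' = -k cos(theta)

def leaves(t, s):                            # N = 0 when 17 sin(theta) = 10
    return 17 * np.sin(s[0]) - 10
leaves.terminal = True

# start a hair below the top, at rest (exactly pi/2 is an unstable equilibrium)
sol = solve_ivp(rhs, (0, 20), [np.pi/2 - 1e-6, 0.0], events=leaves, rtol=1e-10)
theta_dep = sol.y_events[0][0][0]
print(np.degrees(theta_dep), np.degrees(np.arcsin(10/17)))   # both ~36.03
```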
Now repeat the analysis with Lagrange multipliers. The rolling constraint:
ψ(t = 0) = φ(t = 0) = 0
s = R₁φ = R₂ψ ⇒ R₁φ̇ = R₂ψ̇ ⇒ ψ̇ = (R₁/R₂)φ̇ ⇒ R₁dφ − R₂dψ = 0.
Kinetic Energy: T = ½m(R₁ + R₂)²φ̇² + ½Iω², with ω = φ̇ + ψ̇ and I = (2/5)m(R₂)².
Potential Energy: U = mg(R₁ + R₂)cos(φ).
The center of mass motion is along an arc of radius (R₁ + R₂), and at the rate φ̇. The
rolling motion is about the radius R₂, but compounded of the rotations with respect to
both spheres, so the angular speed is ω = φ̇ + ψ̇. Putting this together gives the
Lagrangian:
L = T − U = ½m(R₁ + R₂)²φ̇² + (1/5)m(R₂)²(φ̇ + ψ̇)² − mg(R₁ + R₂)cos(φ).
φ: d/dt(∂L/∂φ̇) − ∂L/∂φ = m(R₁ + R₂)²φ̈ + (2/5)m(R₂)²(φ̈ + ψ̈) − mg(R₁ + R₂)sin(φ) = λR₁.
ψ: ∂L/∂ψ̇ = (2/5)m(R₂)²(φ̇ + ψ̇), ∂L/∂ψ = 0 ⇒
d/dt(∂L/∂ψ̇) − ∂L/∂ψ = (2/5)m(R₂)²(φ̈ + ψ̈) = −λR₂.
This leaves us with two equations:
(7.1) m(R₁ + R₂)²φ̈ + (2/5)m(R₂)²(1 + (R₁/R₂))φ̈ − mg(R₁ + R₂)sin(φ) = λR₁.
(7.2) (2/5)m(R₂)²(1 + (R₁/R₂))φ̈ = −λR₂.
Eliminating λ between (7.1) and (7.2) gives φ̈ = 5g sin(φ)/(7(R₁ + R₂)).
This is the same as our previous solution if we substitute φ = π/2 − θ.
The condition for losing contact is an inequality, so we cannot use the undetermined
multipliers to find it; it is actually a boundary condition. Instead we note that the normal
force is equal to the centripetal force at the instant of leaving, so we need
a = v²/(R₁ + R₂) = (R₁ + R₂)φ̇². We find φ̇ by integration of φ̈ as follows:
∫₀ᵗ φ̈φ̇ dt = ½φ̇² = ∫₀^φ (5g sin(φ)/(7(R₁ + R₂))) dφ = −(5g/(7(R₁ + R₂)))(cos(φ) − 1).
So a = (R₁ + R₂)φ̇² = (10g/7)(1 − cos(φ)), and setting the radial component of gravity equal to
this centripetal acceleration, g cos(φ) = a, recovers cos(φ) = 10/17, the same departure
point as before.
For each problem, write the Lagrangian, and identify the conserved quantity. The goal is
to properly describe the system. For extra credit, write the equations of motion for each
system. Then illustrate the solutions by establishing typical initial conditions.
2. Same as problem (1), but with mass M fixed, not free to move.
4. Same as problem (3), but now include the pendulum from problem (1).
5. A particle of mass m is in a central force field F(r) = −(∂U(r)/∂r)r̂; that is, the only
force acts along the radius.
6. A mass m is free to slide down a wedge (base angle θ ) of mass M; the wedge is
free to move in the X direction. Consider only motions along a single horizontal line.
Describe the initial conditions that would violate this constraint.
8. A mass m is hung from a rod of length R which is freely pivoted to the ceiling,
allowing it to swing freely in all directions. Ignore the inertia of the rod, and use
spherical coordinates.
9. Same as problem (8), but now the rod is suspended from a free-moving block of
mass M, perhaps magnetically suspended from the ceiling. Ignore friction.
The definition p = ∂L/∂q̇ reduces the equation of motion to ṗ = ∂L/∂q. The product of each pair of
conjugate coordinates, qⱼpⱼ, always has units of action, the same as angular momentum.
Consider the time variation of the Lagrangian:
dL/dt = (∂L/∂q)(dq/dt) + (∂L/∂q̇)(dq̇/dt) + ∂L/∂t = ṗq̇ + pq̈ + ∂L/∂t = d/dt(pq̇) + ∂L/∂t, so that
−∂L/∂t = d/dt(pq̇ − L).
Thus there are both implicit and explicit variations with time, and a transformation is
apparent which would remove the implicit time variations: we thus define the
Hamiltonian H = H(q, p, t) = pq̇ − L.
The Legendre (or “dual”) transformation is a method used to change the variables of a
function. Suppose you have F(x, y) and need G(x, y′) where y′ = ∂F/∂y; y is called the
active variable, and x the passive variable. The transformation proceeds in two
steps, the first being to define the dual function:
G = yy′ − F,
followed by the algebraic removal of the variable y. This second step is possible only if
the Hessian determinant is non-zero. The Hessian is formed by taking the Jacobian of
the partial derivatives of the function F with respect to its arguments:
HF(x, y) = | ∂²F/∂x²   ∂²F/∂x∂y |
           | ∂²F/∂y∂x  ∂²F/∂y²  |.
Taking the differential of the dual function gives:
dG = (∂G/∂x)dx + (∂G/∂y′)dy′ = d(yy′ − F) = ydy′ + y′dy − dF, which expands and simplifies to:
dG = ydy′ + y′dy − ((∂F/∂x)dx + y′dy) = ydy′ − (∂F/∂x)dx.
Matching coefficients gives ∂G/∂y′ = y, so we
can recover the original active variable by means of differentiation. Note also that
∂G/∂x = −∂F/∂x
for the passive variable.
Applying this transformation to the Lagrangian, with q̇ as the active variable and
H = pq̇ − L, we find that:
(8.1) ∂H/∂p = q̇, this being the result of the active transform on q̇;
(8.2) ∂H/∂q = −∂L/∂q, this being the result of the passive transform on q;
(8.3) ∂H/∂t = −∂L/∂t, this being the result of the passive transform on t.
So far this is all mathematics. The physics comes from Lagrange’s equations of
motion: ∂L/∂q = d/dt(∂L/∂q̇) = d/dt(p) = ṗ. Thus equation (8.2) becomes ∂H/∂q = −ṗ.
The results of this transformation are called Hamilton’s equations of motion, and they
bring a great simplification to theoretical mechanics. This is due in part to the symmetric
relation between p and q, and the reduction from 2nd order to 1st order differential
equations. But the great value is that the theatre of physics has been relocated from the
busy confines of the n-dimensional configuration space to the more open 2n-dimensional
phase space.
The Lagrangian exists in the tangent space of the configuration manifold, while the
Hamiltonian is its dual, and exists in the corresponding cotangent space (phase space).
Following the Hamiltonian further would lead to a study of its symplectic structure, of
Poisson brackets, Liouville’s theorem, and of canonical transformations. Instead of the
configuration space of the Lagrangian, we work in the phase space of the Hamiltonian;
this rich environment is of great importance in classical, statistical, and quantum
mechanics. We will take a brief look at a number of these important topics, and try to
show their inter-connections.
We began by looking at the time variation of the Lagrangian, and transformed to the
Hamiltonian. The temporal variation of the Hamiltonian is:
dH/dt = (∂H/∂q)q̇ + (∂H/∂p)ṗ + ∂H/∂t = −ṗq̇ + q̇ṗ − ∂L/∂t = −∂L/∂t, where we have substituted the
results from Hamilton’s equations of motion for the partial derivatives of H. Thus if the
Lagrangian has no explicit dependence upon time, the Hamiltonian is time independent.
We thus restrict ourselves to systems where Ḣ = 0, and show that this must be a
conservative system. We need the result of Euler’s theorem on homogeneous functions:
the kinetic energy is a homogeneous quadratic form in the velocities, T = ½ Σᵢⱼ aᵢⱼq̇ᵢq̇ⱼ,
and pᵢ = ∂L/∂q̇ᵢ = ∂T/∂q̇ᵢ, so we get Σᵢ pᵢq̇ᵢ = Σᵢⱼ aᵢⱼq̇ᵢq̇ⱼ = 2T.
Thus H = Σᵢ pᵢq̇ᵢ − L = 2T − (T − U) = T + U = E, the total energy.
Recalling that the Euler-Lagrange equations of motion can be derived from a variational
principle, we will now apply Hamilton’s Principle to L = q̇p − H. Note that the path
variable (time) is not varied, and that there are no variations at the endpoints for any
variable. In configuration space there are N equations of motion: ṗₖ = ∂L/∂qₖ. In phase space the conjugate momentum becomes
an independent variable with 2N equations of motion:
∂H/∂pₖ = q̇ₖ and ∂H/∂qₖ = −ṗₖ.
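As a concrete illustration (a sketch, with an assumed oscillator Hamiltonian), the 2N first-order equations fall out by differentiation:

```python
import sympy as sp

q, p, m, k = sp.symbols('q p m k', positive=True)
H = p**2 / (2*m) + sp.Rational(1, 2) * k * q**2   # assumed: harmonic oscillator

q_dot = sp.diff(H, p)     # q' =  dH/dp = p/m
p_dot = -sp.diff(H, q)    # p' = -dH/dq = -k*q
print(q_dot, p_dot)
```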
We will explore some of the properties of phase space, in order to appreciate its
advantages over the more familiar configuration space, and seek methods for solving
Hamilton’s equations of motion. We will consider only conservative systems.
Canonical Transformations
The Lagrange equations of motion retain the same form even as the generalized
coordinates are transformed. A lucky choice may result in ṗₖ = ∂L/∂qₖ = 0, which implies
that pₖ is a constant of the motion. We therefore look for transformations to new
coordinates (Q, P) and a new Hamiltonian H′ such that Hamilton’s Principle is satisfied
in both sets of variables:
(9.1) δ∫(Σₖ pₖq̇ₖ − H)dt = 0 = δ∫(Σₖ PₖQ̇ₖ − H′)dt.
The two integrands can then differ by at most the total time derivative of a function
S(q, Q, t):
(9.2) Σₖ pₖq̇ₖ − H = Σₖ PₖQ̇ₖ − H′ + dS/dt,
because such a term contributes only boundary values to the variation:
(9.3) δ∫(dS/dt)dt = δ∫dS = δS(q, Q, t) = (∂S/∂q)δq + (∂S/∂Q)δQ = 0,
the result being zero because the coordinates are fixed at the boundaries. But from
Hamilton’s Principle this boundary term is the action as a function of the boundary
conditions, where the parameters are the boundary conditions:
(9.4) S(q, Q, t) = ∫L dt.
Then the total derivative is
dS/dt = Σₖ ((∂S/∂qₖ)q̇ₖ + (∂S/∂Qₖ)Q̇ₖ) + ∂S/∂t = Σₖ (pₖq̇ₖ − PₖQ̇ₖ) + ∂S/∂t.
Putting these results of the total derivative into (9.2) gives the desired transformation:
(9.5) H′(Q, P, t) = H(q, p, t) + ∂S/∂t,
where we also need to substitute pₖ = ∂S/∂qₖ into the old Hamiltonian. The most direct
choice is to demand that the new Hamiltonian vanish, which turns (9.5) into the
Hamilton-Jacobi equation, H(q, ∂S/∂q, t) + ∂S/∂t = 0.
The solution is the generating function for H′(Q, P, t) = 0, and automatically gives us all
of the constants of the motion of the system. This follows directly from the condition
that H′ = 0, so that Q̇ₖ = ∂H′/∂Pₖ = 0, and Ṗₖ = −∂H′/∂Qₖ = 0, so Qₖ and Pₖ are both constants in
this coordinate system, and thus describe a bundle of parallel lines in the state space, with
axes for Q, P, and t. The effect of the transformation has been to straighten out the world
lines for each of the N particles. The system is stationary in phase space; the generating
function is the annihilator of the old Hamiltonian.
Please note that while finding a solution to this single partial differential equation reduces
the Hamiltonian problem to a process of differentiation, it is still a very difficult problem
… its value is mostly in the theory, which gives us additional means to look at very
difficult problems and their constants of the motion. Such systems can be completely solved by these methods.
Homework: prove that qₖ = −∂S/∂pₖ. This shows that the old coordinates can be recovered
from the generating function. Also find the value of ∂S/∂Pₖ. Can you also find the total
energy, E?
For functions defined on phase space we define the Poisson bracket as follows:
(9.7) {F, G} = Σₖ ((∂F/∂qₖ)(∂G/∂pₖ) − (∂G/∂qₖ)(∂F/∂pₖ)).
Anticommutation: {F, G} = −{G, F}.
Linearity: {F₁ + F₂, G} = {F₁, G} + {F₂, G}.
Selection property: {qⱼ, pₖ} = δⱼₖ, since only conjugate pairs survive the ∂/∂qₖ and ∂/∂pₖ differentiations.
The total derivative has a special relationship with the Hamiltonian:
dF/dt = Σₖ ((∂F/∂qₖ)(dqₖ/dt) + (∂F/∂pₖ)(dpₖ/dt)) + ∂F/∂t
      = Σₖ ((∂F/∂qₖ)(∂H/∂pₖ) − (∂F/∂pₖ)(∂H/∂qₖ)) + ∂F/∂t = {F, H} + ∂F/∂t,
where we have used Hamilton’s equations of motion in the second step, which is
identical to the Poisson bracket found in the third step. Thus a phase space function with
no explicit time dependence is a constant of the motion if and only if {F, H} = 0.
The Poisson brackets form a Lie algebra. It is important to note that the expressions are
invariant under canonical transformations; the proof is a straight-forward, if lengthy,
exercise of the chain rule for partial derivatives. We will use these results briefly, but
will revisit them when we get to quantum mechanics.
Homework (extra credit): prove that the Poisson bracket {F , G}Q , P = { F , G}q , p ; start with
the dependence upon Qk , Pk and recall that each of these may depend upon all of the
qk , pk . Then use the Poisson bracket phase space variable identities to consolidate the
expansion.
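A small symbolic helper makes these properties concrete (a sketch for one degree of freedom; sympy and the oscillator Hamiltonian are assumptions):

```python
import sympy as sp

q, p, m, k = sp.symbols('q p m k', positive=True)

def pbracket(F, G):
    """Poisson bracket {F, G} for a single (q, p) pair, per (9.7)."""
    return sp.diff(F, q) * sp.diff(G, p) - sp.diff(G, q) * sp.diff(F, p)

H = p**2 / (2*m) + sp.Rational(1, 2) * k * q**2
print(pbracket(q, p))               # 1: the selection property
print(pbracket(H, H))               # 0: H is a constant of the motion
print(sp.simplify(pbracket(q, H)))  # p/m, i.e. q' = {q, H}
```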
For a Hamiltonian with constant energy the system is clearly confined to move on a
hypersurface of constant energy through phase space. Liouville’s theorem states that the
phase fluid for any collection of Hamiltonians is incompressible. We show this by
computing the divergence of the phase space flow (q̇ₖ, ṗₖ):
(9.8) ∇·v = Σₖ (∂q̇ₖ/∂qₖ + ∂ṗₖ/∂pₖ) = Σₖ (∂²H/∂qₖ∂pₖ − ∂²H/∂pₖ∂qₖ) = 0.
This is an important result for statistical mechanics, where it is often written as
(9.9) σ = ∫dᴺq dᴺp,
where the integration is over all of the 2N phase space coordinates for some initial
volume, and we then follow that patch of phase fluid over time. The enclosed volume may
become distorted, but it remains constant for all time. If we divide by the fixed volume,
we get the unchanging phase space density as that collection of systems evolves.
Homework: An alternative expression for Liouville’s theorem is ∂ρ/∂t = −{ρ, H}, where
ρ is a probability distribution on phase space. To prove this, show that dρ/dt = 0.
We have come a long way from Newton’s F = ma. We will finish this lecture series
with some connections to optics and quantum mechanics. At a later date we may return
and show how analytic mechanics lies at the foundations of thermodynamics, that is, of
statistical mechanics.
By Hamilton’s Principle, δ∫L dt = 0, where the integration takes place between fixed
limits in order for the boundary terms to cancel. In our search for a solution by means of
canonical transformations we found an expression for such boundary terms in the
generating function, S(q, Q, t) = ∫L dt. The boundary terms would then be the action,
the result of the Hamilton-Jacobi equation. Though you only see (q, Q) in the functional
dependency, the remaining two variables (p, P) are implicitly included.
Recalling that a complete set of partial derivatives represents a tangent plane, it is natural
that the solutions to partial differential equations result in families of surfaces; the
corresponding analogy from ordinary differential equations is tangents to curves, whose
solutions are trajectories or parameterized paths. We have now seen that the calculus of
variations has provided solutions and insights into both types of problems … as our focus
has shifted from a specific path due to specified boundary conditions, to a family of
solutions which are related by the boundary conditions falling on the specified
hypersurfaces.
This is exactly the connection between ray optics and the optics of wave fronts.
Optical path length (OPL) is the ideal way to measure the path followed by a beam of
light. It takes into account both the distance traveled, and the local speed of light all
along the path. We use the common notation for the index of refraction, n = c/v, where
c is the speed of light in vacuo, and v is the local speed of light. The index of refraction
often varies spatially, but may also have temporal variations, such as small thermal or
pressure shifts in air.
Fermat’s Principle states that light follows the path of least transit time between two
fixed points in space. The integrated optical path length is divided by the speed of light
in vacuo in order to convert back to the true transit time.
The Huyghen’s construction is clearly related to Figure 1, but we will see that it is also
the key to the eikonal equation, which is the fundamental equation of ray optics. We will
derive it from Fermat’s Principle; it can also be derived as an approximation to
Maxwell’s equations for electromagnetic waves.
Suppose that our light source is a point, and that the curve is a surface of constant OPL
from the light source. Then if we flashed the light once, we would see the pulse travel
through a series of these surfaces dS = n ds … and each surface would be lit up in turn.
The fastest way to get to the next surface is for each ray to follow the gradient. Its
magnitude will be the local index of refraction, in the direction of the path, ŝ:
(10.2) ∇S = nŝ = (c/ω)k.
This is the eikonal equation for isotropic media, where ω = 2πν is the circular frequency
(cycle rate), and k = 2π/λ is the wave number, such that λν = ω/k = v = c/n. The ray direction
defines the wave vector, k.
The transit time along a path r(u) is ∫dt = ∫ds/v, with v = c/n and
ds = √((dx/du)² + (dy/du)² + (dz/du)²) du. So we can write Fermat’s Principle as:
0 = δ∫(n(x, y, z)/c)√(x′² + y′² + z′²) du.
Recognizing that the parameterization is arbitrary, let it be the path length, which means
that (x′² + y′² + z′²) = 1, and u = s. So L(x, y, z, x′, y′, z′, s) = n(x, y, z)√(x′² + y′² + z′²),
to which we apply the Euler-Lagrange equations, d/ds(∂L/∂x′) − ∂L/∂x = 0, and similarly for y and z.
Note also that
0 = d/ds(T̂·T̂) = 2T̂·(dT̂/ds),
which implies that the rate of change of the unit tangent to a
curve is perpendicular to the tangent itself; this is the origin of the radius of curvature, R.
Carrying out these operations we get:
d/ds(n x′/√(x′² + y′² + z′²)) = (∂n/∂x)√(x′² + y′² + z′²),
but the radicals satisfy the constraint,
and so are equal to 1, reducing to d/ds(n dx/ds) = ∂n/∂x; taking all three together gives the
eikonal equation,
(10.5) d/ds(nT̂) = ∇n.
Integrating this form with respect to arc length recovers our earlier equation (10.2). Either
form of the eikonal equation is logically equivalent to Fermat’s Principle, and any of
these can be taken as the foundation of ray optics. The eikonal equation is also the
mathematical expression of the Huyghen’s construction of advancing wave fronts.
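Equation (10.5) can also be traced numerically as a first-order system; here is a sketch (the linear index profile n(y) and the step sizes are illustrative assumptions):

```python
import numpy as np
from scipy.integrate import solve_ivp

def n(y):     return 1.0 + 0.1 * y      # assumed index profile, grad n = (0, 0.1)
def dn_dy(y): return 0.1

def rhs(s, state):
    # state = (x, y, Tx, Ty); d/ds(n T) = grad n  =>  dT/ds = (grad n - (grad n . T) T)/n
    x, y, Tx, Ty = state
    gdotT = dn_dy(y) * Ty
    return [Tx, Ty,
            (0.0 - gdotT * Tx) / n(y),
            (dn_dy(y) - gdotT * Ty) / n(y)]

sol = solve_ivp(rhs, (0, 5), [0.0, 0.0, 1.0, 0.0], max_step=0.01)
print(sol.y[1][-1] > 0)   # True: the ray bends toward increasing n
```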
Figure 3 - The computer-rendered image on the right has enabled a photon-mapping algorithm,
which makes the image appear more life-like. These caustics are due to reflection and refraction.
These phenomena were first studied systematically by Christiaan Huyghens.
Caustic curves are the envelopes of reflected or refracted rays. They can be found via the
singularities of the eikonal equation. They are responsible for the bright lines that ripple
across a clear pool of water with the wind, and for the effects seen (and unseen) in Figure
2 and Figure 3.
For a conservative system the time can be separated out of the Hamilton-Jacobi equation,
leaving the reduced equation in terms of the constant energy:
(10.6) H(q₁, …, q_N, ∂W/∂q₁, …, ∂W/∂q_N) = E.
Otherwise W is defined in a way similar to S above, with pₖ = ∂W/∂qₖ. Then the vector of
momenta is the gradient of W; for a single particle,
(∂W/∂x)² + (∂W/∂y)² + (∂W/∂z)² = (∇W)² = 2m(E − U),
which is analogous to the optical relation (∇S)² = n², whose surfaces of constant S are
normal to the light rays, and which is equivalent to the Huyghen’s construction. Then the
opto-mechanical analogy is that the surfaces of constant W from Hamilton’s
characteristic function play the role of the surfaces of constant OPL: particles of
the same energy which are on rays at the initial surface will remain on rays for all time.
The opto-mechanical analogy goes further because the eikonal equation can be derived
from Fermat’s variational principle: δ∫(n/c) ds = 0, which says that light rays follow the
path of least transit time; the mechanical counterpart is the principle of least action, whose metric is determined by
the kinetic energy (and hence has absorbed the masses of all of the particles), and is
independent of the time. In order to evaluate this integral, we must reexpress the speed
in terms of the coordinates by means of the energy equation.
The surfaces are now seen to be surfaces of constant action, and we have an alternative
condition for the principle of least action. However, the orthogonality is not Euclidean,
but Riemannian. For example, in the case of electrons in a magnetic field, the surfaces of
constant action are not perpendicular to the trajectories in the Euclidean sense, and thus
not all optical theorems for light hold for electron optics. Lanczos (p. 268) adds some
further cautions: light always obeys the ray property because the possible light rays of a
given optical field form a two-dimensional manifold of curves, while the possible particle
trajectories of mechanics form a five-dimensional manifold.
Furthermore, we point out that the hypersurfaces of constant S (or W) are determined by
an ensemble of systems satisfying the same Hamiltonian, but with different boundary
conditions. Now consider the case where they are determined by a single system … these
In his spare time, between completing his dissertation on the statistical mechanics of
Brownian motion as a means to determine Avogadro’s number and the size of atoms and
molecules, and earning a future Nobel prize for his explanation of the photo-electric
effect, Einstein developed the special theory of relativity as a way to reconcile mechanics
with electrodynamics. This solution imposes additional constraints upon classical
mechanics. These can be viewed as an interconnection of mass, energy, and momentum,
E² = (pc)² + (mc²)²,
or as a change in metrical structure from a Euclidean ℝ × ℝ³ to the four-dimensional
Minkowski spacetime.
With the introduction of the nuclear atom by Ernest Rutherford (1911), and the orbital
quantization of this atom by Niels Bohr (1913), the era of “the old quantum theory” was
well under way. This period, ending with a crescendo in 1926, produced the still-useful
semi-classical quantum theory based upon the Hamilton-Jacobi equation. We will look at
only one element, the application by Louis deBroglie to obtain the “quantum wave”,
which can be summarized by the deBroglie relation of classical momentum to quantum
wavelength: p = h/λ = ℏk.
This result was motivated by the Planck relation, and deBroglie applied the notions of
relativity to the Hamilton-Jacobi equation to show that if waves have particle attributes,
then so must particles have wave attributes. Einstein thought highly of deBroglie’s thesis
(1924), and Schrödinger (1925) took the idea up with a vengeance as an alternative to the
matrix mechanics of Heisenberg (1925). Heisenberg’s thesis was that the ills of the “old
quantum mechanics” were bound up with the reism of non-observable trajectories
inherited from classical mechanics … so he banished them! Matrix mechanics only dealt
with experimentally observable quantities. From Wikipedia article on basic quantum
mechanics: Heisenberg voice recording in an early lecture on the uncertainty principle
pointing to a Bohr model of the atom: "You can say, well, this orbit is really not a
complete orbit. Actually at every moment the electron has only an inactual position and
an inactual velocity and between these two inaccuracies there is an inverse correlation."
Schrödinger was an established professor, with a liking for music. He was familiar with
the treatment of analysis of vibrating strings by means of Hamiltonian perturbations, and
immediately saw that deBroglie’s matter waves could be given a Hamiltonian treatment.
It all looked very realistic, but in 1926 Schrödinger proved that the Heisenberg matrix
mechanics and the Schrödinger wave mechanics were mathematically equivalent. So the
quantum wave is an unobservable property of the quantum world. Max Born gave the
explanation (1927) that the quantum wave is a probability amplitude, and that if you
repeat the experiment many times, you will simply fill in the probability distribution.
Assume the existence of de Broglie’s quantum wave:
(11.1) ψ(r, t) = A·exp[i(k·r − ωt)] = A·exp[(i/ℏ)(p·r − Et)].
We can extract the momentum for the x-direction with p̂ₓ = (ℏ/i)∂/∂x; this generalizes to
p̂ = (ℏ/i)∇ and, for the energy, Ê = iℏ∂/∂t. We get the kinetic energy by applying the ∇²
operator once on the right hand side; we get the potential by multiplication:
(11.2) Ĥψ = (−(ℏ²/2m)∇² + U)ψ = −(ℏ²/2m)∇²ψ + U·ψ = iℏ ∂ψ/∂t.
This is the famous Schrödinger wave equation. For time-independent solutions:
(11.3) Ĥψ = Eₙψ.
If you apply a conjugate pair of operators to a quantum wave function, the order of application
makes a difference … this is another way to state the Heisenberg Uncertainty Principle.
Homework: Consider a particle in a 1D box; the sides of the box consist of a very steep
potential, while inside the box the potential is zero. Find the solutions to the time-
independent Schrödinger equation.
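A numerical sketch of this homework problem (finite differences, with the assumed units ħ = m = 1 and box length 1; the exact levels Eₙ = (nπ)²/2 provide the check):

```python
import numpy as np

N = 500                                  # interior grid points
x, dx = np.linspace(0, 1, N + 2, retstep=True)
# -(1/2) d^2/dx^2 with psi = 0 at the walls (the very steep potential)
main = np.full(N, 1.0 / dx**2)
off = np.full(N - 1, -0.5 / dx**2)
H = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

print(np.linalg.eigvalsh(H)[:3])                 # lowest three eigenvalues
print([(n * np.pi)**2 / 2 for n in (1, 2, 3)])   # ~4.93, 19.74, 44.41
```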
Dirac introduced a notation that still dominates quantum mechanics. It is brief, and
conveys the ideas of a vector space quite well. Since the solutions are vectors in a
function space, these functions are represented as vectors. But there is also a dual space,
and the product of a vector and a dual is an inner product, with a complex number
as the result. Dirac slyly called it the bracket or bra-ket notation, where the bra, ⟨ψ|, is
in the dual space, and the ket, |φ⟩, is in the vector space. ⟨ψ|* = |ψ⟩; they are
transposed conjugates. Their inner product is ⟨ψ|φ⟩ = ⟨φ|ψ⟩*, which would represent
the probability amplitude for scattering from the first state to the second. Once you find
the corresponding functions (usually in terms of the eigenfunctions), you can evaluate
this as an integral, ∫ψ*φ dⁿr. The formula for finding the expectation value, or mean
measured value, of an operator Â in the state ψ is ⟨Â⟩ = ⟨ψ|Â|ψ⟩ = ∫ψ*Âψ dⁿr.
Postscript
Note that we have used a classical, not a relativistic, Hamiltonian. Paul Dirac (1928)
gained fame by finding that equation. It has the inherent feature of requiring the
existence of quantum spin, which has no classical parallel, and the existence of anti-particles,
such as the positron.
Quantum mechanics was not complete though, because the classical potential was still
retained. This step requires quantization of the fields (“second quantization”), and was
essentially completed independently by Tomonaga and Schwinger using Hamiltonian
techniques, and Feynman using relativistically-invariant Lagrangian techniques, in the
late 1940’s. The result was Quantum Electro-Dynamics, or QED. In particular, the
Hamilton-Jacobi Equation again plays a role in the Feynman approach, because the
classical action is the temporal propagator. This is also known as the quantum theory of
light, because it covers the interactions between light and matter.
Gravity has yet to be quantized. The difficulty is due to the non-linear nature of
Einstein’s General Theory of Relativity. Current proposals include quantum loop and
string theories.
Homework: Find a fully relativistic quantum theory for gravitation. What currently
accepted theories need to be modified? What role does the Hamilton-Jacobi equation
play in your development?
Relativistic Force
P. Diehr – July 2002

Newton’s second law of motion can be correctly written as
(1.1) F = d/dt p = d/dt[γmv] = γ̇mv + γmv̇.
Thus we see that in addition to the direction of acceleration, the force also
acts in the direction of motion. Writing a for v̇, and applying the chain
rule we easily get
(1.2) γ̇ = γ³(v·a)/c²,
which gives
(1.3) F = γ³(m/c²)(v·a)v + γma.
The first term vanishes for velocities where v/c ≪ 1, and we also have γ ≈ 1,
leaving the traditional form of F = ma. But for relativistic velocities we see
that we cannot find the acceleration directly from the force, for it also
depends upon the velocity.
Let û∥ be the instantaneous direction of travel and û⊥ be any transverse
direction so that û∥·û⊥ = 0. Using this notation we can resolve the
transverse and longitudinal components of the force:
(1.4) F⊥ = F·û⊥ = γm(a·û⊥),
where the first term has vanished since v·û⊥ = 0. The longitudinal
component starts with two terms:
(1.5) F∥ = F·û∥ = γ³(m/c²)(v·a)(v·û∥) + γm(a·û∥).
We can simplify the first term by using v = vû∥, giving:
(1.6) F∥ = γ³(m/c²)(vû∥·a)(vû∥·û∥) + γm(a·û∥) = γ³m(v²/c²)a∥(1) + γma∥ = γm[γ²(v²/c²) + 1]a∥,
and since γ²(v²/c²) + 1 = γ², this reduces to the longitudinal force:
(1.7) F∥ = γ³ma∥.
These lead to the obvious definitions:
m⊥ = γm,
m∥ = γ³m.
These are called the transverse and longitudinal masses, and unlike the rest
mass, are obviously not relativistic invariants. This is because forces are
frame-dependent.
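A quick numerical sanity check of (1.4) and (1.7), in units with c = 1 (a sketch; the velocity and acceleration values are illustrative):

```python
import numpy as np

m, c = 1.0, 1.0
v = np.array([0.8, 0.0, 0.0])          # motion along x
a = np.array([0.3, 0.4, 0.0])          # mixed acceleration
gamma = 1.0 / np.sqrt(1.0 - (v @ v) / c**2)

F = gamma**3 * m * (v @ a) * v / c**2 + gamma * m * a   # equation (1.3)
print(F[0], gamma**3 * m * a[0])   # longitudinal: F = gamma^3 m a (eq. 1.7)
print(F[1], gamma * m * a[1])      # transverse:   F = gamma m a   (eq. 1.4)
```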