An Introduction to Wave Equations and Solitons

Richard S. Palais
The Morningside Center of Mathematics
Chinese Academy of Sciences
Beijing
Summer 2000
Contents
Section 1. Wave Equations
1.1 Introduction
1.2 Travelling Waves and Plane Waves
1.3 Some Model Equations
1.4 Linear Wave Equations: Dispersion and Dissipation
1.5 Conservation Laws
1.6 Split-Stepping
Section 2. The Korteweg-de Vries Equation
2.1 Early History, Exact Solutions, and Solitons
2.2 Constants of the Motion for the KdV Flow
2.3 KdV as a Hamiltonian System
2.4 KdV as a Lax Equation
2.5 The KdV Hierarchy
Section 3. Introduction to the Inverse Scattering Transform
3.1 The Scattering Data and its Evolution
3.2 The Gelfand-Levitan-Marchenko Equation
3.3 An Explicit Formula for KdV Multi-Solitons
Section 4. The ZS-AKNS Scheme
4.1 Zero Curvature Lax Equations
4.2 Some ZS-AKNS Examples
Appendix A. Symplectic Manifolds and Hamiltonian Systems
Appendix B. Wave Equations as Continuum Limits of Lattice Models
Appendix C. The Pseudospectral Method: Solving Wave Equations Numerically
References
Section 1
Wave Equations
1.1 Introduction
This first section of these notes is intended as a very basic introduction to the theory of
wave equations, concentrating on the case of a single space dimension. We will start pretty
much from scratch, not assuming any prior experience with either the special terminology or
the techniques used in working with these equations. On the other hand, given the nature
of the audience, I will assume a fairly sophisticated general background in analysis.
An experienced applied mathematician can often tell a great deal about the behavior of
solutions to a wave equation from just a cursory look at its structure. My goal in the next
few sections is to disclose some of the secrets that make this possible.
What we mean by a wave equation will gradually be made more precise as we proceed.
At first, we will just mean a certain kind of ordinary differential equation on the space
$C^\infty(\mathbf{R}^n, V)$, where $V$ is some finite dimensional vector space, usually $\mathbf{R}$ or $\mathbf{C}$ (and generally
we will take $n = 1$). Thus the wave equation will look like:

$$(*)\qquad u_t = f(u),$$

where $u$ signifies a point of $C^\infty(\mathbf{R}^n, V)$, $u_t$ means $\frac{du}{dt}$, and $f$ is a special kind of map of
$C^\infty(\mathbf{R}^n, V)$ to itself, namely a partial differential operator; i.e., $f(u)(x)$ is a smooth
function $F(u(x), u_{x_i}(x), u_{x_i x_j}(x), \ldots)$ of the values of $u$ and certain of its partial derivatives
at $x$. (In fact, below the function $F$ will almost always be a polynomial.) A solution of $(*)$
is a smooth curve $u(t)$ in $C^\infty(\mathbf{R}^n, V)$ such that, if we write $u(t)(x) = u(x, t)$, then

$$u_t(x, t) = F\bigl(u(x, t),\, u_{x_i}(x, t),\, u_{x_i x_j}(x, t),\, \ldots\bigr).$$
We will be interested in solving the so-called Cauchy Problem for such partial differential
equations, i.e., finding a solution, in the above sense, with $u(x, 0)$ some given element
$u_0(x)$ of $C^\infty(\mathbf{R}^n, V)$. So far, this should more properly be called simply an evolution
equation. In general such equations will describe evolving phenomena which are not wave-like
in character and, as we said above, it is only after certain additional assumptions are made
concerning the function $F$ that it is appropriate to call such an evolution equation a wave
equation.

We will be interested of course in the obvious questions of existence, uniqueness, and
general properties of solutions of the Cauchy problem, but even more it will be the nature
and properties of certain special solutions that will concern us. In particular we will try to
understand the mechanism behind the remarkable behavior of the so-called soliton solutions
of certain special wave equations such as the Korteweg-de Vries Equation (KdV), the Sine-
Gordon Equation (SGE), the Nonlinear Schrödinger Equation (NLS), and other so-called
integrable equations.
As well as first order ODE on $C^\infty(\mathbf{R}^n, V)$ we could also consider second and higher order
ODE, but these can be easily reduced to first order ODE by the standard trick of adding
more dependent variables. For example, to study the classic wave equation in one space
dimension, $u_{tt} = c^2 u_{xx}$, a second order ODE, we can add a new dependent variable $v$ and
consider instead the first order system $u_t = v$, $v_t = c^2 u_{xx}$ (which we can put in the form $(*)$
by writing $U_t = F(U)$, with $U = (u, v)$ and $F(u, v) = (v, c^2 u_{xx})$).
1.2 Travelling Waves and Plane Waves
Before discussing particular model wave equations, we will look at the kind of behavior we
expect to see in solutions. There are a number of important simplifications in the description
of wave propagation for the case of a single space dimension, and to develop a feeling for
many of the important concepts it is best to see them first without the extra complexities
that come with higher dimensions, so in what follows we will concentrate almost solely on
the case $n = 1$.

Let's recall the basic intuitive idea of what is meant by wave motion. Suppose that
$u(x, t)$ represents the strength or amplitude of some scalar physical quantity at the
spatial point $x$ and time $t$. If you like, you can think of $u$ as representing the height of
water in a canal. Then the graph of $u_0(x) = u(x, t_0)$ gives a snapshot of $u$ at time $t_0$. It is
frequently the case that we can understand the evolution of $u$ in time as representing the
propagation of the shape of this graph. In other words, for $t_1$ close to $t_0$, the shape of the
graph of $u_1(x) = u(x, t_1)$ near $x_0$ will be related in some simple way to the shape of $u_0$ near
$x_0$.
Perhaps the purest form of this is exhibited by a so-called travelling wave. This is a
function $u$ of the form $u(x, t) = f(x - ct)$, where $f : \mathbf{R} \to V$ is a function defining the wave
shape and $c$ is a real number defining the propagation speed of the wave. Let us define the
profile of the wave at time $t$ to be the graph of the function $x \mapsto u(x, t)$. Then the initial
profile (at time $t = 0$) is just the graph of $f$, and at any later time $t$, the profile at
time $t$ is obtained by translating each point $(x, f(x))$ of the initial profile $ct$ units
to the right, to the point $(x + ct, f(x))$. In other words, the wave profile of a travelling
wave just propagates by rigid translation with velocity $c$.
(We will see below that the general solution of the equation $u_t + cu_x = 0$ is an arbitrary
travelling wave moving with velocity $c$, and that the general solution of the equation
$u_{tt} - c^2 u_{xx} = 0$ is the sum (or superposition) of two arbitrary travelling waves, both moving with
speed $|c|$, but in opposite directions.)
There is a special kind of complex-valued travelling wave, called a plane wave, that plays
a fundamental rôle in the theory of linear wave equations. The general form of a plane wave
is $u(x, t) = Ae^{i\theta} e^{i(kx - \omega t)}$, where $A$ is a positive constant called the amplitude, $\theta \in [0, 2\pi)$ is
called the initial phase, and $k$ and $\omega$ are two real parameters called the wave number and
angular frequency. (Note that $\frac{k}{2\pi}$ is the number of waves per unit length, while $\frac{\omega}{2\pi}$ is the
number of waves per unit time.) Rewriting $u$ in the form $u(x, t) = Ae^{i\theta} e^{ik(x - \frac{\omega}{k} t)}$, we see
that it is indeed a travelling wave and that its propagation velocity is $\frac{\omega}{k}$.
In studying a wave equation, one of the first things to look for is the travelling wave
solutions (if any) that it admits. For linear wave equations (with constant coefficients) we
will see that for each wave number $k$ there is a unique angular frequency $\omega(k)$ for which the
equation admits a plane wave solution, and the velocity $\frac{\omega(k)}{k}$ of this plane wave as a function
of $k$ (the so-called dispersion relation of the equation) not only completely determines the
equation, but is crucial to understanding how solutions of the equation disperse as time
progresses. Moreover, the fact that there is a unique (up to a multiplicative constant)
travelling wave solution $u_k(x, t) = e^{i(kx - \omega(k)t)}$ for each wave number $k$ will allow us to solve
the equation easily by representing the general solution as a superposition of these solutions
$u_k$; this is the Fourier method.
1.2.1 Remark. For nonlinear wave equations, the travelling wave solutions are in general
severely restricted. Usually only special profiles, characteristic of the particular equation,
are possible for travelling wave solutions. In particular they do not normally admit any
solutions of the plane wave form $Ae^{i\theta} e^{i(kx - \omega t)}$.
The concepts of travelling wave and plane wave still make sense when the spatial dimension
$n$ is greater than one. Given an initial profile $f : \mathbf{R}^n \to V$ and a direction $\alpha \in S^{n-1}$,
we can define the travelling wave $u(x, t)$ with profile $f$, moving with speed $c$ in the
direction $\alpha$, by $u(x, t) := f(x - ct\alpha)$. Note that the graph of the function $x \mapsto u(x, t)$ is just
the graph of $f$ translated by $ct\alpha$, so it indeed travels in the direction $\alpha$ with speed $c$. If
we choose a basis $v_i$ for $V$, then we can write $f$ as a finite sum $f(x) = \sum_{i=1}^d f_i(x) v_i$ where
$f_i : \mathbf{R}^n \to \mathbf{C}$, thus essentially reducing consideration to the case $V = \mathbf{C}$, so we will assume
that $f$ is scalar valued in what follows.
If $\alpha \in S^{n-1}$ is a direction, then the fibers of the projection $\pi_\alpha : x \mapsto x \cdot \alpha$ of $\mathbf{R}^n$ onto $\mathbf{R}$
foliate $\mathbf{R}^n$ by the hyperplanes $x \cdot \alpha = a$ orthogonal to $\alpha$. A profile $f : \mathbf{R}^n \to \mathbf{C}$ that is
constant on each such hyperplane is called a plane wave profile, and it will be of the form
$f(x) = g(x \cdot \alpha)$ where $g : \mathbf{R} \to \mathbf{C}$. The corresponding travelling plane wave with speed $c$
is $f(x - ct\alpha) = g(x \cdot \alpha - ct)$, i.e., just the travelling wave with profile $g$ and speed
$c$ on $\mathbf{R}$, pulled back to $\mathbf{R}^n$ by $\pi_\alpha$.
The exponential plane wave $u(x, t) = e^{i(kx - \omega t)}$ that we used for the case $n = 1$ has the
profile $u(y, 0) = e^{iky}$. If we use this same profile for $n > 1$, i.e., define $g(y) = e^{iky}$, then our
travelling waves will have the form $e^{ik(x \cdot \alpha - ct)} = e^{i(kx \cdot \alpha - \omega t)}$ where, as above, $\omega = kc$. If we
define $\xi = k\alpha$ and recall that $\alpha$ was a unit vector, then our travelling wave is $u_{\xi,\omega}(x, t) =
e^{i(\xi \cdot x - \omega t)}$, where now the wave number is $\|\xi\|$ and the speed $c$ is related to the angular
frequency $\omega$ by $c = \frac{\omega}{\|\xi\|}$. At any point $x$, $u_{\xi,\omega}(x, t)$ is periodic in $t$ with frequency $\frac{\omega}{2\pi}$, and,
fixing $t$, $u_{\xi,\omega}(x, t)$ is periodic with period $\frac{2\pi}{\|\xi\|}$ along any line parallel to $\alpha$. We shall see that
for $n > 1$ too, there is a dispersion relation associated to any linear wave equation, and
the Fourier magic still works; i.e., for each $\xi$ there will be a unique frequency $\omega(\xi)$ such
that $u_\xi(x, t) = u_{\xi,\omega(\xi)}(x, t)$ is a solution of the wave equation, and we will again be able to
represent the general solution as a superposition of these special travelling wave solutions.
1.3 Some Model Equations
In this section we will introduce some of the more important model wave equations (and
classes of wave equations) that will be studied in more detail in later sections.
1.3 Example 1. Perhaps the most familiar wave equation is $u_{tt} - c^2 \Delta u = 0$, and I
will refer to it as the Classic Wave Equation. Here $\Delta$ is the Laplace operator, and the
operator $\frac{\partial^2}{\partial t^2} - c^2\Delta$ is called the wave operator, or d'Alembertian operator. We can reduce
this to the first-order form considered above by replacing the one-component vector $u$ by a
two-component vector $(u, v)$ that satisfies the PDE $(u, v)_t = (v, c^2 \Delta u)$, i.e., the wave shape
$u$ and velocity $v$ satisfy the system of two linear PDE $u_t = v$ and $v_t = c^2 \Delta u$. As we shall
see next, in one space dimension it is extremely easy to solve the Cauchy problem for the
Classic Wave Equation explicitly.
Namely, in one space dimension we can factor the wave operator $\frac{\partial^2}{\partial t^2} - c^2 \frac{\partial^2}{\partial x^2}$ as the
product $(\frac{\partial}{\partial t} - c\frac{\partial}{\partial x})(\frac{\partial}{\partial t} + c\frac{\partial}{\partial x})$. This suggests that we transform to so-called characteristic
coordinates, $\xi = x - ct$ and $\eta = x + ct$, in which the wave equation becomes simply
$\frac{\partial^2 u}{\partial \xi\, \partial \eta} = 0$. This clearly has the general solution $u(\xi, \eta) = F(\xi) + G(\eta)$, so transforming back
to laboratory coordinates $x, t$, the general solution is $u(x, t) = F(x - ct) + G(x + ct)$. If
the initial shape of the wave is $u(x, 0) = u_0(x)$ and its initial velocity is $u_t(x, 0) = v(x, 0) =
v_0(x)$, then an easy algebraic computation gives the following very explicit formula:

$$u(x, t) = \frac{1}{2}\left[u_0(x - ct) + u_0(x + ct)\right] + \frac{1}{2c}\int_{x-ct}^{x+ct} v_0(\xi)\, d\xi,$$
known as d'Alembert's Solution of the Cauchy Problem for the Wave Equation in one
space dimension. Note the geometric interpretation in the important plucked string case,
$v_0 = 0$: the initial profile $u_0$ breaks up into the sum of two travelling waves, both with the
same profile $u_0/2$, and one travels to the right with speed $c$ and the other to the left with
speed $c$. (We shall see later that something similar happens when $n > 1$. One can again
decompose the initial shape, but now into a continuous superposition of shapes $u_\alpha$, one for
each direction $\alpha$ on the unit sphere $S^{n-1}$, and each $u_\alpha$ then moves as a travelling wave
with the speed $c$ in the direction $\alpha$.)
1.3 Exercise 1. Derive d'Alembert's solution. (Hint: $u_0(x) = F(x) + G(x)$, so
$u_0'(x) = F'(x) + G'(x)$, while $v_0(x) = u_t(x, 0) = -cF'(x) + cG'(x)$.)
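The algebra behind this exercise can also be checked symbolically. The following sketch (assuming SymPy is available; `u0` and `v0` are placeholders for arbitrary smooth initial data) verifies that d'Alembert's formula satisfies the wave equation identically and that its time derivative at $t = 0$ recovers $v_0$; the condition $u(x, 0) = u_0(x)$ is immediate, since at $t = 0$ the integral has zero width.

```python
import sympy as sp

x, t, c, xi = sp.symbols('x t c xi', real=True)
u0, v0 = sp.Function('u0'), sp.Function('v0')  # arbitrary smooth initial data

# d'Alembert's formula for the Cauchy problem of u_tt = c^2 u_xx
u = (u0(x - c*t) + u0(x + c*t)) / 2 \
    + sp.Integral(v0(xi), (xi, x - c*t, x + c*t)) / (2*c)

# the wave-equation residual should vanish identically
residual = sp.simplify(sp.diff(u, t, 2) - c**2 * sp.diff(u, x, 2))

# differentiating the integral in t leaves only boundary terms, so at t = 0
# the initial velocity v0(x) is recovered
ut_at_0 = sp.simplify(sp.diff(u, t).subs(t, 0))
```

SymPy applies the Leibniz rule to the integral with moving limits, so no explicit evaluation of the integral is needed.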
1.3.1 Remark. There are a number of important consequences that follow easily from the
form of the d'Alembert solution:
a) The solution is well-defined for initial conditions $(u_0, v_0)$ in the space of distributions,
and gives a flow on this much larger space.
b) The quantity $\int_{-\infty}^{\infty} |u_t|^2 + c^2|u_x|^2\, dx$ is a constant of the motion. More precisely, if this
integral is finite at one time for a solution $u(x, t)$, then it is finite and has the same value
at any other time.
1.3 Exercise 2. Prove this.
(Hint: $|u_t(x, t)|^2 + c^2|u_x(x, t)|^2 = 2c^2|F'(x - ct)|^2 + 2c^2|G'(x + ct)|^2$.)
c) The domain of dependence of a point $(x, t)$ of space-time consists of the interval
$[x - ct, x + ct]$. That is, the value of any solution $u$ at $(x, t)$ depends only on the values of $u_0$ and
$v_0$ in the interval $[x - ct, x + ct]$. Another way to say this is that the region of influence
of a point $x_0$ consists of the interior of the light-cone with vertex at $x_0$, i.e., all points
$(x, t)$ satisfying $x_0 - ct < x < x_0 + ct$. (These are the points having $x_0$ in their domain
of dependence.) Still a third way of stating this is that the Classical Wave Equation has
signal propagation speed $c$, meaning that the value of a solution at $(x, t)$ depends only on
the values of $u_0$ and $v_0$ at points $x_0$ from which a signal propagating with speed $c$ could
reach $x$ in time $t$ (i.e., points inside the sphere of radius $ct$ about $x$).
1.3 Example 2. The Linear Advection Equation, $u_t + cu_x = 0$. Using again the trick
of transforming to the characteristic coordinates $\xi = x - ct$, $\eta = x + ct$, the equation becomes
$u_\eta = 0$; hence $u$ is a function of $\xi$ alone, and the solution of the Cauchy Problem
is $u(x, t) = u_0(x - ct)$. As before we see that if $u_0$ is any distribution then $u(t) = u_0(x - ct)$
gives a well-defined curve in the space of distributions that satisfies $u_t + cu_x = 0$, so that we
really have a flow on the space of distributions whose generating vector field is $-c\frac{\partial}{\partial x}$. Since
$-c\frac{\partial}{\partial x}$ is a skew-adjoint operator on $L^2(\mathbf{R})$, it follows that this flow restricts to a one-parameter
group of isometries of $L^2(\mathbf{R})$, i.e., $\int_{-\infty}^{\infty} u(x, t)^2\, dx$ is a constant of the motion.
1.3 Exercise 3. Prove directly that $\frac{d}{dt}\int_{-\infty}^{\infty} u(x, t)^2\, dx$ is zero. (Hint: It suffices to
show this when $u_0$ is smooth and has compact support, since these are dense in $L^2$. Now for
such functions we can rewrite the derivative of the integral as $\int_{-\infty}^{\infty} \frac{\partial}{\partial t}\, u(x, t)^2\, dx$, and the result will follow if
we can show that $\frac{\partial}{\partial t}\, u(x, t)^2$ can be written for each $t$ in the form $\frac{d}{dx}h(x)$, where $h$ is smooth
and has compact support.)
1.3.2 Remark. Clearly the domain of dependence of $(x, t)$ is now just the single point
$x - ct$, the region of influence of $x_0$ is the line $x = x_0 + ct$, and the signal propagation speed
is again $c$. The main difference with Example 1 is that the Linear Advection Equation
describes waves moving in one direction with velocity $c$, while the Classic Wave Equation
describes waves moving in both directions with velocity $c$.
1.3 Exercise 4. (Duhamel's Principle.) The homogeneous Linear Advection Equation
describes waves moving to the right in a non-dispersive and non-dissipative one-dimensional
linear elastic medium when there are no external forces acting on it. (The
italicised terms will be explained later.) If there is an external force, then the appropriate
wave equation will be an inhomogeneous version of the equation, $u_t + cu_x = F(x, t)$. Show
that the Cauchy Problem now has the solution $u(x, t) = u_0(x - ct) + \int_0^t F(x - ct + c\tau, \tau)\, d\tau$.
F(xct +c, ) d.
1.3Example 3. General Linear Evolution Equation, u
t
+ P(

x
)u = 0. Here P() is a
polynomial with complex coecients. For example, if P() = c then we get back the
Linear Advection Equation. We will outline the theory of these equations in a separate
subsection below where as we shall see, they can analyzed easily and thoroughly using the
Fourier Transform. It will turn out that to qualify as a wave equation, the odd coecients
of the polynomial P should be real and the even coecients pure imaginary, or more simply,
P(i) should be imaginary valued on the real axis. This is the condition for P(

x
) to be a
skew-adoint operator on L
2
(R).
1.3 Example 4. The General Conservation Law, $u_t = -(F(u))_x$. Here $F(u)$ can be any
smooth function of $u$ and certain of its partial derivatives with respect to $x$. For example,
if $P(\xi) = a_1\xi + \cdots + a_n\xi^n$, we can get the linear evolution equation $u_t = -P(\frac{\partial}{\partial x})u$ by
taking $F(u) = a_1 u + \cdots + a_n \frac{\partial^{n-1} u}{\partial x^{n-1}}$, and $F(u) = \frac{1}{2}u^2 + \delta^2 u_{xx}$ gives the KdV equation
$u_t + uu_x + \delta^2 u_{xxx} = 0$ that we consider just below. Note that if $F(u(x, t))$ vanishes at
infinity then integration gives $\frac{d}{dt}\int_{-\infty}^{\infty} u(x, t)\, dx = 0$, i.e., $\int_{-\infty}^{\infty} u(x, t)\, dx$ is a constant of the
motion, and this is where the name Conservation Law comes from. We will be concerned
mainly with the case that $F(u)$ is a zero-order operator, i.e., $F(u)(x) = F(u(x))$, where $F$
is a smooth function on $\mathbf{R}$. In this case, if we let $f = F'$, then we can write our Conservation
Law in the form $u_t = -f(u)u_x$. In particular, taking $f(\xi) = c$ (i.e., $F(\xi) = c\xi$) gives
the Linear Advection Equation $u_t = -cu_x$, while $F(\xi) = \frac{1}{2}\xi^2$ gives the important Inviscid
Burgers Equation, $u_t + uu_x = 0$.
There is a very beautiful and highly developed theory of such Conservation Laws, and
again we will devote a separate subsection to outlining some of the basic results from this
theory. Recall that for the Linear Advection Equation we have an explicit solution for
the Cauchy Problem, namely $u(x, t) = u_0(x - ct)$, which we can also write as $u(x, t) =
u_0(x - f(u(x, t))t)$, where $f(\xi) = c$. If we are incredibly optimistic we might hope that
we could more generally solve the Cauchy Problem for $u_t = -f(u)u_x$ by solving $u(x, t) =
u_0(x - f(u(x, t))t)$ as an implicit equation for $u(x, t)$. This would mean that we could
generalize our algorithm for finding the profile of $u$ at time $t$ from the initial profile as follows:
translate each point $(\xi, u_0(\xi))$ of the graph of $u_0$ to the right by an amount $f(u_0(\xi))t$ to
get the graph of $x \mapsto u(x, t)$. This would of course give us a simple method for solving any
such Cauchy Problems, and the amazing thing is that this bold idea actually works.
However, one must be careful. As we shall see, this algorithm (which goes under the name
the method of characteristics) contains the seeds of its own eventual failure. For a general
initial condition $u_0$ and function $f$, we shall see that we can predict a positive time $T_B$ (the
so-called breaking time) after which the solution given by the method of characteristics
can no longer exist as a smooth, single-valued function.
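The breaking time is easy to see numerically. The sketch below (using NumPy; the Gaussian profile $u_0(x) = e^{-x^2}$ is a sample choice, not from the notes) applies the method of characteristics to the Inviscid Burgers Equation $u_t + uu_x = 0$, for which $f(u) = u$: each point $(\xi, u_0(\xi))$ moves right with speed $u_0(\xi)$, the first characteristics cross at $T_B = -1/\min u_0'$, and past $T_B$ the translated graph is no longer single-valued.

```python
import numpy as np

# sample initial profile (an assumption for illustration): u0(x) = exp(-x^2)
xi = np.linspace(-5, 5, 200001)
u0 = np.exp(-xi**2)
u0_prime = -2 * xi * np.exp(-xi**2)

# characteristics x = xi + u0(xi) t first cross when 1 + u0'(xi) t = 0,
# i.e., at the breaking time T_B = -1 / min(u0')
T_B = -1.0 / u0_prime.min()

# closed form for this profile: min u0' = -sqrt(2/e), so T_B = sqrt(e/2)
T_B_exact = np.sqrt(np.e / 2)

# before T_B the characteristic map xi -> xi + u0(xi) t is strictly
# increasing (single-valued profile); after T_B it folds over
x_before = xi + u0 * (0.9 * T_B)
x_after = xi + u0 * (1.1 * T_B)
monotone_before = bool(np.all(np.diff(x_before) > 0))
monotone_after = bool(np.all(np.diff(x_after) > 0))
```

The sign of $1 + u_0'(\xi)t$ is exactly the Jacobian of the characteristic map, which is why its first zero marks the breaking time.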
1.3 Example 5. The Korteweg-de Vries (KdV) Equation, $u_t + uu_x + \delta^2 u_{xxx} = 0$. If we
re-scale the independent variables by $t \mapsto \lambda t$ and $x \mapsto \mu x$, then the KdV equation becomes:

$$u_t + \left(\frac{\lambda}{\mu}\right) uu_x + \left(\frac{\lambda}{\mu^3}\right) \delta^2 u_{xxx} = 0,$$

so by appropriate choice of $\lambda$ and $\mu$ we can obtain any equation of the form $u_t + a\,uu_x +
b\,u_{xxx} = 0$, and any such equation is referred to as the KdV equation. A commonly
used choice that is convenient for many purposes is $u_t + 6uu_x + u_{xxx} = 0$, although the
form $u_t - 6uu_x + u_{xxx} = 0$ (obtained by replacing $u$ by $-u$) is equally common. We will
use both these forms. This is surely one of the most important and most studied of all
evolution equations. It is over a century since it was shown to govern wave motion in
a shallow channel, but less than forty years since the remarkable phenomenon of soliton
interactions was discovered in studying certain of its solutions. Shortly thereafter the so-called
Inverse Scattering Transform (IST) for solving the KdV equation was discovered,
and the equation was eventually shown to be an infinite dimensional completely integrable
Hamiltonian system. This equation and its remarkable properties will be one of our main
objects of study.
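The soliton solutions alluded to here can be written down explicitly. As a quick sanity check (a sketch assuming SymPy; the profile below is the standard one-soliton for the form $u_t + 6uu_x + u_{xxx} = 0$, with amplitude $2\kappa^2$ and speed $4\kappa^2$), the residual of the equation vanishes identically:

```python
import sympy as sp

x, t, kappa = sp.symbols('x t kappa', real=True, positive=True)

# the standard one-soliton: u = 2*kappa^2 * sech^2(kappa*(x - 4*kappa^2*t))
u = 2 * kappa**2 * sp.sech(kappa * (x - 4*kappa**2 * t))**2

# residual of u_t + 6 u u_x + u_xxx = 0
residual = sp.diff(u, t) + 6*u*sp.diff(u, x) + sp.diff(u, x, 3)

# rewriting in exponentials turns this into a rational identity that cancels
residual = sp.simplify(residual.rewrite(sp.exp))
```

Note how the amplitude and the speed are locked together through $\kappa$: taller solitons are narrower and faster, which is what makes the two-soliton overtaking interaction visible.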
1.3 Example 6. The Sine-Gordon Equation (SGE), $u_{tt} - u_{xx} = \sin(u)$. This equation
is considerably older than KdV. It was discovered in the late eighteen hundreds to be
the master equation for the study of pseudospherical surfaces, i.e., surfaces of Gaussian
curvature $K$ equal to $-1$ immersed in $\mathbf{R}^3$, and for that reason it was intensively studied
(and its solitons discovered, but not recognized as such) long before KdV was even known.
However it was only in the course of trying to find other equations that could be solved by
the IST that it was realized that SGE is also an integrable equation.
1.3 Example 7. The Nonlinear Schrödinger Equation (NLS), $iu_t + u_{xx} + u|u|^2 = 0$. This is
of more recent origin. It was the third evolution equation shown to have soliton behavior
and to be integrable, and recently it has been intensively studied because it describes the
propagation of pulses of laser light in optical fibers. The latter technology is rapidly
becoming the primary means for long-distance, high bandwidth communication, which in
turn is the foundation of the Internet and the World Wide Web.
1.4 Linear Wave Equations: Dispersion and Dissipation
Evolution equations that are not only linear but also translation invariant (i.e., have
constant coefficients) can be solved explicitly using Fourier methods. They are interesting
both for their own sake, and also because they can serve as a tool for studying nonlinear
equations.
The general linear evolution equation has the form $u_t + P(\frac{\partial}{\partial x})u = 0$, where to begin
with we can assume that the polynomial $P$ has coefficients that are smooth complex-valued
functions of $x$ and $t$: $P(\frac{\partial}{\partial x})u = \sum_{i=1}^r a_i(x, t)\, \frac{\partial^i u}{\partial x^i}$. For each $(x_0, t_0)$, we have a space-time
translation operator $T_{(x_0, t_0)}$ acting on smooth functions of $x$ and $t$ by $T_{(x_0, t_0)} u(x, t) =
u(x - x_0, t - t_0)$. We say that the operator $P(\frac{\partial}{\partial x})$ is translation invariant if it commutes
with all the $T_{(x_0, t_0)}$.
1.4 Exercise 1. Show that the necessary and sufficient condition for $P(\frac{\partial}{\partial x})$ to be
translation invariant is that the coefficients $a_i$ of $P$ should be constant complex numbers.
1.4.1 Invariance Principles.
There are at least two good reasons to assume that our equation is translation invariant.
First, the eminently practical one that in this case we will be able to use Fourier transform
techniques to solve the initial value problem explicitly, and investigate in detail the nature
of its solutions.
But there is frequently an even more important philosophical reason for postulating
translation invariance. Assume that we are trying to model the dynamics of some fundamental
physical field quantity $u$ by an evolution equation of the above type. Thus $x$ will denote
the place where, and $t$ the time when, the quantity has the value $u(x, t)$. Now, if our
proposed physical law is indeed fundamental, its validity should not depend on where or
when it is applied: it will be the same on Alpha Centauri as on Earth, and the same in
a million years as it is today. We can even take that as part of the definition of what we
mean by fundamental. The way to give a precise mathematical formulation of this principle
of space-time symmetry or homogeneity is to demand that our equation should be invariant
under some transitive group acting on space and time.
But, like most philosophical discussions, this only begs a deeper question. How does it
happen that the space-time we live in appears to admit a simply-transitive abelian group
action under which the physical laws are invariant? On the level of Newtonian physics (or
Special Relativity) this is simply taken as axiomatic. General Relativity gives an answer that
is both more sophisticated and more satisfying. The basic symmetry principle postulated
is the Principle of Equivalence. This demands that the truly Fundamental Laws of Physics
should be invariant under the (obviously transitive) group of all diffeomorphisms of space-time.
Of course there are very few laws that are that fundamental, but Einstein's Field
Equations for a (pseudo-)Riemannian metric on space-time are one of them, and the physical
evidence for their correctness is pretty overwhelming. In a neighborhood of any point of
space-time we can then coordinatize space-time by using geodesic coordinates (i.e., by identifying
space-time with its tangent space at that point using the Riemannian exponential map). To
use Einstein's lovely metaphor, we get into an elevator and cut the rope. In these natural
coordinates, the space-time appears flat to second order, and the translation group that
comes from the linear structure of the tangent space is an approximate symmetry group.
I will not try here to answer the still far deeper philosophical mystery of why our physical
world seems to be governed by laws that exhibit such remarkable symmetry. This is closely
related to what Eugene Wigner called The Unreasonable Effectiveness of Mathematics in
the Natural Sciences in a famous article by that name [W]. (For another view of this topic
see [H].) But I cannot help wondering if the so-called Anthropic Principle is not at least
part of the answer. Perhaps only a Universe governed by such symmetry principles manifests
the high degree of stability that is conducive to the evolution of the kind of self-cognizant,
intelligent life that would worry about this point. In other words: mathematicians, physicists,
and philosophers can exist to wonder about why such fundamental laws govern, only
in those universes where they do in fact govern.
In any case, we shall henceforth assume that $P$ does in fact have constant complex
numbers as coefficients. If we substitute the Ansatz $u(x, t) = e^{i(kx - \omega t)}$ into our linear
equation, $u_t + P(\frac{\partial}{\partial x})u = 0$, then we find the relation $-i\omega u + P(ik)u = 0$, or $\omega = \omega(k) :=
\frac{1}{i}P(ik)$. For $u(x, t)$ to be a plane wave solution, we need the angular frequency $\omega$ to be
real. Thus, we will have a (unique) plane wave solution for each real wave number $k$ just
when $\frac{1}{i}P(ik)$ is real (or $P(ik)$ is imaginary) for $k$ on the real axis. This just translates into
the condition that the odd coefficients of $P$ should be real and the even coefficients pure
imaginary. Let us assume this in what follows. As we shall see, one consequence will be that
we can solve the initial value problem for any initial condition $u_0$ in $L^2$, and the solution is a
superposition of these plane wave solutions, clearly a strong reason to consider this case as
describing honest wave equations, whatever that term should mean. Then we will follow
up by taking a look at what happens when we relax this condition.
The relation $\omega(k) := \frac{1}{i}P(ik)$ relating the angular frequency $\omega$ and wave number $k$ of
a plane wave solution of a linear wave equation is called the dispersion relation for the
equation. The propagation velocity of the plane wave solution with wave number $k$ is called
the phase velocity at wave number $k$, and is given by the formula $\frac{\omega(k)}{k} = \frac{1}{ik}P(ik)$ (which is
also sometimes referred to as the dispersion relation for the equation). It is important to
observe that the dispersion relation is not only determined by the polynomial $P$ that defines
the evolution equation, but it conversely determines $P$.
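As a concrete instance of the recipe $\omega(k) = \frac{1}{i}P(ik)$ (a sketch assuming SymPy; the Airy equation $u_t + u_{xxx} = 0$, i.e., $P(X) = X^3$, is chosen as a sample), the dispersion relation and phase velocity fall out mechanically:

```python
import sympy as sp

k = sp.symbols('k', real=True)
P = lambda X: X**3  # sample: u_t + u_xxx = 0 (the linearization of KdV)

omega = sp.simplify(P(sp.I * k) / sp.I)       # dispersion relation omega(k)
phase_velocity = sp.simplify(omega / k)       # velocity of wave number k
```

Here $\omega(k) = -k^3$ and the phase velocity $-k^2$ depends on $k$, so modes with different wave numbers travel at different speeds; this is exactly the dispersion referred to above, and it also confirms that $\omega(k)$ is real, as it must be for a skew-adjoint $P$.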
Now let $u_0$ be any initial wave profile in $L^2$, so that $u_0(x) = \int \hat{u}_0(k) e^{ikx}\, dk$, where
$\hat{u}_0(k) = \frac{1}{2\pi}\int u_0(x) e^{-ikx}\, dx$ is the Fourier Transform of $u_0$. Defining $\hat{u}(k, t) = e^{-P(ik)t}\hat{u}_0(k)$,
we see that $\hat{u}(k, t)e^{ikx} = \hat{u}_0(k)\, e^{ik(x - \frac{\omega(k)}{k}t)}$ is a plane wave solution of our equation with
initial condition $\hat{u}_0(k)e^{ikx}$. We now define $u(x, t)$ (formally) to be the superposition of
these plane waves: $u(x, t) = \int \hat{u}(k, t)e^{ikx}\, dk$. So far we have not really used the fact that
$P(ik)$ is imaginary for $k$ real, and this $u(x, t)$ would still be a formal solution without
that assumption. The way we shall use the condition on $P$ is to notice that it implies
$|e^{-P(ik)t}| = 1$. Thus $|\hat{u}(k, t)| = |\hat{u}_0(k)|$, so $\hat{u}(k, t)$ is in $L^2$ for all $t$, and in fact it has
the same norm as $\hat{u}_0$. It then follows from Plancherel's Theorem that $u(x, t)$ is in $L^2$ for
all $t$, and has the same norm as $u_0$. It is now elementary to see that our formal solution
$u(x, t)$ is in fact an honest solution of the Cauchy Problem for our evolution equation, and
in fact defines a one-parameter group of unitary transformations of $L^2$. (Another way to
see this is to note that since $\frac{\partial}{\partial x}$ is a skew-adjoint operator on $L^2$, so is any odd power or
$i$ times any even power, so that $P(\frac{\partial}{\partial x})$ is skew-adjoint and hence $\exp(-P(\frac{\partial}{\partial x})t)$ is a
one-parameter group of unitary transformations of $L^2$. But a rigorous proof that $\frac{\partial}{\partial x}$ is a
skew-adjoint operator (and not just formally skew-adjoint) involves essentially the same
Fourier analysis.)
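The Fourier recipe is easy to run numerically on a periodic interval (a sketch using NumPy's FFT; the periodic domain and the sample equation $u_t + u_{xxx} = 0$ are assumptions for illustration). Since $P(ik) = -ik^3$, the propagator multiplies the $k$-th Fourier coefficient by the unimodular factor $e^{-P(ik)t} = e^{ik^3 t}$, so the discrete $L^2$ norm is preserved exactly:

```python
import numpy as np

N = 256
x = np.linspace(0, 2*np.pi, N, endpoint=False)
k = np.fft.fftfreq(N, d=1.0/N)  # integer wave numbers on a 2*pi-periodic grid

def evolve(u0, t):
    # u_t + u_xxx = 0: multiply each mode by exp(-P(ik) t) = exp(i k^3 t)
    return np.real(np.fft.ifft(np.exp(1j * k**3 * t) * np.fft.fft(u0)))

u0 = np.cos(3 * x)              # a single plane-wave mode with wave number 3
u1 = evolve(u0, 0.1)

# unitarity: the discrete L^2 norm is unchanged
norm_ratio = np.linalg.norm(u1) / np.linalg.norm(u0)

# exact solution for this mode: omega(3) = -27, so u(x, t) = cos(3x + 27t)
max_err = np.abs(u1 - np.cos(3 * x + 27 * 0.1)).max()
```

For a general initial profile the same multiplier spreads the modes apart (each travels at its own phase velocity $-k^2$), which is the dispersion this subsection is about.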
We next look at what can happen if we drop the condition that the odd coefficients of $P$
are real and the even coefficients pure imaginary.
Consider first the special case of the Heat (or Diffusion) Equation, $u_t - \nu u_{xx} = 0$, with
$\nu > 0$. Here $P(X) = -\nu X^2$, so $|e^{-P(ik)t}| = e^{-\nu k^2 t}$. Thus, when $t > 0$, $|e^{-P(ik)t}| < 1$ and
$|\hat{u}(k, t)| < |\hat{u}_0(k)|$, so again $\hat{u}(k, t)$ is in $L^2$ for all $t > 0$, but now $\|u(x, t)\|_{L^2} < \|u_0(x)\|_{L^2}$. Thus
our solution is not a unitary flow on $L^2$, but rather a contracting, positive semi-group. In
fact, it is easy to see that for each initial condition $u_0 \in L^2$, the solution tends to zero in
$L^2$ exponentially fast as $t \to \infty$, and in fact it tends to zero uniformly too. This so-called
dissipative behavior is clearly not very wave-like in nature, and the Heat Equation is not
considered to be a wave equation. On the other hand, the fact that the propagator $e^{-P(ik)t}$
is so rapidly decreasing in $k$ implies very strong regularity for the solution $u(x, t)$ as a function
of $x$ as soon as $t > 0$.
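The contrast with the unitary case is easy to see numerically (a sketch using NumPy on a periodic interval; the domain and the value $\nu = 1/2$ are sample choices): the Heat Equation multiplier $e^{-\nu k^2 t}$ has modulus less than one for every $k \neq 0$, so the discrete $L^2$ norm strictly decreases as $t$ increases.

```python
import numpy as np

N = 256
x = np.linspace(0, 2*np.pi, N, endpoint=False)
k = np.fft.fftfreq(N, d=1.0/N)
nu = 0.5  # sample diffusion constant

def heat_evolve(u0, t):
    # u_t - nu*u_xx = 0: each Fourier mode decays by exp(-nu * k^2 * t)
    return np.real(np.fft.ifft(np.exp(-nu * k**2 * t) * np.fft.fft(u0)))

u0 = np.sign(np.sin(x))  # rough (discontinuous) initial data, zero mean
u1 = heat_evolve(u0, 0.1)
u2 = heat_evolve(u0, 1.0)

norms = [np.linalg.norm(u) for u in (u0, u1, u2)]
```

The high modes of the square wave are damped almost instantly, which is the numerical counterpart of the smoothing statement in Exercise 2 below.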
1.4 Exercise 2. Show that for any initial condition $u_0$ in $L^2$, the solution $u(x, t)$
of the Heat Equation is an analytic function of $x$ for any $t > 0$. (Hint: If you know the
Paley-Wiener Theorem, this is of course an immediate consequence, but it is easy to prove
directly.)
What happens for $t < 0$? In this case $|e^{-P(ik)t}| = e^{-\nu k^2 t}$ is not an essentially bounded
function of $k$, and indeed grows more rapidly than any polynomial, so that multiplication
by it does not map $L^2$ into itself, or into any of the Sobolev spaces. In fact, it is immediate
from the above exercise that if $u_0 \in L^2$ is not analytic, then there cannot be an $L^2$ solution
$u(x, t)$ of the Heat Equation with initial condition $u_0$ on any non-trivial interval $(-T, 0]$.
It is not hard to extend this analysis for the Heat Equation to any monomial $P$: $P(X) =
a_n X^n$, where $a_n = \alpha + i\beta$. Then $|e^{-P(ik)t}| = e^{-\mathrm{Re}(a_n (ik)^n)\, t}$. If $n = 2m$ is even, this becomes
$e^{(-1)^{m+1}\alpha k^n t}$, while if $n = 2m + 1$ is odd, it becomes $e^{(-1)^m \beta k^n t}$. If $\alpha$ (respectively $\beta$) is zero,
we are back to our earlier case that gives a unitary flow on $L^2$. If not, then we get essentially
back to the dissipative semi-flow behavior of the heat equation. Whether the semi-flow is
defined for $t > 0$ or $t < 0$ depends on the parity of $m$ and the sign of $\alpha$ (respectively $\beta$).

1.4 Exercise 3. Work out the details.
We will now return to our assumption that $P(D)$ is a skew-adjoint operator, i.e., the
odd coefficients of $P(X)$ are real and the even coefficients pure imaginary. We next note
that this seemingly ad hoc condition is actually equivalent to a group invariance principle,
similar to translation invariance.
1.4.2 Symmetry Principles in General, and CPT in Particular.
One of the most important ways to single out important and interesting model equations
for study is to look for equations that satisfy various symmetry or invariance principles.
Suppose our equation is of the form $\mathcal{E}u = 0$, where $\mathcal{E}$ is some differential operator on a linear
space $\mathcal{F}$ of smooth functions, and we have some group $G$ that acts on $\mathcal{F}$. Then we say that
the equation is $G$-invariant (or that $G$ is a symmetry group for the equation) if the operator
$\mathcal{E}$ commutes with the elements of $G$. Of course it follows that if $u \in \mathcal{F}$ is a solution of $\mathcal{E}u = 0$,
then so is $gu$ for all $g$ in $G$.
As we have already noted, the evolution equation u_t + P(D)u = 0 is clearly invariant under time translations, and it is invariant under spatial translations if and only if the coefficients of the polynomial P(X) are constant. Most of the equations of physical interest have further symmetries, i.e., are invariant under larger groups, reflecting the invariance of the underlying physics under these groups. For example, the equations of pre-relativistic physics are Galilean invariant, while those of relativistic physics are Lorentz invariant. We will consider here a certain important discrete symmetry that so far has proved to be universal in physics.
We denote by T the time-reversal map (x, t) ↦ (x, -t), and by P the analogous parity or spatial reflection map (x, t) ↦ (-x, t). These involutions act as linear operators on functions on space-time by u(x, t) ↦ u(x, -t) and u(x, t) ↦ u(-x, t) respectively. There is a third important involution that does not act on space-time, but directly on complex-valued functions; namely the conjugation operator C, mapping u(x, t) to its complex conjugate ū(x, t). Clearly C, P, and T commute, so their composition CPT is also an involution, u(x, t) ↦ ū(-x, -t), acting on complex-valued functions defined on space-time. We note that CPT maps the function u(x, t) = e^{i(kx-ωt)} (with real wave number k) to the function e^{i(kx-ω̄t)}, so it fixes such a u if and only if ω is real, i.e., if and only if u is a plane wave.
1.4 Exercise 4. Prove that u_t + P(D)u = 0 is CPT-invariant if and only if P(D) is skew-adjoint, i.e., if and only if P(iξ) is pure imaginary for all real ξ. Check that the KdV, NLS, and Sine-Gordon equations are also CPT-invariant.
1.4.3 Examples of Linear Evolution Equations.
1.4 Example 1. Choosing P(ξ) = cξ gives the Linear Advection Equation u_t + cu_x = 0. The dispersion relation is ω(k)/k = P(ik)/(ik) = c, i.e., all plane wave solutions have the same phase velocity c. For this example we see that û(k, t) = û_0(k)e^{-ikct}, and since ∫ û_0(k)e^{ikx} dk = u_0(x), it follows that u(x, t) = ∫ û_0(k)e^{ik(x-ct)} dk = u_0(x - ct), giving an independent proof of this explicit solution to the Cauchy Problem in this case.
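This Fourier-side description of the solution is easy to check directly. Here is a sketch (assuming NumPy; the periodic grid stands in for the real line, and the sizes and speed are arbitrary choices): evolving û_0(k) by the multiplier e^{-ikct} reproduces the translated profile to machine precision.

```python
import numpy as np

# Periodic grid standing in for the real line; u0 is a narrow Gaussian bump.
N, L = 256, 40.0
x = np.linspace(-L/2, L/2, N, endpoint=False)
k = 2*np.pi*np.fft.fftfreq(N, d=L/N)        # wave numbers for the DFT
u0 = np.exp(-x**2)

c, t = 3.0, 2.0
# Multiply each Fourier mode by e^{-ikct}: u-hat(k, t) = u-hat_0(k) e^{-ikct}.
u_hat = np.fft.fft(u0)*np.exp(-1j*k*c*t)
u = np.real(np.fft.ifft(u_hat))

# The result should be the translated profile u0(x - ct) (modulo the period).
exact = np.exp(-(np.mod(x - c*t + L/2, L) - L/2)**2)
err = np.max(np.abs(u - exact))
```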
The next obvious case to consider is P(ξ) = cξ + dξ^3, giving the dispersion relation ω(k)/k = P(ik)/(ik) = c(1 - (d/c)k^2) and the wave equation u_t + cu_x + du_xxx = 0. This is sometimes referred to as the weak dispersion wave equation. Note that the phase velocity at wave number k is a constant, c, plus a constant times k^2. It is natural therefore to transform to coordinates moving with velocity c, i.e., to make the substitution x ↦ x - ct, and the wave equation becomes u_t + du_xxx = 0. Moreover, by rescaling the independent variable x we can get rid of the coefficient d. This leads us to our next example.
1.4 Example 2. P(ξ) = ξ^3 gives the equation u_t + u_xxx = 0. Now the dispersion relation is non-trivial; plane wave solutions with wave number k move with phase velocity ω(k)/k = P(ik)/(ik) = -k^2, so the Fourier components û_0(k)e^{ik(x+k^2 t)} of u(x, t) with a large wave number k move faster than those with smaller wave number, causing an initially compact wave profile to gradually disperse as these Fourier modes move apart and start to interfere destructively. It is not hard in this example to write a formula for u(x, t) explicitly in terms of u_0, instead of û_0, namely:

u(x, t) = (3t)^{-1/3} ∫_{-∞}^{∞} Ai((x - ξ)/(3t)^{1/3}) u_0(ξ) dξ.

Here Ai is the Airy function, a bounded solution of w'' - zw = 0 that can be defined explicitly by:

Ai(z) = (1/π) ∫_0^∞ cos(t^3/3 + tz) dt,

and it follows from this that v(x, t) = (3t)^{-1/3} Ai(x/(3t)^{1/3}) satisfies v_t + v_xxx = 0 and lim_{t→0} v(x, t) = δ(x).
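As a concrete sketch (assuming scipy.special.airy is available; grid sizes and the Gaussian initial profile are arbitrary choices), one can evaluate the Airy-function formula by direct quadrature and compare it against the independent Fourier-multiplier solution û(k, t) = û_0(k)e^{ik^3 t}:

```python
import numpy as np
from scipy.special import airy

# u(x, t) via the Airy-function formula, evaluated by direct quadrature.
N, L = 1024, 200.0
x = np.linspace(-L/2, L/2, N, endpoint=False)
u0 = np.exp(-x**2)
t = 0.5
s = (3.0*t)**(1.0/3.0)
Ai = airy((x[:, None] - x[None, :])/s)[0]   # airy() returns (Ai, Ai', Bi, Bi')
u_airy = (Ai @ u0)*(L/N)/s

# Independent check: the Fourier multiplier for u_t + u_xxx = 0 is e^{i k^3 t}.
k = 2*np.pi*np.fft.fftfreq(N, d=L/N)
u_fft = np.real(np.fft.ifft(np.fft.fft(u0)*np.exp(1j*k**3*t)))

err = np.max(np.abs(u_airy - u_fft))
```

The two routes agree closely, which also illustrates that the Airy kernel really is the propagator for this equation.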
1.4.4 Remark. More generally, for a wave equation u_t + P(∂/∂x)u = 0, the solution, p(x, t), of the Cauchy Problem with p(x, 0) = δ(x) is called the Fundamental Solution or Propagator for the equation. It follows that the solution to the Cauchy Problem for a general initial condition is given by convolution with p, i.e., u(x, t) = ∫_{-∞}^{∞} p(x - ξ, t) u_0(ξ) dξ.
1.4 Exercise 5. (General Duhamel Principle.) Suppose p is the fundamental solution for the homogeneous wave equation u_t + P(∂/∂x)u = 0. Show that the solution to the Cauchy Problem for the corresponding inhomogeneous equation u_t + P(∂/∂x)u = F(x, t) is given by

u(x, t) = ∫_{-∞}^{∞} p(x - ξ, t) u_0(ξ) dξ + ∫_0^t dτ ∫_{-∞}^{∞} p(x - ξ, t - τ) F(ξ, τ) dξ.
Before leaving linear wave equations we should say something about the important concept of group velocity. We consider an initial wave packet, u_0, that is synthesized from a relatively narrow band of wave numbers k, i.e., u_0(x) = ∫_{k_0-ε}^{k_0+ε} û_0(k)e^{ikx} dk. Thus the corresponding frequencies ω(k) will also be restricted to a narrow band around ω(k_0), and since all the plane wave Fourier modes are moving at approximately the velocity ω(k_0)/k_0, the solution u(x, t) of the Cauchy Problem will tend to disperse rather slowly and keep an approximately constant profile f, at least for a short initial period. One might expect that the velocity at which this approximate wave profile moves would be ω(k_0)/k_0, the central phase velocity, but as we shall now see, it turns out to be ω'(k_0). To see this we expand (kx - ω(k)t) in a Taylor series about k_0:

(kx - ω(k)t) = (k_0 x - ω(k_0)t) + (k - k_0)(x - ω'(k_0)t) + O((k - k_0)^2),

and substitute this in the formula u(x, t) = ∫_{k_0-ε}^{k_0+ε} û_0(k)e^{i(kx-ω(k)t)} dk for the solution. Assuming ε is small enough that the higher order terms in this expansion can be ignored in the interval [k_0-ε, k_0+ε], we get the approximation u(x, t) ≈ f(x - ω'(k_0)t)e^{i(k_0 x - ω(k_0)t)}, where f(x) = ∫_{k_0-ε}^{k_0+ε} û_0(k)e^{i(k-k_0)x} dk = u_0(x)e^{-ik_0 x}. Thus, to this approximation, the solution u(x, t) is just the plane wave solution of the wave equation having wave number k_0, but amplitude modulated by a travelling wave with profile f moving at the group velocity ω'(k_0).
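The following numeric sketch (assuming NumPy; all parameters are arbitrary illustrative choices) shows this for the equation u_t + u_xxx = 0 of Example 2, where ω(k) = -k^3: the envelope of a packet centered at wave number k_0 moves at ω'(k_0) = -3k_0^2, not at the central phase velocity -k_0^2.

```python
import numpy as np

N, L = 4096, 400.0
x = np.linspace(-L/2, L/2, N, endpoint=False)
k = 2*np.pi*np.fft.fftfreq(N, d=L/N)

k0, eps = 2.0, 0.1
u0 = np.exp(-(eps*x)**2/4.0)*np.exp(1j*k0*x)  # narrow-band packet centered at x = 0
t = 2.0
omega = -k**3                                  # dispersion relation for u_t + u_xxx = 0
u = np.fft.ifft(np.fft.fft(u0)*np.exp(-1j*omega*t))

peak = x[np.argmax(np.abs(u))]                 # center of the envelope at time t
group = -3*k0**2*t                             # omega'(k0) t: group-velocity prediction
phase = -k0**2*t                               # omega(k0)/k0 t: the (wrong) phase-velocity guess
```

The measured envelope center lands at the group-velocity prediction and far from the phase-velocity one.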
1.4 Exercise 6. Consider the solution u(x, t) to a linear wave equation that is the superposition of two plane wave solutions, the first with wave number k_0 and the second with wave number k_0 + Δk, which is close to k_0. Let Δω = ω(k_0 + Δk) - ω(k_0). Show that u(x, t) is (exactly!) the first plane wave solution amplitude modulated by a travelling wave of profile f(x) = 1 + e^{iΔk x} and velocity Δω/Δk. (So that in this case there is no real dispersion.)
1.4.5 Remark. In many important physical applications (e.g., light travelling in a transparent medium such as an optical fiber) ω'' < 0, i.e., the dispersion curve is convex upwards, so that the phase velocity exceeds the group velocity, and high frequency plane waves are slower than low frequency plane waves. Thus, wavelets enter the envelope of a group from the left, first grow and then diminish in amplitude as they pass through the group, and exit to the right. This is called normal dispersion, the opposite case ω'' > 0 being referred to as anomalous dispersion.
1.4 Example 3. De Broglie Waves.
Schrödinger's Equation for a particle in one dimension, ψ_t = (iℏ/2m)ψ_xx + (1/iℏ)uψ, provides an excellent model for comparing phase and group velocity. Here h = 6.626 × 10^{-34} Joule-seconds is Planck's quantum of action, ℏ = h/2π, and u is the potential function, i.e., -u'(x) gives the force acting on the particle when its location is x. We will only consider the case of a free particle, i.e., one not acted on by any force, so we take u = 0, and Schrödinger's Equation reduces to ψ_t + P(∂/∂x)ψ = 0, where P(ξ) = (ℏ/i)(ξ^2/2m). The dispersion relation therefore gives v_φ(k) = ω(k)/k = P(ik)/(ik) = ℏk/2m as the phase velocity of a plane wave solution of wave number k (a so-called de Broglie wave), and thus the group velocity is v_g(k) = ω'(k) = ℏk/m. Now the classical velocity of a particle of momentum p is p/m, and this implies the relation p = ℏk between momentum and wave number. Since the wave-length λ is related to the wave number by λ = 2π/k, this gives the formula λ = h/p for the so-called de Broglie wave-length of a particle of momentum p. (This was the original de Broglie hypothesis, associating a wave-length to a particle.) Note that the energy E of a particle of momentum p is p^2/2m, so E(k) = (ℏk)^2/2m = ℏω(k), the classic quantum mechanics formula relating energy and frequency.
For this wave equation it is easy and interesting to find explicitly the evolution of a Gaussian wave-packet that is initially centered at x_0 and has wave number centered at k_0; in fact, this is given as an exercise in almost every first text on quantum mechanics. For the Fourier Transform of the initial wave function ψ_0, we take ψ̂_0(k) = G(k - k_0, σ_p), where

G(k, σ) = (1/((2π)^{1/4} σ^{1/2})) exp(-k^2/(4σ^2))

is the L^2 normalized Gaussian centered at the origin and having width σ. Then, as we saw above, ψ(x, t), the wave function at time t, has Fourier Transform ψ̂(k, t) given by ψ̂_0(k)e^{-P(ik)t}, and ψ(x, t) = ∫ ψ̂(k, t)e^{ikx} dk. Using the fact that the Fourier Transform of a Gaussian is another Gaussian, we find easily that ψ(x, t) = A(x, t)e^{iφ(x,t)}, where the amplitude A is given by A(x, t) = G(x - v_g t, σ_x(t)). Here, as above, v_g = v_g(k_0) = ℏk_0/m is the group velocity, and the spatial width σ_x(t) is given by σ_x(t) = (ℏ/2σ_p) √(1 + 4σ_p^4 t^2/(ℏ^2 m^2)). We recall that the square of the amplitude A(x, t) is just the probability density at time t of finding the particle at x. Thus, we see that this is a Gaussian whose mean (which is the expected position of the particle) moves with the velocity of the classical particle. Note that we have a completely explicit formula for the width σ_x(t) of the wave packet as a function of time, so the broadening effect of dispersion is apparent. Also note that Heisenberg's Uncertainty Principle, σ_x(t)σ_p ≥ ℏ/2, is actually an equality at time zero, and it is the broadening of dispersion that makes it a strict inequality at later times.
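The formula for σ_x(t) is easy to verify numerically. The following sketch (assuming NumPy, and working in units with ℏ = m = 1, so that the distinction between wave-number width and momentum width disappears; the particular parameters are arbitrary) evolves a Gaussian packet spectrally and measures its mean and width:

```python
import numpy as np

hbar = m = 1.0
N, L = 4096, 400.0
x = np.linspace(-L/2, L/2, N, endpoint=False)
k = 2*np.pi*np.fft.fftfreq(N, d=L/N)
k0, sp = 1.0, 0.5                 # central wave number and momentum width (hbar = 1)

# Gaussian packet with sigma_x(0) = hbar/(2 sp), modulated at wave number k0.
psi0 = np.exp(-(sp*x)**2)*np.exp(1j*k0*x)
t = 3.0
# Free evolution: psi-hat(k, t) = psi-hat_0(k) e^{-i hbar k^2 t / 2m}.
psi = np.fft.ifft(np.fft.fft(psi0)*np.exp(-1j*hbar*k**2*t/(2*m)))

prob = np.abs(psi)**2
prob /= np.trapz(prob, x)                        # normalized probability density
mean = np.trapz(x*prob, x)                       # expected position
sigma_x = np.sqrt(np.trapz((x - mean)**2*prob, x))

v_g = hbar*k0/m                                  # group velocity
width = (hbar/(2*sp))*np.sqrt(1 + 4*sp**4*t**2/(hbar**2*m**2))
```

The measured mean moves at the classical velocity v_g, and the measured width matches the explicit formula.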
1.4.6 Remark. For a non-free particle (i.e., when the potential u is not a constant function) the Schrödinger Equation, ψ_t = (iℏ/2m)ψ_xx + (1/iℏ)uψ, no longer has coefficients that are constant in x, so we don't expect solutions that are exponential in both x and t (i.e., plane waves or de Broglie waves). But the equation is still linear, and it is still invariant under time translations, so we do expect to be able to expand the general solution into a superposition of functions of the form ψ_E(x, t) = φ(x)e^{-i(E/ℏ)t}. (We have adopted the physics convention, replacing the frequency ω by E/ℏ, where E is the energy associated to that frequency.) If we substitute this into the Schrödinger equation, then we see that the energy eigenfunction φ (or stationary wave function) must satisfy the so-called time-independent Schrödinger Equation, (-(ℏ^2/2m) d^2/dx^2 + u)φ = Eφ. Note that this is just a second-order linear ODE, so for each choice of E it will have a two-dimensional linear space of solutions. This linear equation will show up with a strange twist when we solve the nonlinear KdV equation, u_t - 6uu_x + u_xxx = 0, by the remarkable Inverse Scattering Method. Namely, we will see that if the one-parameter family of potentials u(t)(x) = u(x, t) evolves so as to satisfy the KdV equation, then the corresponding family of Schrödinger operators, -(ℏ^2/2m) d^2/dx^2 + u(t), are all unitarily equivalent, a fact that will play a key rôle in the Inverse Scattering Method. (Note that the time, t, in the time-dependent Schrödinger Equation is not related in any way to the t in the KdV equation.)
1.4.7 Remark. We next explain how to generalize the Fourier methods above for solving linear PDE to the case n > 1. For simplicity, we will only consider the case of scalar equations, i.e., we will assume that u is a complex-valued function (rather than one taking values in some complex vector space V), but the more general vector-valued case can be handled similarly (see the exercise below). As we saw earlier, the analog of plane waves in more space dimensions are travelling waves of the form u_{ξ,ω}(x, t) = e^{i(ξ·x - ωt)}, where ξ ∈ R^n. Now ξ/|ξ| ∈ S^{n-1} is the direction of the plane wave motion, the wave number is |ξ|, and the speed, c, is related to the angular frequency ω (which must be real) by c = ω/|ξ|.

Suppose we have a constant coefficient linear wave equation, u_t + P(D)u = 0. Here P(X) = Σ_{|α|≤r} A_α X^α is a complex polynomial in X = (X_1, ..., X_n), and we are using standard multi-index notation. Thus, α denotes an n-tuple of non-negative integers, |α| = α_1 + ... + α_n, X^α = X_1^{α_1} ... X_n^{α_n}, D = (∂/∂x_1, ..., ∂/∂x_n), and D^α = ∂^{|α|}/(∂x_1^{α_1} ... ∂x_n^{α_n}). Note that D^α u_{ξ,ω} = (iξ)^α u_{ξ,ω}, and hence P(D)u_{ξ,ω} = P(iξ)u_{ξ,ω}, where P(iξ) is the so-called total symbol of P(D), i.e., Σ_{|α|≤r} i^{|α|} A_α ξ^α. On the other hand, (∂/∂t)u_{ξ,ω} = -iω u_{ξ,ω}, so u_{ξ,ω} is a solution of u_t + P(D)u = 0 if and only if ω = ω(ξ) = (1/i)P(iξ), and this is now the dispersion relation. Since ω must be real, if we want to have a plane wave solution for each ξ, the condition as before is that P(iξ) must be pure imaginary for all ξ ∈ R^n. This clearly is the case if and only if A_α is real for |α| odd and imaginary for |α| even, and this is also equivalent to requiring that P(D) be a skew-adjoint operator on L^2(R^n, C). If u_0(x) = ∫ û_0(ξ)e^{iξ·x} dξ in L^2(R^n) is given, then u(x, t) = ∫ û_0(ξ)u_{ξ,ω(ξ)}(x, t) dξ solves the given wave equation with u(x, 0) = u_0(x). As in the case n = 1, the transformation U(t) mapping u_0 to u(t) = u(·, t) defines a one-parameter group of unitary transformations acting on L^2(R^n, C) with -P(D) as its infinitesimal generator, i.e., U(t) = exp(-tP(D)).
1.4 Exercise 7. Analyze the vector-valued wave equation u_t + P(D)u = 0, with u now a C^d-valued function. Again, P(X) = Σ_{|α|≤r} A_α X^α, the coefficients A_α now lying in the space of linear operators on C^d (or d×d complex matrices). Show that for P(D) to be a skew-adjoint operator on L^2(R^n, C^d), generating a one-parameter group of unitary operators U(t), the total symbol P(iξ) = Σ_{|α|≤r} i^{|α|} A_α ξ^α must be a skew-adjoint operator on C^d for all ξ in R^n, and that this in turn means that A_α is self-adjoint for |α| odd and skew-adjoint for |α| even. Check that this is equivalent to the CPT-invariance of u_t + P(D)u = 0. Write an explicit formula for U(t)u_0 using vector-valued Fourier transforms.
1.4 Example 4. Finally, we take a quick look at the classic linear wave equation, u_tt - c^2 Δu = 0, in more spatial dimensions. If we substitute the plane wave Ansatz u(x, t) = u_{ξ,ω}(x, t) = e^{i(ξ·x - ωt)} into this equation, we find that (ω^2 - c^2 |ξ|^2)u = 0, so ω(ξ) = ±c|ξ|, or ω(ξ)/|ξ| = ±c. Thus, all the plane wave solutions travel at the same speed, c, but now they can travel with this speed in infinitely many different directions ξ/|ξ|, instead of just the two possible directions (right and left) when n = 1. If we now take a Gaussian initial condition u(x, 0) (and say assume that u_t(x, 0) = 0) and analyze it into its Fourier components, we see that because the various components all move with speed c but in different directions, the original Gaussian wave packet will spread out and become dispersed.

Both a plucked piano string and the waves from a pebble dropped in a pond satisfy the classic wave equation. But in the first case we observe two travelling waves race off in opposite directions, maintaining their original shape as they go, while in the second we see a circular wave pattern moving out from the central source, gradually losing amplitude as the energy spreads out over a larger and larger circle.
1.4 Exercise 8. Show that u_tt - c^2 Δu = 0 is Lorentz invariant, conformally invariant, and CPT-invariant.
1.5 Conservation Laws
Let u(x, t) denote the density of some physical quantity at the point x and time t, and φ(x, t) its flux, i.e., φ(x, t) · n dA is the rate of flow of the quantity across a surface element dA with unit normal n. Finally, let g(x, t) denote the rate at which the quantity is being created at x at time t. Then, essentially by the meanings of these definitions, given any region V with smooth boundary ∂V,

d/dt ∫_V u(x, t) dV = ∫_V g(x, t) dV - ∫_{∂V} φ(x, t) · n dA,

which is the general form of a conservation law in integrated form. (Note that in the one space dimensional case this becomes d/dt ∫_a^b u(x, t) dx = ∫_a^b g(x, t) dx - [φ(b, t) - φ(a, t)].) If u is C^1, then by Gauss's Theorem,

∫_V (∂u(x, t)/∂t + div φ(x, t)) dV = ∫_V g(x, t) dV,

so, dividing by the volume of V and letting V shrink down on a point x, we get the corresponding differential form of the conservation law,

∂u(x, t)/∂t + div φ(x, t) = g(x, t),

or, in one space dimension, u_t + φ_x = g. We will be mainly concerned with the case g = 0, and we note that in this case it follows that if φ vanishes at infinity, then ∫_{-∞}^{∞} u(x, t) dx is independent of t (which explains why this is called a conservation law).
As it stands, this is a single equation for two unknown functions u and φ, and so it is underdetermined. Usually, however, we have some so-called constitutive equation expressing φ in terms of u. In the most general case, φ(x, t) will be a function not only of u(x, t) but also of certain partial derivatives of u with respect to x; however, we will only consider the case of constitutive equations of the form φ(x, t) = F(u(x, t)), where F is a smooth function on R whose derivative F' will be denoted by f. Thus our conservation law finally takes the form:

(CL)  u_t + f(u)u_x = 0.

We will usually assume that f'(u) ≥ 0, so that f is a non-decreasing function. This is satisfied in most of the important applications.
1.5 Example 1. Take F(u) = cu, so f(u) = c, and we get once again the Linear Advection Equation u_t + cu_x = 0. The Method of Characteristics below will give yet another proof that the solution to the Cauchy Problem is u(x, t) = u_0(x - ct).
1.5 Example 2. Take F(u) = u^2/2, so f(u) = u, and we get the important Inviscid Burgers Equation, u_t + uu_x = 0.
We will next explain how to solve the Cauchy Problem for such a Conservation Law using the so-called Method of Characteristics. We look for smooth curves (x(s), t(s)) in the (x, t)-plane along which the solution to the Cauchy Problem is constant. Suppose that (x(s_0), t(s_0)) = (x_0, 0), so that the constant value of u(x, t) along this so-called characteristic curve is u_0(x_0). Then 0 = (d/ds) u(x(s), t(s)) = u_x x' + u_t t', and hence

dx/dt = x'(s)/t'(s) = -u_t/u_x = f(u(x(s), t(s))) = f(u_0(x_0)),

so the characteristic curve is a straight line, i.e., u has the constant value u_0(x_0) along the line ℓ_{x_0}: x = x_0 + f(u_0(x_0))t. Note the following geometric interpretation of this last result: to find the wave profile at time t (i.e., the graph of the map x ↦ u(x, t)), translate each point (x_0, u_0(x_0)) of the initial profile to the right by the amount f(u_0(x_0))t. (This is what we promised to show in Example 1.3.4.) The analytic statement of this geometric fact is that the solution u(x, t) to our Cauchy Problem must satisfy the implicit equation u(x, t) = u_0(x - tf(u(x, t))). Of course the above is heuristic (how do we know that a solution exists?), but it isn't hard to work backwards and make the argument rigorous.
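The translation recipe is a one-line algorithm. The sketch below (assuming NumPy; the Burgers choice f(u) = u and the Gaussian initial profile are arbitrary illustrations) evolves the profile as a parametric curve and confirms that the result satisfies the implicit equation:

```python
import numpy as np

# Geometric evolution for u_t + f(u) u_x = 0: each point (x0, u0(x0)) of the
# initial profile moves right with constant speed f(u0(x0)).
f = lambda u: u                  # inviscid Burgers: f(u) = u
u0 = lambda x: np.exp(-x**2)
x0 = np.linspace(-5, 5, 1001)
t = 0.5                          # well below the breaking time (about 1.17 here)

xs = x0 + f(u0(x0))*t            # translated abscissas
us = u0(x0)                      # ordinates are unchanged

# The profile (xs, us) must satisfy the implicit equation u = u0(x - t f(u)).
residual = np.max(np.abs(us - u0(xs - t*f(us))))
```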
The idea is to first define characteristic coordinates (ξ, τ) in a suitable strip 0 ≤ t < T_B of the (x, t)-plane. We define τ(x, t) = t and ξ(x, t) = x_0 along the characteristic ℓ_{x_0}, so t(ξ, τ) = τ and x(ξ, τ) = ξ + τf(u_0(ξ)). But of course, for this to make sense, we must show that there is a unique ℓ_{x_0} passing through each point (x, t) in the strip t < T_B.
The easiest case is f' ≡ 0, say f ≡ c, giving the Linear Advection Equation, u_t + cu_x = 0. In this case, all characteristics have the same slope, 1/c, so that no two characteristics intersect; there is clearly exactly one characteristic through each point, and we can define T_B = ∞.
From now on we will assume that the equation is truly nonlinear, in the sense that f'(u) > d > 0, so that f is a strictly increasing function.
If u_0' is everywhere positive, then u_0(x) is strictly increasing, and hence so is f(u_0(x)). In this case we can again take T_B = ∞. For, since the slope of the characteristic ℓ_{x_0} issuing from (x_0, 0) is 1/f(u_0(x_0)), it follows that if x_0 < x_1 then ℓ_{x_1} has smaller slope than ℓ_{x_0}, and hence these two characteristics cannot intersect for t > 0, so again every point (x, t) in the upper half-plane lies on at most one characteristic ℓ_{x_0}.
Finally, the interesting general case: suppose u_0' is somewhere negative. In this case we define T_B to be the infimum of [-u_0'(x)f'(u_0(x))]^{-1}, where the inf is taken over all x with u_0'(x) < 0. For reasons that will appear shortly, we call T_B the breaking time. As we shall see, T_B is the largest T for which the Cauchy Problem for (CL) has a solution with u(x, 0) = u_0(x) in the strip 0 ≤ t < T of the (x, t)-plane. It is easy to construct examples for which T_B = 0; this will happen if and only if there exists a sequence x_n with u_0'(x_n) → -∞. In the following we will assume that T_B is positive, and that in fact there is a point x_0 where T_B = -1/(u_0'(x_0)f'(u_0(x_0))). In this case, we will call ℓ_{x_0} a breaking characteristic.
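For a concrete initial profile the infimum defining T_B is easy to estimate on a grid. A sketch (assuming NumPy; the Burgers flux and the Gaussian profile are arbitrary choices that admit a closed-form answer to compare against):

```python
import numpy as np

# Numerical estimate of the breaking time T_B = inf 1/(-u0'(x) f'(u0(x)))
# over the x where u0'(x) < 0, here for Burgers (f(u) = u, so f'(u) = 1)
# with the initial profile u0(x) = exp(-x^2).
x = np.linspace(-10, 10, 200001)
u0p = -2*x*np.exp(-x**2)            # u0'(x), computed exactly

neg = u0p < 0
T_B = np.min(1.0/(-u0p[neg]))       # f'(u0(x)) = 1 for Burgers

# Closed form: u0' is minimized at x = 1/sqrt(2), giving T_B = sqrt(e/2).
T_exact = np.sqrt(np.e/2)
```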
Now choose any point x_0 where u_0'(x_0) is negative. For x_1 slightly greater than x_0, the slope of ℓ_{x_1} will be greater than the slope of ℓ_{x_0}, and it follows that these two characteristics will meet at the point (x, t) where x_1 + f(u_0(x_1))t = x = x_0 + f(u_0(x_0))t, namely when

t = -(x_1 - x_0)/(f(u_0(x_1)) - f(u_0(x_0))).
1.5 Exercise 1. Show that T_B is the least t for which any two characteristics intersect at some point (x, t) with t ≥ 0.
1.5 Exercise 2. Show that there is always at least one characteristic curve passing through any point (x, t) in the strip 0 ≤ t < T_B (and give a counterexample if u_0' is not required to be continuous).
Thus the characteristic coordinates (ξ, τ) are well-defined in the strip 0 ≤ t < T_B of the (x, t)-plane. Note that since x = ξ + τf(u_0(ξ)), we have ∂x/∂ξ = 1 + τf'(u_0(ξ))u_0'(ξ) and ∂x/∂τ = f(u_0(ξ)), while ∂t/∂ξ = 0 and ∂t/∂τ = 1. It follows that the Jacobian of (x, t) with respect to (ξ, τ) is ∂x/∂ξ = 1 + τf'(u_0(ξ))u_0'(ξ), which is positive in 0 ≤ t < T_B, so that (ξ, τ) are smooth coordinates in this strip. On the other hand, if ℓ_{x_0} is a breaking characteristic, then the Jacobian approaches zero along ℓ_{x_0} as t approaches T_B, confirming that the characteristic coordinates cannot be extended to any larger strip.
By our heuristics above, we know that the solution of the Cauchy Problem for (CL) with initial value u_0 should be given in characteristic coordinates by the explicit formula u(ξ, τ) = u_0(ξ), and so we define a smooth function u in 0 ≤ t < T_B by this formula. Since the map from (x, t) to (ξ, τ) is a diffeomorphism, this also defines u as a smooth function of x and t, but it will be simpler to do most calculations in characteristic coordinates. In any case, since a point (x, t) on the characteristic ℓ_ξ satisfies x = ξ + f(u_0(ξ))t, we see that u = u(x, t) is the solution of the implicit equation u = u_0(x - tf(u)). It is obvious that u(x, 0) = u_0(x), and we shall see next that u(x, t) satisfies (CL).
1.5 Exercise 3. Use the chain-rule, u_x = u_ξ ξ_x and u_t = u_ξ ξ_t, to compute the partial derivatives u_x and u_t as functions of ξ and τ:

u_t(ξ, τ) = -u_0'(ξ)f(u_0(ξ)) / (1 + τ u_0'(ξ)f'(u_0(ξ)))  and  u_x(ξ, τ) = u_0'(ξ) / (1 + τ u_0'(ξ)f'(u_0(ξ))),

and deduce from this that u actually satisfies the equation (CL) in 0 ≤ t < T_B.
1.5 Exercise 4. Show that, along a breaking characteristic ℓ_{x_0}, the value of u_x at the point x = x_0 + f(u_0(x_0))t is given by u_0'(x_0)T_B/(T_B - t). (Note that this is just the slope of the wave profile at time t over the point x.)
We can now get a qualitative but very precise picture of how u develops a singularity as t approaches the breaking time T_B, a process usually referred to as shock formation, or steepening and breaking of the wave profile.

Namely, let ℓ_{x_0} be a breaking characteristic, and consider an interval I around x_0 where u_0 is decreasing. Let's follow the evolution of that part of the wave profile that is originally over I. Recall our algorithm for evolving the wave profile: each point (x, u_0(x)) of the initial profile moves to the right with constant velocity f(u_0(x)), so at time t it is at (x + f(u_0(x))t, u_0(x)). Thus, the higher part of the wave profile, to the left, will move faster than the lower part to the right, so the profile will bunch up and become steeper, until it eventually becomes vertical or "breaks" at time T_B, when the slope of the profile actually becomes infinite over the point x_0 + f(u_0(x_0))T_B. (In fact, the above exercise shows that the slope goes to -∞ like a constant times 1/(t - T_B).) Note that the values of u remain bounded as t approaches T_B. In fact, it is clearly possible to continue the wave profile past t = T_B using the same algorithm. However, for t > T_B there will be values x* where the vertical line x = x* meets the wave profile at time t in two distinct points (corresponding to two characteristics intersecting at the point (x*, t)), so the profile is no longer the graph of a single-valued function.
1.5.1 Remark. Despite the fact that u_x blows up along a breaking characteristic as t → T_B, surprisingly the total variation of the function x ↦ u(x, t) not only doesn't blow up as t approaches T_B, it is actually a constant of the motion, i.e., ∫ |u_x(x, t)| dx = ∫ |u_0'(ξ)| dξ. To see this, note that ∂x/∂ξ = 1 + τf'(u_0(ξ))u_0'(ξ) is clearly positive for t < T_B, so that |u_x(x, t)| dx = |u_x(ξ, τ)| (∂x/∂ξ) dξ = |u_x(ξ, τ)| (1 + τf'(u_0(ξ))u_0'(ξ)) dξ, and use the above formula for u_x(ξ, τ). Thus, although |u_x| gets very large as t approaches T_B, it is only large on a set of small measure.
For certain purposes it is interesting to know how the higher derivatives u_xx, u_xxx, ... behave as t approaches T_B along a breaking characteristic (in particular, in the next section we will want to compare u_xxx with uu_x). These higher partial derivatives can be estimated in terms of powers of u_x using ∂/∂x = (∂x/∂ξ)^{-1} ∂/∂ξ and ∂x/∂ξ = 1 + τf'(u_0(ξ))u_0'(ξ).
1.5 Exercise 5. Show that along a breaking characteristic ℓ_{x_0}, as t → T_B, u_xx = O(u_x^3) = O((t - T_B)^{-3}). Similarly, show that u_xxx = O(u_x^5) = O((t - T_B)^{-5}).
For more details on the formation of singularities in conservation laws and other PDEs, see [J] and [La2]. Below we will consider what happens after the breaking time T_B. Although we can no longer have a smooth solution, it turns out that there may still be physically meaningful solutions of the integrated form of the conservation law. But first we consider an interesting example.
1.5 Example 3. Traffic Flow on a Highway.
We imagine an ideal, straight, infinitely long highway, modeled by the real line. To simplify the analysis, we assume that there are no entrance or exit ramps, and we smooth out the discrete nature of the cars, modeling the traffic density and flux by approximating continuous statistical averages. We choose an arbitrary origin, and we let u(x, t) denote the density of cars at the point x at time t (in units of cars per kilometer), and we let φ(x, t) denote the flux of cars, i.e., the rate at which cars are passing the point x at time t (in units of cars per second). We will also want to consider the speed of the traffic at x at time t, which we will denote by v(x, t) and measure in kilometers per second. We have the obvious relation φ = vu. If we choose any two points a and b along the highway, then clearly we have the conservation law in integrated form, d/dt ∫_a^b u(x, t) dx + φ(b, t) - φ(a, t) = 0; i.e., the rate of change of the total number of cars between a and b, plus the rate at which cars are leaving at b, minus the rate at which they are entering at a, must be zero (since no cars enter or leave between a and b). Assuming u is smooth, and letting a and b approach x, we get the differential form of the conservation law, u_t + φ_x = 0.

To proceed further we will need a constitutive relation, relating u and φ. It is natural to try to model this using the intuitively observed law of traffic flow: the denser the traffic, the slower drivers will travel. For simplicity, we assume that there is a maximum velocity v_max (the speed limit) and a maximum density u_max, and that the speed at which drivers travel is v_max on an empty road (i.e., when u = 0), 0 when traffic is bumper to bumper (i.e., when u = u_max), and linear in between. This leads to the relation v(u) = v_max(1 - u/u_max), and then using φ = vu we derive the constitutive relation φ(u) = F(u) = uv(u). The conservation law then takes the form u_t + F'(u)u_x = u_t + (v(u) + uv'(u))u_x = 0, or u_t + v_max(1 - 2u/u_max)u_x = 0.

Of course, traffic engineers use much more realistic models that take into account on and off ramps, and use more sophisticated constitutive relations, but already with this model one can see interesting phenomena, such as the development of a shock wave as oncoming traffic meets traffic stopped at a red light. We illustrate this below, after first introducing the simplest kind of non-smooth solutions of conservation laws.
1.5.2 Shock Wave Solutions and the Rankine-Hugoniot Jump Condition.
Let x_s(t) denote a smooth curve C in the closed upper-half (x, t)-plane, defined for all t ≥ 0, and so dividing the upper-half plane into two open regions, R_- to the left and R_+ to the right. Let u(x, t) be a smooth solution of the conservation law u_t + φ_x = 0 in the union of these two regions. We assume that the restrictions u|R_- and u|R_+ each extend continuously to the boundary curve C, although these two extensions do not necessarily agree. Given a point p = (x_s(t), t) on C, the difference between the limit u(x_s^+, t) of u at p from the right and the limit u(x_s^-, t) from the left defines the jump [u](x_s, t) = u(x_s^+, t) - u(x_s^-, t) across C at this point. Since φ is given by a constitutive equation φ(x, t) = F(u(x, t)), we also have a corresponding jump [φ](x_s, t) = φ(x_s^+, t) - φ(x_s^-, t) in φ as we cross C. We will call such a piecewise smooth solution u of the conservation law a shock wave solution with shock path C if, in addition to satisfying the equation in each of R_- and R_+, it also satisfies the integrated form of the conservation law, i.e., for all a < b, d/dt ∫_a^b u(x, t) dx + φ(b, t) - φ(a, t) = 0. By choosing a < x_s(t) < b, breaking the above integral into a sum corresponding to the sub-intervals [a, x_s(t)] and [x_s(t), b], and then letting a and b approach x_s(t), we can easily derive the following:

Rankine-Hugoniot Jump Condition. Let u be a shock wave solution of the conservation law u_t + φ_x = 0 with shock path C given by (x_s(t), t). Then x_s(t) satisfies the following ordinary differential equation, known as the Rankine-Hugoniot Jump Condition:

dx_s(t)/dt = [φ](x_s, t)/[u](x_s, t).
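For piecewise-constant states the jump condition is a one-line computation; e.g., for the Burgers flux F(u) = u^2/2 with left state 1 and right state 0 (the data of the next exercise), the shock moves at speed 1/2:

```python
# Rankine-Hugoniot: dx_s/dt = [phi]/[u] with phi = F(u).
F = lambda u: 0.5*u**2          # Burgers flux
u_left, u_right = 1.0, 0.0
shock_speed = (F(u_right) - F(u_left))/(u_right - u_left)   # = 1/2, so x_s(t) = t/2
```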
1.5 Exercise 6. A Shock Wave Solution of the Inviscid Burgers Equation. Let's try to solve the Inviscid Burgers Equation, u_t + uu_x = 0, with the initial condition u_0(x) = 1 for x < 0 and u_0(x) = 0 for x ≥ 0. It is easy to see that there are pairs of characteristics that meet after an arbitrarily short time, so T_B = 0, and this Cauchy Problem can have no smooth solution. Show that u(x, t) = 1 for x < t/2 and u(x, t) = 0 for x > t/2 is a shock wave solution with shock path x_s(t) = t/2.
1.5 Exercise 7. A Shock Wave at a Red Light. Consider highway traffic that is backing up as it runs into a red light. Assume that the oncoming traffic has a constant density u_1, and at time t = 0 it runs into the stopped traffic, which has density u_max beginning at x = 0 and extending to the right. Show that the shock curve is given by x_s(t) = -v_max(u_1/u_max)t, and that the density is u_1 to the left of the shock and u_max to the right. In other words, traffic is backing up at the speed v_max(u_1/u_max).
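The backing-up speed follows at once from the jump condition with the traffic flux. A sketch (the numerical values of v_max, u_max, and u_1 are hypothetical):

```python
v_max, u_max = 0.03, 100.0              # hypothetical speed limit (km/s) and jam density (cars/km)
F = lambda u: u*v_max*(1.0 - u/u_max)   # traffic flux phi = u v(u)
u1 = 40.0                               # oncoming density, left of the shock

shock_speed = (F(u_max) - F(u1))/(u_max - u1)   # Rankine-Hugoniot
expected = -v_max*u1/u_max                      # the formula from the exercise
```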
1.6 Split-Stepping
We now return to the KdV equation, say in the form u_t = -uu_x - u_xxx. If we drop the nonlinear term, we are left with the dispersive wave equation u_t = -u_xxx that we considered in the section on linear wave equations. Recall that we can solve its Cauchy Problem, either by using the Fourier Transform or by convolution with an explicit fundamental solution that we wrote in terms of the Airy function.
On the other hand, if we drop the linear term, we are left with the inviscid Burgers Equation, u_t = -uu_x, which as we know exhibits steepening and breaking of the wave profile, causing a shock singularity to develop in finite time T_B for any non-trivial initial condition u_0 that vanishes at infinity. Up to this breaking time, T_B, we can again solve the Cauchy Problem, either by the method of characteristics, or by solving the implicit equation u = u_0(x - ut) for u as a function of x and t.
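Before the breaking time the implicit equation u = u_0(x - ut) can be solved numerically by fixed-point iteration, since for t small enough the map u ↦ u_0(x - ut) is a contraction. A minimal sketch (the Gaussian initial profile is just a sample choice):

```python
import math

def u0(x):                       # sample initial profile; any smooth choice works
    return math.exp(-x * x)

def burgers(x, t, tol=1e-13):
    """Solve u = u0(x - u t) by fixed-point iteration (valid for t < T_B)."""
    u = u0(x)
    for _ in range(500):
        u_next = u0(x - u * t)
        if abs(u_next - u) < tol:
            return u_next
        u = u_next
    raise RuntimeError("no convergence -- t is probably past the breaking time")

u = burgers(0.5, 0.3)
print(abs(u - u0(0.5 - u * 0.3)))   # essentially zero: u solves the implicit equation
```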
Now, in [BS] it is proved that KdV defines a global flow on the Sobolev space H^4(R) of functions u : R → R having derivatives of order up to four in L^2, so it is clear that dispersion from the linear u_xxx term must be counteracting the peaking from the nonlinear uu_x term, preventing the development of a shock singularity.
In order to understand this balancing act better, it would be useful to have a method for taking the two flows defined by u_t = -u_xxx and u_t = -uu_x and combining them to define the flow for the full KdV equation. (In addition, this would give us a method for solving the KdV Cauchy Problem numerically.)
In fact there is a very general technique that applies in such situations. In the pure mathematics community it is usually referred to as the Trotter Product Formula, while in the applied mathematics and numerical analysis communities it is called split-stepping. Let me state it in the context of ordinary differential equations. Suppose that Y and Z are two smooth vector fields on R^n, and we know how to solve each of the differential equations dx/dt = Y(x) and dx/dt = Z(x), meaning that we know both of the flows φ_t and ψ_t on R^n generated by Y and Z respectively. The Trotter Product Formula is a method for constructing the flow θ_t generated by Y + Z out of φ and ψ; namely, letting Δt = t/n,

    θ_t = lim_{n→∞} (ψ_Δt ∘ φ_Δt)^n.

The intuition behind the formula is simple. Think of approximating the solution of dx/dt = Y(x) + Z(x) by Euler's Method. If we are currently at a point p_0, to propagate one more time step Δt we go to the point p_0 + Δt (Y(p_0) + Z(p_0)). Using the split-step approach on the other hand, we first take an Euler step in the Y(p_0) direction, going to p_1 = p_0 + Δt Y(p_0), then take a second Euler step, but now from p_1 and in the Z(p_1) direction, going to p_2 = p_1 + Δt Z(p_1). If Y and Z are constant vector fields, then this gives exactly the same final result as the simple full Euler step with Y + Z, while for continuous Y and Z and small time step Δt it is a good enough approximation that the above limit is valid. The situation is more delicate for flows on infinite dimensional manifolds; nevertheless, it was shown by F. Tappert in [Ta] that the Cauchy Problem for KdV can be solved numerically by using split-stepping to combine methods for u_t = -uu_x and u_t = -u_xxx.
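For linear vector fields Y(x) = Ax and Z(x) = Cx the flows are matrix exponentials, and the Trotter limit can be watched numerically. A small sketch (the matrices are arbitrary non-commuting examples, and expm is a naive Taylor-series implementation, adequate only for small norms):

```python
import numpy as np

def expm(M, terms=40):
    """Matrix exponential by Taylor series (fine for these small norms)."""
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

# Y(x) = A x and Z(x) = C x with A and C non-commuting.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
C = np.array([[0.0, 0.0], [1.0, 0.0]])
t = 1.0

theta = expm(t * (A + C))                         # flow of Y + Z at time t
for n in (1, 10, 100, 1000):
    step = expm((t / n) * C) @ expm((t / n) * A)  # psi_{t/n} after phi_{t/n}
    err = np.abs(np.linalg.matrix_power(step, n) - theta).max()
    print(n, err)                                 # the error shrinks like 1/n
```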
[Tappert actually uses an interesting variant, known as Strang splitting, which was first suggested in [St] to solve multi-dimensional hyperbolic problems by split-stepping one-dimensional problems. One advantage of splitting in numerical analysis comes from the greatly reduced effort required to solve the smaller bandwidth linear systems that arise when implicit schemes are necessary to maintain stability, but in addition, Strang demonstrated that second-order accuracy of the component methods need not be compromised by the asymmetry of the splitting, as long as the pattern φ_{Δt/2} ψ_{Δt/2} ψ_{Δt/2} φ_{Δt/2} is used, to account for possible non-commutativity of Y and Z. (This may be seen by multiplying the respective exponential series.) Serendipitously, when output is not required, several steps of Strang splitting require only marginal additional effort:

    (φ_{Δt/2} ψ_{Δt/2} ψ_{Δt/2} φ_{Δt/2})^n = φ_{Δt/2} ψ_Δt (φ_Δt ψ_Δt)^{n-1} φ_{Δt/2}.]
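Strang's accuracy claim is easy to check numerically for linear flows, where each flow is a matrix exponential. In the sketch below (arbitrary non-commuting sample matrices, naive series exponential), the plain split-step error halves when the step is halved, while the Strang error quarters:

```python
import numpy as np

def expm(M, terms=40):
    """Matrix exponential by Taylor series (fine for these small norms)."""
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

A = np.array([[0.0, 1.0], [0.3, 0.0]])    # flow phi of Y(x) = A x
C = np.array([[0.0, 0.0], [1.0, -0.2]])   # flow psi of Z(x) = C x
t = 1.0
exact = expm(t * (A + C))

def lie(n):                               # (psi_h phi_h)^n, h = t/n: first order
    h = t / n
    return np.linalg.matrix_power(expm(h * C) @ expm(h * A), n)

def strang(n):                            # phi_{h/2} psi_h (phi_h psi_h)^{n-1} phi_{h/2}
    h = t / n
    half = expm((h / 2) * A)
    core = np.linalg.matrix_power(expm(h * A) @ expm(h * C), n - 1)
    return half @ expm(h * C) @ core @ half

for n in (8, 16, 32):
    print(n, np.abs(lie(n) - exact).max(), np.abs(strang(n) - exact).max())
```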
Aside from such numerical considerations, split-stepping suggests a way to understand the mechanism by which dispersion from u_xxx balances shock formation from uu_x in KdV. Namely, if we consider wave profile evolution under KdV as made up of a succession of pairs of small steps (one for u_t = -uu_x and one for u_t = -u_xxx), then when u, u_x, and u_xxx are not too large, the steepening mechanism will dominate. But recall that as the time t approaches the breaking time T_B, u remains bounded, and along a breaking characteristic u_x only blows up like (T_B - t)^{-1} while u_xxx blows up like (T_B - t)^{-5}. Thus, near breaking in time and space, the u_xxx term will dwarf the non-linearity and will disperse the incipient shock. In fact, computer simulations do show just such a scenario playing out.
Section 2
The Korteweg-de Vries Equation
We have seen that the Korteweg-de Vries (or KdV) equation, u_t + 6uu_x + u_xxx = 0, expresses a balance between dispersion from its third-derivative term and the shock-forming tendency of its nonlinear term, and in fact many models of one-dimensional physical systems that exhibit mild dispersion and weak nonlinearity lead to KdV as the controlling equation at some level of approximation.
2.1 Early History, Exact Solutions, and Solitons
We will give here only a very abbreviated version of the historical origins of KdV. For
more details see [P] and further references given there.
As already mentioned, KdV rst arose as the modelling equation for solitary gravity waves
in a shallow canal. Such waves are rare and not easy to produce, and they were apparently
only rst noticed in 1834 (by the naval architect, John Scott Russell). Early attempts by
Stokes and Airy to model them mathematically seemed to indicate that such waves could
not be stableand their very existence was at rst a matter of debate. Later work by
Boussinesq and Rayleigh corrected errors in this earlier theory, and nally a paper in 1894
by Korteweg and de Vries [KdV] settled the matter by giving a convincing mathematical
argument that wave motion in a shallow canal is governed by KdV, and showing by explicit
computation that their equation admitted travelling-wave solutions that had exactly the
properties described by Russell, including the relation of height to speed that Russell had
determined experimentally in a wave tank he had constructed.
But it was only much later that the truly remarkable properties of the KdV equation became evident. In 1954, Fermi, Pasta and Ulam (FPU) used one of the very first digital computers to perform numerical experiments on a one-dimensional, anharmonic lattice model, and their results contradicted the then current expectations of how energy should distribute itself among the normal modes of such a system [FPU]. A decade later, Zabusky and Kruskal re-examined the FPU results in a famous paper [ZK]. They showed that, in a certain continuum limit, the FPU lattice was approximated by the KdV equation. They then did their own computer experiments, solving the Cauchy Problem for KdV with initial conditions corresponding to those used in the FPU experiments. In the results of these simulations they observed the first example of a "soliton", a term that they coined to describe a remarkable particle-like behavior (elastic scattering) exhibited by certain KdV solutions. Zabusky and Kruskal showed how the coherence of solitons explained the anomalous results observed by Fermi, Pasta, and Ulam. But in solving that mystery, they had uncovered a larger one; KdV solitons were unlike anything that had been seen before, and the search for an explanation of their remarkable behavior led to a series of discoveries that changed the course of applied mathematics for the next thirty years.
We next fill in some of the mathematical details behind the above sketch, beginning with a discussion of explicit solutions to the KdV equation.
To find the travelling wave solutions of the KdV equation is straightforward; substituting the Ansatz u(x,t) = f(x - ct) into KdV gives the ODE -cf' + 6ff' + f''' = 0, and adding as boundary condition that f should vanish at infinity, a routine computation leads to the two-parameter family of travelling-wave solutions u(x,t) = 2a^2 sech^2(a(x - 4a^2 t + d)).
2.1 Exercise 1. Carry out the details of this computation. (Hint: Get a first integral by writing 6ff' = (3f^2)'.)
These are the solitary waves seen by Russell, and they are now usually referred to as the 1-soliton solutions of KdV. Note that their amplitude, 2a^2, is just half their speed, 4a^2, while their width is proportional to a^{-1}; i.e., taller solitary waves are thinner and move faster.
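It is easy to let a computer algebra system confirm that this family really does satisfy KdV, u_t + 6uu_x + u_xxx = 0. A sketch (the parameter values a and d below are arbitrary samples):

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)
a, d = sp.Rational(3, 2), sp.Rational(1, 4)    # arbitrary sample parameters

u = 2 * a**2 * sp.sech(a * (x - 4 * a**2 * t + d))**2
residual = (sp.diff(u, t) + 6 * u * sp.diff(u, x) + sp.diff(u, x, 3)).rewrite(sp.exp)

# The residual is identically zero; spot-check it in floating point:
f = sp.lambdify((x, t), residual, 'math')
print(max(abs(f(xi / 7, ti / 5)) for xi in range(-14, 15) for ti in range(-5, 6)))
```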
These solutions were found by Korteweg and de Vries, who also carried out the more complicated calculations that arise when one assumes periodicity instead of decay as a boundary condition. The profile of the periodic travelling wave is given in terms of the Jacobi elliptic function cn,

    u(x,t) = 2a^2 k^2 cn^2(a(x - 4(2k^2 - 1)a^2 t)),

and following Korteweg and de Vries they are called cnoidal waves. Here 0 ≤ k ≤ 1 is the modulus of the elliptic function cn. Note that as the modulus k → 1, cn converges to sech, and so the cnoidal waves have the solitary wave as a limiting case.
Next, following M. Toda [To], we will derive the n-soliton solutions of KdV. We first rewrite the 1-soliton solution as u(x,t) = 2 ∂^2/∂x^2 log cosh(a(x - 4a^2 t + d)), or u(x,t) = 2 ∂^2/∂x^2 log K(x,t), where K(x,t) = 1 + e^{-2a(x - 4a^2 t + d)}.
We now try to generalize, looking for solutions of the form u(x,t) = 2 ∂^2/∂x^2 log K(x,t), with K of the form K(x,t) = 1 + A_1 e^{-2θ_1} + A_2 e^{-2θ_2} + A_3 e^{-2(θ_1+θ_2)}, where θ_i = a_i(x - 4a_i^2 t + d_i), and we are to choose the A_i and d_i by substituting in KdV and seeing what works.
2.1 Exercise 2. Show that KdV is satisfied for u(x,t) of this form and for arbitrary choices of A_1, A_2, a_1, a_2, d_1, d_2, provided only that we define

    A_3 = ((a_2 - a_1)/(a_1 + a_2))^2 A_1 A_2.
The solutions of KdV that arise in this way are called the 2-soliton solutions of KdV.
2.1 Exercise 3. Show that if we take A_i = 1/(2a_i) then K(x,t) = det B(x,t), where B(x,t) is the 2 × 2 matrix B_ij(x,t) = δ_ij + (1/(a_i + a_j)) e^{-(θ_i + θ_j)}.
Yes, you've guessed it, this generalizes in the obvious way. If we define an n × n matrix B(x,t) with the matrix elements defined in the same way, then u(x,t) = 2 ∂^2/∂x^2 log det B(x,t) is a solution of KdV for all choices of a_i and d_i, and this 2n-parameter family of solutions is called the n-soliton solutions of KdV.
Of course this is a complete swindle! Only knowing the answer in advance allowed us to make the correct choice of Ansatz for K. Later we shall see how to get the n-soliton family of solutions for KdV in a completely straightforward way using the Inverse Scattering Method.
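The determinant formula is easy to test with a computer algebra system. The sketch below takes n = 2 with a_1 = 1, a_2 = 2, d_i = 0 (sample values), builds u = 2 ∂^2/∂x^2 log det B symbolically, and spot-checks the KdV residual u_t + 6uu_x + u_xxx numerically:

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)
a = [sp.Integer(1), sp.Integer(2)]                  # a_1, a_2 (and d_i = 0)
theta = [ai * (x - 4 * ai**2 * t) for ai in a]

B = sp.eye(2) + sp.Matrix(2, 2, lambda i, j:
        sp.exp(-(theta[i] + theta[j])) / (a[i] + a[j]))
u = 2 * sp.diff(sp.log(B.det()), x, 2)

residual = sp.diff(u, t) + 6 * u * sp.diff(u, x) + sp.diff(u, x, 3)
f = sp.lambdify((x, t), residual, 'math')
print(max(abs(f(xi / 3, ti / 7)) for xi in range(-6, 7) for ti in range(-7, 8)))
```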
But, for now, we want to look more closely at the 2-soliton solutions, and more specifically their asymptotic behavior as t approaches ±∞. We could do this for an arbitrary 2-soliton, but for simplicity let us take a_1 = 1 and a_2 = 2.
2.1 Exercise 4. Show that for these choices of a_1 and a_2,

    u(x,t) = 12 (3 + 4 cosh(2x - 8t) + cosh(4x - 64t)) / [cosh(3x - 36t) + 3 cosh(x - 28t)]^2,

so in particular u(x,0) = 6 sech^2(x).
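This closed form is easy to check numerically, and one can also watch the large-time splitting into two shifted 1-solitons emerge. The split form used below, with phase shift δ = (1/2) log 3, is the one worked out in Exercise 5; the check confirms it matches the exact formula:

```python
import math

def u(x, t):
    num = 3 + 4 * math.cosh(2*x - 8*t) + math.cosh(4*x - 64*t)
    den = math.cosh(3*x - 36*t) + 3 * math.cosh(x - 28*t)
    return 12 * num / den**2

# At t = 0 the profile is the single lump 6 sech^2(x):
lump_err = max(abs(u(xi / 10, 0) - 6 / math.cosh(xi / 10)**2)
               for xi in range(-50, 51))
print(lump_err)

# For large positive t it splits into 2 sech^2(x - 4t + delta) plus
# 8 sech^2(2(x - 16t - delta/2)), with delta = (1/2) log 3:
delta, T = 0.5 * math.log(3), 12.0
def asymptotic(x):
    return (2 / math.cosh(x - 4*T + delta)**2
            + 8 / math.cosh(2 * (x - 16*T - delta/2))**2)

split_err = max(abs(u(x, T) - asymptotic(x))
                for x in [4*T + k/2 for k in range(-4, 5)]
                + [16*T + k/2 for k in range(-4, 5)])
print(split_err)
```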
2.1 Exercise 5. Show that for t large and negative u(x,t) is asymptotically equal to 2 sech^2(x - 4t - δ) + 8 sech^2(2(x - 16t + δ/2)), while for t large and positive u(x,t) is asymptotically equal to 2 sech^2(x - 4t + δ) + 8 sech^2(2(x - 16t - δ/2)), where δ = (1/2) log 3. (This is hard. For the solution see [To], Chapter 6.)
Note what this says. If we follow the evolution from -T to T (where T is large and positive), we first see the superposition of two 1-solitons: a larger and thinner one to the left of and overtaking a shorter, fatter, and slower-moving one to the right. Around t = 0 they merge into a single lump (with the shape 6 sech^2(x)), and then they separate again, with their original shapes restored, but now the taller and thinner one is to the right. It is almost as if they had passed right through each other; the only effect of their interaction is the pair of phase shifts: the slower one is retarded slightly from where it would have been, and the faster one is slightly ahead of where it would have been. Except for these phase shifts, the final result is what we might expect from a linear interaction. It is only if we see the interaction as the two solitons meet that we can detect its highly nonlinear nature. (Note that at time t = 0, the maximum amplitude, 6, of the combined wave is actually less than the maximum amplitude, 8, of the taller wave when they are separated.) But of course the really striking fact is the resilience of the two individual solitons: their ability to put themselves back together after the collision. Not only is no energy radiated away, but their actual shapes are preserved.
(Remarkably, on page 384 of Russell's 1844 paper, there is a sketch of a 2-soliton interaction experiment that Russell had carried out in his wave tank!)
We shall see later that every initial profile u_0 for the KdV equation can be thought of as made up of two parts: an n-soliton solution for some n, and a dispersive "tail". The tail is transient, i.e., it rapidly tends to zero in the sup norm (although its L^2 norm is preserved), while the n-soliton part behaves in the robust way that is the obvious generalization of the 2-soliton behavior we have just analyzed.
Now back to the computer experiment of Zabusky and Kruskal. For numerical reasons, they chose to deal with the case of periodic boundary conditions, in effect studying the KdV equation u_t + uu_x + δ^2 u_xxx = 0 (which they label (1)) on the circle instead of on the line. For their published report, they chose δ = 0.022 and used the initial condition u(x,0) = cos(πx). With the above background, it is interesting to read the following extract from their 1965 report, containing the first use of the term "soliton":
from their 1965 report, containing the rst use of the term soliton:
(I) Initially the rst two terms of Eq. (1) dominate and the classical overtaking
phenomenon occurs; that is u steepens in regions where it has negative slope. (II)
Second, after u has steepened suciently, the third term becomes important and serves
24
to prevent the formation of a discontinuity. Instead, oscillations of small wavelength
(of order ) develop on the left of the front. The amplitudes of the oscillations grow,
and nally each oscillation achieves an almost steady amplitude (that increases linearly
from left to right) and has the shape of an individual solitary-wave of (1). (III) Finally,
each solitary wave pulse or soliton begins to move uniformly at a rate (relative to
the background value of u from which the pulse rises) which is linearly proportional
to its amplitude. Thus, the solitons spread apart. Because of the periodicity, two or
more solitons eventually overlap spatially and interact nonlinearly. Shortly after the
interaction they reappear virtually unaected in size or shape. In other words, solitons
pass through one another without losing their identity. Here we have a nonlinear
physical process in which interacting localized pulses do not scatter irreversibly.
2.2 Constants of the Motion for the KdV Flow
After the appearance of the Zabusky-Kruskal paper, attempts were quickly made to understand what it was that was special about KdV that led to the soliton phenomenon. Perhaps because soliton behavior involved the conservation of shape, one conjecture was that the KdV flow might have unusually many constants of the motion, and a search was begun for polynomial conservation laws, i.e., polynomials P(u, u_x, u_xx, ...) in u and its spatial partial derivatives such that ∫_{-∞}^{∞} P(u(x,t), u_x(x,t), u_xx(x,t), ...) dx would be independent of t for all solutions u(x,t) of KdV that vanish sufficiently rapidly at infinity.
We have seen that a sufficient condition for this is that any solution of KdV should satisfy a conservation law with P as the density, i.e., there should exist a corresponding flux, J(u, u_x, u_xx, ...), such that the equation ∂P/∂t + ∂J/∂x = 0 should follow as a formal consequence of the KdV equation. If this is the case, then we will call P a conserved density for KdV. Now KdV, which we take from here on in the equivalent form u_t - 6uu_x + u_xxx = 0 (obtained from the earlier form by replacing u with -u), is itself a conservation law: u_t + (-3u^2 + u_xx)_x = 0, so u is one conserved density.
Also, if we multiply KdV through by u, we can rewrite the result as (1/2 u^2)_t + (-2u^3 + uu_xx - 1/2 u_x^2)_x = 0, so u^2 is a second conserved density for KdV. Finally, multiplying KdV by 3u^2 and adding the result to u_x times the x derivative of KdV, we find that (u^3 + 1/2 u_x^2) is a third conserved density for KdV, with associated flux (-(9/2)u^4 + 3u^2 u_xx - 6uu_x^2 + u_x u_xxx - 1/2 u_xx^2).
For future reference we will name these conserved densities F̃_1(u) = u, F̃_2(u) = u^2, and F̃_3(u, u_x) = u^3 + 1/2 u_x^2.
These three were "classical", in the sense that they were well-known long before the Zabusky-Kruskal paper. In fact, as we shall see below, they represent important conserved physical quantities in the case that KdV is modelling a wave in a shallow canal. A fourth conserved density was found by Whitham, and Zabusky and Kruskal discovered two more. This was prior to the existence of symbolic manipulation computer programs such as Macsyma, and by now the computations were getting horrendous. Still, with remarkable perseverance and effort the number was eventually raised to eleven, before Miura, using generating function methods, finally showed that these were in fact the first elements of an infinite sequence of conserved densities, F̃_k(u, u_x, u_xx, ...), for KdV.
It is still not easy to prove the existence of the F̃_k, and we will not attempt to do so here, but rather refer the interested reader to Chapter 1 of [La3]. There was naturally speculation that KdV could somehow be regarded as an infinite dimensional Hamiltonian system, with one of the functionals F_k(u) = ∫ F̃_k(u) dx playing the rôle of the Hamiltonian, and all of them being in involution, and that turned out to be the case.
Let u(x,t) be a solution of KdV that along with all its derivatives vanishes at infinity, and let us imagine u as measuring the height of a wave in a narrow channel (above the level of undisturbed water). If we regard the water as incompressible, then u(x,t) dx is proportional to the mass of water in the wave between x and x + dx (at time t), so we can identify the constant of the motion ∫ u(x,t) dx with the total mass of water in the wave, and we will denote it by M(u).
2.2 Exercise 1. Show that the 1-soliton with parameter a has mass 4a.
We expect that the linear momentum carried by the wave should also be a constant of the motion. How do we compute it? It is natural to define the center of mass of the wave at time t by x_u(t) = (1/M(u)) ∫ x u(x,t) dx.
2.2 Exercise 2. Show that x_u(t) moves with constant velocity. Equivalently, show that P_u = ∫ x u_t(x,t) dx is another constant of the motion.
In [To], Toda calls P_u the total momentum of the wave, but this does not seem physically justified to me. It suggests that the velocity of the center of mass is the same as the average translational velocity of the water particles in the x direction, as if the water were moving along with the wave. In fact, in the approximation used to derive the KdV equation, the x-component of velocity of the water particle located at x at time t is proportional to the wave height u(x,t), so that the total x-momentum of the wave is actually proportional to ∫ u^2(x,t) dx.
This leaves the third classical constant of the motion: ∫ (u^3 + 1/2 u_x^2) dx; we would like to give it some physical interpretation, and total energy is the obvious suspect. I don't see any convincing argument to identify it with the sum of the kinetic and potential energy of the wave, but on the other hand we will see in the next section that in our representation of KdV as a Hamiltonian system, this constant of the motion is the Hamiltonian function, and of course in classical mechanics that is the rôle normally played by the total energy.
2.3 KdV as a Hamiltonian System
I will assume below that the reader is familiar with the basic facts concerning symplectic manifolds and Hamiltonian systems, including the infinite dimensional case; however, I have included an appendix in which these concepts are reviewed.
We shall now see how to view KdV as a Hamiltonian system in a simple and natural way. It turns out that this Hamiltonian system has a key property one would expect from any generalization to infinite dimensions of the concept of complete integrability in the Liouville sense, namely the existence of infinitely many functionally independent constants of the motion that are in involution. (Later, in discussing the inverse scattering method, we will indicate how complete integrability was proved in a more precise sense by Fadeev and Zakharov [ZF]; they demonstrated that the "scattering data" for the KdV equation obey the characteristic Poisson bracket relations for the action-angle variables of a completely integrable system.)
For simplicity, we shall take as our phase space P for KdV the Schwartz space, S(R), of rapidly decreasing functions u : R → R, although a much larger space would be possible. As mentioned earlier, it is proved in [BS] that KdV defines a global flow on the Sobolev space H^4(R) (see also [Ka1], [Ka2]), and it is not hard to see that P is an invariant subspace of this flow. For u, v in P we will denote their L^2 inner product ∫_{-∞}^{∞} u(x)v(x) dx by ⟨u, v⟩, and we define

    Ω(u, v) = (1/2) ∫_{-∞}^{∞} (v(x) û(x) - u(x) v̂(x)) dx,

where û(x) = ∫_{-∞}^x u(y) dy denotes the indefinite integral of u. (For the periodic KdV equation we take P to be all smooth periodic functions of period 2π and replace ∫_{-∞}^{∞} by ∫_0^{2π}.)
We denote by ∂ the derivative operator, u ↦ u', so ∂û = u, and ∫_{-∞}^{∞} ∂u dx = 0 for functions u that vanish at infinity. We will also write u^(k) for ∂^k u, but for small k we shall also use u = u^(0), u_x = u^(1), u_xx = u^(2), etc.
There is a simple but important relation connecting ∂, Ω, and the L^2 inner product, namely:

    Ω(∂u, v) = ⟨u, v⟩.

2.3 Exercise 1. Prove this. (Hint: Show that ∂(u v̂) = (∂u) v̂ + uv, that ∫_{-∞}^{∞} ∂(u v̂) dx = 0, and that Ω(∂u, v) = (1/2) ∫_{-∞}^{∞} (v u - (∂u) v̂) dx.)
One important consequence of this is the weak non-degeneracy of Ω. For, if i_v Ω is zero, then in particular ⟨u, v⟩ = Ω(∂u, v) = -Ω(v, ∂u) = -(i_v Ω)(∂u) = 0 for all u, so v = 0.
Ω is clearly a skew-bilinear form on P. Since P is a vector space, we can as usual identify P with its tangent space at every point, and then Ω becomes a constant 2-form on P. Since it is constant, of course dΩ = 0.
A second consequence of Ω(∂u, v) = ⟨u, v⟩ is that if F : P → R is a smooth function (or "functional") on P that has a gradient ∇F with respect to the flat Riemannian structure on P defined by the L^2 inner product, then the symplectic gradient of F also exists. Recall that dF, the differential of F, is the 1-form on P defined by

    dF_u(v) = (d/dε)|_{ε=0} F(u + εv),

and the gradient of F is the vector field dual to dF with respect to the L^2 inner product (if such a vector field indeed exists), i.e., it is characterized by (dF)_u(v) = ⟨(∇F)_u, v⟩. Similarly, the symplectic gradient of F (if it exists) is the dual to dF with respect to Ω, i.e., it satisfies (dF)_u(v) = Ω((∇_s F)_u, v). Since ⟨(∇F)_u, v⟩ = Ω(∂((∇F)_u), v), it follows that if (∇F)_u exists then (∇_s F)_u also exists and equals ∂((∇F)_u).
2.3.1 Remark. Since P is not complete with respect to the L^2 norm, it is not automatic that a continuous linear functional on P can be represented as the inner product with some element of P. Thus, the existence of a gradient vector field corresponding to a particular function F requires proof.
We shall only consider functions F : P → R of the type normally considered in the Calculus of Variations, i.e., of the form:

    F(u) = ∫_{-∞}^{∞} F̃(u, u_x, u_xx, ...) dx,
where F̃ : R^{k+1} → R is a polynomial function without a constant term. Then the usual integration by parts argument of the Calculus of Variations shows that such an F has a gradient, given by:

    (∇F)_u = ∂F̃/∂u - ∂(∂F̃/∂u_x) + ∂^2(∂F̃/∂u_xx) - ... .
2.3.2 Remark. The above formula is written using the standard but somewhat illogical conventions of the Calculus of Variations and needs a little interpretation. F̃ is a function of variables y = (y_0, y_1, y_2, ..., y_k), and for example ∂F̃/∂u_xx really means the function on R whose value at x is ∂F̃/∂y_2 evaluated at y = (u^(0)(x), u^(1)(x), u^(2)(x), ..., u^(k)(x)).
From what we saw above, the symplectic gradient of such an F exists and is given by:

    (∇_s F)_u = ∂(∂F̃/∂u) - ∂^2(∂F̃/∂u_x) + ∂^3(∂F̃/∂u_xx) - ... .
Now a smooth function on a symplectic manifold is called Hamiltonian if it has a symplectic gradient, so what we have shown is that all such Calculus of Variations functionals on P are Hamiltonian and define the Hamiltonian flow u̇ = (∇_s F)_u, where u(t) denotes a smooth curve in P. If instead of u(t)(x) we write u(x,t), this symbolic ODE in the manifold P becomes the PDE:

    u_t = ∂(∂F̃/∂u) - ∂^2(∂F̃/∂u_x) + ∂^3(∂F̃/∂u_xx) - ... .
In particular, if we take F̃ = F̃_3(u, u_x) = u^3 + u_x^2/2, then we get the KdV equation in standard form: u_t = ∂(3u^2) - ∂^2(u_x) = 6uu_x - u_xxx.
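This variational recipe can be carried out mechanically. In the sketch below, sympy's Euler operator computes the L^2 gradient of the density F̃_3, and one application of ∂ then reproduces the flow just stated:

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

x = sp.symbols('x')
u = sp.Function('u')

F3 = u(x)**3 + sp.Derivative(u(x), x)**2 / 2          # the density F~_3(u, u_x)

grad = euler_equations(F3, u(x), x)[0].lhs            # gradient: 3u^2 - u_xx
flow = sp.diff(grad, x).doit()                        # apply d: symplectic gradient

target = (6 * u(x) * sp.Derivative(u(x), x) - sp.Derivative(u(x), x, 3)).doit()
print(sp.expand(flow - target))   # 0: the flow is u_t = 6uu_x - u_xxx
```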
Recall that if two smooth real-valued functions F and G on a symplectic manifold P are both Hamiltonian, then they determine a third function on P, called their Poisson bracket, defined by: {F, G} = Ω(∇_s G, ∇_s F), and it is easy to show that this is also a Hamiltonian function and in fact ∇_s {F, G} = [∇_s F, ∇_s G] (cf. the appendix).
Specializing again to the case that P is the Schwartz space S(R), with the symplectic structure Ω defined above, we get the following formula for the Poisson Bracket:

    {F, G} = Ω(∇_s G, ∇_s F) = Ω(∂∇G, ∂∇F) = ⟨∇G, ∂(∇F)⟩.
2.3 Exercise 2. Note that the density F̃_3 above, that gives the KdV equation, is just the third of the classical conserved densities for KdV. Show that in fact all three of the functionals F_i (corresponding to the densities F̃_1(u) = u, F̃_2(u) = u^2, and F̃_3(u, u_x) = u^3 + 1/2 u_x^2) are in involution.
In fact, it turns out that the whole sequence of functionals F_i coming from the conserved densities F̃_i found by Miura are all in involution. (For the proof see [La3].)
2.4 KdV as a Lax Equation
In developing the Inverse Scattering Transform, Gardner, Greene, Kruskal and Miura [GGKM] showed that there was an intimate relation between the KdV equation and the time-independent Schrödinger operators -d^2/dx^2 + u; namely, if a one-parameter family of potentials u(x,t) evolved according to the KdV equation, then the corresponding one-parameter family L(t) of self-adjoint operators on L^2(R) that are given by the Schrödinger operators with potentials u(t)(x) = u(x,t) (i.e., L(t)ψ(x) = -(d^2ψ/dx^2)(x) + u(x,t)ψ(x)) are isospectral. Peter Lax [La1] took this observation one step further. He showed that one could recast the KdV equation in a form now known as a Lax Equation, and as such it is equivalent to the statement that the L(t) are evolving by unitary equivalence, i.e., there is a smooth one-parameter family, U(t), of unitary operators on L^2(R) such that U(0) = I and L(t) = U(t)L(0)U(t)^{-1}. We will first develop the concept of a Lax Equation in an abstract setting, and then apply these considerations to the KdV situation. By the way, in the following it will be convenient to take KdV in the form u_t - 6uu_x + u_xxx = 0.
Suppose we have a smooth one-parameter family U(t) of unitary transformations of a Hilbert space H with U(0) = I. U_t(t), the derivative of U(t), is a tangent vector at U(t) of the group U(H) of unitary transformations of H, so B(t) = U_t(t)U(t)^{-1} = U_t(t)U(t)* is a tangent vector to U(H) at the identity, I. Differentiating UU* = I gives U_t U* + U U*_t = 0, and since U_t = BU and U*_t = U*B*, 0 = BUU* + UU*B* = B + B*, so B* = -B; i.e., B(t) is a family of skew-adjoint operators on H. Conversely, a smooth map t ↦ B(t) of R into the skew-adjoint operators defines a time-dependent right invariant vector field X_U(t) = B(t)U on U(H), and so (at least in finite dimensions) a smooth curve U(t) of unitary operators starting from I such that U_t(t) = B(t)U(t).
Now suppose that L(0) is a self-adjoint operator on H, and define a family of conjugate operators L(t) by L(t) = U(t)L(0)U(t)^{-1}, so L(0) = U(t)* L(t) U(t). Differentiating the latter with respect to t, 0 = U*_t L U + U* L_t U + U* L U_t = U*(-BL + L_t + LB)U. Hence, writing [B, L] = BL - LB as usual for the commutator of B and L, we see that L(t) satisfies the so-called Lax Equation, L_t = [B, L].
Given a smooth family of skew-adjoint operators B(t), the Lax Equation is a time-dependent linear ODE in the vector space S of self-adjoint operators on H, whose special form expresses the fact that the evolution is by unitary conjugation. Indeed, since the commutator of a skew-adjoint operator and a self-adjoint operator is again self-adjoint, B(t) defines a time-dependent vector field, Y, on S by Y(t)(L) = [B(t), L]. Clearly a smooth curve L(t) in S satisfies the Lax Equation if and only if it is a solution curve of Y. By uniqueness of solutions of linear ODE, the solution L(t) of this ODE with initial condition L(0) must be the one-parameter family U(t)L(0)U(t)^{-1} constructed above.
Given any ψ(0) in H, define ψ(t) = U(t)ψ(0). Since U(t)L(0) = L(t)U(t), it follows that if ψ(0) is an eigenvector of L(0) belonging to the eigenvalue λ, then ψ(t) is an eigenvector of L(t) belonging to the same eigenvalue λ. Differentiating the relation defining ψ(t) gives ψ_t = Bψ, so we may consider ψ(t) to be defined as the solution of this linear ODE with initial value ψ(0). Since this is one of the main ways in which we will use Lax Equations, we will restate it as what we shall call the:
Isospectral Principle. Let L(t) and B(t) be smooth one-parameter families of self-adjoint and skew-adjoint operators respectively on a Hilbert space H, satisfying the Lax Equation L_t = [B, L], and let ψ(t) be a curve in H that is a solution of the time-dependent linear ODE ψ_t = Bψ. If the initial value, ψ(0), is an eigenvector of L(0) belonging to an eigenvalue λ, then ψ(t) is an eigenvector of L(t) belonging to the same eigenvalue λ.
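In finite dimensions the isospectral mechanism is easy to watch numerically: integrate L_t = [B, L] for a random symmetric L(0) and skew-symmetric B, and the eigenvalues stay put up to integration error. A minimal sketch (the matrix sizes and the RK4 step are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)); B = B - B.T       # skew-symmetric (skew-adjoint)
L = rng.standard_normal((4, 4)); L = L + L.T       # symmetric (self-adjoint)
eigs0 = np.linalg.eigvalsh(L)

def bracket(M):
    return B @ M - M @ B                           # the Lax vector field [B, M]

h = 0.001
for _ in range(1000):                              # RK4 on L_t = [B, L] up to t = 1
    k1 = bracket(L)
    k2 = bracket(L + h/2 * k1)
    k3 = bracket(L + h/2 * k2)
    k4 = bracket(L + h * k3)
    L = L + h/6 * (k1 + 2*k2 + 2*k3 + k4)

print(np.max(np.abs(np.linalg.eigvalsh(L) - eigs0)))   # tiny: the flow is isospectral
```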
We now apply the above with H = L^2(R). We will see that if u satisfies KdV, then the family of Schrödinger operators L(t) on H defined above satisfies the Lax Equation L_t = [B, L], where

    B(t)ψ(x) = -4ψ_xxx(x) + 3(u(x,t)ψ_x(x) + (u(x,t)ψ(x))_x),

or more succinctly, B = -4∂^3 + 3(u∂ + ∂u). Here and in the sequel it is convenient to use the same symbol both for an element w of the Schwartz space, S(R), and for the bounded self-adjoint multiplication operator v ↦ wv on H. (Since H is infinite dimensional and our operators B and L are unbounded on H, some care is needed for a rigorous treatment. But this is relatively easy. Note that all the operators involved have the Schwartz space as a common dense domain.)
Note that since ∂ is skew-adjoint, so is any odd power, and in particular -4∂^3 is skew-adjoint. Also, the multiplication operator u is self-adjoint, while the anti-commutator of a self-adjoint and a skew-adjoint operator is skew-adjoint, so u∂ + ∂u and hence B is indeed skew-adjoint.
Since clearly L_t = u_t (as a multiplication operator), while u_t - 6uu_x + u_xxx = 0 by assumption, to prove that L_t = [B, L] we need only check that [B, L] = 6uu_x - u_xxx.
2.4 Exercise 1. Prove this. (Hint: [B, L] = 4[∂^3, ∂^2] - 4[∂^3, u] - 3[u∂, ∂^2] + 3[u∂, u] - 3[∂u, ∂^2] + 3[∂u, u], so it will suffice to verify the following six commutator relations: [∂^3, ∂^2] = 0, [∂^3, u] = u_xxx + 3u_xx ∂ + 3u_x ∂^2, [u∂, ∂^2] = -u_xx ∂ - 2u_x ∂^2, [u∂, u] = uu_x, [∂u, ∂^2] = -3u_xx ∂ - 2u_x ∂^2 - u_xxx, and [∂u, u] = uu_x.)
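Rather than verifying the six relations one at a time, one can also let a computer algebra system apply both operators to a test function ψ and confirm the crucial cancellation down to order zero:

```python
import sympy as sp

x = sp.symbols('x')
u = sp.Function('u')(x)
psi = sp.Function('psi')(x)

Lop = lambda f: -sp.diff(f, x, 2) + u * f                        # L = -d^2/dx^2 + u
Bop = lambda f: -4*sp.diff(f, x, 3) + 3*(u*sp.diff(f, x) + sp.diff(u*f, x))

commutator = sp.expand(Bop(Lop(psi)) - Lop(Bop(psi)))
target = sp.expand((6*u*sp.diff(u, x) - sp.diff(u, x, 3)) * psi)
print(sp.simplify(commutator - target))   # 0: [B, L] is multiplication by 6uu_x - u_xxx
```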
Let us now apply the Isospectral Principle to this example.
KdV Isospectrality Theorem. Suppose u(x,t) is a solution of the KdV equation,

    u_t - 6uu_x + u_xxx = 0,

whose initial value u(x,0) is in the Schwartz space S(R), and that ψ(x) is an eigenfunction of the Schrödinger Equation with potential u(x,0) and eigenvalue λ:

    -(d^2ψ/dx^2)(x) + u(x,0)ψ(x) = λψ(x).

Let ψ(x,t) be the solution of the evolution equation ψ_t = Bψ, i.e.,

    ψ_t = -4 ∂^3ψ/∂x^3 + 3 (u(x,t) ∂ψ/∂x(x,t) + ∂/∂x(u(x,t)ψ(x,t)))

with the initial value ψ(x,0) = ψ(x). Then ψ(x,t) is an eigenfunction for the Schrödinger Equation with potential u(x,t) and the same eigenvalue λ:

    -ψ_xx(x,t) + u(x,t)ψ(x,t) = λψ(x,t),

and moreover, if ψ(x) is in L^2, then the L^2 norm of ψ(·,t) is independent of t. Finally, ψ(x,t) also satisfies the first-order evolution equation

    ψ_t - (4λ + 2u)ψ_x + u_x ψ = 0.
PROOF. Except for the final statement, this is an immediate application of the Isospectrality Principle. Differentiating the eigenvalue equation for ψ(x,t) with respect to x gives ψ_xxx = u_x ψ + (u - λ)ψ_x, and substituting this into the assumed evolution equation for ψ gives the asserted first-order equation for ψ.
By the way, I should re-emphasize that the essential point is that when a potential evolves via KdV, the corresponding Schrödinger operators are isospectral, and this is already clearly stated in [GGKM]. Lax's contribution was to explain the mechanism behind this remarkable fact and to formulate it in a way that was easy to generalize. In fact, almost all generalizations of the phenomena first recognized in KdV have used the Lax Equation as a jumping-off place.
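The content of the theorem is easy to observe numerically. The sketch below (an illustration; the discretization details, central differences on a truncated interval with Dirichlet boundary conditions, are my own choices) uses the travelling wave $u(x, t) = -2\,\mathrm{sech}^2(x - 4t)$, an exact KdV solution whose Schrödinger operator has the single discrete eigenvalue $-1$, and checks that the lowest eigenvalue of the discretized operator is (approximately) the same at different times:

```python
import numpy as np

def ground_state_energy(t, L=25.0, n=1201):
    """Lowest eigenvalue of -d^2/dx^2 + u(x, t) for the 1-soliton
    u(x, t) = -2 sech^2(x - 4t), by central differences on [-L, L]."""
    x, h = np.linspace(-L, L, n, retstep=True)
    u = -2.0/np.cosh(x - 4.0*t)**2
    H = (np.diag(2.0/h**2 + u)
         + np.diag(-np.ones(n - 1)/h**2, 1)
         + np.diag(-np.ones(n - 1)/h**2, -1))
    return np.linalg.eigvalsh(H)[0]

for t in (0.0, 0.5):
    print(t, ground_state_energy(t))   # both close to -1
```

The soliton moves, but the spectrum of the associated Schrödinger operator does not.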
2.5 The KdV Hierarchy
Let me explain why you should find the Lax form of the KdV equation striking and more than a little surprising. Let's examine both sides of the equation $L_t = [B, L]$ carefully. On the left, since the self-adjoint operator $L(t)$ is the constant (in time) second order differential operator $-\partial^2$ plus the zero-order operator multiplication by $u(t)$, its time derivative, $L_t$, is the zero-order operator multiplication by $u_t$. On the right we have the difference of the two fifth order differential operators $B(t)L(t)$ and $L(t)B(t)$. Now of course the operators $L(t)$ and $B(t)$ do not commute, but since they are acting on scalar-valued functions, the top-order (i.e., fifth order) terms (or "symbols") of their products are independent of the order of composition, by a well-known trivial calculation. Thus $[B, L]$ should be a fourth order operator! What happened to the terms of order one, two, three, and four? Of course they have miraculously cancelled (as you will have noticed if you did the above exercise) due to the special form of $B(t)$. The fact that the zero order right hand side, $6uu_x - u_{xxx}$, of the KdV equation $u_t = 6uu_x - u_{xxx}$ can be written as the commutator of $L(t)$ and a third order operator $B(t)$ is what should have surprised you; it is what accounts for the remarkable integrability properties of KdV. If you replace $B(t)$ by another skew-adjoint third order operator that is not just a multiple of $B(t)$, you will see that the commutator $[B(t), L(t)]$ is never of order zero. But what if we replace $B(t)$ with some higher order differential operator?
Peter Lax asked and answered this question in [La1], the paper in which he first derived the above Lax form of the KdV equation. The natural generalization for $B$, Lax points out, is an operator $B_j$ of order $2j+1$ of the form $a\partial^{2j+1} + \sum_{i=1}^{j}\left(b_i\partial^{2i-1} + \partial^{2i-1}b_i\right)$, where the $b_i$ are polynomials in $u$ and its partial derivatives. If we demand that $[L, B_j]$ be a zero order operator, multiplication by some polynomial $K_j(u)$ in $u$ and its derivatives, this gives $j$ conditions that uniquely determine $b_1, \ldots, b_j$. For $j = 0$ we get $B_0 = \partial$ and $K_0(u) = u_x$, so the analogue of the KdV equation is just the linear advection equation $u_t = u_x$, which is therefore also called the zero-th flow of the KdV Hierarchy. Of course $B_1 = B$, and KdV itself is the first flow of the KdV Hierarchy, and in general the evolution equation $u_t = K_j(u)$ is referred to as the $j$-th flow of the KdV Hierarchy. What is more, it turns out that each of these higher order KdV flows is a Hamiltonian flow on our phase space $\mathcal{S}(\mathbf{R})$ with respect to the symplectic structure we have described above. The corresponding Hamiltonian functions $F_j : \mathcal{S}(\mathbf{R}) \to \mathbf{R}$ are all Calculus of Variations functionals, i.e., they are of the form $F_j(u) = \int \tilde{F}_j(u)\,dx$, where $\tilde{F}_j(u)$ is a polynomial differential operator of order $j$. By now you probably won't be surprised to learn that these $F_j$ are all in involution (so all the flows of the KdV Hierarchy commute), and hence each $F_j$ is a conserved quantity for the KdV flow. In fact, $F_1$, $F_2$, $F_3$ are the classical conserved quantities of KdV, and the higher $F_j$ are just the sequence of conserved quantities discovered in [GGKM].
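The conservation of these functionals can be watched numerically. The following sketch evaluates $\int u\,dx$ and $\int u^2\,dx$ along the pure 2-soliton solution $u = -2(\log\det A(x))_{xx}$ constructed in Section 3.3 (sample eigenvalues $\kappa_1 = 1$, $\kappa_2 = 2$; the translation constants are absorbed into the exponentials, and $\log\det A$ is evaluated by a log-sum-exp trick of my own to avoid overflow):

```python
import numpy as np

k1, k2 = 1.0, 2.0
logA = 2*np.log((k2 - k1)/(k1 + k2))

def logF(x, t):
    """log(1 + g1 + g2 + A g1 g2) for g_i = exp(8 k_i^3 t - 2 k_i x),
    evaluated stably via log-sum-exp."""
    a0 = np.zeros_like(x)
    a1 = 8*k1**3*t - 2*k1*x
    a2 = 8*k2**3*t - 2*k2*x
    a3 = a1 + a2 + logA
    m = np.maximum.reduce([a0, a1, a2, a3])
    return m + np.log(sum(np.exp(a - m) for a in (a0, a1, a2, a3)))

def two_soliton(x, t, h=1e-3):
    # u = -2 (log F)_xx, second derivative by central differences
    return -2*(logF(x + h, t) - 2*logF(x, t) + logF(x - h, t))/h**2

x = np.linspace(-60, 60, 20001)
dx = x[1] - x[0]
moments = {}
for t in (-1.0, 0.0, 1.0):
    u = two_soliton(x, t)
    moments[t] = (u.sum()*dx, (u**2).sum()*dx)
    print(t, moments[t])
```

Both integrals come out the same at every sampled time (here $-12$ and $48$), as the conservation laws predict, even though at $t = 0$ the two solitons are in the middle of their interaction.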
Section 3
Introduction to the Inverse Scattering Transform
3.1 The Scattering Data and its Evolution
We now fix a potential function $u$ in the Schwartz space $\mathcal{S}(\mathbf{R})$ and look more closely at the space $E_\lambda(u)$ of eigenfunctions of the Schrödinger operator with this potential. By definition, $E_\lambda(u)$ is just the kernel of the linear operator $\psi \mapsto L_u(\psi) - \lambda\psi = -\frac{d^2\psi}{dx^2} + (u - \lambda)\psi$ acting on the space $C^\infty(\mathbf{R})$, and by the elementary theory of second-order linear ODE it is, for each choice of $\lambda$, a two-dimensional linear subspace of $C^\infty(\mathbf{R})$. Using the special form of $L_u$ we can describe $E_\lambda(u)$ more precisely. We will ignore the case $\lambda = 0$, and consider the cases of positive and negative $\lambda$ separately.
Suppose $\lambda = -\kappa^2$, $\kappa > 0$. Note that any $\psi$ in $E_\lambda(u)$ will clearly be of the form $\psi(x) = ae^{\kappa x} + be^{-\kappa x}$ in any interval on which $u$ vanishes identically. Thus if $u$ has compact support, say $u(x) = 0$ for $|x| > M$, then we can find a basis $\psi^+_\lambda, \psi^-_\lambda$ for $E_\lambda(u)$ such that $\psi^\pm_\lambda(x) = e^{\pm\kappa x}$ (or equivalently, $\psi^\pm_\lambda(x)e^{\mp\kappa x} = 1$) for $x < -M$. Similarly there is a second basis $\phi^+_\lambda, \phi^-_\lambda$ for $E_\lambda(u)$ such that $\phi^\pm_\lambda(x) = e^{\pm\kappa x}$ (or $\phi^\pm_\lambda(x)e^{\mp\kappa x} = 1$) for $x > M$.

When $u$ does not have compact support, but is only rapidly decreasing, then, by an argument that we will sketch below, it follows that there still exist two bases $\psi^+_\lambda, \psi^-_\lambda$ and $\phi^+_\lambda, \phi^-_\lambda$ for $E_\lambda(u)$ such that $\psi^\pm_\lambda(x) \sim e^{\pm\kappa x}$ at $-\infty$ and $\phi^\pm_\lambda(x) \sim e^{\pm\kappa x}$ at $+\infty$ (i.e., $\lim_{x\to-\infty}\psi^\pm_\lambda(x)e^{\mp\kappa x} = 1$ and $\lim_{x\to+\infty}\phi^\pm_\lambda(x)e^{\mp\kappa x} = 1$).
Using these bases it is easy to detect when $\lambda$ is a so-called discrete eigenvalue of $L_u$, i.e., when $E_\lambda(u)$ contains a non-zero element $\psi$ of $L^2(\mathbf{R})$. Let us define functions $f(\lambda)$ and $c(\lambda)$ by $\psi^+_\lambda = f(\lambda)\phi^+_\lambda + c(\lambda)\phi^-_\lambda$. We can assume $\psi$ has $L^2$ norm one, and since $\psi^-_\lambda$ blows up at $-\infty$ while $\phi^+_\lambda$ blows up at $+\infty$, $\psi$ must be both a multiple of $\psi^+_\lambda$ and of $\phi^-_\lambda$, and since $\psi \neq 0$ it follows that $f(\lambda) = 0$. Conversely, if $f(\lambda) = 0$ then $\psi^+_\lambda = c(\lambda)\phi^-_\lambda$ decays exponentially both at $-\infty$ and $+\infty$, and so we can normalize it to get an element of $E_\lambda(u)$ with $L^2$ norm one. Thus the discrete eigenvalues of $L_u$ are precisely the roots of the function $f$.
It follows from standard arguments of Sturm-Liouville theory that in fact $L_u$ has only finitely many discrete eigenvalues, $\lambda_1, \ldots, \lambda_N$, with corresponding $L^2$ normalized eigenfunctions $\psi_1, \ldots, \psi_N$, and these determine so-called normalization constants $c_1, \ldots, c_N$ by $\psi_n = c_n\phi^-_{\lambda_n}$, i.e., if we write $\lambda_n = -\kappa_n^2$, then $c_n$ is characterized by $\psi_n(x) \sim c_ne^{-\kappa_n x}$ as $x \to \infty$. We note that the $\psi_n$, and hence the normalization constants $c_n$, are only determined up to sign, but we will only use $c_n^2$ in the Inverse Scattering Transform.
For $\lambda = k^2$, $k > 0$, there are similar considerations. In this case if $u(x)$ vanishes for $|x| > M$ then any element of $E_\lambda(u)$ will be of the form $ae^{-ikx} + be^{ikx}$ for $x < -M$ and also of the form $ce^{-ikx} + de^{ikx}$ for $x > M$. If $u$ is only rapidly decaying then this time we can find two bases $\psi^+_\lambda, \psi^-_\lambda$ and $\phi^+_\lambda, \phi^-_\lambda$ for $E_\lambda(u)$ such that $\psi^\pm_\lambda(x) \sim e^{\pm ikx}$ at $-\infty$ while $\phi^\pm_\lambda(x) \sim e^{\pm ikx}$ at $+\infty$. Then $\psi^-_\lambda = \alpha\phi^-_\lambda + \beta\phi^+_\lambda$, where $\alpha$ can be shown to be non-zero. Dividing by $\alpha$ we get a particular eigenfunction $\psi_k$, called the Jost solution, with the special asymptotic behavior $\psi_k(x) \sim a(k)e^{-ikx}$ as $x \to -\infty$ and $\psi_k(x) \sim e^{-ikx} + b(k)e^{ikx}$ as $x \to \infty$.

The functions $a(k)$ and $b(k)$ are called the transmission coefficient and reflection coefficient respectively, and $b(k)$ together with the above normalizing constants $c_1, \ldots, c_N$ make up the Scattering Data, $\mathcal{S}(u)$, for $u$.
While it may seem intuitively clear that the bases $\psi^\pm_\lambda$ must exist, to give the rigorous asymptotic arguments required for the proof of the crucial theorem on the time evolution of the Scattering Data it is essential to supply precise definitions of these bases, and we do this next.
For a warm-up, consider the simpler problem of the first order ODE $L_u = \frac{d}{dx} - u$. If we make the substitution $\psi = e^{\lambda x}\phi$, then the eigenvalue equation $L_u(\psi) = \lambda\psi$ becomes $\frac{d\phi}{dx} = u\phi$, so (assuming $u$ depends on a parameter $t$) we have $\phi(x, t) = \exp\left(\int_{-\infty}^x u(\xi, t)\,d\xi\right)$. Note that $\lim_{x\to-\infty}\phi(x, t) = 1$ while $\lim_{x\to\infty}\phi(x, t) = \exp\left(\int_{-\infty}^\infty u(\xi, t)\,d\xi\right) = c(t)$. If $\psi(x, t)$ is an eigenfunction of $L_u$ it follows that $\psi(x, t) \sim c(t)e^{\lambda x}$ (i.e., $\lim_{x\to\infty}\psi(x, t)e^{-\lambda x} = c(t)$). But moreover, since $u(x, t)$ is rapidly decaying, we can even differentiate under the integral sign to obtain $\psi_t(x, t) \sim c'(t)e^{\lambda x}$.
If an asymptotic relation depends on a parameter, one cannot in general differentiate with respect to that parameter unless one knows that the asymptotic relation holds uniformly in the parameter, and since we will need to derive a relation similar to the above for eigenfunctions of Schrödinger operators, we must now make an argument similar to the above (but somewhat more complicated) to justify differentiation in that case.
If we make the substitution $\psi = e^{-\kappa x}\phi$ in our eigenvalue equation $\psi_{xx} = \kappa^2\psi + u\psi$, then after simplification we get $\phi_{xx} - 2\kappa\phi_x = u\phi$, or $\partial(\partial - 2\kappa)\phi = u\phi$. Recall the method of solving the inhomogeneous equation $\partial(\partial - 2\kappa)\phi = f$ by variation of parameters. Since $1$ and $e^{2\kappa x}$ form a basis for the solutions of the homogeneous equation, we look for a solution of the form $\phi = \phi_1 + \phi_2e^{2\kappa x}$, and to make the system determined we add the relation $\phi_1' + \phi_2'e^{2\kappa x} = 0$. This leads to the equations $\phi_1' = -\frac{f}{2\kappa}$ and $\phi_2' = \frac{f}{2\kappa}e^{-2\kappa x}$, so the solution to the inhomogeneous equation is just
$$\phi = -\frac{1}{2\kappa}\int_{x_0}^x f(\xi)\,d\xi + \frac{e^{2\kappa x}}{2\kappa}\int_{x_0}^x f(\xi)e^{-2\kappa\xi}\,d\xi.$$
If we now take $f = u\phi$ (and use $e^{-\kappa\xi}\phi = \psi$) then, adding an appropriate solution of the homogeneous equation, we get the relation
$$\phi(x, t) = 1 - \frac{1}{2\kappa}\int_0^x u(\xi, t)\phi(\xi, t)\,d\xi - \frac{e^{2\kappa x}}{2\kappa}\int_x^\infty u(\xi, t)\psi(\xi, t)e^{-\kappa\xi}\,d\xi.$$
Assuming that $-\kappa^2$ is a discrete eigenvalue, and that $\psi$ has $L^2$ norm $1$, $u\psi$ will also be in $L^2$ and we can estimate the second integral using the Schwartz Inequality. We find that the second term tends to zero as $x \to \infty$, and it follows that $\psi(x, t) \sim c(t)e^{-\kappa x}$ at $\infty$, where $c(t) = \phi(\infty, t) = 1 - \frac{1}{2\kappa}\int_0^\infty u(\xi, t)\phi(\xi, t)\,d\xi$. In other words, the normalizing constant is well defined. But what is more important, it also follows that if $u(x, t)$ satisfies KdV, then the normalizing constant $c(t)$ for a fixed eigenvalue $-\kappa^2$ is a differentiable function of $t$ and satisfies $\psi_t(x, t) \sim c'(t)e^{-\kappa x}$. This follows from the fact that we can differentiate the formula for $c(t)$ under the integral sign, because $u$ is rapidly decreasing.
Note that differentiating the relation $e^{-\kappa x}\phi = \psi$ gives $\phi_xe^{-\kappa x} - \kappa\psi = \psi_x$. But the formula for $\phi$ shows that $\phi_x$ converges to zero at $\infty$, so $\psi_x(x, t) \sim -\kappa c(t)e^{-\kappa x}$. From the KdV Isospectrality Theorem, we know that if $u(x, t)$ satisfies KdV, then $\psi(x, t)$ satisfies $\psi_t - (-4\kappa^2 + 2u)\psi_x + u_x\psi = 0$. It follows that the left hand side times $e^{\kappa x}$ converges to $c'(t) + 4\kappa^2(-\kappa c(t))$ as $x \to \infty$, and hence $c'(t) - 4\kappa^3c(t) = 0$, so $c(t) = c(0)e^{4\kappa^3t}$.
By a parallel argument (which we omit) it follows that the transmission and reflection coefficients are also well defined and that the Jost solution $\psi_k(x, t)$ satisfies $(\psi_k)_t \sim a_t(k, t)e^{-ikx}$ at $-\infty$ and $(\psi_k)_t \sim b_t(k, t)e^{ikx}$ at $\infty$, and then one can show from the KdV Isospectrality Theorem that the transmission coefficients are constant, while the reflection coefficients satisfy $b(k, t) = b(k, 0)e^{8ik^3t}$.
3.1 Exercise 1. Supply the omitted proof.
Theorem on Evolution of the Scattering Data. Let $u(t) = u(x, t)$ be a smooth curve in $\mathcal{S}(\mathbf{R})$ satisfying the KdV equation $u_t - 6uu_x + u_{xxx} = 0$, and assume that the Schrödinger operator with potential $u(t)$ has discrete eigenvalues $-\kappa_1^2, \ldots, -\kappa_N^2$ whose corresponding normalized eigenfunctions have normalization constants $c_1(t), \ldots, c_N(t)$. Let the transmission and reflection coefficients of $u(t)$ be respectively $a(k, t)$ and $b(k, t)$. Then the transmission coefficients are all constants of the motion, i.e., $a(k, t) = a(k, 0)$, while the Scattering Data, $c_n(t)$ and $b(k, t)$, satisfy:
1) $c_n(t) = c_n(0)e^{4\kappa_n^3t}$,
2) $b(k, t) = b(k, 0)e^{8ik^3t}$.
We note a striking (and important) fact: not only do we now have an explicit and simple formula for the evolution of the Scattering Data $\mathcal{S}(u(t))$ when $u(t)$ evolves by the KdV equation, but further, this formula does not require any knowledge of $u(t)$.
The fact that the transmission coefficients $a(k)$ are constants of the motion while the logarithms of the reflection coefficients $b(k)$ vary linearly with time suggests that perhaps they can somehow be regarded as action-angle variables for the KdV equation, thereby identifying KdV as a completely integrable system in a precise sense. While $a(k)$ and $b(k)$ are not themselves canonical variables, Zakharov and Faddeev in [ZF] showed that certain functions of $a$ and $b$ do satisfy the Poisson commutation relations for action-angle variables. Namely, the functions $p(k) = (k/\pi)\log|a(k)|^2 = (k/\pi)\log\bigl[1 + |b(k)|^2\bigr]$ and $q(k) = \arg(b(k))$ satisfy $\{p(k), q(k')\} = \delta(k - k')$ and $\{p(k), p(k')\} = \{q(k), q(k')\} = 0$.
The above formula for the evolution of the Scattering Data is one of the key ingredients of the Inverse Scattering Method, and we are finally in a position to describe this elegant algorithm for solving the Cauchy problem for KdV.

The Inverse Scattering Method.

To solve the KdV initial value problem $u_t - 6uu_x + u_{xxx} = 0$ with given initial potential $u(x, 0)$ in $\mathcal{S}(\mathbf{R})$:
1) Apply the Direct Scattering Transform. That is, find all the discrete eigenvalues $-\kappa_1^2, \ldots, -\kappa_N^2$ for the Schrödinger operator with potential $u(x, 0)$, and the Scattering Data $\mathcal{S}(u(x, 0))$ for $u(x, 0)$, consisting of the normalizing constants $c_n(0)$ and the reflection coefficients $b(k, 0)$.
2) Use the explicit evolution formulae, $c_n(t) = c_n(0)e^{4\kappa_n^3t}$ and $b(k, t) = b(k, 0)e^{8ik^3t}$, to find the Scattering Data $\mathcal{S}(u(x, t))$.
3) Use the Inverse Scattering Transform (described in the next section) to compute $u(t)$ from $\mathcal{S}(u(x, t))$.
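Step 1 is straightforward to carry out numerically for simple initial data. The sketch below (the discretization details are my own choices) recovers the discrete eigenvalues for $u(x, 0) = -6\,\mathrm{sech}^2x$, a standard reflectionless potential whose Schrödinger operator has exactly the two bound states $-4$ and $-1$ (so $\kappa_1 = 2$, $\kappa_2 = 1$):

```python
import numpy as np

def discrete_eigenvalues(u, L=25.0, n=1500):
    """Negative eigenvalues of -d^2/dx^2 + u(x) on [-L, L],
    central differences with Dirichlet truncation."""
    x, h = np.linspace(-L, L, n, retstep=True)
    H = (np.diag(2.0/h**2 + u(x))
         + np.diag(-np.ones(n - 1)/h**2, 1)
         + np.diag(-np.ones(n - 1)/h**2, -1))
    ev = np.linalg.eigvalsh(H)
    return ev[ev < -1e-3]          # the bound-state eigenvalues

ev = discrete_eigenvalues(lambda x: -6.0/np.cosh(x)**2)
print(ev)   # approximately [-4, -1]
```

These two eigenvalues are exactly the data that drive the 2-soliton formula of Section 3.3.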
3.2 The Gelfand-Levitan-Marchenko Equation
Recovering the potential $u$ of a Schrödinger operator $L_u$ from the Scattering Data $\mathcal{S}(u)$ was not something invented for the purpose of solving the KdV initial value problem. Rather, it was a question of basic importance to physicists doing cyclotron experiments, and the theory was worked out in the mid-1950s by Kay and Moses [KM], by Gelfand and Levitan [GL], and by Marchenko [M].
Denote the discrete eigenvalues of $u$ by $-\kappa_1^2, \ldots, -\kappa_N^2$, the normalizing constants by $c_1, \ldots, c_N$, and the reflection coefficients by $b(k)$, and define a function
$$B(\xi) = \sum_{n=1}^N c_n^2e^{-\kappa_n\xi} + \frac{1}{2\pi}\int_{-\infty}^\infty b(k)e^{ik\xi}\,dk.$$
Inverse Scattering Theorem. The potential $u$ can be recovered using the formula $u(x) = -2\frac{d}{dx}K(x, x)$, where $K(x, z)$ is the unique function on $\mathbf{R}\times\mathbf{R}$ that is zero for $z < x$ and satisfies the Gelfand-Levitan-Marchenko Integral Equation:
$$(\mathrm{GLM})\qquad K(x, z) + B(x + z) + \int_x^\infty K(x, y)B(y + z)\,dy = 0.$$
Below we will demonstrate how the Inverse Scattering Method can be used to get formulas for the KdV multi-solitons by solving the GLM equation explicitly in the case of reflectionless potentials. But first, a couple of general remarks about solving the GLM equation. We assume in the following that $B$ is rapidly decreasing.

Let $C(\mathbf{R}\times\mathbf{R})$ denote the Banach space of bounded, continuous real-valued functions on $\mathbf{R}\times\mathbf{R}$ with the sup norm. Define $T_B : C(\mathbf{R}\times\mathbf{R}) \to C(\mathbf{R}\times\mathbf{R})$ by the formula
$$T_B(K)(x, z) = -B(x + z) - \int_x^\infty K(x, y)B(y + z)\,dy.$$
Then $K$ satisfies the Gelfand-Levitan-Marchenko equation if and only if it is a fixed-point of $T_B$. It is clear that $T_B$ is Lipschitz with constant $\|B\|_{L^1}$, so if $\|B\|_{L^1} < 1$ then by the Banach Contraction Principle the Gelfand-Levitan-Marchenko equation has a unique solution, and it is the limit of the sequence $K_n$ defined by $K_1(x, z) = -B(x + z)$, $K_{n+1} = T_B(K_n)$.
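The contraction argument translates directly into a convergent numerical scheme. In the sketch below (the kernel $B(\xi) = \frac{1}{10}e^{-\xi}$, the truncation of $[x, \infty)$, and the trapezoidal quadrature are all illustrative choices of my own, with $\|B\|_{L^1} = 0.1 < 1$), the Picard iterates $K_{n+1} = T_B(K_n)$ converge to a discrete fixed point whose GLM residual is zero to machine precision:

```python
import numpy as np

B = lambda s: 0.1*np.exp(-s)            # rapidly decreasing, ||B||_L1 = 0.1 < 1

x = 0.5
y, h = np.linspace(x, x + 40.0, 4001, retstep=True)   # truncation of [x, oo)
z = y                                    # K(x, .) lives on the same grid

# matrix of the integral operator K |-> int_x^oo K(x,y) B(y+z) dy (trapezoid)
w = np.full_like(y, h); w[0] = w[-1] = h/2
M = B(y[None, :] + z[:, None]) * w[None, :]

K = np.zeros_like(z)
for _ in range(30):                      # Picard iteration K <- T_B(K)
    K = -B(x + z) - M @ K

residual = K + B(x + z) + M @ K
print(np.abs(residual).max())            # essentially zero
```

Because the contraction constant is only $0.1$, a couple of dozen iterations already reach the fixed point to full floating-point accuracy.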
Secondly, we note that if the function $B$ is "separable," in the sense that it satisfies an identity of the form $B(x + z) = \sum_{n=1}^N X_n(x)Z_n(z)$, then the Gelfand-Levitan-Marchenko equation takes the form:
$$K(x, z) + \sum_{n=1}^N X_n(x)Z_n(z) + \sum_{n=1}^N Z_n(z)\int_x^\infty K(x, y)X_n(y)\,dy = 0,$$
and it follows that $K(x, z)$ must have the form $K(x, z) = \sum_{n=1}^N L_n(x)Z_n(z)$. If we substitute this for $K$ in the previous equation and define $a_{nm}(x) = \int_x^\infty Z_m(y)X_n(y)\,dy$, then we have reduced the problem to solving $N$ linear equations for the unknown functions $L_n$, namely: $L_n(x) + X_n(x) + \sum_{m=1}^N a_{nm}(x)L_m(x) = 0$, or $X_n(x) + \sum_{m=1}^N A_{nm}(x)L_m(x) = 0$, where $A_{nm}(x) = \delta_{nm} + a_{nm}(x)$. Thus finally we have:
$$K(x, x) = -\sum_{n=1}^N Z_n(x)\sum_{m=1}^N A^{-1}_{nm}(x)X_m(x).$$
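For a single exponential term the reduction is just a $1\times1$ linear solve, and the resulting $K$ can be checked against the GLM equation in closed form. Here is a sketch (the kernel data $X_1(x) = c^2e^{-x}$, $Z_1(z) = e^{-z}$ is a sample choice of my own; the $y$-integral is evaluated analytically since the integrand is a pure exponential):

```python
import numpy as np

c2 = 2.0                                   # an arbitrary sample value of c^2
X = lambda x: c2*np.exp(-x)                # X_1(x)
Z = lambda z: np.exp(-z)                   # Z_1(z)

residuals = []
for x in (0.0, 0.7):
    a11 = c2*np.exp(-2*x)/2.0              # a_11(x) = int_x^oo Z_1 X_1 dy
    L1 = -X(x)/(1.0 + a11)                 # solve (1 + a_11) L_1 + X_1 = 0
    for z in (x, x + 1.0):
        K = L1*Z(z)                        # K(x, z) = L_1(x) Z_1(z)
        # GLM residual; int_x^oo K(x,y) B(y+z) dy in closed form, B(s) = c^2 e^{-s}
        integral = L1*c2*np.exp(-z)*np.exp(-2*x)/2.0
        residuals.append(K + c2*np.exp(-(x + z)) + integral)
print(max(abs(r) for r in residuals))      # zero up to rounding
```

The same linear algebra, with an $N\times N$ matrix $A$, produces the multi-soliton formulas of the next section.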
3.3 An Explicit Formula for KdV Multi-Solitons.
A potential $u$ is called reflectionless if all the reflection coefficients are zero. Because of the relation $b(k, t) = b(k, 0)e^{8ik^3t}$, it follows that if $u(x, t)$ evolves by KdV and is reflectionless at $t = 0$ then it is reflectionless for all $t$. If the discrete eigenvalues of such a potential are $-\kappa_1^2, \ldots, -\kappa_N^2$ and the normalizing constants are $c_1, \ldots, c_N$, then $B(\xi) = \sum_{n=1}^N c_n^2e^{-\kappa_n\xi}$, so $B(x + z) = \sum_{n=1}^N X_n(x)Z_n(z)$, where $X_n(x) = c_n^2e^{-\kappa_nx}$ and $Z_n(z) = e^{-\kappa_nz}$, and we are in the separable case just considered. Recall that:
$$a_{nm}(x) = \int_x^\infty Z_m(y)X_n(y)\,dy = c_n^2\int_x^\infty e^{-(\kappa_n+\kappa_m)y}\,dy = c_n^2\,\frac{e^{-(\kappa_n+\kappa_m)x}}{\kappa_n+\kappa_m},$$
and that
$$A_{nm}(x) = \delta_{nm} + a_{nm}(x) = \delta_{nm} + c_n^2\,\frac{e^{-(\kappa_n+\kappa_m)x}}{\kappa_n+\kappa_m}.$$
Differentiation gives $\frac{d}{dx}A_{nm}(x) = -c_n^2e^{-(\kappa_n+\kappa_m)x}$, so by the formula above:
$$K(x, x) = -\sum_{n=1}^N Z_n(x)\sum_{m=1}^N A^{-1}_{nm}(x)X_m(x) = -\sum_{n=1}^N e^{-\kappa_nx}\sum_{m=1}^N A^{-1}_{nm}(x)\left(c_m^2e^{-\kappa_mx}\right) = \sum_{n=1}^N\sum_{m=1}^N A^{-1}_{nm}(x)\frac{d}{dx}A_{mn}(x) = \mathrm{tr}\left(A^{-1}(x)\frac{d}{dx}A(x)\right) = \frac{1}{\det A(x)}\frac{d}{dx}\det A(x) = \frac{d}{dx}\log\det A(x),$$
and so $u(x) = -2\frac{d}{dx}K(x, x) = -2\frac{d^2}{dx^2}\log\det A(x)$.
If $N = 1$ and we put $a = 2\kappa_1$, it is easy to see that this formula reduces (up to a translation of $x$) to our earlier formula for travelling wave solutions of the KdV equation: $u(x, t) = -\frac{a^2}{2}\,\mathrm{sech}^2\!\left(\frac{a}{2}(x - a^2t)\right)$.
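This reduction is easy to confirm numerically. With $N = 1$ we have $\det A(x) = 1 + \frac{c^2}{2\kappa}e^{-2\kappa x}$, and $-2(\log\det A)''$ should equal $-2\kappa^2\,\mathrm{sech}^2(\kappa x - \delta)$ with phase $\delta = \frac{1}{2}\log\frac{c^2}{2\kappa}$; the sketch below compares the two (sample values of $\kappa$ and $c^2$, second derivative by central differences):

```python
import numpy as np

kappa, c2 = 1.3, 0.8                       # sample values of kappa_1 and c_1^2
delta = 0.5*np.log(c2/(2*kappa))

def u_logdet(x, h=1e-4):
    # u = -2 (log det A)'' with det A = 1 + (c^2/2 kappa) e^{-2 kappa x}
    f = lambda s: np.log(1 + c2/(2*kappa)*np.exp(-2*kappa*s))
    return -2*(f(x + h) - 2*f(x) + f(x - h))/h**2

x = np.linspace(-5, 5, 101)
u_exact = -2*kappa**2/np.cosh(kappa*x - delta)**2
err = np.abs(u_logdet(x) - u_exact).max()
print(err)   # small
```

So the $N = 1$ log-determinant potential is exactly a translated travelling wave of amplitude $2\kappa^2$.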
We can also use it to find explicit solutions $u(x, t)$ for $N = 2$. Let $g_i(x, t) = \exp(8\kappa_i^3t - 2\kappa_ix)$, and set $A = \frac{(\kappa_1-\kappa_2)^2}{(\kappa_1+\kappa_2)^2}$; then
$$u(x, t) = -8\,\frac{\kappa_1^2g_1 + \kappa_2^2g_2 + 2(\kappa_1-\kappa_2)^2g_1g_2 + Ag_1g_2\left(\kappa_1^2g_2 + \kappa_2^2g_1\right)}{\left(1 + g_1 + g_2 + Ag_1g_2\right)^2}.$$
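As a consistency check, this rational expression should agree with $-2\frac{d^2}{dx^2}\log(1 + g_1 + g_2 + Ag_1g_2)$; the sketch below compares the two numerically for sample values of $\kappa_1$, $\kappa_2$ (second derivative by central differences):

```python
import numpy as np

k1, k2 = 1.0, 2.0
A = ((k1 - k2)/(k1 + k2))**2

def g(x, t):
    return np.exp(8*k1**3*t - 2*k1*x), np.exp(8*k2**3*t - 2*k2*x)

def u_explicit(x, t):
    g1, g2 = g(x, t)
    num = (k1**2*g1 + k2**2*g2 + 2*(k1 - k2)**2*g1*g2
           + A*g1*g2*(k1**2*g2 + k2**2*g1))
    return -8*num/(1 + g1 + g2 + A*g1*g2)**2

def u_logdet(x, t, h=1e-4):
    f = lambda s: np.log(1 + sum(g(s, t)) + A*np.prod(g(s, t), axis=0))
    return -2*(f(x + h) - 2*f(x) + f(x - h))/h**2

x = np.linspace(-4, 4, 81)
diff = np.abs(u_explicit(x, 0.3) - u_logdet(x, 0.3)).max()
print(diff)   # small
```

The two evaluations coincide, confirming that the displayed fraction is just the expanded form of the log-determinant formula.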
For general $N$ the solutions $u(x, t)$ that we get this way are referred to as the pure $N$-soliton solutions of the KdV equation. It is not hard to show by an asymptotic analysis that for large negative times and large positive times they behave as a superposition of the above travelling wave solutions, and that after the larger, faster moving waves have all passed through the slower moving shorter ones and they have become well-separated, the only trace of their interactions are certain predictable "phase-shifts," i.e., certain constant translations of the locations of their maxima from where they would have been had they not interacted.
Section 4
The ZS-AKNS Scheme
4.1 Zero Curvature Lax Equations
In this section, we will explain a general technique developed by Zakharov and Shabat and by Ablowitz, Kaup, Newell, and Segur for generating Lax Equations of a special type (called ZCC, or Zero Curvature Condition equations). The Cauchy Problem for ZCC equations can be solved using a generalization of the Inverse Scattering Transform. Zakharov and Shabat introduced this method to study an important special equation (the so-called Nonlinear Schrödinger Equation, or NLS) [ZS], and soon thereafter Ablowitz, Kaup, Newell, and Segur [AKNS1] showed that one relatively minor modification of the Zakharov and Shabat approach recovers the theory of the KdV equation, while another leads to an Inverse Scattering Theory analysis for a third very important evolution equation, the Sine-Gordon Equation (SGE). They went on to develop the Zakharov and Shabat technique into a general method for PDE with values in $2\times2$-matrix groups [AKNS2], and Zakharov and Shabat further generalized it to the case of $n\times n$-matrix groups. Following current custom, we will refer to this method as the ZS-AKNS Scheme.
To prepare for the introduction of the ZS-AKNS Scheme, we next develop the infrastructure on which Zero Curvature Equations are based. We fix a matrix Lie group $G$ and denote its Lie algebra by $\mathcal{G}$. That is, $G$ is some closed subgroup of the group $\mathrm{GL}(n, \mathbf{C})$ of all $n\times n$ complex matrices, and $\mathcal{G}$ is the set of all $n\times n$ complex matrices $X$ such that $\exp(tX)$ is in $G$ for all real $t$. If you feel more comfortable working with a concrete example, think of $G$ as the group $\mathrm{SL}(n, \mathbf{C})$ of all $n\times n$ complex matrices of determinant $1$, and $\mathcal{G}$ as its Lie algebra $\mathrm{sl}(n, \mathbf{C})$ of all $n\times n$ complex matrices of trace zero. In fact, for the original ZS-AKNS Scheme, $G = \mathrm{SL}(2, \mathbf{C})$ and $\mathcal{G} = \mathrm{sl}(2, \mathbf{C})$, and we will carry out most of the later discussion with these choices, but for what we will do next the precise nature of $G$ is irrelevant.

Let $\nabla$ be a flat connection for the trivial principal bundle $\mathbf{R}^2\times G$. Then we can write $\nabla = d - \omega$, where $\omega$ is a 1-form on $\mathbf{R}^2$ with values in the Lie algebra $\mathcal{G}$. Using coordinates $(x, t)$ for $\mathbf{R}^2$ we can then write $\omega = A\,dx + B\,dt$, where $A$ and $B$ are smooth maps of $\mathbf{R}^2$ into $\mathcal{G}$.
If $X$ is a vector field on $\mathbf{R}^2$, then the covariant derivative operator in the direction $X$ is $\nabla_X = \partial_X - \omega(X)$, and in particular, the covariant derivatives in the coordinate directions $\frac{\partial}{\partial x}$ and $\frac{\partial}{\partial t}$ are $\nabla_x = \frac{\partial}{\partial x} - A$ and $\nabla_t = \frac{\partial}{\partial t} - B$.
Since we are assuming that $\nabla$ is flat, it determines a global parallelism. If $(x_0, t_0)$ is any point of $\mathbf{R}^2$, then we have a map $\psi : \mathbf{R}^2 \to G$, where $\psi(x, t)$ is the parallel translation operator from $(x_0, t_0)$ to $(x, t)$. Considered as a section of our trivial principal bundle, $\psi$ is covariant constant, i.e., $\nabla_X\psi = 0$ for any tangent vector field $X$. In particular, taking $X$ to be $\frac{\partial}{\partial x}$ and $\frac{\partial}{\partial t}$ gives the relations $\psi_x = A\psi$ and $\psi_t = B\psi$.
There are many equivalent ways to express the flatness of the connection $\nabla$. On the one hand, the curvature 2-form $d\omega - \omega\wedge\omega$ is zero. Equivalently, the covariant derivative operators in the $\frac{\partial}{\partial x}$ and $\frac{\partial}{\partial t}$ directions commute, i.e., $\left[\frac{\partial}{\partial x} - A, \frac{\partial}{\partial t} - B\right] = 0$, or finally, equating the cross-derivatives of $\psi$, $(A\psi)_t = \psi_{xt} = \psi_{tx} = (B\psi)_x$. Expanding the latter gives $A_t\psi + A\psi_t = B_x\psi + B\psi_x$, or $A_t\psi + AB\psi = B_x\psi + BA\psi$, and right multiplying by $\psi^{-1}$ we arrive at the so-called Zero-Curvature Condition: $A_t - B_x + [A, B] = 0$. Rewriting this as $A_t = B_x + [B, A]$, and noting that $\left[B, \frac{\partial}{\partial x}\right] = -B_x$, we see that the Zero-Curvature Condition has an equivalent formulation as a Lax Equation:
$$(\mathrm{ZCC})\qquad \left(\frac{\partial}{\partial x} - A\right)_t = \left[B,\ \frac{\partial}{\partial x} - A\right],$$
and it is ZCC that plays the central rôle in the ZS-AKNS Scheme.
Recall what ZCC is telling us. If we look at $t$ as a parameter, then the operator $\frac{\partial}{\partial x} - A(x, t_0)$ is the covariant derivative in the $x$-direction along the line $t = t_0$, and the Lax Equation ZCC says that as a function of $t_0$ these operators are all conjugate. Moreover the operator $\psi(t_0, t_1)$ implementing the conjugation between the time $t_0$ and the time $t_1$ satisfies $\psi_t = B\psi$, which means it is parallel translation from $(x, t_0)$ to $(x, t_1)$ computed by going "vertically" along the curve $t \mapsto (x, t)$. But since $\frac{\partial}{\partial x} - A(x, t_0)$ generates parallel translation along the horizontal curve $x \mapsto (x, t_0)$, what this amounts to is the statement that parallel translating horizontally from $(x_0, t_0)$ to $(x_1, t_0)$ is the same as parallel translation vertically from $(x_0, t_0)$ to $(x_0, t_1)$, followed by parallel translation horizontally from $(x_0, t_1)$ to $(x_1, t_1)$, followed by parallel translation vertically from $(x_1, t_1)$ to $(x_1, t_0)$. Thus, in the case of ZCC, the standard interpretation of the meaning of a Lax Equation reduces to a special case of the theorem that if a connection has zero curvature, then the holonomy around a contractible path is trivial.
4.2 Some ZS-AKNS Examples
The ZS-AKNS Scheme is a method for solving the initial value problem for certain (hierarchies of) evolution equations on a space of "potentials" $P$. In general $P$ will be of the form $\mathcal{S}(\mathbf{R}, V)$, where $V$ is some finite dimensional real or complex vector space, i.e., each potential $u$ will be a map $x \mapsto u(x)$ of Schwartz class from $\mathbf{R}$ into $V$. (A function $u$ with values in $V$ is of Schwartz class if, for each linear functional $\ell$ on $V$, the scalar valued function $\ell\circ u$ is of Schwartz class, or equivalently if, when we write $u$ in terms of a fixed basis for $V$, its components are of Schwartz class.) The evolution equations in question are of the form $u_t = F(u)$, where the map $F : P \to P$ is a "polynomial differential operator," i.e., it has the form $F(u) = p(u, u_x, u_{xx}, \ldots)$, where $p$ is a polynomial mapping of $V$ to itself.

When we say we want to solve the initial value (or Cauchy) problem for such an equation, we of course mean that given $u_0 = u(x, 0)$ in $P$ we want to find a smooth map $t \mapsto u(t) = u(x, t)$ of $\mathbf{R}$ to $P$ with $u(0) = u_0$ and $u_t(x, t) = p(u(x, t), u_x(x, t), u_{xx}(x, t), \ldots)$. In essence, we want to think of $F$ as a vector field on $P$ and construct the flow $\phi_t$ that it generates. (Of course, if $P$ were a finite dimensional manifold, then we could construct the flow $\phi_t$ by solving a system of ODEs, and as we shall see, the ZS-AKNS Scheme allows us in certain cases to solve the PDE $u_t = p(u, u_x, u_{xx}, \ldots)$ by reducing it to ODEs.)
The first and crucial step in using the ZS-AKNS Scheme to study a particular such evolution equation consists in setting up an interpretation of $A$ and $B$ so that the equation $u_t = p(u, u_x, u_{xx}, \ldots)$ becomes a special case of ZCC.
To accomplish this, we first identify $V$ with a subspace of $\mathcal{G}$ (so that $P = \mathcal{S}(\mathbf{R}, V)$ becomes a subspace of $\mathcal{S}(\mathbf{R}, \mathcal{G})$), and define a map $u \mapsto A(u)$ of $P$ into $C^\infty(\mathbf{R}, \mathcal{G})$ of the form $A(u) = \mathrm{const} + u$, so that if $u$ depends parametrically on $t$, then $\left(\frac{\partial}{\partial x} - A(u)\right)_t = -u_t$. Finally (and this is the difficult part) we must define a map $u \mapsto B(u)$ of $P$ into $C^\infty(\mathbf{R}, \mathcal{G})$ so that $\left[B(u), \frac{\partial}{\partial x} - A(u)\right] = -p(u, u_x, u_{xx}, \ldots)$.
To interpret the latter equation correctly, and in particular to make sense of the commutator bracket in a manner consistent with our earlier interpretation of $A$ and $B$, it is important to be clear about the interpretation of $A(u)$ and $B(u)$ as operators, and in particular to be precise about the space on which they are operating. This is just the space $C^\infty(\mathbf{R}, \mathrm{gl}(2, \mathbf{C}))$ of smooth maps $\psi$ of $\mathbf{R}$ into the space of all complex $2\times2$ matrices. Namely, we identify $A(u)$ with the zero-order differential operator mapping $\psi$ to $A(u)\psi$, the pointwise matrix product of $A(u)(x)$ and $\psi(x)$, and similarly with $B(u)$. (This is in complete analogy with the KdV situation, where in interpreting the Schrödinger operator, we identified our potential $u$ with the operator of multiplication by $u$.) Of course $\left(\frac{\partial}{\partial x}\psi\right)(x) = \psi_x(x)$.
We will now illustrate this with three examples: the KdV equation, the Nonlinear Schrödinger Equation (NLS), and the Sine-Gordon Equation (SGE). In each case $V$ will be a one-dimensional space that is embedded in the space of off-diagonal complex matrices $\begin{pmatrix} 0 & b \\ c & 0 \end{pmatrix}$, and in each case $A(u) = a\lambda + u$, where $\lambda$ is a complex parameter, and $a$ is the constant, diagonal, trace zero matrix $a = \begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix}$.
Example 1. [AKNS1] Take $u(x) = \begin{pmatrix} 0 & q(x) \\ -1 & 0 \end{pmatrix}$, and let
$$B(u) = a\lambda^3 + u\lambda^2 + \begin{pmatrix} \frac{i}{2}q & \frac{i}{2}q_x \\ 0 & -\frac{i}{2}q \end{pmatrix}\lambda + \begin{pmatrix} -\frac{q_x}{4} & -\frac{q^2}{2} - \frac{q_{xx}}{4} \\ \frac{q}{2} & \frac{q_x}{4} \end{pmatrix}.$$
Then an easy computation shows that ZCC is satisfied if and only if $q$ satisfies KdV in the form $q_t = -\frac{1}{4}(6qq_x + q_{xxx})$.
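The "easy computation" can be delegated to a computer algebra system. The following sympy sketch (a verification aid of my own) forms $A_t - B_x + [A, B]$ for the data of Example 1 and checks that it vanishes identically once $q_t = -\frac{1}{4}(6qq_x + q_{xxx})$ is imposed:

```python
import sympy as sp

x, t, lam = sp.symbols('x t lam')
q = sp.Function('q')(x, t)
qx = sp.diff(q, x)

a = sp.Matrix([[-sp.I, 0], [0, sp.I]])
u = sp.Matrix([[0, q], [-1, 0]])
A = a*lam + u
B = (a*lam**3 + u*lam**2
     + sp.Matrix([[sp.I*q/2, sp.I*qx/2], [0, -sp.I*q/2]])*lam
     + sp.Matrix([[-qx/4, -q**2/2 - sp.diff(q, x, 2)/4],
                  [q/2, qx/4]]))

zcc = sp.diff(A, t) - sp.diff(B, x) + A*B - B*A
# impose q_t = -(6 q q_x + q_xxx)/4 and check that ZCC then vanishes
on_shell = zcc.subs(sp.diff(q, t), -(6*q*qx + sp.diff(q, x, 3))/4)
assert sp.simplify(on_shell) == sp.zeros(2, 2)
print("ZCC holds iff q_t = -(1/4)(6 q q_x + q_xxx)")
```

Before the substitution, the only nonzero entry of the simplified matrix is (a multiple of) $q_t + \frac{1}{4}(6qq_x + q_{xxx})$, which is the KdV equation itself.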
Example 2. [ZS] Take $u(x) = \begin{pmatrix} 0 & q(x) \\ -\bar q(x) & 0 \end{pmatrix}$, and let
$$B(u) = a\lambda^2 + u\lambda + \begin{pmatrix} \frac{i}{2}|q|^2 & \frac{i}{2}q_x \\ \frac{i}{2}\bar q_x & -\frac{i}{2}|q|^2 \end{pmatrix}.$$
In this case ZCC is satisfied if and only if $q(x, t)$ satisfies the so-called Nonlinear Schrödinger Equation (NLS), $q_t = \frac{i}{2}\left(q_{xx} + 2|q|^2q\right)$.
Example 3. [AKNS1] Take $u = \begin{pmatrix} 0 & -\frac{q_x(x)}{2} \\ \frac{q_x(x)}{2} & 0 \end{pmatrix}$, and let $B(u) = \frac{1}{\lambda}v$, where $v(x) = \frac{i}{4}\begin{pmatrix} \cos q(x) & \sin q(x) \\ \sin q(x) & -\cos q(x) \end{pmatrix}$. In this case, ZCC is satisfied if and only if $q$ satisfies the Sine-Gordon Equation (SGE) in the form $q_{xt} = \sin q$.
Appendix A
Symplectic Manifolds and Hamiltonian Systems
If $P$ is a finite dimensional smooth manifold, then a symplectic structure for $P$ is a closed non-degenerate 2-form $\omega$ on $P$. Non-degenerate means that at each point $p$ of $P$, the map $v \mapsto i_v\omega = \omega(v, \cdot)$ of $TP_p$ to its dual $T^*P_p$ is a linear isomorphism. It then follows that if $F : P \to \mathbf{R}$ is a smooth real-valued function on $P$, then there is a uniquely determined vector field $X$ on $P$ such that $i_X\omega = dF$, and we call $X$ the symplectic gradient of $F$ and denote it by $\nabla_sF$.
By an important theorem of Darboux ([Ar], Chapter 8), in the neighborhood of any point of $P$ there exist "canonical coordinates" $q_1, \ldots, q_n, p_1, \ldots, p_n$ in which $\omega$ has the form $\sum_i dp_i\wedge dq_i$, and in these coordinates $\nabla_sH = \sum_i\left(\frac{\partial H}{\partial p_i}\frac{\partial}{\partial q_i} - \frac{\partial H}{\partial q_i}\frac{\partial}{\partial p_i}\right)$, or equivalently the solution curves of $\nabla_sH$ satisfy Hamilton's equations:
$$\frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i},\qquad \frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}.$$
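In canonical coordinates the flow of $\nabla_sH$ is computed by integrating exactly this ODE system. The following sketch (the pendulum Hamiltonian $H = p^2/2 - \cos q$ and the RK4 integrator are illustrative choices of my own) checks that $H$ is numerically constant along a computed solution curve:

```python
import numpy as np

def grad_s(state):                      # symplectic gradient in canonical coords
    q, p = state
    return np.array([p, -np.sin(q)])    # (dH/dp, -dH/dq) for H = p^2/2 - cos q

def rk4_step(y, h):
    k1 = grad_s(y); k2 = grad_s(y + h/2*k1)
    k3 = grad_s(y + h/2*k2); k4 = grad_s(y + h*k3)
    return y + h/6*(k1 + 2*k2 + 2*k3 + k4)

H = lambda s: s[1]**2/2 - np.cos(s[0])
y = np.array([1.0, 0.0])
E0 = H(y)
for _ in range(10000):                  # integrate to t = 10
    y = rk4_step(y, 1e-3)
print(abs(H(y) - E0))                   # tiny: H is conserved along the flow
```

That $H$ is constant along its own flow is of course just the statement $\{H, H\} = 0$, which reappears below.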
We next recall some facts about Lie derivatives. If $X$ is a smooth vector field on a smooth manifold $M$, generating a flow $\phi_t$, and if $T$ is any smooth tensor field on $M$, then the Lie derivative of $T$ with respect to $X$ is the tensor field $L_XT = \frac{d}{dt}\big|_{t=0}\phi_t^*(T)$. If $L_XT = 0$, then we shall say that $X$ preserves $T$, for this is the necessary and sufficient condition that the flow $\phi_t$ preserve $T$, i.e., that $\phi_t^*(T) = T$ for all $t$. There is a famous formula of Cartan for the Lie derivative operator $L_X$ restricted to differential forms, identifying it with the anti-commutator of the exterior derivative operator $d$ and the interior product operator $i_X$:
$$L_X = d\,i_X + i_X\,d.$$
If $\omega$ is a closed $p$-form, this gives $L_X\omega = d(i_X\omega)$, so $X$ preserves $\omega$ if and only if the $(p-1)$-form $i_X\omega$ is closed. In particular this demonstrates the important fact that a vector field $X$ on a symplectic manifold $P$ is symplectic (i.e., preserves the symplectic form $\omega$) if and only if $i_X\omega$ is a closed 1-form (and hence, at least locally, the differential of a smooth function). The well known identity $L_{[X,Y]} = [L_X, L_Y]$ implies that the space of symplectic vector fields on $P$ is a Lie algebra, which we can think of as the Lie algebra of the group of symplectic diffeomorphisms of $P$. It is an interesting and useful fact that the space of Hamiltonian vector fields on $P$, i.e., those for which $i_X\omega$ is an exact form, $dF$, is not only a linear subspace, but is even a Lie subalgebra of the symplectic vector fields, and moreover the commutator subalgebra of the symplectic vector fields is included in the Hamiltonian vector fields. To demonstrate this we shall show that if $i_X\omega$ and $i_Y\omega$ are closed forms, then $i_{[X,Y]}\omega$ is not only closed but even exact, and in fact it is the differential of the function $\omega(Y, X)$. First, using the fact that Lie derivation satisfies a Leibnitz formula with respect to any natural bilinear operation on tensors (so in particular with respect to the interior product), $L_X(i_Y\omega) = i_{L_XY}\omega + i_Y(L_X\omega)$. Thus, since $L_XY = [X, Y]$ and $L_X\omega = 0$, $L_X(i_Y\omega) = i_{[X,Y]}\omega$. Finally, since $d(i_Y\omega) = 0$, Cartan's formula for $L_X(i_Y\omega)$ gives
$$i_{[X,Y]}\omega = d\,i_X(i_Y\omega) = d(\omega(Y, X)).$$
5.0.1 Remark. Cartan's Formula can be proved easily as follows. There is an important involutory automorphism $\alpha \mapsto \tilde\alpha$ of the algebra $\mathcal{A}$ of differential forms on a manifold. Namely, it is the identity on forms of even degree and is minus the identity on forms of odd degree. A linear map $\delta : \mathcal{A} \to \mathcal{A}$ is called an anti-derivation if $\delta(\alpha\beta) = \delta(\alpha)\beta + \tilde\alpha\,\delta(\beta)$. It is of course well-known that the exterior derivative, $d$, is an anti-derivation (of degree $+1$), and an easy check shows that the interior product $i_X$ is an anti-derivation (of degree $-1$). Moreover, the anti-commutator of two anti-derivations is clearly a derivation, so that $L_X$ and $d\,i_X + i_X\,d$ are both derivations of $\mathcal{A}$, and hence to prove they are equal it suffices to check that they agree on a set of generators of $\mathcal{A}$. But $\mathcal{A}$ is generated by forms of degree zero (i.e., functions) and the differentials of functions, and it is obvious that $L_X$ and $d\,i_X + i_X\,d$ agree on these.
We shall also have to deal with symplectic structures on infinite dimensional manifolds. In this case we still require that $\omega$ is a closed form, and we also still require that $\omega$ is weakly non-degenerate, meaning that for each point $p$ of $P$, the map $v \mapsto i_v\omega$ of $TP_p$ to $TP_p^*$ is injective. In finite dimensions this of course implies that $\omega$ is strongly non-degenerate (meaning that the latter map is in fact an isomorphism), but that is rarely the case in infinite dimensions, so we will not assume it. Thus, if $F$ is a smooth function on $P$, it does not automatically follow that there is a symplectic gradient vector field $\nabla_sF$ on $P$ satisfying $\omega((\nabla_sF)_p, v) = dF_p(v)$ for all $v$ in $TP_p$; this must be proved separately. However, if a symplectic gradient does exist, then weak non-degeneracy shows that it is unique. In the infinite dimensional setting we call a function $F : P \to \mathbf{R}$ a Hamiltonian function if it has a symplectic gradient, and vector fields of the form $\nabla_sF$ will be called Hamiltonian vector fields. Obviously the space of Hamiltonian functions is linear, and in fact the formula $d(FG) = F\,dG + G\,dF$ shows that it is even an algebra, and that $\nabla_s(FG) = F\nabla_sG + G\nabla_sF$.
We shall call a vector field $X$ on $P$ symplectic if the 1-form $i_X\omega$ is closed but not necessarily exact, for as we have seen, this is the condition for the flow generated by $X$ to preserve $\omega$.
Of course if P is a vector space, the distinction between Hamiltonian and symplectic disappears: if $i_X\Omega$ is closed, then $H(p) = \int_0^1 \Omega_{tp}(X_{tp}, p)\,dt$ defines a Hamiltonian function with $\nabla_sH = X$. Moreover, in this case it is usually straightforward to check whether $i_X\Omega$ is closed. Given u, v in P, consider them as constant vector fields on P, so that $[u,v] = 0$. Then the formula $d\omega(u,v) = u(\omega(v)) - v(\omega(u)) - \omega([u,v])$ for the exterior derivative of a 1-form $\omega$ shows that symmetry of $\frac{d}{dt}\big|_{t=0}\Omega(X_{p+tu}, v)$ in u and v is necessary and sufficient for $i_X\Omega$ to be closed (and hence exact). In case $\Omega$ is a constant form (i.e., $\Omega_p(u,v)$ is independent of p), then $\frac{d}{dt}\big|_{t=0}\Omega(X_{p+tu}, v) = \Omega((DX_p)(u), v)$, where $(DX)_p(u) = \frac{d}{dt}\big|_{t=0}X_{p+tu}$ is the differential of X at p. Since $\Omega$ is skew-symmetric in u and v, this shows that if $\Omega$ is constant, then X is Hamiltonian if and only if $(DX)_p$ is skew-adjoint with respect to $\Omega$.
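For a constant $\Omega$ this criterion is easy to test in coordinates. The following sketch is our own illustration, not part of the text; the function names and the example matrices are ours. It checks, for the standard symplectic form on $\mathbf{R}^2$, that the linear field coming from the harmonic oscillator Hamiltonian has an $\Omega$-skew-adjoint differential while the radial field does not.

```python
# Check Omega-skew-adjointness of the differential of a linear vector field
# X(p) = A p on R^2, where Omega(u, v) = u1 v2 - u2 v1 is the standard
# symplectic form.  (Illustrative sketch; the examples are our own.)

def omega(u, v):
    # standard symplectic form on R^2
    return u[0] * v[1] - u[1] * v[0]

def apply(A, u):
    return (A[0][0] * u[0] + A[0][1] * u[1],
            A[1][0] * u[0] + A[1][1] * u[1])

def is_omega_skew(A):
    # by bilinearity it suffices to test Omega(Au, v) + Omega(u, Av) = 0
    # on a basis of R^2
    basis = ((1.0, 0.0), (0.0, 1.0))
    return all(abs(omega(apply(A, u), v) + omega(u, apply(A, v))) < 1e-12
               for u in basis for v in basis)

# DX for the Hamiltonian field of H(q, p) = (q^2 + p^2)/2, namely X = (p, -q):
A_hamiltonian = ((0.0, 1.0), (-1.0, 0.0))
# DX for the radial field X = (q, p), which is not Hamiltonian:
A_radial = ((1.0, 0.0), (0.0, 1.0))
```

Here `is_omega_skew(A_hamiltonian)` is true and `is_omega_skew(A_radial)` is false, in line with the criterion just stated.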
If two smooth real-valued functions $F_1$ and $F_2$ on a symplectic manifold P are Hamiltonian, i.e., if they have symplectic gradients $\nabla_sF_1$ and $\nabla_sF_2$, then they determine a third function on P, called their Poisson bracket, defined by:
$$\{F_1, F_2\} = \Omega(\nabla_sF_2, \nabla_sF_1).$$
The formula $i_{[X,Y]}\Omega = d(\Omega(Y,X))$ shows that the Poisson bracket is also a Hamiltonian function, and in fact
$$\nabla_s\{F_1, F_2\} = [\nabla_sF_1, \nabla_sF_2].$$
What this formula says is that Hamiltonian functions $F : P \to \mathbf{R}$ are not only a commutative and associative algebra under pointwise product, but also a Lie algebra under Poisson bracket, and $F \mapsto \nabla_sF$ is a Lie algebra homomorphism of this Lie algebra onto the Lie algebra of Hamiltonian vector fields on P. In particular, we see that the Poisson bracket satisfies the Jacobi identity,
$$\{\{F_1, F_2\}, F_3\} + \{\{F_2, F_3\}, F_1\} + \{\{F_3, F_1\}, F_2\} = 0,$$
and the Leibnitz Rule $\nabla_s(FG) = F\nabla_sG + G\nabla_sF$ gives:
$$\{F_1, F_2F_3\} = \{F_1, F_2\}F_3 + F_2\{F_1, F_3\},$$
which we will also call the Leibnitz Rule.
Since $\{F_1, F_2\} = \Omega(\nabla_sF_2, \nabla_sF_1) = dF_2(\nabla_sF_1) = \nabla_sF_1(F_2)$, we can interpret the Poisson bracket of $F_1$ and $F_2$ as the rate of change of $F_2$ along the solution curves of the vector field $\nabla_sF_1$. If we are considering some fixed Hamiltonian system $\frac{dx}{dt} = (\nabla_sH)_x$ on P, then we can write this as $\frac{dF}{dt} = \{H, F\}$, and we see that the vanishing of the Poisson bracket $\{H, F\}$ is the necessary and sufficient condition for F to be a constant of the motion. By the Jacobi identity, a corollary to this observation is that the Poisson bracket of two constants of the motion is also a constant of the motion. And since $\{H, H\} = 0$, H itself is always a constant of the motion.
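In canonical coordinates (q, p) on $\mathbf{R}^2$ this interpretation can be checked numerically. The sketch below is our own illustration (the particular functions H, F, G are hypothetical examples), using the sign convention for the bracket in canonical coordinates that makes $dF/dt = \{H, F\}$ agree with Hamilton's equations.

```python
# Poisson bracket {F, G} = F_p G_q - F_q G_p on R^2, computed with centered
# differences; with this sign convention dG/dt = {H, G} along the flow of H.
# (Our own illustrative sketch.)

def pbracket(F, G, q, p, h=1e-5):
    Fq = (F(q + h, p) - F(q - h, p)) / (2 * h)
    Fp = (F(q, p + h) - F(q, p - h)) / (2 * h)
    Gq = (G(q + h, p) - G(q - h, p)) / (2 * h)
    Gp = (G(q, p + h) - G(q, p - h)) / (2 * h)
    return Fp * Gq - Fq * Gp

H = lambda q, p: 0.5 * (q * q + p * p)   # harmonic oscillator energy
F = lambda q, p: q * q + p * p           # twice the energy: in involution with H
G = lambda q, p: q                       # the coordinate q: {H, q} = p, matching
                                         # Hamilton's equation dq/dt = dH/dp
```

Evaluating `pbracket(H, H, q, p)` and `pbracket(H, F, q, p)` gives (numerically) zero, so H and F are constants of the motion, while `pbracket(H, G, q, p)` returns p, the rate of change of q along the flow.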
Since the Poisson bracket is skew-symmetric, $\{F_1, F_2\}$ is zero if and only if $\{F_2, F_1\}$ is zero, and in this case we say that $F_1$ and $F_2$ are in involution. More generally, k Hamiltonian functions $F_1, \ldots, F_k$ are said to be in involution if all of the Poisson brackets $\{F_i, F_j\}$ vanish. Note that since $\nabla_s\{F_i, F_j\} = [\nabla_sF_i, \nabla_sF_j]$, if the $F_i$ are in involution then the vector fields $\nabla_sF_i$ commute, i.e., $[\nabla_sF_i, \nabla_sF_j] = 0$, or equivalently the flows they generate commute.
In particular we see that if $F_1, \ldots, F_n$ are in involution and if each $\nabla_sF_i$ generates a one-parameter group of diffeomorphisms $\phi^i_t$ of P, then $(t_1, \ldots, t_n) \mapsto \phi^1_{t_1}\phi^2_{t_2}\cdots\phi^n_{t_n}$ defines a symplectic action of the abelian group $\mathbf{R}^n$ on P.
Suppose P is a symplectic manifold of dimension 2n and that there exist n functions $F_i$ such that the $dF_i$ are everywhere linearly independent. If the functions $F_i$ are in involution with each other and with a function H, then the so-called Arnold-Liouville Theorem ([Ar], Chapter 10) states that the Hamiltonian system $\nabla_sH$ is completely integrable in the sense mentioned earlier, i.e., there exist action-angle variables $q_1, \ldots, q_n, p_1, \ldots, p_n$. In fact, complete integrability of a 2n dimensional Hamiltonian system is often defined as the existence of n functionally independent constants of the motion in involution.
Appendix B
Wave Equations as Continuum Limits of Lattice Models
Introduction.
In this appendix we will indicate one way that physically natural wave equations can be deduced more or less directly from first principles. While much of what we do below can be generalized to n space dimensions, we shall, as usual, concentrate on the 1-dimensional case.
We imagine a 1-dimensional physical medium that it may help to think of as a stretched piano wire or a steel bar of small cross-section, but as we shall see later there are other useful interpretations. We think of this initially as a continuum that is placed along the x-axis from 0 to its length $\ell$. We assume that the property of this system that interests us can be described by a scalar field u, which we will usually take to be real-valued (although the case that u is complex-valued is also of interest). The value of u at the point $x \in [0, \ell]$ and time t is u(x,t). The time evolution of the system will then be described by some sort of evolution PDE for u of the general form $u_{tt} = F(u)$ or $u_t = G(u)$, and we want to deduce the general form of the right hand side from basic physical laws.
Our strategy will be to take an atomistic viewpoint: real physical media, after all, are discrete, not continuous, being made up of particles (molecules) arranged in space with a very small average distance h separating nearest neighbors. In our simplified model, we will suppose that there are N identical particles, located at the lattice points $p_i = ih$, where $0 \le i \le N-1$ and $h = \ell/(N-1)$. We shall refer to the particle at $p_i$ as $P_i$.
We assume each $P_i$ is a Newtonian particle with a mass m, and that the property we are concerned with is associated to a degree of freedom of the particle that is uncoupled from its other degrees of freedom, and that we will denote by $x_i$. We do not at this point fix a precise interpretation of $x_i$ in terms of, say, the spatial location of the particle $P_i$. In a particular model it will often represent the deviation of $P_i$ from an equilibrium configuration. The relation between the field u and the particles $P_i$ is that u should interpolate the values $x_i$ at the points $p_i$, i.e., $u(p_i, t) = x_i(t)$.
Note that if the density of our medium is $\rho$, then the mass of a length h between $p_i$ and $p_{i+1}$ is $m = \rho h$, which we think of as being concentrated in the particle $P_i$. Eventually we will pass to a continuum limit by letting N tend to infinity (and hence $h \to 0$), keeping the density $\rho$ fixed.
Since the evolution of the $x_i$ in time is governed by Newton's Second Law of Motion, force = mass $\times$ acceleration, i.e., $m\ddot x_i = F_i$, to specify a model we must specify the force, $F_i$, on each particle $P_i$. Assuming the system is isolated, $F_i$ will be a sum of terms, $F_i = \sum_j F_{i,j}$, where $F_{i,j}$ is the force on the particle $P_i$ due to the particle $P_j$, and by Newton's Third Law, equating action and reaction, $F_{i,j} = -F_{j,i}$. We will restrict attention to the case of so-called nearest-neighbor interactions. This means that we assume $F_{i,j} = 0$ unless $P_j$ is one of the two nearest neighbors of $P_i$, i.e., $j = i+1$ or $j = i-1$, so $F_i = F_{i,i+1} + F_{i,i-1} = F_{i,i+1} - F_{i-1,i}$. Since the particles are identical, there is a universal force law expressing $F_{i,i+1}$ as a fixed function of $x_i$ and $x_{i+1}$. We will make the natural assumption that our force law is translation invariant, so it only depends on the difference $x_i - x_{i+1}$: $F_{i,i+1} = F(x_i - x_{i+1})$,
where F(0) = 0. Thus $F_i = F(x_i - x_{i+1}) - F(x_{i-1} - x_i)$, and to complete the specification of a model, we may specify either the force function F or, equivalently, the potential function $V(r) = -\int_0^r F(y)\,dy$ (in terms of which $F(y) = -V'(y)$).
Now, as we remarked above, in the models we shall consider, $x_i$ will be a measure of the deviation from equilibrium of $P_i$. In particular, if all the $x_i$ are zero, then all the forces $F_i$ will also be zero, and it follows that $V'(0) = 0$; in other words, 0 is a critical point of V, so V has the Taylor expansion $V(y) = \frac{k}{2}y^2 + R(y)y^3$ near 0. We shall assume that in fact 0 is a non-degenerate minimum of V, i.e., that k > 0. Physically this means that taking all $x_i$ equal to 0 gives a stable equilibrium of the system, in the sense that any small deviation from this state will create forces that drive the system back towards equilibrium. Since $V'(y) = ky + S(y)y^2$ (where $S(y) = yR'(y) + 3R(y)$), and $F = -V'$,
$$F_i = V'(x_{i-1} - x_i) - V'(x_i - x_{i+1}) = k(x_{i+1} + x_{i-1} - 2x_i) + T(x_{i-1}, x_i, x_{i+1}),$$
where $T(x_{i-1}, x_i, x_{i+1}) = S(x_{i-1} - x_i)(x_{i-1} - x_i)^2 - S(x_i - x_{i+1})(x_i - x_{i+1})^2$.
The constant k has the interpretation of the spring constant for a piece of the medium of length h, so if $\kappa$ is the Young's modulus for the medium (the spring constant for a piece of unit length) then $k = \kappa/h$. (If you halve the length of a spring, it becomes twice as hard to stretch.) So, if we write $c = \sqrt{\kappa/\rho}$ and recall that $m = \rho h$, we have $\frac{k}{m} = c^2\frac{1}{h^2}$, and we can rewrite the above formula for $F_i$ as
$$\frac{1}{m}F_i = c^2\left(\frac{x_{i+1} + x_{i-1} - 2x_i}{h^2}\right) + \frac{1}{m}T(x_{i-1}, x_i, x_{i+1}).$$
Let us first consider the case where our medium rigorously satisfies Hooke's Law, i.e., where the remainder term R(y) (and hence also S and T) is identically zero. (Or else assume that the deviations from equilibrium are so small that the quadratic terms in the $x_i$ can be ignored.) Thus the forces $F_i$ are linear, and Newton's Equations of Motion become simply
$$\ddot x_i = c^2\left(\frac{x_{i+1} + x_{i-1} - 2x_i}{h^2}\right).$$
We can now easily pass to the continuum limit. That is, by letting N tend to infinity (so that h tends to zero) we can derive a PDE for the function u(x,t). If we take $x = p_i$, then by definition $u(x,t) = x_i(t)$, and since $p_i + h = p_{i+1}$ while $p_i - h = p_{i-1}$, the latter form of Newton's equations gives:
$$u_{tt}(x,t) = c^2\,\frac{u(x+h,t) + u(x-h,t) - 2u(x,t)}{h^2}.$$
Next recall Taylor's formula:
$$f(x \pm h) = f(x) \pm hf'(x) + \frac{h^2}{2!}f''(x) \pm \frac{h^3}{3!}f'''(x) + \frac{h^4}{4!}f''''(x) + O(h^5).$$
If we now take f(x) = u(x,t), this gives:
$$\frac{u(x+h,t) + u(x-h,t) - 2u(x,t)}{h^2} = u_{xx}(x,t) + \left(\frac{h^2}{12}\right)u_{xxxx}(x,t) + O(h^4),$$
so letting $h \to 0$, we find $u_{tt} = c^2u_{xx}$, i.e., u satisfies the linear wave equation, with propagation speed c.
Example. Longitudinal vibrations (sound waves) in a metal bar.
In this interpretation of the above, we assume that the atoms are in equilibrium when all the particles $P_i$ are at the locations $p_i$, and that the actual positions of the particles at time t are $X_i(t) = p_i + x_i(t)$. Note that the particles are moving longitudinally (i.e., along the length of the medium). This models the sound vibrations in a clamped metal bar when it is struck at one end.
Example. Transverse vibrations of a stretched wire (piano string).
In this interpretation we assume that the particles are constrained to move orthogonally to the medium. That is, the particles are in equilibrium when located at the points $(p_i, 0)$ of the xy-plane, and the actual position of the particle $P_i$ at time t is $(p_i, x_i(t))$. If we think of this as a stretched piano wire and if the tension along the string is T, then $F_{i,i+1}$ is the vertical or y-component of the tension, which is
$$\frac{-T(x_i - x_{i+1})}{\sqrt{h^2 + (x_i - x_{i+1})^2}} = -\frac{T}{h}\,\frac{x_i - x_{i+1}}{\sqrt{1 + \left(\frac{x_i - x_{i+1}}{h}\right)^2}}.$$
If we assume that the slope of the wire, $(x_i - x_{i+1})/h$, is small compared with 1, we can approximate this by $-\frac{T}{h}(x_i - x_{i+1})$, which reduces to the general case above if we take $k = \frac{T}{h}$, i.e., if we identify T with the Young's modulus $\kappa$. It follows that, with the above assumption that the slope of the wire is small, we again get a wave equation, with the propagation speed c equal to $\sqrt{T/\rho}$.
Exercise. The Sine-Gordon Equation as a continuum limit.
If we put a piano wire under tension and clamp the ends, then it resists twisting as well as stretching. If we attach a pendulum $P_i$ at each of the points $p_i$, we get a lattice of torsion pendulums, and we let $x_i$ denote the angle that $P_i$ makes with a fixed direction, say the vertical. In this case $m_i$ should measure the moment of inertia of $P_i$ around the wire, and $F_i$ the torque on $P_i$ due to the twisting of the wire between $P_i$ and $P_{i+1}$. (We neglect the external force of gravity.) Show that the Sine-Gordon equation can be obtained as a continuum limit of this lattice. (See Remoissenet, M., Waves Called Solitons: Concepts and Experiments, Springer, 1994.)
Exercise. The Lagrangian approach to finding continuum limits.
The kinetic energy of our nearest neighbor lattice is $K = \frac{m}{2}\sum_i \dot x_i^2$, and the potential energy is $U = \frac{k}{4}\sum_i\left((x_i - x_{i+1})^2 + (x_{i-1} - x_i)^2\right)$. Show that as $N \to \infty$ these converge respectively to $K = \frac{\rho}{2}\int_0^\ell u_t(x,t)^2\,dx$ and $U = \frac{\kappa}{2}\int_0^\ell u_x(x,t)^2\,dx$, and hence the action of the system (in a time interval [a,b]) converges to $A = \int_a^b dt\int_0^\ell\left(\frac{\rho}{2}u_t(x,t)^2 - \frac{\kappa}{2}u_x(x,t)^2\right)dx$. Show that the Euler-Lagrange equation for extremalizing this action is just the linear wave equation for u derived above.
So far we have only dealt with the case of a harmonic lattice, that is, one with a quadratic potential. We are now going to generalize this by looking at the effect of the next (i.e., cubic) term in the potential function V. That is, we are now going to assume that we have a slightly anharmonic lattice, with potential function $V(y) = \frac{k}{2}y^2 + \frac{\alpha}{3}y^3$, so the force is $F(y) = -ky - \alpha y^2$, and $F_i = k(x_{i+1} + x_{i-1} - 2x_i)\left[1 + \alpha(x_{i+1} - x_{i-1})\right]$, and Newton's Equations now take the form:
$$\text{(FPU)}\qquad \ddot x_i = c^2\left(\frac{x_{i+1} + x_{i-1} - 2x_i}{h^2}\right)\left[1 + \alpha(x_{i+1} - x_{i-1})\right],$$
which we will refer to as the Fermi-Pasta-Ulam (FPU) lattice equations, after the three mathematicians and physicists who studied them numerically in 1955, using one of the first digital computers.
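For readers who want to experiment, here is a minimal integration sketch for these equations. It is our own code, not the 1955 program; the scheme (velocity Verlet), the fixed-end boundary condition, and all parameter values are our illustrative choices.

```python
# Integrate the FPU lattice
#   x_i'' = c^2 (x_{i+1} + x_{i-1} - 2 x_i)/h^2 * (1 + alpha (x_{i+1} - x_{i-1}))
# with fixed ends, by velocity Verlet.  (Illustrative sketch.)
import math

def fpu_accel(x, c, h, alpha):
    a = [0.0] * len(x)
    for i in range(1, len(x) - 1):
        lin = (x[i + 1] + x[i - 1] - 2.0 * x[i]) / (h * h)
        a[i] = c * c * lin * (1.0 + alpha * (x[i + 1] - x[i - 1]))
    return a

def fpu_step(x, v, dt, c, h, alpha):
    # one velocity-Verlet step: position update, then velocity update
    a = fpu_accel(x, c, h, alpha)
    x1 = [xi + dt * vi + 0.5 * dt * dt * ai for xi, vi, ai in zip(x, v, a)]
    a1 = fpu_accel(x1, c, h, alpha)
    v1 = [vi + 0.5 * dt * (ai + bi) for vi, ai, bi in zip(v, a, a1)]
    return x1, v1

# start in the lowest sine mode, as in the original FPU experiment
N, ell, c, alpha = 33, 1.0, 1.0, 0.25
h = ell / (N - 1)
x = [math.sin(math.pi * i * h / ell) for i in range(N)]
v = [0.0] * N
for _ in range(100):
    x, v = fpu_step(x, v, 0.2 * h, c, h, alpha)
```

Tracking the energy in each sine mode over much longer runs is how the near-recurrences mentioned below were originally observed.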
Finding a good continuum limit for this non-linear lattice is a lot more sophisticated than one might at first expect after the easy time we had with the linear case. In fact, the approach to the limit has to be handled with considerable skill to avoid inconsistent results, and it involves several non-obvious steps.
As before we let u(x,t) denote the function measuring the displacement at time t of the particle with equilibrium position x, so if $x = p_i$ then, by definition, $x_i(t) = u(x,t)$, $x_{i+1}(t) = u(x+h,t)$, and $x_{i-1}(t) = u(x-h,t)$. Of course $\ddot x_i = u_{tt}(x,t)$ and, as noted earlier, Taylor's Theorem with remainder gives
$$\frac{x_{i+1} + x_{i-1} - 2x_i}{h^2} = \frac{u(x+h,t) + u(x-h,t) - 2u(x,t)}{h^2} = u_{xx}(x,t) + \left(\frac{h^2}{12}\right)u_{xxxx}(x,t) + O(h^4).$$
By a similar computation,
$$x_{i+1} - x_{i-1} = (2h)u_x(x,t) + \left(\frac{h^3}{3}\right)u_{xxx}(x,t) + O(h^5),$$
so substitution in (FPU) gives
$$\left(\frac{1}{c^2}\right)u_{tt} - u_{xx} = (2\alpha h)u_xu_{xx} + \left(\frac{h^2}{12}\right)u_{xxxx} + O(h^4).$$
As a first attempt to derive a continuum description for the FPU lattice in the non-linear case, it is tempting to just let h approach zero and assume that $2\alpha h$ converges to a limit $\epsilon$. This would give the PDE
$$u_{tt} = c^2(1 + \epsilon u_x)u_{xx}$$
as our continuum limit for the FPU Lattice equations and the non-linear generalization of the wave equation. But this leads to a serious problem. This equation is familiar in applied mathematics (it was studied by Rayleigh in the nineteenth century), and it is easy to see from examples that its solutions develop discontinuities (shocks) after a time on the order of $(\epsilon c)^{-1}$, which is considerably shorter than the time scale of the almost periods observed in the Fermi-Pasta-Ulam experiments. It was Zabusky who realized that the correct approach was to retain the term of order $h^2$ and study the equation
$$\text{(ZK)}\qquad \left(\frac{1}{c^2}\right)u_{tt} - u_{xx} = (2\alpha h)u_xu_{xx} + \left(\frac{h^2}{12}\right)u_{xxxx}.$$
If we differentiate this equation with respect to x and make the substitution $v = u_x$, we see that it reduces to the more familiar Boussinesq equation
$$\left(\frac{1}{c^2}\right)v_{tt} = v_{xx} + \alpha h\,\frac{\partial^2(v^2)}{\partial x^2} + \left(\frac{h^2}{12}\right)v_{xxxx}.$$
(The effect of the fourth order term is to add dispersion to the equation, and this smoothes out incipient shocks before they can develop.)
It is important to realize that, since $h \neq 0$, (ZK) cannot logically be considered a true continuum limit of the FPU lattice. It should rather be regarded as an asymptotic approximation to the lattice model that works for small lattice spacing h (and hence large N). Nevertheless, we shall now see how to pass from (ZK) to a true continuum description of the FPU lattice.
The next step is to notice that, with $\alpha$ and h small, solutions of (ZK) should behave qualitatively like solutions of the linear wave equation $u_{tt} = c^2u_{xx}$, and increasingly so as $\alpha$ and h tend to zero. Now the general solution of the linear wave equation is of course $u(x,t) = f(x + ct) + g(x - ct)$, i.e., the sum of an arbitrary left-moving traveling wave and an arbitrary right-moving traveling wave, both moving with speed c. Recall that it is customary to simplify the analysis in the linear case by treating each kind of wave separately, and we would like to do the same here. That is, we would like to look for solutions u(x,t) that behave more and more like (say) right-moving traveling waves of velocity c, and for longer and longer periods of time, as $\alpha$ and h tend to zero.
It is not difficult to make precise sense out of this requirement. Suppose that $y(\xi, \tau)$ is a smooth function of two real variables such that the map $\tau \mapsto y(\cdot, \tau)$ is uniformly continuous from $\mathbf{R}$ into the bounded functions on $\mathbf{R}$ with the sup norm; i.e., given $\epsilon > 0$ there is a positive $\delta$ such that $|\tau - \tau_0| < \delta$ implies $|y(\xi, \tau) - y(\xi, \tau_0)| < \epsilon$. Then for $|t - t_0| < T = \delta/(\alpha hc)$ we have $|\alpha hct - \alpha hct_0| < \delta$, so $|y(x - ct, \alpha hct) - y(x - ct, \alpha hct_0)| < \epsilon$. In other words, the function $u(x,t) = y(x - ct, \alpha hct)$ is uniformly approximated by the traveling wave $u_0(x,t) = y(x - ct, \alpha hct_0)$ on the interval $|t - t_0| < T$ (and of course $T \to \infty$ as $\alpha$ and h tend to zero). To restate this a little more picturesquely, $u(x,t) = y(x - ct, \alpha hct)$ is approximately a traveling wave whose shape gradually changes in time. Notice that if $y(\xi, \tau)$ is periodic or almost periodic in $\tau$, the gradually changing shape of the approximate traveling wave will also be periodic or almost periodic.
To apply this observation, we define new variables $\xi = x - ct$ and $\tau = (\alpha h)ct$. Then by the chain rule, $\partial^k/\partial x^k = \partial^k/\partial\xi^k$, $\partial/\partial t = -c\left(\partial/\partial\xi - (\alpha h)\,\partial/\partial\tau\right)$, and
$$\frac{\partial^2}{\partial t^2} = c^2\left(\frac{\partial^2}{\partial\xi^2} - (2\alpha h)\frac{\partial^2}{\partial\xi\,\partial\tau} + (\alpha h)^2\frac{\partial^2}{\partial\tau^2}\right).$$
Thus in these new coordinates the wave operator transforms to:
$$\frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2} = -2\alpha h\,\frac{\partial^2}{\partial\xi\,\partial\tau} + (\alpha h)^2\frac{\partial^2}{\partial\tau^2},$$
so substituting $u(x,t) = y(\xi, \tau)$ in (ZK) (and dividing by $-2\alpha h$) gives:
$$y_{\xi\tau} - \left(\frac{\alpha h}{2}\right)y_{\tau\tau} = -y_\xi y_{\xi\xi} - \left(\frac{h}{24\alpha}\right)y_{\xi\xi\xi\xi},$$
and, at last, we are prepared to pass to the continuum limit. We assume that $\alpha$ and h tend to zero at the same rate, i.e., that as h tends to zero, the quotient $h/\alpha$ tends to a positive limit, and we define $\delta = \lim_{h\to 0}\sqrt{h/(24\alpha)}$. Then $\alpha h = O(h^2)$, so letting h approach zero gives $y_{\xi\tau} + y_\xi y_{\xi\xi} + \delta^2y_{\xi\xi\xi\xi} = 0$. Finally, making the substitution $v = y_\xi$ we arrive at the KdV equation:
$$\text{(KdV)}\qquad v_\tau + vv_\xi + \delta^2v_{\xi\xi\xi} = 0.$$
Remark. Note that if we re-scale the independent variables by $\xi \to a\xi$ and $\tau \to b\tau$, then the KdV equation becomes:
$$v_\tau + \left(\frac{b}{a}\right)vv_\xi + \left(\frac{b}{a^3}\right)\delta^2v_{\xi\xi\xi} = 0,$$
so by appropriate choice of a and b we can obtain any equation of the form $v_\tau + \lambda vv_\xi + \mu v_{\xi\xi\xi} = 0$, and any such equation is referred to as the KdV equation. A commonly used choice that is convenient for many purposes is $v_\tau + 6vv_\xi + v_{\xi\xi\xi} = 0$, although the form $v_\tau - 6vv_\xi + v_{\xi\xi\xi} = 0$ (obtained by replacing v by $-v$) is equally common. We will use both these forms.
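As a concrete check on the normalization $v_\tau + 6vv_\xi + v_{\xi\xi\xi} = 0$, one can verify numerically that the classical one-soliton profile $v = 2\kappa^2\operatorname{sech}^2(\kappa(\xi - 4\kappa^2\tau))$ satisfies it (a standard fact, not derived at this point in the notes). The sketch below is our own, using centered finite differences for the derivatives.

```python
# Check that u(x, t) = 2 k^2 sech^2(k (x - 4 k^2 t)) solves
# u_t + 6 u u_x + u_xxx = 0, using centered finite differences.
# (Illustrative sketch.)
import math

def soliton(x, t, k=0.8):
    s = 1.0 / math.cosh(k * (x - 4.0 * k * k * t))
    return 2.0 * k * k * s * s

def kdv_residual(x, t, d=1e-3):
    u = soliton
    ut = (u(x, t + d) - u(x, t - d)) / (2 * d)
    ux = (u(x + d, t) - u(x - d, t)) / (2 * d)
    # standard centered stencil for the third derivative
    uxxx = (u(x + 2 * d, t) - 2 * u(x + d, t)
            + 2 * u(x - d, t) - u(x - 2 * d, t)) / (2 * d ** 3)
    return ut + 6.0 * u(x, t) * ux + uxxx
```

The residual is of order $d^2$ at every point, i.e., zero to the accuracy of the difference formulas.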
Let us recapitulate the relationship between the FPU Lattice and the KdV equation. Given a solution $x_i(t)$ of the FPU Lattice, we get a function u(x,t) by interpolation, i.e., $u(ih, t) = x_i(t)$, $i = 0, \ldots, N-1$. For small lattice spacing h and non-linearity parameter $\alpha$ there will be solutions $x_i(t)$ such that the corresponding u(x,t) will be an approximate right-moving traveling wave with slowly varying shape, i.e., it will be of the form $u(x,t) = y(x - ct, \alpha hct)$ for some smooth function $y(\xi, \tau)$, and the function $v(\xi, \tau) = y_\xi(\xi, \tau)$ will satisfy the KdV equation $v_\tau + vv_\xi + \delta^2v_{\xi\xi\xi} = 0$, where $\delta^2 = h/(24\alpha)$.
Appendix C
Solving Wave Equations Numerically:
The Pseudospectral Method
Introduction.
So-called pseudospectral methods for the numerical integration of evolution equations use discrete Fourier transforms instead of finite differencing to evaluate spatial derivatives (an excellent early article is [FW]). A surprising fact is that these methods often work very well for nonlinear equations. The time-stepping for pseudospectral methods is accomplished by a classical differencing scheme that can in principle be either explicit or implicit, but for the usual stability reasons an implicit method such as Crank-Nicolson (the trapezoidal rule) is usually preferred. However, when the equation is nonlinear, the solution of the implicit equations that arise can present a problem. One approach is to employ split-stepping: use Crank-Nicolson plus Gaussian elimination for the linear terms, but fall back to an explicit method for the nonlinear terms. An alternative approach (pioneered in [WGMS], and one that we refer to as the WGMS method) is to treat the linear and nonlinear terms together, write the implicit equation in fixed-point form, and then solve it by an iteration scheme.
WGMS originally developed their method to solve the initial value problems for the KdV and KP equations with periodic boundary conditions, and we became aware of their technique via [S], in which D. H. Sattinger reports on a modified WGMS method for solving the KdV equation developed in collaboration with Yi Li. In this appendix we will discuss a generalization of the WGMS algorithm to treat the initial value problem for a fairly broad class of evolutionary PDE that are "weakly nonlinear", in the sense that their nonlinear terms are a lower order perturbation of the linear part (see below for a precise definition), and we will prove a convergence theorem for the iteration method that is at the heart of the WGMS algorithm.
Weakly Nonlinear PDE of Evolution.
Let U denote a finite dimensional complex inner product space and V a vector space of U-valued functions on $\mathbf{R}$. Usually we will work in a fixed orthonormal basis $(e_1, \ldots, e_n)$ for U and use it to identify U with $\mathbf{C}^n$, so that elements u of V can be considered as n-tuples $(u^1, \ldots, u^n)$ of complex-valued functions. (If n = 1 we shall say we are in the scalar case, and we then identify u with $u^1$.)
We will specify V more precisely later, but the elements of V will admit derivatives up to a certain order, and they will in most cases be required to be $2\pi$-periodic, in which case we shall also consider them as functions on the unit circle in the complex plane. If u(t) is a curve in V we will also write u(x,t) for u(t)(x). As usual we think of t as denoting time and x as space. We denote by D the differentiation operator $\frac{\partial}{\partial x}$, and we also write $u^i_x = Du^i$, $u^i_{xx} = D^2u^i$, etc., and of course $u_t = \frac{\partial u}{\partial t}$.
We will be considering evolution equations of the form $u_t = F(u)$, where $F : V \to V$ should be thought of as a vector field on V, and its form will be a smooth function (usually polynomial) of the $u^i$ and their derivatives $Du^j, D^2u^k, \ldots$. Usually F(u) will be the sum of a dominant linear differential operator and a nonlinear part that we can consider as a small perturbation of this linear part. By a linear differential operator on V we will always mean an operator of the form $u \mapsto L(u) = (L^1(u), \ldots, L^n(u))$, where $L^i(u) = \sum_{j=1}^n L^i_j(D)u^j$. Here each $L^i_j(X)$ is a polynomial with constant coefficients in an indeterminate X. In the scalar case $L(u) = L(D)u$, and we will often use L(D)u to denote L(u) in the general case too.
The simplest kind of nonlinear operator that we shall consider is a zero order nonlinear operator, by which we will mean a map of the form $u \mapsto G(u) = (G^1(u), \ldots, G^n(u))$, where $G^i(u)(x) = G^i(u^1(x), \ldots, u^n(x))$ and $G^i(Y_1, \ldots, Y_n)$ is either a constant coefficient polynomial on $\mathbf{C}^n$ or, more generally, an entire function of these variables (i.e., given by a power series that converges for all values of $(Y_1, \ldots, Y_n)$). Of course, care must be taken to make sure that if $u \in V$ then also $G(u) \in V$. When we come to the rigorous proofs, we will assume that V is one of the Sobolev Hilbert spaces $H^m(\mathbf{R}, U)$ for $m > \frac{1}{2}$, and since it is well-known that $H^m(\mathbf{R}, \mathbf{C})$ is a Banach algebra, it follows easily that G is a smooth map of $H^m(\mathbf{R}, U)$ to itself. The most general kind of nonlinearity that we will consider will be one that can be factored into a composition of the form M(D)G(u), where M(D) is a linear differential operator as above and G(u) is a zero order nonlinearity.
If $L(X) = \sum_m a_mX^m$ is a complex polynomial, then the differential operator L(D) is called formally skew-adjoint if $\langle L(D)u_1, u_2\rangle = -\langle u_1, L(D)u_2\rangle$ whenever $u_1$ and $u_2$ are smooth maps of $\mathbf{R}$ into U with compact support. Here $\langle u, v\rangle$ denotes the $L^2$ inner product, i.e., $\langle u, v\rangle := \int_{-\infty}^{\infty}\langle u(x), v(x)\rangle\,dx$. Integration by parts shows that D is formally skew-adjoint. Moreover an odd power of a formally skew-adjoint operator (and i times an even power) is clearly again formally skew-adjoint, so it follows that L(D) is formally skew-adjoint if and only if the coefficients $a_m$ are real for m odd and imaginary for m even, i.e., if and only if L(ik) is imaginary for all real k, and it is this last condition that we shall use.
Definition. A system of partial differential equations of the form
$$\text{(WNWE)}\qquad u^i_t = L^i(D)u^i + M^i(D)G^i(u)$$
is called a weakly nonlinear wave equation if:
1) each $L^i(D)$ is a formally skew-adjoint operator and the polynomials $L^i(X)$ all have the same degree, $\ell$,
2) degree $M^i(X) < \ell$,
3) $G^i(0) = 0$, so that $u(x,t) \equiv 0$ is a solution of (WNWE).
In what follows we will denote the minimum difference $\ell - \operatorname{degree} M^i(X)$ by q. For the most part we will be dealing with the case n = 1, in which case we put $L = L^1$ and $M = M^1$, so $\ell = \operatorname{degree} L$ and $q = \operatorname{degree} L - \operatorname{degree} M$, and a weakly nonlinear wave equation has the form:
$$\text{(WNWE)}\qquad u_t = L(D)u + M(D)G(u).$$
Two important examples are the Korteweg-de Vries Equation:
$$\text{(KdV)}\qquad u_t = -u_{xxx} - uu_x = -D^3u - \tfrac{1}{2}D(u^2),$$
and the Nonlinear Schrödinger Equation:
$$\text{(NLS)}\qquad u_t = iu_{xx} + i|u|^2u = iD^2u + i|u|^2u.$$
In the former case we have $L(X) = -X^3$, $M(X) = -\frac{1}{2}X$, $G(X) = X^2$, and in the latter, $L(X) = iX^2$, $M(X) = i$, $G(X) = |X|^2X$.
In the next section we will see that the Sine-Gordon Equation:
$$\text{(SGE)}\qquad u_{tt} = u_{xx} + \sin u$$
also can be regarded as a weakly nonlinear wave equation.
The Sine-Gordon Equation
A natural reduction of the linear wave equation to a system of first order PDE is $u_t = v_x$, $v_t = u_x$; in fact,
$$\frac{\partial^2}{\partial t^2}u = \frac{\partial}{\partial t}\frac{\partial}{\partial x}v = \frac{\partial}{\partial x}\frac{\partial}{\partial t}v = \frac{\partial^2}{\partial x^2}u.$$
This suggests that to find a representation of Sine-Gordon as a weakly nonlinear wave equation, we should start with systems of the form $u_t = v_x + F(u,v)$, $v_t = u_x + G(u,v)$, or $\frac{\partial}{\partial t}(u,v) = L(u,v) + (F(u,v), G(u,v))$, where F and G are entire functions on $\mathbf{C} \times \mathbf{C}$, and of course $F(0,0) = G(0,0) = 0$. We will next show that with appropriate choice of F and G we do indeed get Sine-Gordon, and moreover that essentially the only other equations of the form $u_{tt} = u_{xx} + \Phi(u)$ that arise in this way are the Klein-Gordon equation, $u_{tt} = u_{xx} + u$, and the Sinh-Gordon equation, $u_{tt} = u_{xx} + \sinh u$.
Starting out as above,
$$\frac{\partial^2u}{\partial t^2} = \frac{\partial}{\partial t}\left(\frac{\partial v}{\partial x} + F(u,v)\right) = \frac{\partial}{\partial x}\frac{\partial v}{\partial t} + F_1u_t + F_2v_t = \frac{\partial^2u}{\partial x^2} + (F_1F + F_2G) + \frac{\partial u}{\partial x}(G_1 + F_2) + \frac{\partial v}{\partial x}(G_2 + F_1).$$
For the latter to be of the form $u_{tt} = u_{xx} + \Phi(u)$ we must have $\frac{\partial}{\partial v}(F_1F + F_2G) = 0$, $G_1 = -F_2$, and $G_2 = -F_1$, in which case $u_{tt} = u_{xx} + \Phi(u)$ with $\Phi = F_1F + F_2G$.
Next note that these conditions on F and G give $F_{11} = -G_{21} = -G_{12} = F_{22}$, or in other words, F is a solution of the one-dimensional wave equation (in the variables u and v), and hence a sum of a left-moving wave and a right-moving wave: $F(u,v) = h(u+v) + k(u-v)$. Then using $G_1 = -F_2$ and $G_2 = -F_1$ it follows that $G(u,v) = k(u-v) - h(u+v)$, where $h(0) = k(0) = 0$ in order to make $F(0,0) = G(0,0) = 0$. The condition $\frac{\partial}{\partial v}(F_1F + F_2G) = 0$ now gives $\frac{\partial}{\partial v}\left(h'(u+v)k(u-v) + h(u+v)k'(u-v)\right) = 0$, or $h''(u+v)k(u-v) = h(u+v)k''(u-v)$, or $\frac{h''(u+v)}{h(u+v)} = \frac{k''(u-v)}{k(u-v)}$. Since $u+v$ and $u-v$ are coordinates, the only way the last relation can hold identically is for both sides to be a constant $\lambda$, i.e., $h'' = \lambda h$ and $k'' = \lambda k$.
If $\lambda$ is negative, say $\lambda = -\eta^2$, then since $h(0) = k(0) = 0$ it follows that $h(u) = A\sin(\eta u)$ and $k(u) = B\sin(\eta u)$. If we choose $\eta = \frac{1}{2}$ and $A = B = 1$ we get $F(u,v) = \sin\left(\frac{u}{2} + \frac{v}{2}\right) + \sin\left(\frac{u}{2} - \frac{v}{2}\right) = 2\sin\frac{u}{2}\cos\frac{v}{2}$ and similarly $G(u,v) = -2\cos\frac{u}{2}\sin\frac{v}{2}$, and this gives the system of partial differential equations
$$u_t = v_x + 2\sin\frac{u}{2}\cos\frac{v}{2},\qquad v_t = u_x - 2\cos\frac{u}{2}\sin\frac{v}{2},$$
and we will leave it to the reader to check that if (u,v) is a solution of this system, then u is a solution of the Sine-Gordon equation. (Other choices of A, B, and $\eta$ lead to equations that can be transformed to the Sine-Gordon equation by a simple re-scaling of independent and dependent variables. Similarly, taking $\lambda = 0$ gives the Klein-Gordon equation, and positive $\lambda$ gives Sinh-Gordon.)
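The reader's check comes down to verifying that $\Phi = F_1F + F_2G = \sin u$ for these choices of F and G, which can also be confirmed numerically. The following sketch is our own (not part of the text); it approximates the partial derivatives $F_1, F_2$ by centered differences.

```python
# Verify Phi(u, v) = F_1 F + F_2 G = sin(u) for
# F(u, v) = 2 sin(u/2) cos(v/2) and G(u, v) = -2 cos(u/2) sin(v/2).
# (Illustrative sketch; subscripts denote partial derivatives.)
import math

def F(u, v):
    return 2.0 * math.sin(u / 2.0) * math.cos(v / 2.0)

def G(u, v):
    return -2.0 * math.cos(u / 2.0) * math.sin(v / 2.0)

def Phi(u, v, h=1e-6):
    F1 = (F(u + h, v) - F(u - h, v)) / (2 * h)   # dF/du
    F2 = (F(u, v + h) - F(u, v - h)) / (2 * h)   # dF/dv
    return F1 * F(u, v) + F2 * G(u, v)
```

At every test point `Phi(u, v)` agrees with `sin(u)` to the accuracy of the difference quotients, and in particular is independent of v, as the construction requires.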
While this system of PDE for u and v is not in the form (WNWE), if we define $u^1 = u + v$ and $u^2 = u - v$, then $u^1$ and $u^2$ satisfy:
$$u^1_t = +u^1_x + 2\sin\frac{u^2}{2},\qquad u^2_t = -u^2_x + 2\sin\frac{u^1}{2},$$
which is manifestly in the form (WNWE), with $L^1(D) = D$, $L^2(D) = -D$, and $M^i(X) = 1$, and moreover we can recover u from $u^1$ and $u^2$ by $u = \frac{u^1 + u^2}{2}$.
To simplify the exposition, we will from now on assume that we are in the scalar case n = 1 and that G is a polynomial. The modifications needed for the general case are obvious.
The Generalized WGMS Method (Heuristics).
Let us assume that for some particular example of (WNWE) we know that there is a unique solution u(t) with the initial condition $u(0) \in V$. Let $\Delta t$ be close to zero, and let us look for a time-stepping algorithm that, given a sufficiently good approximation to u(t) as input, will produce an approximation to $u(t') = u(t + \Delta t)$ as output. If we integrate (WNWE) with respect to t, from t to $t'$, and use the trapezoidal rule to approximate the integrals on the right hand side, we find:
right hand side, we nd:
u(t

) u(t) =
t
2
L(D)[u(t) +u(t

)]
+
t
2
M(D)[G(u(t)) +G(u(t

))]
or
(I dL(D))u(t

) = (I +dL(D))u(t)
+dM(D)[G(u(t)) +G(u(t

))],
which we can rewrite as:
u(t

) = Cu(t) +B[G(u(t)) +G(u(t

))]
where d =
t
2
, B =
dM(D)
IdL(D)
, and C =
I+dL(D)
IdL(D)
is the Cayley transform of the skew-adjoint
operator dL(D). We note that the skew-adjointness of L(D) assures that I dL(D) is
invertible, and that C is a unitary operator. In fact, as we shall see shortly, on the Fourier
transform side, both C and B become simple multiplication operators, whose properties are
obvious from those of the polynomials L(X) and M(X).
Next, for each u in V, we define a map $H_u : V \to V$ by
$$H_u(w) := Cu + B[G(u) + G(w)],$$
and we note that the equation above becomes $H_{u(t)}(u(t')) = u(t')$; i.e., $u(t')$, which is what we are trying to compute, is a fixed point of $H_{u(t)}$.
Now, we permit ourselves a little optimism: we assume that $u(t')$ is in fact a contracting fixed point of $H_{u(t)}$. If this is so then, for $\Delta t$ small, u(t) will be close to $u(t')$, and we can expect that iterating $H_{u(t)}$, starting at u(t), will produce a sequence that converges to $u(t')$. This is the essence of the WGMS time-stepping algorithm (generalized to (WNWE)).
For this to work as a numerical method, we must be able to compute $H_u$ efficiently, and that is where the Fourier Transform comes in. Let us write T for the Fourier Transform, mapping V isomorphically onto $\hat V$, and $T^{-1}$ for its inverse. We define operators $\hat C = TCT^{-1}$ and $\hat B = TBT^{-1}$ on $\hat V$. Then $TH_u(w) = TCT^{-1}T(u) + TBT^{-1}T[G(u) + G(w)]$, so we can rewrite $H_u$ as:
$$H_u(w) = T^{-1}\left(\hat C\hat u + \hat B\,T[G(u) + G(w)]\right),$$
where $\hat u = T(u)$ is the Fourier Transform of u.
Assuming that we have a good algorithm for computing T and T^{-1} (e.g., the Fast Fourier Transform), it is now clear that it is easy and efficient to calculate H_u, and hence to carry out the iteration. Indeed, calculating G(u) and G(w) at a point x is just a matter of evaluating the polynomial G at u(x) and w(x). And since M(X) and L(X) are constant coefficient polynomials, the operators Ĉ and B̂ are diagonal in the Fourier basis e_k(x) = e^{ikx}, i.e., they are multiplication operators, by the rational functions (1 + dL(ik))/(1 − dL(ik)) and dM(ik)/(1 − dL(ik)) respectively. Since L(D) is by assumption skew-adjoint, L(ik) is pure imaginary, so the denominator 1 − dL(ik) does not vanish. Moreover the function (1 + dL(ik))/(1 − dL(ik)) clearly takes its values on the unit circle, and since L(X) has degree greater than M(X), it follows that while the nonlinearity G(w) may push energy into the high frequency modes of the Fourier Transform, multiplication by dM(ik)/(1 − dL(ik)) acts as a low-pass filter, attenuating these high frequency modes and giving the WGMS method excellent numerical stability.
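To make the preceding concrete, here is a minimal sketch (not from the notes) of one WGMS time step, specialized to the KdV equation u_t = −u_xxx − 6uu_x, which has the assumed form u_t = L(D)u + M(D)G(u) with L(X) = −X³ (skew-adjoint), M(X) = −X, and G(u) = 3u². The function and parameter names (wgms_step, n_iter) are our own, chosen for illustration.

```python
import numpy as np

# A sketch of one WGMS time step for KdV: u_t = -u_xxx - 6 u u_x,
# i.e. u_t = L(D)u + M(D)G(u) with L(X) = -X^3, M(X) = -X, G(u) = 3u^2,
# for 2*pi-periodic functions sampled at N equally spaced points.
def wgms_step(u, dt, n_iter=10):
    N = len(u)
    k = np.fft.fftfreq(N, d=1.0 / N)           # integer wave numbers
    d = dt / 2.0
    Lik = -(1j * k) ** 3                       # L(ik), pure imaginary
    Mik = -(1j * k)                            # M(ik)
    Chat = (1.0 + d * Lik) / (1.0 - d * Lik)   # unit-modulus multiplier C^
    Bhat = d * Mik / (1.0 - d * Lik)           # low-pass multiplier B^
    G = lambda v: 3.0 * v * v                  # the polynomial nonlinearity
    uhat = np.fft.fft(u)
    Gu = np.fft.fft(G(u))
    w = u.copy()                               # iterate H_u starting at w = u(t)
    for _ in range(n_iter):
        w = np.real(np.fft.ifft(Chat * uhat + Bhat * (Gu + np.fft.fft(G(w)))))
    return w
```

In practice one would of course stop the fixed-point iteration when successive iterates agree to within a tolerance, rather than after a fixed number of steps.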
Proof that H_u is a Contraction.
In this section we will justify the above optimism by showing that, with a proper choice of the space V, a suitable restriction of the mapping H_u does indeed satisfy the hypotheses of the Banach Contraction Theorem provided ‖u‖ and Δt are sufficiently small. The space we will choose for V is the Sobolev Hilbert space H^m = H^m(S^1, V), with m > 1/2. We recall that this is the Hilbert space of all functions u in L^2(S^1, V) such that

    ‖u‖_m^2 = Σ_k (1 + k^2)^m |û(k)|^2

is finite, where, as before, û(k) are the Fourier coefficients of u.
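The norm just defined can be computed directly from FFT output; the following helper (our own, for illustration) does this for a real function sampled at N equally spaced points of S^1, using the convention that np.fft.fft(u)/N gives the Fourier coefficients û(k).

```python
import numpy as np

# Illustrative helper: the Sobolev m-norm of a function on S^1, computed
# from its Fourier coefficients exactly as in the definition above.
def sobolev_norm(u, m):
    N = len(u)
    k = np.fft.fftfreq(N, d=1.0 / N)     # integer wave numbers
    uhat = np.fft.fft(u) / N             # Fourier coefficients u^(k)
    return np.sqrt(np.sum((1.0 + k ** 2) ** m * np.abs(uhat) ** 2))
```

Note how increasing m weights the high-frequency coefficients more heavily, which is why control in the m-norm controls derivatives.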
The principal property of these spaces that we shall need is that H^m(S^1, R) is a commutative Banach algebra under pointwise multiplication when m > 1/2 (cf. [A], Theorem 5.23, or [P]). As a first consequence, it follows that if P : V → V is a polynomial mapping, then u ↦ P(u) is a map of H^m to itself, and moreover ‖P(u)‖_m ≤ C ‖u‖_m^r, where r is the degree of P. We will permit ourselves the abuse of notation of denoting this latter map by P, and it is now elementary to see that it is Fréchet differentiable, and in fact that DP_u(v) = P'(u)v, where P' is the derivative of P. (This will follow if we can show that there is an algebraic identity of the form P(X + Y) = P(X) + P'(X)Y + Q(X, Y)Y^2 for some polynomial Q in X and Y. But it is clearly enough to check this for monomial P, in which case it is immediate from the binomial theorem.)
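For example, for the monomial P(X) = X^3 the identity can be written out explicitly:

```latex
P(X+Y) = (X+Y)^3 = X^3 + 3X^2 Y + (3X + Y)Y^2,
\qquad \text{so } P'(X) = 3X^2 \text{ and } Q(X,Y) = 3X + Y.
```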
Let us denote by B_R the ball of radius R in H^m. Then as an immediate consequence of the preceding remarks we have:

Proposition 1. For any R > 0 there exist positive constants C_1 and C_2 such that for all u in B_R, ‖G(u)‖_m < C_1 and ‖DG_u‖ < C_2.
It will be important for us to have a good estimate of how the norm of B depends on Δt.

Proposition 2. Given T > 0, there is a positive constant C_3 such that the norm of the operator B on H^m satisfies ‖B‖ ≤ C_3 Δt^{(ℓ−q)/ℓ} for all Δt < T, where ℓ = degree(L(X)) and q = degree(M(X)), so that ℓ − q = degree(L(X)) − degree(M(X)) > 0. Thus lim_{Δt→0} ‖B‖ = 0.
PROOF. It is clear that the Fourier basis e_k(x) = e^{ikx} is orthogonal with respect to the H^m inner product (though not orthonormal, except in the case H^0 = L^2). Thus, since all constant coefficient differential operators are diagonalized in this basis, we can compute their norms on H^m by taking the maximum of the absolute values of their eigenvalues on the e_k. In the case of B, we have already seen that these eigenvalues are dM(ik)/(1 − dL(ik)). Since d = Δt/2, to prove the proposition it will suffice to show that |dM(ik)/(1 − dL(ik))| ≤ C_3 d^{(ℓ−q)/ℓ} for all real k and all d ≤ T/2.
Writing L(X) = Σ_{j=0}^{ℓ} b_j X^j and M(X) = Σ_{j=0}^{q} a_j X^j, let us define parametric families of polynomials L_c and M_c for c ≥ 0 by L_c(X) = Σ_{j=0}^{ℓ} (c^{ℓ−j} b_j) X^j and M_c(X) = Σ_{j=0}^{q} (c^{q−j} a_j) X^j. Now note that if we define ε = d^{1/ℓ}, then (since ε^{ℓ−j}(εX)^j = dX^j) clearly L_ε(εX) = dL(X), and similarly M_ε(εX) = ε^{q−ℓ} dM(X), so

    dM(ik)/(1 − dL(ik)) = ε^{ℓ−q} M_ε(iεk)/(1 − L_ε(iεk)),

and since ε^{ℓ−q} = d^{(ℓ−q)/ℓ}, to complete the proof it will suffice to show that the family of rational functions R_c(x) = M_c(ix)/(1 − L_c(ix)) is uniformly bounded for 0 ≤ c ≤ (T/2)^{1/ℓ} and x real. If R̄ is the one-point compactification of R and we define R_c(∞) = 0, then since the denominator of R_c(x) never vanishes and has degree greater than the numerator, it follows that (c, x) ↦ R_c(x) is continuous and hence bounded on the compact space [0, (T/2)^{1/ℓ}] × R̄.
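The decay rate in Proposition 2 is easy to check numerically. The sketch below (ours, not part of the notes) evaluates the multiplier dM(ik)/(1 − dL(ik)) over a large range of integer k for the KdV case L(X) = −X³, M(X) = −X, where ℓ = 3, q = 1 and the predicted bound is ‖B‖ = O(Δt^{2/3}).

```python
import numpy as np

# Numerical sanity check of Proposition 2 for KdV: L(X) = -X^3, M(X) = -X,
# so l = 3, q = 1 and ||B|| should scale like dt^{(l-q)/l} = dt^{2/3}.
def B_norm(dt, kmax=100000):
    k = np.arange(1, kmax, dtype=float)   # Fourier mode numbers
    d = dt / 2.0
    Lik = -(1j * k) ** 3                  # L(ik), pure imaginary
    Mik = -(1j * k)                       # M(ik)
    return np.max(np.abs(d * Mik / (1.0 - d * Lik)))

# The ratio ||B|| / dt^{2/3} should be roughly independent of dt.
ratios = [B_norm(dt) / dt ** (2.0 / 3.0) for dt in (1e-2, 1e-4, 1e-6)]
print(ratios)
```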
Theorem. Given R > 0 there exist positive r and T such that H_u is a contraction mapping of B_R into itself provided that u is in B_r and Δt < T. Moreover there is a uniform contraction constant K < 1 for all such u and Δt.
PROOF. From Proposition 1 and the definition of H_u it follows that H_u is differentiable on H^m and that D(H_u)_v = B ∘ DG_v. Then, again by Proposition 1, ‖D(H_u)_v‖ < C_2 ‖B‖ for all v in B_R, and so by Proposition 2, ‖D(H_u)_v‖ < C_2 C_3 Δt^{(ℓ−q)/ℓ}. Given K < 1, if we choose T < (K/(C_2 C_3))^{ℓ/(ℓ−q)}, then ‖D(H_u)_v‖ < K on the convex set B_R, and hence K is a contraction constant for H_u on B_R. It remains only to show that if we choose r sufficiently small, and perhaps a smaller T, then H_u also maps B_R into itself for u in B_r.
But using the definition of H_u again, it follows that

    ‖H_u(w)‖_m ≤ ‖Cu‖_m + ‖B‖ (‖G(u)‖_m + ‖G(w)‖_m),

and, recalling that C is unitary on H^m, it follows from Propositions 1 and 2 that ‖H_u(w)‖_m < r + 2C_1 C_3 T^{(ℓ−q)/ℓ}. Thus H_u will map B_R into itself provided r + 2C_1 C_3 T^{(ℓ−q)/ℓ} < R, i.e., provided r < R and T < ((R − r)/(2C_1 C_3))^{ℓ/(ℓ−q)}.
Numerics.
There are several types of numerical errors inherent in the WGMS algorithm. The first and most obvious is the error in approximating the integral of the right-hand side of the equation using the trapezoidal rule.
A second truncation error occurs when we stop the fixed-point iteration after a finite number of steps.
In actually implementing the WGMS algorithm to solve an initial value problem numerically, one usually chooses an integer N of the form 2^e, and works in the space V_N of band-limited functions u whose Fourier coefficients û(k) vanish for |k| > N/2. Of course, V_N is in all the Sobolev spaces. If we start with an initial condition u_0 not actually in V_N, then there will be an aliasing error when the initial Fast Fourier Transform projects it into V_N. Also, since the WGMS method does not rigorously preserve V_N, there will be further such errors at each time step.
Of course it is important to investigate the magnitude of these various local errors, see how they propagate, and get a bound for the global error.
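The aliasing error mentioned above is easy to exhibit: on an N-point grid the mode e^{i(k+N)x} is sampled identically to e^{ikx}, so frequencies outside V_N fold onto frequencies inside it. The snippet below is our own illustration of this folding.

```python
import numpy as np

# On the N-point periodic grid x_j = 2*pi*j/N, the frequency N+3 mode is
# indistinguishable from the frequency 3 mode, since (N+3)x_j = 2*pi*j + 3x_j.
# This is the aliasing error made when a function outside V_N is sampled
# and then projected into V_N by the FFT.
N = 16
x = 2.0 * np.pi * np.arange(N) / N
high = np.cos((N + 3) * x)   # a mode outside V_N
low = np.cos(3 * x)          # its alias inside V_N
print(np.max(np.abs(high - low)))   # the sampled values agree
```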
References
[A]
Adams, R.A., Sobolev Spaces, Academic Press, 1975.
[AKNS1]
Ablowitz, M.J., Kaup, D.J., Newell, A.C. and Segur, H., Method for solving the Sine-Gordon equation, Phys. Rev. Lett. 30 (1973), 1262–1264.
[AKNS2]
Ablowitz, M.J., Kaup, D.J., Newell, A.C. and Segur, H., The inverse scattering transform: Fourier analysis for nonlinear problems, Stud. Appl. Math. 53 (1974), 249–315.
[AbM]
Abraham, R., Marsden, J.E., Foundations of Mechanics, Benjamin/Cummings, 1978.
[Ad]
Adler, M., On a trace functional for formal pseudo-differential operators and the symplectic structure of the Korteweg-de Vries equation, Invent. Math. 50 (1979), 219–248.
[Ar]
Arnold, V.I., Mathematical Methods of Classical Mechanics, Springer-Verlag, 1978.
[BS]
Bona, J.L. and Smith, R., The Initial-Value Problem for the Korteweg-de Vries Equation, Philos. Trans. Royal Soc. London, Series A 278 (1975), 555–604.
[DJ]
Drazin, P.G., Johnson, R.S., Solitons: an introduction, Cambridge Univ. Press, 1989.
[FW]
Fornberg, B. and Whitham, G.B., A Numerical and Theoretical Study of Certain Nonlinear Wave Phenomena, Proc. R. Soc. Lond. A 289 (1978), 373–403.
[FT]
Faddeev, L.D., Takhtajan, L.A., Hamiltonian Methods in the Theory of Solitons, Springer-Verlag, 1987.
[FPU]
Fermi, E., Pasta, J., Ulam, S., Studies of Nonlinear Problems. I, Lectures in Applied Math., 1974.
[GGKM]
Gardner, C.S., Greene, J.M., Kruskal, M.D., Miura, R.M., Method for solving the Korteweg-de Vries equation, Phys. Rev. Lett. 19 (1967), 1095–1097.
[GL]
Gelfand, I.M., Levitan, B.M., On the determination of a differential equation from its spectral function, Izv. Akad. Nauk SSSR Ser. Mat. 15 (1951), 309–366.
[H]
Hamming, R.W., The Unreasonable Effectiveness of Mathematics, Amer. Math. Monthly 87 (1980).
[J]
John, F., Nonlinear Wave Equations, Formation of Singularities, American Math. Soc., ULS, 1990.
[Ka1]
Kato, T., On the Cauchy Problem for the (Generalized) Korteweg-de Vries Equation, Studies in Applied Math., Adv. in Math. Supp. Stud. 8 (1983), 93–128.
[Ka2]
Kato, T., Quasi-linear equations of evolution, with applications to partial differential equations, Lecture Notes in Math., vol. 448, Springer-Verlag, Berlin and New York, 1988.
[KM]
Kay, B., Moses, H.E., The determination of the scattering potential from the spectral measure function, III, Nuovo Cim. 3 (1956), 276–304.
[Kos]
Kostant, B., The solution to a generalized Toda lattice and representation theory, Adv. Math. 34 (1979), 195–338.
[KdV]
Korteweg, D.J., de Vries, G., On the change of form of long waves advancing in a rectangular canal, and on a new type of long stationary waves, Philos. Mag. Ser. 5 39 (1895), 422–443.
[La1]
Lax, P.D., Integrals of nonlinear equations of evolution and solitary waves, Comm. Pure Appl. Math. 21 (1968), 467–490.
[La2]
Lax, P.D., Hyperbolic Systems of Conservation Laws and the Mathematical Theory of Shock Waves, SIAM, CBMS 11, 1973.
[La3]
Lax, P.D., Outline of a theory of the KdV equation, in Recent Mathematical Methods in Nonlinear Wave Propagation, Lecture Notes in Math., vol. 1640, Springer-Verlag, Berlin and New York, 1996.
[M]
Marchenko, V.A., On the reconstruction of the potential energy from phases of the scattered waves, Dokl. Akad. Nauk SSSR 104 (1955), 695–698.
[MGSS]
Wineberg, S.B., Gabl, E.F., Scott, L.R. and Southwell, C.E., Implicit Spectral Methods for Wave Propagation Problems, J. Comp. Physics 97 (1991), 311–336.
[P]
Palais, R.S., Foundations of Global Non-linear Analysis, Benjamin and Co., 1968.
[Pa]
Palais, R.S., The Symmetries of Solitons, Bulletin of the AMS 34 (1997), 339–403. dg-ga/9708004
[Ru]
Russell, J.S., Report on Waves, 14th Mtg. of the British Assoc. for the Advance. of Science, John Murray, London, pp. 311–390 + 57 plates, 1844.
[S]
Sattinger, D.H., Weak and Strong Nonlinear Waves.
[St]
Strang, G., On the Construction and Comparison of Difference Schemes, SIAM J. Numerical Analysis 5 (1968), 506–517. MR 38:4057
[Sy]
Symes, W.W., Systems of Toda type, Inverse spectral problems, and representation theory, Inventiones Math. 59 (1980), 13–51.
[Ta]
Tappert, F., Numerical Solutions of the Korteweg-de Vries Equation and its Generalizations by the Split-Step Fourier Method, in Nonlinear Wave Motion, Lecture Notes in Applied Math. (AMS) v.15 (1974), 215–216.
[To]
Toda, M., Nonlinear Waves and Solitons, Kluwer, 1989.
[W]
Wigner, E., The Unreasonable Effectiveness of Mathematics in the Natural Sciences, Comm. Pure Appl. Math. 13 (1960).
[ZF]
Zakharov, V.E., Faddeev, L.D., Korteweg-de Vries equation, a completely integrable Hamiltonian system, Func. Anal. Appl. 5 (1971), 280–287.
[ZS]
Zakharov, V.E., Shabat, A.B., Exact theory of two-dimensional self-focusing and one-dimensional self-modulation of waves in nonlinear media, Sov. Phys. JETP 34 (1972), 62–69.
[ZK]
Zabusky, N.J., Kruskal, M.D., Interaction of solitons in a collisionless plasma and the recurrence of initial states, Phys. Rev. Lett. 15 (1965), 240–243.