When Functions Have No Value(s) : Delta Functions and Distributions
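$$\delta_{\Delta x}(x) = \begin{cases} 1/\Delta x & |x| \le \Delta x/2 \\ 0 & \text{otherwise} \end{cases}, \qquad \delta(x) = \lim_{\Delta x \to 0^+} \delta_{\Delta x}(x)\,?$$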
For any $\Delta x > 0$, the function $\delta_{\Delta x}(x)$ at right has integral $= 1$ and is zero except near $x = 0$. Unfortunately, the $\Delta x \to 0$ limit does not exist as an ordinary function: $\delta_{\Delta x}(x)$ approaches $\infty$ for $x = 0$, but of course $\infty$ is not a real number.
Informally, one often sees "definitions" of $\delta(x)$ that describe it as some mysterious object that is "not quite" a function, which $= 0$ for $x \neq 0$ but is undefined at $x = 0$, and which is "only really defined inside an integral" (where it "integrates" to 1).[1] This may leave you with a queasy feeling that $\delta(x)$ is somehow not real or rigorous (and therefore anything based on it may be suspect). For example, integration is an operation that is classically only defined for ordinary functions, so it may not even be clear (yet) what an expression like $\int \delta(x)\,dx$ means.

[1] Historically, this unsatisfactory description is precisely how $\delta(x)$ first appeared, and for many years there was a corresponding cloud of guilt and uncertainty surrounding any usage of it. Most famously, an informal $\delta(x)$ notion was popularized by physicist Paul Dirac, who in his Principles of Quantum Mechanics (1930) wrote: "Thus $\delta(x)$ is not a quantity which can be generally used in mathematical analysis like an ordinary function, but its use must be confined to certain simple types of expression for which it is obvious that no inconsistency can arise." You know you are on shaky ground, in mathematics, when you are forced to appeal to the "obvious."
1.2 Not all functions are differentiable
Most physical laws can be written in the form of derivatives, but lots of functions are not differentiable. Discontinuous functions arise all of the time at the interface between two materials (e.g. think of the density at the interface between glass and air). Of course, at a microscopic level you could argue that the quantum wavefunctions might be continuous, but one hardly wishes to resort to atoms and quantum mechanics every time a material has a boundary!
The classic example of a discontinuous function is the Heaviside step function:
$$S(x) = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases}.$$
The derivative S
(x
)dx
$$\frac{d}{dx}\sum_{n=1}^{\infty} a_n \sin(nx) = \sum_{n=1}^{\infty} a_n \frac{d}{dx}\sin(nx) = \sum_{n=1}^{\infty} n a_n \cos(nx),$$
but this is not always true if we interpret "=" in the usual sense of being true for every $x$. The general problem is that one cannot always interchange limits and derivatives, i.e. $\frac{d}{dx}\lim \neq \lim \frac{d}{dx}$. (Note that $\sum_{n=1}^{\infty}$ is really a limit $\lim_{N\to\infty}\sum_{n=1}^{N}$.)
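A simple example of this is the function
$$f_\epsilon(x) = \begin{cases} -\epsilon & x < -\epsilon \\ x & -\epsilon \le x \le \epsilon \\ \epsilon & x > \epsilon \end{cases}, \qquad f_\epsilon'(x) = \begin{cases} 1 & -\epsilon < x < \epsilon \\ 0 & |x| > \epsilon \end{cases}.$$
The limit as $\epsilon \to 0$ of $f_\epsilon(x)$ is zero for every $x$ (since $|f_\epsilon(x)| \le \epsilon$), and the derivative of zero is zero, yet $f_\epsilon'(0) = 1$ for every $\epsilon > 0$:
$$\left.\frac{d}{dx}\lim_{\epsilon\to 0} f_\epsilon\right|_{x=0} = 0 \neq 1 = \left.\lim_{\epsilon\to 0}\frac{d}{dx} f_\epsilon\right|_{x=0}.$$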
Notice, however, that this mismatch only occurs at one isolated point $x = 0$. The same thing happens for Fourier series: differentiation term-by-term works except at isolated points. Thus, this problem returns to the complaint in the previous section 1.3.
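The mismatch at $x = 0$ can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper names `f_eps` and `slope_at_zero` and the sample values of the small parameter are illustrative choices. It takes the ramp function to be $x$ clamped to $[-\epsilon, \epsilon]$.

```python
def f_eps(x, eps):
    """Ramp function: x clamped to the interval [-eps, eps]."""
    return max(-eps, min(x, eps))

def slope_at_zero(eps, h):
    """Centered finite-difference slope of f_eps at x = 0 with step h."""
    return (f_eps(h, eps) - f_eps(-h, eps)) / (2 * h)

eps_values = [1e-1, 1e-3, 1e-6]

# Derivative first, then shrink eps: the step h stays inside (-eps, eps),
# so the slope at 0 is 1 no matter how small eps gets.
slopes = [slope_at_zero(eps, eps / 10) for eps in eps_values]

# Limit first: the largest value of |f_eps| on a grid over [-2, 2] is eps,
# so f_eps shrinks to the zero function, whose slope is 0.
sup_norms = [max(abs(f_eps(k / 100, eps)) for k in range(-200, 201))
             for eps in eps_values]

print(slopes)
print(sup_norms)
```

The slopes stay at 1 while the functions themselves shrink uniformly to zero, which is the interchange failure described above.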
1.5 Too much information
A function $f(x)$ gives us a value at every point $x$, but does this really correspond to a measurable quantity in the physical universe? How would you measure the velocity at one instant in time, or the density at one point in a fluid? Of course, your measurement device could be very precise, and very small, and very fast, but in the end all you can ever measure are averages of $f(x)$ over a small region of space and/or time. But an average is the same thing as an integral $\int f(x)\phi(x)\,dx$ of $f(x)$ against a "test function" $\phi(x)$.

But if all we can ever ask is such an integral, why are we worrying about isolated points? In fact, why do we even define $f(x)$ to have values at points at all? In a physical application of mathematics, perhaps we should only define things that we can measure! This insight fixes every one of the problems above, and leads to the concept of distributions.
2 Distributions
The old kind of function is a map from $\mathbb{R} \to \mathbb{R}$: given an $x$, we get a value $f(x)$. Following section 1.5, this is too much information; we can only ask for an average value given some weight function $\phi(x)$. So, we make a new definition of "function" that provides this information, and only this information:

$f$ is a rule that given any test function $\phi(x)$ returns a number $f\{\phi\}$.[2]

This new definition of a "function" is called a distribution or a generalized function. We are no longer allowed to ask the value at a point $x$. This will fix all of the problems with the old functions from above. However, we should be more precise about our definition. First, we have to specify what $\phi(x)$ can be:

$\phi(x)$ is an ordinary function $\mathbb{R} \to \mathbb{R}$ (not a distribution) in some set $T$. We require $\phi(x)$ to be infinitely differentiable. We also require $\phi(x)$ to be nonzero only in some finite region (the "support" of $\phi$): $\phi$ is a smooth "bump" function.[3]

[2] Many authors use the notation $\langle f, \phi \rangle$ instead of $f\{\phi\}$, but we are already using $\langle \cdot, \cdot \rangle$ for inner products and I don't want to confuse matters.

[3] Alternatively, one sometimes loosens this requirement to merely say that $\phi(x)$ must go to $0$ quickly as $|x| \to \infty$, and in particular that $x^n \phi(x) \to 0$ as $|x| \to \infty$ for all integers $n \ge 0$.
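[The generalization to functions $\phi(x)$ for $x \in \mathbb{R}^d$, with the distributions corresponding to $d$-dimensional integrals, is very straightforward, but we will stick to $d = 1$ for simplicity.] Second,[4] we require that $f\{\phi\}$ act like integration in that it must be linear:

$f\{\alpha\phi_1 + \beta\phi_2\} = \alpha f\{\phi_1\} + \beta f\{\phi_2\}$ for any numbers $\alpha, \beta \in \mathbb{R}$ and any $\phi_1, \phi_2 \in T$.

Thus, $f$ is a linear map from $T \to \mathbb{R}$, and the set of all distributions for a given set of test functions $T$ is sometimes denoted $T'$.

The most obvious way to define a distribution is by an ordinary integral: given an ordinary function $f(x)$, we can define the distribution
$$f\{\phi\} = \int_{-\infty}^{\infty} f(x)\phi(x)\,dx.$$
This is called a regular distribution.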
Not all ordinary functions $f(x)$ define regular distributions: we must have $\int f(x)\phi(x)\,dx$ finite for all $\phi \in T$. This reduces to requiring that $\int_a^b |f(x)|\,dx < \infty$ for all intervals $[a, b]$ ($f$ is "locally integrable").
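To make "a distribution is a rule that eats a test function" concrete, here is a minimal sketch in Python. Everything named here (`bump`, `integrate`, `regular`, the integration window $[-10, 10]$, the choice $f(x) = x^2$) is an illustrative assumption rather than anything from the notes; the integral is approximated with a simple midpoint rule.

```python
import math

def bump(x, center=0.0, half_width=1.0):
    """Smooth test function: nonzero only for |x - center| < half_width."""
    t = (x - center) / half_width
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))

def integrate(g, a, b, n=20000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def regular(f, a=-10.0, b=10.0):
    """Regular distribution of f: the rule phi -> integral of f(x) phi(x) dx.

    Assumes every test function used is supported inside [a, b].
    """
    return lambda phi: integrate(lambda x: f(x) * phi(x), a, b)

F = regular(lambda x: x * x)  # the distribution defined by f(x) = x^2
phi1 = lambda x: bump(x)
phi2 = lambda x: bump(x, center=1.0, half_width=2.0)
combo = lambda x: 2.0 * phi1(x) - 3.0 * phi2(x)

# Linearity: F{2 phi1 - 3 phi2} should equal 2 F{phi1} - 3 F{phi2}.
lhs = F(combo)
rhs = 2.0 * F(phi1) - 3.0 * F(phi2)
print(lhs, rhs)
```

Note that `F` never exposes a value "at a point": the only thing one can do with it is hand it a test function and get a number back.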
2.2 Singular distributions & the delta function
Although the integral of an ordinary function is one way to define a distribution, it is not the only way. For example, the following distribution is a perfectly good one:
$$\delta\{\phi\} = \phi(0).$$
This rule is linear and continuous; there are no weird infinities, nor is there anything spooky or non-rigorous. Given a test function $\phi(x)$, the $\delta(x)$ distribution is simply the rule that gives $\phi(0)$ from each $\phi$. $\delta(x)$, however, does not correspond to any ordinary function (it is not a regular distribution), so we call it a singular distribution.

Notation: when we write "$\int \delta(x)\phi(x)\,dx$", we don't mean an ordinary integral, we really mean $\delta\{\phi\} = \phi(0)$.
[4] There is also a third requirement: $f\{\phi\}$ must be continuous, in that if you change $\phi(x)$ continuously the value of $f\{\phi\}$ must change continuously. In practice, you will never violate this condition unless you are trying to.
Furthermore, when we say $\delta(x - x')$, we just mean the distribution $\delta(x - x')\{\phi\} = \phi(x')$.
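If we look at the "finite-$\Delta x$ delta" approximation $\delta_{\Delta x}(x)$ from section 1.1, that defines the regular distribution:
$$\delta_{\Delta x}\{\phi\} = \frac{1}{\Delta x}\int_{-\Delta x/2}^{\Delta x/2} \phi(x)\,dx,$$
which is just the average of $\phi(x)$ in $[-\Delta x/2, \Delta x/2]$. Now, however, viewed as a distribution, the limit $\Delta x \to 0$ is perfectly well defined:[5]
$$\lim_{\Delta x \to 0} \delta_{\Delta x}\{\phi\} = \phi(0) = \delta\{\phi\}, \quad \text{i.e. } \delta_{\Delta x} \to \delta.$$
Of course, the $\delta$ distribution is not the only singular distribution, but it is the most famous one (and the one from which many other singular distributions are built). We will see more examples later.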
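The convergence $\delta_{\Delta x}\{\phi\} \to \phi(0)$ is easy to watch numerically. The sketch below is an illustration under assumptions of my own choosing: the off-center bump `phi`, the helper names `delta_box` and `delta`, and the midpoint-rule resolution are not from the notes.

```python
import math

def phi(x):
    """Smooth bump test function supported on (-0.7, 1.3)."""
    t = x - 0.3
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def delta_box(phi, dx, n=2000):
    """delta_dx{phi}: average of phi over [-dx/2, dx/2] (midpoint rule)."""
    h = dx / n
    return sum(phi(-dx / 2 + (i + 0.5) * h) for i in range(n)) * h / dx

def delta(phi):
    """The delta distribution itself: evaluate the test function at 0."""
    return phi(0.0)

# The gap between the box average and phi(0) shrinks as the box narrows.
errors = [abs(delta_box(phi, dx) - delta(phi)) for dx in (1.0, 0.1, 0.01)]
print(errors)
```

The singular distribution `delta` involves no limit and no infinity at all: it is just point evaluation, and the box averages approach it.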
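2.3 Derivatives of distributions & differentiating discontinuities
How do we define the derivative $f'$ of a distribution? Well, at the very least we want it to be the same as the ordinary derivative when $f$ is a regular distribution $f(x)$ that happens to be differentiable in the ordinary sense. In that case,
$$f'\{\phi\} = \int f'(x)\phi(x)\,dx = -\int f(x)\phi'(x)\,dx = f\{-\phi'\},$$
where we integrated by parts; the boundary terms vanish because $\phi$ is zero outside a finite region. We take this as the definition for every distribution: the distributional derivative $f'$ of $f$ is given by the distribution $f'\{\phi\} = f\{-\phi'\}$, where $\phi'(x)$ is the ordinary derivative of $\phi(x)$. For example, $\delta'\{\phi\} = \delta\{-\phi'\} = -\phi'(0)$.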
The most important consequence of this definition is that even discontinuous functions are differentiable as distributions, and their derivatives give delta functions for each discontinuity. Consider the regular distribution $S$ defined by the step function $S(x)$:
$$S\{\phi\} = \int_{-\infty}^{\infty} S(x)\phi(x)\,dx = \int_0^{\infty} \phi(x)\,dx.$$
[5] We have used the fact that $\phi(x)$ is required to be continuous, from which one can show that nothing weird can happen with the average of $\phi(x)$ as $\Delta x \to 0$.
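It immediately follows that the distributional derivative of the step function is
$$S'\{\phi\} = S\{-\phi'\} = -\int_0^{\infty} \phi'(x)\,dx = -\phi(x)\Big|_0^{\infty} = \phi(0) - \phi(\infty) = \phi(0).$$
But this is exactly the same as $\delta\{\phi\}$, so we immediately conclude: $S' = \delta$.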
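As a sanity check on $S' = \delta$, one can evaluate $S\{-\phi'\}$ by quadrature and compare it with $\phi(0)$. This is a hedged sketch: the particular bump `phi`, its hand-derived derivative `dphi`, and the helper `step_distribution` are assumptions made for illustration only.

```python
import math

def phi(x):
    """Smooth bump test function supported on (-0.7, 1.3)."""
    t = x - 0.3
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def dphi(x):
    """Ordinary derivative of phi (chain rule on the exponent)."""
    t = x - 0.3
    if abs(t) >= 1.0:
        return 0.0
    return phi(x) * (-2.0 * t) / (1.0 - t * t) ** 2

def step_distribution(psi, upper=2.0, n=20000):
    """S{psi}: integral of psi over [0, upper]; psi must vanish beyond upper."""
    h = upper / n
    return h * sum(psi((i + 0.5) * h) for i in range(n))

# Distributional derivative of the step: S'{phi} = S{-phi'}.
s_prime_of_phi = step_distribution(lambda x: -dphi(x))
print(s_prime_of_phi, phi(0.0))
```

Nothing in the computation ever differentiates the step function itself; the derivative is moved onto the smooth test function, exactly as in the definition.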
Since any function with jump discontinuities can be written in terms of $S(x)$, we find that the derivative of any jump discontinuity gives a delta function multiplied by the magnitude of the jump. For example, consider:
$$f(x) = \begin{cases} x^2 & x < 3 \\ x^3 & x \ge 3 \end{cases} = x^2 + (x^3 - x^2)S(x - 3).$$
The distributional derivative works just like the ordinary derivative, except that $S' = \delta$, so[6]
$$f'(x) = 2x + (3x^2 - 2x)S(x - 3) + (3^3 - 3^2)\delta(x - 3) = 18\,\delta(x - 3) + \begin{cases} 2x & x < 3 \\ 3x^2 & x \ge 3 \end{cases},$$
where of course by f
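The jump example can be verified against the definition $f'\{\phi\} = f\{-\phi'\}$. In the sketch below, the test function centered at 3.2 (so that its support straddles the jump at $x = 3$) and the helper names are hypothetical choices; integrals are split at $x = 3$ so that the midpoint rule only ever sees smooth integrands.

```python
import math

CENTER = 3.2  # hypothetical bump center, chosen so the support straddles x = 3

def phi(x):
    """Smooth bump test function supported on (2.2, 4.2)."""
    t = x - CENTER
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def dphi(x):
    """Ordinary derivative of phi."""
    t = x - CENTER
    if abs(t) >= 1.0:
        return 0.0
    return phi(x) * (-2.0 * t) / (1.0 - t * t) ** 2

def integrate(g, a, b, n=20000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def pieces(g_left, g_right):
    """Integrate over the support of phi, split at the jump x = 3."""
    return (integrate(g_left, CENTER - 1.0, 3.0)
            + integrate(g_right, 3.0, CENTER + 1.0))

# Definition: f'{phi} = f{-phi'} with f = x^2 for x < 3 and x^3 for x >= 3.
by_definition = pieces(lambda x: -x**2 * dphi(x), lambda x: -x**3 * dphi(x))

# Formula: 18 phi(3) plus the piecewise ordinary derivative paired with phi.
by_formula = 18.0 * phi(3.0) + pieces(lambda x: 2 * x * phi(x),
                                      lambda x: 3 * x**2 * phi(x))
print(by_definition, by_formula)
```

The two numbers agree to quadrature accuracy, and the $18\,\phi(3)$ term is exactly the contribution of the jump from $3^2$ to $3^3$.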
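Consider the function
$$f(x) = \begin{cases} 1 & x = 0 \\ 0 & \text{otherwise} \end{cases}.$$
This is not a delta function: it is finite at $x = 0$, and is a perfectly acceptable function. It also defines a regular distribution:
$$f\{\phi\} = \int f(x)\phi(x)\,dx = 0.$$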
[6] I'm being a bit glib here. How do we know that the product rule works the same? Below, we will rigorously define what it means to multiply a distribution by a smooth function like $x^3 - x^2$, and the ordinary product rule will follow.
Think about it: no matter what $\phi(x)$ is, the integral must give zero because the integrand is only nonzero (by a finite amount) at a single point, with zero area ("zero measure" for the pure-math folks). Thus, in the distribution sense, we can say perfectly rigorously that
$$f = 0$$
even though $f(x) \neq 0$ in the ordinary-function sense!
In general, any two ordinary functions that only differ (by finite amounts, not delta functions!) at isolated points (a "set of measure zero") define the same regular distribution. We no longer have to make caveats about isolated points: finite values at isolated points make no difference to a distribution. For example, there are no more caveats about the Fourier series or Fourier transforms: they converge, period, for distributions.[7]

Also, there is no more quibbling about the value of things like $S(x)$ right at the point of discontinuities. It doesn't matter, for distributions. Nor is there quibbling about the derivative of things like $|x|$ right at the point of the slope discontinuity. It is an easy matter to show that the distributional derivative of $|x|$ is simply $2S(x) - 1$, i.e. it is the regular distribution corresponding to the function that is $+1$ for $x > 0$ and $-1$ for $x < 0$ (with the value at $x = 0$ being rigorously irrelevant).
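For completeness, here is one way the $|x|$ claim can be worked out from the definition $f'\{\phi\} = f\{-\phi'\}$. This is a sketch that assumes $\phi$ vanishes outside a finite region, so every boundary term at $\pm\infty$ drops out, and the $x\phi(x)$ terms vanish at $x = 0$:

```latex
\begin{align*}
|x|'\{\phi\} &= |x|\{-\phi'\}
  = \int_{-\infty}^{0} x\,\phi'(x)\,dx - \int_{0}^{\infty} x\,\phi'(x)\,dx \\
 &= \Big[x\phi(x)\Big]_{-\infty}^{0} - \int_{-\infty}^{0}\phi(x)\,dx
  - \Big[x\phi(x)\Big]_{0}^{\infty} + \int_{0}^{\infty}\phi(x)\,dx \\
 &= \int_{-\infty}^{\infty} \big[2S(x) - 1\big]\,\phi(x)\,dx .
\end{align*}
```

No delta function appears because $|x|$ is continuous at $x = 0$; only its slope jumps.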
2.5 Interchanging limits and derivatives
With a distribution, limits and (distributional) derivatives can always be interchanged. This is tremendously useful when talking about PDEs and convergence of approximations. In the distribution sense, the Fourier series can always be differentiated term-by-term, for example.

This is easy to prove. Suppose that the distributions $f_n \to f$ as $n \to \infty$. That is, for any $\phi(x)$, $f_n\{\phi\} \to f\{\phi\}$. Since this is true for any $\phi(x)$, it must be true for $-\phi'(x)$, and hence $f_n'\{\phi\} = f_n\{-\phi'\} \to f\{-\phi'\} = f'\{\phi\}$ as $n \to \infty$. Q.E.D.
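A small numerical illustration of this theorem, using an example of my own choosing rather than one from the notes: $f_n(x) = \sin(nx)/n$ converges to zero, but its derivative $\cos(nx)$ has no pointwise limit (at $x = 0$ it equals 1 for every $n$). Paired with a test function, however, the derivatives do converge to zero. The bump `phi` and the helper `pair` are illustrative assumptions.

```python
import math

def phi(x):
    """Smooth bump test function supported on (-0.7, 1.3)."""
    t = x - 0.3
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def pair(g, n_pts=40000):
    """g{phi} for an ordinary function g: midpoint rule over the support of phi."""
    a, b = -0.7, 1.3
    h = (b - a) / n_pts
    return h * sum(g(a + (i + 0.5) * h) * phi(a + (i + 0.5) * h)
                   for i in range(n_pts))

ns = (1, 4, 16, 64)
# f_n{phi} with f_n = sin(n x)/n: tends to 0 (bounded by integral(phi)/n).
f_pairings = [pair(lambda x, n=n: math.sin(n * x) / n) for n in ns]
# f_n'{phi} with f_n' = cos(n x): also tends to 0, even though cos(n*0) = 1.
df_pairings = [pair(lambda x, n=n: math.cos(n * x)) for n in ns]
print(f_pairings)
print(df_pairings)
```

Both lists shrink toward zero as `n` grows, consistent with $f_n \to 0$ implying $f_n' \to 0$ in the distribution sense.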
3 Rules for distributions
For the most part, in 18.303, we will cheat a bit. We will treat things as ordinary functions whenever we can, using the ordinary operations, and only switch to interpreting them as distributions when we run into difficulty (e.g. derivatives of discontinuities, etcetera). Since the rules for distribution operations are all defined to be consistent with those for functions in the case of regular distributions, this doesn't usually cause any trouble.

[7] Technically, we have to choose distributions with the right set of test functions. The right test functions for the general Fourier transform on the real line are those for which $x^n \phi(x) \to 0$ as $|x| \to \infty$ for any $n > 0$, i.e. $\phi(x)$ decays faster than any inverse polynomial. The resulting distributions are called tempered distributions, and are the domain of the Fourier transform.
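However, it is good to define some of the important operations on distributions precisely. All you have to do is to explain what the operation does to test functions, usually defined by analogy with $\int f(x)\phi(x)\,dx$ for regular distributions. Here are a few of the most basic operations:

differentiation: $f'\{\phi\} = f\{-\phi'\}$

addition: $(f_1 + f_2)\{\phi\} = f_1\{\phi\} + f_2\{\phi\}$

multiplication by a smooth function $s(x)$: $[s \cdot f]\{\phi\} = f\{s\phi\}$

product rule for a smooth function $s(x)$: $[s \cdot f]'\{\phi\} = [s \cdot f]\{-\phi'\} = f\{-s\phi'\} = f\{s'\phi - (s\phi)'\} = f\{s'\phi\} + f'\{s\phi\} = [s' \cdot f]\{\phi\} + [s \cdot f']\{\phi\} = [s' \cdot f + s \cdot f']\{\phi\}$.

translation: $[f(x - y)]\{\phi(x)\} = f\{\phi(x + y)\}$

scaling: $[f(\alpha x)]\{\phi(x)\} = \frac{1}{\alpha} f\{\phi(x/\alpha)\}$ (for $\alpha > 0$; for general $\alpha \neq 0$ the prefactor is $1/|\alpha|$)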
If you are not sure where these rules come from, just try plugging them into a regular distribution $f\{\phi\} = \int f(x)\phi(x)\,dx$.
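For instance, the translation and scaling rules follow from a change of variables in the integral for a regular distribution. The sketch below uses the substitution variable $u$ (my notation, not the notes') and assumes $\alpha > 0$ in the scaling case:

```latex
\begin{align*}
[f(x - y)]\{\phi(x)\}
  &= \int f(x - y)\,\phi(x)\,dx
   = \int f(u)\,\phi(u + y)\,du
   = f\{\phi(x + y)\}, && u = x - y, \\
[f(\alpha x)]\{\phi(x)\}
  &= \int f(\alpha x)\,\phi(x)\,dx
   = \frac{1}{\alpha}\int f(u)\,\phi(u/\alpha)\,du
   = \frac{1}{\alpha}\, f\{\phi(x/\alpha)\}, && u = \alpha x .
\end{align*}
```

For singular distributions such as $\delta$ there is no integral to substitute into, so the right-hand sides are simply taken as the definitions.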
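$\delta_{\Delta x}(x)^2$ is okay, but its limit as $\Delta x \to 0$ does not exist even as a distribution (the amplitude goes as $1/\Delta x^2$ while the width of the integration region goes only as $\Delta x$, so the integral diverges like $1/\Delta x$).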
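The divergence is easy to see numerically. In this sketch (the bump `phi`, the helper `box_squared`, and the sample widths are illustrative assumptions), the squared box function is paired with a test function for shrinking widths:

```python
import math

def phi(x):
    """Smooth bump test function supported on (-1, 1)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def box_squared(phi, dx, n=2000):
    """(delta_dx)^2{phi}: integral of (1/dx)^2 * phi over [-dx/2, dx/2]."""
    h = dx / n
    return sum(phi(-dx / 2 + (i + 0.5) * h) for i in range(n)) * h / dx**2

values = [box_squared(phi, dx) for dx in (1e-1, 1e-2, 1e-3)]
# Each tenfold reduction of dx multiplies the pairing by roughly ten.
ratios = [values[i + 1] / values[i] for i in range(2)]
print(values)
print(ratios)
```

The pairings grow like $\phi(0)/\Delta x$ instead of settling down, so there is no limiting distribution to call $\delta(x)^2$.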
For linear PDEs, lack of multiplication is not such a big problem, but it does mean that we need to be careful about Hilbert spaces: if we think of the solutions $u(x)$ as distributions, we have a problem because $\langle u, u \rangle$ may not be defined (the set of distributions does not form a Hilbert space). (Technically, we can make something called a rigged Hilbert space that includes distributions, but I don't want to go there.)
5 The weak form of a PDE
Suppose we have a linear PDE $\hat{A}u = f$. We want to allow $f$ to be a delta function etcetera, but we still want to talk about boundary conditions, Hilbert spaces, and so on for $u$. There is a relatively simple compromise for linear PDEs, called the weak form of the PDE or the weak solution. This concept can roughly be described as requiring $\hat{A}u = f$ only in the "weak" sense of distributions (i.e., integrated against test functions, taking distributional derivatives in the case of a $u$ with discontinuities), but requiring $u$ to be in a more restrictive class, e.g. regular distributions corresponding to functions satisfying the boundary conditions in the ordinary sense but having some continuity and differentiability so that $\langle u, u \rangle$ and $\langle u, \hat{A}u \rangle$ are finite (i.e., living in an appropriate Sobolev space). Even this definition gets rather technical very quickly, especially if you want to allow delta functions for $f$ (in which case $u$ can blow up at the point of the delta for dimensions $> 1$). Nailing down precisely what spaces of functions and operators one is dealing with is where a lot of technical difficulty and obscurity arises in functional analysis. However, for practical engineering and science applications, we can get away with being a little careless in precisely how we define the function spaces because the weird counterexamples are usually obviously unphysical. The key insight of distributions is that what we care about is weak equality, not pointwise equality, and correspondingly we only need weak derivatives (integration by parts, or technically bilinear forms).
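As a concrete illustration, take the operator $\hat{A} = -\frac{d^2}{dx^2}$ (an example chosen here for simplicity). Applying the definition of the distributional derivative twice, the weak statement of $\hat{A}u = f$ can be sketched as follows; the second line assumes in addition that $u$ is a regular distribution whose first derivative $u'$ is an ordinary (possibly discontinuous) function:

```latex
\begin{align*}
(\hat{A}u)\{\phi\} = (-u'')\{\phi\} = u\{-\phi''\} &= f\{\phi\}
  \quad \text{for every test function } \phi, \\
\text{equivalently}\quad \int u'(x)\,\phi'(x)\,dx &= f\{\phi\} .
\end{align*}
```

Only one derivative of $u$ appears in the second form, which is why $u$ may have kinks where $f$ contains delta functions.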
6 Green's functions
Now that we have distributions, Green's functions are much easier to work with. Consider, for example, the Green's function $G(x, x')$ of $\hat{A} = -\frac{\partial^2}{\partial x^2}$ on $[0, L]$ with Dirichlet (zero) boundary conditions. If $\hat{A}u = f$ is to be solved by $u(x) = \int G(x, x') f(x')\,dx'$, then we must have $\hat{A}u = \int [\hat{A}G(x, x')] f(x')\,dx' = f(x)$, which is true if
$$\hat{A}G(x, x') = \delta(x - x')$$
and the integrals are re-interpreted as evaluating a distribution.
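What does this equation mean? For any $x \neq x'$ [or for any $\phi$ with $\phi(x') = 0$] we must have $\hat{A}G(x, x') = 0 = -\frac{\partial^2}{\partial x^2}G(x, x')$, so $G$ must be a straight line in $x$ for $x < x'$ and for $x > x'$. To satisfy the boundary conditions, this straight line must pass through zero at $0$ and $L$, and hence $G(x, x')$ must look like $\alpha x$ for $x < x'$ and $\beta(x - L)$ for $x > x'$, for some constants $\alpha$ and $\beta$. $G$ must also be continuous at $x = x'$, or otherwise we would get a delta function from the first derivative; hence $\alpha = \beta(x' - L)/x'$. The first derivative $\frac{\partial}{\partial x}G(x, x')$ is then $\alpha$ for $x < x'$ and $\beta$ for $x > x'$. This $\frac{\partial}{\partial x}G(x, x')$ is discontinuous at $x = x'$, so its ordinary derivative does not exist there, but as a distribution it is no problem: $\frac{\partial^2}{\partial x^2}G(x, x')$ is zero everywhere (the derivative of a constant) plus a delta function $\delta(x - x')$ multiplied by $\beta - \alpha$, the size of the jump. Thus $-\frac{\partial^2}{\partial x^2}G(x, x') = \hat{A}G = (\alpha - \beta)\delta(x - x')$, and from above we must have $\alpha - \beta = 1$. Combined with the equation for $\alpha$ from continuity of $G$, we obtain $\beta = -x'/L$ and $\alpha = 1 - x'/L$.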
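This Green's function can be tested on a problem with a known answer. The sketch below makes several assumptions for illustration: it takes the operator to be minus the second derivative with slopes $\alpha = 1 - x'/L$ and $\beta = -x'/L$, picks a domain length `L = 2.0` and the right-hand side $f = 1$, and compares against the exact solution $u(x) = x(L - x)/2$ of $-u'' = 1$ with zero boundary values. The helper names are hypothetical.

```python
L = 2.0  # hypothetical domain length for the check

def green(x, xp):
    """G(x, x'): alpha * x for x < x', beta * (x - L) otherwise."""
    alpha = 1.0 - xp / L
    beta = -xp / L
    return alpha * x if x < xp else beta * (x - L)

def solve(f, x, n=2000):
    """u(x) = integral of G(x, x') f(x') dx' over [0, L].

    The integral is split at the kink x' = x so that the midpoint rule
    only sees smooth integrands on each piece.
    """
    total = 0.0
    for a, b in ((0.0, x), (x, L)):
        h = (b - a) / n
        total += h * sum(green(x, a + (i + 0.5) * h) * f(a + (i + 0.5) * h)
                         for i in range(n))
    return total

xs = [0.25, 1.0, 1.7]
u = [solve(lambda xp: 1.0, x) for x in xs]
exact = [x * (L - x) / 2.0 for x in xs]  # solves -u'' = 1, u(0) = u(L) = 0
print(u)
print(exact)
```

The agreement confirms that the kink in $G$ at $x = x'$, with slope change $\beta - \alpha = -1$, is exactly what produces the delta function on the right-hand side.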
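"$\int \delta(x - x')\phi(x)\,dx$" just means $\phi(x')$.

Integrals over a domain $\Omega$, $\int_\Omega \delta(x - x')\phi(x)$, give $\phi(x')$ if $x'$ is in the interior of $\Omega$ and $0$ if $x'$ is outside $\Omega$, but are undefined (or at least, more care is required) if $x'$ is on the boundary $d\Omega$.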
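When in doubt about how to compute $f'(x)$, integrate by parts against a test function to see what $\int f'(x)\phi(x) = -\int f(x)\phi'(x)$ gives. A jump discontinuity at $x'$ contributes a delta function $\delta(x - x')$ multiplied by the size of the jump, plus the ordinary derivative everywhere else. This also applies to differentiating functions like 1/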