When Functions Have No Value(s): Delta Functions and Distributions


Steven G. Johnson, MIT course 18.303 notes
Created October 2010, updated April 15, 2011.
Abstract
These notes give a brief introduction to the motivations, concepts, and properties of distributions, which generalize the notion of functions $f(x)$ to allow derivatives of discontinuities, "delta" functions, and other nice things. This generalization is increasingly important the more you work with linear PDEs, as we do in 18.303. For example, Green's functions are extremely cumbersome if one does not allow delta functions. Moreover, solving PDEs with functions that are not classically differentiable is of great practical importance (e.g. a plucked string with a triangle shape is not twice differentiable, making the wave equation problematic with traditional functions). Any serious work with PDEs will eventually run into the concept of a "weak solution," which is essentially a version of the distribution concept.
1 What's wrong with functions?
The most familiar notion of a function $f(x)$ is a map from real numbers $\mathbb{R}$ to real numbers $\mathbb{R}$ (or maybe complex numbers $\mathbb{C}$); that is, for every $x$ you have a value $y = f(x)$. Of course, you may know that one can define functions from any set to any other set, but at first glance it seems that $\mathbb{R} \to \mathbb{R}$ functions (and multi-dimensional generalizations thereof) are the best-suited concept for describing most physical quantities: for example, velocity as a function of time, or pressure or density as a function of position in a fluid, and so on. Unfortunately, such functions have some severe drawbacks that, eventually, lead them to be replaced in some contexts by another concept: distributions (also called generalized functions). What are these drawbacks?
1.1 No delta functions
For lots of applications, such as those involving PDEs and Green's functions, one would like to have a function $\delta(x)$ whose integral is "concentrated" at the point $x = 0$. That is, one would like the function $\delta(x) = 0$ for all $x \ne 0$, but with $\int \delta(x)\,dx = 1$ for any integration region that includes $x = 0$; this concept is called a "Dirac delta function" or simply a "delta function." $\delta(x)$ is usually the simplest right-hand side for which to solve differential equations, yielding a Green's function. It is also the simplest way to consider physical effects that are concentrated within very small volumes or times, for which you don't actually want to worry about the microscopic details in this volume: for example, think of the concepts of a "point charge," a "point mass," a force plucking a string at "one point," a "kick" that "suddenly" imparts some momentum to an object, and so on. The problem is that there is no classical function $\delta(x)$ having these properties.
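For example, one could imagine constructing this function as the limit:
$$\delta(x) = \lim_{\Delta x \to 0^+} \begin{cases} \frac{1}{\Delta x} & |x| \le \Delta x/2 \\ 0 & \text{otherwise} \end{cases} = \lim_{\Delta x \to 0^+} \delta_{\Delta x}(x)\,?$$
For any $\Delta x > 0$, the box function $\delta_{\Delta x}(x)$ has integral $= 1$ and is zero except near $x = 0$. Unfortunately, the $\Delta x \to 0$ limit does not exist as an ordinary function: $\delta_{\Delta x}(x)$ approaches $\infty$ for $x = 0$, but of course $\infty$ is not a real number.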
Informally, one often sees "definitions" of $\delta(x)$ that describe it as some mysterious object that is "not quite" a function, which $= 0$ for $x \ne 0$ but is undefined at $x = 0$, and which is "only really defined inside an integral" (where it "integrates" to 1).[1] This may leave you with a queasy feeling that $\delta(x)$ is somehow not real or rigorous (and therefore anything based on it may be suspect). For example, integration is an operation that is classically only defined for ordinary functions, so it may not even be clear (yet) what "$\int$" means when we write "$\int \delta(x)\,dx$".

[1] Historically, this unsatisfactory description is precisely how $\delta(x)$ first appeared, and for many years there was a corresponding cloud of guilt and uncertainty surrounding any usage of it. Most famously, an informal $\delta(x)$ notion was popularized by physicist Paul Dirac, who in his Principles of Quantum Mechanics (1930) wrote: "Thus $\delta(x)$ is not a quantity which can be generally used in mathematical analysis like an ordinary function, but its use must be confined to certain simple types of expression for which it is obvious that no inconsistency can arise." You know you are on shaky ground, in mathematics, when you are forced to appeal to the "obvious."
1.2 Not all functions are differentiable
Most physical laws can be written in the form of derivatives, but lots of functions are not differentiable. Discontinuous functions arise all of the time at the interface between two materials (e.g. think of the density at the interface between glass and air). Of course, at a microscopic level you could argue that the quantum wavefunctions might be continuous, but one hardly wishes to resort to atoms and quantum mechanics every time a material has a boundary!
The classic example of a discontinuous function is the Heaviside step function:
$$S(x) = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases}.$$
The derivative $S'(x)$ is zero everywhere except at $x = 0$, where the derivative does not exist: the slope at $x = 0$ is "infinity." Notice that $S'(x)$ very much resembles $\delta(x)$, and $\int_{-\infty}^{x} \delta(x')\,dx'$ would certainly look something like $S(x)$ if it existed (since the "integral" of $\delta$ should be 1 for $x > 0$). This is not a coincidence, and it would be a shame not to exploit it!
A function doesn't need to be discontinuous to lack a derivative. Consider the function $|x|$: its derivative is $+1$ for $x > 0$ and $-1$ for $x < 0$, but at $x = 0$ the derivative doesn't exist. We say that $|x|$ is only "piecewise" differentiable.
Note that $S(x)$ is very useful for writing down all sorts of functions with discontinuities. $|x| = x S(x) - x S(-x)$, for example, and the $\delta_{\Delta x}(x)$ "box" function on the right-hand side of the $\delta(x)$ "definition" above can be written $\delta_{\Delta x}(x) = [S(x + \Delta x/2) - S(x - \Delta x/2)]/\Delta x$.
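As a quick sanity check, the two step-function identities can be evaluated numerically. The following is a minimal Python sketch, not part of the original notes; the helper names `S`, `abs_via_step`, `box`, and `midpoint_integral` are illustrative choices, and the convention $S(0) = 1$ is taken from the definition of the step function above.

```python
def S(x):
    """Heaviside step function with the convention S(0) = 1 used in the text."""
    return 1.0 if x >= 0 else 0.0

def abs_via_step(x):
    """|x| written as x*S(x) - x*S(-x)."""
    return x * S(x) - x * S(-x)

def box(x, dx):
    """Box function [S(x + dx/2) - S(x - dx/2)] / dx: width dx, height 1/dx."""
    return (S(x + dx / 2) - S(x - dx / 2)) / dx

def midpoint_integral(g, a, b, n=4000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

samples = [-2.5, -1.0, 0.0, 0.3, 4.0]
abs_matches = all(abs_via_step(x) == abs(x) for x in samples)

# The box keeps unit area for every width dx, even as its height 1/dx grows.
box_areas = [midpoint_integral(lambda x, dx=dx: box(x, dx), -1.0, 1.0)
             for dx in (0.5, 0.1, 0.02)]
```

With $S(0) = 1$, this box evaluates to 0 at the single point $x = +\Delta x/2$, whereas the piecewise definition gives $1/\Delta x$ there. The two expressions differ only at that one isolated point, which does not affect the area.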
1.3 Nagging worries about discrepancies at isolated points
When we try to do linear algebra with functions, we continually find ourselves worrying about excluding odd caveats and counterexamples that have to do with finite discrepancies at isolated points. For example, a Fourier series of a square-integrable function converges everywhere... except at isolated points of discontinuity [like the point $x = 0$ for $S(x)$, where a Fourier series would converge to 0.5]. As another example, $\langle u, v \rangle = \int \bar{u} v$ defines an inner product on functions and a norm $\|u\|^2 = \langle u, u \rangle > 0$ for $u \ne 0$... except for $u(x)$ that are only nonzero for isolated points. And with functions like $S(x)$ there are all sorts of apparently pointless questions about what value to assign exactly at the discontinuity. It may likewise seem odd to care about what value to assign the slope of $|x|$ at $x = 0$; surely any value in $[-1, 1]$ should do?
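To make the Fourier-series caveat concrete, here is a small Python sketch added for illustration. It assumes the step function is restricted to $(-\pi, \pi)$ and extended periodically, so that its Fourier series is $\frac{1}{2} + \frac{2}{\pi}\sum_{k \ge 0} \frac{\sin((2k+1)x)}{2k+1}$; the function name `step_partial_sum` and the choice of interval are assumptions, not something fixed by the notes.

```python
import math

def step_partial_sum(x, n_terms):
    """Partial Fourier sum of the 2*pi-periodic extension of S(x) on (-pi, pi)."""
    total = 0.5
    for k in range(n_terms):
        m = 2 * k + 1
        total += (2.0 / math.pi) * math.sin(m * x) / m
    return total

at_jump = step_partial_sum(0.0, 2000)         # every sine term vanishes at x = 0
right_of_jump = step_partial_sum(1.0, 2000)   # approaches S(1) = 1
left_of_jump = step_partial_sum(-1.0, 2000)   # approaches S(-1) = 0
```

Away from the jump the partial sums approach the values of $S(x)$, but at $x = 0$ every partial sum equals 0.5, regardless of which value one assigned to $S(0)$.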
1.4 Limits and derivatives cannot always be interchanged
In numerical and analytical PDE methods, we are continually writing functions as limits. A Fourier series is a limit as the number of terms goes to infinity. A finite-difference method solves the problem in the limit as $\Delta x \to 0$. We initially find the Green's function as the limit of the response to a box-like function that is nonzero only within a width $\Delta x$, and then take the limit $\Delta x \to 0$. After all of these kinds of things, we eventually substitute our solution back into the PDE, and we assume that the limit of the solution is still a solution. In doing so, however, we usually end up interchanging the limits and the differentiation. For example, we usually differentiate a Fourier series by:
$$\frac{d}{dx}\sum_{n=1}^{\infty} a_n \sin(nx) = \sum_{n=1}^{\infty} a_n \frac{d}{dx}\sin(nx) = \sum_{n=1}^{\infty} n a_n \cos(nx),$$
but this is not always true if we interpret "=" in the usual sense of being true for every $x$. The general problem is that one cannot always interchange limits and derivatives, i.e. $\frac{d}{dx}\lim \ne \lim \frac{d}{dx}$. (Note that $\sum_{n=1}^{\infty}$ is really a limit $\lim_{N\to\infty}\sum_{n=1}^{N}$.)
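A simple example of this is the function
$$f_\epsilon(x) = \begin{cases} -\epsilon & x < -\epsilon \\ x & -\epsilon \le x \le \epsilon \\ \epsilon & x > \epsilon \end{cases}, \qquad f'_\epsilon(x) = \begin{cases} 1 & -\epsilon < x < \epsilon \\ 0 & |x| > \epsilon \end{cases}.$$
The limit as $\epsilon \to 0$ of $f_\epsilon(x)$ is simply zero. But $f'_\epsilon(0) = 1$ for all $\epsilon$, so
$$0 = \frac{d}{dx}\left[\lim_{\epsilon \to 0} f_\epsilon\right]_{x=0} \ne 1 = \lim_{\epsilon \to 0}\left[\frac{d}{dx} f_\epsilon\right]_{x=0}.$$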
Notice, however, that this mismatch only occurs at one isolated point $x = 0$. The same thing happens for Fourier series: differentiation term-by-term works except at isolated points. Thus, this problem returns to the complaint in the previous section 1.3.
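The mismatch between the two orders of operations can be seen numerically. The sketch below is an illustration rather than part of the notes: it implements the clamped ramp as a hypothetical helper `f_eps` and estimates its slope at $x = 0$ with a centered difference whose step is chosen (an assumption) to stay well inside $(-\epsilon, \epsilon)$.

```python
def f_eps(x, eps):
    """Clamp x to [-eps, eps]: the piecewise ramp from the text."""
    return max(-eps, min(x, eps))

def slope_at_zero(eps):
    """Centered difference of f_eps at x = 0 with step eps/10, inside (-eps, eps)."""
    h = eps / 10.0
    return (f_eps(h, eps) - f_eps(-h, eps)) / (2.0 * h)

eps_values = [1e-1, 1e-3, 1e-6]

# Differentiate first, then shrink eps: the slope at 0 stays equal to 1.
slopes = [slope_at_zero(eps) for eps in eps_values]

# Shrink eps first: the function itself is bounded by eps, so its limit is 0
# (whose derivative is 0 everywhere, including at x = 0).
max_abs_values = [max(abs(f_eps(x, eps)) for x in (-5.0, -1.0, 0.0, 1.0, 5.0))
                  for eps in eps_values]
```

The slopes stay at 1 while the function values shrink with $\epsilon$, which is exactly the $0 \ne 1$ discrepancy at the isolated point $x = 0$.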
1.5 Too much information
A function $f(x)$ gives us a value at every point $x$, but does this really correspond to a measurable quantity in the physical universe? How would you measure the velocity at one instant in time, or the density at one point in a fluid? Of course, your measurement device could be very precise, and very small, and very fast, but in the end all you can ever measure are averages of $f(x)$ over a small region of space and/or time.
But an average is (up to normalization) the same thing as an integral $\int f(x)$ over the averaging region. More generally, instead of averaging $f(x)$ uniformly in some region, we could average with some weights $\phi(x)$ (e.g. our device could be more sensitive to some points than others). Thus, the only physical question we can ever ask about a function is the value of an integral
$$\int f(x)\phi(x)\,dx$$
of $f(x)$ against a test function $\phi(x)$.
But if all we can ever ask is such an integral, why are we worrying about isolated points? In fact, why do we even define $f(x)$ to have values at points at all? In a physical application of mathematics, perhaps we should only define things that we can measure! This insight fixes every one of the problems above, and leads to the concept of distributions.
2 Distributions
The old kind of function is a map from $\mathbb{R} \to \mathbb{R}$: given an $x$, we get a value $f(x)$. Following section 1.5, this is too much information; we can only ask for an average value given some weight function $\phi(x)$. So, we make a new definition of "function" that provides this information, and only this information:
• $f$ is a rule that, given any test function $\phi(x)$, returns a number $f\{\phi\}$.[2]
This new definition of a "function" is called a distribution or a generalized function. We are no longer allowed to ask the value at a point $x$. This will fix all of the problems with the old functions from above. However, we should be more precise about our definition. First, we have to specify what $\phi(x)$ can be:
• $\phi(x)$ is an ordinary function $\mathbb{R} \to \mathbb{R}$ (not a distribution) in some set $\mathcal{D}$. We require $\phi(x)$ to be infinitely differentiable. We also require $\phi(x)$ to be nonzero only in some finite region (the "support" of $\phi$): $\phi$ is a smooth "bump" function.[3]

[2] Many authors use the notation $\langle f, \phi \rangle$ instead of $f\{\phi\}$, but we are already using $\langle \cdot, \cdot \rangle$ for inner products and I don't want to confuse matters.
[3] Alternatively, one sometimes loosens this requirement to merely say that $\phi(x)$ must $\to 0$ quickly as $|x| \to \infty$, and in particular that $x^n \phi(x) \to 0$ as $|x| \to \infty$ for all integers $n \ge 0$.
[The generalization to functions $\phi(\mathbf{x})$ for $\mathbf{x} \in \mathbb{R}^d$, with the distributions corresponding to $d$-dimensional integrals, is very straightforward, but we will stick to $d = 1$ for simplicity.] Second,[4] we require that $f\{\phi\}$ act like integration in that it must be linear:
• $f\{\alpha\phi_1 + \beta\phi_2\} = \alpha f\{\phi_1\} + \beta f\{\phi_2\}$ for any numbers $\alpha, \beta \in \mathbb{R}$ and any $\phi_1, \phi_2 \in \mathcal{D}$.
Thus, $f$ is a linear map from $\mathcal{D} \to \mathbb{R}$, and the set of all distributions for a given set of test functions $\mathcal{D}$ is sometimes denoted $\mathcal{D}'$. There are two classes of distributions: regular and singular distributions.
2.1 Regular distributions from ordinary functions f(x)
The most obvious way to define a distribution $f\{\phi\}$ is simply an ordinary integral of an ordinary function: given an ordinary function $f(x)$, we can define the distribution:
$$f\{\phi\} = \int f(x)\phi(x)\,dx.$$
This is called a regular distribution.
Not all ordinary functions $f(x)$ define regular distributions: we must have $\int f\phi$ finite for all test functions $\phi \in \mathcal{D}$. This reduces to requiring that $\int_a^b |f(x)|\,dx < \infty$ for all intervals $[a, b]$ ($f$ is "locally integrable").
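A regular distribution can be modeled directly as a "rule that eats a test function." The following Python sketch is only an illustration under stated assumptions: the test functions are the standard bump $e^{-1/(1-t^2)}$ (one possible choice of smooth, compactly supported function), the integral is approximated by a midpoint rule on a fixed interval that is assumed to contain the support of every test function used, and the names `bump` and `regular_distribution` are hypothetical.

```python
import math

def bump(x, center=0.0, half_width=1.0):
    """Smooth test function supported on (center - half_width, center + half_width)."""
    t = (x - center) / half_width
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))

def regular_distribution(f, a=-5.0, b=5.0, n=20000):
    """Return the rule phi -> integral of f(x)*phi(x) over [a, b] (midpoint rule).

    Assumes every test function passed in vanishes outside [a, b].
    """
    h = (b - a) / n
    def functional(phi):
        return h * sum(f(a + (k + 0.5) * h) * phi(a + (k + 0.5) * h)
                       for k in range(n))
    return functional

F = regular_distribution(lambda x: x * x)   # regular distribution from f(x) = x^2
phi1 = lambda x: bump(x, center=0.0)
phi2 = lambda x: bump(x, center=1.5, half_width=2.0)

# Linearity: F{alpha*phi1 + beta*phi2} = alpha*F{phi1} + beta*F{phi2}.
alpha, beta = 2.0, -3.0
lhs = F(lambda x: alpha * phi1(x) + beta * phi2(x))
rhs = alpha * F(phi1) + beta * F(phi2)
```

Note that `F` never exposes a value "at a point": the only thing one can do with it is apply it to a test function, which is the whole idea of the definition.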
2.2 Singular distributions & the delta function
Although the integral of an ordinary function is one way to define a distribution, it is not the only way. For example, the following distribution is a perfectly good one:
$$\delta\{\phi\} = \phi(0).$$
This rule is linear and continuous, there are no weird infinities, nor is there anything spooky or non-rigorous. Given a test function $\phi(x)$, the $\delta(x)$ distribution is simply the rule that gives $\phi(0)$ from each $\phi$. $\delta(x)$, however, does not correspond to any ordinary function (it is not a regular distribution), so we call it a singular distribution.
• Notation: when we write "$\int \delta(x)\phi(x)\,dx$", we don't mean an ordinary integral, we really mean $\delta\{\phi\} = \phi(0)$. Furthermore, when we say $\delta(x - x')$, we just mean the distribution $\delta(x - x')\{\phi\} = \phi(x')$.

[4] There is also a third requirement: $f\{\phi\}$ must be continuous, in that if you change $\phi(x)$ continuously the value of $f\{\phi\}$ must change continuously. In practice, you will never violate this condition unless you are trying to.
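If we look at the "finite-$\Delta x$ delta" approximation $\delta_{\Delta x}(x)$ from section 1.1, that defines the regular distribution:
$$\delta_{\Delta x}\{\phi\} = \frac{1}{\Delta x}\int_{-\Delta x/2}^{\Delta x/2} \phi(x)\,dx,$$
which is just the average of $\phi(x)$ in $[-\Delta x/2, \Delta x/2]$. Now, however, viewed as a distribution, the limit $\Delta x \to 0$ is perfectly well defined:[5] $\lim_{\Delta x \to 0} \delta_{\Delta x}\{\phi\} = \phi(0) = \delta\{\phi\}$, i.e. $\delta_{\Delta x} \to \delta$.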
Of course, the $\delta$ distribution is not the only singular distribution, but it is the most famous one (and the one from which many other singular distributions are built). We will see more examples later.
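The convergence $\delta_{\Delta x} \to \delta$ in the distribution sense can be checked numerically for one particular test function. This is a minimal sketch, assuming an off-center bump as the test function and a midpoint rule for the average; the names `bump`, `delta_dx`, and `delta` are illustrative.

```python
import math

def bump(x, center=0.3, half_width=1.0):
    """Smooth test function; off-center so that it is not symmetric about x = 0."""
    t = (x - center) / half_width
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))

def delta_dx(phi, dx, n=2000):
    """delta_dx{phi}: the average of phi over [-dx/2, dx/2] (midpoint rule)."""
    h = dx / n
    return sum(phi(-dx / 2 + (k + 0.5) * h) for k in range(n)) * h / dx

def delta(phi):
    """The delta distribution: the rule phi -> phi(0)."""
    return phi(0.0)

widths = [1.0, 0.1, 0.01, 0.001]
errors = [abs(delta_dx(bump, dx) - delta(bump)) for dx in widths]
```

The entries of `errors` shrink as the box narrows, even though the box function itself has no pointwise limit at $x = 0$.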
2.3 Derivatives of distributions & differentiating discontinuities
How do we define the derivative $f'$ of a distribution? Well, at the very least we want it to be the same as the ordinary derivative when $f$ is a regular distribution $f(x)$ that happens to be differentiable in the ordinary sense. In that case, $f'\{\phi\} = \int f'(x)\phi(x)\,dx = -\int f(x)\phi'(x)\,dx = f\{-\phi'\}$, where we have integrated by parts and used the fact that $\phi(x)$ is zero outside a finite region to eliminate the boundary terms. This is such a nice result that we will use it to define the derivative of any distribution:
• The distributional derivative $f'$ of $f\{\phi\}$ is given by the distribution $f'\{\phi\} = f\{-\phi'\}$, where $\phi'(x)$ is the ordinary derivative of $\phi(x)$.
(This is sometimes also called a weak derivative.) Since the test functions $\phi(x)$ were required to be infinitely differentiable, we have a remarkable consequence: every distribution is infinitely differentiable (in the distributional sense).
For example, since $\delta\{\phi\} = \phi(0)$, it immediately follows that the derivative of a delta function is the distribution $\delta'\{\phi\} = \delta\{-\phi'\} = -\phi'(0)$.
The most important consequence of this definition is that even discontinuous functions are differentiable as distributions, and their derivatives give delta functions for each discontinuity. Consider the regular distribution $S$ defined by the step function $S(x)$:
$$S\{\phi\} = \int_{-\infty}^{\infty} S(x)\phi(x)\,dx = \int_0^{\infty} \phi(x)\,dx.$$

[5] We have used the fact that $\phi(x)$ is required to be continuous, from which one can show that nothing weird can happen with the average of $\phi(x)$ as $\Delta x \to 0$.
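It immediately follows that the distributional derivative of the step function is
$$S'\{\phi\} = S\{-\phi'\} = -\int_0^{\infty} \phi'(x)\,dx = -\phi(x)\Big|_0^{\infty} = \phi(0) - \phi(\infty) = \phi(0).$$
But this is exactly the same as $\delta\{\phi\}$, so we immediately conclude: $S' = \delta$.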
Since any function with jump discontinuities can be written in terms of $S(x)$, we find that the derivative of any jump discontinuity gives a delta function multiplied by the magnitude of the jump. For example, consider:
$$f(x) = \begin{cases} x^2 & x < 3 \\ x^3 & x \ge 3 \end{cases} = x^2 + (x^3 - x^2)\,S(x - 3).$$
The distributional derivative works just like the ordinary derivative, except that $S' = \delta$, so[6]
$$f'(x) = 2x + (3x^2 - 2x)\,S(x - 3) + (3^3 - 3^2)\,\delta(x - 3) = 18\,\delta(x - 3) + \begin{cases} 2x & x < 3 \\ 3x^2 & x \ge 3 \end{cases},$$
where of course by $f'(x)$ I mean the distribution $f'\{\phi\}$. It is common to be casual with notation in this way for distributions, treating them like ordinary functions, but you have to remember that you can't evaluate them at any point $x$, you can only evaluate them for test functions $\phi(x)$.
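The jump rule can be verified against the definition $f'\{\phi\} = f\{-\phi'\}$ for the piecewise example with $x^2$ for $x < 3$ and $x^3$ for $x \ge 3$. The sketch below is an illustration under stated assumptions: the test function is a bump of half-width 1 centered on the jump (one convenient choice), its derivative is coded by hand from the chain rule, and integrals use a midpoint rule whose grid happens to place $x = 3$ on a cell boundary.

```python
import math

def phi(x, center=3.0):
    """Smooth bump test function supported on (center - 1, center + 1)."""
    t = x - center
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))

def dphi(x, center=3.0):
    """Ordinary derivative phi'(x) of the bump above (chain rule)."""
    t = x - center
    if abs(t) >= 1.0:
        return 0.0
    return -2.0 * t / (1.0 - t * t) ** 2 * phi(x, center)

def f(x):
    return x ** 2 if x < 3 else x ** 3

def f_classical_derivative(x):
    return 2 * x if x < 3 else 3 * x ** 2

def integrate(g, a, b, n=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# Definition of the weak derivative: f'{phi} = f{-phi'}.
weak = -integrate(lambda x: f(x) * dphi(x), 2.0, 4.0)

# Jump rule: 18*phi(3) from the delta function, plus the ordinary derivative elsewhere.
smooth_part = integrate(lambda x: f_classical_derivative(x) * phi(x), 2.0, 4.0)
claimed = 18.0 * phi(3.0) + smooth_part
```

Here `weak` and `claimed` agree to within quadrature error, while `smooth_part` alone falls short by $18\phi(3)$: dropping the delta-function term gives the wrong answer.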
2.4 Isolated points
With ordinary functions, we had to make lots of caveats about isolated points. No more with distributions. The key point is that two different ordinary functions can define the same distribution. Consider, for example, the function
$$f(x) = \begin{cases} 1 & x = 0 \\ 0 & \text{otherwise} \end{cases}.$$
This is not a delta function: it is finite at $x = 0$, and is a perfectly acceptable function. It also defines a regular distribution:
$$f\{\phi\} = \int f(x)\phi(x)\,dx = 0.$$
[6] I'm being a bit glib here. How do we know that the product rule works the same? Below, we will rigorously define what it means to multiply a distribution by a smooth function like $x^3 - x^2$, and the ordinary product rule will follow.
Think about it: no matter what $\phi(x)$ is, the integral must give zero because it is only nonzero (by a finite amount) at a single point, with zero area ("zero measure" for the pure-math folks). Thus, in the distribution sense, we can say perfectly rigorously that
$$f = 0$$
even though $f(x) \ne 0$ in the ordinary-function sense!
In general, any two ordinary functions that only differ (by finite amounts, not delta functions!) at isolated points (a "set of measure zero") define the same regular distribution. We no longer have to make caveats about isolated points: finite values at isolated points make no difference to a distribution. For example, there are no more caveats about the Fourier series or Fourier transforms: they converge, period, for distributions.[7]
Also, there is no more quibbling about the value of things like $S(x)$ right at the point of discontinuities. It doesn't matter, for distributions. Nor is there quibbling about the derivative of things like $|x|$ right at the point of the slope discontinuity. It is an easy matter to show that the distributional derivative of $|x|$ is simply $2S(x) - 1$, i.e. it is the regular distribution corresponding to the function that is $+1$ for $x > 0$ and $-1$ for $x < 0$ (with the value at $x = 0$ being rigorously irrelevant).
2.5 Interchanging limits and derivatives
With a distribution, limits and (distributional) derivatives can always be interchanged. This is tremendously useful when talking about PDEs and convergence of approximations. In the distribution sense, the Fourier series can always be differentiated term-by-term, for example.
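This is easy to prove. Suppose that the distributions $f_n \to f$ as $n \to \infty$. That is, for any $\phi(x)$, $f_n\{\phi\} \to f\{\phi\}$. Since this is true for any $\phi(x)$, it must be true for $-\phi'(x)$, and hence $f'_n\{\phi\} = f_n\{-\phi'\} \to f\{-\phi'\} = f'\{\phi\}$ as $n \to \infty$. Q.E.D.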
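The clamped ramp from section 1.4, equal to $x$ on $[-\epsilon, \epsilon]$ and constant outside, gives a concrete instance of this result: its pointwise derivative at $x = 0$ is stuck at 1, but its distributional derivative does converge to the derivative of the limit (zero). The following Python sketch is an illustration only; the off-center bump test function, the midpoint rule, and the helper names are assumptions.

```python
import math

CENTER = 0.3  # off-center bump, so that phi(0) and phi'(0) are both nonzero

def phi(x):
    """Smooth bump test function supported on (CENTER - 1, CENTER + 1)."""
    t = x - CENTER
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))

def dphi(x):
    """Ordinary derivative phi'(x) (chain rule)."""
    t = x - CENTER
    if abs(t) >= 1.0:
        return 0.0
    return -2.0 * t / (1.0 - t * t) ** 2 * phi(x)

def f_eps(x, eps):
    """Clamp x to [-eps, eps]."""
    return max(-eps, min(x, eps))

def integrate(g, a, b, n=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def weak_derivative_of_f_eps(eps):
    """f_eps'{phi} = f_eps{-phi'} = -integral of f_eps(x) * phi'(x)."""
    return -integrate(lambda x: f_eps(x, eps) * dphi(x), CENTER - 1.0, CENTER + 1.0)

values = [weak_derivative_of_f_eps(eps) for eps in (1e-1, 1e-2, 1e-3)]
```

Each value is approximately $2\epsilon\,\phi(0)$, the integral of $\phi$ over the width-$2\epsilon$ interval where the slope is 1, so the sequence tends to 0 as $\epsilon \to 0$.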
3 Rules for distributions
For the most part, in 18.303, we will cheat a bit. We will treat things as ordinary functions whenever we can, using the ordinary operations, and only switch to interpreting them as distributions when we run into difficulty (e.g. derivatives of discontinuities, etcetera). Since the rules for distribution operations are all defined to be consistent with those for functions in the case of regular distributions, this doesn't usually cause any trouble.

[7] Technically, we have to choose distributions with the right set of test functions. The right test functions for the general Fourier transform on the real line are those for which $x^n \phi(x) \to 0$ as $|x| \to \infty$ for any $n > 0$, i.e. $\phi(x)$ vanishes faster than any inverse polynomial. The resulting distributions are called tempered distributions, and are the domain of the Fourier transform.
However, it is good to define some of the important operations on distributions precisely. All you have to do is to explain what the operation does to test functions, usually defined by analogy with $\int f(x)\phi(x)\,dx$ for regular distributions. Here are a few of the most basic operations:
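• differentiation: $f'\{\phi\} = f\{-\phi'\}$
• addition: $(f_1 + f_2)\{\phi\} = f_1\{\phi\} + f_2\{\phi\}$
• multiplication by smooth functions (including constants) $s(x)$: $[s(x) \cdot f]\{\phi\} = f\{s(x)\phi(x)\}$
• product rule for multiplication by smooth functions $s(x)$: $[s \cdot f]'\{\phi\} = [s \cdot f]\{-\phi'\} = f\{-s\phi'\} = f\{s'\phi - (s\phi)'\} = f\{s'\phi\} + f'\{s\phi\} = [s' \cdot f]\{\phi\} + [s \cdot f']\{\phi\} = [s' \cdot f + s \cdot f']\{\phi\}$.
• translation: $[f(x - y)]\{\phi(x)\} = f\{\phi(x + y)\}$
• scaling: $[f(\alpha x)]\{\phi(x)\} = \frac{1}{|\alpha|} f\{\phi(x/\alpha)\}$ for real $\alpha \ne 0$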
If you are not sure where these rules come from, just try plugging them into a regular distribution $f\{\phi\} = \int f(x)\phi(x)\,dx$, and you'll see that they work out in the ordinary way.
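"Plugging into a regular distribution" can also be done numerically. The sketch below checks the translation rule and the scaling rule (with the factor $1/|\alpha|$, which is what the change of variables gives for real $\alpha \ne 0$) for one arbitrary smooth function. It is an illustration under assumptions: $f = e^x$, a unit bump as the test function, a midpoint rule on $[-3, 3]$ (wide enough to contain every support involved), and hypothetical helper names.

```python
import math

def phi(x):
    """Smooth bump test function supported on (-1, 1)."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def integrate(g, a=-3.0, b=3.0, n=6000):
    """Composite midpoint rule; [a, b] must contain the support of the integrand."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

f = math.exp            # an arbitrary smooth (hence locally integrable) function
y, alpha = 0.5, -2.0    # shift and (negative) scale factor

# translation: [f(x - y)]{phi(x)} = f{phi(x + y)}
translation_lhs = integrate(lambda x: f(x - y) * phi(x))
translation_rhs = integrate(lambda x: f(x) * phi(x + y))

# scaling: [f(alpha x)]{phi(x)} = (1/|alpha|) f{phi(x/alpha)}
scaling_lhs = integrate(lambda x: f(alpha * x) * phi(x))
scaling_rhs = integrate(lambda x: f(x) * phi(x / alpha)) / abs(alpha)
```

The negative `alpha` is deliberate: with $1/\alpha$ in place of $1/|\alpha|$ the right-hand side would come out with the wrong sign.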
4 Problems with distributions
Unfortunately, distributions are not a free lunch; they come with their own headaches. There are two major difficulties, one of which is surmountable and the other is not:
• Boundary conditions: since distributions do not have values at individual points, it is not so easy to impose boundary conditions on the solutions if they are viewed as distributions. What does it mean to set $u(0) = 0$? There are ways around this, but they are a bit cumbersome, especially in more than one dimension.
• Multiplication: it is not generally meaningful to multiply distributions. The simplest example is the delta function: what would $\delta(x)^2$ be? $\delta_{\Delta x}(x)^2$ is okay, but its limit as $\Delta x \to 0$ does not exist even as a distribution (the amplitude goes as $1/\Delta x^2$ while the width goes as $\Delta x$, so the integral goes as $1/\Delta x$ and diverges).
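The divergence of the squared box function can be seen by pairing it with a test function. This is a minimal Python sketch, assuming a unit bump centered at 0 as the test function and a midpoint rule; `box_squared_pairing` is a hypothetical helper name.

```python
import math

def phi(x):
    """Smooth bump test function supported on (-1, 1), with phi(0) = exp(-1)."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def box_squared_pairing(dx, n=2000):
    """Integral of delta_dx(x)^2 * phi(x): height 1/dx^2 on the interval of width dx."""
    h = dx / n
    integral_of_phi = h * sum(phi(-dx / 2 + (k + 0.5) * h) for k in range(n))
    return integral_of_phi / dx ** 2

widths = [0.1, 0.01, 0.001]
pairings = [box_squared_pairing(dx) for dx in widths]

# Multiplying back by dx isolates the finite factor, which tends to phi(0).
rescaled = [dx * p for dx, p in zip(widths, pairings)]
```

The pairings grow like $\phi(0)/\Delta x$, so no distribution can serve as the limit, in contrast to the unsquared box, whose pairings converge to $\phi(0)$.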
For linear PDEs, lack of multiplication is not such a big problem, but it does mean that we need to be careful about Hilbert spaces: if we think of the solutions $u(x)$ as distributions, we have a problem because $\langle u, u \rangle$ may not be defined; the set of distributions does not form a Hilbert space. (Technically, we can make something called a rigged Hilbert space that includes distributions, but I don't want to go there.)
5 The weak form of a PDE
Suppose we have a linear PDE $\hat{A}u = f$. We want to allow $f$ to be a delta function etcetera, but we still want to talk about boundary conditions, Hilbert spaces, and so on for $u$. There is a relatively simple compromise for linear PDEs, called the weak form of the PDE or the weak solution. This concept can roughly be described as requiring $\hat{A}u = f$ only in the "weak" sense of distributions (i.e., integrated against test functions, taking distributional derivatives in the case of a $u$ with discontinuities), but requiring $u$ to be in a more restrictive class, e.g. regular distributions corresponding to functions satisfying the boundary conditions in the ordinary sense but having some continuity and differentiability so that $\langle u, u \rangle$ and $\langle u, \hat{A}u \rangle$ are finite (i.e., living in an appropriate Sobolev space). Even this definition gets rather technical very quickly, especially if you want to allow delta functions for $f$ (in which case $u$ can blow up at the point of the delta for dimensions $> 1$). Nailing down precisely what spaces of functions and operators one is dealing with is where a lot of technical difficulty and obscurity arises in functional analysis. However, for practical engineering and science applications, we can get away with being a little careless in precisely how we define the function spaces because the weird counterexamples are usually obviously unphysical. The key insight of distributions is that what we care about is weak equality, not pointwise equality, and correspondingly we only need weak derivatives (integration by parts, or technically "bilinear forms").
6 Green's functions
Now that we have distributions, Green's functions are much easier to work with. Consider, for example, the Green's function $G(x, x')$ of $\hat{A} = -\frac{\partial^2}{\partial x^2}$ on $[0, L]$ with Dirichlet (zero) boundary conditions. If $\hat{A}u = f$ is to be solved by $u(x) = \int G(x, x') f(x')\,dx'$, then we must have $\hat{A}u = \int [\hat{A}G(x, x')] f(x')\,dx' = f(x)$, which is true if
$$\hat{A}G(x, x') = \delta(x - x')$$
and the integrals are re-interpreted as evaluating a distribution.
What does this equation mean? For any $x \ne x'$ [or for any $\phi(x)$ with $\phi(x') = 0$] we must have $\hat{A}G(x, x') = 0 = -\frac{\partial^2}{\partial x^2} G(x, x')$, and this must mean that $G(x, x')$ is a straight line for $x < x'$ and $x > x'$. To satisfy the boundary conditions, this straight line must pass through zero at 0 and $L$, and hence $G(x, x')$ must look like $\alpha x$ for $x < x'$ and $\beta(x - L)$ for $x > x'$ for some constants $\alpha$ and $\beta$.


$G(x, x')$ had better be continuous at $x = x'$, or otherwise we would get a delta function from the first derivative; hence $\alpha = \beta(x' - L)/x'$. The first derivative $\frac{\partial}{\partial x} G(x, x')$ then gives $\alpha$ for $x < x'$ and $\beta$ for $x > x'$. What about the next derivative? Since $\frac{\partial}{\partial x} G(x, x')$ is discontinuous, it doesn't have an ordinary second derivative at $x = x'$, but as a distribution it is no problem: $\frac{\partial^2}{\partial x^2} G(x, x')$ is zero everywhere (the derivative of a constant) plus a delta function $\delta(x - x')$ multiplied by $\beta - \alpha$, the size of the jump. Thus, $-\frac{\partial^2}{\partial x^2} G(x, x') = \hat{A}G = (\alpha - \beta)\,\delta(x - x')$, and from above we must have $\alpha - \beta = 1$. Combined with the equation for $\alpha$ from continuity of $G$, we obtain $\beta = -x'/L$ and $\alpha = 1 - x'/L$, exactly the same as our result from class (which we got by a more laborious method).
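The resulting Green's function can be checked by using it to solve a problem with a known answer. The sketch below is an illustration, not the derivation from class: it assumes $L = 1$ and the constant right-hand side $f = 1$, for which $-u'' = 1$ with $u(0) = u(L) = 0$ has the exact solution $u(x) = x(L - x)/2$, and it approximates the integral over $x'$ with a midpoint rule. The names `green` and `solve` are hypothetical.

```python
def green(x, xp, L=1.0):
    """G(x, x') for -d^2/dx^2 on [0, L] with zero Dirichlet boundary conditions."""
    alpha = 1.0 - xp / L   # slope for x < x'
    beta = -xp / L         # slope for x > x'
    return alpha * x if x < xp else beta * (x - L)

def solve(f, x, L=1.0, n=2000):
    """u(x) = integral over x' in [0, L] of G(x, x') f(x'), by the midpoint rule."""
    h = L / n
    return h * sum(green(x, (k + 0.5) * h, L) * f((k + 0.5) * h) for k in range(n))

L = 1.0
xs = [0.1, 0.3, 0.5, 0.9]
u_numeric = [solve(lambda xp: 1.0, x, L) for x in xs]
u_exact = [x * (L - x) / 2.0 for x in xs]   # solves -u'' = 1 with u(0) = u(L) = 0

# Continuity at x = x' and a unit drop in slope (alpha - beta = 1) for x' = 0.4.
xp = 0.4
continuity_gap = abs(green(xp - 1e-9, xp) - green(xp + 1e-9, xp))
slope_drop = (1.0 - xp / L) - (-xp / L)
```

The numerical and exact solutions agree to within rounding here because the integrand is piecewise linear in $x'$ and the kink at $x' = x$ falls on a cell boundary for the sample points chosen.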
Further reading
• I. M. Gelfand and G. E. Shilov, Generalized Functions, Volume I: Properties and Operations (New York: Academic Press, 1964). [Out of print, but still my favorite book on the subject.]
• Robert S. Strichartz, A Guide to Distribution Theory and Fourier Transforms (Singapore: World Scientific, 1994).
• Jesper Lützen, The Prehistory of the Theory of Distributions (New York: Springer, 1982). [A fascinating book describing the painful historical process that led up to distribution theory.]
Distributions in a nutshell
• Delta functions are okay. You can employ their informal description without guilt because there is a rigorous definition to fall back on in case of doubt.
  - An "integral" $\int_{-\infty}^{\infty} \delta(x - x')\phi(x)\,dx$ just means $\phi(x')$. Integrals over finite domains $\int_\Omega \delta(x - x')\phi(x)$ give $\phi(x')$ if $x'$ is in the interior of $\Omega$ and 0 if $x'$ is outside $\Omega$, but are undefined (or at least, more care is required) if $x'$ is on the boundary $\partial\Omega$.
• When in doubt about how to compute $f'(x)$, integrate by parts against a test function to see what $\int f'(x)\phi(x) = -\int f(x)\phi'(x)$ does (the "weak" or "distributional" derivative).
  - A derivative of a discontinuity at $x'$ gives $\delta(x - x')$ multiplied by the size of the discontinuity [the difference $f(x'^+) - f(x'^-)$], plus the ordinary derivative everywhere else.
  - This also applies to differentiating functions like $1/\sqrt{x}$ that have finite integrals but whose classical derivatives have divergent integrals; applying the weak derivative instead produces a well-defined distribution. [For example, this procedure famously yields $\nabla^2 \frac{1}{r} = -4\pi\,\delta(\mathbf{x})$ in 3d.]
• All that matters in the distribution (weak) sense is the integral of a function times a smooth, localized test function $\phi(x)$. Anything that doesn't change such integrals $\int f(x)\phi(x)$, like finite values of $f(x)$ at isolated points, doesn't matter. (That is, whenever we use "=" for functions we are almost always talking about weak equality.)
• Interchanging limits and derivatives is okay in the distribution sense. Differentiating Fourier series (and other expansions in infinite bases) term-by-term is okay.
• In practice, we only ever need to solve PDEs in the distribution sense (a "weak" solution): integrating the left- and right-hand sides against any test functions must give the same number, with all derivatives taken in the weak sense.