Lev Kantorovich
Mathematics for Natural Scientists II
Advanced Methods
Undergraduate Lecture Notes in Physics
Series editors
Neil Ashby
Professor Emeritus, University of Colorado Boulder, CO, USA
William Brantley
Professor, Furman University, Greenville, SC, USA
Matthew Deady
Professor, Bard College, Annandale, NY, USA
Michael Fowler
Professor, University of Virginia, Charlottesville, VA, USA
Morten Hjorth-Jensen
Professor, University of Oslo, Norway
Michael Inglis
Professor, SUNY Suffolk County Community College, Selden, NY, USA
Heinz Klose
Professor Emeritus, Humboldt University Berlin, Germany
Helmy Sherif
Professor, University of Alberta, Edmonton, AB, Canada
Lev Kantorovich
Physics Department
School of Natural and Mathematical Sciences
King’s College London, The Strand
London, UK
This is the second volume of the course of mathematics for natural scientists. It is loosely based on the mathematics course for second-year physics students at King's College London that I have been teaching for more than ten years. It follows the spirit of the first volume [1] by continuing a gradual build-up of the mathematical knowledge necessary for physics students, although not exclusively for them.
This volume covers more advanced material, beginning with two essential components: linear algebra (Chap. 1) and the theory of functions of a complex variable (Chap. 2). These techniques are heavily used in the chapters that follow. Fourier series are considered in Chap. 3; special functions of mathematical physics (the Dirac delta function, the gamma and beta functions, a detailed treatment of orthogonal polynomials, the hypergeometric differential equation, spherical and Bessel functions) in Chap. 4; and then the Fourier (Chap. 5) and Laplace (Chap. 6) transforms. In Chap. 7 a detailed treatment of curvilinear coordinates is given, including the corresponding differential calculus. This is essential, as many physical problems possess symmetry, and using appropriate curvilinear coordinates that take the symmetry of the problem at hand into account may significantly simplify the solution of the corresponding partial differential equations (studied in Chap. 8). The book concludes with variational calculus in Chap. 9.
As in the first volume, I have tried to introduce new concepts gradually and as clearly as possible, giving examples and problems to illustrate the material. Throughout the text, all the proofs necessary to understand and appreciate the mathematics involved are also given. In most cases the proofs would satisfy the most demanding physicist, or even a mathematician; only in a few cases have I had to sacrifice "strict mathematical rigour" by presenting somewhat simplified derivations and/or proofs.
As in the first volume, many problems are given throughout the text. These are designed mainly to illustrate the theoretical material, and the reader should complete them in order to be in a position to move forward. In addition, other problems are offered for practice, although I must admit their number could have been larger. For more problems, the reader is advised to consult other texts, e.g. the books [2–6].
Famous Scientists Mentioned in the Book
Throughout the book, various people, both mathematicians and physicists, who are remembered for their outstanding contributions to the development of science will be mentioned. For the reader's convenience, their names (together with some information borrowed from their Wikipedia pages) are listed here in the order in which they first appear in the text:
Leopold Kronecker (1823–1891) was a German mathematician.
Georg Friedrich Bernhard Riemann (1826–1866) was an influential German
mathematician who made lasting and revolutionary contributions to analysis,
number theory and differential geometry.
Jørgen Pedersen Gram (1850–1916) was a Danish actuary and mathematician.
Erhard Schmidt (1876–1959) was an Estonian-German mathematician.
Adrien-Marie Legendre (1752–1833) was a French mathematician.
Edmond Nicolas Laguerre (1834–1886) was a French mathematician.
Charles Hermite (1822–1901) was a French mathematician.
Pafnuty Lvovich Chebyshev (1821–1894) was a Russian mathematician.
Albert Einstein (1879–1955) was a German-born theoretical physicist who developed the general theory of relativity and also contributed to many other areas of physics. He received the 1921 Nobel Prize in Physics for his "services to theoretical physics".
Wolfgang Ernst Pauli (1900–1958) was an Austrian theoretical physicist and
one of the pioneers of quantum physics.
Jean-Baptiste Joseph Fourier (1768–1830) was a French mathematician and
physicist.
Gabriel Cramer (1704–1752) was a Swiss mathematician.
Hendrik Antoon Lorentz (1853–1928) was a Dutch physicist.
Józef Maria Hoene-Wroński (1776–1853) was a Polish Messianist philosopher,
mathematician, physicist, inventor, lawyer and economist.
Ludwig Otto Hesse (1811–1874) was a German mathematician.
Cornelius (Cornel) Lanczos (1893–1974) was a Hungarian mathematician and
physicist.
Léon Nicolas Brillouin (1889–1969) was a French physicist.
Chapter 1
Elements of Linear Algebra
We gave the definition of vectors in real one, two, three and $p$ dimensions in Sect. I.1.7.¹ Here we generalise vectors to complex $p$-dimensional spaces. This is straightforward to do: we define a vector $\mathbf{x}$ in a $p$-dimensional space by specifying $p$ (generally complex) numbers $x_1, x_2, \ldots, x_p$, called vector coordinates, which define the vector uniquely: $\mathbf{x} = \left(x_1, \ldots, x_p\right)$. Two such vectors can be added to each other, subtracted from each other or multiplied by a number $c$. In all these cases the operations are performed on the vectors' coordinates:
$$\mathbf{x} + \mathbf{y} = \mathbf{g} \quad\text{means}\quad x_i + y_i = g_i\,,\quad i = 1, 2, \ldots, p\,;$$
$$\mathbf{x} - \mathbf{y} = \mathbf{g} \quad\text{means}\quad x_i - y_i = g_i\,,\quad i = 1, 2, \ldots, p\,;$$
$$c\,\mathbf{x} = \mathbf{g} \quad\text{means}\quad c\,x_i = g_i\,,\quad i = 1, 2, \ldots, p\,.$$
The dot product of two such vectors is defined as
$$(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{p} x_i^* y_i\,, \tag{1.1}$$
which generalises the corresponding definition for real vectors. Two vectors are called orthogonal if their dot product is zero. The length of a vector $\mathbf{x}$ is defined as $|\mathbf{x}| = \sqrt{(\mathbf{x}, \mathbf{x})}$, i.e. the dot product of the vector with itself is the square of its length. Obviously,
$$(\mathbf{x}, \mathbf{x}) = x_1^* x_1 + \cdots + x_p^* x_p = |x_1|^2 + \cdots + \left|x_p\right|^2 \ge 0\,,$$
i.e. the dot product of the vector with itself is non-negative, and hence the vector length $|\mathbf{x}|$, as a square root of the dot product, is well defined.

The dot product defined above satisfies the following identity:
$$(\mathbf{x}, \mathbf{y})^* = \left(\sum_{i=1}^{p} x_i^* y_i\right)^* = \sum_{i=1}^{p} x_i y_i^* = (\mathbf{y}, \mathbf{x})\,. \tag{1.2}$$
¹ In the following, references to the first volume of this course (L. Kantorovich, Mathematics for Natural Scientists: Fundamentals and Basics, Springer, 2015) are made by prepending the Roman numeral I to the reference; e.g., Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of the first volume, respectively.
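These defining properties are easy to check numerically. The following minimal sketch (assuming Python with NumPy; note that np.vdot conjugates its first argument, matching the convention above) verifies identity (1.2) and the non-negativity of $(\mathbf{x}, \mathbf{x})$ on a pair of test vectors:

```python
# A minimal numerical check of the complex dot product (1.1) and identity (1.2),
# assuming the convention (x, y) = sum_i x_i^* y_i used above.
import numpy as np

x = np.array([1 + 2j, 0.5 - 1j, 3j])
y = np.array([2 - 1j, 1 + 1j, -1 + 0.5j])

dot_xy = np.vdot(x, y)          # np.vdot conjugates its first argument
dot_yx = np.vdot(y, x)

print(np.isclose(dot_xy, np.conj(dot_yx)))   # (x, y)* == (y, x)  -> True
print(np.vdot(x, x).real >= 0)               # (x, x) = |x|^2 >= 0 -> True
```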
$$\mathbf{x} = \left(x_1, \ldots, x_p\right) = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_p\mathbf{e}_p = \sum_{i=1}^{p} x_i\mathbf{e}_i\,,$$
with its (generally complex) coordinates $\{x_i\}$ serving as the expansion coefficients.
Next, we have to introduce the notion of linear independence of vectors. Take two vectors $\mathbf{u}$ and $\mathbf{v}$ which are known not to be proportional to each other, i.e. $\mathbf{u} \ne \lambda\mathbf{v}$ for any complex $\lambda$. A third vector $\mathbf{g}$ is said to be linearly dependent on $\mathbf{u}$ and $\mathbf{v}$ if it can be written as their linear combination:
$$\mathbf{g} = \alpha\mathbf{u} + \beta\mathbf{v}\,,$$
or, equivalently, if a non-trivial relation
$$\alpha\mathbf{u} + \beta\mathbf{v} + \gamma\mathbf{g} = \mathbf{0}$$
exists between the three vectors.
More generally, a set of $n$ vectors is linearly independent if the equality
$$\alpha_1\mathbf{x}_1 + \alpha_2\mathbf{x}_2 + \cdots + \alpha_n\mathbf{x}_n = \mathbf{0}$$
is only possible when all the coefficients are equal to zero at the same time: $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$. We shall see in the following that in a $p$-dimensional space the maximum number of linearly independent vectors cannot be larger than $p$.
Example 1.1. ► Consider the vectors $\mathbf{A} = \mathbf{i} + \mathbf{j} = (1, 1, 0)$ and $\mathbf{B} = \mathbf{i} + \mathbf{k} = (1, 0, 1)$. We shall show that the vector $\mathbf{C} = (2, 1, 1)$ is linearly dependent on $\mathbf{A}$ and $\mathbf{B}$. Indeed, trying a linear combination, $\mathbf{C} = \alpha\mathbf{A} + \beta\mathbf{B}$, we can write the following three equations in components:
$$\begin{cases} 2 = \alpha + \beta\,, \\ 1 = \alpha\,, \\ 1 = \beta\,. \end{cases}$$
The last two equations give $\alpha = \beta = 1$, which also satisfies the first one ($2 = 1 + 1$), so that indeed $\mathbf{C} = \mathbf{A} + \mathbf{B}$. ◄
Fig. 1.1 A third vector goes beyond the plane formed by the first two vectors as it has an extra
coordinate in the direction out of the plane (an extra “degree of freedom”). The three vectors form
a 3D space (volume)
• Start from $\mathbf{u}_1$. All vectors $c_1\mathbf{u}_1$ with arbitrary numbers $c_1$ form a line (a 1D space) along $\mathbf{u}_1$.
• Take $\mathbf{u}_2$, which is not proportional to $\mathbf{u}_1$ (and hence is linearly independent of it), and form all linear combinations $c_1\mathbf{u}_1 + c_2\mathbf{u}_2$ with arbitrary numbers $c_1$ and $c_2$; these form a plane (a 2D space) containing $\mathbf{u}_1$ and $\mathbf{u}_2$.
• Similarly, linear combinations $c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + c_3\mathbf{u}_3$ would cover a 3D space (a volume), see Fig. 1.1, if the vector $\mathbf{u}_3$ is chosen out of the plane spanned by $\mathbf{u}_1$ and $\mathbf{u}_2$, i.e. it is linearly independent of them.
• Continuing this procedure, the whole $p$-dimensional space is built up from the linear combinations
$$\mathbf{x} = c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p = \sum_{i=1}^{p} c_i\mathbf{u}_i\,. \tag{1.3}$$
As we shall see below in Sect. 1.1.3, it is always possible to form linear combinations of any $n$ linearly independent vectors such that the constructed vectors are orthonormal.
In order to find the coefficients $c_i$ for a given $\mathbf{x}$, let us calculate the dot product of both sides of Eq. (1.3) with $\mathbf{u}_j$, assuming the basis vectors are orthonormal, $(\mathbf{u}_j, \mathbf{u}_k) = \delta_{kj}$:
$$(\mathbf{u}_j, \mathbf{x}) = \sum_{k=1}^{p} c_k(\mathbf{u}_j, \mathbf{u}_k) = \sum_{k=1}^{p} c_k\delta_{kj} = c_j\,,$$
since only one term in the sum survives, namely the one with the summation index $k = j$:
$$\sum_{k=1}^{p} c_k\delta_{kj} = \underbrace{c_1\delta_{1j}}_{=0} + \underbrace{c_2\delta_{2j}}_{=0} + \cdots + \underbrace{c_{j-1}\delta_{j-1,j}}_{=0} + \underbrace{c_j\delta_{jj}}_{\neq 0} + \underbrace{c_{j+1}\delta_{j+1,j}}_{=0} + \cdots = c_j\,.$$
This proves that for each x the coefficients ci are uniquely determined, so that the
appropriate (and unique) linear combination (1.3) is obtained.
Since any $\mathbf{x}$ from the $p$-dimensional space can be uniquely specified by a linear combination of the vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_p$, the latter set is said to be complete. Note that the choice of the vectors of the basis set $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_p$ is not unique; any set of $p$ linearly independent vectors will do. Of course, in the new basis the expansion coefficients will be different.
Example 1.3. ► Prove that the vectors $\mathbf{u}_1 = (1, 0, 0, 0)$, $\mathbf{u}_2 = (0, 1, 1, 0)$, $\mathbf{u}_3 = (1, 0, 1, 0)$ and $\mathbf{u}_4 = (0, 0, 0, 1)$ are linearly independent.

Solution. Indeed, we have to solve the vector equation $\alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \alpha_4\mathbf{u}_4 = \mathbf{0}$, which, if written in components, results in four algebraic equations with respect to the four coefficients $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$:
$$\begin{cases} \alpha_1\cdot 1 + \alpha_2\cdot 0 + \alpha_3\cdot 1 + \alpha_4\cdot 0 = 0 \\ \alpha_1\cdot 0 + \alpha_2\cdot 1 + \alpha_3\cdot 0 + \alpha_4\cdot 0 = 0 \\ \alpha_1\cdot 0 + \alpha_2\cdot 1 + \alpha_3\cdot 1 + \alpha_4\cdot 0 = 0 \\ \alpha_1\cdot 0 + \alpha_2\cdot 0 + \alpha_3\cdot 0 + \alpha_4\cdot 1 = 0 \end{cases} \;\Longrightarrow\; \begin{cases} \alpha_1 + \alpha_3 = 0 \\ \alpha_2 = 0 \\ \alpha_2 + \alpha_3 = 0 \\ \alpha_4 = 0 \end{cases}$$
whose only solution is $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0$, proving the linear independence. ◄
Example 1.4. ► Expand the vector $\mathbf{x} = (1, -1, 2, 3)$ in terms of the basis vectors $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$ and $\mathbf{u}_4$ of the previous example:
$$\mathbf{x} = c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + c_3\mathbf{u}_3 + c_4\mathbf{u}_4\,. \tag{1.5}$$
Multiplying (in the sense of the dot product) both sides of this equation by $\mathbf{u}_1$, we obtain an algebraic equation; repeating this process with $\mathbf{u}_2$, $\mathbf{u}_3$ and $\mathbf{u}_4$, we obtain three more equations for the unknown coefficients, i.e. the required four equations:
$$\begin{cases} 1 = c_1\cdot 1 + c_2\cdot 0 + c_3\cdot 1 + c_4\cdot 0 \\ 1 = c_1\cdot 0 + c_2\cdot 2 + c_3\cdot 1 + c_4\cdot 0 \\ 3 = c_1\cdot 1 + c_2\cdot 1 + c_3\cdot 2 + c_4\cdot 0 \\ 3 = c_1\cdot 0 + c_2\cdot 0 + c_3\cdot 0 + c_4\cdot 1 \end{cases}$$
which are easily solved to give $c_1 = -2$, $c_2 = -1$, $c_3 = 3$ and $c_4 = 3$, i.e. the required expansion is
$$\mathbf{x} = -2\mathbf{u}_1 - \mathbf{u}_2 + 3\mathbf{u}_3 + 3\mathbf{u}_4\,. \;\blacktriangleleft$$
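Writing Eq. (1.5) in components, as Problem 1.2 below suggests, amounts to solving a small linear system; a minimal sketch in Python/NumPy (an illustration of the numerical route, not the book's method) reproduces the coefficients found above:

```python
# A sketch (assuming Python/NumPy) of the expansion in Example 1.4: writing
# Eq. (1.5) in components gives a linear system whose columns are the basis vectors.
import numpy as np

u = np.array([[1, 0, 0, 0],
              [0, 1, 1, 0],
              [1, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # rows: u1, u2, u3, u4
x = np.array([1, -1, 2, 3], dtype=float)

c = np.linalg.solve(u.T, x)    # columns of u.T are the basis vectors
print(c)                       # [-2. -1.  3.  3.]
```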
Problem 1.2. Alternatively, solve this problem by writing Eq. (1.5) in components. This also gives a system of four linear algebraic equations for the unknown coefficients $c_1$, $c_2$, $c_3$ and $c_4$. This method is simpler than the one used above.

Problem 1.3. Prove the linear independence of the vectors $\mathbf{u}_1 = (1, 1, 0)$, $\mathbf{u}_2 = (1, 0, 1)$ and $\mathbf{u}_3 = (0, 1, 1)$.

Problem 1.4. Expand the vector $\mathbf{u} = (1, 1, 1)$ in terms of $\mathbf{u}_1 = (1, 0, 1)$, $\mathbf{u}_2 = (0, 1, 1)$ and $\mathbf{u}_3 = (1, 1, 0)$. [Answer: $\mathbf{u} = \mathbf{u}_1/2 + (3/2)\mathbf{u}_2 - \mathbf{u}_3/2$.]

Problem 1.5. Check the linear dependence of the vectors $\mathbf{u}_1 = (1, 1, 0)$, $\mathbf{u}_2 = (1, 0, 1)$ and $\mathbf{u}_3 = (0, 1, 1)$. [Answer: linearly dependent.]

Problem 1.6. Find the expansion coefficients $c_1$, $c_2$ and $c_3$ of the vector $\mathbf{u} = (1, 2, 3)$ in terms of the basis vectors $\mathbf{u}_1 = (1, 1, 0)$, $\mathbf{u}_2 = (1, 0, 1)$ and $\mathbf{u}_3 = (1, 1, 1)$. [Answer: $c_1 = -2$, $c_2 = -1$ and $c_3 = 4$.]

Problem 1.7. Check the linear independence of the vectors $\mathbf{u}_1 = (1, 1, 1, 1)$, $\mathbf{u}_2 = (1, -1, 1, -1)$, $\mathbf{u}_3 = (1, 1, -1, -1)$ and $\mathbf{u}_4 = (1, -1, -1, 1)$. [Answer: the vectors are linearly independent.]
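A convenient practical test of linear independence packs the vectors into a matrix and checks its rank; the sketch below (Python/NumPy, using the sign pattern assumed for Problem 1.7 above, since the original signs are not legible in this excerpt) illustrates the idea:

```python
# A hedged illustration (Python/NumPy): vectors are linearly independent exactly
# when the matrix holding them as rows has full rank (cf. Problem 1.7).
import numpy as np

u = np.array([[1,  1,  1,  1],
              [1, -1,  1, -1],
              [1,  1, -1, -1],
              [1, -1, -1,  1]], dtype=float)

print(np.linalg.matrix_rank(u) == len(u))   # True -> linearly independent
```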
There is a close analogy between functions of a single variable and vectors, which is frequently exploited. Indeed, consider a generally complex function $f(x)$ of a real variable defined on the interval $a \le x \le b$. We assume that the function is integrable on this interval. We divide the interval into $N$ equidistant subintervals of length $\Delta_N = (b - a)/N$ using the division points $x_0 = a$, $x_1 = a + \Delta_N$, $x_2 = a + 2\Delta_N$, etc., and $x_N = a + N\Delta_N = b$, as shown in Fig. 1.2; generally, $x_i = a + i\Delta_N$. The values $\left(f(x_0), f(x_1), \ldots, f(x_N)\right) = \left(f_0, f_1, \ldots, f_N\right)$ of the function $f(x)$ at the $N+1$ division points form a vector $\mathbf{f} = \left(f_0, f_1, \ldots, f_N\right)$ of dimension $N + 1$. Similarly, we can form vectors $\mathbf{g}$, $\mathbf{h}$, etc., of the same dimension from other functions $g(x)$, $h(x)$, etc.

Then, similarly to vectors, we can sum and subtract the vectors formed from functions, as well as multiply them by a number. All these operations give the values of the new functions thus obtained at the division points. One can also consider
the dot product of two vectors $\mathbf{f}$ and $\mathbf{g}$, corresponding to the functions $f(x)$ and $g(x)$, respectively, defined in the usual way as
$$(\mathbf{f}, \mathbf{g})_N = \sum_{i=0}^{N} f(x_i)^*\, g(x_i)\,,$$
where we indicated explicitly in our notation that the dot product is based on the division of the interval $a \le x \le b$ into $N$ subintervals. The sum above diverges in the $N \to \infty$ limit. However, if we multiply the sum by the division interval $\Delta_N = \Delta x$, it corresponds to a Riemann integral sum (Sect. I.4.1), which in the limit becomes the definite integral of the product $f(x)^* g(x)$ between $a$ and $b$, and this may converge. Therefore, we define the dot product of two functions as the limit:
$$(f(x), g(x)) = \lim_{N\to\infty}(\mathbf{f}, \mathbf{g})_N\,\Delta_N = \lim_{N\to\infty}\left[\sum_{i=0}^{N} f(x_i)^*\, g(x_i)\right]\Delta_N = \int_a^b f(x)^*\, g(x)\,dx\,. \tag{1.6}$$
In fact, it is convenient to generalise this definition a little by introducing a so-called weight function $w(x) > 0$:
$$(f(x), g(x)) = \int_a^b w(x)\, f(x)^*\, g(x)\,dx\,. \tag{1.7}$$
This integral, which is closest to the dot product of vectors when $w(x) = 1$, is often called an overlap integral, since its value depends crucially on whether or not the two functions overlap within the interval: the integral is non-zero only if there is a subinterval on which both functions are non-zero (i.e. they overlap there). If the overlap integral is equal to zero, the two functions are said to be orthogonal. The overlap integral of a function with itself defines the "length" of the function on the interval (also called its norm):
$$(f, f) = \int_a^b w(x)\,|f(x)|^2\,dx\,.$$
Assuming $f(x)$ is continuous, the norm is not equal to zero if and only if the function $f(x) \ne 0$ on at least one continuous subinterval of finite length inside the original interval $a \le x \le b$. If $(f, f) = 0$, then $|f(x)| = 0$ at all points $x$ within our interval.
Therefore, the norm $(f, f)$ characterises how strongly the function $f(x)$ differs from zero, while the dot product $(f, g)$ indicates whether the two functions $f(x)$ and $g(x)$ have an appreciable overlap within the interval. If the functions of a set $f_1(x)$, $f_2(x)$, etc., satisfy $\left(f_i, f_j\right) = \delta_{ij}$, they are called orthonormal: they are orthogonal for $i \ne j$ (i.e. when the functions are different), and each function has norm equal to one (the case $i = j$).
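The limiting construction (1.6) can be mimicked directly on a computer: sample the two functions at the division points, form the sum and multiply by the subinterval width. A small sketch (Python/NumPy, unit weight; the test functions are my own choice):

```python
# A small numerical sketch of the limit (1.6): the Riemann sum of f*(x) g(x)
# times the subinterval width approaches the overlap integral (unit weight assumed).
import numpy as np

a, b, N = 0.0, 1.0, 100000
x = np.linspace(a, b, N + 1)
f = np.sin(np.pi * x)          # two real test functions on [0, 1]
g = x * (1 - x)

overlap = np.sum(np.conj(f) * g) * (b - a) / N
print(overlap)                 # close to the exact integral 4/pi^3 ~ 0.1290
```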
Similarly to vectors, it is also possible to consider the linear independence of functions. Functions $f_1(x), f_2(x), \ldots, f_k(x)$ are said to be linearly independent if none of them can be expressed as a linear combination of the others. In other words, the equation
$$\alpha_1 f_1(x) + \alpha_2 f_2(x) + \cdots + \alpha_k f_k(x) = \sum_{i=1}^{k} \alpha_i f_i(x) = 0 \tag{1.8}$$
is valid for all $x$ from the specified interval if and only if there is only the unique trivial choice of the coefficients, $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$. This definition makes perfect sense: indeed, if a function is linearly dependent on some other functions, then one should be able to write this function as their linear combination with non-zero coefficients. For instance, $f(x) = 2x + 5x^2 - 1$ is linearly dependent on $f_0(x) = 1$, $f_1(x) = x$ and $f_2(x) = x^2$, since $f(x) = -f_0(x) + 2f_1(x) + 5f_2(x)$, and hence $f(x) + f_0(x) - 2f_1(x) - 5f_2(x) = 0$ for any $x$. It is seen that in this case the coefficients $\{\alpha_i\}$ are not zero, but equal to some real values. If the linear combination (1.8) can only be satisfied with all coefficients equal to zero, the functions are indeed linearly independent. This definition of linear independence is exactly equivalent to that for vectors.
Example 1.5. ► As an example, let us prove, assuming the unit weight function, that the functions $f_1 = 1$, $f_2 = x$ and $f_3 = x^2$ are linearly independent on the interval $0 \le x \le 1$. Indeed, we need to solve the equation
$$\alpha_1 + \alpha_2 x + \alpha_3 x^2 = 0\,. \tag{1.9}$$
Multiplying both sides of it by each of the three functions in turn and integrating between 0 and 1 (i.e. taking the dot product with each of them), we obtain three equations:
$$\alpha_1 + \frac{1}{2}\alpha_2 + \frac{1}{3}\alpha_3 = 0\,,\qquad \frac{1}{2}\alpha_1 + \frac{1}{3}\alpha_2 + \frac{1}{4}\alpha_3 = 0\,,\qquad \frac{1}{3}\alpha_1 + \frac{1}{4}\alpha_2 + \frac{1}{5}\alpha_3 = 0\,.$$
These three linear equations are solved, e.g. by substitution, and yield the unique trivial (zero) solution $\alpha_1 = \alpha_2 = \alpha_3 = 0$, proving that the three functions are indeed linearly independent. ◄
Note that the linear independence can also be verified by simply substituting several values of $x$ into the original equation (1.9):
$$x = 0 \;\Longrightarrow\; \alpha_1 = 0\,;\qquad x = 1 \;\Longrightarrow\; \alpha_2 + \alpha_3 = 0\,;\qquad x = \frac{1}{2} \;\Longrightarrow\; \frac{1}{2}\alpha_2 + \frac{1}{4}\alpha_3 = 0\,.$$
Solving the last two equations trivially gives $\alpha_2 = \alpha_3 = 0$; this, together with $\alpha_1 = 0$ obtained from $x = 0$, gives immediately the desired result.
In the problems below the unit weight function is assumed.
Later on in Sect. 1.2.8 we shall give a simpler method for checking linear
independence of functions since it relies solely on differentiation, which can always
be done analytically; neither orthogonality nor integration is required.
Next, consider the third vector $\mathbf{v}_3 = \mathbf{u}_3 + c_1^{(3)}\mathbf{v}_1 + c_2^{(3)}\mathbf{v}_2$ with the coefficients fixed by the orthogonality to $\mathbf{v}_1$ and $\mathbf{v}_2$, $c_1^{(3)} = -(\mathbf{v}_1, \mathbf{u}_3)/(\mathbf{v}_1, \mathbf{v}_1) = \frac{1}{2}$ and $c_2^{(3)} = -(\mathbf{v}_2, \mathbf{u}_3)/(\mathbf{v}_2, \mathbf{v}_2) = \frac{1}{3}$, so that
$$\mathbf{v}_3 = \mathbf{u}_3 + \frac{1}{2}\mathbf{v}_1 + \frac{1}{3}\mathbf{v}_2 = \left(\frac{1}{3}, \frac{1}{3}, \frac{1}{3}\right)\,.$$
The constructed vectors $\mathbf{v}_1 = (1, 0, -1)$, $\mathbf{v}_2 = \left(-\frac{1}{2}, 1, -\frac{1}{2}\right)$ and $\mathbf{v}_3 = \left(\frac{1}{3}, \frac{1}{3}, \frac{1}{3}\right)$ are mutually orthogonal. To make them normalised, rescale each by its own length to finally obtain $\mathbf{v}_1 = \frac{1}{\sqrt{2}}(1, 0, -1)$, $\mathbf{v}_2 = \sqrt{\frac{2}{3}}\left(-\frac{1}{2}, 1, -\frac{1}{2}\right)$ and $\mathbf{v}_3 = \frac{1}{\sqrt{3}}(1, 1, 1)$. ◄
Eventually, the above procedure allows expressing the new vectors via the old ones explicitly:
$$\mathbf{v}_1 = \mathbf{u}_1\,,\quad \mathbf{v}_2 = d_{21}\mathbf{u}_1 + \mathbf{u}_2\,,\quad \mathbf{v}_3 = d_{31}\mathbf{u}_1 + d_{32}\mathbf{u}_2 + \mathbf{u}_3\,,\quad\text{etc.}\,,$$
where $d_{21}$, $d_{31}$, etc., are some coefficients which are calculated during the course of the procedure. For instance, in the case of the previous example (before normalisation),
$$\mathbf{v}_1 = \mathbf{u}_1\,,\quad \mathbf{v}_2 = \frac{1}{2}\mathbf{u}_1 + \mathbf{u}_2\,,\quad \mathbf{v}_3 = \frac{1}{2}\mathbf{v}_1 + \frac{1}{3}\mathbf{v}_2 + \mathbf{u}_3 = \frac{2}{3}\mathbf{u}_1 + \frac{1}{3}\mathbf{u}_2 + \mathbf{u}_3\,.$$
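A compact implementation of the orthogonalisation procedure is given below (Python/NumPy). The starting vectors are chosen to be consistent with the orthogonal vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$ obtained above; since the opening of the example is not shown in this excerpt, this particular choice of the $\mathbf{u}$'s is an assumption on my part:

```python
# A compact Gram-Schmidt sketch (Python/NumPy) following the procedure above.
import numpy as np

def gram_schmidt(vectors):
    """Return mutually orthogonal vectors spanning the same space."""
    ortho = []
    for u in vectors:
        v = u.astype(float)
        for w in ortho:
            v = v - (w @ v) / (w @ w) * w   # subtract the projection onto w
        ortho.append(v)
    return ortho

u = [np.array([1.0, 0.0, -1.0]),     # assumed u1
     np.array([-1.0, 1.0, 0.0]),     # assumed u2
     np.array([0.0, 0.0, 1.0])]      # assumed u3
v1, v2, v3 = gram_schmidt(u)
print(v1, v2, v3)    # (1,0,-1), (-1/2,1,-1/2), (1/3,1/3,1/3)
```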
We close this section by noting that the Gram–Schmidt method can also be used
to check linear dependence of a set of vectors. If, during the course of this procedure,
some of the new vectors come out to be zero, then there are linearly dependent
vectors in the original set. The newly created set of vectors will then contain a
smaller set of only linearly independent vectors.
To illustrate this point, consider, for instance, a situation in which the third vector is linearly dependent on the first two: $\mathbf{u}_3 = \alpha\mathbf{u}_1 + \beta\mathbf{u}_2$. Let us run the first steps of the procedure to see how one of the vectors is going to be eliminated. Out of $\mathbf{u}_1$ and $\mathbf{u}_2$ we construct two orthogonal vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ using the first part of the Gram–Schmidt method (i.e. before normalisation). The new vectors are some linear combinations of the old ones; conversely, the old ones are linear combinations of the new ones. Therefore, we can write that
If we would like to construct a set of orthogonal functions built as linear combinations of an original set with expansion coefficients $\alpha_{ij}$ (these bear two indices and form an object called a matrix, see the next section), then the Gram–Schmidt procedure is the simplest method to achieve this goal. It goes exactly as described above for vectors; the only difference is in what we mean by the dot product in the case of functions. The following example illustrates this point.
Example 1.7. ► Consider the first three powers of $x$ on the interval $-1 \le x \le 1$ as our original set of functions, i.e. $f_1(x) = x^0 = 1$, $f_2(x) = x$ and $f_3(x) = x^2$. Assuming the unit weight function in the definition (1.7) of the dot product of functions, we immediately conclude, by direct calculation of the appropriate integrals, that our functions are not all mutually orthogonal. Construct their linear combinations (i.e. polynomials of degree not higher than two) which are orthogonal to each other.
Solution. Following the procedure outlined above, we choose the first function as $g_1(x) = f_1(x) = 1$. The second function is
$$g_2(x) = f_2(x) + c_1^{(2)}g_1(x) = x + c_1^{(2)}\,.$$
We would like it to be orthogonal to the first one, $g_1(x)$. Taking the dot product of both sides of the above equation for $g_2(x)$ with $g_1$ and setting it to zero, we obtain
$$(g_2, g_1) = (f_2, g_1) + c_1^{(2)}(g_1, g_1) = 0\,.$$
Simple calculations show that $(f_2, g_1) = (x, 1) = \int_{-1}^{1} x\,dx = 0$ and similarly $(g_1, g_1) = (1, 1) = 2$. Hence we get $c_1^{(2)} = -(f_2, g_1)/(g_1, g_1) = 0$, i.e. $g_2(x) = x$. Similarly, consider
$$g_3(x) = f_3(x) + c_1^{(3)}g_1(x) + c_2^{(3)}g_2(x)\,,$$
which is to be made orthogonal to both $g_1(x)$ and $g_2(x)$ at the same time. Taking the dot product of both sides of the above equation for $g_3$ first with $g_1$ and then with $g_2$, and setting both to zero, we obtain
$$(g_3, g_1) = (f_3, g_1) + c_1^{(3)}(g_1, g_1) = \frac{2}{3} + 2c_1^{(3)} = 0 \;\Longrightarrow\; c_1^{(3)} = -\frac{1}{3}\,,$$
$$(g_3, g_2) = (f_3, g_2) + c_2^{(3)}(g_2, g_2) = 0 + \frac{2}{3}c_2^{(3)} = 0 \;\Longrightarrow\; c_2^{(3)} = 0\,,$$
so that $g_3(x) = x^2 - \frac{1}{3}$. Up to constant factors, the constructed functions coincide with the first three Legendre polynomials. ◄
Problem 1.14. Continuing the above procedure, obtain the next two functions from the previous example, related to Legendre polynomials. Use $f_4(x) = x^3$ and $f_5(x) = x^4$. [Answer: the new functions are $g_4 = \left(5x^3 - 3x\right)/5$ and $g_5 = \left(35x^4 - 30x^2 + 3\right)/35$.]
Problem 1.15. Assuming the weight function $w(x) = e^{-x}$ and the interval $0 \le x < \infty$, show, starting from the function $L_0(x) = 1$, that the next three orthogonal polynomials are
$$L_1(x) = x - 1\,,\quad L_2(x) = x^2 - 4x + 2\,,\quad L_3(x) = x^3 - 9x^2 + 18x - 6\,.$$
The generated functions are proportional to the Laguerre polynomials.

Problem 1.16. Assuming the weight function $w(x) = e^{-x^2}$ and the interval $-\infty < x < \infty$, show, starting from $H_0(x) = 1$, that the next three orthogonal polynomials are
$$H_1(x) = x\,,\quad H_2(x) = x^2 - \frac{1}{2}\,,\quad H_3(x) = x^3 - \frac{3}{2}x\,.$$
These are proportional to the Hermite polynomials.
Problem 1.17. Similarly, assuming the weight function $w(x) = 1/\sqrt{1 - x^2}$ and the interval $-1 \le x \le 1$, show, starting from $T_0(x) = 1$, that the next two orthogonal polynomials are
$$T_1(x) = x\quad\text{and}\quad T_2(x) = x^2 - \frac{1}{2}\,.$$
The generated functions are directly related to the Chebyshev polynomials (Sect. 4.3).
One can make a generalisation and stretch the set of numbers in the other dimension as well, forming a table of numbers:
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n}\\ \vdots & \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}\end{pmatrix} = \left(a_{ij}\right)\,, \tag{1.11}$$
All other, so-called off-diagonal, elements are equal to zero. In particular, if all diagonal elements of a diagonal matrix are equal to one, this matrix is called a unit matrix. We shall denote the unit matrix by the symbol $E$. Since the non-diagonal elements of $E$ are zeros while all its diagonal elements are equal to one, the element $e_{ij}$ of the unit matrix is in fact the Kronecker symbol: $E = \left(\delta_{ij}\right)$.
For instance, the matrix
$$A = \begin{pmatrix} 3 & 8 & 7\\ 1 & 0 & 2\end{pmatrix}$$
is a $2\times 3$ matrix, i.e. it has 2 rows and 3 columns; its elements are $a_{11} = 3$, $a_{12} = 8$, $a_{13} = 7$ (the first row) and $a_{21} = 1$, $a_{22} = 0$, $a_{23} = 2$ (the second row).
A matrix of any structure can be multiplied by a number. If $c$ is a number and $A = \left(a_{ij}\right)$, then $B = cA$ is a new matrix with elements $b_{ij} = c\,a_{ij}$, e.g.
$$3\begin{pmatrix} 1 & 3 & 2\\ 4 & 7 & 1\end{pmatrix} = \begin{pmatrix} 3 & 9 & 6\\ 12 & 21 & 3\end{pmatrix}\,.$$
From two matrices $A = \left(a_{ij}\right)$ (which is $n\times m$) and $B = \left(b_{ij}\right)$ (which is $m\times p$) a new matrix
$$C = AB = \left(c_{ij}\right)\quad\text{with}\quad c_{ij} = \sum_{k=1}^{m} a_{ik}b_{kj} \tag{1.13}$$
can be constructed, which is an $n\times p$ matrix, see Fig. 1.3. This operation is a natural generalisation of the dot product of vectors. Thus, to obtain the $(i, j)$ element $c_{ij}$ of the matrix $C$, it is necessary to calculate the dot product of the $i$-th row of $A$ with the $j$-th column of $B$. This means that not just any two matrices can be multiplied: the number of columns of $A$ must be equal to the number of rows of $B$. Note how the indices in formula (1.13) for $c_{ij}$ are written: $i$ and $j$ in the product $a_{ik}b_{kj}$ under the sum appear in exactly the same order as in $c_{ij}$, while the dummy summation index $k$ appears in between $i$ and $j$, i.e. as the right index of $a_{ik}$ and the left index of $b_{kj}$.

Fig. 1.3 Schematic of matrix multiplication. Matrix $A$ has $n = 5$ rows and $m = 9$ columns, while matrix $B$ has $p = 6$ columns and the same number of rows $m = 9$ as $A$ has columns. The resultant matrix $C = AB$ has $n = 5$ rows (as in $A$) and $p = 6$ columns (as in $B$)

In the physics literature the so-called Einstein summation convention is sometimes used, whereby the summation sign is dropped, i.e. $C = AB$ is written in elements as $c_{ij} = a_{ik}b_{kj}$, and summation over the repeated index ($k$ in this case) is implied; however, we shall not use this convention here, to avoid confusion.
As an example, consider the product of two matrices
$$A = \begin{pmatrix} 1 & 3 & 1\\ 2 & 3 & 4\end{pmatrix}\quad\text{and}\quad B = \begin{pmatrix} 1 & 4\\ 10 & 3\\ 1 & 2\end{pmatrix}\,.$$
We obtain the matrix
$$C = AB = \begin{pmatrix} 1\cdot 1 + 3\cdot 10 + 1\cdot 1 & 1\cdot 4 + 3\cdot 3 + 1\cdot 2\\ 2\cdot 1 + 3\cdot 10 + 4\cdot 1 & 2\cdot 4 + 3\cdot 3 + 4\cdot 2\end{pmatrix} = \begin{pmatrix} 32 & 15\\ 36 & 25\end{pmatrix}\,,$$
which is a $2\times 2$ matrix.
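For checking such hand calculations, the same product is a one-liner in code (a Python/NumPy sketch):

```python
# The product C = AB of the two matrices above (Python/NumPy).
import numpy as np

A = np.array([[1, 3, 1],
              [2, 3, 4]])
B = np.array([[1, 4],
              [10, 3],
              [1, 2]])
print(A @ B)   # [[32 15]
               #  [36 25]]
```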
In particular, one can multiply a matrix and a vector, which results in a vector. Indeed, if $\mathbf{x} = (x_i)$ is an $n$-dimensional vector and $A = \left(a_{ij}\right)$ is an $m\times n$ matrix, then $\mathbf{y} = A\mathbf{x}$ is an $m$-dimensional vector:
$$\mathbf{y} = A\mathbf{x} = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn}\end{pmatrix}\begin{pmatrix} x_1\\ \vdots\\ x_n\end{pmatrix} = \begin{pmatrix} a_{11}x_1 + \cdots + a_{1n}x_n\\ \vdots\\ a_{m1}x_1 + \cdots + a_{mn}x_n\end{pmatrix} = \begin{pmatrix} y_1\\ \vdots\\ y_m\end{pmatrix}\,, \tag{1.14}$$
or in components:
$$y_i = \sum_{j=1}^{n} a_{ij}x_j\,.$$
This is equivalent to the general rule (1.13) of matrix multiplication: since vectors are one-column matrices, $\mathbf{x} = (x_i)$ can be written as $X = (x_{i1})$ with $x_{i1} = x_i$, and hence $Y = AX$ in components means
$$y_{i1} = \sum_{j=1}^{n} a_{ij}x_{j1} = \sum_{j=1}^{n} a_{ij}x_j\,,$$
which is the same as above, because the elements $y_{i1}$ form a single-column matrix and therefore represent a vector $\mathbf{y}$ with components $y_i = y_{i1}$.
The matrix product is in general not commutative, i.e. $AB \ne BA$. However, some matrices may commute, e.g. diagonal matrices: if
$$A = \begin{pmatrix} a_{11} & & & \\ & a_{22} & & \\ & & \ddots & \\ & & & a_{nn}\end{pmatrix}\quad\text{and}\quad B = \begin{pmatrix} b_{11} & & & \\ & b_{22} & & \\ & & \ddots & \\ & & & b_{nn}\end{pmatrix}\,,$$
then
$$AB = \begin{pmatrix} a_{11}b_{11} & & & \\ & a_{22}b_{22} & & \\ & & \ddots & \\ & & & a_{nn}b_{nn}\end{pmatrix} = BA\,.$$
A diagonal matrix does not, however, generally commute with a non-diagonal one. Indeed, if $D = \left(d_{ii}\delta_{ij}\right)$ is diagonal and $A$ is a general square matrix, the $(i, j)$ element of the product $AD$ is
$$\sum_k a_{ik}d_{kj} = \sum_k a_{ik}d_{jj}\delta_{kj} = a_{ij}d_{jj}\,,$$
where the Kronecker delta symbol $\delta_{kj}$ cuts off all the terms in the sum except the one with $k = j$. Similarly, the $(i, j)$ element of the matrix $DA$ works out to be
$$\sum_k d_{ik}a_{kj} = \sum_k d_{kk}\delta_{ik}a_{kj} = d_{ii}a_{ij}\,,$$
which is different from $a_{ij}d_{jj}$, obtained when multiplying the matrices in the reverse order; i.e. the two matrices are not generally commutative.
At the same time, the matrix product is associative: $(AB)C = A(BC)$. The simplest proof is again algebraic, based on writing down the products of the matrices in elements. Indeed, if $A = \left(a_{ij}\right)$, $B = \left(b_{ij}\right)$ and $C = \left(c_{ij}\right)$, then $D = AB$ has elements
$$d_{ij} = \sum_k a_{ik}b_{kj}\,,$$
so that $G = (AB)C = DC$ has elements
$$g_{ij} = \sum_k d_{ik}c_{kj} = \sum_k\left(\sum_l a_{il}b_{lk}\right)c_{kj}\,.$$
Note that we introduced a new dummy index $l$ to indicate the product of elements $a_{il}b_{lk}$ in $D$, since the index $k$ has already been used in the product $d_{ik}c_{kj}$ of $DC$. Also note how the indices have been used in the product: since $d_{ik}$ has $i$ and $k$ as its left and right indices, these appear in exactly the same order in the product $a_{il}b_{lk}$, with the summation index $l$ in between, as required. The final product of the three matrices in elements becomes a double sum:
$$g_{ij} = \sum_{kl} a_{il}b_{lk}c_{kj}\,.$$
The indices $i, j$ of the element $g_{ij}$ of the final matrix $G$ appear as the first and the last indices in the product $a_{il}b_{lk}c_{kj}$ under the sums; the summation indices follow one after the other, each repeated twice.

On the other hand, the matrix $H = A(BC)$ has elements
$$h_{ij} = \sum_k a_{ik}\left(\sum_l b_{kl}c_{lj}\right) = \sum_{kl} a_{ik}b_{kl}c_{lj}\,.$$
This is the same as $g_{ij}$, since we can always interchange the dummy indices $k \leftrightarrow l$ in the double sums. This proves the required statement.
When a square matrix $A$ is multiplied from either side (from the left or from the right) by the unit matrix $E$ of the same dimension, it does not change: $AE = EA = A$. Indeed, in components,
$$(AE)_{ij} = \sum_k a_{ik}\delta_{kj} = a_{ij}\,,$$
and similarly for $EA$.

The transpose $A^T = \left(\tilde{a}_{ij}\right)$ of a matrix $A = \left(a_{ij}\right)$ is obtained by interchanging its rows and columns, $\tilde{a}_{ij} = a_{ji}$, i.e. its indices are simply permuted. We shall frequently use these notations here, denoting elements of the transposed matrix with the same small letter as for the original matrix, but putting a tilde (a wavy line) on top of it.
The transpose of a product of two matrices is given by the product of their transposes taken in the reverse order:
$$(AB)^T = B^T A^T\,. \tag{1.16}$$
Proof. If $A = \left(a_{ij}\right)$ and $B = \left(b_{ij}\right)$, then $A^T = \left(\tilde{a}_{ij}\right)$ with $\tilde{a}_{ij} = a_{ji}$, and $B^T = \left(\tilde{b}_{ij}\right)$ with $\tilde{b}_{ij} = b_{ji}$. The product $AB = \left(c_{ij}\right)$ with elements $c_{ij} = \sum_k a_{ik}b_{kj}$ after transposition turns into the matrix $(AB)^T = \left(\tilde{c}_{ij}\right)$ with elements
$$\tilde{c}_{ij} = c_{ji} = \sum_k a_{jk}b_{ki} = \sum_k \tilde{a}_{kj}\tilde{b}_{ik} = \sum_k \tilde{b}_{ik}\tilde{a}_{kj} = d_{ij}\,,$$
where the $d_{ij}$ are exactly the elements of $B^T A^T$. Note that here we were able to associate $d_{ij} = \sum_k \tilde{b}_{ik}\tilde{a}_{kj}$ with $B^T A^T$ only after making sure that the indices in the product of the elements under the sum are properly ordered, as required by matrix multiplication, with the dummy summation index $k$ appearing in the middle and the indices $i$ and $j$ as the first and the last ones, respectively, as in $d_{ij}$. Q.E.D.
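The rule (1.16) is easily spot-checked numerically on random rectangular matrices (a Python/NumPy sketch; the shapes are arbitrary test choices):

```python
# A one-off numerical check (Python/NumPy) of Eq. (1.16), (AB)^T = B^T A^T.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

print(np.allclose((A @ B).T, B.T @ A.T))   # True
```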
The operation of matrix transposition appears in the dot product of two real vectors:
$$(A\mathbf{x}, \mathbf{y}) = \left(\mathbf{x}, A^T\mathbf{y}\right)\,. \tag{1.17}$$
Indeed, let $A\mathbf{x} = \mathbf{c}$ be a vector with elements $c_i = \sum_k a_{ik}x_k$. Then the dot product of $\mathbf{c}$ and another vector $\mathbf{y}$ is
$$(A\mathbf{x}, \mathbf{y}) = \sum_l c_l y_l = \sum_l\left(\sum_k a_{lk}x_k\right)y_l = \sum_k x_k\left(\sum_l a_{lk}y_l\right) = \sum_k x_k\left(\sum_l \tilde{a}_{kl}y_l\right) = \left(\mathbf{x}, A^T\mathbf{y}\right)\,,$$
because the vector $\mathbf{d} = A^T\mathbf{y}$ has elements $d_k = \sum_l \tilde{a}_{kl}y_l = \sum_l a_{lk}y_l$, as required.
If a square matrix is equal to its transpose, it is symmetric with respect to its diagonal and hence is called a symmetric matrix:
$$A^T = A\,,\quad\text{i.e.}\quad a_{ij} = a_{ji}\,. \tag{1.18}$$
For instance, consider the matrices
$$\begin{pmatrix} 1 & 2\\ 2 & 3\end{pmatrix}\quad\text{and}\quad\begin{pmatrix} 0 & 2\\ -2 & 0\end{pmatrix}\,.$$
The first matrix is symmetric, while the second is not. In fact, the second matrix is antisymmetric, as $a_{ij} = -a_{ji}$.
The dot product of two real vectors can also be written via matrix multiplication:
$$(\mathbf{x}, \mathbf{y}) = X^T Y\,, \tag{1.19}$$
where $X = (x_i)$ and $Y = (y_i)$ are matrices consisting of a single column (i.e. column vectors). Then $X^T$ is a matrix containing a single row (a row vector), so that $X^T$ and $Y$ can be multiplied using the usual matrix multiplication rules:
$$(\mathbf{x}, \mathbf{y}) = X^T Y = \begin{pmatrix} x_1\\ \vdots\\ x_p\end{pmatrix}^T\begin{pmatrix} y_1\\ \vdots\\ y_p\end{pmatrix} = \begin{pmatrix} x_1 & \cdots & x_p\end{pmatrix}\begin{pmatrix} y_1\\ \vdots\\ y_p\end{pmatrix} = \left(x_1y_1 + \cdots + x_py_p\right)\,,$$
which is a $1\times 1$ matrix (we enclosed its only element in round brackets to stress this point); it is essentially a scalar. Note, however, that the operation in which the second vector is transposed produces a square matrix:
$$C = XY^T = \begin{pmatrix} x_1\\ \vdots\\ x_p\end{pmatrix}\begin{pmatrix} y_1\\ \vdots\\ y_p\end{pmatrix}^T = \begin{pmatrix} x_1\\ \vdots\\ x_p\end{pmatrix}\begin{pmatrix} y_1 & \cdots & y_p\end{pmatrix} = \begin{pmatrix} x_1y_1 & \cdots & x_1y_p\\ \vdots & \ddots & \vdots\\ x_py_1 & \cdots & x_py_p\end{pmatrix}\,,$$
with elements $c_{ij} = x_iy_j$.
If a matrix $A = \left(a_{ij}\right)$ contains complex numbers, the complex conjugate matrix $A^* = \left(a_{ij}^*\right)$ can be defined, which contains the elements $\left(A^*\right)_{ij} = a_{ij}^*$, where $*$ indicates the operation of complex conjugation. For instance,
$$A = \begin{pmatrix} a + ib & 2\\ 3i & a - ib\end{pmatrix}\quad\Longrightarrow\quad A^* = \begin{pmatrix} a - ib & 2\\ -3i & a + ib\end{pmatrix}\,.$$
Problem 1.23. Prove that the matrices $A^n$ and $B^n$ commute if the matrices $A$ and $B$ do.
Problem 1.24. Consider the three matrices
$$A = \begin{pmatrix} 0 & 1\\ 1 & 0\end{pmatrix}\,,\quad B = \begin{pmatrix} 0 & -i\\ i & 0\end{pmatrix}\,,\quad C = \begin{pmatrix} 1 & 0\\ 0 & -1\end{pmatrix}$$
(the Pauli matrices). Show that (i) $A^2 = B^2 = C^2 = E$, where $E$ is the unit matrix; (ii) any pair of the matrices anticommute, e.g. $AB = -BA$; and (iii) the commutator of any two of the matrices, defined by $[A, B] = AB - BA$, is expressed via the third matrix, e.g. $[A, B] = 2iC$.
Problem 1.25. If $[A, B] = AB - BA$ is the commutator of two matrices, prove the following (Jacobi) identity:
$$[A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0\,.$$
We have already encountered square, diagonal, unit and symmetric matrices. There are many other types of matrices that have special properties, some of which will be considered later on; in the forthcoming subsections we shall encounter several of them.
Some square matrices $A$ have an inverse matrix, denoted $A^{-1}$, which is defined by
$$A^{-1}A = AA^{-1} = E\,, \tag{1.20}$$
or, in components, with $B = A^{-1} = \left(b_{ij}\right)$,
$$\sum_{k=1}^{n} a_{ik}b_{kj} = \sum_{k=1}^{n} b_{ik}a_{kj} = \delta_{ij}\,, \tag{1.21}$$
where $\delta_{ij}$ is the Kronecker symbol. It is easy to see that $A^{-1}$ standing on the left and on the right of $A$ in the above equation is one and the same matrix. Indeed, assume that these are two different matrices, called $B$ and $C$, respectively, with $B \ne C$, i.e.
$$BA = E \tag{1.22}$$
and also, quite independently,
$$AC = E\,. \tag{1.23}$$
Now multiply the second of these identities by $B$ from the left: $BAC = BE$. In the left-hand side $BAC = (BA)C = EC = C$, where we used the first of the two identities above, Eq. (1.22), while in the right-hand side $BE = B$. Hence we obtain $C = B$, which contradicts our initial assumption, i.e. $A^{-1}$ on the left and on the right of $A$ in Eq. (1.20) is indeed the same matrix.
Not all square matrices have an inverse. For instance, a matrix consisting entirely of zeros does not have one. Only non-singular matrices have an inverse; we shall explain what this means later on, in Sect. 1.2.9.
Theorem 1.2. The inverse of a product $AB$ of two square matrices is equal to the product of their inverse matrices taken in the reverse order:
$$(AB)^{-1} = B^{-1}A^{-1}\,. \tag{1.24}$$
Proof. The matrix $C = (AB)^{-1}$ is defined via $CAB = E$. Multiply both sides of this matrix equation by $B^{-1}$ from the right: $CABB^{-1} = EB^{-1}$. Since $E$ is the unit matrix, $EB^{-1} = B^{-1}$. Also, $BB^{-1} = E$ by definition of $B^{-1}$. Thus we obtain $CAE = B^{-1}$, or simply $CA = B^{-1}$. Next, multiply both sides of this matrix equation by $A^{-1}$ from the right again; we obtain $CAA^{-1} = B^{-1}A^{-1}$, or simply $C = B^{-1}A^{-1}$. Q.E.D.
Problem 1.26. Prove that the inverse matrix, if it exists, is unique. [Hint: prove by contradiction.]

Problem 1.27. Prove that the inverse of $A^T$ is equal to the transpose of the inverse of $A$, i.e.
$$\left(A^T\right)^{-1} = \left(A^{-1}\right)^T\,. \tag{1.25}$$

Problem 1.28. Prove that $\left(A^{-1}\right)^{-1} = A$.

Problem 1.29. Consider a left triangular square $n\times n$ matrix
$$A = \begin{pmatrix} a_{11} & & & \\ a_{21} & a_{22} & & \\ a_{31} & a_{32} & a_{33} & \\ \vdots & \vdots & \vdots & \ddots \\ a_{n1} & a_{n2} & a_{n3} & \cdots \; a_{nn}\end{pmatrix}\,,$$
where all elements $a_{ij}$ with $j > i$ are equal to zero, while $a_{ij} \ne 0$ if $j \le i$. Show, by writing the identity $AA^{-1} = E$ directly in components, that the inverse matrix $A^{-1}$ has exactly the same left triangular structure.

Problem 1.30. Prove a similar statement for a right triangular matrix.
Example 1.8. ► Find explicitly the inverse of a general $2\times 2$ matrix $A = \begin{pmatrix} a & b\\ c & d\end{pmatrix}$, assuming $ad - bc \ne 0$.

Solution. By definition,
$$\begin{pmatrix} a & b\\ c & d\end{pmatrix}\begin{pmatrix} x & y\\ z & h\end{pmatrix} = \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}\,,$$
which gives four equations that can easily be solved with respect to $x, y, z, h$ (the first and the third equations give $x$ and $z$, while the other two give $y$ and $h$), yielding
$$x = \frac{d}{\Delta}\,,\quad y = -\frac{b}{\Delta}\,,\quad z = -\frac{c}{\Delta}\quad\text{and}\quad h = \frac{a}{\Delta}\,,\quad\text{where}\quad \Delta = ad - bc\,.$$
Thus, the inverse of the $2\times 2$ matrix $A$ is
$$A^{-1} = \frac{1}{\Delta}\begin{pmatrix} d & -b\\ -c & a\end{pmatrix}\quad\text{with}\quad \Delta = ad - bc\,. \tag{1.26} \;\blacktriangleleft$$
Problem 1.31. Let $A = \left(a_{ij}\right)$, $\mathbf{x} = (x_i)$ and $\mathbf{y} = (y_i)$. Prove the above identity. [Hint: write both sides of Eq. (1.27) in components.]
We see that two transformations, performed one after another, act as some other
transformation that is given by the product matrix C D BA, in which the order of
matrices in the product follows the order of the transformations themselves from
right to left. This operation, when one transformation is performed after another, is
called a multiplication operation, and we see that in the case under consideration
this operation corresponds exactly to the matrix multiplication.
A set of non-singular matrices may form an algebraic object called a group. A number of requirements exist which are necessary for a set of matrices to form such a group.² What are these requirements? There are three of them:
1. There is a unity element $e$ in the set that does nothing; for matrices this role is played by the unit matrix $E$.
2. There exists a multiplication operation, to be understood as the action of several operations performed one after another; such a combined operation must be equivalent to some other operation of the same set: if $g_1$ and $g_2$ belong to the set, then $g_1g_2$ must also be in it (the closure condition); the multiplication is also required to be associative, $(g_1g_2)g_3 = g_1(g_2g_3)$.
3. Each element $g$ of the set must have an inverse $g^{-1}$, also belonging to the set, such that $gg^{-1} = g^{-1}g = e$.

² Note, however, that groups can also be formed by other objects, not only matrices, although we shall limit ourselves to matrices here.
Example 1.9. ► As an example, consider a set of four matrices:
$$E = \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}\,,\quad A = \begin{pmatrix} 0 & -1\\ 1 & 0\end{pmatrix}\,,\quad B = \begin{pmatrix} -1 & 0\\ 0 & -1\end{pmatrix}\quad\text{and}\quad C = \begin{pmatrix} 0 & 1\\ -1 & 0\end{pmatrix}\,. \tag{1.29}$$
It is easily checked that a product of any two elements results in an element of the same set; all such products are shown in Table 1.1. Next, we see that each element has an inverse; indeed, looking at the table, $A^{-1} = C$ (since $AC = CA = E$), $B^{-1} = B$ and $C^{-1} = A$. Also, $E$ obviously serves as the unity element. Hence, the four elements form a group with respect to matrix multiplication. Moreover, we can also notice that the two elements $E$ and $B$ form a group of two elements on their own, since $BB = E$ and $B^{-1} = B$. This smaller group consists of elements of the larger group and is called a subgroup. ◄
Let us now consider a particular set of real transformation matrices $A$: those that conserve the dot product between any two real vectors, i.e. if $\mathbf{x}' = A\mathbf{x}$ and $\mathbf{y}' = A\mathbf{y}$, then
$$\left(\mathbf{x}', \mathbf{y}'\right) = (A\mathbf{x}, A\mathbf{y}) = (\mathbf{x}, \mathbf{y})\,. \tag{1.30}$$
In particular, by taking $\mathbf{y} = \mathbf{x}$ we obtain $(A\mathbf{x}, A\mathbf{x}) = (\mathbf{x}, \mathbf{x})$, i.e. the length of a vector does not change under this particular transformation.

To uncover the appropriate condition the matrix $A$ should satisfy, we use Eq. (1.17) and get $(A\mathbf{x}, A\mathbf{y}) = \left(\mathbf{x}, A^TA\mathbf{y}\right)$, which will be equal to $(\mathbf{x}, \mathbf{y})$ only if
$$A^TA = E\,. \tag{1.31}$$
One can also see that $AA^T = E$ as well (indeed, multiply both sides of Eq. (1.31) by $A$ from the left and then by $A^{-1}$ from the right). Matrices satisfying this condition are called orthogonal.⁴ Transformations performed by orthogonal matrices are called orthogonal transformations. By comparing Eqs. (1.31) and (1.20), one can see that for orthogonal matrices
$$A^{-1} = A^T\,, \tag{1.32}$$
i.e. the transposed matrix $A^T$ is at the same time the inverse of $A$. Therefore, the inverse of an orthogonal matrix always exists and is equal to $A^T$, i.e. orthogonal matrices are non-singular.
The orthogonal matrices form a group:³ if $A$ and $B$ are orthogonal, then $C = AB$ is also orthogonal. Indeed, $A^{-1} = A^T$, $B^{-1} = B^T$ and
$$C^{-1} = (AB)^{-1} = B^{-1}A^{-1} = B^TA^T = (AB)^T = C^T\,.$$

³ An interested reader may consult specialised texts, e.g. J.P. Elliott and P.G. Dawber, Symmetry in Physics, Vols. 1 (ISBN-10: 0195204557, ISBN-13: 978-0195204551) and 2 (ISBN-10: 0195204565, ISBN-13: 978-0195204568), Oxford University Press, reprint edition, 1985; or M. Hamermesh, Group Theory and Its Application to Physical Problems, Dover Publications, reprint edition, 2012 (ISBN-10: 0486661814, ISBN-13: 978-0486661810).
⁴ This name probably originates from the definition of two orthogonal vectors. Indeed, if we have two column vectors $X$ and $Y$ which are orthogonal, then one can write $X^TY = 0$, which in some sense may be considered analogous to Eq. (1.31).
Also, the set of orthogonal matrices contains the unit element $E$, which is of course orthogonal; finally, each element has an inverse, since $A^{-1} = A^T$, and $A^T$ is itself an orthogonal matrix because $\left(A^T\right)^{-1} = \left(A^{-1}\right)^T = \left(A^T\right)^T$. This discussion proves the statement made above.
As a simple example, consider the following matrix:
$$A = \begin{pmatrix} \cos\phi & -\sin\phi\\ \sin\phi & \cos\phi\end{pmatrix}\,. \tag{1.33}$$
Then the matrix
$$A^T = \begin{pmatrix} \cos\phi & \sin\phi\\ -\sin\phi & \cos\phi\end{pmatrix}$$
is its transpose. It is easily checked by direct multiplication that $A$ is an orthogonal matrix, since $A^TA = E$. Indeed,
$$\begin{pmatrix} \cos\phi & \sin\phi\\ -\sin\phi & \cos\phi\end{pmatrix}\begin{pmatrix} \cos\phi & -\sin\phi\\ \sin\phi & \cos\phi\end{pmatrix} = \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}\,,$$
i.e. the transposed matrix is indeed the inverse of $A$. Consider now a vector $\mathbf{x} = (x_1, x_2)$ with squared length $|\mathbf{x}|^2 = (\mathbf{x}, \mathbf{x}) = x_1^2 + x_2^2$. It is easy to see that the squared length $(\mathbf{y}, \mathbf{y})$ of the vector $\mathbf{y} = A\mathbf{x}$ is also equal to $(\mathbf{x}, \mathbf{x})$ for any $\mathbf{x}$. Indeed,
$$\mathbf{y} = \begin{pmatrix} \cos\phi & -\sin\phi\\ \sin\phi & \cos\phi\end{pmatrix}\begin{pmatrix} x_1\\ x_2\end{pmatrix} = \begin{pmatrix} x_1\cos\phi - x_2\sin\phi\\ x_1\sin\phi + x_2\cos\phi\end{pmatrix} = \begin{pmatrix} y_1\\ y_2\end{pmatrix}\,,$$
and its squared length $y_1^2 + y_2^2$ is easily calculated (after simple manipulations) to be $x_1^2 + x_2^2$.
In 3D space, orthogonal matrices not only conserve the length of a vector under the transformation, they also conserve the angle between two vectors when both of them are transformed. Indeed, according to Eq. (I.1.21), the dot product is equal to the product of the vectors' lengths and the cosine of the angle between them. The lengths of the vectors are conserved by the transformation. Since the dot product is conserved as well, the cosine of the angle is conserved too, i.e. the angle between the two transformed vectors remains the same.
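Both properties — the orthogonality of (1.33) and the conservation of lengths — are easy to confirm numerically (a Python/NumPy sketch with an arbitrary test angle):

```python
# A quick check (Python/NumPy) that the rotation matrix (1.33) is orthogonal
# and conserves lengths.
import numpy as np

phi = 0.7
A = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])

print(np.allclose(A.T @ A, np.eye(2)))            # A^T A = E -> True
x = np.array([3.0, -4.0])
print(np.linalg.norm(A @ x), np.linalg.norm(x))   # both 5.0
```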
What makes a matrix orthogonal? We shall now see that these matrices have a very special structure. Writing Eq. (1.31) element by element, we obtain (assuming that $A$ is a square $n\times n$ matrix):
$$\sum_k \tilde{a}_{ik}a_{kj} = \sum_k a_{ki}a_{kj} = \delta_{ij}\,.$$
Here the elements $\{a_{ki}\} = (a_{1i}, a_{2i}, \ldots, a_{ni})$ with fixed $i$ form a vector composed of all elements of the $i$-th column of $A$; likewise, all elements $\{a_{kj}\}$ with fixed $j$ form a vector out of the $j$-th column of $A$. Therefore, the equation above tells us that the dot product of the $i$-th and $j$-th columns of $A$ is equal to zero if $i \ne j$ and to one if $i = j$. In other words, the columns of an orthogonal $n\times n$ matrix form a set of $n$ orthonormal vectors.
Similarly, we can consider the more general $p$-dimensional space of complex vectors. But first let us prove, for that case, an identity analogous to Eq. (1.17):
$$(A\mathbf{x}, \mathbf{y}) = \sum_l\left(\sum_k a_{lk}x_k\right)^* y_l = \sum_k x_k^*\left(\sum_l a_{lk}^* y_l\right) = \sum_k x_k^*\left(\sum_l \tilde{a}_{kl}^* y_l\right) = \left(\mathbf{x}, \left(A^T\right)^*\mathbf{y}\right)\,.$$
We see that when the matrix $A$ moves from the left position in the dot product to the right position there, it changes in two ways: it is transposed (as in the previous case of the real space) and undergoes complex conjugation (obviously, in either order). The matrix $\left(A^T\right)^* = \left(A^*\right)^T$, obtained by transposing $A$ and then applying complex conjugation to all its elements, is called the Hermitian conjugate of $A$ and is denoted with a dagger: $\left(A^T\right)^* = A^\dagger$. Hence, we can write
$$(A\mathbf{x}, \mathbf{y}) = \left(\mathbf{x}, A^\dagger\mathbf{y}\right)\,. \tag{1.34}$$
Complex matrices $A$ satisfying the condition
$$AA^\dagger = E \tag{1.35}$$
are called unitary matrices. Transformations of complex vector spaces performed by unitary matrices are called unitary transformations. Physical quantities in quantum mechanics have a direct correspondence to matrices of this kind. Comparing Eq. (1.35) with the definition of the inverse matrix, Eq. (1.20), one can see that for unitary matrices
$$A^{-1} = A^\dagger\,, \tag{1.36}$$
i.e. to calculate the inverse matrix one simply has to transpose it and then take the complex conjugate of all its elements. Since for a unitary matrix $A^\dagger$ is the same as $A^{-1}$, the identity $A^\dagger A = E$ holds as well.
If we write $AA^\dagger = A^\dagger A = E$ in elements, we obtain the following equations:
$$\sum_k a_{ik}a_{jk}^* = \delta_{ij}\quad\text{and}\quad \sum_k a_{ki}^*a_{kj} = \delta_{ij}\,. \tag{1.37}$$
These relationships mean that the rows and columns of a unitary matrix are orthonormal in the full sense of the dot product of complex vectors.

It then follows from Eq. (1.34) that for unitary transformations
$$(A\mathbf{x}, A\mathbf{y}) = \left(\mathbf{x}, A^\dagger[A\mathbf{y}]\right) = \left(\mathbf{x}, A^\dagger A\mathbf{y}\right) = (\mathbf{x}, E\mathbf{y}) = (\mathbf{x}, \mathbf{y})\,,$$
i.e. the dot product $(\mathbf{x}, \mathbf{y})$ is conserved. We see that unitary transformations are generalisations of orthogonal transformations to complex vector spaces.
Example 1.10. ► Show that
$$A = \frac{1}{5}\begin{pmatrix} 1 + 2i & 4 - 2i\\ 2 - 4i & 2 + i\end{pmatrix}$$
is unitary.

Solution. We simply need to check that $AA^\dagger = A^\dagger A = E$:
$$A^T = \frac{1}{5}\begin{pmatrix} 1 + 2i & 2 - 4i\\ 4 - 2i & 2 + i\end{pmatrix}\quad\text{and thus}\quad A^\dagger = \left(A^T\right)^* = \frac{1}{5}\begin{pmatrix} 1 - 2i & 2 + 4i\\ 4 + 2i & 2 - i\end{pmatrix}\,;$$
therefore,
$$A^\dagger A = \frac{1}{25}\begin{pmatrix} 1 - 2i & 2 + 4i\\ 4 + 2i & 2 - i\end{pmatrix}\begin{pmatrix} 1 + 2i & 4 - 2i\\ 2 - 4i & 2 + i\end{pmatrix} = \frac{1}{25}\begin{pmatrix} 25 & 0\\ 0 & 25\end{pmatrix} = E\,,$$
and similarly $AA^\dagger = E$, as it should be. ◄
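The same check takes two lines numerically (a Python/NumPy sketch; the matrix entries follow the reconstruction of Example 1.10 above, where some signs were not fully legible in this excerpt):

```python
# Numerical unitarity check of the matrix of Example 1.10 (Python/NumPy).
import numpy as np

A = np.array([[1 + 2j, 4 - 2j],
              [2 - 4j, 2 + 1j]]) / 5
A_dag = A.conj().T

print(np.allclose(A_dag @ A, np.eye(2)))   # True
print(np.allclose(A @ A_dag, np.eye(2)))   # True
```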
A matrix that is equal to its own Hermitian conjugate, $A^\dagger = A$, is called Hermitian. If instead the matrix changes sign upon Hermitian conjugation, i.e. $A^\dagger = -A$, it is called anti-Hermitian or skew-Hermitian.
Problem 1.33 (Discrete Fourier Transform). Consider $N$ numbers $x_0$, $x_1$, etc., $x_{N-1}$. One can generate another set of $N$ numbers, $y_j$ (with $j = 0, 1, \ldots, N - 1$), using the following recipe:⁵
$$y_j = \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} e^{i2\pi jk/N}\,x_k\,. \tag{1.38}$$
Form $N$-dimensional vectors $X$ and $Y$ from the quantities $\{x_j\}$ and $\{y_j\}$, respectively, and then write the above relationship in the matrix form $Y = UX$. Prove that the rows (columns) of $U$ form an orthonormal set of vectors, and hence that the matrix $U$ is unitary. Finally, establish that the inverse transformation from $Y$ to $X$ reads
$$x_k = \frac{1}{\sqrt{N}}\sum_{j=0}^{N-1} e^{-i2\pi jk/N}\,y_j\,. \tag{1.39}$$

⁵ The complex exponential function is introduced properly in Sect. 2.3.3.
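A numerical sketch of this problem (Python/NumPy) builds the matrix $U$ of Eq. (1.38) explicitly and confirms that it is unitary. With the $+i$ sign convention adopted here, $UX$ coincides with NumPy's inverse FFT under its "ortho" normalisation — a library detail worth noting, not part of the problem itself:

```python
# Building the DFT matrix U of (1.38) and checking unitarity (Python/NumPy).
import numpy as np

N = 8
j, k = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
U = np.exp(2j * np.pi * j * k / N) / np.sqrt(N)

print(np.allclose(U @ U.conj().T, np.eye(N)))             # unitary -> True

x = np.random.default_rng(1).standard_normal(N)
print(np.allclose(U @ x, np.fft.ifft(x, norm='ortho')))   # True
```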
1.2.5.2 Rotations in 3D
Problem 1.37. Using well-known trigonometric identities, show that the new coordinates, $y'$ and $z'$, are related to the old ones, $y$ and $z$, as follows:
$$y' = y\cos\phi + z\sin\phi\quad\text{and}\quad z' = z\cos\phi - y\sin\phi\,.$$
Note that this matrix is orthogonal. Indeed, consider a rotation by the angle $-\phi$ about the same axis $x$:
$$R_x(-\phi) = \begin{pmatrix} 1 & 0 & 0\\ 0 & \cos\phi & -\sin\phi\\ 0 & \sin\phi & \cos\phi\end{pmatrix} = R_x(\phi)^T\,,$$
and this matrix can easily be checked to be the inverse of that in Eq. (1.40):
$$R_x(\phi)R_x(\phi)^T = \begin{pmatrix} 1 & 0 & 0\\ 0 & \cos\phi & \sin\phi\\ 0 & -\sin\phi & \cos\phi\end{pmatrix}\begin{pmatrix} 1 & 0 & 0\\ 0 & \cos\phi & -\sin\phi\\ 0 & \sin\phi & \cos\phi\end{pmatrix} = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix} = E\,,$$
i.e. indeed $R_x(\phi)^{-1} = R_x(\phi)^T = R_x(-\phi)$.
It is seen now that 3D rotations about the $x$ axis form a group. Indeed, any rotation has an inverse, the unit matrix is the unity element (the rotation by $\phi = 0$ gives $E$), and any two consecutive rotations by $\phi_1$ and $\phi_2$ correspond to a single rotation by $\phi_1 + \phi_2$. Obviously, all rotations about the $y$ or $z$ axes also form groups. In fact, it is possible to establish that any arbitrary rotation in 3D space (not necessarily about a single axis) can always be represented by no more than three of the elementary rotations described above, and all such rotations form the group of rotations of 3D space.
1.2.5.3 Reflections in 3D
For a point with radius vector $\mathbf{r}$ and a mirror plane through the origin with unit normal $\mathbf{n}$, we can decompose $\mathbf{r} = \mathbf{a} + (\mathbf{r}\cdot\mathbf{n})\mathbf{n}$, where $\mathbf{a}$ lies within the plane; the reflected vector is then $\mathbf{r}' = \mathbf{a} - (\mathbf{r}\cdot\mathbf{n})\mathbf{n}$, so that
$$\mathbf{r}' = \mathbf{r} - 2(\mathbf{r}\cdot\mathbf{n})\mathbf{n}\,.$$
In particular, the reflection in the plane with the normal $\mathbf{n} = \frac{1}{\sqrt{3}}(1, 1, 1)$ is given by the matrix
$$C_{111} = \frac{1}{3}\begin{pmatrix} 1 & -2 & -2\\ -2 & 1 & -2\\ -2 & -2 & 1\end{pmatrix}\,,$$
so that the vector along the normal, $\mathbf{r} = (1, 1, 1)$, transforms into $\mathbf{r}' = C_{111}\mathbf{r} = -(1, 1, 1)$, as expected.
One quantity of great importance in the theory of matrices is the determinant. We introduced the $2\times 2$ and $3\times 3$ determinants before, in Sect. I.1.6. Here we shall generalise these definitions to determinants of arbitrary dimension and, most importantly, relate determinants to square matrices.

We start by recalling the 2- and 3-dimensional cases and then consider the general case of arbitrary dimension. Consider a $2\times 2$ and a $3\times 3$ matrix,
$$A = \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}\quad\text{and}\quad B = \begin{pmatrix} b_{11} & b_{12} & b_{13}\\ b_{21} & b_{22} & b_{23}\\ b_{31} & b_{32} & b_{33}\end{pmatrix}\,.$$
Their determinants are
$$\det A = a_{11}a_{22} - a_{12}a_{21}\,, \tag{1.45}$$
which contains $2! = 2$ terms, and
$$\det B = b_{11}b_{22}b_{33} - b_{11}b_{23}b_{32} - b_{12}b_{21}b_{33} + b_{12}b_{23}b_{31} + b_{13}b_{21}b_{32} - b_{13}b_{22}b_{31}\,. \tag{1.46}$$
This expression contains $3! = 6$ terms.
Let us have a closer look at the two expressions (1.45) and (1.46). Each expression contains a sum of products of elements of the corresponding matrix, such as $b_{12}b_{21}b_{33}$ in Eq. (1.46). One can say that in every such product each row is represented by a single element; at the same time, one may also independently say that each column is represented by a single element as well.

This means that each elementary product in the cases of $|A|$ and $|B|$ can be written as $\pm a_{1j_1}a_{2j_2}$ and $\pm b_{1j_1}b_{2j_2}b_{3j_3}$, respectively, where the indices $j_1, j_2$ or $j_1, j_2, j_3$ correspond to some permutation of the numbers 1, 2 or 1, 2, 3 for the two determinants. There are $2! = 2$ and $3! = 6$ possible permutations of the second indices in the two cases, clearly matching the number of terms in each of the two determinants.
Now, let us look at the sign attached to each of the products. It is defined by the parity of the permutation $1, 2 \to j_1, j_2$ or $1, 2, 3 \to j_1, j_2, j_3$. Every permutation of two indices contributes a factor of $-1$. By performing all necessary pair permutations one after another, starting from the first (perfectly ordered) term, such as $a_{11}a_{22}$ or $b_{11}b_{22}b_{33}$, the overall sign of each term in Eqs. (1.45) and (1.46) can be obtained. For instance, consider the second term in Eq. (1.46), $-b_{11}b_{23}b_{32}$. Only one permutation of the right indices of the third and the second elements is required in this case:
$$b_{11}b_{2\underline{2}}b_{3\underline{3}} \;\to\; b_{11}b_{2\underline{3}}b_{3\underline{2}}\,,$$
where the permuted indices are underlined. Therefore, this term appears with a single factor of $-1$. The fourth term, $+b_{12}b_{23}b_{31}$, on the other hand, requires two permutations,
$$b_{1\underline{1}}b_{2\underline{2}}b_{33} \;\to\; b_{1\underline{2}}b_{2\underline{1}}b_{33} \;\to\; b_{12}b_{2\underline{3}}b_{3\underline{1}}\,,$$
and hence carries the factor $(-1)^2 = +1$.
We shall use Eq. (1.47). In each term of the sum exactly one element is taken from every row and every column. However, in a diagonal matrix $D$ all elements except those on the main diagonal are zeros. Therefore, there is only one non-zero term in the sum, namely $d_{11}d_{22}\cdots d_{nn}$; the parity of this term is obviously $+1$, i.e. the determinant of a diagonal matrix is equal to the product of all its diagonal elements:
$$\begin{vmatrix} d_{11} & & & 0\\ & d_{22} & & \\ & & \ddots & \\ 0 & & & d_{nn}\end{vmatrix} = d_{11}d_{22}\cdots d_{nn} = \prod_{k=1}^{n} d_{kk}\,. \tag{1.49}$$
Solution. We have to construct, starting each time from the "perfectly ordered" elementary product $a_{11}a_{22}a_{33}a_{44}$, all possible orderings of the right indices, keeping track of the parity of each permutation. There are $4! = 24$ terms to be expected. All 24 permutations of the numbers 1, 2, 3, 4 and their parities, together with the corresponding contributions to $\det A$, are shown in Table 1.2. Summing up all the terms in the last column of the table, we obtain the desired expression for the determinant. ◄
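The permutation definition translates directly into code; the following sketch (Python, brute force over all $n!$ permutations, so suitable only for small $n$) computes a determinant exactly as in this example and compares it with a library value:

```python
# A direct (if inefficient, O(n!)) implementation of the permutation definition
# of the determinant, with the parity computed by counting inversions.
import itertools
import numpy as np

def det_by_permutations(A):
    n = len(A)
    total = 0.0
    for perm in itertools.permutations(range(n)):
        # parity = (-1)^(number of inversions) of the permutation
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = (-1.0) ** inversions
        for row in range(n):
            term *= A[row][perm[row]]
        total += term
    return total

A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 10.]])
print(det_by_permutations(A), np.linalg.det(A))   # both -3.0
```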
If the elements of $A$ depend on a parameter $x$, the derivative of the determinant may be computed row by row:
$$\frac{d}{dx}|A| = \sum_{k=1}^{n}|A_k|\,,$$
where $A_k$ is the matrix $A$ with the elements of its $k$-th row differentiated with respect to $x$.
Property 1.1. Interchanging two rows of a determinant changes its sign, i.e. gives a factor of $-1$.

Proof. Consider a matrix $A_{r\leftrightarrow t}$ obtained from $A$ by interchanging the row $r$ with the row $t$ (assuming for certainty that $t > r$):
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{r1} & \cdots & a_{rn}\\ \vdots & & \vdots\\ a_{t1} & \cdots & a_{tn}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}\quad\text{and}\quad A_{r\leftrightarrow t} = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{t1} & \cdots & a_{tn}\\ \vdots & & \vdots\\ a_{r1} & \cdots & a_{rn}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}\,.$$
Then we get
$$\det A = \sum_P \epsilon_{j_1j_2\ldots j_n}\,a_{1j_1}\cdots a_{rj_r}\cdots a_{tj_t}\cdots a_{nj_n}\,,$$
$$\det A_{r\leftrightarrow t} = \sum_P \epsilon_{j_1j_2\ldots j_n}\,a_{1j_1}\cdots a_{tj_t}\cdots a_{rj_r}\cdots a_{nj_n}\,.$$
It is clear from this that if we make a single pair permutation $a_{tj_t}\leftrightarrow a_{rj_r}$ in every term of $\det A_{r\leftrightarrow t}$, the expression becomes exactly the same as for $\det A$. However, a single permutation brings an extra factor of $-1$ to each of the $n!$ terms of the sum. Thus, $\det A_{r\leftrightarrow t} = -\det A$, as required.
Property 1.2. Interchanging two columns of a determinant likewise gives a factor of $-1$, e.g.
$$\begin{vmatrix} 1 & 2 & 1\\ 3 & 0 & 3\\ 2 & 1 & 4\end{vmatrix} = -\begin{vmatrix} 1 & 1 & 2\\ 3 & 3 & 0\\ 2 & 4 & 1\end{vmatrix}\,.$$
Proof. Similar to the above; however, each term in the sum should be ordered with
respect to the right rather than the left index of every element of the matrix in
accordance with Eq. (1.48).
Property 1.3. If a matrix has two identical rows (columns), its determinant is equal to zero, e.g.
$$\begin{vmatrix} x & 1 & x\\ x^2 & 1 & x^2\\ x^3 & 3 & x^3\end{vmatrix} = 0\,.$$
Proof. Let the rows $r$ and $t$ be identical. If we interchange them, then, according to Property 1.1, $\det A_{r\leftrightarrow t} = -\det A$. However, because the two rows are identical, $A_{r\leftrightarrow t}$ does not differ from $A$, leading to $\det A = -\det A$, which means that $\det A = 0$.
Property 1.4. A common factor of any row (column) can be taken out of the determinant: multiplying a row (column) by a number multiplies the determinant by that number.

Proof. This property follows directly from Eq. (1.47) and the fact that each product of elements there contains exactly one element from that row (column).
Property 1.5. If every element of a row (column) is written as a sum (difference) of two terms, the determinant is equal to the sum (difference) of two determinants, each containing one part of that row (column):
$$\begin{vmatrix} \cdots & a_{1r} \pm b_{1r} & \cdots\\ & \vdots & \\ \cdots & a_{nr} \pm b_{nr} & \cdots\end{vmatrix} = \begin{vmatrix} \cdots & a_{1r} & \cdots\\ & \vdots & \\ \cdots & a_{nr} & \cdots\end{vmatrix} \pm \begin{vmatrix} \cdots & b_{1r} & \cdots\\ & \vdots & \\ \cdots & b_{nr} & \cdots\end{vmatrix}\,, \tag{1.50}$$
e.g.
$$\begin{vmatrix} a & 2 & 1\\ 3a & 0 & 3\\ 2a & 1 & 4\end{vmatrix} = \begin{vmatrix} a + x & 2 & 1\\ 2a & 0 & 3\\ 4a & 1 & 4\end{vmatrix} + \begin{vmatrix} -x & 2 & 1\\ a & 0 & 3\\ -2a & 1 & 4\end{vmatrix}\,.$$
Property 1.6. The determinant does not change if a multiple of one row (column) is added to (or subtracted from) another row (column), e.g.
$$\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix} = \begin{vmatrix} a_{11} - a_{21} & a_{12} - a_{22}\\ a_{21} & a_{22}\end{vmatrix}\,,$$
where the first row of the determinant in the right-hand side is obtained by subtracting row 2 from row 1 of the determinant in the left-hand side.

Proof. It follows from Properties 1.5 and 1.3.
Property 1.7. The determinant is identically equal to zero if at least one row (column) is a linear combination of the other rows (columns).
Proof. The idea of the proof is clearly seen by considering the simple case in which the first column is a linear combination of the second and the third:
$$\begin{vmatrix} \alpha a_{12} + \beta a_{13} & a_{12} & a_{13} & \cdots\\ \vdots & \vdots & \vdots & \\ \alpha a_{n2} + \beta a_{n3} & a_{n2} & a_{n3} & \cdots\end{vmatrix} = \begin{vmatrix} \alpha a_{12} & a_{12} & a_{13} & \cdots\\ \vdots & \vdots & \vdots & \\ \alpha a_{n2} & a_{n2} & a_{n3} & \cdots\end{vmatrix} + \begin{vmatrix} \beta a_{13} & a_{12} & a_{13} & \cdots\\ \vdots & \vdots & \vdots & \\ \beta a_{n3} & a_{n2} & a_{n3} & \cdots\end{vmatrix}$$
$$= \alpha\begin{vmatrix} a_{12} & a_{12} & a_{13} & \cdots\\ \vdots & \vdots & \vdots & \\ a_{n2} & a_{n2} & a_{n3} & \cdots\end{vmatrix} + \beta\begin{vmatrix} a_{13} & a_{12} & a_{13} & \cdots\\ \vdots & \vdots & \vdots & \\ a_{n3} & a_{n2} & a_{n3} & \cdots\end{vmatrix}\,,$$
where we have used Properties 1.5 and 1.4. Finally, each of the determinants in the last line is equal to zero, since it contains two identical columns (Property 1.3), as required. The proof in the general case is analogous.
Property 1.8. The determinants of the matrices $A$ and $A^T$ are equal.

Proof. The determinant of $A^T = \left(\tilde{a}_{ij}\right)$ with $\tilde{a}_{ij} = a_{ji}$ is given by
$$\det A^T = \sum_P \epsilon_{j_1j_2\ldots j_n}\,\tilde{a}_{1j_1}\tilde{a}_{2j_2}\cdots\tilde{a}_{nj_n} = \sum_P \epsilon_{j_1j_2\ldots j_n}\,a_{j_11}a_{j_22}\cdots a_{j_nn}\,,$$
which, upon reordering the factors in each term with respect to the first index, Eq. (1.48), coincides with $\det A$. Q.E.D.
As an application, consider a system of two linear equations, $a_{11}x_1 + a_{12}x_2 = h_1$ and $a_{21}x_1 + a_{22}x_2 = h_2$. Multiplying the first column of $|A|$ by $x_1$ (Property 1.4), we have
$$x_1|A| = \begin{vmatrix} a_{11}x_1 & a_{12}\\ a_{21}x_1 & a_{22}\end{vmatrix}\,,\quad\text{while}\quad \begin{vmatrix} a_{12}x_2 & a_{12}\\ a_{22}x_2 & a_{22}\end{vmatrix} = 0$$
by virtue of Property 1.3. Thus, using Property 1.5, we can add the two determinants together as follows (they both have the same second column):
$$x_1|A| + 0 = \begin{vmatrix} a_{11}x_1 & a_{12}\\ a_{21}x_1 & a_{22}\end{vmatrix} + \begin{vmatrix} a_{12}x_2 & a_{12}\\ a_{22}x_2 & a_{22}\end{vmatrix} = \begin{vmatrix} a_{11}x_1 + a_{12}x_2 & a_{12}\\ a_{21}x_1 + a_{22}x_2 & a_{22}\end{vmatrix} = \begin{vmatrix} h_1 & a_{12}\\ h_2 & a_{22}\end{vmatrix}\,,$$
which gives the required solution for $x_1$ as the ratio of the determinant in the right-hand side to $|A|$. Similarly, starting from
$$x_2|A| = \begin{vmatrix} a_{11} & a_{12}x_2\\ a_{21} & a_{22}x_2\end{vmatrix}$$
and adding the zero determinant $\begin{vmatrix} a_{11} & a_{11}x_1\\ a_{21} & a_{21}x_1\end{vmatrix} = 0$ to it, we obtain
$$x_2|A| = \begin{vmatrix} a_{11} & a_{11}x_1 + a_{12}x_2\\ a_{21} & a_{21}x_1 + a_{22}x_2\end{vmatrix} = \begin{vmatrix} a_{11} & h_1\\ a_{21} & h_2\end{vmatrix}\,,$$
which gives $x_2$. The obtained solution is a particular case of Cramer's rule, to be considered in more detail in Sect. 1.2.7.
Property 1.9. The determinant of a product of two matrices is equal to the product of their determinants:
$$\det(AB) = \det A\,\det B\,.$$
Proof. Let us first consider the two-dimensional case to see the main idea:
$$\det A = \begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}\quad\text{and}\quad \det B = \begin{vmatrix} b_{11} & b_{12}\\ b_{21} & b_{22}\end{vmatrix} = b_{11}b_{22} - b_{12}b_{21}\,,$$
and
$$\det(AB) = \begin{vmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22}\\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22}\end{vmatrix}\,.$$
Splitting each column in two (Property 1.5) and taking the common factors out of the columns (Property 1.4), we obtain four determinants:
$$\det(AB) = b_{11}b_{12}\begin{vmatrix} a_{11} & a_{11}\\ a_{21} & a_{21}\end{vmatrix} + b_{21}b_{22}\begin{vmatrix} a_{12} & a_{12}\\ a_{22} & a_{22}\end{vmatrix} + b_{11}b_{22}\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix} + b_{21}b_{12}\begin{vmatrix} a_{12} & a_{11}\\ a_{22} & a_{21}\end{vmatrix}\,,$$
where the first two determinants are each equal to zero due to Property 1.3. Finally, collecting the remaining terms,
$$\det(AB) = b_{11}b_{22}\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix} + b_{21}b_{12}\begin{vmatrix} a_{12} & a_{11}\\ a_{22} & a_{21}\end{vmatrix} = (b_{11}b_{22} - b_{21}b_{12})\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix} = \begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix}\begin{vmatrix} b_{11} & b_{12}\\ b_{21} & b_{22}\end{vmatrix}\,,$$
as required. Above, we permuted the columns of the second determinant to bring its elements into the correct order (as in the first one); this changed the sign in front of $b_{21}b_{12}$, giving exactly $\det B$.
Now we apply the same method to the general case of an $n\times n$ determinant. The general term of the matrix $C = AB$ is $c_{ij} = \sum_k a_{ik}b_{kj}$. Therefore, the determinant of $C$ we would like to calculate is
$$\det(AB) = \begin{vmatrix} \sum_{k_1} a_{1k_1}b_{k_11} & \sum_{k_2} a_{1k_2}b_{k_22} & \cdots & \sum_{k_n} a_{1k_n}b_{k_nn}\\ \sum_{l_1} a_{2l_1}b_{l_11} & \sum_{l_2} a_{2l_2}b_{l_22} & \cdots & \sum_{l_n} a_{2l_n}b_{l_nn}\\ \vdots & \vdots & & \vdots\\ \sum_{m_1} a_{nm_1}b_{m_11} & \sum_{m_2} a_{nm_2}b_{m_22} & \cdots & \sum_{m_n} a_{nm_n}b_{m_nn}\end{vmatrix}\,.$$
Notice that we used different summation (dummy) indices in different rows. We start by splitting the first column term by term, then the second one, the third, and so on:
$$\det(AB) = \sum_{k_1}\begin{vmatrix} a_{1k_1}b_{k_11} & \sum_{k_2}a_{1k_2}b_{k_22} & \cdots & \sum_{k_n}a_{1k_n}b_{k_nn}\\ a_{2k_1}b_{k_11} & \sum_{l_2}a_{2l_2}b_{l_22} & \cdots & \sum_{l_n}a_{2l_n}b_{l_nn}\\ \vdots & \vdots & & \vdots\\ a_{nk_1}b_{k_11} & \sum_{m_2}a_{nm_2}b_{m_22} & \cdots & \sum_{m_n}a_{nm_n}b_{m_nn}\end{vmatrix} = \cdots = \sum_{k_1}\sum_{k_2}\cdots\sum_{k_n}\begin{vmatrix} a_{1k_1}b_{k_11} & a_{1k_2}b_{k_22} & \cdots & a_{1k_n}b_{k_nn}\\ a_{2k_1}b_{k_11} & a_{2k_2}b_{k_22} & \cdots & a_{2k_n}b_{k_nn}\\ \vdots & \vdots & & \vdots\\ a_{nk_1}b_{k_11} & a_{nk_2}b_{k_22} & \cdots & a_{nk_n}b_{k_nn}\end{vmatrix}$$
$$= \sum_{k_1}\sum_{k_2}\cdots\sum_{k_n} b_{k_11}b_{k_22}\cdots b_{k_nn}\begin{vmatrix} a_{1k_1} & a_{1k_2} & \cdots & a_{1k_n}\\ a_{2k_1} & a_{2k_2} & \cdots & a_{2k_n}\\ \vdots & \vdots & & \vdots\\ a_{nk_1} & a_{nk_2} & \cdots & a_{nk_n}\end{vmatrix}\,.$$
Note that when splitting the columns we took the summation signs out, so that the same summation index could be used within each column; the common factors of the columns were then taken out as well (Property 1.4). The determinant in the right-hand side contains only elements of the matrix $A$; it is non-zero only if all the indices $k_1$, $k_2$, etc., are different. In other words, the $n$ summations over the indices $k_1$, $k_2$, etc., can in fact be replaced by a single sum taken over all permutations $P$ of the indices $(k_1, k_2, \ldots, k_n)$ running between 1 and $n$:
$$\det(AB) = \sum_P b_{k_11}b_{k_22}\cdots b_{k_nn}\begin{vmatrix} a_{1k_1} & a_{1k_2} & \cdots & a_{1k_n}\\ a_{2k_1} & a_{2k_2} & \cdots & a_{2k_n}\\ \vdots & \vdots & & \vdots\\ a_{nk_1} & a_{nk_2} & \cdots & a_{nk_n}\end{vmatrix}\,.$$
Now, looking at the determinant: if the indices $k_1$, $k_2$, etc., were ordered correctly, in ascending order from 1 to $n$, the determinant above would be equal exactly to $\det A$. To put them into the correct order from an arbitrary arrangement of the indices $k_1$, $k_2$, etc., a permutation is required, resulting in the sign $\epsilon_P = \epsilon_{k_1k_2\ldots k_n} = \pm 1$. Therefore,
$$\det(AB) = \sum_P \epsilon_{k_1k_2\ldots k_n}\,b_{k_11}b_{k_22}\cdots b_{k_nn}\,\det A = \left(\sum_P \epsilon_{k_1k_2\ldots k_n}\,b_{k_11}b_{k_22}\cdots b_{k_nn}\right)\det A = \det B\,\det A\,,$$
as required.
Table 1.3 Grouping the permutations of the numbers (1, 2, 3) into three groups, each corresponding to a different fixed first number

Label   Sequence after permutation   Number of permutations   Sign/Parity
a1      123                          0                        +1
a2      132                          1                        -1
b1      213                          0 + 1 = 1                -1
b2      231                          1 + 1 = 2                +1
c1      312                          1 + 1 = 2                +1
c2      321                          2 + 1 = 3                -1

In going from the first group (sequences a1, a2) to the second (b1, b2), only one additional permutation is required, giving an extra minus sign to the parity; going from the second group to the third (c1, c2), a single additional permutation is added again, bringing in another minus sign. Therefore, the element $b_{11}$ in Eq. (1.54), corresponding to the first group (sequences a1, a2), carries a plus sign, $b_{12}$ acquires a minus, while $b_{13}$ acquires another minus, giving it a plus in the end. In other words, the signs alternate, starting from plus for the very first term.
It appears that this method is very general and can be applied to a determinant of
arbitrary order. To formulate the method, we introduce a new quantity. Consider a
determinant jAj of order n. If we remove the i-th row and the j-th column from it (the
row and the column cross at element aij ), a determinant of order n 1 is obtained.
It is called a minor of aij and denoted Mij . For example, the minor of the element
a34 D 8 (bold underlined) of the determinant
ˇ ˇ ˇ ˇ
ˇ 1 2 3 4 ˇˇ ˇ 1 2 3 j ˇˇ ˇˇ ˇ
ˇ ˇ 1 2 3 ˇˇ
ˇ 4 3 2 1 ˇˇ ˇ 4 3 2 ˇ
jˇ ˇ ˇ
ˇ is M34 D ˇˇ D 4 3 2 ˇˇ :
ˇ 0 2 4 8 ˇˇ ˇˇ ˇˇ
ˇ ˇ
8 4 2 ˇ
ˇ 8 4 2 0 ˇ ˇ 8 4 2 jˇ
A minor Mij can be attached a sign .1/iCj in which case it is called a co-factor
of aij and denoted Aij D .1/iCj Mij . In the example above the co-factor of a34 is
.1/3C4 M34 D M34 . The co-factor signs can be easily obtained by constructing a
chess-board of plus and minus signs starting from the plus at the position 11:
ˇ ˇ
ˇC C C ˇ
ˇ ˇ
ˇ C C ˇ
ˇ ˇ
ˇ ˇ
ˇC C C ˇ
ˇ ˇ
ˇ C C ˇ
ˇ ˇ :
ˇC C C ˇ
ˇ ˇ
ˇ :: ˇ
ˇ : ˇ
ˇ ˇ
ˇ : : ˇˇ
ˇ :
48 1 Elements of Linear Algebra
Thus, the correct sign .1/iCj for the co-factor can be located on this chess-board
at the same i; j position as the element aij itself in the original determinant.
Now we are ready to formulate the general result:
X
n
jAj D a11 A11 C a12 A12 C C a1n A1n D a1k A1k ; (1.55)
kD1
X
n X
n
D a1j1 .1/j1 C1 M1j1 D a1j1 A1j1 ;
j1 D1 j1 D1
as required since the expression in the square brackets is nothing but the determinant
obtained by removing all elements of the first row and of the j1 -th column, i.e. it is
the minor M1j1 . Q.E.D.
Problem 1.48. Prove that similar formula can be written by expanding along
any row or column.
1.2 Matrices: Definition and Properties 49
Therefore, we obtain
ˇ ˇ
ˇx 1 1 1 ˇˇ
ˇ
ˇ1 x 0 0 ˇˇ
ˇ D x4 x2 x2 x2 D x2 x2 3 D 0 ;
ˇ1 0 x 0ˇ ˇ
ˇ
ˇ1 0 0 xˇ
p
which has the following solutions: x D 0; ˙ 3. J
Problem 1.49. Show that the solutions of the following equation with respect
to x,
ˇ ˇ
ˇx 1 0 1ˇ
ˇ ˇ
ˇ1 x 1 0ˇ
ˇ ˇ
ˇ0 1 x 1ˇ D 0 ;
ˇ ˇ
ˇ1 0 1 xˇ
are x D 0; ˙2.
Problem 1.50. Consider the following matrix:
0 1
2 0 1
A D @ 1 1 3 A :
0 2 1
(continued)
50 1 Elements of Linear Algebra
The same result is valid for the left triangular matrix as well.
As an interesting
example of a determinant calculation, let us consider a matrix
A D aij of a special structure in which only elements along the diagonal (i D j)
and next to it are non-zero, all other elements are equal to zero: aij ¤ 0 with j D i
and j D i ˙ 1. In other words, in the case of this so-called tridiagonal matrix aij D 0
as long as ji jj > 1. The determinant we are about to calculate is shown in the
left-hand side of the equation pictured in Fig. 1.6.
Let Ak be a matrix obtained from A by removing its first .k1/ rows and columns,
i.e. in Ak the diagonal elements are ˛k , ˛kC1 , etc., ˛n . In particular, A D A1 . Then,
opening the determinant of A along the first row, we have: jA1 j D ˛1 jA2 j C ˇ1 R12 ,
where R12 is the corresponding co-factor to the a12 D ˇ1 element of A, see the
second term in the right-hand side in Fig. 1.6. Opening now R12 along its first
column, we get R12 D ˇ1 jA3 j, yielding
jA1 j ˇ12
jA1 j D ˛1 jA2 j ˇ12 jA3 j H) D ˛1 :
jA2 j jA2 j = jA3 j
1.2 Matrices: Definition and Properties 51
Fig. 1.6 For the calculation of the determinant of a tridiagonal matrix: opening along the first
(upper) row. At the next step, the second determinant in the right-hand side is opened along its first
column leading to a product of ˇ1 with the minor enclosed in the green box
This fraction has a finite number of terms (as the matrix A is of a finite dimension)
and can be denoted in several different ways, e.g.
ˇ ˇ ˇ ˇ
ˇ12 ˇ ˇ22 ˇ ˇ32 ˇ 2 ˇ
ˇn1
1 D ˛1
j˛2 j˛3 j˛4 ˛n
or
It can be rewritten in a compact form using the matrix notations. Let us collect all the
unknown quantities x1 ; : : : ; xn into a vector-column X D .xi /, the coefficients aij into
a square matrix A D aij and, finally, the quantities in the right-hand side b1 ; : : : ; bn
into a vector-column B D .bi /. Then, instead of Eq. (1.58) we write simply
AX D B : (1.59)
If we multiply both sides of this equation from the left by A1 , we obtain in the
left-hand side A1 AX D EX D X, and hence we get a formal solution:
X D A1 B : (1.60)
Thus, the solution of the system of linear equations (1.59) is expressed via the
inverse of the matrix A of the coefficients. Although this solution is in many cases
useful, especially for analytical work, it does not give a simple practical way of
calculating X since it is not always convenient to find the inverse of a matrix,
especially if the dimension n of the problem is large.
Instead, we shall employ a different method due to Cramer. Consider
ˇ ˇ
ˇ a11 x1 a1n ˇ
ˇ ˇ
x1 jAj D ˇˇ ˇˇ :
ˇa x a ˇ
n1 1 nn
in which the first column is proportional to the second one. Now sum up both
expressions above, which yields
ˇ ˇ
ˇ a11 x1 C a12 x2 a1n ˇ
ˇ ˇ
x1 jAj D ˇˇ ˇˇ :
ˇa x C a x a ˇ
n1 1 n2 2 nn
1.2 Matrices: Definition and Properties 53
Next, we consider the zero determinant with the first column equal to the third one
times x3 , and then add this to the determinant above; we obtain in this way:
ˇ ˇ
ˇ a11 x1 C a12 x2 C a13 x3 a1n ˇ
ˇ ˇ
x1 jAj D ˇˇ ˇˇ :
ˇa x C a x C a x a ˇ
n1 1 n2 2 n3 3 nn
The elements along the first column can now be recognised to be elements of the
vector B in Eq. (1.58), i.e. we can write
ˇ ˇ ˇ ˇ
ˇ a11 x1 C a12 x2 C a13 x3 C C a1n xn a1n ˇ ˇ b1 a12 a1n ˇ
ˇ ˇ ˇ ˇ
x1 jAj D ˇˇ ˇˇ D ˇˇ ˇˇ ;
ˇa x C a x C a x C C a x a ˇ ˇ b a a ˇ
n1 1 n2 2 n3 3 nn n nn n n2 nn
(1.61)
which gives the required closed solution for x1 as a ratio of two determinants.
Multiplying jAj by x2 and inserting it into the second column and repeating the
above procedure, we obtain
ˇ ˇ
ˇ a11 b1 a1n ˇ
ˇ ˇ
x2 jAj D ˇˇ ˇˇ ; (1.62)
ˇa b a ˇ
n1 n nn
Problem 1.55. Determine the values of x for which the system of equations
with respect to c1 , c2 , c3 and c4 has a non-trivial solution:
8
ˆ
ˆ xc1 C c2 C c3 C c4 D 0
<
c1 C xc2 C c4 D 0
:
ˆ c1 C xc3 C c4 D 0
:̂
c1 C c2 C c3 C xc4 D 0
p
[Answer: x D 0; 1, .1 ˙ 17/=2.]
Proof. To prove that the vectors are linearly independent, we have to demonstrate
that the system of equations
c1 d1 C c2 d2 C C cp dp D 0
with respect to the coefficients c1 ; c2 ; : : : ; cp has only the trivial (zero) solution.
Write the equations in components of the vectors dk D .dik / using the right index in
dik to indicate the vector number k and the left one for the component:
8
< d11 c1 C d12 c2 C C d1p cp D 0
: (1.65)
:
dp1 c1 C dp2 c2 C C dpp cp D 0
56 1 Elements of Linear Algebra
This is a set of p linear algebraic equations with respect to the unknown coefficients
c1 ; c2 ; : : : ; cp with the zero right-hand side. It has a trivial solution if the determinant
of a matrix formed by the coordinates of vectors d1 ; d2 ; : : : ; dp is not equal to zero:
ˇ ˇ
ˇ d11 d1p ˇ
ˇ ˇ
jDj D ˇˇ ˇˇ ¤ 0 : (1.66)
ˇd d ˇ
p1 pp
We shall now show that this is indeed the case since the vectors d1 ; d2 ; : : : ; dp are
orthonormal. Indeed, due to their orthogonality and unit length, we can write
X
p
.di ; dj / D ıij or dki dkj D ıij : (1.67)
kD1
If we now introduce the Hermitian conjugate matrix D D dij with dij D dji , then
it is seen that Eq. (1.67) can be written simply as
X
p
dik dkj D ıij or D D D E ;
kD1
i.e. D is a unitary matrix whose determinant (see Problem 1.47) jDj D ˙1. We see
that the determinant of D is not equal to zero. This means that the only solution
of Eq. (1.65) is the trivial solution which, in turn, means that the set of vectors
d1 ; d2 ; : : : ; dp is indeed linearly independent. Q.E.D.
Note that requirement of the theorem that the vectors are normalised to unity
is not essential and was only assumed for convenience. It is sufficient to have
vectors orthogonal to guarantee their linear independence. The theorem just proven
is important as it says that if we have a set of p orthogonal vectors, they can form
a basis of a p-D vector space, and any vector from this space can be expanded in
terms of them. Correspondingly, that means that no more than p linearly independent
vectors can be constructed for the p-D space: any additional vector within the same
space will necessarily be linearly dependent on them.
valid for any continuous interval of the values of x, has only the trivial solution with
respect to the coefficients ˛1 , ˛2 , etc.:
˛1 D ˛2 D D ˛n D 0 :
It is possible to work out a simple method for verifying this. To this end, let us
generate .n 1/ more equations by differentiating both sides of Eq. (1.68) once,
twice, etc., .n 1/ times. We obtain n 1 additional equations:
.1/ .1/
˛1 f1 .x/ C ˛2 f2 .x/ C C ˛n fn.1/ .x/ D 0 ;
.2/ .2/
˛1 f1 .x/ C ˛2 f2 .x/ C C ˛n fn.2/ .x/ D 0 ;
.n1/ .n1/
˛1 f1 .x/ C ˛2 f2 .x/ C C ˛n fn.n1/ .x/ D 0 ;
.k/
where fi .x/ D dk fi =dxk . These equations, together with Eq. (1.68), form a system
of n linear algebraic equations with respect to the coefficients ˛1 , ˛2 , etc.:
W˛ D 0 ; (1.69)
It is easy to see that W has a triangular form with the elements along its diagonal
being 1, 1, 2Š, 3Š, 4Š, etc., nŠ. Calculating the determinant jWj along the first column
followed by the calculation of all the consecutive minors also along the first column
(cf. Problem 1.52) results in that
Y
n
jWj D 1 2Š 3Š : : : nŠ D iŠ ¤ 0 :
iD1
Problem 1.56. Demonstrate, using the method based on the calculation of the
Wronskian, that the functions sin x, cos x and eix are linearly dependent.
Problem 1.57. The same for functions ex , ex and sinh.x/.
Problem 1.58. Show that the exponential functions ex , e2x , etc., enx are linearly
independent.
The formulae obtained in the previous section allows us to derive a general formula
for the inverse matrix. We should also be able to establish a necessary condition for
the inverse of a matrix to exist.
To accomplish this program, we have to compare the Cramer’s solution of
Eq. (1.63) with that given by Eq. (1.60). To this end, let us first rewrite the
solution (1.63) in a slightly different form. We expand the determinant in the
numerator˚ of the expression for xi along its i-th column (the one which contains
elements bj ):
ˇ ˇ
ˇ a11 a1;i1 b1 a1;iC1 a1n ˇ
ˇ ˇ Xn
ˇ ˇ D b1 A1i C b2 A2i C C bn Ani D bk Aki ;
ˇ ˇ
ˇa a ˇ
n1 n;i1 bn an;iC1 ann kD1
where Aki are the corresponding co-factors of the matrix A of the coefficients of the
system of equations: indeed, by removing the i-th column and the k-th row we arrive
at the k; i co-factor Aki of A. Therefore, combining the last equation with Eq. (1.63),
we have
1 X
n
xi D bk Aki : (1.71)
jAj kD1
1.2 Matrices: Definition and Properties 59
On the other hand, we also formally have the solution in the form of Eq. (1.60)
which contains the inverse matrix; it can be written in components as
X
n
1
xi D A ik bk :
kD1
The last two expressions should give the same answer for any numbers bk .
Therefore, the following expression must be generally valid:
1 Aki
A ik D ; (1.72)
jAj
which gives a general expression for the elements of the inverse matrix sought for.
It can be used to calculate it in the case of arbitrary dimension of the matrixA. Note
the reverse order of indices in Eq. (1.72) above: if we denote by Acof D Aij the
matrix of co-factors, then
ATcof
A1 D : (1.73)
jAj
This is the general result we have been looking for.
It also follows from the above formula that A1 exists if and only if jAj ¤ 0.
Matrices that have a non-zero determinant are called non-singular as opposite to
singular matrices that have a zero determinant.
Example 1.14. I Find the inverse of the matrix
0 1
2 11
A D @ 1 3 2 A :
2 01
Therefore, noting the reverse order of indices between the elements of the inverse
matrix and the corresponding co-factors (i.e. using the transpose of the co-factor
matrix), we get
0 1 0 1
A A A 3=5 1=5 1=5
1 @ 11 21 31 A @
A1 D A12 A22 A32 D 1 0 1 A :
5
A13 A23 A33 6=5 2=5 7=5
It is easy to check that the matrix A1 we found is indeed the inverse of the matrix
A. Using the matrix multiplication, we get
0 10 1 0 1
3=5 1=5 1=5 2 11 100
A1 A D @ 1 0 1 A @ 1 3 2 A D @ 0 1 0 A D E ;
6=5 2=5 7=5 2 01 001
0 10 1 0 1
2 11 3=5 1=5 1=5 100
AA1 D @ 1 3 2 A @ 1 0 1 A D @ 0 1 0 A D E : J
2 01 6=5 2=5 7=5 001
Problem 1.59. The matrix of rotations by the angle around the y axis is
0 1
cos 0 sin
Ry . / D @ 0 1 0 A :
sin 0 cos
Calculate the inverse matrix Ry . /1 via building the corresponding co-
factors. Interpret your result.
Problem 1.60. Show using the method of co-factors that the inverse of the
matrix
0 1 0 1
a0f b 0 bf
1
A D @0 b dA is A1 D @ cd a cf ad A :
b.a cf /
c01 cb 0 ab
Problem
1 1.62. For the same tridiagonal matrix A, show that the element
A 12 of its inverse can be represented as:
" ˇ ˇ ˇ ˇ #1
1
1 ˛1 ˇ12 ˇ ˇ22 ˇ ˇ32 ˇ 2 ˇ
ˇn1
A 12
D ˛1 :
ˇ1 ˇ1 j˛2 j˛3 j˛4 ˛n
When a p-th order matrix A is multiplied with a vector x, this gives another vector
y D Ax. However, in many applications, most notably in quantum mechanics, it
is important to know particular vectors x for which the transformation Ax gives
a vector in the same direction as x, i.e. different from x only by some (generally
complex) constant factor :
Ax D x : (1.74)
Here both x and are to be found; this problem is usually called an eigenproblem.
The number is called an eigenvalue, and the corresponding to it vector x
eigenvector. More than one pair of eigenvectors and eigenvalues may exist for the
given square matrix A. Note that the vectors x and numbers are necessarily related
to each other by the nature of equation (1.74) defining them. Also, note that if x
is a solution of Eq. (1.74) with some or, as it is usually said, it is an eigenvector
corresponding to the eigenvalue , then any vector cx with an arbitrary complex
factor c is also an eigenvector with the same eigenvalue. In other words, eigenvectors
are always defined up to an arbitrary prefactor.
Obviously, the vector x D 0 is a solution of Eq. (1.74) with D 0. However, in
applications this trivial solution is never of a value, and so we shall only be interested
in non-trivial solutions of this problem.
To solve the problem, we shall rewrite it in the following way: let us take the
vector x to the left-hand side and rewrite it in a matrix form as Ex using the
unit matrix E. We get
.A E/ x D 0 : (1.75)
This equation should ring a bell as it is simply a set of linear algebraic equations
with respect to x with the zero right-hand side. We know from Sect. 1.2.7 that this
system of equations has a non-trivial solution if
Problem 1.63. Prove that the square matrices A and AT share the same
eigenvalues.
Problem 1.64. Prove that the square matrices A and U 1 AU share the same
eigenvalues.
Problem 1.65. Let the n n matrix A has eigenvalues 1 ; 2 ; : : : ; n . Show
that the eigenvalues of the matrix Am D AA …
„ ƒ‚ A are m
1 ; 2 ; : : : ; n .
m m
The determinant of a square matrix A can be directly expressed via its eigenvalues
as demonstrated by the following Theorem.
Theorem 1.5. The determinant of a matrix is equal to the product of all its
eigenvalues:
Y
p
detA D .i/ : (1.77)
iD1
Solution. First if all, we need to solve the secular equation to find the eigenvalues:
ˇ ˇ ˇ ˇ ˇ ˇ
ˇ 1 1 1 0 ˇˇ ˇˇ 1 1 0 ˇˇ ˇˇ 1 1 ˇˇ
ˇ D D D2 6 D 0 ;
ˇ 4 2 01 ˇ ˇ 4 2 0 ˇ ˇ 4 2 ˇ
that has two solutions: .1/ D 3 and .2/ D 2. Substituting the value of .1/ into
the matrix equation Ax D .1/ x, we obtain (in components) two equations for the
two components x1 ; x2 of the eigenvector x:
x1 C x2 D 3x1 4x1 C x2 D 0
Ax D 3x H) or :
4x1 C 2x2 D 3x2 4x1 x2 D 0
The two equations are equivalent since, by construction, the corresponding rows of
the matrix A .1/ E are linearly dependent because its determinant is zero (recall
Property 1.7 of the determinant in Sect. 1.2.6.2). Solving either of the two yields
.1/ .1/
x2 D4x1 ; i.e.
the eigenvector corresponding to the eigenvalue D 3 is x D
a 1
Da , where a is an arbitrary number.6 Similarly, using .2/ D 2 in
4a 4
the matrix equation Ax D .2/ x yields two equations
x1 C x2 D 2x1 x1 C x2 D 0
or :
4x1 C 2x2 D 2x2 4x1 C 4x2 D 0
as it should be! J
6
Recall that each eigenvector is defined up to an arbitrary constant prefactor anyway.
64 1 Elements of Linear Algebra
(we
p have expanded the determinant with respect to its first row), yielding .1;2/ D
.3/ .1/
˙ 3 and D 2. The p system of equations for the first eigenvector x , which
.1/
corresponds to D 3, reads
8 p
< .1 3/xp 1 C x2 C x3 D 0
x1 3x2 p 2x3 D 0 ;
:
x1 x2 C .1 3/x3 D 0
0 1
1
p p T
which yields the vector x.1/ D a @ 3 A D a 1; 3; 1 with arbitrary a.
1
Again, one of the equations must be equivalent to two others, so we set x3 D a and
expressed x1 and x2 pfrom the first two equations via x3 . Similarly, for the second
eigenvalue .2/ D 3 we get the system of equations
8 p
< .1 C 3/xp 1 C x2 C x3 D 0
x1 C 3x2 p 2x3 D 0 ;
:
x1 x2 C .1 C 3/x3 D 0
0 1
1
p p T
that yields the eigenvector x.2/ D b @ 3 A D b 1; 3; 1 . Finally, performing
1
similar calculations for the third eigenvalue .3/ D 2, we get the equations
8
< x1 C x2 C x3 D 0
x 2x2 2x3 D 0 ;
: 1
x1 x2 x3 D 0
1.2 Matrices: Definition and Properties 65
01
0
which results in x.3/ D c @ 1 A D c .0; 1; 1/T . In this case the first and the third
1
equations are obviously equivalent and hence one of them should be dropped. We
fixed x3 D c and expressed x1 and x2 via x3 from the second and third equations.
Hence, in this case we obtained three solutions of the eigenproblem, i.e. three
pairs of eigenvectors and corresponding to them eigenvalues. J
In some cases care is needed in solving equations for the eigenvectors as
illustrated by the following example.
Example 1.17. I Find eigenvectors and eigenvalues of a matrix
0 1
100
A D @0 4 2A :
024
Note that the first equation has as a solution any x1 ; the second and the third
equations should be solved simultaneously (e.g. solve for x2 from the second
equation and substitute into the third), resulting in the zero solution for x2 and
x3 .7 So, the first eigenvector corresponding to 1 D 1 is x.1/ D .a; 0; 0/T with
7
This situation can also be considered as a solution of two algebraic linear equations with the zero
right-hand side:
3x2 C 2x3 D 0
:
2x2 C 3x3 D 0
As
weknow, this system of equations has a non-trivial solution only if the determinant of its matrix
32
is equal to zero. Obviously, this is not the case and hence only the zero solution exists.
23
66 1 Elements of Linear Algebra
an arbitrary constant a. Finding eigenvectors for the other two eigenvalues is more
straightforward. Consider, for instance, the second one, 2 D 2:
0 10 1 0 1 8 8
100 x1 x1 < x1 D 0 < x1 D 0
@ 0 4 2 A @ x2 A D 2 @ x2 A H) 2x2 C 2x3 D 0 H) x D x3 :
: : 2
024 x3 x3 2x2 C 2x3 D 0 x3 D x2
We see that in this case the second and the third equations give identical information
about x2 and x3 , which is that x2 D x3 ; nothing can be said about the absolute
values of them. Therefore, we can take x2 D b with an arbitrary b and then write the
eigenvector as x.2/ D .0; b; b/T D b .0; 1; 1/T . Similarly the third eigenvector is
found to be x.3/ D .0; c; c/T D c .0; 1; 1/T with an arbitrary c. J
If eigenvalues of a matrix A are all different, they are said to be non-degenerate. If
an eigenvalue repeats itself, it is said to be degenerate. We have seen in the examples
above that if a matrix A of dimension p has all different eigenvalues, it has exactly p
linearly independent eigenvectors. In fact, this is a very general result that is proven
by the following Theorem.
Theorem 1.6. If all eigenvalues of a square matrix A are different, then all its
eigenvectors are linearly independent.
Proof. Let eigenvalues and eigenvectors of the matrix A be .i/ and x.i/ (i D
1; : : : ; p), respectively. If the vectors are all linearly independent, then the equation
X
p
c1 x.1/ C C cp x.p/ D ci x.i/ D 0 (1.78)
iD1
X .i/
p
A .2/ E ci x D c1 A .2/ E x.1/ C c2 A .2/ E x.2/
iD1
X .i/
p
C A .2/ E ci x D 0 :
iD3
Now act from the left with the matrix A .3/ E . First, we notice that the matrices
A .2/ E and A .3/ E commute since A and E do. Thus, we have
X .i/ X .i/
A .3/ E A .2/ E ci x D A .2/ E A .3/ E ci x D 0 :
i¤2 i¤2
Similarly to the above, the term i D 3 in the sum will disappear since x.3/ is the
eigenvector corresponding to the eigenvalue .3/ . Thus we get
X
A .3/ E A .2/ E ci x.i/ D 0 :
i¤2;3
Repeating this procedure, we finally remove all the terms in the sum except for the
very first one:
" #
Y
n
A E c1 x.1/ D A .p/ E A .3/ E A .2/ E c1 x.1/ D 0:
.i/
iD2
(1.79)
However, for any i ¤ 1, we get
A .i/ E x.1/ D Ax.1/ .i/ x.1/ D .1/ x.1/ .i/ x.1/ D .1/ .i/ x.1/ :
Therefore, after repeatedly using the above identity, Eq. (1.79) turns into
c1 .1/ .p/ .1/ .3/ .1/ .2/ x.1/ D 0 :
If all the eigenvalues are different, then this equation can be satisfied if and only if
c1 D 0. Q
Similarly, by operating with the matrix i¤2 A .i/ E on the left-hand side of
Eq. (1.78), we obtain c2 D 0. All other coefficients ci are found to be zero in the
same way. Q.E.D.
Note that if there are some repeated (degenerate) eigenvalues, the number of
distinct eigenvectors may be smaller than the dimension of the matrix p. The two
examples below illustrate this point.
Example 1.18. I Find eigenvalues and eigenvectors of a matrix
1 1
AD :
1 3
from where we get x1 D x2 . Hence only a single eigenvector x.1/ D a.1; 1/T is
found. Of course, formally, one can always construct the second eigenvector, x.2/ D
b.1; 1/T , but this is linearly dependent on the first one. Thus, there is only a single
linearly independent eigenvector. J
Example 1.19. I Find eigenvectors and eigenvalues of the matrix
0 1
3 2 1
A D @ 3 4 3 A :
2 4 0
yielding .1/ D 5, .2;3/ D 2. For the first eigenvalue, the eigenvector is obtained
in the usual way from
8
< 8x1 2x2 x3 D 0
.A C 5E/x D 3x1 C x2 3x3 D 0 ;
:
2x1 4x2 C 5x3 D 0
yielding x.1/ D a.1; 3; 2/T . The situation with the other two eigenvectors is a bit
peculiar: since both eigenvalues are degenerate, the eigenvector equations are the
same:
8 8
< x1 2x2 x3 D0 < x1 2x2 x3 D0
.A 2E/xD 3x1 6x2 3x3 D0 ; or after simplification, x1 2x2 x3 D0 ;
: :
2x1 4x2 2x3 D0 x1 2x2 x3 D0
i.e. all three equations are identical! Thus, the only relationship between compo-
nents of either of the eigenvectors x.2/ and x.3/ is that x1 D 2x2 C x3 , i.e. there are
two arbitrary constants possible. This means that two linearly independent vectors
can be constructed. There is an infinite number of possibilities. For instance, if we
take x2 D 0, then we obtain x1 D x3 and hence x.2/ D a.1; 0; 1/T , and by setting
1.2 Matrices: Definition and Properties 69
instead x3 D 0, we obtain another vector x.3/ D b.2; 1; 0/T . It is easily seen that all
three vectors are linearly independent. Any linear combination of vectors x.2/ and
x.3/ can also be constructed,
to serve as the second and third eigenvectors provided that the new vectors are
linearly independent. J
X
k
y.i/ D ˛ij x.j/
jD1
can be written as
0 1
X
N
ˇj2 Y
N
det A D @˛0 A ˛i :
jD1
˛j iD1
Show that all eigenvalues of the matrix A are obtained by solving the
transcendental equation
X
N
ˇj2
˛0 D :
jD1
˛j
Prove specifically that the eigenvalues are not given by the diagonal ele-
ments ˛j . Then, show that the normalised to unity eigenvector of A correspond-
ing to the eigenvalue is given by
0 1
1
B C
1 B 1C
B C
e D q PN B 2 C ;
B C
1 C jD1 j2 @ A
N
where i D ˇi = .˛i /. Finally, show that any two eigenvectors, e and e0 ,
corresponding to different eigenvalues ¤ 0 , are orthogonal.
1.2 Matrices: Definition and Properties 71
We have seen above that in the case of degenerate eigenvalues of a pp matrix A it is
not always possible to find all its p linearly independent eigenvectors. We shall show
in this section that for Hermitian, AT D A D A matrices all their eigenvectors
can always be chosen linearly independent even if there are some degenerate
eigenvalues. The case of symmetric matrices, AT D A, can be considered as a
particular case of the Hermitian ones when all elements of the matrices are real, so
that there is no need to consider the case of the symmetric matrices separately. Also,
quantum mechanics is based upon Hermitian matrices, so that their consideration is
of special importance.
x Ax D x x : (1.80)
x A x D x x : (1.81)
As the matrix A is Hermitian, the left-hand sides of the two equations (1.80)
and (1.81) are the same. Therefore,
x x D 0 :
Since x ¤ 0, x x > 0, so that the only way to satisfy the above equation is to admit
that is real: D . Q.E.D.
Note that in quantum mechanics measurable quantities correspond to eigenvalues
of Hermitian matrices associated with them. This theorem guarantees that all
measurable quantities are real.
Another important theorem deals with eigenvectors of a Hermitian matrix:
are .1/ D 1, .2/ D 1 and .3/ D 2. Note that all are indeed real. The eigenvector
x.1/ is obtained from
0 10 1 8
1 i i x1 < x1 C ix2 C ix3 D 0
@ i 0 0 A @ x2 A D 0 H) ix1 D 0 ;
:
i 0 0 x3 ix1 D 0
that gives x.1/ D a .0; 1; 1/T , where a is an arbitrary complex number. Similarly,
x.2/ is found via solving
1.2 Matrices: Definition and Properties 73
0 10 1 8
1 i i x1 < x1 C ix2 C ix3 D 0
@ i 2 0 A @ x2 A D 0 H) ix1 C 2x2 D 0 ;
:
i 0 2 x3 ix1 C 2x3 D 0
yielding x.2/ D b .1; i=2; i=2/T . Finally, x.3/ is determined from equations
0 10 1 8
2 i i x1 < 2x1 C ix2 C ix3 D 0
@ i 1 0 A @ x2 A D 0 H) ix1 x2 D 0 ;
:
i 0 1 x3 ix1 x3 D 0
and results in x.3/ D c .1; i; i/T . The obtained eigenvectors are all orthogonal:
0 1
1
i i
x.1/ x.2/
D a b 0 1 1 @ A
i=2 D a b
D0;
2 2
i=2
0 1
1
i2 i2
x.2/ x.3/ D b c 1 2i 2i @ i A D b c 1 C C D 0;
2 2
i
0 1
1
x.1/ x.3/
D a c 0 1 1 @ i A D a c Œi C i D 0 : J
i
Consider two matrices: A and non-singular B. Then, the special product B1 AB is
called a similarity transformation of A by B; the matrices A and B1 AB are called
similar.
Next, consider an eigenvalue/eigenvector problem Ax D x, where A is a square
p p matrix. It is convenient in this section to denote eigenvectors of A as xj D xij ,
where in the components xij the second index corresponds to the vector number and
the first to its components. We assume that the matrix A has all its eigenvectors
linearly independent. Let us have all eigenvectors of A arranged as columns,
0 1
x11 x12 x1p
U D x1 x2 xp D @ A : (1.82)
xp1 xp2 xpp
D D U 1 AU ; (1.83)
where
0 1
1 0
D D @ A (1.84)
0 p
is a diagonal matrix containing all eigenvalues of A on its main diagonal.
Note that it was not required for this prove to work that the eigenvectors xj be
orthogonal. Q.E.D.
1.2 Matrices: Definition and Properties 75
We see from Eq. (1.83) that the similarity transformation of A by the matrix U,
that contains as its columns the eigenvectors of A, transforms A into the diagonal
form.
Inversely, let us multiply the matrix identity AU D UD from the right by U 1 .
We get
A D UDU 1 : (1.85)
Theorem 1.10. The determinant of any Hermitian matrix A is equal to the product
of all its eigenvalues:
Y
det A D i : (1.86)
i
Proof. We write first A D UDU and then calculate the determinant on both sides:
in the left-hand side we simply have det A, while in the right we get
Y
det UDU D det U det D det U D det D D i ;
i
the modal matrix U can always be chosen unitary with det U detU D
since
det UU D detE D 1. Q.E.D.
Example 1.21. IDiagonalise the matrix
0 1
0 i i
A D @ i 1 0 A :
i 0 1
with the eigenvalues on the main diagonal, as required. Note that the eigenvalues
run along the diagonal exactly in the same order as that of eigenvectors chosen in
the modal matrix U. J
Note that in the above example the product of eigenvalues is 2; a direct
calculation of the determinant det A yields exactly the same result.
The following theorem establishes an important property of the similar matrices.
1.2 Matrices: Definition and Properties 77
The matrix inside the determinant in the second equation can be rearranged by using
E D U 1 U, i.e.
U 1 AU E D U 1 AU U 1 U D U 1 .A E/ U ;
Au D u and Bv D U 1 AUv D v :
Multiply the second equation by U from the left to get A .Uv/ D .Uv/. It is
seen that the vector Uv is an eigenvector of A with the eigenvalue . However, we
know that u is also an eigenvector of A with the same , and there is only one such
eigenvalue. Therefore, u D Uv (up to an arbitrary multiplier).
In the general case of repeated eigenvalues (degeneracy) one can always con-
struct linear combinations of eigenvectors ui (of A) or vi (of B) corresponding to the
same eigenvalue (see Problem 1.66); these will still serve as accepted eigenvectors
for that eigenvalue. Therefore, in this case one can write that
X
Uvi D cik uk
k
with some coefficients cik forming a matrix C; here we sum up over all eigenvectors
corresponding to the same . Inversely,
X X
uk D c1
ki Uvi D U c1 0
ki vi D Uvk ;
i i
P
where c1 1 0
ki are elements of the inverse matrix C , and vk D
1
i cki vi are linear
combinations of the eigenvectors of B. Q.E.D.
78 1 Elements of Linear Algebra
As a simple example, let us find all eigenvalues and eigenvectors of the matrix
1 1
AD
4 2
As we found in Example 1.15, the matrix A has two eigenvalues .1/D3 and .2/ D
1
2, the corresponding normalised eigenvectors being x.1/ D p1 and x.2/ D
17 4
1
p1 . Note that these are not orthogonal (they do not need be as A is neither
2 1
symmetric nor Hermitian). Then, the matrix A after the similarity transformation
becomes
1 1=2 1=2 1 1 1 1 3 0
B D U AU D D :
1=2 1=2 4 2 1 1 3 2
leading to the same eigenvalues, 3 and 2. The first eigenvector y.1/ is obtained
from
0y1 C 0y2 D 0
;
3y1 5y2 D 0
5=3
so that y1 D 53 y2 and the normalised vector y.1/ D p3 . It is indeed directly
34 1
proportional to the vector x.1/ via U (we can omit the unnecessary multipliers):
1 1 5=3 2=3 2 1
Uy.1/ D D D x.1/ :
1 1 1 8=3 3 4
.2/ 0
yielding y1 D 0 and the normalised eigenvector y D . It is also related to
1
x.2/ via U:
1 1 0 1 1
Uy.2/ D D D x.2/ :
1 1 1 1 1
Now we are ready to conclude the proof of Theorem 1.8. There we established that
two eigenvectors x and y of a Hermitian matrix A corresponding to two different
eigenvalues are orthogonal, i.e. .x; y/ D 0, we did not consider the case of repeated
eigenvalues in detail. Now we shall show that one can always choose n linearly
independent eigenvectors for any Hermitian n n matrix, even if the latter has
repeated eigenvalues.
Theorem 1.12. For any Hermitian pp matrix A (even with repeated eigenvalues)
there exists a unitary matrix U such that the similarity transformation U AU D D
results in a diagonal matrix D containing all eigenvalues of A.
Then, consider the similarity transformation U1 AU1 . The i; j components of the
matrix formed this way are
!
X X X
U1 AU1 D xki akl xlj D xki akl xlj :
ij
kl k l
80 1 Elements of Linear Algebra
Now, consider specifically j D 1 which corresponds to the first column of the matrix
U1 AU1 . Using Eq. (1.87), we have
!
X X X X
U1 AU1 D xki akl xl1 D xki .1 xk1 / D 1 xki xk1 D 1 ıi1 ;
i1
k l k k
where in the last step we used the fact that, by construction, the vectors xi D .xki /
and x1 D .xk1 / are orthonormal. We see from the above that all elements of the
first column are zeros, apart from the 1; 1 element which is equal to 1 , the first
eigenvalue of A. On the other hand, the matrix U1 AU1 must be Hermitian since A is
such (see Problem 1.36). Therefore, all elements on the first row, apart from the first
one, should also be zeros. In other words, the structure of the matrix U1 AU1 must
be this:
0 1
1 0 0
B 0 C
U1 AU1 D B C
@ Ap1 A ;
0
with all elements, apart from the ones in the first row and column, forming a
.p 1/ ˇ.p 1/ ˇmatrix
ˇ Aˇ p1 (this will be the 1; 1-minor of the matrix U1 AU1 ).
ˇ ˇ ˇ ˇ
Note that ˇU1 AU1 ˇ D ˇU1 ˇ jAj jU1 j D jAj, since U1 is unitary by construction. On
the other
ˇ hand,ˇ because of the special structure of the matrix U1 AU1 , we can also
ˇ ˇ ˇ ˇ
write: ˇU1 AU1 ˇ D 1 Ap1 ˇ, and hence (see Eq. (1.77)) the matrix Ap1 contains
ˇ
all other eigenvalues of A (see Theorem 1.10). In other words, if 1 is not repeated,
Ap1 does not have this one but has all others; if 1 is repeated, then Ap1 has this
one as well, but repeated one time less. At the next step, we consider one eigenvector
of Ap1 , let us call it y2 , corresponding to the eigenvalue 2 of Ap1 (and hence of
A). Repeating the above procedure, we can construct a Hermitian .p 1/ .p 1/
matrix S2 D y2 yp such that it brings the matrix Ap1 into the form:
0 1
2 0 0
B 0 C
S2 Ap1 S2 D B C
@ Ap2 A ;
0
where Ap2 is a square matrix of dimension p 2. Therefore, if one constructs a
p p matrix
0 1
1 0 0
B 0 C
U2 D B C
@ S2 A ;
0
1.2 Matrices: Definition and Properties 81
then it will bring the matrix U1 AU1 into the following form:
0 10 10 1
1 0 0 1 0 0 1 0 0
B 0 CB 0 CB 0 C
U2 U1 AU1 U2 D B CB CB C
@ S A @ Ap1 A @ S2 A
2
0 0 0
0 1
0 1 1 0 0 0
1 0 0 B 0 0 0C
B 0 C B B 2 C
C
DB
@ S Ap1 S2 A B
C D B 0 0 C D .U1 U2 / A .U1 U2 / :
2
C
@ Ap2 A
0
0 0
ˇ ˇ ˇ ˇ
ˇ ˇ
As before, the determinant of ˇ.U1 U2 / A .U1 U2 /ˇ D jAj is also equal to 1 2 ˇAp2 ˇ
because of the special structure of the matrix above obtained after the similarity
transformation. Therefore, Ap2 has all eigenvalues of A apart from 1 and 2 .
This process can be repeated another p 3 times by constructing a sequence of
unitary matrices U3 , U4 , etc., Up1 , until only the last element at the p; p position is
left which is a scalar Ap.p1/ D A1 . This way the matrix A would be finally brought
into the diagonal form:
0 1
1 0 0 0
B 0 0 0 C
B B
2 C
C
U1 U2 : : : Up1 A U1 U2 : : : Up1 D B 0 0 3 0 C D D ;
B C
@ A
0 0 0 p
where the last element at the pp position must be the last eigenvalue p of A
which we have not yet considered in our construction. Thus, we managed to
construct a unitary matrix U D U1 U2 : : : Up1 which upon performing the similarity
transformation, U AU, brings the original matrix A into the diagonal form D with
its eigenvalues on the principal diagonal, as required. Our proof is general as it was
not based on assuming that all eigenvalues are different. Q.E.D.
Now we should be able to finally prove that even in the case of a Hermitian p p
matrix A having degenerate (repeated) eigenvalues it is always possible to choose
exactly p linearly independent eigenvectors fxi g (i D 1; : : : ; p) corresponding to its
eigenvalues fi g. This is shown by the following Theorem.
Theorem 1.13. Consider a Hermitian pp matrix A which has p eigenvalues fi g
amongst which there could be repeated ones. Then, it is always possible to choose
exactly p orthogonal eigenvectors fxi g (i D 1; : : : ; p).
82 1 Elements of Linear Algebra
Proof. We know from the previous Theorem that there exists a unitary matrix
U D x1 x2 xp D xij ;
such that U AU D D with D D dij D i ıij only containing all eigenvalues
of A on its diagonal. Recall that the second (column) index of xij in U indicates the
vector number, while the first index corresponds to its components. But, multiplying
U AU D D from the left with U, we obtain AU P D UD. Let usP write the
last equation
in components. In the left-hand side we have k aik xkj D k aik xj k , while the
right-hand side reads
X X X
xik dkj D xik j ıkj D j xij D j xj i H) aik xj k D j xj i ;
k k k
i.e. the j-th column xj D xij of the matrix U is an eigenvector of A corresponding to
the eigenvalue j . Since the matrix U is unitary, all the eigenvectors are orthogonal
(and hence linearly independent). Q.E.D.
The Theorem just proven generalises our previous results of Sect. 1.2.10.2 to
arbitrary Hermitian matrices.
X
d
yi D cki xk ; i D 1; : : : ; p ;
kD1
of the eigenvectors can also serve as eigenvectors of A with the same eigenvalue
(note the “reverse” order of indices in the coefficients above!). Show that the
vectors fyi g will be orthonormal if the matrix C D .cik / of the coefficients ofthe
linear transformation satisfies the matrix equation: C SC D E, where
S D sij
is the matrix of dot products of the original vectors, sij D xi ; xj . In particular,
it is seen that if S D E (the original set is orthonormal), then C must be unitary
to preserve this property for the new eigenvectors.
A question one frequently asks in quantum mechanics is what are the conditions
at which two (or more) physical quantities can be observed (measured) at the same
time. The answer to this question essentially boils down to the following question: is
1.2 Matrices: Definition and Properties 83
it possible to diagonalise two (or more) matrices associated with the chosen physical
quantities using the same similarity transformation U? This problem is solved with
the following Theorem.
Theorem 1.14. Two Hermitian matrices can be diagonalised with the same
similarity transformation U if and only if they commute.
Proof. The necessary condition of the proof we shall do first. Consider two p p
matrices A and B. Let us assume that there exists the same similarity transformation
that diagonalises both of them:
Da D U 1 AU and Db D U 1 BU :
Da Db D Db Da H) U 1 AUU 1 BU D U 1 BUU 1 AU
H) U 1 ABU D U 1 BAU H) AB D BA ;
i.e. the vector y D Bxj must also be an eigenvector of A with the same eigenvalue
j as xj . Since this is a unique eigenvalue, this is only possible if y is proportional
to the xj , i.e. y D ˇxj . This means that Bxj D ˇxj , i.e. xj is also an eigenvector
of B. Hence, both matrices A and B share the same eigenvalues and eigenvectors,
and hence the same modal matrix U D x1 xp will diagonalise both of them.
If A has a d-fold (d > 1) repeated eigenvalue , then Bxj must be a linear
combination of all eigenvectors associated with this particular eigenvalue. For
simplicity let us assume that the first d eigenvectors of A correspond to the same
eigenvalue . Then, for any j between 1 and d,
X
d
Bxj D cjk xk ; j D 1; : : : ; d ;
kD1
with some coefficients cjk forming a square d d matrix C. This matrix can easily
be seen to be Hermitian. Indeed,
Xd X
d
xi ; Bxj D cjk .xi ; xk / D cjk ıik D cji
kD1 kD1
84 1 Elements of Linear Algebra
and also
Xd
Xd
Bxj ; xi D xj ; B xi D xj ; Bxi D cik xj ; xk D cik ıjk D cij ;
kD1 kD1
since B D B . However, Bxj ; xi D xi ; Bxj , from which it immediately follows
that cji D cij , i.e. C D C , i.e. it is indeed a Hermitian matrix. Once C is Hermitian,
it can be diagonalised, i.e. there exists a d d unitary matrix S D sij such that
C D S DS with D being a diagonal matrix of eigenvalues P fk g of C.
Now it is easy to see that the vectors yi D dkD1 sik xk (with the coefficients sik
from S) are eigenvectors of both A and B. Indeed,
X X X
Ayi D sik Axk D sik xk D sik xk D yi
k k k
and
X X X X
Byi D sik Bxk D sik ckl xl D .SC/il xl
k k l l
X X X
D SS DS il xl D .DS/il xl D i sil xl D i yi :
l l l
Therefore, even in the case of repeated eigenvalues there exists a common set of
eigenvectors for both matrices. Repeating this procedure for all repeated eigenvec-
tors, we can collect all eigenvectors which are common to both matrices. Therefore,
collecting all eigenvectors
(corresponding
both to repeated and distinct eigenvalues)
into a matrix U D y1 yp (we should use yj D xj in the case of a distinct j-th
eigenvalue) and following the statement of Theorem 1.9, we conclude that one can
diagonalise both A and B at the same time via the same similarity transformation:
Da D U 1 AU and Db D U 1 BU. This proves the theorem. Q.E.D.
Example 1.22. I Show that the matrices
31 12
AD and B D
13 21
Then, in order to find the required similarity transformation, we should find the
modal matrix of one of the matrices. For instance, consider A. The secular equation,
1.2 Matrices: Definition and Properties 85
ˇ ˇ
ˇ3 1 ˇ
ˇ ˇ 2 2
ˇ 1 3 ˇ D .3 / 1 D 6 C 8 D 0 ;
Problem 1.75. Show that any symmetric matrix A D aij can be written via
its eigenvalues i and eigenvectors xi D .xki / as a sum
X
X
AD i xi xi or aij D k xki xkj : (1.88)
i k
Here in xik and xkj the right index corresponds to the eigenvector number, while
the left to its component.
One can see that any Hermitian (symmetric) matrix can actually be written as a
sum over its all eigenvalues and eigenvectors (the so-called spectral theorem). This
theorem opens a way to define functions of matrices. Indeed, consider a square of a
symmetric matrix A as the matrix product AA:
Since the matrix D is diagonal, the product DD D D2 is also a diagonal matrix with
the squares of the eigenvalues of A on its main diagonal, i.e.
0 2 1
1 0
D2 D DD D @ A :
0 2p
One can see that A2 has the same form as A of Eq. (1.85), but with squares of its
eigenvalues (instead of the eigenvalues themselves) in the sum. Therefore, we can
immediately write
X
A2 D 2i xi xi :
i
Similarly, one can show (e.g. by induction) that for any power n,
X
An D ni xi xi :
i
p Xp 1 X 1
A D A1=2 D i xi xi or p D A1=2 D p xi xi :
i
A i
i
Show that eigenvalues 1;2 and the corresponding normalised eigenvectors x1;2
of H are
q
1 2 2 1 T
1 D 1C 2C . 1 2 / C 4 jTj and x1 D q ;
2 a
a2 C jTj2
q
1 2 2 1 T
2 D 1C 2 . 1 2 / C 4 jTj and x2 D q ;
2 b
b2 C jTj2
where
q
1 2 2
aD 1 2 . 1 2 / C 4 jTj and
2
q
1
bD 1 2 C . 1 2/
2
C 4 jTj2 :
2
Then verify explicitly that the two eigenvectors are orthogonal and that H can
be indeed written in its spectral form as H D 1 x1 x1 C 2 x2 x2 .
88 1 Elements of Linear Algebra
Problem 1.79. The density matrix written via its eigenvalues and eigenvec-
tors X is
X
D X X :
Show that if the density matrix is idempotent, i.e. 2 D , then one eigenvalue
is equal to one and all others to zero, i.e. in this case D X0 X0 , where 0 is
the only non-zero eigenvalue.
Special significance in physics has the matrix G.z/ D .zEA/1 , called resolvent
of a Hermitian matrix A; it is a function of (generally complex) number z. It is
defined via the scalar function f .x/ D .z x/1 D 1=.z x/. Using the spectral
theorem, we can write down G.z/ explicitly as
X 1
X xi x
G.z/ D xi xi D i
: (1.91)
i
z i i
z i
Problem 1.80. Show that in order for Q to be real, the matrix A must be
Hermitian, A D A.
1.2 Matrices: Definition and Properties 89
Therefore, we shall assume in this section that A is Hermitian. Note that if the
matrix A and the variables xi are real, then A in Eq. (1.92) can always be made
symmetric, i.e. satisfying aij D aji . Indeed, if the form Q contains an expression
axi xj C bxj xi with different coefficients, a ¤ b, it is always possible to write this
sum as cxi xj C cxj xi with c D .a C b/ =2.
It is sometimes necessary to find a special linear combination Y D UX of the
original variables X, such that the quadratic form is diagonal in the new variables Y,
i.e. it does not contain off-diagonal elements at all (the so-called canonical form):
X
QD ci jyi j2 : (1.93)
i
Solution. First, write Q using matrix notations paying specific attention to the off-
diagonal elements:
T T
x 34 x x x
Q D 3x2 C8xyC3y2 D 3x2 C4xyC4yxC3y2 D D A :
y 43 y y y
34
The symmetric matrix A D has two eigenvalues 1 D 1 and 2 D 7 and
43
1 1 1 1
the corresponding normalised eigenvectors are u1 D p and u2 D p ,
2 1 2 1
so that the modal matrix
90 1 Elements of Linear Algebra
p p
1=p 2 1=p2 1 0
UD ; and hence U AU D
T
:
1= 2 1= 2 0 7
Thus, the new variables are
p p p
y1 1=p 2 1=p2 x y1 D .x C y/=p 2
YD DU XDT
H) ;
y2 1= 2 1= 2 y y2 D .x C y/= 2
and the quadratic form Q in the new variables reads Q D y21 C 7y22 , i.e. it has the
canonical form. Note that by a direct substitution of the new variables y1 and y2 into
the Q it is returned back into its original form via x and y. J
It can easily be seen that any unitary transformation of the modal matrix of A
also diagonalises a quadratic form based on it. Indeed, consider Q D X AX and
A D UDU with U being the modal matrix of A. Since A is Hermitian, U can
always be chosen unitary. If V is another unitary matrix, then the transformation
UV also diagonalises Q. Indeed, define auxiliary variables Y D .UV/ X D V U X,
then X D UVY, so that
Q D X AX D .UVY/ A .UVY/ D Y V U AUVY D Y V U AU VY
D Y V DVY D .VY/ D .VY/ D Z DZ ;
which appears to be in the canonical form with respect to the new variables
Z D VY D VV U X D U X since V is a unitary matrix. Thus, V disappears
completely from our final result, and the transformed Q is the same as given by the
modal matrix alone.
Therefore, any Hermitian matrix can be diagonalised with a similarity transfor-
mation. The latter is defined up to a unitary transformation.
Problem 1.81. Show that the quadratic form Q D 2x12 C 2x1 x2 C 2x22 is Q D
y21 C 3y22 in the canonical form, and find the new variables yi via the old ones.
[Answer: y1 D p1 .x1 C x2 / and y2 D p1 .x1 x2 /.]
2 2
Problem 1.82. Show that the quadratic form Q D x12 C 2x1 x2 C x22 can
be brought into the diagonal form Q D 2y22 by means of an orthogonal
transformation.
We have formulated the sufficient condition for a function of two variables to have
a minimum or a maximum at a point r0 D .x0 ; y0 / in Sect. I.5.10. Here we shall
generalise this result to the case of a function of n variables.
1.2 Matrices: Definition and Properties 91
where x D x x0 (i.e. xi D xi xi0 ). If the point x0 is a stationary point, the first
order partial derivatives are all equal to zero, so that the first non-vanishing term in
the above expansion is the one containing second derivatives. We shall now write
y.x/ above using matrix notations:
1
y D y x0 C x y x0 D xT Hx C ; (1.94)
2
where H D hij is the Hessian8 matrix of second derivatives, which is symmetric
due to the well-known property of mixed derivatives (Sect. I.5.3):
2 2
@y @y
hij D D D hji :
@xi @xj x0 @xj @xi x0
The change y of the function y.x/ is a quadratic form with respect to the changes
of the variables, xi . By choosing a proper transformation of the vector x into
new variables z which diagonalises the matrix H, we obtain
1X
y D j z2j C ; (1.95)
2 j
where j are eigenvalues of the Hessian matrix. This is our final result: it shows that
if all eigenvalues of the Hessian matrix at the stationary point x0 are positive, then a
small deviation from the stationary point x0 ! x D x0 C x can only increase the
function y.x/, i.e. the stationary point is a minimum. If, however, all the eigenvalues
j are negative, then x0 corresponds to a maximum instead. If at least one of the
eigenvalues has a different sign to others, this is neither minimum nor maximum.
Problem 1.83. Consider a function z D z.x; y/of two variables and consider
ac
conditions for the Hessian matrix H D to have both its eigenvalues
cb
positive. Hence demonstrate
that the sufficient conditions for the function to
have a minimum at point x0 ; y0 are the same as those derived in Sect. I.5.10.
8
Named after Ludwig Otto Hesse.
92 1 Elements of Linear Algebra
Traces of matrices are frequently used in physics, e.g. quantum statistical mechanics
is based on them. They possess a number of useful and important properties which
we shall consider here.
Firstly, the trace of a product of matrices is invariant (i.e. does not change) under
any cyclic permutation of them, i.e.
Tr .AB/ D Tr .BA/ or Tr .ABC/ D Tr .BCA/ ; but Tr .ABC/ ¤ Tr .ACB/ in general!
as required. This result can be used to prove the statement for three (and more)
matrices. We shall consider the case of three for simplicity, a more general case can
be considered similarly:
where D D BC and we have used just proven statement for two matrices. Similarly,
Therefore,
Indeed, let the square matrix A has fi g as its eigenvalues. Then, there exists a
modal matrix U that diagonalises A, i.e. A D U 1 DU with the diagonal matrix D
containing all eigenvalues i on its main diagonal. Therefore,
X
Tr .A/ D Tr U 1 DU D Tr UU 1 D D Tr .D/ D i ;
i
as required.
Problem 1.84. Show that the trace of a matrix A does not change after a
similarity transformation, i.e.
Tr A D Tr U 1 AU :
Calculate the products of matrices ABC, CAB and BAC and then the traces of
them. Compare your results and explain your findings.
Problem 1.87. Using the spectral representation of a non-singular Hermitian
matrix A, prove the following identity:
xQ 2 D Ax1 ˛1 x1 : (1.99)
Let us choose the parameter ˛1 in such a way that the vector xQ 2 be orthogonal to
x1 . Calculating the dot product of both sides of the above equation with x1 , i.e.
multiplying the equation from the left by x1 , we have
x1 xQ 2 D x1 Ax1 ˛1 x1 x1 H) x1 xQ 2 D x1 Ax1 ˛1 :
It is seen that x1 xQ 2 D 0 if
˛1 D x1 Ax1 : (1.100)
The vector xQ 2 may not be of unit length; therefore, we normalise it to unity, i.e. we
introduce a (real) scaling factor ˇ1 such that the vector x2 D xQ 2 =ˇ1 be of unit length:
x2 x2 D 1. Obviously, ˇ12 D xQ 2 xQ 2 . However, this expression for ˇ1 is not really
useful. Another expression for the parameter ˇ1 can formally be derived which is
directly related to the vectors x2 and x1 . Indeed, Eq. (1.99) can be rewritten as
ˇ1 x2 D Ax1 ˛1 x1 : (1.101)
Then, multiplying both sides of (1.101) from the left by x2 , we obtain
ˇ1 x2 x2 D x2 Ax1 ˛1 x2 x1 :
Since the two vectors are orthogonal and x2 is normalised to unity, we have ˇ1 D
x2 Ax1 . Since ˇ1 is real and A Hermitian, we can also write
ˇ1 D x2 Ax1 D x1 Ax2 : (1.102)
Next, we construct the third vector x3 using a linear combination of the vector
Ax2 and the vectors x1 and x2 :
ˇ2 x3 D Ax2 ˛2 x2 ˇ1 x1 : (1.103)
It is seen that x2 x3 D 0 if ˛2 D x2 Ax2 is chosen. Similarly, we can find that x3 and
x1 are orthogonal by this construction automatically:
ˇ2 x1 x3 Dx1 Ax2 ˛2 x1 x2 ˇ1 x1 x1 H) ˇ2 x1 x3 Dx1 Ax2 ˇ1 H) x2 x3 D 0 ;
because of the expression (1.102) for ˇ1 . Finally, ˇ2 is chosen in such a way that the
vector x3 be of unit length. This constant formally satisfies the equation ˇ2 D x3 Ax2
which can be obtained by multiplying Eq. (1.103) from the left by x3 :
ˇ2 x3 x3 D x3 Ax2 ˛2 x3 x2 ˇ1 x3 x1 H) ˇ2 x3 x3 D x3 Ax2 ;
since x3 has already been made orthogonal to x1 and x2 , and its rescaling cannot
change that.
The next vector, x4 , is obtained from x2 and x3 via
ˇ3 x4 D Ax3 ˛3 x3 ˇ2 x2 : (1.104)
Problem 1.88. Show that by choosing ˛3 D x3 Ax3 , the vector x4 is made
orthogonal to both x2 and x3 . Next, demonstrate that ˇ3 D x3 Ax2 D x2 Ax3
if x4 is to be of unit length.
the last step was legitimate because of the orthogonality of x3 with x2 and of x2 with
x1 , ensured by the previous steps of the procedure. Next, from Eq. (1.101) it follows
that Ax1 D ˛1 x1 C ˇ1 x2 , i.e. Ax1 is a linear combination of x1 and x2 . However, x3
is orthogonal to both x1 and x2 which ensures that x4 is orthogonal to x1 :
ˇ3 x4 x1 D x3 Ax1 H) ˇ3 x4 x1 D x3 .˛1 x1 C ˇ1 x2 / H) x4 x1 D 0 ;
as required. Since x3 Ax1 was found to be proportional to x4 x1 , we also conclude that
x3 Ax1 D 0.
The subsequent vectors x5 , : : :, xn are obtained in a similar way by using at each
step two previously constructed vectors, and all vectors constructed in this way form
an orthonormal set, i.e. they are mutually orthogonal and are all of unit length.
96 1 Elements of Linear Algebra
We are now ready to formulate the general procedure. Starting from a unit vector
x1 , a vector x2 is constructed using (1.101) with the constants ˛1 and ˇ1 satisfying
Eqs. (1.100) and (1.102). Then, each subsequent vector xiC1 for i D 2; 3; ; : : : ; n 1
is built using the following rule:
It is seen that each consecutive vector is obtained from two preceding ones.
the vector xiC1 is made orthogonal to xi1 and xi . Then, demonstrate that the
scaling factor
ˇi1 D xi1 Axi D xi Axi1 : (1.107)
Thus, the matrix A and the unit vector x1 generate a set of n mutually ˚ orthogonal
unit vectors x1 ; : : : ; xn . By taking a different first vector, another set x0i of n vectors
is generated. These new vectors belong to the same n-dimensional vector space,
are orthogonal to each other (and hence are linearly independent) and therefore are
obtained by a linear combination of the vectors from the first set:
X
n
x0i D wij xj ;
jD1
where expansion coefficients wij form a square matrix W. Since both sets are
orthonormal, the matrix U must be unitary (Sect. 1.2.5).
The Lanczos procedure described above has an interesting implication. Let us
construct a square matrix U D .x1 x2 xn / D uij by placing the vectors
x1 ; x2 ; : : : ; xn generated using the Lanczos algorithm as its columns. Obviously,
the
uij element of U is then equal to the i-th component of the vector xj , i.e. uij D xj i .
Recall that in Sect. 1.2.10.3 we built the modal matrix in exactly the same way.
Since the vectors x1 ; x2 ; : : : ; xn are orthonormal, the matrix U is unitary. We shall
now show that the similarity transformation with the matrix U results in a matrix
1.2 Matrices: Definition and Properties 97
0 1
˛1 ˇ1
B ˇ1 ˛2 ˇ2 C
B C
B ˇ ˛ ˇ C
B 2 3 3 C
B :: :: :: C
B : : : C
B C
B C
B ˇi1 ˛i ˇi C
U AU D T D B C: ; (1.109)
B ˇi ˛iC1 ˇiC1 C
B C
B :: :: :: C
B : : : C
B C
B ˇn3 ˛n2 ˇn2 C
B C
@ ˇn2 ˛n1 ˇn1 A
ˇn1 ˛n
where at the last step we returned back to the vector and matrix notations. The
numbers xi Axj are however all equal to zero if the indices i and j differ by more than
one according to Eq. (1.108). The diagonal elements tii D xi Axi coincide with ˛i ,
see Eq. (1.100) and (1.106), and the nearest off-diagonal elements ti1;i D xi1 Axi
and ti;i1 D xi Axi1 both coincide with ˇi1 , see Eqs. (1.102) and (1.107). This
finally proves formula (1.109).
Fig. 1.7 Examples of partitioning of two square p p matrices into blocks: (a) the matrix A is
divided into four blocks A11 , A12 , A21 and A22 with their dimensions indicated, where n1 C n2 D p,
and (b) B is split into nine blocks with n1 C n2 C n3 D p
98 1 Elements of Linear Algebra
matrix; other blocks are defined similarly. Note that with this partition the 22 block
A22 is also a square n2 n2 matrix, while the blocks A12 and A21 are rectangular
matrices if n1 ¤ n2 . Similarly, more partitions can be made; an example of nine
blocks (three partitions along each side) is shown in Fig. 1.7(b).
It is useful to be aware of the fact that one
can operate with blocks as with
matrix
elements. Consider a p p matrix A D aij and another matrix B D bij of the
same dimension, and let us introduce for each of them the same block structure as
shown in Fig. 1.7(a). Then, the product of the two matrices C D AB will also be a
p p matrix for which the same partition can be made. Then, one can write
C11 D A11 B11 C A12 B21 ; C12 D A11 B12 C A12 B22 ;
C21 D A21 B11 C A22 B21 ; C22 D A21 B12 C A22 B22 ;
Problem 1.91. Prove the above identities by writing down explicitly all matrix
multiplications using elements aij , bij and cij of the matrices A, B and C.
Problem 1.92. Consider an inverse G D A1 of the matrix A from the previous
Problem. By writing explicitly in blocks the identity AG D E, show that the
blocks of G can be written as follows:
1
G11 D A11 A12 A1 22 A21 ; G21 D A1
22 A21 G11 ;
1
G22 D A22 A21 A1 11 A12 ; G12 D A1
11 A12 G22 : (1.111)
E11 0
Problem 1.95. Consider a 2 2 block matrix X D . Prove that
X21 X22
jXj D jX22 j.
Problem 1.96. Using the matrix decomposition Eq. (1.112) and the result of
the previous Problem, show that
ˇ ˇ
ˇ X11 X12 ˇ ˇ ˇ
ˇ ˇ ˇ 1 ˇ
ˇ X21 X22 ˇ D jX11 j X22 X21 X11 X12 : (1.113)
dv
m D q .v B/ :
dt
We would like to know the trajectory of the particle r.t/ and its velocity v.t/ as a
function of time subject to the known initial conditions, r.0/ and v.0/.
In fact, there are three equations for each of the Cartesian components of the
velocity, i.e. we have a system of three linear differential equations:
8
< mvP 1 D q .v2 B3 v3 B2 /
mvP D q .v3 B1 v1 B3 / :
: 2
mvP 3 D q .v1 B2 v2 B1 /
To solve the problem, we shall rewrite the equations as a single matrix equation:
8 0 1 0 10 1
< mvP 1 D q .v2 B3 v3 B2 / v
d @ 1A q@
0 B3 B2 v1
mvP 2 D q .v3 B1 v1 B3 / H) v2 D B3 0 B1 A @ v2 A ;
: dt m
mvP 3 D q .v1 B2 v2 B1 / v3 B2 B1 0 v3
(1.114)
i.e.
0 1
0 B3 B2
dv q@
D Gv ; where G D B3 0 B1 A : (1.115)
dt m
B2 B1 0
100 1 Elements of Linear Algebra
To solve this matrix equation, we shall borrow the solution method from the one-dimensional analogue of this problem, which is the corresponding ordinary differential equation. Indeed, the equation $\dot{v} = gv$ has an exponential solution, $v(t) = v_0e^{gt}$. Therefore, we attempt to solve the corresponding matrix (three-dimensional) equation using the trial solution $\mathbf{v}(t) = \mathbf{u}e^{\lambda t}$, with $\mathbf{u}$ and $\lambda$ being an unknown vector and scalar. Using the trial solution in the equation of motion, we first calculate its time derivative, $d\mathbf{v}/dt = \lambda\mathbf{u}e^{\lambda t}$, and then, substituting it into Eq. (1.115), we obtain the eigenproblem

$$G\mathbf{u} = \lambda\mathbf{u} \qquad (1.116)$$

for the matrix G. Let the field be directed along the z axis, $\mathbf{B} = (0, 0, B)$, and introduce $\omega = qB/m$. Then the characteristic equation reads $|G - \lambda E| = -\lambda\left(\lambda^2 + \omega^2\right) = 0$, which gives three eigenvalues: $\lambda = 0, i\omega, -i\omega$. As expected, they are purely imaginary or zero. The normalised eigenvectors (generally complex) are easily obtained to be

$$\mathbf{u}_1 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \quad \mathbf{u}_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \\ 0 \end{pmatrix}, \quad \mathbf{u}_3 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \\ 0 \end{pmatrix}.$$
Therefore, a general solution of the system of three linear differential equations can be written as a linear combination of the three elementary solutions with three arbitrary constants $c_1$, $c_2$ and $c_3$:

$$\mathbf{v}(t) = c_1\mathbf{u}_1 + c_2\mathbf{u}_2e^{i\omega t} + c_3\mathbf{u}_3e^{-i\omega t}. \qquad (1.117)$$

Note that the first term is not time dependent as it corresponds to the zero eigenvalue. Since $\mathbf{u}_2$ and $\mathbf{u}_3$ lie within the x, y plane and $\mathbf{u}_1$ is directed along z, we may anticipate that if the magnetic field is along the z direction the particle moves with a constant speed along the z axis and performs an oscillatory motion in the x, y plane, i.e. perpendicular to the magnetic field.
To see this explicitly, we need to apply the initial conditions which determine the undefined constants. Let us assume that $\mathbf{v}(0) = (0, v_\perp, v_\parallel)$, i.e. the particle enters the field with a velocity $v_\parallel$ parallel to the field $\mathbf{B} = (0, 0, B)$ and a velocity $v_\perp$ perpendicular to it. Then at $t = 0$ we obtain

$$\begin{pmatrix} 0 \\ v_\perp \\ v_\parallel \end{pmatrix} = c_1\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + \frac{c_2}{\sqrt{2}}\begin{pmatrix} 1 \\ i \\ 0 \end{pmatrix} + \frac{c_3}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \\ 0 \end{pmatrix},$$

which can be solved to give: $c_1 = v_\parallel$, $c_2 = -c_3 = -iv_\perp/\sqrt{2}$. Substituting these constants into solution (1.117), we obtain
$$\mathbf{v}(t) = \begin{pmatrix} 0 \\ 0 \\ v_\parallel \end{pmatrix} - \frac{iv_\perp}{2}\begin{pmatrix} 1 \\ i \\ 0 \end{pmatrix}e^{i\omega t} + \frac{iv_\perp}{2}\begin{pmatrix} 1 \\ -i \\ 0 \end{pmatrix}e^{-i\omega t}$$
$$= \begin{pmatrix} 0 \\ 0 \\ v_\parallel \end{pmatrix} + \frac{v_\perp}{2}\begin{pmatrix} -ie^{i\omega t} + ie^{-i\omega t} \\ e^{i\omega t} + e^{-i\omega t} \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ v_\parallel \end{pmatrix} + \frac{v_\perp}{2}\begin{pmatrix} 2\sin(\omega t) \\ 2\cos(\omega t) \\ 0 \end{pmatrix} = \begin{pmatrix} v_\perp\sin(\omega t) \\ v_\perp\cos(\omega t) \\ v_\parallel \end{pmatrix}. \qquad (1.118)$$
To obtain the position vector $\mathbf{r}(t)$ of the particle, we should integrate the velocity vector:

$$\mathbf{r}(t) = \int_0^t\mathbf{v}(t_1)\,dt_1 = \begin{pmatrix} (v_\perp/\omega)\left[1 - \cos(\omega t)\right] \\ (v_\perp/\omega)\sin(\omega t) \\ v_\parallel t \end{pmatrix}, \qquad (1.119)$$

where we have assumed that initially the particle was in the centre of the coordinate system, $\mathbf{r}(0) = 0$. Thus, indeed, the particle performs a circular motion in the plane perpendicular to the magnetic field and, at the same time, moves with the constant speed along the field direction. The circle radius is $R = v_\perp/\omega = mv_\perp/qB$ and the rotation frequency is $\omega = qB/m$.
The kinetic energy of the particle,

$$K(t) = \frac{m}{2}\left(v_1^2 + v_2^2 + v_3^2\right) = \frac{m}{2}\left(v_\perp^2 + v_\parallel^2\right) = K(0),$$

is conserved since, as is well known, the magnetic field does not do any work on the particle.
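The analytic result (1.118) can be cross-checked numerically. The sketch below is our illustration (not the book's code); the values of q, m, B and the initial velocity are arbitrary. It integrates $d\mathbf{v}/dt = G\mathbf{v}$ directly and compares with the circular-motion formula:

```python
# Numerical cross-check of Eq. (1.118) for B = (0, 0, B).
import numpy as np
from scipy.integrate import solve_ivp

q, m, B = 1.0, 2.0, 3.0
omega = q * B / m
G = (q / m) * np.array([[0.0,  B,  0.0],
                        [-B,  0.0, 0.0],
                        [0.0, 0.0, 0.0]])
v_perp, v_par = 0.5, 0.2
v0 = np.array([0.0, v_perp, v_par])

sol = solve_ivp(lambda t, v: G @ v, (0.0, 10.0), v0,
                dense_output=True, rtol=1e-10, atol=1e-12)
t = np.linspace(0.0, 10.0, 200)
v_num = sol.sol(t)
v_ana = np.vstack([v_perp * np.sin(omega * t),
                   v_perp * np.cos(omega * t),
                   v_par * np.ones_like(t)])
print(np.max(np.abs(v_num - v_ana)))   # tiny: the two solutions agree
```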
Problem 1.97. Assume a general direction of the magnetic field. Show that the general solution for the velocity in this case is

$$\mathbf{v}(t) = c_1\begin{pmatrix} \omega_1 \\ \omega_2 \\ \omega_3 \end{pmatrix} + c_2\begin{pmatrix} \omega_1\omega_3 - i\omega\omega_2 \\ \omega_2\omega_3 + i\omega\omega_1 \\ \omega_3^2 - \omega^2 \end{pmatrix}e^{i\omega t} + c_3\begin{pmatrix} \omega_1\omega_3 + i\omega\omega_2 \\ \omega_2\omega_3 - i\omega\omega_1 \\ \omega_3^2 - \omega^2 \end{pmatrix}e^{-i\omega t},$$

where $\omega_i = eB_i/m$.
Problem 1.98. Solve the following system of linear differential equations:

$$\dot{x}_1 = x_1 + 2x_2, \qquad \dot{x}_2 = 2x_1 + x_2,$$

by writing them first in the matrix form $\dot{X} = DX$ with $X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$, then applying a trial solution $X(t) = Ye^{\lambda t}$ to obtain an eigenproblem for the matrix D that should give $\lambda$ and Y as its eigenvalues and eigenvectors. Hence, construct the general solution $X(t)$ by combining the two elementary solutions with arbitrary constants. Finally, find the particular solution that satisfies the following initial conditions: $x_1(0) = 0$ and $x_2(0) = 1$. [Answer: $x_1(t) = \left(e^{3t} - e^{-t}\right)/2$, $x_2(t) = \left(e^{3t} + e^{-t}\right)/2$.]
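One possible way to check the quoted answer is sketched below (helper code of ours, not the book's): diagonalise D, expand the initial condition in its eigenvectors and propagate each component with $e^{\lambda t}$:

```python
# Solving Problem 1.98 by the eigenvalue method and checking the answer.
import numpy as np

D = np.array([[1.0, 2.0],
              [2.0, 1.0]])
lam, Y = np.linalg.eig(D)          # eigenvalues 3 and -1

X0 = np.array([0.0, 1.0])          # initial conditions x1(0)=0, x2(0)=1
c = np.linalg.solve(Y, X0)         # coefficients in the eigenvector basis

t = np.linspace(0.0, 1.0, 5)
X = (Y * c) @ np.exp(np.outer(lam, t))   # X(t) = sum_k c_k Y_k exp(lam_k t)
x1_exact = (np.exp(3 * t) - np.exp(-t)) / 2
x2_exact = (np.exp(3 * t) + np.exp(-t)) / 2
assert np.allclose(X[0], x1_exact) and np.allclose(X[1], x2_exact)
```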
Problem 1.99. Similarly, solve the system of equations

$$\dot{x}_1 = 3x_1 - 2x_2, \qquad \dot{x}_2 = 2x_1 - x_2.$$
Fig. 1.9 Rectangular islands of two phases growing on a surface of a crystal (Problem 1.103). Between the islands is a mobile phase consisting of mobile molecules which can attach (with the rate $k_0$) to any of the islands and/or detach (with the rates $k_1$ and $k_2$, respectively) from any of them. However, it is assumed in the Problem that the rate $k_2 \ll k_1$ and hence the process of detachment from the islands of phase 2 is neglected

Show that the solution of these equations subject to the initial condition that initially ($t = 0$) only $N_0$ free molecules existed is
$$N_1(t) = \frac{N_0k_0}{\gamma}\left(e^{\gamma_-t} - e^{\gamma_+t}\right),$$
$$N_f(t) = \frac{N_0}{\gamma}\left[\left(k_1 + \gamma_-\right)e^{\gamma_-t} - \left(k_1 + \gamma_+\right)e^{\gamma_+t}\right],$$
$$N_2(t) = N_0\left\{1 + \frac{1}{\gamma k_1}\left[\gamma_+\left(k_1 + \gamma_-\right)e^{\gamma_-t} - \gamma_-\left(k_1 + \gamma_+\right)e^{\gamma_+t}\right]\right\},$$

where

$$\gamma_\pm = -\frac{1}{2}\left(k_1 + 2k_0 \pm \gamma\right), \qquad \gamma = \sqrt{k_1^2 + 4k_0^2}.$$
Prove that both eigenvalues $\gamma_\pm < 0$ and that the first phase and the free molecules completely disappear with time, while the second phase consumes all the molecules, i.e. $N_1(\infty) = N_f(\infty) = 0$, while $N_2(\infty) = N_0$. Explain this result.
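Since the full statement of the Problem is not reproduced here, the following sanity check (ours) assumes the rate equations $\dot{N}_1 = k_0N_f - k_1N_1$, $\dot{N}_2 = k_0N_f$ and $\dot{N}_f = -2k_0N_f + k_1N_1$, which reproduce the eigenvalues $\gamma_\pm$ quoted above; it integrates them numerically and compares $N_1(t)$ with its closed form:

```python
# Numerical sanity check of the island-growth kinetics (assumed equations).
import numpy as np
from scipy.integrate import solve_ivp

k0, k1, N0 = 0.7, 1.3, 1.0
gamma = np.sqrt(k1**2 + 4 * k0**2)
gp = -(k1 + 2 * k0 + gamma) / 2        # gamma_plus
gm = -(k1 + 2 * k0 - gamma) / 2        # gamma_minus

def rhs(t, y):
    N1, N2, Nf = y
    return [k0 * Nf - k1 * N1, k0 * Nf, -2 * k0 * Nf + k1 * N1]

sol = solve_ivp(rhs, (0.0, 5.0), [0.0, 0.0, N0],
                dense_output=True, rtol=1e-10)
t = np.linspace(0.0, 5.0, 100)
N1_exact = (N0 * k0 / gamma) * (np.exp(gm * t) - np.exp(gp * t))
assert np.allclose(sol.sol(t)[0], N1_exact, atol=1e-7)
```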
$$m_i\ddot{x}_i = F_i = -\sum_{j=1}^{n}\Phi_{ij}x_j, \qquad (1.122)$$

where $m_i$ is the mass associated with the degree of freedom i. The left-hand side gives mass times acceleration, and the notation $\ddot{x}_i$ means the second derivative of $x_i$ with respect to time.
Introducing obvious vector and matrix notations,

$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad \Phi = \begin{pmatrix} \Phi_{11} & \cdots & \Phi_{1n} \\ \vdots & \Phi_{ij} & \vdots \\ \Phi_{n1} & \cdots & \Phi_{nn} \end{pmatrix} \qquad \text{and} \qquad M = \begin{pmatrix} m_1 & & 0 \\ & \ddots & \\ 0 & & m_n \end{pmatrix},$$

we can rewrite these equations as the single matrix equation

$$M\ddot{x} = -\Phi x. \qquad (1.123)$$
This is a system of linear second order differential equations. Since the motion is oscillatory, we anticipate that the displacements behave as $x_i(t) \sim e^{i\omega t}$ in time. This also follows from the one-dimensional analogue of this equation, $m\ddot{x} = -kx$, whose solution is $x \sim e^{i\omega t}$. Therefore, we substitute into Eq. (1.123) a trial solution of the form $x(t) = ue^{i\omega t}$ with some unknown scalar $\omega$ and a constant vector u. Since $\ddot{x} = (i\omega)^2ue^{i\omega t} = -\omega^2ue^{i\omega t}$, we obtain an equation:

$$\omega^2Mu = \Phi u. \qquad (1.124)$$
In order to solve this equation, we perform some matrix manipulations. Let us define a matrix $M^{1/2}$ in such a way that its square is equal to M (in this particular case $M^{1/2}$ is a diagonal matrix with $\sqrt{m_i}$ on its main diagonal). Then, the matrix $M^{-1/2}$ is the inverse of $M^{1/2}$, i.e. $M^{-1/2}M^{1/2} = E$; it contains $1/\sqrt{m_i}$ on its main diagonal. Now, multiply Eq. (1.124) from the left by $M^{-1/2}$ and insert the unit matrix $E = M^{1/2}M^{-1/2}$ between $\Phi$ and u in the right-hand side:

$$\omega^2M^{-1/2}Mu = M^{-1/2}\Phi M^{-1/2}\left(M^{1/2}u\right) \;\Longrightarrow\; \omega^2M^{1/2}u = D\left(M^{1/2}u\right),$$

where

$$D = M^{-1/2}\Phi M^{-1/2} \qquad (1.125)$$

is called the dynamical matrix. Denoting $v = M^{1/2}u$, we arrive at

$$Dv = \omega^2v. \qquad (1.126)$$
This is the central result. It shows that the vibrational problem can be cast into an eigenvector/eigenvalue problem. The squares of the vibrational frequencies, $\lambda^{(\alpha)} = \omega_\alpha^2$, appear as the $\alpha$-th eigenvalue, and the corresponding eigenvector $v^{(\alpha)}$ is directly related to the atomic displacements via $u^{(\alpha)} = M^{-1/2}v^{(\alpha)}$, which are called normal modes. These are collective (synchronised) displacements of many atoms of the system.

Since the matrix D is symmetric, its modal matrix, in which the vectors $v^{(\alpha)} = \left(v_i^{(\alpha)}\right)$ are placed as its columns, is orthogonal. In turn, this means that its rows (or columns) form orthonormal sets of vectors. Let us adopt $\alpha$ as the column index and i as the row index of the modal matrix. If these conditions are written explicitly in components, the following identities are obtained:

$$\sum_i v_i^{(\alpha)}v_i^{(\beta)} = \delta_{\alpha\beta} \quad \text{or} \quad \left(v^{(\alpha)}\right)^Tv^{(\beta)} = \delta_{\alpha\beta}, \qquad (1.127)$$

$$\sum_\alpha v_i^{(\alpha)}v_j^{(\alpha)} = \delta_{ij} \quad \text{or} \quad \sum_\alpha v^{(\alpha)}\left(v^{(\alpha)}\right)^T = E, \qquad (1.128)$$
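The whole recipe of Eqs. (1.124)–(1.128) takes only a few lines numerically; the sketch below (ours, with an invented force-constant matrix and masses) builds the dynamical matrix, diagonalises it and verifies the orthonormality and completeness of the modal matrix:

```python
# A minimal sketch of the recipe around Eq. (1.126).
import numpy as np

m = np.array([1.0, 2.0, 1.0])                  # arbitrary masses
Phi = np.array([[ 2.0, -1.0,  0.0],            # an arbitrary symmetric
                [-1.0,  2.0, -1.0],            # force-constant matrix
                [ 0.0, -1.0,  2.0]])
Msqrt_inv = np.diag(1.0 / np.sqrt(m))

D = Msqrt_inv @ Phi @ Msqrt_inv                # dynamical matrix, symmetric
w2, V = np.linalg.eigh(D)                      # eigenvalues omega_alpha^2,
                                               # orthonormal eigenvectors v
U = Msqrt_inv @ V                              # normal-mode displacements u

# orthonormality (1.127) and completeness (1.128) of the modal matrix:
assert np.allclose(V.T @ V, np.eye(3))
assert np.allclose(V @ V.T, np.eye(3))
print("frequencies:", np.sqrt(np.abs(w2)))
```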
Hence, the general solution of Eq. (1.123) is a linear combination of the normal modes,

$$x(t) = \sum_{\alpha=1}^{n}c_\alpha M^{-1/2}v^{(\alpha)}e^{i\omega_\alpha t}, \qquad (1.129)$$

where $c_\alpha$ are arbitrary constants that should be found from the initial ($t = 0$) conditions for the atomic displacements and their velocities. Since the matrix D is real, its eigenvectors $v^{(\alpha)} = \left(v_i^{(\alpha)}\right)$ are also real. However, the coefficients $c_\alpha$ must be complex to ensure that the vector $x(t)$ is real:

$$x(t) = \sum_{\alpha=1}^{n}M^{-1/2}v^{(\alpha)}\,\mathrm{Re}\left(c_\alpha e^{i\omega_\alpha t}\right) = \sum_{\alpha=1}^{n}M^{-1/2}v^{(\alpha)}g_\alpha(t), \qquad (1.130)$$

where

$$g_\alpha(t) = \mathrm{Re}\left(c_\alpha e^{i\omega_\alpha t}\right) = a_\alpha\cos(\omega_\alpha t) + b_\alpha\sin(\omega_\alpha t),$$

with $a_\alpha$ and $b_\alpha$ being two real arbitrary constants.
As we mentioned above, the eigenvalues $\omega_\alpha^2$ are always real, which is guaranteed by the fact that the dynamical matrix D is symmetric. However, there is no guarantee that they are positive. If all eigenvalues (the frequencies squared) are positive, $\omega_\alpha^2 > 0$, then the frequencies are real and (can be chosen) positive, and the vibrational system is in a stable mechanical equilibrium. If there is at least one negative eigenvalue, $\omega_\alpha^2 < 0$, then the frequency $\omega_\alpha = \pm i|\omega_\alpha|$ is purely imaginary and the corresponding normal mode is no longer sinusoidal: $x^{(\alpha)}(t) \sim e^{\pm|\omega_\alpha|t}$. This means that the system is not stable in this particular atomic configuration and will eventually transform to a different atomic arrangement (e.g. a molecule may dissociate, i.e. break into several parts).
These conclusions about stability can also be directly illustrated on the potential energy itself. As should be clear from Sect. 1.2.13, the potential energy (1.121) is a so-called quadratic form of atomic displacements, which can be brought into a diagonal form (i.e. diagonalised) in terms of the normal modes:

$$V - V(0) = \frac{1}{2}x^T\Phi x = \frac{1}{2}\sum_{\alpha,\beta}\left[M^{-1/2}v^{(\alpha)}g_\alpha(t)\right]^T\Phi\left[M^{-1/2}v^{(\beta)}g_\beta(t)\right]$$
$$= \frac{1}{2}\sum_{\alpha,\beta}g_\alpha(t)g_\beta(t)\left(v^{(\alpha)}\right)^TM^{-1/2}\Phi M^{-1/2}v^{(\beta)}$$
$$= \frac{1}{2}\sum_{\alpha,\beta}g_\alpha(t)g_\beta(t)\left(v^{(\alpha)}\right)^TDv^{(\beta)} = \frac{1}{2}\sum_{\alpha,\beta}g_\alpha(t)g_\beta(t)\,\omega_\beta^2\left[\left(v^{(\alpha)}\right)^Tv^{(\beta)}\right],$$
where at the last step we made use of the fact that the vector $v^{(\beta)}$ is an eigenvector of D with the eigenvalue $\omega_\beta^2$, so that $Dv^{(\beta)} = \omega_\beta^2v^{(\beta)}$. Because of the orthogonality condition, Eq. (1.127), the expression in the square brackets above is equal to $\delta_{\alpha\beta}$, so that we finally obtain

$$V = V(0) + \frac{1}{2}x^T\Phi x = V(0) + \frac{1}{2}\sum_{\alpha=1}^{n}\omega_\alpha^2g_\alpha^2(t). \qquad (1.131)$$
One can clearly see that if all eigenfrequencies are real, i.e. $\omega_\alpha^2 > 0$, then the current equilibrium state is indeed stable: the quadratic form, the potential energy (1.121), is positive definite and hence can only increase due to atomic displacements. If, however, at least one of the $\omega_\alpha$ is complex, then $\omega_\alpha^2 < 0$ and the current state is not stable, as there must be a displacement which would take the potential energy to a value smaller than $V(0)$.
$$E_K = \frac{1}{2}\dot{q}^TK\dot{q} \quad \text{and} \quad E_P = \frac{1}{2}q^TVq, \qquad (1.132)$$

where K and V are symmetric square matrices. Show, considering the energy conservation condition, $E_K + E_P = \mathrm{Const}$, that the motion of the particles is described by the matrix equation $K\ddot{q} + Vq = 0$. Then, assuming an oscillatory motion of frequency $\omega$, i.e. $q(t) = xe^{i\omega t}$, show that the oscillation frequencies of the system normal modes are determined by the equation $\left|V - \omega^2K\right| = 0$.
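As a sketch of this Problem (with invented K and V matrices), note that $|V - \omega^2K| = 0$ is precisely the generalized symmetric eigenproblem $Vx = \omega^2Kx$, which scipy solves directly:

```python
# Frequencies from |V - omega^2 K| = 0 via the generalized eigenproblem.
import numpy as np
from scipy.linalg import eigh

K = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # positive definite kinetic-energy matrix
V = np.array([[3.0, -1.0],
              [-1.0, 2.0]])         # potential-energy matrix

w2, X = eigh(V, K)                  # solves V x = omega^2 K x
print("omega =", np.sqrt(w2))

# check: det(V - omega^2 K) vanishes at each eigenvalue
for lam in w2:
    assert abs(np.linalg.det(V - lam * K)) < 1e-10
```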
Problem 1.105. Consider a linear symmetric triatomic molecule A–B–A with masses m, $\mu m$ and m, respectively, see Fig. 1.10. If $x_1$, $x_2$ and $x_3$ are displacements of the atoms from their equilibrium positions along the molecular axis, then one can write the following expressions for the kinetic and potential energies of the system:

$$E_K = \frac{m}{2}\left(\dot{x}_1^2 + \mu\dot{x}_2^2 + \dot{x}_3^2\right) \quad \text{and} \quad E_P = \frac{k}{2}\left[\left(x_2 - x_1\right)^2 + \left(x_3 - x_2\right)^2\right].$$
Show that the corresponding eigenvectors for each mode are $q_1 = (1, 0, -1)^T$, $q_2 = (1, 1, 1)^T$ and $q_3 = (1, -2/\mu, 1)^T$. Sketch them. What motion does the zero frequency mode correspond to?
Problem 1.106. Here we shall solve the previous problem differently. Using the condition that the centre of mass of the molecule is at rest at the origin, eliminate $x_2$ and thus rewrite both $E_K$ and $E_P$ in the matrix form as

$$E_K = \frac{m}{2}\dot{X}^TK\dot{X} \quad \text{and} \quad E_P = \frac{k}{2}X^T\Phi X, \quad \text{where} \quad X = \begin{pmatrix} x_1 \\ x_3 \end{pmatrix} \quad \text{and} \quad \dot{X} = \begin{pmatrix} \dot{x}_1 \\ \dot{x}_3 \end{pmatrix}.$$

Obtain eigenvalues and eigenvectors of the matrix $\Phi$ and hence find the orthogonal transformation U that diagonalises $\Phi$. Express the new coordinates $Y = \begin{pmatrix} y_1 \\ y_3 \end{pmatrix} = UX$ via the old ones, X. Demonstrate explicitly that the new coordinates $(y_1, y_3)$ are no longer coupled in $E_P$. Show then that the same orthogonal transformation diagonalises the matrix K of the kinetic energy as well. Show that the total energy $E = E_K + E_P$ of the molecule in the new coordinates is the sum of the energies of two independent harmonic oscillators. Hence, determine the two oscillation frequencies of the molecule, $\omega_1$ and $\omega_3$. Make sure they are the same as in Eq. (1.133).
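A numerical illustration of Problem 1.105 (a sketch of ours, with an arbitrary value of $\mu$): diagonalising $\Phi q = \omega^2Mq$ confirms the eigenvalues $\omega^2 = 0$, $k/m$ and $(k/m)(1 + 2/\mu)$, which one can verify correspond to the quoted eigenvectors $q_2$, $q_1$ and $q_3$:

```python
# Normal modes of the symmetric triatomic A-B-A molecule (masses m, mu*m, m).
import numpy as np
from scipy.linalg import eigh

m, mu, k = 1.0, 16.0 / 12.0, 1.0                 # e.g. a CO2-like mass ratio
M = np.diag([m, mu * m, m])
Phi = k * np.array([[ 1.0, -1.0,  0.0],
                    [-1.0,  2.0, -1.0],
                    [ 0.0, -1.0,  1.0]])

w2, Q = eigh(Phi, M)                             # Phi q = omega^2 M q
expected = np.sort([0.0, k / m, (k / m) * (1.0 + 2.0 / mu)])
assert np.allclose(np.sort(w2), expected, atol=1e-12)
```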
Fig. 1.11 (a) An infinite chain of identical atoms of mass m connected with identical springs; the atoms are numbered by an integer index n running between $-\infty$ and $+\infty$. (b) The same chain, but the atom with $n = 0$ was replaced with one having a different mass $\mu m$
$$\frac{d^2}{dt^2}\begin{pmatrix} \vdots \\ x_{n-1} \\ x_n \\ x_{n+1} \\ \vdots \end{pmatrix} = -\begin{pmatrix} \ddots & \ddots & & & \\ \ddots & 2\gamma & -\gamma & & \\ & -\gamma & 2\gamma & -\gamma & \\ & & -\gamma & 2\gamma & \ddots \\ & & & \ddots & \ddots \end{pmatrix}\begin{pmatrix} \vdots \\ x_{n-1} \\ x_n \\ x_{n+1} \\ \vdots \end{pmatrix}, \qquad (1.134)$$

where $\gamma = k/m$. The infinite dimension matrix D has a tridiagonal form, i.e. it has non-zero elements only on the diagonal itself as well as one element to the left and right of it. The matrix D is symmetric, with the elements $d_{ij} = \gamma\left(2\delta_{ij} - \delta_{i,j+1} - \delta_{i,j-1}\right)$. We notice also that $d_{ij}$ depends only on the difference of indices $i - j$, but not on both indices themselves. This is due to the periodicity of the system at equilibrium. Hence, we can write $d_{ij}$ simply as $d_{i-j}$, where $d_n = \gamma\left(2\delta_{n0} - \delta_{n1} - \delta_{n,-1}\right)$.
Now we shall try to solve the equations $\ddot{X} = -DX$. To do this, we shall introduce the so-called periodic boundary conditions: we shall say that the chain repeats itself after a very large number N of atoms, i.e. $x_{n+N} = x_n$ for any n between 0 and $N - 1$. This can be imagined in such a way that atom N would coincide with atom 0, i.e. the chain of N atoms is connected to itself forming a ring, as depicted in Fig. 1.12. This trick allows us to form a set of N (a very large, but finite, number of) equations (1.134), which we shall now attempt to solve.
Using the method of the previous section, we shall attempt a substitution $X(t) = Ye^{i\omega t}$, which results in the eigenvector–eigenvalue problem

$$DY = \omega^2Y \quad \text{or} \quad \sum_j d_{nj}y_j = \omega^2y_n. \qquad (1.135)$$
Let us introduce the unitary $N \times N$ matrix U with the elements $u_{nj} = \frac{1}{\sqrt{N}}e^{i2\pi nj/N}$ and perform the corresponding similarity transformation of D:

$$\left(\tilde{D}\right)_{js} = \left(U^\dagger DU\right)_{js} = \sum_{l,n=0}^{N-1}u_{lj}^*d_{ln}u_{ns} = \frac{1}{N}\sum_{l,n}d_{ln}e^{-i2\pi lj/N}e^{i2\pi ns/N}.$$
Since $d_{ln}$ depends only on the difference of indices, we can introduce a new index $p = l - n$ to replace n, which yields

$$\left(\tilde{D}\right)_{js} = \frac{1}{N}\sum_{l,p}d_pe^{-i2\pi lj/N}e^{i2\pi(l-p)s/N} = \left(\sum_{p=0}^{N-1}d_pe^{-i2\pi sp/N}\right)\left(\frac{1}{N}\sum_{l=0}^{N-1}e^{-i2\pi l(j-s)/N}\right) = \tilde{d}_s\,\delta_{js}. \qquad (1.137)$$
Here the Kronecker symbol appeared since in the second bracket we simply have $\left(U^\dagger U\right)_{js}$ written explicitly (the matrix U is unitary and hence $U^\dagger U = E$). We have also introduced a new quantity

$$\tilde{d}_s = \sum_{p=0}^{N-1}d_pe^{-i2\pi sp/N} = \gamma\sum_{p=0}^{N-1}\left(2\delta_{p0} - \delta_{p1} - \delta_{p,-1}\right)e^{-i2\pi sp/N}$$
$$= \gamma\left(2 - e^{-i2\pi s/N} - e^{i2\pi s/N}\right) = 2\gamma\left(1 - \cos\frac{2\pi s}{N}\right). \qquad (1.138)$$
We see from Eq. (1.137) that the matrix D became diagonal after the similarity transformation, and hence we can immediately get the required eigenvalues appearing in (1.136) as:

$$\omega^2 = 2\gamma\left(1 - \cos\frac{2\pi s}{N}\right) = \frac{2k}{m}\left(1 - \cos\frac{2\pi s}{N}\right) = \frac{2k}{m}\left(1 - \cos qa\right) \quad \text{or} \quad \omega(q) = \sqrt{\frac{4k}{m}}\left|\sin\frac{qa}{2}\right|.$$
There are N eigenvalues corresponding to N possible values of the index $s = 0, 1, \ldots, N - 1$; however, it is more convenient to introduce $q = 2\pi s/(aN)$ (which is the wave vector) instead of s to label different solutions. It changes between 0 (when $s = 0$) and $2\pi/a$ (when $s = N - 1 \simeq N$ as $N \gg 1$). Since the chain is a ring, we can alternatively consider the values of q between $-\pi/a$ and $\pi/a$. This interval is called the Brillouin zone. The wave vector changes almost continuously within the Brillouin zone between these two values as N is very large, i.e. the nearest values of q differ only by $\Delta q = 2\pi/(Na) = 2\pi/L$, where $L = Na$ is the length of the ring. The vibrational frequencies $\omega(q)$ change between zero and the value of $\sqrt{4k/m}$ at the Brillouin zone boundaries (at $q = \pm\pi/a$). The dependence of the oscillation frequency $\omega(q)$ of the chain on the wave vector q is called the dispersion relation.
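The dispersion relation is easy to verify numerically: the sketch below (our illustration; N, k, m and a are arbitrary) diagonalises the $N \times N$ ring matrix D and compares its spectrum with $\omega(q)$:

```python
# Eigenvalues of the periodic tridiagonal matrix vs the dispersion relation.
import numpy as np

N, k, m, a = 64, 1.0, 1.0, 1.0
gam = k / m
D = 2 * gam * np.eye(N) - gam * (np.eye(N, k=1) + np.eye(N, k=-1))
D[0, -1] = D[-1, 0] = -gam              # periodic (ring) boundary conditions

w_numeric = np.sort(np.sqrt(np.abs(np.linalg.eigvalsh(D))))
s = np.arange(N)
q = 2 * np.pi * s / (N * a)
w_analytic = np.sort(np.sqrt(4 * k / m) * np.abs(np.sin(q * a / 2)))
assert np.allclose(w_numeric, w_analytic, atol=1e-10)
```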
Once we have obtained all N eigenvalues $\omega(q)$, we can calculate the corresponding eigenvectors. The simplest choice of orthogonal eigenvectors⁹ is to take the eigenvector $\overline{Y}^{(q)}$ corresponding to $\omega(q)$ as a vector with all components equal to zero apart from the s-th one, which is equal to 1 (here $q = 2\pi s/(aN)$ and s are directly related):

$$\overline{Y}^{(q)} = (\ldots, 0, 1, 0, \ldots)^T, \quad \text{i.e.} \quad \left(\overline{Y}^{(q)}\right)_j = \delta_{js}, \quad \text{where} \quad q = \frac{2\pi s}{L} = s\,\Delta q.$$
Then, for $Y^{(q)} = U\overline{Y}^{(q)}$ we have

$$\left(Y^{(q)}\right)_n = \sum_{j=0}^{N-1}u_{nj}\left(\overline{Y}^{(q)}\right)_j = \frac{1}{\sqrt{N}}e^{i2\pi ns/N} = \frac{1}{\sqrt{N}}e^{iqna},$$

so that the atomic displacements in the mode q are

$$x_n^{(q)}(t) = \left(Y^{(q)}\right)_ne^{i\omega(q)t} = \frac{1}{\sqrt{N}}e^{iqna}e^{i\omega(q)t}. \qquad (1.139)$$
The problem which we have just solved corresponds to a periodic chain: all atoms in the chain are identical and repeat themselves after a “translation” by the distance a (the distance between atoms). In practice, one is sometimes concerned with solving a much more difficult problem of defective systems which have no periodicity. However, before studying such systems, it is instructive first to calculate a special auxiliary object, called the Green's function. It is defined as a resolvent of the dynamical matrix (D in our case):
⁹As the matrix $\tilde{D}$ is diagonal and hence Hermitian, this choice is always possible.
$$G(z) = (zE - D)^{-1} = \sum_s\frac{y_sy_s^\dagger}{z - \omega_s^2}, \qquad (1.140)$$

where $y_s$ (which is the same as $Y^{(q)}$) and $\omega_s^2$ are the s-th (or q-th) eigenvector and eigenvalue of the matrix D, i.e. $Dy_s = \omega_s^2y_s$, and z is a complex number (see also Eq. (1.91)); z is sometimes called the complex “energy”. For the periodic chain the index s counts different solutions of the eigenproblem, but we can use q for that instead. Using the above results, we write for the elements of the matrix $G(z)$:

$$g_{nj}(z) = \sum_q\frac{\left(Y^{(q)}\right)_n\left(Y^{(q)}\right)_j^*}{z - \omega^2(q)} = \frac{1}{N}\sum_q\frac{e^{iq(n-j)a}}{z - \omega^2(q)}. \qquad (1.141)$$
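For a finite ring the sum (1.141) can be evaluated directly; a minimal sketch (ours, with invented parameters) is:

```python
# Lattice Green's function g_nj(z) of the perfect chain by the q-sum (1.141).
import numpy as np

N, k, m, a = 256, 1.0, 1.0, 1.0
s = np.arange(N)
q = 2 * np.pi * s / (N * a)
w2_q = (4 * k / m) * np.sin(q * a / 2) ** 2     # omega^2(q)

def g(n, j, z):
    # g_nj(z) = (1/N) sum_q exp(iq(n-j)a) / (z - omega^2(q))
    return np.mean(np.exp(1j * q * (n - j) * a) / (z - w2_q))

z = 5.0 + 0.0j                                  # z above the band [0, 4k/m]
print(g(0, 0, z))                               # real and positive here
```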
Equipped with the Green's function of the ideal (perfect) chain, we can now consider a more difficult problem of a defective chain. As the simplest example, let us have a look at the chain in which a single 0-th atom of mass m was replaced with an isotope of a different mass $\mu m$, as in Fig. 1.11(b). Since the isotope is chemically identical, the same spring constants can be used as for the ideal chain. The same equations of motion

$$\ddot{x}_n = -\sum_jd_{nj}x_j$$

can be written for all atoms apart from the one with $n = 0$ (the isotope), for which we have instead:

$$\mu m\ddot{x}_0 = k\left(x_1 - 2x_0 + x_{-1}\right) \quad \Longrightarrow \quad \mu\ddot{x}_0 = -\sum_jd_{0j}x_j.$$

Therefore, all the linear differential equations we have to solve can now be written as:

$$\ddot{x}_n = -\sum_jd_{nj}x_j + (1 - \mu)\ddot{x}_0\,\delta_{n0}.$$

They differ from the equations for the perfect chain by the second term in the right-hand side, which plays the role of a perturbation. Introducing now the same notations we used for the periodic chain, we make the substitution $x_n(t) = y_ne^{i\omega t}$, which enables us to rewrite our equations as follows:
$$\omega^2y_n = \sum_{j=0}^{N-1}d_{nj}y_j + (1 - \mu)\omega^2y_0\,\delta_{n0} \;\Longrightarrow\; \omega^2Y = DY + WY \;\Longrightarrow\; \left(\omega^2E - D - W\right)Y = 0, \qquad (1.142)$$

where the “perturbation” matrix W was introduced, which has a single non-zero element $W_{00}(\omega) = (1 - \mu)\omega^2$. Note that it depends on the frequency explicitly. To solve the above equation, we shall rewrite it for a general complex z (i.e. we replace $\omega^2$ with z) as follows:
$$(zE - D)Y = WY \quad \Longrightarrow \quad Y = (zE - D)^{-1}WY = G(z)WY, \qquad (1.143)$$

where we introduced the Green's function $G(z) = (zE - D)^{-1}$ of the perfect chain. Note that the values of z above correspond to the solutions of the equation

$$|zE - D - W(z)| = 0,$$

and hence the determinant of the matrix $zE - D$ cannot be zero. This allowed us to introduce the matrix $G(z)$ in Eq. (1.143).
Non-trivial solutions of this equation appear as roots of the equation

$$\frac{1}{(1 - \mu)\omega^2} = g_{00}\left(\omega^2\right) = \frac{1}{N}\sum_q\frac{1}{\omega^2 - \omega^2(q)}, \qquad (1.145)$$

where the explicit expression (1.141) for the Green's function on the defect site (when $n = j = 0$) has been used. The sum in the right-hand side can be turned into an integral (since $N \to \infty$ and hence $\Delta q \to 0$):

$$\frac{1}{N}\sum_q\frac{1}{\omega^2 - \omega^2(q)} = \frac{1}{N\Delta q}\sum_q\frac{\Delta q}{\omega^2 - \omega^2(q)} \to \frac{a}{2\pi}\int_{-\pi/a}^{\pi/a}\frac{dq}{\omega^2 - \frac{4k}{m}\sin^2\frac{qa}{2}},$$

and hence calculated. Equating this to the left-hand side of Eq. (1.145) allows calculation of all the solutions for the frequencies of the defective chain. It should give perturbed “bulk” solutions close to those of the perfect chain, plus additional solutions may also appear when $\mu$ becomes sufficiently different from one. A patient reader should be able to perform the q-integration analytically and obtain a transcendental equation for $\omega$.
and hence find the frequency corresponding to the local vibration of a lighter atom ($\mu < 1$) as

$$\omega_{loc} = \sqrt{\frac{4k}{m\mu(2 - \mu)}},$$

which is positioned above the perfect chain frequencies $0 < \omega \le \sqrt{4k/m}$ (since $0 < \mu(2 - \mu) < 1$). This vibration corresponds to a local mode associated with oscillations of atoms in the vicinity of the defect. Explain why there is no extra solution for a heavier atom ($\mu > 1$). [Hint: use the substitution $t = \tan\frac{x}{2}$, where x is the argument of the sine function in the integrand.]
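The local mode is easy to observe numerically. The sketch below (an illustration of ours, not the book's solution) diagonalises a large ring with a single light isotope and compares the highest frequency with $\omega_{loc}$:

```python
# Local mode of a chain with one light isotope (mass mu*m at n = 0).
import numpy as np

N, k, m, mu = 400, 1.0, 1.0, 0.5
masses = np.full(N, m); masses[0] = mu * m
Minv_sqrt = np.diag(1.0 / np.sqrt(masses))
Phi = 2 * k * np.eye(N) - k * (np.eye(N, k=1) + np.eye(N, k=-1))
Phi[0, -1] = Phi[-1, 0] = -k                      # ring boundary conditions

w = np.sqrt(np.abs(np.linalg.eigvalsh(Minv_sqrt @ Phi @ Minv_sqrt)))
w_loc = np.sqrt(4 * k / (m * mu * (2 - mu)))
print(w.max(), w_loc)    # the highest mode sits at the predicted omega_loc
```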
Consider a three-dimensional atomic system, e.g. a solid. The solid we are considering does not need to be periodic; it could be a disordered or defective system. We would like to obtain the energy levels electrons can occupy in this material. To find the energy levels, one needs to solve the Schrödinger equation for the electrons:

$$-\frac{\hbar^2}{2m}\Delta\psi(\mathbf{r}) + V(\mathbf{r})\psi(\mathbf{r}) = \epsilon\psi(\mathbf{r}), \qquad (1.146)$$
where $\psi$ is the wave function of the electron occupying the state with energy $\epsilon$, the first term in the left-hand side corresponds to the kinetic, while the second to the potential energy of the electron, with $V(\mathbf{r})$ being the corresponding lattice potential that the electrons experience in the solid, and m is the electron mass. It is convenient to introduce the Hamiltonian operator $\hat{H}$, which is defined in such a way that its action on any function $\varphi(\mathbf{r})$ standing on the right of it results in the following:

$$\hat{H}\varphi(\mathbf{r}) = \left(-\frac{\hbar^2}{2m}\Delta + V(\mathbf{r})\right)\varphi(\mathbf{r}) = -\frac{\hbar^2}{2m}\Delta\varphi(\mathbf{r}) + V(\mathbf{r})\varphi(\mathbf{r}).$$
Then, the Schrödinger equation (1.146) takes on a very simple form:

$$\hat{H}\psi(\mathbf{r}) = \epsilon\psi(\mathbf{r}). \qquad (1.147)$$
Fig. 1.13 A possible model of an alloy of a two-dimensional solid: we have a regular arrangement of identical atoms (brown) except for some random lattice sites which are occupied by a different species (cyan). At each lattice site a single localised atomic orbital $\chi_A(\mathbf{r})$ is positioned (blue)
For the sake of simplicity we shall assume that there is only one orbital, $\chi_A(\mathbf{r})$, placed on each atom A, and expand the wave function as

$$\psi(\mathbf{r}) = \sum_Ac_A(\epsilon)\chi_A(\mathbf{r}). \qquad (1.148)$$

Here the summation is performed with respect to all atoms (i.e. all atomic orbitals). This method is called the linear combination of atomic orbitals (LCAO) method. As an example of a possible system we show in Fig. 1.13 a fragment of a two-dimensional (e.g. a surface or a slab) system in which two species of atoms are distributed at random at regular lattice sites. On each atom of the system with position vector $\mathbf{R}_A$ we have placed an orbital $\chi_A(\mathbf{r}) = \chi(\mathbf{r} - \mathbf{R}_A)$; note that in this example all orbitals have an identical shape given by the function $\chi(\mathbf{r})$; only their positions are different.
We shall also assume that the orbitals are normalised to unity and that the orbitals on different atoms do not overlap, i.e. they are orthogonal to each other; in other words,

$$\int\chi_A^*(\mathbf{r})\chi_B(\mathbf{r})\,d\mathbf{r} = \delta_{AB}, \qquad (1.149)$$

where the integration is done with respect to the whole volume of the system, and $\delta_{AB}$ is the Kronecker symbol (equal to one when $A = B$, otherwise it is equal to zero).
Before we move any further, we need one important result related to expanding a given function in terms of other functions. This question is considered in more detail in Sect. I.7.2 on functional series. What is essential for us here is that we may assume that the atomic orbitals centred on all atoms of the system form a set of functions which is very close to a complete set of functions, i.e. that any “good” function can be expanded in terms of them:

$$f(\mathbf{r}) = \sum_Af_A\chi_A(\mathbf{r}), \quad \text{where} \quad f_A = \int\chi_A^*(\mathbf{r})f(\mathbf{r})\,d\mathbf{r} = \langle\chi_A|f\rangle,$$

where we have used Dirac's notation for the matrix element, something which is frequently used in quantum mechanics.
Now, if we consider $\hat{H}\chi_B(\mathbf{r})$ as the function $f(\mathbf{r})$, then the above expressions are rewritten as:

$$\hat{H}\chi_B(\mathbf{r}) = \sum_Af_A\chi_A(\mathbf{r}) \quad \text{with} \quad f_A = \int\chi_A^*(\mathbf{r})\hat{H}\chi_B(\mathbf{r})\,d\mathbf{r} = H_{AB}, \qquad (1.150)$$

where the numbers $H_{AB} = \langle\chi_A|\hat{H}|\chi_B\rangle$ form the elements of the Hamiltonian matrix $H = (H_{AB})$, and yet again Dirac's notation for the matrix elements was applied for convenience.
Now we are ready to continue working on our problem. Substituting the LCAO expansion (1.148) into the Schrödinger equation (1.147), we obtain

$$\sum_Ac_A(\epsilon)\hat{H}\chi_A(\mathbf{r}) = \epsilon\sum_Ac_A(\epsilon)\chi_A(\mathbf{r}).$$

Multiplying both sides of the equation by $\chi_B^*(\mathbf{r})$, integrating over the whole volume and using the orthogonality of the atomic orbitals (1.149), we obtain

$$\sum_AH_{BA}c_A(\epsilon) = \epsilon\,c_B(\epsilon), \qquad (1.151)$$

or, in matrix form,

$$HC_\epsilon = \epsilon C_\epsilon, \qquad (1.152)$$

where $C_\epsilon = (c_A(\epsilon))$ is the vector of all LCAO coefficients corresponding to the state $\epsilon$. Hence, to determine the eigenvalues $\epsilon$ and the LCAO coefficients $C_\epsilon$, one has to find the eigenvalues and eigenvectors of the Hamiltonian matrix $H = (H_{AB})$. Note that since H is a symmetric matrix, the eigenvalues are guaranteed to be real.
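As an illustration (a toy model of ours, with invented on-site energies and hopping matrix elements $H_{AB}$), the LCAO eigenproblem (1.152) is solved by a single call to a symmetric eigensolver:

```python
# Toy LCAO / tight-binding sketch: levels are the eigenvalues of H.
import numpy as np

n_atoms = 6
eps = np.array([0.0, 0.0, 0.5, 0.0, 0.5, 0.0])   # two species, cf. Fig. 1.13
t_hop = -1.0                                     # nearest-neighbour H_AB

H = np.diag(eps) + t_hop * (np.eye(n_atoms, k=1) + np.eye(n_atoms, k=-1))
energies, C = np.linalg.eigh(H)                  # Eq. (1.152): H C = eps C
print("levels:", energies)                       # real, since H is symmetric
```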
The only difference with the Lanczos procedure introduced in Sect. 1.2.16 is that we work here with functions and the operator $\hat{H}$ instead of vectors and a matrix. However, as we recall from Sect. 1.1.2, there is a very close analogy between vectors and functions; it is easy to see that the operator $\hat{H}$ plays here the role of such a matrix.
By virtue of the above construction, the function $g_2$ must be orthogonal to $g_1$ and normalised, i.e. $\langle g_2|g_1\rangle = 0$ and $\langle g_2|g_2\rangle = 1$. It is essential to realise at this stage that $g_2$, constructed as above, is actually a linear combination of the atomic orbitals. Indeed, according to Eq. (1.150),

$$\hat{H}g_1(\mathbf{r}) = \hat{H}\chi_0(\mathbf{r}) = \sum_BH_{0B}\chi_B(\mathbf{r}),$$
Conversely,

$$\chi_A = \sum_kU_{kA}g_k.$$
Let us now look at the resolvent matrix $G(z) = (zE - H)^{-1}$. Let us express the Hamiltonian matrix $H = (H_{AB})$ via the matrix elements $T_{kl}$ of the Hamiltonian operator $\hat{H}$ written with respect to the new orbitals:

$$H_{AB} = \langle\chi_A|\hat{H}|\chi_B\rangle = \sum_{kl}U_{kA}^*U_{lB}\langle g_k|\hat{H}|g_l\rangle = \sum_{kl}U_{kA}^*U_{lB}T_{kl} = \sum_{kl}\left(U^\dagger\right)_{Ak}T_{kl}U_{lB}.$$
This identity shows that the pole structure of the resolvent matrix is fully contained in the resolvent $G(z) = (zE - T)^{-1}$ of the T matrix, which is written in terms of the new (Lanczos's) basis set. But the matrix $zE - T$ is tridiagonal, and hence the first diagonal element of its inverse, $G_{11}(z)$, can easily be calculated as a function of z via a continued fraction, see Sect. 1.2.6.4 and Problem 1.61:

$$G_{11}(z) = \cfrac{1}{z - \alpha_1 - \cfrac{\beta_1^2}{z - \alpha_2 - \cfrac{\beta_2^2}{z - \alpha_3 - \cdots - \cfrac{\beta_{n-1}^2}{z - \alpha_n}}}}\,,$$
Beyond some level l the Lanczos coefficients can be assumed to repeat themselves, $\alpha_i = \alpha$ and $\beta_i = \beta$ for $i \ge l$, so that the infinite tail of the continued fraction satisfies

$$S_\infty(z) = \cfrac{1}{z - \alpha - \beta^2S_\infty(z)}\,,$$

which yields a quadratic equation for the sum $S_\infty(z)$. Once the infinite tail of the fraction containing identical terms is known, these terms can all be replaced by the infinite sum $S_\infty$, leading to a finite continued fraction

$$G_{11}(z) = \cfrac{1}{z - \alpha_1 - \cfrac{\beta_1^2}{z - \alpha_2 - \cdots - \cfrac{\beta_{l-1}^2}{z - \alpha_l - \beta_l^2S_\infty(z)}}}\,, \qquad (1.156)$$

which can now be calculated exactly.
We have discussed here only the main ideas of the Lanczos method. It exploits the “localised” nature of interactions in such systems and is frequently used, e.g. in the theory of electronic states of disordered systems as well as in other fields, since it effectively allows considering finite fragments of realistic systems while taking explicit account of their specific geometry.
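The essence of the method can be condensed into a short sketch (ours; dense linear algebra is used instead of the operator language, and the matrix H is random): the Lanczos recursion produces the coefficients $\alpha_i$ and $\beta_i$, and the continued fraction then reproduces $G_{11}(z)$:

```python
# Lanczos tridiagonalisation of H starting from chi_0, plus the
# continued-fraction evaluation of G_11(z) = <g_1|(zE - T)^{-1}|g_1>.
import numpy as np

rng = np.random.default_rng(1)
n = 8
H = rng.standard_normal((n, n)); H = (H + H.T) / 2

g = np.zeros((n, n)); g[:, 0] = np.eye(n)[0]     # g_1 = chi_0
alpha, beta = np.zeros(n), np.zeros(n - 1)
for i in range(n):
    hg = H @ g[:, i]
    alpha[i] = g[:, i] @ hg
    r = hg - alpha[i] * g[:, i] - (beta[i - 1] * g[:, i - 1] if i > 0 else 0)
    if i < n - 1:
        beta[i] = np.linalg.norm(r)
        g[:, i + 1] = r / beta[i]

z = 3.0 + 0.1j
cf = z - alpha[-1]                               # evaluate bottom up
for i in range(n - 2, -1, -1):
    cf = z - alpha[i] - beta[i] ** 2 / cf
G11_cf = 1.0 / cf
G11_direct = np.linalg.inv(z * np.eye(n) - H)[0, 0]
assert np.isclose(G11_cf, G11_direct)
```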
Consider an n-level quantum system described by the Hamiltonian matrix $H = (H_{ij})$. If at the initial time $t = 0$ the wave function of the system was a column vector

$$\Psi_0 = \begin{pmatrix} \psi_1^0 \\ \vdots \\ \psi_k^0 \\ \vdots \\ \psi_n^0 \end{pmatrix},$$
The wave function $\Psi_t$ at a later time t satisfies the time-dependent Schrödinger equation

$$i\hbar\frac{d\Psi_t}{dt} = H\Psi_t. \qquad (1.158)$$

Expanding the state vector $\Psi_t$ in terms of the eigenvectors of H, i.e. $\Psi_t = \sum_j\alpha_j(t)x_j$, we find upon multiplication from the left with $x_k^\dagger$ that the expansion coefficients are $\alpha_k(t) = x_k^\dagger\Psi_t$. Substituting the obtained expansion of $\Psi_t$ into (1.158), we have

$$i\hbar\sum_k\frac{d\alpha_k}{dt}x_k = \sum_{kj}\epsilon_k\alpha_jx_k\left(x_k^\dagger x_j\right) \;\Longrightarrow\; i\hbar\sum_k\frac{d\alpha_k}{dt}x_k = \sum_k\epsilon_k\alpha_kx_k. \qquad (1.159)$$

In the above manipulation we used the associativity in multiplying matrices (vectors in this particular case) and also the fact that the eigenvectors of H are orthonormal; because of the latter the double sum in the right-hand side was transformed into a single sum. Finally, multiply (1.159) by $x_j^\dagger$ from the left again and use the orthonormality property. This way a simple differential equation for the coefficients $\alpha_j(t)$ follows, which is trivially solved:

$$i\hbar\frac{d\alpha_j}{dt} = \epsilon_j\alpha_j \;\Longrightarrow\; \alpha_j(t) = \alpha_j(0)e^{-i\epsilon_jt/\hbar},$$
where $\alpha_j(0) = x_j^\dagger\Psi_0$ are the initial expansion coefficients. Therefore, the wave function at time t becomes

$$\Psi_t = \sum_j\alpha_j(t)x_j = \sum_j\alpha_j(0)e^{-i\epsilon_jt/\hbar}x_j = \sum_je^{-i\epsilon_jt/\hbar}x_j\left(x_j^\dagger\Psi_0\right) = \left(\sum_je^{-i\epsilon_jt/\hbar}x_jx_j^\dagger\right)\Psi_0.$$
The sum in the round brackets in the right-hand side can be recognised as the spectral representation of a function of the Hamiltonian matrix H:

$$U_{t0} = e^{-iHt/\hbar}.$$

The matrix $U_{t0}$ is called the propagator in quantum mechanics, as it propagates the wave function $\Psi_0$ from $t = 0$ to $\Psi_t = U_{t0}\Psi_0$ at any finite value of $t > 0$. It satisfies simple properties which we shall leave to the reader to prove as a problem. More generally, $\Psi_t = U_{tt'}\Psi_{t'}$, where

$$U_{tt'} = e^{-iH(t - t')/\hbar}.$$
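A minimal sketch (an invented two-level Hamiltonian, $\hbar = 1$) comparing the spectral representation of the propagator with the direct matrix exponential:

```python
# Propagator U_t0 = exp(-iHt/hbar): spectral form vs matrix exponential.
import numpy as np
from scipy.linalg import expm

hbar = 1.0
H = np.array([[1.0, 0.3],
              [0.3, 2.0]])
t = 0.7

eps, X = np.linalg.eigh(H)                      # H x_j = eps_j x_j
U_spectral = (X * np.exp(-1j * eps * t / hbar)) @ X.conj().T
U_expm = expm(-1j * H * t / hbar)
assert np.allclose(U_spectral, U_expm)

psi0 = np.array([1.0, 0.0], dtype=complex)
psi_t = U_spectral @ psi0                       # Psi_t = U_t0 Psi_0
print(np.vdot(psi_t, psi_t).real)               # norm stays 1: U is unitary
```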
written in terms of the two stationary states. By substituting $\psi(t)$ into the time-dependent Schrödinger equation $i\hbar\dot{\psi} = (H_0 + V)\psi$ and considering explicitly both components, show that the coefficients $C_1(t)$ and $C_2(t)$ satisfy a system of two ordinary differential equations; solving it, show that the occupation probabilities of the two states are

$$P_1(t) = 1 - \frac{4|V|^2}{\Delta^2 + 4|V|^2}\sin^2\frac{\Omega t}{\hbar} \quad \text{and} \quad P_2(t) = \frac{4|V|^2}{\Delta^2 + 4|V|^2}\sin^2\frac{\Omega t}{\hbar}.$$
We introduced complex numbers in Sect. 1.8 of the first volume.1 There we just
defined the numbers themselves, but did not go any further. In fact, since the
introduction of complex numbers a number of centuries ago, the theory based
on them has been substantially developed into an extended analysis of complex
functions defined on the complex plane. The mathematical tool thus created
represents an extremely powerful device for solving practical problems ranging
from calculating real integrals to solving partial differential equations.
The purpose of this chapter is to consider in detail this elegant formalism.
We shall start by returning to complex numbers and the complex plane, then
consider functions on the complex plane, their differentiation and integration,
complex functional series, analytic continuation, residues, Frobenius method for
solving ordinary differential equations and, finally, some applications in physics to
conclude this chapter.
1
In the following, references to the first volume of this course (L. Kantorovich, Mathematics for
natural scientists: fundamentals and basics, Springer, 2015) will be made by appending the Roman
number I in front of the reference, e.g. Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of
the first volume, respectively.
i.e. the product $z_3 = z_1z_2$ is the complex number with the length $r_3 = r_1r_2$ and the phase $\phi_3 = \phi_1 + \phi_2$.
Problem 2.1. Show that if two complex numbers $z_1$ and $z_2$ are specified by the lengths $r_1$ and $r_2$ and the phases $\phi_1$ and $\phi_2$, then their division $z_3 = z_1/z_2$ is characterised by $r_3 = r_1/r_2$ and the phase $\phi_3 = \phi_1 - \phi_2$, i.e.

$$\frac{z_1}{z_2} = \frac{r_1}{r_2}\left[\cos(\phi_1 - \phi_2) + i\sin(\phi_1 - \phi_2)\right]. \qquad (2.4)$$
Problem 2.2. Derive the following result for the power of a complex number by repeatedly using Eq. (2.3):

$$z^n = r^n\left[\cos(n\phi) + i\sin(n\phi)\right]. \qquad (2.5)$$

Setting $r = 1$ gives de Moivre's formula,

$$(\cos\phi + i\sin\phi)^n = \cos(n\phi) + i\sin(n\phi). \qquad (2.6)$$

This formula can actually be used for expressing the cosine or sine of the angle $n\phi$ with integer n via the cosine and sine of the angle $\phi$ (angle reduction). Indeed, by taking $n = 2$ and opening the brackets in (2.6), we have

$$\cos^2\phi - \sin^2\phi + 2i\sin\phi\cos\phi = \cos(2\phi) + i\sin(2\phi).$$

The real part in the left-hand side should be equal to the real part in the right, and similarly for the imaginary parts, which gives us two identities:

$$\cos(2\phi) = \cos^2\phi - \sin^2\phi \quad \text{and} \quad \sin(2\phi) = 2\sin\phi\cos\phi.$$
Problem 2.5. Now obtain a general result applying the binomial expansion to the left-hand side of (2.6):

$$\cos(2p\phi) = \sum_{k=0}^{p}(-1)^k\binom{2p}{2k}\sin^{2k}\phi\,\cos^{2(p-k)}\phi,$$
$$\sin(2p\phi) = \sum_{k=0}^{p-1}(-1)^k\binom{2p}{2k+1}\sin^{2k+1}\phi\,\cos^{2(p-k)-1}\phi,$$
$$\cos((2p+1)\phi) = \sum_{k=0}^{p}(-1)^k\binom{2p+1}{2k}\sin^{2k}\phi\,\cos^{2(p-k)+1}\phi,$$
$$\sin((2p+1)\phi) = \sum_{k=0}^{p}(-1)^k\binom{2p+1}{2k+1}\sin^{2k+1}\phi\,\cos^{2(p-k)}\phi.$$

Check that these equations reproduce the previous results of Problem 2.4.
Next we can rederive the sums of the cosine and sine functions we calculated previously,

$$\sum_{k=1}^{n}\sin(kx) = \frac{1}{\sin\frac{x}{2}}\sin\frac{(n+1)x}{2}\sin\frac{nx}{2} \qquad (2.7)$$

and

$$\sum_{k=1}^{n}\cos(kx) = \frac{1}{\sin\frac{x}{2}}\cos\frac{(n+1)x}{2}\sin\frac{nx}{2}, \qquad (2.8)$$

see Eqs. (I.2.63) and (I.2.64), using the trigonometric representation of a complex number. In fact, we shall even be able to generalise these formulae a little. To this end, let us note that the formula (I.1.61) for the geometric progression,

$$S_n = a_0\frac{q^{n+1} - 1}{q - 1},$$

is valid for complex q as well, as we did not use in the derivation that q was necessarily real (recall that complex numbers are governed by the same algebraic rules as real numbers). Then, we can write

$$\sum_{k=0}^{n}z^k = \frac{1 - z^{n+1}}{1 - z} \quad \text{with} \quad z = r(\cos x + i\sin x) \quad \text{and} \quad z^k = r^k\left[\cos(kx) + i\sin(kx)\right].$$
Therefore, by taking the real and imaginary parts of the expression in the right-hand side, we should be able to work out the sums of $r^k\cos(kx)$ and $r^k\sin(kx)$. The calculation is based on the fact that the product $z\bar{z} = x^2 + y^2$ is real, where $x = \mathrm{Re}(z)$ and $y = \mathrm{Im}(z)$. Therefore, we multiply and divide the expression above by the complex conjugate of the denominator:

$$\frac{1 - z^{n+1}}{1 - z} = \frac{\left(1 - z^{n+1}\right)\left(1 - \bar{z}\right)}{(1 - z)\left(1 - \bar{z}\right)}.$$
With this trick the denominator becomes real, so that only the numerator is complex:

Problem 2.6. Calculate the numerator to show that its real and imaginary parts are, correspondingly:

$$\mathrm{Re}\left[\left(1 - z^{n+1}\right)\left(1 - \bar{z}\right)\right] = 1 - r\cos x + r^{n+2}\cos(nx) - r^{n+1}\cos((n+1)x),$$
$$\mathrm{Im}\left[\left(1 - z^{n+1}\right)\left(1 - \bar{z}\right)\right] = r\sin x + r^{n+2}\sin(nx) - r^{n+1}\sin((n+1)x).$$
Hence,

$$\sum_{k=0}^{n}r^k\cos(kx) = \frac{1 - r\cos x + r^{n+2}\cos(nx) - r^{n+1}\cos((n+1)x)}{1 - 2r\cos x + r^2}, \qquad (2.9)$$

$$\sum_{k=0}^{n}r^k\sin(kx) = \frac{r\sin x + r^{n+2}\sin(nx) - r^{n+1}\sin((n+1)x)}{1 - 2r\cos x + r^2}. \qquad (2.10)$$

Assuming that $0 < r < 1$, we can also calculate the infinite sums by taking the $n \to \infty$ limit (note that $r^n \to 0$):

$$\sum_{k=0}^{\infty}r^k\cos(kx) = \frac{1 - r\cos x}{1 - 2r\cos x + r^2}, \qquad (2.11)$$

$$\sum_{k=0}^{\infty}r^k\sin(kx) = \frac{r\sin x}{1 - 2r\cos x + r^2}. \qquad (2.12)$$
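The closed forms (2.11) and (2.12) are immediately verifiable against truncated sums; a quick sketch (ours):

```python
# Numerical check of the closed forms (2.11), (2.12) for 0 < r < 1.
import numpy as np

r, x = 0.6, 0.9
k = np.arange(2000)                        # r^k decays fast; 2000 terms ample
cos_sum = np.sum(r**k * np.cos(k * x))
sin_sum = np.sum(r**k * np.sin(k * x))
den = 1 - 2 * r * np.cos(x) + r**2
assert np.isclose(cos_sum, (1 - r * np.cos(x)) / den)
assert np.isclose(sin_sum, r * np.sin(x) / den)
```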
Problem 2.7. Prove that the series (2.11) and (2.12) converge only for $0 < r < 1$.

Problem 2.8. Check that by setting $r = 1$ in Eqs. (2.9) and (2.10), we recover Eqs. (2.7) and (2.8).
$$|z - z_0| < \varepsilon. \qquad (2.13)$$

Here $|z - z_0| = \sqrt{(x - x_0)^2 + (y - y_0)^2}$ corresponds to the distance on the complex plane between the two points, z and $z_0$. Therefore, Eq. (2.13) selects all the points lying inside a circle of radius $\varepsilon$ drawn with the point $z_0$ at its centre, see Fig. 2.1(a). Note that the points at the boundary of the circle are strictly not selected because of the less-than (rather than less-than-or-equal) sign in Eq. (2.13). If we now consider a region D in the complex plane, see Fig. 2.1(b), then three situations can be envisaged: an internal point $z_1$ of D, a boundary point $z_2$ lying on the boundary L of D (shown by the solid line) and finally a point $z_3$ which is outside D. For any internal point one can always find such an $\varepsilon$-vicinity (or such a value of $\varepsilon$) that all its points belong to D; if a point is on the boundary L of D, then for any $\varepsilon$-vicinity there will be points
Fig. 2.1 (a) The $\varepsilon$-vicinity (neighbourhood) of the point $z_0$ includes all points z from $\mathbb{C}$ which lie inside the circle of radius $\varepsilon$ centred at the point $z_0 = (x_0, y_0)$. (b) For any internal point $z_1$ of a region D one can always find its $\varepsilon$-vicinity such that it lies fully inside D. This is, however, not the case for the point $z_2$ lying on the boundary L of the region D: only some of the points of any of its $\varepsilon$-vicinities lie inside D. For a point $z_3$ lying outside D, one can always find such a vicinity of it that lies entirely outside D
inside and outside of D, including some boundary points, i.e. not all points in the $\varepsilon$-vicinity would belong to D; finally, if a point lies outside D, then one can always find such an $\varepsilon$-vicinity of it that all its points lie outside D, i.e. the entire $\varepsilon$-vicinity is outside D.
If a region in the complex plane has a single boundary (a single continuous line, either a smooth or a broken one²) as the one shown in Fig. 2.1(b), it is called simply connected. However, regions in the complex plane may have a more complex structure. For instance, they may have not just one but several boundaries (i.e. several closed boundary lines). This happens when some regions of points on $\mathbb{C}$ are excluded and/or cuts are made through accepted regions, as is schematically shown in Fig. 2.2(a). Indeed, the two white regions (with boundaries $L_2$ and $L_3$) which are cut off from the shaded region add two more boundary lines to it, so that now the shaded region has three boundary lines: $L_1$, $L_2$ and $L_3$. Regions which have more than one continuous boundary we shall call multiply connected or non-simply connected. One may say that our shaded region has two holes. Indicating explicitly the number of boundaries (closed boundary lines) existing in D as $k = 1, 2, \ldots$, we say that D is $(k - 1)$-fold-connected; simply connected regions have $k = 1$. Hence, the shaded region in Fig. 2.2(a) is twofold-connected. Note that the cuts by the lines $L_4$ and $L_5$ in this figure do not change the number of boundary lines, as must be clear from Fig. 2.2(b): they simply contribute to the outside boundary line of D.
When generalizing the idea of integration, we shall need to define a direction
along the boundary of a region. We shall consider the direction “positive” when
we traverse the boundary in such a way that the region D is on the left (compare the
Green’s and Stokes’s theorems in Sects. I.6.3.3 and I.6.4.5). If the opposite direction
is chosen with the points of D being on the right, the integrals will change their sign,
so that this direction will then be called “negative”.
Fig. 2.2 Regions in the complex plane and their boundaries. (a) Region D has five boundary lines $L_1$–$L_5$, where $L_1$ is its outer boundary, $L_2$ and $L_3$ correspond to the boundaries of the internal ovals taken out, and $L_4$ and $L_5$ correspond to the cuts made in D. The cuts are shown more clearly in (b). Note the directions of traverse: their “positive” direction is always chosen such that the region D is on the left
2
In other words, a closed piecewise line (i.e. consisting of smooth pieces which connect to each
other).
A function $f(z)$ on the complex plane $\mathbb{C}$ can be defined in a way similar to a real function of two variables $g(x, y)$: each pair of real numbers x and y is put into correspondence with a complex number w. There is, however, an important difference with the former case: if in the case of a real function $g = g(x, y)$ every point on the x–y plane with the coordinates $(x, y)$ is put into correspondence with a real number g, in the case of a complex function $w = f(z)$ we define the correspondence between the point $z = (x, y)$ in the 2D complex plane and a point $w = (u, v)$ on the same 2D plane, i.e. $w = u + iv$ contains two real numbers, not one as in the case of a real function $g(x, y)$. Therefore, complex functions map the complex plane onto itself. Hence, in this sense, there is more similarity with a real function of a single variable, where the 1D space (the real x axis) is mapped onto itself. Therefore, the mapping $w = f(z)$ is equivalent to two real functions of two real variables each:

$$u = u(x, y) \quad \text{and} \quad v = v(x, y). \qquad (2.14)$$
When considering such a mapping, more complications may arise. For instance,
we may have that for the same value of z several values of w are possible. This
results in a multi-valued function as schematically shown in Fig. 2.3(b). If only a
single value of w exists for each value of z, but still several different values of z may
result in the same w, the function w D f .z/ is called single-valued, see Fig. 2.3(a). If
any single z within some region D corresponds to a single value of w, and vice versa,
we then have a mapping with one-to-one correspondence, Fig. 2.3(c). We shall see
that in many practical situations we shall be trying to find such regions of values of
z and w so that the one-to-one mapping between them would be possible.
We note that this is not something really specific to functions of complex variables. Indeed, we know that a square root of a real positive number has two values: negative and positive, i.e. the function $y = \sqrt{x}$ has in fact two values, $\pm\sqrt{x}$, for the same value of x (assuming that by $\sqrt{x}$ we mean only the positive value of the root); therefore, in the world of real numbers we do face a similar problem. It is overcome by taking only the positive root, i.e. by assuming that $y = \sqrt{x}$ is always positive. In the case of complex functions defined on the complex plane this is an issue of fundamental importance and we shall encounter it over and over again. How this problem is tackled is similar to choosing a single root in the world of real numbers and will be considered later on in detail.
Suppose now that we have a one-to-one mapping between two complex domains of z and w. Then, it must be possible to define the inverse function $w = f^{-1}(z)$ such that $z = f(w)$. For two single-valued functions $w = f(z)$ and $q = \varphi(w)$ one can also introduce their superposition $q = \varphi(f(z))$. In particular, if $w = f(z)$ performs a one-to-one mapping, then $f\left(f^{-1}(z)\right) = z$ and $f^{-1}(f(z)) = z$.
Next, we have to define the limit of a function in $\mathbb{C}$. The limit $\lim_{z\to z_0}f(z)$ is defined similarly to that of a real function of two variables. Using the $\varepsilon$–$\delta$ language, $\lim_{z\to z_0}f(z) = F$ if for any $\varepsilon > 0$ one can always find such $\delta > 0$ that for any z satisfying $0 < |z - z_0| < \delta$ follows $|f(z) - F| = |w - F| < \varepsilon$. In other words, points in the $\delta$-vicinity of $z_0$ (excluding the point $z_0$ itself, as the function $f(z)$ may not exist there) are mapped onto points of an $\varepsilon$-vicinity of F in the w plane. No matter how small the latter vicinity is, one can always find the corresponding vicinity of the point $z_0$ to accomplish the mapping.
It is clear that the limit exists if and only if the two limits for the functions $u(x, y)$ and $v(x, y)$ exist. The function $f(z)$ is continuous if

$$\lim_{z\to z_0}f(z) = f(z_0),$$

similarly to real functions. Clearly, both functions $u(x, y)$ and $v(x, y)$ are to be continuous for this to happen. It is essential that the limit must not depend on the path $z \to z_0$ in $\mathbb{C}$.
The usual properties of the limits then apply, similarly to the calculus of real functions:

$$\lim_{z\to z_0}\left(f(z) + g(z)\right) = \lim_{z\to z_0}f(z) + \lim_{z\to z_0}g(z), \qquad \lim_{z\to z_0}\left(f(z)g(z)\right) = \lim_{z\to z_0}f(z)\cdot\lim_{z\to z_0}g(z),$$

$$\lim_{z\to z_0}\frac{f(z)}{g(z)} = \frac{\lim_{z\to z_0}f(z)}{\lim_{z\to z_0}g(z)} \;(\text{if } g \neq 0) \qquad \text{and} \qquad \lim_{z\to z_0}g(f(z)) = g\left(\lim_{z\to z_0}f(z)\right).$$
It is also possible to show that if a closed region D is considered, a continuous function $f(z)$ will be bounded there, i.e. there exists such positive F that $|f(z)| \le F$ for any z from D.
Now we are ready to consider differentiation of $f(z)$ with respect to z. We can define the derivative of $f(z)$ similarly to real functions as

$$f'(z) = \lim_{\Delta z\to 0}\frac{f(z + \Delta z) - f(z)}{\Delta z}. \qquad (2.15)$$
It is essential that the limit must not depend on the path along which the complex number $\Delta z$ approaches zero, Fig. 2.4, as otherwise this definition would not have any sense. This condition for a function $f(z)$ to be (complex) differentiable puts certain limitations on the function $f(z) = w = u(x, y) + iv(x, y)$ itself, which are formulated by the following Theorem.

Theorem 2.1 (Cauchy and Riemann). In order for the function $f(z) = u + iv$ to be (complex) differentiable at $z = (x, y)$, where both $u(x, y)$ and $v(x, y)$ are differentiable around $(x, y)$, it is necessary and sufficient that the following conditions are satisfied at this point:

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad \text{and} \quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \qquad (2.16)$$
Proof. We shall first prove the necessary condition. This means that we assume that $f'(z)$ exists and need to show that the conditions (2.16) are satisfied. Indeed, if the derivative exists, it means that it does not depend on the direction in which $\Delta z \to 0$. Therefore, let us take the limit (2.15) by approaching zero along the x axis, i.e. by considering $\Delta x \to 0$ and $\Delta y = 0$. We have

$$f'(z) = \lim_{\Delta x\to 0}\frac{\left[u(x + \Delta x, y) + iv(x + \Delta x, y)\right] - \left[u(x, y) + iv(x, y)\right]}{\Delta x} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x},$$

since along this path $\Delta z = \Delta x$. Alternatively, we may approach zero along the imaginary axis by taking $\Delta z = i\Delta y$ with $\Delta x = 0$. This must give the same complex number $f'(z)$:

$$f'(z) = \lim_{\Delta y\to 0}\frac{\left[u(x, y + \Delta y) + iv(x, y + \Delta y)\right] - \left[u(x, y) + iv(x, y)\right]}{i\Delta y}$$
$$= \frac{1}{i}\lim_{\Delta y\to 0}\frac{u(x, y + \Delta y) - u(x, y)}{\Delta y} + \lim_{\Delta y\to 0}\frac{v(x, y + \Delta y) - v(x, y)}{\Delta y} = -i\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y}.$$

The two expressions must be identical, as the derivative must not depend on the direction. Therefore, comparing the real and imaginary parts of the two expressions above, we obtain the required conditions (2.16).
Next we prove sufficiency: we are given the conditions, and we have to prove that the limit (2.15) exists. Assuming that the functions $u(x, y)$ and $v(x, y)$ are differentiable, we can write (see Sect. I.5.3):

$$\Delta u = u(x + \Delta x, y + \Delta y) - u(x, y) = \frac{\partial u}{\partial x}\Delta x + \frac{\partial u}{\partial y}\Delta y + \alpha_1\Delta x + \alpha_2\Delta y$$
$$= \frac{\partial u}{\partial x}\Delta x - \frac{\partial v}{\partial x}\Delta y + \alpha_1\Delta x + \alpha_2\Delta y,$$

$$\Delta v = v(x + \Delta x, y + \Delta y) - v(x, y) = \frac{\partial v}{\partial x}\Delta x + \frac{\partial v}{\partial y}\Delta y + \beta_1\Delta x + \beta_2\Delta y$$
$$= \frac{\partial v}{\partial x}\Delta x + \frac{\partial u}{\partial x}\Delta y + \beta_1\Delta x + \beta_2\Delta y,$$

where $\alpha_1$, $\alpha_2$, $\beta_1$ and $\beta_2$ tend to zero if $\Delta x$ and $\Delta y$ tend to zero (i.e. $\Delta z \to 0$), and in the second passage in both cases we have made use of the conditions (2.16) by replacing all partial derivatives with respect to y with those with respect to x. Therefore, we can consider the difference of the function in Eq. (2.15):

$$\Delta f = f(z + \Delta z) - f(z) = \Delta u + i\Delta v = \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)\Delta x + \left(-\frac{\partial v}{\partial x} + i\frac{\partial u}{\partial x}\right)\Delta y + \epsilon_1\Delta x + \epsilon_2\Delta y$$
$$= \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)(\Delta x + i\Delta y) + \epsilon_1\Delta x + \epsilon_2\Delta y = \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)\Delta z + \epsilon_1\Delta x + \epsilon_2\Delta y, \qquad (2.17)$$

where $\epsilon_1 = \alpha_1 + i\beta_1$ and $\epsilon_2 = \alpha_2 + i\beta_2$ also tend to zero as $\Delta z \to 0$. Dividing by $\Delta z$, we get:
$$\frac{f(z + \Delta z) - f(z)}{\Delta z} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} + \epsilon_1\frac{\Delta x}{\Delta z} + \epsilon_2\frac{\Delta y}{\Delta z}. \qquad (2.18)$$

It is easy to see that the fractions $\Delta x/\Delta z$ and $\Delta y/\Delta z$ are bounded. To show this, let us write the complex number $\Delta z = \Delta r(\cos\phi + i\sin\phi)$ in the trigonometric form; then $\Delta x = \Delta r\cos\phi$ and hence

$$\left|\frac{\Delta x}{\Delta z}\right| = \left|\cos\phi\left(\cos\phi - i\sin\phi\right)\right| \le \left|\cos\phi - i\sin\phi\right| = \sqrt{\cos^2\phi + \sin^2\phi} = 1,$$

since the complex number $\cos\phi - i\sin\phi$ lies on the circle of unit radius in the complex plane; similarly for $\Delta y/\Delta z$. Therefore, when taking the limit $\Delta z \to 0$ in (2.18), we can ignore these fractions and consider only the limits of $\epsilon_1$ and $\epsilon_2$, which both tend to zero. Therefore, finally:

$$\lim_{\Delta z\to 0}\frac{f(z + \Delta z) - f(z)}{\Delta z} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x},$$

which is a well-defined expression (since both u and v are differentiable), i.e. the derivative $f'(z)$ exists. Q.E.D.
Using the conditions (2.16) and the fact that the derivative should not depend on the direction in which $\Delta z \to 0$, we can write down several alternative expressions for the derivative:

$$f'(z) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = -i\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y} = \frac{\partial v}{\partial y} + i\frac{\partial v}{\partial x} = \frac{\partial u}{\partial x} - i\frac{\partial u}{\partial y}. \qquad (2.19)$$
Since the derivative is basically based on the partial derivatives of the functions $u(x, y)$ and $v(x, y)$, all properties of the derivatives of real functions are carried over here:

$$(f + g)' = f' + g', \qquad (fg)' = f'g + fg' \qquad \text{and} \qquad \left[f(g(z))\right]' = \frac{df}{dg}\frac{dg}{dz}.$$
If $z = f(w)$ provides a unique (one-to-one) mapping, then the inverse function $w = f^{-1}(z)$ exists. Then, similarly to the real calculus,

$$\frac{dw}{dz} = \frac{d}{dz}f^{-1}(z) = \frac{1}{dz/dw} = \left.\frac{1}{f'(w)}\right|_{w = f^{-1}(z)}.$$
For example, consider $f(z) = z^2 = (x + iy)^2 = \left(x^2 - y^2\right) + 2ixy$, so that $u = x^2 - y^2$ and $v = 2xy$. Then

$$\frac{\partial u}{\partial x} = 2x, \quad \frac{\partial v}{\partial y} = 2x, \quad \frac{\partial u}{\partial y} = -2y, \quad \frac{\partial v}{\partial x} = 2y,$$

and the Cauchy–Riemann conditions (2.16) are indeed satisfied, as can easily be checked. Correspondingly,

$$\left(z^2\right)' = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = 2x + i2y = 2(x + iy) = 2z.$$
Differentiating the first of the Cauchy–Riemann conditions (2.16) with respect to x and the second with respect to y and adding them (and similarly for v), one finds that both u and v satisfy the two-dimensional Laplace equation:

$$\frac{\partial^2u}{\partial x^2} + \frac{\partial^2u}{\partial y^2} = 0 \quad \text{and} \quad \frac{\partial^2v}{\partial x^2} + \frac{\partial^2v}{\partial y^2} = 0. \qquad (2.20)$$
Suppose now that only the real part $u(x, y) = xy$ of some analytic function $f(z)$ is known; let us try to restore the imaginary part $v(x, y)$. The first of the conditions (2.16) gives $\partial v/\partial y = \partial u/\partial x = y$, so that

$$v(x, y) = \int y\,dy = \frac{1}{2}y^2 + \varphi(x).$$

Here we wrote v as a y-integral since $\partial v/\partial y = y$; this follows from the general property of indefinite integrals. Also, the integration is performed only over y, i.e. keeping x constant, and $\varphi(x)$ appears as an arbitrary function of x, i.e. $\varphi(x)$ is yet to be determined. The above result, however, fully determines the dependence of $v(x, y)$ on y. To find the function $\varphi(x)$, we employ the second condition (2.16):

$$\frac{\partial u}{\partial y} = x \;\Longrightarrow\; -\frac{\partial v}{\partial x} = x \;\Longrightarrow\; \frac{d\varphi}{dx} = -x \;\Longrightarrow\; \varphi(x) = -\int x\,dx = -\frac{1}{2}x^2 + C,$$

so that $v = \left(y^2 - x^2\right)/2 + C$ and

$$f(z) = -\frac{i}{2}z^2 + iC = -\frac{i}{2}(x + iy)^2 + iC = xy - \frac{i}{2}\left(x^2 - y^2\right) + iC,$$
which is precisely our function. This example shows that if $u(x, y)$ is known, then the corresponding imaginary part $v(x, y)$ can indeed be found. It also illustrates that the final function must be expressible entirely via z as $f(z)$. This is not possible in all cases; some $u(x, y)$ (or $v(x, y)$) do not correspond to any $f(z)$. For instance, if $u(x, y) = x^3 - 3x^2y$, then we first obtain

$$\frac{\partial u}{\partial x} = 3x^2 - 6xy \;\Longrightarrow\; \frac{\partial v}{\partial y} = 3x^2 - 6xy \;\Longrightarrow\; v(x, y) = \int\left(3x^2 - 6xy\right)dy = 3x^2y - 3xy^2 + \varphi(x);$$
$$\frac{\partial u}{\partial y} = -3x^2 \;\Longrightarrow\; \frac{\partial v}{\partial x} = 3x^2 \;\Longrightarrow\; 3x^2 = \frac{\partial}{\partial x}\left[3x^2y - 3xy^2 + \varphi(x)\right] = 6xy - 3y^2 + \frac{d\varphi}{dx},$$

which does not give an equation for $\varphi(x)$, since the terms with y do not cancel out. This means that there is no function $f(z)$ having the real part equal to $x^3 - 3x^2y$.
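The existence test just illustrated is equivalent to checking that $u(x, y)$ is harmonic, cf. Eq. (2.20); a short symbolic sketch (ours, assuming sympy is available) automates it:

```python
# u admits a conjugate v (i.e. some analytic f with Re f = u) only if u is
# harmonic; x**3 - 3*x**2*y fails, while Re(z^3) = x**3 - 3*x*y**2 passes.
import sympy as sp

x, y = sp.symbols('x y', real=True)

def is_harmonic(u):
    return sp.simplify(sp.diff(u, x, 2) + sp.diff(u, y, 2)) == 0

print(is_harmonic(x**3 - 3*x**2*y))   # False: no analytic f with this Re part
print(is_harmonic(x**3 - 3*x*y**2))   # True:  the real part of z^3
```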
In polar coordinates $(r, \phi)$, where $x = r\cos\phi$ and $y = r\sin\phi$, we have

$$\frac{\partial r}{\partial x} = \cos\phi, \quad \frac{\partial r}{\partial y} = \sin\phi, \quad \frac{\partial\phi}{\partial x} = -\frac{\sin\phi}{r} \quad \text{and} \quad \frac{\partial\phi}{\partial y} = \frac{\cos\phi}{r},$$

so that the conditions (2.16) become

$$\frac{\partial u}{\partial r}\cos\phi - \frac{\partial u}{\partial\phi}\frac{\sin\phi}{r} = \frac{\partial v}{\partial r}\sin\phi + \frac{\partial v}{\partial\phi}\frac{\cos\phi}{r},$$
$$\frac{\partial u}{\partial r}\sin\phi + \frac{\partial u}{\partial\phi}\frac{\cos\phi}{r} = -\frac{\partial v}{\partial r}\cos\phi + \frac{\partial v}{\partial\phi}\frac{\sin\phi}{r},$$

which can be solved to give the Cauchy–Riemann conditions in polar coordinates:

$$\frac{\partial u}{\partial r} = \frac{1}{r}\frac{\partial v}{\partial\phi} \quad \text{and} \quad \frac{\partial v}{\partial r} = -\frac{1}{r}\frac{\partial u}{\partial\phi}.$$
Consider $w = z^n$ with n being a positive integer. If $|z| = r$ and $\arg(z) = \phi$, then we know from Eq. (2.5) that $|z^n| = r^n$ and $\arg(z^n) = n\phi$. So, the power function is single-valued.
It is, however, easy to see that the mapping here is the one depicted in Fig. 2.3(a), where two points, $z_2$ and $z_3$, go over into a single point $w_2$ of the function. Indeed, let us see if there exist two different points $z_1$ and $z_2$ (given by their absolute values $r_1$ and $r_2$ and the phases $\phi_1$ and $\phi_2$) which give the same power, i.e. for which $z_1^n = z_2^n$. Naively, we may write two equations: $r_1^n = r_2^n$ and $n\phi_1 = n\phi_2$. The first one yields simply $r_1 = r_2$. However, the second equation is not quite correct, as the phase may differ by an integer multiple of $2\pi$, and hence it is to be replaced by $n\phi_1 = n\phi_2 + 2\pi k$ with k being any integer (including zero); therefore, $\phi_1 = \phi_2 + 2\pi k/n$. Hence, the points on a circle of radius $r_1 = r_2$ whose phases differ by $2\pi k/n$ transform by the power function $w = z^n$ into the same point w in $\mathbb{C}$. Correspondingly, if we only consider a sector of points with the phases satisfying the inequality

$$\frac{2\pi}{n}(k - 1) < \phi < \frac{2\pi}{n}k, \qquad (2.21)$$

then they all transform by $w = z^n$ into different points, i.e. there will be a one-to-one mapping of that sector onto the complex plane. Here $k = 1, \ldots, n$, i.e. the complex plane is divided into n such sectors, as shown in Fig. 2.5 for the case of $n = 8$. If we consider any points z inside any one of these sectors, they will provide a unique mapping by means of the power function $w = z^n$.
We have proven in Problem 2.10 that this function is analytic using the binomial expansion. In fact, a much simpler proof exists. Indeed, $w = u + iv = (x + iy)^n$, with $u = \mathrm{Re}\,(x + iy)^n$ and $v = \mathrm{Im}\,(x + iy)^n$. Therefore,

$$\frac{\partial u}{\partial x} = \mathrm{Re}\left[\frac{\partial}{\partial x}(x + iy)^n\right] = \mathrm{Re}\left[n(x + iy)^{n-1}\right] = \mathrm{Re}\left[n(a + ib)\right] = na,$$

$$\frac{\partial v}{\partial y} = \mathrm{Im}\left[\frac{\partial}{\partial y}(x + iy)^n\right] = \mathrm{Im}\left[ni(x + iy)^{n-1}\right] = \mathrm{Im}\left[in(a + ib)\right] = \mathrm{Im}(ina - nb) = na,$$

i.e. both are the same. Above, a and b are the real and imaginary parts of the complex number $(x + iy)^{n-1}$.
Problem 2.16. Show that the second condition (2.16) is also satisfied.
Hence, the derivative can be calculated, e.g., along the x direction:

$$\left(z^n\right)' = \frac{\partial}{\partial x}(x + iy)^n = n(x + iy)^{n-1} = nz^{n-1},$$

i.e. the same result as in the real calculus, i.e. as if the derivative was taken directly with respect to z.
Consider now the inverse, the root function $w = \sqrt[n]{z}$: all points w differing by $2\pi/n$ in their phase correspond to the same value of z. This means that for the given $z = r(\cos\phi + i\sin\phi)$ there are n values of w (roots), which are defined via $z = w^n$, i.e. we have for $w = \rho(\cos\psi + i\sin\psi)$ the absolute value and the phase satisfying $\rho^n = r$ and $n\psi = \phi + 2\pi k$, i.e.

$$\left|\sqrt[n]{z}\right| = \rho = \sqrt[n]{r} \quad \text{and} \quad \arg\left(\sqrt[n]{z}\right) = \psi = \frac{\phi + 2\pi k}{n} = \frac{\phi}{n} + \frac{2\pi}{n}k, \quad \text{where} \quad k = 0, 1, 2, \ldots, n - 1, \qquad (2.22)$$
which means

$$\sqrt[n]{z} = \sqrt[n]{r}\left(\cos\psi_k + i\sin\psi_k\right), \qquad \psi_k = \frac{\phi}{n} + \frac{2\pi}{n}k, \qquad (2.23)$$

where the index k numbers all the roots. Note that there are only n different values of the integer k possible, e.g. the ones given above; all additional values repeat the same roots. So, we conclude that the n-th root of any complex number z (except for $z = 0$) has n values. The point $z = 0$ is indeed somewhat special, as it only has a single root for any n. We shall see in a moment the special meaning of this particular point.
Example 2.2. ► As an example, consider the case of $n = 2$; we should have two roots:

$$\left(\sqrt{z}\right)_1 = \sqrt{r}\left(\cos\frac{\phi}{2} + i\sin\frac{\phi}{2}\right),$$
$$\left(\sqrt{z}\right)_2 = \sqrt{r}\left[\cos\left(\frac{\phi}{2} + \pi\right) + i\sin\left(\frac{\phi}{2} + \pi\right)\right] = -\sqrt{r}\left(\cos\frac{\phi}{2} + i\sin\frac{\phi}{2}\right) = -\left(\sqrt{z}\right)_1.$$

In particular, if z is real positive ($\phi = 0$ and $z = x > 0$), the two roots are $\sqrt{x}$ and $-\sqrt{x}$, as one would expect. If z is real negative, then $\phi = \pi$, and the two roots become

$$\left(\sqrt{z}\right)_1 = \sqrt{|x|}\left(\cos\frac{\pi}{2} + i\sin\frac{\pi}{2}\right) = i\sqrt{|x|} \quad \text{and} \quad \left(\sqrt{z}\right)_2 = -i\sqrt{|x|}.$$

These are shown as dots in Fig. 2.6.
As another example, consider all n roots of 1. Here $r = 1$ and $\phi = 0$, so the n roots of 1 all have the absolute value equal to one, and the phases $\psi_k = 2\pi k/n$. The roots form vertices of a regular n-polygon inscribed in a circle of unit radius, as shown for $n = 3, 4, 5$ in Fig. 2.7.
In a general case of calculating all roots of a number $z = x + iy$, it is necessary to calculate r and $\phi$ first, and then work out all the roots. For instance, consider $\sqrt[3]{1 + 2i}$. Here $r = \sqrt{1^2 + 2^2} = \sqrt{5}$ and $\phi = \arctan(2/1) = \arctan 2$. Therefore, the three roots are

$$z_k = \sqrt[3]{\sqrt{5}}\left[\cos\psi_k + i\sin\psi_k\right] \quad \text{with} \quad \psi_k = \frac{1}{3}\arctan 2 + \frac{2\pi}{3}k, \quad k = 0, 1, 2.$$

Fig. 2.7 Roots $\sqrt[n]{1}$ for $n = 3, 4, 5$ form vertices of regular polygons inscribed in the circle of unit radius
As an example of an application, let us obtain all roots of the quadratic equation $x^2 + 2x + 4 = 0$. Using the general expression for the roots of a quadratic equation, we can write

$$x_{1,2} = -1 \pm \sqrt{1 - 4} = -1 \pm \sqrt{-3} = -1 \pm i\sqrt{3}.$$

Here the $\pm$ sign takes care of the two values of the square root. ◄
Problem 2.17. Prove that the sum of all roots of 1 is equal to zero. [Hint: use Eqs. (2.9) and (2.10).]

Problem 2.18. Obtain all roots of the following quadratic equations:

$$x^2 + x + 1 = 0; \qquad x^2 + 2x + i = 0; \qquad 2x^2 - 3x + 3 = 0.$$

[Answers: $-1/2 \pm i\sqrt{3}/2$; $-1 \pm 2^{1/4}\left[\cos(\pi/8) - i\sin(\pi/8)\right]$; $\left(3 \pm i\sqrt{15}\right)/4$.]

Problem 2.19. Obtain all roots of the equation $z^4 + a^4 = 0$. [Answer: the four roots are $\pm(\pm 1 + i)a/\sqrt{2}$.]

Problem 2.20. Show that all four roots of the equation ($g > 1$)

$$x^4 + 2\left(2g^2 - 1\right)x^2 + 1 = 0$$

are $\pm i\sqrt{2g^2 - 1 \pm 2g\sqrt{g^2 - 1}}$.
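Equation (2.23) translates directly into a small helper; the sketch below (ours) computes all n roots and also checks Problem 2.17 numerically for $n = 5$:

```python
# All n-th roots of a complex number via Eq. (2.23).
import numpy as np

def nth_roots(z, n):
    r, phi = abs(z), np.angle(z)
    k = np.arange(n)
    return r**(1.0 / n) * np.exp(1j * (phi + 2 * np.pi * k) / n)

roots = nth_roots(1 + 2j, 3)
assert np.allclose(roots**3, 1 + 2j)            # every root reproduces z
assert np.isclose(np.sum(nth_roots(1, 5)), 0)   # Problem 2.17 for n = 5
```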
Now let us consider in more detail how different roots of the same complex number z are related to each other. Let us start from the square root $w = \sqrt{z}$. If $|z| = r$ and $\arg(z) = \phi$, then the first root $w_1$ is given by $|w_1| = \sqrt{r}$ and $\arg(w_1) = \phi/2$. If we change $\phi$ of z within the limits of 0 and $2\pi$, excluding the boundary values themselves, i.e. $0 < \phi < 2\pi$, then the argument of $w_1$ would
Fig. 2.9 As z changes along the path in (a), the value of the root $w = \sqrt[n]{z}$ goes from the $w_1$ domain to the $w_2$ domain, as shown for the cases of $n = 2$ (b) and $n = 3$ (c)
change (see Eq. (2.23)) within the limits of 0 and $\pi$, i.e. $0 < \arg(w_1) < \pi$. However, if we consider the second root $w_2$ of the function $w = \sqrt{z}$, then over the same range of z values the phase of the root $w_2$ would vary within the interval $\pi < \arg(w_2) < 2\pi$. This is schematically shown in Fig. 2.8: when z is passed along the closed loop shown in (a), which does not cross the positive part of the x axis, the first root $w_1$ traverses only the upper loop in (b), while the second root $w_2$ traverses the lower part. Similarly, in the case of the function $w = \sqrt[3]{z}$ the arguments of the three roots lie within the intervals, correspondingly, $0 < \arg(w_1) < 2\pi/3$, $2\pi/3 < \arg(w_2) < 4\pi/3$ and $4\pi/3 < \arg(w_3) < 2\pi$. Therefore, if we imagine taking z along the same contour shown in Fig. 2.8(a), the three roots would traverse along the three paths shown in (c) of the same figure. The root function $w = \sqrt[n]{z}$ under this condition ($0 < \arg(z) < 2\pi$) is clearly single-valued and we can choose any of the roots.
Therefore, if the path of z does not cross the positive part of the x axis from below, i.e. the $z = 0$ point is not completely circled, each of the roots remains stable within its respective region. Let us now imagine that we take a contour in which the positive part of the x axis is crossed from below, i.e. the $z = 0$ point is fully circled ($\arg(z)$ goes beyond $2\pi$), as is shown in Fig. 2.9(a). In this case, if we initially start from the root $w_1$, its phase $\arg(w_1)$ goes beyond its range ($\pi$ in the case of $n = 2$ and $2\pi/3$ in the case of $n = 3$), as is shown in Fig. 2.9(b, c), and thus the root function takes on the next value, $w_2$.
Fig. 2.10 To avoid the multi-valued character of the n-root function, a “cut” is to be made from the $z = 0$ point to infinity in order to restrict the phase of z in such a way that the maximum change of $\arg(z)$ would not exceed $2\pi$. For example, this can be done either by cutting along the positive part of the x axis as shown in (a) (in (b) the cut is shown more clearly), in which case $0 < \arg(z) < 2\pi$, or along its negative part as in (c), in which case $-\pi < \arg(z) < \pi$, or along, e.g., the upper part of the imaginary axis as in (d), when $-3\pi/2 < \arg(z) < \pi/2$
p
Fig. 2.11 (a) The function w D z2 1 has two branch points: z D ˙1. A general point z in the
complex plane can be represented in two ways: with respect to either the branch point z D 1 (using
r1 and 1 ) or z D 1 (using r2 and 2 ). Here r1 and r2 are the corresponding distances to the
branch points. (b) One branch cut is drawn from z D 1 to infinity along the positive x direction,
while the other branch cut is drawn in the same direction but from the point z D 1. (c) The previous
case is equivalent to having the branch cut drawn only between the points z D 1 and z D 1; as
in (b) w.z/ appears to be continuous everywhere outside the cut (see text). (d) Another possible
choice of the two branch cuts which leads to a different function w.z/
The branch cuts can be made in various ways, as the idea is to limit the phase of z such that the function $w(z)$ is single-valued. Two other such possibilities for $w = \sqrt{z^2 - 1}$ are shown in Fig. 2.11(b, d). Consider first the construction shown in (b), where the two cuts are drawn in the positive direction of the x axis from both points (so that the cuts overlap in the region $x > 1$). In this case the angles $\phi_1$ and $\phi_2$ change within the same intervals: $0 < \phi_1 < 2\pi$ and $0 < \phi_2 < 2\pi$. To understand the behaviour of $w(z) = \sqrt{z^2 - 1}$, it is sufficient to calculate it on both sides of the x axis only, i.e. above ($y = +0$) and below ($y = -0$) it. We have to consider three cases: (i) $x < -1$, (ii) $-1 < x < 1$ and (iii) $x > 1$. Generally, for any z we can write

$$w = \sqrt{z - 1}\sqrt{z + 1} = \sqrt{r_1r_2}\left[\cos\left(\frac{\phi_1 + \phi_2}{2}\right) + i\sin\left(\frac{\phi_1 + \phi_2}{2}\right)\right], \quad \text{i.e.} \quad \arg(w) = \frac{\phi_1 + \phi_2}{2}.$$
For $x > 1$ we have $r_1 = x - 1$ and $r_2 = x + 1$. Then, on the upper side of the cut $\phi_1 = \phi_2 = 0$ and hence $w = \sqrt{r_1r_2} = \sqrt{x^2 - 1}$. On the lower side $\phi_1 = \phi_2 = 2\pi$ (we have to make a complete circle in the anti-clockwise direction to reach the lower side, whereby accumulating in both cases the phase of $2\pi$) and hence $\arg(w) = 2\pi$, which yields $w = \sqrt{x^2 - 1}$, exactly the same value. Therefore, for any $x > 1$ our function is continuous, as crossing the x axis does not change it.
Next, let us consider x < 1. Here r1 D 1 x and r2 D 1 x; on the upper
side of the x axis 1 D 2 D (a half circle rotation is necessary resulting in the
phase change of only) and thus arg.w/ D as well, so that
$$w=\sqrt{(1-x)(-1-x)}\,\left(\cos\pi+i\sin\pi\right)=-\sqrt{x^2-1}.$$
On the lower side the angles $\theta_1$ and $\theta_2$ are the same, leading to the same value. We conclude that $w(z)$ is also continuous at $x<-1$.
Finally, let us consider the interval $-1<x<1$ between the two branch points, where $r_1=1-x$ and $r_2=x+1$. On the upper side of the cut $\theta_1=\pi$ (a half circle rotation) but $\theta_2=0$, yielding $\arg(w)=\pi/2$ and hence the function there is
$$w=\sqrt{(1-x)(x+1)}\left(\cos\frac{\pi}{2}+i\sin\frac{\pi}{2}\right)=i\sqrt{1-x^2}.$$
On the lower side $\theta_1=\pi$ (a half circle rotation), $\theta_2=2\pi$ (a full circle rotation) and $\arg(w)=3\pi/2$, giving
$$w=\sqrt{(1-x)(x+1)}\left(\cos\frac{3\pi}{2}+i\sin\frac{3\pi}{2}\right)=-i\sqrt{1-x^2}.$$
Hence, the function jumps across the cut between the points $(-1,0)$ and $(1,0)$, i.e. it is discontinuous across the interval $-1<x<1$. Therefore, the two cuts drawn in Fig. 2.11(b) can be equivalently drawn as a single cut between the two points only. Note, however, that this might be misleading, as this cut does not imply that $-\pi<\theta_1<\pi$ and $0<\theta_2<2\pi$: this choice results in an undesired behaviour of our function.
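This branch structure is easy to probe numerically. The following sketch (our illustration, not part of the original text) relies on numpy's principal square root, which happens to realise exactly this branch when the function is written as a product of the two root factors; the offset and sample points are arbitrary choices:

```python
import numpy as np

def w(z):
    # One branch of sqrt(z**2 - 1) written as a product of two root factors;
    # with numpy's principal square root this is continuous across the x axis
    # for |x| > 1 and jumps across the cut for -1 < x < 1.
    return np.sqrt(z - 1.0) * np.sqrt(z + 1.0)

eps = 1e-12  # small offset above/below the real axis
for x in (-2.0, 0.5, 2.0):
    above, below = w(x + 1j * eps), w(x - 1j * eps)
    print(f"x = {x:+.1f}: above = {complex(above):.4f}, below = {complex(below):.4f}")
# x = +-2.0 give the same value on both sides (continuity for |x| > 1),
# while x = 0.5 gives +-i*sqrt(1 - x**2): a jump across the cut.
```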
Problem 2.21. Show that the function $w=\sqrt{z^2-1}$ becomes discontinuous for $x>1$ and $x<-1$ and continuous for $-1<x<1$, if we assume that $-\pi<\theta_1<\pi$ and $0<\theta_2<2\pi$.
Problem 2.22. Show that the function $w=\sqrt{z^2-1}$ is continuous for $-1<x<1$ and discontinuous everywhere else on the $x$ axis if the two cuts are chosen as in Fig. 2.11(d).
Problem 2.23. Analyse the behaviour of the function $w=\sqrt{z^2+1}$ on both sides of the imaginary $y$ axis. Consider cuts similar to the choices of Fig. 2.11(b, d). Note that the branch points in this case are at $z=\pm i$. You may find the drawing in Fig. 2.12 useful. [Answer: for the cut $-1<y<1$ (or, which is the same, two overlapping cuts $y>-1$ and $y>1$) the function $w$ is continuous for $y>1$ and $y<-1$, but is discontinuous across $-1<y<1$; in the case of the cuts $y>1$ and $y<-1$, it is the other way round.]
Problem 2.24. Consider the function $w=\sqrt[3]{(z+1)^2(z-1)}$. Show that in the case of the cut made as in Fig. 2.11(b, c) it is discontinuous for $-1<x<1$, while for the cuts as in Fig. 2.11(d) this happens at $x>1$ and $x<-1$ instead.
Consider now one branch of the $n$-root function, $w=\sqrt[n]{z}$ (we omit the index $k$ for simplicity). Let us show that it is analytic. In order to check the Cauchy–Riemann conditions (2.16), we have to calculate the partial derivatives of $u$ and $v$ with respect to $x$ and $y$. Here $w$ and $z$ are related via
$$z=w^n\;\Longrightarrow\;x+iy=(u+iv)^n.$$
Differentiating both sides of this identity with respect to $x$ and $y$, we obtain two equations:
$$1=n(u+iv)^{n-1}\left(\frac{\partial u}{\partial x}+i\frac{\partial v}{\partial x}\right)\quad\text{and}\quad i=n(u+iv)^{n-1}\left(\frac{\partial u}{\partial y}+i\frac{\partial v}{\partial y}\right),$$
so that
$$\frac{\partial u}{\partial y}+i\frac{\partial v}{\partial y}=\frac{i}{n(u+iv)^{n-1}}=i\left(\frac{\partial u}{\partial x}+i\frac{\partial v}{\partial x}\right).$$
Comparing the real and imaginary parts on both sides, we immediately see that the necessary conditions are indeed satisfied, i.e. the root function is analytic. Therefore, its derivative can be calculated using the general rule for the inverse function:
$$\frac{dw}{dz}=\left(\sqrt[n]{z}\right)'=\frac{1}{dz/dw}=\frac{1}{\left(w^n\right)'_w}=\frac{1}{nw^{n-1}}=\frac{w}{nw^n}=\frac{\sqrt[n]{z}}{nz}=\frac{1}{n}\,z^{1/n-1},$$
which is exactly the same result as for the real $n$-root function.
Thus we see that we can differentiate both $z^n$ and $z^{1/n}$ with respect to $z$ directly using the usual rules; there is no need to split $z=x+iy$ and use any of the formulae (2.19) which followed from the Cauchy–Riemann conditions.
We shall see that in the complex calculus the trigonometric functions are directly related to the exponential function, something which would be impossible to imagine in the world of real functions! To see this we need first to define what we mean by the exponential function on the complex plane $\mathbb{C}$. To this end, we shall employ the famous result of the real calculus, see Sects. I.2.3.4 and I.2.4.6 and especially Eq. (I.2.85),
$$\lim_{n\to\infty}\left(1+\frac{a}{n}\right)^n=e^a,\qquad(2.24)$$
linking the exponential function to the limit of a numerical sequence, and will use it as our definition:
$$e^z=e^{x+iy}=\lim_{n\to\infty}\left(1+\frac{z}{n}\right)^n=\lim_{n\to\infty}\left(1+\frac{x+iy}{n}\right)^n=\lim_{n\to\infty}\left[1+\frac{x}{n}+i\,\frac{y}{n}\right]^n=\lim_{n\to\infty}w^n.\qquad(2.25)$$
Here
$$w=1+\frac{z}{n}=\left(1+\frac{x}{n}\right)+i\,\frac{y}{n}$$
is a complex number with the phase
$$\phi=\arctan\frac{y/n}{1+x/n}=\arctan\frac{y}{x+n}.$$
In the limit $n\to\infty$ the absolute value of $w^n$ tends to $e^x$, while its phase $n\phi$ tends to $y$, where we have used again Eq. (2.24). Thus, the complex number which we call $e^z$ has the absolute value equal to $e^x$ and the phase $y$, i.e. we obtain the fundamental formula
$$e^z=e^{x+iy}=e^x\left(\cos y+i\sin y\right).\qquad(2.28)$$
Let us analyse this result. Firstly, if the imaginary part $y=0$, the exponential turns into the real exponential $e^x$, i.e. the complex exponential coincides exactly with the real one if $z$ is real. Next, as follows from the two problems below, this function satisfies the usual properties of the real exponential function, and is analytic everywhere.
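Both the fundamental formula (2.28) and the limit definition (2.25) can be spot-checked numerically; the following minimal sketch (ours, using only the standard math and cmath modules; the sample point and the values of $n$ are arbitrary) does so for one value of $z$:

```python
import math, cmath

z = 1.3 + 0.7j
x, y = z.real, z.imag

# Fundamental formula (2.28): e^z = e^x (cos y + i sin y)
formula = math.exp(x) * (math.cos(y) + 1j * math.sin(y))
print(abs(cmath.exp(z) - formula))        # ~1e-16: they agree

# Definition (2.25): e^z = lim (1 + z/n)^n, probed at large finite n
for n in (10**3, 10**6):
    print(n, abs((1 + z / n) ** n - cmath.exp(z)))  # error shrinks roughly as 1/n
```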
Problem 2.25. Show that it follows from (2.28) that the following identities are satisfied by the exponential function:
$$e^{z_1}e^{z_2}=e^{z_1+z_2};\quad e^{-z}=\frac{1}{e^z};\quad \left(e^z\right)^n=e^{nz};\quad \left(e^z\right)^{-n}=e^{-nz}\quad\text{and}\quad \frac{e^{z_1}}{e^{z_2}}=e^{z_1-z_2},\qquad(2.29)$$
where $z$, $z_1$ and $z_2$ are complex numbers and $n$ is an integer.
Problem 2.26. Show that
$$\left(e^z\right)^*=\left(e^{x+iy}\right)^*=e^{x-iy}=e^{z^*}.\qquad(2.30)$$
Problem 2.27. Show, using again Eq. (2.28), that the exponential function is analytic everywhere.
Its derivative can then be calculated, e.g. along the $x$ direction:
$$\left(e^z\right)'=\frac{\partial u}{\partial x}+i\frac{\partial v}{\partial x}=e^x\cos y+ie^x\sin y=e^x\left(\cos y+i\sin y\right)=e^z,$$
which again is the familiar result from the world of real functions. So, the exponential function can be differentiated directly with respect to $z$ as if the latter were real.
If we set $x=0$ in Eq. (2.28), we shall obtain
$$e^{iy}=\cos y+i\sin y\quad\text{and}\quad e^{-iy}=\cos y-i\sin y,\qquad(2.31)$$
where Eq. (2.30) was used for the second formula. These two identities were derived by Euler and bear his name. Using the exponential function we can write any complex number $z$ with the absolute value $r$ and the phase $\phi$ simply as $z=re^{i\phi}$.
Problem 2.28. Write the following complex numbers in the exponential form:
$$z=\pm1;\qquad z=1\pm i;\qquad z=\pm i;\qquad z=1\pm\sqrt{3}\,i.$$
[Answers: $1$, $e^{i\pi}$; $\sqrt{2}\,e^{\pm i\pi/4}$; $e^{\pm i\pi/2}$; $2e^{\pm i\pi/3}$.]
Problem 2.29. Write all roots of $\sqrt[5]{\pm1}$ in the exponential form.
Problem 2.30. Show that $\left|e^{ia}\right|=1$, where $a$ is a real number.
Problem 2.31. Show that all roots of the quadratic equation $x^2+(2+i)x+4i=0$ can be written as $x_{1,2}=-1-i/2\pm 17^{1/4}\left(\sqrt{3}/2\right)e^{-i\phi/2}$, where $\phi=\arctan4$.
Problem 2.32. Prove that the sum of all n-roots of 1 is equal to zero. [Hint:
represent the roots in the exponential form and then calculate the sum.]
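A quick numerical illustration of this problem (our sketch): representing the roots as $e^{i2\pi k/n}$ and summing them gives zero to machine precision:

```python
import cmath

def roots_of_unity(n):
    # z_k = exp(i * 2*pi*k / n), k = 0, ..., n-1: the n solutions of z**n = 1
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

for n in (2, 3, 5, 12):
    print(n, abs(sum(roots_of_unity(n))))   # ~1e-16 in every case
```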
From Eq. (2.31) we can express both the cosine and sine functions via the exponential functions:
$$\cos y=\frac{1}{2}\left(e^{iy}+e^{-iy}\right)\quad\text{and}\quad \sin y=\frac{1}{2i}\left(e^{iy}-e^{-iy}\right).\qquad(2.33)$$
It is seen that the trigonometric functions are indeed closely related to the complex exponential function. These relations are called Euler's identities as well.
Expressions (2.31) or (2.33) (that are easy to remember) can be used for quickly
deriving various trigonometric identities (that are not so easy to remember!). For
instance, let us prove the double angle formula for the sine function:
$$\sin(2\alpha)=\frac{1}{2i}\left(e^{i2\alpha}-e^{-i2\alpha}\right)=\frac{1}{2i}\left[\left(e^{i\alpha}\right)^2-\left(e^{-i\alpha}\right)^2\right]=\frac{1}{2i}\left(e^{i\alpha}+e^{-i\alpha}\right)\left(e^{i\alpha}-e^{-i\alpha}\right)$$
$$=2\cdot\frac{1}{2}\left(e^{i\alpha}+e^{-i\alpha}\right)\cdot\frac{1}{2i}\left(e^{i\alpha}-e^{-i\alpha}\right)=2\cos\alpha\,\sin\alpha.$$
by following these steps: (i) make the substitution $t=\sin\theta$; (ii) use the Euler formula (2.33) to express the sine function via complex exponentials; (iii) use the binomial formula and perform the integration over $\theta$ and (iv) note that only a single term in the sum will give a non-zero contribution.
The exponential function is periodic along the imaginary axis with the period $2\pi i$: for any integer $k$,
$$e^{z+i2\pi k}=e^z\,e^{i2\pi k}=e^z\left[\cos(2\pi k)+i\sin(2\pi k)\right]=e^z,$$
since the cosine is equal to one and the sine to zero. In other words, $e^{i2\pi k}=1$ for any integer $k$. This also means that the exponential function of any two complex numbers $z_1$ and $z_2$ related via $z_1=z_2+i2\pi k$ is the same: $e^{z_1}=e^{z_2}$. Therefore, if one considers horizontal stripes $2\pi k\le\operatorname{Im}(z)<2\pi(k+1)$ for any fixed integer $k$, then there will be a one-to-one correspondence between $z$ and $e^z$. This is essential to define the inverse of the exponential function which, as we shall see in the next section, is the logarithm. This situation is similar to the integer power function we considered previously, where it was necessary to restrict the phase of $z$ to define the inverse (the $n$-root) function.
The hyperbolic functions are defined identically to their real-variable counterparts:
$$\sinh z=\frac{1}{2}\left(e^z-e^{-z}\right)\quad\text{and}\quad \cosh z=\frac{1}{2}\left(e^z+e^{-z}\right).\qquad(2.35)$$
They satisfy the familiar identities
$$\cosh^2z-\sinh^2z=1;\qquad \cosh(-z)=\cosh(z);\qquad \sinh(-z)=-\sinh(z).$$
2.3.4 Logarithm
Since the phase $\arg(z)$ of any $z$ from $\mathbb{C}$ is defined up to $2\pi k$ with any integer $k$, the logarithm of $z$ is a multi-valued function. If $\theta$ is one particular phase of $z$, then
$$\ln z=\ln|z|+i\left(\theta+2\pi k\right),\quad k=0,\pm1,\pm2,\ldots\qquad(2.37)$$
To choose a single branch of the logarithm, we fix the value of the integer $k$, and then $\ln z$ will remain within a stripe $2\pi k\le\operatorname{Im}(\ln z)<2\pi(k+1)$, see Fig. 2.13. For instance, by choosing $k=0$ we select the stripe between $0i$ and $2\pi i$. This is the principal branch of the logarithm, corresponding to the cut made along the positive part of the $x$ axis shown in Fig. 2.10(b), as in this case the phase of $z$ (and hence the imaginary part of the logarithm) changes only between $0$ and $2\pi$.
For instance, consider $\ln(-1)$. This logarithm is not defined in real numbers. However, on $\mathbb{C}$ this quantity has a perfect meaning: since $-1=e^{i\pi}$, then $\ln(-1)=\ln1+i(\pi+2\pi k)=i(\pi+2\pi k)$. Here different values of $k$ correspond to the values of $\ln(-1)$ on different branches of the logarithmic function.
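Numerically one usually works with a library's principal branch; the other branches are obtained by adding $i2\pi k$. A small sketch (ours; cmath.log returns the branch with phase in $(-\pi,\pi]$):

```python
import cmath

z = -1 + 0j
principal = cmath.log(z)                 # ln|z| + i*arg(z), arg in (-pi, pi]
for k in (-1, 0, 1, 2):
    branch = principal + 2j * cmath.pi * k
    print(k, branch)                     # ln(-1) = i(pi + 2*pi*k) on the k-th branch
    assert abs(cmath.exp(branch) - z) < 1e-12  # exp inverts every branch
```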
Problem 2.36. Present the following expressions in the form $u+iv$ using the $k$-th branch of the logarithm:
$$\ln i;\quad \ln(1\pm i);\quad \ln\left(\sqrt{3}\pm i\right);\quad \ln\left(1\pm i\sqrt{3}\right);\quad \ln(1+5i).$$
Now let us try to understand if there are any limitations on the domain of allowed $z$ values in order for its logarithm to remain within the particular chosen stripe (and hence to correspond to a single-valued function). In Fig. 2.14(a) we take a closed path which does not contain the point $z=0$ within it; then the logarithmic function $w=\ln z$ goes along the path shown in (b) for each particular branch $k$. The vertical parts of the paths in (b) correspond to the circular parts of the path in (a), along which only the phase of $z$ changes, while the horizontal parts in (b) correspond to the parts of the path in (a) along which only $|z|$ changes. The situation is different, however, if the point $z=0$ lies inside the contour, as shown in Fig. 2.15. In this case $w=\ln z$ passes through the current stripe and goes over into the next one, i.e. revolving around $z=0$ takes the logarithmic function from one of its branches to the next one. As in the case of the $n$-root function, this problem is avoided by taking a branch cut from the branch point $z=0$, which would limit the phase of $z$ between $0$ and $2\pi$, Fig. 2.10.
Above, when choosing the branches, we assumed that the branch cut is made along the positive part of the $x$ axis. Another direction of the cut will change the function. For instance, the cut made along the negative part of the $x$ axis shown in Fig. 2.10(c) restricts the phase of $z$ to lie between $-\pi$ and $\pi$. Hence, the same point on the complex plane will have a different imaginary part of the logarithm. Indeed, the points $z_1=re^{i3\pi/2}$ and $z_2=re^{-i\pi/2}$ are equivalent, but the former (when $0<\arg(z_1)<2\pi$) corresponds to the cut made along the positive, while the latter (when $-\pi<\arg(z_2)<\pi$) along the negative direction of the $x$ axis. Correspondingly, the two values of the logarithm differ by $2\pi i$: $\ln z_1=\ln r+i3\pi/2$ and $\ln z_2=\ln r-i\pi/2$.
Considering a particular branch, we can easily establish that the logarithmic function is analytic, and we can calculate its derivative:
$$\left(\ln z\right)'=\frac{dw}{dz}=\frac{1}{dz/dw}=\frac{1}{\left(e^w\right)'}=\frac{1}{e^w}=\frac{1}{z}.\qquad(2.38)$$
Again, the formula looks the same as for the real logarithm.
The sine and cosine functions of a complex variable are defined from the Euler-like equations (2.33) generalised to any complex number $z$, i.e.
$$\sin z=\frac{1}{2i}\left(e^{iz}-e^{-iz}\right)\quad\text{and}\quad \cos z=\frac{1}{2}\left(e^{iz}+e^{-iz}\right).\qquad(2.39)$$
The trigonometric functions satisfy all the usual properties of the sine and cosine functions of the real variable. First of all, if $z$ is real, these definitions become Euler's formulae and hence give us the usual sine and cosine. Then we see that for a complex $z$ the sine is an odd while the cosine is an even function, e.g.
$$\sin(-z)=\frac{1}{2i}\left(e^{-iz}-e^{iz}\right)=-\frac{1}{2i}\left(e^{iz}-e^{-iz}\right)=-\sin z.$$
Similarly one can establish various trigonometric identities for the sine and cosine, and the manipulations are similar to those in Problem 2.33. For instance, consider
$$\sin^2z+\cos^2z=\left[\frac{1}{2i}\left(e^{iz}-e^{-iz}\right)\right]^2+\left[\frac{1}{2}\left(e^{iz}+e^{-iz}\right)\right]^2$$
$$=-\frac{1}{4}\left(e^{2iz}-2+e^{-2iz}\right)+\frac{1}{4}\left(e^{2iz}+2+e^{-2iz}\right)=\frac{1}{2}+\frac{1}{2}=1,$$
as expected. It is also obvious that since the sine and cosine functions are composed
as a linear combination of the analytic exponential functions, they are analytic.
Finally, their derivatives are given by the same formulae as for the real variable
sine and cosine.
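These statements can again be checked numerically; a brief sketch (ours) compares the definitions (2.39) with the library implementations and verifies $\sin^2z+\cos^2z=1$ at a complex point:

```python
import cmath

z = 0.8 - 1.5j

sin_def = (cmath.exp(1j * z) - cmath.exp(-1j * z)) / 2j   # Eq. (2.39)
cos_def = (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2
print(abs(sin_def - cmath.sin(z)), abs(cos_def - cmath.cos(z)))  # ~1e-16

print(abs(cmath.sin(z) ** 2 + cmath.cos(z) ** 2 - 1))            # ~1e-16
```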
Problem 2.39. Express the complex numbers below in the form $u+iv$:
$$\sin i;\qquad \cos i;\qquad \sin\frac{1+i}{1-i};\qquad \cos\left(\frac{2-i}{2+i}-\frac{3-i}{3+i}\right).$$
[Answers: $i\sinh(1)$; $\cosh(1)$; $i\sinh(1)$; $\cos\frac{1}{5}\cosh\frac{1}{5}-i\sin\frac{1}{5}\sinh\frac{1}{5}$.]
Problem 2.40. Prove that generally:
$$\cos(2z)=\cos^2z-\sin^2z;\qquad \sin(4z)=4\sin z\cos^3z-4\sin^3z\cos z;$$
$$\sin(z_1\pm z_2)=\sin z_1\cos z_2\pm\cos z_1\sin z_2;\qquad (\sin z)'=\cos z;\qquad (\cos z)'=-\sin z.$$
Problem 2.42. Prove directly (by checking derivatives of their real and imag-
inary parts) that both sine and cosine functions are analytic.
Problem 2.43. Prove the following identities using the definitions of the corresponding functions:
$$\cos(iz)=\cosh z;\quad \sin(iz)=i\sinh z;\quad \sinh(iz)=i\sin z;\quad \cosh(iz)=\cos z.\qquad(2.40)$$
Problem 2.44. Prove that
$$\coth x-i\cot y=\frac{\sin(y-ix)}{\sin y\,\sinh x}.$$
The last point which needs investigation is to determine which $z$ points give the same values of the sine and cosine functions. This is required for selecting such domains of $z$ in $\mathbb{C}$ where the trigonometric functions are single-valued and hence where their inverse functions can be defined.
Let us start from the sine function: the equation $\sin z_1=\sin z_2$ leads to either
$$x_1=x_2+2\pi k\ \ (\text{with}\ y_1=y_2),$$
or/and
$$x_1+x_2=\pi+2\pi k\ \ (\text{with}\ y_1=-y_2),$$
where $k$ is an integer.
The first expression reflects the periodicity of the sine function along the real axis with the period of $2\pi$; note that this is entirely independent of the imaginary part of $z$. This gives us vertical stripes (along the $y$ axis) of the width $2\pi$ within which the sine function is single-valued. The second condition is trickier. It is readily seen that if $z$ is contained inside the vertical stripe $-\pi/2<\operatorname{Re}(z)<\pi/2$, then no additional solutions (or relationships between $x_1$ and $x_2$) come out of this extra condition. Indeed, it is sufficient to consider the case of $k=0$ because of the mentioned periodicity. Then, we have the condition $x_1+x_2=\pi$. If both $x_1$ and $x_2$ are positive, this identity will never be satisfied for both $x_1$ and $x_2$ lying between $0$ (including) and $\pi/2$ (excluding). Similarly, if both $x_1$ and $x_2$ were negative, then this condition will not be satisfied if both $x_1$ and $x_2$ lie between $-\pi/2$ and $0$. Finally, if $x_1$ and $x_2$ are of different sign, then the condition $x_1+x_2=\pi$ is not satisfied at all if both of them are contained between $-\pi/2$ and $\pi/2$. Basically, the conclusion is that no identical values of the sine function are found if $z$ is contained inside the vertical stripe $-\pi/2<\operatorname{Re}(z)<\pi/2$, as required.
Problem 2.46. Show similarly that the equation $\cos z_1=\cos z_2$ has the solution of either $x_1=x_2+2\pi k$ ($k$ is an integer) and/or $x_1+x_2=2\pi k$. Therefore, one may choose the vertical stripe $0\le\operatorname{Re}(z)<\pi$ to avoid identical values of the cosine function.
Since the cosine and sine functions were generalised from their definitions given for real variables, it makes perfect sense to define the tangent and cotangent functions accordingly:
$$\tan z=\frac{\sin z}{\cos z}\quad\text{and}\quad \cot z=\frac{\cos z}{\sin z}.$$
The inverse functions are expressed via the logarithm; for instance, writing $z=\sin w$ and solving the resulting quadratic equation for $e^{iw}$, one obtains $w=\arcsin z=-i\ln\left(iz+\sqrt{1-z^2}\right)$. Since the logarithm is a multi-valued function, so is the arcsine. Also, here we do not need to write $\pm$ before the root, since here the root is understood as a multi-valued function.
The general power function is defined via the exponential and the logarithm:
$$w=z^c=e^{c\ln z}.\qquad(2.45)$$
Writing $c=\alpha+i\beta$ and $\ln z=\ln r+i\left(\theta+2\pi k\right)$, we have
$$w=z^c=e^{R+i\Psi},$$
where $R=\alpha\ln r-\beta\left(\theta+2\pi k\right)$ and $\Psi=\beta\ln r+\alpha\left(\theta+2\pi k\right)$. If $\alpha=n$ is an integer (and $\beta=0$), then
$$z^n=r^ne^{in(2\pi k+\theta)}=r^ne^{in\theta},$$
i.e. we obtain our previous single-valued result of Sect. 2.3.1. Similarly, in the $n$-root case, i.e. when $\alpha=1/n$ with $n$ being an integer, we obtain
$$z^{1/n}=r^{1/n}e^{i(2\pi k+\theta)/n}=\sqrt[n]{r}\,\exp\left(i\frac{2\pi}{n}k+i\frac{\theta}{n}\right),$$
which is the same result as we obtained earlier in Sect. 2.3.2. Further, if we now consider a rational power $\alpha=n/m$ with both $n$ and $m$ being integers ($m\neq0$), then
$$z^{n/m}=r^{n/m}\exp\left(i\frac{2\pi n}{m}k+i\frac{n}{m}\theta\right)=\left[r^{1/m}\exp\left(i\frac{2\pi}{m}k+i\frac{\theta}{m}\right)\right]^n$$
and it coincides with the function $\left(\sqrt[m]{z}\right)^n$, as expected. Hence the definition (2.45) indeed generalises our previous definitions of the power function.
Now we shall consider some examples of calculating the general power function. Let us calculate $i^i=\exp(i\ln i)$. Since $\ln i=i\left(\pi/2+2\pi k\right)$, then $i^i=\exp\left(-\pi/2-2\pi k\right)$, i.e. the result is a real number which, however, depends on the branch (the value of $k$) used to define the logarithm.
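For instance (our sketch), Python's principal-branch complex power reproduces the $k=0$ value, and the other branches follow from the formula above:

```python
import math

print((1j) ** 1j)                                        # (0.20788...+0j)
for k in (-1, 0, 1):
    print(k, math.exp(-math.pi / 2 - 2 * math.pi * k))   # k = 0 matches the above
```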
2.4.1 Definition

The integral of a function $f(z)$ of a complex variable along a curve (contour) $L$ is defined, similarly to the real line integral, as the limit of the integral sum
$$\int_Lf(z)\,dz=\lim_{\lambda\to0}\sum_{k=0}^{n-1}f(\zeta_k)\left(z_{k+1}-z_k\right),$$
where the points $z_k$ divide the curve, $\zeta_k$ is an arbitrary point on the curve between $z_k$ and $z_{k+1}$, and $\lambda$ is the maximum distance $|z_{k+1}-z_k|=\sqrt{(x_{k+1}-x_k)^2+(y_{k+1}-y_k)^2}$ between any two adjacent points on the curve. The limit $\lambda\to0$ means that the distances between any two adjacent points become smaller and smaller in the limit (and correspondingly the number of division points $n\to\infty$). It is clear that if the limit exists, then the choice of the internal points $\zeta_k$ is not important.
We observe that this definition is very close to the definition of the line integral of a vector field (of the second kind), Sect. I.6.3.2. Indeed, let $f(z)=u(x,y)+iv(x,y)$, $z_k=x_k+iy_k$ and $\zeta_k=\alpha_k+i\beta_k$; then
$$\sum_{k=0}^{n-1}f(\zeta_k)\left(z_{k+1}-z_k\right)=\sum_{k=0}^{n-1}\left(u_k+iv_k\right)\left[(x_{k+1}-x_k)+i(y_{k+1}-y_k)\right]$$
$$=\sum_{k=0}^{n-1}\left[u_k(x_{k+1}-x_k)-v_k(y_{k+1}-y_k)\right]+i\sum_{k=0}^{n-1}\left[v_k(x_{k+1}-x_k)+u_k(y_{k+1}-y_k)\right]$$
$$=\sum_{k=0}^{n-1}\left[u_k\,\Delta x_k-v_k\,\Delta y_k\right]+i\sum_{k=0}^{n-1}\left[v_k\,\Delta x_k+u_k\,\Delta y_k\right],$$
where $u_k$ and $v_k$ are calculated at the points $(\alpha_k,\beta_k)$. In the $\lambda\to0$ limit the two sums become two real line integrals:
$$\int_Lf(z)\,dz=\int_L\left(u\,dx-v\,dy\right)+i\int_L\left(v\,dx+u\,dy\right).\qquad(2.47)$$
This result shows that the problem of calculating the integral on the complex plane can in fact be directly related, if needed, to calculating two real line integrals in the $x$–$y$ plane. If these two integrals exist (i.e. $u(x,y)$ and $v(x,y)$ are piecewise continuous and bounded in absolute value), then the complex integral also exists and is well defined.
In practice, complex integrals are calculated by using a parametric representation of the contour $L$. Let $x=x(t)$ and $y=y(t)$ (or $z=x(t)+iy(t)=z(t)$) define the curve $L$ via a parameter $t$. Then $dx=x'(t)dt$ and $dy=y'(t)dt$, so that we obtain
$$\int_Lf(z)\,dz=\int_L\left[\left(ux'-vy'\right)+i\left(vx'+uy'\right)\right]dt=\int_L\left(u+iv\right)\left(x'+iy'\right)dt=\int_Lf(z(t))\,z'(t)\,dt.\qquad(2.48)$$
Example 2.3. As an example, let us integrate the function $f(z)=1/(z-z_0)$ around a circle of radius $R$ centred at the point $z_0=x_0+iy_0$ in the anti-clockwise direction, see Fig. 2.17. Here the parameter can be chosen as the polar angle $\phi$, since the points $z$ on the circle can easily be related to $\phi$ via
$$z(\phi)=z_0+Re^{i\phi}.\qquad(2.49)$$
Indeed, if the circle were centred at the origin, then we would have $x(\phi)=R\cos\phi$ and $y(\phi)=R\sin\phi$, i.e. $z=Re^{i\phi}$; however, once the circle is shifted by $z_0$, we add $z_0$, which is exactly Eq. (2.49).
Therefore, we get
$$\oint_{\text{circle}}\frac{dz}{z-z_0}=\begin{vmatrix} z=z_0+Re^{i\phi}\\ dz=z'(\phi)\,d\phi=iRe^{i\phi}\,d\phi\end{vmatrix}=\int_0^{2\pi}\frac{iRe^{i\phi}\,d\phi}{Re^{i\phi}}=i\int_0^{2\pi}d\phi=2\pi i.$$
The absolute value of a contour integral can also be usefully estimated from above:
$$\left|\int_Lf(z)\,dz\right|\le\max_{z\in L}\left|f(z)\right|\,l,\qquad(2.51)$$
where
$$l=\int_L|dz|=\int_L\sqrt{dx^2+dy^2}=\int_L\sqrt{x'(t)^2+y'(t)^2}\,dt$$
is the length of the curve $L$, specified with the parameter $t$; compare with Sect. I.6.3.1.
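Formula (2.48) translates directly into a numerical quadrature; the sketch below (ours; the midpoint rule and the number of panels are arbitrary choices) reproduces the $2\pi i$ of Example 2.3:

```python
import cmath

def contour_integral(f, z, dz, n=2000):
    # Midpoint rule for the parametric integral (2.48) with t in [0, 2*pi]
    h = 2 * cmath.pi / n
    return sum(f(z((k + 0.5) * h)) * dz((k + 0.5) * h) * h for k in range(n))

z0, R = 1.0 + 2.0j, 0.7
z = lambda t: z0 + R * cmath.exp(1j * t)        # the circle (2.49)
dz = lambda t: 1j * R * cmath.exp(1j * t)       # z'(t)
print(contour_integral(lambda p: 1 / (p - z0), z, dz))   # ~ 6.2832j = 2*pi*i
```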
Problem 2.51. Show that $\int_Ldz/z^2=2/R$, where $L$ is the upper semicircle of radius $R$ centred at the origin, traversed from the positive direction of the $x$ axis to the negative one. At the same time, demonstrate that the same result is obtained when the integral is taken along the lower semicircle traversed in the negative $x$ direction (i.e. connecting the same initial and final points).
Theorem 2.2 (Due to Cauchy). If the function $f(z)$ is analytic in some simply connected³ region $D$ of $\mathbb{C}$ and has a continuous derivative⁴ everywhere in $D$, then for any contour $L$ lying in $D$ and starting and ending at the points $z_A=A(x_A,y_A)$ and $z_B=B(x_B,y_B)$, the integral $\int_Lf(z)\,dz$ has the same value, i.e. the integral does not depend on the actual path, but only on the initial and final points.
Proof. Indeed, consider both real integrals in formula (2.47) and let us check whether these two integrals satisfy the conditions of Theorem I.6.5. The region $D$ is simply connected; we hence only need to check whether the condition (2.52) is satisfied in each case. In the first integral we have $P=u$ and $Q=-v$, and hence the required condition (2.52) corresponds to $\partial u/\partial y=-\partial v/\partial x$, which is the second Cauchy–Riemann condition (2.16). Next, in the second integral in (2.47) we instead have $P=v$ and $Q=u$, so that (2.52) becomes $\partial v/\partial y=\partial u/\partial x$, which is the first Cauchy–Riemann condition. Q.E.D.
³ Recall that this was also an essential condition in Theorem I.6.5 dealing with real line integrals.
⁴ In fact, this particular condition can be avoided, although this would make the proof more complex.
Problem 2.52. Prove that the integral over any closed contour $L$ taken inside $D$ is zero:
$$\oint_Lf(z)\,dz=0.\qquad(2.53)$$
This is a corollary to the Cauchy Theorem 2.2. The converse statement is also valid, as is demonstrated by the following Theorem.
Theorem 2.3. If a function $f(z)$ is continuous in a simply connected region $D$ and its integral over any closed contour lying in $D$ is zero, then $f(z)$ is analytic in $D$.
Proof. Since the integral over any closed contour is equal to zero, the integral
$$\int_{z_0}^{z}f(p)\,dp=\int_L\left(u\,dx-v\,dy\right)+i\int_L\left(v\,dx+u\,dy\right)$$
does not depend on the path $L$ connecting the two points, $z_0$ and $z$, but only on the points themselves. There are two line integrals $\int_LP\,dx+Q\,dy$ above, each not depending on the path. Then, from Theorem I.6.5 for line integrals, it follows that in each case $\partial P/\partial y=\partial Q/\partial x$. Applying this condition to each of the two integrals, we immediately obtain the Cauchy–Riemann conditions (2.16) for the functions $u(x,y)$ and $v(x,y)$, which means that indeed the function $f(z)$ is analytic. Q.E.D.
The usefulness of these Theorems can be illustrated by the frequently met integral
$$I(a)=\int_{-\infty}^{\infty}e^{-\beta(x+ia)^2}dx=\int_{-\infty+ia}^{\infty+ia}e^{-\beta z^2}dz,\qquad(2.54)$$
Fig. 2.19 The integration between the points $(\pm\infty,ia)$ along the blue horizontal line $z=x+ia$ ($-\infty<x<\infty$) can alternatively be performed along a different contour consisting of three straight pieces (in purple): a vertical piece going down at $x=-\infty$, then a horizontal line along the $x$ axis and finally again a vertical piece to connect up with the final point
where $a$ is a real number (for definiteness, we assume that $a\ge0$). The integration here is performed along the horizontal line in the complex plane between the points $z_\pm=\pm\infty+ia$, crossing the imaginary axis at the number $ia$. The integrand $f(z)=e^{-\beta z^2}$ does not have any singularities, so any region in $\mathbb{C}$ is simply connected. Hence, the integration line can be replaced by a three-piece contour connecting the same initial and final points, as shown in Fig. 2.19 in purple. The integrals along the vertical pieces, where $x=\pm\infty$, are equal to zero. Indeed, consider the integral over the right vertical piece for some finite $x=R$, the $R\to\infty$ limit being assumed at the end. There
$$\left|\int_a^0e^{-\beta(R+iy)^2}\,i\,dy\right|\le\int_0^a\left|e^{-\beta\left(R^2-y^2\right)}e^{-2i\beta Ry}\right|dy=e^{-\beta R^2}\int_0^ae^{\beta y^2}dy=Me^{-\beta R^2},$$
where $M$ is some positive finite number corresponding to the value of the integral $\int_0^ae^{\beta y^2}dy$. It is seen from here that, since $e^{-\beta R^2}\to0$ as $R\to\infty$, the integral tends to zero.
zero. Similarly it is shown that the integral over the vertical piece at x D R ! 1
is also zero. Therefore, it is only necessary to perform integration over the horizontal
x axis between 1 and C1. Effectively it appears that the original horizontal
contour at z D ia can be shifted down (up if a < 0) to coincide with the x axis, in
which case the integration is easily performed as explained in Sect. 4.2:
Z 1 Z 1 r
2 2
I.a/ D eˇ.xCia/ dx D eˇx dx D : (2.55)
1 1 ˇ
So, the integral (2.54) does not actually depend on the value of a.
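This independence of $a$ is easy to confirm numerically. A sketch (ours; it assumes scipy is available and integrates the real and imaginary parts separately with quad; $\beta$ and the sample values of $a$ are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

beta = 2.0

def I(a):
    f = lambda x: np.exp(-beta * (x + 1j * a) ** 2)
    re, _ = quad(lambda x: f(x).real, -np.inf, np.inf)
    im, _ = quad(lambda x: f(x).imag, -np.inf, np.inf)
    return re + 1j * im

print(np.sqrt(np.pi / beta))        # 1.2533...
for a in (0.0, 0.3, 0.7):
    print(a, I(a))                  # the same value for every a
```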
The importance of the region $D$ being simply connected can be illustrated by our Example 2.3: the contour there is taken around the point $z_0$ at which the function $f(z)=1/(z-z_0)$ is singular. Because of the singularity, the region inside the circular contour is not simply connected: the singular point has to be cut out. The latter can be done by drawing a small circle around $z_0$ and removing all points inside that circle. Thus, the region where the integration contour passes has a hole and hence two boundaries: one is the circle itself and the other is related to the small circle used to cut out the point $z_0$. That is why the integral is not zero. However, if the integral were taken around any contour which does not have the point $z_0$ inside it, e.g. the contour $L'$ shown in Fig. 2.18, then the integral would be zero according to the Cauchy Theorem 2.2.
Since the integral of an analytic function only depends on the starting and ending points of the contour, we may also indicate this explicitly:
$$\int_Lf(z)\,dz=\int_{z_0}^{z}f(z)\,dz.$$
This notation now looks indeed like the one used for a real one-dimensional definite integral, and this similarity is established even more strongly by the following two theorems, which provide us with a formula that is practically exactly the same as the main formula of the integral calculus of Sect. I.4.3.
Theorem 2.4. If $f(z)$ is analytic in a simply connected region $D$, then the function $F(z)=\int_{z_0}^{z}f(p)\,dp$, considered as a function of its upper limit, is also analytic in $D$.
Proof. Let us write the integral explicitly via the real and imaginary parts of the function $f(z)=u+iv$, see Eq. (2.47). However, since we know that both line integrals do not depend on the choice of the path, we can run them along the special path⁵ $(x_0,y_0)\to(x,y_0)\to(x,y)$, i.e. we first move along the $x$ and then along the $y$ axis (cf. Sect. I.6.3.4 and especially Fig. I.6.20, as, indeed, we have done this before!). In this case the real $U(x,y)$ and imaginary $V(x,y)$ parts of $F(z)$ are, respectively:
$$U(x,y)=\operatorname{Re}F(z)=\int_L\left(u\,dx-v\,dy\right)=\int_{x_0}^{x}u(\xi,y_0)\,d\xi-\int_{y_0}^{y}v(x,\eta)\,d\eta,$$
⁵ Note that according to Theorem 2.2 the actual path is not important as long as it lies fully inside the simply connected region $D$.
$$V(x,y)=\operatorname{Im}F(z)=\int_L\left(v\,dx+u\,dy\right)=\int_{x_0}^{x}v(\xi,y_0)\,d\xi+\int_{y_0}^{y}u(x,\eta)\,d\eta.$$
Now, let us calculate all the partial derivatives to check whether $F(z)$ is analytic. We start with $\partial U/\partial x$:
$$\frac{\partial U}{\partial x}=u(x,y_0)-\int_{y_0}^{y}\frac{\partial v(x,\eta)}{\partial x}\,d\eta=u(x,y_0)+\int_{y_0}^{y}\frac{\partial u(x,\eta)}{\partial\eta}\,d\eta$$
$$=u(x,y_0)+\left[u(x,y)-u(x,y_0)\right]=u(x,y),\qquad(2.58)$$
where we replaced $\partial v/\partial x$ with $-\partial u/\partial\eta$ (using $\eta$ for the second variable) because the function $f(z)$ is analytic and hence satisfies the conditions (2.16). A similar calculation yields
$$\frac{\partial V}{\partial x}=v(x,y_0)+\int_{y_0}^{y}\frac{\partial u(x,\eta)}{\partial x}\,d\eta=v(x,y_0)+\int_{y_0}^{y}\frac{\partial v(x,\eta)}{\partial\eta}\,d\eta$$
$$=v(x,y_0)+\left[v(x,y)-v(x,y_0)\right]=v(x,y).\qquad(2.59)$$
In the same manner one obtains
$$\frac{\partial U}{\partial y}=-v(x,y)\quad\text{and}\quad \frac{\partial V}{\partial y}=u(x,y).$$
Hence,
$$\frac{\partial U}{\partial x}=\frac{\partial V}{\partial y}\quad\text{and}\quad \frac{\partial U}{\partial y}=-\frac{\partial V}{\partial x},$$
i.e. the Cauchy–Riemann conditions (2.16) are indeed satisfied for $F(z)$, as required. Q.E.D.
Theorem 2.5. Under the same conditions, $F'(z)=f(z)$, i.e. $F(z)$ serves as an antiderivative of $f(z)$.
Proof. Since we have proven in the previous theorem that the function $F(z)$ is analytic, its derivative $F'(z)$ does not depend on the direction in which it is taken. If we take it, say, along the $x$ axis, then, as follows from Eqs. (2.58) and (2.59),
$$F'(z)=\left(U+iV\right)'=\frac{\partial U}{\partial x}+i\frac{\partial V}{\partial x}=u(x,y)+iv(x,y)=f(z),$$
as required. Q.E.D.
Similarly to the case of real integrals, we can establish a simple formula for calculating complex integrals. Indeed, it is easy to see that different functions $F(z)$, all satisfying the relation $F'(z)=f(z)$, may only differ by a constant. Indeed, suppose there are two such functions, $F_1(z)$ and $F_2(z)$, i.e. $F_1'(z)=F_2'(z)=f(z)$. Consider $F=F_1-F_2$, which has zero derivative: $F'=F_1'-F_2'=f-f=0$. If $F(z)$ were a real function, then it would be obvious that it is a constant. In our case $F=U+iV$ is in general complex, consisting of two real functions, and hence a proper consideration is needed. Because the derivative can be calculated along any direction, we can write for the real, $U$, and imaginary, $V$, parts of the function $F$ the following equations:
$$\frac{dF}{dz}=\frac{\partial U}{\partial x}+i\frac{\partial V}{\partial x}=0\;\Longrightarrow\;\frac{\partial U}{\partial x}=0\ \text{and}\ \frac{\partial V}{\partial x}=0,$$
and
$$\frac{dF}{dz}=-i\left(\frac{\partial U}{\partial y}+i\frac{\partial V}{\partial y}\right)=0\;\Longrightarrow\;\frac{\partial U}{\partial y}=0\ \text{and}\ \frac{\partial V}{\partial y}=0,$$
from which it is clear that $U$ and $V$ can only be constants, i.e. $F(z)=C$, where $C$ is a complex number. This means that the two functions $F_1$ and $F_2$ may only differ by a complex constant, and therefore one can write
$$\int_{z_0}^{z_1}f(z)\,dz=F(z_1)+C$$
Z z1
f .z/dz D F .z1 / C C
z0
with the constant C defined immediately by setting z1 D z0 . Indeed, in this case the
integral is zero and hence C D F .z0 /, which finally gives
Z z1
f .z/dz D F .z1 / F .z0 / ; (2.60)
z0
which does indeed coincide with the main result of real integral calculus (Eq. (I.4.43), the Newton–Leibniz formula, in Sect. I.4.3). The function $F(z)$ may also be called an indefinite integral. This result enables the calculation of complex integrals using methods identical to those used in real calculus, such as integration by parts, change of variables, etc. Many formulae of real calculus for simple integrals can also be directly applied here. Indeed, since the expressions for the derivatives of all elementary functions in $\mathbb{C}$ coincide with those of the functions of a real variable, we can immediately write (assuming the functions in question are defined in a simply connected region):
$$\int e^z\,dz=e^z+C,\qquad \int\sin z\,dz=-\cos z+C,\qquad \int\cos z\,dz=\sin z+C$$
and so on.
Problem 2.54. Consider the integral $I=\int_A^Bz^2\,dz$ between the points $A(1,0)$ and $B(0,1)$ using several methods: (i) along the straight line $AB$; (ii) along the quarter of a circle connecting the two points; (iii) going first from $A$ to the centre $O(0,0)$, and then from $O$ to $B$; and (iv) using directly Eq. (2.60) and finding the appropriate function $F(z)$ for which $F'=z^2$. [Answer: in all cases $I=-(1+i)/3$ and $F(z)=z^3/3$.]
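Parts (i) and (ii) of the problem can be checked by the same quadrature idea as before; in this sketch (ours) both discretised paths give $-(1+i)/3$:

```python
import cmath

def path_integral(f, z, dz, n=4000):
    # Midpoint rule for int_0^1 f(z(t)) z'(t) dt
    h = 1.0 / n
    return sum(f(z((k + 0.5) * h)) * dz((k + 0.5) * h) * h for k in range(n))

f = lambda z: z * z
# (i) straight line from A = 1 to B = i
line = path_integral(f, lambda t: 1 + t * (1j - 1), lambda t: 1j - 1)
# (ii) quarter circle z(t) = exp(i*pi*t/2)
arc = path_integral(f, lambda t: cmath.exp(1j * cmath.pi * t / 2),
                    lambda t: 0.5j * cmath.pi * cmath.exp(1j * cmath.pi * t / 2))
print(line, arc, -(1 + 1j) / 3)     # all three agree
```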
Problem 2.55. Show by an explicit calculation that for any integer $n\neq-1$ the integral $\oint_L(z-z_0)^n\,dz=0$, where $L$ is a circle centred at $z_0$.
The Cauchy theorem above was proven for simply connected regions. We can now generalise this result to multiply connected regions such as, e.g., the ones shown in Fig. 2.2. To this end, let us consider a region $D$ shown in Fig. 2.20(a) which has two holes in it. If we calculate the closed-loop integral $\oint_Lf(z)\,dz$ for some analytic function $f(z)$, it would not in general be zero, since $f(z)$ is not analytic where the holes are and hence our region is not simply connected. This is perfectly illustrated by Example 2.3, where a non-zero value of the integral around the singularity $z_0$ was found. Therefore, in those cases the Cauchy theorem has to be modified.
The required generalisation can easily be made by constructing an additional path which goes around all the "forbidden" regions, as shown in Fig. 2.20(b). In this case we make two cuts to transform our region into a simply connected one; then the integral will be zero over the whole closed loop:
$$\int_Lf(z)\,dz+\int_{L_1}f(z)\,dz+\int_{L_2}f(z)\,dz+\int_{L_c}f(z)\,dz=0,$$
where $L_1$ is taken around the first "forbidden" region, $L_2$ around the second, and $L_c$ corresponds to the two connecting lines traversed in the opposite directions when connecting $L$ with $L_1$ and $L_1$ with $L_2$. Since we can arbitrarily deform the
Fig. 2.20 (a) The contour $L$ is taken around two "forbidden" regions shown in yellow with red boundaries. (b) The contour $L$ is deformed such that it goes round each of the forbidden regions with sub-contours $L_1$ and $L_2$, both traversed in the clockwise direction in such a way that the "allowed" region is always on the left; the red dashed lines indicate the branch cuts made to turn the region into a simply connected one; (c) the contours $L_1$ and $L_2$ are taken in the opposite direction, so that they traverse the "forbidden" regions anti-clockwise
contour inside the simply connected region without changing the value of the integral (which is zero), we can make sure that the connecting lines in $L_c$ pass very close to each other on both sides of each cut, and hence their contribution will be zero. Therefore, we can write
$$\int_Lf(z)\,dz=-\int_{L_1}f(z)\,dz-\int_{L_2}f(z)\,dz=\int_{L_1'}f(z)\,dz+\int_{L_2'}f(z)\,dz=g_1+g_2,\qquad(2.61)$$
where $g_1$ and $g_2$ are the closed-loop integrals around each of the holes, passed in the opposite (anti-clockwise) direction along $L_1'$ and $L_2'$, as shown in Fig. 2.20(c).
Hence, if a loop $L$ encloses several "forbidden" regions, where $f(z)$ is not analytic, as in Fig. 2.20(a), then
$$\oint_Lf(z)\,dz=\sum_k\oint_{L_k}f(z)\,dz,\qquad(2.62)$$
where the sum is taken over all "forbidden" regions falling inside $L$, and in all cases the integrals are taken in the anti-clockwise direction. One can also write the above formula in an alternative form:
$$\oint_Lf(z)\,dz+\sum_k\oint_{L_k}f(z)\,dz=0,\qquad(2.63)$$
where all the contour integrals over $L$ and any of the $L_k$ are run in such a way that the region $D$ is always on the left (i.e. $L$ is run anti-clockwise and any internal ones, $L_k$, clockwise). Formally this last formula can be written in a form identical to the one we obtained for a simply connected region, Eq. (2.53):
$$\oint_{\mathcal{L}}f(z)\,dz=0,\qquad(2.64)$$
where the loop $\mathcal{L}$ is understood as composed of the loop $L$ itself and all the internal loops $L_k$ which surround any of the "forbidden" regions falling inside $L$. All loops are taken in such a way that the region $D$ is on the left.
It is clear that if the loop $L$ goes around the $k$-th hole many times, each time the value $g_k$ of the corresponding loop integral in Eq. (2.61) is added on the right-hand side, in which case
$$\oint_Lf(z)\,dz=\sum_kn_kg_k=\sum_kn_k\oint_{L_k}f(z)\,dz,\qquad(2.65)$$
where $n_k$ is the number of times the $k$-th hole is traversed. These numbers $n_k$ may also be negative if the traverse is made in the clockwise direction, or zero if no traverse is made at all around the given hole (which happens when the hole is outside $L$). The values $g_k$ do not depend on the loop shape, as within the simply connected region the loop can be arbitrarily deformed, i.e. $g_k$ is a "property" of the function $f(z)$. Thus, it is seen from Eq. (2.65) that the value of the integral with the contour $L$ taken inside a multiply connected region with the "forbidden" regions inside $L$ may take many values, i.e. it is inherently multi-valued. Formulae (2.64) and (2.65) are known as the Cauchy theorem for a multiply connected region.
Example 2.4. To illustrate this very point, it is instructive to consider a contour integral of $f(z)=1/z$ between two points $z_0\neq0$ and $z\neq0$. We expect that for any path connecting the points $z_0$ and $z$ and not looping around the $z=0$ point, as, e.g., is the path $L$ shown in Fig. 2.21 by the solid line, the integral is related to the logarithm:
$$\int_{z_0}^{z}\frac{dz'}{z'}=\ln z-\ln z_0\qquad(2.66)$$
(since $(\ln z)'=1/z$). However, if the path loops around the branch point $z=0$ along the way, as does the path $L_1$ shown in the same figure by the dashed line, then the result must be different. Indeed, the path $L_1$ can be split into two parts: the first one, $z_0\to z_1\to z$, which goes directly between the initial and ending points, and the second one, which is the loop itself (passed in the clockwise direction). The first part should give the same result (2.66) as for $L$, as the path $z_0\to z_1\to z$ can be obtained by deforming $L$ while remaining all the time within the simply connected region (this can be done by making a branch cut going, e.g., from $z=0$ along the positive $x$ direction as shown in Fig. 2.10). Concerning the loop integral around $z=0$, it can also be arbitrarily deformed; the result will not depend on the actual shape. In Fig. 2.22(a) two loops are shown: an arbitrarily shaped loop $L$ and a circle $L_o$. We change the direction on the circle and run a branch cut along the positive direction of the $x$ axis as in Fig. 2.22(b); then we connect the two loops by straight horizontal lines $L_c$ running on both sides of the cut. They run in the opposite directions and hence do not contribute to the integral. However, since the whole contour $L+L_c-L_o$ lies entirely in the simply connected region, the Cauchy theorem applies, and hence the result must be zero. Considering that the path $L_c$ does not contribute, we have that
$$\int_L\frac{dz'}{z'}+\int_{-L_o}\frac{dz'}{z'}=0\;\Longrightarrow\;\int_L\frac{dz'}{z'}=-\int_{-L_o}\frac{dz'}{z'}=\int_{L_o}\frac{dz'}{z'},$$
i.e. the two loop integrals in Fig. 2.22(a) are indeed the same. This means that the integral over the loop in Fig. 2.21 can be replaced with one in which the contour is a circle of any radius. We have already looked at this problem in Example 2.3 for some $z_0$ and found the value of $2\pi i$ for the integral taken over a single loop going in the anti-clockwise direction; incidentally, we found that the result indeed does not depend on $R$ (as it should, since changing the radius simply corresponds to a deformation of the contour). Hence, for the contour $L_1$ shown in Fig. 2.21, the result will be
$$\int_{z_0}^{z}\frac{dz'}{z'}=\ln z-\ln z_0-2\pi i,$$
where $-2\pi i$, which is the contribution from the loop, appeared with the minus sign due to the clockwise direction of its traverse. Obviously, we can loop around the branch point in either direction and many times, so that the general result for any contour is
$$\int_{z_0}^{z}\frac{dz'}{z'}=\ln z-\ln z_0+i2\pi k,\qquad(2.67)$$
where $k=0,\pm1,\pm2,\ldots$. We see that the integral is indeed equal to the multi-valued logarithmic function, compare with Eq. (2.37), and the different branches of the logarithm are related directly to the contour chosen.
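The integer $k$ in (2.67) is simply the number of (signed) loops the path makes around $z=0$; the sketch below (ours) confirms this by integrating $1/z'$ along the unit circle traversed $k$ times:

```python
import cmath

def winding_integral(k_turns, n=20000):
    # int dz'/z' along |z'| = 1 traversed k_turns times (negative = clockwise)
    h = 2 * cmath.pi * k_turns / n
    total = 0j
    for j in range(n):
        t = (j + 0.5) * h
        total += (1 / cmath.exp(1j * t)) * 1j * cmath.exp(1j * t) * h
    return total

for k in (-2, -1, 1, 3):
    print(k, winding_integral(k) / (2j * cmath.pi))   # recovers k
```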
Problem 2.56. Using the substitution $t=\tan(x/2)$ from Sect. I.4.4.4, show that for $g>1$
$$\int_{-\pi/2}^{\pi/2}\frac{dx}{g-\sin x}=\int_{-\pi/2}^{\pi/2}\frac{dx}{g+\sin x}=\frac{\pi}{\sqrt{g^2-1}}.\qquad(2.68)$$
Analytic functions have still more very interesting properties. We shall now prove a famous result: the value of an analytic function $f(z)$ at some point $z$ of a multiply connected region $D$ is determined by its values on any closed contour surrounding the point $z$; in particular, this could be the boundary of the region $D$. For a multiply connected region this boundary includes both the external loop and all the internal loops surrounding the "forbidden" regions.
Theorem 2.6 (Due to Cauchy). Let the function $f(z)$ be analytic inside some multiply connected region $D$. Then for any contour $\mathcal{L}$ surrounding the point $z$ and lying inside $D$, we have
$$f(z)=\frac{1}{2\pi i}\oint_{\mathcal{L}}\frac{f(p)\,dp}{p-z},\qquad(2.69)$$
where $\mathcal{L}$ contains the loop $L$ and all the internal loops $\{L_k\}$ which surround any holes ("forbidden" regions) lying inside $L$. Note the direction of the traverse of the outside loop $L$ and any of the internal loops: the "allowed" points of $D$ should always remain on the left.
Proof. The integrand $f(p)/(p-z)$ is analytic everywhere inside $\mathcal{L}$ except at the point $p=z$, which we surround by a small circle $C_r$ of radius $r$ centred at $z$. Then, by the Cauchy theorem (2.64) for a multiply connected region,
$$\oint_{\mathcal{L}}\frac{f(p)}{p-z}\,dp=\oint_{C_r}\frac{f(p)}{p-z}\,dp,$$
where $\mathcal{L}$ is a composite loop consisting of $L$ and all the internal loops $\{L_k\}$ surrounding the "forbidden" regions inside $L$. Therefore:
$$\oint_{\mathcal{L}}\frac{f(p)}{p-z}\,dp=\oint_{C_r}\frac{f(p)}{p-z}\,dp=\oint_{C_r}\frac{f(p)-f(z)}{p-z}\,dp+f(z)\oint_{C_r}\frac{dp}{p-z},\qquad(2.70)$$
where both integrals on the right-hand side are now taken in the anti-clockwise direction. The second integral we have calculated in Example 2.3, where we found that it is equal to $2\pi i$. The first integral is equal to zero. Indeed, it can be estimated using the inequality (2.51) as
$$\left|\oint_{C_r}\frac{f(p)-f(z)}{p-z}\,dp\right|\le\max_{\text{circle}}\left|\frac{f(p)-f(z)}{p-z}\right|\,2\pi r$$
$$=2\pi r\max_{\text{circle}}\left|\frac{f\!\left(z+re^{i\phi}\right)-f(z)}{re^{i\phi}}\right|=2\pi\max_{\text{circle}}\left|f\!\left(z+re^{i\phi}\right)-f(z)\right|.$$
Here we have used that on the circle $p=z+re^{i\phi}$. The circle can be continuously deformed without affecting the value of the integral. In particular, we can make it as small as we want. Taking therefore the limit $r\to0$, the difference $\left|f\!\left(z+re^{i\phi}\right)-f(z)\right|$ tends to zero, and hence the above estimate shows that the first circle integral on the right-hand side of (2.70) tends to zero. Therefore, from (2.70) follows the result we set out to prove. Q.E.D.
If we formally differentiate both sides of Eq. (2.69) with respect to $z$, we get a similar result for the derivative of $f(z)$:
$$f'(z)=\frac{1}{2\pi i}\oint_{\mathcal{L}}\frac{f(p)}{(p-z)^2}\,dp.\qquad(2.71)$$
To justify this step we need the following statement. Let $f(\alpha,z)$ converge uniformly on the contour $L$ to a function $F(z)$ as $\alpha\to\alpha_0$, and let $g(z)$ be bounded on $L$, $|g(z)|\le M$. Then
$$\lim_{\alpha\to\alpha_0}\int_Lf(\alpha,z)\,g(z)\,dz=\int_L\left[\lim_{\alpha\to\alpha_0}f(\alpha,z)\right]g(z)\,dz.\qquad(2.73)$$
In other words, one may take the limit sign inside the integral.
Proof. Since the function $f(\alpha,z)$ converges uniformly with respect to $\alpha$, for any $\epsilon>0$ one can find $\delta>0$, not depending on $z$, such that $|\alpha-\alpha_0|<\delta$ implies $|f(\alpha,z)-F(z)|<\epsilon$. Therefore, considering the limit of the integral $\int_Lf(\alpha,z)\,g(z)\,dz$, we can write down an estimate:
$$\left|\int_Lf(\alpha,z)\,g(z)\,dz-\int_LF(z)\,g(z)\,dz\right|=\left|\int_L\left[f(\alpha,z)-F(z)\right]g(z)\,dz\right|$$
$$<\epsilon\int_L|g(z)|\,|dz|<\epsilon M\int_L|dz|=\epsilon Ml=\epsilon',$$
where $l$ is the length of the whole contour $L$ (including all the internal parts). The above inequality proves the property (2.73). Q.E.D.
The Cauchy Theorem 2.6 enables us to express an analytic function at an internal point $z$ via its values on a contour surrounding it. With the help of formula (2.73) just proven, we can extend the theorem to the derivatives of $f(z)$. We shall now show how to express the derivatives of an analytic $f(z)$ via its values on a contour $L$ surrounding the point $z$. Indeed, the first derivative is the limit of the expression $\left(f(z+\Delta z)-f(z)\right)/\Delta z$, which, with the help of Eq. (2.69), can be written as:
$$\frac{f(z+\Delta z)-f(z)}{\Delta z}=\frac{1}{2\pi i\,\Delta z}\left[\oint_L\frac{f(p)}{p-z-\Delta z}\,dp-\oint_L\frac{f(p)}{p-z}\,dp\right]$$
$$=\frac{1}{2\pi i}\oint_Lf(p)\,\frac{1}{\Delta z}\left(\frac{1}{p-z-\Delta z}-\frac{1}{p-z}\right)dp=\frac{1}{2\pi i}\oint_L\frac{f(p)}{\left(p-z-\Delta z\right)\left(p-z\right)}\,dp.$$
Let $d$ be the minimum distance between the point $z$ and the contour $L$, and let $|\Delta z|<\delta$; for small enough $\delta$ one can always ensure that $d_1=d-\delta>0$. Then, to justify the uniform convergence $1/\left(p-z-\Delta z\right)\to1/\left(p-z\right)$, we have to estimate the difference:
$$\left|\frac{1}{p-z-\Delta z}-\frac{1}{p-z}\right|=\left|\frac{\Delta z}{\left(p-z-\Delta z\right)\left(p-z\right)}\right|<\frac{\delta}{d_1d}=\epsilon.$$
It is seen that the estimate is valid for any $p$ from $L$, so that $\delta$ depends only on $\epsilon$ but not on $p$, and this proves the required uniform convergence. Therefore, Eq. (2.73) is applicable, and we can take the limit $\Delta z\to0$ inside the integral, yielding Eq. (2.71).
Problem 2.57. Prove the general result for the $n$-th derivative:
$$f^{(n)}(z)=\frac{n!}{2\pi i}\oint_L\frac{f(p)}{(p-z)^{n+1}}\,dp.\qquad(2.74)$$
[Hint: use induction.]
This result shows that an analytic function $f(z)$ has derivatives of any order, which are also analytic functions.
To avoid cumbersome notation, when using the Cauchy theorem we shall in the following write $L$ instead of $\mathcal{L}$, assuming that all the internal contours $\{L_k\}$ are included as well if there are "forbidden" regions inside $L$.
Similarly to the case of the real calculus, one can consider infinite numerical series
$$z_1+z_2+\cdots=\sum_{k=1}^{\infty}z_k\qquad(2.75)$$
on the complex plane. The series is said to converge to $z$ if for any $\epsilon>0$ one can find a positive integer $N$ such that for any $n\ge N$ the partial sum of the series, $S_n=\sum_{k=1}^{n}z_k$, differs from $z$ by no more than $\epsilon$, i.e. the following inequality holds: $|S_n-z|<\epsilon$. If such an $N$ cannot be found, the series is said to diverge.
It is helpful to recognise that a complex numerical series consists of two real series. Since each term $z_k=x_k+iy_k$ consists of real and imaginary parts, we can write
$$\sum_{k=1}^{\infty}z_k=\sum_{k=1}^{\infty}x_k+i\sum_{k=1}^{\infty}y_k.$$
Therefore, the series (2.75) converges to $z=x+iy$ if and only if the two real series on the right-hand side of the above equation converge to $x$ and $y$, respectively. This fact allows transferring most of the theorems we proved for real series (Sect. I.7.1) to complex numerical series.
Especially useful for us here is the notion of absolute convergence introduced in Sect. I.7.1.4 for real numerical series with terms which may be either positive or negative. We proved there that a general series necessarily converges if the series consisting of the absolute values of the terms of the original series converges. The same type of statement is valid for complex series as well, and it is formulated in the following Theorem.
Theorem 2.8. If the series
$$\sum_{k=1}^{\infty}|z_k|=\sum_{k=1}^{\infty}\sqrt{x_k^2+y_k^2},\qquad(2.76)$$
constructed from the absolute values of the terms of the original series, converges, then so does the original series.
Proof. Indeed, since $|x_k|\le\sqrt{x_k^2+y_k^2}$ and similarly $|y_k|\le\sqrt{x_k^2+y_k^2}$ for any $k$, the series $\sum_{k=1}^{\infty}|x_k|$ and $\sum_{k=1}^{\infty}|y_k|$ will both converge as long as the series (2.76) converges (see Theorem I.7.6). Then, since the real and imaginary series both individually converge, and converge absolutely, so do the original real and imaginary series, $\sum_{k=1}^{\infty}x_k$ and $\sum_{k=1}^{\infty}y_k$, and hence the series (2.75). Q.E.D.
Similarly to absolutely converging real series, absolutely converging complex series can be added, subtracted and/or multiplied with each other; their sum also does not depend on the order of the terms in the series.
The root and ratio tests for the convergence of the series are also valid. Although the proof of the root test remains essentially the same (see Theorem I.7.7), the ratio test proven in Theorem I.7.8 requires some modification due to the different nature of the absolute value $|z|$ of a complex number. We shall therefore sketch the proof of the ratio test here again, adapting it specifically to complex series.
Theorem 2.9 (The Ratio Test). The series (2.75) converges absolutely if
$$\rho=\lim_{n\to\infty}\left|\frac{z_{n+1}}{z_n}\right|<1,$$
and diverges if $\rho>1$.
Proof. Note that $\rho$ is a non-negative number. Since the limit exists, for any $\epsilon>0$ one can always find a number $N$ such that any $n\ge N$ implies
$$\left|\left|\frac{z_{n+1}}{z_n}\right|-\rho\right|<\epsilon\;\Longrightarrow\;\Bigl||z_{n+1}|-\rho|z_n|\Bigr|<\epsilon|z_n|.\qquad(2.77)$$
From the inequality $|a-b|\le|a|+|b|$ (valid also for complex $a$ and $b$) it follows that $|c-b|\ge|c|-|b|$ (where $c=a+b$). Therefore,
$$|z_{n+1}|<(\rho+\epsilon)|z_n|<(\rho+\epsilon)^2|z_{n-1}|<\cdots<(\rho+\epsilon)^n|z_1|,$$
so that
$$\sum_{n=1}^{\infty}|z_n|=\sum_{n=0}^{\infty}|z_{n+1}|<|z_1|\sum_{n=0}^{\infty}(\rho+\epsilon)^n,$$
and the geometric progression on the right-hand side converges only if $q=\rho+\epsilon$ satisfies $0<q<1$. If $\rho<1$, one can always find a positive $\epsilon$ such that $\rho+\epsilon<1$, and hence the series (2.75) converges.
Consider now the case of $\rho>1$. In this case it is convenient to consider the ratio $z_n/z_{n+1}$, which has the definite limit $\tilde\rho=1/\rho<1$. An argument similar to the one given in the previous case then leads to an inequality:
$$\left|\left|\frac{z_n}{z_{n+1}}\right|-\tilde\rho\right|<\epsilon\;\Longrightarrow\;|z_n|-\tilde\rho|z_{n+1}|\le\Bigl||z_n|-\tilde\rho|z_{n+1}|\Bigr|<\epsilon|z_{n+1}|,$$
which yields
$$|z_{n+1}|>\frac{1}{\tilde\rho+\epsilon}\,|z_n|=\frac{\rho}{1+\rho\epsilon}\,|z_n|>\left(\frac{\rho}{1+\rho\epsilon}\right)^2|z_{n-1}|>\cdots>\left(\frac{\rho}{1+\rho\epsilon}\right)^n|z_1|.$$
Since $\rho>1$, one can always find $\epsilon>0$ such that $q=\rho/\left(1+\rho\epsilon\right)>1$. However, since $q^n\to\infty$ when $n\to\infty$, $|z_{n+1}|\to\infty$ as well, and hence the necessary condition for the convergence of the series, provided by Theorem I.7.4, is not satisfied, i.e. the series (2.75) indeed diverges. Q.E.D.
As in the case of the real calculus, nothing can be said about the convergence of the series if $\rho=1$.
Problem 2.58. Prove that the geometric progression $S=\sum_{k=0}^{\infty}q^k$ (where $q$ is a complex number) converges absolutely if $|q|<1$ and diverges if $|q|>1$. Then show that the sum of the series is still formally given by exactly the same expression, $S=1/(1-q)$, as in the real case. [Hint: derive a recurrence relation for the partial sum, $S_N$, and then take the limit $N\to\infty$.]
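A numerical look at the problem (our sketch): for a complex $q$ with $|q|<1$ the partial sums approach $1/(1-q)$ at the geometric rate $|q|^N$:

```python
q = 0.4 + 0.5j                 # |q| ~ 0.64 < 1: absolute convergence
exact = 1 / (1 - q)

s, term = 0j, 1 + 0j
for n in range(1, 61):
    s += term                  # s is now the partial sum S_n
    term *= q
    if n % 20 == 0:
        print(n, abs(s - exact))   # decays roughly like |q|**n
```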
In this section we shall generalise some of the results of Chap. I.7 to complex functions. Most of the results obtained in Chap. I.7 are valid in these cases as well, although there are some differences. We shall mostly be interested here in uniform convergence (cf. Sect. I.7.2.1).
We shall start by considering a functional sequence $f_1(z)$, $f_2(z)$, $f_3(z)$, etc. We know that the sequence $\{f_n(z)\}$ converges uniformly to $f(z)$ if for any $\epsilon>0$ one can find a number $N=N(\epsilon)$ such that any $n\ge N$ implies $|f_n(z)-f(z)|<\epsilon$ for any $z$. We stress again that it is essential that the number $N$ depends exclusively on $\epsilon$, not on $z$, i.e. the same value of $N$ applies to all $z$ from the region $D$ where all the functions are defined; that is why the convergence is called uniform.
Next, consider an infinite functional series
$$f_1(z)+f_2(z)+\cdots=\sum_{n=1}^{\infty}f_n(z).\qquad(2.78)$$
The series (2.78) is said to converge uniformly to $f(z)$ if the functional sequence of its partial sums
$$S_N(z)=\sum_{n=1}^{N}f_n(z)\qquad(2.79)$$
converges uniformly when $N\to\infty$. Most of the theorems of Sect. I.7.2.1 are valid here as well. In particular, if the series converges, its $n$-th term tends to zero as $n\to\infty$ (cf. Theorem I.7.4). Next, if the series converges uniformly to $f(z)$ and the functions $\{f_n(z)\}$ are continuous, then $f(z)$ is continuous as well, which means that (cf. Theorems I.7.16 and I.7.17)
$$\lim_{z\to z_0}\sum_{n=1}^{\infty}f_n(z)=\sum_{n=1}^{\infty}\lim_{z\to z_0}f_n(z)=f(z_0).$$
Further, one can integrate a uniformly converging series (2.78) term-by-term, i.e. (cf. Theorem I.7.18) for any contour $L$ within the region $D$:
$$\int_L\sum_{n=1}^{\infty}f_n(z)\,dz=\sum_{n=1}^{\infty}\int_Lf_n(z)\,dz.$$
The convergence test due to Weierstrass (Theorem I.7.15) is also valid: if each element of the series $f_n(z)$ beyond some number $N$ (i.e. for all $n>N$) satisfies $|f_n(z)|\le\alpha_n$ and the series $\sum_n\alpha_n$ converges, then the series (2.78) converges uniformly. The proofs of all these statements are almost identical to those given in Chap. I.7, so we do not need to repeat them here.
There are also some additional Theorems specific to complex functions which we shall now discuss.
Theorem 2.10. If the series (2.78) converges uniformly to $f(z)$, and all the functions $\{f_n(z)\}$ are analytic in a simply connected region $D$, then $f(z)$ is also analytic in $D$.
Proof. Indeed, since the series converges uniformly for all $z$ from $D$, we can integrate the series term-by-term, i.e. one can write
$$\oint_Lf(z)\,dz=\sum_{n=1}^{\infty}\oint_Lf_n(z)\,dz,$$
where $L$ is an arbitrary closed contour in $D$. Since the functions $f_n(z)$ are analytic, the closed contour integral of any of them is equal to zero (see Problem 2.52). Therefore, the closed contour integral of $f(z)$, from the above equation, is also zero. But this means, according to Theorem 2.3, that $f(z)$ is analytic. Q.E.D.
The next Theorem states that a uniformly converging functional series (2.78) of analytic functions can be differentiated term-by-term any number of times. The situation is much more restrictive in the real calculus (Theorem I.7.19).
Theorem 2.11. If the series (2.78) of functions analytic in $D$ converges uniformly to $f(z)$, then for any positive integer $k$ the series can be differentiated term-by-term $k$ times.
Proof. Consider a closed loop $L$ in $D$, and let us pick a point $z$ inside $L$ and a point $p$ on $L$. Then, since the series (2.78) converges uniformly to $f(z)$ for any $z$, including the points $p$ on the contour $L$, we can write
$$f(p)=\sum_{n=1}^{\infty}f_n(p).$$
Next, we multiply both sides of this equation by $k!/\left[2\pi i\left(p-z\right)^{k+1}\right]$ with some positive integer $k$ and integrate over $L$ (note that the integration can be done term-by-term on the right-hand side, as the series converges uniformly), obtaining
$$\frac{k!}{2\pi i}\oint_L\frac{f(p)\,dp}{(p-z)^{k+1}}=\sum_{n=1}^{\infty}\frac{k!}{2\pi i}\oint_L\frac{f_n(p)\,dp}{(p-z)^{k+1}}.$$
According to the previous theorem, $f(z)$ is analytic. Therefore, we can use formula (2.74) on both sides, which yields
$$f^{(k)}(z)=\sum_{n=1}^{\infty}f_n^{(k)}(z),$$
as required. Q.E.D.
The series
$$\sum_{k=0}^{\infty}c_k(z-a)^k,\qquad(2.80)$$
in which the functions $f_k(z)$ are powers of $z-a$ (where $a$ is also complex) and $c_k$ are some complex coefficients, is called a power series in the complex plane $\mathbb{C}$. Practically all the results of the real calculus we considered before are transferred (with some modifications) to complex power series.
We shall start by stating again Abel's Theorem I.7.20, which we reformulate here for the case of the complex power series.
Theorem 2.12 (Due to Abel). If the power series (2.80) converges at some point $z_0\neq a$, see Fig. 2.23(a), then it converges absolutely within the circle $|z-a|<r$, where $r=|z_0-a|$; moreover, it converges uniformly for any $z$ within a circle $|z-a|\le\rho$, where $\rho=\gamma r$ with $0<\gamma<1$.
Proof. Since the series converges at $z_0$, its terms tend to zero and are therefore bounded, i.e. $\left|c_k(z_0-a)^k\right|\le M$ for all $k$, where $M$ is some positive number. Then, for any $z$ inside the circle of radius $r$, i.e. within a circle $C_\rho$ with radius $\rho=\gamma r$ and $0<\gamma<1$, we can write
$$\left|c_k(z-a)^k\right|=\left|c_k(z_0-a)^k\right|\left|\frac{(z-a)^k}{(z_0-a)^k}\right|=\left|c_k(z_0-a)^k\right|\left|\frac{z-a}{z_0-a}\right|^k<M\kappa^k,$$
where $\kappa=\left|(z-a)/(z_0-a)\right|<\rho/r<1$. Hence, the absolute value of each term of our series is bounded by the elements of the converging geometric progression $M\kappa^k$ with $0<\kappa<1$, and hence, according to the corresponding analogue of the Weierstrass Theorem I.7.15, the series converges absolutely and uniformly within the circle $C_\rho$. Q.E.D.
Problem 2.59. Prove by contradiction that if the series (2.80) diverges at some $z_0\neq a$, then it diverges for any $z$ lying outside the circle $C_r$ of radius $r=|z_0-a|$.
Problem 2.60. Prove that if it is known that the series (2.80) converges at some $z_0\neq a$ and diverges at some $z_1$ (obviously, $|z_1-a|>|z_0-a|$), then there exists a positive $R>0$ such that the series diverges outside the circle $C_R$, i.e. for any $z$ satisfying $|z-a|>R$, and absolutely converges inside $C_R$, i.e. for any $z$ satisfying $|z-a|<R$.
The number $R$ is called the radius of convergence of the series (cf. Sect. I.7.3.1). It follows now from Theorem 2.10 that the series (2.80) is an analytic function inside the circle $C_R$ of its radius of convergence $R$. This in turn means that it can be differentiated and integrated term-by-term any number of times. The series obtained this way have the same radius of convergence. The radius of convergence can be determined from either the ratio or the root test via the following formulae (cf. Sect. I.7.3.1):
$$R=\lim_{n\to\infty}\left|\frac{c_n}{c_{n+1}}\right|\quad\text{and/or}\quad \frac{1}{R}=\overline{\lim_{n\to\infty}}\,\sqrt[n]{|c_n|},\qquad(2.81)$$
where in the latter case the largest of the limiting values of the root is implied.
Problem 2.61. Determine the region of convergence of the power series with the coefficients $c_k=3^k/k$ around the point $a=i$. [Answer: $|z-i|<1/3$.]
Problem 2.62. Determine the region of convergence of the power series with the coefficients $c_k=1/\left(2^k\sqrt{k}\right)$ around the point $a=0$. Does the series converge at the points $z=i$ and $z=3-i$? [Answer: $|z|<2$; yes; no.]
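The first formula in (2.81) is directly usable numerically; this sketch (ours) estimates the radius of convergence for the coefficients of Problem 2.61 from a single large-$n$ ratio:

```python
def radius_by_ratio(c, n=500):
    # R ~ |c_n / c_{n+1}| for large n, the first of the formulae (2.81)
    return abs(c(n) / c(n + 1))

c = lambda k: 3.0 ** k / k          # coefficients of Problem 2.61
print(radius_by_ratio(c))           # ~0.334, i.e. R = 1/3
```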
Consider now a function $f(z)$ analytic inside a circle $C_R$ of radius $R$ centred at the point $a$, a point $z$ inside it, and a circle $C_\rho$ of radius $\rho<R$ centred at $a$ and containing the point $z$; let $p$ be a point on $C_\rho$. Then the inequality
$$|q|=\left|\frac{z-a}{p-a}\right|<1$$
(as the points $z$ and $a$ lie inside $C_\rho$, while $p$ is on it) may be used to form an infinite geometric progression:
$$\sum_{k=0}^{\infty}q^k=\sum_{k=0}^{\infty}\left(\frac{z-a}{p-a}\right)^k=\frac{1}{1-q}.$$
Hence,
$$\frac{1}{p-z}=\frac{1}{(p-a)-(z-a)}=\frac{1}{p-a}\,\frac{1}{1-\frac{z-a}{p-a}}=\frac{1}{p-a}\,\frac{1}{1-q}=\frac{1}{p-a}\sum_{k=0}^{\infty}q^k=\sum_{k=0}^{\infty}\frac{(z-a)^k}{(p-a)^{k+1}},\qquad(2.82)$$
where the series on the right converges uniformly for all $p$ on the circle $C_\rho$.
Therefore, it can be integrated term-by-term. Multiplying both sides of Eq. (2.82) by $f(p)/2\pi i$ and integrating over the circle $C_\rho$, we get
$$\frac{1}{2\pi i}\oint_{C_\rho}\frac{f(p)}{p-z}\,dp=\sum_{k=0}^{\infty}\frac{(z-a)^k}{2\pi i}\oint_{C_\rho}\frac{f(p)}{(p-a)^{k+1}}\,dp=\sum_{k=0}^{\infty}(z-a)^k\left[\frac{1}{2\pi i}\oint_{C_\rho}\frac{f(p)\,dp}{(p-a)^{k+1}}\right].\qquad(2.83)$$
Using now formulae (2.69) and (2.74) for the left- and right-hand sides, respectively, and recalling that $f(p)$ is analytic on $C_\rho$ as the latter lies inside $C_R$, we see that the left-hand side is equal to $f(z)$, while on the right-hand side we have the $k$-th derivative of $f$ at the point $a$. Hence, we finally obtain
$$f(z)=\sum_{k=0}^{\infty}c_k(z-a)^k,\quad\text{where}\quad c_k=\frac{f^{(k)}(a)}{k!}=\frac{1}{2\pi i}\oint_{C_\rho}\frac{f(p)\,dp}{(p-a)^{k+1}},\qquad(2.84)$$
which is the final result. Note that the expansion converges uniformly, since it was obtained by a term-by-term integration of the uniformly converging geometric progression; moreover, the series is an analytic function (Theorem 2.10).
The formula for the series above looks exactly the same as the Taylor formula for real functions (see Sect. I.7.3.3). Hence, since the formulae for the differentiation of all elementary functions on the complex plane are identical to those in the real case, the Taylor expansions of the elementary functions also look identical. For instance, we can immediately write the following expansions around $a=0$:
$$e^z=1+z+\frac{z^2}{2!}+\cdots=\sum_{n=0}^{\infty}\frac{z^n}{n!};\qquad(2.85)$$
$$\sin z=z-\frac{z^3}{3!}+\frac{z^5}{5!}-\frac{z^7}{7!}+\cdots+\frac{(-1)^{n+1}z^{2n-1}}{(2n-1)!}+\cdots=\sum_{n=1}^{\infty}\frac{(-1)^{n-1}z^{2n-1}}{(2n-1)!};\qquad(2.86)$$
$$\cos z=1-\frac{z^2}{2!}+\frac{z^4}{4!}-\cdots+\frac{(-1)^nz^{2n}}{(2n)!}+\cdots=\sum_{n=0}^{\infty}\frac{(-1)^nz^{2n}}{(2n)!};\qquad(2.87)$$
$$\ln(1+z)=z-\frac{z^2}{2}+\frac{z^3}{3}-\cdots+(-1)^{n+1}\frac{z^n}{n}+\cdots=\sum_{n=1}^{\infty}(-1)^{n+1}\frac{z^n}{n};\qquad(2.88)$$
$$(1+z)^{\alpha}=1+\alpha z+\frac{\alpha(\alpha-1)}{2}z^2+\cdots+\frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)}{n!}z^n+\cdots=\sum_{n=0}^{\infty}D_n^{\alpha}z^n,\qquad(2.89)$$
where $\alpha$ is generally complex and the "generalised binomial" coefficients $D_n^{\alpha}$ are given by
$$D_n^{\alpha}=\binom{\alpha}{n}=\frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)}{n!}\qquad(2.90)$$
(cf. Eq. (I.3.70)). The latter two expansions are written for the single-valued branches of these functions which correspond to the values of $0$ and $1$ of the functions at the point $z=0$, respectively.
Problem 2.63. Show that the radius of convergence of the series (2.85)–(2.87) is $R=\infty$ (i.e. they converge for all $z$).
Problem 2.64. Show that the radius of convergence of the series (2.88) and (2.89) is $R=1$ (i.e. they converge for $|z|<1$).
Problem 2.65. Derive an analogue of the Taylor formula for complex functions (cf. Eq. (I.3.60)),
$$f(z)=\sum_{k=0}^{n}\frac{(z-a)^k}{k!}\,f^{(k)}(a)+R_{n+1},\qquad(2.91)$$
with $R_{n+1}$ being the remainder term, by starting from the finite geometric progression (with its remainder) instead of the infinite one and then applying the method we used above when deriving the Taylor series.
Problem 2.66. Consider the Taylor series of $e^{ix}$ with $x$ real, separate out the even and odd powers of $x$ and hence prove the Euler formulae (2.31).
Exactly in the same way as in Sect. I.7.3.3 it can be shown that any power series of an analytic function $f(z)$ coincides with its Taylor series, i.e. the Taylor series is unique.
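Since (2.85) converges for every $z$, its truncations can serve as a practical approximation; a sketch (ours) comparing partial sums with cmath.exp:

```python
import cmath

def exp_taylor(z, n_terms):
    s, term = 0j, 1 + 0j
    for n in range(n_terms):
        s += term
        term *= z / (n + 1)          # builds z**n / n! incrementally
    return s

z = 2.0 - 3.0j
for n in (5, 10, 20, 40):
    print(n, abs(exp_taylor(z, n) - cmath.exp(z)))   # -> 0 rapidly
```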
The Taylor’s series is useful in expanding the function f .z/ within a circle jz aj <
R where f .z/ is analytic. However, if f .z/ is not analytic at the point z D a, Taylor’s
2.5 Complex Functional Series 187
expansion around this point cannot be applied. As was shown by Laurent6 it is still
possible to expand f .z/ around the point z D a, but in this case the series would not
only contain terms .z a/k with positive powers k, but also terms with negative k as
well, i.e. this, the so-called Laurent series, would have the general form:
1
X
f .z/ D ck .z a/k : (2.93)
kD1
For some functions the series may contain a finite number of terms in the part of the series with positive or negative $k$. This is determined by the character of the point $z=a$ where $f(z)$ has a singularity. We shall postpone considering this particular aspect in more detail until later; now let us derive the Laurent series.
Consider $f(z)$ which is analytic inside a circle of radius $R$ around the point $a$, except at the point $a$ itself, i.e. the function is analytic in the ring $0<|z-a|<R$, see Fig. 2.24. Take now a point $z$ inside the ring formed by a circle $C_r$, surrounding the point $z=a$, and a larger circle $C_\rho$ which encloses the point $z$ but remains inside the circle $C_R$ of radius $R$, i.e. $0<r<\rho<R$. The value of the function $f(z)$ at the point $z$ can then be expressed employing the generalisation (2.64) of the Cauchy theorem, yielding
$$f(z)=\frac{1}{2\pi i}\oint_{C_\rho}\frac{f(p)}{p-z}\,dp-\frac{1}{2\pi i}\oint_{C_r}\frac{f(p)}{p-z}\,dp.\qquad(2.94)$$
Both integrals are taken in the anti-clockwise direction. Note that in the first integral
over C the points p are further away from z, i.e. jp aj > jz aj, and hence
1= .p z/ can be expanded into a geometric progression (2.82) leading to the
Taylor’s series for the first integral in (2.94), i.e.
6
Karl Weierstrass discovered the series 2 years before Laurent, but published his results more than
50 years later.
188 2 Complex Numbers and Functions
I X 1 I
1 f .p/ 1 f .p/dp
dp D ck .z a/k ; where ck D ;
2 i C pz kD0
2 i C .p z/kC1
(2.95)
see Eq. (2.83). Note that the coefficients ck here cannot be written via f .k/ .a/ as the
latter does not exist (f .z/ is not analytic at z D a).
In the second integral in Eq. (2.94) points p lie closer to a than z, i.e. jp aj <
jz aj. In this case we can expand with respect to q D .p a/ = .z a/ (so that
jqj < 1), i.e.
1
1 1 1 1 1 1 1 X k
D D pa D D q
pz .p a/ .z a/ z a 1 za za1q z a kD0
1
X .p a/k1
D ;
kD1
.z a/k
where in the last step we changed the summation index in the sum, so that now it
starts from k D 1. This leads to the following formula for the second integral:
I X 1 I
1 f .p/ 1
dp D ck .z a/k ; where ck D .p a/k1 f .p/dp:
2 i Cr p z kD1
2 i Cr
Here the sum runs over negative powers of .z a/ using the positive index k; if we
change the index k ! k, so that the summation index k runs over all negative
integer numbers between 1 and 1, then we obtain for the second integral
in (2.94) instead:
I X1 I
1 f .p/ 1 f .p/
dp D ck .z a/k ; where ck D dp:
2 i Cr pz kD1
2 i Cr .p a/kC1
(2.96)
The latter form now looks extremely similar to the expansion (2.95) for the first
integral which allows combining both results into a single formula:
1
X I
1 f .p/
f .z/ D ck .z a/ ; k
where ck D dp; (2.97)
kD1
2 i C .p a/kC1
which is called the Laurent series. Note that here C is any closed contour lying
between Cr and C . Indeed, the loop C in the formula for ck with positive k can
be deformed into C as long as C remains inside the ring formed by Cr and C , and
for negative k the same can be done with the loop Cr which can be freely deformed
into C.
The part of the series containing negative powers of .z a/ is called the principal
part of the Laurent series. Also note that since both parts of the Laurent series
2.5 Complex Functional Series 189
(for positive and negative k) were based on the uniformly converging geometric
progressions, the Laurent series also converges uniformly inside the ring 0 <
jz aj < R. Moreover, the Laurent series represents an analytic function in the
ring as consisting of two series (corresponding to negative and positive k), each of
which is analytic. The following Theorem establishes uniqueness of the series and
shows that if f .z/ is analytic in a ring r < jz aj < R, then its expansion (2.93) over
positive and negative powers of .z a/ is unique and hence is given by Eq. (2.97),
i.e. it must be the Laurent series.
Theorem 2.13. Consider the series (2.93) that converges to f .z/ within a ring
r < jz aj < R. Then its expansion over positive and negative powers of .z a/
is analytic, unique and hence coincides with its Laurent expansion.
Proof. Indeed, consider the expansion (2.93). Its part with k 0 converges inside
the circle jz aj < R, while its part with k 1 converges for any z satisfying
jz aj > r. Indeed, using the ratio test for the positive part, we have
ˇ ˇ ˇ ˇ ˇ ˇ
ˇ ckC1 ˇ ˇ ˇ ˇ ck ˇ
lim ˇˇ ˇ jz aj D jz aj lim ˇ ckC1 ˇ < 1 H) jz aj < lim ˇˇ ˇ D R;
k!1 c ˇ k!1 ˇ c ˇ k!1 ckC1 ˇ
k k
Either series converges uniformly; the proof for the positive part of the series is
identical to that given by the Abel’s Theorem 2.12; for the negative part the proof
has to be slightly modified and we shall sketch it here (using temporarily k as
positive for convenience). Consider a point z0 inside the ring such that jz0 aj <
ˇjz aj. The k
series
ˇ converges at z0 and hence each of its terms must be bounded:
ˇck .z0 a/ ˇ < M. Then,
ˇ ˇ ˇ ˇ ˇ ˇk
ˇ ˇ ˇ ˇˇ k ˇ ˇ kˇ ˇ ˇ
ˇck .z a/k ˇ D ˇck .z0 a/k ˇ ˇˇ .z a/ ˇˇ < M ˇˇ .z0 a/ ˇˇ D M ˇ z0 a ˇ D M k ;
ˇ .z0 a/k ˇ ˇ .z a/k ˇ ˇ za ˇ
where D jz0 aj = jz aj < 1. Since the absolute value of each element of our
series is bounded by the elements k of the converging geometric progression, the
series converges absolutely and uniformly because of the Weierstrass test (Theorem
I.7.15).
Now, since the expansion (2.93) converges uniformly, it can be integrated term-
by-term. Let us multiply both sides of this equation by .z a/n f .z/=2 i with some
fixed integer n and integrate in the anti-clockwise direction over a circle C with the
point a in its centre:
190 2 Complex Numbers and Functions
I 1
X I
1 1
.z a/n f .z/dz D ck .z a/nCk dz:
2 i C kD1
2 i C
The integral in the right-hand side is equal to zero for any n C k ¤ 1 (see
Problem 2.55), and is equal to 2 i for n C k D 1 (Example 2.3). Therefore, in
the sum in the right-hand side only the single term with k D n 1 survives, and
we obtain
I I
1 1 f .z/
.z a/n f .z/dz D cn1 H) cn D dz;
2 i C 2 i C .z a/nC1
which is exactly the same as in the Laurent series, see Eq. (2.97) (of course, the circle
can be deformed into any contour lying inside the ring). This proves the second part
of the Theorem. Q.E.D.
Formula (2.97) can be used to find the Laurent expansion of any function
f .z/ which is analytic in a ring. This requires calculating closed-loop contour
integrals for the expansion coefficients ck via Eq. (2.97). However, in the cases of
f .z/ D Qn .z/=Pm .z/, where Qn .z/ and Pm .z/ are polynomials of the power n and m,
respectively, simpler methods can be used based on a geometric progression.
Example
2.5. I Let us find the Laurent series for the function f .z/ D
1= z2 3z C 2 around a D 0.
The quadratic polynomial in the denominator has two roots at z1 D 1 and z2 D 2, i.e.
1 1 1
f .z/ D D : (2.98)
.z 1/ .z 2/ z2 z1
Since singularities are at z D 1 and z D 2, we will have to consider three circular
regions: A f0 jzj < 1g, B f1 < jzj < 2g and C fjzj > 2g, see Fig. 2.25(a), where
each of the two fractions can be considered separately. In region A we can expand
directly with respect to z as jzj < 1:
X1 1
1 1 1 1 1 1X z k
D D zk and D D ;
z1 1z kD0
z2 2 1 z=2 2 kD0 2
i.e. it is basically represented by the Taylor’s series. Region B is a ring 1 < jzj < 2
and hence we should expect the negative part of the Laurent expansion be presented
as well. And, indeed, since jzj > 1, we cannot expand the fraction 1=.z 1/ in terms
of z, but rather should be able to do it in terms of 1=z:
2.5 Complex Functional Series 191
Fig. 2.25 For the expansion into the Laurent series of f .z/ D 1= z2 3z C 2 around (a) a D 0
and (b) a D 3. Blue circles separate three regions A, B and C in each case
1 1 1
1 1 1 1 X 1 k X k1 X k
D D D z D z :
z1 z 1 1=z z kD0 z kD0 kD1
On the other hand, since jzj < 2, the same expansion as above can be used for
1= .z 2/. Therefore, in this ring region
X1 1
1 1X z k
D zk ;
.z 1/ .z 2/ kD1
2 kD0 2
i.e. it contains both negative and positive parts. Finally, in region C we have jzj > 2
and hence for both fractions we have to expand in terms of 1=z. The expansion of
1= .z 1/ stays the same as for the previous region, while for the other fraction
1 1 1
1 1 1 1 X 2 k X k k1 X k1 k
D D D 2z D 2 z ;
z2 z 1 2=z z kD0 z kD0 kD1
X1 X1 X1
1 k1
D zk C 2k1 zk D 2 1 zk ;
.z 1/ .z 2/ kD1 kD1 kD1
i.e. the expansion contains only the negative part of the Laurent series.J
Example 2.6. I In this example we shall expand the same function around a D 3
instead.
We again have three regions A, B and C as depicted in Fig. 2.25(b). Region A
corresponds to a circle centred at z D 3 and with the radius 1, i.e. jz 3j < 1, up to
192 2 Complex Numbers and Functions
the nearest singularity at z D 2; region B forms a ring between the two singularities,
i.e. 1 < jz 3j < 2, while the third region C corresponds to region jz 3j > 2. Let
us construct the Laurent series for region B:
X1 X1
1 1 1 1 .1/k .1/k
D D 1
D D ;
z2 .z 3/ C 1 z 3 1 C z3 kD0
.z 3/kC1 kD1
.z 3/k
1 1
1 1 1 1 1X k .z 3/
k X
k .z 3/
k
D D D .1/ D .1/ ;
z1 .z 3/ C 2 2 1 C z3
2
2 kD0 2k kD0
2kC1
X1 1
X
1 .1/k k .z 3/
k
D .1/ :
.z 1/ .z 2/ kD1
.z 3/k kD0 2kC1
Here, when expanding the first fraction, 1=.z 2/, we have in the denominator
.z 3/ C 1, where jz 3j > 1, and hence we must expand using inverse powers.
In the second case of the fraction 1=.z 1/, the denominator becomes .z 3/ C 2
with jz 3j < 2, and hence we can expand with respect to .z 3/=2 which results
in terms with positive powers. J
Example 2.7. I Let us expand into the Laurent series the function f .z/ D sin .1=z/
around a D 0.
Here we have a single region jzj > 0. Since the Taylor’s expansion for the sine
function (2.86) converges for all values of z, we can just use this expansion with
respect to p D 1=z to get
X 1
1 .1/n1 2nC1
sin D z :
z nD1
.2n 1/Š
Problem 2.67. Expand f .z/ D exp .1=z/ into the Laurent series around a D 0.
Problem
2.68. Show that the Laurent expansion of f .z/ D
1= z2 .i C 3/ z C 3i around a D 4i is
1
X
ˇk
f .x/ D ˛k .z 4i/k C ;
kD0
.z 4i/kC1
Singularities of a function f .z/ are the points where it is not analytic; as we shall
discuss here, they are closely related to the Laurent series of f .z/.
Firstly, we shall consider the so-called isolated singularities. The point z D a
is an isolated singularity, if one can always find its neighbourhood such where f .z/
is analytic apart from the singularity point itself, i.e. one can always find r > 0
such that in the ring 0 < jz aj < r the function f .z/ has no other singularities.
For instance, the function f .z/ D 1= z2 3z C 2 has two isolated singularities at
z D 1 and z D 2, since one can always draw a circle of the radius r < 1 around
each of these points and find f .z/ to be analytic everywhere in those circles apart
from the points z D 1 and z D 2 themselves. Isolated singularities, in turn, may be
of three categories:
The point z D a is removable if f .z/ is not analytic there although its limit,
limz!a f .z/, exists. In this case one can define f .z/ at z D a via its limit making
f .z/ analytic at this point as well, i.e. the singularity can be “removed”. Since f .z/
has a well-defined limit at z ! a, its Laurent expansion cannot contain negative
power terms; therefore, f .z/ must be represented on the ring 0 < jz aj < R (with
some R) by the Taylor’s series (2.84) with f .a/ being defined as its zero power
coefficient c0 (as all other terms tend to zero in the limit).
Example 2.8. I Function f .z/ D sin z=z has z D 0 as an isolated removable
singularity.
194 2 Complex Numbers and Functions
Indeed, we can expand the sine functions for jzj > 0 in the Taylor’s series to see that
the singularity is removed:
1
sin z 1 X .1/n1 2n1 1 z3 z2
D z D z C D 1 C ;
z z nD1 .2n 1/Š z 3Š 3Š
2.5.5.2 Poles
where '.z/ has only positive terms in its expansion, i.e. it is expandable into the
Taylor’s series, and hence is well defined in the neighbourhood of z D a including
the point z D a itself. Therefore, the origin of the singularity (and of the infinite
limit of f .z/ when z ! a) is due to the factor 1= .z a/n .
Above, n corresponds to the largest negative power term in the expan-
sion (2.99). If n D 1, i.e. the expansion starts from the term c1 = .z a/, the pole
is called simple. Otherwise, if it starts from cn = .z a/n , the pole is said to be of the
order n. It is easy to see that
1
X
.z a/n f .z/ D '.z/ D cnCk .z a/k
kD0
has a well-defined limit at z ! a equal to cn , which must be neither zero nor
infinity. Therefore, by taking such a limit it is possible to determine the order of the
pole:
Order of pole n is when lim .z a/n f .z/ is neither zero nor infinity:
z!a
(2.100)
2.5 Complex Functional Series 195
z3 z3
f .z/ D D
z3 C .1 2i/ z2 .1 C 2i/ z 1 .z i/2 .z C 1/
has two poles. The point z D i is the pole of order 2, while z D 1 is a simple pole.
Indeed, applying the criterion, we have for the former pole:
.z 3/ .z i/n2 i3
lim .z i/n f .z/ D lim D lim .z i/n2 :
z!i z!i zC1 i C 1 z!i
The limit is not zero or infinity only if n D 2 (in which case the limit is
.i 3/ = .i C 1/ D 1 C 2i). Similarly, for the other pole:
.z 3/ .z C 1/n1 1 3
lim .z C 1/n f .z/ D lim 2
D lim .z C 1/n1 ;
z!1 z!1 .z i/ .1 i/2 z!1
which gives n D 1; for any other values of n the limit is either zero (when n > 1) or
infinity (n < 1/.J
Poles are closely related to zeros of complex functions. If the function f .z/ is
not singular at z D a and its Taylor’s expansion around this point is missing the
very first (constant) term, i.e. the coefficient c0 D 0 in Eq. (2.99), then f .a/ D 0.
But more first terms in the Taylor’s expansion may be missing for some functions,
and this would characterise the rate with which f .z/ tends to zero as z ! a. More
precisely, if the Taylor’s expansion of f .z/ starts from the n-th term, i.e.
1
X 1
X
f .z/ D cn .z a/k D .z a/n cnCk .z a/k D .z a/n '.z/; (2.101)
kDn kD0
where '.z/ has all terms in its Taylor’s expansion, then it is said that the point z D a
is a zero of order n of f .z/. If n D 1, then it is called a simple zero. The function
f .z/ D sin z has a simple zero at z D 0 since its Taylor’s expansion starts from the
linear term.
By looking at Eqs. (2.99) and (2.101), we can see that if f .z/ has a pole of order
n at z D a, then the same point is a zero of the same order of the function 1=f .z/ D
.z a/n ='.z/ since 1='.z/ tends to a finite limit at z D a. Inversely, if f .z/ has a
zero of order n at z D a, then 1=f .z/ D .z a/n ='.z/ has a pole of the same order
at the same point.
The point z D a is called an essential singularity of f .z/ if the limit limz!a f .z/
does not exist. This means that by taking different sequences of points fzk g on the
complex plane which converge to z D a, different limits of f .z/ are obtained, i.e. f .z/
196 2 Complex Numbers and Functions
at z D a is basically not defined; in fact,7 one can always find a sequence of numbers
converging to z D a which results in any limit of limz!a f .z/. The Laurent series
around z D a must have an infinite number of negative power terms (as otherwise
we arrive at the two previously considered cases).
Example 2.10. I The function f .z/ D sin .1=z/ has an essential singularity at
z D 0.
Indeed, the Laurent series has the complete principal part, i.e. the negative part has
all terms (Example 2.7). If we tend z to zero over a sequence of points on the real
axis, z D x, then the limit limx!0 .1=x/ is not defined as the function oscillates
rapidly as x ! 0, although it remains bounded by ˙1. If, however, we take the limit
along the imaginary axis, z D iy, then
1 1 i=iy i
sin D e ei=iy D e1=y e1=y :
iy 2i 2
If y ! C0 (i.e. from above), then e1=y ! 1, e1=y ! 0 and hence sin .1=iy/ !
i1. If, however, y ! 0 (that is, from below), then sin .1=iy/ D Ci1. This
discussion illustrates well that z D 0 is an essential singularity of sin .1=z/. J
Functions f .z/ which do not have singularities are called holomorphic. For instance,
these are sine, cosine and exponential functions, polynomials, as well as their simple
combinations not involving division. Meromorphic functions only have isolated
poles. It follows from Eq. (2.99) that any meromorphic function is the ratio of two
homomorphic functions.
7
This statement was proven independently by Julian Sochocki, Felice Casorati and Karl Weier-
strass.
2.6 Analytic Continuation 197
Two more types of singularities may also be present. Firstly, a singularity may be
1
not isolated. Consider the function f .z/ D e1=z C 1 . Its denominator is equal to
zero, e1=z C 1 D 0, when
1
D ln .1/ D ln ei D i C i2 k with k D 0; ˙1; ˙2; : : : :
z
Therefore, at points zk D .i C i2 k/1 the function f .z/ is singular. However, in
the limit of k ! 1 the points zk form a very dense sequence near z D 0, i.e. the
singularities are not isolated.
The other types of points where a function is not analytic are branch points and
points on branch cuts. The points on the branch cuts form a continuous set and hence
are also not isolated.
@f df @z df @f df @z df
D D D u0x C ivx0 ; D D i D u0y C ivy0 :
@x dz @x dz @y dz @y dz
Therefore,
df df
D u0x C ivx0 and also D i u0y C ivy0 D vy0 iu0y :
dz dz
Equating real and imaginary parts of the two expressions, we obtain u0x D vy0 and
vx0 D u0y , which are the Cauchy–Riemann conditions (2.16). Hence, f .z/ is indeed
analytic.
This series converges within the circle jzj < 1, and it is not defined outside it
including the circle itself, i.e. for jzj 1 the series diverges.
Let us now expand the function around some general point z D a (where
a ¤ 1):
g2
ln .1 C z/ D ln Œ.1 C a/ .1 C g/ D ln .1 C a/ C g C ;
2
1
.1 C z/1 D Œ.1 C a/ .1 C g/1 D 1 g C g2 ;
1Ca
This is a particular expansion of f .z/ obtained with respect to the point a, and it
converges for jgj D j.z a/ = .1 C a/j < 1, i.e. in the circle jz aj < j1 C aj,
which is centred at the point z D a and has the radius of R D j1 C aj. At a D 0 the
latter expansion reduces to (2.102).
The convergence of the series (2.103) is compared for different values of a in
Fig. 2.26. Expanding f .z/ around a D 0 results in a function f0 .z/ which is only
defined in the domain A; the function f1 .z/ obtained with a D 1 is defined in a larger
domain which also completely includes A; taking a bigger value of a D 3 results in
an even bigger domain C which goes beyond the two previous ones. We may say
that the expansion (2.103) for a D 1 defines our function f .z/ in the part of the
domain B which goes beyond A, i.e. f0 .z/ is said to be continued beyond its domain
into a larger one by means of f1 .z/. Similarly, f2 .z/ defines f .z/ in the rest of C which
is beyond B. In other words, we may now define one function
2.6 Analytic Continuation 199
8
< f0 .z/; z from A
f .z/ D f1 .z/; z from that part of B which is outside A :
:
f2 .z/; z from that part of C which is outside B
This process can be continued so that f .z/ would be defined in an even larger domain
in the complex plane. Of course, in our particular case we know f .z/ in the whole
complex plane C via the logarithm from the very beginning, so this exercise seems
to be useless. However, it serves to illustrate the general idea and can be used in
practice when the function f .z/ is actually not known. For instance, when solving a
differential equation via a power series (see Sects. I.8.4 and 2.8), expanding about
different points z D a allows obtaining different expansions with overlapping circles
of convergence. Hence, a single function can be defined in the united domains as a
result of the procedure described above.
This operation of defining a function f .z/ in a bigger domain is called analytic
continuation. The appearance of the word “continuous” is not accidental since the
resulting function will be analytic as long as its components are. This is proven by
the following Theorem.
Theorem 2.14. Consider two functions f0 .z/ and f1 .z/ which are analytic in
domains D0 and D1 , respectively. Let the two domains have only a common line
L as shown in Fig. 2.27(a), and let the two functions be equal on the line, i.e.
f0 .z/ D f1 .z/ for any z on L. If the two functions are also continuous on L, then the
combined function
8 8
< f0 .z/; z in D0 < f0 .z/; z in D0
f .z/ D f1 .z/; z on L or f .z/ D f0 .z/; z on L ; (2.104)
: :
f1 .z/; z in D1 f1 .z/; z in D1
Fig. 2.27 (a) The regions D0 and D1 overlap at a line L. (b) A contour is considered which runs
across both regions and consists of two parts: L0 lying in D0 and L1 lying fully in D1 . The paths ˛
and ˇ lie exactly on the common line L and are passed in the opposite directions
Proof. Consider a contour which starts in D0 , then crosses L, goes into D1 and
finally returns back to D0 . It consists of two parts: L0 which lies fully inside D0 ; and
of the other part, L1 , which is in D1 , as shown in Fig. 2.27(b). We can close L0 with
a line ˛ which lies on the common line L for the two regions; similarly, L1 can be
closed with the line ˇ D ˛, also lying on L and passed in the opposite direction
to that of ˛. Since the two lines ˛ and ˇ are the same, but passed in the opposite
directions,
Z Z Z Z Z Z
f .z/dz C f .z/dz D f0 .z/dz C f1 .z/dz D f0 .z/dz C f0 .z/dz D 0;
˛ ˇ ˛ ˇ ˛ ˛
as f .z/ D f0 .z/ D f1 .z/ everywhere on L. Therefore, the contour integral of f .z/ over
the path L0 C L1 can be written as
I Z Z Z Z
f .z/dz D f .z/dz C f .z/dz D f0 .z/dz C f1 .z/dz
L0 CL1 L0 L1 L0 L1
Z Z I
C f0 .z/dz C f1 .z/dz D f0 .z/dz
˛ ˇ L0 C˛
I
C f1 .z/dz D 0 C 0 D 0;
L1 Cˇ
since the loop L0 C ˛ lies fully in D0 and hence the closed contour integral of the
analytic function f0 .z/ is equal to zero; similarly, the integral of f1 .z/ over the closed
loop L1 C ˇ is zero as well. Therefore, a closed-loop integral of the function (2.104)
anywhere in the entire region D0 C D1 is zero. This finally means, according to the
Morera’s Theorem 2.3, that the function f .z/ is analytic. Note that the continuity of
both functions on L is required for the Cauchy theorem we used here as L is a part
of the boundary of the two regions. Q.E.D.
Above we assumed that the two regions overlap only along a line. In this case
the continuation defines a single-valued function f .z/. If the two regions overlap in
2.7 Residues 201
their internal parts, i.e. in the subregion shown in Fig. 2.28, then a continuation is
not longer unique as f0 .z/ may be quite different to f1 .z/ in , and hence in the
function f .z/ becomes multi-valued.
2.7 Residues
2.7.1 Definition
dn1 .n C 1/Š
Œ.z a/n f .z/ D .n 1/Šc1 C nŠc0 .z a/1 C c1 .z a/2 C ;
dzn1 2Š
where all terms preceding the term with c1 vanish upon differentiation. Then,
taking the limit z ! a, the terms standing behind the c1 term disappear as well,
and we finally obtain a very useful expression:
1 dn1
c1 D Res Œf .a/I a D lim n1 Œ.z a/n f .z/ : (2.106)
.n 1/Š z!a dz
This formula can be used in practice to find the residues. For simple poles this
formula can be manipulated into simpler forms. For n D 1
Since B0 .z/ D B1 .z/ C .z a/ B01 .z/ with B01 .a/ D b2 ¤ 0 and B0 .a/ D B1 .a/, we
obtain in this particular case a simpler formula:
A.z/ A.z/ A.a/ A.a/
Res I a D lim D D 0 : (2.108)
B.z/ z!a B1 .z/ B1 .a/ B .a/
Problem 2.76. Show that if the point a is a zero of order 2 of the function B.z/,
then the above formula is modified:
A.z/ 2A0 .a/ 2A.a/B000 .a/
Res I a D 00 : (2.109)
B.z/ B .a/ 3 ŒB00 .a/2
There are obviously two of them to find as there are only two singularities: at z D 0
and z D i. Consider first z D 0. Since the cosine function and .z i/2 behave well
there, the function can be presented as f .z/ D '1 .z/=z with '1 .z/ D cos z= .z i/2
being a function which has a finite non-zero limit at z D 0. Therefore, z D 0 is a
simple pole. This also follows from the fact that the limit of limz!0 Œzf .z/ is non-
zero and finite, see Eq. (2.100). Therefore, from Eq. (2.108) we obtain
cos z cos 0= .0 i/2 1= .1/
Res 2
I 0 D 0 D D 1:
z .z i/ .z/ 1
The same result comes out from the general formula (2.106) as well:
2.7 Residues 203
cos z 1 cos z cos 0
Res 2
I0 D lim 2
D D 1:
z .z i/ 0Š z!0 .z i/ .0 i/2
The Laurent expansion would also give the same c1 D 1 coefficient. Indeed,
expanding around z D 0, we have
z2
cos z D 1 C ; .z i/2 D Œi .1 C iz/2 D .1 C iz/2 D 1 C 2iz C ;
2
Now let us consider another pole z D i. Since the cosine and 1=z behave well at
z D i, we deal here with the pole of order 2. The criterion (2.100) gives the same
result: the limit of .z i/ f .z/ at z ! i does not exist, while the limit of .z i/2 f .z/
is finite and non-zero (and equal to cos i=i D i cos i). Hence we can use directly
formula (2.106) for n D 2:
1 d 2 cos z
Res Œf .i/I i D lim .z i/
1Š z!i dz z .z i/2
d cos z sin z cos z
D lim D lim 2 D i sin i C cos i D e1 :
z!i dz z z!i z z
The same result is of course obtained when using the Laurent method:
and
1 1 1 1 1 zi
D D D 1 C D i C .z i/ C ;
z .z i/ C i i 1 C .z i/ =i i i
and we obtain
cos z 1
2
D Œcos i .z i/ sin i C Œi C .z i/ C
z .z i/ .z i/2
1 h i
D 2
i cos i C .cos i C i sin i/ .z i/1 C ;
.z i/
Problem 2.77. Identify all singularities the following functions have, and then
calculate the corresponding residues there:
eiz z2 1
.a/ I .b/ tan z I .c/ :
z2 C 1 z2 iz C 6
[Answers: (a) Res f .i/ D i=2e, Res f .i/ D ie=2; (b) Res f . =2 C k/ D
1 for any integer k; (c) Res f .2i/ D i, Res f .3i/ D 2i.]
where the sum is taken over all isolated singularities which lie inside L. (Note that
the function f .z/ does not need to be analytic on L (which could be, e.g. a boundary
of D), but need to be continuous there).
Fig. 2.29 (a) An isolated singularity a is traversed by a circular contour C. (b) Three isolated
singularities a1 , a2 and a3 fall inside the contour L, while three other singularities are outside the
contour. According to the generalised Cauchy theorem (2.62), the integral over L is equal to the
sum of three contour integrals around the points a1 , a2 and a3
2.7 Residues 205
Next, consider a larger contour L which is run (in the same direction) around several
such poles as shown in Fig. 2.29(b). Then, according to the generalised form of the
Cauchy theorem, Eq. (2.62), we can write
I XI X
f .z/dz D f .z/dz D 2 i Res f .ai / ;
L i Ci i
where the sum is run over all poles which lie inside the contour L. At the last step
we again used the definition (2.105) of the residue. This is the result we have set out
to prove. Q.E.D.
Note that in the above formula (2.110) only poles and essential singularities
matter, as the residue of a removable singularity is zero. Therefore, removable
singularities can be ignored, and this is perfectly in line with the fact that any
function f .z/ can be made analytic at the removable singularity by defining its value
there with the corresponding limit as was explained in Sect. 2.5.5.
Closed-loop contour integrals are calculated immediately using the residue Theo-
rem 2.15. For instance, any contour L going around a point z0 which is a simple
pole of the function f .z/ D 1= .z z0 / yields
I
dz 1
D 2 i Res I z0 D 2 i;
L z z0 z z0
which is exactly the same result as in Example 2.3.
More interesting, however, are applications of the residues in calculating definite
integrals of real calculus taken along the real axis x. These are based on closing
the integration line running along the real x axis in the complex plane and using
the appropriate analytic continuation of the function f .x/ ! f1 .z/ (in many cases
f1 .z/ D f .z/). It is best to illustrate the method using various examples.
Example 2.12. I
Let us calculate the following integral:
Z 1
dx
ID : (2.111)
1 1 C x2
To perform the calculation, we consider the following integral on the complex plane:
I Z R Z
dz dz dz
IR D D C D Ihoriz C Icircle ;
L 1 C z2 R 1 C z2 CR 1 C z2
206 2 Complex Numbers and Functions
where the closed contour L shown in Fig. 2.30 consists of a horizontal part from R
to R running along the positive direction of the x axis, and a semicircle CR which
is run in the anti-clockwise direction as shown. This contour is closed and hence
the value of the integral can be calculated using the residue
Theorem (2.110), i.e.
it is equal to 2 i times the residue of f .z/ D 1= z2 C 1 at the point z D Ci. We
only need to consider this pole as the other one (z D i) is outside L. The residue is
easily calculated to be
1 1 1 1 i
Res 2 I i D Res Ii D D D ;
z C1 .z C i/ .z i/ iCi 2i 2
(this follows from the inequality jc bj jcj C jbj by setting c D a C b). Hence,
according to the inequality (2.51),
ˇZ ˇ
ˇ dz ˇˇ 1
jICR j D ˇˇ R;
CR z2 C 1 ˇ R2 1
the absolute value of the integral tends indeed to zero as R ! 1 which means that
ICR ! 0 in this limit. Therefore,
which is our final result. This result can be checked independently as the integral
can also be calculated directly:
Z 1
dx
ID D arctan .C1/ arctan .1/ D D ;
1 1 C x2 2 2
which is the same. J
In a similar manner one can calculate integrals of a more general type:
Z 1
Qm .x/
ID dx; (2.113)
1 Pn .x/
where Qm .x/ and Pn .x/ are polynomials of the order m and n, respectively, and it is
assumed that Pn .x/ does not have real zeros, i.e. its poles do not lie on the x axis.
First, we note that the convergence of this integral at ˙1 requires m C 1 < n. Then,
each polynomial can be expressed via its zeroes as a product:
where fak g are zeroes of Qm .x/, while fbk g are zeroes of Pn .x/. Again, we consider
the contour shown in Fig. 2.30 and hence need to investigate the integral over the
semicircle. On it we have
ˇ ˇ ˇ ˇ
jz ak j D ˇR ei ak ˇ ˇR ei ˇ C jak j D R C jak j
and also
ˇ ˇ ˇ i ˇ 1 1
jz bk j D ˇR ei bk ˇ ˇR e ˇ jbk j D R jbk j H) ;
jz bk j R jbk j
which allows us to estimate f .z/ D Qm .z/=Pn .z/ as follows:
ˇ ˇ ˇ ˇ ˇQ ˇ ˇ ˇQ ˇ ˇ
ˇ Qm .z/ ˇ ˇ qm ˇ ˇ m .z ak /ˇ ˇ qm ˇ m kD1 .R C jak j/
ˇ qm ˇ Am .R/
ˇ ˇ D ˇ ˇ ˇQkD1 ˇ ˇ ˇ ˇ ˇ
ˇ P .z/ ˇ ˇ p ˇ ˇ n .z b /ˇ ˇ p ˇ Qn .R jb j/ D ˇ p ˇ B .R/ ;
n n lD1 l n lD1 k n n
where Am .R/ and Bn .R/ are polynomials in R. Hence, the value of the semicircle
integral can be estimated as
ˇZ ˇ ˇ ˇ
ˇ Qm .z/ ˇˇ ˇˇ qm ˇˇ Am .R/
ˇ
jICR j D ˇ dzˇ ˇ ˇ R:
CR Pn .z/ pn Bn .R/
208 2 Complex Numbers and Functions
It is clearly seen that the expression in the right-hand side of the inequality tends to
zero as R ! 1 if m C 1 < n, and therefore, the Icircle ! 0. This means that
X
Qm .z/
I D lim .IR Icircle / D lim IR lim Icircle D lim IR D 2 i Res I bk ;
R!1 R!1 R!1 R!1 Pn .z/
k
where the sum is taken over all zeroes lying in the upper half of the complex plane.
[Answers: (a) =4a3 ; (b) = Œab .a C b/; (c) ; (d) =2; (e) 3 = 8a5 .]
Problem 2.79. Prove the formula
I
zdz i
D p ; (2.114)
z4 C 2 .2g2 1/ z2 C 1 2
2g g 1
where g > 1 and the contour is the circle of unit radius with the centre at the
origin.
can also be calculated using the residue theorem. Here R.x; y/ is a rational function
of x and y. The trick here is to notice that the integration over between 0 and 2
may also be related to going anti-clockwise around the unit radius circle C1 . Indeed,
on the circle z D ei , dz D iei d D izd and also
1 i 1 1 1 i 1 1
sin D e ei D z and cos D e C ei D zC ;
2i 2i z 2 2 z
so that
I
1 1 1 1 dz
ID R z ; zC ;
C1 2i z 2 z iz
2.7 Residues 209
i.e. the integration is indeed related now to that on the circle. Therefore, according
to the residue theorem, the result is equal to the sum of residues of all poles inside
the unit circle C1 , times 2 i.
Example 2.13. I Consider the integral
Z 2
dx
ID :
0 2 C cos x
R2
Problem 2.80. Calculate the integrals 0 f . /d for the following functions
f . /:
1 1 1
.a/ I .b/ 2 I .c/ I
1 C sin2 1 C sin2 1 C cos2
cos2 cos2 1
.d/ 2
I .e/ I .f / ;
1 C sin2 2 cos C cos2 a C sin
Problem 2.82. Consider the integral of Problem 2.34 again. Make the substi-
tution t D sin , then replace with z D ei and show that the integration can
be extended over the whole unit circle. Finally, perform the integration using
the method of residues. You will also need the result
d2n 2n .2n/Š 2
lim 2n z2 1 D .1/n ; (2.116)
z!0 dz nŠ
see Eq. (I.3.46).
with some real ; alternatively, cosine or sine functions may appear instead of the
exponential function. We shall come across this type of integrals, e.g. in considering
the Fourier transform in Chap. 5. The idea of their calculation is also based on
closing the contour either above the real axis (as we have done so far) or below.
Before we formulate a rather general result known as Jordan’s lemma, let us
consider an example.
Example 2.14. I Consider the following integral:
Z 1 ix
e dx
I D 2 2
; where > 0:
1 x C a
We use the function f .z/ D eiz = z2 C a2 and the same contour as in Fig. 2.30
can be employed. Then the contribution from the semicircle part CR appears to be
equal to zero in the limit R ! 1. Indeed, on the upper semicircle z D x C iy D
R .cos C i sin / and 0 < < , so that sin > 0, and hence
How will the result change if was negative? We cannot use the same contour
as above since
in this case. What is needed is to make sin < 0 on the semicircle CR , then eiz
would still tend to zero in the R ! 1 limit and the contribution from CR will
vanish. This can be achieved simply by closing the horizontal path with a semicircle
in the lower part of the complex plane, Fig. 2.31.
Problem 2.83. Show that in this case the integral is equal to . =a/ ea . [Hint:
note that this time the contour is traversed in the clockwise direction, so that
the closed contour integral is equal to a minus residue at z D ia times 2 i.]
In the above example we have used a rather intuitive approach. Fortunately, the
reasoning behind it can be put on a more rigorous footing. In fact, it appears that if
f .z/ tends to zero as z ! 1 (i.e. along any direction from z D 0), then the integral
over the semicircle CR always tends to zero. This latter statement is formulated in
the following Lemma where a slightly more general contour CR is considered than
in the example we have just discussed since this is useful for some applications: CR
may also have a lower part which goes by a below the x axis, i.e. Imz > a for all z
on CR , where 0 a < R, Fig. 2.32(a).
Fig. 2.32 For the proof of Jordan’s lemma: (a) when > 0, then the contour CR can be used
which consists of an upper semicircle ABC and two (optional) arches CD and EA; when < 0,
then instead the rest of the circle CR0 is to be used. Note that a is kept constant in the R ! 1 limit,
i.e. the angle ˛ D arcsin .a=R/ ! 0. (b) The sine function is concave on the interval 0 < x < =2,
i.e. it lies everywhere higher than the straight line y D 2x=
Proof. Let us start from the case of positive , and consider the semicircle part ABC
of CR . Since f .z/ converges uniformly to zero as R ! 1, we can say that for any
z on the semicircle jf .z/j < MR , where MR ! 0 as R ! 1. Note that due to the
uniform convergence MR does not depend on z, it only depends on R. Next, on the
semicircle
In the last step we were able to use the symmetry of the sine function about
D =2 and hence consider the integration interval only between 0 and =2
(with the corresponding factor of two appeared in front). The last integral can be
estimated with the help of the inequality sin 2 = valid for 0 =2 (see
Fig. 2.32(b)), yielding
ˇZ ˇ Z
ˇ ˇ =2
MR
ˇ f .z/eiz dzˇˇ 2RMR e.2R= /
d D 1 eR :
ˇ
semicircle 0
What is left to consider is the effect of the arches CD and AE of CR , which lie
below the x axis. Apply the above method to estimate the integral on the AE first:
ˇZ ˇ ˇZ ˇ Z
ˇ ˇ ˇ ˇ 0
ˇ f .z/eiz dzˇˇ MR ˇˇ eiz dzˇˇ MR R eR sin d :
ˇ
AE AE ˛
For the angles lying between ˛ and zero we have R sin a, so that the last
integral can be estimated as:
Z 0 Z 0
a
MR R eR sin d MR R ea d D MR Rea ˛ D MR ea R arcsin :
˛ ˛ R
where we have made the substitution D . Since lies between the limits
0 < ˛ < < =2, the inequality sin 2 = is valid and, since < 0, we can
estimate the integral in the same way as above:
ˇZ ˇ ˇZ ˇ
ˇ =2 ˇ ˇ =2 ˇ
ˇ R sin ˇ ˇ jjR sin ˇ
2RMR ˇ e d ˇ D 2RMR ˇ e d ˇ
ˇ ˛ ˇ ˇ ˛ ˇ
ˇZ ˇ
ˇ =2 ˇ
ˇ ˇ MR Rjj
2RMR ˇ e.2jjR= / d ˇ D e e2Rjj˛= :
ˇ ˛ ˇ jj
It follows then that this expression tends to zero in the R ! 1 limit. The Lemma is
now fully proven. Q.E.D.
So, when > 0 one has to consider closing a horizontal path in the upper part of
the complex plane, while for < 0 in the lower part, we did exactly the same in the
last example.
214 2 Complex Numbers and Functions
Z 1 2 Z 1 p
x cos .x/ dx 1 a x sin .x/ dx a
2 D e I 4 C a4
D 2 ea= 2 sin p :
1 .x C a /
2 2 2 a 1 x a 2
where ı ! C0.
Let us now consider several examples in which the poles of the functions f .z/ in
the complex plane lie exactly on the real axis.
Example 2.15. I Consider the integral
Z 1
eix
I D dx; (2.120)
1 x
where > 0. The contour cannot be taken exactly as in Fig. 2.30 in this case since
it goes directly through the point z D 0. The idea is to bypass the point with a little
semicircle C as is shown in Fig. 2.33 and then take the limit ! 0. Therefore,
the contour contains a large semicircle CR whose contribution tends to zero as
R ! 1 due to Jordan’s lemma, then a small semicircle C whose contribution we
shall need to investigate, and two horizontal parts R < x < and < x < R.
The horizontal parts upon taking the limits R ! 1 and ! 0 converge to the
integration over the whole real axis as required in I . Moreover, since the function
f .z/ D eiz =z does not have any poles inside the contour (the only pole is z D 0
which we avoid), we can write using obvious notations:
Z Z Z R Z
C C C D0 (2.121)
R C CR
for any > 0 and R > . Consider now the integral over C . There z D ei , so
that
Z Z 0 .cos Ci sin / Z
eiz ei .cos Ci sin /
dz D iei d D i ei d :
C z ei 0
Calculating this integral exactly would be a challenge; however, this is not required,
as we only need it calculated in the limit of ! 0. This evidently gives simply
i as the exponent in the limit is replaced by one. Therefore, the C integral in the
limit tends simply to i . The two horizontal integrals in (2.121) give in the limit
the integral over the whole real axis apart from the point x D 0 itself (in the sense
of Cauchy principal value, Sect. I.4.5.4), the CR integral is zero in the limit as was
already mentioned, and hence we finally obtain
Z 1 Z
eix
P dx D D .i / D i :
1 x C
Noting that eix D cos .x/ C i sin .x/ and that the integral with the cosine function
is zero due to symmetry (the integration limits are symmetric but the integrand is an
odd function), we arrive at the following famous result:
Z 1
sin .x/
dx D : (2.122)
0 x 2
Note that the integrand is even and hence we were able to replace the integration
over the whole real axis by twice the integration over its positive part. Most
importantly, however, we were able to remove the sign P of the principal value
here since the function sin .x/ =x is well defined at x D 0 (and is equal to ). J
216 2 Complex Numbers and Functions
[Hint: write the sine via an exponential function and consider all three cases
for !.]
R
In many other cases of the integrals f .x/dx over the real axis the singularity of
f .x/ within the integration interval cannot be removed and hence integrals may only
exist in the sense of the Cauchy principal value. We shall now demonstrate that the
method of residues may be quite useful in evaluating some of these integrals.
Example 2.16. I The integral
Z 1
eix
f .x/dx with f .x/ D
1 .x a/ .x2 C b2 /
is not well defined in the usual sense since the integrand has a singularity at x D a
lying on the real axis; however, as we shall see, its Cauchy principal value,
Z 1 Z a Z 1
IDP f .x/dx D lim f .x/dx C f .x/dx ; (2.124)
1 !0 1 aC
is well defined. To calculate it, we shall take the contour as shown in Fig. 2.34: this
contour is similar to the one shown in Fig. 2.33, but the little semicircle C goes
around the singularity at z D a. There is only one simple pole at z D ib which is
inside the contour, so using the simplified notations we can write
Z Z Z Z
a R
eb
C C C D 2 i Res Œf .z/I ib D 2 i
R C aC CR .ib a/ .2ib/
a C ib b
D e : (2.125)
b a2 C b2
2.7 Residues 217
The integral over the large semicircle CR is equal to zero in the R ! 1 limit due to
the Jordan’s lemma; the integral over the small semicircle is however not zero as is
clear from a direct calculation (z D a C ei on C , and dz D iei d ):
Z Z 0
Z
exp i a C ei exp i a C ei
D h i ie d D i
i
h id
C ei .a C ei /2 C b2 0 .a C ei /2 C b2
Z
eia i eia
! i d D
0 a2 C b2 a2 C b2
in the ! 0 limit. The two integrals over the parts of the x axis in (2.125) in
the limits R ! 1 and ! 0 yield exactly the principal value integral (2.124),
therefore, we obtain
a C ib b i eia a C ib b
ID e C D e C ieia
:
b a2 C b2 a2 C b2 a2 C b2 b
Separating the real and imaginary parts on both sides, we obtain two useful integrals:
Z 1 ha i
cos .x/ dx b
P D e C sin .a/ ;
1 .x a/ .x2 C b2 / a2 C b2 b
Z 1
sin .x/ dx
P D 2 eb C cos .a/ : J
1 .x a/ .x2 C b2 / a C b2
We shall adopt the principal branch of the logarithmic function with ln rei D
ln r C i and 0 < < 2 , the branch cut being chosen along the positive part of
the x axis. The contour is the same as the one shown in Fig. 2.33. The four integrals
are to be considered which appear in the left-hand side of the Cauchy theorem:
Z Z Z R Z " #
ln z
C C C D 2 i Res I ia
R C CR .z2 C a2 /2
d ln z
D2 i
dz .z C ia/2 zDia
1=z 2 ln z
D2 i
.z C ia/2 .z C ia/3 zDia
2
D .ln a 1/ C i : (2.126)
2a3 4a3
Let us first consider the integral over the small circle C , where z D ei :
Z Z 0 Z 0 Z 0
ln C i ei i ln ei
D iei d D d d :
C . 2 ei2 C a2 /2 . 2 ei2 C a2 / 2
. 2 ei2 C a2 /2
Both integrals tend to zero in the ! 0 limit. We shall demonstrate this using
two methods: the first one is based on taking the limit inside the integrals; the other
method will be based on making estimates of the integrals. Using the first method,
the first integral behaves as ln for small , which tends to zero when ! 0; the
second integral behaves as and hence also goes to zero.
The same result is obtained using the second method which may be thought of
as a more rigorous one. Indeed, for any the following inequality holds:
q s
1 1
jln zj D jln C i j D ln2 C 2 D ln2 C 2 2 ln ;
so that we can estimate the integral by a product of the semicircle length, , and
the maximum value of the integrand, see Eq. (2.51):
p
8
Taking square of the both sides, we have 3 ln2 .1= / 2 or ln .1= / = 3, which is fulfilled
starting from some large number 1= as the logarithm monotonously increases. Note that is
limited to the value of 2 .
9
Since ja C bj a b, when a > b > 0.
2.7 Residues 219
ˇZ ˇ ˇ ˇ!
ˇ ˇ ˇ ˇ 2 ln .1= /
ˇ ln z ˇ ˇ ln z ˇ
ˇ dzˇ max ˇ ˇ ;
ˇ C .z2 C a2 / ˇ
2 C ˇ .z2 C a2 / ˇ
2
.a2 2 /2
where the quantity on the right-hand side can easily be seen going to zero when
! 0, and so is the integral. We see that a more rigorous approach fully supports
the simple reasoning we developed above and hence in most cases our simple
method can be used without hesitation.
Now we shall consider the integral over the large semicircle CR . Either of the
arguments used above proves that it goes to zero when R ! 1. Indeed, using our
(more rigorous) second method, we can write
ˇZ ˇ ˇ ˇ!
ˇ ˇ ˇ ˇ 2 ln .R/
ˇ ln z ˇ ˇ ln z ˇ
ˇ dz ˇ max ˇ ˇ R R ;
ˇ CR .z2 C a2 /2 ˇ CR ˇ .z2 C a2 /2 ˇ .R2 a2 /2
and the expression on the right-hand side goes to zero as ln R=R3 . The same result
is obtained by using the first (simpler) method.
So, we conclude, there is no contribution coming from the semicircle parts of the
contour. Let us now consider the horizontal parts. On the part R < x < the
phase of the logarithm is , so that ln z D ln jxj C i D ln .x/ C i , and we can
write
Z ˇ ˇ Z R
ln .x/ C i ˇ t D x ˇ ln t C i
dx D ˇ ˇD dt
2 ˇ ˇ
R .x2 C a2 / dt D dx .t2 C a2 /2
Z 1 Z 1
ln t dt
! 2
dt C i DICi J
0 .t2 C a2 / 0 .t2 C a2 /2
whereas for J we obtain exactly the same result =4a3 as in Problem 2.78(a).J
220 2 Complex Numbers and Functions
Fig. 2.35 (a) The contour taken for the calculation of the integral (2.127) in the complex plane.
The contour consists of a closed internal part and a closed external large circle CR . The cut has been
made effectively between points z D a and z D b. (b) The contour to be taken when solving
Problem 2.90(a); it goes all the way around the cut taken along the positive part of the x axis
discussed in Problem 2.24. So, we shall consider the branch cut going to the right
from the point z D a along the x axis as in Fig. 2.11 (where a D 1 and b D 1).
We select such branch of the root-three function for which its phase is between 0
and 2 =3, i.e. we restrict the phase of any z to 0 < arg.z/ < 2 . As f .z/ is only
discontinuous between the points z D a and z D b on the x axis, we choose the
contour as shown in Fig. 2.35(a). It consists of two parts: an external circle CR of
large radius R (which we shall eventually take to infinity) and of a closed path going
around the branch cut. In turn, this latter part is made of two circles C (around
z D a) and C0 (around z D b), and two horizontal parts, one going along the
upper side of the cut, the other along the lower. Let 1 and 2 are the radii of the two
small circles C0 and C , respectively. According to the Cauchy theorem for multiply
connected regions, the sum of the integrals going around these two contours is zero
as there are no poles between them:
2.7 Residues 221
Z Z Z aC 2
Z Z b 1
C C C C D 0: (2.128)
CR C0 b 1 C aC 2
On the large circle z D Rei , the function f .z/ is continuous, and hence we can
calculate the integral as follows:
Z Z 2
dz Riei d
q D q 2
.z C a/2 .z b/
3 0 3
CR
Rei C a Rei b
Z 2 Z 2
Riei d Riei d
! q D D2 i
0 3
.R2 e2i / Rei 0 Rei
2=3 Z
i ei d
D q 1 ;
ei =3
.a C b/2
3
2=3
which tends to zero as 1 when 1 ! 0. Similarly for the other small circle integral
(z D a C 2 ei with 0 < < 2 ):
Z Z 0 i Z 0 i
2 ie d 2 ie d
D q ! 2=3 i2 =3 p
C 2 3
. 2 ei /2 .b a C i / 2 2 e
3
.b C a/
2e
1=3 Z 2
i 2 =3
D p ei d ;
3
.a C b/ 0
1=3
which tends to zero as 2 . So, both small circle integrals give no contribution.
What is left to consider are the two horizontal parts of the internal contour.
Generally z D a C 2 ei 2 or z D b C 1 ei 1 (see Fig. 2.11(a)qfor the definitions of
the two angles 1 and 2 in a similar case), and hence f .z/ D 3 2 i.2 2 C 1 /=3
2 1e . On
the upper side of the cut between the two points 2 D 0, 1 D , 2 D x C a and
1 D b x, so that
q
f .z/ D .x C a/2 .b x/ei =3
3
222 2 Complex Numbers and Functions
and the corresponding integral over the upper side of the cut becomes
Z Z Z
b 1 b 1
ei =3 dx b
ei =3 dx
D q ! q D ei =3
I:
2
.x C a/2 .b x/
aC aC 3 a 3
2 2 .x C a/ .b x/
Interestingly, the result does not depend on the values of a and b. This can also
easily be understood by making a substitution t D .2x C a b/ = .a C b/. J
Problem 2.90. Using the contour shown in Fig. 2.35(b), prove the following
results:
Z 1
x dx a
.a/ D ; where 1 < < 0 I
0 xCa sin . /
Z 1 p Z 1 p
x x ln x
.b/ 2 2
dx D p I .c/ dx D p C ln a :
0 x Ca 2a 0 x C a2
2
2a 2
Problem 2.91. Consider the same integrals using the contour shown in
Fig. 2.33.
with variable functions p.x/ and q.x/ based on a generalised series expansion,
1
X
y.x/ D cr .x x0 /rCs ; (2.130)
rD0
considered on the complex plane, i.e. the variable functions p.z/ and q.z/ and
the solution itself, y.z/, are treated as functions of a complex variable z. What
we shall be interested in here is addressing the following question: under which
circumstances can one apply a generalised series expansion,
1
X
y.z/ D cr .z z0 /rCs ; (2.132)
rD0
in order to solve the DE (2.131), and what is a practical criterion one may apply in
order to determine if this can be done. We shall not be considering the question of
convergence of the series.
We shall first consider the case of the point z0 being an ordinary point, i.e. when
the functions p.z/ and q.z/ are analytic functions within some circle of radius R
centred at z0 , i.e. for all z satisfying jz z0 j < R. Therefore, they can be expanded
in the Taylor series around z0 :
This process can be continued. It is clear, that any derivative y.n/ .z0 / of the solution
can be calculated this way and expressed via the initial conditions and the expansion
coefficients of the functions p.z/ and q.z/. In other words, the solution y.z/ in this
case can be sought in the form of the Taylor expansion
y.z/ D ˛ C ˇu C y2 u2 C :
224 2 Complex Numbers and Functions
The unknown coefficients y2 , y3 , etc., are obtained by substituting the expansion into
the DE, collecting coefficients to the same powers of u and setting them to zero. This
is exactly the method we used in Sect. I.8.4. It can be shown that the series obtained
in this way converges uniformly to the solution of the DE within the same region
jz z0 j < R where the functions p.z/ and q.z/ are analytic.
Consider now a much more interesting (and rather non-trivial) case of the
functions p.z/ and q.z/ having a singularity at z0 . In this case these functions
of coefficients of the DE are in general expanded into infinite10 Laurent series
(Sect. 2.5.4):
1
X 1
X
p.z/ D pk uk and q.z/ D qk uk :
kD1 kD1
The point z0 becomes a branch point. This means that if we consider two linearly
independent solutions y1 .z/ and y2 .z/ of the DE (we assume that they both exist),
then, when passing around z0 along a closed contour starting from a point z, they
arrive to some values yC C
1 .z/ and y2 .z/ when arriving back to z, which are not the
same as the starting points, i.e. y1 .z/ ¤ y1 .z/ and yC
C
2 .z/ ¤ y2 .z/ (the superscript
“C” hereafter indicates the value of a function after completing the full closed
contour around z0 ). However, since y1 and y2 are the two linearly independent
solutions, the new values must be some linear combinations of the same functions
y1 and y2 , i.e.
yC
1 D a11 y1 C a12 y2 and yC
2 D a21 y1 C a22 y2 : (2.133)
Problem 2.92. Prove by contradiction that a11 a22 a12 a21 ¤ 0, as otherwise
the two solutions y1 and y2 , which are assumed to be linearly independent,
appear to be linearly dependent (i.e. y2 D y1 with some complex number ),
which is impossible by the assumption.
Y C D Y H) b1 yC C
1 C b2 y2 D .b1 y1 C b2 y2 / :
10
Either of the series may contain a finite number of terms in its fundamental part which is a
particular case of the infinite series.
2.8 Linear Differential Equations 225
Problem 2.93. Prove this by contradiction, i.e. assuming that Y2 .z/ D Y1 .z/
with some complex number .
It is also easy to see that the eigenvalues are the property of the DE itself;
they do not depend on the particular linear combinations taken as the linearly
independent solutions. Indeed, consider two new functions w1 and w2 defined by
the transformation
w1 c11 c12 y1
WD D D CY:
w2 c21 c22 y2
i.e. f .z/ behaves similarly to Y1 .z/. Hence the function Y1 .z/=f .z/ D
Y1 .z/= .z z0 /s1 does not change when going around the point z0 , i.e. it is single-
valued around the vicinity of this point, except may be at the point z0 itself.
Therefore, it can be expanded in a Laurent series, i.e. one can write
1
X
Y1 .z/ D .z z0 /s1 k .z z0 /k : (2.135)
kD1
has a similar form but with the s2 D ln 2 =2 i instead (the coefficients in the
Laurent series are most likely also different). Recall that the complex numbers s1
and s2 are defined up to arbitrary integers m1 and m2 , respectively. However, this
does not change the general form of the series (2.135) or (2.136) as the Laurent
series in each case contains all possible powers of .z z0 /.
Consider now the case of equal eigenvalues 1 D 2 of the matrix A (or AT ).
Note that the corresponding numbers s1 and s2 may in fact differ by an integer. In
this case Y1 satisfies Y1C D 1 Y1 , while the second possible linearly independent
solution Y2 must generally satisfy Y2C D a21 Y1 C a22 Y2 . The corresponding matrix
A in this case in the basis of Y1 and Y2 (we have learned that the eigenvalues are
invariant with respect to the choice of the basis) would have a11 D 1 and a12 D 0;
therefore, the determinant equation for the eigenvalues is of the form
ˇ ˇ
ˇ 1 a21 ˇ
ˇ ˇ
ˇ 0 a22 ˇ D 0:
The condition that the second root of this equation coincides with the first, 1 , means
that a22 D 1 , so that the second solutions Y2 must in fact satisfy the condition
Y2C D a21 Y1 C 1 Y2 .
Consider now the function
Y2 a21
f .z/ D ln .z z0 / : (2.137)
Y1 2 i1
Upon a complete traverse around z0 this function turns into
2.8 Linear Differential Equations 227
Y2C a21
f C .z/ D Œln .z z0 / C 2 i
Y1C 2 i1
a21 Y1 C 1 Y2 a21
D Œln .z z0 / C 2 i
1 Y 1 2 i1
Y2 a21 a21 Y2 a21
D C Œln .z z0 / C 2 i D ln .z z0 / D f .z/;
Y1 1 2 i1 Y1 2 i1
i.e. the function f .z/ is single-valued inside some circle with the centre at z0 accept
maybe at the point z0 itself, and therefore can be represented by a Laurent series.
This means that the second solution, according to Eq. (2.137), must have the form:
where D a21 =2 i1 is some complex number. Thus, the second solution should
contain a logarithmic term.
So far we did not make any assumptions concerning the nature of the point z0 ;
it could have been either a pole (sometimes also called a regular singular point,
Sect. I.8.4) or an essential singularity (an irregular singular point), see Sect. 2.5.5.
Let us consider specifically the case when the point z0 is a pole and hence each
Laurent series contains a finite number of terms .z z0 /k with negative powers k.
We can then add appropriate integers m1 and m2 to s1 and s2 , respectively, to ensure
that the Laurent series in either of the cases considered above do not contain the
negative powers at all, and this brings us to the method presented in Sect. I.8.4. What
is only left to understand is how one can determine from the DE itself whether the
point z0 is regular or irregular.
To this end, let us assume that we know two linearly independent solutions y1 .z/
and y2 .z/ of the DE (2.131), i.e.
Problem 2.94. Solve the above equations with respect to the functions p.z/
and q.z/ to show that they can be expressed via the solutions y1 and y2 as
follows:
We shall now consider the necessary criterion for the first case, when 1 ¤ 2 .
Since z0 is a pole by our assumption, the two solutions can be written as
where u D z z0 and P1 .u/ and P2 .u/ are two functions which are well defined
(analytic) at u D 0 and some vicinity around it, i.e. they both can be represented
by Taylor expansions around u D 0 with a non-zero free term, i.e. P1 .0/ ¤ 0 and
P2 .0/ ¤ 0. This is because the behaviour of y1 or y2 around u D 0 has already been
described by the corresponding power terms us1 and us2 , and starting from a zero
free term in the expansions of the functions P1 and P2 would simply modify the
exponents s1 and s2 by an integer. We are using here symbolic notations whereby
Pn .u/ represents a Taylor expansion with a non-zero free term and with the index n
numbering various such functions we shall encounter in what follows.
The idea is to obtain a general form of the functions p.z/ and q.z/ from (2.139)
based on the known general form of the solutions in (2.140). To this end, we need
to calculate derivatives of the solutions and then the Wronskian and its derivative.
Since a derivative of a function P1 .u/ is also some expansion P3 .u/, we can write
y01 D s1 us1 1 P1 .u/ C us1 P3 .u/ D us1 1 Œs1 P1 .u/ C uP3 .u/ D us1 1 P4 .u/
and
W 0 D .s1 C s2 1/ us1 Cs2 2 P8 .u/ C us1 Cs2 1 P9 .u/ D us1 Cs2 2 P10 .u/:
Next, using the same method as above, we get y001 D us1 2 P12 .u/. Hence, for q.z/
we obtain
y001 .z/ y0 .z/ us1 2 P5 .u/ P11 .u/ us1 1 P4 .u/ P13 .u/
q.z/ D p.z/ 1 D s C D
y1 .z/ y1 .z/ u P1 .u/
1 u u P1 .u/
s1 u2
P13 .z z0 /
D :
.z z0 /2
Therefore, the necessary conditions for the point z0 to be a regular singular point is
that p.z/, if singular, has a pole of the first order, while q.z/, if singular, has either
2.9 Selected Applications in Physics 229
first or second order pole. Of course, one of these functions may be regular at z0 (i.e.
it may have a finite limit there), but then the other should be singular. The simple
criteria for determining if these conditions are satisfied are then based on calculating
the limits
If both limits result in finite numbers (including zeros), then the point z0 is a regular
singular point. This is exactly the criterion we used in Sect. I.8.4 without proper
proof.
Problem 2.95. Use a similar reasoning to show that a general form of the
functions p.z/ and q.z/ still remains the same if the two independent solutions
have the form (2.135) and (2.138). [Hint: write y2 D y1 . ln u C P2 .u// and
keep explicitly the terms by the logarithm when calculating W; in this way the
logarithms containing terms will cancel out.]
It can be shown that the stated above conditions for the function p.z/ and q.z/ are
also sufficient for the point z0 to be a regular singular point.
Fig. 2.36 Two possible contours which can be used to prove the dispersion relations (2.144). Here
CR is a semicircle of radius R ! 1, while C is a semicircle of radius ! 0 around the point x0
which is passed either above (a) or below (b) the real axis
ˇ ˇ ˇQn ˇ Qn Qn
ˇ Pn .z/ ˇ ˇ ˇ
iD1 .z ai / jz ai j .jzj C jai j/
jf .z/j D ˇˇ ˇD ˇQ ˇˇ D Qm ˇˇ
iD1 ˇ QmiD1 ˇ ˇ
Qm .z/ ˇ ˇ m
ˇ jD1 z bj ˇ jD1 z bj
ˇ
jD1 jzj bj
ˇ ˇ
Qn
.R C jai j/
D QmiD1 ˇ ˇ ; (2.141)
ˇ ˇ
jD1 R bj
where ai and bi are zeros of the two polynomials. The expression in the right-hand
side of (2.141) tends to zero as R ! 1 since the polynomial in the denominator is
of a higher order (m > n) with respect to R than in the numerator. This convergence
to zero does not depend on the phase and hence is uniform. Note that it is not at all
obvious that the same can be proven for a general function f .x/; however, in many
physical applications f .x/ is some rational function of x.
Therefore, let us consider an analytic function f .z/ which tends to zero uniformly
when jzj ! 1. We shall also assume that f .z/ does not have singularities in the
upper half of the complex plane including the real axis itself. Then, if x0 is some
real number,
I
f .z/dz
D 0;
L z x0
according to the Cauchy formula (2.53), where L is an arbitrary loop running in the
upper half plane. The loop may include the real axis, but must avoid the point x0 .
Consider now a particular loop shown in Fig. 2.36(a), where we assume taking the
limits R ! 1 and ! 0. We can write
Z x0 Z Z R Z
C C C D 0; (2.142)
R C x0 C CR
where the integral over the large semicircle CR tends to zero due ˇ to ˇ
uniform
convergence of f .z/ to zero as jzj D R ! 1. Indeed, we can write ˇf Rei ˇ MR
with MR ! 0 when R ! 1. Hence,
2.9 Selected Applications in Physics 231
ˇZ ˇ ˇZ ˇ Z ˇ ˇ
ˇ
ˇ f .z/dz ˇˇ ˇˇ f Rei Riei ˇ
ˇ ˇf Rei ˇ R
ˇ D ˇ d ˇ ˇ i ˇd
CR z x0 ˇ ˇ 0 Rei x0 ˇ 0
ˇRe x0 ˇ
Z
R ˇ i ˇ MR
ˇf Re ˇ d :
R x0 0 1 x0 =R
It is obvious now that the estimate in the right-hand side tends to zero as R ! 1.
Now, let us calculate the integral over the small semicircle where z D x0 C ei . We
have
Z Z 0 i
f .z/dz f x0 C ei ie
D d
C z x0 ei
Z Z
D i f x0 C ei d ! if .x0 / d D i f .x0 /
0 0
in the ! 0 limit. The two integrals along the real x axis in Eq. (2.142) combine
into the Cauchy principal value integral
Z x0 Z 1 Z 1
f .x/
C ! P dx
R x0 C 1 x x0
Remarkably, only the values of the function on the real axis enter this expression!
Normally in applications this formula is presented in a different form in which real
and imaginary parts of the function f .x/ are used:
Z 1 Z 1
1 Im f .x/ 1 Re f .x/
Re f .x0 / D P dx and Im f .x0 / D P dx:
1 x x0 1 x x0
(2.144)
These relationships were discovered by H.A. Kramers and R. Kronig in relation
to real and imaginary parts of the dielectric constant of a material. However, the
physical significance of these relations is much wider. They relate imaginary and
real parts of the time Fourier transform (to be considered in Chap. 5) of a response
function .t / of an observable G.t/ of a physical system subjected to an external
perturbation F.t/ (e.g. a field):
Z t
G.t/ D .t / F. /d: (2.145)
1
Note that the integral here is taken up to the current time t due to causality: the
observable G.t/ can only depend on the values of the field at previous times, it
232 2 Complex Numbers and Functions
cannot depend on the future times > t. An example of Eq. (2.145) could be, for
instance, between the relationship between the displacement vector D.t/ and the
electric field E.t/ in electromagnetism, in which case the response function is the
time dependent dielectric function .t /.
It can be shown using the Fourier transform method that because of the causality
alone (i.e. the response function .t/ D 0 for t < 0), the Kramers–Kronig relations
can be derived independently. Note that the imaginary part of the response function
is responsible for energy dissipation in a physical system, while the real part of
.t / corresponds to a driving force.
Problem 2.97. Prove formula (2.143) using the contour shown in Fig. 2.36(b).
Problem 2.98. Consider specifically a response function .!/ D 1 .!/ C
i2 .!/ with 1 and 2 being its real and imaginary parts and ! > 0 a real
frequency. For physical systems .!/ D .!/, which guarantees that the
physical observables remain real. Show that in this case the Kramers–Kronig
relations read
Z 1 Z 1
2 !2 .!/ 2!0 1 .!/
1 .!/ D P 2 !2
d! and 2 .!/ D P 2 !2
d!:
0 ! 0 0 ! 0
(2.146)
where 1 .!/ D 1 if 1 < ! < 1 and 0 otherwise. Use the following method: (i)
relate first the real part of the function e
K.!/ to 1 .!/ of Eq. (2.123) expressed
via a similar integral; (ii) then, using the Kramers–Kronig relations, determine
the imaginary part of e K.!/.
Let us first discuss a little what a wave is. Consider the following function of the
coordinate x and time t:
If we choose a fixed value of the x, the function ‰ .x; t/ oscillates at that point
with the frequency ! between the values ˙‰0 . Consider now a particular value
of ‰ at time t (x is still fixed). The full oscillation cycle corresponds to the time
2.9 Selected Applications in Physics 233
t C T after which the function returns back to the chosen value. Obviously, this must
correspond to the !T being equal exactly to the 2 which is the period of the sine
function. Therefore, the minimum period of oscillations is given by T D 2 =!.
Let us now make a snapshot of the function ‰ .x; t/ at different values of x but at
the given (fixed) time t. It is also a sinusoidal function shown in blue in Fig. 2.37.
After a passage of small time t the function becomes
and its corresponding snapshot is shown in red in Fig. 2.37. We notice that the
whole shape is shifted to the right by some distance x. This can be calculated
by, e.g. considering the shift of the maximum of the sine function. Indeed, at time
t the maximum of one of the peaks is when kxm !t D =2. At time t C t the
maximum must be at the point xm C xm , where k .xm C xm / ! .t C t/ D =2.
From these two equations we immediately get that xm D !t=k. Therefore, the
function (2.147) “moves” to the right with the velocity vphase D xm =t D !=k.
Over the time of the period T the function would move by the distance D
vphase T D !T=k D 2 =k.
We see that the specific construction (2.147) corresponds to a wave. It could
be, e.g. a sound wave: for the given position x the function ‰ oscillates in time
(particles of air vibrate in time at the given point), but at each given time the
function ‰ at different positions x forms a sinusoidal shape (the displacement of
air particles at the given time changes sinusoidally with x). This sinusoidal shape, if
considered as a function of time, moves undistorted in the positive x direction with
the velocity vphase: This would be the velocity at which the sound propagates: the
sound (oscillation of air particles) created at a particular point x0 (the source) would
propagate along the direction x with the velocity vphase . The latter is called phase
velocity, ! the frequency, k the wave vector and the distance the wave passes over
the time T the wavelength.
234 2 Complex Numbers and Functions
Since the wave depends only on a single coordinate x, points in space with
different y and z but the same x form a plane perpendicular to the x axis and passing
through that value of x. All y and z points lying on this plane would have exactly the
same properties, i.e. they oscillate in phase with each other; this wave is actually a
plane wave since its front is a plane. A point source creates a spherical wave with
the front being a sphere, but at large distances from the source such waves can be
approximately considered as plane waves as the curvature of a sphere of a very large
radius is very small.
The function ‰ satisfies a simple partial differential equation:
@2 ‰ 1 @2 ‰
2
D 2 ; (2.148)
@x vphase @t2
which is called the (one-dimensional) wave equation. We have already come across
wave equations in Sect. I.6.7.1, but these were more general three-dimensional
equations corresponding to waves propagating in 3D space.11
Of course, the cosine function in place of the sine function above can serve
perfectly well as a one-dimensional wave. Moreover, their linear combination
would also describe a wave. Its shape (which moves undistorted with time) is still
perfectly sinusoidal as is demonstrated by the following simple manipulation (where
' D kx !t):
p
A B
A sin ' C B cos ' D A2 C B2 p sin ' C p cos '
A2 C B2 A2 C B2
p p
D A2 C B2 .cos sin ' C sin cos '/ D A2 C B2 sin.' C / ;
where tan D B=A. And of course the form (2.149) still satisfies the same
wave equation (2.148). At the same time, we also know that both sine and cosine
functions can also be written via complex exponentials by virtue of the Euler’s
formulae (2.33). Therefore, instead of Eq. (2.149) we can also write
where C1 and C2 are complex numbers ensuring that ‰ is real. Obviously, since
the second exponential is a complex conjugate of the first, the ‰ will be real if
C2 D C1 . It is readily checked that the function (2.150) still satisfies the same wave
equation (2.148). Although complex exponential functions are perfectly equivalent
to the real sine and cosine functions, they are much easier to deal with in practice,
and we shall illustrate their use now in a number of simple examples from physics.
11
We shall consider the 1D wave equation in more detail in Sect. 8.2.
2.9 Selected Applications in Physics 235
4 1 @D 1 @H
curlH D jC ; divH D 0; curlE D ; divE D 0; (2.151)
c c @t c @t
4 @
curl curlH D curlE C curlE:
c c @t
Using the fact that
(the divergence of E is zero due to the last equation in (2.151)), and the third
equation (2.151) relating curlE to H, we obtain a closed equation for H:
4 @H @2 H
H D 2
C 2 2 : (2.152)
c @t c @t
A similar calculation starting form the third equation in (2.151) results in an
identical closed equation for E:
4 @E @2 E
E D C : (2.153)
c2 @t c2 @t2
Consider now a plane wave propagating along the z direction:
and similarly for the H.z; t/. Substituting this trial solution into Eq. (2.153) should
give us the unknown dependence of the wave vector k on the frequency !. We obtain
r
2!2 4 ! 4 !
k D 2 Ci H) k.!/ D Ci D .n C i/ :
c ! c ! c
(2.155)
236 2 Complex Numbers and Functions
We see that the wave vector k is complex. This makes perfect sense as it corresponds
to the propagating waves attenuating into the media. Indeed, using the value of k in
Eq. (2.154), we obtain
where k0 D !n=c and ı D !=c. It is seen that the amplitude of the wave, E0 eız ,
decays in the media as the energy of the wave is spent on accelerating conducting
electrons in it.
Problem 2.100. Show that the solution of the equation (2.155) with respect to
real and imaginary parts of the wave vector reads
v "r # v "r #
u u
u1 16 2 2 u1 16 2 2
nDt 2 C C and Dt 2 C :
2 !2 2 !2
(2.156)
Problem 2.101. Assume that the conductivity of the media is small as
compared to , i.e. =! . Then show that in this case
p 2
n' and ' p ;
!
and hence n. This means that absorption of energy in such a media is very
small.
4 @H 4 @E
H D 2
and E D 2
; (2.157)
c @t c @t
We assume that the current flows along the x axis:
jx D j.z/ei!t ; jy D 0; jz D 0:
Because of the continuity equation, Sect. I.6.5.1, div j D 0 and hence @jx =@x D
0, i.e. the current cannot depend on x, it can only depend on z (if we assume, in
addition, that there is no y dependence either). This seems a natural assumption for
the conductor which extends indefinitely in the positive and negative y directions:
the current may only depend on the distance z from the boundary of the conductor,
and hence the function j.z/ characterises the distribution of the current with respect
2.9 Selected Applications in Physics 237
to that distance. To find this distribution, we note that the current and the electric
field are proportional, j D E, and hence only the x component of the field E
remains, Ex .z; t/ D E.z/ei!t , where j.z/ D E.z/. Substituting this trial solution
into the second equation (2.157) for the field E, we obtain
@2 E.z/ 4 !
2
D i 2 E.z/: (2.158)
@z c
Problem 2.102. Assuming an exponential solution, E.z/ ez , show that the
two possible values of the exponent are
1 C ip 1Ci
1;2 D ˙ 2 !D˙ ; (2.159)
c ı
Physically, it is impossible that the field (and, hence, the current) would increase
indefinitely with z; therefore, the first term above leading to such a nonphysical
behaviour must be omitted (A D 0), and we obtain
The current jx .z; t/ D Ex .z; t/ behaves similarly. It appears then that the field and
the current decay exponentially into the conductor remaining non-zero within only
a thin layer
p near its boundary, with the width of the layer being of the order of
ı D c= 2 !. The width decays as ! 1=2 with the frequency !, i.e. the effect is
more pronounced at high frequencies. This is called the skin effect. As ! ! 0 the
“skin” width ı ! 1, i.e. there is no skin effect anymore: the direct (not alternating)
current is distributed uniformly over the conductor.
Problem 2.103. Using the third equation in (2.151), show that the magnetic
field is
.1 i/ c
Hx D 0; Hy .z; t/ D Ex .z; t/ ; Hz D 0;
!ı
i.e. the magnetic field also decays exponentially into the body of the conductor
remaining perpendicular to the electric field. Numerical estimates show that
at high frequencies ! the magnetic field is much larger than the electric field
within the skin layer.
238 2 Complex Numbers and Functions
„2 00
.x/ C V.x/ .x/ D E .x/; (2.161)
2m
where „ is the Planck constant (divided by 2 ), m electron mass and E is the electron
energy.
Here we shall consider a number of problems where an electron behaves as a
wave which propagates in space with a particular momentum and may be reflected
from a potential barrier; at the same time, we shall also see that under certain
circumstances the wave may penetrate the barrier and hence transmit through it,
which is a purely quantum effect called tunneling.
Consider first the case of an electron moving in a free space where V.x/ D 0.
The solution of the differential equation (2.161) can be sought in a form of an
exponential, .x/ eikx , which upon substitution yields E D „2 k2 =2m, the energy
of a free electron. The energy is positive and continuous. The wave function of
the electron corresponds to two possible values of the wave vectorp k (in fact, it is
just a number in the one-dimensional case), which are: k˙ D ˙ 2mE=„ D ˙k.
Correspondingly, the wave function can be written as a linear combination
where A and B are complex constants. The first term in .x/ above describes the
electron moving along the positive x direction, whereas the second term—in the
negative one.
Let us now consider a more interesting case of an electron hitting a wall, as
shown in Fig. 2.38(a). We shall consider a solution of the Schrödinger equation
corresponding to the electron propagating to the right from x D 1. In this case in
the region x < 0 (the left region) the wave function of the electron
Fig. 2.38 (a) The potential V.x/ makes a step of height V0 at x D 0, so that the electron wave
D eikx propagating towards the step (to the right), although partially penetrating into the step,
will mostly reflect from it propagating to the left as eikx ; (b) in the case of the potential
barrier of height V0 and width d the electron wave propagates through the barrier as eikx ,
although a partial reflection from the barrier is also taking place; (c) the same case as (b), but in
this case a bias < 0 is applied to the left electrode, so that the potential experienced by the
electrons U D e > 0 (e < 0 is the electron charge) on the left becomes higher than on the right
causing a current (a net flow of electrons to the right)
p
where g D 2m .E V0 /=„. We shall consider the most interesting case of the
electron energy E < V0 . In a classical setup of the problem the electron would not be
able to penetrate into the right region with this energy; however,
p quantum mechanics
allows for some penetration. Indeed, in this case g D i 2m .V0 E/=„ D i
becomes purely imaginary and hence the wave function in the barrier region
becomes
Since > 0, the second term must be dropped as it leads to an infinite increase of
the wave function at large x which is nonphysical (recall that j .x/j2 represents the
probability density). Therefore,
To find the amplitudes in Eqs. (2.162) and (2.163), we exploit the fact that the
wave function .x/ and its first derivative must be continuous across the whole x
axis. This means that the following conditions must be satisfied at any point x D xb :
ˇ ˇ
d ˇ d ˇ
.xb 0/ D .xb C 0/ and ˇ
.x/ˇ D .x/ˇˇ ; (2.164)
dx xDxb 0 dx xDxb C0
where C0 and 0 correspond to the limits of x ! xb from the right and left, respec-
tively. In our case the potential makes a jump at xb D 0 and hence this particular
240 2 Complex Numbers and Functions
Using these conditions, we immediately obtain two simple equations for the
amplitudes:
We see that in this case all of the incoming wave gets reflected from the step;
however, some of the wave gets transmitted into the step through the wall: the
probability density P.x/ D j R .x/j2 D jCj2 e2x is non-zero within a small region
of the width x 1=2 behind the wall of the step, although it decays exponentially
inside the step. This behaviour is entirely due to the quantum nature of the electron.
Since no electron is to be determined at x D C1, the transmission through the step
is zero in this case. Note that x ! 0 when V0 ! 1, i.e. the region behind the wall
(x > 0) is forbidden even for quantum electrons when the step is infinitely high.
A very interesting case from the practical point of view is a step of a finite
width shown in Fig. 2.38(b). In this case two solutions of the same energy E of the
Schrödinger equation exist: one corresponding to the wave propagating from left to
right, and one in the opposite direction. Let us first consider the wave propagating
to the right. Three regions are to be identified: x < 0 (left, L), 0 < x < d (central, C)
and x > d (right, R). The corresponding solutions of the Schrödinger equation for
all these regions are
p p
where k D 2mE=„ and q D 2m .E V0 /=„. We have set the amplitude of the
wave incoming to the barrier in L to one as the amplitudes can only be determined
relatively to this one anyway; also note that only a single term exists for the right
wave, R , as we are considering the solution propagating to the right.
while in the case of 0 < E < V0 (when q becomes imaginary) they are
2
C k2 sinh .d/
RD 2 ;
.k 2 / sinh .d/ 2ik cosh .d/
2ikeikd
TD ;
.k2 2 / sinh .d/ 2ik cosh .d/
k .i C k/ ed
CD and
.k2 2 / sinh .d/ 2ik cosh .d/
k .i k/ ed
DD ;
.k2 2 / sinh .d/ 2ik cosh .d/
p
where D 2m .V0 E/=„.
242 2 Complex Numbers and Functions
where jŒ states explicitly that the current depends on the wave function
to be used (it is said that the current is a functional of the wave function; we
shall consider this notion in more detail in Chap. 9) and e is the (negative)
electron charge. The current should not depend on the x, i.e. it is expected that
it is constant across the whole system. Show that the currents calculated using
L .x/ (i.e. in the left region of Fig. 2.38(b)) and R .x/ (in the right region) are,
respectively:
e„k e„k
jL D 1 jRj2 and jR D jTj2 : (2.168)
m m
Then demonstrate by a direct calculation based on the expressions for the
amplitudes given in Problem 2.104 that
jTj2 D 1 jRj2 :
This identity guarantees that the current on the left and on the right due to an
electron of any energy E is the same.
Problem 2.107. The reflection coefficient r is defined as the ratio of the current
(which is basically a flux of electrons) of the reflected wave, j Reikx (here we
assume jŒ calculated specifically for D Reikx ), to the current due to the
incoming one, j eikx , while the transmission coefficient t is defined as the ratio
of the transmitted, j Teikx , and the incoming waves. Show that
j Reikx j Teikx
rD D jRj2 and tD D jTj2 :
j Œeikx j Œeikx
Considering specifically the two possible cases for the electron energy, show
that:
2 2 " 2 #1
q k2 k2 q2 2
r.E/ D ; t.E/ D 1 C sin .qd/
.q2 C k2 /2 C 4k2 q2 cot2 .qd/ 2kq
when E > V0
(continued)
2.9 Selected Applications in Physics 243
where the energy dependence comes from that of k and as given above.
Problem 2.108. Consider now the other solution of the Schrödinger equation
corresponding to the propagation of an electron of the same energy E from right
to left. In this case the solutions for the wave functions in the three regions we
are seeking are
Therefore, the net current due to electrons traveling from the left and from the
right, which is the difference of the two, is zero. This result is to be expected as the
situation is totally symmetric: there is no difference between the two sides (x < 0
and x > d) of the system. In order for the current to flow, it is necessary to distort this
balance, and this can be achieved by applying a bias to the system which would
result in the electrons on the right and left to experience a different potential V.x/.
The simplest case illustrating this situation is shown in Fig. 2.38(c) where, because
of the bias < 0 applied to the left electrode, the potential V.x/ D U D e > 0
in the left region (x < 0), while V D 0 in the right one (x > d). Recall that e is the
(negative) electron charge.
p p p
where k D 2m .E U/=„, q D 2m .E V0 /=„ and p D 2mE=„, and
then determine the amplitudes R! and T ! ; you should get these:
(continued)
244 2 Complex Numbers and Functions
These formulae are valid for any energy E; specifically, q D i when E <
V0 and the sine and cosine functions are understood in the proper sense as
functions of the complex variable.
The most interesting case to consider is when electrons are with energies
U < E < V0 which corresponds to the tunneling. Then, show that for these
energies:
2 4k2 2
t! D jT ! j D 2
;
. 2 pk/ sinh2 .d/ C 2 .p C k/2 cosh2 .d/
2 2
2 C pk sinh2 .d/ C 2 .p k/2 cosh2 .d/
r! D jR! j D :
. 2 pk/2 sinh2 .d/ C 2 .p C k/2 cosh2 .d/
Using the complete wave functions in the left and right regions, show that the
corresponding currents are
Note that j! !
L ¤ jR . Next, consider the reverse current, from right to left.
Repeating the calculations, show that in this case (for any E):
2ipd q .p k/ C i q2 pk tan .qd/
R De and
q .k C p/ i .q2 C pk/ tan .qd/
2pqeipd
T D : (2.172)
q .k C p/ cos .qd/ i .q2 C pk/ sin .qd/
2 p2 ! 2
t D jT j D t and r D jR j D r! :
k2
e„ 4 2 pk .k p/
jD 2
:
m . 2 pk/ sinh2 .d/ C 2 .p C k/2 cosh2 .d/
The problem we have just considered is a very simplified model for a very
real and enormously successful experimental tool called Scanning Tunneling
Microscopy (STM) for invention of which Gerd Binnig and Heinrich Rohrer
received the Nobel Prize in Physics in 1986. Schematically STM is shown in
Fig. 2.39(a). An atomically sharp conducting tip is brought very close to a con-
ducting surface and the bias is applied between the two. As the distance between
the tip apex atom (the atom which is at the tip end, the closest to the surface)
and the surface atoms reaches less than '10 Å, a small current (in the range
of pA to nA) is measured due to tunneling electrons: the vacuum gap between
the two surfaces serves as a potential energy barrier we have just considered in
Problem 2.109. Most importantly, the current depends on the lateral position of
the tip with respect to the atomic structure of the surface. This allows obtaining in
many cases atomic resolution when scanning the sample with the STM. Nowadays
Fig. 2.39 (a) A sketch of the Scanning Tunneling Microscope (STM): an atomically sharp tip
is placed above a sample surface 4–10 Å from it. When a bias is applied to the tip and sample,
electrons tunnel between the two surfaces. (b) STM image of an Si(111) 77 reconstructed surface
with a small coverage of C60 molecules on top of it; bias voltage of 1:8 V and the tunnel current
of 0.2 nA were used (reproduced with permission from J.I. Pascual et al. —Chem. Phys. Lett. 321
(2000) 78)
246 2 Complex Numbers and Functions
the STM is widely used in surface physics and chemistry as it allows not only
imaging atoms and adsorbed molecules on crystal surfaces, but also moving them
(this is called manipulation) and performing local chemical reactions (e.g. breaking
a molecule into parts or fusing parts together). In Fig. 2.39(b) a real STM image of
C60 molecules adsorbed on top of the Si(111) reconstructed surface is shown. You
can clearly see atoms of the surface as well as the molecules themselves as much
bigger circles. Atoms on the surface form a nearly perfect arrangement, however,
many defects are visible. The molecules are seen as various types of bigger balls
with some internal structure (the so-called sub-molecular resolution): carbon atoms
of the molecules can be distinguished as tiny features on the big circles. Analysing
this substructure, it is possible to figure out the orientation of the molecules on the
surface. Different sizes of the balls correspond most likely to different adsorption
positions of the molecules.
Consider
an n-level quantum system described by the n n Hamiltonian matrix
H D H kj . If ‰0 D . k / is a vector-column of the system wave function at time
t, then at time t0 > t its wave function can be formally written as ‰t0 D Ut0 t ‰t ,
where Ut0 t is a matrix to be calculated. Due to its nature it is called the propagation
matrix as it describes a propagation of the system state vector ‰t from the initial
time t to the final time t0 . Let us calculate the propagation matrix in the case when
the Hamiltonian matrix H does not depend on time.
We shall start by solving approximately the time dependent Schrödinger equation
d‰t
i„ D H‰t (2.173)
dt
for a very small propagation time t. In this case the time derivative of the wave
function vector can be approximated as
Now, let k and xk be the eigenvalues and eigenvectors of the matrix H. The latter
is Hermitian, H D H, and hence the eigenvalues are real and eigenvectors can
P an orthonormal set, xk xj D ıkj .
always be chosen in such a way that they comprise
Expand ‰t in terms of the eigenvectors, ‰t D k ˛k xk . Multiplying both sides of
this equation by xk from the left and using the orthonormality relation between the
eigenvectors, the coefficients ˛k can be found as ˛k D xk ‰t . Hence, one can then
rewrite Eq. (2.174) as:
m X X m
t t
‰tCmt ' 1C H ˛k xk D ˛k 1 C H xk
i„ k k
i„
X t
m
D ˛k 1 C k xk :
k
i„
At the last step we used the spectral theorem, Eq. (1.90), stating that a function of a
matrix, acting on its eigenvector, can be replaced with the same function calculated
at the corresponding eigenvalue. Let t0 D t C mt be the finite time after the
propagation. Then t D .t0 t/ =m and we obtain
X m
k .t0 t/ =i„
‰tCmt D ‰t0 ' xk ‰t 1C xk :
k
m
The above formula becomes exact in the limit of t ! 0 or, which is equivalent,
when m ! 1. As we already know (see Sect. 2.3.3), that limit is equal to the
exponential function:
X
k .t0 t/ =i„ m
‰ D
t0 lim 1 C
xk ‰t xk
k
m!1 m
" #
X X
k .t0 t/=i„ k .t0 t/=i„
D xk ‰t e xk D e xk xk ‰t :
k k
In the square brackets we recognise the spectral theorem expansion of the matrix
0
eH.t t/=i„ which serves then as the required propagation matrix:
0 0
Ut0 t D eH.t t/=i„ D eiH.t t/=„ :
Chapter 3
Fourier Series
In Sect. I.7.21 of the first book we investigated in detail how an arbitrary function
f .x/ can be expanded into a functional series in terms of functions an .x/ with
n D 0; 1; 2; : : :, i.e.
1
X
f .x/ D f0 a0 .x/ C f1 a1 .x/ C D fn an .x/: (3.1)
nD0
There are many applications in which one has to solve ordinary or partial
differential equation with respect to functions which are periodic:
f .x C T/ D f .x/: (3.2)
1
In the following, references to the first volume of this course (L. Kantorovich, Mathematics for
natural scientists: fundamentals and basics, Springer, 2015) will be made by appending the Roman
number I in front of the reference, e.g. Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of
the first volume, respectively.
Consider functions
n x n x
n .x/ D cos and n .x/ D sin ; (3.3)
l l
where n D 0; 1; 2; : : :. As can easily be seen, they have the same periodicity T for
any n, i.e.
n .x C 2l/ n x n x
cos D cos C 2 n D cos and
l l l
n .x C 2l/ n x n x
sin D sin C 2 n D sin :
l l l
What we would like to do is to understand whether it is possible to express f .x/ as
a linear combination of all these functions for all possible values of n from 0 to 1.
We shall start our discussion by showing that the functions (3.3) have a very simple
and important property. Namely, they satisfy the following identities for any n ¤ m:
Z l Z l
n .x/ m .x/dx D 0 and n .x/ m .x/dx D 0; (3.4)
l l
Indeed, let us first prove Eqs. (3.4). Using trigonometric identities from Sect. I.2.3.8,
one can write
n x m x 1 .n C m/ x .n m/ x
n .x/ m .x/ D cos cos D cos C cos :
l l 2 l l
Note that for any different n and m, the integer numbers k D n ˙ m are never equal
to zero. But then for any k ¤ 0:
Z ˇ
l
k x l k x ˇˇl l
cos dx D sin ˇ D Œsin k sin .k / D 0;
l l k l l k
so that the first integral in Eq. (3.4) is zero.
The relations we have found can now be conveniently rewritten using the Kronecker
symbol ınm which, we recall, is by definition equal to zero if n ¤ m and to unity if
n D m. Then we can write
Z l Z l
n .x/ m .x/dx D lınm and n .x/ m .x/dx D lınm : (3.6)
l l
Thus, we find that the integral between l and l (the interval of periodicity of f .x/)
of a product of any two different functions taken from the set f n ; n g of Eq. (3.3)
is always equal to zero. We conclude then that these functions are orthogonal, i.e.
they form an orthogonal set of functions. We shall have a more detailed look at an
expansion via orthogonal sets of functions in Sect. 3.7.3.
Problem 3.3. Prove that Eqs. (3.6) are valid also for the integral limits l C c
and l C c with c being any real number. This means that the integrals can be
taken between any two limits x1 and x2 , as long as the difference between them
x2 x1 D 2l D T is equal to the period.
Let us now return to our function f .x/ that is periodic with the period of T D 2l,
and let us assume that it can be represented as a linear combination of all functions
of the set f n ; n g, i.e. as an infinite functional series
a0 X n n xo
1 1
a0 X n x
f .x/ D C fan n .x/ C bn n .x/g D C an cos C bn sin :
2 nD1
2 nD1
l l
(3.7)
Note that 0 .x/ D 0 and thus has been dropped; also, 0 .x/ D 1 and hence the a0
term has been separated from the sum with its coefficient chosen for convenience
as a0 =2. Note that f .x/ in the left-hand side of the expansion above, as we shall
see later on in Sect. 3.2, may not coincide for all values of x with the value which
the series in the right-hand side converges to; in other words, the two functions, f .x/
and the series itself, may differ at specific points. We shall have a proper look at this
particular point and the legitimacy of this expansion later on in Sect. 3.7.
Now, what we would like to do is to determine the coefficients a0 ; a1 ; a2 ; a3; ; : : :
and b1 ; b2 ; b3 ; : : :, assuming that such an expansion exists. To this end, let us also
252 3 Fourier Series
assume that we can integrate both sides of Eq. (3.7) from l to l term-by-term
(of course, as we know well from Sect. I.7.2, this is not always possible). Thus,
we have
Z l Z l X1 Z l Z l
a0
f .x/dx D dx C an n .x/dx C bn n .x/dx :
l l 2 nD1 l l
The integrals in the right-hand side for n 1 in the curly brackets are all equal to
zero (this can be checked either by a direct calculation, or also from the fact that
these integrals can be considered as orthogonality integrals (3.6) with 0 .x/ D 1),
so that the .a0 =2/ 2l D a0 l is obtained in the right-hand side, yielding
Z
1 l
a0 D f .x/dx (3.8)
l l
The first term in the right-hand side is zero since m ¤ 0, similarly to above. In the
same fashion, due to Eq. (3.5), the second integral in the curly brackets is also equal
to zero, and we are left with
Z l 1
X Z l
f .x/ m .x/dx D an n .x/ m .x/dx:
l nD1 l
Now in the right-hand side we have an infinite sum of terms containing the same
integrals as in Eq. (3.6) which are all equal to zero except for the single one in
which n D m, i.e. only a single term in the sum above survives
Z l 1
X Z l 1
X
f .x/ m .x/dx D an n .x/ m .x/dx D an lınm D am l;
l nD1 l nD1
Note that a0 of Eq. (3.8) can also formally be obtained from Eq. (3.9) although
the latter was, strictly speaking, obtained for non-zero values of m only. This
became possible because of the factor of 1=2 introduced earlier in Eq. (3.7), which
3.1 Trigonometric Series: An Intuitive Approach 253
now justifies its convenience. Therefore, Eq. (3.9) gives all an coefficients. Note,
however, that in practical calculations it is frequently required to consider the
coefficients a0 and an for n > 0 separately.
Thus, the formulae (3.8)–(3.10) solve the problem: if the function f .x/ is
known, then we can calculate all the coefficients in its expansion of Eq. (3.7). The
coefficients an and bn are called Fourier coefficients, and the infinite series (3.7)
Fourier series.
Example 3.1. I Consider a periodic function with the period of 2 specified in
the following way: f .x/ D x within the interval < x < , and then repeated
like this to the left and to the right. This function, when periodically repeated, jumps
between its values of ˙1 at the points ˙k for any integer k D 1; 3; 5; : : :. Calculate
the expansion coefficients an and bn and thus write the corresponding Fourier series.
Solution. In this case l D , and the formulae (3.8)–(3.10) for the coefficients an
and bn are rewritten as:
Z
1
am D x cos .mx/ dx D 0; m D 0; 1; 2; : : : ;
and
Z ˇ Z
1 1 cos.mx/ ˇˇ 1
bm D x sin .mx/ dx D x ˇ Cm cos .mx/ dx
m
8 9
ˆ
ˆ >
ˆ
< ˇ > >
=
1 cos.m / cos.m / 1 ˇ
D C 2 sin.mx/ˇˇ
ˆ
ˆ m m m >
>
:̂ „ ƒ‚ …> ;
D0
2 2
D .1/m D .1/mC1 ; m ¤ 0
m m
(the integration by parts was used). Note that am D 0 for any m because the function
under the integral for am is an odd function and we integrate over a symmetric
interval. Thus, in this particular example the Fourier series consists only of sine
functions:
X1
2
f .x/ D .1/mC1 sin.mx/: (3.11)
mD1
m
254 3 Fourier Series
2.5 n=3
0
-2.5
2.5 n=5
0
-2.5
fn(x)
2.5 n=10
0
-2.5
2.5 n=20
0
-2.5
-15 -10 -5 0 5 10 15
x
Fig. 3.1 Graphs of fn .x/ corresponding to the first n terms in the series of Eq. (3.11)
The convergence of the series is demonstrated in Fig. 3.1: the first n terms in the
series are only accounted for, i.e. the functions
Xn
2
fn .x/ D .1/mC1 sin.mx/
mD1
m
for several choices of the upper limit n D 3; 5; 10; 20 are plotted. It can be seen
that the series converges very quickly to the exact function between and .
Beyond this interval the function is periodically repeated. Note also that the largest
error in representing the function f .x/ by the series appears at the points ˙k (with
k D 1; 3; 5; : : :) where f .x/ jumps between the values ˙1. J
The actual integration limits from l to Cl in the above formulae were chosen
only for simplicity; in fact, due to periodicity of f .x/, cos .m x=l/ and sin .m x=l/,
one can use any limits differing by 2l, i.e. from l C c to l C c for any value of c
(Problem 3.3). For instance, in some cases it is convenient to use the interval from
0 to T D 2l.
The Fourier expansion can be handy in summing up infinite numerical series as
illustrated by the following example:
Example 3.2. I Show that
1 1 1
SD1 C C D : (3.12)
3 5 7 4
3.1 Trigonometric Series: An Intuitive Approach 255
Solution. Consider the series (3.11) generated in the previous Example 3.1 for
f .x/ D x, < x < , and set there x D =2:
X1
2 m
D .1/mC1 sin :
2 mD1
m 2
The sine functions are non-zero only for odd values of m D 1; 3; 5; : : :, so that
we obtain (convince yourself in what follows by calculating the first few terms
explicitly):
1 1 1
D 2 1 C C D 2S;
2 3 5 7
so that S D =4 as required. J
Problem 3.5. Show that if the function f .x/ D f .x/ is even, the Fourier series
is simplified as follows:
1 Z
a0 X nx 2 l
nx
f .x/ D C an cos with an D f .x/ cos dx: (3.13)
2 nD1
l l 0 l
The Fourier series for a periodic function f .x/ can be written in an alternative
form which may better illuminate its meaning. Indeed, the expression in the curly
brackets of Eq. (3.7) can be rewritten as a single sine function:
n x n x
an cos C bn sin D An sin .2 n t C n/ ;
l l
p
where An D a2n C b2n is an amplitude and n a phase satisfying
an bn
sin n Dp and cos n Dp ;
a2n C b2n a2n C b2n
256 3 Fourier Series
and n D n=T D n= .2l/ is the frequency. Correspondingly, the Fourier series (3.7)
is transformed into
1
a0 X
f .x/ D C An sin .2 n x C n/ : (3.15)
2 nD1
This simple result means that any periodic function can be represented as a sum
of sinusoidal functions with discrete frequencies 1 D 0 , 2 D 20 , 3 D 30 ,
etc., so that n D n0 , where 0 D 1=T is the smallest (the so-called fundamental)
frequency, and n D 1; 2; 3; : : :.
For instance, let us consider some signal f .t/ in a device or an electric circuit
which is periodic in time t with the period of T. Then, we see from Eq. (3.15)
that such a signal can be synthesised by forming a linear superposition of simple
harmonic signals having discrete frequencies n D n0 and amplitudes An . As we
have seen from the examples given above, in practice very often only a finite number
of lowest frequencies may be sufficient to faithfully represent the signal, with the
largest error appearing at the points where the original signal f .t/ experiences
jumps or changes most rapidly. It also follows from the above equations that when
expanding a complicated signal f .t/ there is a finite number of harmonics with
relatively large amplitudes, then f .t/ could be reasonably well represented only by
these.
The discussion above was not rigorous: firstly, we assumed that the expansion (3.7)
exists and, secondly, we integrated the infinite expansion term-by-term, which
cannot always be done. A more rigorous formulation of the problem is this: we
are given a function f .x/ specified in the interval2 l < x < l. We then form an
infinite series (3.7) with the coefficients calculated via Eqs. (3.8)–(3.10):
a0 X n n xo
1
n x
C an cos C bn sin :
2 nD1
l l
We ask if for any l < x < l the series converges to some function fFS .x/ , and if
it does, would the resulting function be exactly the same as f .x/, i.e. is it true that
fFS .x/ D f .x/ for all values of x? Also, are there any limitations on the function
f .x/ itself for this to be true? The answers to all these questions are not trivial and
the corresponding rigorous discussion is given later on in Sect. 3.7. Here we shall
simply formulate the final result established by Dirichlet.
2
Or, which is the same, which is periodic with the period of T D 2l; the “main” or “irreducible”
part of the function, which is periodically repeated, can start anywhere, one choice is between l
and l, the other between 0 and 2l, etc.
3.2 Dirichlet Conditions 257
Fig. 3.2 The function f .x/ is equal to f1 .x/ for x < x0 and to f2 .x/ for x > x0 , but is discontinuous
(makes a “jump”) at x0 . However, finite limits exist at both sides of the “jump” corresponding to
the two different values of the function on both sides of the point x D x0 : on the left, f .x 0/
D limx!x0 f1 .x/ D f1 .x0 /, while on the right f .x C 0/ D limx!x0 f2 .x/ D f2 .x0 /, where f1 .x0 / ¤
f2 .x0 /
We first give some definitions (see also Sect. I.2.4.3). Let the function f .x/ be
discontinuous at x D x0 , see Fig. 3.2, but have well-defined limits x ! x0 from the
left and from the right of x0 (see Sect. I.2.4.1.2), i.e.
with ı > 0 in both cases. It is then said that f .x/ has a discontinuity of the first
kind at the point x0 . Then, the function f .x/ is said to be piecewise continuous in the
interval a < x < b, if it has a finite number n < 1 of discontinuities of the first kind
there, but otherwise is continuous everywhere, i.e. it is continuous between any two
adjacent points of discontinuity.
The function f .x/ D x, < x < , of Example 3.1, when periodically
repeated, represents an example of such a function: in any finite interval crossing
points ˙ ; ˙2 ; : : :, it has a finite number of discontinuities; however, at each
discontinuity finite limits exist from both sides. For instance, consider the point
of discontinuity x D . Just on the left of it f .x/ D x and hence the limit from the
left f . 0/ D , while on the right of it f . C 0/ D due to periodicity of f .x/.
Thus, at x D we have a discontinuity of the first kind.
Then, the following Dirichlet theorem addresses the fundamental questions about
the expansion of the function f .x/ into the Fourier series:
Theorem 3.1. If f .x/ is piecewise continuous in the interval l < x < l and has
the period of T D 2l, then the Fourier series
a0 X n n xo
1
n x
fFS .x/ D C an cos C bn sin (3.16)
2 nD1
l l
converges to f .x/ at any point x where f .x/ is continuous, while it converges to the
mean value
1
fFS .x0 / D Œf .x0 0/ C f .x0 C 0/ (3.17)
2
at the points x D x0 of discontinuity.
258 3 Fourier Series
The proof of this theorem is quite remarkable and will be given in Sect. 3.7
using additional assumptions. Functions f .x/ satisfying conditions of the Dirichlet
theorem are said to satisfy Dirichlet conditions.
As an example, consider the function f .x/ D x, < x < , with the period 2
(Example 3.1) whose Fourier series is given by Eq. (3.11):
X1
2
fFS .x/ D .1/mC1 sin.mx/:
mD1
m
What values does fFS .x/ converge to at the points x D ; 0; =2; ; 3 =2? To
answer these questions, we need to check if the function makes a jump at these
points. If it does, then we should consider the left and right limits of the function
at the point of the jump and calculate the average (the mean value); if there is no
jump, the Fourier series converges at this point to the value of the function itself.
f .x/ is continuous at x D 0; =2; 3 =2 and thus fFS .0/ D f .0/ D 0, fFS . =2/ D
f . =2/ D =2 and fFS .3 =2/ D f .3 =2/ D f .3 =2 2 / D f . =2/ D =2
(by employing T D 2 periodicity of f .x/, we have moved its argument 3 =2
back into the interval < x < ), while at x D the function f .x/ has the
discontinuity of the first kind with the limits on both sides equal to C (from the
left) and (right), respectively, so that the mean is zero, i.e. fFS . / D 0. This is
also clearly seen in Fig. 3.1.
2 1
X 4.1/n
f .x/ D C cos.nx/: (3.18)
3 nD1
n2
Problem 3.10. Show that the sine/cosine Fourier series expansion of the
function f .x/ D 1 for 0 < x < and f .x/ D 0 for < x < 0 is
1
1 1 X 1 .1/n
f .x/ D C sin.nx/: (3.19)
2 nD1
n
X .1/nC1 1 2
1 1 1
SD1 C C D D : (3.20)
22 32 42 nD1
n2 12
Problem 3.13. Use the Fourier series (3.19) to show that the numerical series
1 1
SD1 C D : (3.21)
3 5 4
The Fourier series is an example of a functional series and hence it can be inte-
grated term-by-term if it converges uniformly (Sect. I.7.2). However, the uniform
convergence is only a sufficient criterion. We shall prove now that irrespective of
whether the uniform convergence exists or not, the Fourier series can be integrated
term-by-term any number of times.
To prove this, it is convenient to consider an expansion of f .x/ with a0 D 0; it
is always possible by considering the function f .x/ a0 =2 ! f .x/ whose Fourier
expansion would have no constant term, i.e. for this function a0 D 0. Then, we
consider an auxiliary function:
Z x
F.x/ D f .t/dt: (3.22)
l
Z xC2l Z x Z xC2l Z x Z 2l
F.x C 2l/ D f .t/dt D f .t/dt C f .t/dt D f .t/dt C f .t/dt
l l x l 0
Z x
D f .t/dt D F.x/:
l
Further, if f .x/ satisfies the Dirichlet conditions (is piecewise continuous), its
integral, F.x/, will as well. Therefore, F.x/ can also formally be expanded into a
Fourier series:
A0 X h nx i
1
nx
F.x/ D C An cos C Bn sin ; (3.23)
2 nD1
l l
where
Z Z Z
1 l
1 l
1 l
A0 D F.x/dx D xF.x/jll xf .x/dx D xf .x/dx;
l l l l l l
where we have used integration by parts in each case and the fact that F 0 .x/ D f .x/
and F.˙l/ D 0. Above, an and bn are the corresponding Fourier coefficients to
the cosines and sines in the Fourier expansion of f .x/. Therefore, we can write the
expansion of F.x/ as follows:
Z Z 1
X
x
1 l l
nx l nx
F.x/ D f .t/dt D bn cos C
tf .t/dt Can sin :
l l nD1
2l n l n l
(3.24)
Further, at x D l we should have zero in both sides, which means that
Z X 1 Z 1
X
1 l
l 1 l
l .1/n
tf .t/dt bn cos . n/ D 0 H) tf .t/dt D bn ;
2l l nD1
n 2l l nD1
n
3.3 Integration and Differentiation of the Fourier Series 261
Since f1 .x1 / D x1 in the interval under consideration, the integral in the left-hand
side gives x2 =2. Integrating the sine functions in the right-hand side, we obtain
X1
x2 2 1
D .1/mC1 Œ cos.mx/ C 1
2 mD1
m m
262 3 Fourier Series
10
n=10
n=3
8 n=2
6
fn(x)
0
-10 0 10
X
Fig. 3.3 The partial Fourier series of f .x/ D x2 , see Eq. (3.26), containing n D 2, 3 and 10 terms
X1 X1
2.1/mC1 .1/mC1
D 2
cos.mx/ C 2 :
mD1
m mD1
m2
The numerical series (the last term) can be shown (using the direct method for
f .x/ D x2 , i.e. expanding it into the Fourier series, see Problems 3.8 and 3.12)
to be equal to 2 =12,
X1 2
.1/mC1
D ;
mD1
m2 12
The convergence of this series with different number of terms n in the sum (m n)
is pretty remarkable, as is demonstrated in Fig. 3.3. J
The situation with term-by-term differentiation of the Fourier series is more
complex since each differentiation of either cos . nx=l/ or sin . nx=l/ brings in
an extra n in the sum which results is slower convergence or even divergence. For
example, if we formally differentiate term-by-term formula (3.11) for f .x/ D x,
< x < , we obtain
1
X
1D 2.1/mC1 cos.mx/;
mD1
3.4 Parseval’s Theorem 263
which contains the diverging series. There are much more severe restrictions on
the function f .x/ that would enable its Fourier series to be differentiable term-by-
term. Therefore, this procedure should be performed with caution and proper prior
investigation.
Consider a periodic function f .x/ with the period of 2l satisfying Dirichlet condi-
tions, i.e. f .x/ has a finite number of discontinuities of the first kind in the interval
l < x < l. Thus, it can be expanded in the Fourier series (3.16). If we multiply this
expansion by itself, a Fourier series of the square of the function, f 2 .x/, is obtained.
Next, we integrate both sides from l to l (we know by now that the term-by-term
integration of the Fourier series is always permissible), which gives
Z l Z l( )
a0 X h n x i n a0
1
2 n x
fFS .x/dx D C an cos C bn sin
l l 2 nD1
l l 2
1 h
)
X m x m xi
C am cos C bm sin dx
mD1
l l
1 Z
a20 a0 X l n n x n xo
D 2l C 2 an cos C bn sin dx
4 2 nD1 l l l
1 Z l n
n xo n
X1 X
n x m x
C an cos C bn sin am cos
nD1 mD1 l
l l l
m x o
C bm sin dx:
l
In the left-hand side we used fFS .x/ instead of f .x/ to stress the fact that these
two functions may not coincide at the points of discontinuity of f .x/. However,
the integral in the left-hand side can be replaced by the integral of f 2 .x/ if f .x/ is
continuous everywhere; if it has discontinuities, this can also be done by splitting
the integral into a sum of integrals over each region of continuity of f .x/, where
2
fFS .x/ D f .x/. Hence, the integral of fFS .x/ can be replaced by the integral of f 2 .x/ in
a very general case. Further, in the right-hand side of the above equation any integral
in the second term is zero (integrals of cosine and sine functions there can be treated
as the orthogonality integrals with the n D 0 cosine function which is equal to one).
Also, due to the orthogonality of the sine and cosine functions, Eqs. (3.5) and (3.6),
in the term with the double sum only integrals with equal indices n D m are non-
zero if taken between two cosine or two sine functions; all other integrals are zero.
Hence, we obtain the following simple result:
264 3 Fourier Series
Z 1
1 l
a20 X 2
f 2 .x/dx D C an C b2n : (3.27)
l l 2 nD1
This equation is called Parseval’s equality or theorem. It can be used, e.g., for
calculating infinite numerical series.
Example 3.4. I Write the Parseval’s equality for the series (3.11) of f .x/ D x,
< x < , and then sum up the infinite numerical series:
1 1 1
1C C 2 C 2 ::::
22 3 4
Solution. The integral in the left-hand side of the Parseval’s equality (3.27) is
simply (l D here):
Z Z ˇ
1 2 1 2 1 x3 ˇˇ 2 2
f .x/dx D x dx D D :
3 ˇ 3
1
( 2 ) 1 1 1
02 X 2 2 X 4 X 1 X 1 2 2
C 0 C .1/ nC1
D 2
D 4 H) 4 D ;
2 nD1
n nD1
n nD1
n2 nD1
n2 3
or
X1 2
1 1 1
D 1 C C C D ; (3.28)
nD1
n2 22 32 6
as required.J
called Plancherel’s theorem. Here f .x/ and g.x/ are two functions ˚of the same
period 2l, both satisfying the Dirichlet conditions, and fan ; bn g and a0n ; b0n are
their Fourier coefficients, respectively.
Problem 3.15. Use the Parseval’s theorem applied to the series (3.19) to show
that
2
1 1
1C 2
C 2 C D :
3 5 8
3.5 Complex (Exponential) Form of the Fourier Series 265
Problem 3.16. Applying the Parseval’s theorem to the series (3.26), show that
4
1 1
SD1C C C D :
24 34 90
Problem 3.17. Show that the Fourier series expansion of f .x/ D x3 4x, where
2 x < 2, is
1
96 X .1/n nx
f .x/ D 3 3
sin :
nD1
n 2
It is also possible to formulate the Fourier series (3.7) of a function f .x/ that is
periodic with the period of T D 2l in a different form based on complex exponential
functions:
1
X 1
X
f .x/ D cn ein x=l
D cn n .x/: (3.30)
nD1 nD1
Note that here the index n runs over all possible negative and positive integer
values (including zero), not just over zero and positive values as in the case of
the cosine/sine Fourier series expansion. It will become clearer later on why this
is necessary.
The Fourier coefficients cn can be obtained in the same way as for the sine/cosine
series by noting that the functions n .x/ D exp .in x=l/ also form an orthogonal
set. Indeed, if integers n and m are different, n ¤ m, then
Z l Z l Z l
n .x/m .x/dx D ein x=l im x=l
e dx D ei.nm/ x=l
dx
l l l
l ˇ
x=l ˇl l i.nm/
D ei.nm/ l
D e ei.nm/
i.n m/ i.n m/
l
D f2i sin Œ .n m/g D 0:
i.n m/
266 3 Fourier Series
Note that when formulating the orthogonality condition above, we took one of the
functions in the integrand, n .x/, as complex conjugate. If n D m, however, then
Z l Z l Z l
n .x/n .x/dx D jn .x/j2 dx D dx D 2l;
l l l
where ınm is the Kronecker symbol. Note again that one of the functions is complex
conjugate in the above equation.
Thus, assuming that f .x/ can be expanded into the functional series in terms of
the functions n .x/, we can find the expansion coefficients cn exactly in the same
way as for the sine/cosine series: multiply both sides of Eq. (3.30) by m .x/ with a
fixed index m, and integrate from l to l on both sides:
Z l 1
X Z l 1
X
f .x/m .x/dx D cn n .x/m .x/dx D cn 2lınm D cm 2l;
l nD1 l nD1
Note that we never assumed here that the function f .x/ is real. So, it could also be
complex.
The same expressions (3.30) and (3.32) can also be derived directly from the
sine/cosine Fourier series. This exercise helps to understand, firstly, that this new
form of the Fourier series is exactly equivalent to the previous one based on the sine
and cosine functions; secondly, we can see explicitly why both positive and negative
values of n are needed. To accomplish this, we start from Eq. (3.16) and replace sine
and cosine with complex exponentials by means of the Euler’s formulae (2.33):
1 ix 1 ix
sin x D e eix and cos x D e C eix ;
2i 2
yielding:
1
a0 X 1 in x=l
1 in in x=l
f .x/ D C an ein x=l
Ce C bn e x=l
e
2 nD1
2 2i
1 1
a0 X 1 1 X 1 1
D C an ein x=l
C bn ein x=l
C an ein x=l
bn ein x=l
2 nD1
2 2i nD1
2 2i
1
X 1
X
a0 1 1 1 1
D C an C bn ein x=l
C an bn ein x=l
: (3.33)
2 2 nD1
i 2 nD1
i
3.5 Complex (Exponential) Form of the Fourier Series 267
Although these expressions were obtained for positive values of n only, they can
formally be extended for any values of n including negative and zero values. Then,
we observe that an D an , bn D bn , and b0 D 0. These expressions allow us to
rewrite the second sum in (3.33) as a sum over all negative integers n:
1 1
1X 1 1 X 1
an bn ein x=l
D an bn ein x=l
2 nD1 i 2 nD1 i
1
1 X 1
D an C bn ein x=l
:
2 nD1 i
We see that this sum looks now exactly the same as the first sum in (3.33) in which
n is positive, so that we can combine the two into a single sum in which n takes on
all integer values from 1 to C1 except for n D 0:
1
X
a0 1
f .x/ D C .an ibn / ein x=l
:
2 2
nD1;n¤0
and noting that b0 D 0 and hence the a0 =2 term can also be formally incorporated
into the sum as c0 D a0 =2, we can finally rewrite the above expansion in the
form of Eq. (3.30). The obtained equations are the same as Eqs. (3.30) and (3.32),
respectively, but derived differently. Thus, the two forms of the Fourier series are
completely equivalent to each other. The exponential (complex) form looks simpler
and thus is easier to remember. It is always possible, using the Euler’s formula, to
obtain any of the forms as illustrated by the following example:
Example 3.5. I Obtain the complex (exponential) form of the Fourier series for
f .x/ D x, < x < as in Example 3.1.
268 3 Fourier Series
Solution. We start by calculating the Fourier coefficients cn from Eq. (3.32) using
l D . For n ¤ 0
Z
1
cn D xein x=
dx
2
Z ˇ Z
1 inx 1 1 inx ˇˇ 1 inx
D xe dx D x e ˇ C e dx
2 2 in in
1 in 1 in
D e Ce in
e e in
2 in .in/2
1 2 2i 1 .1/nC1
D cos .n / C sin .n / D cos .n / D ;
2 in .in/2 in in
and, when n D 0,
Z ˇ
1 1 x2 ˇˇ
c0 D xdx D D 0;
2 2 2 ˇ
Example 3.6. I Show that the above expansion is equivalent to the series (3.11).
Solution. Since exp .inx/ D cos.nx/ C i sin.nx/, we get by splitting the sum into
two with negative and positive summation indices:
1
X 1
X 1 1
.1/nC1 .1/nC1 inx X .1/nC1 inx X .1/nC1 inx
f .x/ D einx C e D e C e ;
nD1
in nD1
in nD1
in nD1
in
where in the second sum we replaced the summation index n ! n, so that the
new index would run from 1 to C1 as in the other sum. Combining the two sums
together, and noting that .1/nC1 D .1/nC1 , we get
X1 1
.1/nC1 inx X .1/nC1
f .x/ D e einx D 2i sin.nx/
nD1
in nD1
in
1
X 2.1/nC1
D sin.nx/;
nD1
n
which is exactly the same as in Eq. (3.11) which was obtained using the sine/cosine
formulae for the Fourier series.J
3.5 Complex (Exponential) Form of the Fourier Series 269
Problem 3.18. Show that the complex (exponential) Fourier series of the
function
8
< 0; < x < 0
f .x/ D 1; 0 < x < =2
:
0; =2 < x <
is
1
X
1 1
f .x/ D C 1 ein =2
einx : (3.35)
4 2 in
nD1;n¤0
Problem 3.19. Use x D 0 in the series of the previous problem to obtain the
sum of the numerical series (3.21).
Problem 3.20. Show that the expansion of the function
sin x; 0 x <
f .x/ D
0; < x < 0
for two (generally complex) functions f .x/ and g.x/ of the same period of 2l,
with the corresponding (exponential) Fourier coefficients cn and dn .
270 3 Fourier Series
Problem 3.24. Applying the Parseval’s theorem (3.36) to the Fourier series of
Problem 3.20, show that
1
X 2
1 1 1 1
2
D1C 2
C 2 C D C :
nD0 .4n2 1/ 3 15 16 2
Problem 3.25. In theoretical many-body physics one frequently uses the so-
called Matsubara Green’s function G.1 ; 2 / D G.1 2 /. The function G. /
is defined on the interval ˇ ˇ, where ˇ D 1=kB T is the inverse
temperature. For bosons the values of G for positive and negative arguments
are related via the following relationship: G. / D G . ˇ/, where ˇ
0. Show that this function may be expanded into the following Fourier series:
1 X !n
G. / D e G .!n / ;
iˇ n
Problem 3.26. Show that exactly the same expansion exists also for fermions
when the values of G for positive and negative arguments are related via
G. / D G . ˇ/, where ˇ 0. The only difference is that the
summation is run only over odd values of n.
We assume that the force f .t/ is a periodic function with the period T D 2 =!
(frequency !). Let !0 be the fundamental frequency of the harmonic oscillator.
Here yR is a double derivative of y.t/ with respect to time t. We would like to obtain
3.6 Application to Differential Equations 271
the particular integral of this differential equation to learn about the response of the
oscillator to the external force f .t/. In other words, we shall only be interested here
in obtaining the partial integral of this DE.
Using the Fourier series method, it is possible to write down the solution of (3.38)
for a general f .t/. Indeed, since f .t/ is periodic, we can expand it into the complex
Fourier series (we use l D =! so that T D 2l):
1
X 1
X
f .t/ D fn ein t!=
D fn ein!t (3.39)
nD1 nD1
with
Z Z Z 2 =!
1 T=2
in!t 1 T
in!t !
fn D f .t/e dt D f .t/e dt D f .t/ein!t dt;
T T=2 T 0 2 0
(3.40)
where the integration was shifted to the interval 0 < t < T D 2 =!.
To obtain y.t/ that satisfies the differential equation above, we recognise from
Eq. (3.38) that the function Y.t/ D yR .t/ C !02 y.t/ must also be periodic with the
same periodicity as the external force; consequently, the function y.t/ sought for
must also be such. Hence, we can expand it into a Fourier series as well:
1
X
y.t/ D yn ein!t : (3.41)
nD1
or
1
X ˚
.n!/2 C !02 yn fn ein!t D 0: (3.42)
nD1
This equation is satisfied for all values of t if and only if all coefficients of exp .in!t/
are equal to zero simultaneously for all values of n:
Indeed, upon multiplying both sides of Eq. (3.42) by exp .im!t/ with some fixed
value of m and integrating between 0 and T, we get that only the n D m term is
left in the left-hand side of Eq. (3.42) due to orthogonality of the functions n .t/ D
exp .in!t/:
3
Since the Fourier series for both y.t/ and f .t/ converge, the series for yR .t/ must converge as well,
i.e. the second derivative of the Fourier series of y.t/ must be well defined.
$$\sum_{n=-\infty}^{\infty} \left\{ \left[ -(n\omega)^2 + \omega_0^2 \right] y_n - f_n \right\} \int_0^T e^{in\omega t} e^{-im\omega t}\, dt = \sum_{n=-\infty}^{\infty} \left\{ \left[ -(n\omega)^2 + \omega_0^2 \right] y_n - f_n \right\} \delta_{nm}\, T = T \left\{ \left[ -(m\omega)^2 + \omega_0^2 \right] y_m - f_m \right\} = 0,$$

which immediately leads to Eq. (3.43). Thus, we get the unknown Fourier coefficients $y_n$ of the solution $y(t)$ as

$$y_n = \frac{f_n}{\omega_0^2 - (n\omega)^2} \qquad (3.44)$$

and hence the whole solution reads

$$y(t) = \sum_{n=-\infty}^{\infty} \frac{f_n}{\omega_0^2 - (n\omega)^2}\, e^{in\omega t}. \qquad (3.45)$$
We see that the harmonics of f .t/ with frequencies n! are greatly enhanced in
the solution if they come close to the fundamental frequency !0 of the harmonic
oscillator (resonance).
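The spectral solution (3.45) is easy to evaluate numerically. The following Python sketch (illustrative only: the square-wave force and all parameter values are arbitrary choices, and numpy is assumed available) computes the coefficients $f_n$ of Eq. (3.40) by quadrature and assembles $y(t)$ from Eq. (3.45):

```python
import numpy as np

omega0, omega, N = 1.0, 0.3, 50           # oscillator and driving frequencies (arbitrary)
T = 2 * np.pi / omega                      # period of the driving force

def f(t):                                  # a square-wave driving force, as an example
    return np.where((t % T) < T / 2, 1.0, -1.0)

t_grid = np.linspace(0.0, T, 4096, endpoint=False)
n = np.arange(-N, N + 1)

# Fourier coefficients (3.40) on a uniform grid: mean of f(t) exp(-i n w t)
fn = np.array([np.mean(f(t_grid) * np.exp(-1j * k * omega * t_grid)) for k in n])

# solution (3.45); harmonics with n*omega close to omega0 are strongly enhanced
t = np.linspace(0.0, 3 * T, 1000)
y = sum(fn[i] / (omega0**2 - (k * omega)**2) * np.exp(1j * k * omega * t)
        for i, k in enumerate(n)).real

print("max |y(t)| =", np.abs(y).max())
```

With these parameters the harmonics $n = \pm 3$ ($n\omega = 0.9 \approx \omega_0$) dominate the response, which is the resonance enhancement just described.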
3.7 A More Rigorous Approach to the Fourier Series

Before actually giving the rigorous formulation for the Fourier series, we note that in Eq. (3.7) the function $f(x)$ is expanded into a set of linearly independent functions $\{\varphi_n(x)\}$ and $\{\psi_n(x)\}$, see Sect. 1.1.2 for the definition of linear independence. It is easy to see that the cosine and sine functions $\{\varphi_n(x)\}$ and $\{\psi_n(x)\}$ of Eq. (3.3) are linearly independent. Indeed, let us construct a linear combination of all functions with unknown coefficients $\alpha_i$ and $\beta_i$ and set it to zero:

$$\sum_{i=0}^{\infty} \left[ \alpha_i \varphi_i(x) + \beta_i \psi_i(x) \right] = 0. \qquad (3.46)$$

Next we multiply both sides by $\varphi_j(x)$ with some fixed $j$ and integrate over $x$ between $-l$ and $l$:

$$\sum_{i=0}^{\infty} \left[ \alpha_i \int_{-l}^{l} \varphi_i(x)\varphi_j(x)\, dx + \beta_i \int_{-l}^{l} \psi_i(x)\varphi_j(x)\, dx \right] = 0.$$
Due to the orthogonality of the functions, see Eqs. (3.5) and (3.6), all the integrals between any $\psi_i(x)$ and $\varphi_j(x)$ will be equal to zero, while the integrals involving $\varphi_i(x)$ ($i = 0, 1, 2, \ldots$) and $\varphi_j(x)$ give the Kronecker symbol $\delta_{ij}$, i.e. only one term in the sum, with the value of $i = j$, will survive:

$$\sum_{i=0}^{\infty} \alpha_i \int_{-l}^{l} \varphi_i(x)\varphi_j(x)\, dx = \sum_{i=0}^{\infty} \alpha_i\, \delta_{ij}\, l = \alpha_j\, l = 0 \;\Longrightarrow\; \alpha_j = 0.$$

Similarly, multiplying (3.46) by $\psi_j(x)$ and integrating, one finds $\beta_j = 0$, which proves the linear independence. Consider now a finite trigonometric sum

$$f_N(x) = \frac{\alpha_0}{2} + \sum_{n=1}^{N} \left( \alpha_n \varphi_n(x) + \beta_n \psi_n(x) \right) \qquad (3.47)$$

of the same type as in the Fourier series (3.7), but with arbitrary coefficients $\alpha_n$ and $\beta_n$. In addition, the sum above is constructed out of only the first $N$ functions $\varphi_n(x)$ and $\psi_n(x)$ of the Fourier series.
Theorem 3.2. The expansion (3.47) converges on average to the function $f(x)$ for any $N$ if the coefficients $\alpha_n$ and $\beta_n$ of the linear combination coincide with the corresponding Fourier coefficients $a_n$ and $b_n$ defined by Eqs. (3.9) and (3.10), i.e. when $\alpha_n = a_n$ and $\beta_n = b_n$ for any $n = 0, 1, \ldots, N$. By "average" convergence we mean the minimum of the mean square error function

$$\delta_N = \frac{1}{l}\int_{-l}^{l} \left[ f(x) - f_N(x) \right]^2 dx. \qquad (3.48)$$
Proof. Expanding the square in the integrand of the error function (3.48), we get three terms:

$$\delta_N = \frac{1}{l}\int_{-l}^{l} f^2(x)\, dx - \frac{2}{l}\int_{-l}^{l} f(x) f_N(x)\, dx + \frac{1}{l}\int_{-l}^{l} f_N^2(x)\, dx. \qquad (3.49)$$
Substituting the expansion (3.47) into each of these terms, they can be calculated. We use the orthogonality of the functions $\{\varphi_n(x)\}$ and $\{\psi_n(x)\}$ to calculate first the last term in Eq. (3.49):

$$\frac{1}{l}\int_{-l}^{l} f_N^2(x)\, dx = \frac{\alpha_0^2}{2} + \frac{\alpha_0}{l}\sum_{n=1}^{N}\int_{-l}^{l} \left( \alpha_n\varphi_n(x) + \beta_n\psi_n(x) \right) dx + \frac{1}{l}\sum_{n=1}^{N}\sum_{m=1}^{N}\int_{-l}^{l} \left( \alpha_n\varphi_n(x) + \beta_n\psi_n(x) \right)\left( \alpha_m\varphi_m(x) + \beta_m\psi_m(x) \right) dx.$$

The second term in the right-hand side is zero, since the integrals of either $\varphi_n$ or $\psi_n$ are zero for any $n \ge 1$. In the third term, only integrals between two $\varphi_n$ or two $\psi_n$ functions with equal indices survive, and thus we obtain

$$\frac{1}{l}\int_{-l}^{l} f_N^2(x)\, dx = \frac{\alpha_0^2}{2} + \sum_{n=1}^{N} \left( \alpha_n^2 + \beta_n^2 \right), \qquad (3.50)$$
i.e. an identity similar to the Parseval's theorem, Sect. 3.4. The second integral in Eq. (3.49) can be treated along the same lines:

$$\frac{2}{l}\int_{-l}^{l} f(x) f_N(x)\, dx = \frac{\alpha_0}{l}\int_{-l}^{l} f(x)\, dx + \frac{2}{l}\sum_{n=1}^{N}\left[ \alpha_n\int_{-l}^{l} f(x)\varphi_n(x)\, dx + \beta_n\int_{-l}^{l} f(x)\psi_n(x)\, dx \right].$$

Using Eqs. (3.9) and (3.10) for the Fourier coefficients of $f(x)$, we can rewrite the above expression in a simplified form:

$$\frac{2}{l}\int_{-l}^{l} f(x) f_N(x)\, dx = \alpha_0 a_0 + 2\sum_{n=1}^{N}\left( \alpha_n a_n + \beta_n b_n \right). \qquad (3.51)$$
Collecting all the terms in Eq. (3.49), we obtain

$$\delta_N = \frac{1}{l}\int_{-l}^{l} f^2(x)\, dx + \frac{1}{2}\left( \alpha_0 - a_0 \right)^2 - \frac{a_0^2}{2} + \sum_{n=1}^{N}\left[ \left( \alpha_n - a_n \right)^2 - a_n^2 + \left( \beta_n - b_n \right)^2 - b_n^2 \right]. \qquad (3.52)$$
It is seen that the minimum of $\delta_N$ with respect to the coefficients $\alpha_n$ and $\beta_n$ of the trial expansion (3.47) is achieved at $\alpha_n = a_n$ and $\beta_n = b_n$, i.e. when the expansion (3.47) coincides with the partial Fourier series containing the first $N$ terms. Q.E.D.
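Theorem 3.2 is easy to illustrate numerically: any trial coefficients that differ from the Fourier ones increase the mean square error (3.48). A minimal Python sketch (the test function $f(x) = x$ on $(-\pi, \pi)$ and the perturbation are arbitrary choices):

```python
import numpy as np

l, N = np.pi, 5
x = np.linspace(-l, l, 20001)
f = x                                      # test function f(x) = x on (-l, l)

def mean_sq_error(alpha, beta):
    """delta_N of Eq. (3.48) for the trial sum (3.47), by numerical quadrature."""
    fN = alpha[0] / 2 + sum(alpha[n] * np.cos(n * np.pi * x / l) +
                            beta[n] * np.sin(n * np.pi * x / l)
                            for n in range(1, N + 1))
    return np.trapz((f - fN) ** 2, x) / l

# Fourier coefficients of f(x) = x: a_n = 0 and b_n = 2(-1)^{n+1}/n (for l = pi)
a = np.zeros(N + 1)
b = np.array([0.0] + [2 * (-1) ** (n + 1) / n for n in range(1, N + 1)])

print("delta_N at the Fourier coefficients:", mean_sq_error(a, b))
print("delta_N with b_1 perturbed by 0.3  :", mean_sq_error(a, b + np.eye(N + 1)[1] * 0.3))
```

In accordance with (3.52), the perturbed error exceeds the minimum by exactly the square of the perturbation, $0.3^2$.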
Theorem 3.3. The Fourier coefficients $a_n$ and $b_n$ defined by Eqs. (3.9) and (3.10) tend to zero as $n \to \infty$, i.e.

$$\lim_{n\to\infty}\int_{-l}^{l} f(x)\sin\frac{n\pi x}{l}\, dx = 0 \quad\text{and}\quad \lim_{n\to\infty}\int_{-l}^{l} f(x)\cos\frac{n\pi x}{l}\, dx = 0, \qquad (3.53)$$

provided that $f(x)$ and $f^2(x)$ are integrable on the interval $-l \le x \le l$.
Proof. The conditions imposed on the function $f(x)$ in the formulation of the theorem are needed for all the integrals we wrote above to exist. Then, the minimum error $\delta_N$ is obtained from Eq. (3.52) by putting $\alpha_n = a_n$ and $\beta_n = b_n$:

$$\delta_N = \frac{1}{l}\int_{-l}^{l} f^2(x)\, dx - \frac{a_0^2}{2} - \sum_{n=1}^{N}\left( a_n^2 + b_n^2 \right). \qquad (3.54)$$
Note that the values of the coefficients $\alpha_n$ and $\beta_n$ do not depend on the value of $N$; for instance, if $N$ is increased by one, $N \to N + 1$, two new coefficients are added to the expansion (3.47), $\alpha_{N+1}$ and $\beta_{N+1}$; however, the values of the previous coefficients remain the same. At the same time, the error (3.54) becomes $\delta_{N+1} = \delta_N - a_{N+1}^2 - b_{N+1}^2$, i.e. it gets two extra negative terms and hence can only become smaller. As the number of terms $N$ in the expansion is increased, the error gets smaller and smaller. On the other hand, the error is always non-negative by construction, i.e. $\delta_N \ge 0$. Therefore, from Eq. (3.54), given the fact that $\delta_N \ge 0$, we conclude that
$$\frac{a_0^2}{2} + \sum_{n=1}^{N}\left( a_n^2 + b_n^2 \right) \le \frac{1}{l}\int_{-l}^{l} f^2(x)\, dx. \qquad (3.55)$$

As $N$ is increased, the sum in the left-hand side is getting larger, but will always remain smaller than the positive value of the integral in the right-hand side. This means that the infinite series $\sum_{n=1}^{\infty}\left( a_n^2 + b_n^2 \right)$ is absolutely convergent, and we can replace $N$ with $\infty$:

$$\frac{a_0^2}{2} + \sum_{n=1}^{\infty}\left( a_n^2 + b_n^2 \right) \le \frac{1}{l}\int_{-l}^{l} f^2(x)\, dx. \qquad (3.56)$$
Thus, the infinite series in the left-hand side above is bounded from above. Since the series converges, its terms $a_n^2 + b_n^2$ must tend to zero as $n \to \infty$ (Theorem I.7.4), i.e. each of the coefficients $a_n$ and $b_n$ must separately tend to zero as $n \to \infty$. Q.E.D.
As a simple corollary to this theorem, we notice that the integration limits in Eq. (3.53) can be arbitrary; in particular, they could cover only a part of the interval $-l \le x \le l$. Indeed, if the interval $a \le x \le b$ is given, which lies inside the interval $-l \le x \le l$, one can always define a new function which is equal to the original function $f(x)$ inside $a \le x \le b$ and is zero in the remaining part of the periodicity interval $-l \le x \le l$. The new function would still be integrable, and hence the above theorem would hold for it as well. This proves the statement made, which we shall use below.
It can then be shown that the error $\delta_N \to 0$ as $N \to \infty$. This means that we actually have the equal sign in the above Eq. (3.56):

$$\frac{a_0^2}{2} + \sum_{n=1}^{\infty}\left( a_n^2 + b_n^2 \right) = \frac{1}{l}\int_{-l}^{l} f^2(x)\, dx. \qquad (3.57)$$

This is the familiar Parseval's equality, Eq. (3.27). To prove this, however, we have to perform a very careful analysis of the partial sum of the Fourier series, which is the subject of the next subsection.
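A quick numerical sanity check of (3.57) (an illustration; the test function $f(x) = x$ on $(-\pi, \pi)$, for which $a_n = 0$ and $b_n = 2(-1)^{n+1}/n$, is an arbitrary choice):

```python
import numpy as np

# Parseval's equality (3.57) for f(x) = x on (-pi, pi)
n = np.arange(1, 200001)
lhs = np.sum((2.0 / n) ** 2)              # sum of b_n^2
rhs = 2 * np.pi ** 2 / 3                  # (1/l) * integral of x^2 over (-pi, pi)
print(lhs, rhs)                           # both approach 2*pi^2/3
```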
Consider the partial sum of the Fourier series,

$$S_N(x) = \frac{a_0}{2} + \sum_{n=1}^{N}\left( a_n\cos\frac{n\pi x}{l} + b_n\sin\frac{n\pi x}{l} \right), \qquad (3.58)$$
where the coefficients $a_n$ and $b_n$ are given precisely by Eqs. (3.9) and (3.10). Using these expressions in (3.58), we get

$$S_N(x) = \frac{1}{2l}\int_{-l}^{l} f(t)\, dt + \sum_{n=1}^{N}\left[ \frac{1}{l}\left( \int_{-l}^{l} f(t)\cos\frac{n\pi t}{l}\, dt \right)\cos\frac{n\pi x}{l} + \frac{1}{l}\left( \int_{-l}^{l} f(t)\sin\frac{n\pi t}{l}\, dt \right)\sin\frac{n\pi x}{l} \right]$$

$$= \frac{1}{l}\int_{-l}^{l} f(t)\left\{ \frac{1}{2} + \sum_{n=1}^{N}\left( \cos\frac{n\pi t}{l}\cos\frac{n\pi x}{l} + \sin\frac{n\pi t}{l}\sin\frac{n\pi x}{l} \right) \right\} dt = \frac{1}{l}\int_{-l}^{l} f(t)\left\{ \frac{1}{2} + \sum_{n=1}^{N}\cos\frac{n\pi(t - x)}{l} \right\} dt,$$
where we have used a well-known trigonometric identity for the expression in the curly brackets, as well as the fact that we are dealing with a sum of a finite number of terms, and hence the summation and integration signs are interchangeable. The sum of cosine functions we have already calculated before, Eq. (2.8), so that

$$\frac{1}{2} + \sum_{n=1}^{N}\cos\frac{n\pi(t - x)}{l} = \frac{\sin\left[ \left( N + \frac{1}{2} \right)\frac{\pi(t - x)}{l} \right]}{2\sin\frac{\pi(t - x)}{2l}} = \frac{\sin\left[ \frac{\pi}{2l}(t - x)(2N + 1) \right]}{2\sin\frac{\pi(t - x)}{2l}}.$$
Therefore,

$$S_N(x) = \frac{1}{l}\int_{-l}^{l} f(t)\, \frac{\sin\left[ \frac{\pi}{2l}(t - x)(2N + 1) \right]}{2\sin\frac{\pi(t - x)}{2l}}\, dt = \frac{1}{l}\int_{x-l}^{x+l} f(t)\, \frac{\sin\left[ \frac{\pi}{2l}(t - x)(2N + 1) \right]}{2\sin\frac{\pi(t - x)}{2l}}\, dt,$$

where in the last step we shifted the integration interval by $x$. We can do this, firstly, because the function $f(t)$ is periodic, and secondly, because the ratio of the two sine functions is also periodic with the same period of $2l$ (check it!). Next, we split the integration region into two intervals: from $x - l$ to $x$, and from $x$ to $x + l$, and shall make a different change of variable $t \to p$ in each integral as shown below:

$$\frac{1}{l}\int_{x-l}^{x} f(t)\, \frac{\sin\left[ \frac{\pi}{2l}(t - x)(2N + 1) \right]}{2\sin\frac{\pi(t - x)}{2l}}\, dt = \left| \begin{array}{l} t = x - 2p \\ dt = -2\, dp \end{array} \right| = \frac{1}{l}\int_0^{l/2} f(x - 2p)\, \frac{\sin\left[ \frac{\pi}{l}(2N + 1)p \right]}{\sin\frac{\pi p}{l}}\, dp,$$

$$\frac{1}{l}\int_{x}^{x+l} f(t)\, \frac{\sin\left[ \frac{\pi}{2l}(t - x)(2N + 1) \right]}{2\sin\frac{\pi(t - x)}{2l}}\, dt = \left| \begin{array}{l} t = x + 2p \\ dt = 2\, dp \end{array} \right| = \frac{1}{l}\int_0^{l/2} f(x + 2p)\, \frac{\sin\left[ \frac{\pi}{l}(2N + 1)p \right]}{\sin\frac{\pi p}{l}}\, dp.$$
Next, we note that if $f(x) = 1$, then the Fourier series consists of a single term which is $a_0/2 = 1$, i.e. $a_0 = 2$, since $a_n$ and $b_n$ for $n \ge 1$ are all equal to zero (due to orthogonality of the corresponding sine and cosine functions, Eq. (3.3), with the cosine function $\varphi_0(x) = 1$ corresponding to $n = 0$). Therefore, $S_N(x) = 1$ in this case for any $N$, and we can write

$$1 = \frac{2}{l}\int_0^{l/2} \frac{\sin\left[ \frac{\pi}{l}(2N + 1)p \right]}{\sin\frac{\pi p}{l}}\, dp.$$
Multiplying this identity by $\frac{1}{2}\left[ f(x - \delta) + f(x + \delta) \right]$ and subtracting the result from $S_N(x)$, written as the sum of the two integrals above, we obtain

$$S_N(x) - \frac{1}{2}\left[ f(x - \delta) + f(x + \delta) \right] = \frac{1}{l}\int_0^{l/2} \Psi_1(p)\,\sin\left[ \frac{\pi}{l}(2N + 1)p \right] dp + \frac{1}{l}\int_0^{l/2} \Psi_2(p)\,\sin\left[ \frac{\pi}{l}(2N + 1)p \right] dp, \qquad (3.61)$$

where

$$\Psi_1(p) = \frac{f(x - 2p) - f(x - \delta)}{\sin\frac{\pi p}{l}}, \qquad (3.62)$$

$$\Psi_2(p) = \frac{f(x + 2p) - f(x + \delta)}{\sin\frac{\pi p}{l}}. \qquad (3.63)$$
The integrals in the right-hand side of Eq. (3.61) are of the form we considered in Theorem 3.3 and the corollary to it. They are supposed to tend to zero as $N \to \infty$ (where $n = 2N + 1$) if the functions $\Psi_1(p)$ and $\Psi_2(p)$ are continuous with respect to their variable $p$ within $-l \le p \le l$, apart maybe from some finite number of points of discontinuity of the first kind. If this was true, this theorem would be applicable, and then

$$\lim_{N\to\infty}\left\{ S_N(x) - \frac{1}{2}\left[ f(x - \delta) + f(x + \delta) \right] \right\} = 0.$$

This would be the required result: if we assume that $\delta \to 0$, the Fourier series (which is the limit of $S_N(x)$ for $N \to \infty$) would be equal to the mean value of $f(x)$ at the point $x$ calculated using its left and right limits:

$$\lim_{N\to\infty} S_N(x) = \frac{1}{2}\left[ f(x - \delta) + f(x + \delta) \right]. \qquad (3.64)$$
Hence, what is needed is to analyse the two functions $\Psi_1(p)$ and $\Psi_2(p)$. We notice that both functions inherit the discontinuities of the first kind of $f(x)$ itself, i.e. they are indeed continuous everywhere apart from possible discontinuities of $f(x)$. The only special point is $p = +0$. Indeed, the $\sin(\pi p/l)$ in the denominator of both functions is zero at $p = 0$, and hence we have to consider this particular point as well. It is easy to see, however, that both functions have well-defined limits at this point. Indeed, assuming $f(x)$ is differentiable on the left of $x$ (i.e. at $x - 0$), we can apply the Lagrange formula of Sect. I.3.7, see Eq. (I.3.64), which we shall write here as follows (recall that $p$ changes between $0$ and $l/2$, i.e. it is always positive):

$$f(x - 2p) - f(x - 0) = -2p\, f'(x - 2\theta p), \qquad 0 < \theta < 1,$$

and hence

$$\lim_{p\to+0}\Psi_1(p) = \lim_{p\to+0}\frac{f(x - 2p) - f(x - 0)}{-2p}\;\lim_{p\to+0}\frac{-2p}{\sin\frac{\pi p}{l}} = -f'(x - 0)\,\frac{2l}{\pi},$$

since the limit of $z/\sin z$ when $z \to 0$ is equal to one. Hence, the limit of $\Psi_1(p)$ at $p \to +0$ is finite and is related to the left derivative of $f(x)$. Similarly,

$$f(x + 2p) - f(x + 0) = 2p\, f'(x + 2\theta p), \qquad 0 < \theta < 1,$$

and hence

$$\lim_{p\to+0}\Psi_2(p) = \lim_{p\to+0}\frac{f(x + 2p) - f(x + 0)}{2p}\;\lim_{p\to+0}\frac{2p}{\sin\frac{\pi p}{l}} = f'(x + 0)\,\lim_{p\to+0}\frac{2p}{\sin\frac{\pi p}{l}} = f'(x + 0)\,\frac{2l}{\pi};$$

it is also finite and is related to the right derivative of $f(x)$ (the left and right derivatives at $x$ do not need to be the same). This concludes our proof: the functions $\Psi_1(p)$ and $\Psi_2(p)$ satisfy the conditions of Theorem 3.3, and hence $S_N(x)$ indeed tends to the mean value of $f(x)$ at point $x$, Eq. (3.64).
In the above proof we have made an assumption concerning the function $f(x)$, namely that it has well-defined left and right derivatives (not necessarily the same) at any point $x$ between $-l$ and $l$. In fact, this assumption is not necessary and can be lifted, leading to the general formulation of the Dirichlet Theorem 3.1; however, this more general proof is much more involved and will not be given here.
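The convergence of $S_N(x)$ to the mean value at a jump, Eq. (3.64), can be observed directly. A short Python sketch for the square wave $f(x) = \mathrm{sgn}\, x$ on $(-\pi, \pi)$, whose Fourier series is $(4/\pi)\sum_{k\ \text{odd}}\sin(kx)/k$ (an illustration only):

```python
import numpy as np

def S_N(x, N):
    """Partial sum of the Fourier series of sgn(x) on (-pi, pi)."""
    k = np.arange(1, N + 1, 2)            # odd harmonics only
    return (4 / np.pi) * np.sum(np.sin(k * x) / k)

for N in (10, 100, 1000):
    print(N, S_N(0.0, N), S_N(0.5, N))    # at the jump x=0 -> mean value 0; inside -> 1
```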
Sine, cosine and complex exponential functions are just a few examples of orthogonal sets of functions which can be used to expand "good" functions in the corresponding functional series. There are many more such examples. Here we shall discuss this question in some detail; a rigorous discussion of this topic goes far beyond this book.
Let us assume that some continuous and generally complex functions $\varphi_1(x)$, $\varphi_2(x)$, $\varphi_3(x)$, etc., form a set $\{\varphi_i(x)\}$. We shall call this set of functions orthonormal on the interval $a \le x \le b$ with weight $w(x) \ge 0$ if for any $i$ and $j$ we have (cf. Sect. 1.1.2):

$$\int_a^b w(x)\,\varphi_i^*(x)\,\varphi_j(x)\, dx = \delta_{ij}, \qquad (3.65)$$
compare Eqs. (3.6) and (3.31). Next, we consider a generally complex function $f(x)$ with a finite number of discontinuities of the first kind, but continuous everywhere between any two points of discontinuity. Formally, let us assume, exactly in the same way as when we investigated the trigonometric Fourier series in Sect. 3.1, that $f(x)$ can be expanded into a functional series in terms of the functions of the set $\{\varphi_i(x)\}$, i.e.

$$f(x) = \sum_i c_i\, \varphi_i(x), \qquad (3.66)$$

where $c_i$ are expansion coefficients. To find them, we assume that the series above can be integrated term-by-term. Then we multiply both sides of the above equation by $w(x)\varphi_j^*(x)$ with some fixed index $j$ and integrate between $a$ and $b$:

$$\int_a^b w(x) f(x)\,\varphi_j^*(x)\, dx = \sum_i c_i \int_a^b w(x)\,\varphi_j^*(x)\,\varphi_i(x)\, dx \;\Longrightarrow\; \int_a^b w(x) f(x)\,\varphi_j^*(x)\, dx = \sum_i c_i\, \delta_{ij} = c_j, \qquad (3.67)$$
where we have made use of the orthonormality condition (3.65). Note that the coefficients $c_i$ may be complex. Therefore, if the expansion (3.66) exists, the expansion coefficients $c_j$ are to be determined from (3.67). The expansion (3.66) is called the generalised Fourier expansion and the coefficients (3.67) the generalised Fourier coefficients.
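As an illustration of Eqs. (3.65)-(3.67) (my example, with details assumed rather than taken from the text): the Chebyshev polynomials $T_i(x)$ are orthogonal on $-1 \le x \le 1$ with the weight $w(x) = 1/\sqrt{1 - x^2}$, so that $\varphi_0 = T_0/\sqrt{\pi}$ and $\varphi_i = \sqrt{2/\pi}\, T_i$ for $i \ge 1$ form an orthonormal set. The sketch below computes the generalised Fourier coefficients of $f(x) = e^x$ by Gauss-Chebyshev quadrature and reconstructs the function:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def c(i, f, M=2000):
    """Generalised Fourier coefficient (3.67) for the orthonormal Chebyshev set."""
    # Gauss-Chebyshev quadrature: int w(x) g(x) dx ~ (pi/M) * sum g(x_k)
    xk = np.cos(np.pi * (np.arange(M) + 0.5) / M)
    norm = np.sqrt(np.pi) if i == 0 else np.sqrt(np.pi / 2)
    return (np.pi / M) * np.sum(f(xk) * C.Chebyshev.basis(i)(xk)) / norm

f = np.exp
coeffs = [c(i, f) for i in range(8)]
x = 0.3
approx = sum(ci * C.Chebyshev.basis(i)(x) /
             (np.sqrt(np.pi) if i == 0 else np.sqrt(np.pi / 2))
             for i, ci in enumerate(coeffs))
print(approx, np.exp(x))                   # the truncated series reproduces f(x)
```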
Problem 3.27. Assuming the existence of the expansion (3.66) and the legitimacy of term-by-term integration, prove the Parseval's theorem for the generalised Fourier expansion:

$$\int_a^b w(x)\left| f(x) \right|^2 dx = \sum_i \left| c_i \right|^2. \qquad (3.68)$$
Problem 3.28. Similarly, consider two functions, $f(x)$ and $g(x)$, both expanded into the functional series in terms of the same set $\{\varphi_i(x)\}$, with coefficients $f_i$ and $g_i$, respectively. Then, prove the Plancherel's theorem for the generalised Fourier expansion:

$$\int_a^b w(x)\, f(x)\, g^*(x)\, dx = \sum_i f_i\, g_i^*. \qquad (3.69)$$
Consider now a partial sum with arbitrary coefficients $f_i$,

$$S_N(x) = \sum_{i=1}^{N} f_i\, \varphi_i(x),$$

and the corresponding mean square error

$$\delta_N = \int_a^b w(x)\left| f(x) - S_N(x) \right|^2 dx. \qquad (3.70)$$

Problem 3.30. Substitute the above expansion for the partial sum into the error function (3.70) and, using explicit expressions for the generalised Fourier coefficients (3.67), show that the error function

$$\delta_N = \int_a^b w(x)\left| f(x) \right|^2 dx - \sum_{i=1}^{N}\left| c_i \right|^2 + \sum_{i=1}^{N}\left| c_i - f_i \right|^2. \qquad (3.71)$$
It is seen from this result, Eq. (3.71), that the error $\delta_N$ is minimised if the coefficients $f_i$ and $c_i$ coincide: $f_i = c_i$. In other words, if the generalised Fourier expansion exists, its coefficients must be the generalised Fourier coefficients (3.67). Then the error

$$\delta_N = \int_a^b w(x)\left| f(x) \right|^2 dx - \sum_{i=1}^{N}\left| c_i \right|^2 \ge 0 \;\Longrightarrow\; \int_a^b w(x)\left| f(x) \right|^2 dx \ge \sum_{i=1}^{N}\left| c_i \right|^2. \qquad (3.72)$$
We also notice that the $c_i$ coefficients do not depend on $N$, and if $N$ is increased, the error $\delta_N$ is necessarily reduced, remaining non-negative. If the functional series $\sum_i c_i\varphi_i(x)$ were equal to $f(x)$ everywhere, except maybe at a finite number of points with discontinuities of the first kind, then, according to the Parseval's theorem (3.68), we would have the equal sign in (3.72), i.e. the error $\delta_N$ in the limit of $N \to \infty$ would tend to zero. Therefore, there is a fundamental connection between Eq. (3.68) and the fact that the generalised Fourier expansion is equal to the original function $f(x)$ at all points apart from a finite number of them where $f(x)$ is discontinuous. Equation (3.68) is called the completeness condition, because it is closely related to the question of whether an arbitrary function can be expanded into the set $\{\varphi_i(x)\}$ or not.
Problem 3.31. Prove that if $f(x)$ is orthogonal to all functions of the complete system $\{\varphi_i(x)\}$, then $f(x) = 0$. [Hint: use the completeness condition (3.68) as well as the expressions for the $c_i$ coefficients.]
Suppose now that the set $\{\varphi_i(x)\}$ is complete, $f(x)$ is continuous, and its generalised Fourier series converges uniformly; then the series converges exactly to $f(x)$.

Proof. Let us assume that the series converges to another function $g(x)$, i.e.

$$g(x) = \sum_i c_i\,\varphi_i(x) \quad\text{with}\quad c_i = \int_a^b w(x)\, f(x)\,\varphi_i^*(x)\, dx.$$

Since the series converges uniformly, $g(x)$ is continuous and we can integrate it term-by-term. Hence, we multiply both sides of the expression for the series by $w(x)\varphi_j^*(x)$ with some fixed $j$ and integrate between $a$ and $b$:

$$\int_a^b w(x)\, g(x)\,\varphi_j^*(x)\, dx = \sum_i c_i \int_a^b w(x)\,\varphi_j^*(x)\,\varphi_i(x)\, dx \;\Longrightarrow\; c_j = \int_a^b w(x)\, g(x)\,\varphi_j^*(x)\, dx,$$

i.e. the coefficients $c_j$ have the same expressions both in terms of $f(x)$ and $g(x)$, i.e.

$$\int_a^b w(x)\left[ f(x) - g(x) \right]\varphi_j^*(x)\, dx = 0$$

for any $j$. The expression above states that the continuous function $h(x) = f(x) - g(x)$ is orthogonal to any of the functions $\varphi_j(x)$ of the set. But if the set is complete, then $h(x)$ can only be zero, i.e. $g(x) = f(x)$. Q.E.D.
Using the notion of the Dirac delta function (Sect. 4.1), the completeness condition can be formulated directly in terms of the functions $\{\varphi_i(x)\}$ of the set and the weight. Indeed, using (3.67), we write

$$f(x) = \sum_i c_i\,\varphi_i(x) = \sum_i \left[ \int_a^b w(x')\, f(x')\,\varphi_i^*(x')\, dx' \right]\varphi_i(x) = \int_a^b f(x')\left[ w(x')\sum_i \varphi_i^*(x')\,\varphi_i(x) \right] dx'.$$

It is now clearly seen that, according to the filtering theorem for the delta function, Eq. (4.9), the expression in the square brackets above should be equal to $\delta(x - x')$, i.e.

$$w(x')\sum_i \varphi_i^*(x')\,\varphi_i(x) = \delta(x - x').$$
3.8 Applications of Fourier Series in Physics

The method of Fourier series is widely used in the sciences, since it allows one to represent a function (e.g. an electric signal or a field), which may have a complicated form, as a linear combination of simple complex exponentials or sine and cosine functions which are much easier to deal with; all the information about the function is then "stored" in the expansion coefficients. For the same reason, the Fourier series method is also used when solving differential equations, especially partial differential equations, since the problem with respect to all or some of the variables becomes an algebraic one, related to finding the Fourier coefficients in the expansion of the unknown functions. Here we shall consider several examples of using Fourier series in condensed matter physics. We shall also introduce notations often used in physics which are slightly different from those used above.
Consider a periodic solid (a crystal) and a function $f(x, y, z) = f(\mathbf{r})$ which describes one of its properties, e.g. the charge density or the electrostatic potential at point $\mathbf{r} = (x, y, z)$. This function is periodic in several directions, and hence can be expanded into a multiple Fourier series. Here we shall consider how this can be done in some detail.
Let us start from a one-dimensional crystal: consider a one-dimensional periodic chain of atoms arranged along the $x$ direction, similar to the one shown in Fig. 3.4. Let $\rho(x)$ be the electron density of the system corresponding to a distribution of electrons on atoms of the chain. The density must repeat the periodicity of the chain itself, i.e. $\rho(x + a) = \rho(x)$ for any point $x$ along the axis, where $a$ ($= 2l$ in our previous notations) is the periodicity, i.e. the smallest distance between two equivalent atoms. The density is a smooth function of $x$, continuous everywhere, and hence $\rho(x)$ can be expanded into the Fourier series:
Fig. 3.4 A one-dimensional infinite periodic crystal of alternating point charges $\pm Q$. Each unit cell (indicated) contains two oppositely charged atoms with charges $+Q$ and $-Q$. The total charge of the unit cell is zero, and there is an infinite number of such cells running along the positive and negative directions of the $x$ axis.
Fig. 3.5 Two-dimensional lattice of atoms of two species. The system is periodic across the lattice with the lattice vectors being $\mathbf{a}_1$ and $\mathbf{a}_2$. The two non-equivalent atoms are indicated by a pink dashed line, while the irreducible region in the 2D space associated with the unit cell is shown by a dashed red line.
$$\rho(x) = \sum_{n=-\infty}^{\infty} \rho_n\, e^{i2\pi nx/a} = \sum_{n=-\infty}^{\infty} \rho_n\, e^{ig_n x} = \sum_g \rho_g\, e^{igx},$$

where $g_n = 2\pi n/a$ and

$$\rho_n = \frac{1}{a}\int_0^a \rho(x)\, e^{-ig_n x}\, dx$$

are the corresponding Fourier coefficients. In the last passage of the expansion formula for $\rho(x)$ we have used simplified notations in which the index $n$ is dropped; these are also frequently used. The quantity $b = 2\pi/a$ is called a reciprocal lattice vector (of the one-dimensional lattice), as it corresponds to the periodicity of this reciprocal lattice: $g_n = bn$. We can see that the Fourier series in this case can be mapped onto the lattice sites of the (imaginary) reciprocal lattice with the periodicity $b$, and the single summation over $n$ takes all possible lattice sites $g$ of this lattice into account.
This consideration can be generalised to two- and three-dimensional lattices. Consider first a two-dimensional lattice shown in Fig. 3.5. We introduce two vectors on the plane of the lattice: $\mathbf{a}_1$ and $\mathbf{a}_2$. Then any lattice site can be related to a reference (zero) site via the lattice vector $\mathbf{a} = n_1\mathbf{a}_1 + n_2\mathbf{a}_2$, where $n_1$ and $n_2$ are two integers. By taking all possible negative and positive values of $n_1$ and $n_2$, including zero (corresponding to the zero lattice site), the whole infinite two-dimensional lattice can be reproduced. The density in this case $\rho(x, y) = \rho(\mathbf{r})$ is a function of the two-dimensional vector $\mathbf{r} = (x, y)$ or, alternatively, of the two coordinates $x$ and $y$. However, in general, the periodicity of the system does not necessarily follow the Cartesian directions, i.e. the lattice vectors $\mathbf{a}_1$ and $\mathbf{a}_2$ may be directed differently from the $x$ and $y$ axes. In this case it is more convenient to consider the density as a function of the so-called fractional coordinates $r_1$ and $r_2$ instead. These appear if we write $\mathbf{r}$ in terms of the lattice vectors, $\mathbf{r} = r_1\mathbf{a}_1 + r_2\mathbf{a}_2$, with $r_1$
and $r_2$ being real numbers between $-\infty$ and $+\infty$. Then the density $\rho(\mathbf{r})$ becomes a function $\rho_1(r_1, r_2)$ of the fractional coordinates $r_1$ and $r_2$. The convenience of this representation is in that the density is periodic with respect to both $r_1$ and $r_2$ with the period equal to one, i.e.

$$\rho_1(r_1 + 1, r_2) = \rho_1(r_1, r_2 + 1) = \rho_1(r_1, r_2),$$

since adding unity to either $r_1$ or $r_2$ changes the vector $\mathbf{r}$ in $\rho(\mathbf{r})$ exactly by the lattice vector $\mathbf{a}_1$ or $\mathbf{a}_2$, i.e. in either case an equivalent point in space is obtained. Hence, $\rho_1(r_1, r_2)$ is periodic and can be expanded into the Fourier series. We first consider $\rho_1$ as a function of $r_1$ and expand it as above with respect to this variable only:

$$\rho_1(r_1, r_2) = \sum_{n_1=-\infty}^{\infty} \rho_{n_1}(r_2)\, e^{ik_1 r_1}, \qquad k_1 = 2\pi n_1,$$

with

$$\rho_{n_1}(r_2) = \int_{-1/2}^{1/2} \rho_1(r_1, r_2)\, e^{-ik_1 r_1}\, dr_1 = \int_0^1 \rho_1(r_1, r_2)\, e^{-ik_1 r_1}\, dr_1.$$
The coefficients $\rho_{n_1}(r_2)$ are in turn periodic functions of $r_2$ and can be expanded as well, so that

$$\rho_1(r_1, r_2) = \sum_{n_1=-\infty}^{\infty}\left[ \sum_{n_2=-\infty}^{\infty} \rho_{n_1 n_2}\, e^{ik_2 r_2} \right] e^{ik_1 r_1} = \sum_{n_1=-\infty}^{\infty}\sum_{n_2=-\infty}^{\infty} \rho_{n_1 n_2}\, e^{i(k_1 r_1 + k_2 r_2)}, \qquad (3.74)$$

where $k_2 = 2\pi n_2$ and

$$\rho_{n_1 n_2} = \int_0^1 \rho_{n_1}(r_2)\, e^{-ik_2 r_2}\, dr_2 = \int_0^1 dr_1 \int_0^1 dr_2\, \rho_1(r_1, r_2)\, e^{-i(k_1 r_1 + k_2 r_2)}. \qquad (3.75)$$
The above equations can also be written in a more concise form. To this end, as is customarily done in crystallography and solid state physics, we introduce the reciprocal lattice vectors $\mathbf{b}_1$ and $\mathbf{b}_2$ via $\mathbf{a}_i \cdot \mathbf{b}_j = 2\pi\delta_{ij}$, where $i, j = 1, 2$, and $\delta_{ij}$ is the Kronecker delta symbol.
Problem 3.32. Show that the two-dimensional reciprocal lattice vectors are

$$\mathbf{b}_1 = \frac{2\pi}{v_s^2}\left[ a_2^2\,\mathbf{a}_1 - \left( \mathbf{a}_1\cdot\mathbf{a}_2 \right)\mathbf{a}_2 \right], \qquad \mathbf{b}_2 = \frac{2\pi}{v_s^2}\left[ -\left( \mathbf{a}_1\cdot\mathbf{a}_2 \right)\mathbf{a}_1 + a_1^2\,\mathbf{a}_2 \right],$$

where $v_s = \left| a_{1x}a_{2y} - a_{2x}a_{1y} \right|$ is the area of the unit cell. [Hint: expand $\mathbf{b}_1$ with respect to $\mathbf{a}_1$ and $\mathbf{a}_2$ and find the expansion coefficients using its orthogonality with the direct lattice vectors $\mathbf{a}_1$ and $\mathbf{a}_2$. Repeat for $\mathbf{b}_2$.]
Then, introducing the two-dimensional reciprocal lattice vectors $\mathbf{g} = m_1\mathbf{b}_1 + m_2\mathbf{b}_2$ (with integer $m_1$ and $m_2$), the double Fourier series (3.74) takes the compact form

$$\rho(\mathbf{r}) = \sum_{\mathbf{g}} \rho_{\mathbf{g}}\, e^{i\mathbf{g}\cdot\mathbf{r}}, \quad\text{where}\quad \rho_{\mathbf{g}} = \frac{1}{v_s}\iint_{\text{cell}} \rho(\mathbf{r})\, e^{-i\mathbf{g}\cdot\mathbf{r}}\, d\mathbf{r}, \qquad (3.76)$$
where we sum over all possible reciprocal lattice vectors (it is in fact a double sum over all possible integer values of $n_1$ and $n_2$). Notice that the integration in the definition of the Fourier coefficients $\rho_{\mathbf{g}}$ is performed over the whole area of the unit cell shown in Fig. 3.5, not over a square of unit side as before in (3.75), and the factor of $1/v_s$ has appeared. This is the result of the change of variables in the double integral, which was initially taken over $r_1$ and $r_2$. Indeed, $\mathbf{r} = r_1\mathbf{a}_1 + r_2\mathbf{a}_2$, and hence $x = r_1 a_{1x} + r_2 a_{2x}$ and $y = r_1 a_{1y} + r_2 a_{2y}$. Therefore, $d\mathbf{r} = dx\, dy = |J|\, dr_1\, dr_2$, where the Jacobian

$$J = \frac{\partial(x, y)}{\partial(r_1, r_2)} = \begin{vmatrix} \partial x/\partial r_1 & \partial y/\partial r_1 \\ \partial x/\partial r_2 & \partial y/\partial r_2 \end{vmatrix} = \begin{vmatrix} a_{1x} & a_{1y} \\ a_{2x} & a_{2y} \end{vmatrix} = a_{1x}a_{2y} - a_{1y}a_{2x} \;\Longrightarrow\; |J| = v_s.$$
Problem 3.33. Similarly, show that in the three-dimensional case the reciprocal lattice vectors, defined via $\mathbf{a}_i \cdot \mathbf{b}_j = 2\pi\delta_{ij}$ ($i, j = 1, 2, 3$), are

$$\mathbf{b}_1 = \frac{2\pi}{v_c}\left[ \mathbf{a}_2 \times \mathbf{a}_3 \right], \qquad \mathbf{b}_2 = \frac{2\pi}{v_c}\left[ \mathbf{a}_3 \times \mathbf{a}_1 \right], \qquad \mathbf{b}_3 = \frac{2\pi}{v_c}\left[ \mathbf{a}_1 \times \mathbf{a}_2 \right], \qquad (3.77)$$

where $v_c = \left| \left[ \mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3 \right] \right|$ is the unit cell volume (see also Sect. I.1.7.1).
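Equation (3.77) and the defining property $\mathbf{a}_i\cdot\mathbf{b}_j = 2\pi\delta_{ij}$ are easily verified in code; the face-centred cubic lattice below is only an example:

```python
import numpy as np

def reciprocal_vectors(a1, a2, a3):
    """Reciprocal lattice vectors of Eq. (3.77)."""
    vc = np.dot(a1, np.cross(a2, a3))      # (signed) unit cell volume
    b1 = 2 * np.pi * np.cross(a2, a3) / vc
    b2 = 2 * np.pi * np.cross(a3, a1) / vc
    b3 = 2 * np.pi * np.cross(a1, a2) / vc
    return b1, b2, b3

# example: an FCC direct lattice with cubic constant a = 1 (an arbitrary choice)
a1, a2, a3 = np.array([[0, .5, .5], [.5, 0, .5], [.5, .5, 0]])
B = reciprocal_vectors(a1, a2, a3)
print(np.array([[np.dot(ai, bj) for bj in B]
                for ai in (a1, a2, a3)]) / (2 * np.pi))   # the identity matrix
```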
Problem 3.34. Correspondingly, show that a function $f(\mathbf{r})$ which is periodic in the direct space with respect to any direct lattice vector $\mathbf{L} = n_1\mathbf{a}_1 + n_2\mathbf{a}_2 + n_3\mathbf{a}_3$ (where $\{n_i\} = (n_1, n_2, n_3)$ are all possible negative and positive integers, including zero, and $\{\mathbf{a}_i\}$ are the unit base vectors) can be expanded in the triple Fourier series

$$f(\mathbf{r}) = \sum_{\mathbf{g}} f_{\mathbf{g}}\, e^{i\mathbf{g}\cdot\mathbf{r}}, \quad\text{where}\quad f_{\mathbf{g}} = \frac{1}{v_c}\iiint_{\text{cell}} f(\mathbf{r})\, e^{-i\mathbf{g}\cdot\mathbf{r}}\, d\mathbf{r}, \qquad (3.78)$$

where the integration is performed over the volume of the unit cell (i.e. a parallelepiped formed by the three basic direct lattice vectors) and the summation is performed over all possible reciprocal lattice vectors $\mathbf{g} = m_1\mathbf{b}_1 + m_2\mathbf{b}_2 + m_3\mathbf{b}_3$ (i.e. all possible $m_1, m_2, m_3$).
Problem 3.35. Consider a big macroscopic volume of the solid containing a very large number $N$ of identical unit cells of volume $v_c$ each. Show then that the triple integral in Eq. (3.78) for $f_{\mathbf{g}}$ can in fact be extended to the whole volume $V = Nv_c$, i.e.

$$f(\mathbf{r}) = \sum_{\mathbf{g}} f_{\mathbf{g}}\, e^{i\mathbf{g}\cdot\mathbf{r}}, \quad\text{where}\quad f_{\mathbf{g}} = \frac{1}{V}\iiint_V f(\mathbf{r})\, e^{-i\mathbf{g}\cdot\mathbf{r}}\, d\mathbf{r}. \qquad (3.79)$$

This expression for the Fourier coefficients may sometimes be more convenient when considering the so-called thermodynamic limit $V \to \infty$, since the shape of the unit cell can be ignored.
Problem 3.36. The electrostatic potential $V(\mathbf{r})$ in a three-dimensional periodic crystal caused by its charge density satisfies the Poisson equation of classical electrostatics, $\Delta V(\mathbf{r}) = -4\pi\rho(\mathbf{r})$, where $\rho(\mathbf{r})$ is the total charge density in the crystal at point $\mathbf{r}$. Expanding both the potential and the density in the Fourier series, show that the solution of this equation is

$$V(\mathbf{r}) = \sum_{\mathbf{g}} \frac{4\pi\rho_{\mathbf{g}}}{g^2}\, e^{i\mathbf{g}\cdot\mathbf{r}}. \qquad (3.80)$$
Problem 3.37. Prove the theta-function transformation

$$\sum_{\mathbf{L}} e^{-|\mathbf{r}-\mathbf{L}|^2 t} = \frac{1}{v_c}\left( \frac{\pi}{t} \right)^{3/2}\sum_{\mathbf{g}} e^{-g^2/4t}\, e^{i\mathbf{g}\cdot\mathbf{r}}, \qquad (3.81)$$

where $t$ is a real positive number and $\mathbf{r}$ is a vector. This identity shows that the direct lattice sum in the left-hand side can be expanded in the Fourier series and can thus be equivalently rewritten as a reciprocal lattice sum. [Hint: first of all, prove that the function in the left-hand side is periodic in $\mathbf{r}$ with respect to an arbitrary lattice vector $\mathbf{L}'$, i.e. adding $\mathbf{L}'$ to $\mathbf{r}$ does not change the function; hence, the left-hand side can be expanded in the Fourier series in terms of $\mathbf{g}$ using Eq. (3.79). Then, when performing the triple integration over the whole space, show that each term in the sum makes an identical contribution giving out a factor of $N$; hence the sum over $\mathbf{L}$ can be removed, while $V$ can be replaced with $v_c$. Next, calculate the triple integral over the whole space (i.e. in the thermodynamic limit).]
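The one-dimensional analogue of (3.81), $\sum_n e^{-(r-na)^2 t} = \frac{1}{a}\sqrt{\pi/t}\,\sum_m e^{-g_m^2/4t}\, e^{ig_m r}$ with $g_m = 2\pi m/a$, can be checked numerically in a few lines (my illustration; the parameter values are arbitrary):

```python
import numpy as np

a, t, r = 1.7, 0.4, 0.3                    # lattice constant, parameter, point (arbitrary)

n = np.arange(-50, 51)
direct = np.sum(np.exp(-(r - n * a) ** 2 * t))

m = np.arange(-50, 51)
g = 2 * np.pi * m / a
recip = np.sqrt(np.pi / t) / a * np.sum(np.exp(-g ** 2 / (4 * t)) * np.exp(1j * g * r))

print(direct, recip.real)                  # the two sums agree
```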
The theta-function transformation allows one to obtain a very useful formula for the practical calculation of the electrostatic potential of a lattice of point charges (atoms), the so-called Ewald formula. Consider a three-dimensional periodic crystal with point charges $q_s$ in each unit cell, where the index $s$ counts the charges; in the zero cell (with the lattice vector $\mathbf{L} = 0$) the positions of the charges are given by the vectors $\mathbf{X}_s$. Each unit cell is considered charge-neutral, i.e. $\sum_s q_s = 0$. Then, the electrostatic potential at point $\mathbf{r}$ due to all charges of the whole crystal (we only consider $\mathbf{r}$ to be somewhere between the atoms) will be

$$V(\mathbf{r}) = \sum_{\mathbf{L}}\sum_s \frac{q_s}{\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|},$$

since the position of the charge $q_s$ in the unit cell associated with the lattice vector $\mathbf{L}$ is $\mathbf{L} + \mathbf{X}_s$.
This lattice sum converges extremely slowly; in fact, it can be shown that it converges only conditionally (see Sect. I.7.1.4). However, its convergence can be considerably improved using the following trick.⁴ Consider the so-called error function

$$\text{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_x^{\infty} e^{-t^2}\, dt. \qquad (3.82)$$

This function tends to zero exponentially fast as $x \to \infty$ (in fact, as $e^{-x^2}$). Conjugate to this one is another error function,

$$\text{erf}(x) = 1 - \text{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\, dt, \qquad (3.83)$$
which tends to unity as $x \to \infty$. Using these two functions, we can rewrite the potential:

$$V(\mathbf{r}) = \sum_{\mathbf{L}s} q_s\, \frac{\text{erfc}\!\left( \eta\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right| \right)}{\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|} + \sum_s q_s\left[ \sum_{\mathbf{L}} \frac{\text{erf}\!\left( \eta\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right| \right)}{\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|} \right],$$
where $\eta$ is some (so far arbitrary) positive constant (called the Ewald constant). The first sum over the direct lattice $\mathbf{L}$ converges very quickly because of the error function $\text{erfc}$, so only the unit cells around the point $\mathbf{r}$ contribute appreciably. The second sum, however, converges extremely slowly because of the other error function, $\text{erf}(x)$, which tends to unity as $x \to \infty$. This is the point where the theta-function transformation proves to be extremely useful. Indeed, the function in the square brackets can be manipulated into the following expression:
$$\sum_{\mathbf{L}} \frac{\text{erf}\!\left( \eta\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right| \right)}{\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|} = \frac{2}{\sqrt{\pi}}\sum_{\mathbf{L}} \frac{1}{\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|}\int_0^{\eta\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|} e^{-t^2}\, dt = \left| \begin{array}{l} \lambda = t/\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right| \\ d\lambda = dt/\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right| \end{array} \right| = \frac{2}{\sqrt{\pi}}\int_0^{\eta}\left( \sum_{\mathbf{L}} e^{-\lambda^2\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|^2} \right) d\lambda.$$
⁴ The Ewald method corresponds to a particular regularisation of the conditionally converging series. However, it can be shown (see, e.g., L. Kantorovich and I. Tupitsyn, J. Phys.: Condens. Matter 11, 6159 (1999)) that this calculation results in the correct expression for the electrostatic potential in the central part of a large finite sample if the dipole and quadrupole moments of the unit cell are equal to zero. Otherwise, an additional macroscopic contribution to the potential is present.
Now, the expression in the round brackets can easily be recognised to be the one for which the theta-function transformation (3.81) can be used:

$$\frac{2}{\sqrt{\pi}}\int_0^{\eta}\left( \sum_{\mathbf{L}} e^{-\lambda^2\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|^2} \right) d\lambda = \frac{2}{\sqrt{\pi}}\int_0^{\eta} \frac{1}{v_c}\left( \frac{\pi}{\lambda^2} \right)^{3/2}\sum_{\mathbf{g}} e^{-g^2/4\lambda^2}\, e^{i\mathbf{g}\cdot(\mathbf{r}-\mathbf{X}_s)}\, d\lambda.$$

The integration over $\lambda$ is trivially performed using the substitution $x = g^2/4\lambda^2$, and we obtain

$$\frac{2}{\sqrt{\pi}}\int_0^{\eta}\left( \sum_{\mathbf{L}} e^{-\lambda^2\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|^2} \right) d\lambda = \frac{4\pi}{v_c}\sum_{\mathbf{g}} \frac{1}{g^2}\, e^{-g^2/4\eta^2}\, e^{i\mathbf{g}\cdot(\mathbf{r}-\mathbf{X}_s)}.$$
Problem 3.39. Prove that for a two-dimensional solid the Ewald formula reads

$$V(\mathbf{r}) = \sum_{\mathbf{L}s} q_s\, \frac{\text{erfc}\!\left( \eta\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right| \right)}{\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|} + \frac{2\pi}{v_s}\sum_{\mathbf{g}\neq 0} \frac{1}{g}\,\text{erfc}\!\left( \frac{|\mathbf{g}|}{2\eta} \right)\left( \sum_s q_s\, e^{-i\mathbf{g}\cdot\mathbf{X}_s} \right) e^{i\mathbf{g}\cdot\mathbf{r}}, \qquad (3.86)$$

where $\mathbf{L}$ and $\mathbf{g}$ are the two-dimensional direct and reciprocal lattice vectors.
In solid state physics it is very often necessary to deal with functions $f(\mathbf{r})$ which are not periodic, e.g. an electron wave function in a crystal. Even in those cases it is still useful and convenient to employ the Fourier series formalism. This is normally done as follows. Following Born and von Kármán, we assume an artificial periodicity of the solid on a much larger scale. Namely, we assume that if we move $N_1 \gg 1$ times along the lattice vector $\mathbf{a}_1$, the solid repeats itself, and the same happens if we perform a translation $N_2 \gg 1$ and $N_3 \gg 1$ times along the other two lattice directions (vectors), i.e. we impose an artificial periodicity on the function $f(\mathbf{r})$ as follows:

$$f(\mathbf{r}) = f(\mathbf{r} + N_1\mathbf{a}_1) = f(\mathbf{r} + N_2\mathbf{a}_2) = f(\mathbf{r} + N_3\mathbf{a}_3). \qquad (3.87)$$
The function $f(\mathbf{r})$ is then periodic with respect to the enlarged lattice with the basic vectors $\mathbf{A}_i = N_i\mathbf{a}_i$, whose reciprocal lattice vectors are $\mathbf{B}_i = \mathbf{b}_i/N_i$; hence it can be expanded into a Fourier series over the vectors

$$\mathbf{G} = M_1\mathbf{B}_1 + M_2\mathbf{B}_2 + M_3\mathbf{B}_3 = \frac{M_1}{N_1}\mathbf{b}_1 + \frac{M_2}{N_2}\mathbf{b}_2 + \frac{M_3}{N_3}\mathbf{b}_3,$$

where the numbers $M_i$ take on all possible negative and positive integer values including zero. When each integer $M_i$ ($i = 1, 2, 3$) becomes equal to an integer multiple of the corresponding $N_i$, the vector $\mathbf{G}$ becomes equal to a reciprocal lattice vector $\mathbf{g}$ of the original reciprocal lattice corresponding to the small direct lattice with the basic vectors $\mathbf{a}_i$. For other values of the numbers $M_i$ we can always write $M_i = m_i N_i + \kappa_i$, where $m_i$ is an integer taking all possible values, but the integer $\kappa_i$ changes only between $0$ and $N_i - 1$. Then,
$$\mathbf{G} = \left( m_1\mathbf{b}_1 + m_2\mathbf{b}_2 + m_3\mathbf{b}_3 \right) + \frac{\kappa_1}{N_1}\mathbf{b}_1 + \frac{\kappa_2}{N_2}\mathbf{b}_2 + \frac{\kappa_3}{N_3}\mathbf{b}_3 = \mathbf{g} + \mathbf{k},$$

where the vector $\mathbf{k}$ takes $N_1 N_2 N_3$ values within a parallelepiped with the sides made by the three basic reciprocal lattice vectors $\mathbf{b}_i$ corresponding to the original reciprocal lattice. This parallelepiped, called the (first) Brillouin zone (BZ), is divided by a grid of $N_1 N_2 N_3$ small cells with the sides given by $\mathbf{b}_i/N_i$, and all values of $\mathbf{k}$ correspond to these small cells. Therefore, instead of (3.78) we can write in this case:

$$f(\mathbf{r}) = \sum_{\mathbf{g}}\sum_{\mathbf{k}} f_{\mathbf{g}+\mathbf{k}}\, e^{i(\mathbf{g}+\mathbf{k})\cdot\mathbf{r}} = \sum_{\mathbf{K}} f_{\mathbf{K}}\, e^{i\mathbf{K}\cdot\mathbf{r}},$$

where we sum over all reciprocal lattice vectors of the original lattice and all points $\mathbf{k}$ from the first BZ, and where $\mathbf{K} = \mathbf{g} + \mathbf{k}$.
The complex exponential functions $\exp(i\mathbf{K}\cdot\mathbf{r})$ with $\mathbf{K} = \mathbf{g} + \mathbf{k}$ are called plane waves; they form the basis of most modern electronic structure calculation methods. It is essential that this basis is complete (see Sect. 3.7.3), i.e. any function $f(\mathbf{r})$ can be expanded in terms of them. In practice the expansion is terminated, and there is a very simple algorithm for doing this. Indeed, plane waves with large reciprocal lattice vectors $\mathbf{K}$ oscillate rapidly in the direct space and hence need to be kept in the expansion only if the function $f(\mathbf{r})$ changes rapidly on a small length scale (e.g. wave functions of the electrons oscillate strongly close to atomic nuclei in atoms, molecules and crystals). If, however, $f(\mathbf{r})$ is smooth everywhere in space, then large reciprocal space vectors $\mathbf{K}$ are not needed, i.e. one can simply include in the expansion all vectors whose lengths are smaller than a certain cut-off $K_{\max}$, i.e. $|\mathbf{K}| \le K_{\max}$. Special tricks are used to achieve the required smooth behaviour of the valence electron wave functions near and far from the atomic cores by employing special effective core potentials called pseudopotentials.
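The size of such a plane-wave basis is easy to estimate by direct counting; in the sketch below (a simple cubic lattice and the cut-off value are arbitrary choices) the number of vectors with $|\mathbf{G}| \le K_{\max}$ follows the expected $(4\pi/3)(K_{\max}/b)^3$ scaling:

```python
import numpy as np
from itertools import product

a = 1.0                                    # simple cubic lattice constant (arbitrary)
b = 2 * np.pi / a                          # reciprocal lattice constant
K_max = 10 * b                             # plane-wave cut-off (arbitrary)

m_lim = int(K_max / b) + 1
count = sum(1 for m in product(range(-m_lim, m_lim + 1), repeat=3)
            if b * np.linalg.norm(m) <= K_max)
print("plane waves below the cut-off:", count)   # ~ (4/3) pi (K_max/b)^3 ~ 4189
```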
Here we have also introduced a friction force with $\gamma$ being the friction constant, since there must always be some friction in the system. If the cantilever is excited with a signal of frequency $\omega$, then after some transient time a steady-state solution will be established as the particular integral of the DE. This solution will also be periodic with the same frequency $\omega$. We would like to obtain such a solution. Since the oscillation is periodic, it can be expanded into a Fourier series (it is convenient to use the exponential form):

$$z(t) = \sum_{n=-\infty}^{\infty} z_n\, e^{i2\pi nt/T} = \sum_n z_n\, e^{i\omega nt} = z_0 + \sum_{n\neq 0} z_n\, e^{i\omega nt}, \qquad (3.89)$$

where $z_n$ are unknown amplitudes. Similarly, the tip-surface force $F_s(h + z(t))$, being a periodic function of time, can be expanded as well, $F_s(h + z(t)) = \sum_n F_n\, e^{i\omega nt}$, where

$$F_n = \frac{1}{T}\int_0^T F_s\!\left( h + z(t) \right) e^{-i\omega nt}\, dt. \qquad (3.91)$$
Note that the constant external force holding the cantilever contributes only to the $n = 0$ Fourier coefficient. Since the exponential functions form a complete set, the coefficients on both sides of the equation of motion must be equal to each other. This way we can establish the following equations for the unknown amplitudes $z_n$:
$$z_0 = \frac{F_0 + F_{\text{ext}}}{m\omega_0^2},$$

$$z_1\left( -\omega^2 + i\omega\gamma + \omega_0^2 \right) = \frac{1}{m}\left( F_1 + \frac{1}{2}A_0 \right),$$

$$z_{-1}\left( -\omega^2 - i\omega\gamma + \omega_0^2 \right) = \frac{1}{m}\left( F_{-1} + \frac{1}{2}A_0 \right),$$

$$z_n\left( -\omega^2 n^2 + i\omega\gamma n + \omega_0^2 \right) = \frac{F_n}{m} \quad\text{for any}\quad n = \pm 2, \pm 3, \ldots. \qquad (3.92)$$
The external force ensures that there is no constant displacement of the cantilever due to oscillations, i.e. $F_{\text{ext}} = -F_0$ and hence $z_0 = 0$. The rest of the obtained equations are to be solved self-consistently, since the coefficients $F_n$ depend on all amplitudes $z_n$, see Eq. (3.91): using some initial values for the amplitudes, one calculates the constants $F_n$; these are then used to update the amplitudes $z_n$ by solving the above equations; the new amplitudes are used to recalculate the forces $F_n$, and so on until convergence. In practice, the Fourier series is terminated, so that only the first $N$ terms are considered, $0 \le n \le N$.
In many cases only the first two terms in the Fourier series (i.e. for $n = \pm 1$) can be retained. Let us establish an expression for the resonance frequency in this case. As was explained above, the resonance is established by the $\pi/2$ phase shift between the excitation signal, $A_0\cos(\omega t)$, and the tip oscillation. This means that in this approximation

$$z(t) \simeq A\cos\!\left( \omega t - \frac{\pi}{2} \right) = A\sin(\omega t) = \frac{A}{2i}\left( e^{i\omega t} - e^{-i\omega t} \right),$$

i.e. $z_1 = A/2i = -iA/2$ and $z_{-1} = iA/2$. Here $A$ is the cantilever oscillation amplitude.
Problem 3.40. Substituting these values of $z_{\pm 1}$ into Eqs. (3.92), show that the resonance frequency $\omega$ must satisfy the following equation:

$$\left( \frac{\omega}{\omega_0} \right)^2 = 1 - \frac{1}{\pi kA}\int_0^{2\pi} F_s\!\left( h + A\sin\phi \right)\sin\phi\, d\phi. \qquad (3.93)$$
This formula can be used for calculating the resonance frequency of the tip
oscillations for the given lateral position of the tip above the surface for which the
tip-surface force Fs is known.
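For a model tip-surface force the integral in (3.93) is readily evaluated numerically; in the sketch below the van der Waals-like force $F_s(z) = -C/z^2$ and all parameter values are invented purely for illustration:

```python
import numpy as np

k, A, h, C = 40.0, 5.0, 8.0, 1e3          # spring constant, amplitude, height, force constant

def Fs(z):                                 # model attractive tip-surface force
    return -C / z**2

phi = np.linspace(0.0, 2 * np.pi, 100001)
integral = np.trapz(Fs(h + A * np.sin(phi)) * np.sin(phi), phi)
print("omega / omega0 =", np.sqrt(1.0 - integral / (np.pi * k * A)))
# the ratio comes out below 1: an attractive force softens the effective spring
```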
Fig. 3.8 (a) A spatially uniform external potential $V(t)$ acting on an electron as a function of time, $t$. (b) Occupation of energy levels $\epsilon_n$ by the electron. Left: initially, when only the ground state $n = 0$ was occupied. Right: at $t = \infty$, when several states are occupied with different probabilities (shown by the length of the horizontal lines representing the states), with the excited state $n = 2$ being the most probable.
Expanding the electron wave function in terms of the stationary states $\psi_n(x)$,

$$\Psi(x, t) = \sum_{n=0}^{\infty} c_n(t)\,\psi_n(x),$$

then, according to the rules of quantum mechanics, the probability at time $t$ to find our electron in state $n$ is $P_n(t) = |c_n(t)|^2$, i.e. it is given by the modulus squared of the generalised Fourier coefficients. Of course, since the electron should occupy a state with certainty,

$$\sum_n P_n(t) = \sum_n |c_n(t)|^2 = 1.$$
Problem 3.41. Assuming that the stationary state functions $\psi_n(x)$ form an orthonormal set, show that the above condition corresponds to the correct normalisation of the wave function at any time: $\int \Psi^*(x, t)\,\Psi(x, t)\, dx = 1$.
If the measurement is done long after the action of the external potential, so that we can assume that all the relaxation processes in the system have ceased, then there will be a stationary distribution $P_n(\infty)$ of finding the electron in the state $n$, as schematically shown in Fig. 3.8(b). To calculate the probability, we can use formula (3.67) for the generalised Fourier coefficients:

$$P_n(t) = \left| \int \psi_n^*(x)\,\Psi(x, t)\, dx \right|^2.$$

Of course, this is just one example; in fact, quantum mechanics is largely built on the idea of expanding states into Fourier series with respect to some set of stationary states.
Another application of functional series in quantum mechanics is related to modern theories of quantum chemistry and condensed matter physics, where a many-electron problem is solved. The same Schrödinger equation is solved in both cases, but an essentially different expansion of the electron wave functions in basis functions (the basis set) is employed in each of them. While in condensed matter physics plane waves are used, which represent an orthogonal and complete basis whose convergence can be easily controlled, in quantum chemistry functions localised on atoms are used which mimic atomic-like orbitals. This way much smaller sets of basis functions are needed to achieve a reasonable precision; however, it is much more difficult to achieve convergence with respect to the basis set, as functions localised on different atoms do not represent an orthogonal and complete basis set, so that including too many of them on each atom may result in overcompleteness and hence instabilities in numerical calculations.
Chapter 4
Special Functions
In this chapter¹ we shall consider various special functions which have found prominent and widespread applications in physics and engineering. Most of these functions cannot be expressed via a finite combination of elementary functions, and are solutions of non-trivial differential equations. Their careful consideration is our main objective here.
We shall start from the so-called Dirac delta function, which corresponds to the
class of generalised functions, then move on to gamma and beta functions; then
consider in great detail various orthogonal polynomials, hypergeometric functions,
associated Legendre and Bessel functions. The chapter is concluded with differential
equations which frequently appear when solving physics and engineering problems,
where the considered special functions naturally emerge as their solutions.
4.1 Dirac Delta Function

Consider a charge $q$ which is spread along the $x$ axis. The distribution of the charge $q$ along $x$ is characterised by the distribution function $\rho(x)$, called the charge density, which is the charge per unit length. Integrating the density over the whole 1D space would give $q$:

$$\int_{-\infty}^{\infty} \rho(x)\, dx = q.$$
¹ In the following, references to the first volume of this course (L. Kantorovich, Mathematics for natural scientists: fundamentals and basics, Springer, 2015) will be made by appending the Roman number I in front of the reference, e.g. Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of the first volume, respectively.
If the charge $q$ is smeared out along $x$ smoothly, the density $\rho(x)$ would be a smooth function of the single variable $x$. In physics, however, it is frequently needed to describe point charges. A point charge is localised "at a single point", i.e. there is a finite charge at a single value of $x$ only; beyond this point there is no charge at all. How can one in this case specify the corresponding charge density? Obviously, there is a problem. Consider a charge $q$ placed at $x = 0$. Since the charge is assumed to be point-like, the density must be equal to zero everywhere outside the charge, i.e. where $x \neq 0$, and equal to some constant $A$ where the charge is, i.e. at $x = 0$. It is immediately obvious that the constant $A$ cannot be finite. Indeed, integration of the density must recover the total charge $q$. However, we obtain zero when integrating $A$, since the function $\rho(x) = A$ only within an immediate vicinity of the point $x = 0$, i.e. when $-\epsilon < x < \epsilon$ with $\epsilon \to 0$, but is zero outside it. Integrating such a density gives zero:

$$\int_{-\infty}^{\infty} \rho(x)\, dx = \lim_{\epsilon\to 0}\int_{-\epsilon}^{\epsilon} A\, dx = \lim_{\epsilon\to 0} 2A\epsilon = 0.$$
This paradoxical situation may be resolved in the following way: one hopes that the limit above may not be equal to zero if the constant $A$ is infinite, as in this case the limit would be of the $0\cdot\infty$ type and hence (one hopes) might be finite. This means that the charge density of a point charge must have a very unusual form: $\rho(x) = 0$ for any $x \neq 0$ and $\rho(0) = +\infty$, i.e. it is zero everywhere apart from the single point $x = 0$ where it is infinite; such a density has an infinitely high, zero-width spike at $x = 0$.
This is indeed a very unusual function which we have never come across before: all functions we have encountered so far were smooth with at most a finite number of discontinuities; we have never had a function which jumps infinitely high up on the left of a single point ($x = 0$) and immediately jumps back down on the right side of it. How can this kind of function be defined mathematically? We shall show below that this function can be defined in a very special way as a limit of an infinite sequence of well-defined functions.
Consider a function $\delta_n(x)$ which has the form of a rectangular impulse for any value of $n = 1, 2, 3, \ldots$ (Fig. 4.1):

$$\delta_n(x) = \begin{cases} n, & \text{if } -1/2n \le x \le 1/2n \\ 0, & \text{if } |x| > 1/2n \end{cases}. \qquad (4.1)$$

This function is constructed in such a way that the area under it is equal to one for any value of $n$, i.e. the definite integral

$$\int_{-\infty}^{\infty} \delta_n(x)\, dx = \int_{-1/2n}^{1/2n} n\, dx = n\,\frac{2}{2n} = 1 \qquad (4.2)$$
for any $n$. When $n$ is increased, the graph of $\delta_n(x)$ becomes narrower and more peaked; at the same time, the area under the curve remains the same and equal to one. In the limit of $n \to \infty$ we shall arrive at a function which has an infinite height and at the same time an infinitesimally small width: exactly what we need! This idealised "impulse" function, which is defined in the limit of $n \to \infty$, was first introduced by Dirac and bears his name:

$$\delta(x) = \lim_{n\to\infty}\delta_n(x), \qquad (4.3)$$

so that

$$\int_{-\infty}^{\infty} \delta(x)\, dx = 1. \qquad (4.4)$$

It is helpful to think of $\delta(x)$ as being an infinitely sharp impulse function of unit area centred at $x = 0$.
It is important to stress that the function defined in this way is not an ordinary function; it can only be understood and defined using a sequence of well-defined functions $\delta_n(x)$, $n = 1, 2, 3, \ldots$, called a delta sequence. This type of function belongs to the class of generalised functions.
We shall now consider one of the most important properties of this function, which is called the filtering theorem. Let $f(x)$ be a well-defined function which can be expanded in a Taylor series around $x = 0$. Consider the integral

$$\int_{-\infty}^{\infty} \delta_n(x)\, f(x)\, dx = n\int_{-1/2n}^{1/2n} f(x)\, dx. \qquad (4.5)$$
For large $n$ the integration interval $-1/2n \le x \le 1/2n$ becomes very narrow, and hence $f(x)$ can be represented well by the Maclaurin series about the point $x = 0$:

$$n\int_{-1/2n}^{1/2n} f(x)\, dx = n\int_{-1/2n}^{1/2n}\left[ f(0) + f'(0)\,x + \frac{f''(0)}{2}\,x^2 + \cdots \right] dx = f(0) + \frac{f''(0)}{24}\,\frac{1}{n^2} + \cdots. \qquad (4.6)$$

It can be readily seen that at large values of $n$ the integral is mainly determined by $f(0)$; the other terms (in the sum) are very small, the largest one being of the order of $1/n^2$. In fact, by taking the $n \to \infty$ limit we obtain

$$\lim_{n\to\infty}\int_{-\infty}^{\infty} \delta_n(x)\, f(x)\, dx = f(0). \qquad (4.7)$$

Formally, we can take the limit under the integral sign, which would turn $\delta_n(x)$ into $\delta(x)$, and hence arrive at the following formal result:

$$\int_{-\infty}^{\infty} \delta(x)\, f(x)\, dx = f(0). \qquad (4.8)$$
One may say that the integration of $f(x)$ with $\delta(x)$ has filtered out the value $f(0)$ of the function $f(x)$; this value corresponds to the point $x = 0$ where the delta function is peaked. This result is easy to understand: since the delta function is infinitely narrow around $x = 0$, only a single value of $f(x)$ at this point, $f(0)$, can be kept in the product $f(x)\delta(x)$ under the integral. Basically, within that infinitesimally narrow interval of $x$ the function $f(x)$ may be considered as a constant equal to $f(0)$. Therefore, it can be taken out of the integral, which then appears simply as the integral of the delta function alone, equal to unity because of Eq. (4.4).
We shall now discuss some obvious generalisations. Instead of considering the delta sequence centred at $x = 0$, one may define a sequence $\{\delta_n(x - a),\ n = 1, 2, 3, \ldots\}$, centred at the point $x = a$. This would lead us to the delta function $\delta(x - a)$ and the corresponding generalisation of the filtering theorem:

$$\int_{-\infty}^{\infty} \delta(x - a)\, f(x)\, dx = f(a). \qquad (4.9)$$

Here the delta function $\delta(x - a)$ peaks at $x = a$, and in the integral it filters out the value of the function $f(x)$ at this point.
In fact, more complex arguments of the delta function may be considered in the same way by making an appropriate substitution. For instance,

$$\int_{-10}^{5} e^{-x}\,\delta(2x + 1)\, dx = \left| \begin{array}{l} t = 2x + 1 \\ dt = 2\, dx \end{array} \right| = \int_{-19}^{11} e^{-(t-1)/2}\,\delta(t)\,\frac{dt}{2} = \left. \frac{1}{2}\, e^{-(t-1)/2} \right|_{t=0} = \frac{1}{2}\, e^{1/2} = \frac{\sqrt{e}}{2}.$$
Note that the exact numerical values of the boundaries in the integral are unimportant: as long as the singularity of the delta function happens within them, one can use the filtering theorem irrespective of the exact values of the boundaries; otherwise, i.e. when the singularity is outside the boundaries, the integral is simply equal to zero.
Using the filtering theorem, various useful properties of the delta function may be established. First of all, consider $\delta(-x)$. We have for any function $f(x)$ which is smooth around $x = 0$:

$$\int_{-\infty}^{\infty} \delta(-x)\, f(x)\, dx = -\int_{+\infty}^{-\infty} \delta(t)\, f(-t)\, dt = \int_{-\infty}^{\infty} \delta(t)\, f(-t)\, dt = f(-0) = f(0),$$

i.e. $\delta(-x)$ does the same job as $\delta(x)$, and hence we formally may write

$$\delta(-x) = \delta(x). \qquad (4.10)$$
Because of this property, which tells us that the delta function is even (something one would easily accept considering its definition), we may also write the following useful results:

$$\int_{-\infty}^{0} \delta(x)\, dx = \int_0^{\infty} \delta(x)\, dx = \frac{1}{2},$$

and hence

$$\int_{-\infty}^{0} \delta(x)\, f(x)\, dx = \int_0^{\infty} \delta(x)\, f(x)\, dx = \frac{f(0)}{2}. \qquad (4.11)$$
Problem 4.3. Prove the following other properties of the delta function:

$$x\,\delta(x) = 0\,;$$

$$\delta(ax + b) = \frac{1}{|a|}\,\delta\!\left( x + \frac{b}{a} \right). \qquad (4.12)$$

Note that in the latter case both positive and negative values of $a$ are to be considered separately.
Often the argument of the delta function is a more complex function than the linear one considered so far. Moreover, it may become equal to zero at more than one point, yielding more than one singularity point for the delta function. As a consequence, the filtering theorem in these cases must be modified. As an example, consider the integral

$$I = \int_{-\infty}^{\infty} f(x)\,\delta\!\left( x^2 - a^2 \right) dx.$$

It contains $\delta(x^2 - a^2)$, which has two impulses: one at $x = -a$ and another at $x = +a$. We split the integral into two: one performed around the point $x = -a$ and another around $x = +a$:

$$I = \int_{-a-\epsilon}^{-a+\epsilon} \delta\!\left( x^2 - a^2 \right) f(x)\, dx + \int_{a-\epsilon}^{a+\epsilon} \delta\!\left( x^2 - a^2 \right) f(x)\, dx = I_- + I_+,$$
where $0 < \epsilon < a$. In the first integral $I_-$ we change the variable $x$ into $t = x^2 - a^2$, so that $x = -\sqrt{t + a^2}$ and $dx = -dt/2\sqrt{t + a^2}$; here the minus sign is essential, as the point $t = 0$ where $\delta(t)$ is peaked must correctly correspond to $x = -a$ where $\delta(x^2 - a^2)$ is peaked. Performing the integration, we then obtain

$$I_- = -\int_{2a\epsilon+\epsilon^2}^{-2a\epsilon+\epsilon^2} \delta(t)\, f\!\left( -\sqrt{t + a^2} \right)\frac{dt}{2\sqrt{t + a^2}} = \int_{-2a\epsilon+\epsilon^2}^{2a\epsilon+\epsilon^2} \delta(t)\, f\!\left( -\sqrt{t + a^2} \right)\frac{dt}{2\sqrt{t + a^2}} = \frac{f\!\left( -\sqrt{a^2} \right)}{2\sqrt{a^2}} = \frac{f(-|a|)}{2|a|}.$$

In the second integral $I_+$ the same substitution is made; however, in this case $x = +\sqrt{t + a^2}$ and $dx = +dt/2\sqrt{t + a^2}$, while the integration results in $f(|a|)/2|a|$. We therefore obtain that

$$\int_{-\infty}^{\infty} f(x)\,\delta\!\left( x^2 - a^2 \right) dx = \frac{1}{2|a|}\left[ f(-|a|) + f(|a|) \right] = \frac{1}{2|a|}\left[ f(-a) + f(a) \right].$$
The same result is obtained if we formally accept the following identity:

$$\delta\!\left( x^2 - a^2 \right) = \frac{1}{2|a|}\left[ \delta(x - a) + \delta(x + a) \right]. \qquad (4.13)$$
The rectangular sequence considered above to define the delta function is not unique; there are many other sequences one can build to define the delta function as a limit. If we consider any bell-like function $\phi(x)$ defined for $-\infty < x < \infty$, of unit area and tending to zero as $x \to \pm\infty$, as in Fig. 4.2, then one can construct the corresponding delta sequence using the recipe $\delta_n(x) = n\,\phi(nx)$, which in the limit $n \to \infty$ corresponds to the delta function. Indeed, $\phi(nx)$ becomes narrower with increasing $n$ and, at the same time, the prefactor $n$ makes the function $\delta_n(x)$ more peaked as $n$ gets bigger; at the same time, the area under the curve remains the same for any $n$,

$$\int_{-\infty}^{\infty} \delta_n(x)\, dx = \int_{-\infty}^{\infty} n\,\phi(nx)\, dx = \left| \begin{array}{l} t = nx \\ dt = n\, dx \end{array} \right| = \int_{-\infty}^{\infty} \phi(t)\, dt = 1,$$

as required. Also, it is clear that, because the function $\delta_n(x)$ defined above gets narrower for larger $n$, the filtering theorem (4.8) or (4.9) is also valid.
Indeed, consider the integral

$$\int_{-\infty}^{\infty} f(x)\,\delta_n(x)\, dx = \int_{-\infty}^{\infty} f(x)\, n\,\phi(nx)\, dx = \int_{-\infty}^{\infty} f\!\left( \frac{t}{n} \right)\phi(t)\, dt.$$

Fig. 4.2 The graph of a typical bell-like function $\phi(x)$ which may serve as the one generating the corresponding delta sequence.

Expanding $f(t/n)$ into the Maclaurin series, $f(t/n) = f(0) + f'(0)\, t/n + \cdots$, we obtain

$$\int_{-\infty}^{\infty} f\!\left( \frac{t}{n} \right)\phi(t)\, dt = f(0)\int_{-\infty}^{\infty} \phi(t)\, dt + \frac{f'(0)}{n}\int_{-\infty}^{\infty} t\,\phi(t)\, dt + \cdots.$$

Since the function $\phi(t)$ is of unit area, the first term is simply $f(0)$, while all the other terms tend to zero in the $n \to \infty$ limit, provided that for any positive integer $k$ the integral $\int_{-\infty}^{\infty} t^k\,\phi(t)\, dt$ converges.
For example, the following functions can be used to generate delta sequences:

$$\phi_1(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad (4.15)$$

$$\phi_2(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}, \qquad (4.16)$$

$$\phi_3(x) = \frac{\sin x}{\pi x} = \frac{1}{2\pi}\int_{-1}^{1} e^{ikx}\, dk \qquad (4.17)$$

(the last passage in the equality is checked by direct integration). All these functions satisfy the required conditions and have the desired shape as in Fig. 4.2. Therefore, these functions generate the following delta sequences:

$$\delta_n^{(1)}(x) = \frac{n}{\sqrt{2\pi}}\, e^{-n^2x^2/2}, \qquad (4.18)$$

$$\delta_n^{(2)}(x) = \frac{n}{\pi}\,\frac{1}{1 + n^2x^2}. \qquad (4.19)$$
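The filtering property of a delta sequence is easy to watch numerically; the sketch below uses the Gaussian sequence (4.18) and the (arbitrarily chosen) test function $f(x) = \cos x$, for which the integral tends to $f(0) = 1$:

```python
import numpy as np

f = np.cos                                 # test function (arbitrary)
x = np.linspace(-30.0, 30.0, 2_000_001)

for n in (1, 10, 100):
    # Gaussian delta sequence (4.18)
    dn = n / np.sqrt(2 * np.pi) * np.exp(-(n * x) ** 2 / 2)
    print(n, np.trapz(dn * f(x), x))       # approaches f(0) = 1 as n grows
```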
From Eq. (4.17), written for the corresponding delta sequence, one obtains in the limit the integral representation

$$\delta(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ixt}\, dt. \qquad (4.20)$$

Another important formula involving the delta function is

$$\lim_{\delta\to+0}\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0 \pm i\delta}\, dx = \mathcal{P}\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\, dx \mp i\pi f(x_0), \qquad (4.21)$$

where $\delta \to +0$, $\mathcal{P}$ is the symbol of the Cauchy principal value, and $f(x)$ is some continuous function on the real axis. This formula can be written symbolically as

$$\frac{1}{x - x_0 \pm i\delta} = \mathcal{P}\,\frac{1}{x - x_0} \mp i\pi\,\delta(x - x_0). \qquad (4.22)$$
To prove it, let us multiply the denominator and the numerator of the integrand in the original integral by $(x - x_0) \mp i\delta$:

$$\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0 \pm i\delta}\, dx = \int_{-\infty}^{\infty} \frac{f(x)\left[ (x - x_0) \mp i\delta \right]}{(x - x_0)^2 + \delta^2}\, dx = \int_{-\infty}^{\infty} \frac{x - x_0}{(x - x_0)^2 + \delta^2}\, f(x)\, dx \mp i\int_{-\infty}^{\infty} \frac{\delta}{(x - x_0)^2 + \delta^2}\, f(x)\, dx. \qquad (4.23)$$
In the first term we have to exclude the point $x_0$ from the integration, since otherwise the integral diverges in the $\delta \to 0$ limit:

$$\int_{-\infty}^{\infty} \frac{x - x_0}{(x - x_0)^2 + \delta^2}\, f(x)\, dx = \int_{-\infty}^{x_0-\epsilon} \frac{x - x_0}{(x - x_0)^2 + \delta^2}\, f(x)\, dx + \int_{x_0+\epsilon}^{\infty} \frac{x - x_0}{(x - x_0)^2 + \delta^2}\, f(x)\, dx$$

$$\Longrightarrow\; \int_{-\infty}^{x_0-\epsilon} \frac{x - x_0}{(x - x_0)^2}\, f(x)\, dx + \int_{x_0+\epsilon}^{\infty} \frac{x - x_0}{(x - x_0)^2}\, f(x)\, dx \;\Longrightarrow\; \mathcal{P}\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\, dx$$

in the $\delta \to 0$ limit. Above, $\epsilon \to +0$. To transform the second term in Eq. (4.23), we notice that the expression $\delta/\left[ \pi\left( \delta^2 + x^2 \right) \right]$ tends to $\delta(x)$ in the $\delta \to 0$ limit. Indeed, this follows from the representation (4.19) of the delta function upon the substitution $n \to 1/\delta$. Therefore, the second term in Eq. (4.23), upon the application of the filtering theorem for the delta function, becomes $\mp i\pi f(x_0)$. This proves formula (4.21) completely.
Problem 4.5. By splitting the integral (4.20) into two by the zero point, show that this formula can also be equivalently written as follows:

$$\delta(x) = \frac{1}{\pi}\int_0^{\infty} \cos(xt)\, dt. \qquad (4.24)$$

Here $H(x)$ is the Heaviside unit step function defined in Sect. I.2.1. [Answer: (a) $\exp(2/5)/5$; (b) $\exp(242/5)/5$; and (c) $e(1 + 5e)/2$.]
The Heaviside unit step function $H(x)$ is directly related to the delta function. Indeed, $H(x)$ is constant everywhere apart from the point $x = 0$, and hence its derivative must be zero for any $x \neq 0$, while at $x = 0$, where $H(x)$ jumps by one, the derivative must be infinite. It is natural, therefore, to guess that

$$H'(x) = \delta(x). \qquad (4.25)$$

This guess can be supported by a simple calculation that shows that the integral of the derivative of the Heaviside function is equal to unity:

$$\int_{-\infty}^{\infty} H'(x)\, dx = H(\infty) - H(-\infty) = 1 - 0 = 1.$$

Note also that the integral of the delta function reproduces the Heaviside function,

$$H(x) = \int_{-\infty}^{x} \delta(t)\, dt, \qquad (4.26)$$

since for $x < 0$ it is zero, as the spike due to the singularity of the delta function appears outside the integration limits, while for $x > 0$ we obviously obtain $1$, as the singularity falls inside the limits. The case of $x = 0$ results in the value of the integral equal to $1/2$, due to the fact that the delta function is even; i.e. it follows that $H(0) = 1/2$ is consistent with Eq. (4.26). If we now differentiate both sides of this equation with respect to $x$, we obtain (4.25). The derivative function $H'(x)$ belongs to the class of generalised functions and is equal to $\delta(x)$ in this sense.
Problem 4.7. By calculating the integral $\int_{-\infty}^{\infty} f(x) H'(x)\, dx$ by parts, prove that the derivative of the Heaviside function, $H'(x)$, is equal to the Dirac delta function $\delta(x)$ in the sense that $H'(x)$ works as a filter for $f(x)$, exactly as the delta function.

Problem 4.8. By using integration by parts, prove the integral identity

$$\int_{-\infty}^{\infty} f(x)\,\delta'(x)\, dx = -f'(0),$$
Consider a system of coupled harmonic oscillators of unit mass, subjected to friction and random (stochastic) forces:

$$\ddot{\mathbf{X}} + \gamma\dot{\mathbf{X}} + \mathbf{D}\,\mathbf{X} = \mathbf{\Phi}, \qquad (4.27)$$

where $\gamma$ is the friction constant, $\mathbf{D}$ is the dynamical matrix, and the correlations of the random forces involve the delta function:

$$\left\langle \Phi_i(t)\,\Phi_j(t') \right\rangle = 2\gamma k_B T\,\delta_{ij}\,\delta(t - t'). \qquad (4.28)$$

This expression shows that the stochastic forces are not correlated in time: indeed, if $t \neq t'$, then the delta function is zero, meaning there are no correlations. Only the forces at the same times $t = t'$ correlate (when the delta function is not zero). This also means that the system of oscillators does not possess any memory, as the past ($t' < t$) does not influence the future at time $t$, due to the lack of correlation between the forces at different times. Also, forces corresponding to different degrees of freedom are not correlated with each other, due to $\delta_{ij}$. The appearance of the temperature $T$ in the right-hand side of Eq. (4.28) is not accidental: this ensures that the so-called fluctuation-dissipation theorem is obeyed. This is necessary to satisfy the equipartition theorem of statistical mechanics, as we shall see later on in this problem.
(i) Let $\mathbf{e}_\lambda$ and $\omega_\lambda^2$ be the eigenvectors and eigenvalues of the dynamical matrix $\mathbf{D}$. By writing $\mathbf{X}$ as a linear combination of all eigenvectors,

$$\mathbf{X}(t) = \sum_\lambda \xi_\lambda(t)\,\mathbf{e}_\lambda,$$

show that each scalar coordinate $\xi_\lambda(t)$ satisfies the following DE:

$$\ddot{\xi}_\lambda + \gamma\dot{\xi}_\lambda + \omega_\lambda^2\,\xi_\lambda = \Phi_\lambda(t), \qquad \Phi_\lambda(t) = \mathbf{e}_\lambda\cdot\mathbf{\Phi}(t).$$
(iii) Correspondingly, show that the solution of Eq. (4.27) which survives at long times can be written in the matrix form as follows:

$$\mathbf{X}(t) = \int_{-\infty}^{t} e^{-\gamma(t-\tau)/2}\, \frac{\sin\left[ \sqrt{\mathbf{D}_\gamma}\,(t - \tau) \right]}{\sqrt{\mathbf{D}_\gamma}}\, \mathbf{\Phi}(\tau)\, d\tau,$$

while the velocity vector, $\mathbf{V}(t) = \dot{\mathbf{X}}(t)$, reads

$$\mathbf{V}(t) = \int_{-\infty}^{t} e^{-\gamma(t-\tau)/2}\left\{ -\frac{\gamma}{2}\, \frac{\sin\left[ \sqrt{\mathbf{D}_\gamma}\,(t - \tau) \right]}{\sqrt{\mathbf{D}_\gamma}} + \cos\left[ \sqrt{\mathbf{D}_\gamma}\,(t - \tau) \right] \right\}\mathbf{\Phi}(\tau)\, d\tau.$$

Here the matrix $\mathbf{D}_\gamma = \mathbf{D} - \frac{\gamma^2}{4}\,\mathbf{E}$, where $\mathbf{E}$ is the unit matrix.

(iv) Show that half of the equal-time velocity-velocity autocorrelation function at long times, i.e. half of the average of the square of the velocity, is

$$\frac{1}{2}\left\langle \mathbf{V}(t)\,\mathbf{V}^T(t) \right\rangle = \frac{k_B T}{2}\,\mathbf{E},$$

i.e. $\frac{1}{2}k_B T$ of the kinetic energy (recall that our particles are of unit mass) is associated with each single degree of freedom, while the equal-time displacement-displacement autocorrelation function is

$$\left\langle \mathbf{X}(t)\,\mathbf{X}^T(t) \right\rangle = k_B T\,\mathbf{D}^{-1}.$$
4.2 The Gamma Function

We define the gamma function, $\Gamma(z)$, as the following integral in which its argument, generally a complex number $z$, appears as a parameter:

$$\Gamma(z) = \int_0^{\infty} t^{z-1}\, e^{-t}\, dt. \qquad (4.30)$$
We initially assume that $\mathrm{Re}\, z > 0$, as this integral converges in the right half of the complex plane. Indeed, because of $e^{-t}$, the convergence at $t \to \infty$ is guaranteed. However, the integral may diverge at $t = 0$, since for small $t$ one puts $e^{-t} \simeq 1$, arriving at the integral $\int_0^{\epsilon} t^{z-1}\, dt$ (with some $0 < \epsilon \ll 1$), which diverges for $z = 0$ (logarithmically). Therefore, the question of convergence of the integral is not at all straightforward. In order to prove the convergence of the integral for $\mathrm{Re}\, z > 0$, it is wise to split the integral into two parts corresponding to the intervals $0 \le t \le 1$ and $1 \le t < \infty$:

$$\Gamma(z) = \Gamma_1(z) + \Gamma_2(z), \qquad \Gamma_1(z) = \int_0^1 t^{z-1}\, e^{-t}\, dt \quad\text{and}\quad \Gamma_2(z) = \int_1^{\infty} t^{z-1}\, e^{-t}\, dt.$$
Consider first $\Gamma_2(z)$. For a vertical stripe $0 < \mathrm{Re}\, z \le x_{\max}$, we write $z = x + iy$, and then the integral can be estimated as

$$\left| \Gamma_2(z) \right| \le \int_1^{\infty} \left| t^{z-1}\, e^{-t} \right| dt = \int_1^{\infty} e^{-t}\left| e^{(x-1)\ln t + iy\ln t} \right| dt = \int_1^{\infty} e^{-t}\left| e^{(x-1)\ln t} \right|\left| e^{iy\ln t} \right| dt$$

$$= \int_1^{\infty} e^{-t}\, e^{(x-1)\ln t}\, dt \le \int_1^{\infty} e^{-t}\, e^{(x_{\max}-1)\ln t}\, dt = \int_1^{\infty} t^{x_{\max}-1}\, e^{-t}\, dt.$$

In writing the second line above we have made use of the fact that $\ln t > 0$ when $t > 1$. Since the integral in the right-hand side converges (its convergence at $t \to \infty$ is obvious because of the exponential function $e^{-t}$), $\Gamma_2(z)$ converges as well. Moreover, since the estimate above is valid for any $z$ within the stripe, it converges uniformly. Since the value of $x_{\max}$ was chosen arbitrarily, the integral $\Gamma_2(z)$ converges everywhere to the right of the imaginary axis, i.e. for any $\mathrm{Re}\, z > 0$, and is analytic there.
Consider now $\Gamma_1(z)$, which is obviously analytic for $\mathrm{Re}(z - 1) > 0$ (or when $x = \mathrm{Re}\, z > 1$). Within the vertical stripe $0 < x_{\min} \le \mathrm{Re}\, z < 1$ in the complex plane, the integral can be estimated as follows:

$$\left| \Gamma_1(z) \right| \le \int_0^1 \left| t^{z-1}\, e^{-t} \right| dt = \int_0^1 e^{-t}\left| e^{(x-1)\ln t} \right| dt = \int_0^1 e^{-t}\left| e^{-(1-x)(-\ln t)} \right| dt$$

$$\le \int_0^1 e^{-t}\, e^{-(1-x_{\min})\ln t}\, dt = \int_0^1 t^{x_{\min}-1}\, e^{-t}\, dt \le \int_0^1 t^{x_{\min}-1}\, dt = \left. \frac{t^{x_{\min}}}{x_{\min}} \right|_0^1 = \frac{1}{x_{\min}}.$$

This means that $\Gamma_1(z)$ converges, and converges uniformly. Hence, $\Gamma(z) = \Gamma_1(z) + \Gamma_2(z)$ is analytic everywhere to the right of the imaginary axis, $\mathrm{Re}\, z > 0$.
The function $\Gamma(z)$ satisfies a simple recurrence relation. Indeed, let us calculate the integral for $\Gamma(z + 1)$ by parts:

$$\Gamma(z + 1) = \int_0^{\infty} t^z\, e^{-t}\, dt = \left[ -t^z\, e^{-t} \right]_0^{\infty} + z\int_0^{\infty} t^{z-1}\, e^{-t}\, dt = z\int_0^{\infty} t^{z-1}\, e^{-t}\, dt = z\,\Gamma(z), \qquad (4.31)$$

where the free term above is zero both at $t = 0$ and $t = \infty$.
Problem 4.11. Prove that in the case of z being a positive integer, z D n, the
gamma function is equal to the factorial function:
.n C 1/ D nŠ: (4.32)
[Hint: to see this, first check that .1/ D 1 and then apply induction.]
p
Problem 4.12. Using induction and the fact that .1=2/ D (see below),
prove that
1 1 3 5 : : : .2n 1/ p .2n 1/ŠŠ p .2n/Š p
nC D D D 2n ;
2 2 n 2n 2 nŠ
(4.33)
where the double factorial .2n 1/ŠŠ corresponds to a product of all odd
integers between 1 and 2n 1.
2 Z 1 Z 1 Z 1Z 1
1 2 2 2 2
D4 ex dx ey dy D e.x Cy / dxdy;
2 0 0 1 1
which can be viewed as a double integral over the x y plane (that is why we have
introduced x and y as new variables). Hence, we can calculate it by going into polar
coordinates x D r cos and y D r sin (with dxdy D rdrd), which gives
2 Z 2 Z 1 Z 1
1 2 2
D d er rdr D 2 er rdr
2 0 0 0
Z 1
1 p
D et dt D H) D : (4.34)
0 2
where ˛ is a constant and n a positive integer. For odd values of n the integrand is
an odd function and the integral is obviously equal to zero. For even values of n the
integral In .˛/ is directly related to the gamma function of half integer argument as
is demonstrated by the following Problem.
Problem 4.14. Using a new variable x D ˛t2 in the integral of Eq. (4.36),
show explicitly that for even n
.nC1/=2 nC1
In .a/ D ˛ : (4.37)
2
1 2 2
G.x/ D p e.xx0 / =2 ; (4.39)
2
centred at the point x0 and with dispersion which characterises the width of
the function. An example of a Gaussian for x0 D 0 and D 1p is depicted in
Fig. 4.2. Show that the width of G.x/ at its half height is D 2 2 ln 2. Then,
prove that G.x/ is correctly normalised to unity. Finally, calculate the first two
momenta of the Gaussian:
Z 1 Z 1
xG.x/dx D x0 and x2 G.x/dx D x02 C 2
:
1 1
can be directly expressed via the gamma function. Similarly to the way we
calculated above .1=2/, consider the integral
Z 1 Z 1
x2 2˛1 y2 2ˇ1
ID e x dx e y dy
0 0
using two methods: (i) firstly, show that each integral in the brackets above is
related to the gamma function, so that I D 14 .˛/ .ˇ/; (ii) secondly, combine
the two integrals together into a double integral and then change into the polar
coordinates .x; y/ ! .r; '/; relate the r-integral to .˛ C ˇ/ by means of an
appropriate substitution, while the '-integral can be manipulated into 12 B.˛; ˇ/
by means of the substitution t D cos2 '. Hence show that
.˛/.ˇ/
B .˛; ˇ/ D : (4.41)
.˛ C ˇ/
(continued)
2
Named after Johann Carl Friedrich Gauss.
4.2 The Gamma Function 315
The second form (with x˛1 ) follows from the symmetry of the beta func-
tion (4.41).
Problem 4.20. Show that
Z 1 n .nŠ/2
1 x2 dx D 22nC1 : (4.43)
1 .2n C 1/Š
[Hint: using a new variable t via x D 1 2t, express the integral via the beta
function B.n C 1; n C 1/.]
Above the gamma function .z/ was defined in the right half of the complex
plane, to the right of the imaginary axis. The recursion relation (4.31) can be used
to analytically continue (Sect. 2.6) .z/ to the left half of the complex plane as well,
where Re z < 0. Indeed, let us apply the recurrence relation consecutively n 1
times to .z C n/ (here n D 1; 2; : : :):
.z C n C 1/ D .z C n/ .z C n/ D D .z C n/ .z C n 1/ .z C 1/ z.z/:
.z C n C 1/
.z/ D : (4.44)
.z C n/ .z C n 1/ .z C 1/ z
This formula can be used for calculating the gamma function for Re z 0. Indeed,
using in the above formula n D 0, we can write .z/ D .z C 1/=z, relating
.z/ within the vertical stripe 1 < Re z 0 to the values of the gamma function
.z C 1/ with 0 < Re .z C 1/ 1, where it is well defined. Similarly, by choosing
different values of n one can define the gamma function for the corresponding
vertical stripes of the width one to the left of the imaginary axis.
This formula also clearly shows that the gamma function in the left half of the
complex plane will have poles at z D n, where n D 0; 1; 2; : : :, i.e. the gamma
function defined this way is analytic in the whole complex plane apart from the
points z D 0; 1; 2; 3; : : : , where it has singularities. These are simple poles,
however, since (see Sect. 2.5.5) only the limit
is finite (recall that .1/ D 1); the limits of .z C n/k .z/ for any k > 1 are equal
to zero. The limit above (see Sect. 2.7.1) also gives the residue at the pole z D n,
which is .1/n =nŠ.
There is also a simple identity involving the gamma function which we shall
now derive. Using the integral representation (4.30) and assuming a real z between
0 and 1, let us consider the product .z/.1 z/. We shall employ a similar trick to
the one we used before when deriving Eq. (4.34). We write
Z 1 Z 1
.z/ .1 z/ D t1z1 et1 dt1 t2z et2 dt2 :
0 0
In the first integral we make the substitution x2 D t1 , while in the second integral
the substitution will be y2 D t2 . This brings us to a double integral
Z 1 Z 1 2z1
.x2 Cy2 / x
.z/ .1 z/ D 4 e dxdy;
0 0 y
in which we next use the polar coordinates x D r cos and y D r sin . This gives
(note that we only integrate over a quarter of the xy plane and hence 0 =2):
Z 1 Z =2 Z =2
2
.z/ .1 z/ D 4 er rdr .cot /2z1 d D 2 .cot /2z1 d:
0 0 0
p p
At the final step, we make the substitution t D cot , d D dt= 2 t .1 C t/ ,
which transforms the integral into the form which can be handled:
Z 1 z1
t
.z/ .1 z/ D dt D ; (4.45)
0 1Ct sin . z/
where at the last step we have used the result we obtained earlier in Problem 2.90 for
the integral in the right-hand side. This result was derived for 0 < z < 1. However,
it can be analytically continued for the whole complex plane and hence this formula
is valid for any z (apart from z D 1; 2; : : : where in both sides we have infinity).
Since .1 z/ D z .z/ because of the recurrence relation the gamma
function satisfies, the obtained identity can also be written as
for z > 0. This sequence converges to the gamma function in the limit of n !
1 because the sequence .1 t=n/n converges to et . The integral above can be
calculated by repeated integration by parts.
Y
n
C1=n/ z.1C1=2C1=3C C1=n/ C1=n/
1 D ez.1C1=2C1=3C e D ez.1C1=2C1=3C ez=k ;
kD1
giving
1 Y
n
z z=k
C1=nln n/
D zez.1C1=2C1=3C 1C e :
n .z/ kD1
k
Since .z/ D limn!1 n .z/, we can take the n ! 1 limit in the above formula
which yields
Y1
1 1 1 1 z z=k
D z exp z lim 1 C C C C ln n 1C e :
.z/ n!1 2 3 n kD1
k
Here the finite product becomes an infinite one, and the limit in the exponent is
nothing but the Euler–Mascheroni constant D 0:5772 : : : which we introduced
in Sect. I.7.1.2. Therefore, we finally obtain a representation of the gamma function
via an infinite product as follows:
Y1
1 z z=k
D zez 1C e : (4.47)
.z/ kD1
k
318 4 Special Functions
Problem 4.22. Using this product representation in Eq. (4.46), derive the
following representation of the sine function via an infinite product:
Y1
z2
sin . z/ D z 1 2 : (4.48)
kD1
k
Problem 4.23. Taking the logarithms of both sides of (4.48) and differentiat-
ing, show that
1
1 X 1
cot . z/ D : (4.49)
kD1
k C z
We shall need this beautiful result at the end of this chapter in Sect. 4.7.4.
X 1
1
G.x; t/ D p D Pn .x/tn : (4.50)
1 2xt C t2 nD0
4.3 Orthogonal Polynomials 319
Here 1 < x < 1, otherwise the square root is complex. Indeed, the function under
the square root,
f .t/ D 1 2xt C t2 D .t x/2 C 1 x2 ; (4.51)
is a parabola, see Fig. 4.3. It is positive for all values of t only if 1 x2 > 0, i.e.
when 1 < x < 1.
Let us expand the generating function G.x; t/ explicitly into the Taylor’s series
with respect to t and thus calculate several first polynomials:
X1 n ˇ
1 t @n G.x; t/ @G ˇˇ
G.x; t/ D p D D G.x; 0/ C t
1 2xt C t2 nD0
nŠ @tn tD0 @t ˇtD0
ˇ
1 @2 G ˇˇ
C t2 C
2 @t2 ˇtD0
ˇ
xt ˇ
ˇ
D 1C ˇ t
3=2 ˇ
2
.1 2xt C t / tD0
" #
1 1 3 .x t/.2t 2x/
C t2 C
2 .1 2xt C t2 /3=2 2 .1 2xt C t2 /5=2
tD0
1 2
D 1 C xt C 3x 1 t2 C :
2
Therefore, comparing this expansion with the definition of the Legendre polynomi-
als (4.50), we conclude that
1 2
P0 .x/ D 1 I P1 .x/ D x I P2 .x/ D 3x 1 : (4.52)
2
This procedure of direct expansion of the generating function can be continued;
however, the calculation becomes increasingly tedious.
320 4 Special Functions
X 1
@G
D nPn .x/tn1 : (4.53)
@t nD0
@G @ 1
D p
@t @t 1 2xt C t2
X1
xt xt xt
D D G.x; t/ D Pn .x/tn : (4.54)
.1 2xt C t2 /3=2 1 2xt C t2 1 2xt C t2 nD0
or
1
X 1
X 1
X 1
X 1
X
nPn tn1 2xnPn tn C nPn tnC1 D xPn tn Pn tnC1 :
nD0 nD0 nD0 nD0 nD0
Several simplifications are possible: the n D 1 term in the first sum does not
contribute because of the factor of .n C 1/, and hence the summation can start from
n D 0; in the third term we can add the n D 0 term since it is zero anyway because
of the prefactor n in front of Pn1 . Then, all three summations now run from n D 0,
have the same power of t and hence can be combined into one:
1
X
Œ.n C 1/PnC1 x.2n C 1/Pn C nPn1 tn D 0:
nD0
4.3 Orthogonal Polynomials 321
Since this expression must be valid for any t, each and every coefficient to tn should
be equal to zero3 :
Note that this recurrence relation is formally valid even for n D 0 as well if we
postulate that P1 D 0, in which case we simply get P1 .x/ D xP0 .x/ D x, the result
we already knew.
This recurrent relation can be used to generate the functions Pn .x/. Indeed, using
P0 D 1 and P1 D x, we have using n D 1 in the recurrence relation that 2P2 C P0 D
3xP1 , which gives
1 1 2
P2 D .3xP1 P0 / D 3x 1 ;
2 2
i.e. the same expression as obtained above using the direct method. All higher order
functions (corresponding to larger values of n) are obtained in exactly the same way,
i.e. using a very simple algebra.
Problem 4.24. Show using the direct method (the Taylor’s expansion) that:
1 1 1 5
P3 D .5x3 3x/ I P4 .x/D .35x4 30x2 C3/ I P5 .x/D 63x 70x3 C15x :
2 8 8
(4.56)
Problem 4.25. Confirm these results by repeating the calculation using the
recurrence relations.
Problem 4.26. Prove that Pn .x/ is a polynomial of order n. [Hint: use the
recurrence relation (4.55) and induction.]
X 1
@G
D P0n .x/tn ;
@x nD0
X1
@G t t t
D D G.x; t/ D Pn tn :
@x .1 2xt C t2 /3=2 1 2xt C t2 1 2xt C t2 nD0
3
Functions tn with different powers n are linearly independent.
322 4 Special Functions
or
1
X 1
X 1
X 1
X
P0n tn 2x P0n tnC1 C P0n tnC2 D Pn tnC1 ;
nD0 nD0 nD0 nD0
„ ƒ‚ … „ ƒ‚ …
n!nC1 nC1!n
which after the corresponding index substitutions indicated above transforms into:
1
X 1
X 1
X
P0nC1 tnC1 2xP0n C Pn tnC1 C P0n1 tnC1 D 0:
nD0 nD0 nD1
Note that the n D 1 term in the first sum does not contribute since P00 D 0 and
hence was omitted. Separating out the n D 0 terms in the first and second sums and
collecting other terms together, we obtain
1
X
P01 t 2xP00 C P0 t C P0nC1 2xP0n Pn C P0n1 tnC1 D 0:
nD1
The expression in the square brackets is in fact zero if we recall that P0 D 1 and
P1 D x. Thus, we immediately obtain a different recurrence relation:
Note that the two recurrence relations we have derived contain the Legendre
polynomials with three consecutive indices. Other identities can also be obtained
via additional differentiations as described explicitly below. In doing this, we aim
at obtaining such identities which relate only Legendre polynomials with two
consecutive indices. Using these we shall derive a differential equation the functions
Pn .x/ must satisfy.
To this end, we first differentiate Eq. (4.55) with respect to x:
Solve (4.58) with respect to xP0n and substitute into (4.57); after straightforward
algebra, we obtain
Solving (4.59) with respect to P0nC1 and substituting into Eq. (4.57) gives
while solving (4.59) with respect to P0n1 and substituting into Eq. (4.57) results in:
Now we have both P0n1 and P0nC1 expressed via Pn and P0n by means of
Eqs. (4.60) and (4.61), respectively. This should allow us to formulate a differential
equation for the polynomials Pn .x/. Differentiating Eq. (4.60), we can write
Substituting P0n1 and P00n1 from (4.60) and (4.63), respectively, into (4.64) gives an
equation containing functions Pn with the same index n:
1 x2 P00n 2xP0n .x/ C n.n C 1/Pn .x/ D 0; (4.65)
Problem 4.27. Expanding directly the generating function into the Taylor’s
series for x D ˙1 and comparing this expansion with the definition of the
polynomials, Eq. (4.50), show that
Problem 4.28. Using the fact that G.x; t/ D G.x; t/, prove that
i.e. polynomials are even functions (contain only even powers of x) for even
n, while polynomials with odd n are odd (contain only odd powers of x). In
particular, Pn .0/ D 0 for odd n. In other words,the polynomials
Pn .x/ for odd
n do not contain constant terms, e.g. P3 .x/ D 12 5x3 3x .
324 4 Special Functions
Problem 4.29. In this problem we shall calculate P2n .0/ using the method of
the generating function. First show that at x D 0 the Taylor’s expansion of the
generating function is
1
X .2n/Š 2n
G.0; t/ D .1/n t :
nD0
.nŠ/2 22n
P
Then, alternatively, G.0; t/ must be equal to the series 1nD0 Pn .0/t . Rewriting
n
2
this latter series as an expansion with respect to t (why can this only be done
for even n?), show that
.2n/Š
P2n .0/ D .1/n : (4.68)
22n .nŠ/2
The Legendre polynomials with different indices n are orthogonal to each other with
the weight one. To show this, we first rewrite the differential equation (4.65) in the
following equivalent form:
d
1 x2 P0n C n.n C 1/Pn D 0: (4.69)
dx
Multiplying it with Pm .x/ with m ¤ n and integrating between 1 and 1 gives
Z 1 Z 1
d
Pm 1 x2 P0n dx C n.n C 1/ Pn Pm dx D 0:
1 dx 1
or
Z 1 Z 1
n.n C 1/ Pn .x/Pm .x/dx D 1 x2 P0n P0m dx: (4.70)
1 1
Alternatively, we can start from Eq. (4.69) written for Pm .x/, then multiply it by
Pn .x/ with some n ¤ m and integrate; this way we would obtain the same result as
above but with n and m interchanged:
Z 1 Z 1
m.m C 1/ Pm .x/Pn .x/dx D 1 x2 P0m P0n dx:
1 1
4.3 Orthogonal Polynomials 325
ˇ ˇ1 ˇ ˇ
1 ˇ 2
C 1 ˇ 1 ˇ 1 t2 C1 ˇ
t ˇ ˇ
D ln ˇˇx ˇ D ln ˇ 2t
ˇ
2t 2t ˇ1 2t ˇ 1 t2 C1 ˇ
2t
ˇ ˇ ˇ ˇ
1 ˇ 2t t2 1 ˇ 1 ˇˇ t2 2t C 1 ˇˇ
D ln ˇˇ ˇ D ln ˇ 2
2t 2t t2 1 ˇ 2t t C 2t C 1 ˇ
ˇ ˇ ˇ ˇ
1 ˇ .t 1/2 ˇ 1 ˇˇ 1 t ˇˇ 1
D ln ˇˇ ˇ D ln ˇ D Œln.1 t/ ln.1 C t/ :
2t .t C 1/ 2 ˇ t 1Ct ˇ t
Using the Taylor expansion for the logarithms (recall that 1 < t < 1 and hence the
expansions converge),
1
X 1 k
X
tk t
ln.1 C t/ D .1/kC1 and ln.1 t/ D ;
kD1
k kD1
k
Only terms with the odd summation indices k survive, hence we can replace the
summation index according to the recipe k ! 2n C 1, yielding
326 4 Special Functions
1
X t2nC1
ln.1 t/ ln.1 C t/ D 2 ;
nD0
2n C 1
Problem 4.30. Show, using explicit calculation of the integrals, that P3 .x/ is
orthogonal to P4 .x/ (their expressions are given in Eq. (4.56)) and that P3 .x/ is
properly normalised, i.e.
Z 1 Z 1
2 2
P3 .x/P4 .x/dx D 0 and P23 .x/dx D D :
1 1 23C1 7
Since all polynomials contain different powers of x, they are all linearly
independent. It can also be shown that they form a complete set (see Sect. 4.3.2.6),
and hence a function f .x/ defined on the interval 1 x 1 can be expanded in
them:
1
X
f .x/ D an Pn .x/: (4.75)
nD0
4.3 Orthogonal Polynomials 327
Multiplying both sides of this equation by Pm .x/ with some fixed value of m,
integrating between 1 and 1 and using the orthonormality condition (4.74), we get
Z 1 1
X Z 1
f .x/Pm .x/dx D an Pn .x/Pm .x/dx
1 nD0 1
Z 1 1
X 2 2
H) f .x/Pm .x/dx D an ınm D am ;
1 nD0
2n C 1 2m C 1
from which the following expression for the expansion coefficient follows:
Z 1
2n C 1
an D f .x/Pn .x/dx: (4.76)
2 1
Example 4.1. I
Let us expand the Dirac delta function in Legendre polynomials:
1
X
ı.x/ D an Pn .x/:
nD0
Problem 4.32. Expand the function f .x/ D 1 C x C 2x2 into a series with
respect to the Legendre polynomials:
2
X
1 C x C 2x2 D cn Pn .x/:
nD0
Why from the start do only polynomials up to the order two need to be con-
sidered? Using explicit expressions for the several first Legendre polynomials,
verify your expansion. [Answer: c0 D 5=3, c1 D 1 and c2 D 4=3.]
Problem 4.33. Expand f .x/ D x3 via the appropriate Legendre polynomials.
[Answer: x3 D 35 P1 .x/ C 25 P3 .x/.]
Problem 4.34. Hence, show that
Z 1
2 4
Pn .x/x3 dx D ın1 C ın3 :
1 5 35
328 4 Special Functions
We shall now prove the so-called Rodrigues formula which allows writing the
Legendre polynomial Pn .x/ with general n in an explicit and compact form:
1 dn 2 n
Pn .x/ D x 1 : (4.77)
2 nŠ dx
n n
n
To prove it, we consider an auxiliary function #.x/ D x2 1 , which satisfies
the equation (check!):
2 d#
x 1 D 2xn#.x/: (4.78)
dx
We shall now differentiate this equation n C 1 times using the Leibnitz formula
Xn
dn .uv/ n .k/ .nk/
D u v ; (4.79)
dxn kD0
k
!
dnC1 X nC1
n C 1 .k/ .nC1k/
2n nC1 .x#/ D 2n x #
dx kD0
k
! !
nC1 nC1
D 2n x# .nC1/ C 2n # .n/ D 2nx# .nC1/ C 2n.n C 1/# .n/ :
0 1
or
1 x2 # .nC2/ 2x# .nC1/ C n.n C 1/# .n/ D 0 H)
1 x2 U 00 2xU 0 C n.n C 1/U D 0;
which is the familiar Legendre equation (4.65) for the function U.x/ D # .n/ .
Since the function U.x/ satisfies the correct differential equation for Pn .x/, it
must be equal to Pn .x/ up to an unknown constant factor. To find this factor, and
hence prove the Rodrigues formula (4.77), we shall calculate U.1/ and compare it
with Pn .1/ which we know is equal to one:
dn 2 n dn
U.x/ D # .n/ D x 1 D Œ.x C 1/n .x 1/n :
dxn dxn
Use the Leibnitz formula again:
n
X n
U.x/ D Œ.x C 1/n .k/ Œ.x 1/n .nk/ ;
k
kD0
where
nŠ
Œ.x C 1/n .k/ D n.n 1/ : : : .n k C 1/.x C 1/nk D .x C 1/nk ;
.n k/Š
and similarly
nŠ nŠ
Œ.x 1/n .k/ D .x 1/nk H) Œ.x 1/n .nk/ D .x 1/k :
.n k/Š kŠ
Hence, we obtain
n
X n nŠ nŠ
U.x/ D .x C 1/nk .x 1/k :
kD0
k .n k/Š kŠ
and this proves the normalisation factor in Eq. (4.77). The Rodrigues formula is
proven completely.
330 4 Special Functions
where w.x/ 0 is a weight function (see Sects. 1.1.2 and 3.7.3). In this section we
shall call the expression .f ; g/ defined by Eq. (4.81) an overlap integral between two
functions f .x/ and g.x/. Note that the overlap integral is fully defined if the weight
function w.x/ is given. For the moment we shall not assume any particular form of
the weight function, but later on several forms of it will be considered.
Theorem 4.1. Any polynomial Hn .x/ of order n can be presented as a linear com-
bination of Qm polynomials with m D 0; 1; : : : ; n, i.e. higher order polynomials are
not required:
X
n
Hn .x/ D cnk Qk .x/: (4.82)
kD0
Proof. The Qk .x/ polynomial contains only powers of x from zero to k. Then,
collecting all polynomials Q0 .x/, Q1 .x/, : : :, Qn .x/ into a vector-column Q, one can
write this statement in a compact form using the following formal matrix equation:
4.3 Orthogonal Polynomials 331
0 1 0 10 1
Q0 .x/ a11 1
B Q1 .x/ C B a21 a22 CB x C
B C B CB C
B Q2 .x/ C B a31 C B x2 C
B CDB a32 a33 CB C;
B : C B : :: :: : : CB : C
@ :: A @ :: : : : A @ :: A
Qn .x/ an1 an2 an3 ann xn
which can also be written simply as Q D AX, where A is the left triangular matrix
of the coefficients aij , and X is the vector-column of powers of x. Correspondingly,
X D A1 Q. It is known (see Problem 1.29 in Sect. 1.2.3) that the inverse of a
triangular matrix has the same structure as the matrix itself, i.e. A1 is also left
triangular:
0 1 0 10 1
1 b11 Q0 .x/
B x C B b21 b22 C B Q1 .x/ C
B C B CB C
B x2 C B b31 C B Q2 .x/ C
B CDB b32 b33 CB C;
B : C B : :: :: : : CB : C
@ :: A @ :: : : : A @ :: A
xn bn1 bn2 bn3 bnn Qn .x/
where bij are elements of the matrix A1 . In other words, the k-th power of x is
expanded only in polynomials Q0 ; Q1 ; : : : ; Qk . Since the polynomial Hn .x/ contains
only powers of x from 0 to n, it is expanded exclusively in polynomials Qk with
k n, as required. Q.E.D.
A simple corollary to this theorem is that any polynomial Hn .x/ of order n is
orthogonal (in the sense of the definition (4.81)) to any of the polynomials Qk .x/
with k > n. Indeed, Hn .x/ can be expanded in terms of the orthogonal polynomials
as shown in Eq. (4.82). Therefore, the overlap integral
X
n X
n
.Hn ; Qk / D cnl .Ql ; Qk / D cnl ıkl :
lD0 lD0
It is seen from here that the overlap integral .Hn ; Qk / ¤ 0 only if the summation
index l may accept a value of k, but this can only happen if k n. This proves the
statement made above: .Hn ; Qk / D 0 for any k > n.
Theorem 4.2. The polynomial Qn .x/ has exactly n roots on the interval
a x b.
Proof. Let us assume that Qn .x/ changes its sign k times on the interval, and
that k < n (strictly). This means that the function Qn .x/ must cross the x axis
at some k points x1 ; x2 ; : : : ; xk lying within the same interval, see an example in
Fig. 4.4. Consider then a polynomial Hk .x/ D .x x1 / .x x2 / .x xk / which
332 4 Special Functions
Fig. 4.4 Polynomials Q3 .x/, Q4 .x/ and Q5 .x/ cross the x axis three, four and five times,
respectively, and hence change their sign the same number of times
also changes its sign k times on the same interval and has its roots at the same
points. Correspondingly, the product F.x/ D Qn .x/Hk .x/ does not change its sign at
all, and hence the integral
Z b Z b
w.x/F.x/dx D w.x/Hk .x/Qn .x/dx ¤ 0
a a
(recall that w.x/ 0). This result, however, contradicts the fact proven above that
Qn is orthogonal to any polynomial of a lower order than itself. It follows therefore
that k must be equal to n as only in this case the contradiction is eliminated. Q.E.D.
It follows then that the orthogonal polynomials fQn .x/g must be somewhat
special as they possess a special property that each of them has exactly as many
roots as its order. Since a polynomial of order n may have no more than n distinct
roots (the total number of roots, including repetitions, is n), we conclude that for
any Qn .x/ all its n roots must be distinct.
X
nC1 Z
1 b
.xQn ; Qk /
xQn D hn;k Qk ; with hn;k D w.x/xQn .x/Qk .x/dx D ;
kD0
Dk0 a .Qk ; Qk /
(4.83)
where
Z b
Dk0 D w.x/Q2k .x/dx D .Qk ; Qk / (4.84)
a
so that
.n/
DnC1;0 DnC1;0 an
hnC1;n D hn;nC1 D :
Dn;0 Dn;0 a.nC1/
nC1
334 4 Special Functions
This identity must be valid for any n. Therefore, using n 1 instead of n in it, we
can write
.n1/
Dn;0 an1
hn;n1 D :
Dn1;0 a.n/
n
This gives us the first coefficient in the expansion (4.85). It is now left to calculate
the second coefficient hn;n , which can be done by comparing the coefficients to xn in
both sides of Eq. (4.85):
After collecting all the coefficients we have just found, the recurrence relation (4.85)
takes on the following form:
Here we shall derive the differential equation which the polynomials Qn .x/ should
satisfy. However, at this point more information is to be given concerning the
weight function w.x/. As it is customarily done, we shall assume that it satisfies
the following first order DE:
w0 .x/ ˛.x/
D ; (4.88)
w.x/ .x/
.x/w.x/jxDa;b D 0: (4.89)
It will become apparent later on why it is convenient that these are obeyed.
4.3 Orthogonal Polynomials 335
The first (free) term in the right-hand side is equal to zero because of the
boundary condition (4.89); to calculate the integral in the right-hand side we use
the integration by parts again:
Z
ˇb b k1 0
I D k xk1 w Qn ˇa C k x w Qn dx:
a
Again, due to the boundary condition, the first term in the right-hand side is zero,
and hence
Z Z
b k1 0 b
IDk x w Qn dx D k .k 1/ xk2 w C xk1 w0 C xk1 w 0
Qn dx:
a a
The second term in the square brackets can be rearranged into xk1 w0 D xk1 w˛
because of the DE (4.88) the weight function satisfies. Then we can finally write
Z b
0
IDk w .k 1/ xk2 C xk1 ˛ C xk1 Qn dx:
a
Now the whole expression in the square brackets is a polynomial of the order k
(note that 0 is the first order polynomial), and therefore I D 0 for any k < n as any
polynomial of order less than n is orthogonal to Qn .
On the other hand, the integral I can be written directly as
Z Z
b 0 b
ID xk w Q0n dx D xk w0 Q0n C w 0 Q0n C w Q00n dx
a a
Z Z
b b
D x w˛Q0n C w 0 Q0n C w Q00n dx D
k
wxk ˛C 0
Q0n C Q00n dx;
a a
where we have used (4.88) again. Here the expression in the square brackets is some
polynomial Hn of order n. Since we know that I D 0 for any k < n, the polynomial
Hn should be proportional to Qn , i.e. we must have
0
˛C Q0n C Q00n D n Qn H) Q00n C ˛ C 0
Q0n n Qn D 0; (4.90)
which is the desired DE. The constant n can be expressed via the coefficients of
the polynomials ˛.x/ and .x/. Indeed, comparing the coefficients to xn in the DE
above, we obtain:
Hence, the selection of the functions ˛.x/ and .x/, which define the weight
function w.x/, precisely determines the DE for the orthogonal polynomials. This
DE, which we shall rewrite in a simpler form as
There exists a compact general formula for the polynomials Qn .x/ corresponding to
a particular weight function w.x/ which also bears the name of Olinde Rodrigues4 :
Cn dn
Qn .x/ D Œw.x/ n
.x/ ; (4.94)
w.x/ dxn
v 0 .x/ D w0 .x/ n
.x/ C w.x/n n1
.x/ 0 .x/ D n1
.x/w.x/ ˛.x/ C n 0 .x/ ;
where we expressed .x/w0 .x/ from Eq. (4.88). Multiplying both sides of the above
equation by .x/, we obtain
In this equation .x/ and ˛.z/ C n 0 .x/ are the second and first order polynomials,
respectively. Therefore, we can differentiate both sides of this equation n C 1 times
using the Leibnitz formula (4.79) and a finite number of terms will be obtained in
the right- and the left-hand sides:
4
He actually derived it only for Legendre polynomials in 1816.
4.3 Orthogonal Polynomials 337
0 .nC1/ nC1 .nC2/ nC1 0 .nC1/ nC1 00 .n/
LHS D v D v C v C v
0 1 2
1
D v .nC2/ C .n C 1/ 0 .nC1/
v C n .n C 1/ 00 v .n/ ;
2
.nC1/
RHS D ˛Cn 0 v D ˛ C n 0 v .nC1/ C .n C 1/ ˛ 0 C n 00
v .n/ :
Since the two expressions must be equal, we obtain after small rearrangements:
n
v .nC2/ C 0
˛ v .nC1/ .n C 1/ ˛ 0 C 00
v .n/ D 0:
2
To obtain a DE for Qn we recall that, according to the Rodrigues formula (4.94) we
are set to prove here, v .n/ is supposed to be proportional to wQn . Therefore, we have
to replace v .n/ with wQn in the above DE for the final rearrangement (and ignoring
the constant prefactor between them):
n
.wQn /00 C 0
˛ .wQn /0 .n C 1/ ˛ 0 C 00
wQn D 0:
2
Performing differentiation and using repeatedly Eq. (4.88) to express derivatives of
the weight function via itself, w0 D w˛= and
˛ 0 ˛ ˛0 ˛ 0
wh 0 ˛ i
w00 D w D w0 Cw w 2
D ˛ C ˛ 0
;
we obtain exactly DE (4.90) for Qn .x/ with the same prefactor (4.91) to Qn . This
proves the Rodrigues formula in the very general case.
We know from the very beginning that one can define a generating function G.x; t/
for Legendre polynomials, Eq. (4.50), which can then be used as a starting point in
deriving all their properties. In the current section we have taken a different approach
by deriving the polynomials from the weight function. Still, it is important to show
that a generating function can be constructed in the very general case as well. This
can easily be shown using the Rodrigues formula.
Indeed, consider the function of two variables:
1
X Qn .x/ tn
G.x; t/ D ; (4.95)
nD0
Cn nŠ
where Cn is the constant prefactor in the Rodrigues formula, Eq. (4.94). Using the
Rodrigues formula, we first rewrite G.x; t/:
1
1 X tn d n
G.x; t/ D .w n
/; (4.96)
w.x/ nD0 nŠ dxn
338 4 Special Functions
and then use the Cauchy formula (2.74) for the n-th derivative:
1 I I 1
1 X tn nŠ w.z/ n
.z/ 1 1 w.z/ X .z/t n
G.x; t/ D dz D dz;
w.x/ nD0 nŠ 2 i L .z x/ nC1 w.x/ 2 i L z x nD0 z x
where L is some contour in the complex plane that surrounds the point x; it is to be
chosen such that the function w.z/, when analytically continued into the complex
plane from the real axis, is analytic inside L including L itself ( .z/D 0 C 1 zC 2 z2
is obviously analytic everywhere). Assuming that j t= .z x/j < 1 (this can
always be achieved by choosing t sufficiently small), we can sum up the geometric
progression inside the integral to obtain
I I
1 1 w.z/ 1 1 1 w.z/
G.x; t/ D .z/t
dz D dz:
w.x/ 2 i L zx1 w.x/ 2 i L z x .z/t
zx
At this point we need to understand where the roots are for small enough t. To this
end, let us expand the square root in terms of t up to the first order:
q
.1 t 1 /2 4t 2 .t 0 C x/ D 1 . 1 C 2 2 x/ t C :
If .x/ is a constant or a first order polynomial, then there is only one root of f .z/
which is easily seen to be always close to x for small t. Hence, the formula derived
above is valid formally in this case as well, where z is the root in question.
We have already discussed the general theory of functional series in Sect. I.7.2. We
also stated in Sect. 4.3.1.2 that “good” functions f .x/ can be expanded in Legendre
polynomials since these form a complete set. This is actually true for any orthogonal
polynomials. This is because the orthogonal polynomials Qn .x/, n D 0; 1; 2; : : :, as
it can be shown, form a closed set which is the necessary condition for them to form
a complete set.
To explain what that means, consider functions f .x/ for which the integral
Z b
f 2 .x/w.x/dx < C1:
a
imply that f .x/ D 0, then the functions fQn .x/; n D 0; 1; 2; : : :g form a closed
set. This is analogous to a statement that if a vector in an n-dimensional space is
orthogonal to every basis vector of that space, then this vector is a zero vector, i.e.
the collection of basis vectors is complete.
The point about the orthogonal polynomials is that any family of them (i.e. for
the given ˛.x/, .x/, see the following subsection) forms such a closed set, and
hence any function f .x/ for which integrals
Z b Z b
2 2
f .x/w.x/dx and f 0 .x/ w.x/ .x/dx
a a
z x t .z/ D tz2 x C .z t/ D 0;
By replacing 2t ! t in the last two formulae, we obtain the usual definition (4.50)
of the generating function for the Legendre polynomials.
Calculations for other classical polynomials are performed in the same way.
These are considered in the following Problems.
which is the expression that is sometimes used to define the Hermite polynomi-
als.
Problem 4.37. Verify using the method of the generating function and the
Rodrigues formula that several first Hermite polynomials are
Problem 4.39. Verify using the method of the generating function and the
Rodrigues formula that several first Laguerre polynomials are
1 2
L0 D 1 I L1 D 1x I L2 D
x 4xC2 I
2
1 3 1 4
L3 D x C9x2 18xC6 I L4 D x 16x3 C72x2 96xC24 : (4.107)
6 24
342 4 Special Functions
./
Problem 4.40 (Generalised Laguerre Polynomials Ln .x/). The only dif-
ference with the previous case is that ˛.x/ D x. Show that in this case
w.x/ D x ex , and by choosing Cn D 1=nŠ we get
It is seen from the above formulae that the Legendre polynomials can indeed be
.0;0/
obtained from Jacobi ones at D D 0, i.e. Pn .x/ D Pn .x/. Chebyshev poly-
.1=2;1=2/
nomials, Tn .x/, follow by choosing D D 1=2, i.e. Tn .x/ Pn .x/.
In fact, it can be shown that Jacobi, Hermite and Laguerre polynomials cover all
possible cases of orthogonal polynomials.
It is possible to derive an explicit expression for the Jacobi polynomials (4.115).
This is done by applying the Leibnitz formula (4.79) when differentiating n times
the product of .1 x/nC and .1 C x/nC ,
n h
dn h i X n i.k/ .nk/
n
.1 x/ nC
.1 C x/ nC
D .1 x/nC .1 C x/nC :
dx k
kD0
nC
where are generalised binomial coefficients (2.90), and similarly
k
.nk/ nC
.1 C x/nC D .n k/Š .1 C x/Ck ;
nk
we obtain
n
1 X Cn Cn
P.;/ .x/ D .x 1/nk .x C 1/k ; (4.116)
n
2n kD0 k nk
from which it is evident that this is indeed a polynomial for any real values of
and . Note that the obtained expression is formally valid for any and
including negative integer values for which some values of k in the sum are cut
off. This would happen automatically because of the numerators in the generalised
binomial coefficients. The obtained expression can be used for writing down an
explicit formula for, e.g. Legendre polynomials (when D D 0).
344 4 Special Functions
ˇ.z/ 0 .z/
y00 .z/ C y .z/ C 2 y.z/ D 0; (4.117)
.z/ .z/
where ˇ.z/ is a polynomial of up to the first order, while .z/ and .z/ are
polynomials of the order not higher than two. This equation is called a generalised
equation of the hypergeometric type.
For generality we shall consider solutions in the complex plane, i.e. z in the DE
above is complex. This type of equation is frequently encountered when solving
partial differential equations (PDEs) of mathematical physics using the method
of separation of variables. We shall consider a number of examples which would
emphasise this point in Sect. 4.7. Here, however, we shall simply try to investigate
solutions of the above equation. More specifically, we shall investigate under which
conditions its solutions on the real axis (when z D x) are bound (limited) within a
particular interval of x between a and b; note that the latter boundaries could be also
1 and/or C1. This consideration is very important when obtaining physically
meaningful solutions because in physics we normally expect the solutions not to be
infinite in the spatial region of interest.
where .z/ is an unknown polynomial of the first order which we shall try to select
in order to perform the required transformation. Since
0 0 00 0 2 00 0 0 0 2 0 2
D H) D C D C ;
.z/ 0 .z/
u00 .z/ C u .z/ C 2 u.z/ D 0; (4.120)
.z/ .z/
where
are the first and the second order polynomials, respectively. The unknown polyno-
mial .z/ is now selected in a specific way so that D with some constant .
This must be possible as the polynomial .z/ in (4.121) only depends on the
unknown first order polynomial .z/ D 0 C 1 z (with two unknown parameters 0
and 1 ) and .z/, both polynomials on the left- and right-hand sides of the equation
D are of the second order, so that equating the coefficients to z0 , z1 and z2 on
both sides in this equation should give three algebraic equations for , 0 and 1 .
Although this procedure would formally allow us to transform Eq. (4.120)
into the required standard form (compare with Eq. (4.92) for classical orthogonal
polynomials),
where
k D 0 (4.124)
346 4 Special Functions
is a constant (recall that by our assumption the function .z/ is a first order
polynomial and hence its derivative is a constant). In order for the function .z/ to be
a first order polynomial, both terms in the right-hand side of Eq. (4.123) have to be
polynomials up to the first order. The free term .ˇ 0 / =2 is already a first order
polynomial, but we also have to ensure that the square root is a first order polynomial
as well. The expression under the square root, as it can easily be seen, is a second
order polynomial. However, the square root of it may still be an irrational function.
The square root is going to be a first order polynomial if and only if the second order
polynomial under the root is the exact square of a first order polynomial. Then the
square root is a rational function which is a first order polynomial.
Equipped with this idea, we write the expression inside the square root explicitly
as a quadratic polynomial, R2 .z/ D a0 C a1 z C a2 z2 (with the coefficients a0 , a1 and
a2 which can be expressed via the corresponding coefficients of the polynomials
.z/, .z/ and ˇ.z/), and make it up to the complete square:
a1 2 a21
R2 D a2 z C C D; D D a0 :
2a2 4a2
This polynomial is going to be an exact square if and only if the constant term is
zero: D D 0. This procedure gives possible values for the constant k. There could
be more than one solution. Once k is known, we find the complete function .z/
from (4.123), and hence by solving Eq. (4.119) obtain the transformation function
.z/. The prefactors .z/ and of the new form of the DE (4.122) are then obtained
from (4.121) and (4.124).
Example 4.2. I As an example, consider the transformation of the following DE
z C 1 0 .z C 1/2
y00 C y C yD0
z z2
0
p p
i 1i 3 1i 3
D D H) ln D i ln z z
z z 2 2
h z p i
H) .z/ D zi exp 1 i 3 :
2
p
Finally, we calculate the prefactor .z/ from (4.121) yielding D .1 2i/ C i 3z.
Now the initial DE accepts a new form (4.122) for u.z/ with the above values of
and .z/. Once the solution u.z/ of the new equation is obtained, the required
function y.z/ can be found via y.z/ D .z/u.z/. J
We shall come across several more examples of this transformation below.
In this section we shall study (mostly polynomial) solutions of the DE (4.122). This
DE is called the DE of a hypergeometric type.
Let u.z/ be a solution of such an equation. It is easy to find the DE which is
satisfied by the n-th derivative of u, i.e. by the function gn .z/ D u.n/ .z/. Indeed,
using Leibnitz formula (4.79), one can differentiate the DE (4.122) n times recalling
that .z/ and .z/ are polynomials of the second and first orders, respectively. We
have
.n/ 1
u00 D u.nC2/ C n 0 u.nC1/ C n .n 1/ 00 .n/
u ;
2
0 .n/
u D u.nC1/ C n 0 u.n/ ;
where
1
n .z/ D n 0 .z/ C .z/ and n D C n 0 C n .n 1/ 00
(4.126)
2
are a first order polynomial and a constant.
In particular, if n D 0 for some integer n, then one of the solutions of the
DE (4.125), g00n C n g0n D 0, is a constant. If gn D u.n/ .z/ is a constant, then surely
this in turn means that u.z/ must be an n-th order polynomial. We see from here
immediately that if the constant takes on one of the following eigenvalues,
1
n D n 0 n .n 1/ 00
; (4.127)
2
348 4 Special Functions
where
and
1 nCm1
m D n C m 0 C m .m 1/ 00
D .n m/ 0 C 00
: (4.130)
2 2
It is explicitly seen from this that when m D n we have n D 0, as it should be.
We shall now rewrite DEs (4.122) and (4.128) in the self-adjoint form in which
the terms with the second and first order derivatives are combined into a single
expression. To this end, let us multiply (4.122) and (4.128) by some functions w.z/
and wn .z/, respectively, so that the two DEs would take on the self-adjoint form
each:
0 0
wu0 C wu D 0 and wm vm0 C m wm vm D 0; (4.131)
Problem 4.44. Show that functions w.z/ and wm .z/ must satisfy the following
DEs:
Problem 4.45. Using expression (4.126) for m .z/, manipulate the equation
for wm into w0m =wm D w0 =w C m 0 = and hence show that
wm .z/ D m
.z/w.z/; m D 0; 1; 2; : : : ; n (4.133)
(when integrating, an arbitrary constant was set to zero here which corresponds
to the prefactor of one in the relation above).
4.4 Differential Equation of Generalised Hypergeometric Type 349
The above equations should help derive an explicit expression for the polynomi-
als un .z/ corresponding to n D 0 for a particular (positive integer) value of n. Note
.m/
that m ¤ 0 for m < n. Recalling that vm D un .z/ is the m-th derivative of the
.mC1/
polynomial un .z/, we can write vmC1 D un D vm0 . Also,from (4.133),
we have
wmC1 D mC1 w D wm , and therefore, .wmC1 vmC1 /0 D wm vm0 . By virtue of
the second DE in (4.131), this expression should also be equal to m wm vm , i.e. we
obtain the recurrence relation:
1
wm vm D .wmC1 vmC1 /0 : (4.134)
m
.n/
Expressing vm from the left-hand side and realizing that vn D un .z/ is a constant,
we obtain
Cnm
vm D un.m/ .z/ D .wn .z//.nm/ ; (4.135)
wm .z/
where
" n1 #1 2 31 2 3
Y Y
n1
Y
m1
Mm
Cnm Du.n/
n .k / Du.n/
n
4 j 5 4 j 5 D u.n/
n
kDm jD0 jD0
Mn
(4.136)
is some constant prefactor, and
Y
k1 Y Y
k1 nCj1 nŠ
k1
nCj1
Mk D j D .n j/ 0 C 00
D 0 C 00
:
2 .n k/Š 2
jD0 jD0 jD0
(4.137)
By definition it is convenient to assume that M0 D 1. Formula (4.135) demonstrates
that the polynomials and their derivatives are closely related.
In particular, when m D 0, one obtains an explicit (up to a constant prefactor)
expression for the polynomial solution of the DE (4.122) we have been looking for:
Cn0 Cn0
un .z/ D .wn .z//.n/ D . n
.z/w.z//.n/ ; (4.138)
w.z/ w.z/
350 4 Special Functions
where we have made use of Eq. (4.133). This is the familiar Rodrigues for-
mula (4.94). The prefactor
.n/
M0 un
Cn0 D u.n/
n D
Mn Mn
contains Mn , for which we can write down an explicit expression from (4.137), and
.n/
a constant prefactor un which is set by the normalisation of the polynomials (this
will be discussed later on).
It is seen that we have recovered some of our previous results of Sect. 4.3.2 using
a rather different approach which started from the DE itself.
It is easy to show that on the real axis the polynomial functions un .x/ correspond-
ing to different values of n in Eq. (4.122) are orthogonal if the following condition
is satisfied at the boundary points x D a and x D b of the interval:
ˇ
.z/w.z/zk ˇzDa;b D 0; (4.139)
Problem 4.46. Using the method developed in Sect. 4.3.1.2 when we proved
orthogonality of Legendre polynomials, show that on the real axis two solutions
un .x/ and um .x/ of the DE (4.122) with n and m , respectively, are orthogonal:
Z b
w.x/un .x/um .x/dx D 0; n ¤ m: (4.140)
a
[Hint: instead of Eq. (4.122) use its self-adjoint form (the first equation (4.131))
and then note that either un .x/ or um .x/ consists of a sum of powers of z.]
Problem 4.47. Similarly, consider the DE (4.128) for the m-th derivative,
.m/
vm .x/ D un .x/, of the polynomial un .x/, which obviously is also a polynomial.
Show that these are also orthogonal with respect to the weight function
wm .x/ D m .x/w.x/ for different n and the same m:
Z b
.m/
w.x/ m
.x/un.m/ .x/uk .x/dx D 0; k ¤ n: (4.141)
a
[Hint: also use the self-adjoint form for the DE, the second equation in (4.131).]
Let us now derive the relationship between the normalisation integral (4.84) for
the polynomials,
Z b
Dn0 D w.x/u2n .x/dx; (4.142)
a
4.4 Differential Equation of Generalised Hypergeometric Type 351
This can be done by considering the second DE in Eq. (4.131) for the function
.m/
vm .x/ D un .x/ which we shall write using (4.134) as
.wmC1 vmC1 /0 C m wm vm D 0:
Multiplying both sides of it by vm .x/ and integrating between a and b and applying
integration by parts for the derivative term we obtain
Z b Z b
.wmC1 vmC1 vm /jba wmC1 vmC1 vm0 dx C m wm vm2 dx D 0:
a a
The first term is zero due to the boundary condition (4.139). In the second term
vm0 .x/ D vmC1 .x/, and hence we immediately obtain a recurrence relation: Dn;mC1 D
m Dnm . Repeatedly applying this relation, Dnm can be directly related to Dn0 :
!
Y
m1
Dnm D m1 Dn;m1 D m1 m2 Dn;m2 D D k Dn0 D .1/m Mm Dn0 ;
kD0
(4.144)
which is the required relationship. The quantity Mm was defined earlier by
Eq. (4.137). Recall that m are given by Eq. (4.130). We shall employ this identity in
the next section to calculate the normalisation integral for the associated Legendre
functions.
Another useful application of the above result is in calculating the normalisation
Dn0 of the polynomials. Indeed, setting m D n in Eq. (4.144) and noticing that
Z Z
b 2 2 b
Dnn D w.x/ n
.x/ u.n/
n dx D u.n/
n w.x/ n
.x/dx;
a a
we obtain
2 2
un
.n/ Z b
.n/
an nŠ Z b
Dn0 D .1/ n
w.x/ n
.x/dx D .1/ n
w.x/ n
.x/dx:
Mn a Mn a
(4.145)
.n/
Here we have made use of the fact that is a constant and hence can be taken
un
.n/
out of the integral. Also, this constant can trivially be related to the coefficient an
.n/ .n/
to the highest power xn in the polynomial un .x/ as un D an nŠ, leading finally
to the above relationship for the normalisation integral. Therefore, what is required
for the calculation of the normalisation Dn0 is the knowledge of the highest power
.n/
coefficient an and the calculation of the integral of w n . These can be obtained in
each particular case of the polynomials as is done in the next section.
352 4 Special Functions
We recall from Sect. 4.4.1 that when transforming the original DE (4.117)
into the standard form (4.122), several cases for choosing the constant k and the
polynomial .x/ might be possible. The obtained above result for the normalisation
constants may help in narrowing down that uncertainty. Indeed, consider the case
of n D m D 1. Then, from (4.144) it follows that D11 D 0 D10 , where
0 D 0 jnD1 D 0 , see Eq. (4.130). Note that both quantities, D11 and D10 , must
be positive, see Eqs. (4.142) and (4.143). Therefore, 0 must be negative. Let us
remember this result. This is a necessary condition which can be employed when
choosing particular signs for k and the first order polynomial .x/ when applying
the transformation method of Sect. 4.4.1.
So far we have discussed mostly orthogonal polynomials as solutions of the
hypergeometric type equation (4.122). Polynomial solutions correspond to partic-
ular values of given by the eigenvalues of Eq. (4.127). It is also possible to
construct solutions of such an equation for other values of . Although we are not
going to do this here as it goes way beyond this course, we state without proof a
very important fact that complete solution of the original generalised equation of
the hypergeometric type (4.117) on the real axis corresponding to other values of
than those given by Eq. (4.127) is not bound within the interval a x b.
Only solutions corresponding to the eigenvalues n from (4.127), i.e. orthogonal
polynomials, result in bound solutions of the original equation (4.122). In other
words, only such solutions are everywhere finite in the interval of their definition,
any other solutions will indefinitely increase (decrease) within that interval or at its
boundaries. As was mentioned at the beginning of this section, this is extremely
essential in solving physical problems when quantities of interest can only take on
finite values.
Here we shall revisit Jacobi, Hermite and Laguerre polynomials using the general
theory developed above. We shall derive their explicit recurrence relations and
establish their normalisation.
We shall first consider the question of normalisation. As an example, let us look
first at the Legendre polynomials. Their DE is given by Eq. (4.65), it is already in
the standard form with .x/ D 1 x2 , .x/ D 2x and n D n .n C 1/. We also
know that the weight function in this case is w.x/ D 1. Hence from Eq. (4.137) we
can calculate
nŠ Y .1/n nŠ Y
k1 k1
nCj1
Mk D 2 C .2/ D .n C j C 1/
.n k/Š jD0 2 .n k/Š jD0
and hence Mn D .1/n .2n/Š. We also need the integral in Eq. (4.145),
R1
2 n
1 1 x dx, which has been calculated before, see Eq. (4.43). The final
.n/
ingredient is the coefficient an to the xn term in the polynomial.
To find it, consider
n
the Rodrigues formula (4.77) in which we shall expand x2 1 into the Taylor’s
series and differentiate each term n times:
n
1 h 2 n i.n/ 1 X n .n/
x 1 D .1/nk x2k :
2 nŠ
n 2 nŠ kD0 k
n
The term with the highest power of x (the term with xn ) arises when k D n, and the
required coefficient is
1 n 1
a.n/
n D .2n/ .2n 1/ : : : .2n n C 1/ D n .2n/ .2n 1/ : : : .n C 1/
2 nŠ
n n 2 nŠ
1 .2n/Š
D ;
2n nŠnŠ
so that
.2n/Š
a.n/
n D : (4.146)
2n .nŠ/2
Collecting all our findings in Eq. (4.145), we obtain the final result,
2 2nC1
1 .2n/Š 2 .nŠ/2 2
Dn0 D .1/n 2
nŠ D ;
.1/ .2n/Š 2 .nŠ/
n n .2n C 1/Š 2n C 1
a.n/
n D2 :
n
(4.147)
Problem 4.49. The generalised Laguerre polynomials Ln .x/ defined for x 0
are specified by the Rodrigues formula (4.109); they satisfy the DE (4.108). In
this case: D x, D 1Cx and w D x ex . Show by repeated differentiation
of xnC ex in the Rodrigues formula that the term with the highest power of x
arises when differentiating the exponential function only, and hence in this case
.1/n
a.n/
n D : (4.149)
nŠ
Correspondingly, verify that the normalisation is
.n C C 1/
Dn0 D : (4.150)
nŠ
.;/
Problem 4.50. Consider the Jacobi polynomials Pn .x/ given by Rodrigues
formula (4.115), defined on the interval 1 x 1 and satisfying the
DE (4.114). The coefficient to the highest power of x (which is xn ) can be
obtained by considering the limit
a.n/ n .;/
n D lim x Pn .x/ : (4.151)
x!1
Use this formula in conjunction with the general expression (4.116) for the
Jacobi polynomials and formula (2.90) for the generalised binomial coeffi-
cients, to find that
1 2n C C
a.n/
n D : (4.152)
2n n
2CC1 .n C C 1/ .n C C 1/
Dn0 D : (4.153)
nŠ .2n C C C 1/ .n C C C 1/
[Hint: when calculating the integral appearing in Eq. (4.145), relate it to the
beta function using the method employed in deriving Eq. (4.43), and then
repeatedly use the recurrence relation for the gamma function.]
The last point of the general theory which remains to be considered sys-
tematically concerns the derivation of the recurrence relation for all classical
polynomials. We have considered in detail various recurrence relations for Legendre
polynomials in Sect. 4.3.1.1 (see Eq. (4.55) in particular), and a general formula
4.4 Differential Equation of Generalised Hypergeometric Type 355
for any orthogonal polynomials has been derived in Sect. 4.3.2.2. It follows from
Eq. (4.87) derived in the latter section that in order to set up explicitly the recurrence
relation between any three consecutive polynomials, one needs to know both the
.n/ .n/
coefficients an and an1 , and the normalisation Dn0 . We have calculated the former
and the latter above in this section; however, we still need to calculate the coefficient
.n/
an1 to the power xn1 .
This task can be accomplished easily if we notice that
.n/ .n/
un .x/ D a.n/
n x C an1 x
n n1
C H) u.n1/
n D a.n/
n nŠ x C an1 .n 1/Š:
.n/
un Mn1 Mn1 w0n
vn1 D u.n1/ .x/ D .wn .x//.1/ D a.n/
n nŠ ;
n
wn1 .z/ Mn Mn wn1
where we have made use also of Eq. (4.136). However, because of the second
equation in (4.132) and of Eq. (4.129), we can write
w0n wn w0n wn n 0
D D
wn1 wn1 wn wn1
0
w n n 0 0 0 0
D D n Dn C D .n 1/ C :
w n1
Also, from (4.137),
Mn1 1 1
D D 0 00
;
Mn n1 C .n 1/
.n/ .n/ C .n 1/ 0
a.n/
n nŠ x C an1 .n 1/Š D an nŠ
0 C .n 1/ 00
.n/ 0
an1 C .n 1/
H) .n/
Dn x : (4.154)
an 0 C .n 1/ 00
Note that the first term in the square brackets above is the first order polynomial
which must start from x and cancel out the second term giving the required constant
ratio of the two coefficients.
As an example, let us consider the case of Laguerre polynomials Ln .x/ D Ln0 .x/.
.n/ .n/
For them D x and D 1 x, so that an1 =an D n2 . According to
.nC1/ .nC1/
recurrence relation (4.87), we also need the ratio an =anC1 , which is obtained
from the previous one by the substitution n ! n C 1. Then, from (4.150) we have
356 4 Special Functions
Dn0 D .n C 1/=nŠ D 1 and according to (4.149) we know the expressions for the
.n/ .n1/ .nC1/
coefficients an , an1 and anC1 . Hence the recurrence relation (4.87) reads in this
case:
Problem 4.52. Prove that for the generalised Laguerre polynomials, Ln .x/,
the recurrence relation reads
xLn D .n C / Ln1
C .2n C C 1/ Ln .n C 1/ LnC1
: (4.156)
Problem 4.53. Prove that the recurrence relation for the Hermite polynomi-
als is
1
2xHn D 2nHn1 C HnC1 : (4.157)
2
Problem 4.54. Prove that the recurrence relation for the Legendre polynomi-
als is given by Eq. (4.55).
Problem 4.55. Prove that the recurrence relation for the Jacobi polynomials
is
2.nC/.nC/ .;/ 2 2
xP.;/ D Pn1 C P.;/
n
.2nCCC1/.2nCC/ .2nCC/.2nCCC2/ n
2.n C 1/.n C C C 1/ .;/
C P : (4.158)
.2n C C C 2/.2n C C C 1/ nC1
Here we shall consider special functions which have a solid significance in many
applications in physics. They are related to Legendre polynomials and are called
associated Legendre functions. We shall see in Sect. 4.5.3 that these functions
appear naturally while solving the Laplace equation in spherical coordinates;
similar PDEs appear in various fields of physics, notably in quantum mechanics,
electrostatics, etc.
4.5 Associated Legendre Function 357
where m D 0; ˙1; ˙2; : : :. We require solutions of the above equation which are
bound within the interval 1 x 1. Whether or not there are such solutions of the
DE would certainly depend on the values of the parameter . Therefore, one task
here is to establish if such particular values of exist that ensure the solutions are
bound, and if they do, what those eigenvalues are. Finally, we would like to obtain
the bound solutions explicitly.
To accomplish this goal, we shall use the method developed above in Sect. 4.4.
Using notations from this section, we notice that this equation is of the generalised
2
2
2 type (4.117) with .x/ D 1 x , ˇ.x/ D 2x and .x/ D
hypergeometric
1 x m . It can be transformed into the standard form (4.122) by means of the
transformation ‚.x/ D .x/u.x/ with the transformation function .x/ satisfying
Eq. (4.119) where the first order polynomial .x/ is determined from Eq. (4.123).
The polynomial .x/ has the form
p
D ˙ .k / .1 x2 / C m2 ; (4.160)
where k D 0 . Here , .x/ and .x/ D ˇ.x/ C 2.x/ (see Eq. (4.121)) enter the
DE (4.122) for u.x/. We now need to find such values of k that would guarantee .x/
from (4.160) to be a first order polynomial. It is easily seen that two cases are only
possible: (1) k D , in which case .x/ D ˙m, and (2) k D m2 , in which case
.x/ D ˙mx. We therefore have four possibilities, some of them would provide us
with the required solution.
Let us specifically consider the case of m 0 and choose .x/ D mx which
yields .x/ D 2x 2mx D 2 .m C 1/ x. This choice guarantees that 0 < 0
2
as required (see the end of Sect. 4.4.2). This case corresponds to k D m2 .
0
The transformation function .x/, satisfying the DE = D = D mx= 1 x
from (4.119), is immediately found to be (up to an insignificant multiplier)
m=2
.x/ D 1 x2 : (4.161)
Next,
D k C 0 D k m D m .m C 1/ : (4.162)
has non-trivial bound solutions only if the parameter takes on the (eigen)values
from Eq. (4.127), i.e. n D 2n .m C 1/ C n .n 1/, where n D 0; 1; 2; : : : is a
positive integer which can be used to number different solutions u.x/ ! un .x/.
Consequently, the required values of become, from (4.162),
H) n D n C m .m C 1/ D l .l C 1/ with l D n C m: (4.163)
As n 0, we should have l m 0.
Next, we shall find the weight function w.x/. It satisfies the first DE given
in (4.132), i.e.
w0 2mx m
. w/0 D w H) D H) w.x/ D 1 x2 : (4.164)
w 1 x2
m=2 h l i.lm/
‚lm .x/ D .x/ul .x/ D Clm 1 x2 1 x2 ; (4.165)
where in writing the constant factor Clm and the solution itself, ‚lm .x/, we indicated
specifically that they would not only depend on l, but also on the value of m. The
obtained functions are called associated Legendre functions and denoted Pm l .x/
because they have a direct relation to the Legendre polynomials Pl .x/ as we shall
see in a moment. Choosing appropriately the proportionality constant in accord with
tradition, we write
1 m=2 dlm l
Pm
l .x/ D 1 x2 1 x2 : (4.166)
2 lŠ
l dx lm
m=2
It may seem that because of the prefactor 1 x2 this function is infinite (not
l
bound) at the boundary points x D ˙1; however, this is not the case as 1 x2 is
4.5 Associated Legendre Function 359
differentiated l m times yielding the final power of 1 x2 being larger than m=2.
We can also see this point more clearly by directly relating these functions with the
Jacobi polynomials. This can be done in the following way:
.l m/Š
2 m=2 .m;m/
Pm
l .x/ D .1/
lCm
1 x Plm .x/; (4.167)
2m lŠ
as can be easily checked by comparing (4.166) with the Rodrigues formula (4.115)
for the Jacobi polynomials.
Replacing formally m ! m in the above formula, another form of the
associated Legendre functions is obtained:
1
2 m=2 d
lCm
2 l
2 m=2 d
m
l .x/ D 1 1 D .1/ 1 Pl .x/;
l
Pm x x x
2l lŠ dxlCm dxm
(4.168)
which shows the mentioned relationship with the Legendre polynomials. Above we
have made use of the Rodrigues formula (4.77) for these polynomials. In particular,
P0l .x/ D .1/l Pl .x/. This other form is also bound everywhere within 1 x 1
and is also related to the Jacobi polynomials:
We need to show though that Pm l .x/ (where still m 0) is a solution of the associated
Legendre equation. In fact, we shall show that by proving that the two functions,
Pm
l .x/ and Pl .x/ (where m
m
0), are directly proportional to each other.
To demonstrate this fact, let us derive an explicit expression for the function
Pml .x/ inspired by some tricks we used when deriving explicit expression (4.116)
l
for the Jacobi polynomials. Writing 1 x2 as .1 x/l .1 C x/l and performing
differentiation with the help of the Leibnitz formula (4.79), we obtain
lCm
1 X
2 m=2 lCm .k/ .lCmk/
l .x/ D
Pm 1 x .1 x/l .1 C x/l ;
2l lŠ kD0
k
where
.k/ lŠ
.1 x/l D .1/k .1 x/lk for k l;
.l k/Š
.lCmk/ lŠ
.1 C x/l D .1 C x/km ;
.k m/Š
360 4 Special Functions
which is only non-zero for k m and hence it limits the values of k from below.
Therefore, finally:
lŠ .lCm/Š X
2 m=2
l
.1/k
l .x/D
Pm 1x .1x/lk .1Cx/km :
2l kDm
kŠ .lCmk/Š .lk/Š .k m/Š
(4.170)
lŠ .lm/Š X
2 m=2
lm
.1/k
Pm
l .x/D 1x .1x/lkm .1 C x/k :
2l kD0
kŠ .lmk/Š .lk/Š .kCm/Š
(4.171)
Now, changing the summation index k ! k C m, rearrange the sum and
then derive the following relationship between the two representations of the
associated Legendre function:
.l m/Š m
Pm
l .x/ D .1/
m
P .x/; l m 0: (4.172)
.l C m/Š l
We see that the two functions are proportional to each other and hence are
solutions of the same DE for the same value of m.
Problem 4.57. Derive several first associated Legendre functions:
p p
P11 D 1 x2 I P12 D 3x 1 x2 I P22 D3 1 x2 I
3 p 3=2
P13 D 5x2 1 1 x2 I P23 D15x 1 x2 I P33 D 15 1 x2 I
2
5p 15 2
P14 .x/ D 1 x2 7x3 3x I P24 D 7x 1 1 x2 I
2 2
3
2 3=2
2
P4 D 105x 1 x I P4 D105 1 x2 :
4
(continued)
4.5 Associated Legendre Function 361
m
It should now be obvious that one can equivalently use either Pml .x/ or Pl .x/ as
a solution of the associated Legendre equation. It is customary to use the function
Pml .x/ with the non-negative value of m.
Note that Pm l .x/ is not the only solution of the differential equation (4.159).
However, the other solution is infinite at x ! ˙1 and thus is not acceptable for
many physical problems. Hence it will not be considered here.
Here we shall consider some properties of the functions Pm l .x/. Firstly, let us show
0
that the functions Pm
l .x/ and Pm
l0 .x/ are orthogonal for l ¤ l and the same m:
Z 1
0
Ill0 D l .x/Pl0 .x/dx D 0; l ¤ l :
Pm m
(4.173)
1
But the integral is nothing but the orthogonality condition (4.140) written for
.m;m/
the
Jacobi polynomials Pn .x/, for which the weight function is w.x/ D
m
1 x2 (see Problem 4.43). Therefore, it is equal to zero.
Problem 4.59. Derive the orthogonality condition for the associated Legendre
functions exploiting the relationship (4.168) between them and the Legendre
polynomials and the orthogonality condition (4.141) for the derivatives of
the polynomials.
Next, let us derive the normalisation integral for the associated Legendre
functions:
Z Z
1
2
1 m h .m/ i2
Dlm D l .x/
Pm dx D 1 x2 Pl .x/ dx:
1 1
362 4 Special Functions
.m/
The functions Pl .x/ are the m-th derivatives of the Legendre polynomials. The
latter are characterised by the unit weight function w.x/ D 1, .x/ D 2x and
.x/ D 1 x2 . The weight wm .x/ associated with the m-th derivative
m of Pl .x/,
according to Eq. (4.133), must be wm .x/ D m .x/ D 1 x2 . Therefore, the
integral above is the normalisation integral (4.143) for the m-th derivative of the
Legendre polynomials Pl .x/, and hence we can directly use our result (4.144) to
relate this to the normalisation Dl0 of the functions Pl .x/ themselves:
!
Y
m1
Dlm D k Dl0 :
kD0
For Legendre polynomials Dl0 D 2= .2l C 1/, see Eq. (4.73), and, according
to (4.130),
1
k D .l k/ 0 .l k/ .l C k 1/ 00
2
D 2 .l k/ C .l k/ .l C k 1/ D .l k/ .l C k C 1/ ;
This result allows us to write the orthonormality condition for the associated
Legendre functions as:
Z 1
2 .l C m/Š
Ill0 D l .x/Pl0 .x/dx D
Pm m
ıll0 : (4.174)
1 2l C 1 .l m/Š
It is convenient to redefine the solutions ‚lm .x/ of the DE (4.159) in such a way that
their normalisation would be equal to one:
Z 1
‚lm .x/‚l0 m .x/dx D ıll0 : (4.175)
1
The associated Legendre functions we have encountered above form the main
component of the so-called spherical functions or spherical harmonics which
appear in a wide class of physical problems where PDEs containing the Laplacian
are solved in spherical coordinates. For instance, this happens when considering
central field problems of quantum mechanics (Sect. 4.7). Therefore, it is essential to
introduce these functions. It is natural to do this by considering the simplest problem
of the Laplace equation in spherical coordinates.
Consider the Laplace equation D 0. Solutions of such an equation are called
harmonic functions. We shall obtain these by considering the Laplace equation
in spherical coordinates .r; ; /. We shall learn in Sect. 7.9 that the Laplacian of
.r; ; / in the spherical coordinates can be written as:
1 @ 2@ 1 @ @ 1 @2
r C sin C D 0: (4.177)
r2 @r @r r2 sin @ @ r2 sin2 @ 2
First of all, we note that 1=r2 appears in all terms and hence can be cancelled.
Next, we shall attempt5 to separate the variables in Eq. (4.177). For that, we shall
be looking for solutions of this PDE which are in the form of a product of three
functions (a product solution),
each depending on its own variable. Substituting this product solution into the PDE
and dividing through by D R‚ˆ, we obtain
1 d 2 dR 1 d d‚ 1 1 d2 ˆ
r C sin C D 0:
R dr dr ‚ sin d d sin2 ˆd 2
It is seen that the part depending on the angle is “localised”: nowhere else in the
equation is there any dependence on the angle . Hence one can solve the above
equation with respect to this part:
1 d2 ˆ 2 1 d 2 dR sin d d‚
D sin r C sin : (4.179)
ˆd 2 R dr dr ‚ d d
The left-hand side of (4.179) depends only on , while the right-hand side only
depends on the other two variables .r; /. This is only possible if both sides are
5
We shall basically use the method described in more detail in Sect. 8.2.5.
364 4 Special Functions
equal to the same constant; let us call it the separation constant . Hence, we can
write the above equation equivalently as two equations:
1 d2 ˆ d2 ˆ
D H) C ˆ D 0; (4.180)
ˆd 2 d 2
and
21 d 2 dR sin d d‚
D sin r C sin :
R dr dr ‚ d d
It is seen that the variable was “separated” from the other two variables. Now we
need to separate the variables r and in the equation above. This is easily done by
dividing through both sides on sin2 :
1 d dR 1 d d‚
r2 D sin C : (4.181)
R dr dr ‚ sin d d sin2
In Eq. (4.181) all terms depending on r are collected on the left, while the right-hand
side depends only on . It follows, therefore, that both sides must be equal to the
same (separation) constant which this time we shall call . This way we arrive at
the following two final equations:
d 2 dR
r R D 0; (4.182)
dr dr
1 d d‚
sin C ‚ D 0; (4.183)
sin d d sin2
To find , we note that the function ˆ. / must be periodic with respect to its
argument with a period of 2 due to the geometric meaning of this angle. Indeed, if
we increase by 2 , we come back to the same point in space, and hence we must
require that the solution cannot change because of the transformation ! C 2 .
Therefore, .r; ; C 2 / D .r; ; /, which is ensured if ˆ. C 2 / D ˆ. /;
i.e. if ˆ. / is a periodic function with the period of 2 .
4.5 Associated Legendre Function 365
Let us consider Eq. (4.180) for different values of to verify which ones would
ensure such periodicity. When D m2 > 0, the solution of (4.180) is
ˆ. / D A sinh.p / C B cosh.p /:
In all these solutions A and B are arbitrary constants. It is readily seen that the
required periodicity of ˆ . / is only possible in the first case when ˆ. / is a sum
of sine and cosine functions. But even in that case, this may only happen if the
values of m D 0; ˙1; ˙2; : : : are integers. Hence, the eigenvalues are D m2
and the corresponding eigenfunctions are given by Eq. (4.184), i.e. these are either
ˆm D sin .m / or ˆm D cos .m / (or their arbitrary linear combinations). In fact,
it is very convenient to use the complex eigenfunctions ˆm D eim and ˆm D eim
instead. This is particularly useful for quantum-mechanical applications.
Using the exponential form of the eigenfunctions, it is especially easy to see that
they satisfy
Z 2
ˆm . / ˆm0 . /d D 2 ımm0 ; (4.185)
0
In order to simplify this equation, we shall change the variable using the
substitution x D cos , where 1 x 1. For any function f . /,
df df dx df df df
D D sin H) sin D 1 x2 ;
d dx d dx d dx
so that the first term in the DE above becomes
d d‚ d
2 d‚
2 d
2 d‚
sin sin D sin 1x D 1x 1x ;
d d d dx dx dx
366 4 Special Functions
which results in the following DE for ‚.x/ in terms of the new variable:
d2 ‚ d‚ m2
1 x2 2x C ‚ D 0; (4.187)
dx2 dx 1 x2
which is precisely the same equation as the one we considered in detail in Sect. 4.5.
Therefore, physically acceptable solutions of this DE which are bound everywhere
within the interval 1 x 1 are given by associated Legendre functions ‚lm .x/,
Eq. (4.176), with D l .l C 1/, where l D 0; 1; 2; : : :. With these values of this is
the only solution which is finite along the z axis ( D 0 for x D 1 and D for
x D 1).
A simple illustration of this requirement for the possible eigenvalues, D
l .l C 1/, can be given based on solving the DE (4.187) using the Frobenius method
around the point x D 0. We have shown in Sect. I.8.4.2 in Example I.8.9 when
considering series solutions of the Legendre equation (4.65) (i.e. for m D 0) that
a polynomial solution is only possible when D l .l C 1/ as in this case one of
the series of the general solution, i.e. either y1 .x/ or y2 .x/ in the general solution
y.x/ D C1 y1 .x/ C C2 y2 .x/ terminates. Hence it is guaranteed to remain finite at
the boundary points x D ˙1; the other series solution diverges at these points,
and hence should be rejected. The bound solution coincides with the Legendre
polynomials which are a particular case of the associated Legendre functions when
m D 0.
The above consideration brings us to the point where we can finally write down
the complete solution of the angular part of the Laplace equation in spherical polar
coordinates. These solutions are all possible products of the individual solutions of
the equations for ‚. / and ˆ. /, see Eq. (4.178). These products, ‚lm .x/ˆm . / D
‚lm .cos / ˆm . /, denoted as Ylm .; / or Ylm .; /, are widely known as spherical
harmonics. For m 0 these functions are defined by6
s
2l C 1 .l m/Š m
Ylm .; / D .1/ m
P .cos /eim : (4.188)
4 .l C m/Š l
Here the associated Legendre function is considered only for positive values of m,
i.e. 0 m l. At the same time, two possible functions ˆm . / D e˙im exist
for m > 0. That means that altogether 2l C 1 possible values of the index m
can be considered between l and l for each l D 0; 1; 2; 3; : : :, i.e. including the
6
Various sign factors, such as the .1/m factor we have in Eq. (4.188), can be also found in the
literature.
4.5 Associated Legendre Function 367
negative values as well, and hence 2l C 1 spherical harmonics can also be defined
for each l. The spherical harmonics for negative values m D 1; 2; : : : ; l are
defined such that
Ylm .; / D .1/m Ylm .; / : (4.189)
which is easily checked by making the substitution x D cos and using orthonor-
p
mality of the associated Legendre functions ‚lm .x/ and that of ˆm . / D eim = 2 .
Here the integration is performed over the so-called solid angle d D sin dd'
(which integrates to 4 over the whole sphere).
Problem 4.61. Show that a first few real spherical harmonics for l D 1; 2 are
r r r
3 3 3
S10 D Y10 D nz I S11 D nx I S11 D ny I (4.195)
4 4 4
r r r
15 15 5
S21 D nx nz I S21 D
ny nz I 0 0
S2 D Y2 D .3n2z 1/ I
4 4 16
r r
2 15 2 2
2 15
S2 D nx ny I S2 D nx ny ; (4.196)
16 4
where nx D sin cos , ny D sin sin and nz D cos are the components of
the unit vector n D r=r.
1 X
X l
f .r; ; '/ D flm .r/ Ylm .; /; (4.197)
lD0 mDl
where by virtue of the orthogonality of the spherical functions, Eq. (4.190), the
expansion coefficients are given by
Z 2 Z
m
flm .r/ D d' Yn .; '/ f .r; ; '/ sin d: (4.198)
0 0
Note that the expansion coefficients only depend on the length r D jrj of the
vector r.
We finish this section with another important result, which we shall also leave
without proof. If we define two unit vectors n1 and n2 with the angle D .n1 ; n2 /
between them, then
4 X l
Pl .cos / D Y m .1 ; 1 /Yl .2 ;
m
2/ ; (4.199)
2l C 1 mDl l
where and angles .1 ; 1 / and .2 ; 2 / correspond to the orientation of the first and
the second vectors, respectively, in the spherical coordinates.
4.6 Bessel Equation 369
Problem 4.62. Show that formula (4.199) remains invariant upon replacement
of the complex harmonics Ylm with the real ones:
4 X l
Pl .cos / D Sm .1 ; 1 /Sl .2 ;
m
2 /: (4.200)
2l C 1 mDl l
where s can only take on two values, s D ˙, giving rise to two independent
solutions. Recall from Sect. I.8.4 that when the difference of two s values is an
integer (which is 2 in our case), only a single solution may be obtained by this
method. Therefore, let us initially assume that 2 is not an integer. Show then
that several first terms in the expansion of the two solutions are
(continued)
370 4 Special Functions
the second independent solution is obtained from the first by the substitution
! .
Problem 4.64. The recurrence relation for the cr coefficients (4.202) can be
used to obtain a general expression for them. Prove using the method of
mathematical induction or by a repeated application of the recurrence relation
that for s D one can write
" 1
#
X .1/r x 2r
y1 .x/ D c0 x 1 C ; (4.205)
rD1
rŠ . C 1/ . C 2/ . C r/ 2
The above expansion can be written in a much more compact form if, as it is
usually done, c0 is chosen as c0 D 2 = . C 1/. In this case the first solution
is called Bessel function of order of the first kind and denoted J .x/. Using the
recurrence relation (4.31) for the gamma function, we can write
. C 1/ . C 1/ . C 2/ . C r/ D . C r C 1/ ;
Note that the very first term in the expansion (4.205) is now nicely incorporated in
the summation. The second solution of the DE corresponding to is obtained by
replacing ! in the first one:
1
X .1/r x 2r
J .x/ D : (4.207)
rD0
rŠ .r C 1/ 2
4.6 Bessel Equation 371
It is also the Bessel function of the first kind. Both functions for non-integer
are linearly independent and hence their linear combination, y.x/ D C1 J .x/ C
C2 J .x/, is a general solution of the DE (4.201).
Now we shall consider the case when 2 is an integer. This is possible in either
of the following two cases: (1) is an integer; (2) is a half of an odd integer
(obviously, half of an even integer is an integer and hence this is the first case).
Consider these two cases separately.
If D n is a positive integer, the solution Jn .x/ is perfectly valid as the
first independent solution. The function Jn .x/ contains the gamma function
.r n C 1/ which is equal to infinity for negative integer values, i.e. when
r n C 1 D 0; 1; 2; 3; : : : or simply r n 1 and is an integer. Since the
gamma function is in the numerator, contributions of these values of the index r in
the sum, i.e. of r D 0; 1; : : : ; n 1, are equal to zero and the sum can be started from
r D n:
ˇ ˇ
X1 ˇ ˇ 1
.1/r .x=2/2rn ˇ change summation index ˇ X .1/ .x=2/2kCn
kCn
Jn .x/ D Dˇ ˇD
rŠ .r n C 1/ ˇ r !k Drn ˇ .k C n/Š .k C 1/
rDn kD0
1
X .1/k .x=2/2kCn
D .1/n D .1/n Jn .x/; (4.208)
kŠ .k C n C 1/
rD0
i.e. the Bessel function of a negative integer index is directly proportional to the one
with the corresponding positive integer index, i.e. Jn .x/ is linearly dependent on
Jn .x/ and hence cannot be taken as the second independent solution of the DE. In
this case, according to general theory of Sects. I.8.4 and 2.8, one has to look for the
second solution in the form containing a logarithmic function (cf. Eq. (I.8.75) and
Problem I.8.36):
1
X
Kn .x/ D Jn .x/ ln x C g.x/; where g.x/ D xn cr xr : (4.209)
rD0
This function, Kn .x/, will be the second independent solution of the Bessel DE in
the case of D n being a positive integer (including D 0). It diverges at x D 0.
Substituting Kn .x/ into the DE (4.201) and using the fact that Jn .x/ is already its
solution, the following DE is obtained for the function g.x/:
x2 g00 C xg0 C x2 n2 g D 2xJn0 .x/: (4.210)
Problem 4.65. The modified Bessel function of the first kind I .x/ is related to
the Bessel function J .x/ via I .x/ D i J .ix/. Show that I .x/ satisfies the
following differential equation:
x2 y00 C xy0 x2 C 2 y D 0: (4.211)
Next, let us consider the case when D .2n C 1/ =2 D n C 1=2 is a half integer. In
this case, from Eq. (4.206),
1
X .1/r x 2rCnC1=2
JnC1=2 .x/ D (4.212)
rD0
rŠ r C n C 32 2
and
1
X .1/r x 2rn1=2
Jn1=2 .x/ D 1
: (4.213)
rD0
rŠ r n C 2 2
1 r "1 #
X .1/r 22rC1 rŠ x 2rC1=2 2 X .1/r 2rC1
J1=2 .x/ D p D x
rD0
rŠ .2r C 1/Š 2 x rD0 .2r C 1/Š
r
2
D sin x; (4.214)
x
where a use has been made of the Taylor’s expansion of the sine function (which is
the expression within the square brackets).
4.6 Bessel Equation 373
The expression (4.213) is well defined and hence can be used as the second
linearly independent solution of the Bessel DE, i.e. a general solution in the case
of a half integer reads y.x/ D C1 JnC1=2 .x/ C C2 Jn1=2 .x/.
Interestingly, the half integer Bessel functions, JnC1=2 .x/, are the only ones which
are bound on the real axis. We can investigate this by applying our general approach
for investigating DEs of hypergeometric type developed in Sect. 4.4.
Problem 4.67. Here we shall transform the Bessel DE (4.201) into the stan-
dard form (4.122). Show that in this case k D ˙2i and .x/ D ˙ .ix ˙ /.
Choosing k D 2i and .x/ D .iz C /, show that the DE in the standard
form is characterised by .x/ D x, .x/ D 2 C 1 2ix and D i .2 1/.
Next, show using expression (4.127) for the eigenvalues that, to guarantee
bound solutions, the values of should be positive half integer numbers, i.e.
they must satisfy D n C 1=2, where n is a positive integer.
p
p For instance, n D 0 corresponds to D 1=2, which gives J1=2 .x/ sin x= x D
x .sin x=x/ which is obviously finite at x D 0 pand tends to zero at x D ˙1. On
the other hand, the function J1=2 .x/ cos x= x tends to infinity when x ! 0
and is therefore not finite there. The possible values of we found in the Problem
guarantee that all functions JnC1=2 .x/ with any n D 0; 1; 2; : : : are bound for all
values of x. Moreover, as was said, it can be shown that these are the only Bessel
functions which possess this property.
Starting from the general expression (4.206) for J .x/, consider the derivative
X1
d J .1/r 2r 0
D x
dx x rD0
rŠ . C r C 1/ 22rC
1
X X1
.1/r 2rx2r1 .1/r x 2r1
D D :
rD1
rŠ . C r C 1/ 22rC rD1
.r 1/Š . C r C 1/ 2 2
374 4 Special Functions
Problem 4.68. Prove the other two recurrence relations using a similar
method:
d
x J .x/ D x1 J1 .x/; (4.218)
xdx
d n
x J .x/ D xn Jn .x/: (4.219)
xdx
Therefore, it is readily seen that if the first set of relations, (4.216) and (4.217),
increases the order of the Bessel function, the second set of relations, (4.218)
and (4.219), reduces it. This property can be employed to derive recurrence
relations between Bessel functions of different orders. Indeed, from (4.216) we
have, performing differentiation:
J0 D JC1 C J ; (4.220)
x
while similarly from (4.218) one obtains
J0 D J1 J : (4.221)
x
Combining these, we can obtain the other two recurrence relations:
2
J1 C JC1 D J and 2J0 D J1 JC1 : (4.222)
x
4.6 Bessel Equation 375
Recurrence relations (4.217) allow for an explicit calculation of the Bessel functions
of a positive half integer. Indeed, since we already know J1=2 .x/, we can write
n r n
d J1=2 .x/ 2 d sin x
JnC1=2 .x/ D .1/ x n nC1=2
D .1/n xnC1=2 :
xdx x1=2 xdx x
(4.223)
Problem 4.70. Show using the explicit formulae (4.223) and (4.224) that
r r
2 sin x 2 cos x
J3=2 .x/ D cos x C and J3=2 .x/ D sin x C :
x x x x
Then, using the first of the recurrence relations (4.222) demonstrate that
r
2 3 3
J5=2 .x/ D 1 sin x cos x and
xx2 x
r
2 3 3
J5=2 .x/ D sin x C 2 1 cos x :
x x x
Verify the above expressions for J˙5=2 .x/ by applying directly expres-
sions (4.223) and (4.224).
It is seen now that the functions Jn1=2 .x/ diverge at x D 0. At the same
time, JnC1=2 .x/ behaves well around the point x D 0. This can be readily seen,
e.g. by expanding sin x=x in (4.223) into the Taylor’s series. It also follows directly
from (4.212).
Here we shall derive some formulae for integer-index Bessel functions which are
frequently found useful in applications. We shall start by expanding the following
exponential function into the Laurent series:
376 4 Special Functions
1
X
e 2 .u u / D
x 1
ck .x/uk : (4.225)
kD1
1
X
e 2 .u u / D
x 1
Jk .x/uk ; (4.228)
kD1
so that the exponential function in the left-hand side serves as the generating
function for the Bessel functions of integer indices.
On the other hand, the coefficients of the Laurent expansion are given by the
contour integral of Eq. (2.97), so that we can write
I
1 x
p 1p
Jk .x/ D e2 pk1 dp; (4.229)
2 i C
where the contour C is taken anywhere in the complex plane around the p D 0 point.
This formula is valid for both negative and positive integer values of the index k.
Problem 4.73. Consider now specifically J1 .z/. Shift the integration interval
to =2 3 =2, split the integral into two, one for =2 =2
and another for =2 3 =2, change the variable ! in the
second integral and then combine the two before making the final change of
variable t D sin . Hence, show that
Z 1
i teixt
J1 .x/ D p dt: (4.232)
1 1 t2
378 4 Special Functions
The modified function of the first kind, see Problem 4.65, is then
Z 1
1 text
I1 .x/ D i1 J1 .ix/ D p dt: (4.233)
1 1 t2
Similarly to the method developed in Sect. 4.3.1.2, let us multiply both sides of
the equation above by J .ˇz/ with some real ˇ and integrate between 0 and some
l > 0:
Z l Z l
d d 2
z J .az/ J .ˇz/ dz C ˛2z J .˛z/ J .ˇz/ dz D 0:
0 dz dz 0 z
ˇl Z l Z l !
dJ .˛z/ ˇ dJ .˛z/ dJ .ˇz/ 2
z ˇ
J .ˇz/ˇ z dz C 2
˛ z J .˛z/ J .ˇz/ dz D 0:
dz 0 0 dz dz 0 z
(4.236)
In the same fashion, we start from (4.235) written for J .ˇz/, multiply it by J .˛z/
and integrate between 0 and l; this procedure readily yields the same equation as
above but with ˛ and ˇ swapped:
ˇl Z l Z l !
dJ .ˇz/ ˇ
ˇ dJ .ˇz/ dJ .˛z/ 2 2
z J .˛z/ˇ z dz C ˇ z J .ˇz/ J .˛z/ dz D 0:
dz 0 0 dz dz 0 z
Problem 4.75. Show using the explicit expansion of J .˛z/ from Eq. (4.206)
that
dJ .˛z/ dJ .ˇz/
z J .ˇz/ J .˛z/
dz dz
X1 X 1
2 .k n/ .1/kCn ˛z C2k ˇz C2n
D : (4.238)
kD0 nD0
kŠnŠ . C k C 1/ . C n C 1/ 2 2
The above result shows that the terms with k D n can be dropped; therefore, the
first term in the expansion starts not from n D k D 0 as we thought before, but
from k; n D 0; 1 and k; n D 1; 0, which give the z2C2 type of behavior for the first
(leading) term in Eq. (4.238). Therefore, the free term in (4.237) is well defined
at z D 0 if 2 C 2 > 0 or > 1.
We still need to check the convergence of the two integrals in (4.236). Let us
start from considering the first one: it contains two derivatives of the Bessel function
which behave as z1 near z D 0 each, i.e. the integrand is z21 near its bottom
Rl
limit. We know that the integral 0 z21 dz converges, if 2 1 > 1 which gives
> 1 again. Similarly for the second integral: its first term behaves as z2C1
around z D 0, while the second one as z21 , both converge for > 1.
380 4 Special Functions
This analysis proves that our consideration is valid for any > 1. Because
the free term (4.238) in Eq. (4.237), prior to taking the limits, behaves as z2C2 D
z2.C1/ , it is equal to zero when z D 0 since C 1 > 0. Therefore, the bottom limit
applied to the free term can be dropped and we obtain instead of (4.237):
Z
dJ .˛z/ dJ .ˇz/ l
l J .ˇz/ J .˛z/ C ˛2 ˇ2 zJ .˛z/ J .ˇz/ dz D 0:
dz dz zDl 0
(4.239)
So far we have not specified the values of the real constants ˛ or ˇ. Now we are
going to be more specific. Consider solutions of the equation
J .z/ D 0 (4.240)
with respect to (generally complex) z; these are the roots of the Bessel function.
It can be shown that this equation cannot have complex roots at all, but there is an
infinite number of real roots ˙x1 , ˙x2 , etc. Let us choose the values of ˛ and ˇ such
that ˛l and ˇl be two distinct roots (˛ ¤ ˇ). Then, J .˛l/ D 0 and J .ˇl/ D 0, and
the first term in (4.239) becomes zero, so that we immediately obtain
Z l
zJ .˛z/ J .ˇz/ dz D 0 (4.241)
0
for ˛ ¤ ˇ. This is the required orthogonality condition for the Bessel functions: they
appear to be orthogonal not with respect to their index , but rather with respect to
the “scaling” of their arguments. Note the weight z in the orthogonality condition
above.
It is also possible to choose ˛l and ˇl as roots of the equation J0 .z/ D 0; it is
readily seen that in this case the first term in Eq. (4.239) is also zero and we again
arrive at the orthogonality condition (4.241).
Problem 4.76. Prove that ˛l and ˇl can also be chosen as roots of the equation
C1 J .z/ C C2 zJ0 .z/ D 0, where C1 and C2 are arbitrary real constants. This
condition generalises the two conditions given above as it represents their
linear combination.
Rl
0xf .x/J .˛i x/ dx
fi D Rl 2
: (4.243)
0 xJ .˛i x/ dx
It is only left to discuss how to calculate the normalisation integral standing in the
denominator of the above equation. To do this, we first solve for the integral in
Eq. (4.239):
Z
l
l dJ .˛x/ dJ .ˇx/
xJ .˛x/ J .ˇx/ dx D J .ˇx/ J .˛x/ ;
0 ˛ ˇ2
2 dx dx xDl
and then consider the limit ˇ ! ˛. This is because in the left-hand side under this
limit we would have exactly the normalisation integral we need. Note that x D ˛l
is one of the roots of the equation J .x/ D 0. Therefore, in the right-hand side
the second term inside the square brackets can be dropped and the expression is
simplified:
Z
l
l dJ .˛x/
xJ .˛x/2 dx D lim J .ˇx/ :
0 ˇ!˛ ˛ ˇ2
2 dx xDl
Since under the limit J .ˇl/ ! J .˛l/ D 0, we have to deal with the 0=0
uncertainty. Using the L’Hôpital’s rule, we have
Z l
2 l dJ .˛x/ dJ .ˇx/
xJ .˛x/ dx D lim
0 ˇ!˛ 2ˇ dx dˇ xDl
l dJ .u/ dJ .u/ l2 dJ .u/ 2
D lim ˛ l D :
2˛ ˇ!˛ du uD˛l du uDˇl 2 du uD˛l
(4.244)
This is the required result. It can also be rearranged using Eq. (4.220). Using
there x D ˛l and recalling that J .˛l/ D 0, we obtain that ŒdJ .u/ =duuD˛l D
JC1 .˛l/, which yields
Z l
l2
xJ .˛x/2 dx D JC1 .˛l/2 : (4.245)
0 2
[Hint: express the second derivative of J .u/ from the Bessel DE.]
382 4 Special Functions
„2 d2 .x/ kx2
2
C .x/ D E .x/; (4.247)
2m dx 2
where E is the energy of a stationary state, .x/ is the corresponding wave function
and „ D h=2 is the Planck constant; the probability dP.x/ to find the particle
between x and x C dx in this particular state is given by j .x/j2 dx. Because the
particle must be somewhere, the probability to find it anywhere on the x axis must
be equal to one:
Z C1 Z C1
P.x/dx D j .x/j2 dx D 1; (4.248)
1 1
which gives a normalisation condition for the wave function. What we shall try to
do now is to find the energies and the wavep functions of the harmonic oscillator.
p Let us introduce a new variable u D x m!=„ instead of x. Then, x0 D u0 u0x D
m!=„ u0 , xx 00
D .m!=„/ uu 00
and the DE is transformed into:
00 2E
C u2 D 0: (4.249)
„!
This DE is of the hypergeometric type (4.117) with .u/ D 1, ˇ.u/ D 0 and .u/ D
2E=„!u2 , and can be transformed into the standard form (4.122) using the method
we developed in Sect. 4.4.1.
Problem 4.78. Show using this method that the transformation from .u/ to a
new function g.u/ via .u/ D .u/g.u/ can be accomplished by the transfor-
2
mation function .u/ D eu =2 , which corresponds to the polynomial (4.123)
being .u/ D u, while the corresponding eigenvalues ! n D 2n with
n D 0; 1; 2; : : :. Hence demonstrate that the transformed DE for the new
function g.u/,
The last integral in the right-hand side corresponds to the normalisation of the
Hermite polynomials, see Eq. (4.148), and therefore we finally obtain
m! 1=4 2n=2
Cn D p : (4.252)
„ nŠ
The ground state of the quantum oscillator, 0 .x/, has a non-zero energy E D „!=2,
called the zero point energy; excited states of the oscillator, n .x/ with n 1, are
obtained by adding n quanta „! to the energy of the ground state. A single quanta of
oscillation is called in solid state physics a “phonon”. It is therefore said that there
are no phonons in the ground state, but there are n phonons, each carrying the same
energy of „!, in the n-th excited state.
„2
.r/ C V.r/ .r/ D E .r/; (4.253)
2m
384 4 Special Functions
where .r/ is the electron wave function and m is its mass, „ D h=2 is the Planck
constant. Writing the Laplacian in the spherical coordinates (see Sect. 4.5.3), after
simple rearrangement, we obtain
1 @ 2@ 1 @ @ 1 @2 2m Ze2
r C 2 sin C 2 2 D 2 EC :
r2 @r @r r sin @ @ r sin @ 2 „ r
Let us solve the above equation for the radial part R.r/ of the wave function for
the case of bound states, i.e. the states in which energy is negative: E < 0. These
states correspond to a discrete spectrum of the atom, i.e. there will be an energy gap
between any two states. There are also continuum states when E > 0 (no gap), we
shall not consider them here.
It is convenient to start by introducing a new variable D r=a, where a D
p
„= 8Em. Then, R0 ! R0 =a and R00 ! R00 =a2 , and Eq. (4.254) is rewritten as:
00 2 0 1 l.l C 1/
R C R C C 2
R D 0; (4.255)
4
p
where D Ze2 m=2„2 E.
The obtained DE is of the generalised hypergeometric type (4.117) with . / D
, ˇ D 2 and . / D 2 =4 C l .l C 1/. In the general method of Sect. 4.4.1,
we express R . / via another function u . / using R . / D . / u . /, where the
auxiliary function . / is chosen such that the new function u . / satisfies a DE
in the standard form (4.122). When applying the method developed in this section,
several choices must be made concerning the sign in defining the constant k and the
-polynomial (4.123) which in our case is as follows:
r s
1 1 1 1 2 2
. /D ˙ Ck . /D ˙ lC C .k / C :
2 4 2 2 4
Recall that the constant k is chosen such that the expression under the square root
becomes a full square.
4.7 Selected Applications in Physics 385
Problem 4.80. Show using the method developed in Sect. 4.4.1, that the
constant k has to be chosen as k D ˙.l C 1=2/. Then, if one chooses the minus
sign in the expression for the k and hence the polynomial . / D =2 C l,
then the DE for the function u . /,
So we have two possible choices for the constant k and hence two possible
transformations R . / ! u . /, either R D l e u or R D l1 e u. As we
expect that the solution of the DE for u. / is to be bound on the whole interval
0 < 1 (by choosing the appropriate eigenvalues ), then only the first choice
can be accepted; the second one is unacceptable as in this case R . / diverges at
D 0 for any l D 0; 1; 2; : : :. Hence, we conclude that R. / is to be chosen from
Eq. (4.257) with u. / satisfying the DE (4.256). This DE has a bound solution if
and only if the coefficient to the u term in DE (4.256), i.e. D l 1, is
given by Eq. (4.127) with integer n D 0; 1; 2; : : :. Since in our case 00 D 0 and
0 D .2l C 2 /0 D 1, we then have
mZ 2 e4
l 1 D n 0 D n H) D nClC1 H) ED ; (4.258)
2„2 n2r
where nr D n C l C 1 is called the main quantum number. For the given value of
l, the main quantum number takes on values nr D l C 1; l C 2; : : :. In particular,
for l D 0 (the so-called s states) we have nr D 1; 2; : : : which correspond to 1s, 2s,
etc. states of the H atom electron; for l D 1 (the p states) we have nr D 2; 3; : : :
corresponding to 2p, 3p, etc. states, and so on. We see that the allowed energy levels
386 4 Special Functions
are indeed discrete and they converge at large quantum numbers to zero from below.
As nr is getting bigger, the gap between energy levels gets smaller tending to zero
as nr ! 1.
Replacing D l 1 in Eq. (4.256) with D n, we obtain the DE for
generalised Laguerre polynomials (4.108) Ln2lC1 . / D Ln2lC1 r l1
. /. In other words,
the radial part of the wave function of an electron in a hydrogen-like atom is given by
l =2 2lC1
Rnr l . / e Lnr l1 . /:
Statistical mechanics deals with very large numbers N of particles which may be
indistinguishable. In these cases it is often needed to calculate the factorial NŠ of
very large numbers. There exists an elegant approximation to NŠ for very large
integer numbers N, the so-called Stirling’s approximation
p
NŠ N N eN 2 N; (4.259)
which we shall derive here first. Then we shall show some applications of this result.
Since .N C 1/ D NŠ, we have to investigate the Gamma function for large
integer N. We shall consider a more general problem of a real number z, where
jzj 1. Let us start by making the following transformation in .z C 1/:
Z 1 Z 1 Z 1
.z C 1/ D tz et dt D .zx/z ezx zdx D zzC1 ez exp Œz.x ln x 1/ dx;
0 0 0
(4.260)
where the substitution t D zx has been made.
Now, the function f .x/ D x ln x 1 appearing in the exponential, see Fig. 4.5,
has a minimum at x D 1 with the value of f .1/ D 0. Thus the function ezf .x/ has
a maximum at x D 1 and this maximum becomes very sharply peaked when jzj
is large, see the same figure. This observation allows us to evaluate the integral in
Eq. (4.260) approximately7 . Indeed, two things can be done. Firstly, the bottom limit
of the integral can be extended to 1 as the exponential function in the integral is
practically zero for x < 0. Secondly, we can expand f .x/ about x D 1 and retain
only the leading order term. That is, if we write x D 1 C y, we obtain
y2
f .x/ D f .1 C y/ D 1 C y ln.1 C y/ 1 D y ln.1 C y/ D C
2
7
In fact, what we are about to derive corresponds to the leading terms of the so-called asymptotic
expansion of the gamma function for jzj 1.
4.7 Selected Applications in Physics 387
Fig. 4.5 Functions f .x/ D x ln x 1 (black) and exp .zf .x// (other colors) for z D1, 5, 10
and 100
ln .NŠ/ N ln N N : (4.261)
In most applications the terms we dropped here are negligible as compared to the
two terms retained.
As an example of application of the Stirling’s formula (4.261), we shall consider
a paramagnetic–ferromagnetic phase transition in a simple three-dimensional lattice
within a rather simple Bragg–Williams theory. Let us consider a lattice of atoms
assuming that each atom may either have its magnetic moment being directed “up”
or “down”. We shall also assume that only magnetic moments of the nearest atoms
interact via the so-called exchange interaction with z being the number of the nearest
neighbors for each lattice site. Let n# and n" be average densities of atoms with
the moments directed
up and
down, respectively. We shall also introduce the order
parameter m D n" n# =n, which is the relative magnetisation of the solid and
n D n" C n# is the total number of atoms in the unit volume. The densities of the
atoms having moments up and down are given, respectively, by n" D n .1 C m/ =2
and n# D n n" D n .1 m/ =2 via the order parameter, as is easily checked.
388 4 Special Functions
Our goal is to calculate the free energy density F D U TS of this lattice, where
U is the internal energy density, T temperature and S the entropy density. We start by
calculating the internal energy U. Let N"" , N## and N"# be the numbers of different
pairs (per unit volume) of the nearest moments which are aligned, respectively,
parallel up, parallel down or antiparallel to each other. It is more energetically
favorable for the moments to be aligned in parallel, the energy in this case is J0 ;
otherwise, the energy is increased by 2J0 , i.e. it is equal to CJ0 . Then on average
one can write the following expression for the internal energy:
U D J0 N"" C N## C J0 N"# B H n" n# ; (4.262)
where the last term corresponds to an additional contribution due to the applied
magnetic field H along the direction of the moments, B being Bohr magneton. To
calculate the number density of pairs, we note that, for the given lattice site, the
probability of finding a nearest site with the moment up is p" D n" =n. For each site
with the moment up (whose density is n" ) there will be zp" nearest neighbors with
the same direction of the moment, i.e.
1 zn2" 1
N"" D n" zp" D D zn .1 C m/2 ;
2 2n 8
where the factor of one half is required to avoid double counting of pairs. Similarly,
one can calculate densities of pairs of atoms with other arrangements of their
moments:
1 1 1
N## D n# zp# D zn .1 m/2 and N"# D n" zp# D zn 1 m2 :
2 8 4
Note that the factor of one half is missing in the last case. This is because in this
case there is no double counting: we counted atoms with the moment up surrounded
by those with the moment down. The obtained expressions for the density of pairs
yield for the internal energy density (4.262):
1
U D J0 znm2 mB nH: (4.263)
2
This formula expresses the internal energy entirely via the order parameter m.
The entropy density can be worked out from the well-known expression S D
kB ln W, where kB is the Boltzmann constant and W is the number of possibilities in
which one can allocate n" moments up and n# moments down on a lattice with n
n n
sites. Obviously, W D D , and hence
n# n"
n nŠ
S D kB ln D kB ln D kB ln .nŠ/ ln n" Š ln n n" Š :
n" n" Š n n" Š
4.7 Selected Applications in Physics 389
To calculate the factorials above, we use the Stirling’s formula (4.261) which gives
S ' kB n ln n n" ln n" n n" ln n n"
1 1
D kB n ln 2 .1 C m/ ln .1 C m/ .1 m/ ln .1 m/ : (4.264)
2 2
As we now have both components of the free energy, U and S, we can combine
them to obtain the free energy density as
1
F D U TS D J0 znm2 mB nH
2
1 1
nkB T ln 2 .1 C m/ ln .1 C m/ .1 m/ ln .1 m/ : (4.265)
2 2
To calculate the magnetisation M D nB m (per unit volume), we need to find the
minimum of the free energy for the given value of H and T which should correspond
to the stable phase for these parameters:
@F 1 1Cm
D0 H) nkB Tc m B nH C nkB T ln D 0; (4.266)
@m 2 1m
where Tc is defined by the identity kB Tc D J0 z. The above equation can be
rearranged in the following way. If we denote x D .B H C kB Tc m/ =kB T, then
The situation is a bit more complex for a non-zero magnetic field: the minima are
no longer equivalent, and the system prefers to align the magnetic moment along the
field. With decreasing T the magnetisation tends to the saturation limit at m D ˙1.
Using an essentially identical argument, one can also consider statistics of the
order–disorder phase transition in binary alloys.
„2 d2 k .x/
C V.x/ k .x/ D Ek k .x/; (4.268)
2m dx2
4.7 Selected Applications in Physics 391
Fig. 4.7 One dimensional lattice of positive ion cores produces a potential V.x/ for the electrons;
the potential goes to 1 at the positions of the nuclei. (a) A realistic potential; (b) a model
potential in which each nucleus creates a delta function like potential
where k is the electron momentum used to distinguish different states, and k .x/ and
Ek are the corresponding electron wave function and energy. Because the lattice is
periodic, each electron has a well-defined momentum k serving as a good quantum
number. Also, the wave function satisfies the so-called Bloch theorem whereby
k .x/ D e uk .x/, where uk .x/ is a periodic function with the same period a as
ikx
where A < 0 is some parameter. Here we sum over all atoms whose positions are
given by la. So in this model negative singularities at atomic cores are correctly
described, however, the potential everywhere between cores is zero instead of going
smoothly between two minus infinity values at the nearest cores, compare Fig. 4.7(a)
and (b).
Since the functions V.x/ and uk .x/ are periodic, we can expand them in an
(exponential) Fourier series:
X X
V.x/ D Vg eigx and uk .x/ D ug .k/eigx ; (4.270)
g g
Note that only the term l D 0 is contributed to the integral. Thus, the delta function
lattice potential has all its Fourier components equal to each other. Now use the
Bloch theorem and then substitute the expansions (4.270) of the potential and of the
wave function into the Schrödinger equation (4.268):
2 3
X „2 X 0 X
4 .q C g/2 C A eig x 5 ug .k/eigx D Ek ug .k/eigx : (4.271)
g
2m 0 g
g
This equation
P can be solved exactly to obtain the energies Ek . The trick is to
introduce D g1 ug1 .k/. Indeed, from (4.273)
A
ug .k/ D „2 :
2m
.k C g/2 Ek
Summing both sides with respect to all g, we shall recognise in the left-hand side.
Hence, canceling on , we obtain the following exact equation for the energies:
„2 X 1
D 2mEk
(4.274)
2mA 2
g .k C g/ „2
The sum in the right-hand side is calculated analytically. Indeed, using the identity
1 1 1 1
2 2
D ;
a b 2b ab aCb
4.7 Selected Applications in Physics 393
p
where b D ka=2 and c D .a=2„/ 2mEk . Next, we use Eq. (4.49) for the cotangent
function as well as the trigonometric identity
2 sin 2y
cot.x C y/ cot.x y/ D ; (4.275)
cos 2x cos 2y
sin Ka
cos ka D cos Ka C P ; (4.276)
Ka
p
where P D mAa2 =„2 and K D 2c=a D „1 2mEk , i.e. Ek D „2 K 2 =2m. The
obtained Eq. (4.276) fully solves the problem as it gives K.k/ which in turn yields
Ek for each k. It has real solutions only if the right-hand side of the equation is
between 1 and 1. This condition restricts possible values of the wave vector K and
hence of the energies Ek , and therefore results in bands. The function
sin.Ka/
f .Ka/ D cos.Ka/ C P
Ka
for P D 20 is plotted in Fig. 4.8. The bands, i.e. the regions of allowed values of
Ka, are colored green in the figure. Solving Eq. (4.276) with respect to K D K.q/
for each given value of k allows calculating dispersion of energy bands Ek .
394 4 Special Functions
Consider oscillations in a circular membrane of radius a which is fixed at its rim. The
mathematical description of this problem is based on the theory of Bessel functions.
The partial DE which needs to be solved in this case, assuming the membrane is
positioned within the x y plane, has the form (see Chap. 8 for a more detailed
discussion on partial DEs of mathematical physics):
1 @2 u 1 @2 u 1 @ @u 1 @2 u
D u H) D r C ; (4.277)
c2 @t2 c2 @t2 r @r @r r2 @ 2
where u.r; ; t/ is the vertical displacement of the membrane (along the z axis)
written in polar coordinates .r; /, and c is the sound velocity. Correspondingly,
we wrote the Laplacian in the right-hand side in the polar coordinates and discarded
the z-dependent term, see Eq. (7.88).
To solve this equation, we apply the method of separation of variables already
mentioned in Sect. 4.6.5 and to be discussed in more detail in Chap. 8.
T 00 C 2 c2 T D 0; (4.278)
ˆ00 C n2 ˆ D 0; (4.279)
00 1 0 2 n2
R C R C 2 R D 0; (4.280)
r r
where 2 > 0 and n2 are the corresponding separation constants. Then argue
why the number n must be an integer (cf. discussion in Sect. 4.5.3).
The general solution of equation (4.278) for T.t/ is given by (A and B are
arbitrary constants)
while the solutions of the ˆ. / Eq. (4.279) are periodic functions e˙in . Equa-
tion (4.280) for the radial function R.r/ coincides with the Bessel DE (4.234), and
hence its general solutions are a linear combination of the Bessel functions of the
first and second kind:
satisfy the partial DE (4.277) and the boundary conditions. Here Ani and Bni are
arbitrary constants. By taking a linear combination of these elementary solutions,
we can construct the general solution of the problem since the DE is linear:
1 nh
1 X
X i
.n/ .n/
u.r; ; t/ D Ani cos i ct C Bni sin i ct ein
nD0 iD1
h i o
.n/ .n/ .n/
C Ani cos i ct C Bni sin i ct ein Jn i r ; (4.281)
where the constants Ani and Bni are assumed to be generally complex. Note that the
arbitrary constants by the sine and cosine functions of the ein part are complex
conjugates of those we used with ein ; this particular construction ensures that the
function u.r; ; t/ is real. Also note that the summation over i corresponds to the
particular value of n in the first sum.
Formally, the above formula can be rewritten in a simpler form by extending
the sum over n to run from 1 to 1 and defining Ani D Ani , Bni D Bni and
.n/ .n/
i D i :
1 h
1 X
X i
.n/ .n/ .n/
u.r; ; t/ D Ani cos ci t C Bni sin ci t ein Jjnj i r :
nD1 iD1
(4.282)
To determine the constants, we have to apply the initial conditions. In fact, the
problem can be solved for very general initial conditions:
ˇ
@u .r; ; t/ ˇˇ
u .r; ; t/jtD0 D f .r; / and ˇ D '.r; /: (4.283)
@t tD0
396 4 Special Functions
Applying t D 0 to the function (4.282) and its time derivative, these conditions are
transformed into the following equations:
1 X
X 1
.n/
ujtD0 D Ani ein Jjnj i r D f .r; / ; (4.284)
nD1 iD1
ˇ X1 X 1
@u ˇˇ .n/ .n/
D c Bni ein Jjnj i r D ' .r; / : (4.285)
@t ˇtD0 nD1 iD1 i
The complex coefficients Ani and Bni contain two indices and correspondingly are
under double sum. They can be found then in two steps.
multiply both sides of the above Eqs. (4.284) and (4.285) by eim and then
integrate over the angle to show that
Z 2 1
X
1 in .n/
f .r; / e d D Ani Jjnj i r ; (4.286)
2 0 iD1
Z 2 1
X
1 .n/ .n/
' .r; / ein d D ci Bni Jjnj i r : (4.287)
2 0 iD1
.n/
At the second step, we recall that the Bessel functions Jjnj i r also form an
orthogonal set, see Eq. (4.241), so that coefficients in the expansion of any function
in Eq. (4.242) can be found from Eq. (4.243). Therefore, we can finally write
Z Z 2
1 a
.n/
Ani D rdr d ein f .r; / Jjnj i r ; (4.288)
2 Jni 0 0
Z Z 2
1 a
.n/
Bni D .n/
rdr d ein f .r; / Jjnj i r ; (4.289)
2 ci Jni 0 0
where
Z a 2
.n/
Jni D rJjnj i r dr (4.290)
0
In a similar way one can solve heat transport problem in cylindrical coordinates,
as well as vibration problem in spherical coordinates. In both cases Bessel functions
appear.
Z
.r0 / 0
U .r/ D dr ; (4.291)
V jr r0 j
taken over the whole volume V where the charge distribution is non-zero. Note that
if we have a collection of point charges qi located at points ri , then the charge density
due to a single charge qi is given by qi ı .r ri /, where
ı .r ri / D ı .x xi / ı .y yi / ı .z zi /
The filtering theorem is also valid for this delta function as the volume integral can
always be split into a sequence of three one-dimensional integrals, and the filtering
theorem can be applied to each of them one after another:
398 4 Special Functions
Z Z Z Z
f r0 ı r r0 dr0 D ı x x0 dx0 ı y y0 dy0 f x0 ; y0 ; z0 ı z z0 dz0
Z Z
D ı x x0 dx0 ı y y0 f x0 ; y0 ; z dy0
Z
D ı x x0 f x0 ; y; z dx0 D f .x; y; z/ D f .r/ ;
as expected.
So, formula (4.291) is general: it can be used both for continuous and discrete
charge distributions. We shall now derive the so-called multipole expansion of
the potential U.r/. Let the angle between the vectors r and r0 be . Then, using
the cosine theorem, we can write
ˇ ˇ p
ˇr r0 ˇ D r2 C r02 2rr0 cos :
Next we shall expand the Legendre polynomial into spherical functions using
Eq. (4.199) which yields the following expression for the potential:
1 X
X Z r Z
l
4 1 0 lC2 0
U.r/ D Yl .; /
m
r dr d r0 Ylm . 0 ; 0 /
lD0 mDl
2l C 1 r lC1
0
Z 1 Z
0 lC1 0
C rl r dr d r0 Ylm . 0 ; 0 / ; (4.292)
r
which is the required general formula. Here r D .r; ; / and r0 D .r0 ; 0 ; 0 / are
the two points (vectors) in the spherical coordinates. Note that complex spherical
harmonics were used here; however, since the expansion formula (4.199) for the
Legendre polynomials is invariant under the substitution of the complex harmonics
Ylm with the real ones, Ylm ! Slm , one can also use the real harmonics in the formula
above.
We shall now consider a particular case of a potential outside a confined charge
distribution. In this case we assume that there exists such a D maxV .r0 / that
.r0 / D 0 for r0 > a. Then the potential outside the charge distribution (i.e. for
r > a) would only contain the first integral of Eq. (4.292) in which the r0 integration
is done between 0 and a:
X 4 Z a Z
1
0 lC2 0
U.r/ D Yl .; / lC1
m
r dr d r0 Ylm . 0 ; 0 /
lm
2l C 1 r 0
r
X 4 Qlm
D Ylm .; / l ; (4.293)
lm
2l C 1 r
A B C
U.r/ D C 2 C 3 C D U1 .r/ C U2 .r/ C U3 .r/ C :
r r r
We expect that A must be the total charge of the distribution, B is related to its
dipole moment, C to its quadrupole moment and so on. For instance, consider the
l D m D 0 term:
r Z Z
4 0 1 0
0 0
Q00 D r p dr D r dr ;
1 V 4 V
which is the total charge Q of the distribution, and the corresponding term in the
potential, U0 .r/ D
p Q=r, does indeed have the correct form. We have used here the
fact that Y00 D 1= 4 (see Sect. 4.5.3.4).
Problem 4.88. Similarly show that the l D 1 term in the expansion (4.293)
corresponds to the dipole term:
Pn
U1 .r/ D ;
r2
where n D r=r is the unit vector in the direction r, and
Z
0 0 0
PD r r dr
V
is the dipole moment of the charge distribution. [Hint: use real spherical
harmonics and explicit expressions for them in Eq. (4.195).]
Problem 4.89. Also show that the l D 2 term is associated with the
quadrupole contribution
3
1 X
U2 .r/ D 3 D˛ˇ n˛ nˇ ;
2r
˛;ˇD1
is the quadrupole moment matrix. [Hint: use real spherical harmonics (4.196)
and the fact that Dxx C Dyy C Dzz D 0.]
Note that the quadrupole matrix is symmetric and contains only five independent
elements (since the diagonal elements sum up to zero); this is not surprising as there
are only five spherical harmonics of l D 2.
Chapter 5
Fourier Transform
We know from Chap. 3 that any piecewise continuous periodic function f .x/ can
be expanded into a Fourier series.1 One may ask if a similar expansion can be
constructed for a function which is not periodic. The purpose of this chapter is to
address this very question. We shall show that for non-periodic functions f .x/ an
analogous representation of the function exists, but in this case the Fourier series
is replaced by an integral, called Fourier integral. This development enables us to
go even further and introduce a concept of an integral transform, The definition
and various properties of the Fourier integral and Fourier transform are given, with
various applications in physics appearing at the end of the chapter.
1
In the following, references to the first volume of this course (L. Kantorovich, Mathematics for
natural scientists: fundamentals and basics, Springer, 2015) will be made by appending the Roman
number I in front of the reference, e.g. Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of
the first volume, respectively.
Fig. 5.2 The non-periodic function f .x/ shown in Fig. 5.1 is compared with its periodic (with the
period T) approximation fT .x/ (the solid line). The part of f .x/ which was cut out in fT .x/ is shown
with the dashed line
T T
fT .x/ D f .x/ for x< ; (5.1)
2 2
The function fT .x/ is periodic and hence can be expanded into the Fourier series.
Using the exponential form of the series, Sect. 3.5, we can write
1
X
n x
fT .x/ D FT .n /ei2 ; (5.3)
nD1
Z 1
x
f .x/ D F.v/ei2 d; (5.5)
1
where F./ D limT!1 FT ./. At the same time, in the T ! 1 limit the integral
in the right-hand side of the Fourier coefficients (5.4) extends to the whole x axis,
while in the integrand fT .x/ ! f .x/, i.e. we can also write
Z 1
F./ D f .x/ei2 x
dx: (5.6)
1
Let us split the integral over in Eq. (5.7) into two: one between 1 and 0 and
another between 0 and C1. In the first integral we shall then make a change of
variables ! . This gives
Z 0 Z 1 Z 1 Z 1
i2 .tx/
f .x/ D .d/ f .t/e dt C d f .t/ei2 .tx/
dt
1 1 0 1
Z 1 Z 1
D d f .t/ ei2 .tx/
C ei2 .tx/
dt: (5.8)
0 1
404 5 Fourier Transform
The expression in the square brackets is recognised to be the two times cosine, and
we finally obtain the so-called trigonometric form of the Fourier integral:
Z 1 Z 1
f .x/ D 2 d f .t/ cos Œ2 .t x/ dt: (5.9)
0 1
Problem 5.1. Show that if f .x/ is an even function, then Eq. (5.7) is simplified
further:
Z 1 Z 1
f .x/ D 4 f .t/ cos.2 t/dt cos.2 x/d; (5.10)
0 0
Problem 5.4. Show this using Eq. (2.122). Also show that at x D 0 the integral
is equal to one, as required.
5.1 The Fourier Integral 405
Fig. 5.3 Fourier integral for ….x/ calculated using the spectroscopic range 0 T for (a) T D
1 and T D 4, and (b) T D 10 and T D 100. The original function ….x/ is shown in (a) by the
dashed line
Let us try to understand how this integral representation actually works. To this
end let us consider an approximation
Z T
sin.2 /
…T .x/ D 4 cos.2 x/d
0 2
for the integral (5.13), where integration is performed not up to infinity, but up to
some finite real number T. The results of a numerical integration for T D 1; 4; 10
and 100 shown in Fig. 5.3 clearly demonstrate convergence of the function …T .x/ to
the exact function ….t/ as the value of T (the upper limit in the integral) is increased.
The discussion which led us to Eq. (5.7) or Eq. (5.9), which are absolutely equiv-
alent, was mostly intuitive. A more rigorous derivation leading to either of the
forms of the Fourier integral requires a more careful analysis. Let us aim to prove
specifically Eq. (5.9). This basically entails proving the following formula:
Z 1 Z 1
1
2 d f .t/ cos Œ2 .t x/ dt D Œf .x 0/ C f .x C 0/ : (5.14)
0 1 2
1
lim g .x/ D Œf .x 0/ C f .x C 0/ : (5.16)
!1 2
Therefore, one can swap the integrals in Eq. (5.15) and integrate over to obtain:
Z 1
1 sin Œ .t x/
g .x/ D f .t/ dt:
1 tx
We split this integral into two at the point x; in the integral between 1 and x we
shall then change the variable x t ! t, and in the integral between x and C1 we
shall apply the substitution t x ! t. This yields
Z 1 Z 1
1 sin .t/ 1 sin .t/
g .x/ D f .x t/ dt C f .x C t/ dt (5.17)
0 t 0 t
f .x t/ f .x 0/ f .x C t/ f .x C 0/
‰1 .t/ D and ‰2 .t/ D
t t
are two auxiliary functions. These two functions have the same discontinuities of
the first kind as the function f .x/ itself; on top of this, they may have a singularity
at t D 0. However, similarly to our analysis in Sect. 3.7.2, at t ! C0 (note that
the integration over t in (5.16) is carried out over positive values only), both these
functions have well-defined limits at t D C0 if we assume2 that the function f .x/
can also be differentiated on the left and on the right of any point x. Then,
f .x t/ f .x 0/
lim ‰1 .t/ D lim D f 0 .x 0/;
t!C0 t!C0 t
f .x C t/ f .x C 0/
lim ‰2 .t/ D lim D Cf 0 .x C 0/;
t!C0 t!C0 t
which proves that at the point t D 0 both functions ‰1 .t/ and ‰2 .t/ are well defined.
Hence these two functions may only have discontinuities of the first kind due to the
function f .t/.
If in the ! 1 limit both integrals in (5.18) tend to zero, then g .x/ would
indeed tend to the required mean value of f .x/. So, it is only left to investigate the
convergence of these integrals in the ! 1 limit. Our analysis will mostly be
intuitive from this point on. Consider the integral
Z 1 Z ı Z 1
I./ D ‰.t/ sin .t/ dt D ‰.t/ sin .t/ dt C ‰.t/ sin .t/ dt;
0 0 ı
(5.19)
which has been split into two by some small positive ı chosen in such a way that
‰.t/ is continuous within the interval 0 t ı. Then, the first integral (between 0
and ı) can be manipulated for sufficiently small ı as follows:
Z ı Z ı
1 cos .ı/
‰.t/ sin .t/ dt ' ‰.0/ sin .t/ dt D ‰.0/ ;
0 0
2
This assumption is in fact not necessary and Eq. (5.16) can be proven without it. However, this
will not be done here as it would lead us to a much lengthier calculation.
408 5 Fourier Transform
Therefore,
Z ˇ ˇ Z
ı ˇ z D t ˇ 1 ı z
ˇ
‰.t/ sin .t/ dt D ˇ ˇ D ‰ sin zdz
0 dz D dt ˇ 0
X1 Z X1
1 ‰ .n/ .0/ ı n 1 ‰ .n/ .0/
D z sin zdz D Kn .ı/ :
nD0
nC1 nŠ 0 nD0
nC1 nŠ
Since the largest power of in Kn .ı/ is n , each term in the above expansion tends
to zero as 1= when ! 1.
So, the first integral in (5.19) is zero in the ! 1 limit. Consider now the
second integral there assuming first that ‰.t/ does not have discontinuities at t > ı:
Z 1 Z
1 1 z
‰.t/ sin .t/ dt D ‰ sin zdz:
ı ı
In the ! 1 limit ‰ .z=/ ! ‰.0/, and the bottom limit ı of the integral tends
to infinity reaching the upper limit (note that ı is small but finite). Correspondingly,
the integral tends to zero. If ‰.t/ has discontinuities somewhere at t > ı, we
can always split the integral into a sum of integrals each taken between these
discontinuities. Then each of the integrals individually will tend to zero as ! 1.
Indeed, consider an interval a < t < b assuming that ‰.t/ is continuous within that
interval. Then,
Z Z
b
1 b
z
‰.t/ sin .t/ dt D ‰ sin zdz:
a a
Since both limits tend to infinity as ! 1 at the same time, the integral tends to
zero, as required. This finalises the proof of formula (5.16), and hence of Eq. (5.14).
5.2 Fourier Transform 409
If the application of Eq. (5.7) is split into two consecutive steps, then a very useful
and widely utilised device is created called the Fourier transform. The Fourier
transform of a function f .x/, defined on the whole real axis x and satisfying the
Dirichlet conditions, is defined as the integral (5.6):
Z 1
F./ D f .x/ei2 x
dx: (5.20)
1
R1
We know from the previous section that the integral 1 jf .x/j dx should exist.
From (5.20) it is seen that the function f .x/ is transformed by a process of integration
into a spectral function F.v/. If we introduce a functional operator F acting on f .x/
which converts f .x/ 7! F.v/, then we can recast (5.20) in the form:
Z 1
F./ D FŒf .x/ D f .x/ei2 x
dx: (5.21)
1
Note, in particular, that if x is time and f .t/ is a signal of some kind, then
is a frequency, i.e. in this case the transformation is performed into the frequency
domain and the Fourier transform F./ D F Œf .t/ essentially gives a spectral
analysis of the signal f .t/.
It is also possible to convert F./ back into f .x/, i.e. to perform the inverse
transformation F./ 7! f .x/, by using the formal relation f .x/ D F 1 ŒF./, where
F 1 denotes the inverse functional operator. Fortunately, an explicit formula for
this inversion procedure is readily provided by the Fourier integral (5.7). Indeed, the
expression in the square brackets there, according to Eq. (5.20), is exactly F./, so
that we find
Z 1
f .x/ D F 1 ŒF./ D F./ei2 x d: (5.22)
1
This result is called the inverse Fourier transform of F./. For a time dependent
signal f .t/ the inverse transform shows how the signal can be synthesised from its
frequency spectrum given by F./.
Problem 5.6. Show that for an even function fe .x/ the Fourier transform pair
(i.e. both direct and inverse transformations) can be expressed in the following
alternative form:
(continued)
410 5 Fourier Transform
It should be noted that the Fourier transform F./ of a real even function fe .x/
is also a real function.
Problem 5.7. Similarly, show that for an odd function fo .t/ we find
Z 1
F./ D F Œfo .x/ D 2i fo .x/ sin.2 x/dx: (5.25)
0
Z 1
fo .x/ D F 1 ŒF./ D C2i F./ sin.2 x/d: (5.26)
0
Hence, the Fourier transform F./ of a real odd function is a purely imaginary
function.
Problem 5.8. Show that the Fourier transform F./ of a real function, f .x/ D
f .x/, satisfies the following identity:
F./ D F./:
As an example, let us determine the Fourier transforms Fn .v/ of a set of the unit
impulse functions
n; for jxj 1=2n
ın .x/ D ; (5.27)
0; for jxj > 1=2n
The functions Fn ./ for selected values of n from 1 to 100 are shown in Fig. 5.4. It is
clearly seen from the definition (5.27) of the ın .x/ that as n increases the width of
5.2 Fourier Transform 411
Fig. 5.4 The Fourier transforms Fn ./ of the functions ın .x/ of the delta sequence for selected
values of n
the peak x D 1=n becomes smaller; at the same time, the width of the central
peak of Fn ./ becomes larger. Indeed, may be determined by twice the smallest
root (excluding zero) of the function sin . =n/, i.e. D 2n, i.e. for any n we find
that x ' 2 (a some sort of the “uncertainty principle” of quantum mechanics).
In the n ! 1 limit, we obtain
This is nicely confirmed by the graphs of the Fourier transform of ın .x/ for large
values of n shown in Fig. 5.4, where one can appreciate that the function Fn ./
tends more and more to the horizontal line Fn ./ D 1 with the increase of n. Thus,
the Fourier transform of the Dirac delta function, to be obtained in the limit, is
Z 1
F Œı.x/ D ı.x/ei2 x
dx D 1: (5.30)
1
Note that the integral above also perfectly conforms to the delta function filtering
theorem, see Sect. 4.1.
Now, the application of the inverse transform (5.22) to the result (5.30) gives the
important formal integral representation of the delta function:
Z 1 Z 1 Z 1
x x 1
ı.x/ D F Œı.x/ ei2 d D ei2 d D eikx dk: (5.31)
1 1 2 1
412 5 Fourier Transform
Comparing the expression in the square brackets with Eq. (5.31) we immediately
recognise the delta function ı .t x/. Therefore, the expression above becomes
Z 1
f .x/ D ı .t x/ f .t/dt:
1
The integral in the right-hand side above gives f .x/ due to the filtering theorem for
the delta function, i.e. the same function as in the left-hand side, as expected!
sin .2 /
F Œ.1 x/ .1 x/ D ;
where ˛ > 0.
Problem 5.12. Show that the Fourier transform of the function f .t/, which is
equal to 1 for 1 t < 0, C1 for 0 t 1 and zero otherwise, is given by
2i sin2
F./ D :
(continued)
5.2 Fourier Transform 413
Problem 5.13. Show that the Fourier transform of the function f .t/ D e˛jxj
(where ˛ > 0) is given by
2˛
F./ D :
˛ 2 C .2 /2
Consequently, prove the following integral representation of it:
Z
˛jxj 2˛ 1 cos.ux/
e D du:
0 ˛ 2 C u2
Using this result, show that the integral
Z 1
du
D :
0 1 C u2 2
Check this result by calculating the integral directly using a different method.
[Hint: look up the integral representation of the arctangent.]
Problem 5.14. Prove the so-called modulation theorem:
1
F Œf .x/ cos .2 0 x/ D ŒF . C 0 / C F . 0 / ;
2
where F./ D F Œf .x/.
Problem 5.15. Prove that the Fourier transform of f .x/ D sin x for jxj =2
and zero otherwise is
4 i 2
F./ D 2
cos :
.2 / 1
Using this result and the inverse Fourier transform, show that
Z 1
x sin . x/
dx D :
0 1 x2 2
Verify that the integrand is well defined at x D 1.
414 5 Fourier Transform
Problem 5.16. Show that the Fourier transform of the Gaussian delta
sequence (where n D 1; 2; : : :)
n 2 2
ın .x/ D p en x =2 (5.33)
2
is
=n/2 2
Fn ./ D e2. : (5.34)
[Hint: you may find formula (2.56) useful.] Show next that if x and are the
widths of ın .x/ and Fn ./, respectively, at their half height, then their product
x D 4 ln 2= ' 0:88 and does not depend on n.
Again, several interesting observations can be made about the functions (5.33)
and their Fourier transforms (5.34). Indeed, the functions ın .x/ defined above
represent another example of the delta sequence (see Sect. 4.1): as n is increased, the
Gaussian function ın .x/ becomes thinner and more picked, while the full area under
the curve remains equal to unity. Its Fourier transform Fn ./ is also a Gaussian;
however, with the increase of n the Gaussian Fn ./ gets more and more spread out
tending to the value of unity in the limit, limn!1 Fn ./ D 1, which agrees with the
result for the rectangular delta sequence (5.27) considered above.
So, we see yet again that if a function in the direct x-space gets thinner, its Fourier
transform in the -space gets fatter. The opposite is also true which follows from the
symmetry of the direct and inverse transforms, Eqs. (5.20) and (5.22).
One of the important strengths of the Fourier transform stems from the fact that it
frequently helps solving linear differential equations (DE), and several examples of
this application will be given in the following sections. The idea is to first transform
the equation into the -space by performing the Fourier transform of all terms in it.
In the transformed equation the unknown function f .x/ appears as its transform (or
“image”) F./, and, as will be shown in this subsection, derivatives of f .x/ in the
DE are transformed into algebraic expressions linear in F./ rendering the DE in
the -space to become algebraic with respect to F./. Then, at the second step the
inverse transform F./ 7! f .x/ is performed. If successful, the solution is obtained
in the analytical form. Otherwise, the solution is obtained as the spectral integral
over which may be calculated numerically.
However, an application of this scheme may be drastically simplified if certain
general rules of working with the Fourier transform are first established. We shall
derive several such rules here and also in the next subsection. These may be useful
in performing either direct or inverse Fourier transforms in practice.
5.2 Fourier Transform 415
Problem 5.17. Using the method of induction, generalise this result for a
derivative of any order m:
Consider two functions f .x/ and g.x/ with Fourier transforms F./ and G./,
respectively. The convolution of f .x/ and g.x/ is defined as the following function
of x:
Z 1
f .x/ g.x/ D f .t/g.x t/dt: (5.39)
1
f g D g f:
Let us calculate the Fourier transform of this new function. Our goal is to express
it via the Fourier transforms F./ and G./ of the two functions involved.
To this end, let us substitute into Eq. (5.39) the inverse Fourier transform of
g.x t/,
Z 1
.xt/
g.x t/ D G./ei2 d; (5.40)
1
Formula (5.42) is known as the convolution theorem. We see that the Fourier trans-
form of a convolution f .x/ g.x/ is simply equal to the product of the Fourier
transform of f .x/ and g.x/. Inversely, if one needs to calculate the inverse Fourier
transform of a product of two functions F./G./, then it is equal to the convolution
integral (5.39). This may be useful in performing the inverse transform. Indeed, if a
function T./ to be inverted back into the x-space looks too complex for performing
such a transformation, it might sometimes be possible to split it into a product
of two functions F./ and G./ in such a way that their individual inversions
(into f .x/ and g.x/, respectively) can be done. Then, the inversion of the whole
function T./ D F./G./ will be given by the integral (5.39), which can then be
either attempted analytically or numerically thereby solving the problem, at least in
principle.
5.2 Fourier Transform 417
As the simplest example, let us determine the convolution of the Dirac delta
function ı.x b/ with some function f .x/. From the definition (5.39) of the
convolution, we find that
Z 1
ı.x b/ f .x/ D ı.t b/f .x t/dt D f .x b/; (5.43)
1
where in the last passage we have applied the filtering theorem for the Dirac delta
function, Sect. 4.1. Considering now the -space, the Fourier image of ı.x b/ is
Z 1
F Œı .x b/ D ı .x b/ ei2 x
dx D ei2 b
;
1
while the Fourier transform of f .x/ is F./. Therefore, the Fourier transform of
their convolution is simply F./ei2 b . On the other hand, calculating directly the
Fourier transform of their convolution given by Eq. (5.43), we obtain
Z ˇ ˇ
1 ˇt D x bˇ
F Œf .x b/ D f .x b/ ei2 x
dx D ˇˇ ˇ
1 dt D dx ˇ
Z 1
D ei2 b
f .t/ ei2 t
dt D ei2 b
F./;
1
Problem 5.20. The function f .t/ is defined as e˛t for t 0 and zero otherwise
(˛ > 0). Show by direct calculation of the convolution integral, considering
separately the cases of positive and negative t, that the convolution of this
function with itself g.t/ D f .t/ f .t/ D tf .t/.
Problem 5.21. Find the Fourier transforms of the functions f .t/ and g.t/
defined in the previous problem directly and using the convolution theorem.
[Answer: F Œf .t/ D F./ D .˛ C i2 /1 and F Œg.t/ D G./ D
.˛ C i2 /2 D F./2 .]
418 5 Fourier Transform
Due to the close relationship between the Fourier series and transform, it is no
surprise that there exists a theorem similar to the corresponding Parseval’s theorem
we proved in Sects. 3.4 and 3.5 in the context of the Fourier series.
Here we shall formulate this statement in its most general form as the
Plancherel’s theorem. It states that if F./ and G./ are the Fourier transforms
of the functions f .x/ and g.x/ respectively, then
Z 1 Z 1
F./ G./d D f .x/ g.x/dx: (5.44)
1 1
To prove this, we use the definition of the Fourier transforms in the left-hand side:
Z 1 Z 1 Z 1 Z 1
2 ix 2 iy
F./ G./d D f .x/e dx g.y/e dy d
1 1 1 1
Z 1 Z 1 Z 1 Z 1 Z 1
D e2 i.xy/
d f .x/ g.y/dxdy D ı.x y/f .x/ g.y/dxdy
1 1 1 1 1
Z 1 Z 1 Z 1
D f .x/ ı.x y/g.y/dy dx D f .x/g.x/dx;
1 1 1
where in the second line we recognised the integral representation (5.31) for the
delta function ı.x y/ within the expression in the square brackets, while in the
third line we used the filtering theorem for the delta function.
In particular, if f .x/ D g.x/, then we obtain the Parseval’s theorem:
Z 1 Z 1
2
jf .x/j dx D jF./j2 d: (5.45)
1 1
Problem 5.22. Using the function from Problem 5.13 and the Parseval’s
theorem, show that
Z 1
dx
D :
0 .1 C x2 /2 4
In physics the Fourier transform is used both for functions depending on time t and
spatial coordinates x, y and z. There are several ways in which the Fourier transform
may be written. Consider first the time Fourier transforms. If f .t/ is some function
satisfying the necessary conditions for the transform to exist, then its direct and
inverse transforms can be written, for instance, as
Z 1 Z 1
i!t d!
F.!/ D f .t/e dt and f .t/ D F.!/ei!t ; (5.46)
1 1 2
i.e. the expected result. The integral representation (5.31) of the delta function was
used here.
Note that the 1=2 multiplier may appear instead in the equation for the direct
transform, or it may also be shared in both expressions symmetrically:
Z 1 Z 1
dt d!
F.!/ D f .t/ei!t p and f .t/ D F.!/ei!t p : (5.47)
1 2 1 2
The mentioned Fourier transforms are related to each other by a trivial constant
pre-factor.
It is also important to mention the sign in the exponential functions in the two
(direct and inverse) transforms: either plus or minus can equivalently be used in the
direct transform; the important point is that then the opposite sign is to be used in
the inverse one. Of course, the images (or transforms) of the function f .t/ calculated
using one or the other formulae would differ. However, since at the end of the day
one always goes back to the direct space using the inverse transform, the final result
will always be the same no matter which particular definition has been used.
Functions depending on spatial coordinates may also be Fourier transformed.
Consider first a function f .x/ depending only on the coordinate x. In this case, using
one of the forms above, one can write for both transforms:
420 5 Fourier Transform
Z 1 Z 1
ikx x dkx
F .kx / D f .x/e dx and f .x/ D F .kx / eikx x : (5.48)
1 1 2
where the triple (volume) integral is performed over the whole infinite space.
Our result can be further simplified by introducing vectors r D .x; y; z/ and
k D kx ; ky ; kz . Then, we finally obtain the direct Fourier transform of the function
f .r/ specified in 3D space:
Z
F.k/ D f .r/eik r dr; (5.49)
where the integration is performed over the whole (reciprocal) space spanned
by the wave vector k.
Problem 5.25. Assuming that the Fourier transform of a function f .r/ is
defined as in Eq. (5.49), prove the following identities:
where k D jkj. Obtain these results in two ways: (i) act with operators r
and , respectively, on both sides of the inverse Fourier transform (5.51) and
(ii) calculate the Fourier transform of rf .r/ and f .r/ directly by means of
Eq. (5.49) (using integration by parts).
Problem 5.26. Consider a vector function g.r/. Its Fourier transform G.k/ is
also some vector function in the k space defined as above for each Cartesian
component of g.r/. Then show that
Problem 5.27. Let f .t/ be some function of time and its Fourier transforms be
given by Eq. (5.46). Show that
@f @2 f
F D i!F.!/ and F D ! 2 F.!/:
@t @t2
and
Z 1 Z
dk d!
f .x; t/ D F.k; !/ei.kx!t/ : (5.53)
1 2 2
Here opposite signs in the exponent before x and t were used as is customarily
done in the physics literature.
422 5 Fourier Transform
Problem 5.30. Act with the Laplacian operator on both sides of Eq. (5.54)
and use Eq. (5.50) to prove that
1
D 4 ı .r/ : (5.55)
r
A different proof of this relationship will be given below in Sect. 5.3.3.
Here we shall show that a particular solution of the Maxwell equations has a form
of a retarded potential. As is shown in electrodynamics, all four Maxwell equations
for the fields can be recast as only two equations for the corresponding scalar ' .r; t/
and vector A .r; t/ potentials:
1 @2 ' 1 @2 A 4
' D 4 and A D J: (5.56)
c2 @t2 c2 @t2 c
These are the so-called d’Alembert equations. Here .r; t/ and J .r; t/ are the charge
and current densities, respectively.
We shall derive here a special solution of these equations corresponding to the
following problem. Suppose, the potentials are known at t 0 where the charges
(responsible for the non-zero charge density ) are stationary. Then, starting from
t D 0 the charges start moving causing time and space dependence of the “sources”
.r; t/ and J .r; t/. Assuming next that the latter are known, we would like to
5.3 Applications of the Fourier Transform in Physics 423
(here and in the following k D jkj). For other terms the transformation is trivial:
1 @2 ' 1 @2 'k .t/
F 2 2 D 2 and F Œ4 .r; t/ D 4 k .t/:
c @t c @t2
Therefore, after the transformation into the Fourier space (or k-space), we obtain the
following time differential equation for the image 'k .t/ of the unknown potential:
1 d2 'k d2 'k
k2 'k D 4 k H) C .kc/2 'k D 4 c2 k: (5.59)
c2 dt2 dt2
This is a linear second order inhomogeneous differential equation which can be
solved using, e.g., the method of variation of parameters, considered in detail in
Sect. I.8.2.2.2.
so that the arbitrary constants above should be zero. Indeed, 'k .0/ D C1 C C2 D 0,
while
d'k 4 c
D ikc C1 eikct C2 eikct C f k . / sin Œkc .t /g Dt
dt k
Z t
C4 c2 k . / cos Œkc .t / d
0
Z
t
D ikc C1 eikct C2 eikct C 4 c2 k . / cos Œkc .t / d;
0
resulting in C1 D C2 D 0. Therefore,
Z
4 c t
'k .t/ D k . / sin Œkc .t / d
k 0
Z Z
4 c t
0
D d sin Œkc .t / dr0 .r0 ; /eik r ;
k 0
where in the second passage we have replaced the Fourier transform of the density
by the density itself using the Fourier transform of it. Substituting now the image
of the potential we have just found into the inverse Fourier transform of it, the first
Eq. (5.58), we obtain after slight rearrangements:
Z Z Z
c 0
t dk ik .rr0 /
' .r; t/ D 2
dr d r0 ; e sin .kc .t // : (5.60)
2 0 k
The 3D integral over k within the square brackets may only depend on the length R
of the vector R D r r0 since R appears only in the dot product with the vector k
and we integrate over the whole reciprocal k space. Therefore, this integral can be
simplified by directing R along the z axis and then using spherical coordinates.
5.3 Applications of the Fourier Transform in Physics 425
Problem 5.32. Using this method, show that integration over the spherical
angles of k yields
Z Z 1 Z 1
dk ik R 2
e sin .k/ D cos .k . R// dk cos .k . C R// dk ;
k R 0 0
where D c .t /.
In these integrals one can recognise Dirac delta functions, see Eq. (4.24), so that
Z
dk ik R 2 2
e sin Œkc .t / D Œı .c .t / R/ ı .c .t / C R/ :
k R
The two delta functions give rise to two integrals over . We shall consider them
separately. The first one, after changing the variable ! D c .t / R,
results in
Z t Z
1 ctR RC
d r0 ; ı .c .t / R/ D r0 ; t ı ./ d:
0 c R c
If the point D 0 does not lie within the limits, the integral is equal to zero. Since
R > 0, the integral is zero if ctR < 0, i.e. if R > ct or t < R=c. If, however, t > R=c
or R < ct, then the integral results in .1=c/ .r0 ; t R=c/ by virtue of the filtering
theorem for the delta function. The second delta function does not contribute to the
integral in this case since after the change of variables ! D c .t / C R
the limits of become ct C R and CR which are both positive, and hence exclude
the point D 0. Therefore, only the first delta function contributes for t > R=c and
we finally obtain
Z
dr0 jr r0 j 0 jr r0 j
' .r; t/ D H t r ; t ; (5.61)
jr r0 j c c
where H.x/ is the Heaviside function. The Heaviside function indicates that if there
is a charge density in some region around r0 D 0, then the effect of this charge will
only be felt at points r at times t > R=c ' r=c. In other words, the field produced
by the moving charges is not felt immediately at any point r in space; the effect of
the charges will propagate to the point r over some time and will only be felt there
when t r=c. Also note that for t D 0 the Heaviside function is exactly equal to
zero and hence is the potential—in full accordance with our initial conditions.
426 5 Fourier Transform
The second equation in (5.56) for the vector potential can be solved in the same
way. In fact, these are three scalar equations for the three components of A, each
being practically identical to the one we have just solved. Therefore, the solution
for each component of A is obtained by replacing with J˛ /c (where ˛ D x; y; z) in
formula (5.61), and then combining all three contributions together :
Z
1 dr0 jr r0 j 0 jr r0 j
A .r; t/ D H t J r ; t : (5.62)
c jr r0 j c c
This is a retarded solution as well. Hence, either of the two potentials at time
t is determined by the densities or J calculated at time t jr r0 j =c in the
past. In other words, the effect of the densities (the “sources”) from point r0 is
felt at the “observation” point r only after time jr r0 j =c which is required for
the light to travel the distance jr r0 j between these two points. The other term
which contains the delta function ı .c .t / C R/ (which we dropped) corresponds
to the advanced part of the general solution. This term is zero for the boundary
conditions we have chosen. In the case of general initial and boundary conditions
both potentials contribute together with the free term (the one which contains the
constants C1 and C2 ).
Problem 5.33. Using the Fourier transform method, show that the solution of
the Poisson equation
'.r/ D 4 .r/
of classical electrostatics is
Z
.r0 / 0
'.r/ D dr : (5.63)
jr r0 j
1 @2 y @2 y
2 2
D 2;
v @t @x
(continued)
5.3 Applications of the Fourier Transform in Physics 427
@2 n @n @n
D D ;
@x2 @x @t
where D is the diffusion coefficient and n.x; t/ is the density of the particles.
Perform the Fourier transform of this equation with respect to x and solve the
obtained ordinary differential equation assuming that initially at t D 0 all
particles were located at some point x0 , i.e. n .x0 ; 0/ D n0 ı .x x0 /. Finally,
performing the inverse Fourier transform of the obtained solution, show that
n0 2
n.x; t/ D p e.xx0 t/ =4Dt
:
4 Dt
Describe what happens to the particles’ distribution over time.
The so-called Green’s functions allow establishing a general formula for a particular
integral of an inhomogeneous differential equation as a convolution with the
function in the right-hand side. And the Fourier transform method has become an
essential tool in finding the Green’s functions for each particular equation. Let us
consider this question in a bit more detail.
Consider an inhomogeneous differential equation for a function y.x/:
where L is some operator acting on y.x/. For instance, the forced damped harmonic
oscillator is described by the equation
d2 y dy f .t/
2
C 2!0 C !02 y D ; (5.65)
dt dt m
where !02 is its fundamental frequency, friction and m mass, while f .t/ is an
external excitation signal, and hence the operator
d2 d
LD C 2!0 C !02 :
dt2 dt
428 5 Fourier Transform
where the operator L acts on the variable x only, i.e. x0 serves as a parameter, and in
the right-hand side we have a Dirac delta function. The function G .x; x0 / is called
Green’s function of the differential equation (5.64) or its fundamental solution.
It is easy to see then that a general solution of our differential equation for any
function f .x/ in the right-hand side of the differential equation can be written as its
convolution with the Green’s function:
Z
y.x/ D G x; x0 f x0 dx0 : (5.67)
Indeed, by acting with the operator L (recall that it acts on x, not on the integration
variable x0 ) on both sides, we obtain
Z Z
0 0 0
Ly.x/ D LG x; x f x dx D ı x x0 f x0 dx0 D f .x/;
where in the last passage we have used the filtering theorem for the delta function.
This calculation proves that formula (5.67) does indeed provide a particular solution
of our Eq. (5.64) for an arbitrary right-hand side f .x/.
Let us now consider some examples, both for ordinary and partial differential
equations.
We shall start by looking for the Green’s function of the damped (small friction)
harmonic oscillator equation. The Green’s function G.t/ satisfies the differential
equation (5.65) with f .x/=m replaced by ı.t/ (one can set t0 to zero here, the Green’s
function would only depend on the time difference t t0 anyway). We shall use the
Fourier transform method. If we define
Z 1
G.!/ D F ŒG.t/ D G.t/ei!t dt;
1
d2 G dG
2
C 2!0 C !02 G D ı.t/
dt dt
for the Green’s function transforms into
yielding immediately
1
G.!/ D :
! 2 C 2i!0 ! !02
5.3 Applications of the Fourier Transform in Physics 429
Therefore, using the inverse Fourier transform, we can write for the Green’s function
we are interested in:
Z 1 Z 1
i!t d! 1 ei!t
G.t/ D G.!/e D 2
d!:
1 2 2 2
1 ! C 2i!0 ! !0
The !-integral is most easily taken in the complex z plane. Indeed, there are two
poles here,
p
!˙ D !0 i ˙ 1 2 D ˙$ i!0 ;
which are solutionsp of the quadratic equation ! 2 C 2i!0 ! !02 D 0. Here the
frequency $ D !0 1 2 is positive (and real) as < 1 for a weakly damped
oscillator. Both poles lie in the lower part of the complex plane. We use the contour
which closes the horizontal axis with a semicircle of radius R ! 1 either in the
upper or lower part of the complex plane. To choose the contour, we have to consider
two cases: t > 0 and t < 0. In the case of positive times, R the exponent i!t !
izt D i .x C iy/ t D ixt C yt, so that the integral CR over the semicircle of
radius R of the contour will tend to zero for y < 0, i.e. we have to choose the
semicircle in the lower part of the complex plane, and both poles would contribute.
Therefore,
I
1 eizt
G.t/ D 2
dz
2 2
C z C 2i!0 z !0
2 i eizt eizt
D Res 2 ; ! C C Res ; !
2 z C 2i!0 z !02 z2 C 2i!0 z !02
i!C t
ei!C t ei! t e ei! t sin .$t/ !0 t
Di C Di C D e :
2!C C2i!0 2! C2i!0 2$ 2$ $
Note that the contour runs around the poles in the clockwise direction bringing an
additional minus sign. In the case of t < 0, one has to enclose the horizontal axis in
the upper part of the complex plane where there are no poles, and the result is zero.
Therefore, the Heaviside function H.t/ appears in the Green’s function in front of
the above expression.
Problem 5.36. Show that the Green’s function corresponding to the operator
L D C d=dt is G.t/ D H.t/et .
Problem 5.37. Show that if L D . C d=dt/2 , then G.t/ D H.t/tet .
Now let us briefly touch upon the issue of calculating Green’s functions for
partial differential equations. Consider, as an example, the Poisson equation,
In this case the boundary conditions are important, as many Green’s functions exist
for the same equation depending on these. We shall consider the case when the
Green’s function vanishes at infinity, i.e. G.r/ ! 0 when r ! 1. In this case we
can apply to G.r/ the Fourier transform,
Z Z
dk
G.k/ D eik r G.r/dr and G.r/ D eik r G.k/:
.2 /3
In the k-space the differential equation reads simply k2 G.k/ D 1, so that
G.k/ D 1=k2 . Correspondingly, the Green’s function G.r/ can be calculated using
the inverse Fourier transform taken in spherical coordinates and with the vector r
directed along the z axis:
Z Z 1 Z Z 2
1 dk 1 ikr cos #
G.r/ D eik r 2 D dk e sin #d# d
.2 /3 k .2 /3 0 0 0
Z 1 Z 1
1 eikr eikr 1 sin p 1
D 2
dk D 2 dp D ;
.2 / 0 ikr 2 r 0 p 4 r
where we have used Eq. (2.122). Using this Green’s function, one can easily write
a solution of the inhomogeneous Poisson equation '.r/ D 4 .r/ as a
convolution:
Z Z
0 0 dr0
'.r/ D G r r0 4 r dr0 D r ;
jr r0 j
Problem 5.38. Consider the Green’s function for the so-called Helmhotz
partial differential equation. The Green’s function is defined via
Then take the k-integral using two methods: (i) replace p ! p C i and then,
performing the integration in the complex plane, show that in the ! C0 limit
G.r/ D eipr =.4 r/; (ii) similarly, by replacing p ! p i show that in this
case G.r/ D eipr =.4 r/.
5.3 Applications of the Fourier Transform in Physics 431
1 @ .r; t/
.r; t/ D ; (5.70)
@t
and correspondingly yet another definition of the Green’s function. Here .r; t/
describes the probability density of a particle to be found at point r at time t,
and is the diffusion constant. First of all, let us determine the solution of this
equation corresponding to the initial condition that the particle was in the centre of
the coordinate system at time t D 0. In this case .r; 0/ D ı.r/, since integrating this
density over the whole space gives the total probability equal to unity, as required.
Again, we shall use the Fourier transform method. Define
Z Z
dk
.k; t/ D eik r .r; t/dr and .r; t/ D eik r .k; t/;
.2 /3
1 @ .k; t/ 2
k2 .k; t/ D H) .k; t/ D Cek t :
@t
The arbitrary constant C is found using the initial condition:
Z Z
.k; 0/ D eik r .r; 0/dr D eik r ı.r/dr D 1;
1 2 =4t
.r; t/ D 3=2
er : (5.71)
.4 t/
Another interesting point to mention, which also justifies the solution (5.71) to
be called the Green’s function of the diffusion equation, is that a general solution
of the equation at time t for arbitrary initial distribution .r; 0/ can be written as a
convolution with the Green’s function:
Z
0 0
.r; t/ D G r r0 ; t r ; 0 dr ; (5.72)
Problem 5.39. Prove that the density (5.72) remains properly normalised for
any t 0, i.e.
Z Z
.r; t/ dr D .r; 0/ dr:
R
[Hint: demonstrate by direct integration that G.r; t/dr D 1.]
Problem 5.40. Show that the solution of the differential equation
1 @G.r; t/ 1
G.r; t/ C D ı .r/ ı .t/ (5.73)
@t
is H.t/G .r; t/, where H.x/ is the Heaviside unit step function and G .r; t/
is the Green’s function given by Eq. (5.71). [Hint: use the corresponding
generalisation of the Fourier transformation (5.52) written for both (all three)
spatial and (one) temporal variables. Then, when calculating the inverse Fourier
transform, calculate first the ! integral in the complex plane, then perform a
direct integration in the k space.]
Problem 5.41. Calculate the Green’s function G.x; t/ of the one-dimensional
Schrödinger equation for a free electron:
„2 @2 @
C i„ G.x; t/ D i„ı .x/ ı .t/ : (5.74)
2m @x2 @t
(continued)
5.3 Applications of the Fourier Transform in Physics 433
where, as usual, H.t/ is the Heaviside function. The -integral diverges in the
strict mathematical sense. However, one can treat it as a generalised function
and use the following regularisation:
Z 1 Z 1 r r
i˛2 .i˛C /2
e d ) lim e d D lim D ;
1 !C0 1 !C0 i˛ C i˛
This formula plays a central role in the theory of path integrals developed by R.
Feynman, which serves as an alternative formulation of quantum mechanics.
Note that some justification for the manipulations made in the last problem can be
made by noticing that the equation for the Green’s function (5.74) can be formally
considered identical to the diffusion equation (5.73) by choosing D i„=2m in the
latter. Then, also formally, the Green’s function for the Schrödinger equation can be
obtained directly from Eq. (5.71) using the corresponding substitution.
where f .t/ D f .t/ hf .t/i corresponds to the deviation of f .t/ from its average
hf .t/i and the angle brackets correspond to the averaging over the statistical
ensemble. Here the average hA.t/i is understood as a mean value of the function
P A.t/
(in general, of coordinates and velocities) over the ensemble, hA.t/i D i wi Ai .t/,
where the sum is taken over all systems in the ensemble i, wi is the probability
to find the system i in the ensemble and Ai .t/ is the value of the function A.t/
reached by the particular system i. For instance, if we would like to calculate the
velocity autocorrelation function Kvv .; t/ of a Brownian particle in a liquid, then
f . / 7! v.t/ and an ensemble would consist of different identical systems in which
initial (at t D 0) velocities and positions of particles in the liquid are different, e.g.
drawn from the equilibrium (Gibbs) distribution. In this distribution some states of
the liquid particles (i.e. their positions and velocities) are more probable than the
others, this is determined by the corresponding distribution of statistical mechanics.
Finally, the sum over i in the definition of the average would correspond to an
integral over all initial positions and momenta of the particles.
Two limiting cases are worth mentioning. If there are no correlations in the values
of the observable f .t/ at different times, then the average of the product at two times
is equal to the product of individual averages,
hf i D hf hf ii D hf i hf i D 0
where we have added to both times at the last step. Note that we also omitted the
time t in the argument of the correlation function for simplicity of notations. Also,
under these conditions, the average hf .t/i does not depend on time t.
In the following we shall measure the quantity f .t/ relative to its average value
and hence will drop the symbol from the definition of the correlation function.
There is a famous theorem due to Wiener and Khinchin which we shall briefly
discuss now. Let us assume that the Fourier transform f .!/ can be defined for the
stochastic variable f .t/. Next, we define the spectral power density as the limit:
1D E
S .!/ D lim ST .!/ D lim jfT .!/j2 ; (5.77)
T!1 T!1 T
where
Z T=2
fT .!/ D f .t/ei!t dt
T=2
is the partial Fourier transform, i.e. f .!/ D limT!1 fT .!/. Consider now ST .!/ as
defined above:
*ˇZ ˇ2 + *Z Z T=2 +
1 ˇ T=2 ˇ 1 T=2
ˇ i!t1 ˇ i!.t1 t2 /
ST .!/ D ˇ dt e f .t1 /ˇ D dt1 dt2 e f .t1 / f .t2 /
T ˇ T=2 1 ˇ T T=2 T=2
Z Z
1 T=2 T=2
D dt1 dt2 ei!.t1 t2 / hf .t1 / f .t2 /i
T T=2 T=2
Z Z ˇ ˇ
1 T=2 T=2 ˇ t2 ! D t1 t2 ˇ
D dt1 dt2 e i!.t1 t2 / ˇ
Kff .t1 t2 / D ˇ ˇ
T T=2 T=2 d D dt2 ˇ
Z Z t1 CT=2
1 T=2
D dt1 d ei! Kff ./ : (5.78)
T T=2 t1 T=2
The next step consists of interchanging the order of integration: we shall perform
integration over t1 first and over second. The integration region on the .t1 ; / plane
is shown in Fig. 5.5. When choosing the integration to be performed last, one has
to split the integration region into two regions: T < 0 and 0 T. This
yields
Z 0 Z CT=2 Z Z
1 1 T T=2
ST .!/ D ei! Kff . / d dt1 C ei! Kff ./ d dt1
T T T=2 T 0 T=2
Z 0 Z
1 T
D Kff . / .T C / ei! d C Kff ./ .T / ei! d
T T 0
Z Z
1 T T
jj i!
D .T j j/ ei! Kff . / d D 1 e Kff ./ d:
T T T T
436 5 Fourier Transform
which is the required result. It shows that the spectral power density is in fact the
Fourier transform of the correlation function.
As an example of calculating a time correlation function, let us consider a one-
dimensional Brownian particle. Its equation of motion reads
pP D p C .t/; (5.80)
where p.t/ is the particle momentum, friction coefficient and .t/ a random
force. The two forces in the right-hand side are due to random collisions of liquid
particles with the Brownian particle: the random force tends to provide energy to
the Brownian particle, while the friction force, p, is responsible for taking any
extra energy out, so that on balance the average kinetic energy of the particle,
1 ˝ ˛ 1
p.t/2 D kB T; (5.81)
2m 2
would correspond correctly to the temperature T of the liquid by virtue of the
equipartition theorem. Above m is the Brownian particle mass and kB is the
Boltzmann’s constant.
5.3 Applications of the Fourier Transform in Physics 437
The random force acting on the particle does not have any memory3 ; corre-
spondingly, we shall assume that its correlation function is proportional to the delta
function of the time difference:
˝ ˛
.t/ t0 D ˛ı t t0 ; (5.82)
.!/
i!p.!/ D p.!/ C .!/ H) p.!/ D i ;
! i
where .!/ is the Fourier transform of the random force. The correlation function
becomes
Z
d!d! 0 i!.tC / i! 0 t ˝ ˛
Kpp . / D 2
e e p.!/p ! 0
.2 /
Z
d!d! 0 i!.tC/ i! 0 t h.!/ .! 0 /i
D e e :
.2 /2 .! i / .! 0 i /
To calculate the correlation function of the random force, we shall return back from
frequencies to times by virtue of the random force Fourier transform:
Z Z Z Z
˝ ˛ 0 0 ˝ ˛ 0 0
.!/ ! 0 D dt dt0 ei!t ei! t .t/ t0 D ˛ dt dt0 ei!t ei! t ı tt0
Z
0
D˛ dt ei.!C! /t D 2 ˛ı ! C ! 0 :
Therefore,
Z
d!d! 0 2 ˛ı .! C ! 0 /
0
Kpp . / D 2
ei!.tC/ ei! t
.2 / .! i / .! 0 i /
Z Z
d! i! 1 d! ei!
D ˛ e D˛ :
2 .! i / .! i / 2 !2 C 2
3
The case with the memory will be considered in Sect. 6.5.2.
438 5 Fourier Transform
The integral is performed in the complex z plane. There are two poles z D ˙i ,
one in the upper and one in the lower halves of the complex plane. For > 0 we
enclose the horizontal axis in the upper half where only the pole z D i contributes,
which gives Kpp . / D .˛=2 / e . For negative times < 0 the horizontal axis
is enclosed by a large semicircle in the lower part of the complex plane, where the
pole z D i contributes and an extra minus sign comes from the opposite direction
in which the contour is traversed. This gives Kpp . / D .˛=2 / e . Therefore, for
any time, we can write
˛ jj
Kpp . / D e D mkB Tej j : (5.83)
2
Several interesting observations can be made. Firstly, the correlation function does
indeed decay (exponentially) with time, i.e. collisions with particles of the liquid
destroy any correlation in motion of the Brownian particles on the time scale of 1= .
Indeed, as one would expect, the stronger the friction, the faster any correlation in
the particle motion is destroyed as the Brownian particle undergoes many frequent
collisions; if the friction is weak, the particle moves farther without collisions,
collisions happen less frequently and hence the correlation between different times
decays more slowly. Secondly,
˝ ˛ at D 0 we obtain the average momentum square,
Kpp .0/ D hp.t/p.t/i D p.t/2 , which at equilibrium should remain unchanged on
average and equal to mkB T in accordance with Eq. (5.81). And indeed, Kpp .0/ D
mkB T as required. It is this particular physical condition which fixes the value of the
constant ˛ in Eq. (5.82). Finally, the correlation function is indeed even with respect
to the time variable as required by Eq. (5.76).
which experiences a random force .t/ satisfying Eq. (5.82) with the same
˛ D 2mkB T as above. Show first that after the Fourier transform the particle
velocity v.!/ D xP .!/ D i!x.!/ is given by
i !.!/
v.!/ D ;
m ! 2 !02 i!
p
where !0 D k=m is the fundamental frequency of the oscillator. Then, using
this result show that the velocity autocorrelation function
Z
2kB T d! !2
Kvv .t/ D hv.t/v.0/i D ei!t :
m 2 ! 2 ! 2 2 C .!/2
0
(continued)
5.3 Applications of the Fourier Transform in Physics 439
and
2 p 3
kB T 4 p sinh Dt
hv.t/v.0/i D cosh Dt p 5 ejtj=2 if > 2!0 :
m 2 D
˝ ˛
Above, D D !02 2 =4. Note that v.0/2 D kB T=m as required by the
equipartition theorem.
Consider propagation of light through a wall with a hole in it, an obstacle called
an aperture. On the other side of the wall a diffraction pattern will be seen because
of the wave nature of light. If in the experiment the source of light is placed far
away from the aperture so that the coming light can be considered as consisting of
plane waves, and the observation of the diffraction pattern is made on the screen
placed also far away from the aperture, this particular diffraction bears the name of
Fraunhofer.
Consider an infinitely adsorbing wall (i.e. fully non-radiative) placed in the x y
plane with a hole of an arbitrary shape. The middle of the hole (to be conveniently
defined in each particular case) is aligned with the centre of the coordinate system,
Fig. 5.6. The positive direction z is chosen towards the observation screen (on the
right in the figure). Each point A within the aperture with the vector r D .x; y; 0/
(more precisely, a small surface area dS D dxdy at this point) becomes an
independent source of light waves which propagate out of it in such a way that
their amplitude
1 i!.tR=c/
dF.g/ / e dS;
R
where R D jRj is the distance between the point A and the observation point P
which position is given by the vector g drawn from the centre of the coordinate
system, c speed of light and ! frequency (k D !=c is the wave vector). Note that
the amplitude of a spherical wave decays as 1=R with distance, and we have also
440 5 Fourier Transform
Fig. 5.6 Fraunhofer diffraction: parallel rays of light from a source (on the left) are incident on
the wall with an aperture. They are observed at point P on the screen which is far away from the
wall. Every point A on the x y plane within the aperture serves as a source of a spherical wave of
light. All such waves from all points within the aperture contribute to the final signal observed at P
explicitly accounted for in the formula above the retardation effects (see Sect. 5.3.2).
In order to calculate the total contribution at point P, one has to integrate over the
whole area of the aperture:
Z
dS ikR
F .g/ / ei!t e : (5.84)
S R
The observation point is positioned far away from the small aperture, i.e. r g, see
Fig. 5.6. In this case the distance R D jg rj can be worked out using the cosine
theorem as follows:
s
p 2
2 2
r r
R D g C r 2g r D g 1 C 2Og
g g
r
r r
' g 1 2Og ' g 1 C gO D g C gO r;
g g
where k D kOg and we have made use again of the fact that the observation point is far
away from the aperture and hence have removed the gO r term in the denominator. We
have also omitted the whole pre-factor at the last steps; we shall reinstate it later on.
5.3 Applications of the Fourier Transform in Physics 441
In order to connect the direction g (or k D kOg) with the coordinates x0 and y0
of the observation point P on the screen, we notice that projections of this point
on both the screen and on the wall are the same as the two planes are parallel to
each other, see additional dotted lines in the figure. In particular, the point B on
the wall corresponds to the observation point P on the screen. Therefore, gx D x0
and gy D y0 . If by z we denote the distance between the screen and the wall, then
z D gz ' g, and hence
k k k
kx D kOgx D gx ' x0 and ky ' y0 ; (5.86)
g z z
0 0
yielding F .g/ being directly
dependent on the coordinates .x ; y / on the screen, i.e.
it can be written as F kx ; ky .
In order to work out the pre-factor to the integral in Eq. (5.85), we first notice
that this integral is performed over the aperture area only. It can be extended to the
whole x y plane if we introduce the so-called aperture function .x; y/, which is
equal to one within the aperture and zero otherwise. Then,
Z
F kx ; ky D .x; y/ ei.kx xCky y/ dxdy; (5.87)
This is the final result. The amplitude is proportional to the two-dimensional Fourier
transform of the aperture function. The intensity distribution at point P .x0 ; y0 / on the
ˇ ˇ2
screen is then given by dI .x0 ; y0 / D ˇF kx ; ky ˇ dkx dky , where the relations between
the components of the wave vector k and the coordinates .x0 ; y0 / on the screen are
given by Eq. (5.86).
p Z
R
krr0
F r0 ; 0
D I0 rJ0 dr;
0 z
where J0 .x/ is the corresponding Bessel function, Sect. 4.6, and we have made
use of a well-known integral representation for it, Eq. (4.230):
Z 2
1
J0 .x/ D eix cos d :
2 0
Finally, using the recurrence relation (4.218) for the Bessel functions, integrate
over r and show that
p
0 0 Rz I0 kR 0
F r; D J1 r :
kr0 z
in this case.
5.3 Applications of the Fourier Transform in Physics 443
Problem 5.45. Consider now an infinite slit of width h running along the y
direction. In that case the light arriving at a point .x; y/ within the aperture will
only be diffracted in the xz plane. Repeating the above derivation which led us
to Eq. (5.88), show that in this essentially one-dimensional case the amplitude
on the screen per unit length along the slit is given by the following formula:
r Z r
I0 I0 kh 0
F .kx / D e dx D h
ikx x
sinc x ; (5.90)
2 2 2z
6.1 Definition
The idea of why the LT might be useful can be understood if we recall why the
FT could be useful. Indeed, the FT was found useful, for instance, in solving
differential equations (DE): (1) one first transforms the DE into the Fourier space,
i.e. fx; f .x/g ! f; F./g; (2) the DE in the Fourier space (the -space) for
1
In the following, references to the first volume of this course (L. Kantorovich, Mathematics for
natural scientists: fundamentals and basics, Springer, 2015) will be made by appending the Roman
number I in front of the reference, e.g. Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of
the first volume, respectively.
the “image” F./ appears to be simpler (e.g. it becomes entirely algebraic with
no derivatives) and can then be solved; then, finally, (3) using the inverse FT,
f; F./g ! fx; f .x/g, the original function f .x/ we are interested in is found. In
other words, to solve the problem, one “visits” the Fourier space where the problem
at hand looks simpler, solves it there, but then “returns back” to the original x-space
by means of the inverse FT. Similarly one operates with the LT: first, the problem
of finding the function f .t/ is converted into the “Laplace space” by performing the
LT (t ! p and f .t/ ! F.p/), where p is a complex number; then the “image” F.p/
of the function of interest is found, which is then converted back into the t-space,
F.p/ ! f .t/, by means of the inverse LT. In fact, we shall see later on that the two
transforms are closely related to each other.
Consider a (generally) complex function f .t/ of the real argument t 0 which is
continuous everywhere apart from some number of discontinuities of the first kind.2
Moreover, we assume that within each finite interval there could only be a finite
number of such discontinuities. Next, for definiteness, we shall set the values of f .t/
at negative t to zero: f .t/ D 0 for t < 0. Finally, we shall assume that f .t/ may
increase with t; but this cannot happen faster than for an exponential function ex0 t
with some positive exponent x0 . In other words, we assume that
where M is some positive constant. An example of such a function is, for instance,
the exponential e2t . It goes to infinity when t ! C1, however, this does not happen
2
faster than ex0 t with any x0 > 2. At the same time, the function f .t/ D e2t grows
x0 t
much faster than the exponential function e with any x0 , and hence this class
of functions is excluded from our consideration. Note that if f .t/ is limited, i.e.
jf .t/j M, then x0 D 0. The positive number x0 characterising the exponential
growth of f .t/ we shall call the growth order parameter of f .t/. We shall call the
function f .t/ the original.
We shall then define the Laplace transform (LT) of f .t/ by means of the following
formula:
Z 1
L Œf .t/ D f .t/ept dt D F.p/: (6.2)
0
Here f .t/ is a function in the “t-space”, while its transform, F.p/ D L.f /, is the
corresponding function in the “p-space”, where p is generally a complex number.
The function F.p/ in the complex plane will be called image of the original f .t/.
Before we investigate the properties of the LT, it is instructive to consider some
examples.
2
Recall that this means that one-sided limits, not equal to each other, exist on both sides of the
discontinuity.
6.1 Definition 447
A very important point here is that we set to zero the value of the exponential
function at the t D C1 boundary. This is legitimate only if a part of the complex
plane C is considered for the complex numbers p. Indeed, if we only consider the
numbers p D x C iy with the positive real part, Re.p/ D x > 0, i.e. the right semi-
plane of C, then ept D ext eiyt ! 0 at tˇ ! 1 ˇ and this ensures the convergence
of the integral at the upper limit (note that ˇeiyt ˇ D 1).
Let us next calculate the LT of f .t/ D e˛t with some complex ˛. According to
the definition,
Z ˇ1
1
e.pC˛/t ˇˇ 1
L Œf D e.pC˛/t dt D ˇ D :
0 .p C ˛/ 0 pC˛
To ensure the convergence of the integral at the upper limit we have to consider
only such values of p in the complex plane C which satisfy the following condition:
Re.p C ˛/ > 0. Indeed, only in this case e.pC˛/t ! 0 as t ! 1. In other words,
the LT of the function e˛t exists for any p to the right of the vertical line drawn in
C via the point Re.˛/.
Since the function f .t/ enters the integral (6.2) linearly, the LT represents a linear
operator, i.e. for any two functions f .t/ and g.t/ satisfying the necessary conditions
outlined above,
Z 1
L Œ˛f .t/ C ˇf .t/ D Œ˛f .t/ C ˇf .t/ ept dt D ˛L Œf C ˇL Œg ; (6.3)
0
where ˛ and ˇ are arbitrary complex numbers. The linearity property allows
simplifying the calculation of some transforms as illustrated by the example of
calculating the LT of the sine and cosine functions f .t/ D cos !t and g.t/ D sin !t
(assuming ! is real). Since
1 i!t
f .t/ D cos !t D e C ei!t ;
2
we write
1˚ 1 1 1 p
L Œcos !t D L ei!t C L ei!t D C D :
2 2 p i! p C i! p2 C !2
(6.4)
Here we must assume that Re .p/ > 0.
448 6 Laplace Transform
Problem 6.2. Obtain the same formulae for the LT of the sine and cosine
functions directly by calculating the LT integral.
Problem 6.3. Prove that for Re.p/ > 0
2!p p2 ! 2
L Œt sin !t D and L Œt cos !t D : (6.6)
.p2 C ! 2 /2 .p2 C ! 2 /2
! pC˛
L e˛t sin !t D 2
and L e˛t cos !t D :
.p C ˛/ C !2 .p C ˛/2 C ! 2
(6.8)
Problem 6.7. It was shown above that L t0 D 1=p. Show that L Œt D 1=p2 .
Then prove by induction that
nŠ
L Œtn D ; n D 0; 1; 2; 3 : : : : (6.10)
pnC1
nŠ
L tn e˛t D ; n D 0; 1; 2; 3 : : : : (6.11)
.p C ˛/nC1
6.1 Definition 449
Problem 6.9. In fact, show that the following result is generally valid:
dn
L Œ.t/n f .t/ D F.p/; (6.12)
dpn
where L Œf D F.p/. [Hint: the formula follows from the definition (6.2) upon
differentiating its both sides n times.]
Problem 6.10. Using the rule (6.12), rederive Eq. (6.10).
Problem 6.11. Using the rule (6.12), rederive Eq. (6.11).
Problem 6.12. Show that
1 1 pC˛
L e˛t cos2 .ˇt/ D C :
2.p C ˛/ 2 .p C ˛/2 C 4ˇ 2
As another useful example, let us consider the LT of the Dirac delta function
f .t/ D ı .t / with some positive > 0:
Z 1
L Œı .x / D ı .t / ept dt D ep : (6.14)
0
L Œı .t/ D 1: (6.15)
.
450 6 Laplace Transform
Here we shall consider the LT in more detail including the main theorems related to
it. Various properties of the LT which are needed for the actual use of this method
in solving practical problems will be considered in the next section.
Theorem 6.1. If f .t/ is of an exponential growth, i.e. it goes to infinity not faster
than the exponential function ex0 t with some positive growth order parameter
x0 > 0, then the LT LŒf .t/ D F.p/ of f .t/ exists in the semi-plane Re.p/ > x0 .
Proof. If f .t/ is of the exponential growth, Eq. (6.1), where the positive number x0
may be considered as a characteristic exponential of the function f .t/, then the LT
integral (6.2) converges absolutely. Indeed, consider the absolute value of the LT of
f .t/ at the point p D x C iy:
ˇZ 1 ˇ Z 1 Z 1
ˇ ˇ ˇ ˇ ˇ ˇ
ˇ
jF.p/j D ˇ pt ˇ
f .t/e dtˇ ˇ f .t/ept ˇ
dt D jf .t/j ˇept ˇ dt
0 0 0
Z 1
D jf .t/j ext dt;
0
so that
Z 1 Z 1 Z 1
jF.p/j jf .t/j ext dt M ex0 t ext dt D M e.xx0 /t dt
0 0 0
ˇ1
M ˇ
D e.xx0 /t ˇ :
x x0 0
If x x0 D Re.p/ x0 > 0, i.e. if Re.p/ > x0 , then the value of the expression above
at the upper limit t D 1 is zero and we obtain
ˇZ ˇ
ˇ 1 ˇ M
jF.p/j D ˇˇ f .t/ept dtˇˇ ; (6.16)
0 x x0
which means that the LT integral converges absolutely and hence the LT F.p/ exists.
Q.E.D. Note that the estimate above establishes the uniform convergence of the
LT integral with respect to the imaginary part of p since jF.p/j is bounded by an
expression containing only the real part of p. We shall use this fact later on in
Sect. 6.2.3.
6.2 Detailed Consideration of the Laplace Transform 451
It follows from this theorem that F.p/ ! 0 when jpj ! 1 since the expression
in the right-hand side of (6.16) tends to zero in this limit. Note that since Re .p/ > x0 ,
this may only happen for arguments of p D jpj ei satisfying the inequality
=2 < < =2 with the points ˙ =2 strictly excluded.
Theorem 6.2. If f .t/ is of an exponential growth with the growth order parameter
x0 > 0, then the LT LŒf .t/ D F.p/ of f .t/ is an analytical function of p in the
complex semi-plane Re.p/ > x0 .
The first term is equal to zero both at t D 0 (because of the t present there) and at
t D 1 if the condition x D Re.p/ > Re.x0 / is satisfied (note that the exponential
e˛t with ˛ > 0 tends to zero much faster than any power of t when t ! 1). Then,
only the integral in the right-hand side remains which is calculated trivially to yield:
Z 1 Z 1 ˇ1
1 1 ˇ 1
te.xx0 /t dt D e.xx0 /t dt D 2
e.xx0 /t ˇ D ;
0 x x0 0 .x x0 / 0 .x x0 /2
i.e. it is indeed finite. This means that the function F.p/ is analytical (can be
differentiated).Q.E.D.
It also follows from the two theorems that F.p/ does not have singularities in
the right semi-plane Re.p/ > x0 , i.e. all possible poles of F.p/ can only be in
the complex plane on the left of the vertical line Re.p/ D x0 . Therefore, the
growth order parameter demonstrating the growth of the function f .t/ in the “direct”
t-space also determines the analytical properties of the “image” F.p/ in the complex
p-plane.
452 6 Laplace Transform
Since in the left-hand side we have L Œt sin !t, we immediately reproduce one of
the formulae in Eq. (6.6). The other formula is reproduced similarly.
It appears that the LT and the FT are closely related. We shall establish here the
relationship between the two which would also allow us to derive a direct formula
for the inverse LT. A different (and more rigorous) derivation of the inverse LT will
be given in the next subsection.
Consider a function f .t/ for which the LT exists in the region Re.p/ > x0 . If we
complement f .t/ with an extra exponential factor eat with some real parameter a,
we shall arrive at the function ga .t/ D f .t/eat which for any a > x0 will tend to
zero at the t ! C1 limit:
ˇ ˇ
jga .t/j D ˇf .t/eat ˇ Me.ax0 /t :
Therefore, the FT Ga ./ of ga .t/ can be defined3 It will depend on both a and :
Z 1 Z 1 Z 1
Ga ./ D ga .t/ei2 t
dt D f .t/eat ei2 t
dt D f .t/e.aCi2 /t
dt:
1 0 0
(6.17)
Note that we have replaced the bottom integration limit by zero as f .t/ D 0 for any
negative t. The inverse FT of Ga .t/ is then given by:
Z 1
i2 t ga .t/ D f .t/eat ; t > 0
Ga ./e dt D :
1 0 ; t<0
3
Since f .t/ is piecewise continuous, so is ga .t/, and hence the other Dirichlet condition is also
satisfied for the FT to exist.
6.2 Detailed Consideration of the Laplace Transform 453
The number p D a C i2 is some complex number, so that Eqs. (6.17) and (6.18)
can also be alternatively written as:
Z 1
Ga ./ D f .t/ept dt (6.19)
0
and
Z 1
f .t/ D Ga ./ept d: (6.20)
1
One can recognise in Eq. (6.19) the LT of the function f .t/, i.e. Ga ./ D L Œf .t/ D
F.p/. In the other Eq. (6.20) we shall change the integration variable from to
p D a C i2 and will replace Ga ./ with F.p/. This gives
Z aCi1
1
f .t/ D F.p/ept dp: (6.21)
2 i ai1
This formula provides a recipe for the inverse LT. Here a is any positive number
such that a > x0 . One can see that in order to calculate f .t/ from its LT F.p/, one
has to perform an integration in the complex plane of the function F.p/ept along the
vertical line Re.p/ D a from i1 to Ci1.
Here we shall rederive formula (6.21) using an approach similar to the one used
when proving the Fourier integral in Sect. 5.1.3.
Theorem 6.3. If f .t/ satisfies all the necessary conditions for its LT F.p/ to exist
and is analytic,4 then the following identity is valid:
Z Z 1
1 aCib
pt1
f .t/ D lim pt
e f .t1 / e dt1 dp: (6.22)
b!1 2 i aib 0
The limit above essentially means that the integral over p is to be understood as
the principal value integral when both limits go to ˙i1 simultaneously.
4
Strictly speaking, this condition is not necessary, but we shall assume it to simplify the proof.
454 6 Laplace Transform
At the next step we would like to exchange the order of integrals. This is legitimate
if the internal integral (over t1 ) converges uniformly with respect to the variable y of
the external integral taken over the vertical line p D a C iy in the complex plane. It
follows from Theorem 6.1 (see the comment immediately following its proof) that
for any a > x0 the integral over t1 indeed converges absolutely and uniformly with
respect to the imaginary part of p. Therefore, the order of the two integrals can be
interchanged (cf. Sect. I.6.1.3) enabling one to calculate the p-integral:
Z 1 Z aCib
1
fb .t/ D dt1 f .t1 / ep.tt1 / dp
2 i 0 aib
Z 1 .aCib/.tt1 /
1 e e.aib/.tt1 /
D dt1 f .t1 /
2 i 0 t t1
Z 1 Z
1 sin Œb .tt1 / 1 1 sin Œb .tt1 /
D f .t1 / ea.tt1 / dt1 D f .t1 / ea.tt1 / dt1 ;
0 t t1 1 tt1
where in the last step we replaced the bottom limit to 1 as f .t1 / D 0 for t1 < 0
by definition. Using the substitution D t1 t and introducing the function g .t/ D
f .t/eat , we obtain
Z 1 Z
1 sin .b /
a eat 1 sin .b/
fb .t/ D f .t C / e d D g .t C / d
1 1
Z Z 1
eat 1 g .t C / g .t/ eat sin .b/
D sin .b / d C g.t/ d:
1 1
The last integral, see Eq. (2.122), is equal to , and hence the whole second term
above amounts exactly to f .t/. In the first term we introduce the function ‰./ D
Œg .t C / g .t/ = within the square brackets, which yields
Z 1
eat
fb .t/ D f .t/ C ‰. / sin .b/ d: (6.23)
1
The function ‰ ./ is piecewise continuous. Indeed, for ¤ 0 this follows from the
corresponding property of f .t/. At D 0, however,
g .t C / g .t/
lim ‰ . / D lim D g0 .t/;
!0 !0
and is well defined as we assumed that f .t/ is analytic (differentiable). Hence, ‰ ./
is piecewise continuous everywhere. Following basically the same argument as the
one we used at the end of Sect. 5.1.3, the integral in Eq. (6.23) tends to zero in the
b ! 1 limit. Hence, limb!1 fb .t/ D f .t/. As the function in the square brackets
in Eq. (6.22) is F.p/, the proven result corresponds to the inverse LT of Eq. (6.21).
The subtle point here is that the result holds in the b ! 1 limit, i.e. the formula is
indeed valid in the sense of the principal value. Q.E.D.
6.2 Detailed Consideration of the Laplace Transform 455
Fig. 6.1 Possible selections of the contour used in calculating the inverse LT integral in the
complex p plane. (a) Contours CR and CR0 corresponding to, respectively, > 0 and < 0 in
the exponential term eiz of the standard formulation of the Jordan’s lemma, and (b) contours CR
and CR0 corresponding to the exponential term ept of the inverse LT for t > 0 and t < 0, respectively.
The vertical line DE corresponds to the vertical line in Eq. (6.21)
Formula (6.21) for the inverse LT is very useful in finding the original function
f .t/ from its “image” F.p/. This may be done by using the methods of residues. Let
us discuss how this can be done. Integration in formula (6.21) for the inverse LT is
performed over the vertical line p D a C iy, 1 < y < 1, where a > x0 . Consider
a part of this vertical line DE in Fig. 6.1(b), corresponding to b < y < b. A closed
contour can be constructed
p by attaching to the vertical line an incomplete circle of
the radius R D a2 C b2 either from the left, CR (the solid line), or from the right,
CR0 (the dashed line). We shall now show that one can use a reformulated Jordan’s
lemma for expressing the inverse LT integral via residues of the image F.p/.
Indeed, let us reformulate the Jordan’s lemma, Sect. 2.7.2, by making the
substitution
C =2/
z D rei ! p D iz D rei.
in Eqs. (2.118) and (2.119). It is seen that each complex number z acquires
an additional phase =2; this corresponds to the 90ı anti-clockwise rotation of
the construction we made originally in Fig. 2.32(a) as illustrated in Fig. 6.1. In
particular, upon the substitution the horizontal line z D x ia, b < x < b, in
Fig. 6.1(a) is transformed into the vertical line p D iz D a C ix with y D x satisfying
b < y < b, see Fig. 6.1(b), while the two parts CR and CR0 of the circle of radius
R in Fig. 6.1(a) turn into the corresponding parts CR and CR0 shown in Fig. 6.1(b).
Replacing z ! p D iz and also ! t in Eqs. (2.118) and (2.119), we obtain that
the contour integrals along the parts CR and CR0 of the circle shown in Fig. 6.1(b) for
t > 0 and t < 0, respectively, tend to zero in the R ! 1 limit,
Z Z
lim F.p/e dp D 0 if t > 0 and
pt
lim F.p/ept dp D 0 if t < 0;
R!1 C R!1 C0
R R
(6.24)
provided that the function F.p/ in the integrand tends to zero when jpj ! 1.
456 6 Laplace Transform
By virtue of the residue theorem, for $t > 0$ the integral over the closed contour composed of the vertical line DE and the arc $C_R$ equals $2\pi i\sum_k \mathrm{Res}\left[F(p)e^{pt}; p_k\right]$, where $p_k$ is a pole of $F(p)$ and we sum over all poles. Since the integral over the part of the circle $C_R$ in the $R\to\infty$ limit is zero according to the Jordan's lemma, Eq. (6.24), we finally obtain
$$ f(t) = \lim_{b\to\infty}\frac{1}{2\pi i}\int_{a-ib}^{a+ib} F(p)e^{pt}\,dp = \sum_k \mathrm{Res}\left[F(p)e^{pt}; p_k\right], \qquad(6.25) $$
where we sum over all poles on the left of the vertical line $p = x_0 + iy$.
To illustrate this powerful result, let us first calculate the function $f(t)$ corresponding to the image $F(p) = 1/(p+\alpha)$. We know from the direct calculation that this image corresponds to the exponential function $f(t) = e^{-\alpha t}$. Let us see if the inverse LT formula gives the same result. Choosing the vertical line at $x = a > -\mathrm{Re}(\alpha)$, so that the pole at $p = -\alpha$ is positioned on the left of the vertical line, we have
$$ f(t) = \mathrm{Res}\left[\frac{e^{pt}}{p+\alpha}; -\alpha\right] = e^{-\alpha t} = L^{-1}\left[\frac{1}{p+\alpha}\right], $$
as expected. Consider next the image
$$ F(p) = \frac{1}{(p+\alpha)(p+\beta)}, \qquad \alpha \neq \beta. $$
It has two simple poles: $p_1 = -\alpha$ and $p_2 = -\beta$. Therefore, the original becomes
$$ f(t) = \mathrm{Res}\left[\frac{e^{pt}}{(p+\alpha)(p+\beta)}; -\alpha\right] + \mathrm{Res}\left[\frac{e^{pt}}{(p+\alpha)(p+\beta)}; -\beta\right] = \frac{e^{-\alpha t} - e^{-\beta t}}{\beta - \alpha}. $$

Problem 6.15. Write the image
$$ F(p) = \frac{3p+2}{3p^2 + 5p - 2} $$
as a sum of partial fractions and then show, using the LTs we calculated earlier, that the inverse LT of it is $f(t) = \frac{1}{7}\left(4e^{-2t} + 3e^{t/3}\right)$.
Problem 6.16. Similarly, show that
$$ L^{-1}\left[\frac{1-p}{p^2+4p+13}\right] = e^{-2t}\left(\sin 3t - \cos 3t\right). $$
Problem 6.17. Use the inverse LT formula (6.21) to find the original function $f(t)$ from its image:
$$ L^{-1}\left[\frac{p}{p^2+\omega^2}\right] = \cos\omega t; \qquad L^{-1}\left[\frac{\omega}{p^2+\omega^2}\right] = \sin\omega t; $$
$$ L^{-1}\left[\frac{p}{(p+\alpha)(p+\beta)}\right] = \frac{1}{\alpha-\beta}\left(\alpha e^{-\alpha t} - \beta e^{-\beta t}\right), \quad \alpha\neq\beta; $$
$$ L^{-1}\left[\frac{a}{p^2-a^2}\right] = \sinh(at); \qquad L^{-1}\left[\frac{p}{p^2-a^2}\right] = \cosh(at); $$
$$ L^{-1}\left[\frac{2\omega p}{\left(p^2+\omega^2\right)^2}\right] = t\sin\omega t; \qquad L^{-1}\left[\frac{p^2-\omega^2}{\left(p^2+\omega^2\right)^2}\right] = t\cos\omega t. $$
(continued)
Problem 6.18. Repeat the calculations of the previous problem using the method of decomposition of $F(p)$ into partial fractions.
Multiple-valued functions $F(p)$ can also be inverted using formula (6.21). Consider as an example the image $F(p) = p^{\nu}$, where $-1 < \nu < 0$. Note that the fact that $\nu$ is negative guarantees the necessary condition $F(p)\to 0$ on the large circle $C_R$ in the $R\to\infty$ limit. The subtle point here is that we have to modify the contour $C_R$, since a branch cut from $p = 0$ to $p = -\infty$ along the negative part of the real axis is required, as shown in Fig. 6.2. Therefore, the residue theorem cannot be applied immediately and we have to integrate explicitly over each part of the closed contour. On the upper side of the cut $p = re^{i\pi}$, while on the lower $p = re^{-i\pi}$, where $0 < r < \infty$ and $-\pi \le \phi \le \pi$. The function $F(p)$ does not have any poles inside the closed contour; therefore, the sum of integrals over each part of it is zero. The contour consists of the vertical part, of the incomplete circle $C_R$ with $R\to\infty$ (of two parts), a small circle $C_\epsilon$ (where $\epsilon\to 0$), and the upper and the lower horizontal parts along the negative $x$ axis. Therefore,
$$ f(t) = \frac{1}{2\pi i}\int_{\text{vertical}} = -\frac{1}{2\pi i}\left(\int_{C_R} + \int_{\text{upper}} + \int_{\text{lower}} + \int_{C_\epsilon}\right). $$
The integral over both parts of $C_R$ tends to zero by means of the Jordan's lemma,⁵ so we need to consider the integrals over the small circle and over the upper and lower sides of the branch cut. The integral over $C_\epsilon$ tends to zero in the $\epsilon\to 0$ limit since $\nu + 1 > 0$:
$$ \int_{C_\epsilon} p^{\nu}e^{pt}\,dp = \begin{vmatrix} p = \epsilon e^{i\phi} \\ dp = i\epsilon e^{i\phi}\,d\phi \end{vmatrix} = \int \epsilon^{\nu}e^{i\nu\phi}\,e^{\epsilon t e^{i\phi}}\,i\epsilon e^{i\phi}\,d\phi \;\propto\; \epsilon^{\nu+1} \to 0. $$
On the upper side of the cut $p = re^{i\pi}$ and hence
$$ \int_{\infty}^{0} r^{\nu}e^{i\pi\nu}\,e^{-rt}\,(-dr) = e^{i\pi\nu}\int_0^{\infty} r^{\nu}e^{-rt}\,dr = e^{i\pi\nu}\,\frac{\Gamma(\nu+1)}{t^{\nu+1}}, $$
where we have introduced the gamma function, Sect. 4.2, and applied the $\epsilon\to 0$ limit and hence replaced the lower integration limit with zero. Similarly, on the lower side of the cut:
$$ \int_{0}^{\infty} r^{\nu}e^{-i\pi\nu}\,e^{-rt}\,(-dr) = -e^{-i\pi\nu}\int_0^{\infty} r^{\nu}e^{-rt}\,dr = -e^{-i\pi\nu}\,\frac{\Gamma(\nu+1)}{t^{\nu+1}}. $$
Collecting all contributions, we obtain
$$ f(t) = L^{-1}\left[p^{\nu}\right] = \frac{\Gamma(\nu+1)}{2\pi i}\left(e^{-i\pi\nu} - e^{i\pi\nu}\right)\frac{1}{t^{\nu+1}} = -\frac{\sin(\pi\nu)\,\Gamma(\nu+1)}{\pi\,t^{\nu+1}}. \qquad(6.26) $$
In particular, for $\nu = -1/2$ we obtain
$$ L^{-1}\left[\frac{1}{\sqrt{p}}\right] = \frac{\Gamma(1/2)}{\pi\sqrt{t}} = \frac{1}{\sqrt{\pi t}}. \qquad(6.27) $$

⁵ This follows from the fact that $p^{\nu} \to 0$ on any part of $C_R$ in the $R\to\infty$ limit.
where $0 < \alpha < 1$ and $\Psi(p) = L[\psi(t)]$ is the image of $\psi(t)$; $\Psi(p)$ is assumed to vanish when $|p|\to\infty$ in the whole complex plane. We, however, assume for simplicity that $p > 0$ is a real number. To prove this formula, we first write $\psi(t)$ using the inverse LT integral:
$$ L\left[t^{-\alpha}\psi(t)\right] = \int_0^{\infty} t^{-\alpha}\psi(t)e^{-pt}\,dt = \int_0^{\infty} t^{-\alpha}\,dt\left[\frac{1}{2\pi i}\int_{a-i\infty}^{a+i\infty}\Psi(z)\,e^{-(p-z)t}\,dz\right]. \qquad(6.31) $$
Here $\mathrm{Re}(z) = a < p$, as the vertical line drawn at $\mathrm{Re}(z) = a$ is always to the left of the region in the complex plane $p$ where the LT is analytic. The function $\Psi(z)$ decays to zero at $|z|\to\infty$, and hence we can assume that the integral
$$ \int_{a-i\infty}^{a+i\infty}\left|\Psi(z)\right| dz $$
converges. Since $z = a + iy$ with $-\infty < y < \infty$, the integral along the vertical line in Eq. (6.31) converges uniformly with respect to $t$:
$$ \left|\int_{a-i\infty}^{a+i\infty}\Psi(z)\,e^{-(p-z)t}\,dz\right| = e^{-(p-a)t}\left|\int_{a-i\infty}^{a+i\infty}\Psi(z)\,e^{iyt}\,dz\right| \le e^{-(p-a)t}\int_{a-i\infty}^{a+i\infty}\left|\Psi(z)\,e^{iyt}\right|dz \le \int_{a-i\infty}^{a+i\infty}\left|\Psi(z)\right|dz, $$
since $p > a$ and hence $e^{-(p-a)t} \le 1$. The demonstrated uniform convergence allows us to exchange the order of integrals in Eq. (6.31) to get:
$$ L\left[t^{-\alpha}\psi(t)\right] = \frac{1}{2\pi i}\int_{a-i\infty}^{a+i\infty}\Psi(z)\,dz\int_0^{\infty} t^{-\alpha}e^{-(p-z)t}\,dt. \qquad(6.32) $$
In the inner integral we change the variable to $u = (p-z)t$, which gives
$$ \int_0^{\infty} t^{-\alpha}e^{-(p-z)t}\,dt = \frac{1}{(p-z)^{1-\alpha}}\lim_{t\to\infty}\int_0^{(p-a-iy)t} u^{-\alpha}e^{-u}\,du. $$
A subtle point here is that the $u$-integration is performed not along the real axis assumed in the definition of the gamma function, but rather along the straight line from $u = 0$ to $u = (p - a - iy)t$ in the complex plane (note that the real $y$ is fixed). This is the line $L$ in Fig. 6.3. To convert this integral into the one along the real axis $\mathrm{Re}(u)$, we introduce a closed contour $L + C_R + X + C_\epsilon$ shown in the figure. Note that the horizontal line $X$ is passed in the negative direction as shown. The contour avoids the point $u = 0$ with a circular arc $C_\epsilon$ of radius $\epsilon\to 0$. Also, another circular arc $C_R$ of radius $R\to\infty$ connects $L$ with the horizontal line $X$. Since the integrand $u^{-\alpha}e^{-u}$ does not have poles inside the contour, the sum of the integrals along the whole closed contour is zero. At the same time, it is easy to see that the integral along $C_\epsilon$ behaves like $\epsilon^{1-\alpha}$ and hence tends to zero when $\epsilon\to 0$, while the one along $C_R$ behaves like $R^{1-\alpha}e^{-R\cos\phi_R}$ (where $-\pi/2 < \phi_R < \pi/2$) and hence goes to zero as well in the $R\to\infty$ limit. Therefore, $\int_L = -\int_X = \int_{X^+}$, where $X^+$ is the horizontal line along the real axis $u > 0$ taken in the positive direction. Hence, we arrive at the $u$-integral in which $u$ changes between $0$ and $\infty$, yielding the gamma function $\Gamma(1-\alpha)$ as required.
Therefore, Eq. (6.32) can be rewritten as follows:
$$ L\left[t^{-\alpha}\psi(t)\right] = \frac{\Gamma(1-\alpha)}{2\pi i}\int_{a-i\infty}^{a+i\infty}\frac{\Psi(z)}{(p-z)^{1-\alpha}}\,dz. \qquad(6.33) $$
To evaluate the remaining $z$-integral we close the contour on the right with a large circle $C_R$ and cut the plane along the real axis from the branch point $z = p$ to $z = +\infty$, integrating explicitly over each part of it. The function $\Psi(z)$ does not have poles to the right of $\mathrm{Re}(z) = a$, so that the integral over the closed contour is equal to zero. Because the function $\Psi(z)$ tends to zero for $|z|\to\infty$, the integral over the two parts of the large circular contour $C_R$ tends to zero as $R\to\infty$. Consider the integral over the small circle $C_\epsilon$. There $z - p = \epsilon e^{i\phi}$, $dz = i\epsilon e^{i\phi}\,d\phi$, and hence the integral behaves as $\epsilon\cdot\epsilon^{\alpha-1} = \epsilon^{\alpha}$ and thus also tends to zero as $\epsilon\to 0$ (recall that $0 < \alpha < 1$). Hence, we have to consider only the integrals over the upper (I) and lower (II) horizontal lines. On the upper side,
$$ \int_I \frac{\Psi(z)}{(p-z)^{1-\alpha}}\,dz = \begin{vmatrix} z - p = s \\ dz = ds \end{vmatrix} = \int_{\infty}^{0}\frac{\Psi(s+p)}{(-s)^{1-\alpha}}\,ds = -\int_0^{\infty}(-s)^{\alpha-1}\,\Psi(s+p)\,ds. $$
The function $(-s)^{\alpha-1}$ originates from $(p-z)^{\alpha-1}$. On the vertical part of the contour the latter function is positive when $z$ is on the real axis, since there $p > \mathrm{Re}(z) = a$ (recall that $p$ is real and positive). Since $z = a$ is on the left of $p$, it corresponds to the phase $\phi = \pi$ of $s = |s|e^{i\phi}$ (our change of variables $s = z - p$ is simply a horizontal shift placing the centre of the coordinate system at the point $p$). Therefore, when choosing the correct branch of the function $(-s)^{\alpha-1}$, we have to make sure that the function is positive at the $\phi = \pi$ phase of $s$. There are basically two possibilities to consider for the prefactor $-1$ before the $s$ in $(-s)^{\alpha-1}$: $-1 = e^{i\pi}$ and $-1 = e^{-i\pi}$. For the former choice (recall that $\phi = \pi$ when $z = a$), at that phase $(-s)^{\alpha-1} = |s|^{\alpha-1}e^{2\pi i(\alpha-1)}$. It is easy to see that for a general non-integer $\alpha$ it is impossible to guarantee that this number is positive. Indeed, if we, for instance, take $\alpha = 1/2$, then $e^{i2\pi\alpha} = e^{i\pi} = -1$, leading to a negative value. On the other hand, the other choice, $-1 = e^{-i\pi}$, guarantees the positive value $|s|^{\alpha-1}$ for any $\alpha$ at the $\phi = \pi$ phase of $s$, and we obtain on the lower side
$$ \int_{II} = e^{i\pi(\alpha-1)}\int_0^{\infty} s^{\alpha-1}\,\Psi(s+p)\,ds, $$
while on the upper side $(-s)^{\alpha-1} = e^{-i\pi(\alpha-1)}s^{\alpha-1}$. Adding the two contributions and substituting into Eq. (6.33), we finally obtain
$$ L\left[t^{-\alpha}\psi(t)\right] = \frac{\sin(\pi\alpha)\,\Gamma(1-\alpha)}{\pi}\int_0^{\infty} s^{\alpha-1}\,\Psi(s+p)\,ds. $$
In particular, for $\psi(t) = 1$ we have $\Psi(p) = 1/p$, and the $s$-integral
$$ \int_0^{\infty}\frac{s^{\alpha-1}}{s+p}\,ds = \mathrm{B}(\alpha, 1-\alpha)\,p^{\alpha-1} = \frac{\pi}{\sin(\pi\alpha)}\,p^{\alpha-1} $$
yields $L\left[t^{-\alpha}\right] = \Gamma(1-\alpha)\,p^{\alpha-1}$. Here we used formulae (4.42) and (4.41) for the beta function. For instance, by taking $\alpha = 1/2$, we immediately recover our previous result (6.27).
where $n = [\alpha] + 1$ is a positive integer, and $\Psi^{(n)}(s)$ is the $n$-th derivative of $\Psi(s)$. [Hint: repeat the steps which led us to Eq. (6.33) in the previous case, and then, showing that
$$ \frac{d^n}{dz^n}\,\frac{1}{(p-z)^{\beta}} = \frac{(-1)^n\,\Gamma(\alpha+1)}{\Gamma(\beta)}\,\frac{1}{(p-z)^{\alpha+1}}, $$
integrate by parts $n$ times to transfer the derivatives onto $\Psi(z)$.]
We continue the table of Laplace transform pairs; for each pair the original $f(t)$, its image $F(p)$ and the region of convergence are given:

L14: $e^{-at}\cos bt \;\longleftrightarrow\; \dfrac{p+a}{(p+a)^2+b^2}$, for $\mathrm{Re}(p+a) > |\mathrm{Im}\,b|$;

L15: $1 - \cos at \;\longleftrightarrow\; \dfrac{a^2}{p\left(p^2+a^2\right)}$, for $\mathrm{Re}\,p > |\mathrm{Im}\,a|$;

L16: $at - \sin at \;\longleftrightarrow\; \dfrac{a^3}{p^2\left(p^2+a^2\right)}$, for $\mathrm{Re}\,p > |\mathrm{Im}\,a|$;

L17: $\sin at - at\cos at \;\longleftrightarrow\; \dfrac{2a^3}{\left(p^2+a^2\right)^2}$, for $\mathrm{Re}\,p > |\mathrm{Im}\,a|$;

L18: $e^{-at}(1 - at) \;\longleftrightarrow\; \dfrac{p}{(p+a)^2}$, for $\mathrm{Re}(p+a) > 0$;

L19: $\dfrac{\sin at}{t} \;\longleftrightarrow\; \arctan\dfrac{a}{p}$, for $\mathrm{Re}\,p > |\mathrm{Im}\,a|$;

L20: $\dfrac{1}{t}\sin at\cos bt$ (with $a > b > 0$) $\;\longleftrightarrow\; \dfrac{1}{2}\left(\arctan\dfrac{a+b}{p} + \arctan\dfrac{a-b}{p}\right)$, for $\mathrm{Re}\,p > 0$;

L21: $\dfrac{e^{-at} - e^{-bt}}{t} \;\longleftrightarrow\; \ln\dfrac{p+b}{p+a}$, for $\mathrm{Re}(p+a) > 0$ and $\mathrm{Re}(p+b) > 0$;

L22: $1 - \mathrm{erf}\dfrac{a}{2\sqrt{t}}$ (with $a > 0$) $\;\longleftrightarrow\; \dfrac{1}{p}\,e^{-a\sqrt{p}}$, for $\mathrm{Re}\,p > 0$;

L23: $J_0(at) \;\longleftrightarrow\; \left(p^2+a^2\right)^{-1/2}$, for $\mathrm{Re}\,p > |\mathrm{Im}\,a|$, or, for real $a \neq 0$, $\mathrm{Re}\,p \ge 0$;

L24: $f(t) = \begin{cases} 1, & t > a > 0 \\ 0, & t < a \end{cases} \;\longleftrightarrow\; \dfrac{1}{p}\,e^{-pa}$, for $\mathrm{Re}\,p > 0$.
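Any entry of such a table can be spot-checked by evaluating the defining integral numerically. A minimal sketch (our own illustration, assuming numpy and scipy are available) for the pair L15 reads:

```python
# Numerical check of L15: L[1 - cos(a t)] = a^2 / (p (p^2 + a^2)).
import numpy as np
from scipy.integrate import quad

a, pval = 3.0, 2.0                     # illustrative values
num, _ = quad(lambda tt: (1 - np.cos(a*tt))*np.exp(-pval*tt),
              0, np.inf, limit=200)
exact = a**2/(pval*(pval**2 + a**2))
print(num, exact)                      # should agree to quadrature accuracy
```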
This question is trivially answered by the direct calculation in which integration by parts is used:
$$ L\left[f'(t)\right] = \int_0^{\infty} e^{-pt}\frac{df}{dt}\,dt = f(t)e^{-pt}\Big|_0^{\infty} - \int_0^{\infty} f(t)(-p)e^{-pt}\,dt = -f(0) + pL\left[f(t)\right], \qquad(6.36) $$
where we have made use of the fact that the upper limit $t = \infty$ in the free term can be omitted. Indeed, since the LT of $f(t)$ exists for any $p$ satisfying $\mathrm{Re}(p) = \mathrm{Re}(x+iy) = x > x_0$ with some growth order parameter $x_0$, then $|f(t)| \le Me^{x_0 t}$. Therefore, the free term is bounded from above,
$$ \left|f(t)e^{-pt}\right| \le Me^{x_0 t}\left|e^{-(x+iy)t}\right| = Me^{x_0 t}e^{-xt} = Me^{-(x-x_0)t}, $$
and hence tends to zero as $t\to\infty$. For the second derivative, introducing $g(t) = f'(t)$, we similarly obtain
$$ L\left[f''(t)\right] = L\left[g'(t)\right] = pL\left[g(t)\right] - g(0) = pL\left[f'(t)\right] - f'(0) = p\left(pL\left[f(t)\right] - f(0)\right) - f'(0), $$
i.e.,
$$ L\left[f''(t)\right] = p^2 L\left[f(t)\right] - pf(0) - f'(0). \qquad(6.37) $$
The obtained results (6.36) and (6.37) clearly show that the LTs of the first and the second derivatives of a function $f(t)$ are expressed via the LT of the function itself multiplied by $p$ or $p^2$, respectively, minus a constant. This means that, since the differentiation turns into multiplication in the Laplace space, a differential equation would turn into an algebraic one, allowing one to find the image $F(p)$ of the solution of the equation. Of course, in this case the crucial step is the one of inverting the LT and finding the original from its image, a problem which may be non-trivial.
Problem 6.23. In fact, this way it is possible to obtain a general formula for the $n$-th derivative, $L\left[f^{(n)}(t)\right]$. Using induction, prove the following result:
$$ L\left[f^{(n)}(t)\right] = p^n F(p) - \sum_{k=0}^{n-1} p^{n-1-k} f^{(k)}(0), \qquad n = 1, 2, 3, \ldots, \qquad(6.38) $$
where $f^{(0)}(0) = f(0)$. Note that $f(0)$ and the derivatives $f^{(k)}(0)$ are understood as calculated in the limit $t\to +0$.
So, any differentiation of f .t/ always turns into multiplication after the LT.
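The rule (6.38) can be verified symbolically for a concrete function; a short sketch (assuming Python with sympy) for $n = 2$ and $f(t) = \sin\omega t$:

```python
# Check Eq. (6.38) with n = 2: L[f''] = p^2 F(p) - p f(0) - f'(0).
import sympy as sp

t, w = sp.symbols('t w', positive=True)
p = sp.symbols('p', positive=True)
f = sp.sin(w*t)

F = sp.laplace_transform(f, t, p, noconds=True)
lhs = sp.laplace_transform(sp.diff(f, t, 2), t, p, noconds=True)
rhs = p**2*F - p*f.subs(t, 0) - sp.diff(f, t).subs(t, 0)
print(sp.simplify(lhs - rhs))   # expect 0
```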
There is also a simple formula allowing one to calculate the inverse LT of the $n$-th derivative $F^{(n)}(p)$ of the image, see Eq. (6.12):
$$ L^{-1}\left[F^{(n)}(p)\right] = (-t)^n f(t). $$
Problem 6.24. Generalise the result (6.36) for the derivative of $f(t)$ for the case when $f(t)$ has a discontinuity of the first kind at the point $t_1 > 0$:
$$ L\left[f'(t)\right] = pL\left[f(t)\right] - f(0) - \left[f\left(t_1^+\right) - f\left(t_1^-\right)\right]e^{-pt_1}, $$
where $t_1^+ = t_1 + 0$ and $t_1^- = t_1 - 0$. Note that the last extra term, which is proportional to the value of the jump of $f(t)$ at $t_1$, disappears if the function does not jump, and we return to our previous result (6.36). [Hint: split the integration in the definition of the LT into two regions by the point $t_1$ and then take each integral by parts in the same way as when deriving Eq. (6.36).]
There are two simple properties of the LT related to a shift of either the original or the image, which we formulate as a problem for the reader to prove:

Problem 6.25. Using the definition of the LT, prove the following formulae:
$$ L\left[f(t-\tau)\right] = e^{-p\tau}F(p), \quad \tau > 0, \qquad(6.40) $$
$$ L\left[e^{-\lambda t}f(t)\right] = F(p+\lambda). \qquad(6.41) $$

Note that it is implied (as usual) that $f(t) = 0$ for $t < 0$ in both these equations. In particular, $f(t-\tau) = 0$ for $t < \tau$ (where $\tau > 0$).
Let us illustrate the first identity, which may be found useful when calculating the LT of functions obtained by shifting a given function along the $t$-axis. As an example, we shall first work out the LT of the finite-width step function shown in the left panel of Fig. 6.5:
$$ L\left[\Pi(t)\right] = \int_0^{T} e^{-pt}\,dt = \frac{1}{p}\left(1 - e^{-pT}\right). \qquad(6.42) $$
Correspondingly, the LT of the function shifted to the right by $\tau$ is then
$$ L\left[\Pi(t-\tau)\right] = \frac{e^{-p\tau}}{p}\left(1 - e^{-pT}\right). $$
Consider now a wave signal composed of identical unit step impulses which start at positions $t_k = k(T+\tau)$, where $k = 0, 1, 2, \ldots$, as shown in Fig. 6.6. The LT of such a function is a sum of contributions from each impulse:
$$ L\left[f(t)\right] = L\left[\Pi(t)\right] + L\left[\Pi(t-t_1)\right] + L\left[\Pi(t-t_2)\right] + \cdots = \sum_{k=0}^{\infty} L\left[\Pi(t-t_k)\right] $$
$$ = \frac{1}{p}\left(1 - e^{-pT}\right)\sum_{k=0}^{\infty} e^{-pt_k} = \frac{1}{p}\left(1 - e^{-pT}\right)\sum_{k=0}^{\infty} e^{-p(T+\tau)k} = \frac{1}{p}\,\frac{1 - e^{-pT}}{1 - e^{-p(T+\tau)}}. $$
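The geometric-sum image derived above is easily confirmed numerically; here is a minimal sketch (our own check, with illustrative parameter values) comparing a truncated sum of shifted images with the closed form:

```python
# Pulse train: sum of shifted step images vs the closed geometric form.
import numpy as np

T, tau, pval = 1.0, 0.5, 0.8           # width, gap, (real) p: all illustrative
single = (1 - np.exp(-pval*T))/pval    # image of one pulse, Eq. (6.42)
total = sum(single*np.exp(-pval*(T + tau)*k) for k in range(2000))
closed = single/(1 - np.exp(-pval*(T + tau)))
print(total, closed)                   # should agree to machine precision
```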
Problem 6.26. Show that the LT of the waveform shown in Fig. 6.7 is given by
$$ L\left[f(t)\right] = \frac{\omega}{p^2+\omega^2}\,\frac{1 + e^{-pT/2}}{1 - e^{-p(\tau+T/2)}}, $$
where $T = 2\pi/\omega$ and the first waveform is defined on the interval $0 < t < T/2 + \tau$ as $f(t) = \sin\omega t$ for $0 < t < T/2$ and $f(t) = 0$ for $T/2 < t < T/2 + \tau$. In the case of $\tau = T/2$ this waveform corresponds to a half-wave rectifier which removes the negative part of the signal.
Problem 6.27. The same for the waveform shown in Fig. 6.8:
$$ L\left[f(t)\right] = \frac{2}{Tp^2}\,\frac{\left(1 - e^{-pT/2}\right)^2}{1 - e^{-p(\tau+T)}}. $$

Problem 6.28. The same for the waveform shown in Fig. 6.9(a):
$$ L\left[f(t)\right] = \frac{A}{p}\,\frac{1}{1 - e^{-p\tau}}. $$

Problem 6.29. The same for the waveform shown in Fig. 6.9(b):
$$ L\left[f(t)\right] = \frac{A}{p}\,\frac{1 - e^{-p\tau}}{1 + e^{-p\tau}}. $$
We saw in Sect. 6.3.1 that the LT of the first derivative of $f(t)$ is obtained essentially by multiplying its image $F(p)$ by $p$. We shall now see that the LT of the integral of $f(t)$ can be obtained by dividing $F(p)$ by $p$:
$$ L\left[\int_0^t f(\tau)\,d\tau\right] = \frac{F(p)}{p}. \qquad(6.43) $$
Fig. 6.9 Waveforms to Problem 6.28 (a) and Problem 6.29 (b)
Indeed, consider
$$ L\left[\int_0^t f(\tau)\,d\tau\right] = \int_0^{\infty} dt\, e^{-pt}\int_0^t d\tau\, f(\tau). $$
Interchanging the order of integration (considering carefully the change of the limits in the $(t, \tau)$ plane), we obtain
$$ L\left[\int_0^t f(\tau)\,d\tau\right] = \int_0^{\infty} d\tau\, f(\tau)\int_{\tau}^{\infty} e^{-pt}\,dt = \int_0^{\infty} d\tau\, f(\tau)\,\frac{e^{-p\tau}}{p} = \frac{1}{p}\int_0^{\infty} d\tau\, f(\tau)e^{-p\tau} = \frac{F(p)}{p}, $$
as required. When integrating the exponential function $e^{-pt}$ with respect to $t$, we set to zero the result at $t\to\infty$, which is valid for any $\mathrm{Re}(p) > 0$.
In the above problem we have introduced a useful function called the complementary error function:
$$ \mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_x^{\infty} e^{-t^2}\,dt. \qquad(6.44) $$
Here the $s$-integral is taken along any path in the complex plane connecting the points $p$ and $s_1 = \infty$, where $|s_1| = \infty$ and $\mathrm{Re}(s_1) > 0$. Indeed,
$$ \int_p^{\infty} F(s)\,ds = \int_p^{\infty} ds\int_0^{\infty} dt\, f(t)e^{-st} = \int_0^{\infty} dt\, f(t)\int_p^{\infty} ds\, e^{-st} = \int_0^{\infty} dt\, f(t)\,\frac{e^{-pt}}{t} = \int_0^{\infty} dt\,\frac{f(t)}{t}\,e^{-pt} = L\left[\frac{f(t)}{t}\right], $$
the desired result. Again, when integrating over $s$, we set the exponential function $e^{-st}$ to zero at the upper limit $s_1 = \infty$ since $\mathrm{Re}(s_1) > 0$.
Check that its LTs calculated directly and when using Eq. (6.43) do coincide.
Problem 6.32. Prove the following formulae:
$$ L\left[\frac{e^{-\alpha t} - e^{-\beta t}}{t}\right] = \ln\frac{p+\beta}{p+\alpha}; $$
$$ L\left[\frac{\sin\omega t}{t}\right] = \frac{\pi}{2} - \arctan\frac{p}{\omega} = \arctan\frac{\omega}{p}; $$
$$ L\left[\frac{\cos\omega_1 t - \cos\omega_2 t}{t}\right] = \frac{1}{2}\ln\frac{p^2+\omega_2^2}{p^2+\omega_1^2}. $$
Problem 6.33. Prove the following formulae valid for $\mathrm{Re}(p) > 0$:
$$ L\left[\int_0^t \frac{\cos\tau}{\sqrt{2\pi\tau}}\,d\tau\right] = \frac{\sqrt{p+i} + \sqrt{p-i}}{2\sqrt{2}\,p\sqrt{p^2+1}}; $$
$$ L\left[\int_0^t \frac{\sin\tau}{\sqrt{2\pi\tau}}\,d\tau\right] = \frac{i\left(\sqrt{p-i} - \sqrt{p+i}\right)}{2\sqrt{2}\,p\sqrt{p^2+1}}. $$
Let $G(p)$ and $F(p)$ be LTs of the functions $g(t)$ and $f(t)$, respectively:
$$ G(p) = L\left[g(t)\right] = \int_0^{\infty} e^{-pt_1}g(t_1)\,dt_1, \qquad(6.47) $$
$$ F(p) = L\left[f(t)\right] = \int_0^{\infty} e^{-pt_2}f(t_2)\,dt_2. \qquad(6.48) $$
Consider their product,
$$ G(p)F(p) = \int_0^{\infty}\int_0^{\infty} e^{-p(t_1+t_2)}g(t_1)f(t_2)\,dt_1\,dt_2. $$
This is a double integral in the $(t_1, t_2)$ plane. Let us replace the integration variable $t_1$ with $t = t_1 + t_2$. This yields
$$ G(p)F(p) = \int_0^{\infty} dt_2\int_{t_2}^{\infty} dt\, e^{-pt}g(t-t_2)f(t_2). \qquad(6.49) $$
At the next step we interchange the order of the integrals. This has to be done with care as the limits will change:
$$ G(p)F(p) = \int_0^{\infty} dt\int_0^{t} dt_2\, e^{-pt}g(t-t_2)f(t_2) = \int_0^{\infty} dt\, e^{-pt}\int_0^{t} g(t-t_2)f(t_2)\,dt_2. \qquad(6.50) $$
The function
$$ h(t) = \int_0^t g(t-\tau)f(\tau)\,d\tau = (g*f)(t) \qquad(6.51) $$
is called a convolution of the functions $g(t)$ and $f(t)$. Note that the convolution is symmetric:
$$ \int_0^t g(t-\tau)f(\tau)\,d\tau = \int_0^t f(t-\tau)g(\tau)\,d\tau \quad\Longrightarrow\quad (g*f)(t) = (f*g)(t), $$
as is easily seen by changing the integration variable $\tau \to t - \tau$.
The convolution we introduced here is very similar to the one we defined when considering the FT in Sect. 5.2.3. Since the LT and FT are closely related, it is no surprise that in both cases the convolution theorem has exactly the same form.⁶
Problem 6.34. Use the convolution theorem to prove the integral rule (6.43).
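As a numerical illustration of the convolution theorem (not from the original text; it assumes numpy and scipy are available), one may compare $L[(g*f)(t)]$ with $G(p)F(p)$ for simple originals:

```python
# Convolution theorem check for g(t) = exp(-t), f(t) = sin(t) at a real p.
import numpy as np
from scipy.integrate import quad

pval = 1.3
g = lambda tt: np.exp(-tt)
f = lambda tt: np.sin(tt)

def conv(tt):                           # (g*f)(t) = int_0^t g(t-s) f(s) ds
    val, _ = quad(lambda s: g(tt - s)*f(s), 0, tt)
    return val

lhs, _ = quad(lambda tt: conv(tt)*np.exp(-pval*tt), 0, 50)  # truncated at t=50
G = 1/(pval + 1)                        # L[exp(-t)]
F = 1/(pval**2 + 1)                     # L[sin t]
print(lhs, G*F)                         # the two numbers should agree closely
```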
One of the main applications of the LT is in solving ODEs and their systems for specific initial conditions. Here we shall illustrate this point on a number of simple examples. More examples from physics will be given in Sect. 6.5. We shall be using simplified notations from now on: if the original function is $f(t)$, its image will be denoted by the corresponding capital letter, $F$ in this case, with its argument $p$ usually omitted.
The general scheme for the application of the LT method is sketched in Fig. 6.10: since a direct solution of the differential equation may be difficult, one uses the LT to rewrite the equation in a simpler form for the image $Y$, which is then solved. In particular, as we shall see, a linear DE with constant coefficients turns into a linear algebraic equation for the image $Y$, which can always be solved. Once the image is known, one performs the inverse LT to find the function $y(t)$ of interest. The latter will automatically satisfy the initial conditions.
Example 6.1. ▶ Consider the following problem:
$$ y'' + 4y' + 4y = t^2 e^{-2t} \quad\text{with}\quad y(0) = y'(0) = 0. \qquad(6.53) $$
Fig. 6.10 The working chart for using the LT method when solving differential equations
⁶ Note, however, that the two definitions of the convolution in the cases of LT and FT are not identical: the limits are $0$ and $t$ in the case of the LT, while when we considered the FT the limits were $\pm\infty$, Eq. (5.39).
Performing the LT of both sides of Eq. (6.53) and denoting $Y = L[y(t)]$, we have in the left-hand side
$$ L\left[y'' + 4y' + 4y\right] = \left(p^2 Y - py(0) - y'(0)\right) + 4\left(pY - y(0)\right) + 4Y, $$
where we made use of Eqs. (6.36) and (6.37) for the derivatives. From Eq. (6.11), the right-hand side is
$$ L\left[t^2 e^{-2t}\right] = \frac{2}{(p+2)^3}, $$
so that, after using the initial conditions, we obtain the following algebraic equation for $Y$:
$$ p^2 Y + 4pY + 4Y = \frac{2}{(p+2)^3}, $$
which is trivially solved to yield:
$$ Y = \frac{2}{(p+2)^3}\,\frac{1}{p^2+4p+4} = \frac{2}{(p+2)^5} \quad\Longrightarrow\quad y(t) = \frac{1}{12}\,t^4 e^{-2t}, \qquad(6.54) $$
where we have used Eq. (6.11) again to perform the inverse LT. ◀
This example illustrates the power of the LT method. Normally we would look for
a general solution of the corresponding homogeneous equation with two arbitrary
constants; then we would try to find a particular integral which satisfies the whole
equation with the right-hand side. Finally, we would use the initial conditions to
find the two arbitrary constants. Using the LT, the full solution satisfying the initial
conditions is obtained in just one step!
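The whole chain of Example 6.1 can be reproduced with a few lines of computer algebra; the sketch below (our own illustration, assuming sympy) transforms the equation, solves for the image and inverts it:

```python
# Example 6.1 via the LT: y'' + 4y' + 4y = t^2 exp(-2t), y(0) = y'(0) = 0.
import sympy as sp

t = sp.symbols('t', positive=True)
p = sp.symbols('p', positive=True)

rhs = sp.laplace_transform(t**2*sp.exp(-2*t), t, p, noconds=True)  # 2/(p+2)^3
Y = sp.symbols('Y')
Yimg = sp.solve(sp.Eq(p**2*Y + 4*p*Y + 4*Y, rhs), Y)[0]   # zero initial data
y = sp.inverse_laplace_transform(Yimg, p, t)
print(sp.simplify(y))   # expect t**4*exp(-2*t)/12 (times Heaviside(t))
```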
Moreover, it is easy to see that a general solution of a DE with constant coefficients can always be obtained in the form of an integral for an arbitrary function $f(t)$ in the right-hand side of the DE. We have already discussed this point in Sect. 5.3.3 when considering an application of the Fourier transform method for solving DEs. There we introduced the so-called Green's function of the DE. A very similar approach can be introduced within the framework of the LT as well. Indeed, consider a second order inhomogeneous DE:
$$ y'' + a_1 y' + a_2 y = f(t), \qquad(6.55) $$
where $a_1$ and $a_2$ are some constant coefficients. By applying the LT to both sides, we obtain
$$ \left(p^2 Y - py(0) - y'(0)\right) + a_1\left(pY - y(0)\right) + a_2 Y = F, $$
which yields
$$ Y = \frac{1}{p^2 + a_1 p + a_2}\,F + \frac{(p+a_1)y(0) + y'(0)}{p^2 + a_1 p + a_2}. \qquad(6.56) $$
Consider the function
$$ G(p) = \frac{1}{p^2 + a_1 p + a_2} = \frac{1}{(p - p_+)(p - p_-)}, \qquad(6.57) $$
where $p_+$ and $p_-$ are the two roots of the corresponding quadratic polynomial in the denominator. This function can serve as the image for the following original:
$$ g(t) = L^{-1}\left[G(p)\right] = \frac{1}{p_+ - p_-}\left(e^{p_+ t} - e^{p_- t}\right) \quad\text{or}\quad g(t) = te^{p_+ t}, \qquad(6.58) $$
depending on whether $p_+$ and $p_-$ are different or the same (repeated roots).⁷ Then, the first term in the solution (6.56) can be written as the convolution
$$ y_p(t) = \int_0^t g(t-\tau)f(\tau)\,d\tau = \int_0^t f(t-\tau)g(\tau)\,d\tau. \qquad(6.60) $$
Either form is, of course, valid, as the convolution is symmetric! The second term in the right-hand side of Eq. (6.56) corresponds to the solution of the corresponding homogeneous DE which is already adapted to the initial conditions; obviously, it corresponds to the complementary solution (its original is obtained using integration in the complex plane). The function $g(t)$ is called the Green's function of the DE (6.55). It satisfies the DE with the delta function $f(t) = \delta(t)$ in the right-hand side (cf. Sect. 5.3.3) and zero initial conditions. Recall that the image of the delta function is just unity, see Eq. (6.15).
⁷ In the latter case the inverse LT of $G(p) = 1/\left(p - p_+\right)^2$ can be obtained either directly or by taking the limit $p_- \to p_+$ in $g(t)$ from the first formula in Eq. (6.58) obtained by assuming $p_+ \neq p_-$.
Example 6.2. ▶ Solve the DE $y'' + 3y' + 2y = f(t)$ with zero initial conditions. Our task is to obtain a particular solution of this DE, subject to the given initial conditions, for an arbitrary function $f(t)$.
Solution. Let $L[f(t)] = F$ and $L[y(t)] = Y$. Then, performing the LT of both sides of the DE, we obtain
$$ \left(p^2 Y - py(0) - y'(0)\right) + 3\left(pY - y(0)\right) + 2Y = F. $$
This equation is simplified further by applying our zero initial conditions. Then, the obtained equation is easily solved for the image $Y$ of $y(t)$:
$$ p^2 Y + 3pY + 2Y = F \quad\Longrightarrow\quad Y = \frac{1}{p^2 + 3p + 2}\,F. $$
Here $G(p) = \left(p^2 + 3p + 2\right)^{-1}$ is the image of the Green's function. The original corresponding to it is easily calculated, e.g. by decomposing into partial fractions:
$$ g(t) = L^{-1}\left[G(p)\right] = L^{-1}\left[\frac{1}{p^2+3p+2}\right] = L^{-1}\left[\frac{1}{p+1}\right] - L^{-1}\left[\frac{1}{p+2}\right] = e^{-t} - e^{-2t}. $$
Correspondingly, the full solution of the DE, because of the convolution theorem, Sect. 6.3.4, is
$$ y(t) = \int_0^t\left[e^{-(t-\tau)} - e^{-2(t-\tau)}\right]f(\tau)\,d\tau \quad\text{or}\quad \int_0^t f(t-\tau)\left(e^{-\tau} - e^{-2\tau}\right)d\tau. $$
Note that the obtained forms of the solution do correspond to the general result (6.60). This should not be surprising as in this problem we deal with zero initial conditions. ◀
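One may also check the Green's-function formula against a direct numerical integration of the same DE; a minimal sketch (our own, assuming numpy and scipy) with the illustrative choice $f(t) = \cos t$:

```python
# Green's-function solution of y'' + 3y' + 2y = f(t), zero initial data,
# compared with scipy's initial-value solver for f(t) = cos(t).
import numpy as np
from scipy.integrate import quad, solve_ivp

f = lambda tt: np.cos(tt)
g = lambda tt: np.exp(-tt) - np.exp(-2*tt)        # Green's function above

def y_green(tt):
    val, _ = quad(lambda s: g(tt - s)*f(s), 0, tt)
    return val

sol = solve_ivp(lambda tt, u: [u[1], f(tt) - 3*u[1] - 2*u[0]],
                (0, 5), [0, 0], dense_output=True, rtol=1e-9, atol=1e-12)
print(y_green(5.0), sol.sol(5.0)[0])              # should coincide
```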
Problem 6.35. Show that the equation $y' + 3y = e^{t}$, $y(0) = 0$, has the solution $y(t) = \frac{1}{4}\left(e^{t} - e^{-3t}\right)$.

Problem 6.36. Show that the equation $y'' + 4y = \sin 2t$ with the initial conditions $y(0) = 10$ and $y'(0) = 0$ has the solution
$$ y(t) = 10\cos 2t + \frac{1}{8}\left(\sin 2t - 2t\cos 2t\right). $$
Problem 6.37. Show that the equation $y' - y = 2e^{t}$ with the initial condition $y(0) = 3$ has the solution $y(t) = (3 + 2t)e^{t}$.

Problem 6.38. Show that the solution of the following DE,
$$ y'' + 9y = \cos 3t, \quad y(0) = 0, \quad y'(0) = 6, $$
is $y(t) = \left(2 + t/6\right)\sin 3t$.

Problem 6.39. Consider the DE
$$ T\frac{dx}{dt} + x = f(t), $$
where $T$ is some positive constant. Using the LT method, show that the general solution of this equation satisfying the initial condition $x(0) = 0$ is
$$ x(t) = \frac{1}{T}\,e^{-t/T}\int_0^t f(\tau)\,e^{\tau/T}\,d\tau. $$
was externally applied to the oscillator around some time $t_0 \ge 0$. Show that the response of the system in each case is
$$ y_n(t) = n\,e^{-(t-t_0)}\left[(t - t_0 - 1)\left(e^{1/2n} - e^{-1/2n}\right) + \frac{1}{2n}\left(e^{1/2n} + e^{-1/2n}\right)\right]. $$
Show that the same result is obtained with $f(t) = \delta(t - t_0)$ in the $n\to\infty$ limit. Is this coincidence accidental?
Consider the system of two DEs,
$$ \frac{dy}{dt} - 2z = 2, \qquad \frac{dz}{dt} + 2y = 0, $$
which are subject to the zero initial conditions $y(0) = z(0) = 0$. Applying the LT to both equations and introducing the images $y(t)\to Y(p)$ and $z(t)\to Z(p)$, we obtain two algebraic equations for them:
$$ pY - 2Z = \frac{2}{p} \quad\text{and}\quad pZ + 2Y = 0, $$
whose solution is
$$ Y = \frac{2}{4+p^2}, \qquad Z = -\frac{4}{p\left(p^2+4\right)}. $$
Applying the inverse LT, we finally obtain $y(t) = \sin 2t$ and $z(t) = -1 + \cos 2t$. The solution satisfies the equations and the initial conditions. One can also see that there is one more advantage of using the LT if not all but only one particular function is needed. At the intermediate step, when solving for the images in the Laplace space, each solution is obtained independently. Hence, if only one unknown function is
needed, it can be obtained and inverted; this brings significant time savings for systems of more than two equations. This situation often appears when solving electrical circuit equations (Sect. 6.5.1), when only the current (or a voltage drop) through a particular circuit element is sought.
$$ y'' + z'' - z' = 0, \qquad y' + z' - 2z = 1 - e^{-t}, $$
Systems of DEs we considered in Sect. 1.3 can also be solved using the LT method. The convenience of the latter method is that the corresponding eigenproblem does not appear. Moreover, there we assumed an exponential trial solution; no need for this assumption either: the whole solution appears naturally if the LT method is applied.
where $\mathbf{v}_0$ is the initial velocity of the particle, $\hat{\mathbf{B}} = \mathbf{B}/B$ is the unit vector in the direction of the magnetic field, and $\omega = qB/m$. It is easy to see that in the case of $\mathbf{v}_0 = \left(0, v_\perp, v_\parallel\right)$ and $\hat{\mathbf{B}} = (0, 0, 1)$ the same result as in Eq. (1.118) is immediately recovered.
The LT method has been found extremely useful in solving electronic circuit problems. This is because the famous Kirchhoff's laws for the circuits represent integro-differential equations with constant coefficients.
$$ u_M(t) = M\,\frac{di(t)}{dt}, \qquad(6.65) $$
where the directions of the current as shown in Fig. 6.11 are assumed. The first Kirchhoff's law states that for any closed mesh in a circuit the total drop of the voltage across the mesh calculated along a particular direction (see an example in Fig. 6.12) is equal to the applied voltage (zero if no applied voltage is present, i.e. there is no battery attached):
$$ \sum_k R_k i_k + \sum_k L_k\frac{di_k}{dt} + \sum_k \frac{1}{C_k}\left[\int_0^t i_k(\tau)\,d\tau + q_{0k}\right] + \sum_k M_{kk'}\frac{di_{k'}}{dt} = \sum_k v_k(t), \qquad(6.66) $$
where we sum over all elements appearing in the mesh, with $M_{kk'}$ being the mutual induction coefficient between two meshes $k$ and $k'$ (note that $M_{kk'} = M_{k'k}$). The current dependent terms (in the left-hand side) are to be added algebraically with the sign defined by the chosen directions of the currents with respect to the chosen positive direction in the mesh (it is indicated in Fig. 6.12), and the voltages $v_k(t)$ in the right-hand side are also to be summed up with the appropriate sign depending on their polarity.
The second Kirchhoff's law states that the sum of all currents through any vertex is zero:
$$ \sum_k i_k = 0. \qquad(6.67) $$
To solve these equations, we apply the LT method. Since the terms are added algebraically, we can reformulate the rules of constructing the Kirchhoff's equations directly in the Laplace space if we consider how each element of the mesh would contribute. If $I(p)$ is the image for a current passing through a resistance $R$ and $U_R(p)$ is the corresponding voltage drop, then the contribution due to the resistance would simply be $U_R = RI$. For a capacitance $C$, the LT of Eq. (6.63) gives
$$ U_C = \frac{1}{pC}\left(I + q_0\right), \qquad(6.68) $$
while for the induction elements we would similarly have
$$ U_L = L\left(pI - i_0\right). \qquad(6.69) $$
Here $q_0$ and $i_0$ are the initial (at $t = 0$) charge on the capacitor and the current through the induction. Correspondingly, the vertex equation (6.67) has the same form for the images as for the originals, while the Eq. (6.66) describing the voltage drop across a mesh is rewritten via images as
$$ \sum_k R_k I_k + \sum_k L_k\left(pI_k - i_{0k}\right) + \sum_k \frac{1}{pC_k}\left(I_k + q_{0k}\right) + \sum_k M_{kk'}\left(pI_{k'} - i_{0k'}\right) = \sum_k V_k(p). \qquad(6.70) $$
Fig. 6.13 Simple electrical circuits for (a) Example 6.3, (b) Example 6.4 and (c) Problem 6.45. Here voltages and currents are understood to be images of the corresponding quantities in the Laplace space and hence are shown using capital letters

In the case of a constant voltage (a battery or Emf) $V_k(p) = v_k/p$. Here and in the following we use capital letters for images and small letters for the originals.
Example 6.3. ▶ Consider a circuit with Emf $v_0$ given in Fig. 6.13(a). Initially, the switch was opened. Calculate the current after the switch was closed at $t = 0$ and then determine the charge on the capacitor at long times.
Solution. In this case we have a single mesh with zero initial charge on the capacitor. Then, the first Kirchhoff's equation reads
$$ \frac{v_0}{p} = RI + \frac{1}{Cp}\,I \quad\Longrightarrow\quad I = \frac{v_0}{R}\,\frac{1}{p + (CR)^{-1}}. $$
The inverse LT is immediate, $i(t) = \left(v_0/R\right)e^{-t/RC}$, so that the charge accumulated on the capacitor, $q(t) = \int_0^t i(\tau)\,d\tau = Cv_0\left(1 - e^{-t/RC}\right)$, tends to $Cv_0$ at long times. ◀
Problem 6.45. Show that the currents in the circuit shown in Fig. 6.13(c) after the switch was closed are
$$ i_1(t) = v_0 C\,\delta(t) + \frac{v_0}{R}\left(1 - e^{-Rt/L}\right), \qquad i_2(t) = \frac{v_0}{R}\left(1 - e^{-Rt/L}\right). $$
Here $v_0$ is the Emf. Interestingly, the current through the capacitance is zero at $t > 0$.
Problem 6.46. Consider the circuit shown in Fig. 6.14(a). Initially the switch was opened and the capacitor gets charged up. Then at $t = 0$ it was closed. Show that the currents through the resistance next to the Emf $v_0$ and through the capacitor, respectively, are given by the equations
$$ i_1(t) = \frac{v_0}{2R}\left(1 - e^{-2t/CR}\right) \quad\text{and}\quad i_2(t) = \frac{v_0}{R}\,e^{-2t/CR}. $$
Problem 6.47. In the circuit shown in Fig. 6.14(b) initially the switch was opened. After a sufficiently long time, at $t = 0$, the switch was closed. Show that the current through the Emf $v_0$ is
$$ i(t) = \frac{v_0}{R_1}\left(1 - \frac{R_2}{R_1+R_2}\,e^{-R_1 t/L}\right). $$
[Hint: note that at $t = 0$ there is a current passing through the outer mesh.]

Problem 6.48. Consider the same circuit as in the previous problem, but this time initially for a rather long time the switch was closed and then at $t = 0$ it was opened. Show that this time the current flowing through the Emf $v_0$ will be
$$ i(t) = \frac{v_0}{R_1+R_2}\left(1 + \frac{R_2}{R_1}\,e^{-(R_1+R_2)t/L}\right). $$
Fig. 6.14 Electrical circuits used in Problem 6.46 (a) and in Problems 6.47 and 6.48 (b)
where $\omega_0$ is its fundamental frequency, $\gamma(t)$ is the so-called friction kernel (without loss of generality, it can be considered as an even function of time) and $\xi(t)$ is the random force. The latter force is due to interaction with the surrounding environment, whose state is uncertain. The friction kernel must be a decaying function of time tending to zero at long times.
Applying the LT to both sides of the equation and using the convolution theorem, we obtain
$$ X(p) = G(p)\left[\left(p + \gamma(p)\right)x_0 + v_0\right] + G(p)\,\Xi(p), \qquad(6.72) $$
where $\Xi(p) = L\left[\xi(t)\right]$ and $\gamma(p) = L\left[\gamma(t)\right]$. Here $x_0$ and $v_0 = \dot{x}_0$ are the initial position and velocity of the particle, and
$$ G(p) = \frac{1}{p^2 + p\gamma(p) + \omega_0^2}. \qquad(6.73) $$

⁸ For a more general discussion, including a derivation of the equations of motion for a multidimensional open classical system (the so-called Generalised Langevin Equation), as well as references to earlier literature, see L. Kantorovich, Phys. Rev. B 78, 094304 (2008).
It is easy to see that $G(p)$ is the LT of the Green's function $G(t)$ of the equation (6.71). Indeed, replacing the right-hand side in the equation with the delta function, $\xi(t)\to\delta(t)$, assuming zero initial conditions, and performing the LT, we obtain exactly the expression (6.73) for $X(p)$ in this case.
Performing the inverse LT of Eq. (6.72), we obtain
$$ x(t) = \Phi(t)x_0 + G(t)v_0 + \int_0^t G(t-\tau)\,\xi(\tau)\,d\tau, \qquad(6.74) $$
where
$$ \Phi(t) = L^{-1}\left[G(p)\left(p + \gamma(p)\right)\right] = L^{-1}\left[\frac{1}{p}\left(1 - \omega_0^2\,G(p)\right)\right] = 1 - \omega_0^2\int_0^t G(\tau)\,d\tau. \qquad(6.75) $$
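For any concrete friction kernel, $G(t)$ can be obtained by numerical inversion of Eq. (6.73). The sketch below is our own illustration: it assumes the mpmath library and an exponential kernel $\gamma(t) = \gamma_0 e^{-t/\tau_0}$ (our choice, not the book's), whose image is $\gamma(p) = \gamma_0\tau_0/(1 + p\tau_0)$:

```python
# Numerical inverse LT of G(p) = 1/(p^2 + p*gamma(p) + w0^2), Eq. (6.73),
# for an assumed exponential friction kernel gamma(t) = g0*exp(-t/tau0).
import mpmath as mp

w0, g0, tau0 = 1.0, 0.5, 2.0            # illustrative parameters only

def G(p):
    gamma_p = g0*tau0/(1 + p*tau0)      # image of the kernel
    return 1/(p**2 + p*gamma_p + w0**2)

for tt in (1.0, 5.0, 20.0):
    print(tt, mp.invertlaplace(G, tt, method='talbot'))
# G(t) is seen to decay at long times, as the argument below requires.
```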
Applying $t = 0$ in the solution (6.74), we deduce that
$$ \Phi(0) = 1 \quad\text{and}\quad G(0) = 0. \qquad(6.76) $$
Note that the first identity is consistent with the full solution (6.75) for the function $\Phi(t)$.
Differentiating now the solution (6.74) with respect to time, we get for the velocity:
$$ v(t) = \dot{x}(t) = \dot{\Phi}(t)x_0 + \dot{G}(t)v_0 + \int_0^t \dot{G}(t-\tau)\,\xi(\tau)\,d\tau, \qquad(6.77) $$
where the term arising from the differentiation of the integral with respect to the upper limit vanishes due to the second identity in Eq. (6.76). Applying $t = 0$ to the solution (6.77) for the velocity, we can deduce that
$$ \dot{\Phi}(0) = 0 \quad\text{and}\quad \dot{G}(0) = 1. \qquad(6.78) $$
Note that the first of these identities also follows immediately from Eqs. (6.75) and (6.76).
Equations (6.74) and (6.77) provide us with exact solutions for the position and velocity of the particle in the harmonic well under the influence of the external force $\xi(t)$. However, in our case this force is random, and hence these “exact” solutions have little value. Instead, it would be interesting to obtain statistical information about the position and velocity of the particle at long times $t\to\infty$, when the particle has “forgotten” its initial state described by $x_0$ and $v_0$. To investigate the behaviour of the particle at long times, it is customary to consider the appropriate correlation functions. The position autocorrelation function (or position–position correlation function) is
$$ \left\langle x(t)x(0)\right\rangle = \left\langle x(t)x_0\right\rangle = \Phi(t)\left\langle x_0^2\right\rangle + G(t)\left\langle v_0 x_0\right\rangle + \int_0^t G(t-\tau)\left\langle\xi(\tau)x_0\right\rangle d\tau. $$
In this case only the second term survives, requiring the function $G(t)$ to tend to zero when $t\to\infty$. Similarly, the velocity autocorrelation function,
$$ \left\langle v(t)v(0)\right\rangle = \dot{\Phi}(t)\left\langle x_0 v_0\right\rangle + \dot{G}(t)\left\langle v_0^2\right\rangle + \int_0^t \dot{G}(t-\tau)\left\langle\xi(\tau)v_0\right\rangle d\tau, $$
being proportional to $\dot{G}(t)$ (since the first and the last terms in the right-hand side must be zero), shows that the derivative of the Green's function should also tend to zero in the long time limit. So, we have established that $\Phi(t)$, $G(t)$ and $\dot{G}(t)$ must decay with time to zero, since the particle must forget its “past” at long enough times. Note that, according to (6.75),
$$ \dot{\Phi}(t) = -\omega_0^2\,G(t). \qquad(6.79) $$
Consider now the function
$$ u_1(t) = x(t) - \Phi(t)x_0 - G(t)v_0 = \int_0^t G(t-\tau)\,\xi(\tau)\,d\tau; \qquad(6.80) $$
it must tend to $x(\infty) = x_\infty$ at long times, as both $\Phi(t)$ and $G(t)$ tend to zero in this limit. Similarly, we can also introduce another function,
$$ u_2(t) = v(t) - \dot{\Phi}(t)x_0 - \dot{G}(t)v_0 = \int_0^t \dot{G}(t-\tau)\,\xi(\tau)\,d\tau, \qquad(6.81) $$
which must tend to the particle velocity $v(\infty) = \dot{x}(\infty) = v_\infty$ at long times.
Next we shall consider the equal-time correlation function $A_{11} = \left\langle u_1(t)u_1(t)\right\rangle$ of the function $u_1(t)$. To calculate it, we shall use the so-called second fluctuation-dissipation theorem, according to which
$$ \left\langle\xi(t)\,\xi(\tau)\right\rangle = \frac{1}{\beta}\,\gamma(t-\tau), \qquad(6.82) $$
where $\beta = 1/k_B T$. This relationship shows that the random forces at the current and previous times are correlated with each other. It is said that the noise provided by this type of the random force is “colored”, as opposed to the “white” noise of Eq. (5.82), when such correlations are absent.
Then,
$$ A_{11}(t) = \left\langle u_1(t)u_1(t)\right\rangle = \int_0^t d\tau_1\int_0^t d\tau_2\, G(t-\tau_1)\left\langle\xi(\tau_1)\xi(\tau_2)\right\rangle G(t-\tau_2) = \frac{1}{\beta}\int_0^t\!\!\int_0^t d\tau_1\,d\tau_2\, G(\tau_1)\,\gamma(\tau_1-\tau_2)\,G(\tau_2), \qquad(6.83) $$
and, similarly,
$$ A_{22}(t) = \left\langle u_2(t)u_2(t)\right\rangle = \frac{1}{\beta}\int_0^t\!\!\int_0^t d\tau_1\,d\tau_2\, \dot{G}(\tau_1)\,\gamma(\tau_1-\tau_2)\,\dot{G}(\tau_2). \qquad(6.85) $$
To calculate these correlation functions, we shall evaluate their time derivatives. Let us start from $A_{11}$ given by Eq. (6.83). Differentiating the right-hand side with respect to $t$, we find
$$ \dot{A}_{11}(t) = \frac{2}{\beta}\,G(t)\int_0^t \gamma(t-\tau)\,G(\tau)\,d\tau = \frac{2}{\beta}\,G(t)\,L^{-1}\left[\gamma(p)G(p)\right]. $$
Using the definition of the auxiliary function $\Phi(t)$ in the Laplace space, see Eq. (6.75), the inverse LT above is easily calculated to yield:
$$ L^{-1}\left[\gamma(p)G(p)\right] = L^{-1}\left[\left(p + \gamma(p)\right)G(p)\right] - L^{-1}\left[pG(p)\right] = \Phi(t) - \dot{G}(t). $$
In the last passage we have used the fact that $G(0) = 0$ and hence $L\left[\dot{G}(t)\right] = pG(p)$. Therefore,
$$ \dot{A}_{11}(t) = \frac{2}{\beta}\,G(t)\left[\Phi(t) - \dot{G}(t)\right] = -\frac{2}{\beta\omega_0^2}\,\Phi(t)\dot{\Phi}(t) - \frac{2}{\beta}\,G(t)\dot{G}(t) = -\frac{1}{\beta}\frac{d}{dt}\left[\frac{\Phi^2(t)}{\omega_0^2} + G^2(t)\right], $$
where Eq. (6.79) was employed to relate $G(t)$ to $\dot{\Phi}(t)$ in the first term. Therefore, integrating, and using the initial condition that $A_{11}(0) = 0$ (it follows from its definition as an average of $u_1(0) = 0$ squared, or directly from the expression (6.83)), we obtain
$$ A_{11}(t) = \frac{1}{\beta\omega_0^2} - \frac{1}{\beta}\left[\frac{\Phi^2(t)}{\omega_0^2} + G^2(t)\right]. \qquad(6.86) $$
Hence, at long times $A_{11} \to 1/\beta\omega_0^2$, $A_{12}(t)\to 0$ and $A_{22}(t)\to 1/\beta$, as $\Phi(t)$, $G(t)$ and $\dot{G}(t)$ vanish in this limit as discussed above.
The probability distribution function $P(u_1, u_2)$ gives the probability for the particle to be found with the variable $u_1$ being between $u_1$ and $u_1 + du_1$, and $u_2$ being between $u_2$ and $u_2 + du_2$. From the fact that both $u_1(t)$ and $u_2(t)$ are linear with respect to the noise $\xi(t)$, see Eqs. (6.80) and (6.81), it follows that either of the variables is Gaussian. That basically means that at long times the probability distribution is proportional to the exponential function with the exponent which is quadratic with respect to $u_1 = u_1(\infty) = x_\infty$ and $u_2 = u_2(\infty) = v_\infty$:
$$ P(u_1, u_2) \propto \exp\left(-\frac{1}{2}\,\mathbf{Y}^{T}\mathbf{A}^{-1}\mathbf{Y}\right), $$
where
$$ \mathbf{Y} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} x_\infty \\ v_\infty \end{pmatrix} \quad\text{and}\quad \mathbf{A} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} 1/\beta\omega_0^2 & 0 \\ 0 & 1/\beta \end{pmatrix}. $$
Since at long times the matrix $\mathbf{A}$ is diagonal, its inverse is trivially calculated:
$$ \mathbf{A}^{-1} = \begin{pmatrix} \beta\omega_0^2 & 0 \\ 0 & \beta \end{pmatrix}, $$
so that
$$ P(u_1, u_2) \propto \exp\left[-\beta\left(\frac{\omega_0^2 u_1^2}{2} + \frac{u_2^2}{2}\right)\right] = \exp\left[-\beta\left(\frac{\omega_0^2 x_\infty^2}{2} + \frac{v_\infty^2}{2}\right)\right], \qquad(6.89) $$
i.e. at long times the distribution function of the particle tends to the Gibbs distribution $P \propto e^{-\beta E}$ containing the particle total energy $E = v_\infty^2/2 + \omega_0^2 x_\infty^2/2$ (recall that we set the mass of the vibrating particle to be equal to one here).
Problem 6.51. Show that the average potential and kinetic energies satisfy the equipartition theorem:
$$ \left\langle U\right\rangle = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \frac{\omega_0^2 u_1^2}{2}\,P(u_1, u_2)\,du_1\,du_2 = \frac{1}{2\beta} = \frac{k_B T}{2}, $$
$$ \left\langle K\right\rangle = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \frac{u_2^2}{2}\,P(u_1, u_2)\,du_1\,du_2 = \frac{1}{2\beta} = \frac{k_B T}{2}, $$
where $P(u_1, u_2)$ is assumed normalised to unity.
where the random force satisfies the same condition (6.82) as for the particle in the harmonic well. Show that in this case the Green's function is $G(t) = L^{-1}\left[1/\left(p + \gamma(p)\right)\right]$, the velocity is
$$ v(t) = G(t)v_0 + \int_0^t G(t-\tau)\,\xi(\tau)\,d\tau, $$
while the equal-time correlation function for the variable $u(t) = v(t) - G(t)v_0$ is
$$ A(t) = \left\langle u(t)u(t)\right\rangle = \frac{1}{\beta} - \frac{1}{\beta}\,G^2(t), $$
which tends to $1/\beta$ at long times. Consequently, argue that the probability distribution for the Brownian particle at long times is Maxwellian:
$$ P(v_\infty) = \sqrt{\frac{\beta}{2\pi}}\;e^{-\beta v_\infty^2/2}, $$
so that the equipartition theorem is fulfilled in this case as well: $\left\langle v_\infty^2/2\right\rangle = 1/2\beta$.
Fig. 6.15 A “tree” of transitions from the initial state marked as 0 to all other possible states over time $t$ via a finite number of hops at times $t_1$, $t_2$, etc., $t_n = t$. Dashed horizontal lines correspond to the system remaining in the state from which the line starts over all the remaining time, while arrows indicate transitions between states; the latter are indicated by filled circles. State numbers are shown by the circles for clarity

⁹ Here we loosely follow (and in a rather simplified form) a detailed discussion which a reader can find in L. Kantorovich, Phys. Rev. B 75, 064305 (2007).
corresponds to the rate to hop to state $i_2$ during the third hop from the state reached after the second hop. Summing up all rates at the given $k$-th hop,
$$ R^{(k)} = \sum_{i=1}^{n^{(k)}} r_i^{(k)}, \qquad(6.90) $$
gives the total rate of leaving the state the system was in prior to the $k$-th hop, and hence it corresponds to the escape rate from this state. Above, $n^{(k)}$ is the total number of states available to the system to hop into during the $k$-th hop.
Now, let the system be in some state at time $t'$ after $(k-1)$ hops. We can then define the residence probability $P_0^{(k)}(t', t)$ for the system to remain in the current state until the final time $t$. In fact, this probability was already calculated in Sect. I.8.5.5 and is given by the formula:
$$ P_0^{(k)}\left(t', t\right) = e^{-R^{(k)}(t - t')}, \qquad(6.91) $$
where $R^{(k)}$ is the corresponding escape rate during the $k$-th hop. Therefore, the probability for the system to remain in the current state over the whole time $t - t'$ (and hence to make no hops at all during the $k$-th step) is given by $P_0^{(k)}(t', t)$.
Consider now an event whereby the system makes a single hop from the initial state to state $i_1$ at some time between $t_0$ and $t$, and then remains in that state until time $t$. The probability of this, an essentially one-hop event, is given by the integral:
$$ P_{i_1 0}^{(12)}(t_0, t) = \int_{t_0}^{t} P_0^{(1)}(t_0, t_1)\, r_{i_1}^{(1)}\,dt_1\, P_0^{(2)}(t_1, t) = \int_{t_0}^{t} dt_1\, P_0^{(1)}(t_0, t_1)\, r_{i_1}^{(1)}\, P_0^{(2)}(t_1, t). \qquad(6.92) $$
Here the system remains in its initial state up to some time $t_1$ (where $t_0 < t_1 < t$), the probability of that being $P_0^{(1)}(t_0, t_1)$; then it makes a single hop into state $i_1$ (the probability of this is $r_{i_1}^{(1)}\,dt_1$) and then remains in that state all the remaining time, the probability of the latter being $P_0^{(2)}(t_1, t)$. We integrate over all possible times $t_1$ to obtain the whole probability. The superscript in $P_{i_1 0}^{(12)}$ shows that this event is based on two elementary events: (1) a single hop, indicated by $i_1$ as a subscript, and (2) no transition thereafter, indicated by $0$ next to $i_1$ in the subscript. Using Eq. (6.91) for the residence probability and performing the integration, we obtain
$$ P_{i_1 0}^{(12)}(t_0, t) = \frac{r_{i_1}^{(1)}}{R^{(1)} - R^{(2)}}\left(e^{-R^{(2)}(t-t_0)} - e^{-R^{(1)}(t-t_0)}\right). \qquad(6.93) $$
This expression was obtained assuming different escape rates, $R^{(1)} \neq R^{(2)}$. If the escape rates are equal, then one can either perform the integration directly or take the limit of $x = R^{(2)} - R^{(1)} \to 0$ in the above formula. In either case we obtain
$$ P_{i_1 0}^{(12)}(t_0, t) = r_{i_1}^{(1)}\,(t - t_0)\,e^{-R^{(1)}(t - t_0)}. \qquad(6.94) $$
Along the same lines one can calculate the probability to make exactly two hops over the time between $t_0$ and $t$: initially to state $i_1$ and then to state $i_2$:
$$ P_{i_1 i_2 0}^{(123)}(t_0, t) = \int_{t_0}^{t} dt_1\int_{t_1}^{t} dt_2\, P_0^{(1)}(t_0, t_1)\, r_{i_1}^{(1)}\, P_0^{(2)}(t_1, t_2)\, r_{i_2}^{(2)}\, P_0^{(3)}(t_2, t). \qquad(6.95) $$
Integrating this expression is a bit more tedious, but still simple; assuming that all escape rates are different and setting $t_0 = 0$, we obtain (please, check!):
$$ P_{i_1 i_2 0}^{(123)}(t_0, t) = r_{i_1}^{(1)} r_{i_2}^{(2)}\left[\frac{e^{-R^{(1)}t}}{R_{21}R_{31}} + \frac{e^{-R^{(2)}t}}{R_{12}R_{32}} + \frac{e^{-R^{(3)}t}}{R_{13}R_{23}}\right], \qquad(6.96) $$
where it was denoted, for simplicity, $R_{ij} = R^{(i)} - R^{(j)}$. If some of the rates coincide, then a slightly different expression is obtained.
This procedure is generalised to any number of hops via the recursion
$$ P_{i_1 i_2\cdots i_{N-1} 0}^{(12\cdots N)}(0, t) = \int_0^{t} dt_1\, P_0^{(1)}(0, t_1)\, r_{i_1}^{(1)}\, P_{i_2 i_3\cdots i_{N-1} 0}^{(23\cdots N)}(t_1, t), $$
which states that the $(N+1)$-hop transition can be thought of as a single hop at time $t_1$ into state $i_1$ followed by the $N$ remaining hops into states $i_2\to i_3\to\cdots\to i_N$, after which the system remains in the last state for the rest of the time. Note that the probabilities depend only on the time difference. Setting the initial time $t_0$ to zero, one can then clearly see that the time integral above represents a convolution integral. Therefore, performing the LT of this expression, we immediately obtain
$$ P_{i_1 i_2\cdots i_{N-1} 0}^{(12\cdots N)}(p) = \frac{r_{i_1}^{(1)}}{p + R^{(1)}}\; P_{i_2 i_3\cdots i_{N-1} 0}^{(23\cdots N)}(p), $$
since $L\left[P_0^{(1)}(0, t)\right] = 1/\left(p + R^{(1)}\right)$. Applying this recurrence relation recursively, the probability in the Laplace space can be calculated explicitly:
$$ P_{i_1 i_2\cdots i_{N-1} 0}^{(12\cdots N)}(p) = \left(\prod_{k=1}^{N-1} r_{i_k}^{(k)}\right)\prod_{l=1}^{N}\frac{1}{p + R^{(l)}}. $$
To calculate the probability, it is now only required to take the inverse LT. If the escape rates are all different, then we have simple poles on the negative part of the real axis at $-R^{(1)}$, $-R^{(2)}$, etc., and hence the inverse LT is easily calculated:
$$ P_{i_1 i_2\cdots i_{N-1} 0}^{(12\cdots N)}(0, t) = \left(\prod_{k=1}^{N-1} r_{i_k}^{(k)}\right)\sum_{k=1}^{N}\mathrm{Res}\left[e^{pt}\prod_{l=1}^{N}\frac{1}{p + R^{(l)}};\; p = -R^{(k)}\right] = \left(\prod_{k=1}^{N-1} r_{i_k}^{(k)}\right)\sum_{k=1}^{N} e^{-R^{(k)}t}\prod_{\substack{l=1\\ (l\neq k)}}^{N}\frac{1}{R^{(l)} - R^{(k)}}. $$
If some of the escape rates coincide, the image can be written as
$$ P_{i_1 i_2\cdots i_{N-1} 0}^{(12\cdots N)}(p) = \left(\prod_{k=1}^{N-1} r_{i_k}^{(k)}\right)\prod_{k}\frac{1}{\left(p + R^{(k)}\right)^{m_k}}, $$
where the second product runs over all distinct escape rates and $m_k$ is their repetition. Correspondingly, $-R^{(k)}$ becomes the pole of order $m_k$ and one has to use the general formula (2.106) for calculating the corresponding residues.
Problem 6.54. Show that if all escape rates are the same and equal to $R$, then the $N$-hop probability is
$$ P_{i_1 i_2\cdots i_{N-1} 0}^{(12\cdots N)}(0, t) = \left(\prod_{k=1}^{N-1} r_{i_k}^{(k)}\right)\frac{t^{N-1}}{(N-1)!}\,e^{-Rt}. $$

Problem 6.55. The probability $P_N(t)$ of performing $N$ hops (no matter into which states) over time $t$ can be obtained by summing up all possible $(N+1)$-hop probabilities. Assuming that the escape rates are all the same and using the result of the previous problem, show that
$$ P_N(t) = \frac{(Rt)^N}{N!}\,e^{-Rt}, $$
which is the famous Poisson distribution. Then, demonstrate that the sum of all possibilities of performing $0, 1, 2,$ etc. hops is equal to unity:
$$ \sum_{N=0}^{\infty} P_N = 1. $$
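The Poisson result is easily illustrated by direct simulation of the hopping process; a minimal sketch (our own, assuming numpy) draws exponential waiting times with rate $R$ and histograms the number of hops within time $t$:

```python
# Hops with constant escape rate R in time t follow the Poisson distribution.
import math
import numpy as np

rng = np.random.default_rng(0)
R, t, samples = 2.0, 3.0, 50_000

counts = np.empty(samples, dtype=int)
for i in range(samples):
    s, n = 0.0, 0
    while True:
        s += rng.exponential(1.0/R)     # waiting time between hops
        if s > t:
            break
        n += 1
    counts[i] = n

for N in range(5):
    emp = np.mean(counts == N)
    theory = (R*t)**N*math.exp(-R*t)/math.factorial(N)
    print(N, emp, theory)               # empirical vs (Rt)^N e^{-Rt}/N!
```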
is the oscillation frequency of the cantilever far away from the surface. If one can calculate the quantity which is directly measured experimentally, then it would become possible to verify theoretical models. Of course, many such models may need to be tried before good agreement is reached.
However, one may also ask a different question: how can one determine the tip force $F_s(z)$ as a function of the tip–surface distance $z$ from the experimentally measured frequency shift $\Delta\omega(z)$ curve? If the force can be obtained in this way, then it might be easier to choose the theoretical model which is capable of reproducing this particular $z$-dependence. This problem is basically an inverse problem to the one we solved in Sect. 3.8.4. It requires solving the integral equation (3.93) with respect to the force.
Here we shall obtain a nearly exact solution of this integral equation using the method of LT.¹⁰ We shall start by rewriting the integral equation (3.93) slightly differently:
$$ kA\left[1 - \left(\frac{\omega}{\omega_0}\right)^2\right] = \frac{1}{\pi}\int_{\pi/2}^{5\pi/2} F_s\left(h_0 + A\sin\phi\right)\sin\phi\,d\phi. \qquad(6.99) $$
Recall that $A$ is the oscillation amplitude and $z = h_0 + A\sin\phi$ corresponds to the tip height above the surface; $k$ is the elastic constant of the cantilever. It was convenient here to shift the integration limits by $\pi/2$. This can always be done as the integrand is periodic with respect to $\phi$ with the period of $2\pi$. Next, note that within the span of the $\phi$ values in the integral the tip makes the full oscillation cycle by starting at the height $h_0 + A$ (at $\phi = \pi/2$), then moving to the position $z = h_0 - A$ closest to the surface (at $\phi = 3\pi/2$), and then returning back (retracting) to its initial position of $z = h_0 + A$ at $\phi = 5\pi/2$.
Problem 6.56. Next, we shall split the integral into two: for angles $\pi/2 < \phi < 3\pi/2$, when the tip moves down, and for angles $3\pi/2 < \phi < 5\pi/2$, when it is retracted back up. If $F_\downarrow(z)$ and $F_\uparrow(z)$ are the tip forces for the tip moving down and up, respectively, show, by making a substitution $x = \sin\phi$, that Eq. (6.99) can be rewritten as follows:
$$ \int_{-1}^{1} F(z + Ax)\,\frac{x\,dx}{\sqrt{1-x^2}} = \Psi(z), \qquad(6.100) $$
(continued)

¹⁰ The solution to this problem was first obtained by J.E. Sader and S.P. Jarvis in their highly cited paper [Appl. Phys. Lett. 84, 1801 (2004)] using the methods of LT and the so-called fractional calculus. We adopted here some ideas of their method, but did not use the fractional calculus at all, relying instead on conventional techniques presented in this chapter.
Note that the tip force on the way down and up could be different due to a possible atomic reconstruction at the surface (and/or at the tip) when the tip approaches the surface on its way down; this reconstruction sets in and affects the force when the tip is retracted. If such a reconstruction takes place, the tip force experiences a hysteresis over the whole oscillation cycle, which results in the energy being dissipated in the junction.
To solve Eq. (6.100), we assume that $F(z)$ is the LT of some function $f(t)$:
$$ \int_{-1}^{1}\frac{x\,dx}{\sqrt{1-x^2}}\int_0^{\infty} f(t)\,e^{-(z+Ax)t}\,dt = \Psi(z). $$
Our current goal is to find the function $f(t)$. The improper integral over $t$ converges for any $x$, since $z + Ax \ge z - A > 0$ as $z$ is definitely bigger than $A$ (the distance of closest approach $h_0 - A > 0$); moreover, it converges uniformly with respect to $x$, since
$$ \left|\int_0^{\infty} f(t)\,e^{-(z+Ax)t}\,dt\right| \le \int_0^{\infty}\left|f(t)\right|e^{-(z-A)t}\,dt, $$
which converges. The uniform convergence allows us to exchange the order of integration:
$$ \int_0^{\infty} f(t)\,e^{-zt}\left[\int_{-1}^{1}\frac{x\,e^{-Axt}}{\sqrt{1-x^2}}\,dx\right]dt = \Psi(z). \qquad(6.102) $$
The integral in the square brackets can be directly related to the modified Bessel function of the first kind, see Eq. (4.233); it is equal to $-\pi I_1(At)$. This enables us to rewrite Eq. (6.102) as follows:
$$ -\pi\int_0^{\infty} f(t)\,e^{-zt}\,I_1(At)\,dt = \Psi(z), $$
or simply as
$$ L\left[I_1(At)\,f(t)\right] = -\frac{1}{\pi}\,\Psi(p), \qquad(6.103) $$
Fig. 6.16 Comparison of the function $I_1(x)$ calculated exactly (by a numerical integration in Eq. (4.233)) and using the approximation of Eq. (6.105). In both cases, for convenience, the function $I_1(x)e^{-x}$ is actually shown
where $p$ is the real positive number; we have changed $z$ to $p$ here as this is the letter we have been using in this chapter as the variable for the LT. Hence, it follows that
$$ f(t) = \frac{1}{I_1(At)}\,L^{-1}\left[-\frac{1}{\pi}\Psi(p)\right] = -\frac{1}{\pi}\,\frac{\psi(t)}{I_1(At)}, \qquad(6.104) $$
where $\psi(t) = L^{-1}\left[\Psi(p)\right]$. So far, our manipulations have been exact.
To proceed, we shall now apply an approximation to the Bessel function which works really well across a wide range of its variable:
$$ \frac{1}{I_1(x)} \simeq e^{-x}\left(\frac{2}{x} + \frac{1}{4\sqrt{x}} + \sqrt{2\pi x}\right). \qquad(6.105) $$
Substituting it into Eq. (6.104), we obtain
$$ f(t) \simeq -\frac{1}{\pi}\,\psi(t)\,e^{-At}\left(\frac{2}{At} + \frac{1}{4\sqrt{At}} + \sqrt{2\pi At}\right). \qquad(6.106) $$
Introducing the notations
$$ G_1(p) = L\left[t^{-1}\psi(t)\right], \qquad G_2(p) = L\left[t^{-1/2}\psi(t)\right] \qquad\text{and}\qquad G_3(p) = L\left[t^{1/2}\psi(t)\right], \qquad(6.107) $$
and using the property (6.41), we obtain
$$ F(p) = -\frac{1}{\pi}\left[\frac{2}{A}\,G_1(p+A) + \frac{1}{4\sqrt{A}}\,G_2(p+A) + \sqrt{2\pi A}\,G_3(p+A)\right]. \qquad(6.108) $$
Now we need to calculate all three $G$ functions in (6.107). The first one is calculated immediately using property (6.46):
$$ G_1(p) = L\left[\frac{\psi(t)}{t}\right] = \int_p^{\infty}\Psi(z)\,dz = \int_0^{\infty}\Psi(z+p)\,dz. \qquad(6.109) $$
For the calculation of the second and the third ones, we can use Eqs. (6.30) and (6.35), respectively, with $\alpha = 1/2$. These give
$$ G_2(p) = L\left[t^{-1/2}\psi(t)\right] = \frac{1}{\sqrt{\pi}}\int_0^{\infty}\frac{\Psi(z+p)}{\sqrt{z}}\,dz, $$
$$ G_3(p) = L\left[t^{1/2}\psi(t)\right] = -\frac{1}{\sqrt{\pi}}\int_0^{\infty}\frac{\Psi'(z+p)}{\sqrt{z}}\,dz. $$

7 Curvilinear Coordinates
In many applications physical systems possess symmetry. For instance, the magnetic field of an infinite vertical wire with a current flowing through it has a cylindrical symmetry (i.e. the field depends only on the distance from the wire), while the field radiated by a point source has the characteristic spherical symmetry (i.e. depends only on the distance from the source). In these and many other cases a Cartesian coordinate system may not be the most convenient choice; a special choice of the coordinates (such as cylindrical or spherical ones for the two examples mentioned above, respectively) may, however, simplify the problem considerably and hence enable one to obtain a closed solution. In particular, investigation of a large number of physical problems requires solving the so-called partial differential equations (PDEs). Using the appropriate coordinates in place of the Cartesian ones allows one to obtain simpler forms of these equations (which may, e.g., contain a smaller number of variables) that can be easier to solve.
The key objective of this chapter¹ is to present a general theory which allows introduction of such alternative coordinate systems, and to show how general differential operators such as gradient, divergence, curl and the Laplacian can be written in terms of them. Some applications of these so-called curvilinear coordinates in solving PDEs will be considered in Sect. 7.11.1 and then in Chap. 8.
Assume that instead of the Cartesian coordinates $(x, y, z)$ one introduces new coordinates $(q_1, q_2, q_3)$ via the transformation relations
$$ x = x(q_1, q_2, q_3), \quad y = y(q_1, q_2, q_3), \quad z = z(q_1, q_2, q_3). \qquad(7.1) $$
1
In the following, references to the first volume of this course (L. Kantorovich, Mathematics for
natural scientists: fundamentals and basics, Springer, 2015) will be made by appending the Roman
number I in front of the reference, e.g. Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of
the first volume, respectively.
We shall also suppose that Eq. (7.1) can be solved for each point $(x, y, z)$ with respect to the new coordinates:
$$ q_i = q_i(x, y, z), \quad i = 1, 2, 3, \qquad(7.2) $$
yielding the corresponding inverse relations. In practice, for many such transformations, at certain points $(x, y, z)$ the solutions (7.2) are not unique, i.e. several (sometimes, an infinite) number of coordinates $(q_1, q_2, q_3)$ exist corresponding to the same point in the 3D space. Such points are called singular points of the coordinate transformation. The new coordinates $(q_1, q_2, q_3)$ are called curvilinear coordinates.
The reader must already be familiar with at least two such curvilinear systems (see Sect. I.1.13.2): the cylindrical and spherical curvilinear coordinate systems. In the case of cylindrical coordinates $(q_1, q_2, q_3) = (r, \phi, z)$ the corresponding transformation is given by the following equations:
$$ x = r\cos\phi, \quad y = r\sin\phi, \quad z = z, \qquad(7.3) $$
where $0 \le r < \infty$, $0 \le \phi < 2\pi$ and $-\infty < z < +\infty$, see Fig. 7.1(b). Points along the $z$ axis (when $r = 0$) are all singular in this case: one obtains $x = y = 0$ (any $z$) for any angle $\phi$. By taking the square of the first two equations and adding them together, one obtains $r = \sqrt{x^2+y^2}$, while dividing the first two equations gives $\phi = \arctan(y/x)$. These relations serve as the inverse relations of the transformation. The polar system, Fig. 7.1(a), corresponds to the 2D space and is obtained by omitting the $z$ coordinate altogether. In this case only the single point $x = y = 0$ is singular.
Fig. 7.1 Frequently used curvilinear coordinate systems: (a) polar, (b) cylindrical and (c)
spherical
For the case of the spherical coordinates $(q_1, q_2, q_3) = (r, \theta, \phi)$ we have the transformation relations
$$ x = r\sin\theta\cos\phi, \quad y = r\sin\theta\sin\phi, \quad z = r\cos\theta. \qquad(7.4) $$

Problem 7.1. Show that the inverse relations for the spherical coordinate system are
$$ r = \sqrt{x^2+y^2+z^2}, \quad \phi = \arctan(y/x) \quad\text{and}\quad \theta = \arccos\left(z/\sqrt{x^2+y^2+z^2}\right). $$
Hence, transformation relations (7.1) and (7.2) enable one to define a mapping of the Cartesian system onto the curvilinear one. This mapping does not need to have a one-to-one correspondence: the same point $(x, y, z)$ may be obtained by several sets of the chosen curvilinear coordinates.
Note that the transformation (7.1) allows one to represent any scalar field $G(x, y, z)$ in curvilinear coordinates as
$$ G(x, y, z) = G\left(x(q_1, q_2, q_3),\, y(q_1, q_2, q_3),\, z(q_1, q_2, q_3)\right) = G_{cc}(q_1, q_2, q_3), $$
i.e. some function $G_{cc}$ of the curvilinear coordinates. For instance, the scalar field $G(x, y, z) = \left(x^2+y^2+z^2\right)^2$ is equivalent to the scalar function $G_{cc}(r, \theta, \phi) = r^4$ in the spherical system (or coordinates).
It is seen that once the transformation relations are known, any scalar field can
readily be written in the chosen curvilinear coordinates. Next, we need to discuss a
general procedure for representing an arbitrary vector field F.x; y; z/ in terms of the
curvilinear coordinates .q1 ; q2 ; q3 /. This is by far more challenging as the field has a
direction and hence we need to understand how to write the field in the vector form
which does not rely on Cartesian unit base vectors i, j and k.
All three coordinate surfaces intersect at a point $P$. Moreover, any pair of surfaces intersect at the corresponding coordinate line passing through the point $P$: $\sigma_1$ and $\sigma_3$ intersect at the $q_2$-line, $\sigma_2$ and $\sigma_3$ at the $q_1$-line, and so on.
As an example, let us construct coordinate lines and surfaces for the cylindrical coordinate system, Eq. (7.3) and Fig. 7.1(b). By changing only $r$, we draw the $r$-line, which is a ray starting at the $z$ axis and moving outwards perpendicular to it, remaining at the height $z$ and making the angle $\phi$ with the $x$ axis. The coordinate $\phi$-line will be a circle of radius $r$ at the height $z$, while the coordinate $z$-line is the vertical line drawn at a distance $r$ from the $z$ axis such that its projection on the $(x, y)$ plane is given by the polar angle $\phi$, see Fig. 7.3. The coordinate surfaces for this case are obtained by fixing a single coordinate: $\sigma_r$ is a cylinder coaxial with the $z$ axis, $\sigma_\phi$ is a vertical semi-plane hinged to the $z$ axis, and $\sigma_z$ is a horizontal plane at the height $z$. Obviously, $\sigma_r$ and $\sigma_\phi$ intersect at the corresponding $z$-line, $\sigma_r$ and $\sigma_z$ at the $\phi$-line, while $\sigma_\phi$ and $\sigma_z$ at the $r$-line, as expected.
Next we introduce a set of vectors $\mathbf{e}_1$, $\mathbf{e}_2$ and $\mathbf{e}_3$ at the intersection point $P$. Each of these is a unit vector; the direction of $\mathbf{e}_i$ ($i = 1, 2, 3$) is chosen along the tangent to the corresponding $q_i$-coordinate line and in such a way that it points in the direction of increasing $q_i$. These vectors, which are called unit base vectors of the curvilinear coordinate system, enable one to represent an arbitrary vector field $\mathbf{F}(\mathbf{r})$ at point $P(\mathbf{r}) = P(x, y, z)$ in the form
$$ \mathbf{F}(P) = F_1\mathbf{e}_1 + F_2\mathbf{e}_2 + F_3\mathbf{e}_3. \qquad(7.5) $$
In the cylindrical system, for instance, the direction of the unit base vector $\mathbf{e}_\phi$ changes along the $\phi$-line; but it does not depend on $z$. At the same time, $\mathbf{e}_z = \mathbf{k}$ is always directed along the $z$ axis, as the third coordinate $z$ of this system is identical to the corresponding coordinate of the Cartesian system.
Let us now derive explicit expressions for the unit base vectors. We shall relate them to the curvilinear coordinates and the Cartesian base vectors $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$. The latter are convenient for that purpose as they are fixed vectors. Consider the general position vector $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ written in the Cartesian system. Because of the transformation equations (7.1), $\mathbf{r} = \mathbf{r}(q_1, q_2, q_3)$, i.e. each Cartesian component of $\mathbf{r}$ depends on the three curvilinear coordinates. Since the vector $\mathbf{e}_i$ is directed along the tangent of the $q_i$-line along which only the coordinate $q_i$ varies, the direction of $\mathbf{e}_i$ will be proportional to the partial derivative of the vector $\mathbf{r}$ with respect to this coordinate at the point $P$, i.e.
$$ \mathbf{e}_i \propto \frac{\partial\mathbf{r}}{\partial q_i} = \frac{\partial x}{\partial q_i}\,\mathbf{i} + \frac{\partial y}{\partial q_i}\,\mathbf{j} + \frac{\partial z}{\partial q_i}\,\mathbf{k}, \quad i = 1, 2, 3. \qquad(7.6) $$
A proportionality constant $h_i$ between the derivative $\partial\mathbf{r}/\partial q_i$ and $\mathbf{e}_i$ in the relation $\partial\mathbf{r}/\partial q_i = h_i\mathbf{e}_i$ can be chosen to ensure that the unit base vector is of unit length. This finally allows us to write the required relationships which relate the unit base vectors of an arbitrary curvilinear coordinate system to the Cartesian unit base vectors:
$$ \mathbf{e}_i = \frac{1}{h_i}\frac{\partial\mathbf{r}}{\partial q_i} = \frac{1}{h_i}\left(\frac{\partial x}{\partial q_i}\,\mathbf{i} + \frac{\partial y}{\partial q_i}\,\mathbf{j} + \frac{\partial z}{\partial q_i}\,\mathbf{k}\right), \qquad(7.7) $$
where
$$ h_i = \left|\frac{\partial\mathbf{r}}{\partial q_i}\right| = \sqrt{\left(\frac{\partial x}{\partial q_i}\right)^2 + \left(\frac{\partial y}{\partial q_i}\right)^2 + \left(\frac{\partial z}{\partial q_i}\right)^2}. \qquad(7.8) $$
The factors $h_i$ introduced above, called scale factors, are chosen to be positive to ensure that $\mathbf{e}_i$ is directed along the $q_i$-line in the direction of increasing $q_i$. It is readily seen from the above equations that the unit base vectors $\mathbf{e}_i$ are expressed as a linear combination of the Cartesian vectors $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$, which do not change their direction with the position of the point $P$. For convenience, these can be collected in three unit vectors $\mathbf{e}_i^0$, $i = 1, 2, 3$. Therefore, a change (if any) experienced by the vectors $\mathbf{e}_i$ is contained in the coefficients $m_{ij}$ of the expansion. The relationship between the new (curvilinear) $\{\mathbf{e}_i\}$ and Cartesian $\{\mathbf{e}_i^0\}$ vectors is conveniently written in the matrix form²:
$$ \mathbf{e}_i = \sum_{j=1}^{3} m_{ij}\,\mathbf{e}_j^0 \quad\text{or}\quad \begin{pmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \\ \mathbf{e}_3 \end{pmatrix} = M\begin{pmatrix} \mathbf{e}_1^0 \\ \mathbf{e}_2^0 \\ \mathbf{e}_3^0 \end{pmatrix} = M\begin{pmatrix} \mathbf{i} \\ \mathbf{j} \\ \mathbf{k} \end{pmatrix}, \qquad(7.9) $$
2
We remind that we use capital letters for matrices and the corresponding small letters for their
matrix elements.
where $M = \left(m_{ij}\right)$ is the $3\times 3$ matrix of the coefficients,
$$ m_{i1} = \frac{1}{h_i}\frac{\partial x}{\partial q_i}, \qquad m_{i2} = \frac{1}{h_i}\frac{\partial y}{\partial q_i}, \qquad m_{i3} = \frac{1}{h_i}\frac{\partial z}{\partial q_i}. $$
If the curvilinear system is orthogonal, then at every point
$$ \mathbf{e}_1\cdot\mathbf{e}_2 = \mathbf{e}_1\cdot\mathbf{e}_3 = \mathbf{e}_2\cdot\mathbf{e}_3 = 0. \qquad(7.11) $$
These relationships hold in spite of the fact that directions of $\mathbf{e}_i$ may vary from point to point. For an orthogonal system the coordinate surfaces through any point $P$ will all intersect at right angles.
If the given curvilinear system is orthogonal, then the matrix $M$ must be orthogonal as well. This is easy to understand, as the transformation $\{\mathbf{e}_i^0\} \to \{\mathbf{e}_i\}$, Eq. (7.9), can be considered as a rotation in 3D space (Sect. 1.2.5.2): indeed, from the three Cartesian vectors $\{\mathbf{e}_i^0\}$ we obtain a new set of vectors $\{\mathbf{e}_i\}$ in the same space. Hence, if from one set of orthogonal vectors we obtain another orthogonal set, this can only be accomplished by an orthogonal transformation (Sect. 1.2.5). This can also be shown explicitly: if both sets are orthogonal, $\mathbf{e}_i\cdot\mathbf{e}_j = \delta_{ij}$ and $\mathbf{e}_k^{0}\cdot\mathbf{e}_{k'}^{0} = \delta_{kk'}$, then the matrix $M$ must be orthogonal as well:
$$ \mathbf{e}_i\cdot\mathbf{e}_j = \sum_{kk'} m_{ik}m_{jk'}\,\mathbf{e}_k^{0}\cdot\mathbf{e}_{k'}^{0} = \sum_{kk'} m_{ik}m_{jk'}\,\delta_{kk'} = \sum_k m_{ik}m_{jk} = \sum_k m_{ik}\left(M^T\right)_{kj} = \left(MM^T\right)_{ij}. $$
Since the dot product $\mathbf{e}_i\cdot\mathbf{e}_j$ must be equal to $\delta_{ij}$, then $MM^T$ is the unity matrix, i.e. $M^T = M^{-1}$, which is the required statement. For orthogonal systems the inverse transformation, Eq. (7.10), is provided by the transposed matrix $M^T$:
$$ \mathbf{e}_i^0 = \sum_{j=1}^{3}\left(M^T\right)_{ij}\mathbf{e}_j = \sum_{j=1}^{3} m_{ji}\,\mathbf{e}_j. $$
As an example, consider the cylindrical system $(r, \varphi, z)$, for which
$$\mathbf{r} = \mathbf{r}(r,\varphi,z) = x(r,\varphi,z)\,\mathbf{i} + y(r,\varphi,z)\,\mathbf{j} + z(r,\varphi,z)\,\mathbf{k} = (r\cos\varphi)\,\mathbf{i} + (r\sin\varphi)\,\mathbf{j} + z\,\mathbf{k}\,.$$
We now calculate the derivatives of the vector $\mathbf{r}$ and their absolute values as required by Eq. (7.7):
$$\frac{\partial\mathbf{r}}{\partial r} = (\cos\varphi)\,\mathbf{i} + (\sin\varphi)\,\mathbf{j}\,,\quad h_r = \left|\frac{\partial\mathbf{r}}{\partial r}\right| = \left(\cos^2\varphi + \sin^2\varphi\right)^{1/2} = 1\,; \quad (7.12)$$
$$\frac{\partial\mathbf{r}}{\partial\varphi} = (-r\sin\varphi)\,\mathbf{i} + (r\cos\varphi)\,\mathbf{j}\,,\quad h_\varphi = \left|\frac{\partial\mathbf{r}}{\partial\varphi}\right| = \left(r^2\sin^2\varphi + r^2\cos^2\varphi\right)^{1/2} = r\,; \quad (7.13)$$
$$\frac{\partial\mathbf{r}}{\partial z} = \mathbf{k}\,,\quad h_z = \left|\frac{\partial\mathbf{r}}{\partial z}\right| = 1\,. \quad (7.14)$$
Hence, we obtain for the unit base vectors³ the following explicit expressions:
$$\mathbf{e}_r = \cos\varphi\,\mathbf{i} + \sin\varphi\,\mathbf{j}\,,\quad \mathbf{e}_\varphi = -\sin\varphi\,\mathbf{i} + \cos\varphi\,\mathbf{j}\,,\quad \mathbf{e}_z = \mathbf{k}\,. \quad (7.15)$$
It is easily verified that the vectors are of unit length (by construction) and are all mutually orthogonal: $\mathbf{e}_r\cdot\mathbf{e}_\varphi = \mathbf{e}_r\cdot\mathbf{e}_z = \mathbf{e}_\varphi\cdot\mathbf{e}_z = 0$. These results show explicitly that the cylindrical coordinate system is orthogonal, as these conditions were derived for any point $P$ (i.e. any $r$, $\varphi$ and $z$). Hence, in the case of the cylindrical system the transformation matrix $M$ reads:
$$M = \begin{pmatrix} \cos\varphi & \sin\varphi & 0 \\ -\sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (7.16)$$
The matrix $M$ is orthogonal since its rows (or columns) form an orthonormal set of vectors, as expected; hence
$$M^{-1} = M^T = \begin{pmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (7.17)$$
³ It is convenient to use the corresponding symbols of the curvilinear coordinates as subscripts for the unit base vectors instead of numbers in each case, and we shall frequently be using this notation.
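The calculation just performed is easy to automate. The following short sympy sketch (my own illustration, not part of the text) computes the scale factors and unit base vectors of the cylindrical system directly from Eqs. (7.6)–(7.8) and verifies that the resulting matrix $M$ of Eq. (7.16) is orthogonal:

```python
import sympy as sp

r, phi, z = sp.symbols('r phi z', positive=True)
R = sp.Matrix([r*sp.cos(phi), r*sp.sin(phi), z])   # position vector (x, y, z)

rows = []
for q in (r, phi, z):
    dR = R.diff(q)                  # tangent vector along the q-line, Eq. (7.6)
    h = sp.simplify(dR.norm())      # scale factor h_q, Eq. (7.8)
    e = sp.simplify(dR / h)         # unit base vector e_q, Eq. (7.7)
    print(q, h, list(e))
    rows.append(list(e))

M = sp.Matrix(rows)                 # transformation matrix of Eq. (7.16)
assert sp.simplify(M*M.T - sp.eye(3)) == sp.zeros(3)   # M M^T = E, so M is orthogonal
```

Replacing the position vector `R` by its spherical-coordinate form reproduces the matrix of Problem 7.2 in the same way.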
These relationships enable one to rewrite any vector field from the Cartesian to the cylindrical coordinates:
$$\mathbf{F} = F_x\,\mathbf{i} + F_y\,\mathbf{j} + F_z\,\mathbf{k} = F_r\,\mathbf{e}_r + F_\varphi\,\mathbf{e}_\varphi + F_z\,\mathbf{e}_z\,,$$
where the three components of the field are explicit functions of the cylindrical coordinates, e.g. $F_x(\mathbf{r}) = F_x(x, y, z) = F_x(r\cos\varphi, r\sin\varphi, z)$. For instance,
$$\mathbf{F} = e^{-x^2-y^2}\left(x\,\mathbf{i} + y\,\mathbf{j}\right) = e^{-r^2}\left[r\cos\varphi\left(\cos\varphi\,\mathbf{e}_r - \sin\varphi\,\mathbf{e}_\varphi\right) + r\sin\varphi\left(\sin\varphi\,\mathbf{e}_r + \cos\varphi\,\mathbf{e}_\varphi\right)\right] = e^{-r^2}\,r\,\mathbf{e}_r\,.$$
Problem 7.2. Using Fig. 7.4, describe the coordinate surfaces and lines for the spherical coordinate system $(r, \theta, \varphi)$, Eq. (7.4). Show that the unit base vectors and the corresponding scale factors for this system are
$$\mathbf{e}_r = \sin\theta\cos\varphi\,\mathbf{i} + \sin\theta\sin\varphi\,\mathbf{j} + \cos\theta\,\mathbf{k}\,,\quad h_r = 1\,,$$
$$\mathbf{e}_\theta = \cos\theta\cos\varphi\,\mathbf{i} + \cos\theta\sin\varphi\,\mathbf{j} - \sin\theta\,\mathbf{k}\,,\quad h_\theta = r\,,$$
$$\mathbf{e}_\varphi = -\sin\varphi\,\mathbf{i} + \cos\varphi\,\mathbf{j}\,,\quad h_\varphi = r\sin\theta\,.$$
Prove by checking the dot products between unit base vectors that this system is orthogonal, and show that in this case the transformation matrix
$$M = \begin{pmatrix} \sin\theta\cos\varphi & \sin\theta\sin\varphi & \cos\theta \\ \cos\theta\cos\varphi & \cos\theta\sin\varphi & -\sin\theta \\ -\sin\varphi & \cos\varphi & 0 \end{pmatrix}$$
is orthogonal, i.e. $M^T M = E$, where $E$ is the unit matrix. Also demonstrate by a direct calculation that
$$\mathbf{e}_r \times \mathbf{e}_\theta = \mathbf{e}_\varphi\,,\quad \mathbf{e}_r \times \mathbf{e}_\varphi = -\mathbf{e}_\theta \quad\text{and}\quad \mathbf{e}_\theta \times \mathbf{e}_\varphi = \mathbf{e}_r\,.$$
$$d\mathbf{r} = \sum_{i=1}^{3}\frac{\partial\mathbf{r}}{\partial q_i}\,dq_i = \sum_{i=1}^{3} h_i\,dq_i\,\mathbf{e}_i\,, \quad (7.23)$$
where Eq. (7.7) for the unit base vectors has been used. Note that expression (7.23) for the change of $\mathbf{r}$ is valid for general non-orthogonal curvilinear coordinate systems.
The square of the length $ds$ of the displacement vector $d\mathbf{r}$ is then given by
$$(ds)^2 = d\mathbf{r}\cdot d\mathbf{r} = \sum_{i=1}^{3}\sum_{j=1}^{3} g_{ij}\,dq_i\,dq_j\,, \quad (7.24)$$
where $g_{ij} = \dfrac{\partial\mathbf{r}}{\partial q_i}\cdot\dfrac{\partial\mathbf{r}}{\partial q_j}$ is the metric tensor. In particular, along a single coordinate line $q_i$ the displacement $d\mathbf{r}_i = h_i\,dq_i\,\mathbf{e}_i$ has length $ds_i = h_i\,dq_i$, explaining why $h_i$ is called a scale factor. The small displacements $d\mathbf{r}_1$, $d\mathbf{r}_2$ and $d\mathbf{r}_3$ made along each of the coordinate lines are shown in Fig. 7.5.
If the curvilinear system is orthogonal, the metric tensor is diagonal:
$$\mathbf{e}_i\cdot\mathbf{e}_j = \frac{1}{h_i h_j}\,\frac{\partial\mathbf{r}}{\partial q_i}\cdot\frac{\partial\mathbf{r}}{\partial q_j} = \delta_{ij} \;\Longrightarrow\; g_{ij} = \frac{\partial\mathbf{r}}{\partial q_i}\cdot\frac{\partial\mathbf{r}}{\partial q_j} = h_i^2\,\delta_{ij}\,. \quad (7.29)$$
For an orthogonal system the three vectors $d\mathbf{r}_i$ are orthogonal independently of the choice of the point $P$; this is not generally the case for an arbitrary system: at some points the unit base vectors may be orthogonal, but they will not be orthogonal at all points. One can also see that for an orthogonal system
$$(ds)^2 = h_1^2\,(dq_1)^2 + h_2^2\,(dq_2)^2 + h_3^2\,(dq_3)^2 = \sum_{i=1}^{3}\left(d\mathbf{r}_i\right)^2 = \sum_{i=1}^{3}\left(ds_i\right)^2\,, \quad (7.30)$$
so that, along a curve specified parametrically via $q_i = q_i(t)$,
$$ds = \sqrt{d\mathbf{r}\cdot d\mathbf{r}} = \sqrt{\sum_{i=1}^{3}\frac{\partial\mathbf{r}}{\partial q_i}\frac{dq_i}{dt}\cdot\sum_{j=1}^{3}\frac{\partial\mathbf{r}}{\partial q_j}\frac{dq_j}{dt}}\;dt = \sqrt{\sum_{i,j=1}^{3}\frac{\partial\mathbf{r}}{\partial q_i}\cdot\frac{\partial\mathbf{r}}{\partial q_j}\,\frac{dq_i}{dt}\frac{dq_j}{dt}}\;dt = \sqrt{\sum_{i,j=1}^{3} g_{ij}\,\frac{dq_i}{dt}\frac{dq_j}{dt}}\;dt\,.$$
Of course, the same expression is obtained directly from Eq. (7.24) by considering $ds = \sqrt{(ds)^2}$ and $dq_i = (dq_i/dt)\,dt$. The total length of the curve between $t = 0$ and $t$ is then $s = \int ds$. In the case of an orthogonal system this formula simplifies to
$$s = \int_0^t \left[\sum_{i=1}^{3} h_i^2\left(\frac{dq_i}{dt}\right)^2\right]^{1/2} dt\,, \quad (7.33)$$
as expected.
In a general case, however, the sides of the parallelepiped in Fig. 7.5 are not orthogonal, and its volume is given by the absolute value of the mixed product of all three vectors:
$$dV = \left|\left(d\mathbf{r}_1\cdot\left[d\mathbf{r}_2\times d\mathbf{r}_3\right]\right)\right| = \left|h_1 h_2 h_3\left(\mathbf{e}_1\cdot\left[\mathbf{e}_2\times\mathbf{e}_3\right]\right)\right| dq_1\,dq_2\,dq_3\,.$$
This formula can already be used in practical calculations since it gives a general result for the Jacobian as
$$J = h_1 h_2 h_3\left|\mathbf{e}_1\cdot\left[\mathbf{e}_2\times\mathbf{e}_3\right]\right|\,.$$
However, it is instructive to demonstrate that this is actually the same result as the one derived previously in Sect. I.6.2.2, where $J$ was expressed directly via partial derivatives. To this end, recall the actual expressions for the unit base vectors, Eq. (7.7); hence, our previous result can be rewritten as
$$dV = \left|\frac{\partial\mathbf{r}}{\partial q_1}\cdot\left[\frac{\partial\mathbf{r}}{\partial q_2}\times\frac{\partial\mathbf{r}}{\partial q_3}\right]\right| dq_1\,dq_2\,dq_3 = J\,dq_1\,dq_2\,dq_3\,. \quad (7.39)$$
It is not difficult to see now that the mixed product of derivatives above is exactly the Jacobian $J = \partial(x, y, z)/\partial(q_1, q_2, q_3)$. Indeed, the mixed product of any three vectors can be written as a determinant (see Sect. I.1.7.1). Therefore, the Jacobian in Eq. (7.39) can finally be manipulated into
$$J = \begin{vmatrix} \partial x/\partial q_1 & \partial y/\partial q_1 & \partial z/\partial q_1 \\ \partial x/\partial q_2 & \partial y/\partial q_2 & \partial z/\partial q_2 \\ \partial x/\partial q_3 & \partial y/\partial q_3 & \partial z/\partial q_3 \end{vmatrix} = \frac{\partial(x, y, z)}{\partial(q_1, q_2, q_3)}\,,$$
as required.
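A quick sympy cross-check of the two equivalent Jacobian expressions just derived — the determinant of partial derivatives of Eq. (7.39) and the product of scale factors $h_1 h_2 h_3$ — can be done for the spherical system (this sketch is my own addition):

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
X = sp.Matrix([r*sp.sin(th)*sp.cos(ph),
               r*sp.sin(th)*sp.sin(ph),
               r*sp.cos(th)])

J = sp.simplify(X.jacobian([r, th, ph]).det())
print(J)                                        # -> r**2*sin(theta)

h = [X.diff(q).norm() for q in (r, th, ph)]     # scale factors h_r, h_theta, h_phi
pt = {r: 2, th: sp.pi/3, ph: sp.pi/5}
print((h[0]*h[1]*h[2]).subs(pt).evalf(), J.subs(pt).evalf())   # both 2*sqrt(3) ~ 3.464
```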
of the function $f(r) = \dfrac{1}{r^2 + \alpha^2}$ can be written as a one-dimensional integral
$$F(k) = \sqrt{\frac{2}{\pi}}\,\frac{1}{k}\int_0^\infty \frac{r\sin(kr)}{r^2 + \alpha^2}\,dr\,.$$
Problem 7.9. The charge density of a unit point charge positioned at the point $\mathbf{r}_0$ is described by the distribution function in the form of the 3D delta function:
$$\rho(\mathbf{r}) = \delta(\mathbf{r} - \mathbf{r}_0) = \delta(x - x_0)\,\delta(y - y_0)\,\delta(z - z_0)\,.$$
Show that in the spherical coordinates $(r, \theta, \varphi)$ this formula takes on the following form:
$$\rho(\mathbf{r}) = \delta(\mathbf{r} - \mathbf{r}_0) = \frac{1}{r^2\sin\theta}\,\delta(r - r_0)\,\delta(\theta - \theta_0)\,\delta(\varphi - \varphi_0)\,,$$
and that in a general orthogonal curvilinear system $(q_1, q_2, q_3)$
$$\delta(\mathbf{r} - \mathbf{r}_0) = \frac{1}{|h_1 h_2 h_3|}\,\delta\!\left(q_1 - q_1^0\right)\delta\!\left(q_2 - q_2^0\right)\delta\!\left(q_3 - q_3^0\right), \quad (7.40)$$
where $\mathbf{r} = \mathbf{r}(q_1, q_2, q_3)$ and $\mathbf{r}_0 = \mathbf{r}\!\left(q_1^0, q_2^0, q_3^0\right)$.
It is instructive to derive the above formula for the delta function in an orthogonal curvilinear system also using the exponential delta-sequence (4.18):
$$\delta_{nml}(\mathbf{r} - \mathbf{r}_0) = \delta_n(x - x_0)\,\delta_m(y - y_0)\,\delta_l(z - z_0) = \frac{nml}{(2\pi)^{3/2}}\exp\left[-\frac{n^2}{2}(x - x_0)^2 - \frac{m^2}{2}(y - y_0)^2 - \frac{l^2}{2}(z - z_0)^2\right],$$
where $n$, $m$ and $l$ are integers. These are meant to go to infinity. The result should not depend on the way these three numbers tend to infinity though; therefore, we shall consider the limit of $n = m = l \to \infty$:
$$\delta_{nnn}(\mathbf{r} - \mathbf{r}_0) = \left(\frac{n}{\sqrt{2\pi}}\right)^3 \exp\left[-\frac{n^2}{2}\left(\mathbf{r} - \mathbf{r}_0\right)^2\right].$$
We are interested here in having $\mathbf{r} - \mathbf{r}_0$ very close to zero. In this case $(\mathbf{r} - \mathbf{r}_0)^2$ is the distance squared between two close points whose curvilinear coordinates differ by $\Delta q_1 = q_1 - q_1^0$, $\Delta q_2 = q_2 - q_2^0$ and $\Delta q_3 = q_3 - q_3^0$. The distance squared between these close points is given by Eq. (7.30):
$$\left(\mathbf{r} - \mathbf{r}_0\right)^2 = \sum_{i=1}^{3} h_i^2\left(q_i - q_i^0\right)^2\,,$$
so that
$$\delta_{nnn}(\mathbf{r} - \mathbf{r}_0) = \prod_{i=1}^{3}\frac{n}{\sqrt{2\pi}}\exp\left[-\frac{n^2 h_i^2}{2}\left(q_i - q_i^0\right)^2\right].$$
Taking the limit, we obtain
$$\delta(\mathbf{r} - \mathbf{r}_0) = \lim_{n\to\infty}\prod_{i=1}^{3}\frac{n}{\sqrt{2\pi}}\exp\left[-\frac{n^2 h_i^2}{2}\left(q_i - q_i^0\right)^2\right] = \prod_{i=1}^{3}\delta\!\left(h_i\left(q_i - q_i^0\right)\right) = \prod_{i=1}^{3}\frac{1}{|h_i|}\,\delta\!\left(q_i - q_i^0\right),$$
which is the same result as the above Eq. (7.40). In the last passage we have used the property (4.12) of the delta function.
7.5 Change of Variables in Multiple Integrals

We have seen in Sections I.6.1.4 and I.6.2.2 that one has to calculate the Jacobian of the transformation when changing the variables in double and triple integrals, respectively. In the previous section we derived an expression for the Jacobian again, using the general technique of curvilinear coordinates developed in the preceding sections. Here we shall prove that there is a straightforward generalisation of this result to any number of multiple integrals.
Consider an $n$-fold integral
$$I_n = \underbrace{\int\cdots\int}_{n} F(x_1, \ldots, x_n)\,dx_1\cdots dx_n\,, \quad (7.41)$$
in which the variables are changed via
$$x_i = f_i(y_1, \ldots, y_n)\,,\quad i = 1, \ldots, n\,. \quad (7.42)$$
We shall show that
$$I_n = \underbrace{\int\cdots\int}_{n} F\left(f_1(Y), \ldots, f_n(Y)\right)|J_n|\,dy_1\cdots dy_n\,, \quad (7.43)$$
where
$$J_n = \begin{vmatrix} \partial x_1/\partial y_1 & \partial x_2/\partial y_1 & \cdots & \partial x_n/\partial y_1 \\ \partial x_1/\partial y_2 & \partial x_2/\partial y_2 & \cdots & \partial x_n/\partial y_2 \\ \vdots & \vdots & & \vdots \\ \partial x_1/\partial y_n & \partial x_2/\partial y_n & \cdots & \partial x_n/\partial y_n \end{vmatrix} = \left|\;\partial x_1/\partial y_k \;\;\; \partial x_2/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right|, \quad (7.44)$$
and in the shorthand form we only write explicitly the elements of the $k$-th row. You should imagine that each such term, e.g. $\partial x_2/\partial y_k$, represents the whole column of such elements with $k$ changing between 1 and $n$.
We shall prove this formula using induction. The formula is valid for $n = 2, 3$; hence, we assume that it is valid for the $(n-1)$-dimensional integral, and shall then prove that it is also valid for the $n$-dimensional integral. Consider the original integral (7.41), in which we shall integrate over the variable $x_1$ last:
$$I_n = \int dx_1\left[\underbrace{\int\cdots\int}_{n-1} F(x_1, \ldots, x_n)\,dx_2\cdots dx_n\right]. \quad (7.45)$$
In the internal $(n-1)$-fold integral the variable $x_1$ is regarded as a fixed parameter; solving the first of the transformation equations (7.42) for $y_1$, we can write
$$y_1 = \xi(x_1, y_2, \ldots, y_n)\,. \quad (7.46)$$
Therefore, the transformation relations between the old and new variables can be rewritten by excluding the first variable from either of the sets:
$$x_i = f_i\left(\xi(x_1, y_2, \ldots, y_n), y_2, \ldots, y_n\right) = g_i(x_1, y_2, \ldots, y_n)\,,\quad i = 2, \ldots, n\,, \quad (7.47)$$
where $g_2$, $g_3$, etc., are the new transformation functions. Hence, the internal integral in (7.45) can be transformed to the new set of variables $y_2, \ldots, y_n$ by means of the Jacobian $J_{n-1}$. This is possible due to our assumption:
$$I_n = \int dx_1\,\underbrace{\int\cdots\int}_{n-1} F\left(x_1, g_2(x_1, y_2, \ldots, y_n), \ldots, g_n(x_1, y_2, \ldots, y_n)\right)\left|J_{n-1}\right| dy_2\cdots dy_n\,,$$
where
$$J_{n-1} = \begin{vmatrix} \partial g_2/\partial y_2 & \partial g_3/\partial y_2 & \cdots & \partial g_n/\partial y_2 \\ \partial g_2/\partial y_3 & \partial g_3/\partial y_3 & \cdots & \partial g_n/\partial y_3 \\ \vdots & \vdots & & \vdots \\ \partial g_2/\partial y_n & \partial g_3/\partial y_n & \cdots & \partial g_n/\partial y_n \end{vmatrix} = \left|\;\partial g_2/\partial y_k \;\;\; \partial g_3/\partial y_k \;\;\cdots\;\; \partial g_n/\partial y_k\;\right|. \quad (7.48)$$
Let us calculate the derivatives appearing in $J_{n-1} = \left|\partial g_i/\partial y_j\right|$ explicitly. From (7.47), for $i = 2, \ldots, n$ we can write
$$\frac{\partial g_i}{\partial y_j} = \frac{\partial x_i}{\partial y_j} + \frac{\partial x_i}{\partial y_1}\,\frac{\partial\xi}{\partial y_j}\,,$$
while the first variable satisfies
$$x_1 = f_1(y_1, y_2, \ldots, y_n)\,. \quad (7.49)$$
Here $x_1$ is fixed, and hence the dependence of $\xi$ (or of $y_1$, see Eq. (7.46)) on the other variables is implicit. Differentiating both sides of (7.49) with respect to $y_j$, and keeping in mind that $y_1$ also depends on $y_j$ via Eq. (7.46), we obtain
$$0 = \frac{\partial x_1}{\partial y_j} + \frac{\partial x_1}{\partial y_1}\,\frac{\partial\xi}{\partial y_j} \;\Longrightarrow\; \frac{\partial\xi}{\partial y_j} = -\frac{\partial x_1/\partial y_j}{\partial x_1/\partial y_1} \equiv -\alpha_j\,,$$
so that $\partial g_i/\partial y_k = \partial x_i/\partial y_k - \alpha_k\left(\partial x_i/\partial y_1\right)$ and
$$J_{n-1} = \left|\;\partial x_2/\partial y_k - \alpha_k\left(\partial x_2/\partial y_1\right) \;\;\cdots\;\; \partial x_n/\partial y_k - \alpha_k\left(\partial x_n/\partial y_1\right)\;\right|. \quad (7.50)$$
The first column contains a difference of two terms; according to Properties 1.4 and 1.5 of determinants (Sect. 1.2.6.2), we can split the first column and rewrite $J_{n-1}$ as two determinants:
$$J_{n-1} = \left|\;\partial x_2/\partial y_k \;\;\; \left(\partial x_3/\partial y_k\right) - \alpha_k\left(\partial x_3/\partial y_1\right) \;\;\cdots\;\; \left(\partial x_n/\partial y_k\right) - \alpha_k\left(\partial x_n/\partial y_1\right)\;\right|$$
$$-\,\frac{\partial x_2}{\partial y_1}\left|\;\alpha_k \;\;\; \left(\partial x_3/\partial y_k\right) - \alpha_k\left(\partial x_3/\partial y_1\right) \;\;\cdots\;\; \left(\partial x_n/\partial y_k\right) - \alpha_k\left(\partial x_n/\partial y_1\right)\;\right|.$$
Similarly, we can split the second column in both determinants:
$$J_{n-1} = \left|\;\partial x_2/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \left(\partial x_n/\partial y_k\right) - \alpha_k\left(\partial x_n/\partial y_1\right)\;\right|$$
$$-\,\frac{\partial x_3}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \alpha_k \;\;\cdots\;\; \left(\partial x_n/\partial y_k\right) - \alpha_k\left(\partial x_n/\partial y_1\right)\;\right|$$
$$-\,\frac{\partial x_2}{\partial y_1}\left|\;\alpha_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \left(\partial x_n/\partial y_k\right) - \alpha_k\left(\partial x_n/\partial y_1\right)\;\right|$$
$$+\,\frac{\partial x_2}{\partial y_1}\frac{\partial x_3}{\partial y_1}\left|\;\alpha_k \;\;\; \alpha_k \;\;\cdots\;\; \left(\partial x_n/\partial y_k\right) - \alpha_k\left(\partial x_n/\partial y_1\right)\;\right|.$$
The last determinant contains two identical columns and hence is equal to zero (Property 1.3 of determinants from Sect. 1.2.6.2). It is clear now that, if we continue this process and split all columns in $J_{n-1}$ of Eq. (7.50), we shall arrive at a sum of determinants in which the column $\alpha_k$ can appear only once; there will also be one determinant without $\alpha_k$, which we shall write first:
$$J_{n-1} = \left|\;\partial x_2/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right| - \frac{\partial x_2}{\partial y_1}\left|\;\alpha_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right|$$
$$-\,\frac{\partial x_3}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \alpha_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right| - \cdots - \frac{\partial x_n}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \alpha_k\;\right|. \quad (7.51)$$
Before proceeding, note that the remaining integration over $x_1$ can also be changed to an integration over $y_1$ at fixed $y_2, \ldots, y_n$, for which $dx_1 = (\partial x_1/\partial y_1)\,dy_1$; this turns the result into
$$I_n = \underbrace{\int\cdots\int}_{n} F\left(f_1(Y), \ldots, f_n(Y)\right)\left|J_{n-1}\,\frac{\partial x_1}{\partial y_1}\right| dy_1\cdots dy_n\,. \quad (7.52)$$
It therefore remains to show that $J_{n-1}\left(\partial x_1/\partial y_1\right) = J_n$. Multiplying (7.51) by $\partial x_1/\partial y_1$ and taking this factor inside the $\alpha_k$ columns, we obtain
$$J_{n-1}\frac{\partial x_1}{\partial y_1} = \frac{\partial x_1}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right| - \frac{\partial x_2}{\partial y_1}\left|\;\alpha_k\left(\partial x_1/\partial y_1\right) \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right|$$
$$-\,\frac{\partial x_3}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \alpha_k\left(\partial x_1/\partial y_1\right) \;\;\cdots\;\; \partial x_n/\partial y_k\;\right| - \cdots - \frac{\partial x_n}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \alpha_k\left(\partial x_1/\partial y_1\right)\;\right|$$
$$= \frac{\partial x_1}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right| - \frac{\partial x_2}{\partial y_1}\left|\;\partial x_1/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right|$$
$$-\,\frac{\partial x_3}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \partial x_1/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right| - \cdots - \frac{\partial x_n}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_1/\partial y_k\;\right|, \quad (7.53)$$
since $\alpha_k\left(\partial x_1/\partial y_1\right) = \partial x_1/\partial y_k$.
As the final step, in each of the determinants apart from the first and the second ones, we move the column with $\partial x_1/\partial y_k$ to the first position; the order of the other columns we do not change. This operation requires several pair permutations. If the column with $\partial x_1/\partial y_k$ is at the position of the $l$-th column (where $l = 1, 2, \ldots, n-1$), this requires $(l-1)$ permutations, giving a factor of $(-1)^{l-1}$. This yields
$$J_{n-1}\frac{\partial x_1}{\partial y_1} = \frac{\partial x_1}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right| + (-1)^1\frac{\partial x_2}{\partial y_1}\left|\;\partial x_1/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right|$$
$$+\cdots+ (-1)^{n-1}\frac{\partial x_n}{\partial y_1}\left|\;\partial x_1/\partial y_k \;\;\; \partial x_2/\partial y_k \;\;\cdots\;\; \partial x_{n-1}/\partial y_k\;\right|. \quad (7.54)$$
Looking now very carefully, we can verify that this expression is exactly the expansion of $J_n$ in Eq. (7.44) along the first row, i.e. $J_{n-1}\left(\partial x_1/\partial y_1\right)$ is indeed $J_n$, and hence Eq. (7.52) is the required formula (7.43). Q.E.D.
Problem 7.11. Prove the formula for the multidimensional Gaussian integral,
$$I_n = \int_{-\infty}^{\infty} dx_1\int_{-\infty}^{\infty} dx_2\cdots\int_{-\infty}^{\infty} dx_n\,\exp\left(-\frac{1}{2}\sum_{i,j=1}^{n} q_{ij}\,x_i x_j\right) = \frac{(2\pi)^{n/2}}{\sqrt{|Q|}}\,, \quad (7.55)$$
following these steps: (i) write the quadratic form in the exponent in the matrix form as $-\frac{1}{2}X^T Q X$ with $X = (x_i)$ and $Q = (q_{ij})$; (ii) introduce new variables $Y = U^T X = (y_i)$ with the orthogonal matrix $U$ which diagonalises the matrix $Q = UDU^T$, where $D = \left(\delta_{ij}\,d_i\right)$ is the diagonal matrix of the eigenvalues $d_i$ of $Q$; (iii) change variables in the integral from $X$ to $Y$ and explain why the Jacobian $J = \left|\partial x_i/\partial y_j\right|$ is equal to unity; (iv) then the $n$-fold integral splits into a product of $n$ independent Gaussian integrals $\int_{-\infty}^{\infty}\exp\left(-\frac{1}{2}d_i y_i^2\right)dy_i = \sqrt{2\pi/d_i}$; (v) finally, multiply all such contributions to get the required result.
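Eq. (7.55) is easy to test numerically. The following sketch (my own, with an arbitrarily chosen positive-definite $Q$) compares a brute-force grid integration for $n = 2$ with the closed-form answer:

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # symmetric, positive definite (n = 2)

# brute-force integration on a grid; the integrand decays fast, so [-10, 10] suffices
x = np.linspace(-10, 10, 801)
X, Y = np.meshgrid(x, x, indexing='ij')
quad = Q[0, 0]*X**2 + 2*Q[0, 1]*X*Y + Q[1, 1]*Y**2
integral = np.trapz(np.trapz(np.exp(-0.5*quad), x), x)

exact = (2*np.pi)**(Q.shape[0]/2) / np.sqrt(np.linalg.det(Q))
print(integral, exact)              # both approximately 4.75
```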
7.6 N-dimensional Sphere

Consider the integral
$$I_N = \underbrace{\int\cdots\int}_{N} f\left(x_1^2 + \cdots + x_N^2\right)dx_1\cdots dx_N\,. \quad (7.56)$$
Here the integration is performed over the whole space. We shall now simplify this $N$-dimensional integral using a kind of heuristic argument.
Indeed, let us first calculate the integral
$$V_N = \underbrace{\int\cdots\int}_{x_1^2+\cdots+x_N^2\,\le\, R^2} dx_1\cdots dx_N\,,$$
i.e. the volume of the $N$-dimensional sphere $x_1^2 + \cdots + x_N^2 \le R^2$ of radius $R$. On dimensional grounds, $V_N = C_N R^N$ with some constant $C_N$, and the volume element can be written as
$$dV = dx_1\cdots dx_N = r^{N-1}\,dr\,d\Omega_{N-1}\,, \quad (7.57)$$
where $r = \left(x_1^2 + \cdots + x_N^2\right)^{1/2}$ is the distance to the centre of the coordinate system, which can be considered as the "radial" coordinate of the $N$-dimensional spherical coordinate system, and $d\Omega_{N-1}$ is the element of the corresponding solid angle, so that
$$V_N = \int_0^R r^{N-1}\,dr\int d\Omega_{N-1}\,. \quad (7.58)$$
We shall show now that, for functions $f(r^2)$ which depend only on the distance $r$, the precise form of the angular part, $d\Omega_{N-1}$, is not important. Indeed, assuming that the angular integral does not depend on $r$, the $r$ integration can be performed first:
$$V_N = \frac{R^N}{N}\int d\Omega_{N-1} \;\Longrightarrow\; \int d\Omega_{N-1} = N C_N\,. \quad (7.59)$$
To calculate the constant $C_N$, let us consider the integral (7.56) with $f(x) = e^{-x}$. We can write
$$\int\cdots\int \exp\left[-\left(x_1^2 + \cdots + x_N^2\right)\right]dx_1\cdots dx_N = \prod_{i=1}^{N}\int_{-\infty}^{\infty} e^{-x^2}dx = \pi^{N/2}\,.$$
On the other hand, the same integral can be written via the corresponding radial and angular arguments as
$$\int_0^\infty e^{-r^2} r^{N-1}\,dr\int d\Omega_{N-1} = \frac{1}{2}\,\Gamma\!\left(\frac{N}{2}\right)\int d\Omega_{N-1}\,,$$
where we have calculated the radial integral explicitly. Since the angular integration gives $N C_N$, see Eq. (7.59), we immediately obtain that
$$N C_N\,\frac{1}{2}\,\Gamma\!\left(\frac{N}{2}\right) = \pi^{N/2} \;\Longrightarrow\; C_N = \frac{2\pi^{N/2}}{N\,\Gamma\!\left(\frac{N}{2}\right)}\,. \quad (7.60)$$
This solves the problem of calculating the volume of the $N$-dimensional sphere:
$$V_N = C_N R^N = \frac{2\pi^{N/2} R^N}{N\,\Gamma\!\left(\frac{N}{2}\right)}\,. \quad (7.61)$$
Correspondingly, differentiating $V_N$ with respect to $R$, one obtains the surface area of the $(N-1)$-dimensional sphere of radius $R$:
$$S_{N-1}(R) = \frac{2\pi^{N/2} R^{N-1}}{\Gamma\!\left(\frac{N}{2}\right)}\,. \quad (7.62)$$
Now we are able to return to the integral (7.56). Noting that the argument of the function $f$ in the integrand is simply $r^2$, we obtain
$$I_N = \int_0^\infty f\left(r^2\right)r^{N-1}\,dr\int d\Omega_{N-1} = \frac{2\pi^{N/2}}{\Gamma\!\left(\frac{N}{2}\right)}\int_0^\infty f\left(r^2\right)r^{N-1}\,dr\,, \quad (7.63)$$
where we have used Eqs. (7.59) and (7.60) for the angular integral. This is the final result.
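The volume formula (7.61) can be checked against a simple Monte Carlo estimate (my own sketch, not from the text): random points are drawn uniformly in the enclosing cube, and the hit rate inside the unit sphere is multiplied by the cube's volume.

```python
import numpy as np
from math import gamma, pi

def v_sphere(N, R=1.0):
    return 2 * pi**(N/2) * R**N / (N * gamma(N/2))     # Eq. (7.61)

rng = np.random.default_rng(0)
for N in (2, 3, 4, 5):
    pts = rng.uniform(-1.0, 1.0, size=(200_000, N))
    mc = 2.0**N * np.mean(np.sum(pts**2, axis=1) <= 1.0)  # cube volume * hit rate
    print(N, round(v_sphere(N), 4), round(mc, 4))          # formula vs Monte Carlo
```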
The essential point of our discussion above is an assumption, made in Eq. (7.58), that the volume element can be written as in Eq. (7.57). Although this formula is based on the dimensional argument, it is worth deriving it. We shall do this by generalising spherical coordinates to the $N$-dimensional space, $(x_1, \ldots, x_N) \to (r, \varphi_1, \ldots, \varphi_{N-1})$, where the two systems of coordinates are related to each other via the following transformation equations:
$$x_1 = r\cos\varphi_1\,,\quad x_2 = r\sin\varphi_1\cos\varphi_2\,,\quad x_3 = r\sin\varphi_1\sin\varphi_2\cos\varphi_3\,,\;\ldots,$$
$$x_i = r\sin\varphi_1\sin\varphi_2\cdots\sin\varphi_{i-1}\cos\varphi_i\,,\;\ldots,$$
$$x_{N-1} = r\sin\varphi_1\sin\varphi_2\cdots\sin\varphi_{N-2}\cos\varphi_{N-1}\,,$$
$$x_N = r\sin\varphi_1\sin\varphi_2\cdots\sin\varphi_{N-2}\sin\varphi_{N-1}\,.$$
Here $r \ge 0$ and $0 \le \varphi_{N-1} \le 2\pi$, while all other angles range between 0 and $\pi$.
The volume elements in the two coordinate systems, the Cartesian and the $N$-dimensional spherical ones, are related by the Jacobian (Sect. 7.5):
$$dx_1\,dx_2\cdots dx_N = \left|\frac{\partial(x_1, x_2, \ldots, x_N)}{\partial(r, \varphi_1, \ldots, \varphi_{N-1})}\right| dr\,d\varphi_1\cdots d\varphi_{N-1} = |J_N|\,dr\,d\varphi_1\cdots d\varphi_{N-1}\,.$$
Since each $x_i$ is proportional to $r$, each derivative $\partial x_i/\partial\varphi_j$ is proportional to $r$, while $\partial x_i/\partial r$ does not contain $r$ at all; taking the common factor $r$ out of each of the $N-1$ angular columns of the determinant, we conclude that $J_N = r^{N-1} F(\varphi_1, \ldots, \varphi_{N-1})$, which is precisely the structure assumed in Eq. (7.57). A full expression for the Jacobian (i.e. the function $F$) can also be derived, if needed, by calculating the Jacobian determinant explicitly. This would be needed when calculating integrals in which the integrand, apart from $r$, depends also on other coordinates (or their combinations).
7.7 Gradient of a Scalar Field

In this and the following sections we shall revisit several important notions of vector calculus introduced in Sects. I.5.8, I.6.6.1 and I.6.6.2. There we obtained explicit formulae for the gradient, divergence and curl in the Cartesian system. Our task here is to generalise these to a general curvilinear coordinate system. This will allow us in the next chapter to consider PDEs of mathematical physics exploiting the symmetry of the problem at hand. Although the derivation can be done for a very general case, we shall limit ourselves here only to orthogonal curvilinear coordinate systems, as these are most frequently found in actual applications.
Consider a scalar field $\Psi(P)$ defined at each point $P(x, y, z)$ in a 3D region $R$. The directional derivative, $d\Psi/dl$, of $\Psi(P)$ was defined in Sect. I.5.8 as the rate of change of the scalar field along a direction specified by the unit vector $\mathbf{l}$. Then the gradient of $\Psi(P)$, written as $\operatorname{grad}\Psi(P)$, was defined as a vector satisfying the relation
$$\left(\operatorname{grad}\Psi\right)\cdot\mathbf{l} = \frac{d\Psi}{dl}\,. \quad (7.64)$$
Note that the gradient does not depend on the direction $\mathbf{l}$, only on the behaviour of the field near the point $P$.
Our task now is to obtain an expression for the gradient in a general orthogonal curvilinear coordinate system. In order to do that, we first expand the gradient (which is a vector field) in terms of the unit base vectors of this system:
$$\operatorname{grad}\Psi = \sum_{j=1}^{3}\left(\operatorname{grad}\Psi\right)_j\mathbf{e}_j\,. \quad (7.65)$$
Multiplying both sides by $\mathbf{e}_i$ and using the fact that the system is orthogonal, $\mathbf{e}_i\cdot\mathbf{e}_j = \delta_{ij}$, we have
$$\left(\operatorname{grad}\Psi\right)\cdot\mathbf{e}_i = \left(\operatorname{grad}\Psi\right)_i\,. \quad (7.66)$$
Comparing this equation with Eq. (7.64), we see that the $i$-th component of $\operatorname{grad}\Psi$ is provided by the directional derivative of $\Psi$ along the $q_i$ coordinate line, i.e. along the direction $\mathbf{e}_i$:
$$\left(\operatorname{grad}\Psi\right)_i = \frac{d\Psi}{ds_i}\,, \quad (7.67)$$
where $ds_i = h_i\,dq_i$ is the corresponding distance in space associated with the change of $q_i$ from $q_i$ to $q_i + dq_i$, see (7.28). Therefore, in order to calculate $\left(\operatorname{grad}\Psi\right)_i$, we have to calculate the change $d\Psi$ of $\Psi$ along the direction $\mathbf{e}_i$. In this direction only the coordinate $q_i$ is changing, by $dq_i$, i.e.
$$d\Psi = \frac{\partial\Psi}{\partial q_i}\,dq_i\,, \quad (7.68)$$
so that
$$\left(\operatorname{grad}\Psi\right)_i = \frac{d\Psi}{ds_i} = \frac{\left(\partial\Psi/\partial q_i\right)dq_i}{h_i\,dq_i} = \frac{1}{h_i}\frac{\partial\Psi}{\partial q_i}\,, \quad (7.69)$$
and hence
$$\operatorname{grad}\Psi = \sum_{i=1}^{3}\frac{1}{h_i}\frac{\partial\Psi}{\partial q_i}\,\mathbf{e}_i\,. \quad (7.70)$$
It is easy to see that this expression indeed generalises the expression for the gradient,
$$\operatorname{grad}\Psi(\mathbf{r}) = \frac{\partial\Psi}{\partial x}\,\mathbf{i} + \frac{\partial\Psi}{\partial y}\,\mathbf{j} + \frac{\partial\Psi}{\partial z}\,\mathbf{k}\,, \quad (7.71)$$
see Eq. (I.5.69), derived for the Cartesian system. In this case $(q_1, q_2, q_3) \to (x, y, z)$, $(\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3) \to (\mathbf{i}, \mathbf{j}, \mathbf{k})$ and $h_1 = h_2 = h_3 = 1$, so that we immediately recover our previous result. For cylindrical coordinates $(q_1, q_2, q_3) \to (r, \varphi, z)$ and $h_r = h_z = 1$, $h_\varphi = r$, so that in this case
$$\operatorname{grad}\Psi = \frac{\partial\Psi}{\partial r}\,\mathbf{e}_r + \frac{1}{r}\frac{\partial\Psi}{\partial\varphi}\,\mathbf{e}_\varphi + \frac{\partial\Psi}{\partial z}\,\mathbf{e}_z\,. \quad (7.72)$$
Similarly, for spherical coordinates $(r, \theta, \varphi)$, where $h_r = 1$, $h_\theta = r$ and $h_\varphi = r\sin\theta$,
$$\operatorname{grad}\Psi = \frac{\partial\Psi}{\partial r}\,\mathbf{e}_r + \frac{1}{r}\frac{\partial\Psi}{\partial\theta}\,\mathbf{e}_\theta + \frac{1}{r\sin\theta}\frac{\partial\Psi}{\partial\varphi}\,\mathbf{e}_\varphi\,. \quad (7.73)$$
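As a sanity check of Eq. (7.72), the following sympy sketch (my own, with an arbitrary test field $\Psi = xy + z$) computes the cylindrical gradient and maps it back onto $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$, recovering the Cartesian result $(y, x, 1)$:

```python
import sympy as sp

r, phi, z = sp.symbols('r phi z', positive=True)
x, y = r*sp.cos(phi), r*sp.sin(phi)

Psi = x*y + z                                    # test scalar field
g_r, g_phi, g_z = Psi.diff(r), Psi.diff(phi)/r, Psi.diff(z)   # components of Eq. (7.72)

# e_r, e_phi, e_z expressed via i, j, k (rows of M, Eq. (7.16))
e_r   = sp.Matrix([sp.cos(phi),  sp.sin(phi), 0])
e_phi = sp.Matrix([-sp.sin(phi), sp.cos(phi), 0])
e_z   = sp.Matrix([0, 0, 1])

grad = sp.simplify(g_r*e_r + g_phi*e_phi + g_z*e_z)
print(grad.T)    # -> [r*sin(phi), r*cos(phi), 1], i.e. (y, x, 1) as expected
```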
Problem 7.16. Consider a particle of unit mass moving within the $z = 0$ plane in a central field with the potential $U(r) = -\alpha/r$, where $r$ is the distance from the centre. Show that the force field, $\mathbf{F} = -\operatorname{grad}U$, acting on the particle in this coordinate system is radial, $\mathbf{F} = -\alpha r^{-2}\,\mathbf{e}_r$.
7.8 Divergence of a Vector Field

Consider a vector field $\mathbf{F}(P)$. We showed in Sect. I.6.6.1 that the divergence of the vector field at the point $P$ is given by the flux of $\mathbf{F}$ through a closed surface $S$ of volume $V$ containing the point $P$:
$$\operatorname{div}\mathbf{F} = \lim_{V\to 0}\frac{1}{V}\oint_S \mathbf{F}\cdot d\mathbf{S} = \lim_{V\to 0}\frac{1}{V}\oint_S \mathbf{F}\cdot\mathbf{n}\,dS\,, \quad (7.74)$$
where $\mathbf{n}$ is the outward unit normal to the surface. We expand the vector field in the unit base vectors of the curvilinear system:
$$\mathbf{F}(q_1, q_2, q_3) = \sum_{i=1}^{3} F_i(q_1, q_2, q_3)\,\mathbf{e}_i\,. \quad (7.75)$$
We first calculate the flux of $\mathbf{F}$ across the coordinate surface $EFGH$ crossing the coordinate line $q_1$ at the value $q_1 + \frac{1}{2}\delta q_1$. In an orthogonal system that surface can be considered, to the leading order, rectangular (its sides are orthogonal), with the sides $HE = GF = h_2\,\delta q_2$ and $GH = FE = h_3\,\delta q_3$. Its area therefore is $dS_1 = h_2 h_3\,\delta q_2\,\delta q_3$. Also, the outward normal to the surface is $\mathbf{n} = \mathbf{e}_1$ since, again, we consider an orthogonal system. Therefore, the flux through this surface is
$$d\Phi_1^{+} = \left(\mathbf{F}\cdot\mathbf{e}_1\right)dS_1 = \left(F_1 h_2 h_3\right)\delta q_2\,\delta q_3\,,$$
since $\mathbf{F}\cdot\mathbf{e}_1 = F_1$, see Eq. (7.75). The expression in the round brackets above is to be calculated at the central point of the surface, $\left(q_1 + \frac{1}{2}\delta q_1, q_2, q_3\right)$. Applying the Taylor expansion, this expression can be calculated, to first order, as
$$d\Phi_1^{+} = \left[\left(F_1 h_2 h_3\right)_P + \left(\frac{\partial}{\partial q_1}\left(F_1 h_2 h_3\right)\right)_P\frac{\delta q_1}{2}\right]\delta q_2\,\delta q_3 = \left(F_1 h_2 h_3\right)_P\delta q_2\,\delta q_3 + \left(\frac{\partial}{\partial q_1}\left(F_1 h_2 h_3\right)\right)_P\frac{\delta q_1}{2}\,\delta q_2\,\delta q_3\,. \quad (7.76)$$
The flux through the opposite face $IJKL$, crossing the $q_1$ line at $q_1 - \frac{1}{2}\delta q_1$, is similarly
$$d\Phi_1^{-} = -\left(F_1 h_2 h_3\right)_P\delta q_2\,\delta q_3 + \left(\frac{\partial}{\partial q_1}\left(F_1 h_2 h_3\right)\right)_P\frac{\delta q_1}{2}\,\delta q_2\,\delta q_3\,; \quad (7.77)$$
the minus sign is due to the fact that for this surface the outward normal is $\mathbf{n} = -\mathbf{e}_1$ and hence $\mathbf{F}\cdot\mathbf{n} = -F_1$. Thus, the total outward flux across the opposite pair of surfaces $EFGH$ and $IJKL$ is equal to
$$d\Phi_1 = d\Phi_1^{+} + d\Phi_1^{-} = \left(\frac{\partial}{\partial q_1}\left(F_1 h_2 h_3\right)\right)_P\delta q_1\,\delta q_2\,\delta q_3\,. \quad (7.78)$$
The same analysis is repeated for the other two pairs of opposite faces to yield the contributions
$$d\Phi_2 = \left(\frac{\partial}{\partial q_2}\left(F_2 h_3 h_1\right)\right)_P\delta q_1\,\delta q_2\,\delta q_3 \quad\text{and}\quad d\Phi_3 = \left(\frac{\partial}{\partial q_3}\left(F_3 h_1 h_2\right)\right)_P\delta q_1\,\delta q_2\,\delta q_3$$
for the $q_2$ and $q_3$ surfaces, respectively. Summing up all three contributions leads to the final value of the flux through the whole surface $S$:
$$\left[\frac{\partial}{\partial q_1}\left(F_1 h_2 h_3\right) + \frac{\partial}{\partial q_2}\left(F_2 h_3 h_1\right) + \frac{\partial}{\partial q_3}\left(F_3 h_1 h_2\right)\right]\delta q_1\,\delta q_2\,\delta q_3\,, \quad (7.79)$$
to the leading order. The volume of the parallelepiped is, to the same order, $V = h_1 h_2 h_3\,\delta q_1\,\delta q_2\,\delta q_3$ (7.80). Finally, dividing the flux (7.79) by the volume (7.80), we obtain
$$\operatorname{div}\mathbf{F} = \frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial q_1}\left(F_1 h_2 h_3\right) + \frac{\partial}{\partial q_2}\left(h_1 F_2 h_3\right) + \frac{\partial}{\partial q_3}\left(h_1 h_2 F_3\right)\right]. \quad (7.81)$$
Note that it might seem that we have never applied the limit $V \to 0$. In reality, we did: in the expressions for the fluxes and the volume we kept only the leading terms; the other terms are proportional to higher powers of $\delta q_i$ and would have disappeared in the limits $\delta q_i \to 0$ ($i = 1, 2, 3$), which correspond to the volume tending to zero. Hence, the result above is the general formula we sought.
Let us verify that this formula goes over to the result we had for the Cartesian system $(x, y, z)$,
$$\operatorname{div}\mathbf{F} = \frac{\partial F_x}{\partial x} + \frac{\partial F_y}{\partial y} + \frac{\partial F_z}{\partial z}\,,$$
see Eq. (I.6.79). In this case all scale factors are equal to one, and our result does indeed reduce to the one written above, as expected.
For cylindrical coordinates $(r, \varphi, z)$ the scale factors read $h_r = h_z = 1$ and $h_\varphi = r$, so that we find
$$\operatorname{div}\mathbf{F} = \frac{1}{r}\left[\frac{\partial}{\partial r}\left(F_r r\right) + \frac{\partial F_\varphi}{\partial\varphi} + \frac{\partial}{\partial z}\left(F_z r\right)\right] = \frac{1}{r}\frac{\partial}{\partial r}\left(r F_r\right) + \frac{1}{r}\frac{\partial F_\varphi}{\partial\varphi} + \frac{\partial F_z}{\partial z}\,. \quad (7.83)$$
Only the first two terms are to be kept if the two-dimensional polar coordinate system $(r, \varphi)$ is considered.
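A quick sympy confirmation (my own sketch) that the cylindrical divergence (7.83) agrees with the Cartesian one: the field $\mathbf{F} = x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}$ has $\operatorname{div}\mathbf{F} = 3$, and in cylindrical components it is simply $F_r = r$, $F_\varphi = 0$, $F_z = z$.

```python
import sympy as sp

r, phi, z = sp.symbols('r phi z', positive=True)
F_r, F_phi, F_z = r, 0, z              # the field x i + y j + z k = r e_r + z e_z

div = sp.diff(r*F_r, r)/r + sp.diff(F_phi, phi)/r + sp.diff(F_z, z)   # Eq. (7.83)
print(sp.simplify(div))                # -> 3, matching the Cartesian calculation
```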
Problem 7.17. Show that the divergence of a vector field $\mathbf{F}$ in the spherical system is
$$\operatorname{div}\mathbf{F} = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2 F_r\right) + \frac{1}{r\sin\theta}\frac{\partial}{\partial\theta}\left(F_\theta\sin\theta\right) + \frac{1}{r\sin\theta}\frac{\partial F_\varphi}{\partial\varphi}\,. \quad (7.84)$$

Problem 7.18. Using the spherical system, calculate the divergence of the vector field $\mathbf{F} = r^{-2}\,\mathbf{e}_r + r\sin\theta\,\mathbf{e}_\theta$ at the point $P\left(r = 1, \theta = \frac{\pi}{2}, \varphi = \frac{\pi}{2}\right)$. [Answer: $2\cos\theta = 0$.]
7.9 Laplacian
The Laplacian of a scalar field, $\Delta\Psi = \operatorname{div}\left(\operatorname{grad}\Psi\right)$, can now be obtained by combining the general results for the gradient and the divergence. Indeed, consider the vector field
$$\mathbf{F} = F_1\mathbf{e}_1 + F_2\mathbf{e}_2 + F_3\mathbf{e}_3 = \operatorname{grad}\Psi = \sum_{i=1}^{3}\frac{1}{h_i}\frac{\partial\Psi}{\partial q_i}\,\mathbf{e}_i\,, \quad (7.85)$$
where we have used Eq. (7.70). Next, this vector field can be turned into a scalar field by applying the divergence operation, for which we derived the general expression (7.81). We can now combine the two expressions to evaluate $\Delta\Psi$ if we notice that the components of the vector $\mathbf{F}$ above are
$$F_i = \frac{1}{h_i}\frac{\partial\Psi}{\partial q_i}\,. \quad (7.86)$$
Therefore, substituting (7.86) into (7.81) gives the final expression for the Laplacian sought:
$$\Delta\Psi = \frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial q_1}\left(\frac{h_2 h_3}{h_1}\frac{\partial\Psi}{\partial q_1}\right) + \frac{\partial}{\partial q_2}\left(\frac{h_3 h_1}{h_2}\frac{\partial\Psi}{\partial q_2}\right) + \frac{\partial}{\partial q_3}\left(\frac{h_1 h_2}{h_3}\frac{\partial\Psi}{\partial q_3}\right)\right]. \quad (7.87)$$
In the Cartesian system all scale factors are equal to one, and we recover the familiar expression
$$\Delta\Psi = \frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + \frac{\partial^2\Psi}{\partial z^2}\,,$$
while for cylindrical coordinates $(r, \varphi, z)$ one obtains
$$\Delta\Psi = \frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial\Psi}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2\Psi}{\partial\varphi^2} + \frac{\partial^2\Psi}{\partial z^2}\,. \quad (7.88)$$
Problem 7.19. Show that the Laplacian in the spherical system $(r, \theta, \varphi)$ is
$$\Delta\Psi = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\Psi}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\Psi}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\Psi}{\partial\varphi^2}\,. \quad (7.89)$$

Problem 7.20. Calculate $\Delta\Psi$ in the spherical system for $\Psi = e^{-r^2}\cos\theta$. [Answer: $-2e^{-r^2}\left(3 - 2r^2 + r^{-2}\right)\cos\theta$.]

Problem 7.21. Show that in parabolic coordinates (see Problem 7.4) the Laplacian has the form:
$$\Delta\Psi = \frac{1}{u^2 + v^2}\left[\frac{1}{u}\frac{\partial}{\partial u}\left(u\frac{\partial\Psi}{\partial u}\right) + \frac{1}{v}\frac{\partial}{\partial v}\left(v\frac{\partial\Psi}{\partial v}\right)\right] + \frac{1}{(uv)^2}\frac{\partial^2\Psi}{\partial\varphi^2}\,.$$
Problem 7.22. Show that, for $r \neq 0$,
$$\Delta\left(\frac{e^{-r}}{r}\right) = \frac{e^{-r}}{r}\,.$$
Hence, show that the function $\psi(\mathbf{r}) = e^{-r}/r$ satisfies the differential equation
$$\left(\Delta - 1\right)\psi(\mathbf{r}) = -4\pi\,\delta(\mathbf{r})\,.$$
7.10 Curl of a Vector Field

To calculate the curl of a vector field $\mathbf{F}$ in curvilinear coordinates, we shall use the intrinsic definition of the curl given in Sect. I.6.6.2. It states that the curl of $\mathbf{F}$ at a point $P$ can be defined in the following way: choose an arbitrary smooth surface $S$ through the point $P$ and draw a contour $L$ around that point within the surface. Then the curl of $\mathbf{F}$ at the point $P$ is given by the limit in which the area $A$ enclosed by the contour tends to zero, keeping the point $P$ inside it:
$$\operatorname{curl}\mathbf{F}\cdot\mathbf{n} = \lim_{A\to 0}\frac{1}{A}\oint_L \mathbf{F}\cdot d\mathbf{l}\,, \quad (7.90)$$
where $\mathbf{n}$ is the normal to the surface at the point $P$, and in the line integral the contour is traversed such that the point $P$ is always on the left. The direction of the normal $\mathbf{n}$ is related to the direction of the traverse around the contour $L$ by the right-hand screw rule.
To calculate the curl of $\mathbf{F}$, we note that the limit should not depend on the shape of the region enclosed by the contour $L$ (or the shape of the latter) as long as the point $P$ remains inside it when taking the limit ($\operatorname{curl}\mathbf{F}$ is a well-defined vector field which depends only on $\mathbf{F}$ near the point $P$). Hence, to perform the calculation, it is convenient to choose the surface $S$ as the $q_i$ coordinate surface for the calculation of the $i$-th component of the curl. Indeed, with such a choice, and for an orthogonal curvilinear system, the normal $\mathbf{n}$ to the surface at the point $P$ coincides with the unit base vector $\mathbf{e}_i$. The case of $i = 1$ is illustrated in Fig. 7.7. Hence, with that particular choice of $S$, from Eq. (7.90),
$$\operatorname{curl}\mathbf{F}\cdot\mathbf{n} = \operatorname{curl}\mathbf{F}\cdot\mathbf{e}_i = \left(\operatorname{curl}\mathbf{F}\right)_i = \lim_{A\to 0}\frac{1}{A}\oint_L \mathbf{F}\cdot d\mathbf{l}\,.$$
Therefore, to calculate the $i$-th component of the curl, we can simply choose the $q_i$ coordinate surface passing through the point $P$. In addition, we can also choose the contour $L$ to go along the other two coordinate lines; see again Fig. 7.7, where in the case of $i = 1$ the contour is a distorted rectangle with its sides running along the $q_2$ and $q_3$ lines shifted from their values at the point $P$ by $\pm\frac{1}{2}\delta q_2$ and $\pm\frac{1}{2}\delta q_3$, respectively.
Let us calculate $\left(\operatorname{curl}\mathbf{F}\right)_1$ at $P$. The surface $S$ is taken as the $q_1$ coordinate surface, with the contour $L$ being $ABCD$ in Fig. 7.7, traversed anti-clockwise as indicated (note the direction of the normal $\mathbf{e}_1$ to $S$). The line integral along the contour contains four contributions which we have to calculate one by one. The line $AB$ corresponds to the $q_3$ coordinate line through the point $F\left(q_1, q_2 + \frac{1}{2}\delta q_2, q_3\right)$. To the leading order, $\mathbf{F}$ can be considered fixed at its value at the point $F$, resulting in the contribution
$$\int_{AB}\mathbf{F}\cdot d\mathbf{l} = \int_{AB}\mathbf{F}\cdot\mathbf{e}_3\,dl = \int_{AB} F_3\,dl = \left(F_3 h_3\right)_F\delta q_3\,,$$
where $h_3\,\delta q_3$ is the length of the line $AB$. The expression in the round brackets is to be calculated at the point $F$, as indicated above. Similarly, the contribution from the opposite piece $CD$ is
$$\int_{CD}\mathbf{F}\cdot d\mathbf{l} = -\left(F_3 h_3\right)_H\delta q_3\,,$$
where the minus sign comes from the fact that in this case the direction is opposite to that of $\mathbf{e}_3$, i.e. $d\mathbf{l} = -\mathbf{e}_3\,dl$, and the index $H$ means that the expression in the round brackets is to be calculated at the point $H\left(q_1, q_2 - \frac{1}{2}\delta q_2, q_3\right)$. The latter differs from $F$ only in its second coordinate. Therefore, the sum of these two contributions is
$$\int_{AB+CD}\mathbf{F}\cdot d\mathbf{l} = \left(F_3 h_3\right)_F\delta q_3 - \left(F_3 h_3\right)_H\delta q_3 = \left[\left(F_3 h_3\right)_F - \left(F_3 h_3\right)_H\right]\delta q_3\,.$$
The terms in the square brackets can be expanded in a Taylor series, keeping the first two terms:
$$\left(F_3 h_3\right)_{q_2+\delta q_2/2} = \left(F_3 h_3\right)_{q_2} + \left(\frac{\partial}{\partial q_2}\left(F_3 h_3\right)\right)_{q_2}\frac{\delta q_2}{2} = \left(F_3 h_3\right)_P + \left(\frac{\partial}{\partial q_2}\left(F_3 h_3\right)\right)_P\frac{\delta q_2}{2}\,,$$
$$\left(F_3 h_3\right)_{q_2-\delta q_2/2} = \left(F_3 h_3\right)_{q_2} - \left(\frac{\partial}{\partial q_2}\left(F_3 h_3\right)\right)_{q_2}\frac{\delta q_2}{2} = \left(F_3 h_3\right)_P - \left(\frac{\partial}{\partial q_2}\left(F_3 h_3\right)\right)_P\frac{\delta q_2}{2}\,,$$
so that
$$\int_{AB+CD}\mathbf{F}\cdot d\mathbf{l} = \left[\left(F_3 h_3\right)_{q_2+\delta q_2/2} - \left(F_3 h_3\right)_{q_2-\delta q_2/2}\right]\delta q_3 = \left(\frac{\partial}{\partial q_2}\left(F_3 h_3\right)\right)_P\delta q_2\,\delta q_3\,.$$
In a similar manner, we find that the line integrals along $DA$ and $BC$ sum to
$$\int_{DA+BC}\mathbf{F}\cdot d\mathbf{l} = \left[\left(F_2 h_2\right)_{q_3-\delta q_3/2} - \left(F_2 h_2\right)_{q_3+\delta q_3/2}\right]\delta q_2 = -\left(\frac{\partial}{\partial q_3}\left(F_2 h_2\right)\right)_P\delta q_2\,\delta q_3\,.$$
Summing up all four contributions, dividing by the area $A = h_2 h_3\,\delta q_2\,\delta q_3$ and taking the limit, we obtain
$$\left(\operatorname{curl}\mathbf{F}\right)_1 = \frac{1}{h_2 h_3}\left[\frac{\partial}{\partial q_2}\left(h_3 F_3\right) - \frac{\partial}{\partial q_3}\left(h_2 F_2\right)\right].$$
This expression is finite; any other terms, corresponding to higher orders in $\delta q_i$, vanish in the $A \to 0$ (or $\delta q_i \to 0$) limit.
The other two components of the curl, namely $\left(\operatorname{curl}\mathbf{F}\right)_2$ and $\left(\operatorname{curl}\mathbf{F}\right)_3$, are obtained in a similar fashion (perform this calculation as an exercise!), giving in the end:
$$\operatorname{curl}\mathbf{F} = \frac{\mathbf{e}_1}{h_2 h_3}\left[\frac{\partial}{\partial q_2}\left(h_3 F_3\right) - \frac{\partial}{\partial q_3}\left(h_2 F_2\right)\right] + \frac{\mathbf{e}_2}{h_3 h_1}\left[\frac{\partial}{\partial q_3}\left(h_1 F_1\right) - \frac{\partial}{\partial q_1}\left(h_3 F_3\right)\right] + \frac{\mathbf{e}_3}{h_1 h_2}\left[\frac{\partial}{\partial q_1}\left(h_2 F_2\right) - \frac{\partial}{\partial q_2}\left(h_1 F_1\right)\right]. \quad (7.93)$$
It is immediately seen that in the Cartesian coordinates we recover our old Eq. (I.6.73):
$$\operatorname{curl}\mathbf{F}(\mathbf{r}) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ F_x & F_y & F_z \end{vmatrix}. \quad (7.95)$$
Problem 7.23. Show that the curl in the cylindrical system $(r, \varphi, z)$ is
$$\operatorname{curl}\mathbf{F} = \mathbf{e}_r\left(\frac{1}{r}\frac{\partial F_z}{\partial\varphi} - \frac{\partial F_\varphi}{\partial z}\right) + \mathbf{e}_\varphi\left(\frac{\partial F_r}{\partial z} - \frac{\partial F_z}{\partial r}\right) + \frac{\mathbf{e}_z}{r}\left[\frac{\partial}{\partial r}\left(r F_\varphi\right) - \frac{\partial F_r}{\partial\varphi}\right]. \quad (7.96)$$

Problem 7.24. Show that the curl in the spherical system $(r, \theta, \varphi)$ is
$$\operatorname{curl}\mathbf{F} = \frac{\mathbf{e}_r}{r\sin\theta}\left[\frac{\partial}{\partial\theta}\left(F_\varphi\sin\theta\right) - \frac{\partial F_\theta}{\partial\varphi}\right] + \frac{\mathbf{e}_\theta}{r\sin\theta}\left[\frac{\partial F_r}{\partial\varphi} - \sin\theta\,\frac{\partial}{\partial r}\left(r F_\varphi\right)\right] + \frac{\mathbf{e}_\varphi}{r}\left[\frac{\partial}{\partial r}\left(r F_\theta\right) - \frac{\partial F_r}{\partial\theta}\right]. \quad (7.97)$$

Problem 7.25. Calculate in the spherical coordinates the curl of the vector field $\mathbf{F} = r^{-2}\,\mathbf{e}_r + r^2\sin\theta\,\mathbf{e}_\theta$. Show that only the second term in the vector field contributes. Hence, calculate the curl at the point $P\left(r = 1, \theta = \frac{\pi}{2}, \varphi = \frac{\pi}{2}\right)$. [Answer: $3r\sin\theta\,\mathbf{e}_\varphi = 3\mathbf{e}_\varphi$.]
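The spherical curl (7.97) can be evaluated mechanically; this sympy sketch (my own) checks the field of Problem 7.25 component by component:

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
F_r, F_th, F_ph = r**-2, r**2*sp.sin(th), 0      # the field of Problem 7.25

curl_r  = (sp.diff(F_ph*sp.sin(th), th) - sp.diff(F_th, ph)) / (r*sp.sin(th))
curl_th = (sp.diff(F_r, ph)/sp.sin(th) - sp.diff(r*F_ph, r)) / r
curl_ph = (sp.diff(r*F_th, r) - sp.diff(F_r, th)) / r        # Eq. (7.97)

print(sp.simplify(curl_r), sp.simplify(curl_th), sp.simplify(curl_ph))
# -> 0 0 3*r*sin(theta): only the e_theta term of F contributes, as stated
```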
7.11 Some Applications in Physics

Many problems in physics, such as heat and mass transport, wave propagation, etc., are described by PDEs. These are equations which are generally very difficult to solve. However, if a physical system has a symmetry, then an appropriate curvilinear coordinate system may help in solving these PDEs. Here we consider some of the well-known PDEs of mathematical physics in cylindrical and spherical coordinate systems and illustrate with simple examples how their solution can be obtained in some cases.
We shall start from the wave equation (Sect. I.6.7.1):
$$\Delta\Psi = \frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2}\,,$$
where $\Psi(\mathbf{r}, t)$ is the wave field of interest and $c$ is a constant corresponding to the speed of wave propagation. If the system possesses a cylindrical symmetry, then the Laplacian has the form (7.88) and the PDE can be rewritten as⁴
$$\frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial\Psi}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2\Psi}{\partial\varphi^2} + \frac{\partial^2\Psi}{\partial z^2} = \frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2}\,, \quad (7.98)$$
where $\Psi = \Psi(r, \varphi, z, t)$. Similarly, using the spherical Laplacian (7.89), one obtains the wave equation for $\Psi = \Psi(r, \theta, \varphi, t)$.
The diffusion or heat transport equations can be considered along exactly the same lines; the only difference is that on the right-hand side, instead of the second time derivative, we have the first.
As an example, consider a simple electrostatic problem of a potential $\varphi$ due to a radially symmetric charge distribution $\rho(r)$. We shall assume that $\rho(r) \neq 0$ only for $0 \le r \le R$, i.e. we consider a spherical charged ball of radius $R$ with a charge density which may change with the distance from the centre. In this case the potential $\varphi$, satisfying the Poisson equation
$$\Delta\varphi = -4\pi\rho\,,$$
will depend only on $r$, and hence we can drop the angular terms in the Laplacian, which leads us to a much simpler equation:
$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d\varphi}{dr}\right) = -4\pi\rho(r) \;\Longrightarrow\; \frac{d}{dr}\left(r^2\frac{d\varphi}{dr}\right) = -4\pi r^2\rho(r)\,.$$
This is an ordinary DE which can easily be solved. Integrating both sides between 0 and $r$, and assuming that the derivative of the potential at the centre is finite, we get
$$r^2\frac{d\varphi}{dr} = -4\pi\int_0^r r_1^2\,\rho(r_1)\,dr_1 \;\Longrightarrow\; \frac{d\varphi}{dr} = -\frac{Q(r)}{r^2}\,, \quad (7.100)$$
⁴ We keep the same notation for $\Psi$ as in Cartesian coordinates solely for convenience; however, when written via cylindrical coordinates, it becomes a different function of its arguments.
where
$$Q(r) = 4\pi\int_0^r r_1^2\,\rho(r_1)\,dr_1 = \int_{\text{Sphere}}\rho\,dV \quad (7.101)$$
is the total charge contained in the sphere of radius $r$. This is because $dV = 4\pi r_1^2\,dr_1$ corresponds exactly to the volume of a spherical shell of radius $r_1$ and width $dr_1$ (indeed, integrating over the angles $0 \le \theta \le \pi$ and $0 \le \phi \le 2\pi$ of the volume element $dV = r_1^2\sin\theta\,dr_1\,d\theta\,d\phi$ in the spherical system, one obtains this expression for $dV$).
Consider first the potential outside the sphere, $r \ge R$. In this case $Q(r) = Q(R) = Q_0$ is the total charge of the sphere. Then, integrating Eq. (7.100) between $r$ and $\infty$ and setting the potential $\varphi$ at infinity to zero, we obtain the simple result $\varphi(r) = Q_0/r$. Remarkably, this point-charge formula is valid for any distribution of charge inside the sphere, provided it remains spherically symmetric.
Consider now the potential inside the sphere, i.e. for $0 \le r \le R$. In this case $Q(r)$ corresponds to the charge inside the sphere of radius $r$. Integrating Eq. (7.100) between $r$ and $R$, we find
$$\varphi(r) = \varphi(R) + \int_r^R \frac{Q(r_1)\,dr_1}{r_1^2} = \frac{Q_0}{R} + \int_r^R \frac{Q(r_1)\,dr_1}{r_1^2}\,. \quad (7.102)$$
Here we set $\varphi(R) = Q_0/R$ to ensure continuity of the solution across the boundary of the sphere.
It is immediately seen from this formula that the potential inside a spherical layer of finite width is constant. Indeed, if $\rho \neq 0$ only for $R_1 \le r \le R_2$, then $Q(r) = 0$ for $0 \le r \le R_1$, and hence for these distances from the centre
$$\varphi(r) = \frac{Q_0}{R_2} + \int_{R_1}^{R_2}\frac{Q(r_1)\,dr_1}{r_1^2}\,,$$
i.e. the potential does not depend on $r$: it is constant inside the layer. Correspondingly, the electric field,
$$\mathbf{E} = -\nabla\varphi = -\frac{d\varphi}{dr}\,\mathbf{e}_r\,,$$
is zero. Above, Eq. (7.73) was used for the gradient in the spherical coordinates.
Problem 7.26. Show that the above result can also be written in the form:
$$\varphi(r) = \frac{Q(r)}{r} + 4\pi\int_r^R r_1\,\rho(r_1)\,dr_1\,.$$
[Hint: use the explicit definition of $Q(r)$ as an integral in Eq. (7.102) and then change the order of integration.]
Consider now the classical motion of a particle of mass $m$ in a force field $\mathbf{F}$. If, because of the symmetry of the problem, the force is most naturally expressed in some curvilinear coordinate system, and Newton's equation of motion,
$$m\frac{d^2\mathbf{r}}{dt^2} = \mathbf{F}\,, \quad (7.103)$$
is rewritten in the same curvilinear system, this may result in a drastic simplification, and the equation may even possibly be solved.
What we need to do is to transform both sides of equation (7.103) into a general curvilinear coordinate system. Since the force is assumed to be already written in this system,
$$\mathbf{F} = \sum_{i=1}^{3} F_i\,\mathbf{e}_i\,, \quad (7.104)$$
only the acceleration needs to be worked out. The velocity of the particle follows from Eq. (7.23):
$$\mathbf{v} = \frac{d\mathbf{r}}{dt} = \sum_{i=1}^{3}\frac{\partial\mathbf{r}}{\partial q_i}\,\dot q_i = \sum_{i=1}^{3} h_i\,\dot q_i\,\mathbf{e}_i\,, \quad (7.106)$$
where the dot above $q_i$ means its time derivative. The acceleration entering Newton's equation (7.103) is the time derivative of the velocity:
$$\mathbf{a} = \frac{d\mathbf{v}}{dt} = \sum_i\left[\mathbf{e}_i\,\frac{d}{dt}\left(h_i\dot q_i\right) + h_i\dot q_i\,\frac{d\mathbf{e}_i}{dt}\right]. \quad (7.107)$$
The second term within the brackets is generally not zero, as the unit base vectors may change their direction as the particle moves. For instance, if the particle moves along a circular trajectory around the $z$ axis and we use the spherical system, the unit base vectors $\mathbf{e}_r$, $\mathbf{e}_\theta$ and $\mathbf{e}_\varphi$ will all change in time.
To calculate the time derivative of the unit base vectors, we recall the general relationship (7.9) between the unit base vectors of the curvilinear system in question and the Cartesian vectors $\mathbf{e}^0_i$. The idea here is that the Cartesian vectors do not change in time with the motion of the particle; they always keep their direction. Therefore,
$$\frac{d\mathbf{e}_i}{dt} = \frac{d}{dt}\left(\sum_{j=1}^{3} m_{ij}\,\mathbf{e}^0_j\right) = \sum_{j=1}^{3}\frac{dm_{ij}}{dt}\,\mathbf{e}^0_j\,.$$
Here the elements of the matrix $M = (m_{ij})$ are some functions of the curvilinear coordinates $(q_1(t), q_2(t), q_3(t))$, and hence the derivatives of $m_{ij}$ are expressed via time derivatives of the coordinates. Expressing the Cartesian vectors back via the unit base vectors $\mathbf{e}_i$ using the inverse matrix $M^{-1}$, we arrive at:
$$\frac{d\mathbf{e}_i}{dt} = \sum_{j=1}^{3}\frac{dm_{ij}}{dt}\,\mathbf{e}^0_j = \sum_{j=1}^{3}\frac{dm_{ij}}{dt}\sum_{k=1}^{3}\left(M^{-1}\right)_{jk}\mathbf{e}_k = \sum_{k=1}^{3} d_{ik}\,\mathbf{e}_k\,, \quad (7.108)$$
where
$$d_{ik} = \sum_{j=1}^{3}\frac{dm_{ij}}{dt}\left(M^{-1}\right)_{jk} \;\Longrightarrow\; D = \frac{dM}{dt}\,M^{-1}\,. \quad (7.109)$$
Substituting Eq. (7.108) into (7.107), we obtain an equation for the acceleration $\mathbf{a}$ in the given curvilinear system, expressed via the coordinates and their time derivatives.
Finally, note that if the force is initially given in the Cartesian coordinates,
$$\mathbf{F}(x, y, z) = F_x(x, y, z)\,\mathbf{i} + F_y(x, y, z)\,\mathbf{j} + F_z(x, y, z)\,\mathbf{k} = F_x\,\mathbf{e}^0_1 + F_y\,\mathbf{e}^0_2 + F_z\,\mathbf{e}^0_3\,,$$
it can always be transformed into the preferred curvilinear system using the transformation relations (7.1) and the relationship (7.10) between the Cartesian $\mathbf{e}^0_i$ and the curvilinear $\mathbf{e}_i$ unit base vectors. As a result, the force takes on the form (7.104).
As an example, let us work out explicit expressions for the velocity and acceleration in the cylindrical system $(r, \varphi, z)$. We start by writing the velocity (recall that $h_r = 1$, $h_\varphi = r$ and $h_z = 1$). Using Eq. (7.106), we immediately obtain
$$\mathbf{v} = \dot r\,\mathbf{e}_r + r\dot\varphi\,\mathbf{e}_\varphi + \dot z\,\mathbf{e}_z\,. \quad (7.110)$$
To calculate the acceleration, we need to work out the elements of the matrix $D$, Eq. (7.109). Since for the cylindrical system the matrices $M$ and $M^{-1}$ are given by Eqs. (7.16) and (7.17), respectively, we obtain
$$D = \frac{dM}{dt}\,M^{-1} = \left[\frac{d}{dt}\begin{pmatrix} \cos\varphi & \sin\varphi & 0 \\ -\sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix}\right]\begin{pmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$= \begin{pmatrix} -\dot\varphi\sin\varphi & \dot\varphi\cos\varphi & 0 \\ -\dot\varphi\cos\varphi & -\dot\varphi\sin\varphi & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix} = \dot\varphi\begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
so that
$$\frac{d\mathbf{e}_r}{dt} = d_{12}\,\mathbf{e}_2 = \dot\varphi\,\mathbf{e}_\varphi\,,\quad \frac{d\mathbf{e}_\varphi}{dt} = d_{21}\,\mathbf{e}_1 = -\dot\varphi\,\mathbf{e}_r \quad\text{and}\quad \frac{d\mathbf{e}_z}{dt} = 0\,. \quad (7.111)$$
Therefore, the acceleration (7.107) becomes
$$\mathbf{a} = \mathbf{e}_r\frac{d}{dt}\left(h_r\dot r\right) + h_r\dot r\,\frac{d\mathbf{e}_r}{dt} + \mathbf{e}_\varphi\frac{d}{dt}\left(h_\varphi\dot\varphi\right) + h_\varphi\dot\varphi\,\frac{d\mathbf{e}_\varphi}{dt} + \mathbf{e}_z\frac{d}{dt}\left(h_z\dot z\right) + h_z\dot z\,\frac{d\mathbf{e}_z}{dt}$$
$$= \left(\ddot r\,\mathbf{e}_r + \dot r\,\dot{\mathbf{e}}_r\right) + \frac{d}{dt}\left(r\dot\varphi\right)\mathbf{e}_\varphi + r\dot\varphi\,\dot{\mathbf{e}}_\varphi + \ddot z\,\mathbf{e}_z\,.$$
Using expressions (7.111) for the derivatives of the unit base vectors obtained above, we can write the final expression:
$$\mathbf{a} = \left(\ddot r - r\dot\varphi^2\right)\mathbf{e}_r + \left(2\dot r\dot\varphi + r\ddot\varphi\right)\mathbf{e}_\varphi + \ddot z\,\mathbf{e}_z\,. \quad (7.112)$$
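Eq. (7.112) can also be verified directly: differentiate the Cartesian position of the particle twice and project the result onto the cylindrical unit base vectors. The following sympy sketch (my own) does exactly that:

```python
import sympy as sp

t = sp.symbols('t')
r, phi, z = (sp.Function(f)(t) for f in ('r', 'phi', 'z'))

R = sp.Matrix([r*sp.cos(phi), r*sp.sin(phi), z])
a = R.diff(t, 2)                                   # Cartesian acceleration

e_r   = sp.Matrix([sp.cos(phi),  sp.sin(phi), 0])
e_phi = sp.Matrix([-sp.sin(phi), sp.cos(phi), 0])
e_z   = sp.Matrix([0, 0, 1])

print(sp.simplify(a.dot(e_r)))     # -> r'' - r*phi'^2
print(sp.simplify(a.dot(e_phi)))   # -> r*phi'' + 2*r'*phi'
print(sp.simplify(a.dot(e_z)))     # -> z''
```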
Problem 7.28. Show that for the spherical system the derivatives of its unit base vectors are
$$\dot{\mathbf{e}}_r = \dot\theta\,\mathbf{e}_\theta + \dot\varphi\sin\theta\,\mathbf{e}_\varphi\,,\quad \dot{\mathbf{e}}_\theta = -\dot\theta\,\mathbf{e}_r + \dot\varphi\cos\theta\,\mathbf{e}_\varphi\,,\quad \dot{\mathbf{e}}_\varphi = -\dot\varphi\left(\sin\theta\,\mathbf{e}_r + \cos\theta\,\mathbf{e}_\theta\right). \quad (7.113)$$
Correspondingly, the velocity and acceleration in this system are
$$\mathbf{v} = \dot r\,\mathbf{e}_r + r\dot\theta\,\mathbf{e}_\theta + r\dot\varphi\sin\theta\,\mathbf{e}_\varphi\,, \quad (7.114)$$
$$\mathbf{a} = \left(\ddot r - r\dot\theta^2 - r\dot\varphi^2\sin^2\theta\right)\mathbf{e}_r + \left(2\dot r\dot\theta + r\ddot\theta - r\dot\varphi^2\sin\theta\cos\theta\right)\mathbf{e}_\theta + \left(2\dot r\dot\varphi\sin\theta + 2r\dot\theta\dot\varphi\cos\theta + r\ddot\varphi\sin\theta\right)\mathbf{e}_\varphi\,. \quad (7.115)$$
Problem 7.29. Prove the following relations for the parabolic cylinder system, Problem 7.3:
$$\dot{\mathbf{e}}_u = \frac{u\dot v - v\dot u}{u^2 + v^2}\,\mathbf{e}_v\,,\quad \dot{\mathbf{e}}_v = -\frac{u\dot v - v\dot u}{u^2 + v^2}\,\mathbf{e}_u\,,\quad \dot{\mathbf{e}}_z = 0\,.$$
Then show that the velocity and acceleration of a particle in these coordinates are
$$\mathbf{v} = \sqrt{u^2 + v^2}\left(\dot u\,\mathbf{e}_u + \dot v\,\mathbf{e}_v\right) + \dot z\,\mathbf{e}_z\,,$$
$$\mathbf{a} = \frac{1}{\sqrt{u^2 + v^2}}\left\{\left[u\left(\dot u^2 - \dot v^2\right) + \ddot u\left(u^2 + v^2\right) + 2v\dot u\dot v\right]\mathbf{e}_u + \left[v\left(\dot v^2 - \dot u^2\right) + \ddot v\left(u^2 + v^2\right) + 2u\dot u\dot v\right]\mathbf{e}_v\right\} + \ddot z\,\mathbf{e}_z\,.$$
Problem 7.30. If a particle moves within the $x$–$y$ plane (i.e. $z = 0$), the spherical coordinate system becomes identical to the polar one. Show then that the equations for the velocity (7.114) and acceleration (7.115) obtained in the spherical system coincide with those for the polar system, Eqs. (7.110) and (7.112).

Problem 7.31. Consider a particle of mass $m$ moving under a central force $\mathbf{F} = F_r\,\mathbf{e}_r$, where $F_r = F(r)$ depends only on the distance $r$ from the centre of the coordinate system.
(i) Show that the equations of motion, $\mathbf{F} = m\mathbf{a}$, in this case, when projected onto the unit base vectors of the spherical system, have the form:
$$a_r = \ddot r - r\dot\theta^2 - r\dot\varphi^2\sin^2\theta = \frac{F_r}{m}\,,\quad a_\theta = a_\varphi = 0\,, \quad (7.116)$$
where $a_r$, $a_\theta$ and $a_\varphi$ are the components of the acceleration given in Eq. (7.115).
(ii) The angular momentum of the particle is defined via $\mathbf{L} = m\left[\mathbf{r}\times\mathbf{v}\right]$. Show that
$$\mathbf{L} = mr^2\left(\dot\theta\,\mathbf{e}_\varphi - \dot\varphi\sin\theta\,\mathbf{e}_\theta\right). \quad (7.117)$$
(iii), (iv) Choosing the polar axis so that initially $\theta = \pi/2$ and $\dot\theta = 0$, show that $\theta = \pi/2$ at all times, and that the magnitude $L = mr^2\dot\varphi$ of the angular momentum is conserved, so that
$$\dot\varphi = \frac{L}{mr^2}\,. \quad (7.118)$$
Therefore, the particle moves within the $x$–$y$ plane, performing a two-dimensional motion. Note that $r = r(t)$ in Eq. (7.118), and that $\dot\varphi(t)$ does not change its sign along the whole trajectory. For instance, if $L > 0$, then $\dot\varphi > 0$ during the whole motion, i.e. the particle rotates around the centre with its angle advancing all the time.
(v) Show that the total energy of the particle is
$$E = \frac{m}{2}\left(\dot r^2 + r^2\dot\varphi^2\right) + U(r) = \frac{mv_r^2}{2} + U_{\text{eff}}(r)\,, \quad (7.119)$$
where $v_r = \dot r$ is the radial component of the velocity, and
$$U_{\text{eff}}(r) = U(r) + \frac{L^2}{2mr^2}$$
is the so-called effective potential energy of the particle, while $U(r)$ is the potential energy of the field itself, $F(r) = -dU/dr$.
(vi) Therefore, establish the following equation for the distance $r(t)$ to the centre:
$$\ddot r = \frac{L^2}{m^2 r^3} + \frac{F(r)}{m} \;\Longrightarrow\; m\ddot r = -\frac{dU_{\text{eff}}}{dr}\,. \quad (7.120)$$
This radial equation corresponds to a one-dimensional motion in the effective potential $U_{\text{eff}}(r)$, which is also justified by the energy expression (7.119).
(vii) Then prove that the energy is conserved, $\dot E = 0$.
(viii) Choosing $r$ as a new argument in Eq. (7.120) (instead of time) and integrating (cf. Sect. I.8.3), show that the equation for the radial velocity $v_r$ can be written as:
$$v_r = \sqrt{\frac{2}{m}\left[E - U_{\text{eff}}(r)\right]}\,. \quad (7.121)$$
Note that this formula is equivalent to the energy expression (7.119). The above result gives an equation for the velocity $v_r$ as a function of $r$. Integrating it, obtain an equation for $r(t)$:
$$\int_{r_0}^{r}\frac{dr}{\sqrt{\dfrac{2}{m}\left[E - U_{\text{eff}}(r)\right]}} = t\,, \quad (7.122)$$
Problem 7.32. A boat crosses a river of width $a$ with a constant speed $v = |\mathbf{v}|$. Assume that the water in the river flows with a constant velocity $V$ at any point across its width. The boat starts at a point $A$ at one bank of the river, and along the whole journey its velocity $\mathbf{v}$ is kept directed towards the point $B$, which is exactly opposite the starting point on the other bank. Using the polar system $(r, \varphi)$ with the centre of the coordinate system placed at the point $B$, the $x$ axis pointing towards the point $A$ and the $y$ axis chosen along the river flow, show that the trajectory $r(\varphi)$ of the boat can be written as:
$$r = \frac{a}{\cos\varphi}\left(\frac{1 + \sin\varphi}{1 - \sin\varphi}\right)^{-v/2V}.$$
In classical statistical mechanics, the total kinetic energy of a system of $n$ identical atoms of mass $m$ is
$$E_{\text{KE}} = \sum_{p=1}^{n}\sum_{\alpha}\frac{mv_{p\alpha}^2}{2} = \sum_{i=1}^{3n}\frac{mv_i^2}{2}\,,$$
where $v_{p\alpha}$ is the Cartesian component $\alpha$ of the velocity of atom $p$, and in the second form all velocity components are enumerated by a single index $i$. The probability to find the system with given velocities of all its atoms is
$$dP_v = \frac{1}{Z}\,e^{-\beta E_{\text{KE}}}\,dv_1\,dv_2\cdots dv_N\,,$$
where $\beta = 1/k_B T$ is the inverse temperature and $N = 3n$ is the total number of degrees of freedom in the whole system; $Z$ is the normalisation constant.
Now, let us find the probability $P_E\,dE$ for the whole system to have its total kinetic energy within the interval between $E$ and $E + dE$. This can be obtained by calculating the integral
$$P_E = \frac{1}{Z}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\delta\!\left(E - \sum_{i=1}^{3n}\frac{mv_i^2}{2}\right)\exp\left(-\beta\sum_{i=1}^{3n}\frac{mv_i^2}{2}\right)dv_1\,dv_2\cdots dv_N\,.$$
Indeed, because of the delta function, only those combinations of the velocities are accounted for which correspond to the total kinetic energy being exactly equal to $E$.
Then, using the results of Sect. 7.6 and the properties of the delta function, show that
$$P_E = \frac{\beta^{N/2}}{\Gamma\!\left(\frac{N}{2}\right)}\,E^{N/2-1}\,e^{-\beta E}\,. \quad (7.124)$$
8 Partial Differential Equations of Mathematical Physics

8.1 General Consideration

If an unknown function of several variables and its partial derivatives are combined in an equation, the latter is called a partial differential equation (PDE). Several times above we have come across such PDEs when dealing with functions of several variables. For instance, in Sect. I.6.5.1 we derived the continuity equation for the particle density, which contains its partial derivatives with respect to time $t$ and the spatial coordinates $x$, $y$ and $z$.¹ In Sect. I.6.7.1 the so-called wave equation
$$\frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2} = \frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + \frac{\partial^2\Psi}{\partial z^2} \quad\text{or}\quad \frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2} = \Delta\Psi \quad (8.1)$$
was derived for the components $\Psi(x, y, z, t)$ of the electric and magnetic fields, which they satisfy in free space, while the heat transport equation
$$\frac{1}{D}\frac{\partial\Psi}{\partial t} = \frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + \frac{\partial^2\Psi}{\partial z^2} \quad\text{or}\quad \frac{1}{D}\frac{\partial\Psi}{\partial t} = \Delta\Psi \quad (8.2)$$
describes, e.g., the evolution of the temperature or particle density $\Psi(x, y, z, t)$, with $D$ the corresponding diffusion coefficient.

¹ In the following, references to the first volume of this course (L. Kantorovich, Mathematics for natural scientists: fundamentals and basics, Springer, 2015) will be made by appending the Roman numeral I in front of the reference, e.g. Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of the first volume, respectively.
In the stationary case, when, e.g., the distribution of temperature (or density) across the system has stopped changing with time, the time derivative $\partial\Psi/\partial t = 0$ and one arrives at the Laplace equation
$$\Delta\Psi = 0\,. \quad (8.3)$$
For instance, the 1D variant of this equation, $\partial^2\Psi/\partial x^2 = 0$, describes at long times the distribution of temperature in a rod when both its ends are kept at two fixed temperatures. The Laplace equation is also encountered in other physical problems. For instance, it is satisfied by an electrostatic potential in regions of space where there are no charges.
Of course, the wave, diffusion and Laplace equations do not exhaust all possible types of PDEs which are encountered in solving physical problems; however, in this chapter we shall limit ourselves to discussing only these.
A PDE is characterised by its order (the highest order of the partial derivatives it contains) and by whether it is linear or not (i.e. whether the unknown function appears only to the first degree anywhere in the equation, either on its own or when differentiated). If an additional function of the variables appears as a separate term in the equation, the PDE is called inhomogeneous; otherwise, it is homogeneous. It follows then that both the diffusion and wave equations are linear, homogeneous and of the second order. At the same time, the PDE
$$\frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} = a\,e^{\Psi}$$
is non-linear, although also of the second order and homogeneous, while the PDE
$$F(x, y, z, t) + \frac{1}{D}\frac{\partial\Psi}{\partial t} = \frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + \frac{\partial^2\Psi}{\partial z^2}\,,$$
which contains an additional function $F(x, y, z, t)$, is inhomogeneous and linear, and still of the second order.
Let us have a closer look at a general second order linear PDE with constant coefficients. Let the unknown function of $n$ variables $X = (x_i) = (x_1, x_2, \ldots, x_n)$ be $\Psi(x_1, x_2, \ldots, x_n) = \Psi(X)$. Then the general form of such an equation reads
$$\sum_{i,j=1}^{n} a_{ij}\,\Psi_{x_i x_j} + \sum_{i=1}^{n} b_i\,\Psi_{x_i} + c\,\Psi + f(X) = 0\,, \quad (8.4)$$
where we have used simplified notations for the partial derivatives: $\Psi_{x_i x_j} = \partial^2\Psi/\partial x_i\partial x_j$ and $\Psi_{x_i} = \partial\Psi/\partial x_i$. The first term in the above equation contains all second order derivatives, while the second term all first order derivatives. Because the mixed second order derivatives are symmetric, $\Psi_{x_i x_j} = \Psi_{x_j x_i}$, the square $n\times n$ matrix $A = (a_{ij})$ of the coefficients² of the second derivatives can always be chosen symmetric, $a_{ij} = a_{ji}$. Indeed, a pair of terms $a_{ij}\Psi_{x_i x_j} + a_{ji}\Psi_{x_j x_i}$ with $a_{ij} \neq a_{ji}$ can always be written as $\tilde a_{ij}\Psi_{x_i x_j} + \tilde a_{ji}\Psi_{x_j x_i}$ with $\tilde a_{ij} = \frac{1}{2}\left(a_{ij} + a_{ji}\right) = \tilde a_{ji}$. Note also that one of the two coefficients, $a_{ij}$ or $a_{ji}$, could be zero; still, we can always split the remaining one equally into two, i.e. $\tilde a_{ij} = \frac{1}{2}a_{ij} = \tilde a_{ji}$, to have the matrix $A$ of the coefficients of $\Psi_{x_i x_j}$ in Eq. (8.4) symmetric. The coefficients $b_i$ of the first order derivatives form an $n$-dimensional vector $B = (b_i)$. We assume that the matrix $A = (a_{ij})$, the vector $B$ and the scalar $c$ are constants, i.e. they do not depend on the variables $X$. The PDE may also contain a general function $f(X)$. As was mentioned above, if this function is not present, the PDE is homogeneous; otherwise, it is inhomogeneous.
We shall now show that the PDE (8.4) can always be transformed into a canonical form which does not have mixed second derivatives. The transformed PDE can then be characterised into several types. This is important, as the method of solution depends on the type of the PDE, as will be clarified in later sections.
The transformation of the PDE (8.4) into the canonical form is made using a change of variables $X \to Y = (y_i)$ by means of a linear transformation:
$$y_i = \sum_{j=1}^{n} u_{ij}\,x_j \quad\text{or}\quad Y = UX\,,$$
where $U = (u_{ij})$ is a yet unknown square $n\times n$ transformation matrix. To determine $U$, let us rewrite our PDE via the new coordinates. We have
$$\Psi_{x_i} = \frac{\partial\Psi}{\partial x_i} = \sum_{k=1}^{n}\frac{\partial\Psi}{\partial y_k}\frac{\partial y_k}{\partial x_i} = \sum_{k=1}^{n} u_{ki}\frac{\partial\Psi}{\partial y_k} = \sum_{k=1}^{n} u_{ki}\,\Psi_{y_k}\,,$$
$$\Psi_{x_j x_i} = \frac{\partial}{\partial x_j}\Psi_{x_i} = \frac{\partial}{\partial x_j}\left(\sum_{k=1}^{n} u_{ki}\frac{\partial\Psi}{\partial y_k}\right) = \sum_{l=1}^{n}\frac{\partial}{\partial y_l}\left(\sum_{k=1}^{n} u_{ki}\frac{\partial\Psi}{\partial y_k}\right)\frac{\partial y_l}{\partial x_j} = \sum_{l,k=1}^{n} u_{lj}\,u_{ki}\,\frac{\partial^2\Psi}{\partial y_l\partial y_k} = \sum_{l,k=1}^{n} u_{ki}\,\Psi_{y_k y_l}\,u_{lj}\,.$$
² Following our notations of Chap. 1, we shall use non-bold capital letters to designate vectors and matrices, and the corresponding small letters to designate their components.
Substituting these expressions into Eq. (8.4), we obtain
$$\sum_{l,k=1}^{n} a'_{kl}\,\Psi_{y_k y_l} + \sum_{k=1}^{n} b'_k\,\Psi_{y_k} + c\,\Psi + f_1(Y) = 0\,, \quad (8.5)$$
with a new matrix
$$A' = \left(a'_{kl}\right), \quad\text{where}\quad a'_{kl} = \sum_{i,j=1}^{n} u_{ki}\,a_{ij}\,u_{lj} \quad\text{or}\quad A' = UAU^T\,, \quad (8.6)$$
a new vector
$$B' = \left(b'_k\right), \quad\text{where}\quad b'_k = \sum_{i=1}^{n} u_{ki}\,b_i \quad\text{or}\quad B' = UB\,, \quad (8.7)$$
and a new function $f_1(Y) = f\!\left(U^{-1}Y\right)$.
To eliminate the mixed second order derivatives of $\Psi$, one has to choose the transformation matrix $U$ in such a way that the matrix $A'$ is diagonal. Since the matrix $A$ is symmetric, this can always be done by choosing $U$ to be (the transpose of) the modal matrix of $A$ (see Sect. 1.2.10.3): if the eigenvalues of $A$ are the numbers $\lambda_1, \lambda_2, \ldots, \lambda_n$ (which are real since $A$ is symmetric, Sect. 1.2.10.2), with the corresponding eigenvectors $B_1, B_2, \ldots, B_n$ (so that $AB_i = \lambda_i B_i$), then one can define the matrix $U$ whose rows are the eigenvectors of $A$, i.e. $U^T = \left(B_1\,B_2\cdots B_n\right)$ has them as its columns. Hence, the matrix $A' = UAU^T = \left(\delta_{ij}\lambda_i\right)$ has the diagonal form with the eigenvalues of $A$ standing on its diagonal in the same order as the corresponding eigenvectors in $U$. Therefore, after this choice, PDE (8.5) transforms into:
$$\sum_{k=1}^{n}\lambda_k\,\Psi_{y_k y_k} + \sum_{k=1}^{n} b'_k\,\Psi_{y_k} + c\,\Psi + f_1(Y) = 0\,. \quad (8.8)$$
This PDE does not have mixed derivatives any more; only the diagonal second order derivatives $\Psi_{y_k y_k} = \partial^2\Psi/\partial y_k^2$ are present. This finalises the transformation into the canonical form.
canonical form.
Now we are ready to introduce the characterisation scheme. It is based entirely
on the values of the eigenvalues fk g. The original PDE (8.4) is said to be of elliptic
type if all eigenvalues are of the same sign (e.g. all positive). It is of hyperbolic type
if all eigenvalues, apart from precisely one, are of the same sign. Finally, the PDE
is said to be of parabolic type if at least one ‰yk yk term is missing which happens if
the corresponding eigenvalue k is zero. For functions of more than three variables
.n 4) more cases are also possible; for instance, in the case of n D 4 it is possible
to have two eigenvalues of one sign and two of the other. We shall not consider those
cases here.
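The classification rule just stated is purely algebraic and easy to automate. The following sketch (my own, not from the text) builds the symmetric coefficient matrix $A$ of the second derivatives and inspects the signs of its eigenvalues:

```python
import numpy as np

def classify(A, tol=1e-12):
    lam = np.linalg.eigvalsh(np.asarray(A, dtype=float))  # real, since A is symmetric
    if np.any(np.abs(lam) < tol):
        return 'parabolic'                 # a second-derivative term is missing
    pos = np.sum(lam > 0)
    if pos in (0, len(lam)):
        return 'elliptic'                  # all eigenvalues of the same sign
    if pos in (1, len(lam) - 1):
        return 'hyperbolic'                # exactly one of the opposite sign
    return 'other (mixed signature, n >= 4)'

print(classify([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))      # Laplace equation: elliptic
print(classify([[-1/4, 0, 0], [0, 1, 0], [0, 0, 1]]))   # wave-like: hyperbolic
print(classify([[0, 0], [0, 1]]))                       # diffusion-like: parabolic
```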
It follows from these definitions that the wave equation (8.1) is of hyperbolic type, the Laplace equation (8.3) is elliptic, while the diffusion equation (8.2) is parabolic. Indeed, in all three cases the equations already have the canonical form. Then, in the case of the wave equation written as
$$-\frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2} + \frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + \frac{\partial^2\Psi}{\partial z^2} = 0\,, \quad (8.9)$$
one coefficient (of the second order time derivative) is of the opposite sign to the coefficients of the spatial derivatives, which means that the PDE is hyperbolic. The heat transport equation does not have the second order time derivative term and hence is parabolic. Finally, the Laplace equation has all coefficients of the second order derivatives equal to unity and hence is elliptic.
The PDE (8.8) can be simplified even further: it appears that it is also possible to eliminate the terms with the first derivatives. To do this, we introduce a new function $\Phi(Y)$ via
$$\Psi(Y) = \exp\left(\sum_{i=1}^{n}\gamma_i y_i\right)\Phi(Y)\,, \quad (8.10)$$
where the $\gamma_i$ are new parameters to be determined. These are chosen in such a way as to eliminate the terms with first order derivatives. We have
$$\Psi_{y_k} = \left(\Phi_{y_k} + \gamma_k\Phi\right)\exp\left(\sum_{i=1}^{n}\gamma_i y_i\right),$$
$$\Psi_{y_k y_k} = \left(\Phi_{y_k y_k} + 2\gamma_k\Phi_{y_k} + \gamma_k^2\Phi\right)\exp\left(\sum_{i=1}^{n}\gamma_i y_i\right),$$
so that Eq. (8.8) is manipulated into the following PDE with respect to the new function $\Phi(Y)$:
$$\sum_{k=1}^{n}\lambda_k\,\Phi_{y_k y_k} + \sum_{k=1}^{n}\left(2\lambda_k\gamma_k + b'_k\right)\Phi_{y_k} + \left[c + \sum_k\gamma_k\left(\lambda_k\gamma_k + b'_k\right)\right]\Phi + f_2(Y) = 0\,, \quad (8.11)$$
where
$$f_2(Y) = \exp\left(-\sum_{i=1}^{n}\gamma_i y_i\right)f_1(Y)\,. \quad (8.12)$$
Choosing now $\gamma_k = -b'_k/2\lambda_k$ (for $\lambda_k \neq 0$), the terms with the first derivatives are eliminated, and we obtain
$$\sum_{k=1}^{n}\lambda_k\,\Phi_{y_k y_k} + c'\,\Phi + f_2(Y) = 0\,, \quad (8.13)$$
where
$$c' = c - \sum_{k=1}^{n}\frac{\left(b'_k\right)^2}{4\lambda_k}\,.$$
Problem 8.1. Consider a general second order PDE with constant coefficients for a function $\Psi(x_1, x_2)$ of two variables,
$$a_{11}\Psi_{x_1 x_1} + 2a_{12}\Psi_{x_1 x_2} + a_{22}\Psi_{x_2 x_2} + b_1\Psi_{x_1} + b_2\Psi_{x_2} + c\Psi + f(x_1, x_2) = 0\,, \quad (8.14)$$
where the coefficients $a_{ij}$ of the second derivatives are real numbers. Show that this PDE is elliptic if and only if $a_{11}a_{22} > a_{12}^2$, that it is parabolic if $a_{11}a_{22} = a_{12}^2$, and that it is hyperbolic if $a_{11}a_{22} < a_{12}^2$. Verify explicitly that the eigenvalues $\lambda_1$ and $\lambda_2$ are always real.

Problem 8.2. Consider the PDE
$$\Psi_{x_1 x_1} + 2\Psi_{x_1 x_2} + \alpha\,\Psi_{x_2 x_2} = 0\,.$$
Show that this PDE is elliptic for $\alpha > 1$, hyperbolic for $\alpha < 1$ and parabolic for $\alpha = 1$.
Problem 8.3. Consider the following PDE for the function $\Psi(x_1, x_2)$:
$$a\left(\Psi_{x_1 x_1} + \Psi_{x_2 x_2}\right) + 2b\,\Psi_{x_1 x_2} + b\,\Psi_{x_1} + c\,\Psi_{x_2} + \Psi = 0\,,$$
assuming that $a \neq \pm b$. Show that this PDE can be transformed into the following canonical form:
$$(a + b)\,\Phi_{y_1 y_1} + (a - b)\,\Phi_{y_2 y_2} + \kappa\,\Phi = 0\,,$$
where
$$\kappa = 1 - \frac{(b + c)^2}{8(a + b)} - \frac{(b - c)^2}{8(a - b)}\,,$$
and
$$\Psi(y_1, y_2) = \Phi(y_1, y_2)\exp\left[-\frac{b + c}{2\sqrt{2}\,(a + b)}\,y_1 - \frac{b - c}{2\sqrt{2}\,(a - b)}\,y_2\right].$$
Argue that the PDE is elliptic if and only if $a^2 > b^2$, and that it is hyperbolic if $a^2 < b^2$. The PDE cannot be parabolic, as this would require $a^2 = b^2$, which is not possible due to the restriction on the coefficients in this problem.
Problem 8.4. Show that the PDE transformed to the new variables $Y = UX$ with the transformation matrix
$$U = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ -1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
takes the canonical form
$$2\Phi_{y_1 y_1} - \Phi_{y_2 y_2} + \Phi_{y_3 y_3} + \frac{11}{4}\,\Phi = 0\,,$$
where
$$\Psi(y_1, y_2, y_3) = \Phi(y_1, y_2, y_3)\exp\left[\frac{1}{4\sqrt{2}}\left(y_1 - y_2\right) + \frac{y_3}{2}\right],$$
and hence that the PDE is hyperbolic.
Problem 8.6. Consider a function $\Psi(x, y)$ satisfying the PDE (8.14). Show that there exists a linear transformation to new variables,
$$\begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} = U\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix},$$
which transforms this PDE into a form containing, of the second derivatives, only the mixed one, $\Psi_{\xi_1\xi_2}$. Show that, in order to eliminate the terms with the diagonal double derivatives, $\Psi_{\xi_1\xi_1}$ and $\Psi_{\xi_2\xi_2}$, the ratios $u_{12}/u_{11}$ and $u_{22}/u_{21}$ must be different roots of the quadratic equation
$$a_{22}\,\gamma^2 + 2a_{12}\,\gamma + a_{11} = 0 \quad (8.15)$$
with respect to $\gamma$. Why does the condition that the roots be different guarantee that the determinant of $U$ is not zero?
Problem 8.7. As a simple application of the previous problem, show that the PDE
$$2\Psi_{x_1 x_1} + 3\Psi_{x_1 x_2} + \Psi_{x_2 x_2} = 0$$
is equivalent to the PDE $\Psi_{y_1 y_2} = 0$, where the new variables can be chosen as $y_1 = x_1 - 2x_2$ and $y_2 = x_1 - x_2$.
Problem 8.8. Here we shall return to Problem 8.6. Specifically, we shall consider a PDE for which the roots of Eq. (8.15) are equal:
$$\Psi_{x_1 x_1} + 4\Psi_{x_1 x_2} + 4\Psi_{x_2 x_2} = 0\,.$$
Show that in this case it is not possible to find a linear transformation, $Y = UX$, such that in the new variables the PDE would contain only the single term with the mixed derivative, $\Psi_{y_1 y_2} = 0$. Instead, show that the PDE can be transformed into its canonical form containing (in this case) only a single double derivative, $\Psi_{y_2 y_2} = 0$, where the new variables may be chosen as $y_1 = -2x_1 + x_2$ and $y_2 = x_1 + 2x_2$.
Problem 8.9. A small transverse displacement $\Psi(x, t)$ of a flexible tube containing an incompressible fluid of negligible viscosity, flowing along it in the direction $x$, is described by the PDE
Problem 8.10. Show that the 1D wave equation, $\Psi_{xx} = \frac{1}{c^2}\Psi_{tt}$, is invariant with respect to the (relativistic) Lorentz transformation of coordinates:
$$x' = \gamma\left(x - vt\right),\quad t' = \gamma\left(t - \frac{v}{c^2}\,x\right),\quad \gamma = \left(1 - v^2/c^2\right)^{-1/2}\,,$$
i.e. in the new (primed) coordinates the equation has an identical form: $\Psi_{x'x'} = \frac{1}{c^2}\Psi_{t't'}$. Here the non-primed variables, $(x, t)$, correspond to the position and time in a laboratory coordinate system, while the primed variables, $(x', t')$, correspond to a coordinate system moving with velocity $v$ with respect to the laboratory system along the positive direction of the $x$ axis. Next show that if the non-relativistic Galilean transformation,
$$x' = x - vt\,,\quad t' = t\,,$$
is applied, the wave equation does change its form, i.e. it is not invariant with respect to this transformation.
Concluding, we mention that, if desired, one may rescale all or some of the variables, $y_k \to z_k = y_k/\sqrt{|\lambda_k|}$, as an additional step. Of course, this can only be done for those variables $y_k$ for which $\lambda_k \neq 0$; one does not need to rescale those $y_k$ for which $\lambda_k$ is zero, since in this case the second derivative term is missing anyway. This additional transformation leads to the corresponding coefficients of the second derivatives being $\lambda_k/|\lambda_k| = \pm 1$, i.e. just plus or minus one.
For instance, in a heat transport problem across a 1D rod we are usually given the temperatures at both of its ends, and we seek the temperature distribution at all internal points along the rod over time. This type of additional conditions is called boundary conditions. The boundary conditions supply the necessary information on the function of interest associated with its spatial variables. If, as is the case for the wave or diffusion equations, time is also involved, then usually we know the whole function (and maybe its time derivative) at the initial time ($t = 0$), and we are interested in determining its evolution at later times. This type of additional conditions is called initial conditions and is analogous to the case of an ordinary DE.
As an example, consider oscillations of a string of length $L$ stretched along the $x$ axis between the points $x = 0$ and $x = L$. As we shall see below, the vertical displacement $u(x, t)$ of the point of the string with coordinate $x$ ($0 \le x \le L$) satisfies the wave equation
$$\frac{1}{c^2}\frac{\partial^2 u}{\partial t^2} = \frac{\partial^2 u}{\partial x^2}\,.$$
Therefore, alongside the initial conditions,
$$u(x, 0) = \phi_1(x) \quad\text{and}\quad \left.\frac{\partial u}{\partial t}\right|_{t=0} = \phi_2(x)\,, \quad (8.16)$$
giving the values of the unknown function and its first time derivative at all values of the spatial variable $x$, one has to supply the boundary conditions,
$$u(0, t) = \psi_1(t) \quad\text{and}\quad u(L, t) = \psi_2(t)\,, \quad (8.17)$$
as well. The boundary conditions establish the values of the function at the edge (boundary) points $x = 0$ and $x = L$ of the string at all times. Note that the boundary conditions may instead include a derivative with respect to $x$ at one or both of these points, or some linear combination of the various types of terms.
Also note that, since the wave equation is of the second order with respect to time, both the function $u(x, 0)$ and its first time derivative at $t = 0$ are required to be specified in the initial conditions (8.16). In the case of the diffusion equation, where only the first order time derivative appears, only the value of the unknown function at $t = 0$ needs to be given; the first derivative cannot be given in addition, as this complementary condition might be contradictory.
The idea of the boundary conditions is easily generalised to two- and three-dimensional spatial PDEs, where the values of the unknown functions are to be specified at all times at the boundary of the spatial region of interest. For instance, in the case of oscillations of a circular membrane of radius $R$, one has to specify as the boundary conditions the displacement $u(x, y, t)$ of all boundary points $x^2 + y^2 = R^2$ of the membrane at all times $t$.
What kind of initial and boundary conditions is it necessary to specify in each case to guarantee that a solution of the given PDE can be found uniquely? This depends on the PDE in question, so this highly important question has to be considered individually for each type of PDE. It will be discussed below specifically for the wave and the diffusion equations.
8.2 Wave Equation

In Sect. I.6.7 the wave equation was derived for the electric and magnetic fields. However, it is encountered in many other physical problems and hence has a very general physical significance. To stress this point, it is instructive, before discussing methods of solution of the wave PDEs, to consider two other problems in which the same wave equation appears: oscillations of a string and sound propagation in a condensed medium (liquid or gas). This is what we are going to do in the next two subsections.
Consider a string which lies along the $x$ axis and is subjected to a tension $T_0$ in the same direction. At equilibrium the string is stretched along the $x$ axis. If we apply a perpendicular external force $F$ (per unit length) and/or take the string out of its equilibrium position and then release it, it will start oscillating vertically, see Fig. 8.1. Let the vertical displacement of a point with coordinate $x$ be $u(x, t)$; note that the vertical displacement is a function of both $x$ and the time $t$.
Let us consider a small element $AB$ of the string, with the point $A$ being at $x$ and the point $B$ at $x+dx$, see Fig. 8.1. The total force acting on this element is due to the two tensions applied at points $A$ and $B$, which act in (approximately) opposite directions, and an (optional) external force $F(x)$, which is applied in the vertical direction. We assume that the oscillations are small (i.e. the vertical displacement $u(x,t)$ is small compared to the string length). This means that the tensions may be assumed to be nearly horizontal (although in the figure the vertical components of the forces are greatly exaggerated for clarity), and they cancel each other in this direction (if they did not, the string would move in the lateral direction!). Therefore, we need only care about the balance of the forces in the vertical direction.
Let $\alpha(x)$ be the angle the tangent line to the string makes with the $x$ axis at the point $x$, as shown in the figure. Then, the difference in the heights between points $B$ and $A$ can be calculated as
\[
u(x+dx) - u(x) = \frac{\partial u}{\partial x}\,dx = \tan\alpha(x)\,dx \simeq \alpha(x)\,dx \;\Longrightarrow\; \frac{\partial u}{\partial x} \simeq \alpha(x)\,,
\]
because for small oscillations the angle $\alpha$ is small and therefore $\tan\alpha \simeq \sin\alpha \simeq \alpha$. Here the time was omitted for convenience (we consider the difference in heights at a given time $t$). Note that the angle $\alpha(x)$ depends on the point at which the tangent line is drawn, i.e. it is a function of $x$.
Then, the vertical component of the tension force acting downwards at the left point $A$ is
\[
-T_0\sin\alpha(x) \simeq -T_0\,\alpha(x) = -T_0\left(\frac{\partial u}{\partial x}\right)_A .
\]
On the other hand, the vertical component of the tension applied at the point $B$ is similarly
\[
T_0\sin\left[\alpha(x+dx)\right] \simeq T_0\,\alpha(x+dx) = T_0\left(\frac{\partial u}{\partial x}\right)_B ,
\]
but this time the partial derivative of the displacement is calculated at the point $B$. Therefore, the net force acting on the element $dx$ in the vertical direction will be
\[
F\,dx + T_0\left(\frac{\partial u}{\partial x}\right)_B - T_0\left(\frac{\partial u}{\partial x}\right)_A = F\,dx + T_0\left[\left(\frac{\partial u}{\partial x}\right)_B - \left(\frac{\partial u}{\partial x}\right)_A\right].
\]
The expression in the square brackets gives the change of the function $f(x) = \partial u/\partial x$ between the two points $B$ and $A$, which are separated by $dx$; hence one can write
\[
f(x+dx) - f(x) = \frac{\partial f}{\partial x}\,dx = \frac{\partial^2 u}{\partial x^2}\,dx\,,
\]
so that the total force acting on the element $dx$ of the string in the vertical direction becomes
\[
\left(F + T_0\,\frac{\partial^2 u}{\partial x^2}\right) dx\,.
\]
On the other hand, due to Newton's equations of motion, this force should be equal to the product of the mass $\rho\,dx$ of the piece of the string of length $dx$ (with $\rho$ being the linear mass density of the string) and its vertical acceleration, $\partial^2 u/\partial t^2$. Therefore, equating the two expressions and cancelling out $dx$, the final equation of motion for the string is obtained:
\[
\frac{1}{c^2}\frac{\partial^2 u}{\partial t^2} = G(x) + \frac{\partial^2 u}{\partial x^2}\,, \tag{8.18}
\]
where $G(x) = F(x)/T_0$ and the velocity $c = \sqrt{T_0/\rho}$. As one can see, this PDE is in general inhomogeneous. When the external force $F$ is absent, we arrive at the familiar (homogeneous) wave equation with velocity $c$.
Consider the propagation of sound in a gas, e.g. in the air. As the gas is perturbed, density fluctuations are created in it which propagate in space and time. Corresponding to these fluctuations, any little gas volume $dV = d\mathbf{r}$ can be assigned a velocity $\mathbf{v}(\mathbf{r})$. The latter must satisfy the hydrodynamic equation of motion (I.6.139):
\[
\frac{\partial\mathbf{v}}{\partial t} + \left(\mathbf{v}\cdot\operatorname{grad}\right)\mathbf{v} = \mathbf{F} - \frac{1}{\rho}\operatorname{grad}P\,,
\]
where $\mathbf{F}$ is a vector field of external forces (e.g. gravity) and $P$ is the local pressure. We assume that the velocity field $\mathbf{v}(\mathbf{r})$ changes very little in space, and hence we can neglect the $\left(\mathbf{v}\cdot\operatorname{grad}\right)\mathbf{v}$ term in the above equation, leading to
\[
\frac{\partial\mathbf{v}}{\partial t} = \mathbf{F} - \frac{1}{\rho}\operatorname{grad}P\,. \tag{8.19}
\]
The gas density $\rho$ must also satisfy the continuity equation,
\[
\frac{\partial\rho}{\partial t} + \rho\,\operatorname{div}\mathbf{v} + \mathbf{v}\cdot\operatorname{grad}\rho = 0\,,
\]
in which one can neglect the $\mathbf{v}\cdot\operatorname{grad}\rho$ term containing a product of two small quantities: a small velocity and a small variation of the density. Hence,
\[
\frac{\partial\rho}{\partial t} + \rho\,\operatorname{div}\mathbf{v} = 0\,.
\]
Let $\rho_0$ be the density of the gas at equilibrium. Then, the density of the gas during the sound propagation (when the system is out of equilibrium) can be written via $\rho = \rho_0(1+s)$, where $s(\mathbf{r}) = (\rho - \rho_0)/\rho_0$ is the relative fluctuation of the gas density, which is considered much smaller than unity. Then $d\rho = \rho_0\,ds$, and therefore
\[
\frac{\partial\rho}{\partial t} = \rho_0\,\frac{\partial s}{\partial t} \;\Longrightarrow\; \rho_0\,\frac{\partial s}{\partial t} + \rho\,\operatorname{div}\mathbf{v} = 0\,. \tag{8.20}
\]
Also, in this equation we can replace $\rho$ with $\rho_0$, since $\rho\,\operatorname{div}\mathbf{v} = \rho_0(1+s)\operatorname{div}\mathbf{v} \simeq \rho_0\operatorname{div}\mathbf{v}$, where we dropped the second order term containing a product of two small quantities ($s$ and a small variation of the velocity) to be consistent with the previous approximations. Hence, the continuity equation takes on the simpler form
\[
\frac{\partial s}{\partial t} + \operatorname{div}\mathbf{v} = 0\,. \tag{8.21}
\]
Finally, we have to relate the pressure to the density. In an ideal gas the sound propagation is an adiabatic process in which the pressure $P$ is proportional to $\rho^\gamma$, where $\gamma = c_P/c_V$ is the ratio of the heat capacities at constant pressure and constant volume. Therefore, if $P_0$ is the pressure at equilibrium, then
\[
\frac{P}{P_0} = \left(\frac{\rho}{\rho_0}\right)^{\gamma} \;\Longrightarrow\; \frac{P}{P_0} = (1+s)^{\gamma} \simeq 1 + \gamma s \;\Longrightarrow\; \operatorname{grad}P = \gamma P_0\operatorname{grad}s\,.
\]
Substituting this into Eq. (8.19) and replacing $\rho$ with $\rho_0$ there (which corresponds again to keeping only the first order terms), we get
\[
\frac{\partial\mathbf{v}}{\partial t} = \mathbf{F} - c^2\operatorname{grad}s\,, \tag{8.22}
\]
where $c = \sqrt{\gamma P_0/\rho_0}$. Next, we take the divergence of both sides of this equation. The left-hand side then becomes
\[
\operatorname{div}\frac{\partial\mathbf{v}}{\partial t} = \frac{\partial}{\partial t}\left(\operatorname{div}\mathbf{v}\right) = -\frac{\partial^2 s}{\partial t^2}\,,
\]
where we have used Eq. (8.21) at the last step. The divergence of the right-hand side of Eq. (8.22) is worked out as follows:
\[
\operatorname{div}\left(\mathbf{F} - c^2\operatorname{grad}s\right) = \operatorname{div}\mathbf{F} - c^2\operatorname{div}\operatorname{grad}s = \operatorname{div}\mathbf{F} - c^2\,\Delta s\,,
\]
i.e. it contains the Laplacian of $s$. Equating the left- and the right-hand sides, we finally obtain the equation sought for:
\[
\frac{\partial^2 s}{\partial t^2} = c^2\,\Delta s - \operatorname{div}\mathbf{F}\,. \tag{8.23}
\]
If the external forces are absent, this equation turns into the familiar wave equation in 3D space, with the constant $c$ being the corresponding sound velocity:
\[
\frac{1}{c^2}\frac{\partial^2 s}{\partial t^2} = \Delta s\,. \tag{8.24}
\]
When solving an ordinary DE, we normally first obtain its general solution, which contains arbitrary constants, and then, by imposing the corresponding initial conditions, a particular integral of the DE is obtained so that both the DE and the initial conditions are satisfied. The case of PDEs is more complex, as a function of more than one variable is to be sought. However, in most cases one can draw an analogy with the 1D case of an ordinary DE: if in the latter case the general solution contains arbitrary constants, the general solution of a PDE contains arbitrary functions. Then, a particular integral of the PDE is obtained by finding the particular functions so that both initial and boundary conditions, if present, are satisfied.
To illustrate this point, let us consider the specific problem of string oscillations considered in Sect. 8.2.1. We shall assume that the string is stretched along the $x$ axis and is of infinite length, i.e. $-\infty < x < \infty$. The condition of the string being infinite simplifies the problem considerably, as one can completely ignore any boundary conditions. Then, only the initial conditions remain.
Therefore, the whole problem can be formulated as follows. The vertical displacement $\Psi(x,t)$ of the point with coordinate $x$ must be a solution of the wave equation
\[
\frac{\partial^2\Psi}{\partial x^2} = \frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2}\,, \tag{8.25}
\]
subject to the initial conditions
\[
\Psi(x,0) = \psi_1(x) \quad\text{and}\quad \Psi_t(x,0) = \left.\frac{\partial\Psi}{\partial t}\right|_{t=0} = \psi_2(x)\,. \tag{8.26}
\]
The wave equation is of hyperbolic type. As follows from Problem 8.6, in some cases it is possible to find such a linear transformation of the variables $(x,t) \Longrightarrow (\xi,\eta)$ that the PDE contains only a mixed derivative.

Problem 8.11. Show that in the new variables $(\xi,\eta)$, where
\[
\xi = x + ct \quad\text{and}\quad \eta = x - ct\,, \tag{8.28}
\]
the PDE (8.25) takes the form
\[
\frac{\partial^2\Psi}{\partial\xi\,\partial\eta} = 0\,. \tag{8.27}
\]

Integrating Eq. (8.27) first with respect to $\xi$ gives $\partial\Psi/\partial\eta = C(\eta)$, an arbitrary function of $\eta$; integrating then with respect to $\eta$ yields $\Psi = v(\eta) + u(\xi)$, where $u(\xi)$ must be another, also arbitrary, function of the other variable $\xi$. Above, $v(\eta)$ is also an arbitrary function of $\eta$, since it is obtained by integrating the arbitrary function $C(\eta)$. Recalling what the new variables actually are, Eq. (8.28), we immediately arrive at the following general solution of the PDE (8.25):
\[
\Psi(x,t) = v(x - ct) + u(x + ct)\,. \tag{8.29}
\]
So, the general solution appears to be a sum of two arbitrary functions of the variables $x \pm ct$. Note that this result is general; in particular, we have not used the fact that the string is of infinite length. However, applying boundary conditions for a string of finite length directly to the general solution (8.29) is non-trivial, so we shall not consider this case here; another method will be considered instead later on, in Sect. 8.2.5.
Before applying the initial conditions (8.26) to obtain the formal solution for the particular integral of our wave equation of the infinite string, it is instructive first to illustrate the meaning of the obtained general solution. We start by analysing the function $v(x-ct)$. It is sketched for two times $t_1$ and $t_2 > t_1$ as a function of $x$ in Fig. 8.2. It is easy to see that the profile (or shape) of the function remains the same at later times; the whole function simply shifts to the right, i.e. the wave propagates without any distortion of its shape. This can be seen in the following way. Consider a point $x_1$. At time $t_1$ the function has some value $v_1 = v(x_1 - ct_1)$. At time $t_2 > t_1$ the function becomes $v(x - ct_2)$; at this later time it will reach the same value $v_1$ at some point $x_2$ if the latter satisfies the following condition:
\[
x_2 - ct_2 = x_1 - ct_1 \;\Longrightarrow\; x_2 = x_1 + c\,(t_2 - t_1)\,,
\]
which immediately shows that the function as a whole shifts to larger values of $x$ since $t_2 > t_1$. This is also shown in Fig. 8.2. It is seen that the function shifts exactly by the distance $\Delta x = c(t_2 - t_1)$ over the time interval $\Delta t = t_2 - t_1$. Therefore, the first part of the solution (8.29), $v(x-ct)$, describes propagation of the wave shape $v(x)$ with velocity $c$ to the right. Similarly it is verified that the second part, $u(x+ct)$, of the solution (8.29) describes the propagation of the wave shape $u(x)$ to the left with the same velocity.
Now, let us find the as yet unknown functions $u(x)$ and $v(x)$ by satisfying the initial conditions (8.26). Applying the particular form of the general solution (8.29), the initial conditions read
\[
v(x) + u(x) = \psi_1(x) \quad\text{and}\quad c\left(\frac{du(x)}{dx} - \frac{dv(x)}{dx}\right) = \psi_2(x)\,. \tag{8.30}
\]
Integrating the second equation between, say, zero and $x$, we obtain
\[
-\left[v(x) - v(0)\right] + \left[u(x) - u(0)\right] = \frac{1}{c}\int_0^x \psi_2(\xi)\,d\xi\,,
\]
which, when combined with the first equation in (8.30), allows solving for both functions:
\[
u(x) = \frac{1}{2}\left[A + \psi_1(x) + \frac{1}{c}\int_0^x \psi_2(\xi)\,d\xi\right],
\]
\[
v(x) = \frac{1}{2}\left[\psi_1(x) - A - \frac{1}{c}\int_0^x \psi_2(\xi)\,d\xi\right],
\]
where $A = u(0) - v(0)$ is a constant. Interestingly, when substituting these functions into the general solution (8.29), the constant $A$ cancels out, and we obtain
\[
\Psi(x,t) = \frac{1}{2}\left[\psi_1(x-ct) + \psi_1(x+ct)\right] + \frac{1}{2c}\int_{x-ct}^{x+ct}\psi_2(\xi)\,d\xi\,. \tag{8.31}
\]
This solution is known as d'Alembert's formula. It gives the full solution of the problem given by Eqs. (8.25) and (8.26) for an infinite string.
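D'Alembert's formula (8.31) is also easy to evaluate numerically. The sketch below is only an illustration: the Gaussian initial profile $\psi_1$, the zero initial velocity $\psi_2$ and all parameter values are assumptions chosen for the example, not data from the text.

```python
# A minimal numerical check of d'Alembert's formula (8.31); the Gaussian
# initial profile psi1 and the parameters below are illustrative choices.
import numpy as np
from scipy.integrate import quad

c = 1.0                              # wave velocity
psi1 = lambda x: np.exp(-x**2)       # initial displacement psi_1(x)
psi2 = lambda x: 0.0                 # initial velocity psi_2(x)

def dalembert(x, t):
    # Psi(x,t) = [psi1(x-ct) + psi1(x+ct)]/2 + (1/2c) int_{x-ct}^{x+ct} psi2
    integral, _ = quad(psi2, x - c*t, x + c*t)
    return 0.5*(psi1(x - c*t) + psi1(x + c*t)) + integral/(2*c)

# With psi2 = 0 the initial bump splits into two half-amplitude bumps
# travelling left and right with speed c:
for t in (0.0, 1.0, 2.0):
    print(t, dalembert(0.0, t), dalembert(c*t, t))
```

One sees directly the two counter-propagating half-amplitude copies of the initial profile predicted by the analysis above.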
As was already mentioned, it is possible to generalise the described method so that the solution of the wave equation for a string of finite length $L$ ($0 \le x \le L$), with boundary conditions explicitly given, becomes possible. However, as was remarked before, the method becomes very complicated and will not be considered here, since simpler methods exist. One such method, the method due to Fourier, based on separation of variables, will be considered in detail in the following sections.
Problem 8.12. Explain why the solution (8.31) conforms to the general form
of Eq. (8.29).
Problem 8.13. Show that the general solution of the PDE of Problem 8.7 can
be written as
Problem 8.14. Show that the general solution of the PDE of Problem 8.8 can
be written as
By introducing a new function $\Phi(r,t)$ such that $\Psi(r,t) = r^{\alpha}\Phi(r,t)$, show that by taking $\alpha = -1$ the PDE for $\Phi$ will be of the form $\Phi_{rr} = \frac{1}{c^2}\Phi_{tt}$. Correspondingly, the general solution of the problem (8.32) can then be written as
\[
\Psi(r,t) = \frac{1}{r}\left[v(r - ct) + u(r + ct)\right].
\]
This solution describes propagation of spherical waves from (the first term) and towards (the second term) the centre of the coordinate system ($r = 0$). The attenuation factor $1/r$ corresponds to a decay of the wave amplitude with the distance $r$ from the centre. Since the energy of the wave front is proportional to the square of the amplitude, the energy decays as $1/r^2$. Since the area of the wave front increases as $4\pi r^2$, the decay of the wave's amplitude ensures that the wave's total energy is conserved.
Problem 8.16. Similarly, consider wave propagation with cylindrical symmetry, Eq. (7.98). In this case the wave equation reads (ignoring the angle $\phi$ and the coordinate $z$)
\[
\frac{1}{r}\frac{\partial}{\partial r}\left(r\,\frac{\partial\Psi}{\partial r}\right) = \frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2}\,. \tag{8.33}
\]
Show that the substitution $\Psi(r,t) = r^{-1/2}\,\Phi(r,t)$ leads to the equation
\[
\Phi_{rr} + \frac{1}{4r^2}\,\Phi = \frac{1}{c^2}\,\Phi_{tt}\,.
\]
Problem 8.17. Consider Problem 8.9 again. Show that the general solution of the PDE is
\[
\Psi(x,t) = u(x + \lambda_1 t) + v(x + \lambda_2 t)\,,
\]
where $u(x)$ and $v(x)$ are two arbitrary functions. Then, assume that the tube was initially at rest and had a small transverse displacement $\Psi = \Psi_0\cos(kx)$. Show that the subsequent motion of the tube is given by
\[
\Psi(x,t) = \frac{\Psi_0}{\lambda_1 - \lambda_2}\left[\lambda_1\cos\left(k(x + \lambda_2 t)\right) - \lambda_2\cos\left(k(x + \lambda_1 t)\right)\right].
\]
Let us now establish under what conditions the uniqueness of the solution of the general (inhomogeneous) wave equation
\[
\frac{\partial^2\Psi}{\partial t^2} = c^2\,\frac{\partial^2\Psi}{\partial x^2} + f(x,t) \tag{8.35}
\]
is guaranteed. Here $f(x,t)$ is a function of an "external force" acting on the string at point $x$. That force makes the PDE inhomogeneous.

Proving by contradiction, let us assume that two different solutions, $\Psi^{(1)}(x,t)$ and $\Psi^{(2)}(x,t)$, exist which satisfy the same PDE,
\[
\Psi^{(1)}_{tt} = c^2\,\Psi^{(1)}_{xx} + f(x,t)\,, \qquad \Psi^{(2)}_{tt} = c^2\,\Psi^{(2)}_{xx} + f(x,t)\,, \tag{8.36}
\]
and the same initial and boundary conditions. Consider then their difference, $\Psi = \Psi^{(1)} - \Psi^{(2)}$. Obviously, it satisfies the homogeneous wave equation, $\Psi_{tt} = c^2\Psi_{xx}$ (without the force term $f$), the zero initial conditions,
\[
\Psi(x,0) = 0 \quad\text{and}\quad \left.\frac{\partial\Psi}{\partial t}\right|_{t=0} = 0\,, \tag{8.37}
\]
and zero boundary conditions: at each of the ends $x=0$ and $x=L$ either the function itself, $\Psi$, or its spatial derivative, $\Psi_x$, is equal to zero at all times. Consider now the (energy-like) function
\[
E(t) = \frac{1}{2}\int_0^L\left[\left(\Psi_x\right)^2 + \frac{1}{c^2}\left(\Psi_t\right)^2\right] dx\,. \tag{8.40}
\]
Differentiating it with respect to time and integrating by parts in the term containing $\Psi_x\Psi_{xt}$, we obtain
\[
\frac{dE}{dt} = \left(\Psi_x\Psi_t\right)_{x=L} - \left(\Psi_x\Psi_t\right)_{x=0} - \int_0^L\Psi_t\left[\Psi_{xx} - \frac{1}{c^2}\,\Psi_{tt}\right] dx\,. \tag{8.41}
\]
Consider the free term, $\Psi_x\Psi_t$, which is calculated at $x=0$. At this end of the string the boundary condition states that either $\Psi$ is equal to zero at all times, or $\Psi_x$ is. If the latter is true, then we immediately see that the free term is zero. In the former case, differentiation of the condition $\Psi = 0$ at $x=0$ with respect to time gives $\Psi_t = 0$, which again results in the corresponding free term being zero. So, the free term, $\left(\Psi_x\Psi_t\right)_{x=0}$, is zero in either case. Similarly, the other free term, calculated at $x=L$, is also zero. So, for all four combinations of the boundary conditions both free terms are zero, and hence only the integral term remains in the right-hand side of Eq. (8.41). Therefore,
\[
\frac{dE}{dt} = -\int_0^L\Psi_t\left[\Psi_{xx} - \frac{1}{c^2}\,\Psi_{tt}\right] dx = 0\,,
\]
as the expression in the brackets is zero because of the PDE for $\Psi$ itself. Therefore, $E$ must be constant in time. Calculating this constant at $t=0$,
\[
E(0) = \frac{1}{2}\int_0^L\left[\left(\Psi_x(x,0)\right)^2 + \frac{1}{c^2}\left(\Psi_t(x,0)\right)^2\right] dx\,,
\]
and employing the initial conditions, Eq. (8.37), and the fact that from $\Psi(x,0) = 0$ it immediately follows, upon differentiation, that $\Psi_x(x,0) = 0$, we must conclude that $E(0) = 0$. In other words, the function $E(t) = 0$ at all times. On the other hand, the function $E(t)$ of Eq. (8.40) consists of the sum of two non-negative terms, and hence can be zero if and only if both these terms are equal to zero at the same time. This means that both $\Psi_x(x,t) = 0$ and $\Psi_t(x,t) = 0$ for any $x$ and $t$. Since $\Psi_x = 0$, the function $\Psi(x,t)$ cannot depend on $x$. Similarly, since $\Psi_t = 0$, our function cannot depend on $t$. In other words, $\Psi(x,t)$ must simply be a constant. However, because of the initial conditions, $\Psi(x,0) = 0$, this constant must be zero, i.e. $\Psi(x,t) = 0$, which contradicts our assumption that two different solutions are possible. Hence, that assumption is wrong.

We have just proved that specifying the initial conditions and any of the four types of boundary conditions uniquely defines the solution $\Psi(x,t)$ of the problem.
Consider now a string of finite length $L$ ($0 \le x \le L$) with fixed ends. The homogeneous wave equation
\[
\frac{1}{c^2}\,\Psi_{tt} = \Psi_{xx} \tag{8.42}
\]
is to be solved with general initial,
\[
\Psi(x,0) = \psi_1(x) \quad\text{and}\quad \Psi_t(x,0) = \psi_2(x)\,, \tag{8.43}
\]
and zero boundary,
\[
\Psi(0,t) = \Psi(L,t) = 0\,, \tag{8.44}
\]
conditions. To solve this problem, we shall first seek the solution in a very special form as a product of two functions,
\[
\Psi(x,t) = X(x)\,T(t)\,, \tag{8.45}
\]
where $X(x)$ is a function of only one variable $x$ and $T(t)$ is a function of the other variable $t$. This trial solution may seem too specific and hence may not serve as a solution of the whole problem (8.42)–(8.44); however, as will be seen below, we shall be able to construct a linear combination of such product solutions which will then satisfy our PDE together with the initial and boundary conditions. However, we shall build our solution gradually, step-by-step.
Substituting the product of the two functions, Eq. (8.45), into the PDE (8.42) gives
\[
\frac{d^2X}{dx^2}\,T = \frac{1}{c^2}\,X\,\frac{d^2T}{dt^2} \;\Longrightarrow\; \frac{1}{X}\frac{d^2X}{dx^2} = \frac{1}{c^2T}\frac{d^2T}{dt^2}\,. \tag{8.46}
\]
To get the form of the equation written to the right of the arrow, we divided both sides of the equation on the left of the arrow by the product $XT$. What we have obtained is quite peculiar. Indeed, the left-hand side of the obtained equation, $\frac{1}{X}\frac{d^2X}{dx^2}$, is a function of $x$ only, while the right-hand side, $\frac{1}{c^2T}\frac{d^2T}{dt^2}$, is a function of $t$ only. One may think that this cannot possibly be true, as this "equality" must hold for all values of $x$ and $t$. However, there is one and only one possibility which resolves this paradoxical situation: both functions, $\frac{1}{X}\frac{d^2X}{dx^2}$ and $\frac{1}{c^2T}\frac{d^2T}{dt^2}$, must be equal to the same constant. Calling this constant $K$, we then must have
\[
\frac{d^2X}{dx^2} = KX \tag{8.47}
\]
and
\[
\frac{d^2T}{dt^2} = c^2K\,T\,, \tag{8.48}
\]
which are two ordinary DEs for the functions $X(x)$ and $T(t)$, respectively. The constant $K$ is called the separation constant. This is because in the right part of Eq. (8.46) the variables $x$ and $t$ have been separated. Correspondingly, the method we are discussing is called the method of separation of variables.
The constant $K$ introduced above is as yet unknown. However, its available values can be determined if we impose the boundary conditions (8.44) on our product solution (8.45). Since in the product the function $T(t)$ does not depend on $x$, it is clear that the boundary conditions must be applied to the function $X(x)$ only. Hence, we should have
\[
X(0) = X(L) = 0\,.
\]
Let us try to solve Eq. (8.47) subject to these boundary conditions to see which values of $K$ are consistent with them. Three cases must be considered³: $K < 0$, $K > 0$ and $K = 0$.
1. When $K > 0$, we can write $K = p^2$ with $p > 0$. For this case the solution of (8.47) is a sum of two exponential functions, $X(x) = Ae^{px} + Be^{-px}$. The boundary conditions require $A + B = 0$ and $Ae^{pL} + Be^{-pL} = 0$. It is clear that there is only one solution of this system of two simultaneous algebraic equations, which is the trivial solution: $A = 0$ and $B = 0$. This in turn leads to the trivial solution $X(x) = 0$, which results in the trivial product solution $\Psi(x,t) = 0$, and that is not of any physical interest!
2. When $K = 0$, we find that $d^2X/dx^2 = 0$, which yields simply $X(x) = Ax + B$. This solution is also of no physical interest, because the boundary conditions at $x = 0$ and $x = L$ can only be satisfied by $A = 0$ and $B = 0$, leading again to the trivial solution.
3. What is left to consider is the case $K < 0$. We write $K = -k^2$ with (without loss of generality) some positive $k$ (see below). In this case we obtain the equation of a harmonic oscillator,
\[
\frac{d^2X}{dx^2} + k^2X = 0\,,
\]
whose general solution is $X(x) = A\sin(kx) + B\cos(kx)$. The boundary condition at $x=0$ gives $X(0) = B = 0$, while the one at $x=L$ yields $A\sin(kL) = 0$. When solving these equations, we have to consider all possible cases. Choosing the constant $A = 0$ would again yield the trivial solution; therefore, this equation can only be satisfied if $\sin(kL) = 0$, which gives us the possible values of $k$. These obviously are
\[
k_n = \frac{\pi n}{L}\,, \quad n = 1, 2, \ldots\,. \tag{8.52}
\]
³ Obviously, the constant $K$ must be real.
We have obtained not one but an infinite number of possible solutions for $k$, which we distinguish by the subscript $n$. Note that $n \ne 0$, as this gives the zero value for $k$, which we know is to be rejected as leading to the trivial solution. Also, negative values of $n$ (and hence negative values of $k_n = k_{-|n|} = -\pi|n|/L = -k_{|n|}$) do not give anything new, as these result in the same values of the separation constant $K_n = -k_n^2$ and in the solutions
\[
X_{-|n|}(x) = A\sin\left(k_{-|n|}\,x\right) = -A\sin\left(k_{|n|}\,x\right),
\]
which differ only by sign from the solutions $X_{|n|}(x) = A\sin\left(k_{|n|}\,x\right)$ associated with positive $n$. Hence the choice of $n = 1, 2, 3, \ldots$ in Eq. (8.52); there is no need to consider negative and zero $n$.
From the above analysis we see that the boundary conditions can only be satisfied when the separation constant $K$ takes certain discrete eigenvalues $K_n = -k_n^2$, where $n = 1, 2, \ldots$. This type of situation occurs frequently in the theory of PDEs. In fact, we already came across it in Sect. 4.7.2, when we considered the quantum-mechanical problem of the hydrogen atom.

Associated with the eigenvalue $k_n$ we have an eigenfunction
\[
X_n(x) = A_n\sin\left(\frac{\pi n x}{L}\right), \quad n = 1, 2, \ldots\,, \tag{8.53}
\]
where $A_n$ are some constants; these may be different for different $n$, so we distinguish them with the index $n$.
Next we have to solve the corresponding differential equation (8.48) for $T(t)$ with $K = -k_n^2$, which is again an equation of a harmonic oscillator:
\[
\frac{d^2T}{dt^2} + (k_nc)^2\,T = 0\,,
\]
with the general solution
\[
T_n(t) = D_n'\sin(\omega_n t) + E_n'\cos(\omega_n t)\,, \tag{8.54}
\]
where
\[
\omega_n = \frac{\pi c}{L}\,n\,, \quad n = 1, 2, \ldots\,. \tag{8.55}
\]
We can now collect our functions (8.53) and (8.54) to obtain the desired product solution (8.45):
\[
\Psi_n(x,t) = \sin\left(\frac{\pi n x}{L}\right)\left[D_n'\sin(\omega_n t) + E_n'\cos(\omega_n t)\right]. \tag{8.56}
\]
The wave equation (8.42) can be written in the form
\[
\left(\frac{\partial^2}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\right)\Psi = 0\,,
\]
or simply $\hat{L}\Psi = 0$, where
\[
\hat{L} = \frac{\partial^2}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}
\]
is the operator expression contained in the round brackets in the PDE above. It is easy to see that this operator is linear, i.e. for any numbers $\alpha$ and $\beta$ and any two functions $\Phi_1(x,t)$ and $\Phi_2(x,t)$ we have
\[
\hat{L}\left(\alpha\Phi_1 + \beta\Phi_2\right) = \alpha\,\hat{L}\Phi_1 + \beta\,\hat{L}\Phi_2\,.
\]
Indeed, consider for instance the first part of the operator $\hat{L}$:
\[
\frac{\partial^2}{\partial x^2}\left(\alpha\Phi_1 + \beta\Phi_2\right) = \alpha\,\frac{\partial^2\Phi_1}{\partial x^2} + \beta\,\frac{\partial^2\Phi_2}{\partial x^2}\,,
\]
as required. Therefore, if the functions $\Phi_1$ and $\Phi_2$ are solutions of the PDE, i.e. $\hat{L}\Phi_1 = 0$ and $\hat{L}\Phi_2 = 0$, then $\hat{L}\left(\alpha\Phi_1 + \beta\Phi_2\right) = 0$ as well, i.e. their arbitrary linear combination, $\alpha\Phi_1 + \beta\Phi_2$, must also satisfy the PDE. On top of that, if $\Phi_1$ and $\Phi_2$ satisfy the zero boundary conditions, then their linear combination, $\alpha\Phi_1 + \beta\Phi_2$, will satisfy them as well.
It is now clear how the superposition principle may help in devising a solution which satisfies the PDE, the boundary and the initial conditions. We have already built individual solutions (8.56) that each satisfy the PDE and the zero boundary conditions. If we now construct a linear combination of all these solutions with arbitrary coefficients $\alpha_n$,
\[
\Psi(x,t) = \sum_{n=1}^{\infty}\alpha_n\Psi_n(x,t) = \sum_{n=1}^{\infty}\left[B_n\sin(\omega_n t) + C_n\cos(\omega_n t)\right]\sin\left(\frac{\pi n x}{L}\right), \tag{8.57}
\]
then this function will satisfy the PDE due to the superposition principle. An essential point to realise now is that it will also satisfy the zero boundary conditions, as each term in the sum obeys them! Therefore, this construction satisfies the PDE and the boundary conditions, giving us enough freedom to satisfy the initial conditions as well: we can try to find the coefficients $B_n$ and $C_n$ in such a way as to clear this last hurdle of the method. Note that the new constants $B_n = \alpha_nD_n'$ and $C_n = \alpha_nE_n'$ are at this point still arbitrary, since $\alpha_n$ are arbitrary, as are $D_n'$ and $E_n'$.
To satisfy the initial conditions, we substitute the linear combination (8.57) into conditions (8.43). This procedure yields
\[
\Psi(x,0) = \sum_{n=1}^{\infty}C_n\sin\left(\frac{\pi n x}{L}\right) = \psi_1(x)\,, \tag{8.58}
\]
\[
\left.\Psi_t(x,t)\right|_{t=0} = \left\{\sum_{n=1}^{\infty}\omega_n\left[B_n\cos(\omega_n t) - C_n\sin(\omega_n t)\right]\sin\left(\frac{\pi n x}{L}\right)\right\}_{t=0} = \sum_{n=1}^{\infty}B_n\,\omega_n\sin\left(\frac{\pi n x}{L}\right) = \psi_2(x)\,. \tag{8.59}
\]
Now, what we have just obtained is quite curious: we have just one equation (8.58) for an infinite number of coefficients $C_n$, and another single equation (8.59) for an infinite set of coefficients $B_n$. How does this help in finding these coefficients? Well, although we do indeed have just one equation for either set of coefficients, these equations are written for a continuous set of $x$ values between $0$ and $L$. That means that, strictly speaking, we have an infinite number of such equations, and that must be sufficient to find all the coefficients.
Equipped with this understanding, we now need to devise a practical way of finding the coefficients $C_n$ and $B_n$. For this we note that we have already come across the functions $\sin(\pi nx/L)$ in Sect. 3.1, when considering Fourier series. We have shown there that these eigenfunctions satisfy the orthogonality relation (3.6), with the integration performed from $-L$ to $L$. Therefore, we can employ the general method developed there (see also a more fundamental discussion in Sect. 3.7.3) to find the coefficients: multiply both sides of Eqs. (8.58) and (8.59) by the function $\sin(\pi mx/L)$ with some fixed $m$ ($= 1, 2, \ldots$),
then integrate both sides with respect to $x$ from $0$ to $L$, and use the orthogonality relations (8.60). However, in our case this is not even needed: Eqs. (8.58) and (8.59) are just Fourier sine series for $\psi_1(x)$ and $\psi_2(x)$, respectively. It follows, therefore, that in both cases the expressions for the coefficients can be borrowed directly from Eq. (3.10):
\[
C_n = \frac{2}{L}\int_0^L\psi_1(x)\sin\left(\frac{\pi n x}{L}\right) dx\,, \tag{8.61}
\]
\[
B_n = \frac{2}{\omega_nL}\int_0^L\psi_2(x)\sin\left(\frac{\pi n x}{L}\right) dx\,, \tag{8.62}
\]
where $n = 1, 2, \ldots$. This result finally solves the entire problem, as the solution (8.57) with the coefficients given by Eqs. (8.61) and (8.62) satisfies the PDE and both the boundary and initial conditions.
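The complete recipe (8.57), (8.61), (8.62) is easy to turn into a short numerical routine. The sketch below is only an illustration: the triangular ("plucked") initial displacement, the zero initial velocity and the number of retained modes are assumptions made for the example.

```python
# Sketch: truncated Fourier solution (8.57) with coefficients (8.61), (8.62).
# The triangular initial displacement psi1 and zero initial velocity psi2
# are illustrative assumptions.
import numpy as np
from scipy.integrate import quad

L, c, N = 1.0, 1.0, 50     # string length, wave speed, number of modes

psi1 = lambda x: np.where(x < L/5, 0.3*x/(L/5), 0.3*(L - x)/(L - L/5))
psi2 = lambda x: 0.0

def coefficients(n):
    w_n = np.pi*n*c/L                                                  # (8.55)
    C_n = (2/L)*quad(lambda x: psi1(x)*np.sin(np.pi*n*x/L), 0, L)[0]   # (8.61)
    B_n = (2/(w_n*L))*quad(lambda x: psi2(x)*np.sin(np.pi*n*x/L), 0, L)[0]  # (8.62)
    return w_n, B_n, C_n

def Psi(x, t):
    s = 0.0
    for n in range(1, N + 1):
        w_n, B_n, C_n = coefficients(n)
        s += (B_n*np.sin(w_n*t) + C_n*np.cos(w_n*t))*np.sin(np.pi*n*x/L)
    return s

print(Psi(L/5, 0.0))   # should be close to psi1(L/5) = 0.3
```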
In the special case when the string is initially at equilibrium (the initial displacement is zero, $\psi_1(x) = 0$) and is given an initial kick (so that $\psi_2(x) \ne 0$ at least for some values of $x$), the coefficients $C_n = 0$ for all $n$. Conversely, if the string has been initially displaced, $\psi_1(x) \ne 0$, and then released so that the initial velocities $\Psi_t(x,0)$ are zero (and hence $\psi_2(x) = 0$), then all the coefficients $B_n = 0$.
Hence, the solution consists of a superposition of oscillations $\Psi_n(x,t)$ associated with different frequencies $\omega_n$, Eq. (8.55): each point $x$ of the string in the elementary oscillation $\Psi_n(x,t)$ performs a simple harmonic motion with that frequency. At the same time, if we look at the shape of the string due to the same elementary motion at a particular time $t$, it is given by the sine function $\sin(\pi nx/L)$. The elementary motion $\Psi_n(x,t)$ is called the $n$-th normal mode of vibration of the string, and its frequency of vibration, $\omega_n = \pi nc/L$, is called its $n$-th harmonic or normal mode frequency. The $n=1$ normal mode is called fundamental, with the frequency $\omega_1 = \pi c/L$. The frequencies of all other modes are integer multiples of the fundamental frequency, $\omega_n = n\omega_1$, i.e. $\omega_n$ is exactly $n$ times larger.
Note that the fundamental frequency is the lowest sound frequency the given string can produce. Recalling the expression derived in Sect. 8.2.1 for the wave velocity of the string, $c = \sqrt{T_0/\rho}$, where $T_0$ is the tension and $\rho$ the string density, we see that there are several ways of affecting the frequency
\[
\omega_1 = \frac{\pi}{L}\sqrt{\frac{T_0}{\rho}}
\]
of the fundamental mode. Indeed, taking a longer string and/or applying less tension would reduce $\omega_1$, while increasing the tension and/or taking a shorter string would increase the lowest frequency of sound the string can produce. These principles are widely used in various musical instruments such as the guitar, violin, piano, etc. For instance, in a six-string guitar, by turning the peg heads (or tuning keys) the tension in a string can be adjusted without changing its length considerably, causing the string's fundamental frequency to go either up (higher pitch) or down
(lower pitch). Also note that the thinner strings of the guitar (they are arranged in a lower position) produce a higher pitch than the thicker ones (which are set up at a higher position in the set), since thinner strings have a smaller linear density $\rho$. All six strings of a classical guitar are approximately of the same length.
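To get a rough feel for the numbers, one may evaluate $\omega_1 = \frac{\pi}{L}\sqrt{T_0/\rho}$ directly; the tension, density and length below are illustrative values only, not data from the text.

```python
# Illustrative estimate of the fundamental frequency of a stretched string.
# T0, rho and L are assumed example values, not data from the text.
import math

T0  = 70.0      # tension, N
rho = 4.0e-3    # linear mass density, kg/m
L   = 0.65      # string length, m

c  = math.sqrt(T0/rho)      # wave velocity c = sqrt(T0/rho)
w1 = math.pi*c/L            # fundamental angular frequency, omega_1 = pi c / L
f1 = w1/(2*math.pi)         # frequency in Hz, f1 = c/(2L)

print(f"c = {c:.1f} m/s, f1 = {f1:.1f} Hz")
# Doubling the tension raises f1 by sqrt(2); halving L doubles f1.
```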
Problem 8.19. Consider a string of length $L$ fixed at both ends. Assume that the string is initially (at $t=0$) pulled by $0.06$ at $x = L/5$ and then released. Show that the corresponding solution of the wave equation is
\[
\Psi(x,t) = \sum_{n=1}^{\infty}\frac{3}{4\pi^2n^2}\,\sin\left(\frac{\pi n}{5}\right)\sin\left(\frac{\pi n x}{L}\right)\cos\left(\frac{\pi c n t}{L}\right).
\]
Problem 8.21. Consider the previous problem again, but assume that the $x=L$ end of the pipe is closed as well. Show that in this case
\[
\Psi(x,t) = \sum_{n=1\,(\mathrm{odd})}^{\infty}\frac{4v_0}{cLp_n^2}\,\sin\left(p_nct\right)\sin\left(p_nx\right),
\]
where the summation runs only over odd values of $n$ and $p_n = \pi n/L$.
Let us now consider the more complex problem of forced oscillations of the string. In this case the PDE has the form
\[
\Psi_{tt} = c^2\,\Psi_{xx} + f(x,t)\,, \tag{8.63}
\]
where $f(x,t)$ is some function of $x$ and $t$. We shall again consider zero boundary conditions,
\[
\Psi(0,t) = \Psi(L,t) = 0\,, \tag{8.64}
\]
and general initial conditions,
\[
\Psi(x,0) = \psi_1(x) \quad\text{and}\quad \Psi_t(x,0) = \psi_2(x)\,. \tag{8.65}
\]
The method of separation of variables is not directly applicable here because of the function $f(x,t)$. Still, as we shall see below, even in the general case of arbitrary $f(x,t)$, it is easy to reformulate the problem in such a way that it can be solved using the method of separation of variables.

Indeed, let us seek the solution $\Psi(x,t)$ as a sum of two functions: $U(x,t)$ and $V(x,t)$. The first one satisfies the following homogeneous problem with zero boundary conditions and the original initial conditions,
\[
c^2U_{xx} = U_{tt}\,; \quad U(0,t) = U(L,t) = 0\,; \quad U(x,0) = \psi_1(x) \ \text{ and } \ U_t(x,0) = \psi_2(x)\,, \tag{8.66}
\]
while the other function satisfies the inhomogeneous PDE with both zero initial and boundary conditions:
\[
c^2V_{xx} + f(x,t) = V_{tt}\,; \quad V(0,t) = V(L,t) = 0\,; \quad V(x,0) = V_t(x,0) = 0\,. \tag{8.67}
\]
It is easy to see that the function $\Psi = U + V$ satisfies the original problem (8.63)–(8.65). The beauty of this separation into two problems (which is reminiscent of splitting the solution of an inhomogeneous linear DE into a complementary solution and a particular integral) is that the $U$ problem is identical to the one we considered before in Sect. 8.2.5 and hence can be solved by the method of separation of variables. It takes full care of our general initial conditions. Therefore, we only need to consider the second problem, related to the function $V(x,t)$, which contains the inhomogeneity. We need to find just one solution (the particular integral) of this second problem. Recall (Sect. 8.2.4) that the solution of the problem (8.67) is unique.
To find the particular integral $V(x,t)$, we shall use the following trick. This function is specified on the finite interval $0 \le x \le L$ only, and is equal to zero at its ends. However, we are free to extend ("continue") it to the twice larger interval $-L \le x \le L$. It is also convenient to define $V(x,t)$ such that it is odd in the whole interval: $V(-x,t) = -V(x,t)$. Moreover, after that, we can also periodically repeat $V(x,t)$ thus defined over the whole $x$ axis. This makes $V(x,t)$ periodic and hence expandable into a Fourier series (at each $t$) with the period $2L$:
\[
V(x,t) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left[W_n\cos\left(\frac{\pi n x}{L}\right) + V_n\sin\left(\frac{\pi n x}{L}\right)\right].
\]
Since $V(x,t)$ has been constructed to be odd, the free term and the cosine terms vanish, and we are left with the sine series
\[
V(x,t) = \sum_{n=1}^{\infty}V_n(t)\sin\left(\frac{\pi n x}{L}\right), \tag{8.68}
\]
with time-dependent coefficients $V_n(t)$. Substituting this expansion into the inhomogeneous PDE of the problem (8.67) gives
\[
\sum_{n=1}^{\infty}\left[-c^2\left(\frac{\pi n}{L}\right)^2V_n(t)\right]\sin\left(\frac{\pi n x}{L}\right) + f(x,t) = \sum_{n=1}^{\infty}\ddot{V}_n(t)\sin\left(\frac{\pi n x}{L}\right);
\]
here $\ddot{V}_n = \partial^2V_n/\partial t^2$. Multiply both sides by $X_m(x) = \sin(\pi mx/L)$ with some integer $m$ and then integrate with respect to $x$ between $0$ and $L$. Because the eigenfunctions $X_n$ are orthogonal, only the single term with $n = m$ will be left in both sums, yielding
\[
-c^2\left(\frac{\pi m}{L}\right)^2\frac{L}{2}\,V_m(t) + \int_0^Lf(x,t)\sin\left(\frac{\pi m x}{L}\right)dx = \frac{L}{2}\,\ddot{V}_m(t)\,,
\]
or
\[
\ddot{V}_m(t) + \omega_m^2\,V_m(t) = f_m(t)\,, \tag{8.69}
\]
where $\omega_m = \pi cm/L$ and
\[
f_m(t) = \frac{2}{L}\int_0^Lf(x,t)\sin\left(\frac{\pi m x}{L}\right)dx\,. \tag{8.70}
\]
This is the ordinary DE of a forced harmonic oscillator for each Fourier coefficient $V_m(t)$; the right-hand side $f_m(t)$ may be interpreted as the $m$-th coefficient of a sine Fourier expansion of the force, Eq. (8.71).
However, great care is needed here when offering this "interpretation". Indeed, although originally the function $f(x,t)$ was defined only for $0 \le x \le L$, it can additionally (and arbitrarily) be "defined" for $-L < x < 0$ as well. Then the piece thus defined on the interval $-L < x \le L$ can be periodically repeated over the whole $x$ axis, justifying an expansion of $f(x,t)$ into a Fourier series. This series, however, will contain all terms, including the free and the cosine terms. Since $f(x,t)$ may not in general be equal to zero at the boundaries $x=0$ and $x=L$, it is impossible to justify the sine Fourier series in Eq. (8.71) for $f(x,t)$. At the same time, we arrived at Eq. (8.70) for the coefficients $f_m(t)$ without assuming anything about the function $f(x,t)$.
So, what is left to do is to solve the DE (8.69) with respect to $V_m(t)$. At this point it is convenient to recall that $V(x,t)$ must satisfy zero initial conditions. This can only be accomplished if for all values of $m$ we have
\[
V_m(0) = \left.\frac{d}{dt}V_m(t)\right|_{t=0} = 0\,.
\]
Problem 8.22. Show that the general solution of the DE of a forced harmonic oscillator, $y'' + \omega^2y = f(x)$, is
\[
y(x) = \left[C_1 + \frac{1}{2i\omega}\int_0^xf(x_1)\,e^{-i\omega x_1}dx_1\right]e^{i\omega x} + \left[C_2 - \frac{1}{2i\omega}\int_0^xf(x_1)\,e^{i\omega x_1}dx_1\right]e^{-i\omega x}\,. \tag{8.73}
\]
Show then that for zero initial conditions, $y(0) = y'(0) = 0$, this solution can be rewritten as
\[
y(x) = \frac{1}{\omega}\int_0^xf(x_1)\sin\left[\omega(x - x_1)\right]dx_1\,. \tag{8.74}
\]
Using formula (8.74) of the Problem and Eq. (8.70), we can write down the solution of the DE (8.69) as
\[
V_m(t) = \frac{1}{\omega_m}\int_0^tf_m(t_1)\sin\left[\omega_m(t - t_1)\right]dt_1 = \frac{2}{\omega_mL}\int_0^tdt_1\int_0^Ldx\,f(x,t_1)\sin\left[\omega_m(t - t_1)\right]\sin\left(\frac{\pi m x}{L}\right). \tag{8.75}
\]
Once the Fourier coefficients $V_m(t)$ are defined via the function $f(x,t)$, the full Fourier series (8.68) is completely defined for the auxiliary function $V(x,t)$, yielding the solution as $\Psi = U + V$. This fully solves our problem.
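Equation (8.75) is a double integral that can be evaluated numerically for any given force. Below is a sketch only; the driving force $f(x,t)$ and the parameters are illustrative assumptions.

```python
# Sketch: numerical evaluation of the coefficients V_m(t) of Eq. (8.75).
# The driving force f(x,t) below is an illustrative assumption.
import numpy as np
from scipy.integrate import dblquad

L, c = 1.0, 1.0
f = lambda x, t: np.sin(2*np.pi*x/L)*np.cos(3.0*t)   # example force

def V_m(m, t):
    w_m = np.pi*m*c/L
    # (2/(w_m L)) * int_0^t dt1 int_0^L dx f(x,t1) sin(w_m(t-t1)) sin(pi m x/L)
    val, _ = dblquad(lambda x, t1: f(x, t1)*np.sin(w_m*(t - t1))
                     *np.sin(np.pi*m*x/L),
                     0, t, lambda t1: 0.0, lambda t1: L)
    return 2*val/(w_m*L)

print(V_m(2, 1.0))   # only m = 2 couples to this particular force
```

Because this particular force contains only the $m=2$ spatial harmonic, all other coefficients $V_m(t)$ vanish by orthogonality, which is a useful consistency check.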
Now we are prepared to solve the most general boundary problem, in which both initial and boundary conditions are arbitrary and the equation is inhomogeneous:
\[
\Psi_{tt} = c^2\,\Psi_{xx} + f(x,t)\,; \quad \Psi(x,0) = \psi_1(x)\,, \ \Psi_t(x,0) = \psi_2(x)\,; \quad \Psi(0,t) = \varphi_1(t)\,, \ \Psi(L,t) = \varphi_2(t)\,.
\]
The trick here is to first introduce an auxiliary function, $U(x,t)$, which satisfies the boundary conditions above:
\[
U(0,t) = \varphi_1(t) \quad\text{and}\quad U(L,t) = \varphi_2(t)\,.
\]
There could be many choices to accommodate this requirement. The simplest choice seems to be a function linear in $x$:
\[
U(x,t) = \varphi_1(t) + \frac{x}{L}\left[\varphi_2(t) - \varphi_1(t)\right]. \tag{8.79}
\]
It is easy to see why this is useful: with this choice the function $V(x,t) = \Psi(x,t) - U(x,t)$ satisfies zero boundary conditions. Of course, the PDE for the function $V(x,t)$ will look more complex, and the initial conditions will be modified. However, it is easy to see that the problem for this new function can be solved, since we have eliminated the main difficulty of having arbitrary boundary conditions. Then, the full solution is obtained via $\Psi(x,t) = U(x,t) + V(x,t)$.
Problem 8.23. Show that the problem for the auxiliary function $V(x,t)$ corresponds to solving an inhomogeneous wave equation with general initial and zero boundary conditions:
\[
V_{tt} = c^2V_{xx} + \tilde{f}(x,t)\,; \quad V(x,0) = \tilde{\psi}_1(x)\,, \ V_t(x,0) = \tilde{\psi}_2(x)\,; \quad V(0,t) = V(L,t) = 0\,,
\]
where
\[
\tilde{f}(x,t) = f(x,t) - \varphi_1''(t) - \frac{x}{L}\left[\varphi_2''(t) - \varphi_1''(t)\right],
\]
\[
\tilde{\psi}_1(x) = \psi_1(x) - \varphi_1(0) - \frac{x}{L}\left[\varphi_2(0) - \varphi_1(0)\right],
\]
\[
\tilde{\psi}_2(x) = \psi_2(x) - \varphi_1'(0) - \frac{x}{L}\left[\varphi_2'(0) - \varphi_1'(0)\right].
\]
The full solution of this particular problem was given in Sect. 8.2.6.
Problem 8.24. Consider the following stationary (i.e. fully time independent) problem:
\[
c^2\,\Psi_{xx} + f(x) = 0\,; \quad \Psi(0) = \varphi_1\,, \ \Psi(L) = \varphi_2\,,
\]
where $\varphi_1$ and $\varphi_2$ are constants. Note that we do not have any initial conditions here; it is assumed that if the system was subjected to some initial conditions with stationary boundary conditions and a time independent external force, then after a very long time the system would no longer "remember" the initial conditions; their effect will be washed out completely.
Integrate the PDE twice with respect to $x$ (keeping the arbitrary constants) to show that
\[
\Psi(x) = C_1x + C_2 - \frac{1}{c^2}\int_0^xdx_1\int_0^{x_1}f(x_2)\,dx_2\,.
\]
Then, applying the boundary conditions, show that
\[
C_1 = \frac{1}{L}\left[\varphi_2 - \varphi_1 + \frac{1}{c^2}\int_0^Ldx_1\int_0^{x_1}f(x_2)\,dx_2\right]
\]
and $C_2 = \varphi_1$.
Problem 8.25. Consider a general problem with arbitrary initial conditions, stationary boundary conditions and a time independent function $f(x,t) = f(x)$:
\[
\Psi_{tt} = c^2\Psi_{xx} + f(x)\,; \quad \Psi(x,0) = \psi_1(x)\,, \ \Psi_t(x,0) = \psi_2(x)\,; \quad \Psi(0,t) = \varphi_1\,, \ \Psi(L,t) = \varphi_2\,.
\]
Construct the solution as $\Psi(x,t) = U(x) + V(x,t)$, where $U(x)$ is the solution of the corresponding stationary problem from Problem 8.24. Show that the function $V(x,t)$ satisfies the following homogeneous problem with zero boundary conditions:
\[
c^2V_{xx} = V_{tt}\,; \quad V(0,t) = V(L,t) = 0\,; \quad V(x,0) = \psi_1(x) - U(x)\,, \ V_t(x,0) = \psi_2(x)\,.
\]
The method of solving the wave equation in the 1D case considered above can be generalised to the 2D and 3D cases as well. To illustrate this point, we shall consider here the 2D case of transverse oscillations of a square membrane fixed around its boundary (this corresponds to zero boundary conditions). The membrane is shown in Fig. 8.3; it is stretched in the $x$–$y$ plane over the intervals $0 \le x \le L$ and $0 \le y \le L$.

Let $\Psi(x,y,t)$ be the transverse displacement of the point $(x,y)$ of the membrane (basically, $\Psi$ is the $z$ coordinate of the point $(x,y)$ of the oscillating membrane). Then, the corresponding wave equation to solve is $\Delta\Psi = \frac{1}{c^2}\Psi_{tt}$, or
\[
\frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} = \frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2}\,. \tag{8.80}
\]
The membrane is fixed along its perimeter, so that the appropriate boundary conditions are
\[
\Psi(x,0,t) = \Psi(x,L,t) = 0\,, \quad 0 \le x \le L\,; \qquad \Psi(0,y,t) = \Psi(L,y,t) = 0\,, \quad 0 \le y \le L\,. \tag{8.81}
\]
The initial conditions specify the initial displacement and velocity of each point of the membrane:
\[
\Psi(x,y,0) = \psi_1(x,y) \quad\text{and}\quad \Psi_t(x,y,0) = \psi_2(x,y)\,. \tag{8.82}
\]
We seek elementary product solutions of the form
\[
\Psi(x,y,t) = X(x)\,Y(y)\,T(t)\,. \tag{8.83}
\]
Substituting this trial solution into the PDE (8.80), one gets
\[
X''\,Y\,T + X\,Y''\,T = \frac{1}{c^2}\,X\,Y\,T''\,,
\]
which upon dividing by $XYT$ results in
\[
\frac{X''}{X} + \frac{Y''}{Y} = \frac{1}{c^2}\frac{T''}{T}\,. \tag{8.84}
\]
Here on the left-hand side the two terms each depend only on their own variable, $x$ or $y$, respectively, while the function on the right-hand side depends only on the variable $t$. This is only possible if each of the expressions $X''/X$, $Y''/Y$ and $T''/T$ is a constant. Therefore, we can write
\[
\frac{X''}{X} = k_1\,, \quad \frac{Y''}{Y} = k_2\,, \quad\text{and hence}\quad \frac{1}{c^2}\frac{T''}{T} = k_1 + k_2\,, \tag{8.85}
\]
where $k_1$ and $k_2$ are two independent separation constants. Thus, we have three ordinary DEs, one for each of the three functions:
\[
X'' = k_1X\,, \tag{8.86}
\]
\[
Y'' = k_2Y\,, \tag{8.87}
\]
\[
T'' = c^2\left(k_1 + k_2\right)T\,. \tag{8.88}
\]
The next steps are similar to the 1D case considered above: we first consider the equations for $X(x)$ and $Y(y)$, trying to satisfy the boundary conditions; this gives us the permissible values of the separation constants $k_1$ and $k_2$. Once this is done, the DE for $T(t)$ is solved. Finally, a general solution is constructed as a linear combination of all product solutions, with the coefficients to be determined from the initial conditions.

So, following this plan, let us consider the DE for the function $X(x)$, Eq. (8.86). The boundary conditions on the product (8.83) at $x=0$ and $x=L$ require the function $X(x)$ to satisfy $X(0) = X(L) = 0$. This problem is fully equivalent to the one we considered in Sect. 8.2.5 when discussing the one-dimensional string. Therefore, it immediately follows that $k_1 = -\lambda_n^2$ must be negative, with $\lambda_n$ taking on the discrete values
\[
\lambda_n = \frac{\pi}{L}\,n\,, \quad n = 1, 2, \ldots\,,
\]
while the allowed solutions for $X(x)$ are given by the eigenfunctions
\[
X_n(x) = \sin\left(\frac{\pi n}{L}\,x\right), \tag{8.89}
\]
corresponding to the eigenvalues $k_1 = -\lambda_n^2$ for any $n = 1, 2, \ldots$. We do not need to bother about a constant amplitude (a prefactor to the sine function) here, as it will be absorbed by other constants in $T(t)$ in the elementary product solution; recall that this is exactly what happened in the case of the one-dimensional string. So we simply choose it equal to one here.
The boundary conditions (8.81) applied to (8.87) give a similar result for $k_2$, namely $k_2 = -\mu_m^2$ with $\mu_m = \pi m/L$, where $m = 1, 2, \ldots$ is another positive integer, and the corresponding eigenfunctions are
\[
Y_m(y) = \sin\left(\frac{\pi m}{L}\,y\right), \quad m = 1, 2, \ldots\,. \tag{8.90}
\]
Again, we do not keep an amplitude (prefactor) to the sine function, as it will be absorbed by other constants appearing in $T(t)$.
Next we consider Eq. (8.88) for $T(t)$:
\[
T'' + c^2\left(\lambda_n^2 + \mu_m^2\right)T = 0\,, \tag{8.91}
\]
which contains both $n$ and $m$. It is a harmonic oscillator equation with the solution
\[
T_{nm}(t) = A_{nm}\sin(\omega_{nm}t) + B_{nm}\cos(\omega_{nm}t)\,, \quad \omega_{nm} = \frac{\pi c}{L}\sqrt{n^2 + m^2}\,.
\]
Collecting the pieces and summing, by virtue of the superposition principle, over all elementary product solutions, we arrive at the general solution
\[
\Psi(x,y,t) = \sum_{n,m=1}^{\infty}\left[A_{nm}\sin(\omega_{nm}t) + B_{nm}\cos(\omega_{nm}t)\right]\sin\left(\frac{\pi n}{L}x\right)\sin\left(\frac{\pi m}{L}y\right), \tag{8.94}
\]
which satisfies the PDE and the zero boundary conditions. Applying the initial condition for the displacement, $\Psi(x,y,0) = \psi_1(x,y)$, gives
\[
\sum_{n,m=1}^{\infty}B_{nm}\sin\left(\frac{\pi n}{L}x\right)\sin\left(\frac{\pi m}{L}y\right) = \psi_1(x,y)\,.
\]
Here $\Psi(x,y,0)$ is expanded into a double Fourier series with respect to the sine functions, so that the expansion coefficients, $B_{nm}$, are found from it in the following way. Multiply both sides of the above equation by the product $X_{n'}(x)Y_{m'}(y) = \sin\left(\frac{\pi n'}{L}x\right)\sin\left(\frac{\pi m'}{L}y\right)$ with some fixed positive integers $n'$ and $m'$, and then integrate
both sides over $x$ and $y$ between $0$ and $L$. The eigenfunctions $X_n(x)$ for all values of $n = 1, 2, \ldots$ form an orthogonal set, and so do the functions $Y_m(y)$. Therefore, on the left-hand side the integration over $x$ leaves only the single $n = n'$ term in the sum over $n$, while the integration over $y$ results in the single $m = m'$ term of the $m$-sum remaining; on the right-hand side there is a double integral with respect to $x$ and $y$. Hence, we immediately obtain
\[
B_{nm} = \left(\frac{2}{L}\right)^2\int_0^Ldx\int_0^Ldy\,\psi_1(x,y)\sin\left(\frac{\pi n}{L}x\right)\sin\left(\frac{\pi m}{L}y\right). \tag{8.95}
\]
To obtain the coefficients $A_{nm}$, we apply the initial condition to the time derivative of $\Psi(x,y,t)$ at $t=0$. Since
\[
\Psi_t(x,y,t) = \frac{\partial\Psi}{\partial t} = \sum_{n,m=1}^{\infty}\omega_{nm}\left[A_{nm}\cos(\omega_{nm}t) - B_{nm}\sin(\omega_{nm}t)\right]\sin\left(\frac{\pi n}{L}x\right)\sin\left(\frac{\pi m}{L}y\right),
\]
we must have $\Psi_t(x,y,0) = \psi_2(x,y)$, and therefore, after reasoning similar to that used to derive the above formula for the $B_{nm}$ coefficients, we obtain
\[
A_{nm} = \frac{1}{\omega_{nm}}\left(\frac{2}{L}\right)^2\int_0^Ldx\int_0^Ldy\,\psi_2(x,y)\sin\left(\frac{\pi n}{L}x\right)\sin\left(\frac{\pi m}{L}y\right). \tag{8.96}
\]
Equations (8.94)–(8.96) fully solve the problem, as they define a function $\Psi(x,y,t)$ that satisfies the PDE and both the initial and boundary conditions. The case of a circular membrane, also treated using the method of separation of variables, but employing polar coordinates, was considered in detail in Sect. 4.7.5.
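The double Fourier coefficients (8.95) are easily computed by 2D quadrature. The sketch below is an illustration only: the initial membrane shape $\psi_1$, the assumption $\psi_2 = 0$ (a released membrane) and the truncation are example choices.

```python
# Sketch: coefficients B_nm of Eq. (8.95) by 2D quadrature and the membrane
# displacement (8.94) for a released (psi_2 = 0) initial shape psi_1;
# the chosen psi_1 is an illustrative assumption.
import numpy as np
from scipy.integrate import dblquad

L, c, N = 1.0, 1.0, 8
psi1 = lambda x, y: x*(L - x)*y*(L - y)        # example initial displacement

def B(n, m):
    val, _ = dblquad(lambda y, x: psi1(x, y)*np.sin(np.pi*n*x/L)
                     *np.sin(np.pi*m*y/L),
                     0, L, lambda x: 0.0, lambda x: L)
    return (2/L)**2 * val

def Psi(x, y, t):
    s = 0.0
    for n in range(1, N + 1):
        for m in range(1, N + 1):
            w = np.pi*c/L*np.hypot(n, m)       # omega_nm = (pi c/L) sqrt(n^2+m^2)
            s += B(n, m)*np.cos(w*t)*np.sin(np.pi*n*x/L)*np.sin(np.pi*m*y/L)
    return s

print(Psi(L/2, L/2, 0.0))   # ~ psi1(L/2, L/2) = 0.0625
```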
We have seen above in Sect. 8.2.7 that the Fourier method plays a central role in solving a general problem with non-zero boundary and general initial conditions. Therefore, it is worth repeating the main steps which serve as the foundation of this method.

The method of separation of variables is only applicable to problems with zero boundary conditions. For a PDE in any number of dimensions, written for any number of coordinates, the construction of the general solution is based on several well defined steps:
1. Construct a product solution as a product of functions, each depending on just one variable.
2. Substitute the product solution into the PDE and attempt to separate the variables in the resulting equation. This may be done in steps. The simple separation like the one in Eq. (8.84) may not always be possible; instead, a step-by-step method is to be used. This works as follows: using algebraic manipulations, "localise" all terms depending on a single variable into a single expression that does not contain the other variables; hence, this expression must be a constant. Equate this expression to the first constant $k_1$, yielding the first ordinary DE for the variable in question. Once the whole term in the PDE is replaced by that constant, the obtained equation will depend only on the other variables, for which the same procedure is repeated until only the last single variable is left; this will give an ordinary DE containing all the separation constants.
3. This procedure gives separated ordinary DEs for each of the functions in the elementary product we started from.
4. Solve the ordinary DEs for the functions in the product using the zero boundary conditions. This procedure gives an infinite number of possible solutions for these functions, together with the possible values of the corresponding separation constants, i.e. a set of eigenfunctions and the eigenvalues corresponding to them. It is now possible to write down a set of product solutions $\Psi_i$ (where $i = 1, 2, \ldots$) for the PDE. Each of these solutions satisfies the equation and the zero boundary conditions.
5. Next we use the superposition principle to construct the general solution of the PDE in the form of a linear combination of all elementary product solutions:
\[
\Psi = \sum_{i=1}^{\infty}C_i\Psi_i\,. \tag{8.97}
\]
6. Finally, find the coefficients $C_i$ from the initial conditions.

Whether the variables in a PDE can actually be separated may depend on the coordinate system used. Consider, for instance, the following 2D PDE:
\[
\Delta\Psi + \left(x^2 + y^2\right)^{\alpha}\Psi = 0\,, \tag{8.98}
\]
where $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$ is the 2D Laplacian operator. An application of the product solution $\Psi(x,y) = X(x)Y(y)$ in this case gives
\[
\frac{1}{X}\frac{d^2X}{dx^2} + \frac{1}{Y}\frac{d^2Y}{dy^2} + \left(x^2 + y^2\right)^{\alpha} = 0\,. \tag{8.99}
\]
We see that in Cartesian coordinates Eq. (8.98) is not separable; this is because of the free term $\left(x^2 + y^2\right)^{\alpha}$.
This, however, does not necessarily mean that the PDE is not separable at all: in some special coordinate system it may still turn out to be separable! Indeed, suppose now that we transform the PDE (8.98) to polar coordinates $(r,\phi)$. This procedure gives
\[
\frac{1}{r}\frac{\partial}{\partial r}\left(r\,\frac{\partial\Psi}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2\Psi}{\partial\phi^2} + r^{2\alpha}\Psi = 0\,. \tag{8.100}
\]
Substituting the product solution $\Psi = R(r)\Phi(\phi)$ and dividing by $R\Phi$, we obtain
\[
\frac{1}{rR}\left(rR'\right)' + \frac{1}{r^2}\left[\frac{\Phi''}{\Phi}\right] + r^{2\alpha} = 0\,.
\]
Here the expression in the square brackets is "localised" in the variable $\phi$ and hence must be a constant. Let us call it $k$. Hence, we obtain two ordinary DEs:
\[
\Phi'' = k\,\Phi \quad\text{and}\quad \frac{1}{rR}\left(rR'\right)' + \frac{k}{r^2} + r^{2\alpha} = 0\,.
\]
We have succeeded in separating our PDE into two ordinary DEs in the polar coordinate system, something which was impossible to do in the Cartesian system. We see that a PDE may be separable in one coordinate system, but not in another!
This shows the importance of knowing various curvilinear coordinate systems!
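A computer algebra system makes such checks painless. The following sketch (using sympy; the symbol names and the printed form are choices made for the example) substitutes the product ansatz $\Psi = R(r)\Phi(\phi)$ into the polar form (8.100) and shows that, after division by $\Psi$, the $\phi$-dependence collapses into the single combination $\Phi''/\Phi$:

```python
# A quick symbolic check (a sketch using sympy) that Psi = R(r)*Phi(phi)
# separates the polar form (8.100) of the PDE; alpha is a free parameter.
import sympy as sp

r, phi, alpha = sp.symbols('r phi alpha', positive=True)
R = sp.Function('R')(r)
Phi = sp.Function('Phi')(phi)

Psi = R*Phi
pde = (sp.diff(r*sp.diff(Psi, r), r)/r
       + sp.diff(Psi, phi, 2)/r**2
       + r**(2*alpha)*Psi)

# Divide through by Psi: the phi-dependence appears only via Phi''/Phi.
expr = sp.expand(sp.simplify(pde/Psi))
print(expr)
```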
Problem 8.26. Show similarly that the PDE
\[
\frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + e^{\alpha\left(x^2 + y^2\right)}\Psi = 0
\]
is not separable in Cartesian coordinates, but is separable in polar coordinates.

Problem 8.27. Investigate the separability of the 3D Laplace equation,
\[
\frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + \frac{\partial^2\Psi}{\partial z^2} = 0\,.
\]
An important particular case is that of stationary (time-independent) solutions: when all time derivatives vanish, we arrive at the Laplace equation
\[
\Delta\Psi = 0\,. \tag{8.101}
\]

8.3 Heat Conduction Equation

The heat transport (diffusion) equation in one dimension has the form
\[
\Psi_t = \kappa\,\Psi_{xx} + f\,, \tag{8.102}
\]
where $\kappa$ is the thermal diffusivity and the function $f(x,t)$ describes internal heat sources. The equation is supplemented by the initial condition specifying the temperature distribution at $t=0$,
\[
\Psi(x,0) = \psi(x)\,. \tag{8.103}
\]
Note that since our PDE has only the first time derivative, this condition is sufficient. Various boundary conditions can be supplied. The simplest ones correspond to certain temperatures at the end points of the interval⁴:
\[
\Psi(0,t) = \varphi_1(t) \quad\text{and}\quad \Psi(L,t) = \varphi_2(t)\,; \tag{8.104}
\]
however, other possibilities also exist. For instance, if at the $x=0$ end the heat flux is known, then the above boundary condition at $x=0$ is replaced by a condition imposed on the spatial derivative $\Psi_x(0,t)$.

⁴ It can be shown that the functions $\varphi_1$ and $\varphi_2$ do not need to satisfy the consistency conditions $\varphi_1(0) = \psi(0)$ and $\varphi_2(0) = \psi(L)$.
For simplicity, most of our analysis in the coming discussion will correspond to the 1D heat transport equation, although the theorems and methods to be considered can be generalised to the 2D and 3D cases as well.

As in the case of the wave equation, one may question whether the above conditions (8.103) and (8.104), which supplement the heat transport equation (8.102), guarantee that the solution exists and that it is unique. We shall answer the first part of that question positively later on, by showing explicitly how the solution can be constructed. Here we shall prove the second part, that under certain conditions the solution of the heat transport problem is unique. We start from the following theorem.
Theorem 8.1. If $\Psi(x,t)$ is a continuous function of both its variables for all values of $x$, i.e. for $0 \le x \le L$, and all times up to some time $T$, i.e. for $0 \le t \le T$, and $\Psi$ satisfies the PDE (without the internal sources term)
\[
\Psi_t = \kappa\,\Psi_{xx} \tag{8.105}
\]
for all internal points $0 < x < L$ and times $0 < t \le T$, then $\Psi(x,t)$ reaches its maximum and minimum values either at the initial time $t=0$ and/or at the boundaries $x=0$ and $x=L$.
Proof. Consider the first part of the theorem, stating that $\Psi$ reaches its maximum value either at the initial time and/or at the end points. We shall prove this by contradiction, assuming that $\Psi$ reaches its maximum value at some internal point $(x_0,t_0)$, where $0 < t_0 \le T$ and $0 < x_0 < L$. Let $M$ be the maximum value of $\Psi$ at the initial time ($t=0$) and at the boundaries ($x=0$ and $x=L$), and let
\[
\Psi(x_0,t_0) = M + \epsilon
\]
with some $\epsilon > 0$. Consider the auxiliary function $V(x,t) = \Psi(x,t) + k\,(t_0 - t)$, where the positive constant $k$ is chosen such that $kT < \epsilon/2$. Then for any point of the region
\[
k\,(t_0 - t) \le k\,(T - t) \le kT < \frac{\epsilon}{2}\,.
\]
Hence, if we now consider any of the end points $(x,0)$, $(0,t)$ and $(L,t)$, then for them
\[
V(x,t) \le M + \frac{\epsilon}{2}\,. \tag{8.106}
\]
At the same time, $V(x_0,t_0) = \Psi(x_0,t_0) = M + \epsilon$. Let $(x_1,t_1)$ be the point at which the continuous function $V(x,t)$ reaches its maximum value over the whole region. Then
\[
V(x_1,t_1) \ge V(x_0,t_0) = M + \epsilon = M + \frac{\epsilon}{2} + \frac{\epsilon}{2}\,.
\]
However, because at the end points we have that $V(x,t) \le M + \epsilon/2$, see Eq. (8.106), the above inequality cannot be satisfied there. Hence, the point $(x_1,t_1)$ must be an internal point satisfying $0 < x_1 < L$ and $0 < t_1 \le T$.

Since this internal point is a maximum of a function of two variables, $x$ and $t$, we should at least have (see Sects. I.5.10.1 and I.5.10.2)⁵
\[
V_{xx}(x_1,t_1) \le 0 \quad\text{and}\quad V_t(x_1,t_1) \ge 0\,,
\]
and therefore
\[
\Psi_t - \kappa\,\Psi_{xx} = V_t + k - \kappa\,V_{xx} \ge k > 0\,,
\]
⁵ The other sufficient condition (see Eq. (I.5.90)), that $V_{xx}V_{tt} - \left(V_{xt}\right)^2 > 0$, is not needed here.
which means that the PDE is not satisfied at the point $(x_1,t_1)$. We have arrived at a contradiction with the conditions of the theorem, which state that $\Psi$ does satisfy the equation at any internal point. Therefore, our assumption was wrong.

The case of the minimum, corresponding to the second part of the theorem, is proven by noting that it can be directly related to the case we have just considered (the maximum) if the function $\Psi' = -\Psi$ is considered instead. Q.E.D.
The result we have just obtained has a clear physical meaning: since internal sources are absent (Eq. (8.105) is homogeneous), heat cannot be created in the system during the heat flow, and hence the temperature cannot exceed either its initial values or its values at the boundaries.

This theorem has a number of important consequences, of which we only consider the one related to the uniqueness of the solution of the heat transport equation with general boundary conditions.
Theorem 8.2. The solution of the heat conduction problem (8.102)–(8.104), if it exists, is unique.

Proof. Indeed, assume that there are two such solutions, $\Psi_1$ and $\Psi_2$, both satisfying the same PDE and the same initial and boundary conditions. Consider then their difference, $V = \Psi_1 - \Psi_2$. It satisfies the homogeneous PDE and zero initial and boundary conditions. Since we have just shown that the solution of the homogeneous problem can reach its maximum and minimum values only at the end points (i.e. at $t=0$ or at $x=0$ or $x=L$), where $V$ is zero, it follows that $V(x,t) = 0$ everywhere. Hence, $\Psi_1 = \Psi_2$: our assumption is wrong, and there is only one possible solution to the problem. Q.E.D.
Before considering the general problem (8.102)–(8.104), we shall first discuss the simplest case of a one-dimensional problem with zero boundary conditions:
\[
\Psi_t = \kappa\,\Psi_{xx}\,; \quad \Psi(x,0) = \psi(x)\,; \quad \Psi(0,t) = \Psi(L,t) = 0\,.
\]
We shall solve this problem using the method of separation of variables already discussed above. Assuming the product solution $\Psi(x,t) = X(x)T(t)$, substituting it into the PDE and separating the variables, we obtain
\[
X\,T' = \kappa\,X''\,T \;\Longrightarrow\; \frac{1}{\kappa}\frac{T'}{T} = \frac{X''}{X}\,.
\]
Since the expressions on both sides of the equality sign depend on different variables, each of them must be a constant. Let us denote this separation constant $\lambda$. Hence, two ordinary DEs are obtained for the two functions $X(x)$ and $T(t)$:
\[
X'' = \lambda X \quad\text{and}\quad T' = \lambda\kappa\,T\,.
\]
In our discussion of the similar problem for the wave equation in Sect. 8.2.5, we arrived at something similar. Our next step there was to solve the equation for the function $X(x)$ first, to deduce the possible values of the separation constant. A peculiarity of the current problem, related to the heat transport equation, is that here it is convenient to start instead by solving the equation for the function $T(t)$.

Indeed, the solution reads $T(t) = Ce^{\lambda\kappa t}$, where $C$ is an arbitrary constant. It is clear from this result that the constant $\lambda$ cannot be positive: in the case of $\lambda > 0$ the temperature in the system would grow indefinitely with time, which contradicts physics. Therefore, we conclude immediately that $\lambda$ must be non-positive, i.e. negative or equal to zero: $\lambda \le 0$. We can then write it as $\lambda = -p^2$ with $p$ being non-negative. Hence, the solution for $T(t)$ must be
\[
T(t) = C\,e^{-p^2\kappa t}\,. \tag{8.109}
\]
Next, we consider the equation for $X(x)$, which now has the form of the harmonic oscillator equation:
\[
X'' + p^2X = 0\,,
\]
whose general solution is $X(x) = A\sin(px) + B\cos(px)$. The zero boundary conditions give $B = 0$ and $\sin(pL) = 0$, i.e. exactly as for the string we obtain the eigenvalues $p_n = \pi n/L$ and the eigenfunctions $X_n(x) = \sin(p_nx)$. Hence the elementary product solutions
\[
\Psi_n(x,t) = e^{-p_n^2\kappa t}\sin(p_nx)
\]
each satisfy the PDE and the boundary conditions. Actually, since $X_0(x) = 0$, there is no need to consider the $n=0$ function $\Psi_0(x,t)$; it can be dropped.
Our strategy from this point on should be obvious: the PDE is linear and hence, according to the superposition principle, a general linear combination of the elementary product solutions,
\[
\Psi(x,t) = \sum_{n=1}^{\infty}\alpha_n\Psi_n(x,t) = \sum_{n=1}^{\infty}\alpha_n\,e^{-p_n^2\kappa t}\sin(p_nx)\,, \tag{8.112}
\]
is constructed, which also satisfies the PDE and the boundary conditions for arbitrary constants $\{\alpha_n\}$. Note that we started the summation from $n=1$. This combination may, however, not satisfy the initial condition. So, we have to find such coefficients $\alpha_n$ which would ensure this as well. Applying the initial condition to the function above,
\[
\Psi(x,0) = \psi(x) \;\Longrightarrow\; \sum_{n=1}^{\infty}\alpha_n\sin(p_nx) = \psi(x)\,,
\]
we see that this corresponds to expanding the function $\psi(x)$ into a sine Fourier series. Therefore, the coefficients $\alpha_n$ can be derived without difficulty (see also Sect. 8.2.5):
\[
\alpha_n = \frac{2}{L}\int_0^L\psi(x)\sin(p_nx)\,dx\,. \tag{8.113}
\]
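A short sketch implementing the series solution (8.112) with coefficients (8.113); the initial profile $\psi(x)$, the diffusivity and the truncation below are illustrative assumptions.

```python
# Sketch: sine-series solution (8.112)/(8.113) of the heat equation with
# zero boundary conditions; psi(x), kappa, L are illustrative assumptions.
import numpy as np
from scipy.integrate import quad

L, kappa, N = 1.0, 0.1, 60
psi = lambda x: np.sin(np.pi*x/L) + 0.5*np.sin(3*np.pi*x/L)

def alpha(n):
    p_n = np.pi*n/L
    return (2/L)*quad(lambda x: psi(x)*np.sin(p_n*x), 0, L)[0]   # (8.113)

def Psi(x, t):
    return sum(alpha(n)*np.exp(-(np.pi*n/L)**2*kappa*t)
               *np.sin(np.pi*n*x/L) for n in range(1, N + 1))

# Each mode decays as exp(-p_n^2 kappa t); higher modes die out first:
print(Psi(0.5, 0.0), Psi(0.5, 1.0))
```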
The method considered in the previous section is easily generalised to non-zero but stationary boundary conditions:
\[
\Psi_t = \kappa\,\Psi_{xx}\,; \quad \Psi(x,0) = \psi(x)\,; \quad \Psi(0,t) = \Psi_1\,, \ \Psi(L,t) = \Psi_2\,. \tag{8.114}
\]
Here $\Psi_1$ and $\Psi_2$ are two temperatures which are maintained constant for $t > 0$ at the two ends of our 1D system. This problem corresponds to finding the temperature distribution $\Psi(x,t)$ in a rod of length $L$ whose ends are maintained at temperatures $\Psi_1$ and $\Psi_2$, and whose initial temperature distribution was $\psi(x)$.

The trick in applying the Fourier method to this problem is first to get rid of the non-zero boundary conditions by working out the stationary solution of the problem, i.e. the solution $\Psi_\infty(x)$ which would be established after a very long time (at $t \to \infty$). Obviously, one would expect this to happen as the boundary conditions are kept constant (fixed).

The stationary solution of the heat conduction equation does not depend on time, and hence, after dropping the $\Psi_t$ term in the PDE (setting $\Psi_t = 0$), in the 1D case the temperature distribution is found to satisfy the ordinary differential equation $\Psi_{xx} = 0$. This is to be expected, as the $\Psi_{xx} = 0$ equation is the 1D particular case of the Laplace equation (8.101), which we already mentioned above as the PDE describing the stationary heat conduction solution.
The solution of the equation $\Psi_{xx} = 0$ is a linear function, $\Psi_\infty(x) = Ax + B$, where $A$ and $B$ are found from the boundary conditions:
\[
B = \Psi_1 \quad\text{and}\quad AL + B = \Psi_2 \;\Longrightarrow\; A = \frac{1}{L}\left(\Psi_2 - \Psi_1\right),
\]
so that the required stationary solution reads
\[
\Psi_\infty(x) = \Psi_1 + \left(\Psi_2 - \Psi_1\right)\frac{x}{L}\,. \tag{8.115}
\]
Thus, the distribution of temperature in a rod kept at different temperatures at its two ends becomes linear at long times.
Once the stationary solution is known, we seek the solution of the whole problem (8.114) by writing $\Psi(x,t) = \Psi_\infty(x) + V(x,t)$. Since $\Psi_\infty(x)$ does not depend on time but satisfies the boundary conditions, the problem for the auxiliary function $V(x,t)$ is one with zero boundary conditions:
\[
V_t = \kappa\,V_{xx}\,; \quad V(0,t) = V(L,t) = 0\,; \quad V(x,0) = \psi(x) - \Psi_\infty(x)\,,
\]
which is solved by the method of the previous section.

As an instructive example, consider the cooling of a uniform sphere of radius $S$, initially heated to a temperature $\Psi_0$, whose surface is maintained at a constant temperature $\Psi_1$ at all times $t > 0$. For a spherically symmetric temperature distribution $\Psi(r,t)$, the heat conduction equation takes the form
\[
\Psi_t = \kappa\,\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\,\frac{\partial\Psi}{\partial r}\right).
\]
This PDE is only slightly different from the 1D heat conduction equation we have been considering up to now. The stationary solution here is simply the constant $\Psi_1$, and for the auxiliary function $V(r,t) = \Psi(r,t) - \Psi_1$ we have the zero boundary condition $V(S,t) = 0$ and the initial condition $V(r,0) = \Psi_0 - \Psi_1$. Seeking the product solution $V(r,t) = R(r)T(t)$ and substituting it into the PDE, we obtain
\[
R\,T' = \kappa\,\frac{1}{r^2}\,T\left(r^2R'\right)' \;\Longrightarrow\; \frac{1}{\kappa}\frac{T'}{T} = \frac{1}{r^2R}\left(r^2R'\right)'.
\]
r T r R
Both sides depend on their own variables and hence must be both equal to the same
non-positive constant D p2 . This ensures a time non-increasing T.t/:
1 T0 2
D p2 H) T.t/ D Cep t :
T
Consider now the equation for R.r/:
1 2 0 0 2
r R D p2 R H) R00 C R0 C p2 R D 0 :
r2 r
sin .pr/
R.r/ D ;
r
where we have dropped the arbitrary constant as it will be absorbed by other
constants later on when constructing the general solution as the linear combination
of all elementary solutions.
Use now the boundary condition:
\[
R(S) = \frac{\sin(pS)}{S} = 0 \;\Longrightarrow\; \sin(pS) = 0 \;\Longrightarrow\; p_n = \frac{\pi n}{S}\,, \quad n = 1, 2, 3, \ldots\,.
\]
We do not need to consider the zero value of $n$, as the eigenfunction $R_n(r) = r^{-1}\sin(p_nr)$ is identically zero for $n=0$.
Therefore, by constructing an appropriate linear combination, we obtain the following general solution:
\[
V(r,t) = \sum_{n=1}^{\infty}\alpha_n\,\frac{\sin(p_nr)}{r}\,e^{-p_n^2\kappa t}\,, \quad\text{with}\quad p_n = \frac{\pi n}{S}\,, \ n = 1, 2, 3, \ldots\,.
\]
Note that $n$ starts from the value of one. We also see that $V \to 0$ at long times, as expected.

The linear combination above satisfies our PDE and the boundary conditions. To force it to satisfy the initial condition as well, we write
\[
V(r,0) = \Psi_0 - \Psi_1 = \sum_{n=1}^{\infty}\alpha_n\,\frac{\sin(p_nr)}{r}\,.
\]
⁶ Recall that $\lim_{x\to0}\frac{\sin x}{x} = 1$.
Multiplying both sides by $r$, this becomes an ordinary sine Fourier series for the function $\left(\Psi_0 - \Psi_1\right)r$, from which the coefficients $\alpha_n$ are readily found; this completes the solution. At $t \to \infty$ the solution tends to $\Psi_1$, as expected.
Problem 8.30. Consider the heat flow in a bar of length $L$, described by the 1D heat transport equation $\kappa\Psi_{xx} = \Psi_t$, where $\kappa$ is the thermal diffusivity. Initially the distribution of temperature in the bar is $\Psi(x,0) = \psi(x)$, while the two ends of the bar are maintained at constant temperatures $\Psi(0,t) = \Psi_1$ and $\Psi(L,t) = \Psi_2$. Show that the temperature distribution in the bar is given by
\[
\Psi(x,t) = \Psi_1 + \left(\Psi_2 - \Psi_1\right)\frac{x}{L} + \sum_{n=1}^{\infty}\alpha_n\sin(p_nx)\,e^{-p_n^2\kappa t}\,,
\]
where $p_n = \pi n/L$ and the $\alpha_n$ are the coefficients of the sine Fourier expansion of the function $\psi(x) - \Psi_1 - \left(\Psi_2 - \Psi_1\right)x/L$.
Problem 8.31. Consider a problem similar to the previous one, but assume that at one end the temperature is fixed, $\Psi(0,t) = \Psi_1$, while there is no heat loss at the $x=L$ end, i.e. $\Psi_x(L,t) = 0$. Show that in this case the general solution of the problem is
\[
\Psi(x,t) = \Psi_1 + \sum_{n=0}^{\infty}\alpha_n\sin(p_nx)\,e^{-p_n^2\kappa t}\,,
\]
where $p_n = \frac{\pi}{L}\left(n + \frac{1}{2}\right)$ and
\[
\alpha_n = \frac{2}{L}\int_0^L\psi(x)\sin(p_nx)\,dx - \frac{2\Psi_1}{p_nL}\,.
\]
Problem 8.32. Consider the heat flow in a bar of length $L$ with the initial distribution of temperature given by $\Psi(x,0) = \Psi_0\sin\left(\frac{3\pi x}{2L}\right)$ and the boundary conditions $\Psi(0,t) = 0$ and $\Psi_x(L,t) = 0$ (cf. the previous Problem). Show that the temperature distribution in the bar is
\[
\Psi(x,t) = \Psi_0\,e^{-p_1^2\kappa t}\sin(p_1x)\,,
\]
where $p_1 = 3\pi/2L$.
There is one last specific auxiliary problem we need to study before we are ready to consider the most general boundary problem for the heat conduction equation. So, let us consider the following inhomogeneous problem with stationary boundary conditions:
\[
\Psi_t = \kappa\,\Psi_{xx} + f(x,t)\,; \quad \Psi(x,0) = \psi(x)\,; \quad \Psi(0,t) = \Psi_1 \ \text{ and } \ \Psi(L,t) = \Psi_2\,.
\]
We seek its solution in the form $\Psi(x,t) = \Psi_\infty(x) + U(x,t) + V(x,t)$, where $\Psi_\infty(x)$ is the stationary solution (8.115) taking care of the boundary conditions, the function $V(x,t)$ satisfies the inhomogeneous equation with zero initial and boundary conditions,
\[
V_t = \kappa\,V_{xx} + f(x,t)\,; \quad V(x,0) = 0\,; \quad V(0,t) = V(L,t) = 0\,, \tag{8.119}
\]
while the function $U(x,t)$ satisfies the homogeneous equation with modified initial conditions:
\[
U_t = \kappa\,U_{xx}\,; \quad U(0,t) = U(L,t) = 0\,; \quad U(x,0) = \psi(x) - \Psi_\infty(x)\,.
\]
This latter problem is solved directly using the Fourier method, as explained in the previous section. The problem (8.119) is solved using the method developed in Sect. 8.2.6.
Problem 8.33. Show that the solution of the problem (8.119) is given by the following formulae:
\[
V(x,t) = \sum_{n=1}^{\infty}V_n(t)\sin(p_nx)\,, \quad p_n = \frac{\pi n}{L}\,,
\]
where
\[
V_n(t) = \int_0^tf_n(\tau)\,e^{-\kappa p_n^2(t-\tau)}\,d\tau \quad\text{and}\quad f_n(\tau) = \frac{2}{L}\int_0^Lf(x,\tau)\sin(p_nx)\,dx\,.
\]
We are now ready to consider the most general problem for the heat conduction equation:
\[
\Psi_t = \kappa\,\Psi_{xx} + f(x,t)\,; \quad \Psi(x,0) = \psi(x)\,; \quad \Psi(0,t) = \varphi_1(t) \ \text{ and } \ \Psi(L,t) = \varphi_2(t)\,. \tag{8.121}
\]
This problem corresponds, e.g., to the following situation: a rod which initially (at $t=0$) had a temperature distribution $\psi(x)$ is subjected at its $x=0$ and $x=L$ ends to heating according to the functions $\varphi_1(t)$ and $\varphi_2(t)$. The solution is obtained using essentially the same method as the one we developed in Sect. 8.2.7 for the wave equation.
Namely, introducing the auxiliary function $U(x,t)$ of Eq. (8.79), which satisfies the boundary conditions, the function $V = \Psi - U$ is found to satisfy an inhomogeneous problem with zero boundary conditions,
\[
V_t = \kappa\,V_{xx} + \tilde{f}(x,t)\,; \quad V(x,0) = \tilde{\psi}(x)\,; \quad V(0,t) = V(L,t) = 0\,, \tag{8.122}
\]
with appropriately modified source and initial functions $\tilde{f}$ and $\tilde{\psi}$. The problem (8.122) has been solved in the previous section. In fact, there we considered a slightly more difficult problem with non-zero stationary boundary conditions; the problem (8.122) is even easier, as it has zero boundary conditions.
Sometimes boundary conditions are absent and only the initial conditions are specified. This kind of problem appears, for instance, when the region in which the solution is sought is infinite. Even in this case the Fourier method can be used, albeit with some modifications. As an example of this kind of situation, let us consider the heat transport PDE in an infinite one-dimensional region:
\[
\Psi_t = \kappa\,\Psi_{xx}\,; \quad \Psi(x,0) = \psi(x)\,, \quad -\infty < x < \infty\,. \tag{8.123}
\]
Here we assume that the function $\psi(x)$ is absolutely integrable, i.e. that
\[
\int_{-\infty}^{\infty}\left|\psi(x)\right|dx = M < \infty\,. \tag{8.124}
\]
The idea is to apply the separation of variables and build up all possible elementary product solutions satisfying the PDE, and then apply the initial conditions. The calculation, at least initially, goes along the same route as in Sect. 8.3.2. We start by writing $\Psi(x,t) = X(x)T(t)$, next substitute this trial solution into the PDE and separate the variables:
\[
\frac{1}{\kappa}\frac{T'}{T} = \frac{X''}{X} = -\xi^2\,,
\]
where $-\xi^2$ is the separation constant. Note that, as was discussed in Sect. 8.3.2, it has to be non-positive to ensure that the obtained solution has the proper physical behaviour. Solving the obtained ordinary DEs,
\[
X'' + \xi^2X = 0 \quad\text{and}\quad T' = -\xi^2\kappa\,T\,,
\]
for the functions $T(t)$ and $X(x)$, we can write the obtained elementary solution for the given value of $\xi$ as follows:
\[
\Psi_\xi(x,t) = \left[C_1(\xi)\,e^{i\xi x} + C_2(\xi)\,e^{-i\xi x}\right]e^{-\xi^2\kappa t}\,. \tag{8.125}
\]
598 8 Partial Differential Equations of Mathematical Physics
Here the subscript in ‰ shows explicitly that the function above corresponds to
the particular value of the parameter , and C1 and C2 are arbitrary constants which
may also depend on , i.e. these become some functions of .
The obtained solution (8.125) satisfies the PDE, but not yet the initial conditions.
Due to linearity of the PDE, any linear combination of such elementary solutions
must also be a solution. However, we do not have any boundary conditions in place
to help us to choose the permissible values of . Hence, we have to admit that can
take any positive real values, 0 < 1, and hence the linear combination (the
sum) turns into an integral:
Z 1 Z 1
2
‰ .x; t/ D ‰ .x; t/ d D C1 ./eix C C2 ./eix e t d :
0 0
Because both positive and negative exponents are present, e˙ix , there is no need
to keep both terms in the square brackets. Instead, we simply take one of the terms
and extend the integration over the whole axis, i.e. the expression above can be
rewritten simply as:
Z 1
2
‰ .x; t/ D C./eix e t d ; (8.126)
1
The obtained expression is nothing but an expansion of the function .x/ into the
Fourier integral. Therefore, the function C ./ can be found from this equation
simply by writing the inverse Fourier transform:
Z 1
1
C ./ D .x/ eix dx :
2 1
The integral over is calculated explicitly, see Eq. (2.56), and we obtain
Z 1
‰ .x; t/ D dx1 G .x x1 ; t/ .x1 / ; (8.129)
1
where
1 2 =4t
G .x; t/ D p ex :
4 t
This is the final result. We have already met a 3D analogue of the function, which
appeared in the square brackets of Eq. (5.71), in Sect. 5.3.3. There we solved, using
the Fourier transform (FT) method, the heat conduction equation in an infinite 3D
space. The reader can appreciate that the separation of variables method developed
above brings us to exactly the same result as applying the FT method directly. In
fact, this should not come as a complete surprise as both methods are very close in
spirit; moreover, we benefited directly from the FT in the method above.
The function G, called the Green’s function of the 1D heat conduction equation,
has a very simple physical meaning: it corresponds to the solution of the heat
conduction equation for the initial conditions ‰ .x; 0/ D ı .x/ (this follows directly
from Eq. (8.129)), when the temperature at the point x D 0 was infinite while the
temperature at all other points was zero, i.e. the initial temperature distribution
was at t D 0 an infinite spike at x D 0. Then the function G.x; t/ describes the
distribution of the temperature in our 1D system in time: it is easily seen that
the spike gets smoothed out in both directions x ! ˙1 with time, approaching
the uniform ‰ .x; t/ D 0 distribution in a long time limit as G.x; t/ t1=2 .
The method of separation of variables can also be very useful in solving the Laplace
equation with specific boundary conditions. We discussed in detail the solution of
the Laplace equation in spherical coordinates in Sect. 4.5.3. Here we shall briefly
illustrate the application of the Fourier method to the Laplace problem on a number
of other simple examples.
Consider an infinite cylinder of radius S, the surface of which is kept at a constant
temperature .'/ which depends on the polar angle '. We are interested in the
stationary distribution of the temperature in the cylinder. This problem is most easily
600 8 Partial Differential Equations of Mathematical Physics
formulated in polar coordinates assuming that the cylinder axis coincides with the z
axis (the coordinate z can be ignored altogether as the cylinder is of infinite length):
1 @ @‰ 1 @2 ‰
r C 2 D0; ‰ .S; '/ D .'/ ; (8.130)
r @r @r r @' 2
where in the left-hand side the .r; '/ part of Laplacian in the cylinder coordinates,
Eq. (7.88), has been written.
We start by trying a product solution, ‰ .r; '/ D R.r/ˆ.'/. Substituting it into
the PDE and separating the variables, we find
0 R 0 0 R ˆ00
ˆ rR0 C ˆ00 D 0 H) rR C D0:
r r ˆ
The expression inside the curly brackets depends only on the angle ', hence, it must
be a constant . Therefore, two ordinary DEs for the functions R.r/ and ˆ.'/ are
obtained:
We now have to discuss how the values of the separation constant are to be chosen.
It is easy to see that D m2 , where m is any integer including zero, otherwise the
function ˆ.'/ would not be periodic with the period of 2 (we have already come
across this situation before in Sect. 4.5.3 where this point was thoroughly discussed).
Two cases are to be considered: m D 0 and m ¤ 0. In the latter case, the solution
for the function ˆ is indeed a periodic function,
while the solution of the equation for R.r/ is sought using the trial function
R.r/ D rn , where n is a constant to be determined. Substituting this into the DE
for R, we get
Therefore, the solution for the radial function must be the function
D
R.r/ D Crjmj C ;
rjmj
with C and D being some arbitrary constants. The solution must be finite at r D 0,
and hence the term with the constant D must be omitted, and hence the product
solution in the case of m ¤ 0 is
where Am and Bm are arbitrary constants. Here formally m D ˙1; ˙2; : : :; however,
negative values of m result in basically the same solutions as the positive ones, so
that we can limit ourselves with the positive values of the integer m only.
In the case of m D 0 (and hence D 0) the DE for the function R reads
0 0
r2 R00 CrR0 D0 H) rR D0 H) rR0 DC0 H) R.r/DC0 ln rCD0 ;
with C0 and D0 being arbitrary constants, while the DE for ˆ is simply ˆ00 D 0
yielding a linear function, ˆ .'/ D A00 C B00 '. As we should have periodicity
with respect to the angle, the constant B00 must be equal to zero. Also, since the
solution must be finite at r D 0, the constant C0 in the solution for R.r/ must also
be zero as the logarithm has a singularity at r D 0. Hence, for m D 0 our product
solution is simply a constant. Taking a general linear combination of all solutions
and absorbing the expansion coefficients in our arbitrary constants, we can write the
general solution of the PDE as
1
A0 X m
‰ .r; '/ D C r ŒAm cos .m'/ C Bm sin .m'/ : (8.133)
2 mD1
Here the constant term (which comes from the m D 0 contribution discussed
above) was conveniently written as A0 =2. The reason for that is that when applying
the boundary conditions at r D R,
1
A0 X m
C R ŒAm cos .m'/ C Bm sin .m'/ D .'/ ; (8.134)
2 mD1
we arrive exactly at the expansion of the function .'/ into the Fourier series with
the period of 2 (see Sect. 3.1 and specifically Eq. (3.7) with l D ), and hence
the explicit expressions for the unknown coefficients Am (m D 0; 1; 2; : : :) and Bm
(m D 1; 2; : : :) are obtained directly from Eqs. (3.9) and (3.10), respectively:
Z 2
1
Am D .'/ cos .m'/ d' ; (8.135)
Rm 0
Z 2
1
Bm D .'/ sin .m'/ d' : (8.136)
Rm 0
The obtained expressions fully solve the problem: the solution is given by the
Fourier expansion (8.133) with the expansion coefficients (8.135) and (8.136).
602 8 Partial Differential Equations of Mathematical Physics
(ii) Applying the boundary conditions, show that the coefficients above are
˛0 ˇ0
C0 D ; D0 D ˛0 C0 ln a ;
ln .a=b/
˛m bm ˇm am ˛m bm ˇm am
Am D ; Bm D ; m D ˙1; ˙2; : : : ;
.a=b/m .b=a/m .a=b/m .b=a/m
satisfies the corresponding PDE and the boundary conditions at the sides
y D 0 and y D L2 .
(ii) Show that the coefficients An and Bn satisfy the following equation:
Z
2 L2
An epn x C Bn epn x D ‰1 .x; y/ sin .pn y/ dy :
L2 0
(iii) Then, by considering the boundary conditions at the other two sides,
find the coefficients An and Bn , and hence show that the distribution of
temperature is given by:
1
X
4 .T0 T1 / 1
‰ .x; y/ D T1 C Œepn x C n epn x sin .pn y/ ;
n .1 C n /
nD1 .odd/
where the summation is run only over odd values of n and n D epn L1 .
Integral transforms (such as Fourier and Laplace) are also frequently used to solve
the PDEs. An example of an application of the Fourier transform method for solving
d’Alembert’s PDE (which is an inhomogeneous hyperbolic PDE) was considered in
Sect. 5.3.2 and of the heat conduction equation in Sect. 5.3.3. Here we shall show
how the Laplace transform (LT) method can also be used in solving PDEs in a
general case of arbitrary initial and boundary conditions. Of course, the success
depends very much on whether the inverse LT can be performed; however, in many
cases exact analytical solutions in the form of integrals and infinite series can be
obtained.
We shall illustrate this method by solving a rather general one-dimensional heat
conduction problem described by the following equations:
1@ @2
D ; .x; 0/ D .x/ ; .0C; t/ D '1 .t/ ; .L; t/ D '2 .t/ ;
@t @x2
(8.137)
where 0 x L, and the notations 0C and L in the boundary conditions mean
that these are obtained by taking the limits x ! 0 and x ! L from the right and
left sides, respectively, in the function .x; t/. We need to calculate the distribution
604 8 Partial Differential Equations of Mathematical Physics
.x; t/ along the rod at any time t > 0. An important variant of this problem may
be of a semi-infinite rod; this can be obtained by taking the limit of L ! 1.
First, we shall consider the case of the rod of a finite length L. Performing the LT
of the above equations with respect to time only yields
1 @2 ‰.x; p/
Œp‰.x; p/ .x/ D ; ‰ .0C; p/ D ˆ1 .p/ ; ‰ .L; p/ D ˆ2 .p/ ;
@x2
(8.138)
where p is a number in the complex plane, ‰ .x; p/ D L Œ .x; t/ is the LT of the
unknown temperature distribution, while ˆ1 .p/ and ˆ2 .p/ are LTs of the boundary
functions '1 .t/ and '2 .t/, respectively.
is
Z Z
1 x
x1 x 1 x
x1
y .x/ D C1 C f .x1 / e dx1 e C C2 f .x1 / e dx1 ex ;
2 0 2 0
(8.140)
where C1 and C2 are arbitrary constants [cf. Problem 8.22].
When applying the result of this problem to Eq. (8.138), we note y.x/ ! ‰ .x; p/,
2 D p= and f .x/ D .x/=. To calculate the two constants (which of course
are constants only with respect to x; they may and will depend on p), one has to
apply the boundary conditions stated in Eq. (8.138), which gives two equations for
calculating them:
C1 C C2 D ˆ1 ; (8.141)
Z L
1 x1
C1 C f .x1 / e dx1 eL
2 0
Z L
1
C C2 f .x1 / e dx1 eL D ˆ2 :
x1
(8.142)
2 0
(continued)
8.6 Method of Integral Transforms 605
p
where the dependence on p comes from ˆ1 , ˆ2 and D p=.
Problem 8.40. Then, substituting the expressions for the “constants” C1 and
C2 found in the previous problem into the solution (8.140) for y.x/ ! ‰ .x; p/,
RL Rx RL
show by splitting the integral 0 D 0 C x , that the solution of the boundary
problem (8.138) can be written as follows:
Problem 8.41. Finally, using the definition of the sinh function, simplify the
expression in the square brackets to obtain the final solution of the boundary
problem:
Z
sinh . .L x// sinh .x/ 1 L
‰ .x; p/ D ˆ1 .p/C ˆ2 .p/C ./ K .x; p; / d ;
sinh .L/ sinh .L/ 0
(8.145)
where
1 sinh ./ sinh . .L x// ; 0 < < x
K .x; p; / D :
sinh .L/ sinh .x/ sinh . .L // ; x < < L
(8.146)
This is the required solution of the full initial and boundary problem (8.138) in
the Laplace space. In order to obtain the final solution .x; t/, we have to perform
the inverse LT. This can only be done in terms of an expansion into an infinite series.
Four functions need to be considered which appear in Eqs. (8.145) and (8.146).
The coefficient to the function ˆ1 .p/ is the function
Lx p p p
sinh p p
sinh . .L x// e.Lx/ p=
e.Lx/ p=
Y1 .p/ D D p D p p
sinh .L/ sinh pL p eL p= eL p=
p p p p
1 e.Lx/ p= e.Lx/ p= ex p=
e.2Lx/ p=
D p p D p :
eL p= 1 e2L p= 1 e2L p=
606 8 Partial Differential Equations of Mathematical Physics
h p p iX
1 p
Y1 .p/ D ex p= e.2Lx/ p= e2Ln p=
nD0
1 h
X p p i
D e.2LnCx/ p=
e.2L.nC1/x/ p=
: (8.147)
nD0
p
Note that each exponential term is of the form e˛ p with some positive ˛ > 0.
This ensures that the inverse LT of each exponent in the series exists and is given by
Eq. (6.28). Denoting it by .˛; t/, we can write7
1
X
2Ln C x 2L .n C 1/ x
L1 ŒY1 .p/ D y1 .x; t/ D p ;t p ;t :
nD0
(8.148)
Problem 8.42. Using a similar method, show that the other three functions
appearing in Eqs. (8.145) and (8.146) can be similarly expressed as infinite
series:
7
Here we accepted without proof that the operation of LT can be applied to the infinite series term-
by-term. This can be shown to be true if the series in the Laplace space converges, and is true in
our case as we have a geometric progression.
8.6 Method of Integral Transforms 607
whereas for x L
sinh .x/ sinh . .L //
K.x; t; / D L1 ŒK .x; p; / D L1
sinh .L/
p X 1
2Ln C x 2Ln C C x
D p ;t p ;t
2 nD0
2L.nC1/x 2L.nC1/Cx
p ; t C p ;t ; (8.153)
p p
where .˛; t/ is the inverse LT of the function e˛ p = p which is given by
Eq. (6.29). In fact, it can be seen that Eq. (8.153) can be obtained from (8.152)
by permuting x and .
The obtained formulae allow us finally to construct the exact solution of the
heat conductance problem (8.137). Using the above notations and applying the
convolution theorem for the LT (Sect. 6.3.4), we obtain from Eq. (8.145):
Z Z
t
1 L
.x; t/ D Œ'1 .t / y1 .x; / C'2 .t / y2 .x; / dC ./K .x; t; / d :
0 0
(8.154)
This is the required general result. It follows that a general problem of arbitrary
initial and boundary conditions can indeed be solved analytically, although the final
result is expressed via infinite series and convolution integrals.
Consider now a semi-infinite rod, L ! 1, in which case the above expression
is drastically simplified. Indeed, in this case the x D L boundary condition should
be replaced by .1; t/ D 0. Correspondingly, in the limit the expressions for the
functions (8.147), (8.149) and (8.150) simplify to:
p
Y1 .p/ D ex p=
; Y2 .p/ D 0 ;
608 8 Partial Differential Equations of Mathematical Physics
8
r < exp xpp= sinh pp= ; 0 < < x
K.x; p; / D p p :
p : exp p= sinh x p= ; x < < 1
For instance, in the expression for Y1 .p/ only one exponential function with n D 0
survives, all others tend to zero in the L ! 1 limit. Similarly, one can establish the
expressions given above for the other functions.
Problem 8.43. Show that the final solution for a semi-infinite rod is
Z Z
t
x 1 1
.x; t/ D '1 .t / p ; d C ./K1 .x; t; / d ;
0 0
(8.155)
where
p
x xC
K1 .x; t; / D p ;t p ;t
2
for x < 1. Note that the solution does not contain the term associated
with the zero boundary condition at the infinite end of the rod as y2 .x; t/ D 0
exactly in the L ! 1 limit.
Problem 8.44. Consider a semi-infinite rod 0 x 1 which initially had
a uniform temperature T0 . At t D 0 the
p x D 0 end of the rod is subjected to
heating according to the law '1 .t/ D t. Show that at t > 0 the distribution of
the temperature in the rod is given by the following formula:
Z t
x p x
.x; t/ D p ; t d C T0 1 erfc p :
0 4t
Often in many problems which are encountered in practice, one needs to find a
minimum (or a maximum) of a function f .x/ with respect to its variable x, i.e., the
point x0 where the value of the function is the smallest (largest) within a certain
interval a x b. The necessary condition for this to happen is for the point x0
to satisfy the equation f 0 .x0 / D 0. However, sometimes one may need to solve a
much more difficult problem of finding the whole function, e.g. f .x/, which results
in a minimum (or a maximum) of some scalar quantity L which directly depends
on it. We shall call this type of dependence a functional dependence and denote
it as L Œf using square brackets. Taking some particular function f D f1 .x/ one
gets a particular numerical value L1 of this functional, while by taking a different
function f D f2 .x/ one gets a different value L2 . Therefore, it seems that indeed,
by taking all possible choices of functions f .x/, various values of the quantity L,
called functional of the function f , are obtained, and it is perfectly legitimate to ask
a question of whether it is possible to find the optimum function f D f0 .x/ that yields
the minimum (maximum) value of that scalar quantity (functional) L.
To understand the concept better, let us consider a simple example. Suppose, we
would like to prove that the shortest line connecting two points in the 2D space is the
straight line. The length of a line between two points .x1 ; y1 / and .x2 ; y2 / specified
by the equation y D y.x/ on the x y plane is given by the integral (Sect. I.4.6.11 )
Z x2 q
LD 1 C .y0 /2 dx: (9.1)
x1
1
In the following, references to the first volume of this course (L. Kantorovich, Mathematics for
natural scientists: fundamentals and basics, Springer, 2015) will be made by appending the Roman
number I in front of the reference, e.g., Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of
the first volume, respectively.
Here the function we seek passes through the two points, i.e., it must satisfy the
boundary conditions y .x1 / D y1 and y .x2 / D y2 . We see that L directly depends on
the function y.x/ chosen: by taking different functions y.x/ passing through the same
two points, different values of L are obtained, i.e., the length of the line between the
two points depends directly on how we connect them. In this example the length L
depends on the function y.x/ via its derivative only. The question we are asking
is this: prove that the functional dependence y.x/ which yields the shortest line
between the two points is the straight line. In other words, we need to minimise
the functional L Œy.x/ with respect to the form (shape) of the function y.x/, derive
the corresponding differential equation which would correspond to this condition,
and then solve it. We expect that solution is a straight line y D ˛x C ˇ with the
appropriate values of the two constants corresponding to the line fixed end points.
Similarly, one may consider a more complex 3D problem of finding the shortest
line connecting two points A and B when the line lies completely on a surface given
by the equation G .x; y; z/ D 0. In this case we need to seek the minimum of the
functional
Z b q
L Œx; y; z D .x0 /2 C .y0 /2 C .z0 /2 dt (9.2)
a
with respect to the three functions x.t/, y.t/ and z.t/, which give us the desired line
in the parametric form via t. If we disregard the constraint G .x; y; z/ D 0, then of
course the straight line connecting the points A and B would be the answer. However,
this may not be the case if we require that the line is to lie on the surface, e.g., on
the surface of a sphere. This kind of variational problems is called problems with
constraints. In that case the solution will crucially depend on the constraint specified.
It is the purpose of this Chapter to discuss how these problems can be solved in
a number of most frequently encountered cases. But before we proceed, we have
to define a linear functional. A functional LŒf .x/ is linear if for any two functions
.x/ and '.x/, one can write
is linear with respect to the function f .x/ for D 0 or D 1, while it is not linear
for any other values of .
9.1 Functions of a Single Variable 611
L D L Œf1 L Œf D L Œf C ıf L Œf :
D max jıf j :
a<x<b
L D L1 Œf ; ıf C L2 Œf ; ıf ;
where the functional L1 is linear with respect to ıf , while L2 is not. In fact, L1
tends to zero linearly with ! 0, while the functional L2 tends to zero faster
than . We can say that L2 = ! 0 as ! 0.2 The first part of the difference,
L1 , is called the variation of the functional due to change ıf .x/ of the function,
and is usually denoted ıL.
2
This is very similar to the differential of a function: y D y .x C x/ y.x/ is given by a sum
of two terms, one being linear in x, the other depends on higher powers of x, i.e., tends to zero
much faster than x itself (see Sect. I.3.1).
612 9 Calculus of Variations
In this case,
Z n Z
b o b 2
L D 2 .f C ıf /2 C x3 f 0 C ıf 0 dx 2f C x3 f 0 dx D L1 C L2 ;
a a
where
Z b Z b
L1 D 4f ıf C x3 ıf 0 dx and L2 D 2 .ıf /2 dx:
a a
2
One can see that L1 behaves like , while L2 as . Indeed, since jıf j , then
the first term in L1 can be estimated as
ˇZ ˇ Z Z
ˇ b ˇ b b
ˇ Œ4f ıf dxˇˇ 4 jf j jıf j dx 4 jf j dx ;
ˇ
a a a
and hence the module of this term is also proportional to . Note that we have used
here that ıf 0 D .ıf /0 and that ıf D 0 at the end points of our interval as the two
functions are equal there, Fig. 9.1. Let us now estimate L2 :
ˇZ ˇ Z
ˇ b ˇ b
ˇ 2 .ıf /2 dxˇˇ 2 jıf j2 dx 2 2
.b a/ ;
ˇ
a a
L .˛/ D L Œf C ˛ıf :
Here ˛ıf serves as a variation of f .x/ and L.˛/ becomes a function of a single
variable ˛. By taking small values of ˛ one can make the deviation ˛ıf from f to be
arbitrarily small for all values of x within the interval a < x < b. Then, the division
9.1 Functions of a Single Variable 613
of L into two terms (one being a linear functional and the other of higher order in
terms of the deviation) can be made similarly to the way this is done for an ordinary
function of a single variable as L.˛/ is indeed such a function of ˛. Expanding L.˛/
into the Taylor series around ˛ D 0, one gets
1
L D L .˛/ L.0/ D L0 .0/˛ C L00 .0/˛ 2 C : (9.4)
2
where we were able to take ˛ out in the L1 term since it is a linear functional
with respect to its second argument (the variation of the function). Comparing
Eqs. (9.4) and (9.5), we conclude that L0 .0/ must be equal to L1 Œf ; ıf , while
the following terms in the expansion (9.4), which behave at least as ˛ 2 , should
correspond to L2 Œf ; ˛ıf . Indeed, if 0 D maxa<x<b jıf j > 0 is the maximum
value of the function ıf .x/ within our interval, then D maxa<x<b j˛ıf j D j˛j 0 is
the maximum deviation of f C ˛ıf from f , and hence the limit ! 0 is equivalent
to taking the ˛ ! 0 limit. It is clear then that
ˇ ˇ ˇ ˇ
jL2 Œf ; ˛ıf j ˇ L2 Œf ; ˛ıf ˇˇ 1 ˇˇ L2 Œf ; ˛ıf ˇˇ
ˇ
D ˇ lim
lim
!0 ˛!0 ˛ ˇD lim
ˇ˛!0 ˛ ˇ D 0;
0 0
As an illustration of this method, let us apply this simple result again to the
functional (9.3):
Z h
b i
L .˛/ D 2 .f C ˛ıf /2 C x3 f 0 C ˛ıf 0 dx:
a
and that the corresponding L2 term behaves in such a way that
L2 Œf ; ˛ıf =˛ ! 0 as ˛ ! 0.
Next we shall consider the simplest case of the functional L Œf .x/ given by an
integral
Z b
L Œf D F x; f ; f 0 dx: (9.7)
a
0
Here F .x; f ; f / is some function which we assume is continuous with respect to
its three variables and has the necessary continuous derivatives with respect to any
of them. Note that F depends on x and, at the same time, is a functional of f .x/,
depending on the function itself and its first derivative. Let us calculate the variation
ıL of this functional:
Z b
L .˛/ D F x; f C ˛ıf ; f 0 C ˛ıf 0 dx
a
Z Z
ˇ b
dF b
@F @F
H) L0 .˛/ˇ˛D0 D dx D ıf C 0 ıf 0 dx:
a d˛ ˛D0 a @f @f
There are two contributions, coming from the second and the third arguments of the
function F. The contribution to the integral due to the second term in the integrand
above (it is related to the variation of the derivative ıf 0 ) we shall take by parts:
Z Z ˇ Z b
b
@F 0 @Fb
0 @F ˇˇb d @F
ıf dx D .ıf / dx D ıf ıf dx
a @f 0 a @f
0 @f 0 ˇa a dx @f 0
Z b
d @F
D ıf dx;
a dx @f 0
where the free (first) term which appeared after the integration by parts is zero since
ıf .a/ D ıf .b/ D 0. An important point here is that the x-derivative is the total
derivative which takes account of the complete dependence of F on x including
those of f and f 0 . This is because we have taken the integral over x by parts treating
F1 .x/ D @F=@f 0 as a function of x only, i.e., including its dependence on x via f and
f 0 as well. Hence, we obtain
Z b Z b
ˇ @F @F 0 @F d @F
0 ˇ
ıL D L .˛/ ˛D0 D ıf C 0 ıf dx D ıf dx:
a @f @f a @f dx @f 0
(9.8)
9.1 Functions of a Single Variable 615
Next, we should consider the necessary condition for the functional LŒf to be
minimum (maximum). Let us assume that the function f0 .x/ gives the maximum to
the functional. This means that for any function f .x/ such that jf .x/ f0 .x/j <
( is an arbitrary positive number) it follows that LŒf < L Œf0 . This is to be satisfied
for any x within the interval a x b. The minimum of the functional is defined
in a similar manner.
What we are going to do now is to employ the definition given above to establish
the necessary condition for the maximum. Choose f D f0 C ˛ıf . Then,
so that one can always choose a sufficiently small ˛ such that j˛j 0 < , where 0 is
the largest fluctuation of ıf . But if f0 gives the maximum, then L Œf0 C ˛ıf < L Œf0 ,
which means that L .˛/ < L.0/ for any ˛ satisfying the above condition. Therefore,
the value of ˛ D 0 corresponds to the maximum of the function L.˛/ for which
the necessary condition is obviously L0 .0/ D 0. Hence, this must be the necessary
condition we are looking for:
ˇ
L0 .0/ D L0 .˛/ˇ˛D0 D ıL D 0: (9.9)
Theorem 9.1. Let .x/ be any function which is zero at the ends of the interval,
.a/ D .b/ D 0, and is continuous together with its first derivative within
the whole interval. Consider a continuous function f .x/. If for any such .x/ the
integral
Z b
f .x/.x/dx D 0;
a
then f .x/ D 0.
Proof. We shall prove this theorem by contradiction. Assume that there is a point
x0 within our interval a < x < b such that f .x0 / > 0. Since f .x/ is assumed to be
continuous, this means that there must be a vicinity of the point x0 of some width 2
where the function f .x/ is also positive (see Theorem I.2.16), i.e., f .x/ > 0 for any
x from some interval < x < C , where ˙ D x0 ˙ . Next, we can construct a
particular function .x/ in such a way that
2 2
.x/ D .x C/ .x / (9.10)
within that interval and zero otherwise. The value of can be chosen small enough
so that the points ˙ both lie inside the original interval a < x < b. The function
.x/ defined in this way is everywhere continuous and is equal to zero at the points a
and b. Its first derivative is also continuous (and is equal to zero at the points x D ˙
and beyond the interval < x < C ). At the same time, the integral
616 9 Calculus of Variations
Z b Z C
Z C
2 2
f .x/.x/dx D f .x/.x/dx D f .x/ .x C/ .x / dx > 0;
a
since f .x/ > 0 within the chosen integration limits. This contradicts the condition of
the theorem, and hence our assumption was wrong. Note that the integration limits
in the integral above were changed to ˙ because the function D 0 outside it.
Q.E.D.
This simple theorem allows us to find the DE for the function f .x/ which delivers
a minimum (or a maximum) to the functional (9.7). Indeed, the variation of this
functional is given by Eq. (9.8) containing the arbitrary variation ıf .x/ under the
integral. Since the necessary condition for the extrema is that the variation of
the functional to be zero, the integral (9.8) must be equal to zero for arbitrary
variation ıf . Using the proven theorem, this means that the expression in the square
brackets in Eq. (9.8) must necessarily be zero, i.e.,
@F d @F
D 0: (9.11)
@f dx @f 0
This is called Euler’s equation. This DE is to be solved subject to the boundary con-
ditions at the ends of the interval: f .a/ D A and f .b/ D B. In practical calculations
it is sometimes useful to write the second term in the left-hand side, containing the
total derivative with respect to x, explicitly. Indeed, the partial derivative of F with
respect to f 0 depends on x, f and f 0 , and hence
d @F @2 F @2 F 0 @2 F
D C f C 0 0 f 00 D Fxf 0 C Fff 0 f 0 C Ff 0 f 0 f 00 :
dx @f 0 @x@f 0 @f @f 0 @f @f
It becomes clear now that this is a second order DE with respect to the function
f .x/, and hence its solution f .x; C1 ; C2 / will depend on two arbitrary constants C1
and C2 which are to be chosen to satisfy the boundary conditions f .a; C1 ; C2 / D A
and f .b; C1 ; C2 / D B. Three cases are possible: (1) there are no solutions and hence
the functional (9.7) has no extrema; (2) there is only one solution and hence only
one extremum and, finally, (3) there are several solutions, i.e., several extrema exist.
It is instructive at this point to consider several particular cases of the function
F .x; f ; f 0 / serving as the integrand in the functional (9.7).
Case 1 If F D F.x; f /, i.e., this function does not depend on f 0 , then the Euler’s
equation is simply Ff D 0. This is an algebraic equation with respect to the function
f , and hence the solution does not contain any arbitrary constants. This means that
it may not satisfy the boundary conditions.
9.1 Functions of a Single Variable 617
dU @U @U 0
D C f D g C hf 0 ;
dx @x @f
which coincides exactly with the function F in Eq. (9.13). Therefore, the functional
containing it,
Z Z
b b
dU
L Œf D g C hf 0 dx D dx D U .b; f .b// U .a; f .a// ;
a a dx
is simply a constant, it only depends on the points at the ends of the interval! So, no
solution in this case is possible at all.
Case 3 Consider now the function F being independent of x and f , i.e., it only
depends on the derivative f 0 , i.e., F D F .f 0 /. The Euler’s equation in this case is
d @F @F
D0 H) D C1 ;
dx @f 0 @f 0
Case 4 Finally, let F D F .f ; f 0 /, i.e., the function F does not explicitly depend on
the variable x. In this case the Euler’s equation (9.12) reads
Ff Fff 0 f 0 Ff 0 f 0 f 00 D 0: (9.15)
This is the second order DE which can be integrated once, that is, it can be
transformed into a first order DE with respect to the function f .x/. To this end,
multiply both sides of Eq. (9.15) by f 0 and add and subtract Ff 0 f 00 :
f 0 Ff Fff 0 f 0 Ff 0 f 0 f 00 C Ff 0 f 00 Ff 0 f 00 D 0;
The first term within the round brackets is simply the total derivative dF dx
. At the
same time, the second term in the square brackets is also the total derivative of f 0 @f@F0
with respect to x (recall, that F only explicitly depends on f and f 0 ):
2
d 0 @F 00 @F 0 @F 0 @2 F 00
f 0 Df Cf f C 0 0f ;
dx @f @f 0 @f @f 0 @f @f
which is the same as in the square brackets in Eq. (9.16). Therefore, Eq. (9.16) can
be written as
d 0 @F @F
F f 0 D 0 H) F f 0 0 D C1 : (9.17)
dx @f @f
This is the final result: the obtained DE is of the first order, and C1 is an arbitrary
constant. Solving the obtained DE gives another constant, C2 , and this is enough for
the two boundary conditions to be satisfied.
We shall illustrate this formula by solving a classical geodesic problem of finding
the shortest line lying on a sphere of radius R and centred at the origin which
connects two points A .R; A ; A / and B .R; B ; B / (specified using the spherical
coordinates) on that sphere. A square of the element of the line length, .ds/2 , lying
on the sphere is given by Eq. (7.35), so that the required element of the arc length is
q
ds D R .d/2 C sin2 .d /2 :
and hence the line length is given by integrating ds along the line:
s
Z 2
B
d
LDR C sin2 d :
A
d
in the above functional does not depend on the variable at all, so that the variation
of the functional with respect to the unknown function . / results in Eq. (9.17)
with that F. Performing the required differentiation @F=@ 0 , we obtain from (9.17):
q
. 0 /2 C1 sin2 1
. 0 /2 C sin2 q D H) q D ;
R
. 0 /2 C sin2 . 0 /2 C sin2
with D R=C1 . The obtained expression can be solved easily with respect to 0
leading to an ordinary first order DE in which the variables are separated
p Z
0 d
D 2 sin4 sin2 H) p D C C2 :
sin sin2
2 4
cot 1 1 d
tD p ; dt D p 1 C cot2 d D p 2
;
2 1 2 1 2 1 sin
p r
1 p
2
4 2
2 sin sin D sin 2 2
D sin2 2 .1 C cot2 /
sin
p p
D sin2 2 1 1 t2 :
We obtain
Z
dt
p D C C2 H) arcsin t D C2 ;
1 t2
which leads to
cot
sin .C2 / D t H) sin .C2 C / D p ;
2 1
620 9 Calculus of Variations
or, writing the sine in the left-hand side via the cosine and sine of the angles C2 and
using the well-known trigonometric identity, and splitting up the cotangent,
cot
sin C2 cos C cos C2 sin Dp
2 1
.R cos /
H) sin C2 .R sin cos / C cos C2 .R sin sin / D p :
2 1
We recognise in the expressions in the round brackets the coordinates x D
R sin cos , y D R sin sin and z D R cos of the line on the surface of the
sphere. Therefore, we obtain
˛x C ˇy C z D 0;
p
where ˛ D sin C2 and ˇ D cos C2 are two related constants, while D 1= 2 1
is another independent constant, so we have two independent constants in total.
We have obtained an equation for a plane passing through the centre of the
coordinate system and the two points A and B (which determine the two constants
C2 and ). Hence, the shortest line connecting two points on a sphere lies on the
intersection of the sphere and this plane.
(continued)
622 9 Calculus of Variations
mv 2 p
mgy D H) v.x/ D 2gy.x/;
2
where mq
is the mass of the object and g the gravitational constant (on Earth).
If ds D 1 C .y0 /2 dx is the elementary length of the slide around point x, the
time required to cross it is ds=v.x/. The total travel time to slide from the top of
the slide to its bottom is given by the integral
s
Z xB Z xB
ds 1 1 C .y0 /2
LD Dp dx:
0 v 2g 0 y
Show that the optimum curve of the p slide (i.e. corresponding to the shortest
sliding time) satisfies the DE y0 D .2C1 y/ =y, where C1 is a constant.
Then, integrate this DE over y making the substitution
(continued)
9.1 Functions of a Single Variable 623
to obtain
x D C1 .t sin t/ C C2 :
To understand what we have just obtained, note that the variable t can be used
as a parameter in specifying the function y D y.x/ as x D x.t/ and y D y.t/.
The first of the boundary conditions x D y D 0 is satisfied with t D 0 and C2 D
0. Demonstrate by plotting f1 .t/ D 1= .1 cos t/ and f2 .t/ D 1= .t sin t/ for
0 t 2 that they intersect only at a single point ensuring that it is always
possible to determine the other constant C1 from the coordinates of the point B.
The curve we obtained is drawn in the x y plane by a fixed point on a wheel
rolling in the x direction; it is called cycloid.
Let us now consider a more general case of a functional depending on more than
one function:
Z b
L Œf1 ; f2 ; : : : ; fn D F x; f1; : : : fn ; f10 ; : : : ; fn0 dx: (9.18)
a
In this case one may consider a partial variation of the functional with respect to the
function fi for a specific i between 1 and n. We then introduce a function
in which only one function, fi .x/, is varied. Then the partial variation of L with
respect to fi .x/ is defined as ıLi D Li0 .0/, and the necessary condition for L to be
minimum (maximum) will be
As an example, let us consider again the problem of the shortest line connecting 2
points, A .xA ; yA ; zA / and B .xB ; yB ; zB /. Suppose, the line is specified parametrically
as x D x.t/, y D y.t/ and z D z.t/. Then the line length is given by the line integral
Z tB q
LD .x0 /2 C .y0 /2 C .z0 /2 dt:
tA
@F y0 1
D q D ;
@y0 C
.x0 /2 C .y0 /2 C .z0 /2 2
@F z0 1
Dq D ;
@z0 C3
.x0 /2 C .y0 /2 C .z0 /2
where C1 , C2 and C3 are three independent constants. Square both sides of each
equation and rearrange. We get
2 2 2
.1 C1 / x0 C y0 C z0 D 0;
0 2 2 2
x C .1 C2 / y0 C z0 D 0;
0 2 0 2 2
x C y C .1 C3 / z0 D 0:
This is a system of three linear algebraic equations with respect to the three squares
of the derivatives which has a non-trivial solution only if its determinant is equal to
zero. This, however, would imply some special relationship between the arbitrary
constants which is unreasonable. Therefore, we can accept the trivial solution that
.x0 /2 D .y0 /2 D .z0 /2 D 0. This means that all three functions we are seeking, x.t/,
y.t/ and z.t/, are linear functions of t. This corresponds to a straight line connecting
the two points as anticipated; the six constants for the three linear functions, x D
˛1 t C ˇ1 , y D ˛2 t C ˇ2 and z D ˛3 t C ˇ3 , are determined immediately from the six
coordinates of the two points.
9.1 Functions of a Single Variable 625
and the same for the point x D b. Then all free terms in the above integration by
parts become zero, and we obtain
Z Z
b
@F .m/ b
dm @F
ıf dx D .1/m ıf dx:
a @f .m/ a dxm @f .m/
This will happen to each term in Eq. (9.24), which results in the following expression
for the variation:
Z b n
@F d @F d2 @F n d @F
ıL D C 2 C C .1/ ıf dx:
a @f dx @f 0 dx @f 00 dxn @f .n/
(9.25)
626 9 Calculus of Variations
To obtain the corresponding Euler’s equation from this expression, we need first
to modify Theorem 9.1 since when proving it we only assumed that the function
.x/ is continuous together with its first derivative. Here we have to make sure that
.x/ in the condition
Z b
f .x/.x/dx D 0
a
of Theorem 9.1 is continuous together with all its derivatives up to the n-th one
and that the function itself and its derivatives are equal to zero at the end points
of the interval. This is easy to accomplish if we define .x/ within a small interval
< x < C (see the proof of Theorem 9.1) as
.x/ D .x C/
nC1
.x /
nC1
(9.26)
This is the DE of order 2n: indeed, the derivative @F=@f .n/ , which may contain f .n/ , is
differentiated n times with respect to x in the last term in the left-hand side. Hence,
the solution f .x/ must contain 2n arbitrary constants which are obtained from 2n
boundary conditions, which are that f , f 0 , : : :, f .n1/ all have well-defined values at
the end points.
Gj .x; f1 ; : : : ; fn / D 0; j D 1; : : : ; k: (9.29)
G .x; f1 ; f2 / D 0: (9.30)
Assume that we can find f2 from Eq. (9.30) above; then f2 D ' .x; f1 / will become
some function of x and f1 . Substituting this solution into the functional, we obtain
an explicit functional of only one function f1 .x/:
Z Z
b b
L Œf1 D F x; f1 ; ' .x; f1 / ; f10 ; ' 0 .x; f1 / dx D ˆ x; f1 ; f10 dx:
a a
628 9 Calculus of Variations
Note that here ' 0 .x; f1 / is the total derivative of ' with respect to x, i.e.,
@' @' 0
'0 D C f : (9.31)
@x @f1 1
Also, ˆ is the new integrand in the functional which depends only on the first
function; we have introduced it for convenience.
The function f1 (note that f2 follows from it immediately as it is related to f1 via
f2 D ' .x; f1 /) is obtained from the Euler’s equation
@ˆ d @ˆ
D 0: (9.32)
@f1 dx @f10
We shall now need to work out all the derivatives in this equation. First of all,
@ˆ @ @F @F @' @F @' 0
D F x; f1 ; ' .x; f1 / ; f10 ; ' 0 .x; f1 / D C C 0
@f1 @f1 @f1 @' @f1 @' @f1
@F @F @' @F @' 0
D C C 0 :
@f1 @f2 @f1 @f2 @f1
To calculate the derivative @' 0 =@f1 needed above, we use the explicit expres-
sion (9.31) for ' 0 :
@' 0 @ @' @' 0 @2 ' @2 ' 0
D C f1 D C f :
@f1 @f1 @x @f1 @x@f1 @f1 @f1 1
Hence,
@ˆ @F @F @' @F @2 ' @2 ' 0
D C C 0 C f1 : (9.33)
@f1 @f1 @f2 @f1 @f2 @x@f1 @f1 @f1
@ˆ @F @F @' 0
D C :
@f10 @f10 @f20 @f10
The second contribution is necessary since the derivative ' 0 , according to Eq. (9.31),
also explicitly depends on f10 . Moreover, from this expression, we immediately get:
@' 0 =@f10 D @'=@f1 . Therefore,
@ˆ @F @F @'
0 D 0 C : (9.34)
@f1 @f1 @f20 @f1
Substituting Eqs. (9.33) and (9.34) into the Euler’s equation (9.32), we obtain
@F @F @' @F @2 ' @2 ' 0 d @F @F @'
C C 0 C f C 0 D 0: (9.35)
@f1 @f2 @f1 @f2 @x@f1 @f1 @f1 1 dx @f10 @f2 @f1
9.1 Functions of a Single Variable 629
where
d @' @2 ' @2 ' 0
D C f :
dx @f1 @x@f1 @f1 @f1 1
Substituting these results into Eq. (9.35) and performing necessary cancellations, we
obtain
@F d @F @' @F d @F
C D 0: (9.36)
@f1 dx @f10 @f1 @f2 dx @f20
with respect to f1 :
@G @G @'
C D 0; (9.37)
@f1 @f2 @f1
@' @G=@f1
D :
@f1 @G=@f2
This result allows rewriting the Euler’s equation (9.36) in the following final form:
@F d @F @G=@f1 @F d @F
D 0: (9.38)
@f1 dx @f10 @G=@f2 @f2 dx @f20
The obtained DE may have solved the problem; however, it looks complicated
and hence very difficult to remember. As with the optimisation of functions with
constraints (see Sect. I.5.10.4), a simple method exists (proposed by Euler) which
can simplify the procedure. Indeed, let us multiply Eq. (9.37) by some function .x/
and then add the obtained equation and the Euler’s equation (9.36) together. After
simple manipulation:
@F @G d @F @' @F d @F @G
C .x/ C C .x/ D 0: (9.39)
@f1 @f1 dx @f10 @f1 @f2 dx @f20 @f2
630 9 Calculus of Variations
This result is valid for any function .x/. Let us now make a specific choice for it.
Namely, let us choose .x/ in such a way that the expression in the square brackets
above be zero:
@F d @F @G
C .x/ D 0; (9.40)
@f2 dx @f20 @f2
and, therefore,
1 @F d @F
.x/ D : (9.41)
@G=@f2 @f2 dx @f20
Hence, with this choice, what is left from Eq. (9.39) be simply this:
@F @G d @F
C .x/ D 0: (9.42)
@f1 @f1 dx @f10
It is easy to see that this is exactly equivalent to Eq. (9.38) if we replace here with
its expression (9.41).
Hence, the two Eqs. (9.40) and (9.42) are fully equivalent to Eq. (9.38). These,
however, can be rewritten in a form which is easy to remember. Consider an
auxiliary function of x and both functions and their derivatives:
H x; f1 ; f2 ; f10 ; f20 D F x; f1 ; f2 ; f10 ; f20 C .x/ G .x; f1 ; f2 / : (9.43)
@F @H @F @H
0 D and 0 D I
@f1 @f10 @f2 @f20
other terms in Eqs. (9.40) and (9.42) can also be combined, and we obtain instead
the two equations for the function H:
@H d @H @H d @H
D0 and D 0: (9.44)
@f1 dx @f10 @f2 dx @f20
Now we understand the main idea, we can apply the same method to the general case
of Eqs. (9.28) and (9.29). Suppose, we can solve equations of constraints (9.29) with
respect to the first k functions3 :
@'j Xn
@'j 0
fj0 D 'j0 D C f : (9.46)
@x i DkC1
@fi1 i1
1
@F X @F @'j X @F @'j
k k 0
@ˆ
D C C
@fi @fi jD1
@fj @fi jD1
@fj0 @fi
0 1
@F X @F @'j X @F @ @2 'j X
k k n
@2 'j 0
D C C C f A;
@fi jD1
@fj @fi jD1
@fj0 @x@fi i DkC1 @fi1 @fi i1
1
3
It can be shown that the necessary condition for the constraints (9.29) to be solvable with respect
1 ;:::;Gk /
to the functions f1 ; : : : ; fk is that det D ¤ 0, where D D @.G
@.f1 ;:::;fk /
is the Jacobian. This point goes
deeply into the inverse function theorem in the case of functions of many variables.
632 9 Calculus of Variations
where we have used Eq. (9.46) for the derivative 'j0 . The other derivative we need is
@F X @F @'j @F X @F @'j
k 0 k
@ˆ
D C D C :
@fi0 @fi0 jD1
@fj0 @fi0 @fi0 jD1
@fj0 @fi
In the last passage we replaced @'j0 =@fi0 with @'j =@fi . This follows from Eq. (9.46)
after differentiating it with respect to fi0 . Replacing the above derivatives in Euler’s
equation (9.47), we get
0 1
@F X
k
@F @'j Xk
@F @ @ 'j2 Xn 2
@ 'j 0 A
C C 0 C fi1
@fi jD1
@fj @fi jD1
@f j @x@f i i DkC1
@f i1 @fi
1
0 1
d @ @F X @F @'j A
k
C D 0: (9.48)
dx @fi0 jD1
@fj0 @fi
In the last term we need to calculate the total derivative with respect to x:
0 1 !
d @X @F @'j A X d @'j X @F d
k k k
@F @'j
D C ;
dx jD1 @fj0 @fi jD1
dx @fj0 @fi jD1
@fj0 dx @fi
where
Xn
d @'j @2 'j @2 'j 0
D C f :
dx @fi @x@fi i DkC1 @fi @fi1 i1
1
The derivatives @'j =@fi can be obtained by differentiating the constraints (9.29) with
respect to fi . Writing the constraints more explicitly as
we have
The above equations represent (for each fixed index i) a set of k linear algebraic
equation
with respect to the derivatives @'j1 =@fi . Introducing the k k matrix D D
Djj1 D @Gj =@fj1 of the derivatives and assuming that its determinant is not equal
to zero,4 these equations can be solved using the inverse matrix D1 . This way, at
least formally, the derivatives @'j1 =@fi can be obtained
@'j1 Xk
1 @Gj
D D j1 j : (9.51)
@fi jD1
@fi
and
!
@F d @F X
k
@Gj1
C j1 .x/ D 0; j D 1; : : : ; k: (9.54)
@fj dx @fj0 j1 D1
@fj
It can be seen now that if we solve Eq. (9.54) with respect to the Lagrange
multipliers and substitute this result into the other Eq. (9.53), then we shall obtain
correctly Eq. (9.49). Indeed, Eq. (9.54) can be rewritten as
" !#
X
k
@F d @F
Dj1 j j1 .x/ D :
j1 D1
@f j dx @fj0
Notice that here we have the matrix D transposed. Therefore, when solving for j1 ,
we also transpose the inverse matrix:
" !#
X
k
1 @F d @F
j1 .x/ D D jj1 :
jD1
@fj dx @fj0
@.G1 ;:::;Gk /
4
This matrix represents the Jacobian D D @.f1 ;:::;fk /
: As was mentioned in the previous footnote
above, we must assume that det D ¤ 0.
634 9 Calculus of Variations
Making use of Eq. (9.51), we see that the expression in the curly brackets is in
fact @'j =@fi . Therefore, comparing with the original form (9.49) of the Euler’s
equation, we conclude that it has been recovered. This means that Eqs. (9.53)
and (9.54) are completely equivalent to it.
We shall now rewrite these equations in a more convenient form by introducing
an auxiliary function
X
k
HDFC j .x/Gj : (9.55)
jD1
and
!
@H d @H
D 0; j D 1; : : : ; k: (9.57)
@fj dx @fj0
We conclude that the same equations are obtained for all functions fl with l D
1; : : : ; n. In other words, the problem with constraints is completely equivalent to
the problem without constraints but applied to the auxiliary function (9.55). This
finally proves the required general statement in the case of an arbitrary holonomic
constraints.
It is clear now that if the constraints contain also derivatives of the functions we
are seeking, then these equations are differential equations and hence expressing
some of the functions via the others may be much more complicated. Still, it can be
shown that the method of Lagrange multipliers is applicable in this case as well.
Suppose now that our constraints have the form of the integral:
Z b
Gi x; f1 ; : : : ; fn ; f10 ; : : : ; fn0 dx D gi ; i D 1; : : : ; k: (9.58)
a
These are functions resulting from the upper limit in the integral. Since at the
upper limit, x1 D b, we should reproduce the constraints themselves, we must have
yi .b/ D gi . Obviously, by construction,
y0i Gi x; f1 ; : : : ; fn ; f10 ; : : : ; fn0 D 0; i D 1; : : : ; k:
which does depend only on the derivatives of the newly added functions yi .x/. The
Euler’s equations in this case are
!
@H d @H
D 0; j D 1; : : : ; n; (9.61)
@fj dx @fj0
and
@H d @H
D 0; i D 1; : : : ; k: (9.62)
@yi dx @y0i
Since the function H does not actually depend on the functions yi themselves, the
second equation gives
@H
D i .x/ D Ci ; i D 1; : : : ; k;
@y0i
i.e., the Lagrange multipliers must be constants, i.e., they cannot depend on x.
Moreover, when writing the Euler’s equations (9.61) for the functions fj , the
derivatives of yi in H of Eq. (9.60) do not contribute and, therefore, the function
H can be built simply as
X
k
HDFC i Gi (9.63)
iD1
Therefore, the total potential energy of the rope due to gravity will be proportional
to (we can omit the irrelevant prefactor g):
Z x0 q
UD y 1 C .y0 /2 dx: (9.64)
0
We have to minimise U with respect to the shape y.x/ of the rope subject to the
condition that the length of the rope
Z x0 q
lD 1 C .y0 /2 dx (9.65)
0
is fixed and equal to l. Following the recipe discussed above, we construct the
auxiliary function
q q q
H y; y0 D y 1 C .y0 /2 C 1 C .y0 /2 D .y C / 1 C .y0 /2 :
Since it depends only on y and its derivative, the Euler’s equation yields
q
@H .y0 /2
H y0 D C1 H) .y C / 1 C .y0 /2 .y C / q D C1 :
@y0
1 C .y0 /2
q
Multiplying both sides by 1 C .y0 /2 and simplifying, we obtain
q q
yC dy 1
1 C .y0 /2 D H) D˙ .y C /2 C12 ; (9.66)
C1 dx C1
where the zero bottom limits in both integrals take account of the fact that .0; 0/ is
the starting point of the line. The integral in the left-hand side by the substitution
y1 C D C1 t is manipulated into the inverse hyperbolic cosine integral, the integral
in the right-hand side is straightforward, so that we obtain
Z .yC/=C1
dt x 1 yC 1 x
p D H) cosh cosh D ;
=C1 2
t 1 C1 C1 C1 C1
638 9 Calculus of Variations
which in turn means that the expression in the square brackets is zero:
x0 x0 x0
C cosh1 D0 H) D C1 cosh D C1 cosh :
2C1 C1 2C1 2C1
We now need to satisfy the condition related to the length of the curve to obtain C1 .
We have
Z x0 =2 q Z x0 =2 Z x0 =2
l 0 2 yC x x0 =2
D 1 C .y / dx D dx D cosh dx
2 0 0 C1 0 C1
x0
D C1 sinh :
2C1
It is easy to verify that this transcendental equation always has a solution for C1 .
The shape of the curve (9.68) is fully defined. In fact, this formula is valid for the
whole range 0 x x0 .
Problem 9.10. Consider a line of length l fixed at two points A.x0 ; 0/ and
B.x0 ; 0/ (where l > 2x0 ). Determine
Rx the shape y D y.x/ of the curve which
gives the largest area A D x0 0 y.x/dx under it, i.e., between the curve and the
x axis (it may be assumed that y 0).
(continued)
9.2 Functions of Many Variables 639
This problem also shows that the largest possible area made by a close-looped
line of a fixed length l is a circle of radius R D l=2 , something which can easily
be accepted intuitively.
So far we have considered the case where the functional L depends on one or more
functions (and their derivatives) which depend on a single variable x; consequently,
the functional L has the form of a single integral with respect to x. In some
applications it is necessary to find the minimum (maximum) of a functional which
is written as a multiple integral over some multidimensional region and contains an
unknown function f (and its partial derivatives) defined in this region, i.e., we have
to deal with the function of more than one variable.
For instance, consider a closed curve L in the 3D space. This curve could be
a boundary to many curvilinear surfaces S which may have different surface areas
(Sect. I.6.4.2)
Z Z q
2 2
AD 1 C z0x C z0y dxdy:
†
Here the surface S is specified as z D z.x; y/, and † is the projection of S on the
x y plane. It is then legitimate to ask the question of what is the surface that has the
minimum possible surface area A. For instance, if the contour L is planar (e.g. lies
in the x y plane), then the minimum of the area is achieved by the planar surface
enclosed by the contour L. However, if L is not planar, the answer is not so obvious.
We shall first consider the following functional:
Z Z
L Œf D F x; y; f ; fx0 ; fy0 dxdy: (9.69)
D
function f has definite values on the whole boundary L of the region D, and is
continuous everywhere in D together with its first derivatives fx0 and fy0 .
The variation ıL in this case is constructed exactly in the same way as for a
function of a single variable using the function L.˛/ D L Œf C ˛ıf , where ıf is
a fixed function of two variables such that on the boundary L it is zero, ıf .L/ D
ı f .x; y/jL D 0. Then,
ˇ
ıL D L0 .˛/ˇ˛D0 D L0 .0/;
Theorem 9.2. Let .x; y/ be any function which is zero at the boundary L of some
2D region D, and is continuous together with its first derivatives 0x and 0y within
the whole region D including its boundary. If f .x; y/ is a continuous function in D
and for any such .x; y/ the integral
Z Z
f .x; y/.x; y/dxdy D 0;
D
then f .x; y/ D 0.
Next, we define the function .x; y/ in the following way: it is zero at the boundary
of the same circle and beyond it, while inside the circle it is defined as
h i2
.x; y/ D .x x0 /2 C .y y0 /2 2
:
It is easy to see that this function is indeed zero at the boundary of the circle and
hence is continuous everywhere in D. Its first partial derivatives behave similarly.
Indeed, within the circle
h i
0x D 4 .x x0 / .x x0 /2 C .y y0 /2 2 ;
and hence it is zero at its boundary; it continues to be zero beyond the circle by
construction. Hence, defined in that manner satisfies the conditions of the theorem.
At the same time, the surface integral
Z Z Z Z
f .x; y/.x; y/dxdy D f .x; y/.x; y/dxdy > 0;
D U
i.e., it is not zero as both functions are positive within the circle U . Therefore, our
assumption has been proven wrong. Q.E.D.
9.2 Functions of Many Variables 641
Using this theorem, we can derive an appropriate partial differential equation for
the function f .x; y/ corresponding to the optimum of the functional (9.69). Indeed,
the variation
Z Z
d 0 0
ıL D F x; y; f C ˛ıf ; .f C ˛ıf /x ; .f C ˛ıf /y dxdy
d˛ D ˛D0
Z Z
d
D F x; y; f C ˛ıf ; fx0 C ˛ .ıf /0x ; fy0 C ˛ .ıf /0y dxdy
d˛ D ˛D0
Z Z " #
@F @F @F
D ıf C 0 .ıf /0x C 0 .ıf /0y dxdy
D @f @fx @fy
Z Z Z Z " #
@F @F 0 @F 0
D ıf dxdy C 0
.ıf /x C 0 .ıf /y dxdy: (9.70)
D @f D @fx @fy
Similarly,
! !
@F @ @F @ @F
.ıf /0y 0
D ıf ıf :
@fy @y @fy0 @y @fy0
Using these expressions to replace the two terms within the square brackets in
Eq. (9.70), we obtain
Z Z " !#
@F @ @F @ @F
ıL D ıf dxdy
D @f @x @fx0 @y @fy0
Z Z " !#
@ @F @ @F
C ıf 0 ıf dxdy:
D @x @fx0 @y @fy
The second integral is zero. Indeed, it can be handled, e.g., by means of the Green’s
formula, Sect. I.6.3.3, with the following choice of the two functions Q and P:
@F @F
Q.x; y/ D ıf and P.x; y/ D ıf :
@fx0 @fy0
642 9 Calculus of Variations
RR @Q @P
Then, according to the Green’s formula, the double integral D @x
@y
dxdy
equals the line integral
I I !
@F @F
Pdx C Qdy D ıf 0 dx C 0 dy
L L @fy @fx
At the stationary point ıL D 0. As this is to happen for any variation ıf , then based
on Theorem 9.2 we arrive at the following equation the function f .x; y/ must satisfy:
!
@F @ @F @ @F
D 0: (9.71)
@f @x @fx0 @y @fy0
This result was first obtained by Ostrogradsky and bears his name.
A generalisation to functions of more variables can easily be done. First of all,
we need the necessary generalisation of Theorem 9.2 for the case of n dimensions
(n 2). This is done by introducing
" n #2
X
.x1 ; : : : ; xn / D .xi xi0 /2 2
iD1
within a small vicinity U of the point .x10 ; : : : ; xn0 / where we assume that the
function f .x10 ; : : : ; xn0 / > 0. The rest of the proof remains exactly the same.
Before going to a general n-dimensional case, let us next consider the case of
three dimensions. The functional is a volume integral:
Z Z Z
LD F x; y; z; f ; fx0 ; fy0 ; fz0 dxdydz; (9.72)
D
Z Z Z " #
@F @F 0 @F 0 @F 0
ıL D ıf C 0 .ıf /x C 0 .ıf /y C 0 .ıf /z dxdydz
D @f @fx @fy @fz
Z Z Z " ! #
@F @ @F @ @F @ @F
D ıf dxdydz
D @f @x @fx0 @y @fy0 @z @fz0
Z Z Z " ! #
@ @F @ @F @ @F
C ıf C ıf C ıf dxdydz:
D @x @fx0 @y @fy0 @z @fz0
This time, to transform the last integral into the integral over the surface boundary
S of D, we notice that the integrand
there
can be thought of as a divergence of the
vector field Fıf , where F D Fx ; Fy ; Fz is the vector field with the components
@F @F @F
Fx D ; Fy D ; Fz D ;
@fx0 @fy0 @fz0
and the necessary condition ıL D 0 leads to the required generalisation of Eq. (9.71)
to three dimensions:
!
@F @ @F @ @F @ @F
D 0: (9.73)
@f @x @fx0 @y @fy0 @z @fz0
Show that in this case the stationary value of the functional is given by the
function f .x; y/ which satisfies the equation:
! ! !
@F @ @F @ @F @2 @F @2 @F @2 @F
C C C 2 D 0:
@f @x @fx0 @y 0
@fy 2
@x @fxx00 @x@y 00
@fxy @y @fyy00
(9.75)
9.3 Applications in Physics 645
9.3.1 Mechanics
Here it is assumed that the coordinates are fixed at the initial and final times. The
Lagrange function is constructed as a difference between the system’s kinetic, K,
and potential, U, energies, L D K U. Note that previously L was denoting
the whole functional; here it is denoted S while the integrand is L. These are the
notations widely accepted in physics literature for the action and the Lagrangian,
respectively, and hence we shall stick with them here.
5
After Joseph-Louis Lagrange.
646 9 Calculus of Variations
Gi .q1 ; : : : ; qn ; t/ D 0; i D 1; : : : k;
X
k
H DLC i .t/Gi
iD1
with i .t/ being the corresponding Lagrange multipliers, and hence the correspond-
ing Lagrange equations would contain an additional term due to these constraints:
@L X @Gj
k
d @L
D C j ; i D 1; : : : ; n: (9.78)
dt @Pqi @qi jD1
@qi
The terms in the right-hand side serve as forces acting on the coordinate qi ; the
first term is a real (physical) force, while the second is an artificial force due to the
constraints.
As our first example of application of the Lagrange equations, let us consider
a single particle of mass m moving under the external potential U.r/. Here
r D .x; y; z/ are the three Cartesian coordinates describing the system. In this case
there is no need for special generalised coordinates, the Cartesian coordinates will
do. The Lagrangian is
1 2 1
LD mPr U.r/ D m xP 2 C yP 2 C zP2 U .x; y; z/ ;
2 2
leading to three Lagrange equations for each Cartesian component, e.g., one of
them,
@L d @L @U d @U
D 0 H) .mPx/ D 0 H) mRx D 0;
@x dt @Px @x dt @x
is nothing but Newton’s second equation of motion projected on the x axis, since
@U
@x
is the x component of the force, Fx .
In our second example, consider a pendulum of length l with a particle of mass m hanging at its end, Fig. 9.4(a). The Cartesian coordinates of the mass are $x = l\sin\alpha$ and $y = -l\cos\alpha$.

Fig. 9.4 Pendulum problems: (a) a single pendulum; (b) a double pendulum and (c) a pendulum with an oscillating support.

However, it is clear that only a single coordinate is needed to adequately describe the oscillations of the mass. The most convenient choice here is the angle α of the pendulum with the y (vertical) axis. Hence, choosing α as the required independent generalised coordinate, we have to express both the potential and kinetic energies via it. The potential energy is straightforward: $U = mgy = -mgl\cos\alpha$. The kinetic energy is most easily worked out starting from the Cartesian coordinates:

$$
K = \frac{1}{2}m\left(\dot x^2 + \dot y^2\right) = \frac{1}{2}m\left[\left(l\dot\alpha\cos\alpha\right)^2 + \left(l\dot\alpha\sin\alpha\right)^2\right] = \frac{1}{2}ml^2\dot\alpha^2 ,
$$

so that the Lagrangian becomes

$$
L = \frac{1}{2}ml^2\dot\alpha^2 + mgl\cos\alpha .
$$

The required equation of motion is obtained from the Lagrange equation:

$$
\frac{\partial L}{\partial\alpha} - \frac{d}{dt}\frac{\partial L}{\partial\dot\alpha} = 0\quad\Longrightarrow\quad -mgl\sin\alpha - \frac{d}{dt}\left(ml^2\dot\alpha\right) = 0\quad\Longrightarrow\quad \ddot\alpha + \frac{g}{l}\sin\alpha = 0 .
$$
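The pendulum equation just derived is easily explored numerically; here is a minimal sketch (assuming numpy and scipy are available; the parameter values are purely illustrative) which integrates it and confirms that the energy $E = \frac{1}{2}ml^2\dot\alpha^2 - mgl\cos\alpha$ stays constant along the trajectory:

# Integrate alpha'' = -(g/l) sin(alpha) and monitor the energy (a sketch).
import numpy as np
from scipy.integrate import solve_ivp

g, l, m = 9.81, 1.0, 1.0

def rhs(t, y):
    alpha, alpha_dot = y
    return [alpha_dot, -(g / l) * np.sin(alpha)]

sol = solve_ivp(rhs, (0.0, 10.0), [0.5, 0.0], rtol=1e-10, atol=1e-10)
alpha, alpha_dot = sol.y
E = 0.5 * m * l**2 * alpha_dot**2 - m * g * l * np.cos(alpha)
print("energy drift:", E.max() - E.min())   # tiny: E is conserved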
For the pendulum with an oscillating support shown in Fig. 9.4(c), in which the pivot of mass $m_1$ slides horizontally (its displacement denoted x) and is held by a spring of elastic constant k, while a bob of mass $m_2$ hangs from it on a rod of length l, a similar calculation gives

$$
L = \frac{1}{2}\left(m_1+m_2\right)\dot x^2 + \frac{1}{2}m_2 l^2\dot\alpha^2 + m_2 l\,\dot x\,\dot\alpha\cos\alpha - \frac{1}{2}kx^2 + m_2 gl\cos\alpha ,
$$

while the corresponding equations of motion read:

$$
l\ddot\alpha + \ddot x\cos\alpha + g\sin\alpha = 0
$$

and

$$
\ddot x + \frac{m_2 l}{m_1+m_2}\left(\ddot\alpha\cos\alpha - \dot\alpha^2\sin\alpha\right) + \frac{k}{m_1+m_2}\,x = 0 .
$$
Problem 9.14. A ball of mass m is set on a horizontal rod along which it can slide without friction; at the same time, the ball is also attached by a spring with the elastic constant k to a vertical axis, see Fig. 9.5. The rod rotates with the angular frequency ω around the vertical axis. Show that the Lagrangian in this case is

$$
L = \frac{1}{2}m\left(\dot r^2 + \omega^2 r^2\right) - \frac{1}{2}k\left(r - r_0\right)^2 ,
$$

where r is the distance of the ball from the axis and $r_0$ corresponds to the undeformed spring.
For an isolated system⁶ the Lagrangian does not depend explicitly on time, and hence its total time derivative is

$$
\frac{dL}{dt} = \sum_{i=1}^{n}\left(\frac{\partial L}{\partial q_i}\,\dot q_i + \frac{\partial L}{\partial\dot q_i}\,\ddot q_i\right) = \sum_{i=1}^{n}\left[\left(\frac{d}{dt}\frac{\partial L}{\partial\dot q_i}\right)\dot q_i + \frac{\partial L}{\partial\dot q_i}\,\ddot q_i\right].
$$

Here we replaced the derivative of L with respect to the coordinate $q_i$ by the time derivative term coming from the Lagrange equations (9.77). Then we notice that the expression in the square brackets is nothing but the total time derivative of $\dot q_i\left(\partial L/\partial\dot q_i\right)$, so that

$$
\frac{dL}{dt} = \sum_{i=1}^{n}\frac{d}{dt}\left(\dot q_i\,\frac{\partial L}{\partial\dot q_i}\right)\quad\Longrightarrow\quad \frac{d}{dt}\left(\sum_{i=1}^{n}\dot q_i\,\frac{\partial L}{\partial\dot q_i} - L\right) = 0 .
$$

Hence the time derivative of the quantity

$$
E = \sum_{i=1}^{n}\dot q_i\,\frac{\partial L}{\partial\dot q_i} - L
\tag{9.79}
$$
⁶ Our consideration is also valid for systems in an external field which does not depend on time.
is equal to zero, i.e., it must be conserved in time for an isolated system. This quantity is called the system energy. Indeed, since the kinetic energy K may depend on both generalised coordinates and their velocities, while the potential energy U only depends on the coordinates, we have

$$
E = \sum_{i=1}^{n}\dot q_i\,\frac{\partial(K-U)}{\partial\dot q_i} - (K-U) = \sum_{i=1}^{n}\dot q_i\,\frac{\partial K}{\partial\dot q_i} - K + U .
$$

Writing the kinetic energy as a quadratic form of the generalised velocities,

$$
K = \frac{1}{2}\sum_{i,j=1}^{n}\alpha_{ij}\,\dot q_i\,\dot q_j ,
$$

with the coefficients $\alpha_{ij}$ forming an $n\times n$ matrix⁷ (which generally may depend on the coordinates), we can write

$$
\sum_{i=1}^{n}\dot q_i\,\frac{\partial K}{\partial\dot q_i} = \sum_{i=1}^{n}\left(\sum_{j=1}^{n}\alpha_{ij}\,\dot q_j\right)\dot q_i = 2K ,
$$

so that $E = 2K - (K - U) = K + U$: the conserved quantity is indeed the sum of the kinetic and potential energies.

⁷ Of course, the matrix can always be chosen symmetric in the quadratic form.
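This identity is easy to verify symbolically for the pendulum considered above; a small sketch assuming sympy:

# Check Eq. (9.79) for the pendulum: E = q' dL/dq' - L must equal K + U.
import sympy as sp

m, g, l = sp.symbols('m g l', positive=True)
alpha, alpha_dot = sp.symbols('alpha alpha_dot')

K = sp.Rational(1, 2) * m * l**2 * alpha_dot**2    # kinetic energy
U = -m * g * l * sp.cos(alpha)                     # potential energy
L = K - U

E = alpha_dot * sp.diff(L, alpha_dot) - L          # Eq. (9.79)
print(sp.simplify(E - (K + U)))                    # prints 0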
Conservation of momentum follows in a similar way from the homogeneity of space. For an isolated system of N particles, a translation of the system as a whole, $\mathbf r_i \to \mathbf r_i + \delta\boldsymbol\varepsilon$, must leave the Lagrangian unchanged, whence

$$
\delta L = \sum_{i=1}^{N}\frac{\partial L}{\partial\mathbf r_i}\cdot\delta\boldsymbol\varepsilon = 0\quad\Longrightarrow\quad \sum_{i=1}^{N}\frac{\partial L}{\partial x_i} = \frac{d}{dt}\sum_{i=1}^{N}\frac{\partial L}{\partial\dot x_i} = 0 ,
$$

and similarly for the y and z components. Here again the Lagrange equations (9.77) were used. Hence the vector P with the Cartesian components

$$
P_x = \sum_{i=1}^{N}\frac{\partial L}{\partial\dot x_i} = \sum_{i=1}^{N}\frac{\partial L}{\partial v_{ix}} ,\qquad P_y = \sum_{i=1}^{N}\frac{\partial L}{\partial\dot y_i} = \sum_{i=1}^{N}\frac{\partial L}{\partial v_{iy}} ,\qquad P_z = \sum_{i=1}^{N}\frac{\partial L}{\partial\dot z_i} = \sum_{i=1}^{N}\frac{\partial L}{\partial v_{iz}} ,
\tag{9.80}
$$

is conserved,
or, in vector form,

$$
\mathbf P = \sum_{i=1}^{N}\frac{\partial L}{\partial\mathbf v_i} .
$$

Since the kinetic energy of the particles is

$$
K = \frac{1}{2}\sum_{i=1}^{N}m_i\mathbf v_i^2
$$

and the potential energy does not depend on the velocities, this gives $\mathbf P = \sum_i m_i\mathbf v_i$, the familiar total momentum of the system.
Finally, consider a rotation of the system as a whole by a small angle δφ about an axis along the unit vector s. Under such a rotation each position vector changes as

$$
\delta\mathbf r_i = \delta\varphi\left[\mathbf s\times\mathbf r_i\right],
\tag{9.81}
$$

as can be seen from Fig. 9.6. Indeed, the vector r in the figure, which makes an angle θ with the axis of rotation, defines a plane perpendicular to the axis when rotated by the angle δφ to r′. The projections of both r and r′ on that plane both have the length $r\sin\theta$, and the angle between them is exactly δφ. For small such angles the length of the difference, $\delta\mathbf r = \mathbf r' - \mathbf r$, is $\delta r = \left(r\sin\theta\right)\delta\varphi = \left|\mathbf s\times\mathbf r\right|\delta\varphi$, as $|\mathbf s| = 1$. Taking into account the directions of the vectors and the definition of the vector product of the two vectors, we arrive at Eq. (9.81) written above.

Correspondingly, as the system rotates, the velocities $\mathbf v_i$ of its particles change as well:

$$
\delta\mathbf v_i = \delta\,\frac{d\mathbf r_i}{dt} = \frac{d}{dt}\left(\delta\mathbf r_i\right) = \delta\varphi\left[\mathbf s\times\frac{d\mathbf r_i}{dt}\right] = \delta\varphi\left[\mathbf s\times\mathbf v_i\right].
$$

Therefore, the total variation of the Lagrangian due to rotation of the whole system by δφ around the axis given by the unit vector s is (the sum over α corresponds to the summation over the Cartesian components):

$$
\delta L = \sum_{i=1}^{N}\sum_{\alpha}\left(\frac{\partial L}{\partial r_{i\alpha}}\,\delta r_{i\alpha} + \frac{\partial L}{\partial v_{i\alpha}}\,\delta v_{i\alpha}\right) = \sum_{i=1}^{N}\left(\frac{\partial L}{\partial\mathbf r_i}\cdot\delta\mathbf r_i + \frac{\partial L}{\partial\mathbf v_i}\cdot\delta\mathbf v_i\right).
$$

Using the Lagrange equations, $\partial L/\partial\mathbf r_i = d\mathbf p_i/dt$ with the momenta $\mathbf p_i = \partial L/\partial\mathbf v_i$, this variation becomes a total time derivative:

$$
\delta L = \frac{d}{dt}\sum_{i=1}^{N}\mathbf p_i\cdot\delta\mathbf r_i = \frac{d}{dt}\sum_{i=1}^{N}\mathbf p_i\cdot\delta\varphi\left[\mathbf s\times\mathbf r_i\right] = \delta\varphi\,\frac{d}{dt}\sum_{i=1}^{N}\mathbf p_i\cdot\left[\mathbf s\times\mathbf r_i\right].
$$

The expression under the sum contains the triple product $\left(\mathbf p_i,\mathbf s,\mathbf r_i\right)$, which is invariant under a cyclic permutation of its components. Therefore, we can write

$$
\delta L = \delta\varphi\,\frac{d}{dt}\sum_{i=1}^{N}\left(\mathbf p_i,\mathbf s,\mathbf r_i\right) = \delta\varphi\,\frac{d}{dt}\sum_{i=1}^{N}\left(\mathbf s,\mathbf r_i,\mathbf p_i\right) = \delta\varphi\,\frac{d}{dt}\left(\mathbf s\cdot\sum_{i=1}^{N}\left[\mathbf r_i\times\mathbf p_i\right]\right).
$$

Since the Lagrangian must not change, δL = 0, and the direction s of the axis is arbitrary, the expression in the round brackets above,

$$
\mathbf M = \sum_{i=1}^{N}\left[\mathbf r_i\times\mathbf p_i\right],
$$

must be conserved. This is the familiar expression for the angular momentum of the system.
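As a numerical illustration (a sketch assuming numpy and scipy; the central potential U = −1/r and the initial conditions are our choice), one can integrate a planar orbit and watch the z component of M = r × p stay constant:

# Motion in a central field: M = x*vy - y*vx (with m = 1) must be
# conserved along the orbit.
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, s):
    x, y, vx, vy = s
    r3 = (x * x + y * y) ** 1.5
    return [vx, vy, -x / r3, -y / r3]

sol = solve_ivp(rhs, (0.0, 50.0), [1.0, 0.0, 0.0, 1.2],
                rtol=1e-10, atol=1e-10)
x, y, vx, vy = sol.y
M = x * vy - y * vx
print("angular momentum drift:", M.max() - M.min())   # tiny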
So far we have discussed various applications of Euler's equations in mechanics. Let us now consider some applications of Ostrogradsky's equation (9.74) for functions of more than one variable. The simplest example is related to the oscillations of a string considered in Sect. 8.2.1. Consider a string of length l
which is set along the x axis with the tension $T_0$. Each element dx of the string oscillates in the perpendicular direction, and its vertical displacement is described by the function u(x, t). To construct the equation of motion for the string, we need to write its Lagrangian, i.e., the difference of the kinetic and potential energy terms. If ρ is the density (per unit length) of the string, the kinetic energy of its piece of length dx will obviously be

$$
dK = \frac{1}{2}\rho\,\dot u^2\,dx = \frac{1}{2}\rho\left(u'_t\right)^2 dx .
$$

The potential energy of the dx element of the string can be written as the work of the tension force $T_0$ needed to stretch it from the length dx (when the string is strictly horizontal, i.e., when not oscillating) to $ds = \sqrt{(dx)^2 + (du)^2} = \sqrt{1 + \left(u'_x\right)^2}\,dx$, i.e.,

$$
dU = T_0\left(\sqrt{1+\left(u'_x\right)^2} - 1\right)dx \simeq T_0\left(1 + \frac{1}{2}\left(u'_x\right)^2 - 1\right)dx = \frac{1}{2}T_0\left(u'_x\right)^2 dx ,
$$

where we assumed that the deformation of the string is small and hence expanded the square root term, keeping only the first two terms. Assuming that there is also an external force F (per unit length) acting on the element dx, there will be an additional term $-(F\,dx)\,u$ in the potential energy. Integrating these expressions along the whole string gives the total kinetic and potential energies, respectively, so that the total Lagrangian of the whole string is

$$
L = \int_0^l\left[\frac{1}{2}\rho\left(u'_t\right)^2 - \frac{1}{2}T_0\left(u'_x\right)^2 + Fu\right]dx .
\tag{9.82}
$$
The action to be optimised is then given by the double integral

$$
S = \int_{t_1}^{t_2}L\,dt = \int_{t_1}^{t_2}dt\int_0^l dx\left[\frac{1}{2}\rho\left(u'_t\right)^2 - \frac{1}{2}T_0\left(u'_x\right)^2 + Fu\right] = \int_{t_1}^{t_2}dt\int_0^l dx\,\mathcal F\left(u,u'_t,u'_x\right),
$$

while the corresponding equation of motion for the transverse displacement is Ostrogradsky's equation (9.74):

$$
\frac{\partial\mathcal F}{\partial u} - \frac{\partial}{\partial t}\frac{\partial\mathcal F}{\partial u'_t} - \frac{\partial}{\partial x}\frac{\partial\mathcal F}{\partial u'_x} = 0 .
$$

Substituting here the density $\mathcal F$ written above, we obtain

$$
F - \rho\,\frac{\partial^2 u}{\partial t^2} + T_0\,\frac{\partial^2 u}{\partial x^2} = 0\quad\Longleftrightarrow\quad \rho\,\frac{\partial^2 u}{\partial t^2} = T_0\,\frac{\partial^2 u}{\partial x^2} + F ,
$$

which is exactly the wave equation of the string of Sect. 8.2.1.

Problem 9.15. Show that transverse oscillations of an elastic rod, for which the potential energy density is proportional to the square of the second derivative $u''_{xx}$ rather than of $u'_x$, are described by an equation of the fourth order in the spatial derivative,

$$
\frac{\partial^2 u}{\partial t^2} = -\alpha\,\frac{\partial^4 u}{\partial x^4} ,
$$

with a positive constant α. [Hint: use Eq. (9.75).]
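The derivation of the string equation can also be checked symbolically; a minimal sketch assuming sympy, whose euler_equations routine implements precisely Ostrogradsky's equation (9.74):

# Derive the string equation from the Lagrangian density
# F = rho*u_t^2/2 - T0*u_x^2/2 + Fext*u.
from sympy import Function, Rational, symbols
from sympy.calculus.euler import euler_equations

t, x = symbols('t x')
rho, T0, Fext = symbols('rho T_0 F_ext', positive=True)
u = Function('u')(t, x)

Ldens = (Rational(1, 2) * rho * u.diff(t)**2
         - Rational(1, 2) * T0 * u.diff(x)**2 + Fext * u)
print(euler_equations(Ldens, u, [t, x]))
# [Eq(F_ext - rho*Derivative(u,(t,2)) + T_0*Derivative(u,(x,2)), 0)],
# i.e. rho*u_tt = T0*u_xx + F, the wave equation obtained above.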
A convenient way of writing variations is provided by the functional derivative $\delta L/\delta f(x)$, defined via

$$
\delta L = \int\frac{\delta L}{\delta f(x)}\,\delta f(x)\,dx .
$$

For instance, consider $L[f] = f(x)^n$. Using our previous method, we obtain

$$
\delta L = L'(0) = \lim_{\alpha\to 0}\frac{\left(f+\alpha\,\delta f\right)^n - f^n}{\alpha} = n f(x)^{n-1}\,\delta f(x).
$$

It is easy to see that the same result is obtained using the functional derivative if we define

$$
\frac{\delta F\left[f(x)\right]}{\delta f(x')} = \delta\left(x-x'\right)\frac{dF}{df} .
\tag{9.85}
$$
Consider now a functional given by an integral, $L[f] = \int_a^b F\left(x,f,f'\right)dx$, whose variation is

$$
\delta L = \int_a^b\left[\frac{\partial F}{\partial f}\,\delta f(x) + \frac{\partial F}{\partial f'}\,\delta f'(x)\right]dx .
$$

Calculating the integral of the second term within the square brackets by parts and using the fact that $\delta f(a) = \delta f(b) = 0$, we obtain

$$
\delta L = \int_a^b dx\left[\frac{\partial F}{\partial f}\,\delta f(x) - \frac{d}{dx}\left(\frac{\partial F}{\partial f'}\right)\delta f(x)\right] = \int_a^b dx\left[\frac{\partial F}{\partial f} - \frac{d}{dx}\frac{\partial F}{\partial f'}\right]\delta f(x),
$$

which is the same expression as before, see Eq. (9.8). Hence, in the case of a functional L given by an integral, the functional derivative $\delta L/\delta f(x)$ is given by the expression in the square brackets above; it does not contain a delta function:

$$
\frac{\delta L}{\delta f(x)} = \frac{\delta}{\delta f(x)}\int_a^b F\left(x,f,f'\right)dx = \frac{\partial F}{\partial f} - \frac{d}{dx}\frac{\partial F}{\partial f'} .
\tag{9.86}
$$
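Equation (9.86) can also be verified numerically; a sketch assuming numpy, for $L[f]=\int_0^1 f'^2\,dx$ with $f=\sin\pi x$, where (9.86) gives $\delta L/\delta f = -2f'' = 2\pi^2\sin\pi x$ (a unit bump at a grid point $x_j$ approximates $\delta(x-x_j)$ with weight $1/h$):

# Compare (L[f + eps*e_j] - L[f - eps*e_j])/(2*eps*h) with -2 f''(x_j).
import numpy as np

N, eps = 400, 1e-6
x = np.linspace(0.0, 1.0, N + 1)
h = x[1] - x[0]
f = np.sin(np.pi * x)

def L(f):
    g2 = np.gradient(f, h) ** 2
    return h * (0.5 * g2[0] + g2[1:-1].sum() + 0.5 * g2[-1])  # trapezoid

j = N // 3                                  # an interior grid point
fp, fm = f.copy(), f.copy()
fp[j] += eps
fm[j] -= eps
numeric = (L(fp) - L(fm)) / (2 * eps * h)
exact = 2 * np.pi**2 * np.sin(np.pi * x[j])
print(numeric, exact)                       # the two values agree closely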
Let us work out a useful relationship involving functional derivatives. Let L be a functional of F (and x), which in turn is a functional of a function g = g(x) (and x), i.e., $L = L\left[F[g(x)]\right]$. Then

$$
\delta L = \int\frac{\delta L}{\delta F(x')}\,\delta F(x')\,dx'\qquad\text{and also}\qquad \delta F(x') = \int\frac{\delta F(x')}{\delta g(x)}\,\delta g(x)\,dx ,
$$

so that

$$
\delta L = \iint\frac{\delta L}{\delta F(x')}\,\frac{\delta F(x')}{\delta g(x)}\,\delta g(x)\,dx\,dx' = \int\left[\int\frac{\delta L}{\delta F(x')}\,\frac{\delta F(x')}{\delta g(x)}\,dx'\right]\delta g(x)\,dx .
$$

On the other hand, the expression in the square brackets above must be the functional derivative δL/δg(x). Hence we have

$$
\frac{\delta L}{\delta g(x)} = \int\frac{\delta L}{\delta F(x')}\,\frac{\delta F(x')}{\delta g(x)}\,dx' .
\tag{9.87}
$$
In quantum mechanics the variational principle plays a central role. Consider a non-relativistic quantum system of N electrons moving in the field of atomic nuclei, e.g., a molecule. In a stationary state the system is characterised by its Hamiltonian operator, $\hat H = \hat K + \hat U$, which is a sum of the kinetic energy,

$$
\hat K = -\frac{1}{2}\sum_{i=1}^{N}\Delta_i ,
$$

and the potential energy,

$$
\hat U = \sum_{i=1}^{N}V_n\left(\mathbf r_i\right) + \frac{1}{2}\sum_{\substack{i,j=1\\(i\neq j)}}^{N}\frac{1}{\left|\mathbf r_i-\mathbf r_j\right|} ,
$$

operators. Above, in the kinetic energy operator, $\Delta_i$ is the Laplacian calculated with respect to the position $\mathbf r_i$ of the i-th electron; next, the first term in $\hat U$ describes the interaction of each electron with all atomic nuclei, while the second term stands for the repulsive electron–electron interaction.
The quantum state of the system is described by the Schrödinger equation

$$
\hat H\Psi = E\Psi ,
\tag{9.88}
$$

or, explicitly,

$$
-\frac{1}{2}\sum_{i=1}^{N}\Delta_i\Psi + \sum_{i=1}^{N}V_n\left(\mathbf r_i\right)\Psi + \frac{1}{2}\sum_{\substack{i,j=1\\(i\neq j)}}^{N}\frac{1}{\left|\mathbf r_i-\mathbf r_j\right|}\,\Psi = E\Psi ,
$$

where $\Psi = \Psi\left(\mathbf r_1,\dots,\mathbf r_N\right)$ is the wave function, normalised to unity:

$$
\int\cdots\int\left|\Psi\left(\mathbf r_1,\dots,\mathbf r_N\right)\right|^2 d\mathbf r_1\cdots d\mathbf r_N = 1 ,
\tag{9.89}
$$

where we used shorthand notation for the volume element of each particle: $d\mathbf r_i = dx_i\,dy_i\,dz_i$ with $\mathbf r_i = \left(x_i,y_i,z_i\right)$. The normalisation condition corresponds to the unit probability for the electrons to be found somewhere. Indeed, the probability to find the first electron between $\mathbf r_1$ and $\mathbf r_1 + d\mathbf r_1$, the second between $\mathbf r_2$ and $\mathbf r_2 + d\mathbf r_2$, and so on, is given simply by

$$
dP = \left|\Psi\left(\mathbf r_1,\dots,\mathbf r_N\right)\right|^2 d\mathbf r_1\cdots d\mathbf r_N ,
$$

so that the normalisation condition sums all these probabilities over the whole space for each particle, resulting in unity. To simplify our notation, we shall in the following use the vector R to designate all electronic coordinates $\mathbf r_1,\dots,\mathbf r_N$, while the corresponding product of all volume elements $d\mathbf r_1\cdots d\mathbf r_N$ will be written simply as $d\mathbf R$. Also, a single integral symbol will be used; of course, multiple integration is implied as above.
Finally, E in Eq. (9.88) is the total energy of the system. The lowest total energy, $E_0$, corresponds to the ground state with the wave function $\Psi_0$; assuming that the system is stable in its ground state, we have $E_0 < 0$. There could be more bound (stable) states characterising the system. Their energies $E_0$, $E_1$, $E_2$, etc., are negative and form a sequence $E_0 < E_1 < E_2 < \cdots$; the corresponding wave functions are $\Psi_0$, $\Psi_1$, $\Psi_2$, etc. The states with energies higher than the ground state energy $E_0$ correspond to the excited states of the system. The wave functions $\Psi_i$ of different states ($i = 0, 1, 2, \dots$) are normalised to unity and are mutually orthogonal,

$$
\int\Psi_i^*(\mathbf R)\,\Psi_j(\mathbf R)\,d\mathbf R = \delta_{ij} .
\tag{9.90}
$$
Multiplying both sides of Eq. (9.88) by $\Psi^*$ and integrating over all coordinates, we obtain

$$
E = \int\Psi^*\hat H\Psi\,d\mathbf R ,
$$

where the normalisation of the wave function, Eq. (9.89), was used. This gives an expression for the ground state energy directly via its ground state wave function. This expression has been obtained from the Schrödinger equation. However, an alternative way, which is frequently used, is to postulate the energy expression

$$
E = \int\Psi^*\hat H\Psi\,d\mathbf R
$$

as a functional of the wave function, $E = E\left[\Psi(\mathbf R)\right]$, and then find the best wave function $\Psi(\mathbf R)$ which minimises it subject to the normalisation condition $\int\Psi^*\Psi\,d\mathbf R = 1$. As usual, we define a function $E(\alpha) = E\left[\Psi + \alpha\,\delta\Psi\right]$ and the variation of the energy, $\delta E = E'(\alpha)\big|_{\alpha=0}$. The optimum condition for the wave function is established by requiring that δH = 0, where

$$
H = \int\Psi^*\hat H\Psi\,d\mathbf R - \lambda\left(\int\Psi^*\Psi\,d\mathbf R - 1\right)
$$

and λ is the Lagrange multiplier taking care of the normalisation. Varying $\Psi^*$ first (Ψ and $\Psi^*$ may be treated as independent), we find

$$
\delta H = \int\delta\Psi^*\left(\hat H\Psi - \lambda\Psi\right)d\mathbf R .
$$

Since $\delta\Psi^*$ is arbitrary, δH = 0 leads to the Schrödinger equation $\hat H\Psi - \lambda\Psi = 0$, with λ equal to the system energy, as above.
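The practical power of this formulation is that any restricted family of trial functions yields an upper bound for the ground state energy. A classic sketch (assuming sympy; atomic units): for the hydrogen atom with the trial function $\psi = e^{-ar}$ one finds $E(a) = a^2/2 - a$, minimised at $a = 1$ with $E_0 = -1/2$, here the exact answer:

# Variational estimate for hydrogen with psi = exp(-a r).
import sympy as sp

a, r = sp.symbols('a r', positive=True)
psi = sp.exp(-a * r)

# Radial matrix elements; the radial Laplacian is (1/r^2) d/dr (r^2 d/dr),
# and the volume element contributes r^2 (angular factors cancel).
norm = sp.integrate(psi**2 * r**2, (r, 0, sp.oo))
kin = sp.integrate(-sp.Rational(1, 2) * psi
                   * sp.diff(r**2 * sp.diff(psi, r), r), (r, 0, sp.oo))
pot = sp.integrate(-psi**2 * r, (r, 0, sp.oo))   # -1/r weighted by r^2

E = sp.simplify((kin + pot) / norm)     # a**2/2 - a
a0 = sp.solve(sp.diff(E, a), a)[0]      # a = 1
print(E, a0, E.subs(a, a0))             # a**2/2 - a, 1, -1/2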
The calculation is a bit lengthier if we vary Ψ instead:

$$
\delta H = \int\Psi^*\hat H\left(\delta\Psi\right)d\mathbf R - \lambda\int\Psi^*\left(\delta\Psi\right)d\mathbf R .
\tag{9.91}
$$

The Hamiltonian operator contains the kinetic energy term, which involves differentiation, and the potential energy operator, which is a multiplication operator. For such an operator one can easily place the function in front of the operator:

$$
\int\Psi^*\,\hat U\left(\delta\Psi\right)d\mathbf R = \int\left(\delta\Psi\right)\hat U\,\Psi^*\,d\mathbf R .
\tag{9.92}
$$
For the kinetic energy operator, consider a single term $\Delta_i$ and write

$$
\int\Psi^*\Delta_i\left(\delta\Psi\right)d\mathbf R = \int d\mathbf R'\left[\int d\mathbf r_i\,\Psi^*\Delta_i\left(\delta\Psi\right)\right],
\tag{9.93}
$$

where in the last passage we separated out the integration over the i-th electron from all the others, which was denoted by $d\mathbf R'$. Next, consider the triple integral over $d\mathbf r_i$ enclosed in the square brackets above:

$$
\int d\mathbf r_i\,\Psi^*\Delta_i\left(\delta\Psi\right) = \int dz_i\int dy_i\left[\int dx_i\,\Psi^*\frac{\partial^2}{\partial x_i^2}\left(\delta\Psi\right)\right] + \int dz_i\int dx_i\left[\int dy_i\,\Psi^*\frac{\partial^2}{\partial y_i^2}\left(\delta\Psi\right)\right] + \int dx_i\int dy_i\left[\int dz_i\,\Psi^*\frac{\partial^2}{\partial z_i^2}\left(\delta\Psi\right)\right].
\tag{9.94}
$$

Each of the integrals in the square brackets can be integrated twice by parts, e.g.:

$$
\int dx_i\,\Psi^*\frac{\partial^2}{\partial x_i^2}\left(\delta\Psi\right) = \left.\Psi^*\frac{\partial}{\partial x_i}\left(\delta\Psi\right)\right|_{-\infty}^{+\infty} - \int dx_i\,\frac{\partial\Psi^*}{\partial x_i}\,\frac{\partial}{\partial x_i}\left(\delta\Psi\right) = -\int dx_i\,\frac{\partial\Psi^*}{\partial x_i}\,\frac{\partial}{\partial x_i}\left(\delta\Psi\right)
$$

$$
= -\left.\frac{\partial\Psi^*}{\partial x_i}\left(\delta\Psi\right)\right|_{-\infty}^{+\infty} + \int dx_i\,\frac{\partial^2\Psi^*}{\partial x_i^2}\left(\delta\Psi\right) = \int dx_i\,\frac{\partial^2\Psi^*}{\partial x_i^2}\,\delta\Psi .
$$

Note that all free terms appearing while integrating by parts disappear, as we assume that the wave function goes to zero, together with its spatial derivatives, at infinity. Repeating this integration for the other two integrals in Eq. (9.94), we obtain⁸

$$
\int d\mathbf r_i\,\Psi^*\Delta_i\left(\delta\Psi\right) = \int d\mathbf r_i\left(\Delta_i\Psi^*\right)\delta\Psi ,
$$
⁸ When $\int\psi\,\hat A\varphi\,dx = \int\left(\hat A\psi\right)\varphi\,dx = \int\varphi\,\hat A\psi\,dx$, i.e., where effectively the functions ψ(x) and φ(x) are allowed to change places around the operator $\hat A$, the operator is called self-adjoint. We have just shown that the operator $d^2/dx^2$ is such an operator.
and hence the same can be done for the whole kinetic energy operator $\hat K = \sum_i\hat K_i$. Combining this result with the similar result for the potential energy operator, Eq. (9.92), we obtain for δH in (9.91):

$$
\delta H = \int\left(\delta\Psi\right)\hat H\Psi^*\,d\mathbf R - \lambda\int\left(\delta\Psi\right)\Psi^*\,d\mathbf R = \int\left(\delta\Psi\right)\left(\hat H\Psi^* - \lambda\Psi^*\right)d\mathbf R ,
$$

so that δH = 0 now yields $\hat H\Psi^* = \lambda\Psi^*$, the complex conjugate of the same Schrödinger equation.
As an instructive application of functional derivatives, consider the Thomas–Fermi (TF) model of a many-electron atom, in which the basic variable is not the wave function Ψ but the electron density

$$
\rho(\mathbf r) = N\int\left|\Psi\left(\mathbf r,\mathbf r_2,\dots,\mathbf r_N\right)\right|^2 d\mathbf r_2\cdots d\mathbf r_N .
$$

The factor N appears here because any of the N electrons can contribute to the density. The proposed TF functional has the form:

$$
E_{TF}\left[\rho(\mathbf r)\right] = C_F\int\rho(\mathbf r)^{5/3}\,d\mathbf r - Z\int\frac{\rho(\mathbf r)}{r}\,d\mathbf r + \frac{1}{2}\iint\frac{\rho(\mathbf r)\,\rho(\mathbf r')}{\left|\mathbf r-\mathbf r'\right|}\,d\mathbf r\,d\mathbf r' .
\tag{9.95}
$$

The first term here corresponds to the energy of a free uniform electron gas occupying the volume dr, $C_F$ being a constant prefactor. The second term corresponds to the attractive interaction of all the electrons of the density with the nucleus of charge Z. Finally, the last term describes the electron–electron interaction: it is simply the Coulomb interaction energy of the charge cloud of density ρ with itself. The energy functional must be supplemented by the normalisation of the density,
$$
\int\rho(\mathbf r)\,d\mathbf r = N\int d\mathbf r\underbrace{\int\cdots\int d\mathbf R'}_{N-1}\left|\Psi(\mathbf R)\right|^2 = N\int\left|\Psi(\mathbf R)\right|^2 d\mathbf R = N .
\tag{9.96}
$$

Therefore, to find the "best" electron density that would minimise the atom energy, we need to vary the energy with respect to the density subject to the normalisation constraint:

$$
\delta H = \delta\left[E_{TF}\left[\rho(\mathbf r)\right] - \mu\left(\int\rho(\mathbf r)\,d\mathbf r - N\right)\right] = \delta E_{TF}\left[\rho(\mathbf r)\right] - \mu\int\delta\rho(\mathbf r)\,d\mathbf r .
\tag{9.97}
$$

Here μ is the Lagrange multiplier.
Problem 9.16. Show, using the method based on the replacement $\rho \to \rho + \alpha\,\delta\rho$, that

$$
\delta E_{TF} = \int\left[\frac{5}{3}C_F\,\rho^{2/3} - \frac{Z}{r} + \int\frac{\rho(\mathbf r')}{\left|\mathbf r-\mathbf r'\right|}\,d\mathbf r'\right]\delta\rho(\mathbf r)\,d\mathbf r = \int\left[\frac{5}{3}C_F\,\rho^{2/3} - V(\mathbf r)\right]\delta\rho(\mathbf r)\,d\mathbf r ,
$$

where

$$
V(\mathbf r) = \frac{Z}{r} - \int\frac{\rho(\mathbf r')}{\left|\mathbf r-\mathbf r'\right|}\,d\mathbf r'
$$

is the electrostatic potential due to the nucleus and the entire electron cloud.
Problem 9.17. Consider the functional

$$
H = E_{TF}\left[\rho(\mathbf r)\right] - \mu\left(\int\rho(\mathbf r)\,d\mathbf r - N\right).
$$

Show that its functional derivative is

$$
\frac{\delta H}{\delta\rho(\mathbf r)} = \frac{5}{3}C_F\,\rho(\mathbf r)^{2/3} - V(\mathbf r) - \mu .
$$
Using the above results, the following equation for the electron density is obtained:

$$
\delta H = 0\quad\Longrightarrow\quad \frac{5}{3}C_F\,\rho(\mathbf r)^{2/3} - V(\mathbf r) = \mu .
$$

This is a rather complex integral equation for the density ρ. It is to be solved together with the normalisation condition (9.96), which is required to determine the Lagrange multiplier μ. The TF model gives plausible results for some many-electron atoms, but fails miserably for molecules (which would only require a modification of the second term in the energy expression (9.95)): no binding is obtained at all.
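The structure of the TF problem is nevertheless simple enough to be solved on a computer; below is a minimal self-consistency sketch on a radial grid (assuming numpy; the grid, the simple mixing scheme and the atomic-units value $C_F = \frac{3}{10}\left(3\pi^2\right)^{2/3}$ are our assumptions, not taken from the text above):

# Self-consistent solution of (5/3) C_F rho^{2/3} = mu + V(r) for a
# neutral "atom": iterate rho -> V -> rho, fixing mu each time by
# bisection so that the density integrates to N electrons.
import numpy as np

Z = N = 18
CF = 0.3 * (3 * np.pi**2) ** (2.0 / 3.0)
r = np.linspace(1e-4, 30.0, 20000)
dr = r[1] - r[0]
rho = np.full_like(r, N / (4.0 / 3.0 * np.pi * r[-1] ** 3))

def hartree(rho):
    # phi(r) = (1/r) int_0^r rho 4*pi*s^2 ds + int_r^inf rho 4*pi*s ds
    q_in = np.cumsum(rho * 4 * np.pi * r**2) * dr
    phi_out = np.cumsum((rho * 4 * np.pi * r)[::-1])[::-1] * dr
    return q_in / r + phi_out

def density(mu, V):
    t = np.maximum(mu + V, 0.0)          # rho vanishes where mu + V < 0
    return (3.0 * t / (5.0 * CF)) ** 1.5

for it in range(200):
    V = Z / r - hartree(rho)             # nucleus + electron cloud
    lo, hi = -50.0, 50.0                 # bisection for mu
    for _ in range(60):
        mu = 0.5 * (lo + hi)
        n = np.sum(density(mu, V) * 4 * np.pi * r**2) * dr
        lo, hi = (mu, hi) if n < N else (lo, mu)
    rho = 0.8 * rho + 0.2 * density(mu, V)   # simple mixing

print("mu =", mu)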
Still, from a historical point of view, the TF model is very important. Essentially, this model avoids dealing with the wave function, a very complicated object indeed since it depends on many electronic variables; instead, it was proposed to work only with the electron density ρ(r), which depends on three variables only. The TF model was the first attempt to implement this idea of replacing Ψ with ρ, and is an early predecessor of modern density functional theory (DFT), developed by P. Hohenberg, W. Kohn and L.J. Sham (in 1964–1965), in which the idea has been successfully developed and implemented into a powerful computational technique that is nowadays widely used in condensed matter physics, materials science and computational chemistry.
In Kohn–Sham DFT the electron density is used as well; however, the variational problem is formulated for one-particle wave functions $\psi_a(\mathbf r)$, called orbitals, which are required to form an orthonormal set:

$$
\int\psi_a^*(\mathbf r)\,\psi_{a'}(\mathbf r)\,d\mathbf r = \delta_{aa'} .
$$

The idea is to map the real interacting many-electron system onto an artificial non-interacting system of the same number of electrons N and of the same electron density ρ(r); the fictitious system of electrons is subjected to an effective external potential to be determined self-consistently. Because of that mapping, the electron density can be written explicitly as a sum of the densities $\rho_a = \left|\psi_a(\mathbf r)\right|^2$ due to each electron:

$$
\rho(\mathbf r) = \sum_{a=1}^{N}\left|\psi_a(\mathbf r)\right|^2 .
\tag{9.98}
$$

Moreover, the kinetic energy of all electrons can also be calculated as a sum of kinetic energies due to each individual electron:

$$
K = \sum_{a=1}^{N}\int\psi_a^*(\mathbf r)\left(-\frac{1}{2}\Delta\right)\psi_a(\mathbf r)\,d\mathbf r .
\tag{9.99}
$$
Then the total energy functional of the electron density is proposed to be of the following form:

$$
E_{DFT}\left[\{\psi_a\}\right] = \sum_{a=1}^{N}\int\psi_a^*\left(-\frac{1}{2}\Delta\right)\psi_a\,d\mathbf r + \frac{1}{2}\iint\frac{\rho(\mathbf r)\,\rho(\mathbf r')}{\left|\mathbf r-\mathbf r'\right|}\,d\mathbf r\,d\mathbf r' + \int\rho(\mathbf r)\,V_n(\mathbf r)\,d\mathbf r + E_{xc}[\rho] .
\tag{9.100}
$$

Here the first term is the kinetic energy of the fictitious electron gas; the second term describes the bare Coulomb interaction between the electrons; next is the term describing the interaction of the electrons with the potential of the nuclei, $V_n(\mathbf r)$. Finally, the last, so-called exchange–correlation term $E_{xc}$ describes all effects of exchange and correlation in the electron gas. This term also absorbs the error due to replacing the kinetic energy of the real interacting gas with that of the non-interacting gas. The exact expression for $E_{xc}$ is not known; however, good approximations exist. In the so-called local density approximation (LDA)

$$
E_{xc}[\rho] = \int\rho(\mathbf r)\,\epsilon_{xc}\left(\rho(\mathbf r)\right)d\mathbf r ,
\tag{9.101}
$$

where $\epsilon_{xc}(\rho)$ is the exchange–correlation energy per electron of a uniform electron gas of density ρ.
Problem 9.18. Obtain the following expressions for the functional derivatives of the different terms in the energy functional:

$$
\frac{\delta\rho(\mathbf r)}{\delta\psi_a^*(\mathbf r_1)} = \delta\left(\mathbf r-\mathbf r_1\right)\psi_a\left(\mathbf r_1\right),\qquad \frac{\delta K}{\delta\psi_a^*(\mathbf r_1)} = -\frac{1}{2}\Delta_{r_1}\psi_a\left(\mathbf r_1\right),
$$

$$
\frac{\delta}{\delta\psi_a^*(\mathbf r_1)}\left[\frac{1}{2}\iint\frac{\rho(\mathbf r)\,\rho(\mathbf r')}{\left|\mathbf r-\mathbf r'\right|}\,d\mathbf r\,d\mathbf r'\right] = \psi_a\left(\mathbf r_1\right)\int\frac{\rho(\mathbf r')}{\left|\mathbf r_1-\mathbf r'\right|}\,d\mathbf r' ,
$$

$$
\frac{\delta}{\delta\psi_a^*(\mathbf r_1)}\int\rho(\mathbf r)\,V_n(\mathbf r)\,d\mathbf r = V_n\left(\mathbf r_1\right)\psi_a\left(\mathbf r_1\right),
$$

$$
\frac{\delta E_{xc}}{\delta\psi_a^*(\mathbf r_1)} = \left[\epsilon_{xc}(\rho) + \rho\,\frac{d\epsilon_{xc}}{d\rho}\right]_{\rho=\rho(\mathbf r_1)}\psi_a\left(\mathbf r_1\right) = V_{xc}\left(\rho(\mathbf r_1)\right)\psi_a\left(\mathbf r_1\right).
$$

Here $\Delta_{r_1}$ is the Laplacian with respect to the point $\mathbf r_1$, and $V_{xc}$ is called the exchange–correlation potential.
Once we have all the necessary derivatives, we can consider the optimum of the energy functional (9.100) subject to the condition that the orbitals form an orthonormal set. Consider the appropriate auxiliary functional:

$$
H\left[\{\psi_a\}\right] = E_{DFT}\left[\{\psi_a\}\right] - \sum_{a,a'}\lambda_{aa'}\left(\int\psi_a^*\,\psi_{a'}\,d\mathbf r - \delta_{aa'}\right),
$$

where the numbers $\lambda_{aa'}$ are the corresponding Lagrange multipliers; they form a square matrix with the size equal to the number of orbitals we use. This gives

$$
\frac{\delta H}{\delta\psi_a^*(\mathbf r_1)} = \left[-\frac{1}{2}\Delta_{r_1} + \int\frac{\rho(\mathbf r')}{\left|\mathbf r_1-\mathbf r'\right|}\,d\mathbf r' + V_n\left(\mathbf r_1\right) + V_{xc}\left(\mathbf r_1\right)\right]\psi_a\left(\mathbf r_1\right) - \sum_{a'}\lambda_{aa'}\,\psi_{a'}\left(\mathbf r_1\right).
$$

Denoting the Kohn–Sham operator in the square brackets by $\hat F_{KS}$, the stationarity condition δH = 0 yields the equations

$$
\hat F_{KS}\,\psi_a = \sum_{a'}\lambda_{aa'}\,\psi_{a'} .
\tag{9.103}
$$
Problem 9.19. Repeat the previous steps by calculating the functional derivative of H with respect to $\psi_a(\mathbf r_1)$, not $\psi_a^*(\mathbf r_1)$, i.e., opposite to what has been done above. Show that in this case we obtain the following equation:

$$
\hat F_{KS}\,\psi_a^* = \sum_{a'}\lambda_{a'a}\,\psi_{a'}^* .
\tag{9.104}
$$

Taking the complex conjugate of the above equation and comparing it with Eq. (9.103), we see that $\lambda_{aa'} = \lambda_{a'a}^*$, i.e., the matrix $\lambda = \left(\lambda_{aa'}\right)$ of Lagrange multipliers must be Hermitian.
The obtained Eqs. (9.103) are not yet final. The point is that one can choose another set of orbitals, $\varphi_b(\mathbf r)$, as a linear combination of the old ones,

$$
\varphi_b(\mathbf r) = \sum_a u_{ba}\,\psi_a(\mathbf r),
\tag{9.105}
$$

where the matrix $U = \left(u_{ba}\right)$ is unitary, i.e., $\sum_b u_{ba'}^*\,u_{ba} = \delta_{aa'}$. Then the old orbitals can easily be expressed via the new ones: multiply both sides of (9.105) by $u_{ba'}^*$ and sum over b:

$$
\sum_b u_{ba'}^*\,\varphi_b(\mathbf r) = \sum_a\underbrace{\left(\sum_b u_{ba'}^*\,u_{ba}\right)}_{\delta_{aa'}}\psi_a(\mathbf r)\quad\Longrightarrow\quad \psi_a = \sum_b u_{ba}^*\,\varphi_b .
$$

Problem 9.20. Prove that the new orbitals form an orthonormal set, as did the old ones.
Then it is easy to see that the electron density can be expressed via the new orbitals in exactly the same way as when using the old ones:

$$
\rho = \sum_a\psi_a^*\,\psi_a = \sum_a\sum_{bb'}u_{ba}\,u_{b'a}^*\,\varphi_b^*\,\varphi_{b'} = \sum_{bb'}\underbrace{\left(\sum_a u_{ba}\,u_{b'a}^*\right)}_{\delta_{bb'}}\varphi_b^*\,\varphi_{b'} = \sum_b\varphi_b^*\,\varphi_b = \sum_b\left|\varphi_b\right|^2 .
\tag{9.106}
$$
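This invariance is trivial to confirm numerically; a sketch assuming numpy, with random complex orbitals mixed by a random unitary matrix:

# Check Eq. (9.106): a unitary mix of the orbitals leaves the density
# sum_a |psi_a(r)|^2 unchanged at every grid point.
import numpy as np

rng = np.random.default_rng(0)
Npts, Norb = 200, 5
Psi = (rng.standard_normal((Npts, Norb))
       + 1j * rng.standard_normal((Npts, Norb)))      # columns = orbitals
U, _ = np.linalg.qr(rng.standard_normal((Norb, Norb))
                    + 1j * rng.standard_normal((Norb, Norb)))  # unitary

Phi = Psi @ U.T                       # phi_b = sum_a u_{ba} psi_a
rho_old = np.sum(np.abs(Psi)**2, axis=1)
rho_new = np.sum(np.abs(Phi)**2, axis=1)
print(np.max(np.abs(rho_old - rho_new)))    # ~1e-15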
We may say that the electron density is invariant with respect to a unitary transformation of the orbitals. Hence, since the density remains of the same form when expressed via either the old or the new set of orbitals, and using the fact that the operator $\hat F_{KS}$ depends entirely on the density, one can rewrite equations (9.103) via the new orbitals:

$$
\hat F_{KS}\,\psi_a = \sum_{a'}\lambda_{aa'}\,\psi_{a'}\quad\Longrightarrow\quad \hat F_{KS}\sum_b u_{ba}^*\,\varphi_b = \sum_{a'}\lambda_{aa'}\sum_b u_{ba'}^*\,\varphi_b .
$$

Multiplying both sides by $u_{b'a}$, summing over a and using the unitarity of U, we obtain

$$
\hat F_{KS}\,\varphi_{b'} = \sum_b\tilde\epsilon_{b'b}\,\varphi_b ,
$$

where

$$
\tilde\epsilon_{b'b} = \sum_{aa'}u_{b'a}\,\lambda_{aa'}\,u_{ba'}^* .
\tag{9.108}
$$

Since the matrix λ is Hermitian, the unitary matrix U can always be chosen such that $\tilde\epsilon = U\lambda U^\dagger$ is diagonal, $\tilde\epsilon_{b'b} = \tilde\epsilon_b\,\delta_{b'b}$, in which case

$$
\hat F_{KS}\,\varphi_b = \tilde\epsilon_b\,\varphi_b .
\tag{9.109}
$$

These are called the Kohn–Sham equations. It is said that the orbitals $\varphi_b$ are eigenfunctions of the Kohn–Sham operator $\hat F_{KS}$ with the eigenvalues $\tilde\epsilon_b$, in close analogy to eigenvectors and eigenvalues of a matrix.
These equations are to be solved self-consistently together with Eq. (9.106). First, some orbitals $\{\varphi_b\}$ are assumed. These allow one to calculate the density ρ(r) and hence the Kohn–Sham operator $\hat F_{KS}$. Once this is known, new eigenfunctions $\{\varphi_b\}$ can be obtained by solving the eigenproblem (9.109), which gives an updated density, and so on. The iterative process is stopped when the density does not change any more (within a numerical tolerance). The obtained electron density corresponds to the density of the real electron gas, and the total energy $E_{KS}$ calculated with this density gives the total electronic energy of the system. The orbitals $\varphi_a$ and the corresponding eigenvalues $\tilde\epsilon_a$ do not, strictly speaking, have a solid physical meaning; however, in actual calculations they are interpreted as effective one-electron wave functions and energies, respectively.
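The logic of the self-consistency loop is captured by the following schematic sketch (assuming numpy) for a one-dimensional toy problem; the harmonic external potential, the mock local interaction $V_{int} = \rho(x)$ and the mixing factor are illustrative assumptions, not a real exchange–correlation functional:

# A toy 1D "Kohn-Sham" SCF loop: build F from the density, solve the
# eigenproblem, update the density, mix, repeat until converged.
import numpy as np

Npts, Norb = 400, 3
x = np.linspace(-8.0, 8.0, Npts)
h = x[1] - x[0]
Vn = 0.5 * x**2                              # external potential
rho = np.zeros(Npts)

# -1/2 d^2/dx^2 by central finite differences
T = (np.diag(np.full(Npts, 1.0 / h**2))
     - 0.5 * np.diag(np.full(Npts - 1, 1.0 / h**2), 1)
     - 0.5 * np.diag(np.full(Npts - 1, 1.0 / h**2), -1))

for it in range(100):
    F = T + np.diag(Vn + rho)                # Kohn-Sham-like operator
    eps, phi = np.linalg.eigh(F)
    phi = phi / np.sqrt(h)                   # normalise on the grid
    rho_new = np.sum(phi[:, :Norb]**2, axis=1)
    if np.max(np.abs(rho_new - rho)) < 1e-8: # density converged
        break
    rho = 0.5 * rho + 0.5 * rho_new          # simple density mixing

print(it, eps[:Norb])                        # lowest orbital energies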
In the so-called Hartree–Fock (HF) method the total energy of a system of electrons is written via a more general object than the density ρ(r) itself, which is called the density matrix (for simplicity, we completely neglect spin here, even though it is essential in the formulation of the method):

$$
\rho\left(\mathbf r,\mathbf r'\right) = \sum_a\psi_a(\mathbf r)\,\psi_a^*\left(\mathbf r'\right).
$$

The "diagonal element" of this object, ρ(r, r), is the same as the density ρ(r). The corresponding energy functional is still a functional of all the orbitals and reads:

$$
E_{HF}\left[\{\psi_a\}\right] = \int\left[\left(-\frac{1}{2}\Delta_r + V_n(\mathbf r)\right)\rho\left(\mathbf r,\mathbf r'\right)\right]_{\mathbf r'\to\mathbf r}d\mathbf r + \frac{1}{2}\iint\frac{d\mathbf r\,d\mathbf r'}{\left|\mathbf r-\mathbf r'\right|}\left[\rho(\mathbf r,\mathbf r)\,\rho\left(\mathbf r',\mathbf r'\right) - \rho\left(\mathbf r,\mathbf r'\right)\rho\left(\mathbf r',\mathbf r\right)\right].
$$

The first term describes the kinetic energy of the electrons together with the energy of their interaction with the nuclei (note that the notation $\mathbf r'\to\mathbf r$ means that after application of the Laplacian to the density matrix one has to set r′ equal to r); the last term describes both the Coulomb and the exchange interaction of the electrons with each other.
Problem 9.21. By imposing the condition for the orbitals to form an orthonormal set and setting the variation of the corresponding auxiliary functional to zero, show that the equations for the orbitals $\psi_a$ are determined by solving the eigenproblem

$$
\hat F_{HF}\,\psi_a = \epsilon_a\,\psi_a .
$$
Index

T
Taylor, 318
Taylor's expansion, 185
Taylor's formula, 185, 186
theta-function transformation, 289
Thomas–Fermi model, 660
three-dimensional delta function, 397
trace of matrix, 92
transpose matrix, 20
transverse oscillations of rod, 654
triangular decomposition of matrix, 98
triangular matrix, 50
tridiagonal matrix, 50, 60, 93, 97, 111, 119
trigonometric form of Fourier integral, 404
tunneling, 238

U
uncertainty principle, 411
uniform convergence, 175, 180, 406, 496
unit base vectors of curvilinear system, 503
unit impulse function, 410
unit matrix, 16
unitary matrix, 30, 70, 75
unitary transformation, 30

V
variation of functional, 611
vector field, 501, 523, 525, 528, 530
vector-column, 16
vector-row, 16
volume of N-dimensional sphere, 521
volume of sphere, 520
von Karman, 292

W
wave, 232, 670
wave equation, 234, 533, 545, 553–555
wave frequency, 233
wave phase velocity, 233
wavefunction, 117
wavelength, 233
wavevector, 113, 233, 420
weight function, 8, 330
white noise, 487
Wronskian, 227

Z
zero boundary conditions, 566, 573, 577, 578, 583, 588, 591, 597
zero of order n of f(z), 195
zero point energy, 383