Lev Kantorovich
Mathematics for Natural Scientists II
Advanced Methods
Undergraduate Lecture Notes in Physics
Series editors
Neil Ashby
Professor Emeritus, University of Colorado Boulder, CO, USA
William Brantley
Professor, Furman University, Greenville, SC, USA
Matthew Deady
Professor, Bard College, Annandale, NY, USA
Michael Fowler
Professor, University of Virginia, Charlottesville, VA, USA
Morten Hjorth-Jensen
Professor, University of Oslo, Norway
Michael Inglis
Professor, SUNY Suffolk County Community College, Selden, NY, USA
Heinz Klose
Professor Emeritus, Humboldt University Berlin, Germany
Helmy Sherif
Professor, University of Alberta, Edmonton, AB, Canada
Lev Kantorovich
Physics Department
School of Natural and Mathematical Sciences
King’s College London, The Strand
London, UK
This is the second volume of the course of mathematics for natural scientists. It is loosely based on the mathematics course for second-year physics students at King's College London that I have been teaching for more than ten years. It follows the spirit of the first volume [1] by continuing a gradual build-up of the mathematical knowledge necessary for physics students, although not exclusively for them.
This volume covers more advanced material, beginning with two essential components: linear algebra (Chap. 1) and the theory of functions of a complex variable (Chap. 2). These techniques are heavily used in the chapters that follow. Fourier series are considered in Chap. 3; special functions of mathematical physics (the Dirac delta function, the gamma and beta functions, a detailed treatment of orthogonal polynomials, the hypergeometric differential equation, spherical and Bessel functions) in Chap. 4; and then the Fourier (Chap. 5) and Laplace (Chap. 6) transforms. In Chap. 7 a detailed treatment of curvilinear coordinates is given, including the corresponding differential calculus. This is essential, as many physical problems possess symmetry, and using appropriate curvilinear coordinates that take the symmetry of the problem at hand into account may significantly simplify the solution of the corresponding partial differential equations (studied in Chap. 8). The book concludes with variational calculus in Chap. 9.
As in the first volume, I have tried to introduce new concepts gradually and as clearly as possible, giving examples and problems to illustrate the material. Throughout the text, all the proofs necessary to understand and appreciate the mathematics involved are also given. In most cases the proofs would satisfy the most demanding physicist, or even a mathematician; only in a few cases have I had to sacrifice "strict mathematical rigour" by presenting somewhat simplified derivations and/or proofs.
As in the first volume, many problems are given throughout the text. These are designed mainly to illustrate the theoretical material, and the reader should complete them in order to be in a position to move forward. In addition, other problems are offered for practice, although I must admit their number could have been larger. For more problems, the reader is advised to consult other texts, e.g. the books [2–6].
Famous Scientists Mentioned in the Book
Throughout the book, various people, both mathematicians and physicists, who are remembered for their outstanding contributions to the development of science will be mentioned. For the reader's convenience, their names (together with some information borrowed from their Wikipedia pages) are listed here in the order in which they first appear in the text:
Leopold Kronecker (1823–1891) was a German mathematician.
Georg Friedrich Bernhard Riemann (1826–1866) was an influential German
mathematician who made lasting and revolutionary contributions to analysis,
number theory and differential geometry.
Jørgen Pedersen Gram (1850–1916) was a Danish actuary and mathematician.
Erhard Schmidt (1876–1959) was an Estonian-German mathematician.
Adrien-Marie Legendre (1752–1833) was a French mathematician.
Edmond Nicolas Laguerre (1834–1886) was a French mathematician.
Charles Hermite (1822–1901) was a French mathematician.
Pafnuty Lvovich Chebyshev (1821–1894) was a Russian mathematician.
Albert Einstein (1879–1955) was a German-born theoretical physicist who developed the general theory of relativity and also contributed to many other areas of physics. He received the 1921 Nobel Prize in Physics for his "services to theoretical physics".
Wolfgang Ernst Pauli (1900–1958) was an Austrian theoretical physicist and
one of the pioneers of quantum physics.
Jean-Baptiste Joseph Fourier (1768–1830) was a French mathematician and
physicist.
Gabriel Cramer (1704–1752) was a Swiss mathematician.
Hendrik Antoon Lorentz (1853–1928) was a Dutch physicist.
Józef Maria Hoene-Wroński (1776–1853) was a Polish Messianist philosopher,
mathematician, physicist, inventor, lawyer and economist.
Ludwig Otto Hesse (1811–1874) was a German mathematician.
Cornelius (Cornel) Lanczos (1893–1974) was a Hungarian mathematician and
physicist.
Léon Nicolas Brillouin (1889–1969) was a French physicist.
Chapter 1
Elements of Linear Algebra
We gave the definition of vectors in real one, two, three and $p$ dimensions in Sect. I.1.7.¹ Here we generalise vectors to complex $p$-dimensional spaces. This is straightforward to do: we define a vector $\mathbf{x}$ in a $p$-dimensional space by specifying $p$ (generally complex) numbers $x_1, x_2, \ldots, x_p$, called vector coordinates, which define the vector uniquely: $\mathbf{x} = \left(x_1, \ldots, x_p\right)$. Two such vectors can be added to each other, subtracted from each other or multiplied by a number $c$. In all these cases the operations are performed on the vectors' coordinates:
$$\mathbf{x} + \mathbf{y} = \mathbf{g} \quad\text{means}\quad x_i + y_i = g_i\,,\quad i = 1, 2, \ldots, p\,;$$
$$\mathbf{x} - \mathbf{y} = \mathbf{g} \quad\text{means}\quad x_i - y_i = g_i\,,\quad i = 1, 2, \ldots, p\,;$$
$$c\,\mathbf{x} = \mathbf{g} \quad\text{means}\quad c\,x_i = g_i\,,\quad i = 1, 2, \ldots, p\,.$$
The dot product of two such vectors is defined as
$$(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{p} x_i^* y_i\,, \tag{1.1}$$
which generalises the corresponding definition for real vectors. Two vectors are called orthogonal if their dot product is zero. The length of a vector $\mathbf{x}$ is defined as $|\mathbf{x}| = \sqrt{(\mathbf{x}, \mathbf{x})}$, i.e. the dot product of the vector with itself is the square of its length. Obviously,
$$(\mathbf{x}, \mathbf{x}) = x_1^* x_1 + \cdots + x_p^* x_p = |x_1|^2 + \cdots + \left|x_p\right|^2 \ge 0\,,$$
i.e. the dot product of the vector with itself is non-negative, and hence the vector length $|\mathbf{x}|$, as a square root of the dot product, is well defined.

The dot product defined above satisfies the following identity:
$$(\mathbf{x}, \mathbf{y})^* = \left(\sum_{i=1}^{p} x_i^* y_i\right)^* = \sum_{i=1}^{p} x_i y_i^* = (\mathbf{y}, \mathbf{x})\,. \tag{1.2}$$
¹ In the following, references to the first volume of this course (L. Kantorovich, Mathematics for Natural Scientists: Fundamentals and Basics, Springer, 2015) are made by prepending the Roman numeral I to the reference; e.g., Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of the first volume, respectively.
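These defining properties are easy to check numerically. The following minimal sketch (assuming Python with NumPy; note that np.vdot conjugates its first argument, matching the convention above) verifies identity (1.2) and the non-negativity of $(\mathbf{x}, \mathbf{x})$ on a pair of test vectors:

```python
# A minimal numerical check of the complex dot product (1.1) and identity (1.2),
# assuming the convention (x, y) = sum_i x_i^* y_i used above.
import numpy as np

x = np.array([1 + 2j, 0.5 - 1j, 3j])
y = np.array([2 - 1j, 1 + 1j, -1 + 0.5j])

dot_xy = np.vdot(x, y)          # np.vdot conjugates its first argument
dot_yx = np.vdot(y, x)

print(np.isclose(dot_xy, np.conj(dot_yx)))   # (x, y)* == (y, x)  -> True
print(np.vdot(x, x).real >= 0)               # (x, x) = |x|^2 >= 0 -> True
```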
$$\mathbf{x} = \left(x_1, \ldots, x_p\right) = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_p\mathbf{e}_p = \sum_{i=1}^{p} x_i\mathbf{e}_i\,,$$
with its (generally complex) coordinates $\{x_i\}$ serving as the expansion coefficients.
Next, we have to introduce the notion of linear independence of vectors. Take two vectors $\mathbf{u}$ and $\mathbf{v}$ which are known not to be proportional to each other, i.e. $\mathbf{u} \ne \lambda\mathbf{v}$ for any complex $\lambda$. A third vector $\mathbf{g}$ is said to be linearly dependent on $\mathbf{u}$ and $\mathbf{v}$ if it can be written as their linear combination:
$$\mathbf{g} = \alpha\mathbf{u} + \beta\mathbf{v}\,,$$
or, equivalently, if a non-trivial relation
$$\alpha\mathbf{u} + \beta\mathbf{v} + \gamma\mathbf{g} = \mathbf{0}$$
exists between the three vectors.
More generally, a set of $n$ vectors is linearly independent if the equality
$$\alpha_1\mathbf{x}_1 + \alpha_2\mathbf{x}_2 + \cdots + \alpha_n\mathbf{x}_n = \mathbf{0}$$
is only possible when all the coefficients are equal to zero at the same time: $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$. We shall see in the following that in a $p$-dimensional space the maximum number of linearly independent vectors cannot be larger than $p$.
Example 1.1. ► Consider the vectors $\mathbf{A} = \mathbf{i} + \mathbf{j} = (1, 1, 0)$ and $\mathbf{B} = \mathbf{i} + \mathbf{k} = (1, 0, 1)$. We shall show that the vector $\mathbf{C} = (2, 1, 1)$ is linearly dependent on $\mathbf{A}$ and $\mathbf{B}$. Indeed, trying a linear combination, $\mathbf{C} = \alpha\mathbf{A} + \beta\mathbf{B}$, we can write the following three equations in components:
$$\begin{cases} 2 = \alpha + \beta\,, \\ 1 = \alpha\,, \\ 1 = \beta\,. \end{cases}$$
The last two equations give $\alpha = \beta = 1$, which also satisfies the first one ($2 = 1 + 1$), so that indeed $\mathbf{C} = \mathbf{A} + \mathbf{B}$. ◄
Fig. 1.1 A third vector goes beyond the plane formed by the first two vectors as it has an extra
coordinate in the direction out of the plane (an extra “degree of freedom”). The three vectors form
a 3D space (volume)
• Start from $\mathbf{u}_1$. All vectors $c_1\mathbf{u}_1$ with arbitrary numbers $c_1$ form a line (a 1D space) along $\mathbf{u}_1$.
• Take $\mathbf{u}_2$, which is not proportional to $\mathbf{u}_1$ (and hence is linearly independent of it), and form all linear combinations $c_1\mathbf{u}_1 + c_2\mathbf{u}_2$ with arbitrary numbers $c_1$ and $c_2$; these form a plane (a 2D space) containing $\mathbf{u}_1$ and $\mathbf{u}_2$.
• Similarly, linear combinations $c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + c_3\mathbf{u}_3$ would cover a 3D space (a volume), see Fig. 1.1, if the vector $\mathbf{u}_3$ is chosen out of the plane spanned by $\mathbf{u}_1$ and $\mathbf{u}_2$, i.e. it is linearly independent of them.
• Continuing this procedure, the whole $p$-dimensional space is built up from the linear combinations
$$\mathbf{x} = c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p = \sum_{i=1}^{p} c_i\mathbf{u}_i\,. \tag{1.3}$$
As we shall see below in Sect. 1.1.3, it is always possible to form linear combinations of any $n$ linearly independent vectors such that the constructed vectors are orthonormal.
In order to find the coefficients $c_i$ for a given $\mathbf{x}$, let us calculate the dot product of both sides of Eq. (1.3) with $\mathbf{u}_j$, assuming the basis vectors are orthonormal, $(\mathbf{u}_j, \mathbf{u}_k) = \delta_{kj}$:
$$(\mathbf{u}_j, \mathbf{x}) = \sum_{k=1}^{p} c_k(\mathbf{u}_j, \mathbf{u}_k) = \sum_{k=1}^{p} c_k\delta_{kj} = c_j\,,$$
since only one term in the sum survives, namely the one with the summation index $k = j$:
$$\sum_{k=1}^{p} c_k\delta_{kj} = \underbrace{c_1\delta_{1j}}_{=0} + \underbrace{c_2\delta_{2j}}_{=0} + \cdots + \underbrace{c_{j-1}\delta_{j-1,j}}_{=0} + \underbrace{c_j\delta_{jj}}_{\neq 0} + \underbrace{c_{j+1}\delta_{j+1,j}}_{=0} + \cdots = c_j\,.$$
This proves that for each x the coefficients ci are uniquely determined, so that the
appropriate (and unique) linear combination (1.3) is obtained.
Since any $\mathbf{x}$ from the $p$-dimensional space can be uniquely specified by a linear combination of the vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_p$, the latter set is said to be complete. Note that the choice of the vectors of the basis set $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_p$ is not unique; any set of $p$ linearly independent vectors will do. Of course, in the new basis the expansion coefficients will be different.
Example 1.3. ► Prove that the vectors $\mathbf{u}_1 = (1, 0, 0, 0)$, $\mathbf{u}_2 = (0, 1, 1, 0)$, $\mathbf{u}_3 = (1, 0, 1, 0)$ and $\mathbf{u}_4 = (0, 0, 0, 1)$ are linearly independent.

Solution. Indeed, we have to solve the vector equation $\alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \alpha_4\mathbf{u}_4 = \mathbf{0}$, which, if written in components, results in four algebraic equations with respect to the four coefficients $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$:
$$\begin{cases} \alpha_1\cdot 1 + \alpha_2\cdot 0 + \alpha_3\cdot 1 + \alpha_4\cdot 0 = 0 \\ \alpha_1\cdot 0 + \alpha_2\cdot 1 + \alpha_3\cdot 0 + \alpha_4\cdot 0 = 0 \\ \alpha_1\cdot 0 + \alpha_2\cdot 1 + \alpha_3\cdot 1 + \alpha_4\cdot 0 = 0 \\ \alpha_1\cdot 0 + \alpha_2\cdot 0 + \alpha_3\cdot 0 + \alpha_4\cdot 1 = 0 \end{cases} \;\Longrightarrow\; \begin{cases} \alpha_1 + \alpha_3 = 0 \\ \alpha_2 = 0 \\ \alpha_2 + \alpha_3 = 0 \\ \alpha_4 = 0 \end{cases}$$
whose only solution is $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0$, proving the linear independence. ◄
Example 1.4. ► Expand the vector $\mathbf{x} = (1, -1, 2, 3)$ in terms of the basis vectors $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$ and $\mathbf{u}_4$ of the previous example:
$$\mathbf{x} = c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + c_3\mathbf{u}_3 + c_4\mathbf{u}_4\,. \tag{1.5}$$
Multiplying (in the sense of the dot product) both sides of this equation by $\mathbf{u}_1$, we obtain an algebraic equation; repeating this process with $\mathbf{u}_2$, $\mathbf{u}_3$ and $\mathbf{u}_4$, we obtain three more equations for the unknown coefficients, i.e. the required four equations:
$$\begin{cases} 1 = c_1\cdot 1 + c_2\cdot 0 + c_3\cdot 1 + c_4\cdot 0 \\ 1 = c_1\cdot 0 + c_2\cdot 2 + c_3\cdot 1 + c_4\cdot 0 \\ 3 = c_1\cdot 1 + c_2\cdot 1 + c_3\cdot 2 + c_4\cdot 0 \\ 3 = c_1\cdot 0 + c_2\cdot 0 + c_3\cdot 0 + c_4\cdot 1 \end{cases}$$
which are easily solved to give $c_1 = -2$, $c_2 = -1$, $c_3 = 3$ and $c_4 = 3$, i.e. the required expansion is
$$\mathbf{x} = -2\mathbf{u}_1 - \mathbf{u}_2 + 3\mathbf{u}_3 + 3\mathbf{u}_4\,. \;\blacktriangleleft$$
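Writing Eq. (1.5) in components, as Problem 1.2 below suggests, amounts to solving a small linear system; a minimal sketch in Python/NumPy (an illustration of the numerical route, not the book's method) reproduces the coefficients found above:

```python
# A sketch (assuming Python/NumPy) of the expansion in Example 1.4: writing
# Eq. (1.5) in components gives a linear system whose columns are the basis vectors.
import numpy as np

u = np.array([[1, 0, 0, 0],
              [0, 1, 1, 0],
              [1, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # rows: u1, u2, u3, u4
x = np.array([1, -1, 2, 3], dtype=float)

c = np.linalg.solve(u.T, x)    # columns of u.T are the basis vectors
print(c)                       # [-2. -1.  3.  3.]
```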
Problem 1.2. Alternatively, solve this problem by writing Eq. (1.5) in components. This also gives a system of four linear algebraic equations for the unknown coefficients $c_1$, $c_2$, $c_3$ and $c_4$. This method is simpler than the one used above.

Problem 1.3. Prove the linear independence of the vectors $\mathbf{u}_1 = (1, 1, 0)$, $\mathbf{u}_2 = (1, 0, 1)$ and $\mathbf{u}_3 = (0, 1, 1)$.

Problem 1.4. Expand the vector $\mathbf{u} = (1, 1, 1)$ in terms of $\mathbf{u}_1 = (1, 0, 1)$, $\mathbf{u}_2 = (0, 1, 1)$ and $\mathbf{u}_3 = (1, 1, 0)$. [Answer: $\mathbf{u} = \mathbf{u}_1/2 + (3/2)\mathbf{u}_2 - \mathbf{u}_3/2$.]

Problem 1.5. Check the linear dependence of the vectors $\mathbf{u}_1 = (1, 1, 0)$, $\mathbf{u}_2 = (1, 0, 1)$ and $\mathbf{u}_3 = (0, 1, 1)$. [Answer: linearly dependent.]

Problem 1.6. Find the expansion coefficients $c_1$, $c_2$ and $c_3$ of the vector $\mathbf{u} = (1, 2, 3)$ in terms of the basis vectors $\mathbf{u}_1 = (1, 1, 0)$, $\mathbf{u}_2 = (1, 0, 1)$ and $\mathbf{u}_3 = (1, 1, 1)$. [Answer: $c_1 = -2$, $c_2 = -1$ and $c_3 = 4$.]

Problem 1.7. Check the linear independence of the vectors $\mathbf{u}_1 = (1, 1, 1, 1)$, $\mathbf{u}_2 = (1, -1, 1, -1)$, $\mathbf{u}_3 = (1, 1, -1, -1)$ and $\mathbf{u}_4 = (1, -1, -1, 1)$. [Answer: the vectors are linearly independent.]
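A convenient practical test of linear independence packs the vectors into a matrix and checks its rank; the sketch below (Python/NumPy, using the sign pattern assumed for Problem 1.7 above, since the original signs are not legible in this excerpt) illustrates the idea:

```python
# A hedged illustration (Python/NumPy): vectors are linearly independent exactly
# when the matrix holding them as rows has full rank (cf. Problem 1.7).
import numpy as np

u = np.array([[1,  1,  1,  1],
              [1, -1,  1, -1],
              [1,  1, -1, -1],
              [1, -1, -1,  1]], dtype=float)

print(np.linalg.matrix_rank(u) == len(u))   # True -> linearly independent
```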
There is a close analogy between functions of a single variable and vectors, which is frequently exploited. Indeed, consider a generally complex function $f(x)$ of a real variable defined on the interval $a \le x \le b$. We assume that the function is integrable on this interval. We divide the interval into $N$ equidistant subintervals of length $\Delta_N = (b - a)/N$ using the division points $x_0 = a$, $x_1 = a + \Delta_N$, $x_2 = a + 2\Delta_N$, etc., and $x_N = a + N\Delta_N = b$, as shown in Fig. 1.2; generally, $x_i = a + i\Delta_N$. The values $\left(f(x_0), f(x_1), \ldots, f(x_N)\right) = \left(f_0, f_1, \ldots, f_N\right)$ of the function $f(x)$ at the $N+1$ division points form a vector $\mathbf{f} = \left(f_0, f_1, \ldots, f_N\right)$ of dimension $N + 1$. Similarly, we can form vectors $\mathbf{g}$, $\mathbf{h}$, etc., of the same dimension from other functions $g(x)$, $h(x)$, etc.

Then, similarly to vectors, we can sum and subtract the vectors formed from functions, as well as multiply them by a number. All these operations give the values of the new functions thus obtained at the division points. One can also consider
the dot product of two vectors $\mathbf{f}$ and $\mathbf{g}$, corresponding to the functions $f(x)$ and $g(x)$, respectively, defined in the usual way as
$$(\mathbf{f}, \mathbf{g})_N = \sum_{i=0}^{N} f(x_i)^*\, g(x_i)\,,$$
where we indicated explicitly in our notation that the dot product is based on the division of the interval $a \le x \le b$ into $N$ subintervals. The sum above diverges in the $N \to \infty$ limit. However, if we multiply the sum by the division interval $\Delta_N = \Delta x$, it corresponds to a Riemann integral sum (Sect. I.4.1), which in the limit becomes the definite integral of the product $f(x)^* g(x)$ between $a$ and $b$, and this may converge. Therefore, we define the dot product of two functions as the limit:
$$(f(x), g(x)) = \lim_{N\to\infty}(\mathbf{f}, \mathbf{g})_N\,\Delta_N = \lim_{N\to\infty}\left[\sum_{i=0}^{N} f(x_i)^*\, g(x_i)\right]\Delta_N = \int_a^b f(x)^*\, g(x)\,dx\,. \tag{1.6}$$
In fact, it is convenient to generalise this definition a little by introducing a so-called weight function $w(x) > 0$:
$$(f(x), g(x)) = \int_a^b w(x)\, f(x)^*\, g(x)\,dx\,. \tag{1.7}$$
This integral, which is closest to the dot product of vectors when $w(x) = 1$, is often called an overlap integral, since its value depends crucially on whether or not the two functions overlap within the interval: the integral is non-zero only if there is a subinterval on which both functions are non-zero (i.e. they overlap there). If the overlap integral is equal to zero, the two functions are said to be orthogonal. The overlap integral of a function with itself defines the "length" of the function on the interval (also called its norm):
$$(f, f) = \int_a^b w(x)\,|f(x)|^2\,dx\,.$$
Assuming $f(x)$ is continuous, the norm is not equal to zero if and only if the function $f(x) \ne 0$ on at least one continuous subinterval of finite length inside the original interval $a \le x \le b$. If $(f, f) = 0$, then $|f(x)| = 0$ at all points $x$ within our interval.
Therefore, the norm $(f, f)$ characterises how strongly the function $f(x)$ differs from zero, while the dot product $(f, g)$ indicates whether the two functions $f(x)$ and $g(x)$ have an appreciable overlap within the interval. If the functions of a set $f_1(x)$, $f_2(x)$, etc., satisfy $\left(f_i, f_j\right) = \delta_{ij}$, they are called orthonormal: they are orthogonal for $i \ne j$ (i.e. when the functions are different), and each function has norm equal to one (the case $i = j$).
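The limiting construction (1.6) can be mimicked directly on a computer: sample the two functions at the division points, form the sum and multiply by the subinterval width. A small sketch (Python/NumPy, unit weight; the test functions are my own choice):

```python
# A small numerical sketch of the limit (1.6): the Riemann sum of f*(x) g(x)
# times the subinterval width approaches the overlap integral (unit weight assumed).
import numpy as np

a, b, N = 0.0, 1.0, 100000
x = np.linspace(a, b, N + 1)
f = np.sin(np.pi * x)          # two real test functions on [0, 1]
g = x * (1 - x)

overlap = np.sum(np.conj(f) * g) * (b - a) / N
print(overlap)                 # close to the exact integral 4/pi^3 ~ 0.1290
```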
Similarly to vectors, it is also possible to consider the linear independence of functions. Functions $f_1(x), f_2(x), \ldots, f_k(x)$ are said to be linearly independent if none of them can be expressed as a linear combination of the others. In other words, the equation
$$\alpha_1 f_1(x) + \alpha_2 f_2(x) + \cdots + \alpha_k f_k(x) = \sum_{i=1}^{k} \alpha_i f_i(x) = 0 \tag{1.8}$$
is valid for all $x$ from the specified interval if and only if there is only the unique trivial choice of the coefficients, $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$. This definition makes perfect sense: indeed, if a function is linearly dependent on some other functions, then one should be able to write this function as their linear combination with non-zero coefficients. For instance, $f(x) = 2x + 5x^2 - 1$ is linearly dependent on $f_0(x) = 1$, $f_1(x) = x$ and $f_2(x) = x^2$, since $f(x) = -f_0(x) + 2f_1(x) + 5f_2(x)$, and hence $f(x) + f_0(x) - 2f_1(x) - 5f_2(x) = 0$ for any $x$. It is seen that in this case the coefficients $\{\alpha_i\}$ are not zero, but equal to some real values. If the linear combination (1.8) can only be satisfied with all coefficients equal to zero, the functions are indeed linearly independent. This definition of linear independence is exactly equivalent to that for vectors.
Example 1.5. ► As an example, let us prove, assuming the unit weight function, that the functions $f_1 = 1$, $f_2 = x$ and $f_3 = x^2$ are linearly independent on the interval $0 \le x \le 1$. Indeed, we need to solve the equation
$$\alpha_1 + \alpha_2 x + \alpha_3 x^2 = 0\,. \tag{1.9}$$
Multiplying both sides of it by each of the three functions in turn and integrating between 0 and 1 (i.e. taking the dot product with each of them), we obtain three equations:
$$\alpha_1 + \frac{1}{2}\alpha_2 + \frac{1}{3}\alpha_3 = 0\,,\qquad \frac{1}{2}\alpha_1 + \frac{1}{3}\alpha_2 + \frac{1}{4}\alpha_3 = 0\,,\qquad \frac{1}{3}\alpha_1 + \frac{1}{4}\alpha_2 + \frac{1}{5}\alpha_3 = 0\,.$$
These three linear equations are solved, e.g. by substitution, and yield the unique trivial (zero) solution $\alpha_1 = \alpha_2 = \alpha_3 = 0$, proving that the three functions are indeed linearly independent. ◄
Note that the linear independence can also be verified by simply substituting several values of $x$ into the original equation (1.9):
$$x = 0 \;\Longrightarrow\; \alpha_1 = 0\,;\qquad x = 1 \;\Longrightarrow\; \alpha_2 + \alpha_3 = 0\,;\qquad x = \frac{1}{2} \;\Longrightarrow\; \frac{1}{2}\alpha_2 + \frac{1}{4}\alpha_3 = 0\,.$$
Solving the last two equations trivially gives $\alpha_2 = \alpha_3 = 0$; this, together with $\alpha_1 = 0$ obtained from $x = 0$, gives immediately the desired result.
In the problems below the unit weight function is assumed.
Later on in Sect. 1.2.8 we shall give a simpler method for checking linear
independence of functions since it relies solely on differentiation, which can always
be done analytically; neither orthogonality nor integration is required.
Next, consider the third vector $\mathbf{v}_3 = \mathbf{u}_3 + c_1^{(3)}\mathbf{v}_1 + c_2^{(3)}\mathbf{v}_2$ with the coefficients fixed by the orthogonality to $\mathbf{v}_1$ and $\mathbf{v}_2$, $c_1^{(3)} = -(\mathbf{v}_1, \mathbf{u}_3)/(\mathbf{v}_1, \mathbf{v}_1) = \frac{1}{2}$ and $c_2^{(3)} = -(\mathbf{v}_2, \mathbf{u}_3)/(\mathbf{v}_2, \mathbf{v}_2) = \frac{1}{3}$, so that
$$\mathbf{v}_3 = \mathbf{u}_3 + \frac{1}{2}\mathbf{v}_1 + \frac{1}{3}\mathbf{v}_2 = \left(\frac{1}{3}, \frac{1}{3}, \frac{1}{3}\right)\,.$$
The constructed vectors $\mathbf{v}_1 = (1, 0, -1)$, $\mathbf{v}_2 = \left(-\frac{1}{2}, 1, -\frac{1}{2}\right)$ and $\mathbf{v}_3 = \left(\frac{1}{3}, \frac{1}{3}, \frac{1}{3}\right)$ are mutually orthogonal. To make them normalised, rescale each by its own length to finally obtain $\mathbf{v}_1 = \frac{1}{\sqrt{2}}(1, 0, -1)$, $\mathbf{v}_2 = \sqrt{\frac{2}{3}}\left(-\frac{1}{2}, 1, -\frac{1}{2}\right)$ and $\mathbf{v}_3 = \frac{1}{\sqrt{3}}(1, 1, 1)$. ◄
Eventually, the above procedure allows expressing the new vectors via the old ones explicitly:
$$\mathbf{v}_1 = \mathbf{u}_1\,,\quad \mathbf{v}_2 = d_{21}\mathbf{u}_1 + \mathbf{u}_2\,,\quad \mathbf{v}_3 = d_{31}\mathbf{u}_1 + d_{32}\mathbf{u}_2 + \mathbf{u}_3\,,\quad\text{etc.}\,,$$
where $d_{21}$, $d_{31}$, etc., are some coefficients which are calculated during the course of the procedure. For instance, in the case of the previous example (before normalisation),
$$\mathbf{v}_1 = \mathbf{u}_1\,,\quad \mathbf{v}_2 = \frac{1}{2}\mathbf{u}_1 + \mathbf{u}_2\,,\quad \mathbf{v}_3 = \frac{1}{2}\mathbf{v}_1 + \frac{1}{3}\mathbf{v}_2 + \mathbf{u}_3 = \frac{2}{3}\mathbf{u}_1 + \frac{1}{3}\mathbf{u}_2 + \mathbf{u}_3\,.$$
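A compact implementation of the orthogonalisation procedure is given below (Python/NumPy). The starting vectors are chosen to be consistent with the orthogonal vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$ obtained above; since the opening of the example is not shown in this excerpt, this particular choice of the $\mathbf{u}$'s is an assumption on my part:

```python
# A compact Gram-Schmidt sketch (Python/NumPy) following the procedure above.
import numpy as np

def gram_schmidt(vectors):
    """Return mutually orthogonal vectors spanning the same space."""
    ortho = []
    for u in vectors:
        v = u.astype(float)
        for w in ortho:
            v = v - (w @ v) / (w @ w) * w   # subtract the projection onto w
        ortho.append(v)
    return ortho

u = [np.array([1.0, 0.0, -1.0]),     # assumed u1
     np.array([-1.0, 1.0, 0.0]),     # assumed u2
     np.array([0.0, 0.0, 1.0])]      # assumed u3
v1, v2, v3 = gram_schmidt(u)
print(v1, v2, v3)    # (1,0,-1), (-1/2,1,-1/2), (1/3,1/3,1/3)
```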
We close this section by noting that the Gram–Schmidt method can also be used
to check linear dependence of a set of vectors. If, during the course of this procedure,
some of the new vectors come out to be zero, then there are linearly dependent
vectors in the original set. The newly created set of vectors will then contain a
smaller set of only linearly independent vectors.
To illustrate this point, consider, for instance, a situation in which the third vector is linearly dependent on the first two: $\mathbf{u}_3 = \alpha\mathbf{u}_1 + \beta\mathbf{u}_2$. Let us run the first steps of the procedure to see how one of the vectors is going to be eliminated. Out of $\mathbf{u}_1$ and $\mathbf{u}_2$ we construct two orthogonal vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ using the first part of the Gram–Schmidt method (i.e. before normalisation). The new vectors are some linear combinations of the old ones; conversely, the old ones are linear combinations of the new ones. Therefore, we can write that
If we would like to construct a set of orthogonal functions built as linear combinations of an original set with expansion coefficients $\alpha_{ij}$ (these bear two indices and form an object called a matrix, see the next section), then the Gram–Schmidt procedure is the simplest method to achieve this goal. It goes exactly as described above for vectors; the only difference is in what we mean by the dot product in the case of functions. The following example illustrates this point.
Example 1.7. ► Consider the first three powers of $x$ on the interval $-1 \le x \le 1$ as our original set of functions, i.e. $f_1(x) = x^0 = 1$, $f_2(x) = x$ and $f_3(x) = x^2$. Assuming the unit weight function in the definition (1.7) of the dot product of functions, we immediately conclude, by direct calculation of the appropriate integrals, that our functions are not all mutually orthogonal. Construct their linear combinations (i.e. polynomials of degree not higher than two) which are orthogonal to each other.
Solution. Following the procedure outlined above, we choose the first function as $g_1(x) = f_1(x) = 1$. The second function is
$$g_2(x) = f_2(x) + c_1^{(2)}g_1(x) = x + c_1^{(2)}\,.$$
We would like it to be orthogonal to the first one, $g_1(x)$. Taking the dot product of both sides of the above equation for $g_2(x)$ with $g_1$ and setting it to zero, we obtain
$$(g_2, g_1) = (f_2, g_1) + c_1^{(2)}(g_1, g_1) = 0\,.$$
Simple calculations show that $(f_2, g_1) = (x, 1) = \int_{-1}^{1} x\,dx = 0$ and similarly $(g_1, g_1) = (1, 1) = 2$. Hence we get $c_1^{(2)} = -(f_2, g_1)/(g_1, g_1) = 0$, i.e. $g_2(x) = x$. Similarly, consider
$$g_3(x) = f_3(x) + c_1^{(3)}g_1(x) + c_2^{(3)}g_2(x)\,,$$
which is to be made orthogonal to both $g_1(x)$ and $g_2(x)$ at the same time. Taking the dot product of both sides of the above equation for $g_3$ first with $g_1$ and then with $g_2$, and setting both to zero, we obtain
$$(g_3, g_1) = (f_3, g_1) + c_1^{(3)}(g_1, g_1) = \frac{2}{3} + 2c_1^{(3)} = 0 \;\Longrightarrow\; c_1^{(3)} = -\frac{1}{3}\,,$$
$$(g_3, g_2) = (f_3, g_2) + c_2^{(3)}(g_2, g_2) = 0 + \frac{2}{3}c_2^{(3)} = 0 \;\Longrightarrow\; c_2^{(3)} = 0\,,$$
so that $g_3(x) = x^2 - \frac{1}{3}$. Up to constant factors, the constructed functions coincide with the first three Legendre polynomials. ◄
Problem 1.14. Continuing the above procedure, obtain the next two functions from the previous example, related to Legendre polynomials. Use $f_4(x) = x^3$ and $f_5(x) = x^4$. [Answer: the new functions are $g_4 = \left(5x^3 - 3x\right)/5$ and $g_5 = \left(35x^4 - 30x^2 + 3\right)/35$.]
Problem 1.15. Assuming the weight function $w(x) = e^{-x}$ and the interval $0 \le x < \infty$, show, starting from the function $L_0(x) = 1$, that the next three orthogonal polynomials are
$$L_1(x) = x - 1\,,\quad L_2(x) = x^2 - 4x + 2\,,\quad L_3(x) = x^3 - 9x^2 + 18x - 6\,.$$
The generated functions are proportional to the Laguerre polynomials.

Problem 1.16. Assuming the weight function $w(x) = e^{-x^2}$ and the interval $-\infty < x < \infty$, show, starting from $H_0(x) = 1$, that the next three orthogonal polynomials are
$$H_1(x) = x\,,\quad H_2(x) = x^2 - \frac{1}{2}\,,\quad H_3(x) = x^3 - \frac{3}{2}x\,.$$
These are proportional to the Hermite polynomials.
Problem 1.17. Similarly, assuming the weight function $w(x) = 1/\sqrt{1 - x^2}$ and the interval $-1 \le x \le 1$, show, starting from $T_0(x) = 1$, that the next two orthogonal polynomials are
$$T_1(x) = x\quad\text{and}\quad T_2(x) = x^2 - \frac{1}{2}\,.$$
The generated functions are directly related to the Chebyshev polynomials (Sect. 4.3).
One can make a generalisation and stretch the set of numbers in the other dimension as well, forming a table of numbers:
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n}\\ \vdots & \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}\end{pmatrix} = \left(a_{ij}\right)\,, \tag{1.11}$$
All other, so-called off-diagonal, elements are equal to zero. In particular, if all diagonal elements of a diagonal matrix are equal to one, this matrix is called a unit matrix. We shall denote the unit matrix by the symbol $E$. Since the non-diagonal elements of $E$ are zeros while all its diagonal elements are equal to one, the element $e_{ij}$ of the unit matrix is in fact the Kronecker symbol: $E = \left(\delta_{ij}\right)$.
For instance, the matrix
$$A = \begin{pmatrix} 3 & 8 & 7\\ 1 & 0 & 2\end{pmatrix}$$
is a $2\times 3$ matrix, i.e. it has 2 rows and 3 columns; its elements are $a_{11} = 3$, $a_{12} = 8$, $a_{13} = 7$ (the first row) and $a_{21} = 1$, $a_{22} = 0$, $a_{23} = 2$ (the second row).
A matrix of any structure can be multiplied by a number. If $c$ is a number and $A = \left(a_{ij}\right)$, then $B = cA$ is a new matrix with elements $b_{ij} = c\,a_{ij}$, e.g.
$$3\begin{pmatrix} 1 & 3 & 2\\ 4 & 7 & 1\end{pmatrix} = \begin{pmatrix} 3 & 9 & 6\\ 12 & 21 & 3\end{pmatrix}\,.$$
From two matrices $A = \left(a_{ij}\right)$ (which is $n\times m$) and $B = \left(b_{ij}\right)$ (which is $m\times p$) a new matrix
$$C = AB = \left(c_{ij}\right)\quad\text{with}\quad c_{ij} = \sum_{k=1}^{m} a_{ik}b_{kj} \tag{1.13}$$
can be constructed, which is an $n\times p$ matrix, see Fig. 1.3. This operation is a natural generalisation of the dot product of vectors. Thus, to obtain the $(i, j)$ element $c_{ij}$ of the matrix $C$, it is necessary to calculate the dot product of the $i$-th row of $A$ with the $j$-th column of $B$. This means that not just any two matrices can be multiplied: the number of columns of $A$ must be equal to the number of rows of $B$. Note how the indices in formula (1.13) for $c_{ij}$ are written: $i$ and $j$ in the product $a_{ik}b_{kj}$ under the sum appear in exactly the same order as in $c_{ij}$, while the dummy summation index $k$ appears in between $i$ and $j$, i.e. as the right index of $a_{ik}$ and the left index of $b_{kj}$.

Fig. 1.3 Schematic of matrix multiplication. Matrix $A$ has $n = 5$ rows and $m = 9$ columns, while matrix $B$ has $p = 6$ columns and the same number of rows $m = 9$ as $A$ has columns. The resultant matrix $C = AB$ has $n = 5$ rows (as in $A$) and $p = 6$ columns (as in $B$)

In the physics literature the so-called Einstein summation convention is sometimes used, whereby the summation sign is dropped, i.e. $C = AB$ is written in elements as $c_{ij} = a_{ik}b_{kj}$, and summation over the repeated index ($k$ in this case) is implied; however, we shall not use this convention here, to avoid confusion.
As an example, consider the product of two matrices
$$A = \begin{pmatrix} 1 & 3 & 1\\ 2 & 3 & 4\end{pmatrix}\quad\text{and}\quad B = \begin{pmatrix} 1 & 4\\ 10 & 3\\ 1 & 2\end{pmatrix}\,.$$
We obtain the matrix
$$C = AB = \begin{pmatrix} 1\cdot 1 + 3\cdot 10 + 1\cdot 1 & 1\cdot 4 + 3\cdot 3 + 1\cdot 2\\ 2\cdot 1 + 3\cdot 10 + 4\cdot 1 & 2\cdot 4 + 3\cdot 3 + 4\cdot 2\end{pmatrix} = \begin{pmatrix} 32 & 15\\ 36 & 25\end{pmatrix}\,,$$
which is a $2\times 2$ matrix.
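For checking such hand calculations, the same product is a one-liner in code (a Python/NumPy sketch):

```python
# The product C = AB of the two matrices above (Python/NumPy).
import numpy as np

A = np.array([[1, 3, 1],
              [2, 3, 4]])
B = np.array([[1, 4],
              [10, 3],
              [1, 2]])
print(A @ B)   # [[32 15]
               #  [36 25]]
```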
In particular, one can multiply a matrix and a vector, which results in a vector. Indeed, if $\mathbf{x} = (x_i)$ is an $n$-dimensional vector and $A = \left(a_{ij}\right)$ is an $m\times n$ matrix, then $\mathbf{y} = A\mathbf{x}$ is an $m$-dimensional vector:
$$\mathbf{y} = A\mathbf{x} = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn}\end{pmatrix}\begin{pmatrix} x_1\\ \vdots\\ x_n\end{pmatrix} = \begin{pmatrix} a_{11}x_1 + \cdots + a_{1n}x_n\\ \vdots\\ a_{m1}x_1 + \cdots + a_{mn}x_n\end{pmatrix} = \begin{pmatrix} y_1\\ \vdots\\ y_m\end{pmatrix}\,, \tag{1.14}$$
or in components:
$$y_i = \sum_{j=1}^{n} a_{ij}x_j\,.$$
This is equivalent to the general rule (1.13) of matrix multiplication: since vectors are one-column matrices, $\mathbf{x} = (x_i)$ can be written as $X = (x_{i1})$ with $x_{i1} = x_i$, and hence $Y = AX$ in components means
$$y_{i1} = \sum_{j=1}^{n} a_{ij}x_{j1} = \sum_{j=1}^{n} a_{ij}x_j\,,$$
which is the same as above, because the elements $y_{i1}$ form a single-column matrix and therefore represent a vector $\mathbf{y}$ with components $y_i = y_{i1}$.
The matrix product is in general not commutative, i.e. $AB \ne BA$. However, some matrices may commute, e.g. diagonal matrices: if
$$A = \begin{pmatrix} a_{11} & & & \\ & a_{22} & & \\ & & \ddots & \\ & & & a_{nn}\end{pmatrix}\quad\text{and}\quad B = \begin{pmatrix} b_{11} & & & \\ & b_{22} & & \\ & & \ddots & \\ & & & b_{nn}\end{pmatrix}\,,$$
then
$$AB = \begin{pmatrix} a_{11}b_{11} & & & \\ & a_{22}b_{22} & & \\ & & \ddots & \\ & & & a_{nn}b_{nn}\end{pmatrix} = BA\,.$$
A diagonal matrix does not, however, generally commute with a non-diagonal one. Indeed, if $D = \left(d_{ii}\delta_{ij}\right)$ is diagonal and $A$ is a general square matrix, the $(i, j)$ element of the product $AD$ is
$$\sum_k a_{ik}d_{kj} = \sum_k a_{ik}d_{jj}\delta_{kj} = a_{ij}d_{jj}\,,$$
where the Kronecker delta symbol $\delta_{kj}$ cuts off all the terms in the sum except the one with $k = j$. Similarly, the $(i, j)$ element of the matrix $DA$ works out to be
$$\sum_k d_{ik}a_{kj} = \sum_k d_{kk}\delta_{ik}a_{kj} = d_{ii}a_{ij}\,,$$
which is different from $a_{ij}d_{jj}$, obtained when multiplying the matrices in the reverse order; i.e. the two matrices are not generally commutative.
At the same time, the matrix product is associative: $(AB)C = A(BC)$. The simplest proof is again algebraic, based on writing down the products of the matrices in elements. Indeed, if $A = \left(a_{ij}\right)$, $B = \left(b_{ij}\right)$ and $C = \left(c_{ij}\right)$, then $D = AB$ has elements
$$d_{ij} = \sum_k a_{ik}b_{kj}\,,$$
so that $G = (AB)C = DC$ has elements
$$g_{ij} = \sum_k d_{ik}c_{kj} = \sum_k\left(\sum_l a_{il}b_{lk}\right)c_{kj}\,.$$
Note that we introduced a new dummy index $l$ to indicate the product of elements $a_{il}b_{lk}$ in $D$, since the index $k$ has already been used in the product $d_{ik}c_{kj}$ of $DC$. Also note how the indices have been used in the product: since $d_{ik}$ has $i$ and $k$ as its left and right indices, these appear in exactly the same order in the product $a_{il}b_{lk}$, with the summation index $l$ in between, as required. The final product of the three matrices in elements becomes a double sum:
$$g_{ij} = \sum_{kl} a_{il}b_{lk}c_{kj}\,.$$
The indices $i, j$ of the element $g_{ij}$ of the final matrix $G$ appear as the first and the last indices in the product $a_{il}b_{lk}c_{kj}$ under the sums; the summation indices follow one after the other, each repeated twice.

On the other hand, the matrix $H = A(BC)$ has elements
$$h_{ij} = \sum_k a_{ik}\left(\sum_l b_{kl}c_{lj}\right) = \sum_{kl} a_{ik}b_{kl}c_{lj}\,.$$
This is the same as $g_{ij}$, since we can always interchange the dummy indices $k \leftrightarrow l$ in the double sums. This proves the required statement.
When a square matrix $A$ is multiplied from either side (from the left or from the right) by the unit matrix $E$ of the same dimension, it does not change: $AE = EA = A$. Indeed, in components,
$$(AE)_{ij} = \sum_k a_{ik}\delta_{kj} = a_{ij}\,,$$
and similarly for $EA$.

The transpose $A^T = \left(\tilde{a}_{ij}\right)$ of a matrix $A = \left(a_{ij}\right)$ is obtained by interchanging its rows and columns, $\tilde{a}_{ij} = a_{ji}$, i.e. its indices are simply permuted. We shall frequently use these notations here, denoting elements of the transposed matrix with the same small letter as for the original matrix, but putting a tilde (a wavy line) on top of it.
The transpose of a product of two matrices is given by the product of their transposes taken in the reverse order:
$$(AB)^T = B^T A^T\,. \tag{1.16}$$
Proof. If $A = \left(a_{ij}\right)$ and $B = \left(b_{ij}\right)$, then $A^T = \left(\tilde{a}_{ij}\right)$ with $\tilde{a}_{ij} = a_{ji}$, and $B^T = \left(\tilde{b}_{ij}\right)$ with $\tilde{b}_{ij} = b_{ji}$. The product $AB = \left(c_{ij}\right)$ with elements $c_{ij} = \sum_k a_{ik}b_{kj}$ after transposition turns into the matrix $(AB)^T = \left(\tilde{c}_{ij}\right)$ with elements
$$\tilde{c}_{ij} = c_{ji} = \sum_k a_{jk}b_{ki} = \sum_k \tilde{a}_{kj}\tilde{b}_{ik} = \sum_k \tilde{b}_{ik}\tilde{a}_{kj} = d_{ij}\,,$$
where the $d_{ij}$ are exactly the elements of $B^T A^T$. Note that here we were able to associate $d_{ij} = \sum_k \tilde{b}_{ik}\tilde{a}_{kj}$ with $B^T A^T$ only after making sure that the indices in the product of the elements under the sum are properly ordered, as required by matrix multiplication, with the dummy summation index $k$ appearing in the middle and the indices $i$ and $j$ as the first and the last ones, respectively, as in $d_{ij}$. Q.E.D.
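The rule (1.16) is easily spot-checked numerically on random rectangular matrices (a Python/NumPy sketch; the shapes are arbitrary test choices):

```python
# A one-off numerical check (Python/NumPy) of Eq. (1.16), (AB)^T = B^T A^T.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

print(np.allclose((A @ B).T, B.T @ A.T))   # True
```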
The operation of matrix transposition appears in the dot product of two real vectors:
$$(A\mathbf{x}, \mathbf{y}) = \left(\mathbf{x}, A^T\mathbf{y}\right)\,. \tag{1.17}$$
Indeed, let $A\mathbf{x} = \mathbf{c}$ be a vector with elements $c_i = \sum_k a_{ik}x_k$. Then the dot product of $\mathbf{c}$ and another vector $\mathbf{y}$ is
$$(A\mathbf{x}, \mathbf{y}) = \sum_l c_l y_l = \sum_l\left(\sum_k a_{lk}x_k\right)y_l = \sum_k x_k\left(\sum_l a_{lk}y_l\right) = \sum_k x_k\left(\sum_l \tilde{a}_{kl}y_l\right) = \left(\mathbf{x}, A^T\mathbf{y}\right)\,,$$
because the vector $\mathbf{d} = A^T\mathbf{y}$ has elements $d_k = \sum_l \tilde{a}_{kl}y_l = \sum_l a_{lk}y_l$, as required.
If a square matrix is equal to its transpose, it is symmetric with respect to its diagonal and hence is called a symmetric matrix:
$$A^T = A\,,\quad\text{i.e.}\quad a_{ij} = a_{ji}\,. \tag{1.18}$$
For instance, consider the matrices
$$\begin{pmatrix} 1 & 2\\ 2 & 3\end{pmatrix}\quad\text{and}\quad\begin{pmatrix} 0 & 2\\ -2 & 0\end{pmatrix}\,.$$
The first matrix is symmetric, while the second is not. In fact, the second matrix is antisymmetric, as $a_{ij} = -a_{ji}$.
The dot product of two real vectors can also be written via matrix multiplication:
$$(\mathbf{x}, \mathbf{y}) = X^T Y\,, \tag{1.19}$$
where $X = (x_i)$ and $Y = (y_i)$ are matrices consisting of a single column (i.e. column vectors). Then $X^T$ is a matrix containing a single row (a row vector), so that $X^T$ and $Y$ can be multiplied using the usual matrix multiplication rules:
$$(\mathbf{x}, \mathbf{y}) = X^T Y = \begin{pmatrix} x_1\\ \vdots\\ x_p\end{pmatrix}^T\begin{pmatrix} y_1\\ \vdots\\ y_p\end{pmatrix} = \begin{pmatrix} x_1 & \cdots & x_p\end{pmatrix}\begin{pmatrix} y_1\\ \vdots\\ y_p\end{pmatrix} = \left(x_1y_1 + \cdots + x_py_p\right)\,,$$
which is a $1\times 1$ matrix (we enclosed its only element in round brackets to stress this point); it is essentially a scalar. Note, however, that the operation in which the second vector is transposed produces a square matrix:
$$C = XY^T = \begin{pmatrix} x_1\\ \vdots\\ x_p\end{pmatrix}\begin{pmatrix} y_1\\ \vdots\\ y_p\end{pmatrix}^T = \begin{pmatrix} x_1\\ \vdots\\ x_p\end{pmatrix}\begin{pmatrix} y_1 & \cdots & y_p\end{pmatrix} = \begin{pmatrix} x_1y_1 & \cdots & x_1y_p\\ \vdots & \ddots & \vdots\\ x_py_1 & \cdots & x_py_p\end{pmatrix}\,,$$
with elements $c_{ij} = x_iy_j$.
If a matrix $A = \left(a_{ij}\right)$ contains complex numbers, the complex conjugate matrix $A^* = \left(a_{ij}^*\right)$ can be defined, which contains the elements $\left(A^*\right)_{ij} = a_{ij}^*$, where $*$ indicates the operation of complex conjugation. For instance,
$$A = \begin{pmatrix} a + ib & 2\\ 3i & a - ib\end{pmatrix}\quad\Longrightarrow\quad A^* = \begin{pmatrix} a - ib & 2\\ -3i & a + ib\end{pmatrix}\,.$$
Problem 1.23. Prove that the matrices $A^n$ and $B^n$ commute if the matrices $A$ and $B$ do.
Problem 1.24. Consider the three matrices
$$A = \begin{pmatrix} 0 & 1\\ 1 & 0\end{pmatrix}\,,\quad B = \begin{pmatrix} 0 & -i\\ i & 0\end{pmatrix}\,,\quad C = \begin{pmatrix} 1 & 0\\ 0 & -1\end{pmatrix}$$
(the Pauli matrices). Show that (i) $A^2 = B^2 = C^2 = E$, where $E$ is the unit matrix; (ii) any pair of the matrices anticommute, e.g. $AB = -BA$; and (iii) the commutator of any two of the matrices, defined by $[A, B] = AB - BA$, is expressed via the third matrix, e.g. $[A, B] = 2iC$.
Problem 1.25. If $[A, B] = AB - BA$ is the commutator of two matrices, prove the following (Jacobi) identity:
$$[A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0\,.$$
We have already encountered square, diagonal, unit and symmetric matrices. There are many other types of matrices that have special properties, some of which will be considered later on; in the forthcoming subsections we shall encounter several of them.
Some square matrices $A$ have an inverse matrix, denoted $A^{-1}$, which is defined by
$$A^{-1}A = AA^{-1} = E\,, \tag{1.20}$$
or, in components, with $B = A^{-1} = \left(b_{ij}\right)$,
$$\sum_{k=1}^{n} a_{ik}b_{kj} = \sum_{k=1}^{n} b_{ik}a_{kj} = \delta_{ij}\,, \tag{1.21}$$
where $\delta_{ij}$ is the Kronecker symbol. It is easy to see that $A^{-1}$ standing on the left and on the right of $A$ in the above equation is one and the same matrix. Indeed, assume that these are two different matrices, called $B$ and $C$, respectively, with $B \ne C$, i.e.
$$BA = E \tag{1.22}$$
and also, quite independently,
$$AC = E\,. \tag{1.23}$$
Now multiply the second of these identities by $B$ from the left: $BAC = BE$. In the left-hand side $BAC = (BA)C = EC = C$, where we used the first of the two identities above, Eq. (1.22), while in the right-hand side $BE = B$. Hence we obtain $C = B$, which contradicts our initial assumption, i.e. $A^{-1}$ on the left and on the right of $A$ in Eq. (1.20) is indeed the same matrix.
Not all square matrices have an inverse. For instance, a matrix consisting entirely of zeros does not have one. Only non-singular matrices have an inverse; we shall explain what this means later on, in Sect. 1.2.9.
Theorem 1.2. The inverse of a product $AB$ of two square matrices is equal to the product of their inverse matrices taken in the reverse order:
$$(AB)^{-1} = B^{-1}A^{-1}\,. \tag{1.24}$$
Proof. The matrix $C = (AB)^{-1}$ is defined via $CAB = E$. Multiply both sides of this matrix equation by $B^{-1}$ from the right: $CABB^{-1} = EB^{-1}$. Since $E$ is the unit matrix, $EB^{-1} = B^{-1}$. Also, $BB^{-1} = E$ by definition of $B^{-1}$. Thus we obtain $CAE = B^{-1}$, or simply $CA = B^{-1}$. Next, multiply both sides of this matrix equation by $A^{-1}$ from the right again; we obtain $CAA^{-1} = B^{-1}A^{-1}$, or simply $C = B^{-1}A^{-1}$. Q.E.D.
Problem 1.26. Prove that the inverse matrix, if it exists, is unique. [Hint: prove by contradiction.]

Problem 1.27. Prove that the inverse of $A^T$ is equal to the transpose of the inverse of $A$, i.e.
$$\left(A^T\right)^{-1} = \left(A^{-1}\right)^T\,. \tag{1.25}$$

Problem 1.28. Prove that $\left(A^{-1}\right)^{-1} = A$.

Problem 1.29. Consider a left triangular square $n\times n$ matrix
$$A = \begin{pmatrix} a_{11} & & & \\ a_{21} & a_{22} & & \\ a_{31} & a_{32} & a_{33} & \\ \vdots & \vdots & \vdots & \ddots \\ a_{n1} & a_{n2} & a_{n3} & \cdots \; a_{nn}\end{pmatrix}\,,$$
where all elements $a_{ij}$ with $j > i$ are equal to zero, while $a_{ij} \ne 0$ if $j \le i$. Show, by writing the identity $AA^{-1} = E$ directly in components, that the inverse matrix $A^{-1}$ has exactly the same left triangular structure.

Problem 1.30. Prove a similar statement for a right triangular matrix.
Example 1.8. ► Find explicitly the inverse of a general $2\times 2$ matrix $A = \begin{pmatrix} a & b\\ c & d\end{pmatrix}$, assuming $ad - bc \ne 0$.

Solution. By definition,
$$\begin{pmatrix} a & b\\ c & d\end{pmatrix}\begin{pmatrix} x & y\\ z & h\end{pmatrix} = \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}\,,$$
which gives four equations that can easily be solved with respect to $x, y, z, h$ (the first and the third equations give $x$ and $z$, while the other two give $y$ and $h$), yielding
$$x = \frac{d}{\Delta}\,,\quad y = -\frac{b}{\Delta}\,,\quad z = -\frac{c}{\Delta}\quad\text{and}\quad h = \frac{a}{\Delta}\,,\quad\text{where}\quad \Delta = ad - bc\,.$$
Thus, the inverse of the $2\times 2$ matrix $A$ is
$$A^{-1} = \frac{1}{\Delta}\begin{pmatrix} d & -b\\ -c & a\end{pmatrix}\quad\text{with}\quad \Delta = ad - bc\,. \tag{1.26} \;\blacktriangleleft$$
Problem 1.31. Let $A = \left(a_{ij}\right)$, $\mathbf{x} = (x_i)$ and $\mathbf{y} = (y_i)$. Prove the above identity. [Hint: write both sides of Eq. (1.27) in components.]
We see that two transformations, performed one after another, act as some other
transformation that is given by the product matrix C D BA, in which the order of
matrices in the product follows the order of the transformations themselves from
right to left. This operation, when one transformation is performed after another, is
called a multiplication operation, and we see that in the case under consideration
this operation corresponds exactly to the matrix multiplication.
A set of non-singular matrices may form an algebraic object called a group. A number of requirements exist which are necessary for a set of matrices to form such a group.² What are these requirements? There are three of them:
1. There is a unity element $e$ in the set that does nothing; for matrices this role is played by the unit matrix $E$.
2. There exists a multiplication operation, to be understood as the action of several operations performed one after another; such a combined operation must be equivalent to some other operation of the same set: if $g_1$ and $g_2$ belong to the set, then $g_1g_2$ must also be in it (the closure condition); the multiplication is also required to be associative, $(g_1g_2)g_3 = g_1(g_2g_3)$.
3. Each element $g$ of the set must have an inverse $g^{-1}$, also belonging to the set, such that $gg^{-1} = g^{-1}g = e$.

² Note, however, that groups can also be formed by other objects, not only matrices, although we shall limit ourselves to matrices here.
Example 1.9. ► As an example, consider a set of four matrices:
$$E = \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}\,,\quad A = \begin{pmatrix} 0 & -1\\ 1 & 0\end{pmatrix}\,,\quad B = \begin{pmatrix} -1 & 0\\ 0 & -1\end{pmatrix}\quad\text{and}\quad C = \begin{pmatrix} 0 & 1\\ -1 & 0\end{pmatrix}\,. \tag{1.29}$$
It is easily checked that a product of any two elements results in an element of the same set; all such products are shown in Table 1.1. Next, we see that each element has an inverse; indeed, looking at the table, $A^{-1} = C$ (since $AC = CA = E$), $B^{-1} = B$ and $C^{-1} = A$. Also, $E$ obviously serves as the unity element. Hence, the four elements form a group with respect to matrix multiplication. Moreover, we can also notice that the two elements $E$ and $B$ form a group of two elements on their own, since $BB = E$ and $B^{-1} = B$. This smaller group consists of elements of the larger group and is called a subgroup. ◄
Let us now consider a particular set of real transformation matrices $A$: those that conserve the dot product between any two real vectors, i.e. if $\mathbf{x}' = A\mathbf{x}$ and $\mathbf{y}' = A\mathbf{y}$, then
$$\left(\mathbf{x}', \mathbf{y}'\right) = (A\mathbf{x}, A\mathbf{y}) = (\mathbf{x}, \mathbf{y})\,. \tag{1.30}$$
In particular, by taking $\mathbf{y} = \mathbf{x}$ we obtain $(A\mathbf{x}, A\mathbf{x}) = (\mathbf{x}, \mathbf{x})$, i.e. the length of a vector does not change under this particular transformation.

To uncover the appropriate condition the matrix $A$ should satisfy, we use Eq. (1.17) and get $(A\mathbf{x}, A\mathbf{y}) = \left(\mathbf{x}, A^TA\mathbf{y}\right)$, which will be equal to $(\mathbf{x}, \mathbf{y})$ only if
$$A^TA = E\,. \tag{1.31}$$
One can also see that $AA^T = E$ as well (indeed, multiply both sides of Eq. (1.31) by $A$ from the left and then by $A^{-1}$ from the right). Matrices satisfying this condition are called orthogonal.⁴ Transformations performed by orthogonal matrices are called orthogonal transformations. By comparing Eqs. (1.31) and (1.20), one can see that for orthogonal matrices
$$A^{-1} = A^T\,, \tag{1.32}$$
i.e. the transposed matrix $A^T$ is at the same time the inverse of $A$. Therefore, the inverse of an orthogonal matrix always exists and is equal to $A^T$, i.e. orthogonal matrices are non-singular.
The orthogonal matrices form a group:³ if $A$ and $B$ are orthogonal, then $C = AB$ is also orthogonal. Indeed, $A^{-1} = A^T$, $B^{-1} = B^T$ and
$$C^{-1} = (AB)^{-1} = B^{-1}A^{-1} = B^TA^T = (AB)^T = C^T\,.$$

³ An interested reader may consult specialised texts, e.g. J.P. Elliott and P.G. Dawber, Symmetry in Physics, Vols. 1 (ISBN-10: 0195204557, ISBN-13: 978-0195204551) and 2 (ISBN-10: 0195204565, ISBN-13: 978-0195204568), Oxford University Press, reprint edition, 1985; or M. Hamermesh, Group Theory and Its Application to Physical Problems, Dover Publications, reprint edition, 2012 (ISBN-10: 0486661814, ISBN-13: 978-0486661810).
⁴ This name probably originates from the definition of two orthogonal vectors. Indeed, if we have two column vectors $X$ and $Y$ which are orthogonal, then one can write $X^TY = 0$, which in some sense may be considered analogous to Eq. (1.31).
Also, the set of orthogonal matrices contains the unit element $E$, which is of course orthogonal; finally, each element has an inverse, since $A^{-1} = A^T$, and $A^T$ is itself an orthogonal matrix because $\left(A^T\right)^{-1} = \left(A^{-1}\right)^T = \left(A^T\right)^T$. This discussion proves the statement made above.
As a simple example, consider the following matrix:
$$A = \begin{pmatrix} \cos\phi & -\sin\phi\\ \sin\phi & \cos\phi\end{pmatrix}\,. \tag{1.33}$$
Then the matrix
$$A^T = \begin{pmatrix} \cos\phi & \sin\phi\\ -\sin\phi & \cos\phi\end{pmatrix}$$
is its transpose. It is easily checked by direct multiplication that $A$ is an orthogonal matrix, since $A^TA = E$. Indeed,
$$\begin{pmatrix} \cos\phi & \sin\phi\\ -\sin\phi & \cos\phi\end{pmatrix}\begin{pmatrix} \cos\phi & -\sin\phi\\ \sin\phi & \cos\phi\end{pmatrix} = \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}\,,$$
i.e. the transposed matrix is indeed the inverse of $A$. Consider now a vector $\mathbf{x} = (x_1, x_2)$ with squared length $|\mathbf{x}|^2 = (\mathbf{x}, \mathbf{x}) = x_1^2 + x_2^2$. It is easy to see that the squared length $(\mathbf{y}, \mathbf{y})$ of the vector $\mathbf{y} = A\mathbf{x}$ is also equal to $(\mathbf{x}, \mathbf{x})$ for any $\mathbf{x}$. Indeed,
$$\mathbf{y} = \begin{pmatrix} \cos\phi & -\sin\phi\\ \sin\phi & \cos\phi\end{pmatrix}\begin{pmatrix} x_1\\ x_2\end{pmatrix} = \begin{pmatrix} x_1\cos\phi - x_2\sin\phi\\ x_1\sin\phi + x_2\cos\phi\end{pmatrix} = \begin{pmatrix} y_1\\ y_2\end{pmatrix}\,,$$
and its squared length $y_1^2 + y_2^2$ is easily calculated (after simple manipulations) to be $x_1^2 + x_2^2$.
In 3D space, orthogonal matrices not only conserve the length of a vector under the transformation, they also conserve the angle between two vectors when both of them are transformed. Indeed, according to Eq. (I.1.21), the dot product is equal to the product of the vectors' lengths and the cosine of the angle between them. The lengths of the vectors are conserved by the transformation. Since the dot product is conserved as well, the cosine of the angle is conserved too, i.e. the angle between the two transformed vectors remains the same.
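Both properties — the orthogonality of (1.33) and the conservation of lengths — are easy to confirm numerically (a Python/NumPy sketch with an arbitrary test angle):

```python
# A quick check (Python/NumPy) that the rotation matrix (1.33) is orthogonal
# and conserves lengths.
import numpy as np

phi = 0.7
A = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])

print(np.allclose(A.T @ A, np.eye(2)))            # A^T A = E -> True
x = np.array([3.0, -4.0])
print(np.linalg.norm(A @ x), np.linalg.norm(x))   # both 5.0
```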
What makes a matrix orthogonal? We shall now see that these matrices have a very special structure. Writing Eq. (1.31) element by element, we obtain (assuming that $A$ is a square $n\times n$ matrix):
$$\sum_k \tilde{a}_{ik}a_{kj} = \sum_k a_{ki}a_{kj} = \delta_{ij}\,.$$
Here the elements $\{a_{ki}\} = (a_{1i}, a_{2i}, \ldots, a_{ni})$ with fixed $i$ form a vector composed of all elements of the $i$-th column of $A$; likewise, all elements $\{a_{kj}\}$ with fixed $j$ form a vector out of the $j$-th column of $A$. Therefore, the equation above tells us that the dot product of the $i$-th and $j$-th columns of $A$ is equal to zero if $i \ne j$ and to one if $i = j$. In other words, the columns of an orthogonal $n\times n$ matrix form a set of $n$ orthonormal vectors.
Similarly, we can consider the more general $p$-dimensional space of complex vectors. But first let us prove, for that case, an identity analogous to Eq. (1.17):
$$(A\mathbf{x}, \mathbf{y}) = \sum_l\left(\sum_k a_{lk}x_k\right)^* y_l = \sum_k x_k^*\left(\sum_l a_{lk}^* y_l\right) = \sum_k x_k^*\left(\sum_l \tilde{a}_{kl}^* y_l\right) = \left(\mathbf{x}, \left(A^T\right)^*\mathbf{y}\right)\,.$$
We see that when the matrix $A$ moves from the left position in the dot product to the right position there, it changes in two ways: it is transposed (as in the previous case of the real space) and undergoes complex conjugation (obviously, in either order). The matrix $\left(A^T\right)^* = \left(A^*\right)^T$, obtained by transposing $A$ and then applying complex conjugation to all its elements, is called the Hermitian conjugate of $A$ and is denoted with a dagger: $\left(A^T\right)^* = A^\dagger$. Hence, we can write
$$(A\mathbf{x}, \mathbf{y}) = \left(\mathbf{x}, A^\dagger\mathbf{y}\right)\,. \tag{1.34}$$
Complex matrices $A$ satisfying the condition
$$AA^\dagger = E \tag{1.35}$$
are called unitary matrices. Transformations of complex vector spaces performed by unitary matrices are called unitary transformations. Physical quantities in quantum mechanics have a direct correspondence to matrices of this kind. Comparing Eq. (1.35) with the definition of the inverse matrix, Eq. (1.20), one can see that for unitary matrices
$$A^{-1} = A^\dagger\,, \tag{1.36}$$
i.e. to calculate the inverse matrix one simply has to transpose it and then take the complex conjugate of all its elements. Since for a unitary matrix $A^\dagger$ is the same as $A^{-1}$, the identity $A^\dagger A = E$ holds as well.
If we write $AA^\dagger = A^\dagger A = E$ in elements, we obtain the following equations:
$$\sum_k a_{ik}a_{jk}^* = \delta_{ij}\quad\text{and}\quad \sum_k a_{ki}^*a_{kj} = \delta_{ij}\,. \tag{1.37}$$
These relationships mean that the rows and columns of a unitary matrix are orthonormal in the full sense of the dot product of complex vectors.

It then follows from Eq. (1.34) that for unitary transformations
$$(A\mathbf{x}, A\mathbf{y}) = \left(\mathbf{x}, A^\dagger[A\mathbf{y}]\right) = \left(\mathbf{x}, A^\dagger A\mathbf{y}\right) = (\mathbf{x}, E\mathbf{y}) = (\mathbf{x}, \mathbf{y})\,,$$
i.e. the dot product $(\mathbf{x}, \mathbf{y})$ is conserved. We see that unitary transformations are generalisations of orthogonal transformations to complex vector spaces.
Example 1.10. ► Show that
$$A = \frac{1}{5}\begin{pmatrix} 1 + 2i & 4 - 2i\\ 2 - 4i & 2 + i\end{pmatrix}$$
is unitary.

Solution. We simply need to check that $AA^\dagger = A^\dagger A = E$:
$$A^T = \frac{1}{5}\begin{pmatrix} 1 + 2i & 2 - 4i\\ 4 - 2i & 2 + i\end{pmatrix}\quad\text{and thus}\quad A^\dagger = \left(A^T\right)^* = \frac{1}{5}\begin{pmatrix} 1 - 2i & 2 + 4i\\ 4 + 2i & 2 - i\end{pmatrix}\,;$$
therefore,
$$A^\dagger A = \frac{1}{25}\begin{pmatrix} 1 - 2i & 2 + 4i\\ 4 + 2i & 2 - i\end{pmatrix}\begin{pmatrix} 1 + 2i & 4 - 2i\\ 2 - 4i & 2 + i\end{pmatrix} = \frac{1}{25}\begin{pmatrix} 25 & 0\\ 0 & 25\end{pmatrix} = E\,,$$
and similarly $AA^\dagger = E$, as it should be. ◄
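The same check takes two lines numerically (a Python/NumPy sketch; the matrix entries follow the reconstruction of Example 1.10 above, where some signs were not fully legible in this excerpt):

```python
# Numerical unitarity check of the matrix of Example 1.10 (Python/NumPy).
import numpy as np

A = np.array([[1 + 2j, 4 - 2j],
              [2 - 4j, 2 + 1j]]) / 5
A_dag = A.conj().T

print(np.allclose(A_dag @ A, np.eye(2)))   # True
print(np.allclose(A @ A_dag, np.eye(2)))   # True
```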
A matrix that is equal to its own Hermitian conjugate, $A^\dagger = A$, is called Hermitian. If instead the matrix changes sign upon Hermitian conjugation, i.e. $A^\dagger = -A$, it is called anti-Hermitian or skew-Hermitian.
Problem 1.33 (Discrete Fourier Transform). Consider $N$ numbers $x_0$, $x_1$, etc., $x_{N-1}$. One can generate another set of $N$ numbers, $y_j$ (with $j = 0, 1, \ldots, N - 1$), using the following recipe:⁵
$$y_j = \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} e^{i2\pi jk/N}\,x_k\,. \tag{1.38}$$
Form $N$-dimensional vectors $X$ and $Y$ from the quantities $\{x_j\}$ and $\{y_j\}$, respectively, and then write the above relationship in the matrix form $Y = UX$. Prove that the rows (columns) of $U$ form an orthonormal set of vectors, and hence that the matrix $U$ is unitary. Finally, establish that the inverse transformation from $Y$ to $X$ reads
$$x_k = \frac{1}{\sqrt{N}}\sum_{j=0}^{N-1} e^{-i2\pi jk/N}\,y_j\,. \tag{1.39}$$

⁵ The complex exponential function is introduced properly in Sect. 2.3.3.
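A numerical sketch of this problem (Python/NumPy) builds the matrix $U$ of Eq. (1.38) explicitly and confirms that it is unitary. With the $+i$ sign convention adopted here, $UX$ coincides with NumPy's inverse FFT under its "ortho" normalisation — a library detail worth noting, not part of the problem itself:

```python
# Building the DFT matrix U of (1.38) and checking unitarity (Python/NumPy).
import numpy as np

N = 8
j, k = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
U = np.exp(2j * np.pi * j * k / N) / np.sqrt(N)

print(np.allclose(U @ U.conj().T, np.eye(N)))             # unitary -> True

x = np.random.default_rng(1).standard_normal(N)
print(np.allclose(U @ x, np.fft.ifft(x, norm='ortho')))   # True
```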
1.2.5.2 Rotations in 3D
Problem 1.37. Using well-known trigonometric identities, show that the new coordinates, $y'$ and $z'$, are related to the old ones, $y$ and $z$, as follows:
$$y' = y\cos\phi + z\sin\phi\quad\text{and}\quad z' = z\cos\phi - y\sin\phi\,.$$
Note that this matrix is orthogonal. Indeed, consider a rotation by the angle $-\phi$ about the same axis $x$:
$$R_x(-\phi) = \begin{pmatrix} 1 & 0 & 0\\ 0 & \cos\phi & -\sin\phi\\ 0 & \sin\phi & \cos\phi\end{pmatrix} = R_x(\phi)^T\,,$$
and this matrix can easily be checked to be the inverse of that in Eq. (1.40):
$$R_x(\phi)R_x(\phi)^T = \begin{pmatrix} 1 & 0 & 0\\ 0 & \cos\phi & \sin\phi\\ 0 & -\sin\phi & \cos\phi\end{pmatrix}\begin{pmatrix} 1 & 0 & 0\\ 0 & \cos\phi & -\sin\phi\\ 0 & \sin\phi & \cos\phi\end{pmatrix} = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix} = E\,,$$
i.e. indeed $R_x(\phi)^{-1} = R_x(\phi)^T = R_x(-\phi)$.
It is seen now that 3D rotations about the $x$ axis form a group. Indeed, any rotation has an inverse, the unit matrix is the unity element (the rotation by $\phi = 0$ gives $E$), and any two consecutive rotations by $\phi_1$ and $\phi_2$ correspond to a single rotation by $\phi_1 + \phi_2$. Obviously, all rotations about the $y$ or $z$ axes also form groups. In fact, it is possible to establish that any arbitrary rotation in 3D space (not necessarily about a single axis) can always be represented by no more than three of the elementary rotations described above, and all such rotations form the group of rotations of 3D space.
1.2.5.3 Reflections in 3D
For a point with radius vector $\mathbf{r}$ and a mirror plane through the origin with unit normal $\mathbf{n}$, we can decompose $\mathbf{r} = \mathbf{a} + (\mathbf{r}\cdot\mathbf{n})\mathbf{n}$, where $\mathbf{a}$ lies within the plane; the reflected vector is then $\mathbf{r}' = \mathbf{a} - (\mathbf{r}\cdot\mathbf{n})\mathbf{n}$, so that
$$\mathbf{r}' = \mathbf{r} - 2(\mathbf{r}\cdot\mathbf{n})\mathbf{n}\,.$$
In particular, the reflection in the plane with the normal $\mathbf{n} = \frac{1}{\sqrt{3}}(1, 1, 1)$ is given by the matrix
$$C_{111} = \frac{1}{3}\begin{pmatrix} 1 & -2 & -2\\ -2 & 1 & -2\\ -2 & -2 & 1\end{pmatrix}\,,$$
so that the vector along the normal, $\mathbf{r} = (1, 1, 1)$, transforms into $\mathbf{r}' = C_{111}\mathbf{r} = -(1, 1, 1)$, as expected.
One quantity of great importance in the theory of matrices is the determinant. We introduced the $2\times 2$ and $3\times 3$ determinants before, in Sect. I.1.6. Here we shall generalise these definitions to determinants of arbitrary dimension and, most importantly, relate determinants to square matrices.

We start by recalling the 2- and 3-dimensional cases and then consider the general case of arbitrary dimension. Consider a $2\times 2$ and a $3\times 3$ matrix,
$$A = \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}\quad\text{and}\quad B = \begin{pmatrix} b_{11} & b_{12} & b_{13}\\ b_{21} & b_{22} & b_{23}\\ b_{31} & b_{32} & b_{33}\end{pmatrix}\,.$$
Their determinants are
$$\det A = a_{11}a_{22} - a_{12}a_{21}\,, \tag{1.45}$$
which contains $2! = 2$ terms, and
$$\det B = b_{11}b_{22}b_{33} - b_{11}b_{23}b_{32} - b_{12}b_{21}b_{33} + b_{12}b_{23}b_{31} + b_{13}b_{21}b_{32} - b_{13}b_{22}b_{31}\,. \tag{1.46}$$
This expression contains $3! = 6$ terms.
Let us have a closer look at the two expressions (1.45) and (1.46). Each expression contains a sum of products of elements of the corresponding matrix, such as $b_{12}b_{21}b_{33}$ in Eq. (1.46). One can say that in every such product each row is represented by a single element; at the same time, one may also independently say that each column is represented by a single element as well.

This means that each elementary product in the cases of $|A|$ and $|B|$ can be written as $\pm a_{1j_1}a_{2j_2}$ and $\pm b_{1j_1}b_{2j_2}b_{3j_3}$, respectively, where the indices $j_1, j_2$ or $j_1, j_2, j_3$ correspond to some permutation of the numbers 1, 2 or 1, 2, 3 for the two determinants. There are $2! = 2$ and $3! = 6$ possible permutations of the second indices in the two cases, clearly matching the number of terms in each of the two determinants.
Now, let us look at the sign attached to each of the products. It is defined by the parity of the permutation $1, 2 \to j_1, j_2$ or $1, 2, 3 \to j_1, j_2, j_3$. Every permutation of two indices contributes a factor of $-1$. By performing all necessary pair permutations one after another, starting from the first (perfectly ordered) term, such as $a_{11}a_{22}$ or $b_{11}b_{22}b_{33}$, the overall sign of each term in Eqs. (1.45) and (1.46) can be obtained. For instance, consider the second term in Eq. (1.46), $-b_{11}b_{23}b_{32}$. Only one permutation of the right indices of the third and the second elements is required in this case:
$$b_{11}b_{2\underline{2}}b_{3\underline{3}} \;\to\; b_{11}b_{2\underline{3}}b_{3\underline{2}}\,,$$
where the permuted indices are underlined. Therefore, this term appears with a single factor of $-1$. The fourth term, $+b_{12}b_{23}b_{31}$, on the other hand, requires two permutations,
$$b_{1\underline{1}}b_{2\underline{2}}b_{33} \;\to\; b_{1\underline{2}}b_{2\underline{1}}b_{33} \;\to\; b_{12}b_{2\underline{3}}b_{3\underline{1}}\,,$$
and hence carries the factor $(-1)^2 = +1$.
We shall use Eq. (1.47). In each term of the sum exactly one element is taken from every row and every column. However, in a diagonal matrix $D$ all elements except those on the main diagonal are zeros. Therefore, there is only one non-zero term in the sum, namely $d_{11}d_{22}\cdots d_{nn}$; the parity of this term is obviously $+1$, i.e. the determinant of a diagonal matrix is equal to the product of all its diagonal elements:
$$\begin{vmatrix} d_{11} & & & 0\\ & d_{22} & & \\ & & \ddots & \\ 0 & & & d_{nn}\end{vmatrix} = d_{11}d_{22}\cdots d_{nn} = \prod_{k=1}^{n} d_{kk}\,. \tag{1.49}$$
Solution. We have to construct, starting each time from the "perfectly ordered" elementary product $a_{11}a_{22}a_{33}a_{44}$, all possible orderings of the right indices, keeping track of the parity of each permutation. There are $4! = 24$ terms to be expected. All 24 permutations of the numbers 1, 2, 3, 4 and their parities, together with the corresponding contributions to $\det A$, are shown in Table 1.2. Summing up all the terms in the last column of the table, we obtain the desired expression for the determinant. ◄
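The permutation definition translates directly into code; the following sketch (Python, brute force over all $n!$ permutations, so suitable only for small $n$) computes a determinant exactly as in this example and compares it with a library value:

```python
# A direct (if inefficient, O(n!)) implementation of the permutation definition
# of the determinant, with the parity computed by counting inversions.
import itertools
import numpy as np

def det_by_permutations(A):
    n = len(A)
    total = 0.0
    for perm in itertools.permutations(range(n)):
        # parity = (-1)^(number of inversions) of the permutation
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = (-1.0) ** inversions
        for row in range(n):
            term *= A[row][perm[row]]
        total += term
    return total

A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 10.]])
print(det_by_permutations(A), np.linalg.det(A))   # both -3.0
```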
If the elements of $A$ depend on a parameter $x$, the derivative of the determinant may be computed row by row:
$$\frac{d}{dx}|A| = \sum_{k=1}^{n}|A_k|\,,$$
where $A_k$ is the matrix $A$ with the elements of its $k$-th row differentiated with respect to $x$.
Property 1.1. Interchanging two rows of a determinant changes its sign, i.e. gives a factor of $-1$.

Proof. Consider a matrix $A_{r\leftrightarrow t}$ obtained from $A$ by interchanging the row $r$ with the row $t$ (assuming for certainty that $t > r$):
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{r1} & \cdots & a_{rn}\\ \vdots & & \vdots\\ a_{t1} & \cdots & a_{tn}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}\quad\text{and}\quad A_{r\leftrightarrow t} = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{t1} & \cdots & a_{tn}\\ \vdots & & \vdots\\ a_{r1} & \cdots & a_{rn}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}\,.$$
Then we get
$$\det A = \sum_P \epsilon_{j_1j_2\ldots j_n}\,a_{1j_1}\cdots a_{rj_r}\cdots a_{tj_t}\cdots a_{nj_n}\,,$$
$$\det A_{r\leftrightarrow t} = \sum_P \epsilon_{j_1j_2\ldots j_n}\,a_{1j_1}\cdots a_{tj_t}\cdots a_{rj_r}\cdots a_{nj_n}\,.$$
It is clear from this that if we make a single pair permutation $a_{tj_t}\leftrightarrow a_{rj_r}$ in every term of $\det A_{r\leftrightarrow t}$, the expression becomes exactly the same as for $\det A$. However, a single permutation brings an extra factor of $-1$ to each of the $n!$ terms of the sum. Thus, $\det A_{r\leftrightarrow t} = -\det A$, as required.
Property 1.2. Interchanging two columns of a determinant likewise gives a factor of $-1$, e.g.
$$\begin{vmatrix} 1 & 2 & 1\\ 3 & 0 & 3\\ 2 & 1 & 4\end{vmatrix} = -\begin{vmatrix} 1 & 1 & 2\\ 3 & 3 & 0\\ 2 & 4 & 1\end{vmatrix}\,.$$
Proof. Similar to the above; however, each term in the sum should be ordered with
respect to the right rather than the left index of every element of the matrix in
accordance with Eq. (1.48).
Property 1.3. If a matrix has two identical rows (columns), its determinant is equal to zero, e.g.
$$\begin{vmatrix} x & 1 & x\\ x^2 & 1 & x^2\\ x^3 & 3 & x^3\end{vmatrix} = 0\,.$$
Proof. Let the rows $r$ and $t$ be identical. If we interchange them, then, according to Property 1.1, $\det A_{r\leftrightarrow t} = -\det A$. However, because the two rows are identical, $A_{r\leftrightarrow t}$ does not differ from $A$, leading to $\det A = -\det A$, which means that $\det A = 0$.
Property 1.4. A common factor of any row (column) can be taken out of the determinant: multiplying a row (column) by a number multiplies the determinant by that number.

Proof. This property follows directly from Eq. (1.47) and the fact that each product of elements there contains exactly one element from that row (column).
Property 1.5. If every element of a row (column) is written as a sum (difference) of two terms, the determinant is equal to the sum (difference) of two determinants, each containing one part of that row (column):
$$\begin{vmatrix} \cdots & a_{1r} \pm b_{1r} & \cdots\\ & \vdots & \\ \cdots & a_{nr} \pm b_{nr} & \cdots\end{vmatrix} = \begin{vmatrix} \cdots & a_{1r} & \cdots\\ & \vdots & \\ \cdots & a_{nr} & \cdots\end{vmatrix} \pm \begin{vmatrix} \cdots & b_{1r} & \cdots\\ & \vdots & \\ \cdots & b_{nr} & \cdots\end{vmatrix}\,, \tag{1.50}$$
e.g.
$$\begin{vmatrix} a & 2 & 1\\ 3a & 0 & 3\\ 2a & 1 & 4\end{vmatrix} = \begin{vmatrix} a + x & 2 & 1\\ 2a & 0 & 3\\ 4a & 1 & 4\end{vmatrix} + \begin{vmatrix} -x & 2 & 1\\ a & 0 & 3\\ -2a & 1 & 4\end{vmatrix}\,.$$
Property 1.6. The determinant does not change if a multiple of one row (column) is added to (or subtracted from) another row (column), e.g.
$$\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix} = \begin{vmatrix} a_{11} - a_{21} & a_{12} - a_{22}\\ a_{21} & a_{22}\end{vmatrix}\,,$$
where the first row of the determinant in the right-hand side is obtained by subtracting row 2 from row 1 of the determinant in the left-hand side.

Proof. It follows from Properties 1.5 and 1.3.
Property 1.7. The determinant is identically equal to zero if at least one row (column) is a linear combination of the other rows (columns).
Proof. The idea of the proof is clearly seen by considering the simple case in which the first column is a linear combination of the second and the third:
$$\begin{vmatrix} \alpha a_{12} + \beta a_{13} & a_{12} & a_{13} & \cdots\\ \vdots & \vdots & \vdots & \\ \alpha a_{n2} + \beta a_{n3} & a_{n2} & a_{n3} & \cdots\end{vmatrix} = \begin{vmatrix} \alpha a_{12} & a_{12} & a_{13} & \cdots\\ \vdots & \vdots & \vdots & \\ \alpha a_{n2} & a_{n2} & a_{n3} & \cdots\end{vmatrix} + \begin{vmatrix} \beta a_{13} & a_{12} & a_{13} & \cdots\\ \vdots & \vdots & \vdots & \\ \beta a_{n3} & a_{n2} & a_{n3} & \cdots\end{vmatrix}$$
$$= \alpha\begin{vmatrix} a_{12} & a_{12} & a_{13} & \cdots\\ \vdots & \vdots & \vdots & \\ a_{n2} & a_{n2} & a_{n3} & \cdots\end{vmatrix} + \beta\begin{vmatrix} a_{13} & a_{12} & a_{13} & \cdots\\ \vdots & \vdots & \vdots & \\ a_{n3} & a_{n2} & a_{n3} & \cdots\end{vmatrix}\,,$$
where we have used Properties 1.5 and 1.4. Finally, each of the determinants in the last line is equal to zero, since it contains two identical columns (Property 1.3), as required. The proof in the general case is analogous.
Property 1.8. The determinants of the matrices $A$ and $A^T$ are equal.

Proof. The determinant of $A^T = \left(\tilde{a}_{ij}\right)$ with $\tilde{a}_{ij} = a_{ji}$ is given by
$$\det A^T = \sum_P \epsilon_{j_1j_2\ldots j_n}\,\tilde{a}_{1j_1}\tilde{a}_{2j_2}\cdots\tilde{a}_{nj_n} = \sum_P \epsilon_{j_1j_2\ldots j_n}\,a_{j_11}a_{j_22}\cdots a_{j_nn}\,,$$
which, upon reordering the factors in each term with respect to the first index, Eq. (1.48), coincides with $\det A$. Q.E.D.
As an application, consider a system of two linear equations, $a_{11}x_1 + a_{12}x_2 = h_1$ and $a_{21}x_1 + a_{22}x_2 = h_2$. Multiplying the first column of $|A|$ by $x_1$ (Property 1.4), we have
$$x_1|A| = \begin{vmatrix} a_{11}x_1 & a_{12}\\ a_{21}x_1 & a_{22}\end{vmatrix}\,,\quad\text{while}\quad \begin{vmatrix} a_{12}x_2 & a_{12}\\ a_{22}x_2 & a_{22}\end{vmatrix} = 0$$
by virtue of Property 1.3. Thus, using Property 1.5, we can add the two determinants together as follows (they both have the same second column):
$$x_1|A| + 0 = \begin{vmatrix} a_{11}x_1 & a_{12}\\ a_{21}x_1 & a_{22}\end{vmatrix} + \begin{vmatrix} a_{12}x_2 & a_{12}\\ a_{22}x_2 & a_{22}\end{vmatrix} = \begin{vmatrix} a_{11}x_1 + a_{12}x_2 & a_{12}\\ a_{21}x_1 + a_{22}x_2 & a_{22}\end{vmatrix} = \begin{vmatrix} h_1 & a_{12}\\ h_2 & a_{22}\end{vmatrix}\,,$$
which gives the required solution for $x_1$ as the ratio of the determinant in the right-hand side to $|A|$. Similarly, starting from
$$x_2|A| = \begin{vmatrix} a_{11} & a_{12}x_2\\ a_{21} & a_{22}x_2\end{vmatrix}$$
and adding the zero determinant $\begin{vmatrix} a_{11} & a_{11}x_1\\ a_{21} & a_{21}x_1\end{vmatrix} = 0$ to it, we obtain
$$x_2|A| = \begin{vmatrix} a_{11} & a_{11}x_1 + a_{12}x_2\\ a_{21} & a_{21}x_1 + a_{22}x_2\end{vmatrix} = \begin{vmatrix} a_{11} & h_1\\ a_{21} & h_2\end{vmatrix}\,,$$
which gives $x_2$. The obtained solution is a particular case of Cramer's rule, to be considered in more detail in Sect. 1.2.7.
Property 1.9. The determinant of a product of two matrices is equal to the product of their determinants:
$$\det(AB) = \det A\,\det B\,.$$
Proof. Let us first consider the two-dimensional case to see the main idea:
$$\det A = \begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}\quad\text{and}\quad \det B = \begin{vmatrix} b_{11} & b_{12}\\ b_{21} & b_{22}\end{vmatrix} = b_{11}b_{22} - b_{12}b_{21}\,,$$
and
$$\det(AB) = \begin{vmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22}\\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22}\end{vmatrix}\,.$$
Splitting each column in two (Property 1.5) and taking the common factors out of the columns (Property 1.4), we obtain four determinants:
$$\det(AB) = b_{11}b_{12}\begin{vmatrix} a_{11} & a_{11}\\ a_{21} & a_{21}\end{vmatrix} + b_{21}b_{22}\begin{vmatrix} a_{12} & a_{12}\\ a_{22} & a_{22}\end{vmatrix} + b_{11}b_{22}\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix} + b_{21}b_{12}\begin{vmatrix} a_{12} & a_{11}\\ a_{22} & a_{21}\end{vmatrix}\,,$$
where the first two determinants are each equal to zero due to Property 1.3. Finally, collecting the remaining terms,
$$\det(AB) = b_{11}b_{22}\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix} + b_{21}b_{12}\begin{vmatrix} a_{12} & a_{11}\\ a_{22} & a_{21}\end{vmatrix} = (b_{11}b_{22} - b_{21}b_{12})\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix} = \begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix}\begin{vmatrix} b_{11} & b_{12}\\ b_{21} & b_{22}\end{vmatrix}\,,$$
as required. Above, we permuted the columns of the second determinant to bring its elements into the correct order (as in the first one); this changed the sign in front of $b_{21}b_{12}$, giving exactly $\det B$.
Now we apply the same method to the general case of an $n\times n$ determinant. The general term of the matrix $C = AB$ is $c_{ij} = \sum_k a_{ik}b_{kj}$. Therefore, the determinant of $C$ we would like to calculate is
$$\det(AB) = \begin{vmatrix} \sum_{k_1} a_{1k_1}b_{k_11} & \sum_{k_2} a_{1k_2}b_{k_22} & \cdots & \sum_{k_n} a_{1k_n}b_{k_nn}\\ \sum_{l_1} a_{2l_1}b_{l_11} & \sum_{l_2} a_{2l_2}b_{l_22} & \cdots & \sum_{l_n} a_{2l_n}b_{l_nn}\\ \vdots & \vdots & & \vdots\\ \sum_{m_1} a_{nm_1}b_{m_11} & \sum_{m_2} a_{nm_2}b_{m_22} & \cdots & \sum_{m_n} a_{nm_n}b_{m_nn}\end{vmatrix}\,.$$
Notice that we used different summation (dummy) indices in different rows. We start by splitting the first column term by term, then the second one, the third, and so on:
$$\det(AB) = \sum_{k_1}\begin{vmatrix} a_{1k_1}b_{k_11} & \sum_{k_2}a_{1k_2}b_{k_22} & \cdots & \sum_{k_n}a_{1k_n}b_{k_nn}\\ a_{2k_1}b_{k_11} & \sum_{l_2}a_{2l_2}b_{l_22} & \cdots & \sum_{l_n}a_{2l_n}b_{l_nn}\\ \vdots & \vdots & & \vdots\\ a_{nk_1}b_{k_11} & \sum_{m_2}a_{nm_2}b_{m_22} & \cdots & \sum_{m_n}a_{nm_n}b_{m_nn}\end{vmatrix} = \cdots = \sum_{k_1}\sum_{k_2}\cdots\sum_{k_n}\begin{vmatrix} a_{1k_1}b_{k_11} & a_{1k_2}b_{k_22} & \cdots & a_{1k_n}b_{k_nn}\\ a_{2k_1}b_{k_11} & a_{2k_2}b_{k_22} & \cdots & a_{2k_n}b_{k_nn}\\ \vdots & \vdots & & \vdots\\ a_{nk_1}b_{k_11} & a_{nk_2}b_{k_22} & \cdots & a_{nk_n}b_{k_nn}\end{vmatrix}$$
$$= \sum_{k_1}\sum_{k_2}\cdots\sum_{k_n} b_{k_11}b_{k_22}\cdots b_{k_nn}\begin{vmatrix} a_{1k_1} & a_{1k_2} & \cdots & a_{1k_n}\\ a_{2k_1} & a_{2k_2} & \cdots & a_{2k_n}\\ \vdots & \vdots & & \vdots\\ a_{nk_1} & a_{nk_2} & \cdots & a_{nk_n}\end{vmatrix}\,.$$
Note that when splitting the columns we took the summation signs out, so that the same summation index could be used within each column; the common factors of the columns were then taken out as well (Property 1.4). The determinant in the right-hand side contains only elements of the matrix $A$; it is non-zero only if all the indices $k_1$, $k_2$, etc., are different. In other words, the $n$ summations over the indices $k_1$, $k_2$, etc., can in fact be replaced by a single sum taken over all permutations $P$ of the indices $(k_1, k_2, \ldots, k_n)$ running between 1 and $n$:
$$\det(AB) = \sum_P b_{k_11}b_{k_22}\cdots b_{k_nn}\begin{vmatrix} a_{1k_1} & a_{1k_2} & \cdots & a_{1k_n}\\ a_{2k_1} & a_{2k_2} & \cdots & a_{2k_n}\\ \vdots & \vdots & & \vdots\\ a_{nk_1} & a_{nk_2} & \cdots & a_{nk_n}\end{vmatrix}\,.$$
Now, looking at the determinant: if the indices $k_1$, $k_2$, etc., were ordered correctly, in ascending order from 1 to $n$, the determinant above would be equal exactly to $\det A$. To put them into the correct order from an arbitrary arrangement of the indices $k_1$, $k_2$, etc., a permutation is required, resulting in the sign $\epsilon_P = \epsilon_{k_1k_2\ldots k_n} = \pm 1$. Therefore,
$$\det(AB) = \sum_P \epsilon_{k_1k_2\ldots k_n}\,b_{k_11}b_{k_22}\cdots b_{k_nn}\,\det A = \left(\sum_P \epsilon_{k_1k_2\ldots k_n}\,b_{k_11}b_{k_22}\cdots b_{k_nn}\right)\det A = \det B\,\det A\,,$$
as required.
Table 1.3 Grouping the permutations of the numbers (1, 2, 3) into three groups, each corresponding to a different fixed first number

Label   Sequence after permutation   Number of permutations   Sign/Parity
a1      123                          0                        +1
a2      132                          1                        -1
b1      213                          0 + 1 = 1                -1
b2      231                          1 + 1 = 2                +1
c1      312                          1 + 1 = 2                +1
c2      321                          2 + 1 = 3                -1

In going from the first group (sequences a1, a2) to the second (b1, b2), only one additional permutation is required, giving an extra minus sign to the parity; going from the second group to the third (c1, c2), a single additional permutation is added again, bringing in another minus sign. Therefore, the element $b_{11}$ in Eq. (1.54), corresponding to the first group (sequences a1, a2), carries a plus sign, $b_{12}$ acquires a minus, while $b_{13}$ acquires another minus, giving it a plus in the end. In other words, the signs alternate, starting from plus for the very first term.
It appears that this method is very general and can be applied to a determinant of
arbitrary order. To formulate the method, we introduce a new quantity. Consider a
determinant jAj of order n. If we remove the i-th row and the j-th column from it (the
row and the column cross at element aij ), a determinant of order n 1 is obtained.
It is called a minor of aij and denoted Mij . For example, the minor of the element
a34 D 8 (bold underlined) of the determinant
ˇ ˇ ˇ ˇ
ˇ 1 2 3 4 ˇˇ ˇ 1 2 3 j ˇˇ ˇˇ ˇ
ˇ ˇ 1 2 3 ˇˇ
ˇ 4 3 2 1 ˇˇ ˇ 4 3 2 ˇ
jˇ ˇ ˇ
ˇ is M34 D ˇˇ D 4 3 2 ˇˇ :
ˇ 0 2 4 8 ˇˇ ˇˇ ˇˇ
ˇ ˇ
8 4 2 ˇ
ˇ 8 4 2 0 ˇ ˇ 8 4 2 jˇ
A minor Mij can be attached a sign .1/iCj in which case it is called a co-factor
of aij and denoted Aij D .1/iCj Mij . In the example above the co-factor of a34 is
.1/3C4 M34 D M34 . The co-factor signs can be easily obtained by constructing a
chess-board of plus and minus signs starting from the plus at the position 11:
ˇ ˇ
ˇC C C ˇ
ˇ ˇ
ˇ C C ˇ
ˇ ˇ
ˇ ˇ
ˇC C C ˇ
ˇ ˇ
ˇ C C ˇ
ˇ ˇ :
ˇC C C ˇ
ˇ ˇ
ˇ :: ˇ
ˇ : ˇ
ˇ ˇ
ˇ : : ˇˇ
ˇ :
48 1 Elements of Linear Algebra
Thus, the correct sign .1/iCj for the co-factor can be located on this chess-board
at the same i; j position as the element aij itself in the original determinant.
Now we are ready to formulate the general result:
X
n
jAj D a11 A11 C a12 A12 C C a1n A1n D a1k A1k ; (1.55)
kD1
X
n X
n
D a1j1 .1/j1 C1 M1j1 D a1j1 A1j1 ;
j1 D1 j1 D1
as required since the expression in the square brackets is nothing but the determinant
obtained by removing all elements of the first row and of the j1 -th column, i.e. it is
the minor M1j1 . Q.E.D.
Problem 1.48. Prove that similar formula can be written by expanding along
any row or column.
1.2 Matrices: Definition and Properties 49
Therefore, we obtain
ˇ ˇ
ˇx 1 1 1 ˇˇ
ˇ
ˇ1 x 0 0 ˇˇ
ˇ D x4 x2 x2 x2 D x2 x2 3 D 0 ;
ˇ1 0 x 0ˇ ˇ
ˇ
ˇ1 0 0 xˇ
p
which has the following solutions: x D 0; ˙ 3. J
Problem 1.49. Show that the solutions of the following equation with respect
to x,
ˇ ˇ
ˇx 1 0 1ˇ
ˇ ˇ
ˇ1 x 1 0ˇ
ˇ ˇ
ˇ0 1 x 1ˇ D 0 ;
ˇ ˇ
ˇ1 0 1 xˇ
are x D 0; ˙2.
Problem 1.50. Consider the following matrix:
0 1
2 0 1
A D @ 1 1 3 A :
0 2 1
(continued)
50 1 Elements of Linear Algebra
The same result is valid for the left triangular matrix as well.
As an interesting
example of a determinant calculation, let us consider a matrix
A D aij of a special structure in which only elements along the diagonal (i D j)
and next to it are non-zero, all other elements are equal to zero: aij ¤ 0 with j D i
and j D i ˙ 1. In other words, in the case of this so-called tridiagonal matrix aij D 0
as long as ji jj > 1. The determinant we are about to calculate is shown in the
left-hand side of the equation pictured in Fig. 1.6.
Let Ak be a matrix obtained from A by removing its first .k1/ rows and columns,
i.e. in Ak the diagonal elements are ˛k , ˛kC1 , etc., ˛n . In particular, A D A1 . Then,
opening the determinant of A along the first row, we have: jA1 j D ˛1 jA2 j C ˇ1 R12 ,
where R12 is the corresponding co-factor to the a12 D ˇ1 element of A, see the
second term in the right-hand side in Fig. 1.6. Opening now R12 along its first
column, we get R12 D ˇ1 jA3 j, yielding
jA1 j ˇ12
jA1 j D ˛1 jA2 j ˇ12 jA3 j H) D ˛1 :
jA2 j jA2 j = jA3 j
1.2 Matrices: Definition and Properties 51
Fig. 1.6 For the calculation of the determinant of a tridiagonal matrix: opening along the first
(upper) row. At the next step, the second determinant in the right-hand side is opened along its first
column leading to a product of ˇ1 with the minor enclosed in the green box
This fraction has a finite number of terms (as the matrix A is of a finite dimension)
and can be denoted in several different ways, e.g.
ˇ ˇ ˇ ˇ
ˇ12 ˇ ˇ22 ˇ ˇ32 ˇ 2 ˇ
ˇn1
1 D ˛1
j˛2 j˛3 j˛4 ˛n
or
It can be rewritten in a compact form using the matrix notations. Let us collect all the
unknown quantities x1 ; : : : ; xn into a vector-column X D .xi /, the coefficients aij into
a square matrix A D aij and, finally, the quantities in the right-hand side b1 ; : : : ; bn
into a vector-column B D .bi /. Then, instead of Eq. (1.58) we write simply
AX D B : (1.59)
If we multiply both sides of this equation from the left by A1 , we obtain in the
left-hand side A1 AX D EX D X, and hence we get a formal solution:
X D A1 B : (1.60)
Thus, the solution of the system of linear equations (1.59) is expressed via the
inverse of the matrix A of the coefficients. Although this solution is in many cases
useful, especially for analytical work, it does not give a simple practical way of
calculating X since it is not always convenient to find the inverse of a matrix,
especially if the dimension n of the problem is large.
Instead, we shall employ a different method due to Cramer. Consider
ˇ ˇ
ˇ a11 x1 a1n ˇ
ˇ ˇ
x1 jAj D ˇˇ ˇˇ :
ˇa x a ˇ
n1 1 nn
in which the first column is proportional to the second one. Now sum up both
expressions above, which yields
ˇ ˇ
ˇ a11 x1 C a12 x2 a1n ˇ
ˇ ˇ
x1 jAj D ˇˇ ˇˇ :
ˇa x C a x a ˇ
n1 1 n2 2 nn
1.2 Matrices: Definition and Properties 53
Next, we consider the zero determinant with the first column equal to the third one
times x3 , and then add this to the determinant above; we obtain in this way:
ˇ ˇ
ˇ a11 x1 C a12 x2 C a13 x3 a1n ˇ
ˇ ˇ
x1 jAj D ˇˇ ˇˇ :
ˇa x C a x C a x a ˇ
n1 1 n2 2 n3 3 nn
The elements along the first column can now be recognised to be elements of the
vector B in Eq. (1.58), i.e. we can write
ˇ ˇ ˇ ˇ
ˇ a11 x1 C a12 x2 C a13 x3 C C a1n xn a1n ˇ ˇ b1 a12 a1n ˇ
ˇ ˇ ˇ ˇ
x1 jAj D ˇˇ ˇˇ D ˇˇ ˇˇ ;
ˇa x C a x C a x C C a x a ˇ ˇ b a a ˇ
n1 1 n2 2 n3 3 nn n nn n n2 nn
(1.61)
which gives the required closed solution for x1 as a ratio of two determinants.
Multiplying jAj by x2 and inserting it into the second column and repeating the
above procedure, we obtain
ˇ ˇ
ˇ a11 b1 a1n ˇ
ˇ ˇ
x2 jAj D ˇˇ ˇˇ ; (1.62)
ˇa b a ˇ
n1 n nn
Problem 1.55. Determine the values of x for which the system of equations
with respect to c1 , c2 , c3 and c4 has a non-trivial solution:
8
ˆ
ˆ xc1 C c2 C c3 C c4 D 0
<
c1 C xc2 C c4 D 0
:
ˆ c1 C xc3 C c4 D 0
:̂
c1 C c2 C c3 C xc4 D 0
p
[Answer: x D 0; 1, .1 ˙ 17/=2.]
Proof. To prove that the vectors are linearly independent, we have to demonstrate
that the system of equations
c1 d1 C c2 d2 C C cp dp D 0
with respect to the coefficients c1 ; c2 ; : : : ; cp has only the trivial (zero) solution.
Write the equations in components of the vectors dk D .dik / using the right index in
dik to indicate the vector number k and the left one for the component:
8
< d11 c1 C d12 c2 C C d1p cp D 0
: (1.65)
:
dp1 c1 C dp2 c2 C C dpp cp D 0
56 1 Elements of Linear Algebra
This is a set of p linear algebraic equations with respect to the unknown coefficients
c1 ; c2 ; : : : ; cp with the zero right-hand side. It has a trivial solution if the determinant
of a matrix formed by the coordinates of vectors d1 ; d2 ; : : : ; dp is not equal to zero:
ˇ ˇ
ˇ d11 d1p ˇ
ˇ ˇ
jDj D ˇˇ ˇˇ ¤ 0 : (1.66)
ˇd d ˇ
p1 pp
We shall now show that this is indeed the case since the vectors d1 ; d2 ; : : : ; dp are
orthonormal. Indeed, due to their orthogonality and unit length, we can write
X
p
.di ; dj / D ıij or dki dkj D ıij : (1.67)
kD1
If we now introduce the Hermitian conjugate matrix D D dij with dij D dji , then
it is seen that Eq. (1.67) can be written simply as
X
p
dik dkj D ıij or D D D E ;
kD1
i.e. D is a unitary matrix whose determinant (see Problem 1.47) jDj D ˙1. We see
that the determinant of D is not equal to zero. This means that the only solution
of Eq. (1.65) is the trivial solution which, in turn, means that the set of vectors
d1 ; d2 ; : : : ; dp is indeed linearly independent. Q.E.D.
Note that requirement of the theorem that the vectors are normalised to unity
is not essential and was only assumed for convenience. It is sufficient to have
vectors orthogonal to guarantee their linear independence. The theorem just proven
is important as it says that if we have a set of p orthogonal vectors, they can form
a basis of a p-D vector space, and any vector from this space can be expanded in
terms of them. Correspondingly, that means that no more than p linearly independent
vectors can be constructed for the p-D space: any additional vector within the same
space will necessarily be linearly dependent on them.
valid for any continuous interval of the values of x, has only the trivial solution with
respect to the coefficients ˛1 , ˛2 , etc.:
˛1 D ˛2 D D ˛n D 0 :
It is possible to work out a simple method for verifying this. To this end, let us
generate .n 1/ more equations by differentiating both sides of Eq. (1.68) once,
twice, etc., .n 1/ times. We obtain n 1 additional equations:
.1/ .1/
˛1 f1 .x/ C ˛2 f2 .x/ C C ˛n fn.1/ .x/ D 0 ;
.2/ .2/
˛1 f1 .x/ C ˛2 f2 .x/ C C ˛n fn.2/ .x/ D 0 ;
.n1/ .n1/
˛1 f1 .x/ C ˛2 f2 .x/ C C ˛n fn.n1/ .x/ D 0 ;
.k/
where fi .x/ D dk fi =dxk . These equations, together with Eq. (1.68), form a system
of n linear algebraic equations with respect to the coefficients ˛1 , ˛2 , etc.:
W˛ D 0 ; (1.69)
It is easy to see that W has a triangular form with the elements along its diagonal
being 1, 1, 2Š, 3Š, 4Š, etc., nŠ. Calculating the determinant jWj along the first column
followed by the calculation of all the consecutive minors also along the first column
(cf. Problem 1.52) results in that
Y
n
jWj D 1 2Š 3Š : : : nŠ D iŠ ¤ 0 :
iD1
Problem 1.56. Demonstrate, using the method based on the calculation of the
Wronskian, that the functions sin x, cos x and eix are linearly dependent.
Problem 1.57. The same for functions ex , ex and sinh.x/.
Problem 1.58. Show that the exponential functions ex , e2x , etc., enx are linearly
independent.
The formulae obtained in the previous section allows us to derive a general formula
for the inverse matrix. We should also be able to establish a necessary condition for
the inverse of a matrix to exist.
To accomplish this program, we have to compare the Cramer’s solution of
Eq. (1.63) with that given by Eq. (1.60). To this end, let us first rewrite the
solution (1.63) in a slightly different form. We expand the determinant in the
numerator˚ of the expression for xi along its i-th column (the one which contains
elements bj ):
ˇ ˇ
ˇ a11 a1;i1 b1 a1;iC1 a1n ˇ
ˇ ˇ Xn
ˇ ˇ D b1 A1i C b2 A2i C C bn Ani D bk Aki ;
ˇ ˇ
ˇa a ˇ
n1 n;i1 bn an;iC1 ann kD1
where Aki are the corresponding co-factors of the matrix A of the coefficients of the
system of equations: indeed, by removing the i-th column and the k-th row we arrive
at the k; i co-factor Aki of A. Therefore, combining the last equation with Eq. (1.63),
we have
1 X
n
xi D bk Aki : (1.71)
jAj kD1
1.2 Matrices: Definition and Properties 59
On the other hand, we also formally have the solution in the form of Eq. (1.60)
which contains the inverse matrix; it can be written in components as
X
n
1
xi D A ik bk :
kD1
The last two expressions should give the same answer for any numbers bk .
Therefore, the following expression must be generally valid:
1 Aki
A ik D ; (1.72)
jAj
which gives a general expression for the elements of the inverse matrix sought for.
It can be used to calculate it in the case of arbitrary dimension of the matrixA. Note
the reverse order of indices in Eq. (1.72) above: if we denote by Acof D Aij the
matrix of co-factors, then
ATcof
A1 D : (1.73)
jAj
This is the general result we have been looking for.
It also follows from the above formula that A1 exists if and only if jAj ¤ 0.
Matrices that have a non-zero determinant are called non-singular as opposite to
singular matrices that have a zero determinant.
Example 1.14. I Find the inverse of the matrix
0 1
2 11
A D @ 1 3 2 A :
2 01
Therefore, noting the reverse order of indices between the elements of the inverse
matrix and the corresponding co-factors (i.e. using the transpose of the co-factor
matrix), we get
0 1 0 1
A A A 3=5 1=5 1=5
1 @ 11 21 31 A @
A1 D A12 A22 A32 D 1 0 1 A :
5
A13 A23 A33 6=5 2=5 7=5
It is easy to check that the matrix A1 we found is indeed the inverse of the matrix
A. Using the matrix multiplication, we get
0 10 1 0 1
3=5 1=5 1=5 2 11 100
A1 A D @ 1 0 1 A @ 1 3 2 A D @ 0 1 0 A D E ;
6=5 2=5 7=5 2 01 001
0 10 1 0 1
2 11 3=5 1=5 1=5 100
AA1 D @ 1 3 2 A @ 1 0 1 A D @ 0 1 0 A D E : J
2 01 6=5 2=5 7=5 001
Problem 1.59. The matrix of rotations by the angle around the y axis is
0 1
cos 0 sin
Ry . / D @ 0 1 0 A :
sin 0 cos
Calculate the inverse matrix Ry . /1 via building the corresponding co-
factors. Interpret your result.
Problem 1.60. Show using the method of co-factors that the inverse of the
matrix
0 1 0 1
a0f b 0 bf
1
A D @0 b dA is A1 D @ cd a cf ad A :
b.a cf /
c01 cb 0 ab
Problem
1 1.62. For the same tridiagonal matrix A, show that the element
A 12 of its inverse can be represented as:
" ˇ ˇ ˇ ˇ #1
1
1 ˛1 ˇ12 ˇ ˇ22 ˇ ˇ32 ˇ 2 ˇ
ˇn1
A 12
D ˛1 :
ˇ1 ˇ1 j˛2 j˛3 j˛4 ˛n
When a p-th order matrix A is multiplied with a vector x, this gives another vector
y D Ax. However, in many applications, most notably in quantum mechanics, it
is important to know particular vectors x for which the transformation Ax gives
a vector in the same direction as x, i.e. different from x only by some (generally
complex) constant factor :
Ax D x : (1.74)
Here both x and are to be found; this problem is usually called an eigenproblem.
The number is called an eigenvalue, and the corresponding to it vector x
eigenvector. More than one pair of eigenvectors and eigenvalues may exist for the
given square matrix A. Note that the vectors x and numbers are necessarily related
to each other by the nature of equation (1.74) defining them. Also, note that if x
is a solution of Eq. (1.74) with some or, as it is usually said, it is an eigenvector
corresponding to the eigenvalue , then any vector cx with an arbitrary complex
factor c is also an eigenvector with the same eigenvalue. In other words, eigenvectors
are always defined up to an arbitrary prefactor.
Obviously, the vector x D 0 is a solution of Eq. (1.74) with D 0. However, in
applications this trivial solution is never of a value, and so we shall only be interested
in non-trivial solutions of this problem.
To solve the problem, we shall rewrite it in the following way: let us take the
vector x to the left-hand side and rewrite it in a matrix form as Ex using the
unit matrix E. We get
.A E/ x D 0 : (1.75)
This equation should ring a bell as it is simply a set of linear algebraic equations
with respect to x with the zero right-hand side. We know from Sect. 1.2.7 that this
system of equations has a non-trivial solution if
Problem 1.63. Prove that the square matrices A and AT share the same
eigenvalues.
Problem 1.64. Prove that the square matrices A and U 1 AU share the same
eigenvalues.
Problem 1.65. Let the n n matrix A has eigenvalues 1 ; 2 ; : : : ; n . Show
that the eigenvalues of the matrix Am D AA …
„ ƒ‚ A are m
1 ; 2 ; : : : ; n .
m m
The determinant of a square matrix A can be directly expressed via its eigenvalues
as demonstrated by the following Theorem.
Theorem 1.5. The determinant of a matrix is equal to the product of all its
eigenvalues:
Y
p
detA D .i/ : (1.77)
iD1
Solution. First if all, we need to solve the secular equation to find the eigenvalues:
ˇ ˇ ˇ ˇ ˇ ˇ
ˇ 1 1 1 0 ˇˇ ˇˇ 1 1 0 ˇˇ ˇˇ 1 1 ˇˇ
ˇ D D D2 6 D 0 ;
ˇ 4 2 01 ˇ ˇ 4 2 0 ˇ ˇ 4 2 ˇ
that has two solutions: .1/ D 3 and .2/ D 2. Substituting the value of .1/ into
the matrix equation Ax D .1/ x, we obtain (in components) two equations for the
two components x1 ; x2 of the eigenvector x:
x1 C x2 D 3x1 4x1 C x2 D 0
Ax D 3x H) or :
4x1 C 2x2 D 3x2 4x1 x2 D 0
The two equations are equivalent since, by construction, the corresponding rows of
the matrix A .1/ E are linearly dependent because its determinant is zero (recall
Property 1.7 of the determinant in Sect. 1.2.6.2). Solving either of the two yields
.1/ .1/
x2 D4x1 ; i.e.
the eigenvector corresponding to the eigenvalue D 3 is x D
a 1
Da , where a is an arbitrary number.6 Similarly, using .2/ D 2 in
4a 4
the matrix equation Ax D .2/ x yields two equations
x1 C x2 D 2x1 x1 C x2 D 0
or :
4x1 C 2x2 D 2x2 4x1 C 4x2 D 0
as it should be! J
6
Recall that each eigenvector is defined up to an arbitrary constant prefactor anyway.
64 1 Elements of Linear Algebra
(we
p have expanded the determinant with respect to its first row), yielding .1;2/ D
.3/ .1/
˙ 3 and D 2. The p system of equations for the first eigenvector x , which
.1/
corresponds to D 3, reads
8 p
< .1 3/xp 1 C x2 C x3 D 0
x1 3x2 p 2x3 D 0 ;
:
x1 x2 C .1 3/x3 D 0
0 1
1
p p T
which yields the vector x.1/ D a @ 3 A D a 1; 3; 1 with arbitrary a.
1
Again, one of the equations must be equivalent to two others, so we set x3 D a and
expressed x1 and x2 pfrom the first two equations via x3 . Similarly, for the second
eigenvalue .2/ D 3 we get the system of equations
8 p
< .1 C 3/xp 1 C x2 C x3 D 0
x1 C 3x2 p 2x3 D 0 ;
:
x1 x2 C .1 C 3/x3 D 0
0 1
1
p p T
that yields the eigenvector x.2/ D b @ 3 A D b 1; 3; 1 . Finally, performing
1
similar calculations for the third eigenvalue .3/ D 2, we get the equations
8
< x1 C x2 C x3 D 0
x 2x2 2x3 D 0 ;
: 1
x1 x2 x3 D 0
1.2 Matrices: Definition and Properties 65
01
0
which results in x.3/ D c @ 1 A D c .0; 1; 1/T . In this case the first and the third
1
equations are obviously equivalent and hence one of them should be dropped. We
fixed x3 D c and expressed x1 and x2 via x3 from the second and third equations.
Hence, in this case we obtained three solutions of the eigenproblem, i.e. three
pairs of eigenvectors and corresponding to them eigenvalues. J
In some cases care is needed in solving equations for the eigenvectors as
illustrated by the following example.
Example 1.17. I Find eigenvectors and eigenvalues of a matrix
0 1
100
A D @0 4 2A :
024
Note that the first equation has as a solution any x1 ; the second and the third
equations should be solved simultaneously (e.g. solve for x2 from the second
equation and substitute into the third), resulting in the zero solution for x2 and
x3 .7 So, the first eigenvector corresponding to 1 D 1 is x.1/ D .a; 0; 0/T with
7
This situation can also be considered as a solution of two algebraic linear equations with the zero
right-hand side:
3x2 C 2x3 D 0
:
2x2 C 3x3 D 0
As
weknow, this system of equations has a non-trivial solution only if the determinant of its matrix
32
is equal to zero. Obviously, this is not the case and hence only the zero solution exists.
23
66 1 Elements of Linear Algebra
an arbitrary constant a. Finding eigenvectors for the other two eigenvalues is more
straightforward. Consider, for instance, the second one, 2 D 2:
0 10 1 0 1 8 8
100 x1 x1 < x1 D 0 < x1 D 0
@ 0 4 2 A @ x2 A D 2 @ x2 A H) 2x2 C 2x3 D 0 H) x D x3 :
: : 2
024 x3 x3 2x2 C 2x3 D 0 x3 D x2
We see that in this case the second and the third equations give identical information
about x2 and x3 , which is that x2 D x3 ; nothing can be said about the absolute
values of them. Therefore, we can take x2 D b with an arbitrary b and then write the
eigenvector as x.2/ D .0; b; b/T D b .0; 1; 1/T . Similarly the third eigenvector is
found to be x.3/ D .0; c; c/T D c .0; 1; 1/T with an arbitrary c. J
If eigenvalues of a matrix A are all different, they are said to be non-degenerate. If
an eigenvalue repeats itself, it is said to be degenerate. We have seen in the examples
above that if a matrix A of dimension p has all different eigenvalues, it has exactly p
linearly independent eigenvectors. In fact, this is a very general result that is proven
by the following Theorem.
Theorem 1.6. If all eigenvalues of a square matrix A are different, then all its
eigenvectors are linearly independent.
Proof. Let eigenvalues and eigenvectors of the matrix A be .i/ and x.i/ (i D
1; : : : ; p), respectively. If the vectors are all linearly independent, then the equation
X
p
c1 x.1/ C C cp x.p/ D ci x.i/ D 0 (1.78)
iD1
X .i/
p
A .2/ E ci x D c1 A .2/ E x.1/ C c2 A .2/ E x.2/
iD1
X .i/
p
C A .2/ E ci x D 0 :
iD3
Now act from the left with the matrix A .3/ E . First, we notice that the matrices
A .2/ E and A .3/ E commute since A and E do. Thus, we have
X .i/ X .i/
A .3/ E A .2/ E ci x D A .2/ E A .3/ E ci x D 0 :
i¤2 i¤2
Similarly to the above, the term i D 3 in the sum will disappear since x.3/ is the
eigenvector corresponding to the eigenvalue .3/ . Thus we get
X
A .3/ E A .2/ E ci x.i/ D 0 :
i¤2;3
Repeating this procedure, we finally remove all the terms in the sum except for the
very first one:
" #
Y
n
A E c1 x.1/ D A .p/ E A .3/ E A .2/ E c1 x.1/ D 0:
.i/
iD2
(1.79)
However, for any i ¤ 1, we get
A .i/ E x.1/ D Ax.1/ .i/ x.1/ D .1/ x.1/ .i/ x.1/ D .1/ .i/ x.1/ :
Therefore, after repeatedly using the above identity, Eq. (1.79) turns into
c1 .1/ .p/ .1/ .3/ .1/ .2/ x.1/ D 0 :
If all the eigenvalues are different, then this equation can be satisfied if and only if
c1 D 0. Q
Similarly, by operating with the matrix i¤2 A .i/ E on the left-hand side of
Eq. (1.78), we obtain c2 D 0. All other coefficients ci are found to be zero in the
same way. Q.E.D.
Note that if there are some repeated (degenerate) eigenvalues, the number of
distinct eigenvectors may be smaller than the dimension of the matrix p. The two
examples below illustrate this point.
Example 1.18. I Find eigenvalues and eigenvectors of a matrix
1 1
AD :
1 3
from where we get x1 D x2 . Hence only a single eigenvector x.1/ D a.1; 1/T is
found. Of course, formally, one can always construct the second eigenvector, x.2/ D
b.1; 1/T , but this is linearly dependent on the first one. Thus, there is only a single
linearly independent eigenvector. J
Example 1.19. I Find eigenvectors and eigenvalues of the matrix
0 1
3 2 1
A D @ 3 4 3 A :
2 4 0
yielding .1/ D 5, .2;3/ D 2. For the first eigenvalue, the eigenvector is obtained
in the usual way from
8
< 8x1 2x2 x3 D 0
.A C 5E/x D 3x1 C x2 3x3 D 0 ;
:
2x1 4x2 C 5x3 D 0
yielding x.1/ D a.1; 3; 2/T . The situation with the other two eigenvectors is a bit
peculiar: since both eigenvalues are degenerate, the eigenvector equations are the
same:
8 8
< x1 2x2 x3 D0 < x1 2x2 x3 D0
.A 2E/xD 3x1 6x2 3x3 D0 ; or after simplification, x1 2x2 x3 D0 ;
: :
2x1 4x2 2x3 D0 x1 2x2 x3 D0
i.e. all three equations are identical! Thus, the only relationship between compo-
nents of either of the eigenvectors x.2/ and x.3/ is that x1 D 2x2 C x3 , i.e. there are
two arbitrary constants possible. This means that two linearly independent vectors
can be constructed. There is an infinite number of possibilities. For instance, if we
take x2 D 0, then we obtain x1 D x3 and hence x.2/ D a.1; 0; 1/T , and by setting
1.2 Matrices: Definition and Properties 69
instead x3 D 0, we obtain another vector x.3/ D b.2; 1; 0/T . It is easily seen that all
three vectors are linearly independent. Any linear combination of vectors x.2/ and
x.3/ can also be constructed,
to serve as the second and third eigenvectors provided that the new vectors are
linearly independent. J
X
k
y.i/ D ˛ij x.j/
jD1
can be written as
0 1
X
N
ˇj2 Y
N
det A D @˛0 A ˛i :
jD1
˛j iD1
Show that all eigenvalues of the matrix A are obtained by solving the
transcendental equation
X
N
ˇj2
˛0 D :
jD1
˛j
Prove specifically that the eigenvalues are not given by the diagonal ele-
ments ˛j . Then, show that the normalised to unity eigenvector of A correspond-
ing to the eigenvalue is given by
0 1
1
B C
1 B 1C
B C
e D q PN B 2 C ;
B C
1 C jD1 j2 @ A
N
where i D ˇi = .˛i /. Finally, show that any two eigenvectors, e and e0 ,
corresponding to different eigenvalues ¤ 0 , are orthogonal.
1.2 Matrices: Definition and Properties 71
We have seen above that in the case of degenerate eigenvalues of a pp matrix A it is
not always possible to find all its p linearly independent eigenvectors. We shall show
in this section that for Hermitian, AT D A D A matrices all their eigenvectors
can always be chosen linearly independent even if there are some degenerate
eigenvalues. The case of symmetric matrices, AT D A, can be considered as a
particular case of the Hermitian ones when all elements of the matrices are real, so
that there is no need to consider the case of the symmetric matrices separately. Also,
quantum mechanics is based upon Hermitian matrices, so that their consideration is
of special importance.
x Ax D x x : (1.80)
x A x D x x : (1.81)
As the matrix A is Hermitian, the left-hand sides of the two equations (1.80)
and (1.81) are the same. Therefore,
x x D 0 :
Since x ¤ 0, x x > 0, so that the only way to satisfy the above equation is to admit
that is real: D . Q.E.D.
Note that in quantum mechanics measurable quantities correspond to eigenvalues
of Hermitian matrices associated with them. This theorem guarantees that all
measurable quantities are real.
Another important theorem deals with eigenvectors of a Hermitian matrix:
are .1/ D 1, .2/ D 1 and .3/ D 2. Note that all are indeed real. The eigenvector
x.1/ is obtained from
0 10 1 8
1 i i x1 < x1 C ix2 C ix3 D 0
@ i 0 0 A @ x2 A D 0 H) ix1 D 0 ;
:
i 0 0 x3 ix1 D 0
that gives x.1/ D a .0; 1; 1/T , where a is an arbitrary complex number. Similarly,
x.2/ is found via solving
1.2 Matrices: Definition and Properties 73
0 10 1 8
1 i i x1 < x1 C ix2 C ix3 D 0
@ i 2 0 A @ x2 A D 0 H) ix1 C 2x2 D 0 ;
:
i 0 2 x3 ix1 C 2x3 D 0
yielding x.2/ D b .1; i=2; i=2/T . Finally, x.3/ is determined from equations
0 10 1 8
2 i i x1 < 2x1 C ix2 C ix3 D 0
@ i 1 0 A @ x2 A D 0 H) ix1 x2 D 0 ;
:
i 0 1 x3 ix1 x3 D 0
and results in x.3/ D c .1; i; i/T . The obtained eigenvectors are all orthogonal:
0 1
1
i i
x.1/ x.2/
D a b 0 1 1 @ A
i=2 D a b
D0;
2 2
i=2
0 1
1
i2 i2
x.2/ x.3/ D b c 1 2i 2i @ i A D b c 1 C C D 0;
2 2
i
0 1
1
x.1/ x.3/
D a c 0 1 1 @ i A D a c Œi C i D 0 : J
i
Consider two matrices: A and non-singular B. Then, the special product B1 AB is
called a similarity transformation of A by B; the matrices A and B1 AB are called
similar.
Next, consider an eigenvalue/eigenvector problem Ax D x, where A is a square
p p matrix. It is convenient in this section to denote eigenvectors of A as xj D xij ,
where in the components xij the second index corresponds to the vector number and
the first to its components. We assume that the matrix A has all its eigenvectors
linearly independent. Let us have all eigenvectors of A arranged as columns,
0 1
x11 x12 x1p
U D x1 x2 xp D @ A : (1.82)
xp1 xp2 xpp
D D U 1 AU ; (1.83)
where
0 1
1 0
D D @ A (1.84)
0 p
is a diagonal matrix containing all eigenvalues of A on its main diagonal.
Note that it was not required for this prove to work that the eigenvectors xj be
orthogonal. Q.E.D.
1.2 Matrices: Definition and Properties 75
We see from Eq. (1.83) that the similarity transformation of A by the matrix U,
that contains as its columns the eigenvectors of A, transforms A into the diagonal
form.
Inversely, let us multiply the matrix identity AU D UD from the right by U 1 .
We get
A D UDU 1 : (1.85)
Theorem 1.10. The determinant of any Hermitian matrix A is equal to the product
of all its eigenvalues:
Y
det A D i : (1.86)
i
Proof. We write first A D UDU and then calculate the determinant on both sides:
in the left-hand side we simply have det A, while in the right we get
Y
det UDU D det U det D det U D det D D i ;
i
the modal matrix U can always be chosen unitary with det U detU D
since
det UU D detE D 1. Q.E.D.
Example 1.21. IDiagonalise the matrix
0 1
0 i i
A D @ i 1 0 A :
i 0 1
with the eigenvalues on the main diagonal, as required. Note that the eigenvalues
run along the diagonal exactly in the same order as that of eigenvectors chosen in
the modal matrix U. J
Note that in the above example the product of eigenvalues is 2; a direct
calculation of the determinant det A yields exactly the same result.
The following theorem establishes an important property of the similar matrices.
1.2 Matrices: Definition and Properties 77
The matrix inside the determinant in the second equation can be rearranged by using
E D U 1 U, i.e.
U 1 AU E D U 1 AU U 1 U D U 1 .A E/ U ;
Au D u and Bv D U 1 AUv D v :
Multiply the second equation by U from the left to get A .Uv/ D .Uv/. It is
seen that the vector Uv is an eigenvector of A with the eigenvalue . However, we
know that u is also an eigenvector of A with the same , and there is only one such
eigenvalue. Therefore, u D Uv (up to an arbitrary multiplier).
In the general case of repeated eigenvalues (degeneracy) one can always con-
struct linear combinations of eigenvectors ui (of A) or vi (of B) corresponding to the
same eigenvalue (see Problem 1.66); these will still serve as accepted eigenvectors
for that eigenvalue. Therefore, in this case one can write that
X
Uvi D cik uk
k
with some coefficients cik forming a matrix C; here we sum up over all eigenvectors
corresponding to the same . Inversely,
X X
uk D c1
ki Uvi D U c1 0
ki vi D Uvk ;
i i
P
where c1 1 0
ki are elements of the inverse matrix C , and vk D
1
i cki vi are linear
combinations of the eigenvectors of B. Q.E.D.
78 1 Elements of Linear Algebra
As a simple example, let us find all eigenvalues and eigenvectors of the matrix
1 1
AD
4 2
As we found in Example 1.15, the matrix A has two eigenvalues .1/D3 and .2/ D
1
2, the corresponding normalised eigenvectors being x.1/ D p1 and x.2/ D
17 4
1
p1 . Note that these are not orthogonal (they do not need be as A is neither
2 1
symmetric nor Hermitian). Then, the matrix A after the similarity transformation
becomes
1 1=2 1=2 1 1 1 1 3 0
B D U AU D D :
1=2 1=2 4 2 1 1 3 2
leading to the same eigenvalues, 3 and 2. The first eigenvector y.1/ is obtained
from
0y1 C 0y2 D 0
;
3y1 5y2 D 0
5=3
so that y1 D 53 y2 and the normalised vector y.1/ D p3 . It is indeed directly
34 1
proportional to the vector x.1/ via U (we can omit the unnecessary multipliers):
1 1 5=3 2=3 2 1
Uy.1/ D D D x.1/ :
1 1 1 8=3 3 4
.2/ 0
yielding y1 D 0 and the normalised eigenvector y D . It is also related to
1
x.2/ via U:
1 1 0 1 1
Uy.2/ D D D x.2/ :
1 1 1 1 1
Now we are ready to conclude the proof of Theorem 1.8. There we established that
two eigenvectors x and y of a Hermitian matrix A corresponding to two different
eigenvalues are orthogonal, i.e. .x; y/ D 0, we did not consider the case of repeated
eigenvalues in detail. Now we shall show that one can always choose n linearly
independent eigenvectors for any Hermitian n n matrix, even if the latter has
repeated eigenvalues.
Theorem 1.12. For any Hermitian pp matrix A (even with repeated eigenvalues)
there exists a unitary matrix U such that the similarity transformation U AU D D
results in a diagonal matrix D containing all eigenvalues of A.
Then, consider the similarity transformation U1 AU1 . The i; j components of the
matrix formed this way are
!
X X X
U1 AU1 D xki akl xlj D xki akl xlj :
ij
kl k l
80 1 Elements of Linear Algebra
Now, consider specifically j D 1 which corresponds to the first column of the matrix
U1 AU1 . Using Eq. (1.87), we have
!
X X X X
U1 AU1 D xki akl xl1 D xki .1 xk1 / D 1 xki xk1 D 1 ıi1 ;
i1
k l k k
where in the last step we used the fact that, by construction, the vectors xi D .xki /
and x1 D .xk1 / are orthonormal. We see from the above that all elements of the
first column are zeros, apart from the 1; 1 element which is equal to 1 , the first
eigenvalue of A. On the other hand, the matrix U1 AU1 must be Hermitian since A is
such (see Problem 1.36). Therefore, all elements on the first row, apart from the first
one, should also be zeros. In other words, the structure of the matrix U1 AU1 must
be this:
0 1
1 0 0
B 0 C
U1 AU1 D B C
@ Ap1 A ;
0
with all elements, apart from the ones in the first row and column, forming a
.p 1/ ˇ.p 1/ ˇmatrix
ˇ Aˇ p1 (this will be the 1; 1-minor of the matrix U1 AU1 ).
ˇ ˇ ˇ ˇ
Note that ˇU1 AU1 ˇ D ˇU1 ˇ jAj jU1 j D jAj, since U1 is unitary by construction. On
the other
ˇ hand,ˇ because of the special structure of the matrix U1 AU1 , we can also
ˇ ˇ ˇ ˇ
write: ˇU1 AU1 ˇ D 1 Ap1 ˇ, and hence (see Eq. (1.77)) the matrix Ap1 contains
ˇ
all other eigenvalues of A (see Theorem 1.10). In other words, if 1 is not repeated,
Ap1 does not have this one but has all others; if 1 is repeated, then Ap1 has this
one as well, but repeated one time less. At the next step, we consider one eigenvector
of Ap1 , let us call it y2 , corresponding to the eigenvalue 2 of Ap1 (and hence of
A). Repeating the above procedure, we can construct a Hermitian .p 1/ .p 1/
matrix S2 D y2 yp such that it brings the matrix Ap1 into the form:
0 1
2 0 0
B 0 C
S2 Ap1 S2 D B C
@ Ap2 A ;
0
where Ap2 is a square matrix of dimension p 2. Therefore, if one constructs a
p p matrix
0 1
1 0 0
B 0 C
U2 D B C
@ S2 A ;
0
1.2 Matrices: Definition and Properties 81
then it will bring the matrix U1 AU1 into the following form:
0 10 10 1
1 0 0 1 0 0 1 0 0
B 0 CB 0 CB 0 C
U2 U1 AU1 U2 D B CB CB C
@ S A @ Ap1 A @ S2 A
2
0 0 0
0 1
0 1 1 0 0 0
1 0 0 B 0 0 0C
B 0 C B B 2 C
C
DB
@ S Ap1 S2 A B
C D B 0 0 C D .U1 U2 / A .U1 U2 / :
2
C
@ Ap2 A
0
0 0
ˇ ˇ ˇ ˇ
ˇ ˇ
As before, the determinant of ˇ.U1 U2 / A .U1 U2 /ˇ D jAj is also equal to 1 2 ˇAp2 ˇ
because of the special structure of the matrix above obtained after the similarity
transformation. Therefore, Ap2 has all eigenvalues of A apart from 1 and 2 .
This process can be repeated another p 3 times by constructing a sequence of
unitary matrices U3 , U4 , etc., Up1 , until only the last element at the p; p position is
left which is a scalar Ap.p1/ D A1 . This way the matrix A would be finally brought
into the diagonal form:
0 1
1 0 0 0
B 0 0 0 C
B B
2 C
C
U1 U2 : : : Up1 A U1 U2 : : : Up1 D B 0 0 3 0 C D D ;
B C
@ A
0 0 0 p
where the last element at the pp position must be the last eigenvalue p of A
which we have not yet considered in our construction. Thus, we managed to
construct a unitary matrix U D U1 U2 : : : Up1 which upon performing the similarity
transformation, U AU, brings the original matrix A into the diagonal form D with
its eigenvalues on the principal diagonal, as required. Our proof is general as it was
not based on assuming that all eigenvalues are different. Q.E.D.
Now we should be able to finally prove that even in the case of a Hermitian p p
matrix A having degenerate (repeated) eigenvalues it is always possible to choose
exactly p linearly independent eigenvectors fxi g (i D 1; : : : ; p) corresponding to its
eigenvalues fi g. This is shown by the following Theorem.
Theorem 1.13. Consider a Hermitian pp matrix A which has p eigenvalues fi g
amongst which there could be repeated ones. Then, it is always possible to choose
exactly p orthogonal eigenvectors fxi g (i D 1; : : : ; p).
82 1 Elements of Linear Algebra
Proof. We know from the previous Theorem that there exists a unitary matrix
U D x1 x2 xp D xij ;
such that U AU D D with D D dij D i ıij only containing all eigenvalues
of A on its diagonal. Recall that the second (column) index of xij in U indicates the
vector number, while the first index corresponds to its components. But, multiplying
U AU D D from the left with U, we obtain AU P D UD. Let usP write the
last equation
in components. In the left-hand side we have k aik xkj D k aik xj k , while the
right-hand side reads
X X X
xik dkj D xik j ıkj D j xij D j xj i H) aik xj k D j xj i ;
k k k
i.e. the j-th column xj D xij of the matrix U is an eigenvector of A corresponding to
the eigenvalue j . Since the matrix U is unitary, all the eigenvectors are orthogonal
(and hence linearly independent). Q.E.D.
The Theorem just proven generalises our previous results of Sect. 1.2.10.2 to
arbitrary Hermitian matrices.
X
d
yi D cki xk ; i D 1; : : : ; p ;
kD1
of the eigenvectors can also serve as eigenvectors of A with the same eigenvalue
(note the “reverse” order of indices in the coefficients above!). Show that the
vectors fyi g will be orthonormal if the matrix C D .cik / of the coefficients ofthe
linear transformation satisfies the matrix equation: C SC D E, where
S D sij
is the matrix of dot products of the original vectors, sij D xi ; xj . In particular,
it is seen that if S D E (the original set is orthonormal), then C must be unitary
to preserve this property for the new eigenvectors.
A question one frequently asks in quantum mechanics is what are the conditions
at which two (or more) physical quantities can be observed (measured) at the same
time. The answer to this question essentially boils down to the following question: is
1.2 Matrices: Definition and Properties 83
it possible to diagonalise two (or more) matrices associated with the chosen physical
quantities using the same similarity transformation U? This problem is solved with
the following Theorem.
Theorem 1.14. Two Hermitian matrices can be diagonalised with the same
similarity transformation U if and only if they commute.
Proof. The necessary condition of the proof we shall do first. Consider two p p
matrices A and B. Let us assume that there exists the same similarity transformation
that diagonalises both of them:
Da D U 1 AU and Db D U 1 BU :
Da Db D Db Da H) U 1 AUU 1 BU D U 1 BUU 1 AU
H) U 1 ABU D U 1 BAU H) AB D BA ;
i.e. the vector y D Bxj must also be an eigenvector of A with the same eigenvalue
j as xj . Since this is a unique eigenvalue, this is only possible if y is proportional
to the xj , i.e. y D ˇxj . This means that Bxj D ˇxj , i.e. xj is also an eigenvector
of B. Hence, both matrices A and B share the same eigenvalues and eigenvectors,
and hence the same modal matrix U D x1 xp will diagonalise both of them.
If A has a d-fold (d > 1) repeated eigenvalue , then Bxj must be a linear
combination of all eigenvectors associated with this particular eigenvalue. For
simplicity let us assume that the first d eigenvectors of A correspond to the same
eigenvalue . Then, for any j between 1 and d,
X
d
Bxj D cjk xk ; j D 1; : : : ; d ;
kD1
with some coefficients cjk forming a square d d matrix C. This matrix can easily
be seen to be Hermitian. Indeed,
Xd X
d
xi ; Bxj D cjk .xi ; xk / D cjk ıik D cji
kD1 kD1
84 1 Elements of Linear Algebra
and also
Xd
Xd
Bxj ; xi D xj ; B xi D xj ; Bxi D cik xj ; xk D cik ıjk D cij ;
kD1 kD1
since B D B . However, Bxj ; xi D xi ; Bxj , from which it immediately follows
that cji D cij , i.e. C D C , i.e. it is indeed a Hermitian matrix. Once C is Hermitian,
it can be diagonalised, i.e. there exists a d d unitary matrix S D sij such that
C D S DS with D being a diagonal matrix of eigenvalues P fk g of C.
Now it is easy to see that the vectors yi D dkD1 sik xk (with the coefficients sik
from S) are eigenvectors of both A and B. Indeed,
X X X
Ayi D sik Axk D sik xk D sik xk D yi
k k k
and
X X X X
Byi D sik Bxk D sik ckl xl D .SC/il xl
k k l l
X X X
D SS DS il xl D .DS/il xl D i sil xl D i yi :
l l l
Therefore, even in the case of repeated eigenvalues there exists a common set of
eigenvectors for both matrices. Repeating this procedure for all repeated eigenvec-
tors, we can collect all eigenvectors which are common to both matrices. Therefore,
collecting all eigenvectors
(corresponding
both to repeated and distinct eigenvalues)
into a matrix U D y1 yp (we should use yj D xj in the case of a distinct j-th
eigenvalue) and following the statement of Theorem 1.9, we conclude that one can
diagonalise both A and B at the same time via the same similarity transformation:
Da D U 1 AU and Db D U 1 BU. This proves the theorem. Q.E.D.
Example 1.22. I Show that the matrices
31 12
AD and B D
13 21
Then, in order to find the required similarity transformation, we should find the
modal matrix of one of the matrices. For instance, consider A. The secular equation,
1.2 Matrices: Definition and Properties 85
ˇ ˇ
ˇ3 1 ˇ
ˇ ˇ 2 2
ˇ 1 3 ˇ D .3 / 1 D 6 C 8 D 0 ;
Problem 1.75. Show that any symmetric matrix A D aij can be written via
its eigenvalues i and eigenvectors xi D .xki / as a sum
X
X
AD i xi xi or aij D k xki xkj : (1.88)
i k
Here in xik and xkj the right index corresponds to the eigenvector number, while
the left to its component.
One can see that any Hermitian (symmetric) matrix can actually be written as a
sum over its all eigenvalues and eigenvectors (the so-called spectral theorem). This
theorem opens a way to define functions of matrices. Indeed, consider a square of a
symmetric matrix A as the matrix product AA:
Since the matrix D is diagonal, the product DD D D2 is also a diagonal matrix with
the squares of the eigenvalues of A on its main diagonal, i.e.
0 2 1
1 0
D2 D DD D @ A :
0 2p
One can see that A2 has the same form as A of Eq. (1.85), but with squares of its
eigenvalues (instead of the eigenvalues themselves) in the sum. Therefore, we can
immediately write
X
A2 D 2i xi xi :
i
Similarly, one can show (e.g. by induction) that for any power n,
X
An D ni xi xi :
i
p Xp 1 X 1
A D A1=2 D i xi xi or p D A1=2 D p xi xi :
i
A i
i
Show that eigenvalues 1;2 and the corresponding normalised eigenvectors x1;2
of H are
q
1 2 2 1 T
1 D 1C 2C . 1 2 / C 4 jTj and x1 D q ;
2 a
a2 C jTj2
q
1 2 2 1 T
2 D 1C 2 . 1 2 / C 4 jTj and x2 D q ;
2 b
b2 C jTj2
where
q
1 2 2
aD 1 2 . 1 2 / C 4 jTj and
2
q
1
bD 1 2 C . 1 2/
2
C 4 jTj2 :
2
Then verify explicitly that the two eigenvectors are orthogonal and that H can
be indeed written in its spectral form as H D 1 x1 x1 C 2 x2 x2 .
88 1 Elements of Linear Algebra
Problem 1.79. The density matrix written via its eigenvalues and eigenvec-
tors X is
X
D X X :
Show that if the density matrix is idempotent, i.e. 2 D , then one eigenvalue
is equal to one and all others to zero, i.e. in this case D X0 X0 , where 0 is
the only non-zero eigenvalue.
Special significance in physics has the matrix G.z/ D .zEA/1 , called resolvent
of a Hermitian matrix A; it is a function of (generally complex) number z. It is
defined via the scalar function f .x/ D .z x/1 D 1=.z x/. Using the spectral
theorem, we can write down G.z/ explicitly as
X 1
X xi x
G.z/ D xi xi D i
: (1.91)
i
z i i
z i
Problem 1.80. Show that in order for Q to be real, the matrix A must be
Hermitian, A D A.
1.2 Matrices: Definition and Properties 89
Therefore, we shall assume in this section that A is Hermitian. Note that if the
matrix A and the variables xi are real, then A in Eq. (1.92) can always be made
symmetric, i.e. satisfying aij D aji . Indeed, if the form Q contains an expression
axi xj C bxj xi with different coefficients, a ¤ b, it is always possible to write this
sum as cxi xj C cxj xi with c D .a C b/ =2.
It is sometimes necessary to find a special linear combination Y D UX of the
original variables X, such that the quadratic form is diagonal in the new variables Y,
i.e. it does not contain off-diagonal elements at all (the so-called canonical form):
X
QD ci jyi j2 : (1.93)
i
Solution. First, write Q using matrix notations paying specific attention to the off-
diagonal elements:
T T
x 34 x x x
Q D 3x2 C8xyC3y2 D 3x2 C4xyC4yxC3y2 D D A :
y 43 y y y
34
The symmetric matrix A D has two eigenvalues 1 D 1 and 2 D 7 and
43
1 1 1 1
the corresponding normalised eigenvectors are u1 D p and u2 D p ,
2 1 2 1
so that the modal matrix
90 1 Elements of Linear Algebra
p p
1=p 2 1=p2 1 0
UD ; and hence U AU D
T
:
1= 2 1= 2 0 7
Thus, the new variables are
p p p
y1 1=p 2 1=p2 x y1 D .x C y/=p 2
YD DU XDT
H) ;
y2 1= 2 1= 2 y y2 D .x C y/= 2
and the quadratic form Q in the new variables reads Q D y21 C 7y22 , i.e. it has the
canonical form. Note that by a direct substitution of the new variables y1 and y2 into
the Q it is returned back into its original form via x and y. J
It can easily be seen that any unitary transformation of the modal matrix of A
also diagonalises a quadratic form based on it. Indeed, consider Q D X AX and
A D UDU with U being the modal matrix of A. Since A is Hermitian, U can
always be chosen unitary. If V is another unitary matrix, then the transformation
UV also diagonalises Q. Indeed, define auxiliary variables Y D .UV/ X D V U X,
then X D UVY, so that
Q D X AX D .UVY/ A .UVY/ D Y V U AUVY D Y V U AU VY
D Y V DVY D .VY/ D .VY/ D Z DZ ;
which appears to be in the canonical form with respect to the new variables
Z D VY D VV U X D U X since V is a unitary matrix. Thus, V disappears
completely from our final result, and the transformed Q is the same as given by the
modal matrix alone.
Therefore, any Hermitian matrix can be diagonalised with a similarity transfor-
mation. The latter is defined up to a unitary transformation.
Problem 1.81. Show that the quadratic form Q D 2x12 C 2x1 x2 C 2x22 is Q D
y21 C 3y22 in the canonical form, and find the new variables yi via the old ones.
[Answer: y1 D p1 .x1 C x2 / and y2 D p1 .x1 x2 /.]
2 2
Problem 1.82. Show that the quadratic form Q D x12 C 2x1 x2 C x22 can
be brought into the diagonal form Q D 2y22 by means of an orthogonal
transformation.
We have formulated the sufficient condition for a function of two variables to have
a minimum or a maximum at a point r0 D .x0 ; y0 / in Sect. I.5.10. Here we shall
generalise this result to the case of a function of n variables.
1.2 Matrices: Definition and Properties 91
where x D x x0 (i.e. xi D xi xi0 ). If the point x0 is a stationary point, the first
order partial derivatives are all equal to zero, so that the first non-vanishing term in
the above expansion is the one containing second derivatives. We shall now write
y.x/ above using matrix notations:
1
y D y x0 C x y x0 D xT Hx C ; (1.94)
2
where H D hij is the Hessian8 matrix of second derivatives, which is symmetric
due to the well-known property of mixed derivatives (Sect. I.5.3):
2 2
@y @y
hij D D D hji :
@xi @xj x0 @xj @xi x0
The change y of the function y.x/ is a quadratic form with respect to the changes
of the variables, xi . By choosing a proper transformation of the vector x into
new variables z which diagonalises the matrix H, we obtain
1X
y D j z2j C ; (1.95)
2 j
where j are eigenvalues of the Hessian matrix. This is our final result: it shows that
if all eigenvalues of the Hessian matrix at the stationary point x0 are positive, then a
small deviation from the stationary point x0 ! x D x0 C x can only increase the
function y.x/, i.e. the stationary point is a minimum. If, however, all the eigenvalues
j are negative, then x0 corresponds to a maximum instead. If at least one of the
eigenvalues has a different sign to others, this is neither minimum nor maximum.
Problem 1.83. Consider a function z D z.x; y/of two variables and consider
ac
conditions for the Hessian matrix H D to have both its eigenvalues
cb
positive. Hence demonstrate
that the sufficient conditions for the function to
have a minimum at point x0 ; y0 are the same as those derived in Sect. I.5.10.
8
Named after Ludwig Otto Hesse.
92 1 Elements of Linear Algebra
Traces of matrices are frequently used in physics, e.g. quantum statistical mechanics
is based on them. They possess a number of useful and important properties which
we shall consider here.
Firstly, the trace of a product of matrices is invariant (i.e. does not change) under
any cyclic permutation of them, i.e.
Tr .AB/ D Tr .BA/ or Tr .ABC/ D Tr .BCA/ ; but Tr .ABC/ ¤ Tr .ACB/ in general!
as required. This result can be used to prove the statement for three (and more)
matrices. We shall consider the case of three for simplicity, a more general case can
be considered similarly:
where D D BC and we have used just proven statement for two matrices. Similarly,
Therefore,
Indeed, let the square matrix A has fi g as its eigenvalues. Then, there exists a
modal matrix U that diagonalises A, i.e. A D U 1 DU with the diagonal matrix D
containing all eigenvalues i on its main diagonal. Therefore,
X
Tr .A/ D Tr U 1 DU D Tr UU 1 D D Tr .D/ D i ;
i
as required.
Problem 1.84. Show that the trace of a matrix A does not change after a
similarity transformation, i.e.
Tr A D Tr U 1 AU :
Calculate the products of matrices ABC, CAB and BAC and then the traces of
them. Compare your results and explain your findings.
Problem 1.87. Using the spectral representation of a non-singular Hermitian
matrix A, prove the following identity:
xQ 2 D Ax1 ˛1 x1 : (1.99)
Let us choose the parameter ˛1 in such a way that the vector xQ 2 be orthogonal to
x1 . Calculating the dot product of both sides of the above equation with x1 , i.e.
multiplying the equation from the left by x1 , we have
x1 xQ 2 D x1 Ax1 ˛1 x1 x1 H) x1 xQ 2 D x1 Ax1 ˛1 :
It is seen that x1 xQ 2 D 0 if
˛1 D x1 Ax1 : (1.100)
The vector xQ 2 may not be of unit length; therefore, we normalise it to unity, i.e. we
introduce a (real) scaling factor ˇ1 such that the vector x2 D xQ 2 =ˇ1 be of unit length:
x2 x2 D 1. Obviously, ˇ12 D xQ 2 xQ 2 . However, this expression for ˇ1 is not really
useful. Another expression for the parameter ˇ1 can formally be derived which is
directly related to the vectors x2 and x1 . Indeed, Eq. (1.99) can be rewritten as
ˇ1 x2 D Ax1 ˛1 x1 : (1.101)
Then, multiplying both sides of (1.101) from the left by x2 , we obtain
ˇ1 x2 x2 D x2 Ax1 ˛1 x2 x1 :
Since the two vectors are orthogonal and x2 is normalised to unity, we have ˇ1 D
x2 Ax1 . Since ˇ1 is real and A Hermitian, we can also write
ˇ1 D x2 Ax1 D x1 Ax2 : (1.102)
Next, we construct the third vector x3 using a linear combination of the vector
Ax2 and the vectors x1 and x2 :
ˇ2 x3 D Ax2 ˛2 x2 ˇ1 x1 : (1.103)
It is seen that x2 x3 D 0 if ˛2 D x2 Ax2 is chosen. Similarly, we can find that x3 and
x1 are orthogonal by this construction automatically:
ˇ2 x1 x3 Dx1 Ax2 ˛2 x1 x2 ˇ1 x1 x1 H) ˇ2 x1 x3 Dx1 Ax2 ˇ1 H) x2 x3 D 0 ;
because of the expression (1.102) for ˇ1 . Finally, ˇ2 is chosen in such a way that the
vector x3 be of unit length. This constant formally satisfies the equation ˇ2 D x3 Ax2
which can be obtained by multiplying Eq. (1.103) from the left by x3 :
ˇ2 x3 x3 D x3 Ax2 ˛2 x3 x2 ˇ1 x3 x1 H) ˇ2 x3 x3 D x3 Ax2 ;
since x3 has already been made orthogonal to x1 and x2 , and its rescaling cannot
change that.
The next vector, x4 , is obtained from x2 and x3 via
ˇ3 x4 D Ax3 ˛3 x3 ˇ2 x2 : (1.104)
Problem 1.88. Show that by choosing ˛3 D x3 Ax3 , the vector x4 is made
orthogonal to both x2 and x3 . Next, demonstrate that ˇ3 D x3 Ax2 D x2 Ax3
if x4 is to be of unit length.
the last step was legitimate because of the orthogonality of x3 with x2 and of x2 with
x1 , ensured by the previous steps of the procedure. Next, from Eq. (1.101) it follows
that Ax1 D ˛1 x1 C ˇ1 x2 , i.e. Ax1 is a linear combination of x1 and x2 . However, x3
is orthogonal to both x1 and x2 which ensures that x4 is orthogonal to x1 :
ˇ3 x4 x1 D x3 Ax1 H) ˇ3 x4 x1 D x3 .˛1 x1 C ˇ1 x2 / H) x4 x1 D 0 ;
as required. Since x3 Ax1 was found to be proportional to x4 x1 , we also conclude that
x3 Ax1 D 0.
The subsequent vectors x5 , : : :, xn are obtained in a similar way by using at each
step two previously constructed vectors, and all vectors constructed in this way form
an orthonormal set, i.e. they are mutually orthogonal and are all of unit length.
96 1 Elements of Linear Algebra
We are now ready to formulate the general procedure. Starting from a unit vector
x1 , a vector x2 is constructed using (1.101) with the constants ˛1 and ˇ1 satisfying
Eqs. (1.100) and (1.102). Then, each subsequent vector xiC1 for i D 2; 3; ; : : : ; n 1
is built using the following rule:
It is seen that each consecutive vector is obtained from two preceding ones.
the vector xiC1 is made orthogonal to xi1 and xi . Then, demonstrate that the
scaling factor
ˇi1 D xi1 Axi D xi Axi1 : (1.107)
Thus, the matrix A and the unit vector x1 generate a set of n mutually ˚ orthogonal
unit vectors x1 ; : : : ; xn . By taking a different first vector, another set x0i of n vectors
is generated. These new vectors belong to the same n-dimensional vector space,
are orthogonal to each other (and hence are linearly independent) and therefore are
obtained by a linear combination of the vectors from the first set:
X
n
x0i D wij xj ;
jD1
where expansion coefficients wij form a square matrix W. Since both sets are
orthonormal, the matrix U must be unitary (Sect. 1.2.5).
The Lanczos procedure described above has an interesting implication. Let us
construct a square matrix U D .x1 x2 xn / D uij by placing the vectors
x1 ; x2 ; : : : ; xn generated using the Lanczos algorithm as its columns. Obviously,
the
uij element of U is then equal to the i-th component of the vector xj , i.e. uij D xj i .
Recall that in Sect. 1.2.10.3 we built the modal matrix in exactly the same way.
Since the vectors x1 ; x2 ; : : : ; xn are orthonormal, the matrix U is unitary. We shall
now show that the similarity transformation with the matrix U results in a matrix
1.2 Matrices: Definition and Properties 97
0 1
˛1 ˇ1
B ˇ1 ˛2 ˇ2 C
B C
B ˇ ˛ ˇ C
B 2 3 3 C
B :: :: :: C
B : : : C
B C
B C
B ˇi1 ˛i ˇi C
U AU D T D B C: ; (1.109)
B ˇi ˛iC1 ˇiC1 C
B C
B :: :: :: C
B : : : C
B C
B ˇn3 ˛n2 ˇn2 C
B C
@ ˇn2 ˛n1 ˇn1 A
ˇn1 ˛n
where at the last step we returned back to the vector and matrix notations. The
numbers xi Axj are however all equal to zero if the indices i and j differ by more than
one according to Eq. (1.108). The diagonal elements tii D xi Axi coincide with ˛i ,
see Eq. (1.100) and (1.106), and the nearest off-diagonal elements ti1;i D xi1 Axi
and ti;i1 D xi Axi1 both coincide with ˇi1 , see Eqs. (1.102) and (1.107). This
finally proves formula (1.109).
Fig. 1.7 Examples of partitioning of two square p p matrices into blocks: (a) the matrix A is
divided into four blocks A11 , A12 , A21 and A22 with their dimensions indicated, where n1 C n2 D p,
and (b) B is split into nine blocks with n1 C n2 C n3 D p
98 1 Elements of Linear Algebra
matrix; other blocks are defined similarly. Note that with this partition the 22 block
A22 is also a square n2 n2 matrix, while the blocks A12 and A21 are rectangular
matrices if n1 ¤ n2 . Similarly, more partitions can be made; an example of nine
blocks (three partitions along each side) is shown in Fig. 1.7(b).
It is useful to be aware of the fact that one
can operate with blocks as with
matrix
elements. Consider a p p matrix A D aij and another matrix B D bij of the
same dimension, and let us introduce for each of them the same block structure as
shown in Fig. 1.7(a). Then, the product of the two matrices C D AB will also be a
p p matrix for which the same partition can be made. Then, one can write
C11 D A11 B11 C A12 B21 ; C12 D A11 B12 C A12 B22 ;
C21 D A21 B11 C A22 B21 ; C22 D A21 B12 C A22 B22 ;
Problem 1.91. Prove the above identities by writing down explicitly all matrix
multiplications using elements aij , bij and cij of the matrices A, B and C.
Problem 1.92. Consider an inverse G D A1 of the matrix A from the previous
Problem. By writing explicitly in blocks the identity AG D E, show that the
blocks of G can be written as follows:
1
G11 D A11 A12 A1 22 A21 ; G21 D A1
22 A21 G11 ;
1
G22 D A22 A21 A1 11 A12 ; G12 D A1
11 A12 G22 : (1.111)
E11 0
Problem 1.95. Consider a 2 2 block matrix X D . Prove that
X21 X22
jXj D jX22 j.
Problem 1.96. Using the matrix decomposition Eq. (1.112) and the result of
the previous Problem, show that
ˇ ˇ
ˇ X11 X12 ˇ ˇ ˇ
ˇ ˇ ˇ 1 ˇ
ˇ X21 X22 ˇ D jX11 j X22 X21 X11 X12 : (1.113)
dv
m D q .v B/ :
dt
We would like to know the trajectory of the particle r.t/ and its velocity v.t/ as a
function of time subject to the known initial conditions, r.0/ and v.0/.
In fact, there are three equations for each of the Cartesian components of the
velocity, i.e. we have a system of three linear differential equations:
8
< mvP 1 D q .v2 B3 v3 B2 /
mvP D q .v3 B1 v1 B3 / :
: 2
mvP 3 D q .v1 B2 v2 B1 /
To solve the problem, we shall rewrite the equations as a single matrix equation:
8 0 1 0 10 1
< mvP 1 D q .v2 B3 v3 B2 / v
d @ 1A q@
0 B3 B2 v1
mvP 2 D q .v3 B1 v1 B3 / H) v2 D B3 0 B1 A @ v2 A ;
: dt m
mvP 3 D q .v1 B2 v2 B1 / v3 B2 B1 0 v3
(1.114)
i.e.
0 1
0 B3 B2
dv q@
D Gv ; where G D B3 0 B1 A : (1.115)
dt m
B2 B1 0
100 1 Elements of Linear Algebra
To solve this matrix equation, we shall borrow the solution method from the one-dimensional analogue of this problem, which is the corresponding ordinary differential equation. Indeed, the equation $\dot{v} = gv$ has an exponential solution, $v(t) = v_0e^{gt}$. Therefore, we attempt to solve the corresponding matrix (three-dimensional) equation using the trial solution $\mathbf{v}(t) = \mathbf{u}e^{\lambda t}$, with $\mathbf{u}$ and $\lambda$ being an unknown vector and scalar. Using the trial solution in the equation of motion, we first calculate its time derivative, $d\mathbf{v}/dt = \lambda\mathbf{u}e^{\lambda t}$, and then, substituting it into Eq. (1.115), we obtain the eigenproblem

$$G\mathbf{u} = \lambda\mathbf{u} \qquad (1.116)$$

for the matrix G. Let the field be directed along the z axis, $\mathbf{B} = (0, 0, B)$, and introduce $\omega = qB/m$. Then the characteristic equation reads $|G - \lambda E| = -\lambda\left(\lambda^2 + \omega^2\right) = 0$, which gives three eigenvalues: $\lambda = 0, i\omega, -i\omega$. As expected, they are purely imaginary or zero. The normalised eigenvectors (generally complex) are easily obtained to be

$$\mathbf{u}_1 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \quad \mathbf{u}_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \\ 0 \end{pmatrix}, \quad \mathbf{u}_3 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \\ 0 \end{pmatrix}.$$
Therefore, a general solution of the system of three linear differential equations can be written as a linear combination of the three elementary solutions with three arbitrary constants $c_1$, $c_2$ and $c_3$:

$$\mathbf{v}(t) = c_1\mathbf{u}_1 + c_2\mathbf{u}_2e^{i\omega t} + c_3\mathbf{u}_3e^{-i\omega t}. \qquad (1.117)$$

Note that the first term is not time dependent as it corresponds to the zero eigenvalue. Since $\mathbf{u}_2$ and $\mathbf{u}_3$ lie within the x, y plane and $\mathbf{u}_1$ is directed along z, we may anticipate that if the magnetic field is along the z direction the particle moves with a constant speed along the z axis and performs an oscillatory motion in the x, y plane, i.e. perpendicular to the magnetic field.
To see this explicitly, we need to apply the initial conditions which determine the undefined constants. Let us assume that $\mathbf{v}(0) = (0, v_\perp, v_\parallel)$, i.e. the particle enters the field with a velocity $v_\parallel$ parallel to the field $\mathbf{B} = (0, 0, B)$ and a velocity $v_\perp$ perpendicular to it. Then at $t = 0$ we obtain

$$\begin{pmatrix} 0 \\ v_\perp \\ v_\parallel \end{pmatrix} = c_1\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + \frac{c_2}{\sqrt{2}}\begin{pmatrix} 1 \\ i \\ 0 \end{pmatrix} + \frac{c_3}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \\ 0 \end{pmatrix},$$

which can be solved to give: $c_1 = v_\parallel$, $c_2 = -c_3 = -iv_\perp/\sqrt{2}$. Substituting these constants into solution (1.117), we obtain
$$\mathbf{v}(t) = \begin{pmatrix} 0 \\ 0 \\ v_\parallel \end{pmatrix} - \frac{iv_\perp}{2}\begin{pmatrix} 1 \\ i \\ 0 \end{pmatrix}e^{i\omega t} + \frac{iv_\perp}{2}\begin{pmatrix} 1 \\ -i \\ 0 \end{pmatrix}e^{-i\omega t}$$
$$= \begin{pmatrix} 0 \\ 0 \\ v_\parallel \end{pmatrix} + \frac{v_\perp}{2}\begin{pmatrix} -ie^{i\omega t} + ie^{-i\omega t} \\ e^{i\omega t} + e^{-i\omega t} \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ v_\parallel \end{pmatrix} + \frac{v_\perp}{2}\begin{pmatrix} 2\sin(\omega t) \\ 2\cos(\omega t) \\ 0 \end{pmatrix} = \begin{pmatrix} v_\perp\sin(\omega t) \\ v_\perp\cos(\omega t) \\ v_\parallel \end{pmatrix}. \qquad (1.118)$$
To obtain the position vector $\mathbf{r}(t)$ of the particle, we should integrate the velocity vector:

$$\mathbf{r}(t) = \int_0^t\mathbf{v}(t_1)\,dt_1 = \begin{pmatrix} (v_\perp/\omega)\left[1 - \cos(\omega t)\right] \\ (v_\perp/\omega)\sin(\omega t) \\ v_\parallel t \end{pmatrix}, \qquad (1.119)$$

where we have assumed that initially the particle was in the centre of the coordinate system, $\mathbf{r}(0) = 0$. Thus, indeed, the particle performs a circular motion in the plane perpendicular to the magnetic field and, at the same time, moves with the constant speed along the field direction. The circle radius is $R = v_\perp/\omega = mv_\perp/qB$ and the rotation frequency is $\omega = qB/m$.
The kinetic energy of the particle,

$$K(t) = \frac{m}{2}\left(v_1^2 + v_2^2 + v_3^2\right) = \frac{m}{2}\left(v_\perp^2 + v_\parallel^2\right) = K(0),$$

is conserved since, as is well known, the magnetic field does not do any work on the particle.
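The analytic result (1.118) can be cross-checked numerically. The sketch below is our illustration (not the book's code); the values of q, m, B and the initial velocity are arbitrary. It integrates $d\mathbf{v}/dt = G\mathbf{v}$ directly and compares with the circular-motion formula:

```python
# Numerical cross-check of Eq. (1.118) for B = (0, 0, B).
import numpy as np
from scipy.integrate import solve_ivp

q, m, B = 1.0, 2.0, 3.0
omega = q * B / m
G = (q / m) * np.array([[0.0,  B,  0.0],
                        [-B,  0.0, 0.0],
                        [0.0, 0.0, 0.0]])
v_perp, v_par = 0.5, 0.2
v0 = np.array([0.0, v_perp, v_par])

sol = solve_ivp(lambda t, v: G @ v, (0.0, 10.0), v0,
                dense_output=True, rtol=1e-10, atol=1e-12)
t = np.linspace(0.0, 10.0, 200)
v_num = sol.sol(t)
v_ana = np.vstack([v_perp * np.sin(omega * t),
                   v_perp * np.cos(omega * t),
                   v_par * np.ones_like(t)])
print(np.max(np.abs(v_num - v_ana)))   # tiny: the two solutions agree
```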
Problem 1.97. Assume a general direction of the magnetic field. Show that the general solution for the velocity in this case is

$$\mathbf{v}(t) = c_1\begin{pmatrix} \omega_1 \\ \omega_2 \\ \omega_3 \end{pmatrix} + c_2\begin{pmatrix} \omega_1\omega_3 - i\omega\omega_2 \\ \omega_2\omega_3 + i\omega\omega_1 \\ \omega_3^2 - \omega^2 \end{pmatrix}e^{i\omega t} + c_3\begin{pmatrix} \omega_1\omega_3 + i\omega\omega_2 \\ \omega_2\omega_3 - i\omega\omega_1 \\ \omega_3^2 - \omega^2 \end{pmatrix}e^{-i\omega t},$$

where $\omega_i = eB_i/m$.
Problem 1.98. Solve the following system of linear differential equations:

$$\dot{x}_1 = x_1 + 2x_2, \qquad \dot{x}_2 = 2x_1 + x_2,$$

by writing them first in the matrix form $\dot{X} = DX$ with $X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$, then applying a trial solution $X(t) = Ye^{\lambda t}$ to obtain an eigenproblem for the matrix D that should give $\lambda$ and Y as its eigenvalues and eigenvectors. Hence, construct the general solution $X(t)$ by combining the two elementary solutions with arbitrary constants. Finally, find the particular solution that satisfies the following initial conditions: $x_1(0) = 0$ and $x_2(0) = 1$. [Answer: $x_1(t) = \left(e^{3t} - e^{-t}\right)/2$, $x_2(t) = \left(e^{3t} + e^{-t}\right)/2$.]
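One possible way to check the quoted answer is sketched below (helper code of ours, not the book's): diagonalise D, expand the initial condition in its eigenvectors and propagate each component with $e^{\lambda t}$:

```python
# Solving Problem 1.98 by the eigenvalue method and checking the answer.
import numpy as np

D = np.array([[1.0, 2.0],
              [2.0, 1.0]])
lam, Y = np.linalg.eig(D)          # eigenvalues 3 and -1

X0 = np.array([0.0, 1.0])          # initial conditions x1(0)=0, x2(0)=1
c = np.linalg.solve(Y, X0)         # coefficients in the eigenvector basis

t = np.linspace(0.0, 1.0, 5)
X = (Y * c) @ np.exp(np.outer(lam, t))   # X(t) = sum_k c_k Y_k exp(lam_k t)
x1_exact = (np.exp(3 * t) - np.exp(-t)) / 2
x2_exact = (np.exp(3 * t) + np.exp(-t)) / 2
assert np.allclose(X[0], x1_exact) and np.allclose(X[1], x2_exact)
```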
Problem 1.99. Similarly, solve the system of equations

$$\dot{x}_1 = 3x_1 - 2x_2, \qquad \dot{x}_2 = 2x_1 - x_2.$$
Fig. 1.9 Rectangular islands of two phases growing on a surface of a crystal (Problem 1.103). Between the islands is a mobile phase consisting of mobile molecules which can attach (with the rate $k_0$) to any of the islands and/or detach (with the rates $k_1$ and $k_2$, respectively) from any of them. However, it is assumed in the Problem that the rate $k_2 \ll k_1$ and hence the process of detachment from the islands of phase 2 is neglected

Show that the solution of these equations subject to the initial condition that initially ($t = 0$) only $N_0$ free molecules existed is
$$N_1(t) = \frac{N_0k_0}{\gamma}\left(e^{\gamma_-t} - e^{\gamma_+t}\right),$$
$$N_f(t) = \frac{N_0}{\gamma}\left[\left(k_1 + \gamma_-\right)e^{\gamma_-t} - \left(k_1 + \gamma_+\right)e^{\gamma_+t}\right],$$
$$N_2(t) = N_0\left\{1 + \frac{1}{\gamma k_1}\left[\gamma_+\left(k_1 + \gamma_-\right)e^{\gamma_-t} - \gamma_-\left(k_1 + \gamma_+\right)e^{\gamma_+t}\right]\right\},$$

where

$$\gamma_\pm = -\frac{1}{2}\left(k_1 + 2k_0 \pm \gamma\right), \qquad \gamma = \sqrt{k_1^2 + 4k_0^2}.$$
Prove that both eigenvalues $\gamma_\pm < 0$ and that the first phase and the free molecules completely disappear with time, while the second phase consumes all the molecules, i.e. $N_1(\infty) = N_f(\infty) = 0$, while $N_2(\infty) = N_0$. Explain this result.
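Since the full statement of the Problem is not reproduced here, the following sanity check (ours) assumes the rate equations $\dot{N}_1 = k_0N_f - k_1N_1$, $\dot{N}_2 = k_0N_f$ and $\dot{N}_f = -2k_0N_f + k_1N_1$, which reproduce the eigenvalues $\gamma_\pm$ quoted above; it integrates them numerically and compares $N_1(t)$ with its closed form:

```python
# Numerical sanity check of the island-growth kinetics (assumed equations).
import numpy as np
from scipy.integrate import solve_ivp

k0, k1, N0 = 0.7, 1.3, 1.0
gamma = np.sqrt(k1**2 + 4 * k0**2)
gp = -(k1 + 2 * k0 + gamma) / 2        # gamma_plus
gm = -(k1 + 2 * k0 - gamma) / 2        # gamma_minus

def rhs(t, y):
    N1, N2, Nf = y
    return [k0 * Nf - k1 * N1, k0 * Nf, -2 * k0 * Nf + k1 * N1]

sol = solve_ivp(rhs, (0.0, 5.0), [0.0, 0.0, N0],
                dense_output=True, rtol=1e-10)
t = np.linspace(0.0, 5.0, 100)
N1_exact = (N0 * k0 / gamma) * (np.exp(gm * t) - np.exp(gp * t))
assert np.allclose(sol.sol(t)[0], N1_exact, atol=1e-7)
```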
$$m_i\ddot{x}_i = F_i = -\sum_{j=1}^{n}\Phi_{ij}x_j, \qquad (1.122)$$

where $m_i$ is the mass associated with the degree of freedom i. The left-hand side gives mass times acceleration, and the notation $\ddot{x}_i$ means the second derivative of $x_i$ with respect to time.
Introducing obvious vector and matrix notations,

$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad \Phi = \begin{pmatrix} \Phi_{11} & \cdots & \Phi_{1n} \\ \vdots & \Phi_{ij} & \vdots \\ \Phi_{n1} & \cdots & \Phi_{nn} \end{pmatrix} \qquad \text{and} \qquad M = \begin{pmatrix} m_1 & & 0 \\ & \ddots & \\ 0 & & m_n \end{pmatrix},$$

we can rewrite these equations as the single matrix equation

$$M\ddot{x} = -\Phi x. \qquad (1.123)$$
This is a system of linear second order differential equations. Since the motion is oscillatory, we anticipate that the displacements behave as $x_i(t) \sim e^{i\omega t}$ in time. This also follows from the one-dimensional analogue of this equation, $m\ddot{x} = -kx$, whose solution is $x \sim e^{i\omega t}$. Therefore, we substitute into Eq. (1.123) a trial solution of the form $x(t) = ue^{i\omega t}$ with some unknown scalar $\omega$ and a constant vector u. Since $\ddot{x} = (i\omega)^2ue^{i\omega t} = -\omega^2ue^{i\omega t}$, we obtain an equation:

$$\omega^2Mu = \Phi u. \qquad (1.124)$$
In order to solve this equation, we perform some matrix manipulations. Let us define a matrix $M^{1/2}$ in such a way that its square is equal to M (in this particular case $M^{1/2}$ is a diagonal matrix with $\sqrt{m_i}$ on its main diagonal). Then, the matrix $M^{-1/2}$ is the inverse of $M^{1/2}$, i.e. $M^{-1/2}M^{1/2} = E$; it contains $1/\sqrt{m_i}$ on its main diagonal. Now, multiply Eq. (1.124) from the left by $M^{-1/2}$ and insert the unit matrix $E = M^{1/2}M^{-1/2}$ between $\Phi$ and u in the right-hand side:

$$\omega^2M^{-1/2}Mu = M^{-1/2}\Phi M^{-1/2}\left(M^{1/2}u\right) \;\Longrightarrow\; \omega^2M^{1/2}u = D\left(M^{1/2}u\right),$$

where

$$D = M^{-1/2}\Phi M^{-1/2} \qquad (1.125)$$

is called the dynamical matrix. Denoting $v = M^{1/2}u$, we arrive at

$$Dv = \omega^2v. \qquad (1.126)$$
This is the central result. It shows that the vibrational problem can be cast into an eigenvector/eigenvalue problem. The squares of the vibrational frequencies, $\lambda^{(\alpha)} = \omega_\alpha^2$, appear as the $\alpha$-th eigenvalue, and the corresponding eigenvector $v^{(\alpha)}$ is directly related to the atomic displacements via $u^{(\alpha)} = M^{-1/2}v^{(\alpha)}$, which are called normal modes. These are collective (synchronised) displacements of many atoms of the system.

Since the matrix D is symmetric, its modal matrix, in which the vectors $v^{(\alpha)} = \left(v_i^{(\alpha)}\right)$ are placed as its columns, is orthogonal. In turn, this means that its rows (or columns) form orthonormal sets of vectors. Let us adopt $\alpha$ as the column index and i as the row index of the modal matrix. If these conditions are written explicitly in components, the following identities are obtained:

$$\sum_i v_i^{(\alpha)}v_i^{(\beta)} = \delta_{\alpha\beta} \quad \text{or} \quad \left(v^{(\alpha)}\right)^Tv^{(\beta)} = \delta_{\alpha\beta}, \qquad (1.127)$$

$$\sum_\alpha v_i^{(\alpha)}v_j^{(\alpha)} = \delta_{ij} \quad \text{or} \quad \sum_\alpha v^{(\alpha)}\left(v^{(\alpha)}\right)^T = E, \qquad (1.128)$$
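The whole recipe of Eqs. (1.124)–(1.128) takes only a few lines numerically; the sketch below (ours, with an invented force-constant matrix and masses) builds the dynamical matrix, diagonalises it and verifies the orthonormality and completeness of the modal matrix:

```python
# A minimal sketch of the recipe around Eq. (1.126).
import numpy as np

m = np.array([1.0, 2.0, 1.0])                  # arbitrary masses
Phi = np.array([[ 2.0, -1.0,  0.0],            # an arbitrary symmetric
                [-1.0,  2.0, -1.0],            # force-constant matrix
                [ 0.0, -1.0,  2.0]])
Msqrt_inv = np.diag(1.0 / np.sqrt(m))

D = Msqrt_inv @ Phi @ Msqrt_inv                # dynamical matrix, symmetric
w2, V = np.linalg.eigh(D)                      # eigenvalues omega_alpha^2,
                                               # orthonormal eigenvectors v
U = Msqrt_inv @ V                              # normal-mode displacements u

# orthonormality (1.127) and completeness (1.128) of the modal matrix:
assert np.allclose(V.T @ V, np.eye(3))
assert np.allclose(V @ V.T, np.eye(3))
print("frequencies:", np.sqrt(np.abs(w2)))
```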
Hence, the general solution of Eq. (1.123) is a linear combination of the normal modes,

$$x(t) = \sum_{\alpha=1}^{n}c_\alpha M^{-1/2}v^{(\alpha)}e^{i\omega_\alpha t}, \qquad (1.129)$$

where $c_\alpha$ are arbitrary constants that should be found from the initial ($t = 0$) conditions for the atomic displacements and their velocities. Since the matrix D is real, its eigenvectors $v^{(\alpha)} = \left(v_i^{(\alpha)}\right)$ are also real. However, the coefficients $c_\alpha$ must be complex to ensure that the vector $x(t)$ is real:

$$x(t) = \sum_{\alpha=1}^{n}M^{-1/2}v^{(\alpha)}\,\mathrm{Re}\left(c_\alpha e^{i\omega_\alpha t}\right) = \sum_{\alpha=1}^{n}M^{-1/2}v^{(\alpha)}g_\alpha(t), \qquad (1.130)$$

where

$$g_\alpha(t) = \mathrm{Re}\left(c_\alpha e^{i\omega_\alpha t}\right) = a_\alpha\cos(\omega_\alpha t) + b_\alpha\sin(\omega_\alpha t),$$

with $a_\alpha$ and $b_\alpha$ being two real arbitrary constants.
As we mentioned above, the eigenvalues $\omega_\alpha^2$ are always real, which is guaranteed by the fact that the dynamical matrix D is symmetric. However, there is no guarantee that they are positive. If all eigenvalues (the frequencies squared) are positive, $\omega_\alpha^2 > 0$, then the frequencies are real and (can be chosen) positive, and the vibrational system is in a stable mechanical equilibrium. If there is at least one negative eigenvalue, $\omega_\alpha^2 < 0$, then the frequency $\omega_\alpha = \pm i|\omega_\alpha|$ is purely imaginary and the corresponding normal mode is no longer sinusoidal: $x^{(\alpha)}(t) \sim e^{\pm|\omega_\alpha|t}$. This means that the system is not stable in this particular atomic configuration and will eventually transform to a different atomic arrangement (e.g. a molecule may dissociate, i.e. break into several parts).
These conclusions about stability can also be directly illustrated on the potential energy itself. As should be clear from Sect. 1.2.13, the potential energy (1.121) is a so-called quadratic form of atomic displacements, which can be brought into a diagonal form (i.e. diagonalised) in terms of the normal modes:

$$V - V(0) = \frac{1}{2}x^T\Phi x = \frac{1}{2}\sum_{\alpha,\beta}\left[M^{-1/2}v^{(\alpha)}g_\alpha(t)\right]^T\Phi\left[M^{-1/2}v^{(\beta)}g_\beta(t)\right]$$
$$= \frac{1}{2}\sum_{\alpha,\beta}g_\alpha(t)g_\beta(t)\left(v^{(\alpha)}\right)^TM^{-1/2}\Phi M^{-1/2}v^{(\beta)}$$
$$= \frac{1}{2}\sum_{\alpha,\beta}g_\alpha(t)g_\beta(t)\left(v^{(\alpha)}\right)^TDv^{(\beta)} = \frac{1}{2}\sum_{\alpha,\beta}g_\alpha(t)g_\beta(t)\,\omega_\beta^2\left[\left(v^{(\alpha)}\right)^Tv^{(\beta)}\right],$$
where at the last step we made use of the fact that the vector $v^{(\beta)}$ is an eigenvector of D with the eigenvalue $\omega_\beta^2$, so that $Dv^{(\beta)} = \omega_\beta^2v^{(\beta)}$. Because of the orthogonality condition, Eq. (1.127), the expression in the square brackets above is equal to $\delta_{\alpha\beta}$, so that we finally obtain

$$V = V(0) + \frac{1}{2}x^T\Phi x = V(0) + \frac{1}{2}\sum_{\alpha=1}^{n}\omega_\alpha^2g_\alpha^2(t). \qquad (1.131)$$
One can clearly see that if all eigenfrequencies are real, i.e. $\omega_\alpha^2 > 0$, then the current equilibrium state is indeed stable: the quadratic form, the potential energy (1.121), is positive definite and hence can only increase due to atomic displacements. If, however, at least one of the $\omega_\alpha$ is complex, then $\omega_\alpha^2 < 0$ and the current state is not stable, as there must be a displacement which would take the potential energy to a value smaller than $V(0)$.
$$E_K = \frac{1}{2}\dot{q}^TK\dot{q} \quad \text{and} \quad E_P = \frac{1}{2}q^TVq, \qquad (1.132)$$

where K and V are symmetric square matrices. Show, considering the energy conservation condition, $E_K + E_P = \mathrm{Const}$, that the motion of the particles is described by the matrix equation $K\ddot{q} + Vq = 0$. Then, assuming an oscillatory motion of frequency $\omega$, i.e. $q(t) = xe^{i\omega t}$, show that the oscillation frequencies of the system normal modes are determined by the equation $\left|V - \omega^2K\right| = 0$.
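As a sketch of this Problem (with invented K and V matrices), note that $|V - \omega^2K| = 0$ is precisely the generalized symmetric eigenproblem $Vx = \omega^2Kx$, which scipy solves directly:

```python
# Frequencies from |V - omega^2 K| = 0 via the generalized eigenproblem.
import numpy as np
from scipy.linalg import eigh

K = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # positive definite kinetic-energy matrix
V = np.array([[3.0, -1.0],
              [-1.0, 2.0]])         # potential-energy matrix

w2, X = eigh(V, K)                  # solves V x = omega^2 K x
print("omega =", np.sqrt(w2))

# check: det(V - omega^2 K) vanishes at each eigenvalue
for lam in w2:
    assert abs(np.linalg.det(V - lam * K)) < 1e-10
```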
Problem 1.105. Consider a linear symmetric triatomic molecule A–B–A with masses m, $\mu m$ and m, respectively, see Fig. 1.10. If $x_1$, $x_2$ and $x_3$ are displacements of the atoms from their equilibrium positions along the molecular axis, then one can write the following expressions for the kinetic and potential energies of the system:

$$E_K = \frac{m}{2}\left(\dot{x}_1^2 + \mu\dot{x}_2^2 + \dot{x}_3^2\right) \quad \text{and} \quad E_P = \frac{k}{2}\left[\left(x_2 - x_1\right)^2 + \left(x_3 - x_2\right)^2\right].$$
Show that the corresponding eigenvectors for each mode are $q_1 = (1, 0, -1)^T$, $q_2 = (1, 1, 1)^T$ and $q_3 = (1, -2/\mu, 1)^T$. Sketch them. What motion does the zero frequency mode correspond to?
Problem 1.106. Here we shall solve the previous problem differently. Using the condition that the centre of mass of the molecule is at rest at the origin, eliminate $x_2$ and thus rewrite both $E_K$ and $E_P$ in the matrix form as

$$E_K = \frac{m}{2}\dot{X}^TK\dot{X} \quad \text{and} \quad E_P = \frac{k}{2}X^T\Phi X, \quad \text{where} \quad X = \begin{pmatrix} x_1 \\ x_3 \end{pmatrix} \quad \text{and} \quad \dot{X} = \begin{pmatrix} \dot{x}_1 \\ \dot{x}_3 \end{pmatrix}.$$

Obtain eigenvalues and eigenvectors of the matrix $\Phi$ and hence find the orthogonal transformation U that diagonalises $\Phi$. Express the new coordinates $Y = \begin{pmatrix} y_1 \\ y_3 \end{pmatrix} = UX$ via the old ones, X. Demonstrate explicitly that the new coordinates $(y_1, y_3)$ are no longer coupled in $E_P$. Show then that the same orthogonal transformation diagonalises the matrix K of the kinetic energy as well. Show that the total energy $E = E_K + E_P$ of the molecule in the new coordinates is the sum of the energies of two independent harmonic oscillators. Hence, determine the two oscillation frequencies of the molecule, $\omega_1$ and $\omega_3$. Make sure they are the same as in Eq. (1.133).
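A numerical illustration of Problem 1.105 (a sketch of ours, with an arbitrary value of $\mu$): diagonalising $\Phi q = \omega^2Mq$ confirms the eigenvalues $\omega^2 = 0$, $k/m$ and $(k/m)(1 + 2/\mu)$, which one can verify correspond to the quoted eigenvectors $q_2$, $q_1$ and $q_3$:

```python
# Normal modes of the symmetric triatomic A-B-A molecule (masses m, mu*m, m).
import numpy as np
from scipy.linalg import eigh

m, mu, k = 1.0, 16.0 / 12.0, 1.0                 # e.g. a CO2-like mass ratio
M = np.diag([m, mu * m, m])
Phi = k * np.array([[ 1.0, -1.0,  0.0],
                    [-1.0,  2.0, -1.0],
                    [ 0.0, -1.0,  1.0]])

w2, Q = eigh(Phi, M)                             # Phi q = omega^2 M q
expected = np.sort([0.0, k / m, (k / m) * (1.0 + 2.0 / mu)])
assert np.allclose(np.sort(w2), expected, atol=1e-12)
```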
Fig. 1.11 (a) An infinite chain of identical atoms of mass m connected with identical springs; the atoms are numbered by an integer index n running between $-\infty$ and $+\infty$. (b) The same chain, but the atom with $n = 0$ was replaced with one having a different mass $\mu m$
$$\frac{d^2}{dt^2}\begin{pmatrix} \vdots \\ x_{n-1} \\ x_n \\ x_{n+1} \\ \vdots \end{pmatrix} = -\begin{pmatrix} \ddots & \ddots & & & \\ \ddots & 2\gamma & -\gamma & & \\ & -\gamma & 2\gamma & -\gamma & \\ & & -\gamma & 2\gamma & \ddots \\ & & & \ddots & \ddots \end{pmatrix}\begin{pmatrix} \vdots \\ x_{n-1} \\ x_n \\ x_{n+1} \\ \vdots \end{pmatrix}, \qquad (1.134)$$

where $\gamma = k/m$. The infinite dimension matrix D has a tridiagonal form, i.e. it has non-zero elements only on the diagonal itself as well as one element to the left and right of it. The matrix D is symmetric, with the elements $d_{ij} = \gamma\left(2\delta_{ij} - \delta_{i,j+1} - \delta_{i,j-1}\right)$. We notice also that $d_{ij}$ depends only on the difference of indices $i - j$, but not on both indices themselves. This is due to the periodicity of the system at equilibrium. Hence, we can write $d_{ij}$ simply as $d_{i-j}$, where $d_n = \gamma\left(2\delta_{n0} - \delta_{n1} - \delta_{n,-1}\right)$.
Now we shall try to solve the equations $\ddot{X} = -DX$. To do this, we shall introduce the so-called periodic boundary conditions: we shall say that the chain repeats itself after a very large number N of atoms, i.e. $x_{n+N} = x_n$ for any n between 0 and $N - 1$. This can be imagined in such a way that atom N would coincide with atom 0, i.e. the chain of N atoms is connected to itself forming a ring, as depicted in Fig. 1.12. This trick allows us to form a set of N (a very large, but finite, number of) equations (1.134), which we shall now attempt to solve.
Using the method of the previous section, we shall attempt a substitution $X(t) = Ye^{i\omega t}$, which results in the eigenvector–eigenvalue problem

$$DY = \omega^2Y \quad \text{or} \quad \sum_j d_{nj}y_j = \omega^2y_n. \qquad (1.135)$$
Let us introduce the unitary $N \times N$ matrix U with the elements $u_{nj} = \frac{1}{\sqrt{N}}e^{i2\pi nj/N}$ and perform the corresponding similarity transformation of D:

$$\left(\tilde{D}\right)_{js} = \left(U^\dagger DU\right)_{js} = \sum_{l,n=0}^{N-1}u_{lj}^*d_{ln}u_{ns} = \frac{1}{N}\sum_{l,n}d_{ln}e^{-i2\pi lj/N}e^{i2\pi ns/N}.$$
Since $d_{ln}$ depends only on the difference of indices, we can introduce a new index $p = l - n$ to replace n, which yields

$$\left(\tilde{D}\right)_{js} = \frac{1}{N}\sum_{l,p}d_pe^{-i2\pi lj/N}e^{i2\pi(l-p)s/N} = \left(\sum_{p=0}^{N-1}d_pe^{-i2\pi sp/N}\right)\left(\frac{1}{N}\sum_{l=0}^{N-1}e^{-i2\pi l(j-s)/N}\right) = \tilde{d}_s\,\delta_{js}. \qquad (1.137)$$
Here the Kronecker symbol appeared since in the second bracket we simply have $\left(U^\dagger U\right)_{js}$ written explicitly (the matrix U is unitary and hence $U^\dagger U = E$). We have also introduced a new quantity

$$\tilde{d}_s = \sum_{p=0}^{N-1}d_pe^{-i2\pi sp/N} = \gamma\sum_{p=0}^{N-1}\left(2\delta_{p0} - \delta_{p1} - \delta_{p,-1}\right)e^{-i2\pi sp/N}$$
$$= \gamma\left(2 - e^{-i2\pi s/N} - e^{i2\pi s/N}\right) = 2\gamma\left(1 - \cos\frac{2\pi s}{N}\right). \qquad (1.138)$$
We see from Eq. (1.137) that the matrix D became diagonal after the similarity transformation, and hence we can immediately get the required eigenvalues appearing in (1.136) as:

$$\omega^2 = 2\gamma\left(1 - \cos\frac{2\pi s}{N}\right) = \frac{2k}{m}\left(1 - \cos\frac{2\pi s}{N}\right) = \frac{2k}{m}\left(1 - \cos qa\right) \quad \text{or} \quad \omega(q) = \sqrt{\frac{4k}{m}}\left|\sin\frac{qa}{2}\right|.$$
There are N eigenvalues corresponding to N possible values of the index $s = 0, 1, \ldots, N - 1$; however, it is more convenient to introduce $q = 2\pi s/(aN)$ (which is the wave vector) instead of s to label different solutions. It changes between 0 (when $s = 0$) and $2\pi/a$ (when $s = N - 1 \simeq N$ as $N \gg 1$). Since the chain is a ring, we can alternatively consider the values of q between $-\pi/a$ and $\pi/a$. This interval is called the Brillouin zone. The wave vector changes almost continuously within the Brillouin zone between these two values as N is very large, i.e. the nearest values of q differ only by $\Delta q = 2\pi/(Na) = 2\pi/L$, where $L = Na$ is the length of the ring. The vibrational frequencies $\omega(q)$ change between zero and the value of $\sqrt{4k/m}$ at the Brillouin zone boundaries (at $q = \pm\pi/a$). The dependence of the oscillation frequency $\omega(q)$ of the chain on the wave vector q is called the dispersion relation.
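The dispersion relation is easy to verify numerically: the sketch below (our illustration; N, k, m and a are arbitrary) diagonalises the $N \times N$ ring matrix D and compares its spectrum with $\omega(q)$:

```python
# Eigenvalues of the periodic tridiagonal matrix vs the dispersion relation.
import numpy as np

N, k, m, a = 64, 1.0, 1.0, 1.0
gam = k / m
D = 2 * gam * np.eye(N) - gam * (np.eye(N, k=1) + np.eye(N, k=-1))
D[0, -1] = D[-1, 0] = -gam              # periodic (ring) boundary conditions

w_numeric = np.sort(np.sqrt(np.abs(np.linalg.eigvalsh(D))))
s = np.arange(N)
q = 2 * np.pi * s / (N * a)
w_analytic = np.sort(np.sqrt(4 * k / m) * np.abs(np.sin(q * a / 2)))
assert np.allclose(w_numeric, w_analytic, atol=1e-10)
```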
Once we have obtained all N eigenvalues $\omega(q)$, we can calculate the corresponding eigenvectors. The simplest choice of orthogonal eigenvectors⁹ is to take the eigenvector $\overline{Y}^{(q)}$ corresponding to $\omega(q)$ as a vector with all components equal to zero apart from the s-th one, which is equal to 1 (here $q = 2\pi s/(aN)$ and s are directly related):

$$\overline{Y}^{(q)} = (\ldots, 0, 1, 0, \ldots)^T, \quad \text{i.e.} \quad \left(\overline{Y}^{(q)}\right)_j = \delta_{js}, \quad \text{where} \quad q = \frac{2\pi s}{L} = s\,\Delta q.$$
Then, for $Y^{(q)} = U\overline{Y}^{(q)}$ we have

$$\left(Y^{(q)}\right)_n = \sum_{j=0}^{N-1}u_{nj}\left(\overline{Y}^{(q)}\right)_j = \frac{1}{\sqrt{N}}e^{i2\pi ns/N} = \frac{1}{\sqrt{N}}e^{iqna},$$

so that the atomic displacements in the mode q are

$$x_n^{(q)}(t) = \left(Y^{(q)}\right)_ne^{i\omega(q)t} = \frac{1}{\sqrt{N}}e^{iqna}e^{i\omega(q)t}. \qquad (1.139)$$
The problem which we have just solved corresponds to a periodic chain: all atoms in the chain are identical and repeat themselves after a “translation” by the distance a (the distance between atoms). In practice, one is sometimes concerned with solving a much more difficult problem of defective systems which have no periodicity. However, before studying such systems, it is instructive first to calculate a special auxiliary object, called the Green's function. It is defined as a resolvent of the dynamical matrix (D in our case):
⁹As the matrix $\tilde{D}$ is diagonal and hence Hermitian, this choice is always possible.
$$G(z) = (zE - D)^{-1} = \sum_s\frac{y_sy_s^\dagger}{z - \omega_s^2}, \qquad (1.140)$$

where $y_s$ (which is the same as $Y^{(q)}$) and $\omega_s^2$ are the s-th (or q-th) eigenvector and eigenvalue of the matrix D, i.e. $Dy_s = \omega_s^2y_s$, and z is a complex number (see also Eq. (1.91)); z is sometimes called the complex “energy”. For the periodic chain the index s counts different solutions of the eigenproblem, but we can use q for that instead. Using the above results, we write for the elements of the matrix $G(z)$:

$$g_{nj}(z) = \sum_q\frac{\left(Y^{(q)}\right)_n\left(Y^{(q)}\right)_j^*}{z - \omega^2(q)} = \frac{1}{N}\sum_q\frac{e^{iq(n-j)a}}{z - \omega^2(q)}. \qquad (1.141)$$
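For a finite ring the sum (1.141) can be evaluated directly; a minimal sketch (ours, with invented parameters) is:

```python
# Lattice Green's function g_nj(z) of the perfect chain by the q-sum (1.141).
import numpy as np

N, k, m, a = 256, 1.0, 1.0, 1.0
s = np.arange(N)
q = 2 * np.pi * s / (N * a)
w2_q = (4 * k / m) * np.sin(q * a / 2) ** 2     # omega^2(q)

def g(n, j, z):
    # g_nj(z) = (1/N) sum_q exp(iq(n-j)a) / (z - omega^2(q))
    return np.mean(np.exp(1j * q * (n - j) * a) / (z - w2_q))

z = 5.0 + 0.0j                                  # z above the band [0, 4k/m]
print(g(0, 0, z))                               # real and positive here
```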
Equipped with the Green's function of the ideal (perfect) chain, we can now consider a more difficult problem of a defective chain. As the simplest example, let us have a look at the chain in which a single 0-th atom of mass m was replaced with an isotope of a different mass $\mu m$, as in Fig. 1.11(b). Since the isotope is chemically identical, the same spring constants can be used as for the ideal chain. The same equations of motion

$$\ddot{x}_n = -\sum_jd_{nj}x_j$$

can be written for all atoms apart from the one with $n = 0$ (the isotope), for which we have instead:

$$\mu m\ddot{x}_0 = k\left(x_1 - 2x_0 + x_{-1}\right) \quad \Longrightarrow \quad \mu\ddot{x}_0 = -\sum_jd_{0j}x_j.$$

Therefore, all the linear differential equations we have to solve can now be written as:

$$\ddot{x}_n = -\sum_jd_{nj}x_j + (1 - \mu)\ddot{x}_0\,\delta_{n0}.$$

They differ from the equations for the perfect chain by the second term in the right-hand side, which plays the role of a perturbation. Introducing now the same notations we used for the periodic chain, we make the substitution $x_n(t) = y_ne^{i\omega t}$, which enables us to rewrite our equations as follows:
$$\omega^2y_n = \sum_{j=0}^{N-1}d_{nj}y_j + (1 - \mu)\omega^2y_0\,\delta_{n0} \;\Longrightarrow\; \omega^2Y = DY + WY \;\Longrightarrow\; \left(\omega^2E - D - W\right)Y = 0, \qquad (1.142)$$

where the “perturbation” matrix W was introduced, which has a single non-zero element $W_{00}(\omega) = (1 - \mu)\omega^2$. Note that it depends on the frequency explicitly. To solve the above equation, we shall rewrite it for a general complex z (i.e. we replace $\omega^2$ with z) as follows:
$$(zE - D)Y = WY \quad \Longrightarrow \quad Y = (zE - D)^{-1}WY = G(z)WY, \qquad (1.143)$$

where we introduced the Green's function $G(z) = (zE - D)^{-1}$ of the perfect chain. Note that the values of z above correspond to the solutions of the equation

$$|zE - D - W(z)| = 0,$$

and hence the determinant of the matrix $zE - D$ cannot be zero. This allowed us to introduce the matrix $G(z)$ in Eq. (1.143).
Non-trivial solutions of this equation appear as roots of the equation

$$\frac{1}{(1 - \mu)\omega^2} = g_{00}\left(\omega^2\right) = \frac{1}{N}\sum_q\frac{1}{\omega^2 - \omega^2(q)}, \qquad (1.145)$$

where the explicit expression (1.141) for the Green's function on the defect site (when $n = j = 0$) has been used. The sum in the right-hand side can be turned into an integral (since $N \to \infty$ and hence $\Delta q \to 0$):

$$\frac{1}{N}\sum_q\frac{1}{\omega^2 - \omega^2(q)} = \frac{1}{N\Delta q}\sum_q\frac{\Delta q}{\omega^2 - \omega^2(q)} \to \frac{a}{2\pi}\int_{-\pi/a}^{\pi/a}\frac{dq}{\omega^2 - \frac{4k}{m}\sin^2\frac{qa}{2}},$$

and hence calculated. Equating this to the left-hand side of Eq. (1.145) allows calculation of all the solutions for the frequencies of the defective chain. It should give perturbed “bulk” solutions close to those of the perfect chain, plus additional solutions may also appear when $\mu$ becomes sufficiently different from one. A patient reader should be able to perform the q-integration analytically and obtain a transcendental equation for $\omega$.
and hence find the frequency corresponding to the local vibration of a lighter atom ($\mu < 1$) as

$$\omega_{loc} = \sqrt{\frac{4k}{m\mu(2 - \mu)}},$$

which is positioned above the perfect chain frequencies $0 < \omega \le \sqrt{4k/m}$ (since $0 < \mu(2 - \mu) < 1$). This vibration corresponds to a local mode associated with oscillations of atoms in the vicinity of the defect. Explain why there is no extra solution for a heavier atom ($\mu > 1$). [Hint: use the substitution $t = \tan\frac{x}{2}$, where x is the argument of the sine function in the integrand.]
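The local mode is easy to observe numerically. The sketch below (an illustration of ours, not the book's solution) diagonalises a large ring with a single light isotope and compares the highest frequency with $\omega_{loc}$:

```python
# Local mode of a chain with one light isotope (mass mu*m at n = 0).
import numpy as np

N, k, m, mu = 400, 1.0, 1.0, 0.5
masses = np.full(N, m); masses[0] = mu * m
Minv_sqrt = np.diag(1.0 / np.sqrt(masses))
Phi = 2 * k * np.eye(N) - k * (np.eye(N, k=1) + np.eye(N, k=-1))
Phi[0, -1] = Phi[-1, 0] = -k                      # ring boundary conditions

w = np.sqrt(np.abs(np.linalg.eigvalsh(Minv_sqrt @ Phi @ Minv_sqrt)))
w_loc = np.sqrt(4 * k / (m * mu * (2 - mu)))
print(w.max(), w_loc)    # the highest mode sits at the predicted omega_loc
```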
Consider a three-dimensional atomic system, e.g. a solid. The solid we are considering does not need to be periodic; it could be a disordered or defective system. We would like to obtain the energy levels electrons can occupy in this material. To find the energy levels, one needs to solve the Schrödinger equation for the electrons:

$$-\frac{\hbar^2}{2m}\Delta\psi(\mathbf{r}) + V(\mathbf{r})\psi(\mathbf{r}) = \epsilon\psi(\mathbf{r}), \qquad (1.146)$$
where $\psi$ is the wave function of the electron occupying the state with energy $\epsilon$, the first term in the left-hand side corresponds to the kinetic, while the second to the potential energy of the electron, with $V(\mathbf{r})$ being the corresponding lattice potential that the electrons experience in the solid, and m is the electron mass. It is convenient to introduce the Hamiltonian operator $\hat{H}$, which is defined in such a way that its action on any function $\varphi(\mathbf{r})$ standing on the right of it results in the following:

$$\hat{H}\varphi(\mathbf{r}) = \left(-\frac{\hbar^2}{2m}\Delta + V(\mathbf{r})\right)\varphi(\mathbf{r}) = -\frac{\hbar^2}{2m}\Delta\varphi(\mathbf{r}) + V(\mathbf{r})\varphi(\mathbf{r}).$$
Then, the Schrödinger equation (1.146) takes on a very simple form:

$$\hat{H}\psi(\mathbf{r}) = \epsilon\psi(\mathbf{r}). \qquad (1.147)$$
Fig. 1.13 A possible model of an alloy of a two-dimensional solid: we have a regular arrangement of identical atoms (brown) except for some random lattice sites which are occupied by a different species (cyan). At each lattice site a single localised atomic orbital $\chi_A(\mathbf{r})$ is positioned (blue)
For the sake of simplicity we shall assume that there is only one orbital, $\chi_A(\mathbf{r})$, placed on each atom A, and expand the wave function as

$$\psi(\mathbf{r}) = \sum_Ac_A(\epsilon)\chi_A(\mathbf{r}). \qquad (1.148)$$

Here the summation is performed with respect to all atoms (i.e. all atomic orbitals). This method is called the linear combination of atomic orbitals (LCAO) method. As an example of a possible system we show in Fig. 1.13 a fragment of a two-dimensional (e.g. a surface or a slab) system in which two species of atoms are distributed at random at regular lattice sites. On each atom of the system with position vector $\mathbf{R}_A$ we have placed an orbital $\chi_A(\mathbf{r}) = \chi(\mathbf{r} - \mathbf{R}_A)$; note that in this example all orbitals have an identical shape given by the function $\chi(\mathbf{r})$; only their positions are different.
We shall also assume that the orbitals are normalised to unity and that the orbitals on different atoms do not overlap, i.e. they are orthogonal to each other; in other words,

$$\int\chi_A^*(\mathbf{r})\chi_B(\mathbf{r})\,d\mathbf{r} = \delta_{AB}, \qquad (1.149)$$

where the integration is done with respect to the whole volume of the system, and $\delta_{AB}$ is the Kronecker symbol (equal to one when $A = B$, otherwise it is equal to zero).
Before we move any further, we need one important result related to expanding a given function in terms of other functions. This question is considered in more detail in Sect. I.7.2 on functional series. What is essential for us here is that we may assume that the atomic orbitals centred on all atoms of the system form a set of functions which is very close to a complete set of functions, i.e. that any “good” function can be expanded in terms of them:

$$f(\mathbf{r}) = \sum_Af_A\chi_A(\mathbf{r}), \quad \text{where} \quad f_A = \int\chi_A^*(\mathbf{r})f(\mathbf{r})\,d\mathbf{r} = \langle\chi_A|f\rangle,$$

where we have used Dirac's notation for the matrix element, something which is frequently used in quantum mechanics.
Now, if we consider $\hat{H}\chi_B(\mathbf{r})$ as the function $f(\mathbf{r})$, then the above expressions are rewritten as:

$$\hat{H}\chi_B(\mathbf{r}) = \sum_Af_A\chi_A(\mathbf{r}) \quad \text{with} \quad f_A = \int\chi_A^*(\mathbf{r})\hat{H}\chi_B(\mathbf{r})\,d\mathbf{r} = H_{AB}, \qquad (1.150)$$

where the numbers $H_{AB} = \langle\chi_A|\hat{H}|\chi_B\rangle$ form the elements of the Hamiltonian matrix $H = (H_{AB})$, and yet again Dirac's notation for the matrix elements was applied for convenience.
Now we are ready to continue working on our problem. Substituting the LCAO expansion (1.148) into the Schrödinger equation (1.147), we obtain

$$\sum_Ac_A(\epsilon)\hat{H}\chi_A(\mathbf{r}) = \epsilon\sum_Ac_A(\epsilon)\chi_A(\mathbf{r}).$$

Multiplying both sides of the equation by $\chi_B^*(\mathbf{r})$, integrating over the whole volume and using the orthogonality of the atomic orbitals (1.149), we obtain

$$\sum_AH_{BA}c_A(\epsilon) = \epsilon\,c_B(\epsilon), \qquad (1.151)$$

or, in matrix form,

$$HC_\epsilon = \epsilon C_\epsilon, \qquad (1.152)$$

where $C_\epsilon = (c_A(\epsilon))$ is the vector of all LCAO coefficients corresponding to the state $\epsilon$. Hence, to determine the eigenvalues $\epsilon$ and the LCAO coefficients $C_\epsilon$, one has to find the eigenvalues and eigenvectors of the Hamiltonian matrix $H = (H_{AB})$. Note that since H is a symmetric matrix, the eigenvalues are guaranteed to be real.
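As an illustration (a toy model of ours, with invented on-site energies and hopping matrix elements $H_{AB}$), the LCAO eigenproblem (1.152) is solved by a single call to a symmetric eigensolver:

```python
# Toy LCAO / tight-binding sketch: levels are the eigenvalues of H.
import numpy as np

n_atoms = 6
eps = np.array([0.0, 0.0, 0.5, 0.0, 0.5, 0.0])   # two species, cf. Fig. 1.13
t_hop = -1.0                                     # nearest-neighbour H_AB

H = np.diag(eps) + t_hop * (np.eye(n_atoms, k=1) + np.eye(n_atoms, k=-1))
energies, C = np.linalg.eigh(H)                  # Eq. (1.152): H C = eps C
print("levels:", energies)                       # real, since H is symmetric
```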
The only difference with the Lanczos procedure introduced in Sect. 1.2.16 is that we work here with functions and the operator $\hat{H}$ instead of vectors and a matrix. However, as we recall from Sect. 1.1.2, there is a very close analogy between vectors and functions; it is easy to see that the operator $\hat{H}$ plays here the role of such a matrix.
By virtue of the above construction, the function $g_2$ must be orthogonal to $g_1$ and normalised, i.e. $\langle g_2|g_1\rangle = 0$ and $\langle g_2|g_2\rangle = 1$. It is essential to realise at this stage that $g_2$, constructed as above, is actually a linear combination of the atomic orbitals. Indeed, according to Eq. (1.150),

$$\hat{H}g_1(\mathbf{r}) = \hat{H}\chi_0(\mathbf{r}) = \sum_BH_{0B}\chi_B(\mathbf{r}),$$
Conversely,

$$\chi_A = \sum_kU_{kA}g_k.$$
Let us now look at the resolvent matrix $G(z) = (zE - H)^{-1}$. Let us express the Hamiltonian matrix $H = (H_{AB})$ via the matrix elements $T_{kl}$ of the Hamiltonian operator $\hat{H}$ written with respect to the new orbitals:

$$H_{AB} = \langle\chi_A|\hat{H}|\chi_B\rangle = \sum_{kl}U_{kA}^*U_{lB}\langle g_k|\hat{H}|g_l\rangle = \sum_{kl}U_{kA}^*U_{lB}T_{kl} = \sum_{kl}\left(U^\dagger\right)_{Ak}T_{kl}U_{lB}.$$
This identity shows that the pole structure of the resolvent matrix is fully contained in the resolvent $G(z) = (zE - T)^{-1}$ of the T matrix, which is written in terms of the new (Lanczos's) basis set. But the matrix $zE - T$ is tridiagonal, and hence the first diagonal element of its inverse, $G_{11}(z)$, can easily be calculated as a function of z via a continued fraction, see Sect. 1.2.6.4 and Problem 1.61:

$$G_{11}(z) = \cfrac{1}{z - \alpha_1 - \cfrac{\beta_1^2}{z - \alpha_2 - \cfrac{\beta_2^2}{z - \alpha_3 - \cdots - \cfrac{\beta_{n-1}^2}{z - \alpha_n}}}}\,,$$
Beyond some level l the Lanczos coefficients can be assumed to repeat themselves, $\alpha_i = \alpha$ and $\beta_i = \beta$ for $i \ge l$, so that the infinite tail of the continued fraction satisfies

$$S_\infty(z) = \cfrac{1}{z - \alpha - \beta^2S_\infty(z)}\,,$$

which yields a quadratic equation for the sum $S_\infty(z)$. Once the infinite tail of the fraction containing identical terms is known, these terms can all be replaced by the infinite sum $S_\infty$, leading to a finite continued fraction

$$G_{11}(z) = \cfrac{1}{z - \alpha_1 - \cfrac{\beta_1^2}{z - \alpha_2 - \cdots - \cfrac{\beta_{l-1}^2}{z - \alpha_l - \beta_l^2S_\infty(z)}}}\,, \qquad (1.156)$$

which can now be calculated exactly.
We have discussed here only the main ideas of the Lanczos method. It exploits the “localised” nature of interactions in such systems and is frequently used, e.g. in the theory of electronic states of disordered systems as well as in other fields, since it effectively allows considering finite fragments of realistic systems while taking explicit account of their specific geometry.
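The essence of the method can be condensed into a short sketch (ours; dense linear algebra is used instead of the operator language, and the matrix H is random): the Lanczos recursion produces the coefficients $\alpha_i$ and $\beta_i$, and the continued fraction then reproduces $G_{11}(z)$:

```python
# Lanczos tridiagonalisation of H starting from chi_0, plus the
# continued-fraction evaluation of G_11(z) = <g_1|(zE - T)^{-1}|g_1>.
import numpy as np

rng = np.random.default_rng(1)
n = 8
H = rng.standard_normal((n, n)); H = (H + H.T) / 2

g = np.zeros((n, n)); g[:, 0] = np.eye(n)[0]     # g_1 = chi_0
alpha, beta = np.zeros(n), np.zeros(n - 1)
for i in range(n):
    hg = H @ g[:, i]
    alpha[i] = g[:, i] @ hg
    r = hg - alpha[i] * g[:, i] - (beta[i - 1] * g[:, i - 1] if i > 0 else 0)
    if i < n - 1:
        beta[i] = np.linalg.norm(r)
        g[:, i + 1] = r / beta[i]

z = 3.0 + 0.1j
cf = z - alpha[-1]                               # evaluate bottom up
for i in range(n - 2, -1, -1):
    cf = z - alpha[i] - beta[i] ** 2 / cf
G11_cf = 1.0 / cf
G11_direct = np.linalg.inv(z * np.eye(n) - H)[0, 0]
assert np.isclose(G11_cf, G11_direct)
```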
Consider an n-level quantum system described by the Hamiltonian matrix $H = (H_{ij})$. If at the initial time $t = 0$ the wave function of the system was a column vector

$$\Psi_0 = \begin{pmatrix} \psi_1^0 \\ \vdots \\ \psi_k^0 \\ \vdots \\ \psi_n^0 \end{pmatrix},$$
The wave function $\Psi_t$ at a later time t satisfies the time-dependent Schrödinger equation

$$i\hbar\frac{d\Psi_t}{dt} = H\Psi_t. \qquad (1.158)$$

Expanding the state vector $\Psi_t$ in terms of the eigenvectors of H, i.e. $\Psi_t = \sum_j\alpha_j(t)x_j$, we find upon multiplication from the left with $x_k^\dagger$ that the expansion coefficients are $\alpha_k(t) = x_k^\dagger\Psi_t$. Substituting the obtained expansion of $\Psi_t$ into (1.158), we have

$$i\hbar\sum_k\frac{d\alpha_k}{dt}x_k = \sum_{kj}\epsilon_k\alpha_jx_k\left(x_k^\dagger x_j\right) \;\Longrightarrow\; i\hbar\sum_k\frac{d\alpha_k}{dt}x_k = \sum_k\epsilon_k\alpha_kx_k. \qquad (1.159)$$

In the above manipulation we used the associativity in multiplying matrices (vectors in this particular case) and also the fact that the eigenvectors of H are orthonormal; because of the latter the double sum in the right-hand side was transformed into a single sum. Finally, multiply (1.159) by $x_j^\dagger$ from the left again and use the orthonormality property. This way a simple differential equation for the coefficients $\alpha_j(t)$ follows, which is trivially solved:

$$i\hbar\frac{d\alpha_j}{dt} = \epsilon_j\alpha_j \;\Longrightarrow\; \alpha_j(t) = \alpha_j(0)e^{-i\epsilon_jt/\hbar},$$
where $\alpha_j(0) = x_j^\dagger\Psi_0$ are the initial expansion coefficients. Therefore, the wave function at time t becomes

$$\Psi_t = \sum_j\alpha_j(t)x_j = \sum_j\alpha_j(0)e^{-i\epsilon_jt/\hbar}x_j = \sum_je^{-i\epsilon_jt/\hbar}x_j\left(x_j^\dagger\Psi_0\right) = \left(\sum_je^{-i\epsilon_jt/\hbar}x_jx_j^\dagger\right)\Psi_0.$$
The sum in the round brackets in the right-hand side can be recognised as the spectral representation of a function of the Hamiltonian matrix H:

$$U_{t0} = e^{-iHt/\hbar}.$$

The matrix $U_{t0}$ is called the propagator in quantum mechanics, as it propagates the wave function $\Psi_0$ from $t = 0$ to $\Psi_t = U_{t0}\Psi_0$ at any finite value of $t > 0$. It satisfies simple properties which we shall leave to the reader to prove as a problem. More generally, $\Psi_t = U_{tt'}\Psi_{t'}$, where

$$U_{tt'} = e^{-iH(t - t')/\hbar}.$$
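A minimal sketch (an invented two-level Hamiltonian, $\hbar = 1$) comparing the spectral representation of the propagator with the direct matrix exponential:

```python
# Propagator U_t0 = exp(-iHt/hbar): spectral form vs matrix exponential.
import numpy as np
from scipy.linalg import expm

hbar = 1.0
H = np.array([[1.0, 0.3],
              [0.3, 2.0]])
t = 0.7

eps, X = np.linalg.eigh(H)                      # H x_j = eps_j x_j
U_spectral = (X * np.exp(-1j * eps * t / hbar)) @ X.conj().T
U_expm = expm(-1j * H * t / hbar)
assert np.allclose(U_spectral, U_expm)

psi0 = np.array([1.0, 0.0], dtype=complex)
psi_t = U_spectral @ psi0                       # Psi_t = U_t0 Psi_0
print(np.vdot(psi_t, psi_t).real)               # norm stays 1: U is unitary
```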
written in terms of the two stationary states. By substituting $\psi(t)$ into the time-dependent Schrödinger equation $i\hbar\dot{\psi} = (H_0 + V)\psi$ and considering explicitly both components, show that the coefficients $C_1(t)$ and $C_2(t)$ satisfy a system of two ordinary differential equations; solving it, show that the occupation probabilities of the two states are

$$P_1(t) = 1 - \frac{4|V|^2}{\Delta^2 + 4|V|^2}\sin^2\frac{\Omega t}{\hbar} \quad \text{and} \quad P_2(t) = \frac{4|V|^2}{\Delta^2 + 4|V|^2}\sin^2\frac{\Omega t}{\hbar}.$$
We introduced complex numbers in Sect. 1.8 of the first volume.1 There we just
defined the numbers themselves, but did not go any further. In fact, since the
introduction of complex numbers a number of centuries ago, the theory based
on them has been substantially developed into an extended analysis of complex
functions defined on the complex plane. The mathematical tool thus created
represents an extremely powerful device for solving practical problems ranging
from calculating real integrals to solving partial differential equations.
The purpose of this chapter is to consider in detail this elegant formalism.
We shall start by returning to complex numbers and the complex plane, then
consider functions on the complex plane, their differentiation and integration,
complex functional series, analytic continuation, residues, Frobenius method for
solving ordinary differential equations and, finally, some applications in physics to
conclude this chapter.
1
In the following, references to the first volume of this course (L. Kantorovich, Mathematics for
natural scientists: fundamentals and basics, Springer, 2015) will be made by appending the Roman
number I in front of the reference, e.g. Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of
the first volume, respectively.
i.e. the product $z_3 = z_1z_2$ is the complex number with the length $r_3 = r_1r_2$ and the phase $\phi_3 = \phi_1 + \phi_2$.
Problem 2.1. Show that if two complex numbers $z_1$ and $z_2$ are specified by the lengths $r_1$ and $r_2$ and the phases $\phi_1$ and $\phi_2$, then their division $z_3 = z_1/z_2$ is characterised by $r_3 = r_1/r_2$ and the phase $\phi_3 = \phi_1 - \phi_2$, i.e.

$$\frac{z_1}{z_2} = \frac{r_1}{r_2}\left[\cos(\phi_1 - \phi_2) + i\sin(\phi_1 - \phi_2)\right]. \qquad (2.4)$$
Problem 2.2. Derive the following result for the power of a complex number by repeatedly using Eq. (2.3):

$$z^n = r^n\left[\cos(n\phi) + i\sin(n\phi)\right]. \qquad (2.5)$$

Setting $r = 1$ gives de Moivre's formula,

$$(\cos\phi + i\sin\phi)^n = \cos(n\phi) + i\sin(n\phi). \qquad (2.6)$$

This formula can actually be used for expressing the cosine or sine of the angle $n\phi$ with integer n via the cosine and sine of the angle $\phi$ (angle reduction). Indeed, by taking $n = 2$ and opening the brackets in (2.6), we have

$$\cos^2\phi - \sin^2\phi + 2i\sin\phi\cos\phi = \cos(2\phi) + i\sin(2\phi).$$

The real part in the left-hand side should be equal to the real part in the right, and similarly for the imaginary parts, which gives us two identities:

$$\cos(2\phi) = \cos^2\phi - \sin^2\phi \quad \text{and} \quad \sin(2\phi) = 2\sin\phi\cos\phi.$$
Problem 2.5. Now obtain a general result applying the binomial expansion to the left-hand side of (2.6):

$$\cos(2p\phi) = \sum_{k=0}^{p}(-1)^k\binom{2p}{2k}\sin^{2k}\phi\,\cos^{2(p-k)}\phi,$$
$$\sin(2p\phi) = \sum_{k=0}^{p-1}(-1)^k\binom{2p}{2k+1}\sin^{2k+1}\phi\,\cos^{2(p-k)-1}\phi,$$
$$\cos((2p+1)\phi) = \sum_{k=0}^{p}(-1)^k\binom{2p+1}{2k}\sin^{2k}\phi\,\cos^{2(p-k)+1}\phi,$$
$$\sin((2p+1)\phi) = \sum_{k=0}^{p}(-1)^k\binom{2p+1}{2k+1}\sin^{2k+1}\phi\,\cos^{2(p-k)}\phi.$$

Check that these equations reproduce the previous results of Problem 2.4.
Next we can rederive the sums of the cosine and sine functions we calculated previously,

$$\sum_{k=1}^{n}\sin(kx) = \frac{1}{\sin\frac{x}{2}}\sin\frac{(n+1)x}{2}\sin\frac{nx}{2} \qquad (2.7)$$

and

$$\sum_{k=1}^{n}\cos(kx) = \frac{1}{\sin\frac{x}{2}}\cos\frac{(n+1)x}{2}\sin\frac{nx}{2}, \qquad (2.8)$$

see Eqs. (I.2.63) and (I.2.64), using the trigonometric representation of a complex number. In fact, we shall even be able to generalise these formulae a little. To this end, let us note that the formula (I.1.61) for the geometric progression,

$$S_n = a_0\frac{q^{n+1} - 1}{q - 1},$$

is valid for complex q as well, as we did not use in the derivation that q was necessarily real (recall that complex numbers are governed by the same algebraic rules as real numbers). Then, we can write

$$\sum_{k=0}^{n}z^k = \frac{1 - z^{n+1}}{1 - z} \quad \text{with} \quad z = r(\cos x + i\sin x) \quad \text{and} \quad z^k = r^k\left[\cos(kx) + i\sin(kx)\right].$$
Therefore, by taking the real and imaginary parts of the expression in the right-hand side, we should be able to work out the sums of $r^k\cos(kx)$ and $r^k\sin(kx)$. The calculation is based on the fact that the product $z\bar{z} = x^2 + y^2$ is real, where $x = \mathrm{Re}(z)$ and $y = \mathrm{Im}(z)$. Therefore, we multiply and divide the expression above by the complex conjugate of the denominator:

$$\frac{1 - z^{n+1}}{1 - z} = \frac{\left(1 - z^{n+1}\right)\left(1 - \bar{z}\right)}{(1 - z)\left(1 - \bar{z}\right)}.$$
With this trick the denominator becomes real, so that only the numerator is complex:

Problem 2.6. Calculate the numerator to show that its real and imaginary parts are, correspondingly:

$$\mathrm{Re}\left[\left(1 - z^{n+1}\right)\left(1 - \bar{z}\right)\right] = 1 - r\cos x + r^{n+2}\cos(nx) - r^{n+1}\cos((n+1)x),$$
$$\mathrm{Im}\left[\left(1 - z^{n+1}\right)\left(1 - \bar{z}\right)\right] = r\sin x + r^{n+2}\sin(nx) - r^{n+1}\sin((n+1)x).$$
Hence,

$$\sum_{k=0}^{n}r^k\cos(kx) = \frac{1 - r\cos x + r^{n+2}\cos(nx) - r^{n+1}\cos((n+1)x)}{1 - 2r\cos x + r^2}, \qquad (2.9)$$

$$\sum_{k=0}^{n}r^k\sin(kx) = \frac{r\sin x + r^{n+2}\sin(nx) - r^{n+1}\sin((n+1)x)}{1 - 2r\cos x + r^2}. \qquad (2.10)$$

Assuming that $0 < r < 1$, we can also calculate the infinite sums by taking the $n \to \infty$ limit (note that $r^n \to 0$):

$$\sum_{k=0}^{\infty}r^k\cos(kx) = \frac{1 - r\cos x}{1 - 2r\cos x + r^2}, \qquad (2.11)$$

$$\sum_{k=0}^{\infty}r^k\sin(kx) = \frac{r\sin x}{1 - 2r\cos x + r^2}. \qquad (2.12)$$
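The closed forms (2.11) and (2.12) are immediately verifiable against truncated sums; a quick sketch (ours):

```python
# Numerical check of the closed forms (2.11), (2.12) for 0 < r < 1.
import numpy as np

r, x = 0.6, 0.9
k = np.arange(2000)                        # r^k decays fast; 2000 terms ample
cos_sum = np.sum(r**k * np.cos(k * x))
sin_sum = np.sum(r**k * np.sin(k * x))
den = 1 - 2 * r * np.cos(x) + r**2
assert np.isclose(cos_sum, (1 - r * np.cos(x)) / den)
assert np.isclose(sin_sum, r * np.sin(x) / den)
```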
Problem 2.7. Prove that the series (2.11) and (2.12) converge only for $0 < r < 1$.

Problem 2.8. Check that by setting $r = 1$ in Eqs. (2.9) and (2.10), we recover Eqs. (2.7) and (2.8).
$$|z - z_0| < \varepsilon. \qquad (2.13)$$

Here $|z - z_0| = \sqrt{(x - x_0)^2 + (y - y_0)^2}$ corresponds to the distance on the complex plane between the two points, z and $z_0$. Therefore, Eq. (2.13) selects all the points lying inside a circle of radius $\varepsilon$ drawn with the point $z_0$ at its centre, see Fig. 2.1(a). Note that the points at the boundary of the circle are strictly not selected because of the less-than (rather than less-than-or-equal) sign in Eq. (2.13). If we now consider a region D in the complex plane, see Fig. 2.1(b), then three situations can be envisaged: an internal point $z_1$ of D, a boundary point $z_2$ lying on the boundary L of D (shown by the solid line) and finally a point $z_3$ which is outside D. For any internal point one can always find such an $\varepsilon$-vicinity (or such a value of $\varepsilon$) that all its points belong to D; if a point is on the boundary L of D, then for any $\varepsilon$-vicinity there will be points
Fig. 2.1 (a) The $\varepsilon$-vicinity (neighbourhood) of the point $z_0$ includes all points z from $\mathbb{C}$ which lie inside the circle of radius $\varepsilon$ centred at the point $z_0 = (x_0, y_0)$. (b) For any internal point $z_1$ of a region D one can always find its $\varepsilon$-vicinity such that it lies fully inside D. This is, however, not the case for the point $z_2$ lying on the boundary L of the region D: only some of the points of any of its $\varepsilon$-vicinities lie inside D. For a point $z_3$ lying outside D, one can always find such a vicinity of it that lies entirely outside D
inside and outside of D, including some boundary points, i.e. not all points in the $\varepsilon$-vicinity would belong to D; finally, if a point lies outside D, then one can always find such an $\varepsilon$-vicinity of it that all its points lie outside D, i.e. the entire $\varepsilon$-vicinity is outside D.
If a region in the complex plane has a single boundary (a single continuous line, either a smooth or a broken one²) as the one shown in Fig. 2.1(b), it is called simply connected. However, regions in the complex plane may have a more complex structure. For instance, they may have not just one but several boundaries (i.e. several closed boundary lines). This happens when some regions of points on $\mathbb{C}$ are excluded and/or cuts are made through accepted regions, as is schematically shown in Fig. 2.2(a). Indeed, the two white regions (with boundaries $L_2$ and $L_3$) which are cut off from the shaded region add two more boundary lines to it, so that now the shaded region has three boundary lines: $L_1$, $L_2$ and $L_3$. Regions which have more than one continuous boundary we shall call multiply connected or non-simply connected. One may say that our shaded region has two holes. Indicating explicitly the number of boundaries (closed boundary lines) existing in D as $k = 1, 2, \ldots$, we say that D is $(k - 1)$-fold-connected; simply connected regions have $k = 1$. Hence, the shaded region in Fig. 2.2(a) is twofold-connected. Note that the cuts by the lines $L_4$ and $L_5$ in this figure do not change the number of boundary lines, as must be clear from Fig. 2.2(b): they simply contribute to the outside boundary line of D.
When generalizing the idea of integration, we shall need to define a direction
along the boundary of a region. We shall consider the direction “positive” when
we traverse the boundary in such a way that the region D is on the left (compare the
Green’s and Stokes’s theorems in Sects. I.6.3.3 and I.6.4.5). If the opposite direction
is chosen with the points of D being on the right, the integrals will change their sign,
so that this direction will then be called “negative”.
Fig. 2.2 Regions in the complex plane and their boundaries. (a) Region D has five boundary lines $L_1$–$L_5$, where $L_1$ is its outer boundary, $L_2$ and $L_3$ correspond to the boundaries of the internal ovals taken out, and $L_4$ and $L_5$ correspond to the cuts made in D. The cuts are shown more clearly in (b). Note the directions of traverse: their “positive” direction is always chosen such that the region D is on the left
2
In other words, a closed piecewise line (i.e. consisting of smooth pieces which connect to each
other).
A function $f(z)$ on the complex plane $\mathbb{C}$ can be defined in a way similar to a real function of two variables $g(x, y)$: each pair of real numbers x and y is put into correspondence with a complex number w. There is, however, an important difference with the former case: if in the case of a real function $g = g(x, y)$ every point on the x–y plane with the coordinates $(x, y)$ is put into correspondence with a real number g, in the case of a complex function $w = f(z)$ we define the correspondence between the point $z = (x, y)$ in the 2D complex plane and a point $w = (u, v)$ on the same 2D plane, i.e. $w = u + iv$ contains two real numbers, not one as in the case of a real function $g(x, y)$. Therefore, complex functions map the complex plane onto itself. Hence, in this sense, there is more similarity with a real function of a single variable, where the 1D space (the real x axis) is mapped onto itself. Therefore, the mapping $w = f(z)$ is equivalent to two real functions of two real variables each:

$$u = u(x, y) \quad \text{and} \quad v = v(x, y). \qquad (2.14)$$
When considering such a mapping, more complications may arise. For instance,
we may have that for the same value of z several values of w are possible. This
results in a multi-valued function as schematically shown in Fig. 2.3(b). If only a
single value of w exists for each value of z, but still several different values of z may
result in the same w, the function w D f .z/ is called single-valued, see Fig. 2.3(a). If
any single z within some region D corresponds to a single value of w, and vice versa,
we then have a mapping with one-to-one correspondence, Fig. 2.3(c). We shall see
that in many practical situations we shall be trying to find such regions of values of
z and w so that the one-to-one mapping between them would be possible.
We note that this is not something really specific to functions of complex variables. Indeed, we know that a square root of a real positive number has two values: negative and positive, i.e. the function $y = \sqrt{x}$ has in fact two values, $\pm\sqrt{x}$, for the same value of x (assuming that by $\sqrt{x}$ we mean only the positive value of the root); therefore, in the world of real numbers we do face a similar problem. It is overcome by taking only the positive root, i.e. by assuming that $y = \sqrt{x}$ is always positive. In the case of complex functions defined on the complex plane this is an issue of fundamental importance and we shall encounter it over and over again. How this problem is tackled is similar to choosing a single root in the world of real numbers and will be considered later on in detail.
Suppose now that we have a one-to-one mapping between two complex domains of z and w. Then, it must be possible to define the inverse function $w = f^{-1}(z)$ such that $z = f(w)$. For two single-valued functions $w = f(z)$ and $q = \varphi(w)$ one can also introduce their superposition $q = \varphi(f(z))$. In particular, if $w = f(z)$ performs a one-to-one mapping, then $f\left(f^{-1}(z)\right) = z$ and $f^{-1}(f(z)) = z$.
Next, we have to define the limit of a function in $\mathbb{C}$. The limit $\lim_{z\to z_0}f(z)$ is defined similarly to that of a real function of two variables. Using the $\varepsilon$–$\delta$ language, $\lim_{z\to z_0}f(z) = F$ if for any $\varepsilon > 0$ one can always find such $\delta > 0$ that for any z satisfying $0 < |z - z_0| < \delta$ follows $|f(z) - F| = |w - F| < \varepsilon$. In other words, points in the $\delta$-vicinity of $z_0$ (excluding the point $z_0$ itself, as the function $f(z)$ may not exist there) are mapped onto points of an $\varepsilon$-vicinity of F in the w plane. No matter how small the latter vicinity is, one can always find the corresponding vicinity of the point $z_0$ to accomplish the mapping.
It is clear that the limit exists if and only if the two limits for the functions $u(x, y)$ and $v(x, y)$ exist. The function $f(z)$ is continuous if

$$\lim_{z\to z_0}f(z) = f(z_0),$$

similarly to real functions. Clearly, both functions $u(x, y)$ and $v(x, y)$ are to be continuous for this to happen. It is essential that the limit must not depend on the path $z \to z_0$ in $\mathbb{C}$.
The usual properties of the limits then apply, similarly to the calculus of real functions:

$$\lim_{z\to z_0}\left(f(z) + g(z)\right) = \lim_{z\to z_0}f(z) + \lim_{z\to z_0}g(z), \qquad \lim_{z\to z_0}\left(f(z)g(z)\right) = \lim_{z\to z_0}f(z)\cdot\lim_{z\to z_0}g(z),$$

$$\lim_{z\to z_0}\frac{f(z)}{g(z)} = \frac{\lim_{z\to z_0}f(z)}{\lim_{z\to z_0}g(z)} \;(\text{if } g \neq 0) \qquad \text{and} \qquad \lim_{z\to z_0}g(f(z)) = g\left(\lim_{z\to z_0}f(z)\right).$$
It is also possible to show that if a closed region D is considered, a continuous function $f(z)$ will be bounded there, i.e. there exists such positive F that $|f(z)| \le F$ for any z from D.
Now we are ready to consider differentiation of $f(z)$ with respect to z. We can define the derivative of $f(z)$ similarly to real functions as

$$f'(z) = \lim_{\Delta z\to 0}\frac{f(z + \Delta z) - f(z)}{\Delta z}. \qquad (2.15)$$
It is essential that the limit must not depend on the path along which the complex number $\Delta z$ approaches zero, Fig. 2.4, as otherwise this definition would not have any sense. This condition for a function $f(z)$ to be (complex) differentiable puts certain limitations on the function $f(z) = w = u(x, y) + iv(x, y)$ itself, which are formulated by the following Theorem.

Theorem 2.1 (Cauchy and Riemann). In order for the function $f(z) = u + iv$ to be (complex) differentiable at $z = (x, y)$, where both $u(x, y)$ and $v(x, y)$ are differentiable around $(x, y)$, it is necessary and sufficient that the following conditions are satisfied at this point:

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad \text{and} \quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \qquad (2.16)$$
Proof. We shall first prove the necessary condition. This means that we assume that $f'(z)$ exists and need to show that the conditions (2.16) are satisfied. Indeed, if the derivative exists, it means that it does not depend on the direction in which $\Delta z \to 0$. Therefore, let us take the limit (2.15) by approaching zero along the x axis, i.e. by considering $\Delta x \to 0$ and $\Delta y = 0$. We have

$$f'(z) = \lim_{\Delta x\to 0}\frac{\left[u(x + \Delta x, y) + iv(x + \Delta x, y)\right] - \left[u(x, y) + iv(x, y)\right]}{\Delta x} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x},$$

since along this path $\Delta z = \Delta x$. Alternatively, we may approach zero along the imaginary axis by taking $\Delta z = i\Delta y$ with $\Delta x = 0$. This must give the same complex number $f'(z)$:

$$f'(z) = \lim_{\Delta y\to 0}\frac{\left[u(x, y + \Delta y) + iv(x, y + \Delta y)\right] - \left[u(x, y) + iv(x, y)\right]}{i\Delta y}$$
$$= \frac{1}{i}\lim_{\Delta y\to 0}\frac{u(x, y + \Delta y) - u(x, y)}{\Delta y} + \lim_{\Delta y\to 0}\frac{v(x, y + \Delta y) - v(x, y)}{\Delta y} = -i\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y}.$$

The two expressions must be identical, as the derivative must not depend on the direction. Therefore, comparing the real and imaginary parts of the two expressions above, we obtain the required conditions (2.16).
Next we prove sufficiency: we are given the conditions, and we have to prove that the limit (2.15) exists. Assuming that the functions $u(x, y)$ and $v(x, y)$ are differentiable, we can write (see Sect. I.5.3):

$$\Delta u = u(x + \Delta x, y + \Delta y) - u(x, y) = \frac{\partial u}{\partial x}\Delta x + \frac{\partial u}{\partial y}\Delta y + \alpha_1\Delta x + \alpha_2\Delta y$$
$$= \frac{\partial u}{\partial x}\Delta x - \frac{\partial v}{\partial x}\Delta y + \alpha_1\Delta x + \alpha_2\Delta y,$$

$$\Delta v = v(x + \Delta x, y + \Delta y) - v(x, y) = \frac{\partial v}{\partial x}\Delta x + \frac{\partial v}{\partial y}\Delta y + \beta_1\Delta x + \beta_2\Delta y$$
$$= \frac{\partial v}{\partial x}\Delta x + \frac{\partial u}{\partial x}\Delta y + \beta_1\Delta x + \beta_2\Delta y,$$

where $\alpha_1$, $\alpha_2$, $\beta_1$ and $\beta_2$ tend to zero if $\Delta x$ and $\Delta y$ tend to zero (i.e. $\Delta z \to 0$), and in the second passage in both cases we have made use of the conditions (2.16) by replacing all partial derivatives with respect to y with those with respect to x. Therefore, we can consider the difference of the function in Eq. (2.15):

$$\Delta f = f(z + \Delta z) - f(z) = \Delta u + i\Delta v = \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)\Delta x + \left(-\frac{\partial v}{\partial x} + i\frac{\partial u}{\partial x}\right)\Delta y + \epsilon_1\Delta x + \epsilon_2\Delta y$$
$$= \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)(\Delta x + i\Delta y) + \epsilon_1\Delta x + \epsilon_2\Delta y = \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)\Delta z + \epsilon_1\Delta x + \epsilon_2\Delta y, \qquad (2.17)$$

where $\epsilon_1 = \alpha_1 + i\beta_1$ and $\epsilon_2 = \alpha_2 + i\beta_2$ also tend to zero as $\Delta z \to 0$. Dividing by $\Delta z$, we get:
$$\frac{f(z + \Delta z) - f(z)}{\Delta z} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} + \epsilon_1\frac{\Delta x}{\Delta z} + \epsilon_2\frac{\Delta y}{\Delta z}. \qquad (2.18)$$

It is easy to see that the fractions $\Delta x/\Delta z$ and $\Delta y/\Delta z$ are bounded. To show this, let us write the complex number $\Delta z = \Delta r(\cos\phi + i\sin\phi)$ in the trigonometric form; then $\Delta x = \Delta r\cos\phi$ and hence

$$\left|\frac{\Delta x}{\Delta z}\right| = \left|\cos\phi\left(\cos\phi - i\sin\phi\right)\right| \le \left|\cos\phi - i\sin\phi\right| = \sqrt{\cos^2\phi + \sin^2\phi} = 1,$$

since the complex number $\cos\phi - i\sin\phi$ lies on the circle of unit radius in the complex plane; similarly for $\Delta y/\Delta z$. Therefore, when taking the limit $\Delta z \to 0$ in (2.18), we can ignore these fractions and consider only the limits of $\epsilon_1$ and $\epsilon_2$, which both tend to zero. Therefore, finally:

$$\lim_{\Delta z\to 0}\frac{f(z + \Delta z) - f(z)}{\Delta z} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x},$$

which is a well-defined expression (since both u and v are differentiable), i.e. the derivative $f'(z)$ exists. Q.E.D.
Using the conditions (2.16) and the fact that the derivative should not depend on the direction in which $\Delta z \to 0$, we can write down several alternative expressions for the derivative:

$$f'(z) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = -i\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y} = \frac{\partial v}{\partial y} + i\frac{\partial v}{\partial x} = \frac{\partial u}{\partial x} - i\frac{\partial u}{\partial y}. \qquad (2.19)$$
Since the derivative is basically based on the partial derivatives of the functions $u(x, y)$ and $v(x, y)$, all properties of the derivatives of real functions are carried over here:

$$(f + g)' = f' + g', \qquad (fg)' = f'g + fg' \qquad \text{and} \qquad \left[f(g(z))\right]' = \frac{df}{dg}\frac{dg}{dz}.$$
If $z = f(w)$ provides a unique (one-to-one) mapping, then the inverse function $w = f^{-1}(z)$ exists. Then, similarly to the real calculus,

$$\frac{dw}{dz} = \frac{d}{dz}f^{-1}(z) = \frac{1}{dz/dw} = \left.\frac{1}{f'(w)}\right|_{w = f^{-1}(z)}.$$
For example, consider $f(z) = z^2 = (x + iy)^2 = \left(x^2 - y^2\right) + 2ixy$, so that $u = x^2 - y^2$ and $v = 2xy$. Then

$$\frac{\partial u}{\partial x} = 2x, \quad \frac{\partial v}{\partial y} = 2x, \quad \frac{\partial u}{\partial y} = -2y, \quad \frac{\partial v}{\partial x} = 2y,$$

and the Cauchy–Riemann conditions (2.16) are indeed satisfied, as can easily be checked. Correspondingly,

$$\left(z^2\right)' = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = 2x + i2y = 2(x + iy) = 2z.$$
Differentiating the first of the Cauchy–Riemann conditions (2.16) with respect to x and the second with respect to y and adding them (and similarly for v), one finds that both u and v satisfy the two-dimensional Laplace equation:

$$\frac{\partial^2u}{\partial x^2} + \frac{\partial^2u}{\partial y^2} = 0 \quad \text{and} \quad \frac{\partial^2v}{\partial x^2} + \frac{\partial^2v}{\partial y^2} = 0. \qquad (2.20)$$
Suppose now that only the real part $u(x, y) = xy$ of some analytic function $f(z)$ is known; let us try to restore the imaginary part $v(x, y)$. The first of the conditions (2.16) gives $\partial v/\partial y = \partial u/\partial x = y$, so that

$$v(x, y) = \int y\,dy = \frac{1}{2}y^2 + \varphi(x).$$

Here we wrote v as a y-integral since $\partial v/\partial y = y$; this follows from the general property of indefinite integrals. Also, the integration is performed only over y, i.e. keeping x constant, and $\varphi(x)$ appears as an arbitrary function of x, i.e. $\varphi(x)$ is yet to be determined. The above result, however, fully determines the dependence of $v(x, y)$ on y. To find the function $\varphi(x)$, we employ the second condition (2.16):

$$\frac{\partial u}{\partial y} = x \;\Longrightarrow\; -\frac{\partial v}{\partial x} = x \;\Longrightarrow\; \frac{d\varphi}{dx} = -x \;\Longrightarrow\; \varphi(x) = -\int x\,dx = -\frac{1}{2}x^2 + C,$$

so that $v = \left(y^2 - x^2\right)/2 + C$ and

$$f(z) = -\frac{i}{2}z^2 + iC = -\frac{i}{2}(x + iy)^2 + iC = xy - \frac{i}{2}\left(x^2 - y^2\right) + iC,$$
which is precisely our function. This example shows that if $u(x, y)$ is known, then the corresponding imaginary part $v(x, y)$ can indeed be found. It also illustrates that the final function must be expressible entirely via z as $f(z)$. This is not possible in all cases; some $u(x, y)$ (or $v(x, y)$) do not correspond to any $f(z)$. For instance, if $u(x, y) = x^3 - 3x^2y$, then we first obtain

$$\frac{\partial u}{\partial x} = 3x^2 - 6xy \;\Longrightarrow\; \frac{\partial v}{\partial y} = 3x^2 - 6xy \;\Longrightarrow\; v(x, y) = \int\left(3x^2 - 6xy\right)dy = 3x^2y - 3xy^2 + \varphi(x);$$
$$\frac{\partial u}{\partial y} = -3x^2 \;\Longrightarrow\; \frac{\partial v}{\partial x} = 3x^2 \;\Longrightarrow\; 3x^2 = \frac{\partial}{\partial x}\left[3x^2y - 3xy^2 + \varphi(x)\right] = 6xy - 3y^2 + \frac{d\varphi}{dx},$$

which does not give an equation for $\varphi(x)$, since the terms with y do not cancel out. This means that there is no function $f(z)$ having the real part equal to $x^3 - 3x^2y$.
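The existence test just illustrated is equivalent to checking that $u(x, y)$ is harmonic, cf. Eq. (2.20); a short symbolic sketch (ours, assuming sympy is available) automates it:

```python
# u admits a conjugate v (i.e. some analytic f with Re f = u) only if u is
# harmonic; x**3 - 3*x**2*y fails, while Re(z^3) = x**3 - 3*x*y**2 passes.
import sympy as sp

x, y = sp.symbols('x y', real=True)

def is_harmonic(u):
    return sp.simplify(sp.diff(u, x, 2) + sp.diff(u, y, 2)) == 0

print(is_harmonic(x**3 - 3*x**2*y))   # False: no analytic f with this Re part
print(is_harmonic(x**3 - 3*x*y**2))   # True:  the real part of z^3
```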
In polar coordinates $(r, \phi)$, where $x = r\cos\phi$ and $y = r\sin\phi$, we have

$$\frac{\partial r}{\partial x} = \cos\phi, \quad \frac{\partial r}{\partial y} = \sin\phi, \quad \frac{\partial\phi}{\partial x} = -\frac{\sin\phi}{r} \quad \text{and} \quad \frac{\partial\phi}{\partial y} = \frac{\cos\phi}{r},$$

so that the conditions (2.16) become

$$\frac{\partial u}{\partial r}\cos\phi - \frac{\partial u}{\partial\phi}\frac{\sin\phi}{r} = \frac{\partial v}{\partial r}\sin\phi + \frac{\partial v}{\partial\phi}\frac{\cos\phi}{r},$$
$$\frac{\partial u}{\partial r}\sin\phi + \frac{\partial u}{\partial\phi}\frac{\cos\phi}{r} = -\frac{\partial v}{\partial r}\cos\phi + \frac{\partial v}{\partial\phi}\frac{\sin\phi}{r},$$

which can be solved to give the Cauchy–Riemann conditions in polar coordinates:

$$\frac{\partial u}{\partial r} = \frac{1}{r}\frac{\partial v}{\partial\phi} \quad \text{and} \quad \frac{\partial v}{\partial r} = -\frac{1}{r}\frac{\partial u}{\partial\phi}.$$
Consider $w = z^n$ with n being a positive integer. If $|z| = r$ and $\arg(z) = \phi$, then we know from Eq. (2.5) that $|z^n| = r^n$ and $\arg(z^n) = n\phi$. So, the power function is single-valued.
It is, however, easy to see that the mapping here is the one depicted in Fig. 2.3(a), where two points, $z_2$ and $z_3$, go over into a single point $w_2$ of the function. Indeed, let us see if there exist two different points $z_1$ and $z_2$ (given by their absolute values $r_1$ and $r_2$ and the phases $\phi_1$ and $\phi_2$) which give the same power, i.e. for which $z_1^n = z_2^n$. Naively, we may write two equations: $r_1^n = r_2^n$ and $n\phi_1 = n\phi_2$. The first one yields simply $r_1 = r_2$. However, the second equation is not quite correct, as the phase may differ by an integer multiple of $2\pi$, and hence it is to be replaced by $n\phi_1 = n\phi_2 + 2\pi k$ with k being any integer (including zero); therefore, $\phi_1 = \phi_2 + 2\pi k/n$. Hence, the points on a circle of radius $r_1 = r_2$ whose phases differ by $2\pi k/n$ transform by the power function $w = z^n$ into the same point w in $\mathbb{C}$. Correspondingly, if we only consider a sector of points with the phases satisfying the inequality

$$\frac{2\pi}{n}(k - 1) < \phi < \frac{2\pi}{n}k, \qquad (2.21)$$

then they all transform by $w = z^n$ into different points, i.e. there will be a one-to-one mapping of that sector onto the complex plane. Here $k = 1, \ldots, n$, i.e. the complex plane is divided into n such sectors, as shown in Fig. 2.5 for the case of $n = 8$. If we consider any points z inside any one of these sectors, they will provide a unique mapping by means of the power function $w = z^n$.
We have proven in Problem 2.10 that this function is analytic using the binomial expansion. In fact, a much simpler proof exists. Indeed, $w = u + iv = (x + iy)^n$, with $u = \mathrm{Re}\,(x + iy)^n$ and $v = \mathrm{Im}\,(x + iy)^n$. Therefore,

$$\frac{\partial u}{\partial x} = \mathrm{Re}\left[\frac{\partial}{\partial x}(x + iy)^n\right] = \mathrm{Re}\left[n(x + iy)^{n-1}\right] = \mathrm{Re}\left[n(a + ib)\right] = na,$$

$$\frac{\partial v}{\partial y} = \mathrm{Im}\left[\frac{\partial}{\partial y}(x + iy)^n\right] = \mathrm{Im}\left[ni(x + iy)^{n-1}\right] = \mathrm{Im}\left[in(a + ib)\right] = \mathrm{Im}(ina - nb) = na,$$

i.e. both are the same. Above, a and b are the real and imaginary parts of the complex number $(x + iy)^{n-1}$.
Problem 2.16. Show that the second condition (2.16) is also satisfied.
Hence, the derivative can be calculated, e.g., along the x direction:

$$\left(z^n\right)' = \frac{\partial}{\partial x}(x + iy)^n = n(x + iy)^{n-1} = nz^{n-1},$$

i.e. the same result as in the real calculus, i.e. as if the derivative was taken directly with respect to z.
Consider now the inverse, the root function $w = \sqrt[n]{z}$: all points w differing by $2\pi/n$ in their phase correspond to the same value of z. This means that for the given $z = r(\cos\phi + i\sin\phi)$ there are n values of w (roots), which are defined via $z = w^n$, i.e. we have for $w = \rho(\cos\psi + i\sin\psi)$ the absolute value and the phase satisfying $\rho^n = r$ and $n\psi = \phi + 2\pi k$, i.e.

$$\left|\sqrt[n]{z}\right| = \rho = \sqrt[n]{r} \quad \text{and} \quad \arg\left(\sqrt[n]{z}\right) = \psi = \frac{\phi + 2\pi k}{n} = \frac{\phi}{n} + \frac{2\pi}{n}k, \quad \text{where} \quad k = 0, 1, 2, \ldots, n - 1, \qquad (2.22)$$
which means

$$\sqrt[n]{z} = \sqrt[n]{r}\left(\cos\psi_k + i\sin\psi_k\right), \qquad \psi_k = \frac{\phi}{n} + \frac{2\pi}{n}k, \qquad (2.23)$$

where the index k numbers all the roots. Note that there are only n different values of the integer k possible, e.g. the ones given above; all additional values repeat the same roots. So, we conclude that the n-th root of any complex number z (except for $z = 0$) has n values. The point $z = 0$ is indeed somewhat special, as it only has a single root for any n. We shall see in a moment the special meaning of this particular point.
Example 2.2. ► As an example, consider the case of $n = 2$; we should have two roots:

$$\left(\sqrt{z}\right)_1 = \sqrt{r}\left(\cos\frac{\phi}{2} + i\sin\frac{\phi}{2}\right),$$
$$\left(\sqrt{z}\right)_2 = \sqrt{r}\left[\cos\left(\frac{\phi}{2} + \pi\right) + i\sin\left(\frac{\phi}{2} + \pi\right)\right] = -\sqrt{r}\left(\cos\frac{\phi}{2} + i\sin\frac{\phi}{2}\right) = -\left(\sqrt{z}\right)_1.$$

In particular, if z is real positive ($\phi = 0$ and $z = x > 0$), the two roots are $\sqrt{x}$ and $-\sqrt{x}$, as one would expect. If z is real negative, then $\phi = \pi$, and the two roots become

$$\left(\sqrt{z}\right)_1 = \sqrt{|x|}\left(\cos\frac{\pi}{2} + i\sin\frac{\pi}{2}\right) = i\sqrt{|x|} \quad \text{and} \quad \left(\sqrt{z}\right)_2 = -i\sqrt{|x|}.$$

These are shown as dots in Fig. 2.6.
As another example, consider all n roots of 1. Here $r = 1$ and $\phi = 0$, so the n roots of 1 all have the absolute value equal to one, and the phases $\psi_k = 2\pi k/n$. The roots form vertices of a regular n-polygon inscribed in a circle of unit radius, as shown for $n = 3, 4, 5$ in Fig. 2.7.
In a general case of calculating all roots of a number $z = x + iy$, it is necessary to calculate r and $\phi$ first, and then work out all the roots. For instance, consider $\sqrt[3]{1 + 2i}$. Here $r = \sqrt{1^2 + 2^2} = \sqrt{5}$ and $\phi = \arctan(2/1) = \arctan 2$. Therefore, the three roots are

$$z_k = \sqrt[3]{\sqrt{5}}\left[\cos\psi_k + i\sin\psi_k\right] \quad \text{with} \quad \psi_k = \frac{1}{3}\arctan 2 + \frac{2\pi}{3}k, \quad k = 0, 1, 2.$$

Fig. 2.7 Roots $\sqrt[n]{1}$ for $n = 3, 4, 5$ form vertices of regular polygons inscribed in the circle of unit radius
As an example of an application, let us obtain all roots of the quadratic equation $x^2 + 2x + 4 = 0$. Using the general expression for the roots of a quadratic equation, we can write

$$x_{1,2} = -1 \pm \sqrt{1 - 4} = -1 \pm \sqrt{-3} = -1 \pm i\sqrt{3}.$$

Here the $\pm$ sign takes care of the two values of the square root. ◄
Problem 2.17. Prove that the sum of all roots of 1 is equal to zero. [Hint: use Eqs. (2.9) and (2.10).]

Problem 2.18. Obtain all roots of the following quadratic equations:

$$x^2 + x + 1 = 0; \qquad x^2 + 2x + i = 0; \qquad 2x^2 - 3x + 3 = 0.$$

[Answers: $-1/2 \pm i\sqrt{3}/2$; $-1 \pm 2^{1/4}\left[\cos(\pi/8) - i\sin(\pi/8)\right]$; $\left(3 \pm i\sqrt{15}\right)/4$.]

Problem 2.19. Obtain all roots of the equation $z^4 + a^4 = 0$. [Answer: the four roots are $\pm(\pm 1 + i)a/\sqrt{2}$.]

Problem 2.20. Show that all four roots of the equation ($g > 1$)

$$x^4 + 2\left(2g^2 - 1\right)x^2 + 1 = 0$$

are $\pm i\sqrt{2g^2 - 1 \pm 2g\sqrt{g^2 - 1}}$.
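Equation (2.23) translates directly into a small helper; the sketch below (ours) computes all n roots and also checks Problem 2.17 numerically for $n = 5$:

```python
# All n-th roots of a complex number via Eq. (2.23).
import numpy as np

def nth_roots(z, n):
    r, phi = abs(z), np.angle(z)
    k = np.arange(n)
    return r**(1.0 / n) * np.exp(1j * (phi + 2 * np.pi * k) / n)

roots = nth_roots(1 + 2j, 3)
assert np.allclose(roots**3, 1 + 2j)            # every root reproduces z
assert np.isclose(np.sum(nth_roots(1, 5)), 0)   # Problem 2.17 for n = 5
```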
Now let us consider in more detail how different roots of the same complex number z are related to each other. Let us start from the square root $w = \sqrt{z}$. If $|z| = r$ and $\arg(z) = \phi$, then the first root $w_1$ is given by $|w_1| = \sqrt{r}$ and $\arg(w_1) = \phi/2$. If we change $\phi$ of z within the limits of 0 and $2\pi$, excluding the boundary values themselves, i.e. $0 < \phi < 2\pi$, then the argument of $w_1$ would
Fig. 2.9 As z changes along the path in (a), the value of the root $w = \sqrt[n]{z}$ goes from the $w_1$ domain to the $w_2$ domain, as shown for the cases of $n = 2$ (b) and $n = 3$ (c)
change (see Eq. (2.23)) within the limits of 0 and $\pi$, i.e. $0 < \arg(w_1) < \pi$. However, if we consider the second root $w_2$ of the function $w = \sqrt{z}$, then over the same range of z values the phase of the root $w_2$ would vary within the interval $\pi < \arg(w_2) < 2\pi$. This is schematically shown in Fig. 2.8: when z is passed along the closed loop shown in (a), which does not cross the positive part of the x axis, the first root $w_1$ traverses only the upper loop in (b), while the second root $w_2$ traverses the lower part. Similarly, in the case of the function $w = \sqrt[3]{z}$ the arguments of the three roots lie within the intervals, correspondingly, $0 < \arg(w_1) < 2\pi/3$, $2\pi/3 < \arg(w_2) < 4\pi/3$ and $4\pi/3 < \arg(w_3) < 2\pi$. Therefore, if we imagine taking z along the same contour shown in Fig. 2.8(a), the three roots would traverse along the three paths shown in (c) of the same figure. The root function $w = \sqrt[n]{z}$ under this condition ($0 < \arg(z) < 2\pi$) is clearly single-valued and we can choose any of the roots.
Therefore, if the path of z does not cross the positive part of the x axis from below, i.e. the $z = 0$ point is not completely circled, each of the roots remains stable within its respective region. Let us now imagine that we take a contour in which the positive part of the x axis is crossed from below, i.e. the $z = 0$ point is fully circled ($\arg(z)$ goes beyond $2\pi$), as is shown in Fig. 2.9(a). In this case, if we initially start from the root $w_1$, its phase $\arg(w_1)$ goes beyond its range ($\pi$ in the case of $n = 2$ and $2\pi/3$ in the case of $n = 3$), as is shown in Fig. 2.9(b, c), and thus the root function takes on the next value, $w_2$.
Fig. 2.10 To avoid the multi-valued character of the n-root function, a “cut” is to be made from the $z = 0$ point to infinity in order to restrict the phase of z in such a way that the maximum change of $\arg(z)$ would not exceed $2\pi$. For example, this can be done either by cutting along the positive part of the x axis as shown in (a) (in (b) the cut is shown more clearly), in which case $0 < \arg(z) < 2\pi$, or along its negative part as in (c), in which case $-\pi < \arg(z) < \pi$, or along, e.g., the upper part of the imaginary axis as in (d), when $-3\pi/2 < \arg(z) < \pi/2$
p
Fig. 2.11 (a) The function w D z2 1 has two branch points: z D ˙1. A general point z in the
complex plane can be represented in two ways: with respect to either the branch point z D 1 (using
r1 and 1 ) or z D 1 (using r2 and 2 ). Here r1 and r2 are the corresponding distances to the
branch points. (b) One branch cut is drawn from z D 1 to infinity along the positive x direction,
while the other branch cut is drawn in the same direction but from the point z D 1. (c) The previous
case is equivalent to having the branch cut drawn only between the points z D 1 and z D 1; as
in (b) w.z/ appears to be continuous everywhere outside the cut (see text). (d) Another possible
choice of the two branch cuts which leads to a different function w.z/
The branch cuts can be made in various ways, as the idea is to limit the phase of z such that the function $w(z)$ is single-valued. Two other such possibilities for $w = \sqrt{z^2 - 1}$ are shown in Fig. 2.11(b, d). Consider first the construction shown in (b), where the two cuts are drawn in the positive direction of the x axis from both points (so that the cuts overlap in the region $x > 1$). In this case the angles $\phi_1$ and $\phi_2$ change within the same intervals: $0 < \phi_1 < 2\pi$ and $0 < \phi_2 < 2\pi$. To understand the behaviour of $w(z) = \sqrt{z^2 - 1}$, it is sufficient to calculate it on both sides of the x axis only, i.e. above ($y = +0$) and below ($y = -0$) it. We have to consider three cases: (i) $x < -1$, (ii) $-1 < x < 1$ and (iii) $x > 1$. Generally, for any z we can write

$$w = \sqrt{z - 1}\sqrt{z + 1} = \sqrt{r_1r_2}\left[\cos\left(\frac{\phi_1 + \phi_2}{2}\right) + i\sin\left(\frac{\phi_1 + \phi_2}{2}\right)\right], \quad \text{i.e.} \quad \arg(w) = \frac{\phi_1 + \phi_2}{2}.$$
For $x > 1$ we have $r_1 = x - 1$ and $r_2 = x + 1$. Then, on the upper side of the cut $\phi_1 = \phi_2 = 0$ and hence $w = \sqrt{r_1r_2} = \sqrt{x^2 - 1}$. On the lower side $\phi_1 = \phi_2 = 2\pi$ (we have to make a complete circle in the anti-clockwise direction to reach the lower side, whereby accumulating in both cases the phase of $2\pi$) and hence $\arg(w) = 2\pi$, which yields $w = \sqrt{x^2 - 1}$, exactly the same value. Therefore, for any $x > 1$ our function is continuous, as crossing the x axis does not change it.
Next, let us consider x < 1. Here r1 D 1 x and r2 D 1 x; on the upper
side of the x axis 1 D 2 D (a half circle rotation is necessary resulting in the
phase change of only) and thus arg.w/ D as well, so that
$$w=\sqrt{(1-x)(-1-x)}\,\left(\cos\pi+i\sin\pi\right)=-\sqrt{x^2-1}.$$
On the lower side the angles $\theta_1$ and $\theta_2$ are the same, leading to the same value. We conclude that $w(z)$ is also continuous at $x<-1$.
Finally, let us consider the interval $-1<x<1$ between the two branch points, where $r_1=1-x$ and $r_2=x+1$. On the upper side of the cut $\theta_1=\pi$ (a half circle rotation) but $\theta_2=0$, yielding $\arg(w)=\pi/2$ and hence the function there is
$$w=\sqrt{(1-x)(x+1)}\left(\cos\frac{\pi}{2}+i\sin\frac{\pi}{2}\right)=i\sqrt{1-x^2}.$$
On the lower side $\theta_1=\pi$ (a half circle rotation), $\theta_2=2\pi$ (a full circle rotation) and $\arg(w)=3\pi/2$, giving
$$w=\sqrt{(1-x)(x+1)}\left(\cos\frac{3\pi}{2}+i\sin\frac{3\pi}{2}\right)=-i\sqrt{1-x^2}.$$
Hence, the function jumps across the cut between the points $(-1,0)$ and $(1,0)$, i.e. it is discontinuous across the interval $-1<x<1$. Therefore, the two cuts drawn in Fig. 2.11(b) can be equivalently drawn as a single cut between the two points only. Note, however, that this might be misleading, as this cut does not imply that $-\pi<\theta_1<\pi$ and $0<\theta_2<2\pi$: this choice results in an undesired behaviour of our function.
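This branch structure is easy to probe numerically. The following sketch (our illustration, not part of the original text) relies on numpy's principal square root, which happens to realise exactly this branch when the function is written as a product of the two root factors; the offset and sample points are arbitrary choices:

```python
import numpy as np

def w(z):
    # One branch of sqrt(z**2 - 1) written as a product of two root factors;
    # with numpy's principal square root this is continuous across the x axis
    # for |x| > 1 and jumps across the cut for -1 < x < 1.
    return np.sqrt(z - 1.0) * np.sqrt(z + 1.0)

eps = 1e-12  # small offset above/below the real axis
for x in (-2.0, 0.5, 2.0):
    above, below = w(x + 1j * eps), w(x - 1j * eps)
    print(f"x = {x:+.1f}: above = {complex(above):.4f}, below = {complex(below):.4f}")
# x = +-2.0 give the same value on both sides (continuity for |x| > 1),
# while x = 0.5 gives +-i*sqrt(1 - x**2): a jump across the cut.
```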
Problem 2.21. Show that the function $w=\sqrt{z^2-1}$ becomes discontinuous for $x>1$ and $x<-1$ and continuous for $-1<x<1$, if we assume that $-\pi<\theta_1<\pi$ and $0<\theta_2<2\pi$.
Problem 2.22. Show that the function $w=\sqrt{z^2-1}$ is continuous for $-1<x<1$ and discontinuous everywhere else on the $x$ axis if the two cuts are chosen as in Fig. 2.11(d).
Problem 2.23. Analyse the behaviour of the function $w=\sqrt{z^2+1}$ on both sides of the imaginary $y$ axis. Consider cuts similar to the choices of Fig. 2.11(b, d). Note that the branch points in this case are at $z=\pm i$. You may find the drawing in Fig. 2.12 useful. [Answer: for the cut $-1<y<1$ (or, which is the same, two overlapping cuts $y>-1$ and $y>1$) the function $w$ is continuous for $y>1$ and $y<-1$, but is discontinuous across $-1<y<1$; in the case of the cuts $y>1$ and $y<-1$, it is the other way round.]
Problem 2.24. Consider the function $w=\sqrt[3]{(z+1)^2(z-1)}$. Show that in the case of the cut made as in Fig. 2.11(b, c) it is discontinuous for $-1<x<1$, while for the cuts as in Fig. 2.11(d) this happens at $x>1$ and $x<-1$ instead.
Consider now one branch of the $n$-root function, $w=\sqrt[n]{z}$ (we omit the index $k$ for simplicity). Let us show that it is analytic. In order to check the Cauchy–Riemann conditions (2.16), we have to calculate the partial derivatives of $u$ and $v$ with respect to $x$ and $y$. Here $w$ and $z$ are related via
$$z=w^n\;\Longrightarrow\;x+iy=(u+iv)^n.$$
Differentiating both sides of this identity with respect to $x$ and $y$, we obtain two equations:
$$1=n(u+iv)^{n-1}\left(\frac{\partial u}{\partial x}+i\frac{\partial v}{\partial x}\right)\quad\text{and}\quad i=n(u+iv)^{n-1}\left(\frac{\partial u}{\partial y}+i\frac{\partial v}{\partial y}\right),$$
so that
$$\frac{\partial u}{\partial y}+i\frac{\partial v}{\partial y}=\frac{i}{n(u+iv)^{n-1}}=i\left(\frac{\partial u}{\partial x}+i\frac{\partial v}{\partial x}\right).$$
Comparing the real and imaginary parts on both sides, we immediately see that the necessary conditions are indeed satisfied, i.e. the root function is analytic. Therefore, its derivative can be calculated using the general rule for the inverse function:
$$\frac{dw}{dz}=\left(\sqrt[n]{z}\right)'=\frac{1}{dz/dw}=\frac{1}{\left(w^n\right)'_w}=\frac{1}{nw^{n-1}}=\frac{w}{nw^n}=\frac{\sqrt[n]{z}}{nz}=\frac{1}{n}\,z^{1/n-1},$$
which is exactly the same result as for the real $n$-root function.
Thus we see that we can differentiate both $z^n$ and $z^{1/n}$ with respect to $z$ directly using the usual rules; there is no need to split $z=x+iy$ and use any of the formulae (2.19) which followed from the Cauchy–Riemann conditions.
We shall see that in the complex calculus the trigonometric functions are directly related to the exponential function, something which would be impossible to imagine in the world of real functions! To see this we need first to define what we mean by the exponential function on the complex plane $\mathbb{C}$. To this end, we shall employ the famous result of the real calculus, see Sects. I.2.3.4 and I.2.4.6 and especially Eq. (I.2.85),
$$\lim_{n\to\infty}\left(1+\frac{a}{n}\right)^n=e^a,\qquad(2.24)$$
linking the exponential function to the limit of a numerical sequence, and will use it as our definition:
$$e^z=e^{x+iy}=\lim_{n\to\infty}\left(1+\frac{z}{n}\right)^n=\lim_{n\to\infty}\left(1+\frac{x+iy}{n}\right)^n=\lim_{n\to\infty}\left[1+\frac{x}{n}+i\,\frac{y}{n}\right]^n=\lim_{n\to\infty}w^n.\qquad(2.25)$$
Here
$$w=1+\frac{z}{n}=\left(1+\frac{x}{n}\right)+i\,\frac{y}{n}$$
is a complex number with the phase
$$\phi=\arctan\frac{y/n}{1+x/n}=\arctan\frac{y}{x+n}.$$
In the limit $n\to\infty$ the absolute value of $w^n$ tends to $e^x$, while its phase $n\phi$ tends to $y$, where we have used again Eq. (2.24). Thus, the complex number which we call $e^z$ has the absolute value equal to $e^x$ and the phase $y$, i.e. we obtain the fundamental formula
$$e^z=e^{x+iy}=e^x\left(\cos y+i\sin y\right).\qquad(2.28)$$
Let us analyse this result. Firstly, if the imaginary part $y=0$, the exponential turns into the real exponential $e^x$, i.e. the complex exponential coincides exactly with the real one if $z$ is real. Next, as follows from the two problems below, this function satisfies the usual properties of the real exponential function, and is analytic everywhere.
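Both the fundamental formula (2.28) and the limit definition (2.25) can be spot-checked numerically; the following minimal sketch (ours, using only the standard math and cmath modules; the sample point and the values of $n$ are arbitrary) does so for one value of $z$:

```python
import math, cmath

z = 1.3 + 0.7j
x, y = z.real, z.imag

# Fundamental formula (2.28): e^z = e^x (cos y + i sin y)
formula = math.exp(x) * (math.cos(y) + 1j * math.sin(y))
print(abs(cmath.exp(z) - formula))        # ~1e-16: they agree

# Definition (2.25): e^z = lim (1 + z/n)^n, probed at large finite n
for n in (10**3, 10**6):
    print(n, abs((1 + z / n) ** n - cmath.exp(z)))  # error shrinks roughly as 1/n
```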
Problem 2.25. Show that it follows from (2.28) that the following identities are satisfied by the exponential function:
$$e^{z_1}e^{z_2}=e^{z_1+z_2};\quad e^{-z}=\frac{1}{e^z};\quad \left(e^z\right)^n=e^{nz};\quad \left(e^z\right)^{-n}=e^{-nz}\quad\text{and}\quad \frac{e^{z_1}}{e^{z_2}}=e^{z_1-z_2},\qquad(2.29)$$
where $z$, $z_1$ and $z_2$ are complex numbers and $n$ is an integer.
Problem 2.26. Show that
$$\left(e^z\right)^*=\left(e^{x+iy}\right)^*=e^{x-iy}=e^{z^*}.\qquad(2.30)$$
Problem 2.27. Show, using again Eq. (2.28), that the exponential function is analytic everywhere.
Its derivative can then be calculated, e.g. along the $x$ direction:
$$\left(e^z\right)'=\frac{\partial u}{\partial x}+i\frac{\partial v}{\partial x}=e^x\cos y+ie^x\sin y=e^x\left(\cos y+i\sin y\right)=e^z,$$
which again is the familiar result from the world of real functions. So, the exponential function can be differentiated directly with respect to $z$ as if the latter were real.
If we set $x=0$ in Eq. (2.28), we shall obtain
$$e^{iy}=\cos y+i\sin y\quad\text{and}\quad e^{-iy}=\cos y-i\sin y,\qquad(2.31)$$
where Eq. (2.30) was used for the second formula. These two identities were derived by Euler and bear his name. Using the exponential function we can write any complex number $z$ with the absolute value $r$ and the phase $\phi$ simply as $z=re^{i\phi}$.
Problem 2.28. Write the following complex numbers in the exponential form:
$$z=\pm1;\qquad z=1\pm i;\qquad z=\pm i;\qquad z=1\pm\sqrt{3}\,i.$$
[Answers: $1$, $e^{i\pi}$; $\sqrt{2}\,e^{\pm i\pi/4}$; $e^{\pm i\pi/2}$; $2e^{\pm i\pi/3}$.]
Problem 2.29. Write all roots of $\sqrt[5]{\pm1}$ in the exponential form.
Problem 2.30. Show that $\left|e^{ia}\right|=1$, where $a$ is a real number.
Problem 2.31. Show that all roots of the quadratic equation $x^2+(2+i)x+4i=0$ can be written as $x_{1,2}=-1-i/2\pm 17^{1/4}\left(\sqrt{3}/2\right)e^{-i\phi/2}$, where $\phi=\arctan4$.
Problem 2.32. Prove that the sum of all n-roots of 1 is equal to zero. [Hint:
represent the roots in the exponential form and then calculate the sum.]
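A quick numerical illustration of this problem (our sketch): representing the roots as $e^{i2\pi k/n}$ and summing them gives zero to machine precision:

```python
import cmath

def roots_of_unity(n):
    # z_k = exp(i * 2*pi*k / n), k = 0, ..., n-1: the n solutions of z**n = 1
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

for n in (2, 3, 5, 12):
    print(n, abs(sum(roots_of_unity(n))))   # ~1e-16 in every case
```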
From Eq. (2.31) we can express both the cosine and sine functions via the exponential functions:
$$\cos y=\frac{1}{2}\left(e^{iy}+e^{-iy}\right)\quad\text{and}\quad \sin y=\frac{1}{2i}\left(e^{iy}-e^{-iy}\right).\qquad(2.33)$$
It is seen that the trigonometric functions are indeed closely related to the complex exponential function. These relations are called Euler's identities as well.
Expressions (2.31) or (2.33) (that are easy to remember) can be used for quickly
deriving various trigonometric identities (that are not so easy to remember!). For
instance, let us prove the double angle formula for the sine function:
$$\sin(2\alpha)=\frac{1}{2i}\left(e^{i2\alpha}-e^{-i2\alpha}\right)=\frac{1}{2i}\left[\left(e^{i\alpha}\right)^2-\left(e^{-i\alpha}\right)^2\right]=\frac{1}{2i}\left(e^{i\alpha}+e^{-i\alpha}\right)\left(e^{i\alpha}-e^{-i\alpha}\right)$$
$$=2\cdot\frac{1}{2}\left(e^{i\alpha}+e^{-i\alpha}\right)\cdot\frac{1}{2i}\left(e^{i\alpha}-e^{-i\alpha}\right)=2\cos\alpha\,\sin\alpha.$$
by following these steps: (i) make the substitution $t=\sin\theta$; (ii) use the Euler formula (2.33) to express the sine function via complex exponentials; (iii) use the binomial formula and perform the integration over $\theta$ and (iv) note that only a single term in the sum will give a non-zero contribution.
The exponential function is periodic along the imaginary axis with the period $2\pi i$: for any integer $k$,
$$e^{z+i2\pi k}=e^z\,e^{i2\pi k}=e^z\left[\cos(2\pi k)+i\sin(2\pi k)\right]=e^z,$$
since the cosine is equal to one and the sine to zero. In other words, $e^{i2\pi k}=1$ for any integer $k$. This also means that the exponential function of any two complex numbers $z_1$ and $z_2$ related via $z_1=z_2+i2\pi k$ is the same: $e^{z_1}=e^{z_2}$. Therefore, if one considers horizontal stripes $2\pi k\le\operatorname{Im}(z)<2\pi(k+1)$ for any fixed integer $k$, then there will be a one-to-one correspondence between $z$ and $e^z$. This is essential to define the inverse of the exponential function which, as we shall see in the next section, is the logarithm. This situation is similar to the integer power function we considered previously, where it was necessary to restrict the phase of $z$ to define the inverse (the $n$-root) function.
The hyperbolic functions are defined identically to their real-variable counterparts:
$$\sinh z=\frac{1}{2}\left(e^z-e^{-z}\right)\quad\text{and}\quad \cosh z=\frac{1}{2}\left(e^z+e^{-z}\right).\qquad(2.35)$$
They satisfy the familiar identities
$$\cosh^2z-\sinh^2z=1;\qquad \cosh(-z)=\cosh(z);\qquad \sinh(-z)=-\sinh(z).$$
2.3.4 Logarithm
Since the phase $\arg(z)$ of any $z$ from $\mathbb{C}$ is defined up to $2\pi k$ with any integer $k$, the logarithm of $z$ is a multi-valued function. If $\theta$ is one particular phase of $z$, then
$$\ln z=\ln|z|+i\left(\theta+2\pi k\right),\quad k=0,\pm1,\pm2,\ldots\qquad(2.37)$$
To choose a single branch of the logarithm, we fix the value of the integer $k$, and then $\ln z$ will remain within a stripe $2\pi k\le\operatorname{Im}(\ln z)<2\pi(k+1)$, see Fig. 2.13. For instance, by choosing $k=0$ we select the stripe between $0i$ and $2\pi i$. This is the principal branch of the logarithm, corresponding to the cut made along the positive part of the $x$ axis shown in Fig. 2.10(b), as in this case the phase of $z$ (and hence the imaginary part of the logarithm) changes only between $0$ and $2\pi$.
For instance, consider $\ln(-1)$. This logarithm is not defined in real numbers. However, on $\mathbb{C}$ this quantity has a perfect meaning: since $-1=e^{i\pi}$, then $\ln(-1)=\ln1+i(\pi+2\pi k)=i(\pi+2\pi k)$. Here different values of $k$ correspond to the values of $\ln(-1)$ on different branches of the logarithmic function.
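Numerically one usually works with a library's principal branch; the other branches are obtained by adding $i2\pi k$. A small sketch (ours; cmath.log returns the branch with phase in $(-\pi,\pi]$):

```python
import cmath

z = -1 + 0j
principal = cmath.log(z)                 # ln|z| + i*arg(z), arg in (-pi, pi]
for k in (-1, 0, 1, 2):
    branch = principal + 2j * cmath.pi * k
    print(k, branch)                     # ln(-1) = i(pi + 2*pi*k) on the k-th branch
    assert abs(cmath.exp(branch) - z) < 1e-12  # exp inverts every branch
```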
Problem 2.36. Present the following expressions in the form $u+iv$ using the $k$-th branch of the logarithm:
$$\ln i;\quad \ln(1\pm i);\quad \ln\left(\sqrt{3}\pm i\right);\quad \ln\left(1\pm i\sqrt{3}\right);\quad \ln(1+5i).$$
Now let us try to understand if there are any limitations on the domain of allowed $z$ values in order for its logarithm to remain within the particular chosen stripe (and hence to correspond to a single-valued function). In Fig. 2.14(a) we take a closed path which does not contain the point $z=0$ within it; then the logarithmic function $w=\ln z$ goes along the path shown in (b) for each particular branch $k$. The vertical parts of the paths in (b) correspond to the circular parts of the path in (a), along which only the phase of $z$ changes, while the horizontal parts in (b) correspond to the parts of the path in (a) along which only $|z|$ changes. The situation is different, however, if the point $z=0$ lies inside the contour, as shown in Fig. 2.15. In this case $w=\ln z$ passes through the current stripe and goes over into the next one, i.e. revolving around $z=0$ takes the logarithmic function from one of its branches to the next one. As in the case of the $n$-root function, this problem is avoided by taking a branch cut from the branch point $z=0$, which would limit the phase of $z$ between $0$ and $2\pi$, Fig. 2.10.
Above, when choosing the branches, we assumed that the branch cut is made along the positive part of the $x$ axis. Another direction of the cut will change the function. For instance, the cut made along the negative part of the $x$ axis shown in Fig. 2.10(c) restricts the phase of $z$ to lie between $-\pi$ and $\pi$. Hence, the same point on the complex plane will have a different imaginary part of the logarithm. Indeed, the points $z_1=re^{i3\pi/2}$ and $z_2=re^{-i\pi/2}$ are equivalent, but the former (when $0<\arg(z_1)<2\pi$) corresponds to the cut made along the positive, while the latter (when $-\pi<\arg(z_2)<\pi$) along the negative direction of the $x$ axis. Correspondingly, the two values of the logarithm differ by $2\pi i$: $\ln z_1=\ln r+i3\pi/2$ and $\ln z_2=\ln r-i\pi/2$.
Considering a particular branch, we can easily establish that the logarithmic function is analytic, and we can calculate its derivative:
$$\left(\ln z\right)'=\frac{dw}{dz}=\frac{1}{dz/dw}=\frac{1}{\left(e^w\right)'}=\frac{1}{e^w}=\frac{1}{z}.\qquad(2.38)$$
Again, the formula looks the same as for the real logarithm.
The sine and cosine functions of a complex variable are defined from the Euler-like equations (2.33) generalised to any complex number $z$, i.e.
$$\sin z=\frac{1}{2i}\left(e^{iz}-e^{-iz}\right)\quad\text{and}\quad \cos z=\frac{1}{2}\left(e^{iz}+e^{-iz}\right).\qquad(2.39)$$
The trigonometric functions satisfy all the usual properties of the sine and cosine functions of the real variable. First of all, if $z$ is real, these definitions become Euler's formulae and hence give us the usual sine and cosine. Then we see that for a complex $z$ the sine is an odd while the cosine is an even function, e.g.
$$\sin(-z)=\frac{1}{2i}\left(e^{-iz}-e^{iz}\right)=-\frac{1}{2i}\left(e^{iz}-e^{-iz}\right)=-\sin z.$$
Similarly one can establish various trigonometric identities for the sine and cosine, and the manipulations are similar to those in Problem 2.33. For instance, consider
$$\sin^2z+\cos^2z=\left[\frac{1}{2i}\left(e^{iz}-e^{-iz}\right)\right]^2+\left[\frac{1}{2}\left(e^{iz}+e^{-iz}\right)\right]^2$$
$$=-\frac{1}{4}\left(e^{2iz}-2+e^{-2iz}\right)+\frac{1}{4}\left(e^{2iz}+2+e^{-2iz}\right)=\frac{1}{2}+\frac{1}{2}=1,$$
as expected. It is also obvious that since the sine and cosine functions are composed
as a linear combination of the analytic exponential functions, they are analytic.
Finally, their derivatives are given by the same formulae as for the real variable
sine and cosine.
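These statements can again be checked numerically; a brief sketch (ours) compares the definitions (2.39) with the library implementations and verifies $\sin^2z+\cos^2z=1$ at a complex point:

```python
import cmath

z = 0.8 - 1.5j

sin_def = (cmath.exp(1j * z) - cmath.exp(-1j * z)) / 2j   # Eq. (2.39)
cos_def = (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2
print(abs(sin_def - cmath.sin(z)), abs(cos_def - cmath.cos(z)))  # ~1e-16

print(abs(cmath.sin(z) ** 2 + cmath.cos(z) ** 2 - 1))            # ~1e-16
```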
Problem 2.39. Express the complex numbers below in the form $u+iv$:
$$\sin i;\qquad \cos i;\qquad \sin\frac{1+i}{1-i};\qquad \cos\left(\frac{2-i}{2+i}-\frac{3-i}{3+i}\right).$$
[Answers: $i\sinh(1)$; $\cosh(1)$; $i\sinh(1)$; $\cos\frac{1}{5}\cosh\frac{1}{5}-i\sin\frac{1}{5}\sinh\frac{1}{5}$.]
Problem 2.40. Prove that generally:
$$\cos(2z)=\cos^2z-\sin^2z;\qquad \sin(4z)=4\sin z\cos^3z-4\sin^3z\cos z;$$
$$\sin(z_1\pm z_2)=\sin z_1\cos z_2\pm\cos z_1\sin z_2;\qquad (\sin z)'=\cos z;\qquad (\cos z)'=-\sin z.$$
Problem 2.42. Prove directly (by checking derivatives of their real and imag-
inary parts) that both sine and cosine functions are analytic.
Problem 2.43. Prove the following identities using the definitions of the corresponding functions:
$$\cos(iz)=\cosh z;\quad \sin(iz)=i\sinh z;\quad \sinh(iz)=i\sin z;\quad \cosh(iz)=\cos z.\qquad(2.40)$$
Problem 2.44. Prove that
$$\coth x-i\cot y=\frac{\sin(y-ix)}{\sin y\,\sinh x}.$$
The last point which needs investigation is to determine which $z$ points give the same values of the sine and cosine functions. This is required for selecting such domains of $z$ in $\mathbb{C}$ where the trigonometric functions are single-valued and hence where their inverse functions can be defined.
Let us start from the sine function: the equation $\sin z_1=\sin z_2$ leads to either
$$x_1=x_2+2\pi k\ \ (\text{with}\ y_1=y_2),$$
or/and
$$x_1+x_2=\pi+2\pi k\ \ (\text{with}\ y_1=-y_2),$$
where $k$ is an integer.
The first expression reflects the periodicity of the sine function along the real axis with the period of $2\pi$; note that this is entirely independent of the imaginary part of $z$. This gives us vertical stripes (along the $y$ axis) of the width $2\pi$ within which the sine function is single-valued. The second condition is trickier. It is readily seen that if $z$ is contained inside the vertical stripe $-\pi/2<\operatorname{Re}(z)<\pi/2$, then no additional solutions (or relationships between $x_1$ and $x_2$) come out of this extra condition. Indeed, it is sufficient to consider the case of $k=0$ because of the mentioned periodicity. Then, we have the condition $x_1+x_2=\pi$. If both $x_1$ and $x_2$ are positive, this identity will never be satisfied for both $x_1$ and $x_2$ lying between $0$ (including) and $\pi/2$ (excluding). Similarly, if both $x_1$ and $x_2$ were negative, then this condition will not be satisfied if both $x_1$ and $x_2$ lie between $-\pi/2$ and $0$. Finally, if $x_1$ and $x_2$ are of different sign, then the condition $x_1+x_2=\pi$ is not satisfied at all if both of them are contained between $-\pi/2$ and $\pi/2$. Basically, the conclusion is that no identical values of the sine function are found if $z$ is contained inside the vertical stripe $-\pi/2<\operatorname{Re}(z)<\pi/2$, as required.
Problem 2.46. Show similarly that the equation $\cos z_1=\cos z_2$ has the solution of either $x_1=x_2+2\pi k$ ($k$ is an integer) and/or $x_1+x_2=2\pi k$. Therefore, one may choose the vertical stripe $0\le\operatorname{Re}(z)<\pi$ to avoid identical values of the cosine function.
Since the cosine and sine functions were generalised from their definitions given for real variables, it makes perfect sense to define the tangent and cotangent functions accordingly:
$$\tan z=\frac{\sin z}{\cos z}\quad\text{and}\quad \cot z=\frac{\cos z}{\sin z}.$$
The inverse functions are expressed via the logarithm; for instance, writing $z=\sin w$ and solving the resulting quadratic equation for $e^{iw}$, one obtains $w=\arcsin z=-i\ln\left(iz+\sqrt{1-z^2}\right)$. Since the logarithm is a multi-valued function, so is the arcsine. Also, here we do not need to write $\pm$ before the root, since here the root is understood as a multi-valued function.
The general power function is defined via the exponential and the logarithm:
$$w=z^c=e^{c\ln z}.\qquad(2.45)$$
Writing $c=\alpha+i\beta$ and $\ln z=\ln r+i\left(\theta+2\pi k\right)$, we have
$$w=z^c=e^{R+i\Psi},$$
where $R=\alpha\ln r-\beta\left(\theta+2\pi k\right)$ and $\Psi=\beta\ln r+\alpha\left(\theta+2\pi k\right)$. If $\alpha=n$ is an integer (and $\beta=0$), then
$$z^n=r^ne^{in(2\pi k+\theta)}=r^ne^{in\theta},$$
i.e. we obtain our previous single-valued result of Sect. 2.3.1. Similarly, in the $n$-root case, i.e. when $\alpha=1/n$ with $n$ being an integer, we obtain
$$z^{1/n}=r^{1/n}e^{i(2\pi k+\theta)/n}=\sqrt[n]{r}\,\exp\left(i\frac{2\pi}{n}k+i\frac{\theta}{n}\right),$$
which is the same result as we obtained earlier in Sect. 2.3.2. Further, if we now consider a rational power $\alpha=n/m$ with both $n$ and $m$ being integers ($m\neq0$), then
$$z^{n/m}=r^{n/m}\exp\left(i\frac{2\pi n}{m}k+i\frac{n}{m}\theta\right)=\left[r^{1/m}\exp\left(i\frac{2\pi}{m}k+i\frac{\theta}{m}\right)\right]^n$$
and it coincides with the function $\left(\sqrt[m]{z}\right)^n$, as expected. Hence the definition (2.45) indeed generalises our previous definitions of the power function.
Now we shall consider some examples of calculating the general power function. Let us calculate $i^i=\exp(i\ln i)$. Since $\ln i=i\left(\pi/2+2\pi k\right)$, then $i^i=\exp\left(-\pi/2-2\pi k\right)$, i.e. the result is a real number which, however, depends on the branch (the value of $k$) used to define the logarithm.
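For instance (our sketch), Python's principal-branch complex power reproduces the $k=0$ value, and the other branches follow from the formula above:

```python
import math

print((1j) ** 1j)                                        # (0.20788...+0j)
for k in (-1, 0, 1):
    print(k, math.exp(-math.pi / 2 - 2 * math.pi * k))   # k = 0 matches the above
```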
2.4.1 Definition

The integral of a function $f(z)$ of a complex variable along a curve (contour) $L$ is defined, similarly to the real line integral, as the limit of the integral sum
$$\int_Lf(z)\,dz=\lim_{\lambda\to0}\sum_{k=0}^{n-1}f(\zeta_k)\left(z_{k+1}-z_k\right),$$
where the points $z_k$ divide the curve, $\zeta_k$ is an arbitrary point on the curve between $z_k$ and $z_{k+1}$, and $\lambda$ is the maximum distance $|z_{k+1}-z_k|=\sqrt{(x_{k+1}-x_k)^2+(y_{k+1}-y_k)^2}$ between any two adjacent points on the curve. The limit $\lambda\to0$ means that the distances between any two adjacent points become smaller and smaller in the limit (and correspondingly the number of division points $n\to\infty$). It is clear that if the limit exists, then the choice of the internal points $\zeta_k$ is not important.
We observe that this definition is very close to the definition of the line integral of a vector field (of the second kind), Sect. I.6.3.2. Indeed, let $f(z)=u(x,y)+iv(x,y)$, $z_k=x_k+iy_k$ and $\zeta_k=\alpha_k+i\beta_k$; then
$$\sum_{k=0}^{n-1}f(\zeta_k)\left(z_{k+1}-z_k\right)=\sum_{k=0}^{n-1}\left(u_k+iv_k\right)\left[(x_{k+1}-x_k)+i(y_{k+1}-y_k)\right]$$
$$=\sum_{k=0}^{n-1}\left[u_k(x_{k+1}-x_k)-v_k(y_{k+1}-y_k)\right]+i\sum_{k=0}^{n-1}\left[v_k(x_{k+1}-x_k)+u_k(y_{k+1}-y_k)\right]$$
$$=\sum_{k=0}^{n-1}\left[u_k\,\Delta x_k-v_k\,\Delta y_k\right]+i\sum_{k=0}^{n-1}\left[v_k\,\Delta x_k+u_k\,\Delta y_k\right],$$
where $u_k$ and $v_k$ are calculated at the points $(\alpha_k,\beta_k)$. In the $\lambda\to0$ limit the two sums become two real line integrals:
$$\int_Lf(z)\,dz=\int_L\left(u\,dx-v\,dy\right)+i\int_L\left(v\,dx+u\,dy\right).\qquad(2.47)$$
This result shows that the problem of calculating the integral on the complex plane can in fact be directly related, if needed, to calculating two real line integrals in the $x$–$y$ plane. If these two integrals exist (i.e. $u(x,y)$ and $v(x,y)$ are piecewise continuous and bounded in absolute value), then the complex integral also exists and is well defined.
In practice, complex integrals are calculated by using a parametric representation of the contour $L$. Let $x=x(t)$ and $y=y(t)$ (or $z=x(t)+iy(t)=z(t)$) define the curve $L$ via a parameter $t$. Then $dx=x'(t)dt$ and $dy=y'(t)dt$, so that we obtain
$$\int_Lf(z)\,dz=\int_L\left[\left(ux'-vy'\right)+i\left(vx'+uy'\right)\right]dt=\int_L\left(u+iv\right)\left(x'+iy'\right)dt=\int_Lf(z(t))\,z'(t)\,dt.\qquad(2.48)$$
Example 2.3. As an example, let us integrate the function $f(z)=1/(z-z_0)$ around a circle of radius $R$ centred at the point $z_0=x_0+iy_0$ in the anti-clockwise direction, see Fig. 2.17. Here the parameter can be chosen as the polar angle $\phi$, since the points $z$ on the circle can easily be related to $\phi$ via
$$z(\phi)=z_0+Re^{i\phi}.\qquad(2.49)$$
Indeed, if the circle were centred at the origin, then we would have $x(\phi)=R\cos\phi$ and $y(\phi)=R\sin\phi$, i.e. $z=Re^{i\phi}$; however, once the circle is shifted by $z_0$, we add $z_0$, which is exactly Eq. (2.49).
Therefore, we get
$$\oint_{\text{circle}}\frac{dz}{z-z_0}=\begin{vmatrix} z=z_0+Re^{i\phi}\\ dz=z'(\phi)\,d\phi=iRe^{i\phi}\,d\phi\end{vmatrix}=\int_0^{2\pi}\frac{iRe^{i\phi}\,d\phi}{Re^{i\phi}}=i\int_0^{2\pi}d\phi=2\pi i.$$
The absolute value of a contour integral can also be usefully estimated from above:
$$\left|\int_Lf(z)\,dz\right|\le\max_{z\in L}\left|f(z)\right|\,l,\qquad(2.51)$$
where
$$l=\int_L|dz|=\int_L\sqrt{dx^2+dy^2}=\int_L\sqrt{x'(t)^2+y'(t)^2}\,dt$$
is the length of the curve $L$, specified with the parameter $t$; compare with Sect. I.6.3.1.
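Formula (2.48) translates directly into a numerical quadrature; the sketch below (ours; the midpoint rule and the number of panels are arbitrary choices) reproduces the $2\pi i$ of Example 2.3:

```python
import cmath

def contour_integral(f, z, dz, n=2000):
    # Midpoint rule for the parametric integral (2.48) with t in [0, 2*pi]
    h = 2 * cmath.pi / n
    return sum(f(z((k + 0.5) * h)) * dz((k + 0.5) * h) * h for k in range(n))

z0, R = 1.0 + 2.0j, 0.7
z = lambda t: z0 + R * cmath.exp(1j * t)        # the circle (2.49)
dz = lambda t: 1j * R * cmath.exp(1j * t)       # z'(t)
print(contour_integral(lambda p: 1 / (p - z0), z, dz))   # ~ 6.2832j = 2*pi*i
```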
Problem 2.51. Show that $\int_Ldz/z^2=2/R$, where $L$ is the upper semicircle of radius $R$ centred at the origin, traversed from the positive direction of the $x$ axis to the negative one. At the same time, demonstrate that the same result is obtained when the integral is taken along the lower semicircle traversed in the negative $x$ direction (i.e. connecting the same initial and final points).
Theorem 2.2 (Due to Cauchy). If the function $f(z)$ is analytic in some simply connected³ region $D$ of $\mathbb{C}$ and has a continuous derivative⁴ everywhere in $D$, then for any contour $L$ lying in $D$ and starting and ending at the points $z_A=A(x_A,y_A)$ and $z_B=B(x_B,y_B)$, the integral $\int_Lf(z)\,dz$ has the same value, i.e. the integral does not depend on the actual path, but only on the initial and final points.
Proof. Indeed, consider both real integrals in formula (2.47) and let us check whether these two integrals satisfy the conditions of Theorem I.6.5. The region $D$ is simply connected; we hence only need to check whether the condition (2.52) is satisfied in each case. In the first integral we have $P=u$ and $Q=-v$, and hence the required condition (2.52) corresponds to $\partial u/\partial y=-\partial v/\partial x$, which is the second Cauchy–Riemann condition (2.16). Next, in the second integral in (2.47) we instead have $P=v$ and $Q=u$, so that (2.52) becomes $\partial v/\partial y=\partial u/\partial x$, which is the first Cauchy–Riemann condition. Q.E.D.
³ Recall that this was also an essential condition in Theorem I.6.5 dealing with real line integrals.
⁴ In fact, this particular condition can be avoided, although this would make the proof more complex.
Problem 2.52. Prove that the integral over any closed contour $L$ taken inside $D$ is zero:
$$\oint_Lf(z)\,dz=0.\qquad(2.53)$$
This is a corollary to the Cauchy Theorem 2.2. The converse statement is also valid, as is demonstrated by the following Theorem.
Theorem 2.3. If a function $f(z)$ is continuous in a simply connected region $D$ and its integral over any closed contour lying in $D$ is zero, then $f(z)$ is analytic in $D$.
Proof. Since the integral over any closed contour is equal to zero, the integral
$$\int_{z_0}^{z}f(p)\,dp=\int_L\left(u\,dx-v\,dy\right)+i\int_L\left(v\,dx+u\,dy\right)$$
does not depend on the path $L$ connecting the two points, $z_0$ and $z$, but only on the points themselves. There are two line integrals $\int_LP\,dx+Q\,dy$ above, each not depending on the path. Then, from Theorem I.6.5 for line integrals, it follows that in each case $\partial P/\partial y=\partial Q/\partial x$. Applying this condition to each of the two integrals, we immediately obtain the Cauchy–Riemann conditions (2.16) for the functions $u(x,y)$ and $v(x,y)$, which means that indeed the function $f(z)$ is analytic. Q.E.D.
The usefulness of these Theorems can be illustrated by the frequently met integral
$$I(a)=\int_{-\infty}^{\infty}e^{-\beta(x+ia)^2}dx=\int_{-\infty+ia}^{\infty+ia}e^{-\beta z^2}dz,\qquad(2.54)$$
Fig. 2.19 The integration between the points $(\pm\infty,ia)$ along the blue horizontal line $z=x+ia$ ($-\infty<x<\infty$) can alternatively be performed along a different contour consisting of three straight pieces (in purple): a vertical piece going down at $x=-\infty$, then a horizontal line along the $x$ axis and finally again a vertical piece to connect up with the final point
where $a$ is a real number (for definiteness, we assume that $a\ge0$). The integration here is performed along the horizontal line in the complex plane between the points $z_\pm=\pm\infty+ia$, crossing the imaginary axis at the number $ia$. The integrand $f(z)=e^{-\beta z^2}$ does not have any singularities, so any region in $\mathbb{C}$ is simply connected. Hence, the integration line can be replaced by a three-piece contour connecting the same initial and final points, as shown in Fig. 2.19 in purple. The integrals along the vertical pieces, where $x=\pm\infty$, are equal to zero. Indeed, consider the integral over the right vertical piece for some finite $x=R$, the $R\to\infty$ limit being assumed at the end. There
$$\left|\int_a^0e^{-\beta(R+iy)^2}\,i\,dy\right|\le\int_0^a\left|e^{-\beta\left(R^2-y^2\right)}e^{-2i\beta Ry}\right|dy=e^{-\beta R^2}\int_0^ae^{\beta y^2}dy=Me^{-\beta R^2},$$
where $M$ is some positive finite number corresponding to the value of the integral $\int_0^ae^{\beta y^2}dy$. It is seen from here that, since $e^{-\beta R^2}\to0$ as $R\to\infty$, the integral tends to zero.
zero. Similarly it is shown that the integral over the vertical piece at x D R ! 1
is also zero. Therefore, it is only necessary to perform integration over the horizontal
x axis between 1 and C1. Effectively it appears that the original horizontal
contour at z D ia can be shifted down (up if a < 0) to coincide with the x axis, in
which case the integration is easily performed as explained in Sect. 4.2:
Z 1 Z 1 r
2 2
I.a/ D eˇ.xCia/ dx D eˇx dx D : (2.55)
1 1 ˇ
So, the integral (2.54) does not actually depend on the value of a.
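This independence of $a$ is easy to confirm numerically. A sketch (ours; it assumes scipy is available and integrates the real and imaginary parts separately with quad; $\beta$ and the sample values of $a$ are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

beta = 2.0

def I(a):
    f = lambda x: np.exp(-beta * (x + 1j * a) ** 2)
    re, _ = quad(lambda x: f(x).real, -np.inf, np.inf)
    im, _ = quad(lambda x: f(x).imag, -np.inf, np.inf)
    return re + 1j * im

print(np.sqrt(np.pi / beta))        # 1.2533...
for a in (0.0, 0.3, 0.7):
    print(a, I(a))                  # the same value for every a
```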
The importance of the region $D$ being simply connected can be illustrated by our Example 2.3: the contour there is taken around the point $z_0$ at which the function $f(z)=1/(z-z_0)$ is singular. Because of the singularity, the region inside the circular contour is not simply connected: the singular point has to be cut out. The latter can be done by drawing a small circle around $z_0$ and removing all points inside that circle. Thus, the region where the integration contour passes has a hole and hence two boundaries: one is the circle itself and the other is related to the small circle used to cut out the point $z_0$. That is why the integral is not zero. However, if the integral were taken around any contour which does not have the point $z_0$ inside it, e.g. the contour $L'$ shown in Fig. 2.18, then the integral would be zero according to the Cauchy Theorem 2.2.
Since the integral of an analytic function only depends on the starting and ending points of the contour, we may also indicate this explicitly:
$$\int_Lf(z)\,dz=\int_{z_0}^{z}f(z)\,dz.$$
This notation now looks indeed like the one used for a real one-dimensional definite integral, and this similarity is established even more strongly by the following two theorems, which provide us with a formula that is practically exactly the same as the main formula of the integral calculus of Sect. I.4.3.
Theorem 2.4. If $f(z)$ is analytic in a simply connected region $D$, then the function $F(z)=\int_{z_0}^{z}f(p)\,dp$, considered as a function of its upper limit, is also analytic in $D$.
Proof. Let us write the integral explicitly via the real and imaginary parts of the function $f(z)=u+iv$, see Eq. (2.47). However, since we know that both line integrals do not depend on the choice of the path, we can run them along the special path⁵ $(x_0,y_0)\to(x,y_0)\to(x,y)$, i.e. we first move along the $x$ and then along the $y$ axis (cf. Sect. I.6.3.4 and especially Fig. I.6.20, as, indeed, we have done this before!). In this case the real $U(x,y)$ and imaginary $V(x,y)$ parts of $F(z)$ are, respectively:
$$U(x,y)=\operatorname{Re}F(z)=\int_L\left(u\,dx-v\,dy\right)=\int_{x_0}^{x}u(\xi,y_0)\,d\xi-\int_{y_0}^{y}v(x,\eta)\,d\eta,$$
⁵ Note that according to Theorem 2.2 the actual path is not important as long as it lies fully inside the simply connected region $D$.
$$V(x,y)=\operatorname{Im}F(z)=\int_L\left(v\,dx+u\,dy\right)=\int_{x_0}^{x}v(\xi,y_0)\,d\xi+\int_{y_0}^{y}u(x,\eta)\,d\eta.$$
Now, let us calculate all the partial derivatives to check whether $F(z)$ is analytic. We start with $\partial U/\partial x$:
$$\frac{\partial U}{\partial x}=u(x,y_0)-\int_{y_0}^{y}\frac{\partial v(x,\eta)}{\partial x}\,d\eta=u(x,y_0)+\int_{y_0}^{y}\frac{\partial u(x,\eta)}{\partial\eta}\,d\eta$$
$$=u(x,y_0)+\left[u(x,y)-u(x,y_0)\right]=u(x,y),\qquad(2.58)$$
where we replaced $\partial v/\partial x$ with $-\partial u/\partial\eta$ (using $\eta$ for the second variable) because the function $f(z)$ is analytic and hence satisfies the conditions (2.16). A similar calculation yields
$$\frac{\partial V}{\partial x}=v(x,y_0)+\int_{y_0}^{y}\frac{\partial u(x,\eta)}{\partial x}\,d\eta=v(x,y_0)+\int_{y_0}^{y}\frac{\partial v(x,\eta)}{\partial\eta}\,d\eta$$
$$=v(x,y_0)+\left[v(x,y)-v(x,y_0)\right]=v(x,y).\qquad(2.59)$$
In the same manner one obtains
$$\frac{\partial U}{\partial y}=-v(x,y)\quad\text{and}\quad \frac{\partial V}{\partial y}=u(x,y).$$
Hence,
$$\frac{\partial U}{\partial x}=\frac{\partial V}{\partial y}\quad\text{and}\quad \frac{\partial U}{\partial y}=-\frac{\partial V}{\partial x},$$
i.e. the Cauchy–Riemann conditions (2.16) are indeed satisfied for $F(z)$, as required. Q.E.D.
Theorem 2.5. Under the same conditions, $F'(z)=f(z)$, i.e. $F(z)$ serves as an antiderivative of $f(z)$.
Proof. Since we have proven in the previous theorem that the function $F(z)$ is analytic, its derivative $F'(z)$ does not depend on the direction in which it is taken. If we take it, say, along the $x$ axis, then, as follows from Eqs. (2.58) and (2.59),
$$F'(z)=\left(U+iV\right)'=\frac{\partial U}{\partial x}+i\frac{\partial V}{\partial x}=u(x,y)+iv(x,y)=f(z),$$
as required. Q.E.D.
Similarly to the case of real integrals, we can establish a simple formula for calculating complex integrals. Indeed, it is easy to see that different functions $F(z)$, all satisfying the relation $F'(z)=f(z)$, may only differ by a constant. Indeed, suppose there are two such functions, $F_1(z)$ and $F_2(z)$, i.e. $F_1'(z)=F_2'(z)=f(z)$. Consider $F=F_1-F_2$, which has zero derivative: $F'=F_1'-F_2'=f-f=0$. If $F(z)$ were a real function, then it would be obvious that it is a constant. In our case $F=U+iV$ is in general complex, consisting of two real functions, and hence a proper consideration is needed. Because the derivative can be calculated along any direction, we can write for the real, $U$, and imaginary, $V$, parts of the function $F$ the following equations:
$$\frac{dF}{dz}=\frac{\partial U}{\partial x}+i\frac{\partial V}{\partial x}=0\;\Longrightarrow\;\frac{\partial U}{\partial x}=0\ \text{and}\ \frac{\partial V}{\partial x}=0,$$
and
$$\frac{dF}{dz}=-i\left(\frac{\partial U}{\partial y}+i\frac{\partial V}{\partial y}\right)=0\;\Longrightarrow\;\frac{\partial U}{\partial y}=0\ \text{and}\ \frac{\partial V}{\partial y}=0,$$
from which it is clear that $U$ and $V$ can only be constants, i.e. $F(z)=C$, where $C$ is a complex number. This means that the two functions $F_1$ and $F_2$ may only differ by a complex constant, and therefore one can write
$$\int_{z_0}^{z_1}f(z)\,dz=F(z_1)+C$$
Z z1
f .z/dz D F .z1 / C C
z0
with the constant C defined immediately by setting z1 D z0 . Indeed, in this case the
integral is zero and hence C D F .z0 /, which finally gives
Z z1
f .z/dz D F .z1 / F .z0 / ; (2.60)
z0
which does indeed coincide with the main result of real integral calculus (Eq. (I.4.43), the Newton–Leibniz formula, in Sect. I.4.3). The function $F(z)$ may also be called an indefinite integral. This result enables the calculation of complex integrals using methods identical to those used in real calculus, such as integration by parts, change of variables, etc. Many formulae of real calculus for simple integrals can also be directly applied here. Indeed, since the expressions for the derivatives of all elementary functions in $\mathbb{C}$ coincide with those of the functions of a real variable, we can immediately write (assuming the functions in question are defined in a simply connected region):
$$\int e^z\,dz=e^z+C,\qquad \int\sin z\,dz=-\cos z+C,\qquad \int\cos z\,dz=\sin z+C$$
and so on.
Problem 2.54. Consider the integral $I=\int_A^Bz^2\,dz$ between the points $A(1,0)$ and $B(0,1)$ using several methods: (i) along the straight line $AB$; (ii) along the quarter of a circle connecting the two points; (iii) going first from $A$ to the centre $O(0,0)$, and then from $O$ to $B$; and (iv) using directly Eq. (2.60) and finding the appropriate function $F(z)$ for which $F'=z^2$. [Answer: in all cases $I=-(1+i)/3$ and $F(z)=z^3/3$.]
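Parts (i) and (ii) of the problem can be checked by the same quadrature idea as before; in this sketch (ours) both discretised paths give $-(1+i)/3$:

```python
import cmath

def path_integral(f, z, dz, n=4000):
    # Midpoint rule for int_0^1 f(z(t)) z'(t) dt
    h = 1.0 / n
    return sum(f(z((k + 0.5) * h)) * dz((k + 0.5) * h) * h for k in range(n))

f = lambda z: z * z
# (i) straight line from A = 1 to B = i
line = path_integral(f, lambda t: 1 + t * (1j - 1), lambda t: 1j - 1)
# (ii) quarter circle z(t) = exp(i*pi*t/2)
arc = path_integral(f, lambda t: cmath.exp(1j * cmath.pi * t / 2),
                    lambda t: 0.5j * cmath.pi * cmath.exp(1j * cmath.pi * t / 2))
print(line, arc, -(1 + 1j) / 3)     # all three agree
```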
Problem 2.55. Show by an explicit calculation that for any integer $n\neq-1$ the integral $\oint_L(z-z_0)^n\,dz=0$, where $L$ is a circle centred at $z_0$.
The Cauchy theorem above was proven for simply connected regions. We can now generalise this result to multiply connected regions such as, e.g., the ones shown in Fig. 2.2. To this end, let us consider a region $D$ shown in Fig. 2.20(a) which has two holes in it. If we calculate the closed-loop integral $\oint_Lf(z)\,dz$ for some analytic function $f(z)$, it would not in general be zero, since $f(z)$ is not analytic where the holes are and hence our region is not simply connected. This is perfectly illustrated by Example 2.3, where a non-zero value of the integral around the singularity $z_0$ was found. Therefore, in those cases the Cauchy theorem has to be modified.
The required generalisation can easily be made by constructing an additional path which goes around all the "forbidden" regions, as shown in Fig. 2.20(b). In this case we make two cuts to transform our region into a simply connected one; then the integral will be zero over the whole closed loop:
$$\int_Lf(z)\,dz+\int_{L_1}f(z)\,dz+\int_{L_2}f(z)\,dz+\int_{L_c}f(z)\,dz=0,$$
where $L_1$ is taken around the first "forbidden" region, $L_2$ around the second, and $L_c$ corresponds to the two connecting lines traversed in the opposite directions when connecting $L$ with $L_1$ and $L_1$ with $L_2$. Since we can arbitrarily deform the
Fig. 2.20 (a) The contour $L$ is taken around two "forbidden" regions shown in yellow with red boundaries. (b) The contour $L$ is deformed such that it goes round each of the forbidden regions with sub-contours $L_1$ and $L_2$, both traversed in the clockwise direction in such a way that the "allowed" region is always on the left; the red dashed lines indicate the branch cuts made to turn the region into a simply connected one; (c) the contours $L_1$ and $L_2$ are taken in the opposite direction, so that they traverse the "forbidden" regions anti-clockwise
contour inside the simply connected region without changing the value of the integral (which is zero), we can make sure that the connecting lines in $L_c$ pass very close to each other on both sides of each cut, and hence their contribution will be zero. Therefore, we can write
$$\int_Lf(z)\,dz=-\int_{L_1}f(z)\,dz-\int_{L_2}f(z)\,dz=\int_{L_1'}f(z)\,dz+\int_{L_2'}f(z)\,dz=g_1+g_2,\qquad(2.61)$$
where $g_1$ and $g_2$ are the closed-loop integrals around each of the holes, passed in the opposite (anti-clockwise) direction along $L_1'$ and $L_2'$, as shown in Fig. 2.20(c).
Hence, if a loop $L$ encloses several "forbidden" regions, where $f(z)$ is not analytic, as in Fig. 2.20(a), then
$$\oint_Lf(z)\,dz=\sum_k\oint_{L_k}f(z)\,dz,\qquad(2.62)$$
where the sum is taken over all "forbidden" regions falling inside $L$, and in all cases the integrals are taken in the anti-clockwise direction. One can also write the above formula in an alternative form:
$$\oint_Lf(z)\,dz+\sum_k\oint_{L_k}f(z)\,dz=0,\qquad(2.63)$$
where all the contour integrals over $L$ and any of the $L_k$ are run in such a way that the region $D$ is always on the left (i.e. $L$ is run anti-clockwise and any internal ones, $L_k$, clockwise). Formally this last formula can be written in a form identical to the one we obtained for a simply connected region, Eq. (2.53):
$$\oint_{\mathcal{L}}f(z)\,dz=0,\qquad(2.64)$$
where the loop $\mathcal{L}$ is understood as composed of the loop $L$ itself and all the internal loops $L_k$ which surround any of the "forbidden" regions falling inside $L$. All loops are taken in such a way that the region $D$ is on the left.
It is clear that if the loop $L$ goes around the $k$-th hole many times, each time the value $g_k$ of the corresponding loop integral in Eq. (2.61) is added on the right-hand side, in which case
$$\oint_Lf(z)\,dz=\sum_kn_kg_k=\sum_kn_k\oint_{L_k}f(z)\,dz,\qquad(2.65)$$
where $n_k$ is the number of times the $k$-th hole is traversed. These numbers $n_k$ may also be negative if the traverse is made in the clockwise direction, or zero if no traverse is made at all around the given hole (which happens when the hole is outside $L$). The values $g_k$ do not depend on the loop shape, as within the simply connected region the loop can be arbitrarily deformed, i.e. $g_k$ is a "property" of the function $f(z)$. Thus, it is seen from Eq. (2.65) that the value of the integral with the contour $L$ taken inside a multiply connected region with the "forbidden" regions inside $L$ may take many values, i.e. it is inherently multi-valued. Formulae (2.64) and (2.65) are known as the Cauchy theorem for a multiply connected region.
Example 2.4. To illustrate this very point, it is instructive to consider a contour integral of $f(z)=1/z$ between two points $z_0\neq0$ and $z\neq0$. We expect that for any path connecting the points $z_0$ and $z$ and not looping around the $z=0$ point, as, e.g., is the path $L$ shown in Fig. 2.21 by the solid line, the integral is related to the logarithm:
$$\int_{z_0}^{z}\frac{dz'}{z'}=\ln z-\ln z_0\qquad(2.66)$$
(since $(\ln z)'=1/z$). However, if the path loops around the branch point $z=0$ along the way, as does the path $L_1$ shown in the same figure by the dashed line, then the result must be different. Indeed, the path $L_1$ can be split into two parts: the first one, $z_0\to z_1\to z$, which goes directly between the initial and ending points, and the second one, which is the loop itself (passed in the clockwise direction). The first part should give the same result (2.66) as for $L$, as the path $z_0\to z_1\to z$ can be obtained by deforming $L$ while remaining all the time within the simply connected region (this can be done by making a branch cut going, e.g., from $z=0$ along the positive $x$ direction as shown in Fig. 2.10). Concerning the loop integral around $z=0$, it can also be arbitrarily deformed; the result will not depend on the actual shape. In Fig. 2.22(a) two loops are shown: an arbitrarily shaped loop $L$ and a circle $L_o$. We change the direction on the circle and run a branch cut along the positive direction of the $x$ axis as in Fig. 2.22(b); then we connect the two loops by straight horizontal lines $L_c$ running on both sides of the cut. They run in the opposite directions and hence do not contribute to the integral. However, since the whole contour $L+L_c-L_o$ lies entirely in the simply connected region, the Cauchy theorem applies, and hence the result must be zero. Considering that the path $L_c$ does not contribute, we have that
$$\int_L\frac{dz'}{z'}+\int_{-L_o}\frac{dz'}{z'}=0\;\Longrightarrow\;\int_L\frac{dz'}{z'}=-\int_{-L_o}\frac{dz'}{z'}=\int_{L_o}\frac{dz'}{z'},$$
i.e. the two loop integrals in Fig. 2.22(a) are indeed the same. This means that the integral over the loop in Fig. 2.21 can be replaced with one in which the contour is a circle of any radius. We have already looked at this problem in Example 2.3 for some $z_0$ and found the value of $2\pi i$ for the integral taken over a single loop going in the anti-clockwise direction; incidentally, we found that the result indeed does not depend on $R$ (as it should, since changing the radius simply corresponds to a deformation of the contour). Hence, for the contour $L_1$ shown in Fig. 2.21, the result will be
$$\int_{z_0}^{z}\frac{dz'}{z'}=\ln z-\ln z_0-2\pi i,$$
where $-2\pi i$, which is the contribution from the loop, appeared with the minus sign due to the clockwise direction of its traverse. Obviously, we can loop around the branch point in either direction and many times, so that the general result for any contour is
$$\int_{z_0}^{z}\frac{dz'}{z'}=\ln z-\ln z_0+i2\pi k,\qquad(2.67)$$
where $k=0,\pm1,\pm2,\ldots$. We see that the integral is indeed equal to the multi-valued logarithmic function, compare with Eq. (2.37), and the different branches of the logarithm are related directly to the contour chosen.
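The integer $k$ in (2.67) is simply the number of (signed) loops the path makes around $z=0$; the sketch below (ours) confirms this by integrating $1/z'$ along the unit circle traversed $k$ times:

```python
import cmath

def winding_integral(k_turns, n=20000):
    # int dz'/z' along |z'| = 1 traversed k_turns times (negative = clockwise)
    h = 2 * cmath.pi * k_turns / n
    total = 0j
    for j in range(n):
        t = (j + 0.5) * h
        total += (1 / cmath.exp(1j * t)) * 1j * cmath.exp(1j * t) * h
    return total

for k in (-2, -1, 1, 3):
    print(k, winding_integral(k) / (2j * cmath.pi))   # recovers k
```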
Problem 2.56. Using the substitution $t=\tan(x/2)$ from Sect. I.4.4.4, show that for $g>1$
$$\int_{-\pi/2}^{\pi/2}\frac{dx}{g-\sin x}=\int_{-\pi/2}^{\pi/2}\frac{dx}{g+\sin x}=\frac{\pi}{\sqrt{g^2-1}}.\qquad(2.68)$$
Analytic functions have still more very interesting properties. We shall now prove a famous result: the value of an analytic function $f(z)$ at some point $z$ of a multiply connected region $D$ is determined by its values on any closed contour surrounding the point $z$; in particular, this could be the boundary of the region $D$. For a multiply connected region this boundary includes both the external loop and all the internal loops surrounding the "forbidden" regions.
Theorem 2.6 (Due to Cauchy). Let the function $f(z)$ be analytic inside some multiply connected region $D$. Then for any contour $\mathcal{L}$ surrounding the point $z$ and lying inside $D$, we have
$$f(z)=\frac{1}{2\pi i}\oint_{\mathcal{L}}\frac{f(p)\,dp}{p-z},\qquad(2.69)$$
where $\mathcal{L}$ contains the loop $L$ and all the internal loops $\{L_k\}$ which surround any holes ("forbidden" regions) lying inside $L$. Note the direction of the traverse of the outside loop $L$ and any of the internal loops: the "allowed" points of $D$ should always remain on the left.
Proof. The integrand $f(p)/(p-z)$ is analytic everywhere inside $\mathcal{L}$ except at the point $p=z$, which we surround by a small circle $C_r$ of radius $r$ centred at $z$. Then, by the Cauchy theorem (2.64) for a multiply connected region,
$$\oint_{\mathcal{L}}\frac{f(p)}{p-z}\,dp=\oint_{C_r}\frac{f(p)}{p-z}\,dp,$$
where $\mathcal{L}$ is a composite loop consisting of $L$ and all the internal loops $\{L_k\}$ surrounding the "forbidden" regions inside $L$. Therefore:
$$\oint_{\mathcal{L}}\frac{f(p)}{p-z}\,dp=\oint_{C_r}\frac{f(p)}{p-z}\,dp=\oint_{C_r}\frac{f(p)-f(z)}{p-z}\,dp+f(z)\oint_{C_r}\frac{dp}{p-z},\qquad(2.70)$$
where both integrals on the right-hand side are now taken in the anti-clockwise direction. The second integral we have calculated in Example 2.3, where we found that it is equal to $2\pi i$. The first integral is equal to zero. Indeed, it can be estimated using the inequality (2.51) as
$$\left|\oint_{C_r}\frac{f(p)-f(z)}{p-z}\,dp\right|\le\max_{\text{circle}}\left|\frac{f(p)-f(z)}{p-z}\right|\,2\pi r$$
$$=2\pi r\max_{\text{circle}}\left|\frac{f\!\left(z+re^{i\phi}\right)-f(z)}{re^{i\phi}}\right|=2\pi\max_{\text{circle}}\left|f\!\left(z+re^{i\phi}\right)-f(z)\right|.$$
Here we have used that on the circle $p=z+re^{i\phi}$. The circle can be continuously deformed without affecting the value of the integral. In particular, we can make it as small as we want. Taking therefore the limit $r\to0$, the difference $\left|f\!\left(z+re^{i\phi}\right)-f(z)\right|$ tends to zero, and hence the above estimate shows that the first circle integral on the right-hand side of (2.70) tends to zero. Therefore, from (2.70) follows the result we set out to prove. Q.E.D.
If we formally differentiate both sides of Eq. (2.69) with respect to $z$, we get a similar result for the derivative of $f(z)$:
$$f'(z)=\frac{1}{2\pi i}\oint_{\mathcal{L}}\frac{f(p)}{(p-z)^2}\,dp.\qquad(2.71)$$
To justify this step we need the following statement. Let $f(\alpha,z)$ converge uniformly on the contour $L$ to a function $F(z)$ as $\alpha\to\alpha_0$, and let $g(z)$ be bounded on $L$, $|g(z)|\le M$. Then
$$\lim_{\alpha\to\alpha_0}\int_Lf(\alpha,z)\,g(z)\,dz=\int_L\left[\lim_{\alpha\to\alpha_0}f(\alpha,z)\right]g(z)\,dz.\qquad(2.73)$$
In other words, one may take the limit sign inside the integral.
Proof. Since the function $f(\alpha,z)$ converges uniformly with respect to $\alpha$, for any $\epsilon>0$ one can find $\delta>0$, not depending on $z$, such that $|\alpha-\alpha_0|<\delta$ implies $|f(\alpha,z)-F(z)|<\epsilon$. Therefore, considering the limit of the integral $\int_Lf(\alpha,z)\,g(z)\,dz$, we can write down an estimate:
$$\left|\int_Lf(\alpha,z)\,g(z)\,dz-\int_LF(z)\,g(z)\,dz\right|=\left|\int_L\left[f(\alpha,z)-F(z)\right]g(z)\,dz\right|$$
$$<\epsilon\int_L|g(z)|\,|dz|<\epsilon M\int_L|dz|=\epsilon Ml=\epsilon',$$
where $l$ is the length of the whole contour $L$ (including all the internal parts). The above inequality proves the property (2.73). Q.E.D.
The Cauchy Theorem 2.6 enables us to express an analytic function at an internal point $z$ via its values on a contour surrounding it. With the help of formula (2.73) just proven, we can extend the theorem to the derivatives of $f(z)$. We shall now show how to express the derivatives of an analytic $f(z)$ via its values on a contour $L$ surrounding the point $z$. Indeed, the first derivative is the limit of the expression $\left(f(z+\Delta z)-f(z)\right)/\Delta z$, which, with the help of Eq. (2.69), can be written as:
$$\frac{f(z+\Delta z)-f(z)}{\Delta z}=\frac{1}{2\pi i\,\Delta z}\left[\oint_L\frac{f(p)}{p-z-\Delta z}\,dp-\oint_L\frac{f(p)}{p-z}\,dp\right]$$
$$=\frac{1}{2\pi i}\oint_Lf(p)\,\frac{1}{\Delta z}\left(\frac{1}{p-z-\Delta z}-\frac{1}{p-z}\right)dp=\frac{1}{2\pi i}\oint_L\frac{f(p)}{\left(p-z-\Delta z\right)\left(p-z\right)}\,dp.$$
Let $d$ be the minimum distance between the point $z$ and the contour $L$, and let $|\Delta z|<\delta$; for small enough $\delta$ one can always ensure that $d_1=d-\delta>0$. Then, to justify the uniform convergence $1/\left(p-z-\Delta z\right)\to1/\left(p-z\right)$, we have to estimate the difference:
$$\left|\frac{1}{p-z-\Delta z}-\frac{1}{p-z}\right|=\left|\frac{\Delta z}{\left(p-z-\Delta z\right)\left(p-z\right)}\right|<\frac{\delta}{d_1d}=\epsilon.$$
It is seen that the estimate is valid for any $p$ from $L$, so that $\delta$ depends only on $\epsilon$ but not on $p$, and this proves the required uniform convergence. Therefore, Eq. (2.73) is applicable, and we can take the limit $\Delta z\to0$ inside the integral, yielding Eq. (2.71).
Problem 2.57. Prove the general result for the $n$-th derivative:
$$f^{(n)}(z)=\frac{n!}{2\pi i}\oint_L\frac{f(p)}{(p-z)^{n+1}}\,dp.\qquad(2.74)$$
[Hint: use induction.]
This result shows that an analytic function $f(z)$ has derivatives of any order, which are also analytic functions.
To avoid cumbersome notation, when using the Cauchy theorem we shall in the following write $L$ instead of $\mathcal{L}$, assuming that all the internal contours $\{L_k\}$ are included as well if there are "forbidden" regions inside $L$.
Similarly to the case of the real calculus, one can consider infinite numerical series
$$z_1+z_2+\cdots=\sum_{k=1}^{\infty}z_k\qquad(2.75)$$
on the complex plane. The series is said to converge to $z$ if for any $\epsilon>0$ one can find a positive integer $N$ such that for any $n\ge N$ the partial sum of the series, $S_n=\sum_{k=1}^{n}z_k$, differs from $z$ by no more than $\epsilon$, i.e. the following inequality holds: $|S_n-z|<\epsilon$. If such an $N$ cannot be found, the series is said to diverge.
It is helpful to recognise that a complex numerical series consists of two real series. Since each term $z_k=x_k+iy_k$ consists of real and imaginary parts, we can write
$$\sum_{k=1}^{\infty}z_k=\sum_{k=1}^{\infty}x_k+i\sum_{k=1}^{\infty}y_k.$$
Therefore, the series (2.75) converges to $z=x+iy$ if and only if the two real series on the right-hand side of the above equation converge to $x$ and $y$, respectively. This fact allows transferring most of the theorems we proved for real series (Sect. I.7.1) to complex numerical series.
Especially useful for us here is the notion of absolute convergence introduced in Sect. I.7.1.4 for real numerical series with terms which may be either positive or negative. We proved there that a general series necessarily converges if the series consisting of the absolute values of the terms of the original series converges. The same type of statement is valid for complex series as well, and it is formulated in the following Theorem.
Theorem 2.8. If the series
$$\sum_{k=1}^{\infty}|z_k|=\sum_{k=1}^{\infty}\sqrt{x_k^2+y_k^2},\qquad(2.76)$$
constructed from the absolute values of the terms of the original series, converges, then so does the original series.
Proof. Indeed, since $|x_k|\le\sqrt{x_k^2+y_k^2}$ and similarly $|y_k|\le\sqrt{x_k^2+y_k^2}$ for any $k$, the series $\sum_{k=1}^{\infty}|x_k|$ and $\sum_{k=1}^{\infty}|y_k|$ will both converge as long as the series (2.76) converges (see Theorem I.7.6). Then, since the real and imaginary series both individually converge, and converge absolutely, so do the original real and imaginary series, $\sum_{k=1}^{\infty}x_k$ and $\sum_{k=1}^{\infty}y_k$, and hence the series (2.75). Q.E.D.
Similarly to absolutely converging real series, absolutely converging complex series can be added, subtracted and/or multiplied with each other; their sum also does not depend on the order of the terms in the series.
The root and ratio tests for the convergence of the series are also valid. Although the proof of the root test remains essentially the same (see Theorem I.7.7), the ratio test proven in Theorem I.7.8 requires some modification due to the different nature of the absolute value $|z|$ of a complex number. We shall therefore sketch the proof of the ratio test here again, adapting it specifically to complex series.
Theorem 2.9 (The Ratio Test). The series (2.75) converges absolutely if
$$\rho=\lim_{n\to\infty}\left|\frac{z_{n+1}}{z_n}\right|<1,$$
and diverges if $\rho>1$.
Proof. Note that $\rho$ is a non-negative number. Since the limit exists, for any $\epsilon>0$ one can always find a number $N$ such that any $n\ge N$ implies
$$\left|\left|\frac{z_{n+1}}{z_n}\right|-\rho\right|<\epsilon\;\Longrightarrow\;\Bigl||z_{n+1}|-\rho|z_n|\Bigr|<\epsilon|z_n|.\qquad(2.77)$$
From the inequality $|a-b|\le|a|+|b|$ (valid also for complex $a$ and $b$) it follows that $|c-b|\ge|c|-|b|$ (where $c=a+b$). Therefore,
$$|z_{n+1}|<(\rho+\epsilon)|z_n|<(\rho+\epsilon)^2|z_{n-1}|<\cdots<(\rho+\epsilon)^n|z_1|,$$
so that
$$\sum_{n=1}^{\infty}|z_n|=\sum_{n=0}^{\infty}|z_{n+1}|<|z_1|\sum_{n=0}^{\infty}(\rho+\epsilon)^n,$$
and the geometric progression on the right-hand side converges only if $q=\rho+\epsilon$ satisfies $0<q<1$. If $\rho<1$, one can always find a positive $\epsilon$ such that $\rho+\epsilon<1$, and hence the series (2.75) converges.
Consider now the case of $\rho>1$. In this case it is convenient to consider the ratio $z_n/z_{n+1}$, which has the definite limit $\tilde\rho=1/\rho<1$. An argument similar to the one given in the previous case then leads to an inequality:
$$\left|\left|\frac{z_n}{z_{n+1}}\right|-\tilde\rho\right|<\epsilon\;\Longrightarrow\;|z_n|-\tilde\rho|z_{n+1}|\le\Bigl||z_n|-\tilde\rho|z_{n+1}|\Bigr|<\epsilon|z_{n+1}|,$$
which yields
$$|z_{n+1}|>\frac{1}{\tilde\rho+\epsilon}\,|z_n|=\frac{\rho}{1+\rho\epsilon}\,|z_n|>\left(\frac{\rho}{1+\rho\epsilon}\right)^2|z_{n-1}|>\cdots>\left(\frac{\rho}{1+\rho\epsilon}\right)^n|z_1|.$$
Since $\rho>1$, one can always find $\epsilon>0$ such that $q=\rho/\left(1+\rho\epsilon\right)>1$. However, since $q^n\to\infty$ when $n\to\infty$, $|z_{n+1}|\to\infty$ as well, and hence the necessary condition for the convergence of the series, provided by Theorem I.7.4, is not satisfied, i.e. the series (2.75) indeed diverges. Q.E.D.
As in the case of the real calculus, nothing can be said about the convergence of the series if $\rho=1$.
Problem 2.58. Prove that the geometric progression $S=\sum_{k=0}^{\infty}q^k$ (where $q$ is a complex number) converges absolutely if $|q|<1$ and diverges if $|q|>1$. Then show that the sum of the series is still formally given by exactly the same expression, $S=1/(1-q)$, as in the real case. [Hint: derive a recurrence relation for the partial sum, $S_N$, and then take the limit $N\to\infty$.]
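A numerical look at the problem (our sketch): for a complex $q$ with $|q|<1$ the partial sums approach $1/(1-q)$ at the geometric rate $|q|^N$:

```python
q = 0.4 + 0.5j                 # |q| ~ 0.64 < 1: absolute convergence
exact = 1 / (1 - q)

s, term = 0j, 1 + 0j
for n in range(1, 61):
    s += term                  # s is now the partial sum S_n
    term *= q
    if n % 20 == 0:
        print(n, abs(s - exact))   # decays roughly like |q|**n
```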
In this section we shall generalise some of the results of Chap. I.7 to complex functions. Most of the results obtained in Chap. I.7 are valid in these cases as well, although there are some differences. We shall mostly be interested here in uniform convergence (cf. Sect. I.7.2.1).
We shall start by considering a functional sequence $f_1(z)$, $f_2(z)$, $f_3(z)$, etc. We know that the sequence $\{f_n(z)\}$ converges uniformly to $f(z)$ if for any $\epsilon>0$ one can find a number $N=N(\epsilon)$ such that any $n\ge N$ implies $|f_n(z)-f(z)|<\epsilon$ for any $z$. We stress again that it is essential that the number $N$ depends exclusively on $\epsilon$, not on $z$, i.e. the same value of $N$ applies to all $z$ from the region $D$ where all the functions are defined; that is why the convergence is called uniform.
Next, consider an infinite functional series
$$f_1(z)+f_2(z)+\cdots=\sum_{n=1}^{\infty}f_n(z).\qquad(2.78)$$
The series (2.78) is said to converge uniformly to $f(z)$ if the functional sequence of its partial sums
$$S_N(z)=\sum_{n=1}^{N}f_n(z)\qquad(2.79)$$
converges uniformly when $N\to\infty$. Most of the theorems of Sect. I.7.2.1 are valid here as well. In particular, if the series converges, its $n$-th term tends to zero as $n\to\infty$ (cf. Theorem I.7.4). Next, if the series converges uniformly to $f(z)$ and the functions $\{f_n(z)\}$ are continuous, then $f(z)$ is continuous as well, which means that (cf. Theorems I.7.16 and I.7.17)
$$\lim_{z\to z_0}\sum_{n=1}^{\infty}f_n(z)=\sum_{n=1}^{\infty}\lim_{z\to z_0}f_n(z)=f(z_0).$$
Further, one can integrate a uniformly converging series (2.78) term-by-term, i.e. (cf. Theorem I.7.18) for any contour $L$ within the region $D$:
$$\int_L\sum_{n=1}^{\infty}f_n(z)\,dz=\sum_{n=1}^{\infty}\int_Lf_n(z)\,dz.$$
The convergence test due to Weierstrass (Theorem I.7.15) is also valid: if each element of the series $f_n(z)$ beyond some number $N$ (i.e. for all $n>N$) satisfies $|f_n(z)|\le\alpha_n$ and the series $\sum_n\alpha_n$ converges, then the series (2.78) converges uniformly. The proofs of all these statements are almost identical to those given in Chap. I.7, so we do not need to repeat them here.
There are also some additional Theorems specific to complex functions which we shall now discuss.
Theorem 2.10. If the series (2.78) converges uniformly to $f(z)$, and all the functions $\{f_n(z)\}$ are analytic in a simply connected region $D$, then $f(z)$ is also analytic in $D$.
Proof. Indeed, since the series converges uniformly for all $z$ from $D$, we can integrate the series term-by-term, i.e. one can write
$$\oint_Lf(z)\,dz=\sum_{n=1}^{\infty}\oint_Lf_n(z)\,dz,$$
where $L$ is an arbitrary closed contour in $D$. Since the functions $f_n(z)$ are analytic, the closed contour integral of any of them is equal to zero (see Problem 2.52). Therefore, the closed contour integral of $f(z)$, from the above equation, is also zero. But this means, according to Theorem 2.3, that $f(z)$ is analytic. Q.E.D.
The next Theorem states that a uniformly converging functional series (2.78) of analytic functions can be differentiated term-by-term any number of times. The situation is much more restrictive in the real calculus (Theorem I.7.19).
Theorem 2.11. If the series (2.78) of functions analytic in $D$ converges uniformly to $f(z)$, then for any positive integer $k$ the series can be differentiated term-by-term $k$ times.
Proof. Consider a closed loop $L$ in $D$, and let us pick a point $z$ inside $L$ and a point $p$ on $L$. Then, since the series (2.78) converges uniformly to $f(z)$ for any $z$, including the points $p$ on the contour $L$, we can write
$$f(p)=\sum_{n=1}^{\infty}f_n(p).$$
Next, we multiply both sides of this equation by $k!/\left[2\pi i\left(p-z\right)^{k+1}\right]$ with some positive integer $k$ and integrate over $L$ (note that the integration can be done term-by-term on the right-hand side, as the series converges uniformly), obtaining
$$\frac{k!}{2\pi i}\oint_L\frac{f(p)\,dp}{(p-z)^{k+1}}=\sum_{n=1}^{\infty}\frac{k!}{2\pi i}\oint_L\frac{f_n(p)\,dp}{(p-z)^{k+1}}.$$
According to the previous theorem, $f(z)$ is analytic. Therefore, we can use formula (2.74) on both sides, which yields
$$f^{(k)}(z)=\sum_{n=1}^{\infty}f_n^{(k)}(z),$$
as required. Q.E.D.
The series
$$\sum_{k=0}^{\infty}c_k(z-a)^k,\qquad(2.80)$$
in which the functions $f_k(z)$ are powers of $z-a$ (where $a$ is also complex) and $c_k$ are some complex coefficients, is called a power series in the complex plane $\mathbb{C}$. Practically all the results of the real calculus we considered before are transferred (with some modifications) to complex power series.
We shall start by stating again Abel's Theorem I.7.20, which we reformulate here for the case of the complex power series.
Theorem 2.12 (Due to Abel). If the power series (2.80) converges at some point $z_0\neq a$, see Fig. 2.23(a), then it converges absolutely within the circle $|z-a|<r$, where $r=|z_0-a|$; moreover, it converges uniformly for any $z$ within a circle $|z-a|\le\rho$, where $\rho=\gamma r$ with $0<\gamma<1$.
Proof. Since the series converges at $z_0$, its terms tend to zero and are therefore bounded, i.e. $\left|c_k(z_0-a)^k\right|\le M$ for all $k$, where $M$ is some positive number. Then, for any $z$ inside the circle of radius $r$, i.e. within a circle $C_\rho$ with radius $\rho=\gamma r$ and $0<\gamma<1$, we can write
$$\left|c_k(z-a)^k\right|=\left|c_k(z_0-a)^k\right|\left|\frac{(z-a)^k}{(z_0-a)^k}\right|=\left|c_k(z_0-a)^k\right|\left|\frac{z-a}{z_0-a}\right|^k<M\kappa^k,$$
where $\kappa=\left|(z-a)/(z_0-a)\right|<\rho/r<1$. Hence, the absolute value of each term of our series is bounded by the elements of the converging geometric progression $M\kappa^k$ with $0<\kappa<1$, and hence, according to the corresponding analogue of the Weierstrass Theorem I.7.15, the series converges absolutely and uniformly within the circle $C_\rho$. Q.E.D.
Problem 2.59. Prove by contradiction that if the series (2.80) diverges at some $z_0\neq a$, then it diverges for any $z$ lying outside the circle $C_r$ of radius $r=|z_0-a|$.
Problem 2.60. Prove that if it is known that the series (2.80) converges at some $z_0\neq a$ and diverges at some $z_1$ (obviously, $|z_1-a|>|z_0-a|$), then there exists a positive $R>0$ such that the series diverges outside the circle $C_R$, i.e. for any $z$ satisfying $|z-a|>R$, and absolutely converges inside $C_R$, i.e. for any $z$ satisfying $|z-a|<R$.
The number $R$ is called the radius of convergence of the series (cf. Sect. I.7.3.1). It follows now from Theorem 2.10 that the series (2.80) is an analytic function inside the circle $C_R$ of its radius of convergence $R$. This in turn means that it can be differentiated and integrated term-by-term any number of times. The series obtained this way have the same radius of convergence. The radius of convergence can be determined from either the ratio or the root test via the following formulae (cf. Sect. I.7.3.1):
$$R=\lim_{n\to\infty}\left|\frac{c_n}{c_{n+1}}\right|\quad\text{and/or}\quad \frac{1}{R}=\overline{\lim_{n\to\infty}}\,\sqrt[n]{|c_n|},\qquad(2.81)$$
where in the latter case the largest of the limiting values of the root is implied.
Problem 2.61. Determine the region of convergence of the power series with the coefficients $c_k=3^k/k$ around the point $a=i$. [Answer: $|z-i|<1/3$.]
Problem 2.62. Determine the region of convergence of the power series with the coefficients $c_k=1/\left(2^k\sqrt{k}\right)$ around the point $a=0$. Does the series converge at the points $z=i$ and $z=3-i$? [Answer: $|z|<2$; yes; no.]
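The first formula in (2.81) is directly usable numerically; this sketch (ours) estimates the radius of convergence for the coefficients of Problem 2.61 from a single large-$n$ ratio:

```python
def radius_by_ratio(c, n=500):
    # R ~ |c_n / c_{n+1}| for large n, the first of the formulae (2.81)
    return abs(c(n) / c(n + 1))

c = lambda k: 3.0 ** k / k          # coefficients of Problem 2.61
print(radius_by_ratio(c))           # ~0.334, i.e. R = 1/3
```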
Consider now a function $f(z)$ analytic inside a circle $C_R$ of radius $R$ centred at the point $a$, a point $z$ inside it, and a circle $C_\rho$ of radius $\rho<R$ centred at $a$ and containing the point $z$; let $p$ be a point on $C_\rho$. Then the inequality
$$|q|=\left|\frac{z-a}{p-a}\right|<1$$
(as the points $z$ and $a$ lie inside $C_\rho$, while $p$ is on it) may be used to form an infinite geometric progression:
$$\sum_{k=0}^{\infty}q^k=\sum_{k=0}^{\infty}\left(\frac{z-a}{p-a}\right)^k=\frac{1}{1-q}.$$
Hence,
$$\frac{1}{p-z}=\frac{1}{(p-a)-(z-a)}=\frac{1}{p-a}\,\frac{1}{1-\frac{z-a}{p-a}}=\frac{1}{p-a}\,\frac{1}{1-q}=\frac{1}{p-a}\sum_{k=0}^{\infty}q^k=\sum_{k=0}^{\infty}\frac{(z-a)^k}{(p-a)^{k+1}},\qquad(2.82)$$
where the series on the right converges uniformly for all $p$ on the circle $C_\rho$.
Therefore, it can be integrated term-by-term. Multiplying both sides of Eq. (2.82) by $f(p)/2\pi i$ and integrating over the circle $C_\rho$, we get
$$\frac{1}{2\pi i}\oint_{C_\rho}\frac{f(p)}{p-z}\,dp=\sum_{k=0}^{\infty}\frac{(z-a)^k}{2\pi i}\oint_{C_\rho}\frac{f(p)}{(p-a)^{k+1}}\,dp=\sum_{k=0}^{\infty}(z-a)^k\left[\frac{1}{2\pi i}\oint_{C_\rho}\frac{f(p)\,dp}{(p-a)^{k+1}}\right].\qquad(2.83)$$
Using now formulae (2.69) and (2.74) for the left- and right-hand sides, respectively, and recalling that $f(p)$ is analytic on $C_\rho$ as the latter lies inside $C_R$, we see that the left-hand side is equal to $f(z)$, while on the right-hand side we have the $k$-th derivative of $f$ at the point $a$. Hence, we finally obtain
$$f(z)=\sum_{k=0}^{\infty}c_k(z-a)^k,\quad\text{where}\quad c_k=\frac{f^{(k)}(a)}{k!}=\frac{1}{2\pi i}\oint_{C_\rho}\frac{f(p)\,dp}{(p-a)^{k+1}},\qquad(2.84)$$
which is the final result. Note that the expansion converges uniformly, since it was obtained by a term-by-term integration of the uniformly converging geometric progression; moreover, the series is an analytic function (Theorem 2.10).
The formula for the series above looks exactly the same as the Taylor formula for real functions (see Sect. I.7.3.3). Hence, since the formulae for the differentiation of all elementary functions on the complex plane are identical to those in the real case, the Taylor expansions of the elementary functions also look identical. For instance, we can immediately write the following expansions around $a=0$:
$$e^z=1+z+\frac{z^2}{2!}+\cdots=\sum_{n=0}^{\infty}\frac{z^n}{n!};\qquad(2.85)$$
$$\sin z=z-\frac{z^3}{3!}+\frac{z^5}{5!}-\frac{z^7}{7!}+\cdots+\frac{(-1)^{n+1}z^{2n-1}}{(2n-1)!}+\cdots=\sum_{n=1}^{\infty}\frac{(-1)^{n-1}z^{2n-1}}{(2n-1)!};\qquad(2.86)$$
$$\cos z=1-\frac{z^2}{2!}+\frac{z^4}{4!}-\cdots+\frac{(-1)^nz^{2n}}{(2n)!}+\cdots=\sum_{n=0}^{\infty}\frac{(-1)^nz^{2n}}{(2n)!};\qquad(2.87)$$
$$\ln(1+z)=z-\frac{z^2}{2}+\frac{z^3}{3}-\cdots+(-1)^{n+1}\frac{z^n}{n}+\cdots=\sum_{n=1}^{\infty}(-1)^{n+1}\frac{z^n}{n};\qquad(2.88)$$
$$(1+z)^{\alpha}=1+\alpha z+\frac{\alpha(\alpha-1)}{2}z^2+\cdots+\frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)}{n!}z^n+\cdots=\sum_{n=0}^{\infty}D_n^{\alpha}z^n,\qquad(2.89)$$
where $\alpha$ is generally complex and the "generalised binomial" coefficients $D_n^{\alpha}$ are given by
$$D_n^{\alpha}=\binom{\alpha}{n}=\frac{\alpha(\alpha-1)(\alpha-2)\cdots(\alpha-n+1)}{n!}\qquad(2.90)$$
(cf. Eq. (I.3.70)). The latter two expansions are written for the single-valued branches of these functions which correspond to the values of $0$ and $1$ of the functions at the point $z=0$, respectively.
Problem 2.63. Show that the radius of convergence of the series (2.85)–(2.87) is $R=\infty$ (i.e. they converge for all $z$).
Problem 2.64. Show that the radius of convergence of the series (2.88) and (2.89) is $R=1$ (i.e. they converge for $|z|<1$).
Problem 2.65. Derive an analogue of the Taylor formula for complex functions (cf. Eq. (I.3.60)),
$$f(z)=\sum_{k=0}^{n}\frac{(z-a)^k}{k!}\,f^{(k)}(a)+R_{n+1},\qquad(2.91)$$
with $R_{n+1}$ being the remainder term, by starting from the finite geometric progression (with its remainder) instead of the infinite one and then applying the method we used above when deriving the Taylor series.
Problem 2.66. Consider the Taylor series of $e^{ix}$ with $x$ real, separate out the even and odd powers of $x$ and hence prove the Euler formulae (2.31).
Exactly in the same way as in Sect. I.7.3.3 it can be shown that any power series of an analytic function $f(z)$ coincides with its Taylor series, i.e. the Taylor series is unique.
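Since (2.85) converges for every $z$, its truncations can serve as a practical approximation; a sketch (ours) comparing partial sums with cmath.exp:

```python
import cmath

def exp_taylor(z, n_terms):
    s, term = 0j, 1 + 0j
    for n in range(n_terms):
        s += term
        term *= z / (n + 1)          # builds z**n / n! incrementally
    return s

z = 2.0 - 3.0j
for n in (5, 10, 20, 40):
    print(n, abs(exp_taylor(z, n) - cmath.exp(z)))   # -> 0 rapidly
```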
The Taylor’s series is useful in expanding the function f .z/ within a circle jz aj <
R where f .z/ is analytic. However, if f .z/ is not analytic at the point z D a, Taylor’s
2.5 Complex Functional Series 187
expansion around this point cannot be applied. As was shown by Laurent6 it is still
possible to expand f .z/ around the point z D a, but in this case the series would not
only contain terms .z a/k with positive powers k, but also terms with negative k as
well, i.e. this, the so-called Laurent series, would have the general form:
1
X
f .z/ D ck .z a/k : (2.93)
kD1
For some functions the series may contain a finite number of terms in the part of the series with positive or negative $k$. This is determined by the character of the point $z=a$ where $f(z)$ has a singularity. We shall postpone considering this particular aspect in more detail until later; now let us derive the Laurent series.
Consider $f(z)$ which is analytic inside a circle of radius $R$ around the point $a$, except at the point $a$ itself, i.e. the function is analytic in the ring $0<|z-a|<R$, see Fig. 2.24. Take now a point $z$ inside the ring formed by a circle $C_r$, surrounding the point $z=a$, and a larger circle $C_\rho$ which encloses the point $z$ but remains inside the circle $C_R$ of radius $R$, i.e. $0<r<\rho<R$. The value of the function $f(z)$ at the point $z$ can then be expressed employing the generalisation (2.64) of the Cauchy theorem, yielding
$$f(z)=\frac{1}{2\pi i}\oint_{C_\rho}\frac{f(p)}{p-z}\,dp-\frac{1}{2\pi i}\oint_{C_r}\frac{f(p)}{p-z}\,dp.\qquad(2.94)$$
Both integrals are taken in the anti-clockwise direction. Note that in the first integral
over C the points p are further away from z, i.e. jp aj > jz aj, and hence
1= .p z/ can be expanded into a geometric progression (2.82) leading to the
Taylor’s series for the first integral in (2.94), i.e.
6
Karl Weierstrass discovered the series 2 years before Laurent, but published his results more than
50 years later.
188 2 Complex Numbers and Functions
I X 1 I
1 f .p/ 1 f .p/dp
dp D ck .z a/k ; where ck D ;
2 i C pz kD0
2 i C .p z/kC1
(2.95)
see Eq. (2.83). Note that the coefficients ck here cannot be written via f .k/ .a/ as the
latter does not exist (f .z/ is not analytic at z D a).
In the second integral in Eq. (2.94) points p lie closer to a than z, i.e. jp aj <
jz aj. In this case we can expand with respect to q D .p a/ = .z a/ (so that
jqj < 1), i.e.
1
1 1 1 1 1 1 1 X k
D D pa D D q
pz .p a/ .z a/ z a 1 za za1q z a kD0
1
X .p a/k1
D ;
kD1
.z a/k
where in the last step we changed the summation index in the sum, so that now it
starts from k D 1. This leads to the following formula for the second integral:
I X 1 I
1 f .p/ 1
dp D ck .z a/k ; where ck D .p a/k1 f .p/dp:
2 i Cr p z kD1
2 i Cr
Here the sum runs over negative powers of .z a/ using the positive index k; if we
change the index k ! k, so that the summation index k runs over all negative
integer numbers between 1 and 1, then we obtain for the second integral
in (2.94) instead:
I X1 I
1 f .p/ 1 f .p/
dp D ck .z a/k ; where ck D dp:
2 i Cr pz kD1
2 i Cr .p a/kC1
(2.96)
The latter form now looks extremely similar to the expansion (2.95) for the first
integral which allows combining both results into a single formula:
1
X I
1 f .p/
f .z/ D ck .z a/ ; k
where ck D dp; (2.97)
kD1
2 i C .p a/kC1
which is called the Laurent series. Note that here C is any closed contour lying
between Cr and C . Indeed, the loop C in the formula for ck with positive k can
be deformed into C as long as C remains inside the ring formed by Cr and C , and
for negative k the same can be done with the loop Cr which can be freely deformed
into C.
The part of the series containing negative powers of .z a/ is called the principal
part of the Laurent series. Also note that since both parts of the Laurent series
2.5 Complex Functional Series 189
(for positive and negative k) were based on the uniformly converging geometric
progressions, the Laurent series also converges uniformly inside the ring 0 <
jz aj < R. Moreover, the Laurent series represents an analytic function in the
ring as consisting of two series (corresponding to negative and positive k), each of
which is analytic. The following Theorem establishes uniqueness of the series and
shows that if f .z/ is analytic in a ring r < jz aj < R, then its expansion (2.93) over
positive and negative powers of .z a/ is unique and hence is given by Eq. (2.97),
i.e. it must be the Laurent series.
Theorem 2.13. Consider the series (2.93) that converges to f .z/ within a ring
r < jz aj < R. Then its expansion over positive and negative powers of .z a/
is analytic, unique and hence coincides with its Laurent expansion.
Proof. Indeed, consider the expansion (2.93). Its part with k 0 converges inside
the circle jz aj < R, while its part with k 1 converges for any z satisfying
jz aj > r. Indeed, using the ratio test for the positive part, we have
ˇ ˇ ˇ ˇ ˇ ˇ
ˇ ckC1 ˇ ˇ ˇ ˇ ck ˇ
lim ˇˇ ˇ jz aj D jz aj lim ˇ ckC1 ˇ < 1 H) jz aj < lim ˇˇ ˇ D R;
k!1 c ˇ k!1 ˇ c ˇ k!1 ckC1 ˇ
k k
Either series converges uniformly; the proof for the positive part of the series is
identical to that given by the Abel’s Theorem 2.12; for the negative part the proof
has to be slightly modified and we shall sketch it here (using temporarily k as
positive for convenience). Consider a point z0 inside the ring such that jz0 aj <
ˇjz aj. The k
series
ˇ converges at z0 and hence each of its terms must be bounded:
ˇck .z0 a/ ˇ < M. Then,
ˇ ˇ ˇ ˇ ˇ ˇk
ˇ ˇ ˇ ˇˇ k ˇ ˇ kˇ ˇ ˇ
ˇck .z a/k ˇ D ˇck .z0 a/k ˇ ˇˇ .z a/ ˇˇ < M ˇˇ .z0 a/ ˇˇ D M ˇ z0 a ˇ D M k ;
ˇ .z0 a/k ˇ ˇ .z a/k ˇ ˇ za ˇ
where D jz0 aj = jz aj < 1. Since the absolute value of each element of our
series is bounded by the elements k of the converging geometric progression, the
series converges absolutely and uniformly because of the Weierstrass test (Theorem
I.7.15).
Now, since the expansion (2.93) converges uniformly, it can be integrated term-
by-term. Let us multiply both sides of this equation by .z a/n f .z/=2 i with some
fixed integer n and integrate in the anti-clockwise direction over a circle C with the
point a in its centre:
190 2 Complex Numbers and Functions
I 1
X I
1 1
.z a/n f .z/dz D ck .z a/nCk dz:
2 i C kD1
2 i C
The integral in the right-hand side is equal to zero for any n C k ¤ 1 (see
Problem 2.55), and is equal to 2 i for n C k D 1 (Example 2.3). Therefore, in
the sum in the right-hand side only the single term with k D n 1 survives, and
we obtain
I I
1 1 f .z/
.z a/n f .z/dz D cn1 H) cn D dz;
2 i C 2 i C .z a/nC1
which is exactly the same as in the Laurent series, see Eq. (2.97) (of course, the circle
can be deformed into any contour lying inside the ring). This proves the second part
of the Theorem. Q.E.D.
Formula (2.97) can be used to find the Laurent expansion of any function
f .z/ which is analytic in a ring. This requires calculating closed-loop contour
integrals for the expansion coefficients ck via Eq. (2.97). However, in the cases of
f .z/ D Qn .z/=Pm .z/, where Qn .z/ and Pm .z/ are polynomials of the power n and m,
respectively, simpler methods can be used based on a geometric progression.
Example
2.5. I Let us find the Laurent series for the function f .z/ D
1= z2 3z C 2 around a D 0.
The quadratic polynomial in the denominator has two roots at z1 D 1 and z2 D 2, i.e.
1 1 1
f .z/ D D : (2.98)
.z 1/ .z 2/ z2 z1
Since singularities are at z D 1 and z D 2, we will have to consider three circular
regions: A f0 jzj < 1g, B f1 < jzj < 2g and C fjzj > 2g, see Fig. 2.25(a), where
each of the two fractions can be considered separately. In region A we can expand
directly with respect to z as jzj < 1:
X1 1
1 1 1 1 1 1X z k
D D zk and D D ;
z1 1z kD0
z2 2 1 z=2 2 kD0 2
i.e. it is basically represented by the Taylor’s series. Region B is a ring 1 < jzj < 2
and hence we should expect the negative part of the Laurent expansion be presented
as well. And, indeed, since jzj > 1, we cannot expand the fraction 1=.z 1/ in terms
of z, but rather should be able to do it in terms of 1=z:
2.5 Complex Functional Series 191
Fig. 2.25 For the expansion into the Laurent series of f .z/ D 1= z2 3z C 2 around (a) a D 0
and (b) a D 3. Blue circles separate three regions A, B and C in each case
1 1 1
1 1 1 1 X 1 k X k1 X k
D D D z D z :
z1 z 1 1=z z kD0 z kD0 kD1
On the other hand, since jzj < 2, the same expansion as above can be used for
1= .z 2/. Therefore, in this ring region
X1 1
1 1X z k
D zk ;
.z 1/ .z 2/ kD1
2 kD0 2
i.e. it contains both negative and positive parts. Finally, in region C we have jzj > 2
and hence for both fractions we have to expand in terms of 1=z. The expansion of
1= .z 1/ stays the same as for the previous region, while for the other fraction
1 1 1
1 1 1 1 X 2 k X k k1 X k1 k
D D D 2z D 2 z ;
z2 z 1 2=z z kD0 z kD0 kD1
X1 X1 X1
1 k1
D zk C 2k1 zk D 2 1 zk ;
.z 1/ .z 2/ kD1 kD1 kD1
i.e. the expansion contains only the negative part of the Laurent series.J
Example 2.6. I In this example we shall expand the same function around a D 3
instead.
We again have three regions A, B and C as depicted in Fig. 2.25(b). Region A
corresponds to a circle centred at z D 3 and with the radius 1, i.e. jz 3j < 1, up to
192 2 Complex Numbers and Functions
the nearest singularity at z D 2; region B forms a ring between the two singularities,
i.e. 1 < jz 3j < 2, while the third region C corresponds to region jz 3j > 2. Let
us construct the Laurent series for region B:
X1 X1
1 1 1 1 .1/k .1/k
D D 1
D D ;
z2 .z 3/ C 1 z 3 1 C z3 kD0
.z 3/kC1 kD1
.z 3/k
1 1
1 1 1 1 1X k .z 3/
k X
k .z 3/
k
D D D .1/ D .1/ ;
z1 .z 3/ C 2 2 1 C z3
2
2 kD0 2k kD0
2kC1
X1 1
X
1 .1/k k .z 3/
k
D .1/ :
.z 1/ .z 2/ kD1
.z 3/k kD0 2kC1
Here, when expanding the first fraction, 1=.z 2/, we have in the denominator
.z 3/ C 1, where jz 3j > 1, and hence we must expand using inverse powers.
In the second case of the fraction 1=.z 1/, the denominator becomes .z 3/ C 2
with jz 3j < 2, and hence we can expand with respect to .z 3/=2 which results
in terms with positive powers. J
Example 2.7. I Let us expand into the Laurent series the function f .z/ D sin .1=z/
around a D 0.
Here we have a single region jzj > 0. Since the Taylor’s expansion for the sine
function (2.86) converges for all values of z, we can just use this expansion with
respect to p D 1=z to get
X 1
1 .1/n1 2nC1
sin D z :
z nD1
.2n 1/Š
Problem 2.67. Expand f .z/ D exp .1=z/ into the Laurent series around a D 0.
Problem
2.68. Show that the Laurent expansion of f .z/ D
1= z2 .i C 3/ z C 3i around a D 4i is
1
X
ˇk
f .x/ D ˛k .z 4i/k C ;
kD0
.z 4i/kC1
Singularities of a function f .z/ are the points where it is not analytic; as we shall
discuss here, they are closely related to the Laurent series of f .z/.
Firstly, we shall consider the so-called isolated singularities. The point z D a
is an isolated singularity, if one can always find its neighbourhood such where f .z/
is analytic apart from the singularity point itself, i.e. one can always find r > 0
such that in the ring 0 < jz aj < r the function f .z/ has no other singularities.
For instance, the function f .z/ D 1= z2 3z C 2 has two isolated singularities at
z D 1 and z D 2, since one can always draw a circle of the radius r < 1 around
each of these points and find f .z/ to be analytic everywhere in those circles apart
from the points z D 1 and z D 2 themselves. Isolated singularities, in turn, may be
of three categories:
The point z D a is removable if f .z/ is not analytic there although its limit,
limz!a f .z/, exists. In this case one can define f .z/ at z D a via its limit making
f .z/ analytic at this point as well, i.e. the singularity can be “removed”. Since f .z/
has a well-defined limit at z ! a, its Laurent expansion cannot contain negative
power terms; therefore, f .z/ must be represented on the ring 0 < jz aj < R (with
some R) by the Taylor’s series (2.84) with f .a/ being defined as its zero power
coefficient c0 (as all other terms tend to zero in the limit).
Example 2.8. I Function f .z/ D sin z=z has z D 0 as an isolated removable
singularity.
194 2 Complex Numbers and Functions
Indeed, we can expand the sine functions for jzj > 0 in the Taylor’s series to see that
the singularity is removed:
1
sin z 1 X .1/n1 2n1 1 z3 z2
D z D z C D 1 C ;
z z nD1 .2n 1/Š z 3Š 3Š
2.5.5.2 Poles
where '.z/ has only positive terms in its expansion, i.e. it is expandable into the
Taylor’s series, and hence is well defined in the neighbourhood of z D a including
the point z D a itself. Therefore, the origin of the singularity (and of the infinite
limit of f .z/ when z ! a) is due to the factor 1= .z a/n .
Above, n corresponds to the largest negative power term in the expan-
sion (2.99). If n D 1, i.e. the expansion starts from the term c1 = .z a/, the pole
is called simple. Otherwise, if it starts from cn = .z a/n , the pole is said to be of the
order n. It is easy to see that
1
X
.z a/n f .z/ D '.z/ D cnCk .z a/k
kD0
has a well-defined limit at z ! a equal to cn , which must be neither zero nor
infinity. Therefore, by taking such a limit it is possible to determine the order of the
pole:
Order of pole n is when lim .z a/n f .z/ is neither zero nor infinity:
z!a
(2.100)
2.5 Complex Functional Series 195
z3 z3
f .z/ D D
z3 C .1 2i/ z2 .1 C 2i/ z 1 .z i/2 .z C 1/
has two poles. The point z D i is the pole of order 2, while z D 1 is a simple pole.
Indeed, applying the criterion, we have for the former pole:
.z 3/ .z i/n2 i3
lim .z i/n f .z/ D lim D lim .z i/n2 :
z!i z!i zC1 i C 1 z!i
The limit is not zero or infinity only if n D 2 (in which case the limit is
.i 3/ = .i C 1/ D 1 C 2i). Similarly, for the other pole:
.z 3/ .z C 1/n1 1 3
lim .z C 1/n f .z/ D lim 2
D lim .z C 1/n1 ;
z!1 z!1 .z i/ .1 i/2 z!1
which gives n D 1; for any other values of n the limit is either zero (when n > 1) or
infinity (n < 1/.J
Poles are closely related to zeros of complex functions. If the function f .z/ is
not singular at z D a and its Taylor’s expansion around this point is missing the
very first (constant) term, i.e. the coefficient c0 D 0 in Eq. (2.99), then f .a/ D 0.
But more first terms in the Taylor’s expansion may be missing for some functions,
and this would characterise the rate with which f .z/ tends to zero as z ! a. More
precisely, if the Taylor’s expansion of f .z/ starts from the n-th term, i.e.
1
X 1
X
f .z/ D cn .z a/k D .z a/n cnCk .z a/k D .z a/n '.z/; (2.101)
kDn kD0
where '.z/ has all terms in its Taylor’s expansion, then it is said that the point z D a
is a zero of order n of f .z/. If n D 1, then it is called a simple zero. The function
f .z/ D sin z has a simple zero at z D 0 since its Taylor’s expansion starts from the
linear term.
By looking at Eqs. (2.99) and (2.101), we can see that if f .z/ has a pole of order
n at z D a, then the same point is a zero of the same order of the function 1=f .z/ D
.z a/n ='.z/ since 1='.z/ tends to a finite limit at z D a. Inversely, if f .z/ has a
zero of order n at z D a, then 1=f .z/ D .z a/n ='.z/ has a pole of the same order
at the same point.
The point z D a is called an essential singularity of f .z/ if the limit limz!a f .z/
does not exist. This means that by taking different sequences of points fzk g on the
complex plane which converge to z D a, different limits of f .z/ are obtained, i.e. f .z/
196 2 Complex Numbers and Functions
at z D a is basically not defined; in fact,7 one can always find a sequence of numbers
converging to z D a which results in any limit of limz!a f .z/. The Laurent series
around z D a must have an infinite number of negative power terms (as otherwise
we arrive at the two previously considered cases).
Example 2.10. I The function f .z/ D sin .1=z/ has an essential singularity at
z D 0.
Indeed, the Laurent series has the complete principal part, i.e. the negative part has
all terms (Example 2.7). If we tend z to zero over a sequence of points on the real
axis, z D x, then the limit limx!0 .1=x/ is not defined as the function oscillates
rapidly as x ! 0, although it remains bounded by ˙1. If, however, we take the limit
along the imaginary axis, z D iy, then
1 1 i=iy i
sin D e ei=iy D e1=y e1=y :
iy 2i 2
If y ! C0 (i.e. from above), then e1=y ! 1, e1=y ! 0 and hence sin .1=iy/ !
i1. If, however, y ! 0 (that is, from below), then sin .1=iy/ D Ci1. This
discussion illustrates well that z D 0 is an essential singularity of sin .1=z/. J
Functions f .z/ which do not have singularities are called holomorphic. For instance,
these are sine, cosine and exponential functions, polynomials, as well as their simple
combinations not involving division. Meromorphic functions only have isolated
poles. It follows from Eq. (2.99) that any meromorphic function is the ratio of two
homomorphic functions.
7
This statement was proven independently by Julian Sochocki, Felice Casorati and Karl Weier-
strass.
2.6 Analytic Continuation 197
Two more types of singularities may also be present. Firstly, a singularity may be
1
not isolated. Consider the function f .z/ D e1=z C 1 . Its denominator is equal to
zero, e1=z C 1 D 0, when
1
D ln .1/ D ln ei D i C i2 k with k D 0; ˙1; ˙2; : : : :
z
Therefore, at points zk D .i C i2 k/1 the function f .z/ is singular. However, in
the limit of k ! 1 the points zk form a very dense sequence near z D 0, i.e. the
singularities are not isolated.
The other types of points where a function is not analytic are branch points and
points on branch cuts. The points on the branch cuts form a continuous set and hence
are also not isolated.
@f df @z df @f df @z df
D D D u0x C ivx0 ; D D i D u0y C ivy0 :
@x dz @x dz @y dz @y dz
Therefore,
df df
D u0x C ivx0 and also D i u0y C ivy0 D vy0 iu0y :
dz dz
Equating real and imaginary parts of the two expressions, we obtain u0x D vy0 and
vx0 D u0y , which are the Cauchy–Riemann conditions (2.16). Hence, f .z/ is indeed
analytic.
This series converges within the circle jzj < 1, and it is not defined outside it
including the circle itself, i.e. for jzj 1 the series diverges.
Let us now expand the function around some general point z D a (where
a ¤ 1):
g2
ln .1 C z/ D ln Œ.1 C a/ .1 C g/ D ln .1 C a/ C g C ;
2
1
.1 C z/1 D Œ.1 C a/ .1 C g/1 D 1 g C g2 ;
1Ca
This is a particular expansion of f .z/ obtained with respect to the point a, and it
converges for jgj D j.z a/ = .1 C a/j < 1, i.e. in the circle jz aj < j1 C aj,
which is centred at the point z D a and has the radius of R D j1 C aj. At a D 0 the
latter expansion reduces to (2.102).
The convergence of the series (2.103) is compared for different values of a in
Fig. 2.26. Expanding f .z/ around a D 0 results in a function f0 .z/ which is only
defined in the domain A; the function f1 .z/ obtained with a D 1 is defined in a larger
domain which also completely includes A; taking a bigger value of a D 3 results in
an even bigger domain C which goes beyond the two previous ones. We may say
that the expansion (2.103) for a D 1 defines our function f .z/ in the part of the
domain B which goes beyond A, i.e. f0 .z/ is said to be continued beyond its domain
into a larger one by means of f1 .z/. Similarly, f2 .z/ defines f .z/ in the rest of C which
is beyond B. In other words, we may now define one function
2.6 Analytic Continuation 199
8
< f0 .z/; z from A
f .z/ D f1 .z/; z from that part of B which is outside A :
:
f2 .z/; z from that part of C which is outside B
This process can be continued so that f .z/ would be defined in an even larger domain
in the complex plane. Of course, in our particular case we know f .z/ in the whole
complex plane C via the logarithm from the very beginning, so this exercise seems
to be useless. However, it serves to illustrate the general idea and can be used in
practice when the function f .z/ is actually not known. For instance, when solving a
differential equation via a power series (see Sects. I.8.4 and 2.8), expanding about
different points z D a allows obtaining different expansions with overlapping circles
of convergence. Hence, a single function can be defined in the united domains as a
result of the procedure described above.
This operation of defining a function f .z/ in a bigger domain is called analytic
continuation. The appearance of the word “continuous” is not accidental since the
resulting function will be analytic as long as its components are. This is proven by
the following Theorem.
Theorem 2.14. Consider two functions f0 .z/ and f1 .z/ which are analytic in
domains D0 and D1 , respectively. Let the two domains have only a common line
L as shown in Fig. 2.27(a), and let the two functions be equal on the line, i.e.
f0 .z/ D f1 .z/ for any z on L. If the two functions are also continuous on L, then the
combined function
8 8
< f0 .z/; z in D0 < f0 .z/; z in D0
f .z/ D f1 .z/; z on L or f .z/ D f0 .z/; z on L ; (2.104)
: :
f1 .z/; z in D1 f1 .z/; z in D1
Fig. 2.27 (a) The regions D0 and D1 overlap at a line L. (b) A contour is considered which runs
across both regions and consists of two parts: L0 lying in D0 and L1 lying fully in D1 . The paths ˛
and ˇ lie exactly on the common line L and are passed in the opposite directions
Proof. Consider a contour which starts in D0 , then crosses L, goes into D1 and
finally returns back to D0 . It consists of two parts: L0 which lies fully inside D0 ; and
of the other part, L1 , which is in D1 , as shown in Fig. 2.27(b). We can close L0 with
a line ˛ which lies on the common line L for the two regions; similarly, L1 can be
closed with the line ˇ D ˛, also lying on L and passed in the opposite direction
to that of ˛. Since the two lines ˛ and ˇ are the same, but passed in the opposite
directions,
Z Z Z Z Z Z
f .z/dz C f .z/dz D f0 .z/dz C f1 .z/dz D f0 .z/dz C f0 .z/dz D 0;
˛ ˇ ˛ ˇ ˛ ˛
as f .z/ D f0 .z/ D f1 .z/ everywhere on L. Therefore, the contour integral of f .z/ over
the path L0 C L1 can be written as
I Z Z Z Z
f .z/dz D f .z/dz C f .z/dz D f0 .z/dz C f1 .z/dz
L0 CL1 L0 L1 L0 L1
Z Z I
C f0 .z/dz C f1 .z/dz D f0 .z/dz
˛ ˇ L0 C˛
I
C f1 .z/dz D 0 C 0 D 0;
L1 Cˇ
since the loop L0 C ˛ lies fully in D0 and hence the closed contour integral of the
analytic function f0 .z/ is equal to zero; similarly, the integral of f1 .z/ over the closed
loop L1 C ˇ is zero as well. Therefore, a closed-loop integral of the function (2.104)
anywhere in the entire region D0 C D1 is zero. This finally means, according to the
Morera’s Theorem 2.3, that the function f .z/ is analytic. Note that the continuity of
both functions on L is required for the Cauchy theorem we used here as L is a part
of the boundary of the two regions. Q.E.D.
Above we assumed that the two regions overlap only along a line. In this case
the continuation defines a single-valued function f .z/. If the two regions overlap in
2.7 Residues 201
their internal parts, i.e. in the subregion shown in Fig. 2.28, then a continuation is
not longer unique as f0 .z/ may be quite different to f1 .z/ in , and hence in the
function f .z/ becomes multi-valued.
2.7 Residues
2.7.1 Definition
dn1 .n C 1/Š
Œ.z a/n f .z/ D .n 1/Šc1 C nŠc0 .z a/1 C c1 .z a/2 C ;
dzn1 2Š
where all terms preceding the term with c1 vanish upon differentiation. Then,
taking the limit z ! a, the terms standing behind the c1 term disappear as well,
and we finally obtain a very useful expression:
1 dn1
c1 D Res Œf .a/I a D lim n1 Œ.z a/n f .z/ : (2.106)
.n 1/Š z!a dz
This formula can be used in practice to find the residues. For simple poles this
formula can be manipulated into simpler forms. For n D 1
Since B0 .z/ D B1 .z/ C .z a/ B01 .z/ with B01 .a/ D b2 ¤ 0 and B0 .a/ D B1 .a/, we
obtain in this particular case a simpler formula:
A.z/ A.z/ A.a/ A.a/
Res I a D lim D D 0 : (2.108)
B.z/ z!a B1 .z/ B1 .a/ B .a/
Problem 2.76. Show that if the point a is a zero of order 2 of the function B.z/,
then the above formula is modified:
A.z/ 2A0 .a/ 2A.a/B000 .a/
Res I a D 00 : (2.109)
B.z/ B .a/ 3 ŒB00 .a/2
There are obviously two of them to find as there are only two singularities: at z D 0
and z D i. Consider first z D 0. Since the cosine function and .z i/2 behave well
there, the function can be presented as f .z/ D '1 .z/=z with '1 .z/ D cos z= .z i/2
being a function which has a finite non-zero limit at z D 0. Therefore, z D 0 is a
simple pole. This also follows from the fact that the limit of limz!0 Œzf .z/ is non-
zero and finite, see Eq. (2.100). Therefore, from Eq. (2.108) we obtain
cos z cos 0= .0 i/2 1= .1/
Res 2
I 0 D 0 D D 1:
z .z i/ .z/ 1
The same result comes out from the general formula (2.106) as well:
2.7 Residues 203
cos z 1 cos z cos 0
Res 2
I0 D lim 2
D D 1:
z .z i/ 0Š z!0 .z i/ .0 i/2
The Laurent expansion would also give the same c1 D 1 coefficient. Indeed,
expanding around z D 0, we have
z2
cos z D 1 C ; .z i/2 D Œi .1 C iz/2 D .1 C iz/2 D 1 C 2iz C ;
2
Now let us consider another pole z D i. Since the cosine and 1=z behave well at
z D i, we deal here with the pole of order 2. The criterion (2.100) gives the same
result: the limit of .z i/ f .z/ at z ! i does not exist, while the limit of .z i/2 f .z/
is finite and non-zero (and equal to cos i=i D i cos i). Hence we can use directly
formula (2.106) for n D 2:
1 d 2 cos z
Res Œf .i/I i D lim .z i/
1Š z!i dz z .z i/2
d cos z sin z cos z
D lim D lim 2 D i sin i C cos i D e1 :
z!i dz z z!i z z
The same result is of course obtained when using the Laurent method:
and
1 1 1 1 1 zi
D D D 1 C D i C .z i/ C ;
z .z i/ C i i 1 C .z i/ =i i i
and we obtain
cos z 1
2
D Œcos i .z i/ sin i C Œi C .z i/ C
z .z i/ .z i/2
1 h i
D 2
i cos i C .cos i C i sin i/ .z i/1 C ;
.z i/
Problem 2.77. Identify all singularities the following functions have, and then
calculate the corresponding residues there:
eiz z2 1
.a/ I .b/ tan z I .c/ :
z2 C 1 z2 iz C 6
[Answers: (a) Res f .i/ D i=2e, Res f .i/ D ie=2; (b) Res f . =2 C k/ D
1 for any integer k; (c) Res f .2i/ D i, Res f .3i/ D 2i.]
where the sum is taken over all isolated singularities which lie inside L. (Note that
the function f .z/ does not need to be analytic on L (which could be, e.g. a boundary
of D), but need to be continuous there).
Fig. 2.29 (a) An isolated singularity a is traversed by a circular contour C. (b) Three isolated
singularities a1 , a2 and a3 fall inside the contour L, while three other singularities are outside the
contour. According to the generalised Cauchy theorem (2.62), the integral over L is equal to the
sum of three contour integrals around the points a1 , a2 and a3
2.7 Residues 205
Next, consider a larger contour L which is run (in the same direction) around several
such poles as shown in Fig. 2.29(b). Then, according to the generalised form of the
Cauchy theorem, Eq. (2.62), we can write
I XI X
f .z/dz D f .z/dz D 2 i Res f .ai / ;
L i Ci i
where the sum is run over all poles which lie inside the contour L. At the last step
we again used the definition (2.105) of the residue. This is the result we have set out
to prove. Q.E.D.
Note that in the above formula (2.110) only poles and essential singularities
matter, as the residue of a removable singularity is zero. Therefore, removable
singularities can be ignored, and this is perfectly in line with the fact that any
function f .z/ can be made analytic at the removable singularity by defining its value
there with the corresponding limit as was explained in Sect. 2.5.5.
Closed-loop contour integrals are calculated immediately using the residue Theo-
rem 2.15. For instance, any contour L going around a point z0 which is a simple
pole of the function f .z/ D 1= .z z0 / yields
I
dz 1
D 2 i Res I z0 D 2 i;
L z z0 z z0
which is exactly the same result as in Example 2.3.
More interesting, however, are applications of the residues in calculating definite
integrals of real calculus taken along the real axis x. These are based on closing
the integration line running along the real x axis in the complex plane and using
the appropriate analytic continuation of the function f .x/ ! f1 .z/ (in many cases
f1 .z/ D f .z/). It is best to illustrate the method using various examples.
Example 2.12. I
Let us calculate the following integral:
Z 1
dx
ID : (2.111)
1 1 C x2
To perform the calculation, we consider the following integral on the complex plane:
I Z R Z
dz dz dz
IR D D C D Ihoriz C Icircle ;
L 1 C z2 R 1 C z2 CR 1 C z2
206 2 Complex Numbers and Functions
where the closed contour L shown in Fig. 2.30 consists of a horizontal part from R
to R running along the positive direction of the x axis, and a semicircle CR which
is run in the anti-clockwise direction as shown. This contour is closed and hence
the value of the integral can be calculated using the residue
Theorem (2.110), i.e.
it is equal to 2 i times the residue of f .z/ D 1= z2 C 1 at the point z D Ci. We
only need to consider this pole as the other one (z D i) is outside L. The residue is
easily calculated to be
1 1 1 1 i
Res 2 I i D Res Ii D D D ;
z C1 .z C i/ .z i/ iCi 2i 2
(this follows from the inequality jc bj jcj C jbj by setting c D a C b). Hence,
according to the inequality (2.51),
ˇZ ˇ
ˇ dz ˇˇ 1
jICR j D ˇˇ R;
CR z2 C 1 ˇ R2 1
the absolute value of the integral tends indeed to zero as R ! 1 which means that
ICR ! 0 in this limit. Therefore,
which is our final result. This result can be checked independently as the integral
can also be calculated directly:
Z 1
dx
ID D arctan .C1/ arctan .1/ D D ;
1 1 C x2 2 2
which is the same. J
In a similar manner one can calculate integrals of a more general type:
Z 1
Qm .x/
ID dx; (2.113)
1 Pn .x/
where Qm .x/ and Pn .x/ are polynomials of the order m and n, respectively, and it is
assumed that Pn .x/ does not have real zeros, i.e. its poles do not lie on the x axis.
First, we note that the convergence of this integral at ˙1 requires m C 1 < n. Then,
each polynomial can be expressed via its zeroes as a product:
where fak g are zeroes of Qm .x/, while fbk g are zeroes of Pn .x/. Again, we consider
the contour shown in Fig. 2.30 and hence need to investigate the integral over the
semicircle. On it we have
ˇ ˇ ˇ ˇ
jz ak j D ˇR ei ak ˇ ˇR ei ˇ C jak j D R C jak j
and also
ˇ ˇ ˇ i ˇ 1 1
jz bk j D ˇR ei bk ˇ ˇR e ˇ jbk j D R jbk j H) ;
jz bk j R jbk j
which allows us to estimate f .z/ D Qm .z/=Pn .z/ as follows:
ˇ ˇ ˇ ˇ ˇQ ˇ ˇ ˇQ ˇ ˇ
ˇ Qm .z/ ˇ ˇ qm ˇ ˇ m .z ak /ˇ ˇ qm ˇ m kD1 .R C jak j/
ˇ qm ˇ Am .R/
ˇ ˇ D ˇ ˇ ˇQkD1 ˇ ˇ ˇ ˇ ˇ
ˇ P .z/ ˇ ˇ p ˇ ˇ n .z b /ˇ ˇ p ˇ Qn .R jb j/ D ˇ p ˇ B .R/ ;
n n lD1 l n lD1 k n n
where Am .R/ and Bn .R/ are polynomials in R. Hence, the value of the semicircle
integral can be estimated as
ˇZ ˇ ˇ ˇ
ˇ Qm .z/ ˇˇ ˇˇ qm ˇˇ Am .R/
ˇ
jICR j D ˇ dzˇ ˇ ˇ R:
CR Pn .z/ pn Bn .R/
208 2 Complex Numbers and Functions
It is clearly seen that the expression in the right-hand side of the inequality tends to
zero as R ! 1 if m C 1 < n, and therefore, the Icircle ! 0. This means that
X
Qm .z/
I D lim .IR Icircle / D lim IR lim Icircle D lim IR D 2 i Res I bk ;
R!1 R!1 R!1 R!1 Pn .z/
k
where the sum is taken over all zeroes lying in the upper half of the complex plane.
[Answers: (a) =4a3 ; (b) = Œab .a C b/; (c) ; (d) =2; (e) 3 = 8a5 .]
Problem 2.79. Prove the formula
I
zdz i
D p ; (2.114)
z4 C 2 .2g2 1/ z2 C 1 2
2g g 1
where g > 1 and the contour is the circle of unit radius with the centre at the
origin.
can also be calculated using the residue theorem. Here R.x; y/ is a rational function
of x and y. The trick here is to notice that the integration over between 0 and 2
may also be related to going anti-clockwise around the unit radius circle C1 . Indeed,
on the circle z D ei , dz D iei d D izd and also
1 i 1 1 1 i 1 1
sin D e ei D z and cos D e C ei D zC ;
2i 2i z 2 2 z
so that
I
1 1 1 1 dz
ID R z ; zC ;
C1 2i z 2 z iz
2.7 Residues 209
i.e. the integration is indeed related now to that on the circle. Therefore, according
to the residue theorem, the result is equal to the sum of residues of all poles inside
the unit circle C1 , times 2 i.
Example 2.13. I Consider the integral
Z 2
dx
ID :
0 2 C cos x
R2
Problem 2.80. Calculate the integrals 0 f . /d for the following functions
f . /:
1 1 1
.a/ I .b/ 2 I .c/ I
1 C sin2 1 C sin2 1 C cos2
cos2 cos2 1
.d/ 2
I .e/ I .f / ;
1 C sin2 2 cos C cos2 a C sin
Problem 2.82. Consider the integral of Problem 2.34 again. Make the substi-
tution t D sin , then replace with z D ei and show that the integration can
be extended over the whole unit circle. Finally, perform the integration using
the method of residues. You will also need the result
d2n 2n .2n/Š 2
lim 2n z2 1 D .1/n ; (2.116)
z!0 dz nŠ
see Eq. (I.3.46).
with some real ; alternatively, cosine or sine functions may appear instead of the
exponential function. We shall come across this type of integrals, e.g. in considering
the Fourier transform in Chap. 5. The idea of their calculation is also based on
closing the contour either above the real axis (as we have done so far) or below.
Before we formulate a rather general result known as Jordan’s lemma, let us
consider an example.
Example 2.14. I Consider the following integral:
Z 1 ix
e dx
I D 2 2
; where > 0:
1 x C a
We use the function f .z/ D eiz = z2 C a2 and the same contour as in Fig. 2.30
can be employed. Then the contribution from the semicircle part CR appears to be
equal to zero in the limit R ! 1. Indeed, on the upper semicircle z D x C iy D
R .cos C i sin / and 0 < < , so that sin > 0, and hence
How will the result change if was negative? We cannot use the same contour
as above since
in this case. What is needed is to make sin < 0 on the semicircle CR , then eiz
would still tend to zero in the R ! 1 limit and the contribution from CR will
vanish. This can be achieved simply by closing the horizontal path with a semicircle
in the lower part of the complex plane, Fig. 2.31.
Problem 2.83. Show that in this case the integral is equal to . =a/ ea . [Hint:
note that this time the contour is traversed in the clockwise direction, so that
the closed contour integral is equal to a minus residue at z D ia times 2 i.]
In the above example we have used a rather intuitive approach. Fortunately, the
reasoning behind it can be put on a more rigorous footing. In fact, it appears that if
f .z/ tends to zero as z ! 1 (i.e. along any direction from z D 0), then the integral
over the semicircle CR always tends to zero. This latter statement is formulated in
the following Lemma where a slightly more general contour CR is considered than
in the example we have just discussed since this is useful for some applications: CR
may also have a lower part which goes by a below the x axis, i.e. Imz > a for all z
on CR , where 0 a < R, Fig. 2.32(a).
Fig. 2.32 For the proof of Jordan’s lemma: (a) when > 0, then the contour CR can be used
which consists of an upper semicircle ABC and two (optional) arches CD and EA; when < 0,
then instead the rest of the circle CR0 is to be used. Note that a is kept constant in the R ! 1 limit,
i.e. the angle ˛ D arcsin .a=R/ ! 0. (b) The sine function is concave on the interval 0 < x < =2,
i.e. it lies everywhere higher than the straight line y D 2x=
Proof. Let us start from the case of positive , and consider the semicircle part ABC
of CR . Since f .z/ converges uniformly to zero as R ! 1, we can say that for any
z on the semicircle jf .z/j < MR , where MR ! 0 as R ! 1. Note that due to the
uniform convergence MR does not depend on z, it only depends on R. Next, on the
semicircle
In the last step we were able to use the symmetry of the sine function about
D =2 and hence consider the integration interval only between 0 and =2
(with the corresponding factor of two appeared in front). The last integral can be
estimated with the help of the inequality sin 2 = valid for 0 =2 (see
Fig. 2.32(b)), yielding
ˇZ ˇ Z
ˇ ˇ =2
MR
ˇ f .z/eiz dzˇˇ 2RMR e.2R= /
d D 1 eR :
ˇ
semicircle 0
What is left to consider is the effect of the arches CD and AE of CR , which lie
below the x axis. Apply the above method to estimate the integral on the AE first:
ˇZ ˇ ˇZ ˇ Z
ˇ ˇ ˇ ˇ 0
ˇ f .z/eiz dzˇˇ MR ˇˇ eiz dzˇˇ MR R eR sin d :
ˇ
AE AE ˛
For the angles lying between ˛ and zero we have R sin a, so that the last
integral can be estimated as:
Z 0 Z 0
a
MR R eR sin d MR R ea d D MR Rea ˛ D MR ea R arcsin :
˛ ˛ R
where we have made the substitution D . Since lies between the limits
0 < ˛ < < =2, the inequality sin 2 = is valid and, since < 0, we can
estimate the integral in the same way as above:
ˇZ ˇ ˇZ ˇ
ˇ =2 ˇ ˇ =2 ˇ
ˇ R sin ˇ ˇ jjR sin ˇ
2RMR ˇ e d ˇ D 2RMR ˇ e d ˇ
ˇ ˛ ˇ ˇ ˛ ˇ
ˇZ ˇ
ˇ =2 ˇ
ˇ ˇ MR Rjj
2RMR ˇ e.2jjR= / d ˇ D e e2Rjj˛= :
ˇ ˛ ˇ jj
It follows then that this expression tends to zero in the R ! 1 limit. The Lemma is
now fully proven. Q.E.D.
So, when > 0 one has to consider closing a horizontal path in the upper part of
the complex plane, while for < 0 in the lower part, we did exactly the same in the
last example.
214 2 Complex Numbers and Functions
Z 1 2 Z 1 p
x cos .x/ dx 1 a x sin .x/ dx a
2 D e I 4 C a4
D 2 ea= 2 sin p :
1 .x C a /
2 2 2 a 1 x a 2
where ı ! C0.
Let us now consider several examples in which the poles of the functions f .z/ in
the complex plane lie exactly on the real axis.
Example 2.15. I Consider the integral
Z 1
eix
I D dx; (2.120)
1 x
where > 0. The contour cannot be taken exactly as in Fig. 2.30 in this case since
it goes directly through the point z D 0. The idea is to bypass the point with a little
semicircle C as is shown in Fig. 2.33 and then take the limit ! 0. Therefore,
the contour contains a large semicircle CR whose contribution tends to zero as
R ! 1 due to Jordan’s lemma, then a small semicircle C whose contribution we
shall need to investigate, and two horizontal parts R < x < and < x < R.
The horizontal parts upon taking the limits R ! 1 and ! 0 converge to the
integration over the whole real axis as required in I . Moreover, since the function
f .z/ D eiz =z does not have any poles inside the contour (the only pole is z D 0
which we avoid), we can write using obvious notations:
Z Z Z R Z
C C C D0 (2.121)
R C CR
for any > 0 and R > . Consider now the integral over C . There z D ei , so
that
Z Z 0 .cos Ci sin / Z
eiz ei .cos Ci sin /
dz D iei d D i ei d :
C z ei 0
Calculating this integral exactly would be a challenge; however, this is not required,
as we only need it calculated in the limit of ! 0. This evidently gives simply
i as the exponent in the limit is replaced by one. Therefore, the C integral in the
limit tends simply to i . The two horizontal integrals in (2.121) give in the limit
the integral over the whole real axis apart from the point x D 0 itself (in the sense
of Cauchy principal value, Sect. I.4.5.4), the CR integral is zero in the limit as was
already mentioned, and hence we finally obtain
Z 1 Z
eix
P dx D D .i / D i :
1 x C
Noting that eix D cos .x/ C i sin .x/ and that the integral with the cosine function
is zero due to symmetry (the integration limits are symmetric but the integrand is an
odd function), we arrive at the following famous result:
Z 1
sin .x/
dx D : (2.122)
0 x 2
Note that the integrand is even and hence we were able to replace the integration
over the whole real axis by twice the integration over its positive part. Most
importantly, however, we were able to remove the sign P of the principal value
here since the function sin .x/ =x is well defined at x D 0 (and is equal to ). J
216 2 Complex Numbers and Functions
[Hint: write the sine via an exponential function and consider all three cases
for !.]
R
In many other cases of the integrals f .x/dx over the real axis the singularity of
f .x/ within the integration interval cannot be removed and hence integrals may only
exist in the sense of the Cauchy principal value. We shall now demonstrate that the
method of residues may be quite useful in evaluating some of these integrals.
Example 2.16. I The integral
Z 1
eix
f .x/dx with f .x/ D
1 .x a/ .x2 C b2 /
is not well defined in the usual sense since the integrand has a singularity at x D a
lying on the real axis; however, as we shall see, its Cauchy principal value,
Z 1 Z a Z 1
IDP f .x/dx D lim f .x/dx C f .x/dx ; (2.124)
1 !0 1 aC
is well defined. To calculate it, we shall take the contour as shown in Fig. 2.34: this
contour is similar to the one shown in Fig. 2.33, but the little semicircle C goes
around the singularity at z D a. There is only one simple pole at z D ib which is
inside the contour, so using the simplified notations we can write
Z Z Z Z
a R
eb
C C C D 2 i Res Œf .z/I ib D 2 i
R C aC CR .ib a/ .2ib/
a C ib b
D e : (2.125)
b a2 C b2
2.7 Residues 217
The integral over the large semicircle CR is equal to zero in the R ! 1 limit due to
the Jordan’s lemma; the integral over the small semicircle is however not zero as is
clear from a direct calculation (z D a C ei on C , and dz D iei d ):
Z Z 0
Z
exp i a C ei exp i a C ei
D h i ie d D i
i
h id
C ei .a C ei /2 C b2 0 .a C ei /2 C b2
Z
eia i eia
! i d D
0 a2 C b2 a2 C b2
in the ! 0 limit. The two integrals over the parts of the x axis in (2.125) in
the limits R ! 1 and ! 0 yield exactly the principal value integral (2.124),
therefore, we obtain
a C ib b i eia a C ib b
ID e C D e C ieia
:
b a2 C b2 a2 C b2 a2 C b2 b
Separating the real and imaginary parts on both sides, we obtain two useful integrals:
Z 1 ha i
cos .x/ dx b
P D e C sin .a/ ;
1 .x a/ .x2 C b2 / a2 C b2 b
Z 1
sin .x/ dx
P D 2 eb C cos .a/ : J
1 .x a/ .x2 C b2 / a C b2
We shall adopt the principal branch of the logarithmic function with ln rei D
ln r C i and 0 < < 2 , the branch cut being chosen along the positive part of
the x axis. The contour is the same as the one shown in Fig. 2.33. The four integrals
are to be considered which appear in the left-hand side of the Cauchy theorem:
Z Z Z R Z " #
ln z
C C C D 2 i Res I ia
R C CR .z2 C a2 /2
d ln z
D2 i
dz .z C ia/2 zDia
1=z 2 ln z
D2 i
.z C ia/2 .z C ia/3 zDia
2
D .ln a 1/ C i : (2.126)
2a3 4a3
Let us first consider the integral over the small circle C , where z D ei :
Z Z 0 Z 0 Z 0
ln C i ei i ln ei
D iei d D d d :
C . 2 ei2 C a2 /2 . 2 ei2 C a2 / 2
. 2 ei2 C a2 /2
Both integrals tend to zero in the ! 0 limit. We shall demonstrate this using
two methods: the first one is based on taking the limit inside the integrals; the other
method will be based on making estimates of the integrals. Using the first method,
the first integral behaves as ln for small , which tends to zero when ! 0; the
second integral behaves as and hence also goes to zero.
The same result is obtained using the second method which may be thought of
as a more rigorous one. Indeed, for any the following inequality holds:
q s
1 1
jln zj D jln C i j D ln2 C 2 D ln2 C 2 2 ln ;
so that we can estimate the integral by a product of the semicircle length, , and
the maximum value of the integrand, see Eq. (2.51):
p
8
Taking square of the both sides, we have 3 ln2 .1= / 2 or ln .1= / = 3, which is fulfilled
starting from some large number 1= as the logarithm monotonously increases. Note that is
limited to the value of 2 .
9
Since ja C bj a b, when a > b > 0.
2.7 Residues 219
ˇZ ˇ ˇ ˇ!
ˇ ˇ ˇ ˇ 2 ln .1= /
ˇ ln z ˇ ˇ ln z ˇ
ˇ dzˇ max ˇ ˇ ;
ˇ C .z2 C a2 / ˇ
2 C ˇ .z2 C a2 / ˇ
2
.a2 2 /2
where the quantity on the right-hand side can easily be seen going to zero when
! 0, and so is the integral. We see that a more rigorous approach fully supports
the simple reasoning we developed above and hence in most cases our simple
method can be used without hesitation.
Now we shall consider the integral over the large semicircle CR . Either of the
arguments used above proves that it goes to zero when R ! 1. Indeed, using our
(more rigorous) second method, we can write
ˇZ ˇ ˇ ˇ!
ˇ ˇ ˇ ˇ 2 ln .R/
ˇ ln z ˇ ˇ ln z ˇ
ˇ dz ˇ max ˇ ˇ R R ;
ˇ CR .z2 C a2 /2 ˇ CR ˇ .z2 C a2 /2 ˇ .R2 a2 /2
and the expression on the right-hand side goes to zero as ln R=R3 . The same result
is obtained by using the first (simpler) method.
So, we conclude, there is no contribution coming from the semicircle parts of the
contour. Let us now consider the horizontal parts. On the part R < x < the
phase of the logarithm is , so that ln z D ln jxj C i D ln .x/ C i , and we can
write
Z ˇ ˇ Z R
ln .x/ C i ˇ t D x ˇ ln t C i
dx D ˇ ˇD dt
2 ˇ ˇ
R .x2 C a2 / dt D dx .t2 C a2 /2
Z 1 Z 1
ln t dt
! 2
dt C i DICi J
0 .t2 C a2 / 0 .t2 C a2 /2
whereas for J we obtain exactly the same result =4a3 as in Problem 2.78(a).J
220 2 Complex Numbers and Functions
Fig. 2.35 (a) The contour taken for the calculation of the integral (2.127) in the complex plane.
The contour consists of a closed internal part and a closed external large circle CR . The cut has been
made effectively between points z D a and z D b. (b) The contour to be taken when solving
Problem 2.90(a); it goes all the way around the cut taken along the positive part of the x axis
discussed in Problem 2.24. So, we shall consider the branch cut going to the right
from the point z D a along the x axis as in Fig. 2.11 (where a D 1 and b D 1).
We select such branch of the root-three function for which its phase is between 0
and 2 =3, i.e. we restrict the phase of any z to 0 < arg.z/ < 2 . As f .z/ is only
discontinuous between the points z D a and z D b on the x axis, we choose the
contour as shown in Fig. 2.35(a). It consists of two parts: an external circle CR of
large radius R (which we shall eventually take to infinity) and of a closed path going
around the branch cut. In turn, this latter part is made of two circles C (around
z D a) and C0 (around z D b), and two horizontal parts, one going along the
upper side of the cut, the other along the lower. Let 1 and 2 are the radii of the two
small circles C0 and C , respectively. According to the Cauchy theorem for multiply
connected regions, the sum of the integrals going around these two contours is zero
as there are no poles between them:
2.7 Residues 221
Z Z Z aC 2
Z Z b 1
C C C C D 0: (2.128)
CR C0 b 1 C aC 2
On the large circle z D Rei , the function f .z/ is continuous, and hence we can
calculate the integral as follows:
Z Z 2
dz Riei d
q D q 2
.z C a/2 .z b/
3 0 3
CR
Rei C a Rei b
Z 2 Z 2
Riei d Riei d
! q D D2 i
0 3
.R2 e2i / Rei 0 Rei
2=3 Z
i ei d
D q 1 ;
ei =3
.a C b/2
3
2=3
which tends to zero as 1 when 1 ! 0. Similarly for the other small circle integral
(z D a C 2 ei with 0 < < 2 ):
Z Z 0 i Z 0 i
2 ie d 2 ie d
D q ! 2=3 i2 =3 p
C 2 3
. 2 ei /2 .b a C i / 2 2 e
3
.b C a/
2e
1=3 Z 2
i 2 =3
D p ei d ;
3
.a C b/ 0
1=3
which tends to zero as 2 . So, both small circle integrals give no contribution.
What is left to consider are the two horizontal parts of the internal contour.
Generally z D a C 2 ei 2 or z D b C 1 ei 1 (see Fig. 2.11(a)qfor the definitions of
the two angles 1 and 2 in a similar case), and hence f .z/ D 3 2 i.2 2 C 1 /=3
2 1e . On
the upper side of the cut between the two points 2 D 0, 1 D , 2 D x C a and
1 D b x, so that
q
f .z/ D .x C a/2 .b x/ei =3
3
222 2 Complex Numbers and Functions
and the corresponding integral over the upper side of the cut becomes
Z Z Z
b 1 b 1
ei =3 dx b
ei =3 dx
D q ! q D ei =3
I:
2
.x C a/2 .b x/
aC aC 3 a 3
2 2 .x C a/ .b x/
Interestingly, the result does not depend on the values of a and b. This can also
easily be understood by making a substitution t D .2x C a b/ = .a C b/. J
Problem 2.90. Using the contour shown in Fig. 2.35(b), prove the following
results:
Z 1
x dx a
.a/ D ; where 1 < < 0 I
0 xCa sin . /
Z 1 p Z 1 p
x x ln x
.b/ 2 2
dx D p I .c/ dx D p C ln a :
0 x Ca 2a 0 x C a2
2
2a 2
Problem 2.91. Consider the same integrals using the contour shown in
Fig. 2.33.
with variable functions p.x/ and q.x/ based on a generalised series expansion,
1
X
y.x/ D cr .x x0 /rCs ; (2.130)
rD0
considered on the complex plane, i.e. the variable functions p.z/ and q.z/ and
the solution itself, y.z/, are treated as functions of a complex variable z. What
we shall be interested in here is addressing the following question: under which
circumstances can one apply a generalised series expansion,
1
X
y.z/ D cr .z z0 /rCs ; (2.132)
rD0
in order to solve the DE (2.131), and what is a practical criterion one may apply in
order to determine if this can be done. We shall not be considering the question of
convergence of the series.
We shall first consider the case of the point z0 being an ordinary point, i.e. when
the functions p.z/ and q.z/ are analytic functions within some circle of radius R
centred at z0 , i.e. for all z satisfying jz z0 j < R. Therefore, they can be expanded
in the Taylor series around z0 :
This process can be continued. It is clear, that any derivative y.n/ .z0 / of the solution
can be calculated this way and expressed via the initial conditions and the expansion
coefficients of the functions p.z/ and q.z/. In other words, the solution y.z/ in this
case can be sought in the form of the Taylor expansion
y.z/ D ˛ C ˇu C y2 u2 C :
224 2 Complex Numbers and Functions
The unknown coefficients y2 , y3 , etc., are obtained by substituting the expansion into
the DE, collecting coefficients to the same powers of u and setting them to zero. This
is exactly the method we used in Sect. I.8.4. It can be shown that the series obtained
in this way converges uniformly to the solution of the DE within the same region
jz z0 j < R where the functions p.z/ and q.z/ are analytic.
Consider now a much more interesting (and rather non-trivial) case of the
functions p.z/ and q.z/ having a singularity at z0 . In this case these functions
of coefficients of the DE are in general expanded into infinite10 Laurent series
(Sect. 2.5.4):
1
X 1
X
p.z/ D pk uk and q.z/ D qk uk :
kD1 kD1
The point z0 becomes a branch point. This means that if we consider two linearly
independent solutions y1 .z/ and y2 .z/ of the DE (we assume that they both exist),
then, when passing around z0 along a closed contour starting from a point z, they
arrive to some values yC C
1 .z/ and y2 .z/ when arriving back to z, which are not the
same as the starting points, i.e. y1 .z/ ¤ y1 .z/ and yC
C
2 .z/ ¤ y2 .z/ (the superscript
“C” hereafter indicates the value of a function after completing the full closed
contour around z0 ). However, since y1 and y2 are the two linearly independent
solutions, the new values must be some linear combinations of the same functions
y1 and y2 , i.e.
yC
1 D a11 y1 C a12 y2 and yC
2 D a21 y1 C a22 y2 : (2.133)
Problem 2.92. Prove by contradiction that a11 a22 a12 a21 ¤ 0, as otherwise
the two solutions y1 and y2 , which are assumed to be linearly independent,
appear to be linearly dependent (i.e. y2 D y1 with some complex number ),
which is impossible by the assumption.
Y C D Y H) b1 yC C
1 C b2 y2 D .b1 y1 C b2 y2 / :
10
Either of the series may contain a finite number of terms in its fundamental part which is a
particular case of the infinite series.
2.8 Linear Differential Equations 225
Problem 2.93. Prove this by contradiction, i.e. assuming that Y2 .z/ D Y1 .z/
with some complex number .
It is also easy to see that the eigenvalues are the property of the DE itself;
they do not depend on the particular linear combinations taken as the linearly
independent solutions. Indeed, consider two new functions w1 and w2 defined by
the transformation
w1 c11 c12 y1
WD D D CY:
w2 c21 c22 y2
i.e. f .z/ behaves similarly to Y1 .z/. Hence the function Y1 .z/=f .z/ D
Y1 .z/= .z z0 /s1 does not change when going around the point z0 , i.e. it is single-
valued around the vicinity of this point, except may be at the point z0 itself.
Therefore, it can be expanded in a Laurent series, i.e. one can write
1
X
Y1 .z/ D .z z0 /s1 k .z z0 /k : (2.135)
kD1
has a similar form but with the s2 D ln 2 =2 i instead (the coefficients in the
Laurent series are most likely also different). Recall that the complex numbers s1
and s2 are defined up to arbitrary integers m1 and m2 , respectively. However, this
does not change the general form of the series (2.135) or (2.136) as the Laurent
series in each case contains all possible powers of .z z0 /.
Consider now the case of equal eigenvalues 1 D 2 of the matrix A (or AT ).
Note that the corresponding numbers s1 and s2 may in fact differ by an integer. In
this case Y1 satisfies Y1C D 1 Y1 , while the second possible linearly independent
solution Y2 must generally satisfy Y2C D a21 Y1 C a22 Y2 . The corresponding matrix
A in this case in the basis of Y1 and Y2 (we have learned that the eigenvalues are
invariant with respect to the choice of the basis) would have a11 D 1 and a12 D 0;
therefore, the determinant equation for the eigenvalues is of the form
ˇ ˇ
ˇ 1 a21 ˇ
ˇ ˇ
ˇ 0 a22 ˇ D 0:
The condition that the second root of this equation coincides with the first, 1 , means
that a22 D 1 , so that the second solutions Y2 must in fact satisfy the condition
Y2C D a21 Y1 C 1 Y2 .
Consider now the function
Y2 a21
f .z/ D ln .z z0 / : (2.137)
Y1 2 i1
Upon a complete traverse around z0 this function turns into
2.8 Linear Differential Equations 227
Y2C a21
f C .z/ D Œln .z z0 / C 2 i
Y1C 2 i1
a21 Y1 C 1 Y2 a21
D Œln .z z0 / C 2 i
1 Y 1 2 i1
Y2 a21 a21 Y2 a21
D C Œln .z z0 / C 2 i D ln .z z0 / D f .z/;
Y1 1 2 i1 Y1 2 i1
i.e. the function f .z/ is single-valued inside some circle with the centre at z0 accept
maybe at the point z0 itself, and therefore can be represented by a Laurent series.
This means that the second solution, according to Eq. (2.137), must have the form:
where D a21 =2 i1 is some complex number. Thus, the second solution should
contain a logarithmic term.
So far we did not make any assumptions concerning the nature of the point z0 ;
it could have been either a pole (sometimes also called a regular singular point,
Sect. I.8.4) or an essential singularity (an irregular singular point), see Sect. 2.5.5.
Let us consider specifically the case when the point z0 is a pole and hence each
Laurent series contains a finite number of terms .z z0 /k with negative powers k.
We can then add appropriate integers m1 and m2 to s1 and s2 , respectively, to ensure
that the Laurent series in either of the cases considered above do not contain the
negative powers at all, and this brings us to the method presented in Sect. I.8.4. What
is only left to understand is how one can determine from the DE itself whether the
point z0 is regular or irregular.
To this end, let us assume that we know two linearly independent solutions y1 .z/
and y2 .z/ of the DE (2.131), i.e.
Problem 2.94. Solve the above equations with respect to the functions p.z/
and q.z/ to show that they can be expressed via the solutions y1 and y2 as
follows:
We shall now consider the necessary criterion for the first case, when 1 ¤ 2 .
Since z0 is a pole by our assumption, the two solutions can be written as
where u D z z0 and P1 .u/ and P2 .u/ are two functions which are well defined
(analytic) at u D 0 and some vicinity around it, i.e. they both can be represented
by Taylor expansions around u D 0 with a non-zero free term, i.e. P1 .0/ ¤ 0 and
P2 .0/ ¤ 0. This is because the behaviour of y1 or y2 around u D 0 has already been
described by the corresponding power terms us1 and us2 , and starting from a zero
free term in the expansions of the functions P1 and P2 would simply modify the
exponents s1 and s2 by an integer. We are using here symbolic notations whereby
Pn .u/ represents a Taylor expansion with a non-zero free term and with the index n
numbering various such functions we shall encounter in what follows.
The idea is to obtain a general form of the functions p.z/ and q.z/ from (2.139)
based on the known general form of the solutions in (2.140). To this end, we need
to calculate derivatives of the solutions and then the Wronskian and its derivative.
Since a derivative of a function P1 .u/ is also some expansion P3 .u/, we can write
y01 D s1 us1 1 P1 .u/ C us1 P3 .u/ D us1 1 Œs1 P1 .u/ C uP3 .u/ D us1 1 P4 .u/
and
W 0 D .s1 C s2 1/ us1 Cs2 2 P8 .u/ C us1 Cs2 1 P9 .u/ D us1 Cs2 2 P10 .u/:
Next, using the same method as above, we get y001 D us1 2 P12 .u/. Hence, for q.z/
we obtain
y001 .z/ y0 .z/ us1 2 P5 .u/ P11 .u/ us1 1 P4 .u/ P13 .u/
q.z/ D p.z/ 1 D s C D
y1 .z/ y1 .z/ u P1 .u/
1 u u P1 .u/
s1 u2
P13 .z z0 /
D :
.z z0 /2
Therefore, the necessary conditions for the point z0 to be a regular singular point is
that p.z/, if singular, has a pole of the first order, while q.z/, if singular, has either
2.9 Selected Applications in Physics 229
first or second order pole. Of course, one of these functions may be regular at z0 (i.e.
it may have a finite limit there), but then the other should be singular. The simple
criteria for determining if these conditions are satisfied are then based on calculating
the limits
If both limits result in finite numbers (including zeros), then the point z0 is a regular
singular point. This is exactly the criterion we used in Sect. I.8.4 without proper
proof.
Problem 2.95. Use a similar reasoning to show that a general form of the
functions p.z/ and q.z/ still remains the same if the two independent solutions
have the form (2.135) and (2.138). [Hint: write y2 D y1 . ln u C P2 .u// and
keep explicitly the terms by the logarithm when calculating W; in this way the
logarithms containing terms will cancel out.]
It can be shown that the stated above conditions for the function p.z/ and q.z/ are
also sufficient for the point z0 to be a regular singular point.
Fig. 2.36 Two possible contours which can be used to prove the dispersion relations (2.144). Here
CR is a semicircle of radius R ! 1, while C is a semicircle of radius ! 0 around the point x0
which is passed either above (a) or below (b) the real axis
ˇ ˇ ˇQn ˇ Qn Qn
ˇ Pn .z/ ˇ ˇ ˇ
iD1 .z ai / jz ai j .jzj C jai j/
jf .z/j D ˇˇ ˇD ˇQ ˇˇ D Qm ˇˇ
iD1 ˇ QmiD1 ˇ ˇ
Qm .z/ ˇ ˇ m
ˇ jD1 z bj ˇ jD1 z bj
ˇ
jD1 jzj bj
ˇ ˇ
Qn
.R C jai j/
D QmiD1 ˇ ˇ ; (2.141)
ˇ ˇ
jD1 R bj
where ai and bi are zeros of the two polynomials. The expression in the right-hand
side of (2.141) tends to zero as R ! 1 since the polynomial in the denominator is
of a higher order (m > n) with respect to R than in the numerator. This convergence
to zero does not depend on the phase and hence is uniform. Note that it is not at all
obvious that the same can be proven for a general function f .x/; however, in many
physical applications f .x/ is some rational function of x.
Therefore, let us consider an analytic function f .z/ which tends to zero uniformly
when jzj ! 1. We shall also assume that f .z/ does not have singularities in the
upper half of the complex plane including the real axis itself. Then, if x0 is some
real number,
I
f .z/dz
D 0;
L z x0
according to the Cauchy formula (2.53), where L is an arbitrary loop running in the
upper half plane. The loop may include the real axis, but must avoid the point x0 .
Consider now a particular loop shown in Fig. 2.36(a), where we assume taking the
limits R ! 1 and ! 0. We can write
Z x0 Z Z R Z
C C C D 0; (2.142)
R C x0 C CR
where the integral over the large semicircle CR tends to zero due ˇ to ˇ
uniform
convergence of f .z/ to zero as jzj D R ! 1. Indeed, we can write ˇf Rei ˇ MR
with MR ! 0 when R ! 1. Hence,
2.9 Selected Applications in Physics 231
ˇZ ˇ ˇZ ˇ Z ˇ ˇ
ˇ
ˇ f .z/dz ˇˇ ˇˇ f Rei Riei ˇ
ˇ ˇf Rei ˇ R
ˇ D ˇ d ˇ ˇ i ˇd
CR z x0 ˇ ˇ 0 Rei x0 ˇ 0
ˇRe x0 ˇ
Z
R ˇ i ˇ MR
ˇf Re ˇ d :
R x0 0 1 x0 =R
It is obvious now that the estimate in the right-hand side tends to zero as R ! 1.
Now, let us calculate the integral over the small semicircle where z D x0 C ei . We
have
Z Z 0 i
f .z/dz f x0 C ei ie
D d
C z x0 ei
Z Z
D i f x0 C ei d ! if .x0 / d D i f .x0 /
0 0
in the ! 0 limit. The two integrals along the real x axis in Eq. (2.142) combine
into the Cauchy principal value integral
Z x0 Z 1 Z 1
f .x/
C ! P dx
R x0 C 1 x x0
Remarkably, only the values of the function on the real axis enter this expression!
Normally in applications this formula is presented in a different form in which real
and imaginary parts of the function f .x/ are used:
Z 1 Z 1
1 Im f .x/ 1 Re f .x/
Re f .x0 / D P dx and Im f .x0 / D P dx:
1 x x0 1 x x0
(2.144)
These relationships were discovered by H.A. Kramers and R. Kronig in relation
to real and imaginary parts of the dielectric constant of a material. However, the
physical significance of these relations is much wider. They relate imaginary and
real parts of the time Fourier transform (to be considered in Chap. 5) of a response
function .t / of an observable G.t/ of a physical system subjected to an external
perturbation F.t/ (e.g. a field):
Z t
G.t/ D .t / F. /d: (2.145)
1
Note that the integral here is taken up to the current time t due to causality: the
observable G.t/ can only depend on the values of the field at previous times, it
232 2 Complex Numbers and Functions
cannot depend on the future times > t. An example of Eq. (2.145) could be, for
instance, between the relationship between the displacement vector D.t/ and the
electric field E.t/ in electromagnetism, in which case the response function is the
time dependent dielectric function .t /.
It can be shown using the Fourier transform method that because of the causality
alone (i.e. the response function .t/ D 0 for t < 0), the Kramers–Kronig relations
can be derived independently. Note that the imaginary part of the response function
is responsible for energy dissipation in a physical system, while the real part of
.t / corresponds to a driving force.
Problem 2.97. Prove formula (2.143) using the contour shown in Fig. 2.36(b).
Problem 2.98. Consider specifically a response function .!/ D 1 .!/ C
i2 .!/ with 1 and 2 being its real and imaginary parts and ! > 0 a real
frequency. For physical systems .!/ D .!/, which guarantees that the
physical observables remain real. Show that in this case the Kramers–Kronig
relations read
Z 1 Z 1
2 !2 .!/ 2!0 1 .!/
1 .!/ D P 2 !2
d! and 2 .!/ D P 2 !2
d!:
0 ! 0 0 ! 0
(2.146)
where 1 .!/ D 1 if 1 < ! < 1 and 0 otherwise. Use the following method: (i)
relate first the real part of the function e
K.!/ to 1 .!/ of Eq. (2.123) expressed
via a similar integral; (ii) then, using the Kramers–Kronig relations, determine
the imaginary part of e K.!/.
Let us first discuss a little what a wave is. Consider the following function of the
coordinate x and time t:
If we choose a fixed value of the x, the function ‰ .x; t/ oscillates at that point
with the frequency ! between the values ˙‰0 . Consider now a particular value
of ‰ at time t (x is still fixed). The full oscillation cycle corresponds to the time
2.9 Selected Applications in Physics 233
t C T after which the function returns back to the chosen value. Obviously, this must
correspond to the !T being equal exactly to the 2 which is the period of the sine
function. Therefore, the minimum period of oscillations is given by T D 2 =!.
Let us now make a snapshot of the function ‰ .x; t/ at different values of x but at
the given (fixed) time t. It is also a sinusoidal function shown in blue in Fig. 2.37.
After a passage of small time t the function becomes
and its corresponding snapshot is shown in red in Fig. 2.37. We notice that the
whole shape is shifted to the right by some distance x. This can be calculated
by, e.g. considering the shift of the maximum of the sine function. Indeed, at time
t the maximum of one of the peaks is when kxm !t D =2. At time t C t the
maximum must be at the point xm C xm , where k .xm C xm / ! .t C t/ D =2.
From these two equations we immediately get that xm D !t=k. Therefore, the
function (2.147) “moves” to the right with the velocity vphase D xm =t D !=k.
Over the time of the period T the function would move by the distance D
vphase T D !T=k D 2 =k.
We see that the specific construction (2.147) corresponds to a wave. It could
be, e.g. a sound wave: for the given position x the function ‰ oscillates in time
(particles of air vibrate in time at the given point), but at each given time the
function ‰ at different positions x forms a sinusoidal shape (the displacement of
air particles at the given time changes sinusoidally with x). This sinusoidal shape, if
considered as a function of time, moves undistorted in the positive x direction with
the velocity vphase: This would be the velocity at which the sound propagates: the
sound (oscillation of air particles) created at a particular point x0 (the source) would
propagate along the direction x with the velocity vphase . The latter is called phase
velocity, ! the frequency, k the wave vector and the distance the wave passes over
the time T the wavelength.
234 2 Complex Numbers and Functions
Since the wave depends only on a single coordinate x, points in space with
different y and z but the same x form a plane perpendicular to the x axis and passing
through that value of x. All y and z points lying on this plane would have exactly the
same properties, i.e. they oscillate in phase with each other; this wave is actually a
plane wave since its front is a plane. A point source creates a spherical wave with
the front being a sphere, but at large distances from the source such waves can be
approximately considered as plane waves as the curvature of a sphere of a very large
radius is very small.
The function ‰ satisfies a simple partial differential equation:
@2 ‰ 1 @2 ‰
2
D 2 ; (2.148)
@x vphase @t2
which is called the (one-dimensional) wave equation. We have already come across
wave equations in Sect. I.6.7.1, but these were more general three-dimensional
equations corresponding to waves propagating in 3D space.11
Of course, the cosine function in place of the sine function above can serve
perfectly well as a one-dimensional wave. Moreover, their linear combination
would also describe a wave. Its shape (which moves undistorted with time) is still
perfectly sinusoidal as is demonstrated by the following simple manipulation (where
' D kx !t):
p
A B
A sin ' C B cos ' D A2 C B2 p sin ' C p cos '
A2 C B2 A2 C B2
p p
D A2 C B2 .cos sin ' C sin cos '/ D A2 C B2 sin.' C / ;
where tan D B=A. And of course the form (2.149) still satisfies the same
wave equation (2.148). At the same time, we also know that both sine and cosine
functions can also be written via complex exponentials by virtue of the Euler’s
formulae (2.33). Therefore, instead of Eq. (2.149) we can also write
where C1 and C2 are complex numbers ensuring that ‰ is real. Obviously, since
the second exponential is a complex conjugate of the first, the ‰ will be real if
C2 D C1 . It is readily checked that the function (2.150) still satisfies the same wave
equation (2.148). Although complex exponential functions are perfectly equivalent
to the real sine and cosine functions, they are much easier to deal with in practice,
and we shall illustrate their use now in a number of simple examples from physics.
11
We shall consider the 1D wave equation in more detail in Sect. 8.2.
2.9 Selected Applications in Physics 235
4 1 @D 1 @H
curlH D jC ; divH D 0; curlE D ; divE D 0; (2.151)
c c @t c @t
4 @
curl curlH D curlE C curlE:
c c @t
Using the fact that
(the divergence of E is zero due to the last equation in (2.151)), and the third
equation (2.151) relating curlE to H, we obtain a closed equation for H:
4 @H @2 H
H D 2
C 2 2 : (2.152)
c @t c @t
A similar calculation starting form the third equation in (2.151) results in an
identical closed equation for E:
4 @E @2 E
E D C : (2.153)
c2 @t c2 @t2
Consider now a plane wave propagating along the z direction:
and similarly for the H.z; t/. Substituting this trial solution into Eq. (2.153) should
give us the unknown dependence of the wave vector k on the frequency !. We obtain
r
2!2 4 ! 4 !
k D 2 Ci H) k.!/ D Ci D .n C i/ :
c ! c ! c
(2.155)
236 2 Complex Numbers and Functions
We see that the wave vector k is complex. This makes perfect sense as it corresponds
to the propagating waves attenuating into the media. Indeed, using the value of k in
Eq. (2.154), we obtain
where k0 D !n=c and ı D !=c. It is seen that the amplitude of the wave, E0 eız ,
decays in the media as the energy of the wave is spent on accelerating conducting
electrons in it.
Problem 2.100. Show that the solution of the equation (2.155) with respect to
real and imaginary parts of the wave vector reads
v "r # v "r #
u u
u1 16 2 2 u1 16 2 2
nDt 2 C C and Dt 2 C :
2 !2 2 !2
(2.156)
Problem 2.101. Assume that the conductivity of the media is small as
compared to , i.e. =! . Then show that in this case
p 2
n' and ' p ;
!
and hence n. This means that absorption of energy in such a media is very
small.
4 @H 4 @E
H D 2
and E D 2
; (2.157)
c @t c @t
We assume that the current flows along the x axis:
jx D j.z/ei!t ; jy D 0; jz D 0:
Because of the continuity equation, Sect. I.6.5.1, div j D 0 and hence @jx =@x D
0, i.e. the current cannot depend on x, it can only depend on z (if we assume, in
addition, that there is no y dependence either). This seems a natural assumption for
the conductor which extends indefinitely in the positive and negative y directions:
the current may only depend on the distance z from the boundary of the conductor,
and hence the function j.z/ characterises the distribution of the current with respect
2.9 Selected Applications in Physics 237
to that distance. To find this distribution, we note that the current and the electric
field are proportional, j D E, and hence only the x component of the field E
remains, Ex .z; t/ D E.z/ei!t , where j.z/ D E.z/. Substituting this trial solution
into the second equation (2.157) for the field E, we obtain
@2 E.z/ 4 !
2
D i 2 E.z/: (2.158)
@z c
Problem 2.102. Assuming an exponential solution, E.z/ ez , show that the
two possible values of the exponent are
1 C ip 1Ci
1;2 D ˙ 2 !D˙ ; (2.159)
c ı
Physically, it is impossible that the field (and, hence, the current) would increase
indefinitely with z; therefore, the first term above leading to such a nonphysical
behaviour must be omitted (A D 0), and we obtain
The current jx .z; t/ D Ex .z; t/ behaves similarly. It appears then that the field and
the current decay exponentially into the conductor remaining non-zero within only
a thin layer
p near its boundary, with the width of the layer being of the order of
ı D c= 2 !. The width decays as ! 1=2 with the frequency !, i.e. the effect is
more pronounced at high frequencies. This is called the skin effect. As ! ! 0 the
“skin” width ı ! 1, i.e. there is no skin effect anymore: the direct (not alternating)
current is distributed uniformly over the conductor.
Problem 2.103. Using the third equation in (2.151), show that the magnetic
field is
.1 i/ c
Hx D 0; Hy .z; t/ D Ex .z; t/ ; Hz D 0;
!ı
i.e. the magnetic field also decays exponentially into the body of the conductor
remaining perpendicular to the electric field. Numerical estimates show that
at high frequencies ! the magnetic field is much larger than the electric field
within the skin layer.
238 2 Complex Numbers and Functions
„2 00
.x/ C V.x/ .x/ D E .x/; (2.161)
2m
where „ is the Planck constant (divided by 2 ), m electron mass and E is the electron
energy.
Here we shall consider a number of problems where an electron behaves as a
wave which propagates in space with a particular momentum and may be reflected
from a potential barrier; at the same time, we shall also see that under certain
circumstances the wave may penetrate the barrier and hence transmit through it,
which is a purely quantum effect called tunneling.
Consider first the case of an electron moving in a free space where V.x/ D 0.
The solution of the differential equation (2.161) can be sought in a form of an
exponential, .x/ eikx , which upon substitution yields E D „2 k2 =2m, the energy
of a free electron. The energy is positive and continuous. The wave function of
the electron corresponds to two possible values of the wave vectorp k (in fact, it is
just a number in the one-dimensional case), which are: k˙ D ˙ 2mE=„ D ˙k.
Correspondingly, the wave function can be written as a linear combination
where A and B are complex constants. The first term in .x/ above describes the
electron moving along the positive x direction, whereas the second term—in the
negative one.
Let us now consider a more interesting case of an electron hitting a wall, as
shown in Fig. 2.38(a). We shall consider a solution of the Schrödinger equation
corresponding to the electron propagating to the right from x D 1. In this case in
the region x < 0 (the left region) the wave function of the electron
Fig. 2.38 (a) The potential V.x/ makes a step of height V0 at x D 0, so that the electron wave
D eikx propagating towards the step (to the right), although partially penetrating into the step,
will mostly reflect from it propagating to the left as eikx ; (b) in the case of the potential
barrier of height V0 and width d the electron wave propagates through the barrier as eikx ,
although a partial reflection from the barrier is also taking place; (c) the same case as (b), but in
this case a bias < 0 is applied to the left electrode, so that the potential experienced by the
electrons U D e > 0 (e < 0 is the electron charge) on the left becomes higher than on the right
causing a current (a net flow of electrons to the right)
p
where g D 2m .E V0 /=„. We shall consider the most interesting case of the
electron energy E < V0 . In a classical setup of the problem the electron would not be
able to penetrate into the right region with this energy; however,
p quantum mechanics
allows for some penetration. Indeed, in this case g D i 2m .V0 E/=„ D i
becomes purely imaginary and hence the wave function in the barrier region
becomes
Since > 0, the second term must be dropped as it leads to an infinite increase of
the wave function at large x which is nonphysical (recall that j .x/j2 represents the
probability density). Therefore,
To find the amplitudes in Eqs. (2.162) and (2.163), we exploit the fact that the
wave function .x/ and its first derivative must be continuous across the whole x
axis. This means that the following conditions must be satisfied at any point x D xb :
ˇ ˇ
d ˇ d ˇ
.xb 0/ D .xb C 0/ and ˇ
.x/ˇ D .x/ˇˇ ; (2.164)
dx xDxb 0 dx xDxb C0
where C0 and 0 correspond to the limits of x ! xb from the right and left, respec-
tively. In our case the potential makes a jump at xb D 0 and hence this particular
240 2 Complex Numbers and Functions
Using these conditions, we immediately obtain two simple equations for the
amplitudes:
We see that in this case all of the incoming wave gets reflected from the step;
however, some of the wave gets transmitted into the step through the wall: the
probability density P.x/ D j R .x/j2 D jCj2 e2x is non-zero within a small region
of the width x 1=2 behind the wall of the step, although it decays exponentially
inside the step. This behaviour is entirely due to the quantum nature of the electron.
Since no electron is to be determined at x D C1, the transmission through the step
is zero in this case. Note that x ! 0 when V0 ! 1, i.e. the region behind the wall
(x > 0) is forbidden even for quantum electrons when the step is infinitely high.
A very interesting case from the practical point of view is a step of a finite
width shown in Fig. 2.38(b). In this case two solutions of the same energy E of the
Schrödinger equation exist: one corresponding to the wave propagating from left to
right, and one in the opposite direction. Let us first consider the wave propagating
to the right. Three regions are to be identified: x < 0 (left, L), 0 < x < d (central, C)
and x > d (right, R). The corresponding solutions of the Schrödinger equation for
all these regions are
p p
where k D 2mE=„ and q D 2m .E V0 /=„. We have set the amplitude of the
wave incoming to the barrier in L to one as the amplitudes can only be determined
relatively to this one anyway; also note that only a single term exists for the right
wave, R , as we are considering the solution propagating to the right.
while in the case of 0 < E < V0 (when q becomes imaginary) they are
2
C k2 sinh .d/
RD 2 ;
.k 2 / sinh .d/ 2ik cosh .d/
2ikeikd
TD ;
.k2 2 / sinh .d/ 2ik cosh .d/
k .i C k/ ed
CD and
.k2 2 / sinh .d/ 2ik cosh .d/
k .i k/ ed
DD ;
.k2 2 / sinh .d/ 2ik cosh .d/
p
where D 2m .V0 E/=„.
242 2 Complex Numbers and Functions
where jŒ states explicitly that the current depends on the wave function
to be used (it is said that the current is a functional of the wave function; we
shall consider this notion in more detail in Chap. 9) and e is the (negative)
electron charge. The current should not depend on the x, i.e. it is expected that
it is constant across the whole system. Show that the currents calculated using
L .x/ (i.e. in the left region of Fig. 2.38(b)) and R .x/ (in the right region) are,
respectively:
e„k e„k
jL D 1 jRj2 and jR D jTj2 : (2.168)
m m
Then demonstrate by a direct calculation based on the expressions for the
amplitudes given in Problem 2.104 that
jTj2 D 1 jRj2 :
This identity guarantees that the current on the left and on the right due to an
electron of any energy E is the same.
Problem 2.107. The reflection coefficient r is defined as the ratio of the current
(which is basically a flux of electrons) of the reflected wave, j Reikx (here we
assume jŒ calculated specifically for D Reikx ), to the current due to the
incoming one, j eikx , while the transmission coefficient t is defined as the ratio
of the transmitted, j Teikx , and the incoming waves. Show that
j Reikx j Teikx
rD D jRj2 and tD D jTj2 :
j Œeikx j Œeikx
Considering specifically the two possible cases for the electron energy, show
that:
2 2 " 2 #1
q k2 k2 q2 2
r.E/ D ; t.E/ D 1 C sin .qd/
.q2 C k2 /2 C 4k2 q2 cot2 .qd/ 2kq
when E > V0
(continued)
2.9 Selected Applications in Physics 243
where the energy dependence comes from that of k and as given above.
Problem 2.108. Consider now the other solution of the Schrödinger equation
corresponding to the propagation of an electron of the same energy E from right
to left. In this case the solutions for the wave functions in the three regions we
are seeking are
Therefore, the net current due to electrons traveling from the left and from the
right, which is the difference of the two, is zero. This result is to be expected as the
situation is totally symmetric: there is no difference between the two sides (x < 0
and x > d) of the system. In order for the current to flow, it is necessary to distort this
balance, and this can be achieved by applying a bias to the system which would
result in the electrons on the right and left to experience a different potential V.x/.
The simplest case illustrating this situation is shown in Fig. 2.38(c) where, because
of the bias < 0 applied to the left electrode, the potential V.x/ D U D e > 0
in the left region (x < 0), while V D 0 in the right one (x > d). Recall that e is the
(negative) electron charge.
p p p
where k D 2m .E U/=„, q D 2m .E V0 /=„ and p D 2mE=„, and
then determine the amplitudes R! and T ! ; you should get these:
(continued)
244 2 Complex Numbers and Functions
These formulae are valid for any energy E; specifically, q D i when E <
V0 and the sine and cosine functions are understood in the proper sense as
functions of the complex variable.
The most interesting case to consider is when electrons are with energies
U < E < V0 which corresponds to the tunneling. Then, show that for these
energies:
2 4k2 2
t! D jT ! j D 2
;
. 2 pk/ sinh2 .d/ C 2 .p C k/2 cosh2 .d/
2 2
2 C pk sinh2 .d/ C 2 .p k/2 cosh2 .d/
r! D jR! j D :
. 2 pk/2 sinh2 .d/ C 2 .p C k/2 cosh2 .d/
Using the complete wave functions in the left and right regions, show that the
corresponding currents are
Note that j! !
L ¤ jR . Next, consider the reverse current, from right to left.
Repeating the calculations, show that in this case (for any E):
2ipd q .p k/ C i q2 pk tan .qd/
R De and
q .k C p/ i .q2 C pk/ tan .qd/
2pqeipd
T D : (2.172)
q .k C p/ cos .qd/ i .q2 C pk/ sin .qd/
2 p2 ! 2
t D jT j D t and r D jR j D r! :
k2
e„ 4 2 pk .k p/
jD 2
:
m . 2 pk/ sinh2 .d/ C 2 .p C k/2 cosh2 .d/
The problem we have just considered is a very simplified model for a very
real and enormously successful experimental tool called Scanning Tunneling
Microscopy (STM) for invention of which Gerd Binnig and Heinrich Rohrer
received the Nobel Prize in Physics in 1986. Schematically STM is shown in
Fig. 2.39(a). An atomically sharp conducting tip is brought very close to a con-
ducting surface and the bias is applied between the two. As the distance between
the tip apex atom (the atom which is at the tip end, the closest to the surface)
and the surface atoms reaches less than '10 Å, a small current (in the range
of pA to nA) is measured due to tunneling electrons: the vacuum gap between
the two surfaces serves as a potential energy barrier we have just considered in
Problem 2.109. Most importantly, the current depends on the lateral position of
the tip with respect to the atomic structure of the surface. This allows obtaining in
many cases atomic resolution when scanning the sample with the STM. Nowadays
Fig. 2.39 (a) A sketch of the Scanning Tunneling Microscope (STM): an atomically sharp tip
is placed above a sample surface 4–10 Å from it. When a bias is applied to the tip and sample,
electrons tunnel between the two surfaces. (b) STM image of an Si(111) 77 reconstructed surface
with a small coverage of C60 molecules on top of it; bias voltage of 1:8 V and the tunnel current
of 0.2 nA were used (reproduced with permission from J.I. Pascual et al. —Chem. Phys. Lett. 321
(2000) 78)
246 2 Complex Numbers and Functions
the STM is widely used in surface physics and chemistry as it allows not only
imaging atoms and adsorbed molecules on crystal surfaces, but also moving them
(this is called manipulation) and performing local chemical reactions (e.g. breaking
a molecule into parts or fusing parts together). In Fig. 2.39(b) a real STM image of
C60 molecules adsorbed on top of the Si(111) reconstructed surface is shown. You
can clearly see atoms of the surface as well as the molecules themselves as much
bigger circles. Atoms on the surface form a nearly perfect arrangement, however,
many defects are visible. The molecules are seen as various types of bigger balls
with some internal structure (the so-called sub-molecular resolution): carbon atoms
of the molecules can be distinguished as tiny features on the big circles. Analysing
this substructure, it is possible to figure out the orientation of the molecules on the
surface. Different sizes of the balls correspond most likely to different adsorption
positions of the molecules.
Consider
an n-level quantum system described by the n n Hamiltonian matrix
H D H kj . If ‰0 D . k / is a vector-column of the system wave function at time
t, then at time t0 > t its wave function can be formally written as ‰t0 D Ut0 t ‰t ,
where Ut0 t is a matrix to be calculated. Due to its nature it is called the propagation
matrix as it describes a propagation of the system state vector ‰t from the initial
time t to the final time t0 . Let us calculate the propagation matrix in the case when
the Hamiltonian matrix H does not depend on time.
We shall start by solving approximately the time dependent Schrödinger equation
d‰t
i„ D H‰t (2.173)
dt
for a very small propagation time t. In this case the time derivative of the wave
function vector can be approximated as
Now, let k and xk be the eigenvalues and eigenvectors of the matrix H. The latter
is Hermitian, H D H, and hence the eigenvalues are real and eigenvectors can
P an orthonormal set, xk xj D ıkj .
always be chosen in such a way that they comprise
Expand ‰t in terms of the eigenvectors, ‰t D k ˛k xk . Multiplying both sides of
this equation by xk from the left and using the orthonormality relation between the
eigenvectors, the coefficients ˛k can be found as ˛k D xk ‰t . Hence, one can then
rewrite Eq. (2.174) as:
m X X m
t t
‰tCmt ' 1C H ˛k xk D ˛k 1 C H xk
i„ k k
i„
X t
m
D ˛k 1 C k xk :
k
i„
At the last step we used the spectral theorem, Eq. (1.90), stating that a function of a
matrix, acting on its eigenvector, can be replaced with the same function calculated
at the corresponding eigenvalue. Let t0 D t C mt be the finite time after the
propagation. Then t D .t0 t/ =m and we obtain
X m
k .t0 t/ =i„
‰tCmt D ‰t0 ' xk ‰t 1C xk :
k
m
The above formula becomes exact in the limit of t ! 0 or, which is equivalent,
when m ! 1. As we already know (see Sect. 2.3.3), that limit is equal to the
exponential function:
X
k .t0 t/ =i„ m
‰ D
t0 lim 1 C
xk ‰t xk
k
m!1 m
" #
X X
k .t0 t/=i„ k .t0 t/=i„
D xk ‰t e xk D e xk xk ‰t :
k k
In the square brackets we recognise the spectral theorem expansion of the matrix
0
eH.t t/=i„ which serves then as the required propagation matrix:
0 0
Ut0 t D eH.t t/=i„ D eiH.t t/=„ :
Chapter 3
Fourier Series
In Sect. I.7.21 of the first book we investigated in detail how an arbitrary function
f .x/ can be expanded into a functional series in terms of functions an .x/ with
n D 0; 1; 2; : : :, i.e.
1
X
f .x/ D f0 a0 .x/ C f1 a1 .x/ C D fn an .x/: (3.1)
nD0
There are many applications in which one has to solve ordinary or partial
differential equation with respect to functions which are periodic:
f .x C T/ D f .x/: (3.2)
1
In the following, references to the first volume of this course (L. Kantorovich, Mathematics for
natural scientists: fundamentals and basics, Springer, 2015) will be made by appending the Roman
number I in front of the reference, e.g. Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of
the first volume, respectively.
Consider functions
n x n x
n .x/ D cos and n .x/ D sin ; (3.3)
l l
where n D 0; 1; 2; : : :. As can easily be seen, they have the same periodicity T for
any n, i.e.
n .x C 2l/ n x n x
cos D cos C 2 n D cos and
l l l
n .x C 2l/ n x n x
sin D sin C 2 n D sin :
l l l
What we would like to do is to understand whether it is possible to express f .x/ as
a linear combination of all these functions for all possible values of n from 0 to 1.
We shall start our discussion by showing that the functions (3.3) have a very simple
and important property. Namely, they satisfy the following identities for any n ¤ m:
Z l Z l
n .x/ m .x/dx D 0 and n .x/ m .x/dx D 0; (3.4)
l l
Indeed, let us first prove Eqs. (3.4). Using trigonometric identities from Sect. I.2.3.8,
one can write
n x m x 1 .n C m/ x .n m/ x
n .x/ m .x/ D cos cos D cos C cos :
l l 2 l l
Note that for any different n and m, the integer numbers k D n ˙ m are never equal
to zero. But then for any k ¤ 0:
Z ˇ
l
k x l k x ˇˇl l
cos dx D sin ˇ D Œsin k sin .k / D 0;
l l k l l k
so that the first integral in Eq. (3.4) is zero.
The relations we have found can now be conveniently rewritten using the Kronecker
symbol ınm which, we recall, is by definition equal to zero if n ¤ m and to unity if
n D m. Then we can write
Z l Z l
n .x/ m .x/dx D lınm and n .x/ m .x/dx D lınm : (3.6)
l l
Thus, we find that the integral between l and l (the interval of periodicity of f .x/)
of a product of any two different functions taken from the set f n ; n g of Eq. (3.3)
is always equal to zero. We conclude then that these functions are orthogonal, i.e.
they form an orthogonal set of functions. We shall have a more detailed look at an
expansion via orthogonal sets of functions in Sect. 3.7.3.
Problem 3.3. Prove that Eqs. (3.6) are valid also for the integral limits l C c
and l C c with c being any real number. This means that the integrals can be
taken between any two limits x1 and x2 , as long as the difference between them
x2 x1 D 2l D T is equal to the period.
Let us now return to our function f .x/ that is periodic with the period of T D 2l,
and let us assume that it can be represented as a linear combination of all functions
of the set f n ; n g, i.e. as an infinite functional series
a0 X n n xo
1 1
a0 X n x
f .x/ D C fan n .x/ C bn n .x/g D C an cos C bn sin :
2 nD1
2 nD1
l l
(3.7)
Note that 0 .x/ D 0 and thus has been dropped; also, 0 .x/ D 1 and hence the a0
term has been separated from the sum with its coefficient chosen for convenience
as a0 =2. Note that f .x/ in the left-hand side of the expansion above, as we shall
see later on in Sect. 3.2, may not coincide for all values of x with the value which
the series in the right-hand side converges to; in other words, the two functions, f .x/
and the series itself, may differ at specific points. We shall have a proper look at this
particular point and the legitimacy of this expansion later on in Sect. 3.7.
Now, what we would like to do is to determine the coefficients a0 ; a1 ; a2 ; a3; ; : : :
and b1 ; b2 ; b3 ; : : :, assuming that such an expansion exists. To this end, let us also
252 3 Fourier Series
assume that we can integrate both sides of Eq. (3.7) from l to l term-by-term
(of course, as we know well from Sect. I.7.2, this is not always possible). Thus,
we have
Z l Z l X1 Z l Z l
a0
f .x/dx D dx C an n .x/dx C bn n .x/dx :
l l 2 nD1 l l
The integrals in the right-hand side for n 1 in the curly brackets are all equal to
zero (this can be checked either by a direct calculation, or also from the fact that
these integrals can be considered as orthogonality integrals (3.6) with 0 .x/ D 1),
so that the .a0 =2/ 2l D a0 l is obtained in the right-hand side, yielding
Z
1 l
a0 D f .x/dx (3.8)
l l
The first term in the right-hand side is zero since m ¤ 0, similarly to above. In the
same fashion, due to Eq. (3.5), the second integral in the curly brackets is also equal
to zero, and we are left with
Z l 1
X Z l
f .x/ m .x/dx D an n .x/ m .x/dx:
l nD1 l
Now in the right-hand side we have an infinite sum of terms containing the same
integrals as in Eq. (3.6) which are all equal to zero except for the single one in
which n D m, i.e. only a single term in the sum above survives
Z l 1
X Z l 1
X
f .x/ m .x/dx D an n .x/ m .x/dx D an lınm D am l;
l nD1 l nD1
Note that a0 of Eq. (3.8) can also formally be obtained from Eq. (3.9) although
the latter was, strictly speaking, obtained for non-zero values of m only. This
became possible because of the factor of 1=2 introduced earlier in Eq. (3.7), which
3.1 Trigonometric Series: An Intuitive Approach 253
now justifies its convenience. Therefore, Eq. (3.9) gives all an coefficients. Note,
however, that in practical calculations it is frequently required to consider the
coefficients a0 and an for n > 0 separately.
Thus, the formulae (3.8)–(3.10) solve the problem: if the function f .x/ is
known, then we can calculate all the coefficients in its expansion of Eq. (3.7). The
coefficients an and bn are called Fourier coefficients, and the infinite series (3.7)
Fourier series.
Example 3.1. I Consider a periodic function with the period of 2 specified in
the following way: f .x/ D x within the interval < x < , and then repeated
like this to the left and to the right. This function, when periodically repeated, jumps
between its values of ˙1 at the points ˙k for any integer k D 1; 3; 5; : : :. Calculate
the expansion coefficients an and bn and thus write the corresponding Fourier series.
Solution. In this case l D , and the formulae (3.8)–(3.10) for the coefficients an
and bn are rewritten as:
Z
1
am D x cos .mx/ dx D 0; m D 0; 1; 2; : : : ;
and
Z ˇ Z
1 1 cos.mx/ ˇˇ 1
bm D x sin .mx/ dx D x ˇ Cm cos .mx/ dx
m
8 9
ˆ
ˆ >
ˆ
< ˇ > >
=
1 cos.m / cos.m / 1 ˇ
D C 2 sin.mx/ˇˇ
ˆ
ˆ m m m >
>
:̂ „ ƒ‚ …> ;
D0
2 2
D .1/m D .1/mC1 ; m ¤ 0
m m
(the integration by parts was used). Note that am D 0 for any m because the function
under the integral for am is an odd function and we integrate over a symmetric
interval. Thus, in this particular example the Fourier series consists only of sine
functions:
X1
2
f .x/ D .1/mC1 sin.mx/: (3.11)
mD1
m
254 3 Fourier Series
2.5 n=3
0
-2.5
2.5 n=5
0
-2.5
fn(x)
2.5 n=10
0
-2.5
2.5 n=20
0
-2.5
-15 -10 -5 0 5 10 15
x
Fig. 3.1 Graphs of fn .x/ corresponding to the first n terms in the series of Eq. (3.11)
The convergence of the series is demonstrated in Fig. 3.1: the first n terms in the
series are only accounted for, i.e. the functions
Xn
2
fn .x/ D .1/mC1 sin.mx/
mD1
m
for several choices of the upper limit n D 3; 5; 10; 20 are plotted. It can be seen
that the series converges very quickly to the exact function between and .
Beyond this interval the function is periodically repeated. Note also that the largest
error in representing the function f .x/ by the series appears at the points ˙k (with
k D 1; 3; 5; : : :) where f .x/ jumps between the values ˙1. J
The actual integration limits from l to Cl in the above formulae were chosen
only for simplicity; in fact, due to periodicity of f .x/, cos .m x=l/ and sin .m x=l/,
one can use any limits differing by 2l, i.e. from l C c to l C c for any value of c
(Problem 3.3). For instance, in some cases it is convenient to use the interval from
0 to T D 2l.
The Fourier expansion can be handy in summing up infinite numerical series as
illustrated by the following example:
Example 3.2. I Show that
1 1 1
SD1 C C D : (3.12)
3 5 7 4
3.1 Trigonometric Series: An Intuitive Approach 255
Solution. Consider the series (3.11) generated in the previous Example 3.1 for
f .x/ D x, < x < , and set there x D =2:
X1
2 m
D .1/mC1 sin :
2 mD1
m 2
The sine functions are non-zero only for odd values of m D 1; 3; 5; : : :, so that
we obtain (convince yourself in what follows by calculating the first few terms
explicitly):
1 1 1
D 2 1 C C D 2S;
2 3 5 7
so that S D =4 as required. J
Problem 3.5. Show that if the function f .x/ D f .x/ is even, the Fourier series
is simplified as follows:
1 Z
a0 X nx 2 l
nx
f .x/ D C an cos with an D f .x/ cos dx: (3.13)
2 nD1
l l 0 l
The Fourier series for a periodic function f .x/ can be written in an alternative
form which may better illuminate its meaning. Indeed, the expression in the curly
brackets of Eq. (3.7) can be rewritten as a single sine function:
n x n x
an cos C bn sin D An sin .2 n t C n/ ;
l l
p
where An D a2n C b2n is an amplitude and n a phase satisfying
an bn
sin n Dp and cos n Dp ;
a2n C b2n a2n C b2n
256 3 Fourier Series
and n D n=T D n= .2l/ is the frequency. Correspondingly, the Fourier series (3.7)
is transformed into
1
a0 X
f .x/ D C An sin .2 n x C n/ : (3.15)
2 nD1
This simple result means that any periodic function can be represented as a sum
of sinusoidal functions with discrete frequencies 1 D 0 , 2 D 20 , 3 D 30 ,
etc., so that n D n0 , where 0 D 1=T is the smallest (the so-called fundamental)
frequency, and n D 1; 2; 3; : : :.
For instance, let us consider some signal f .t/ in a device or an electric circuit
which is periodic in time t with the period of T. Then, we see from Eq. (3.15)
that such a signal can be synthesised by forming a linear superposition of simple
harmonic signals having discrete frequencies n D n0 and amplitudes An . As we
have seen from the examples given above, in practice very often only a finite number
of lowest frequencies may be sufficient to faithfully represent the signal, with the
largest error appearing at the points where the original signal f .t/ experiences
jumps or changes most rapidly. It also follows from the above equations that when
expanding a complicated signal f .t/ there is a finite number of harmonics with
relatively large amplitudes, then f .t/ could be reasonably well represented only by
these.
The discussion above was not rigorous: firstly, we assumed that the expansion (3.7)
exists and, secondly, we integrated the infinite expansion term-by-term, which
cannot always be done. A more rigorous formulation of the problem is this: we
are given a function f .x/ specified in the interval2 l < x < l. We then form an
infinite series (3.7) with the coefficients calculated via Eqs. (3.8)–(3.10):
a0 X n n xo
1
n x
C an cos C bn sin :
2 nD1
l l
We ask if for any l < x < l the series converges to some function fFS .x/ , and if
it does, would the resulting function be exactly the same as f .x/, i.e. is it true that
fFS .x/ D f .x/ for all values of x? Also, are there any limitations on the function
f .x/ itself for this to be true? The answers to all these questions are not trivial and
the corresponding rigorous discussion is given later on in Sect. 3.7. Here we shall
simply formulate the final result established by Dirichlet.
2
Or, which is the same, which is periodic with the period of T D 2l; the “main” or “irreducible”
part of the function, which is periodically repeated, can start anywhere, one choice is between l
and l, the other between 0 and 2l, etc.
3.2 Dirichlet Conditions 257
Fig. 3.2 The function f .x/ is equal to f1 .x/ for x < x0 and to f2 .x/ for x > x0 , but is discontinuous
(makes a “jump”) at x0 . However, finite limits exist at both sides of the “jump” corresponding to
the two different values of the function on both sides of the point x D x0 : on the left, f .x 0/
D limx!x0 f1 .x/ D f1 .x0 /, while on the right f .x C 0/ D limx!x0 f2 .x/ D f2 .x0 /, where f1 .x0 / ¤
f2 .x0 /
We first give some definitions (see also Sect. I.2.4.3). Let the function f .x/ be
discontinuous at x D x0 , see Fig. 3.2, but have well-defined limits x ! x0 from the
left and from the right of x0 (see Sect. I.2.4.1.2), i.e.
with ı > 0 in both cases. It is then said that f .x/ has a discontinuity of the first
kind at the point x0 . Then, the function f .x/ is said to be piecewise continuous in the
interval a < x < b, if it has a finite number n < 1 of discontinuities of the first kind
there, but otherwise is continuous everywhere, i.e. it is continuous between any two
adjacent points of discontinuity.
The function f .x/ D x, < x < , of Example 3.1, when periodically
repeated, represents an example of such a function: in any finite interval crossing
points ˙ ; ˙2 ; : : :, it has a finite number of discontinuities; however, at each
discontinuity finite limits exist from both sides. For instance, consider the point
of discontinuity x D . Just on the left of it f .x/ D x and hence the limit from the
left f . 0/ D , while on the right of it f . C 0/ D due to periodicity of f .x/.
Thus, at x D we have a discontinuity of the first kind.
Then, the following Dirichlet theorem addresses the fundamental questions about
the expansion of the function f .x/ into the Fourier series:
Theorem 3.1. If f .x/ is piecewise continuous in the interval l < x < l and has
the period of T D 2l, then the Fourier series
a0 X n n xo
1
n x
fFS .x/ D C an cos C bn sin (3.16)
2 nD1
l l
converges to f .x/ at any point x where f .x/ is continuous, while it converges to the
mean value
1
fFS .x0 / D Œf .x0 0/ C f .x0 C 0/ (3.17)
2
at the points x D x0 of discontinuity.
258 3 Fourier Series
The proof of this theorem is quite remarkable and will be given in Sect. 3.7
using additional assumptions. Functions f .x/ satisfying conditions of the Dirichlet
theorem are said to satisfy Dirichlet conditions.
As an example, consider the function f .x/ D x, < x < , with the period 2
(Example 3.1) whose Fourier series is given by Eq. (3.11):
X1
2
fFS .x/ D .1/mC1 sin.mx/:
mD1
m
What values does fFS .x/ converge to at the points x D ; 0; =2; ; 3 =2? To
answer these questions, we need to check if the function makes a jump at these
points. If it does, then we should consider the left and right limits of the function
at the point of the jump and calculate the average (the mean value); if there is no
jump, the Fourier series converges at this point to the value of the function itself.
f .x/ is continuous at x D 0; =2; 3 =2 and thus fFS .0/ D f .0/ D 0, fFS . =2/ D
f . =2/ D =2 and fFS .3 =2/ D f .3 =2/ D f .3 =2 2 / D f . =2/ D =2
(by employing T D 2 periodicity of f .x/, we have moved its argument 3 =2
back into the interval < x < ), while at x D the function f .x/ has the
discontinuity of the first kind with the limits on both sides equal to C (from the
left) and (right), respectively, so that the mean is zero, i.e. fFS . / D 0. This is
also clearly seen in Fig. 3.1.
2 1
X 4.1/n
f .x/ D C cos.nx/: (3.18)
3 nD1
n2
Problem 3.10. Show that the sine/cosine Fourier series expansion of the
function f .x/ D 1 for 0 < x < and f .x/ D 0 for < x < 0 is
1
1 1 X 1 .1/n
f .x/ D C sin.nx/: (3.19)
2 nD1
n
X .1/nC1 1 2
1 1 1
SD1 C C D D : (3.20)
22 32 42 nD1
n2 12
Problem 3.13. Use the Fourier series (3.19) to show that the numerical series
1 1
SD1 C D : (3.21)
3 5 4
The Fourier series is an example of a functional series and hence it can be inte-
grated term-by-term if it converges uniformly (Sect. I.7.2). However, the uniform
convergence is only a sufficient criterion. We shall prove now that irrespective of
whether the uniform convergence exists or not, the Fourier series can be integrated
term-by-term any number of times.
To prove this, it is convenient to consider an expansion of f .x/ with a0 D 0; it
is always possible by considering the function f .x/ a0 =2 ! f .x/ whose Fourier
expansion would have no constant term, i.e. for this function a0 D 0. Then, we
consider an auxiliary function:
Z x
F.x/ D f .t/dt: (3.22)
l
Z xC2l Z x Z xC2l Z x Z 2l
F.x C 2l/ D f .t/dt D f .t/dt C f .t/dt D f .t/dt C f .t/dt
l l x l 0
Z x
D f .t/dt D F.x/:
l
Further, if f .x/ satisfies the Dirichlet conditions (is piecewise continuous), its
integral, F.x/, will as well. Therefore, F.x/ can also formally be expanded into a
Fourier series:
A0 X h nx i
1
nx
F.x/ D C An cos C Bn sin ; (3.23)
2 nD1
l l
where
Z Z Z
1 l
1 l
1 l
A0 D F.x/dx D xF.x/jll xf .x/dx D xf .x/dx;
l l l l l l
where we have used integration by parts in each case and the fact that F 0 .x/ D f .x/
and F.˙l/ D 0. Above, an and bn are the corresponding Fourier coefficients to
the cosines and sines in the Fourier expansion of f .x/. Therefore, we can write the
expansion of F.x/ as follows:
Z Z 1
X
x
1 l l
nx l nx
F.x/ D f .t/dt D bn cos C
tf .t/dt Can sin :
l l nD1
2l n l n l
(3.24)
Further, at x D l we should have zero in both sides, which means that
Z X 1 Z 1
X
1 l
l 1 l
l .1/n
tf .t/dt bn cos . n/ D 0 H) tf .t/dt D bn ;
2l l nD1
n 2l l nD1
n
3.3 Integration and Differentiation of the Fourier Series 261
Since f1 .x1 / D x1 in the interval under consideration, the integral in the left-hand
side gives x2 =2. Integrating the sine functions in the right-hand side, we obtain
X1
x2 2 1
D .1/mC1 Œ cos.mx/ C 1
2 mD1
m m
262 3 Fourier Series
10
n=10
n=3
8 n=2
6
fn(x)
0
-10 0 10
X
Fig. 3.3 The partial Fourier series of f .x/ D x2 , see Eq. (3.26), containing n D 2, 3 and 10 terms
X1 X1
2.1/mC1 .1/mC1
D 2
cos.mx/ C 2 :
mD1
m mD1
m2
The numerical series (the last term) can be shown (using the direct method for
f .x/ D x2 , i.e. expanding it into the Fourier series, see Problems 3.8 and 3.12)
to be equal to 2 =12,
X1 2
.1/mC1
D ;
mD1
m2 12
The convergence of this series with different number of terms n in the sum (m n)
is pretty remarkable, as is demonstrated in Fig. 3.3. J
The situation with term-by-term differentiation of the Fourier series is more
complex since each differentiation of either cos . nx=l/ or sin . nx=l/ brings in
an extra n in the sum which results is slower convergence or even divergence. For
example, if we formally differentiate term-by-term formula (3.11) for f .x/ D x,
< x < , we obtain
1
X
1D 2.1/mC1 cos.mx/;
mD1
3.4 Parseval’s Theorem 263
which contains the diverging series. There are much more severe restrictions on
the function f .x/ that would enable its Fourier series to be differentiable term-by-
term. Therefore, this procedure should be performed with caution and proper prior
investigation.
Consider a periodic function f .x/ with the period of 2l satisfying Dirichlet condi-
tions, i.e. f .x/ has a finite number of discontinuities of the first kind in the interval
l < x < l. Thus, it can be expanded in the Fourier series (3.16). If we multiply this
expansion by itself, a Fourier series of the square of the function, f 2 .x/, is obtained.
Next, we integrate both sides from l to l (we know by now that the term-by-term
integration of the Fourier series is always permissible), which gives
Z l Z l( )
a0 X h n x i n a0
1
2 n x
fFS .x/dx D C an cos C bn sin
l l 2 nD1
l l 2
1 h
)
X m x m xi
C am cos C bm sin dx
mD1
l l
1 Z
a20 a0 X l n n x n xo
D 2l C 2 an cos C bn sin dx
4 2 nD1 l l l
1 Z l n
n xo n
X1 X
n x m x
C an cos C bn sin am cos
nD1 mD1 l
l l l
m x o
C bm sin dx:
l
In the left-hand side we used fFS .x/ instead of f .x/ to stress the fact that these
two functions may not coincide at the points of discontinuity of f .x/. However,
the integral in the left-hand side can be replaced by the integral of f 2 .x/ if f .x/ is
continuous everywhere; if it has discontinuities, this can also be done by splitting
the integral into a sum of integrals over each region of continuity of f .x/, where
2
fFS .x/ D f .x/. Hence, the integral of fFS .x/ can be replaced by the integral of f 2 .x/ in
a very general case. Further, in the right-hand side of the above equation any integral
in the second term is zero (integrals of cosine and sine functions there can be treated
as the orthogonality integrals with the n D 0 cosine function which is equal to one).
Also, due to the orthogonality of the sine and cosine functions, Eqs. (3.5) and (3.6),
in the term with the double sum only integrals with equal indices n D m are non-
zero if taken between two cosine or two sine functions; all other integrals are zero.
Hence, we obtain the following simple result:
264 3 Fourier Series
Z 1
1 l
a20 X 2
f 2 .x/dx D C an C b2n : (3.27)
l l 2 nD1
This equation is called Parseval’s equality or theorem. It can be used, e.g., for
calculating infinite numerical series.
Example 3.4. I Write the Parseval’s equality for the series (3.11) of f .x/ D x,
< x < , and then sum up the infinite numerical series:
1 1 1
1C C 2 C 2 ::::
22 3 4
Solution. The integral in the left-hand side of the Parseval’s equality (3.27) is
simply (l D here):
Z Z ˇ
1 2 1 2 1 x3 ˇˇ 2 2
f .x/dx D x dx D D :
3 ˇ 3
1
( 2 ) 1 1 1
02 X 2 2 X 4 X 1 X 1 2 2
C 0 C .1/ nC1
D 2
D 4 H) 4 D ;
2 nD1
n nD1
n nD1
n2 nD1
n2 3
or
X1 2
1 1 1
D 1 C C C D ; (3.28)
nD1
n2 22 32 6
as required.J
called Plancherel’s theorem. Here f .x/ and g.x/ are two functions ˚of the same
period 2l, both satisfying the Dirichlet conditions, and fan ; bn g and a0n ; b0n are
their Fourier coefficients, respectively.
Problem 3.15. Use the Parseval’s theorem applied to the series (3.19) to show
that
2
1 1
1C 2
C 2 C D :
3 5 8
3.5 Complex (Exponential) Form of the Fourier Series 265
Problem 3.16. Applying the Parseval’s theorem to the series (3.26), show that
4
1 1
SD1C C C D :
24 34 90
Problem 3.17. Show that the Fourier series expansion of f .x/ D x3 4x, where
2 x < 2, is
1
96 X .1/n nx
f .x/ D 3 3
sin :
nD1
n 2
It is also possible to formulate the Fourier series (3.7) of a function f .x/ that is
periodic with the period of T D 2l in a different form based on complex exponential
functions:
1
X 1
X
f .x/ D cn ein x=l
D cn n .x/: (3.30)
nD1 nD1
Note that here the index n runs over all possible negative and positive integer
values (including zero), not just over zero and positive values as in the case of
the cosine/sine Fourier series expansion. It will become clearer later on why this
is necessary.
The Fourier coefficients cn can be obtained in the same way as for the sine/cosine
series by noting that the functions n .x/ D exp .in x=l/ also form an orthogonal
set. Indeed, if integers n and m are different, n ¤ m, then
Z l Z l Z l
n .x/m .x/dx D ein x=l im x=l
e dx D ei.nm/ x=l
dx
l l l
l ˇ
x=l ˇl l i.nm/
D ei.nm/ l
D e ei.nm/
i.n m/ i.n m/
l
D f2i sin Œ .n m/g D 0:
i.n m/
266 3 Fourier Series
Note that when formulating the orthogonality condition above, we took one of the
functions in the integrand, n .x/, as complex conjugate. If n D m, however, then
Z l Z l Z l
n .x/n .x/dx D jn .x/j2 dx D dx D 2l;
l l l
where ınm is the Kronecker symbol. Note again that one of the functions is complex
conjugate in the above equation.
Thus, assuming that f .x/ can be expanded into the functional series in terms of
the functions n .x/, we can find the expansion coefficients cn exactly in the same
way as for the sine/cosine series: multiply both sides of Eq. (3.30) by m .x/ with a
fixed index m, and integrate from l to l on both sides:
Z l 1
X Z l 1
X
f .x/m .x/dx D cn n .x/m .x/dx D cn 2lınm D cm 2l;
l nD1 l nD1
Note that we never assumed here that the function f .x/ is real. So, it could also be
complex.
The same expressions (3.30) and (3.32) can also be derived directly from the
sine/cosine Fourier series. This exercise helps to understand, firstly, that this new
form of the Fourier series is exactly equivalent to the previous one based on the sine
and cosine functions; secondly, we can see explicitly why both positive and negative
values of n are needed. To accomplish this, we start from Eq. (3.16) and replace sine
and cosine with complex exponentials by means of the Euler’s formulae (2.33):
1 ix 1 ix
sin x D e eix and cos x D e C eix ;
2i 2
yielding:
1
a0 X 1 in x=l
1 in in x=l
f .x/ D C an ein x=l
Ce C bn e x=l
e
2 nD1
2 2i
1 1
a0 X 1 1 X 1 1
D C an ein x=l
C bn ein x=l
C an ein x=l
bn ein x=l
2 nD1
2 2i nD1
2 2i
1
X 1
X
a0 1 1 1 1
D C an C bn ein x=l
C an bn ein x=l
: (3.33)
2 2 nD1
i 2 nD1
i
3.5 Complex (Exponential) Form of the Fourier Series 267
Although these expressions were obtained for positive values of n only, they can
formally be extended for any values of n including negative and zero values. Then,
we observe that an D an , bn D bn , and b0 D 0. These expressions allow us to
rewrite the second sum in (3.33) as a sum over all negative integers n:
1 1
1X 1 1 X 1
an bn ein x=l
D an bn ein x=l
2 nD1 i 2 nD1 i
1
1 X 1
D an C bn ein x=l
:
2 nD1 i
We see that this sum looks now exactly the same as the first sum in (3.33) in which
n is positive, so that we can combine the two into a single sum in which n takes on
all integer values from 1 to C1 except for n D 0:
1
X
a0 1
f .x/ D C .an ibn / ein x=l
:
2 2
nD1;n¤0
and noting that b0 D 0 and hence the a0 =2 term can also be formally incorporated
into the sum as c0 D a0 =2, we can finally rewrite the above expansion in the
form of Eq. (3.30). The obtained equations are the same as Eqs. (3.30) and (3.32),
respectively, but derived differently. Thus, the two forms of the Fourier series are
completely equivalent to each other. The exponential (complex) form looks simpler
and thus is easier to remember. It is always possible, using the Euler’s formula, to
obtain any of the forms as illustrated by the following example:
Example 3.5. I Obtain the complex (exponential) form of the Fourier series for
f .x/ D x, < x < as in Example 3.1.
268 3 Fourier Series
Solution. We start by calculating the Fourier coefficients cn from Eq. (3.32) using
l D . For n ¤ 0
Z
1
cn D xein x=
dx
2
Z ˇ Z
1 inx 1 1 inx ˇˇ 1 inx
D xe dx D x e ˇ C e dx
2 2 in in
1 in 1 in
D e Ce in
e e in
2 in .in/2
1 2 2i 1 .1/nC1
D cos .n / C sin .n / D cos .n / D ;
2 in .in/2 in in
and, when n D 0,
Z ˇ
1 1 x2 ˇˇ
c0 D xdx D D 0;
2 2 2 ˇ
Example 3.6. I Show that the above expansion is equivalent to the series (3.11).
Solution. Since exp .inx/ D cos.nx/ C i sin.nx/, we get by splitting the sum into
two with negative and positive summation indices:
1
X 1
X 1 1
.1/nC1 .1/nC1 inx X .1/nC1 inx X .1/nC1 inx
f .x/ D einx C e D e C e ;
nD1
in nD1
in nD1
in nD1
in
where in the second sum we replaced the summation index n ! n, so that the
new index would run from 1 to C1 as in the other sum. Combining the two sums
together, and noting that .1/nC1 D .1/nC1 , we get
X1 1
.1/nC1 inx X .1/nC1
f .x/ D e einx D 2i sin.nx/
nD1
in nD1
in
1
X 2.1/nC1
D sin.nx/;
nD1
n
which is exactly the same as in Eq. (3.11) which was obtained using the sine/cosine
formulae for the Fourier series.J
3.5 Complex (Exponential) Form of the Fourier Series 269
Problem 3.18. Show that the complex (exponential) Fourier series of the
function
8
< 0; < x < 0
f .x/ D 1; 0 < x < =2
:
0; =2 < x <
is
1
X
1 1
f .x/ D C 1 ein =2
einx : (3.35)
4 2 in
nD1;n¤0
Problem 3.19. Use x D 0 in the series of the previous problem to obtain the
sum of the numerical series (3.21).
Problem 3.20. Show that the expansion of the function
sin x; 0 x <
f .x/ D
0; < x < 0
for two (generally complex) functions f .x/ and g.x/ of the same period of 2l,
with the corresponding (exponential) Fourier coefficients cn and dn .
270 3 Fourier Series
Problem 3.24. Applying the Parseval’s theorem (3.36) to the Fourier series of
Problem 3.20, show that
1
X 2
1 1 1 1
2
D1C 2
C 2 C D C :
nD0 .4n2 1/ 3 15 16 2
Problem 3.25. In theoretical many-body physics one frequently uses the so-
called Matsubara Green’s function G.1 ; 2 / D G.1 2 /. The function G. /
is defined on the interval ˇ ˇ, where ˇ D 1=kB T is the inverse
temperature. For bosons the values of G for positive and negative arguments
are related via the following relationship: G. / D G . ˇ/, where ˇ
0. Show that this function may be expanded into the following Fourier series:
1 X !n
G. / D e G .!n / ;
iˇ n
Problem 3.26. Show that exactly the same expansion exists also for fermions
when the values of G for positive and negative arguments are related via
G. / D G . ˇ/, where ˇ 0. The only difference is that the
summation is run only over odd values of n.
We assume that the force f .t/ is a periodic function with the period T D 2 =!
(frequency !). Let !0 be the fundamental frequency of the harmonic oscillator.
Here yR is a double derivative of y.t/ with respect to time t. We would like to obtain
3.6 Application to Differential Equations 271
the particular integral of this differential equation to learn about the response of the
oscillator to the external force f .t/. In other words, we shall only be interested here
in obtaining the partial integral of this DE.
Using the Fourier series method, it is possible to write down the solution of (3.38)
for a general f .t/. Indeed, since f .t/ is periodic, we can expand it into the complex
Fourier series (we use l D =! so that T D 2l):
1
X 1
X
f .t/ D fn ein t!=
D fn ein!t (3.39)
nD1 nD1
with
Z Z Z 2 =!
1 T=2
in!t 1 T
in!t !
fn D f .t/e dt D f .t/e dt D f .t/ein!t dt;
T T=2 T 0 2 0
(3.40)
where the integration was shifted to the interval 0 < t < T D 2 =!.
To obtain y.t/ that satisfies the differential equation above, we recognise from
Eq. (3.38) that the function Y.t/ D yR .t/ C !02 y.t/ must also be periodic with the
same periodicity as the external force; consequently, the function y.t/ sought for
must also be such. Hence, we can expand it into a Fourier series as well:
1
X
y.t/ D yn ein!t : (3.41)
nD1
or
1
X ˚
.n!/2 C !02 yn fn ein!t D 0: (3.42)
nD1
This equation is satisfied for all values of t if and only if all coefficients of exp .in!t/
are equal to zero simultaneously for all values of n:
Indeed, upon multiplying both sides of Eq. (3.42) by exp .im!t/ with some fixed
value of m and integrating between 0 and T, we get that only the n D m term is
left in the left-hand side of Eq. (3.42) due to orthogonality of the functions n .t/ D
exp .in!t/:
3
Since the Fourier series for both y.t/ and f .t/ converge, the series for yR .t/ must converge as well,
i.e. the second derivative of the Fourier series of y.t/ must be well defined.
$$\sum_{n=-\infty}^{\infty} \left\{ \left[ -(n\omega)^2 + \omega_0^2 \right] y_n - f_n \right\} \int_0^T e^{in\omega t} e^{-im\omega t}\, dt = \sum_{n=-\infty}^{\infty} \left\{ \left[ -(n\omega)^2 + \omega_0^2 \right] y_n - f_n \right\} \delta_{nm}\, T = T \left\{ \left[ -(m\omega)^2 + \omega_0^2 \right] y_m - f_m \right\} = 0,$$

which immediately leads to Eq. (3.43). Thus, we get the unknown Fourier coefficients $y_n$ of the solution $y(t)$ as

$$y_n = \frac{f_n}{\omega_0^2 - (n\omega)^2} \qquad (3.44)$$

and hence the whole solution reads

$$y(t) = \sum_{n=-\infty}^{\infty} \frac{f_n}{\omega_0^2 - (n\omega)^2}\, e^{in\omega t}. \qquad (3.45)$$
We see that the harmonics of f .t/ with frequencies n! are greatly enhanced in
the solution if they come close to the fundamental frequency !0 of the harmonic
oscillator (resonance).
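The spectral solution (3.45) is easy to evaluate numerically. The following Python sketch (illustrative only: the square-wave force and all parameter values are arbitrary choices, and numpy is assumed available) computes the coefficients $f_n$ of Eq. (3.40) by quadrature and assembles $y(t)$ from Eq. (3.45):

```python
import numpy as np

omega0, omega, N = 1.0, 0.3, 50           # oscillator and driving frequencies (arbitrary)
T = 2 * np.pi / omega                      # period of the driving force

def f(t):                                  # a square-wave driving force, as an example
    return np.where((t % T) < T / 2, 1.0, -1.0)

t_grid = np.linspace(0.0, T, 4096, endpoint=False)
n = np.arange(-N, N + 1)

# Fourier coefficients (3.40) on a uniform grid: mean of f(t) exp(-i n w t)
fn = np.array([np.mean(f(t_grid) * np.exp(-1j * k * omega * t_grid)) for k in n])

# solution (3.45); harmonics with n*omega close to omega0 are strongly enhanced
t = np.linspace(0.0, 3 * T, 1000)
y = sum(fn[i] / (omega0**2 - (k * omega)**2) * np.exp(1j * k * omega * t)
        for i, k in enumerate(n)).real

print("max |y(t)| =", np.abs(y).max())
```

With these parameters the harmonics $n = \pm 3$ ($n\omega = 0.9 \approx \omega_0$) dominate the response, which is the resonance enhancement just described.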
3.7 A More Rigorous Approach to the Fourier Series

Before actually giving the rigorous formulation for the Fourier series, we note that in Eq. (3.7) the function $f(x)$ is expanded into a set of linearly independent functions $\{\varphi_n(x)\}$ and $\{\psi_n(x)\}$, see Sect. 1.1.2 for the definition of linear independence. It is easy to see that the cosine and sine functions $\{\varphi_n(x)\}$ and $\{\psi_n(x)\}$ of Eq. (3.3) are linearly independent. Indeed, let us construct a linear combination of all functions with unknown coefficients $\alpha_i$ and $\beta_i$ and set it to zero:

$$\sum_{i=0}^{\infty} \left[ \alpha_i \varphi_i(x) + \beta_i \psi_i(x) \right] = 0. \qquad (3.46)$$

Next we multiply both sides by $\varphi_j(x)$ with some fixed $j$ and integrate over $x$ between $-l$ and $l$:

$$\sum_{i=0}^{\infty} \left[ \alpha_i \int_{-l}^{l} \varphi_i(x)\varphi_j(x)\, dx + \beta_i \int_{-l}^{l} \psi_i(x)\varphi_j(x)\, dx \right] = 0.$$
Due to the orthogonality of the functions, see Eqs. (3.5) and (3.6), all the integrals between any $\psi_i(x)$ and $\varphi_j(x)$ will be equal to zero, while the integrals involving $\varphi_i(x)$ ($i = 0, 1, 2, \ldots$) and $\varphi_j(x)$ give the Kronecker symbol $\delta_{ij}$, i.e. only one term in the sum, with the value of $i = j$, will survive:

$$\sum_{i=0}^{\infty} \alpha_i \int_{-l}^{l} \varphi_i(x)\varphi_j(x)\, dx = \sum_{i=0}^{\infty} \alpha_i\, \delta_{ij}\, l = \alpha_j\, l = 0 \;\Longrightarrow\; \alpha_j = 0.$$

Similarly, multiplying (3.46) by $\psi_j(x)$ and integrating, one finds $\beta_j = 0$, which proves the linear independence. Consider now a finite trigonometric sum

$$f_N(x) = \frac{\alpha_0}{2} + \sum_{n=1}^{N} \left( \alpha_n \varphi_n(x) + \beta_n \psi_n(x) \right) \qquad (3.47)$$

of the same type as in the Fourier series (3.7), but with arbitrary coefficients $\alpha_n$ and $\beta_n$. In addition, the sum above is constructed out of only the first $N$ functions $\varphi_n(x)$ and $\psi_n(x)$ of the Fourier series.
Theorem 3.2. The expansion (3.47) converges on average to the function $f(x)$ for any $N$ if the coefficients $\alpha_n$ and $\beta_n$ of the linear combination coincide with the corresponding Fourier coefficients $a_n$ and $b_n$ defined by Eqs. (3.9) and (3.10), i.e. when $\alpha_n = a_n$ and $\beta_n = b_n$ for any $n = 0, 1, \ldots, N$. By "average" convergence we mean the minimum of the mean square error function

$$\delta_N = \frac{1}{l}\int_{-l}^{l} \left[ f(x) - f_N(x) \right]^2 dx. \qquad (3.48)$$
Proof. Expanding the square in the integrand of the error function (3.48), we get three terms:

$$\delta_N = \frac{1}{l}\int_{-l}^{l} f^2(x)\, dx - \frac{2}{l}\int_{-l}^{l} f(x) f_N(x)\, dx + \frac{1}{l}\int_{-l}^{l} f_N^2(x)\, dx. \qquad (3.49)$$
Substituting the expansion (3.47) into each of these terms, they can be calculated. We use the orthogonality of the functions $\{\varphi_n(x)\}$ and $\{\psi_n(x)\}$ to calculate first the last term in Eq. (3.49):

$$\frac{1}{l}\int_{-l}^{l} f_N^2(x)\, dx = \frac{\alpha_0^2}{2} + \frac{\alpha_0}{l}\sum_{n=1}^{N}\int_{-l}^{l} \left( \alpha_n\varphi_n(x) + \beta_n\psi_n(x) \right) dx + \frac{1}{l}\sum_{n=1}^{N}\sum_{m=1}^{N}\int_{-l}^{l} \left( \alpha_n\varphi_n(x) + \beta_n\psi_n(x) \right)\left( \alpha_m\varphi_m(x) + \beta_m\psi_m(x) \right) dx.$$

The second term in the right-hand side is zero, since the integrals of either $\varphi_n$ or $\psi_n$ are zero for any $n \ge 1$. In the third term, only integrals between two $\varphi_n$ or two $\psi_n$ functions with equal indices survive, and thus we obtain

$$\frac{1}{l}\int_{-l}^{l} f_N^2(x)\, dx = \frac{\alpha_0^2}{2} + \sum_{n=1}^{N} \left( \alpha_n^2 + \beta_n^2 \right), \qquad (3.50)$$
i.e. an identity similar to the Parseval's theorem, Sect. 3.4. The second integral in Eq. (3.49) can be treated along the same lines:

$$\frac{2}{l}\int_{-l}^{l} f(x) f_N(x)\, dx = \frac{\alpha_0}{l}\int_{-l}^{l} f(x)\, dx + \frac{2}{l}\sum_{n=1}^{N}\left[ \alpha_n\int_{-l}^{l} f(x)\varphi_n(x)\, dx + \beta_n\int_{-l}^{l} f(x)\psi_n(x)\, dx \right].$$

Using Eqs. (3.9) and (3.10) for the Fourier coefficients of $f(x)$, we can rewrite the above expression in a simplified form:

$$\frac{2}{l}\int_{-l}^{l} f(x) f_N(x)\, dx = \alpha_0 a_0 + 2\sum_{n=1}^{N}\left( \alpha_n a_n + \beta_n b_n \right). \qquad (3.51)$$
Collecting all the terms in Eq. (3.49), we obtain

$$\delta_N = \frac{1}{l}\int_{-l}^{l} f^2(x)\, dx + \frac{1}{2}\left( \alpha_0 - a_0 \right)^2 - \frac{a_0^2}{2} + \sum_{n=1}^{N}\left[ \left( \alpha_n - a_n \right)^2 - a_n^2 + \left( \beta_n - b_n \right)^2 - b_n^2 \right]. \qquad (3.52)$$
It is seen that the minimum of $\delta_N$ with respect to the coefficients $\alpha_n$ and $\beta_n$ of the trial expansion (3.47) is achieved at $\alpha_n = a_n$ and $\beta_n = b_n$, i.e. when the expansion (3.47) coincides with the partial Fourier series containing the first $N$ terms. Q.E.D.
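Theorem 3.2 is easy to illustrate numerically: any trial coefficients that differ from the Fourier ones increase the mean square error (3.48). A minimal Python sketch (the test function $f(x) = x$ on $(-\pi, \pi)$ and the perturbation are arbitrary choices):

```python
import numpy as np

l, N = np.pi, 5
x = np.linspace(-l, l, 20001)
f = x                                      # test function f(x) = x on (-l, l)

def mean_sq_error(alpha, beta):
    """delta_N of Eq. (3.48) for the trial sum (3.47), by numerical quadrature."""
    fN = alpha[0] / 2 + sum(alpha[n] * np.cos(n * np.pi * x / l) +
                            beta[n] * np.sin(n * np.pi * x / l)
                            for n in range(1, N + 1))
    return np.trapz((f - fN) ** 2, x) / l

# Fourier coefficients of f(x) = x: a_n = 0 and b_n = 2(-1)^{n+1}/n (for l = pi)
a = np.zeros(N + 1)
b = np.array([0.0] + [2 * (-1) ** (n + 1) / n for n in range(1, N + 1)])

print("delta_N at the Fourier coefficients:", mean_sq_error(a, b))
print("delta_N with b_1 perturbed by 0.3  :", mean_sq_error(a, b + np.eye(N + 1)[1] * 0.3))
```

In accordance with (3.52), the perturbed error exceeds the minimum by exactly the square of the perturbation, $0.3^2$.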
Theorem 3.3. The Fourier coefficients $a_n$ and $b_n$ defined by Eqs. (3.9) and (3.10) tend to zero as $n \to \infty$, i.e.

$$\lim_{n\to\infty}\int_{-l}^{l} f(x)\sin\frac{n\pi x}{l}\, dx = 0 \quad\text{and}\quad \lim_{n\to\infty}\int_{-l}^{l} f(x)\cos\frac{n\pi x}{l}\, dx = 0, \qquad (3.53)$$

provided that $f(x)$ and $f^2(x)$ are integrable on the interval $-l \le x \le l$.
Proof. The conditions imposed on the function $f(x)$ in the formulation of the theorem are needed for all the integrals we wrote above to exist. Then, the minimum error $\delta_N$ is obtained from Eq. (3.52) by putting $\alpha_n = a_n$ and $\beta_n = b_n$:

$$\delta_N = \frac{1}{l}\int_{-l}^{l} f^2(x)\, dx - \frac{a_0^2}{2} - \sum_{n=1}^{N}\left( a_n^2 + b_n^2 \right). \qquad (3.54)$$
Note that the values of the coefficients $\alpha_n$ and $\beta_n$ do not depend on the value of $N$; for instance, if $N$ is increased by one, $N \to N + 1$, two new coefficients are added to the expansion (3.47), $\alpha_{N+1}$ and $\beta_{N+1}$; however, the values of the previous coefficients remain the same. At the same time, the error (3.54) becomes $\delta_{N+1} = \delta_N - a_{N+1}^2 - b_{N+1}^2$, i.e. it gets two extra negative terms and hence can only become smaller. As the number of terms $N$ in the expansion is increased, the error gets smaller and smaller. On the other hand, the error is always non-negative by construction, i.e. $\delta_N \ge 0$. Therefore, from Eq. (3.54), given the fact that $\delta_N \ge 0$, we conclude that
$$\frac{a_0^2}{2} + \sum_{n=1}^{N}\left( a_n^2 + b_n^2 \right) \le \frac{1}{l}\int_{-l}^{l} f^2(x)\, dx. \qquad (3.55)$$

As $N$ is increased, the sum in the left-hand side is getting larger, but will always remain smaller than the positive value of the integral in the right-hand side. This means that the infinite series $\sum_{n=1}^{\infty}\left( a_n^2 + b_n^2 \right)$ is absolutely convergent, and we can replace $N$ with $\infty$:

$$\frac{a_0^2}{2} + \sum_{n=1}^{\infty}\left( a_n^2 + b_n^2 \right) \le \frac{1}{l}\int_{-l}^{l} f^2(x)\, dx. \qquad (3.56)$$
Thus, the infinite series in the left-hand side above is bounded from above. Since the series converges, its terms $a_n^2 + b_n^2$ must tend to zero as $n \to \infty$ (Theorem I.7.4), i.e. each of the coefficients $a_n$ and $b_n$ must separately tend to zero as $n \to \infty$. Q.E.D.
As a simple corollary to this theorem, we notice that the integration limits in Eq. (3.53) can be arbitrary; in particular, they could cover only a part of the interval $-l \le x \le l$. Indeed, if the interval $a \le x \le b$ is given, which lies inside the interval $-l \le x \le l$, one can always define a new function which is equal to the original function $f(x)$ inside $a \le x \le b$ and is zero in the remaining part of the periodicity interval $-l \le x \le l$. The new function would still be integrable, and hence the above theorem would hold for it as well. This proves the statement made, which we shall use below.
It can then be shown that the error $\delta_N \to 0$ as $N \to \infty$. This means that we actually have the equal sign in the above Eq. (3.56):

$$\frac{a_0^2}{2} + \sum_{n=1}^{\infty}\left( a_n^2 + b_n^2 \right) = \frac{1}{l}\int_{-l}^{l} f^2(x)\, dx. \qquad (3.57)$$

This is the familiar Parseval's equality, Eq. (3.27). To prove this, however, we have to perform a very careful analysis of the partial sum of the Fourier series, which is the subject of the next subsection.
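A quick numerical sanity check of (3.57) (an illustration; the test function $f(x) = x$ on $(-\pi, \pi)$, for which $a_n = 0$ and $b_n = 2(-1)^{n+1}/n$, is an arbitrary choice):

```python
import numpy as np

# Parseval's equality (3.57) for f(x) = x on (-pi, pi)
n = np.arange(1, 200001)
lhs = np.sum((2.0 / n) ** 2)              # sum of b_n^2
rhs = 2 * np.pi ** 2 / 3                  # (1/l) * integral of x^2 over (-pi, pi)
print(lhs, rhs)                           # both approach 2*pi^2/3
```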
Consider the partial sum of the Fourier series,

$$S_N(x) = \frac{a_0}{2} + \sum_{n=1}^{N}\left( a_n\cos\frac{n\pi x}{l} + b_n\sin\frac{n\pi x}{l} \right), \qquad (3.58)$$
where the coefficients $a_n$ and $b_n$ are given precisely by Eqs. (3.9) and (3.10). Using these expressions in (3.58), we get

$$S_N(x) = \frac{1}{2l}\int_{-l}^{l} f(t)\, dt + \sum_{n=1}^{N}\left[ \frac{1}{l}\left( \int_{-l}^{l} f(t)\cos\frac{n\pi t}{l}\, dt \right)\cos\frac{n\pi x}{l} + \frac{1}{l}\left( \int_{-l}^{l} f(t)\sin\frac{n\pi t}{l}\, dt \right)\sin\frac{n\pi x}{l} \right]$$

$$= \frac{1}{l}\int_{-l}^{l} f(t)\left\{ \frac{1}{2} + \sum_{n=1}^{N}\left( \cos\frac{n\pi t}{l}\cos\frac{n\pi x}{l} + \sin\frac{n\pi t}{l}\sin\frac{n\pi x}{l} \right) \right\} dt = \frac{1}{l}\int_{-l}^{l} f(t)\left\{ \frac{1}{2} + \sum_{n=1}^{N}\cos\frac{n\pi(t - x)}{l} \right\} dt,$$
where we have used a well-known trigonometric identity for the expression in the curly brackets, as well as the fact that we are dealing with a sum of a finite number of terms, and hence the summation and integration signs are interchangeable. The sum of cosine functions we have already calculated before, Eq. (2.8), so that

$$\frac{1}{2} + \sum_{n=1}^{N}\cos\frac{n\pi(t - x)}{l} = \frac{\sin\left[ \left( N + \frac{1}{2} \right)\frac{\pi(t - x)}{l} \right]}{2\sin\frac{\pi(t - x)}{2l}} = \frac{\sin\left[ \frac{\pi}{2l}(t - x)(2N + 1) \right]}{2\sin\frac{\pi(t - x)}{2l}}.$$
Therefore,

$$S_N(x) = \frac{1}{l}\int_{-l}^{l} f(t)\, \frac{\sin\left[ \frac{\pi}{2l}(t - x)(2N + 1) \right]}{2\sin\frac{\pi(t - x)}{2l}}\, dt = \frac{1}{l}\int_{x-l}^{x+l} f(t)\, \frac{\sin\left[ \frac{\pi}{2l}(t - x)(2N + 1) \right]}{2\sin\frac{\pi(t - x)}{2l}}\, dt,$$

where in the last step we shifted the integration interval by $x$. We can do this, firstly, because the function $f(t)$ is periodic, and secondly, because the ratio of the two sine functions is also periodic with the same period of $2l$ (check it!). Next, we split the integration region into two intervals: from $x - l$ to $x$, and from $x$ to $x + l$, and shall make a different change of variable $t \to p$ in each integral as shown below:

$$\frac{1}{l}\int_{x-l}^{x} f(t)\, \frac{\sin\left[ \frac{\pi}{2l}(t - x)(2N + 1) \right]}{2\sin\frac{\pi(t - x)}{2l}}\, dt = \left| \begin{array}{l} t = x - 2p \\ dt = -2\, dp \end{array} \right| = \frac{1}{l}\int_0^{l/2} f(x - 2p)\, \frac{\sin\left[ \frac{\pi}{l}(2N + 1)p \right]}{\sin\frac{\pi p}{l}}\, dp,$$

$$\frac{1}{l}\int_{x}^{x+l} f(t)\, \frac{\sin\left[ \frac{\pi}{2l}(t - x)(2N + 1) \right]}{2\sin\frac{\pi(t - x)}{2l}}\, dt = \left| \begin{array}{l} t = x + 2p \\ dt = 2\, dp \end{array} \right| = \frac{1}{l}\int_0^{l/2} f(x + 2p)\, \frac{\sin\left[ \frac{\pi}{l}(2N + 1)p \right]}{\sin\frac{\pi p}{l}}\, dp.$$
Next, we note that if $f(x) = 1$, then the Fourier series consists of a single term which is $a_0/2 = 1$, i.e. $a_0 = 2$, since $a_n$ and $b_n$ for $n \ge 1$ are all equal to zero (due to orthogonality of the corresponding sine and cosine functions, Eq. (3.3), with the cosine function $\varphi_0(x) = 1$ corresponding to $n = 0$). Therefore, $S_N(x) = 1$ in this case for any $N$, and we can write

$$1 = \frac{2}{l}\int_0^{l/2} \frac{\sin\left[ \frac{\pi}{l}(2N + 1)p \right]}{\sin\frac{\pi p}{l}}\, dp.$$
Multiplying this identity by $\frac{1}{2}\left[ f(x - \delta) + f(x + \delta) \right]$ and subtracting the result from $S_N(x)$, written as the sum of the two integrals above, we obtain

$$S_N(x) - \frac{1}{2}\left[ f(x - \delta) + f(x + \delta) \right] = \frac{1}{l}\int_0^{l/2} \Psi_1(p)\,\sin\left[ \frac{\pi}{l}(2N + 1)p \right] dp + \frac{1}{l}\int_0^{l/2} \Psi_2(p)\,\sin\left[ \frac{\pi}{l}(2N + 1)p \right] dp, \qquad (3.61)$$

where

$$\Psi_1(p) = \frac{f(x - 2p) - f(x - \delta)}{\sin\frac{\pi p}{l}}, \qquad (3.62)$$

$$\Psi_2(p) = \frac{f(x + 2p) - f(x + \delta)}{\sin\frac{\pi p}{l}}. \qquad (3.63)$$
The integrals in the right-hand side of Eq. (3.61) are of the form we considered in Theorem 3.3 and the corollary to it. They are supposed to tend to zero as $N \to \infty$ (where $n = 2N + 1$) if the functions $\Psi_1(p)$ and $\Psi_2(p)$ are continuous with respect to their variable $p$ within $-l \le p \le l$, apart maybe from some finite number of points of discontinuity of the first kind. If this was true, this theorem would be applicable, and then

$$\lim_{N\to\infty}\left\{ S_N(x) - \frac{1}{2}\left[ f(x - \delta) + f(x + \delta) \right] \right\} = 0.$$

This would be the required result: if we assume that $\delta \to 0$, the Fourier series (which is the limit of $S_N(x)$ for $N \to \infty$) would be equal to the mean value of $f(x)$ at the point $x$ calculated using its left and right limits:

$$\lim_{N\to\infty} S_N(x) = \frac{1}{2}\left[ f(x - \delta) + f(x + \delta) \right]. \qquad (3.64)$$
Hence, what is needed is to analyse the two functions $\Psi_1(p)$ and $\Psi_2(p)$. We notice that both functions inherit the discontinuities of the first kind of $f(x)$ itself, i.e. they are indeed continuous everywhere apart from possible discontinuities of $f(x)$. The only special point is $p = +0$. Indeed, the $\sin(\pi p/l)$ in the denominator of both functions is zero at $p = 0$, and hence we have to consider this particular point as well. It is easy to see, however, that both functions have well-defined limits at this point. Indeed, assuming $f(x)$ is differentiable on the left of $x$ (i.e. at $x - 0$), we can apply the Lagrange formula of Sect. I.3.7, see Eq. (I.3.64), which we shall write here as follows (recall that $p$ changes between $0$ and $l/2$, i.e. it is always positive):

$$f(x - 2p) - f(x - 0) = -2p\, f'(x - 2\theta p), \qquad 0 < \theta < 1,$$

and hence

$$\lim_{p\to+0}\Psi_1(p) = \lim_{p\to+0}\frac{f(x - 2p) - f(x - 0)}{-2p}\;\lim_{p\to+0}\frac{-2p}{\sin\frac{\pi p}{l}} = -f'(x - 0)\,\frac{2l}{\pi},$$

since the limit of $z/\sin z$ when $z \to 0$ is equal to one. Hence, the limit of $\Psi_1(p)$ at $p \to +0$ is finite and is related to the left derivative of $f(x)$. Similarly,

$$f(x + 2p) - f(x + 0) = 2p\, f'(x + 2\theta p), \qquad 0 < \theta < 1,$$

and hence

$$\lim_{p\to+0}\Psi_2(p) = \lim_{p\to+0}\frac{f(x + 2p) - f(x + 0)}{2p}\;\lim_{p\to+0}\frac{2p}{\sin\frac{\pi p}{l}} = f'(x + 0)\,\lim_{p\to+0}\frac{2p}{\sin\frac{\pi p}{l}} = f'(x + 0)\,\frac{2l}{\pi};$$

it is also finite and is related to the right derivative of $f(x)$ (the left and right derivatives at $x$ do not need to be the same). This concludes our proof: the functions $\Psi_1(p)$ and $\Psi_2(p)$ satisfy the conditions of Theorem 3.3, and hence $S_N(x)$ indeed tends to the mean value of $f(x)$ at point $x$, Eq. (3.64).
In the above proof we have made an assumption concerning the function $f(x)$, namely that it has well-defined left and right derivatives (not necessarily the same) at any point $x$ between $-l$ and $l$. In fact, this assumption is not necessary and can be lifted, leading to the general formulation of the Dirichlet Theorem 3.1; however, this more general proof is much more involved and will not be given here.
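The convergence of $S_N(x)$ to the mean value at a jump, Eq. (3.64), can be observed directly. A short Python sketch for the square wave $f(x) = \mathrm{sgn}\, x$ on $(-\pi, \pi)$, whose Fourier series is $(4/\pi)\sum_{k\ \text{odd}}\sin(kx)/k$ (an illustration only):

```python
import numpy as np

def S_N(x, N):
    """Partial sum of the Fourier series of sgn(x) on (-pi, pi)."""
    k = np.arange(1, N + 1, 2)            # odd harmonics only
    return (4 / np.pi) * np.sum(np.sin(k * x) / k)

for N in (10, 100, 1000):
    print(N, S_N(0.0, N), S_N(0.5, N))    # at the jump x=0 -> mean value 0; inside -> 1
```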
Sine, cosine and complex exponential functions are just a few examples of orthogonal sets of functions which can be used to expand "good" functions in the corresponding functional series. There are many more such examples. Here we shall discuss this question in some detail; a rigorous discussion of this topic goes far beyond this book.
Let us assume that some continuous and generally complex functions $\varphi_1(x)$, $\varphi_2(x)$, $\varphi_3(x)$, etc., form a set $\{\varphi_i(x)\}$. We shall call this set of functions orthonormal on the interval $a \le x \le b$ with weight $w(x) \ge 0$ if for any $i$ and $j$ we have (cf. Sect. 1.1.2):

$$\int_a^b w(x)\,\varphi_i^*(x)\,\varphi_j(x)\, dx = \delta_{ij}, \qquad (3.65)$$
compare Eqs. (3.6) and (3.31). Next, we consider a generally complex function $f(x)$ with a finite number of discontinuities of the first kind, but continuous everywhere between any two points of discontinuity. Formally, let us assume, exactly in the same way as when we investigated the trigonometric Fourier series in Sect. 3.1, that $f(x)$ can be expanded into a functional series in terms of the functions of the set $\{\varphi_i(x)\}$, i.e.

$$f(x) = \sum_i c_i\, \varphi_i(x), \qquad (3.66)$$

where $c_i$ are expansion coefficients. To find them, we assume that the series above can be integrated term-by-term. Then we multiply both sides of the above equation by $w(x)\varphi_j^*(x)$ with some fixed index $j$ and integrate between $a$ and $b$:

$$\int_a^b w(x) f(x)\,\varphi_j^*(x)\, dx = \sum_i c_i \int_a^b w(x)\,\varphi_j^*(x)\,\varphi_i(x)\, dx \;\Longrightarrow\; \int_a^b w(x) f(x)\,\varphi_j^*(x)\, dx = \sum_i c_i\, \delta_{ij} = c_j, \qquad (3.67)$$
where we have made use of the orthonormality condition (3.65). Note that the coefficients $c_i$ may be complex. Therefore, if the expansion (3.66) exists, the expansion coefficients $c_j$ are to be determined from (3.67). The expansion (3.66) is called the generalised Fourier expansion and the coefficients (3.67) the generalised Fourier coefficients.
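As an illustration of Eqs. (3.65)-(3.67) (my example, with details assumed rather than taken from the text): the Chebyshev polynomials $T_i(x)$ are orthogonal on $-1 \le x \le 1$ with the weight $w(x) = 1/\sqrt{1 - x^2}$, so that $\varphi_0 = T_0/\sqrt{\pi}$ and $\varphi_i = \sqrt{2/\pi}\, T_i$ for $i \ge 1$ form an orthonormal set. The sketch below computes the generalised Fourier coefficients of $f(x) = e^x$ by Gauss-Chebyshev quadrature and reconstructs the function:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def c(i, f, M=2000):
    """Generalised Fourier coefficient (3.67) for the orthonormal Chebyshev set."""
    # Gauss-Chebyshev quadrature: int w(x) g(x) dx ~ (pi/M) * sum g(x_k)
    xk = np.cos(np.pi * (np.arange(M) + 0.5) / M)
    norm = np.sqrt(np.pi) if i == 0 else np.sqrt(np.pi / 2)
    return (np.pi / M) * np.sum(f(xk) * C.Chebyshev.basis(i)(xk)) / norm

f = np.exp
coeffs = [c(i, f) for i in range(8)]
x = 0.3
approx = sum(ci * C.Chebyshev.basis(i)(x) /
             (np.sqrt(np.pi) if i == 0 else np.sqrt(np.pi / 2))
             for i, ci in enumerate(coeffs))
print(approx, np.exp(x))                   # the truncated series reproduces f(x)
```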
Problem 3.27. Assuming the existence of the expansion (3.66) and the legitimacy of term-by-term integration, prove the Parseval's theorem for the generalised Fourier expansion:

$$\int_a^b w(x)\left| f(x) \right|^2 dx = \sum_i \left| c_i \right|^2. \qquad (3.68)$$
Problem 3.28. Similarly, consider two functions, $f(x)$ and $g(x)$, both expanded into the functional series in terms of the same set $\{\varphi_i(x)\}$, with coefficients $f_i$ and $g_i$, respectively. Then, prove the Plancherel's theorem for the generalised Fourier expansion:

$$\int_a^b w(x)\, f(x)\, g^*(x)\, dx = \sum_i f_i\, g_i^*. \qquad (3.69)$$
Consider now a partial sum with arbitrary coefficients $f_i$,

$$S_N(x) = \sum_{i=1}^{N} f_i\, \varphi_i(x),$$

and the corresponding mean square error

$$\delta_N = \int_a^b w(x)\left| f(x) - S_N(x) \right|^2 dx. \qquad (3.70)$$

Problem 3.30. Substitute the above expansion for the partial sum into the error function (3.70) and, using explicit expressions for the generalised Fourier coefficients (3.67), show that the error function

$$\delta_N = \int_a^b w(x)\left| f(x) \right|^2 dx - \sum_{i=1}^{N}\left| c_i \right|^2 + \sum_{i=1}^{N}\left| c_i - f_i \right|^2. \qquad (3.71)$$
It is seen from this result, Eq. (3.71), that the error $\delta_N$ is minimised if the coefficients $f_i$ and $c_i$ coincide: $f_i = c_i$. In other words, if the generalised Fourier expansion exists, its coefficients must be the generalised Fourier coefficients (3.67). Then the error

$$\delta_N = \int_a^b w(x)\left| f(x) \right|^2 dx - \sum_{i=1}^{N}\left| c_i \right|^2 \ge 0 \;\Longrightarrow\; \int_a^b w(x)\left| f(x) \right|^2 dx \ge \sum_{i=1}^{N}\left| c_i \right|^2. \qquad (3.72)$$
We also notice that the $c_i$ coefficients do not depend on $N$, and if $N$ is increased, the error $\delta_N$ is necessarily reduced, remaining non-negative. If the functional series $\sum_i c_i\varphi_i(x)$ were equal to $f(x)$ everywhere, except maybe at a finite number of points with discontinuities of the first kind, then, according to the Parseval's theorem (3.68), we would have the equal sign in (3.72), i.e. the error $\delta_N$ in the limit of $N \to \infty$ would tend to zero. Therefore, there is a fundamental connection between Eq. (3.68) and the fact that the generalised Fourier expansion is equal to the original function $f(x)$ at all points apart from a finite number of them where $f(x)$ is discontinuous. Equation (3.68) is called the completeness condition, because it is closely related to the question of whether an arbitrary function can be expanded into the set $\{\varphi_i(x)\}$ or not.
Problem 3.31. Prove that if $f(x)$ is orthogonal to all functions of the complete system $\{\varphi_i(x)\}$, then $f(x) = 0$. [Hint: use the completeness condition (3.68) as well as the expressions for the $c_i$ coefficients.]
Suppose now that the set $\{\varphi_i(x)\}$ is complete, $f(x)$ is continuous, and its generalised Fourier series converges uniformly; then the series converges exactly to $f(x)$.

Proof. Let us assume that the series converges to another function $g(x)$, i.e.

$$g(x) = \sum_i c_i\,\varphi_i(x) \quad\text{with}\quad c_i = \int_a^b w(x)\, f(x)\,\varphi_i^*(x)\, dx.$$

Since the series converges uniformly, $g(x)$ is continuous and we can integrate it term-by-term. Hence, we multiply both sides of the expression for the series by $w(x)\varphi_j^*(x)$ with some fixed $j$ and integrate between $a$ and $b$:

$$\int_a^b w(x)\, g(x)\,\varphi_j^*(x)\, dx = \sum_i c_i \int_a^b w(x)\,\varphi_j^*(x)\,\varphi_i(x)\, dx \;\Longrightarrow\; c_j = \int_a^b w(x)\, g(x)\,\varphi_j^*(x)\, dx,$$

i.e. the coefficients $c_j$ have the same expressions both in terms of $f(x)$ and $g(x)$, i.e.

$$\int_a^b w(x)\left[ f(x) - g(x) \right]\varphi_j^*(x)\, dx = 0$$

for any $j$. The expression above states that the continuous function $h(x) = f(x) - g(x)$ is orthogonal to any of the functions $\varphi_j(x)$ of the set. But if the set is complete, then $h(x)$ can only be zero, i.e. $g(x) = f(x)$. Q.E.D.
Using the notion of the Dirac delta function (Sect. 4.1), the completeness condition can be formulated directly in terms of the functions $\{\varphi_i(x)\}$ of the set and the weight. Indeed, using (3.67), we write

$$f(x) = \sum_i c_i\,\varphi_i(x) = \sum_i \left[ \int_a^b w(x')\, f(x')\,\varphi_i^*(x')\, dx' \right]\varphi_i(x) = \int_a^b f(x')\left[ w(x')\sum_i \varphi_i^*(x')\,\varphi_i(x) \right] dx'.$$

It is now clearly seen that, according to the filtering theorem for the delta function, Eq. (4.9), the expression in the square brackets above should be equal to $\delta(x - x')$, i.e.

$$w(x')\sum_i \varphi_i^*(x')\,\varphi_i(x) = \delta(x - x').$$
3.8 Applications of Fourier Series in Physics

The method of Fourier series is widely used in the sciences, since it allows one to represent a function (e.g. an electric signal or a field), which may have a complicated form, as a linear combination of simple complex exponentials or sine and cosine functions which are much easier to deal with; all the information about the function is then "stored" in the expansion coefficients. For the same reason, the Fourier series method is also used when solving differential equations, especially partial differential equations, since the problem with respect to all or some of the variables becomes an algebraic one, related to finding the Fourier coefficients in the expansion of the unknown functions. Here we shall consider several examples of using Fourier series in condensed matter physics. We shall also introduce notations often used in physics which are slightly different from those used above.
Consider a periodic solid (a crystal) and a function $f(x, y, z) = f(\mathbf{r})$ which describes one of its properties, e.g. the charge density or the electrostatic potential at point $\mathbf{r} = (x, y, z)$. This function is periodic in several directions, and hence can be expanded into a multiple Fourier series. Here we shall consider how this can be done in some detail.
Let us start from a one-dimensional crystal: consider a one-dimensional periodic chain of atoms arranged along the $x$ direction, similar to the one shown in Fig. 3.4. Let $\rho(x)$ be the electron density of the system corresponding to a distribution of electrons on atoms of the chain. The density must repeat the periodicity of the chain itself, i.e. $\rho(x + a) = \rho(x)$ for any point $x$ along the axis, where $a$ ($= 2l$ in our previous notations) is the periodicity, i.e. the smallest distance between two equivalent atoms. The density is a smooth function of $x$, continuous everywhere, and hence $\rho(x)$ can be expanded into the Fourier series:
Fig. 3.4 A one-dimensional infinite periodic crystal of alternating point charges $\pm Q$. Each unit cell (indicated) contains two oppositely charged atoms with charges $+Q$ and $-Q$. The total charge of the unit cell is zero, and there is an infinite number of such cells running along the positive and negative directions of the $x$ axis.
Fig. 3.5 Two-dimensional lattice of atoms of two species. The system is periodic across the lattice with the lattice vectors being $\mathbf{a}_1$ and $\mathbf{a}_2$. The two non-equivalent atoms are indicated by a pink dashed line, while the irreducible region in the 2D space associated with the unit cell is shown by a dashed red line.
$$\rho(x) = \sum_{n=-\infty}^{\infty} \rho_n\, e^{i2\pi nx/a} = \sum_{n=-\infty}^{\infty} \rho_n\, e^{ig_n x} = \sum_g \rho_g\, e^{igx},$$

where $g_n = 2\pi n/a$ and

$$\rho_n = \frac{1}{a}\int_0^a \rho(x)\, e^{-ig_n x}\, dx$$

are the corresponding Fourier coefficients. In the last passage of the expansion formula for $\rho(x)$ we have used simplified notations in which the index $n$ is dropped; these are also frequently used. The quantity $b = 2\pi/a$ is called a reciprocal lattice vector (of the one-dimensional lattice), as it corresponds to the periodicity of this reciprocal lattice: $g_n = bn$. We can see that the Fourier series in this case can be mapped onto the lattice sites of the (imaginary) reciprocal lattice with the periodicity $b$, and the single summation over $n$ takes all possible lattice sites $g$ of this lattice into account.
This consideration can be generalised to two- and three-dimensional lattices. Consider first a two-dimensional lattice shown in Fig. 3.5. We introduce two vectors on the plane of the lattice: $\mathbf{a}_1$ and $\mathbf{a}_2$. Then any lattice site can be related to a reference (zero) site via the lattice vector $\mathbf{a} = n_1\mathbf{a}_1 + n_2\mathbf{a}_2$, where $n_1$ and $n_2$ are two integers. By taking all possible negative and positive values of $n_1$ and $n_2$, including zero (corresponding to the zero lattice site), the whole infinite two-dimensional lattice can be reproduced. The density in this case $\rho(x, y) = \rho(\mathbf{r})$ is a function of the two-dimensional vector $\mathbf{r} = (x, y)$ or, alternatively, of the two coordinates $x$ and $y$. However, in general, the periodicity of the system does not necessarily follow the Cartesian directions, i.e. the lattice vectors $\mathbf{a}_1$ and $\mathbf{a}_2$ may be directed differently from the $x$ and $y$ axes. In this case it is more convenient to consider the density as a function of the so-called fractional coordinates $r_1$ and $r_2$ instead. These appear if we write $\mathbf{r}$ in terms of the lattice vectors, $\mathbf{r} = r_1\mathbf{a}_1 + r_2\mathbf{a}_2$, with $r_1$
and $r_2$ being real numbers between $-\infty$ and $+\infty$. Then the density $\rho(\mathbf{r})$ becomes a function $\rho_1(r_1, r_2)$ of the fractional coordinates $r_1$ and $r_2$. The convenience of this representation is in that the density is periodic with respect to both $r_1$ and $r_2$ with the period equal to one, i.e.

$$\rho_1(r_1 + 1, r_2) = \rho_1(r_1, r_2 + 1) = \rho_1(r_1, r_2),$$

since adding unity to either $r_1$ or $r_2$ changes the vector $\mathbf{r}$ in $\rho(\mathbf{r})$ exactly by the lattice vector $\mathbf{a}_1$ or $\mathbf{a}_2$, i.e. in either case an equivalent point in space is obtained. Hence, $\rho_1(r_1, r_2)$ is periodic and can be expanded into the Fourier series. We first consider $\rho_1$ as a function of $r_1$ and expand it as above with respect to this variable only:

$$\rho_1(r_1, r_2) = \sum_{n_1=-\infty}^{\infty} \rho_{n_1}(r_2)\, e^{ik_1 r_1}, \qquad k_1 = 2\pi n_1,$$

with

$$\rho_{n_1}(r_2) = \int_{-1/2}^{1/2} \rho_1(r_1, r_2)\, e^{-ik_1 r_1}\, dr_1 = \int_0^1 \rho_1(r_1, r_2)\, e^{-ik_1 r_1}\, dr_1.$$
The coefficients $\rho_{n_1}(r_2)$ are in turn periodic functions of $r_2$ and can be expanded as well, so that

$$\rho_1(r_1, r_2) = \sum_{n_1=-\infty}^{\infty}\left[ \sum_{n_2=-\infty}^{\infty} \rho_{n_1 n_2}\, e^{ik_2 r_2} \right] e^{ik_1 r_1} = \sum_{n_1=-\infty}^{\infty}\sum_{n_2=-\infty}^{\infty} \rho_{n_1 n_2}\, e^{i(k_1 r_1 + k_2 r_2)}, \qquad (3.74)$$

where $k_2 = 2\pi n_2$ and

$$\rho_{n_1 n_2} = \int_0^1 \rho_{n_1}(r_2)\, e^{-ik_2 r_2}\, dr_2 = \int_0^1 dr_1 \int_0^1 dr_2\, \rho_1(r_1, r_2)\, e^{-i(k_1 r_1 + k_2 r_2)}. \qquad (3.75)$$
The above equations can also be written in a more concise form. To this end, as is customarily done in crystallography and solid state physics, we introduce the reciprocal lattice vectors $\mathbf{b}_1$ and $\mathbf{b}_2$ via $\mathbf{a}_i \cdot \mathbf{b}_j = 2\pi\delta_{ij}$, where $i, j = 1, 2$, and $\delta_{ij}$ is the Kronecker delta symbol.
Problem 3.32. Show that the two-dimensional reciprocal lattice vectors are

$$\mathbf{b}_1 = \frac{2\pi}{v_s^2}\left[ a_2^2\,\mathbf{a}_1 - \left( \mathbf{a}_1\cdot\mathbf{a}_2 \right)\mathbf{a}_2 \right], \qquad \mathbf{b}_2 = \frac{2\pi}{v_s^2}\left[ -\left( \mathbf{a}_1\cdot\mathbf{a}_2 \right)\mathbf{a}_1 + a_1^2\,\mathbf{a}_2 \right],$$

where $v_s = \left| a_{1x}a_{2y} - a_{2x}a_{1y} \right|$ is the area of the unit cell. [Hint: expand $\mathbf{b}_1$ with respect to $\mathbf{a}_1$ and $\mathbf{a}_2$ and find the expansion coefficients using its orthogonality with the direct lattice vectors $\mathbf{a}_1$ and $\mathbf{a}_2$. Repeat for $\mathbf{b}_2$.]
Then, introducing the two-dimensional reciprocal lattice vectors $\mathbf{g} = m_1\mathbf{b}_1 + m_2\mathbf{b}_2$ (with integer $m_1$ and $m_2$), the double Fourier series (3.74) takes the compact form

$$\rho(\mathbf{r}) = \sum_{\mathbf{g}} \rho_{\mathbf{g}}\, e^{i\mathbf{g}\cdot\mathbf{r}}, \quad\text{where}\quad \rho_{\mathbf{g}} = \frac{1}{v_s}\iint_{\text{cell}} \rho(\mathbf{r})\, e^{-i\mathbf{g}\cdot\mathbf{r}}\, d\mathbf{r}, \qquad (3.76)$$
where we sum over all possible reciprocal lattice vectors (it is in fact a double sum over all possible integer values of $n_1$ and $n_2$). Notice that the integration in the definition of the Fourier coefficients $\rho_{\mathbf{g}}$ is performed over the whole area of the unit cell shown in Fig. 3.5, not over a square of unit side as before in (3.75), and the factor of $1/v_s$ has appeared. This is the result of the change of variables in the double integral, which was initially taken over $r_1$ and $r_2$. Indeed, $\mathbf{r} = r_1\mathbf{a}_1 + r_2\mathbf{a}_2$, and hence $x = r_1 a_{1x} + r_2 a_{2x}$ and $y = r_1 a_{1y} + r_2 a_{2y}$. Therefore, $d\mathbf{r} = dx\, dy = |J|\, dr_1\, dr_2$, where the Jacobian

$$J = \frac{\partial(x, y)}{\partial(r_1, r_2)} = \begin{vmatrix} \partial x/\partial r_1 & \partial y/\partial r_1 \\ \partial x/\partial r_2 & \partial y/\partial r_2 \end{vmatrix} = \begin{vmatrix} a_{1x} & a_{1y} \\ a_{2x} & a_{2y} \end{vmatrix} = a_{1x}a_{2y} - a_{1y}a_{2x} \;\Longrightarrow\; |J| = v_s.$$
Problem 3.33. Similarly, show that in the three-dimensional case the reciprocal lattice vectors, defined via $\mathbf{a}_i \cdot \mathbf{b}_j = 2\pi\delta_{ij}$ ($i, j = 1, 2, 3$), are

$$\mathbf{b}_1 = \frac{2\pi}{v_c}\left[ \mathbf{a}_2 \times \mathbf{a}_3 \right], \qquad \mathbf{b}_2 = \frac{2\pi}{v_c}\left[ \mathbf{a}_3 \times \mathbf{a}_1 \right], \qquad \mathbf{b}_3 = \frac{2\pi}{v_c}\left[ \mathbf{a}_1 \times \mathbf{a}_2 \right], \qquad (3.77)$$

where $v_c = \left| \left[ \mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3 \right] \right|$ is the unit cell volume (see also Sect. I.1.7.1).
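Equation (3.77) and the defining property $\mathbf{a}_i\cdot\mathbf{b}_j = 2\pi\delta_{ij}$ are easily verified in code; the face-centred cubic lattice below is only an example:

```python
import numpy as np

def reciprocal_vectors(a1, a2, a3):
    """Reciprocal lattice vectors of Eq. (3.77)."""
    vc = np.dot(a1, np.cross(a2, a3))      # (signed) unit cell volume
    b1 = 2 * np.pi * np.cross(a2, a3) / vc
    b2 = 2 * np.pi * np.cross(a3, a1) / vc
    b3 = 2 * np.pi * np.cross(a1, a2) / vc
    return b1, b2, b3

# example: an FCC direct lattice with cubic constant a = 1 (an arbitrary choice)
a1, a2, a3 = np.array([[0, .5, .5], [.5, 0, .5], [.5, .5, 0]])
B = reciprocal_vectors(a1, a2, a3)
print(np.array([[np.dot(ai, bj) for bj in B]
                for ai in (a1, a2, a3)]) / (2 * np.pi))   # the identity matrix
```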
Problem 3.34. Correspondingly, show that a function $f(\mathbf{r})$ which is periodic in the direct space with respect to any direct lattice vector $\mathbf{L} = n_1\mathbf{a}_1 + n_2\mathbf{a}_2 + n_3\mathbf{a}_3$ (where $\{n_i\} = (n_1, n_2, n_3)$ are all possible negative and positive integers, including zero, and $\{\mathbf{a}_i\}$ are the unit base vectors) can be expanded in the triple Fourier series

$$f(\mathbf{r}) = \sum_{\mathbf{g}} f_{\mathbf{g}}\, e^{i\mathbf{g}\cdot\mathbf{r}}, \quad\text{where}\quad f_{\mathbf{g}} = \frac{1}{v_c}\iiint_{\text{cell}} f(\mathbf{r})\, e^{-i\mathbf{g}\cdot\mathbf{r}}\, d\mathbf{r}, \qquad (3.78)$$

where the integration is performed over the volume of the unit cell (i.e. a parallelepiped formed by the three basic direct lattice vectors) and the summation is performed over all possible reciprocal lattice vectors $\mathbf{g} = m_1\mathbf{b}_1 + m_2\mathbf{b}_2 + m_3\mathbf{b}_3$ (i.e. all possible $m_1, m_2, m_3$).
Problem 3.35. Consider a big macroscopic volume of the solid containing a very large number $N$ of identical unit cells of volume $v_c$ each. Show then that the triple integral in Eq. (3.78) for $f_{\mathbf{g}}$ can in fact be extended to the whole volume $V = Nv_c$, i.e.

$$f(\mathbf{r}) = \sum_{\mathbf{g}} f_{\mathbf{g}}\, e^{i\mathbf{g}\cdot\mathbf{r}}, \quad\text{where}\quad f_{\mathbf{g}} = \frac{1}{V}\iiint_V f(\mathbf{r})\, e^{-i\mathbf{g}\cdot\mathbf{r}}\, d\mathbf{r}. \qquad (3.79)$$

This expression for the Fourier coefficients may sometimes be more convenient when considering the so-called thermodynamic limit $V \to \infty$, since the shape of the unit cell can be ignored.
Problem 3.36. The electrostatic potential $V(\mathbf{r})$ in a three-dimensional periodic crystal caused by its charge density satisfies the Poisson equation of classical electrostatics, $\Delta V(\mathbf{r}) = -4\pi\rho(\mathbf{r})$, where $\rho(\mathbf{r})$ is the total charge density in the crystal at point $\mathbf{r}$. Expanding both the potential and the density in the Fourier series, show that the solution of this equation is

$$V(\mathbf{r}) = \sum_{\mathbf{g}} \frac{4\pi\rho_{\mathbf{g}}}{g^2}\, e^{i\mathbf{g}\cdot\mathbf{r}}. \qquad (3.80)$$
Problem 3.37. Prove the theta-function transformation

$$\sum_{\mathbf{L}} e^{-|\mathbf{r}-\mathbf{L}|^2 t} = \frac{1}{v_c}\left( \frac{\pi}{t} \right)^{3/2}\sum_{\mathbf{g}} e^{-g^2/4t}\, e^{i\mathbf{g}\cdot\mathbf{r}}, \qquad (3.81)$$

where $t$ is a real positive number and $\mathbf{r}$ is a vector. This identity shows that the direct lattice sum in the left-hand side can be expanded in the Fourier series and can thus be equivalently rewritten as a reciprocal lattice sum. [Hint: first of all, prove that the function in the left-hand side is periodic in $\mathbf{r}$ with respect to an arbitrary lattice vector $\mathbf{L}'$, i.e. adding $\mathbf{L}'$ to $\mathbf{r}$ does not change the function; hence, the left-hand side can be expanded in the Fourier series in terms of $\mathbf{g}$ using Eq. (3.79). Then, when performing the triple integration over the whole space, show that each term in the sum makes an identical contribution giving out a factor of $N$; hence the sum over $\mathbf{L}$ can be removed, while $V$ can be replaced with $v_c$. Next, calculate the triple integral over the whole space (i.e. in the thermodynamic limit).]
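The one-dimensional analogue of (3.81), $\sum_n e^{-(r-na)^2 t} = \frac{1}{a}\sqrt{\pi/t}\,\sum_m e^{-g_m^2/4t}\, e^{ig_m r}$ with $g_m = 2\pi m/a$, can be checked numerically in a few lines (my illustration; the parameter values are arbitrary):

```python
import numpy as np

a, t, r = 1.7, 0.4, 0.3                    # lattice constant, parameter, point (arbitrary)

n = np.arange(-50, 51)
direct = np.sum(np.exp(-(r - n * a) ** 2 * t))

m = np.arange(-50, 51)
g = 2 * np.pi * m / a
recip = np.sqrt(np.pi / t) / a * np.sum(np.exp(-g ** 2 / (4 * t)) * np.exp(1j * g * r))

print(direct, recip.real)                  # the two sums agree
```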
The theta-function transformation allows one to obtain a very useful formula for the practical calculation of the electrostatic potential of a lattice of point charges (atoms), the so-called Ewald formula. Consider a three-dimensional periodic crystal with point charges $q_s$ in each unit cell, where the index $s$ counts the charges; in the zero cell (with the lattice vector $\mathbf{L} = 0$) the positions of the charges are given by the vectors $\mathbf{X}_s$. Each unit cell is considered charge-neutral, i.e. $\sum_s q_s = 0$. Then, the electrostatic potential at point $\mathbf{r}$ due to all charges of the whole crystal (we only consider $\mathbf{r}$ to be somewhere between the atoms) will be

$$V(\mathbf{r}) = \sum_{\mathbf{L}}\sum_s \frac{q_s}{\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|},$$

since the position of the charge $q_s$ in the unit cell associated with the lattice vector $\mathbf{L}$ is $\mathbf{L} + \mathbf{X}_s$.
This lattice sum converges extremely slowly; in fact, it can be shown that it converges only conditionally (see Sect. I.7.1.4). However, its convergence can be considerably improved using the following trick.⁴ Consider the so-called error function

$$\text{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_x^{\infty} e^{-t^2}\, dt. \qquad (3.82)$$

This function tends to zero exponentially fast as $x \to \infty$ (in fact, as $e^{-x^2}$). Conjugate to this one is another error function,

$$\text{erf}(x) = 1 - \text{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\, dt, \qquad (3.83)$$
which tends to unity as $x \to \infty$. Using these two functions, we can rewrite the potential:

$$V(\mathbf{r}) = \sum_{\mathbf{L}s} q_s\, \frac{\text{erfc}\!\left( \eta\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right| \right)}{\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|} + \sum_s q_s\left[ \sum_{\mathbf{L}} \frac{\text{erf}\!\left( \eta\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right| \right)}{\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|} \right],$$
where $\eta$ is some (so far arbitrary) positive constant (called the Ewald constant). The first sum over the direct lattice $\mathbf{L}$ converges very quickly because of the error function $\text{erfc}$, so only the unit cells around the point $\mathbf{r}$ contribute appreciably. The second sum, however, converges extremely slowly because of the other error function, $\text{erf}(x)$, which tends to unity as $x \to \infty$. This is the point where the theta-function transformation proves to be extremely useful. Indeed, the function in the square brackets can be manipulated into the following expression:
$$\sum_{\mathbf{L}} \frac{\text{erf}\!\left( \eta\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right| \right)}{\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|} = \frac{2}{\sqrt{\pi}}\sum_{\mathbf{L}} \frac{1}{\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|}\int_0^{\eta\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|} e^{-t^2}\, dt = \left| \begin{array}{l} \lambda = t/\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right| \\ d\lambda = dt/\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right| \end{array} \right| = \frac{2}{\sqrt{\pi}}\int_0^{\eta}\left( \sum_{\mathbf{L}} e^{-\lambda^2\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|^2} \right) d\lambda.$$
⁴ The Ewald method corresponds to a particular regularisation of the conditionally converging series. However, it can be shown (see, e.g., L. Kantorovich and I. Tupitsyn, J. Phys.: Condens. Matter 11, 6159 (1999)) that this calculation results in the correct expression for the electrostatic potential in the central part of a large finite sample if the dipole and quadrupole moments of the unit cell are equal to zero. Otherwise, an additional macroscopic contribution to the potential is present.
Now, the expression in the round brackets can easily be recognised to be the one for which the theta-function transformation (3.81) can be used:

$$\frac{2}{\sqrt{\pi}}\int_0^{\eta}\left( \sum_{\mathbf{L}} e^{-\lambda^2\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|^2} \right) d\lambda = \frac{2}{\sqrt{\pi}}\int_0^{\eta} \frac{1}{v_c}\left( \frac{\pi}{\lambda^2} \right)^{3/2}\sum_{\mathbf{g}} e^{-g^2/4\lambda^2}\, e^{i\mathbf{g}\cdot(\mathbf{r}-\mathbf{X}_s)}\, d\lambda.$$

The integration over $\lambda$ is trivially performed using the substitution $x = g^2/4\lambda^2$, and we obtain

$$\frac{2}{\sqrt{\pi}}\int_0^{\eta}\left( \sum_{\mathbf{L}} e^{-\lambda^2\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|^2} \right) d\lambda = \frac{4\pi}{v_c}\sum_{\mathbf{g}} \frac{1}{g^2}\, e^{-g^2/4\eta^2}\, e^{i\mathbf{g}\cdot(\mathbf{r}-\mathbf{X}_s)}.$$
Problem 3.39. Prove that for a two-dimensional solid the Ewald formula reads

$$V(\mathbf{r}) = \sum_{\mathbf{L}s} q_s\, \frac{\text{erfc}\!\left( \eta\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right| \right)}{\left| \mathbf{r} - \mathbf{L} - \mathbf{X}_s \right|} + \frac{2\pi}{v_s}\sum_{\mathbf{g}\neq 0} \frac{1}{g}\,\text{erfc}\!\left( \frac{|\mathbf{g}|}{2\eta} \right)\left( \sum_s q_s\, e^{-i\mathbf{g}\cdot\mathbf{X}_s} \right) e^{i\mathbf{g}\cdot\mathbf{r}}, \qquad (3.86)$$

where $\mathbf{L}$ and $\mathbf{g}$ are the two-dimensional direct and reciprocal lattice vectors.
In solid state physics it is very often necessary to deal with functions $f(\mathbf{r})$ which are not periodic, e.g. an electron wave function in a crystal. Even in those cases it is still useful and convenient to employ the Fourier series formalism. This is normally done as follows. Following Born and von Kármán, we assume an artificial periodicity of the solid on a much larger scale. Namely, we assume that if we move $N_1 \gg 1$ times along the lattice vector $\mathbf{a}_1$, the solid repeats itself, and the same happens if we perform a translation $N_2 \gg 1$ and $N_3 \gg 1$ times along the other two lattice directions (vectors), i.e. we impose an artificial periodicity on the function $f(\mathbf{r})$ as follows:

$$f(\mathbf{r}) = f(\mathbf{r} + N_1\mathbf{a}_1) = f(\mathbf{r} + N_2\mathbf{a}_2) = f(\mathbf{r} + N_3\mathbf{a}_3). \qquad (3.87)$$
The function $f(\mathbf{r})$ is then periodic with respect to the enlarged lattice with the basic vectors $\mathbf{A}_i = N_i\mathbf{a}_i$, whose reciprocal lattice vectors are $\mathbf{B}_i = \mathbf{b}_i/N_i$; hence it can be expanded into a Fourier series over the vectors

$$\mathbf{G} = M_1\mathbf{B}_1 + M_2\mathbf{B}_2 + M_3\mathbf{B}_3 = \frac{M_1}{N_1}\mathbf{b}_1 + \frac{M_2}{N_2}\mathbf{b}_2 + \frac{M_3}{N_3}\mathbf{b}_3,$$

where the numbers $M_i$ take on all possible negative and positive integer values including zero. When each integer $M_i$ ($i = 1, 2, 3$) becomes equal to an integer multiple of the corresponding $N_i$, the vector $\mathbf{G}$ becomes equal to a reciprocal lattice vector $\mathbf{g}$ of the original reciprocal lattice corresponding to the small direct lattice with the basic vectors $\mathbf{a}_i$. For other values of the numbers $M_i$ we can always write $M_i = m_i N_i + \kappa_i$, where $m_i$ is an integer taking all possible values, but the integer $\kappa_i$ changes only between $0$ and $N_i - 1$. Then,
$$\mathbf{G} = \left( m_1\mathbf{b}_1 + m_2\mathbf{b}_2 + m_3\mathbf{b}_3 \right) + \frac{\kappa_1}{N_1}\mathbf{b}_1 + \frac{\kappa_2}{N_2}\mathbf{b}_2 + \frac{\kappa_3}{N_3}\mathbf{b}_3 = \mathbf{g} + \mathbf{k},$$

where the vector $\mathbf{k}$ takes $N_1 N_2 N_3$ values within a parallelepiped with the sides made by the three basic reciprocal lattice vectors $\mathbf{b}_i$ corresponding to the original reciprocal lattice. This parallelepiped, called the (first) Brillouin zone (BZ), is divided by a grid of $N_1 N_2 N_3$ small cells with the sides given by $\mathbf{b}_i/N_i$, and all values of $\mathbf{k}$ correspond to these small cells. Therefore, instead of (3.78) we can write in this case:

$$f(\mathbf{r}) = \sum_{\mathbf{g}}\sum_{\mathbf{k}} f_{\mathbf{g}+\mathbf{k}}\, e^{i(\mathbf{g}+\mathbf{k})\cdot\mathbf{r}} = \sum_{\mathbf{K}} f_{\mathbf{K}}\, e^{i\mathbf{K}\cdot\mathbf{r}},$$

where we sum over all reciprocal lattice vectors of the original lattice and all points $\mathbf{k}$ from the first BZ, and where $\mathbf{K} = \mathbf{g} + \mathbf{k}$.
The complex exponential functions $\exp(i\mathbf{K}\cdot\mathbf{r})$ with $\mathbf{K} = \mathbf{g} + \mathbf{k}$ are called plane waves; they form the basis of most modern electronic structure calculation methods. It is essential that this basis is complete (see Sect. 3.7.3), i.e. any function $f(\mathbf{r})$ can be expanded in terms of them. In practice the expansion is terminated, and there is a very simple algorithm for doing this. Indeed, plane waves with large reciprocal lattice vectors $\mathbf{K}$ oscillate rapidly in the direct space and hence need to be kept in the expansion only if the function $f(\mathbf{r})$ changes rapidly on a small length scale (e.g. wave functions of the electrons oscillate strongly close to atomic nuclei in atoms, molecules and crystals). If, however, $f(\mathbf{r})$ is smooth everywhere in space, then large reciprocal space vectors $\mathbf{K}$ are not needed, i.e. one can simply include in the expansion all vectors whose lengths are smaller than a certain cut-off $K_{\max}$, i.e. $|\mathbf{K}| \le K_{\max}$. Special tricks are used to achieve the required smooth behaviour of the valence electron wave functions near and far from the atomic cores by employing special effective core potentials called pseudopotentials.
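The size of such a plane-wave basis is easy to estimate by direct counting; in the sketch below (a simple cubic lattice and the cut-off value are arbitrary choices) the number of vectors with $|\mathbf{G}| \le K_{\max}$ follows the expected $(4\pi/3)(K_{\max}/b)^3$ scaling:

```python
import numpy as np
from itertools import product

a = 1.0                                    # simple cubic lattice constant (arbitrary)
b = 2 * np.pi / a                          # reciprocal lattice constant
K_max = 10 * b                             # plane-wave cut-off (arbitrary)

m_lim = int(K_max / b) + 1
count = sum(1 for m in product(range(-m_lim, m_lim + 1), repeat=3)
            if b * np.linalg.norm(m) <= K_max)
print("plane waves below the cut-off:", count)   # ~ (4/3) pi (K_max/b)^3 ~ 4189
```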
Here we have also introduced a friction force with $\gamma$ being the friction constant, since there must always be some friction in the system. If the cantilever is excited with a signal of frequency $\omega$, then after some transient time a steady-state solution will be established as the particular integral of the DE. This solution will also be periodic with the same frequency $\omega$. We would like to obtain such a solution. Since the oscillation is periodic, it can be expanded into a Fourier series (it is convenient to use the exponential form):

$$z(t) = \sum_{n=-\infty}^{\infty} z_n\, e^{i2\pi nt/T} = \sum_n z_n\, e^{i\omega nt} = z_0 + \sum_{n\neq 0} z_n\, e^{i\omega nt}, \qquad (3.89)$$

where $z_n$ are unknown amplitudes. Similarly, the tip-surface force $F_s(h + z(t))$, being a periodic function of time, can be expanded as well, $F_s(h + z(t)) = \sum_n F_n\, e^{i\omega nt}$, where

$$F_n = \frac{1}{T}\int_0^T F_s\!\left( h + z(t) \right) e^{-i\omega nt}\, dt. \qquad (3.91)$$
Note that the constant external force holding the cantilever contributes only to the $n = 0$ Fourier coefficient. Since the exponential functions form a complete set, the coefficients on both sides of the equation of motion must be equal to each other. This way we can establish the following equations for the unknown amplitudes $z_n$:
$$z_0 = \frac{F_0 + F_{\text{ext}}}{m\omega_0^2},$$

$$z_1\left( -\omega^2 + i\omega\gamma + \omega_0^2 \right) = \frac{1}{m}\left( F_1 + \frac{1}{2}A_0 \right),$$

$$z_{-1}\left( -\omega^2 - i\omega\gamma + \omega_0^2 \right) = \frac{1}{m}\left( F_{-1} + \frac{1}{2}A_0 \right),$$

$$z_n\left( -\omega^2 n^2 + i\omega\gamma n + \omega_0^2 \right) = \frac{F_n}{m} \quad\text{for any}\quad n = \pm 2, \pm 3, \ldots. \qquad (3.92)$$
The external force ensures that there is no constant displacement of the cantilever due to oscillations, i.e. $F_{\text{ext}} = -F_0$ and hence $z_0 = 0$. The rest of the obtained equations are to be solved self-consistently, since the coefficients $F_n$ depend on all amplitudes $z_n$, see Eq. (3.91): using some initial values for the amplitudes, one calculates the constants $F_n$; these are then used to update the amplitudes $z_n$ by solving the above equations; the new amplitudes are used to recalculate the forces $F_n$, and so on until convergence. In practice, the Fourier series is terminated, so that only the first $N$ terms are considered, $0 \le n \le N$.
In many cases only the first two terms in the Fourier series (i.e. for $n = \pm 1$) can be retained. Let us establish an expression for the resonance frequency in this case. As was explained above, the resonance is established by the $\pi/2$ phase shift between the excitation signal, $A_0\cos(\omega t)$, and the tip oscillation. This means that in this approximation

$$z(t) \simeq A\cos\!\left( \omega t - \frac{\pi}{2} \right) = A\sin(\omega t) = \frac{A}{2i}\left( e^{i\omega t} - e^{-i\omega t} \right),$$

i.e. $z_1 = A/2i = -iA/2$ and $z_{-1} = iA/2$. Here $A$ is the cantilever oscillation amplitude.
Problem 3.40. Substituting these values of $z_{\pm 1}$ into Eqs. (3.92), show that the resonance frequency $\omega$ must satisfy the following equation:

$$\left( \frac{\omega}{\omega_0} \right)^2 = 1 - \frac{1}{\pi kA}\int_0^{2\pi} F_s\!\left( h + A\sin\phi \right)\sin\phi\, d\phi. \qquad (3.93)$$
This formula can be used for calculating the resonance frequency of the tip
oscillations for the given lateral position of the tip above the surface for which the
tip-surface force Fs is known.
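For a model tip-surface force the integral in (3.93) is readily evaluated numerically; in the sketch below the van der Waals-like force $F_s(z) = -C/z^2$ and all parameter values are invented purely for illustration:

```python
import numpy as np

k, A, h, C = 40.0, 5.0, 8.0, 1e3          # spring constant, amplitude, height, force constant

def Fs(z):                                 # model attractive tip-surface force
    return -C / z**2

phi = np.linspace(0.0, 2 * np.pi, 100001)
integral = np.trapz(Fs(h + A * np.sin(phi)) * np.sin(phi), phi)
print("omega / omega0 =", np.sqrt(1.0 - integral / (np.pi * k * A)))
# the ratio comes out below 1: an attractive force softens the effective spring
```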
Fig. 3.8 (a) A spatially uniform external potential $V(t)$ acting on an electron as a function of time, $t$. (b) Occupation of energy levels $\epsilon_n$ by the electron. Left: initially, when only the ground state $n = 0$ was occupied. Right: at $t = \infty$, when several states are occupied with different probabilities (shown by the length of the horizontal lines representing the states), with the excited state $n = 2$ being the most probable.
Expanding the electron wave function in terms of the stationary states $\psi_n(x)$,

$$\Psi(x, t) = \sum_{n=0}^{\infty} c_n(t)\,\psi_n(x),$$

then, according to the rules of quantum mechanics, the probability at time $t$ to find our electron in state $n$ is $P_n(t) = |c_n(t)|^2$, i.e. it is given by the modulus squared of the generalised Fourier coefficients. Of course, since the electron should occupy a state with certainty,

$$\sum_n P_n(t) = \sum_n |c_n(t)|^2 = 1.$$
Problem 3.41. Assuming that the stationary state functions $\psi_n(x)$ form an orthonormal set, show that the above condition corresponds to the correct normalisation of the wave function at any time: $\int \Psi^*(x, t)\,\Psi(x, t)\, dx = 1$.
If the measurement is done long after the action of the external potential, so that we can assume that all the relaxation processes in the system have ceased, then there will be a stationary distribution $P_n(\infty)$ of finding the electron in the state $n$, as schematically shown in Fig. 3.8(b). To calculate the probability, we can use formula (3.67) for the generalised Fourier coefficients:

$$P_n(t) = \left| \int \psi_n^*(x)\,\Psi(x, t)\, dx \right|^2.$$

Of course, this is just one example; in fact, quantum mechanics is largely built on the idea of expanding states into Fourier series with respect to some set of stationary states.
Another application of functional series in quantum mechanics is related to modern theories of quantum chemistry and condensed matter physics, where a many-electron problem is solved. The same Schrödinger equation is solved in both cases, but an essentially different expansion of the electron wave functions in basis functions (the basis set) is employed in each of them. While in condensed matter physics plane waves are used, which represent an orthogonal and complete basis whose convergence can be easily controlled, in quantum chemistry functions localised on atoms are used which mimic atomic-like orbitals. This way much smaller sets of basis functions are needed to achieve a reasonable precision; however, it is much more difficult to achieve convergence with respect to the basis set, as functions localised on different atoms do not represent an orthogonal and complete basis set, so that including too many of them on each atom may result in overcompleteness and hence instabilities in numerical calculations.
Chapter 4
Special Functions
In this chapter¹ we shall consider various special functions which have found prominent and widespread applications in physics and engineering. Most of these functions cannot be expressed via a finite combination of elementary functions, and are solutions of non-trivial differential equations. Their careful consideration is our main objective here.
We shall start from the so-called Dirac delta function, which corresponds to the
class of generalised functions, then move on to gamma and beta functions; then
consider in great detail various orthogonal polynomials, hypergeometric functions,
associated Legendre and Bessel functions. The chapter is concluded with differential
equations which frequently appear when solving physics and engineering problems,
where the considered special functions naturally emerge as their solutions.
4.1 Dirac Delta Function

Consider a charge $q$ which is spread along the $x$ axis. The distribution of the charge $q$ along $x$ is characterised by the distribution function $\rho(x)$, called the charge density, which is the charge per unit length. Integrating the density over the whole 1D space would give $q$:

$$\int_{-\infty}^{\infty} \rho(x)\, dx = q.$$
¹ In the following, references to the first volume of this course (L. Kantorovich, Mathematics for natural scientists: fundamentals and basics, Springer, 2015) will be made by appending the Roman number I in front of the reference, e.g. Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of the first volume, respectively.
If the charge $q$ is smeared out along $x$ smoothly, the density $\rho(x)$ would be a smooth function of the single variable $x$. In physics, however, it is frequently needed to describe point charges. A point charge is localised "at a single point", i.e. there is a finite charge at a single value of $x$ only; beyond this point there is no charge at all. How can one in this case specify the corresponding charge density? Obviously, there is a problem. Consider a charge $q$ placed at $x = 0$. Since the charge is assumed to be point-like, the density must be equal to zero everywhere outside the charge, i.e. where $x \neq 0$, and equal to some constant $A$ where the charge is, i.e. at $x = 0$. It is immediately obvious that the constant $A$ cannot be finite. Indeed, integration of the density must recover the total charge $q$. However, we obtain zero when integrating $A$, since the function $\rho(x) = A$ only within an immediate vicinity of the point $x = 0$, i.e. when $-\epsilon < x < \epsilon$ with $\epsilon \to 0$, but is zero outside it. Integrating such a density gives zero:

$$\int_{-\infty}^{\infty} \rho(x)\, dx = \lim_{\epsilon\to 0}\int_{-\epsilon}^{\epsilon} A\, dx = \lim_{\epsilon\to 0} 2A\epsilon = 0.$$
This paradoxical situation may be resolved in the following way: one hopes that the limit above may not be equal to zero if the constant $A$ is infinite, as in this case the limit would be of the $0\cdot\infty$ type and hence (one hopes) might be finite. This means that the charge density of a point charge must have a very unusual form: $\rho(x) = 0$ for any $x \neq 0$ and $\rho(0) = +\infty$, i.e. it is zero everywhere apart from the single point $x = 0$ where it is infinite; such a density has an infinitely high, zero-width spike at $x = 0$.
This is indeed a very unusual function which we have never come across before: all functions we have encountered so far were smooth with at most a finite number of discontinuities; we have never had a function which jumps infinitely high up on the left of a single point ($x = 0$) and immediately jumps back down on the right side of it. How can this kind of function be defined mathematically? We shall show below that this function can be defined in a very special way as a limit of an infinite sequence of well-defined functions.
Consider a function $\delta_n(x)$ which has the form of a rectangular impulse for any value of $n = 1, 2, 3, \ldots$ (Fig. 4.1):

$$\delta_n(x) = \begin{cases} n, & \text{if } -1/2n \le x \le 1/2n \\ 0, & \text{if } |x| > 1/2n \end{cases}. \qquad (4.1)$$

This function is constructed in such a way that the area under it is equal to one for any value of $n$, i.e. the definite integral

$$\int_{-\infty}^{\infty} \delta_n(x)\, dx = \int_{-1/2n}^{1/2n} n\, dx = n\,\frac{2}{2n} = 1 \qquad (4.2)$$
for any $n$. When $n$ is increased, the graph of $\delta_n(x)$ becomes narrower and more peaked; at the same time, the area under the curve remains the same and equal to one. In the limit of $n \to \infty$ we shall arrive at a function which has an infinite height and at the same time an infinitesimally small width: exactly what we need! This idealised "impulse" function, which is defined in the limit of $n \to \infty$, was first introduced by Dirac and bears his name:

$$\delta(x) = \lim_{n\to\infty}\delta_n(x), \qquad (4.3)$$

so that

$$\int_{-\infty}^{\infty} \delta(x)\, dx = 1. \qquad (4.4)$$

It is helpful to think of $\delta(x)$ as being an infinitely sharp impulse function of unit area centred at $x = 0$.
It is important to stress that the function defined in this way is not an ordinary function; it can only be understood and defined using a sequence of well-defined functions $\delta_n(x)$, $n = 1, 2, 3, \ldots$, called a delta sequence. This type of function belongs to the class of generalised functions.
We shall now consider one of the most important properties of this function, which is called the filtering theorem. Let $f(x)$ be a well-defined function which can be expanded in a Taylor series around $x = 0$. Consider the integral

$$\int_{-\infty}^{\infty} \delta_n(x)\, f(x)\, dx = n\int_{-1/2n}^{1/2n} f(x)\, dx. \qquad (4.5)$$
For large $n$ the integration interval $-1/2n \le x \le 1/2n$ becomes very narrow, and hence $f(x)$ can be represented well by the Maclaurin series about the point $x = 0$:

$$n\int_{-1/2n}^{1/2n} f(x)\, dx = n\int_{-1/2n}^{1/2n}\left[ f(0) + f'(0)\,x + \frac{f''(0)}{2}\,x^2 + \cdots \right] dx = f(0) + \frac{f''(0)}{24}\,\frac{1}{n^2} + \cdots. \qquad (4.6)$$

It can be readily seen that at large values of $n$ the integral is mainly determined by $f(0)$; the other terms (in the sum) are very small, the largest one being of the order of $1/n^2$. In fact, by taking the $n \to \infty$ limit we obtain

$$\lim_{n\to\infty}\int_{-\infty}^{\infty} \delta_n(x)\, f(x)\, dx = f(0). \qquad (4.7)$$

Formally, we can take the limit under the integral sign, which would turn $\delta_n(x)$ into $\delta(x)$, and hence arrive at the following formal result:

$$\int_{-\infty}^{\infty} \delta(x)\, f(x)\, dx = f(0). \qquad (4.8)$$
One may say that the integration of $f(x)$ with $\delta(x)$ has filtered out the value $f(0)$ of the function $f(x)$; this value corresponds to the point $x = 0$ where the delta function is peaked. This result is easy to understand: since the delta function is infinitely narrow around $x = 0$, only a single value of $f(x)$ at this point, $f(0)$, can be kept in the product $f(x)\delta(x)$ under the integral. Basically, within that infinitesimally narrow interval of $x$ the function $f(x)$ may be considered as a constant equal to $f(0)$. Therefore, it can be taken out of the integral, which then appears simply as the integral of the delta function alone, equal to unity because of Eq. (4.4).
We shall now discuss some obvious generalisations. Instead of considering the delta sequence centred at $x = 0$, one may define a sequence $\{\delta_n(x - a),\ n = 1, 2, 3, \ldots\}$, centred at the point $x = a$. This would lead us to the delta function $\delta(x - a)$ and the corresponding generalisation of the filtering theorem:

$$\int_{-\infty}^{\infty} \delta(x - a)\, f(x)\, dx = f(a). \qquad (4.9)$$

Here the delta function $\delta(x - a)$ peaks at $x = a$, and in the integral it filters out the value of the function $f(x)$ at this point.
In fact, more complex arguments of the delta function may be considered in the same way by making an appropriate substitution. For instance,

$$\int_{-10}^{5} e^{-x}\,\delta(2x + 1)\, dx = \left| \begin{array}{l} t = 2x + 1 \\ dt = 2\, dx \end{array} \right| = \int_{-19}^{11} e^{-(t-1)/2}\,\delta(t)\,\frac{dt}{2} = \left. \frac{1}{2}\, e^{-(t-1)/2} \right|_{t=0} = \frac{1}{2}\, e^{1/2} = \frac{\sqrt{e}}{2}.$$
Note that the exact numerical values of the boundaries in the integral are unimportant: as long as the singularity of the delta function happens within them, one can use the filtering theorem irrespective of the exact values of the boundaries; otherwise, i.e. when the singularity is outside the boundaries, the integral is simply equal to zero.
Using the filtering theorem, various useful properties of the delta function may be established. First of all, consider $\delta(-x)$. We have for any function $f(x)$ which is smooth around $x = 0$:

$$\int_{-\infty}^{\infty} \delta(-x)\, f(x)\, dx = -\int_{+\infty}^{-\infty} \delta(t)\, f(-t)\, dt = \int_{-\infty}^{\infty} \delta(t)\, f(-t)\, dt = f(-0) = f(0),$$

i.e. $\delta(-x)$ does the same job as $\delta(x)$, and hence we formally may write

$$\delta(-x) = \delta(x). \qquad (4.10)$$
Because of this property, which tells us that the delta function is even (something one would easily accept considering its definition), we may also write the following useful results:

$$\int_{-\infty}^{0} \delta(x)\, dx = \int_0^{\infty} \delta(x)\, dx = \frac{1}{2},$$

and hence

$$\int_{-\infty}^{0} \delta(x)\, f(x)\, dx = \int_0^{\infty} \delta(x)\, f(x)\, dx = \frac{f(0)}{2}. \qquad (4.11)$$
Problem 4.3. Prove the following other properties of the delta function:

$$x\,\delta(x) = 0\,;$$

$$\delta(ax + b) = \frac{1}{|a|}\,\delta\!\left( x + \frac{b}{a} \right). \qquad (4.12)$$

Note that in the latter case both positive and negative values of $a$ are to be considered separately.
Often the argument of the delta function is a more complex function than the linear one considered so far. Moreover, it may become equal to zero at more than one point, yielding more than one singularity point for the delta function. As a consequence, the filtering theorem in these cases must be modified. As an example, consider the integral

$$I = \int_{-\infty}^{\infty} f(x)\,\delta\!\left( x^2 - a^2 \right) dx.$$

It contains $\delta(x^2 - a^2)$, which has two impulses: one at $x = -a$ and another at $x = +a$. We split the integral into two: one performed around the point $x = -a$ and another around $x = +a$:

$$I = \int_{-a-\epsilon}^{-a+\epsilon} \delta\!\left( x^2 - a^2 \right) f(x)\, dx + \int_{a-\epsilon}^{a+\epsilon} \delta\!\left( x^2 - a^2 \right) f(x)\, dx = I_- + I_+,$$
where $0 < \epsilon < a$. In the first integral $I_-$ we change the variable $x$ into $t = x^2 - a^2$, so that $x = -\sqrt{t + a^2}$ and $dx = -dt/2\sqrt{t + a^2}$; here the minus sign is essential, as the point $t = 0$ where $\delta(t)$ is peaked must correctly correspond to $x = -a$ where $\delta(x^2 - a^2)$ is peaked. Performing the integration, we then obtain

$$I_- = -\int_{2a\epsilon+\epsilon^2}^{-2a\epsilon+\epsilon^2} \delta(t)\, f\!\left( -\sqrt{t + a^2} \right)\frac{dt}{2\sqrt{t + a^2}} = \int_{-2a\epsilon+\epsilon^2}^{2a\epsilon+\epsilon^2} \delta(t)\, f\!\left( -\sqrt{t + a^2} \right)\frac{dt}{2\sqrt{t + a^2}} = \frac{f\!\left( -\sqrt{a^2} \right)}{2\sqrt{a^2}} = \frac{f(-|a|)}{2|a|}.$$

In the second integral $I_+$ the same substitution is made; however, in this case $x = +\sqrt{t + a^2}$ and $dx = +dt/2\sqrt{t + a^2}$, while the integration results in $f(|a|)/2|a|$. We therefore obtain that

$$\int_{-\infty}^{\infty} f(x)\,\delta\!\left( x^2 - a^2 \right) dx = \frac{1}{2|a|}\left[ f(-|a|) + f(|a|) \right] = \frac{1}{2|a|}\left[ f(-a) + f(a) \right].$$
The same result is obtained if we formally accept the following identity:

$$\delta\!\left( x^2 - a^2 \right) = \frac{1}{2|a|}\left[ \delta(x - a) + \delta(x + a) \right]. \qquad (4.13)$$
The rectangular sequence considered above to define the delta function is not unique; there are many other sequences one can build to define the delta function as a limit. If we consider any bell-like function $\phi(x)$ defined for $-\infty < x < \infty$, of unit area and tending to zero as $x \to \pm\infty$, as in Fig. 4.2, then one can construct the corresponding delta sequence using the recipe $\delta_n(x) = n\,\phi(nx)$, which in the limit $n \to \infty$ corresponds to the delta function. Indeed, $\phi(nx)$ becomes narrower with increasing $n$ and, at the same time, the prefactor $n$ makes the function $\delta_n(x)$ more peaked as $n$ gets bigger; at the same time, the area under the curve remains the same for any $n$,

$$\int_{-\infty}^{\infty} \delta_n(x)\, dx = \int_{-\infty}^{\infty} n\,\phi(nx)\, dx = \left| \begin{array}{l} t = nx \\ dt = n\, dx \end{array} \right| = \int_{-\infty}^{\infty} \phi(t)\, dt = 1,$$

as required. Also, it is clear that, because the function $\delta_n(x)$ defined above gets narrower for larger $n$, the filtering theorem (4.8) or (4.9) is also valid.
Indeed, consider the integral

$$\int_{-\infty}^{\infty} f(x)\,\delta_n(x)\, dx = \int_{-\infty}^{\infty} f(x)\, n\,\phi(nx)\, dx = \int_{-\infty}^{\infty} f\!\left( \frac{t}{n} \right)\phi(t)\, dt.$$

Fig. 4.2 The graph of a typical bell-like function $\phi(x)$ which may serve as the one generating the corresponding delta sequence.

Expanding $f(t/n)$ into the Maclaurin series, $f(t/n) = f(0) + f'(0)\, t/n + \cdots$, we obtain

$$\int_{-\infty}^{\infty} f\!\left( \frac{t}{n} \right)\phi(t)\, dt = f(0)\int_{-\infty}^{\infty} \phi(t)\, dt + \frac{f'(0)}{n}\int_{-\infty}^{\infty} t\,\phi(t)\, dt + \cdots.$$

Since the function $\phi(t)$ is of unit area, the first term is simply $f(0)$, while all the other terms tend to zero in the $n \to \infty$ limit, provided that for any positive integer $k$ the integral $\int_{-\infty}^{\infty} t^k\,\phi(t)\, dt$ converges.
For example, the following functions can be used to generate delta sequences:

$$\phi_1(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad (4.15)$$

$$\phi_2(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}, \qquad (4.16)$$

$$\phi_3(x) = \frac{\sin x}{\pi x} = \frac{1}{2\pi}\int_{-1}^{1} e^{ikx}\, dk \qquad (4.17)$$

(the last passage in the equality is checked by direct integration). All these functions satisfy the required conditions and have the desired shape as in Fig. 4.2. Therefore, these functions generate the following delta sequences:

$$\delta_n^{(1)}(x) = \frac{n}{\sqrt{2\pi}}\, e^{-n^2x^2/2}, \qquad (4.18)$$

$$\delta_n^{(2)}(x) = \frac{n}{\pi}\,\frac{1}{1 + n^2x^2}. \qquad (4.19)$$
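The filtering property of a delta sequence is easy to watch numerically; the sketch below uses the Gaussian sequence (4.18) and the (arbitrarily chosen) test function $f(x) = \cos x$, for which the integral tends to $f(0) = 1$:

```python
import numpy as np

f = np.cos                                 # test function (arbitrary)
x = np.linspace(-30.0, 30.0, 2_000_001)

for n in (1, 10, 100):
    # Gaussian delta sequence (4.18)
    dn = n / np.sqrt(2 * np.pi) * np.exp(-(n * x) ** 2 / 2)
    print(n, np.trapz(dn * f(x), x))       # approaches f(0) = 1 as n grows
```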
From Eq. (4.17), written for the corresponding delta sequence, one obtains in the limit the integral representation

$$\delta(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ixt}\, dt. \qquad (4.20)$$

Another important formula involving the delta function is

$$\lim_{\delta\to+0}\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0 \pm i\delta}\, dx = \mathcal{P}\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\, dx \mp i\pi f(x_0), \qquad (4.21)$$

where $\delta \to +0$, $\mathcal{P}$ is the symbol of the Cauchy principal value, and $f(x)$ is some continuous function on the real axis. This formula can be written symbolically as

$$\frac{1}{x - x_0 \pm i\delta} = \mathcal{P}\,\frac{1}{x - x_0} \mp i\pi\,\delta(x - x_0). \qquad (4.22)$$
To prove it, let us multiply the denominator and the numerator of the integrand in the original integral by $(x - x_0) \mp i\delta$:

$$\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0 \pm i\delta}\, dx = \int_{-\infty}^{\infty} \frac{f(x)\left[ (x - x_0) \mp i\delta \right]}{(x - x_0)^2 + \delta^2}\, dx = \int_{-\infty}^{\infty} \frac{x - x_0}{(x - x_0)^2 + \delta^2}\, f(x)\, dx \mp i\int_{-\infty}^{\infty} \frac{\delta}{(x - x_0)^2 + \delta^2}\, f(x)\, dx. \qquad (4.23)$$
In the first term we have to exclude the point $x_0$ from the integration, since otherwise the integral diverges in the $\delta \to 0$ limit:

$$\int_{-\infty}^{\infty} \frac{x - x_0}{(x - x_0)^2 + \delta^2}\, f(x)\, dx = \int_{-\infty}^{x_0-\epsilon} \frac{x - x_0}{(x - x_0)^2 + \delta^2}\, f(x)\, dx + \int_{x_0+\epsilon}^{\infty} \frac{x - x_0}{(x - x_0)^2 + \delta^2}\, f(x)\, dx$$

$$\Longrightarrow\; \int_{-\infty}^{x_0-\epsilon} \frac{x - x_0}{(x - x_0)^2}\, f(x)\, dx + \int_{x_0+\epsilon}^{\infty} \frac{x - x_0}{(x - x_0)^2}\, f(x)\, dx \;\Longrightarrow\; \mathcal{P}\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\, dx$$

in the $\delta \to 0$ limit. Above, $\epsilon \to +0$. To transform the second term in Eq. (4.23), we notice that the expression $\delta/\left[ \pi\left( \delta^2 + x^2 \right) \right]$ tends to $\delta(x)$ in the $\delta \to 0$ limit. Indeed, this follows from the representation (4.19) of the delta function upon the substitution $n \to 1/\delta$. Therefore, the second term in Eq. (4.23), upon the application of the filtering theorem for the delta function, becomes $\mp i\pi f(x_0)$. This proves formula (4.21) completely.
Problem 4.5. By splitting the integral (4.20) into two by the zero point, show that this formula can also be equivalently written as follows:

$$\delta(x) = \frac{1}{\pi}\int_0^{\infty} \cos(xt)\, dt. \qquad (4.24)$$

Here $H(x)$ is the Heaviside unit step function defined in Sect. I.2.1. [Answer: (a) $\exp(2/5)/5$; (b) $\exp(242/5)/5$; and (c) $e(1 + 5e)/2$.]
The Heaviside unit step function $H(x)$ is directly related to the delta function. Indeed, $H(x)$ is constant everywhere apart from the point $x = 0$, and hence its derivative must be zero for any $x \neq 0$, while at $x = 0$, where $H(x)$ jumps by one, the derivative must be infinite. It is natural, therefore, to guess that

$$H'(x) = \delta(x). \qquad (4.25)$$

This guess can be supported by a simple calculation that shows that the integral of the derivative of the Heaviside function is equal to unity:

$$\int_{-\infty}^{\infty} H'(x)\, dx = H(\infty) - H(-\infty) = 1 - 0 = 1.$$

Note also that the integral of the delta function reproduces the Heaviside function,

$$H(x) = \int_{-\infty}^{x} \delta(t)\, dt, \qquad (4.26)$$

since for $x < 0$ it is zero, as the spike due to the singularity of the delta function appears outside the integration limits, while for $x > 0$ we obviously obtain $1$, as the singularity falls inside the limits. The case of $x = 0$ results in the value of the integral equal to $1/2$, due to the fact that the delta function is even; i.e. it follows that $H(0) = 1/2$ is consistent with Eq. (4.26). If we now differentiate both sides of this equation with respect to $x$, we obtain (4.25). The derivative function $H'(x)$ belongs to the class of generalised functions and is equal to $\delta(x)$ in this sense.
Problem 4.7. By calculating the integral $\int_{-\infty}^{\infty} f(x) H'(x)\, dx$ by parts, prove that the derivative of the Heaviside function, $H'(x)$, is equal to the Dirac delta function $\delta(x)$ in the sense that $H'(x)$ works as a filter for $f(x)$, exactly as the delta function.

Problem 4.8. By using integration by parts, prove the integral identity

$$\int_{-\infty}^{\infty} f(x)\,\delta'(x)\, dx = -f'(0),$$
Consider a system of coupled harmonic oscillators of unit mass, subjected to friction and random (stochastic) forces:

$$\ddot{\mathbf{X}} + \gamma\dot{\mathbf{X}} + \mathbf{D}\,\mathbf{X} = \mathbf{\Phi}, \qquad (4.27)$$

where $\gamma$ is the friction constant, $\mathbf{D}$ is the dynamical matrix, and the correlations of the random forces involve the delta function:

$$\left\langle \Phi_i(t)\,\Phi_j(t') \right\rangle = 2\gamma k_B T\,\delta_{ij}\,\delta(t - t'). \qquad (4.28)$$

This expression shows that the stochastic forces are not correlated in time: indeed, if $t \neq t'$, then the delta function is zero, meaning there are no correlations. Only the forces at the same times $t = t'$ correlate (when the delta function is not zero). This also means that the system of oscillators does not possess any memory, as the past ($t' < t$) does not influence the future at time $t$, due to the lack of correlation between the forces at different times. Also, forces corresponding to different degrees of freedom are not correlated with each other, due to $\delta_{ij}$. The appearance of the temperature $T$ in the right-hand side of Eq. (4.28) is not accidental: this ensures that the so-called fluctuation-dissipation theorem is obeyed. This is necessary to satisfy the equipartition theorem of statistical mechanics, as we shall see later on in this problem.
(i) Let $\mathbf{e}_\lambda$ and $\omega_\lambda^2$ be the eigenvectors and eigenvalues of the dynamical matrix $\mathbf{D}$. By writing $\mathbf{X}$ as a linear combination of all eigenvectors,

$$\mathbf{X}(t) = \sum_\lambda \xi_\lambda(t)\,\mathbf{e}_\lambda,$$

show that each scalar coordinate $\xi_\lambda(t)$ satisfies the following DE:

$$\ddot{\xi}_\lambda + \gamma\dot{\xi}_\lambda + \omega_\lambda^2\,\xi_\lambda = \Phi_\lambda(t), \qquad \Phi_\lambda(t) = \mathbf{e}_\lambda\cdot\mathbf{\Phi}(t).$$
(iii) Correspondingly, show that the solution of Eq. (4.27) which survives at long times can be written in the matrix form as follows:

$$\mathbf{X}(t) = \int_{-\infty}^{t} e^{-\gamma(t-\tau)/2}\, \frac{\sin\left[ \sqrt{\mathbf{D}_\gamma}\,(t - \tau) \right]}{\sqrt{\mathbf{D}_\gamma}}\, \mathbf{\Phi}(\tau)\, d\tau,$$

while the velocity vector, $\mathbf{V}(t) = \dot{\mathbf{X}}(t)$, reads

$$\mathbf{V}(t) = \int_{-\infty}^{t} e^{-\gamma(t-\tau)/2}\left\{ -\frac{\gamma}{2}\, \frac{\sin\left[ \sqrt{\mathbf{D}_\gamma}\,(t - \tau) \right]}{\sqrt{\mathbf{D}_\gamma}} + \cos\left[ \sqrt{\mathbf{D}_\gamma}\,(t - \tau) \right] \right\}\mathbf{\Phi}(\tau)\, d\tau.$$

Here the matrix $\mathbf{D}_\gamma = \mathbf{D} - \frac{\gamma^2}{4}\,\mathbf{E}$, where $\mathbf{E}$ is the unit matrix.

(iv) Show that half of the equal-time velocity-velocity autocorrelation function at long times, i.e. half of the average of the square of the velocity, is

$$\frac{1}{2}\left\langle \mathbf{V}(t)\,\mathbf{V}^T(t) \right\rangle = \frac{k_B T}{2}\,\mathbf{E},$$

i.e. $\frac{1}{2}k_B T$ of the kinetic energy (recall that our particles are of unit mass) is associated with each single degree of freedom, while the equal-time displacement-displacement autocorrelation function is

$$\left\langle \mathbf{X}(t)\,\mathbf{X}^T(t) \right\rangle = k_B T\,\mathbf{D}^{-1}.$$
4.2 The Gamma Function

We define the gamma function, $\Gamma(z)$, as the following integral in which its argument, generally a complex number $z$, appears as a parameter:

$$\Gamma(z) = \int_0^{\infty} t^{z-1}\, e^{-t}\, dt. \qquad (4.30)$$
We initially assume that $\mathrm{Re}\, z > 0$, as this integral converges in the right half of the complex plane. Indeed, because of $e^{-t}$, the convergence at $t \to \infty$ is guaranteed. However, the integral may diverge at $t = 0$, since for small $t$ one puts $e^{-t} \simeq 1$, arriving at the integral $\int_0^{\epsilon} t^{z-1}\, dt$ (with some $0 < \epsilon \ll 1$), which diverges for $z = 0$ (logarithmically). Therefore, the question of convergence of the integral is not at all straightforward. In order to prove the convergence of the integral for $\mathrm{Re}\, z > 0$, it is wise to split the integral into two parts corresponding to the intervals $0 \le t \le 1$ and $1 \le t < \infty$:

$$\Gamma(z) = \Gamma_1(z) + \Gamma_2(z), \qquad \Gamma_1(z) = \int_0^1 t^{z-1}\, e^{-t}\, dt \quad\text{and}\quad \Gamma_2(z) = \int_1^{\infty} t^{z-1}\, e^{-t}\, dt.$$
Consider first $\Gamma_2(z)$. For a vertical stripe $0 < \mathrm{Re}\, z \le x_{\max}$, we write $z = x + iy$, and then the integral can be estimated as

$$\left| \Gamma_2(z) \right| \le \int_1^{\infty} \left| t^{z-1}\, e^{-t} \right| dt = \int_1^{\infty} e^{-t}\left| e^{(x-1)\ln t + iy\ln t} \right| dt = \int_1^{\infty} e^{-t}\left| e^{(x-1)\ln t} \right|\left| e^{iy\ln t} \right| dt$$

$$= \int_1^{\infty} e^{-t}\, e^{(x-1)\ln t}\, dt \le \int_1^{\infty} e^{-t}\, e^{(x_{\max}-1)\ln t}\, dt = \int_1^{\infty} t^{x_{\max}-1}\, e^{-t}\, dt.$$

In writing the second line above we have made use of the fact that $\ln t > 0$ when $t > 1$. Since the integral in the right-hand side converges (its convergence at $t \to \infty$ is obvious because of the exponential function $e^{-t}$), $\Gamma_2(z)$ converges as well. Moreover, since the estimate above is valid for any $z$ within the stripe, it converges uniformly. Since the value of $x_{\max}$ was chosen arbitrarily, the integral $\Gamma_2(z)$ converges everywhere to the right of the imaginary axis, i.e. for any $\mathrm{Re}\, z > 0$, and is analytic there.
Consider now $\Gamma_1(z)$, which is obviously analytic for $\mathrm{Re}(z - 1) > 0$ (or when $x = \mathrm{Re}\, z > 1$). Within the vertical stripe $0 < x_{\min} \le \mathrm{Re}\, z < 1$ in the complex plane, the integral can be estimated as follows:

$$\left| \Gamma_1(z) \right| \le \int_0^1 \left| t^{z-1}\, e^{-t} \right| dt = \int_0^1 e^{-t}\left| e^{(x-1)\ln t} \right| dt = \int_0^1 e^{-t}\left| e^{-(1-x)(-\ln t)} \right| dt$$

$$\le \int_0^1 e^{-t}\, e^{-(1-x_{\min})\ln t}\, dt = \int_0^1 t^{x_{\min}-1}\, e^{-t}\, dt \le \int_0^1 t^{x_{\min}-1}\, dt = \left. \frac{t^{x_{\min}}}{x_{\min}} \right|_0^1 = \frac{1}{x_{\min}}.$$

This means that $\Gamma_1(z)$ converges, and converges uniformly. Hence, $\Gamma(z) = \Gamma_1(z) + \Gamma_2(z)$ is analytic everywhere to the right of the imaginary axis, $\mathrm{Re}\, z > 0$.
The function $\Gamma(z)$ satisfies a simple recurrence relation. Indeed, let us calculate the integral for $\Gamma(z + 1)$ by parts:

$$\Gamma(z + 1) = \int_0^{\infty} t^z\, e^{-t}\, dt = \left[ -t^z\, e^{-t} \right]_0^{\infty} + z\int_0^{\infty} t^{z-1}\, e^{-t}\, dt = z\int_0^{\infty} t^{z-1}\, e^{-t}\, dt = z\,\Gamma(z), \qquad (4.31)$$

where the free term above is zero both at $t = 0$ and $t = \infty$.
Problem 4.11. Prove that in the case of z being a positive integer, z D n, the
gamma function is equal to the factorial function:
.n C 1/ D nŠ: (4.32)
[Hint: to see this, first check that .1/ D 1 and then apply induction.]
p
Problem 4.12. Using induction and the fact that .1=2/ D (see below),
prove that
1 1 3 5 : : : .2n 1/ p .2n 1/ŠŠ p .2n/Š p
nC D D D 2n ;
2 2 n 2n 2 nŠ
(4.33)
where the double factorial .2n 1/ŠŠ corresponds to a product of all odd
integers between 1 and 2n 1.
2 Z 1 Z 1 Z 1Z 1
1 2 2 2 2
D4 ex dx ey dy D e.x Cy / dxdy;
2 0 0 1 1
which can be viewed as a double integral over the x y plane (that is why we have
introduced x and y as new variables). Hence, we can calculate it by going into polar
coordinates x D r cos and y D r sin (with dxdy D rdrd), which gives
2 Z 2 Z 1 Z 1
1 2 2
D d er rdr D 2 er rdr
2 0 0 0
Z 1
1 p
D et dt D H) D : (4.34)
0 2
where ˛ is a constant and n a positive integer. For odd values of n the integrand is
an odd function and the integral is obviously equal to zero. For even values of n the
integral In .˛/ is directly related to the gamma function of half integer argument as
is demonstrated by the following Problem.
Problem 4.14. Using a new variable x D ˛t2 in the integral of Eq. (4.36),
show explicitly that for even n
.nC1/=2 nC1
In .a/ D ˛ : (4.37)
2
1 2 2
G.x/ D p e.xx0 / =2 ; (4.39)
2
centred at the point x0 and with dispersion which characterises the width of
the function. An example of a Gaussian for x0 D 0 and D 1p is depicted in
Fig. 4.2. Show that the width of G.x/ at its half height is D 2 2 ln 2. Then,
prove that G.x/ is correctly normalised to unity. Finally, calculate the first two
momenta of the Gaussian:
Z 1 Z 1
xG.x/dx D x0 and x2 G.x/dx D x02 C 2
:
1 1
can be directly expressed via the gamma function. Similarly to the way we
calculated above .1=2/, consider the integral
Z 1 Z 1
x2 2˛1 y2 2ˇ1
ID e x dx e y dy
0 0
using two methods: (i) firstly, show that each integral in the brackets above is
related to the gamma function, so that I D 14 .˛/ .ˇ/; (ii) secondly, combine
the two integrals together into a double integral and then change into the polar
coordinates .x; y/ ! .r; '/; relate the r-integral to .˛ C ˇ/ by means of an
appropriate substitution, while the '-integral can be manipulated into 12 B.˛; ˇ/
by means of the substitution t D cos2 '. Hence show that
.˛/.ˇ/
B .˛; ˇ/ D : (4.41)
.˛ C ˇ/
(continued)
2
Named after Johann Carl Friedrich Gauss.
4.2 The Gamma Function 315
The second form (with x˛1 ) follows from the symmetry of the beta func-
tion (4.41).
Problem 4.20. Show that
Z 1 n .nŠ/2
1 x2 dx D 22nC1 : (4.43)
1 .2n C 1/Š
[Hint: using a new variable t via x D 1 2t, express the integral via the beta
function B.n C 1; n C 1/.]
Above the gamma function .z/ was defined in the right half of the complex
plane, to the right of the imaginary axis. The recursion relation (4.31) can be used
to analytically continue (Sect. 2.6) .z/ to the left half of the complex plane as well,
where Re z < 0. Indeed, let us apply the recurrence relation consecutively n 1
times to .z C n/ (here n D 1; 2; : : :):
.z C n C 1/ D .z C n/ .z C n/ D D .z C n/ .z C n 1/ .z C 1/ z.z/:
.z C n C 1/
.z/ D : (4.44)
.z C n/ .z C n 1/ .z C 1/ z
This formula can be used for calculating the gamma function for Re z 0. Indeed,
using in the above formula n D 0, we can write .z/ D .z C 1/=z, relating
.z/ within the vertical stripe 1 < Re z 0 to the values of the gamma function
.z C 1/ with 0 < Re .z C 1/ 1, where it is well defined. Similarly, by choosing
different values of n one can define the gamma function for the corresponding
vertical stripes of the width one to the left of the imaginary axis.
This formula also clearly shows that the gamma function in the left half of the
complex plane will have poles at z D n, where n D 0; 1; 2; : : :, i.e. the gamma
function defined this way is analytic in the whole complex plane apart from the
points z D 0; 1; 2; 3; : : : , where it has singularities. These are simple poles,
however, since (see Sect. 2.5.5) only the limit
is finite (recall that .1/ D 1); the limits of .z C n/k .z/ for any k > 1 are equal
to zero. The limit above (see Sect. 2.7.1) also gives the residue at the pole z D n,
which is .1/n =nŠ.
There is also a simple identity involving the gamma function which we shall
now derive. Using the integral representation (4.30) and assuming a real z between
0 and 1, let us consider the product .z/.1 z/. We shall employ a similar trick to
the one we used before when deriving Eq. (4.34). We write
Z 1 Z 1
.z/ .1 z/ D t1z1 et1 dt1 t2z et2 dt2 :
0 0
In the first integral we make the substitution x2 D t1 , while in the second integral
the substitution will be y2 D t2 . This brings us to a double integral
Z 1 Z 1 2z1
.x2 Cy2 / x
.z/ .1 z/ D 4 e dxdy;
0 0 y
in which we next use the polar coordinates x D r cos and y D r sin . This gives
(note that we only integrate over a quarter of the xy plane and hence 0 =2):
Z 1 Z =2 Z =2
2
.z/ .1 z/ D 4 er rdr .cot /2z1 d D 2 .cot /2z1 d:
0 0 0
p p
At the final step, we make the substitution t D cot , d D dt= 2 t .1 C t/ ,
which transforms the integral into the form which can be handled:
Z 1 z1
t
.z/ .1 z/ D dt D ; (4.45)
0 1Ct sin . z/
where at the last step we have used the result we obtained earlier in Problem 2.90 for
the integral in the right-hand side. This result was derived for 0 < z < 1. However,
it can be analytically continued for the whole complex plane and hence this formula
is valid for any z (apart from z D 1; 2; : : : where in both sides we have infinity).
Since .1 z/ D z .z/ because of the recurrence relation the gamma
function satisfies, the obtained identity can also be written as
for z > 0. This sequence converges to the gamma function in the limit of n !
1 because the sequence .1 t=n/n converges to et . The integral above can be
calculated by repeated integration by parts.
Y
n
C1=n/ z.1C1=2C1=3C C1=n/ C1=n/
1 D ez.1C1=2C1=3C e D ez.1C1=2C1=3C ez=k ;
kD1
giving
1 Y
n
z z=k
C1=nln n/
D zez.1C1=2C1=3C 1C e :
n .z/ kD1
k
Since .z/ D limn!1 n .z/, we can take the n ! 1 limit in the above formula
which yields
Y1
1 1 1 1 z z=k
D z exp z lim 1 C C C C ln n 1C e :
.z/ n!1 2 3 n kD1
k
Here the finite product becomes an infinite one, and the limit in the exponent is
nothing but the Euler–Mascheroni constant D 0:5772 : : : which we introduced
in Sect. I.7.1.2. Therefore, we finally obtain a representation of the gamma function
via an infinite product as follows:
Y1
1 z z=k
D zez 1C e : (4.47)
.z/ kD1
k
318 4 Special Functions
Problem 4.22. Using this product representation in Eq. (4.46), derive the
following representation of the sine function via an infinite product:
Y1
z2
sin . z/ D z 1 2 : (4.48)
kD1
k
Problem 4.23. Taking the logarithms of both sides of (4.48) and differentiat-
ing, show that
1
1 X 1
cot . z/ D : (4.49)
kD1
k C z
We shall need this beautiful result at the end of this chapter in Sect. 4.7.4.
X 1
1
G.x; t/ D p D Pn .x/tn : (4.50)
1 2xt C t2 nD0
4.3 Orthogonal Polynomials 319
Here 1 < x < 1, otherwise the square root is complex. Indeed, the function under
the square root,
f .t/ D 1 2xt C t2 D .t x/2 C 1 x2 ; (4.51)
is a parabola, see Fig. 4.3. It is positive for all values of t only if 1 x2 > 0, i.e.
when 1 < x < 1.
Let us expand the generating function G.x; t/ explicitly into the Taylor’s series
with respect to t and thus calculate several first polynomials:
X1 n ˇ
1 t @n G.x; t/ @G ˇˇ
G.x; t/ D p D D G.x; 0/ C t
1 2xt C t2 nD0
nŠ @tn tD0 @t ˇtD0
ˇ
1 @2 G ˇˇ
C t2 C
2 @t2 ˇtD0
ˇ
xt ˇ
ˇ
D 1C ˇ t
3=2 ˇ
2
.1 2xt C t / tD0
" #
1 1 3 .x t/.2t 2x/
C t2 C
2 .1 2xt C t2 /3=2 2 .1 2xt C t2 /5=2
tD0
1 2
D 1 C xt C 3x 1 t2 C :
2
Therefore, comparing this expansion with the definition of the Legendre polynomi-
als (4.50), we conclude that
1 2
P0 .x/ D 1 I P1 .x/ D x I P2 .x/ D 3x 1 : (4.52)
2
This procedure of direct expansion of the generating function can be continued;
however, the calculation becomes increasingly tedious.
320 4 Special Functions
X 1
@G
D nPn .x/tn1 : (4.53)
@t nD0
@G @ 1
D p
@t @t 1 2xt C t2
X1
xt xt xt
D D G.x; t/ D Pn .x/tn : (4.54)
.1 2xt C t2 /3=2 1 2xt C t2 1 2xt C t2 nD0
or
1
X 1
X 1
X 1
X 1
X
nPn tn1 2xnPn tn C nPn tnC1 D xPn tn Pn tnC1 :
nD0 nD0 nD0 nD0 nD0
Several simplifications are possible: the n D 1 term in the first sum does not
contribute because of the factor of .n C 1/, and hence the summation can start from
n D 0; in the third term we can add the n D 0 term since it is zero anyway because
of the prefactor n in front of Pn1 . Then, all three summations now run from n D 0,
have the same power of t and hence can be combined into one:
1
X
Œ.n C 1/PnC1 x.2n C 1/Pn C nPn1 tn D 0:
nD0
4.3 Orthogonal Polynomials 321
Since this expression must be valid for any t, each and every coefficient to tn should
be equal to zero3 :
Note that this recurrence relation is formally valid even for n D 0 as well if we
postulate that P1 D 0, in which case we simply get P1 .x/ D xP0 .x/ D x, the result
we already knew.
This recurrent relation can be used to generate the functions Pn .x/. Indeed, using
P0 D 1 and P1 D x, we have using n D 1 in the recurrence relation that 2P2 C P0 D
3xP1 , which gives
1 1 2
P2 D .3xP1 P0 / D 3x 1 ;
2 2
i.e. the same expression as obtained above using the direct method. All higher order
functions (corresponding to larger values of n) are obtained in exactly the same way,
i.e. using a very simple algebra.
Problem 4.24. Show using the direct method (the Taylor’s expansion) that:
1 1 1 5
P3 D .5x3 3x/ I P4 .x/D .35x4 30x2 C3/ I P5 .x/D 63x 70x3 C15x :
2 8 8
(4.56)
Problem 4.25. Confirm these results by repeating the calculation using the
recurrence relations.
Problem 4.26. Prove that Pn .x/ is a polynomial of order n. [Hint: use the
recurrence relation (4.55) and induction.]
X 1
@G
D P0n .x/tn ;
@x nD0
X1
@G t t t
D D G.x; t/ D Pn tn :
@x .1 2xt C t2 /3=2 1 2xt C t2 1 2xt C t2 nD0
3
Functions tn with different powers n are linearly independent.
322 4 Special Functions
or
1
X 1
X 1
X 1
X
P0n tn 2x P0n tnC1 C P0n tnC2 D Pn tnC1 ;
nD0 nD0 nD0 nD0
„ ƒ‚ … „ ƒ‚ …
n!nC1 nC1!n
which after the corresponding index substitutions indicated above transforms into:
1
X 1
X 1
X
P0nC1 tnC1 2xP0n C Pn tnC1 C P0n1 tnC1 D 0:
nD0 nD0 nD1
Note that the n D 1 term in the first sum does not contribute since P00 D 0 and
hence was omitted. Separating out the n D 0 terms in the first and second sums and
collecting other terms together, we obtain
1
X
P01 t 2xP00 C P0 t C P0nC1 2xP0n Pn C P0n1 tnC1 D 0:
nD1
The expression in the square brackets is in fact zero if we recall that P0 D 1 and
P1 D x. Thus, we immediately obtain a different recurrence relation:
Note that the two recurrence relations we have derived contain the Legendre
polynomials with three consecutive indices. Other identities can also be obtained
via additional differentiations as described explicitly below. In doing this, we aim
at obtaining such identities which relate only Legendre polynomials with two
consecutive indices. Using these we shall derive a differential equation the functions
Pn .x/ must satisfy.
To this end, we first differentiate Eq. (4.55) with respect to x:
Solve (4.58) with respect to xP0n and substitute into (4.57); after straightforward
algebra, we obtain
Solving (4.59) with respect to P0nC1 and substituting into Eq. (4.57) gives
while solving (4.59) with respect to P0n1 and substituting into Eq. (4.57) results in:
Now we have both P0n1 and P0nC1 expressed via Pn and P0n by means of
Eqs. (4.60) and (4.61), respectively. This should allow us to formulate a differential
equation for the polynomials Pn .x/. Differentiating Eq. (4.60), we can write
Substituting P0n1 and P00n1 from (4.60) and (4.63), respectively, into (4.64) gives an
equation containing functions Pn with the same index n:
1 x2 P00n 2xP0n .x/ C n.n C 1/Pn .x/ D 0; (4.65)
Problem 4.27. Expanding directly the generating function into the Taylor’s
series for x D ˙1 and comparing this expansion with the definition of the
polynomials, Eq. (4.50), show that
Problem 4.28. Using the fact that G.x; t/ D G.x; t/, prove that
i.e. polynomials are even functions (contain only even powers of x) for even
n, while polynomials with odd n are odd (contain only odd powers of x). In
particular, Pn .0/ D 0 for odd n. In other words,the polynomials
Pn .x/ for odd
n do not contain constant terms, e.g. P3 .x/ D 12 5x3 3x .
324 4 Special Functions
Problem 4.29. In this problem we shall calculate P2n .0/ using the method of
the generating function. First show that at x D 0 the Taylor’s expansion of the
generating function is
1
X .2n/Š 2n
G.0; t/ D .1/n t :
nD0
.nŠ/2 22n
P
Then, alternatively, G.0; t/ must be equal to the series 1nD0 Pn .0/t . Rewriting
n
2
this latter series as an expansion with respect to t (why can this only be done
for even n?), show that
.2n/Š
P2n .0/ D .1/n : (4.68)
22n .nŠ/2
The Legendre polynomials with different indices n are orthogonal to each other with
the weight one. To show this, we first rewrite the differential equation (4.65) in the
following equivalent form:
d
1 x2 P0n C n.n C 1/Pn D 0: (4.69)
dx
Multiplying it with Pm .x/ with m ¤ n and integrating between 1 and 1 gives
Z 1 Z 1
d
Pm 1 x2 P0n dx C n.n C 1/ Pn Pm dx D 0:
1 dx 1
or
Z 1 Z 1
n.n C 1/ Pn .x/Pm .x/dx D 1 x2 P0n P0m dx: (4.70)
1 1
Alternatively, we can start from Eq. (4.69) written for Pm .x/, then multiply it by
Pn .x/ with some n ¤ m and integrate; this way we would obtain the same result as
above but with n and m interchanged:
Z 1 Z 1
m.m C 1/ Pm .x/Pn .x/dx D 1 x2 P0m P0n dx:
1 1
4.3 Orthogonal Polynomials 325
ˇ ˇ1 ˇ ˇ
1 ˇ 2
C 1 ˇ 1 ˇ 1 t2 C1 ˇ
t ˇ ˇ
D ln ˇˇx ˇ D ln ˇ 2t
ˇ
2t 2t ˇ1 2t ˇ 1 t2 C1 ˇ
2t
ˇ ˇ ˇ ˇ
1 ˇ 2t t2 1 ˇ 1 ˇˇ t2 2t C 1 ˇˇ
D ln ˇˇ ˇ D ln ˇ 2
2t 2t t2 1 ˇ 2t t C 2t C 1 ˇ
ˇ ˇ ˇ ˇ
1 ˇ .t 1/2 ˇ 1 ˇˇ 1 t ˇˇ 1
D ln ˇˇ ˇ D ln ˇ D Œln.1 t/ ln.1 C t/ :
2t .t C 1/ 2 ˇ t 1Ct ˇ t
Using the Taylor expansion for the logarithms (recall that 1 < t < 1 and hence the
expansions converge),
1
X 1 k
X
tk t
ln.1 C t/ D .1/kC1 and ln.1 t/ D ;
kD1
k kD1
k
Only terms with the odd summation indices k survive, hence we can replace the
summation index according to the recipe k ! 2n C 1, yielding
326 4 Special Functions
1
X t2nC1
ln.1 t/ ln.1 C t/ D 2 ;
nD0
2n C 1
Problem 4.30. Show, using explicit calculation of the integrals, that P3 .x/ is
orthogonal to P4 .x/ (their expressions are given in Eq. (4.56)) and that P3 .x/ is
properly normalised, i.e.
Z 1 Z 1
2 2
P3 .x/P4 .x/dx D 0 and P23 .x/dx D D :
1 1 23C1 7
Since all polynomials contain different powers of x, they are all linearly
independent. It can also be shown that they form a complete set (see Sect. 4.3.2.6),
and hence a function f .x/ defined on the interval 1 x 1 can be expanded in
them:
1
X
f .x/ D an Pn .x/: (4.75)
nD0
4.3 Orthogonal Polynomials 327
Multiplying both sides of this equation by Pm .x/ with some fixed value of m,
integrating between 1 and 1 and using the orthonormality condition (4.74), we get
Z 1 1
X Z 1
f .x/Pm .x/dx D an Pn .x/Pm .x/dx
1 nD0 1
Z 1 1
X 2 2
H) f .x/Pm .x/dx D an ınm D am ;
1 nD0
2n C 1 2m C 1
from which the following expression for the expansion coefficient follows:
Z 1
2n C 1
an D f .x/Pn .x/dx: (4.76)
2 1
Example 4.1. I
Let us expand the Dirac delta function in Legendre polynomials:
1
X
ı.x/ D an Pn .x/:
nD0
Problem 4.32. Expand the function f .x/ D 1 C x C 2x2 into a series with
respect to the Legendre polynomials:
2
X
1 C x C 2x2 D cn Pn .x/:
nD0
Why from the start do only polynomials up to the order two need to be con-
sidered? Using explicit expressions for the several first Legendre polynomials,
verify your expansion. [Answer: c0 D 5=3, c1 D 1 and c2 D 4=3.]
Problem 4.33. Expand f .x/ D x3 via the appropriate Legendre polynomials.
[Answer: x3 D 35 P1 .x/ C 25 P3 .x/.]
Problem 4.34. Hence, show that
Z 1
2 4
Pn .x/x3 dx D ın1 C ın3 :
1 5 35
328 4 Special Functions
We shall now prove the so-called Rodrigues formula which allows writing the
Legendre polynomial Pn .x/ with general n in an explicit and compact form:
1 dn 2 n
Pn .x/ D x 1 : (4.77)
2 nŠ dx
n n
n
To prove it, we consider an auxiliary function #.x/ D x2 1 , which satisfies
the equation (check!):
2 d#
x 1 D 2xn#.x/: (4.78)
dx
We shall now differentiate this equation n C 1 times using the Leibnitz formula
Xn
dn .uv/ n .k/ .nk/
D u v ; (4.79)
dxn kD0
k
!
dnC1 X nC1
n C 1 .k/ .nC1k/
2n nC1 .x#/ D 2n x #
dx kD0
k
! !
nC1 nC1
D 2n x# .nC1/ C 2n # .n/ D 2nx# .nC1/ C 2n.n C 1/# .n/ :
0 1
or
1 x2 # .nC2/ 2x# .nC1/ C n.n C 1/# .n/ D 0 H)
1 x2 U 00 2xU 0 C n.n C 1/U D 0;
which is the familiar Legendre equation (4.65) for the function U.x/ D # .n/ .
Since the function U.x/ satisfies the correct differential equation for Pn .x/, it
must be equal to Pn .x/ up to an unknown constant factor. To find this factor, and
hence prove the Rodrigues formula (4.77), we shall calculate U.1/ and compare it
with Pn .1/ which we know is equal to one:
dn 2 n dn
U.x/ D # .n/ D x 1 D Œ.x C 1/n .x 1/n :
dxn dxn
Use the Leibnitz formula again:
n
X n
U.x/ D Œ.x C 1/n .k/ Œ.x 1/n .nk/ ;
k
kD0
where
nŠ
Œ.x C 1/n .k/ D n.n 1/ : : : .n k C 1/.x C 1/nk D .x C 1/nk ;
.n k/Š
and similarly
nŠ nŠ
Œ.x 1/n .k/ D .x 1/nk H) Œ.x 1/n .nk/ D .x 1/k :
.n k/Š kŠ
Hence, we obtain
n
X n nŠ nŠ
U.x/ D .x C 1/nk .x 1/k :
kD0
k .n k/Š kŠ
and this proves the normalisation factor in Eq. (4.77). The Rodrigues formula is
proven completely.
330 4 Special Functions
where w.x/ 0 is a weight function (see Sects. 1.1.2 and 3.7.3). In this section we
shall call the expression .f ; g/ defined by Eq. (4.81) an overlap integral between two
functions f .x/ and g.x/. Note that the overlap integral is fully defined if the weight
function w.x/ is given. For the moment we shall not assume any particular form of
the weight function, but later on several forms of it will be considered.
Theorem 4.1. Any polynomial Hn .x/ of order n can be presented as a linear com-
bination of Qm polynomials with m D 0; 1; : : : ; n, i.e. higher order polynomials are
not required:
X
n
Hn .x/ D cnk Qk .x/: (4.82)
kD0
Proof. The Qk .x/ polynomial contains only powers of x from zero to k. Then,
collecting all polynomials Q0 .x/, Q1 .x/, : : :, Qn .x/ into a vector-column Q, one can
write this statement in a compact form using the following formal matrix equation:
4.3 Orthogonal Polynomials 331
0 1 0 10 1
Q0 .x/ a11 1
B Q1 .x/ C B a21 a22 CB x C
B C B CB C
B Q2 .x/ C B a31 C B x2 C
B CDB a32 a33 CB C;
B : C B : :: :: : : CB : C
@ :: A @ :: : : : A @ :: A
Qn .x/ an1 an2 an3 ann xn
which can also be written simply as Q D AX, where A is the left triangular matrix
of the coefficients aij , and X is the vector-column of powers of x. Correspondingly,
X D A1 Q. It is known (see Problem 1.29 in Sect. 1.2.3) that the inverse of a
triangular matrix has the same structure as the matrix itself, i.e. A1 is also left
triangular:
0 1 0 10 1
1 b11 Q0 .x/
B x C B b21 b22 C B Q1 .x/ C
B C B CB C
B x2 C B b31 C B Q2 .x/ C
B CDB b32 b33 CB C;
B : C B : :: :: : : CB : C
@ :: A @ :: : : : A @ :: A
xn bn1 bn2 bn3 bnn Qn .x/
where bij are elements of the matrix A1 . In other words, the k-th power of x is
expanded only in polynomials Q0 ; Q1 ; : : : ; Qk . Since the polynomial Hn .x/ contains
only powers of x from 0 to n, it is expanded exclusively in polynomials Qk with
k n, as required. Q.E.D.
A simple corollary to this theorem is that any polynomial Hn .x/ of order n is
orthogonal (in the sense of the definition (4.81)) to any of the polynomials Qk .x/
with k > n. Indeed, Hn .x/ can be expanded in terms of the orthogonal polynomials
as shown in Eq. (4.82). Therefore, the overlap integral
X
n X
n
.Hn ; Qk / D cnl .Ql ; Qk / D cnl ıkl :
lD0 lD0
It is seen from here that the overlap integral .Hn ; Qk / ¤ 0 only if the summation
index l may accept a value of k, but this can only happen if k n. This proves the
statement made above: .Hn ; Qk / D 0 for any k > n.
Theorem 4.2. The polynomial Qn .x/ has exactly n roots on the interval
a x b.
Proof. Let us assume that Qn .x/ changes its sign k times on the interval, and
that k < n (strictly). This means that the function Qn .x/ must cross the x axis
at some k points x1 ; x2 ; : : : ; xk lying within the same interval, see an example in
Fig. 4.4. Consider then a polynomial Hk .x/ D .x x1 / .x x2 / .x xk / which
332 4 Special Functions
Fig. 4.4 Polynomials Q3 .x/, Q4 .x/ and Q5 .x/ cross the x axis three, four and five times,
respectively, and hence change their sign the same number of times
also changes its sign k times on the same interval and has its roots at the same
points. Correspondingly, the product F.x/ D Qn .x/Hk .x/ does not change its sign at
all, and hence the integral
Z b Z b
w.x/F.x/dx D w.x/Hk .x/Qn .x/dx ¤ 0
a a
(recall that w.x/ 0). This result, however, contradicts the fact proven above that
Qn is orthogonal to any polynomial of a lower order than itself. It follows therefore
that k must be equal to n as only in this case the contradiction is eliminated. Q.E.D.
It follows then that the orthogonal polynomials fQn .x/g must be somewhat
special as they possess a special property that each of them has exactly as many
roots as its order. Since a polynomial of order n may have no more than n distinct
roots (the total number of roots, including repetitions, is n), we conclude that for
any Qn .x/ all its n roots must be distinct.
X
nC1 Z
1 b
.xQn ; Qk /
xQn D hn;k Qk ; with hn;k D w.x/xQn .x/Qk .x/dx D ;
kD0
Dk0 a .Qk ; Qk /
(4.83)
where
Z b
Dk0 D w.x/Q2k .x/dx D .Qk ; Qk / (4.84)
a
so that
.n/
DnC1;0 DnC1;0 an
hnC1;n D hn;nC1 D :
Dn;0 Dn;0 a.nC1/
nC1
334 4 Special Functions
This identity must be valid for any n. Therefore, using n 1 instead of n in it, we
can write
.n1/
Dn;0 an1
hn;n1 D :
Dn1;0 a.n/
n
This gives us the first coefficient in the expansion (4.85). It is now left to calculate
the second coefficient hn;n , which can be done by comparing the coefficients to xn in
both sides of Eq. (4.85):
After collecting all the coefficients we have just found, the recurrence relation (4.85)
takes on the following form:
Here we shall derive the differential equation which the polynomials Qn .x/ should
satisfy. However, at this point more information is to be given concerning the
weight function w.x/. As it is customarily done, we shall assume that it satisfies
the following first order DE:
w0 .x/ ˛.x/
D ; (4.88)
w.x/ .x/
.x/w.x/jxDa;b D 0: (4.89)
It will become apparent later on why it is convenient that these are obeyed.
4.3 Orthogonal Polynomials 335
The first (free) term in the right-hand side is equal to zero because of the
boundary condition (4.89); to calculate the integral in the right-hand side we use
the integration by parts again:
Z
ˇb b k1 0
I D k xk1 w Qn ˇa C k x w Qn dx:
a
Again, due to the boundary condition, the first term in the right-hand side is zero,
and hence
Z Z
b k1 0 b
IDk x w Qn dx D k .k 1/ xk2 w C xk1 w0 C xk1 w 0
Qn dx:
a a
The second term in the square brackets can be rearranged into xk1 w0 D xk1 w˛
because of the DE (4.88) the weight function satisfies. Then we can finally write
Z b
0
IDk w .k 1/ xk2 C xk1 ˛ C xk1 Qn dx:
a
Now the whole expression in the square brackets is a polynomial of the order k
(note that 0 is the first order polynomial), and therefore I D 0 for any k < n as any
polynomial of order less than n is orthogonal to Qn .
On the other hand, the integral I can be written directly as
Z Z
b 0 b
ID xk w Q0n dx D xk w0 Q0n C w 0 Q0n C w Q00n dx
a a
Z Z
b b
D x w˛Q0n C w 0 Q0n C w Q00n dx D
k
wxk ˛C 0
Q0n C Q00n dx;
a a
where we have used (4.88) again. Here the expression in the square brackets is some
polynomial Hn of order n. Since we know that I D 0 for any k < n, the polynomial
Hn should be proportional to Qn , i.e. we must have
0
˛C Q0n C Q00n D n Qn H) Q00n C ˛ C 0
Q0n n Qn D 0; (4.90)
which is the desired DE. The constant n can be expressed via the coefficients of
the polynomials ˛.x/ and .x/. Indeed, comparing the coefficients to xn in the DE
above, we obtain:
Hence, the selection of the functions ˛.x/ and .x/, which define the weight
function w.x/, precisely determines the DE for the orthogonal polynomials. This
DE, which we shall rewrite in a simpler form as
There exists a compact general formula for the polynomials Qn .x/ corresponding to
a particular weight function w.x/ which also bears the name of Olinde Rodrigues4 :
Cn dn
Qn .x/ D Œw.x/ n
.x/ ; (4.94)
w.x/ dxn
v 0 .x/ D w0 .x/ n
.x/ C w.x/n n1
.x/ 0 .x/ D n1
.x/w.x/ ˛.x/ C n 0 .x/ ;
where we expressed .x/w0 .x/ from Eq. (4.88). Multiplying both sides of the above
equation by .x/, we obtain
In this equation .x/ and ˛.z/ C n 0 .x/ are the second and first order polynomials,
respectively. Therefore, we can differentiate both sides of this equation n C 1 times
using the Leibnitz formula (4.79) and a finite number of terms will be obtained in
the right- and the left-hand sides:
4
He actually derived it only for Legendre polynomials in 1816.
4.3 Orthogonal Polynomials 337
0 .nC1/ nC1 .nC2/ nC1 0 .nC1/ nC1 00 .n/
LHS D v D v C v C v
0 1 2
1
D v .nC2/ C .n C 1/ 0 .nC1/
v C n .n C 1/ 00 v .n/ ;
2
.nC1/
RHS D ˛Cn 0 v D ˛ C n 0 v .nC1/ C .n C 1/ ˛ 0 C n 00
v .n/ :
Since the two expressions must be equal, we obtain after small rearrangements:
n
v .nC2/ C 0
˛ v .nC1/ .n C 1/ ˛ 0 C 00
v .n/ D 0:
2
To obtain a DE for Qn we recall that, according to the Rodrigues formula (4.94) we
are set to prove here, v .n/ is supposed to be proportional to wQn . Therefore, we have
to replace v .n/ with wQn in the above DE for the final rearrangement (and ignoring
the constant prefactor between them):
n
.wQn /00 C 0
˛ .wQn /0 .n C 1/ ˛ 0 C 00
wQn D 0:
2
Performing differentiation and using repeatedly Eq. (4.88) to express derivatives of
the weight function via itself, w0 D w˛= and
˛ 0 ˛ ˛0 ˛ 0
wh 0 ˛ i
w00 D w D w0 Cw w 2
D ˛ C ˛ 0
;
we obtain exactly DE (4.90) for Qn .x/ with the same prefactor (4.91) to Qn . This
proves the Rodrigues formula in the very general case.
We know from the very beginning that one can define a generating function G.x; t/
for Legendre polynomials, Eq. (4.50), which can then be used as a starting point in
deriving all their properties. In the current section we have taken a different approach
by deriving the polynomials from the weight function. Still, it is important to show
that a generating function can be constructed in the very general case as well. This
can easily be shown using the Rodrigues formula.
Indeed, consider the function of two variables:
1
X Qn .x/ tn
G.x; t/ D ; (4.95)
nD0
Cn nŠ
where Cn is the constant prefactor in the Rodrigues formula, Eq. (4.94). Using the
Rodrigues formula, we first rewrite G.x; t/:
1
1 X tn d n
G.x; t/ D .w n
/; (4.96)
w.x/ nD0 nŠ dxn
338 4 Special Functions
and then use the Cauchy formula (2.74) for the n-th derivative:
1 I I 1
1 X tn nŠ w.z/ n
.z/ 1 1 w.z/ X .z/t n
G.x; t/ D dz D dz;
w.x/ nD0 nŠ 2 i L .z x/ nC1 w.x/ 2 i L z x nD0 z x
where L is some contour in the complex plane that surrounds the point x; it is to be
chosen such that the function w.z/, when analytically continued into the complex
plane from the real axis, is analytic inside L including L itself ( .z/D 0 C 1 zC 2 z2
is obviously analytic everywhere). Assuming that j t= .z x/j < 1 (this can
always be achieved by choosing t sufficiently small), we can sum up the geometric
progression inside the integral to obtain
I I
1 1 w.z/ 1 1 1 w.z/
G.x; t/ D .z/t
dz D dz:
w.x/ 2 i L zx1 w.x/ 2 i L z x .z/t
zx
At this point we need to understand where the roots are for small enough t. To this
end, let us expand the square root in terms of t up to the first order:
q
.1 t 1 /2 4t 2 .t 0 C x/ D 1 . 1 C 2 2 x/ t C :
If .x/ is a constant or a first order polynomial, then there is only one root of f .z/
which is easily seen to be always close to x for small t. Hence, the formula derived
above is valid formally in this case as well, where z is the root in question.
We have already discussed the general theory of functional series in Sect. I.7.2. We
also stated in Sect. 4.3.1.2 that “good” functions f .x/ can be expanded in Legendre
polynomials since these form a complete set. This is actually true for any orthogonal
polynomials. This is because the orthogonal polynomials Qn .x/, n D 0; 1; 2; : : :, as
it can be shown, form a closed set which is the necessary condition for them to form
a complete set.
To explain what that means, consider functions f .x/ for which the integral
Z b
f 2 .x/w.x/dx < C1:
a
imply that f .x/ D 0, then the functions fQn .x/; n D 0; 1; 2; : : :g form a closed
set. This is analogous to a statement that if a vector in an n-dimensional space is
orthogonal to every basis vector of that space, then this vector is a zero vector, i.e.
the collection of basis vectors is complete.
The point about the orthogonal polynomials is that any family of them (i.e. for
the given ˛.x/, .x/, see the following subsection) forms such a closed set, and
hence any function f .x/ for which integrals
Z b Z b
2 2
f .x/w.x/dx and f 0 .x/ w.x/ .x/dx
a a
z x t .z/ D tz2 x C .z t/ D 0;
By replacing 2t ! t in the last two formulae, we obtain the usual definition (4.50)
of the generating function for the Legendre polynomials.
Calculations for other classical polynomials are performed in the same way.
These are considered in the following Problems.
which is the expression that is sometimes used to define the Hermite polynomi-
als.
Problem 4.37. Verify using the method of the generating function and the
Rodrigues formula that several first Hermite polynomials are
Problem 4.39. Verify using the method of the generating function and the
Rodrigues formula that several first Laguerre polynomials are
1 2
L0 D 1 I L1 D 1x I L2 D
x 4xC2 I
2
1 3 1 4
L3 D x C9x2 18xC6 I L4 D x 16x3 C72x2 96xC24 : (4.107)
6 24
342 4 Special Functions
./
Problem 4.40 (Generalised Laguerre Polynomials Ln .x/). The only dif-
ference with the previous case is that ˛.x/ D x. Show that in this case
w.x/ D x ex , and by choosing Cn D 1=nŠ we get
It is seen from the above formulae that the Legendre polynomials can indeed be
.0;0/
obtained from Jacobi ones at D D 0, i.e. Pn .x/ D Pn .x/. Chebyshev poly-
.1=2;1=2/
nomials, Tn .x/, follow by choosing D D 1=2, i.e. Tn .x/ Pn .x/.
In fact, it can be shown that Jacobi, Hermite and Laguerre polynomials cover all
possible cases of orthogonal polynomials.
It is possible to derive an explicit expression for the Jacobi polynomials (4.115).
This is done by applying the Leibnitz formula (4.79) when differentiating n times
the product of .1 x/nC and .1 C x/nC ,
n h
dn h i X n i.k/ .nk/
n
.1 x/ nC
.1 C x/ nC
D .1 x/nC .1 C x/nC :
dx k
kD0
nC
where are generalised binomial coefficients (2.90), and similarly
k
.nk/ nC
.1 C x/nC D .n k/Š .1 C x/Ck ;
nk
we obtain
n
1 X Cn Cn
P.;/ .x/ D .x 1/nk .x C 1/k ; (4.116)
n
2n kD0 k nk
from which it is evident that this is indeed a polynomial for any real values of
and . Note that the obtained expression is formally valid for any and
including negative integer values for which some values of k in the sum are cut
off. This would happen automatically because of the numerators in the generalised
binomial coefficients. The obtained expression can be used for writing down an
explicit formula for, e.g. Legendre polynomials (when D D 0).
344 4 Special Functions
ˇ.z/ 0 .z/
y00 .z/ C y .z/ C 2 y.z/ D 0; (4.117)
.z/ .z/
where ˇ.z/ is a polynomial of up to the first order, while .z/ and .z/ are
polynomials of the order not higher than two. This equation is called a generalised
equation of the hypergeometric type.
For generality we shall consider solutions in the complex plane, i.e. z in the DE
above is complex. This type of equation is frequently encountered when solving
partial differential equations (PDEs) of mathematical physics using the method
of separation of variables. We shall consider a number of examples which would
emphasise this point in Sect. 4.7. Here, however, we shall simply try to investigate
solutions of the above equation. More specifically, we shall investigate under which
conditions its solutions on the real axis (when z D x) are bound (limited) within a
particular interval of x between a and b; note that the latter boundaries could be also
1 and/or C1. This consideration is very important when obtaining physically
meaningful solutions because in physics we normally expect the solutions not to be
infinite in the spatial region of interest.
where .z/ is an unknown polynomial of the first order which we shall try to select
in order to perform the required transformation. Since
0 0 00 0 2 00 0 0 0 2 0 2
D H) D C D C ;
.z/ 0 .z/
u00 .z/ C u .z/ C 2 u.z/ D 0; (4.120)
.z/ .z/
where
are the first and the second order polynomials, respectively. The unknown polyno-
mial .z/ is now selected in a specific way so that D with some constant .
This must be possible as the polynomial .z/ in (4.121) only depends on the
unknown first order polynomial .z/ D 0 C 1 z (with two unknown parameters 0
and 1 ) and .z/, both polynomials on the left- and right-hand sides of the equation
D are of the second order, so that equating the coefficients to z0 , z1 and z2 on
both sides in this equation should give three algebraic equations for , 0 and 1 .
Although this procedure would formally allow us to transform Eq. (4.120)
into the required standard form (compare with Eq. (4.92) for classical orthogonal
polynomials),
where
k D 0 (4.124)
346 4 Special Functions
is a constant (recall that by our assumption the function .z/ is a first order
polynomial and hence its derivative is a constant). In order for the function .z/ to be
a first order polynomial, both terms in the right-hand side of Eq. (4.123) have to be
polynomials up to the first order. The free term .ˇ 0 / =2 is already a first order
polynomial, but we also have to ensure that the square root is a first order polynomial
as well. The expression under the square root, as it can easily be seen, is a second
order polynomial. However, the square root of it may still be an irrational function.
The square root is going to be a first order polynomial if and only if the second order
polynomial under the root is the exact square of a first order polynomial. Then the
square root is a rational function which is a first order polynomial.
Equipped with this idea, we write the expression inside the square root explicitly
as a quadratic polynomial, R2 .z/ D a0 C a1 z C a2 z2 (with the coefficients a0 , a1 and
a2 which can be expressed via the corresponding coefficients of the polynomials
.z/, .z/ and ˇ.z/), and make it up to the complete square:
a1 2 a21
R2 D a2 z C C D; D D a0 :
2a2 4a2
This polynomial is going to be an exact square if and only if the constant term is
zero: D D 0. This procedure gives possible values for the constant k. There could
be more than one solution. Once k is known, we find the complete function .z/
from (4.123), and hence by solving Eq. (4.119) obtain the transformation function
.z/. The prefactors .z/ and of the new form of the DE (4.122) are then obtained
from (4.121) and (4.124).
Example 4.2. I As an example, consider the transformation of the following DE
z C 1 0 .z C 1/2
y00 C y C yD0
z z2
0
p p
i 1i 3 1i 3
D D H) ln D i ln z z
z z 2 2
h z p i
H) .z/ D zi exp 1 i 3 :
2
p
Finally, we calculate the prefactor .z/ from (4.121) yielding D .1 2i/ C i 3z.
Now the initial DE accepts a new form (4.122) for u.z/ with the above values of
and .z/. Once the solution u.z/ of the new equation is obtained, the required
function y.z/ can be found via y.z/ D .z/u.z/. J
We shall come across several more examples of this transformation below.
In this section we shall study (mostly polynomial) solutions of the DE (4.122). This
DE is called the DE of a hypergeometric type.
Let u.z/ be a solution of such an equation. It is easy to find the DE which is
satisfied by the n-th derivative of u, i.e. by the function gn .z/ D u.n/ .z/. Indeed,
using Leibnitz formula (4.79), one can differentiate the DE (4.122) n times recalling
that .z/ and .z/ are polynomials of the second and first orders, respectively. We
have
.n/ 1
u00 D u.nC2/ C n 0 u.nC1/ C n .n 1/ 00 .n/
u ;
2
0 .n/
u D u.nC1/ C n 0 u.n/ ;
where
1
n .z/ D n 0 .z/ C .z/ and n D C n 0 C n .n 1/ 00
(4.126)
2
are a first order polynomial and a constant.
In particular, if n D 0 for some integer n, then one of the solutions of the
DE (4.125), g00n C n g0n D 0, is a constant. If gn D u.n/ .z/ is a constant, then surely
this in turn means that u.z/ must be an n-th order polynomial. We see from here
immediately that if the constant takes on one of the following eigenvalues,
1
n D n 0 n .n 1/ 00
; (4.127)
2
348 4 Special Functions
where
and
1 nCm1
m D n C m 0 C m .m 1/ 00
D .n m/ 0 C 00
: (4.130)
2 2
It is explicitly seen from this that when m D n we have n D 0, as it should be.
We shall now rewrite DEs (4.122) and (4.128) in the self-adjoint form in which
the terms with the second and first order derivatives are combined into a single
expression. To this end, let us multiply (4.122) and (4.128) by some functions w.z/
and wn .z/, respectively, so that the two DEs would take on the self-adjoint form
each:
0 0
wu0 C wu D 0 and wm vm0 C m wm vm D 0; (4.131)
Problem 4.44. Show that functions w.z/ and wm .z/ must satisfy the following
DEs:
Problem 4.45. Using expression (4.126) for m .z/, manipulate the equation
for wm into w0m =wm D w0 =w C m 0 = and hence show that
wm .z/ D m
.z/w.z/; m D 0; 1; 2; : : : ; n (4.133)
(when integrating, an arbitrary constant was set to zero here which corresponds
to the prefactor of one in the relation above).
4.4 Differential Equation of Generalised Hypergeometric Type 349
The above equations should help derive an explicit expression for the polynomi-
als un .z/ corresponding to n D 0 for a particular (positive integer) value of n. Note
.m/
that m ¤ 0 for m < n. Recalling that vm D un .z/ is the m-th derivative of the
.mC1/
polynomial un .z/, we can write vmC1 D un D vm0 . Also,from (4.133),
we have
wmC1 D mC1 w D wm , and therefore, .wmC1 vmC1 /0 D wm vm0 . By virtue of
the second DE in (4.131), this expression should also be equal to m wm vm , i.e. we
obtain the recurrence relation:
1
wm vm D .wmC1 vmC1 /0 : (4.134)
m
.n/
Expressing vm from the left-hand side and realizing that vn D un .z/ is a constant,
we obtain
Cnm
vm D un.m/ .z/ D .wn .z//.nm/ ; (4.135)
wm .z/
where
" n1 #1 2 31 2 3
Y Y
n1
Y
m1
Mm
Cnm Du.n/
n .k / Du.n/
n
4 j 5 4 j 5 D u.n/
n
kDm jD0 jD0
Mn
(4.136)
is some constant prefactor, and
Y
k1 Y Y
k1 nCj1 nŠ
k1
nCj1
Mk D j D .n j/ 0 C 00
D 0 C 00
:
2 .n k/Š 2
jD0 jD0 jD0
(4.137)
By definition it is convenient to assume that M0 D 1. Formula (4.135) demonstrates
that the polynomials and their derivatives are closely related.
In particular, when m D 0, one obtains an explicit (up to a constant prefactor)
expression for the polynomial solution of the DE (4.122) we have been looking for:
Cn0 Cn0
un .z/ D .wn .z//.n/ D . n
.z/w.z//.n/ ; (4.138)
w.z/ w.z/
350 4 Special Functions
where we have made use of Eq. (4.133). This is the familiar Rodrigues for-
mula (4.94). The prefactor
.n/
M0 un
Cn0 D u.n/
n D
Mn Mn
contains Mn , for which we can write down an explicit expression from (4.137), and
.n/
a constant prefactor un which is set by the normalisation of the polynomials (this
will be discussed later on).
It is seen that we have recovered some of our previous results of Sect. 4.3.2 using
a rather different approach which started from the DE itself.
It is easy to show that on the real axis the polynomial functions un .x/ correspond-
ing to different values of n in Eq. (4.122) are orthogonal if the following condition
is satisfied at the boundary points x D a and x D b of the interval:
ˇ
.z/w.z/zk ˇzDa;b D 0; (4.139)
Problem 4.46. Using the method developed in Sect. 4.3.1.2 when we proved
orthogonality of Legendre polynomials, show that on the real axis two solutions
un .x/ and um .x/ of the DE (4.122) with n and m , respectively, are orthogonal:
Z b
w.x/un .x/um .x/dx D 0; n ¤ m: (4.140)
a
[Hint: instead of Eq. (4.122) use its self-adjoint form (the first equation (4.131))
and then note that either un .x/ or um .x/ consists of a sum of powers of z.]
Problem 4.47. Similarly, consider the DE (4.128) for the m-th derivative,
.m/
vm .x/ D un .x/, of the polynomial un .x/, which obviously is also a polynomial.
Show that these are also orthogonal with respect to the weight function
wm .x/ D m .x/w.x/ for different n and the same m:
Z b
.m/
w.x/ m
.x/un.m/ .x/uk .x/dx D 0; k ¤ n: (4.141)
a
[Hint: also use the self-adjoint form for the DE, the second equation in (4.131).]
Let us now derive the relationship between the normalisation integral (4.84) for
the polynomials,
Z b
Dn0 D w.x/u2n .x/dx; (4.142)
a
4.4 Differential Equation of Generalised Hypergeometric Type 351
This can be done by considering the second DE in Eq. (4.131) for the function
.m/
vm .x/ D un .x/ which we shall write using (4.134) as
.wmC1 vmC1 /0 C m wm vm D 0:
Multiplying both sides of it by vm .x/ and integrating between a and b and applying
integration by parts for the derivative term we obtain
Z b Z b
.wmC1 vmC1 vm /jba wmC1 vmC1 vm0 dx C m wm vm2 dx D 0:
a a
The first term is zero due to the boundary condition (4.139). In the second term
vm0 .x/ D vmC1 .x/, and hence we immediately obtain a recurrence relation: Dn;mC1 D
m Dnm . Repeatedly applying this relation, Dnm can be directly related to Dn0 :
!
Y
m1
Dnm D m1 Dn;m1 D m1 m2 Dn;m2 D D k Dn0 D .1/m Mm Dn0 ;
kD0
(4.144)
which is the required relationship. The quantity Mm was defined earlier by
Eq. (4.137). Recall that m are given by Eq. (4.130). We shall employ this identity in
the next section to calculate the normalisation integral for the associated Legendre
functions.
Another useful application of the above result is in calculating the normalisation
Dn0 of the polynomials. Indeed, setting m D n in Eq. (4.144) and noticing that
Z Z
b 2 2 b
Dnn D w.x/ n
.x/ u.n/
n dx D u.n/
n w.x/ n
.x/dx;
a a
we obtain
2 2
un
.n/ Z b
.n/
an nŠ Z b
Dn0 D .1/ n
w.x/ n
.x/dx D .1/ n
w.x/ n
.x/dx:
Mn a Mn a
(4.145)
.n/
Here we have made use of the fact that is a constant and hence can be taken
un
.n/
out of the integral. Also, this constant can trivially be related to the coefficient an
.n/ .n/
to the highest power xn in the polynomial un .x/ as un D an nŠ, leading finally
to the above relationship for the normalisation integral. Therefore, what is required
for the calculation of the normalisation Dn0 is the knowledge of the highest power
.n/
coefficient an and the calculation of the integral of w n . These can be obtained in
each particular case of the polynomials as is done in the next section.
352 4 Special Functions
We recall from Sect. 4.4.1 that when transforming the original DE (4.117)
into the standard form (4.122), several cases for choosing the constant k and the
polynomial .x/ might be possible. The obtained above result for the normalisation
constants may help in narrowing down that uncertainty. Indeed, consider the case
of n D m D 1. Then, from (4.144) it follows that D11 D 0 D10 , where
0 D 0 jnD1 D 0 , see Eq. (4.130). Note that both quantities, D11 and D10 , must
be positive, see Eqs. (4.142) and (4.143). Therefore, 0 must be negative. Let us
remember this result. This is a necessary condition which can be employed when
choosing particular signs for k and the first order polynomial .x/ when applying
the transformation method of Sect. 4.4.1.
So far we have discussed mostly orthogonal polynomials as solutions of the
hypergeometric type equation (4.122). Polynomial solutions correspond to partic-
ular values of given by the eigenvalues of Eq. (4.127). It is also possible to
construct solutions of such an equation for other values of . Although we are not
going to do this here as it goes way beyond this course, we state without proof a
very important fact that complete solution of the original generalised equation of
the hypergeometric type (4.117) on the real axis corresponding to other values of
than those given by Eq. (4.127) is not bound within the interval a x b.
Only solutions corresponding to the eigenvalues n from (4.127), i.e. orthogonal
polynomials, result in bound solutions of the original equation (4.122). In other
words, only such solutions are everywhere finite in the interval of their definition,
any other solutions will indefinitely increase (decrease) within that interval or at its
boundaries. As was mentioned at the beginning of this section, this is extremely
essential in solving physical problems when quantities of interest can only take on
finite values.
Here we shall revisit Jacobi, Hermite and Laguerre polynomials using the general
theory developed above. We shall derive their explicit recurrence relations and
establish their normalisation.
We shall first consider the question of normalisation. As an example, let us look
first at the Legendre polynomials. Their DE is given by Eq. (4.65), it is already in
the standard form with .x/ D 1 x2 , .x/ D 2x and n D n .n C 1/. We also
know that the weight function in this case is w.x/ D 1. Hence from Eq. (4.137) we
can calculate
nŠ Y .1/n nŠ Y
k1 k1
nCj1
Mk D 2 C .2/ D .n C j C 1/
.n k/Š jD0 2 .n k/Š jD0
and hence Mn D .1/n .2n/Š. We also need the integral in Eq. (4.145),
R1
2 n
1 1 x dx, which has been calculated before, see Eq. (4.43). The final
.n/
ingredient is the coefficient an to the xn term in the polynomial.
To find it, consider
n
the Rodrigues formula (4.77) in which we shall expand x2 1 into the Taylor’s
series and differentiate each term n times:
n
1 h 2 n i.n/ 1 X n .n/
x 1 D .1/nk x2k :
2 nŠ
n 2 nŠ kD0 k
n
The term with the highest power of x (the term with xn ) arises when k D n, and the
required coefficient is
1 n 1
a.n/
n D .2n/ .2n 1/ : : : .2n n C 1/ D n .2n/ .2n 1/ : : : .n C 1/
2 nŠ
n n 2 nŠ
1 .2n/Š
D ;
2n nŠnŠ
so that
.2n/Š
a.n/
n D : (4.146)
2n .nŠ/2
Collecting all our findings in Eq. (4.145), we obtain the final result,
2 2nC1
1 .2n/Š 2 .nŠ/2 2
Dn0 D .1/n 2
nŠ D ;
.1/ .2n/Š 2 .nŠ/
n n .2n C 1/Š 2n C 1
a.n/
n D2 :
n
(4.147)
Problem 4.49. The generalised Laguerre polynomials Ln .x/ defined for x 0
are specified by the Rodrigues formula (4.109); they satisfy the DE (4.108). In
this case: D x, D 1Cx and w D x ex . Show by repeated differentiation
of xnC ex in the Rodrigues formula that the term with the highest power of x
arises when differentiating the exponential function only, and hence in this case
.1/n
a.n/
n D : (4.149)
nŠ
Correspondingly, verify that the normalisation is
.n C C 1/
Dn0 D : (4.150)
nŠ
.;/
Problem 4.50. Consider the Jacobi polynomials Pn .x/ given by Rodrigues
formula (4.115), defined on the interval 1 x 1 and satisfying the
DE (4.114). The coefficient to the highest power of x (which is xn ) can be
obtained by considering the limit
a.n/ n .;/
n D lim x Pn .x/ : (4.151)
x!1
Use this formula in conjunction with the general expression (4.116) for the
Jacobi polynomials and formula (2.90) for the generalised binomial coeffi-
cients, to find that
1 2n C C
a.n/
n D : (4.152)
2n n
2CC1 .n C C 1/ .n C C 1/
Dn0 D : (4.153)
nŠ .2n C C C 1/ .n C C C 1/
[Hint: when calculating the integral appearing in Eq. (4.145), relate it to the
beta function using the method employed in deriving Eq. (4.43), and then
repeatedly use the recurrence relation for the gamma function.]
The last point of the general theory which remains to be considered sys-
tematically concerns the derivation of the recurrence relation for all classical
polynomials. We have considered in detail various recurrence relations for Legendre
polynomials in Sect. 4.3.1.1 (see Eq. (4.55) in particular), and a general formula
4.4 Differential Equation of Generalised Hypergeometric Type 355
for any orthogonal polynomials has been derived in Sect. 4.3.2.2. It follows from
Eq. (4.87) derived in the latter section that in order to set up explicitly the recurrence
relation between any three consecutive polynomials, one needs to know both the
.n/ .n/
coefficients an and an1 , and the normalisation Dn0 . We have calculated the former
and the latter above in this section; however, we still need to calculate the coefficient
.n/
an1 to the power xn1 .
This task can be accomplished easily if we notice that
.n/ .n/
un .x/ D a.n/
n x C an1 x
n n1
C H) u.n1/
n D a.n/
n nŠ x C an1 .n 1/Š:
.n/
un Mn1 Mn1 w0n
vn1 D u.n1/ .x/ D .wn .x//.1/ D a.n/
n nŠ ;
n
wn1 .z/ Mn Mn wn1
where we have made use also of Eq. (4.136). However, because of the second
equation in (4.132) and of Eq. (4.129), we can write
w0n wn w0n wn n 0
D D
wn1 wn1 wn wn1
0
w n n 0 0 0 0
D D n Dn C D .n 1/ C :
w n1
Also, from (4.137),
Mn1 1 1
D D 0 00
;
Mn n1 C .n 1/
.n/ .n/ C .n 1/ 0
a.n/
n nŠ x C an1 .n 1/Š D an nŠ
0 C .n 1/ 00
.n/ 0
an1 C .n 1/
H) .n/
Dn x : (4.154)
an 0 C .n 1/ 00
Note that the first term in the square brackets above is the first order polynomial
which must start from x and cancel out the second term giving the required constant
ratio of the two coefficients.
As an example, let us consider the case of Laguerre polynomials Ln .x/ D Ln0 .x/.
.n/ .n/
For them D x and D 1 x, so that an1 =an D n2 . According to
.nC1/ .nC1/
recurrence relation (4.87), we also need the ratio an =anC1 , which is obtained
from the previous one by the substitution n ! n C 1. Then, from (4.150) we have
356 4 Special Functions
Dn0 D .n C 1/=nŠ D 1 and according to (4.149) we know the expressions for the
.n/ .n1/ .nC1/
coefficients an , an1 and anC1 . Hence the recurrence relation (4.87) reads in this
case:
Problem 4.52. Prove that for the generalised Laguerre polynomials, Ln .x/,
the recurrence relation reads
xLn D .n C / Ln1
C .2n C C 1/ Ln .n C 1/ LnC1
: (4.156)
Problem 4.53. Prove that the recurrence relation for the Hermite polynomi-
als is
1
2xHn D 2nHn1 C HnC1 : (4.157)
2
Problem 4.54. Prove that the recurrence relation for the Legendre polynomi-
als is given by Eq. (4.55).
Problem 4.55. Prove that the recurrence relation for the Jacobi polynomials
is
2.nC/.nC/ .;/ 2 2
xP.;/ D Pn1 C P.;/
n
.2nCCC1/.2nCC/ .2nCC/.2nCCC2/ n
2.n C 1/.n C C C 1/ .;/
C P : (4.158)
.2n C C C 2/.2n C C C 1/ nC1
Here we shall consider special functions which have a solid significance in many
applications in physics. They are related to Legendre polynomials and are called
associated Legendre functions. We shall see in Sect. 4.5.3 that these functions
appear naturally while solving the Laplace equation in spherical coordinates;
similar PDEs appear in various fields of physics, notably in quantum mechanics,
electrostatics, etc.
4.5 Associated Legendre Function 357
where m D 0; ˙1; ˙2; : : :. We require solutions of the above equation which are
bound within the interval 1 x 1. Whether or not there are such solutions of the
DE would certainly depend on the values of the parameter . Therefore, one task
here is to establish if such particular values of exist that ensure the solutions are
bound, and if they do, what those eigenvalues are. Finally, we would like to obtain
the bound solutions explicitly.
To accomplish this goal, we shall use the method developed above in Sect. 4.4.
Using notations from this section, we notice that this equation is of the generalised
2
2
2 type (4.117) with .x/ D 1 x , ˇ.x/ D 2x and .x/ D
hypergeometric
1 x m . It can be transformed into the standard form (4.122) by means of the
transformation ‚.x/ D .x/u.x/ with the transformation function .x/ satisfying
Eq. (4.119) where the first order polynomial .x/ is determined from Eq. (4.123).
The polynomial .x/ has the form
p
D ˙ .k / .1 x2 / C m2 ; (4.160)
where k D 0 . Here , .x/ and .x/ D ˇ.x/ C 2.x/ (see Eq. (4.121)) enter the
DE (4.122) for u.x/. We now need to find such values of k that would guarantee .x/
from (4.160) to be a first order polynomial. It is easily seen that two cases are only
possible: (1) k D , in which case .x/ D ˙m, and (2) k D m2 , in which case
.x/ D ˙mx. We therefore have four possibilities, some of them would provide us
with the required solution.
Let us specifically consider the case of m 0 and choose .x/ D mx which
yields .x/ D 2x 2mx D 2 .m C 1/ x. This choice guarantees that 0 < 0
2
as required (see the end of Sect. 4.4.2). This case corresponds to k D m2 .
0
The transformation function .x/, satisfying the DE = D = D mx= 1 x
from (4.119), is immediately found to be (up to an insignificant multiplier)
m=2
.x/ D 1 x2 : (4.161)
Next,
D k C 0 D k m D m .m C 1/ : (4.162)
has non-trivial bound solutions only if the parameter takes on the (eigen)values
from Eq. (4.127), i.e. n D 2n .m C 1/ C n .n 1/, where n D 0; 1; 2; : : : is a
positive integer which can be used to number different solutions u.x/ ! un .x/.
Consequently, the required values of become, from (4.162),
H) n D n C m .m C 1/ D l .l C 1/ with l D n C m: (4.163)
As n 0, we should have l m 0.
Next, we shall find the weight function w.x/. It satisfies the first DE given
in (4.132), i.e.
w0 2mx m
. w/0 D w H) D H) w.x/ D 1 x2 : (4.164)
w 1 x2
m=2 h l i.lm/
‚lm .x/ D .x/ul .x/ D Clm 1 x2 1 x2 ; (4.165)
where in writing the constant factor Clm and the solution itself, ‚lm .x/, we indicated
specifically that they would not only depend on l, but also on the value of m. The
obtained functions are called associated Legendre functions and denoted Pm l .x/
because they have a direct relation to the Legendre polynomials Pl .x/ as we shall
see in a moment. Choosing appropriately the proportionality constant in accord with
tradition, we write
1 m=2 dlm l
Pm
l .x/ D 1 x2 1 x2 : (4.166)
2 lŠ
l dx lm
m=2
It may seem that because of the prefactor 1 x2 this function is infinite (not
l
bound) at the boundary points x D ˙1; however, this is not the case as 1 x2 is
4.5 Associated Legendre Function 359
differentiated l m times yielding the final power of 1 x2 being larger than m=2.
We can also see this point more clearly by directly relating these functions with the
Jacobi polynomials. This can be done in the following way:
.l m/Š
2 m=2 .m;m/
Pm
l .x/ D .1/
lCm
1 x Plm .x/; (4.167)
2m lŠ
as can be easily checked by comparing (4.166) with the Rodrigues formula (4.115)
for the Jacobi polynomials.
Replacing formally m ! m in the above formula, another form of the
associated Legendre functions is obtained:
1
2 m=2 d
lCm
2 l
2 m=2 d
m
l .x/ D 1 1 D .1/ 1 Pl .x/;
l
Pm x x x
2l lŠ dxlCm dxm
(4.168)
which shows the mentioned relationship with the Legendre polynomials. Above we
have made use of the Rodrigues formula (4.77) for these polynomials. In particular,
P0l .x/ D .1/l Pl .x/. This other form is also bound everywhere within 1 x 1
and is also related to the Jacobi polynomials:
We need to show though that Pm l .x/ (where still m 0) is a solution of the associated
Legendre equation. In fact, we shall show that by proving that the two functions,
Pm
l .x/ and Pl .x/ (where m
m
0), are directly proportional to each other.
To demonstrate this fact, let us derive an explicit expression for the function
Pml .x/ inspired by some tricks we used when deriving explicit expression (4.116)
l
for the Jacobi polynomials. Writing 1 x2 as .1 x/l .1 C x/l and performing
differentiation with the help of the Leibnitz formula (4.79), we obtain
lCm
1 X
2 m=2 lCm .k/ .lCmk/
l .x/ D
Pm 1 x .1 x/l .1 C x/l ;
2l lŠ kD0
k
where
.k/ lŠ
.1 x/l D .1/k .1 x/lk for k l;
.l k/Š
.lCmk/ lŠ
.1 C x/l D .1 C x/km ;
.k m/Š
360 4 Special Functions
which is only non-zero for k m and hence it limits the values of k from below.
Therefore, finally:
lŠ .lCm/Š X
2 m=2
l
.1/k
l .x/D
Pm 1x .1x/lk .1Cx/km :
2l kDm
kŠ .lCmk/Š .lk/Š .k m/Š
(4.170)
lŠ .lm/Š X
2 m=2
lm
.1/k
Pm
l .x/D 1x .1x/lkm .1 C x/k :
2l kD0
kŠ .lmk/Š .lk/Š .kCm/Š
(4.171)
Now, changing the summation index k ! k C m, rearrange the sum and
then derive the following relationship between the two representations of the
associated Legendre function:
.l m/Š m
Pm
l .x/ D .1/
m
P .x/; l m 0: (4.172)
.l C m/Š l
We see that the two functions are proportional to each other and hence are
solutions of the same DE for the same value of m.
Problem 4.57. Derive several first associated Legendre functions:
p p
P11 D 1 x2 I P12 D 3x 1 x2 I P22 D3 1 x2 I
3 p 3=2
P13 D 5x2 1 1 x2 I P23 D15x 1 x2 I P33 D 15 1 x2 I
2
5p 15 2
P14 .x/ D 1 x2 7x3 3x I P24 D 7x 1 1 x2 I
2 2
3
2 3=2
2
P4 D 105x 1 x I P4 D105 1 x2 :
4
(continued)
4.5 Associated Legendre Function 361
m
It should now be obvious that one can equivalently use either Pml .x/ or Pl .x/ as
a solution of the associated Legendre equation. It is customary to use the function
Pml .x/ with the non-negative value of m.
Note that Pm l .x/ is not the only solution of the differential equation (4.159).
However, the other solution is infinite at x ! ˙1 and thus is not acceptable for
many physical problems. Hence it will not be considered here.
Here we shall consider some properties of the functions Pm l .x/. Firstly, let us show
0
that the functions Pm
l .x/ and Pm
l0 .x/ are orthogonal for l ¤ l and the same m:
Z 1
0
Ill0 D l .x/Pl0 .x/dx D 0; l ¤ l :
Pm m
(4.173)
1
But the integral is nothing but the orthogonality condition (4.140) written for
.m;m/
the
Jacobi polynomials Pn .x/, for which the weight function is w.x/ D
m
1 x2 (see Problem 4.43). Therefore, it is equal to zero.
Problem 4.59. Derive the orthogonality condition for the associated Legendre
functions exploiting the relationship (4.168) between them and the Legendre
polynomials and the orthogonality condition (4.141) for the derivatives of
the polynomials.
Next, let us derive the normalisation integral for the associated Legendre
functions:
Z Z
1
2
1 m h .m/ i2
Dlm D l .x/
Pm dx D 1 x2 Pl .x/ dx:
1 1
362 4 Special Functions
.m/
The functions Pl .x/ are the m-th derivatives of the Legendre polynomials. The
latter are characterised by the unit weight function w.x/ D 1, .x/ D 2x and
.x/ D 1 x2 . The weight wm .x/ associated with the m-th derivative
m of Pl .x/,
according to Eq. (4.133), must be wm .x/ D m .x/ D 1 x2 . Therefore, the
integral above is the normalisation integral (4.143) for the m-th derivative of the
Legendre polynomials Pl .x/, and hence we can directly use our result (4.144) to
relate this to the normalisation Dl0 of the functions Pl .x/ themselves:
!
Y
m1
Dlm D k Dl0 :
kD0
For Legendre polynomials Dl0 D 2= .2l C 1/, see Eq. (4.73), and, according
to (4.130),
1
k D .l k/ 0 .l k/ .l C k 1/ 00
2
D 2 .l k/ C .l k/ .l C k 1/ D .l k/ .l C k C 1/ ;
This result allows us to write the orthonormality condition for the associated
Legendre functions as:
Z 1
2 .l C m/Š
Ill0 D l .x/Pl0 .x/dx D
Pm m
ıll0 : (4.174)
1 2l C 1 .l m/Š
It is convenient to redefine the solutions ‚lm .x/ of the DE (4.159) in such a way that
their normalisation would be equal to one:
Z 1
‚lm .x/‚l0 m .x/dx D ıll0 : (4.175)
1
The associated Legendre functions we have encountered above form the main
component of the so-called spherical functions or spherical harmonics which
appear in a wide class of physical problems where PDEs containing the Laplacian
are solved in spherical coordinates. For instance, this happens when considering
central field problems of quantum mechanics (Sect. 4.7). Therefore, it is essential to
introduce these functions. It is natural to do this by considering the simplest problem
of the Laplace equation in spherical coordinates.
Consider the Laplace equation D 0. Solutions of such an equation are called
harmonic functions. We shall obtain these by considering the Laplace equation
in spherical coordinates .r; ; /. We shall learn in Sect. 7.9 that the Laplacian of
.r; ; / in the spherical coordinates can be written as:
1 @ 2@ 1 @ @ 1 @2
r C sin C D 0: (4.177)
r2 @r @r r2 sin @ @ r2 sin2 @ 2
First of all, we note that 1=r2 appears in all terms and hence can be cancelled.
Next, we shall attempt5 to separate the variables in Eq. (4.177). For that, we shall
be looking for solutions of this PDE which are in the form of a product of three
functions (a product solution),
each depending on its own variable. Substituting this product solution into the PDE
and dividing through by D R‚ˆ, we obtain
1 d 2 dR 1 d d‚ 1 1 d2 ˆ
r C sin C D 0:
R dr dr ‚ sin d d sin2 ˆd 2
It is seen that the part depending on the angle is “localised”: nowhere else in the
equation is there any dependence on the angle . Hence one can solve the above
equation with respect to this part:
1 d2 ˆ 2 1 d 2 dR sin d d‚
D sin r C sin : (4.179)
ˆd 2 R dr dr ‚ d d
The left-hand side of (4.179) depends only on , while the right-hand side only
depends on the other two variables .r; /. This is only possible if both sides are
5
We shall basically use the method described in more detail in Sect. 8.2.5.
364 4 Special Functions
equal to the same constant; let us call it the separation constant . Hence, we can
write the above equation equivalently as two equations:
1 d2 ˆ d2 ˆ
D H) C ˆ D 0; (4.180)
ˆd 2 d 2
and
21 d 2 dR sin d d‚
D sin r C sin :
R dr dr ‚ d d
It is seen that the variable was “separated” from the other two variables. Now we
need to separate the variables r and in the equation above. This is easily done by
dividing through both sides on sin2 :
1 d dR 1 d d‚
r2 D sin C : (4.181)
R dr dr ‚ sin d d sin2
In Eq. (4.181) all terms depending on r are collected on the left, while the right-hand
side depends only on . It follows, therefore, that both sides must be equal to the
same (separation) constant which this time we shall call . This way we arrive at
the following two final equations:
d 2 dR
r R D 0; (4.182)
dr dr
1 d d‚
sin C ‚ D 0; (4.183)
sin d d sin2
To find , we note that the function ˆ. / must be periodic with respect to its
argument with a period of 2 due to the geometric meaning of this angle. Indeed, if
we increase by 2 , we come back to the same point in space, and hence we must
require that the solution cannot change because of the transformation ! C 2 .
Therefore, .r; ; C 2 / D .r; ; /, which is ensured if ˆ. C 2 / D ˆ. /;
i.e. if ˆ. / is a periodic function with the period of 2 .
4.5 Associated Legendre Function 365
Let us consider Eq. (4.180) for different values of to verify which ones would
ensure such periodicity. When D m2 > 0, the solution of (4.180) is
ˆ. / D A sinh.p / C B cosh.p /:
In all these solutions A and B are arbitrary constants. It is readily seen that the
required periodicity of ˆ . / is only possible in the first case when ˆ. / is a sum
of sine and cosine functions. But even in that case, this may only happen if the
values of m D 0; ˙1; ˙2; : : : are integers. Hence, the eigenvalues are D m2
and the corresponding eigenfunctions are given by Eq. (4.184), i.e. these are either
ˆm D sin .m / or ˆm D cos .m / (or their arbitrary linear combinations). In fact,
it is very convenient to use the complex eigenfunctions ˆm D eim and ˆm D eim
instead. This is particularly useful for quantum-mechanical applications.
Using the exponential form of the eigenfunctions, it is especially easy to see that
they satisfy
Z 2
ˆm . / ˆm0 . /d D 2 ımm0 ; (4.185)
0
In order to simplify this equation, we shall change the variable using the
substitution x D cos , where 1 x 1. For any function f . /,
df df dx df df df
D D sin H) sin D 1 x2 ;
d dx d dx d dx
so that the first term in the DE above becomes
d d‚ d
2 d‚
2 d
2 d‚
sin sin D sin 1x D 1x 1x ;
d d d dx dx dx
366 4 Special Functions
which results in the following DE for ‚.x/ in terms of the new variable:
d2 ‚ d‚ m2
1 x2 2x C ‚ D 0; (4.187)
dx2 dx 1 x2
which is precisely the same equation as the one we considered in detail in Sect. 4.5.
Therefore, physically acceptable solutions of this DE which are bound everywhere
within the interval 1 x 1 are given by associated Legendre functions ‚lm .x/,
Eq. (4.176), with D l .l C 1/, where l D 0; 1; 2; : : :. With these values of this is
the only solution which is finite along the z axis ( D 0 for x D 1 and D for
x D 1).
A simple illustration of this requirement for the possible eigenvalues, D
l .l C 1/, can be given based on solving the DE (4.187) using the Frobenius method
around the point x D 0. We have shown in Sect. I.8.4.2 in Example I.8.9 when
considering series solutions of the Legendre equation (4.65) (i.e. for m D 0) that
a polynomial solution is only possible when D l .l C 1/ as in this case one of
the series of the general solution, i.e. either y1 .x/ or y2 .x/ in the general solution
y.x/ D C1 y1 .x/ C C2 y2 .x/ terminates. Hence it is guaranteed to remain finite at
the boundary points x D ˙1; the other series solution diverges at these points,
and hence should be rejected. The bound solution coincides with the Legendre
polynomials which are a particular case of the associated Legendre functions when
m D 0.
The above consideration brings us to the point where we can finally write down
the complete solution of the angular part of the Laplace equation in spherical polar
coordinates. These solutions are all possible products of the individual solutions of
the equations for ‚. / and ˆ. /, see Eq. (4.178). These products, ‚lm .x/ˆm . / D
‚lm .cos / ˆm . /, denoted as Ylm .; / or Ylm .; /, are widely known as spherical
harmonics. For m 0 these functions are defined by6
s
2l C 1 .l m/Š m
Ylm .; / D .1/ m
P .cos /eim : (4.188)
4 .l C m/Š l
Here the associated Legendre function is considered only for positive values of m,
i.e. 0 m l. At the same time, two possible functions ˆm . / D e˙im exist
for m > 0. That means that altogether 2l C 1 possible values of the index m
can be considered between l and l for each l D 0; 1; 2; 3; : : :, i.e. including the
6
Various sign factors, such as the .1/m factor we have in Eq. (4.188), can be also found in the
literature.
4.5 Associated Legendre Function 367
negative values as well, and hence 2l C 1 spherical harmonics can also be defined
for each l. The spherical harmonics for negative values m D 1; 2; : : : ; l are
defined such that
Ylm .; / D .1/m Ylm .; / : (4.189)
which is easily checked by making the substitution x D cos and using orthonor-
p
mality of the associated Legendre functions ‚lm .x/ and that of ˆm . / D eim = 2 .
Here the integration is performed over the so-called solid angle d D sin dd'
(which integrates to 4 over the whole sphere).
Problem 4.61. Show that a first few real spherical harmonics for l D 1; 2 are
r r r
3 3 3
S10 D Y10 D nz I S11 D nx I S11 D ny I (4.195)
4 4 4
r r r
15 15 5
S21 D nx nz I S21 D
ny nz I 0 0
S2 D Y2 D .3n2z 1/ I
4 4 16
r r
2 15 2 2
2 15
S2 D nx ny I S2 D nx ny ; (4.196)
16 4
where nx D sin cos , ny D sin sin and nz D cos are the components of
the unit vector n D r=r.
1 X
X l
f .r; ; '/ D flm .r/ Ylm .; /; (4.197)
lD0 mDl
where by virtue of the orthogonality of the spherical functions, Eq. (4.190), the
expansion coefficients are given by
Z 2 Z
m
flm .r/ D d' Yn .; '/ f .r; ; '/ sin d: (4.198)
0 0
Note that the expansion coefficients only depend on the length r D jrj of the
vector r.
We finish this section with another important result, which we shall also leave
without proof. If we define two unit vectors n1 and n2 with the angle D .n1 ; n2 /
between them, then
4 X l
Pl .cos / D Y m .1 ; 1 /Yl .2 ;
m
2/ ; (4.199)
2l C 1 mDl l
where and angles .1 ; 1 / and .2 ; 2 / correspond to the orientation of the first and
the second vectors, respectively, in the spherical coordinates.
4.6 Bessel Equation 369
Problem 4.62. Show that formula (4.199) remains invariant upon replacement
of the complex harmonics Ylm with the real ones:
4 X l
Pl .cos / D Sm .1 ; 1 /Sl .2 ;
m
2 /: (4.200)
2l C 1 mDl l
where s can only take on two values, s D ˙, giving rise to two independent
solutions. Recall from Sect. I.8.4 that when the difference of two s values is an
integer (which is 2 in our case), only a single solution may be obtained by this
method. Therefore, let us initially assume that 2 is not an integer. Show then
that several first terms in the expansion of the two solutions are
(continued)
370 4 Special Functions
the second independent solution is obtained from the first by the substitution
! .
Problem 4.64. The recurrence relation for the cr coefficients (4.202) can be
used to obtain a general expression for them. Prove using the method of
mathematical induction or by a repeated application of the recurrence relation
that for s D one can write
" 1
#
X .1/r x 2r
y1 .x/ D c0 x 1 C ; (4.205)
rD1
rŠ . C 1/ . C 2/ . C r/ 2
The above expansion can be written in a much more compact form if, as it is
usually done, c0 is chosen as c0 D 2 = . C 1/. In this case the first solution
is called Bessel function of order of the first kind and denoted J .x/. Using the
recurrence relation (4.31) for the gamma function, we can write
. C 1/ . C 1/ . C 2/ . C r/ D . C r C 1/ ;
Note that the very first term in the expansion (4.205) is now nicely incorporated in
the summation. The second solution of the DE corresponding to is obtained by
replacing ! in the first one:
1
X .1/r x 2r
J .x/ D : (4.207)
rD0
rŠ .r C 1/ 2
4.6 Bessel Equation 371
It is also the Bessel function of the first kind. Both functions for non-integer
are linearly independent and hence their linear combination, y.x/ D C1 J .x/ C
C2 J .x/, is a general solution of the DE (4.201).
Now we shall consider the case when 2 is an integer. This is possible in either
of the following two cases: (1) is an integer; (2) is a half of an odd integer
(obviously, half of an even integer is an integer and hence this is the first case).
Consider these two cases separately.
If D n is a positive integer, the solution Jn .x/ is perfectly valid as the
first independent solution. The function Jn .x/ contains the gamma function
.r n C 1/ which is equal to infinity for negative integer values, i.e. when
r n C 1 D 0; 1; 2; 3; : : : or simply r n 1 and is an integer. Since the
gamma function is in the numerator, contributions of these values of the index r in
the sum, i.e. of r D 0; 1; : : : ; n 1, are equal to zero and the sum can be started from
r D n:
ˇ ˇ
X1 ˇ ˇ 1
.1/r .x=2/2rn ˇ change summation index ˇ X .1/ .x=2/2kCn
kCn
Jn .x/ D Dˇ ˇD
rŠ .r n C 1/ ˇ r !k Drn ˇ .k C n/Š .k C 1/
rDn kD0
1
X .1/k .x=2/2kCn
D .1/n D .1/n Jn .x/; (4.208)
kŠ .k C n C 1/
rD0
i.e. the Bessel function of a negative integer index is directly proportional to the one
with the corresponding positive integer index, i.e. Jn .x/ is linearly dependent on
Jn .x/ and hence cannot be taken as the second independent solution of the DE. In
this case, according to general theory of Sects. I.8.4 and 2.8, one has to look for the
second solution in the form containing a logarithmic function (cf. Eq. (I.8.75) and
Problem I.8.36):
1
X
Kn .x/ D Jn .x/ ln x C g.x/; where g.x/ D xn cr xr : (4.209)
rD0
This function, Kn .x/, will be the second independent solution of the Bessel DE in
the case of D n being a positive integer (including D 0). It diverges at x D 0.
Substituting Kn .x/ into the DE (4.201) and using the fact that Jn .x/ is already its
solution, the following DE is obtained for the function g.x/:
x2 g00 C xg0 C x2 n2 g D 2xJn0 .x/: (4.210)
Problem 4.65. The modified Bessel function of the first kind I .x/ is related to
the Bessel function J .x/ via I .x/ D i J .ix/. Show that I .x/ satisfies the
following differential equation:
x2 y00 C xy0 x2 C 2 y D 0: (4.211)
Next, let us consider the case when D .2n C 1/ =2 D n C 1=2 is a half integer. In
this case, from Eq. (4.206),
1
X .1/r x 2rCnC1=2
JnC1=2 .x/ D (4.212)
rD0
rŠ r C n C 32 2
and
1
X .1/r x 2rn1=2
Jn1=2 .x/ D 1
: (4.213)
rD0
rŠ r n C 2 2
1 r "1 #
X .1/r 22rC1 rŠ x 2rC1=2 2 X .1/r 2rC1
J1=2 .x/ D p D x
rD0
rŠ .2r C 1/Š 2 x rD0 .2r C 1/Š
r
2
D sin x; (4.214)
x
where a use has been made of the Taylor’s expansion of the sine function (which is
the expression within the square brackets).
4.6 Bessel Equation 373
The expression (4.213) is well defined and hence can be used as the second
linearly independent solution of the Bessel DE, i.e. a general solution in the case
of a half integer reads y.x/ D C1 JnC1=2 .x/ C C2 Jn1=2 .x/.
Interestingly, the half integer Bessel functions, JnC1=2 .x/, are the only ones which
are bound on the real axis. We can investigate this by applying our general approach
for investigating DEs of hypergeometric type developed in Sect. 4.4.
Problem 4.67. Here we shall transform the Bessel DE (4.201) into the stan-
dard form (4.122). Show that in this case k D ˙2i and .x/ D ˙ .ix ˙ /.
Choosing k D 2i and .x/ D .iz C /, show that the DE in the standard
form is characterised by .x/ D x, .x/ D 2 C 1 2ix and D i .2 1/.
Next, show using expression (4.127) for the eigenvalues that, to guarantee
bound solutions, the values of should be positive half integer numbers, i.e.
they must satisfy D n C 1=2, where n is a positive integer.
p
p For instance, n D 0 corresponds to D 1=2, which gives J1=2 .x/ sin x= x D
x .sin x=x/ which is obviously finite at x D 0 pand tends to zero at x D ˙1. On
the other hand, the function J1=2 .x/ cos x= x tends to infinity when x ! 0
and is therefore not finite there. The possible values of we found in the Problem
guarantee that all functions JnC1=2 .x/ with any n D 0; 1; 2; : : : are bound for all
values of x. Moreover, as was said, it can be shown that these are the only Bessel
functions which possess this property.
Starting from the general expression (4.206) for J .x/, consider the derivative
X1
d J .1/r 2r 0
D x
dx x rD0
rŠ . C r C 1/ 22rC
1
X X1
.1/r 2rx2r1 .1/r x 2r1
D D :
rD1
rŠ . C r C 1/ 22rC rD1
.r 1/Š . C r C 1/ 2 2
374 4 Special Functions
Problem 4.68. Prove the other two recurrence relations using a similar
method:
d
x J .x/ D x1 J1 .x/; (4.218)
xdx
d n
x J .x/ D xn Jn .x/: (4.219)
xdx
Therefore, it is readily seen that if the first set of relations, (4.216) and (4.217),
increases the order of the Bessel function, the second set of relations, (4.218)
and (4.219), reduces it. This property can be employed to derive recurrence
relations between Bessel functions of different orders. Indeed, from (4.216) we
have, performing differentiation:
J0 D JC1 C J ; (4.220)
x
while similarly from (4.218) one obtains
J0 D J1 J : (4.221)
x
Combining these, we can obtain the other two recurrence relations:
2
J1 C JC1 D J and 2J0 D J1 JC1 : (4.222)
x
4.6 Bessel Equation 375
Recurrence relations (4.217) allow for an explicit calculation of the Bessel functions
of a positive half integer. Indeed, since we already know J1=2 .x/, we can write
n r n
d J1=2 .x/ 2 d sin x
JnC1=2 .x/ D .1/ x n nC1=2
D .1/n xnC1=2 :
xdx x1=2 xdx x
(4.223)
Problem 4.70. Show using the explicit formulae (4.223) and (4.224) that
r r
2 sin x 2 cos x
J3=2 .x/ D cos x C and J3=2 .x/ D sin x C :
x x x x
Then, using the first of the recurrence relations (4.222) demonstrate that
r
2 3 3
J5=2 .x/ D 1 sin x cos x and
xx2 x
r
2 3 3
J5=2 .x/ D sin x C 2 1 cos x :
x x x
Verify the above expressions for J˙5=2 .x/ by applying directly expres-
sions (4.223) and (4.224).
It is seen now that the functions Jn1=2 .x/ diverge at x D 0. At the same
time, JnC1=2 .x/ behaves well around the point x D 0. This can be readily seen,
e.g. by expanding sin x=x in (4.223) into the Taylor’s series. It also follows directly
from (4.212).
Here we shall derive some formulae for integer-index Bessel functions which are
frequently found useful in applications. We shall start by expanding the following
exponential function into the Laurent series:
376 4 Special Functions
1
X
e 2 .u u / D
x 1
ck .x/uk : (4.225)
kD1
1
X
e 2 .u u / D
x 1
Jk .x/uk ; (4.228)
kD1
so that the exponential function in the left-hand side serves as the generating
function for the Bessel functions of integer indices.
On the other hand, the coefficients of the Laurent expansion are given by the
contour integral of Eq. (2.97), so that we can write
I
1 x
p 1p
Jk .x/ D e2 pk1 dp; (4.229)
2 i C
where the contour C is taken anywhere in the complex plane around the p D 0 point.
This formula is valid for both negative and positive integer values of the index k.
Problem 4.73. Consider now specifically J1 .z/. Shift the integration interval
to =2 3 =2, split the integral into two, one for =2 =2
and another for =2 3 =2, change the variable ! in the
second integral and then combine the two before making the final change of
variable t D sin . Hence, show that
Z 1
i teixt
J1 .x/ D p dt: (4.232)
1 1 t2
378 4 Special Functions
The modified function of the first kind, see Problem 4.65, is then
Z 1
1 text
I1 .x/ D i1 J1 .ix/ D p dt: (4.233)
1 1 t2
Similarly to the method developed in Sect. 4.3.1.2, let us multiply both sides of
the equation above by J .ˇz/ with some real ˇ and integrate between 0 and some
l > 0:
Z l Z l
d d 2
z J .az/ J .ˇz/ dz C ˛2z J .˛z/ J .ˇz/ dz D 0:
0 dz dz 0 z
ˇl Z l Z l !
dJ .˛z/ ˇ dJ .˛z/ dJ .ˇz/ 2
z ˇ
J .ˇz/ˇ z dz C 2
˛ z J .˛z/ J .ˇz/ dz D 0:
dz 0 0 dz dz 0 z
(4.236)
In the same fashion, we start from (4.235) written for J .ˇz/, multiply it by J .˛z/
and integrate between 0 and l; this procedure readily yields the same equation as
above but with ˛ and ˇ swapped:
ˇl Z l Z l !
dJ .ˇz/ ˇ
ˇ dJ .ˇz/ dJ .˛z/ 2 2
z J .˛z/ˇ z dz C ˇ z J .ˇz/ J .˛z/ dz D 0:
dz 0 0 dz dz 0 z
Problem 4.75. Show using the explicit expansion of J .˛z/ from Eq. (4.206)
that
dJ .˛z/ dJ .ˇz/
z J .ˇz/ J .˛z/
dz dz
X1 X 1
2 .k n/ .1/kCn ˛z C2k ˇz C2n
D : (4.238)
kD0 nD0
kŠnŠ . C k C 1/ . C n C 1/ 2 2
The above result shows that the terms with k D n can be dropped; therefore, the
first term in the expansion starts not from n D k D 0 as we thought before, but
from k; n D 0; 1 and k; n D 1; 0, which give the z2C2 type of behavior for the first
(leading) term in Eq. (4.238). Therefore, the free term in (4.237) is well defined
at z D 0 if 2 C 2 > 0 or > 1.
We still need to check the convergence of the two integrals in (4.236). Let us
start from considering the first one: it contains two derivatives of the Bessel function
which behave as z1 near z D 0 each, i.e. the integrand is z21 near its bottom
Rl
limit. We know that the integral 0 z21 dz converges, if 2 1 > 1 which gives
> 1 again. Similarly for the second integral: its first term behaves as z2C1
around z D 0, while the second one as z21 , both converge for > 1.
380 4 Special Functions
This analysis proves that our consideration is valid for any > 1. Because
the free term (4.238) in Eq. (4.237), prior to taking the limits, behaves as z2C2 D
z2.C1/ , it is equal to zero when z D 0 since C 1 > 0. Therefore, the bottom limit
applied to the free term can be dropped and we obtain instead of (4.237):
Z
dJ .˛z/ dJ .ˇz/ l
l J .ˇz/ J .˛z/ C ˛2 ˇ2 zJ .˛z/ J .ˇz/ dz D 0:
dz dz zDl 0
(4.239)
So far we have not specified the values of the real constants ˛ or ˇ. Now we are
going to be more specific. Consider solutions of the equation
J .z/ D 0 (4.240)
with respect to (generally complex) z; these are the roots of the Bessel function.
It can be shown that this equation cannot have complex roots at all, but there is an
infinite number of real roots ˙x1 , ˙x2 , etc. Let us choose the values of ˛ and ˇ such
that ˛l and ˇl be two distinct roots (˛ ¤ ˇ). Then, J .˛l/ D 0 and J .ˇl/ D 0, and
the first term in (4.239) becomes zero, so that we immediately obtain
Z l
zJ .˛z/ J .ˇz/ dz D 0 (4.241)
0
for ˛ ¤ ˇ. This is the required orthogonality condition for the Bessel functions: they
appear to be orthogonal not with respect to their index , but rather with respect to
the “scaling” of their arguments. Note the weight z in the orthogonality condition
above.
It is also possible to choose ˛l and ˇl as roots of the equation J0 .z/ D 0; it is
readily seen that in this case the first term in Eq. (4.239) is also zero and we again
arrive at the orthogonality condition (4.241).
Problem 4.76. Prove that ˛l and ˇl can also be chosen as roots of the equation
C1 J .z/ C C2 zJ0 .z/ D 0, where C1 and C2 are arbitrary real constants. This
condition generalises the two conditions given above as it represents their
linear combination.
Rl
0xf .x/J .˛i x/ dx
fi D Rl 2
: (4.243)
0 xJ .˛i x/ dx
It is only left to discuss how to calculate the normalisation integral standing in the
denominator of the above equation. To do this, we first solve for the integral in
Eq. (4.239):
Z
l
l dJ .˛x/ dJ .ˇx/
xJ .˛x/ J .ˇx/ dx D J .ˇx/ J .˛x/ ;
0 ˛ ˇ2
2 dx dx xDl
and then consider the limit ˇ ! ˛. This is because in the left-hand side under this
limit we would have exactly the normalisation integral we need. Note that x D ˛l
is one of the roots of the equation J .x/ D 0. Therefore, in the right-hand side
the second term inside the square brackets can be dropped and the expression is
simplified:
Z
l
l dJ .˛x/
xJ .˛x/2 dx D lim J .ˇx/ :
0 ˇ!˛ ˛ ˇ2
2 dx xDl
Since under the limit J .ˇl/ ! J .˛l/ D 0, we have to deal with the 0=0
uncertainty. Using the L’Hôpital’s rule, we have
Z l
2 l dJ .˛x/ dJ .ˇx/
xJ .˛x/ dx D lim
0 ˇ!˛ 2ˇ dx dˇ xDl
l dJ .u/ dJ .u/ l2 dJ .u/ 2
D lim ˛ l D :
2˛ ˇ!˛ du uD˛l du uDˇl 2 du uD˛l
(4.244)
This is the required result. It can also be rearranged using Eq. (4.220). Using
there x D ˛l and recalling that J .˛l/ D 0, we obtain that ŒdJ .u/ =duuD˛l D
JC1 .˛l/, which yields
Z l
l2
xJ .˛x/2 dx D JC1 .˛l/2 : (4.245)
0 2
[Hint: express the second derivative of J .u/ from the Bessel DE.]
382 4 Special Functions
„2 d2 .x/ kx2
2
C .x/ D E .x/; (4.247)
2m dx 2
where E is the energy of a stationary state, .x/ is the corresponding wave function
and „ D h=2 is the Planck constant; the probability dP.x/ to find the particle
between x and x C dx in this particular state is given by j .x/j2 dx. Because the
particle must be somewhere, the probability to find it anywhere on the x axis must
be equal to one:
Z C1 Z C1
P.x/dx D j .x/j2 dx D 1; (4.248)
1 1
which gives a normalisation condition for the wave function. What we shall try to
do now is to find the energies and the wavep functions of the harmonic oscillator.
p Let us introduce a new variable u D x m!=„ instead of x. Then, x0 D u0 u0x D
m!=„ u0 , xx 00
D .m!=„/ uu 00
and the DE is transformed into:
00 2E
C u2 D 0: (4.249)
„!
This DE is of the hypergeometric type (4.117) with .u/ D 1, ˇ.u/ D 0 and .u/ D
2E=„!u2 , and can be transformed into the standard form (4.122) using the method
we developed in Sect. 4.4.1.
Problem 4.78. Show using this method that the transformation from .u/ to a
new function g.u/ via .u/ D .u/g.u/ can be accomplished by the transfor-
2
mation function .u/ D eu =2 , which corresponds to the polynomial (4.123)
being .u/ D u, while the corresponding eigenvalues ! n D 2n with
n D 0; 1; 2; : : :. Hence demonstrate that the transformed DE for the new
function g.u/,
The last integral in the right-hand side corresponds to the normalisation of the
Hermite polynomials, see Eq. (4.148), and therefore we finally obtain
m! 1=4 2n=2
Cn D p : (4.252)
„ nŠ
The ground state of the quantum oscillator, 0 .x/, has a non-zero energy E D „!=2,
called the zero point energy; excited states of the oscillator, n .x/ with n 1, are
obtained by adding n quanta „! to the energy of the ground state. A single quanta of
oscillation is called in solid state physics a “phonon”. It is therefore said that there
are no phonons in the ground state, but there are n phonons, each carrying the same
energy of „!, in the n-th excited state.
„2
.r/ C V.r/ .r/ D E .r/; (4.253)
2m
384 4 Special Functions
where .r/ is the electron wave function and m is its mass, „ D h=2 is the Planck
constant. Writing the Laplacian in the spherical coordinates (see Sect. 4.5.3), after
simple rearrangement, we obtain
1 @ 2@ 1 @ @ 1 @2 2m Ze2
r C 2 sin C 2 2 D 2 EC :
r2 @r @r r sin @ @ r sin @ 2 „ r
Let us solve the above equation for the radial part R.r/ of the wave function for
the case of bound states, i.e. the states in which energy is negative: E < 0. These
states correspond to a discrete spectrum of the atom, i.e. there will be an energy gap
between any two states. There are also continuum states when E > 0 (no gap), we
shall not consider them here.
It is convenient to start by introducing a new variable D r=a, where a D
p
„= 8Em. Then, R0 ! R0 =a and R00 ! R00 =a2 , and Eq. (4.254) is rewritten as:
00 2 0 1 l.l C 1/
R C R C C 2
R D 0; (4.255)
4
p
where D Ze2 m=2„2 E.
The obtained DE is of the generalised hypergeometric type (4.117) with . / D
, ˇ D 2 and . / D 2 =4 C l .l C 1/. In the general method of Sect. 4.4.1,
we express R . / via another function u . / using R . / D . / u . /, where the
auxiliary function . / is chosen such that the new function u . / satisfies a DE
in the standard form (4.122). When applying the method developed in this section,
several choices must be made concerning the sign in defining the constant k and the
-polynomial (4.123) which in our case is as follows:
r s
1 1 1 1 2 2
. /D ˙ Ck . /D ˙ lC C .k / C :
2 4 2 2 4
Recall that the constant k is chosen such that the expression under the square root
becomes a full square.
4.7 Selected Applications in Physics 385
Problem 4.80. Show using the method developed in Sect. 4.4.1, that the
constant k has to be chosen as k D ˙.l C 1=2/. Then, if one chooses the minus
sign in the expression for the k and hence the polynomial . / D =2 C l,
then the DE for the function u . /,
So we have two possible choices for the constant k and hence two possible
transformations R . / ! u . /, either R D l e u or R D l1 e u. As we
expect that the solution of the DE for u. / is to be bound on the whole interval
0 < 1 (by choosing the appropriate eigenvalues ), then only the first choice
can be accepted; the second one is unacceptable as in this case R . / diverges at
D 0 for any l D 0; 1; 2; : : :. Hence, we conclude that R. / is to be chosen from
Eq. (4.257) with u. / satisfying the DE (4.256). This DE has a bound solution if
and only if the coefficient to the u term in DE (4.256), i.e. D l 1, is
given by Eq. (4.127) with integer n D 0; 1; 2; : : :. Since in our case 00 D 0 and
0 D .2l C 2 /0 D 1, we then have
mZ 2 e4
l 1 D n 0 D n H) D nClC1 H) ED ; (4.258)
2„2 n2r
where nr D n C l C 1 is called the main quantum number. For the given value of
l, the main quantum number takes on values nr D l C 1; l C 2; : : :. In particular,
for l D 0 (the so-called s states) we have nr D 1; 2; : : : which correspond to 1s, 2s,
etc. states of the H atom electron; for l D 1 (the p states) we have nr D 2; 3; : : :
corresponding to 2p, 3p, etc. states, and so on. We see that the allowed energy levels
386 4 Special Functions
are indeed discrete and they converge at large quantum numbers to zero from below.
As nr is getting bigger, the gap between energy levels gets smaller tending to zero
as nr ! 1.
Replacing D l 1 in Eq. (4.256) with D n, we obtain the DE for
generalised Laguerre polynomials (4.108) Ln2lC1 . / D Ln2lC1 r l1
. /. In other words,
the radial part of the wave function of an electron in a hydrogen-like atom is given by
l =2 2lC1
Rnr l . / e Lnr l1 . /:
Statistical mechanics deals with very large numbers N of particles which may be
indistinguishable. In these cases it is often needed to calculate the factorial NŠ of
very large numbers. There exists an elegant approximation to NŠ for very large
integer numbers N, the so-called Stirling’s approximation
p
NŠ N N eN 2 N; (4.259)
which we shall derive here first. Then we shall show some applications of this result.
Since .N C 1/ D NŠ, we have to investigate the Gamma function for large
integer N. We shall consider a more general problem of a real number z, where
jzj 1. Let us start by making the following transformation in .z C 1/:
Z 1 Z 1 Z 1
.z C 1/ D tz et dt D .zx/z ezx zdx D zzC1 ez exp Œz.x ln x 1/ dx;
0 0 0
(4.260)
where the substitution t D zx has been made.
Now, the function f .x/ D x ln x 1 appearing in the exponential, see Fig. 4.5,
has a minimum at x D 1 with the value of f .1/ D 0. Thus the function ezf .x/ has
a maximum at x D 1 and this maximum becomes very sharply peaked when jzj
is large, see the same figure. This observation allows us to evaluate the integral in
Eq. (4.260) approximately7 . Indeed, two things can be done. Firstly, the bottom limit
of the integral can be extended to 1 as the exponential function in the integral is
practically zero for x < 0. Secondly, we can expand f .x/ about x D 1 and retain
only the leading order term. That is, if we write x D 1 C y, we obtain
y2
f .x/ D f .1 C y/ D 1 C y ln.1 C y/ 1 D y ln.1 C y/ D C
2
7
In fact, what we are about to derive corresponds to the leading terms of the so-called asymptotic
expansion of the gamma function for jzj 1.
4.7 Selected Applications in Physics 387
Fig. 4.5 Functions f .x/ D x ln x 1 (black) and exp .zf .x// (other colors) for z D1, 5, 10
and 100
ln .NŠ/ N ln N N : (4.261)
In most applications the terms we dropped here are negligible as compared to the
two terms retained.
As an example of application of the Stirling’s formula (4.261), we shall consider
a paramagnetic–ferromagnetic phase transition in a simple three-dimensional lattice
within a rather simple Bragg–Williams theory. Let us consider a lattice of atoms
assuming that each atom may either have its magnetic moment being directed “up”
or “down”. We shall also assume that only magnetic moments of the nearest atoms
interact via the so-called exchange interaction with z being the number of the nearest
neighbors for each lattice site. Let n# and n" be average densities of atoms with
the moments directed
up and
down, respectively. We shall also introduce the order
parameter m D n" n# =n, which is the relative magnetisation of the solid and
n D n" C n# is the total number of atoms in the unit volume. The densities of the
atoms having moments up and down are given, respectively, by n" D n .1 C m/ =2
and n# D n n" D n .1 m/ =2 via the order parameter, as is easily checked.
388 4 Special Functions
Our goal is to calculate the free energy density F D U TS of this lattice, where
U is the internal energy density, T temperature and S the entropy density. We start by
calculating the internal energy U. Let N"" , N## and N"# be the numbers of different
pairs (per unit volume) of the nearest moments which are aligned, respectively,
parallel up, parallel down or antiparallel to each other. It is more energetically
favorable for the moments to be aligned in parallel, the energy in this case is J0 ;
otherwise, the energy is increased by 2J0 , i.e. it is equal to CJ0 . Then on average
one can write the following expression for the internal energy:
U D J0 N"" C N## C J0 N"# B H n" n# ; (4.262)
where the last term corresponds to an additional contribution due to the applied
magnetic field H along the direction of the moments, B being Bohr magneton. To
calculate the number density of pairs, we note that, for the given lattice site, the
probability of finding a nearest site with the moment up is p" D n" =n. For each site
with the moment up (whose density is n" ) there will be zp" nearest neighbors with
the same direction of the moment, i.e.
1 zn2" 1
N"" D n" zp" D D zn .1 C m/2 ;
2 2n 8
where the factor of one half is required to avoid double counting of pairs. Similarly,
one can calculate densities of pairs of atoms with other arrangements of their
moments:
1 1 1
N## D n# zp# D zn .1 m/2 and N"# D n" zp# D zn 1 m2 :
2 8 4
Note that the factor of one half is missing in the last case. This is because in this
case there is no double counting: we counted atoms with the moment up surrounded
by those with the moment down. The obtained expressions for the density of pairs
yield for the internal energy density (4.262):
1
U D J0 znm2 mB nH: (4.263)
2
This formula expresses the internal energy entirely via the order parameter m.
The entropy density can be worked out from the well-known expression S D
kB ln W, where kB is the Boltzmann constant and W is the number of possibilities in
which one can allocate n" moments up and n# moments down on a lattice with n
n n
sites. Obviously, W D D , and hence
n# n"
n nŠ
S D kB ln D kB ln D kB ln .nŠ/ ln n" Š ln n n" Š :
n" n" Š n n" Š
4.7 Selected Applications in Physics 389
To calculate the factorials above, we use the Stirling’s formula (4.261) which gives
S ' kB n ln n n" ln n" n n" ln n n"
1 1
D kB n ln 2 .1 C m/ ln .1 C m/ .1 m/ ln .1 m/ : (4.264)
2 2
As we now have both components of the free energy, U and S, we can combine
them to obtain the free energy density as
1
F D U TS D J0 znm2 mB nH
2
1 1
nkB T ln 2 .1 C m/ ln .1 C m/ .1 m/ ln .1 m/ : (4.265)
2 2
To calculate the magnetisation M D nB m (per unit volume), we need to find the
minimum of the free energy for the given value of H and T which should correspond
to the stable phase for these parameters:
@F 1 1Cm
D0 H) nkB Tc m B nH C nkB T ln D 0; (4.266)
@m 2 1m
where Tc is defined by the identity kB Tc D J0 z. The above equation can be
rearranged in the following way. If we denote x D .B H C kB Tc m/ =kB T, then
The situation is a bit more complex for a non-zero magnetic field: the minima are
no longer equivalent, and the system prefers to align the magnetic moment along the
field. With decreasing T the magnetisation tends to the saturation limit at m D ˙1.
Using an essentially identical argument, one can also consider statistics of the
order–disorder phase transition in binary alloys.
„2 d2 k .x/
C V.x/ k .x/ D Ek k .x/; (4.268)
2m dx2
4.7 Selected Applications in Physics 391
Fig. 4.7 One dimensional lattice of positive ion cores produces a potential V.x/ for the electrons;
the potential goes to 1 at the positions of the nuclei. (a) A realistic potential; (b) a model
potential in which each nucleus creates a delta function like potential
where k is the electron momentum used to distinguish different states, and k .x/ and
Ek are the corresponding electron wave function and energy. Because the lattice is
periodic, each electron has a well-defined momentum k serving as a good quantum
number. Also, the wave function satisfies the so-called Bloch theorem whereby
k .x/ D e uk .x/, where uk .x/ is a periodic function with the same period a as
ikx
where A < 0 is some parameter. Here we sum over all atoms whose positions are
given by la. So in this model negative singularities at atomic cores are correctly
described, however, the potential everywhere between cores is zero instead of going
smoothly between two minus infinity values at the nearest cores, compare Fig. 4.7(a)
and (b).
Since the functions V.x/ and uk .x/ are periodic, we can expand them in an
(exponential) Fourier series:
X X
V.x/ D Vg eigx and uk .x/ D ug .k/eigx ; (4.270)
g g
Note that only the term l D 0 is contributed to the integral. Thus, the delta function
lattice potential has all its Fourier components equal to each other. Now use the
Bloch theorem and then substitute the expansions (4.270) of the potential and of the
wave function into the Schrödinger equation (4.268):
2 3
X „2 X 0 X
4 .q C g/2 C A eig x 5 ug .k/eigx D Ek ug .k/eigx : (4.271)
g
2m 0 g
g
This equation
P can be solved exactly to obtain the energies Ek . The trick is to
introduce D g1 ug1 .k/. Indeed, from (4.273)
A
ug .k/ D „2 :
2m
.k C g/2 Ek
Summing both sides with respect to all g, we shall recognise in the left-hand side.
Hence, canceling on , we obtain the following exact equation for the energies:
„2 X 1
D 2mEk
(4.274)
2mA 2
g .k C g/ „2
The sum in the right-hand side is calculated analytically. Indeed, using the identity
1 1 1 1
2 2
D ;
a b 2b ab aCb
4.7 Selected Applications in Physics 393
p
where b D ka=2 and c D .a=2„/ 2mEk . Next, we use Eq. (4.49) for the cotangent
function as well as the trigonometric identity
2 sin 2y
cot.x C y/ cot.x y/ D ; (4.275)
cos 2x cos 2y
sin Ka
cos ka D cos Ka C P ; (4.276)
Ka
p
where P D mAa2 =„2 and K D 2c=a D „1 2mEk , i.e. Ek D „2 K 2 =2m. The
obtained Eq. (4.276) fully solves the problem as it gives K.k/ which in turn yields
Ek for each k. It has real solutions only if the right-hand side of the equation is
between 1 and 1. This condition restricts possible values of the wave vector K and
hence of the energies Ek , and therefore results in bands. The function
sin.Ka/
f .Ka/ D cos.Ka/ C P
Ka
for P D 20 is plotted in Fig. 4.8. The bands, i.e. the regions of allowed values of
Ka, are colored green in the figure. Solving Eq. (4.276) with respect to K D K.q/
for each given value of k allows calculating dispersion of energy bands Ek .
394 4 Special Functions
Consider oscillations in a circular membrane of radius a which is fixed at its rim. The
mathematical description of this problem is based on the theory of Bessel functions.
The partial DE which needs to be solved in this case, assuming the membrane is
positioned within the x y plane, has the form (see Chap. 8 for a more detailed
discussion on partial DEs of mathematical physics):
1 @2 u 1 @2 u 1 @ @u 1 @2 u
D u H) D r C ; (4.277)
c2 @t2 c2 @t2 r @r @r r2 @ 2
where u.r; ; t/ is the vertical displacement of the membrane (along the z axis)
written in polar coordinates .r; /, and c is the sound velocity. Correspondingly,
we wrote the Laplacian in the right-hand side in the polar coordinates and discarded
the z-dependent term, see Eq. (7.88).
To solve this equation, we apply the method of separation of variables already
mentioned in Sect. 4.6.5 and to be discussed in more detail in Chap. 8.
T 00 C 2 c2 T D 0; (4.278)
ˆ00 C n2 ˆ D 0; (4.279)
00 1 0 2 n2
R C R C 2 R D 0; (4.280)
r r
where 2 > 0 and n2 are the corresponding separation constants. Then argue
why the number n must be an integer (cf. discussion in Sect. 4.5.3).
The general solution of equation (4.278) for T.t/ is given by (A and B are
arbitrary constants)
while the solutions of the ˆ. / Eq. (4.279) are periodic functions e˙in . Equa-
tion (4.280) for the radial function R.r/ coincides with the Bessel DE (4.234), and
hence its general solutions are a linear combination of the Bessel functions of the
first and second kind:
satisfy the partial DE (4.277) and the boundary conditions. Here Ani and Bni are
arbitrary constants. By taking a linear combination of these elementary solutions,
we can construct the general solution of the problem since the DE is linear:
1 nh
1 X
X i
.n/ .n/
u.r; ; t/ D Ani cos i ct C Bni sin i ct ein
nD0 iD1
h i o
.n/ .n/ .n/
C Ani cos i ct C Bni sin i ct ein Jn i r ; (4.281)
where the constants Ani and Bni are assumed to be generally complex. Note that the
arbitrary constants by the sine and cosine functions of the ein part are complex
conjugates of those we used with ein ; this particular construction ensures that the
function u.r; ; t/ is real. Also note that the summation over i corresponds to the
particular value of n in the first sum.
Formally, the above formula can be rewritten in a simpler form by extending
the sum over n to run from 1 to 1 and defining Ani D Ani , Bni D Bni and
.n/ .n/
i D i :
1 h
1 X
X i
.n/ .n/ .n/
u.r; ; t/ D Ani cos ci t C Bni sin ci t ein Jjnj i r :
nD1 iD1
(4.282)
To determine the constants, we have to apply the initial conditions. In fact, the
problem can be solved for very general initial conditions:
ˇ
@u .r; ; t/ ˇˇ
u .r; ; t/jtD0 D f .r; / and ˇ D '.r; /: (4.283)
@t tD0
396 4 Special Functions
Applying t D 0 to the function (4.282) and its time derivative, these conditions are
transformed into the following equations:
1 X
X 1
.n/
ujtD0 D Ani ein Jjnj i r D f .r; / ; (4.284)
nD1 iD1
ˇ X1 X 1
@u ˇˇ .n/ .n/
D c Bni ein Jjnj i r D ' .r; / : (4.285)
@t ˇtD0 nD1 iD1 i
The complex coefficients Ani and Bni contain two indices and correspondingly are
under double sum. They can be found then in two steps.
multiply both sides of the above Eqs. (4.284) and (4.285) by eim and then
integrate over the angle to show that
Z 2 1
X
1 in .n/
f .r; / e d D Ani Jjnj i r ; (4.286)
2 0 iD1
Z 2 1
X
1 .n/ .n/
' .r; / ein d D ci Bni Jjnj i r : (4.287)
2 0 iD1
.n/
At the second step, we recall that the Bessel functions Jjnj i r also form an
orthogonal set, see Eq. (4.241), so that coefficients in the expansion of any function
in Eq. (4.242) can be found from Eq. (4.243). Therefore, we can finally write
Z Z 2
1 a
.n/
Ani D rdr d ein f .r; / Jjnj i r ; (4.288)
2 Jni 0 0
Z Z 2
1 a
.n/
Bni D .n/
rdr d ein f .r; / Jjnj i r ; (4.289)
2 ci Jni 0 0
where
Z a 2
.n/
Jni D rJjnj i r dr (4.290)
0
In a similar way one can solve heat transport problem in cylindrical coordinates,
as well as vibration problem in spherical coordinates. In both cases Bessel functions
appear.
Z
.r0 / 0
U .r/ D dr ; (4.291)
V jr r0 j
taken over the whole volume V where the charge distribution is non-zero. Note that
if we have a collection of point charges qi located at points ri , then the charge density
due to a single charge qi is given by qi ı .r ri /, where
ı .r ri / D ı .x xi / ı .y yi / ı .z zi /
The filtering theorem is also valid for this delta function as the volume integral can
always be split into a sequence of three one-dimensional integrals, and the filtering
theorem can be applied to each of them one after another:
398 4 Special Functions
Z Z Z Z
f r0 ı r r0 dr0 D ı x x0 dx0 ı y y0 dy0 f x0 ; y0 ; z0 ı z z0 dz0
Z Z
D ı x x0 dx0 ı y y0 f x0 ; y0 ; z dy0
Z
D ı x x0 f x0 ; y; z dx0 D f .x; y; z/ D f .r/ ;
as expected.
So, formula (4.291) is general: it can be used both for continuous and discrete
charge distributions. We shall now derive the so-called multipole expansion of
the potential U.r/. Let the angle between the vectors r and r0 be . Then, using
the cosine theorem, we can write
ˇ ˇ p
ˇr r0 ˇ D r2 C r02 2rr0 cos :
Next we shall expand the Legendre polynomial into spherical functions using
Eq. (4.199) which yields the following expression for the potential:
1 X
X Z r Z
l
4 1 0 lC2 0
U.r/ D Yl .; /
m
r dr d r0 Ylm . 0 ; 0 /
lD0 mDl
2l C 1 r lC1
0
Z 1 Z
0 lC1 0
C rl r dr d r0 Ylm . 0 ; 0 / ; (4.292)
r
which is the required general formula. Here r D .r; ; / and r0 D .r0 ; 0 ; 0 / are
the two points (vectors) in the spherical coordinates. Note that complex spherical
harmonics were used here; however, since the expansion formula (4.199) for the
Legendre polynomials is invariant under the substitution of the complex harmonics
Ylm with the real ones, Ylm ! Slm , one can also use the real harmonics in the formula
above.
We shall now consider a particular case of a potential outside a confined charge
distribution. In this case we assume that there exists such a D maxV .r0 / that
.r0 / D 0 for r0 > a. Then the potential outside the charge distribution (i.e. for
r > a) would only contain the first integral of Eq. (4.292) in which the r0 integration
is done between 0 and a:
X 4 Z a Z
1
0 lC2 0
U.r/ D Yl .; / lC1
m
r dr d r0 Ylm . 0 ; 0 /
lm
2l C 1 r 0
r
X 4 Qlm
D Ylm .; / l ; (4.293)
lm
2l C 1 r
A B C
U.r/ D C 2 C 3 C D U1 .r/ C U2 .r/ C U3 .r/ C :
r r r
We expect that A must be the total charge of the distribution, B is related to its
dipole moment, C to its quadrupole moment and so on. For instance, consider the
l D m D 0 term:
r Z Z
4 0 1 0
0 0
Q00 D r p dr D r dr ;
1 V 4 V
which is the total charge Q of the distribution, and the corresponding term in the
potential, U0 .r/ D
p Q=r, does indeed have the correct form. We have used here the
fact that Y00 D 1= 4 (see Sect. 4.5.3.4).
Problem 4.88. Similarly show that the l D 1 term in the expansion (4.293)
corresponds to the dipole term:
Pn
U1 .r/ D ;
r2
where n D r=r is the unit vector in the direction r, and
Z
0 0 0
PD r r dr
V
is the dipole moment of the charge distribution. [Hint: use real spherical
harmonics and explicit expressions for them in Eq. (4.195).]
Problem 4.89. Also show that the l D 2 term is associated with the
quadrupole contribution
3
1 X
U2 .r/ D 3 D˛ˇ n˛ nˇ ;
2r
˛;ˇD1
is the quadrupole moment matrix. [Hint: use real spherical harmonics (4.196)
and the fact that Dxx C Dyy C Dzz D 0.]
Note that the quadrupole matrix is symmetric and contains only five independent
elements (since the diagonal elements sum up to zero); this is not surprising as there
are only five spherical harmonics of l D 2.
Chapter 5
Fourier Transform
We know from Chap. 3 that any piecewise continuous periodic function f .x/ can
be expanded into a Fourier series.1 One may ask if a similar expansion can be
constructed for a function which is not periodic. The purpose of this chapter is to
address this very question. We shall show that for non-periodic functions f .x/ an
analogous representation of the function exists, but in this case the Fourier series
is replaced by an integral, called Fourier integral. This development enables us to
go even further and introduce a concept of an integral transform, The definition
and various properties of the Fourier integral and Fourier transform are given, with
various applications in physics appearing at the end of the chapter.
1
In the following, references to the first volume of this course (L. Kantorovich, Mathematics for
natural scientists: fundamentals and basics, Springer, 2015) will be made by appending the Roman
number I in front of the reference, e.g. Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of
the first volume, respectively.
Fig. 5.2 The non-periodic function f .x/ shown in Fig. 5.1 is compared with its periodic (with the
period T) approximation fT .x/ (the solid line). The part of f .x/ which was cut out in fT .x/ is shown
with the dashed line
T T
fT .x/ D f .x/ for x< ; (5.1)
2 2
The function fT .x/ is periodic and hence can be expanded into the Fourier series.
Using the exponential form of the series, Sect. 3.5, we can write
1
X
n x
fT .x/ D FT .n /ei2 ; (5.3)
nD1
Z 1
x
f .x/ D F.v/ei2 d; (5.5)
1
where F./ D limT!1 FT ./. At the same time, in the T ! 1 limit the integral
in the right-hand side of the Fourier coefficients (5.4) extends to the whole x axis,
while in the integrand fT .x/ ! f .x/, i.e. we can also write
Z 1
F./ D f .x/ei2 x
dx: (5.6)
1
Let us split the integral over in Eq. (5.7) into two: one between 1 and 0 and
another between 0 and C1. In the first integral we shall then make a change of
variables ! . This gives
Z 0 Z 1 Z 1 Z 1
i2 .tx/
f .x/ D .d/ f .t/e dt C d f .t/ei2 .tx/
dt
1 1 0 1
Z 1 Z 1
D d f .t/ ei2 .tx/
C ei2 .tx/
dt: (5.8)
0 1
404 5 Fourier Transform
The expression in the square brackets is recognised to be the two times cosine, and
we finally obtain the so-called trigonometric form of the Fourier integral:
Z 1 Z 1
f .x/ D 2 d f .t/ cos Œ2 .t x/ dt: (5.9)
0 1
Problem 5.1. Show that if f .x/ is an even function, then Eq. (5.7) is simplified
further:
Z 1 Z 1
f .x/ D 4 f .t/ cos.2 t/dt cos.2 x/d; (5.10)
0 0
Problem 5.4. Show this using Eq. (2.122). Also show that at x D 0 the integral
is equal to one, as required.
5.1 The Fourier Integral 405
Fig. 5.3 Fourier integral for ….x/ calculated using the spectroscopic range 0 T for (a) T D
1 and T D 4, and (b) T D 10 and T D 100. The original function ….x/ is shown in (a) by the
dashed line
Let us try to understand how this integral representation actually works. To this
end let us consider an approximation
Z T
sin.2 /
…T .x/ D 4 cos.2 x/d
0 2
for the integral (5.13), where integration is performed not up to infinity, but up to
some finite real number T. The results of a numerical integration for T D 1; 4; 10
and 100 shown in Fig. 5.3 clearly demonstrate convergence of the function …T .x/ to
the exact function ….t/ as the value of T (the upper limit in the integral) is increased.
The discussion which led us to Eq. (5.7) or Eq. (5.9), which are absolutely equiv-
alent, was mostly intuitive. A more rigorous derivation leading to either of the
forms of the Fourier integral requires a more careful analysis. Let us aim to prove
specifically Eq. (5.9). This basically entails proving the following formula:
Z 1 Z 1
1
2 d f .t/ cos Œ2 .t x/ dt D Œf .x 0/ C f .x C 0/ : (5.14)
0 1 2
1
lim g .x/ D Œf .x 0/ C f .x C 0/ : (5.16)
!1 2
Therefore, one can swap the integrals in Eq. (5.15) and integrate over to obtain:
Z 1
1 sin Œ .t x/
g .x/ D f .t/ dt:
1 tx
We split this integral into two at the point x; in the integral between 1 and x we
shall then change the variable x t ! t, and in the integral between x and C1 we
shall apply the substitution t x ! t. This yields
Z 1 Z 1
1 sin .t/ 1 sin .t/
g .x/ D f .x t/ dt C f .x C t/ dt (5.17)
0 t 0 t
f .x t/ f .x 0/ f .x C t/ f .x C 0/
‰1 .t/ D and ‰2 .t/ D
t t
are two auxiliary functions. These two functions have the same discontinuities of
the first kind as the function f .x/ itself; on top of this, they may have a singularity
at t D 0. However, similarly to our analysis in Sect. 3.7.2, at t ! C0 (note that
the integration over t in (5.16) is carried out over positive values only), both these
functions have well-defined limits at t D C0 if we assume2 that the function f .x/
can also be differentiated on the left and on the right of any point x. Then,
f .x t/ f .x 0/
lim ‰1 .t/ D lim D f 0 .x 0/;
t!C0 t!C0 t
f .x C t/ f .x C 0/
lim ‰2 .t/ D lim D Cf 0 .x C 0/;
t!C0 t!C0 t
which proves that at the point t D 0 both functions ‰1 .t/ and ‰2 .t/ are well defined.
Hence these two functions may only have discontinuities of the first kind due to the
function f .t/.
If in the ! 1 limit both integrals in (5.18) tend to zero, then g .x/ would
indeed tend to the required mean value of f .x/. So, it is only left to investigate the
convergence of these integrals in the ! 1 limit. Our analysis will mostly be
intuitive from this point on. Consider the integral
Z 1 Z ı Z 1
I./ D ‰.t/ sin .t/ dt D ‰.t/ sin .t/ dt C ‰.t/ sin .t/ dt;
0 0 ı
(5.19)
which has been split into two by some small positive ı chosen in such a way that
‰.t/ is continuous within the interval 0 t ı. Then, the first integral (between 0
and ı) can be manipulated for sufficiently small ı as follows:
Z ı Z ı
1 cos .ı/
‰.t/ sin .t/ dt ' ‰.0/ sin .t/ dt D ‰.0/ ;
0 0
2
This assumption is in fact not necessary and Eq. (5.16) can be proven without it. However, this
will not be done here as it would lead us to a much lengthier calculation.
408 5 Fourier Transform
Therefore,
Z ˇ ˇ Z
ı ˇ z D t ˇ 1 ı z
ˇ
‰.t/ sin .t/ dt D ˇ ˇ D ‰ sin zdz
0 dz D dt ˇ 0
X1 Z X1
1 ‰ .n/ .0/ ı n 1 ‰ .n/ .0/
D z sin zdz D Kn .ı/ :
nD0
nC1 nŠ 0 nD0
nC1 nŠ
Since the largest power of in Kn .ı/ is n , each term in the above expansion tends
to zero as 1= when ! 1.
So, the first integral in (5.19) is zero in the ! 1 limit. Consider now the
second integral there assuming first that ‰.t/ does not have discontinuities at t > ı:
Z 1 Z
1 1 z
‰.t/ sin .t/ dt D ‰ sin zdz:
ı ı
In the ! 1 limit ‰ .z=/ ! ‰.0/, and the bottom limit ı of the integral tends
to infinity reaching the upper limit (note that ı is small but finite). Correspondingly,
the integral tends to zero. If ‰.t/ has discontinuities somewhere at t > ı, we
can always split the integral into a sum of integrals each taken between these
discontinuities. Then each of the integrals individually will tend to zero as ! 1.
Indeed, consider an interval a < t < b assuming that ‰.t/ is continuous within that
interval. Then,
Z Z
b
1 b
z
‰.t/ sin .t/ dt D ‰ sin zdz:
a a
Since both limits tend to infinity as ! 1 at the same time, the integral tends to
zero, as required. This finalises the proof of formula (5.16), and hence of Eq. (5.14).
5.2 Fourier Transform 409
If the application of Eq. (5.7) is split into two consecutive steps, then a very useful
and widely utilised device is created called the Fourier transform. The Fourier
transform of a function f .x/, defined on the whole real axis x and satisfying the
Dirichlet conditions, is defined as the integral (5.6):
Z 1
F./ D f .x/ei2 x
dx: (5.20)
1
R1
We know from the previous section that the integral 1 jf .x/j dx should exist.
From (5.20) it is seen that the function f .x/ is transformed by a process of integration
into a spectral function F.v/. If we introduce a functional operator F acting on f .x/
which converts f .x/ 7! F.v/, then we can recast (5.20) in the form:
Z 1
F./ D FŒf .x/ D f .x/ei2 x
dx: (5.21)
1
Note, in particular, that if x is time and f .t/ is a signal of some kind, then
is a frequency, i.e. in this case the transformation is performed into the frequency
domain and the Fourier transform F./ D F Œf .t/ essentially gives a spectral
analysis of the signal f .t/.
It is also possible to convert F./ back into f .x/, i.e. to perform the inverse
transformation F./ 7! f .x/, by using the formal relation f .x/ D F 1 ŒF./, where
F 1 denotes the inverse functional operator. Fortunately, an explicit formula for
this inversion procedure is readily provided by the Fourier integral (5.7). Indeed, the
expression in the square brackets there, according to Eq. (5.20), is exactly F./, so
that we find
Z 1
f .x/ D F 1 ŒF./ D F./ei2 x d: (5.22)
1
This result is called the inverse Fourier transform of F./. For a time dependent
signal f .t/ the inverse transform shows how the signal can be synthesised from its
frequency spectrum given by F./.
Problem 5.6. Show that for an even function fe .x/ the Fourier transform pair
(i.e. both direct and inverse transformations) can be expressed in the following
alternative form:
(continued)
410 5 Fourier Transform
It should be noted that the Fourier transform F./ of a real even function fe .x/
is also a real function.
Problem 5.7. Similarly, show that for an odd function fo .t/ we find
Z 1
F./ D F Œfo .x/ D 2i fo .x/ sin.2 x/dx: (5.25)
0
Z 1
fo .x/ D F 1 ŒF./ D C2i F./ sin.2 x/d: (5.26)
0
Hence, the Fourier transform F./ of a real odd function is a purely imaginary
function.
Problem 5.8. Show that the Fourier transform F./ of a real function, f .x/ D
f .x/, satisfies the following identity:
F./ D F./:
As an example, let us determine the Fourier transforms Fn .v/ of a set of the unit
impulse functions
n; for jxj 1=2n
ın .x/ D ; (5.27)
0; for jxj > 1=2n
The functions Fn ./ for selected values of n from 1 to 100 are shown in Fig. 5.4. It is
clearly seen from the definition (5.27) of the ın .x/ that as n increases the width of
5.2 Fourier Transform 411
Fig. 5.4 The Fourier transforms Fn ./ of the functions ın .x/ of the delta sequence for selected
values of n
the peak x D 1=n becomes smaller; at the same time, the width of the central
peak of Fn ./ becomes larger. Indeed, may be determined by twice the smallest
root (excluding zero) of the function sin . =n/, i.e. D 2n, i.e. for any n we find
that x ' 2 (a some sort of the “uncertainty principle” of quantum mechanics).
In the n ! 1 limit, we obtain
This is nicely confirmed by the graphs of the Fourier transform of ın .x/ for large
values of n shown in Fig. 5.4, where one can appreciate that the function Fn ./
tends more and more to the horizontal line Fn ./ D 1 with the increase of n. Thus,
the Fourier transform of the Dirac delta function, to be obtained in the limit, is
Z 1
F Œı.x/ D ı.x/ei2 x
dx D 1: (5.30)
1
Note that the integral above also perfectly conforms to the delta function filtering
theorem, see Sect. 4.1.
Now, the application of the inverse transform (5.22) to the result (5.30) gives the
important formal integral representation of the delta function:
Z 1 Z 1 Z 1
x x 1
ı.x/ D F Œı.x/ ei2 d D ei2 d D eikx dk: (5.31)
1 1 2 1
412 5 Fourier Transform
Comparing the expression in the square brackets with Eq. (5.31) we immediately
recognise the delta function ı .t x/. Therefore, the expression above becomes
Z 1
f .x/ D ı .t x/ f .t/dt:
1
The integral in the right-hand side above gives f .x/ due to the filtering theorem for
the delta function, i.e. the same function as in the left-hand side, as expected!
sin .2 /
F Œ.1 x/ .1 x/ D ;
where ˛ > 0.
Problem 5.12. Show that the Fourier transform of the function f .t/, which is
equal to 1 for 1 t < 0, C1 for 0 t 1 and zero otherwise, is given by
2i sin2
F./ D :
(continued)
5.2 Fourier Transform 413
Problem 5.13. Show that the Fourier transform of the function f .t/ D e˛jxj
(where ˛ > 0) is given by
2˛
F./ D :
˛ 2 C .2 /2
Consequently, prove the following integral representation of it:
Z
˛jxj 2˛ 1 cos.ux/
e D du:
0 ˛ 2 C u2
Using this result, show that the integral
Z 1
du
D :
0 1 C u2 2
Check this result by calculating the integral directly using a different method.
[Hint: look up the integral representation of the arctangent.]
Problem 5.14. Prove the so-called modulation theorem:
1
F Œf .x/ cos .2 0 x/ D ŒF . C 0 / C F . 0 / ;
2
where F./ D F Œf .x/.
Problem 5.15. Prove that the Fourier transform of f .x/ D sin x for jxj =2
and zero otherwise is
4 i 2
F./ D 2
cos :
.2 / 1
Using this result and the inverse Fourier transform, show that
Z 1
x sin . x/
dx D :
0 1 x2 2
Verify that the integrand is well defined at x D 1.
414 5 Fourier Transform
Problem 5.16. Show that the Fourier transform of the Gaussian delta
sequence (where n D 1; 2; : : :)
n 2 2
ın .x/ D p en x =2 (5.33)
2
is
=n/2 2
Fn ./ D e2. : (5.34)
[Hint: you may find formula (2.56) useful.] Show next that if x and are the
widths of ın .x/ and Fn ./, respectively, at their half height, then their product
x D 4 ln 2= ' 0:88 and does not depend on n.
Again, several interesting observations can be made about the functions (5.33)
and their Fourier transforms (5.34). Indeed, the functions ın .x/ defined above
represent another example of the delta sequence (see Sect. 4.1): as n is increased, the
Gaussian function ın .x/ becomes thinner and more picked, while the full area under
the curve remains equal to unity. Its Fourier transform Fn ./ is also a Gaussian;
however, with the increase of n the Gaussian Fn ./ gets more and more spread out
tending to the value of unity in the limit, limn!1 Fn ./ D 1, which agrees with the
result for the rectangular delta sequence (5.27) considered above.
So, we see yet again that if a function in the direct x-space gets thinner, its Fourier
transform in the -space gets fatter. The opposite is also true which follows from the
symmetry of the direct and inverse transforms, Eqs. (5.20) and (5.22).
One of the important strengths of the Fourier transform stems from the fact that it
frequently helps solving linear differential equations (DE), and several examples of
this application will be given in the following sections. The idea is to first transform
the equation into the -space by performing the Fourier transform of all terms in it.
In the transformed equation the unknown function f .x/ appears as its transform (or
“image”) F./, and, as will be shown in this subsection, derivatives of f .x/ in the
DE are transformed into algebraic expressions linear in F./ rendering the DE in
the -space to become algebraic with respect to F./. Then, at the second step the
inverse transform F./ 7! f .x/ is performed. If successful, the solution is obtained
in the analytical form. Otherwise, the solution is obtained as the spectral integral
over which may be calculated numerically.
However, an application of this scheme may be drastically simplified if certain
general rules of working with the Fourier transform are first established. We shall
derive several such rules here and also in the next subsection. These may be useful
in performing either direct or inverse Fourier transforms in practice.
5.2 Fourier Transform 415
Problem 5.17. Using the method of induction, generalise this result for a
derivative of any order m:
Consider two functions f .x/ and g.x/ with Fourier transforms F./ and G./,
respectively. The convolution of f .x/ and g.x/ is defined as the following function
of x:
Z 1
f .x/ g.x/ D f .t/g.x t/dt: (5.39)
1
f g D g f:
Let us calculate the Fourier transform of this new function. Our goal is to express
it via the Fourier transforms F./ and G./ of the two functions involved.
To this end, let us substitute into Eq. (5.39) the inverse Fourier transform of
g.x t/,
Z 1
.xt/
g.x t/ D G./ei2 d; (5.40)
1
Formula (5.42) is known as the convolution theorem. We see that the Fourier trans-
form of a convolution f .x/ g.x/ is simply equal to the product of the Fourier
transform of f .x/ and g.x/. Inversely, if one needs to calculate the inverse Fourier
transform of a product of two functions F./G./, then it is equal to the convolution
integral (5.39). This may be useful in performing the inverse transform. Indeed, if a
function T./ to be inverted back into the x-space looks too complex for performing
such a transformation, it might sometimes be possible to split it into a product
of two functions F./ and G./ in such a way that their individual inversions
(into f .x/ and g.x/, respectively) can be done. Then, the inversion of the whole
function T./ D F./G./ will be given by the integral (5.39), which can then be
either attempted analytically or numerically thereby solving the problem, at least in
principle.
5.2 Fourier Transform 417
As the simplest example, let us determine the convolution of the Dirac delta
function ı.x b/ with some function f .x/. From the definition (5.39) of the
convolution, we find that
Z 1
ı.x b/ f .x/ D ı.t b/f .x t/dt D f .x b/; (5.43)
1
where in the last passage we have applied the filtering theorem for the Dirac delta
function, Sect. 4.1. Considering now the -space, the Fourier image of ı.x b/ is
Z 1
F Œı .x b/ D ı .x b/ ei2 x
dx D ei2 b
;
1
while the Fourier transform of f .x/ is F./. Therefore, the Fourier transform of
their convolution is simply F./ei2 b . On the other hand, calculating directly the
Fourier transform of their convolution given by Eq. (5.43), we obtain
Z ˇ ˇ
1 ˇt D x bˇ
F Œf .x b/ D f .x b/ ei2 x
dx D ˇˇ ˇ
1 dt D dx ˇ
Z 1
D ei2 b
f .t/ ei2 t
dt D ei2 b
F./;
1
Problem 5.20. The function f .t/ is defined as e˛t for t 0 and zero otherwise
(˛ > 0). Show by direct calculation of the convolution integral, considering
separately the cases of positive and negative t, that the convolution of this
function with itself g.t/ D f .t/ f .t/ D tf .t/.
Problem 5.21. Find the Fourier transforms of the functions f .t/ and g.t/
defined in the previous problem directly and using the convolution theorem.
[Answer: F Œf .t/ D F./ D .˛ C i2 /1 and F Œg.t/ D G./ D
.˛ C i2 /2 D F./2 .]
418 5 Fourier Transform
Due to the close relationship between the Fourier series and transform, it is no
surprise that there exists a theorem similar to the corresponding Parseval’s theorem
we proved in Sects. 3.4 and 3.5 in the context of the Fourier series.
Here we shall formulate this statement in its most general form as the
Plancherel’s theorem. It states that if F./ and G./ are the Fourier transforms
of the functions f .x/ and g.x/ respectively, then
Z 1 Z 1
F./ G./d D f .x/ g.x/dx: (5.44)
1 1
To prove this, we use the definition of the Fourier transforms in the left-hand side:
Z 1 Z 1 Z 1 Z 1
2 ix 2 iy
F./ G./d D f .x/e dx g.y/e dy d
1 1 1 1
Z 1 Z 1 Z 1 Z 1 Z 1
D e2 i.xy/
d f .x/ g.y/dxdy D ı.x y/f .x/ g.y/dxdy
1 1 1 1 1
Z 1 Z 1 Z 1
D f .x/ ı.x y/g.y/dy dx D f .x/g.x/dx;
1 1 1
where in the second line we recognised the integral representation (5.31) for the
delta function ı.x y/ within the expression in the square brackets, while in the
third line we used the filtering theorem for the delta function.
In particular, if f .x/ D g.x/, then we obtain the Parseval’s theorem:
Z 1 Z 1
2
jf .x/j dx D jF./j2 d: (5.45)
1 1
Problem 5.22. Using the function from Problem 5.13 and the Parseval’s
theorem, show that
Z 1
dx
D :
0 .1 C x2 /2 4
In physics the Fourier transform is used both for functions depending on time t and
spatial coordinates x, y and z. There are several ways in which the Fourier transform
may be written. Consider first the time Fourier transforms. If f .t/ is some function
satisfying the necessary conditions for the transform to exist, then its direct and
inverse transforms can be written, for instance, as
Z 1 Z 1
i!t d!
F.!/ D f .t/e dt and f .t/ D F.!/ei!t ; (5.46)
1 1 2
i.e. the expected result. The integral representation (5.31) of the delta function was
used here.
Note that the 1=2 multiplier may appear instead in the equation for the direct
transform, or it may also be shared in both expressions symmetrically:
Z 1 Z 1
dt d!
F.!/ D f .t/ei!t p and f .t/ D F.!/ei!t p : (5.47)
1 2 1 2
The mentioned Fourier transforms are related to each other by a trivial constant
pre-factor.
It is also important to mention the sign in the exponential functions in the two
(direct and inverse) transforms: either plus or minus can equivalently be used in the
direct transform; the important point is that then the opposite sign is to be used in
the inverse one. Of course, the images (or transforms) of the function f .t/ calculated
using one or the other formulae would differ. However, since at the end of the day
one always goes back to the direct space using the inverse transform, the final result
will always be the same no matter which particular definition has been used.
Functions depending on spatial coordinates may also be Fourier transformed.
Consider first a function f .x/ depending only on the coordinate x. In this case, using
one of the forms above, one can write for both transforms:
420 5 Fourier Transform
Z 1 Z 1
ikx x dkx
F .kx / D f .x/e dx and f .x/ D F .kx / eikx x : (5.48)
1 1 2
where the triple (volume) integral is performed over the whole infinite space.
Our result can be further simplified by introducing vectors r D .x; y; z/ and
k D kx ; ky ; kz . Then, we finally obtain the direct Fourier transform of the function
f .r/ specified in 3D space:
Z
F.k/ D f .r/eik r dr; (5.49)
where the integration is performed over the whole (reciprocal) space spanned
by the wave vector k.
Problem 5.25. Assuming that the Fourier transform of a function f .r/ is
defined as in Eq. (5.49), prove the following identities:
where k D jkj. Obtain these results in two ways: (i) act with operators r
and , respectively, on both sides of the inverse Fourier transform (5.51) and
(ii) calculate the Fourier transform of rf .r/ and f .r/ directly by means of
Eq. (5.49) (using integration by parts).
Problem 5.26. Consider a vector function g.r/. Its Fourier transform G.k/ is
also some vector function in the k space defined as above for each Cartesian
component of g.r/. Then show that
Problem 5.27. Let f .t/ be some function of time and its Fourier transforms be
given by Eq. (5.46). Show that
@f @2 f
F D i!F.!/ and F D ! 2 F.!/:
@t @t2
and
Z 1 Z
dk d!
f .x; t/ D F.k; !/ei.kx!t/ : (5.53)
1 2 2
Here opposite signs in the exponent before x and t were used as is customarily
done in the physics literature.
422 5 Fourier Transform
Problem 5.30. Act with the Laplacian operator on both sides of Eq. (5.54)
and use Eq. (5.50) to prove that
1
D 4 ı .r/ : (5.55)
r
A different proof of this relationship will be given below in Sect. 5.3.3.
Here we shall show that a particular solution of the Maxwell equations has a form
of a retarded potential. As is shown in electrodynamics, all four Maxwell equations
for the fields can be recast as only two equations for the corresponding scalar ' .r; t/
and vector A .r; t/ potentials:
1 @2 ' 1 @2 A 4
' D 4 and A D J: (5.56)
c2 @t2 c2 @t2 c
These are the so-called d’Alembert equations. Here .r; t/ and J .r; t/ are the charge
and current densities, respectively.
We shall derive here a special solution of these equations corresponding to the
following problem. Suppose, the potentials are known at t 0 where the charges
(responsible for the non-zero charge density ) are stationary. Then, starting from
t D 0 the charges start moving causing time and space dependence of the “sources”
.r; t/ and J .r; t/. Assuming next that the latter are known, we would like to
5.3 Applications of the Fourier Transform in Physics 423
(here and in the following k D jkj). For other terms the transformation is trivial:
1 @2 ' 1 @2 'k .t/
F 2 2 D 2 and F Œ4 .r; t/ D 4 k .t/:
c @t c @t2
Therefore, after the transformation into the Fourier space (or k-space), we obtain the
following time differential equation for the image 'k .t/ of the unknown potential:
1 d2 'k d2 'k
k2 'k D 4 k H) C .kc/2 'k D 4 c2 k: (5.59)
c2 dt2 dt2
This is a linear second order inhomogeneous differential equation which can be
solved using, e.g., the method of variation of parameters, considered in detail in
Sect. I.8.2.2.2.
so that the arbitrary constants above should be zero. Indeed, 'k .0/ D C1 C C2 D 0,
while
d'k 4 c
D ikc C1 eikct C2 eikct C f k . / sin Œkc .t /g Dt
dt k
Z t
C4 c2 k . / cos Œkc .t / d
0
Z
t
D ikc C1 eikct C2 eikct C 4 c2 k . / cos Œkc .t / d;
0
resulting in C1 D C2 D 0. Therefore,
Z
4 c t
'k .t/ D k . / sin Œkc .t / d
k 0
Z Z
4 c t
0
D d sin Œkc .t / dr0 .r0 ; /eik r ;
k 0
where in the second passage we have replaced the Fourier transform of the density
by the density itself using the Fourier transform of it. Substituting now the image
of the potential we have just found into the inverse Fourier transform of it, the first
Eq. (5.58), we obtain after slight rearrangements:
Z Z Z
c 0
t dk ik .rr0 /
' .r; t/ D 2
dr d r0 ; e sin .kc .t // : (5.60)
2 0 k
The 3D integral over k within the square brackets may only depend on the length R
of the vector R D r r0 since R appears only in the dot product with the vector k
and we integrate over the whole reciprocal k space. Therefore, this integral can be
simplified by directing R along the z axis and then using spherical coordinates.
5.3 Applications of the Fourier Transform in Physics 425
Problem 5.32. Using this method, show that integration over the spherical
angles of k yields
Z Z 1 Z 1
dk ik R 2
e sin .k/ D cos .k . R// dk cos .k . C R// dk ;
k R 0 0
where D c .t /.
In these integrals one can recognise Dirac delta functions, see Eq. (4.24), so that
Z
dk ik R 2 2
e sin Œkc .t / D Œı .c .t / R/ ı .c .t / C R/ :
k R
The two delta functions give rise to two integrals over . We shall consider them
separately. The first one, after changing the variable ! D c .t / R,
results in
Z t Z
1 ctR RC
d r0 ; ı .c .t / R/ D r0 ; t ı ./ d:
0 c R c
If the point D 0 does not lie within the limits, the integral is equal to zero. Since
R > 0, the integral is zero if ctR < 0, i.e. if R > ct or t < R=c. If, however, t > R=c
or R < ct, then the integral results in .1=c/ .r0 ; t R=c/ by virtue of the filtering
theorem for the delta function. The second delta function does not contribute to the
integral in this case since after the change of variables ! D c .t / C R
the limits of become ct C R and CR which are both positive, and hence exclude
the point D 0. Therefore, only the first delta function contributes for t > R=c and
we finally obtain
Z
dr0 jr r0 j 0 jr r0 j
' .r; t/ D H t r ; t ; (5.61)
jr r0 j c c
where H.x/ is the Heaviside function. The Heaviside function indicates that if there
is a charge density in some region around r0 D 0, then the effect of this charge will
only be felt at points r at times t > R=c ' r=c. In other words, the field produced
by the moving charges is not felt immediately at any point r in space; the effect of
the charges will propagate to the point r over some time and will only be felt there
when t r=c. Also note that for t D 0 the Heaviside function is exactly equal to
zero and hence is the potential—in full accordance with our initial conditions.
426 5 Fourier Transform
The second equation in (5.56) for the vector potential can be solved in the same
way. In fact, these are three scalar equations for the three components of A, each
being practically identical to the one we have just solved. Therefore, the solution
for each component of A is obtained by replacing with J˛ /c (where ˛ D x; y; z) in
formula (5.61), and then combining all three contributions together :
Z
1 dr0 jr r0 j 0 jr r0 j
A .r; t/ D H t J r ; t : (5.62)
c jr r0 j c c
This is a retarded solution as well. Hence, either of the two potentials at time
t is determined by the densities or J calculated at time t jr r0 j =c in the
past. In other words, the effect of the densities (the “sources”) from point r0 is
felt at the “observation” point r only after time jr r0 j =c which is required for
the light to travel the distance jr r0 j between these two points. The other term
which contains the delta function ı .c .t / C R/ (which we dropped) corresponds
to the advanced part of the general solution. This term is zero for the boundary
conditions we have chosen. In the case of general initial and boundary conditions
both potentials contribute together with the free term (the one which contains the
constants C1 and C2 ).
Problem 5.33. Using the Fourier transform method, show that the solution of
the Poisson equation
'.r/ D 4 .r/
of classical electrostatics is
Z
.r0 / 0
'.r/ D dr : (5.63)
jr r0 j
1 @2 y @2 y
2 2
D 2;
v @t @x
(continued)
5.3 Applications of the Fourier Transform in Physics 427
@2 n @n @n
D D ;
@x2 @x @t
where D is the diffusion coefficient and n.x; t/ is the density of the particles.
Perform the Fourier transform of this equation with respect to x and solve the
obtained ordinary differential equation assuming that initially at t D 0 all
particles were located at some point x0 , i.e. n .x0 ; 0/ D n0 ı .x x0 /. Finally,
performing the inverse Fourier transform of the obtained solution, show that
n0 2
n.x; t/ D p e.xx0 t/ =4Dt
:
4 Dt
Describe what happens to the particles’ distribution over time.
The so-called Green’s functions allow establishing a general formula for a particular
integral of an inhomogeneous differential equation as a convolution with the
function in the right-hand side. And the Fourier transform method has become an
essential tool in finding the Green’s functions for each particular equation. Let us
consider this question in a bit more detail.
Consider an inhomogeneous differential equation for a function y.x/:
where L is some operator acting on y.x/. For instance, the forced damped harmonic
oscillator is described by the equation
d2 y dy f .t/
2
C 2!0 C !02 y D ; (5.65)
dt dt m
where !02 is its fundamental frequency, friction and m mass, while f .t/ is an
external excitation signal, and hence the operator
d2 d
LD C 2!0 C !02 :
dt2 dt
428 5 Fourier Transform
where the operator L acts on the variable x only, i.e. x0 serves as a parameter, and in
the right-hand side we have a Dirac delta function. The function G .x; x0 / is called
Green’s function of the differential equation (5.64) or its fundamental solution.
It is easy to see then that a general solution of our differential equation for any
function f .x/ in the right-hand side of the differential equation can be written as its
convolution with the Green’s function:
Z
y.x/ D G x; x0 f x0 dx0 : (5.67)
Indeed, by acting with the operator L (recall that it acts on x, not on the integration
variable x0 ) on both sides, we obtain
Z Z
0 0 0
Ly.x/ D LG x; x f x dx D ı x x0 f x0 dx0 D f .x/;
where in the last passage we have used the filtering theorem for the delta function.
This calculation proves that formula (5.67) does indeed provide a particular solution
of our Eq. (5.64) for an arbitrary right-hand side f .x/.
Let us now consider some examples, both for ordinary and partial differential
equations.
We shall start by looking for the Green’s function of the damped (small friction)
harmonic oscillator equation. The Green’s function G.t/ satisfies the differential
equation (5.65) with f .x/=m replaced by ı.t/ (one can set t0 to zero here, the Green’s
function would only depend on the time difference t t0 anyway). We shall use the
Fourier transform method. If we define
Z 1
G.!/ D F ŒG.t/ D G.t/ei!t dt;
1
d2 G dG
2
C 2!0 C !02 G D ı.t/
dt dt
for the Green’s function transforms into
yielding immediately
1
G.!/ D :
! 2 C 2i!0 ! !02
5.3 Applications of the Fourier Transform in Physics 429
Therefore, using the inverse Fourier transform, we can write for the Green’s function
we are interested in:
Z 1 Z 1
i!t d! 1 ei!t
G.t/ D G.!/e D 2
d!:
1 2 2 2
1 ! C 2i!0 ! !0
The !-integral is most easily taken in the complex z plane. Indeed, there are two
poles here,
p
!˙ D !0 i ˙ 1 2 D ˙$ i!0 ;
which are solutionsp of the quadratic equation ! 2 C 2i!0 ! !02 D 0. Here the
frequency $ D !0 1 2 is positive (and real) as < 1 for a weakly damped
oscillator. Both poles lie in the lower part of the complex plane. We use the contour
which closes the horizontal axis with a semicircle of radius R ! 1 either in the
upper or lower part of the complex plane. To choose the contour, we have to consider
two cases: t > 0 and t < 0. In the case of positive times, R the exponent i!t !
izt D i .x C iy/ t D ixt C yt, so that the integral CR over the semicircle of
radius R of the contour will tend to zero for y < 0, i.e. we have to choose the
semicircle in the lower part of the complex plane, and both poles would contribute.
Therefore,
I
1 eizt
G.t/ D 2
dz
2 2
C z C 2i!0 z !0
2 i eizt eizt
D Res 2 ; ! C C Res ; !
2 z C 2i!0 z !02 z2 C 2i!0 z !02
i!C t
ei!C t ei! t e ei! t sin .$t/ !0 t
Di C Di C D e :
2!C C2i!0 2! C2i!0 2$ 2$ $
Note that the contour runs around the poles in the clockwise direction bringing an
additional minus sign. In the case of t < 0, one has to enclose the horizontal axis in
the upper part of the complex plane where there are no poles, and the result is zero.
Therefore, the Heaviside function H.t/ appears in the Green’s function in front of
the above expression.
Problem 5.36. Show that the Green’s function corresponding to the operator
L D C d=dt is G.t/ D H.t/et .
Problem 5.37. Show that if L D . C d=dt/2 , then G.t/ D H.t/tet .
Now let us briefly touch upon the issue of calculating Green’s functions for
partial differential equations. Consider, as an example, the Poisson equation,
In this case the boundary conditions are important, as many Green’s functions exist
for the same equation depending on these. We shall consider the case when the
Green’s function vanishes at infinity, i.e. G.r/ ! 0 when r ! 1. In this case we
can apply to G.r/ the Fourier transform,
Z Z
dk
G.k/ D eik r G.r/dr and G.r/ D eik r G.k/:
.2 /3
In the k-space the differential equation reads simply k2 G.k/ D 1, so that
G.k/ D 1=k2 . Correspondingly, the Green’s function G.r/ can be calculated using
the inverse Fourier transform taken in spherical coordinates and with the vector r
directed along the z axis:
Z Z 1 Z Z 2
1 dk 1 ikr cos #
G.r/ D eik r 2 D dk e sin #d# d
.2 /3 k .2 /3 0 0 0
Z 1 Z 1
1 eikr eikr 1 sin p 1
D 2
dk D 2 dp D ;
.2 / 0 ikr 2 r 0 p 4 r
where we have used Eq. (2.122). Using this Green’s function, one can easily write
a solution of the inhomogeneous Poisson equation '.r/ D 4 .r/ as a
convolution:
Z Z
0 0 dr0
'.r/ D G r r0 4 r dr0 D r ;
jr r0 j
Problem 5.38. Consider the Green’s function for the so-called Helmhotz
partial differential equation. The Green’s function is defined via
Then take the k-integral using two methods: (i) replace p ! p C i and then,
performing the integration in the complex plane, show that in the ! C0 limit
G.r/ D eipr =.4 r/; (ii) similarly, by replacing p ! p i show that in this
case G.r/ D eipr =.4 r/.
5.3 Applications of the Fourier Transform in Physics 431
1 @ .r; t/
.r; t/ D ; (5.70)
@t
and correspondingly yet another definition of the Green’s function. Here .r; t/
describes the probability density of a particle to be found at point r at time t,
and is the diffusion constant. First of all, let us determine the solution of this
equation corresponding to the initial condition that the particle was in the centre of
the coordinate system at time t D 0. In this case .r; 0/ D ı.r/, since integrating this
density over the whole space gives the total probability equal to unity, as required.
Again, we shall use the Fourier transform method. Define
Z Z
dk
.k; t/ D eik r .r; t/dr and .r; t/ D eik r .k; t/;
.2 /3
1 @ .k; t/ 2
k2 .k; t/ D H) .k; t/ D Cek t :
@t
The arbitrary constant C is found using the initial condition:
Z Z
.k; 0/ D eik r .r; 0/dr D eik r ı.r/dr D 1;
1 2 =4t
.r; t/ D 3=2
er : (5.71)
.4 t/
Another interesting point to mention, which also justifies the solution (5.71) to
be called the Green’s function of the diffusion equation, is that a general solution
of the equation at time t for arbitrary initial distribution .r; 0/ can be written as a
convolution with the Green’s function:
Z
0 0
.r; t/ D G r r0 ; t r ; 0 dr ; (5.72)
Problem 5.39. Prove that the density (5.72) remains properly normalised for
any t 0, i.e.
Z Z
.r; t/ dr D .r; 0/ dr:
R
[Hint: demonstrate by direct integration that G.r; t/dr D 1.]
Problem 5.40. Show that the solution of the differential equation
1 @G.r; t/ 1
G.r; t/ C D ı .r/ ı .t/ (5.73)
@t
is H.t/G .r; t/, where H.x/ is the Heaviside unit step function and G .r; t/
is the Green’s function given by Eq. (5.71). [Hint: use the corresponding
generalisation of the Fourier transformation (5.52) written for both (all three)
spatial and (one) temporal variables. Then, when calculating the inverse Fourier
transform, calculate first the ! integral in the complex plane, then perform a
direct integration in the k space.]
Problem 5.41. Calculate the Green’s function G.x; t/ of the one-dimensional
Schrödinger equation for a free electron:
„2 @2 @
C i„ G.x; t/ D i„ı .x/ ı .t/ : (5.74)
2m @x2 @t
(continued)
5.3 Applications of the Fourier Transform in Physics 433
where, as usual, H.t/ is the Heaviside function. The -integral diverges in the
strict mathematical sense. However, one can treat it as a generalised function
and use the following regularisation:
Z 1 Z 1 r r
i˛2 .i˛C /2
e d ) lim e d D lim D ;
1 !C0 1 !C0 i˛ C i˛
This formula plays a central role in the theory of path integrals developed by R.
Feynman, which serves as an alternative formulation of quantum mechanics.
Note that some justification for the manipulations made in the last problem can be
made by noticing that the equation for the Green’s function (5.74) can be formally
considered identical to the diffusion equation (5.73) by choosing D i„=2m in the
latter. Then, also formally, the Green’s function for the Schrödinger equation can be
obtained directly from Eq. (5.71) using the corresponding substitution.
where f .t/ D f .t/ hf .t/i corresponds to the deviation of f .t/ from its average
hf .t/i and the angle brackets correspond to the averaging over the statistical
ensemble. Here the average hA.t/i is understood as a mean value of the function
P A.t/
(in general, of coordinates and velocities) over the ensemble, hA.t/i D i wi Ai .t/,
where the sum is taken over all systems in the ensemble i, wi is the probability
to find the system i in the ensemble and Ai .t/ is the value of the function A.t/
reached by the particular system i. For instance, if we would like to calculate the
velocity autocorrelation function Kvv .; t/ of a Brownian particle in a liquid, then
f . / 7! v.t/ and an ensemble would consist of different identical systems in which
initial (at t D 0) velocities and positions of particles in the liquid are different, e.g.
drawn from the equilibrium (Gibbs) distribution. In this distribution some states of
the liquid particles (i.e. their positions and velocities) are more probable than the
others, this is determined by the corresponding distribution of statistical mechanics.
Finally, the sum over i in the definition of the average would correspond to an
integral over all initial positions and momenta of the particles.
Two limiting cases are worth mentioning. If there are no correlations in the values
of the observable f .t/ at different times, then the average of the product at two times
is equal to the product of individual averages,
hf i D hf hf ii D hf i hf i D 0
where we have added to both times at the last step. Note that we also omitted the
time t in the argument of the correlation function for simplicity of notations. Also,
under these conditions, the average hf .t/i does not depend on time t.
In the following we shall measure the quantity f .t/ relative to its average value
and hence will drop the symbol from the definition of the correlation function.
There is a famous theorem due to Wiener and Khinchin which we shall briefly
discuss now. Let us assume that the Fourier transform f .!/ can be defined for the
stochastic variable f .t/. Next, we define the spectral power density as the limit:
1D E
S .!/ D lim ST .!/ D lim jfT .!/j2 ; (5.77)
T!1 T!1 T
where
Z T=2
fT .!/ D f .t/ei!t dt
T=2
is the partial Fourier transform, i.e. f .!/ D limT!1 fT .!/. Consider now ST .!/ as
defined above:
*ˇZ ˇ2 + *Z Z T=2 +
1 ˇ T=2 ˇ 1 T=2
ˇ i!t1 ˇ i!.t1 t2 /
ST .!/ D ˇ dt e f .t1 /ˇ D dt1 dt2 e f .t1 / f .t2 /
T ˇ T=2 1 ˇ T T=2 T=2
Z Z
1 T=2 T=2
D dt1 dt2 ei!.t1 t2 / hf .t1 / f .t2 /i
T T=2 T=2
Z Z ˇ ˇ
1 T=2 T=2 ˇ t2 ! D t1 t2 ˇ
D dt1 dt2 e i!.t1 t2 / ˇ
Kff .t1 t2 / D ˇ ˇ
T T=2 T=2 d D dt2 ˇ
Z Z t1 CT=2
1 T=2
D dt1 d ei! Kff ./ : (5.78)
T T=2 t1 T=2
The next step consists of interchanging the order of integration: we shall perform
integration over t1 first and over second. The integration region on the .t1 ; / plane
is shown in Fig. 5.5. When choosing the integration to be performed last, one has
to split the integration region into two regions: T < 0 and 0 T. This
yields
Z 0 Z CT=2 Z Z
1 1 T T=2
ST .!/ D ei! Kff . / d dt1 C ei! Kff ./ d dt1
T T T=2 T 0 T=2
Z 0 Z
1 T
D Kff . / .T C / ei! d C Kff ./ .T / ei! d
T T 0
Z Z
1 T T
jj i!
D .T j j/ ei! Kff . / d D 1 e Kff ./ d:
T T T T
436 5 Fourier Transform
which is the required result. It shows that the spectral power density is in fact the
Fourier transform of the correlation function.
As an example of calculating a time correlation function, let us consider a one-
dimensional Brownian particle. Its equation of motion reads
pP D p C .t/; (5.80)
where p.t/ is the particle momentum, friction coefficient and .t/ a random
force. The two forces in the right-hand side are due to random collisions of liquid
particles with the Brownian particle: the random force tends to provide energy to
the Brownian particle, while the friction force, p, is responsible for taking any
extra energy out, so that on balance the average kinetic energy of the particle,
1 ˝ ˛ 1
p.t/2 D kB T; (5.81)
2m 2
would correspond correctly to the temperature T of the liquid by virtue of the
equipartition theorem. Above m is the Brownian particle mass and kB is the
Boltzmann’s constant.
5.3 Applications of the Fourier Transform in Physics 437
The random force acting on the particle does not have any memory3 ; corre-
spondingly, we shall assume that its correlation function is proportional to the delta
function of the time difference:
˝ ˛
.t/ t0 D ˛ı t t0 ; (5.82)
.!/
i!p.!/ D p.!/ C .!/ H) p.!/ D i ;
! i
where .!/ is the Fourier transform of the random force. The correlation function
becomes
Z
d!d! 0 i!.tC / i! 0 t ˝ ˛
Kpp . / D 2
e e p.!/p ! 0
.2 /
Z
d!d! 0 i!.tC/ i! 0 t h.!/ .! 0 /i
D e e :
.2 /2 .! i / .! 0 i /
To calculate the correlation function of the random force, we shall return back from
frequencies to times by virtue of the random force Fourier transform:
Z Z Z Z
˝ ˛ 0 0 ˝ ˛ 0 0
.!/ ! 0 D dt dt0 ei!t ei! t .t/ t0 D ˛ dt dt0 ei!t ei! t ı tt0
Z
0
D˛ dt ei.!C! /t D 2 ˛ı ! C ! 0 :
Therefore,
Z
d!d! 0 2 ˛ı .! C ! 0 /
0
Kpp . / D 2
ei!.tC/ ei! t
.2 / .! i / .! 0 i /
Z Z
d! i! 1 d! ei!
D ˛ e D˛ :
2 .! i / .! i / 2 !2 C 2
3
The case with the memory will be considered in Sect. 6.5.2.
438 5 Fourier Transform
The integral is performed in the complex z plane. There are two poles z D ˙i ,
one in the upper and one in the lower halves of the complex plane. For > 0 we
enclose the horizontal axis in the upper half where only the pole z D i contributes,
which gives Kpp . / D .˛=2 / e . For negative times < 0 the horizontal axis
is enclosed by a large semicircle in the lower part of the complex plane, where the
pole z D i contributes and an extra minus sign comes from the opposite direction
in which the contour is traversed. This gives Kpp . / D .˛=2 / e . Therefore, for
any time, we can write
˛ jj
Kpp . / D e D mkB Tej j : (5.83)
2
Several interesting observations can be made. Firstly, the correlation function does
indeed decay (exponentially) with time, i.e. collisions with particles of the liquid
destroy any correlation in motion of the Brownian particles on the time scale of 1= .
Indeed, as one would expect, the stronger the friction, the faster any correlation in
the particle motion is destroyed as the Brownian particle undergoes many frequent
collisions; if the friction is weak, the particle moves farther without collisions,
collisions happen less frequently and hence the correlation between different times
decays more slowly. Secondly,
˝ ˛ at D 0 we obtain the average momentum square,
Kpp .0/ D hp.t/p.t/i D p.t/2 , which at equilibrium should remain unchanged on
average and equal to mkB T in accordance with Eq. (5.81). And indeed, Kpp .0/ D
mkB T as required. It is this particular physical condition which fixes the value of the
constant ˛ in Eq. (5.82). Finally, the correlation function is indeed even with respect
to the time variable as required by Eq. (5.76).
which experiences a random force .t/ satisfying Eq. (5.82) with the same
˛ D 2mkB T as above. Show first that after the Fourier transform the particle
velocity v.!/ D xP .!/ D i!x.!/ is given by
i !.!/
v.!/ D ;
m ! 2 !02 i!
p
where !0 D k=m is the fundamental frequency of the oscillator. Then, using
this result show that the velocity autocorrelation function
Z
2kB T d! !2
Kvv .t/ D hv.t/v.0/i D ei!t :
m 2 ! 2 ! 2 2 C .!/2
0
(continued)
5.3 Applications of the Fourier Transform in Physics 439
and
2 p 3
kB T 4 p sinh Dt
hv.t/v.0/i D cosh Dt p 5 ejtj=2 if > 2!0 :
m 2 D
˝ ˛
Above, D D !02 2 =4. Note that v.0/2 D kB T=m as required by the
equipartition theorem.
Consider propagation of light through a wall with a hole in it, an obstacle called
an aperture. On the other side of the wall a diffraction pattern will be seen because
of the wave nature of light. If in the experiment the source of light is placed far
away from the aperture so that the coming light can be considered as consisting of
plane waves, and the observation of the diffraction pattern is made on the screen
placed also far away from the aperture, this particular diffraction bears the name of
Fraunhofer.
Consider an infinitely adsorbing wall (i.e. fully non-radiative) placed in the x y
plane with a hole of an arbitrary shape. The middle of the hole (to be conveniently
defined in each particular case) is aligned with the centre of the coordinate system,
Fig. 5.6. The positive direction z is chosen towards the observation screen (on the
right in the figure). Each point A within the aperture with the vector r D .x; y; 0/
(more precisely, a small surface area dS D dxdy at this point) becomes an
independent source of light waves which propagate out of it in such a way that
their amplitude
1 i!.tR=c/
dF.g/ / e dS;
R
where R D jRj is the distance between the point A and the observation point P
which position is given by the vector g drawn from the centre of the coordinate
system, c speed of light and ! frequency (k D !=c is the wave vector). Note that
the amplitude of a spherical wave decays as 1=R with distance, and we have also
440 5 Fourier Transform
Fig. 5.6 Fraunhofer diffraction: parallel rays of light from a source (on the left) are incident on
the wall with an aperture. They are observed at point P on the screen which is far away from the
wall. Every point A on the x y plane within the aperture serves as a source of a spherical wave of
light. All such waves from all points within the aperture contribute to the final signal observed at P
explicitly accounted for in the formula above the retardation effects (see Sect. 5.3.2).
In order to calculate the total contribution at point P, one has to integrate over the
whole area of the aperture:
Z
dS ikR
F .g/ / ei!t e : (5.84)
S R
The observation point is positioned far away from the small aperture, i.e. r g, see
Fig. 5.6. In this case the distance R D jg rj can be worked out using the cosine
theorem as follows:
s
p 2
2 2
r r
R D g C r 2g r D g 1 C 2Og
g g
r
r r
' g 1 2Og ' g 1 C gO D g C gO r;
g g
where k D kOg and we have made use again of the fact that the observation point is far
away from the aperture and hence have removed the gO r term in the denominator. We
have also omitted the whole pre-factor at the last steps; we shall reinstate it later on.
5.3 Applications of the Fourier Transform in Physics 441
In order to connect the direction g (or k D kOg) with the coordinates x0 and y0
of the observation point P on the screen, we notice that projections of this point
on both the screen and on the wall are the same as the two planes are parallel to
each other, see additional dotted lines in the figure. In particular, the point B on
the wall corresponds to the observation point P on the screen. Therefore, gx D x0
and gy D y0 . If by z we denote the distance between the screen and the wall, then
z D gz ' g, and hence
k k k
kx D kOgx D gx ' x0 and ky ' y0 ; (5.86)
g z z
0 0
yielding F .g/ being directly
dependent on the coordinates .x ; y / on the screen, i.e.
it can be written as F kx ; ky .
In order to work out the pre-factor to the integral in Eq. (5.85), we first notice
that this integral is performed over the aperture area only. It can be extended to the
whole x y plane if we introduce the so-called aperture function .x; y/, which is
equal to one within the aperture and zero otherwise. Then,
Z
F kx ; ky D .x; y/ ei.kx xCky y/ dxdy; (5.87)
This is the final result. The amplitude is proportional to the two-dimensional Fourier
transform of the aperture function. The intensity distribution at point P .x0 ; y0 / on the
ˇ ˇ2
screen is then given by dI .x0 ; y0 / D ˇF kx ; ky ˇ dkx dky , where the relations between
the components of the wave vector k and the coordinates .x0 ; y0 / on the screen are
given by Eq. (5.86).
p Z
R
krr0
F r0 ; 0
D I0 rJ0 dr;
0 z
where J0 .x/ is the corresponding Bessel function, Sect. 4.6, and we have made
use of a well-known integral representation for it, Eq. (4.230):
Z 2
1
J0 .x/ D eix cos d :
2 0
Finally, using the recurrence relation (4.218) for the Bessel functions, integrate
over r and show that
p
0 0 Rz I0 kR 0
F r; D J1 r :
kr0 z
in this case.
5.3 Applications of the Fourier Transform in Physics 443
Problem 5.45. Consider now an infinite slit of width h running along the y
direction. In that case the light arriving at a point .x; y/ within the aperture will
only be diffracted in the xz plane. Repeating the above derivation which led us
to Eq. (5.88), show that in this essentially one-dimensional case the amplitude
on the screen per unit length along the slit is given by the following formula:
r Z r
I0 I0 kh 0
F .kx / D e dx D h
ikx x
sinc x ; (5.90)
2 2 2z
6.1 Definition
The idea of why the LT might be useful can be understood if we recall why the
FT could be useful. Indeed, the FT was found useful, for instance, in solving
differential equations (DE): (1) one first transforms the DE into the Fourier space,
i.e. fx; f .x/g ! f; F./g; (2) the DE in the Fourier space (the -space) for
1
In the following, references to the first volume of this course (L. Kantorovich, Mathematics for
natural scientists: fundamentals and basics, Springer, 2015) will be made by appending the Roman
number I in front of the reference, e.g. Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of
the first volume, respectively.
the “image” F./ appears to be simpler (e.g. it becomes entirely algebraic with
no derivatives) and can then be solved; then, finally, (3) using the inverse FT,
f; F./g ! fx; f .x/g, the original function f .x/ we are interested in is found. In
other words, to solve the problem, one “visits” the Fourier space where the problem
at hand looks simpler, solves it there, but then “returns back” to the original x-space
by means of the inverse FT. Similarly one operates with the LT: first, the problem
of finding the function f .t/ is converted into the “Laplace space” by performing the
LT (t ! p and f .t/ ! F.p/), where p is a complex number; then the “image” F.p/
of the function of interest is found, which is then converted back into the t-space,
F.p/ ! f .t/, by means of the inverse LT. In fact, we shall see later on that the two
transforms are closely related to each other.
Consider a (generally) complex function f .t/ of the real argument t 0 which is
continuous everywhere apart from some number of discontinuities of the first kind.2
Moreover, we assume that within each finite interval there could only be a finite
number of such discontinuities. Next, for definiteness, we shall set the values of f .t/
at negative t to zero: f .t/ D 0 for t < 0. Finally, we shall assume that f .t/ may
increase with t; but this cannot happen faster than for an exponential function ex0 t
with some positive exponent x0 . In other words, we assume that
where M is some positive constant. An example of such a function is, for instance,
the exponential e2t . It goes to infinity when t ! C1, however, this does not happen
2
faster than ex0 t with any x0 > 2. At the same time, the function f .t/ D e2t grows
x0 t
much faster than the exponential function e with any x0 , and hence this class
of functions is excluded from our consideration. Note that if f .t/ is limited, i.e.
jf .t/j M, then x0 D 0. The positive number x0 characterising the exponential
growth of f .t/ we shall call the growth order parameter of f .t/. We shall call the
function f .t/ the original.
We shall then define the Laplace transform (LT) of f .t/ by means of the following
formula:
Z 1
L Œf .t/ D f .t/ept dt D F.p/: (6.2)
0
Here f .t/ is a function in the “t-space”, while its transform, F.p/ D L.f /, is the
corresponding function in the “p-space”, where p is generally a complex number.
The function F.p/ in the complex plane will be called image of the original f .t/.
Before we investigate the properties of the LT, it is instructive to consider some
examples.
2
Recall that this means that one-sided limits, not equal to each other, exist on both sides of the
discontinuity.
6.1 Definition 447
A very important point here is that we set to zero the value of the exponential
function at the t D C1 boundary. This is legitimate only if a part of the complex
plane C is considered for the complex numbers p. Indeed, if we only consider the
numbers p D x C iy with the positive real part, Re.p/ D x > 0, i.e. the right semi-
plane of C, then ept D ext eiyt ! 0 at tˇ ! 1 ˇ and this ensures the convergence
of the integral at the upper limit (note that ˇeiyt ˇ D 1).
Let us next calculate the LT of f .t/ D e˛t with some complex ˛. According to
the definition,
Z ˇ1
1
e.pC˛/t ˇˇ 1
L Œf D e.pC˛/t dt D ˇ D :
0 .p C ˛/ 0 pC˛
To ensure the convergence of the integral at the upper limit we have to consider
only such values of p in the complex plane C which satisfy the following condition:
Re.p C ˛/ > 0. Indeed, only in this case e.pC˛/t ! 0 as t ! 1. In other words,
the LT of the function e˛t exists for any p to the right of the vertical line drawn in
C via the point Re.˛/.
Since the function f .t/ enters the integral (6.2) linearly, the LT represents a linear
operator, i.e. for any two functions f .t/ and g.t/ satisfying the necessary conditions
outlined above,
Z 1
L Œ˛f .t/ C ˇf .t/ D Œ˛f .t/ C ˇf .t/ ept dt D ˛L Œf C ˇL Œg ; (6.3)
0
where ˛ and ˇ are arbitrary complex numbers. The linearity property allows
simplifying the calculation of some transforms as illustrated by the example of
calculating the LT of the sine and cosine functions f .t/ D cos !t and g.t/ D sin !t
(assuming ! is real). Since
1 i!t
f .t/ D cos !t D e C ei!t ;
2
we write
1˚ 1 1 1 p
L Œcos !t D L ei!t C L ei!t D C D :
2 2 p i! p C i! p2 C !2
(6.4)
Here we must assume that Re .p/ > 0.
448 6 Laplace Transform
Problem 6.2. Obtain the same formulae for the LT of the sine and cosine
functions directly by calculating the LT integral.
Problem 6.3. Prove that for Re.p/ > 0
2!p p2 ! 2
L Œt sin !t D and L Œt cos !t D : (6.6)
.p2 C ! 2 /2 .p2 C ! 2 /2
! pC˛
L e˛t sin !t D 2
and L e˛t cos !t D :
.p C ˛/ C !2 .p C ˛/2 C ! 2
(6.8)
Problem 6.7. It was shown above that L t0 D 1=p. Show that L Œt D 1=p2 .
Then prove by induction that
nŠ
L Œtn D ; n D 0; 1; 2; 3 : : : : (6.10)
pnC1
nŠ
L tn e˛t D ; n D 0; 1; 2; 3 : : : : (6.11)
.p C ˛/nC1
6.1 Definition 449
Problem 6.9. In fact, show that the following result is generally valid:
dn
L Œ.t/n f .t/ D F.p/; (6.12)
dpn
where L Œf D F.p/. [Hint: the formula follows from the definition (6.2) upon
differentiating its both sides n times.]
Problem 6.10. Using the rule (6.12), rederive Eq. (6.10).
Problem 6.11. Using the rule (6.12), rederive Eq. (6.11).
Problem 6.12. Show that
1 1 pC˛
L e˛t cos2 .ˇt/ D C :
2.p C ˛/ 2 .p C ˛/2 C 4ˇ 2
As another useful example, let us consider the LT of the Dirac delta function
f .t/ D ı .t / with some positive > 0:
Z 1
L Œı .x / D ı .t / ept dt D ep : (6.14)
0
L Œı .t/ D 1: (6.15)
.
450 6 Laplace Transform
Here we shall consider the LT in more detail including the main theorems related to
it. Various properties of the LT which are needed for the actual use of this method
in solving practical problems will be considered in the next section.
Theorem 6.1. If f .t/ is of an exponential growth, i.e. it goes to infinity not faster
than the exponential function ex0 t with some positive growth order parameter
x0 > 0, then the LT LŒf .t/ D F.p/ of f .t/ exists in the semi-plane Re.p/ > x0 .
Proof. If f .t/ is of the exponential growth, Eq. (6.1), where the positive number x0
may be considered as a characteristic exponential of the function f .t/, then the LT
integral (6.2) converges absolutely. Indeed, consider the absolute value of the LT of
f .t/ at the point p D x C iy:
ˇZ 1 ˇ Z 1 Z 1
ˇ ˇ ˇ ˇ ˇ ˇ
ˇ
jF.p/j D ˇ pt ˇ
f .t/e dtˇ ˇ f .t/ept ˇ
dt D jf .t/j ˇept ˇ dt
0 0 0
Z 1
D jf .t/j ext dt;
0
so that
Z 1 Z 1 Z 1
jF.p/j jf .t/j ext dt M ex0 t ext dt D M e.xx0 /t dt
0 0 0
ˇ1
M ˇ
D e.xx0 /t ˇ :
x x0 0
If x x0 D Re.p/ x0 > 0, i.e. if Re.p/ > x0 , then the value of the expression above
at the upper limit t D 1 is zero and we obtain
ˇZ ˇ
ˇ 1 ˇ M
jF.p/j D ˇˇ f .t/ept dtˇˇ ; (6.16)
0 x x0
which means that the LT integral converges absolutely and hence the LT F.p/ exists.
Q.E.D. Note that the estimate above establishes the uniform convergence of the
LT integral with respect to the imaginary part of p since jF.p/j is bounded by an
expression containing only the real part of p. We shall use this fact later on in
Sect. 6.2.3.
6.2 Detailed Consideration of the Laplace Transform 451
It follows from this theorem that F.p/ ! 0 when jpj ! 1 since the expression
in the right-hand side of (6.16) tends to zero in this limit. Note that since Re .p/ > x0 ,
this may only happen for arguments of p D jpj ei satisfying the inequality
=2 < < =2 with the points ˙ =2 strictly excluded.
Theorem 6.2. If f .t/ is of an exponential growth with the growth order parameter
x0 > 0, then the LT LŒf .t/ D F.p/ of f .t/ is an analytical function of p in the
complex semi-plane Re.p/ > x0 .
The first term is equal to zero both at t D 0 (because of the t present there) and at
t D 1 if the condition x D Re.p/ > Re.x0 / is satisfied (note that the exponential
e˛t with ˛ > 0 tends to zero much faster than any power of t when t ! 1). Then,
only the integral in the right-hand side remains which is calculated trivially to yield:
Z 1 Z 1 ˇ1
1 1 ˇ 1
te.xx0 /t dt D e.xx0 /t dt D 2
e.xx0 /t ˇ D ;
0 x x0 0 .x x0 / 0 .x x0 /2
i.e. it is indeed finite. This means that the function F.p/ is analytical (can be
differentiated).Q.E.D.
It also follows from the two theorems that F.p/ does not have singularities in
the right semi-plane Re.p/ > x0 , i.e. all possible poles of F.p/ can only be in
the complex plane on the left of the vertical line Re.p/ D x0 . Therefore, the
growth order parameter demonstrating the growth of the function f .t/ in the “direct”
t-space also determines the analytical properties of the “image” F.p/ in the complex
p-plane.
452 6 Laplace Transform
Since in the left-hand side we have L Œt sin !t, we immediately reproduce one of
the formulae in Eq. (6.6). The other formula is reproduced similarly.
It appears that the LT and the FT are closely related. We shall establish here the
relationship between the two which would also allow us to derive a direct formula
for the inverse LT. A different (and more rigorous) derivation of the inverse LT will
be given in the next subsection.
Consider a function f .t/ for which the LT exists in the region Re.p/ > x0 . If we
complement f .t/ with an extra exponential factor eat with some real parameter a,
we shall arrive at the function ga .t/ D f .t/eat which for any a > x0 will tend to
zero at the t ! C1 limit:
ˇ ˇ
jga .t/j D ˇf .t/eat ˇ Me.ax0 /t :
Therefore, the FT Ga ./ of ga .t/ can be defined3 It will depend on both a and :
Z 1 Z 1 Z 1
Ga ./ D ga .t/ei2 t
dt D f .t/eat ei2 t
dt D f .t/e.aCi2 /t
dt:
1 0 0
(6.17)
Note that we have replaced the bottom integration limit by zero as f .t/ D 0 for any
negative t. The inverse FT of Ga .t/ is then given by:
Z 1
i2 t ga .t/ D f .t/eat ; t > 0
Ga ./e dt D :
1 0 ; t<0
3
Since f .t/ is piecewise continuous, so is ga .t/, and hence the other Dirichlet condition is also
satisfied for the FT to exist.
6.2 Detailed Consideration of the Laplace Transform 453
The number p D a C i2 is some complex number, so that Eqs. (6.17) and (6.18)
can also be alternatively written as:
Z 1
Ga ./ D f .t/ept dt (6.19)
0
and
Z 1
f .t/ D Ga ./ept d: (6.20)
1
One can recognise in Eq. (6.19) the LT of the function f .t/, i.e. Ga ./ D L Œf .t/ D
F.p/. In the other Eq. (6.20) we shall change the integration variable from to
p D a C i2 and will replace Ga ./ with F.p/. This gives
Z aCi1
1
f .t/ D F.p/ept dp: (6.21)
2 i ai1
This formula provides a recipe for the inverse LT. Here a is any positive number
such that a > x0 . One can see that in order to calculate f .t/ from its LT F.p/, one
has to perform an integration in the complex plane of the function F.p/ept along the
vertical line Re.p/ D a from i1 to Ci1.
Here we shall rederive formula (6.21) using an approach similar to the one used
when proving the Fourier integral in Sect. 5.1.3.
Theorem 6.3. If f .t/ satisfies all the necessary conditions for its LT F.p/ to exist
and is analytic,4 then the following identity is valid:
Z Z 1
1 aCib
pt1
f .t/ D lim pt
e f .t1 / e dt1 dp: (6.22)
b!1 2 i aib 0
The limit above essentially means that the integral over p is to be understood as
the principal value integral when both limits go to ˙i1 simultaneously.
4
Strictly speaking, this condition is not necessary, but we shall assume it to simplify the proof.
454 6 Laplace Transform
At the next step we would like to exchange the order of integrals. This is legitimate
if the internal integral (over t1 ) converges uniformly with respect to the variable y of
the external integral taken over the vertical line p D a C iy in the complex plane. It
follows from Theorem 6.1 (see the comment immediately following its proof) that
for any a > x0 the integral over t1 indeed converges absolutely and uniformly with
respect to the imaginary part of p. Therefore, the order of the two integrals can be
interchanged (cf. Sect. I.6.1.3) enabling one to calculate the p-integral:
Z 1 Z aCib
1
fb .t/ D dt1 f .t1 / ep.tt1 / dp
2 i 0 aib
Z 1 .aCib/.tt1 /
1 e e.aib/.tt1 /
D dt1 f .t1 /
2 i 0 t t1
Z 1 Z
1 sin Œb .tt1 / 1 1 sin Œb .tt1 /
D f .t1 / ea.tt1 / dt1 D f .t1 / ea.tt1 / dt1 ;
0 t t1 1 tt1
where in the last step we replaced the bottom limit to 1 as f .t1 / D 0 for t1 < 0
by definition. Using the substitution D t1 t and introducing the function g .t/ D
f .t/eat , we obtain
Z 1 Z
1 sin .b /
a eat 1 sin .b/
fb .t/ D f .t C / e d D g .t C / d
1 1
Z Z 1
eat 1 g .t C / g .t/ eat sin .b/
D sin .b / d C g.t/ d:
1 1
The last integral, see Eq. (2.122), is equal to , and hence the whole second term
above amounts exactly to f .t/. In the first term we introduce the function ‰./ D
Œg .t C / g .t/ = within the square brackets, which yields
Z 1
eat
fb .t/ D f .t/ C ‰. / sin .b/ d: (6.23)
1
The function ‰ ./ is piecewise continuous. Indeed, for ¤ 0 this follows from the
corresponding property of f .t/. At D 0, however,
g .t C / g .t/
lim ‰ . / D lim D g0 .t/;
!0 !0
and is well defined as we assumed that f .t/ is analytic (differentiable). Hence, ‰ ./
is piecewise continuous everywhere. Following basically the same argument as the
one we used at the end of Sect. 5.1.3, the integral in Eq. (6.23) tends to zero in the
b ! 1 limit. Hence, limb!1 fb .t/ D f .t/. As the function in the square brackets
in Eq. (6.22) is F.p/, the proven result corresponds to the inverse LT of Eq. (6.21).
The subtle point here is that the result holds in the b ! 1 limit, i.e. the formula is
indeed valid in the sense of the principal value. Q.E.D.
6.2 Detailed Consideration of the Laplace Transform 455
Fig. 6.1 Possible selections of the contour used in calculating the inverse LT integral in the
complex p plane. (a) Contours CR and CR0 corresponding to, respectively, > 0 and < 0 in
the exponential term eiz of the standard formulation of the Jordan’s lemma, and (b) contours CR
and CR0 corresponding to the exponential term ept of the inverse LT for t > 0 and t < 0, respectively.
The vertical line DE corresponds to the vertical line in Eq. (6.21)
Formula (6.21) for the inverse LT is very useful in finding the original function
f .t/ from its “image” F.p/. This may be done by using the methods of residues. Let
us discuss how this can be done. Integration in formula (6.21) for the inverse LT is
performed over the vertical line p D a C iy, 1 < y < 1, where a > x0 . Consider
a part of this vertical line DE in Fig. 6.1(b), corresponding to b < y < b. A closed
contour can be constructed
p by attaching to the vertical line an incomplete circle of
the radius R D a2 C b2 either from the left, CR (the solid line), or from the right,
CR0 (the dashed line). We shall now show that one can use a reformulated Jordan’s
lemma for expressing the inverse LT integral via residues of the image F.p/.
Indeed, let us reformulate the Jordan’s lemma, Sect. 2.7.2, by making the
substitution
C =2/
z D rei ! p D iz D rei.
in Eqs. (2.118) and (2.119). It is seen that each complex number z acquires
an additional phase =2; this corresponds to the 90ı anti-clockwise rotation of
the construction we made originally in Fig. 2.32(a) as illustrated in Fig. 6.1. In
particular, upon the substitution the horizontal line z D x ia, b < x < b, in
Fig. 6.1(a) is transformed into the vertical line p D iz D a C ix with y D x satisfying
b < y < b, see Fig. 6.1(b), while the two parts CR and CR0 of the circle of radius
R in Fig. 6.1(a) turn into the corresponding parts CR and CR0 shown in Fig. 6.1(b).
Replacing z ! p D iz and also ! t in Eqs. (2.118) and (2.119), we obtain that
the contour integrals along the parts CR and CR0 of the circle shown in Fig. 6.1(b) for
t > 0 and t < 0, respectively, tend to zero in the R ! 1 limit,
Z Z
lim F.p/e dp D 0 if t > 0 and
pt
lim F.p/ept dp D 0 if t < 0;
R!1 C R!1 C0
R R
(6.24)
provided that the function F.p/ in the integrand tends to zero when jpj ! 1.
456 6 Laplace Transform
By virtue of the residue theorem, for $t > 0$ the integral over the closed contour composed of the vertical line DE and the arc $C_R$ equals $2\pi i\sum_k \mathrm{Res}\left[F(p)e^{pt}; p_k\right]$, where $p_k$ is a pole of $F(p)$ and we sum over all poles. Since the integral over the part of the circle $C_R$ in the $R\to\infty$ limit is zero according to the Jordan's lemma, Eq. (6.24), we finally obtain
$$ f(t) = \lim_{b\to\infty}\frac{1}{2\pi i}\int_{a-ib}^{a+ib} F(p)e^{pt}\,dp = \sum_k \mathrm{Res}\left[F(p)e^{pt}; p_k\right], \qquad(6.25) $$
where we sum over all poles on the left of the vertical line $p = x_0 + iy$.
To illustrate this powerful result, let us first calculate the function $f(t)$ corresponding to the image $F(p) = 1/(p+\alpha)$. We know from the direct calculation that this image corresponds to the exponential function $f(t) = e^{-\alpha t}$. Let us see if the inverse LT formula gives the same result. Choosing the vertical line at $x = a > -\mathrm{Re}(\alpha)$, so that the pole at $p = -\alpha$ is positioned on the left of the vertical line, we have
$$ f(t) = \mathrm{Res}\left[\frac{e^{pt}}{p+\alpha}; -\alpha\right] = e^{-\alpha t} = L^{-1}\left[\frac{1}{p+\alpha}\right], $$
as expected. Consider next the image
$$ F(p) = \frac{1}{(p+\alpha)(p+\beta)}, \qquad \alpha \neq \beta. $$
It has two simple poles: $p_1 = -\alpha$ and $p_2 = -\beta$. Therefore, the original becomes
$$ f(t) = \mathrm{Res}\left[\frac{e^{pt}}{(p+\alpha)(p+\beta)}; -\alpha\right] + \mathrm{Res}\left[\frac{e^{pt}}{(p+\alpha)(p+\beta)}; -\beta\right] = \frac{e^{-\alpha t} - e^{-\beta t}}{\beta - \alpha}. $$

Problem 6.15. Write the image
$$ F(p) = \frac{3p+2}{3p^2 + 5p - 2} $$
as a sum of partial fractions and then show, using the LTs we calculated earlier, that the inverse LT of it is $f(t) = \frac{1}{7}\left(4e^{-2t} + 3e^{t/3}\right)$.
Problem 6.16. Similarly, show that
$$ L^{-1}\left[\frac{1-p}{p^2+4p+13}\right] = e^{-2t}\left(\sin 3t - \cos 3t\right). $$
Problem 6.17. Use the inverse LT formula (6.21) to find the original function $f(t)$ from its image:
$$ L^{-1}\left[\frac{p}{p^2+\omega^2}\right] = \cos\omega t; \qquad L^{-1}\left[\frac{\omega}{p^2+\omega^2}\right] = \sin\omega t; $$
$$ L^{-1}\left[\frac{p}{(p+\alpha)(p+\beta)}\right] = \frac{1}{\alpha-\beta}\left(\alpha e^{-\alpha t} - \beta e^{-\beta t}\right), \quad \alpha\neq\beta; $$
$$ L^{-1}\left[\frac{a}{p^2-a^2}\right] = \sinh(at); \qquad L^{-1}\left[\frac{p}{p^2-a^2}\right] = \cosh(at); $$
$$ L^{-1}\left[\frac{2\omega p}{\left(p^2+\omega^2\right)^2}\right] = t\sin\omega t; \qquad L^{-1}\left[\frac{p^2-\omega^2}{\left(p^2+\omega^2\right)^2}\right] = t\cos\omega t. $$
(continued)
Problem 6.18. Repeat the calculations of the previous problem using the method of decomposition of $F(p)$ into partial fractions.
Multiple-valued functions $F(p)$ can also be inverted using formula (6.21). Consider as an example the image $F(p) = p^{\nu}$, where $-1 < \nu < 0$. Note that the fact that $\nu$ is negative guarantees the necessary condition $F(p)\to 0$ on the large circle $C_R$ in the $R\to\infty$ limit. The subtle point here is that we have to modify the contour $C_R$, since a branch cut from $p = 0$ to $p = -\infty$ along the negative part of the real axis is required, as shown in Fig. 6.2. Therefore, the residue theorem cannot be applied immediately and we have to integrate explicitly over each part of the closed contour. On the upper side of the cut $p = re^{i\pi}$, while on the lower $p = re^{-i\pi}$, where $0 < r < \infty$ and $-\pi \le \phi \le \pi$. The function $F(p)$ does not have any poles inside the closed contour; therefore, the sum of integrals over each part of it is zero. The contour consists of the vertical part, of the incomplete circle $C_R$ with $R\to\infty$ (of two parts), a small circle $C_\epsilon$ (where $\epsilon\to 0$), and the upper and the lower horizontal parts along the negative $x$ axis. Therefore,
$$ f(t) = \frac{1}{2\pi i}\int_{\text{vertical}} = -\frac{1}{2\pi i}\left(\int_{C_R} + \int_{\text{upper}} + \int_{\text{lower}} + \int_{C_\epsilon}\right). $$
The integral over both parts of $C_R$ tends to zero by means of the Jordan's lemma,⁵ so we need to consider the integrals over the small circle and over the upper and lower sides of the branch cut. The integral over $C_\epsilon$ tends to zero in the $\epsilon\to 0$ limit since $\nu + 1 > 0$:
$$ \int_{C_\epsilon} p^{\nu}e^{pt}\,dp = \begin{vmatrix} p = \epsilon e^{i\phi} \\ dp = i\epsilon e^{i\phi}\,d\phi \end{vmatrix} = \int \epsilon^{\nu}e^{i\nu\phi}\,e^{\epsilon t e^{i\phi}}\,i\epsilon e^{i\phi}\,d\phi \;\propto\; \epsilon^{\nu+1} \to 0. $$
On the upper side of the cut $p = re^{i\pi}$ and hence
$$ \int_{\infty}^{0} r^{\nu}e^{i\pi\nu}\,e^{-rt}\,(-dr) = e^{i\pi\nu}\int_0^{\infty} r^{\nu}e^{-rt}\,dr = e^{i\pi\nu}\,\frac{\Gamma(\nu+1)}{t^{\nu+1}}, $$
where we have introduced the gamma function, Sect. 4.2, and applied the $\epsilon\to 0$ limit and hence replaced the lower integration limit with zero. Similarly, on the lower side of the cut:
$$ \int_{0}^{\infty} r^{\nu}e^{-i\pi\nu}\,e^{-rt}\,(-dr) = -e^{-i\pi\nu}\int_0^{\infty} r^{\nu}e^{-rt}\,dr = -e^{-i\pi\nu}\,\frac{\Gamma(\nu+1)}{t^{\nu+1}}. $$
Collecting all contributions, we obtain
$$ f(t) = L^{-1}\left[p^{\nu}\right] = \frac{\Gamma(\nu+1)}{2\pi i}\left(e^{-i\pi\nu} - e^{i\pi\nu}\right)\frac{1}{t^{\nu+1}} = -\frac{\sin(\pi\nu)\,\Gamma(\nu+1)}{\pi\,t^{\nu+1}}. \qquad(6.26) $$
In particular, for $\nu = -1/2$ we obtain
$$ L^{-1}\left[\frac{1}{\sqrt{p}}\right] = \frac{\Gamma(1/2)}{\pi\sqrt{t}} = \frac{1}{\sqrt{\pi t}}. \qquad(6.27) $$

⁵ This follows from the fact that $p^{\nu} \to 0$ on any part of $C_R$ in the $R\to\infty$ limit.
where $0 < \alpha < 1$ and $\Psi(p) = L[\psi(t)]$ is the image of $\psi(t)$; $\Psi(p)$ is assumed to vanish when $|p|\to\infty$ in the whole complex plane. We, however, assume for simplicity that $p > 0$ is a real number. To prove this formula, we first write $\psi(t)$ using the inverse LT integral:
$$ L\left[t^{-\alpha}\psi(t)\right] = \int_0^{\infty} t^{-\alpha}\psi(t)e^{-pt}\,dt = \int_0^{\infty} t^{-\alpha}\,dt\left[\frac{1}{2\pi i}\int_{a-i\infty}^{a+i\infty}\Psi(z)\,e^{-(p-z)t}\,dz\right]. \qquad(6.31) $$
Here $\mathrm{Re}(z) = a < p$, as the vertical line drawn at $\mathrm{Re}(z) = a$ is always to the left of the region in the complex plane $p$ where the LT is analytic. The function $\Psi(z)$ decays to zero at $|z|\to\infty$, and hence we can assume that the integral
$$ \int_{a-i\infty}^{a+i\infty}\left|\Psi(z)\right| dz $$
converges. Since $z = a + iy$ with $-\infty < y < \infty$, the integral along the vertical line in Eq. (6.31) converges uniformly with respect to $t$:
$$ \left|\int_{a-i\infty}^{a+i\infty}\Psi(z)\,e^{-(p-z)t}\,dz\right| = e^{-(p-a)t}\left|\int_{a-i\infty}^{a+i\infty}\Psi(z)\,e^{iyt}\,dz\right| \le e^{-(p-a)t}\int_{a-i\infty}^{a+i\infty}\left|\Psi(z)\,e^{iyt}\right|dz \le \int_{a-i\infty}^{a+i\infty}\left|\Psi(z)\right|dz, $$
since $p > a$ and hence $e^{-(p-a)t} \le 1$. The demonstrated uniform convergence allows us to exchange the order of integrals in Eq. (6.31) to get:
$$ L\left[t^{-\alpha}\psi(t)\right] = \frac{1}{2\pi i}\int_{a-i\infty}^{a+i\infty}\Psi(z)\,dz\int_0^{\infty} t^{-\alpha}e^{-(p-z)t}\,dt. \qquad(6.32) $$
In the inner integral we change the variable to $u = (p-z)t$, which gives
$$ \int_0^{\infty} t^{-\alpha}e^{-(p-z)t}\,dt = \frac{1}{(p-z)^{1-\alpha}}\lim_{t\to\infty}\int_0^{(p-a-iy)t} u^{-\alpha}e^{-u}\,du. $$
A subtle point here is that the $u$-integration is performed not along the real axis assumed in the definition of the gamma function, but rather along the straight line from $u = 0$ to $u = (p - a - iy)t$ in the complex plane (note that the real $y$ is fixed). This is the line $L$ in Fig. 6.3. To convert this integral into the one along the real axis $\mathrm{Re}(u)$, we introduce a closed contour $L + C_R + X + C_\epsilon$ shown in the figure. Note that the horizontal line $X$ is passed in the negative direction as shown. The contour avoids the point $u = 0$ with a circular arc $C_\epsilon$ of radius $\epsilon\to 0$. Also, another circular arc $C_R$ of radius $R\to\infty$ connects $L$ with the horizontal line $X$. Since the integrand $u^{-\alpha}e^{-u}$ does not have poles inside the contour, the sum of the integrals along the whole closed contour is zero. At the same time, it is easy to see that the integral along $C_\epsilon$ behaves like $\epsilon^{1-\alpha}$ and hence tends to zero when $\epsilon\to 0$, while the one along $C_R$ behaves like $R^{1-\alpha}e^{-R\cos\phi_R}$ (where $-\pi/2 < \phi_R < \pi/2$) and hence goes to zero as well in the $R\to\infty$ limit. Therefore, $\int_L = -\int_X = \int_{X^+}$, where $X^+$ is the horizontal line along the real axis $u > 0$ taken in the positive direction. Hence, we arrive at the $u$-integral in which $u$ changes between $0$ and $\infty$, yielding the gamma function $\Gamma(1-\alpha)$ as required.
Therefore, Eq. (6.32) can be rewritten as follows:
$$ L\left[t^{-\alpha}\psi(t)\right] = \frac{\Gamma(1-\alpha)}{2\pi i}\int_{a-i\infty}^{a+i\infty}\frac{\Psi(z)}{(p-z)^{1-\alpha}}\,dz. \qquad(6.33) $$
To evaluate the remaining $z$-integral we close the contour on the right with a large circle $C_R$ and cut the plane along the real axis from the branch point $z = p$ to $z = +\infty$, integrating explicitly over each part of it. The function $\Psi(z)$ does not have poles to the right of $\mathrm{Re}(z) = a$, so that the integral over the closed contour is equal to zero. Because the function $\Psi(z)$ tends to zero for $|z|\to\infty$, the integral over the two parts of the large circular contour $C_R$ tends to zero as $R\to\infty$. Consider the integral over the small circle $C_\epsilon$. There $z - p = \epsilon e^{i\phi}$, $dz = i\epsilon e^{i\phi}\,d\phi$, and hence the integral behaves as $\epsilon\cdot\epsilon^{\alpha-1} = \epsilon^{\alpha}$ and thus also tends to zero as $\epsilon\to 0$ (recall that $0 < \alpha < 1$). Hence, we have to consider only the integrals over the upper (I) and lower (II) horizontal lines. On the upper side,
$$ \int_I \frac{\Psi(z)}{(p-z)^{1-\alpha}}\,dz = \begin{vmatrix} z - p = s \\ dz = ds \end{vmatrix} = \int_{\infty}^{0}\frac{\Psi(s+p)}{(-s)^{1-\alpha}}\,ds = -\int_0^{\infty}(-s)^{\alpha-1}\,\Psi(s+p)\,ds. $$
The function $(-s)^{\alpha-1}$ originates from $(p-z)^{\alpha-1}$. On the vertical part of the contour the latter function is positive when $z$ is on the real axis, since there $p > \mathrm{Re}(z) = a$ (recall that $p$ is real and positive). Since $z = a$ is on the left of $p$, it corresponds to the phase $\phi = \pi$ of $s = |s|e^{i\phi}$ (our change of variables $s = z - p$ is simply a horizontal shift placing the centre of the coordinate system at the point $p$). Therefore, when choosing the correct branch of the function $(-s)^{\alpha-1}$, we have to make sure that the function is positive at the $\phi = \pi$ phase of $s$. There are basically two possibilities to consider for the prefactor $-1$ before the $s$ in $(-s)^{\alpha-1}$: $-1 = e^{i\pi}$ and $-1 = e^{-i\pi}$. For the former choice (recall that $\phi = \pi$ when $z = a$), at that phase $(-s)^{\alpha-1} = |s|^{\alpha-1}e^{2\pi i(\alpha-1)}$. It is easy to see that for a general non-integer $\alpha$ it is impossible to guarantee that this number is positive. Indeed, if we, for instance, take $\alpha = 1/2$, then $e^{i2\pi\alpha} = e^{i\pi} = -1$, leading to a negative value. On the other hand, the other choice, $-1 = e^{-i\pi}$, guarantees the positive value $|s|^{\alpha-1}$ for any $\alpha$ at the $\phi = \pi$ phase of $s$, and we obtain on the lower side
$$ \int_{II} = e^{i\pi(\alpha-1)}\int_0^{\infty} s^{\alpha-1}\,\Psi(s+p)\,ds, $$
while on the upper side $(-s)^{\alpha-1} = e^{-i\pi(\alpha-1)}s^{\alpha-1}$. Adding the two contributions and substituting into Eq. (6.33), we finally obtain
$$ L\left[t^{-\alpha}\psi(t)\right] = \frac{\sin(\pi\alpha)\,\Gamma(1-\alpha)}{\pi}\int_0^{\infty} s^{\alpha-1}\,\Psi(s+p)\,ds. $$
In particular, for $\psi(t) = 1$ we have $\Psi(p) = 1/p$, and the $s$-integral
$$ \int_0^{\infty}\frac{s^{\alpha-1}}{s+p}\,ds = \mathrm{B}(\alpha, 1-\alpha)\,p^{\alpha-1} = \frac{\pi}{\sin(\pi\alpha)}\,p^{\alpha-1} $$
yields $L\left[t^{-\alpha}\right] = \Gamma(1-\alpha)\,p^{\alpha-1}$. Here we used formulae (4.42) and (4.41) for the beta function. For instance, by taking $\alpha = 1/2$, we immediately recover our previous result (6.27).
where $n = [\alpha] + 1$ is a positive integer, and $\Psi^{(n)}(s)$ is the $n$-th derivative of $\Psi(s)$. [Hint: repeat the steps which led us to Eq. (6.33) in the previous case, and then, showing that
$$ \frac{d^n}{dz^n}\,\frac{1}{(p-z)^{\beta}} = \frac{(-1)^n\,\Gamma(\alpha+1)}{\Gamma(\beta)}\,\frac{1}{(p-z)^{\alpha+1}}, $$
integrate by parts $n$ times to transfer the derivatives onto $\Psi(z)$.]
We continue the table of Laplace transform pairs; for each pair the original $f(t)$, its image $F(p)$ and the region of convergence are given:

L14: $e^{-at}\cos bt \;\longleftrightarrow\; \dfrac{p+a}{(p+a)^2+b^2}$, for $\mathrm{Re}(p+a) > |\mathrm{Im}\,b|$;

L15: $1 - \cos at \;\longleftrightarrow\; \dfrac{a^2}{p\left(p^2+a^2\right)}$, for $\mathrm{Re}\,p > |\mathrm{Im}\,a|$;

L16: $at - \sin at \;\longleftrightarrow\; \dfrac{a^3}{p^2\left(p^2+a^2\right)}$, for $\mathrm{Re}\,p > |\mathrm{Im}\,a|$;

L17: $\sin at - at\cos at \;\longleftrightarrow\; \dfrac{2a^3}{\left(p^2+a^2\right)^2}$, for $\mathrm{Re}\,p > |\mathrm{Im}\,a|$;

L18: $e^{-at}(1 - at) \;\longleftrightarrow\; \dfrac{p}{(p+a)^2}$, for $\mathrm{Re}(p+a) > 0$;

L19: $\dfrac{\sin at}{t} \;\longleftrightarrow\; \arctan\dfrac{a}{p}$, for $\mathrm{Re}\,p > |\mathrm{Im}\,a|$;

L20: $\dfrac{1}{t}\sin at\cos bt$ (with $a > b > 0$) $\;\longleftrightarrow\; \dfrac{1}{2}\left(\arctan\dfrac{a+b}{p} + \arctan\dfrac{a-b}{p}\right)$, for $\mathrm{Re}\,p > 0$;

L21: $\dfrac{e^{-at} - e^{-bt}}{t} \;\longleftrightarrow\; \ln\dfrac{p+b}{p+a}$, for $\mathrm{Re}(p+a) > 0$ and $\mathrm{Re}(p+b) > 0$;

L22: $1 - \mathrm{erf}\dfrac{a}{2\sqrt{t}}$ (with $a > 0$) $\;\longleftrightarrow\; \dfrac{1}{p}\,e^{-a\sqrt{p}}$, for $\mathrm{Re}\,p > 0$;

L23: $J_0(at) \;\longleftrightarrow\; \left(p^2+a^2\right)^{-1/2}$, for $\mathrm{Re}\,p > |\mathrm{Im}\,a|$, or, for real $a \neq 0$, $\mathrm{Re}\,p \ge 0$;

L24: $f(t) = \begin{cases} 1, & t > a > 0 \\ 0, & t < a \end{cases} \;\longleftrightarrow\; \dfrac{1}{p}\,e^{-pa}$, for $\mathrm{Re}\,p > 0$.
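Any entry of such a table can be spot-checked by evaluating the defining integral numerically. A minimal sketch (our own illustration, assuming numpy and scipy are available) for the pair L15 reads:

```python
# Numerical check of L15: L[1 - cos(a t)] = a^2 / (p (p^2 + a^2)).
import numpy as np
from scipy.integrate import quad

a, pval = 3.0, 2.0                     # illustrative values
num, _ = quad(lambda tt: (1 - np.cos(a*tt))*np.exp(-pval*tt),
              0, np.inf, limit=200)
exact = a**2/(pval*(pval**2 + a**2))
print(num, exact)                      # should agree to quadrature accuracy
```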
This question is trivially answered by the direct calculation in which integration by parts is used:
$$ L\left[f'(t)\right] = \int_0^{\infty} e^{-pt}\frac{df}{dt}\,dt = f(t)e^{-pt}\Big|_0^{\infty} - \int_0^{\infty} f(t)(-p)e^{-pt}\,dt = -f(0) + pL\left[f(t)\right], \qquad(6.36) $$
where we have made use of the fact that the upper limit $t = \infty$ in the free term can be omitted. Indeed, since the LT of $f(t)$ exists for any $p$ satisfying $\mathrm{Re}(p) = \mathrm{Re}(x+iy) = x > x_0$ with some growth order parameter $x_0$, then $|f(t)| \le Me^{x_0 t}$. Therefore, the free term is bounded from above,
$$ \left|f(t)e^{-pt}\right| \le Me^{x_0 t}\left|e^{-(x+iy)t}\right| = Me^{x_0 t}e^{-xt} = Me^{-(x-x_0)t}, $$
and hence tends to zero as $t\to\infty$. For the second derivative, introducing $g(t) = f'(t)$, we similarly obtain
$$ L\left[f''(t)\right] = L\left[g'(t)\right] = pL\left[g(t)\right] - g(0) = pL\left[f'(t)\right] - f'(0) = p\left(pL\left[f(t)\right] - f(0)\right) - f'(0), $$
i.e.,
$$ L\left[f''(t)\right] = p^2 L\left[f(t)\right] - pf(0) - f'(0). \qquad(6.37) $$
The obtained results (6.36) and (6.37) clearly show that the LTs of the first and the second derivatives of a function $f(t)$ are expressed via the LT of the function itself multiplied by $p$ or $p^2$, respectively, minus a constant. This means that, since the differentiation turns into multiplication in the Laplace space, a differential equation would turn into an algebraic one, allowing one to find the image $F(p)$ of the solution of the equation. Of course, in this case the crucial step is the one of inverting the LT and finding the original from its image, a problem which may be non-trivial.
Problem 6.23. In fact, this way it is possible to obtain a general formula for the $n$-th derivative, $L\left[f^{(n)}(t)\right]$. Using induction, prove the following result:
$$ L\left[f^{(n)}(t)\right] = p^n F(p) - \sum_{k=0}^{n-1} p^{n-1-k} f^{(k)}(0), \qquad n = 1, 2, 3, \ldots, \qquad(6.38) $$
where $f^{(0)}(0) = f(0)$. Note that $f(0)$ and the derivatives $f^{(k)}(0)$ are understood as calculated in the limit $t\to +0$.
So, any differentiation of f .t/ always turns into multiplication after the LT.
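The rule (6.38) can be verified symbolically for a concrete function; a short sketch (assuming Python with sympy) for $n = 2$ and $f(t) = \sin\omega t$:

```python
# Check Eq. (6.38) with n = 2: L[f''] = p^2 F(p) - p f(0) - f'(0).
import sympy as sp

t, w = sp.symbols('t w', positive=True)
p = sp.symbols('p', positive=True)
f = sp.sin(w*t)

F = sp.laplace_transform(f, t, p, noconds=True)
lhs = sp.laplace_transform(sp.diff(f, t, 2), t, p, noconds=True)
rhs = p**2*F - p*f.subs(t, 0) - sp.diff(f, t).subs(t, 0)
print(sp.simplify(lhs - rhs))   # expect 0
```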
There is also a simple formula allowing one to calculate the inverse LT of the $n$-th derivative $F^{(n)}(p)$ of the image, see Eq. (6.12):
$$ L^{-1}\left[F^{(n)}(p)\right] = (-t)^n f(t). $$
Problem 6.24. Generalise the result (6.36) for the derivative of $f(t)$ for the case when $f(t)$ has a discontinuity of the first kind at the point $t_1 > 0$:
$$ L\left[f'(t)\right] = pL\left[f(t)\right] - f(0) - \left[f\left(t_1^+\right) - f\left(t_1^-\right)\right]e^{-pt_1}, $$
where $t_1^+ = t_1 + 0$ and $t_1^- = t_1 - 0$. Note that the last extra term, which is proportional to the value of the jump of $f(t)$ at $t_1$, disappears if the function does not jump, and we return to our previous result (6.36). [Hint: split the integration in the definition of the LT into two regions by the point $t_1$ and then take each integral by parts in the same way as when deriving Eq. (6.36).]
There are two simple properties of the LT related to a shift of either the original or the image, which we formulate as a problem for the reader to prove:

Problem 6.25. Using the definition of the LT, prove the following formulae:
$$ L\left[f(t-\tau)\right] = e^{-p\tau}F(p), \quad \tau > 0, \qquad(6.40) $$
$$ L\left[e^{-\lambda t}f(t)\right] = F(p+\lambda). \qquad(6.41) $$

Note that it is implied (as usual) that $f(t) = 0$ for $t < 0$ in both these equations. In particular, $f(t-\tau) = 0$ for $t < \tau$ (where $\tau > 0$).
Let us illustrate the first identity, which may be found useful when calculating the LT of functions obtained by shifting a given function along the $t$-axis. As an example, we shall first work out the LT of the finite-width step function shown in the left panel of Fig. 6.5:
$$ L\left[\Pi(t)\right] = \int_0^{T} e^{-pt}\,dt = \frac{1}{p}\left(1 - e^{-pT}\right). \qquad(6.42) $$
Correspondingly, the LT of the function shifted to the right by $\tau$ is then
$$ L\left[\Pi(t-\tau)\right] = \frac{e^{-p\tau}}{p}\left(1 - e^{-pT}\right). $$
Consider now a wave signal composed of identical unit step impulses which start at positions $t_k = k(T+\tau)$, where $k = 0, 1, 2, \ldots$, as shown in Fig. 6.6. The LT of such a function is a sum of contributions from each impulse:
$$ L\left[f(t)\right] = L\left[\Pi(t)\right] + L\left[\Pi(t-t_1)\right] + L\left[\Pi(t-t_2)\right] + \cdots = \sum_{k=0}^{\infty} L\left[\Pi(t-t_k)\right] $$
$$ = \frac{1}{p}\left(1 - e^{-pT}\right)\sum_{k=0}^{\infty} e^{-pt_k} = \frac{1}{p}\left(1 - e^{-pT}\right)\sum_{k=0}^{\infty} e^{-p(T+\tau)k} = \frac{1}{p}\,\frac{1 - e^{-pT}}{1 - e^{-p(T+\tau)}}. $$
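The geometric-sum image derived above is easily confirmed numerically; here is a minimal sketch (our own check, with illustrative parameter values) comparing a truncated sum of shifted images with the closed form:

```python
# Pulse train: sum of shifted step images vs the closed geometric form.
import numpy as np

T, tau, pval = 1.0, 0.5, 0.8           # width, gap, (real) p: all illustrative
single = (1 - np.exp(-pval*T))/pval    # image of one pulse, Eq. (6.42)
total = sum(single*np.exp(-pval*(T + tau)*k) for k in range(2000))
closed = single/(1 - np.exp(-pval*(T + tau)))
print(total, closed)                   # should agree to machine precision
```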
Problem 6.26. Show that the LT of the waveform shown in Fig. 6.7 is given by
$$ L\left[f(t)\right] = \frac{\omega}{p^2+\omega^2}\,\frac{1 + e^{-pT/2}}{1 - e^{-p(\tau+T/2)}}, $$
where $T = 2\pi/\omega$ and the first waveform is defined on the interval $0 < t < T/2 + \tau$ as $f(t) = \sin\omega t$ for $0 < t < T/2$ and $f(t) = 0$ for $T/2 < t < T/2 + \tau$. In the case of $\tau = T/2$ this waveform corresponds to a half-wave rectifier which removes the negative part of the signal.
Problem 6.27. The same for the waveform shown in Fig. 6.8:
$$ L\left[f(t)\right] = \frac{2}{Tp^2}\,\frac{\left(1 - e^{-pT/2}\right)^2}{1 - e^{-p(\tau+T)}}. $$

Problem 6.28. The same for the waveform shown in Fig. 6.9(a):
$$ L\left[f(t)\right] = \frac{A}{p}\,\frac{1}{1 - e^{-p\tau}}. $$

Problem 6.29. The same for the waveform shown in Fig. 6.9(b):
$$ L\left[f(t)\right] = \frac{A}{p}\,\frac{1 - e^{-p\tau}}{1 + e^{-p\tau}}. $$
We saw in Sect. 6.3.1 that the LT of the first derivative of $f(t)$ is obtained essentially by multiplying its image $F(p)$ by $p$. We shall now see that the LT of the integral of $f(t)$ can be obtained by dividing $F(p)$ by $p$:
$$ L\left[\int_0^t f(\tau)\,d\tau\right] = \frac{F(p)}{p}. \qquad(6.43) $$
Fig. 6.9 Waveforms to Problem 6.28 (a) and Problem 6.29 (b)
Indeed, consider
$$ L\left[\int_0^t f(\tau)\,d\tau\right] = \int_0^{\infty} dt\, e^{-pt}\int_0^t d\tau\, f(\tau). $$
Interchanging the order of integration (considering carefully the change of the limits in the $(t, \tau)$ plane), we obtain
$$ L\left[\int_0^t f(\tau)\,d\tau\right] = \int_0^{\infty} d\tau\, f(\tau)\int_{\tau}^{\infty} e^{-pt}\,dt = \int_0^{\infty} d\tau\, f(\tau)\,\frac{e^{-p\tau}}{p} = \frac{1}{p}\int_0^{\infty} d\tau\, f(\tau)e^{-p\tau} = \frac{F(p)}{p}, $$
as required. When integrating the exponential function $e^{-pt}$ with respect to $t$, we set to zero the result at $t\to\infty$, which is valid for any $\mathrm{Re}(p) > 0$.
In the above problem we have introduced a useful function called the complementary error function:
$$ \mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_x^{\infty} e^{-t^2}\,dt. \qquad(6.44) $$
Here the $s$-integral is taken along any path in the complex plane connecting the points $p$ and $s_1 = \infty$, where $|s_1| = \infty$ and $\mathrm{Re}(s_1) > 0$. Indeed,
$$ \int_p^{\infty} F(s)\,ds = \int_p^{\infty} ds\int_0^{\infty} dt\, f(t)e^{-st} = \int_0^{\infty} dt\, f(t)\int_p^{\infty} ds\, e^{-st} = \int_0^{\infty} dt\, f(t)\,\frac{e^{-pt}}{t} = \int_0^{\infty} dt\,\frac{f(t)}{t}\,e^{-pt} = L\left[\frac{f(t)}{t}\right], $$
the desired result. Again, when integrating over $s$, we set the exponential function $e^{-st}$ to zero at the upper limit $s_1 = \infty$ since $\mathrm{Re}(s_1) > 0$.
Check that its LTs calculated directly and when using Eq. (6.43) do coincide.
Problem 6.32. Prove the following formulae:
$$ L\left[\frac{e^{-\alpha t} - e^{-\beta t}}{t}\right] = \ln\frac{p+\beta}{p+\alpha}; $$
$$ L\left[\frac{\sin\omega t}{t}\right] = \frac{\pi}{2} - \arctan\frac{p}{\omega} = \arctan\frac{\omega}{p}; $$
$$ L\left[\frac{\cos\omega_1 t - \cos\omega_2 t}{t}\right] = \frac{1}{2}\ln\frac{p^2+\omega_2^2}{p^2+\omega_1^2}. $$
Problem 6.33. Prove the following formulae valid for $\mathrm{Re}(p) > 0$:
$$ L\left[\int_0^t \frac{\cos\tau}{\sqrt{2\pi\tau}}\,d\tau\right] = \frac{\sqrt{p+i} + \sqrt{p-i}}{2\sqrt{2}\,p\sqrt{p^2+1}}; $$
$$ L\left[\int_0^t \frac{\sin\tau}{\sqrt{2\pi\tau}}\,d\tau\right] = \frac{i\left(\sqrt{p-i} - \sqrt{p+i}\right)}{2\sqrt{2}\,p\sqrt{p^2+1}}. $$
Let $G(p)$ and $F(p)$ be LTs of the functions $g(t)$ and $f(t)$, respectively:
$$ G(p) = L\left[g(t)\right] = \int_0^{\infty} e^{-pt_1}g(t_1)\,dt_1, \qquad(6.47) $$
$$ F(p) = L\left[f(t)\right] = \int_0^{\infty} e^{-pt_2}f(t_2)\,dt_2. \qquad(6.48) $$
Consider their product,
$$ G(p)F(p) = \int_0^{\infty}\int_0^{\infty} e^{-p(t_1+t_2)}g(t_1)f(t_2)\,dt_1\,dt_2. $$
This is a double integral in the $(t_1, t_2)$ plane. Let us replace the integration variable $t_1$ with $t = t_1 + t_2$. This yields
$$ G(p)F(p) = \int_0^{\infty} dt_2\int_{t_2}^{\infty} dt\, e^{-pt}g(t-t_2)f(t_2). \qquad(6.49) $$
At the next step we interchange the order of the integrals. This has to be done with care as the limits will change:
$$ G(p)F(p) = \int_0^{\infty} dt\int_0^{t} dt_2\, e^{-pt}g(t-t_2)f(t_2) = \int_0^{\infty} dt\, e^{-pt}\int_0^{t} g(t-t_2)f(t_2)\,dt_2. \qquad(6.50) $$
The function
$$ h(t) = \int_0^t g(t-\tau)f(\tau)\,d\tau = (g*f)(t) \qquad(6.51) $$
is called a convolution of the functions $g(t)$ and $f(t)$. Note that the convolution is symmetric:
$$ \int_0^t g(t-\tau)f(\tau)\,d\tau = \int_0^t f(t-\tau)g(\tau)\,d\tau \quad\Longrightarrow\quad (g*f)(t) = (f*g)(t), $$
as is easily seen by changing the integration variable $\tau \to t - \tau$.
The convolution we introduced here is very similar to the one we defined when considering the FT in Sect. 5.2.3. Since the LT and FT are closely related, it is no surprise that in both cases the convolution theorem has exactly the same form.⁶
Problem 6.34. Use the convolution theorem to prove the integral rule (6.43).
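As a numerical illustration of the convolution theorem (not from the original text; it assumes numpy and scipy are available), one may compare $L[(g*f)(t)]$ with $G(p)F(p)$ for simple originals:

```python
# Convolution theorem check for g(t) = exp(-t), f(t) = sin(t) at a real p.
import numpy as np
from scipy.integrate import quad

pval = 1.3
g = lambda tt: np.exp(-tt)
f = lambda tt: np.sin(tt)

def conv(tt):                           # (g*f)(t) = int_0^t g(t-s) f(s) ds
    val, _ = quad(lambda s: g(tt - s)*f(s), 0, tt)
    return val

lhs, _ = quad(lambda tt: conv(tt)*np.exp(-pval*tt), 0, 50)  # truncated at t=50
G = 1/(pval + 1)                        # L[exp(-t)]
F = 1/(pval**2 + 1)                     # L[sin t]
print(lhs, G*F)                         # the two numbers should agree closely
```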
One of the main applications of the LT is in solving ODEs and their systems for specific initial conditions. Here we shall illustrate this point on a number of simple examples. More examples from physics will be given in Sect. 6.5. We shall be using simplified notations from now on: if the original function is $f(t)$, its image will be denoted by the corresponding capital letter, $F$ in this case, with its argument $p$ usually omitted.
The general scheme for the application of the LT method is sketched in Fig. 6.10: since a direct solution of the differential equation may be difficult, one uses the LT to rewrite the equation in a simpler form for the image $Y$, which is then solved. In particular, as we shall see, a linear DE with constant coefficients turns into a linear algebraic equation for the image $Y$, which can always be solved. Once the image is known, one performs the inverse LT to find the function $y(t)$ of interest. The latter will automatically satisfy the initial conditions.
Example 6.1. ▶ Consider the following problem:
$$ y'' + 4y' + 4y = t^2 e^{-2t} \quad\text{with}\quad y(0) = y'(0) = 0. \qquad(6.53) $$
Fig. 6.10 The working chart for using the LT method when solving differential equations
⁶ Note, however, that the two definitions of the convolution in the cases of LT and FT are not identical: the limits are $0$ and $t$ in the case of the LT, while when we considered the FT the limits were $\pm\infty$, Eq. (5.39).
Performing the LT of both sides of Eq. (6.53) and denoting $Y = L[y(t)]$, we have in the left-hand side
$$ L\left[y'' + 4y' + 4y\right] = \left(p^2 Y - py(0) - y'(0)\right) + 4\left(pY - y(0)\right) + 4Y, $$
where we made use of Eqs. (6.36) and (6.37) for the derivatives. From Eq. (6.11), the right-hand side is
$$ L\left[t^2 e^{-2t}\right] = \frac{2}{(p+2)^3}, $$
so that, after using the initial conditions, we obtain the following algebraic equation for $Y$:
$$ p^2 Y + 4pY + 4Y = \frac{2}{(p+2)^3}, $$
which is trivially solved to yield:
$$ Y = \frac{2}{(p+2)^3}\,\frac{1}{p^2+4p+4} = \frac{2}{(p+2)^5} \quad\Longrightarrow\quad y(t) = \frac{1}{12}\,t^4 e^{-2t}, \qquad(6.54) $$
where we have used Eq. (6.11) again to perform the inverse LT. ◀
This example illustrates the power of the LT method. Normally we would look for
a general solution of the corresponding homogeneous equation with two arbitrary
constants; then we would try to find a particular integral which satisfies the whole
equation with the right-hand side. Finally, we would use the initial conditions to
find the two arbitrary constants. Using the LT, the full solution satisfying the initial
conditions is obtained in just one step!
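The whole chain of Example 6.1 can be reproduced with a few lines of computer algebra; the sketch below (our own illustration, assuming sympy) transforms the equation, solves for the image and inverts it:

```python
# Example 6.1 via the LT: y'' + 4y' + 4y = t^2 exp(-2t), y(0) = y'(0) = 0.
import sympy as sp

t = sp.symbols('t', positive=True)
p = sp.symbols('p', positive=True)

rhs = sp.laplace_transform(t**2*sp.exp(-2*t), t, p, noconds=True)  # 2/(p+2)^3
Y = sp.symbols('Y')
Yimg = sp.solve(sp.Eq(p**2*Y + 4*p*Y + 4*Y, rhs), Y)[0]   # zero initial data
y = sp.inverse_laplace_transform(Yimg, p, t)
print(sp.simplify(y))   # expect t**4*exp(-2*t)/12 (times Heaviside(t))
```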
Moreover, it is easy to see that a general solution of a DE with constant coefficients can always be obtained in the form of an integral for an arbitrary function $f(t)$ in the right-hand side of the DE. We have already discussed this point in Sect. 5.3.3 when considering an application of the Fourier transform method for solving DEs. There we introduced the so-called Green's function of the DE. A very similar approach can be introduced within the framework of the LT as well. Indeed, consider a second order inhomogeneous DE:
$$ y'' + a_1 y' + a_2 y = f(t), \qquad(6.55) $$
where $a_1$ and $a_2$ are some constant coefficients. By applying the LT to both sides, we obtain
$$ \left(p^2 Y - py(0) - y'(0)\right) + a_1\left(pY - y(0)\right) + a_2 Y = F, $$
which yields
$$ Y = \frac{1}{p^2 + a_1 p + a_2}\,F + \frac{(p+a_1)y(0) + y'(0)}{p^2 + a_1 p + a_2}. \qquad(6.56) $$
Consider the function
$$ G(p) = \frac{1}{p^2 + a_1 p + a_2} = \frac{1}{(p - p_+)(p - p_-)}, \qquad(6.57) $$
where $p_+$ and $p_-$ are the two roots of the corresponding quadratic polynomial in the denominator. This function can serve as the image for the following original:
$$ g(t) = L^{-1}\left[G(p)\right] = \frac{1}{p_+ - p_-}\left(e^{p_+ t} - e^{p_- t}\right) \quad\text{or}\quad g(t) = te^{p_+ t}, \qquad(6.58) $$
depending on whether $p_+$ and $p_-$ are different or the same (repeated roots).⁷ Then, the first term in the solution (6.56) can be written as the convolution
$$ y_p(t) = \int_0^t g(t-\tau)f(\tau)\,d\tau = \int_0^t f(t-\tau)g(\tau)\,d\tau. \qquad(6.60) $$
Either form is, of course, valid, as the convolution is symmetric! The second term in the right-hand side of Eq. (6.56) corresponds to the solution of the corresponding homogeneous DE which is already adapted to the initial conditions; obviously, it corresponds to the complementary solution (its original is obtained using integration in the complex plane). The function $g(t)$ is called the Green's function of the DE (6.55). It satisfies the DE with the delta function $f(t) = \delta(t)$ in the right-hand side (cf. Sect. 5.3.3) and zero initial conditions. Recall that the image of the delta function is just unity, see Eq. (6.15).
⁷ In the latter case the inverse LT of $G(p) = 1/\left(p - p_+\right)^2$ can be obtained either directly or by taking the limit $p_- \to p_+$ in $g(t)$ from the first formula in Eq. (6.58) obtained by assuming $p_+ \neq p_-$.
Example 6.2. ▶ Solve the DE $y'' + 3y' + 2y = f(t)$ with zero initial conditions. Our task is to obtain a particular solution of this DE, subject to the given initial conditions, for an arbitrary function $f(t)$.
Solution. Let $L[f(t)] = F$ and $L[y(t)] = Y$. Then, performing the LT of both sides of the DE, we obtain
$$ \left(p^2 Y - py(0) - y'(0)\right) + 3\left(pY - y(0)\right) + 2Y = F. $$
This equation is simplified further by applying our zero initial conditions. Then, the obtained equation is easily solved for the image $Y$ of $y(t)$:
$$ p^2 Y + 3pY + 2Y = F \quad\Longrightarrow\quad Y = \frac{1}{p^2 + 3p + 2}\,F. $$
Here $G(p) = \left(p^2 + 3p + 2\right)^{-1}$ is the image of the Green's function. The original corresponding to it is easily calculated, e.g. by decomposing into partial fractions:
$$ g(t) = L^{-1}\left[G(p)\right] = L^{-1}\left[\frac{1}{p^2+3p+2}\right] = L^{-1}\left[\frac{1}{p+1}\right] - L^{-1}\left[\frac{1}{p+2}\right] = e^{-t} - e^{-2t}. $$
Correspondingly, the full solution of the DE, because of the convolution theorem, Sect. 6.3.4, is
$$ y(t) = \int_0^t\left[e^{-(t-\tau)} - e^{-2(t-\tau)}\right]f(\tau)\,d\tau \quad\text{or}\quad \int_0^t f(t-\tau)\left(e^{-\tau} - e^{-2\tau}\right)d\tau. $$
Note that the obtained forms of the solution do correspond to the general result (6.60). This should not be surprising as in this problem we deal with zero initial conditions. ◀
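One may also check the Green's-function formula against a direct numerical integration of the same DE; a minimal sketch (our own, assuming numpy and scipy) with the illustrative choice $f(t) = \cos t$:

```python
# Green's-function solution of y'' + 3y' + 2y = f(t), zero initial data,
# compared with scipy's initial-value solver for f(t) = cos(t).
import numpy as np
from scipy.integrate import quad, solve_ivp

f = lambda tt: np.cos(tt)
g = lambda tt: np.exp(-tt) - np.exp(-2*tt)        # Green's function above

def y_green(tt):
    val, _ = quad(lambda s: g(tt - s)*f(s), 0, tt)
    return val

sol = solve_ivp(lambda tt, u: [u[1], f(tt) - 3*u[1] - 2*u[0]],
                (0, 5), [0, 0], dense_output=True, rtol=1e-9, atol=1e-12)
print(y_green(5.0), sol.sol(5.0)[0])              # should coincide
```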
Problem 6.35. Show that the equation $y' + 3y = e^{t}$, $y(0) = 0$, has the solution $y(t) = \frac{1}{4}\left(e^{t} - e^{-3t}\right)$.

Problem 6.36. Show that the equation $y'' + 4y = \sin 2t$ with the initial conditions $y(0) = 10$ and $y'(0) = 0$ has the solution
$$ y(t) = 10\cos 2t + \frac{1}{8}\left(\sin 2t - 2t\cos 2t\right). $$
Problem 6.37. Show that the equation $y' - y = 2e^{t}$ with the initial condition $y(0) = 3$ has the solution $y(t) = (3 + 2t)e^{t}$.

Problem 6.38. Show that the solution of the following DE,
$$ y'' + 9y = \cos 3t, \quad y(0) = 0, \quad y'(0) = 6, $$
is $y(t) = \left(2 + t/6\right)\sin 3t$.

Problem 6.39. Consider the DE
$$ T\frac{dx}{dt} + x = f(t), $$
where $T$ is some positive constant. Using the LT method, show that the general solution of this equation satisfying the initial condition $x(0) = 0$ is
$$ x(t) = \frac{1}{T}\,e^{-t/T}\int_0^t f(\tau)\,e^{\tau/T}\,d\tau. $$
was externally applied to the oscillator around some time $t_0 \ge 0$. Show that the response of the system in each case is
$$ y_n(t) = n\,e^{-(t-t_0)}\left[(t - t_0 - 1)\left(e^{1/2n} - e^{-1/2n}\right) + \frac{1}{2n}\left(e^{1/2n} + e^{-1/2n}\right)\right]. $$
Show that the same result is obtained with $f(t) = \delta(t - t_0)$ in the $n\to\infty$ limit. Is this coincidence accidental?
Consider the system of two DEs,
$$ \frac{dy}{dt} - 2z = 2, \qquad \frac{dz}{dt} + 2y = 0, $$
which are subject to the zero initial conditions $y(0) = z(0) = 0$. Applying the LT to both equations and introducing the images $y(t)\to Y(p)$ and $z(t)\to Z(p)$, we obtain two algebraic equations for them:
$$ pY - 2Z = \frac{2}{p} \quad\text{and}\quad pZ + 2Y = 0, $$
whose solution is
$$ Y = \frac{2}{4+p^2}, \qquad Z = -\frac{4}{p\left(p^2+4\right)}. $$
Applying the inverse LT, we finally obtain $y(t) = \sin 2t$ and $z(t) = -1 + \cos 2t$. The solution satisfies the equations and the initial conditions. One can also see that there is one more advantage of using the LT if not all but only one particular function is needed. At the intermediate step, when solving for the images in the Laplace space, each solution is obtained independently. Hence, if only one unknown function is
needed, it can be obtained and inverted; this brings significant time savings for systems of more than two equations. This situation often appears when solving electrical circuit equations (Sect. 6.5.1), when only the current (or a voltage drop) through a particular circuit element is sought.
$$ y'' + z'' - z' = 0, \qquad y' + z' - 2z = 1 - e^{-t}, $$
Systems of DEs we considered in Sect. 1.3 can also be solved using the LT method. The convenience of the latter method is that the corresponding eigenproblem does not appear. Moreover, there we assumed an exponential trial solution; no need for this assumption either: the whole solution appears naturally if the LT method is applied.
where $\mathbf{v}_0$ is the initial velocity of the particle, $\hat{\mathbf{B}} = \mathbf{B}/B$ is the unit vector in the direction of the magnetic field, and $\omega = qB/m$. It is easy to see that in the case of $\mathbf{v}_0 = \left(0, v_\perp, v_\parallel\right)$ and $\hat{\mathbf{B}} = (0, 0, 1)$ the same result as in Eq. (1.118) is immediately recovered.
The LT method has been found extremely useful in solving electronic circuit problems. This is because the famous Kirchhoff's laws for the circuits represent integro-differential equations with constant coefficients.
$$ u_M(t) = M\,\frac{di(t)}{dt}, \qquad(6.65) $$
where the directions of the current as shown in Fig. 6.11 are assumed. The first Kirchhoff's law states that for any closed mesh in a circuit the total drop of the voltage across the mesh calculated along a particular direction (see an example in Fig. 6.12) is equal to the applied voltage (zero if no applied voltage is present, i.e. there is no battery attached):
$$ \sum_k R_k i_k + \sum_k L_k\frac{di_k}{dt} + \sum_k \frac{1}{C_k}\left[\int_0^t i_k(\tau)\,d\tau + q_{0k}\right] + \sum_k M_{kk'}\frac{di_{k'}}{dt} = \sum_k v_k(t), \qquad(6.66) $$
where we sum over all elements appearing in the mesh, with $M_{kk'}$ being the mutual induction coefficient between two meshes $k$ and $k'$ (note that $M_{kk'} = M_{k'k}$). The current dependent terms (in the left-hand side) are to be added algebraically with the sign defined by the chosen directions of the currents with respect to the chosen positive direction in the mesh (it is indicated in Fig. 6.12), and the voltages $v_k(t)$ in the right-hand side are also to be summed up with the appropriate sign depending on their polarity.
The second Kirchhoff's law states that the sum of all currents through any vertex is zero:
$$ \sum_k i_k = 0. \qquad(6.67) $$
To solve these equations, we apply the LT method. Since the terms are added algebraically, we can reformulate the rules of constructing the Kirchhoff's equations directly in the Laplace space if we consider how each element of the mesh would contribute. If $I(p)$ is the image for a current passing through a resistance $R$ and $U_R(p)$ is the corresponding voltage drop, then the contribution due to the resistance would simply be $U_R = RI$. For a capacitance $C$, the LT of Eq. (6.63) gives
$$ U_C = \frac{1}{pC}\left(I + q_0\right), \qquad(6.68) $$
while for the induction elements we would similarly have
$$ U_L = L\left(pI - i_0\right). \qquad(6.69) $$
Here $q_0$ and $i_0$ are the initial (at $t = 0$) charge on the capacitor and the current through the induction. Correspondingly, the vertex equation (6.67) has the same form for the images as for the originals, while the Eq. (6.66) describing the voltage drop across a mesh is rewritten via images as
$$ \sum_k R_k I_k + \sum_k L_k\left(pI_k - i_{0k}\right) + \sum_k \frac{1}{pC_k}\left(I_k + q_{0k}\right) + \sum_k M_{kk'}\left(pI_{k'} - i_{0k'}\right) = \sum_k V_k(p). \qquad(6.70) $$
Fig. 6.13 Simple electrical circuits for (a) Example 6.3, (b) Example 6.4 and (c) Problem 6.45. Here voltages and currents are understood to be images of the corresponding quantities in the Laplace space and hence are shown using capital letters

In the case of a constant voltage (a battery or Emf) $V_k(p) = v_k/p$. Here and in the following we use capital letters for images and small letters for the originals.
Example 6.3. ▶ Consider a circuit with Emf $v_0$ given in Fig. 6.13(a). Initially, the switch was opened. Calculate the current after the switch was closed at $t = 0$ and then determine the charge on the capacitor at long times.
Solution. In this case we have a single mesh with zero initial charge on the capacitor. Then, the first Kirchhoff's equation reads
$$ \frac{v_0}{p} = RI + \frac{1}{Cp}\,I \quad\Longrightarrow\quad I = \frac{v_0}{R}\,\frac{1}{p + (CR)^{-1}}. $$
The inverse LT is immediate, $i(t) = \left(v_0/R\right)e^{-t/RC}$, so that the charge accumulated on the capacitor, $q(t) = \int_0^t i(\tau)\,d\tau = Cv_0\left(1 - e^{-t/RC}\right)$, tends to $Cv_0$ at long times. ◀
Problem 6.45. Show that the currents in the circuit shown in Fig. 6.13(c) after the switch was closed are
$$ i_1(t) = v_0 C\,\delta(t) + \frac{v_0}{R}\left(1 - e^{-Rt/L}\right), \qquad i_2(t) = \frac{v_0}{R}\left(1 - e^{-Rt/L}\right). $$
Here $v_0$ is the Emf. Interestingly, the current through the capacitance is zero at $t > 0$.
Problem 6.46. Consider the circuit shown in Fig. 6.14(a). Initially the switch was opened and the capacitor gets charged up. Then at $t = 0$ it was closed. Show that the currents through the resistance next to the Emf $v_0$ and through the capacitor, respectively, are given by the equations
$$ i_1(t) = \frac{v_0}{2R}\left(1 - e^{-2t/CR}\right) \quad\text{and}\quad i_2(t) = \frac{v_0}{R}\,e^{-2t/CR}. $$
Problem 6.47. In the circuit shown in Fig. 6.14(b) initially the switch was opened. After a sufficiently long time, at $t = 0$, the switch was closed. Show that the current through the Emf $v_0$ is
$$ i(t) = \frac{v_0}{R_1}\left(1 - \frac{R_2}{R_1+R_2}\,e^{-R_1 t/L}\right). $$
[Hint: note that at $t = 0$ there is a current passing through the outer mesh.]

Problem 6.48. Consider the same circuit as in the previous problem, but this time initially for a rather long time the switch was closed and then at $t = 0$ it was opened. Show that this time the current flowing through the Emf $v_0$ will be
$$ i(t) = \frac{v_0}{R_1+R_2}\left(1 + \frac{R_2}{R_1}\,e^{-(R_1+R_2)t/L}\right). $$
Fig. 6.14 Electrical circuits used in Problem 6.46 (a) and in Problems 6.47 and 6.48 (b)
where $\omega_0$ is its fundamental frequency, $\gamma(t)$ is the so-called friction kernel (without loss of generality, it can be considered as an even function of time) and $\xi(t)$ is the random force. The latter force is due to interaction with the surrounding environment, whose state is uncertain. The friction kernel must be a decaying function of time tending to zero at long times.
Applying the LT to both sides of the equation and using the convolution theorem, we obtain
$$ X(p) = G(p)\left[\left(p + \gamma(p)\right)x_0 + v_0\right] + G(p)\,\Xi(p), \qquad(6.72) $$
where $\Xi(p) = L\left[\xi(t)\right]$ and $\gamma(p) = L\left[\gamma(t)\right]$. Here $x_0$ and $v_0 = \dot{x}_0$ are the initial position and velocity of the particle, and
$$ G(p) = \frac{1}{p^2 + p\gamma(p) + \omega_0^2}. \qquad(6.73) $$

⁸ For a more general discussion, including a derivation of the equations of motion for a multidimensional open classical system (the so-called Generalised Langevin Equation), as well as references to earlier literature, see L. Kantorovich, Phys. Rev. B 78, 094304 (2008).
It is easy to see that $G(p)$ is the LT of the Green's function $G(t)$ of the equation (6.71). Indeed, replacing the right-hand side in the equation with the delta function, $\xi(t)\to\delta(t)$, assuming zero initial conditions, and performing the LT, we obtain exactly the expression (6.73) for $X(p)$ in this case.
Performing the inverse LT of Eq. (6.72), we obtain
$$ x(t) = \Phi(t)x_0 + G(t)v_0 + \int_0^t G(t-\tau)\,\xi(\tau)\,d\tau, \qquad(6.74) $$
where
$$ \Phi(t) = L^{-1}\left[G(p)\left(p + \gamma(p)\right)\right] = L^{-1}\left[\frac{1}{p}\left(1 - \omega_0^2\,G(p)\right)\right] = 1 - \omega_0^2\int_0^t G(\tau)\,d\tau. \qquad(6.75) $$
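For any concrete friction kernel, $G(t)$ can be obtained by numerical inversion of Eq. (6.73). The sketch below is our own illustration: it assumes the mpmath library and an exponential kernel $\gamma(t) = \gamma_0 e^{-t/\tau_0}$ (our choice, not the book's), whose image is $\gamma(p) = \gamma_0\tau_0/(1 + p\tau_0)$:

```python
# Numerical inverse LT of G(p) = 1/(p^2 + p*gamma(p) + w0^2), Eq. (6.73),
# for an assumed exponential friction kernel gamma(t) = g0*exp(-t/tau0).
import mpmath as mp

w0, g0, tau0 = 1.0, 0.5, 2.0            # illustrative parameters only

def G(p):
    gamma_p = g0*tau0/(1 + p*tau0)      # image of the kernel
    return 1/(p**2 + p*gamma_p + w0**2)

for tt in (1.0, 5.0, 20.0):
    print(tt, mp.invertlaplace(G, tt, method='talbot'))
# G(t) is seen to decay at long times, as the argument below requires.
```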
Applying $t = 0$ in the solution (6.74), we deduce that
$$ \Phi(0) = 1 \quad\text{and}\quad G(0) = 0. \qquad(6.76) $$
Note that the first identity is consistent with the full solution (6.75) for the function $\Phi(t)$.
Differentiating now the solution (6.74) with respect to time, we get for the velocity:
$$ v(t) = \dot{x}(t) = \dot{\Phi}(t)x_0 + \dot{G}(t)v_0 + \int_0^t \dot{G}(t-\tau)\,\xi(\tau)\,d\tau, \qquad(6.77) $$
where the term arising from the differentiation of the integral with respect to the upper limit vanishes due to the second identity in Eq. (6.76). Applying $t = 0$ to the solution (6.77) for the velocity, we can deduce that
$$ \dot{\Phi}(0) = 0 \quad\text{and}\quad \dot{G}(0) = 1. \qquad(6.78) $$
Note that the first of these identities also follows immediately from Eqs. (6.75) and (6.76).
Equations (6.74) and (6.77) provide us with exact solutions for the position and velocity of the particle in the harmonic well under the influence of the external force $\xi(t)$. However, in our case this force is random, and hence these “exact” solutions have little value. Instead, it would be interesting to obtain statistical information about the position and velocity of the particle at long times $t\to\infty$, when the particle has “forgotten” its initial state described by $x_0$ and $v_0$. To investigate the behaviour of the particle at long times, it is customary to consider the appropriate correlation functions. The position autocorrelation function (or position–position correlation function) is
$$ \left\langle x(t)x(0)\right\rangle = \left\langle x(t)x_0\right\rangle = \Phi(t)\left\langle x_0^2\right\rangle + G(t)\left\langle v_0 x_0\right\rangle + \int_0^t G(t-\tau)\left\langle\xi(\tau)x_0\right\rangle d\tau. $$
In this case only the second term survives, requiring the function $G(t)$ to tend to zero when $t\to\infty$. Similarly, the velocity autocorrelation function,
$$ \left\langle v(t)v(0)\right\rangle = \dot{\Phi}(t)\left\langle x_0 v_0\right\rangle + \dot{G}(t)\left\langle v_0^2\right\rangle + \int_0^t \dot{G}(t-\tau)\left\langle\xi(\tau)v_0\right\rangle d\tau, $$
being proportional to $\dot{G}(t)$ (since the first and the last terms in the right-hand side must be zero), shows that the derivative of the Green's function should also tend to zero in the long time limit. So, we have established that $\Phi(t)$, $G(t)$ and $\dot{G}(t)$ must decay with time to zero, since the particle must forget its “past” at long enough times. Note that, according to (6.75),
$$ \dot{\Phi}(t) = -\omega_0^2\,G(t). \qquad(6.79) $$
Consider now the function
$$ u_1(t) = x(t) - \Phi(t)x_0 - G(t)v_0 = \int_0^t G(t-\tau)\,\xi(\tau)\,d\tau; \qquad(6.80) $$
it must tend to $x(\infty) = x_\infty$ at long times, as both $\Phi(t)$ and $G(t)$ tend to zero in this limit. Similarly, we can also introduce another function,
$$ u_2(t) = v(t) - \dot{\Phi}(t)x_0 - \dot{G}(t)v_0 = \int_0^t \dot{G}(t-\tau)\,\xi(\tau)\,d\tau, \qquad(6.81) $$
which must tend to the particle velocity $v(\infty) = \dot{x}(\infty) = v_\infty$ at long times.
Next we shall consider the equal-time correlation function $A_{11} = \left\langle u_1(t)u_1(t)\right\rangle$ of the function $u_1(t)$. To calculate it, we shall use the so-called second fluctuation-dissipation theorem, according to which
$$ \left\langle\xi(t)\,\xi(\tau)\right\rangle = \frac{1}{\beta}\,\gamma(t-\tau), \qquad(6.82) $$
where $\beta = 1/k_B T$. This relationship shows that the random forces at the current and previous times are correlated with each other. It is said that the noise provided by this type of the random force is “colored”, as opposed to the “white” noise of Eq. (5.82), when such correlations are absent.
Then,
$$ A_{11}(t) = \left\langle u_1(t)u_1(t)\right\rangle = \int_0^t d\tau_1\int_0^t d\tau_2\, G(t-\tau_1)\left\langle\xi(\tau_1)\xi(\tau_2)\right\rangle G(t-\tau_2) = \frac{1}{\beta}\int_0^t\!\!\int_0^t d\tau_1\,d\tau_2\, G(\tau_1)\,\gamma(\tau_1-\tau_2)\,G(\tau_2), \qquad(6.83) $$
and, similarly,
$$ A_{22}(t) = \left\langle u_2(t)u_2(t)\right\rangle = \frac{1}{\beta}\int_0^t\!\!\int_0^t d\tau_1\,d\tau_2\, \dot{G}(\tau_1)\,\gamma(\tau_1-\tau_2)\,\dot{G}(\tau_2). \qquad(6.85) $$
To calculate these correlation functions, we shall evaluate their time derivatives. Let us start from $A_{11}$ given by Eq. (6.83). Differentiating the right-hand side with respect to $t$, we find
$$ \dot{A}_{11}(t) = \frac{2}{\beta}\,G(t)\int_0^t \gamma(t-\tau)\,G(\tau)\,d\tau = \frac{2}{\beta}\,G(t)\,L^{-1}\left[\gamma(p)G(p)\right]. $$
Using the definition of the auxiliary function $\Phi(t)$ in the Laplace space, see Eq. (6.75), the inverse LT above is easily calculated to yield:
$$ L^{-1}\left[\gamma(p)G(p)\right] = L^{-1}\left[\left(p + \gamma(p)\right)G(p)\right] - L^{-1}\left[pG(p)\right] = \Phi(t) - \dot{G}(t). $$
In the last passage we have used the fact that $G(0) = 0$ and hence $L\left[\dot{G}(t)\right] = pG(p)$. Therefore,
$$ \dot{A}_{11}(t) = \frac{2}{\beta}\,G(t)\left[\Phi(t) - \dot{G}(t)\right] = -\frac{2}{\beta\omega_0^2}\,\Phi(t)\dot{\Phi}(t) - \frac{2}{\beta}\,G(t)\dot{G}(t) = -\frac{1}{\beta}\frac{d}{dt}\left[\frac{\Phi^2(t)}{\omega_0^2} + G^2(t)\right], $$
where Eq. (6.79) was employed to relate $G(t)$ to $\dot{\Phi}(t)$ in the first term. Therefore, integrating, and using the initial condition that $A_{11}(0) = 0$ (it follows from its definition as an average of $u_1(0) = 0$ squared, or directly from the expression (6.83)), we obtain
$$ A_{11}(t) = \frac{1}{\beta\omega_0^2} - \frac{1}{\beta}\left[\frac{\Phi^2(t)}{\omega_0^2} + G^2(t)\right]. \qquad(6.86) $$
Hence, at long times $A_{11} \to 1/\beta\omega_0^2$, $A_{12}(t)\to 0$ and $A_{22}(t)\to 1/\beta$, as $\Phi(t)$, $G(t)$ and $\dot{G}(t)$ vanish in this limit as discussed above.
The probability distribution function $P(u_1, u_2)$ gives the probability for the particle to be found with the variable $u_1$ being between $u_1$ and $u_1 + du_1$, and $u_2$ being between $u_2$ and $u_2 + du_2$. From the fact that both $u_1(t)$ and $u_2(t)$ are linear with respect to the noise $\xi(t)$, see Eqs. (6.80) and (6.81), it follows that either of the variables is Gaussian. That basically means that at long times the probability distribution is proportional to the exponential function with the exponent which is quadratic with respect to $u_1 = u_1(\infty) = x_\infty$ and $u_2 = u_2(\infty) = v_\infty$:
$$ P(u_1, u_2) \propto \exp\left(-\frac{1}{2}\,\mathbf{Y}^{T}\mathbf{A}^{-1}\mathbf{Y}\right), $$
where
$$ \mathbf{Y} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} x_\infty \\ v_\infty \end{pmatrix} \quad\text{and}\quad \mathbf{A} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} 1/\beta\omega_0^2 & 0 \\ 0 & 1/\beta \end{pmatrix}. $$
Since at long times the matrix $\mathbf{A}$ is diagonal, its inverse is trivially calculated:
$$ \mathbf{A}^{-1} = \begin{pmatrix} \beta\omega_0^2 & 0 \\ 0 & \beta \end{pmatrix}, $$
so that
$$ P(u_1, u_2) \propto \exp\left[-\beta\left(\frac{\omega_0^2 u_1^2}{2} + \frac{u_2^2}{2}\right)\right] = \exp\left[-\beta\left(\frac{\omega_0^2 x_\infty^2}{2} + \frac{v_\infty^2}{2}\right)\right], \qquad(6.89) $$
i.e. at long times the distribution function of the particle tends to the Gibbs distribution $P \propto e^{-\beta E}$ containing the particle total energy $E = v_\infty^2/2 + \omega_0^2 x_\infty^2/2$ (recall that we set the mass of the vibrating particle to be equal to one here).
Problem 6.51. Show that the average potential and kinetic energies satisfy the equipartition theorem:
$$ \left\langle U\right\rangle = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \frac{\omega_0^2 u_1^2}{2}\,P(u_1, u_2)\,du_1\,du_2 = \frac{1}{2\beta} = \frac{k_B T}{2}, $$
$$ \left\langle K\right\rangle = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \frac{u_2^2}{2}\,P(u_1, u_2)\,du_1\,du_2 = \frac{1}{2\beta} = \frac{k_B T}{2}, $$
where $P(u_1, u_2)$ is assumed normalised to unity.
where the random force satisfies the same condition (6.82) as for the particle in the harmonic well. Show that in this case the Green's function is $G(t) = L^{-1}\left[1/\left(p + \gamma(p)\right)\right]$, the velocity is
$$ v(t) = G(t)v_0 + \int_0^t G(t-\tau)\,\xi(\tau)\,d\tau, $$
while the equal-time correlation function for the variable $u(t) = v(t) - G(t)v_0$ is
$$ A(t) = \left\langle u(t)u(t)\right\rangle = \frac{1}{\beta} - \frac{1}{\beta}\,G^2(t), $$
which tends to $1/\beta$ at long times. Consequently, argue that the probability distribution for the Brownian particle at long times is Maxwellian:
$$ P(v_\infty) = \sqrt{\frac{\beta}{2\pi}}\;e^{-\beta v_\infty^2/2}, $$
so that the equipartition theorem is fulfilled in this case as well: $\left\langle v_\infty^2/2\right\rangle = 1/2\beta$.
Fig. 6.15 A “tree” of transitions from the initial state marked as 0 to all other possible states over time $t$ via a finite number of hops at times $t_1$, $t_2$, etc., $t_n = t$. Dashed horizontal lines correspond to the system remaining in the state from which the line starts over all the remaining time, while arrows indicate transitions between states; the latter are indicated by filled circles. State numbers are shown by the circles for clarity

⁹ Here we loosely follow (and in a rather simplified form) a detailed discussion which a reader can find in L. Kantorovich, Phys. Rev. B 75, 064305 (2007).
corresponds to the rate to hop to state $i_2$ during the third hop from the state reached after the second hop. Summing up all rates at the given $k$-th hop,
$$ R^{(k)} = \sum_{i=1}^{n^{(k)}} r_i^{(k)}, \qquad(6.90) $$
gives the total rate of leaving the state the system was in prior to the $k$-th hop, and hence it corresponds to the escape rate from this state. Above, $n^{(k)}$ is the total number of states available to the system to hop into during the $k$-th hop.
Now, let the system be in some state at time $t'$ after $(k-1)$ hops. We can then define the residence probability $P_0^{(k)}(t', t)$ for the system to remain in the current state until the final time $t$. In fact, this probability was already calculated in Sect. I.8.5.5 and is given by the formula:
$$ P_0^{(k)}\left(t', t\right) = e^{-R^{(k)}(t - t')}, \qquad(6.91) $$
where $R^{(k)}$ is the corresponding escape rate during the $k$-th hop. Therefore, the probability for the system to remain in the current state over the whole time $t - t'$ (and hence to make no hops at all during the $k$-th step) is given by $P_0^{(k)}(t', t)$.
Consider now an event whereby the system makes a single hop from the initial state to state $i_1$ at some time between $t_0$ and $t$, and then remains in that state until time $t$. The probability of this, an essentially one-hop event, is given by the integral:
$$ P_{i_1 0}^{(12)}(t_0, t) = \int_{t_0}^{t} P_0^{(1)}(t_0, t_1)\, r_{i_1}^{(1)}\,dt_1\, P_0^{(2)}(t_1, t) = \int_{t_0}^{t} dt_1\, P_0^{(1)}(t_0, t_1)\, r_{i_1}^{(1)}\, P_0^{(2)}(t_1, t). \qquad(6.92) $$
Here the system remains in its initial state up to some time $t_1$ (where $t_0 < t_1 < t$), the probability of that being $P_0^{(1)}(t_0, t_1)$; then it makes a single hop into state $i_1$ (the probability of this is $r_{i_1}^{(1)}\,dt_1$) and then remains in that state all the remaining time, the probability of the latter being $P_0^{(2)}(t_1, t)$. We integrate over all possible times $t_1$ to obtain the whole probability. The superscript in $P_{i_1 0}^{(12)}$ shows that this event is based on two elementary events: (1) a single hop, indicated by $i_1$ as a subscript, and (2) no transition thereafter, indicated by $0$ next to $i_1$ in the subscript. Using Eq. (6.91) for the residence probability and performing the integration, we obtain
$$ P_{i_1 0}^{(12)}(t_0, t) = \frac{r_{i_1}^{(1)}}{R^{(1)} - R^{(2)}}\left(e^{-R^{(2)}(t-t_0)} - e^{-R^{(1)}(t-t_0)}\right). \qquad(6.93) $$
This expression was obtained assuming different escape rates, $R^{(1)} \neq R^{(2)}$. If the escape rates are equal, then one can either perform the integration directly or take the limit of $x = R^{(2)} - R^{(1)} \to 0$ in the above formula. In either case we obtain
$$ P_{i_1 0}^{(12)}(t_0, t) = r_{i_1}^{(1)}\,(t - t_0)\,e^{-R^{(1)}(t - t_0)}. \qquad(6.94) $$
Along the same lines one can calculate the probability to make exactly two hops over the time between $t_0$ and $t$: initially to state $i_1$ and then to state $i_2$:
$$ P_{i_1 i_2 0}^{(123)}(t_0, t) = \int_{t_0}^{t} dt_1\int_{t_1}^{t} dt_2\, P_0^{(1)}(t_0, t_1)\, r_{i_1}^{(1)}\, P_0^{(2)}(t_1, t_2)\, r_{i_2}^{(2)}\, P_0^{(3)}(t_2, t). \qquad(6.95) $$
Integrating this expression is a bit more tedious, but still simple; assuming that all escape rates are different and setting $t_0 = 0$, we obtain (please, check!):
$$ P_{i_1 i_2 0}^{(123)}(t_0, t) = r_{i_1}^{(1)} r_{i_2}^{(2)}\left[\frac{e^{-R^{(1)}t}}{R_{21}R_{31}} + \frac{e^{-R^{(2)}t}}{R_{12}R_{32}} + \frac{e^{-R^{(3)}t}}{R_{13}R_{23}}\right], \qquad(6.96) $$
where it was denoted, for simplicity, $R_{ij} = R^{(i)} - R^{(j)}$. If some of the rates coincide, then a slightly different expression is obtained.
This procedure is generalised to any number of hops via the recursion
$$ P_{i_1 i_2\cdots i_{N-1} 0}^{(12\cdots N)}(0, t) = \int_0^{t} dt_1\, P_0^{(1)}(0, t_1)\, r_{i_1}^{(1)}\, P_{i_2 i_3\cdots i_{N-1} 0}^{(23\cdots N)}(t_1, t), $$
which states that the $(N+1)$-hop transition can be thought of as a single hop at time $t_1$ into state $i_1$ followed by the $N$ remaining hops into states $i_2\to i_3\to\cdots\to i_N$, after which the system remains in the last state for the rest of the time. Note that the probabilities depend only on the time difference. Setting the initial time $t_0$ to zero, one can then clearly see that the time integral above represents a convolution integral. Therefore, performing the LT of this expression, we immediately obtain
$$ P_{i_1 i_2\cdots i_{N-1} 0}^{(12\cdots N)}(p) = \frac{r_{i_1}^{(1)}}{p + R^{(1)}}\; P_{i_2 i_3\cdots i_{N-1} 0}^{(23\cdots N)}(p), $$
since $L\left[P_0^{(1)}(0, t)\right] = 1/\left(p + R^{(1)}\right)$. Applying this recurrence relation recursively, the probability in the Laplace space can be calculated explicitly:
$$ P_{i_1 i_2\cdots i_{N-1} 0}^{(12\cdots N)}(p) = \left(\prod_{k=1}^{N-1} r_{i_k}^{(k)}\right)\prod_{l=1}^{N}\frac{1}{p + R^{(l)}}. $$
To calculate the probability, it is now only required to take the inverse LT. If the escape rates are all different, then we have simple poles on the negative part of the real axis at $-R^{(1)}$, $-R^{(2)}$, etc., and hence the inverse LT is easily calculated:
$$ P_{i_1 i_2\cdots i_{N-1} 0}^{(12\cdots N)}(0, t) = \left(\prod_{k=1}^{N-1} r_{i_k}^{(k)}\right)\sum_{k=1}^{N}\mathrm{Res}\left[e^{pt}\prod_{l=1}^{N}\frac{1}{p + R^{(l)}};\; p = -R^{(k)}\right] = \left(\prod_{k=1}^{N-1} r_{i_k}^{(k)}\right)\sum_{k=1}^{N} e^{-R^{(k)}t}\prod_{\substack{l=1\\ (l\neq k)}}^{N}\frac{1}{R^{(l)} - R^{(k)}}. $$
If some of the escape rates coincide, the image can be written as
$$ P_{i_1 i_2\cdots i_{N-1} 0}^{(12\cdots N)}(p) = \left(\prod_{k=1}^{N-1} r_{i_k}^{(k)}\right)\prod_{k}\frac{1}{\left(p + R^{(k)}\right)^{m_k}}, $$
where the second product runs over all distinct escape rates and $m_k$ is their repetition. Correspondingly, $-R^{(k)}$ becomes the pole of order $m_k$ and one has to use the general formula (2.106) for calculating the corresponding residues.
Problem 6.54. Show that if all escape rates are the same and equal to $R$, then the $N$-hop probability is
$$ P_{i_1 i_2\cdots i_{N-1} 0}^{(12\cdots N)}(0, t) = \left(\prod_{k=1}^{N-1} r_{i_k}^{(k)}\right)\frac{t^{N-1}}{(N-1)!}\,e^{-Rt}. $$

Problem 6.55. The probability $P_N(t)$ of performing $N$ hops (no matter into which states) over time $t$ can be obtained by summing up all possible $(N+1)$-hop probabilities. Assuming that the escape rates are all the same and using the result of the previous problem, show that
$$ P_N(t) = \frac{(Rt)^N}{N!}\,e^{-Rt}, $$
which is the famous Poisson distribution. Then, demonstrate that the sum of all possibilities of performing $0, 1, 2,$ etc. hops is equal to unity:
$$ \sum_{N=0}^{\infty} P_N = 1. $$
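The Poisson result is easily illustrated by direct simulation of the hopping process; a minimal sketch (our own, assuming numpy) draws exponential waiting times with rate $R$ and histograms the number of hops within time $t$:

```python
# Hops with constant escape rate R in time t follow the Poisson distribution.
import math
import numpy as np

rng = np.random.default_rng(0)
R, t, samples = 2.0, 3.0, 50_000

counts = np.empty(samples, dtype=int)
for i in range(samples):
    s, n = 0.0, 0
    while True:
        s += rng.exponential(1.0/R)     # waiting time between hops
        if s > t:
            break
        n += 1
    counts[i] = n

for N in range(5):
    emp = np.mean(counts == N)
    theory = (R*t)**N*math.exp(-R*t)/math.factorial(N)
    print(N, emp, theory)               # empirical vs (Rt)^N e^{-Rt}/N!
```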
is the oscillation frequency of the cantilever far away from the surface. If one can calculate the quantity which is directly measured experimentally, then it would become possible to verify theoretical models. Of course, many such models may need to be tried before good agreement is reached.
However, one may also ask a different question: how can one determine the tip force $F_s(z)$ as a function of the tip–surface distance $z$ from the experimentally measured frequency shift $\Delta\omega(z)$ curve? If the force can be obtained in this way, then it might be easier to choose the theoretical model which is capable of reproducing this particular $z$-dependence. This problem is basically an inverse problem to the one we solved in Sect. 3.8.4. It requires solving the integral equation (3.93) with respect to the force.
Here we shall obtain a nearly exact solution of this integral equation using the method of LT.¹⁰ We shall start by rewriting the integral equation (3.93) slightly differently:
$$ kA\left[1 - \left(\frac{\omega}{\omega_0}\right)^2\right] = \frac{1}{\pi}\int_{\pi/2}^{5\pi/2} F_s\left(h_0 + A\sin\phi\right)\sin\phi\,d\phi. \qquad(6.99) $$
Recall that $A$ is the oscillation amplitude and $z = h_0 + A\sin\phi$ corresponds to the tip height above the surface; $k$ is the elastic constant of the cantilever. It was convenient here to shift the integration limits by $\pi/2$. This can always be done as the integrand is periodic with respect to $\phi$ with the period of $2\pi$. Next, note that within the span of the $\phi$ values in the integral the tip makes the full oscillation cycle by starting at the height $h_0 + A$ (at $\phi = \pi/2$), then moving to the position $z = h_0 - A$ closest to the surface (at $\phi = 3\pi/2$), and then returning back (retracting) to its initial position of $z = h_0 + A$ at $\phi = 5\pi/2$.
Problem 6.56. Next, we shall split the integral into two: for angles $\pi/2 < \phi < 3\pi/2$, when the tip moves down, and for angles $3\pi/2 < \phi < 5\pi/2$, when it is retracted back up. If $F_\downarrow(z)$ and $F_\uparrow(z)$ are the tip forces for the tip moving down and up, respectively, show, by making a substitution $x = \sin\phi$, that Eq. (6.99) can be rewritten as follows:
$$ \int_{-1}^{1} F(z + Ax)\,\frac{x\,dx}{\sqrt{1-x^2}} = \Psi(z), \qquad(6.100) $$
(continued)

¹⁰ The solution to this problem was first obtained by J.E. Sader and S.P. Jarvis in their highly cited paper [Appl. Phys. Lett. 84, 1801 (2004)] using the methods of LT and the so-called fractional calculus. We adopted here some ideas of their method, but did not use the fractional calculus at all, relying instead on conventional techniques presented in this chapter.
Note that the tip force on the way down and up could be different due to a possible atomic reconstruction at the surface (and/or at the tip) when the tip approaches the surface on its way down; this reconstruction sets in and affects the force when the tip is retracted. If such a reconstruction takes place, the tip force experiences a hysteresis over the whole oscillation cycle, which results in the energy being dissipated in the junction.
To solve Eq. (6.100), we assume that $F(z)$ is the LT of some function $f(t)$:
$$ \int_{-1}^{1}\frac{x\,dx}{\sqrt{1-x^2}}\int_0^{\infty} f(t)\,e^{-(z+Ax)t}\,dt = \Psi(z). $$
Our current goal is to find the function $f(t)$. The improper integral over $t$ converges for any $x$, since $z + Ax \ge z - A > 0$ as $z$ is definitely bigger than $A$ (the distance of closest approach $h_0 - A > 0$); moreover, it converges uniformly with respect to $x$, since
$$ \left|\int_0^{\infty} f(t)\,e^{-(z+Ax)t}\,dt\right| \le \int_0^{\infty}\left|f(t)\right|e^{-(z-A)t}\,dt, $$
which converges. The uniform convergence allows us to exchange the order of integration:
$$ \int_0^{\infty} f(t)\,e^{-zt}\left[\int_{-1}^{1}\frac{x\,e^{-Axt}}{\sqrt{1-x^2}}\,dx\right]dt = \Psi(z). \qquad(6.102) $$
The integral in the square brackets can be directly related to the modified Bessel function of the first kind, see Eq. (4.233); it is equal to $-\pi I_1(At)$. This enables us to rewrite Eq. (6.102) as follows:
$$ -\pi\int_0^{\infty} f(t)\,e^{-zt}\,I_1(At)\,dt = \Psi(z), $$
or simply as
$$ L\left[I_1(At)\,f(t)\right] = -\frac{1}{\pi}\,\Psi(p), \qquad(6.103) $$
Fig. 6.16 Comparison of the function $I_1(x)$ calculated exactly (by a numerical integration in Eq. (4.233)) and using the approximation of Eq. (6.105). In both cases, for convenience, the function $I_1(x)e^{-x}$ is actually shown
where $p$ is the real positive number; we have changed $z$ to $p$ here as this is the letter we have been using in this chapter as the variable for the LT. Hence, it follows that
$$ f(t) = \frac{1}{I_1(At)}\,L^{-1}\left[-\frac{1}{\pi}\Psi(p)\right] = -\frac{1}{\pi}\,\frac{\psi(t)}{I_1(At)}, \qquad(6.104) $$
where $\psi(t) = L^{-1}\left[\Psi(p)\right]$. So far, our manipulations have been exact.
To proceed, we shall now apply an approximation to the Bessel function which works really well across a wide range of its variable:
$$ \frac{1}{I_1(x)} \simeq e^{-x}\left(\frac{2}{x} + \frac{1}{4\sqrt{x}} + \sqrt{2\pi x}\right). \qquad(6.105) $$
Substituting it into Eq. (6.104), we obtain
$$ f(t) \simeq -\frac{1}{\pi}\,\psi(t)\,e^{-At}\left(\frac{2}{At} + \frac{1}{4\sqrt{At}} + \sqrt{2\pi At}\right). \qquad(6.106) $$
Introducing the notations
$$ G_1(p) = L\left[t^{-1}\psi(t)\right], \qquad G_2(p) = L\left[t^{-1/2}\psi(t)\right] \qquad\text{and}\qquad G_3(p) = L\left[t^{1/2}\psi(t)\right], \qquad(6.107) $$
and using the property (6.41), we obtain
$$ F(p) = -\frac{1}{\pi}\left[\frac{2}{A}\,G_1(p+A) + \frac{1}{4\sqrt{A}}\,G_2(p+A) + \sqrt{2\pi A}\,G_3(p+A)\right]. \qquad(6.108) $$
Now we need to calculate all three $G$ functions in (6.107). The first one is calculated immediately using property (6.46):
$$ G_1(p) = L\left[\frac{\psi(t)}{t}\right] = \int_p^{\infty}\Psi(z)\,dz = \int_0^{\infty}\Psi(z+p)\,dz. \qquad(6.109) $$
For the calculation of the second and the third ones, we can use Eqs. (6.30) and (6.35), respectively, with $\alpha = 1/2$. These give
$$ G_2(p) = L\left[t^{-1/2}\psi(t)\right] = \frac{1}{\sqrt{\pi}}\int_0^{\infty}\frac{\Psi(z+p)}{\sqrt{z}}\,dz, $$
$$ G_3(p) = L\left[t^{1/2}\psi(t)\right] = -\frac{1}{\sqrt{\pi}}\int_0^{\infty}\frac{\Psi'(z+p)}{\sqrt{z}}\,dz. $$

7 Curvilinear Coordinates
In many applications physical systems possess symmetry. For instance, the magnetic field of an infinite vertical wire with a current flowing through it has a cylindrical symmetry (i.e. the field depends only on the distance from the wire), while the field radiated by a point source has the characteristic spherical symmetry (i.e. depends only on the distance from the source). In these and many other cases a Cartesian coordinate system may not be the most convenient choice; a special choice of the coordinates (such as cylindrical or spherical ones for the two examples mentioned above, respectively) may, however, simplify the problem considerably and hence enable one to obtain a closed solution. In particular, investigation of a large number of physical problems requires solving the so-called partial differential equations (PDEs). Using the appropriate coordinates in place of the Cartesian ones allows one to obtain simpler forms of these equations (which may, e.g., contain a smaller number of variables) that can be easier to solve.
The key objective of this chapter¹ is to present a general theory which allows introduction of such alternative coordinate systems, and to show how general differential operators such as gradient, divergence, curl and the Laplacian can be written in terms of them. Some applications of these so-called curvilinear coordinates in solving PDEs will be considered in Sect. 7.11.1 and then in Chap. 8.
Assume that instead of the Cartesian coordinates $(x, y, z)$ one introduces new coordinates $(q_1, q_2, q_3)$ via the transformation relations
$$ x = x(q_1, q_2, q_3), \quad y = y(q_1, q_2, q_3), \quad z = z(q_1, q_2, q_3). \qquad(7.1) $$
1
In the following, references to the first volume of this course (L. Kantorovich, Mathematics for
natural scientists: fundamentals and basics, Springer, 2015) will be made by appending the Roman
number I in front of the reference, e.g. Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of
the first volume, respectively.
We shall also suppose that Eq. (7.1) can be solved for each point $(x, y, z)$ with respect to the new coordinates:
$$ q_i = q_i(x, y, z), \quad i = 1, 2, 3, \qquad(7.2) $$
yielding the corresponding inverse relations. In practice, for many such transformations, at certain points $(x, y, z)$ the solutions (7.2) are not unique, i.e. several (sometimes, an infinite) number of coordinates $(q_1, q_2, q_3)$ exist corresponding to the same point in the 3D space. Such points are called singular points of the coordinate transformation. The new coordinates $(q_1, q_2, q_3)$ are called curvilinear coordinates.
The reader must already be familiar with at least two such curvilinear systems (see Sect. I.1.13.2): the cylindrical and spherical curvilinear coordinate systems. In the case of cylindrical coordinates $(q_1, q_2, q_3) = (r, \phi, z)$ the corresponding transformation is given by the following equations:
$$ x = r\cos\phi, \quad y = r\sin\phi, \quad z = z, \qquad(7.3) $$
where $0 \le r < \infty$, $0 \le \phi < 2\pi$ and $-\infty < z < +\infty$, see Fig. 7.1(b). Points along the $z$ axis (when $r = 0$) are all singular in this case: one obtains $x = y = 0$ (any $z$) for any angle $\phi$. By taking the square of the first two equations and adding them together, one obtains $r = \sqrt{x^2+y^2}$, while dividing the first two equations gives $\phi = \arctan(y/x)$. These relations serve as the inverse relations of the transformation. The polar system, Fig. 7.1(a), corresponds to the 2D space and is obtained by omitting the $z$ coordinate altogether. In this case only the single point $x = y = 0$ is singular.
Fig. 7.1 Frequently used curvilinear coordinate systems: (a) polar, (b) cylindrical and (c)
spherical
For the case of the spherical coordinates $(q_1, q_2, q_3) = (r, \theta, \phi)$ we have the transformation relations
$$ x = r\sin\theta\cos\phi, \quad y = r\sin\theta\sin\phi, \quad z = r\cos\theta. \qquad(7.4) $$

Problem 7.1. Show that the inverse relations for the spherical coordinate system are
$$ r = \sqrt{x^2+y^2+z^2}, \quad \phi = \arctan(y/x) \quad\text{and}\quad \theta = \arccos\left(z/\sqrt{x^2+y^2+z^2}\right). $$
Hence, transformation relations (7.1) and (7.2) enable one to define a mapping of the Cartesian system onto the curvilinear one. This mapping does not need to have a one-to-one correspondence: the same point $(x, y, z)$ may be obtained by several sets of the chosen curvilinear coordinates.
Note that the transformation (7.1) allows one to represent any scalar field $G(x, y, z)$ in curvilinear coordinates as
$$ G(x, y, z) = G\left(x(q_1, q_2, q_3),\, y(q_1, q_2, q_3),\, z(q_1, q_2, q_3)\right) = G_{cc}(q_1, q_2, q_3), $$
i.e. some function $G_{cc}$ of the curvilinear coordinates. For instance, the scalar field $G(x, y, z) = \left(x^2+y^2+z^2\right)^2$ is equivalent to the scalar function $G_{cc}(r, \theta, \phi) = r^4$ in the spherical system (or coordinates).
It is seen that once the transformation relations are known, any scalar field can
readily be written in the chosen curvilinear coordinates. Next, we need to discuss a
general procedure for representing an arbitrary vector field F.x; y; z/ in terms of the
curvilinear coordinates .q1 ; q2 ; q3 /. This is by far more challenging as the field has a
direction and hence we need to understand how to write the field in the vector form
which does not rely on Cartesian unit base vectors i, j and k.
All three coordinate surfaces intersect at a point $P$. Moreover, any pair of surfaces intersect at the corresponding coordinate line passing through the point $P$: $\sigma_1$ and $\sigma_3$ intersect at the $q_2$-line, $\sigma_2$ and $\sigma_3$ at the $q_1$-line, and so on.
As an example, let us construct coordinate lines and surfaces for the cylindrical coordinate system, Eq. (7.3) and Fig. 7.1(b). By changing only $r$, we draw the $r$-line, which is a ray starting at the $z$ axis and moving outwards perpendicular to it, remaining at the height $z$ and making the angle $\phi$ with the $x$ axis. The coordinate $\phi$-line will be a circle of radius $r$ at the height $z$, while the coordinate $z$-line is the vertical line drawn at a distance $r$ from the $z$ axis such that its projection on the $(x, y)$ plane is given by the polar angle $\phi$, see Fig. 7.3. The coordinate surfaces for this case are obtained by fixing a single coordinate: $\sigma_r$ is a cylinder coaxial with the $z$ axis, $\sigma_\phi$ is a vertical semi-plane hinged to the $z$ axis, and $\sigma_z$ is a horizontal plane at the height $z$. Obviously, $\sigma_r$ and $\sigma_\phi$ intersect at the corresponding $z$-line, $\sigma_r$ and $\sigma_z$ at the $\phi$-line, while $\sigma_\phi$ and $\sigma_z$ at the $r$-line, as expected.
Next we introduce a set of vectors $\mathbf{e}_1$, $\mathbf{e}_2$ and $\mathbf{e}_3$ at the intersection point $P$. Each of these is a unit vector; the direction of $\mathbf{e}_i$ ($i = 1, 2, 3$) is chosen along the tangent to the corresponding $q_i$-coordinate line and in such a way that it points in the direction of increasing $q_i$. These vectors, which are called unit base vectors of the curvilinear coordinate system, enable one to represent an arbitrary vector field $\mathbf{F}(\mathbf{r})$ at point $P(\mathbf{r}) = P(x, y, z)$ in the form
$$ \mathbf{F}(P) = F_1\mathbf{e}_1 + F_2\mathbf{e}_2 + F_3\mathbf{e}_3. \qquad(7.5) $$
In the cylindrical system, for instance, the direction of the unit base vector $\mathbf{e}_\phi$ changes along the $\phi$-line; but it does not depend on $z$. At the same time, $\mathbf{e}_z = \mathbf{k}$ is always directed along the $z$ axis, as the third coordinate $z$ of this system is identical to the corresponding coordinate of the Cartesian system.
Let us now derive explicit expressions for the unit base vectors. We shall relate them to the curvilinear coordinates and the Cartesian base vectors $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$. The latter are convenient for that purpose as they are fixed vectors. Consider the general position vector $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ written in the Cartesian system. Because of the transformation equations (7.1), $\mathbf{r} = \mathbf{r}(q_1, q_2, q_3)$, i.e. each Cartesian component of $\mathbf{r}$ depends on the three curvilinear coordinates. Since the vector $\mathbf{e}_i$ is directed along the tangent of the $q_i$-line along which only the coordinate $q_i$ varies, the direction of $\mathbf{e}_i$ will be proportional to the partial derivative of the vector $\mathbf{r}$ with respect to this coordinate at the point $P$, i.e.
$$ \mathbf{e}_i \propto \frac{\partial\mathbf{r}}{\partial q_i} = \frac{\partial x}{\partial q_i}\,\mathbf{i} + \frac{\partial y}{\partial q_i}\,\mathbf{j} + \frac{\partial z}{\partial q_i}\,\mathbf{k}, \quad i = 1, 2, 3. \qquad(7.6) $$
A proportionality constant $h_i$ between the derivative $\partial\mathbf{r}/\partial q_i$ and $\mathbf{e}_i$ in the relation $\partial\mathbf{r}/\partial q_i = h_i\mathbf{e}_i$ can be chosen to ensure that the unit base vector is of unit length. This finally allows us to write the required relationships which relate the unit base vectors of an arbitrary curvilinear coordinate system to the Cartesian unit base vectors:
$$ \mathbf{e}_i = \frac{1}{h_i}\frac{\partial\mathbf{r}}{\partial q_i} = \frac{1}{h_i}\left(\frac{\partial x}{\partial q_i}\,\mathbf{i} + \frac{\partial y}{\partial q_i}\,\mathbf{j} + \frac{\partial z}{\partial q_i}\,\mathbf{k}\right), \qquad(7.7) $$
where
$$ h_i = \left|\frac{\partial\mathbf{r}}{\partial q_i}\right| = \sqrt{\left(\frac{\partial x}{\partial q_i}\right)^2 + \left(\frac{\partial y}{\partial q_i}\right)^2 + \left(\frac{\partial z}{\partial q_i}\right)^2}. \qquad(7.8) $$
The factors $h_i$ introduced above, called scale factors, are chosen to be positive to ensure that $\mathbf{e}_i$ is directed along the $q_i$-line in the direction of increasing $q_i$. It is readily seen from the above equations that the unit base vectors $\mathbf{e}_i$ are expressed as a linear combination of the Cartesian vectors $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$, which do not change their direction with the position of the point $P$. For convenience, these can be collected in three unit vectors $\mathbf{e}_i^0$, $i = 1, 2, 3$. Therefore, a change (if any) experienced by the vectors $\mathbf{e}_i$ is contained in the coefficients $m_{ij}$ of the expansion. The relationship between the new (curvilinear) $\{\mathbf{e}_i\}$ and Cartesian $\{\mathbf{e}_i^0\}$ vectors is conveniently written in the matrix form²:
$$ \mathbf{e}_i = \sum_{j=1}^{3} m_{ij}\,\mathbf{e}_j^0 \quad\text{or}\quad \begin{pmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \\ \mathbf{e}_3 \end{pmatrix} = M\begin{pmatrix} \mathbf{e}_1^0 \\ \mathbf{e}_2^0 \\ \mathbf{e}_3^0 \end{pmatrix} = M\begin{pmatrix} \mathbf{i} \\ \mathbf{j} \\ \mathbf{k} \end{pmatrix}, \qquad(7.9) $$
2
We remind that we use capital letters for matrices and the corresponding small letters for their
matrix elements.
where $M = \left(m_{ij}\right)$ is the $3\times 3$ matrix of the coefficients,
$$ m_{i1} = \frac{1}{h_i}\frac{\partial x}{\partial q_i}, \qquad m_{i2} = \frac{1}{h_i}\frac{\partial y}{\partial q_i}, \qquad m_{i3} = \frac{1}{h_i}\frac{\partial z}{\partial q_i}. $$
If the curvilinear system is orthogonal, then at every point
$$ \mathbf{e}_1\cdot\mathbf{e}_2 = \mathbf{e}_1\cdot\mathbf{e}_3 = \mathbf{e}_2\cdot\mathbf{e}_3 = 0. \qquad(7.11) $$
These relationships hold in spite of the fact that directions of $\mathbf{e}_i$ may vary from point to point. For an orthogonal system the coordinate surfaces through any point $P$ will all intersect at right angles.
If the given curvilinear system is orthogonal, then the matrix $M$ must be orthogonal as well. This is easy to understand, as the transformation $\{\mathbf{e}_i^0\} \to \{\mathbf{e}_i\}$, Eq. (7.9), can be considered as a rotation in 3D space (Sect. 1.2.5.2): indeed, from the three Cartesian vectors $\{\mathbf{e}_i^0\}$ we obtain a new set of vectors $\{\mathbf{e}_i\}$ in the same space. Hence, if from one set of orthogonal vectors we obtain another orthogonal set, this can only be accomplished by an orthogonal transformation (Sect. 1.2.5). This can also be shown explicitly: if both sets are orthogonal, $\mathbf{e}_i\cdot\mathbf{e}_j = \delta_{ij}$ and $\mathbf{e}_k^{0}\cdot\mathbf{e}_{k'}^{0} = \delta_{kk'}$, then the matrix $M$ must be orthogonal as well:
$$ \mathbf{e}_i\cdot\mathbf{e}_j = \sum_{kk'} m_{ik}m_{jk'}\,\mathbf{e}_k^{0}\cdot\mathbf{e}_{k'}^{0} = \sum_{kk'} m_{ik}m_{jk'}\,\delta_{kk'} = \sum_k m_{ik}m_{jk} = \sum_k m_{ik}\left(M^T\right)_{kj} = \left(MM^T\right)_{ij}. $$
Since the dot product $\mathbf{e}_i\cdot\mathbf{e}_j$ must be equal to $\delta_{ij}$, then $MM^T$ is the unity matrix, i.e. $M^T = M^{-1}$, which is the required statement. For orthogonal systems the inverse transformation, Eq. (7.10), is provided by the transposed matrix $M^T$:
$$ \mathbf{e}_i^0 = \sum_{j=1}^{3}\left(M^T\right)_{ij}\mathbf{e}_j = \sum_{j=1}^{3} m_{ji}\,\mathbf{e}_j. $$
As an example, consider the cylindrical system $(r, \varphi, z)$, for which
$$\mathbf{r} = \mathbf{r}(r,\varphi,z) = x(r,\varphi,z)\,\mathbf{i} + y(r,\varphi,z)\,\mathbf{j} + z(r,\varphi,z)\,\mathbf{k} = (r\cos\varphi)\,\mathbf{i} + (r\sin\varphi)\,\mathbf{j} + z\,\mathbf{k}\,.$$
We now calculate the derivatives of the vector $\mathbf{r}$ and their absolute values as required by Eq. (7.7):
$$\frac{\partial\mathbf{r}}{\partial r} = (\cos\varphi)\,\mathbf{i} + (\sin\varphi)\,\mathbf{j}\,,\quad h_r = \left|\frac{\partial\mathbf{r}}{\partial r}\right| = \left(\cos^2\varphi + \sin^2\varphi\right)^{1/2} = 1\,; \quad (7.12)$$
$$\frac{\partial\mathbf{r}}{\partial\varphi} = (-r\sin\varphi)\,\mathbf{i} + (r\cos\varphi)\,\mathbf{j}\,,\quad h_\varphi = \left|\frac{\partial\mathbf{r}}{\partial\varphi}\right| = \left(r^2\sin^2\varphi + r^2\cos^2\varphi\right)^{1/2} = r\,; \quad (7.13)$$
$$\frac{\partial\mathbf{r}}{\partial z} = \mathbf{k}\,,\quad h_z = \left|\frac{\partial\mathbf{r}}{\partial z}\right| = 1\,. \quad (7.14)$$
Hence, we obtain for the unit base vectors³ the following explicit expressions:
$$\mathbf{e}_r = \cos\varphi\,\mathbf{i} + \sin\varphi\,\mathbf{j}\,,\quad \mathbf{e}_\varphi = -\sin\varphi\,\mathbf{i} + \cos\varphi\,\mathbf{j}\,,\quad \mathbf{e}_z = \mathbf{k}\,. \quad (7.15)$$
It is easily verified that the vectors are of unit length (by construction) and are all mutually orthogonal: $\mathbf{e}_r\cdot\mathbf{e}_\varphi = \mathbf{e}_r\cdot\mathbf{e}_z = \mathbf{e}_\varphi\cdot\mathbf{e}_z = 0$. These results show explicitly that the cylindrical coordinate system is orthogonal, as these conditions were derived for any point $P$ (i.e. any $r$, $\varphi$ and $z$). Hence, in the case of the cylindrical system the transformation matrix $M$ reads:
$$M = \begin{pmatrix} \cos\varphi & \sin\varphi & 0 \\ -\sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (7.16)$$
The matrix $M$ is orthogonal since its rows (or columns) form an orthonormal set of vectors, as expected; hence
$$M^{-1} = M^T = \begin{pmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (7.17)$$
³ It is convenient to use the corresponding symbols of the curvilinear coordinates as subscripts for the unit base vectors instead of numbers in each case, and we shall frequently be using this notation.
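The calculation just performed is easy to automate. The following short sympy sketch (my own illustration, not part of the text) computes the scale factors and unit base vectors of the cylindrical system directly from Eqs. (7.6)–(7.8) and verifies that the resulting matrix $M$ of Eq. (7.16) is orthogonal:

```python
import sympy as sp

r, phi, z = sp.symbols('r phi z', positive=True)
R = sp.Matrix([r*sp.cos(phi), r*sp.sin(phi), z])   # position vector (x, y, z)

rows = []
for q in (r, phi, z):
    dR = R.diff(q)                  # tangent vector along the q-line, Eq. (7.6)
    h = sp.simplify(dR.norm())      # scale factor h_q, Eq. (7.8)
    e = sp.simplify(dR / h)         # unit base vector e_q, Eq. (7.7)
    print(q, h, list(e))
    rows.append(list(e))

M = sp.Matrix(rows)                 # transformation matrix of Eq. (7.16)
assert sp.simplify(M*M.T - sp.eye(3)) == sp.zeros(3)   # M M^T = E, so M is orthogonal
```

Replacing the position vector `R` by its spherical-coordinate form reproduces the matrix of Problem 7.2 in the same way.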
These relationships enable one to rewrite any vector field from the Cartesian to the cylindrical coordinates:
$$\mathbf{F} = F_x\,\mathbf{i} + F_y\,\mathbf{j} + F_z\,\mathbf{k} = F_r\,\mathbf{e}_r + F_\varphi\,\mathbf{e}_\varphi + F_z\,\mathbf{e}_z\,,$$
where the three components of the field are explicit functions of the cylindrical coordinates, e.g. $F_x(\mathbf{r}) = F_x(x, y, z) = F_x(r\cos\varphi, r\sin\varphi, z)$. For instance,
$$\mathbf{F} = e^{-x^2-y^2}\left(x\,\mathbf{i} + y\,\mathbf{j}\right) = e^{-r^2}\left[r\cos\varphi\left(\cos\varphi\,\mathbf{e}_r - \sin\varphi\,\mathbf{e}_\varphi\right) + r\sin\varphi\left(\sin\varphi\,\mathbf{e}_r + \cos\varphi\,\mathbf{e}_\varphi\right)\right] = e^{-r^2}\,r\,\mathbf{e}_r\,.$$
Problem 7.2. Using Fig. 7.4, describe the coordinate surfaces and lines for the spherical coordinate system $(r, \theta, \varphi)$, Eq. (7.4). Show that the unit base vectors and the corresponding scale factors for this system are
$$\mathbf{e}_r = \sin\theta\cos\varphi\,\mathbf{i} + \sin\theta\sin\varphi\,\mathbf{j} + \cos\theta\,\mathbf{k}\,,\quad h_r = 1\,,$$
$$\mathbf{e}_\theta = \cos\theta\cos\varphi\,\mathbf{i} + \cos\theta\sin\varphi\,\mathbf{j} - \sin\theta\,\mathbf{k}\,,\quad h_\theta = r\,,$$
$$\mathbf{e}_\varphi = -\sin\varphi\,\mathbf{i} + \cos\varphi\,\mathbf{j}\,,\quad h_\varphi = r\sin\theta\,.$$
Prove by checking the dot products between unit base vectors that this system is orthogonal, and show that in this case the transformation matrix
$$M = \begin{pmatrix} \sin\theta\cos\varphi & \sin\theta\sin\varphi & \cos\theta \\ \cos\theta\cos\varphi & \cos\theta\sin\varphi & -\sin\theta \\ -\sin\varphi & \cos\varphi & 0 \end{pmatrix}$$
is orthogonal, i.e. $M^T M = E$, where $E$ is the unit matrix. Also demonstrate by a direct calculation that
$$\mathbf{e}_r \times \mathbf{e}_\theta = \mathbf{e}_\varphi\,,\quad \mathbf{e}_r \times \mathbf{e}_\varphi = -\mathbf{e}_\theta \quad\text{and}\quad \mathbf{e}_\theta \times \mathbf{e}_\varphi = \mathbf{e}_r\,.$$
$$d\mathbf{r} = \sum_{i=1}^{3}\frac{\partial\mathbf{r}}{\partial q_i}\,dq_i = \sum_{i=1}^{3} h_i\,dq_i\,\mathbf{e}_i\,, \quad (7.23)$$
where Eq. (7.7) for the unit base vectors has been used. Note that expression (7.23) for the change of $\mathbf{r}$ is valid for general non-orthogonal curvilinear coordinate systems.
The square of the length $ds$ of the displacement vector $d\mathbf{r}$ is then given by
$$(ds)^2 = d\mathbf{r}\cdot d\mathbf{r} = \sum_{i=1}^{3}\sum_{j=1}^{3} g_{ij}\,dq_i\,dq_j\,, \quad (7.24)$$
where $g_{ij} = \dfrac{\partial\mathbf{r}}{\partial q_i}\cdot\dfrac{\partial\mathbf{r}}{\partial q_j}$ is the metric tensor. In particular, along a single coordinate line $q_i$ the displacement $d\mathbf{r}_i = h_i\,dq_i\,\mathbf{e}_i$ has length $ds_i = h_i\,dq_i$, explaining why $h_i$ is called a scale factor. The small displacements $d\mathbf{r}_1$, $d\mathbf{r}_2$ and $d\mathbf{r}_3$ made along each of the coordinate lines are shown in Fig. 7.5.
If the curvilinear system is orthogonal, the metric tensor is diagonal:
$$\mathbf{e}_i\cdot\mathbf{e}_j = \frac{1}{h_i h_j}\,\frac{\partial\mathbf{r}}{\partial q_i}\cdot\frac{\partial\mathbf{r}}{\partial q_j} = \delta_{ij} \;\Longrightarrow\; g_{ij} = \frac{\partial\mathbf{r}}{\partial q_i}\cdot\frac{\partial\mathbf{r}}{\partial q_j} = h_i^2\,\delta_{ij}\,. \quad (7.29)$$
For an orthogonal system the three vectors $d\mathbf{r}_i$ are orthogonal independently of the choice of the point $P$; this is not generally the case for an arbitrary system: at some points the unit base vectors may be orthogonal, but they will not be orthogonal at all points. One can also see that for an orthogonal system
$$(ds)^2 = h_1^2\,(dq_1)^2 + h_2^2\,(dq_2)^2 + h_3^2\,(dq_3)^2 = \sum_{i=1}^{3}\left(d\mathbf{r}_i\right)^2 = \sum_{i=1}^{3}\left(ds_i\right)^2\,, \quad (7.30)$$
so that, along a curve specified parametrically via $q_i = q_i(t)$,
$$ds = \sqrt{d\mathbf{r}\cdot d\mathbf{r}} = \sqrt{\sum_{i=1}^{3}\frac{\partial\mathbf{r}}{\partial q_i}\frac{dq_i}{dt}\cdot\sum_{j=1}^{3}\frac{\partial\mathbf{r}}{\partial q_j}\frac{dq_j}{dt}}\;dt = \sqrt{\sum_{i,j=1}^{3}\frac{\partial\mathbf{r}}{\partial q_i}\cdot\frac{\partial\mathbf{r}}{\partial q_j}\,\frac{dq_i}{dt}\frac{dq_j}{dt}}\;dt = \sqrt{\sum_{i,j=1}^{3} g_{ij}\,\frac{dq_i}{dt}\frac{dq_j}{dt}}\;dt\,.$$
Of course, the same expression is obtained directly from Eq. (7.24) by considering $ds = \sqrt{(ds)^2}$ and $dq_i = (dq_i/dt)\,dt$. The total length of the curve between $t = 0$ and $t$ is then $s = \int ds$. In the case of an orthogonal system this formula simplifies to
$$s = \int_0^t \left[\sum_{i=1}^{3} h_i^2\left(\frac{dq_i}{dt}\right)^2\right]^{1/2} dt\,, \quad (7.33)$$
as expected.
In a general case, however, the sides of the parallelepiped in Fig. 7.5 are not orthogonal, and its volume is given by the absolute value of the mixed product of all three vectors:
$$dV = \left|\left(d\mathbf{r}_1\cdot\left[d\mathbf{r}_2\times d\mathbf{r}_3\right]\right)\right| = \left|h_1 h_2 h_3\left(\mathbf{e}_1\cdot\left[\mathbf{e}_2\times\mathbf{e}_3\right]\right)\right| dq_1\,dq_2\,dq_3\,.$$
This formula can already be used in practical calculations since it gives a general result for the Jacobian as
$$J = h_1 h_2 h_3\left|\mathbf{e}_1\cdot\left[\mathbf{e}_2\times\mathbf{e}_3\right]\right|\,.$$
However, it is instructive to demonstrate that this is actually the same result as the one derived previously in Sect. I.6.2.2, where $J$ was expressed directly via partial derivatives. To this end, recall the actual expressions for the unit base vectors, Eq. (7.7); hence, our previous result can be rewritten as
$$dV = \left|\frac{\partial\mathbf{r}}{\partial q_1}\cdot\left[\frac{\partial\mathbf{r}}{\partial q_2}\times\frac{\partial\mathbf{r}}{\partial q_3}\right]\right| dq_1\,dq_2\,dq_3 = J\,dq_1\,dq_2\,dq_3\,. \quad (7.39)$$
It is not difficult to see now that the mixed product of derivatives above is exactly the Jacobian $J = \partial(x, y, z)/\partial(q_1, q_2, q_3)$. Indeed, the mixed product of any three vectors can be written as a determinant (see Sect. I.1.7.1). Therefore, the Jacobian in Eq. (7.39) can finally be manipulated into
$$J = \begin{vmatrix} \partial x/\partial q_1 & \partial y/\partial q_1 & \partial z/\partial q_1 \\ \partial x/\partial q_2 & \partial y/\partial q_2 & \partial z/\partial q_2 \\ \partial x/\partial q_3 & \partial y/\partial q_3 & \partial z/\partial q_3 \end{vmatrix} = \frac{\partial(x, y, z)}{\partial(q_1, q_2, q_3)}\,,$$
as required.
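A quick sympy cross-check of the two equivalent Jacobian expressions just derived — the determinant of partial derivatives of Eq. (7.39) and the product of scale factors $h_1 h_2 h_3$ — can be done for the spherical system (this sketch is my own addition):

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
X = sp.Matrix([r*sp.sin(th)*sp.cos(ph),
               r*sp.sin(th)*sp.sin(ph),
               r*sp.cos(th)])

J = sp.simplify(X.jacobian([r, th, ph]).det())
print(J)                                        # -> r**2*sin(theta)

h = [X.diff(q).norm() for q in (r, th, ph)]     # scale factors h_r, h_theta, h_phi
pt = {r: 2, th: sp.pi/3, ph: sp.pi/5}
print((h[0]*h[1]*h[2]).subs(pt).evalf(), J.subs(pt).evalf())   # both 2*sqrt(3) ~ 3.464
```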
of the function $f(r) = \dfrac{1}{r^2 + \alpha^2}$ can be written as a one-dimensional integral
$$F(k) = \sqrt{\frac{2}{\pi}}\,\frac{1}{k}\int_0^\infty \frac{r\sin(kr)}{r^2 + \alpha^2}\,dr\,.$$
Problem 7.9. The charge density of a unit point charge positioned at the point $\mathbf{r}_0$ is described by the distribution function in the form of the 3D delta function:
$$\rho(\mathbf{r}) = \delta(\mathbf{r} - \mathbf{r}_0) = \delta(x - x_0)\,\delta(y - y_0)\,\delta(z - z_0)\,.$$
Show that in the spherical coordinates $(r, \theta, \varphi)$ this formula takes on the following form:
$$\rho(\mathbf{r}) = \delta(\mathbf{r} - \mathbf{r}_0) = \frac{1}{r^2\sin\theta}\,\delta(r - r_0)\,\delta(\theta - \theta_0)\,\delta(\varphi - \varphi_0)\,,$$
and that in a general orthogonal curvilinear system $(q_1, q_2, q_3)$
$$\delta(\mathbf{r} - \mathbf{r}_0) = \frac{1}{|h_1 h_2 h_3|}\,\delta\!\left(q_1 - q_1^0\right)\delta\!\left(q_2 - q_2^0\right)\delta\!\left(q_3 - q_3^0\right), \quad (7.40)$$
where $\mathbf{r} = \mathbf{r}(q_1, q_2, q_3)$ and $\mathbf{r}_0 = \mathbf{r}\!\left(q_1^0, q_2^0, q_3^0\right)$.
It is instructive to derive the above formula for the delta function in an orthogonal curvilinear system also using the exponential delta-sequence (4.18):
$$\delta_{nml}(\mathbf{r} - \mathbf{r}_0) = \delta_n(x - x_0)\,\delta_m(y - y_0)\,\delta_l(z - z_0) = \frac{nml}{(2\pi)^{3/2}}\exp\left[-\frac{n^2}{2}(x - x_0)^2 - \frac{m^2}{2}(y - y_0)^2 - \frac{l^2}{2}(z - z_0)^2\right],$$
where $n$, $m$ and $l$ are integers. These are meant to go to infinity. The result should not depend on the way these three numbers tend to infinity though; therefore, we shall consider the limit of $n = m = l \to \infty$:
$$\delta_{nnn}(\mathbf{r} - \mathbf{r}_0) = \left(\frac{n}{\sqrt{2\pi}}\right)^3 \exp\left[-\frac{n^2}{2}\left(\mathbf{r} - \mathbf{r}_0\right)^2\right].$$
We are interested here in having $\mathbf{r} - \mathbf{r}_0$ very close to zero. In this case $(\mathbf{r} - \mathbf{r}_0)^2$ is the distance squared between two close points whose curvilinear coordinates differ by $\Delta q_1 = q_1 - q_1^0$, $\Delta q_2 = q_2 - q_2^0$ and $\Delta q_3 = q_3 - q_3^0$. The distance squared between these close points is given by Eq. (7.30):
$$\left(\mathbf{r} - \mathbf{r}_0\right)^2 = \sum_{i=1}^{3} h_i^2\left(q_i - q_i^0\right)^2\,,$$
so that
$$\delta_{nnn}(\mathbf{r} - \mathbf{r}_0) = \prod_{i=1}^{3}\frac{n}{\sqrt{2\pi}}\exp\left[-\frac{n^2 h_i^2}{2}\left(q_i - q_i^0\right)^2\right].$$
Taking the limit, we obtain
$$\delta(\mathbf{r} - \mathbf{r}_0) = \lim_{n\to\infty}\prod_{i=1}^{3}\frac{n}{\sqrt{2\pi}}\exp\left[-\frac{n^2 h_i^2}{2}\left(q_i - q_i^0\right)^2\right] = \prod_{i=1}^{3}\delta\!\left(h_i\left(q_i - q_i^0\right)\right) = \prod_{i=1}^{3}\frac{1}{|h_i|}\,\delta\!\left(q_i - q_i^0\right),$$
which is the same result as the above Eq. (7.40). In the last passage we have used the property (4.12) of the delta function.
7.5 Change of Variables in Multiple Integrals

We have seen in Sections I.6.1.4 and I.6.2.2 that one has to calculate the Jacobian of the transformation when changing the variables in double and triple integrals, respectively. In the previous section we derived an expression for the Jacobian again, using the general technique of curvilinear coordinates developed in the preceding sections. Here we shall prove that there is a straightforward generalisation of this result to any number of multiple integrals.
Consider an $n$-fold integral
$$I_n = \underbrace{\int\cdots\int}_{n} F(x_1, \ldots, x_n)\,dx_1\cdots dx_n\,, \quad (7.41)$$
in which the variables are changed via
$$x_i = f_i(y_1, \ldots, y_n)\,,\quad i = 1, \ldots, n\,. \quad (7.42)$$
We shall show that
$$I_n = \underbrace{\int\cdots\int}_{n} F\left(f_1(Y), \ldots, f_n(Y)\right)|J_n|\,dy_1\cdots dy_n\,, \quad (7.43)$$
where
$$J_n = \begin{vmatrix} \partial x_1/\partial y_1 & \partial x_2/\partial y_1 & \cdots & \partial x_n/\partial y_1 \\ \partial x_1/\partial y_2 & \partial x_2/\partial y_2 & \cdots & \partial x_n/\partial y_2 \\ \vdots & \vdots & & \vdots \\ \partial x_1/\partial y_n & \partial x_2/\partial y_n & \cdots & \partial x_n/\partial y_n \end{vmatrix} = \left|\;\partial x_1/\partial y_k \;\;\; \partial x_2/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right|, \quad (7.44)$$
and in the shorthand form we only write explicitly the elements of the $k$-th row. You should imagine that each such term, e.g. $\partial x_2/\partial y_k$, represents the whole column of such elements with $k$ changing between 1 and $n$.
We shall prove this formula using induction. The formula is valid for $n = 2, 3$; hence, we assume that it is valid for the $(n-1)$-dimensional integral, and shall then prove that it is also valid for the $n$-dimensional integral. Consider the original integral (7.41), in which we shall integrate over the variable $x_1$ last:
$$I_n = \int dx_1\left[\underbrace{\int\cdots\int}_{n-1} F(x_1, \ldots, x_n)\,dx_2\cdots dx_n\right]. \quad (7.45)$$
In the internal $(n-1)$-fold integral the variable $x_1$ is regarded as a fixed parameter; solving the first of the transformation equations (7.42) for $y_1$, we can write
$$y_1 = \xi(x_1, y_2, \ldots, y_n)\,. \quad (7.46)$$
Therefore, the transformation relations between the old and new variables can be rewritten by excluding the first variable from either of the sets:
$$x_i = f_i\left(\xi(x_1, y_2, \ldots, y_n), y_2, \ldots, y_n\right) = g_i(x_1, y_2, \ldots, y_n)\,,\quad i = 2, \ldots, n\,, \quad (7.47)$$
where $g_2$, $g_3$, etc., are the new transformation functions. Hence, the internal integral in (7.45) can be transformed to the new set of variables $y_2, \ldots, y_n$ by means of the Jacobian $J_{n-1}$. This is possible due to our assumption:
$$I_n = \int dx_1\,\underbrace{\int\cdots\int}_{n-1} F\left(x_1, g_2(x_1, y_2, \ldots, y_n), \ldots, g_n(x_1, y_2, \ldots, y_n)\right)\left|J_{n-1}\right| dy_2\cdots dy_n\,,$$
where
$$J_{n-1} = \begin{vmatrix} \partial g_2/\partial y_2 & \partial g_3/\partial y_2 & \cdots & \partial g_n/\partial y_2 \\ \partial g_2/\partial y_3 & \partial g_3/\partial y_3 & \cdots & \partial g_n/\partial y_3 \\ \vdots & \vdots & & \vdots \\ \partial g_2/\partial y_n & \partial g_3/\partial y_n & \cdots & \partial g_n/\partial y_n \end{vmatrix} = \left|\;\partial g_2/\partial y_k \;\;\; \partial g_3/\partial y_k \;\;\cdots\;\; \partial g_n/\partial y_k\;\right|. \quad (7.48)$$
Let us calculate the derivatives appearing in $J_{n-1} = \left|\partial g_i/\partial y_j\right|$ explicitly. From (7.47), for $i = 2, \ldots, n$ we can write
$$\frac{\partial g_i}{\partial y_j} = \frac{\partial x_i}{\partial y_j} + \frac{\partial x_i}{\partial y_1}\,\frac{\partial\xi}{\partial y_j}\,,$$
while the first variable satisfies
$$x_1 = f_1(y_1, y_2, \ldots, y_n)\,. \quad (7.49)$$
Here $x_1$ is fixed, and hence the dependence of $\xi$ (or of $y_1$, see Eq. (7.46)) on the other variables is implicit. Differentiating both sides of (7.49) with respect to $y_j$, and keeping in mind that $y_1$ also depends on $y_j$ via Eq. (7.46), we obtain
$$0 = \frac{\partial x_1}{\partial y_j} + \frac{\partial x_1}{\partial y_1}\,\frac{\partial\xi}{\partial y_j} \;\Longrightarrow\; \frac{\partial\xi}{\partial y_j} = -\frac{\partial x_1/\partial y_j}{\partial x_1/\partial y_1} \equiv -\alpha_j\,,$$
so that $\partial g_i/\partial y_k = \partial x_i/\partial y_k - \alpha_k\left(\partial x_i/\partial y_1\right)$ and
$$J_{n-1} = \left|\;\partial x_2/\partial y_k - \alpha_k\left(\partial x_2/\partial y_1\right) \;\;\cdots\;\; \partial x_n/\partial y_k - \alpha_k\left(\partial x_n/\partial y_1\right)\;\right|. \quad (7.50)$$
The first column contains a difference of two terms; according to Properties 1.4 and 1.5 of determinants (Sect. 1.2.6.2), we can split the first column and rewrite $J_{n-1}$ as two determinants:
$$J_{n-1} = \left|\;\partial x_2/\partial y_k \;\;\; \left(\partial x_3/\partial y_k\right) - \alpha_k\left(\partial x_3/\partial y_1\right) \;\;\cdots\;\; \left(\partial x_n/\partial y_k\right) - \alpha_k\left(\partial x_n/\partial y_1\right)\;\right|$$
$$-\,\frac{\partial x_2}{\partial y_1}\left|\;\alpha_k \;\;\; \left(\partial x_3/\partial y_k\right) - \alpha_k\left(\partial x_3/\partial y_1\right) \;\;\cdots\;\; \left(\partial x_n/\partial y_k\right) - \alpha_k\left(\partial x_n/\partial y_1\right)\;\right|.$$
Similarly, we can split the second column in both determinants:
$$J_{n-1} = \left|\;\partial x_2/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \left(\partial x_n/\partial y_k\right) - \alpha_k\left(\partial x_n/\partial y_1\right)\;\right|$$
$$-\,\frac{\partial x_3}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \alpha_k \;\;\cdots\;\; \left(\partial x_n/\partial y_k\right) - \alpha_k\left(\partial x_n/\partial y_1\right)\;\right|$$
$$-\,\frac{\partial x_2}{\partial y_1}\left|\;\alpha_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \left(\partial x_n/\partial y_k\right) - \alpha_k\left(\partial x_n/\partial y_1\right)\;\right|$$
$$+\,\frac{\partial x_2}{\partial y_1}\frac{\partial x_3}{\partial y_1}\left|\;\alpha_k \;\;\; \alpha_k \;\;\cdots\;\; \left(\partial x_n/\partial y_k\right) - \alpha_k\left(\partial x_n/\partial y_1\right)\;\right|.$$
The last determinant contains two identical columns and hence is equal to zero (Property 1.3 of determinants from Sect. 1.2.6.2). It is clear now that, if we continue this process and split all columns in $J_{n-1}$ of Eq. (7.50), we shall arrive at a sum of determinants in which the column $\alpha_k$ can appear only once; there will also be one determinant without $\alpha_k$, which we shall write first:
$$J_{n-1} = \left|\;\partial x_2/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right| - \frac{\partial x_2}{\partial y_1}\left|\;\alpha_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right|$$
$$-\,\frac{\partial x_3}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \alpha_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right| - \cdots - \frac{\partial x_n}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \alpha_k\;\right|. \quad (7.51)$$
Before proceeding, note that the remaining integration over $x_1$ can also be changed to an integration over $y_1$ at fixed $y_2, \ldots, y_n$, for which $dx_1 = (\partial x_1/\partial y_1)\,dy_1$; this turns the result into
$$I_n = \underbrace{\int\cdots\int}_{n} F\left(f_1(Y), \ldots, f_n(Y)\right)\left|J_{n-1}\,\frac{\partial x_1}{\partial y_1}\right| dy_1\cdots dy_n\,. \quad (7.52)$$
It therefore remains to show that $J_{n-1}\left(\partial x_1/\partial y_1\right) = J_n$. Multiplying (7.51) by $\partial x_1/\partial y_1$ and taking this factor inside the $\alpha_k$ columns, we obtain
$$J_{n-1}\frac{\partial x_1}{\partial y_1} = \frac{\partial x_1}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right| - \frac{\partial x_2}{\partial y_1}\left|\;\alpha_k\left(\partial x_1/\partial y_1\right) \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right|$$
$$-\,\frac{\partial x_3}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \alpha_k\left(\partial x_1/\partial y_1\right) \;\;\cdots\;\; \partial x_n/\partial y_k\;\right| - \cdots - \frac{\partial x_n}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \alpha_k\left(\partial x_1/\partial y_1\right)\;\right|$$
$$= \frac{\partial x_1}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right| - \frac{\partial x_2}{\partial y_1}\left|\;\partial x_1/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right|$$
$$-\,\frac{\partial x_3}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \partial x_1/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right| - \cdots - \frac{\partial x_n}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_1/\partial y_k\;\right|, \quad (7.53)$$
since $\alpha_k\left(\partial x_1/\partial y_1\right) = \partial x_1/\partial y_k$.
As the final step, in each of the determinants apart from the first and the second ones, we move the column with $\partial x_1/\partial y_k$ to the first position; the order of the other columns we do not change. This operation requires several pair permutations. If the column with $\partial x_1/\partial y_k$ is at the position of the $l$-th column (where $l = 1, 2, \ldots, n-1$), this requires $(l-1)$ permutations, giving a factor of $(-1)^{l-1}$. This yields
$$J_{n-1}\frac{\partial x_1}{\partial y_1} = \frac{\partial x_1}{\partial y_1}\left|\;\partial x_2/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right| + (-1)^1\frac{\partial x_2}{\partial y_1}\left|\;\partial x_1/\partial y_k \;\;\; \partial x_3/\partial y_k \;\;\cdots\;\; \partial x_n/\partial y_k\;\right|$$
$$+\cdots+ (-1)^{n-1}\frac{\partial x_n}{\partial y_1}\left|\;\partial x_1/\partial y_k \;\;\; \partial x_2/\partial y_k \;\;\cdots\;\; \partial x_{n-1}/\partial y_k\;\right|. \quad (7.54)$$
Looking now very carefully, we can verify that this expression is exactly the expansion of $J_n$ in Eq. (7.44) along the first row, i.e. $J_{n-1}\left(\partial x_1/\partial y_1\right)$ is indeed $J_n$, and hence Eq. (7.52) is the required formula (7.43). Q.E.D.
Problem 7.11. Prove the formula for the multidimensional Gaussian integral,
$$I_n = \int_{-\infty}^{\infty} dx_1\int_{-\infty}^{\infty} dx_2\cdots\int_{-\infty}^{\infty} dx_n\,\exp\left(-\frac{1}{2}\sum_{i,j=1}^{n} q_{ij}\,x_i x_j\right) = \frac{(2\pi)^{n/2}}{\sqrt{|Q|}}\,, \quad (7.55)$$
following these steps: (i) write the quadratic form in the exponent in the matrix form as $-\frac{1}{2}X^T Q X$ with $X = (x_i)$ and $Q = (q_{ij})$; (ii) introduce new variables $Y = U^T X = (y_i)$ with the orthogonal matrix $U$ which diagonalises the matrix $Q = UDU^T$, where $D = \left(\delta_{ij}\,d_i\right)$ is the diagonal matrix of the eigenvalues $d_i$ of $Q$; (iii) change variables in the integral from $X$ to $Y$ and explain why the Jacobian $J = \left|\partial x_i/\partial y_j\right|$ is equal to unity; (iv) then the $n$-fold integral splits into a product of $n$ independent Gaussian integrals $\int_{-\infty}^{\infty}\exp\left(-\frac{1}{2}d_i y_i^2\right)dy_i = \sqrt{2\pi/d_i}$; (v) finally, multiply all such contributions to get the required result.
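Eq. (7.55) is easy to test numerically. The following sketch (my own, with an arbitrarily chosen positive-definite $Q$) compares a brute-force grid integration for $n = 2$ with the closed-form answer:

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # symmetric, positive definite (n = 2)

# brute-force integration on a grid; the integrand decays fast, so [-10, 10] suffices
x = np.linspace(-10, 10, 801)
X, Y = np.meshgrid(x, x, indexing='ij')
quad = Q[0, 0]*X**2 + 2*Q[0, 1]*X*Y + Q[1, 1]*Y**2
integral = np.trapz(np.trapz(np.exp(-0.5*quad), x), x)

exact = (2*np.pi)**(Q.shape[0]/2) / np.sqrt(np.linalg.det(Q))
print(integral, exact)              # both approximately 4.75
```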
7.6 N-dimensional Sphere

Consider the integral
$$I_N = \underbrace{\int\cdots\int}_{N} f\left(x_1^2 + \cdots + x_N^2\right)dx_1\cdots dx_N\,. \quad (7.56)$$
Here the integration is performed over the whole space. We shall now simplify this $N$-dimensional integral using a kind of heuristic argument.
Indeed, let us first calculate the integral
$$V_N = \underbrace{\int\cdots\int}_{x_1^2+\cdots+x_N^2\,\le\, R^2} dx_1\cdots dx_N\,,$$
i.e. the volume of the $N$-dimensional sphere $x_1^2 + \cdots + x_N^2 \le R^2$ of radius $R$. On dimensional grounds, $V_N = C_N R^N$ with some constant $C_N$, and the volume element can be written as
$$dV = dx_1\cdots dx_N = r^{N-1}\,dr\,d\Omega_{N-1}\,, \quad (7.57)$$
where $r = \left(x_1^2 + \cdots + x_N^2\right)^{1/2}$ is the distance to the centre of the coordinate system, which can be considered as the "radial" coordinate of the $N$-dimensional spherical coordinate system, and $d\Omega_{N-1}$ is the element of the corresponding solid angle, so that
$$V_N = \int_0^R r^{N-1}\,dr\int d\Omega_{N-1}\,. \quad (7.58)$$
We shall show now that, for functions $f(r^2)$ which depend only on the distance $r$, the precise form of the angular part, $d\Omega_{N-1}$, is not important. Indeed, assuming that the angular integral does not depend on $r$, the $r$ integration can be performed first:
$$V_N = \frac{R^N}{N}\int d\Omega_{N-1} \;\Longrightarrow\; \int d\Omega_{N-1} = N C_N\,. \quad (7.59)$$
To calculate the constant $C_N$, let us consider the integral (7.56) with $f(x) = e^{-x}$. We can write
$$\int\cdots\int \exp\left[-\left(x_1^2 + \cdots + x_N^2\right)\right]dx_1\cdots dx_N = \prod_{i=1}^{N}\int_{-\infty}^{\infty} e^{-x^2}dx = \pi^{N/2}\,.$$
On the other hand, the same integral can be written via the corresponding radial and angular arguments as
$$\int_0^\infty e^{-r^2} r^{N-1}\,dr\int d\Omega_{N-1} = \frac{1}{2}\,\Gamma\!\left(\frac{N}{2}\right)\int d\Omega_{N-1}\,,$$
where we have calculated the radial integral explicitly. Since the angular integration gives $N C_N$, see Eq. (7.59), we immediately obtain that
$$N C_N\,\frac{1}{2}\,\Gamma\!\left(\frac{N}{2}\right) = \pi^{N/2} \;\Longrightarrow\; C_N = \frac{2\pi^{N/2}}{N\,\Gamma\!\left(\frac{N}{2}\right)}\,. \quad (7.60)$$
This solves the problem of calculating the volume of the $N$-dimensional sphere:
$$V_N = C_N R^N = \frac{2\pi^{N/2} R^N}{N\,\Gamma\!\left(\frac{N}{2}\right)}\,. \quad (7.61)$$
Correspondingly, differentiating $V_N$ with respect to $R$, one obtains the surface area of the $(N-1)$-dimensional sphere of radius $R$:
$$S_{N-1}(R) = \frac{2\pi^{N/2} R^{N-1}}{\Gamma\!\left(\frac{N}{2}\right)}\,. \quad (7.62)$$
Now we are able to return to the integral (7.56). Noting that the argument of the function $f$ in the integrand is simply $r^2$, we obtain
$$I_N = \int_0^\infty f\left(r^2\right)r^{N-1}\,dr\int d\Omega_{N-1} = \frac{2\pi^{N/2}}{\Gamma\!\left(\frac{N}{2}\right)}\int_0^\infty f\left(r^2\right)r^{N-1}\,dr\,, \quad (7.63)$$
where we have used Eqs. (7.59) and (7.60) for the angular integral. This is the final result.
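The volume formula (7.61) can be checked against a simple Monte Carlo estimate (my own sketch, not from the text): random points are drawn uniformly in the enclosing cube, and the hit rate inside the unit sphere is multiplied by the cube's volume.

```python
import numpy as np
from math import gamma, pi

def v_sphere(N, R=1.0):
    return 2 * pi**(N/2) * R**N / (N * gamma(N/2))     # Eq. (7.61)

rng = np.random.default_rng(0)
for N in (2, 3, 4, 5):
    pts = rng.uniform(-1.0, 1.0, size=(200_000, N))
    mc = 2.0**N * np.mean(np.sum(pts**2, axis=1) <= 1.0)  # cube volume * hit rate
    print(N, round(v_sphere(N), 4), round(mc, 4))          # formula vs Monte Carlo
```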
The essential point of our discussion above is an assumption, made in Eq. (7.58), that the volume element can be written as in Eq. (7.57). Although this formula is based on the dimensional argument, it is worth deriving it. We shall do this by generalising spherical coordinates to the $N$-dimensional space, $(x_1, \ldots, x_N) \to (r, \varphi_1, \ldots, \varphi_{N-1})$, where the two systems of coordinates are related to each other via the following transformation equations:
$$x_1 = r\cos\varphi_1\,,\quad x_2 = r\sin\varphi_1\cos\varphi_2\,,\quad x_3 = r\sin\varphi_1\sin\varphi_2\cos\varphi_3\,,\;\ldots,$$
$$x_i = r\sin\varphi_1\sin\varphi_2\cdots\sin\varphi_{i-1}\cos\varphi_i\,,\;\ldots,$$
$$x_{N-1} = r\sin\varphi_1\sin\varphi_2\cdots\sin\varphi_{N-2}\cos\varphi_{N-1}\,,$$
$$x_N = r\sin\varphi_1\sin\varphi_2\cdots\sin\varphi_{N-2}\sin\varphi_{N-1}\,.$$
Here $r \ge 0$ and $0 \le \varphi_{N-1} \le 2\pi$, while all other angles range between 0 and $\pi$.
The volume elements in the two coordinate systems, the Cartesian and the $N$-dimensional spherical ones, are related by the Jacobian (Sect. 7.5):
$$dx_1\,dx_2\cdots dx_N = \left|\frac{\partial(x_1, x_2, \ldots, x_N)}{\partial(r, \varphi_1, \ldots, \varphi_{N-1})}\right| dr\,d\varphi_1\cdots d\varphi_{N-1} = |J_N|\,dr\,d\varphi_1\cdots d\varphi_{N-1}\,.$$
Since each $x_i$ is proportional to $r$, each derivative $\partial x_i/\partial\varphi_j$ is proportional to $r$, while $\partial x_i/\partial r$ does not contain $r$ at all; taking the common factor $r$ out of each of the $N-1$ angular columns of the determinant, we conclude that $J_N = r^{N-1} F(\varphi_1, \ldots, \varphi_{N-1})$, which is precisely the structure assumed in Eq. (7.57). A full expression for the Jacobian (i.e. the function $F$) can also be derived, if needed, by calculating the Jacobian determinant explicitly. This would be needed when calculating integrals in which the integrand, apart from $r$, depends also on other coordinates (or their combinations).
7.7 Gradient of a Scalar Field

In this and the following sections we shall revisit several important notions of vector calculus introduced in Sects. I.5.8, I.6.6.1 and I.6.6.2. There we obtained explicit formulae for the gradient, divergence and curl in the Cartesian system. Our task here is to generalise these to a general curvilinear coordinate system. This will allow us in the next chapter to consider PDEs of mathematical physics exploiting the symmetry of the problem at hand. Although the derivation can be done for a very general case, we shall limit ourselves here only to orthogonal curvilinear coordinate systems, as these are most frequently found in actual applications.
Consider a scalar field $\Psi(P)$ defined at each point $P(x, y, z)$ in a 3D region $R$. The directional derivative, $d\Psi/dl$, of $\Psi(P)$ was defined in Sect. I.5.8 as the rate of change of the scalar field along a direction specified by the unit vector $\mathbf{l}$. Then the gradient of $\Psi(P)$, written as $\operatorname{grad}\Psi(P)$, was defined as a vector satisfying the relation
$$\left(\operatorname{grad}\Psi\right)\cdot\mathbf{l} = \frac{d\Psi}{dl}\,. \quad (7.64)$$
Note that the gradient does not depend on the direction $\mathbf{l}$, only on the behaviour of the field near the point $P$.
Our task now is to obtain an expression for the gradient in a general orthogonal curvilinear coordinate system. In order to do that, we first expand the gradient (which is a vector field) in terms of the unit base vectors of this system:
$$\operatorname{grad}\Psi = \sum_{j=1}^{3}\left(\operatorname{grad}\Psi\right)_j\mathbf{e}_j\,. \quad (7.65)$$
Multiplying both sides by $\mathbf{e}_i$ and using the fact that the system is orthogonal, $\mathbf{e}_i\cdot\mathbf{e}_j = \delta_{ij}$, we have
$$\left(\operatorname{grad}\Psi\right)\cdot\mathbf{e}_i = \left(\operatorname{grad}\Psi\right)_i\,. \quad (7.66)$$
Comparing this equation with Eq. (7.64), we see that the $i$-th component of $\operatorname{grad}\Psi$ is provided by the directional derivative of $\Psi$ along the $q_i$ coordinate line, i.e. along the direction $\mathbf{e}_i$:
$$\left(\operatorname{grad}\Psi\right)_i = \frac{d\Psi}{ds_i}\,, \quad (7.67)$$
where $ds_i = h_i\,dq_i$ is the corresponding distance in space associated with the change of $q_i$ from $q_i$ to $q_i + dq_i$, see (7.28). Therefore, in order to calculate $\left(\operatorname{grad}\Psi\right)_i$, we have to calculate the change $d\Psi$ of $\Psi$ along the direction $\mathbf{e}_i$. In this direction only the coordinate $q_i$ is changing, by $dq_i$, i.e.
$$d\Psi = \frac{\partial\Psi}{\partial q_i}\,dq_i\,, \quad (7.68)$$
so that
$$\left(\operatorname{grad}\Psi\right)_i = \frac{d\Psi}{ds_i} = \frac{\left(\partial\Psi/\partial q_i\right)dq_i}{h_i\,dq_i} = \frac{1}{h_i}\frac{\partial\Psi}{\partial q_i}\,, \quad (7.69)$$
and hence
$$\operatorname{grad}\Psi = \sum_{i=1}^{3}\frac{1}{h_i}\frac{\partial\Psi}{\partial q_i}\,\mathbf{e}_i\,. \quad (7.70)$$
It is easy to see that this expression indeed generalises the expression for the gradient,
$$\operatorname{grad}\Psi(\mathbf{r}) = \frac{\partial\Psi}{\partial x}\,\mathbf{i} + \frac{\partial\Psi}{\partial y}\,\mathbf{j} + \frac{\partial\Psi}{\partial z}\,\mathbf{k}\,, \quad (7.71)$$
see Eq. (I.5.69), derived for the Cartesian system. In this case $(q_1, q_2, q_3) \to (x, y, z)$, $(\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3) \to (\mathbf{i}, \mathbf{j}, \mathbf{k})$ and $h_1 = h_2 = h_3 = 1$, so that we immediately recover our previous result. For cylindrical coordinates $(q_1, q_2, q_3) \to (r, \varphi, z)$ and $h_r = h_z = 1$, $h_\varphi = r$, so that in this case
$$\operatorname{grad}\Psi = \frac{\partial\Psi}{\partial r}\,\mathbf{e}_r + \frac{1}{r}\frac{\partial\Psi}{\partial\varphi}\,\mathbf{e}_\varphi + \frac{\partial\Psi}{\partial z}\,\mathbf{e}_z\,. \quad (7.72)$$
Similarly, for spherical coordinates $(r, \theta, \varphi)$, where $h_r = 1$, $h_\theta = r$ and $h_\varphi = r\sin\theta$,
$$\operatorname{grad}\Psi = \frac{\partial\Psi}{\partial r}\,\mathbf{e}_r + \frac{1}{r}\frac{\partial\Psi}{\partial\theta}\,\mathbf{e}_\theta + \frac{1}{r\sin\theta}\frac{\partial\Psi}{\partial\varphi}\,\mathbf{e}_\varphi\,. \quad (7.73)$$
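As a sanity check of Eq. (7.72), the following sympy sketch (my own, with an arbitrary test field $\Psi = xy + z$) computes the cylindrical gradient and maps it back onto $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$, recovering the Cartesian result $(y, x, 1)$:

```python
import sympy as sp

r, phi, z = sp.symbols('r phi z', positive=True)
x, y = r*sp.cos(phi), r*sp.sin(phi)

Psi = x*y + z                                    # test scalar field
g_r, g_phi, g_z = Psi.diff(r), Psi.diff(phi)/r, Psi.diff(z)   # components of Eq. (7.72)

# e_r, e_phi, e_z expressed via i, j, k (rows of M, Eq. (7.16))
e_r   = sp.Matrix([sp.cos(phi),  sp.sin(phi), 0])
e_phi = sp.Matrix([-sp.sin(phi), sp.cos(phi), 0])
e_z   = sp.Matrix([0, 0, 1])

grad = sp.simplify(g_r*e_r + g_phi*e_phi + g_z*e_z)
print(grad.T)    # -> [r*sin(phi), r*cos(phi), 1], i.e. (y, x, 1) as expected
```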
Problem 7.16. Consider a particle of unit mass moving within the $z = 0$ plane in a central field with the potential $U(r) = -\alpha/r$, where $r$ is the distance from the centre. Show that the force field, $\mathbf{F} = -\operatorname{grad}U$, acting on the particle in this coordinate system is radial, $\mathbf{F} = -\alpha r^{-2}\,\mathbf{e}_r$.
7.8 Divergence of a Vector Field

Consider a vector field $\mathbf{F}(P)$. We showed in Sect. I.6.6.1 that the divergence of the vector field at the point $P$ is given by the flux of $\mathbf{F}$ through a closed surface $S$ of volume $V$ containing the point $P$:
$$\operatorname{div}\mathbf{F} = \lim_{V\to 0}\frac{1}{V}\oint_S \mathbf{F}\cdot d\mathbf{S} = \lim_{V\to 0}\frac{1}{V}\oint_S \mathbf{F}\cdot\mathbf{n}\,dS\,, \quad (7.74)$$
where $\mathbf{n}$ is the outward unit normal to the surface. We expand the vector field in the unit base vectors of the curvilinear system:
$$\mathbf{F}(q_1, q_2, q_3) = \sum_{i=1}^{3} F_i(q_1, q_2, q_3)\,\mathbf{e}_i\,. \quad (7.75)$$
We first calculate the flux of $\mathbf{F}$ across the coordinate surface $EFGH$ crossing the coordinate line $q_1$ at the value $q_1 + \frac{1}{2}\delta q_1$. In an orthogonal system that surface can be considered, to the leading order, rectangular (its sides are orthogonal), with the sides $HE = GF = h_2\,\delta q_2$ and $GH = FE = h_3\,\delta q_3$. Its area therefore is $dS_1 = h_2 h_3\,\delta q_2\,\delta q_3$. Also, the outward normal to the surface is $\mathbf{n} = \mathbf{e}_1$ since, again, we consider an orthogonal system. Therefore, the flux through this surface is
$$d\Phi_1^{+} = \left(\mathbf{F}\cdot\mathbf{e}_1\right)dS_1 = \left(F_1 h_2 h_3\right)\delta q_2\,\delta q_3\,,$$
since $\mathbf{F}\cdot\mathbf{e}_1 = F_1$, see Eq. (7.75). The expression in the round brackets above is to be calculated at the central point of the surface, $\left(q_1 + \frac{1}{2}\delta q_1, q_2, q_3\right)$. Applying the Taylor expansion, this expression can be calculated, to first order, as
$$d\Phi_1^{+} = \left[\left(F_1 h_2 h_3\right)_P + \left(\frac{\partial}{\partial q_1}\left(F_1 h_2 h_3\right)\right)_P\frac{\delta q_1}{2}\right]\delta q_2\,\delta q_3 = \left(F_1 h_2 h_3\right)_P\delta q_2\,\delta q_3 + \left(\frac{\partial}{\partial q_1}\left(F_1 h_2 h_3\right)\right)_P\frac{\delta q_1}{2}\,\delta q_2\,\delta q_3\,. \quad (7.76)$$
The flux through the opposite face $IJKL$, crossing the $q_1$ line at $q_1 - \frac{1}{2}\delta q_1$, is similarly
$$d\Phi_1^{-} = -\left(F_1 h_2 h_3\right)_P\delta q_2\,\delta q_3 + \left(\frac{\partial}{\partial q_1}\left(F_1 h_2 h_3\right)\right)_P\frac{\delta q_1}{2}\,\delta q_2\,\delta q_3\,; \quad (7.77)$$
the minus sign is due to the fact that for this surface the outward normal is $\mathbf{n} = -\mathbf{e}_1$ and hence $\mathbf{F}\cdot\mathbf{n} = -F_1$. Thus, the total outward flux across the opposite pair of surfaces $EFGH$ and $IJKL$ is equal to
$$d\Phi_1 = d\Phi_1^{+} + d\Phi_1^{-} = \left(\frac{\partial}{\partial q_1}\left(F_1 h_2 h_3\right)\right)_P\delta q_1\,\delta q_2\,\delta q_3\,. \quad (7.78)$$
The same analysis is repeated for the other two pairs of opposite faces to yield the contributions
$$d\Phi_2 = \left(\frac{\partial}{\partial q_2}\left(F_2 h_3 h_1\right)\right)_P\delta q_1\,\delta q_2\,\delta q_3 \quad\text{and}\quad d\Phi_3 = \left(\frac{\partial}{\partial q_3}\left(F_3 h_1 h_2\right)\right)_P\delta q_1\,\delta q_2\,\delta q_3$$
for the $q_2$ and $q_3$ surfaces, respectively. Summing up all three contributions leads to the final value of the flux through the whole surface $S$:
$$\left[\frac{\partial}{\partial q_1}\left(F_1 h_2 h_3\right) + \frac{\partial}{\partial q_2}\left(F_2 h_3 h_1\right) + \frac{\partial}{\partial q_3}\left(F_3 h_1 h_2\right)\right]\delta q_1\,\delta q_2\,\delta q_3\,, \quad (7.79)$$
to the leading order. The volume of the parallelepiped is, to the same order, $V = h_1 h_2 h_3\,\delta q_1\,\delta q_2\,\delta q_3$ (7.80). Finally, dividing the flux (7.79) by the volume (7.80), we obtain
$$\operatorname{div}\mathbf{F} = \frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial q_1}\left(F_1 h_2 h_3\right) + \frac{\partial}{\partial q_2}\left(h_1 F_2 h_3\right) + \frac{\partial}{\partial q_3}\left(h_1 h_2 F_3\right)\right]. \quad (7.81)$$
Note that it might seem that we have never applied the limit $V \to 0$. In reality, we did: in the expressions for the fluxes and the volume we kept only the leading terms; the other terms are proportional to higher powers of $\delta q_i$ and would have disappeared in the limits $\delta q_i \to 0$ ($i = 1, 2, 3$), which correspond to the volume tending to zero. Hence, the result above is the general formula we sought.
Let us verify that this formula goes over to the result we had for the Cartesian system $(x, y, z)$,
$$\operatorname{div}\mathbf{F} = \frac{\partial F_x}{\partial x} + \frac{\partial F_y}{\partial y} + \frac{\partial F_z}{\partial z}\,,$$
see Eq. (I.6.79). In this case all scale factors are equal to one, and our result does indeed reduce to the one written above, as expected.
For cylindrical coordinates $(r, \varphi, z)$ the scale factors read $h_r = h_z = 1$ and $h_\varphi = r$, so that we find
$$\operatorname{div}\mathbf{F} = \frac{1}{r}\left[\frac{\partial}{\partial r}\left(F_r r\right) + \frac{\partial F_\varphi}{\partial\varphi} + \frac{\partial}{\partial z}\left(F_z r\right)\right] = \frac{1}{r}\frac{\partial}{\partial r}\left(r F_r\right) + \frac{1}{r}\frac{\partial F_\varphi}{\partial\varphi} + \frac{\partial F_z}{\partial z}\,. \quad (7.83)$$
Only the first two terms are to be kept if the two-dimensional polar coordinate system $(r, \varphi)$ is considered.
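A quick sympy confirmation (my own sketch) that the cylindrical divergence (7.83) agrees with the Cartesian one: the field $\mathbf{F} = x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}$ has $\operatorname{div}\mathbf{F} = 3$, and in cylindrical components it is simply $F_r = r$, $F_\varphi = 0$, $F_z = z$.

```python
import sympy as sp

r, phi, z = sp.symbols('r phi z', positive=True)
F_r, F_phi, F_z = r, 0, z              # the field x i + y j + z k = r e_r + z e_z

div = sp.diff(r*F_r, r)/r + sp.diff(F_phi, phi)/r + sp.diff(F_z, z)   # Eq. (7.83)
print(sp.simplify(div))                # -> 3, matching the Cartesian calculation
```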
Problem 7.17. Show that the divergence of a vector field $\mathbf{F}$ in the spherical system is
$$\operatorname{div}\mathbf{F} = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2 F_r\right) + \frac{1}{r\sin\theta}\frac{\partial}{\partial\theta}\left(F_\theta\sin\theta\right) + \frac{1}{r\sin\theta}\frac{\partial F_\varphi}{\partial\varphi}\,. \quad (7.84)$$

Problem 7.18. Using the spherical system, calculate the divergence of the vector field $\mathbf{F} = r^{-2}\,\mathbf{e}_r + r\sin\theta\,\mathbf{e}_\theta$ at the point $P\left(r = 1, \theta = \frac{\pi}{2}, \varphi = \frac{\pi}{2}\right)$. [Answer: $2\cos\theta = 0$.]
7.9 Laplacian
The Laplacian of a scalar field, $\Delta\Psi = \operatorname{div}\left(\operatorname{grad}\Psi\right)$, can now be obtained by combining the general results for the gradient and the divergence. Indeed, consider the vector field
$$\mathbf{F} = F_1\mathbf{e}_1 + F_2\mathbf{e}_2 + F_3\mathbf{e}_3 = \operatorname{grad}\Psi = \sum_{i=1}^{3}\frac{1}{h_i}\frac{\partial\Psi}{\partial q_i}\,\mathbf{e}_i\,, \quad (7.85)$$
where we have used Eq. (7.70). Next, this vector field can be turned into a scalar field by applying the divergence operation, for which we derived the general expression (7.81). We can now combine the two expressions to evaluate $\Delta\Psi$ if we notice that the components of the vector $\mathbf{F}$ above are
$$F_i = \frac{1}{h_i}\frac{\partial\Psi}{\partial q_i}\,. \quad (7.86)$$
Therefore, substituting (7.86) into (7.81) gives the final expression for the Laplacian sought:
$$\Delta\Psi = \frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial q_1}\left(\frac{h_2 h_3}{h_1}\frac{\partial\Psi}{\partial q_1}\right) + \frac{\partial}{\partial q_2}\left(\frac{h_3 h_1}{h_2}\frac{\partial\Psi}{\partial q_2}\right) + \frac{\partial}{\partial q_3}\left(\frac{h_1 h_2}{h_3}\frac{\partial\Psi}{\partial q_3}\right)\right]. \quad (7.87)$$
In the Cartesian system all scale factors are equal to one, and we recover the familiar expression
$$\Delta\Psi = \frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + \frac{\partial^2\Psi}{\partial z^2}\,,$$
while for cylindrical coordinates $(r, \varphi, z)$ one obtains
$$\Delta\Psi = \frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial\Psi}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2\Psi}{\partial\varphi^2} + \frac{\partial^2\Psi}{\partial z^2}\,. \quad (7.88)$$
Problem 7.19. Show that the Laplacian in the spherical system $(r, \theta, \varphi)$ is
$$\Delta\Psi = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\Psi}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\Psi}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\Psi}{\partial\varphi^2}\,. \quad (7.89)$$

Problem 7.20. Calculate $\Delta\Psi$ in the spherical system for $\Psi = e^{-r^2}\cos\theta$. [Answer: $-2e^{-r^2}\left(3 - 2r^2 + r^{-2}\right)\cos\theta$.]

Problem 7.21. Show that in parabolic coordinates (see Problem 7.4) the Laplacian has the form:
$$\Delta\Psi = \frac{1}{u^2 + v^2}\left[\frac{1}{u}\frac{\partial}{\partial u}\left(u\frac{\partial\Psi}{\partial u}\right) + \frac{1}{v}\frac{\partial}{\partial v}\left(v\frac{\partial\Psi}{\partial v}\right)\right] + \frac{1}{(uv)^2}\frac{\partial^2\Psi}{\partial\varphi^2}\,.$$
Problem 7.22. Show that, for $r \neq 0$,
$$\Delta\left(\frac{e^{-r}}{r}\right) = \frac{e^{-r}}{r}\,.$$
Hence, show that the function $\psi(\mathbf{r}) = e^{-r}/r$ satisfies the differential equation
$$\left(\Delta - 1\right)\psi(\mathbf{r}) = -4\pi\,\delta(\mathbf{r})\,.$$
7.10 Curl of a Vector Field

To calculate the curl of a vector field $\mathbf{F}$ in curvilinear coordinates, we shall use the intrinsic definition of the curl given in Sect. I.6.6.2. It states that the curl of $\mathbf{F}$ at a point $P$ can be defined in the following way: choose an arbitrary smooth surface $S$ through the point $P$ and draw a contour $L$ around that point within the surface. Then the curl of $\mathbf{F}$ at the point $P$ is given by the limit in which the area $A$ enclosed by the contour tends to zero, keeping the point $P$ inside it:
$$\operatorname{curl}\mathbf{F}\cdot\mathbf{n} = \lim_{A\to 0}\frac{1}{A}\oint_L \mathbf{F}\cdot d\mathbf{l}\,, \quad (7.90)$$
where $\mathbf{n}$ is the normal to the surface at the point $P$, and in the line integral the contour is traversed such that the point $P$ is always on the left. The direction of the normal $\mathbf{n}$ is related to the direction of the traverse around the contour $L$ by the right-hand screw rule.
To calculate the curl of $\mathbf{F}$, we note that the limit should not depend on the shape of the region enclosed by the contour $L$ (or the shape of the latter) as long as the point $P$ remains inside it when taking the limit ($\operatorname{curl}\mathbf{F}$ is a well-defined vector field which depends only on $\mathbf{F}$ near the point $P$). Hence, to perform the calculation, it is convenient to choose the surface $S$ as the $q_i$ coordinate surface for the calculation of the $i$-th component of the curl. Indeed, with such a choice, and for an orthogonal curvilinear system, the normal $\mathbf{n}$ to the surface at the point $P$ coincides with the unit base vector $\mathbf{e}_i$. The case of $i = 1$ is illustrated in Fig. 7.7. Hence, with that particular choice of $S$, from Eq. (7.90),
$$\operatorname{curl}\mathbf{F}\cdot\mathbf{n} = \operatorname{curl}\mathbf{F}\cdot\mathbf{e}_i = \left(\operatorname{curl}\mathbf{F}\right)_i = \lim_{A\to 0}\frac{1}{A}\oint_L \mathbf{F}\cdot d\mathbf{l}\,.$$
Therefore, to calculate the $i$-th component of the curl, we can simply choose the $q_i$ coordinate surface passing through the point $P$. In addition, we can also choose the contour $L$ to go along the other two coordinate lines; see again Fig. 7.7, where in the case of $i = 1$ the contour is a distorted rectangle with its sides running along the $q_2$ and $q_3$ lines shifted from their values at the point $P$ by $\pm\frac{1}{2}\delta q_2$ and $\pm\frac{1}{2}\delta q_3$, respectively.
Let us calculate $\left(\operatorname{curl}\mathbf{F}\right)_1$ at $P$. The surface $S$ is taken as the $q_1$ coordinate surface, with the contour $L$ being $ABCD$ in Fig. 7.7, traversed anti-clockwise as indicated (note the direction of the normal $\mathbf{e}_1$ to $S$). The line integral along the contour contains four contributions which we have to calculate one by one. The line $AB$ corresponds to the $q_3$ coordinate line through the point $F\left(q_1, q_2 + \frac{1}{2}\delta q_2, q_3\right)$. To the leading order, $\mathbf{F}$ can be considered fixed at its value at the point $F$, resulting in the contribution
$$\int_{AB}\mathbf{F}\cdot d\mathbf{l} = \int_{AB}\mathbf{F}\cdot\mathbf{e}_3\,dl = \int_{AB} F_3\,dl = \left(F_3 h_3\right)_F\delta q_3\,,$$
where $h_3\,\delta q_3$ is the length of the line $AB$. The expression in the round brackets is to be calculated at the point $F$, as indicated above. Similarly, the contribution from the opposite piece $CD$ is
$$\int_{CD}\mathbf{F}\cdot d\mathbf{l} = -\left(F_3 h_3\right)_H\delta q_3\,,$$
where the minus sign comes from the fact that in this case the direction is opposite to that of $\mathbf{e}_3$, i.e. $d\mathbf{l} = -\mathbf{e}_3\,dl$, and the index $H$ means that the expression in the round brackets is to be calculated at the point $H\left(q_1, q_2 - \frac{1}{2}\delta q_2, q_3\right)$. The latter differs from $F$ only in its second coordinate. Therefore, the sum of these two contributions is
$$\int_{AB+CD}\mathbf{F}\cdot d\mathbf{l} = \left(F_3 h_3\right)_F\delta q_3 - \left(F_3 h_3\right)_H\delta q_3 = \left[\left(F_3 h_3\right)_F - \left(F_3 h_3\right)_H\right]\delta q_3\,.$$
The terms in the square brackets can be expanded in a Taylor series, keeping the first two terms:
$$\left(F_3 h_3\right)_{q_2+\delta q_2/2} = \left(F_3 h_3\right)_{q_2} + \left(\frac{\partial}{\partial q_2}\left(F_3 h_3\right)\right)_{q_2}\frac{\delta q_2}{2} = \left(F_3 h_3\right)_P + \left(\frac{\partial}{\partial q_2}\left(F_3 h_3\right)\right)_P\frac{\delta q_2}{2}\,,$$
$$\left(F_3 h_3\right)_{q_2-\delta q_2/2} = \left(F_3 h_3\right)_{q_2} - \left(\frac{\partial}{\partial q_2}\left(F_3 h_3\right)\right)_{q_2}\frac{\delta q_2}{2} = \left(F_3 h_3\right)_P - \left(\frac{\partial}{\partial q_2}\left(F_3 h_3\right)\right)_P\frac{\delta q_2}{2}\,,$$
so that
$$\int_{AB+CD}\mathbf{F}\cdot d\mathbf{l} = \left[\left(F_3 h_3\right)_{q_2+\delta q_2/2} - \left(F_3 h_3\right)_{q_2-\delta q_2/2}\right]\delta q_3 = \left(\frac{\partial}{\partial q_2}\left(F_3 h_3\right)\right)_P\delta q_2\,\delta q_3\,.$$
In a similar manner, we find that the line integrals along $DA$ and $BC$ sum to
$$\int_{DA+BC}\mathbf{F}\cdot d\mathbf{l} = \left[\left(F_2 h_2\right)_{q_3-\delta q_3/2} - \left(F_2 h_2\right)_{q_3+\delta q_3/2}\right]\delta q_2 = -\left(\frac{\partial}{\partial q_3}\left(F_2 h_2\right)\right)_P\delta q_2\,\delta q_3\,.$$
Summing up all four contributions, dividing by the area $A = h_2 h_3\,\delta q_2\,\delta q_3$ and taking the limit, we obtain
$$\left(\operatorname{curl}\mathbf{F}\right)_1 = \frac{1}{h_2 h_3}\left[\frac{\partial}{\partial q_2}\left(h_3 F_3\right) - \frac{\partial}{\partial q_3}\left(h_2 F_2\right)\right].$$
This expression is finite; any other terms, corresponding to higher orders in $\delta q_i$, vanish in the $A \to 0$ (or $\delta q_i \to 0$) limit.
The other two components of the curl, namely $\left(\operatorname{curl}\mathbf{F}\right)_2$ and $\left(\operatorname{curl}\mathbf{F}\right)_3$, are obtained in a similar fashion (perform this calculation as an exercise!), giving in the end:
$$\operatorname{curl}\mathbf{F} = \frac{\mathbf{e}_1}{h_2 h_3}\left[\frac{\partial}{\partial q_2}\left(h_3 F_3\right) - \frac{\partial}{\partial q_3}\left(h_2 F_2\right)\right] + \frac{\mathbf{e}_2}{h_3 h_1}\left[\frac{\partial}{\partial q_3}\left(h_1 F_1\right) - \frac{\partial}{\partial q_1}\left(h_3 F_3\right)\right] + \frac{\mathbf{e}_3}{h_1 h_2}\left[\frac{\partial}{\partial q_1}\left(h_2 F_2\right) - \frac{\partial}{\partial q_2}\left(h_1 F_1\right)\right]. \quad (7.93)$$
It is immediately seen that in the Cartesian coordinates we recover our old Eq. (I.6.73):
$$\operatorname{curl}\mathbf{F}(\mathbf{r}) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ F_x & F_y & F_z \end{vmatrix}. \quad (7.95)$$
Problem 7.23. Show that the curl in the cylindrical system $(r, \varphi, z)$ is
$$\operatorname{curl}\mathbf{F} = \mathbf{e}_r\left(\frac{1}{r}\frac{\partial F_z}{\partial\varphi} - \frac{\partial F_\varphi}{\partial z}\right) + \mathbf{e}_\varphi\left(\frac{\partial F_r}{\partial z} - \frac{\partial F_z}{\partial r}\right) + \frac{\mathbf{e}_z}{r}\left[\frac{\partial}{\partial r}\left(r F_\varphi\right) - \frac{\partial F_r}{\partial\varphi}\right]. \quad (7.96)$$

Problem 7.24. Show that the curl in the spherical system $(r, \theta, \varphi)$ is
$$\operatorname{curl}\mathbf{F} = \frac{\mathbf{e}_r}{r\sin\theta}\left[\frac{\partial}{\partial\theta}\left(F_\varphi\sin\theta\right) - \frac{\partial F_\theta}{\partial\varphi}\right] + \frac{\mathbf{e}_\theta}{r\sin\theta}\left[\frac{\partial F_r}{\partial\varphi} - \sin\theta\,\frac{\partial}{\partial r}\left(r F_\varphi\right)\right] + \frac{\mathbf{e}_\varphi}{r}\left[\frac{\partial}{\partial r}\left(r F_\theta\right) - \frac{\partial F_r}{\partial\theta}\right]. \quad (7.97)$$

Problem 7.25. Calculate in the spherical coordinates the curl of the vector field $\mathbf{F} = r^{-2}\,\mathbf{e}_r + r^2\sin\theta\,\mathbf{e}_\theta$. Show that only the second term in the vector field contributes. Hence, calculate the curl at the point $P\left(r = 1, \theta = \frac{\pi}{2}, \varphi = \frac{\pi}{2}\right)$. [Answer: $3r\sin\theta\,\mathbf{e}_\varphi = 3\mathbf{e}_\varphi$.]
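The spherical curl (7.97) can be evaluated mechanically; this sympy sketch (my own) checks the field of Problem 7.25 component by component:

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
F_r, F_th, F_ph = r**-2, r**2*sp.sin(th), 0      # the field of Problem 7.25

curl_r  = (sp.diff(F_ph*sp.sin(th), th) - sp.diff(F_th, ph)) / (r*sp.sin(th))
curl_th = (sp.diff(F_r, ph)/sp.sin(th) - sp.diff(r*F_ph, r)) / r
curl_ph = (sp.diff(r*F_th, r) - sp.diff(F_r, th)) / r        # Eq. (7.97)

print(sp.simplify(curl_r), sp.simplify(curl_th), sp.simplify(curl_ph))
# -> 0 0 3*r*sin(theta): only the e_theta term of F contributes, as stated
```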
7.11 Some Applications in Physics

Many problems in physics, such as heat and mass transport, wave propagation, etc., are described by PDEs. These are equations which are generally very difficult to solve. However, if a physical system has a symmetry, then an appropriate curvilinear coordinate system may help in solving these PDEs. Here we consider some of the well-known PDEs of mathematical physics in cylindrical and spherical coordinate systems and illustrate with simple examples how their solution can be obtained in some cases.
We shall start from the wave equation (Sect. I.6.7.1):
$$\Delta\Psi = \frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2}\,,$$
where $\Psi(\mathbf{r}, t)$ is the wave field of interest and $c$ is a constant corresponding to the speed of wave propagation. If the system possesses a cylindrical symmetry, then the Laplacian has the form (7.88) and the PDE can be rewritten as⁴
$$\frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial\Psi}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2\Psi}{\partial\varphi^2} + \frac{\partial^2\Psi}{\partial z^2} = \frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2}\,, \quad (7.98)$$
where $\Psi = \Psi(r, \varphi, z, t)$. Similarly, using the spherical Laplacian (7.89), one obtains the wave equation for $\Psi = \Psi(r, \theta, \varphi, t)$.
The diffusion or heat transport equations can be considered along exactly the same lines; the only difference is that on the right-hand side, instead of the second time derivative, we have the first.
As an example, consider a simple electrostatic problem of a potential $\varphi$ due to a radially symmetric charge distribution $\rho(r)$. We shall assume that $\rho(r) \neq 0$ only for $0 \le r \le R$, i.e. we consider a spherical charged ball of radius $R$ with a charge density which may change with the distance from the centre. In this case the potential $\varphi$, satisfying the Poisson equation
$$\Delta\varphi = -4\pi\rho\,,$$
will depend only on $r$, and hence we can drop the angular terms in the Laplacian, which leads us to a much simpler equation:
$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d\varphi}{dr}\right) = -4\pi\rho(r) \;\Longrightarrow\; \frac{d}{dr}\left(r^2\frac{d\varphi}{dr}\right) = -4\pi r^2\rho(r)\,.$$
This is an ordinary DE which can easily be solved. Integrating both sides between 0 and $r$, and assuming that the derivative of the potential at the centre is finite, we get
$$r^2\frac{d\varphi}{dr} = -4\pi\int_0^r r_1^2\,\rho(r_1)\,dr_1 \;\Longrightarrow\; \frac{d\varphi}{dr} = -\frac{Q(r)}{r^2}\,, \quad (7.100)$$
⁴ We keep the same notation for $\Psi$ as in Cartesian coordinates solely for convenience; however, when written via cylindrical coordinates, it becomes a different function of its arguments.
where
$$Q(r) = 4\pi\int_0^r r_1^2\,\rho(r_1)\,dr_1 = \int_{\text{Sphere}}\rho\,dV \quad (7.101)$$
is the total charge contained in the sphere of radius $r$. This is because $dV = 4\pi r_1^2\,dr_1$ corresponds exactly to the volume of a spherical shell of radius $r_1$ and width $dr_1$ (indeed, integrating over the angles $0 \le \theta \le \pi$ and $0 \le \phi \le 2\pi$ of the volume element $dV = r_1^2\sin\theta\,dr_1\,d\theta\,d\phi$ in the spherical system, one obtains this expression for $dV$).
Consider first the potential outside the sphere, $r \ge R$. In this case $Q(r) = Q(R) = Q_0$ is the total charge of the sphere. Then, integrating Eq. (7.100) between $r$ and $\infty$ and setting the potential $\varphi$ at infinity to zero, we obtain the simple result $\varphi(r) = Q_0/r$. Remarkably, this point-charge formula is valid for any distribution of charge inside the sphere, provided it remains spherically symmetric.
Consider now the potential inside the sphere, i.e. for $0 \le r \le R$. In this case $Q(r)$ corresponds to the charge inside the sphere of radius $r$. Integrating Eq. (7.100) between $r$ and $R$, we find
$$\varphi(r) = \varphi(R) + \int_r^R \frac{Q(r_1)\,dr_1}{r_1^2} = \frac{Q_0}{R} + \int_r^R \frac{Q(r_1)\,dr_1}{r_1^2}\,. \quad (7.102)$$
Here we set $\varphi(R) = Q_0/R$ to ensure continuity of the solution across the boundary of the sphere.
It is immediately seen from this formula that the potential inside a spherical layer of finite width is constant. Indeed, if $\rho \neq 0$ only for $R_1 \le r \le R_2$, then $Q(r) = 0$ for $0 \le r \le R_1$, and hence for these distances from the centre
$$\varphi(r) = \frac{Q_0}{R_2} + \int_{R_1}^{R_2}\frac{Q(r_1)\,dr_1}{r_1^2}\,,$$
i.e. the potential does not depend on $r$: it is constant inside the layer. Correspondingly, the electric field,
$$\mathbf{E} = -\nabla\varphi = -\frac{d\varphi}{dr}\,\mathbf{e}_r\,,$$
is zero. Above, Eq. (7.73) was used for the gradient in the spherical coordinates.
Problem 7.26. Show that the above result can also be written in the form:
$$\varphi(r) = \frac{Q(r)}{r} + 4\pi\int_r^R r_1\,\rho(r_1)\,dr_1\,.$$
[Hint: use the explicit definition of $Q(r)$ as an integral in Eq. (7.102) and then change the order of integration.]
Consider now the classical motion of a particle of mass $m$ in a force field $\mathbf{F}$. If, because of the symmetry of the problem, the force is most naturally expressed in some curvilinear coordinate system, and Newton's equation of motion,
$$m\frac{d^2\mathbf{r}}{dt^2} = \mathbf{F}\,, \quad (7.103)$$
is rewritten in the same curvilinear system, this may result in a drastic simplification, and the equation may even possibly be solved.
What we need to do is to transform both sides of equation (7.103) into a general curvilinear coordinate system. Since the force is assumed to be already written in this system,
$$\mathbf{F} = \sum_{i=1}^{3} F_i\,\mathbf{e}_i\,, \quad (7.104)$$
only the acceleration needs to be worked out. The velocity of the particle follows from Eq. (7.23):
$$\mathbf{v} = \frac{d\mathbf{r}}{dt} = \sum_{i=1}^{3}\frac{\partial\mathbf{r}}{\partial q_i}\,\dot q_i = \sum_{i=1}^{3} h_i\,\dot q_i\,\mathbf{e}_i\,, \quad (7.106)$$
where the dot above $q_i$ means its time derivative. The acceleration entering Newton's equation (7.103) is the time derivative of the velocity:
$$\mathbf{a} = \frac{d\mathbf{v}}{dt} = \sum_i\left[\mathbf{e}_i\,\frac{d}{dt}\left(h_i\dot q_i\right) + h_i\dot q_i\,\frac{d\mathbf{e}_i}{dt}\right]. \quad (7.107)$$
The second term within the brackets is generally not zero, as the unit base vectors may change their direction as the particle moves. For instance, if the particle moves along a circular trajectory around the $z$ axis and we use the spherical system, the unit base vectors $\mathbf{e}_r$, $\mathbf{e}_\theta$ and $\mathbf{e}_\varphi$ will all change in time.
To calculate the time derivative of the unit base vectors, we recall the general relationship (7.9) between the unit base vectors of the curvilinear system in question and the Cartesian vectors $\mathbf{e}^0_i$. The idea here is that the Cartesian vectors do not change in time with the motion of the particle; they always keep their direction. Therefore,
$$\frac{d\mathbf{e}_i}{dt} = \frac{d}{dt}\left(\sum_{j=1}^{3} m_{ij}\,\mathbf{e}^0_j\right) = \sum_{j=1}^{3}\frac{dm_{ij}}{dt}\,\mathbf{e}^0_j\,.$$
Here the elements of the matrix $M = (m_{ij})$ are some functions of the curvilinear coordinates $(q_1(t), q_2(t), q_3(t))$, and hence the derivatives of $m_{ij}$ are expressed via time derivatives of the coordinates. Expressing the Cartesian vectors back via the unit base vectors $\mathbf{e}_i$ using the inverse matrix $M^{-1}$, we arrive at:
$$\frac{d\mathbf{e}_i}{dt} = \sum_{j=1}^{3}\frac{dm_{ij}}{dt}\,\mathbf{e}^0_j = \sum_{j=1}^{3}\frac{dm_{ij}}{dt}\sum_{k=1}^{3}\left(M^{-1}\right)_{jk}\mathbf{e}_k = \sum_{k=1}^{3} d_{ik}\,\mathbf{e}_k\,, \quad (7.108)$$
where
$$d_{ik} = \sum_{j=1}^{3}\frac{dm_{ij}}{dt}\left(M^{-1}\right)_{jk} \;\Longrightarrow\; D = \frac{dM}{dt}\,M^{-1}\,. \quad (7.109)$$
Substituting Eq. (7.108) into (7.107), we obtain an equation for the acceleration $\mathbf{a}$ in the given curvilinear system, expressed via the coordinates and their time derivatives.
Finally, note that if the force is initially given in the Cartesian coordinates,
$$\mathbf{F}(x, y, z) = F_x(x, y, z)\,\mathbf{i} + F_y(x, y, z)\,\mathbf{j} + F_z(x, y, z)\,\mathbf{k} = F_x\,\mathbf{e}^0_1 + F_y\,\mathbf{e}^0_2 + F_z\,\mathbf{e}^0_3\,,$$
it can always be transformed into the preferred curvilinear system using the transformation relations (7.1) and the relationship (7.10) between the Cartesian $\mathbf{e}^0_i$ and the curvilinear $\mathbf{e}_i$ unit base vectors. As a result, the force takes on the form (7.104).
As an example, let us work out explicit expressions for the velocity and acceleration in the cylindrical system $(r, \varphi, z)$. We start by writing the velocity (recall that $h_r = 1$, $h_\varphi = r$ and $h_z = 1$). Using Eq. (7.106), we immediately obtain
$$\mathbf{v} = \dot r\,\mathbf{e}_r + r\dot\varphi\,\mathbf{e}_\varphi + \dot z\,\mathbf{e}_z\,. \quad (7.110)$$
To calculate the acceleration, we need to work out the elements of the matrix $D$, Eq. (7.109). Since for the cylindrical system the matrices $M$ and $M^{-1}$ are given by Eqs. (7.16) and (7.17), respectively, we obtain
$$D = \frac{dM}{dt}\,M^{-1} = \left[\frac{d}{dt}\begin{pmatrix} \cos\varphi & \sin\varphi & 0 \\ -\sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix}\right]\begin{pmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$= \begin{pmatrix} -\dot\varphi\sin\varphi & \dot\varphi\cos\varphi & 0 \\ -\dot\varphi\cos\varphi & -\dot\varphi\sin\varphi & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix} = \dot\varphi\begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
so that
$$\frac{d\mathbf{e}_r}{dt} = d_{12}\,\mathbf{e}_2 = \dot\varphi\,\mathbf{e}_\varphi\,,\quad \frac{d\mathbf{e}_\varphi}{dt} = d_{21}\,\mathbf{e}_1 = -\dot\varphi\,\mathbf{e}_r \quad\text{and}\quad \frac{d\mathbf{e}_z}{dt} = 0\,. \quad (7.111)$$
Therefore, the acceleration (7.107) becomes
$$\mathbf{a} = \mathbf{e}_r\frac{d}{dt}\left(h_r\dot r\right) + h_r\dot r\,\frac{d\mathbf{e}_r}{dt} + \mathbf{e}_\varphi\frac{d}{dt}\left(h_\varphi\dot\varphi\right) + h_\varphi\dot\varphi\,\frac{d\mathbf{e}_\varphi}{dt} + \mathbf{e}_z\frac{d}{dt}\left(h_z\dot z\right) + h_z\dot z\,\frac{d\mathbf{e}_z}{dt}$$
$$= \left(\ddot r\,\mathbf{e}_r + \dot r\,\dot{\mathbf{e}}_r\right) + \frac{d}{dt}\left(r\dot\varphi\right)\mathbf{e}_\varphi + r\dot\varphi\,\dot{\mathbf{e}}_\varphi + \ddot z\,\mathbf{e}_z\,.$$
Using expressions (7.111) for the derivatives of the unit base vectors obtained above, we can write the final expression:
$$\mathbf{a} = \left(\ddot r - r\dot\varphi^2\right)\mathbf{e}_r + \left(2\dot r\dot\varphi + r\ddot\varphi\right)\mathbf{e}_\varphi + \ddot z\,\mathbf{e}_z\,. \quad (7.112)$$
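Eq. (7.112) can also be verified directly: differentiate the Cartesian position of the particle twice and project the result onto the cylindrical unit base vectors. The following sympy sketch (my own) does exactly that:

```python
import sympy as sp

t = sp.symbols('t')
r, phi, z = (sp.Function(f)(t) for f in ('r', 'phi', 'z'))

R = sp.Matrix([r*sp.cos(phi), r*sp.sin(phi), z])
a = R.diff(t, 2)                                   # Cartesian acceleration

e_r   = sp.Matrix([sp.cos(phi),  sp.sin(phi), 0])
e_phi = sp.Matrix([-sp.sin(phi), sp.cos(phi), 0])
e_z   = sp.Matrix([0, 0, 1])

print(sp.simplify(a.dot(e_r)))     # -> r'' - r*phi'^2
print(sp.simplify(a.dot(e_phi)))   # -> r*phi'' + 2*r'*phi'
print(sp.simplify(a.dot(e_z)))     # -> z''
```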
Problem 7.28. Show that for the spherical system the derivatives of its unit base vectors are
$$\dot{\mathbf{e}}_r = \dot\theta\,\mathbf{e}_\theta + \dot\varphi\sin\theta\,\mathbf{e}_\varphi\,,\quad \dot{\mathbf{e}}_\theta = -\dot\theta\,\mathbf{e}_r + \dot\varphi\cos\theta\,\mathbf{e}_\varphi\,,\quad \dot{\mathbf{e}}_\varphi = -\dot\varphi\left(\sin\theta\,\mathbf{e}_r + \cos\theta\,\mathbf{e}_\theta\right). \quad (7.113)$$
Correspondingly, the velocity and acceleration in this system are
$$\mathbf{v} = \dot r\,\mathbf{e}_r + r\dot\theta\,\mathbf{e}_\theta + r\dot\varphi\sin\theta\,\mathbf{e}_\varphi\,, \quad (7.114)$$
$$\mathbf{a} = \left(\ddot r - r\dot\theta^2 - r\dot\varphi^2\sin^2\theta\right)\mathbf{e}_r + \left(2\dot r\dot\theta + r\ddot\theta - r\dot\varphi^2\sin\theta\cos\theta\right)\mathbf{e}_\theta + \left(2\dot r\dot\varphi\sin\theta + 2r\dot\theta\dot\varphi\cos\theta + r\ddot\varphi\sin\theta\right)\mathbf{e}_\varphi\,. \quad (7.115)$$
Problem 7.29. Prove the following relations for the parabolic cylinder system, Problem 7.3:
$$\dot{\mathbf{e}}_u = \frac{u\dot v - v\dot u}{u^2 + v^2}\,\mathbf{e}_v\,,\quad \dot{\mathbf{e}}_v = -\frac{u\dot v - v\dot u}{u^2 + v^2}\,\mathbf{e}_u\,,\quad \dot{\mathbf{e}}_z = 0\,.$$
Then show that the velocity and acceleration of a particle in these coordinates are
$$\mathbf{v} = \sqrt{u^2 + v^2}\left(\dot u\,\mathbf{e}_u + \dot v\,\mathbf{e}_v\right) + \dot z\,\mathbf{e}_z\,,$$
$$\mathbf{a} = \frac{1}{\sqrt{u^2 + v^2}}\left\{\left[u\left(\dot u^2 - \dot v^2\right) + \ddot u\left(u^2 + v^2\right) + 2v\dot u\dot v\right]\mathbf{e}_u + \left[v\left(\dot v^2 - \dot u^2\right) + \ddot v\left(u^2 + v^2\right) + 2u\dot u\dot v\right]\mathbf{e}_v\right\} + \ddot z\,\mathbf{e}_z\,.$$
Problem 7.30. If a particle moves within the $x$–$y$ plane (i.e. $z = 0$), the spherical coordinate system becomes identical to the polar one. Show then that the equations for the velocity (7.114) and acceleration (7.115) obtained in the spherical system coincide with those for the polar system, Eqs. (7.110) and (7.112).

Problem 7.31. Consider a particle of mass $m$ moving under a central force $\mathbf{F} = F_r\,\mathbf{e}_r$, where $F_r = F(r)$ depends only on the distance $r$ from the centre of the coordinate system.
(i) Show that the equations of motion, $\mathbf{F} = m\mathbf{a}$, in this case, when projected onto the unit base vectors of the spherical system, have the form:
$$a_r = \ddot r - r\dot\theta^2 - r\dot\varphi^2\sin^2\theta = \frac{F_r}{m}\,,\quad a_\theta = a_\varphi = 0\,, \quad (7.116)$$
where $a_r$, $a_\theta$ and $a_\varphi$ are the components of the acceleration given in Eq. (7.115).
(ii) The angular momentum of the particle is defined via $\mathbf{L} = m\left[\mathbf{r}\times\mathbf{v}\right]$. Show that
$$\mathbf{L} = mr^2\left(\dot\theta\,\mathbf{e}_\varphi - \dot\varphi\sin\theta\,\mathbf{e}_\theta\right). \quad (7.117)$$
(iii), (iv) Choosing the polar axis so that initially $\theta = \pi/2$ and $\dot\theta = 0$, show that $\theta = \pi/2$ at all times, and that the magnitude $L = mr^2\dot\varphi$ of the angular momentum is conserved, so that
$$\dot\varphi = \frac{L}{mr^2}\,. \quad (7.118)$$
Therefore, the particle moves within the $x$–$y$ plane, performing a two-dimensional motion. Note that $r = r(t)$ in Eq. (7.118), and that $\dot\varphi(t)$ does not change its sign along the whole trajectory. For instance, if $L > 0$, then $\dot\varphi > 0$ during the whole motion, i.e. the particle rotates around the centre with its angle advancing all the time.
(v) Show that the total energy of the particle is
$$E = \frac{m}{2}\left(\dot r^2 + r^2\dot\varphi^2\right) + U(r) = \frac{mv_r^2}{2} + U_{\text{eff}}(r)\,, \quad (7.119)$$
where $v_r = \dot r$ is the radial component of the velocity, and
$$U_{\text{eff}}(r) = U(r) + \frac{L^2}{2mr^2}$$
is the so-called effective potential energy of the particle, while $U(r)$ is the potential energy of the field itself, $F(r) = -dU/dr$.
(vi) Therefore, establish the following equation for the distance $r(t)$ to the centre:
$$\ddot r = \frac{L^2}{m^2 r^3} + \frac{F(r)}{m} \;\Longrightarrow\; m\ddot r = -\frac{dU_{\text{eff}}}{dr}\,. \quad (7.120)$$
This radial equation corresponds to a one-dimensional motion in the effective potential $U_{\text{eff}}(r)$, which is also justified by the energy expression (7.119).
(vii) Then prove that the energy is conserved, $\dot E = 0$.
(viii) Choosing $r$ as a new argument in Eq. (7.120) (instead of time) and integrating (cf. Sect. I.8.3), show that the equation for the radial velocity $v_r$ can be written as:
$$v_r = \sqrt{\frac{2}{m}\left[E - U_{\text{eff}}(r)\right]}\,. \quad (7.121)$$
Note that this formula is equivalent to the energy expression (7.119). The above result gives an equation for the velocity $v_r$ as a function of $r$. Integrating it, obtain an equation for $r(t)$:
$$\int_{r_0}^{r}\frac{dr}{\sqrt{\dfrac{2}{m}\left[E - U_{\text{eff}}(r)\right]}} = t\,, \quad (7.122)$$
Problem 7.32. A boat crosses a river of width $a$ with a constant speed $v = |\mathbf{v}|$. Assume that the water in the river flows with a constant velocity $V$ at any point across its width. The boat starts at a point $A$ at one bank of the river, and along the whole journey its velocity $\mathbf{v}$ is kept directed towards the point $B$, which is exactly opposite the starting point on the other bank. Using the polar system $(r, \varphi)$ with the centre of the coordinate system placed at the point $B$, the $x$ axis pointing towards the point $A$ and the $y$ axis chosen along the river flow, show that the trajectory $r(\varphi)$ of the boat can be written as:
$$r = \frac{a}{\cos\varphi}\left(\frac{1 + \sin\varphi}{1 - \sin\varphi}\right)^{-v/2V}.$$
In classical statistical mechanics, the total kinetic energy of a system of $n$ identical atoms of mass $m$ is
$$E_{\text{KE}} = \sum_{p=1}^{n}\sum_{\alpha}\frac{mv_{p\alpha}^2}{2} = \sum_{i=1}^{3n}\frac{mv_i^2}{2}\,,$$
where $v_{p\alpha}$ is the Cartesian component $\alpha$ of the velocity of atom $p$, and in the second form all velocity components are enumerated by a single index $i$. The probability to find the system with given velocities of all its atoms is
$$dP_v = \frac{1}{Z}\,e^{-\beta E_{\text{KE}}}\,dv_1\,dv_2\cdots dv_N\,,$$
where $\beta = 1/k_B T$ is the inverse temperature and $N = 3n$ is the total number of degrees of freedom in the whole system; $Z$ is the normalisation constant.
Now, let us find the probability $P_E\,dE$ for the whole system to have its total kinetic energy within the interval between $E$ and $E + dE$. This can be obtained by calculating the integral
$$P_E = \frac{1}{Z}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\delta\!\left(E - \sum_{i=1}^{3n}\frac{mv_i^2}{2}\right)\exp\left(-\beta\sum_{i=1}^{3n}\frac{mv_i^2}{2}\right)dv_1\,dv_2\cdots dv_N\,.$$
Indeed, because of the delta function, only those combinations of the velocities are accounted for which correspond to the total kinetic energy being exactly equal to $E$.
Then, using the results of Sect. 7.6 and the properties of the delta function, show that
$$P_E = \frac{\beta^{N/2}}{\Gamma\!\left(\frac{N}{2}\right)}\,E^{N/2-1}\,e^{-\beta E}\,. \quad (7.124)$$
8 Partial Differential Equations of Mathematical Physics

8.1 General Consideration

If an unknown function of several variables and its partial derivatives are combined in an equation, the latter is called a partial differential equation (PDE). Several times above we have come across such PDEs when dealing with functions of several variables. For instance, in Sect. I.6.5.1 we derived the continuity equation for the particle density, which contains its partial derivatives with respect to time $t$ and the spatial coordinates $x$, $y$ and $z$.¹ In Sect. I.6.7.1 the so-called wave equation
$$\frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2} = \frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + \frac{\partial^2\Psi}{\partial z^2} \quad\text{or}\quad \frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2} = \Delta\Psi \quad (8.1)$$
was derived for the components $\Psi(x, y, z, t)$ of the electric and magnetic fields, which they satisfy in free space, while the heat transport equation
$$\frac{1}{D}\frac{\partial\Psi}{\partial t} = \frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + \frac{\partial^2\Psi}{\partial z^2} \quad\text{or}\quad \frac{1}{D}\frac{\partial\Psi}{\partial t} = \Delta\Psi \quad (8.2)$$
describes, e.g., the evolution of the temperature or particle density $\Psi(x, y, z, t)$, with $D$ the corresponding diffusion coefficient.

¹ In the following, references to the first volume of this course (L. Kantorovich, Mathematics for natural scientists: fundamentals and basics, Springer, 2015) will be made by appending the Roman numeral I in front of the reference, e.g. Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of the first volume, respectively.
In the stationary case, when, e.g., the distribution of temperature (or density) across the system has stopped changing with time, the time derivative $\partial\Psi/\partial t = 0$ and one arrives at the Laplace equation
$$\Delta\Psi = 0\,. \quad (8.3)$$
For instance, the 1D variant of this equation, $\partial^2\Psi/\partial x^2 = 0$, describes at long times the distribution of temperature in a rod when both its ends are kept at two fixed temperatures. The Laplace equation is also encountered in other physical problems. For instance, it is satisfied by an electrostatic potential in regions of space where there are no charges.
Of course, the wave, diffusion and Laplace equations do not exhaust all possible types of PDEs which are encountered in solving physical problems; however, in this chapter we shall limit ourselves to discussing only these.
A PDE is characterised by its order (the highest order of the partial derivatives it contains) and by whether it is linear or not (i.e. whether the unknown function appears only to the first degree anywhere in the equation, either on its own or when differentiated). If an additional function of the variables appears as a separate term in the equation, the PDE is called inhomogeneous; otherwise, it is homogeneous. It follows then that both the diffusion and wave equations are linear, homogeneous and of the second order. At the same time, the PDE
$$\frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} = a\,e^{\Psi}$$
is non-linear, although also of the second order and homogeneous, while the PDE
$$F(x, y, z, t) + \frac{1}{D}\frac{\partial\Psi}{\partial t} = \frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + \frac{\partial^2\Psi}{\partial z^2}\,,$$
which contains an additional function $F(x, y, z, t)$, is inhomogeneous and linear, and still of the second order.
Let us have a closer look at a general second order linear PDE with constant coefficients. Let the unknown function of $n$ variables $X = (x_i) = (x_1, x_2, \ldots, x_n)$ be $\Psi(x_1, x_2, \ldots, x_n) = \Psi(X)$. Then the general form of such an equation reads
$$\sum_{i,j=1}^{n} a_{ij}\,\Psi_{x_i x_j} + \sum_{i=1}^{n} b_i\,\Psi_{x_i} + c\,\Psi + f(X) = 0\,, \quad (8.4)$$
where we have used simplified notations for the partial derivatives: $\Psi_{x_i x_j} = \partial^2\Psi/\partial x_i\partial x_j$ and $\Psi_{x_i} = \partial\Psi/\partial x_i$. The first term in the above equation contains all second order derivatives, while the second term all first order derivatives. Because the mixed second order derivatives are symmetric, $\Psi_{x_i x_j} = \Psi_{x_j x_i}$, the square $n\times n$ matrix $A = (a_{ij})$ of the coefficients² of the second derivatives can always be chosen symmetric, $a_{ij} = a_{ji}$. Indeed, a pair of terms $a_{ij}\Psi_{x_i x_j} + a_{ji}\Psi_{x_j x_i}$ with $a_{ij} \neq a_{ji}$ can always be written as $\tilde a_{ij}\Psi_{x_i x_j} + \tilde a_{ji}\Psi_{x_j x_i}$ with $\tilde a_{ij} = \frac{1}{2}\left(a_{ij} + a_{ji}\right) = \tilde a_{ji}$. Note also that one of the two coefficients, $a_{ij}$ or $a_{ji}$, could be zero; still, we can always split the remaining one equally into two, i.e. $\tilde a_{ij} = \frac{1}{2}a_{ij} = \tilde a_{ji}$, to have the matrix $A$ of the coefficients of $\Psi_{x_i x_j}$ in Eq. (8.4) symmetric. The coefficients $b_i$ of the first order derivatives form an $n$-dimensional vector $B = (b_i)$. We assume that the matrix $A = (a_{ij})$, the vector $B$ and the scalar $c$ are constants, i.e. they do not depend on the variables $X$. The PDE may also contain a general function $f(X)$. As was mentioned above, if this function is not present, the PDE is homogeneous; otherwise, it is inhomogeneous.
We shall now show that the PDE (8.4) can always be transformed into a canonical form which does not have mixed second derivatives. The transformed PDE can then be characterised into several types. This is important, as the method of solution depends on the type of the PDE, as will be clarified in later sections.
The transformation of the PDE (8.4) into the canonical form is made using a change of variables $X \to Y = (y_i)$ by means of a linear transformation:
$$y_i = \sum_{j=1}^{n} u_{ij}\,x_j \quad\text{or}\quad Y = UX\,,$$
where $U = (u_{ij})$ is a yet unknown square $n\times n$ transformation matrix. To determine $U$, let us rewrite our PDE via the new coordinates. We have
$$\Psi_{x_i} = \frac{\partial\Psi}{\partial x_i} = \sum_{k=1}^{n}\frac{\partial\Psi}{\partial y_k}\frac{\partial y_k}{\partial x_i} = \sum_{k=1}^{n} u_{ki}\frac{\partial\Psi}{\partial y_k} = \sum_{k=1}^{n} u_{ki}\,\Psi_{y_k}\,,$$
$$\Psi_{x_j x_i} = \frac{\partial}{\partial x_j}\Psi_{x_i} = \frac{\partial}{\partial x_j}\left(\sum_{k=1}^{n} u_{ki}\frac{\partial\Psi}{\partial y_k}\right) = \sum_{l=1}^{n}\frac{\partial}{\partial y_l}\left(\sum_{k=1}^{n} u_{ki}\frac{\partial\Psi}{\partial y_k}\right)\frac{\partial y_l}{\partial x_j} = \sum_{l,k=1}^{n} u_{lj}\,u_{ki}\,\frac{\partial^2\Psi}{\partial y_l\partial y_k} = \sum_{l,k=1}^{n} u_{ki}\,\Psi_{y_k y_l}\,u_{lj}\,.$$
² Following our notations of Chap. 1, we shall use non-bold capital letters to designate vectors and matrices, and the corresponding small letters to designate their components.
Substituting these expressions into Eq. (8.4), we obtain
$$\sum_{l,k=1}^{n} a'_{kl}\,\Psi_{y_k y_l} + \sum_{k=1}^{n} b'_k\,\Psi_{y_k} + c\,\Psi + f_1(Y) = 0\,, \quad (8.5)$$
with a new matrix
$$A' = \left(a'_{kl}\right), \quad\text{where}\quad a'_{kl} = \sum_{i,j=1}^{n} u_{ki}\,a_{ij}\,u_{lj} \quad\text{or}\quad A' = UAU^T\,, \quad (8.6)$$
a new vector
$$B' = \left(b'_k\right), \quad\text{where}\quad b'_k = \sum_{i=1}^{n} u_{ki}\,b_i \quad\text{or}\quad B' = UB\,, \quad (8.7)$$
and a new function $f_1(Y) = f\!\left(U^{-1}Y\right)$.
To eliminate the mixed second order derivatives of $\Psi$, one has to choose the transformation matrix $U$ in such a way that the matrix $A'$ is diagonal. Since the matrix $A$ is symmetric, this can always be done by choosing $U$ to be (the transpose of) the modal matrix of $A$ (see Sect. 1.2.10.3): if the eigenvalues of $A$ are the numbers $\lambda_1, \lambda_2, \ldots, \lambda_n$ (which are real since $A$ is symmetric, Sect. 1.2.10.2), with the corresponding eigenvectors $B_1, B_2, \ldots, B_n$ (so that $AB_i = \lambda_i B_i$), then one can define the matrix $U$ whose rows are the eigenvectors of $A$, i.e. $U^T = \left(B_1\,B_2\cdots B_n\right)$ has them as its columns. Hence, the matrix $A' = UAU^T = \left(\delta_{ij}\lambda_i\right)$ has the diagonal form with the eigenvalues of $A$ standing on its diagonal in the same order as the corresponding eigenvectors in $U$. Therefore, after this choice, PDE (8.5) transforms into:
$$\sum_{k=1}^{n}\lambda_k\,\Psi_{y_k y_k} + \sum_{k=1}^{n} b'_k\,\Psi_{y_k} + c\,\Psi + f_1(Y) = 0\,. \quad (8.8)$$
This PDE does not have mixed derivatives any more; only the diagonal second order derivatives $\Psi_{y_k y_k} = \partial^2\Psi/\partial y_k^2$ are present. This finalises the transformation into the canonical form.
canonical form.
Now we are ready to introduce the characterisation scheme. It is based entirely
on the values of the eigenvalues fk g. The original PDE (8.4) is said to be of elliptic
type if all eigenvalues are of the same sign (e.g. all positive). It is of hyperbolic type
if all eigenvalues, apart from precisely one, are of the same sign. Finally, the PDE
is said to be of parabolic type if at least one ‰yk yk term is missing which happens if
the corresponding eigenvalue k is zero. For functions of more than three variables
.n 4) more cases are also possible; for instance, in the case of n D 4 it is possible
to have two eigenvalues of one sign and two of the other. We shall not consider those
cases here.
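The classification rule just stated is purely algebraic and easy to automate. The following sketch (my own, not from the text) builds the symmetric coefficient matrix $A$ of the second derivatives and inspects the signs of its eigenvalues:

```python
import numpy as np

def classify(A, tol=1e-12):
    lam = np.linalg.eigvalsh(np.asarray(A, dtype=float))  # real, since A is symmetric
    if np.any(np.abs(lam) < tol):
        return 'parabolic'                 # a second-derivative term is missing
    pos = np.sum(lam > 0)
    if pos in (0, len(lam)):
        return 'elliptic'                  # all eigenvalues of the same sign
    if pos in (1, len(lam) - 1):
        return 'hyperbolic'                # exactly one of the opposite sign
    return 'other (mixed signature, n >= 4)'

print(classify([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))      # Laplace equation: elliptic
print(classify([[-1/4, 0, 0], [0, 1, 0], [0, 0, 1]]))   # wave-like: hyperbolic
print(classify([[0, 0], [0, 1]]))                       # diffusion-like: parabolic
```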
It follows from these definitions that the wave equation (8.1) is of hyperbolic type, the Laplace equation (8.3) is elliptic, while the diffusion equation (8.2) is parabolic. Indeed, in all three cases the equations already have the canonical form. Then, in the case of the wave equation written as
$$-\frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2} + \frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + \frac{\partial^2\Psi}{\partial z^2} = 0\,, \quad (8.9)$$
one coefficient (of the second order time derivative) is of the opposite sign to the coefficients of the spatial derivatives, which means that the PDE is hyperbolic. The heat transport equation does not have the second order time derivative term and hence is parabolic. Finally, the Laplace equation has all coefficients of the second order derivatives equal to unity and hence is elliptic.
The PDE (8.8) can be simplified even further: it appears that it is also possible to eliminate the terms with the first derivatives. To do this, we introduce a new function $\Phi(Y)$ via
$$\Psi(Y) = \exp\left(\sum_{i=1}^{n}\gamma_i y_i\right)\Phi(Y)\,, \quad (8.10)$$
where the $\gamma_i$ are new parameters to be determined. These are chosen in such a way as to eliminate the terms with first order derivatives. We have
$$\Psi_{y_k} = \left(\Phi_{y_k} + \gamma_k\Phi\right)\exp\left(\sum_{i=1}^{n}\gamma_i y_i\right),$$
$$\Psi_{y_k y_k} = \left(\Phi_{y_k y_k} + 2\gamma_k\Phi_{y_k} + \gamma_k^2\Phi\right)\exp\left(\sum_{i=1}^{n}\gamma_i y_i\right),$$
so that Eq. (8.8) is manipulated into the following PDE with respect to the new function $\Phi(Y)$:
$$\sum_{k=1}^{n}\lambda_k\,\Phi_{y_k y_k} + \sum_{k=1}^{n}\left(2\lambda_k\gamma_k + b'_k\right)\Phi_{y_k} + \left[c + \sum_k\gamma_k\left(\lambda_k\gamma_k + b'_k\right)\right]\Phi + f_2(Y) = 0\,, \quad (8.11)$$
where
$$f_2(Y) = \exp\left(-\sum_{i=1}^{n}\gamma_i y_i\right)f_1(Y)\,. \quad (8.12)$$
Choosing now $\gamma_k = -b'_k/2\lambda_k$ (for $\lambda_k \neq 0$), the terms with the first derivatives are eliminated, and we obtain
$$\sum_{k=1}^{n}\lambda_k\,\Phi_{y_k y_k} + c'\,\Phi + f_2(Y) = 0\,, \quad (8.13)$$
where
$$c' = c - \sum_{k=1}^{n}\frac{\left(b'_k\right)^2}{4\lambda_k}\,.$$
Problem 8.1. Consider a general second order PDE with constant coefficients for a function $\Psi(x_1, x_2)$ of two variables,
$$a_{11}\Psi_{x_1 x_1} + 2a_{12}\Psi_{x_1 x_2} + a_{22}\Psi_{x_2 x_2} + b_1\Psi_{x_1} + b_2\Psi_{x_2} + c\Psi + f(x_1, x_2) = 0\,, \quad (8.14)$$
where the coefficients $a_{ij}$ of the second derivatives are real numbers. Show that this PDE is elliptic if and only if $a_{11}a_{22} > a_{12}^2$, that it is parabolic if $a_{11}a_{22} = a_{12}^2$, and that it is hyperbolic if $a_{11}a_{22} < a_{12}^2$. Verify explicitly that the eigenvalues $\lambda_1$ and $\lambda_2$ are always real.

Problem 8.2. Consider the PDE
$$\Psi_{x_1 x_1} + 2\Psi_{x_1 x_2} + \alpha\,\Psi_{x_2 x_2} = 0\,.$$
Show that this PDE is elliptic for $\alpha > 1$, hyperbolic for $\alpha < 1$ and parabolic for $\alpha = 1$.
Problem 8.3. Consider the following PDE for the function $\Psi(x_1, x_2)$:
$$a\left(\Psi_{x_1 x_1} + \Psi_{x_2 x_2}\right) + 2b\,\Psi_{x_1 x_2} + b\,\Psi_{x_1} + c\,\Psi_{x_2} + \Psi = 0\,,$$
assuming that $a \neq \pm b$. Show that this PDE can be transformed into the following canonical form:
$$(a + b)\,\Phi_{y_1 y_1} + (a - b)\,\Phi_{y_2 y_2} + \kappa\,\Phi = 0\,,$$
where
$$\kappa = 1 - \frac{(b + c)^2}{8(a + b)} - \frac{(b - c)^2}{8(a - b)}\,,$$
and
$$\Psi(y_1, y_2) = \Phi(y_1, y_2)\exp\left[-\frac{b + c}{2\sqrt{2}\,(a + b)}\,y_1 - \frac{b - c}{2\sqrt{2}\,(a - b)}\,y_2\right].$$
Argue that the PDE is elliptic if and only if $a^2 > b^2$, and that it is hyperbolic if $a^2 < b^2$. The PDE cannot be parabolic, as this would require $a^2 = b^2$, which is not possible due to the restriction on the coefficients in this problem.
Problem 8.4. Show that the PDE transformed to the new variables $Y = UX$ with the transformation matrix
$$U = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ -1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
takes the canonical form
$$2\Phi_{y_1 y_1} - \Phi_{y_2 y_2} + \Phi_{y_3 y_3} + \frac{11}{4}\,\Phi = 0\,,$$
where
$$\Psi(y_1, y_2, y_3) = \Phi(y_1, y_2, y_3)\exp\left[\frac{1}{4\sqrt{2}}\left(y_1 - y_2\right) + \frac{y_3}{2}\right],$$
and hence that the PDE is hyperbolic.
Problem 8.6. Consider a function $\Psi(x, y)$ satisfying the PDE (8.14). Show that there exists a linear transformation to new variables,
$$\begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} = U\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix},$$
which transforms this PDE into a form containing, of the second derivatives, only the mixed one, $\Psi_{\xi_1\xi_2}$. Show that, in order to eliminate the terms with the diagonal double derivatives, $\Psi_{\xi_1\xi_1}$ and $\Psi_{\xi_2\xi_2}$, the ratios $u_{12}/u_{11}$ and $u_{22}/u_{21}$ must be different roots of the quadratic equation
$$a_{22}\,\gamma^2 + 2a_{12}\,\gamma + a_{11} = 0 \quad (8.15)$$
with respect to $\gamma$. Why does the condition that the roots be different guarantee that the determinant of $U$ is not zero?
Problem 8.7. As a simple application of the previous problem, show that the PDE
$$2\Psi_{x_1 x_1} + 3\Psi_{x_1 x_2} + \Psi_{x_2 x_2} = 0$$
is equivalent to the PDE $\Psi_{y_1 y_2} = 0$, where the new variables can be chosen as $y_1 = x_1 - 2x_2$ and $y_2 = x_1 - x_2$.
Problem 8.8. Here we shall return to Problem 8.6. Specifically, we shall consider a PDE for which the roots of Eq. (8.15) are equal:
$$\Psi_{x_1 x_1} + 4\Psi_{x_1 x_2} + 4\Psi_{x_2 x_2} = 0\,.$$
Show that in this case it is not possible to find a linear transformation, $Y = UX$, such that in the new variables the PDE would contain only the single term with the mixed derivative, $\Psi_{y_1 y_2} = 0$. Instead, show that the PDE can be transformed into its canonical form containing (in this case) only a single double derivative, $\Psi_{y_2 y_2} = 0$, where the new variables may be chosen as $y_1 = -2x_1 + x_2$ and $y_2 = x_1 + 2x_2$.
Problem 8.9. A small transverse displacement $\Psi(x, t)$ of a flexible tube containing an incompressible fluid of negligible viscosity, flowing along it in the direction $x$, is described by the PDE
Problem 8.10. Show that the 1D wave equation, $\Psi_{xx} = \frac{1}{c^2}\Psi_{tt}$, is invariant with respect to the (relativistic) Lorentz transformation of coordinates:
$$x' = \gamma\left(x - vt\right),\quad t' = \gamma\left(t - \frac{v}{c^2}\,x\right),\quad \gamma = \left(1 - v^2/c^2\right)^{-1/2}\,,$$
i.e. in the new (primed) coordinates the equation has an identical form: $\Psi_{x'x'} = \frac{1}{c^2}\Psi_{t't'}$. Here the non-primed variables, $(x, t)$, correspond to the position and time in a laboratory coordinate system, while the primed variables, $(x', t')$, correspond to a coordinate system moving with velocity $v$ with respect to the laboratory system along the positive direction of the $x$ axis. Next show that if the non-relativistic Galilean transformation,
$$x' = x - vt\,,\quad t' = t\,,$$
is applied, the wave equation does change its form, i.e. it is not invariant with respect to this transformation.
Concluding, we mention that, if desired, one may rescale all or some of the variables, $y_k \to z_k = y_k/\sqrt{|\lambda_k|}$, as an additional step. Of course, this can only be done for those variables $y_k$ for which $\lambda_k \neq 0$; one does not need to rescale those $y_k$ for which $\lambda_k$ is zero, since in this case the second derivative term is missing anyway. This additional transformation leads to the corresponding coefficients of the second derivatives being $\lambda_k/|\lambda_k| = \pm 1$, i.e. just plus or minus one.
For instance, in a heat transport problem across a 1D rod we are usually given the temperatures at both of its ends, and we seek the temperature distribution at all internal points along the rod over time. This type of additional conditions is called boundary conditions. The boundary conditions supply the necessary information on the function of interest associated with its spatial variables. If, as is the case for the wave or diffusion equations, time is also involved, then usually we know the whole function (and maybe its time derivative) at the initial time ($t = 0$), and we are interested in determining its evolution at later times. This type of additional conditions is called initial conditions and is analogous to the case of an ordinary DE.
As an example, consider oscillations of a string of length $L$ stretched along the $x$ axis between the points $x = 0$ and $x = L$. As we shall see below, the vertical displacement $u(x, t)$ of the point of the string with coordinate $x$ ($0 \le x \le L$) satisfies the wave equation
$$\frac{1}{c^2}\frac{\partial^2 u}{\partial t^2} = \frac{\partial^2 u}{\partial x^2}\,.$$
Therefore, alongside the initial conditions,
$$u(x, 0) = \phi_1(x) \quad\text{and}\quad \left.\frac{\partial u}{\partial t}\right|_{t=0} = \phi_2(x)\,, \quad (8.16)$$
giving the values of the unknown function and its first time derivative at all values of the spatial variable $x$, one has to supply the boundary conditions,
$$u(0, t) = \psi_1(t) \quad\text{and}\quad u(L, t) = \psi_2(t)\,, \quad (8.17)$$
as well. The boundary conditions establish the values of the function at the edge (boundary) points $x = 0$ and $x = L$ of the string at all times. Note that the boundary conditions may instead include a derivative with respect to $x$ at one or both of these points, or some linear combination of the various types of terms.
Also note that, since the wave equation is of the second order with respect to time, both the function $u(x, 0)$ and its first time derivative at $t = 0$ are required to be specified in the initial conditions (8.16). In the case of the diffusion equation, where only the first order time derivative appears, only the value of the unknown function at $t = 0$ needs to be given; the first derivative cannot be given in addition, as this complementary condition might be contradictory.
The idea of the boundary conditions is easily generalised to two- and three-dimensional spatial PDEs, where the values of the unknown functions are to be specified at all times at the boundary of the spatial region of interest. For instance, in the case of oscillations of a circular membrane of radius $R$, one has to specify as the boundary conditions the displacement $u(x, y, t)$ of all boundary points $x^2 + y^2 = R^2$ of the membrane at all times $t$.
What kind of initial and boundary conditions is it necessary to specify in each case to guarantee that a solution of the given PDE can be found uniquely? This depends on the PDE in question, so this highly important question has to be considered individually for each type of PDE. It will be discussed below specifically for the wave and the diffusion equations.
8.2 Wave Equation

In Sect. I.6.7 the wave equation was derived for the electric and magnetic fields. However, it is encountered in many other physical problems and hence has a very general physical significance. To stress this point, it is instructive, before discussing methods of solution of the wave PDEs, to consider two other problems in which the same wave equation appears: oscillations of a string and sound propagation in a condensed medium (liquid or gas). This is what we are going to do in the next two subsections.
Consider a string which lies along the $x$ axis and is subjected to a tension $T_0$ in the same direction. At equilibrium the string is stretched along the $x$ axis. If we apply a perpendicular external force $F$ (per unit length) and/or take the string out of its equilibrium position and then release it, it will start oscillating vertically, see Fig. 8.1. Let the vertical displacement of a point with coordinate $x$ be $u(x, t)$; note that the vertical displacement is a function of both $x$ and the time $t$.
Let us consider a small element $AB$ of the string, with the point $A$ being at $x$ and the point $B$ at $x+dx$, see Fig. 8.1. The total force acting on this element is due to the two tensions applied at points $A$ and $B$, which act in (approximately) opposite directions, and an (optional) external force $F(x)$, which is applied in the vertical direction. We assume that the oscillations are small (i.e. the vertical displacement $u(x,t)$ is small compared to the string length). This means that the tensions may be assumed to be nearly horizontal (although in the figure the vertical components of the forces are greatly exaggerated for clarity), and they cancel each other in this direction (if they did not, the string would move in the lateral direction!). Therefore, we need only care about the balance of the forces in the vertical direction.
Let $\alpha(x)$ be the angle the tangent line to the string makes with the $x$ axis at the point $x$, as shown in the figure. Then, the difference in the heights between points $B$ and $A$ can be calculated as
\[
u(x+dx) - u(x) = \frac{\partial u}{\partial x}\,dx = \tan\alpha(x)\,dx \simeq \alpha(x)\,dx \;\Longrightarrow\; \frac{\partial u}{\partial x} \simeq \alpha(x)\,,
\]
because for small oscillations the angle $\alpha$ is small and therefore $\tan\alpha \simeq \sin\alpha \simeq \alpha$. Here the time was omitted for convenience (we consider the difference in heights at a given time $t$). Note that the angle $\alpha(x)$ depends on the point at which the tangent line is drawn, i.e. it is a function of $x$.
Then, the vertical component of the tension force acting downwards at the left point $A$ is
\[
-T_0\sin\alpha(x) \simeq -T_0\,\alpha(x) = -T_0\left(\frac{\partial u}{\partial x}\right)_A .
\]
On the other hand, the vertical component of the tension applied at the point $B$ is similarly
\[
T_0\sin\left[\alpha(x+dx)\right] \simeq T_0\,\alpha(x+dx) = T_0\left(\frac{\partial u}{\partial x}\right)_B ,
\]
but this time the partial derivative of the displacement is calculated at the point $B$. Therefore, the net force acting on the element $dx$ in the vertical direction will be
\[
F\,dx + T_0\left(\frac{\partial u}{\partial x}\right)_B - T_0\left(\frac{\partial u}{\partial x}\right)_A = F\,dx + T_0\left[\left(\frac{\partial u}{\partial x}\right)_B - \left(\frac{\partial u}{\partial x}\right)_A\right].
\]
The expression in the square brackets gives the change of the function $f(x) = \partial u/\partial x$ between the two points $B$ and $A$, which are separated by $dx$; hence one can write
\[
f(x+dx) - f(x) = \frac{\partial f}{\partial x}\,dx = \frac{\partial^2 u}{\partial x^2}\,dx\,,
\]
so that the total force acting on the element $dx$ of the string in the vertical direction becomes
\[
\left(F + T_0\,\frac{\partial^2 u}{\partial x^2}\right) dx\,.
\]
On the other hand, due to Newton's equations of motion, this force should be equal to the product of the mass $\rho\,dx$ of the piece of the string of length $dx$ (with $\rho$ being the linear mass density of the string) and its vertical acceleration, $\partial^2 u/\partial t^2$. Therefore, equating the two expressions and cancelling out $dx$, the final equation of motion for the string is obtained:
\[
\frac{1}{c^2}\frac{\partial^2 u}{\partial t^2} = G(x) + \frac{\partial^2 u}{\partial x^2}\,, \tag{8.18}
\]
where $G(x) = F(x)/T_0$ and the velocity $c = \sqrt{T_0/\rho}$. As one can see, this PDE is in general inhomogeneous. When the external force $F$ is absent, we arrive at the familiar (homogeneous) wave equation with velocity $c$.
Consider the propagation of sound in a gas, e.g. in the air. As the gas is perturbed, density fluctuations are created in it which propagate in space and time. Corresponding to these fluctuations, any little gas volume $dV = d\mathbf{r}$ can be assigned a velocity $\mathbf{v}(\mathbf{r})$. The latter must satisfy the hydrodynamic equation of motion (I.6.139):
\[
\frac{\partial\mathbf{v}}{\partial t} + \left(\mathbf{v}\cdot\operatorname{grad}\right)\mathbf{v} = \mathbf{F} - \frac{1}{\rho}\operatorname{grad}P\,,
\]
where $\mathbf{F}$ is a vector field of external forces (e.g. gravity) and $P$ is the local pressure. We assume that the velocity field $\mathbf{v}(\mathbf{r})$ changes very little in space, and hence we can neglect the $\left(\mathbf{v}\cdot\operatorname{grad}\right)\mathbf{v}$ term in the above equation, leading to
\[
\frac{\partial\mathbf{v}}{\partial t} = \mathbf{F} - \frac{1}{\rho}\operatorname{grad}P\,. \tag{8.19}
\]
The gas density $\rho$ must also satisfy the continuity equation,
\[
\frac{\partial\rho}{\partial t} + \rho\,\operatorname{div}\mathbf{v} + \mathbf{v}\cdot\operatorname{grad}\rho = 0\,,
\]
in which one can neglect the $\mathbf{v}\cdot\operatorname{grad}\rho$ term containing a product of two small quantities: a small velocity and a small variation of the density. Hence,
\[
\frac{\partial\rho}{\partial t} + \rho\,\operatorname{div}\mathbf{v} = 0\,.
\]
Let $\rho_0$ be the density of the gas at equilibrium. Then, the density of the gas during the sound propagation (when the system is out of equilibrium) can be written via $\rho = \rho_0(1+s)$, where $s(\mathbf{r}) = (\rho - \rho_0)/\rho_0$ is the relative fluctuation of the gas density, which is considered much smaller than unity. Then $d\rho = \rho_0\,ds$, and therefore
\[
\frac{\partial\rho}{\partial t} = \rho_0\,\frac{\partial s}{\partial t} \;\Longrightarrow\; \rho_0\,\frac{\partial s}{\partial t} + \rho\,\operatorname{div}\mathbf{v} = 0\,. \tag{8.20}
\]
Also, in this equation we can replace $\rho$ with $\rho_0$, since $\rho\,\operatorname{div}\mathbf{v} = \rho_0(1+s)\operatorname{div}\mathbf{v} \simeq \rho_0\operatorname{div}\mathbf{v}$, where we dropped the second order term containing a product of two small quantities ($s$ and a small variation of the velocity) to be consistent with the previous approximations. Hence, the continuity equation takes on the simpler form
\[
\frac{\partial s}{\partial t} + \operatorname{div}\mathbf{v} = 0\,. \tag{8.21}
\]
Finally, we have to relate the pressure to the density. In an ideal gas the sound propagation is an adiabatic process in which the pressure $P$ is proportional to $\rho^\gamma$, where $\gamma = c_P/c_V$ is the ratio of the heat capacities at constant pressure and constant volume. Therefore, if $P_0$ is the pressure at equilibrium, then
\[
\frac{P}{P_0} = \left(\frac{\rho}{\rho_0}\right)^{\gamma} \;\Longrightarrow\; \frac{P}{P_0} = (1+s)^{\gamma} \simeq 1 + \gamma s \;\Longrightarrow\; \operatorname{grad}P = \gamma P_0\operatorname{grad}s\,.
\]
Substituting this into Eq. (8.19) and replacing $\rho$ with $\rho_0$ there (which corresponds again to keeping only the first order terms), we get
\[
\frac{\partial\mathbf{v}}{\partial t} = \mathbf{F} - c^2\operatorname{grad}s\,, \tag{8.22}
\]
where $c = \sqrt{\gamma P_0/\rho_0}$. Next, we take the divergence of both sides of this equation. The left-hand side then becomes
\[
\operatorname{div}\frac{\partial\mathbf{v}}{\partial t} = \frac{\partial}{\partial t}\left(\operatorname{div}\mathbf{v}\right) = -\frac{\partial^2 s}{\partial t^2}\,,
\]
where we have used Eq. (8.21) at the last step. The divergence of the right-hand side of Eq. (8.22) is worked out as follows:
\[
\operatorname{div}\left(\mathbf{F} - c^2\operatorname{grad}s\right) = \operatorname{div}\mathbf{F} - c^2\operatorname{div}\operatorname{grad}s = \operatorname{div}\mathbf{F} - c^2\,\Delta s\,,
\]
i.e. it contains the Laplacian of $s$. Equating the left- and the right-hand sides, we finally obtain the equation sought for:
\[
\frac{\partial^2 s}{\partial t^2} = c^2\,\Delta s - \operatorname{div}\mathbf{F}\,. \tag{8.23}
\]
If the external forces are absent, this equation turns into the familiar wave equation in 3D space, with the constant $c$ being the corresponding sound velocity:
\[
\frac{1}{c^2}\frac{\partial^2 s}{\partial t^2} = \Delta s\,. \tag{8.24}
\]
When solving an ordinary DE, we normally first obtain its general solution, which contains arbitrary constants, and then, by imposing the corresponding initial conditions, a particular integral of the DE is obtained so that both the DE and the initial conditions are satisfied. The case of PDEs is more complex, as a function of more than one variable is to be sought. However, in most cases one can draw an analogy with the 1D case of an ordinary DE: if in the latter case the general solution contains arbitrary constants, the general solution of a PDE contains arbitrary functions. Then, a particular integral of the PDE is obtained by finding the particular functions so that both initial and boundary conditions, if present, are satisfied.
To illustrate this point, let us consider the specific problem of string oscillations considered in Sect. 8.2.1. We shall assume that the string is stretched along the $x$ axis and is of infinite length, i.e. $-\infty < x < \infty$. The condition of the string being infinite simplifies the problem considerably, as one can completely ignore any boundary conditions. Then, only the initial conditions remain.
Therefore, the whole problem can be formulated as follows. The vertical displacement $\Psi(x,t)$ of the point with coordinate $x$ must be a solution of the wave equation
\[
\frac{\partial^2\Psi}{\partial x^2} = \frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2}\,, \tag{8.25}
\]
subject to the initial conditions
\[
\Psi(x,0) = \psi_1(x) \quad\text{and}\quad \Psi_t(x,0) = \left.\frac{\partial\Psi}{\partial t}\right|_{t=0} = \psi_2(x)\,. \tag{8.26}
\]
The wave equation is of hyperbolic type. As follows from Problem 8.6, in some cases it is possible to find such a linear transformation of the variables $(x,t) \Longrightarrow (\xi,\eta)$ that the PDE contains only a mixed derivative.

Problem 8.11. Show that in the new variables $(\xi,\eta)$, where
\[
\xi = x + ct \quad\text{and}\quad \eta = x - ct\,, \tag{8.28}
\]
the PDE (8.25) takes the form
\[
\frac{\partial^2\Psi}{\partial\xi\,\partial\eta} = 0\,. \tag{8.27}
\]

Integrating Eq. (8.27) first with respect to $\xi$ gives $\partial\Psi/\partial\eta = C(\eta)$, an arbitrary function of $\eta$; integrating then with respect to $\eta$ yields $\Psi = v(\eta) + u(\xi)$, where $u(\xi)$ must be another, also arbitrary, function of the other variable $\xi$. Above, $v(\eta)$ is also an arbitrary function of $\eta$, since it is obtained by integrating the arbitrary function $C(\eta)$. Recalling what the new variables actually are, Eq. (8.28), we immediately arrive at the following general solution of the PDE (8.25):
\[
\Psi(x,t) = v(x - ct) + u(x + ct)\,. \tag{8.29}
\]
So, the general solution appears to be a sum of two arbitrary functions of the variables $x \pm ct$. Note that this result is general; in particular, we have not used the fact that the string is of infinite length. However, applying boundary conditions for a string of finite length directly to the general solution (8.29) is non-trivial, so we shall not consider this case here; another method will be considered instead later on, in Sect. 8.2.5.
Before applying the initial conditions (8.26) to obtain the formal solution for the particular integral of our wave equation of the infinite string, it is instructive first to illustrate the meaning of the obtained general solution. We start by analysing the function $v(x-ct)$. It is sketched for two times $t_1$ and $t_2 > t_1$ as a function of $x$ in Fig. 8.2. It is easy to see that the profile (or shape) of the function remains the same at later times; the whole function simply shifts to the right, i.e. the wave propagates without any distortion of its shape. This can be seen in the following way. Consider a point $x_1$. At time $t_1$ the function has some value $v_1 = v(x_1 - ct_1)$. At time $t_2 > t_1$ the function becomes $v(x - ct_2)$; at this later time it will reach the same value $v_1$ at some point $x_2$ if the latter satisfies the following condition:
\[
x_2 - ct_2 = x_1 - ct_1 \;\Longrightarrow\; x_2 = x_1 + c\,(t_2 - t_1)\,,
\]
which immediately shows that the function as a whole shifts to larger values of $x$ since $t_2 > t_1$. This is also shown in Fig. 8.2. It is seen that the function shifts exactly by the distance $\Delta x = c(t_2 - t_1)$ over the time interval $\Delta t = t_2 - t_1$. Therefore, the first part of the solution (8.29), $v(x-ct)$, describes propagation of the wave shape $v(x)$ with velocity $c$ to the right. Similarly it is verified that the second part, $u(x+ct)$, of the solution (8.29) describes the propagation of the wave shape $u(x)$ to the left with the same velocity.
Now, let us find the as yet unknown functions $u(x)$ and $v(x)$ by satisfying the initial conditions (8.26). Applying the particular form of the general solution (8.29), the initial conditions read
\[
v(x) + u(x) = \psi_1(x) \quad\text{and}\quad c\left(\frac{du(x)}{dx} - \frac{dv(x)}{dx}\right) = \psi_2(x)\,. \tag{8.30}
\]
Integrating the second equation between, say, zero and $x$, we obtain
\[
-\left[v(x) - v(0)\right] + \left[u(x) - u(0)\right] = \frac{1}{c}\int_0^x \psi_2(\xi)\,d\xi\,,
\]
which, when combined with the first equation in (8.30), allows solving for both functions:
\[
u(x) = \frac{1}{2}\left[A + \psi_1(x) + \frac{1}{c}\int_0^x \psi_2(\xi)\,d\xi\right],
\]
\[
v(x) = \frac{1}{2}\left[\psi_1(x) - A - \frac{1}{c}\int_0^x \psi_2(\xi)\,d\xi\right],
\]
where $A = u(0) - v(0)$ is a constant. Interestingly, when substituting these functions into the general solution (8.29), the constant $A$ cancels out, and we obtain
\[
\Psi(x,t) = \frac{1}{2}\left[\psi_1(x-ct) + \psi_1(x+ct)\right] + \frac{1}{2c}\int_{x-ct}^{x+ct}\psi_2(\xi)\,d\xi\,. \tag{8.31}
\]
This solution is known as d'Alembert's formula. It gives the full solution of the problem given by Eqs. (8.25) and (8.26) for an infinite string.
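D'Alembert's formula (8.31) is also easy to evaluate numerically. The sketch below is only an illustration: the Gaussian initial profile $\psi_1$, the zero initial velocity $\psi_2$ and all parameter values are assumptions chosen for the example, not data from the text.

```python
# A minimal numerical check of d'Alembert's formula (8.31); the Gaussian
# initial profile psi1 and the parameters below are illustrative choices.
import numpy as np
from scipy.integrate import quad

c = 1.0                              # wave velocity
psi1 = lambda x: np.exp(-x**2)       # initial displacement psi_1(x)
psi2 = lambda x: 0.0                 # initial velocity psi_2(x)

def dalembert(x, t):
    # Psi(x,t) = [psi1(x-ct) + psi1(x+ct)]/2 + (1/2c) int_{x-ct}^{x+ct} psi2
    integral, _ = quad(psi2, x - c*t, x + c*t)
    return 0.5*(psi1(x - c*t) + psi1(x + c*t)) + integral/(2*c)

# With psi2 = 0 the initial bump splits into two half-amplitude bumps
# travelling left and right with speed c:
for t in (0.0, 1.0, 2.0):
    print(t, dalembert(0.0, t), dalembert(c*t, t))
```

One sees directly the two counter-propagating half-amplitude copies of the initial profile predicted by the analysis above.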
As was already mentioned, it is possible to generalise the described method so that the solution of the wave equation for a string of finite length $L$ ($0 \le x \le L$), with boundary conditions explicitly given, becomes possible. However, as was remarked before, the method becomes very complicated and will not be considered here, since simpler methods exist. One such method, the method due to Fourier, based on separation of variables, will be considered in detail in the following sections.
Problem 8.12. Explain why the solution (8.31) conforms to the general form
of Eq. (8.29).
Problem 8.13. Show that the general solution of the PDE of Problem 8.7 can
be written as
Problem 8.14. Show that the general solution of the PDE of Problem 8.8 can
be written as
By introducing a new function $\Phi(r,t)$ such that $\Psi(r,t) = r^{\alpha}\Phi(r,t)$, show that by taking $\alpha = -1$ the PDE for $\Phi$ will be of the form $\Phi_{rr} = \frac{1}{c^2}\Phi_{tt}$. Correspondingly, the general solution of the problem (8.32) can then be written as
\[
\Psi(r,t) = \frac{1}{r}\left[v(r - ct) + u(r + ct)\right].
\]
This solution describes propagation of spherical waves from (the first term) and towards (the second term) the centre of the coordinate system ($r = 0$). The attenuation factor $1/r$ corresponds to a decay of the wave amplitude with the distance $r$ from the centre. Since the energy of the wave front is proportional to the square of the amplitude, the energy decays as $1/r^2$. Since the area of the wave front increases as $4\pi r^2$, the decay of the wave's amplitude ensures that the wave's total energy is conserved.
Problem 8.16. Similarly, consider wave propagation with cylindrical symmetry, Eq. (7.98). In this case the wave equation reads (ignoring the angle $\phi$ and the coordinate $z$)
\[
\frac{1}{r}\frac{\partial}{\partial r}\left(r\,\frac{\partial\Psi}{\partial r}\right) = \frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2}\,. \tag{8.33}
\]
Show that the substitution $\Psi(r,t) = r^{-1/2}\,\Phi(r,t)$ leads to the equation
\[
\Phi_{rr} + \frac{1}{4r^2}\,\Phi = \frac{1}{c^2}\,\Phi_{tt}\,.
\]
Problem 8.17. Consider Problem 8.9 again. Show that the general solution of the PDE is
\[
\Psi(x,t) = u(x + \lambda_1 t) + v(x + \lambda_2 t)\,,
\]
where $u(x)$ and $v(x)$ are two arbitrary functions. Then, assume that the tube was initially at rest and had a small transverse displacement $\Psi = \Psi_0\cos(kx)$. Show that the subsequent motion of the tube is given by
\[
\Psi(x,t) = \frac{\Psi_0}{\lambda_1 - \lambda_2}\left[\lambda_1\cos\left(k(x + \lambda_2 t)\right) - \lambda_2\cos\left(k(x + \lambda_1 t)\right)\right].
\]
Let us now establish under what conditions the uniqueness of the solution of the general (inhomogeneous) wave equation
\[
\frac{\partial^2\Psi}{\partial t^2} = c^2\,\frac{\partial^2\Psi}{\partial x^2} + f(x,t) \tag{8.35}
\]
is guaranteed. Here $f(x,t)$ is a function of an "external force" acting on the string at point $x$. That force makes the PDE inhomogeneous.

Proving by contradiction, let us assume that two different solutions, $\Psi^{(1)}(x,t)$ and $\Psi^{(2)}(x,t)$, exist which satisfy the same PDE,
\[
\Psi^{(1)}_{tt} = c^2\,\Psi^{(1)}_{xx} + f(x,t)\,, \qquad \Psi^{(2)}_{tt} = c^2\,\Psi^{(2)}_{xx} + f(x,t)\,, \tag{8.36}
\]
and the same initial and boundary conditions. Consider then their difference, $\Psi = \Psi^{(1)} - \Psi^{(2)}$. Obviously, it satisfies the homogeneous wave equation, $\Psi_{tt} = c^2\Psi_{xx}$ (without the force term $f$), the zero initial conditions,
\[
\Psi(x,0) = 0 \quad\text{and}\quad \left.\frac{\partial\Psi}{\partial t}\right|_{t=0} = 0\,, \tag{8.37}
\]
and zero boundary conditions: at each of the ends $x=0$ and $x=L$ either the function itself, $\Psi$, or its spatial derivative, $\Psi_x$, is equal to zero at all times. Consider now the (energy-like) function
\[
E(t) = \frac{1}{2}\int_0^L\left[\left(\Psi_x\right)^2 + \frac{1}{c^2}\left(\Psi_t\right)^2\right] dx\,. \tag{8.40}
\]
Differentiating it with respect to time and integrating by parts in the term containing $\Psi_x\Psi_{xt}$, we obtain
\[
\frac{dE}{dt} = \left(\Psi_x\Psi_t\right)_{x=L} - \left(\Psi_x\Psi_t\right)_{x=0} - \int_0^L\Psi_t\left[\Psi_{xx} - \frac{1}{c^2}\,\Psi_{tt}\right] dx\,. \tag{8.41}
\]
Consider the free term, $\Psi_x\Psi_t$, which is calculated at $x=0$. At this end of the string the boundary condition states that either $\Psi$ is equal to zero at all times, or $\Psi_x$ is. If the latter is true, then we immediately see that the free term is zero. In the former case, differentiation of the condition $\Psi = 0$ at $x=0$ with respect to time gives $\Psi_t = 0$, which again results in the corresponding free term being zero. So, the free term, $\left(\Psi_x\Psi_t\right)_{x=0}$, is zero in either case. Similarly, the other free term, calculated at $x=L$, is also zero. So, for all four combinations of the boundary conditions both free terms are zero, and hence only the integral term remains in the right-hand side of Eq. (8.41). Therefore,
\[
\frac{dE}{dt} = -\int_0^L\Psi_t\left[\Psi_{xx} - \frac{1}{c^2}\,\Psi_{tt}\right] dx = 0\,,
\]
as the expression in the brackets is zero because of the PDE for $\Psi$ itself. Therefore, $E$ must be constant in time. Calculating this constant at $t=0$,
\[
E(0) = \frac{1}{2}\int_0^L\left[\left(\Psi_x(x,0)\right)^2 + \frac{1}{c^2}\left(\Psi_t(x,0)\right)^2\right] dx\,,
\]
and employing the initial conditions, Eq. (8.37), and the fact that from $\Psi(x,0) = 0$ it immediately follows, upon differentiation, that $\Psi_x(x,0) = 0$, we must conclude that $E(0) = 0$. In other words, the function $E(t) = 0$ at all times. On the other hand, the function $E(t)$ of Eq. (8.40) consists of the sum of two non-negative terms, and hence can be zero if and only if both these terms are equal to zero at the same time. This means that both $\Psi_x(x,t) = 0$ and $\Psi_t(x,t) = 0$ for any $x$ and $t$. Since $\Psi_x = 0$, the function $\Psi(x,t)$ cannot depend on $x$. Similarly, since $\Psi_t = 0$, our function cannot depend on $t$. In other words, $\Psi(x,t)$ must simply be a constant. However, because of the initial conditions, $\Psi(x,0) = 0$, this constant must be zero, i.e. $\Psi(x,t) = 0$, which contradicts our assumption that two different solutions are possible. Hence, that assumption is wrong.

We have just proved that specifying the initial conditions and any of the four types of boundary conditions uniquely defines the solution $\Psi(x,t)$ of the problem.
Consider now a string of finite length $L$ ($0 \le x \le L$) with fixed ends. The homogeneous wave equation
\[
\frac{1}{c^2}\,\Psi_{tt} = \Psi_{xx} \tag{8.42}
\]
is to be solved with general initial,
\[
\Psi(x,0) = \psi_1(x) \quad\text{and}\quad \Psi_t(x,0) = \psi_2(x)\,, \tag{8.43}
\]
and zero boundary,
\[
\Psi(0,t) = \Psi(L,t) = 0\,, \tag{8.44}
\]
conditions. To solve this problem, we shall first seek the solution in a very special form as a product of two functions,
\[
\Psi(x,t) = X(x)\,T(t)\,, \tag{8.45}
\]
where $X(x)$ is a function of only one variable $x$ and $T(t)$ is a function of the other variable $t$. This trial solution may seem too specific and hence may not serve as a solution of the whole problem (8.42)–(8.44); however, as will be seen below, we shall be able to construct a linear combination of such product solutions which will then satisfy our PDE together with the initial and boundary conditions. However, we shall build our solution gradually, step-by-step.
Substituting the product of the two functions, Eq. (8.45), into the PDE (8.42) gives
\[
\frac{d^2X}{dx^2}\,T = \frac{1}{c^2}\,X\,\frac{d^2T}{dt^2} \;\Longrightarrow\; \frac{1}{X}\frac{d^2X}{dx^2} = \frac{1}{c^2T}\frac{d^2T}{dt^2}\,. \tag{8.46}
\]
To get the form of the equation written to the right of the arrow, we divided both sides of the equation on the left of the arrow by the product $XT$. What we have obtained is quite peculiar. Indeed, the left-hand side of the obtained equation, $\frac{1}{X}\frac{d^2X}{dx^2}$, is a function of $x$ only, while the right-hand side, $\frac{1}{c^2T}\frac{d^2T}{dt^2}$, is a function of $t$ only. One may think that this cannot possibly be true, as this "equality" must hold for all values of $x$ and $t$. However, there is one and only one possibility which resolves this paradoxical situation: both functions, $\frac{1}{X}\frac{d^2X}{dx^2}$ and $\frac{1}{c^2T}\frac{d^2T}{dt^2}$, must be equal to the same constant. Calling this constant $K$, we then must have
\[
\frac{d^2X}{dx^2} = KX \tag{8.47}
\]
and
\[
\frac{d^2T}{dt^2} = c^2K\,T\,, \tag{8.48}
\]
which are two ordinary DEs for the functions $X(x)$ and $T(t)$, respectively. The constant $K$ is called the separation constant. This is because in the right part of Eq. (8.46) the variables $x$ and $t$ have been separated. Correspondingly, the method we are discussing is called the method of separation of variables.
The constant $K$ introduced above is as yet unknown. However, its available values can be determined if we impose the boundary conditions (8.44) on our product solution (8.45). Since in the product the function $T(t)$ does not depend on $x$, it is clear that the boundary conditions must be applied to the function $X(x)$ only. Hence, we should have
\[
X(0) = X(L) = 0\,.
\]
Let us try to solve Eq. (8.47) subject to these boundary conditions to see which values of $K$ are consistent with them. Three cases must be considered³: $K < 0$, $K > 0$ and $K = 0$.
1. When $K > 0$, we can write $K = p^2$ with $p > 0$. For this case the solution of (8.47) is a sum of two exponential functions, $X(x) = Ae^{px} + Be^{-px}$. The boundary conditions require $A + B = 0$ and $Ae^{pL} + Be^{-pL} = 0$. It is clear that there is only one solution of this system of two simultaneous algebraic equations, which is the trivial solution: $A = 0$ and $B = 0$. This in turn leads to the trivial solution $X(x) = 0$, which results in the trivial product solution $\Psi(x,t) = 0$, and that is not of any physical interest!
2. When $K = 0$, we find that $d^2X/dx^2 = 0$, which yields simply $X(x) = Ax + B$. This solution is also of no physical interest, because the boundary conditions at $x = 0$ and $x = L$ can only be satisfied by $A = 0$ and $B = 0$, leading again to the trivial solution.
3. What is left to consider is the case $K < 0$. We write $K = -k^2$ with (without loss of generality) some positive $k$ (see below). In this case we obtain the equation of a harmonic oscillator,
\[
\frac{d^2X}{dx^2} + k^2X = 0\,,
\]
whose general solution is $X(x) = A\sin(kx) + B\cos(kx)$. The boundary condition at $x=0$ gives $X(0) = B = 0$, while the one at $x=L$ yields $A\sin(kL) = 0$. When solving these equations, we have to consider all possible cases. Choosing the constant $A = 0$ would again yield the trivial solution; therefore, this equation can only be satisfied if $\sin(kL) = 0$, which gives us the possible values of $k$. These obviously are
\[
k_n = \frac{\pi n}{L}\,, \quad n = 1, 2, \ldots\,. \tag{8.52}
\]
³ Obviously, the constant $K$ must be real.
We have obtained not one but an infinite number of possible solutions for $k$, which we distinguish by the subscript $n$. Note that $n \ne 0$, as this gives the zero value for $k$, which we know is to be rejected as leading to the trivial solution. Also, negative values of $n$ (and hence negative values of $k_n = k_{-|n|} = -\pi|n|/L = -k_{|n|}$) do not give anything new, as these result in the same values of the separation constant $K_n = -k_n^2$ and in the solutions
\[
X_{-|n|}(x) = A\sin\left(k_{-|n|}\,x\right) = -A\sin\left(k_{|n|}\,x\right),
\]
which differ only by sign from the solutions $X_{|n|}(x) = A\sin\left(k_{|n|}\,x\right)$ associated with positive $n$. Hence the choice of $n = 1, 2, 3, \ldots$ in Eq. (8.52); there is no need to consider negative and zero $n$.
From the above analysis we see that the boundary conditions can only be satisfied when the separation constant $K$ takes certain discrete eigenvalues $K_n = -k_n^2$, where $n = 1, 2, \ldots$. This type of situation occurs frequently in the theory of PDEs. In fact, we already came across it in Sect. 4.7.2, when we considered the quantum-mechanical problem of the hydrogen atom.

Associated with the eigenvalue $k_n$ we have an eigenfunction
\[
X_n(x) = A_n\sin\left(\frac{\pi n x}{L}\right), \quad n = 1, 2, \ldots\,, \tag{8.53}
\]
where $A_n$ are some constants; these may be different for different $n$, so we distinguish them with the index $n$.
Next we have to solve the corresponding differential equation (8.48) for $T(t)$ with $K = -k_n^2$, which is again an equation of a harmonic oscillator:
\[
\frac{d^2T}{dt^2} + (k_nc)^2\,T = 0\,,
\]
with the general solution
\[
T_n(t) = D_n'\sin(\omega_n t) + E_n'\cos(\omega_n t)\,, \tag{8.54}
\]
where
\[
\omega_n = \frac{\pi c}{L}\,n\,, \quad n = 1, 2, \ldots\,. \tag{8.55}
\]
We can now collect our functions (8.53) and (8.54) to obtain the desired product solution (8.45):
\[
\Psi_n(x,t) = \sin\left(\frac{\pi n x}{L}\right)\left[D_n'\sin(\omega_n t) + E_n'\cos(\omega_n t)\right]. \tag{8.56}
\]
The wave equation (8.42) can be written in the form
\[
\left(\frac{\partial^2}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\right)\Psi = 0\,,
\]
or simply $\hat{L}\Psi = 0$, where
\[
\hat{L} = \frac{\partial^2}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}
\]
is the operator expression contained in the round brackets in the PDE above. It is easy to see that this operator is linear, i.e. for any numbers $\alpha$ and $\beta$ and any two functions $\Phi_1(x,t)$ and $\Phi_2(x,t)$ we have
\[
\hat{L}\left(\alpha\Phi_1 + \beta\Phi_2\right) = \alpha\,\hat{L}\Phi_1 + \beta\,\hat{L}\Phi_2\,.
\]
Indeed, consider for instance the first part of the operator $\hat{L}$:
\[
\frac{\partial^2}{\partial x^2}\left(\alpha\Phi_1 + \beta\Phi_2\right) = \alpha\,\frac{\partial^2\Phi_1}{\partial x^2} + \beta\,\frac{\partial^2\Phi_2}{\partial x^2}\,,
\]
as required. Therefore, if the functions $\Phi_1$ and $\Phi_2$ are solutions of the PDE, i.e. $\hat{L}\Phi_1 = 0$ and $\hat{L}\Phi_2 = 0$, then $\hat{L}\left(\alpha\Phi_1 + \beta\Phi_2\right) = 0$ as well, i.e. their arbitrary linear combination, $\alpha\Phi_1 + \beta\Phi_2$, must also satisfy the PDE. On top of that, if $\Phi_1$ and $\Phi_2$ satisfy the zero boundary conditions, then their linear combination, $\alpha\Phi_1 + \beta\Phi_2$, will satisfy them as well.
It is now clear how the superposition principle may help in devising a solution which satisfies the PDE, the boundary and the initial conditions. We have already built individual solutions (8.56) that each satisfy the PDE and the zero boundary conditions. If we now construct a linear combination of all these solutions with arbitrary coefficients $\alpha_n$,
\[
\Psi(x,t) = \sum_{n=1}^{\infty}\alpha_n\Psi_n(x,t) = \sum_{n=1}^{\infty}\left[B_n\sin(\omega_n t) + C_n\cos(\omega_n t)\right]\sin\left(\frac{\pi n x}{L}\right), \tag{8.57}
\]
then this function will satisfy the PDE due to the superposition principle. An essential point to realise now is that it will also satisfy the zero boundary conditions, as each term in the sum obeys them! Therefore, this construction satisfies the PDE and the boundary conditions, giving us enough freedom to satisfy the initial conditions as well: we can try to find the coefficients $B_n$ and $C_n$ in such a way as to clear this last hurdle of the method. Note that the new constants $B_n = \alpha_nD_n'$ and $C_n = \alpha_nE_n'$ are at this point still arbitrary, since $\alpha_n$ are arbitrary, as are $D_n'$ and $E_n'$.
To satisfy the initial conditions, we substitute the linear combination (8.57) into conditions (8.43). This procedure yields
\[
\Psi(x,0) = \sum_{n=1}^{\infty}C_n\sin\left(\frac{\pi n x}{L}\right) = \psi_1(x)\,, \tag{8.58}
\]
\[
\left.\Psi_t(x,t)\right|_{t=0} = \left\{\sum_{n=1}^{\infty}\omega_n\left[B_n\cos(\omega_n t) - C_n\sin(\omega_n t)\right]\sin\left(\frac{\pi n x}{L}\right)\right\}_{t=0} = \sum_{n=1}^{\infty}B_n\,\omega_n\sin\left(\frac{\pi n x}{L}\right) = \psi_2(x)\,. \tag{8.59}
\]
Now, what we have just obtained is quite curious: we have just one equation (8.58) for an infinite number of coefficients $C_n$, and another single equation (8.59) for an infinite set of coefficients $B_n$. How does this help in finding these coefficients? Well, although we do indeed have just one equation for either set of coefficients, these equations are written for a continuous set of $x$ values between $0$ and $L$. That means that, strictly speaking, we have an infinite number of such equations, and that must be sufficient to find all the coefficients.
Equipped with this understanding, we now need to devise a practical way of finding the coefficients $C_n$ and $B_n$. For this we note that we have already come across the functions $\sin(\pi nx/L)$ in Sect. 3.1, when considering Fourier series. We have shown there that these eigenfunctions satisfy the orthogonality relation (3.6), with the integration performed from $-L$ to $L$. Therefore, we can employ the general method developed there (see also a more fundamental discussion in Sect. 3.7.3) to find the coefficients: multiply both sides of Eqs. (8.58) and (8.59) by the function $\sin(\pi mx/L)$ with some fixed $m$ ($= 1, 2, \ldots$),
then integrate both sides with respect to $x$ from $0$ to $L$, and use the orthogonality relations (8.60). However, in our case this is not even needed: Eqs. (8.58) and (8.59) are just Fourier sine series for $\psi_1(x)$ and $\psi_2(x)$, respectively. It follows, therefore, that in both cases the expressions for the coefficients can be borrowed directly from Eq. (3.10):
\[
C_n = \frac{2}{L}\int_0^L\psi_1(x)\sin\left(\frac{\pi n x}{L}\right) dx\,, \tag{8.61}
\]
\[
B_n = \frac{2}{\omega_nL}\int_0^L\psi_2(x)\sin\left(\frac{\pi n x}{L}\right) dx\,, \tag{8.62}
\]
where $n = 1, 2, \ldots$. This result finally solves the entire problem, as the solution (8.57) with the coefficients given by Eqs. (8.61) and (8.62) satisfies the PDE and both the boundary and initial conditions.
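The complete recipe (8.57), (8.61), (8.62) is easy to turn into a short numerical routine. The sketch below is only an illustration: the triangular ("plucked") initial displacement, the zero initial velocity and the number of retained modes are assumptions made for the example.

```python
# Sketch: truncated Fourier solution (8.57) with coefficients (8.61), (8.62).
# The triangular initial displacement psi1 and zero initial velocity psi2
# are illustrative assumptions.
import numpy as np
from scipy.integrate import quad

L, c, N = 1.0, 1.0, 50     # string length, wave speed, number of modes

psi1 = lambda x: np.where(x < L/5, 0.3*x/(L/5), 0.3*(L - x)/(L - L/5))
psi2 = lambda x: 0.0

def coefficients(n):
    w_n = np.pi*n*c/L                                                  # (8.55)
    C_n = (2/L)*quad(lambda x: psi1(x)*np.sin(np.pi*n*x/L), 0, L)[0]   # (8.61)
    B_n = (2/(w_n*L))*quad(lambda x: psi2(x)*np.sin(np.pi*n*x/L), 0, L)[0]  # (8.62)
    return w_n, B_n, C_n

def Psi(x, t):
    s = 0.0
    for n in range(1, N + 1):
        w_n, B_n, C_n = coefficients(n)
        s += (B_n*np.sin(w_n*t) + C_n*np.cos(w_n*t))*np.sin(np.pi*n*x/L)
    return s

print(Psi(L/5, 0.0))   # should be close to psi1(L/5) = 0.3
```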
In the special case when the string is initially at equilibrium (the initial displacement is zero, $\psi_1(x) = 0$) and is given an initial kick (so that $\psi_2(x) \ne 0$ at least for some values of $x$), the coefficients $C_n = 0$ for all $n$. Conversely, if the string has been initially displaced, $\psi_1(x) \ne 0$, and then released so that the initial velocities $\Psi_t(x,0)$ are zero (and hence $\psi_2(x) = 0$), then all the coefficients $B_n = 0$.
Hence, the solution consists of a superposition of oscillations $\Psi_n(x,t)$ associated with different frequencies $\omega_n$, Eq. (8.55): each point $x$ of the string in the elementary oscillation $\Psi_n(x,t)$ performs a simple harmonic motion with that frequency. At the same time, if we look at the shape of the string due to the same elementary motion at a particular time $t$, it is given by the sine function $\sin(\pi nx/L)$. The elementary motion $\Psi_n(x,t)$ is called the $n$-th normal mode of vibration of the string, and its frequency of vibration, $\omega_n = \pi nc/L$, is called its $n$-th harmonic or normal mode frequency. The $n=1$ normal mode is called fundamental, with the frequency $\omega_1 = \pi c/L$. The frequencies of all other modes are integer multiples of the fundamental frequency, $\omega_n = n\omega_1$, i.e. $\omega_n$ is exactly $n$ times larger.
Note that the fundamental frequency is the lowest sound frequency the given string can produce. Recalling the expression derived in Sect. 8.2.1 for the wave velocity of the string, $c = \sqrt{T_0/\rho}$, where $T_0$ is the tension and $\rho$ the string density, we see that there are several ways of affecting the frequency
\[
\omega_1 = \frac{\pi}{L}\sqrt{\frac{T_0}{\rho}}
\]
of the fundamental mode. Indeed, taking a longer string and/or applying less tension would reduce $\omega_1$, while increasing the tension and/or taking a shorter string would increase the lowest frequency of sound the string can produce. These principles are widely used in various musical instruments such as the guitar, violin, piano, etc. For instance, in a six-string guitar, by turning the peg heads (or tuning keys) the tension in a string can be adjusted without changing its length considerably, causing the string's fundamental frequency to go either up (higher pitch) or down
(lower pitch). Also note that the thinner strings of the guitar (they are arranged in a lower position) produce a higher pitch than the thicker ones (which are set up at a higher position in the set), since thinner strings have a smaller linear density $\rho$. All six strings of a classical guitar are approximately of the same length.
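To get a rough feel for the numbers, one may evaluate $\omega_1 = \frac{\pi}{L}\sqrt{T_0/\rho}$ directly; the tension, density and length below are illustrative values only, not data from the text.

```python
# Illustrative estimate of the fundamental frequency of a stretched string.
# T0, rho and L are assumed example values, not data from the text.
import math

T0  = 70.0      # tension, N
rho = 4.0e-3    # linear mass density, kg/m
L   = 0.65      # string length, m

c  = math.sqrt(T0/rho)      # wave velocity c = sqrt(T0/rho)
w1 = math.pi*c/L            # fundamental angular frequency, omega_1 = pi c / L
f1 = w1/(2*math.pi)         # frequency in Hz, f1 = c/(2L)

print(f"c = {c:.1f} m/s, f1 = {f1:.1f} Hz")
# Doubling the tension raises f1 by sqrt(2); halving L doubles f1.
```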
Problem 8.19. Consider a string of length $L$ fixed at both ends. Assume that the string is initially (at $t=0$) pulled by $0.06$ at $x = L/5$ and then released. Show that the corresponding solution of the wave equation is
\[
\Psi(x,t) = \sum_{n=1}^{\infty}\frac{3}{4\pi^2n^2}\,\sin\left(\frac{\pi n}{5}\right)\sin\left(\frac{\pi n x}{L}\right)\cos\left(\frac{\pi c n t}{L}\right).
\]
Problem 8.21. Consider the previous problem again, but assume that the $x=L$ end of the pipe is closed as well. Show that in this case
\[
\Psi(x,t) = \sum_{n=1\,(\mathrm{odd})}^{\infty}\frac{4v_0}{cLp_n^2}\,\sin\left(p_nct\right)\sin\left(p_nx\right),
\]
where the summation runs only over odd values of $n$ and $p_n = \pi n/L$.
Let us now consider the more complex problem of forced oscillations of the string. In this case the PDE has the form
\[
\Psi_{tt} = c^2\,\Psi_{xx} + f(x,t)\,, \tag{8.63}
\]
where $f(x,t)$ is some function of $x$ and $t$. We shall again consider zero boundary conditions,
\[
\Psi(0,t) = \Psi(L,t) = 0\,, \tag{8.64}
\]
and general initial conditions,
\[
\Psi(x,0) = \psi_1(x) \quad\text{and}\quad \Psi_t(x,0) = \psi_2(x)\,. \tag{8.65}
\]
The method of separation of variables is not directly applicable here because of the function $f(x,t)$. Still, as we shall see below, even in the general case of arbitrary $f(x,t)$, it is easy to reformulate the problem in such a way that it can be solved using the method of separation of variables.

Indeed, let us seek the solution $\Psi(x,t)$ as a sum of two functions: $U(x,t)$ and $V(x,t)$. The first one satisfies the following homogeneous problem with zero boundary conditions and the original initial conditions,
\[
c^2U_{xx} = U_{tt}\,; \quad U(0,t) = U(L,t) = 0\,; \quad U(x,0) = \psi_1(x) \ \text{ and } \ U_t(x,0) = \psi_2(x)\,, \tag{8.66}
\]
while the other function satisfies the inhomogeneous PDE with both zero initial and boundary conditions:
\[
c^2V_{xx} + f(x,t) = V_{tt}\,; \quad V(0,t) = V(L,t) = 0\,; \quad V(x,0) = V_t(x,0) = 0\,. \tag{8.67}
\]
It is easy to see that the function $\Psi = U + V$ satisfies the original problem (8.63)–(8.65). The beauty of this separation into two problems (which is reminiscent of splitting the solution of an inhomogeneous linear DE into a complementary solution and a particular integral) is that the $U$ problem is identical to the one we considered before in Sect. 8.2.5 and hence can be solved by the method of separation of variables. It takes full care of our general initial conditions. Therefore, we only need to consider the second problem, related to the function $V(x,t)$, which contains the inhomogeneity. We need to find just one solution (the particular integral) of this second problem. Recall (Sect. 8.2.4) that the solution of the problem (8.67) is unique.
To find the particular integral $V(x,t)$, we shall use the following trick. This function is specified on the finite interval $0 \le x \le L$ only, and is equal to zero at its ends. However, we are free to extend ("continue") it to the twice larger interval $-L \le x \le L$. It is also convenient to define $V(x,t)$ such that it is odd in the whole interval: $V(-x,t) = -V(x,t)$. Moreover, after that, we can also periodically repeat $V(x,t)$ thus defined over the whole $x$ axis. This makes $V(x,t)$ periodic and hence expandable into a Fourier series (at each $t$) with the period $2L$:
\[
V(x,t) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left[W_n\cos\left(\frac{\pi n x}{L}\right) + V_n\sin\left(\frac{\pi n x}{L}\right)\right].
\]
Since $V(x,t)$ has been constructed to be odd, the free term and the cosine terms vanish, and we are left with the sine series
\[
V(x,t) = \sum_{n=1}^{\infty}V_n(t)\sin\left(\frac{\pi n x}{L}\right), \tag{8.68}
\]
with time-dependent coefficients $V_n(t)$. Substituting this expansion into the inhomogeneous PDE of the problem (8.67) gives
\[
\sum_{n=1}^{\infty}\left[-c^2\left(\frac{\pi n}{L}\right)^2V_n(t)\right]\sin\left(\frac{\pi n x}{L}\right) + f(x,t) = \sum_{n=1}^{\infty}\ddot{V}_n(t)\sin\left(\frac{\pi n x}{L}\right);
\]
here $\ddot{V}_n = \partial^2V_n/\partial t^2$. Multiply both sides by $X_m(x) = \sin(\pi mx/L)$ with some integer $m$ and then integrate with respect to $x$ between $0$ and $L$. Because the eigenfunctions $X_n$ are orthogonal, only the single term with $n = m$ will be left in both sums, yielding
\[
-c^2\left(\frac{\pi m}{L}\right)^2\frac{L}{2}\,V_m(t) + \int_0^Lf(x,t)\sin\left(\frac{\pi m x}{L}\right)dx = \frac{L}{2}\,\ddot{V}_m(t)\,,
\]
or
\[
\ddot{V}_m(t) + \omega_m^2\,V_m(t) = f_m(t)\,, \tag{8.69}
\]
where $\omega_m = \pi cm/L$ and
\[
f_m(t) = \frac{2}{L}\int_0^Lf(x,t)\sin\left(\frac{\pi m x}{L}\right)dx\,. \tag{8.70}
\]
This is the ordinary DE of a forced harmonic oscillator for each Fourier coefficient $V_m(t)$; the right-hand side $f_m(t)$ may be interpreted as the $m$-th coefficient of a sine Fourier expansion of the force, Eq. (8.71).
However, great care is needed here when offering this "interpretation". Indeed, although originally the function $f(x,t)$ was defined only for $0 \le x \le L$, it can additionally (and arbitrarily) be "defined" for $-L < x < 0$ as well. Then the piece thus defined on the interval $-L < x \le L$ can be periodically repeated over the whole $x$ axis, justifying an expansion of $f(x,t)$ into a Fourier series. This series, however, will contain all terms, including the free and the cosine terms. Since $f(x,t)$ may not in general be equal to zero at the boundaries $x=0$ and $x=L$, it is impossible to justify the sine Fourier series in Eq. (8.71) for $f(x,t)$. At the same time, we arrived at Eq. (8.70) for the coefficients $f_m(t)$ without assuming anything about the function $f(x,t)$.
So, what is left to do is to solve the DE (8.69) with respect to $V_m(t)$. At this point it is convenient to recall that $V(x,t)$ must satisfy zero initial conditions. This can only be accomplished if for all values of $m$ we have
\[
V_m(0) = \left.\frac{d}{dt}V_m(t)\right|_{t=0} = 0\,.
\]
Problem 8.22. Show that the general solution of the DE of a forced harmonic oscillator, $y'' + \omega^2y = f(x)$, is
\[
y(x) = \left[C_1 + \frac{1}{2i\omega}\int_0^xf(x_1)\,e^{-i\omega x_1}dx_1\right]e^{i\omega x} + \left[C_2 - \frac{1}{2i\omega}\int_0^xf(x_1)\,e^{i\omega x_1}dx_1\right]e^{-i\omega x}\,. \tag{8.73}
\]
Show then that for zero initial conditions, $y(0) = y'(0) = 0$, this solution can be rewritten as
\[
y(x) = \frac{1}{\omega}\int_0^xf(x_1)\sin\left[\omega(x - x_1)\right]dx_1\,. \tag{8.74}
\]
Using formula (8.74) of the Problem and Eq. (8.70), we can write down the solution of the DE (8.69) as
\[
V_m(t) = \frac{1}{\omega_m}\int_0^tf_m(t_1)\sin\left[\omega_m(t - t_1)\right]dt_1 = \frac{2}{\omega_mL}\int_0^tdt_1\int_0^Ldx\,f(x,t_1)\sin\left[\omega_m(t - t_1)\right]\sin\left(\frac{\pi m x}{L}\right). \tag{8.75}
\]
Once the Fourier coefficients $V_m(t)$ are defined via the function $f(x,t)$, the full Fourier series (8.68) is completely defined for the auxiliary function $V(x,t)$, yielding the solution as $\Psi = U + V$. This fully solves our problem.
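Equation (8.75) is a double integral that can be evaluated numerically for any given force. Below is a sketch only; the driving force $f(x,t)$ and the parameters are illustrative assumptions.

```python
# Sketch: numerical evaluation of the coefficients V_m(t) of Eq. (8.75).
# The driving force f(x,t) below is an illustrative assumption.
import numpy as np
from scipy.integrate import dblquad

L, c = 1.0, 1.0
f = lambda x, t: np.sin(2*np.pi*x/L)*np.cos(3.0*t)   # example force

def V_m(m, t):
    w_m = np.pi*m*c/L
    # (2/(w_m L)) * int_0^t dt1 int_0^L dx f(x,t1) sin(w_m(t-t1)) sin(pi m x/L)
    val, _ = dblquad(lambda x, t1: f(x, t1)*np.sin(w_m*(t - t1))
                     *np.sin(np.pi*m*x/L),
                     0, t, lambda t1: 0.0, lambda t1: L)
    return 2*val/(w_m*L)

print(V_m(2, 1.0))   # only m = 2 couples to this particular force
```

Because this particular force contains only the $m=2$ spatial harmonic, all other coefficients $V_m(t)$ vanish by orthogonality, which is a useful consistency check.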
Now we are prepared to solve the most general boundary problem, in which both initial and boundary conditions are arbitrary and the equation is inhomogeneous:
\[
\Psi_{tt} = c^2\,\Psi_{xx} + f(x,t)\,; \quad \Psi(x,0) = \psi_1(x)\,, \ \Psi_t(x,0) = \psi_2(x)\,; \quad \Psi(0,t) = \varphi_1(t)\,, \ \Psi(L,t) = \varphi_2(t)\,.
\]
The trick here is to first introduce an auxiliary function, $U(x,t)$, which satisfies the boundary conditions above:
\[
U(0,t) = \varphi_1(t) \quad\text{and}\quad U(L,t) = \varphi_2(t)\,.
\]
There could be many choices to accommodate this requirement. The simplest choice seems to be a function linear in $x$:
\[
U(x,t) = \varphi_1(t) + \frac{x}{L}\left[\varphi_2(t) - \varphi_1(t)\right]. \tag{8.79}
\]
It is easy to see why this is useful: with this choice the function $V(x,t) = \Psi(x,t) - U(x,t)$ satisfies zero boundary conditions. Of course, the PDE for the function $V(x,t)$ will look more complex, and the initial conditions will be modified. However, it is easy to see that the problem for this new function can be solved, since we have eliminated the main difficulty of having arbitrary boundary conditions. Then, the full solution is obtained via $\Psi(x,t) = U(x,t) + V(x,t)$.
Problem 8.23. Show that the problem for the auxiliary function $V(x,t)$ corresponds to solving an inhomogeneous wave equation with general initial and zero boundary conditions:
\[
V_{tt} = c^2V_{xx} + \tilde{f}(x,t)\,; \quad V(x,0) = \tilde{\psi}_1(x)\,, \ V_t(x,0) = \tilde{\psi}_2(x)\,; \quad V(0,t) = V(L,t) = 0\,,
\]
where
\[
\tilde{f}(x,t) = f(x,t) - \varphi_1''(t) - \frac{x}{L}\left[\varphi_2''(t) - \varphi_1''(t)\right],
\]
\[
\tilde{\psi}_1(x) = \psi_1(x) - \varphi_1(0) - \frac{x}{L}\left[\varphi_2(0) - \varphi_1(0)\right],
\]
\[
\tilde{\psi}_2(x) = \psi_2(x) - \varphi_1'(0) - \frac{x}{L}\left[\varphi_2'(0) - \varphi_1'(0)\right].
\]
The full solution of this particular problem was given in Sect. 8.2.6.
Problem 8.24. Consider the following stationary (i.e. fully time independent) problem:
\[
c^2\,\Psi_{xx} + f(x) = 0\,; \quad \Psi(0) = \varphi_1\,, \ \Psi(L) = \varphi_2\,,
\]
where $\varphi_1$ and $\varphi_2$ are constants. Note that we do not have any initial conditions here; it is assumed that if the system was subjected to some initial conditions with stationary boundary conditions and a time independent external force, then after a very long time the system would no longer "remember" the initial conditions; their effect will be washed out completely.
Integrate the PDE twice with respect to $x$ (keeping the arbitrary constants) to show that
\[
\Psi(x) = C_1x + C_2 - \frac{1}{c^2}\int_0^xdx_1\int_0^{x_1}f(x_2)\,dx_2\,.
\]
Then, applying the boundary conditions, show that
\[
C_1 = \frac{1}{L}\left[\varphi_2 - \varphi_1 + \frac{1}{c^2}\int_0^Ldx_1\int_0^{x_1}f(x_2)\,dx_2\right]
\]
and $C_2 = \varphi_1$.
Problem 8.25. Consider a general problem with arbitrary initial conditions, stationary boundary conditions and a time independent function $f(x,t) = f(x)$:
\[
\Psi_{tt} = c^2\Psi_{xx} + f(x)\,; \quad \Psi(x,0) = \psi_1(x)\,, \ \Psi_t(x,0) = \psi_2(x)\,; \quad \Psi(0,t) = \varphi_1\,, \ \Psi(L,t) = \varphi_2\,.
\]
Construct the solution as $\Psi(x,t) = U(x) + V(x,t)$, where $U(x)$ is the solution of the corresponding stationary problem from Problem 8.24. Show that the function $V(x,t)$ satisfies the following homogeneous problem with zero boundary conditions:
\[
c^2V_{xx} = V_{tt}\,; \quad V(0,t) = V(L,t) = 0\,; \quad V(x,0) = \psi_1(x) - U(x)\,, \ V_t(x,0) = \psi_2(x)\,.
\]
The method of solving the wave equation in the 1D case considered above can be generalised to the 2D and 3D cases as well. To illustrate this point, we shall consider here the 2D case of transverse oscillations of a square membrane fixed around its boundary (this corresponds to zero boundary conditions). The membrane is shown in Fig. 8.3; it is stretched in the $x$–$y$ plane over the intervals $0 \le x \le L$ and $0 \le y \le L$.

Let $\Psi(x,y,t)$ be the transverse displacement of the point $(x,y)$ of the membrane (basically, $\Psi$ is the $z$ coordinate of the point $(x,y)$ of the oscillating membrane). Then, the corresponding wave equation to solve is $\Delta\Psi = \frac{1}{c^2}\Psi_{tt}$, or
\[
\frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} = \frac{1}{c^2}\frac{\partial^2\Psi}{\partial t^2}\,. \tag{8.80}
\]
The membrane is fixed along its perimeter, so that the appropriate boundary conditions are
\[
\Psi(x,0,t) = \Psi(x,L,t) = 0\,, \quad 0 \le x \le L\,; \qquad \Psi(0,y,t) = \Psi(L,y,t) = 0\,, \quad 0 \le y \le L\,. \tag{8.81}
\]
The initial conditions specify the initial displacement and velocity of each point of the membrane:
\[
\Psi(x,y,0) = \psi_1(x,y) \quad\text{and}\quad \Psi_t(x,y,0) = \psi_2(x,y)\,. \tag{8.82}
\]
We seek elementary product solutions of the form
\[
\Psi(x,y,t) = X(x)\,Y(y)\,T(t)\,. \tag{8.83}
\]
Substituting this trial solution into the PDE (8.80), one gets
\[
X''\,Y\,T + X\,Y''\,T = \frac{1}{c^2}\,X\,Y\,T''\,,
\]
which upon dividing by $XYT$ results in
\[
\frac{X''}{X} + \frac{Y''}{Y} = \frac{1}{c^2}\frac{T''}{T}\,. \tag{8.84}
\]
Here on the left-hand side the two terms each depend only on their own variable, $x$ or $y$, respectively, while the function on the right-hand side depends only on the variable $t$. This is only possible if each of the expressions $X''/X$, $Y''/Y$ and $T''/T$ is a constant. Therefore, we can write
\[
\frac{X''}{X} = k_1\,, \quad \frac{Y''}{Y} = k_2\,, \quad\text{and hence}\quad \frac{1}{c^2}\frac{T''}{T} = k_1 + k_2\,, \tag{8.85}
\]
where $k_1$ and $k_2$ are two independent separation constants. Thus, we have three ordinary DEs, one for each of the three functions:
\[
X'' = k_1X\,, \tag{8.86}
\]
\[
Y'' = k_2Y\,, \tag{8.87}
\]
\[
T'' = c^2\left(k_1 + k_2\right)T\,. \tag{8.88}
\]
The next steps are similar to the 1D case considered above: we first consider the equations for $X(x)$ and $Y(y)$, trying to satisfy the boundary conditions; this gives us the permissible values of the separation constants $k_1$ and $k_2$. Once this is done, the DE for $T(t)$ is solved. Finally, a general solution is constructed as a linear combination of all product solutions, with the coefficients to be determined from the initial conditions.

So, following this plan, let us consider the DE for the function $X(x)$, Eq. (8.86). The boundary conditions on the product (8.83) at $x=0$ and $x=L$ require the function $X(x)$ to satisfy $X(0) = X(L) = 0$. This problem is fully equivalent to the one we considered in Sect. 8.2.5 when discussing the one-dimensional string. Therefore, it immediately follows that $k_1 = -\lambda_n^2$ must be negative, with $\lambda_n$ taking on the discrete values
\[
\lambda_n = \frac{\pi}{L}\,n\,, \quad n = 1, 2, \ldots\,,
\]
while the allowed solutions for $X(x)$ are given by the eigenfunctions
\[
X_n(x) = \sin\left(\frac{\pi n}{L}\,x\right), \tag{8.89}
\]
corresponding to the eigenvalues $k_1 = -\lambda_n^2$ for any $n = 1, 2, \ldots$. We do not need to bother about a constant amplitude (a prefactor to the sine function) here, as it will be absorbed by other constants in $T(t)$ in the elementary product solution; recall that this is exactly what happened in the case of the one-dimensional string. So we simply choose it equal to one here.
The boundary conditions (8.81) applied to (8.87) give a similar result for $k_2$, namely $k_2 = -\mu_m^2$ with $\mu_m = \pi m/L$, where $m = 1, 2, \ldots$ is another positive integer, and the corresponding eigenfunctions are
\[
Y_m(y) = \sin\left(\frac{\pi m}{L}\,y\right), \quad m = 1, 2, \ldots\,. \tag{8.90}
\]
Again, we do not keep an amplitude (prefactor) to the sine function, as it will be absorbed by other constants appearing in $T(t)$.
Next we consider Eq. (8.88) for $T(t)$:
\[
T'' + c^2\left(\lambda_n^2 + \mu_m^2\right)T = 0\,, \tag{8.91}
\]
which contains both $n$ and $m$. It is a harmonic oscillator equation with the solution
\[
T_{nm}(t) = A_{nm}\sin(\omega_{nm}t) + B_{nm}\cos(\omega_{nm}t)\,, \quad \omega_{nm} = \frac{\pi c}{L}\sqrt{n^2 + m^2}\,.
\]
Collecting the pieces and summing, by virtue of the superposition principle, over all elementary product solutions, we arrive at the general solution
\[
\Psi(x,y,t) = \sum_{n,m=1}^{\infty}\left[A_{nm}\sin(\omega_{nm}t) + B_{nm}\cos(\omega_{nm}t)\right]\sin\left(\frac{\pi n}{L}x\right)\sin\left(\frac{\pi m}{L}y\right), \tag{8.94}
\]
which satisfies the PDE and the zero boundary conditions. Applying the initial condition for the displacement, $\Psi(x,y,0) = \psi_1(x,y)$, gives
\[
\sum_{n,m=1}^{\infty}B_{nm}\sin\left(\frac{\pi n}{L}x\right)\sin\left(\frac{\pi m}{L}y\right) = \psi_1(x,y)\,.
\]
Here $\Psi(x,y,0)$ is expanded into a double Fourier series with respect to the sine functions, so that the expansion coefficients, $B_{nm}$, are found from it in the following way. Multiply both sides of the above equation by the product $X_{n'}(x)Y_{m'}(y) = \sin\left(\frac{\pi n'}{L}x\right)\sin\left(\frac{\pi m'}{L}y\right)$ with some fixed positive integers $n'$ and $m'$, and then integrate
both sides over $x$ and $y$ between $0$ and $L$. The eigenfunctions $X_n(x)$ for all values of $n = 1, 2, \ldots$ form an orthogonal set, and so do the functions $Y_m(y)$. Therefore, on the left-hand side the integration over $x$ leaves only the single $n = n'$ term in the sum over $n$, while the integration over $y$ results in the single $m = m'$ term of the $m$-sum remaining; on the right-hand side there is a double integral with respect to $x$ and $y$. Hence, we immediately obtain
\[
B_{nm} = \left(\frac{2}{L}\right)^2\int_0^Ldx\int_0^Ldy\,\psi_1(x,y)\sin\left(\frac{\pi n}{L}x\right)\sin\left(\frac{\pi m}{L}y\right). \tag{8.95}
\]
To obtain the coefficients $A_{nm}$, we apply the initial condition to the time derivative of $\Psi(x,y,t)$ at $t=0$. Since
\[
\Psi_t(x,y,t) = \frac{\partial\Psi}{\partial t} = \sum_{n,m=1}^{\infty}\omega_{nm}\left[A_{nm}\cos(\omega_{nm}t) - B_{nm}\sin(\omega_{nm}t)\right]\sin\left(\frac{\pi n}{L}x\right)\sin\left(\frac{\pi m}{L}y\right),
\]
we must have $\Psi_t(x,y,0) = \psi_2(x,y)$, and therefore, after reasoning similar to that used to derive the above formula for the $B_{nm}$ coefficients, we obtain
\[
A_{nm} = \frac{1}{\omega_{nm}}\left(\frac{2}{L}\right)^2\int_0^Ldx\int_0^Ldy\,\psi_2(x,y)\sin\left(\frac{\pi n}{L}x\right)\sin\left(\frac{\pi m}{L}y\right). \tag{8.96}
\]
Equations (8.94)–(8.96) fully solve the problem, as they define a function $\Psi(x,y,t)$ that satisfies the PDE and both the initial and boundary conditions. The case of a circular membrane, also treated using the method of separation of variables, but employing polar coordinates, was considered in detail in Sect. 4.7.5.
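The double Fourier coefficients (8.95) are easily computed by 2D quadrature. The sketch below is an illustration only: the initial membrane shape $\psi_1$, the assumption $\psi_2 = 0$ (a released membrane) and the truncation are example choices.

```python
# Sketch: coefficients B_nm of Eq. (8.95) by 2D quadrature and the membrane
# displacement (8.94) for a released (psi_2 = 0) initial shape psi_1;
# the chosen psi_1 is an illustrative assumption.
import numpy as np
from scipy.integrate import dblquad

L, c, N = 1.0, 1.0, 8
psi1 = lambda x, y: x*(L - x)*y*(L - y)        # example initial displacement

def B(n, m):
    val, _ = dblquad(lambda y, x: psi1(x, y)*np.sin(np.pi*n*x/L)
                     *np.sin(np.pi*m*y/L),
                     0, L, lambda x: 0.0, lambda x: L)
    return (2/L)**2 * val

def Psi(x, y, t):
    s = 0.0
    for n in range(1, N + 1):
        for m in range(1, N + 1):
            w = np.pi*c/L*np.hypot(n, m)       # omega_nm = (pi c/L) sqrt(n^2+m^2)
            s += B(n, m)*np.cos(w*t)*np.sin(np.pi*n*x/L)*np.sin(np.pi*m*y/L)
    return s

print(Psi(L/2, L/2, 0.0))   # ~ psi1(L/2, L/2) = 0.0625
```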
We have seen above in Sect. 8.2.7 that the Fourier method plays a central role in solving a general problem with non-zero boundary and general initial conditions. Therefore, it is worth repeating the main steps which serve as the foundation of this method.

The method of separation of variables is only applicable to problems with zero boundary conditions. For a PDE in any number of dimensions, written for any number of coordinates, the construction of the general solution is based on several well defined steps:
1. Construct a product solution as a product of functions, each depending on just one variable.
2. Substitute the product solution into the PDE and attempt to separate the variables in the resulting equation. This may be done in steps. The simple separation like the one in Eq. (8.84) may not always be possible; instead, a step-by-step method is to be used. This works as follows: using algebraic manipulations, "localise" all terms depending on a single variable into a single expression that does not contain the other variables; hence, this expression must be a constant. Equate this expression to the first constant $k_1$, yielding the first ordinary DE for the variable in question. Once the whole term in the PDE is replaced by that constant, the obtained equation will depend only on the other variables, for which the same procedure is repeated until only the last single variable is left; this will give an ordinary DE containing all the separation constants.
3. This procedure gives separated ordinary DEs for each of the functions in the elementary product we started from.
4. Solve the ordinary DEs for the functions in the product using the zero boundary conditions. This procedure gives an infinite number of possible solutions for these functions, together with the possible values of the corresponding separation constants, i.e. a set of eigenfunctions and the eigenvalues corresponding to them. It is now possible to write down a set of product solutions $\Psi_i$ (where $i = 1, 2, \ldots$) for the PDE. Each of these solutions satisfies the equation and the zero boundary conditions.
5. Next we use the superposition principle to construct the general solution of the PDE in the form of a linear combination of all elementary product solutions:
\[
\Psi = \sum_{i=1}^{\infty}C_i\Psi_i\,. \tag{8.97}
\]
6. Finally, find the coefficients $C_i$ from the initial conditions.

Whether the variables in a PDE can actually be separated may depend on the coordinate system used. Consider, for instance, the following 2D PDE:
\[
\Delta\Psi + \left(x^2 + y^2\right)^{\alpha}\Psi = 0\,, \tag{8.98}
\]
where $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$ is the 2D Laplacian operator. An application of the product solution $\Psi(x,y) = X(x)Y(y)$ in this case gives
\[
\frac{1}{X}\frac{d^2X}{dx^2} + \frac{1}{Y}\frac{d^2Y}{dy^2} + \left(x^2 + y^2\right)^{\alpha} = 0\,. \tag{8.99}
\]
We see that in Cartesian coordinates Eq. (8.98) is not separable; this is because of the free term $\left(x^2 + y^2\right)^{\alpha}$.
This, however, does not necessarily mean that the PDE is not separable at all: in some special coordinate system it may still turn out to be separable! Indeed, suppose now that we transform the PDE (8.98) to polar coordinates $(r,\phi)$. This procedure gives
\[
\frac{1}{r}\frac{\partial}{\partial r}\left(r\,\frac{\partial\Psi}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2\Psi}{\partial\phi^2} + r^{2\alpha}\Psi = 0\,. \tag{8.100}
\]
Substituting the product solution $\Psi = R(r)\Phi(\phi)$ and dividing by $R\Phi$, we obtain
\[
\frac{1}{rR}\left(rR'\right)' + \frac{1}{r^2}\left[\frac{\Phi''}{\Phi}\right] + r^{2\alpha} = 0\,.
\]
Here the expression in the square brackets is "localised" in the variable $\phi$ and hence must be a constant. Let us call it $k$. Hence, we obtain two ordinary DEs:
\[
\Phi'' = k\,\Phi \quad\text{and}\quad \frac{1}{rR}\left(rR'\right)' + \frac{k}{r^2} + r^{2\alpha} = 0\,.
\]
We have succeeded in separating our PDE into two ordinary DEs in the polar coordinate system, something which was impossible to do in the Cartesian system. We see that a PDE may be separable in one coordinate system, but not in another!
This shows the importance of knowing various curvilinear coordinate systems!
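A computer algebra system makes such checks painless. The following sketch (using sympy; the symbol names and the printed form are choices made for the example) substitutes the product ansatz $\Psi = R(r)\Phi(\phi)$ into the polar form (8.100) and shows that, after division by $\Psi$, the $\phi$-dependence collapses into the single combination $\Phi''/\Phi$:

```python
# A quick symbolic check (a sketch using sympy) that Psi = R(r)*Phi(phi)
# separates the polar form (8.100) of the PDE; alpha is a free parameter.
import sympy as sp

r, phi, alpha = sp.symbols('r phi alpha', positive=True)
R = sp.Function('R')(r)
Phi = sp.Function('Phi')(phi)

Psi = R*Phi
pde = (sp.diff(r*sp.diff(Psi, r), r)/r
       + sp.diff(Psi, phi, 2)/r**2
       + r**(2*alpha)*Psi)

# Divide through by Psi: the phi-dependence appears only via Phi''/Phi.
expr = sp.expand(sp.simplify(pde/Psi))
print(expr)
```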
Problem 8.26. Show similarly that the PDE
\[
\frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + e^{\alpha\left(x^2 + y^2\right)}\Psi = 0
\]
is not separable in Cartesian coordinates, but is separable in polar coordinates.

Problem 8.27. Investigate the separability of the 3D Laplace equation,
\[
\frac{\partial^2\Psi}{\partial x^2} + \frac{\partial^2\Psi}{\partial y^2} + \frac{\partial^2\Psi}{\partial z^2} = 0\,.
\]
An important particular case is that of stationary (time-independent) solutions: when all time derivatives vanish, we arrive at the Laplace equation
\[
\Delta\Psi = 0\,. \tag{8.101}
\]

8.3 Heat Conduction Equation

The heat transport (diffusion) equation in one dimension has the form
\[
\Psi_t = \kappa\,\Psi_{xx} + f\,, \tag{8.102}
\]
where $\kappa$ is the thermal diffusivity and the function $f(x,t)$ describes internal heat sources. The equation is supplemented by the initial condition specifying the temperature distribution at $t=0$,
\[
\Psi(x,0) = \psi(x)\,. \tag{8.103}
\]
Note that since our PDE has only the first time derivative, this condition is sufficient. Various boundary conditions can be supplied. The simplest ones correspond to certain temperatures at the end points of the interval⁴:
\[
\Psi(0,t) = \varphi_1(t) \quad\text{and}\quad \Psi(L,t) = \varphi_2(t)\,; \tag{8.104}
\]
however, other possibilities also exist. For instance, if at the $x=0$ end the heat flux is known, then the above boundary condition at $x=0$ is replaced by a condition imposed on the spatial derivative $\Psi_x(0,t)$.

⁴ It can be shown that the functions $\varphi_1$ and $\varphi_2$ do not need to satisfy the consistency conditions $\varphi_1(0) = \psi(0)$ and $\varphi_2(0) = \psi(L)$.
For simplicity, most of our analysis in the coming discussion will correspond to the 1D heat transport equation, although the theorems and methods to be considered can be generalised to the 2D and 3D cases as well.

As in the case of the wave equation, one may question whether the above conditions (8.103) and (8.104), which supplement the heat transport equation (8.102), guarantee that the solution exists and that it is unique. We shall answer the first part of that question positively later on, by showing explicitly how the solution can be constructed. Here we shall prove the second part, that under certain conditions the solution of the heat transport problem is unique. We start from the following theorem.
Theorem 8.1. If $\Psi(x,t)$ is a continuous function of both its variables for all values of $x$, i.e. for $0 \le x \le L$, and all times up to some time $T$, i.e. for $0 \le t \le T$, and $\Psi$ satisfies the PDE (without the internal sources term)
\[
\Psi_t = \kappa\,\Psi_{xx} \tag{8.105}
\]
for all internal points $0 < x < L$ and times $0 < t \le T$, then $\Psi(x,t)$ reaches its maximum and minimum values either at the initial time $t=0$ and/or at the boundaries $x=0$ and $x=L$.
Proof. Consider the first part of the theorem, stating that $\Psi$ reaches its maximum value either at the initial time and/or at the end points. We shall prove this by contradiction, assuming that $\Psi$ reaches its maximum value at some internal point $(x_0,t_0)$, where $0 < t_0 \le T$ and $0 < x_0 < L$. Let $M$ be the maximum value of $\Psi$ at the initial time ($t=0$) and at the boundaries ($x=0$ and $x=L$), and let
\[
\Psi(x_0,t_0) = M + \epsilon
\]
with some $\epsilon > 0$. Consider the auxiliary function $V(x,t) = \Psi(x,t) + k\,(t_0 - t)$, where the positive constant $k$ is chosen such that $kT < \epsilon/2$. Then for any point of the region
\[
k\,(t_0 - t) \le k\,(T - t) \le kT < \frac{\epsilon}{2}\,.
\]
Hence, if we now consider any of the end points $(x,0)$, $(0,t)$ and $(L,t)$, then for them
\[
V(x,t) \le M + \frac{\epsilon}{2}\,. \tag{8.106}
\]
At the same time, $V(x_0,t_0) = \Psi(x_0,t_0) = M + \epsilon$. Let $(x_1,t_1)$ be the point at which the continuous function $V(x,t)$ reaches its maximum value over the whole region. Then
\[
V(x_1,t_1) \ge V(x_0,t_0) = M + \epsilon = M + \frac{\epsilon}{2} + \frac{\epsilon}{2}\,.
\]
However, because at the end points we have that $V(x,t) \le M + \epsilon/2$, see Eq. (8.106), the above inequality cannot be satisfied there. Hence, the point $(x_1,t_1)$ must be an internal point satisfying $0 < x_1 < L$ and $0 < t_1 \le T$.

Since this internal point is a maximum of a function of two variables, $x$ and $t$, we should at least have (see Sects. I.5.10.1 and I.5.10.2)⁵
\[
V_{xx}(x_1,t_1) \le 0 \quad\text{and}\quad V_t(x_1,t_1) \ge 0\,,
\]
and therefore
\[
\Psi_t - \kappa\,\Psi_{xx} = V_t + k - \kappa\,V_{xx} \ge k > 0\,,
\]
⁵ The other sufficient condition (see Eq. (I.5.90)), that $V_{xx}V_{tt} - \left(V_{xt}\right)^2 > 0$, is not needed here.
which means that the PDE is not satisfied at the point $(x_1,t_1)$. We have arrived at a contradiction with the conditions of the theorem, which state that $\Psi$ does satisfy the equation at any internal point. Therefore, our assumption was wrong.

The case of the minimum, corresponding to the second part of the theorem, is proven by noting that it can be directly related to the case we have just considered (the maximum) if the function $\Psi' = -\Psi$ is considered instead. Q.E.D.
The result we have just obtained has a clear physical meaning: since internal sources are absent (Eq. (8.105) is homogeneous), heat cannot be created in the system during the heat flow, and hence the temperature cannot exceed either its initial values or its values at the boundaries.

This theorem has a number of important consequences, of which we only consider the one related to the uniqueness of the solution of the heat transport equation with general boundary conditions.
Theorem 8.2. The solution of the heat conduction problem (8.102)–(8.104), if it exists, is unique.

Proof. Indeed, assume that there are two such solutions, $\Psi_1$ and $\Psi_2$, both satisfying the same PDE and the same initial and boundary conditions. Consider then their difference, $V = \Psi_1 - \Psi_2$. It satisfies the homogeneous PDE and zero initial and boundary conditions. Since we have just shown that the solution of the homogeneous problem can reach its maximum and minimum values only at the end points (i.e. at $t=0$ or at $x=0$ or $x=L$), where $V$ is zero, it follows that $V(x,t) = 0$ everywhere. Hence, $\Psi_1 = \Psi_2$: our assumption is wrong, and there is only one possible solution to the problem. Q.E.D.
Before considering the general problem (8.102)–(8.104), we shall first discuss the simplest case of a one-dimensional problem with zero boundary conditions:
\[
\Psi_t = \kappa\,\Psi_{xx}\,; \quad \Psi(x,0) = \psi(x)\,; \quad \Psi(0,t) = \Psi(L,t) = 0\,.
\]
We shall solve this problem using the method of separation of variables already discussed above. Assuming the product solution $\Psi(x,t) = X(x)T(t)$, substituting it into the PDE and separating the variables, we obtain
\[
X\,T' = \kappa\,X''\,T \;\Longrightarrow\; \frac{1}{\kappa}\frac{T'}{T} = \frac{X''}{X}\,.
\]
Since the expressions on both sides of the equality sign depend on different variables, each of them must be a constant. Let us denote this separation constant $\lambda$. Hence, two ordinary DEs are obtained for the two functions $X(x)$ and $T(t)$:
\[
X'' = \lambda X \quad\text{and}\quad T' = \lambda\kappa\,T\,.
\]
In our discussion of the similar problem for the wave equation in Sect. 8.2.5, we arrived at something similar. Our next step there was to solve the equation for the function $X(x)$ first, to deduce the possible values of the separation constant. A peculiarity of the current problem, related to the heat transport equation, is that here it is convenient to start instead by solving the equation for the function $T(t)$.

Indeed, the solution reads $T(t) = Ce^{\lambda\kappa t}$, where $C$ is an arbitrary constant. It is clear from this result that the constant $\lambda$ cannot be positive: in the case of $\lambda > 0$ the temperature in the system would grow indefinitely with time, which contradicts physics. Therefore, we conclude immediately that $\lambda$ must be non-positive, i.e. negative or equal to zero: $\lambda \le 0$. We can then write it as $\lambda = -p^2$ with $p$ being non-negative. Hence, the solution for $T(t)$ must be
\[
T(t) = C\,e^{-p^2\kappa t}\,. \tag{8.109}
\]
Next, we consider the equation for $X(x)$, which now has the form of the harmonic oscillator equation:
\[
X'' + p^2X = 0\,,
\]
whose general solution is $X(x) = A\sin(px) + B\cos(px)$. The zero boundary conditions give $B = 0$ and $\sin(pL) = 0$, i.e. exactly as for the string we obtain the eigenvalues $p_n = \pi n/L$ and the eigenfunctions $X_n(x) = \sin(p_nx)$. Hence the elementary product solutions
\[
\Psi_n(x,t) = e^{-p_n^2\kappa t}\sin(p_nx)
\]
each satisfy the PDE and the boundary conditions. Actually, since $X_0(x) = 0$, there is no need to consider the $n=0$ function $\Psi_0(x,t)$; it can be dropped.
Our strategy from this point on should be obvious: the PDE is linear and hence, according to the superposition principle, a general linear combination of the elementary product solutions,
\[
\Psi(x,t) = \sum_{n=1}^{\infty}\alpha_n\Psi_n(x,t) = \sum_{n=1}^{\infty}\alpha_n\,e^{-p_n^2\kappa t}\sin(p_nx)\,, \tag{8.112}
\]
is constructed, which also satisfies the PDE and the boundary conditions for arbitrary constants $\{\alpha_n\}$. Note that we started the summation from $n=1$. This combination may, however, not satisfy the initial condition. So, we have to find such coefficients $\alpha_n$ which would ensure this as well. Applying the initial condition to the function above,
\[
\Psi(x,0) = \psi(x) \;\Longrightarrow\; \sum_{n=1}^{\infty}\alpha_n\sin(p_nx) = \psi(x)\,,
\]
we see that this corresponds to expanding the function $\psi(x)$ into a sine Fourier series. Therefore, the coefficients $\alpha_n$ can be derived without difficulty (see also Sect. 8.2.5):
\[
\alpha_n = \frac{2}{L}\int_0^L\psi(x)\sin(p_nx)\,dx\,. \tag{8.113}
\]
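A short sketch implementing the series solution (8.112) with coefficients (8.113); the initial profile $\psi(x)$, the diffusivity and the truncation below are illustrative assumptions.

```python
# Sketch: sine-series solution (8.112)/(8.113) of the heat equation with
# zero boundary conditions; psi(x), kappa, L are illustrative assumptions.
import numpy as np
from scipy.integrate import quad

L, kappa, N = 1.0, 0.1, 60
psi = lambda x: np.sin(np.pi*x/L) + 0.5*np.sin(3*np.pi*x/L)

def alpha(n):
    p_n = np.pi*n/L
    return (2/L)*quad(lambda x: psi(x)*np.sin(p_n*x), 0, L)[0]   # (8.113)

def Psi(x, t):
    return sum(alpha(n)*np.exp(-(np.pi*n/L)**2*kappa*t)
               *np.sin(np.pi*n*x/L) for n in range(1, N + 1))

# Each mode decays as exp(-p_n^2 kappa t); higher modes die out first:
print(Psi(0.5, 0.0), Psi(0.5, 1.0))
```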
The method considered in the previous section is easily generalised to non-zero but stationary boundary conditions:
\[
\Psi_t = \kappa\,\Psi_{xx}\,; \quad \Psi(x,0) = \psi(x)\,; \quad \Psi(0,t) = \Psi_1\,, \ \Psi(L,t) = \Psi_2\,. \tag{8.114}
\]
Here $\Psi_1$ and $\Psi_2$ are two temperatures which are maintained constant for $t > 0$ at the two ends of our 1D system. This problem corresponds to finding the temperature distribution $\Psi(x,t)$ in a rod of length $L$ whose ends are maintained at temperatures $\Psi_1$ and $\Psi_2$, and whose initial temperature distribution was $\psi(x)$.

The trick in applying the Fourier method to this problem is first to get rid of the non-zero boundary conditions by working out the stationary solution of the problem, i.e. the solution $\Psi_\infty(x)$ which would be established after a very long time (at $t \to \infty$). Obviously, one would expect this to happen as the boundary conditions are kept constant (fixed).

The stationary solution of the heat conduction equation does not depend on time, and hence, after dropping the $\Psi_t$ term in the PDE (setting $\Psi_t = 0$), in the 1D case the temperature distribution is found to satisfy the ordinary differential equation $\Psi_{xx} = 0$. This is to be expected, as the $\Psi_{xx} = 0$ equation is the 1D particular case of the Laplace equation (8.101), which we already mentioned above as the PDE describing the stationary heat conduction solution.
The solution of the equation $\Psi_{xx} = 0$ is a linear function, $\Psi_\infty(x) = Ax + B$, where $A$ and $B$ are found from the boundary conditions:
\[
B = \Psi_1 \quad\text{and}\quad AL + B = \Psi_2 \;\Longrightarrow\; A = \frac{1}{L}\left(\Psi_2 - \Psi_1\right),
\]
so that the required stationary solution reads
\[
\Psi_\infty(x) = \Psi_1 + \left(\Psi_2 - \Psi_1\right)\frac{x}{L}\,. \tag{8.115}
\]
Thus, the distribution of temperature in a rod kept at different temperatures at its two ends becomes linear at long times.
Once the stationary solution is known, we seek the solution of the whole problem (8.114) by writing $\Psi(x,t) = \Psi_\infty(x) + V(x,t)$. Since $\Psi_\infty(x)$ does not depend on time but satisfies the boundary conditions, the problem for the auxiliary function $V(x,t)$ is one with zero boundary conditions:
\[
V_t = \kappa\,V_{xx}\,; \quad V(0,t) = V(L,t) = 0\,; \quad V(x,0) = \psi(x) - \Psi_\infty(x)\,,
\]
which is solved by the method of the previous section.

As an instructive example, consider the cooling of a uniform sphere of radius $S$, initially heated to a temperature $\Psi_0$, whose surface is maintained at a constant temperature $\Psi_1$ at all times $t > 0$. For a spherically symmetric temperature distribution $\Psi(r,t)$, the heat conduction equation takes the form
\[
\Psi_t = \kappa\,\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\,\frac{\partial\Psi}{\partial r}\right).
\]
This PDE is only slightly different from the 1D heat conduction equation we have been considering up to now. The stationary solution here is simply the constant $\Psi_1$, and for the auxiliary function $V(r,t) = \Psi(r,t) - \Psi_1$ we have the zero boundary condition $V(S,t) = 0$ and the initial condition $V(r,0) = \Psi_0 - \Psi_1$. Seeking the product solution $V(r,t) = R(r)T(t)$ and substituting it into the PDE, we obtain
\[
R\,T' = \kappa\,\frac{1}{r^2}\,T\left(r^2R'\right)' \;\Longrightarrow\; \frac{1}{\kappa}\frac{T'}{T} = \frac{1}{r^2R}\left(r^2R'\right)'.
\]
r T r R
Both sides depend on their own variables and hence must be both equal to the same
non-positive constant D p2 . This ensures a time non-increasing T.t/:
1 T0 2
D p2 H) T.t/ D Cep t :
T
Consider now the equation for R.r/:
1 2 0 0 2
r R D p2 R H) R00 C R0 C p2 R D 0 :
r2 r
sin .pr/
R.r/ D ;
r
where we have dropped the arbitrary constant as it will be absorbed by other
constants later on when constructing the general solution as the linear combination
of all elementary solutions.
Use now the boundary condition:
\[
R(S) = \frac{\sin(pS)}{S} = 0 \;\Longrightarrow\; \sin(pS) = 0 \;\Longrightarrow\; p_n = \frac{\pi n}{S}\,, \quad n = 1, 2, 3, \ldots\,.
\]
We do not need to consider the zero value of $n$, as the eigenfunction $R_n(r) = r^{-1}\sin(p_nr)$ is identically zero for $n=0$.
Therefore, by constructing an appropriate linear combination, we obtain the following general solution:
\[
V(r,t) = \sum_{n=1}^{\infty}\alpha_n\,\frac{\sin(p_nr)}{r}\,e^{-p_n^2\kappa t}\,, \quad\text{with}\quad p_n = \frac{\pi n}{S}\,, \ n = 1, 2, 3, \ldots\,.
\]
Note that $n$ starts from the value of one. We also see that $V \to 0$ at long times, as expected.

The linear combination above satisfies our PDE and the boundary conditions. To force it to satisfy the initial condition as well, we write
\[
V(r,0) = \Psi_0 - \Psi_1 = \sum_{n=1}^{\infty}\alpha_n\,\frac{\sin(p_nr)}{r}\,.
\]
⁶ Recall that $\lim_{x\to0}\frac{\sin x}{x} = 1$.
Multiplying both sides by $r$, this becomes an ordinary sine Fourier series for the function $\left(\Psi_0 - \Psi_1\right)r$, from which the coefficients $\alpha_n$ are readily found; this completes the solution. At $t \to \infty$ the solution tends to $\Psi_1$, as expected.
Problem 8.30. Consider the heat flow in a bar of length $L$, described by the 1D heat transport equation $\kappa\Psi_{xx} = \Psi_t$, where $\kappa$ is the thermal diffusivity. Initially the distribution of temperature in the bar is $\Psi(x,0) = \psi(x)$, while the two ends of the bar are maintained at constant temperatures $\Psi(0,t) = \Psi_1$ and $\Psi(L,t) = \Psi_2$. Show that the temperature distribution in the bar is given by
\[
\Psi(x,t) = \Psi_1 + \left(\Psi_2 - \Psi_1\right)\frac{x}{L} + \sum_{n=1}^{\infty}\alpha_n\sin(p_nx)\,e^{-p_n^2\kappa t}\,,
\]
where $p_n = \pi n/L$ and the $\alpha_n$ are the coefficients of the sine Fourier expansion of the function $\psi(x) - \Psi_1 - \left(\Psi_2 - \Psi_1\right)x/L$.
Problem 8.31. Consider a problem similar to the previous one, but assume that at one end the temperature is fixed, $\Psi(0,t) = \Psi_1$, while there is no heat loss at the $x=L$ end, i.e. $\Psi_x(L,t) = 0$. Show that in this case the general solution of the problem is
\[
\Psi(x,t) = \Psi_1 + \sum_{n=0}^{\infty}\alpha_n\sin(p_nx)\,e^{-p_n^2\kappa t}\,,
\]
where $p_n = \frac{\pi}{L}\left(n + \frac{1}{2}\right)$ and
\[
\alpha_n = \frac{2}{L}\int_0^L\psi(x)\sin(p_nx)\,dx - \frac{2\Psi_1}{p_nL}\,.
\]
Problem 8.32. Consider the heat flow in a bar of length $L$ with the initial distribution of temperature given by $\Psi(x,0) = \Psi_0\sin\left(\frac{3\pi x}{2L}\right)$ and the boundary conditions $\Psi(0,t) = 0$ and $\Psi_x(L,t) = 0$ (cf. the previous Problem). Show that the temperature distribution in the bar is
\[
\Psi(x,t) = \Psi_0\,e^{-p_1^2\kappa t}\sin(p_1x)\,,
\]
where $p_1 = 3\pi/2L$.
There is one last specific auxiliary problem we need to study before we are ready to consider the most general boundary problem for the heat conduction equation. So, let us consider the following inhomogeneous problem with stationary boundary conditions:
\[
\Psi_t = \kappa\,\Psi_{xx} + f(x,t)\,; \quad \Psi(x,0) = \psi(x)\,; \quad \Psi(0,t) = \Psi_1 \ \text{ and } \ \Psi(L,t) = \Psi_2\,.
\]
We seek its solution in the form $\Psi(x,t) = \Psi_\infty(x) + U(x,t) + V(x,t)$, where $\Psi_\infty(x)$ is the stationary solution (8.115) taking care of the boundary conditions, the function $V(x,t)$ satisfies the inhomogeneous equation with zero initial and boundary conditions,
\[
V_t = \kappa\,V_{xx} + f(x,t)\,; \quad V(x,0) = 0\,; \quad V(0,t) = V(L,t) = 0\,, \tag{8.119}
\]
while the function $U(x,t)$ satisfies the homogeneous equation with modified initial conditions:
\[
U_t = \kappa\,U_{xx}\,; \quad U(0,t) = U(L,t) = 0\,; \quad U(x,0) = \psi(x) - \Psi_\infty(x)\,.
\]
This latter problem is solved directly using the Fourier method, as explained in the previous section. The problem (8.119) is solved using the method developed in Sect. 8.2.6.
Problem 8.33. Show that the solution of the problem (8.119) is given by the following formulae:
\[
V(x,t) = \sum_{n=1}^{\infty}V_n(t)\sin(p_nx)\,, \quad p_n = \frac{\pi n}{L}\,,
\]
where
\[
V_n(t) = \int_0^tf_n(\tau)\,e^{-\kappa p_n^2(t-\tau)}\,d\tau \quad\text{and}\quad f_n(\tau) = \frac{2}{L}\int_0^Lf(x,\tau)\sin(p_nx)\,dx\,.
\]
We are now ready to consider the most general problem for the heat conduction equation:
\[
\Psi_t = \kappa\,\Psi_{xx} + f(x,t)\,; \quad \Psi(x,0) = \psi(x)\,; \quad \Psi(0,t) = \varphi_1(t) \ \text{ and } \ \Psi(L,t) = \varphi_2(t)\,. \tag{8.121}
\]
This problem corresponds, e.g., to the following situation: a rod which initially (at $t=0$) had a temperature distribution $\psi(x)$ is subjected at its $x=0$ and $x=L$ ends to heating according to the functions $\varphi_1(t)$ and $\varphi_2(t)$. The solution is obtained using essentially the same method as the one we developed in Sect. 8.2.7 for the wave equation.
Namely, introducing the auxiliary function $U(x,t)$ of Eq. (8.79), which satisfies the boundary conditions, the function $V = \Psi - U$ is found to satisfy an inhomogeneous problem with zero boundary conditions,
\[
V_t = \kappa\,V_{xx} + \tilde{f}(x,t)\,; \quad V(x,0) = \tilde{\psi}(x)\,; \quad V(0,t) = V(L,t) = 0\,, \tag{8.122}
\]
with appropriately modified source and initial functions $\tilde{f}$ and $\tilde{\psi}$. The problem (8.122) has been solved in the previous section. In fact, there we considered a slightly more difficult problem with non-zero stationary boundary conditions; the problem (8.122) is even easier, as it has zero boundary conditions.
Sometimes boundary conditions are absent and only the initial conditions are specified. This kind of problem appears, for instance, when the region in which the solution is sought is infinite. Even in this case the Fourier method can be used, albeit with some modifications. As an example of this kind of situation, let us consider the heat transport PDE in an infinite one-dimensional region:
\[
\Psi_t = \kappa\,\Psi_{xx}\,; \quad \Psi(x,0) = \psi(x)\,, \quad -\infty < x < \infty\,. \tag{8.123}
\]
Here we assume that the function $\psi(x)$ is absolutely integrable, i.e. that
\[
\int_{-\infty}^{\infty}\left|\psi(x)\right|dx = M < \infty\,. \tag{8.124}
\]
The idea is to apply the separation of variables and build up all possible elementary product solutions satisfying the PDE, and then apply the initial conditions. The calculation, at least initially, goes along the same route as in Sect. 8.3.2. We start by writing $\Psi(x,t) = X(x)T(t)$, next substitute this trial solution into the PDE and separate the variables:
\[
\frac{1}{\kappa}\frac{T'}{T} = \frac{X''}{X} = -\xi^2\,,
\]
where $-\xi^2$ is the separation constant. Note that, as was discussed in Sect. 8.3.2, it has to be non-positive to ensure that the obtained solution has the proper physical behaviour. Solving the obtained ordinary DEs,
\[
X'' + \xi^2X = 0 \quad\text{and}\quad T' = -\xi^2\kappa\,T\,,
\]
for the functions $T(t)$ and $X(x)$, we can write the obtained elementary solution for the given value of $\xi$ as follows:
\[
\Psi_\xi(x,t) = \left[C_1(\xi)\,e^{i\xi x} + C_2(\xi)\,e^{-i\xi x}\right]e^{-\xi^2\kappa t}\,. \tag{8.125}
\]
598 8 Partial Differential Equations of Mathematical Physics
Here the subscript in ‰ shows explicitly that the function above corresponds to
the particular value of the parameter , and C1 and C2 are arbitrary constants which
may also depend on , i.e. these become some functions of .
The obtained solution (8.125) satisfies the PDE, but not yet the initial conditions.
Due to linearity of the PDE, any linear combination of such elementary solutions
must also be a solution. However, we do not have any boundary conditions in place
to help us to choose the permissible values of . Hence, we have to admit that can
take any positive real values, 0 < 1, and hence the linear combination (the
sum) turns into an integral:
Z 1 Z 1
2
‰ .x; t/ D ‰ .x; t/ d D C1 ./eix C C2 ./eix e t d :
0 0
Because both positive and negative exponents are present, e˙ix , there is no need
to keep both terms in the square brackets. Instead, we simply take one of the terms
and extend the integration over the whole axis, i.e. the expression above can be
rewritten simply as:
Z 1
2
‰ .x; t/ D C./eix e t d ; (8.126)
1
The obtained expression is nothing but an expansion of the function .x/ into the
Fourier integral. Therefore, the function C ./ can be found from this equation
simply by writing the inverse Fourier transform:
Z 1
1
C ./ D .x/ eix dx :
2 1
The integral over is calculated explicitly, see Eq. (2.56), and we obtain
Z 1
‰ .x; t/ D dx1 G .x x1 ; t/ .x1 / ; (8.129)
1
where
1 2 =4t
G .x; t/ D p ex :
4 t
This is the final result. We have already met a 3D analogue of the function, which
appeared in the square brackets of Eq. (5.71), in Sect. 5.3.3. There we solved, using
the Fourier transform (FT) method, the heat conduction equation in an infinite 3D
space. The reader can appreciate that the separation of variables method developed
above brings us to exactly the same result as applying the FT method directly. In
fact, this should not come as a complete surprise as both methods are very close in
spirit; moreover, we benefited directly from the FT in the method above.
The function G, called the Green’s function of the 1D heat conduction equation,
has a very simple physical meaning: it corresponds to the solution of the heat
conduction equation for the initial conditions ‰ .x; 0/ D ı .x/ (this follows directly
from Eq. (8.129)), when the temperature at the point x D 0 was infinite while the
temperature at all other points was zero, i.e. the initial temperature distribution
was at t D 0 an infinite spike at x D 0. Then the function G.x; t/ describes the
distribution of the temperature in our 1D system in time: it is easily seen that
the spike gets smoothed out in both directions x ! ˙1 with time, approaching
the uniform ‰ .x; t/ D 0 distribution in a long time limit as G.x; t/ t1=2 .
The method of separation of variables can also be very useful in solving the Laplace
equation with specific boundary conditions. We discussed in detail the solution of
the Laplace equation in spherical coordinates in Sect. 4.5.3. Here we shall briefly
illustrate the application of the Fourier method to the Laplace problem on a number
of other simple examples.
Consider an infinite cylinder of radius S, the surface of which is kept at a constant
temperature .'/ which depends on the polar angle '. We are interested in the
stationary distribution of the temperature in the cylinder. This problem is most easily
600 8 Partial Differential Equations of Mathematical Physics
formulated in polar coordinates assuming that the cylinder axis coincides with the z
axis (the coordinate z can be ignored altogether as the cylinder is of infinite length):
1 @ @‰ 1 @2 ‰
r C 2 D0; ‰ .S; '/ D .'/ ; (8.130)
r @r @r r @' 2
where in the left-hand side the .r; '/ part of Laplacian in the cylinder coordinates,
Eq. (7.88), has been written.
We start by trying a product solution, ‰ .r; '/ D R.r/ˆ.'/. Substituting it into
the PDE and separating the variables, we find
0 R 0 0 R ˆ00
ˆ rR0 C ˆ00 D 0 H) rR C D0:
r r ˆ
The expression inside the curly brackets depends only on the angle ', hence, it must
be a constant . Therefore, two ordinary DEs for the functions R.r/ and ˆ.'/ are
obtained:
We now have to discuss how the values of the separation constant are to be chosen.
It is easy to see that D m2 , where m is any integer including zero, otherwise the
function ˆ.'/ would not be periodic with the period of 2 (we have already come
across this situation before in Sect. 4.5.3 where this point was thoroughly discussed).
Two cases are to be considered: m D 0 and m ¤ 0. In the latter case, the solution
for the function ˆ is indeed a periodic function,
while the solution of the equation for R.r/ is sought using the trial function
R.r/ D rn , where n is a constant to be determined. Substituting this into the DE
for R, we get
Therefore, the solution for the radial function must be the function
D
R.r/ D Crjmj C ;
rjmj
with C and D being some arbitrary constants. The solution must be finite at r D 0,
and hence the term with the constant D must be omitted, and hence the product
solution in the case of m ¤ 0 is
where Am and Bm are arbitrary constants. Here formally m D ˙1; ˙2; : : :; however,
negative values of m result in basically the same solutions as the positive ones, so
that we can limit ourselves with the positive values of the integer m only.
In the case of m D 0 (and hence D 0) the DE for the function R reads
0 0
r2 R00 CrR0 D0 H) rR D0 H) rR0 DC0 H) R.r/DC0 ln rCD0 ;
with C0 and D0 being arbitrary constants, while the DE for ˆ is simply ˆ00 D 0
yielding a linear function, ˆ .'/ D A00 C B00 '. As we should have periodicity
with respect to the angle, the constant B00 must be equal to zero. Also, since the
solution must be finite at r D 0, the constant C0 in the solution for R.r/ must also
be zero as the logarithm has a singularity at r D 0. Hence, for m D 0 our product
solution is simply a constant. Taking a general linear combination of all solutions
and absorbing the expansion coefficients in our arbitrary constants, we can write the
general solution of the PDE as
1
A0 X m
‰ .r; '/ D C r ŒAm cos .m'/ C Bm sin .m'/ : (8.133)
2 mD1
Here the constant term (which comes from the m D 0 contribution discussed
above) was conveniently written as A0 =2. The reason for that is that when applying
the boundary conditions at r D R,
1
A0 X m
C R ŒAm cos .m'/ C Bm sin .m'/ D .'/ ; (8.134)
2 mD1
we arrive exactly at the expansion of the function .'/ into the Fourier series with
the period of 2 (see Sect. 3.1 and specifically Eq. (3.7) with l D ), and hence
the explicit expressions for the unknown coefficients Am (m D 0; 1; 2; : : :) and Bm
(m D 1; 2; : : :) are obtained directly from Eqs. (3.9) and (3.10), respectively:
Z 2
1
Am D .'/ cos .m'/ d' ; (8.135)
Rm 0
Z 2
1
Bm D .'/ sin .m'/ d' : (8.136)
Rm 0
The obtained expressions fully solve the problem: the solution is given by the
Fourier expansion (8.133) with the expansion coefficients (8.135) and (8.136).
602 8 Partial Differential Equations of Mathematical Physics
(ii) Applying the boundary conditions, show that the coefficients above are
˛0 ˇ0
C0 D ; D0 D ˛0 C0 ln a ;
ln .a=b/
˛m bm ˇm am ˛m bm ˇm am
Am D ; Bm D ; m D ˙1; ˙2; : : : ;
.a=b/m .b=a/m .a=b/m .b=a/m
satisfies the corresponding PDE and the boundary conditions at the sides
y D 0 and y D L2 .
(ii) Show that the coefficients An and Bn satisfy the following equation:
Z
2 L2
An epn x C Bn epn x D ‰1 .x; y/ sin .pn y/ dy :
L2 0
(iii) Then, by considering the boundary conditions at the other two sides,
find the coefficients An and Bn , and hence show that the distribution of
temperature is given by:
1
X
4 .T0 T1 / 1
‰ .x; y/ D T1 C Œepn x C n epn x sin .pn y/ ;
n .1 C n /
nD1 .odd/
where the summation is run only over odd values of n and n D epn L1 .
Integral transforms (such as Fourier and Laplace) are also frequently used to solve
the PDEs. An example of an application of the Fourier transform method for solving
d’Alembert’s PDE (which is an inhomogeneous hyperbolic PDE) was considered in
Sect. 5.3.2 and of the heat conduction equation in Sect. 5.3.3. Here we shall show
how the Laplace transform (LT) method can also be used in solving PDEs in a
general case of arbitrary initial and boundary conditions. Of course, the success
depends very much on whether the inverse LT can be performed; however, in many
cases exact analytical solutions in the form of integrals and infinite series can be
obtained.
We shall illustrate this method by solving a rather general one-dimensional heat
conduction problem described by the following equations:
1@ @2
D ; .x; 0/ D .x/ ; .0C; t/ D '1 .t/ ; .L; t/ D '2 .t/ ;
@t @x2
(8.137)
where 0 x L, and the notations 0C and L in the boundary conditions mean
that these are obtained by taking the limits x ! 0 and x ! L from the right and
left sides, respectively, in the function .x; t/. We need to calculate the distribution
604 8 Partial Differential Equations of Mathematical Physics
.x; t/ along the rod at any time t > 0. An important variant of this problem may
be of a semi-infinite rod; this can be obtained by taking the limit of L ! 1.
First, we shall consider the case of the rod of a finite length L. Performing the LT
of the above equations with respect to time only yields
1 @2 ‰.x; p/
Œp‰.x; p/ .x/ D ; ‰ .0C; p/ D ˆ1 .p/ ; ‰ .L; p/ D ˆ2 .p/ ;
@x2
(8.138)
where p is a number in the complex plane, ‰ .x; p/ D L Œ .x; t/ is the LT of the
unknown temperature distribution, while ˆ1 .p/ and ˆ2 .p/ are LTs of the boundary
functions '1 .t/ and '2 .t/, respectively.
is
Z Z
1 x
x1 x 1 x
x1
y .x/ D C1 C f .x1 / e dx1 e C C2 f .x1 / e dx1 ex ;
2 0 2 0
(8.140)
where C1 and C2 are arbitrary constants [cf. Problem 8.22].
When applying the result of this problem to Eq. (8.138), we note y.x/ ! ‰ .x; p/,
2 D p= and f .x/ D .x/=. To calculate the two constants (which of course
are constants only with respect to x; they may and will depend on p), one has to
apply the boundary conditions stated in Eq. (8.138), which gives two equations for
calculating them:
C1 C C2 D ˆ1 ; (8.141)
Z L
1 x1
C1 C f .x1 / e dx1 eL
2 0
Z L
1
C C2 f .x1 / e dx1 eL D ˆ2 :
x1
(8.142)
2 0
(continued)
8.6 Method of Integral Transforms 605
p
where the dependence on p comes from ˆ1 , ˆ2 and D p=.
Problem 8.40. Then, substituting the expressions for the “constants” C1 and
C2 found in the previous problem into the solution (8.140) for y.x/ ! ‰ .x; p/,
RL Rx RL
show by splitting the integral 0 D 0 C x , that the solution of the boundary
problem (8.138) can be written as follows:
Problem 8.41. Finally, using the definition of the sinh function, simplify the
expression in the square brackets to obtain the final solution of the boundary
problem:
Z
sinh . .L x// sinh .x/ 1 L
‰ .x; p/ D ˆ1 .p/C ˆ2 .p/C ./ K .x; p; / d ;
sinh .L/ sinh .L/ 0
(8.145)
where
1 sinh ./ sinh . .L x// ; 0 < < x
K .x; p; / D :
sinh .L/ sinh .x/ sinh . .L // ; x < < L
(8.146)
This is the required solution of the full initial and boundary problem (8.138) in
the Laplace space. In order to obtain the final solution .x; t/, we have to perform
the inverse LT. This can only be done in terms of an expansion into an infinite series.
Four functions need to be considered which appear in Eqs. (8.145) and (8.146).
The coefficient to the function ˆ1 .p/ is the function
Lx p p p
sinh p p
sinh . .L x// e.Lx/ p=
e.Lx/ p=
Y1 .p/ D D p D p p
sinh .L/ sinh pL p eL p= eL p=
p p p p
1 e.Lx/ p= e.Lx/ p= ex p=
e.2Lx/ p=
D p p D p :
eL p= 1 e2L p= 1 e2L p=
606 8 Partial Differential Equations of Mathematical Physics
h p p iX
1 p
Y1 .p/ D ex p= e.2Lx/ p= e2Ln p=
nD0
1 h
X p p i
D e.2LnCx/ p=
e.2L.nC1/x/ p=
: (8.147)
nD0
p
Note that each exponential term is of the form e˛ p with some positive ˛ > 0.
This ensures that the inverse LT of each exponent in the series exists and is given by
Eq. (6.28). Denoting it by .˛; t/, we can write7
1
X
2Ln C x 2L .n C 1/ x
L1 ŒY1 .p/ D y1 .x; t/ D p ;t p ;t :
nD0
(8.148)
Problem 8.42. Using a similar method, show that the other three functions
appearing in Eqs. (8.145) and (8.146) can be similarly expressed as infinite
series:
7
Here we accepted without proof that the operation of LT can be applied to the infinite series term-
by-term. This can be shown to be true if the series in the Laplace space converges, and is true in
our case as we have a geometric progression.
8.6 Method of Integral Transforms 607
whereas for x L
sinh .x/ sinh . .L //
K.x; t; / D L1 ŒK .x; p; / D L1
sinh .L/
p X 1
2Ln C x 2Ln C C x
D p ;t p ;t
2 nD0
2L.nC1/x 2L.nC1/Cx
p ; t C p ;t ; (8.153)
p p
where .˛; t/ is the inverse LT of the function e˛ p = p which is given by
Eq. (6.29). In fact, it can be seen that Eq. (8.153) can be obtained from (8.152)
by permuting x and .
The obtained formulae allow us finally to construct the exact solution of the
heat conductance problem (8.137). Using the above notations and applying the
convolution theorem for the LT (Sect. 6.3.4), we obtain from Eq. (8.145):
Z Z
t
1 L
.x; t/ D Œ'1 .t / y1 .x; / C'2 .t / y2 .x; / dC ./K .x; t; / d :
0 0
(8.154)
This is the required general result. It follows that a general problem of arbitrary
initial and boundary conditions can indeed be solved analytically, although the final
result is expressed via infinite series and convolution integrals.
Consider now a semi-infinite rod, L ! 1, in which case the above expression
is drastically simplified. Indeed, in this case the x D L boundary condition should
be replaced by .1; t/ D 0. Correspondingly, in the limit the expressions for the
functions (8.147), (8.149) and (8.150) simplify to:
p
Y1 .p/ D ex p=
; Y2 .p/ D 0 ;
608 8 Partial Differential Equations of Mathematical Physics
8
r < exp xpp= sinh pp= ; 0 < < x
K.x; p; / D p p :
p : exp p= sinh x p= ; x < < 1
For instance, in the expression for Y1 .p/ only one exponential function with n D 0
survives, all others tend to zero in the L ! 1 limit. Similarly, one can establish the
expressions given above for the other functions.
Problem 8.43. Show that the final solution for a semi-infinite rod is
Z Z
t
x 1 1
.x; t/ D '1 .t / p ; d C ./K1 .x; t; / d ;
0 0
(8.155)
where
p
x xC
K1 .x; t; / D p ;t p ;t
2
for x < 1. Note that the solution does not contain the term associated
with the zero boundary condition at the infinite end of the rod as y2 .x; t/ D 0
exactly in the L ! 1 limit.
Problem 8.44. Consider a semi-infinite rod 0 x 1 which initially had
a uniform temperature T0 . At t D 0 the
p x D 0 end of the rod is subjected to
heating according to the law '1 .t/ D t. Show that at t > 0 the distribution of
the temperature in the rod is given by the following formula:
Z t
x p x
.x; t/ D p ; t d C T0 1 erfc p :
0 4t
Often in many problems which are encountered in practice, one needs to find a
minimum (or a maximum) of a function f .x/ with respect to its variable x, i.e., the
point x0 where the value of the function is the smallest (largest) within a certain
interval a x b. The necessary condition for this to happen is for the point x0
to satisfy the equation f 0 .x0 / D 0. However, sometimes one may need to solve a
much more difficult problem of finding the whole function, e.g. f .x/, which results
in a minimum (or a maximum) of some scalar quantity L which directly depends
on it. We shall call this type of dependence a functional dependence and denote
it as L Œf using square brackets. Taking some particular function f D f1 .x/ one
gets a particular numerical value L1 of this functional, while by taking a different
function f D f2 .x/ one gets a different value L2 . Therefore, it seems that indeed,
by taking all possible choices of functions f .x/, various values of the quantity L,
called functional of the function f , are obtained, and it is perfectly legitimate to ask
a question of whether it is possible to find the optimum function f D f0 .x/ that yields
the minimum (maximum) value of that scalar quantity (functional) L.
To understand the concept better, let us consider a simple example. Suppose, we
would like to prove that the shortest line connecting two points in the 2D space is the
straight line. The length of a line between two points .x1 ; y1 / and .x2 ; y2 / specified
by the equation y D y.x/ on the x y plane is given by the integral (Sect. I.4.6.11 )
Z x2 q
LD 1 C .y0 /2 dx: (9.1)
x1
1
In the following, references to the first volume of this course (L. Kantorovich, Mathematics for
natural scientists: fundamentals and basics, Springer, 2015) will be made by appending the Roman
number I in front of the reference, e.g., Sect. I.1.8 or Eq. (I.5.18) refer to Sect. 1.8 and Eq. (5.18) of
the first volume, respectively.
Here the function we seek passes through the two points, i.e., it must satisfy the
boundary conditions y .x1 / D y1 and y .x2 / D y2 . We see that L directly depends on
the function y.x/ chosen: by taking different functions y.x/ passing through the same
two points, different values of L are obtained, i.e., the length of the line between the
two points depends directly on how we connect them. In this example the length L
depends on the function y.x/ via its derivative only. The question we are asking
is this: prove that the functional dependence y.x/ which yields the shortest line
between the two points is the straight line. In other words, we need to minimise
the functional L Œy.x/ with respect to the form (shape) of the function y.x/, derive
the corresponding differential equation which would correspond to this condition,
and then solve it. We expect that solution is a straight line y D ˛x C ˇ with the
appropriate values of the two constants corresponding to the line fixed end points.
Similarly, one may consider a more complex 3D problem of finding the shortest
line connecting two points A and B when the line lies completely on a surface given
by the equation G .x; y; z/ D 0. In this case we need to seek the minimum of the
functional
Z b q
L Œx; y; z D .x0 /2 C .y0 /2 C .z0 /2 dt (9.2)
a
with respect to the three functions x.t/, y.t/ and z.t/, which give us the desired line
in the parametric form via t. If we disregard the constraint G .x; y; z/ D 0, then of
course the straight line connecting the points A and B would be the answer. However,
this may not be the case if we require that the line is to lie on the surface, e.g., on
the surface of a sphere. This kind of variational problems is called problems with
constraints. In that case the solution will crucially depend on the constraint specified.
It is the purpose of this Chapter to discuss how these problems can be solved in
a number of most frequently encountered cases. But before we proceed, we have
to define a linear functional. A functional LŒf .x/ is linear if for any two functions
.x/ and '.x/, one can write
is linear with respect to the function f .x/ for D 0 or D 1, while it is not linear
for any other values of .
9.1 Functions of a Single Variable 611
L D L Œf1 L Œf D L Œf C ıf L Œf :
D max jıf j :
a<x<b
L D L1 Œf ; ıf C L2 Œf ; ıf ;
where the functional L1 is linear with respect to ıf , while L2 is not. In fact, L1
tends to zero linearly with ! 0, while the functional L2 tends to zero faster
than . We can say that L2 = ! 0 as ! 0.2 The first part of the difference,
L1 , is called the variation of the functional due to change ıf .x/ of the function,
and is usually denoted ıL.
2
This is very similar to the differential of a function: y D y .x C x/ y.x/ is given by a sum
of two terms, one being linear in x, the other depends on higher powers of x, i.e., tends to zero
much faster than x itself (see Sect. I.3.1).
612 9 Calculus of Variations
In this case,
Z n Z
b o b 2
L D 2 .f C ıf /2 C x3 f 0 C ıf 0 dx 2f C x3 f 0 dx D L1 C L2 ;
a a
where
Z b Z b
L1 D 4f ıf C x3 ıf 0 dx and L2 D 2 .ıf /2 dx:
a a
2
One can see that L1 behaves like , while L2 as . Indeed, since jıf j , then
the first term in L1 can be estimated as
ˇZ ˇ Z Z
ˇ b ˇ b b
ˇ Œ4f ıf dxˇˇ 4 jf j jıf j dx 4 jf j dx ;
ˇ
a a a
and hence the module of this term is also proportional to . Note that we have used
here that ıf 0 D .ıf /0 and that ıf D 0 at the end points of our interval as the two
functions are equal there, Fig. 9.1. Let us now estimate L2 :
ˇZ ˇ Z
ˇ b ˇ b
ˇ 2 .ıf /2 dxˇˇ 2 jıf j2 dx 2 2
.b a/ ;
ˇ
a a
L .˛/ D L Œf C ˛ıf :
Here ˛ıf serves as a variation of f .x/ and L.˛/ becomes a function of a single
variable ˛. By taking small values of ˛ one can make the deviation ˛ıf from f to be
arbitrarily small for all values of x within the interval a < x < b. Then, the division
9.1 Functions of a Single Variable 613
of L into two terms (one being a linear functional and the other of higher order in
terms of the deviation) can be made similarly to the way this is done for an ordinary
function of a single variable as L.˛/ is indeed such a function of ˛. Expanding L.˛/
into the Taylor series around ˛ D 0, one gets
1
L D L .˛/ L.0/ D L0 .0/˛ C L00 .0/˛ 2 C : (9.4)
2
where we were able to take ˛ out in the L1 term since it is a linear functional
with respect to its second argument (the variation of the function). Comparing
Eqs. (9.4) and (9.5), we conclude that L0 .0/ must be equal to L1 Œf ; ıf , while
the following terms in the expansion (9.4), which behave at least as ˛ 2 , should
correspond to L2 Œf ; ˛ıf . Indeed, if 0 D maxa<x<b jıf j > 0 is the maximum
value of the function ıf .x/ within our interval, then D maxa<x<b j˛ıf j D j˛j 0 is
the maximum deviation of f C ˛ıf from f , and hence the limit ! 0 is equivalent
to taking the ˛ ! 0 limit. It is clear then that
ˇ ˇ ˇ ˇ
jL2 Œf ; ˛ıf j ˇ L2 Œf ; ˛ıf ˇˇ 1 ˇˇ L2 Œf ; ˛ıf ˇˇ
ˇ
D ˇ lim
lim
!0 ˛!0 ˛ ˇD lim
ˇ˛!0 ˛ ˇ D 0;
0 0
As an illustration of this method, let us apply this simple result again to the
functional (9.3):
Z h
b i
L .˛/ D 2 .f C ˛ıf /2 C x3 f 0 C ˛ıf 0 dx:
a
and that the corresponding L2 term behaves in such a way that
L2 Œf ; ˛ıf =˛ ! 0 as ˛ ! 0.
Next we shall consider the simplest case of the functional L Œf .x/ given by an
integral
Z b
L Œf D F x; f ; f 0 dx: (9.7)
a
0
Here F .x; f ; f / is some function which we assume is continuous with respect to
its three variables and has the necessary continuous derivatives with respect to any
of them. Note that F depends on x and, at the same time, is a functional of f .x/,
depending on the function itself and its first derivative. Let us calculate the variation
ıL of this functional:
Z b
L .˛/ D F x; f C ˛ıf ; f 0 C ˛ıf 0 dx
a
Z Z
ˇ b
dF b
@F @F
H) L0 .˛/ˇ˛D0 D dx D ıf C 0 ıf 0 dx:
a d˛ ˛D0 a @f @f
There are two contributions, coming from the second and the third arguments of the
function F. The contribution to the integral due to the second term in the integrand
above (it is related to the variation of the derivative ıf 0 ) we shall take by parts:
Z Z ˇ Z b
b
@F 0 @Fb
0 @F ˇˇb d @F
ıf dx D .ıf / dx D ıf ıf dx
a @f 0 a @f
0 @f 0 ˇa a dx @f 0
Z b
d @F
D ıf dx;
a dx @f 0
where the free (first) term which appeared after the integration by parts is zero since
ıf .a/ D ıf .b/ D 0. An important point here is that the x-derivative is the total
derivative which takes account of the complete dependence of F on x including
those of f and f 0 . This is because we have taken the integral over x by parts treating
F1 .x/ D @F=@f 0 as a function of x only, i.e., including its dependence on x via f and
f 0 as well. Hence, we obtain
Z b Z b
ˇ @F @F 0 @F d @F
0 ˇ
ıL D L .˛/ ˛D0 D ıf C 0 ıf dx D ıf dx:
a @f @f a @f dx @f 0
(9.8)
9.1 Functions of a Single Variable 615
Next, we should consider the necessary condition for the functional LŒf to be
minimum (maximum). Let us assume that the function f0 .x/ gives the maximum to
the functional. This means that for any function f .x/ such that jf .x/ f0 .x/j <
( is an arbitrary positive number) it follows that LŒf < L Œf0 . This is to be satisfied
for any x within the interval a x b. The minimum of the functional is defined
in a similar manner.
What we are going to do now is to employ the definition given above to establish
the necessary condition for the maximum. Choose f D f0 C ˛ıf . Then,
so that one can always choose a sufficiently small ˛ such that j˛j 0 < , where 0 is
the largest fluctuation of ıf . But if f0 gives the maximum, then L Œf0 C ˛ıf < L Œf0 ,
which means that L .˛/ < L.0/ for any ˛ satisfying the above condition. Therefore,
the value of ˛ D 0 corresponds to the maximum of the function L.˛/ for which
the necessary condition is obviously L0 .0/ D 0. Hence, this must be the necessary
condition we are looking for:
ˇ
L0 .0/ D L0 .˛/ˇ˛D0 D ıL D 0: (9.9)
Theorem 9.1. Let .x/ be any function which is zero at the ends of the interval,
.a/ D .b/ D 0, and is continuous together with its first derivative within
the whole interval. Consider a continuous function f .x/. If for any such .x/ the
integral
Z b
f .x/.x/dx D 0;
a
then f .x/ D 0.
Proof. We shall prove this theorem by contradiction. Assume that there is a point
x0 within our interval a < x < b such that f .x0 / > 0. Since f .x/ is assumed to be
continuous, this means that there must be a vicinity of the point x0 of some width 2
where the function f .x/ is also positive (see Theorem I.2.16), i.e., f .x/ > 0 for any
x from some interval < x < C , where ˙ D x0 ˙ . Next, we can construct a
particular function .x/ in such a way that
2 2
.x/ D .x C/ .x / (9.10)
within that interval and zero otherwise. The value of can be chosen small enough
so that the points ˙ both lie inside the original interval a < x < b. The function
.x/ defined in this way is everywhere continuous and is equal to zero at the points a
and b. Its first derivative is also continuous (and is equal to zero at the points x D ˙
and beyond the interval < x < C ). At the same time, the integral
616 9 Calculus of Variations
Z b Z C
Z C
2 2
f .x/.x/dx D f .x/.x/dx D f .x/ .x C/ .x / dx > 0;
a
since f .x/ > 0 within the chosen integration limits. This contradicts the condition of
the theorem, and hence our assumption was wrong. Note that the integration limits
in the integral above were changed to ˙ because the function D 0 outside it.
Q.E.D.
This simple theorem allows us to find the DE for the function f .x/ which delivers
a minimum (or a maximum) to the functional (9.7). Indeed, the variation of this
functional is given by Eq. (9.8) containing the arbitrary variation ıf .x/ under the
integral. Since the necessary condition for the extrema is that the variation of
the functional to be zero, the integral (9.8) must be equal to zero for arbitrary
variation ıf . Using the proven theorem, this means that the expression in the square
brackets in Eq. (9.8) must necessarily be zero, i.e.,
@F d @F
D 0: (9.11)
@f dx @f 0
This is called Euler’s equation. This DE is to be solved subject to the boundary con-
ditions at the ends of the interval: f .a/ D A and f .b/ D B. In practical calculations
it is sometimes useful to write the second term in the left-hand side, containing the
total derivative with respect to x, explicitly. Indeed, the partial derivative of F with
respect to f 0 depends on x, f and f 0 , and hence
d @F @2 F @2 F 0 @2 F
D C f C 0 0 f 00 D Fxf 0 C Fff 0 f 0 C Ff 0 f 0 f 00 :
dx @f 0 @x@f 0 @f @f 0 @f @f
It becomes clear now that this is a second order DE with respect to the function
f .x/, and hence its solution f .x; C1 ; C2 / will depend on two arbitrary constants C1
and C2 which are to be chosen to satisfy the boundary conditions f .a; C1 ; C2 / D A
and f .b; C1 ; C2 / D B. Three cases are possible: (1) there are no solutions and hence
the functional (9.7) has no extrema; (2) there is only one solution and hence only
one extremum and, finally, (3) there are several solutions, i.e., several extrema exist.
It is instructive at this point to consider several particular cases of the function
F .x; f ; f 0 / serving as the integrand in the functional (9.7).
Case 1 If F D F.x; f /, i.e., this function does not depend on f 0 , then the Euler’s
equation is simply Ff D 0. This is an algebraic equation with respect to the function
f , and hence the solution does not contain any arbitrary constants. This means that
it may not satisfy the boundary conditions.
9.1 Functions of a Single Variable 617
dU @U @U 0
D C f D g C hf 0 ;
dx @x @f
which coincides exactly with the function F in Eq. (9.13). Therefore, the functional
containing it,
Z Z
b b
dU
L Œf D g C hf 0 dx D dx D U .b; f .b// U .a; f .a// ;
a a dx
is simply a constant, it only depends on the points at the ends of the interval! So, no
solution in this case is possible at all.
Case 3 Consider now the function F being independent of x and f , i.e., it only
depends on the derivative f 0 , i.e., F D F .f 0 /. The Euler’s equation in this case is
d @F @F
D0 H) D C1 ;
dx @f 0 @f 0
Case 4 Finally, let F D F .f ; f 0 /, i.e., the function F does not explicitly depend on
the variable x. In this case the Euler’s equation (9.12) reads
Ff Fff 0 f 0 Ff 0 f 0 f 00 D 0: (9.15)
This is the second order DE which can be integrated once, that is, it can be
transformed into a first order DE with respect to the function f .x/. To this end,
multiply both sides of Eq. (9.15) by f 0 and add and subtract Ff 0 f 00 :
f 0 Ff Fff 0 f 0 Ff 0 f 0 f 00 C Ff 0 f 00 Ff 0 f 00 D 0;
The first term within the round brackets is simply the total derivative dF dx
. At the
same time, the second term in the square brackets is also the total derivative of f 0 @f@F0
with respect to x (recall, that F only explicitly depends on f and f 0 ):
2
d 0 @F 00 @F 0 @F 0 @2 F 00
f 0 Df Cf f C 0 0f ;
dx @f @f 0 @f @f 0 @f @f
which is the same as in the square brackets in Eq. (9.16). Therefore, Eq. (9.16) can
be written as
d 0 @F @F
F f 0 D 0 H) F f 0 0 D C1 : (9.17)
dx @f @f
This is the final result: the obtained DE is of the first order, and C1 is an arbitrary
constant. Solving the obtained DE gives another constant, C2 , and this is enough for
the two boundary conditions to be satisfied.
We shall illustrate this formula by solving a classical geodesic problem of finding
the shortest line lying on a sphere of radius R and centred at the origin which
connects two points A .R; A ; A / and B .R; B ; B / (specified using the spherical
coordinates) on that sphere. A square of the element of the line length, .ds/2 , lying
on the sphere is given by Eq. (7.35), so that the required element of the arc length is
q
ds D R .d/2 C sin2 .d /2 :
and hence the line length is given by integrating ds along the line:
s
Z 2
B
d
LDR C sin2 d :
A
d
in the above functional does not depend on the variable at all, so that the variation
of the functional with respect to the unknown function . / results in Eq. (9.17)
with that F. Performing the required differentiation @F=@ 0 , we obtain from (9.17):
q
. 0 /2 C1 sin2 1
. 0 /2 C sin2 q D H) q D ;
R
. 0 /2 C sin2 . 0 /2 C sin2
with D R=C1 . The obtained expression can be solved easily with respect to 0
leading to an ordinary first order DE in which the variables are separated
p Z
0 d
D 2 sin4 sin2 H) p D C C2 :
sin sin2
2 4
cot 1 1 d
tD p ; dt D p 1 C cot2 d D p 2
;
2 1 2 1 2 1 sin
p r
1 p
2
4 2
2 sin sin D sin 2 2
D sin2 2 .1 C cot2 /
sin
p p
D sin2 2 1 1 t2 :
We obtain
Z
dt
p D C C2 H) arcsin t D C2 ;
1 t2
which leads to
cot
sin .C2 / D t H) sin .C2 C / D p ;
2 1
620 9 Calculus of Variations
or, writing the sine in the left-hand side via the cosine and sine of the angles C2 and
using the well-known trigonometric identity, and splitting up the cotangent,
cot
sin C2 cos C cos C2 sin Dp
2 1
.R cos /
H) sin C2 .R sin cos / C cos C2 .R sin sin / D p :
2 1
We recognise in the expressions in the round brackets the coordinates x D
R sin cos , y D R sin sin and z D R cos of the line on the surface of the
sphere. Therefore, we obtain
˛x C ˇy C z D 0;
p
where ˛ D sin C2 and ˇ D cos C2 are two related constants, while D 1= 2 1
is another independent constant, so we have two independent constants in total.
We have obtained an equation for a plane passing through the centre of the
coordinate system and the two points A and B (which determine the two constants
C2 and ). Hence, the shortest line connecting two points on a sphere lies on the
intersection of the sphere and this plane.
(continued)
622 9 Calculus of Variations
mv 2 p
mgy D H) v.x/ D 2gy.x/;
2
where mq
is the mass of the object and g the gravitational constant (on Earth).
If ds D 1 C .y0 /2 dx is the elementary length of the slide around point x, the
time required to cross it is ds=v.x/. The total travel time to slide from the top of
the slide to its bottom is given by the integral
s
Z xB Z xB
ds 1 1 C .y0 /2
LD Dp dx:
0 v 2g 0 y
Show that the optimum curve of the p slide (i.e. corresponding to the shortest
sliding time) satisfies the DE y0 D .2C1 y/ =y, where C1 is a constant.
Then, integrate this DE over y making the substitution
(continued)
9.1 Functions of a Single Variable 623
to obtain
x D C1 .t sin t/ C C2 :
To understand what we have just obtained, note that the variable t can be used
as a parameter in specifying the function y D y.x/ as x D x.t/ and y D y.t/.
The first of the boundary conditions x D y D 0 is satisfied with t D 0 and C2 D
0. Demonstrate by plotting f1 .t/ D 1= .1 cos t/ and f2 .t/ D 1= .t sin t/ for
0 t 2 that they intersect only at a single point ensuring that it is always
possible to determine the other constant C1 from the coordinates of the point B.
The curve we obtained is drawn in the x y plane by a fixed point on a wheel
rolling in the x direction; it is called cycloid.
Let us now consider a more general case of a functional depending on more than
one function:
Z b
L Œf1 ; f2 ; : : : ; fn D F x; f1; : : : fn ; f10 ; : : : ; fn0 dx: (9.18)
a
In this case one may consider a partial variation of the functional with respect to the
function fi for a specific i between 1 and n. We then introduce a function
in which only one function, fi .x/, is varied. Then the partial variation of L with
respect to fi .x/ is defined as ıLi D Li0 .0/, and the necessary condition for L to be
minimum (maximum) will be
As an example, let us consider again the problem of the shortest line connecting 2
points, A .xA ; yA ; zA / and B .xB ; yB ; zB /. Suppose, the line is specified parametrically
as x D x.t/, y D y.t/ and z D z.t/. Then the line length is given by the line integral
Z tB q
LD .x0 /2 C .y0 /2 C .z0 /2 dt:
tA
@F y0 1
D q D ;
@y0 C
.x0 /2 C .y0 /2 C .z0 /2 2
@F z0 1
Dq D ;
@z0 C3
.x0 /2 C .y0 /2 C .z0 /2
where C1 , C2 and C3 are three independent constants. Square both sides of each
equation and rearrange. We get
2 2 2
.1 C1 / x0 C y0 C z0 D 0;
0 2 2 2
x C .1 C2 / y0 C z0 D 0;
0 2 0 2 2
x C y C .1 C3 / z0 D 0:
This is a system of three linear algebraic equations with respect to the three squares
of the derivatives which has a non-trivial solution only if its determinant is equal to
zero. This, however, would imply some special relationship between the arbitrary
constants which is unreasonable. Therefore, we can accept the trivial solution that
.x0 /2 D .y0 /2 D .z0 /2 D 0. This means that all three functions we are seeking, x.t/,
y.t/ and z.t/, are linear functions of t. This corresponds to a straight line connecting
the two points as anticipated; the six constants for the three linear functions, x D
˛1 t C ˇ1 , y D ˛2 t C ˇ2 and z D ˛3 t C ˇ3 , are determined immediately from the six
coordinates of the two points.
9.1 Functions of a Single Variable 625
and the same for the point x D b. Then all free terms in the above integration by
parts become zero, and we obtain
Z Z
b
@F .m/ b
dm @F
ıf dx D .1/m ıf dx:
a @f .m/ a dxm @f .m/
This will happen to each term in Eq. (9.24), which results in the following expression
for the variation:
Z b n
@F d @F d2 @F n d @F
ıL D C 2 C C .1/ ıf dx:
a @f dx @f 0 dx @f 00 dxn @f .n/
(9.25)
626 9 Calculus of Variations
To obtain the corresponding Euler’s equation from this expression, we need first
to modify Theorem 9.1 since when proving it we only assumed that the function
.x/ is continuous together with its first derivative. Here we have to make sure that
.x/ in the condition
Z b
f .x/.x/dx D 0
a
of Theorem 9.1 is continuous together with all its derivatives up to the n-th one
and that the function itself and its derivatives are equal to zero at the end points
of the interval. This is easy to accomplish if we define .x/ within a small interval
< x < C (see the proof of Theorem 9.1) as
.x/ D .x C/
nC1
.x /
nC1
(9.26)
This is the DE of order 2n: indeed, the derivative @F=@f .n/ , which may contain f .n/ , is
differentiated n times with respect to x in the last term in the left-hand side. Hence,
the solution f .x/ must contain 2n arbitrary constants which are obtained from 2n
boundary conditions, which are that f , f 0 , : : :, f .n1/ all have well-defined values at
the end points.
Gj .x; f1 ; : : : ; fn / D 0; j D 1; : : : ; k: (9.29)
G .x; f1 ; f2 / D 0: (9.30)
Assume that we can find f2 from Eq. (9.30) above; then f2 D ' .x; f1 / will become
some function of x and f1 . Substituting this solution into the functional, we obtain
an explicit functional of only one function f1 .x/:
Z Z
b b
L Œf1 D F x; f1 ; ' .x; f1 / ; f10 ; ' 0 .x; f1 / dx D ˆ x; f1 ; f10 dx:
a a
628 9 Calculus of Variations
Note that here ' 0 .x; f1 / is the total derivative of ' with respect to x, i.e.,
@' @' 0
'0 D C f : (9.31)
@x @f1 1
Also, ˆ is the new integrand in the functional which depends only on the first
function; we have introduced it for convenience.
The function f1 (note that f2 follows from it immediately as it is related to f1 via
f2 D ' .x; f1 /) is obtained from the Euler’s equation
@ˆ d @ˆ
D 0: (9.32)
@f1 dx @f10
We shall now need to work out all the derivatives in this equation. First of all,
@ˆ @ @F @F @' @F @' 0
D F x; f1 ; ' .x; f1 / ; f10 ; ' 0 .x; f1 / D C C 0
@f1 @f1 @f1 @' @f1 @' @f1
@F @F @' @F @' 0
D C C 0 :
@f1 @f2 @f1 @f2 @f1
To calculate the derivative @' 0 =@f1 needed above, we use the explicit expres-
sion (9.31) for ' 0 :
@' 0 @ @' @' 0 @2 ' @2 ' 0
D C f1 D C f :
@f1 @f1 @x @f1 @x@f1 @f1 @f1 1
Hence,
@ˆ @F @F @' @F @2 ' @2 ' 0
D C C 0 C f1 : (9.33)
@f1 @f1 @f2 @f1 @f2 @x@f1 @f1 @f1
@ˆ @F @F @' 0
D C :
@f10 @f10 @f20 @f10
The second contribution is necessary since the derivative ' 0 , according to Eq. (9.31),
also explicitly depends on f10 . Moreover, from this expression, we immediately get:
@' 0 =@f10 D @'=@f1 . Therefore,
@ˆ @F @F @'
0 D 0 C : (9.34)
@f1 @f1 @f20 @f1
Substituting Eqs. (9.33) and (9.34) into the Euler’s equation (9.32), we obtain
@F @F @' @F @2 ' @2 ' 0 d @F @F @'
C C 0 C f C 0 D 0: (9.35)
@f1 @f2 @f1 @f2 @x@f1 @f1 @f1 1 dx @f10 @f2 @f1
9.1 Functions of a Single Variable 629
where
d @' @2 ' @2 ' 0
D C f :
dx @f1 @x@f1 @f1 @f1 1
Substituting these results into Eq. (9.35) and performing necessary cancellations, we
obtain
@F d @F @' @F d @F
C D 0: (9.36)
@f1 dx @f10 @f1 @f2 dx @f20
with respect to f1 :
@G @G @'
C D 0; (9.37)
@f1 @f2 @f1
@' @G=@f1
D :
@f1 @G=@f2
This result allows rewriting the Euler’s equation (9.36) in the following final form:
@F d @F @G=@f1 @F d @F
D 0: (9.38)
@f1 dx @f10 @G=@f2 @f2 dx @f20
The obtained DE may have solved the problem; however, it looks complicated
and hence very difficult to remember. As with the optimisation of functions with
constraints (see Sect. I.5.10.4), a simple method exists (proposed by Euler) which
can simplify the procedure. Indeed, let us multiply Eq. (9.37) by some function .x/
and then add the obtained equation and the Euler’s equation (9.36) together. After
simple manipulation:
@F @G d @F @' @F d @F @G
C .x/ C C .x/ D 0: (9.39)
@f1 @f1 dx @f10 @f1 @f2 dx @f20 @f2
630 9 Calculus of Variations
This result is valid for any function .x/. Let us now make a specific choice for it.
Namely, let us choose .x/ in such a way that the expression in the square brackets
above be zero:
@F d @F @G
C .x/ D 0; (9.40)
@f2 dx @f20 @f2
and, therefore,
1 @F d @F
.x/ D : (9.41)
@G=@f2 @f2 dx @f20
Hence, with this choice, what is left from Eq. (9.39) be simply this:
@F @G d @F
C .x/ D 0: (9.42)
@f1 @f1 dx @f10
It is easy to see that this is exactly equivalent to Eq. (9.38) if we replace here with
its expression (9.41).
Hence, the two Eqs. (9.40) and (9.42) are fully equivalent to Eq. (9.38). These,
however, can be rewritten in a form which is easy to remember. Consider an
auxiliary function of x and both functions and their derivatives:
H x; f1 ; f2 ; f10 ; f20 D F x; f1 ; f2 ; f10 ; f20 C .x/ G .x; f1 ; f2 / : (9.43)
@F @H @F @H
0 D and 0 D I
@f1 @f10 @f2 @f20
other terms in Eqs. (9.40) and (9.42) can also be combined, and we obtain instead
the two equations for the function H:
@H d @H @H d @H
D0 and D 0: (9.44)
@f1 dx @f10 @f2 dx @f20
Now we understand the main idea, we can apply the same method to the general case
of Eqs. (9.28) and (9.29). Suppose, we can solve equations of constraints (9.29) with
respect to the first k functions3 :
@'j Xn
@'j 0
fj0 D 'j0 D C f : (9.46)
@x i DkC1
@fi1 i1
1
@F X @F @'j X @F @'j
k k 0
@ˆ
D C C
@fi @fi jD1
@fj @fi jD1
@fj0 @fi
0 1
@F X @F @'j X @F @ @2 'j X
k k n
@2 'j 0
D C C C f A;
@fi jD1
@fj @fi jD1
@fj0 @x@fi i DkC1 @fi1 @fi i1
1
3
It can be shown that the necessary condition for the constraints (9.29) to be solvable with respect
1 ;:::;Gk /
to the functions f1 ; : : : ; fk is that det D ¤ 0, where D D @.G
@.f1 ;:::;fk /
is the Jacobian. This point goes
deeply into the inverse function theorem in the case of functions of many variables.
632 9 Calculus of Variations
where we have used Eq. (9.46) for the derivative 'j0 . The other derivative we need is
@F X @F @'j @F X @F @'j
k 0 k
@ˆ
D C D C :
@fi0 @fi0 jD1
@fj0 @fi0 @fi0 jD1
@fj0 @fi
In the last passage we replaced @'j0 =@fi0 with @'j =@fi . This follows from Eq. (9.46)
after differentiating it with respect to fi0 . Replacing the above derivatives in Euler’s
equation (9.47), we get
0 1
@F X
k
@F @'j Xk
@F @ @ 'j2 Xn 2
@ 'j 0 A
C C 0 C fi1
@fi jD1
@fj @fi jD1
@f j @x@f i i DkC1
@f i1 @fi
1
0 1
d @ @F X @F @'j A
k
C D 0: (9.48)
dx @fi0 jD1
@fj0 @fi
In the last term we need to calculate the total derivative with respect to x:
0 1 !
d @X @F @'j A X d @'j X @F d
k k k
@F @'j
D C ;
dx jD1 @fj0 @fi jD1
dx @fj0 @fi jD1
@fj0 dx @fi
where
Xn
d @'j @2 'j @2 'j 0
D C f :
dx @fi @x@fi i DkC1 @fi @fi1 i1
1
The derivatives @'j =@fi can be obtained by differentiating the constraints (9.29) with
respect to fi . Writing the constraints more explicitly as
we have
The above equations represent (for each fixed index i) a set of k linear algebraic
equation
with respect to the derivatives @'j1 =@fi . Introducing the k k matrix D D
Djj1 D @Gj =@fj1 of the derivatives and assuming that its determinant is not equal
to zero,4 these equations can be solved using the inverse matrix D1 . This way, at
least formally, the derivatives @'j1 =@fi can be obtained
@'j1 Xk
1 @Gj
D D j1 j : (9.51)
@fi jD1
@fi
and
!
@F d @F X
k
@Gj1
C j1 .x/ D 0; j D 1; : : : ; k: (9.54)
@fj dx @fj0 j1 D1
@fj
It can be seen now that if we solve Eq. (9.54) with respect to the Lagrange
multipliers and substitute this result into the other Eq. (9.53), then we shall obtain
correctly Eq. (9.49). Indeed, Eq. (9.54) can be rewritten as
" !#
X
k
@F d @F
Dj1 j j1 .x/ D :
j1 D1
@f j dx @fj0
Notice that here we have the matrix D transposed. Therefore, when solving for j1 ,
we also transpose the inverse matrix:
" !#
X
k
1 @F d @F
j1 .x/ D D jj1 :
jD1
@fj dx @fj0
@.G1 ;:::;Gk /
4
This matrix represents the Jacobian D D @.f1 ;:::;fk /
: As was mentioned in the previous footnote
above, we must assume that det D ¤ 0.
634 9 Calculus of Variations
Making use of Eq. (9.51), we see that the expression in the curly brackets is in
fact @'j =@fi . Therefore, comparing with the original form (9.49) of the Euler’s
equation, we conclude that it has been recovered. This means that Eqs. (9.53)
and (9.54) are completely equivalent to it.
We shall now rewrite these equations in a more convenient form by introducing
an auxiliary function
X
k
HDFC j .x/Gj : (9.55)
jD1
and
!
@H d @H
D 0; j D 1; : : : ; k: (9.57)
@fj dx @fj0
We conclude that the same equations are obtained for all functions fl with l D
1; : : : ; n. In other words, the problem with constraints is completely equivalent to
the problem without constraints but applied to the auxiliary function (9.55). This
finally proves the required general statement in the case of an arbitrary holonomic
constraints.
It is clear now that if the constraints contain also derivatives of the functions we
are seeking, then these equations are differential equations and hence expressing
some of the functions via the others may be much more complicated. Still, it can be
shown that the method of Lagrange multipliers is applicable in this case as well.
Suppose now that our constraints have the form of the integral:
Z b
Gi x; f1 ; : : : ; fn ; f10 ; : : : ; fn0 dx D gi ; i D 1; : : : ; k: (9.58)
a
These are functions resulting from the upper limit in the integral. Since at the
upper limit, x1 D b, we should reproduce the constraints themselves, we must have
yi .b/ D gi . Obviously, by construction,
y0i Gi x; f1 ; : : : ; fn ; f10 ; : : : ; fn0 D 0; i D 1; : : : ; k:
which does depend only on the derivatives of the newly added functions yi .x/. The
Euler’s equations in this case are
!
@H d @H
D 0; j D 1; : : : ; n; (9.61)
@fj dx @fj0
and
@H d @H
D 0; i D 1; : : : ; k: (9.62)
@yi dx @y0i
Since the function H does not actually depend on the functions yi themselves, the
second equation gives
@H
D i .x/ D Ci ; i D 1; : : : ; k;
@y0i
i.e., the Lagrange multipliers must be constants, i.e., they cannot depend on x.
Moreover, when writing the Euler’s equations (9.61) for the functions fj , the
derivatives of yi in H of Eq. (9.60) do not contribute and, therefore, the function
H can be built simply as
X
k
HDFC i Gi (9.63)
iD1
Therefore, the total potential energy of the rope due to gravity will be proportional
to (we can omit the irrelevant prefactor g):
Z x0 q
UD y 1 C .y0 /2 dx: (9.64)
0
We have to minimise U with respect to the shape y.x/ of the rope subject to the
condition that the length of the rope
Z x0 q
lD 1 C .y0 /2 dx (9.65)
0
is fixed and equal to l. Following the recipe discussed above, we construct the
auxiliary function
q q q
H y; y0 D y 1 C .y0 /2 C 1 C .y0 /2 D .y C / 1 C .y0 /2 :
Since it depends only on y and its derivative, the Euler’s equation yields
q
@H .y0 /2
H y0 D C1 H) .y C / 1 C .y0 /2 .y C / q D C1 :
@y0
1 C .y0 /2
q
Multiplying both sides by 1 C .y0 /2 and simplifying, we obtain
q q
yC dy 1
1 C .y0 /2 D H) D˙ .y C /2 C12 ; (9.66)
C1 dx C1
where the zero bottom limits in both integrals take account of the fact that .0; 0/ is
the starting point of the line. The integral in the left-hand side by the substitution
y1 C D C1 t is manipulated into the inverse hyperbolic cosine integral, the integral
in the right-hand side is straightforward, so that we obtain
Z .yC/=C1
dt x 1 yC 1 x
p D H) cosh cosh D ;
=C1 2
t 1 C1 C1 C1 C1
638 9 Calculus of Variations
which in turn means that the expression in the square brackets is zero:
x0 x0 x0
C cosh1 D0 H) D C1 cosh D C1 cosh :
2C1 C1 2C1 2C1
We now need to satisfy the condition related to the length of the curve to obtain C1 .
We have
Z x0 =2 q Z x0 =2 Z x0 =2
l 0 2 yC x x0 =2
D 1 C .y / dx D dx D cosh dx
2 0 0 C1 0 C1
x0
D C1 sinh :
2C1
It is easy to verify that this transcendental equation always has a solution for C1 .
The shape of the curve (9.68) is fully defined. In fact, this formula is valid for the
whole range 0 x x0 .
Problem 9.10. Consider a line of length l fixed at two points A.x0 ; 0/ and
B.x0 ; 0/ (where l > 2x0 ). Determine
Rx the shape y D y.x/ of the curve which
gives the largest area A D x0 0 y.x/dx under it, i.e., between the curve and the
x axis (it may be assumed that y 0).
(continued)
9.2 Functions of Many Variables 639
This problem also shows that the largest possible area made by a close-looped
line of a fixed length l is a circle of radius R D l=2 , something which can easily
be accepted intuitively.
So far we have considered the case where the functional L depends on one or more
functions (and their derivatives) which depend on a single variable x; consequently,
the functional L has the form of a single integral with respect to x. In some
applications it is necessary to find the minimum (maximum) of a functional which
is written as a multiple integral over some multidimensional region and contains an
unknown function f (and its partial derivatives) defined in this region, i.e., we have
to deal with the function of more than one variable.
For instance, consider a closed curve L in the 3D space. This curve could be
a boundary to many curvilinear surfaces S which may have different surface areas
(Sect. I.6.4.2)
Z Z q
2 2
AD 1 C z0x C z0y dxdy:
†
Here the surface S is specified as z D z.x; y/, and † is the projection of S on the
x y plane. It is then legitimate to ask the question of what is the surface that has the
minimum possible surface area A. For instance, if the contour L is planar (e.g. lies
in the x y plane), then the minimum of the area is achieved by the planar surface
enclosed by the contour L. However, if L is not planar, the answer is not so obvious.
We shall first consider the following functional:
Z Z
L Œf D F x; y; f ; fx0 ; fy0 dxdy: (9.69)
D
function f has definite values on the whole boundary L of the region D, and is
continuous everywhere in D together with its first derivatives fx0 and fy0 .
The variation ıL in this case is constructed exactly in the same way as for a
function of a single variable using the function L.˛/ D L Œf C ˛ıf , where ıf is
a fixed function of two variables such that on the boundary L it is zero, ıf .L/ D
ı f .x; y/jL D 0. Then,
ˇ
ıL D L0 .˛/ˇ˛D0 D L0 .0/;
Theorem 9.2. Let .x; y/ be any function which is zero at the boundary L of some
2D region D, and is continuous together with its first derivatives 0x and 0y within
the whole region D including its boundary. If f .x; y/ is a continuous function in D
and for any such .x; y/ the integral
Z Z
f .x; y/.x; y/dxdy D 0;
D
then f .x; y/ D 0.
Next, we define the function .x; y/ in the following way: it is zero at the boundary
of the same circle and beyond it, while inside the circle it is defined as
h i2
.x; y/ D .x x0 /2 C .y y0 /2 2
:
It is easy to see that this function is indeed zero at the boundary of the circle and
hence is continuous everywhere in D. Its first partial derivatives behave similarly.
Indeed, within the circle
h i
0x D 4 .x x0 / .x x0 /2 C .y y0 /2 2 ;
and hence it is zero at its boundary; it continues to be zero beyond the circle by
construction. Hence, defined in that manner satisfies the conditions of the theorem.
At the same time, the surface integral
Z Z Z Z
f .x; y/.x; y/dxdy D f .x; y/.x; y/dxdy > 0;
D U
i.e., it is not zero as both functions are positive within the circle U . Therefore, our
assumption has been proven wrong. Q.E.D.
9.2 Functions of Many Variables 641
Using this theorem, we can derive an appropriate partial differential equation for
the function f .x; y/ corresponding to the optimum of the functional (9.69). Indeed,
the variation
Z Z
d 0 0
ıL D F x; y; f C ˛ıf ; .f C ˛ıf /x ; .f C ˛ıf /y dxdy
d˛ D ˛D0
Z Z
d
D F x; y; f C ˛ıf ; fx0 C ˛ .ıf /0x ; fy0 C ˛ .ıf /0y dxdy
d˛ D ˛D0
Z Z " #
@F @F @F
D ıf C 0 .ıf /0x C 0 .ıf /0y dxdy
D @f @fx @fy
Z Z Z Z " #
@F @F 0 @F 0
D ıf dxdy C 0
.ıf /x C 0 .ıf /y dxdy: (9.70)
D @f D @fx @fy
Similarly,
! !
@F @ @F @ @F
.ıf /0y 0
D ıf ıf :
@fy @y @fy0 @y @fy0
Using these expressions to replace the two terms within the square brackets in
Eq. (9.70), we obtain
Z Z " !#
@F @ @F @ @F
ıL D ıf dxdy
D @f @x @fx0 @y @fy0
Z Z " !#
@ @F @ @F
C ıf 0 ıf dxdy:
D @x @fx0 @y @fy
The second integral is zero. Indeed, it can be handled, e.g., by means of the Green’s
formula, Sect. I.6.3.3, with the following choice of the two functions Q and P:
@F @F
Q.x; y/ D ıf and P.x; y/ D ıf :
@fx0 @fy0
642 9 Calculus of Variations
RR @Q @P
Then, according to the Green’s formula, the double integral D @x
@y
dxdy
equals the line integral
I I !
@F @F
Pdx C Qdy D ıf 0 dx C 0 dy
L L @fy @fx
At the stationary point ıL D 0. As this is to happen for any variation ıf , then based
on Theorem 9.2 we arrive at the following equation the function f .x; y/ must satisfy:
!
@F @ @F @ @F
D 0: (9.71)
@f @x @fx0 @y @fy0
This result was first obtained by Ostrogradsky and bears his name.
A generalisation to functions of more variables can easily be done. First of all,
we need the necessary generalisation of Theorem 9.2 for the case of n dimensions
(n 2). This is done by introducing
" n #2
X
.x1 ; : : : ; xn / D .xi xi0 /2 2
iD1
within a small vicinity U of the point .x10 ; : : : ; xn0 / where we assume that the
function f .x10 ; : : : ; xn0 / > 0. The rest of the proof remains exactly the same.
Before going to a general n-dimensional case, let us next consider the case of
three dimensions. The functional is a volume integral:
Z Z Z
LD F x; y; z; f ; fx0 ; fy0 ; fz0 dxdydz; (9.72)
D
Z Z Z " #
@F @F 0 @F 0 @F 0
ıL D ıf C 0 .ıf /x C 0 .ıf /y C 0 .ıf /z dxdydz
D @f @fx @fy @fz
Z Z Z " ! #
@F @ @F @ @F @ @F
D ıf dxdydz
D @f @x @fx0 @y @fy0 @z @fz0
Z Z Z " ! #
@ @F @ @F @ @F
C ıf C ıf C ıf dxdydz:
D @x @fx0 @y @fy0 @z @fz0
This time, to transform the last integral into the integral over the surface boundary
S of D, we notice that the integrand
there
can be thought of as a divergence of the
vector field Fıf , where F D Fx ; Fy ; Fz is the vector field with the components
@F @F @F
Fx D ; Fy D ; Fz D ;
@fx0 @fy0 @fz0
and the necessary condition ıL D 0 leads to the required generalisation of Eq. (9.71)
to three dimensions:
!
@F @ @F @ @F @ @F
D 0: (9.73)
@f @x @fx0 @y @fy0 @z @fz0
Show that in this case the stationary value of the functional is given by the
function f .x; y/ which satisfies the equation:
! ! !
@F @ @F @ @F @2 @F @2 @F @2 @F
C C C 2 D 0:
@f @x @fx0 @y 0
@fy 2
@x @fxx00 @x@y 00
@fxy @y @fyy00
(9.75)
9.3 Applications in Physics 645
9.3.1 Mechanics
Here it is assumed that the coordinates are fixed at the initial and final times. The
Lagrange function is constructed as a difference between the system’s kinetic, K,
and potential, U, energies, L D K U. Note that previously L was denoting
the whole functional; here it is denoted S while the integrand is L. These are the
notations widely accepted in physics literature for the action and the Lagrangian,
respectively, and hence we shall stick with them here.
5
After Joseph-Louis Lagrange.
646 9 Calculus of Variations
Gi .q1 ; : : : ; qn ; t/ D 0; i D 1; : : : k;
X
k
H DLC i .t/Gi
iD1
with i .t/ being the corresponding Lagrange multipliers, and hence the correspond-
ing Lagrange equations would contain an additional term due to these constraints:
@L X @Gj
k
d @L
D C j ; i D 1; : : : ; n: (9.78)
dt @Pqi @qi jD1
@qi
The terms in the right-hand side serve as forces acting on the coordinate qi ; the
first term is a real (physical) force, while the second is an artificial force due to the
constraints.
As our first example of application of the Lagrange equations, let us consider
a single particle of mass m moving under the external potential U.r/. Here
r D .x; y; z/ are the three Cartesian coordinates describing the system. In this case
there is no need for special generalised coordinates, the Cartesian coordinates will
do. The Lagrangian is
1 2 1
LD mPr U.r/ D m xP 2 C yP 2 C zP2 U .x; y; z/ ;
2 2
leading to three Lagrange equations for each Cartesian component, e.g., one of
them,
@L d @L @U d @U
D 0 H) .mPx/ D 0 H) mRx D 0;
@x dt @Px @x dt @x
is nothing but Newton’s second equation of motion projected on the x axis, since
@U
@x
is the x component of the force, Fx .
In our second example, consider a pendulum of length l with a particle of mass m hanging at its end, Fig. 9.4(a). The Cartesian coordinates of the mass are $x = l\sin\alpha$ and $y = -l\cos\alpha$.

Fig. 9.4 Pendulum problems: (a) a single pendulum; (b) a double pendulum and (c) a pendulum with an oscillating support.

However, it is clear that only a single coordinate is needed to adequately describe the oscillations of the mass. The most convenient choice here is the angle α of the pendulum with the y (vertical) axis. Hence, choosing α as the required independent generalised coordinate, we have to express both the potential and kinetic energies via it. The potential energy is straightforward: $U = mgy = -mgl\cos\alpha$. The kinetic energy is most easily worked out starting from the Cartesian coordinates:

$$
K = \frac{1}{2}m\left(\dot x^2 + \dot y^2\right) = \frac{1}{2}m\left[\left(l\dot\alpha\cos\alpha\right)^2 + \left(l\dot\alpha\sin\alpha\right)^2\right] = \frac{1}{2}ml^2\dot\alpha^2 ,
$$

so that the Lagrangian becomes

$$
L = \frac{1}{2}ml^2\dot\alpha^2 + mgl\cos\alpha .
$$

The required equation of motion is obtained from the Lagrange equation:

$$
\frac{\partial L}{\partial\alpha} - \frac{d}{dt}\frac{\partial L}{\partial\dot\alpha} = 0\quad\Longrightarrow\quad -mgl\sin\alpha - \frac{d}{dt}\left(ml^2\dot\alpha\right) = 0\quad\Longrightarrow\quad \ddot\alpha + \frac{g}{l}\sin\alpha = 0 .
$$
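The pendulum equation just derived is easily explored numerically; here is a minimal sketch (assuming numpy and scipy are available; the parameter values are purely illustrative) which integrates it and confirms that the energy $E = \frac{1}{2}ml^2\dot\alpha^2 - mgl\cos\alpha$ stays constant along the trajectory:

# Integrate alpha'' = -(g/l) sin(alpha) and monitor the energy (a sketch).
import numpy as np
from scipy.integrate import solve_ivp

g, l, m = 9.81, 1.0, 1.0

def rhs(t, y):
    alpha, alpha_dot = y
    return [alpha_dot, -(g / l) * np.sin(alpha)]

sol = solve_ivp(rhs, (0.0, 10.0), [0.5, 0.0], rtol=1e-10, atol=1e-10)
alpha, alpha_dot = sol.y
E = 0.5 * m * l**2 * alpha_dot**2 - m * g * l * np.cos(alpha)
print("energy drift:", E.max() - E.min())   # tiny: E is conserved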
For the pendulum with an oscillating support shown in Fig. 9.4(c), in which the pivot of mass $m_1$ slides horizontally (its displacement denoted x) and is held by a spring of elastic constant k, while a bob of mass $m_2$ hangs from it on a rod of length l, a similar calculation gives

$$
L = \frac{1}{2}\left(m_1+m_2\right)\dot x^2 + \frac{1}{2}m_2 l^2\dot\alpha^2 + m_2 l\,\dot x\,\dot\alpha\cos\alpha - \frac{1}{2}kx^2 + m_2 gl\cos\alpha ,
$$

while the corresponding equations of motion read:

$$
l\ddot\alpha + \ddot x\cos\alpha + g\sin\alpha = 0
$$

and

$$
\ddot x + \frac{m_2 l}{m_1+m_2}\left(\ddot\alpha\cos\alpha - \dot\alpha^2\sin\alpha\right) + \frac{k}{m_1+m_2}\,x = 0 .
$$
Problem 9.14. A ball of mass m is set on a horizontal rod along which it can slide without friction; at the same time, the ball is also attached by a spring with the elastic constant k to a vertical axis, see Fig. 9.5. The rod rotates with the angular frequency ω around the vertical axis. Show that the Lagrangian in this case is

$$
L = \frac{1}{2}m\left(\dot r^2 + \omega^2 r^2\right) - \frac{1}{2}k\left(r - r_0\right)^2 ,
$$

where r is the distance of the ball from the axis and $r_0$ corresponds to the undeformed spring.
For an isolated system⁶ the Lagrangian does not depend explicitly on time, and hence its total time derivative is

$$
\frac{dL}{dt} = \sum_{i=1}^{n}\left(\frac{\partial L}{\partial q_i}\,\dot q_i + \frac{\partial L}{\partial\dot q_i}\,\ddot q_i\right) = \sum_{i=1}^{n}\left[\left(\frac{d}{dt}\frac{\partial L}{\partial\dot q_i}\right)\dot q_i + \frac{\partial L}{\partial\dot q_i}\,\ddot q_i\right].
$$

Here we replaced the derivative of L with respect to the coordinate $q_i$ by the time derivative term coming from the Lagrange equations (9.77). Then we notice that the expression in the square brackets is nothing but the total time derivative of $\dot q_i\left(\partial L/\partial\dot q_i\right)$, so that

$$
\frac{dL}{dt} = \sum_{i=1}^{n}\frac{d}{dt}\left(\dot q_i\,\frac{\partial L}{\partial\dot q_i}\right)\quad\Longrightarrow\quad \frac{d}{dt}\left(\sum_{i=1}^{n}\dot q_i\,\frac{\partial L}{\partial\dot q_i} - L\right) = 0 .
$$

Hence the time derivative of the quantity

$$
E = \sum_{i=1}^{n}\dot q_i\,\frac{\partial L}{\partial\dot q_i} - L
\tag{9.79}
$$
⁶ Our consideration is also valid for systems in an external field which does not depend on time.
is equal to zero, i.e., it must be conserved in time for an isolated system. This quantity is called the system energy. Indeed, since the kinetic energy K may depend on both generalised coordinates and their velocities, while the potential energy U only depends on the coordinates, we have

$$
E = \sum_{i=1}^{n}\dot q_i\,\frac{\partial(K-U)}{\partial\dot q_i} - (K-U) = \sum_{i=1}^{n}\dot q_i\,\frac{\partial K}{\partial\dot q_i} - K + U .
$$

Writing the kinetic energy as a quadratic form of the generalised velocities,

$$
K = \frac{1}{2}\sum_{i,j=1}^{n}\alpha_{ij}\,\dot q_i\,\dot q_j ,
$$

with the coefficients $\alpha_{ij}$ forming an $n\times n$ matrix⁷ (which generally may depend on the coordinates), we can write

$$
\sum_{i=1}^{n}\dot q_i\,\frac{\partial K}{\partial\dot q_i} = \sum_{i=1}^{n}\left(\sum_{j=1}^{n}\alpha_{ij}\,\dot q_j\right)\dot q_i = 2K ,
$$

so that $E = 2K - (K - U) = K + U$: the conserved quantity is indeed the sum of the kinetic and potential energies.

⁷ Of course, the matrix can always be chosen symmetric in the quadratic form.
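This identity is easy to verify symbolically for the pendulum considered above; a small sketch assuming sympy:

# Check Eq. (9.79) for the pendulum: E = q' dL/dq' - L must equal K + U.
import sympy as sp

m, g, l = sp.symbols('m g l', positive=True)
alpha, alpha_dot = sp.symbols('alpha alpha_dot')

K = sp.Rational(1, 2) * m * l**2 * alpha_dot**2    # kinetic energy
U = -m * g * l * sp.cos(alpha)                     # potential energy
L = K - U

E = alpha_dot * sp.diff(L, alpha_dot) - L          # Eq. (9.79)
print(sp.simplify(E - (K + U)))                    # prints 0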
Conservation of momentum follows in a similar way from the homogeneity of space. For an isolated system of N particles, a translation of the system as a whole, $\mathbf r_i \to \mathbf r_i + \delta\boldsymbol\varepsilon$, must leave the Lagrangian unchanged, whence

$$
\delta L = \sum_{i=1}^{N}\frac{\partial L}{\partial\mathbf r_i}\cdot\delta\boldsymbol\varepsilon = 0\quad\Longrightarrow\quad \sum_{i=1}^{N}\frac{\partial L}{\partial x_i} = \frac{d}{dt}\sum_{i=1}^{N}\frac{\partial L}{\partial\dot x_i} = 0 ,
$$

and similarly for the y and z components. Here again the Lagrange equations (9.77) were used. Hence the vector P with the Cartesian components

$$
P_x = \sum_{i=1}^{N}\frac{\partial L}{\partial\dot x_i} = \sum_{i=1}^{N}\frac{\partial L}{\partial v_{ix}} ,\qquad P_y = \sum_{i=1}^{N}\frac{\partial L}{\partial\dot y_i} = \sum_{i=1}^{N}\frac{\partial L}{\partial v_{iy}} ,\qquad P_z = \sum_{i=1}^{N}\frac{\partial L}{\partial\dot z_i} = \sum_{i=1}^{N}\frac{\partial L}{\partial v_{iz}} ,
\tag{9.80}
$$

is conserved,
or, in vector form,

$$
\mathbf P = \sum_{i=1}^{N}\frac{\partial L}{\partial\mathbf v_i} .
$$

Since the kinetic energy of the particles is

$$
K = \frac{1}{2}\sum_{i=1}^{N}m_i\mathbf v_i^2
$$

and the potential energy does not depend on the velocities, this gives $\mathbf P = \sum_i m_i\mathbf v_i$, the familiar total momentum of the system.
Finally, consider a rotation of the system as a whole by a small angle δφ about an axis along the unit vector s. Under such a rotation each position vector changes as

$$
\delta\mathbf r_i = \delta\varphi\left[\mathbf s\times\mathbf r_i\right],
\tag{9.81}
$$

as can be seen from Fig. 9.6. Indeed, the vector r in the figure, which makes an angle θ with the axis of rotation, defines a plane perpendicular to the axis when rotated by the angle δφ to r′. The projections of both r and r′ on that plane both have the length $r\sin\theta$, and the angle between them is exactly δφ. For small such angles the length of the difference, $\delta\mathbf r = \mathbf r' - \mathbf r$, is $\delta r = \left(r\sin\theta\right)\delta\varphi = \left|\mathbf s\times\mathbf r\right|\delta\varphi$, as $|\mathbf s| = 1$. Taking into account the directions of the vectors and the definition of the vector product of the two vectors, we arrive at Eq. (9.81) written above.

Correspondingly, as the system rotates, the velocities $\mathbf v_i$ of its particles change as well:

$$
\delta\mathbf v_i = \delta\,\frac{d\mathbf r_i}{dt} = \frac{d}{dt}\left(\delta\mathbf r_i\right) = \delta\varphi\left[\mathbf s\times\frac{d\mathbf r_i}{dt}\right] = \delta\varphi\left[\mathbf s\times\mathbf v_i\right].
$$

Therefore, the total variation of the Lagrangian due to rotation of the whole system by δφ around the axis given by the unit vector s is (the sum over α corresponds to the summation over the Cartesian components):

$$
\delta L = \sum_{i=1}^{N}\sum_{\alpha}\left(\frac{\partial L}{\partial r_{i\alpha}}\,\delta r_{i\alpha} + \frac{\partial L}{\partial v_{i\alpha}}\,\delta v_{i\alpha}\right) = \sum_{i=1}^{N}\left(\frac{\partial L}{\partial\mathbf r_i}\cdot\delta\mathbf r_i + \frac{\partial L}{\partial\mathbf v_i}\cdot\delta\mathbf v_i\right).
$$

Using the Lagrange equations, $\partial L/\partial\mathbf r_i = d\mathbf p_i/dt$ with the momenta $\mathbf p_i = \partial L/\partial\mathbf v_i$, this variation becomes a total time derivative:

$$
\delta L = \frac{d}{dt}\sum_{i=1}^{N}\mathbf p_i\cdot\delta\mathbf r_i = \frac{d}{dt}\sum_{i=1}^{N}\mathbf p_i\cdot\delta\varphi\left[\mathbf s\times\mathbf r_i\right] = \delta\varphi\,\frac{d}{dt}\sum_{i=1}^{N}\mathbf p_i\cdot\left[\mathbf s\times\mathbf r_i\right].
$$

The expression under the sum contains the triple product $\left(\mathbf p_i,\mathbf s,\mathbf r_i\right)$, which is invariant under a cyclic permutation of its components. Therefore, we can write

$$
\delta L = \delta\varphi\,\frac{d}{dt}\sum_{i=1}^{N}\left(\mathbf p_i,\mathbf s,\mathbf r_i\right) = \delta\varphi\,\frac{d}{dt}\sum_{i=1}^{N}\left(\mathbf s,\mathbf r_i,\mathbf p_i\right) = \delta\varphi\,\frac{d}{dt}\left(\mathbf s\cdot\sum_{i=1}^{N}\left[\mathbf r_i\times\mathbf p_i\right]\right).
$$

Since the Lagrangian must not change, δL = 0, and the direction s of the axis is arbitrary, the expression in the round brackets above,

$$
\mathbf M = \sum_{i=1}^{N}\left[\mathbf r_i\times\mathbf p_i\right],
$$

must be conserved. This is the familiar expression for the angular momentum of the system.
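As a numerical illustration (a sketch assuming numpy and scipy; the central potential U = −1/r and the initial conditions are our choice), one can integrate a planar orbit and watch the z component of M = r × p stay constant:

# Motion in a central field: M = x*vy - y*vx (with m = 1) must be
# conserved along the orbit.
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, s):
    x, y, vx, vy = s
    r3 = (x * x + y * y) ** 1.5
    return [vx, vy, -x / r3, -y / r3]

sol = solve_ivp(rhs, (0.0, 50.0), [1.0, 0.0, 0.0, 1.2],
                rtol=1e-10, atol=1e-10)
x, y, vx, vy = sol.y
M = x * vy - y * vx
print("angular momentum drift:", M.max() - M.min())   # tiny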
So far we have discussed various applications of Euler's equations in mechanics. Let us now consider some applications of Ostrogradsky's equation (9.74) for functions of more than one variable. The simplest example is related to the oscillations of a string considered in Sect. 8.2.1. Consider a string of length l
which is set along the x axis with the tension $T_0$. Each element dx of the string oscillates in the perpendicular direction, and its vertical displacement is described by the function u(x, t). To construct the equation of motion for the string, we need to write its Lagrangian, i.e., the difference of the kinetic and potential energy terms. If ρ is the density (per unit length) of the string, the kinetic energy of its piece of length dx will obviously be

$$
dK = \frac{1}{2}\rho\,\dot u^2\,dx = \frac{1}{2}\rho\left(u'_t\right)^2 dx .
$$

The potential energy of the dx element of the string can be written as the work of the tension force $T_0$ needed to stretch it from the length dx (when the string is strictly horizontal, i.e., when not oscillating) to $ds = \sqrt{(dx)^2 + (du)^2} = \sqrt{1 + \left(u'_x\right)^2}\,dx$, i.e.,

$$
dU = T_0\left(\sqrt{1+\left(u'_x\right)^2} - 1\right)dx \simeq T_0\left(1 + \frac{1}{2}\left(u'_x\right)^2 - 1\right)dx = \frac{1}{2}T_0\left(u'_x\right)^2 dx ,
$$

where we assumed that the deformation of the string is small and hence expanded the square root term, keeping only the first two terms. Assuming that there is also an external force F (per unit length) acting on the element dx, there will be an additional term $-(F\,dx)\,u$ in the potential energy. Integrating these expressions along the whole string gives the total kinetic and potential energies, respectively, so that the total Lagrangian of the whole string is

$$
L = \int_0^l\left[\frac{1}{2}\rho\left(u'_t\right)^2 - \frac{1}{2}T_0\left(u'_x\right)^2 + Fu\right]dx .
\tag{9.82}
$$
The action to be optimised is then given by the double integral

$$
S = \int_{t_1}^{t_2}L\,dt = \int_{t_1}^{t_2}dt\int_0^l dx\left[\frac{1}{2}\rho\left(u'_t\right)^2 - \frac{1}{2}T_0\left(u'_x\right)^2 + Fu\right] = \int_{t_1}^{t_2}dt\int_0^l dx\,\mathcal F\left(u,u'_t,u'_x\right),
$$

while the corresponding equation of motion for the transverse displacement is Ostrogradsky's equation (9.74):

$$
\frac{\partial\mathcal F}{\partial u} - \frac{\partial}{\partial t}\frac{\partial\mathcal F}{\partial u'_t} - \frac{\partial}{\partial x}\frac{\partial\mathcal F}{\partial u'_x} = 0 .
$$

Substituting here the density $\mathcal F$ written above, we obtain

$$
F - \rho\,\frac{\partial^2 u}{\partial t^2} + T_0\,\frac{\partial^2 u}{\partial x^2} = 0\quad\Longleftrightarrow\quad \rho\,\frac{\partial^2 u}{\partial t^2} = T_0\,\frac{\partial^2 u}{\partial x^2} + F ,
$$

which is exactly the wave equation of the string of Sect. 8.2.1.

Problem 9.15. Show that transverse oscillations of an elastic rod, for which the potential energy density is proportional to the square of the second derivative $u''_{xx}$ rather than of $u'_x$, are described by an equation of the fourth order in the spatial derivative,

$$
\frac{\partial^2 u}{\partial t^2} = -\alpha\,\frac{\partial^4 u}{\partial x^4} ,
$$

with a positive constant α. [Hint: use Eq. (9.75).]
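The derivation of the string equation can also be checked symbolically; a minimal sketch assuming sympy, whose euler_equations routine implements precisely Ostrogradsky's equation (9.74):

# Derive the string equation from the Lagrangian density
# F = rho*u_t^2/2 - T0*u_x^2/2 + Fext*u.
from sympy import Function, Rational, symbols
from sympy.calculus.euler import euler_equations

t, x = symbols('t x')
rho, T0, Fext = symbols('rho T_0 F_ext', positive=True)
u = Function('u')(t, x)

Ldens = (Rational(1, 2) * rho * u.diff(t)**2
         - Rational(1, 2) * T0 * u.diff(x)**2 + Fext * u)
print(euler_equations(Ldens, u, [t, x]))
# [Eq(F_ext - rho*Derivative(u,(t,2)) + T_0*Derivative(u,(x,2)), 0)],
# i.e. rho*u_tt = T0*u_xx + F, the wave equation obtained above.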
A convenient way of writing variations is provided by the functional derivative $\delta L/\delta f(x)$, defined via

$$
\delta L = \int\frac{\delta L}{\delta f(x)}\,\delta f(x)\,dx .
$$

For instance, consider $L[f] = f(x)^n$. Using our previous method, we obtain

$$
\delta L = L'(0) = \lim_{\alpha\to 0}\frac{\left(f+\alpha\,\delta f\right)^n - f^n}{\alpha} = n f(x)^{n-1}\,\delta f(x).
$$

It is easy to see that the same result is obtained using the functional derivative if we define

$$
\frac{\delta F\left[f(x)\right]}{\delta f(x')} = \delta\left(x-x'\right)\frac{dF}{df} .
\tag{9.85}
$$
Consider now a functional given by an integral, $L[f] = \int_a^b F\left(x,f,f'\right)dx$, whose variation is

$$
\delta L = \int_a^b\left[\frac{\partial F}{\partial f}\,\delta f(x) + \frac{\partial F}{\partial f'}\,\delta f'(x)\right]dx .
$$

Calculating the integral of the second term within the square brackets by parts and using the fact that $\delta f(a) = \delta f(b) = 0$, we obtain

$$
\delta L = \int_a^b dx\left[\frac{\partial F}{\partial f}\,\delta f(x) - \frac{d}{dx}\left(\frac{\partial F}{\partial f'}\right)\delta f(x)\right] = \int_a^b dx\left[\frac{\partial F}{\partial f} - \frac{d}{dx}\frac{\partial F}{\partial f'}\right]\delta f(x),
$$

which is the same expression as before, see Eq. (9.8). Hence, in the case of a functional L given by an integral, the functional derivative $\delta L/\delta f(x)$ is given by the expression in the square brackets above; it does not contain a delta function:

$$
\frac{\delta L}{\delta f(x)} = \frac{\delta}{\delta f(x)}\int_a^b F\left(x,f,f'\right)dx = \frac{\partial F}{\partial f} - \frac{d}{dx}\frac{\partial F}{\partial f'} .
\tag{9.86}
$$
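Equation (9.86) can also be verified numerically; a sketch assuming numpy, for $L[f]=\int_0^1 f'^2\,dx$ with $f=\sin\pi x$, where (9.86) gives $\delta L/\delta f = -2f'' = 2\pi^2\sin\pi x$ (a unit bump at a grid point $x_j$ approximates $\delta(x-x_j)$ with weight $1/h$):

# Compare (L[f + eps*e_j] - L[f - eps*e_j])/(2*eps*h) with -2 f''(x_j).
import numpy as np

N, eps = 400, 1e-6
x = np.linspace(0.0, 1.0, N + 1)
h = x[1] - x[0]
f = np.sin(np.pi * x)

def L(f):
    g2 = np.gradient(f, h) ** 2
    return h * (0.5 * g2[0] + g2[1:-1].sum() + 0.5 * g2[-1])  # trapezoid

j = N // 3                                  # an interior grid point
fp, fm = f.copy(), f.copy()
fp[j] += eps
fm[j] -= eps
numeric = (L(fp) - L(fm)) / (2 * eps * h)
exact = 2 * np.pi**2 * np.sin(np.pi * x[j])
print(numeric, exact)                       # the two values agree closely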
Let us work out a useful relationship involving functional derivatives. Let L be a functional of F (and x), which in turn is a functional of a function g = g(x) (and x), i.e., $L = L\left[F[g(x)]\right]$. Then

$$
\delta L = \int\frac{\delta L}{\delta F(x')}\,\delta F(x')\,dx'\qquad\text{and also}\qquad \delta F(x') = \int\frac{\delta F(x')}{\delta g(x)}\,\delta g(x)\,dx ,
$$

so that

$$
\delta L = \iint\frac{\delta L}{\delta F(x')}\,\frac{\delta F(x')}{\delta g(x)}\,\delta g(x)\,dx\,dx' = \int\left[\int\frac{\delta L}{\delta F(x')}\,\frac{\delta F(x')}{\delta g(x)}\,dx'\right]\delta g(x)\,dx .
$$

On the other hand, the expression in the square brackets above must be the functional derivative δL/δg(x). Hence we have

$$
\frac{\delta L}{\delta g(x)} = \int\frac{\delta L}{\delta F(x')}\,\frac{\delta F(x')}{\delta g(x)}\,dx' .
\tag{9.87}
$$
In quantum mechanics the variational principle plays a central role. Consider a non-relativistic quantum system of N electrons moving in the field of atomic nuclei, e.g., a molecule. In a stationary state the system is characterised by its Hamiltonian operator, $\hat H = \hat K + \hat U$, which is a sum of the kinetic energy,

$$
\hat K = -\frac{1}{2}\sum_{i=1}^{N}\Delta_i ,
$$

and the potential energy,

$$
\hat U = \sum_{i=1}^{N}V_n\left(\mathbf r_i\right) + \frac{1}{2}\sum_{\substack{i,j=1\\(i\neq j)}}^{N}\frac{1}{\left|\mathbf r_i-\mathbf r_j\right|} ,
$$

operators. Above, in the kinetic energy operator, $\Delta_i$ is the Laplacian calculated with respect to the position $\mathbf r_i$ of the i-th electron; next, the first term in $\hat U$ describes the interaction of each electron with all atomic nuclei, while the second term stands for the repulsive electron–electron interaction.
The quantum state of the system is described by the Schrödinger equation

$$
\hat H\Psi = E\Psi ,
\tag{9.88}
$$

or, explicitly,

$$
-\frac{1}{2}\sum_{i=1}^{N}\Delta_i\Psi + \sum_{i=1}^{N}V_n\left(\mathbf r_i\right)\Psi + \frac{1}{2}\sum_{\substack{i,j=1\\(i\neq j)}}^{N}\frac{1}{\left|\mathbf r_i-\mathbf r_j\right|}\,\Psi = E\Psi ,
$$

where $\Psi = \Psi\left(\mathbf r_1,\dots,\mathbf r_N\right)$ is the wave function, normalised to unity:

$$
\int\cdots\int\left|\Psi\left(\mathbf r_1,\dots,\mathbf r_N\right)\right|^2 d\mathbf r_1\cdots d\mathbf r_N = 1 ,
\tag{9.89}
$$

where we used shorthand notation for the volume element of each particle: $d\mathbf r_i = dx_i\,dy_i\,dz_i$ with $\mathbf r_i = \left(x_i,y_i,z_i\right)$. The normalisation condition corresponds to the unit probability for the electrons to be found somewhere. Indeed, the probability to find the first electron between $\mathbf r_1$ and $\mathbf r_1 + d\mathbf r_1$, the second between $\mathbf r_2$ and $\mathbf r_2 + d\mathbf r_2$, and so on, is given simply by

$$
dP = \left|\Psi\left(\mathbf r_1,\dots,\mathbf r_N\right)\right|^2 d\mathbf r_1\cdots d\mathbf r_N ,
$$

so that the normalisation condition sums all these probabilities over the whole space for each particle, resulting in unity. To simplify our notation, we shall in the following use the vector R to designate all electronic coordinates $\mathbf r_1,\dots,\mathbf r_N$, while the corresponding product of all volume elements $d\mathbf r_1\cdots d\mathbf r_N$ will be written simply as $d\mathbf R$. Also, a single integral symbol will be used; of course, multiple integration is implied as above.
Finally, E in Eq. (9.88) is the total energy of the system. The lowest total energy, $E_0$, corresponds to the ground state with the wave function $\Psi_0$; assuming that the system is stable in its ground state, we have $E_0 < 0$. There could be more bound (stable) states characterising the system. Their energies $E_0$, $E_1$, $E_2$, etc., are negative and form a sequence $E_0 < E_1 < E_2 < \cdots$; the corresponding wave functions are $\Psi_0$, $\Psi_1$, $\Psi_2$, etc. The states with energies higher than the ground state energy $E_0$ correspond to the excited states of the system. The wave functions $\Psi_i$ of different states ($i = 0, 1, 2, \dots$) are normalised to unity and are mutually orthogonal,

$$
\int\Psi_i^*(\mathbf R)\,\Psi_j(\mathbf R)\,d\mathbf R = \delta_{ij} .
\tag{9.90}
$$
Multiplying both sides of Eq. (9.88) by $\Psi^*$ and integrating over all coordinates, we obtain

$$
E = \int\Psi^*\hat H\Psi\,d\mathbf R ,
$$

where the normalisation of the wave function, Eq. (9.89), was used. This gives an expression for the ground state energy directly via its ground state wave function. This expression has been obtained from the Schrödinger equation. However, an alternative way, which is frequently used, is to postulate the energy expression

$$
E = \int\Psi^*\hat H\Psi\,d\mathbf R
$$

as a functional of the wave function, $E = E\left[\Psi(\mathbf R)\right]$, and then find the best wave function $\Psi(\mathbf R)$ which minimises it subject to the normalisation condition $\int\Psi^*\Psi\,d\mathbf R = 1$. As usual, we define a function $E(\alpha) = E\left[\Psi + \alpha\,\delta\Psi\right]$ and the variation of the energy, $\delta E = E'(\alpha)\big|_{\alpha=0}$. The optimum condition for the wave function is established by requiring that δH = 0, where

$$
H = \int\Psi^*\hat H\Psi\,d\mathbf R - \lambda\left(\int\Psi^*\Psi\,d\mathbf R - 1\right)
$$

and λ is the Lagrange multiplier taking care of the normalisation. Varying $\Psi^*$ first (Ψ and $\Psi^*$ may be treated as independent), we find

$$
\delta H = \int\delta\Psi^*\left(\hat H\Psi - \lambda\Psi\right)d\mathbf R .
$$

Since $\delta\Psi^*$ is arbitrary, δH = 0 leads to the Schrödinger equation $\hat H\Psi - \lambda\Psi = 0$, with λ equal to the system energy, as above.
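The practical power of this formulation is that any restricted family of trial functions yields an upper bound for the ground state energy. A classic sketch (assuming sympy; atomic units): for the hydrogen atom with the trial function $\psi = e^{-ar}$ one finds $E(a) = a^2/2 - a$, minimised at $a = 1$ with $E_0 = -1/2$, here the exact answer:

# Variational estimate for hydrogen with psi = exp(-a r).
import sympy as sp

a, r = sp.symbols('a r', positive=True)
psi = sp.exp(-a * r)

# Radial matrix elements; the radial Laplacian is (1/r^2) d/dr (r^2 d/dr),
# and the volume element contributes r^2 (angular factors cancel).
norm = sp.integrate(psi**2 * r**2, (r, 0, sp.oo))
kin = sp.integrate(-sp.Rational(1, 2) * psi
                   * sp.diff(r**2 * sp.diff(psi, r), r), (r, 0, sp.oo))
pot = sp.integrate(-psi**2 * r, (r, 0, sp.oo))   # -1/r weighted by r^2

E = sp.simplify((kin + pot) / norm)     # a**2/2 - a
a0 = sp.solve(sp.diff(E, a), a)[0]      # a = 1
print(E, a0, E.subs(a, a0))             # a**2/2 - a, 1, -1/2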
The calculation is a bit lengthier if we vary Ψ instead:

$$
\delta H = \int\Psi^*\hat H\left(\delta\Psi\right)d\mathbf R - \lambda\int\Psi^*\left(\delta\Psi\right)d\mathbf R .
\tag{9.91}
$$

The Hamiltonian operator contains the kinetic energy term, which involves differentiation, and the potential energy operator, which is a multiplication operator. For such an operator one can easily place the function in front of the operator:

$$
\int\Psi^*\,\hat U\left(\delta\Psi\right)d\mathbf R = \int\left(\delta\Psi\right)\hat U\,\Psi^*\,d\mathbf R .
\tag{9.92}
$$
For the kinetic energy operator, consider a single term $\Delta_i$ and write

$$
\int\Psi^*\Delta_i\left(\delta\Psi\right)d\mathbf R = \int d\mathbf R'\left[\int d\mathbf r_i\,\Psi^*\Delta_i\left(\delta\Psi\right)\right],
\tag{9.93}
$$

where in the last passage we separated out the integration over the i-th electron from all the others, which was denoted by $d\mathbf R'$. Next, consider the triple integral over $d\mathbf r_i$ enclosed in the square brackets above:

$$
\int d\mathbf r_i\,\Psi^*\Delta_i\left(\delta\Psi\right) = \int dz_i\int dy_i\left[\int dx_i\,\Psi^*\frac{\partial^2}{\partial x_i^2}\left(\delta\Psi\right)\right] + \int dz_i\int dx_i\left[\int dy_i\,\Psi^*\frac{\partial^2}{\partial y_i^2}\left(\delta\Psi\right)\right] + \int dx_i\int dy_i\left[\int dz_i\,\Psi^*\frac{\partial^2}{\partial z_i^2}\left(\delta\Psi\right)\right].
\tag{9.94}
$$

Each of the integrals in the square brackets can be integrated twice by parts, e.g.:

$$
\int dx_i\,\Psi^*\frac{\partial^2}{\partial x_i^2}\left(\delta\Psi\right) = \left.\Psi^*\frac{\partial}{\partial x_i}\left(\delta\Psi\right)\right|_{-\infty}^{+\infty} - \int dx_i\,\frac{\partial\Psi^*}{\partial x_i}\,\frac{\partial}{\partial x_i}\left(\delta\Psi\right) = -\int dx_i\,\frac{\partial\Psi^*}{\partial x_i}\,\frac{\partial}{\partial x_i}\left(\delta\Psi\right)
$$

$$
= -\left.\frac{\partial\Psi^*}{\partial x_i}\left(\delta\Psi\right)\right|_{-\infty}^{+\infty} + \int dx_i\,\frac{\partial^2\Psi^*}{\partial x_i^2}\left(\delta\Psi\right) = \int dx_i\,\frac{\partial^2\Psi^*}{\partial x_i^2}\,\delta\Psi .
$$

Note that all free terms appearing while integrating by parts disappear, as we assume that the wave function goes to zero, together with its spatial derivatives, at infinity. Repeating this integration for the other two integrals in Eq. (9.94), we obtain⁸

$$
\int d\mathbf r_i\,\Psi^*\Delta_i\left(\delta\Psi\right) = \int d\mathbf r_i\left(\Delta_i\Psi^*\right)\delta\Psi ,
$$
⁸ When $\int\psi\,\hat A\varphi\,dx = \int\left(\hat A\psi\right)\varphi\,dx = \int\varphi\,\hat A\psi\,dx$, i.e., where effectively the functions ψ(x) and φ(x) are allowed to change places around the operator $\hat A$, the operator is called self-adjoint. We have just shown that the operator $d^2/dx^2$ is such an operator.
and hence the same can be done for the whole kinetic energy operator $\hat K = \sum_i\hat K_i$. Combining this result with the similar result for the potential energy operator, Eq. (9.92), we obtain for δH in (9.91):

$$
\delta H = \int\left(\delta\Psi\right)\hat H\Psi^*\,d\mathbf R - \lambda\int\left(\delta\Psi\right)\Psi^*\,d\mathbf R = \int\left(\delta\Psi\right)\left(\hat H\Psi^* - \lambda\Psi^*\right)d\mathbf R ,
$$

so that δH = 0 now yields $\hat H\Psi^* = \lambda\Psi^*$, the complex conjugate of the same Schrödinger equation.
As an instructive application of functional derivatives, consider the Thomas–Fermi (TF) model of a many-electron atom, in which the basic variable is not the wave function Ψ but the electron density

$$
\rho(\mathbf r) = N\int\left|\Psi\left(\mathbf r,\mathbf r_2,\dots,\mathbf r_N\right)\right|^2 d\mathbf r_2\cdots d\mathbf r_N .
$$

The factor N appears here because any of the N electrons can contribute to the density. The proposed TF functional has the form:

$$
E_{TF}\left[\rho(\mathbf r)\right] = C_F\int\rho(\mathbf r)^{5/3}\,d\mathbf r - Z\int\frac{\rho(\mathbf r)}{r}\,d\mathbf r + \frac{1}{2}\iint\frac{\rho(\mathbf r)\,\rho(\mathbf r')}{\left|\mathbf r-\mathbf r'\right|}\,d\mathbf r\,d\mathbf r' .
\tag{9.95}
$$

The first term here corresponds to the energy of a free uniform electron gas occupying the volume dr, $C_F$ being a constant prefactor. The second term corresponds to the attractive interaction of all the electrons of the density with the nucleus of charge Z. Finally, the last term describes the electron–electron interaction: it is simply the Coulomb interaction energy of the charge cloud of density ρ with itself. The energy functional must be supplemented by the normalisation of the density,
$$
\int\rho(\mathbf r)\,d\mathbf r = N\int d\mathbf r\underbrace{\int\cdots\int d\mathbf R'}_{N-1}\left|\Psi(\mathbf R)\right|^2 = N\int\left|\Psi(\mathbf R)\right|^2 d\mathbf R = N .
\tag{9.96}
$$

Therefore, to find the "best" electron density that would minimise the atom energy, we need to vary the energy with respect to the density subject to the normalisation constraint:

$$
\delta H = \delta\left[E_{TF}\left[\rho(\mathbf r)\right] - \mu\left(\int\rho(\mathbf r)\,d\mathbf r - N\right)\right] = \delta E_{TF}\left[\rho(\mathbf r)\right] - \mu\int\delta\rho(\mathbf r)\,d\mathbf r .
\tag{9.97}
$$

Here μ is the Lagrange multiplier.
Problem 9.16. Show, using the method based on the replacement $\rho \to \rho + \alpha\,\delta\rho$, that

$$
\delta E_{TF} = \int\left[\frac{5}{3}C_F\,\rho^{2/3} - \frac{Z}{r} + \int\frac{\rho(\mathbf r')}{\left|\mathbf r-\mathbf r'\right|}\,d\mathbf r'\right]\delta\rho(\mathbf r)\,d\mathbf r = \int\left[\frac{5}{3}C_F\,\rho^{2/3} - V(\mathbf r)\right]\delta\rho(\mathbf r)\,d\mathbf r ,
$$

where

$$
V(\mathbf r) = \frac{Z}{r} - \int\frac{\rho(\mathbf r')}{\left|\mathbf r-\mathbf r'\right|}\,d\mathbf r'
$$

is the electrostatic potential due to the nucleus and the entire electron cloud.
Problem 9.17. Consider the functional

$$
H = E_{TF}\left[\rho(\mathbf r)\right] - \mu\left(\int\rho(\mathbf r)\,d\mathbf r - N\right).
$$

Show that its functional derivative is

$$
\frac{\delta H}{\delta\rho(\mathbf r)} = \frac{5}{3}C_F\,\rho(\mathbf r)^{2/3} - V(\mathbf r) - \mu .
$$
Using the above results, the following equation for the electron density is obtained:

$$
\delta H = 0\quad\Longrightarrow\quad \frac{5}{3}C_F\,\rho(\mathbf r)^{2/3} - V(\mathbf r) = \mu .
$$

This is a rather complex integral equation for the density ρ. It is to be solved together with the normalisation condition (9.96), which is required to determine the Lagrange multiplier μ. The TF model gives plausible results for some many-electron atoms, but fails miserably for molecules (which would only require a modification of the second term in the energy expression (9.95)): no binding is obtained at all.
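The structure of the TF problem is nevertheless simple enough to be solved on a computer; below is a minimal self-consistency sketch on a radial grid (assuming numpy; the grid, the simple mixing scheme and the atomic-units value $C_F = \frac{3}{10}\left(3\pi^2\right)^{2/3}$ are our assumptions, not taken from the text above):

# Self-consistent solution of (5/3) C_F rho^{2/3} = mu + V(r) for a
# neutral "atom": iterate rho -> V -> rho, fixing mu each time by
# bisection so that the density integrates to N electrons.
import numpy as np

Z = N = 18
CF = 0.3 * (3 * np.pi**2) ** (2.0 / 3.0)
r = np.linspace(1e-4, 30.0, 20000)
dr = r[1] - r[0]
rho = np.full_like(r, N / (4.0 / 3.0 * np.pi * r[-1] ** 3))

def hartree(rho):
    # phi(r) = (1/r) int_0^r rho 4*pi*s^2 ds + int_r^inf rho 4*pi*s ds
    q_in = np.cumsum(rho * 4 * np.pi * r**2) * dr
    phi_out = np.cumsum((rho * 4 * np.pi * r)[::-1])[::-1] * dr
    return q_in / r + phi_out

def density(mu, V):
    t = np.maximum(mu + V, 0.0)          # rho vanishes where mu + V < 0
    return (3.0 * t / (5.0 * CF)) ** 1.5

for it in range(200):
    V = Z / r - hartree(rho)             # nucleus + electron cloud
    lo, hi = -50.0, 50.0                 # bisection for mu
    for _ in range(60):
        mu = 0.5 * (lo + hi)
        n = np.sum(density(mu, V) * 4 * np.pi * r**2) * dr
        lo, hi = (mu, hi) if n < N else (lo, mu)
    rho = 0.8 * rho + 0.2 * density(mu, V)   # simple mixing

print("mu =", mu)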
Still, from a historical point of view, the TF model is very important. Essentially, this model avoids dealing with the wave function, a very complicated object indeed since it depends on many electronic variables; instead, it was proposed to work only with the electron density ρ(r), which depends on three variables only. The TF model was the first attempt to implement this idea of replacing Ψ with ρ, and is an early predecessor of modern density functional theory (DFT), developed by P. Hohenberg, W. Kohn and L.J. Sham (in 1964–1965), in which the idea has been successfully developed and implemented into a powerful computational technique that is nowadays widely used in condensed matter physics, materials science and computational chemistry.
In Kohn–Sham DFT the electron density is used as well; however, the variational problem is formulated for one-particle wave functions $\psi_a(\mathbf r)$, called orbitals, which are required to form an orthonormal set:

$$
\int\psi_a^*(\mathbf r)\,\psi_{a'}(\mathbf r)\,d\mathbf r = \delta_{aa'} .
$$

The idea is to map the real interacting many-electron system onto an artificial non-interacting system of the same number of electrons N and of the same electron density ρ(r); the fictitious system of electrons is subjected to an effective external potential to be determined self-consistently. Because of that mapping, the electron density can be written explicitly as a sum of the densities $\rho_a = \left|\psi_a(\mathbf r)\right|^2$ due to each electron:

$$
\rho(\mathbf r) = \sum_{a=1}^{N}\left|\psi_a(\mathbf r)\right|^2 .
\tag{9.98}
$$

Moreover, the kinetic energy of all electrons can also be calculated as a sum of kinetic energies due to each individual electron:

$$
K = \sum_{a=1}^{N}\int\psi_a^*(\mathbf r)\left(-\frac{1}{2}\Delta\right)\psi_a(\mathbf r)\,d\mathbf r .
\tag{9.99}
$$
Then the total energy functional of the electron density is proposed to be of the following form:

$$
E_{DFT}\left[\{\psi_a\}\right] = \sum_{a=1}^{N}\int\psi_a^*\left(-\frac{1}{2}\Delta\right)\psi_a\,d\mathbf r + \frac{1}{2}\iint\frac{\rho(\mathbf r)\,\rho(\mathbf r')}{\left|\mathbf r-\mathbf r'\right|}\,d\mathbf r\,d\mathbf r' + \int\rho(\mathbf r)\,V_n(\mathbf r)\,d\mathbf r + E_{xc}[\rho] .
\tag{9.100}
$$

Here the first term is the kinetic energy of the fictitious electron gas; the second term describes the bare Coulomb interaction between the electrons; next is the term describing the interaction of the electrons with the potential of the nuclei, $V_n(\mathbf r)$. Finally, the last, so-called exchange–correlation term $E_{xc}$ describes all effects of exchange and correlation in the electron gas. This term also absorbs the error due to replacing the kinetic energy of the real interacting gas with that of the non-interacting gas. The exact expression for $E_{xc}$ is not known; however, good approximations exist. In the so-called local density approximation (LDA)

$$
E_{xc}[\rho] = \int\rho(\mathbf r)\,\epsilon_{xc}\left(\rho(\mathbf r)\right)d\mathbf r ,
\tag{9.101}
$$

where $\epsilon_{xc}(\rho)$ is the exchange–correlation energy per electron of a uniform electron gas of density ρ.
Problem 9.18. Obtain the following expressions for the functional derivatives of the different terms in the energy functional:

$$
\frac{\delta\rho(\mathbf r)}{\delta\psi_a^*(\mathbf r_1)} = \delta\left(\mathbf r-\mathbf r_1\right)\psi_a\left(\mathbf r_1\right),\qquad \frac{\delta K}{\delta\psi_a^*(\mathbf r_1)} = -\frac{1}{2}\Delta_{r_1}\psi_a\left(\mathbf r_1\right),
$$

$$
\frac{\delta}{\delta\psi_a^*(\mathbf r_1)}\left[\frac{1}{2}\iint\frac{\rho(\mathbf r)\,\rho(\mathbf r')}{\left|\mathbf r-\mathbf r'\right|}\,d\mathbf r\,d\mathbf r'\right] = \psi_a\left(\mathbf r_1\right)\int\frac{\rho(\mathbf r')}{\left|\mathbf r_1-\mathbf r'\right|}\,d\mathbf r' ,
$$

$$
\frac{\delta}{\delta\psi_a^*(\mathbf r_1)}\int\rho(\mathbf r)\,V_n(\mathbf r)\,d\mathbf r = V_n\left(\mathbf r_1\right)\psi_a\left(\mathbf r_1\right),
$$

$$
\frac{\delta E_{xc}}{\delta\psi_a^*(\mathbf r_1)} = \left[\epsilon_{xc}(\rho) + \rho\,\frac{d\epsilon_{xc}}{d\rho}\right]_{\rho=\rho(\mathbf r_1)}\psi_a\left(\mathbf r_1\right) = V_{xc}\left(\rho(\mathbf r_1)\right)\psi_a\left(\mathbf r_1\right).
$$

Here $\Delta_{r_1}$ is the Laplacian with respect to the point $\mathbf r_1$, and $V_{xc}$ is called the exchange–correlation potential.
Once we have all the necessary derivatives, we can consider the optimum of the energy functional (9.100) subject to the condition that the orbitals form an orthonormal set. Consider the appropriate auxiliary functional:

$$
H\left[\{\psi_a\}\right] = E_{DFT}\left[\{\psi_a\}\right] - \sum_{a,a'}\lambda_{aa'}\left(\int\psi_a^*\,\psi_{a'}\,d\mathbf r - \delta_{aa'}\right),
$$

where the numbers $\lambda_{aa'}$ are the corresponding Lagrange multipliers; they form a square matrix with the size equal to the number of orbitals we use. This gives

$$
\frac{\delta H}{\delta\psi_a^*(\mathbf r_1)} = \left[-\frac{1}{2}\Delta_{r_1} + \int\frac{\rho(\mathbf r')}{\left|\mathbf r_1-\mathbf r'\right|}\,d\mathbf r' + V_n\left(\mathbf r_1\right) + V_{xc}\left(\mathbf r_1\right)\right]\psi_a\left(\mathbf r_1\right) - \sum_{a'}\lambda_{aa'}\,\psi_{a'}\left(\mathbf r_1\right).
$$

Denoting the Kohn–Sham operator in the square brackets by $\hat F_{KS}$, the stationarity condition δH = 0 yields the equations

$$
\hat F_{KS}\,\psi_a = \sum_{a'}\lambda_{aa'}\,\psi_{a'} .
\tag{9.103}
$$
Problem 9.19. Repeat the previous steps by calculating the functional derivative of H with respect to $\psi_a(\mathbf r_1)$, not $\psi_a^*(\mathbf r_1)$, i.e., opposite to what has been done above. Show that in this case we obtain the following equation:

$$
\hat F_{KS}\,\psi_a^* = \sum_{a'}\lambda_{a'a}\,\psi_{a'}^* .
\tag{9.104}
$$

Taking the complex conjugate of the above equation and comparing it with Eq. (9.103), we see that $\lambda_{aa'} = \lambda_{a'a}^*$, i.e., the matrix $\lambda = \left(\lambda_{aa'}\right)$ of Lagrange multipliers must be Hermitian.
The obtained Eqs. (9.103) are not yet final. The point is that one can choose another set of orbitals, $\varphi_b(\mathbf r)$, as a linear combination of the old ones,

$$
\varphi_b(\mathbf r) = \sum_a u_{ba}\,\psi_a(\mathbf r),
\tag{9.105}
$$

where the matrix $U = \left(u_{ba}\right)$ is unitary, i.e., $\sum_b u_{ba'}^*\,u_{ba} = \delta_{aa'}$. Then the old orbitals can easily be expressed via the new ones: multiply both sides of (9.105) by $u_{ba'}^*$ and sum over b:

$$
\sum_b u_{ba'}^*\,\varphi_b(\mathbf r) = \sum_a\underbrace{\left(\sum_b u_{ba'}^*\,u_{ba}\right)}_{\delta_{aa'}}\psi_a(\mathbf r)\quad\Longrightarrow\quad \psi_a = \sum_b u_{ba}^*\,\varphi_b .
$$

Problem 9.20. Prove that the new orbitals form an orthonormal set, as did the old ones.
Then it is easy to see that the electron density can be expressed via the new orbitals in exactly the same way as when using the old ones:

$$
\rho = \sum_a\psi_a^*\,\psi_a = \sum_a\sum_{bb'}u_{ba}\,u_{b'a}^*\,\varphi_b^*\,\varphi_{b'} = \sum_{bb'}\underbrace{\left(\sum_a u_{ba}\,u_{b'a}^*\right)}_{\delta_{bb'}}\varphi_b^*\,\varphi_{b'} = \sum_b\varphi_b^*\,\varphi_b = \sum_b\left|\varphi_b\right|^2 .
\tag{9.106}
$$
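This invariance is trivial to confirm numerically; a sketch assuming numpy, with random complex orbitals mixed by a random unitary matrix:

# Check Eq. (9.106): a unitary mix of the orbitals leaves the density
# sum_a |psi_a(r)|^2 unchanged at every grid point.
import numpy as np

rng = np.random.default_rng(0)
Npts, Norb = 200, 5
Psi = (rng.standard_normal((Npts, Norb))
       + 1j * rng.standard_normal((Npts, Norb)))      # columns = orbitals
U, _ = np.linalg.qr(rng.standard_normal((Norb, Norb))
                    + 1j * rng.standard_normal((Norb, Norb)))  # unitary

Phi = Psi @ U.T                       # phi_b = sum_a u_{ba} psi_a
rho_old = np.sum(np.abs(Psi)**2, axis=1)
rho_new = np.sum(np.abs(Phi)**2, axis=1)
print(np.max(np.abs(rho_old - rho_new)))    # ~1e-15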
We may say that the electron density is invariant with respect to a unitary transformation of the orbitals. Hence, since the density remains of the same form when expressed via either the old or the new set of orbitals, and using the fact that the operator $\hat F_{KS}$ depends entirely on the density, one can rewrite equations (9.103) via the new orbitals:

$$
\hat F_{KS}\,\psi_a = \sum_{a'}\lambda_{aa'}\,\psi_{a'}\quad\Longrightarrow\quad \hat F_{KS}\sum_b u_{ba}^*\,\varphi_b = \sum_{a'}\lambda_{aa'}\sum_b u_{ba'}^*\,\varphi_b .
$$

Multiplying both sides by $u_{b'a}$, summing over a and using the unitarity of U, we obtain

$$
\hat F_{KS}\,\varphi_{b'} = \sum_b\tilde\epsilon_{b'b}\,\varphi_b ,
$$

where

$$
\tilde\epsilon_{b'b} = \sum_{aa'}u_{b'a}\,\lambda_{aa'}\,u_{ba'}^* .
\tag{9.108}
$$

Since the matrix λ is Hermitian, the unitary matrix U can always be chosen such that $\tilde\epsilon = U\lambda U^\dagger$ is diagonal, $\tilde\epsilon_{b'b} = \tilde\epsilon_b\,\delta_{b'b}$, in which case

$$
\hat F_{KS}\,\varphi_b = \tilde\epsilon_b\,\varphi_b .
\tag{9.109}
$$

These are called the Kohn–Sham equations. It is said that the orbitals $\varphi_b$ are eigenfunctions of the Kohn–Sham operator $\hat F_{KS}$ with the eigenvalues $\tilde\epsilon_b$, in close analogy to eigenvectors and eigenvalues of a matrix.
These equations are to be solved self-consistently together with Eq. (9.106). First, some orbitals $\{\varphi_b\}$ are assumed. These allow one to calculate the density ρ(r) and hence the Kohn–Sham operator $\hat F_{KS}$. Once this is known, new eigenfunctions $\{\varphi_b\}$ can be obtained by solving the eigenproblem (9.109), which gives an updated density, and so on. The iterative process is stopped when the density does not change any more (within a numerical tolerance). The obtained electron density corresponds to the density of the real electron gas, and the total energy $E_{KS}$ calculated with this density gives the total electronic energy of the system. The orbitals $\varphi_a$ and the corresponding eigenvalues $\tilde\epsilon_a$ do not, strictly speaking, have a solid physical meaning; however, in actual calculations they are interpreted as effective one-electron wave functions and energies, respectively.
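The logic of the self-consistency loop is captured by the following schematic sketch (assuming numpy) for a one-dimensional toy problem; the harmonic external potential, the mock local interaction $V_{int} = \rho(x)$ and the mixing factor are illustrative assumptions, not a real exchange–correlation functional:

# A toy 1D "Kohn-Sham" SCF loop: build F from the density, solve the
# eigenproblem, update the density, mix, repeat until converged.
import numpy as np

Npts, Norb = 400, 3
x = np.linspace(-8.0, 8.0, Npts)
h = x[1] - x[0]
Vn = 0.5 * x**2                              # external potential
rho = np.zeros(Npts)

# -1/2 d^2/dx^2 by central finite differences
T = (np.diag(np.full(Npts, 1.0 / h**2))
     - 0.5 * np.diag(np.full(Npts - 1, 1.0 / h**2), 1)
     - 0.5 * np.diag(np.full(Npts - 1, 1.0 / h**2), -1))

for it in range(100):
    F = T + np.diag(Vn + rho)                # Kohn-Sham-like operator
    eps, phi = np.linalg.eigh(F)
    phi = phi / np.sqrt(h)                   # normalise on the grid
    rho_new = np.sum(phi[:, :Norb]**2, axis=1)
    if np.max(np.abs(rho_new - rho)) < 1e-8: # density converged
        break
    rho = 0.5 * rho + 0.5 * rho_new          # simple density mixing

print(it, eps[:Norb])                        # lowest orbital energies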
In the so-called Hartree–Fock (HF) method the total energy of a system of electrons is written via a more general object than the density ρ(r) itself, which is called the density matrix (for simplicity, we completely neglect spin here, even though it is essential in the formulation of the method):

$$
\rho\left(\mathbf r,\mathbf r'\right) = \sum_a\psi_a(\mathbf r)\,\psi_a^*\left(\mathbf r'\right).
$$

The "diagonal element" of this object, ρ(r, r), is the same as the density ρ(r). The corresponding energy functional is still a functional of all the orbitals and reads:

$$
E_{HF}\left[\{\psi_a\}\right] = \int\left[\left(-\frac{1}{2}\Delta_r + V_n(\mathbf r)\right)\rho\left(\mathbf r,\mathbf r'\right)\right]_{\mathbf r'\to\mathbf r}d\mathbf r + \frac{1}{2}\iint\frac{d\mathbf r\,d\mathbf r'}{\left|\mathbf r-\mathbf r'\right|}\left[\rho(\mathbf r,\mathbf r)\,\rho\left(\mathbf r',\mathbf r'\right) - \rho\left(\mathbf r,\mathbf r'\right)\rho\left(\mathbf r',\mathbf r\right)\right].
$$

The first term describes the kinetic energy of the electrons together with the energy of their interaction with the nuclei (note that the notation $\mathbf r'\to\mathbf r$ means that after application of the Laplacian to the density matrix one has to set r′ equal to r); the last term describes both the Coulomb and the exchange interaction of the electrons with each other.
Problem 9.21. By imposing the condition for the orbitals to form an orthonormal set and setting the variation of the corresponding auxiliary functional to zero, show that the equations for the orbitals $\psi_a$ are determined by solving the eigenproblem

$$
\hat F_{HF}\,\psi_a = \epsilon_a\,\psi_a .
$$
Index

T
Taylor, 318
Taylor's expansion, 185
Taylor's formula, 185, 186
theta-function transformation, 289
Thomas–Fermi model, 660
three-dimensional delta function, 397
trace of matrix, 92
transpose matrix, 20
transverse oscillations of rod, 654
triangular decomposition of matrix, 98
triangular matrix, 50
tridiagonal matrix, 50, 60, 93, 97, 111, 119
trigonometric form of Fourier integral, 404
tunneling, 238

U
uncertainty principle, 411
uniform convergence, 175, 180, 406, 496
unit base vectors of curvilinear system, 503
unit impulse function, 410
unit matrix, 16
unitary matrix, 30, 70, 75
unitary transformation, 30

V
variation of functional, 611
vector field, 501, 523, 525, 528, 530
vector-column, 16
vector-row, 16
volume of N-dimensional sphere, 521
volume of sphere, 520
von Karman, 292

W
wave, 232, 670
wave equation, 234, 533, 545, 553–555
wave frequency, 233
wave phase velocity, 233
wavefunction, 117
wavelength, 233
wavevector, 113, 233, 420
weight function, 8, 330
white noise, 487
Wronskian, 227

Z
zero boundary conditions, 566, 573, 577, 578, 583, 588, 591, 597
zero of order n of f(z), 195
zero point energy, 383