Ammar Khanfer - Applied Functional Analysis-Springer (2024)


Ammar Khanfer

Applied Functional Analysis


Ammar Khanfer
Department of Mathematics and Sciences
Prince Sultan University
Riyadh, Saudi Arabia

ISBN 978-981-99-3787-5 ISBN 978-981-99-3788-2 (eBook)


https://doi.org/10.1007/978-981-99-3788-2

Mathematics Subject Classification: 46B70, 46B50, 46A22, 47B07, 47B38, 47B99, 46B25, 46A30,
54E52, 46C05, 35D30, 35J20, 35A15

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface

The present book is the third volume of our series in advanced analysis, the first
two volumes being
• Volume 1: “Measure Theory and Integration”.
• Volume 2: “Fundamentals of Functional Analysis”.
The field of applied functional analysis concerns the applications of functional
analysis to various areas of applied mathematics and includes many subfields
and research directions. Historically, functional analysis emerged from the
investigation of minimization problems in the calculus of variations (COV), but it
soon thrived once connected to the theory of partial differential equations. The
theories of Sobolev spaces and distributions were established in the first half of
the twentieth century to offer genuine and brilliant answers to the big question of
the existence of solutions of PDEs. This direction (Sobolev spaces, minimization
problems in COV, existence and uniqueness theorems for solutions of PDEs,
regularity theory) is one of the greatest and most remarkable mathematical
achievements of the twentieth century, and should be regarded as one of the most
successful stories in the history of mathematics.
The present volume highlights this direction and introduces the reader to its
fundamentals and main theories, providing a careful treatment of the subject with
clear exposition in a student-friendly manner. The book is intended to help
students and junior researchers focusing on the theory of PDEs and the calculus of
variations. It can serve as a one-semester or two-semester graduate course for
mathematics students concentrating on analysis. Essential prerequisites are real
analysis, functional analysis, and linear algebra; a course on PDEs is helpful but
not necessary. The book consists of five chapters, each with eleven sections.
Chapter 1 discusses bounded linear operators: compact operators, Hilbert–
Schmidt operators, self-adjoint operators and their spectral properties, and the
Fredholm alternative theorem. Unbounded operators are then discussed in detail,
with a special focus on differential and integral operators.


Chapter 2 introduces distribution theory, motivated by the discussion of Green’s
function. It introduces the Dirac delta, then regular and singular distributions and
their derivatives. The theory is then connected with the Fourier transform,
Schwartz spaces, and tempered distributions.
Chapter 3 gives a comprehensive discussion of Sobolev spaces and their
properties. Approximations of Sobolev spaces are studied in detail, and
inequalities and embedding results are then treated carefully.
Chapter 4 discusses elliptic theory as an application of the theory of Sobolev
spaces. This topic connects applied functional analysis with the theory of partial
differential equations. The focus of the chapter is mainly on the applications of
Sobolev spaces to PDEs and the dominant role they play in establishing solutions of
the equations. The chapter ends with some elliptic regularity results which discuss
the regularity and smoothness of the weak solutions.
Chapter 5 introduces the calculus of variations as an important application of the
theory of Sobolev spaces, which play a central role in establishing the existence
of minimizers of integral functionals via the direct method. The Gâteaux
derivative is introduced as a generalization of the notion of derivative to
infinite-dimensional normed spaces.
Each chapter ends with a collection of problems. These problems aim to test
whether the reader has absorbed the material and gained a comprehensive
understanding. Unlike the first two volumes, this volume provides no hints or
solutions to the problems: I expect that readers at this stage of study have
acquired the knowledge and skills needed to handle the problems independently,
and should be able to tackle them after a careful study of the material, which will
significantly improve their analytical skills and deepen their understanding of the
topics.

Riyadh, Saudi Arabia Ammar Khanfer
2023
Acknowledgments

I am forever grateful and thankful to God for giving me the strength, health,
knowledge, and patience to endure and complete this work successfully.
I would like to express my sincere thanks to Prince Sultan University for its
continuing support. I also wish to express my deep thanks and gratitude to Prof.
Mahmoud Al Mahmoud, the dean of our college (CHS), and Prof. Wasfi Shatanawi,
the chair of our department (MSD), for their support and recognition of my work. My
sincerest thanks to my colleagues in our department for their warm encouragement.

Contents

1 Operator Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Quick Review of Hilbert Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Lebesgue Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 Convergence Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.3 Complete Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.4 Hilbert Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.5 Fundamental Mapping Theorems on Banach Spaces . . . . 4
1.2 The Adjoint of an Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.1 Bounded Linear Operators . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Definition of Adjoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.3 Adjoint Operator on Hilbert Spaces . . . . . . . . . . . . . . . . . . . 7
1.2.4 Self-adjoint Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Compact Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.1 Definition and Properties of Compact Operators . . . . . . . . 8
1.3.2 The Integral Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.3 Finite-Rank Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4 Hilbert–Schmidt Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4.1 Definition of Hilbert–Schmidt Operator . . . . . . . . . . . . . . . 14
1.4.2 Basic Properties of HS Operators . . . . . . . . . . . . . . . . . . . . . 16
1.4.3 Relations with Compact and Finite-Rank Operators . . . . . 17
1.4.4 The Fredholm Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.4.5 Characterization of HS Operators . . . . . . . . . . . . . . . . . . . . 22
1.5 Eigenvalues of Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.5.1 Spectral Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.5.2 Definition of Eigenvalues and Eigenfunctions . . . . . . . . . . 24
1.5.3 Eigenvalues of Self-adjoint Operators . . . . . . . . . . . . . . . . . 24
1.5.4 Eigenvalues of Compact Operators . . . . . . . . . . . . . . . . . . . 26
1.6 Spectral Analysis of Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.6.1 Resolvent and Regular Values . . . . . . . . . . . . . . . . . . . . . . . 28
1.6.2 Bounded Below Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.6.3 Spectrum of Bounded Operator . . . . . . . . . . . . . . . . . . . . . . 30

1.6.4 Spectral Mapping Theorem . . . . . . . . . . . . . . . . . . . . . . 33
1.6.5 Spectrum of Compact Operators . . . . . . . . . . . . . . . . . . . . . 33
1.7 Spectral Theory of Self-adjoint Compact Operators . . . . . . . . . . . . 33
1.7.1 Eigenvalues of Compact Self-adjoint Operators . . . . . . . . 33
1.7.2 Invariant Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.7.3 Hilbert–Schmidt Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.7.4 Spectral Theorem For Self-adjoint Compact
Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.8 Fredholm Alternative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.8.1 Resolvent of Compact Operators . . . . . . . . . . . . . . . . . . . . . 41
1.8.2 Fundamental Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
1.8.3 Fredholm Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
1.8.4 Volterra Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
1.9 Unbounded Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
1.9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
1.9.2 Closed Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
1.9.3 Basic Properties of Unbounded Operators . . . . . . . . . . 49
1.9.4 Toeplitz Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
1.9.5 Adjoint of Unbounded Operators . . . . . . . . . . . . . . . . . . . . . 52
1.9.6 Deficiency Spaces of Unbounded Operators . . . . . . . . . . . 54
1.9.7 Symmetry of Unbounded Operators . . . . . . . . . . . . . . . . . . 55
1.9.8 Spectral Properties of Unbounded Operators . . . . . . . . . . . 58
1.10 Differential Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
1.10.1 Green’s Function and Dirac Delta . . . . . . . . . . . . . . . . . . . . 61
1.10.2 Laplacian Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
1.10.3 Sturm–Liouville Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
1.10.4 Momentum Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
1.11 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
2 Distribution Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.1 The Notion of Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.1.1 Motivation For Distributions . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.1.2 Test Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.1.3 Definition of Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
2.2 Regular Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
2.2.1 Locally Integrable Functions . . . . . . . . . . . . . . . . . . . . . . . . 84
2.2.2 Notion of Regular Distribution . . . . . . . . . . . . . . . . . . . . . . . 84
2.2.3 The Dual Space D′ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
2.2.4 Basic Properties of Regular Distributions . . . . . . . . . . . . . . 87
2.3 Singular Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
2.3.1 Notion of Singular Distribution . . . . . . . . . . . . . . . . . . . . . . 87
2.3.2 Dirac Delta Distribution . . . . . . . . . . . . . . . . . . . . . . . . 88
2.3.3 Delta Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
2.3.4 Gaussian Delta Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
2.4 Differentiation of Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
2.4.1 Notion of Distributional Derivative . . . . . . . . . . . . . . . . . . . 96
2.4.2 Calculus Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
2.4.3 Examples of Distributional Derivatives . . . . . . . . . . . . . . . . 98
2.4.4 Properties of δ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
2.5 The Fourier Transform Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
2.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
2.5.2 Fourier Transform on R^n . . . . . . . . . . . . . . . . . . . . . . . 102
2.5.3 Existence of Fourier Transform . . . . . . . . . . . . . . . . . . . . . . 103
2.5.4 Plancherel Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
2.6 Schwartz Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
2.6.1 Rapidly Decreasing Functions . . . . . . . . . . . . . . . . . . . . . . . 105
2.6.2 Definition of Schwartz Space . . . . . . . . . . . . . . . . . . . . . . . . 107
2.6.3 Derivatives of Schwartz Functions . . . . . . . . . . . . . . . . . . . . 108
2.6.4 Isomorphism of Fourier Transform on Schwartz
Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
2.7 Tempered Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
2.7.1 Definition of Tempered Distribution . . . . . . . . . . . . . . . . . . 112
2.7.2 Functions of Slow Growth . . . . . . . . . . . . . . . . . . . . . . . . . . 113
2.7.3 Examples of Tempered Distributions . . . . . . . . . . . . . . . . . . 114
2.8 Fourier Transform of Tempered Distribution . . . . . . . . . . . . . . . . . . 116
2.8.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
2.8.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
2.8.3 Derivative of F.T. of Tempered Distribution . . . . . . . . . . . . 117
2.9 Inversion Formula of The Fourier Transform . . . . . . . . . . . . . . . . . . 118
2.9.1 Fourier Transform of Gaussian Function . . . . . . . . . . . . . . 119
2.9.2 Fourier Transform of Delta Distribution . . . . . . . . . . . . . . . 122
2.9.3 Fourier Transform of Sign Function . . . . . . . . . . . . . . . . . . 123
2.10 Convolution of Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
2.10.1 Derivatives of Convolutions . . . . . . . . . . . . . . . . . . . . . . . . . 124
2.10.2 Convolution in Schwartz Space . . . . . . . . . . . . . . . . . . . . . . 124
2.10.3 Definition of Convolution of Distributions . . . . . . . . . . . . . 125
2.10.4 Fundamental Property of Convolutions . . . . . . . . . . . . . . . . 125
2.10.5 Fourier Transform of Convolution . . . . . . . . . . . . . . . . . . . . 126
2.11 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
3 Theory of Sobolev Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
3.1 Weak Derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
3.1.1 Notion of Weak Derivative . . . . . . . . . . . . . . . . . . . . . . . . . . 133
3.1.2 Basic Properties of Weak Derivatives . . . . . . . . . . . . . . . . . 135
3.1.3 Pointwise Versus Weak Derivatives . . . . . . . . . . . . . . . . . . . 136
3.1.4 Weak Derivatives and Fourier Transform . . . . . . . . . . . . . . 138
3.2 Regularization and Smoothening . . . . . . . . . . . . . . . . . . . . . . . . . . 139
3.2.1 The Concept of Mollification . . . . . . . . . . . . . . . . . . . . . . . . 139
3.2.2 Mollifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
3.2.3 Cut-Off Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
3.2.4 Partition of Unity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
3.2.5 Fundamental Lemma of Calculus of Variations . . . . . . . . . 148
3.3 Density of Schwartz Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
3.3.1 Convergence of Approximating Sequence . . . . . . . . . . . . . 149
3.3.2 Approximations of S and L p . . . . . . . . . . . . . . . . . . . . . . . . 151
3.3.3 Generalized Plancherel Theorem . . . . . . . . . . . . . . . . . . . . . 154
3.4 Construction of Sobolev Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
3.4.1 Completion of Schwartz Spaces . . . . . . . . . . . . . . . . . . . . . . 155
3.4.2 Definition of Sobolev Space . . . . . . . . . . . . . . . . . . . . . . . . . 156
3.4.3 Fractional Sobolev Space . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
3.5 Basic Properties of Sobolev Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 159
3.5.1 Convergence in Sobolev Spaces . . . . . . . . . . . . . . . . . . . . . . 159
3.5.2 Completeness and Reflexivity of Sobolev Spaces . . . . . . . 161
3.5.3 Local Sobolev Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
3.5.4 Leibniz Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
3.5.5 Mollification with Sobolev Function . . . . . . . . . . . . . . . . . . 165
3.5.6 W_0^{k,p}(Ω) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
3.6 W^{1,p}(Ω) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
3.6.1 Absolute Continuity Characterization . . . . . . . . . . . . . . . . . 167
3.6.2 Inclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
3.6.3 Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
3.6.4 Dual Space of W^{1,p}(Ω) . . . . . . . . . . . . . . . . . . . . . . . 174
3.7 Approximation of Sobolev Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
3.7.1 Local Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
3.7.2 Global Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
3.7.3 Consequences of Meyers–Serrin Theorem . . . . . . . . . . . . . 179
3.8 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
3.8.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
3.8.2 The Zero Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
3.8.3 Coordinate Transformations . . . . . . . . . . . . . . . . . . . . . . . . . 188
3.8.4 Extension Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
3.9 Sobolev Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
3.9.1 Sobolev Exponent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
3.9.2 Fundamental Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
3.9.3 Gagliardo–Nirenberg–Sobolev Inequality . . . . . . . . . . . . . 204
3.9.4 Poincaré Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
3.9.5 Estimate for W^{1,p} . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
3.9.6 The Case p = n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
3.9.7 Hölder Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
3.9.8 The Case p > n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
3.9.9 General Sobolev Inequalities . . . . . . . . . . . . . . . . . . . . . . . . 216
3.10 Embedding Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
3.10.1 Compact Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
3.10.2 Rellich–Kondrachov Theorem . . . . . . . . . . . . . . . . . . . . . . . 221
3.10.3 High Order Sobolev Estimates . . . . . . . . . . . . . . . . . . . . . . . 225
3.10.4 Sobolev Embedding Theorem . . . . . . . . . . . . . . . . . . . . . . . 226
3.10.5 Embedding of Fractional Sobolev Spaces . . . . . . . . . . . . . . 227
3.11 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
4 Elliptic Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
4.1 Elliptic Partial Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . 239
4.1.1 Elliptic Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
4.1.2 Uniformly Elliptic Operator . . . . . . . . . . . . . . . . . . . . . . . . . 240
4.1.3 Elliptic PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
4.2 Weak Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
4.2.1 Motivation for Weak Solutions . . . . . . . . . . . . . . . . . . . . . . . 242
4.2.2 Weak Formulation of Elliptic BVP . . . . . . . . . . . . . . . . . . . 243
4.2.3 Classical Versus Strong Versus Weak Solutions . . . . . . . . 246
4.3 Poincaré Equivalent Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
4.3.1 Poincaré Inequality on H_0^1 . . . . . . . . . . . . . . . . . . . . 247
4.3.2 Equivalent Norm on H_0^1 . . . . . . . . . . . . . . . . . . . . . . 248
4.3.3 Poincaré–Wirtinger Inequality . . . . . . . . . . . . . . . . . . . 249
4.3.4 Quotient Sobolev Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
4.4 Elliptic Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
4.4.1 Bilinear Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
4.4.2 Elliptic Bilinear Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
4.4.3 Gårding’s Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
4.5 Symmetric Elliptic Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
4.5.1 Riesz Representation Theorem for Hilbert Spaces . . . . . . 255
4.5.2 Existence and Uniqueness Theorem—Poisson’s
Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
4.5.3 Existence and Uniqueness Theorem—Helmholtz
Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
4.5.4 Ellipticity and Coercivity . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
4.5.5 Existence and Uniqueness Theorem—Symmetric
Uniformly Elliptic Operator . . . . . . . . . . . . . . . . . . . . . . 260
4.6 General Elliptic Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
4.6.1 Lax–Milgram Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
4.6.2 Dirichlet Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
4.6.3 Neumann Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
4.7 Spectral Properties of Elliptic Operators . . . . . . . . . . . . . . . . . . . . . . 268
4.7.1 Resolvent of Elliptic Operators . . . . . . . . . . . . . . . . . . . . . . 268
4.7.2 Fredholm Alternative for Elliptic Operators . . . . . . . . . . . . 270
4.7.3 Spectral Theorem for Elliptic Operators . . . . . . . . . . . . . . . 271
4.8 Self-adjoint Elliptic Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
4.8.1 The Adjoint of Elliptic Bilinear Form . . . . . . . . . . . . . . 271
4.8.2 Eigenvalue Problem of Elliptic Operators . . . . . . . . . . . . 273
4.8.3 Spectral Theorem of Elliptic Operator . . . . . . . . . . . . . . . . 274
4.9 Regularity for the Poisson Equation . . . . . . . . . . . . . . . . . . . . . . . . . . 275
4.9.1 Weyl’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
4.9.2 Difference Quotients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
4.9.3 Caccioppoli’s Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
4.9.4 Interior Regularity for Poisson Equation . . . . . . . . . . . . . . . 280
4.10 Regularity for General Elliptic Equations . . . . . . . . . . . . . . . . . . . . . 283
4.10.1 Interior Regularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
4.10.2 Higher Order Interior Regularity . . . . . . . . . . . . . . . . . . . . . 286
4.10.3 Interior Smoothness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
4.10.4 Boundary Regularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
4.11 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
5 Calculus of Variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
5.1 Minimization Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
5.1.1 Definition of Minimization Problem . . . . . . . . . . . . . . . . . . 295
5.1.2 Lower Semicontinuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
5.1.3 Minimization Problems in Finite-Dimensional
Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
5.1.4 Convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
5.1.5 Minimization in Infinite-Dimensional Space . . . . . . . . . . . 300
5.2 Weak Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
5.2.1 Notion of Weak Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
5.2.2 Weak Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
5.2.3 Weakly Closed Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
5.2.4 Reflexive Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
5.2.5 Weakly Lower Semicontinuity . . . . . . . . . . . . . . . . . . . . . . . 303
5.3 Direct Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
5.3.1 Direct Versus Indirect Methods . . . . . . . . . . . . . . . . . . 305
5.3.2 Minimizing Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
5.3.3 Procedure of Direct Method . . . . . . . . . . . . . . . . . . . . . . . . . 306
5.3.4 Coercivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
5.3.5 The Main Theorem on the Existence of Minimizers . . . . . 308
5.4 The Dirichlet Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
5.4.1 Variational Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
5.4.2 Dirichlet Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
5.4.3 Weierstrass Counterexample . . . . . . . . . . . . . . . . . . . . . . . . . 313
5.5 Dirichlet Principle in Sobolev Spaces . . . . . . . . . . . . . . . . . . . . . . . . 315
5.5.1 Minimizer of the Dirichlet Integral in H_0^1 . . . . . . . . . . 315
5.5.2 Minimizer of the Dirichlet Integral in H^1 . . . . . . . . . . 315
5.5.3 Dirichlet Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
5.5.4 Dirichlet Principle with Neumann Condition . . . . . . . . . . . 318
5.5.5 Dirichlet Principle with Neumann B.C. in Sobolev
Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
5.6 Gâteaux Derivatives of Functionals . . . . . . . . . . . . . . . . . . . . . . . . . 322
5.6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
5.6.2 Historical Remark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
5.6.3 Gâteaux Derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
5.6.4 Basic Properties of G-Derivative . . . . . . . . . . . . . . . . . 324
5.6.5 G-Differentiability and Continuity . . . . . . . . . . . . . . . . 325
5.6.6 Fréchet Derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
5.6.7 G-Differentiability and Convexity . . . . . . . . . . . . . . . . 326
5.6.8 Higher Gâteaux Derivative . . . . . . . . . . . . . . . . . . . . . . 327
5.6.9 Minimality Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
5.7 Poisson Variational Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
5.7.1 Gâteaux Derivative of Poisson Integral . . . . . . . . . . . . 331
5.7.2 Symmetric Elliptic PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
5.7.3 Dirichlet Principle of Symmetric Elliptic PDEs . . . . . . . . . 336
5.8 Euler–Lagrange Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
5.8.1 Lagrangian Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
5.8.2 First Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
5.8.3 Necessary Condition for Minimality I . . . . . . . . . . . . . 339
5.8.4 Euler–Lagrange Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
5.8.5 Second Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
5.8.6 Legendre Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
5.9 Dirichlet Principle for Euler–Lagrange Equation . . . . . . . . . . . . . . . 344
5.9.1 The Lagrangian Functional . . . . . . . . . . . . . . . . . . . . . . . . . . 344
5.9.2 Gâteaux Derivative of the Lagrangian Integral . . . . . . . . 344
5.9.3 Dirichlet Principle for Euler–Lagrange Equation . . . . . . 348
5.10 Variational Problem of Euler–Lagrange Equation . . . . . . . . . . . . . . 348
5.10.1 p−Convex Lagrangian Functional . . . . . . . . . . . . . . . . . . . . 348
5.10.2 Existence of Minimizer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
5.11 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
About the Author

Ammar Khanfer earned his Ph.D. from Wichita State University, USA. His area of
interest is analysis and partial differential equations (PDEs), focusing on the interface
and links between elliptic PDEs and hypergeometry. He has notably contributed to
the field by providing prototypes studying the behavior of generalized solutions of
elliptic PDEs in higher dimensions in connection to the behavior of hypersurfaces
near nonsmooth boundaries. He also works on the qualitative theory of differential
equations, and in the area of inverse problems of mathematical physics. He has
published high-quality articles in reputable journals.
Ammar taught at several universities in the USA: Western Michigan University,
Wichita State University, and Southwestern College in Winfield. He was a member
of the Academy of Inquiry Based Learning (AIBL) in the USA. During the period
2008–2014, he participated in AIBL workshops and conferences on effective teaching
methodologies and strategies of creative thinking. He then moved to Saudi Arabia
to teach at Imam Mohammad Ibn Saud Islamic University, where he taught and
supervised undergraduate and graduate students of mathematics. Furthermore, he
was appointed as coordinator of the Ph.D. program establishment committee in the
department of mathematics. In 2020, he moved to Prince Sultan University in Riyadh,
and has been teaching there since then.

Chapter 1
Operator Theory

1.1 Quick Review of Hilbert Space

This section provides a very brief and quick review of the basics of Hilbert space
theory and functional analysis that are needed for this text. We list some of the most
important notions and results that will be used throughout this book. It should be
noted that the objective of this section is to merely refresh the memory rather than
explain these concepts as they have been already explained in detail in volume 2 of
this series [58]. The reader who did not study this material should consult [58] or
alternatively any introductory book on functional analysis.

1.1.1 Lebesgue Spaces

Definition 1.1.1 (Normed Spaces) Let X be a vector space. If X is endowed with a


norm ·, then the space (X, ·) is called “normed space”.

Definition 1.1.2 (L^p Spaces) The space L[a, b] is the space consisting of all Lebesgue-integrable functions on [a, b], that is, those functions f : [a, b] → R such that

‖f‖ = ∫_a^b |f(x)| dx < ∞.

The space L[a, b] can also be generalized to L^p[a, b], the space of all functions f such that |f|^p is Lebesgue-integrable on [a, b], where 1 ≤ p < ∞, endowed with the norm

‖f‖_p = ( ∫_a^b |f(x)|^p dx )^{1/p}.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. Khanfer, Applied Functional Analysis,
https://doi.org/10.1007/978-981-99-3788-2_1

The convergence in L^p is defined as follows: Let (f_n) ⊂ L^p(Ω). Then f_n converges to f in p-norm (or in the mean of order p) if ‖f_n − f‖_p → 0, or, equivalently,

∫ |f_n − f|^p −→ 0.
Theorem 1.1.3 (Hölder's Inequality) Let f ∈ L^p(Ω) and g ∈ L^q(Ω), for 1 ≤ p, q ≤ ∞, where p and q are conjugates. Then f g ∈ L^1(Ω), i.e., Lebesgue-integrable, and

∫ |f g| ≤ ‖f‖_p ‖g‖_q.

Theorem 1.1.4 (Minkowski's Inequality) Let f, g ∈ L^p for 1 ≤ p ≤ ∞. Then

‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p.

Theorem 1.1.5 (Fatou's Lemma) Let {f_n} be a sequence of measurable functions in a measure space (X, Σ, μ), f_n ≥ 0, and f_n −→ f a.e. on a set E ∈ Σ. Then

∫_E f dμ ≤ lim inf ∫_E f_n dμ.

1.1.2 Convergence Theorems

Theorem 1.1.6 (Monotone Convergence Theorem) Let {f_n} be a sequence of nonnegative and increasing measurable functions on E ∈ Σ in a measure space (X, Σ, μ). If lim f_n = f a.e., then

lim ∫_E f_n dμ = ∫_E f dμ.

Theorem 1.1.7 (Dominated Convergence Theorem (DCT)) Let {f_n} be a sequence of measurable functions on E ∈ Σ in a measure space (X, Σ, μ). If f_n → f pointwise a.e., and {f_n} is dominated by some μ-integrable function g over E, that is, |f_n| ≤ g for all n and for almost all x in E, then f is μ-integrable and

lim ∫_E f_n dμ = ∫_E f dμ.

Theorem 1.1.8 (Riesz's Lemma) Let Y be a proper closed subspace of a normed space X. Then, for every δ with 0 < δ < 1, there exists x₀ ∈ X such that ‖x₀‖ = 1 and dist(x₀, Y) ≥ δ.
Theorem 1.1.9 (Riesz–Fischer Theorem) (L^p)^* ≅ L^q for 1 ≤ p < ∞, where q is the conjugate exponent of p.
Theorem 1.1.10 Let X be a normed space. Then, X is finite-dimensional if and only
if its closed unit ball is compact.

1.1.3 Complete Space

Definition 1.1.11 (Banach Space) A space X is called complete if every Cauchy


sequence in X converges to an element in X. A complete normed vector space is
called “Banach Space”.

Theorem 1.1.12 Every finite-dimensional normed space is complete.

Theorem 1.1.13 Let X be a complete space and Y ⊂ X. Then, Y is closed if and only
if Y is complete.

1.1.4 Hilbert Space

Definition 1.1.14 (Inner Product) The inner product is a map

⟨·, ·⟩ : V × V −→ F,

where V is a vector space over the field F, which could be R or C, such that the following hold:
(1) For any x ∈ V, ⟨x, x⟩ ≥ 0 and ⟨x, x⟩ = 0 iff x = 0.
(2) For x, y ∈ V and α ∈ F, we have ⟨αx, ·⟩ = α⟨x, ·⟩ and ⟨x + y, ·⟩ = ⟨x, ·⟩ + ⟨y, ·⟩. We also have the conjugate linearity property: ⟨·, αx⟩ = ᾱ⟨·, x⟩ and ⟨·, x + y⟩ = ⟨·, x⟩ + ⟨·, y⟩.
(3) ⟨x, y⟩ equals the complex conjugate of ⟨y, x⟩. If F = R, then ⟨x, y⟩ = ⟨y, x⟩.

Theorem 1.1.15 (Cauchy–Schwarz Inequality) Let x, y ∈ V for some inner product space V. Then

|⟨x, y⟩| ≤ ‖x‖ ‖y‖.

Definition 1.1.16 (Hilbert Space) Let V be a vector space. The space X = (V, ⟨·, ·⟩) is said to be an inner product space. A complete inner product space is called Hilbert space.

Theorem 1.1.17 (Decomposition Theorem) Let Y be a closed subspace of a Hilbert space H. Then, any x ∈ H can be written as x = y + z, where y ∈ Y and z ∈ Y^⊥, and y and z are uniquely determined by x.

Theorem 1.1.18 (Orthonormality Theorem) Let H be an infinite-dimensional Hilbert space and M = {e_n}_{n∈N} be an orthonormal sequence in H. Then

(1) Σ_{j=1}^∞ α_j e_j converges if and only if Σ_{j=1}^∞ |α_j|² converges. In this case,

‖ Σ_{j=1}^∞ α_j e_j ‖² = Σ_{j=1}^∞ |α_j|².

(2) Bessel's Inequality. For every x ∈ H, we have

Σ_{j=1}^∞ |⟨x, e_j⟩|² ≤ ‖x‖².

(3) Parseval's Identity. If M is an orthonormal basis for H, then for every x ∈ H we have

x = Σ_{j=1}^∞ ⟨x, e_j⟩ e_j  and  Σ_{j=1}^∞ |⟨x, e_j⟩|² = ‖x‖².
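In a finite-dimensional space, Parseval's identity and Bessel's inequality can be checked directly. The following sketch (illustrative data, not from the text) verifies them in R³ for an orthonormal basis obtained by rotating the standard basis:

```python
import math

# Parseval: for an orthonormal basis {e_j} of R^3, sum_j <x, e_j>^2 = ||x||^2.
# The rotation angle t and the vector x are arbitrary illustrative choices.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

t = 0.7  # rotation about the z-axis
e1 = (math.cos(t), math.sin(t), 0.0)
e2 = (-math.sin(t), math.cos(t), 0.0)
e3 = (0.0, 0.0, 1.0)
basis = [e1, e2, e3]

x = (1.0, -2.0, 3.0)
coeffs = [dot(x, e) for e in basis]          # Fourier coefficients <x, e_j>
parseval = sum(c * c for c in coeffs)

assert abs(parseval - dot(x, x)) < 1e-12     # Parseval's identity
# Bessel: a proper sub-family of the basis only gives an inequality
assert coeffs[0] ** 2 + coeffs[1] ** 2 <= dot(x, x) + 1e-12
```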

1.1.5 Fundamental Mapping Theorems on Banach Spaces

Theorem 1.1.19 (Open Mapping Theorem (OMT)) Let T ∈ B(X, Y), where X and Y are Banach spaces. If T is surjective, then T is an open map.

Theorem 1.1.20 (Bounded Inverse Theorem) Let T ∈ B(X, Y), where X and Y are Banach spaces. If T is bijective, then T⁻¹ ∈ B(Y, X).

Theorem 1.1.21 (Closed Graph Theorem) Let T : X −→ Y be a linear operator


where X and Y are Banach spaces. Then T is bounded if and only if its graph is
closed in X × Y.

Theorem 1.1.22 (Uniform Boundedness Principle) Consider a sequence T_n ∈ B(X, Y) for some Banach space X and normed space Y. If sup_n ‖T_n(x)‖ < ∞ for every x ∈ X, then sup_n ‖T_n‖ < ∞.

1.2 The Adjoint of Operator

1.2.1 Bounded Linear Operators

Recall in a basic functional analysis course, the linear operator was defined to be a
mapping T from a normed space X to another normed space Y such that

T (cx + y) = cT (x) + T (y)



for all x, y ∈ X and for any scalar c in the field underlying the spaces X, Y, which is usually taken as R.
Let T be a linear operator, and X and Y be two normed spaces with norms ‖·‖_X and ‖·‖_Y, respectively. Then T is called a bounded linear operator if there exists M ∈ R such that for all x ∈ X,

‖T x‖_Y ≤ M ‖x‖_X.

If there is no such M, then T is said to be unbounded. If Y = R, then T is called a functional, and if X is finite-dimensional, then T is called a transformation. In general, T is a mapping that maps an element x to a unique element in Y. The norm of T can be written as

‖T‖ = sup_{‖x‖=1} ‖T(x)‖.

The fundamental theorem of bounded linear operators states that T is bounded iff T
is continuous at all x ∈ X iff T is continuous at 0. So, for linear functionals, bound-
edness and continuity are equivalent, and this feature is not available for nonlinear
operators. In fact, it can be easily shown that every linear operator defined on a finite-
dimensional space is bounded (i.e., continuous). An operator T is said to be injective
(or one-to-one) if for every x, y ∈ X such that T (x) = T (y), we have x = y. An
operator T is said to be surjective (or onto) if for every y ∈ Y, there exists at least
one x ∈ X such that T (x) = y. An operator T is said to be bijective if it is injective
and surjective. If dim X = dim Y = n < ∞, then T is injective if and only if T is
surjective. If T is bijective, and T, T −1 are continuous, then T is an isomorphism
between X and Y. Moreover, T is called an isometry if

‖x‖_X = ‖T(x)‖_Y

for all x ∈ X.

1.2.2 Definition of Adjoint

An important operator that is defined in association with the operator T is the adjoint
operator.
Definition 1.2.1 (Adjoint Operator) Let X and Y be two Banach spaces, and let T : X −→ Y be a linear operator. Then, the adjoint operator of T, denoted by T∗, is the operator T∗ : Y∗ −→ X∗ defined as

T∗(f) = f ∘ T,

for f ∈ Y∗.

A basic property that can be easily established from the definition is that T ∗ is
linear for linear operators, and, moreover, if S : X −→ Y is another linear operator,
then
(T + S)∗ = T ∗ + S ∗

and
(T S)∗ = S ∗ T ∗

(verify). The following proposition provides two important fundamental properties


for the adjoint operator.

Proposition 1.2.2 Let X, Y be Banach spaces. Let T : X −→ Y be a linear oper-


ator, and let T ∗ be its adjoint operator. Then
(1) If T is bounded, then T∗ is bounded and

‖T∗‖ = ‖T‖.

(2) T is bounded and invertible with bounded inverse if and only if T∗ has bounded inverse, and

(T∗)⁻¹ = (T⁻¹)∗.

Proof For (1), let f ∈ Y∗ and x ∈ B_X. Then

|(T∗f)(x)| = |f(T(x))| ≤ ‖f‖ ‖T(x)‖ ≤ ‖f‖ ‖T‖,

and so

‖T∗‖ ≤ ‖T‖ < ∞.   (1.2.1)

For the reverse inequality, let x ∈ B_X and write T(x) = y ∈ Y. By the Hahn–Banach theorem, there exists g_x ∈ Y∗ with ‖g_x‖ = 1 such that

|g_x(y)| = ‖y‖ = ‖T(x)‖.

Since T∗ : Y∗ −→ X∗,

‖T∗‖ ≥ ‖T∗(g_x)‖ ≥ |(T∗g_x)(x)| = |g_x(T(x))| = ‖T(x)‖.

Taking the supremum over all x ∈ B_X gives the reverse direction of (1.2.1), and this proves (1). For (2), suppose first that T is bounded and invertible with bounded inverse. For f ∈ X∗ and x ∈ X, writing y = T(x), we have

f(x) = f(T⁻¹(y)) = ((T⁻¹)∗f)(y) = ((T⁻¹)∗f)(T(x)) = (T∗(T⁻¹)∗f)(x),

so T∗(T⁻¹)∗ = I on X∗, and similarly (T⁻¹)∗T∗ = I on Y∗; hence T∗ has the bounded inverse (T∗)⁻¹ = (T⁻¹)∗.
Conversely, suppose T∗ has bounded inverse. Then T∗∗ has bounded inverse. Then

the restriction of T∗∗ to X (identified with its canonical image in X∗∗) satisfies

T∗∗|_X = T,

so T is one-to-one. Moreover, since T∗∗ is a bounded bijection between Banach spaces, by the open mapping theorem it is a homeomorphism, so it maps closed sets to closed sets; hence T(X) is closed in Y∗∗ and consequently in Y. Now, suppose T is not onto, i.e., there exists y ∈ Y \ T(X). By the Banach separation theorem, there exists f ∈ Y∗ such that f(y) = 1 and f(T(x)) = 0 for all x ∈ X. It follows that

(T∗f)(x) = f(T(x)) = 0 for all x ∈ X,

which implies that f = 0. This contradiction implies that T is onto, and hence a bijection. ∎

1.2.3 Adjoint Operator on Hilbert Spaces

In Hilbert space, the definition of adjoint is given in terms of an inner product.


Let T : H₁ → H₂ for some Hilbert spaces H₁ and H₂. Let y ∈ H₂ and define the functional

f(x) = ⟨T x, y⟩;

then f is clearly linear and bounded on H₁, and by the Riesz representation theorem, there exists a unique z ∈ H₁ such that

⟨T x, y⟩ = ⟨x, z⟩ for all x ∈ H₁.

We define z as T∗(y).
Definition 1.2.3 (Adjoint Operator on Hilbert Spaces) Let T : H₁ → H₂ be a bounded linear operator between two Hilbert spaces H₁ and H₂. The adjoint operator of T, denoted by T∗, is defined to be the operator T∗ : H₂ → H₁ given by

⟨T x, y⟩ = ⟨x, T∗(y)⟩.
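On H = Rⁿ with the Euclidean inner product, the adjoint of a matrix operator is its transpose. A minimal numerical sketch of the defining identity ⟨Tx, y⟩ = ⟨x, T∗y⟩ (the matrix T and the vectors x, y are illustrative, not from the text):

```python
# Verify <Tx, y> = <x, T^T y> for a matrix operator on R^3.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

T = [[2.0, -1.0, 0.0],
     [1.0,  3.0, 4.0],
     [0.0,  5.0, -2.0]]
x = [1.0, 2.0, -1.0]
y = [0.5, -3.0, 2.0]

lhs = dot(matvec(T, x), y)             # <Tx, y>
rhs = dot(x, matvec(transpose(T), y))  # <x, T* y>, with T* = T^T on R^n
assert abs(lhs - rhs) < 1e-12
```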

Recall that the null space of an operator T is defined as

ker(T) = N(T) = {x ∈ Dom(T) : T(x) = 0},

and the range of T is

Im(T) = R(T) = {T x : x ∈ Dom(T)}.

Proposition 1.2.4 Let T : H1 → H2 be a bounded linear operator between two


Hilbert spaces H1 and H2 . Then

(1) T∗∗ = T.
(2) ‖T∗T‖ = ‖T‖².
(3) N(T) = R(T∗)^⊥ and N(T)^⊥ = the closure of R(T∗).
(4) N(T∗) = R(T)^⊥ and N(T∗)^⊥ = the closure of R(T).

1.2.4 Self-adjoint Operators

Recall from a linear algebra course that a self-adjoint matrix is a matrix that is equal
to its own adjoint. This extends to operators on infinite-dimensional spaces.
Definition 1.2.5 (Self-Adjoint Operator) A bounded linear operator on a Hilbert
space T ∈ B(H) is called self-adjoint if T = T ∗ .
Proposition 1.2.6 Let T ∈ B(H) for a complex Hilbert space H. Then T is self-adjoint if and only if ⟨T x, x⟩ ∈ R for all x ∈ H.

Proof If T = T∗, then

⟨T x, x⟩ = ⟨x, T x⟩ = the conjugate of ⟨T x, x⟩,

so ⟨T x, x⟩ equals its own conjugate and is therefore real. Conversely, let ⟨T x, x⟩ ∈ R for all x ∈ H. Then

⟨T x, x⟩ = conjugate of ⟨T x, x⟩ = ⟨x, T x⟩ = ⟨T∗x, x⟩,

so ⟨(T − T∗)x, x⟩ = 0 for all x ∈ H, and (by polarization, since H is complex) T = T∗. ∎
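A quick numerical sketch of the proposition on H = C² (the matrices and the vector below are illustrative choices, not from the text): the Hermitian matrix yields a real quadratic form, and a non-Hermitian one does not.

```python
# T self-adjoint on C^n  <=>  <Tx, x> is real for every x.
def cdot(u, v):
    # <u, v> = sum u_k * conj(v_k), linear in the first slot
    return sum(a * b.conjugate() for a, b in zip(u, v))

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

H = [[2.0 + 0j, 1.0 - 1j],
     [1.0 + 1j, 3.0 + 0j]]      # Hermitian: H = H*
N = [[0.0 + 0j, 1.0 + 0j],
     [-1.0 + 0j, 0.0 + 0j]]     # skew-symmetric: quadratic form is imaginary

x = [1.0 + 2j, -1.0 + 0.5j]
qH = cdot(matvec(H, x), x)
qN = cdot(matvec(N, x), x)

assert abs(qH.imag) < 1e-12      # real for the self-adjoint H
assert abs(qN.imag) > 1e-6       # not real for the non-self-adjoint N
```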

1.3 Compact Operators

1.3.1 Definition and Properties of Compact Operators

We now introduce compact operators, an extremely important class of operators


which is a cornerstone in the study of several topics of analysis, and finds numer-
ous applications to differential equations and integral equations and their spectral
theories.
Definition 1.3.1 (Compact Operator) Let X and Y be normed spaces and
T : X → Y be a linear operator. Then, T is called compact operator if for every
bounded sequence xn ∈ X , the sequence {T (xn )} has a convergent subsequence in
Y. The set of all compact operators from X to Y is denoted by K(X, Y ), or K(X ) if
X = Y.
In simple words, a linear operator T : X → Y is compact if the closure of T(B) is compact in Y whenever the set B is bounded in X. The definition implies immediately that every

compact linear operator is bounded, thus continuous. One of the basic properties
of compact operators is that composing them with bounded linear operators retains
compactness.
Proposition 1.3.2 Let T be compact, and S be bounded linear on a normed space
X. Then ST and T S are compact operators.

Proof If {xn } is bounded, then S(xn ) is bounded, hence T (S(xn )) has a convergent
subsequence, so T S is compact. Also, since {xn } is bounded, T (xn ) has a convergent
subsequence T (xn j ), but since S is bounded, it is continuous in norm, so S(T (xn j ))
also converges. 

Theorem 1.3.3 Let T ∈ K(X) for a normed space X. If T is invertible with bounded inverse, then dim(X) < ∞.

Proof If T⁻¹ exists and is bounded, then by Proposition 1.3.2, I = T⁻¹T is also compact; but the identity cannot be compact on an infinite-dimensional space, since the closed unit ball would then be compact, contradicting Theorem 1.1.10. ∎

One of the most fundamental and important results in analysis which provides a
compactness criterion is the Arzela–Ascoli theorem.
Theorem 1.3.4 (Arzela–Ascoli Theorem) Let f n ∈ C(K ), for some compact set
K . If the sequence { f n } is bounded and equicontinuous, then it has a uniformly
convergent subsequence.

Proof By compactness, K is separable, so let D = {x_n} be a countable dense set in K, and consider the sequence {f_n(x₁)}, which, by boundedness, has a convergent subsequence. Using Cantor's diagonalization argument, we obtain a diagonal sequence {h_n} = {f_{nn}}, a subsequence of {f_n} that converges at each point of D. Since {f_n} is equicontinuous, for every x ∈ K and ε > 0 we can find δ > 0 such that

|h_n(x) − h_n(y)| < ε/3

for all n and all y ∈ D with d(x, y) < δ. Fix such a y. Since {h_n(y)} converges, it is Cauchy, so we can find N such that for all n, m ≥ N,

|h_n(y) − h_m(y)| ≤ ε/3.

Hence

|h_n(x) − h_m(x)| ≤ |h_n(x) − h_n(y)| + |h_n(y) − h_m(y)| + |h_m(y) − h_m(x)| < ε.

This implies that {h_n(x)} is Cauchy for every x ∈ K, and thus it converges to, say, h(x), which is continuous by the equicontinuity of {f_n}. Now, using a similar argument, we conclude that {h_n} converges uniformly to h. ∎

The Arzela–Ascoli theorem will be used in demonstrating the following important


property of compact operators linked to adjoint operators.
Theorem 1.3.5 Let X and Y be normed spaces. Then, T is compact if and only if
T ∗ is compact.
Proof Let T : X → Y be compact. Let S = {f_n} be a bounded sequence in Y∗, say ‖f_n‖ ≤ M for all n. Denote K = the closure of T(B_X), which is compact in Y. Each f_n restricts to a continuous function on K, so S ⊂ C(K), and |f_n(y)| ≤ M‖T‖ for all y ∈ K and all n; that is, {f_n} is uniformly bounded on K. Moreover, for all y₁, y₂ ∈ K and all f_j ∈ S, we have

|f_j(y₁) − f_j(y₂)| ≤ M ‖y₁ − y₂‖.

So {f_n} is equicontinuous, and hence by the Arzelà–Ascoli theorem {f_n} has a subsequence that converges uniformly on K, i.e., f_{n_j} → f ∈ C(K). Since

‖T∗f_{n_j} − T∗f_{n_i}‖ = sup_{x ∈ B_X} |f_{n_j}(T(x)) − f_{n_i}(T(x))| ≤ ‖f_{n_j} − f_{n_i}‖_{C(K)} → 0,

the sequence (T∗f_{n_j}) is Cauchy in X∗, and hence converges.
For the converse, if T∗ is compact then so is T∗∗, by the direction already proved. Let J : X −→ X∗∗ be the canonical mapping given by

(J(x))(f) = f(x)  ∀x ∈ X, f ∈ X∗.   (1.3.1)

If {x_n} is bounded in X, then {J(x_n)} is bounded in X∗∗ because J is an isometry. But then, due to the compactness of T∗∗, (T∗∗(J(x_n))) has a convergent subsequence, say

(T∗∗(J(x_{n_j}))) ⊂ Y∗∗.

Now

(T∗∗(J(x_{n_j})))(h) = (J(x_{n_j}))(T∗(h)),   (1.3.2)

for h ∈ Y∗. But from (1.3.1),

(J(x_{n_j}))(T∗(h)) = T∗(h)(x_{n_j}) = h(T(x_{n_j})).   (1.3.3)

By (1.3.2) and (1.3.3), we obtain

(T∗∗(J(x_{n_j})))(h) = h(T(x_{n_j})),

and consequently

‖T∗∗(J(x_{n_j})) − T∗∗(J(x_{n_i}))‖ = ‖T(x_{n_j}) − T(x_{n_i})‖.

Hence (T(x_{n_j})) is Cauchy, so it converges due to the convergence of (T∗∗(J(x_{n_j}))), and this completes the proof of the other direction. ∎

1.3.2 The Integral Operator

Example 1.3.6 (Integral Operator) Let X = L^p[a, b], 1 < p < ∞, and let k ∈ C([a, b] × [a, b]) be a mapping from [a, b] × [a, b] to R. Consider the Fredholm integral operator K : X −→ X defined by

(K u)(x) = ∫_a^b k(x, y)u(y) dy.

We will show that K is a compact operator for all p > 1. By Hölder's inequality (with q the conjugate of p), it is easy to see that

|(K u)(x)| ≤ ‖k‖_∞ ‖u‖₁ ≤ ‖k‖_∞ (b − a)^{1/q} ‖u‖_p,

from which we conclude that K is bounded, and so for a bounded set B ⊂ X, K(B) is bounded. Now we show that K(B) is equicontinuous. Since B is bounded in L^p[a, b] and [a, b] has finite measure, the same Hölder estimate gives

α := sup_{u ∈ B} ∫_a^b |u(y)| dy < ∞.

Moreover, since k is uniformly continuous on [a, b] × [a, b], for every ε > 0 there exists δ > 0 such that for all x₁, x₂ ∈ [a, b] with |x₁ − x₂| < δ we have

|k(x₂, y) − k(x₁, y)| < ε/α, ∀y ∈ [a, b].

Hence, for all u ∈ B,

|(K u)(x₂) − (K u)(x₁)| ≤ ∫_a^b |k(x₂, y) − k(x₁, y)| |u(y)| dy ≤ ε.

This shows that K(B) is equicontinuous, and therefore, by the Arzelà–Ascoli theorem, every sequence in K(B) has a uniformly convergent subsequence; hence the closure of K(B) is compact, and therefore K is compact.
To find the adjoint of K, we assume u, v ∈ X. Then

⟨K u, v⟩ = ∫_a^b (K u)(x) v(x) dx
= ∫_a^b ( ∫_a^b k(x, y)u(y) dy ) v(x) dx
= ∫_a^b ( ∫_a^b k(x, y)v(x) dx ) u(y) dy   (by Fubini's theorem)
= ∫_a^b u(y) ( ∫_a^b k(x, y)v(x) dx ) dy.

Define

(K∗v)(y) = ∫_a^b k(x, y)v(x) dx.

Therefore, we see that K is self-adjoint if

k(x, y) = k(y, x),

that is, if k is symmetric (or Hermitian in the complex case).

So the integral operator is an example of a compact operator (in fact the earli-
est example in the literature), and this operator is also self-adjoint if its kernel is
symmetric and square-integrable.
An important question raised is whether K(X, Y ) is closed, in the sense that if
{Tn } is a sequence of compact operators, and Tn → T , is T compact? The following
theorem demonstrates that this property holds if Y is Banach.
Theorem 1.3.7 Let {Tn } be a sequence of compact operators from a normed space
X to a Banach space Y . If {Tn } converges in norm to T , then T : X → Y is compact.
In particular, the set K(X, Y ) is closed.

Proof Let {x_n} ⊂ X be a bounded sequence, so ‖x_n‖ ≤ M for some M > 0 and all n. Since T₁ is compact, {x_n} has a subsequence {x_n^(1)} such that T₁(x_n^(1)) converges in Y. But {x_n^(1)} is bounded, and since T₂ is compact, {x_n^(1)} has a subsequence {x_n^(2)} such that T₂(x_n^(2)) converges in Y; keep in mind that T₁(x_n^(2)) converges as well, since {x_n^(2)} is a subsequence of {x_n^(1)}. Proceeding inductively, we obtain {x_n^(k)}, a subsequence of {x_n^(k−1)}, with T_k(x_n^(k)) convergent. Choose the diagonal sequence

{x_n^(n)} = x₁^(1), x₂^(2), x₃^(3), …,

that is, the first term of the first sequence {x_n^(1)}, the second term of the second sequence {x_n^(2)}, etc. The proof is done if we can prove that T(x_n^(n)) converges. For each fixed N, T_N(x_n^(n)) converges, so it is a Cauchy sequence and

‖T_N(x_n^(n)) − T_N(x_m^(m))‖ < ε/3   (1.3.4)

for large n, m > N. On the other hand, since T_n → T, for every ε > 0 there exists N ∈ N such that for all n ≥ N we have

‖T − T_n‖ < ε/(3M).   (1.3.5)

From (1.3.4) and (1.3.5), we obtain

‖T(x_n^(n)) − T(x_m^(m))‖ ≤ ‖T(x_n^(n)) − T_N(x_n^(n))‖ + ‖T_N(x_n^(n)) − T_N(x_m^(m))‖ + ‖T_N(x_m^(m)) − T(x_m^(m))‖
≤ ‖T − T_N‖ ‖x_n^(n)‖ + ε/3 + ‖T − T_N‖ ‖x_m^(m)‖ ≤ ε.

It follows that T (xnn ) is a Cauchy sequence, so it converges due to the completeness


of Y. 

1.3.3 Finite-Rank Operators

Recall from a linear algebra course that the rank of a linear transformation is defined
as the dimension of its range, and represents the maximum number of linearly inde-
pendent dimensions in the range space of the operator. The definition extends to
operators on infinite-dimensional spaces as we shall see next.
Definition 1.3.8 (Finite-Rank Operator) Let X and Y be normed spaces. A bounded linear operator T : X → Y is said to be of finite rank if its range is a finite-dimensional subspace, i.e.,

r = dim(T(X)) < ∞.

An operator having a finite rank is called finite-rank operator, or f.r. operator for
short. The rank of an operator T is denoted by r (T ). The class of all bounded linear
operators of finite rank is denoted by K0 (X, Y ).
Note that if at least one of the spaces X or Y is finite-dimensional, then T is of
finite rank. Note also that if the range is finite-dimensional then every closed bounded
set is compact. Choosing any bounded set in the domain of a f.r. operator, this set
will be mapped to a bounded set in the finite-dimensional range, and so its closure
is compact. Thus:
Proposition 1.3.9 Finite-rank operators are compact operators, i.e.,

K0 (X, Y ) ⊂ K(X, Y ).

Note that the inclusion is proper, i.e., there exist compact operators that are not
f.r. operators. A simple example which is left to the reader to verify is to consider
the sequence space ℓ² and define the operator T : ℓ² −→ ℓ² as

T(x₁, x₂, …) = (x₁, x₂/2, x₃/3, …, x_n/n, …).

The source of the problem here is the range of T, as it is not closed. If it were closed, then T would be a f.r. operator, as the next proposition shows.
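The operator above can be approximated in operator norm by finite-rank truncations, which is why it is compact. A minimal numerical sketch (not from the text): the error operator of the N-term truncation is diagonal with entries 1/(N+1), 1/(N+2), …, so its norm is 1/(N+1).

```python
# T(x)_n = x_n / n on l^2; T_N keeps the first N coordinates.
# The operator norm of T - T_N is the sup of the discarded diagonal entries.
def truncation_error(N):
    # sup over the tail 1/(N+1), 1/(N+2), ... is attained at k = N+1
    return max(1.0 / k for k in range(N + 1, N + 1000))

errs = [truncation_error(N) for N in (1, 10, 100)]
assert errs == [1.0 / 2, 1.0 / 11, 1.0 / 101]
assert errs[0] > errs[1] > errs[2]   # error decreases to 0, so T is compact
```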
Proposition 1.3.10 If T : X −→ Y is compact and R(T ) is closed in the Banach
space Y, then T is of finite rank.

Proof Since Y is complete, so is R(T); hence T : X → R(T) is a surjective operator between Banach spaces, and therefore T is open (by the open mapping theorem). Thus the open unit ball B_X is mapped to an open set K = T(B_X), which is relatively compact since T is compact. Then we can find an open ball of radius r > 0 such that B_r ⊂ K, so the closure of B_r is contained in the compact closure of K; hence the closed ball of radius r is compact, and therefore R(T) is necessarily finite-dimensional. ∎
If a f.r. operator is defined on a Hilbert space, then we have more to say. In this case,
the subspace T(X) is of dimension, say, r(T) = n, which gives rise to a finite orthonormal basis {e₁, e₂, …, e_n} for T(X), and every y ∈ T(X) can be written as

y = T(x) = Σ_{j=1}^n ⟨y, e_j⟩ e_j = Σ_{j=1}^n ⟨T(x), e_j⟩ e_j.   (1.3.6)

This suggests the following result.


Proposition 1.3.11 If T ∈ K0 (H) then r (T ) = r (T ∗ ).
Proof If H is finite-dimensional, the conclusion follows directly from Proposition 1.2.4 and the rank–nullity theorem. Suppose otherwise that H is infinite-dimensional, and let r(T) = n. From (1.3.6), we have

T(x) = Σ_{j=1}^n ⟨T(x), e_j⟩ e_j = Σ_{j=1}^n ⟨x, T∗(e_j)⟩ e_j.

Let T∗(e_j) = θ_j. Since ⟨T x, y⟩ = ⟨x, T∗y⟩, we have

⟨ Σ_{j=1}^n ⟨x, θ_j⟩ e_j, y ⟩ = ⟨ x, Σ_{j=1}^n ⟨y, e_j⟩ θ_j ⟩

for all x, y ∈ H. Thus

T∗(·) = Σ_{j=1}^n ⟨·, e_j⟩ θ_j.

Clearly r(T∗) ≤ n; applying the same argument to T∗ and using T∗∗ = T gives the opposite inequality, so r(T∗) = n. ∎

1.4 Hilbert–Schmidt Operator

1.4.1 Definition of Hilbert–Schmidt Operator

Now we discuss another important class of compact operators, investigated by Hilbert and Schmidt in 1907.

Definition 1.4.1 (Hilbert–Schmidt Operator) Let T ∈ B(H), and let {ϕ_n} be an orthonormal basis for H. Then T is said to be a Hilbert–Schmidt operator if

Σ_i ‖Tϕ_i‖² < ∞.

Note that the definition requires an orthonormal basis; hence the space is essen-
tially separable. Another thing to observe is that separable Hilbert spaces have more
than one basis, so this raises the question of whether the condition holds for any
orthonormal basis or for a particular one? The answer is that the condition does not
depend on the basis. First, let us find a convenient form for the norm. Let x_k ∈ H. Then T(x_k) can be written as

T(x_k) = Σ_j ⟨T(x_k), ϕ_j⟩ ϕ_j.

Letting x_k = ϕ_k and substituting back,

T(ϕ_k) = Σ_j ⟨T(ϕ_k), ϕ_j⟩ ϕ_j.

So

‖T(ϕ_k)‖² = Σ_j |⟨T(ϕ_k), ϕ_j⟩|².

Taking the summation over k,

Σ_k ‖T(ϕ_k)‖² = Σ_k Σ_j |⟨T(ϕ_k), ϕ_j⟩|².

Hence we can define the Hilbert–Schmidt norm to be

‖T‖₂ = ( Σ_k ‖T(ϕ_k)‖² )^{1/2} = ( Σ_{j,k} |⟨T(ϕ_k), ϕ_j⟩|² )^{1/2},

and therefore the condition in the definition is equivalent to saying that

Σ_{j,k} |⟨T(ϕ_k), ϕ_j⟩|² < ∞.

Now, we show that the condition is independent of the choice of the basis. Let {u_k} be another orthonormal basis for H. By expanding T(ϕ_k) in the basis {u_j} and T∗(u_j) in the basis {ϕ_k}, it is easy to see that

Σ_k ‖Tϕ_k‖² = Σ_{j,k} |⟨T(ϕ_k), u_j⟩|² = Σ_{j,k} |⟨T∗(u_j), ϕ_k⟩|² = Σ_j ‖T∗(u_j)‖².   (1.4.1)

Similarly,

Σ_k ‖Tu_k‖² = Σ_j ‖T∗(u_j)‖².   (1.4.2)

The combination of (1.4.1) and (1.4.2) gives

Σ_k ‖Tϕ_k‖² = Σ_k ‖Tu_k‖².

We refer to Hilbert–Schmidt operators as HS operators, and denote the Hilbert–Schmidt norm by ‖·‖₂. The set of all Hilbert–Schmidt operators on a separable Hilbert space is denoted by K₂(H).
Remark It is important to emphasize here that if {ϕ_n} is an orthonormal basis for a dense subspace of X, then {ϕ_n} is an orthonormal basis for X.
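In finite dimensions, Σ_k ‖Te_k‖² is the squared Frobenius norm of the matrix of T, and the computation above says it is basis-independent. A small numerical sketch on R² (matrix and rotation angle are illustrative, not from the text):

```python
import math

# HS norm squared = sum over an orthonormal basis of ||T e_k||^2;
# compare the standard basis with a rotated orthonormal basis.
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def hs_norm_sq(A, basis):
    return sum(sum(c * c for c in matvec(A, e)) for e in basis)

T = [[1.0, 2.0],
     [3.0, 4.0]]
std = [(1.0, 0.0), (0.0, 1.0)]
t = 0.9
rot = [(math.cos(t), math.sin(t)), (-math.sin(t), math.cos(t))]

assert abs(hs_norm_sq(T, std) - 30.0) < 1e-12   # 1 + 4 + 9 + 16 (Frobenius)
assert abs(hs_norm_sq(T, std) - hs_norm_sq(T, rot)) < 1e-9  # basis-independent
```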

1.4.2 Basic Properties of HS Operators

The following basic properties of HS operators follow easily from the above discus-
sion.
Proposition 1.4.2 Let T be a HS operator (i.e., T ∈ K₂(H)). Then
(1) ‖T‖₂ = ‖T∗‖₂.
(2) ‖T‖ ≤ ‖T‖₂.

Proof The first assertion follows immediately from (1.4.2). For the second assertion, let x ∈ H and {ϕ_n} be an orthonormal basis for H. Then, using Parseval's identity and the Cauchy–Schwarz inequality,

‖T x‖² = Σ_k |⟨T x, ϕ_k⟩|² = Σ_k |⟨x, T∗ϕ_k⟩|² ≤ ‖x‖² Σ_k ‖T∗ϕ_k‖² = ‖x‖² ‖T‖₂².

This implies that

‖T x‖ ≤ ‖x‖ ‖T‖₂

for all x ∈ H, whence the result. ∎


It is clear that the set of all Hilbert–Schmidt operators forms a linear subspace.
Proposition 1.4.3 Let T1 , T2 ∈ K2 (H) and α ∈ F. Then αT1 , T1 + T2 ∈ K2 (H).
Proof Use basic properties of norm and triangle inequality. Details are left to the
reader. 

1.4.3 Relations with Compact and Finite-Rank Operators

The next result demonstrates two important fundamental properties of HS operators:


every finite-rank operator is a HS operator, and every HS operator is compact.
Theorem 1.4.4 Let T ∈ B(H). Then,

K0 (H) ⊂ K2 (H) ⊂ K(H).

Proof The first inclusion asserts that every f.r. operator is HS. To prove this, let T ∈ K₀(H). Since

r(T) = dim(Im(T)) = m,

we may choose an orthonormal basis {ϕ_n} of H with ϕ_n ∈ Im(T) for n ≤ m. But by Proposition 1.2.4(4),

(Im(T))^⊥ = ker(T∗).

So T∗(ϕ_n) = 0 for all n > m, and consequently

Σ_{n=1}^∞ ‖T∗ϕ_n‖² = Σ_{n=1}^m ‖T∗ϕ_n‖² < ∞.

Hence T∗ is a HS operator, and by Proposition 1.4.2, T is HS and T ∈ K₂(H).
For the second inclusion, let T ∈ K₂(H) and {ϕ_n} be an orthonormal basis for H. Then

‖T‖₂² = Σ_j ‖Tϕ_j‖² < ∞.

Note that for x ∈ H, we have

x = Σ_{j=1}^∞ ⟨x, ϕ_j⟩ ϕ_j ⟹ T(x) = Σ_{j=1}^∞ ⟨x, ϕ_j⟩ T(ϕ_j).

Define the sequence of operators

T_n(x) = Σ_{j=1}^n ⟨x, ϕ_j⟩ T(ϕ_j).

Then T_n is clearly of finite rank for each n, since R(T_n) is contained in the span of {T(ϕ₁), …, T(ϕ_n)}. By Proposition 1.4.2(2),

‖T_n − T‖² ≤ ‖T_n − T‖₂²
= Σ_{j=n+1}^∞ ‖(T_n − T)ϕ_j‖²
= Σ_{j=n+1}^∞ ‖Tϕ_j‖²,

which is the tail of a convergent series. Letting n → ∞ gives

‖T_n − T‖ → 0.

Note that {T_n} is a sequence of f.r. operators, which are compact operators, and the result follows from Theorem 1.3.7. ∎

The preceding result simply states that every f.r. operator is a HS operator, and
every HS operator is compact. The proper inclusions imply the existence of compact
operators that are not HS, and the existence of HS that are not of finite rank. A useful
conclusion drawn from the preceding theorem is
Corollary 1.4.5 The closure of K₀(H) in the ‖·‖₂ (HS) norm is K₂(H); that is, for every T ∈ K₂(H) there exists a sequence T_n ∈ K₀(H) such that ‖T_n − T‖₂ → 0.

Proof Details are left to the reader as an exercise. ∎

The following theorem gives a description of each class.


Theorem 1.4.6 Let T ∈ B(H) be given by

T(x) = Σ_n α_n ⟨x, ϕ_n⟩ u_n,

where (ϕ_n) and (u_n) are two orthonormal bases in H and (α_n) is a bounded sequence in the underlying field F, say R. Then
(1) T is a compact operator if and only if lim α_n = 0.
(2) T is a HS operator if and only if Σ |α_n|² < ∞.
(3) T is of finite rank if and only if there exists N ∈ N such that α_n = 0 for all n ≥ N.

Proof For (1), let T be compact. If α_n ↛ 0, then there exist ε > 0 and a subsequence (α_{n_j}) with

|α_{n_j}| ≥ ε > 0.

It is easy to see that for i ≠ j,

‖Tϕ_{n_j} − Tϕ_{n_i}‖² = |α_{n_j}|² + |α_{n_i}|² ≥ 2ε².

Hence the sequence {T(ϕ_{n_j})} has no convergent subsequence, and this implies that T cannot be compact. Suppose now that lim α_n = 0. Define the sequence

T_n(x) = Σ_{k=1}^n α_k ⟨x, ϕ_k⟩ u_k.

Then each T_n is of finite rank, and using the same argument as in the proof of the previous theorem, we see that

‖T_n − T‖ → 0,

which implies that T is compact.
For (2), note that

Σ_n ‖Tϕ_n‖² = Σ_n Σ_k |⟨Tϕ_n, u_k⟩|² = Σ_n Σ_k |α_n ⟨u_n, u_k⟩|² = Σ_n |α_n|².

For (3), let T be a finite-rank operator. Note that T(ϕ_k) = α_k u_k. So we have

R(T) ⊆ span{u₁, u₂, …, u_m}

for some m. This means that α_k = 0 for all k ≥ m + 1. On the other hand, if α_k ≠ 0 for each k ∈ N, then u_k ∈ R(T) for all k; hence

dim(R(T)) = ∞,

and, consequently, T is not of finite rank. ∎

In light of the preceding theorem, if T is a f.r. operator, then α_n = 0 for all but finitely many terms, and this implies

Σ |α_n|² < ∞,   (1.4.3)

which also implies that α_n → 0; thus T is compact. This leads to Theorem 1.4.4. Moreover, if T is compact, then α_n → 0, but this doesn't necessarily imply (1.4.3), and so T may not be HS. If (1.4.3) holds, then T is HS, but this doesn't necessarily mean that α_n = 0 for all but finitely many terms, hence T may not be of finite rank. It turns out that the results of the last two theorems are fully consistent with each other, and the last theorem is very helpful for constructing examples of compact operators that are not HS, and of HS operators that are not of finite rank.
The following example is a good application of the preceding theorem.
Example 1.4.7 Let T : ℓ² → ℓ² be given by

T(x) = Σ_n α_n x_n ϕ_n,

for some orthonormal basis {ϕ_n} of ℓ². The operator

T(x) = Σ_n (x_n/√n) ϕ_n

is a compact operator but not HS. This is because α_n = 1/√n → 0 while Σ 1/n = ∞. Furthermore, the operator

T(x) = Σ_n (x_n/n) ϕ_n

is HS but not f.r., since α_n = 1/n ≠ 0 for all n, and Σ 1/n² < ∞.

1.4.4 The Fredholm Operator

Example 1.3.6 demonstrated that Fredholm integral operators defined on L^p are compact. In the particular case p = 2, we have the advantage of dealing with an orthonormal basis for the space, which will allow us to work with HS norms. In particular, we have the following result.
Theorem 1.4.8 The Fredholm integral operator K on L²([a, b]) defined by

(K u)(x) = ∫_a^b k(x, y)u(y) dy

with k ∈ L²([a, b] × [a, b]) is a Hilbert–Schmidt operator.

Proof Since K is defined on a Hilbert space, let {ϕ_n} be an orthonormal basis for H = L²([a, b]), and write k_x = k(x, ·) ∈ L²[a, b] for a.e. x. Using the convergence theorems of Sect. 1.1.2 and the fact that k ∈ L², we have

Σ_{n=1}^∞ ‖Kϕ_n‖² = Σ_{n=1}^∞ ∫_a^b | ∫_a^b k(x, y)ϕ_n(y) dy |² dx
= Σ_{n=1}^∞ ∫_a^b |⟨k_x, ϕ_n⟩|² dx
= ∫_a^b Σ_{n=1}^∞ |⟨k_x, ϕ_n⟩|² dx   (MCT)
= ∫_a^b ‖k_x‖² dx   (Parseval)
= ∫_a^b ∫_a^b |k(x, y)|² dy dx
= ‖k‖₂² < ∞. ∎
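The identity Σ ‖Kϕ_n‖² = ‖k‖₂² can be checked numerically by discretization. A hedged sketch (the kernel k(x, y) = xy on [0, 1] and the grid size are illustrative choices, not from the text): a Riemann-sum discretization of the double integral ∫∫ |k|² recovers ‖k‖₂ = 1/3 for this kernel.

```python
import math

# Discrete HS norm of the kernel k(x, y) = x*y on [0,1]^2 via the midpoint rule:
# ||k||_2 = sqrt( ∫∫ x^2 y^2 dx dy ) = sqrt(1/9) = 1/3.
n = 400
h = 1.0 / n
nodes = [(i + 0.5) * h for i in range(n)]   # midpoint nodes

k = lambda x, y: x * y
hs = math.sqrt(sum(k(x, y) ** 2 for x in nodes for y in nodes) * h * h)

assert abs(hs - 1.0 / 3.0) < 1e-3
```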

The preceding theorem demonstrates that the Fredholm integral operator defined
on L 2 [a, b] is HS. In fact, the converse is also true. Namely, every HS operator
defined on L 2 [a, b] is an integral operator. This striking result justifies the particular
importance of HS operators as compact operators that behave as integral operators
and could also be self-adjoint if their kernels are symmetric.
The next theorem demonstrates that every HS operator from L² to L² is identified with an integral operator with kernel k ∈ L².
Theorem 1.4.9 Every Hilbert–Schmidt operator defined on X = L 2 ([a, b]) is an
integral operator with square-integrable kernel.
Proof Consider a Hilbert–Schmidt operator K : X −→ X and let {ϕ_n} be an orthonormal basis for X. So

Σ_{n=1}^∞ ‖Kϕ_n‖² < ∞;

hence, for each u ∈ X, the series Σ_n ⟨u,ϕ_n⟩Kϕ_n converges absolutely in X, since Σ_n |⟨u,ϕ_n⟩| ‖Kϕ_n‖ ≤ ‖u‖ (Σ_n ‖Kϕ_n‖²)^{1/2} by the Cauchy–Schwarz and Bessel inequalities. Now for u ∈ X we have

u = Σ_{n=1}^∞ ⟨u,ϕ_n⟩ϕ_n.

It follows that

(Ku)(x) = Σ_{n=1}^∞ ⟨u,ϕ_n⟩(Kϕ_n)(x)
= Σ_{n=1}^∞ (∫_a^b u(y)ϕ_n(y) dy)(Kϕ_n)(x)
= ∫_a^b u(y) Σ_{n=1}^∞ ϕ_n(y)(Kϕ_n)(x) dy,

where we used the Dominated Convergence Theorem (DCT) in the last step. Now, define

k(x,y) = Σ_{n=1}^∞ ϕ_n(y)(Kϕ_n)(x).

Then k is clearly a mapping from [a,b] × [a,b] to ℝ and k ∈ L²([a,b] × [a,b]). Therefore, the HS operator K can be written as

(Ku)(x) = ∫_a^b k(x,y)u(y) dy. ∎
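The kernel formula k(x,y) = Σ_n ϕ_n(y)(Kϕ_n)(x) also has a finite-dimensional counterpart: for any orthonormal basis {ϕ_n} of ℝ^m, a matrix K is recovered as Σ_n (Kϕ_n)ϕ_nᵀ. The following sketch (an illustration of this identity only, not the theorem itself) checks it for a random orthonormal basis:

```python
import numpy as np

# Recover a matrix from its action on an arbitrary orthonormal basis:
# sum_n (K phi_n) phi_n^T = K (Q Q^T) = K, since Q is orthogonal.
rng = np.random.default_rng(1)
m = 5
K = rng.standard_normal((m, m))
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))  # columns = orthonormal basis

reconstructed = sum(np.outer(K @ Q[:, n], Q[:, n]) for n in range(m))
print(np.allclose(reconstructed, K))  # True
```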

1.4.5 Characterization of HS Operators

We end the section with the following observation. It was shown in Theorem 1.4.6(2) that the operator

T(x) = Σ_n α_n⟨x,ϕ_n⟩u_n

is HS if and only if Σ_n |α_n|² < ∞. Note here that if u_k = ϕ_k, then we obtain

T(ϕ_k) = α_k ϕ_k.

It turns out that the sequence (α_k) is nothing but the eigenvalues of T. These form a countable (possibly finite) set of eigenvalues. This motivates us to investigate the spectral properties of operators, to elaborate more on the eigenvalues and eigenvectors of HS and compact operators. Before we start this investigation, we would like to obtain one final result in this section. The preceding theorem shows that every HS operator on L² is an integral operator with a square-integrable kernel. We will combine this result with the preceding theorem to show that the scalar sum Σ_n |α_n|² is, in fact, the square of the HS norm of the operator, so the result that T is HS iff this sum is finite comes as no surprise.
Theorem 1.4.10 Let T ∈ K₂ be a HS operator

T : L²([a,b]) → L²([a,b]).

Let (λ_n) be the eigenvalues of T. If the kernel k ∈ L²([a,b] × [a,b]) of T is symmetric, then

‖k‖₂² = Σ_{n=1}^∞ |λ_n|².

Proof Since L² is Hilbert, let (ϕ_n) be an orthonormal basis of L²([a,b]) consisting of the corresponding eigenvectors for (λ_n). Define the set

ψ_{nm}(x,y) = ϕ_n(x)ϕ_m(y). (1.4.4)

Then it can be shown that (ψ_{nm}) is an orthonormal basis of L²([a,b] × [a,b]) (see Problem 1.11.32). Since k ∈ L²([a,b] × [a,b]),

k(x,y) = Σ_n Σ_m ⟨k,ψ_{nm}⟩ψ_{nm},

and by Parseval's identity, we have

‖k‖₂² = Σ_n Σ_m |⟨k,ψ_{nm}⟩|². (1.4.5)

On the other hand, using the symmetry of k, we have

⟨k,ψ_{nm}⟩ = ∫_a^b (∫_a^b k(x,y)ϕ_n(x) dx) ϕ_m(y) dy
= ∫_a^b (Tϕ_n)(y) ϕ_m(y) dy
= λ_n⟨ϕ_n,ϕ_m⟩, since Tϕ_n = λ_nϕ_n,
= λ_n if n = m, and 0 if n ≠ m.

Substituting this into (1.4.5) gives

‖k‖₂² = Σ_n Σ_m |⟨k,ψ_{nm}⟩|² = Σ_{n=1}^∞ |λ_n|². ∎
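The discrete version of this identity is the familiar fact that, for a symmetric matrix, the squared Frobenius norm equals the sum of the squared eigenvalues. A numerical sketch (an illustration only, assuming NumPy):

```python
import numpy as np

# Discrete counterpart of Theorem 1.4.10: for a symmetric "kernel" K,
# sum_{i,j} |K_ij|^2 (Frobenius) equals sum_n |lambda_n|^2.
rng = np.random.default_rng(2)
A = rng.standard_normal((7, 7))
K = (A + A.T) / 2                      # symmetric kernel
eigenvalues = np.linalg.eigvalsh(K)

frob_sq = np.sum(K ** 2)
eig_sq = np.sum(eigenvalues ** 2)
print(frob_sq, eig_sq)  # the two agree
```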

1.5 Eigenvalues of Operators

1.5.1 Spectral Analysis

The study of eigenvalues in functional analysis was begun by Hilbert in 1904 during
his investigations on the quadratic forms in infinitely many variables. In 1904, Hilbert
used the terms “eigenfunction” and “eigenvalue” for the first time, and he called this
new direction of research “Spectral Analysis”. Hilbert’s early theory led to the study
of infinite systems of linear equations, and mathematicians like Schmidt and Riesz, and then John von Neumann, were among the first to pursue this direction. Finite systems of linear equations were investigated during the eighteenth and nineteenth centuries, based centrally on the notions of matrices and determinants. Then
the problem of solving integral equations emerged at the beginning of the twentieth
century. It turned out that the problem of solving an integral or differential equation could be reduced to solving a linear system with infinitely many unknowns. Spectral theory deals with eigenfunctions and eigenvalues of operators on infinite-dimensional spaces, and with the conditions under which such operators can be expressed in terms of their eigenvalues and eigenfunctions; this helps in solving integral and differential equations by expanding the solution as a series of the eigenfunctions. Extending matrices to infinite dimensions leads to the notion of an operator, but many fundamental and crucial properties of matrices are lost upon that extension.

One of the remarkable results of the study of operators (or infinite-dimensional


matrices) is the fact that compact operators can be represented in a diagonal form
in which their eigenvalues are the entries of the “infinite diagonal”. This class of
operators was constructed to retain many characteristic properties of matrices from
which the importance of this class stems. The present section and the next one will
establish two remarkable results demonstrating that compact operators share some nice properties with matrices and can be viewed as “infinite-dimensional matrices”.
Let T: V −→ V be a Hermitian (or self-adjoint) map on a finite-dimensional
space V . It is well-known from linear algebra that T is identified with a Hermi-
tian matrix. The main result is that all eigenvalues of T are real numbers, and its
eigenvectors form an orthonormal basis for the space, and this basis can be used
to diagonalize the matrix. This is the main spectral theorem for Hermitian maps on
finite-dimensional spaces. In infinite-dimensional spaces, the situation can be much
different and more complicated. There might not be “enough” eigenvectors, or to
say eigenfunctions, to form a basis for the space. We will, nevertheless, show that
compact self-adjoint operators on Hilbert spaces retain this property.

1.5.2 Definition of Eigenvalues and Eigenfunctions

We begin our discussion by the following definition, which is analogous to the finite-
dimensional case.
Definition 1.5.1 (Eigenvalue, Eigenfunction) Let X be a normed space and T ∈ B(X). Then, a constant λ ∈ ℂ is called an eigenvalue of T if there exists a nonzero vector x ∈ X such that Tx = λx. The element x is called an eigenvector, or eigenfunction, of T corresponding to λ.
Notice the following:
(1) The concept of eigenvalue and eigenfunction has been defined for bounded linear
operators, but it can also be defined for unbounded operators.
(2) We always exclude the case x = 0, since T(0) = λ·0 holds trivially for every scalar λ.

1.5.3 Eigenvalues of Self-adjoint Operators

The next proposition gives two basic properties of self-adjoint operators: one concerning their eigenvalues, and one concerning the norm of the self-adjoint operator.
Proposition 1.5.2 Let T ∈ B(H) be a self-adjoint operator. Then
(1) All eigenvalues {λ_i} of T are real numbers, and eigenvectors corresponding to distinct eigenvalues are orthogonal.
(2) ‖T‖ = sup{|⟨Tx,x⟩| : ‖x‖ ≤ 1}.

Proof Let Tu = λu with u ≠ 0. Then

λ‖u‖² = ⟨Tu,u⟩ = ⟨u,Tu⟩ = λ̄‖u‖²,

so λ = λ̄, which implies λ ∈ ℝ. Let

Tv = μv

for another eigenvalue μ ≠ λ. Since T is self-adjoint, we have

⟨Tu,v⟩ = ⟨u,Tv⟩.

Then

0 = ⟨Tu,v⟩ − ⟨u,Tv⟩ = λ⟨u,v⟩ − μ⟨u,v⟩ = (λ − μ)⟨u,v⟩.

Hence u and v are orthogonal. This proves (1).

To prove (2), let

M = sup{|⟨Tx,x⟩| : ‖x‖ ≤ 1}.

Then, clearly,

|⟨Tx,x⟩| ≤ ‖T‖‖x‖².

Taking the supremum over all x ∈ B_X,

M ≤ ‖T‖.

On the other hand, choosing x, y ∈ B_X, and using the polarization identity then the parallelogram law for the inner product,

Re⟨Tx,y⟩ ≤ (1/4)[|⟨T(x+y), x+y⟩| + |⟨T(x−y), x−y⟩|]
≤ (1/4) M (‖x+y‖² + ‖x−y‖²)
= (1/2) M (‖x‖² + ‖y‖²)
= M.

Let c ∈ ℂ with |c| = 1 be chosen so that c⟨Tx,y⟩ = |⟨Tx,y⟩|. Then c̄y ∈ B_X, and

|⟨Tx,y⟩| = Re⟨Tx, c̄y⟩ ≤ M.

Taking the supremum over all y ∈ B_X, then the supremum over all x ∈ B_X, gives

‖T‖ ≤ M.

This proves (2). ∎
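Part (2) can be observed numerically in the matrix case: for a symmetric matrix, the operator norm equals sup |⟨Tx,x⟩| over the unit ball, and the supremum is attained at an eigenvector of the eigenvalue of largest modulus. A sketch (an illustration only, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))
T = (A + A.T) / 2                      # symmetric, hence self-adjoint

w, V = np.linalg.eigh(T)
op_norm = np.linalg.norm(T, 2)         # operator (spectral) norm
k = int(np.argmax(np.abs(w)))
x_star = V[:, k]                       # unit eigenvector, largest |eigenvalue|
attained = abs(x_star @ T @ x_star)    # equals the operator norm

# random unit vectors never exceed the norm
samples = []
for _ in range(200):
    x = rng.standard_normal(6)
    x /= np.linalg.norm(x)
    samples.append(abs(x @ T @ x))

print(op_norm, attained)
```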


An immediate corollary is
Corollary 1.5.3 If T ∈ B(H) is a self-adjoint operator and ⟨Tx,x⟩ = 0 for all x ∈ B_H, then T = 0.

1.5.4 Eigenvalues of Compact Operators

Proposition 1.4.2 is of extreme importance in determining spectral properties of self-adjoint operators: we managed to find an orthogonal, hence (after normalization) orthonormal, set of vectors in the range space of the operator. We will use this property by combining compactness with self-adjointness. This will provide an interesting result. Before stating the result, we recall the notation

T ∗ = T ∈ K(H)

which means T is a self-adjoint compact operator on a Hilbert space.


Proposition 1.5.4 Let T* = T ∈ K(H). Then, ‖T‖ or −‖T‖ is an eigenvalue of T.
Proof From Proposition 1.5.2, we have

‖T‖ = sup{|⟨Tx,x⟩| : ‖x‖ = 1}.

Hence, there exists a sequence x_n ∈ S_H such that

|⟨Tx_n,x_n⟩| → ‖T‖,

keeping in mind that ⟨Tx_n,x_n⟩ ∈ ℝ since T is self-adjoint. Passing to a subsequence, we may assume

⟨Tx_n,x_n⟩ → λ ∈ ℝ.

Then, either λ = ‖T‖ or λ = −‖T‖. Therefore, using the expansion of the inner product and ‖Tx_n‖ ≤ ‖T‖ = |λ|,

‖Tx_n − λx_n‖² = ‖Tx_n‖² + λ² − 2λ⟨Tx_n,x_n⟩ ≤ 2λ² − 2λ⟨Tx_n,x_n⟩ → 0.

If λ = 0, then T = 0 and the claim is trivial, so assume λ ≠ 0. Since T is compact, the sequence {Tx_n} has a convergent subsequence such that

Tx_{n_k} → z (1.5.1)

for some z ∈ H. So λx_{n_k} → z, and by continuity of T, this implies that

λT(x_{n_k}) → Tz,

or

Tx_{n_k} → (1/λ)Tz. (1.5.2)

Then from (1.5.1) and (1.5.2), we obtain Tz = λz. It remains to show that z ≠ 0. Note that ‖T‖ > 0, and

‖Tx_n‖ ≥ ‖λx_n‖ − ‖Tx_n − λx_n‖ → |λ| = ‖T‖ > 0.

Taking the limit along the subsequence and using continuity of the norm gives

‖z‖ ≥ ‖T‖ > 0.

Hence λ = ±‖T‖ is an eigenvalue of T with corresponding eigenvector z. ∎

An immediate consequence is
Corollary 1.5.5 If T* = T ∈ K(H) and T ≠ 0, then T has at least one nonzero eigenvalue.
This contrasts with the finite-dimensional case, where a matrix may have no eigenvalues if the underlying field is ℝ, since the characteristic polynomial may have no roots in ℝ. It is well-known from linear algebra that the set of
eigenvalues of a matrix is finite. Since they are simply the roots of the characteristic
polynomial, there can be at most n eigenvalues for an n × n matrix. As mentioned
at the beginning of this section, things change when we turn to operators on infinite-
dimensional spaces. The next example illustrates the idea.
Example 1.5.6 Consider the left-shift operator on ℓ^p, 1 ≤ p < ∞,

T(x₁, x₂, x₃, …) = (x₂, x₃, …).

To find eigenvalues of T, we write Tx = λx. Then x_{n+1} = λx_n for all n, so

x_{n+1} = λⁿx₁.

If λ ≠ 0, the corresponding eigenvector will be of the form

x = (x₁, λx₁, λ²x₁, λ³x₁, …),

and this element does not belong to ℓ^p unless |λ| < 1. Hence, the set of eigenvalues of T is the open unit disk {λ ∈ ℂ : |λ| < 1}.
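The eigenvector relation can be checked directly on a finite truncation of the sequence (pure Python, an illustration only):

```python
# Left shift applied to the geometric sequence (1, lam, lam^2, ...):
# each shifted coordinate equals lam times the original coordinate,
# i.e. T x = lam * x on the truncation. With |lam| < 1 the full
# sequence is p-summable, so lam is a genuine eigenvalue.

def left_shift(x):
    return x[1:]

lam = 0.5
x = [lam ** k for k in range(12)]     # eigenvector candidate, x1 = 1
shifted = left_shift(x)

all_match = all(abs(s - lam * v) < 1e-15 for s, v in zip(shifted, x))
print(all_match)  # True
```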

1.6 Spectral Analysis of Operators

1.6.1 Resolvent and Regular Values

We have seen one similarity that compact operators share with matrices: the discrete
set of eigenvalues (finite or countable). Now we exhibit another property regarding
the eigenvalues. Recall from linear algebra that a scalar λ ∈ C is called an eigenvalue
of a linear mapping T if (T − λI ) is singular, i.e., if

det(T − λI ) = 0.

If λ is not an eigenvalue, then (T − λI ) is regular, and so (T − λI )−1 exists. Recall


a linear map on a finite-dimensional space is surjective if and only if it is injective.
The situation in infinite-dimensional spaces is much more complicated. Recall also
that an operator T is invertible if T −1 exists and is bounded, so by saying invertible,
we mean the inverse exists in B(X). One sufficient condition for this to hold is the Open Mapping Theorem (OMT), which states that a bounded linear operator from a Banach space onto a Banach space is open; hence, if such an operator is also injective, its inverse is continuous, which by linearity implies it is bounded. A value λ ∈ ℂ for which the mapping (T − λI) has a bounded inverse is called a regular value. This suggests the following definition:
Definition 1.6.1 (Regular Value) Let X be a Banach space and T ∈ B(X ). A scalar
λ ∈ C is called a regular value of T if

T_λ⁻¹ = (T − λI)⁻¹ ∈ B(X)

(i.e., T_λ is a bijection with a bounded inverse operator). The set of all regular values of T is called the resolvent set, and is denoted by ρ(T).
The definition is limited to bounded linear operators on Banach spaces, but it can always be extended to unbounded operators on normed spaces. Furthermore, every bounded bijective operator between Banach spaces has a bounded inverse, thanks to the OMT. So the condition of the bounded inverse is automatically satisfied in the case of Banach spaces. If a scalar λ ∉ ρ(T), then one of the following holds:
(1) The scalar λ is an eigenvalue, so that T_λ has no inverse, i.e., ker(T_λ) ≠ {0}, or
(2) λ is not an eigenvalue, i.e., ker(T_λ) = {0} and T_λ has an inverse, but λ is not a regular value. By the OMT, T_λ must be nonsurjective. This is due to one of two reasons:
(a) R(T_λ) is dense in X but T_λ⁻¹ is unbounded, or
(b) R(T_λ) is not dense in X.
Hence, there can be more than one reason for the scalar not to be regular. All these values are called spectral values, and the set containing all of them is called the spectrum.

Definition 1.6.2 (Spectrum) Let T be an operator. The set consisting of all numbers that are not in the resolvent set ρ(T) is called the spectrum of T, and is denoted by σ(T). It consists of three sets: the point spectrum σ_p(T), consisting of all eigenvalues of T; the continuous spectrum σ_c(T), consisting of all scalars λ for which R(T_λ) is dense in X and T_λ⁻¹ is unbounded; and the residual spectrum σ_r(T), consisting of all scalars λ for which R(T_λ) is not dense in X.
As an immediate consequence of the two preceding definitions, we have the
following formula:

σ p (T ) ∪ σc (T ) ∪ σr (T ) = σ(T ) = C \ ρ(T ).

1.6.2 Bounded Below Mapping

It turns out that the spectrum of an operator defined on an infinite-dimensional space contains, but is not limited to, its eigenvalues. This contrasts with the finite-dimensional case, where any scalar that is not an eigenvalue is a regular value for the linear map, and the map is invertible. In fact, the idea of invertibility can be identified with another notion. If T is invertible, then for all x ∈ X,

‖x‖ = ‖T⁻¹Tx‖ ≤ ‖T⁻¹‖‖Tx‖. (1.6.1)

Since T is invertible,

‖T⁻¹‖ = M < ∞

for some M > 0. This gives

c‖x‖ ≤ ‖Tx‖

for c = 1/M. Notice how the operator T is bounded from below. This suggests the
Definition 1.6.3 (Bounded Below) An operator T is said to be bounded below if for some c > 0, we have

c‖x‖ ≤ ‖Tx‖

for all x ∈ X.

The previous argument shows that if an operator is invertible, i.e., a bijection with a bounded inverse, then it is bounded below. For the converse, we have the following proposition.
Proposition 1.6.4 Let T ∈ B(X ) for a Banach space X. Then, T is bounded below
if and only if T is injective and R(T ) is closed.

Proof If T is bounded below, then clearly ker(T) = {0}, and so T is injective. If x_n ∈ X and T(x_n) → y ∈ X, then {Tx_n} is Cauchy. It follows that

c‖x_n − x_m‖ ≤ ‖Tx_n − Tx_m‖,

and so {x_n} is Cauchy. Since X is Banach, x_n → x for some x ∈ X, and by continuity, Tx_n → Tx. Thus, we have y = Tx and R(T) is closed.

Conversely, since R(T) is closed in X, it is Banach, so the mapping T̂ : X → R(T), T̂x = Tx, is bijective, and by the OMT T̂ has a bounded inverse; by (1.6.1) it is bounded below, taking into account that in R(T) we have

‖Tx‖_{R(T)} = ‖Tx‖_X. ∎
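In finite dimensions, the best "bounded below" constant of an injective matrix is its smallest singular value: ‖Tx‖ ≥ σ_min‖x‖ for all x. A numerical sketch (an illustration only, assuming NumPy):

```python
import numpy as np

# The smallest singular value is the optimal lower bound constant:
# min over unit x of ||Tx|| equals sigma_min(T).
rng = np.random.default_rng(4)
T = rng.standard_normal((5, 5))
sigma_min = np.linalg.svd(T, compute_uv=False)[-1]  # singular values descend

ok = all(
    np.linalg.norm(T @ x) >= sigma_min * np.linalg.norm(x) - 1e-10
    for x in rng.standard_normal((300, 5))
)
print(ok, float(sigma_min))
```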

The following result follows from the preceding one.


Proposition 1.6.5 Let T ∈ B(X ) for a Banach space X. Then, T −1 ∈ B(X ) if and
only if T is bounded below and R(T ) is dense in X.

Proof If T is invertible, then it is bounded below and bijective, so it is surjective.


Conversely, if T is bounded below, then by the previous proposition, it is injective,
and R(T ) is closed in X. Since it is also dense, we have

R(T ) = R(T ) = X.

So it is surjective. It follows from OMT that T is invertible. 

1.6.3 Spectrum of Bounded Operator

In view of Proposition 1.6.5, we can describe the resolvent set as

ρ(T) = {λ ∈ ℂ : T − λI is bounded below and has dense range}.

On the other hand,

σ_c(T) = {λ ∈ ℂ : T − λI is injective with dense range, but is not bounded below}.



Example 1.6.6 In Example 1.5.6, the point spectrum of the left-shift operator on ℓ^p, 1 ≤ p < ∞,

T(x₁, x₂, x₃, …) = (x₂, x₃, …)

was found to be

σ_p(T) = {λ ∈ ℂ : |λ| < 1}.

Now it is clear that ‖Tx‖_p ≤ ‖x‖_p, so ‖T‖ ≤ 1, and hence

‖Tx − λx‖_p ≥ ‖λx‖_p − ‖Tx‖_p ≥ (|λ| − 1)‖x‖_p.

Consequently, T − λI is bounded below whenever |λ| > 1; in fact, every λ with |λ| > ‖T‖ = 1 is a regular value, and therefore

σ(T) ⊆ {λ ∈ ℂ : |λ| ≤ 1}.

But since σ(T) is closed and contains the open disk σ_p(T),

σ(T) = {λ ∈ ℂ : |λ| ≤ 1}.

The next proposition provides information about the structure of the spectrum.
Proposition 1.6.7 Let T ∈ B(X) for a Banach space X.
(1) If ‖T‖ < 1, then T − I is invertible.
(2) If ‖T‖ < |λ|, then λ ∈ ρ(T).

Proof For (1), let ‖T‖ < 1. Then

Σ_{k=0}^∞ ‖Tᵏ‖ ≤ Σ_{k=0}^∞ ‖T‖ᵏ = 1/(1 − ‖T‖).

Hence, Σ_{k=0}^∞ Tᵏ converges absolutely, and

lim_n (Σ_{k=0}^n Tᵏ)(I − T) = lim_n (I − Tⁿ⁺¹) = I.

So

(I − T)⁻¹ = Σ_{k=0}^∞ Tᵏ. (1.6.2)

For (2), we set S = T/λ. Then, by assumption, ‖S‖ < 1, and so by (1), S − I is invertible, and so is T − λI = λ(S − I). ∎
The series in (1.6.2) is called the Neumann series.
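The Neumann series can be verified numerically for a matrix with ‖T‖ < 1: summing the powers of T reproduces the inverse of I − T. A sketch (an illustration only, assuming NumPy):

```python
import numpy as np

# Neumann series: (I - T)^{-1} = sum_{k=0}^infty T^k when ||T|| < 1.
rng = np.random.default_rng(5)
T = 0.3 * rng.standard_normal((4, 4))
T /= max(1.0, 2 * np.linalg.norm(T, 2))   # ensure ||T|| <= 1/2

I = np.eye(4)
partial = np.zeros((4, 4))
power = I.copy()
for _ in range(60):                        # partial sum sum_{k=0}^{59} T^k
    partial += power
    power = power @ T

exact = np.linalg.inv(I - T)
print(np.allclose(partial, exact))  # True
```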
Proposition 1.6.8 For any operator T ∈ B(X), the spectrum σ(T) is compact.
Proof Let μ ∈ ρ(T), so T − μI is invertible. Then for z ∈ ℂ, we can write

T − zI = (T − μI) − (z − μ)I = (T − μI)(I − (z − μ)T_μ⁻¹). (1.6.3)

Since μ ∈ ρ(T), there is M > 0 such that ‖T_μ⁻¹‖ ≤ M, so we can find δ > 0 such that δ < ‖T_μ⁻¹‖⁻¹. Choose z such that |z − μ| ≤ δ. Then

‖(z − μ)T_μ⁻¹‖ = |z − μ|‖T_μ⁻¹‖ < 1.

Hence, by Proposition 1.6.7(1) and the Neumann series, I − (z − μ)T_μ⁻¹ is invertible with a bounded inverse. From (1.6.3) we conclude that T − zI is invertible and T_z⁻¹ ∈ B(X), and so z ∈ ρ(T).

This gives a disk inside ρ(T); thus ρ(T) is open, and therefore the spectrum is closed in ℂ. If λ ∈ σ(T), then by Proposition 1.6.7(2),

|λ| ≤ ‖T‖ < ∞,

so the spectrum is bounded in ℂ, and therefore it is compact. ∎


Proposition 1.6.9 Let T ∈ B(X) for some Banach space X. Then
(1) λ ∈ σ(T) if and only if λ̄ ∈ σ(T*).
(2) If T is invertible, then σ(T⁻¹) = {λ⁻¹ : λ ∈ σ(T)}.
(3) T is invertible if and only if 0 ∉ σ(T).

Proof For (1), note that T − λI is invertible iff (T − λI)* = T* − λ̄I is invertible.
For (2), we have

λ⁻¹I − T⁻¹ = −T⁻¹λ⁻¹(λI − T),

knowing that

T⁻¹λ⁻¹ = λ⁻¹T⁻¹

is invertible. Hence, λI − T is invertible iff λ⁻¹I − T⁻¹ is invertible.

(3) follows directly from (2). The details are left to the reader as an exercise (see Problem 1.11.41). ∎

1.6.4 Spectral Mapping Theorem

For polynomials in particular, we have the following result, known as “Spectral


Mapping Theorem”.
Proposition 1.6.10 (Spectral Mapping Theorem) Let T ∈ B(X). If p(z) is a polynomial with complex coefficients, then

p(σ(T)) = σ(p(T)).

Proof Note that

p(σ(T)) = {p(λ) : λ ∈ σ(T)}.

Fix λ ∈ ℂ. Since p(z) − p(λ) is a polynomial in ℂ, say of degree n, it can be factorized as

p(z) − p(λ) = c(z − λ₁) ⋯ (z − λ_n),

where λ is one of the roots λ₁, …, λ_n. Then p(T) − p(λ)I = c(T − λ₁I) ⋯ (T − λ_nI) is invertible iff (T − λ_jI) is invertible for all j = 1, …, n, i.e., iff λ_j ∈ ρ(T) for all j. From this one deduces that p(λ) ∉ σ(p(T)) iff no root λ_j belongs to σ(T), which yields both inclusions. The details are left to the reader as an exercise (see Problem 1.11.42). ∎
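In the matrix case the spectral mapping theorem can be observed directly: the eigenvalues of p(T) are exactly the values p(λ) over the eigenvalues λ of T. A sketch for the polynomial p(t) = t² − 3t + 2 on a symmetric matrix (an illustration only, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))
T = (A + A.T) / 2                     # symmetric, so eigenvalues are real

p = lambda t: t ** 2 - 3 * t + 2
pT = T @ T - 3 * T + 2 * np.eye(5)    # p applied to the operator

spec_T = np.linalg.eigvalsh(T)
spec_pT = np.linalg.eigvalsh(pT)
print(np.allclose(np.sort(p(spec_T)), np.sort(spec_pT)))  # True
```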

1.6.5 Spectrum of Compact Operators

In the case of compact operators, things change. Again, compact operators prove
to be the natural extension of matrices, since they retain the spectral properties of linear maps on finite-dimensional spaces.
Proposition 1.6.11 Let T ∈ K(X) for some Banach space X. If dim(X) = ∞, then 0 ∈ σ(T).
Proof If 0 ∉ σ(T), then T is invertible, and if T is invertible and compact, then I = T⁻¹T is compact by Proposition 1.3.2; therefore, by Theorem 1.3.3, X must be finite-dimensional, a contradiction. ∎

1.7 Spectral Theory of Self-adjoint Compact Operators

1.7.1 Eigenvalues of Compact Self-adjoint Operators

Having an uncountable number of eigenvalues is a phenomenon that many operators on infinite-dimensional spaces exhibit. We will see, however, that compact operators can have at most a countable set of eigenvalues.

Theorem 1.7.1 Let T* = T ∈ K(H). Then the set of all eigenvalues {λ_n} of T is at most countable, and λ_n → 0.
Proof If the set of all eigenvalues is finite, we are done. Suppose it is infinite and let ε > 0. We claim that the set

S_ε = {λ : |λ| ≥ ε}

is finite. If not, then we can choose a sequence {λ_n} of distinct eigenvalues in S_ε with corresponding (orthonormal) eigenvectors {ϕ_n} such that

⟨ϕ_i, ϕ_j⟩ = δ_{ij}

by Proposition 1.5.2(1). Then

‖T(ϕ_j) − T(ϕ_i)‖² = ‖λ_jϕ_j − λ_iϕ_i‖² = |λ_j|² + |λ_i|² ≥ 2ε² > 0. (1.7.1)

But T is compact, so {T(ϕ_n)} must have a convergent subsequence. This contradicts (1.7.1), which implies that S_ε is finite for every ε > 0; i.e., all but finitely many of the λ_n satisfy |λ_n| < ε. The only possible way is that λ_n → 0. ∎

The result reveals one of the most important spectral properties characterizing compact operators. The situation for compact operators is very similar to that of linear maps on finite-dimensional spaces, in the sense that both have a discrete (finite or countable) set of eigenvalues. What about eigenvectors? No information on the behavior of these eigenvalues or their corresponding eigenvectors is known so far. We will show that they also retain some features from the finite-dimensional case. Before that, we need to introduce the concept of invariant subspaces.

1.7.2 Invariant Subspaces

Definition 1.7.2 (Invariant Subspace) Let X be a normed space and Y ⊂ X be a


subspace. Let T ∈ L(X ). Then, Y is said to be an invariant subspace if T (Y ) ⊆ Y .
The subspace Y is called T -invariant.

Invariant subspaces allow us to restrict the mapping to an invariant subspace to obtain a new mapping, denoted by T_Y, which is a restriction of the original mapping:

T_Y = T|_Y : Y −→ Y, T_Y(x) = T(x) ∀x ∈ Y.

The new mapping is well-defined, but note that it is not necessarily surjective. A
trivial restriction can be made by choosing Y = {0}, then TY = 0. Invariant subspaces
are helpful in breaking operators down into simpler operators acting on invariant
subspaces. The following result will be very helpful in proving our main theorem of
this section.

Proposition 1.7.3 Let T ∈ B(H), and let Y be a closed subspace of H that is T-invariant. Then
(1) If T is self-adjoint, then T_Y is self-adjoint.
(2) If T is compact, then T_Y is compact.

Proof Let T_Y : Y −→ Y, T_Y(x) = T(x). Note that since Y is closed in H, Y is a Hilbert space. Suppose T = T* and let x, z ∈ Y. Then

⟨T_Y x, z⟩_Y = ⟨Tx, z⟩ = ⟨x, Tz⟩ = ⟨x, T_Y z⟩_Y.

This proves (1).

To prove (2), let {y_n} be a bounded sequence in Y. Then it is bounded in H with

‖y_n‖ = ‖y_n‖_Y,

and since T is compact, {Ty_n} has a subsequence {Ty_{n_k}} that converges in H, so it is Cauchy in H with

‖T_Y(y_{n_k}) − T_Y(y_{n_j})‖_Y = ‖T(y_{n_k}) − T(y_{n_j})‖.

Hence, it is Cauchy in Y, which is complete, and consequently it converges in Y. This proves (2). ∎

1.7.3 Hilbert–Schmidt Theorem

Now, we come to a major result in the spectral theory of operators. The following
theorem provides these missing pieces of information about eigenfunctions.
Theorem 1.7.4 (Hilbert–Schmidt Theorem) If T* = T ∈ K(H) (i.e., compact and self-adjoint), then its eigenfunctions {ϕ_n} form an orthonormal basis for R(T)‾ (the closure of the range of T), and their corresponding eigenvalues can be arranged so that

|λ₁| ≥ |λ₂| ≥ |λ₃| ≥ ⋯.

Moreover, for x ∈ H,

T(x) = Σ_{j=1}^∞ λ_j⟨x, ϕ_j⟩ϕ_j.

Proof Proposition 1.5.4 implies the existence of an eigenvalue λ₁ of T such that

|λ₁| = ‖T‖ = max{|⟨Tx,x⟩| : x ∈ H, ‖x‖ = 1}. (1.7.2)

Let u₁ be the corresponding eigenvector, and normalize it to get

ϕ₁ = u₁/‖u₁‖.

Define the following subspace:

H₁ = (span{ϕ₁})⊥.

Then H₁ is a closed subspace of H, so it is Hilbert. Let x ∈ H₁. Then ⟨x,ϕ₁⟩ = 0, which implies that

⟨Tx,ϕ₁⟩ = ⟨x,Tϕ₁⟩ = λ₁⟨x,ϕ₁⟩ = 0.

Therefore, Tx ∈ H₁, and this shows that H₁ is a T-invariant subspace of H. Now consider the restriction

T_{H₁} = T₁ : H₁ −→ H₁, T₁(z) = T(z) ∀z ∈ H₁.

Then by the previous proposition, T₁ is self-adjoint and compact on H₁. Again, there exists an eigenvalue λ₂ of T₁ such that

|λ₂| = ‖T₁‖ = max{|⟨T₁x,x⟩| : x ∈ H₁, ‖x‖ = 1}. (1.7.3)

In view of the definition of T₁ and (1.7.2)–(1.7.3), it is clear that

|λ₂| ≤ |λ₁|.

Let ϕ₂ be the eigenvector corresponding to λ₂. Clearly the set {ϕ₁, ϕ₂} is orthonormal. Define

H₂ = (span{ϕ₁, ϕ₂})⊥.

It is clear that H₂ ⊂ H₁ is closed in H₁, hence Hilbert. Moreover, it is T-invariant, so

T₂ = T_{H₂} : H₂ −→ H₂

is (again by the previous proposition) self-adjoint and compact. Then, there exists an eigenvalue λ₃ of T₂ such that

|λ₃| = ‖T₂‖ = max{|⟨T₂x,x⟩| : x ∈ H₂, ‖x‖ = 1}.

It is clear that

|λ₃| ≤ |λ₂| ≤ |λ₁|.

Now we proceed in this way to obtain a collection of eigenvalues {λ_n} of T, with

‖T_n‖ = |λ_{n+1}| ≤ |λ_n| (1.7.4)

for all n ≥ 1. If the process stops at some n = N < ∞, i.e., T_{H_N} = 0, then for every x ∈ H,

T(x) = Σ_{j=1}^N λ_j⟨x,ϕ_j⟩ϕ_j,

and we have a finite set of eigenvectors {ϕ_j : j = 1, …, N} together with a finite set of eigenvalues

|λ₁| ≥ |λ₂| ≥ ⋯ ≥ |λ_N|.

If the process doesn't stop at any N < ∞, we continue the process and we get a sequence

|λ₁| ≥ |λ₂| ≥ |λ₃| ≥ ⋯

with corresponding countable orthonormal eigenvectors {ϕ₁, ϕ₂, ϕ₃, …}. Note that every x ∈ H can be written uniquely as

x = (x − Σ_{j=1}^n ⟨x,ϕ_j⟩ϕ_j) + Σ_{j=1}^n ⟨x,ϕ_j⟩ϕ_j,

and since

Σ_{j=1}^n ⟨x,ϕ_j⟩ϕ_j ∈ span{ϕ₁, ϕ₂, …, ϕ_n},

we have

x − Σ_{j=1}^n ⟨x,ϕ_j⟩ϕ_j ∈ H_n.

Now, let

z_n = x − Σ_{j=1}^n ⟨x,ϕ_j⟩ϕ_j.

Using (1.7.4) and Bessel’s inequality, we obtain

T z n  = Tn z n  ≤ Tn  z n  ≤ |λn | x . (1.7.5)

But

⎛ ⎞

n 
n 
n
T ⎝x − x, ϕ j ϕ j ⎠ = T (x) − x, ϕ j T (ϕ j ) = T (x) − λ j x, ϕ j ϕ j .
j=1 j=1 j=1

Therefore, using (1.7.5)


 
 
 n

T (x) − λ j x, ϕ j ϕ j 
  = T z n  ≤ |λn | x .
 j=1 

Take n → ∞ and use Theorem 1.7.1 to obtain




T (x) = λ j x, ϕ j ϕ j . (1.7.6)

This shows that the range space of T is spanned by the eigenvectors {ϕ_n}. That is, letting

M = span{ϕ_n}_{n∈ℕ},

then (1.7.6) makes it clear that R(T) ⊆ M‾, hence

R(T)‾ ⊆ M‾.

On the other hand, let

y = Σ α_nϕ_n ∈ M

be a finite linear combination. Then (all the λ_j here are nonzero)

x = Σ (α_j/λ_j)ϕ_j ∈ H

and

T(x) = T(Σ (α_j/λ_j)ϕ_j) = Σ α_j T(ϕ_j)/λ_j = Σ α_jϕ_j = y.

Therefore, y ∈ R(T), and so M ⊆ R(T); taking closures, M‾ ⊆ R(T)‾. This completes the proof. ∎

Problem 1.11.34 gives another approach to show that {ϕ_n} is an orthonormal basis for the closure of the range space. The preceding theorem allows us to construct an orthonormal basis for R(T)‾ from the eigenvectors of T, whether H is separable or not. The form

T(x) = Σ_{j=1}^∞ λ_j⟨x, ϕ_j⟩ϕ_j,

which was obtained at the end of the proof, is called the diagonal operator with entries {λ_n} on the diagonal:

diag(λ₁, λ₂, λ₃, …).

It turns out that compact self-adjoint operators defined on Hilbert spaces can be unitarily diagonalized, a property which is similar to that of finite-dimensional linear mappings.
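The diagonal form can be checked in the matrix case: with orthonormal eigenvectors ϕ_j and eigenvalues λ_j of a symmetric matrix, Tx = Σ_j λ_j⟨x,ϕ_j⟩ϕ_j for every x. A sketch (an illustration only, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 6))
T = (A + A.T) / 2
lam, Phi = np.linalg.eigh(T)     # Phi[:, j] = phi_j, an orthonormal basis

x = rng.standard_normal(6)
diagonal_form = sum(lam[j] * (x @ Phi[:, j]) * Phi[:, j] for j in range(6))
print(np.allclose(diagonal_form, T @ x))  # True
```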

1.7.4 Spectral Theorem For Self-adjoint Compact Operators

The next theorem is a continuation of the previous theorem in case the space H is separable. This will extend the eigenvectors to cover the whole space, and a complete representation will be given.
Theorem 1.7.5 (Spectral Theorem for Self-adjoint Compact Operators)
Let T* = T ∈ K(H) (i.e., compact and self-adjoint) on a Hilbert space H. Then, its eigenfunctions form an orthonormal basis for the space H. This orthonormal basis is countable if H is separable.

Proof Write

H = R(T)‾ ⊕ N(T).

So it suffices to find a basis for the null space of T. If H is separable, then so is the null space N(T). Let {ϕ_n} be a countable orthonormal basis for N(T). Since Tϕ_n = 0 for all n, we consider them as eigenvectors corresponding to λ = 0. Let

{φ_n} = {ϕ_n} ∪ {e_n},

where {e_n} is the orthonormal basis for R(T)‾ that was constructed in the proof of the Hilbert–Schmidt theorem. Note that Proposition 1.5.2(1) implies that

⟨e_n, ϕ_m⟩ = 0

for all n, m. Indeed, any x ∈ H can be written uniquely as

x = Σ_n [⟨x,e_n⟩e_n + ⟨x,ϕ_n⟩ϕ_n].

Therefore, {φ_n} is a countable orthonormal basis for H. If H is nonseparable, then we choose an orthonormal set {ϕ_α : α ∈ Λ} to be a basis for N(T); for each α ∈ Λ, λ_α = 0. Thus

{φ_α} = {ϕ_α : α ∈ Λ} ∪ {e_n : n ∈ ℕ}.

This set is not necessarily countable. We proceed with the same argument as for the separable case. ∎
The spectral theorem in its two parts shows that a compact self-adjoint operator T ∈ K(H) can be written as

T(x) = Σ_j λ_j⟨x,ϕ_j⟩ϕ_j,

where {ϕ_n} is an orthonormal basis for H and {λ_n} is the set of the eigenvalues, which is either finite or countable, with |λ_n| decreasing and λ_n → 0. The next theorem shows that the converse of the spectral theorem holds as well.
Theorem 1.7.6 Let T ∈ B(H) be a bounded linear operator on a Hilbert space H such that for every x ∈ H,

T(x) = Σ_j λ_j⟨x,ϕ_j⟩ϕ_j,

where {ϕ_n} is an orthonormal basis for H and {λ_n} is a set of real numbers, either finite or countable, with |λ_n| decreasing and λ_n → 0. Then T is a compact self-adjoint operator.
Proof If the system {λ_n, ϕ_n} is finite, then T is of finite rank and thus it is compact. If not, then we define the following operators:

T_n(x) = Σ_{j=1}^n λ_j⟨x,ϕ_j⟩ϕ_j.

Then for each n, T_n is of finite rank, and so it is compact. Since |λ_n| is decreasing, we have

‖(T_n − T)(x)‖² = ‖Σ_{j=n+1}^∞ λ_j⟨x,ϕ_j⟩ϕ_j‖² = Σ_{j=n+1}^∞ λ_j² |⟨x,ϕ_j⟩|².

So by Bessel's inequality this yields

‖(T_n − T)(x)‖² ≤ λ_{n+1}²‖x‖², i.e., ‖T_n − T‖ ≤ |λ_{n+1}|.

As n −→ ∞, λ_n → 0, and this gives the uniform convergence of T_n to T, which implies that T is compact. Furthermore, we have

⟨Tx,y⟩ = ⟨Σ_j λ_j⟨x,ϕ_j⟩ϕ_j, y⟩
= Σ_j λ_j⟨x,ϕ_j⟩⟨ϕ_j,y⟩
= ⟨x, Σ_j λ_j⟨y,ϕ_j⟩ϕ_j⟩
= ⟨x,Ty⟩. ∎

1.8 Fredholm Alternative

1.8.1 Resolvent of Compact Operators

According to Proposition 1.6.4, a bounded linear operator T on a Banach space is bounded below if and only if T is injective and R(T) is closed. The following proposition treats this result differently.
Proposition 1.8.1 Let T ∈ K(X), and let 0 ≠ λ ∈ ℂ be a complex number. If T_λ is injective, then T_λ is bounded below.

Proof If not, then for each n, there exists x_n ∈ S_X such that

‖T_λ(x_n)‖ < 1/n.

Since {x_n} is bounded and T is compact, there exists a subsequence, say {x_{n_j}}, such that T(x_{n_j}) converges to, say, y. Thus

λx_{n_j} = T(x_{n_j}) − (T − λI)(x_{n_j}) → y − 0 = y,

which implies

x_{n_j} → y/λ.

By continuity of T, we get Ty = λy, and since ‖x_{n_j}‖ = 1, we have

‖y‖ = |λ| > 0.

Hence, λ is an eigenvalue of T, and therefore T_λ is not injective, a contradiction. ∎

As a consequence, we have the following.


Corollary 1.8.2 Let T ∈ K(X), and let 0 ≠ λ ∈ ℂ be a complex number. If T_λ = T − λI is injective, then R(T − λI) is closed; and if Y is a closed subspace of X, then T_λ(Y) is closed in X.

Proof The first part of the conclusion follows from Propositions 1.8.1 and 1.6.4. For the second part, it can be readily seen that T_λ|_Y is also bounded below by the previous proposition; hence, applying Proposition 1.6.4 on the Banach space Y gives that T_λ(Y) is closed. ∎

Theorem 1.8.3 Let T ∈ K(X), and let 0 ≠ λ ∈ ℂ be a complex number. If T_λ is injective, then T_λ is surjective.

Proof Suppose not. Let Y = T_λ(X). Then, by Corollary 1.8.2,

Y = Y‾ ⊊ X.

Consider Y_n = T_λⁿ(X). Since T_λ is injective and Y ⊊ X, we obtain a strictly decreasing chain

X ⊋ Y₁ ⊋ Y₂ ⊋ ⋯,

with each Y_n being closed. Let 0 < γ < 1. By the Riesz lemma, for each n there exists y_n ∈ Y_n \ Y_{n+1} with ‖y_n‖ = 1 such that

d(y_n, Y_{n+1}) ≥ γ. (1.8.1)

Note that if m > n, then

Y_{m+1} ⊊ Y_m ⊆ Y_{n+1}.

It follows that

(T − λI)y_m ∈ Y_{m+1} ⊆ Y_{n+1} and (T − λI)y_n ∈ Y_{n+1}.

This gives

Ty_n − Ty_m = λy_n − (λy_m + T_λ(y_m) − T_λ(y_n)) = λy_n − z, (1.8.2)

where

z = λy_m + T_λ(y_m) − T_λ(y_n) ∈ Y_{n+1}.

From (1.8.1) and (1.8.2), we have

‖Ty_n − Ty_m‖ = |λ|‖y_n − z/λ‖ ≥ |λ|γ > 0.

Hence, the sequence {Ty_n} cannot have a convergent subsequence, which contradicts the fact that T is compact. ∎

Now, the combination of Propositions 1.6.4 and 1.8.1, and Theorem 1.8.3 implies
the following.
Corollary 1.8.4 Let T ∈ K(X) for a Banach space X, and let 0 ≠ λ ∈ C be a complex number. Then, Tλ = T − λI is injective iff Tλ is invertible.

1.8.2 Fundamental Principle

Using the notion of a compact operator, we have been able to find an analog of the
result in the finite-dimensional case, which states that invertibility and injectivity are
equivalent for linear maps. The previous corollary implies that if

0 ≠ λ ∉ σp(T)

for some compact T, then λ ∈ ρ(T). This also means that for a compact operator

σp(T) \ {0} = σ(T) \ {0},

or in other words (since 0 ∈ σ(T) whenever X is infinite-dimensional),

σ(T) = σp(T) ∪ {0}.

This leads to the remarkable result, commonly called the Fredholm alternative, which states the following.
Theorem 1.8.5 (Fredholm Alternative) Let T ∈ K(X) for a Banach space X, and let 0 ≠ λ ∈ C be a complex number. Then, exactly one of the following holds:
(1) Tλ is noninjective, i.e., N(T − λI) ≠ {0}, or
(2) Tλ is invertible, i.e., T − λI has a bounded inverse.

An operator Tλ satisfying this Fredholm alternative principle is called a Fredholm operator. The two statements mean precisely the following: either the
equation
T (u) − λu = 0 (1.8.3)

has a nontrivial solution, or the equation

T (u) − λu = v (1.8.4)

has a unique solution for every v ∈ X .
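The dichotomy mirrors the familiar situation for matrices, where injectivity and invertibility of T − λI coincide. The following sketch (with a hypothetical 2 × 2 matrix, not taken from the text) illustrates both branches numerically:

```python
import numpy as np

# Finite-dimensional sketch of the alternative: for a matrix T and lam != 0,
# either T u - lam*u = 0 has a nontrivial solution (lam is an eigenvalue),
# or T - lam*I is invertible and T u - lam*u = v is uniquely solvable.
T = np.array([[2.0, 1.0],
              [0.0, 3.0]])
I = np.eye(2)

lam = 3.0                                   # an eigenvalue of T: branch (1)
assert np.linalg.matrix_rank(T - lam * I) < 2    # N(T - lam I) != {0}

lam = 5.0                                   # not an eigenvalue: branch (2)
v = np.array([1.0, -1.0])
u = np.linalg.solve(T - lam * I, v)         # unique solution of T u - lam u = v
assert np.allclose((T - lam * I) @ u, v)
```

The assumed matrix is compact automatically, since every operator on a finite-dimensional space is compact.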

1.8.3 Fredholm Equations

In the language of integral equations, we can also say: either the equation
λ f(x) = ∫_a^b k(x, t) f(t)dt

has a nontrivial solution f, or the equation

λ f(x) − ∫_a^b k(x, t) f(t)dt = g(x)

has a unique solution for every function g, keeping in mind that the integral operator
is a Hilbert–Schmidt operator, which is a compact operator. We state the Fredholm
Alternative theorem for Fredholm integral equations.
Theorem 1.8.6 (Fredholm Alternative for Fredholm Equations) Either the equation
∫_a^b k(x, y)u(y)dy − λu(x) = f(x)

has a unique solution for all f ∈ L²[a, b], or the equation

∫_a^b k(x, y)u(y)dy − λu(x) = 0

has a nontrivial solution u ∈ L²[a, b].
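A sketch of how the first branch is used in practice: the Nyström method (an assumed discretization, not from the text, with the hypothetical kernel k(x, y) = xy and λ = 1) replaces the integral by a trapezoidal quadrature, turning the inhomogeneous equation into a linear system whose unique solvability reflects the theorem:

```python
import numpy as np

# Nystrom sketch: the equation  ∫_0^1 k(x,y)u(y)dy - lam*u(x) = f(x)
# becomes (K W - lam*I)u = f, where W carries the quadrature weights.
n = 200
x, h = np.linspace(0.0, 1.0, n, retstep=True)
w = np.full(n, h); w[0] = w[-1] = h / 2        # trapezoidal weights

k = np.outer(x, x)                             # assumed kernel k(x_i, x_j) = x_i * x_j
lam = 1.0
A = k * w - lam * np.eye(n)                    # Nystrom matrix K W - lam I

u_exact = x                                    # manufactured solution u(x) = x
f = x / 3.0 - x                                # then f(x) = x/3 - x exactly
u = np.linalg.solve(A, f)
assert np.max(np.abs(u - u_exact)) < 1e-3      # the unique solution is recovered
```

Since λ = 1 is not an eigenvalue of this kernel, the discretized system is well conditioned and the manufactured solution is recovered to quadrature accuracy.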


Example 1.8.7 The Fredholm equation of the first kind takes the form
∫_a^b k(x, y) f(y)dy = g(x). (1.8.5)

This can be written as



K f = g,

where K is a compact integral operator defined as

K(·) = ∫_a^b k(x, y)(·)dy.

Here λ = 0, which lies outside the scope of the Fredholm alternative (Theorem 1.8.5): since K is compact and the space is infinite-dimensional, 0 ∈ σ(K), so K is not invertible, and we cannot find a unique solution of (1.8.5) for every g. If, in addition, 0 is an eigenvalue of K, then K is noninjective, i.e., K(f) = 0 has a nontrivial solution. Equations of the first kind are therefore ill-posed.

1.8.4 Volterra Equations

We state a variant of the Fredholm Alternative theorem for Volterra integral equations.


Theorem 1.8.8 (Existence and Uniqueness of Solutions of the Volterra Integral Equation) Consider the Volterra operator

(V u)(x) = ∫_0^x k(x, y)u(y)dy,

where V is defined on Lᵖ([0, 1]), 1 < p < ∞, and k ∈ C([0, 1] × [0, 1]). Then for all λ ≠ 0, there exists a unique solution of the equation

(λI − V)u = f

for all f ∈ Lᵖ([0, 1]).

Proof Extend the kernel to the square by

k̃(x, y) = k(x, y) for 0 ≤ y ≤ x, and k̃(x, y) = 0 for x < y ≤ 1.

Then the operator takes the Fredholm form

V u(x) = ∫_0^1 k̃(x, y)u(y)dy,

which is compact. Now we discuss the spectrum of V. Set f = 0 to obtain the following eigenvalue equation:

λu(x) = (V u)(x) = ∫_0^x k(x, y)u(y)dy (1.8.6)

for 0 < x ≤ 1. Let M > 0 be such that |k| < M on [0, 1] × [0, 1], and let

‖u‖₁ = α.

Then we obtain for a.e. x ∈ [0, 1]

|λ| |u(x)| ≤ |∫_0^x k(x, y)u(y)dy| ≤ αM.

Repeat the process on u, using the fact that λ²u = V²u = V(λu), to obtain

|λ|² |u(x)| ≤ |∫_0^x k(x, y)λu(y)dy| ≤ |∫_0^x k(x, y)αM dy| ≤ αM²x ≤ αM².

Iterating n times,

|λ|ⁿ |u(x)| ≤ Mⁿα · 1/(n − 1)!,

and taking n −→ ∞ gives either λ = 0 or u = 0 on [0, 1]; in other words, σ(V) = {0}. So for every λ ≠ 0, the eigenvalue equation (1.8.6) has only the trivial solution, alternative (1) of the dichotomy fails, and Vλ = V − λI is injective. Since V is compact, the result follows from the Fredholm Alternative: the inhomogeneous equation (λI − V)u = f has a unique solution for every f. □
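The fact that σ(V) = {0} has a transparent discrete analog. The sketch below (under the assumption k ≡ 1, a hypothetical choice) discretizes V with a left-endpoint rule, giving a strictly lower triangular matrix, which is nilpotent with all eigenvalues zero:

```python
import numpy as np

# Discrete sketch: (Vu)(x) = ∫_0^x u(y)dy with the left-endpoint rule becomes
# a strictly lower triangular matrix. Its eigenvalues are its diagonal entries
# -- all zero -- mirroring sigma(V) = {0}, and V^n = 0 exactly, matching the
# n-fold iteration bound used in the proof.
n = 100
h = 1.0 / n
V = np.tril(np.ones((n, n)), k=-1) * h         # V[i, j] = h for j < i, else 0

assert np.all(np.diag(V) == 0)                 # triangular => eigenvalues = diagonal = 0
Vn = np.linalg.matrix_power(V, n)
assert np.all(Vn == 0)                         # nilpotent: V^n = 0 exactly
```

Nilpotency is the matrix counterpart of the factorial decay |λ|ⁿ|u(x)| ≤ Mⁿα/(n − 1)! obtained above.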

1.9 Unbounded Operators

1.9.1 Introduction

All the operators studied so far fall under the class of bounded linear operators. In this section, we will investigate operators that are unbounded. The theory of unbounded operators is an important aspect of applied functional analysis, since some important operators encountered in applied mathematics and physics, such as differential operators, are unbounded, so developing a theory that treats these operators is of utmost importance. The following theorems are central in the treatment of bounded linear operators:

(1) The Fundamental Theorem of Linear Operators: If T : X −→ Y is a linear operator between two normed spaces, then T is bounded if and only if T is continuous.
(2) Bounded Inverse Theorem: If T : X −→ Y is a linear invertible operator between
two Banach spaces X, Y , then T is bounded if and only if T −1 is bounded.
(3) Closed Graph Theorem: If T : X −→ Y is a linear operator between two Banach
spaces X, Y and D(T ) = X , then T is bounded if and only if its graph is a closed
set in X × Y.
These three theorems describe the general framework of the theory of bounded linear
operators. When dealing with unbounded operators, it is understood in view of these
theorems that the operators are no longer continuous, and consequently we will have
to seek other notions that can redeem some of the properties that hold due to the
notion of continuity. The best choice is the notion of closedness as it generalizes the
notion of continuity.

1.9.2 Closed Operator

Recall that a mapping T : X −→ Y is continuous on its domain if whenever xn −→ x


we have T (xn ) −→ T (x). On the other hand, we have the following definition for
closed operators.
Definition 1.9.1 (Closed Operator) An operator T : X −→ Y is closed if

graph(T) = G(T) = {(x, T(x)) : x ∈ D(T)}

is a closed subspace in X × Y.

Remark It is important to note the following:

(1) The above definition is equivalent to saying that T : X −→ Y is closed if whenever xn −→ x with xn ∈ D(T) and T(xn) −→ y, we have x ∈ D(T) and y = T(x).
(2) This definition is different from the definition used in point-set topology, which states that a mapping is closed if the image of every closed set is closed. The two definitions are not equivalent: a mapping can have a closed graph without mapping closed sets to closed sets (simply think of f(x) = eˣ − 1, whose graph is closed but whose range (−1, ∞) is not).
If D(T ) is closed, then we will certainly have x ∈ D(T ), and we are only concerned
about the convergence of (T (xn )). Here, when xn −→ x in D(T ), the continuity
property ensures the convergence of the sequence (T (xn )) to T (x). On the other
hand, the closedness property won't guarantee the convergence of (T (xn)), but if
it happens, it is guaranteed to converge to T (x). The two properties are similar in
that both guarantee that the convergence of (T (xn )) won’t be to any element other
than T (x) but closedness doesn’t guarantee the convergence of this sequence as
it may diverge, whereas continuity (which is equivalent to boundedness for linear
operators) guarantees its convergence. It is evident that a continuous operator is

closed, but the converse is not necessarily true. The only type of discontinuity that may occur for a closed linear operator is an (infinite) essential discontinuity. Loosely speaking, if the domain of the operator is complete then x ∈ D(T), i.e., T(x) = y ∈ Im T, and this forces the operator to be bounded, since otherwise there would be a convergent sequence xn −→ x ∈ D(T) such that ‖T(xn)‖ −→ ∞, and this would break the closedness of the graph of T. It turns out that closed operators can redeem some of the properties of continuous operators.
Theorem 1.9.2 (Closed Range Theorem) Let T : X −→ Y be a closed linear operator between Banach spaces X and Y. If T is bounded below, then R(T) is closed in Y.

Proof The proof is similar to that of Proposition 1.6.4. Let yn ∈ R(T) with yn −→ y ∈ Y, and let xn ∈ D(T) be such that T xn = yn. Then {T xn} is Cauchy. It follows that

c ‖xn − xm‖ ≤ ‖T xn − T xm‖,

hence {xn} is Cauchy, and by completeness it converges to, say, x ∈ X. Since T is closed,

y = T(x) ∈ R(T),

and therefore R(T) is closed. □

In a sense analogous to the Bounded Inverse Theorem, we have the following.


Proposition 1.9.3 Let T : X −→ Y be an injective linear operator between Banach spaces X and Y. Then T is closed iff T⁻¹ is closed. Moreover, if T is closed and bijective, then T⁻¹ is bounded.

Proof Note that the graph of T⁻¹ is

G(T⁻¹) = {(T(x), x) : x ∈ D(T)},

so if G(T) is a closed subspace, then so is G(T⁻¹), since the argument for G(T) applies verbatim to G(T⁻¹): take

T(xn) = yn

with yn −→ y, and assume

T⁻¹(yn) = xn −→ x.

If, in addition, T is onto, then

D(T⁻¹) = Y,

which is Banach, so by the closed graph theorem T⁻¹ is bounded. □



This is a very interesting property of closed operators: a closed linear bijective operator between Banach spaces has a bounded and closed inverse even if the operator itself is unbounded. It also shows that the solution u of the equation Lu = f, for a closed bijective operator L, is controlled by f. Indeed, if Lu = f for some f ∈ R(L), then

‖u‖ = ‖L⁻¹ f‖ ≤ ‖L⁻¹‖ ‖f‖.

This result is useful in applications to differential equations when seeking well-posed solutions that depend continuously on the given data.

1.9.3 Basic Properties of Unbounded Operators

The sum and product of operators can be defined the same way as for the bounded
case.
Definition 1.9.4 Let T and S be two operators on X. Then
(1) D(T + S) = D(T ) ∩ D(S).
(2) D(ST ) = {x ∈ D(T ) : T (x) ∈ D(S)}.
(3) T = S if D(T ) = D(S) and T x = Sx for all x ∈ D(T ).
(4) S ⊂ T if D(S) ⊂ D(T ) and T |D(S) = S. In this case, T is said to be an extension
of S.
Note that from (2) and (3) above, we have in general T S ≠ ST, and they are equal only if

D(T S) = D(ST).

Furthermore, if

L : D(L) ⊂ X −→ X

and L⁻¹ exists, then D(L⁻¹L) = D(L) whereas D(L L⁻¹) = R(L), so L⁻¹L ⊂ I and equality holds only if D(L) = X.
The next result shows that the closedness of a linear operator T extends to its
Fredholm form Tλ .
Proposition 1.9.5 If T : X −→ Y is a closed linear operator then so is
Tλ = T − λI.

Proof Let xn −→ x with xn ∈ D(Tλ) = D(T), and assume Tλ(xn) −→ y. Then

‖xn − x‖ → 0

and

‖T xn − (λxn + y)‖ → 0.

Then

‖T xn − (λx + y)‖ = ‖T xn − λxn − y + λxn − λx‖
≤ ‖T xn − λxn − y‖ + |λ| ‖xn − x‖ → 0.

Thus,

T xn −→ λx + y.

Since T is closed, x ∈ D(T) and T x = λx + y, that is,

T x − λx = Tλ(x) = y. □

A classic example of a closed unbounded operator is the differential operator.


Example 1.9.6 Let D : C¹[0, 1] −→ C[0, 1], D(u) = u′, where both spaces are endowed with the supremum norm ‖·‖∞. It is clear that D is linear. Let un −→ u and

D(un) = un′ −→ v

in the supremum norm, i.e., the convergence is uniform. Then

∫_0^x v(τ)dτ = lim ∫_0^x un′(τ)dτ = lim[un(x) − un(0)] = u(x) − u(0).

That is, we have

u(x) = u(0) + ∫_0^x v(τ)dτ.

Hence u′ = D(u) = v and u ∈ C¹[0, 1]. Thus D is closed. If we let un(x) = xⁿ, then ‖un‖∞ = 1 and

‖D(un)‖∞ = ‖nxⁿ⁻¹‖∞ = n.

So

‖D‖ ≥ ‖D(un)‖∞ = n for every n.

Hence D is unbounded.
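The unboundedness in this example can be checked numerically. The following sketch (with a hypothetical grid on [0, 1]) confirms that ‖un‖∞ = 1 while ‖D(un)‖∞ = n for un(x) = xⁿ:

```python
import numpy as np

# Sketch of Example 1.9.6: u_n(x) = x^n has sup norm 1 on [0, 1], while its
# derivative D(u_n) = n x^(n-1) has sup norm n, so no constant C can satisfy
# ||D u||_inf <= C ||u||_inf for all u in C^1[0, 1].
x = np.linspace(0.0, 1.0, 10001)
for n in (1, 5, 25, 125):
    u = x**n
    du = n * x**(n - 1)                         # exact derivative of x^n
    assert np.isclose(np.max(np.abs(u)), 1.0)   # ||u_n||_inf = 1
    assert np.isclose(np.max(np.abs(du)), n)    # ||D u_n||_inf = n
```

Both suprema are attained at x = 1, so the grid values are exact rather than approximate.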

In view of the preceding example, it is apparent that the class of closed unbounded operators contains some of the most important linear operators, and a comprehensive theory for this class is certainly needed to elaborate on their properties. It is well known that the inverse of the differential operator is the integral operator, which was found to be compact and self-adjoint. So our next step will be to investigate ways to define the adjoint of these unbounded operators.

1.9.4 Toeplitz Theorem

Recall that the adjoint of a bounded linear operator T : X → Y is defined as

T∗ : Y∗ −→ X∗

given by

T∗y∗ = y∗ ∘ T

for y∗ ∈ Y∗. If X and Y are Hilbert spaces, then the adjoint of an operator T : H₁ → H₂ is defined as T∗ : H₂ → H₁,

⟨T x, y⟩ = ⟨x, T∗y⟩

for all x ∈ H₁ and y ∈ H₂. If T is bounded then D(T∗) = H₂, and so the adjoint is defined for all y ∈ H₂, for which there is y∗ = T∗y such that

⟨T x, y⟩ = ⟨x, T∗y⟩

for all x ∈ H₁. In the unbounded case, this construction might cause trouble and won't give rise to a well-defined adjoint mapping, since (T(xn))n will diverge for some sequence (xn); therefore we need to restrict the domain of T∗ to consist of only those elements y that make x ↦ ⟨T x, y⟩ bounded for all x ∈ D(T). The following theorem illustrates this.
Theorem 1.9.7 (Toeplitz Theorem) Let T : H −→ H be a linear operator with D(T) = H. If ⟨T x, y⟩ = ⟨x, T y⟩ for all x, y ∈ H, then T is bounded.

Proof If T is unbounded, then there is a sequence zn ∈ H such that ‖zn‖ = 1 and

‖T(zn)‖ −→ ∞. (1.9.1)

Define the sequence of functionals

fn(x) = ⟨T x, zn⟩.

Note that for every x ∈ H

|fn(x)| = |⟨T x, zn⟩| ≤ ‖T x‖,

and for each n

|fn(x)| = |⟨x, T zn⟩| ≤ ‖T zn‖ ‖x‖,

so each fn is bounded. By the uniform boundedness principle, ‖fn‖ ≤ M for some M > 0, i.e.,

|fn(x)| ≤ M ‖x‖

for every x ∈ H. Then

‖T zn‖² = ⟨T zn, T zn⟩ = ⟨T(T zn), zn⟩ = |fn(T zn)| ≤ M ‖T zn‖,

so ‖T zn‖ ≤ M, which contradicts (1.9.1). □

1.9.5 Adjoint of Unbounded Operators

The Toeplitz theorem indicates that for a symmetric unbounded operator we must have D(T) ⊊ H. Another problem that arises in obtaining a well-defined adjoint is that T∗y must be uniquely determined for each y ∈ D(T∗). This can be guaranteed if D(T) is made as large as possible; since D(T) ⊊ H, we require instead that D(T) be dense, i.e., D(T)‾ = H. Indeed, if D(T)‾ ⊊ H, then by the orthogonal decomposition of Hilbert spaces

H = D(T)‾ ⊕ (D(T)‾)⊥,

we can find

0 ≠ y₀ ∈ (D(T)‾)⊥

such that ⟨x, y₀⟩ = 0 for all x ∈ D(T). But then, whenever y∗ satisfies ⟨T x, y⟩ = ⟨x, y∗⟩ for all x ∈ D(T), we also have

⟨T x, y⟩ = ⟨x, y∗⟩ = ⟨x, y∗⟩ + ⟨x, y₀⟩ = ⟨x, y∗ + y₀⟩,

so both y∗ and y∗ + y₀ qualify as T∗y, and the adjoint is not uniquely determined. Hence, by making the domain of T dense, we obtain uniqueness of T∗y, since (D(T)‾)⊥ = {0} in this case. An operator T : X → Y with D(T)‾ = X is called densely defined.
The above argument proposes the following definition for general (possibly
unbounded) operators.
Definition 1.9.8 (Adjoint of General Operator) Let T : D(T) ⊂ H −→ H be a densely defined linear operator on a Hilbert space H, i.e., D(T)‾ = H. Then the adjoint of T is defined as T∗ : D(T∗) → H,

⟨T x, y⟩ = ⟨x, T∗y⟩

for all x ∈ D(T) and y ∈ D(T∗), where

D(T∗) = {y ∈ H : there exists y∗ ∈ H with ⟨T x, y⟩ = ⟨x, y∗⟩ for all x ∈ D(T)}.

The collection of all linear (possibly unbounded) operators on a space X is denoted by L(X).
In the bounded case, it is well-known that for T, S ∈ B(H),

T ∗ + S ∗ = (T + S)∗

and
T ∗ S ∗ = (ST )∗ .

The next proposition shows that this is not the case for unbounded operators.
Proposition 1.9.9 Let T, S ∈ L(H) be two densely defined operators. Then
(1) (αT)∗ = ᾱT∗.
(2) If S ⊂ T then T ∗ ⊂ S ∗ .
(3) T ∗ + S ∗ ⊂ (T + S)∗ .
(4) If ST is densely defined, then T ∗ S ∗ ⊂ (ST )∗ .
Proof For (1), we have

⟨αT x, y⟩ = ⟨x, (αT)∗y⟩ (1.9.2)

for all x ∈ D(T). On the other hand,

⟨αT x, y⟩ = α⟨T x, y⟩ = α⟨x, T∗y⟩ = ⟨x, ᾱT∗y⟩. (1.9.3)

Then (1) follows from (1.9.2) and (1.9.3).


For (2), let y ∈ D(T∗). Then

⟨x, T∗y⟩ = ⟨T x, y⟩

for all x ∈ D(T), hence in particular for all x ∈ D(S). But from Definition 1.9.4(4),

T(x) = S(x)

for all x ∈ D(S). So

⟨x, T∗y⟩ = ⟨Sx, y⟩

for all x ∈ D(S), which implies that y ∈ D(S∗) and

S∗y = T∗y for all y ∈ D(T∗).

This gives (2).



For (3), by Definition 1.9.4(1), let

y ∈ D(T∗ + S∗) = D(T∗) ∩ D(S∗).

Then y ∈ D(T∗) and y ∈ D(S∗). Hence,

⟨x, T∗y⟩ = ⟨T x, y⟩

for all x ∈ D(T), and

⟨x, S∗y⟩ = ⟨Sx, y⟩

for all x ∈ D(S). It follows that

⟨x, T∗y⟩ + ⟨x, S∗y⟩ = ⟨T x, y⟩ + ⟨Sx, y⟩ (1.9.4)

for all

x ∈ D(T) ∩ D(S) = D(T + S).

But (1.9.4) can be written as

⟨x, (T∗ + S∗)y⟩ = ⟨(T + S)x, y⟩.

Therefore, y ∈ D((T + S)∗) and

(T∗ + S∗)y = (T + S)∗y

for all y ∈ D(T∗ + S∗). This proves (3).

To prove (4), let y ∈ D(T∗S∗). Then

⟨ST x, y⟩ = ⟨T x, S∗y⟩ = ⟨x, T∗S∗y⟩

for all x ∈ D(ST). Hence y ∈ D((ST)∗) with (ST)∗y = T∗S∗y, and (4) is proved. □

1.9.6 Deficiency Spaces of Unbounded Operators

Proposition 1.2.4 gives the relations between the null space and the range of an operator and its adjoint. The result extends to the unbounded case as well.
Proposition 1.9.10 Let T ∈ L(H). Then

(1) N(T∗) = R(T)⊥ and N(T∗)⊥ = R(T)‾. T and T∗ in these identities can be interchanged to give N(T) = R(T∗)⊥ and N(T)⊥ = R(T∗)‾.
(2) For λ ∈ C, we have N(Tλ∗) = R(Tλ)⊥ and N(Tλ∗)⊥ = R(Tλ)‾, where Tλ∗ = (Tλ)∗; T and T∗ can likewise be interchanged.

Proof Note that y ∈ R(T)⊥ iff ⟨T x, y⟩ = 0 for all x ∈ D(T) iff ⟨x, T∗y⟩ = 0 for all x ∈ D(T) iff T∗y = 0 (by the density of D(T)) iff y ∈ N(T∗), and this gives N(T∗) = R(T)⊥. All the other identities can be proved similarly and are left to the reader to verify. □

The decomposition of a Hilbert space is therefore possible for linear operators that are possibly unbounded. Namely, letting T ∈ L(H),

R(T)‾ ⊕ N(T∗) = H.

The spaces N(Tλ∗) = R(Tλ)⊥ are called the deficiency spaces, and their dimensions

dim N(Tλ∗)

are called the deficiency indices.

1.9.7 Symmetry of Unbounded Operators

Due to the densely defined domains, we need the following definition for symmetric
operators.
Definition 1.9.11 (Symmetric Operator) Let T ∈ L(H). Then T is symmetric if

⟨T x, y⟩ = ⟨x, T y⟩

for all x, y ∈ D(T). The operator T is self-adjoint if T = T∗.

Proposition 1.9.12 Let T ∈ L(H) and D(T)‾ = H. Then

(1) T is symmetric if and only if T ⊂ T∗.
(2) If T is symmetric and R(T) = H, then T is injective. Conversely, if T is self-adjoint and injective, then R(T)‾ = H.
(3) If T is symmetric and R(T) = H, then T is self-adjoint.

Proof For (1), let y ∈ D(T). If ⟨T x, y⟩ = ⟨x, T y⟩ for all x ∈ D(T), then clearly y ∈ D(T∗) with T∗y = T y, so D(T) ⊂ D(T∗) and T ⊂ T∗. This gives the first direction. Conversely, let T ⊂ T∗. Then T∗ = T on D(T), so for all x, y ∈ D(T)

⟨T x, y⟩ = ⟨x, T∗y⟩ = ⟨x, T y⟩,

so T is symmetric.

For (2), let x ∈ D(T) with T x = 0. Then

⟨T x, y⟩ = ⟨x, T y⟩ = 0

for all y ∈ D(T). So x ⊥ R(T), and since R(T) = H, we have x = 0. Conversely, assume T is self-adjoint and injective, and suppose R(T)‾ ⊊ H. Then there exists

0 ≠ z ∈ R(T)⊥ = N(T∗) = N(T)

by Proposition 1.9.10, which contradicts the injectivity of T. Hence R(T)‾ = H.
For (3), let T be symmetric. By (1), T ⊂ T∗, so it suffices to show that D(T∗) ⊂ D(T), which gives the other direction. Let y ∈ D(T∗). Then

⟨T x, y⟩ = ⟨x, T∗y⟩

for all x ∈ D(T). But note that T∗y ∈ H, so by the surjectivity of T, there exists z ∈ D(T) such that

T z = T∗y.

Consequently,

⟨T x, y⟩ = ⟨x, T∗y⟩ = ⟨x, T z⟩ = ⟨T x, z⟩,

so y − z ⊥ R(T) = H; hence y = z ∈ D(T) and so

D(T∗) ⊂ D(T). □

Note that using the adjoint operator, the Toeplitz theorem becomes more accessible and follows easily from the preceding proposition. We can simply argue as follows: if T is symmetric, then by the preceding proposition T ⊂ T∗, which implies

D(T) ⊂ D(T∗),

but since D(T) = H, we also have

D(T∗) ⊆ D(T),

which implies

D(T) = D(T∗)

and therefore T = T∗. By Theorem 1.9.15 below, T = T∗ is closed, and since D(T) = H, the closed graph theorem then shows that T is bounded.
The next theorem discusses the connection between adjoints and inverses. In the bounded case, it is known that T is invertible if and only if T∗ is invertible, and

(T ∗ )−1 = (T −1 )∗ .

This identity extends to general linear invertible densely defined operators that are
not necessarily bounded. A more interesting result is to assume symmetry rather than
injectivity.
Theorem 1.9.13 Let T ∈ L(H) be symmetric. If D(T)‾ = H and R(T) = H, then (T⁻¹)∗ exists, (T∗)⁻¹ exists, and

(T⁻¹)∗ = (T∗)⁻¹.

Proof By Proposition 1.9.12(2), T is injective, hence T : D(T) −→ R(T) is invertible. A similar argument shows that for y ∈ D(T∗), T∗y = 0 implies y = 0. So T⁻¹ and (T∗)⁻¹ exist. Also,

D(T⁻¹) = R(T) = H,

so T⁻¹ is densely defined and (T⁻¹)∗ exists. It follows that

⟨x, y⟩ = ⟨T T⁻¹x, y⟩ = ⟨T⁻¹x, T∗y⟩ = ⟨x, (T⁻¹)∗T∗y⟩

for all x ∈ H and y ∈ D(T∗). So we obtain

(T⁻¹)∗T∗y = y on D(T∗)

and R(T∗) ⊂ D((T⁻¹)∗). On the other hand, the inverse of T∗ is

(T∗)⁻¹ : R(T∗) −→ D(T∗).

Consequently,

D((T∗)⁻¹) = R(T∗) ⊂ D((T⁻¹)∗),

therefore

(T∗)⁻¹ ⊂ (T⁻¹)∗. (1.9.5)

Similarly,

⟨x, y⟩ = ⟨T⁻¹T x, y⟩
= ⟨T x, (T⁻¹)∗y⟩
= ⟨x, T∗(T⁻¹)∗y⟩

for all x ∈ D(T) and y ∈ D((T⁻¹)∗). Then

T∗(T⁻¹)∗y = y on D((T⁻¹)∗),

so for all y ∈ D((T⁻¹)∗) we have

y ∈ R(T∗) = D((T∗)⁻¹),

whence

(T⁻¹)∗ ⊂ (T∗)⁻¹. (1.9.6)

From (1.9.5) and (1.9.6), we conclude that (T −1 )∗ = (T ∗ )−1 . 
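In finite dimensions, where every operator is bounded and the adjoint is the conjugate transpose, the identity can be verified directly. The following sketch (with a random, hypothetical matrix standing in for T) checks (T⁻¹)∗ = (T∗)⁻¹ numerically:

```python
import numpy as np

# Finite-dimensional sketch of Theorem 1.9.13: for an invertible matrix T,
# the conjugate transpose of the inverse equals the inverse of the conjugate
# transpose. The theorem extends this identity to densely defined symmetric
# operators, where boundedness is no longer automatic.
rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

lhs = np.linalg.inv(T).conj().T               # (T^{-1})*
rhs = np.linalg.inv(T.conj().T)               # (T*)^{-1}
assert np.allclose(lhs, rhs)
```

A random complex matrix is invertible with probability one; for matrices the identity holds without any symmetry assumption.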

An important corollary is
Corollary 1.9.14 Let T ∈ L(H) be densely defined and injective. If T is self-adjoint
then T −1 is self-adjoint.
The next result asserts that the adjoint of any densely defined operator is closed.
Theorem 1.9.15 If T ∈ L(H) is densely defined, i.e., D(T)‾ = H, then T∗ is closed.

Proof Let yn ∈ D(T∗) be such that yn −→ y and T∗(yn) −→ z. Then for every x ∈ D(T)

⟨T x, yn⟩ = ⟨x, T∗yn⟩ −→ ⟨x, z⟩

and

⟨T x, yn⟩ −→ ⟨T x, y⟩.

Hence

⟨x, z⟩ = ⟨T x, y⟩,

and this implies that y ∈ D(T∗) and T∗y = z. □

1.9.8 Spectral Properties of Unbounded Operators

The spectral properties of unbounded linear operators retain much of the structure found in the bounded case. The definitions are the same.

Definition 1.9.16 (Resolvent and Spectrum) Let X be a Banach space and T ∈ L(X). A scalar λ ∈ C is called a regular value of T if the resolvent

Rλ = Tλ⁻¹ = (T − λI)⁻¹ ∈ B(X),

that is, Tλ is boundedly invertible (i.e., a bijection with a bounded inverse operator). The set of all regular values of T is called the resolvent set and is denoted by ρ(T). The set σ(T) = C \ ρ(T) is the spectrum of T.
Note that to have a bounded inverse for an unbounded operator is more challeng-
ing. The notion of closedness will be invoked here as it will play an important role
in establishing some interesting properties for the unbounded operators.
Theorem 1.9.17 Let T ∈ L(X) be a densely defined closed operator on a Banach space X. Then λ ∈ ρ(T) iff Tλ is bijective.

Proof If λ ∈ ρ(T), then Tλ is boundedly invertible, and in particular bijective. Conversely, if Tλ is bijective, then

Tλ⁻¹ : X −→ D(Tλ)

exists. Moreover, Tλ is closed by Proposition 1.9.5, so Tλ⁻¹ is closed by Proposition 1.9.3, and its domain is the whole Banach space X. Therefore Tλ⁻¹ is bounded by the closed graph theorem, i.e., λ ∈ ρ(T). □

The preceding theorem gives a characterization of the resolvent set of a densely defined closed operator T ∈ L(X):

ρ(T) = {λ ∈ C : T − λI is bijective}.

The next result characterizes closed operators in terms of their resolvents. It basically says that if you can find at least one element in the resolvent set, then the operator is necessarily closed.

Proposition 1.9.18 Let T ∈ L(H) be a densely defined operator on a Hilbert space H. If ρ(T) ≠ Ø, then T is closed.

Proof If T is not closed then, for any λ ∈ C, Tλ is not closed (Proposition 1.9.5), and hence neither is Tλ⁻¹ whenever it exists (Proposition 1.9.3). But an everywhere-defined bounded operator is closed, so Tλ⁻¹ ∉ B(H), i.e., λ ∉ ρ(T), and consequently ρ(T) is empty. □

The preceding result indicates why dealing with closed operators is efficient.
This will simplify the work on self-adjoint operators knowing that every self-adjoint
operator is closed.
Theorem 1.9.19 Let T ∈ L(H) be a densely defined operator on a Hilbert space
H. Then T is self-adjoint iff T is symmetric and σ(T ) ⊂ R.

Proof If T is self-adjoint, then it is closed by Theorem 1.9.15. Let λ ∈ C, λ = a + bi with b ≠ 0. Then, using the symmetry of T,

‖(T − λ)x‖² = ‖(T − a)x‖² + b²‖x‖² ≥ b²‖x‖², (1.9.7)

so T − λI is bounded below and hence injective, with closed range by Theorem 1.9.2. The same estimate applies to λ̄, so T − λ̄I is also injective, and by Proposition 1.9.10(2),

R(T − λI) = R(T − λI)‾ = N(T∗ − λ̄I)⊥ = N(T − λ̄I)⊥ = {0}⊥ = H.

Thus T − λI is bijective, and by Theorem 1.9.17, λ ∈ ρ(T). Since λ was an arbitrary nonreal number, σ(T) ⊂ R.

Conversely, let σ(T) ⊂ R. Then ±i ∈ ρ(T), so the operators T ∓ iI are boundedly invertible and R(T ∓ iI) = H. In particular, by Proposition 1.9.10(2),

N(T∗ − iI) = R(T + iI)⊥ = H⊥ = {0}.

Now, let y ∈ D(T∗). Since R(T − iI) = H, there exists x ∈ D(T) such that

(T − iI)x = (T∗ − iI)y.

By symmetry, T∗ is an extension of T, so (T∗ − iI)x = (T − iI)x. It follows that x − y ∈ D(T∗) and

(T∗ − iI)(x − y) = 0,

hence x − y ∈ N(T∗ − iI) = {0}, and therefore x = y ∈ D(T). Consequently, D(T) = D(T∗) and T = T∗. □
We end the section with the following important criterion for self-adjointness of
linear operators.
Theorem 1.9.20 Let T ∈ L(H) be a densely defined and symmetric operator on a
Hilbert space H. Then the following are equivalent:
(1) T is self-adjoint.
(2) T is closed and N (T ∗ ± i I ) = {0}.
(3) R(T ± i I ) = H.

Proof Clearly (1) gives (2) by Proposition 1.9.18 and Theorem 1.9.19. Suppose (2) holds. By Proposition 1.9.10,

R(T ± iI)‾ = N(T∗ ∓ iI)⊥ = {0}⊥ = H.

Since T is symmetric, we use (1.9.7) to conclude that T ± iI is bounded below; it is also closed (Proposition 1.9.5), hence by Theorem 1.9.2, R(T ± iI) is closed and therefore

R(T ± iI) = R(T ± iI)‾ = H,

and (3) is proved. Now, suppose (3) holds. Since T is symmetric, we have

D(T) ⊂ D(T∗).

Let y ∈ D(T ∗ ). Then

(T ∗ + i I )y = z ∈ R(T ∗ + i I ) ⊆ H.

By (3), there exists x ∈ D(T ) such that

(T + i I )x = z = (T ∗ + i I )y.

Now, (1) can be obtained using the same argument as in the proof of the
preceding theorem. 

1.10 Differential Operators

This section is a continuation of the preceding section on unbounded operators. We will investigate differential operators as the most important class of unbounded operators. Example 1.9.6 demonstrates that the derivative operator is a closed but not bounded linear operator; it is a prototype of the class of closed unbounded operators. In this section, we will discuss cases in which differential operators are self-adjoint, which enables us to use the results of the preceding section and conclude that the inverse of the differential operator is also closed and self-adjoint. This inverse operator is nothing but the integral operator, which has already been proved to be a compact operator, and whose spectral properties were discussed in Sect. 1.8. This will help us answer questions about the existence of solutions of differential equations, and about the eigenvalues and corresponding eigenfunctions of eigenvalue boundary value problems.

1.10.1 Green’s Function and Dirac Delta

Consider the differential equation

Lu(x) = f (x) (1.10.1)

for some differential operator L . If L is invertible then the solution of the equation
above is given by u = L −1 f, and so the equation is written as

L(L −1 f ) = f.

Note that since L⁻¹ is an integral operator, it has a kernel, say k(x, t); namely,

L⁻¹ f = ∫ k(x, t) f(t)dt,

hence

L(L⁻¹ f) = ∫ (L k(x, t)) f(t)dt = f. (1.10.2)

Hence we obtain that

u = L⁻¹ f = ∫ k(x, t) f(t)dt

is the solution to Eq. (1.10.1). The situation of the function Lk(x, t) in (1.10.2) is
rather unusual since the integral of its product with f gives f again, and this has no
explanation in the classical theory of derivatives. This problem was investigated by
Dirac in 1922, and he extended the concept of the Kronecker delta

δij = 1 if i = j, and δij = 0 if i ≠ j,

which helps select an element, say ak, from a set S = {a₁, a₂, ...} by means of the operation

ak = Σj δjk aj, (1.10.3)

and it was required by normalization that

Σj δjk = 1.

The notion was extended to the so-called Dirac delta function

δ(x) = 0 for x ≠ 0, and δ(x) = ∞ for x = 0, (1.10.4)

and the process in (1.10.3) becomes

∫_{−∞}^{∞} δ(x) f(x)dx = f(0),

and in general

∫_{−∞}^{∞} δ(x − t) f(x)dx = f(t). (1.10.5)

Moreover, the normalization condition took the form

∫_{−∞}^{∞} δ(x)dx = 1.
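The sifting property (1.10.5) can be made concrete by the standard device (an assumption here, not the text's construction) of approximating δ by Gaussians of unit mass and shrinking width:

```python
import numpy as np

# Sketch: realize δ as a limit of narrowing Gaussians δ_ε of unit mass.
# Each δ_ε satisfies the normalization condition, and as ε → 0 the
# integral ∫ δ_ε(x − t) f(x) dx approaches f(t), as in (1.10.5).
def delta_eps(x, eps):
    return np.exp(-x**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

x, dx = np.linspace(-10.0, 10.0, 200001, retstep=True)
t = 0.5
f = np.cos(x)                                   # a sample test function

for eps in (0.1, 0.01):
    d = delta_eps(x - t, eps)
    assert abs(d.sum() * dx - 1.0) < 1e-6       # normalization: ∫ δ_ε = 1

sift = (delta_eps(x - t, 0.01) * f).sum() * dx  # ∫ δ_ε(x − t) f(x) dx
assert abs(sift - np.cos(t)) < 1e-3             # ≈ f(t)
```

The Gaussian family is only one of many delta sequences; any nonnegative unit-mass family concentrating at the origin behaves the same way.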

Consequently, we obtain

L k(x, t) = δ(x − t). (1.10.6)

Of course, the way the Dirac delta was created doesn't make it well-defined, and the treatment above doesn't stand on a firm mathematical foundation, so a rigorous analysis was needed to validate the construction of the Dirac delta. Some great mathematicians, such as Sobolev, Schwartz, and others, were among the first to carry out this analysis, which led to the creation of distribution theory and Sobolev spaces. In fact, the observation and the debate about the Dirac delta was a stepping stone that led to the creation of this important area of functional analysis. The kernel k(x, t) is called Green's function, and it is the solution of the equation

L_x k(x, t) = δ(x − t),

where L_x is the differential operator that is the inverse of the integral operator of which k(x, t) is the kernel, and the subscript x of L_x denotes the variable under differentiation.
The strategy is to define L x such that it is densely defined on a separable Hilbert
space, say L 2 , injective, and symmetric. In order to prove it is self-adjoint, we can
use the definition to show that D(T ∗ ) = D(T ) but this might be a challenging task
in many cases, so using theorems and results of the preceding section can be more
helpful. One can show that T is surjective, then use Proposition 1.9.12(3) to conclude
that T is self-adjoint. Alternatively, one can show that σ(T ) ⊆ R, then use Theorem
1.9.19. After we prove that L is self-adjoint, we apply Corollary 1.9.14 to conclude
that L −1 (which is an integral operator) is self-adjoint. We also know that the kernel
of the integral operator must be symmetric in order for the integral operator to be
self-adjoint, although Corollary 1.9.14 ensures that the inverse operator L −1 is self-
adjoint if L is densely defined, injective, and self-adjoint. It turns out that whenever
the differential operator L is self-adjoint, the kernel k of the integral operator L −1
is guaranteed to be symmetric so that L −1 is self-adjoint. Therefore, we can apply
Theorems 1.7.4 and 1.7.5 to conclude the existence of a decreasing countable set of
eigenvalues (λn ) for the integral operator L −1 with a countable set of eigenfunctions
(ϕn ) that form an orthonormal basis for the space, and such that

L −1 ϕn = λn ϕn .

But the eigenfunctions of the operators L and L −1 are the same, and the eigenvalues
are the reciprocals of each other. Thus we have

Lϕn = μn ϕn ,

where

μn = 1/λn −→ ∞

are the eigenvalues of the differential operator L. This can be seen directly from (1.10.1). Indeed, if Eq. (1.10.1) is of the form Lu = λu, then

u = ∫ k(x, t)λu(t)dt,

or

(1/λ)u(x) = L⁻¹u = ∫ k(x, t)u(t)dt,

so each eigenvalue of the integral operator L⁻¹ is the reciprocal of an eigenvalue of the differential operator L.
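This reciprocal relationship can be observed numerically. The sketch below discretizes the integral operator L⁻¹ on [0, 1] with the kernel G(x, t) = min(x, t)(1 − max(x, t)) — the Green's function for −u″ with homogeneous conditions on [0, 1], assumed here and derived in the next subsection — and checks that its largest eigenvalues are 1/(n²π²):

```python
import numpy as np

# Nystrom sketch: eigenvalues of the integral operator with Green's kernel
# G(x, t) = min(x, t)(1 - max(x, t)) on [0, 1] should be the reciprocals
# 1/(n^2 pi^2) of the eigenvalues of L = -d^2/dx^2 with u(0) = u(1) = 0.
n = 400
x, h = np.linspace(0.0, 1.0, n + 2, retstep=True)
xi = x[1:-1]                                    # interior quadrature nodes
X, T = np.meshgrid(xi, xi, indexing="ij")
G = np.minimum(X, T) * (1.0 - np.maximum(X, T)) # symmetric kernel matrix

mu = np.sort(np.linalg.eigvalsh(G * h))[::-1]   # eigenvalues of discretized L^{-1}
for k in range(1, 4):
    assert abs(mu[k - 1] - 1.0 / (k**2 * np.pi**2)) < 1e-4
```

The eigenvalues accumulate at 0, as they must for a compact operator, while their reciprocals n²π² tend to infinity.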
We will take the Laplacian operator as an example.

1.10.2 Laplacian Operator

The Laplacian operator is the differential operator

Lu = −∇²u,

where the minus sign is adopted for convenience. Consider the following one-dimensional equation:

Lu = −u″ = f

defined on L²([a, b]). The operator acts as

L : D(L) ⊂ L²([a, b]) −→ L²([a, b]);

it is well known that not all functions in L²([a, b]) are twice differentiable, so

D(L) ⊊ L²([a, b]),

and L cannot be defined on the whole space. In fact, L is densely defined, since the space C²[a, b], consisting of all functions that are twice continuously differentiable on [a, b], is dense in L²([a, b]), as we will see in Chap. 3, so let us take it for granted now. So L∗ is well-defined.

To prove symmetry, we proceed as follows:

⟨Lu, v⟩ − ⟨u, Lv⟩ = ∫_a^b (−u″v + uv″) dx = [−u′v + uv′]_a^b.

Hence, it is required to assume the homogeneous conditions u(a) = u(b) = 0, from which we obtain that L is symmetric. In fact, the homogeneous conditions, besides being helpful in establishing self-adjointness, are also helpful in proving the injectivity of the operator. Indeed, if Lu = 0 with u(a) = u(b) = 0, then the only solution to the problem is u = 0, so L is injective. Furthermore, it is easy to show that the spectrum consists of real numbers. Indeed, integration by parts yields
⟨u, −u″⟩ = −∫_a^b uu″ dx
= −∫_a^b (uu′)′ dx + ∫_a^b (u′)² dx
= −[uu′]_a^b + ∫_a^b (u′)² dx
= ∫_a^b (u′)² dx ≥ 0.

Therefore, the Laplacian operator −u″ is a positive operator, and thus all its
eigenvalues are nonnegative (see Problem 1.11.58). Therefore, by Theorem 1.9.19
we see that L is self-adjoint, and so by Corollary 1.9.14, the inverse operator L −1
exists and it is also self-adjoint. The operator L −1 is given by
L⁻¹ f = K f = ∫_a^b G(x, t) f (t)dt. (1.10.7)

Here, G(x, t) is Green’s function and it is the kernel of the operator, which is neces-
sarily symmetric since L −1 is self-adjoint. If G is continuous on [a, b] × [a, b] then
L −1 is compact, and consequently we can apply the Hilbert–Schmidt theorem and
the spectral theory of compact self-adjoint operators. From (1.10.6), and since δ = 0
for x ≠ t, Green's function necessarily satisfies the boundary conditions:

G(a, t) = G(b, t) = 0.

Moreover,
LK f = −(d²/dx²)K f = −∫_a^b Gxx(x, t) f (t)dt = f (x),

and from (1.10.5) we obtain

−Gxx(x, t) = δ(x − t).

If x < t we have

−Gxx(x, t) = 0,

from which we get

G(x, t) = c1 (t) + c2 (t)x.

At x = t, integrating −Gxx = δ(x − t) across the singularity gives

∫_{t−}^{t+} Gxx dx = Gx(t⁺; t) − Gx(t⁻; t) = −1.

Using the boundary conditions and the jump discontinuity of Gx gives



G(x; t) = (1/(b − a)) { (x − a)(b − t) if x ≤ t;  (t − a)(b − x) if t ≤ x },

which is clearly continuous on [a, b] × [a, b] and symmetric as expected. Therefore


we have the following theorem.
Theorem 1.10.1 The solution to the problem

−u″ = f

defined on L 2 ([a, b]) with the conditions

u(a) = u(b) = 0

is given by
u(x) = Σ_{n=1}^{∞} (1/λn) ⟨ f, ϕn⟩ ϕn(x)

where {λn } are the eigenvalues of L such that

λ1 < λ2 < . . .

where λn −→ ∞, and {ϕn } are the corresponding eigenfunctions which form an


orthonormal basis for L 2 ([a, b]).
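As a numerical sanity check of the Green's function derived above, take the assumed special case a = 0, b = 1, so G(x, t) = x(1 − t) for x ≤ t and t(1 − x) for t ≤ x. Applying the kernel to f ≡ 1 should reproduce the exact solution u(x) = x(1 − x)/2 of −u″ = 1 with u(0) = u(1) = 0; a minimal sketch:

```python
import numpy as np

def G(x, t):
    # Green's function of -d^2/dx^2 on [0, 1] with zero boundary values
    return np.where(x <= t, x * (1.0 - t), t * (1.0 - x))

N = 20000
t = (np.arange(N) + 0.5) / N          # midpoint quadrature nodes in (0, 1)
dt = 1.0 / N

def apply_K(f, x):
    # u(x) = \int_0^1 G(x, t) f(t) dt
    return np.sum(G(x, t) * f(t)) * dt

x0 = 0.3
u = apply_K(lambda s: np.ones_like(s), x0)   # solve -u'' = 1 at x = 0.3
exact = x0 * (1.0 - x0) / 2.0
symmetric = np.isclose(G(0.2, 0.7), G(0.7, 0.2))   # kernel symmetry
```

The symmetry check reflects the self-adjointness of L⁻¹ noted above.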

Example 1.10.2 Consider the problem



−u″ = f,

subject to the homogeneous conditions

u(0) = u(1) = 0.

Then it can be shown using classical techniques of ODEs that the eigenvalues to the
problem are
λn = n 2 π 2

and the corresponding orthonormal eigenfunctions are



ϕn(x) = √2 sin nπx.

Moreover, these eigenfunctions form an orthonormal basis of L 2 [0, 1]. Replacing 1
by π gives rise to the space L 2 [0, π], which has the set {√(2/π) sin nx} as an orthonormal
basis.
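The eigenfunction expansion of Theorem 1.10.1 can be sketched numerically with these eigenfunctions. The assumptions of this illustration: the interval [0, 1], the sample right-hand side f(x) = x, and the fact that the exact solution of −u″ = x with zero boundary values is u(x) = x(1 − x²)/6.

```python
import numpy as np

N = 2000
x = (np.arange(N) + 0.5) / N          # midpoint grid on (0, 1)
dx = 1.0 / N
f = x.copy()                          # sample right-hand side f(x) = x

u = np.zeros_like(x)
for n in range(1, 200):
    phi = np.sqrt(2.0) * np.sin(n * np.pi * x)   # orthonormal eigenfunctions
    c = np.sum(f * phi) * dx                     # <f, phi_n> by quadrature
    u += c * phi / (n * np.pi) ** 2              # divide by lambda_n = n^2 pi^2

exact = x * (1.0 - x ** 2) / 6.0
max_err = np.max(np.abs(u - exact))
```

The truncation at 200 terms and the quadrature are arbitrary; the series coefficients decay like 1/n³, so the partial sum already matches the exact solution closely.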

1.10.3 Sturm–Liouville Operator

In general, the differential operator

L = a2(x) ∂²/∂x² + a1(x) ∂/∂x + a0(x)
is not self-adjoint. To convert it to self-adjoint form, we first multiply L by the factor

(1/a2(x)) exp( ∫ˣ a1(t)/a2(t) dt ).

If we let
   
p(x) = exp( ∫ˣ a1(t)/a2(t) dt ),   q(x) = (a0(x)/a2(x)) exp( ∫ a1(x)/a2(x) dx )

such that p ∈ C 1 [a, b] and q ∈ C[a, b], we obtain the so-called Sturm–Liouville
Operator:
 
L = (d/dx)( p(x) d/dx ) + q(x). (1.10.8)
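The conversion can be sanity-checked numerically. The sketch below assumes a2 = 1, a1 = −2x, a0 = 0 (the Hermite operator), for which p(x) = e^{−x²}; the identity p·(u″ + a1·u′) = (p u′)′ is exactly what makes the multiplied operator of Sturm–Liouville form.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 100001)
p = np.exp(-x ** 2)                  # p(x) = exp(int -2t dt) = e^{-x^2}
u = np.sin(3.0 * x)                  # arbitrary smooth test function

up = np.gradient(u, x)               # numerical u'
upp = np.gradient(up, x)             # numerical u''
lhs = p * (upp - 2.0 * x * up)       # p * (a2 u'' + a1 u')
rhs = np.gradient(p * up, x)         # (p u')'

interior = slice(200, -200)          # discard one-sided boundary stencils
max_dev = np.max(np.abs(lhs[interior] - rhs[interior]))
```

The two sides agree to finite-difference accuracy on the interior of the grid.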
dx dx

It remains to find the appropriate boundary conditions that yield symmetry. For
this, we assume the Sturm–Liouville equation Lu = f defined on an interval [a, b].

Then

⟨Lu, v⟩ − ⟨u, Lv⟩ = ∫_a^b (vLu − uLv) dx = [ p(vu′ − uv′)]_{x=a}^{x=b}.

So for the operator L to be self-adjoint, it is required that


[ p(x)(v(x)u′(x) − u(x)v′(x))]_{x=a}^{x=b} = 0.

One way to achieve this is to assume the following boundary conditions:

α1 u(a) + α2 u′(a) = 0,   β1 u(b) + β2 u′(b) = 0. (1.10.9)

According to the results of the preceding section, we have the following.


Theorem 1.10.3 The S-L problem

Lu = f

defined by (1.10.8–1.10.9) for f ∈ L 2 [a, b] has a countable set of increasing eigen-


values
|λ1 | < |λ2 | < . . . < |λn | < . . .

where |λn| −→ ∞, and their corresponding eigenvectors {ϕn} form an orthonormal


basis for L 2 [a, b]. Moreover, the solution to the system is given by

u(x) = Σ_{n=1}^{∞} (1/λn) ⟨ f, ϕn⟩ ϕn(x).

For the eigenvalues λn of the S-L system, consider the eigenvalue problem

Lu + λu = 0.

We simply multiply the equation by u and integrate it to obtain


∫_a^b [u( pu′)′ + qu²] dx + λ ∫_a^b u² dx = 0,

from which we get the so-called Rayleigh quotient


λ = ( −[ puu′]_a^b + ∫_a^b [ p(u′)² − qu²] dx ) / ( ∫_a^b u² dx ).

It is clear now that if


[ puu′]_a^b ≤ 0

and q ≤ 0 on [a, b] then λ ≥ 0, and the absolute values in the above theorem could
be removed.
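The Rayleigh quotient can be illustrated numerically. A sketch under the assumption p = 1, q = 0 on [0, 1] with Dirichlet conditions (so the boundary term vanishes): evaluated at the eigenfunction sin(nπx), the quotient should recover the eigenvalue n²π².

```python
import numpy as np

n = 3
x = np.linspace(0.0, 1.0, 200001)
dx = x[1] - x[0]
u = np.sin(n * np.pi * x)            # eigenfunction with u(0) = u(1) = 0
du = np.gradient(u, x)               # numerical u'

# Rayleigh quotient: int (u')^2 dx / int u^2 dx  (boundary term is zero)
rayleigh = (np.sum(du ** 2) * dx) / (np.sum(u ** 2) * dx)
target = (n * np.pi) ** 2
```

Since q = 0 ≤ 0 and the boundary term vanishes, the quotient is nonnegative, in agreement with the remark above.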
It is important to observe that the more restrictive the boundary conditions are, the
greater the chance that the operator fails to be self-adjoint (even if it is symmetric), since
D(T ) ⊂ D(T ∗ ). The following operator illustrates this observation.

1.10.4 Momentum Operator

The differential operator



P = −i ∂/∂x
is called the momentum operator, and has important applications in the field of
quantum mechanics. Let P ∈ L(L 2 [a, b]). Remember that the domain of P must be
dense in L 2 (a, b) in order to define the adjoint. It is easy to see that this operator is
symmetric. Indeed,
⟨u, Pv⟩ = ∫_a^b (Pv)u dx
= −i ∫_a^b v′u dx
= [−ivu]_a^b + i ∫_a^b u′v dx
= (1/i)[u(b)v(b) − u(a)v(a)] + ⟨P∗u, v⟩.

So by imposing the condition u(a) = u(b), we obtain symmetry and

P ∗ u = Pu

on D(P). Here, D(P ∗ ) consists of all functions in C 1 [a, b] such that u(a) = u(b).
If we adopt the same space C 1 [a, b] subject to the conditions

u(a) = u(b) = 0 (1.10.10)

for the domain of P, then


D(P) ⊊ D(P ∗ )

and P won’t be self-adjoint, hence the homogeneous conditions (1.10.10) will only
establish symmetry but won’t lead to self-adjointness. Therefore, we always need

to choose suitable boundary conditions to ensure not only symmetry, but D(P) =
D(P ∗ ).
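The role of the condition u(a) = u(b) can be seen numerically. A sketch with assumed periodic trigonometric functions on [0, 2π] (so the boundary term in the integration by parts vanishes), using exact derivatives:

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 400001)
dx = x[1] - x[0]

u = np.exp(1j * x)                        # u(0) = u(2*pi) = 1
v = np.exp(1j * x) + 0.5 * np.exp(2j * x) # v(0) = v(2*pi) = 1.5
Pu = -1j * (1j * np.exp(1j * x))          # P u = -i u'
Pv = -1j * (1j * np.exp(1j * x) + 1j * np.exp(2j * x))   # P v = -i v'

def inner(f, g):
    # <f, g> on L^2[0, 2*pi], approximated by a Riemann sum
    return np.sum(f * np.conj(g)) * dx

gap = abs(inner(Pu, v) - inner(u, Pv))    # symmetry defect; should be ~0
```

With functions that do not match at the endpoints the boundary term (1/i)[u(b)v(b) − u(a)v(a)] would survive and the gap would be of order one.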

1.11 Problems

(1) Let T ∈ B(H) and T > 0. Prove that

|⟨T x, y⟩|² ≤ ⟨T x, x⟩ ⟨T y, y⟩.

(2) Let H be a complex Hilbert space.


(a) Let λ ∈ C and T ∈ B(H) be a normal operator. Show that T − λI is normal.
(b) Let α > 0. If ‖T x‖ ≥ α ‖x‖ ∀x, then T ∗ is one-to-one.
(3) If T and S are two positive bounded linear operators, and T S = ST, show that
T S is positive.
(4) Show that if T is positive and S is bounded then S ∗ T S is positive.
(5) Let T ∈ B(H) and there exists c > 0 such that

c ‖x‖² ≤ ⟨T x, x⟩

for all x ∈ H.
(a) Prove that T −1 exists.
(b) Prove that T⁻¹ is bounded, with ‖T⁻¹‖ ≤ 1/c.
(6) If T ∈ B(H) is a normal operator, show that T is invertible iff T ∗ T is invertible.
(7) If Tn ∈ B(H) is a sequence of self-adjoint operators and Tn −→ T, show that
T is self-adjoint.
(8) Let T ∈ B(H) such that ‖T ‖ ≤ 1. Show that T x = x if and only if T ∗ x = x.
(9) Consider the integral operator
(T f )(x) = ∫_{−π}^{π} K (x − t) f (t)dt.

Determine whether T is self-adjoint in the following cases:


(a) K (x) = |x| .
(b) K (x) = sin x.
(c) K (x) = ei x .
(d) K (x) = e^{−x²}.

(10) (a) Give an example of T ∈ B(H) such that T 2 is compact, but T is not.
(b) Show that if T ∈ B(H) is self-adjoint, and T 2 is compact, then T is compact.
(11) Let T ∈ B(X, ℓ1 ). If X is reflexive, show that T is compact.
(12) Let T : ℓ p −→ ℓ p , 1 < p < ∞, defined as

T (x1 , x2 , . . .) = (α1 x1 , α2 x2 , . . . α j x j , . . .),



where |αn | < 1 for all n. Show that T is compact iff lim αn = 0.
(13) Determine if the operator T : ℓ∞ −→ ℓ∞ , defined as
T (x1 , x2 , . . .) = (x1 , x2/2, . . . , xk/k, . . .),
is compact.
(14) Consider X = (C 1 [0, 1], ·1,∞ ) and Y = (C[0, 1], ·∞ ) where
 
‖ f ‖1,∞ = max{ | f (t)|, | f ′(t)| : t ∈ [0, 1] }.

Let i : X −→ Y be the inclusion mapping. Show that i is compact.


(15) If T ∈ K(H) and {ϕn } is an orthonormal basis for H, show that T (ϕn ) −→ 0.
(16) Show that if T ∈ K(H) and dim(H) = ∞ then T −1 is unbounded.
(17) Let H be a Hilbert space. Show that K2 (H) is closed if and only if H is finite-
dimensional.
(18) Let T ∈ K2 (H) and S ∈ B(H). Show that T S, ST ∈ K2 (H).
(19) Let T ∈ B(H) with an orthonormal basis {en } for H. Show that if T is compact,
then lim ‖T en ‖ = 0.
(20) Show that if T ∈ K2 (H), then
‖Tⁿ‖₂ ≤ ‖T‖₂ⁿ

for all n ∈ N.
(21) Consider Example 1.3.6.
(a) Prove that the integral operator K in the example is compact if X =
L p ([a, b]), 1 < p < ∞, and k is piecewise continuous on [a, b] × [a, b].
(b) Prove that K is compact if X = L p ([a, b]), 1 < p < ∞, and

k ∈ L ∞ ([a, b] × [a, b]) .

(c) Prove that K is compact if X = L p (Ω) for any bounded measurable Ω in Rⁿ and

k ∈ L q (Ω × Ω),

where q is the Hölder conjugate of p (i.e., p⁻¹ + q⁻¹ = 1).


(d) Prove that K is compact if X = L p (Ω) for any measurable Ω in Rⁿ.
(e) Give an example to show that K is not compact if X = L 1 [a, b].
(22) Let T : C[0, 1] → C[0, 1] defined by
(T f )(x) = ∫₀ˣ f (η)/√(x − η) dη.

Show that T is compact.


(23) Let T : L 2 [0, ∞) −→ L 2 [0, ∞) be defined by

(T f )(x) = (1/x) ∫₀ˣ f (ξ)dξ.

Show that T is not compact. What if T : L 2 [0, 1] −→ L 2 [0, 1]?


(24) Consider the Volterra operator V : L p [0, 1] −→ C[0, 1], defined by
(V u)(x) = ∫₀ˣ u(t)dt.

(a) Show that V is a bounded linear operator with ‖V ‖ ≤ 1.


(b) Show that V is compact for all 1 < p ≤ ∞.
(c) Show that V is a Hilbert–Schmidt operator when p = 2.
(d) Give an example of a function u ∈ L 1 [a, b] to show that V is not compact
when p = 1.
(25) In Theorem 1.4.6, show that T is compact if and only if the sequence {αn } is
bounded.
(26) Show that the subspace of Hilbert–Schmidt operators K2 (H) endowed with the
inner product defined by
⟨T, S⟩ = tr(S ∗ T )

for all S, T ∈ K2 (H) forms a Hilbert space.


(27) If T ∈ L(X ), for some normed space X, show that the null space N (T ) and
range space R(T ) are T -invariant subspaces of X.
(28) Let T ∗ = T ∈ K(H) for a Hilbert space H. Prove the following.
(a) If T has a finite number of eigenvalues, then T is of finite rank.
(b) R(T ) is separable.
(29) Show that T : L 2 [0, 1] −→ L 2 [0, 1] is a Hilbert–Schmidt operator for the following:
(a) T f (x) = ∫₀ˣ f (t)dt.
(b) T f (x) = ∫₀¹ (x − t) f (t)dt.
(c) T f (x) = ∫₀ˣ f (t)/√(x − t) dt.
(30) Consider the Laplace transform: L : L 2 (R+ ) −→ L 2 (R+ ) defined by
L f (x) = ∫₀^∞ e^{−xs} f (s)ds.

(a) Show that L is a bounded linear integral operator with



‖L f ‖₂ ≤ √π ‖ f ‖₂.

(b) Determine if L is a Hilbert–Schmidt operator.


(c) Determine if L is a compact operator.
(31) Determine whether the multiplication mapping

T u(x) = xu(x)

is compact if
(a) D(T ) = C[0, 1].
(b) D(T ) = L 2 [0, 1].
(32) Show that the system
ψnm (x, y) = ϕn (x)ϕm (y)

in (1.4.4) is an orthonormal basis for L 2 ([a, b] × [a, b]) given that (ϕn ) is an
orthonormal basis for L 2 [a, b].
(33) Let T ∈ B(X ) for some Banach space X. Show that for all n ∈ N,

{λn : λ ∈ σ(T )} ⊆ σ(T n ).

(34) The spectral radius is defined as

R(T ) = sup{|λ| : λ ∈ σ(T ) ⊆ C}.

(a) Show that R(T ) ≤ T  .


(b) Show that R(T ) ≤ inf_n ‖Tⁿ‖^{1/n}.

(c) If T is normal, show that


R(T ) = lim_n ‖Tⁿ‖^{1/n} = ‖T ‖.

(35) Find σ p (T ) and σ(T ) for


(1) R : ℓ p −→ ℓ p , 1 ≤ p ≤ ∞, defined by

R(x1 , x2 , . . .) = (0, x1 , x2 , . . .).

(2) T : ℓ p −→ ℓ p , 1 < p < ∞, defined by


T (x1 , x2 , . . .) = (x1 , x2/2, x3/3, . . . , xn/n, . . .).

(3) T : ℓ2 −→ ℓ2 defined by

T (x1 , x2 , . . .) = (x2/1, x3/2, . . . , xn/(n − 1), . . .).

(4) T : L 2 (0, 1) −→ L 2 (0, 1), defined by

(T u)(x) = xu(x).

(5) T : D(T ) −→ L 2 (0, 1), D(T ) = {u ∈ C 1 (0, 1) : u(0) = 0}, defined by

(T u)(x) = u′.

(6) T : L 2 [0, 1] −→ L 2 [0, 1], defined by


T f (x) = ∫₀ˣ f (t)dt.

(36) Find the point spectrum of the left-shift operator

T (x1 , x2 , . . .) = (x2 , x3 , x4 , . . .)

(a) on c.
(b) on ℓ∞ .
(37) Let T ∈ B(H) and (T − λ0 I )−1 be compact for some λ0 ∈ ρ(T ). Show that
(a) (T − λI )−1 is compact for all λ ∈ ρ(T ).
(b) dim N (T − λI ) < ∞ for all λ ∈ ρ(T ).
(38) Let T ∈ B(X ) be a normal operator and μ ∈ ρ(T ). Show that (T − μI )−1 is
normal.
(39) Let T ∈ B(H) and λ ∈ ρ(T ). Show that T is symmetric if and only if
(T − λI )−1 is symmetric.
(40) Let T ∈ K2 (H). Show that if T is a finite rank operator, then σ(T ) is finite.
(41) Write the details of the proof of Proposition 1.6.9.
(42) Write the details of the proof of the Spectral Mapping Theorem.
(43) Let T ∈ K(X ) for some Banach space X. Show that for λ = 0,

ker(T − λI ) = N (Tλ )

is finite-dimensional.
(44) Let T ∈ B(X ) for some Banach space X. Show that T is invertible if and only
if T and T ∗ are bounded below.
(45) Consider the differential operator D : C 1 [0, 1] → C[0, 1],

D( f ) = f  .

Show that ρ(D) = Ø.


(46) Let T ∈ B(H). Show that T ∗ is bounded below if and only if T is onto.
(47) Let T ∈ B(X ) and A ∈ K(X ) for some Banach space X. Show that if A + T
is injective then it is invertible.
(48) In the proof of Hilbert–Schmidt theorem, let

M = Span{ϕn }n∈N .

(a) Show that T | M ⊥ : M ⊥ −→ M ⊥ is self-adjoint and compact.


(b) Show that T | M ⊥ is the zero operator.
(c) Use (b) to show that M = R(T ).
(49) A ∈ K(H) with eigenvalues {λn }, and let λ = 0. Show that the equation

(λI − A)u = f

has a solution if and only if ⟨ f, v⟩ = 0 for all v ∈ ker(λ̄I − A∗ ).


(50) Let T ∈ K(H) be a compact self-adjoint operator and let λ ∈ ρ(T ). Let (λn )
be the eigenvalues of T with the corresponding eigenvectors (ϕn ). If f ∈ H,
show that the solution to the equation

(λI − T )u = f

is 
u(x) = Σ_n (λ − λn)⁻¹ ⟨ f, ϕn⟩ ϕn.

(51) Consider the Fourier transform F : L 2 (R) −→ L 2 (R) defined by


(Fu)(k) = (1/√(2π)) ∫_{−∞}^{∞} e^{ikx} u(x) dx.

(a) Show that F is a bounded operator defined on L 2 (R).


(b) Find its adjoint operator.
(c) Determine if it is compact.
(d) Find its eigenvalues.
(52) Consider the Volterra equation of the second kind: V : C[0, 1] −→ C[0, 1],
u(x) − ∫₀ˣ k(x, y)u(y)dy = f.

(a) Use Fredholm Alternative to prove the existence of a unique solution u ∈


C[0, 1] for the equation for f ∈ C[0, 1].
(b) Use the method of successive iterations that was implemented in the proof
of Theorem 1.8.8 to find the solution.
(53) Let u ∈ L 2 [a, b], k ∈ C[a, b] such that |λ| k2 < 1. Show that the equation
u(x) = λ ∫_a^b k(x, y)u(y)dy + f (x)

has a unique solution.


(54) Consider the Fredholm integral operator V :

V u(x) = ∫_Ω k(x, y)u(y)dy,


for some Ω ⊂ Rⁿ. If 1 ∉ σ(V ), show that there exists a unique solution of the equation

u(x) − ∫_Ω k(x, y)u(y)dy = f (x)

for all f ∈ C(Ω).


(55) Let K : L 2 [a, b] −→ L 2 [a, b] be a self-adjoint integral operator with a sym-
metric continuous kernel. Let ϕn be the eigenfunctions of the operator K . If
there exists g ∈ L 2 [a, b] such that K g = f , show that


f = Σ_{n=1}^{∞} ⟨ f, ϕn⟩ ϕn

converges uniformly.
(56) Give an example, other than the examples mentioned in the text, of a linear
operator defined on a Banach space that is
(a) bounded but a non-closed operator.
(b) bounded with a non-closed range.
(c) closed but unbounded.
(57) Let T ∈ L(H) be a closed and densely defined operator.
(a) Show that
σ(T ∗ ) = {λ̄ : λ ∈ σ(T )}.

(b) Show that T is closed if and only if R(T ) is closed.


(58) Recall that an operator T is called positive if T x, x ≥ 0 for all x ∈ H.
(a) Show that every positive operator is symmetric.
(b) Show that the eigenvalues of a positive operator are nonnegative.
(59) Let T ∈ L(H) be a closed and densely defined operator. Show that T ∗ T is
positive and self-adjoint on H.
(60) If T is an operator and T −1 is closed and bounded, show that T is closed.
(61) If T ∈ L(H) is closed and S ∈ B(H),
(a) Show that T + S is closed.
(b) (T + S)∗ = T ∗ + S ∗ .
(62) Let S, T ∈ L(H) be two densely defined unbounded operators. If D(S) ⊂
D(T ) and T −1 = S −1 , show that T = S.
(63) Let S, T ∈ L(H) be two densely defined unbounded operators such that

D(ST ) = H.

(a) Show that


T ∗ S ∗ ⊂ (ST )∗ .

(b) If S is bounded and D(S) = H show that

T ∗ S ∗ = (ST )∗ .

(64) Let T ∈ L(H) be a densely defined operator.


(a) If T is symmetric, show that

T ⊂ T ∗∗ ⊂ T ∗ .

(b) Moreover, if T is closed then

T = T ∗∗ ⊂ T ∗ .

(65) Let T ∈ L(H) be a densely defined operator on a Hilbert space H. Use the
preceding problem to show that T is bounded if and only if T ∗ is bounded.
(66) Let T ∈ L(H) be an unbounded closed densely defined operator defined on a
Hilbert space H.
(a) Show that σ(T ) is closed.
(b) Show that λ ∈ σ(T ) iff λ̄ ∈ σ(T ∗ ).
(c) If i ∈ ρ(T ), show that (T ∗ − i)−1 is the adjoint of (T + i)−1 .
(67) Let T ∈ L(H) be a densely defined operator on a Hilbert space H. Show that
if λ ∈ ρ(T ) then Tλ is bounded below.
(68) Let T ∈ L(H) be a densely defined operator on a Hilbert space H. Show that
if there exists a real number λ ∈ ρ(T ) then T is symmetric if and only if T is
self-adjoint.
(69) Let T ∈ L(H) be a densely defined operator and symmetric on H. Show that
if there exists λ ∈ C such that

R(T − λI ) = H and R(T − λ̄I ) = H,

then T is self-adjoint.
(70) Find the integral operator and find the eigenvalues and the corresponding eigen-
vectors for the problem Lu = −u″ provided that
(a) u(0) = u′(1) = 0.
(b) u′(0) = u(1) = 0.
(c) u′(0) = u′(1) = 0.
(71) Let T : L 2 [0, 1] −→ L 2 [0, 1], T f = f ′. Find the spectrum of T if
(a) D(T ) = { f ∈ C 1 [0, 1] : f (0) = 0}.
(b) D(T ) = { f ∈ AC[0, 1] : f ′ ∈ L 2 [0, 1] and f (0) = f (1)}.
(72) Consider the differential operator

L = eˣ D² + eˣ D

defined on [0, 1] such that u′(0) = u(1) = 0. Determine whether or not the
operator is self-adjoint (where D is the first derivative).

(73) Show that the following operators are of the Sturm–Liouville type.
(a) Legendre: (1 − x²)D² − 2x D + λ on [−1, 1].
(b) Bessel: x²D² + x D + (x² − n²).
(c) Laguerre: x D² + (1 − x)D + λ on 0 < x < ∞.
(d) Chebyshev: √(1 − x²) D[√(1 − x²) D] + λ on [−1, 1].
(74) Convert the equation
y″ − 2x y′ + 2ny = 0

on (−∞, ∞) into a Sturm–Liouville equation.


(75) Determine whether or not the Sturm–Liouville operator

L = D2 + 1

on [0, π] is self-adjoint under the conditions


(a) u(0) = u(π) = 0.
(b) u(0) = u′(0) = 0.
(c) u(0) = u′(0) and u(π) = u′(π).
(76) Consider the equation
Lu = u″ = f

where f ∈ L 2 [0, 1]. Find L ∗ if the equation is subject to the boundary condi-
tions
(a) u(0) = u′(0) = 0.
(b) u′(0) = u′(1) = 0.
(c) u(0) = u(1).
(77) Consider the problem
Lu = u″ = f

where f ∈ L 2 [0, π] under the conditions: u(0) = u(π) = 0.


(a) Show that L is injective.
(b) Show that L is self-adjoint.
(c) Find an orthonormal basis for L 2 [0, π].
(78) Consider the problem
Lu = −u″ = f

where f ∈ L 2 [−π, π] under the conditions: u(−π) = u(π) and u′(−π) = u′(π).
(a) Show that L is injective.
(b) Show that L is self-adjoint.
(c) Find an orthonormal basis for L 2 [−π, π].
(79) Consider the problem
Lu = u″ + λu = 0,

where 0 < x < 1, under the conditions: u(0) = u(1) and u′(0) = u′(1).
(a) Show that L is injective.

(b) Show that L is self-adjoint.


(c) Find an orthonormal basis for L 2 [0, 1].
(80) Consider the operator
Lu = iu′,

where i = √−1, and such that

D(L) = {u ∈ C 1 [0, 1] : u(0) = u(1) = 0}.

Show that L is symmetric but not self-adjoint.


(81) Consider the problem
u″ + qu = f (x)

for q ∈ C[a, b] and f ∈ L 2 [a, b] subject to the conditions u(a) = u(b) = 0.


Show that the solution to the problem exists and is given by

u(x) = Σ_{n=1}^{∞} (1/λn) ⟨ f, ϕn⟩ ϕn(x),

where {λn } is the set of eigenvalues such that

|λn | −→ ∞

and {ϕn } are the corresponding eigenfunctions that form an orthonormal basis
for L 2 [a, b].
(82) Let L be a self-adjoint differential operator and let f ∈ L 2 [0, 1]. Use Fredholm
Alternative to discuss the solvability of the two boundary value problems
(1) Lu = f defined on [0, 1] subject to the conditions u(0) = α and u(1) = β.
(2) Lu = 0 defined on [0, 1] subject to the conditions u(0) = u(1) = 0.
(83) Determine the value of λ ∈ R for which the operator T : C[0, 1] −→ C[0, 1]
defined by

(T u)(x) = u(0) + ∫₀ˣ u(t)dt

is a contraction.
Chapter 2
Distribution Theory

2.1 The Notion of Distribution

2.1.1 Motivation For Distributions

Recall that in Sect. 1.10 the Dirac delta was introduced with no mathematical foundation, and we mentioned that a rigorous analysis is needed to validate the construction of delta. This is one of the main motivations for developing the theory of distributions,
and the purpose of this chapter is to introduce the theory to the reader and discuss
its most important basics. As explained earlier, the Dirac delta cannot be considered
as a function. We shall call these mathematical objects: distributions. Distributions
are not functions in the classical sense because they exhibit some features that are
beyond the definition of the function. We can, however, view them as “generalized”
functions provided that the definition of function is being extended to include them.
This “generalized” feature provides more power and flexibility to these distributions,
enabling them to represent some more complicated behavior that cannot be repre-
sented by functions. For this reason, distributions are very useful in applications to
topics related to physics and engineering, such as quantum mechanics, electromag-
netic theory, aerodynamics, and many other fields. This chapter defines the notion
of distribution and discusses some fundamental properties. Then, we will perform
some operational calculus on rigorous mathematical settings, such as derivatives,
convolutions, and Fourier transforms.

2.1.2 Test Functions

In principle, the theory suggests that distributions should act on other functions
rather than being evaluated at particular points, and its action on a particular function
determines its values, and its definition is governed through an integral over a domain.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 81
A. Khanfer, Applied Functional Analysis,
https://doi.org/10.1007/978-981-99-3788-2_2

So a distribution, denoted by T, is an operator, in fact a functional, and should take
the following form:

T (ϕ) = ∫_{−∞}^{∞} T (x)ϕ(x)dx. (2.1.1)

It is simply an integral operator acting on functions through an integral over R, or


generally Rⁿ. Observe that not every function ϕ can be admitted in (2.1.1), so we
certainly need to impose some conditions on these admissible functions ϕ to ensure
the operator T is well-defined. To handle this issue, note that by the linearity of
integrals
∫_{−∞}^{∞} T (x)[c1 ϕ(x) + c2 ψ(x)]dx = c1 ∫_{−∞}^{∞} T (x)ϕ(x)dx + c2 ∫_{−∞}^{∞} T (x)ψ(x)dx,

which implies that


T (c1 ϕ + c2 ψ) = c1 T (ϕ) + c2 T (ψ).

So the operator T is clearly linear. If we assume that both ϕ and ψ to be functions in


the domain of T, then c1 ϕ + c2 ψ must be in the domain as well. This implies that
the space of all admissible functions ϕ of a distribution T must be a vector space.
The admissible functions must satisfy two essential conditions.
(1) They are infinitely differentiable. That is, ϕ ∈ C ∞ . Functions in the space C ∞ ()
are called smooth functions on .
(2) They must be of compact support. Recall a support of a function f , denoted by
supp( f ), is defined as

supp( f ) = {x ∈ Dom( f ) : f (x) ≠ 0}.

According to the Heine–Borel theorem, a compact set in Rⁿ is closed and bounded,
so a function f being of compact support K means that

f (x) ≠ 0 only for x ∈ K ,   and   f (x) = 0 for x ∉ K .

The space of continuous functions of compact support is denoted by Cc(Ω). Similarly, the space of smooth functions of compact support is denoted by Cc∞(Ω).
There are two reasons to impose the first condition: one is related to differentiation of distributions, and the other is related to their Fourier transforms.
We will clarify these two points when we come to them. The second condition is a
strong condition to convert the integral from improper to proper. Indeed, if ϕ is of
compact support, then we can find two real numbers a and b such that
∫_{−∞}^{∞} T (x)ψ(x)dx = ∫_a^b T (x)ψ(x)dx.

Now, we are ready to provide our first definition.


Definition 2.1.1 (Test Function) A function is called a test function on Ω ⊆ Rⁿ if
it is smooth on Ω and is of compact support. The space of test functions on Ω is
denoted by D(Ω) = Cc∞(Ω).
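A concrete example is the standard bump function ϕ(x) = exp(−1/(1 − x²)) for |x| < 1, extended by 0: it is smooth on all of R and supported in the compact set [−1, 1]. A minimal sketch:

```python
import numpy as np

def phi(x):
    # The bump function: smooth everywhere, compactly supported in [-1, 1].
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

vals = phi(np.array([0.0, 0.5, 1.0, 2.5]))
```

The function peaks at the origin with value e⁻¹ and vanishes identically outside (−1, 1); all derivatives also vanish at x = ±1, which is what makes it smooth despite the compact support.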
It is readily seen that the space D is a linear space. The dual space of D is denoted by
D . We need to impose the notion of a distance in D in order to define convergence in
this space. The nature of the members of the space suggests the uniform convergence.

Definition 2.1.2 (Convergence in D) Let ϕn , ϕ ∈ D(Ω). Then, we say that ϕn →
ϕ in D(Ω) if there exists a compact set K such that ϕn , ϕ = 0 outside K , and
ϕn − ϕ → 0 uniformly, i.e.,

max_{x∈K} |ϕn(x) − ϕ(x)| → 0.

2.1.3 Definition of Distribution

We are ready now to give a definition for the distribution.


Definition 2.1.3 (Distribution) A distribution T is a continuous linear functional,
T : D(Rⁿ) → R, and is given by

T (ϕ) = ∫_{Rⁿ} T (x)ϕ(x)dx,

for every test function ϕ in D.


A distribution T acting on a function ϕ can also be denoted by ⟨T, ϕ⟩. The
notation ⟨·, ·⟩ here is not an inner product, but it was adopted because T behaves
similarly when acting on functions.
Note that in case n = 1, the definition reduces to (2.1.1).
Remark All results for n = 1 applies to n > 1, so for the sake of simplicity, we
may establish some of the upcoming results only for n = 1 and the reader can extend
them to n > 1 either by induction or by routine calculations.
The definition implies that a functional on D is a distribution in the dual space
D (R) if it is linear, continuous and satisfies (2.1.1). Recall that by linearity we mean:

⟨T, αϕ + βψ⟩ = α⟨T, ϕ⟩ + β⟨T, ψ⟩.

By continuity, we mean: If ϕn → ϕ in D then

⟨T, ϕn⟩ → ⟨T, ϕ⟩.

Since for linear functionals continuity at a point implies continuity at all points, it is
enough to study convergence at zero, i.e.,

⟨T, ϕn⟩ → ⟨T, 0⟩

whenever ϕn → 0. The dual space of D, denoted by D (R), is endowed with the


weak-star topology, which corresponds to pointwise convergence.

2.2 Regular Distribution

2.2.1 Locally Integrable Functions

Definition 2.1.3 offers a rigorous mathematical definition of distribution by means of


integration. But, we may still have some issues related to the nature of this definition.
The functional T is defined in terms of itself because the functional T is inserted
inside the integral and acts pointwise on x ∈ Rn as the test functions do, then we
compute the integral over Rn to obtain T again. This approach might be awkward and
fuzzy in some situations. If we can associate the functional T with another function,
say f, that can be used to define the functional appropriately, then the functional T
would be characterized by f . But how to choose such f ? The following definition
gives some help in this regard.
Definition 2.2.1 (Locally Integrable Function) Let f : Ω → R be a measurable
function. Then, f is called a locally integrable function if it is Lebesgue-integrable
on every compact subset of Ω. The space of all locally integrable functions on Ω is
denoted by L¹loc(Ω).
The definition implies that every continuous function on Rn is locally integrable.
The constant function f (x) = c is locally integrable on R but not integrable on it. If
a function is continuous on R and locally integrable but not integrable, this means it
does not vanish at infinity.

2.2.2 Notion of Regular Distribution

The notion of locally integrable functions ignites the following idea: Since ϕ is
continuous on a compact set K , it has a maximum value on that set, and if we

multiply a locally integrable function f with a test function ϕ and integrate over Rn ,
this gives
    
   
|∫_{Rⁿ} f (x)ϕ(x)dx| = |∫_K f (x)ϕ(x)dx| ≤ ∫_K | f (x)ϕ(x)| dx ≤ (max_{x∈K} |ϕ|) ∫_K | f | dx < ∞.

Therefore, one way to characterize a distribution T and give it an appropriate


integral representation is to use a locally integrable function to define it, and this
will yield a meaningful integral. Since the integral exists and is finite, the functional is
well-defined. To show it is a distribution, we need to prove linearity and continuity.
Linearity is clear:

T (ϕ1 + ϕ2) = ∫ f (x)(ϕ1 + ϕ2)dx = ∫ f (x)ϕ1(x)dx + ∫ f (x)ϕ2(x)dx = ⟨T, ϕ1⟩ + ⟨T, ϕ2⟩ = T (ϕ1) + T (ϕ2).

To prove continuity, we assume ϕn → 0, then

max_{x∈K} |ϕn | → 0

for some compact K ⊃ supp(ϕn ). Then,


 
 
|⟨ f, ϕn⟩| = |∫_{Rⁿ} f (x)ϕn(x)dx| ≤ (max_{x∈K} |ϕn |) ∫_K | f | dx → 0.

Therefore, f is a distribution on Rn , i.e., f ∈ D (Rn ).


Definition 2.2.2 (Regular Distribution) Let f ∈ L¹loc(Rⁿ). Then, the distribution T_f
given by

⟨T_f , ϕ⟩ = ∫_{Rⁿ} f (x)ϕ(x)dx

is called a regular distribution.
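A numerical sketch of a regular distribution (the choices here are illustrative assumptions: f(x) = |x|, which is locally integrable, the bump test function, and a midpoint quadrature):

```python
import numpy as np

N = 200000
x = (np.arange(N) + 0.5) / N * 2.0 - 1.0   # midpoints in (-1, 1)
dx = 2.0 / N
bump = np.exp(-1.0 / (1.0 - x ** 2))       # test function phi
psi = x ** 2 * bump                        # another test function

f = np.abs(x)                              # locally integrable f

def T(phi_vals):
    # T_f(phi) = \int f(x) phi(x) dx, restricted to supp(phi) = [-1, 1]
    return np.sum(f * phi_vals) * dx

lhs = T(2.0 * bump + 3.0 * psi)            # T_f(2 phi + 3 psi)
rhs = 2.0 * T(bump) + 3.0 * T(psi)         # 2 T_f(phi) + 3 T_f(psi)
```

The linearity established above holds to machine precision, since the quadrature itself is linear in the test function.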


According to Definition 2.2.2, any regular distribution is characterized by a locally
integrable function. Indeed, if f = g, then T f = Tg . On the other hand, if f and g
are locally integrable functions and T f = Tg , then f = g except for a set of measure
zero. So we can say that regular distributions are uniquely determined by a locally
integrable function. We could go the other way around and say that every locally

integrable function can be used to define a regular distribution. If no such f exists


to define a distribution, it is a singular distribution. This leads to the fact that some
functions defined in the classical sense, such as the class of locally integrable func-
tions, can be considered regular distributions. This also shows that the value of a
distribution at a point can have a meaning only if the distribution is regular, because it
can be identified with a function in L¹loc(Ω). This fact is rather interesting and shows
how distributions generalize the notion of classical functions, which implies that cal-
culus operations (such as limits and differentiation) that were classically developed
for functions can also be implemented somehow in a distributional sense.

2.2.3 The Dual Space D 

The dual space of D is the space of all distributions on D, and is denoted by D . So


if T ∈ D , then T : D → R. The space D is linear (check!), which means that for
any two T1 , T2 ∈ D , we have

αT1 + βT2 ∈ D .

The convergence in D can take a weak form. Recall that a sequence ϕn ⇀ ϕ in D


weakly if
T (ϕn ) → T (ϕ)

for every T ∈ D . The convergence in D takes the form of a weak-star. Recall that
a sequence Tn → T in D in the weak-star topology if

Tn (ϕ) → T (ϕ)

for every ϕ ∈ D. Another characterization is equality. Recall for functions f and g,


we say that f = g on  if f (x) = g(x) for every x ∈ . That is, the equality of
functions is contingent on their values. To extend to distributions, we say that two
distributions T and S are equal if and only if

T (ϕ) = S(ϕ)

for all ϕ ∈ D. That is

T = S ⇐⇒ T, ϕ = S, ϕ ∀ϕ ∈ D. (2.2.1)

By continuity,
Tn → T ⇐⇒ Tn , ϕ → T, ϕ ∀ϕ ∈ D. (2.2.2)

Note that when we say T = S, it means they are equal in the distributional sense.

2.2.4 Basic Properties of Regular Distributions

Here are Some Elementary Properties of Distributions


Proposition 2.2.3 Let T and S be two distributions, and ϕ ∈ D(Ω), Ω ⊆ Rⁿ. Then,
the following properties hold.
(1) T (0) = 0.
(2) ⟨T, cϕ⟩ = c ⟨T, ϕ⟩ for any c ∈ R.
(3) ⟨T + S, ϕ⟩ = ⟨T, ϕ⟩ + ⟨S, ϕ⟩.
(4) ⟨T (x − c), ϕ(x)⟩ = ⟨T, ϕ(x + c)⟩. If ⟨T (x − c), ϕ⟩ = ⟨T, ϕ(x)⟩ then T is said
to be invariant with respect to translations.
(5) ⟨T (−x), ϕ(x)⟩ = ⟨T (x), ϕ(−x)⟩.
(6) ⟨T (cx), ϕ(x)⟩ = (1/|c|) ⟨T (x), ϕ(x/c)⟩, for any nonzero number c.
(7) ⟨T·S, ϕ⟩ = ⟨S, T ϕ⟩, provided that T ϕ ∈ D(Ω).
(8) Let x = g(y) ∈ C∞ be an injection. Then,

⟨T ◦ g, ϕ⟩ = ⟨T, (ϕ ◦ g⁻¹) |(g⁻¹)′(x)|⟩.

Proof The properties from (1) to (7) can be easily concluded from the definition
directly and using some simple substitutions, so the reader is asked to present proofs
for them. For (8), we let y = g⁻¹(x) in the integral

∫ T (g(y))ϕ(y)dy. 

2.3 Singular Distributions

2.3.1 Notion of Singular Distribution

The terms “regular” and “singular”, if coined with functions, determine whether
a function is finite or infinite on some particular domain. The points at which the
function blows up are called singularities. If we extend the notion to distributions, we

shall say that a regular distribution T is a distribution when the value ∫_{−∞}^{∞} f (x)ϕ(x)dx
is finite and well-defined. This can be achieved if there exists a function f that is
integrable on every compact set. This ensures that f is integrable on supp(ϕ); hence
the integral is well defined. If no such f exists, then the distribution is called singular.

One way to construct a singular distribution is through the Cauchy principal value
of a nonintegrable function having a singularity at a point a. Such a function cannot
define a regular distribution, but we can define its principal value, denoted by
p.v. f(x), as follows:

⟨p.v. f(x), ϕ(x)⟩ = lim_{ε→0⁺} ∫_{|x−a|>ε} f(x)ϕ(x)dx,

where f has a singularity at x = a. The following example illustrates the idea.


Example 2.3.1 Consider f(x) = 1/x on R. Obviously, f ∉ L¹_loc(R) due to the
singularity at x = 0. Hence, we apply its principal value,

⟨p.v. 1/x, ϕ⟩ = lim_{ε→0⁺} ∫_{|x|>ε} ϕ(x)/x dx. (2.3.1)

Since ϕ has compact support, we can find a compact interval K = [−r, r], with r > ε,
such that ϕ vanishes outside K. So (2.3.1) can be written as

lim_{ε→0⁺} [ ∫_{ε<|x|<r} (ϕ(x) − ϕ(0))/x dx + ∫_{ε<|x|<r} ϕ(0)/x dx ].

The second integral is zero due to symmetry, so passing to the limit we get

⟨p.v. 1/x, ϕ⟩ = ∫_{−r}^{r} (ϕ(x) − ϕ(0))/x dx.

Notice that ϕ ∈ C∞, so ϕ′ attains a maximum on K, and by the Mean Value Theorem
we conclude that

|(ϕ(x) − ϕ(0))/x| ≤ 2 max|ϕ′| < ∞. (2.3.2)

So the principal value of f is well-defined and defines a linear functional. To prove
continuity, we consider a sequence ϕn → 0. It is enough to show that

⟨p.v. 1/x, ϕn⟩ → 0.

This can be easily established knowing that the convergence of the integral is uni-
form, as can be seen from (2.3.2). Therefore, the principal value of f is a singular
distribution.
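The convergence of the principal value can be observed numerically. A minimal sketch, assuming numpy; the Gaussian below is a hypothetical stand-in for a compactly supported test function:

```python
import numpy as np

# Numerical sketch of <p.v. 1/x, phi> from (2.3.1); the Gaussian is a
# hypothetical stand-in for a compactly supported test function.
phi = lambda x: np.exp(-(x - 1.0)**2)

def pv(eps, r=12.0, n=200001):
    # fold the two half-lines together: int_{eps<|x|<r} phi(x)/x dx
    x = np.linspace(eps, r, n)
    g = (phi(x) - phi(-x)) / x
    return np.sum((g[1:] + g[:-1]) / 2.0) * (x[1] - x[0])

vals = [pv(e) for e in (1e-1, 1e-2, 1e-3, 1e-4)]
print(vals)   # the values stabilize as eps -> 0+
```

Folding x and −x together makes the integrand bounded near 0, mirroring the role of (2.3.2) in the proof.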

2.3.2 Dirac Delta Distribution

Example 2.3.2 The next example illustrates the earliest and most important
distribution known in the literature. To simplify the treatment, we restrict to R, though
everything is valid for Rⁿ. Let ϕ ∈ D, and define the following:

Tx₀(ϕ) = ⟨Tx₀, ϕ⟩ = ∫_{−∞}^{∞} T(x − x₀)ϕ(x)dx = ϕ(x₀). (2.3.3)

The functional is obviously linear. Let ϕn −→ ϕ, then

|Tx₀(ϕn) − Tx₀(ϕ)| = |ϕn(x₀) − ϕ(x₀)| −→ 0,

so Tx₀ is continuous, and hence it is a distribution. The question arises: Is Tx₀ a regular
or singular distribution? In other words, can we find a function g ∈ L¹_loc(Ω) such that

Tx₀(ϕ) = ∫_{−∞}^{∞} g(x)ϕ(x)dx = ϕ(x₀) (2.3.4)

for every ϕ ∈ D? Suppose there is a function g ∈ L¹_loc(Ω) identifying Tx₀ such that

∫_{−∞}^{∞} g(x)ϕ(x)dx = ϕ(x₀) (2.3.5)

for all ϕ ∈ D. Then, we can consider a subset B ⊂ Ω containing x₀ such that

∫_B |g| < 1.

We can also find a test function ϕ with supp(ϕ) ⊆ B and max ϕ = ϕ(x₀) > 0 (why?).
Then,

| ∫_{−∞}^{∞} g(x)ϕ(x)dx | ≤ ϕ(x₀) ∫_B |g(x)| dx < ϕ(x₀),

which contradicts (2.3.5). Hence, no such g exists, and so Tx₀ cannot be regular;
hence it is singular. In (2.3.5), let us drop the condition that

g ∈ L¹_loc(Ω).

What behavior must g exhibit to maintain (2.3.4) for all test functions ϕ? One way
to allow this is to require that g(x) = 0 for all x ≠ x₀. If g(x₀) = c < ∞, then

∫_{−∞}^{∞} g(x)ϕ(x)dx = 0 ≠ ϕ(x₀) in general.

So, the function must behave abnormally at x₀ with no possible finite value. Further, we
can find ϕ ∈ D such that ϕ = 1 on a neighborhood of x₀, and then (2.3.4) requires

∫_{−∞}^{∞} g(x)dx = 1.

These conditions remind us of the famous Dirac delta. It is the first generalized
function that appeared in the literature, and it is the most basic and important one, playing
a dominant role in this theory. It is called the Dirac delta in honor of Paul Dirac, the
British physicist who introduced his delta in 1922 as the continuum analog of the
discrete Kronecker delta. Dirac introduced delta as a function and gave some of its
basic properties, and it soon became a famous and powerful tool for solving problems
in mechanics with discontinuous pulses. However, the mathematical community
rejected the Dirac delta and refused to deal with it as a function since it doesn't
behave like other functions. Recall that in applied mathematics courses, the Dirac
delta is represented by

δ(x − x₀) = { ∞  x = x₀
            { 0   x ≠ x₀, (2.3.6)

with the condition

∫_{−∞}^{∞} δ(x − x₀)dx = 1. (2.3.7)

If x₀ = 0 then (2.3.6) reduces to (1.10.4). Note that (2.3.6) and (2.3.7) do not define a
function in the classical definition of functions. Under the representation (2.3.3), the
integral cannot be evaluated in Riemann theory, and is evaluated as 0 in Lebesgue
theory. No function in the usual sense would satisfy (2.3.6)–(2.3.7). This
resulted in the famous debate of whether the delta notion defined in (2.3.6) and
(2.3.7) is a function. The debate continued until it was settled in 1936, when Sergei
Sobolev, a Soviet mathematician and pioneer in the theory of partial differential
equations, proposed a rigorous definition for the delta notion based on an integral
operator. In 1943, Laurent Schwartz, a leading French mathematician, constructed a
comprehensive theory of these “generalized functions”. The idea is that it is best to
think of delta as a distribution of the form (2.3.3), i.e.,

Tx₀(x) = δ(x − x₀).

If we substitute this in definition (2.3.3), then we obtain

⟨δ(x − x₀), ϕ(x)⟩ = ∫_{−∞}^{∞} δ(x − x₀)ϕ(x)dx = ϕ(x₀). (2.3.8)

So (2.3.8) implies (2.3.6)–(2.3.7). Conversely, it can be seen from (2.3.6) that

ϕ(x)δ(x − x₀) = ϕ(x₀)δ(x − x₀).

Thus, using (2.3.4) and (2.3.7) we get

∫_{−∞}^{∞} δ(x − x₀)ϕ(x)dx = ∫_{−∞}^{∞} δ(x − x₀)ϕ(x₀)dx = ϕ(x₀) ∫_{−∞}^{∞} δ(x − x₀)dx = ϕ(x₀).

This implies that (2.3.6)–(2.3.7) and (2.3.8) are, in fact, equivalent, and the Dirac delta
defined in (2.3.6)–(2.3.7) can be defined more rigorously in terms of a distribution
of the form (2.3.8). Now, we are ready to give a rigorous definition for the delta.
Definition 2.3.3 (Delta Distribution) The Dirac delta distribution, denoted by δ, is
a singular distribution defined as

⟨δ(x − x₀), ϕ(x)⟩ = ∫_{Rⁿ} δ(x − x₀)ϕ(x)dx = ϕ(x₀)

for a given x₀ ∈ Rⁿ and all ϕ ∈ D(Rⁿ).

2.3.3 Delta Sequence

The Dirac delta is indeed a singular distribution because, in light of the discussion
above, it is not characterized by any locally integrable function. No measurable
function can be used to define δ as represented by the integral in the above formula.
We will, however, introduce a sequence of functions in D(R) that will converge
to δ.
Definition 2.3.4 (Delta Sequence) A sequence φm ∈ L¹_loc(Rⁿ) (n ≥ 1) is called a delta
sequence if the following conditions hold:
(1) φm ≥ 0.
(2) ∫_{Rⁿ} φm(x)dx = 1.
(3) φm(x) → 0 uniformly in |x| ≥ ε > 0.
The last condition means: for every ε > 0 and η > 0, there exists N such that

|φm(x)| < η

for every m > N and all |x| ≥ ε. This implies that as m −→ ∞, we have

∫_{|x|≥ε} φm(x)dx −→ 0. (2.3.9)

We will use this sequence to conclude an important result that justifies the name of
the sequence.
Theorem 2.3.5 If φm ∈ L¹_loc(Rⁿ) is a delta sequence, then φm(x) → δ(x) as
m → ∞.

Proof We restrict the argument to n = 1 and leave the general case to the reader.
According to Definitions 2.3.3 and 2.3.4, it suffices to prove that for any ϕ ∈ D, we
have

lim_{m→∞} ∫_{−∞}^{∞} φm(x)ϕ(x)dx = ϕ(0).

We have

lim_{m→∞} ∫_{−∞}^{∞} φm(x)ϕ(x)dx = lim_{m→∞} ∫_{−∞}^{∞} φm(x)[ϕ(x) − ϕ(0)]dx + ϕ(0) ∫_{−∞}^{∞} φm(x)dx. (2.3.10)

But the first term of the RHS of the equation above can be written as

lim_{m→∞} [ ∫_{|x|≥r>0} φm(x)[ϕ(x) − ϕ(0)]dx + ∫_{−r}^{r} φm(x)[ϕ(x) − ϕ(0)]dx ].

For the first integral, we pass the limit inside the integral using (2.3.9), due to uniform
convergence on

|x| ≥ r > 0,

and this gives 0. For the second integral, note that ϕ is continuous at x = 0. Let
ε > 0. Then, there exists r₁ > 0 such that

|ϕ(x) − ϕ(0)| < ε

for |x| < r₁. Moreover, for any r₂ > 0 we have

∫_{−r₂}^{r₂} φm(x)dx ≤ 1.

Let r = min{r₁, r₂}. Then

| ∫_{−r}^{r} φm(x)[ϕ(x) − ϕ(0)]dx | ≤ ε ∫_{−r}^{r} φm(x)dx ≤ ε.

Note that ε is arbitrary, hence we obtain

lim_{m→∞} ∫_{−∞}^{∞} φm(x)ϕ(x)dx = 0 + ϕ(0) ∫_{−∞}^{∞} φm(x)dx = ϕ(0) · 1 = ϕ(0). (2.3.11)

In view of Definition 2.3.3, we write (2.3.11) as

lim_{m→∞} ⟨φm(x), ϕ(x)⟩ = ⟨δ(x), ϕ(x)⟩.

It follows from (2.2.1) that

lim_{m→∞} φm(x) = δ(x). □

Since {φm} ⊂ L¹_loc(Ω), each φm can be considered a regular distribution, which
implies that, though δ is not regular, it is the limit of a sequence of regular
distributions. This argument raises the following question: Does this sequence exist? The
answer is yes, and one way to construct it is to consider a function φ ∈ D(Ω), with the
properties φ ≥ 0 on K = supp(φ), and ∫_K φ = 1. Since φ is continuous on the compact
set K, it is uniformly continuous on K, so for every η > 0 we can find n sufficiently
large such that

|φ(x/n) − φ(0)| < η

for all x ∈ K. This shows that

φ(x/n) → φ(0)

uniformly as n −→ ∞, and this can be written as

φ(εy) → φ(0)

uniformly as ε → 0. Now, for the function φ ∈ D(Ω) and n > 0, we define the
sequence

φn(x) = nφ(nx) (2.3.12)

for Ω ⊆ R. Then, we observe the following:

(1) 0 ≤ φn(x) ∈ D(Ω) for all n.
(2) φn(x) −→ 0 uniformly on |x| ≥ ε for each ε > 0, and φn(0) −→ ∞ as n −→ ∞.
(3) ∫_{−∞}^{∞} φn(x)dx = 1. This can be seen from the substitution nx = y.

If we represent δ by (2.3.6), as physicists do, then observation (2) leads to the
result directly, but we prefer to maintain a rigorous treatment. Given the observations
above, we see that {φn} in (2.3.12) is a delta sequence, hence by Theorem 2.3.5, we
obtain

lim_{n→∞} φn(x) = δ(x).

Indeed, for ϕ ∈ D(Ω) we have

∫_{−∞}^{∞} φn(x)ϕ(x)dx = ∫_{−∞}^{∞} nφ(nx)ϕ(x)dx.

Using the substitution y = nx, we obtain

∫_{−∞}^{∞} φn(x)ϕ(x)dx = ∫_{−∞}^{∞} φ(y)ϕ(y/n)dy.

Using the same argument as above on ϕ, we have

ϕ(y/n) → ϕ(0)

uniformly. Hence, we can pass the limit inside the previous integral and obtain

lim_{n→∞} ∫_{−∞}^{∞} φ(y)ϕ(y/n)dy = ϕ(0) ∫_{−∞}^{∞} φ(y)dy = ϕ(0).

Therefore, as n → ∞ we have

φn(x) → δ(x).

Remark The following should be noted:

1. The condition φ(x) ∈ D(Ω) is used so that we deal with continuous functions on compact
sets, but this condition can be relaxed and we still obtain a delta sequence {φn}
without φ necessarily being in D(Ω).
2. The above discussion can be extended analogously to Rⁿ.

2.3.4 Gaussian Delta Sequence

Here is a famous example of a delta sequence known as the Gaussian delta sequence
which is derived from the Gaussian function.
Example 2.3.6 Consider the Gaussian function

φ(x) = (1/√π) e^{−x²}.

Clearly, φ ∉ D(R) because it is not of compact support. It is well-known that

∫_{−∞}^{∞} e^{−t²} dt = √π.

Define

φn(x) = nφ(nx) = (n/√π) e^{−n²x²}.

Then, φn(x) ≥ 0. Integrating over R, and using the substitution y = nx,

∫_{−∞}^{∞} (n/√π) e^{−n²x²} dx = (1/√π) ∫_{−∞}^{∞} e^{−y²} dy = 1.

Moreover, for |x| ≥ r > 0, we have

sup_{|x|≥r>0} | (n/√π) e^{−n²x²} − 0 | ≤ n/(n²r²) = 1/(nr²) → 0

as n −→ ∞. So

φn(x) → 0

uniformly on |x| ≥ r. In view of Definition 2.3.4 we conclude that {φn} is a delta sequence, and
hence by Theorem 2.3.5,

φn −→ δ.

Indeed, letting y = nx, then

lim_{n→∞} ∫_{−∞}^{∞} (n/√π) e^{−n²x²} ϕ(x)dx = lim_{n→∞} ∫_{−∞}^{∞} (1/√π) e^{−y²} ϕ(y/n) dy.

Since ϕ ∈ D, by a previous argument, we have the uniform convergence

ϕ(y/n) −→ ϕ(0).

Passing the limit inside the integral above gives

lim_{n→∞} ∫_{−∞}^{∞} (1/√π) e^{−y²} ϕ(y/n) dy = ϕ(0) ∫_{−∞}^{∞} (1/√π) e^{−y²} dy = ϕ(0). (2.3.13)
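The convergence in (2.3.13) is easy to observe numerically. A minimal sketch, assuming numpy; the test function below is an arbitrary smooth choice with value 1 at the origin:

```python
import numpy as np

# Direct numerical check of (2.3.13) for the Gaussian delta sequence; the
# test function phi is an arbitrary smooth choice with phi(0) = 1.
phi = lambda x: 1.0 / (1.0 + x**2)

x = np.linspace(-30.0, 30.0, 600001)
dx = x[1] - x[0]

vals = []
for n in (2, 8, 32):
    phi_n = n / np.sqrt(np.pi) * np.exp(-(n * x)**2)
    vals.append(np.sum(phi_n * phi(x)) * dx)
print(vals)   # tends to phi(0) = 1 as n grows
```

The error shrinks roughly like 1/n², since the Gaussian concentrates its mass in a window of width ~1/n around 0.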

We have the following important remarks about the previous example.

(1) We can verify that φn −→ δ by a simple, though not rigorous, argument. Let
x = 0; then

δn(0) = n/√π → ∞ as n → ∞.

Let x ≠ 0; then

lim_{n→∞} (n/√π) e^{−n²x²} = 0.

So, in general, we have

lim_{n→∞} (n/√π) e^{−n²x²} = δ(x) = { 0 : x ≠ 0
                                     { ∞ : x = 0.

As illustrated before, approaching the δ distribution through the representation
(2.3.6) does not yield a rigorous treatment.
(2) We can prove the result in (2.3.13) using the Dominated Convergence Theorem
rather than the uniform convergence on |x| ≥ r > 0. This can be achieved using
the fact that ϕ is a continuous function with compact support, and the fact
that e^{−y²} is Lebesgue-integrable on R. We leave the details to the reader as an
exercise.
(3) We can define the function φ over Rⁿ and we obtain the same results. Moreover,
we can rewrite the sequence in (2.3.12) as

φ_ε(x) = (1/ε) φ(x/ε) (2.3.14)

and take ε → 0⁺. Letting n² = 1/(4ε) in

φn(x) = (n/√π) e^{−n²x²},

the sequence can be written in the following form

φ_ε(x) = (1/√(4πε)) e^{−x²/4ε},

with φ_ε → δ as ε → 0. This representation of the delta sequence is important
and useful in constructing solutions of PDEs.

2.4 Differentiation of Distributions

2.4.1 Notion of Distributional Derivative

One of the most fundamental and important properties of distributions is that they
ignore values at points and act on functions through an integration process. This
seems interesting because it enables us to differentiate discontinuous functions. The
generalized definition of functions is the main tool to allow this process to occur.
Assume a distribution is T and its derivative is T′. Then

⟨T′, ϕ⟩ = ∫_{−∞}^{∞} T′ϕ dx.

If we perform integration by parts, making use of the fact that ϕ is differentiable
and of compact support, then the above integral takes the form

⟨T′, ϕ⟩ = 0 − ∫_{−∞}^{∞} Tϕ′ dx.

This suggests the following definition.



Definition 2.4.1 (Distributional Derivative) Let T be a distribution. Then, the
distributional derivative of T, denoted by T′, is given by

⟨T′, ϕ⟩ = −⟨T, ϕ′⟩.

The derivative of a distribution is always a distribution, and we can continue
differentiating, or use induction, to get

⟨T^(m), ϕ⟩ = (−1)^m ⟨T, ϕ^(m)⟩.

We have no problem with that as long as ϕ ∈ D, which is in fact one of the main
reasons why test functions are restricted to that condition. It should be noted that
T (m) can never be the zero function. We pointed out previously that some normal
functions, such as the locally integrable functions, can be considered distributions.
This implies the following:
(1) Derivatives of the distributions should extend the notion of derivative for func-
tions. Otherwise, we may get two different derivatives for the same function if
treated as a normal function and as a distribution.
(2) The rules governing the distributional derivatives should be the same in classical
cases.

2.4.2 Calculus Rules

Theorem 2.4.2 The following rules hold for distributional derivatives:


(1) Summation Rule:
(T + S) = T  + S  .

(2) Scalar Multiplication Rule:


(cT ) = cT 

for every c ∈ R.
(3) Product Rule: If g ∈ C ∞ , then

(gT ) = gT  + g  T.

(4) Chain Rule: If g ∈ C∞, then

d/dx [T(g(x))] = T′(g(x)) · g′(x).

Proof (1) and (2) are immediate using the definitions:

⟨(T + S)′, ϕ⟩ = ⟨T′, ϕ⟩ + ⟨S′, ϕ⟩,

and

⟨(cT)′, ϕ⟩ = c⟨T′, ϕ⟩.

For (3), we have

⟨(gT)′, ϕ⟩ = −⟨gT, ϕ′⟩ = −⟨T, gϕ′⟩
           = −⟨T, (gϕ)′⟩ + ⟨T, g′ϕ⟩
           = ⟨T′, gϕ⟩ + ⟨T, g′ϕ⟩
           = ⟨gT′, ϕ⟩ + ⟨g′T, ϕ⟩.

For (4), we apply the definition to get

⟨[T(g(x))]′, ϕ⟩ = −⟨T(g(x)), ϕ′⟩
                = −[T(g(x))ϕ]_{−∞}^{∞} + ∫_{−∞}^{∞} T′(g(x)) · g′(x)ϕ(x)dx
                = ⟨T′(g(x)) · g′(x), ϕ⟩. □

2.4.3 Examples of Distributional Derivatives

Example 2.4.3 To find the distributional derivative of δ(x), we apply Defini-
tion 2.4.1 to get

⟨δ′, ϕ⟩ = −⟨δ, ϕ′⟩ = −ϕ′(0).

It is important to note that this doesn't mean

δ′ = −ϕ′(0),

but rather that when δ′ acts on ϕ, the outcome is −ϕ′(0), i.e., δ′[ϕ] = −ϕ′(0).
Further, using induction one can continue to find derivatives to get the general
formula

⟨δ^(n), ϕ⟩ = (−1)ⁿ ϕ^(n)(0).

Example 2.4.4 Consider the Heaviside function

H(x) = { 1 : x > 0
       { 0 : x < 0.

Let ϕ ∈ D. Then

⟨H′, ϕ⟩ = −⟨H, ϕ′⟩
        = −∫_{−∞}^{∞} Hϕ′ dx
        = −∫_{0}^{∞} ϕ′ dx = ϕ(0).

But ϕ(0) is nothing but ⟨δ, ϕ⟩, therefore we get

⟨H′, ϕ⟩ = ⟨δ, ϕ⟩

for any test function ϕ. So we conclude that H′ = δ.
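The pairing −⟨H, ϕ′⟩ = ϕ(0) can be reproduced numerically. A minimal sketch, assuming numpy; the Gaussian is an arbitrary smooth, rapidly decaying stand-in for a test function:

```python
import numpy as np

# Numerical version of <H', phi> = -<H, phi'> = phi(0).
x = np.linspace(-15.0, 15.0, 300001)
dx = x[1] - x[0]
phi = np.exp(-(x - 0.5)**2)        # phi(0) = exp(-0.25)
dphi = np.gradient(phi, dx)        # numerical phi'
H = (x > 0).astype(float)          # Heaviside function on the grid

lhs = -np.sum(H * dphi) * dx       # -<H, phi'>
print(lhs, np.exp(-0.25))          # both approximately 0.7788
```

No differentiation of H itself is ever needed: the derivative is moved onto the smooth test function, which is the whole point of the distributional derivative.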


Example 2.4.5 Consider the sign function

f(x) = sgn(x) = { 1  : x > 0
                { −1 : x < 0.

Then we clearly have

sgn(x) = H(x) − H(−x).

Using properties of derivatives (Theorem 2.4.2(1)) and the previous example,

(sgn(x))′ = H′(x)(1) − H′(−x)(−1)
          = δ(x) + δ(−x) = 2δ(x).

Another interesting function that we would like to differentiate in the distributional sense is the
following.
Example 2.4.6 Let f(x) = ln|x|. We apply the definition to obtain

⟨(ln|x|)′, ϕ⟩ = −∫_{−∞}^{∞} ln|x| ϕ′ dx.

Since we have a singularity at 0, we use the principal value again to get

⟨(ln|x|)′, ϕ⟩ = −lim_{ε→0⁺} ∫_{|x|>ε} ln|x| ϕ′ dx. (2.4.1)

By integration by parts, and the fact that ϕ is of compact support, the RHS of (2.4.1)
equals

lim_{ε→0⁺} [ ∫_{|x|>ε} ϕ(x)/x dx + (ϕ(ε) − ϕ(−ε)) ln ε ]. (2.4.2)

The reader should be able to verify that

lim_{ε→0⁺} (ϕ(ε) − ϕ(−ε)) ln ε = 0.

Combining this with (2.4.1)–(2.4.2) gives

⟨(ln|x|)′, ϕ⟩ = lim_{ε→0⁺} ∫_{|x|>ε} ϕ(x)/x dx = ⟨p.v. 1/x, ϕ⟩.

2.4.4 Properties of δ

The following theorem provides some interesting properties of the delta distribution
that can be very useful in computations.

Theorem 2.4.7 The following properties hold for δ(x).
(1) x · δ′(x) = −δ(x).
(2) If c ∈ Dom( f ), then

f(x)δ(x − c) = f(c)δ(x − c).

(3) For any c ≠ 0, we have

δ(x² − c²) = (1/(2|c|)) [δ(x − c) + δ(x + c)].

(4) Let g(x) = (x − x₁)(x − x₂) · · · (x − xₙ). Then,

δ(g(x)) = Σ_{k=1}^{n} (1/|g′(xₖ)|) δ(x − xₖ).

Proof For (1), we make use of Proposition 2.2.3(3) and Theorem 2.4.2(3). Since
x · δ(x) = 0, we have

⟨xδ′, ϕ⟩ = ⟨(xδ)′, ϕ⟩ − ⟨δ, ϕ⟩ = −⟨δ, ϕ⟩.

For (2) we apply the definition:

⟨f(x)δ(x − c), ϕ(x)⟩ = ⟨δ(x − c), f(x)ϕ(x)⟩
                     = f(c)ϕ(c)
                     = f(c)⟨δ(x − c), ϕ(x)⟩
                     = ⟨f(c)δ(x − c), ϕ(x)⟩.

For (3), consider (taking c > 0)

H(x² − c²) = { 1 : x < −c
             { 0 : −c < x < c
             { 1 : x > c.

Then, we can write

H(x² − c²) = 1 − [H(x + c) − H(x − c)].

Taking the derivative of both sides of the equation using the chain rule gives

2xδ(x² − c²) = δ(x − c) − δ(x + c).

Now (3) follows from (2).
For (4), we extend (3). Notice that, assuming x₁ < x₂ < · · · < xₙ,

H(g(x)) = 1 − [H(x − x₁) − H(x − x₂) + H(x − x₃) − · · · − (−1)ⁿ H(x − xₙ)].

Differentiating both sides and using (2), the result follows. □
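Property (3) can be sanity-checked numerically by replacing δ with a narrow Gaussian from the delta sequence of Example 2.3.6. A minimal sketch, assuming numpy; c, ϕ, n, and the grid are illustrative choices:

```python
import numpy as np

# Check of delta(x^2 - c^2) = (1/(2|c|)) [delta(x-c) + delta(x+c)] using a
# narrow Gaussian approximation of delta; all concrete numbers are illustrative.
c = 1.0
phi = lambda x: np.exp(-(x - 2.0)**2)

x = np.linspace(-4.0, 4.0, 800001)
dx = x[1] - x[0]

n = 200.0
delta_n = lambda t: n / np.sqrt(np.pi) * np.exp(-(n * t)**2)

lhs = np.sum(delta_n(x**2 - c**2) * phi(x)) * dx
rhs = (phi(c) + phi(-c)) / (2.0 * abs(c))
print(lhs, rhs)   # the two sides agree closely
```

The Jacobian factor 1/(2|c|) emerges automatically from the change of variables t = x² − c² near each root x = ±c.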

2.5 The Fourier Transform Problem

2.5.1 Introduction

The Fourier transform is one of the main tools used in the theory of distributions and
its applications to partial differential equations. In fact, a comprehensive study of
the theory of Fourier transforms and its techniques requires a whole separate book.
We will, however, confine ourselves to the material that suffices for our needs and meets
the aims of the present book. Our main goal is to enlarge the domain of the Fourier
transform so that it applies to a wide variety of functions. If we confine distribution theory
to test functions, we cannot do much work on transformations. It is well-known
that functions such as the Heaviside function, constant functions, polynomials,
periodic sines and cosines, and others are good examples of external sources
imposed on systems, so they appear in the PDEs representing the systems. Unfortu-
nately, these functions do not possess Fourier transforms. The duality of the Fourier
transform is not consistent with test functions because the Fourier transform of a test
function need not be a test function. The key is to ensure the following two points:
1. to find a property that keeps a function vanishing at infinity; 2. that, when multiplied by
other smooth and nice functions, the integrand is integrable over R, or Rⁿ.

2.5.2 Fourier Transform on Rn

Recall that the one-dimensional Fourier transform of a function f: R −→ R is given by

F{f(x)} = ∫_{−∞}^{∞} f(x)e^{−iωx} dx, (2.5.1)

where ω ∈ R. The Fourier transform of a function is denoted by

F{f(x)}(ω) = f̂(ω).

In n dimensions, this is extended to the multidimensional Fourier transform

F{f(x)}(ω) = f̂(ω) = ∫_{Rⁿ} f(x)e^{−i(ω·x)} dx. (2.5.2)

Here, x = (x₁, x₂, . . . , xₙ) and ω = (ω₁, ω₂, . . . , ωₙ) are spatial variables of n
dimensions, and

ω · x = ω₁x₁ + · · · + ωₙxₙ.

We can recover the function from the Fourier transform through the inverse Fourier
transform

f(x) = (1/2π) ∫_{−∞}^{∞} f̂(ω)e^{iωx} dω

on R, and

F⁻¹{f̂(ω)} = (1/(2π)ⁿ) ∫_{Rⁿ} f̂(ω)e^{i(ω·x)} dω

for f: Rⁿ −→ R. In the present section, we discuss the problem of existence of
Fourier and inverse Fourier transforms. More precisely, we pose the follow-
ing questions:
(1) Under what conditions does the Fourier transform of a function f exist?
(2) Can we find the inverse Fourier transform of fˆ(ω) for a function f (x)?
(3) If the answer to (2) is yes, does this recover f again in the sense that
F −1 F{ f (x)} = FF −1 { f (ω)} = f (x)?

2.5.3 Existence of Fourier Transform

We begin with the following result.

Theorem 2.5.1 If f ∈ L¹(R), then f̂(ω) exists.

Proof Note that

|F{f}| ≤ ∫_{−∞}^{∞} |f(x)e^{−iωx}| dx = ∫_{−∞}^{∞} |f(x)| dx < ∞. □
The result establishes the fact that the Fourier transform F is, in fact, a bounded linear
(hence continuous) mapping from L¹ to L∞. Now, suppose

f ∈ L¹(R) ∩ L²(R).

How can we establish a Fourier transform for f? The idea is to define the transform F on
a dense subspace of L²(R), and then extend the domain of definition to L²(R) by
continuity. The typical example of such a subspace is the space of simple functions,
because this space is dense in L². Consider the truncated sequence

fₙ = f · χ_{[−n,n]},

where χ is the characteristic function

χ_{[−n,n]}(x) = { 1  −n ≤ x ≤ n
               { 0  otherwise.

Then

fₙ ∈ L¹(R) ∩ L²(R),

and it is known that L¹(R) ∩ L²(R) is dense in L²(R).

It is easy to see that

‖fₙ − f‖₂ → 0,  ‖fₙ − f‖₁ → 0,

which implies that fₙ → f in L²(R), and the sequence {fₙ} is Cauchy in L²(R). On
the other hand, since fₙ ∈ L¹(R) ∩ L²(R), f̂ₙ(ω) ∈ L²(R) for every n, where

f̂ₙ(ω) = ∫_{−n}^{n} f(x)e^{−iωx} dx.

Now, let 0 < n < m. With the aid of the Plancherel Theorem, which will be discussed
next, we have as n, m −→ ∞

‖F{fₙ} − F{fₘ}‖₂ = ‖F{fₙ − fₘ}‖₂ = √(2π) ‖fₙ − fₘ‖₂ → 0. (2.5.3)

So f̂ₙ(ω) is Cauchy in the complete space L²(R); hence f̂ₙ(ω) converges in the
L²-norm to a function in L²(R), call it h(x). Note that h was found by means of
the sequence {fₙ}. Let us assume there exists another Cauchy sequence, say gₙ ∈
L¹(R) ∩ L²(R), such that ‖gₙ − f‖₂ → 0, which implies that ‖gₙ − fₙ‖₂ → 0. By
the same argument as above we conclude that ĝₙ −→ g in the L²(R) norm. Using
(2.5.3) again, it is easy to show that

‖h − g‖₂ ≤ ‖h − f̂ₙ‖₂ + √(2π)‖fₙ − gₙ‖₂ + ‖ĝₙ − g‖₂ → 0.

This means that h = g a.e., i.e., h does not depend on the choice of the approximating
sequence {fₙ}, and therefore we can now define the Fourier transform of f on L²(R)
to be f̂(ω) = F{f} = h, as an equivalence class of functions in L², where

F{f} = l.i.m.ₙ→∞ F{fₙ} = l.i.m.ₙ→∞ ∫_{−n}^{n} f(t)e^{−iωt} dt.

The notation l.i.m. is the limit in mean and refers to the limit in the L²-norm.
For convenience, we will, however, write it simply as lim, keeping in mind that it
is not a pointwise convergence, but convergence in the L²-norm. It remains to prove
(2.5.3).

2.5.4 Plancherel Theorem

The following theorem is one of the central theorems in the theory of Fourier analysis.
It is called the Plancherel theorem, and is sometimes called Parseval's identity. It
demonstrates the fact that the Fourier transform F on L²(R) is a bijective linear oper-
ator which maps f to f̂, and is an isometry up to a constant, so it is an isomorphism
of L² onto itself.

Theorem 2.5.2 (Plancherel Theorem) Let f ∈ L²(R), and let its Fourier transform
be f̂. Then,

‖f‖₂ = (1/√(2π)) ‖f̂‖₂.

Proof We have

∫_{−∞}^{∞} |f(t)|² dt = ∫_{−∞}^{∞} f(t) \overline{f(t)} dt
  = (1/2π) ∫_{−∞}^{∞} f(t) ( ∫_{−∞}^{∞} \overline{f̂(ω)} e^{−iωt} dω ) dt
  = (1/2π) ∫_{−∞}^{∞} \overline{f̂(ω)} ( ∫_{−∞}^{∞} f(t)e^{−iωt} dt ) dω
  = (1/2π) ∫_{−∞}^{∞} \overline{f̂(ω)} f̂(ω) dω
  = (1/2π) ∫_{−∞}^{∞} |f̂(ω)|² dω. □

This result shows that the space L²(R) is a perfect environment for the Fourier
transform to work in.
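The identity of Theorem 2.5.2 is easy to verify numerically with the convention (2.5.1). A minimal sketch, assuming numpy; the Gaussian f and the grids are arbitrary illustrative choices:

```python
import numpy as np

# Numerical illustration of ||f||_2 = (1/sqrt(2*pi)) ||f_hat||_2.
x = np.linspace(-8.0, 8.0, 1601)
w = np.linspace(-8.0, 8.0, 1601)
dx = x[1] - x[0]
dw = w[1] - w[0]

f = np.exp(-x**2)
# f_hat(w) = int f(x) e^{-i w x} dx, approximated by a Riemann sum
f_hat = (np.exp(-1j * np.outer(w, x)) * f).sum(axis=1) * dx

norm_f = np.sqrt(np.sum(np.abs(f)**2) * dx)
norm_fhat = np.sqrt(np.sum(np.abs(f_hat)**2) * dw)
print(norm_f, norm_fhat / np.sqrt(2.0 * np.pi))   # agree to quadrature accuracy
```

Both printed numbers approximate (π/2)^{1/4} ≈ 1.1195, showing the isometry up to the constant √(2π).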

2.6 Schwartz Space

2.6.1 Rapidly Decreasing Functions

One of the central problems of Fourier analysis is how to apply the Fourier transform to a
broader class of functions. The main obstacle is that we cannot guarantee F{f} ∈ L¹,
even if F{f} exists for some f ∈ L¹, which causes a problem in satisfying
the essential identity

F⁻¹{F{f}} = f.

One reason for this is that f does not decay fast enough. This tells us that the rate
of decay of the Fourier transform plays a significant role. To see this, let
f′ ∈ L¹(R), with

lim_{t→±∞} f(t) = 0.

Using the definition and basic properties of the Fourier transform, it can be easily shown
that

F{f′(t)} = iω f̂(ω),

which gives

|f̂(ω)| ≤ M/|ω| for some M > 0.

This implies that f̂(ω) converges to 0 like 1/ω. If we proceed further, assuming that
f″ is absolutely integrable over R, and

lim_{t→±∞} f′(t) = 0,

then we obtain

|f̂(ω)| ≤ M/ω²,

i.e., f̂(ω) converges to 0 like 1/ω². This shows that the smoother the function and the
more integrable its derivatives, the faster the decay of its Fourier transform will
be. If f and all its derivatives are absolutely integrable over R and vanish at ∞, then
its Fourier transform decays faster than any power of 1/ω; that is, f̂(ω) converges
to zero faster than the inverse of any polynomial, i.e., f̂(ω) is a
rapidly decreasing function. On the other hand, it is well-known that

F{t f(t)} = i (d/dω) f̂(ω).

Repeating these processes infinitely many times can only work if we are dealing with
infinitely differentiable functions of rapid decay. We conclude that if f has a high
rate of decay, then f̂ is smooth, and if f is smooth, then f̂ has a high rate of decay.
Due to this duality between the smoothness of a function and the rate of decay of
its Fourier transform, and between the rate of decay of a function and the smoothness of its
Fourier transform, the Fourier transform of a function can be used to measure how
smooth that function is, and the faster f decays the smoother its Fourier transform
will be. This idea motivated Laurent Schwartz in the late 1940s
to introduce the class of rapidly decreasing functions, which provides the basis for
Schwartz spaces.
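For completeness, the single integration by parts behind F{f′(t)} = iω f̂(ω), valid when f, f′ ∈ L¹(R) and f(t) → 0 as t → ±∞, can be written out as follows; note that one admissible constant is M = ‖f′‖_{L¹}:

```latex
\mathcal{F}\{f'\}(\omega)
  = \int_{-\infty}^{\infty} f'(t)\,e^{-i\omega t}\,dt
  = \Bigl[f(t)\,e^{-i\omega t}\Bigr]_{-\infty}^{\infty}
    + i\omega \int_{-\infty}^{\infty} f(t)\,e^{-i\omega t}\,dt
  = i\omega\,\hat{f}(\omega),
\qquad\text{so}\qquad
\bigl|\hat{f}(\omega)\bigr|
  = \frac{\bigl|\mathcal{F}\{f'\}(\omega)\bigr|}{|\omega|}
  \le \frac{\|f'\|_{L^1}}{|\omega|}.
```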
Definition 2.6.1 (Rapidly Decreasing Function) Let ϕ ∈ C∞(R). Then ϕ is said to
be a rapidly decreasing function if

‖ϕ‖_{k,m} = sup_{x∈R} |x^k ϕ^(m)(x)| < ∞ for all k, m ≥ 0. (2.6.1)

In other words, a rapidly decreasing function is simply a smooth function that
decays to zero as x → ±∞ faster than the inverse of any polynomial. According
to the definition, all the derivatives of such a function have the same property. The
following can be considered equivalent to (2.6.1) (see Problem 2.11.22):

lim_{|x|→∞} x^k ϕ^(m)(x) = 0, for all k, m ≥ 0. (2.6.2)

sup_{x∈R} (1 + |x|²)^k |ϕ^(m)(x)| < ∞, for all k, m ≥ 0. (2.6.3)

sup_{x∈R} (1 + |x|)^k |ϕ^(m)(x)| < ∞, for all k, m ≥ 0. (2.6.4)

The definition can be easily extended to Rⁿ. In this case, we need partial differen-
tiation. We define a multi-index α = (α₁, . . . , αₙ) to be an n-tuple of nonnegative
integers αᵢ ≥ 0, such that

αᵢ ∈ N₀ = N ∪ {0},

and we denote |α| = α₁ + · · · + αₙ, so that for x = (x₁, x₂, . . . , xₙ) ∈ Rⁿ, we have

x^α = x₁^{α₁} · · · xₙ^{αₙ}.

The norm (2.6.1) becomes

‖ϕ‖_{α,β} = sup_{x∈Rⁿ} |x^α ∂^β ϕ|,

and definition (2.6.2) reads

lim_{|x|→∞} x^α ∂^β ϕ(x) = 0, for all α, β ∈ N₀ⁿ,

for |α| = k and |β| = m.

2.6.2 Definition of Schwartz Space

It is a good exercise to show that if ϕ and φ are two rapidly decreasing functions,
then aϕ + bφ is also a rapidly decreasing function (verify), so the collection of all
rapidly decreasing functions on R forms a linear space. This space is called the Schwartz
space.

Definition 2.6.2 (Schwartz Space) A linear space is called a Schwartz space, denoted
by S, if it consists of all rapidly decreasing functions, which are also known as
Schwartz functions.
It is clear from the definition that every test function is a Schwartz function. That
is,

D(Rⁿ) ⊂ S(Rⁿ).

On the other hand, ϕ(x) = e^{−x²} is clearly a Schwartz function but is not a test function.
Indeed,

f(x) = e^{x²} ∈ L¹_loc,

so f defines a regular distribution in D′(Rⁿ). But

⟨f, ϕ⟩ = ∫_R e^{x²} e^{−x²} dx = ∞.

Hence, ϕ is not a test function. Thus we have the following important proper inclusion:

D(Rⁿ) ⊊ S(Rⁿ).

For convergence in S(Rⁿ), let {ϕⱼ} be a sequence such that ϕⱼ, ϕ ∈ S(Rⁿ). Then, we
say that ϕⱼ → ϕ in S(Rⁿ) iff

‖ϕⱼ − ϕ‖_{α,β} → 0

for every α, β ∈ N₀ⁿ. For n = 1, we have

‖ϕⱼ − ϕ‖_{k,m} → 0

for every k, m ∈ N₀.
Under this new class of functions, if ϕ ∈ S then e^{−iωx}ϕ ∈ S, and so F{ϕ} is well-defined.
Indeed, the equivalent definition (2.6.3) with m = 0 gives

| ∫ ϕe^{−iωx} dx | ≤ ∫ |ϕ(x)| dx
                  ≤ sup_{x∈R} (1 + |x|²)^k |ϕ(x)| ∫ dx/(1 + |x|²)^k
                  < ∞.

So the Fourier transform of a function in S exists, and using the properties of the Fourier
transform and the fact that

F{t f(t)} = i (d/dω) f̂(ω),

we can claim the same result for all derivatives of F{ϕ}.

2.6.3 Derivatives of Schwartz Functions

The following result is the key to achieving our goal.

Proposition 2.6.3 Let ϕ ∈ S(R) be a Schwartz function, and F{ϕ}(ω) = ϕ̂(ω). Let
Dₓ^k ϕ denote d^k ϕ/dx^k, for k ∈ Z⁺. Then
(1) (−i)^k F{Dₓ^k ϕ(x)} = ω^k ϕ̂(ω).
(2) (−i)^k F{x^k ϕ(x)} = D_ω^k ϕ̂(ω).

Proof To prove (1), we perform integration by parts k times, taking into account that

Dₓ^k e^{−iωx} = (−iω)^k e^{−iωx},

and using the fact that ϕ vanishes at ∞, being Schwartz, we get

F{Dₓ^k ϕ} = (−1)^k ∫_R ϕ(x)(−iω)^k e^{−iωx} dx = (iω)^k ϕ̂(ω). (2.6.5)

The second assertion follows immediately given the fact that

D_ω^k e^{−iωx} = (−ix)^k e^{−iωx},

where the differential operator D_ω^k is taken with respect to ω. We get

D_ω^k ϕ̂(ω) = ∫_R ϕ(x) D_ω^k e^{−iωx} dx.

Now, performing the differentiation k times w.r.t. ω gives

D_ω^k ϕ̂(ω) = ∫_R ϕ(x)(−ix)^k e^{−iωx} dx
           = (−i)^k ∫_R x^k ϕ(x) e^{−iωx} dx
           = (−i)^k F{x^k ϕ(x)}. □
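Assertion (1) can be checked numerically for k = 1. A minimal sketch, assuming numpy; the Gaussian ϕ is a convenient Schwartz function, and the grids are illustrative:

```python
import numpy as np

# Numerical check of Proposition 2.6.3(1) for k = 1: (-i) F{phi'} = omega * phi_hat.
x = np.linspace(-8.0, 8.0, 1601)
dx = x[1] - x[0]
w = np.linspace(-6.0, 6.0, 241)

phi = np.exp(-x**2)
dphi = -2.0 * x * np.exp(-x**2)    # exact derivative of phi

kernel = np.exp(-1j * np.outer(w, x))
phi_hat = (kernel * phi).sum(axis=1) * dx
dphi_hat = (kernel * dphi).sum(axis=1) * dx

err = np.max(np.abs(-1j * dphi_hat - w * phi_hat))
print(err)   # small: both sides agree up to quadrature error
```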


The previous result can be easily extended to S(Rⁿ). Recall that the n-dimensional
Fourier transform of f: Rⁿ −→ R, denoted by F{f(x)}(ω), is given by

F{f(x)}(ω) = f̂(ω) = ∫_{Rⁿ} f(x)e^{−i(ω·x)} dx,

where

ω = (ω₁, ω₂, . . . , ωₙ), ω · x = ω₁x₁ + · · · + ωₙxₙ.

Then, Proposition 2.6.3 takes the following form:

Proposition 2.6.4 Let ϕ ∈ S(Rⁿ) be a Schwartz function, and F{ϕ} = ϕ̂. Let D^α
denote ∂^α for α = (α₁, . . . , αₙ), |α| = α₁ + · · · + αₙ. Then:
(1) (−i)^{|α|} F{Dₓ^α ϕ} = ω^α ϕ̂(ω).
(2) (−i)^{|α|} F{x^α ϕ(x)} = D_ω^α ϕ̂(ω).

Proof The proof is the same as for the previous proposition. To prove (1), note that the
integral in (2.6.5) is now over Rⁿ. Then

F{D_{x_j} ϕ} = ∫_{Rⁿ} (D_{x_j} ϕ(x)) e^{−i(ω·x)} dx.

But since ϕ is Schwartz, we must have

(Dₓ^α ϕ(x)) e^{−i(ω·x)} ∈ L¹,

and using the Fubini Theorem,

F{D_{x_j} ϕ} = ∫_{Rⁿ⁻¹} ∫_R (D_{x_j} ϕ(x)) e^{−i(ω·x)} dx.

Then, we proceed as in the proof of Proposition 2.6.3, and repeating |α|
times, we obtain (1).
To prove (2) we write

D_{ω_j} ϕ̂(ω) = D_{ω_j} ∫_{Rⁿ} ϕ(x)e^{−i(ω·x)} dx.

Again,

|x_j ϕ(x)e^{−i(ω·x)}| = |x_j ϕ(x)|

and x_j ϕ(x) ∈ L¹. So

D_{ω_j} ϕ̂(ω) = ∫_{Rⁿ} ϕ(x) D_{ω_j} e^{−i(ω·x)} dx
            = ∫_{Rⁿ} (−ix_j)ϕ(x)e^{−i(ω·x)} dx
            = −iF{x_j ϕ(x)}.

Repeating the process |α| times gives (2). □

Notice the correspondence between the two operations: differentiation and multipli-
cation by polynomials. One advantage of the Schwartz space is that it handles
this correspondence well because it is closed under both operations. If we
add to this the advantage of being closed under the Fourier transform, we realize why
such a space is the ideal space to utilize. As a consequence of the previous result, if
ϕ ∈ S(Rⁿ), then F{Dₓ^α ϕ} and F{x^α ϕ(x)} exist; hence by Proposition 2.6.4, ω^α ϕ̂(ω)
and D_ω^α ϕ̂(ω) exist, and we have

(−i)^{|α|} F{Dₓ^α ϕ} = ω^α ϕ̂(ω)

and

(−i)^{|α|} F{x^α ϕ(x)} = D_ω^α ϕ̂(ω).

It turns out that F{ϕ} is a Schwartz function, as claimed in the discussion at the
beginning of the section. We have the following important result.

Corollary 2.6.5 If ϕ ∈ S(Rn ), then F{ϕ} ∈ S(Rn ), that is, the Schwartz space is
closed under the Fourier transform.
The result is not valid for D(Rn ) because the Fourier transform of a test function
is not necessarily a test function.

2.6.4 Isomorphism of Fourier Transform on Schwartz Spaces

Consider the mapping


F : S(Rn ) −→ S(Rn ). (2.6.6)

Let f ∈ S(Rn ). Using Proposition 2.6.4,

ω α D β F{ f } = (−i)|β| ω α F{x β f }
= (−i)|β|+|α| F{D α x β f }.

But
D α x β f ∈ S(Rn ) ⊂ L 1 (Rn ). (2.6.7)

Hence, F{D α x β f }(ω) exists, and


 
sup ω α D β F{ f } < ∞.

Therefore,
F{ f } ∈ S(Rn ).
 
Note that if the sequence $f_j \in S(\mathbb{R}^n)$ and $f_j \to f$ in $S(\mathbb{R}^n)$, i.e., $\|f_j - f\|_{\alpha,\beta} \to 0$, then
\[ D^\alpha x^\beta f_j \to D^\alpha x^\beta f \]
in $S(\mathbb{R}^n)$. Furthermore, the inclusion in (2.6.7) implies that $D^\alpha x^\beta f_j \to D^\alpha x^\beta f$ in $L^1$, but since $\mathcal{F}$ is continuous on $L^1$, this implies
\[ \mathcal{F}\{D^\alpha x^\beta f_j\} \to \mathcal{F}\{D^\alpha x^\beta f\}, \]
and consequently,
\[ \omega^\alpha D^\beta\hat{f}_j \to \omega^\alpha D^\beta\hat{f}. \]
Hence, $\hat{f}_j \to \hat{f}$, and $\mathcal{F}$ is thus continuous on $S$.


If $f(x) \in S(\mathbb{R}^n)$, then $\mathcal{F}^2\{f\} = \mathcal{F}\{\mathcal{F}\{f\}\}$. Since
\[ (2\pi)^n f(-x) = \int_{\mathbb{R}^n}\hat{f}(\omega)\, e^{-i\omega\cdot x}\, d\omega = \mathcal{F}\{\mathcal{F}\{f\}\}, \]

we have
\[ \mathcal{F}^2\{f(x)\} = (2\pi)^n f(-x), \tag{2.6.8} \]
or, normalizing $\mathcal{F}$ by setting
\[ T = \frac{1}{(2\pi)^{n/2}}\,\mathcal{F}, \]
we get
\[ T^2(f) = f(-x), \]
from which we get
\[ T^4(f) = f. \]
This implies that
\[ T^4 = \mathrm{Id}_{S(\mathbb{R}^n)}. \]
It follows that
\[ T(T^3(f)) = f = T^3(T(f)), \]
and hence
\[ T^3 = T^{-1}. \]
Since $T$ is continuous, $T^{-1}$ and $\mathcal{F}^{-1}$ are continuous. Moreover, for every $f \in S(\mathbb{R}^n)$ there exists $g = T^3\{f\}$ such that $T\{g\} = f$; thus we conclude that $T$, and hence $\mathcal{F}$, maps $S(\mathbb{R}^n)$ onto itself. This demonstrates the following important property:
Theorem 2.6.6 The mapping $\mathcal{F} : S(\mathbb{R}^n) \longrightarrow S(\mathbb{R}^n)$ is an isomorphism.
The isomorphism of the Fourier transform on the Schwartz space means that we can take the Fourier transform and the inverse Fourier transform of any function in that space. This makes $S(\mathbb{R}^n)$ the ideal space on which to use the Fourier transform.
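The identity $\mathcal{F}^2\{f\}(x) = 2\pi f(-x)$ underlying the theorem can be checked numerically for $n = 1$. The following sketch is illustrative, not from the text: the test function $f(x) = x e^{-x^2}$ and the Riemann-sum quadrature are my own choices.

```python
import numpy as np

# Apply the Fourier transform twice by quadrature and compare with 2*pi*f(-x).
dx = 0.01
x = np.arange(-10, 10, dx)
f = x * np.exp(-x**2)          # an odd Schwartz function, so f(-x) != f(x)

w = np.arange(-10, 10, dx)
f_hat = np.array([np.sum(f * np.exp(-1j * wk * x)) * dx for wk in w])

xs = np.array([-1.0, -0.3, 0.7, 1.5])
ff = np.array([np.sum(f_hat * np.exp(-1j * x0 * w)) * dx for x0 in xs])  # F{F{f}}(x0)

target = 2 * np.pi * (-xs) * np.exp(-xs**2)   # 2*pi*f(-x)
err = np.max(np.abs(ff - target))
```

Because the integrand decays rapidly, the truncated Riemann sums are extremely accurate here.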

2.7 Tempered Distributions

2.7.1 Definition of Tempered Distribution

It was illustrated in the previous section that the existence of Fourier transforms is one of the central problems in Fourier analysis, and it motivated Schwartz to introduce a new class of distributions. His idea was to extend the space of test functions to include more functions in addition to the smooth functions of compact support. Since the space of test functions becomes larger, we expect the new space of distributions to be smaller, hoping that this new class of distributions will have all the properties we need to define a Fourier transform.

Let $S'(\mathbb{R}^n)$ be the dual space of $S(\mathbb{R}^n)$, and
\[ \langle T, \varphi\rangle = \int_{-\infty}^{\infty} T(x)\varphi(x)\, dx. \]

It is clear that if $T_1, T_2 \in S'(\mathbb{R}^n)$, then
\[ aT_1 + bT_2 \in S'(\mathbb{R}^n) \]
for every $a, b \in \mathbb{R}$. Moreover, if
\[ \varphi_j \to \varphi, \]
then
\[ \langle T, \varphi_j\rangle \to \langle T, \varphi\rangle. \]
This proposes the following class of distributions.


Definition 2.7.1 (Tempered Distribution) A distribution $T : S(\mathbb{R}^n) \to \mathbb{R}$ is said to be a tempered distribution if it is a linear continuous functional defined as
\[ T(\varphi) = \langle T, \varphi\rangle = \int_{-\infty}^{\infty} T(x)\varphi(x)\, dx. \]
The space of all linear continuous functionals on $S(\mathbb{R}^n)$ is called the space of tempered distributions, and is denoted by $S'(\mathbb{R}^n)$.

2.7.2 Functions of Slow Growth

A tempered distribution can be defined through a function $f$ with the property that $f\varphi$ is rapidly decreasing. This can be achieved by what are known as "functions of slow growth".
Definition 2.7.2 (Function of Slow Growth) A function $f \in C^\infty(\mathbb{R}^n)$ is said to be of slow growth if for every $m$ there exists $c_m \geq 0$ such that
\[ \left|f^{(m)}(x)\right| \leq c_m(1 + |x|^2)^k \]
for some $k \in \mathbb{N}$.
The definition implies that functions of slow growth may grow at infinity, but no faster than polynomials; i.e., for some $k$ we have
\[ \frac{f(x)}{x^k} \to 0. \]

The reader should be able to prove that if $f$ is a function of slow growth and $\varphi$ is a Schwartz function, then $f\varphi$ is Schwartz (see Problem 2.11.35), hence integrable. Therefore, this class of functions can be used to define a tempered distribution. Let $f$ be of slow growth and $\varphi_n \in S$; then
\[ |\langle f, \varphi_n\rangle - \langle f, \varphi\rangle| = |\langle f, \varphi_n - \varphi\rangle| \leq \|f\|\,\|\varphi_n - \varphi\|. \tag{2.7.1} \]
If $\varphi_n \to \varphi$ in $S$, then (2.7.1) implies that $f$ is continuous. Linearity of $f$ is obvious; hence $f$ is a linear continuous functional on $S$, i.e., it defines a tempered distribution.
Thus, let $f$ be a function of slow growth. Define $T_f$ as
\[ T_f(\varphi) = \langle T_f, \varphi\rangle = \int_{-\infty}^{\infty} f(x)\varphi(x)\, dx. \]
Linearity is clear. To prove continuity, consider a sequence $\varphi_n \in S$. Then
\begin{align*}
T_f(\varphi_n) = \langle f, \varphi_n\rangle &= \int_{-\infty}^{\infty} f(x)\varphi_n(x)\, dx \\
&= \int_{-\infty}^{\infty}\frac{f(x)}{(1+|x|^2)^k}\cdot(1+|x|^2)^k\varphi_n(x)\, dx \\
&\leq \sup\left|(1+|x|^2)^k\varphi_n\right|\int_{-\infty}^{\infty}\frac{|f(x)|}{(1+|x|^2)^k}\, dx.
\end{align*}

Since $f$ is of slow growth, the integral exists for some large $k$. If $\varphi_n \to 0$, then
\[ \|\varphi_n\|_{k,m} \to 0. \]
Let $m = 0$; then
\[ \sup\left|(1+|x|^2)^k\varphi_n\right| \to 0 \]
because $\{\varphi_n\}$ is rapidly decreasing, hence
\[ \langle f, \varphi_n\rangle \to 0. \]
Therefore, $f$ is continuous. We proved $f$ to be linear and continuous, so $f$ is a tempered distribution. As a result, all polynomials, constant functions, and the trigonometric sine and cosine functions generate tempered distributions.
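As a quick numerical illustration (the particular polynomial and the Gaussian below are my own choices, not from the text), the pairing of a slowly growing function with a Schwartz function is finite, and a simple quadrature reproduces the exact Gaussian moments.

```python
import numpy as np

# slow growth: a polynomial; Schwartz: a Gaussian; their product is integrable
p = lambda t: t**6 + 3 * t**2
phi = lambda t: np.exp(-t**2)

dx = 0.001
x = np.arange(-20, 20, dx)
integral = np.sum(p(x) * phi(x)) * dx

# exact Gaussian moments: \int x^6 e^{-x^2} = (15/8) sqrt(pi),
#                         \int x^2 e^{-x^2} = (1/2) sqrt(pi)
exact = (15 / 8 + 3 / 2) * np.sqrt(np.pi)
```

The same pairing with $e^{x^2}$ in place of the polynomial would diverge, which is exactly why slow growth is required.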

2.7.3 Examples of Tempered Distributions

It should be noted that every tempered distribution is a distribution, but not conversely. Again, the function

\[ f(x) = e^{x^2} \]
defines a regular distribution, but it is not a tempered distribution because
\[ \varphi(x) = e^{-x^2} \in S, \]
and the integral $\int_{\mathbb{R}} f\varphi$ diverges. On the other hand, if we assume $f$ to be of slow growth, i.e., $|f| \leq p(x)$ for some polynomial $p$, then
\[ \int_{\mathbb{R}} f\varphi < \infty. \]

This tells us that the tempered distributions form a proper subclass of the distributions, i.e.,
\[ S'(\mathbb{R}) \subsetneq D'(\mathbb{R}), \]
and there are regular distributions that are not tempered.


Example 2.7.3 Consider the Heaviside function $H(x)$. Using the same technique as above,
\begin{align*}
|\langle H, \varphi_n\rangle| &= \left|\int_{-\infty}^{\infty} H(x)\varphi_n(x)\, dx\right| \\
&= \left|\int_0^{\infty}\varphi_n(x)\, dx\right| \leq \int_0^{\infty}|\varphi_n(x)|\, dx \\
&\leq \sup\left|(1+|x|^2)\varphi_n\right|\int_{-\infty}^{\infty}\frac{1}{1+|x|^2}\, dx.
\end{align*}
The integral exists. Let $\varphi_n \to 0$. Then
\[ \|\varphi_n\|_{1,0} \to 0, \]
so
\[ \sup\left|(1+|x|^2)\varphi_n\right| \to 0, \]
which implies that
\[ |\langle H, \varphi_n\rangle| \to 0. \]
So $H(x)$ is a continuous functional. Linearity is clear. Hence
\[ H(x) \in S'(\mathbb{R}). \]

Example 2.7.4 Consider the delta distribution. We have



\[ |\langle \delta, \varphi_n\rangle| = |\varphi_n(0)|. \]
If $\varphi_n \to 0$, then
\[ \|\varphi_n\|_{k,m} \to 0. \]
Let $k = m = 0$; then $\sup|\varphi_n| \to 0$, which implies
\[ \langle \delta, \varphi_n\rangle \to 0. \]
Therefore, $\delta$ is a continuous functional. Linearity is clear; hence $\delta \in S'(\mathbb{R})$.

2.8 Fourier Transform of Tempered Distribution

2.8.1 Motivation

Suppose we need to take the Fourier transform of a tempered distribution $T$, and for simplicity, let $n = 1$. Then
\begin{align*}
\langle \hat{T}, \varphi\rangle &= \int_{\mathbb{R}}\hat{T}\varphi(\omega)\, d\omega \\
&= \int_{\mathbb{R}}\int_{\mathbb{R}} T(x)\, e^{-i\omega x}\varphi(\omega)\, dx\, d\omega.
\end{align*}
The RHS of the equation can be written by means of Fubini's theorem as
\[ \int_{\mathbb{R}}\left(\int_{\mathbb{R}}\varphi(\omega)\, e^{-i\omega x}\, d\omega\right)T(x)\, dx = \int_{\mathbb{R}} T(x)\hat{\varphi}\, dx = \langle T, \hat{\varphi}\rangle. \]
Thus, we have
\[ \langle \hat{T}, \varphi\rangle = \langle T, \hat{\varphi}\rangle. \]
 
In order for $\langle T, \hat{\varphi}\rangle$ to make sense, it is required that $\hat{\varphi} \in S$ for every $\varphi \in S$. So now we understand the purpose of introducing a new class of distributions: the dual of the space of rapidly decreasing (Schwartz) functions. Tempered distributions behave nicely with Fourier transforms. We now state the definition of the Fourier transform of a distribution.

2.8.2 Definition

Definition 2.8.1 (Fourier Transform of Tempered Distribution) Let $T$ be a tempered distribution and $\varphi \in S$. Then the Fourier transform of $T$, denoted by $\hat{T}$ (or $\mathcal{F}\{T\}$), is given by
\[ \langle \hat{T}, \varphi\rangle = \langle T, \hat{\varphi}\rangle. \]
Remark The notation $\mathcal{F}\{T\}$ is commonly used to denote the classical Fourier transform of a function, but the notation $\hat{T}$ is more commonly used for distributions.

The definition says that
\[ \hat{T}(\varphi) = T(\hat{\varphi}) \]
for every $\varphi \in S$, and this makes sense because $\hat{\varphi} \in S$ and $T \in S'$; consequently, every tempered distribution has a Fourier transform. Now the question arises: how do we find the Fourier transform of a distribution? We need to manipulate the integration in $\langle T, \hat{\varphi}\rangle$ and rewrite it as
\[ \int g(x)\varphi(x)\, dx \]
for some $g(x)$. Then we obtain
\[ \langle \hat{T}, \varphi\rangle = \langle g, \varphi\rangle, \]
which implies
\[ \hat{T} = g. \]

The Fourier transform, in this case, is defined in a distributional sense, but it is a


natural extension of the classical Fourier transform. If T is a function of slow growth
and its classical Fourier transform F{T } exists, then F{T } = g.

2.8.3 Derivative of F.T. of Tempered Distribution

The next proposition shows that the result of Proposition 2.6.4 for Schwartz functions
is valid for distributions.
Proposition 2.8.2 Let $T$ be a tempered distribution. Then
(1) $D_\omega^\alpha(\hat{T}(\omega)) = (-i)^{|\alpha|}\,\widehat{x^\alpha T}$,
(2) $\omega^\alpha\hat{T} = (-i)^{|\alpha|}\,\widehat{D_x^\alpha T}$.

Proof To prove the first assertion, we have
\[ \langle D_\omega^\alpha(\hat{T}(\omega)), \varphi(\omega)\rangle = (-1)^{|\alpha|}\langle \hat{T}, D_\omega^\alpha\varphi(\omega)\rangle = (-1)^{|\alpha|}\langle T, \mathcal{F}\{D_\omega^\alpha\varphi(\omega)\}\rangle. \]
By Proposition 2.6.4(1),
\[ \mathcal{F}\{D_\omega^\alpha\varphi\} = (i)^{|\alpha|} x^\alpha\hat{\varphi}. \]
This gives
\[ (-1)^{|\alpha|}\langle T, \mathcal{F}\{D_\omega^\alpha\varphi(\omega)\}\rangle = (-i)^{|\alpha|}\langle T, x^\alpha\hat{\varphi}\rangle = (-i)^{|\alpha|}\langle \widehat{x^\alpha T}, \varphi\rangle. \]
Hence,
\[ \langle D_\omega^\alpha(\hat{T}(\omega)), \varphi(\omega)\rangle = (-i)^{|\alpha|}\langle \widehat{x^\alpha T}, \varphi(\omega)\rangle. \]
This proves (1).
For (2), note that
\[ \langle \widehat{D_x^\alpha T}, \varphi\rangle = \langle D_x^\alpha T, \hat{\varphi}\rangle = (-1)^{|\alpha|}\langle T, D_x^\alpha(\hat{\varphi})\rangle. \]
Again, by Proposition 2.6.4(2),
\[ (-1)^{|\alpha|}\langle T, D_x^\alpha(\hat{\varphi})\rangle = \langle T, \widehat{(i\omega)^\alpha\varphi}\rangle = (i)^{|\alpha|}\langle T, \widehat{\omega^\alpha\varphi}\rangle = (i)^{|\alpha|}\langle \omega^\alpha\hat{T}, \varphi\rangle. \]
Hence,
\[ \widehat{D_x^\alpha T} = (i)^{|\alpha|}\omega^\alpha\hat{T}. \]
Dividing by $(i)^{|\alpha|}$ proves (2). $\square$

2.9 Inversion Formula of The Fourier Transform

The inverse Fourier transform of a distribution, denoted by $\mathcal{F}^{-1}\{T\}$ or $\check{T}$, is given by
\[ \langle \mathcal{F}^{-1}\{\mathcal{F}\{T\}\}, \varphi\rangle = \langle \mathcal{F}\{T\}, \mathcal{F}^{-1}\{\varphi\}\rangle = \langle T, \mathcal{F}\{\mathcal{F}^{-1}\{\varphi\}\}\rangle = \langle T, \varphi\rangle, \]
i.e.,
\[ \mathcal{F}\{\mathcal{F}^{-1}\{T\}\} = \mathcal{F}^{-1}\{\mathcal{F}\{T\}\} = T. \tag{2.9.1} \]
How do we construct a formula for $\check{T}$? The next example is helpful in establishing some subsequent results.

2.9.1 Fourier Transform of Gaussian Function

Example 2.9.1 (Gaussian Function) Consider
\[ f(x) = e^{-x^2/2}. \]
By definition, we have
\[ \mathcal{F}\{f(t)\} = \int_{-\infty}^{\infty} e^{-\left(\frac{1}{2}t^2 + i\omega t\right)}\, dt. \]
Write
\[ \frac{1}{2}t^2 + i\omega t = \left(\frac{t}{\sqrt{2}} + \frac{i\omega}{\sqrt{2}}\right)^2 + \frac{\omega^2}{2}. \]
Using the substitution
\[ u = \frac{t + i\omega}{\sqrt{2}}, \]
we get
\[ \mathcal{F}\{f(t)\} = \sqrt{2}\, e^{-\omega^2/2}\int_{-\infty}^{\infty} e^{-u^2}\, du = \sqrt{2\pi}\, e^{-\omega^2/2}. \]
If $x \in \mathbb{R}^n$, then $f$ is written as
\[ f(x) = e^{-|x|^2/2}. \]
So
\[ \mathcal{F}\{f\} = \int_{\mathbb{R}^n} e^{-|x|^2/2 - i(\omega\cdot x)}\, dx. \]
By the previous argument for the case $n = 1$, taking into account that the integral is over $\mathbb{R}^n$, we have
\begin{align*}
\mathcal{F}\{f\} &= e^{-\frac{1}{2}|\omega|^2}\prod_{k=1}^{n}\int_{-\infty}^{\infty} e^{-\frac{1}{2}[x_k + i\omega_k]^2}\, dx_k \\
&= e^{-\frac{1}{2}|\omega|^2}\prod_{k=1}^{n}\sqrt{2\pi} \\
&= (2\pi)^{n/2} e^{-|\omega|^2/2}.
\end{align*}
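The one-dimensional computation above can be verified by quadrature; the grid bounds and step below are ad hoc choices for illustration.

```python
import numpy as np

# Check F{exp(-t^2/2)}(w) = sqrt(2*pi) * exp(-w^2/2) numerically.
dt = 0.01
t = np.arange(-12, 12, dt)
f = np.exp(-t**2 / 2)

omegas = np.linspace(-3, 3, 13)
F = np.array([np.sum(f * np.exp(-1j * w * t)) * dt for w in omegas])
target = np.sqrt(2 * np.pi) * np.exp(-omegas**2 / 2)
err = np.max(np.abs(F - target))
```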

The Gaussian function shall be used to obtain the inversion formula of the Fourier transform. Indeed, let $f \in C_c^\infty(\mathbb{R}^n)$, and consider the family
\[ g_\epsilon(x) = g(\epsilon x) \]
for some $g \in C_0^\infty(\mathbb{R}^n)$ and $\epsilon > 0$. Then
\[ \mathcal{F}\{g_\epsilon\} = \frac{1}{\epsilon^n}\,\hat{g}\left(\frac{\omega}{\epsilon}\right). \tag{2.9.2} \]
It follows that
\[ \langle f, \mathcal{F}\{g_\epsilon\}\rangle = \int f\,\mathcal{F}\{g_\epsilon\}. \]
Using (2.9.2) and an appropriate substitution, we get
\[ \langle f, \mathcal{F}\{g_\epsilon\}\rangle = \int f(\epsilon y)\,\hat{g}(y)\, dy. \]
Then we either use the Dominated Convergence Theorem (verify), or the fact that $f(\epsilon y) \to f(0)$ uniformly as $\epsilon \to 0$ (justify), to pass the limit inside the integral, and this gives
\[ \langle f, \mathcal{F}\{g_\epsilon\}\rangle \to f(0)\int\hat{g}(y)\, dy. \tag{2.9.3} \]
On the other hand,
\[ \langle f, \mathcal{F}\{g_\epsilon\}\rangle = \langle \mathcal{F}\{f\}, g_\epsilon\rangle = \int\hat{f}(y)\, g_\epsilon(y)\, dy = \int\hat{f}(y)\, g(\epsilon y)\, dy. \]
Passing to the limit $\epsilon \to 0$,
\[ \int\hat{f}(y)\, g(\epsilon y)\, dy \longrightarrow g(0)\int\hat{f}(y)\, dy. \]
Hence,
\[ f(0)\int\hat{g}(y)\, dy = g(0)\int\hat{f}(y)\, dy. \tag{2.9.4} \]

This holds for all possible $g$. So let $g(x)$ be the Gaussian function discussed in Example 2.9.1. Then $g(0) = 1$, and
\[ \mathcal{F}\{g\} = \hat{g} = (2\pi)^{n/2} e^{-\frac{1}{2}|\omega|^2}. \]
Integrating $\hat{g}$ over $\mathbb{R}^n$ gives
\begin{align*}
\int_{\mathbb{R}^n}\hat{g}(y)\, dy &= (2\pi)^{n/2}\int_{\mathbb{R}^n} e^{-\frac{1}{2}|y|^2}\, dy \\
&= (2\pi)^{n/2}\prod_{k=1}^{n}\int_{-\infty}^{\infty} e^{-\frac{1}{2}y_k^2}\, dy_k \\
&= (2\pi)^{n/2}\left(\sqrt{2\pi}\right)^n = (2\pi)^n.
\end{align*}
Hence, from (2.9.4) we obtain
\[ f(0) = \frac{1}{(2\pi)^n}\int_{\mathbb{R}^n}\hat{f}(y)\, dy = \frac{1}{(2\pi)^n}\int_{\mathbb{R}^n}\hat{f}(y)\, e^{-iy\cdot x} e^{iy\cdot x}\, dy. \tag{2.9.5} \]

Then, for any $x$, $f(x)$ can be obtained by applying the shifting property
\[ f(x - x_0) \longleftrightarrow e^{-i\omega x_0}\hat{f}(\omega) \]
to (2.9.5), and we obtain
\[ f(x) = \frac{1}{(2\pi)^n}\int_{\mathbb{R}^n}\hat{f}(y)\, e^{iy\cdot x}\, dy. \]
This suggests (in fact establishes) the inversion formula for the Fourier transform. The inverse Fourier transform, denoted by $\check{f}(x)$, is defined as
\[ \mathcal{F}^{-1}\{\hat{f}(\omega)\} = \check{f}(x) = \frac{1}{(2\pi)^n}\int_{\mathbb{R}^n}\hat{f}(\omega)\, e^{i(\omega\cdot x)}\, d\omega. \tag{2.9.6} \]
We have
\[ \mathcal{F}\mathcal{F}^{-1}(f) = \mathcal{F}^{-1}\mathcal{F}(f) = f. \]

Hence, the inverse Fourier transform of a distribution $T$ can be defined, similarly to Definition 2.8.1, as
\[ \langle \check{T}, \varphi\rangle = \langle T, \check{\varphi}\rangle, \quad \forall\varphi \in S. \]
The analog of (2.9.1) can be obtained as follows: for every $\varphi \in S$, we have
\[ \check{\hat{T}}(\varphi) = \hat{T}(\check{\varphi}) = T(\widehat{\check{\varphi}}) = \check{T}(\hat{\varphi}) = \hat{\check{T}}(\varphi). \]
Hence,
\[ \check{\hat{T}} = \hat{\check{T}}. \]

2.9.2 Fourier Transform of Delta Distribution

Example 2.9.2 To find $\mathcal{F}\{\delta\}$, we have
\[ \langle \mathcal{F}\{\delta\}, \varphi\rangle = \langle \delta, \mathcal{F}\{\varphi\}\rangle. \]
But this is equal to
\[ \mathcal{F}\{\varphi\}(0) = \int_{-\infty}^{\infty} e^{0\cdot ix}\varphi(x)\, dx = \int_{-\infty}^{\infty} 1\cdot\varphi(x)\, dx = \langle 1, \varphi\rangle. \]
Hence $\mathcal{F}\{\delta\} = 1$.

The result of the example seems plausible. Let us see why. Let
\[ \phi_n(x) = \frac{1}{\pi x}\sin(nx). \]
It is well known that
\[ \int_{-\infty}^{\infty}\frac{\sin x}{x}\, dx = \pi. \]
Using the substitution $u = nx$, and a continuous $f$,
\begin{align*}
\lim_{n\to\infty}\int_{-\infty}^{\infty}\frac{\sin nx}{\pi x}\, f(x)\, dx &= \lim_{n\to\infty}\int_{-\infty}^{\infty}\frac{\sin u}{\pi u}\, f\left(\frac{u}{n}\right)du \\
&= \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin u}{u}\lim_{n\to\infty} f\left(\frac{u}{n}\right)du \\
&= \frac{1}{\pi} f(0)\int_{-\infty}^{\infty}\frac{\sin u}{u}\, du = f(0).
\end{align*}
Therefore, $\phi_n(x)$ is a delta sequence. Now let us find the inverse Fourier transform of 1:
\begin{align*}
\mathcal{F}^{-1}\{1\} &= \frac{1}{2\pi}\int_{-\infty}^{\infty} 1\cdot e^{i\omega x}\, d\omega \\
&= \frac{1}{2\pi}\lim_{L\to\infty}\int_{-L}^{L} 1\cdot e^{i\omega x}\, d\omega \\
&= \frac{1}{2\pi}\lim_{L\to\infty}\frac{e^{iLx} - e^{-iLx}}{ix} \\
&= \lim_{L\to\infty}\frac{\sin Lx}{\pi x} = \delta(x).
\end{align*}
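The delta-sequence behavior of $\sin(nx)/(\pi x)$ can be observed numerically. The sketch below pairs the kernel with $f(x) = e^{-x^2}$ (an illustrative choice of smooth, decaying $f$, so $f(0) = 1$); the grid parameters are ad hoc.

```python
import numpy as np

f = lambda x: np.exp(-x**2)

def pair(n, dx=1e-4):
    x = np.arange(-8, 8, dx)
    # (n/pi) * sinc(n x / pi) = sin(nx)/(pi x), with the correct finite value at x = 0
    kernel = (n / np.pi) * np.sinc(n * x / np.pi)
    return np.sum(kernel * f(x)) * dx

vals = [pair(10), pair(50)]   # both should be close to f(0) = 1
```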

Example 2.9.3 Consider $\delta_c$. This is the shift $\delta(x - c)$ of $\delta$. We have
\[ \langle \mathcal{F}\{\delta_c\}, \varphi\rangle = \langle \delta_c, \mathcal{F}\{\varphi\}\rangle = \mathcal{F}\{\varphi\}(c) = \int_{\mathbb{R}} e^{-icx}\varphi(x)\, dx = \langle e^{-icx}, \varphi\rangle. \]
Hence
\[ \mathcal{F}\{\delta_c\}(\omega) = e^{-ic\omega}. \]
Similarly, one can show that
\[ \mathcal{F}\{\delta_{-c}\} = e^{ic\omega}. \]
This implies that
\[ \mathcal{F}\{e^{icx}\} = 2\pi\delta(\omega - c). \]
Moreover,
\[ \mathcal{F}\left\{\frac{\delta_c + \delta_{-c}}{2}\right\} = \cos c\omega, \]
and
\[ \mathcal{F}\{\cos cx\} = \pi(\delta(\omega - c) + \delta(\omega + c)). \]

2.9.3 Fourier Transform of Sign Function

Example 2.9.4 Since $(\mathrm{sgn})' = 2\delta$, $\mathcal{F}\{\mathrm{sgn}'\} = 2$. According to Proposition 2.8.2,
\[ \mathcal{F}\{\mathrm{sgn}'\} = i\omega\,\mathcal{F}\{\mathrm{sgn}\}. \]
So
\[ \mathcal{F}\{\mathrm{sgn}\} = \frac{2}{i\omega}. \]
The unit step function can be written as
\[ H(x) = \frac{1}{2}(1 + \mathrm{sgn}(x)), \]
so we obtain
\[ \mathcal{F}\{H\} = \frac{1}{2}\left(2\pi\delta + \frac{2}{i\omega}\right) = \pi\delta + \frac{1}{i\omega}. \]

2.10 Convolution of Distribution

The convolution of two functions is a special type of product that satisfies elementary algebraic properties, such as commutativity, associativity, and distributivity. To define a convolution for distributions, we restrict ourselves to the case of a distribution and a test function.

2.10.1 Derivatives of Convolutions

First, we need the following result, which discusses the derivative of convolutions.
Lemma 2.10.1 $(\varphi * \psi)^{(k)} = \varphi^{(k)} * \psi = \varphi * \psi^{(k)}$ for $k = 0, 1, 2, \ldots$
Proof We differentiate $\varphi * \psi$ to obtain
\begin{align*}
\frac{d}{dx}(\varphi * \psi) &= \int_{\mathbb{R}}\varphi'(x - y)\psi(y)\, dy \\
&= (-1)\int_{\mathbb{R}}\frac{d}{dy}\varphi(x - y)\,\psi(y)\, dy \\
&= \int_{\mathbb{R}}\varphi(x - y)\,\psi'(y)\, dy \\
&= (\varphi * \psi')(x).
\end{align*}
Continuing the process $k$ times gives the result. $\square$
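The lemma can be illustrated on a grid: differentiating a discrete convolution agrees with convolving the differentiated factor. The Gaussians and step sizes below are ad hoc choices for a numerical sketch, not from the text.

```python
import numpy as np

dx = 0.01
x = np.arange(-10, 10, dx)
phi = np.exp(-x**2)
psi = np.exp(-2 * (x - 1)**2)

conv = np.convolve(phi, psi, mode="same") * dx        # phi * psi on the grid
d_conv = np.gradient(conv, dx)                        # (phi * psi)'
dphi = np.gradient(phi, dx)
conv_dphi = np.convolve(dphi, psi, mode="same") * dx  # phi' * psi

err = np.max(np.abs(d_conv - conv_dphi))
```

Both sides use the same discrete convolution alignment, so the difference is essentially roundoff plus boundary effects where the functions have already decayed.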

2.10.2 Convolution in Schwartz Space

The previous result indicates the smoothness property of convolution. If $\psi \in S(\mathbb{R}^n)$, then the convolution of $\psi$ with another function $\varphi$ is smooth and can be differentiated infinitely many times. As a consequence, we have
Theorem 2.10.2 If $\varphi, \psi \in S(\mathbb{R}^n)$, then $\varphi * \psi \in S(\mathbb{R}^n)$.
Proof The previous lemma implies that
\[ \varphi * \psi \in C^\infty(\mathbb{R}^n). \]
To prove $\varphi * \psi$ is rapidly decreasing, we use Definition 2.6.1. Notice that
\[ |x|^k|\varphi(x-y)|\left|\psi^{(m)}(y)\right| \leq 2^k\left(|x-y|^k|\varphi(x-y)|\left|\psi^{(m)}(y)\right| + |y|^k|\varphi(x-y)|\left|\psi^{(m)}(y)\right|\right). \]
Then,
\[ |x|^k\left|(\varphi * \psi)^{(m)}\right| \leq 2^k\int_{\mathbb{R}}\left(|x-y|^k|\varphi(x-y)|\left|\psi^{(m)}(y)\right| + |y|^k|\varphi(x-y)|\left|\psi^{(m)}(y)\right|\right)dy. \tag{2.10.1} \]
Since $\varphi, \psi \in S(\mathbb{R})$, the integral on the RHS of (2.10.1) exists and is finite (why?). Hence
\[ |x|^k\left|(\varphi * \psi)^{(m)}\right| < \infty. \]
Taking the supremum over all $x \in \mathbb{R}$, the result follows. For $S(\mathbb{R}^n)$, the proof is the same as above, with $k$ replaced by $|\alpha|$ and $m$ by $\beta$ for some $\alpha, \beta \in \mathbb{N}_0^n$, and the integral taken over $\mathbb{R}^n$. $\square$

2.10.3 Definition of Convolution of Distributions

Now, let $T \in S'(\mathbb{R}^n)$ and $\psi \in S(\mathbb{R}^n)$. The convolution of $T$ with $\psi$ is given by
\begin{align*}
\langle \psi * T, \varphi\rangle &= \int_{\mathbb{R}}\left(\int_{\mathbb{R}}\psi(x - y)T(y)\, dy\right)\varphi(x)\, dx \\
&= \int_{\mathbb{R}} T(y)\left(\int_{\mathbb{R}}\psi(x - y)\varphi(x)\, dx\right)dy.
\end{align*}
Using Fubini's theorem, this gives
\[ \langle \psi * T, \varphi\rangle = \langle T, \psi^- * \varphi\rangle, \]
where
\[ \psi^-(x) = \psi(-x). \]
This definition won't make sense unless
\[ \psi^- * \varphi \in S. \]
Now we are ready to state the following definition.
Definition 2.10.3 (Convolution of Distribution) Let $T \in S'(\mathbb{R}^n)$ and $\psi \in S(\mathbb{R}^n)$. Then the convolution of $T$ and $\psi$ is given by
\[ \langle \psi * T, \varphi\rangle = \langle T, \psi^- * \varphi\rangle. \]

2.10.4 Fundamental Property of Convolutions

We apply the definition to establish the following fundamental property of convolutions.
Theorem 2.10.4 If $f^{(n)}$ exists, then for all $n = 0, 1, 2, \ldots$, we have
\[ f * \delta^{(n)} = f^{(n)}. \]

Proof Let $n = 0$. Applying Definition 2.10.3, we get
\begin{align*}
\langle f * \delta, \varphi\rangle &= \langle \delta, f^- * \varphi\rangle \\
&= (f^- * \varphi)(0) \\
&= \int_{\mathbb{R}} f(y - 0)\varphi(y)\, dy \\
&= \langle f, \varphi\rangle.
\end{align*}
Hence, we obtain
\[ f * \delta = f. \]
For $n = 1$, we have
\begin{align*}
\langle f * \delta', \varphi\rangle &= \langle \delta', f^- * \varphi\rangle \\
&= -\left\langle \delta, \frac{d}{dx}(f^- * \varphi)\right\rangle \\
&= -(f^- * \varphi)'(0) \\
&= (-1)(-1)\int_{\mathbb{R}} f'(y - x)\varphi(y)\, dy\,\Big|_{x=0} \\
&= \langle f', \varphi\rangle.
\end{align*}
Using induction, one can easily prove that the result is valid for all $n$. $\square$

The result shows that the delta distribution plays the role of the identity of the
convolution process over all distributions. The advantage of this property is that
the delta function and its derivatives can be used in computing the derivatives of
functions.

2.10.5 Fourier Transform of Convolution

Now, we state the Fourier transform of convolution.


Theorem 2.10.5 Let $T \in S'(\mathbb{R})$ and $\psi \in S(\mathbb{R})$. Then
\[ \mathcal{F}\{\psi * T\} = \mathcal{F}\{\psi\}\cdot\mathcal{F}\{T\}. \]
Proof We have
\begin{align*}
\langle \mathcal{F}\{\psi * T\}, \varphi\rangle &= \langle \psi * T, \mathcal{F}\{\varphi\}\rangle \\
&= \langle T, \psi^- * \mathcal{F}\{\varphi\}\rangle,
\end{align*}
which, using the fact that
\[ \mathcal{F}\{\mathcal{F}\{\psi\}\} = \psi^- \]
(verify), can be written as
\begin{align*}
\langle T, \mathcal{F}\{\mathcal{F}\{\psi\}\} * \mathcal{F}\{\varphi\}\rangle = \langle T, \mathcal{F}\{\mathcal{F}\{\psi\}\cdot\varphi\}\rangle &= \langle \mathcal{F}\{T\}, \mathcal{F}\{\psi\}\cdot\varphi\rangle \\
&= \langle \mathcal{F}\{\psi\}\cdot\mathcal{F}\{T\}, \varphi\rangle.
\end{align*}
Therefore, we obtain
\[ \mathcal{F}\{\psi * T\} = \mathcal{F}\{\psi\}\cdot\mathcal{F}\{T\}. \quad\square \]
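The theorem has an exact discrete analogue: for the DFT, the transform of a circular convolution is the pointwise product of the transforms. The sketch below uses NumPy's FFT (which computes the DFT, not the continuous transform); the random vectors are arbitrary test data.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
a = rng.standard_normal(N)
b = rng.standard_normal(N)

# circular convolution computed directly from its definition
conv = np.zeros(N)
for n in range(N):
    conv[n] = np.sum(a * b[(n - np.arange(N)) % N])

# DFT(a circ b) should equal DFT(a) * DFT(b), up to roundoff
err = np.max(np.abs(np.fft.fft(conv) - np.fft.fft(a) * np.fft.fft(b)))
```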

2.11 Problems

(1) Show that Cc∞ (R) is an infinite-dimensional linear space.


(2) Give an example of a function that is not locally integrable on R2 .
(3) Give an example of a locally integrable function $f$ that defines a regular distribution, but $f^2$ does not.
(4) Give an example of a function $f \in L^1_{loc}(I)$ but $f \notin L^1(I)$ for some interval $I \subset \mathbb{R}$.
(5) Consider the following function:
\[ f(t) = \begin{cases} e^{-1/t} & t > 0, \\ 0 & t \leq 0. \end{cases} \]

(a) Show that $f \in C^\infty(\mathbb{R})$.
(b) Define
\[ g(x) = f(1 - |x|^2). \]
Show that $\mathrm{supp}(g) = \overline{B_1(0)}$.


(6) Determine when f (x) = x α defines a regular distribution.
(7) Prove Proposition 2.2.3.
(8) Let $\varphi \in D(\mathbb{R})$.
(a) Show that $\varphi(at) \in D(\mathbb{R})$ for all $a \in \mathbb{R}\setminus\{0\}$.
(b) If $g \in C^\infty(\mathbb{R})$, show that $g(t)\varphi(t) \in D(\mathbb{R})$.
(9) Let $f \in D'(\mathbb{R})$. Show that
\[ \langle f(t - \tau), \varphi(t)\rangle = \langle f(t), \varphi(t + \tau)\rangle. \]
(10) Show that if $T \in D'(\Omega)$ for $\Omega \subseteq \mathbb{R}$ and $f \in C_0^\infty(\mathbb{R})$, then
\[ D(fT) = f(DT) + T f'. \]

(11) Show that for $s < r$ we have
\[ L^r_{loc} \subset L^s_{loc}. \]
(12) Show that the following sequences are delta sequences as $n \to \infty$:
\[ \text{(1)}\ \varphi_n(x) = \frac{n}{\pi(1 + n^2x^2)}, \qquad \text{(2)}\ \varphi_n(x) = \frac{\sin^2 nx}{n\pi x^2}. \]

(13) Evaluate the integral
\[ \int_{-\infty}^{\infty}\delta(x - y - u)\,\delta(u - t)\, du. \]
(14) Show that
\[ \int_{-\infty}^{\infty}\delta(x - a)\,\delta(x - b)\, dx = \delta(a - b). \]

(15) Find the first distributional derivative of each of the following functions.
(1) $f(x) = |x|$. \quad (5) $f(x) = \dfrac{1}{\sqrt{x_+}}$.
(2) $f(x) = \dfrac{x}{|x|}$. \quad (6) $f(x) = \cos x$ for $x$ irrational.
(3) $f(x) = |x|^2$. \quad (7) $f(x) = \dfrac{1}{2}\ln|x|\,\mathrm{sgn}(x)$.
(4) $f(x) = \dfrac{1}{\sqrt{x}}$. \quad (8) $f(x) = H(x)\sin x$.
(16) Determine whether each of the following functions is a Schwartz function or not.
(1) $f(x) = e^{-a|x|}$ for all $a > 0$. \quad (5) $f(x) = e^{-\sqrt{x^2+1}}$.
(2) $f(x) = e^{-a|x|^2}$ for all $a > 0$. \quad (6) $f(x) = e^{-x^2}\cos e^{x^2}$.
(3) $f(x) = x e^{-x^2}$. \quad (7) $f(x) = e^{-x}\cos e^{x}$.
(4) $f(x) = e^{-|x|}$. \quad (8) $f(x) = e^{-x^2}\cos e^{x}$.
(17) Let $T_f$ be a regular distribution. Show that $\mathcal{F}\{T_f\}(\varphi) = T_f(\mathcal{F}(\varphi))$.



(18) Prove the scaling property:
\[ \mathcal{F}\{\varphi(ax)\} = \frac{1}{|a|}\,\hat{\varphi}\!\left(\frac{\omega}{a}\right). \]
(19) Show the following shift properties:
(a) $\mathcal{F}\{f(x - a)\} = e^{-ia\omega}\mathcal{F}\{f\}$.
(b) $\mathcal{F}\{e^{ix\cdot a} f(x)\} = \hat{f}(\omega - a)$.
(20) Prove the duality property of the operator $\mathcal{F}$: if $\mathcal{F}\{f(x)\} = F(\omega)$, then
\[ \mathcal{F}\{F(x)\} = 2\pi f(-\omega). \]
(21) Let $f \in S(\mathbb{R}^n)$. Show that
(a) $\mathcal{F}\{f(x + y)\} = e^{iy\omega}\mathcal{F}\{f\}$.
(b) $\mathcal{F}\{f(\lambda x)\} = \dfrac{1}{|\lambda|^n}\,\hat{f}\!\left(\dfrac{\omega}{\lambda}\right)$, $\lambda \in \mathbb{R}$.
(22) Show that definitions from (2.6.2) to (2.6.4) are all equivalent.
(23) Show that if f n converges in D(R), then f n converges in S(R).
(24) Let $\varphi \in S(\mathbb{R})$ and let $P(x)$ be a polynomial on $\mathbb{R}$. Prove that
\[ (P(x))^c\,\varphi(x) \in S(\mathbb{R}) \]
for any $c \geq 0$.
(25) Show that if ϕ, ψ are rapidly decreasing functions, then ϕ · ψ ∈ S(R).
Conclude that
ϕ ∗ ψ ∈ S(R).

(26) Prove that if $\mathcal{F}\{f\}$ is a rapidly decreasing function, then so is $f$.


(27) Show that the convergence in S(Rn ) is uniform.
(28) Show that
\[ f(x) = e^{-|x|^m} \in S(\mathbb{R}) \]
if and only if $m = 2n$ for $n \in \mathbb{N}$.


(29) Show that for $f \in S(\mathbb{R}^n)$, the integral defining $\mathcal{F}\{f\}$ is absolutely convergent.
(30) Show that |·|α,β is a seminorm but not a norm.
(31) Show that if f ∈ S(Rn ) then

| f | ≤ cm (1 + |x|)−m

for all m ∈ N, but the converse is not true.


(32) Show that if f (x) ∈ S(Rn ) then

f (−x), f (ax) ∈ S(Rn ).

(33) Determine whether each of the following functions defines a tempered distribution.
(1) $f(x) = e^{x^4}$.
(2) $f(x) = e^{x}$.
(3) $f(x) = x^3$.
(34) Show that if $f(x) \in S'(\mathbb{R})$, then
\[ f(-x),\ f(at) \in S'(\mathbb{R}) \]
for $a \in \mathbb{R}$.
(35) Show that a product of a function of slow growth with a Schwartz function is
again a Schwartz function.
(36) If $f_n \to f$ in $S'$, show that $f_n' \to f'$ in $S'$.
(37) Show that if $T$ is a tempered distribution, then so is $T'$.
(38) Determine whether $e^x\cos(e^x)$ belongs to $S'(\mathbb{R})$.
(39) Show that
\[ \langle (T_f)^-, \varphi\rangle = \langle T_f, \varphi^-\rangle. \]
(40) Show that
\[ D \subset S \subset L^2 \subset S' \subset D'. \]

(41) Use the duality property ($\mathcal{F}\{F(x)\} = 2\pi f(-\omega)$) to find the Fourier transform of $f(x) = \dfrac{1}{x}$.
(42) Show that
\[ \delta(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\omega x}\, d\omega. \]

(43) Find the Fourier transform of
(1) $e^{-ax^2 + bx + c}$. \quad (5) $|x|^\alpha\,\mathrm{sgn}(x)$.
(2) $e^{-b|x|^2}$. \quad (6) $|x|^\alpha\ln|x|$.
(3) $\dfrac{1}{x + ia}$. \quad (7) $\mathrm{p.v.}\,\dfrac{1}{x}$.
(4) $\dfrac{1}{(x + ia)^2}$. \quad (8) $\mathrm{p.v.}\,\dfrac{1}{x^2}$.
(44) Find F{(−x)α } and F{D α δ}.
(45) Show that if $f \in S(\mathbb{R}^n)$, then
\[ \mathcal{F}\{f\} = (2\pi)^n\mathcal{F}^{-1}\{f\}. \]

(46) Let F : S  (Rn ) −→ S  (Rn ). Show that F is continuous and F −1 exists.


(47) Show that
\[ \omega^\alpha D^\beta(\mathcal{F}\{f\}) = (-i)^{|\beta|+|\alpha|}\,\mathcal{F}\{D^\alpha(x^\beta f)\}. \]
(48) Let $T$ be the distribution defined by the function $\mathcal{F}\{f\}$, for some $f \in S'(\mathbb{R}^n)$. Show that $T_{\mathcal{F}\{f\}} = \mathcal{F}\{T_f\}$.
(49) Fourier Transform of Polynomials: Show that the following is true.
(a) $\mathcal{F}\{e^{-icx}\} = 2\pi\delta(\omega + c)$ for $x \in \mathbb{R}$, and $\mathcal{F}\{e^{-ic\cdot x}\} = (2\pi)^n\delta(\omega + c)$ for $x \in \mathbb{R}^n$.
(b) $\mathcal{F}\{i^k x^k e^{-icx}\} = (-1)^k(2\pi)D^k(\delta(\omega + c))$ for $x \in \mathbb{R}$, and
(c) $\mathcal{F}\{i^{|\alpha|} x^\alpha e^{-ic\cdot x}\} = (-1)^{|\alpha|}(2\pi)^n D^\alpha(\delta(\omega + c))$ for $x \in \mathbb{R}^n$.
(d) $\mathcal{F}\{x^k\} = 2\pi i^k D^k\delta(\omega)$.
(e) $\mathcal{F}\{P(x)\} = 2\pi\sum_{j=0}^{n} a_j(i)^j D^j(\delta(\omega))$ for the polynomial $P(x) = \sum_{j=0}^{n} a_j x^j$.
(50) Show that the following inclusions are proper:
(a) $C_c^\infty(\mathbb{R}^n) \subset S(\mathbb{R}^n)$.
(b) $S'(\mathbb{R}) \subset D'(\mathbb{R})$.
(51) If T ∈ S  and ϕ, ψ ∈ S, show that

(T ∗ ϕ) ∗ ψ = T ∗ (ϕ ∗ ψ).

(52) Show that F{ f g} = fˆ ∗ ĝ and

F −1 {F{ f } ∗ F{g}} = f g.

(53) Let f, g ∈ C0∞ (R).


(a) Show that:
f ∗ g ∈ C0∞ (R).

(b) Show that:


supp( f ∗ g) ⊂ supp( f ) + supp(g).

(54) Let T ∈ S  (R) and ϕ ∈ S(R). Show that T ∗ ϕ ∈ S  (R).


(55) Show that
\[ \mathcal{F}\{\varphi\cdot\psi\} = \frac{1}{(2\pi)^n}\,\mathcal{F}\{\varphi\} * \mathcal{F}\{\psi\} \]
for $x \in \mathbb{R}^n$.
(56) Let $f \in L^1_{loc}(\mathbb{R})$.
(a) Show that if $g \in C_c(\mathbb{R})$ then $f * g \in C(\mathbb{R})$.
(b) Show that if $g \in C_c^1(\mathbb{R})$ then $f * g \in C^1(\mathbb{R})$.
(c) Show that if $g \in C_c^\infty(\mathbb{R})$ then $f * g \in C^\infty(\mathbb{R})$.
Chapter 3
Theory of Sobolev Spaces

3.1 Weak Derivative

3.1.1 Notion of Weak Derivative

Recall that Definition 1.4.1 of the distributional derivative was given in the form
\[ \langle T^{(k)}, \varphi\rangle = (-1)^k\langle T, \varphi^{(k)}\rangle. \]
Under this type of derivative, distributions have derivatives of all orders. Another generalization of differentiation is proposed for locally integrable functions that are not necessarily differentiable in the usual sense. This type of derivative has two advantages: it provides derivatives for nondifferentiable functions, and it generalizes the notion of partial derivative. Recall the multi-index

\[ \alpha = (\alpha_1, \ldots, \alpha_n) \]
defined in Sect. 2.6 as the $n$-tuple of nonnegative integers $\alpha_i \geq 0$, where
\[ \alpha_i \in \mathbb{N}_0 = \mathbb{N}\cup\{0\}, \]
and we denote
\[ |\alpha| = \alpha_1 + \cdots + \alpha_n. \]
Then, for $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$, we have
\[ x^\alpha = x_1^{\alpha_1}\cdots x_n^{\alpha_n}. \]

For differentiation, we have

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 133
A. Khanfer, Applied Functional Analysis,
https://doi.org/10.1007/978-981-99-3788-2_3

\[ \partial_x^\alpha = \partial_1^{\alpha_1}\cdots\partial_n^{\alpha_n}, \qquad \partial^\alpha u = \frac{\partial^{|\alpha|} u}{\partial x_1^{\alpha_1}\cdots\partial x_n^{\alpha_n}}. \]
For example, letting $\alpha = (2, 1, 0)$ and $u = u(x, y, z)$, we get
\[ D^\alpha u = \partial^\alpha u = \frac{\partial^3 u}{\partial x^2\,\partial y}. \]

In general, for a function $u \in L^p(\mathbb{R}^n)$, we can write
\[ Du(x) = \begin{pmatrix} D_1 u(x) \\ D_2 u(x) \\ \vdots \\ D_n u(x) \end{pmatrix}, \]
with norm
\[ \|Du\|_p^p = \|D_1 u\|_p^p + \|D_2 u\|_p^p + \cdots + \|D_n u\|_p^p. \]
Motivated by the distributional derivative, the differentiation takes the form
\[ \langle \partial^\alpha T, \varphi\rangle = (-1)^{|\alpha|}\langle T, \partial^\alpha\varphi\rangle. \]

We give the definition of a weak derivative.


Definition 3.1.1 (Weak Derivative) Let $u \in L^1_{Loc}(\Omega)$, $\Omega \subseteq \mathbb{R}^n$. If there exists a function $v \in L^1_{Loc}(\Omega)$ such that
\[ \int_\Omega u\, D^\alpha\varphi\, dx = (-1)^{|\alpha|}\int_\Omega v\varphi\, dx \]
for every $\varphi \in C_c^\infty(\Omega)$, then $D^\alpha u = v$, and we say that $u$ is weakly differentiable of order $\alpha$, and its $\alpha$th weak derivative is $v$. We also say that the $\alpha$th partial derivative of the distribution $T$ is given by
\[ D^\alpha T(\varphi) = (-1)^{|\alpha|} T(D^\alpha\varphi). \]

Remark Observe the following:
(1) It is easy to see from the definition that
\[ D^\alpha u(c_1\varphi_1 + c_2\varphi_2) = c_1 D^\alpha u(\varphi_1) + c_2 D^\alpha u(\varphi_2) \]
and
\[ \langle D^\alpha u, \varphi_m\rangle = (-1)^{|\alpha|}\langle u, D^\alpha\varphi_m\rangle \longrightarrow (-1)^{|\alpha|}\langle u, D^\alpha\varphi\rangle = \langle D^\alpha u, \varphi\rangle. \]



Consequently, D α u is a bounded linear functional.


(2) Classical and weak derivatives coincide if u is continuously differentiable on .
In this case, if |α| = k, then

dk
D α u = u (k) = u.
dxk

(3) If T is a regular distribution (i.e., T = Tu for some locally integrable u), then
the distributional derivative and the weak derivative coincide, that is, if |α| = k,
then D α u = T (k) .
(4) If u ∈ L 1Loc (), but there is no v ∈ L 1Loc (Rn ) such that

u D α ϕd x = (−1)|α| vϕd x
 

for every ϕ ∈ Cc∞ (), then we say that u has no weak αth partial derivative.

A distribution may have a distributional derivative without having a weak derivative. For example, consider the Heaviside function $H(x)$. Then $H' = \delta$ is the distributional derivative, but it is not locally integrable; hence $H$ has no weak derivative. On the other hand, let $f(x) = |x|$; then
\[ f'(x) = \mathrm{sgn}(x) \in L^1_{Loc}(\mathbb{R}), \]
so it is both the weak and the distributional derivative of $f$.
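The defining identity of the weak derivative can be tested numerically for $u = |x|$, $v = \mathrm{sgn}$, against one concrete test function: a bump shifted so that the kink of $|x|$ at $0$ lies inside its support. The shift $c = 0.3$ and the step size are arbitrary choices for this sketch.

```python
import numpy as np

dx = 1e-4
c = 0.3
x = np.arange(c - 1 + dx, c + 1 - dx, dx)
u_ = x - c
phi = np.exp(-1 / (1 - u_**2))             # test function supported on (c-1, c+1)
dphi = phi * (-2 * u_) / (1 - u_**2)**2    # its classical derivative

lhs = np.sum(np.abs(x) * dphi) * dx        # \int |x| phi'(x) dx
rhs = -np.sum(np.sign(x) * phi) * dx       # -\int sgn(x) phi(x) dx
err = abs(lhs - rhs)
```

Both quadratures agree to within the discretization error, and neither side is trivially zero thanks to the asymmetric test function.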

3.1.2 Basic Properties of Weak Derivatives

The next theorem discusses some basic properties of the weak derivative.
Theorem 3.1.2 Let $u, D^\alpha u \in L^1_{Loc}(\Omega)$. Then
(1) $D^\alpha u$ is unique.
(2) If $D^\beta u, D^{\alpha+\beta}u \in L^1_{Loc}(\Omega)$ exist, then
\[ D^\beta(D^\alpha u) = D^\alpha(D^\beta u) = D^{\alpha+\beta}u. \]
(3) If $u_n \in L^1_{Loc}(\Omega)$ with $u_n \to u$ and $D^\alpha u_n \to v$ in $L^1_{Loc}(\Omega)$, then $D^\alpha u = v$.

Proof For (1), let $v_1$ and $v_2$ be two weak derivatives of $u$; then
\[ \int_\Omega v_1\varphi\, dx = -\int_\Omega u\varphi'\, dx \]
and
\[ \int_\Omega v_2\varphi\, dx = -\int_\Omega u\varphi'\, dx, \]
hence we have
\[ \int_\Omega (v_2 - v_1)\varphi\, dx = 0. \]
In Sect. 3.2, we will see that this implies that $v_2 = v_1$ a.e.


For (2), note here that D α (D β ϕ) = D α+β ϕ since ϕ ∈ Cc∞ (). Now, for every
ϕ ∈ Cc∞ () we have

D β (D α u)ϕd x = (−1)|β| D α u D β ϕd x
 

= (−1)|β|+|α| u D α D β ϕd x


= (−1)|β|+|α| u D α+β ϕd x


= (−1)|β|+|α|+|α+β| (D α+β u)ϕd x



α+β
= (D u)ϕd x.


This yields
D β (D α u) = D α+β u,

and exchanging α and β gives

D α (D β u) = D α+β u.

For (3), note that the convergence in L 1Loc implies convergence in the sense of dis-
tributions. which allows us (verify) to write

vϕ = lim(D α u n )ϕ = (−1)|α| (lim u n )D α ϕ = (−1)|α| u Dαϕ


   

and the result follows. 

3.1.3 Pointwise Versus Weak Derivatives

A natural question that arises is whether there is a connection between the pointwise derivative and the weak derivative. The answer can be illustrated through the example of the Heaviside function
\[ H(x) = \begin{cases} 1 & x \geq 0, \\ 0 & x < 0. \end{cases} \tag{3.1.1} \]

It was demonstrated in Example 2.4.4 that the distributional derivative of $H$ is $\delta$, which is a singular distribution, and $\delta \notin L^1_{loc}(-1, 1)$. This shows that $H$ is not weakly differentiable. But it can be immediately seen that the usual derivative $H'(x) = 0$ for all $x \neq 0$.
We have seen that the weak derivative is identified through integration over the
set rather than being calculated pointwise at every point in the set. The previous
example shows that discontinuous or piecewise continuous functions can be pointwise differentiable a.e. without being weakly differentiable. In fact, there are examples of continuous functions that are pointwise differentiable almost everywhere but not weakly differentiable (the Cantor function). However, if the function is continuously differentiable and its classical derivative exists for all $x$, then the weak derivative exists and the two coincide. The next proposition proves this assertion.

Proposition 3.1.3 Let $u \in C^k(\Omega)$, $\Omega \subseteq \mathbb{R}^n$. Then the weak derivatives $D^\alpha u$ with $|\alpha| \leq k$ exist and are equal to the classical derivatives up to order $k$.
Proof If $u \in C^k(\Omega)$, then
\[ D^\alpha u \in C(\Omega) \subseteq L^1_{Loc}(\Omega). \]
So $D^\alpha u$, as a weak derivative, exists. Then, for $\varphi \in C_c^\infty(\Omega)$ and $|\alpha| = 1$, we have
\[ \int_\Omega \nabla u\,\varphi\, dx = (-1)^{|\alpha|}\int_\Omega u\,\nabla\varphi\, dx = (-1)^{|\alpha|+|\alpha|}\int_\Omega D^\alpha u\,\varphi\, dx. \]
Hence
\[ \nabla u = D^\alpha u. \]
A similar argument applies for $k > 1$. $\square$


The above proposition tells that the weak derivative exists for continuously dif-
ferentiable functions, but the jump discontinuity in the step function (3.1.1) causes
serious problems to the existence of weak derivatives.
The next result gives a characterization to determine whether a function in L 1Loc
has its weak derivative also in L 1Loc .
Proposition 3.1.4 Let $u \in L^1_{Loc}(\Omega)$. If there exists $u_j \in C^\infty(\Omega)$ such that $u_j \to u$ and $D^\alpha u_j \to v_\alpha$ in $L^1_{Loc}(\Omega)$, then $D^\alpha u$ exists and
\[ D^\alpha u = v_\alpha \in L^1_{Loc}(\Omega). \]
Proof Let $\varphi \in C_c^\infty(\Omega)$. Then we have
\[ \left|\int_\Omega (u_j\varphi - u\varphi)\, dx\right| \leq \sup|\varphi|\int_\Omega |u_j - u|\, dx \to 0. \]
Therefore,
\begin{align*}
\int_\Omega u\, D^\alpha\varphi\, dx &= \lim\int_\Omega u_j\, D^\alpha\varphi\, dx \\
&= (-1)^{|\alpha|}\lim\int_\Omega D^\alpha u_j\,\varphi\, dx \\
&= (-1)^{|\alpha|}\int_\Omega v_\alpha\varphi\, dx,
\end{align*}
and the result follows. $\square$

The next result deals with the general case $u \in L^p(\Omega)$.
Proposition 3.1.5 Let $u \in L^p(\Omega)$ for open $\Omega \subseteq \mathbb{R}^n$ and $p \geq 1$. Consider a sequence $v_j \in L^p(\Omega)$ such that $D^\alpha v_j \in L^p(\Omega)$. If $v_j \to u$ in $L^p(\Omega)$ and $D^\alpha v_j \to w_\alpha$ in $L^p(\Omega)$, then $D^\alpha u$ exists and $D^\alpha u = w_\alpha$.
Proof The $L^p$ convergence implies (verify) that for all $\varphi \in C_c^\infty(\Omega)$,
\[ \lim\int_\Omega D^\alpha v_j\,\varphi\, dx = \int_\Omega w_\alpha\varphi\, dx. \]
On the other hand, the definition gives
\[ \lim\int_\Omega D^\alpha v_j\,\varphi\, dx = (-1)^{|\alpha|}\lim\int_\Omega v_j\, D^\alpha\varphi\, dx = (-1)^{|\alpha|}\int_\Omega u\, D^\alpha\varphi\, dx. \quad\square \]

3.1.4 Weak Derivatives and Fourier Transform

We conclude the section with the following transformation property between weak
derivatives and powers of the independent variables.
Proposition 3.1.6 Let $u \in L^1_{Loc}(\mathbb{R}^n)$, and $\mathcal{F}\{u\}(\omega) = \hat{u}(\omega)$. Then
(1) $\mathcal{F}\{D_x^\alpha u\} = (i)^{|\alpha|}\omega^\alpha\hat{u}(\omega)$.
(2) $D_\omega^\alpha(\hat{u}(\omega)) = (-i)^{|\alpha|}\mathcal{F}\{x^\alpha u(x)\}$.
Proof The proof is similar to that of Proposition 2.8.2. $\square$

This demonstrates that the Fourier transform behaves well with weak derivatives in
a similar manner to the usual (and distributional) derivatives, and it preserves the
correspondence between the smoothness of the function and the rate of decay of its
Fourier transform.

3.2 Regularization and Smoothening

3.2.1 The Concept of Mollification

In light of the discussion in the preceding sections, it turns out that Schwartz spaces
play a dominant role in the theory of distributions. We will further obtain interesting
results that will lead to essential consequences on distribution theory. The main tool
for this purpose is “mollifiers”.
A well-known result in measure theory is that any function f in L p can be approx-
imated by a continuous function g. If g is smooth, this will give an extra advantage
since g in this case can serve as a solution to some differential equation. Since the
space of smooth functions C ∞ is a subspace of C, we hope that any function f ∈ L p
can be approximated by a function g ∈ C ∞ , which will play the role of “mollifier”.
There are two remarks to consider about g:
(1) Since our goal is to approximate any $f \in L^p$ by a function $g \in C^\infty$, this function is characterized by $f$, so we can view it as $f_\epsilon$ such that $f_\epsilon \to f$ as $\epsilon \to 0$.
(2) The smoothening process of getting a smooth function out of a continuous (not
necessarily smooth) reminds us of convolutions in which the performed integra-
tion has the effect of smoothening the curve of the function and thus eliminating
sharp points on the graph, which justifies the name “mollifier”. As the word
implies, to mollify an edge means to smoothen it. In general, if f is continuous
and g is not, then f ∗ g is continuous. If f is differentiable but g is not, then
f ∗ g is differentiable, i.e., f ∗ g will take the better regularity conditions of the
two functions.
Mollifiers are functions that can be linked to other functions by convolution to smoothen the resulting function and give it more regularity. Therefore, we may write
\[ f_\epsilon = \phi_\epsilon * f \tag{3.2.1} \]
for some particular function
\[ \phi_\epsilon \in C^\infty(\mathbb{R}^n) \]
independent of the choice of $f$. To get an idea of $\phi_\epsilon$, note that if we take the Fourier transform of (3.2.1), we obtain
\[ \mathcal{F}\{f_\epsilon\} = \mathcal{F}\{\phi_\epsilon\}\mathcal{F}\{f\}. \tag{3.2.2} \]

On the other hand, the continuity of the Fourier transform and the fact that $f_\epsilon \to f$ imply
\[ \mathcal{F}\{f_\epsilon\} \to \mathcal{F}\{f\}. \tag{3.2.3} \]
Combining (3.2.2) and (3.2.3) gives
\[ \mathcal{F}\{\phi_\epsilon\} \to 1. \]
But
\[ 1 = \mathcal{F}\{\delta\}, \]
so as $\epsilon \to 0$ we obtain $\phi_\epsilon \to \delta$. This means that the family of functions $\phi_\epsilon$ plays the role of a delta sequence. So our task is to seek a function $\phi \in C_c^\infty(\mathbb{R}^n)$ such that the family $\phi_\epsilon$ has the following properties:
1. $0 \leq \phi_\epsilon \in C_c^\infty(\mathbb{R}^n)$,
2. $\int_{\mathbb{R}^n}\phi_\epsilon = 1$, and
3. $\phi_\epsilon \to \delta$.
As in (2.3.14), we suggest the form
 
1 x
φ = φ .
n 

For simplicity, let n = 1. Indeed, we have


 
1 x
φ , f  = φ f (x)d x.
 

Let x = y. Then,


φ , f  = φ(y) f (y)dy. (3.2.4)

Now, let ε → 0. We have

⟨φ_ε, f⟩ → ∫ φ(y) f(0) dy = f(0) ∫ φ(y) dy.

If we set ∫ φ(y) dy = 1, as expected, then

⟨φ_ε, f⟩ → f(0) = ⟨δ, f⟩.

Thus, we have

φ_ε → δ

as desired. To find a smooth function φ, the exponential function serves as a good example, but it needs to have compact support, i.e., to vanish outside a compact set. The following example works properly. Let ϕ : R^n → R be defined as:

ϕ(x) = c·exp(−1/(1 − ‖x‖²)) if ‖x‖ < 1, and ϕ(x) = 0 if ‖x‖ ≥ 1,    (3.2.5)

where c is a number chosen so that

∫_{R^n} ϕ = 1.

The function ϕ is smooth and has the ball B₁(0) as its support. For n = 1, we have

supp(ϕ) = [−1, 1].

If we define ϕ_ε as in (2.3.14), then ϕ_ε satisfies all the properties listed above (verify), and

supp(ϕ_ε) = B_ε(0),

i.e., for n = 1,

supp(ϕ_ε) = [−ε, ε].

We are ready to define the mollifier.

3.2.2 Mollifiers

Definition 3.2.1 (Mollifier) Let ϕ ∈ C_c^∞(R^n) with ϕ ≥ 0, supp(ϕ) = B₁(0), and ∫_{R^n} ϕ = 1. Then, the family of functions ϕ_ε given by

ϕ_ε(x) = (1/ε^n) ϕ(x/ε)

for all ε > 0 is called a mollifier.

Many functions can play the role of ϕ, but the function given in (3.2.5) is the standard one, so the family ϕ_ε is called the standard mollifier if ϕ is as given in (3.2.5). It is easy to check that ϕ_ε has the following standard properties:
(1) ϕ_ε(x) ≥ 0 for all x ∈ R^n.
(2) ϕ_ε ∈ C_c^∞(R^n), with supp(ϕ_ε) = B_ε(0).
(3) ∫_{R^n} ϕ_ε = 1.
(4) ϕ_ε(x) → δ(x) as ε → 0.
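Properties (1)–(3) can be checked numerically. The following Python sketch (an illustration added here, not part of the original text; grid sizes are arbitrary) builds the one-dimensional standard mollifier, computing the normalizing constant c by quadrature, and verifies that its integral equals 1 for several values of ε.

```python
import numpy as np

def bump(t):
    """Unnormalized bump exp(-1/(1-t^2)) on (-1, 1), zero elsewhere."""
    out = np.zeros_like(t)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

# Normalizing constant c, chosen so that the integral of c*bump equals 1.
grid = np.linspace(-1.0, 1.0, 400001)
dx = grid[1] - grid[0]
c = 1.0 / (bump(grid).sum() * dx)

def mollifier(x, eps):
    """Standard mollifier phi_eps(x) = (1/eps) * phi(x/eps) in one dimension."""
    return (c / eps) * bump(x / eps)

for eps in (1.0, 0.5, 0.1):
    xs = np.linspace(-eps, eps, 400001)
    integral = mollifier(xs, eps).sum() * (xs[1] - xs[0])
    print(f"eps = {eps}: integral of phi_eps = {integral:.6f}")
```

Since ϕ_ε is just a rescaling of ϕ, the printed integrals all agree up to discretization error, reflecting property (3).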
Mollifiers are defined as C^∞ approximations to the delta distribution. Now, we can mollify a function f ∈ L^p(R^n) by convolving it with any mollifier, say the standard ϕ_ε, giving rise to another family of functions

f_ε = f ∗ ϕ_ε.

The family f_ε is called the f-mollification. For ε > 0, define the following set:

Ω_ε = {x ∈ Ω : d(x, ∂Ω) > ε}.

Here, Ω_ε ⊂ Ω, and it can also be described as

Ω_ε = {x ∈ Ω : B_ε(x) ⊆ Ω},

so it is clear that Ω_ε → Ω as ε → 0.
Note that if f : Ω ⊆ R^n → R, then

f_ε(x) = ∫_{R^n} ϕ_ε(x − y) f(y) dy = ∫_{B_ε(x)} ϕ_ε(x − y) f(y) dy

for all x ∈ Ω_ε. This is the domain of f_ε, and so f_ε = f ∗ ϕ_ε in Ω_ε, and

supp(f_ε) ⊂ supp(f) + supp(ϕ_ε) = supp(f) + B_ε(0).

The following theorem discusses the properties of f_ε and the importance of this construction.

Theorem 3.2.2 Let f ∈ L^p, 1 ≤ p < ∞, and let f_ε = f ∗ ϕ_ε for some mollifier ϕ_ε, ε > 0. Then, we have the following:
(1) If f ∈ L^p(R^n), then f_ε ∈ C^∞(R^n) and D^α f_ε = ϕ_ε ∗ D^α f for all x ∈ R^n.
(2) If f ∈ L^p(Ω), then f_ε ∈ C^∞(Ω_ε) and D^α f_ε = ϕ_ε ∗ D^α f for all x ∈ Ω_ε.
(3) ‖f_ε‖_p ≤ ‖f‖_p.

Proof For (1), we have

f_ε = f ∗ ϕ_ε = ∫_{R^n} ϕ_ε(x − y) f(y) dy.

Observe that

D^α_x ϕ_ε(x − y) = (−1)^{|α|} D^α_y ϕ_ε(x − y).

Differentiating under the integral sign, we obtain for all x ∈ R^n

(D^α f_ε)(x) = ∫_{R^n} D^α_x ϕ_ε(x − y) f(y) dy
= (−1)^{|α|} ∫_{R^n} D^α_y ϕ_ε(x − y) f(y) dy
= (−1)^{|α|} (−1)^{|α|} ∫_{R^n} ϕ_ε(x − y) D^α_y f(y) dy
= ϕ_ε ∗ D^α_y f
= (D^α f)_ε(x).

That is,

(D^α f_ε)(x) = (D^α f)_ε(x)

for all x ∈ R^n, and therefore f_ε ∈ C^∞(R^n).


For (2), we first treat |α| = 1; the general case then follows by induction. Choose h small enough so that x + he_i ∈ Ω_ε. Then

(f_ε(x + he_i) − f_ε(x))/h = (1/ε^n) ∫_Ω (1/h) [ϕ((x + he_i − y)/ε) − ϕ((x − y)/ε)] f(y) dy
= (1/ε^n) ∫_K (1/h) [ϕ((x + he_i − y)/ε) − ϕ((x − y)/ε)] f(y) dy

for some compact K ⊂ Ω. This implies that as h → 0⁺,

(1/h) [ϕ((x + he_i − y)/ε) − ϕ((x − y)/ε)] → (1/ε) D_{x_i} ϕ((x − y)/ε)

uniformly on K. Hence, the derivative D_{x_i} f_ε exists, and a similar argument to the above can be made for D^α f_ε to obtain

(D^α f_ε)(x) = (D^α f)_ε(x)

for all x ∈ Ω_ε, and therefore f_ε ∈ C^∞(Ω_ε).


For (3), let f ∈ L^p(Ω), where Ω ⊆ R^n, and let q be the conjugate exponent of p. Then,

ϕ_ε = ϕ_ε^{1/q} · ϕ_ε^{1/p}.

Using Hölder's inequality, we have

|f_ε(x)| = |∫ ϕ_ε(x − y) f(y) dy|
≤ ∫ ϕ_ε(x − y)^{1/q} · ϕ_ε(x − y)^{1/p} |f(y)| dy
≤ (∫_{B_ε(x)} ϕ_ε(x − y) dy)^{1/q} · (∫_{B_ε(x)} ϕ_ε(x − y) |f(y)|^p dy)^{1/p}.

But the first integral of the mollifier is equal to 1, and since supp(ϕ_ε) is compact, the second integral exists and is finite. So

|f_ε(x)|^p ≤ ∫_{B_ε(x)} ϕ_ε(x − y) |f(y)|^p dy.    (3.2.6)

Taking the integral of both sides of (3.2.6) over Ω_ε, we get

‖f_ε‖_{L^p(Ω_ε)}^p = ∫_{Ω_ε} |f_ε(x)|^p dx ≤ ∫_{Ω_ε} ∫_{B_ε(x)} ϕ_ε(x − y) |f(y)|^p dy dx,

and, by the Fubini Theorem, the above integral can be written as

‖f_ε‖_{L^p(Ω_ε)}^p ≤ ∫ |f(y)|^p (∫ ϕ_ε(x − y) dx) dy
≤ ∫_Ω |f(y)|^p (∫_{R^n} ϕ_ε(x − y) dx) dy
= ∫_Ω |f(y)|^p dy = ‖f‖_{L^p(Ω)}^p,

and this proves (3). The result can also be proved similarly for the case Ω = R^n. ∎
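The contraction property (3) can also be observed discretely. In the hedged sketch below (names and grid sizes are illustrative), a discontinuous sign function is mollified by convolution with nonnegative weights summing to 1; Young's inequality then guarantees that the discrete ℓ² norm cannot increase, and the jump is visibly smoothed.

```python
import numpy as np

def bump(t):
    # unnormalized C^infty bump supported in (-1, 1)
    out = np.zeros_like(t)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

dx = 1e-3
x = np.arange(-1.0, 1.0, dx)
f = np.where(x < 0.0, -1.0, 1.0)           # discontinuous f in L^p

def mollify(f, eps):
    m = int(eps / dx)
    w = bump(np.arange(-m, m + 1) * dx / eps)
    w /= w.sum()                            # discrete analogue of "integral = 1"
    return np.convolve(f, w, mode="same")   # discrete f * phi_eps

for eps in (0.2, 0.05):
    fe = mollify(f, eps)
    print(eps,
          np.linalg.norm(fe) <= np.linalg.norm(f),   # ||f_eps||_2 <= ||f||_2
          np.max(np.abs(np.diff(fe))))               # the jump of size 2 is gone
```

The maximum pointwise increment of f_ε is of order max(w), far smaller than the original jump, illustrating the smoothing effect of the convolution.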

3.2.3 Cut-Off Function

Another type of mollification that is important in applications is when f is the characteristic function χ_A. This can be achieved using cut-off functions.

Theorem 3.2.3 Let U be an open set in R^n. Then for every compact set K ⊂ U, there exists a function ξ ∈ D(U) such that 0 ≤ ξ ≤ 1, ξ(x) ≡ 1 in a neighborhood of K, and supp(ξ) ⊂ U.

Proof For ε > 0, consider the following set:

K_ε = {x : dist(x, K) ≤ ε}.

Obviously, the set K_ε contains K, and

K_ε = ∪_{x∈K} B̄_ε(x),

and let ε be small enough that K_{2ε} ⊂ U. Consider the characteristic function

χ_{K_ε}(x) = 1 if x ∈ K_ε, and χ_{K_ε}(x) = 0 if x ∉ K_ε.

Now, define the function

ξ(x) = (ϕ_ε ∗ χ_{K_ε})(x),

where ϕ_ε is the standard mollifier. Then, by Theorem 3.2.2, ξ is smooth, and

supp(ξ) ⊂ K_ε + B_ε(0) = K_{2ε} ⊂ U.

Therefore, we have

ξ ∈ D(R^n).

Moreover, 0 ≤ ξ ≤ 1 and

ξ(x) = ∫ ϕ_ε(y) χ_{K_ε}(x − y) dy = 1 for all x ∈ K. ∎

The previous theorem guarantees the existence of a function ξ ∈ D(R^n) such that ξ ≡ 1 on B₁(0). The function ξ is known as a "cut-off function". One can simply formulate it as:

ξ(x) = 1 for |x| ≤ 1 and ξ(x) = 0 for |x| ≥ 2,    (3.2.7)

with 0 ≤ ξ ≤ 1 in between.

Now, if we consider the sequence

ξ_m(x) = ξ(x/m),

then we see that ξ_m(x) = 1 on B_m(0), and

ξ_m(x) = 1 for |x| ≤ m and ξ_m(x) = 0 for |x| ≥ 2m.

Now, one can use the Dominated Convergence Theorem to show that if f ∈ L^p(R^n), then

‖ξ_m f − f‖_p → 0

in L^p(R^n).
One important advantage of using cut-off functions over characteristic functions is that multiplying a smooth function by a cut-off function preserves smoothness: if f ∈ C^∞(Ω), then fξ ∈ C^∞(Ω), while multiplying by a characteristic (indicator) function of the form χ_{K_m}(x) may produce jump discontinuities at the boundary of K_m, so fχ_{K_m} won't be smooth on Ω.
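The construction ξ = ϕ_ε ∗ χ_{K_ε} from Theorem 3.2.3 can be carried out on a grid. The sketch below (an illustrative discretization with the assumed choices K = [−1, 1] and ε = 0.25) verifies the three claims: ξ ≡ 1 on K, ξ ≡ 0 outside K_{2ε}, and 0 ≤ ξ ≤ 1.

```python
import numpy as np

def bump(t):
    out = np.zeros_like(t)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

eps = 0.25
dx = 1e-3
x = np.arange(-3.0, 3.0 + dx, dx)

# discrete standard-mollifier weights (summing to 1)
m = int(eps / dx)
w = bump(np.arange(-m, m + 1) * dx / eps)
w /= w.sum()

chi = (np.abs(x) <= 1.0 + eps).astype(float)    # chi_{K_eps} with K = [-1, 1]
xi = np.convolve(chi, w, mode="same")           # xi = phi_eps * chi_{K_eps}

print(np.allclose(xi[np.abs(x) <= 1.0], 1.0))            # xi = 1 on K
print(np.allclose(xi[np.abs(x) >= 1.0 + 2 * eps], 0.0))  # xi = 0 outside K_{2eps}
print(xi.min() >= 0.0 and xi.max() <= 1.0 + 1e-12)       # 0 <= xi <= 1
```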

3.2.4 Partition of Unity

The third regularizing tool that will be studied is the partition of unity, which can
be obtained by use of cut-off functions. This is an important tool to globalize local
approximations.
Definition 3.2.4 (Locally Finite Cover) Let M be a manifold in R^n, and let {U_i} be a collection of subsets of R^n. Then {U_i} is said to be a locally finite cover of M if for every x ∈ M, there exists a neighborhood N(x) of x such that N(x) ∩ U_i = Ø for all but a finite number of indices i.
In other words, a locally finite cover means that for every point we can find a neighborhood intersecting at most finitely many sets of the cover. A topological space is called paracompact if every open cover admits a locally finite refinement. All locally compact Hausdorff spaces are paracompact; hence R^n is paracompact, i.e., every open cover {U_α} of a manifold in R^n has a locally finite refinement {V_i}. If we apply this result again to {V_i}, we obtain another cover {W_i}, and it can be shown that W_i ⊂ W̄_i ⊂ V_i.
Definition 3.2.5 (Subordinate) Let F₁ = {V_i : i ∈ Λ} and F₂ = {U_i : i ∈ Λ} be two families of sets. Then we say that F₁ is subordinate to F₂ if V_i ⊂ U_i for all i ∈ Λ.
Note here that if the two families are covers of some set M, then F₁ is a refinement of F₂.
Definition 3.2.6 (Partition of Unity) Let M be a manifold in R^n, and let {φ_i : M → R : i ∈ I} be a collection of nonnegative smooth functions. Then {φ_i} is a partition of unity on M if

(1) 0 ≤ φ_i ≤ 1 for all i,
(2) Σ_i φ_i(x) = 1 for all x ∈ M.
i

In light of the definition, one can also describe the functions as φ_i ∈ C^∞(M), φ_i : M → [0, 1]. The partition of unity can be used to "glue" local smooth patches to obtain a global smooth one, and this will allow us to approximate nonsmooth functions by smooth functions. It is worth noting that the set I is not necessarily countable, but condition (2) implies that the summation is only over a countable set of indices i ∈ I′ ⊂ I, where I′ is countable, so we can WLOG assume that I is countable. Consequently, we have φ_i(x) = 0 for all but countably many i ∈ I. If we want the summation to be over a finite number of indices, then we need extra conditions.

Proposition 3.2.7 Let M be a manifold in R^n, {U_i} be a locally finite cover of M, and let {ξ_i} be a sequence of cut-off functions defined on M. If the subcover {supp(ξ_i)} is subordinate to {U_i}, then for each x ∈ M, we have ξ_i(x) = 0 for all but finitely many i ∈ I.

Proof By the local finiteness of {U_i}, every x ∈ M has a neighborhood meeting only finitely many of the U_i; since supp(ξ_i) ⊂ U_i, it follows that x ∉ supp(ξ_i) for all but finitely many ξ_i. ∎

Another way of saying that the subcover {supp(φ_i)} is subordinate to the open cover {U_i} is to say that {φ_i} is subordinate to {U_i}. Note that if {supp(φ_i)} is subordinate to a locally finite cover {U_i}, then

{supp(φ_i)} = {closure of φ_i^{−1}((0, 1])}

is also locally finite. A partition of unity is said to be locally finite on a set M if {supp(φ_i)} is a locally finite cover of M.
Now, we discuss the existence of this partition.
Theorem 3.2.8 Let Ω be an open set in R^n. Then there exists a locally finite smooth partition of unity {φ_i} on Ω.

Proof Consider the collection {Ω_i}_{i∈N} of subsets of Ω defined by Ω₀ = Ø and, for i ≥ 1,

Ω_i = B_i(0) ∩ {x ∈ Ω : d(x, ∂Ω) > 1/i}.

Then for each i, we have: Ω_i is open, Ω̄_i is compact, Ω̄_i ⊂ Ω_{i+1}, and ∪_i Ω_i = Ω. Furthermore, for every x ∈ Ω, there exists N ∈ N such that

x ∈ Ω̄_{N+2} \ Ω_{N+1},

which is clearly a compact subset of the open set

Ω_{N+3} \ Ω̄_N.

Thus, let

K_i = Ω̄_{i+2} \ Ω_{i+1} and U_i = Ω_{i+3} \ Ω̄_i.

Then clearly, {K_i} and {U_i} are collections of compact sets and open sets, respectively, and K_i ⊂ U_i for each i. It is easy to see from the construction of {U_i} that it is a locally finite cover of Ω (each U_i intersects only the nearby members U_{i−2}, …, U_{i+2}). By Theorem 3.2.3, there exists a sequence of cut-off functions {ξ_i}_{i∈N} such that for each i, we have ξ_i ∈ D(Ω), 0 ≤ ξ_i ≤ 1, ξ_i ≡ 1 on K_i, and supp(ξ_i) ⊂ U_i. This implies by Definition 3.2.5 that the sequence {ξ_i} is subordinate to {U_i}; hence it is locally finite. By Proposition 3.2.7, the summation Σ_i ξ_i(x) has only finitely many nonzero terms for every x ∈ Ω (at most three), so we set

ξ(x) = Σ_i ξ_i(x).

The function ξ is well-defined and smooth, and ξ ≥ 1 on Ω, since every x ∈ Ω lies in some K_i, on which ξ_i = 1. Now, we define the sequence φ_i ∈ D(Ω) given by

φ_i(x) = ξ_i(x)/ξ(x).

It is clear that 0 ≤ φ_i ≤ 1 and Σ_i φ_i(x) = 1 for all x ∈ Ω. So this is the desired smooth partition of unity. ∎

The three tools that we studied, mollifiers, cut-off functions, and the partition of unity, are among the most powerful tools for establishing density and approximation results. Mollifiers provide mollification (smoothening the edges), cut-off functions provide compact support, and finally, the partition of unity enables us to pass from local properties to global ones.
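The normalization step φ_i = ξ_i/ξ in the proof of Theorem 3.2.8 can be mimicked numerically. The sketch below (illustrative parameters; smooth bumps are used in place of true cut-off functions, which suffices to produce a partition of unity) covers [0, 10] by overlapping intervals, normalizes the bumps, and checks that the φ_i sum to 1.

```python
import numpy as np

def bump(t):
    out = np.zeros_like(t)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

x = np.linspace(0.0, 10.0, 5001)
centers = np.arange(-1.0, 12.0)        # cover of [0, 10] by (c - 1.5, c + 1.5)
radius = 1.5

xis = np.array([bump((x - c) / radius) for c in centers])
xi = xis.sum(axis=0)                   # xi(x) = sum_i xi_i(x) > 0 on [0, 10]
phis = xis / xi                        # phi_i = xi_i / xi

print(xi.min() > 0.0)                          # the cover leaves no gaps
print(np.allclose(phis.sum(axis=0), 1.0))      # sum_i phi_i = 1 everywhere
print(phis.min() >= 0.0 and phis.max() <= 1.0)
```

Each φ_i inherits smoothness and compact support from ξ_i, exactly as in the theorem.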

3.2.5 Fundamental Lemma of Calculus of Variations

We conclude the section with a proof of the well-known Fundamental Lemma of Calculus of Variations.

Lemma 3.2.9 (Fundamental Lemma of Calculus of Variations) Let u ∈ C²(Ω) satisfy

∫_Ω u v dx = 0

for all v ∈ C²(Ω) with v(∂Ω) = 0. Then u ≡ 0 on Ω.

Proof If not, then (replacing u by −u if necessary) there exists a neighborhood N ⊂ Ω such that u > 0 on N. Let K be a compact subset of N and take v = ξ, a cut-off function with ξ ≡ 1 on K, 0 ≤ ξ ≤ 1, and supp(ξ) ⊆ N. This gives uξ > 0 on K and uξ ≥ 0 on N. Hence

∫_Ω uv = ∫_N uξ ≥ ∫_K u > 0,

a contradiction. ∎

3.3 Density of Schwartz Space

3.3.1 Convergence of Approximating Sequence

One useful application of mollifiers is to approximate a function by another function of the same regularity but with compact support. For example, let u ∈ C(R^n); then the sequence

u_j(x) = u(x)ξ_j(x) ∈ C_c(R^n)

and u_j → u in C. Hence, we begin our density results with the following well-known result in real analysis, which implies that a continuous function can always be approximated by another continuous function of compact support.

Theorem 3.3.1 C_c^∞(R^n) is dense in C(R^n).
Recall that ϕ ∈ C^∞(R^n) is said to be rapidly decreasing if

‖ϕ‖_{α,β} = sup_{x∈R^n} |x^α D^β ϕ(x)| < ∞ for all α, β ∈ N₀^n,    (3.3.1)

which is equivalent to

sup_{x∈R^n} (1 + ‖x‖)^k |D^β ϕ(x)| < ∞    (3.3.2)

for all k ∈ N and β ∈ N₀^n, |β| ≤ k. The Schwartz space S(R^n) is the space

S(R^n) = {u ∈ C^∞(R^n) such that ‖u‖_{α,β} < ∞ for all α, β ∈ N₀^n}.

This section discusses some interesting properties of this space and how it can be
used to construct other function spaces. Schwartz space S(Rn ) has three significant
properties that make it very rich in construction and important in applications. These
properties are:
(1) S(Rn ) is closed under differentiation and multiplication by polynomials.
(2) S(Rn ) is dense in L p (Rn ).
(3) The Fourier transform is preserved under S(Rn ), i.e., it carries S(Rn ) onto itself.
The first property is clear from the previous chapter. To demonstrate the other properties, we need the following theorem.
Theorem 3.3.2 If u ∈ L^p(Ω) for some Ω ⊆ R^n and 1 ≤ p < ∞, then u_ε → u in L^p(Ω).

Proof We will prove the case Ω = R^n. By Theorem 3.3.1, or alternatively Lusin's Theorem, the function u can be approximated by a function in C_c(R^n); i.e., for every σ > 0, there exists v ∈ C_c(R^n) such that

‖u − v‖_p < σ/3.    (3.3.3)

Consider the following mollification of v:

v_ε = v ∗ ϕ_ε.

Then, we have v_ε ∈ C^∞(R^n). Note that v and ϕ_ε are both of compact support, say K and B_ε(0) respectively, so

supp(v_ε) ⊂ K + B_ε(0) ⊆ B_r(0)

for some r > 0. Hence, v_ε ∈ C_c^∞(R^n). Moreover, since v is continuous with compact support, it is uniformly continuous. So there exists ρ > 0 such that

|v(x − y) − v(x)| < σ/3

for all x whenever ‖y‖ < ρ. Then, for ε < ρ, using standard property (3) of ϕ_ε from the preceding section, we get

|v_ε(x) − v(x)| ≤ ∫_{R^n} |v(x − y) − v(x)| ϕ_ε(y) dy ≤ (σ/3) ∫_{B_ε(0)} ϕ_ε(y) dy = σ/3.    (3.3.4)

Taking the supremum of (3.3.4) over x, we conclude that v_ε → v uniformly; since all the functions v_ε − v are supported in the fixed ball B_r(0), there exists ε₀ sufficiently small such that for all ε < ε₀

‖v_ε − v‖_p < σ/3.    (3.3.5)

Finally, we have

u_ε − v_ε = (u − v) ∗ ϕ_ε,    (3.3.6)

and one can easily show, using Theorem 3.2.2(3), that ‖u_ε − v_ε‖_p ≤ ‖u − v‖_p (verify). Now, from (3.3.3), (3.3.5), and (3.3.6), we obtain

‖u_ε − u‖_p ≤ ‖u_ε − v_ε‖_p + ‖v_ε − v‖_p + ‖v − u‖_p < σ.

If Ω ⊂ R^n is open, then we apply the argument to the function

ū(x) = u(x)χ_Ω(x);    (3.3.7)

then ū ∈ L^p(R^n), so by the first case its mollification ū_ε ∈ C^∞(R^n) satisfies

‖ū_ε − ū‖_{L^p(R^n)} → 0,

and consequently, one can show that

‖u_ε − u‖_{L^p(Ω)} → 0. ∎
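Theorem 3.3.2 can be illustrated numerically: mollifying the indicator u = χ_{(−1,1)} and measuring the discrete L² error shows ‖u_ε − u‖₂ shrinking as ε → 0 (the sketch and its grid parameters are illustrative, not part of the original text).

```python
import numpy as np

def bump(t):
    out = np.zeros_like(t)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

dx = 5e-4
x = np.arange(-2.0, 2.0, dx)
u = np.where(np.abs(x) < 1.0, 1.0, 0.0)      # u = chi_(-1,1) in L^2(R)

def mollify(u, eps):
    m = int(eps / dx)
    w = bump(np.arange(-m, m + 1) * dx / eps)
    w /= w.sum()
    return np.convolve(u, w, mode="same")

errors = []
for eps in (0.4, 0.2, 0.1, 0.05):
    err = np.sqrt(np.sum((mollify(u, eps) - u) ** 2) * dx)   # ||u_eps - u||_2
    errors.append(err)
    print(f"eps = {eps}: error = {err:.4f}")
# the errors decrease monotonically toward 0 as eps -> 0
```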

3.3.2 Approximations of S and L^p

One important consequence of the previous theorem is the following.

Theorem 3.3.3 Let Ω ⊆ R^n be an open set. Then, the space C_c^∞(Ω) is dense in L^p(Ω) for 1 ≤ p < ∞.

Proof Let u ∈ L^p(Ω). Then

ū(x) = u(x)χ_Ω(x) ∈ L^p(R^n).

Consider the mollification ū_ε ∈ C^∞(R^n) from the proof of the previous theorem, so that ū_ε converges to ū in L^p(R^n). Let ε = ε_m be such that ε_m → 0 as m → ∞. Define the following collection of compact sets:

K_m = {x ∈ Ω : d(x, ∂Ω) ≥ 1/m} ∩ B̄_m(0).

Then clearly each K_m is compact, and ∪_{m=1}^∞ K_m = Ω. Now, by Theorem 3.2.3, choose a sequence of cut-off functions ξ_m ∈ D(Ω) with

ξ_m(x) = 1 for x ∈ K_m and ξ_m(x) = 0 for x ∈ R^n \ Ω.    (3.3.8)

Note that

|ū(x)ξ_m(x) − ū(x)| → 0

a.e. on R^n, and

|ūξ_m − ū|^p = |ξ_m − 1|^p |ū|^p ≤ |ū|^p,

so by the Dominated Convergence Theorem,

‖ūξ_m − ū‖_{L^p(R^n)} → 0.

Define the sequence

u_m = ū_{ε_m} ξ_m.

Then supp(u_m) is a compact subset of Ω; hence u_m ∈ C_c^∞(Ω) and

‖u_m − u‖_{L^p(Ω)} = ‖u_m − ū‖_{L^p(R^n)}
≤ ‖(ū_{ε_m} − ū)ξ_m‖_{L^p(R^n)} + ‖ūξ_m − ū‖_{L^p(R^n)}
≤ ‖ū_{ε_m} − ū‖_{L^p(R^n)} + ‖ūξ_m − ū‖_{L^p(R^n)} → 0. ∎

The above density result is not valid for p = ∞ (choose f(x) = c for some nonzero constant c). The significance of this result lies in the fact that any function in L^p can be approximated by a smooth function in C_c^∞. It is clear from the proof how mollifiers and cut-off functions serve as effective tools for constructing smooth sequences with compact support: the mollification provides C^∞ smoothness and the cutting-off provides compact support. Note that we performed the mollification first and the cutting-off second. If the reverse order were performed, we would not obtain a smooth sequence on Ω but rather on K_m ⊂ Ω. This is one crucial advantage of convoluting on R^n: it produces a convolution approximating sequence on R^n, not on a subset of it.
Next, we will use the argument of the proof above to prove a generalization of the Fundamental Lemma of COV (Lemma 3.2.9), which will play a crucial role in the theory of elliptic PDEs in Chap. 4.

Theorem 3.3.4 (Fundamental Theorem) If u ∈ L¹_loc(Ω) satisfies

∫_Ω u v dx = 0

for all v ∈ C_c^∞(Ω), then u = 0 a.e. in Ω.



Proof Consider

u_m = ū_{ε_m} ξ_m ∈ C_c^∞(Ω)

as in the proof of the previous theorem, with the same settings for ξ_m and ū_ε. Then

u_m(x) = ∫_Ω ϕ_ε(x − y) ξ_m(y) u(y) dy.

Substituting

v = ϕ_ε(x − y) ξ_m(y) ∈ C_c^∞(Ω)

in the integral above gives

u_m(x) = 0,

and since u_m → u, taking into account (3.3.8), we conclude that u = 0 a.e. in Ω. ∎

As a consequence of the previous theorem, we can now give a proof of Theorem 3.1.2(1), the uniqueness of weak derivatives.

Theorem 3.3.5 Let u, D^α u ∈ L¹_loc(Ω). Then D^α u is unique.

Proof If there are two weak derivatives v₁, v₂ of u, then using the same argument as in Theorem 3.1.2(1), we obtain

∫_Ω (v₁ − v₂) ϕ dx = 0

for all ϕ ∈ C_c^∞(Ω), and by the previous theorem, v₁ = v₂ a.e. ∎

The following result is very important and will be useful for upcoming results.

Theorem 3.3.6 S(R^n) is dense in L^p(R^n) for 1 ≤ p < ∞.

Proof We need to show the following inclusions:

C_c^∞(R^n) = D(R^n) ⊂ S(R^n) ⊂ L^p(R^n).    (3.3.9)

The first inclusion above is clear, since functions in D(R^n), together with all their derivatives, have compact support, so the suprema in (3.3.1) are finite. For the second inclusion, let u ∈ S(R^n); then u can be written as

u(x) = (1 + |x|)^{−(n+1)/p} (1 + |x|)^{(n+1)/p} u(x).

It follows that

‖u‖_p^p = ∫_{R^n} |u|^p ≤ sup_{x∈R^n} [(1 + |x|)^{n+1} |u(x)|^p] ∫_{R^n} dx/(1 + |x|)^{n+1} < ∞,

taking into account that (1 + |x|)^{n+1} |u(x)|^p is bounded because u ∈ S(R^n), and that the last integral is finite. Combining (3.3.9) with Theorem 3.3.3 and the fact that L^p(R^n) is complete, the result follows. ∎

From the previous theorem and (3.3.9), we conclude the following:

Corollary 3.3.7 D(R^n) is a dense subspace of S(R^n).

Another important consequence of Theorem 3.3.6 is that we can extend the Fourier transform to L^p(R^n) by continuity. This is possible since S(R^n) is dense in L^p(R^n). Indeed, let f ∈ L^p(R^n). Then, there exists a sequence {f_m} ⊂ S(R^n) such that f_m → f in L^p. It can be easily shown that {F{f_m}} is Cauchy in L^p, which is complete, hence it converges to some function, say g; so F{f_m} → g, and since F is continuous, we can define F{f} = g. In general, we cannot compute a Fourier transform directly for every f ∈ L^p(R^n), so the idea is to "upgrade" F: define it first on some dense subspace, then lift it up to the desired space.

3.3.3 Generalized Plancherel Theorem

A third consequence is the following:

Theorem 3.3.8 (Generalized Plancherel Theorem) For f ∈ S(R^n), one has

‖F{f}‖₂² = (2π)^n ‖f‖₂².

Proof Let f, g ∈ S(R^n). Then

⟨F{f}, F{g}⟩ = ∫∫ e^{−iω·x} f(x) \overline{F{g}(ω)} dx dω = ∫ f(x) (∫ e^{−iω·x} \overline{F{g}(ω)} dω) dx = (2π)^n ∫ f(x) \overline{g(x)} dx,

where the bar denotes complex conjugation and the last step uses the Fourier inversion formula. Writing f = g, we get the result. ∎

The Fourier transform is isometric (up to a constant factor) with respect to the L² norm, i.e., it preserves the norm. One may eliminate the constant difference by adopting the definition

F{f} = ∫_{R^n} f(x) e^{−2πiω·x} dx,

and this definition gives the isometry

‖F{f}‖₂² = ‖f‖₂².

So we conclude the following result:

Corollary 3.3.9 F : L^p −→ L^p is an isomorphism. Moreover, it is isometric for p = 2; that is, the Fourier transform is an isometric isomorphism on the Hilbert space L².
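A discrete analogue of Theorem 3.3.8 holds for the DFT: with NumPy's unnormalized convention F_k = Σ_n f_n e^{−2πikn/N}, one has Σ|F_k|² = N Σ|f_n|², the factor N playing the role of the constant (2π)^n. The sketch below (illustrative only) checks this identity on random complex data.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1024
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)

F = np.fft.fft(f)                 # unnormalized DFT of f

lhs = np.sum(np.abs(F) ** 2)      # ||F f||^2
rhs = N * np.sum(np.abs(f) ** 2)  # N ||f||^2
print(np.allclose(lhs, rhs))      # discrete Plancherel identity
```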

3.4 Construction of Sobolev Spaces

3.4.1 Completion of Schwartz Spaces

Consider the family of mappings ρ_{k,m} : S(R) → R defined by

ρ_{k,m}(ϕ) = ‖ϕ‖_{k,m} = sup_{x∈R} |x^k ϕ^{(m)}(x)|.

It is clear that {ρ_{k,m}} is a countable family of semi-norms. Indeed, 0 ≤ ρ_{k,m}(ϕ) < ∞ for all ϕ ∈ S(R),

ρ_{k,m}(αϕ) = |α| ρ_{k,m}(ϕ),

and

ρ_{k,m}(ϕ + φ) ≤ ρ_{k,m}(ϕ) + ρ_{k,m}(φ).

However, ‖ϕ‖_{k,m} = 0 forces ϕ = 0 only when m = 0: if ϕ(x) = c for a constant c, then ‖ϕ‖_{k,m} = 0 for any m ≥ 1. Therefore, each ρ_{k,m} is a seminorm but not a norm, and the system

U_{k,m} = {f ∈ S(R) : ‖f‖_{k,m} < ε}

is a countable base at 0. Consequently, S(R) with the family {ρ_{k,m}} is a metrizable space but not a normed space. Consider the Fréchet metric

d(ϕ, φ) = Σ_{k,m} (1/2^{k+m}) · ρ_{k,m}(ϕ − φ)/(1 + ρ_{k,m}(ϕ − φ)).    (3.4.1)

It is easy to see that the metric d turns S(R) into a Fréchet space. Indeed, let {ϕ_n} ⊂ S(R) be a Cauchy sequence; then for each pair (k, m), the sequence x^k ϕ_n^{(m)}(x) is uniformly Cauchy, so it converges uniformly to a bounded continuous function (why?). In particular, ϕ_n converges, together with all its derivatives, to some ϕ ∈ C^∞(R), and

‖ϕ − ϕ_n‖_{k,m} → 0,

so

‖ϕ‖_{k,m} ≤ ‖ϕ − ϕ_n‖_{k,m} + ‖ϕ_n‖_{k,m} < ∞.

Thus, ϕ ∈ S(R) and ϕ_n → ϕ in the metric d. The same argument can be extended to S(R^n) by considering the seminorms

ρ_{α,β}(ϕ) = ‖ϕ‖_{α,β} = sup_{x∈R^n} |x^α D^β ϕ(x)|,

and

d(ϕ, φ) = Σ_{α,β} (1/2^{|α|+|β|}) · ρ_{α,β}(ϕ − φ)/(1 + ρ_{α,β}(ϕ − φ)).

3.4.2 Definition of Sobolev Space

It should be noted that S(R) is not a complete normed space, i.e., there is no norm that would make the space complete. Thus, it is important to extend the space to its completion. This is nothing but the Sobolev space, as in the following definition.

Definition 3.4.1 (Sobolev Space) Let Ω be an open set in R^n. Then, the Sobolev space, denoted by W^{k,p}(Ω), is defined by

W^{k,p}(Ω) = {u ∈ L^p(Ω) such that D^α u ∈ L^p(Ω) for all 0 ≤ |α| ≤ k}.

In particular, the space W^{k,2}(Ω) is denoted by H^k(Ω). Note that if k = 0, then W^{0,p} = L^p. For k = 1, the definition reads

W^{1,p}(Ω) = {u ∈ L^p(Ω) such that ∂u/∂x_i ∈ L^p(Ω), i = 1, 2, …, n}.

The Sobolev space W^{k,p}(Ω) is a normed vector space with the norm

‖u‖_{k,p} = (Σ_{|α|≤k} ‖D^α u‖_p^p)^{1/p}.    (3.4.2)

In the one-dimensional case, this takes the form

‖u‖_{k,p} = (‖u‖_p^p + ‖Du‖_p^p + ⋯ + ‖D^k u‖_p^p)^{1/p}.

Indeed, ‖u‖_{k,p} ≥ 0 and ‖u‖_{k,p} = 0 iff u = 0 a.e. Moreover,

‖cu‖_{k,p} = |c| ‖u‖_{k,p},

and, using Minkowski's inequality first in L^p and then in ℓ^p,

‖u + v‖_{k,p} = (Σ_{|α|≤k} ‖D^α(u + v)‖_p^p)^{1/p}
≤ (Σ_{|α|≤k} (‖D^α u‖_p + ‖D^α v‖_p)^p)^{1/p}
≤ (Σ_{|α|≤k} ‖D^α u‖_p^p)^{1/p} + (Σ_{|α|≤k} ‖D^α v‖_p^p)^{1/p}
= ‖u‖_{k,p} + ‖v‖_{k,p}.

The Sobolev space contains all functions in L^p whose distributional derivatives (up to order k) are also in L^p, and since these derivatives belong to L^p, they exist and are unique up to a set of measure zero. We also have, in general,

W^{1,p} ⊊ L^p.

The above proper inclusion indicates that there are examples of functions that are in L^p but not in W^{1,p}. They could even be weakly differentiable, but nevertheless they are not Sobolev functions (see Problem 3.11.25). We can tell if a function is in W^{1,p} by checking whether its derivatives, in the sense of distributions, belong to L^p. For example, let u(x) = x^{−1/3}; then it is readily seen that u ∈ L²(0, 1) but u′ ∉ L²(0, 1), hence u ∉ H¹(0, 1). In general, the Sobolev spaces are nested in the following sense:

W^{k,p}(Ω) ⊆ W^{k−1,p}(Ω) ⊆ ⋯ ⊆ L^p(Ω).
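The example u(x) = x^{−1/3} can be probed numerically: truncating the integrals away from the singularity shows ∫_δ^1 |u|² staying bounded while ∫_δ^1 |u′|² blows up as δ → 0 (a rough Riemann-sum sketch; grid sizes are illustrative).

```python
import numpy as np

def tail_integrals(delta, n=200000):
    # u = x^(-1/3), u' = -(1/3) x^(-4/3); integrate |u|^2 and |u'|^2 on [delta, 1]
    x = np.linspace(delta, 1.0, n)
    dx = x[1] - x[0]
    I_u = np.sum(x ** (-2.0 / 3.0)) * dx          # stays near 3 (finite)
    I_du = np.sum(x ** (-8.0 / 3.0) / 9.0) * dx   # diverges as delta -> 0
    return I_u, I_du

for delta in (1e-2, 1e-4, 1e-6):
    I_u, I_du = tail_integrals(delta)
    print(f"delta = {delta:g}: ||u||^2 ~ {I_u:.3f}, ||u'||^2 ~ {I_du:.3e}")
```

This matches the exact antiderivatives: ∫_δ^1 x^{−2/3} dx = 3(1 − δ^{1/3}) is bounded, while ∫_δ^1 (1/9) x^{−8/3} dx grows like δ^{−5/3}.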

Remark The letter k for H^k is sometimes replaced with s.

In the case p = ∞, the Sobolev norm can be defined as

‖u‖_{k,∞} = max_{|α|≤k} ‖D^α u‖_{L^∞(Ω)},

where

‖D^α u‖_{L^∞(Ω)} = ess sup_Ω |D^α u|.




The importance of the following result is that it helps propose another, equivalent definition for H^s.

Proposition 3.4.2 Let u ∈ L²(R^n). Then, u ∈ H^k(R^n) iff

(1 + ‖ω‖²)^{k/2} û(ω) ∈ L²(R^n).

Proof Let u ∈ L²(R^n). Then

‖u‖²_{2,k} = Σ_{|α|≤k} ‖D^α u‖₂².

Employing the Plancherel Theorem and then Proposition 3.1.6 yields

Σ_{|α|≤k} ‖D^α u‖₂² = (1/(2π)^n) Σ_{|α|≤k} ‖F{D^α u}‖₂²
= (1/(2π)^n) Σ_{|α|≤k} ‖i^{|α|} ω^α û(ω)‖₂²
= (1/(2π)^n) ∫_{R^n} Σ_{|α|≤k} |ω^α|² |û(ω)|² dω.

But note that

Σ_{|α|≤k} |ω^α|² = Σ_{|α|≤k} ω^{2α},

and we can find M ∈ R sufficiently large such that

(1/M)(1 + ‖ω‖²)^k ≤ Σ_{|α|≤k} ω^{2α} ≤ M(1 + ‖ω‖²)^k

for all ω ∈ R^n. Let a = 1/M and b = M. Then, by the above argument, we conclude that the norm ‖u‖_{2,k} is equivalent to the norm defined by

‖u‖ = (∫ (1 + ‖ω‖²)^k |û(ω)|² dω)^{1/2}.

Hence, ‖u‖²_{2,k} < ∞ if and only if

(∫ (1 + ‖ω‖²)^k |û(ω)|² dω)^{1/2} < ∞.

This completes the proof. ∎
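In one dimension the two-sided bound in the proof can be made explicit with M = 2^k: every binomial coefficient in the expansion of (1 + ω²)^k lies between 1 and 2^k, so Σ_{j≤k} ω^{2j} ≤ (1 + ω²)^k ≤ 2^k Σ_{j≤k} ω^{2j}. The sketch below (illustrative; the choice k = 4 is arbitrary) checks both inequalities on a grid.

```python
import numpy as np

k = 4
M = 2.0 ** k                                  # M = 2^k suffices in one dimension
w = np.linspace(-20.0, 20.0, 100001)

s = sum(w ** (2 * j) for j in range(k + 1))   # sum_{j <= k} w^(2j)
q = (1.0 + w ** 2) ** k                       # (1 + w^2)^k

print(np.all(s <= q))      # every binomial coefficient is >= 1
print(np.all(q <= M * s))  # every binomial coefficient is <= 2^k
```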

3.4.3 Fractional Sobolev Space

The previous result suggests another definition of H^s based on the Fourier transform.

Definition 3.4.3 (Fractional Sobolev Space) For s ≥ 0, the space H^s(R^n) is defined as

H^s(R^n) = {u ∈ L²(R^n) : (1 + ‖ω‖²)^{s/2} û(ω) ∈ L²(R^n)}

with the norm

‖u‖_{H^s(R^n)} = ‖(1 + ‖·‖²)^{s/2} û‖_{L²(R^n)}.

For integer s = k, the space H^k(R^n) is a Hilbert space endowed with the inner product

⟨u, v⟩_{k,2} = Σ_{|α|≤k} ⟨D^α u, D^α v⟩₂.    (3.4.3)

Note here that if negative values of s are allowed in this definition (with û taken as a tempered distribution), the space H^s(R^n) is no longer a subspace of L²(R^n) (why?).

3.5 Basic Properties of Sobolev Spaces

3.5.1 Convergence in Sobolev Spaces

We begin the discussion of the properties by establishing an equivalent norm to the norm defined in (3.4.2). Recall that two norms ‖·‖_a and ‖·‖_b are equivalent on a normed space X if there exist C₁, C₂ > 0 such that

C₁‖x‖_a ≤ ‖x‖_b ≤ C₂‖x‖_a

for all x ∈ X.

Proposition 3.5.1 For the space W^{k,p}, 1 ≤ p < ∞, the norm

‖u‖_{k,p} = (Σ_{|α|≤k} ‖D^α u‖_p^p)^{1/p}

is equivalent to the norm

‖u‖′_{k,p} = Σ_{|α|≤k} ‖D^α u‖_p.

Proof For convenience, let us call the first norm ‖·‖₁ and the second norm ‖·‖₂. Using the fact that |x|^p is convex for p ≥ 1, it is easy to show that for c₁, c₂ ≥ 0,

c₁^p + c₂^p ≤ (c₁ + c₂)^p ≤ 2^{p−1}(c₁^p + c₂^p),

and by induction this gives

Σ_{i=1}^N c_i^p ≤ (Σ_{i=1}^N c_i)^p ≤ N^{p−1} Σ_{i=1}^N c_i^p.

Applying this with the numbers c_α = ‖D^α u‖_p, where N is the number of multi-indices α with |α| ≤ k, it follows that

‖u‖₁^p = Σ_{|α|≤k} ‖D^α u‖_p^p ≤ (Σ_{|α|≤k} ‖D^α u‖_p)^p = ‖u‖₂^p ≤ N^{p−1} Σ_{|α|≤k} ‖D^α u‖_p^p = N^{p−1} ‖u‖₁^p.

That is,

C₁‖u‖₁ ≤ ‖u‖₂ ≤ C₂‖u‖₁,

where C₁ = 1 and C₂ = N^{(p−1)/p}. ∎
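The constants in Proposition 3.5.1 can be spot-checked: treating a vector of nonnegative numbers as stand-ins for the seminorm values ‖D^α u‖_p, the sketch below (illustrative values) verifies ‖u‖₁ ≤ ‖u‖₂ ≤ N^{(p−1)/p} ‖u‖₁.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3.0
N = 7                                # number of terms ||D^alpha u||_p
c = rng.random(N) * 5.0              # stand-ins for the seminorm values

norm1 = np.sum(c ** p) ** (1.0 / p)  # (sum c_i^p)^(1/p)
norm2 = np.sum(c)                    # sum c_i
C2 = N ** ((p - 1.0) / p)

print(norm1 <= norm2 <= C2 * norm1)  # the equivalence, with C1 = 1
```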
The convergence in Sobolev spaces can now be defined as follows:

Definition 3.5.2 (Convergence in Sobolev Spaces) Let u_m, u ∈ W^{k,p}(Ω), for Ω ⊆ R^n. Then u_m → u in W^{k,p}(Ω) if

lim ‖u_m − u‖_{W^{k,p}(Ω)} = 0,

where ‖·‖_{W^{k,p}(Ω)} can be either one of the two equivalent norms

‖u‖_{k,p} = (Σ_{|α|≤k} ‖D^α u‖_p^p)^{1/p}   or   ‖u‖_{k,p} = Σ_{|α|≤k} ‖D^α u‖_p.

In W^{1,p}(Ω), for example, the two norms take the forms

‖u‖_{W^{1,p}(Ω)} = (∫_Ω (|u|^p + |Du|^p))^{1/p},

‖u‖_{W^{1,p}(Ω)} = (∫_Ω |u|^p)^{1/p} + (∫_Ω |Du|^p)^{1/p},

respectively. Each one of them can be convenient to use in some cases, and we will constantly interchange the two norms and use whichever is suitable for the specific argument.

3.5.2 Completeness and Reflexivity of Sobolev Spaces

As argued above, the Sobolev space W^{k,p}(R^n) is expected to be the completion of the Schwartz space S(R^n). The next theorem proves that this is indeed the case.

Theorem 3.5.3 The space W^{k,p}(Ω), for an open Ω ⊆ R^n, is complete, separable for 1 ≤ p < ∞, and reflexive for 1 < p < ∞.

Proof Let {u_m} ⊂ W^{k,p}(Ω) be a Cauchy sequence. Then, {u_m} and {D^α u_m} are Cauchy sequences in L^p(Ω) for every multi-index α with |α| ≤ k. Since L^p(Ω) is complete, both sequences converge in L^p(Ω). So there exist u and v such that

‖u_m − u‖_p → 0 and ‖D^α u_m − v‖_p → 0

as m → ∞. Let ϕ ∈ D(Ω). Then, using Hölder's inequality,

|∫_Ω (D^α u_m)ϕ − ∫_Ω vϕ| ≤ ∫_Ω |(D^α u_m − v)ϕ| ≤ ‖D^α u_m − v‖_p ‖ϕ‖_q → 0

as m → ∞. Consequently,

lim ∫_Ω (D^α u_m)ϕ = ∫_Ω vϕ.    (3.5.1)

Moreover, as m → ∞,

|∫_Ω u_m(D^α ϕ) − ∫_Ω u(D^α ϕ)| ≤ ‖u_m − u‖_p ‖D^α ϕ‖_q → 0,

so

lim ∫_Ω u_m D^α ϕ = ∫_Ω u(D^α ϕ).    (3.5.2)

From (3.5.1) and (3.5.2), we obtain

∫_Ω u D^α ϕ = lim ∫_Ω u_m D^α ϕ = (−1)^{|α|} lim ∫_Ω (D^α u_m)ϕ = (−1)^{|α|} ∫_Ω vϕ.

Hence, v = D^α u in the weak sense, and so

‖D^α u_m − D^α u‖_p → 0.

Since this holds for all 0 ≤ |α| ≤ k, it follows that

‖u_m − u‖_{k,p} → 0.

Therefore, u_m → u in W^{k,p}(Ω), and consequently W^{k,p}(Ω) is complete.

To show separability, note that the product space (Z, ‖·‖_Z), defined as

Z = ∏_{j=1}^{N+1} (L^p)_j

with the norm

‖u‖_Z = (Σ_{j=1}^{N+1} ‖u_j‖_p^p)^{1/p},

where N + 1 is the number of multi-indices α with |α| ≤ k, is a separable Banach space. Define the following mapping:

T : W^{k,p}(Ω) → Z, (Tu)_j = D^α u

for 0 ≤ |α| ≤ k (one coordinate for each α). Clearly, T is a linear and injective isometry, since

‖u‖_{k,p} = ‖Tu‖_Z.

Hence, T(W^{k,p}(Ω)) is separable as a subspace of the separable space Z. Consequently, there exists a countable dense subset D of T(W^{k,p}(Ω)), and therefore T^{−1}(D) is a countable dense subset of W^{k,p}(Ω).

Finally, note that Z is reflexive for 1 < p < ∞, being a finite product of reflexive spaces, and T embeds W^{k,p}(Ω) as a complete, hence closed, subspace of Z, so W^{k,p}(Ω) is reflexive by Pettis Theorem. ∎
Recall that Pettis Theorem states that every closed subspace of a reflexive normed space is reflexive. This is a well-known result in functional analysis, and its proof is based on the Banach–Bourbaki–Kakutani Theorem, which states that a space is reflexive if and only if its closed unit ball is weakly compact.

3.5.3 Local Sobolev Spaces

Recall that in Sect. 3.1 we defined L¹_loc(Ω), the space of locally integrable functions on Ω, to be the functions that are Lebesgue integrable on every compact subset of Ω. We would like to define a similar space for the Sobolev spaces. However, one critical issue arises here. Sobolev spaces involve weak derivatives, and these derivatives might possess bizarre behavior on the boundary of compact sets, so defining the locality in terms of compactness might be problematic. Alternatively, we strengthen the idea of locality to involve open sets whose closures are compact proper subsets of the domain. This gives control over the derivatives within the domain.
Definition 3.5.4 (Compact Inclusion) A set Ω′ is said to be compactly contained in Ω (denoted by Ω′ ⊂⊂ Ω) if Ω′ ⊂ Ω̄′ ⊂ Ω, where the closure Ω̄′ is compact.

Remark In other textbooks, this may also be denoted by Ω′ ⋐ Ω, but we will not adopt this notation.

Now, we define the local Sobolev space.

Definition 3.5.5 (Local Sobolev Space) Let Ω ⊆ R^n. Then, the local Sobolev space, denoted by W^{k,p}_loc(Ω), is defined by

W^{k,p}_loc(Ω) = {u ∈ L^p_loc(Ω) such that u ∈ W^{k,p}(Ω′) for every Ω′ ⊂⊂ Ω}.

We can alternatively say that u ∈ W^{k,p}_loc(Ω) if u ∈ W^{k,p}(K) for every compact set K ⊂ Ω, which might be a more convenient way of describing the functions in the space. The functions in this space don't have any growth constraints at the boundary. Convergence in local Sobolev spaces can be defined similarly to that for the Sobolev space: let u_m, u ∈ W^{k,p}_loc(Ω), for Ω ⊆ R^n. Then we say that the sequence u_m converges to u in W^{k,p}_loc(Ω) if for every Ω′ ⊂⊂ Ω we have

lim ‖u_m − u‖_{W^{k,p}(Ω′)} = 0.

3.5.4 Leibniz Rule

We next establish the Leibniz rule for weak derivatives.

Theorem 3.5.6 Let u ∈ W^{k,p}(Ω) and ψ ∈ C_c^∞(Ω), Ω ⊂ R^n. Then ψu ∈ W^{k,p}(Ω), and for all |α| ≤ k, we have

D^α(uψ) = Σ_{β≤α} \binom{α}{β} D^β ψ · D^{α−β} u.

Proof We begin by the case |α| = 1, i.e.,

∂u
Dαu = ,
∂xi

i = 1, . . . n. Let ϕ ∈ Cc∞ (). Note that ψϕ ∈ Cc∞ (), hence we can use the classical
product rule
∂ ∂ψ ∂ϕ
(ϕ.ψ) = ϕ +ψ .
∂xi ∂xi ∂xi

Then

∂(u.ψ) ∂ϕ
ϕ=− u.ψ
 ∂xi  ∂xi
∂ ∂ψ
= − u. (ϕ.ψ) + uϕ
 ∂xi  ∂xi
∂u ∂ψ
= ϕψ + uϕ
 ∂xi  ∂x
  i
∂ψ ∂u
= u +ψ ϕ.
 ∂x i ∂x i

Hence,
∂(u.ψ) ∂ψ ∂u
=u +ψ
∂xi ∂xi ∂xi

Now, assume it is case for |α| = k, and consider α = β + γ such that |β| = k, and
|γ| = 1. Then by Theorem 3.1.2(2) we have
3.5 Basic Properties of Sobolev Spaces 165

Dβ Dη = Dα.

It follows that
$$\int_\Omega u\psi D^\alpha\varphi = \int_\Omega u\psi D^\beta(D^\gamma\varphi) = (-1)^{|\beta|}\int_\Omega \sum_{\eta\le\beta}\binom{\beta}{\eta}(D^\eta\psi)(D^{\beta-\eta}u)(D^\gamma\varphi) = (-1)^{|\beta|+|\gamma|}\int_\Omega \sum_{\eta\le\beta}\binom{\beta}{\eta} D^\gamma\big(D^\eta\psi\, D^{\beta-\eta}u\big)\varphi.$$
Now, we apply Theorem 3.1.2 to the RHS of the equality and rearrange terms; making use of the fact that
$$\binom{\beta}{\eta-\gamma} + \binom{\beta}{\eta} = \binom{\alpha}{\eta},$$
we obtain
$$\int_\Omega u\psi D^\alpha\varphi = (-1)^{|\alpha|}\int_\Omega \sum_{\eta\le\alpha}\binom{\alpha}{\eta}(D^\eta\psi)(D^{\alpha-\eta}u)\,\varphi.$$

Therefore,
$$D^\alpha(u\psi) = \sum_{\eta\le\alpha}\binom{\alpha}{\eta}(D^\eta\psi)(D^{\alpha-\eta}u).$$
Note that $u\psi \in L^p(\Omega)$. Moreover, $D^\alpha u \in L^p(\Omega)$ and $D^\alpha \psi \in L^p(\Omega)$, hence
$$D^\alpha(u\psi) \in L^p(\Omega)$$
for $|\alpha| \le k$. Consequently,
$$u\psi \in W^{k,p}(\Omega). \quad \square$$
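For orientation, the one-dimensional case with $|\alpha| = 2$ reduces to the familiar second-order product rule (a worked instance added here, not part of the original proof):

```latex
% Leibnitz rule for n = 1, \alpha = 2: the indices \beta \le \alpha are 0, 1, 2,
% with binomial coefficients 1, 2, 1, giving
D^2(u\psi) = \sum_{\beta=0}^{2}\binom{2}{\beta} D^\beta\psi\, D^{2-\beta}u
           = u\,\psi'' + 2\,u'\,\psi' + u''\,\psi,
% where u', u'' denote the first and second weak derivatives of u \in W^{2,p}(I).
```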

3.5.5 Mollification with Sobolev Function

Theorem 3.5.7 Let $u \in W^{k,p}$, $1 \le p < \infty$, and let $u_\epsilon = u * \varphi_\epsilon$ for some mollifier $\varphi_\epsilon$, $\epsilon > 0$. Then, we have the following:
(1) If $u \in W^{k,p}(\mathbb{R}^n)$ then $u_\epsilon \in C^\infty(\mathbb{R}^n)$, and
$$D^\alpha u_\epsilon = \varphi_\epsilon * D^\alpha u = (D^\alpha u)_\epsilon$$
for all $x \in \mathbb{R}^n$.
(2) If $u \in W^{k,p}(\Omega)$ then $u_\epsilon \in C^\infty(\Omega_\epsilon)$, and
$$D^\alpha u_\epsilon = \varphi_\epsilon * D^\alpha u = (D^\alpha u)_\epsilon$$
for all $x \in \Omega_\epsilon$.
(3) $\|u_\epsilon\|_{k,p} \le \|u\|_{k,p}$.

Proof (1) and (2) follow immediately from Theorem 3.2.2 since every $u$ in $W^{k,p}(\mathbb{R}^n)$ is in $L^p(\mathbb{R}^n)$. For (3), Theorem 3.2.2 proved that
$$\|u_\epsilon\|_p \le \|u\|_p. \qquad (3.5.3)$$
On the other hand, (2) demonstrated that
$$(D^\alpha u_\epsilon)(x) = (D^\alpha u)_\epsilon(x).$$
Since $u \in W^{k,p}$, we have $D^\alpha u \in L^p$. Then by Theorem 3.2.2
$$\|(D^\alpha u)_\epsilon\|_p \le \|D^\alpha u\|_p. \qquad (3.5.4)$$
Now (3) follows from (3.5.3) and (3.5.4). The result can also be proved similarly for the case $\Omega = \mathbb{R}^n$. $\square$

3.5.6 $W_0^{k,p}(\Omega)$

One of the important Sobolev spaces is the so-called "zero-boundary Sobolev space". This is defined in most textbooks as the closure (i.e., completion) of the space $C_c^\infty$. However, since we haven't yet discussed approximation results, we shall adopt for the time being an equivalent definition which may seem a bit more natural.
Definition 3.5.8 (Zero-Boundary Sobolev Space) Let $\Omega \subseteq \mathbb{R}^n$. Then, the zero-boundary Sobolev space, denoted by $W_0^{k,p}(\Omega)$, is defined by
$$W_0^{k,p}(\Omega) = \{u \in W^{k,p}(\Omega) \text{ such that } D^\alpha u|_{\partial\Omega} = 0 \text{ for all } 0 \le |\alpha| \le k-1\}.$$
In words, the Sobolev functions in $W_0^{k,p}(\Omega)$ together with all their weak derivatives up to order $k-1$ vanish on the boundary. More precisely, in the case $k = 1$,
$$W_0^{1,p}(\Omega) = \{u \in W^{1,p}(\Omega) \text{ such that } u|_{\partial\Omega} = 0\}.$$
Two advantages of this property, as we shall see later, are that: 1. the regularity of the boundary is not necessary, and 2. extensions can be easily constructed outside $\Omega$.

Functions in $W_0^{k,p}(\Omega)$ are thus important in the theory of PDEs since they naturally satisfy the Dirichlet condition on the boundary $\partial\Omega$.
Proposition 3.5.9 For any $\Omega \subseteq \mathbb{R}^n$, the space $W_0^{k,p}(\Omega)$ is Banach for every $k \ge 0$, $1 \le p \le \infty$; it is separable for $1 \le p < \infty$ and reflexive for $1 < p < \infty$.
Proof It suffices to prove that $W_0^{1,p}(\Omega)$ is closed in $W^{1,p}(\Omega)$. The proofs of separability and reflexivity are very similar to those for $W^{1,p}(\Omega)$. The details are left to the reader as an exercise. $\square$
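A concrete instance of a zero-boundary Sobolev function (an illustration added here, not from the text): on $\Omega = (-1,1)$, consider the tent function.

```latex
% u vanishes at the boundary points x = \pm 1 and has bounded weak derivative:
u(x) = 1 - |x|, \qquad u'(x) = -\operatorname{sgn}(x) \in L^p((-1,1)),
% so u \in W^{1,p}((-1,1)) with u|_{\partial\Omega} = 0, i.e.
% u \in W_0^{1,p}((-1,1)) in the sense of Definition 3.5.8.
```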

3.6 $W^{1,p}(\Omega)$

The space $W^{1,p}(\Omega)$ is particularly important since it provides the basis for several properties of Sobolev spaces. Sometimes it might be simpler to prove certain properties in $W^{1,p}(\Omega)$ because the techniques involved might become more complicated when dealing with $k > 1$, so we establish the result for $W^{1,p}(\Omega)$ knowing that the results can be extended to general $k$ by induction. The next result demonstrates one of the features of this space that distinguish it from other Sobolev spaces. We consider the simplest type of this space, which is $W^{1,1}(I)$, where $I$ is an interval in $\mathbb{R}$. The Sobolev norm in this case takes the form
$$\|u\|_{1,1} = \int_I |u|\, dx + \int_I |u'|\, dx.$$

The following theorem gives a relation between classical derivatives and weak deriva-
tives.

3.6.1 Absolute Continuity Characterization

Theorem 3.6.1 Let $I$ be an interval in $\mathbb{R}$, and let $u \in L^1(I)$ with weak derivative $u'$. Then $u \in W^{1,1}(I)$ if and only if there exists an absolutely continuous representative $\tilde{u} \in C(I)$ in $W^{1,1}(I)$ such that
$$\tilde{u}(x) = c + \int_a^x u'(t)\, dt$$
for every $x \in I$ and for some constant $c$.

Proof Choose $a \in I$ and define
$$\tilde{u}(x) = \int_a^x u'(t)\, dt.$$
Since $u \in W^{1,1}(I)$, $u' \in L^1(I)$, so $\tilde{u} \in L^1(I)$, and hence $\tilde{u} \in C(I)$ is absolutely continuous and $(\tilde{u})' = u'$, and consequently $\tilde{u} \in W^{1,1}(I)$. We have
$$(u - \tilde{u})' = 0,$$
and so
$$u - \tilde{u} = c.$$
Conversely, let $\tilde{u} \in C(I)$ be an absolutely continuous version of $u$ such that
$$(\tilde{u})' = u'.$$
Let $\varphi \in C_c^\infty(I)$ with $\mathrm{supp}(\varphi) \subseteq I$. Performing integration by parts gives
$$\int_I u'\varphi = -\int_I u\varphi'.$$
This implies that $Du$ exists, and since $u$ is absolutely continuous, we have by the Fundamental Theorem of Calculus
$$Du = u' \in L^1(I). \quad \square$$

According to the previous theorem, absolute continuity provides a necessary and sufficient condition for the weak and classical derivatives of functions in $L^1(I)$ to coincide. The importance of the previous result lies in the fact that it allows the replacement of any function $u \in W^{1,1}(I)$ by its continuous representative. Loosely speaking, we can view functions in $W^{1,1}(I)$ as absolutely continuous functions. The fact that $(\tilde{u})' = u'$ is well expected since any absolutely continuous function is differentiable a.e. by the Fundamental Theorem of Calculus, and if its weak derivative is in $L^1$ then it equals the classical derivative.
It is worth noting that the result does not hold in higher dimensions $\mathbb{R}^n$, $n > 1$; i.e., functions in $W^{1,p}(\Omega)$ for $n \ge 2$ need not be continuous and may be unbounded. We shall see when studying embedding theorems that, under certain conditions, they may coincide with continuous functions almost everywhere.
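As a quick illustration of Theorem 3.6.1 (an example added here, not from the text), take $I = (-1,1)$ and $u(x) = |x|$:

```latex
% u = |x| has weak derivative u'(t) = \operatorname{sgn}(t) \in L^1(I).
% Choosing a = -1 in the theorem,
\tilde{u}(x) = c + \int_{-1}^{x} \operatorname{sgn}(t)\,dt = c + \big(|x| - 1\big),
% so with c = 1 we recover \tilde{u}(x) = |x|, the absolutely continuous
% representative of u, confirming u \in W^{1,1}((-1,1)).
```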
The next result gives a connection between $W^{1,p}(\Omega)$ and $W_0^{1,p}(\Omega)$.
Proposition 3.6.2 Let $u \in W^{1,p}(\Omega)$, $1 \le p < \infty$. If there exists a compact set $K \subset \Omega$ such that $\mathrm{supp}(u) = K$, then $u \in W_0^{1,p}(\Omega)$.

Proof Let $\Omega'$ be open such that
$$K \subset \Omega' \subset\subset \Omega.$$

Define $\xi \in \mathcal{D}(\Omega)$ such that $\xi = 1$ on $K$ and
$$\mathrm{supp}(\xi) \subseteq \Omega'' \subset \Omega'$$
for some open $\Omega'' \supset K$. By Theorem 3.3.3, there exists $u_m \in \mathcal{D}(\Omega)$ such that
$$u_m \longrightarrow u$$
in $L^p(\Omega)$, and
$$\frac{\partial u_m}{\partial x_i} \longrightarrow \frac{\partial u}{\partial x_i}$$
on $\Omega'$ (why?). Then $\xi u_m \in \mathcal{D}(\Omega)$, and
$$\|\xi u_m - \xi u\|_{L^p(\Omega)} \le \|\xi\|_\infty \|u_m - u\|_{L^p(\Omega)} \longrightarrow 0.$$
Further, we have
$$\left\|\frac{\partial(\xi u_m)}{\partial x_i} - \frac{\partial(\xi u)}{\partial x_i}\right\|_{L^p(\Omega)} = \left\|\frac{\partial \xi}{\partial x_i}(u_m - u) + \xi\left(\frac{\partial u_m}{\partial x_i} - \frac{\partial u}{\partial x_i}\right)\right\|_{L^p(\Omega)} \le \left\|\frac{\partial \xi}{\partial x_i}\right\|_\infty \|u_m - u\|_{L^p(\Omega)} + \|\xi\|_\infty \left\|\frac{\partial u_m}{\partial x_i} - \frac{\partial u}{\partial x_i}\right\|_{L^p(\Omega)} \longrightarrow 0.$$
Therefore,
$$\xi u_m \longrightarrow \xi u = u$$
in $W^{1,p}(\Omega)$. Since $\mathrm{supp}(\xi u_m) \subset \Omega'$, we have
$$\xi u_m \in W_0^{1,p}(\Omega).$$
By completeness, $u \in W_0^{1,p}(\Omega)$. $\square$

Remark The result of the previous proposition can be easily extended to $W^{k,p}(\Omega)$.

Similar to Lemma 2.10.1, we have the following result.


Proposition 3.6.3 Let $u \in W^{1,p}(\mathbb{R}^n)$ for $1 \le p < \infty$, and $f \in L^1(\mathbb{R}^n)$. Then
$$u * f \in W^{1,p}(\mathbb{R}^n)$$
and
$$D_{x_i}(f * u) = f * D_{x_i} u.$$
Proof It is clear that $u * f \in L^p(\mathbb{R}^n)$. Let $\varphi \in \mathcal{D}(\mathbb{R}^n)$. Then
$$\int_{\mathbb{R}^n} D_{x_i}(f * u)\varphi = -\int_{\mathbb{R}^n} (f * u) D_{x_i}\varphi = -\int_{\mathbb{R}^n} u\,(f * D_{x_i}\varphi) = -\int_{\mathbb{R}^n} u\, D_{x_i}(f * \varphi) = \int_{\mathbb{R}^n} (D_{x_i} u)(f * \varphi) = \int_{\mathbb{R}^n} (f * D_{x_i} u)\varphi. \quad \square$$

3.6.2 Inclusions

Proposition 3.6.4 (Inclusion Results) Let $\Omega \subset \mathbb{R}^n$. Then the following inclusions hold:
1. If $k_1, k_2 \in \mathbb{N}$ such that $0 \le k_1 < k_2$, then
$$W^{k_2,p}(\Omega) \subset W^{k_1,p}(\Omega).$$
2. If $\Omega' \subset \Omega$ then
$$W^{k,p}(\Omega) \subset W^{k,p}(\Omega').$$
3. If $\Omega$ is bounded and $q \ge p$ then
$$W^{k,q}(\Omega) \subset W^{k,p}(\Omega).$$
4. If $\Omega$ is bounded then
$$W_0^{k,p}(\Omega) \subset W^{k,p}(\Omega) \subset W_{loc}^{k,p}(\Omega) \subset L_{loc}^{p}(\Omega).$$
5. For all $k \in \mathbb{N}$, we have
$$C^\infty(\overline{\Omega}) \subset C^k(\overline{\Omega}) \subset W^{k,p}(\Omega).$$
6. For all $k \in \mathbb{N}$, we have
$$C_c^\infty(\Omega) \subset C_c^k(\Omega) \subset W_0^{k,p}(\Omega).$$
Proof The proofs follow directly from the definitions of the spaces. $\square$
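The boundedness assumption in item 3 cannot be dropped; the following counterexample (added here for illustration) shows this on an unbounded interval:

```latex
% On \Omega = (1,\infty), take u(x) = 1/x. Then
\int_1^\infty x^{-2}\,dx < \infty, \qquad u'(x) = -x^{-2}, \qquad \int_1^\infty x^{-4}\,dx < \infty,
% so u \in W^{1,2}((1,\infty)). But
\int_1^\infty x^{-1}\,dx = \infty,
% so u \notin L^1((1,\infty)) \supset W^{1,1}((1,\infty)): with q = 2 \ge p = 1,
% the inclusion W^{k,q} \subset W^{k,p} fails on this unbounded domain.
```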

3.6.3 Chain Rule

The next result is a continuation of the calculus of weak derivatives. We have established various types of derivative formulas, and it remains to establish the chain rule, which plays an important role when we discuss the extension of Sobolev functions.
Theorem 3.6.5 Let $u \in W^{1,p}(\Omega)$, $1 \le p \le \infty$, and $F \in C^1(\mathbb{R})$ such that $|F'| \le M$. Then
$$\frac{\partial}{\partial x_j}(F \circ u) = F'(u) \cdot \frac{\partial u}{\partial x_j}.$$
Moreover, if $\Omega$ is bounded or $F(0) = 0$, then
$$F \circ u \in W^{1,p}(\Omega).$$

Proof Let $1 \le p < \infty$. Since $\dfrac{\partial u}{\partial x_i} \in L^p(\Omega)$ for $1 \le i \le n$, and $F' \in C(\mathbb{R})$,
$$\int_\Omega \left|F'(u)\frac{\partial u}{\partial x_i}\right|^p dx \le M^p \int_\Omega \left|\frac{\partial u}{\partial x_i}\right|^p dx < \infty.$$
Hence,
$$F'(u)\frac{\partial u}{\partial x_i} \in L^p(\Omega). \qquad (3.6.1)$$

By Theorem 3.3.3, consider the convolution approximating sequence $u_\epsilon \in C_c^\infty(\Omega)$ converging to $u$ in $L^p(\Omega)$. Similarly to the above argument, we also see that
$$\|F(u_\epsilon) - F(u)\|_{L^p(\Omega)} \le M\|u_\epsilon - u\|_{L^p(\Omega)} \longrightarrow 0.$$
Hence,
$$F(u_\epsilon) \longrightarrow F(u)$$
in $L^p(\Omega)$. Furthermore, since $F'$ is continuous and $u_\epsilon \longrightarrow u$ a.e. on $\Omega$, $F'(u_\epsilon) \longrightarrow F'(u)$ a.e. on $\Omega$, and we also have $|F'| \le M$ on $\Omega$, so we apply the Dominated Convergence Theorem to obtain
$$F'(u_\epsilon) \longrightarrow F'(u)$$
in $L^p(\Omega)$. Note that
$$\frac{\partial u_\epsilon}{\partial x_i} = \left(\frac{\partial u}{\partial x_i}\right)_\epsilon,$$
and since $\dfrac{\partial u}{\partial x_i} \in L^p(\Omega)$, we have by Theorem 3.3.2
$$\frac{\partial u_\epsilon}{\partial x_i} \longrightarrow \frac{\partial u}{\partial x_i} \quad \text{in } L^p(\Omega).$$
Then
$$\left\|F'(u_\epsilon)\frac{\partial u_\epsilon}{\partial x_i} - F'(u)\frac{\partial u}{\partial x_i}\right\|_{L^p(\Omega)} \le M\left\|\frac{\partial u_\epsilon}{\partial x_i} - \frac{\partial u}{\partial x_i}\right\|_{L^p(\Omega)} + \left\|\big[F'(u_\epsilon) - F'(u)\big]\frac{\partial u}{\partial x_i}\right\|_{L^p(\Omega)}.$$
Hence,
$$F'(u_\epsilon)\frac{\partial u_\epsilon}{\partial x_i} \longrightarrow F'(u)\frac{\partial u}{\partial x_i}$$
in $L^p(\Omega)$.

in L p (). Now, let ϕ ∈ Cc∞ (), and writing  = n → 0 as n −→ ∞. We use the


classical chain rule on F,
∂ϕ ∂ϕ
F(u) d x = lim F(u n ) dx
 ∂xi  ∂xi

= − lim (F(u n ))ϕd x
 ∂x i
∂u n
= − lim F (u n ) ϕ(x)d x
 ∂xi
∂u
=− F (u) ϕ(x)d x.
 ∂x i

Consequently,
∂ F(u(x) ∂u
= F (u) ,
∂xi ∂xi

and the result follows by (3.6.1).


If F(0) = 0, then, using the Mean Value Theorem, we have
u
|F(u)| = |F(u) − F(0)| ≤ F (t) dt ≤ Mu.
0

Integrating over ,

F L p () ≤ M u L p () < ∞.


So $F \circ u \in L^p(\Omega)$, and so (3.6.1) implies that
$$F \circ u \in W^{1,p}(\Omega).$$
If $\Omega$ is bounded, then
$$\int_\Omega |F(0)|^p\, dx < \infty,$$
so $F(0) \in L^p(\Omega)$. Also,
$$|F(u)| \le |F(u) - F(0)| + |F(0)| \le M|u| + |F(0)|,$$
and since $u \in L^p(\Omega)$, we have
$$M|u| + |F(0)| \in L^p(\Omega),$$
so
$$\int_\Omega \big(M|u| + |F(0)|\big)^p < \infty,$$
thus $F \circ u \in L^p(\Omega)$.
For $p = \infty$, note that if $u \in W^{1,\infty}(\Omega)$, then $u \in W^{1,p}(\Omega)$ and the chain rule holds for all $p < \infty$. $\square$

Corollary 3.6.6 Let $u \in W^{1,p}(\Omega)$ for $\Omega \subset \mathbb{R}^n$, $1 \le p \le \infty$. Then $|u| \in W^{1,p}(\Omega)$, and for all $|\alpha| = 1$,
$$D^\alpha(|u|) = \begin{cases} D^\alpha u & u > 0 \\ 0 & u = 0 \\ -D^\alpha u & u < 0. \end{cases}$$

Proof Since $|u| = u^+ + u^-$, where $u^+ = \max(u, 0)$ and
$$u^- = (-u)^+ = -\min(u, 0),$$
it suffices to prove the result for $u^+$. For every $t \in \mathbb{R}$, define
$$F_\epsilon(t) = \begin{cases} \sqrt{t^2 + \epsilon^2} - \epsilon & t > 0 \\ 0 & t \le 0. \end{cases}$$
It is clear that $F_\epsilon \in C^1(\mathbb{R})$. By the chain rule, $F_\epsilon(u) \in W^{1,p}(\Omega)$ and
$$D(F_\epsilon(u)) = \begin{cases} \dfrac{u}{\sqrt{u^2 + \epsilon^2}}\dfrac{\partial u}{\partial x_i} & u > 0 \\ 0 & u \le 0. \end{cases}$$
It follows that
$$\int_\Omega F_\epsilon(u)\frac{\partial \varphi}{\partial x_i}\, dx = -\int_{\Omega^+} \varphi\frac{u}{\sqrt{u^2 + \epsilon^2}}\frac{\partial u}{\partial x_i}\, dx,$$
where $\Omega^+ = \{x \in \Omega : u > 0\}$. But in $\Omega^+$ we have $F_\epsilon' < 1$, and we also have $F_\epsilon(u) \longrightarrow u^+$ and
$$\frac{u}{\sqrt{u^2 + \epsilon^2}} \longrightarrow 1 \quad \text{a.e. in } \Omega^+.$$
Applying the Dominated Convergence Theorem,
$$\int_{\Omega^+} u^+\frac{\partial \varphi}{\partial x_i}\, dx = -\int_{\Omega^+} \frac{\partial u}{\partial x_i}\varphi\, dx.$$
Therefore, $u^+ \in W^{1,p}(\Omega)$, and
$$D^\alpha(u^+) = \begin{cases} \dfrac{\partial u}{\partial x_i} & u > 0 \\ 0 & u \le 0. \end{cases} \qquad (3.6.2)$$
A similar argument shows that
$$D^\alpha(u^-) = \begin{cases} 0 & u \ge 0 \\ -\dfrac{\partial u}{\partial x_i} & u < 0, \end{cases} \qquad (3.6.3)$$
and so $|u| \in W^{1,p}(\Omega)$ and the weak derivative is the summation of (3.6.2) and (3.6.3). The details are left to the reader. $\square$
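A one-dimensional instance of the corollary (an illustration added here, not from the text): on $\Omega = (-\pi, \pi)$, let $u(x) = \sin x$.

```latex
% |u| = |\sin x| is Lipschitz, and by the corollary its weak derivative is
D(|\sin x|) = \begin{cases} \cos x & \sin x > 0,\\ 0 & \sin x = 0,\\ -\cos x & \sin x < 0, \end{cases}
% i.e. D(|u|) = \operatorname{sgn}(\sin x)\cos x \in L^\infty((-\pi,\pi)),
% so |u| \in W^{1,p}((-\pi,\pi)) for every 1 \le p \le \infty.
```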

3.6.4 Dual Space of $W^{1,p}(\Omega)$

Recall from basic functional analysis that the dual space $X^*$ of a normed space $X$ was defined to be the space consisting of all bounded linear functionals on $X$; i.e., $f \in X^*$ if $f : X \longrightarrow \mathbb{R}$ is a bounded linear functional. (Here, the scalar field could also be $\mathbb{C}$, but we only focus on $\mathbb{R}$ in this book.) A fundamental result is that if $p$ and $q$ are conjugates in the sense that $\frac{1}{p} + \frac{1}{q} = 1$, then
$$(L^p)^* = L^q.$$

Since $W^{1,p}(\Omega)$ contains $L^p$ functions and their derivatives, one can define the dual space as follows:
Definition 3.6.7 (Dual of Sobolev Space) Let $q$ be the conjugate of $p$. Then the dual space $(W^{1,p}(\Omega))^*$ of the Sobolev space $W^{1,p}(\Omega)$ is defined as
$$(W^{1,p}(\Omega))^* = W^{-1,q}(\Omega).$$
In general, we have
$$(W^{k,p}(\Omega))^* = W^{-k,q}(\Omega).$$
For $p = 2$, the space $H^{-1}$ is the dual space of $H_0^1$. The norm defined on $H^{-1}$ can be written as follows:
$$\|f\|_{H^{-1}} = \sup\{\langle f, u\rangle : u \in H_0^1(\Omega), \|u\|_{H_0^1(\Omega)} = 1\}.$$

In principle, the space $W^{-1,q}(\Omega)$ contains the distributional derivatives of $L^p$ functions identified by regular distributions, even if these derivatives are not weak (hence not included in $W^{1,p}(\Omega)$). In particular, it consists of all distributions in $\mathcal{D}'(\Omega)$ of the form
$$f = f_0 + \sum_{i=1}^n \frac{\partial f_i}{\partial x_i},$$
for $f_0, f_i \in L^2(\Omega)$. The Heaviside function is an example of a function in $H^{-1}(\Omega)$ but not in $L^2(\Omega)$ (verify). More precisely, suppose $u \in H_0^1(\Omega)$; then $Du \in H^{-1}(\Omega)$, and hence $D^2 u \in H^{-1}(\Omega)$ since
$$\int_\Omega v D^2 u = -\int_\Omega Dv \cdot Du.$$
This implies the following proper general inclusion:
$$W^{1,p}(\Omega) \subset L^p(\Omega) \subset W^{-1,q}(\Omega).$$
In particular,
$$H_0^1(\Omega) \subset L^2(\Omega) \subset H^{-1}(\Omega). \qquad (3.6.4)$$
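A standard one-dimensional example (added here for illustration) of an element of $H^{-1}$ that is not an $L^2$ function is the Dirac distribution:

```latex
% On \Omega = (-1,1), let H be the Heaviside function, H \in L^2(\Omega). Then
\delta_0 = f_0 + \frac{\partial f_1}{\partial x}, \qquad f_0 = 0,\quad f_1 = H \in L^2(\Omega),
% so \delta_0 has exactly the displayed form and hence \delta_0 \in H^{-1}(\Omega),
% acting by \langle \delta_0, v\rangle = v(0) for v \in H_0^1(\Omega),
% although \delta_0 \notin L^2(\Omega).
```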

By the Riesz representation theorem, any functional $f \in H^{-1}(\Omega)$ can be represented by an inner product with some $u \in H_0^1(\Omega)$ such that
$$\langle u, v\rangle_{H_0^1(\Omega)} = \int_\Omega uv + \int_\Omega Du \cdot Dv$$
for all $v \in H_0^1(\Omega)$. On the other hand, every $u \in L^2$ defines a functional
$$f_u \in H^{-1}(\Omega)$$
such that for all $v \in H_0^1(\Omega)$
$$\langle u, v\rangle_{L^2(\Omega)} = \int_\Omega uv.$$
3.7 Approximation of Sobolev Spaces

Recall that in Theorem 3.2.2 it was shown that if $u \in L^p(\Omega)$ then $u_\epsilon \in C^\infty(\Omega_\epsilon)$, and
$$D^\alpha u_\epsilon = \varphi_\epsilon * D^\alpha u$$
for all $x \in \Omega_\epsilon$, where
$$\Omega_\epsilon = \{x \in \Omega : d(x, \partial\Omega) > \epsilon\} = \{x \in \Omega : B_\epsilon(x) \subseteq \Omega\}.$$
Theorem 3.3.2 also established the fact that $u_\epsilon \to u$ in $L^p(\Omega)$. This gives an approximating sequence in the form of a convolution which approaches $u$ from the interior of the domain. For every fixed $\epsilon > 0$, we can have some $x$ in the interior neighborhood $\Omega_\epsilon \subset \Omega$, and this interior neighborhood approaches $\Omega$ from inside. This is a "local" type of approximation. This localness necessarily requires a bounded subset $\Omega$. As soon as we consider the whole space $\mathbb{R}^n$, the localness property should disappear.

3.7.1 Local Approximation

Proposition 3.7.1 Let $u \in W^{k,p}(\Omega)$ for $1 \le p < \infty$. Then
$$u_\epsilon \longrightarrow u \quad \text{in } W_{loc}^{k,p}(\Omega).$$
If $\Omega = \mathbb{R}^n$ then
$$u_\epsilon \longrightarrow u \quad \text{in } W^{k,p}(\mathbb{R}^n).$$
Proof By definition of Sobolev spaces, $u \in L^p(\Omega)$, so by Theorem 3.3.2, $u_\epsilon \longrightarrow u$ in $L^p(\Omega)$. Let
$$\Omega_\epsilon = \{x \in \Omega : d(x, \partial\Omega) > \epsilon\}$$
for $\epsilon > 0$. Then, by Theorem 3.5.7(2), we have $u_\epsilon \in C^\infty(\Omega_\epsilon)$, and
$$D^\alpha u_\epsilon = \varphi_\epsilon * D^\alpha u \qquad (3.7.1)$$
for all $x \in \Omega_\epsilon$. Therefore,
$$u_\epsilon \in W^{k,p}(\Omega_\epsilon).$$
Choosing any compact set $\Omega' \subset\subset \Omega$ such that $\Omega' \subset \Omega_\epsilon$ for some small $\epsilon > 0$, then for all $|\alpha| \le k$ and letting $\epsilon \to 0^+$, we have
$$\|D^\alpha u_\epsilon - D^\alpha u\|_{L^p(\Omega')} = \|(D^\alpha u)_\epsilon - D^\alpha u\|_{L^p(\Omega')} \longrightarrow 0.$$
This proves the first part.
If $\Omega = \mathbb{R}^n$, then $\Omega_\epsilon = \mathbb{R}^n$ and (3.7.1) holds for all $x \in \mathbb{R}^n$, so for all $|\alpha| \le k$ we have
$$\|D^\alpha(u_\epsilon) - D^\alpha u\|_{L^p(\mathbb{R}^n)} = \|(D^\alpha u)_\epsilon - D^\alpha u\|_{L^p(\mathbb{R}^n)} \longrightarrow 0,$$
i.e., $u_\epsilon \longrightarrow u$ in $W^{k,p}(\mathbb{R}^n)$. $\square$

In this sense, every Sobolev function can be locally approximated by a smooth function in the Sobolev norm whenever the values of $x$ are away from the boundary of the original domain. If we make the distance smaller (i.e., $\epsilon \to 0$), the interior neighborhood gets larger and will absorb new values of $x$, until we eventually evaluate the function at the boundary (since $\Omega_\epsilon \to \Omega$). This is the essence of the localness property.
To get a stronger result, we need to globalize the approximation regardless of the neighborhood taken for this approximating process. As pointed out in Sect. 3.2, the partition of unity shall be invoked here.

3.7.2 Global Approximation

As discussed above, in order to remove the localness property, one needs to work on the whole space. However, one can still obtain global results when considering bounded sets. The following theorem extends our previous result from local to global.

Theorem 3.7.2 (Meyers–Serrin Theorem) For every open set $\Omega \subseteq \mathbb{R}^n$ and $1 \le p < \infty$, we have
$$\overline{C^\infty(\Omega) \cap W^{k,p}(\Omega)} = W^{k,p}(\Omega).$$

Proof We first consider the case $\Omega = \mathbb{R}^n$. Let $u \in W^{k,p}(\mathbb{R}^n)$, so $D^\alpha u \in L^p(\mathbb{R}^n)$ for all $|\alpha| \le k$. Consider the mollification
$$u_\epsilon = \varphi_\epsilon * u,$$
where $\varphi_\epsilon$ is the standard mollifier. Then, by Theorem 3.5.7, $u_\epsilon \in C^\infty(\mathbb{R}^n)$ and
$$\|u_\epsilon\|_{k,p} \le \|u\|_{k,p},$$
from which we conclude that $D^\alpha u_\epsilon \in L^p(\mathbb{R}^n)$ and
$$\|D^\alpha u_\epsilon - D^\alpha u\|_p \to 0$$
in $L^p(\mathbb{R}^n)$. Therefore,
$$\|u_\epsilon - u\|_{k,p}^p = \sum_{|\alpha| \le k} \|D^\alpha u_\epsilon - D^\alpha u\|_p^p \xrightarrow{\epsilon \to 0} 0,$$
$$u_\epsilon \in C^\infty(\mathbb{R}^n) \cap W^{k,p}(\mathbb{R}^n),$$
and $u_\epsilon \longrightarrow u$ in $W^{k,p}(\mathbb{R}^n)$.
Now, let $\Omega$ be open in $\mathbb{R}^n$. Then there exists a smooth locally finite partition of unity $\tilde{\xi}_i \in C_c^\infty(\Omega)$ subordinate to a cover $\{U_i\}$. Let $\delta > 0$, and $u \in W^{k,p}(\Omega)$. Define
$$u_i(x) = \tilde{\xi}_i(x)u(x). \qquad (3.7.2)$$
Then by Theorem 3.5.6, $u_i \in W^{k,p}(\Omega)$, so $(u_i)_{\epsilon_i} \in C^\infty(U_i)$, $\mathrm{supp}((u_i)_{\epsilon_i}) \subset U_i$, and for small $\epsilon_i$
$$\|(u_i)_{\epsilon_i} - u_i\|_{W^{k,p}(\Omega)} \le \frac{\delta}{2^i}.$$
Now define
$$v(x) = \sum_i (u_i)_{\epsilon_i} = \sum_i (\varphi_{\epsilon_i} * u_i)(x). \qquad (3.7.3)$$
Since for every $x \in \Omega$, $\tilde{\xi}_i(x) = 0$ for all but a finite number of indices $i$, we have the same conclusion for $\sum_i (u_i)_{\epsilon_i}$, and hence $v \in C^\infty(\Omega)$. Also, note that
$$\sum_i u_i(x) = \sum_i \tilde{\xi}_i(x)u(x) = u(x)\sum_i \tilde{\xi}_i(x) = u(x).$$
Given $\delta > 0$, there exist small $\epsilon_i$ such that
$$\|v - u\|_{W^{k,p}(\Omega)} = \Big\|\sum_i (\varphi_{\epsilon_i} * u_i)(x) - \sum_i u_i(x)\Big\|_{W^{k,p}(\Omega)} \le \sum_i \|(u_i)_{\epsilon_i}(x) - u_i(x)\|_{W^{k,p}(\Omega)} \le \sum_i \frac{\delta}{2^i} = \delta. \quad \square$$

3.7.3 Consequences of Meyers–Serrin Theorem

Theorem 3.7.2 implies that
$$\overline{C^\infty(\Omega)} = W^{k,p}(\Omega) \qquad (3.7.4)$$
in the Sobolev norm $\|\cdot\|_{k,p}$; that is, for every $u \in W^{k,p}(\Omega)$ there is a sequence $u_n \in C^\infty(\Omega)$ such that
$$\|u_n - u\|_{W^{k,p}(\Omega)} \longrightarrow 0.$$
This is a significant advantage over the local approximation. The theorem has several important consequences. One consequence is the following corollary, which is analogous to (3.7.4).
Corollary 3.7.3 $\overline{C_c^\infty(\Omega)} = W_0^{k,p}(\Omega)$ in the Sobolev norm $\|\cdot\|_{k,p}$.
Proof In the proof of the previous theorem, let $u \in W_0^{k,p}(\Omega)$; then the sequence $u_i$ in (3.7.2) belongs to $W_0^{k,p}(\Omega)$, hence by (3.7.3) and the argument thereafter, we have $v \in C_c^\infty(\Omega)$. This gives
$$\overline{C_c^\infty(\Omega) \cap W^{k,p}(\Omega)} = W_0^{k,p}(\Omega). \quad \square$$
This result serves as an alternative definition of $W_0^{k,p}(\Omega)$.
Definition 3.7.4 (Zero-Boundary Sobolev Space) Let $\Omega$ be open in $\mathbb{R}^n$. The space $W_0^{k,p}(\Omega)$ is defined as the closure of $C_c^\infty(\Omega)$ in the Sobolev norm $\|\cdot\|_{k,p}$.

The proof of the previous corollary clearly shows that Definition 3.5.8 implies Definition 3.7.4, whereas Definition 3.7.4 trivially implies Definition 3.5.8, thus the two definitions are equivalent.
Another important consequence of Meyers–Serrin is that any Sobolev function on the whole space $\mathbb{R}^n$ can be approximated by a smooth function in the $\|\cdot\|_{k,p}$ norm, i.e.,
$$\overline{C^\infty(\mathbb{R}^n)} = W^{k,p}(\mathbb{R}^n)$$
in the Sobolev norm $\|\cdot\|_{k,p}$. The next result has even more to say.

Proposition 3.7.5 $\overline{C_c^\infty(\mathbb{R}^n)} = W^{k,p}(\mathbb{R}^n)$ in the Sobolev norm $\|\cdot\|_{k,p}$ for $1 \le p < \infty$.
Proof Let $u \in W^{k,p}(\mathbb{R}^n)$. By the Meyers–Serrin Theorem, there exists $u_j \in C^\infty(\mathbb{R}^n)$ such that $u_j \longrightarrow u$ in the norm $\|\cdot\|_{k,p}$. Consider the cut-off function
$$\xi(x) \in C_c^\infty(\mathbb{R}^n),$$
define the sequence
$$\xi_j(x) = \xi\left(\frac{x}{j}\right) \in C_c^\infty(\mathbb{R}^n),$$
and set
$$v_j(x) = u_j(x)\xi_j(x).$$
Then $v_j \in C_c^\infty(\mathbb{R}^n)$, and clearly $v_j(x) \to u$ a.e. Differentiating for $|\alpha| = 1$ and $i = 1, 2, \dots, n$, we obtain
$$\frac{\partial v_j}{\partial x_i} = \xi_j\frac{\partial u_j}{\partial x_i} + \frac{1}{j}\frac{\partial \xi}{\partial x_i}\!\left(\frac{x}{j}\right)u_j \xrightarrow{\text{a.e.}} \frac{\partial u}{\partial x_i}$$
as $j \longrightarrow \infty$. By the Dominated Convergence Theorem, we can show that
$$\left\|\frac{\partial v_j}{\partial x_i} - \frac{\partial u}{\partial x_i}\right\|_{L^p(\mathbb{R}^n)} \longrightarrow 0,$$
i.e.,
$$\frac{\partial v_j}{\partial x_i} \longrightarrow \frac{\partial u}{\partial x_i} \quad \text{in } L^p(\mathbb{R}^n).$$
A similar argument can be done for higher derivatives up to $k$ using the Leibnitz rule to obtain
$$\|D^\alpha(v_j) - D^\alpha u\|_p \to 0.$$
Thus,
$$D^\alpha(v_j) \in L^p(\mathbb{R}^n)$$
for all $|\alpha| \le k$, and so $v_j \in W^{k,p}(\mathbb{R}^n)$, and
$$\|v_j - u\|_{k,p} \to 0. \quad \square$$
Two important results can be inferred from the previous result. It was indicated earlier that the Sobolev space is supposed to be the completion of the Schwartz space. Since
$$\mathcal{D}(\mathbb{R}^n) \subseteq \mathcal{S}(\mathbb{R}^n) \subseteq W^{k,p}(\mathbb{R}^n),$$
we infer
Corollary 3.7.6 $\overline{\mathcal{S}(\mathbb{R}^n)} = W^{k,p}(\mathbb{R}^n)$.
The second result that can be inferred from Corollary 3.7.3 is about the connection between the two Sobolev spaces $W$ and $W_0$. We have the inclusion
$$W_0^{k,p}(\Omega) \subset W^{k,p}(\Omega)$$
as in Proposition 3.6.4(4), but this becomes different when $\Omega = \mathbb{R}^n$.
Corollary 3.7.7 $W^{k,p}(\mathbb{R}^n) = W_0^{k,p}(\mathbb{R}^n)$.
Proof Note that from the proof of Proposition 3.7.5, for every $u \in W^{k,p}(\mathbb{R}^n)$, we can find an approximating sequence
$$v_j \in C_c^\infty(\mathbb{R}^n) \subset W^{k,p}(\mathbb{R}^n),$$
so $v_j \in W_0^{k,p}(\mathbb{R}^n)$. Since $v_j \longrightarrow u$ in the $\|\cdot\|_{k,p}$ norm, by the completeness of the space we obtain $u \in W_0^{k,p}(\mathbb{R}^n)$. $\square$

3.8 Extensions

3.8.1 Motivation

In the previous section, we obtained our results on bounded sets $\Omega$ and on $\mathbb{R}^n$. In general, the behavior of Sobolev functions on the boundary of the domain has always been a critical issue that could significantly affect the properties of the (weak) solutions of a partial differential equation. In this regard, it might sometimes be useful to extend from $W^{k,p}(\Omega)$ to $W^{k,p}(\Omega')$ for some $\Omega \subset \Omega'$, and in particular from $W^{k,p}(\Omega)$ to $W^{k,p}(\mathbb{R}^n)$, because functions in $W^{k,p}(\Omega)$ would then inherit some important properties from those in $W^{k,p}(\mathbb{R}^n)$. This boils down to extending Sobolev functions defined on a bounded set $\Omega$ to be defined on $\mathbb{R}^n$. However, we need to make certain that our new functions preserve the weak derivative and other geometric properties across the boundary. One of the many important goals is to use the extension to obtain embedding results for $W^{k,p}(\Omega)$ from $W^{k,p}(\mathbb{R}^n)$. It should be noted that we have already used the zero extension in (3.3.7) in the proof of Theorem 3.3.2 in the case $\Omega \subset \mathbb{R}^n$. The treatment there wasn't really problematic because we dealt merely with functions. In Sobolev spaces, the issue becomes more delicate due to the involvement of weak derivatives.

3.8.2 The Zero Extension

The first type of extension is the zero extension. For a function $f$ defined on $\Omega$, the zero extension can be simply defined by
$$\bar{f}(x) = f(x) \cdot \chi_\Omega(x) = \begin{cases} f(x) & x \in \Omega \\ 0 & x \in \mathbb{R}^n \setminus \Omega. \end{cases} \qquad (3.8.1)$$
So $\bar{f}$ is defined on $\mathbb{R}^n$. Dealing with $L^p$ spaces makes this extension possible since the functions in $L^p$ are considered the same if the difference between them is of measure zero, so even if $f$ does not vanish on $\partial\Omega$, the zero extension (3.8.1) is still in $L^p$.
Proposition 3.8.1 Let $\Omega$ be open in $\mathbb{R}^n$, and let $u \in L^p(\Omega)$ for some $1 \le p < \infty$. Then there exists a sequence $u_n \in L^p(\mathbb{R}^n)$ such that
$$\|u_n\|_{L^p(\mathbb{R}^n)} \le \|u\|_{L^p(\Omega)}.$$
Proof Let $\bar{u} \in L^p(\mathbb{R}^n)$ be the zero extension of $u$. For $\epsilon > 0$, consider the convolution approximating sequence
$$\bar{u}_\epsilon(x) = \bar{u}(x) * \varphi_\epsilon(x)$$
defined on $\Omega_\epsilon$. This gives
$$\bar{u}(x) * \varphi_\epsilon(x) = \int_{\mathbb{R}^n} \bar{u}(y)\varphi_\epsilon(x - y)\, dy = \int_{\Omega \cap B_\epsilon(x)} u(y)\varphi_\epsilon(x - y)\, dy = u(x) * \varphi_\epsilon(x) \quad \text{on } \Omega_\epsilon.$$
Hence, by Theorem 3.2.2(3), and writing $u_n = u_{\epsilon_n}$ for $\epsilon_n \longrightarrow 0^+$, we obtain
$$\|u_n\|_{L^p(\mathbb{R}^n)} = \|u_n\|_{L^p(\Omega)} \le \|u\|_{L^p(\Omega)}. \quad \square$$

Recall that in the proof of Theorem 3.3.2, we first proved that $u_\epsilon \to u$ in $L^p(\mathbb{R}^n)$ for $u \in L^p(\mathbb{R}^n)$, from which we concluded that $C_c^\infty(\mathbb{R}^n)$ is dense in $L^p(\mathbb{R}^n)$. Then, we assumed that $\Omega \subset \mathbb{R}^n$, and we used the zero extension (3.3.7) to obtain the sequence $\bar{u}_\epsilon \in C^\infty(\mathbb{R}^n)$ which converges to $u$ in $L^p(\Omega)$. Now, we will assume that Theorem 3.3.3 holds only for $\Omega = \mathbb{R}^n$, and we will prove the general case using the zero extension.
Proposition 3.8.2 For any open set $\Omega$ in $\mathbb{R}^n$, the space $C_c^\infty(\Omega)$ is dense in $L^p(\Omega)$ for $1 \le p < \infty$.

Proof Let $u \in L^p(\Omega)$. Then $\bar{u} \in L^p(\mathbb{R}^n)$. Hence, by Theorems 3.3.2 and 3.3.3, the mollification $\bar{u}_m \in C^\infty(\mathbb{R}^n)$ and $\bar{u}_m \longrightarrow \bar{u}$ in $L^p(\mathbb{R}^n)$. Hence,
$$\|\bar{u}_m - u\|_{L^p(\Omega)} \le \|\bar{u}_m - \bar{u}\|_{L^p(\mathbb{R}^n)} \longrightarrow 0.$$
For convenience, set $u_m = \bar{u}_m$, consider a partition of unity $\xi_m \in C_c^\infty(\Omega)$, and define the sequence
$$w_m = u_m \xi_m.$$
Clearly, $w_m \in C_c^\infty(\Omega)$. Then
$$\|w_m - u\|_{L^p(\Omega)} \le \|w_m - \xi_m u\|_{L^p(\Omega)} + \|\xi_m u - u\|_{L^p(\Omega)}.$$
But
$$\|w_m - \xi_m u\|_{L^p(\Omega)} = \|\xi_m(u_m - u)\|_{L^p(\Omega)} \le \|u_m - u\|_{L^p(\Omega)} \to 0.$$
For the second term, note that
$$|\xi_m - 1|^p |u|^p \le |u|^p.$$
So the Dominated Convergence Theorem gives
$$\|\xi_m u - u\|_{L^p(\Omega)} \longrightarrow 0.$$
Hence, $w_m \longrightarrow u$ in $L^p(\Omega)$. $\square$

The situation in Sobolev spaces is more delicate since they involve weak derivatives. The zero extension breaks the graph of the function across the boundary, a jump discontinuity may occur, and consequently the weak derivatives could fail to exist. The space $W_0^{k,p}(\Omega)$ will play an important role here since functions in this space are already assumed to vanish at the boundary, so the zero extension won't break the graph.
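The following one-dimensional counterexample (added here for illustration) shows how the zero extension can destroy the weak derivative when the function does not vanish at the boundary:

```latex
% Let \Omega = (0,1) and u \equiv 1, so u \in W^{1,p}(\Omega) with u' = 0.
% The zero extension is \bar{u} = \chi_{(0,1)}, and for \varphi \in C_c^\infty(\mathbb{R}),
\int_{\mathbb{R}} \bar{u}\,\varphi'\,dx = \int_0^1 \varphi'\,dx = \varphi(1) - \varphi(0),
% which cannot equal -\int_{\mathbb{R}} g\varphi\,dx for any locally integrable g.
% Hence \bar{u} \notin W^{1,p}(\mathbb{R}): the jump at \partial\Omega destroys
% the weak derivative.
```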
Proposition 3.8.3 Let $\Omega$ be open in $\mathbb{R}^n$, and let $u \in W_0^{k,p}(\Omega)$ for some $1 < p < \infty$. Then $\bar{u} \in W^{k,p}(\mathbb{R}^n)$ and
$$\|\bar{u}\|_{W^{k,p}(\mathbb{R}^n)} = \|u\|_{W^{k,p}(\Omega)}.$$

Proof If $u \in W_0^{k,p}(\Omega)$, then $u \in L^p(\Omega)$, and so $\bar{u} \in L^p(\mathbb{R}^n)$. Moreover, by the Meyers–Serrin Theorem, there exists a sequence $u_m \in C_c^\infty(\Omega)$ such that $u_m \longrightarrow u$ in $W^{k,p}(\Omega)$, so $u_m \longrightarrow u$ in $L^p(\Omega)$. Also, there exists $\bar{u}_m \in C_c^\infty(\mathbb{R}^n)$ such that
$$\|\bar{u}_m - \bar{u}\|_{L^p(\mathbb{R}^n)} = \|u_m - u\|_{L^p(\Omega)} \longrightarrow 0,$$
so $\bar{u}_m \longrightarrow \bar{u}$ in $L^p(\mathbb{R}^n)$. Note that $u_m$ is Cauchy in $W^{k,p}(\Omega)$, and $\bar{u}_m$ is Cauchy in $L^p(\mathbb{R}^n)$, so we have for all $|\alpha| \le k$,
$$\|D^\alpha \bar{u}_j - D^\alpha \bar{u}_m\|_{W^{k,p}(\mathbb{R}^n)} = \|D^\alpha u_j - D^\alpha u_m\|_{W^{k,p}(\Omega)} \longrightarrow 0,$$
so $\bar{u}_m$ is Cauchy in $W^{k,p}(\mathbb{R}^n)$, and thus by completeness, $\bar{u}_m \longrightarrow v$ in $W^{k,p}(\mathbb{R}^n)$, hence $v = \bar{u} \in W^{k,p}(\mathbb{R}^n)$. Let $\varphi \in C_c^\infty(\mathbb{R}^n)$, and note that $u_m \in C_c^\infty(\Omega)$. Then
$$\int_{\mathbb{R}^n} \bar{u} D^\alpha \varphi\, dx = \int_\Omega u D^\alpha \varphi\, dx = \int_\Omega (\lim u_m) D^\alpha \varphi\, dx = \lim \int_\Omega u_m D^\alpha \varphi\, dx = (-1)^{|\alpha|} \lim \int_\Omega (D^\alpha u_m)\varphi\, dx = (-1)^{|\alpha|} \int_\Omega (D^\alpha u)\varphi\, dx = (-1)^{|\alpha|} \int_{\mathbb{R}^n} \overline{D^\alpha u}\,\varphi\, dx.$$
So
$$D^\alpha \bar{u} = \overline{D^\alpha u} \quad \text{on } \mathbb{R}^n,$$
and clearly $\|\bar{u}\|_{L^p(\mathbb{R}^n)} = \|u\|_{L^p(\Omega)}$. Consequently,
$$\|\bar{u}\|_{W^{k,p}(\mathbb{R}^n)} = \|u\|_{W^{k,p}(\Omega)}. \quad \square$$


It should be noted that this result doesn't hold in general for $W^{k,p}(\Omega)$, because functions in this space don't necessarily vanish at the boundary. Instead, we will find a sequence in $C_c^\infty(\mathbb{R}^n)$ approaching the function on $\Omega$, but in $L^p(\Omega)$ only, and since we can't guarantee the existence of the weak derivatives across the boundary $\partial\Omega$, we always need to investigate the convergence of weak derivatives inside $\Omega$, staying away from $\partial\Omega$. Hence, our best tool here is the compact inclusion.
Proposition 3.8.4 Let $\Omega$ be open in $\mathbb{R}^n$, and let $u \in W^{k,p}(\Omega)$ for some $1 \le p \le \infty$. Then, there exists $u_m \in C_c^\infty(\mathbb{R}^n)$ such that
$$u_m \longrightarrow u \quad \text{in } L^p(\Omega)$$
and
$$D^\alpha u_m \longrightarrow D^\alpha u \quad \text{in } L^p(\Omega')$$
for every $\Omega' \subset\subset \Omega$.

Proof Consider the zero extension $\bar{u} \in L^p(\mathbb{R}^n)$. Define
$$v_m = \varphi_m * \bar{u} \in C^\infty(\mathbb{R}^n),$$
so that $v_m \longrightarrow \bar{u}$ in $L^p(\mathbb{R}^n)$, and moreover,
$$\frac{\partial v_m}{\partial x_i} \longrightarrow \frac{\partial u}{\partial x_i} \quad \text{in } L^p(\Omega')$$
for every $\Omega' \subset\subset \Omega$. Define the sequence of cut-off functions
$$\xi_m(x) = \xi\left(\frac{x}{m}\right) \quad \text{on } \Omega,$$
and set
$$u_m(x) = \xi_m(x)v_m(x).$$
Then $u_m \in C_c^\infty(\mathbb{R}^n)$ and
$$u_m \longrightarrow \bar{u} = u \quad \text{a.e. on } \Omega,$$
and it can be shown using the Dominated Convergence Theorem that
$$\|u_m - u\|_{L^p(\Omega)} \longrightarrow 0.$$
Moreover, for every $\Omega' \subset\subset \Omega$,
$$\int_{\Omega'} u_m(x)\frac{\partial \varphi(x)}{\partial x_i}\, dx = \int_{\Omega'} \big(\varphi_m(x) * \bar{u}(x)\xi_m(x)\big)\frac{\partial \varphi(x)}{\partial x_i}\, dx = \int_{\Omega'}\left[\int_{\mathbb{R}^n} \bar{u}(x-y)\xi_m(x-y)\varphi_m(y)\, dy\right]\frac{\partial \varphi(x)}{\partial x_i}\, dx$$
$$= \int_{\mathbb{R}^n} \varphi_m(y)\, dy \int_{\Omega'} \bar{u}(x-y)\xi_m(x-y)\frac{\partial \varphi(x)}{\partial x_i}\, dx = -\int_{\mathbb{R}^n} \varphi_m(y)\, dy \int_{\Omega'} \varphi(x)\frac{\partial}{\partial x_i}\big(\bar{u}(x-y)\xi_m(x-y)\big)\, dx$$
$$= -\int_{\Omega'} \left(\frac{\partial \bar{u}}{\partial x_i}\right)_m \varphi(x)\, dx.$$
That is,
$$\left\|\frac{\partial \bar{u}_m}{\partial x_i} - \frac{\partial u}{\partial x_i}\right\|_{L^p(\Omega')} \longrightarrow 0$$
in $L^p(\Omega')$ for every $\Omega' \subset\subset \Omega$. $\square$

The previous result enables us to construct a sequence in $C_c^\infty(\mathbb{R}^n)$ that converges to $u \in W^{k,p}(\Omega)$ in $W_{loc}^{k,p}(\Omega)$. One may start to wonder: when can we get the convergence in $W^{k,p}(\Omega)$? We will either let $\Omega = \mathbb{R}^n$, so no extension is needed to pass across the boundary, or we need to impose extra conditions on the boundary $\partial\Omega$ to guarantee nice behavior of the weak derivatives. The next result deals with the first option, i.e., extending the domain to the whole space.
Proposition 3.8.5 Let $u \in W^{k,p}(\mathbb{R}^n)$ for some $1 \le p < \infty$. Then there exists $u_m \in C_c^\infty(\mathbb{R}^n)$ such that $u_m \longrightarrow u$ in $W^{k,p}(\mathbb{R}^n)$.

Proof Consider a sequence of cut-off functions
$$\xi_m(x) = \xi\left(\frac{x}{m}\right) \in C_c^\infty(\mathbb{R}^n),$$
chosen so that
$$\xi_m = \begin{cases} 1 & \|x\| \le m \\ 0 & \|x\| \ge 2m, \end{cases}$$
and define the sequence
$$u_m(x) = u(x)\xi_m(x).$$
Then $u_m$ has compact support, and by Theorem 3.5.6
$$u_m \in W^{k,p}(\mathbb{R}^n),$$
and
$$\|u_m - u\|_{L^p(\mathbb{R}^n)}^p = \int_{\mathbb{R}^n} |\xi_m - 1|^p |u|^p\, dx \le \int_{\|x\| > m} |u|^p\, dx \to 0$$
as $m \to \infty$. So, $u_m \longrightarrow u$ in $L^p(\mathbb{R}^n)$. Taking the derivative for $|\alpha| = 1$,
$$\frac{\partial u_m}{\partial x_i} = \xi_m\frac{\partial u}{\partial x_i} + u\frac{\partial \xi_m}{\partial x_i} \longrightarrow \frac{\partial u}{\partial x_i}.$$
Hence, $u_m \longrightarrow u$ in $W^{k,p}(\mathbb{R}^n)$; a mollification step as in Proposition 3.7.1 then replaces each $u_m$ by a function in $C_c^\infty(\mathbb{R}^n)$ without affecting the convergence. $\square$

Proposition 3.7.5 can now be immediately concluded. Now, if we stick with a bounded open $\Omega$ in $\mathbb{R}^n$, then, as suggested above, we need to impose further conditions on the boundary $\partial\Omega$. Many results on Sobolev spaces don't require a smooth boundary but may require a boundary with a "nice" structure. The word "nice" here shall be formulated mathematically in the following definition.
Definition 3.8.6 Let $\Omega \subset \mathbb{R}^n$ be bounded and connected.

(1) The set $\Omega$ is said to be Lipschitz (denoted by Lip) if for every $x \in \partial\Omega$, there exists a neighborhood $N(x)$ such that
$$\Gamma(x) = N(x) \cap \partial\Omega$$
is the graph of a Lipschitz continuous function, and $\partial\Omega$ can be written as a finite union of these graphs, i.e.,
$$\partial\Omega = \bigcup_{i=1}^m \Gamma_i.$$
(2) The set $\Omega$ is said to be of class $C^k$ if for every $x \in \partial\Omega$, there exists a neighborhood $N(x)$ such that
$$\Gamma(x) = N(x) \cap \partial\Omega$$
is the graph of a $C^k$ function, and $\partial\Omega$ can be written as a finite union of these graphs, i.e.,
$$\partial\Omega = \bigcup_{i=1}^m \Gamma_i.$$
(3) The set $\Omega$ is said to be a smooth domain if $k = \infty$ in (2).

In words, a Lipschitz domain means its boundary locally coincides with the graph of a Lipschitz function, and a $C^k$-class domain means its boundary locally coincides with a $C^k$-surface. A bounded Lip domain has the extension property for all $k$. Roughly speaking, a bounded domain is Lip if its boundary behaves like a Lipschitz function. So every convex domain is Lip, and all smooth domains are Lip. On the other hand, a polyhedron in $\mathbb{R}^3$ is an example of a Lip domain that is not smooth.
Remark We need to note the following:

(1) A Lip domain is by definition bounded, so when we refer to a domain as Lip, it is presumed that it is bounded. The same holds for $C^k$-class domains.
(2) It was assumed in the definition that the domain is connected. If it is disconnected, then we will add to (2) the condition that $N(x) \cap \Omega$ is on one side of $\Gamma(x)$.
(3) For $\partial\Omega = \bigcup_{i=1}^m \Gamma_i$, it is required to have a system of local coordinates such that if $\Gamma_i$ is the graph of a Lip function ($C^k$ function) $\psi$, then it is represented by $x_m = \psi(x_1, \dots, x_{m-1})$.
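A classical example of a bounded domain that is not Lip (added here for illustration) is the slit disk:

```latex
% \Omega = B(0,1) \setminus \{(x,0) : 0 \le x < 1\} \subset \mathbb{R}^2.
% Near an interior point of the slit, \Omega lies on both sides of the boundary
% segment, so N(x) \cap \partial\Omega is not the graph of a function with
% \Omega on one side of it; hence \Omega fails the condition of
% Definition 3.8.6(1), even though \partial\Omega is a union of smooth pieces.
```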

3.8.3 Coordinate Transformations

The main tool to establish our extension result is the following:

Definition 3.8.7 (Diffeomorphism) Let $U, V$ be two open and bounded sets in $\mathbb{R}^n$. A mapping $\phi : U \longrightarrow V$ is called a $C^k$-diffeomorphism if $\phi$ is a bijection and bounded in $C^k(U)$, and
$$\phi^{-1} = \psi \in C^k(V).$$
In words, a $C^k$-diffeomorphism is a $C^k$ mapping whose inverse is also $C^k$. For a $C^k$-diffeomorphism $\phi : U \longrightarrow V$, we have
$$\phi = (\phi_1, \phi_2, \dots, \phi_n), \qquad \psi = (\psi_1, \psi_2, \dots, \psi_n).$$

Here, the mapping $\psi$ is the coordinate transformation on $V$ because it makes $\partial\Omega$ a coordinate surface, and the functions $\psi_i$ are called the coordinate functions. In this case, we say that $U$ and $V$ are $C^k$-diffeomorphic to each other. Roughly speaking, they look like the same set, with the elements of the sets relabeled due to the reorientation of the coordinates. We write
$$y_1 = \phi_1(x_1, \dots, x_n),\ y_2 = \phi_2(x_1, \dots, x_n),\ \dots,\ y_n = \phi_n(x_1, \dots, x_n),$$
$$x_1 = \psi_1(y_1, \dots, y_n),\ x_2 = \psi_2(y_1, \dots, y_n),\ \dots,\ x_n = \psi_n(y_1, \dots, y_n).$$
The Jacobian $J(\phi)$ is defined as
$$J(\phi) = \frac{\partial(y_1, \dots, y_n)}{\partial(x_1, \dots, x_n)},$$
which is the $n \times n$ matrix with entries $\dfrac{\partial \phi_j}{\partial x_i}$, $1 \le i, j \le n$. The determinant of $J$ is known as the Jacobian determinant of $\phi$, and is denoted by
$$|J(\phi)| = \det(D\phi(x)).$$

A well-known result in the calculus of manifolds is that if $f, g \in C^1(\mathbb{R}^n)$ and $h = f \circ g$, then
$$\nabla h = (\nabla f \circ g) \cdot \nabla g.$$
Applying this to $\phi$ and $\psi = \phi^{-1}$, which satisfy
$$(\psi \circ \phi)(x) = \mathrm{Id}(x) = x,$$
taking the derivatives of both sides of the equation, then taking the determinant of each side, given the fact that $\det(AB) = \det(A) \cdot \det(B)$, gives
$$1 = \det(D\phi(x))\det(D\psi(y)),$$
hence
$$\big(\det(D\phi(x))\big)^{-1} = \det(D\phi^{-1}(y));$$
in other words,
$$\frac{\partial(y_1, \dots, y_n)}{\partial(x_1, \dots, x_n)} = \frac{1}{\dfrac{\partial(x_1, \dots, x_n)}{\partial(y_1, \dots, y_n)}}.$$
This implies that
$$0 < m < |J(\phi)|, |J(\psi)| < M < \infty$$

for some $m, M > 0$, so the Jacobian determinant doesn't vanish for $C^k$-diffeomorphisms. This coordinate system helps us to change variables when performing multiple integrals. Namely, let $f \in L^1(V)$; then substituting $y = \phi(x)$ gives
$$\int_V f\, dy = \int_U (f \circ \phi)|J(\phi)|\, dx.$$
Similarly, if $f \in L^1(U)$, then
$$\int_U f\, dx = \int_V (f \circ \psi)|J(\psi)|\, dy.$$
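A familiar instance of this change-of-variables formula (added here for illustration) is the polar coordinate map on an annulus, where the Jacobian determinant is bounded away from $0$ and $\infty$:

```latex
% \phi : U = (1,2) \times (0,2\pi) \to V, \qquad \phi(r,\theta) = (r\cos\theta,\, r\sin\theta),
% with Jacobian determinant |J(\phi)| = r, so 1 < |J(\phi)| < 2 on U, and
\int_V f(x,y)\,dx\,dy = \int_0^{2\pi}\!\!\int_1^2 f(r\cos\theta,\, r\sin\theta)\, r\,dr\,d\theta .
% Restricting r to (1,2) keeps \phi a diffeomorphism; at r = 0 the Jacobian
% determinant would vanish.
```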

It should be noted that if $\phi \in C^1(U)$ is a diffeomorphism, then $D\phi$ is continuous but need not be bounded on $U$. To allow the property of bounded derivatives, we strengthen the definition as follows:
we strengthen the definition as follows:
Definition 3.8.8 (Strong Diffeomorphism) Let $U, V$ be two open and bounded sets in $\mathbb{R}^n$. A mapping $\phi : U \longrightarrow V$ is called a $C^k$-strong diffeomorphism if $\phi$ is a bijection and bounded in $C^k(\overline{U})$, and
$$\phi^{-1} = \psi \in C^k(\overline{V}).$$
This strong version of the diffeomorphism guarantees that the mappings $\phi$ and $\psi$, in addition to all their derivatives up to $k$th order, are bounded, since they are continuously defined on closed sets, i.e.,
$$\max_{1 \le i,j \le n}\left\{\left\|\frac{\partial \phi_j}{\partial x_i}\right\|_\infty, \left\|\frac{\partial \psi_j}{\partial y_i}\right\|_\infty\right\} < \infty.$$

Of course, derivatives cannot be defined on the boundaries, so we can get around this by defining

φ : Ω −→ Ω∗

such that Ω, Ω∗ are open sets in ℝⁿ, U ⊂ Ω, and V ⊂ Ω∗. This guarantees that all first derivatives of φ, ψ are bounded on Ū and V̄, respectively.
Another advantage of the definition is that it defines φ on a compact set Ω̄, which allows us to define new Sobolev spaces on compact manifolds in ℝⁿ. Indeed, ∂Ω can be covered by a finite number of open sets. In particular, each point x ∈ ∂Ω is contained in some neighborhood N(x) that can be represented by the graph of φ, and so ∂Ω is covered by a finite number of these neighborhoods, say {Nᵢ}. In other words, ∂Ω is covered by a finite number of subgraphs of mappings

φᵢ ∈ C^k(Nᵢ),

and thus a system of local coordinates is constructed via the mappings {ψᵢ} for Ω.
The following result, which is helpful in proving the next theorem, provides
a sufficient condition for a composition with a diffeomorphism to be a Sobolev
function.
Lemma 3.8.9 (Change of Coordinates) Let U, V be open bounded sets in ℝⁿ, let u ∈ W^{1,p}(U), let φ : U −→ V be a C¹-strong diffeomorphism, and let

φ⁻¹ = ψ = (ψ₁, . . . , ψₙ).

If v(y) = (u ∘ ψ)(y), then v ∈ W^{1,p}(V) and

∂v/∂y_i = Σ_{k=1}^{n} (∂u(ψ)/∂x_k) · (∂ψ_k/∂y_i).

Moreover,

‖v‖_{W^{k,p}(V)} ≤ C ‖u‖_{W^{k,p}(U)}

for all k ∈ ℕ and some C > 0.

Proof Note that

∫_V |v(y)|^p dy = ∫_V |u ∘ ψ(y)|^p dy.

Choosing the substitution x = ψ(y), with |J(ψ)| ≤ M, yields

∫_V |v(y)|^p dy ≤ M ∫_U |u(x)|^p dx < ∞,

that is, v ∈ L^p(V) and

‖v‖_{L^p(V)} ≤ C ‖u‖_{L^p(U)}. (3.8.2)

Also, note that ∂u/∂x_i ∈ L^p(U) and ∇ψ is continuous, and consequently ∂v/∂y_i exists. The next step is to evaluate ∂v/∂y_i, and then to show that ∂v/∂y_i ∈ L^p(V), which implies that v ∈ W^{1,p}(V). By the Meyers–Serrin Theorem, there exists a sequence

u_m ∈ W^{1,p}(U) ∩ C^∞(U)

such that u_m −→ u in W^{1,p}(U) and

∂u_m/∂x_i −→ ∂u/∂x_i in L^p(U).

Define the following sequence:

v_m = u_m ∘ ψ.

It is clear that v_m ∈ C¹(V), so we apply the chain rule for classical derivatives:

∂v_m/∂y_i = Σ_{k=1}^{n} (∂u_m(ψ)/∂x_k) · (∂ψ_k/∂y_i).

Since ∂u_m/∂x_i −→ ∂u/∂x_i in L^p(U), this gives

∂v_m/∂y_i −→ Σ_{k=1}^{n} (∂u(ψ)/∂x_k) · (∂ψ_k/∂y_i) = w ∈ L^p(V) (3.8.3)

in L^p(V). A similar argument to the first part gives

∫_V |v_m(y) − v(y)|^p dy ≤ M ∫_U |u_m(x) − u(x)|^p dx −→ 0,

where |J(ψ)| ≤ M. This implies that v_m −→ v in L^p(V) and ∂v_m/∂y_i −→ w in L^p(V); but since v ∈ L^p(V) and ∂v/∂y_i exists, we conclude from (3.8.3) and the uniqueness of weak derivatives that

∂v_m/∂y_i −→ ∂v/∂y_i = w.

Next, we establish the estimate

‖∂v/∂y_i‖^p_{L^p(V)} = ∫_V | Σ_{k=1}^{n} (∂u(ψ)/∂x_k) · (∂ψ_k/∂y_i) |^p dy
≤ Σ_{k=1}^{n} ∫_V | (∂u/∂x_k) · (∂ψ_k/∂y_i) |^p dy.

Using the change of variable x = ψ(y) from V to U, and denoting

C₁ = n max_{i,j} ‖∂ψ_j/∂y_i‖^p_∞,

then substituting the above gives

‖∂v/∂y_i‖^p_{L^p(V)} ≤ M C₁ Σ_{k=1}^{n} ∫_U |∂u(x)/∂x_k|^p dx
= M C₁ Σ_{k=1}^{n} ‖∂u/∂x_k‖^p_{L^p(U)} < ∞.

This implies that ∂v/∂y_i ∈ L^p(V), and hence v ∈ W^{1,p}(V). Letting C^p = M C₁, this gives

‖∂v/∂y_i‖_{L^p(V)} ≤ C ( Σ_{k=1}^{n} ‖∂u/∂x_k‖^p_{L^p(U)} )^{1/p}.

This estimate, in addition to (3.8.2), implies that

‖v‖_{W^{1,p}(V)} ≤ C ‖u‖_{W^{1,p}(U)}.

A similar argument can be performed for k ≥ 2, with

u_m ∈ W^{k,p}(U) ∩ C^∞(U)

converging to u ∈ W^{k,p}(U); the chain and Leibniz rules are applied, and then limits are taken. We leave the details to the reader. □

Remark The result also holds for Lipschitz domains.


Now, we are ready to investigate the problem of extending Sobolev functions defined
on open bounded sets to the whole space.

3.8.4 Extension Operator

Theorem 3.8.10 (Existence of Extension Operator) Let Ω ⊂ ℝⁿ be open, bounded, and of class C^k, and let Ω ⊂⊂ Ω∗. Then there exists a linear bounded operator E, called the “extension operator”, such that the following hold:
(1) E : W^{k,p}(Ω) −→ W^{k,p}(ℝⁿ).
(2) Eu |_Ω = u for every u ∈ W^{k,p}(Ω).
(3) supp(Eu) ⊆ Ω∗.
(4) The estimate
‖Eu‖_{W^{k,p}(ℝⁿ)} ≤ c ‖u‖_{W^{k,p}(Ω)}
holds for some c = c(n, k, p, Ω, Ω∗), i.e. c does not depend on u. For p = ∞, c = c(Ω, Ω∗).
(5) The estimate
‖Eu‖_{W^{k,p}(Ω∗)} ≤ c ‖u‖_{W^{k,p}(Ω)}
holds for some c = c(n, k, p, Ω, Ω∗).

Proof The idea is to extend u locally across the boundary by a higher order reflection, after flattening the boundary if necessary, and then to patch the local extensions together and extend by zero to ℝⁿ. We will only prove the case k = 1.
Let u ∈ W^{k,p}(Ω) and x₀ ∈ ∂Ω. Suppose first that ∂Ω is flat near x₀: there exists a neighborhood N(x₀) ⊂ Ω∗ such that

Γ(x₀) = N(x₀) ∩ ∂Ω

is flat and lies in the hyperplane H = {x = (x₁, . . . , x_{n−1}, 0) ∈ ℝⁿ} = {x_n = 0}. For a small δ > 0, let

B⁺ = B ∩ {x_n > 0}, B⁻ = B ∩ {x_n < 0},

where B = B_δ(x₀). Then clearly u ∈ W^{k,p}(B⁺). Suppose that u ∈ C¹(Ω̄). Then we define the following extension as a reflection of u from B⁺ to B:

ū(x) = { u(x), x ∈ B⁺,
       { 3u(x₁, . . . , −x_n) − 2u(x₁, . . . , −2x_n), x ∈ B⁻. (3.8.4)
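A numerical sanity check (a sketch, not part of the proof) that the reflection (3.8.4) matches both the value and the normal derivative of u across {x_n = 0}, using a hypothetical one-dimensional test profile in the x_n variable:

```python
def u(t):
    # hypothetical smooth test profile in the normal variable x_n
    return t * t + 5.0 * t + 2.0

def ubar(t):
    # the reflection (3.8.4): ū = u for x_n > 0, 3u(−x_n) − 2u(−2x_n) for x_n < 0
    return u(t) if t >= 0 else 3.0 * u(-t) - 2.0 * u(-2.0 * t)

h = 1e-6
value_jump = abs(ubar(h) - ubar(-h))       # continuity across x_n = 0
d_plus = (ubar(2 * h) - ubar(h)) / h       # one-sided derivative from above
d_minus = (ubar(-h) - ubar(-2 * h)) / h    # one-sided derivative from below
```

The coefficient checks 3 − 2 = 1 (values) and −3 + 4 = 1 (normal derivatives) are exactly what the difference quotients confirm.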

Letting

u⁺ = ū |_{B⁺} and u⁻ = ū |_{B⁻},

and writing x′ = (x₁, . . . , x_{n−1}), it is easy to see that

lim_{x_n→0⁺} u⁺(x′, x_n) = lim_{x_n→0⁻} u⁻(x′, x_n),

and for 1 ≤ i ≤ n − 1,

∂u⁻/∂x_i = 3 (∂u/∂x_i)(x′, −x_n) − 2 (∂u/∂x_i)(x′, −2x_n),

hence

lim_{x_n→0⁻} ∂u⁻/∂x_i = (∂u/∂x_i)(x′, 0) = lim_{x_n→0⁺} ∂u⁺/∂x_i.

For i = n, we have

∂u⁻/∂x_n = −3 (∂u/∂x_n)(x′, −x_n) + 4 (∂u/∂x_n)(x′, −2x_n),

so

lim_{x_n→0⁻} ∂u⁻/∂x_n = (∂u/∂x_n)(x′, 0) = lim_{x_n→0⁺} ∂u⁺/∂x_n.

Therefore,

ū ∈ C¹(B) ⊂ W^{1,p}(B)

(Proposition 3.6.4(5)), and by a simple calculation one can easily see that

‖ū‖_{W^{1,p}(B)} ≤ C ‖u‖_{W^{1,p}(B⁺)},

where C is a constant that does not depend on u. Now, suppose that u ∉ C¹(Ω̄) and ∂Ω is not flat near x₀. We flatten out the boundary

Γ = ∂Ω ∩ N

through a change in the coordinate system that will map it to a subset of the hyperplane x_n = 0. We will use a C¹-strong diffeomorphism

φ : N −→ B, φ ∈ C¹(N),

and

ψ = φ⁻¹ ∈ C¹(B),

where B is an open set centered at y₀ = φ(x₀) so that y₀ ∈ ∂B⁺, and where φ, ψ are given by

φ(x′, x_n) = (x′, x_n − γ(x′)), ψ(y′, y_n) = (y′, y_n + γ(y′)).

Here, φ(Γ) = B ∩ {y_n = 0}, and

φ(U) = V = B⁺,

where U = N ∩ Ω, and

ψ(V) = U.

Under this new coordinate system, consider the restriction of u to U, and define

v(y) = u ∘ φ⁻¹(y) = u(ψ(y)).

This transfers u to the new coordinates, which makes V = B⁺ the domain of v, with u(U) = v(V). By Lemma 3.8.9 and the same procedure as above, it can be shown that

v ∈ C¹(B⁺) ∩ W^{1,p}(B⁺),

and

‖v‖_{W^{1,p}(V)} ≤ C ‖u‖_{W^{1,p}(U)} ≤ C ‖u‖_{W^{1,p}(Ω)}. (3.8.5)

Similar to (3.8.4), we extend v from B⁺ to B through the reflection v̄. Again, it can be shown that v̄ ∈ W^{1,p}(B) and

‖v̄‖_{W^{1,p}(B)} ≤ C ‖v‖_{W^{1,p}(V)}. (3.8.6)

Then, we pull the function back to the original system by composing v̄ with φ to produce a new extension

ū = v̄(φ(x)),

which extends u from U to N with ū = u on U. Then by continuity and Lemma 3.8.9,

ū ∈ C¹(N) ∩ W^{1,p}(N)

and

‖ū‖_{W^{1,p}(N)} ≤ C ‖v̄‖_{W^{1,p}(B)}. (3.8.7)

From (3.8.5)–(3.8.7), we have

‖ū‖_{W^{1,p}(N)} ≤ C ‖u‖_{W^{1,p}(U)} ≤ C ‖u‖_{W^{1,p}(Ω)}. (3.8.8)

Note that we have two issues with the treatment above. First, the extensions are
still not of compact support, so they are not extended to the whole space. Second,
it can be implemented only locally because the coordinate system provides a local
representation. So we need to make use of the powerful tool of partition of unity to
globalize our results and compactly support the extensions so that we can extend by
zero the functions to ℝⁿ. Since ∂Ω is compact, there exists a finite cover {Nᵢ} of ∂Ω with ∪_{i=1}^{m} Nᵢ = N, and let N₀ ⊂⊂ Ω be such that

Ω̄ ⊆ ∪_{i=0}^{m} Nᵢ = N ∪ Ω = A ⊆ Ω∗.

Let Uᵢ = Nᵢ ∩ Ω, so {Uᵢ, i = 0, . . . , m} is a cover of Ω, and the restrictions uᵢ = u |_{Uᵢ} satisfy

uᵢ ∈ W^{1,p}(Uᵢ).

By the previous argument, each uᵢ admits an extension

ūᵢ ∈ W^{1,p}(Nᵢ).

Choose a partition of unity {ξᵢ, i = 0, . . . , m} subordinate to {Nᵢ, i = 0, . . . , m}, with ξᵢ ∈ C_c^∞(Nᵢ), supp(ξᵢ) ⊆ Nᵢ, and Σ_{i=0}^{m} ξᵢ = 1 on Ω̄, and let

ũ₀ = u ξ₀, ũᵢ = ξᵢ(x) ūᵢ(x).

Since u ∈ C¹(Ω̄), we have

ũᵢ ∈ C¹(Nᵢ)

and supp(ũᵢ) ⊆ Nᵢ for i = 1, . . . , m, and consequently,

ũᵢ ∈ W^{1,p}(Nᵢ)

for each i = 0, . . . , m. In view of (3.8.8), we have

‖ũᵢ‖_{W^{1,p}(Nᵢ)} ≤ C ‖uᵢ‖_{W^{1,p}(Uᵢ)}. (3.8.9)

The last step is to define the linear operator

ū = Σ_{i=0}^{m} ũᵢ

for x ∈ ∪_{i=0}^{m} Nᵢ. Then ū ∈ W^{1,p}(A), with ū = u on Ω and

supp(ū) ⊆ A ⊆ Ω∗.

Now that we have obtained an extension of u with compact support, it is time to define Eu using the zero extension

Eu = { ū, x ∈ A,
     { 0, x ∈ ℝⁿ \ A.

The following estimate is established:

‖Eu‖_{W^{1,p}(ℝⁿ)} = ‖ū‖_{W^{1,p}(Ω∗)} = ‖ū‖_{W^{1,p}(A)}
≤ Σ_{i=0}^{m} ‖ũᵢ‖_{W^{1,p}(Nᵢ)}
≤ Σ_{i=0}^{m} ‖ξᵢ‖_{W^{1,∞}(Nᵢ)} · Σ_{i=0}^{m} ‖ūᵢ‖_{W^{1,p}(Nᵢ)}
≤ K M Σ_{i=0}^{m} ‖uᵢ‖_{W^{1,p}(Uᵢ)} (by (3.8.9))
≤ C ‖u‖_{W^{1,p}(Ω)},

where

M = max{Cᵢ, i = 0, . . . , m}, K = Σ_{i=0}^{m} ‖ξᵢ‖_{W^{1,∞}(Nᵢ)}, C = K M(m + 1).

So C clearly depends on n, k, Ω, Ω∗. The result easily holds for p = ∞. We leave


the details for the reader. 

Remark We should note the following:

(1) The result holds for all k ∈ ℕ, and in this case the procedure of the proof becomes harder since the diffeomorphism will be C^k instead of C¹, which requires a more complicated, higher order reflection. For example, u⁻ of (3.8.4) will be of the form

Σ_{i=1}^{k+1} cᵢ u(x′, −x_n / i)

for some coefficients cᵢ such that

Σ_{i=1}^{k+1} (−1/i)^j cᵢ = 1, j = 0, 1, . . . , k.

This is represented by a system V C = 1 of k + 1 linear equations. The coefficient matrix V is known as the Vandermonde matrix, and this matrix is invertible, since otherwise a polynomial of degree k would have k + 1 roots. This implies that the system is uniquely solvable. So one can uniquely determine the values of cᵢ and proceed to establish ū ∈ C^k(B) by showing that ū and all its derivatives up to order k are equal across the hyperplane {x ∈ ℝⁿ : x_n = 0}.
(2) The condition C¹ for the boundary can be weakened. In fact, the result also holds when Ω is a Lipschitz domain, and in this case φ, ψ are assumed to be Lipschitz continuous instead of C¹ diffeomorphisms. The part of the boundary Γ is the graph of a Lipschitz function γ, and Ω lies above the graph of γ. We define the Lipschitz mappings

φ(x) = y = (x′, x_n − γ(x′)), ψ(y) = x = (y′, y_n + γ(y′)),

where x = (x′, x_n) and y = (y′, y_n). Then both φ and ψ are Lipschitz, with

‖ū‖_{W^{1,p}(B)} ≤ C ‖u‖_{W^{1,p}(B⁺)},

and |J(φ)| = |J(ψ)| = 1. This gives

‖v‖_{W^{1,p}(V)} = ‖u‖_{W^{1,p}(U)}.

Further, (3.8.6) stays the same and (3.8.7) becomes

‖ū‖_{W^{1,p}(A)} = ‖v̄‖_{W^{1,p}(B)}.

For p = ∞, the extension defined in (3.8.4) becomes

ū(x) = { u(x), x ∈ B⁺,
       { u(x₁, . . . , −x_n), x ∈ B⁻,

and ū is Lipschitz, and the corresponding estimate is

‖ū‖_{W^{1,∞}(B)} = ‖u‖_{W^{1,∞}(B⁺)},

and (3.8.6) becomes

‖v̄‖_{W^{1,∞}(B)} = ‖v‖_{W^{1,∞}(V)},

and (3.8.7) becomes

‖ū‖_{W^{1,∞}(A)} = ‖v̄‖_{W^{1,∞}(B)}.
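The coefficients cᵢ of the higher order reflection can be computed by solving the Vandermonde system described in Remark (1). A small sketch (exact rational arithmetic, stdlib only); for k = 1 the scheme Σᵢ cᵢ u(x′, −x_n/i) yields c₁ = −3, c₂ = 4. Note these differ from the reflection (3.8.4), which uses the points −x_n, −2x_n instead of the contractions −x_n/i:

```python
from fractions import Fraction as F

def reflection_coeffs(k):
    # Solve sum_{i=1}^{k+1} (-1/i)^j c_i = 1 for j = 0, ..., k
    # (the (k+1) x (k+1) Vandermonde system V c = 1 from the remark),
    # using exact Gauss-Jordan elimination over the rationals.
    n = k + 1
    A = [[F(-1, i) ** j for i in range(1, n + 1)] + [F(1)] for j in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [a / A[col][col] for a in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]

c1 = reflection_coeffs(1)   # k = 1: the reflection -3u(x', -x_n) + 4u(x', -x_n/2)
```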

Now, with the help of the Extension Theorem, we can strengthen the result of Proposition 3.8.4.

Proposition 3.8.11 Let Ω be open bounded in ℝⁿ and of class C^k, and let u ∈ W^{k,p}(Ω) for some 1 ≤ p ≤ ∞. Then there exists u_m ∈ C_c^∞(ℝⁿ) such that u_m −→ u in W^{k,p}(Ω).

Proof Let ∂Ω be bounded. Then by the Extension Theorem, there exists an extension operator

E : W^{k,p}(Ω) −→ W^{k,p}(ℝⁿ).

Define the sequence

u_m = ξ_m (Eu)_{ε_m},

where (Eu)_ε = Eu ∗ φ_ε is the mollification of Eu, ε_m → 0, and ξ_m is a sequence of cut-off functions. Then

u_m ∈ C_c^∞(ℝⁿ),

so it is the desired sequence. The case when ∂Ω is unbounded is left to the reader as an exercise (see Problem 3.11.31). □

One of the advantages of imposing a nice structure on the boundary of the domain is that it allows us to construct the approximating functions on Ω̄ rather than on Ω, i.e., the approximating functions will belong to C^∞(Ω̄). This will provide a global approximation up to the boundary. Functions in C^∞(Ω̄) are functions which are smooth up to the boundary.
We next prove another variant of the Meyers–Serrin Theorem which establishes a
global approximation of smooth functions up to the boundary. The idea is to extend
functions in W k, p () to functions in W k, p (Rn ) in order to apply Proposition 3.7.5.
Theorem 3.8.12 Let Ω be open bounded in ℝⁿ and of class C^k. Then C^k(Ω̄) is dense in W^{k,p}(Ω) in the Sobolev norm ‖·‖_{k,p} for all 1 ≤ p < ∞.

Proof We will use Proposition 3.6.4(5):

C^∞(Ω̄) ⊂ C^k(Ω̄) ⊂ W^{k,p}(Ω). (3.8.10)

Let u ∈ W^{k,p}(Ω) (1 ≤ p < ∞). We want to show that there exists a sequence

u_j ∈ W^{k,p}(Ω) ∩ C^∞(Ω̄)

such that u_j −→ u in W^{k,p}(Ω); that is, the closure of W^{k,p}(Ω) ∩ C^∞(Ω̄) in the Sobolev norm is W^{k,p}(Ω).

Since u ∈ W^{k,p}(Ω), there exists an extension E(u) ∈ W^{k,p}(ℝⁿ). By Proposition 3.7.5, there exists u_j ∈ C_c^∞(ℝⁿ) such that u_j −→ E(u) in W^{k,p}(ℝⁿ). Taking the restriction to Ω̄, we consequently have

(u_j) |_{Ω̄} ∈ C^∞(Ω̄),

and u_j converges to E(u) |_Ω = u in W^{k,p}(Ω). This proves that C^∞(Ω̄) is dense in W^{k,p}(Ω). Now the result follows from (3.8.10). □

3.9 Sobolev Inequalities

3.9.1 Sobolev Exponent

This section establishes some inequalities that play an important role in embedding
theorems and other results related to the elliptic theory and partial differential equa-
tions. There are many of these inequalities, but we will discuss some of the important
ones that will be used later and may provide the foundations for other inequalities.
In particular, we will study estimate inequalities in Sobolev or Holder spaces in the
following forms:
(1) ‖u‖_L ≤ C ‖Du‖_L.
(2) ‖u‖_L ≤ C ‖u‖_W.
(3) ‖u‖_C ≤ C ‖u‖_W.
Here, L refers to an arbitrary Lebesgue measurable space L p , W refers to a Sobolev
space, and C refers to a Holder continuous space. A main requirement of all these
inequalities is that the constant C of the estimate must be kept independent of the
function u, otherwise the estimate will lose its power and efficiency in applications
and producing further inequalities and other embedding results. Another concern is
the conjugate we need for a number p. Recall in measure theory, the conjugate of
a number p is the number q such that p −1 + q −1 = 1. This parameter was required
to guarantee the validity of Holder’s inequality, which is a fundamental inequality in
the theory of Lebesgue measurable spaces, and this is the reason why it is sometimes
known as “Holder conjugate”. Likewise, the conjugate needed for the number p to
obtain Sobolev inequalities shall be called Sobolev conjugate, and will be denoted
by p ∗ . Inequality (1) above is fundamental in this subject and plays the same role
as Holder’s inequality in Lebesgue spaces, therefore, it is important to establish this
inequality.
The most basic form of inequality (1) takes the following form: if u ∈ C¹[a, b] with u(a) = 0, then

‖u‖_{L¹[a,b]} ≤ C ‖u′‖_∞.

This can be easily seen since u is absolutely continuous on [a, b] and

u(x) = ∫_a^x u′(t) dt,

so

|u(x)| ≤ ∫_a^b |u′(t)| dt ≤ (b − a) max_{[a,b]} |u′(t)|,

i.e., the constant C = b − a depends only on the domain [a, b]. How can we extend this estimate to more general Lebesgue spaces? What if the estimate is taken over ℝⁿ instead of ℝ? Let us assume that for some p, q ≥ 1,

‖u‖_{L^q(ℝⁿ)} ≤ C ‖Du‖_{L^p(ℝⁿ)}. (3.9.1)

We observe two points:


(1) The estimate is taken over Rn since it would be easy to extend the results to any
open bounded sets of Rn by the extension techniques studied in the previous
section, so obtaining the estimate over Rn is the key to other general results.
(2) The estimate is obviously not valid for all values of p, q (one can find several counterexamples). So, for a given p, the value of q required to validate the inequality will be defined as the Sobolev conjugate of p.
It turns out that a suitable function should be invoked in order to validate the inequality
and help us determine appropriate values of q. The best function to deal with is a
function in C_c¹ since it will be contained in all such spaces. We will use the standard mollifier of (2.3.14),

φ_ε(x) = (1/εⁿ) φ(x/ε) ∈ C_c¹(ℝⁿ),

where φ is the function defined in (3.2.5). Integrating φ_ε over ℝⁿ using the change of variable x = εy gives

∫_{ℝⁿ} |φ_ε(x)|^q dx = ε^{n−nq} ∫_{ℝⁿ} |φ(y)|^q dy,

so

‖φ_ε‖_{L^q(ℝⁿ)} = ε^{n/q − n} ‖φ‖_{L^q(ℝⁿ)}. (3.9.2)

A similar argument for D(φ_ε) gives

‖Dφ_ε‖_{L^p(ℝⁿ)} = ε^{n/p − n − 1} ‖Dφ‖_{L^p(ℝⁿ)}. (3.9.3)

In order for (3.9.1) to hold for every φ_ε, we combine (3.9.2) and (3.9.3) to obtain

‖φ‖_{L^q(ℝⁿ)} ≤ C ε^α ‖Dφ‖_{L^p(ℝⁿ)},

where

α = n/p − n/q − 1.

If α ≠ 0, letting ε → 0 (or ε → ∞) leads to a contradiction, so the exponent must be zero, that is,

n/p − n/q − 1 = 0,

which implies that

q = np/(n − p).

A simple calculation gives

1/p − 1/p∗ = 1/n.

Thus, this will be defined as the Sobolev conjugate (or Sobolev exponent) of p
for all p ∈ [1, n).
Definition 3.9.1 (Sobolev Exponent) Let p ∈ [1, n). Then the Sobolev exponent of p is

p∗ = np/(n − p).

Remark (1) For the definition to make sense, we should have 1 ≤ p < n. For p = n, we agree to set p∗ = ∞.
(2) The new conjugate definition takes into account the space ℝⁿ, but it cannot be reduced to the classical Holder conjugate q⁻¹ + p⁻¹ = 1, as is apparent from the definition of p∗, so it cannot be regarded as a generalization of the Holder conjugate.
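A small numerical sketch (exact rational arithmetic via the standard library, not from the text) of the definition and the identity 1/p − 1/p∗ = 1/n:

```python
from fractions import Fraction as F

def sobolev_exponent(p, n):
    # p* = np / (n - p), defined for 1 <= p < n
    if not 1 <= p < n:
        raise ValueError("Sobolev exponent requires 1 <= p < n")
    return F(n * p, n - p)

p_star = sobolev_exponent(2, 3)        # n = 3, p = 2 gives p* = 6
identity = F(1, 2) - F(1, 1) / p_star  # 1/p - 1/p*, should equal 1/n = 1/3
```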

3.9.2 Fundamental Inequalities

Now, we come to the next stage: proving the inequality, under the assumption that q = p∗. Before we prove the inequality, we recall some basic inequalities from the theory of Lebesgue spaces.

Theorem 3.9.2 (Extended Holder's Inequality) Let p₁, . . . , pₙ be positive numbers such that

Σ_{i=1}^{n} 1/pᵢ = 1.

Let u₁, u₂, . . . , uₙ with uᵢ ∈ L^{pᵢ}(Ω). Then

Π_{i=1}^{n} uᵢ ∈ L¹(Ω),

and

‖ Π_{i=1}^{n} uᵢ ‖₁ ≤ Π_{i=1}^{n} ‖uᵢ‖_{pᵢ}.

Proof Use induction and Holder's inequality. □

Theorem 3.9.3 (Nested Inequality) Let u ∈ L^q(Ω) for some bounded measurable set Ω of measure μ(Ω) = M. If 1 ≤ p < q, then u ∈ L^p(Ω) and

‖u‖_p ≤ C ‖u‖_q,

where C = M^{1/p − 1/q} = C(p, q, Ω).

Proof Let v = |u|^p; then v ∈ L^{q/p}(Ω). Let r = q/p and let s be the conjugate of r; then apply Holder's inequality to v ∈ L^r(Ω) and 1 ∈ L^s(Ω). This gives

∫_Ω |u|^p = ∫_Ω |v| ≤ ( ∫_Ω |u|^{pr} )^{1/r} · (μ(Ω))^{1/s}.

Taking the power 1/p of both sides of the inequality, given that pr = q and

1/(sp) = (q − p)/(qp),

the result follows. □
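A numerical illustration (a sketch, not from the text) of the nested inequality on Ω = (0,1), so M = μ(Ω) = 1, with the test function u(x) = x and p = 1, q = 2: the bound ‖u‖₁ ≤ M^{1/p−1/q} ‖u‖₂ reads 1/2 ≤ 1/√3.

```python
# Midpoint Riemann sums approximating the L^p norms on (0,1).
N = 10000
xs = [(i + 0.5) / N for i in range(N)]

def lp_norm(f, p):
    return (sum(abs(f(x)) ** p for x in xs) / N) ** (1.0 / p)

p, q, M = 1, 2, 1.0
lhs = lp_norm(lambda x: x, p)                              # ‖u‖_1 ≈ 1/2
rhs = M ** (1.0 / p - 1.0 / q) * lp_norm(lambda x: x, q)   # ≈ 1/sqrt(3)
```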

The previous result holds for bounded domains Ω. If Ω is unbounded, then we use the following inequality.

Theorem 3.9.4 (Interpolation Inequality) Let 1 ≤ p < r < ∞ and let Ω ⊆ ℝⁿ be an arbitrary measurable set. If u ∈ L^p(Ω) ∩ L^r(Ω), then u ∈ L^q(Ω) for any q ∈ [p, r] and

‖u‖_q ≤ ‖u‖_p^θ · ‖u‖_r^{1−θ}

for some θ ∈ [0, 1] such that

1/q = θ/p + (1 − θ)/r.

Proof Note that

1 = θq/p + (1 − θ)q/r.

Since u ∈ L^p(Ω), we have |u|^{θq} ∈ L^{p/(θq)}(Ω), and also since u ∈ L^r(Ω), we have |u|^{(1−θ)q} ∈ L^{r/((1−θ)q)}(Ω). So we write

∫_Ω |u|^q dx = ∫_Ω |u|^{θq} |u|^{(1−θ)q} dx,

then using Holder's inequality on |u|^{θq} and |u|^{(1−θ)q} gives

‖u‖_q^q ≤ ‖ |u|^{θq} ‖_{p/(θq)} · ‖ |u|^{(1−θ)q} ‖_{r/((1−θ)q)} = ‖u‖_p^{θq} · ‖u‖_r^{(1−θ)q}. □
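The interpolation inequality holds on any measure space, so it can be sanity-checked on a finite vector (counting measure); a sketch with p = 2, r = 6, q = 3, for which θ = 1/2:

```python
# Check ‖u‖_q <= ‖u‖_p^θ · ‖u‖_r^(1-θ) with 1/q = θ/p + (1-θ)/r
# on a small discrete example (counting measure).
u = [0.3, 1.7, 2.2, 0.9, 4.1]
p, r, q = 2.0, 6.0, 3.0
theta = (1 / q - 1 / r) / (1 / p - 1 / r)  # solves 1/q = θ/p + (1-θ)/r

def norm(v, s):
    return sum(abs(x) ** s for x in v) ** (1 / s)

lhs = norm(u, q)
rhs = norm(u, p) ** theta * norm(u, r) ** (1 - theta)
```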

3.9.3 Gagliardo–Nirenberg–Sobolev Inequality

Now we come to our Sobolev inequalities. Our first inequality is fundamental, and
known as Gagliardo–Nirenberg–Sobolev inequality. Gagliardo and Nirenberg proved
the inequality for the case p = 1, and Sobolev for 1 < p < n in the space Rn . The
first three inequalities are known as “Gagliardo–Nirenberg–Sobolev inequalities”,
although the first of them (Theorem 3.9.5) is the most well known.
Theorem 3.9.5 (Gagliardo–Nirenberg–Sobolev Inequality I) Let 1 ≤ p < n and
u ∈ Cc1 (Rn ). Then

‖u‖_{L^{p∗}(ℝⁿ)} ≤ C ‖Du‖_{L^p(ℝⁿ)}.

Proof We prove the case p = 1. By the fundamental theorem of calculus, we write

u(x) = ∫_{−∞}^{x_i} (∂u/∂x_i)(x₁, . . . , x_{i−1}, t_i, x_{i+1}, . . . , xₙ) dt_i.
Then

|u(x)| ≤ ∫_{−∞}^{∞} |D_{x_i} u| dt_i.

Multiplying the n inequalities and taking the (n−1)th root,

|u(x)|^{n/(n−1)} ≤ Π_{i=1}^{n} ( ∫_{−∞}^{∞} |D_{x_i} u| dt_i )^{1/(n−1)}.

Now, we integrate with respect to x₁ over ℝ, then we use the extended Holder's inequality:

∫_{−∞}^{∞} |u(x)|^{n/(n−1)} dx₁ ≤ ( ∫_{−∞}^{∞} |D_{x₁} u| dt₁ )^{1/(n−1)} ∫_{−∞}^{∞} Π_{i=2}^{n} ( ∫_{−∞}^{∞} |D_{x_i} u| dt_i )^{1/(n−1)} dx₁
≤ ( ∫_{−∞}^{∞} |D_{x₁} u| dt₁ )^{1/(n−1)} Π_{i=2}^{n} ( ∫_{−∞}^{∞} ∫_{−∞}^{∞} |D_{x_i} u| dx₁ dt_i )^{1/(n−1)}.

We integrate with respect to x₂ over ℝ, and we repeat this argument successively until xₙ to obtain

∫_{ℝⁿ} |u(x)|^{n/(n−1)} dx ≤ Π_{i=1}^{n} ( ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} |D_{x_i} u| dx₁ . . . dt_i . . . dxₙ )^{1/(n−1)} ≤ ( ∫_{ℝⁿ} |Du| dx )^{n/(n−1)}. (3.9.4)

This establishes the inequality for p = 1.
For 1 < p < n, substitute |u|^α for u in (3.9.4), where α is to be determined later. Then

( ∫_{ℝⁿ} |u(x)|^{αn/(n−1)} dx )^{(n−1)/n} ≤ ∫_{ℝⁿ} |D|u|^α| dx = α ∫_{ℝⁿ} |u|^{α−1} |Du| dx.

We apply Holder's inequality to the last term, taking into account that |Du| ∈ L^p and |u|^{α−1} ∈ L^q, where q is the Holder conjugate of p, so that p⁻¹ + q⁻¹ = 1 and consequently

q = p/(p − 1).

This gives

( ∫_{ℝⁿ} |u(x)|^{αn/(n−1)} dx )^{(n−1)/n} ≤ α ( ∫_{ℝⁿ} |u|^{(α−1)p/(p−1)} dx )^{(p−1)/p} ( ∫_{ℝⁿ} |Du|^p dx )^{1/p}. (3.9.5)

Now, we choose α such that the powers of u in both sides of the above inequality
are equal to p ∗ , i.e.
αn p(α − 1) np
= = = p∗ .
n−1 p−1 n−p

This gives
p(n − 1)
α= .
n−p

Substitute in (3.9.5), and divide both sides of the inequality by the first term of the
RHS of the inequality, noting that

n−1 p−1 n−p 1


− = = ∗.
n p np p

We thus obtain

( ∫_{ℝⁿ} |u(x)|^{p∗} dx )^{1/p∗} ≤ (p(n−1)/(n−p)) ( ∫_{ℝⁿ} |Du|^p dx )^{1/p},

or

‖u‖_{L^{p∗}(ℝⁿ)} ≤ C ‖Du‖_{L^p(ℝⁿ)},

for C = C(n, p). □

Remark Note that from the proof of the case 1 < p < n, the argument cannot be extended to p ≥ n, since the choice of α would be invalid.
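A worked arithmetic check of the choice of α (with the concrete values n = 3, p = 2, not from the text):

```latex
% Worked check with n = 3, p = 2:
\alpha = \frac{p(n-1)}{n-p} = \frac{2\cdot 2}{1} = 4, \qquad
p^{*} = \frac{np}{n-p} = \frac{3\cdot 2}{1} = 6,
\qquad
\frac{\alpha n}{n-1} = \frac{4\cdot 3}{2} = 6 = p^{*}, \qquad
\frac{p(\alpha-1)}{p-1} = \frac{2\cdot 3}{1} = 6 = p^{*}.
```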

Having a norm in the L^p spaces on the RHS of an inequality is always advantageous, because we can always involve W^{k,p} in the inequality due to the fact that

‖u‖_{L^p} ≤ ‖u‖_{W^{k,p}}

whenever u ∈ W^{k,p}. This idea will be applied next.


Corollary 3.9.6 Let 1 ≤ p < n and u ∈ Cc1 (Rn ). Then

‖u‖_{L^{p∗}(ℝⁿ)} ≤ C ‖u‖_{W^{1,p}(ℝⁿ)}.

Proof This follows from the fact that

u ∈ C_c¹(ℝⁿ) ⊂ W^{1,p}(ℝⁿ),

hence

‖Du‖_{L^p} ≤ ‖u‖_{W^{1,p}}. □



The next result extends the corollary to include not only L^{p∗}(ℝⁿ), but all L^q(ℝⁿ) for q ∈ [p, p∗].
Theorem 3.9.7 Let 1 ≤ p < n and u ∈ Cc1 (Rn ). Then

‖u‖_{L^q(ℝⁿ)} ≤ C ‖u‖_{W^{1,p}(ℝⁿ)}

for all q ∈ [ p, p ∗ ].

Proof We use the interpolation inequality with r = p∗ to obtain

‖u‖_{L^q(ℝⁿ)} ≤ ‖u‖^θ_{L^p(ℝⁿ)} · ‖u‖^{1−θ}_{L^{p∗}(ℝⁿ)}.

Note that

1 = 1/p′ + 1/q′,

where

p′ = 1/θ > 1, q′ = 1/(1−θ) > 1.

So Young's inequality can be used to rewrite the above estimate as

‖u‖_{L^q(ℝⁿ)} ≤ θ ‖u‖_{L^p(ℝⁿ)} + (1 − θ) ‖u‖_{L^{p∗}(ℝⁿ)}
≤ ‖u‖_{L^p(ℝⁿ)} + ‖u‖_{L^{p∗}(ℝⁿ)}
≤ ‖u‖_{L^p(ℝⁿ)} + C ‖Du‖_{L^p(ℝⁿ)} (by GNS inequality)
≤ C ‖u‖_{W^{1,p}(ℝⁿ)}. □

3.9.4 Poincare Inequality

The next step is to generalize the above results in two ways. In particular, we will establish the inequalities for any Sobolev function in W^{1,p} rather than C_c¹, and on any bounded open set rather than the whole space. Of course, we need the Meyers–Serrin Theorem for the former idea, and the extension operator for the latter. The second inequality (Theorem 3.9.7) shall also be generalized in the same way. The first inequality is the famous Poincare inequality, which is one of the most useful and important inequalities in the theory of PDEs. Note that q here is just a parameter that doesn't play the role of the Holder conjugate.

Theorem 3.9.8 (Poincare Inequality) Let Ω be open, bounded, and of class C¹ in ℝⁿ, and let 1 ≤ p < n. If u ∈ W₀^{1,p}(Ω), then u ∈ L^q(Ω) for all 1 ≤ q ≤ p∗, and

‖u‖_{L^q(Ω)} ≤ C ‖Du‖_{L^p(Ω)},

where C = C(p, q, n, Ω). Consequently, we have

‖u‖_{L^q(Ω)} ≤ C ‖u‖_{W^{1,p}(Ω)}.

Proof By the Meyers–Serrin Theorem, there exists u_j ∈ C_c^∞(Ω) such that

u_j −→ u in W₀^{1,p}(Ω).

Extend u_j by zero to ℝⁿ and still call them u_j, so

u_j ∈ C_c^∞(ℝⁿ) ⊂ C_c¹(ℝⁿ).

By the GNS inequality,

‖u_j‖_{L^{p∗}(Ω)} = ‖u_j‖_{L^{p∗}(ℝⁿ)} ≤ C ‖Du_j‖_{L^p(ℝⁿ)} ≤ C ‖Du_j‖_{L^p(Ω)}, (3.9.6)

and

‖u_j − u_i‖_{L^{p∗}(ℝⁿ)} ≤ C ‖Du_j − Du_i‖_{L^p(ℝⁿ)} −→ 0.

Hence, {u_j} is Cauchy in L^{p∗}(ℝⁿ), which is Banach, so u_j −→ v ∈ L^{p∗}(ℝⁿ), and hence v = u. Taking the limit of both sides of (3.9.6) using Fatou's Lemma (Theorem 1.1.5),

‖u‖_{L^{p∗}(Ω)} ≤ C ‖Du‖_{L^p(Ω)}.

Since q ≤ p∗, the result now follows from the nested inequality (Theorem 3.9.3), and the second inequality follows from the fact that ‖Du‖_{L^p} ≤ ‖u‖_{W^{1,p}}. □

Remark The particular case q = p in the first estimate is the classical Poincare inequality. In this case, the inequality holds for 1 ≤ p < ∞ since we always have p < p∗ for all p, n.

We infer from the above inequality that if we measure the size of a function in W₀^{1,p} by the p-norm, then its size will be bounded above by the size of its weak derivative.
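A numerical illustration (a sketch, not from the text) of the classical case q = p = 2 on Ω = (0,1): for u(x) = sin(πx) ∈ W₀^{1,2}(0,1), the ratio ‖u‖₂ / ‖Du‖₂ equals 1/π, which is in fact the best possible Poincare constant on this interval.

```python
import math

# Midpoint Riemann sums for ‖u‖_2 and ‖u'‖_2 with u(x) = sin(πx) on (0,1).
N = 20000
h = 1.0 / N
xs = [(i + 0.5) * h for i in range(N)]
u_norm = math.sqrt(sum(math.sin(math.pi * x) ** 2 for x in xs) * h)
du_norm = math.sqrt(sum((math.pi * math.cos(math.pi * x)) ** 2 for x in xs) * h)
ratio = u_norm / du_norm  # should be 1/π
```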

3.9.5 Estimate for W 1, p

The next theorem concerns with the generalization of Theorem 3.9.7.



Theorem 3.9.9 (Sobolev's Inequality) Let Ω be open, bounded, and of class C¹ in ℝⁿ, and let 1 ≤ p < n. If u ∈ W^{1,p}(Ω), then u ∈ L^q(Ω) for all 1 ≤ q ≤ p∗ and

‖u‖_{L^q(Ω)} ≤ C ‖u‖_{W^{1,p}(Ω)},

where C = C(p, q, n, Ω).

Proof We proceed the same way as in the previous inequality. The details are left to the reader as an exercise. □

3.9.6 The Case p = n

All the above results hold for p < n. For the borderline case p = n, the Sobolev exponent formally becomes p∗ = ∞. In fact, we have the following:

Theorem 3.9.10 (Sobolev's Inequality) If u ∈ W^{1,n}(ℝⁿ), n ≥ 2, then u ∈ L^q(ℝⁿ) for all n ≤ q < ∞ and

‖u‖_{L^q(ℝⁿ)} ≤ C ‖u‖_{W^{1,n}(ℝⁿ)}.

Proof Substitute p = α = n > 1 in (3.9.5) to obtain

( ∫_{ℝⁿ} |u(x)|^{n²/(n−1)} dx )^{(n−1)/n} ≤ n ( ∫_{ℝⁿ} |u|^n dx )^{(n−1)/n} ( ∫_{ℝⁿ} |Du|^n dx )^{1/n},

which implies

‖u‖^n_{L^{r₁}(ℝⁿ)} ≤ n ‖u‖^{n−1}_{L^n(ℝⁿ)} · ‖Du‖_{L^n(ℝⁿ)},

where r₁ = n²/(n−1). We apply Young's inequality with p = n/(n−1) and q = n. This gives

‖u‖^n_{L^{r₁}(ℝⁿ)} ≤ n ( ((n−1)/n) ‖u‖^n_{L^n(ℝⁿ)} + (1/n) ‖Du‖^n_{L^n(ℝⁿ)} )
≤ n ( ‖u‖^n_{L^n(ℝⁿ)} + ‖Du‖^n_{L^n(ℝⁿ)} ).

Now, take the power 1/n of both sides and make use of the equivalent norms:

‖u‖_{L^{r₁}(ℝⁿ)} ≤ n^{1/n} ( ‖u‖^n_{L^n(ℝⁿ)} + ‖Du‖^n_{L^n(ℝⁿ)} )^{1/n} ≤ C ‖u‖_{W^{1,n}(ℝⁿ)}. (3.9.7)

Now, we have 1 < n < r₁, so we apply the interpolation inequality for all q ∈ [n, r₁] and make use of (3.9.7):

‖u‖_{L^q(ℝⁿ)} ≤ ‖u‖^θ_{L^{r₁}(ℝⁿ)} · ‖u‖^{1−θ}_{L^n(ℝⁿ)}
≤ C ‖u‖^θ_{W^{1,n}(ℝⁿ)} · ‖u‖^{1−θ}_{W^{1,n}(ℝⁿ)}
= C ‖u‖_{W^{1,n}(ℝⁿ)}.

We can repeat the same argument for p = n and α = n + 1. This will also give us the same estimate

‖u‖_{L^q(ℝⁿ)} ≤ C ‖u‖_{W^{1,n}(ℝⁿ)}

for all q ∈ [n + 1, r₂], where

r₂ = n(n+1)/(n−1).

Repeating the argument successively for α = n + 2, n + 3, . . . , by induction we obtain the same estimate for all q ≥ n, and r_k −→ ∞ as k → ∞. This completes the proof. □

3.9.7 Holder Spaces

We will discuss some inequalities that connect Sobolev spaces to Holder spaces.
Thus, we need to review some facts about Holder spaces.
Definition 3.9.11 (Holder-Continuous Function) A function u : Ω −→ ℝ is called Holder continuous with exponent β ∈ (0, 1] if there exists a constant C > 0 such that for all x, y ∈ Ω,

|u(x) − u(y)| ≤ C |x − y|^β.

It is obvious that a Holder continuous function with β = 1 is Lipschitz continuous, and a βth Holder continuous function is uniformly continuous for any β > 0. It can be easily seen that for β > 1 the function is constant. One interesting property of Holder functions, which can be concluded from the definition, is that for x ≠ y,

[u]_{0,β} = sup_{x≠y} |u(x) − u(y)| / |x − y|^β < ∞.

In general, we write

‖u‖_{k,β} = Σ_{|α|≤k} ‖D^α u‖_∞ + Σ_{|α|=k} [D^α u]_{0,β}.

This will be used to characterize Holder functions in their spaces.


Definition 3.9.12 (Holder Space) Let 0 ≤ β < 1. Then, the Holder space is the
normed space

C k,β () = {u :  −→ R such that u ∈ C k () and [u]k,β < ∞}.

Here, it is important to note that a function u ∈ C k,β () doesn’t necessarily imply
that u is βth Holder continuous, but only its kth partial derivative is a βth Holder
continuous. It can be shown that the space C k,β () endowed with the norm [u]k,β is
a Banach space.
The reason for studying this type of spaces in this section is that for relatively
high values of p ( p > n), the Sobolev function tends to embed in a Holder space.
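A small numerical illustration (not from the text): u(x) = √x on [0, 1] is Holder continuous with exponent β = 1/2 and seminorm [u]_{0,1/2} = 1, attained against y = 0.

```python
import math

# Approximate [u]_{0,β} = sup |u(x) − u(y)| / |x − y|^β on a grid,
# for u(x) = sqrt(x) and β = 1/2.
N = 200
xs = [i / N for i in range(N + 1)]
seminorm = max(
    abs(math.sqrt(x) - math.sqrt(y)) / abs(x - y) ** 0.5
    for x in xs for y in xs if x != y
)
```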

3.9.8 The Case p > n

This case will be illustrated through what is known as Morrey's inequality. The
next lemma is useful in proving the inequality.
Lemma 3.9.13 Let u ∈ C_c¹(ℝⁿ) and p > n. Then
(1) For all r > 0, we have, for some constant C₁ = C(n) > 0,

(1/|B_r(x)|) ∫_{B_r(x)} |u(y) − u(x)| dy ≤ C₁ ∫_{B_r(x)} |Du(y)| / |x − y|^{n−1} dy.

(2) If q is the Holder conjugate of p (i.e., q = p/(p−1)), then for some constant C₂ = C(n, p) we have

∫_{B_r(x)} 1 / |x − y|^{(n−1)q} dy = C₂ · r^{(p−n)/(p−1)}.

Proof Let y = x + rv for x, y ∈ ℝⁿ, where 0 ≤ t ≤ r = |x − y| and v is a unit vector, so

v ∈ S₁^{n−1}(0) = S.

Then

|u(x + tv) − u(x)| = | ∫₀ᵗ Du(x + τv) · v dτ | ≤ ∫₀ᵗ |Du(x + τv)| dτ.
Integrating over the unit sphere S,

∫_S |u(x + tv) − u(x)| dS ≤ ∫₀ᵗ ∫_S |Du(x + τv)| dS dτ
= ∫₀ᵗ ∫_S (|Du(x + τv)| / τ^{n−1}) τ^{n−1} dS dτ
= ∫₀ᵗ ∫_S (|Du(x + τv)| / |x + τv − x|^{n−1}) τ^{n−1} dS dτ.

Now, substitute y = x + τv, and note that the first integral is over the (n−1)-dimensional sphere S_τ^{n−1}(x) of radius τ ≤ t. Converting to polar coordinates, given that

|S_τ^{n−1}(x)| = τ^{n−1} |S₁^{n−1}(0)|,

where |S| denotes the surface area, the above integral becomes

∫_S |u(x + tv) − u(x)| dS ≤ ∫₀ᵗ ∫_{S_τ(x)} |Du(y)| / |y − x|^{n−1} dS dτ.

This is an (n−1)-dimensional integration over the surface of radius τ followed by an integration over the radius 0 ≤ τ ≤ t, and this gives an n-dimensional integration over the whole ball

B_t(x) ⊆ B_r(x).

So we have

∫_S |u(x + tv) − u(x)| dS ≤ ∫_{B_r(x)} |Du(y)| / |y − x|^{n−1} dy.

Multiplying both sides by t^{n−1}, and then integrating both sides with respect to t from 0 to r, yields

∫₀ʳ ∫_S |u(x + tv) − u(x)| t^{n−1} dS dt ≤ ∫₀ʳ t^{n−1} ∫_{B_r(x)} |Du(y)| / |y − x|^{n−1} dy dt.

Again, the integration on the LHS is an integration over the ball B_r(x), and t^{n−1} dt can be integrated using the usual calculus rules. This gives

∫_{B_r(x)} |u(y) − u(x)| dy ≤ (rⁿ/n) ∫_{B_r(x)} |Du(y)| / |y − x|^{n−1} dy.

Note that the coefficient

rⁿ/n = C(r, n)

depends also on r. To eliminate r, we divide both sides of the inequality by the volume of the n-dimensional ball

|B_r(x)| = π^{n/2} rⁿ / Γ(n/2 + 1),

where Γ is the gamma function. This gives the inequality in (1) with

C₁ = Γ(n/2 + 1) / (n π^{n/2}) = C(n).
To prove (2), let r = |x − y|. Then using polar coordinates, by letting ρ = |x − y| and dy = ρ^{n−1} dρ, we obtain

∫_{B_r(x)} 1 / |y − x|^{(n−1)p/(p−1)} dy = ∫₀ʳ ρ^{(1−n)p/(p−1)} ρ^{n−1} dρ
= ∫₀ʳ ρ^{(1−n)/(p−1)} dρ = ((p−1)/(p−n)) · r^{(p−n)/(p−1)}.

Inequality (2) is thus established with

C₂ = (p−1)/(p−n) = C(n, p). □

Theorem 3.9.14 (Morrey's Inequality) Let p ∈ (n, ∞]. If u ∈ C_c¹(ℝⁿ), then

‖u‖_{C^{0,β}(ℝⁿ)} ≤ C ‖u‖_{W^{1,p}(ℝⁿ)},

where β = 1 − n/p.
Proof We will only prove the case n < p < ∞; the case p = ∞ is left to the reader as an exercise (see Problem 3.11.44). The inclusion u ∈ C_c¹(ℝⁿ) ⊂ W^{1,p}(ℝⁿ) is clear, so we just need to prove the estimate. Let x ∈ ℝⁿ, and let |B₁(x)| denote the volume of the unit ball centered at x. Then we have

|u(x)| = (1/|B₁(x)|) ∫_{B₁(x)} |u(x)| dy,

and using the triangle inequality and Lemma 3.9.13(1) with r = 1 gives

|u(x)| ≤ (1/|B₁(x)|) ( ∫_{B₁(x)} |u(x) − u(y)| dy + ∫_{B₁(x)} |u(y)| dy )
≤ C₁ ∫_{B₁(x)} |Du(y)| / |x − y|^{n−1} dy + (1/|B₁(x)|) ∫_{B₁(x)} |u(y)| dy
≤ C₁ ( ∫_{B₁(x)} |Du(y)| / |x − y|^{n−1} dy + ∫_{B₁(x)} |u(y)| dy ).

Note that u ∈ C_c¹(ℝⁿ), so Du ∈ L^p(ℝⁿ). By the nested inequality,

∫_{B₁} |u(y)| dy ≤ C ‖u‖_{L^p(ℝⁿ)},

and in view of Lemma 3.9.13(2),

|x − y|^{1−n} ∈ L^q(B₁(x)),

where q = p′ = p/(p−1). Hence, we apply Holder's inequality to obtain

|u(x)| ≤ C₁ ( ∫_{B₁} |Du(y)|^p dy )^{1/p} · ( ∫_{B₁} |x − y|^{(1−n)p/(p−1)} dy )^{(p−1)/p} + C₁ ‖u‖_{L^p(ℝⁿ)}.

From Lemma 3.9.13(2), with r = 1,

( ∫_{B₁} |x − y|^{(1−n)p/(p−1)} dy )^{(p−1)/p} = (C₂)^{(p−1)/p}.

Substituting this value in the RHS of the above inequality,

|u(x)| ≤ C₁ C₂^{(p−1)/p} ‖Du‖_{L^p(ℝⁿ)} + C₁ ‖u‖_{L^p(ℝⁿ)} ≤ C₃ ‖u‖_{W^{1,p}(ℝⁿ)},

where C₃ = C₁ C₂^{(p−1)/p}. Taking the supremum over all x ∈ ℝⁿ,

‖u‖_{L^∞(ℝⁿ)} ≤ C₃ ‖u‖_{W^{1,p}(ℝⁿ)}. (3.9.8)



Now, let $x, y \in \mathbb{R}^n$ and $r = |x-y| > 0$. Note that
$$|B_r(x)| = r^n\,|B_1(x)| = 2^n\left(\frac{r}{2}\right)^{\!n}|B_1(x)| = 2^n\,\big|B_{r/2}(x)\big|.$$
Define the open set $N = B_r(x) \cap B_r(y)$. Letting $z = \dfrac{x+y}{2}$, then it is clear that
$$\big|B_{r/2}(x)\big| = \big|B_{r/2}(z)\big| \le |N| < |B_r(x)| = |B_r(y)|.$$

We can write
$$|u(x)-u(y)| \le |u(x)-u(z)| + |u(z)-u(y)|.$$
Then
$$\begin{aligned}
|u(x)-u(y)| &\le \frac{1}{|N|}\left(\int_{B_r(x)}|u(x)-u(z)|\,dz + \int_{B_r(y)}|u(z)-u(y)|\,dz\right)\\
&\le \frac{1}{\big|B_{r/2}(z)\big|}\left(\int_{B_r(x)}|u(x)-u(z)|\,dz + \int_{B_r(y)}|u(z)-u(y)|\,dz\right)\\
&= \frac{2^n}{|B_r(x)|}\left(\int_{B_r(x)}|u(x)-u(z)|\,dz + \int_{B_r(y)}|u(z)-u(y)|\,dz\right)\\
&= 2^{n+1}\,\frac{1}{|B_r(x)|}\int_{B_r(x)}|u(x)-u(z)|\,dz.
\end{aligned}$$

Again using Lemma 3.9.13 and then Hölder's inequality,
$$\begin{aligned}
|u(x)-u(y)| &\le 2^{n+1} C_1 \int_{B_r(x)}\frac{|Du(z)|}{|x-z|^{n-1}}\,dz\\
&\le 2^{n+1} C_1\,\|Du\|_{L^p(\mathbb{R}^n)}\left(\int_{B_r(x)}|x-z|^{(1-n)p/(p-1)}\,dz\right)^{(p-1)/p}\\
&= 2^{n+1} C_1 C_2^{(p-1)/p}\,\|Du\|_{L^p(\mathbb{R}^n)}\left[r^{(p-n)/(p-1)}\right]^{(p-1)/p}.
\end{aligned}$$

So we obtain
$$|u(x)-u(y)| \le C\,\|Du\|_{L^p(\mathbb{R}^n)}\, r^{\beta}, \tag{3.9.9}$$
where $C = 2^{n+1}C_3 = 2^{n+1}C_1C_2^{(p-1)/p}$. Now, dividing both sides of (3.9.9) by $r^\beta = |x-y|^\beta$ gives
$$\frac{|u(x)-u(y)|}{|x-y|^\beta} \le C\,\|Du\|_{L^p(\mathbb{R}^n)} < \infty.$$

Since $x$ and $y$ are arbitrary, we take the supremum over all $x, y \in \mathbb{R}^n$, $x \ne y$:
$$[u]_{0,\beta} \le C\,\|Du\|_{L^p(\mathbb{R}^n)},$$
and using (3.9.8), we obtain
$$\|u\|_{0,\beta} = \|u\|_\infty + [u]_{0,\beta} \le C_3\,\|u\|_{W^{1,p}(\mathbb{R}^n)} + C\,\|Du\|_{L^p(\mathbb{R}^n)} \le C\,\|u\|_{W^{1,p}(\mathbb{R}^n)},$$
and therefore $u \in C^{0,\beta}(\mathbb{R}^n)$ and
$$\|u\|_{C^{0,\beta}(\mathbb{R}^n)} \le C\,\|u\|_{W^{1,p}(\mathbb{R}^n)},$$
where
$$C = 2^{n+1}\,\frac{\Gamma\!\left(\frac{n}{2}+1\right)}{n\,\pi^{n/2}}\left(\frac{p-1}{p-n}\right)^{\frac{p-1}{p}}. \qquad \square$$
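To illustrate quantitatively what membership in $C^{0,\beta}$ means, here is a small numerical sketch (the function and sample points are our own choices, on $\mathbb{R}$): for $u(x) = |x|^{\beta}$ with $0 < \beta < 1$, the Hölder quotient $|u(x)-u(y)|/|x-y|^{\beta}$ stays bounded, while replacing $\beta$ by any larger exponent makes the quotient blow up as $|x-y| \to 0$.

```python
def holder_quotient(u, x, y, beta):
    """The Hoelder quotient |u(x) - u(y)| / |x - y|^beta."""
    return abs(u(x) - u(y)) / abs(x - y) ** beta

beta = 0.5
u = lambda x: abs(x) ** beta

# Quotient with the matching exponent stays bounded (by 1 for this u) ...
pairs = [(0.0, 10.0 ** (-k)) for k in range(1, 10)]
qs = [holder_quotient(u, x, y, beta) for x, y in pairs]
print(max(qs))  # 1.0

# ... while a larger exponent makes it blow up as |x - y| -> 0:
qs_bad = [holder_quotient(u, x, y, 0.75) for x, y in pairs]
print(qs_bad[0], qs_bad[-1])  # grows like |x - y|^(-0.25)
```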
Morrey’s inequality holds for Rn . We can, however, generalize it to hold for any
subset of Rn that satisfies the hypotheses of Theorem 3.8.10, thanks to the extension
operator.

Theorem 3.9.15 If $\Omega \subset \mathbb{R}^n$ is bounded, open, and of $C^1$-class, then $W^{1,p}(\Omega) \subset C^{0,\beta}(\bar{\Omega})$ and
$$\|u\|_{C^{0,\beta}(\bar{\Omega})} \le C\,\|u\|_{W^{1,p}(\Omega)},$$
for $p \in (n, \infty]$ and $\beta = 1 - \dfrac{n}{p}$.

Proof This follows from the Extension Theorem, the Meyers–Serrin Theorem, and Morrey's inequality. The details are left to the reader (see Problem 3.11.45). $\square$

3.9.9 General Sobolev Inequalities

We establish some inequalities and estimates for general Sobolev spaces W k, p .


Theorem 3.9.16 Let $\Omega$ be open, $C^1$, and bounded in $\mathbb{R}^n$, and let $u \in W^{k,p}(\Omega)$.
(1) If $k < \dfrac{n}{p}$ then $u \in L^q(\Omega)$ where $\dfrac{1}{q} = \dfrac{1}{p} - \dfrac{k}{n}$, and
$$\|u\|_{L^q(\Omega)} \le C\,\|u\|_{W^{k,p}(\Omega)}.$$
(2) If $k = \dfrac{n}{p}$ then $u \in L^q(\Omega)$ for all $1 \le q < \infty$, and
$$\|u\|_{L^q(\Omega)} \le C\,\|u\|_{W^{k,p}(\Omega)}.$$
(3) If $k > \dfrac{n}{p}$ then $u \in C^{k-m-1,\beta}(\bar{\Omega})$ where
$$m = \left\lfloor \frac{n}{p} \right\rfloor, \qquad \beta = \begin{cases} m + 1 - \dfrac{n}{p}, & \dfrac{n}{p} \notin \mathbb{N},\\[4pt] \theta, & \dfrac{n}{p} \in \mathbb{N}, \end{cases}$$
for some $\theta$ with $0 < \theta < 1$, and
$$\|u\|_{C^{k-m-1,\beta}(\bar{\Omega})} \le C\,\|u\|_{W^{k,p}(\Omega)}.$$

Proof To prove (1), note that $D^\alpha u \in W^{1,p}(\Omega)$ for all $|\alpha| \le k-1$, and so by GNS
$$\|D^\alpha u\|_{L^{p^*}(\Omega)} \le C\,\|D^\alpha u\|_{W^{1,p}(\Omega)} \le C\,\|u\|_{W^{k,p}(\Omega)},$$
where $\dfrac{1}{p^*} = \dfrac{1}{p} - \dfrac{1}{n}$. So $u \in W^{k-1,p^*}(\Omega)$ and
$$\|u\|_{W^{k-1,p^*}(\Omega)} \le C_1\,\|u\|_{W^{k,p}(\Omega)}.$$
We repeat the same argument for $u \in W^{k-1,p^*}(\Omega)$ so that we get $u \in W^{k-2,p^{**}}(\Omega)$ and
$$\|u\|_{W^{k-2,p^{**}}(\Omega)} \le C_2\,\|u\|_{W^{k-1,p^*}(\Omega)},$$
where
$$\frac{1}{p^{**}} = \frac{1}{p^*} - \frac{1}{n} = \frac{1}{p} - \frac{1}{n} - \frac{1}{n} = \frac{1}{p} - \frac{2}{n}.$$
Continue repeating this process $k-1$ times until we obtain $u \in W^{0,q}(\Omega) = L^q(\Omega)$, where
$$\frac{1}{q} = \frac{1}{p} - \frac{k}{n},$$
and for some $C = C_1 C_2 \cdots C_k$, we have
$$\|u\|_{L^q(\Omega)} \le C\,\|u\|_{W^{k,p}(\Omega)}.$$

To prove (2), let $k = \dfrac{n}{p}$, so that $u \in W^{k,p}(\Omega)$. For $p = \dfrac{n}{k}$, there is $p' < p$ such that $k < \dfrac{n}{p'}$, so $W^{k,p}(\Omega) \subset W^{k,p'}(\Omega)$, and we apply (1) for this chosen $p'$ to obtain $q' > q$ such that $W^{k,p'}(\Omega) \subset L^{q'}(\Omega)$; the result follows from the combination of the above two inclusions together with the nested inequality (Theorem 3.9.3).
For (3), we only prove the first case. Let $u \in W^{k,p}(\Omega)$ where $k > \dfrac{n}{p}$ and $\dfrac{n}{p} \notin \mathbb{N}$. Let $m = \left\lfloor \dfrac{n}{p} \right\rfloor$. Then we clearly have $m < \dfrac{n}{p} < m+1$.
Applying the same argument as in case (1), we obtain $u \in W^{k-m,r}(\Omega)$ and
$$\|u\|_{W^{k-m,r}(\Omega)} \le C\,\|u\|_{W^{k,p}(\Omega)},$$
where $\dfrac{1}{r} = \dfrac{1}{p} - \dfrac{m}{n}$.
But this implies that $D^\alpha u \in W^{1,r}(\Omega)$ for all $|\alpha| \le k-m-1$. Moreover, note that $\dfrac{n}{p} < m+1$, so we have $r > n$, and using Morrey's inequality, we conclude that
$$D^\alpha u \in C^{0,\beta}(\bar{\Omega})$$
and
$$\|D^\alpha u\|_{C^{0,\beta}(\bar{\Omega})} \le C\,\|D^\alpha u\|_{W^{1,r}(\Omega)} \le C\,\|u\|_{W^{k,p}(\Omega)},$$
where
$$\beta = 1 - \frac{n}{r} = 1 - \frac{n}{p} + m.$$
Since $D^\alpha u \in C^{0,\beta}(\bar{\Omega})$ for all $|\alpha| \le k-m-1$, we must have $u \in C^{k-m-1,\beta}(\bar{\Omega})$ and
$$\|u\|_{C^{k-m-1,\beta}(\bar{\Omega})} \le C\,\|u\|_{W^{k-m,r}(\Omega)}.$$
If $\dfrac{n}{p} \in \mathbb{N}$, then letting $m = \dfrac{n}{p} - 1$, a similar argument shows that $u \in W^{k-m,r}(\Omega)$, and we proceed as above. $\square$
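The exponent bookkeeping in this proof is purely arithmetic and easy to mechanize. Below is an illustrative sketch (the function name and sample values are ours) that iterates $\frac{1}{p_j} = \frac{1}{p_{j-1}} - \frac{1}{n}$ and recovers $\frac{1}{q} = \frac{1}{p} - \frac{k}{n}$ after $k$ steps, using exact rational arithmetic.

```python
from fractions import Fraction

def sobolev_conjugate_chain(p, n, k):
    """Iterate 1/p_j = 1/p_{j-1} - 1/n, k times (all steps must stay subcritical)."""
    inv = Fraction(1, p)
    chain = [inv]
    for _ in range(k):
        inv = inv - Fraction(1, n)
        assert inv > 0, "exponent left the subcritical range"
        chain.append(inv)
    return chain

# Example: W^{3,2} on R^10: after 3 steps, 1/q = 1/2 - 3/10 = 1/5, i.e. q = 5.
chain = sobolev_conjugate_chain(p=2, n=10, k=3)
print([str(c) for c in chain])  # ['1/2', '2/5', '3/10', '1/5']
q = 1 / chain[-1]
print(q)  # 5
```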

3.10 Embedding Theorems

3.10.1 Compact Embedding

Recall that in basic functional analysis, the concept of "embedding" was introduced to facilitate the study of dual spaces and reflexivity. These embeddings connect Sobolev spaces with the theory of PDEs because they provide information on the relations between weak differentiability and integrability, and ultimately, on the regularity of the solutions, which plays a critical role in the theory of PDEs, and demonstrates the fact that Sobolev spaces are, in many cases, the perfect spaces to deal with when searching for solutions to PDEs due to their nice integrability properties. Here, we recall the definition again.
Definition 3.10.1 (Embedding) Let $X$ and $Y$ be two Banach spaces with norms $\|\cdot\|_X$ and $\|\cdot\|_Y$ respectively, and let $\varphi : X \longrightarrow Y$ be a mapping. If $\varphi$ is an isometric injection, then $\varphi$ is said to be an "embedding", and is written as $\varphi : X \hookrightarrow Y$ (and sometimes $X \subset\subset Y$).
If we consider the map $\imath : X \hookrightarrow Y$, $\imath(x) = x$ with $X \subset Y$, then $\imath$ is called the inclusion map. In general, the map $\imath$ is called an embedding in the sense that it embeds (or sticks) $X$ inside $Y$, and we can think of the elements of $X$ as if they are in $Y$, or say that $Y$ contains an isomorphic copy of $X$.
If this map is bounded, then we have more to say about this type of embedding. Recall that a linear operator is bounded if and only if it is continuous, and so the inclusion map $\imath : X \hookrightarrow Y$ is continuous if there exists a constant $C$ such that
$$\|\imath(x)\|_Y = \|x\|_Y \le C\,\|x\|_X$$
for every $x \in X$. The equality above is due to the isometry of $\imath$. In other words, if $\|x\|_X < \infty$ (i.e. $x \in (X, \|\cdot\|_X)$) then $\|x\|_Y < \infty$ (i.e. $x \in (Y, \|\cdot\|_Y)$). This embedding map is continuous, and in this case we say that $X$ is continuously embedded into $Y$. It is important to note that when we say that $X$ is embedded in $Y$ and $x \in X$, we don't necessarily mean that $x \in Y$, but rather that there is a representative element in $Y$, say $y$, such that $x = y$ a.e.
In the previous section we established some important estimates between Sobolev spaces and other Banach spaces (Lebesgue or Hölder). This gives rise to inclusion and embedding results. In view of the preceding estimates, we have the following continuous embeddings.
Theorem 3.10.2 All the following inclusions are continuous:
(1) If $1 \le p < n$ then
$$W^{1,p}(\mathbb{R}^n) \subset L^{p^*}(\mathbb{R}^n).$$
Moreover, if $p \le q \le p^*$ then
$$W^{1,p}(\mathbb{R}^n) \subset L^q(\mathbb{R}^n).$$
(2) Let $\Omega$ be open, $C^1$, and bounded in $\mathbb{R}^n$. If $1 \le p < n$ then
$$W^{1,p}(\Omega) \subset L^q(\Omega)$$
for all $1 \le q \le p^*$.
(3) If $n < p \le \infty$, then
$$W^{1,p}(\mathbb{R}^n) \subset L^\infty(\mathbb{R}^n).$$
(4) If $n < p \le \infty$, then
$$W^{1,p}(\mathbb{R}^n) \subset C^{0,\beta}(\mathbb{R}^n),$$
where $\beta = 1 - \dfrac{n}{p}$.
(5) Let $\Omega$ be open, $C^1$, and bounded in $\mathbb{R}^n$. If $n < p \le \infty$, then
$$W^{1,p}(\Omega) \subset L^\infty(\Omega)$$
and $W^{1,p}(\Omega) \subset C^{0,\beta}(\bar{\Omega})$, where $\beta = 1 - \dfrac{n}{p}$.
The theorem is an immediate consequence of the estimates established in the previous section. Note that all the above inclusions are continuous; i.e., for all $1 \le p < n$ the space $W^{1,p}(\mathbb{R}^n)$ is continuously embedded in $L^{p^*}(\mathbb{R}^n)$ and in $L^q(\mathbb{R}^n)$ for all $q \in [p, p^*]$, and for all $n < p \le \infty$ it is continuously embedded in $C^{0,\beta}(\mathbb{R}^n)$, which in turn is embedded in $C_b(\mathbb{R}^n)$. The condition $n < p$ in (3) and (4) is sharp (see Problem 3.11.50).
One of the interesting properties of these continuous embeddings is that any
Cauchy sequence in X is Cauchy in Y, and any convergent sequence in X is con-
vergent in Y . A more interesting type of embedding is what is known as compact
embedding, where the inclusion operator is not only bounded, but also compact. Here
is the definition.
Definition 3.10.3 (Compact Embedding) Let $X$ and $Y$ be two Banach spaces with norms $\|\cdot\|_X$ and $\|\cdot\|_Y$ respectively. If the inclusion mapping is a compact operator, then it is called a compact embedding. This is denoted by $X \overset{c}{\hookrightarrow} Y$.
Compactness here is sequential: for every bounded sequence $\{x_k\}$ in $X$, the sequence $\{\varphi(x_k)\}$ has a convergent subsequence in $Y$. So one simple way to show that an embedding $X \hookrightarrow Y$ is not compact is to find an example of a bounded sequence in $X$ with no convergent subsequence in $Y$ (see Problem 3.11.57).
The next theorem, due to Rellich and Kondrachov, is a powerful tool in establishing
compactness property. Rellich proved the result in 1930 for the case p = q = 2, and
Kondrachov generalized it in 1945 to p, q ≥ 1.

3.10.2 Rellich–Kondrachov Theorem

An important example to which the theorem can be applied is the convolution approximating sequence $u_\epsilon = \varphi_\epsilon * u$.
Lemma 3.10.4 If $(u_m) \in W^{1,p}(\mathbb{R}^n)$ with compact support $K$, then
$$\lim_{\epsilon \to 0}\ \|(u_m)_\epsilon - u_m\|_{L^1(K)} = 0$$
uniformly in $m$, and for each fixed $\epsilon$, the sequence
$$(u_m)_\epsilon = \varphi_\epsilon * u_m$$
is uniformly bounded and equicontinuous in $C(K)$, and consequently in $L^1(K)$.

Proof We have
$$\begin{aligned}
(u_m)_\epsilon(x) - u_m(x) &= \int_{\mathbb{R}^n} \varphi_\epsilon(x-y)\,(u_m(y) - u_m(x))\,dy\\
&= \frac{1}{\epsilon^n}\int_{\mathbb{R}^n} \varphi\!\left(\frac{x-y}{\epsilon}\right)(u_m(y) - u_m(x))\,dy.
\end{aligned}$$
Using the substitution $z = \dfrac{x-y}{\epsilon}$ in the above integral, and also using the fundamental theorem of calculus on $u_m$,
$$(u_m)_\epsilon(x) - u_m(x) = -\epsilon\int_{B_1(0)}\int_0^1 \varphi(z)\, Du_m(x - \epsilon t z)\cdot z\,dt\,dz.$$
It follows that
$$\int_K |(u_m)_\epsilon(x) - u_m(x)|\,dx \le \epsilon \int_{B_1(0)}\int_0^1\int_K \varphi(z)\,|Du_m(x - \epsilon t z)|\,dx\,dt\,dz \le \epsilon \int_K |Du_m(w)|\,dw < \infty.$$
Thus
$$\|(u_m)_\epsilon - u_m\|_{L^1(K)} \le \epsilon\,\|Du_m\|_{L^1(K)} \le \epsilon M.$$
Now, taking the supremum over all $m$,
$$\lim_{\epsilon \to 0}\ \sup_m\ \|(u_m)_\epsilon - u_m\|_{L^1(K)} = 0, \tag{3.10.1}$$



which proves the first part. Moreover, fix $\epsilon > 0$. Then
$$|(u_m)_\epsilon(x)| \le \int_{\mathbb{R}^n}\varphi_\epsilon(x-y)\,|u_m(y)|\,dy \le \|\varphi_\epsilon\|_\infty\,\|u_m\|_{L^1(K)} \le \frac{C}{\epsilon^n}. \tag{3.10.2}$$
Taking the supremum over all $m$,
$$\|(u_m)_\epsilon\|_\infty = \sup_m |(u_m)_\epsilon(x)| \le \frac{C}{\epsilon^n},$$
and hence the sequence $(u_m)_\epsilon$ is uniformly bounded. Similarly for $D(u_m)_\epsilon$,
$$|D(u_m)_\epsilon(x)| \le \int_{\mathbb{R}^n}|D\varphi_\epsilon(x-y)|\,|u_m(y)|\,dy \le \|D\varphi_\epsilon\|_\infty\,\|u_m\|_{L^1(K)} \le \frac{C}{\epsilon^{n+1}}. \tag{3.10.3}$$
It follows that
$$\|D(u_m)_\epsilon\|_\infty = \sup_m |D(u_m)_\epsilon(x)| \le \frac{C}{\epsilon^{n+1}}.$$
Now let $\eta > 0$, and let $|x - y| < \delta$ for some $\delta > 0$. Then
$$|(u_m)_\epsilon(y) - (u_m)_\epsilon(x)| \le \int_0^1 |D(u_m)_\epsilon(x + t(y-x))|\,|y-x|\,dt \le \frac{C}{\epsilon^{n+1}}\,\delta = \eta,$$
for $\delta = \dfrac{\eta\,\epsilon^{n+1}}{C}$. Therefore, $(u_m)_\epsilon$ is equicontinuous in $C(K)$, which, in turn, is continuously embedded in $L^1(K)$. $\square$
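The estimate $\|(u_m)_\epsilon - u_m\|_{L^1} \le \epsilon\,\|Du_m\|_{L^1}$ can be watched numerically. The sketch below is an illustrative side computation (the bump profile, the hat function $u$, and all grid sizes are our own choices): it mollifies $u(x) = \max(0, 1-|x|)$ on $\mathbb{R}$ by Riemann sums and reports the $L^1$ error for shrinking $\epsilon$.

```python
import math

def bump(z):
    """Standard mollifier profile, supported in (-1, 1)."""
    return math.exp(-1.0 / (1.0 - z * z)) if abs(z) < 1 else 0.0

M = 1000                                  # quadrature points for the z-integral
zs = [-1 + (i + 0.5) * (2.0 / M) for i in range(M)]
Z = sum(bump(z) for z in zs) * (2.0 / M)  # normalization so the mollifier integrates to 1

def mollify(u, x, eps):
    """(phi_eps * u)(x) = integral of phi(z) u(x - eps z) dz, with phi = bump / Z."""
    return sum(bump(z) * u(x - eps * z) for z in zs) * (2.0 / M) / Z

u = lambda x: max(0.0, 1.0 - abs(x))      # hat function; |Du| = 1 on (-1, 1)

def l1_error(eps, N=200):
    """Riemann sum of |u_eps - u| over [-2, 2]."""
    h = 4.0 / N
    return sum(abs(mollify(u, -2 + (i + 0.5) * h, eps) - u(-2 + (i + 0.5) * h))
               for i in range(N)) * h

for eps in (0.4, 0.2, 0.1):
    print(eps, l1_error(eps))  # errors shrink with eps, within the bound eps * ||Du||_L1 = 2 * eps
```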

The previous lemma will be used next to prove a fundamental compact embedding
result: Rellich–Kondrachov Theorem, which states that the inclusion in
Theorem 3.10.2(2) is not only continuous, but also compact.
Theorem 3.10.5 (Rellich–Kondrachov Theorem) Let $\Omega$ be open, $C^1$, and bounded in $\mathbb{R}^n$. If $1 \le p < n$ then
$$W^{1,p}(\Omega) \overset{c}{\hookrightarrow} L^q(\Omega)$$
for all $1 \le q < p^*$.

Proof Theorem 3.10.2(2) already established that the inclusion is continuous, so we only need to prove compactness. Consider a bounded sequence $u_m \in W^{1,p}(\Omega)$, for $1 \le p < n$. By Theorem 3.8.10, there exists an extension $Eu_m$, still denoted by $u_m$, such that $u_m \in W^{1,p}(\mathbb{R}^n)$ and $\operatorname{supp}(u_m) \subseteq \Omega^*$. Consider the convolution approximating sequence $(u_m)_\epsilon$. By (3.10.1) we have
$$\lim_{\epsilon \downarrow 0}\ \sup_{m \in \mathbb{N}}\ \|(u_m)_\epsilon - u_m\|_{L^1(\Omega^*)} = 0.$$
Since $1 \le q < p^*$, choose $\theta \in (0,1)$ such that
$$\frac{1}{q} = \frac{\theta}{p^*} + \frac{1-\theta}{1}.$$
Then by the interpolation inequality (Theorem 3.9.4), for any $q \in [1, p^*]$ we have
$$\|(u_m)_\epsilon - u_m\|_{L^q(\Omega^*)} \le \|(u_m)_\epsilon - u_m\|_{L^{p^*}(\Omega^*)}^{\theta} \cdot \|(u_m)_\epsilon - u_m\|_{L^1(\Omega^*)}^{1-\theta}.$$
Hence
$$\lim_{\epsilon \downarrow 0}\ \sup_{m \in \mathbb{N}}\ \|(u_m)_\epsilon - u_m\|_{L^q(\Omega)} = 0,$$
and so $(u_m)_\epsilon$ converges to $u_m$ in $L^q(\Omega)$ as $\epsilon \longrightarrow 0$, and similarly (3.10.2) and (3.10.3) still hold. By Lemma 3.10.4, for fixed $\epsilon > 0$, $(u_m)_\epsilon$ is uniformly bounded and equicontinuous in $C(\overline{\Omega^*})$, and consequently in $L^q(\Omega^*)$. Hence, the sequence satisfies the hypotheses of the Arzelà–Ascoli Theorem (Theorem 1.3.4), which implies that there exists a uniformly convergent subsequence $\{(u_{m_k})_\epsilon\}$ converging to $u_{m_k}$ in $L^q(\Omega)$, so $\{(u_{m_k})_\epsilon\}$ is Cauchy in $L^q(\Omega^*)$.
Therefore, the sequence $\{(u_{m_k})_\epsilon\}$ has the following two properties: for any fixed $k \in \mathbb{N}$ there exists $\epsilon = \epsilon_k$ such that
$$\|(u_{m_k})_{\epsilon_k} - u_{m_k}\|_{L^q(\Omega^*)} \le \frac{1}{k}.$$
Also, for every $\epsilon > 0$ we can find $N_k$ such that for all $i, j \ge N_k$ we have
$$\|(u_{m_i})_{\epsilon_k} - (u_{m_j})_{\epsilon_k}\|_{L^q(\Omega^*)} \le \frac{1}{k}.$$
It follows that
$$\|u_{m_i} - u_{m_j}\|_{L^q(\Omega^*)} \le \|u_{m_i} - (u_{m_i})_{\epsilon_k}\|_{L^q(\Omega^*)} + \|(u_{m_i})_{\epsilon_k} - (u_{m_j})_{\epsilon_k}\|_{L^q(\Omega^*)} + \|(u_{m_j})_{\epsilon_k} - u_{m_j}\|_{L^q(\Omega^*)} < \frac{3}{k}.$$
Note that since $k$ is fixed, we cannot yet conclude that $\{u_{m_i}\}$ is Cauchy in $L^q(\Omega)$. But we can repeat the same argument above for $k+1, k+2, \ldots$, and for each choice and every $\epsilon > 0$ we obtain $i, j \ge N_{k+1} > N_k$; in order to construct the corresponding Cauchy sequence we must continue the process $i, j \longrightarrow \infty$, and in this case we need to perform Cantor's diagonalization argument to obtain a Cauchy sequence $\{u_{m_i}\}$ which is convergent in $L^q(\Omega^*)$, and due to completeness, $\{u_{m_i}\}$ converges to $u \in L^q(\Omega^*)$. $\square$
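The interpolation exponent $\theta$ in the proof solves $\frac{1}{q} = \frac{\theta}{p^*} + \frac{1-\theta}{1}$, i.e. $\theta = \frac{1 - 1/q}{1 - 1/p^*}$. A short sketch (the sample values $p^* = 6$ and $q = 4$ are ours, corresponding to $p = 2$, $n = 3$) checking that $\theta \in (0,1)$ exactly when $1 < q < p^*$:

```python
def interpolation_theta(q, p_star):
    """Solve 1/q = theta/p_star + (1 - theta) for theta."""
    return (1 - 1 / q) / (1 - 1 / p_star)

# Example: p = 2, n = 3 gives p* = 2n/(n-2) = 6; take q = 4.
p_star = 6.0
theta = interpolation_theta(4.0, p_star)
print(theta)                         # 0.9
assert 0 < theta < 1
# Verify the defining identity 1/q = theta/p* + (1 - theta):
print(theta / p_star + (1 - theta))  # 0.25 = 1/4
```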

The significance of this result stems from the fact that from every bounded sequence of functions in $W^{1,p}$ we can always extract a convergent subsequence in some $L^q$ space for a suitable $q$, which turns out to be extremely useful in applications to PDEs. Note that we required the domain $\Omega$ to be bounded, open, and $C^1$ in order to apply the extension operator. If $u \in W_0^{1,p}(\Omega)$ then by Proposition 3.8.3 we don't need this condition on $\Omega$.
Corollary 3.10.6 Let $\Omega$ be open and bounded in $\mathbb{R}^n$. If $1 \le p < n$ then
$$W_0^{1,p}(\Omega) \overset{c}{\hookrightarrow} L^q(\Omega)$$
for all $1 \le q < p^*$.

Proof Use Proposition 3.8.3 to obtain a zero extension, then proceed as in the proof of Theorem 3.10.5. $\square$

The Rellich–Kondrachov Theorem investigated the compact embedding for the


case 1 ≤ p < n. We will use it to investigate the compact embedding for the cases
p = n, and p > n.
Theorem 3.10.7 Let $\Omega$ be open, $C^1$, and bounded in $\mathbb{R}^n$.
(1) If $p = n$ then
$$W^{1,n}(\Omega) \overset{c}{\hookrightarrow} L^q(\Omega)$$
for all $1 \le q < \infty$.
(2) If $n < p \le \infty$ then
$$W^{1,p}(\Omega) \overset{c}{\hookrightarrow} C(\bar{\Omega}),$$
and consequently
$$W^{1,p}(\Omega) \overset{c}{\hookrightarrow} L^q(\Omega)$$
for $1 \le q < \infty$.

Proof (1) Let $u_m \in W^{1,n}(\Omega)$ be a bounded sequence. Since $\Omega$ is bounded, by the nested inequality $u_m$ is bounded in $W^{1,q}(\Omega)$ for all $1 \le q < n$. For a given $q \ge n$, choose $p < n$ such that $q < p^*$. Then
$$u_m \in W^{1,p}(\Omega),$$
and by the Rellich–Kondrachov Theorem, there exists $u \in L^q(\Omega)$ such that
$$u_{m_j} \longrightarrow u \quad \text{in } L^q(\Omega),$$
and this proves (1).
(2) Let $u_m \in W^{1,p}(\Omega)$ be a bounded sequence. By Morrey's inequality we have the continuous embedding
$$W^{1,p}(\Omega) \hookrightarrow C^{0,\beta}(\bar{\Omega}),$$
but it is easy to show that bounded sets in $C^{0,\beta}(\bar{\Omega})$ are uniformly bounded and equicontinuous, so by the Arzelà–Ascoli Theorem,
$$C^{0,\beta}(\bar{\Omega}) \overset{c}{\hookrightarrow} C(\bar{\Omega}) \hookrightarrow L^q(\Omega). \qquad \square$$

3.10.3 High Order Sobolev Estimates

Another important and useful consequence of Rellich–Kondrachov Theorem is the


following.
Theorem 3.10.8 Let $\Omega$ be open, $C^1$, and bounded in $\mathbb{R}^n$. If $1 \le p < n$ then
1. For all $1 \le q < p^*$, $k \ge 1$, we have
$$W^{k,p}(\Omega) \overset{c}{\hookrightarrow} W^{k-1,q}(\Omega).$$
2. For all $\dfrac{1}{q} > \dfrac{1}{p} - \dfrac{m}{n}$, $k \ge m \ge 1$, we have
$$W^{k,p}(\Omega) \overset{c}{\hookrightarrow} W^{k-m,q}(\Omega).$$

Proof (1) Let $u_j \in W^{k,p}(\Omega)$ be a bounded sequence. Then
$$D^\alpha u_j \in W^{1,p}(\Omega)$$
for all $|\alpha| \le k-1$. By the Rellich–Kondrachov Theorem, $D^\alpha u_j \in L^q(\Omega)$ for all $1 \le q < p^*$ and $D^\alpha u_j$ has a convergent subsequence, and
$$\|u_j\|_{W^{k-1,q}} \le C \sum_{|\alpha| \le k-1}\|D^\alpha u_j\|_{L^q} \le C\,\|u_j\|_{W^{k,q}},$$
so $u_j$ has a convergent subsequence in $W^{k-1,q}(\Omega)$.
For (2), note that since $m \ge 1$, we have
$$W^{k,p}(\Omega) \subseteq W^{k-m+1,p}(\Omega), \tag{3.10.4}$$
but from (1) we also have
$$W^{k-m+1,p}(\Omega) \overset{c}{\hookrightarrow} W^{k-m,q}(\Omega). \tag{3.10.5}$$
The result now follows from (3.10.4)–(3.10.5). $\square$

An immediate corollary is the following:
Corollary 3.10.9 Let $\Omega$ be open, $C^1$, and bounded in $\mathbb{R}^n$. If $1 \le p < n$ then
(1) $W^{k,p}(\Omega) \overset{c}{\hookrightarrow} W^{k-1,p}(\Omega)$.
(2) $W^{1,p}(\Omega) \overset{c}{\hookrightarrow} L^p(\Omega)$.

Proof Note that for $p < n$, we always have $p < p^*$. $\square$

3.10.4 Sobolev Embedding Theorem

Theorem 3.10.10 Let $\Omega$ be open, $C^1$, and bounded in $\mathbb{R}^n$.
(1) If $k < \dfrac{n}{p}$ then
$$W^{k,p}(\Omega) \overset{c}{\hookrightarrow} L^q(\Omega)$$
for all $q \ge 1$ such that $\dfrac{1}{q} > \dfrac{1}{p} - \dfrac{k}{n}$.
(2) If $k = \dfrac{n}{p}$ then
$$W^{k,p}(\Omega) \overset{c}{\hookrightarrow} L^q(\Omega)$$
for $1 \le q < \infty$.
(3) If $k > \dfrac{n}{p}$ then
$$W^{k,p}(\Omega) \overset{c}{\hookrightarrow} C^{0,\beta}(\bar{\Omega})$$
for $0 < \beta < \gamma$, where $\gamma = \min\left\{1,\ k - \dfrac{n}{p}\right\}$.

Proof (1) Let $u_i \in W^{k,p}(\Omega)$ be a bounded sequence. By Theorem 3.10.8(1), we have
$$W^{k,p}(\Omega) \overset{c}{\hookrightarrow} W^{k-1,q}(\Omega)$$
for all $1 \le q < p^* = p_1^*$, $k \ge 1$. Iterating the process gives
$$W^{k,p}(\Omega) \overset{c}{\hookrightarrow} W^{k-1,q}(\Omega) \overset{c}{\hookrightarrow} W^{k-2,q}(\Omega) \overset{c}{\hookrightarrow} \cdots,$$
where $\dfrac{1}{p_j^*} = \dfrac{1}{p} - \dfrac{j}{n}$, $1 \le j \le k$. After $k$ iterations we obtain
$$W^{1,p}(\Omega) \overset{c}{\hookrightarrow} W^{0,q}(\Omega) = L^q(\Omega),$$
where $q < p_k^*$ and $\dfrac{1}{p_k^*} = \dfrac{1}{p} - \dfrac{k}{n}$.
For (2), repeat the argument above $k-1$ iterations.
For (3), we use Morrey's inequality to show that every $u \in W^{k,p}(\Omega)$ is Hölder continuous. We leave the details to the reader. $\square$

3.10.5 Embedding of Fractional Sobolev Spaces

Recall that in Section 3.5 the fractional Sobolev space was defined as
$$H^s(\mathbb{R}^n) = \{u \in L^2(\mathbb{R}^n) : (1 + \|w\|^2)^{s/2}\,\hat{u}(w) \in L^2(\mathbb{R}^n)\}.$$

We will provide two compact embedding theorems for this space.
Theorem 3.10.11 Let $1 < p \le \infty$, and let $r, t > 0$ be any two positive real numbers. If $r > t$, we have the continuous inclusion
$$H^r(\mathbb{R}^n) \hookrightarrow H^t(\mathbb{R}^n).$$
Moreover, if $\Omega$ is Lip and bounded in $\mathbb{R}^n$ then the above inclusion is compact:
$$H^r(\Omega) \overset{c}{\hookrightarrow} H^t(\Omega).$$

Proof Let $u \in H^r(\mathbb{R}^n)$. Then
$$\begin{aligned}
\mathcal{F}^{-1}\{(1+\|w\|^2)^{t/2}\,\hat{u}(w)\} &= \mathcal{F}^{-1}\left\{(1+\|w\|^2)^{-\frac{r-t}{2}}\cdot(1+\|w\|^2)^{r/2}\,\hat{u}(w)\right\}\\
&= \mathcal{F}^{-1}\{(1+\|w\|^2)^{-\frac{r-t}{2}}\} * \mathcal{F}^{-1}\{(1+\|w\|^2)^{r/2}\,\hat{u}(w)\}.
\end{aligned}$$
From the hypothesis, the exponent $-\dfrac{r-t}{2} < 0$, hence
$$\mathcal{F}^{-1}\{(1+\|w\|^2)^{-\frac{r-t}{2}}\} \in L^1,$$
and since $u \in H^r(\mathbb{R}^n)$, we have
$$(1+\|w\|^2)^{r/2}\,\hat{u}(w) \in L^2,$$
which implies
$$\mathcal{F}^{-1}\{(1+\|w\|^2)^{r/2}\,\hat{u}(w)\} \in L^2,$$
and therefore $u \in H^t(\mathbb{R}^n)$.
If $\Omega$ is open, $C^1$, and bounded in $\mathbb{R}^n$, then by the extension theorem we can show that $H^r(\Omega) \hookrightarrow H^t(\Omega)$. Let $u_n \in H^r(\Omega)$ be a bounded sequence, which implies that
$$E(u_n) \in H^r(\mathbb{R}^n),$$
and define a cut-off function $\xi \in C_0^\infty$ such that $\xi = 1$ on $\Omega$. Define the sequence
$$v_n = \xi\, E(u_n).$$
Then $v_n \in H^r(\mathbb{R}^n)$ with $v_n|_\Omega = u_n$, and
$$\operatorname{supp}(v_n) \subseteq \operatorname{supp}(\xi) \subseteq K$$
for some compact set $K \supset\supset \Omega$. Extract a subsequence $v_{n_j}$ of $v_n$ which converges in $H^t(\mathbb{R}^n)$. Consider $\mathcal{F}\{v_{n_j}\}$. It is left to the reader to show that $\mathcal{F}\{v_{n_j}\}$ is uniformly bounded and equicontinuous in $H^t(\Omega)$. After that, the result follows from the Arzelà–Ascoli Theorem. $\square$

The theorem implies that in a fractional Sobolev space $H^r(\Omega)$ with a bounded domain $\Omega$ of nice regularity, any bounded sequence has a subsequence that converges in another fractional Sobolev space $H^t(\Omega)$ for any $t < r$. Another type of compact embedding for fractional Sobolev spaces is the following:
Theorem 3.10.12 Let $\Omega$ be bounded and $C^k$ (or Lip) in $\mathbb{R}^n$. Then
(1) If $k > \dfrac{n}{2}$ then
$$H^k(\mathbb{R}^n) \hookrightarrow C_b(\mathbb{R}^n).$$
(2) If $k > \dfrac{n}{2}$ then
$$H^k(\Omega) \overset{c}{\hookrightarrow} C(\bar{\Omega}).$$
(3) If $k > m + \dfrac{n}{2}$ then
$$H^k(\mathbb{R}^n) \hookrightarrow C_b^m(\mathbb{R}^n).$$
(4) If $k > m + \dfrac{n}{2}$ then
$$H^k(\Omega) \overset{c}{\hookrightarrow} C^m(\bar{\Omega}).$$

Proof We will only prove the continuous inclusion (1). By performing $m$ successive iterations we can prove (3), and by the extension theorem and the Arzelà–Ascoli theorem we can prove (2) and (4). Since
$$\overline{S(\mathbb{R}^n) \cap H^k(\mathbb{R}^n)} = H^k(\mathbb{R}^n),$$
it suffices to prove the result for $u \in S(\mathbb{R}^n) \cap H^k(\mathbb{R}^n)$. But this implies that
$$\|u\|_\infty \le \int_{\mathbb{R}^n}|\hat{u}(w)|\,dw = C\int_{\mathbb{R}^n}\frac{1}{(1+|w|^2)^{k/2}}\,(1+|w|^2)^{k/2}\,|\hat{u}(w)|\,dw < \infty.$$
So we use the Cauchy–Schwarz inequality to obtain
$$\|u\|_\infty \le C\left(\int_{\mathbb{R}^n}\frac{1}{(1+|w|^2)^{k}}\,dw\right)^{1/2}\left(\int_{\mathbb{R}^n}(1+|w|^2)^{k}\,|\hat{u}(w)|^2\,dw\right)^{1/2}. \tag{3.10.6}$$
Since $k > \dfrac{n}{2}$, we obtain
$$\int_{\mathbb{R}^n}\frac{1}{(1+|w|^2)^{k}}\,dw < \infty,$$
so (3.10.6) becomes
$$\|u\|_\infty \le C\,\|u\|_{H^k(\mathbb{R}^n)}. \qquad \square$$

In Theorem 3.6.1, it was shown that any function in $W^{1,1}(I)$ has an absolutely continuous representative $\tilde{u} \in C(I)$. Theorem 3.10.12 shows that functions in $H^k(\mathbb{R})$ are always continuous and bounded for all $k \ge 1$, whereas functions in $H^1(\mathbb{R}^2)$ need not have continuous representatives. In order to get bounded continuous Sobolev functions on $\mathbb{R}^3$, we need at least $H^2$.
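The threshold $k > n/2$ in these statements is exactly the condition making $\int_{\mathbb{R}^n}(1+|w|^2)^{-k}\,dw$ finite: in radial coordinates the integrand behaves like $\rho^{\,n-1-2k}$ at infinity. The sketch below (truncation radii and step counts are our own choices, for $n = 2$) shows the truncated radial integrals stabilizing for $k = 1.5 > n/2$ and growing like $\log R$ at the borderline $k = 1 = n/2$.

```python
def radial_tail(n, k, R, steps=100_000):
    """Midpoint Riemann sum of the integral of rho^(n-1) (1 + rho^2)^(-k) over [0, R]."""
    h = R / steps
    return sum(((i + 0.5) * h) ** (n - 1) / (1 + ((i + 0.5) * h) ** 2) ** k
               for i in range(steps)) * h

n = 2
# k > n/2: the truncated integrals stabilize as R grows
# (exact value here is 1 - (1 + R^2)^(-1/2)).
print(radial_tail(n, 1.5, 10.0), radial_tail(n, 1.5, 100.0))
# k = n/2: they keep growing like log R (divergence);
# exact value here is (1/2) log(1 + R^2).
print(radial_tail(n, 1.0, 10.0), radial_tail(n, 1.0, 100.0))
```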

3.11 Problems

(1) Show that for any weakly differentiable $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, if $D^\alpha f = 0$ for $|\alpha| \le n$, then $f$ is constant almost everywhere.
(2) Show that the Heaviside function $H(x)$ is the weak derivative of
$$f(x) = \begin{cases} x, & x > 0,\\ 0, & x \le 0. \end{cases}$$

(3) Consider the function u : R −→ R, u(x) = |x|.


(a) Find the weak derivative of u.
(b) Is Du ∈ L 1 (R)?
(c) Does u have a second weak derivative D 2 u ?
(4) Find the weak derivative of u(x) = χQ (x).
(5) Let f ∈ Cc∞ (Rn ). Prove that

supp{D k f } ⊆ supp{ f }.

(6) Show that the function introduced in (3.2.5) is in Cc∞ (Rn ).


(7) Show that the formula
$$\overline{C_c^\infty(\mathbb{R}^n)} = L^p(\mathbb{R}^n)$$
doesn't hold for $p = \infty$.


(8) Let $f \in L^p(\mathbb{R}^n)$ and let $\varphi \in C_c^\infty(\mathbb{R}^n)$ be a mollifier, and consider
$$f_\epsilon = f * \varphi_\epsilon.$$
Show that $f_\epsilon$ exists for $p = 1$ and for $p = \infty$.


(9) Show that $C_c^k(\mathbb{R}^n)$ is dense in $C^k(\mathbb{R}^n)$.
(10) Let $f \in L^1(\Omega)$ for some open set $\Omega \subset \mathbb{R}$. Find a sequence in $C^\infty(\mathbb{R})$ that converges to $f$.
(11) Let $f \in C(\mathbb{R})$. Show that $f_\epsilon \to f$ uniformly on compact subsets of $\mathbb{R}^n$.
(12) Consider the following mollification for $x \in \mathbb{R}^n$:
$$f_\epsilon = e^{-(\epsilon + bi)|x|^2}.$$
(a) Find $\mathcal{F}\{f_\epsilon\}$.
(b) Find $\mathcal{F}\{e^{-bi|x|^2}\}$.

(13) Let $f, h \in S(\mathbb{R}^n)$ and define
$$h_\epsilon(x) = \frac{1}{(2\pi\epsilon)^{n/2}}\, e^{-|x|^2/2\epsilon},$$
and let $f_\epsilon = h_\epsilon * f$.
(a) Show that $f_\epsilon \longrightarrow f$.
(b) Show that
$$\mathcal{F}\{h_\epsilon\} = e^{-\epsilon|x|^2/2}.$$
(c) Conclude that
$$\mathcal{F}^{-1}\{\mathcal{F}\{f\}\} = f(x).$$

(14) Let $f \in L^\infty(\mathbb{R}^n)$. Show that $f_\epsilon \longrightarrow f$ uniformly. Is $f_\epsilon \in C^\infty(\mathbb{R}^n)$?
(15) Show that $S(\mathbb{R}^n)$ is dense in $S'(\mathbb{R}^n)$.
(16) (a) Show that
$$\rho_k(\varphi) = \sum_{|\beta| \le k} \left\| x^k \partial^\beta \varphi \right\|$$
is a seminorm on $S(\mathbb{R}^n)$.
(b) Use the Fréchet metric
$$d(f,g) = \sum_{k=0}^\infty 2^{-k}\,\frac{\rho_k(f-g)}{1 + \rho_k(f-g)}$$
to show that $(S(\mathbb{R}^n), d)$ is a complete metric space.
(c) Do the same for
$$\rho_{m,k}(\varphi) = \sup_{|\beta| \le m}\ \sup_{x \in \mathbb{R}^n}\ (1+|x|)^k\,\left|D^\beta \varphi(x)\right|.$$

(17) Show that $\langle \cdot, \cdot \rangle$ in (3.4.3) defines an inner product.
(18) Prove that the product of functions in $W^{1,p}(\mathbb{R})$ is again in $W^{1,p}(\mathbb{R})$.
(19) Give an example of a function $f \in H^s(\mathbb{R}^n)$ for all $s \in \mathbb{R}$ such that $f$ is not a Schwartz function. Deduce that
$$S(\mathbb{R}^n) \subsetneq \bigcap_{s \in \mathbb{R}} H^s(\mathbb{R}^n).$$

(20) Let $u \in L^2(\mathbb{R}^n)$. Show that
$$\|u\|_{H^1(\mathbb{R}^n)} \le \left\|(1+|w|)\,\hat{u}(w)\right\|_{L^2(\mathbb{R}^n)} \le \sqrt{2}\,\|u\|_{H^1(\mathbb{R}^n)}$$
for all $u \in H^1(\mathbb{R}^n)$.


(21) Prove the following:
(a) $\delta \in H^s(\mathbb{R})$ iff $s < -\dfrac{1}{2}$.
(b) $\delta \in H^s(\mathbb{R}^2)$ iff $s < -1$.
(c) $\delta \in H^s(\mathbb{R}^n)$ iff $s < -\dfrac{n}{2}$.
(22) Show that $e^{-|x|} \in H^s(\mathbb{R})$ iff $0 \le s < \dfrac{3}{2}$.
(23) Determine the values of $n$ for which the following functions belong to $W^{1,n}(B)$ if 1. $B$ is the unit ball in $\mathbb{R}^n$, and 2. $B$ is the ball of radius $\frac{1}{2}$.
(a) $u = \log(|\log|x||)$.
(b) $u = \log\log\dfrac{1}{|x|}$.
(c) $u = \log\log\left(1 + \dfrac{1}{|x|}\right)$.
(d) $u = \log\log\left(1 + \dfrac{1}{|x|^2}\right)$.
(24) Let
$$u(x) = \frac{1}{|x|^\alpha}$$
be defined on $B = B_1(0) \subset \mathbb{R}^n$. Show that $u \in W^{1,p}(B)$ iff
$$\alpha < \frac{n-p}{p}.$$

(25) (a) Give an example of a function $u \in L^1(\Omega)$, $\Omega \subseteq \mathbb{R}$, that is weakly differentiable but $u \notin W^{1,1}(\Omega)$.
(b) Give an example of a function $u \in L^p(\Omega)$, $p > 1$, $\Omega \subset \mathbb{R}^n$, that is weakly differentiable but $u \notin W^{1,p}(\Omega)$.
(26) Give an example of a function in $W^{1,p}(\Omega)$ ($p \ge 1$) but not in $W^{k,p}(\Omega)$ for all $k \ge 2$.
(27) If $u \in W^{1,1}(I)$, show that for every $x \in I$ and for some $c > 0$ we have
$$|u(x)| \le c\,\|u\|_{W^{1,1}(I)}.$$

(28) Show that the Heaviside function belongs to $H^{-1}(\Omega)$ for $\Omega \subset \mathbb{R}^n$.
(29) Let $u \in L^p(\Omega)$ for some open $\Omega \subset \mathbb{R}^n$.
(a) Show that as $\epsilon \longrightarrow 0$ we have
$$\int_\Omega |u(x + \epsilon z) - u(x)|^p\,dz \longrightarrow 0.$$
(b) Conclude from (a) that
$$u_\epsilon(x) = \varphi_\epsilon * u$$
converges to $u$ in $L^p(\Omega)$.
(30) If $u \in L^1(\mathbb{R}^n)$, show that $u_\epsilon(x) \in L^\infty(\mathbb{R}^n)$.
(31) Prove the case when $\partial\Omega$ is unbounded in Proposition 3.8.11.
(32) Show that if $\bar{u} \in W^{1,p}(\mathbb{R}^n)$ and $\Omega$ is of $C^1$-class then $u \in W_0^{1,p}(\Omega)$.
(33) If $u \in W^{1,p}((0,\infty))$, show that
$$\lim_{x \to \infty} u(x) = 0.$$

(34) (a) Let $u \in W^{1,1}(I)$ for some interval $I \subset \mathbb{R}$. If $u$ is weakly differentiable and $Du = 0$, then $u = c$ a.e. for some constant $c \in \mathbb{R}$.
(b) Let $\Omega$ be open and connected in $\mathbb{R}^n$ and $u \in W^{1,p}(\Omega)$. If $u$ is weakly differentiable and $Du = 0$, then $u = c$ a.e.
(35) Let $u \in W^{1,p}(\Omega)$ and let $\xi \in C_c^\infty(\Omega)$ be a cut-off function, $\Omega \subset \mathbb{R}^n$. Let $w = \xi u$ be the zero extension of $\xi u$. Show that for $1 \le i \le n$,
$$\frac{\partial w}{\partial x_i} = \xi\,\frac{\partial u}{\partial x_i} + u\,\frac{\partial \xi}{\partial x_i}.$$

(36) Use approximation results to show that if $u \in W^{1,p}(\mathbb{R}^n)$ and $D_{x_i} u = 0$ for all $i = 1, 2, \ldots, n$, then $u$ is constant a.e.
(37) (a) Show that if $\varphi_n \in C_c^\infty(\Omega)$ and $u \in W^{k,p}(\Omega)$ then $u\varphi_n \in W_0^{k,p}(\Omega)$.
(b) Show that if $v \in C^k(\bar{\Omega}) \cap W^{k,\infty}(\Omega)$ and $u \in W_0^{k,p}(\Omega)$ then $uv \in W_0^{k,p}(\Omega)$.
(38) Show that for every $u \in W^{k,p}(\Omega)$, there exists $w \in W^{k,p}(\Omega')$ such that $w = u$ on $\Omega$ and
$$\|w\|_{W^{k,p}(\Omega')} \le c\,\|u\|_{W^{k,p}(\Omega)}.$$

(39) Let $u \in W^{k,p}(\mathbb{R}^n_+)$. Define the sequence
$$w_\epsilon(x) = u_\epsilon(x + 2\epsilon e_n),$$
where $e_n$ is the unit vector in the $n$th coordinate. Show that
$$w_\epsilon(x) \longrightarrow u(x)$$
in $W^{k,p}(\mathbb{R}^n_+)$.
(40) Let $u \in W^{1,p}(\mathbb{R}^n_+)$. Define
$$\bar{u}(x) = \begin{cases} u(x), & x_n > 0,\\ u(x', -x_n), & x_n < 0. \end{cases}$$
(a) Show that $\bar{u} \in W^{1,p}(\mathbb{R}^n)$.
(b) Find a general form for $\bar{u}$ if $u \in W^{k,p}(\mathbb{R}^n_+)$ for $k \in \mathbb{N}$.
(41) Prove the inequality in Theorem 3.9.9.
(42) (a) Show that any function in W 1,∞ (R) coincides a.e. with a Lipschitz continuous
function.
(b) Deduce from (a) that
W 1,∞ () = C 0,1 ()

for any C 1 bounded set  ⊂ Rn .



(43) Let $\Omega$ be open, $C^1$, and bounded in $\mathbb{R}^n$. Show that if $u \in W^{1,n}(\Omega)$, $n \ge 2$, then
$$u \in L^q(\Omega)$$
for all $n \le q < \infty$ and
$$\|u\|_{L^q(\Omega)} \le C\,\|u\|_{W^{1,n}(\Omega)}.$$

(44) Prove Morrey's inequality for $p = \infty$ as follows:
(a) Use a cut-off function $\xi$ and Theorem 3.2.3 to show that for every $u \in W^{1,\infty}(\mathbb{R}^n)$, $\xi u \in W^{1,p}(\mathbb{R}^n)$ for every $p > n$.
(b) Apply Morrey's inequality to $\xi u$.
(45) Prove Theorem 3.9.15 as follows:
(a) Show that for every $u \in W^{1,p}(\Omega)$, $1 < p < \infty$, there exists a sequence $u_j$ in $C_c^\infty(\mathbb{R}^n) \cap W^{1,p}(\mathbb{R}^n)$ that converges to $\bar{u}$ in $W^{1,p}(\mathbb{R}^n)$.
(b) Apply Morrey's inequality to show that $u_j$ converges to $u$ a.e. in $C^{0,\beta}(\mathbb{R}^n)$.
(c) Use Morrey's inequality and Theorem 3.8.10(4) to prove Theorem 3.9.15 for all $1 < p < \infty$.
(d) Use the same argument as in the previous problem to prove Theorem 3.9.15 for $p = \infty$.
(46) In Theorem 3.9.16, write the details of the proof of the case when $k > \dfrac{n}{p}$ and $\dfrac{n}{p} \in \mathbb{N}$.
(47) If $0 \le \alpha < \beta < 1$, show that for a bounded set $\Omega \subset \mathbb{R}^n$:
(a) $C^{0,\beta}(\bar{\Omega}) \overset{c}{\hookrightarrow} C^{0,\alpha}(\bar{\Omega})$.
(b) $C^{0,\beta}(\bar{\Omega}) \overset{c}{\hookrightarrow} C(\bar{\Omega})$.
(48) Show that bounded sets in $C^{0,\beta}(\bar{\Omega})$ are uniformly bounded and equicontinuous.
(49) Let $\Omega \subset \mathbb{R}^n$ be open, bounded, and $C^1$. Show that
$$W^{1,\infty}(\Omega) \overset{c}{\hookrightarrow} W^{1,p}(\Omega)$$
for any $p > n$. Deduce that
$$W^{1,\infty}(\Omega) \overset{c}{\hookrightarrow} C(\bar{\Omega}).$$

(50) Let $u = \ln|\ln|x||$.
(a) Show that $u \in W_0^{1,n}(B)$ but $u \notin L^\infty(B)$, for $B = B_{1/e}(0) \subset \mathbb{R}^n$ (the open ball of radius $\frac{1}{e}$).
(b) Show that $u \in H^1(\Omega)$ but $u \notin C(\Omega)$, for $\Omega = B_{1/2}(0) \subset \mathbb{R}^2$.
(c) Deduce that the condition $n < p$ in Theorem 3.10.2 (3, 4) is sharp.

(51) Let $u \in W^{1,p}(\Omega)$ for some open $\Omega \subset \mathbb{R}^n$. If $p > n$, show that $u$ is pointwise differentiable and
$$\nabla u = Du \quad \text{a.e.}$$
(52) Use only the Arzelà–Ascoli Theorem together with Theorem 3.6.1 to show that for all $1 < p < \infty$, we have the compact embedding
$$W^{1,p}(I) \overset{c}{\hookrightarrow} C(\bar{I}),$$
where $I \subset \mathbb{R}$ is an open bounded interval.


(53) Show that the Rellich–Kondrachov Theorem doesn't hold for $q = p^*$; that is,
$$W^{1,p}(\Omega) \hookrightarrow L^{p^*}(\Omega)$$
is a continuous inclusion but not compact.


(54) (a) Show that the Rellich–Kondrachov Theorem still holds for Lip domains $\Omega$ that are not necessarily $C^1$.
(b) Show by a counterexample that the condition that the domain $\Omega$ be bounded and Lip is essential.
(55) Give an example to show that the continuous inclusion

W 1, p (Rn ) → L p (Rn )

is not compact.
(56) Let $k \le \dfrac{n}{2}$.
(a) Show that
$$H^k(\mathbb{R}^n) \hookrightarrow L^p(\mathbb{R}^n)$$
for all $2 \le p < \dfrac{2n}{n-2k}$.
(b) If $\Omega \subset \mathbb{R}^n$ is open, bounded, and $C^1$, show that
$$H^k(\Omega) \overset{c}{\hookrightarrow} L^p(\Omega)$$
for all $2 \le p < \dfrac{2n}{n-2k}$.
(57) (a) Give an example of a bounded sequence $u_n \in L^p(\Omega)$ that has no convergent subsequence in $L^p(\Omega)$.
(b) Use (a) to show that for $q > p$ the inclusion
$$\iota : L^q(\Omega) \hookrightarrow L^p(\Omega)$$
cannot be compact.

(58) If $1 \le p < \infty$ and $p = \dfrac{n}{m}$, then show that
$$W^{m,p}(\mathbb{R}^n) \hookrightarrow L^q(\mathbb{R}^n)$$
for all $p \le q < \infty$.


(59) Let $\Omega \subset \mathbb{R}^n$ be open, bounded, and $C^1$. If $k < \dfrac{n}{p}$, then show that the inclusion
$$W^{k,p}(\Omega) \hookrightarrow L^q(\Omega)$$
is continuous, but is not compact if
$$q = p^*, \qquad \frac{1}{p^*} = \frac{1}{p} - \frac{k}{n}.$$

(60) Let $\Omega \subset \mathbb{R}^n$ be open, bounded, and $C^1$, $p \ge 1$, $1 \le q < \infty$. Assume that
$$k - m > \frac{n}{p} - \frac{n}{q} > 0.$$
(a) Show that
$$W^{k,p}(\Omega) \overset{c}{\hookrightarrow} W^{m,q}(\Omega).$$
(b) Discuss the case when
$$k - m = \frac{n}{p} - \frac{n}{q}.$$

(61) (a) If $p = n$, show that for all $p \le q < \infty$,
$$W^{1,n}(\mathbb{R}^n) \hookrightarrow L^q(\mathbb{R}^n).$$
(b) If $n < p$, show that for $\beta = 1 - \dfrac{n}{p}$,
$$W^{1,p}(\mathbb{R}^n) \hookrightarrow C^{0,\beta}(\mathbb{R}^n).$$

(62) Prove or disprove that the following inclusions are continuous:
(a) $W^{1,n}(\mathbb{R}^n) \subset C(\mathbb{R}^n) \cap L^\infty(\mathbb{R}^n)$.
(b) $W^{1,p}(\mathbb{R}^n) \subset C(\mathbb{R}^n) \cap L^\infty(\mathbb{R}^n)$ for $p > n$.
(63) Let $\Omega \subset \mathbb{R}^n$ be open, bounded, and $C^k$. Show that every weakly convergent sequence in $W^{k,p}(\Omega)$ converges strongly in $W^{k-1,p}(\Omega)$.
(64) Give an example of an unbounded function in H 1/2 (R).

(65) Verify the following inclusions:


(a) H 1/3 (R2 ) → L 3 (R2 ).
(b) H 1/2 (R2 ) → L 4 (R2 ).
(c) H 3/4 (R3 ) → L 4 (R3 ).
Chapter 4
Elliptic Theory

4.1 Elliptic Partial Differential Equations

4.1.1 Elliptic Operator

The general form of a second-order partial differential equation in $\mathbb{R}^2$ is
$$Au_{xx} + 2Bu_{xy} + Cu_{yy} + Du_x + Eu_y + F = 0.$$
This equation is called elliptic if $B^2 - AC < 0$, or equivalently, the matrix
$$M = \begin{pmatrix} A & B\\ B & C \end{pmatrix}$$

is positive definite. For $x \in \Omega \subseteq \mathbb{R}^n$, the standard form of an elliptic PDE operator is the following:
$$Lu(x) = -\sum_{i,j=1}^n a_{ij}(x)\,\frac{\partial^2 u}{\partial x_i \partial x_j} + \sum_{i=1}^n b_i(x)\,\frac{\partial u}{\partial x_i}(x) + c(x)\,u(x), \tag{4.1.1}$$
where $A(x) = (a_{ij}(x))$ is a matrix-valued function defined on $\Omega$ as
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & & a_{2n}\\ \vdots & & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$
and is positive definite for all $x \in \Omega$, i.e., $\xi^T A\,\xi > 0$ for every nonzero $\xi \in \mathbb{R}^n$. A more convenient way of writing the operator is the divergence form

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 239
A. Khanfer, Applied Functional Analysis,
https://doi.org/10.1007/978-981-99-3788-2_4

$$Lu(x) = -\operatorname{div}(A(x)\nabla u),$$
i.e.,
$$Lu(x) = -\sum_{i,j=1}^n \frac{\partial}{\partial x_i}\!\left(a_{ij}(x)\,\frac{\partial u}{\partial x_j}\right) + \sum_{i=1}^n b_i(x)\,\frac{\partial u}{\partial x_i}(x) + c(x)\,u(x). \tag{4.1.2}$$

The equation models many steady-state natural and physical systems (e.g., heat conduction, diffusion, heat and mass transfer, flow of fluids, and electric potential). The divergence term
$$\sum_{i,j=1}^n \frac{\partial}{\partial x_i}\!\left(a_{ij}(x)\,\frac{\partial u}{\partial x_j}\right)$$
refers to the diffusion process, the second term
$$\sum_{i=1}^n b_i(x)\,\frac{\partial u}{\partial x_i}(x)$$
is the advection term, and the zeroth-order term $c(x)u(x)$ is the decay term. The matrix $A(x)$ is called symmetric if $a_{ij} = a_{ji}$ for all $1 \le i, j \le n$. For the matrix $A$ to be positive definite means that all its eigenvalues are positive. In particular, for every $x \in \Omega$,
$$\frac{\displaystyle\sum_{i,j=1}^n a_{ij}(x)\,\xi_i\xi_j}{|\xi|^2} \ge \lambda(x), \tag{4.1.3}$$
where $\lambda(x)$ is an eigenvalue of $A(x)$ and $\xi \in \mathbb{R}^n$. Taking the minimum over all such vectors $\xi$ gives the smallest eigenvalue $\lambda_{\min}(x) > 0$. So (4.1.3) characterizes the ellipticity of PDEs, and the smallest eigenvalue $\lambda_{\min}(x)$ depends on the chosen value of $x$ and serves as the minimum of the LHS of (4.1.3).

4.1.2 Uniformly Elliptic Operator

To make this lower bound uniform, we need to make it independent of x. This gives
rise to the following definition.
Definition 4.1.1 (Uniformly Elliptic Operator) A second-order partial differential operator of the form
$$Lu(x) = -\sum_{i,j=1}^n a_{ij}(x)\,\frac{\partial^2 u}{\partial x_i \partial x_j} + \sum_{i=1}^n b_i(x)\,\frac{\partial u}{\partial x_i}(x) + c(x)\,u(x), \qquad x \in \Omega \subseteq \mathbb{R}^n,$$
is called uniformly elliptic if there exists a positive number $\lambda_0 > 0$ such that
$$\sum_{i,j=1}^n a_{ij}(x)\,\xi_i\xi_j \ge \lambda_0\,|\xi|^2$$
for all $x \in \Omega$ and $\xi = (\xi_1, \xi_2, \ldots, \xi_n) \in \mathbb{R}^n$. The number $\lambda_0$ is called the uniform ellipticity constant.
There are some important particular cases. Due to its physical interpretation, a uniformly elliptic PDE usually carries an extra condition on the ellipticity: there exists $0 < \Lambda < \infty$ such that
$$\sum_{i,j=1}^n a_{ij}\,\xi_i\xi_j \le \Lambda\,|\xi|^2$$
for all $\xi \in \mathbb{R}^n$. This is a stronger version of uniform ellipticity in which the diffusion term can be controlled from above and below. It is easy to see that choosing suitable values of $\xi_i, \xi_j$ yields $a_{ij} \in L^\infty(\Omega)$. This is interpreted by the fact that there is no blow-up in the diffusion process, which, in many situations, arises naturally, so we will adopt this assumption throughout this chapter. One advantage of this two-sided control of the diffusion process is that all eigenvalues of the matrix $A$ lie between these two bounds. Namely, for any eigenvalue $\lambda(x)$ of $A$ and for all $x \in \Omega$ we have
$$\lambda_0 \le \lambda \le \Lambda. \tag{4.1.4}$$
Another particular case is when $A(x)$ is the identity matrix (i.e., $a_{ij}(x) = \delta_{ij}$), and $b_i = c = 0$. In this case, the operator (4.1.2) reduces to the Laplace operator
$$Lu(x) = -\nabla^2 u(x) = -\sum_{i=1}^n \frac{\partial^2 u}{\partial x_i^2}(x).$$

4.1.3 Elliptic PDEs

Elliptic equations are extremely important and arise in a wide range of applications in applied mathematics and mathematical physics. The most basic examples of elliptic equations are:

(1) Laplace equation:
$$\nabla^2 u = 0.$$

(2) Poisson equation:
$$\nabla^2 u = f.$$

(3) Helmholtz equation:
$$(\nabla^2 + \mu)u = f.$$
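As a quick numerical aside (my own spot check, not from the text), harmonicity can be observed with a centered five-point difference: for u(x, y) = x² − y², a classical solution of Laplace's equation, the discrete Laplacian vanishes up to rounding error.

```python
def laplacian(u, x, y, h=1e-2):
    """Centered 5-point approximation of u_xx + u_yy."""
    return (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
            - 4.0 * u(x, y)) / h ** 2

harmonic = lambda x, y: x * x - y * y  # classical solution of Laplace's equation
print(laplacian(harmonic, 0.3, 0.7))   # vanishes up to rounding
```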

The most common types of boundary conditions are:

1. The Dirichlet condition:
$$u = g \quad \text{on } \partial\Omega,$$

2. The Neumann condition:
$$\frac{\partial u}{\partial n} = \nabla u \cdot n = g \quad \text{on } \partial\Omega,$$

where n is the outward unit normal vector.
Laplace's equation is the cornerstone of potential theory and describes the electric potential in a region. It is one of the most important partial differential equations because it has numerous applications in applied mathematics, physics, and engineering. Poisson's equation is the nonhomogeneous version of Laplace's equation and describes the electric potential in the presence of a charge; it plays a dominant role in electrostatics and gravitation. The Helmholtz equation is another fundamental equation with important applications in many areas of physics, such as electromagnetic theory, acoustics, classical and quantum mechanics, thermodynamics, and geophysics. It is therefore no wonder that the theory of elliptic partial differential equations stands as one of the most active areas in applied mathematics and attracts increasing interest from researchers. Studying solutions of elliptic PDEs thus provides a comprehensive overview of the equations of mathematical physics and a wide view of the theory of PDEs, which suffices for our needs in this text. Moreover, another feature of the elliptic type is that elliptic equations have no real characteristic curves; consequently, solutions of elliptic equations cannot possess discontinuous derivatives, since such discontinuities could occur only along characteristic curves. This makes elliptic equations the perfect tool for investigating equilibrium (time-independent, steady-state) processes in which time plays no role and no singularities in the solution are transported.

4.2 Weak Solution

4.2.1 Motivation for Weak Solutions

A classical solution of a boundary value problem is a solution that satisfies the problem pointwise everywhere. It should be differentiable as many times as needed to fulfill the PDE. If the equation is of order n and defined in a domain Ω, then a classical solution must belong to C^n(Ω) and satisfy the equation for every x ∈ Ω, and it must also satisfy the boundary conditions of the problem for every x ∈ ∂Ω. The problem of finding a solution to an equation in a domain that satisfies given conditions on the boundary of this domain is called a "boundary value problem" (BVP).

Our elliptic BVP takes the following form: let Ω be a bounded open set in Rn. Find u ∈ C^2(Ω) ∩ C(Ω̄) such that

$$\begin{cases} Lu = f & x \in \Omega \\ u = g & x \in \partial\Omega \end{cases} \tag{4.2.1}$$

for some linear elliptic operator L as in (4.1.2). Because of the boundary condition, this is called a Dirichlet problem. Notice that we can write the solution of this problem as v + g, where v is the solution to the problem

$$\begin{cases} L(v + g) = f & x \in \Omega \\ v = 0 & x \in \partial\Omega, \end{cases}$$

so for simplicity we can just assume g = 0 in (4.2.1). If we can find such a u, then it will be a classical solution of problem (4.2.1). As said earlier, these equations model natural and physical phenomena occurring in the real world. Unfortunately, these models may not admit classical solutions. In fact, many equations in various areas of applied mathematics may have solutions that are not continuously differentiable, or not even continuous (e.g., shock waves), so finding classical solutions to these equations may in general be too restrictive. Consider, for example, the case when f is a regular distribution that is not continuous; then the equation

$$\nabla^2 u = f$$

cannot have a solution u ∈ C^2(Ω), since otherwise f would also be continuous. In this case, it is helpful to be less demanding and seek solutions with lower regularity, in the sense that they need not satisfy the problem pointwise everywhere in the domain and on its boundary.

4.2.2 Weak Formulation of Elliptic BVP

How can we find such solutions? Sobolev spaces come to the rescue, as they provide all the essentials for obtaining these solutions. Sobolev spaces are the completion of C^∞ spaces, and they include weakly differentiable functions that are not necessarily continuous or differentiable in the usual sense but can be approximated by smooth functions in C_c^∞(Ω). The proposed solutions are supposed to satisfy the equations in a distributional sense, not in a pointwise sense. Recall from (2.2.1) that two distributions T and S are equal if

$$\langle T, \varphi\rangle = \langle S, \varphi\rangle$$

for all φ ∈ C_c^∞(Ω). We will use the same formulation here. Namely, we multiply both sides of the equation Lu = f in (4.2.1) by a test function φ ∈ C_c^∞(Ω) and then integrate over Ω to obtain

$$\int_\Omega (Lu)\,\varphi\, dx = \int_\Omega f\varphi\, dx. \tag{4.2.2}$$

By density, this extends to all functions v ∈ H^1_0(Ω). Indeed, for every v ∈ H^1_0(Ω) we can choose a sequence φ_n ∈ C_c^∞(Ω) such that φ_n → v. Since the above equation holds for each φ_n, we pass to the limit using the Dominated Convergence Theorem to obtain

$$\int_\Omega (Lu)\,v\, dx = \int_\Omega f v\, dx.$$

If we let f ∈ L^2(Ω) and u ∈ H^1(Ω), then (4.2.2) is well-defined and can be written as

$$\int_\Omega\left(-\sum_{i,j=1}^{n}\frac{\partial}{\partial x_i}\Big(a_{ij}\frac{\partial u}{\partial x_j}\Big) + \sum_{i=1}^{n} b_i\frac{\partial u}{\partial x_i} + cu\right)v\, dx = \langle f, v\rangle_{L^2(\Omega)},$$

and performing integration by parts in the divergence term (the first term), making use of the fact that v = 0 on ∂Ω, yields

$$\int_\Omega\left(\sum_{i,j=1}^{n} a_{ij}\frac{\partial u}{\partial x_i}\frac{\partial v}{\partial x_j} + \sum_{i=1}^{n} b_i\frac{\partial u}{\partial x_i}v + cuv\right)dx = \langle f, v\rangle_{L^2(\Omega)}. \tag{4.2.3}$$

Since we usually impose the homogeneous Dirichlet condition u = 0 on the boundary, our best choice of solution space is H^1_0(Ω). If we can find a function u ∈ H^1_0(Ω) such that (4.2.3) holds for all test functions v, then u satisfies the Dirichlet problem in a distributional sense, and since such a function satisfies the problem only in this weak sense, this type of solution is known as a weak solution.

Definition 4.2.1 (Weak Formulation, Weak Solution) Consider the Dirichlet problem

$$\begin{cases} Lu = f(x) & x \in \Omega \\ u = 0 & x \in \partial\Omega, \end{cases} \tag{4.2.4}$$

for some elliptic operator L of the form

$$Lu(x) = -\sum_{i,j=1}^{n}\frac{\partial}{\partial x_i}\Big(a_{ij}(x)\frac{\partial u}{\partial x_j}\Big) + \sum_{i=1}^{n} b_i(x)\frac{\partial u}{\partial x_i}(x) + c(x)u(x).$$

Moreover, assume that a_ij, b_i, c ∈ L^∞(Ω) and f ∈ L^2(Ω), Ω ⊆ Rn. Then the problem

$$\begin{cases} \displaystyle\int_\Omega (Lu)\,v\, dx = \langle f, v\rangle_{L^2(\Omega)} & x \in \Omega \\ u = 0 & x \in \partial\Omega \end{cases} \tag{4.2.5}$$

is called the weak (or variational) formulation of problem (4.2.4). If there exists u ∈ H^1_0(Ω) which satisfies (4.2.5) (i.e., (4.2.3)) for all v ∈ H^1_0(Ω), then u is a weak solution of problem (4.2.4).

Here, we need to emphasize that ⟨f, v⟩_{L²(Ω)} is not really an inner product but rather a convenient abuse of notation for the pairing of f with v, which behaves in the same way. Since

$$|\langle f, v\rangle| < \infty,$$

f defines a bounded linear functional on H^1_0(Ω) by

$$f(v) = \langle f, v\rangle.$$

So the problem is equivalent to the problem of finding u ∈ H^1_0(Ω) such that, for f ∈ H^{-1}(Ω),

$$\langle f, v\rangle = B[u, v]$$

for all v ∈ H^1_0(Ω).


We need to emphasize the following important observation: if problem (4.2.4) has both a classical solution and a weak solution, then, under sufficient regularity conditions, the weak solution is a classical solution. It is evident from the formulation above that a classical solution of (4.2.4) is also a weak solution.

On the other hand, let u ∈ C^2(Ω) be a weak solution of (4.2.4), i.e., u satisfies (4.2.5) for every v ∈ H^1_0(Ω), and suppose a_ij, b_i, c ∈ C^1(Ω), f ∈ C(Ω). Performing integration by parts again in the divergence term (the first term of the LHS of (4.2.5)) gives

$$\int_\Omega\left(-\sum_{i,j=1}^{n}\frac{\partial}{\partial x_i}\Big(a_{ij}\frac{\partial u}{\partial x_j}\Big) + \sum_{i=1}^{n} b_i\frac{\partial u}{\partial x_i} + cu\right)v\, dx = \int_\Omega f v\, dx,$$

which implies

$$\int_\Omega\left(-\sum_{i,j=1}^{n}\frac{\partial}{\partial x_i}\Big(a_{ij}(x)\frac{\partial u}{\partial x_j}\Big) + \sum_{i=1}^{n} b_i(x)\frac{\partial u}{\partial x_i}(x) + c(x)u(x) - f\right)v\, dx = 0.$$

By the Fundamental Lemma of COV (Lemma 3.2.9), u satisfies (4.2.4) almost everywhere; but since the terms on both sides of (4.2.4) are continuous, the identity extends by continuity to all x ∈ Ω, and thus u is a classical solution of (4.2.4).
Define B[u, v] to be the integral on the LHS of (4.2.3) (equivalently, the LHS of (4.2.5) after integration by parts). This is called the bilinear form associated with L, and equation (4.2.5) can be written as

$$B[u, v] = f(v).$$

This B will play a dominant role in establishing the existence of weak solutions of elliptic PDEs.
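The identity B[u, v] = f(v) can be seen concretely in one dimension. In this sketch (my own illustration, not from the text), u(x) = x(1 − x) solves −u″ = 2 on (0, 1) with u(0) = u(1) = 0, so ∫₀¹ u′v′ dx must equal ∫₀¹ 2v dx for every test function v; we check it for v(x) = sin(πx) by midpoint quadrature (both sides equal 4/π).

```python
import math

du = lambda x: 1.0 - 2.0 * x           # u'(x) for u(x) = x(1 - x)
v = lambda x: math.sin(math.pi * x)    # a test function vanishing at 0 and 1
dv = lambda x: math.pi * math.cos(math.pi * x)

def integrate(g, n=20000):
    """Midpoint-rule quadrature over (0, 1)."""
    h = 1.0 / n
    return h * sum(g((k + 0.5) * h) for k in range(n))

lhs = integrate(lambda x: du(x) * dv(x))  # bilinear form B[u, v]
rhs = integrate(lambda x: 2.0 * v(x))     # linear functional f(v)
print(abs(lhs - rhs))                     # tiny: both sides equal 4/pi
```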

4.2.3 Classical Versus Strong Versus Weak Solutions

We adopt the following definitions to compare all types of solutions to a differential equation.
(1) Classical Solution: If u satisfies equation (4.2.4) pointwise for all x ∈ Ω, then u is said to be a classical solution of (4.2.4).
(2) Strong Solution: If u satisfies equation (4.2.4) pointwise for almost all x ∈ Ω, then u is said to be a strong solution of (4.2.4).
(3) Weak Solution: If u satisfies the weak formulation (4.2.5) of equation (4.2.4), then u is said to be a weak solution of (4.2.4).
It is easy to see that every classical solution is a strong solution, but the converse is
not necessarily true. We illustrate the idea with the Poisson equation

$$\begin{cases} \nabla^2 u = f & \text{in } \Omega \\ u = 0 & \text{on } \partial\Omega \end{cases}$$

for some f ∈ L^2(Ω). If u ∈ C^2(Ω) satisfies the equation pointwise for every x ∈ Ω together with the boundary condition, then u is a classical solution of the problem, and in this case f ∈ C(Ω). When it comes to applications in science and engineering, the requirement of continuous data is somewhat restrictive in practice, and requiring f merely to be measurable and L²-integrable seems more realistic in many situations. If it happens that

$$u \in H^2(\Omega) \cap H^1_0(\Omega)$$

satisfies the equation at almost every x ∈ Ω, i.e., everywhere except on a set of measure zero, then u is a strong solution. Notice the difference between the two notions: here u remains a Sobolev function, so it is measurable but not necessarily continuous, and consequently the boundary condition cannot be taken pointwise, because the boundary of Ω has measure zero and measurable functions do not change by modification on a set of measure zero (remember that in L^p spaces we are dealing with classes of functions rather than individual functions). Nevertheless, the function belongs to H^2, so it possesses second weak derivatives, which allows it to satisfy the equation pointwise almost everywhere and produces f ∈ L^2(Ω) as a result of the calculations. If u ∈ H^1_0(Ω) satisfies the variational weak formulation

$$\int_\Omega Du \cdot Dv\, dx = \int_\Omega f v\, dx$$

for all v ∈ H^1_0(Ω), then u is a weak solution of the equation. Observe here that u satisfies neither the equation nor the boundary condition pointwise, but rather globally, through an integration over the domain. We only require the first weak derivative of u to exist, and since H^2(Ω) ⊂ H^1(Ω), every strong solution is indeed a weak solution; the converse is not necessarily true, so it may happen that the equation has a weak solution but no strong solution. However, if the weak solution turns out to lie in H^2(Ω), then it becomes a strong solution. Sections 4.9 and 4.10 investigate this direction thoroughly.

We end the section with the following important remark: the notion of weak solution should not be confused with the notion of weak derivative. The former is called "weak" because it satisfies the weak formulation of the PDE, which is a weaker condition than satisfying the equation pointwise; it does not mean that the solution satisfies the equation with its weak derivatives. For example, the function (3.1.1) is a weak solution of the Laplace equation ∇²u = 0 although it is not weakly differentiable, as illustrated in Sect. 3.1.

4.3 Poincare Equivalent Norm

4.3.1 Poincare Inequality on H01

In this section, we deduce some important results that are very useful in establishing the existence and uniqueness of weak solutions. Theorem 3.9.8 discussed the Poincare inequality in W^{1,p}_0 as a consequence of the Gagliardo–Nirenberg–Sobolev inequality. According to the remark after Theorem 3.9.8, the inequality holds for p = 2 for all n. Here is a restatement of the result with an alternative proof that does not depend on the GNS inequality, which shows that it holds for 1 ≤ p < ∞ and that the domain Ω may in general be unbounded, provided it is bounded in at least one direction; this gives extra flexibility in the choice of Ω.
Theorem 4.3.1 (Poincare Inequality on H^1_0) Let Ω ⊂ Rn be an open set that is bounded in at least one direction of Rn. Then there exists C > 0 (which depends only on Ω) such that for every u ∈ H^1_0(Ω), we have

$$\|u\|_{L^2(\Omega)} \le C\,\|Du\|_{L^2(\Omega)}.$$

Proof For n > 1, we assume that Ω is bounded in the x_i direction, that is,

$$|x_i| \le M$$

for all x ∈ Ω, with x_i the i-th component of x. For u ∈ C_c^∞(Ω):

$$\|u\|^2 = \int_\Omega |u|^2\, dx = -\int_\Omega 2x_i\, u\,\frac{\partial u}{\partial x_i}\, dx,$$

where we perform integration by parts in x_i. By the Cauchy–Schwarz inequality, this gives

$$\|u\|^2_{L^2(\Omega)} \le 2M\int_\Omega |u|\left|\frac{\partial u}{\partial x_i}\right| dx \le C\,\|u\|_{L^2(\Omega)}\,\|D_i u\|_{L^2(\Omega)},$$

with C = 2M. The result follows by dividing both sides by ‖u‖_{L²(Ω)} and noting that ‖D_i u‖_{L²(Ω)} ≤ ‖Du‖_{L²(Ω)}.

For n = 1, we have Ω = (a, b) and u(a) = u(b) = 0, and the inequality is easily established by the same argument as above.

Now, using the fact that

$$H^1_0(\Omega) = \overline{C_c^\infty(\Omega)},$$

the inequality extends by density to u ∈ H^1_0(Ω): take a sequence u_n ∈ C_c^∞(Ω) such that u_n → u in H^1_0, which implies

$$\|D^i u_n\|_{L^2(\Omega)} \longrightarrow \|D^i u\|_{L^2(\Omega)}$$

for i = 0, 1. □

Remark The Poincare constant C = 2M depends only on Ω and may be regarded as the least width of Ω (twice the bound M in the bounded direction). The value 2M is not the best value that can be obtained for C, but it suffices for our needs.
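A numerical sanity check of the inequality (my own illustration, not from the text): take Ω = (0, 1), so |x| ≤ M = 1 and C = 2M = 2, and compare ‖u‖_{L²} with C‖u′‖_{L²} for sample functions in H^1_0(0, 1).

```python
import math

def l2_norm(g, n=10000):
    """Midpoint-rule approximation of the L^2(0,1) norm of g."""
    h = 1.0 / n
    return math.sqrt(h * sum(g((k + 0.5) * h) ** 2 for k in range(n)))

samples = [  # pairs (u, u') with u(0) = u(1) = 0
    (lambda x: math.sin(math.pi * x), lambda x: math.pi * math.cos(math.pi * x)),
    (lambda x: x * (1.0 - x),         lambda x: 1.0 - 2.0 * x),
]
for u, du in samples:
    ratio = l2_norm(u) / l2_norm(du)
    assert ratio <= 2.0   # Poincare inequality with C = 2M = 2
    print(round(ratio, 4))
```

The ratios come out near 1/π ≈ 0.318, far below the crude constant 2, consistent with the remark that 2M is not optimal.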

4.3.2 Equivalent Norm on H01

An important consequence of this inequality is the following.

Corollary 4.3.2 Let Ω ⊂ Rn be an open set that is bounded in at least one direction. Then

$$\|u\|_\partial = \|Du\|_{L^2(\Omega)}$$

defines a norm induced by an inner product on H^1_0(Ω), and H^1_0(Ω) endowed with this inner product is a Hilbert space.

Proof For every u ∈ H^1_0(Ω), we use the Poincare inequality to obtain

$$\|u\|^2_{H^1(\Omega)} \le (C^2 + 1)\,\|Du\|^2_{L^2(\Omega)} \le (C^2 + 1)\,\|u\|^2_{H^1(\Omega)}. \tag{4.3.1}$$

This implies that the norm ‖Du‖_{L²(Ω)} is equivalent to the standard norm ‖u‖_{H^1_0(Ω)}, and so we can define the following inner product on H^1_0(Ω):

$$(u, v)_\partial = \langle Du, Dv\rangle_{L^2(\Omega)} = \int_\Omega Du \cdot Dv\, dx, \tag{4.3.2}$$

so that

$$\|u\|^2_\partial = \|Du\|^2_{L^2(\Omega)} = (u, u)_\partial,$$

and the result follows since H^1_0(Ω) endowed with ‖·‖_{H^1_0(Ω)} is a complete space. □

The norm

$$\|u\|_\partial = \|Du\|_{L^2(\Omega)}$$

on H^1_0(Ω) shall be called the Poincare norm. In a similar fashion, we write ‖u‖_{∂²} to denote ‖D²u‖_{L²(Ω)}.

4.3.3 Poincare–Wirtinger Inequality

One main concern about the Poincare inequality is the case when u is constant over Ω. Of course, in this case u ∉ H^1_0(Ω) (unless u = 0). So we need to generalize the inequality to include this case in the general space H^1(Ω). To motivate this, we need the following definition:

Definition 4.3.3 (Mean Value) Let Ω ⊂ Rn. Then the mean value of a function u over Ω, denoted by ū_Ω, is given by

$$\bar{u}_\Omega = \frac{1}{|\Omega|}\int_\Omega u\, dx.$$

The next inequality generalizes Poincare's inequality.

Theorem 4.3.4 (Poincare–Wirtinger Inequality) Let Ω be an open, connected Lipschitz set in Rn, n ≥ 2. Then there exists C > 0 such that for every u ∈ H^1(Ω), we have

$$\|u - \bar{u}_\Omega\|_{L^2(\Omega)} \le C\,\|Du\|_{L^2(\Omega)}.$$

Proof If the estimate above does not hold, then for every m > 0 there exists u_m ∈ H^1(Ω) such that

$$\|u_m - \overline{(u_m)}_\Omega\|_{L^2(\Omega)} > m\,\|Du_m\|_{L^2(\Omega)}.$$

Define the following sequence:

$$v_m = \frac{u_m - \overline{(u_m)}_\Omega}{\|u_m - \overline{(u_m)}_\Omega\|_2}.$$

Then it is clear that ‖v_m‖₂ = 1 and (v̄_m)_Ω = 0. Moreover,

$$\|Dv_m\|_{L^2} < \frac{1}{m}. \tag{4.3.3}$$

So the sequence {v_m} is bounded in H^1(Ω), and since the embedding H^1(Ω) ↪ L^2(Ω) is compact, there exists a subsequence, say {v_j}, such that v_j → v in L^2(Ω), and so ‖v‖_{L²} = 1. We expect from (4.3.3) that Dv = 0. Indeed, for every φ ∈ C_c^∞(Ω) we have

$$\int_\Omega v\,\frac{\partial\varphi}{\partial x_i}\, dx = \lim_m \int_\Omega v_m\,\frac{\partial\varphi}{\partial x_i}\, dx = -\lim_m \int_\Omega \varphi\,\frac{\partial v_m}{\partial x_i}\, dx = 0,$$

since, by (4.3.3), |∫_Ω φ (∂v_m/∂x_i) dx| ≤ ‖φ‖_{L²}‖Dv_m‖_{L²} < ‖φ‖_{L²}/m → 0. Therefore v ∈ H^1(Ω) and Dv = 0, so v is constant on the connected set Ω. But (v̄_m)_Ω = 0 implies v̄_Ω = 0, and thus v = 0 on Ω, which contradicts the fact that ‖v‖_{L²} = 1. □
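For Ω = (0, 1), the sharp case can be checked numerically (my own illustration, not from the text): u(x) = cos(πx) has zero mean and realizes the ratio ‖u − ū‖₂/‖Du‖₂ = 1/π.

```python
import math

# Poincare-Wirtinger spot check on Omega = (0,1): u(x) = cos(pi x) has mean
# zero, and ||u - u_bar||_2 / ||Du||_2 = 1/pi (the extremal case).
def integrate(g, n=20000):
    h = 1.0 / n
    return h * sum(g((k + 0.5) * h) for k in range(n))

u = lambda x: math.cos(math.pi * x)
du = lambda x: -math.pi * math.sin(math.pi * x)

mean = integrate(u)                                     # = 0
num = math.sqrt(integrate(lambda x: (u(x) - mean) ** 2))
den = math.sqrt(integrate(lambda x: du(x) ** 2))
print(round(num / den, 6))  # approximately 1/pi = 0.318310
```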

4.3.4 Quotient Sobolev Space

In view of the above result, we can reduce the space H^1(Ω) by collapsing the constants to zero, i.e., by considering the quotient Sobolev space

$$\tilde{H}^1(\Omega) = H^1(\Omega)/\mathbb{R},$$

consisting of all equivalence classes [u] = ũ with respect to the equivalence relation u ∼ v iff u − v is constant; thus u, v ∈ ũ implies that u − v is a constant. So, in this space the functions u and u + 1 are the same because they belong to the same equivalence class [u]. If this space is endowed with the Poincare norm

$$\|u\|_\partial = \|Du\|_{L^2(\Omega)},$$

then we have the following proposition, which is easy to verify.

Proposition 4.3.5 Let Ω be an open, connected Lipschitz set in Rn, n ≥ 2. Then the quotient Sobolev space H̃^1(Ω) endowed with the Poincare norm ‖·‖_∂ is a Banach space, and it is a Hilbert space with the inner product

$$\langle\tilde{u}, \tilde{v}\rangle_\partial = \sum_{i=1}^{n}\left\langle\frac{\partial u}{\partial x_i}, \frac{\partial v}{\partial x_i}\right\rangle \tag{4.3.4}$$

for ũ, ṽ ∈ H̃^1(Ω).

Proof See Problem 4.11.4. □

4.4 Elliptic Estimates

4.4.1 Bilinear Forms

Before we start establishing the estimates, it is important to discuss some properties of the form B[u, v] that was defined in the previous section.

Definition 4.4.1 (Bilinear Map) Let H be a Hilbert space and B : H × H → R. The map B is said to be a bilinear map, or bilinear form, if

$$B[\alpha u + \beta v, w] = \alpha B[u, w] + \beta B[v, w], \qquad B[w, \alpha u + \beta v] = \alpha B[w, u] + \beta B[w, v].$$

In words, a bilinear mapping is linear in each coordinate separately: x ↦ B(x, y) is linear for every fixed y ∈ H, and y ↦ B(x, y) is linear for every fixed x ∈ H. A typical example of a bilinear form on the Euclidean space Rn is

$$B[x, y] = \langle Ax, y\rangle = \sum_{i,j=1}^{n} a_{ij}\,x_i y_j$$

for some n × n matrix A = (a_ij); taking A = I gives the dot product. In general, if H is any Hilbert space with R as the underlying field, then the inner product on H × H is bilinear.
Moreover, we have the following.

Definition 4.4.2 Let B : H × H → R be a bilinear mapping. Then:

(1) B is bounded if there exists C > 0 such that
$$|B[u, v]| \le C\,\|u\|_H\|v\|_H$$
for all u, v ∈ H.

(2) B is symmetric if
$$B[u, v] = B[v, u]$$
for all u, v ∈ H.

(3) B is positive definite if
$$B[u, u] > 0$$
for all nonzero u ∈ H.

(4) B is strongly positive (or coercive) on H if there exists η > 0 such that
$$B[u, u] \ge \eta\,\|u\|^2_H$$
for all u ∈ H.

In view of Definition 4.4.2, we expect that many of the properties of linear mappings extend to bilinear mappings. In particular, we have the following, whose proof is left to the reader.

Proposition 4.4.3 Let B : H × H → R be a bilinear mapping. Then the following are equivalent:

(1) B is bounded.
(2) B is continuous everywhere in H × H.
(3) B is continuous at (0, 0).

Proof Exercise. □

4.4.2 Elliptic Bilinear Mapping

Now we come to the elliptic bilinear map that we already introduced in the previous section.

Definition 4.4.4 (Elliptic Bilinear Map) Let a_ij, b_i, c ∈ L^∞(Ω) for some open Ω ⊆ Rn. The elliptic bilinear map B : H^1_0(Ω) × H^1_0(Ω) → R is given by

$$B[u, v] = \int_\Omega\left(\sum_{i,j=1}^{n} a_{ij}\frac{\partial u}{\partial x_i}\frac{\partial v}{\partial x_j} + \sum_{i=1}^{n} b_i\frac{\partial u}{\partial x_i}v + cuv\right)dx. \tag{4.4.1}$$

This B is the bilinear form associated with the elliptic operator L defined in (4.1.1), and it serves the weak formulation given in (4.2.5). The conditions adopted in the definition suffice for our needs in this text. Before we establish our estimates, we need the following.

Lemma 4.4.5 (Cauchy's Inequality with ε) Let ε > 0. Then for s, t > 0 we have

$$st \le \varepsilon s^2 + \frac{t^2}{4\varepsilon}.$$

Proof We have

$$st = \left((2\varepsilon)^{1/2}s\right)\cdot\frac{t}{(2\varepsilon)^{1/2}}.$$

The result follows by applying Young's inequality with a = (2ε)^{1/2}s and b = t/(2ε)^{1/2}. □
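The constant in Lemma 4.4.5 is easy to get wrong; a random spot check (my own, not from the text) confirms the inequality as stated:

```python
import random

# Cauchy's inequality with epsilon: s*t <= eps*s^2 + t^2/(4*eps) for s, t > 0,
# which follows from ((2*eps)**0.5 * s - t / (2*eps)**0.5)**2 >= 0.
random.seed(1)
for _ in range(10000):
    s = random.uniform(1e-6, 10.0)
    t = random.uniform(1e-6, 10.0)
    eps = random.uniform(1e-6, 10.0)
    assert s * t <= eps * s * s + t * t / (4.0 * eps) + 1e-9
print("inequality holds on all samples")
```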

4.4.3 Garding’s Inequality

Theorem 4.4.6 (Elliptic Estimates) Let B be the elliptic bilinear map (4.4.1) for some open Ω in Rn.

(1) If a_ij, b_i, c ∈ L^∞(Ω), then B is bounded; i.e., there exists α > 0 such that for every u, v ∈ H^1_0(Ω) we have

$$|B[u, v]| \le \alpha\,\|u\|_{H^1_0(\Omega)}\|v\|_{H^1_0(\Omega)}.$$

(2) Garding's inequality: If B is associated with a uniformly elliptic operator L on a domain Ω that is bounded in at least one direction, then there exist β > 0 and γ ≥ 0 such that for all u ∈ H^1_0(Ω) we have

$$B[u, u] \ge \beta\,\|u\|^2_{H^1_0(\Omega)} - \gamma\,\|u\|^2_{L^2(\Omega)}.$$

Remark For convenience, we write ‖·‖_{L^∞(Ω)} = ‖·‖_∞ and ‖·‖_{L²(Ω)} = ‖·‖₂.

Proof For (1), let

$$M = \max\left\{\sum_{i,j=1}^{n}\|a_{ij}\|_\infty,\ \sum_{i=1}^{n}\|b_i\|_\infty,\ \|c\|_\infty\right\}.$$

Then we have

$$|B[u, v]| \le \sum_{i,j=1}^{n}\|a_{ij}\|_\infty\int_\Omega\left|\frac{\partial u}{\partial x_i}\right|\left|\frac{\partial v}{\partial x_j}\right| dx + \sum_{i=1}^{n}\|b_i\|_\infty\int_\Omega\left|\frac{\partial u}{\partial x_i}\right||v|\, dx + \|c\|_\infty\int_\Omega |u||v|\, dx$$

$$\le M\int_\Omega\left(\left|\frac{\partial u}{\partial x_i}\right|\left|\frac{\partial v}{\partial x_j}\right| + \left|\frac{\partial u}{\partial x_i}\right||v| + |u||v|\right)dx$$

$$\le M\left(\left\|\frac{\partial u}{\partial x_i}\right\|_2\left\|\frac{\partial v}{\partial x_j}\right\|_2 + \left\|\frac{\partial u}{\partial x_i}\right\|_2\|v\|_2 + \|u\|_2\|v\|_2\right) \quad\text{(by the C–S inequality)}$$

$$\le \alpha\,\|u\|_{H^1_0(\Omega)}\|v\|_{H^1_0(\Omega)} \qquad(\text{since } \|\cdot\|_{L^2} \le \|\cdot\|_{H^1_0})$$

for a suitable α = 3M > 0.


For (2), since L is uniformly elliptic, there exists λ0 > 0 such that

$$\sum_{i,j=1}^{n} a_{ij}(x)\,\xi_i\xi_j \ge \lambda_0\,|\xi|^2 \tag{4.4.2}$$

for all ξ ∈ Rn. Let ξ = Du, substitute in (4.4.2), and integrate both sides over Ω. Then we have

$$\lambda_0\int_\Omega |Du|^2\, dx \le \int_\Omega\sum_{i,j=1}^{n} a_{ij}(x)\frac{\partial u}{\partial x_i}\frac{\partial u}{\partial x_j}\, dx = B[u, u] - \int_\Omega\sum_{i=1}^{n} b_i(x)\frac{\partial u}{\partial x_i}u\, dx - \int_\Omega cu^2\, dx$$

$$\le B[u, u] + \sum_{i=1}^{n}\|b_i\|_\infty\int_\Omega\left|\frac{\partial u}{\partial x_i}\right||u|\, dx + \|c\|_\infty\int_\Omega |u|^2\, dx.$$

Now we apply Cauchy's inequality with s = |∂u/∂x_i| and t = |u|, choosing

$$0 < \varepsilon < \frac{\lambda_0}{2\sum_{i=1}^{n}\|b_i\|_\infty}.$$

This gives

$$\lambda_0\int_\Omega |Du|^2\, dx \le B[u, u] + \varepsilon\sum_{i=1}^{n}\|b_i\|_\infty\int_\Omega |Du|^2\, dx + \left(\frac{\sum_{i=1}^{n}\|b_i\|_\infty}{4\varepsilon} + \|c\|_\infty\right)\int_\Omega u^2\, dx.$$

It follows that

$$\frac{\lambda_0}{2}\int_\Omega |Du|^2\, dx \le \left(\lambda_0 - \varepsilon\sum_{i=1}^{n}\|b_i\|_\infty\right)\int_\Omega |Du|^2\, dx \le B[u, u] + \left(\frac{\sum_{i=1}^{n}\|b_i\|_\infty}{4\varepsilon} + \|c\|_\infty\right)\int_\Omega u^2\, dx,$$

which, by the Poincare inequality (4.3.1), implies

$$\beta\,\|u\|^2_{H^1_0(\Omega)} \le \frac{\lambda_0}{2}\|Du\|^2_2 \le B[u, u] + \gamma\,\|u\|^2_2,$$

where

$$\beta = \frac{\lambda_0}{2(C^2 + 1)}, \qquad \gamma = \frac{\sum_{i=1}^{n}\|b_i\|_\infty}{4\varepsilon} + \|c\|_\infty.$$

If b_i = 0 for all i, then we simply have

$$\lambda_0\int_\Omega |Du|^2\, dx \le B[u, u] + \|c\|_\infty\int_\Omega |u|^2\, dx,$$

which, using the Poincare inequality again, implies

$$\beta\,\|u\|^2_{H^1_0(\Omega)} \le B[u, u] + \gamma\,\|u\|^2_2,$$

where

$$\beta = \frac{\lambda_0}{C^2 + 1}, \qquad \gamma = \|c\|_\infty. \qquad\square$$

4.5 Symmetric Elliptic Operators

4.5.1 Riesz Representation Theorem for Hilbert Spaces

Before establishing results on the existence and uniqueness of solutions of elliptic PDEs, we need to recall the famous Riesz Representation Theorem on Hilbert spaces and give a brief proof of it.

Theorem 4.5.1 (Riesz Representation Theorem (RRT) for Hilbert Spaces) Let H be a Hilbert space, and 0 ≠ f ∈ H*. Then there exists a unique element u ∈ H such that

$$f(v) = \langle v, u\rangle$$

for all v ∈ H, and

$$\|f\| = \|u\|.$$

Proof Note that Y = ker(f) is a closed proper subspace of H, so Y^⊥ contains a nonzero element, say y₀. Then f(v)y₀ − f(y₀)v ∈ Y for every v ∈ H, which implies that

$$0 = f(v)\langle y_0, y_0\rangle - f(y_0)\langle v, y_0\rangle,$$

from which we get

$$f(v) = \frac{f(y_0)}{\langle y_0, y_0\rangle}\langle v, y_0\rangle = \left\langle v, \frac{f(y_0)}{\langle y_0, y_0\rangle}\,y_0\right\rangle.$$

Then

$$u = \frac{f(y_0)}{\langle y_0, y_0\rangle}\,y_0$$

establishes existence.

If there is another element u′ ∈ H such that f(v) = ⟨v, u′⟩, then

$$\langle v, u - u'\rangle = 0$$

for all v ∈ H. Choosing v = u − u′ gives u = u′, which establishes uniqueness.

For the norm identity, note that f ∈ H* with ‖f‖ ≤ ‖u‖ by the Cauchy–Schwarz inequality. On the other hand,

$$\|u\|^2 = \langle u, u\rangle = f(u) \le \|f\|\,\|u\|.$$

Dividing by ‖u‖ gives the other direction. □
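Theorem 4.5.1 can be seen in finite dimensions (my own illustration, not from the text): on H = R² with the weighted inner product ⟨v, u⟩_A = vᵀAu for a symmetric positive definite A, the functional f(v) = b · v is represented by u = A⁻¹b.

```python
# Finite-dimensional Riesz representation: H = R^2 with <v, u>_A = v^T A u.
A = [[2.0, 1.0], [1.0, 2.0]]   # symmetric positive definite
b = [1.0, 3.0]                 # f(v) = b[0]*v[0] + b[1]*v[1]

# Representer u = A^{-1} b via the explicit 2x2 inverse.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
u = [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
     (-A[1][0] * b[0] + A[0][0] * b[1]) / det]

def inner(v, w):
    """<v, w>_A = v^T A w."""
    return sum(v[i] * A[i][j] * w[j] for i in range(2) for j in range(2))

for v in ([1.0, 0.0], [0.0, 1.0], [2.0, -1.0]):
    f_v = b[0] * v[0] + b[1] * v[1]
    assert abs(f_v - inner(v, u)) < 1e-12  # f(v) = <v, u>_A for every v
print("representer u =", u)
```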



4.5.2 Existence and Uniqueness Theorem—Poisson's Equation

The first equation to start with is Poisson's equation.

Theorem 4.5.2 (First Existence Theorem) Consider the Dirichlet problem

$$\begin{cases} -\nabla^2 u = f & \text{in } \Omega \\ u = 0 & \text{on } \partial\Omega \end{cases} \tag{4.5.1}$$

where f ∈ L^2(Ω) for some open Ω ⊂ Rn that is bounded in at least one direction. Then there exists a unique weak solution u ∈ H^1_0(Ω) of problem (4.5.1).

Proof Note that L = −∇² is elliptic with a_ij(x) = δ_ij and b_i(x) = c(x) = 0 for all i = 1, ..., n. So the elliptic bilinear map is of the form

$$B[u, v] = \int_\Omega Du \cdot Dv\, dx = \langle u, v\rangle_\partial,$$

which is the Poincare inner product defined in (4.3.2). Corollary 4.3.2 asserts that H^1_0(Ω) with this inner product is a Hilbert space, and since f ∈ L^2(Ω), the weak formulation reads

$$(u, v)_\partial = \langle f, v\rangle = f(v) = \int_\Omega f v\, dx.$$

Indeed, by Holder's (or the C–S) inequality together with the Poincare inequality,

$$|f(v)| \le \int_\Omega |f v|\, dx \le \|f\|_{L^2(\Omega)}\|v\|_{L^2(\Omega)} \le C\,\|f\|_{L^2(\Omega)}\|v\|_{H^1_0(\Omega)},$$

so

$$f \in (H^1_0(\Omega))^* = H^{-1}(\Omega),$$

and therefore, by the Riesz Representation Theorem (RRT), there exists a unique u ∈ H^1_0(Ω) satisfying the weak formulation, and clearly

$$u|_{\partial\Omega} = 0$$

since u ∈ H^1_0(Ω). Hence u is the unique weak solution of problem (4.5.1). □
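Theorem 4.5.2 is constructive in spirit, and a Galerkin discretization makes it tangible. The sketch below (my own illustration, not from the text) solves −u″ = 2 on (0, 1) with u(0) = u(1) = 0 using piecewise-linear finite elements; the exact weak (and classical) solution is u(x) = x(1 − x), and for this 1D problem the nodal values come out exact.

```python
# P1 finite elements for -u'' = 2 on (0,1) with u(0) = u(1) = 0; the exact
# weak solution is u(x) = x(1 - x).
n = 64                               # number of subintervals (h = 1/n)
h = 1.0 / n

# Stiffness matrix of B[u, v] = int_0^1 u'v' dx on the hat-function basis is
# (1/h) * tridiag(-1, 2, -1); the load is f(phi_i) = int 2*phi_i dx = 2h (exact).
sub = [-1.0 / h] * (n - 2)           # sub-/super-diagonal entries
diag = [2.0 / h] * (n - 1)           # main diagonal
rhs = [2.0 * h] * (n - 1)

# Thomas algorithm (forward elimination + back substitution).
for i in range(1, n - 1):
    m = sub[i - 1] / diag[i - 1]
    diag[i] -= m * sub[i - 1]
    rhs[i] -= m * rhs[i - 1]
u = [0.0] * (n - 1)
u[-1] = rhs[-1] / diag[-1]
for i in range(n - 3, -1, -1):
    u[i] = (rhs[i] - sub[i] * u[i + 1]) / diag[i]

# Compare with u(x) = x(1 - x) at the interior nodes x = i*h.
err = max(abs(u[i - 1] - (i * h) * (1.0 - i * h)) for i in range(1, n))
print("max nodal error:", err)
```

Uniqueness in Theorem 4.5.2 is mirrored here by the invertibility of the positive definite stiffness matrix.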

The second equation to discuss is the nonhomogeneous Helmholtz equation.


4.5.3 Existence and Uniqueness Theorem—Helmholtz Equation

Theorem 4.5.3 (Second Existence Theorem) Consider the Dirichlet problem

$$\begin{cases} -\nabla^2 u + u = f & \text{in } \Omega \\ u = 0 & \text{on } \partial\Omega \end{cases} \tag{4.5.2}$$

where f ∈ H^{-1}(Ω) for some Ω ⊆ Rn. Then there exists a unique weak solution u ∈ H^1_0(Ω) of problem (4.5.2).

Proof Note that

$$Lu = -\nabla^2 u + u,$$

which is elliptic with a_ij(x) = δ_ij, b_i(x) = 0 for all i = 1, ..., n, and c(x) = 1. Here the elliptic bilinear map takes the following form:

$$B[u, v] = \int_\Omega (Du \cdot Dv + uv)\, dx = \langle u, v\rangle_{H^1_0(\Omega)}.$$

So B defines the standard Sobolev inner product on H^1_0(Ω), and

$$B[u, v] = \langle f, v\rangle = f(v)$$

for all v ∈ H^1_0(Ω), where f is a bounded linear functional on H^1_0(Ω). Thus, by the Riesz Representation Theorem (RRT), there exists a unique function u ∈ H^1_0(Ω) satisfying the equation, and of course

$$u|_{\partial\Omega} = 0.$$

Hence, u is the unique weak solution of problem (4.5.2). □

We observe two important points from the two preceding theorems.

(1) The domain in the second problem was chosen to be Ω ⊆ Rn, which could be all of Rn, whereas the domain in the first problem was required to be open and bounded in at least one direction. This is because the first problem requires the Poincare inequality in order to use the Poincare norm, while the second problem uses the standard Sobolev norm without recourse to the Poincare inequality.

(2) Both operators in the two problems are symmetric. In fact, this condition is essential for B to define an inner product on H^1_0. If b_i ≠ 0 for some i, then L is not symmetric, so B cannot define an inner product, and the Riesz Representation Theorem is not applicable.

4.5.4 Ellipticity and Coercivity

Now, we investigate equations of the form

$$Lu = -\sum_{i,j=1}^{n}\frac{\partial}{\partial x_i}\Big(a_{ij}(x)\frac{\partial u}{\partial x_j}\Big) + c(x)u(x).$$

The following result connects uniform ellipticity with coercivity.

Theorem 4.5.4 Consider the elliptic operator

$$L = -\sum_{i,j=1}^{n}\partial_{x_i}\big(a_{ij}(x)\,\partial_{x_j}\big) + c(x), \tag{4.5.3}$$

defined on H^1_0(Ω) for some open Ω that is bounded in at least one direction of Rn, and let a_ij, c ∈ L^∞(Ω) with c(x) ≥ 0.

(1) If L is uniformly elliptic, then the associated elliptic bilinear map B[u, v] is coercive.

(2) Moreover, if A = (a_ij) is symmetric, then B defines a complete inner product on H^1_0(Ω).

Proof The elliptic bilinear map associated with L takes the form

$$B[u, v] = \int_\Omega\left(\sum_{i,j=1}^{n} a_{ij}(x)\frac{\partial u}{\partial x_i}\frac{\partial v}{\partial x_j} + c(x)u(x)v(x)\right)dx.$$

By the uniform ellipticity of L, there exists λ0 > 0 such that for all ξ ∈ Rn

$$\sum_{i,j=1}^{n} a_{ij}(x)\,\xi_i\xi_j \ge \lambda_0\,|\xi|^2. \tag{4.5.4}$$

Substituting ξ = Du in (4.5.4) and using the result in B, we obtain

$$B[u, u] \ge \int_\Omega\left(\lambda_0|Du|^2 + c(x)u^2\right)dx \ge \lambda_0\int_\Omega |Du|^2\, dx = \int_\Omega\left(\frac{\lambda_0}{2}|Du|^2 + \frac{\lambda_0}{2}|Du|^2\right)dx$$

$$\ge \int_\Omega\left(\frac{\lambda_0}{2}|Du|^2 + \frac{\lambda_0}{2C^2}u^2\right)dx \quad\text{(by the Poincare inequality)}$$

$$\ge \sigma\int_\Omega\left(|Du|^2 + u^2\right)dx = \sigma\,\|u\|^2_{H^1_0(\Omega)}$$

for any u ∈ H^1_0(Ω), where

$$\sigma = \min\left\{\frac{\lambda_0}{2},\ \frac{\lambda_0}{2C^2}\right\}.$$

This proves that B is coercive. In particular,

$$B[u, u] > 0$$

for u ≠ 0, and if B[u, u] = 0 then we clearly have u = 0. Moreover, by the symmetry of A, we have

$$B[u, v] = B[v, u].$$

Hence, B[u, v] defines an inner product ⟨·, ·⟩_B on H^1_0(Ω), and

$$B[u, u] = \langle u, u\rangle_B = \|u\|^2_B \ge \sigma\,\|u\|^2_{H^1_0(\Omega)},$$

or

$$\|u\|_B \ge \sqrt{\sigma}\,\|u\|_{H^1_0(\Omega)}. \tag{4.5.5}$$

On the other hand, since Σ_{i,j} a_ij u_{x_i} u_{x_j} ≤ M|Du|² and cu² ≤ Mu²,

$$\langle u, u\rangle_B \le M\int_\Omega\left(|Du|^2 + u^2\right)dx = M\,\|u\|^2_{H^1_0(\Omega)}, \tag{4.5.6}$$

where

$$M = \max\left\{\sum_{i,j=1}^{n}\|a_{ij}\|_{L^\infty(\Omega)},\ \|c\|_{L^\infty(\Omega)}\right\}.$$

Then (4.5.5) and (4.5.6) imply that the inner product ⟨·, ·⟩_B is equivalent to the standard inner product ⟨·, ·⟩_{H^1_0(Ω)}, and thus (H^1_0(Ω), ⟨·, ·⟩_B) is a Hilbert space. □

4.5.5 Existence and Uniqueness Theorem—Symmetric Uniformly Elliptic Operator

The next theorem provides an existence and uniqueness result for (4.2.4) with a symmetric uniformly elliptic operator of the form (4.5.3). Remember that the condition b_i = 0 is essential for the symmetry of L.

Theorem 4.5.5 Consider the Dirichlet elliptic problem

$$\begin{cases} Lu = f & \text{in } \Omega \\ u = 0 & \text{on } \partial\Omega, \end{cases} \tag{4.5.7}$$

where L is a uniformly elliptic operator of the form (4.5.3) defined on some open set Ω that is bounded in at least one direction of Rn. If A = (a_ij) is symmetric, a_ij, c ∈ L^∞(Ω), f ∈ L^2(Ω), and c(x) ≥ 0, then there exists a unique weak solution of problem (4.5.7).

Proof As in Theorem 4.5.4, we have

$$B[u, v] = \int_\Omega\left(\sum_{i,j=1}^{n} a_{ij}(x)\frac{\partial u}{\partial x_i}\frac{\partial v}{\partial x_j} + c(x)u(x)v(x)\right)dx.$$

Then B is bounded by estimate (1) of Theorem 4.4.6, and is symmetric since A is symmetric. Since L is uniformly elliptic and c ≥ 0, Theorem 4.5.4 shows that B is coercive and defines a complete inner product ⟨·, ·⟩_B on H^1_0(Ω), so that (H^1_0(Ω), ⟨·, ·⟩_B) is a Hilbert space. Moreover, f(v) = ⟨f, v⟩ is a bounded linear functional on L^2(Ω), and thus on H^1_0(Ω). The existence and uniqueness of the weak solution of problem (4.5.7) now follow from the Riesz Representation Theorem. □
Remark We end the section with the following remarks.

(1) The symmetry condition on A was required to prove that B defines an inner product, but it was not required to prove the coercivity of B.

(2) The above results remain valid if c(x) admits negative values (see Problems 4.11.20 and 4.11.21).

4.6 General Elliptic Operators

4.6.1 Lax–Milgram Theorem

The elliptic bilinear map B in Theorem 4.5.4 for a symmetric elliptic operator defines the most general inner product on H^1_0(Ω), in the sense that if A is the identity matrix and c = 1 then

$$\langle u, u\rangle_B = \langle u, u\rangle_{H^1_0(\Omega)},$$

and if A is the identity matrix and c = 0 then

$$\langle u, u\rangle_B = \langle u, u\rangle_\partial.$$

If b_i ≠ 0 for at least one i, then L is not symmetric, which implies that B is not symmetric. In this case, B cannot define an inner product, and consequently we cannot apply the Riesz Representation Theorem. Therefore, we need to investigate a more general version of the Riesz Representation Theorem that allows us to deal with general elliptic operators that are not symmetric. The following theorem is fundamental and serves our needs.
Theorem 4.6.1 (Lax–Milgram) Let B : H × H → R be a bilinear mapping on a Hilbert space H. If B is bounded and coercive, then for every f ∈ H* there exists a unique u ∈ H such that

$$B[u, v] = \langle f, v\rangle$$

for all v ∈ H.

Proof Let u ∈ H be a fixed element. Then the mapping v ↦ B[u, v] defines a bounded linear functional on H, and so by the Riesz Representation Theorem there exists a unique w_u ∈ H such that

$$B[u, v] = \langle w_u, v\rangle$$

for all v ∈ H. Our claim is that the mapping u ↦ w_u is one-to-one and onto. Granting the claim, let w_f ∈ H be the Riesz representer of f; then there exists u ∈ H with w_u = w_f, and for this u we have B[u, v] = ⟨w_f, v⟩ = ⟨f, v⟩ for all v ∈ H.

So consider the mapping T : H → H, T(u) = w_u. Then for all v ∈ H, we have

$$B[u, v] = \langle Tu, v\rangle.$$

Clearly T is linear, due to the linearity of B in its first argument. Moreover, by the boundedness of B,

$$\|Tu\|^2 = \langle Tu, Tu\rangle = B[u, Tu] \le C\,\|u\|\,\|Tu\|.$$

Dividing by ‖Tu‖, we conclude that T is bounded, i.e., continuous. Moreover, by the coercivity of B there exists η > 0 such that

$$\eta\,\|u\|^2_H \le B[u, u] = \langle Tu, u\rangle \le \|Tu\|_H\,\|u\|_H.$$

Again, dividing by ‖u‖_H shows that T is bounded below:

$$\eta\,\|u\|_H \le \|Tu\|_H. \tag{4.6.1}$$

The next step is to show that R(T) is closed and that

$$R(T)^\perp = \{0\},$$

which together imply that R(T) = H, i.e., T is onto. To show that R(T) is closed, let (w_n) be a Cauchy sequence in R(T). For each n, there exists u_n such that T(u_n) = w_n. Then by (4.6.1), we have

$$\|u_n - u_m\| \le \frac{1}{\eta}\,\|T(u_n - u_m)\| = \frac{1}{\eta}\,\|w_n - w_m\| \to 0,$$

so (u_n) is Cauchy. By the completeness of H, u_n → u for some u ∈ H, and w_n → w for some w ∈ H. Hence, by the continuity of T,

$$w = \lim w_n = \lim T(u_n) = T(\lim u_n) = T(u),$$

therefore w ∈ R(T), and consequently R(T) is closed in H. It follows from the orthogonal decomposition of Hilbert spaces that

$$H = R(T) \oplus R(T)^\perp.$$

Now, if R(T) ⊊ H, then there exists 0 ≠ y ∈ R(T)^⊥, so that ⟨y, Tz⟩ = 0 for all z ∈ H. By the coercivity of B,

$$\eta\,\|y\|^2_H \le B[y, y] = \langle Ty, y\rangle = 0.$$

Therefore y = 0, a contradiction; hence

$$R(T) = H,$$

i.e., T is onto. This means that there exists u ∈ H such that w_u = w_f.

To show the uniqueness of u, suppose Tu₁ = Tu₂; then substituting u = u₁ − u₂ in (4.6.1) gives u₁ = u₂, which implies that T is one-to-one, and hence u is unique. □

Note that in the proof of the theorem, we did not assume that B is symmetric.
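Lax–Milgram can be seen in miniature on H = R² (my own finite-dimensional illustration, not from the text): take B[u, v] = vᵀKu with a nonsymmetric but coercive K, so B defines no inner product, yet B[u, v] = ⟨f, v⟩ for all v still has a unique solution u, obtained by solving Ku = f.

```python
import random

# Nonsymmetric but coercive: B[u,u] = 2u1^2 + u1*u2 + 2u2^2
#                                   = 1.5|u|^2 + 0.5(u1 + u2)^2 >= 1.5|u|^2.
K = [[2.0, 1.0], [0.0, 2.0]]
f = [3.0, 1.0]

random.seed(2)
for _ in range(1000):  # coercivity spot check with eta = 1.5
    x1, x2 = random.uniform(-5, 5), random.uniform(-5, 5)
    assert 2*x1*x1 + x1*x2 + 2*x2*x2 >= 1.5 * (x1*x1 + x2*x2) - 1e-9

# Solve K u = f by back substitution (K is upper triangular).
u2 = f[1] / K[1][1]
u1 = (f[0] - K[0][1] * u2) / K[0][0]

# Verify B[u, v] = f . v on a basis of test vectors v.
for v in ([1.0, 0.0], [0.0, 1.0]):
    Ku = [K[0][0]*u1 + K[0][1]*u2, K[1][0]*u1 + K[1][1]*u2]
    Buv = v[0]*Ku[0] + v[1]*Ku[1]
    assert abs(Buv - (f[0]*v[0] + f[1]*v[1])) < 1e-12
print("u =", (u1, u2))
```

Existence and uniqueness here come from the invertibility of K, which is the finite-dimensional shadow of the operator T in the proof being one-to-one and onto.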

4.6.2 Dirichlet Problems

The next theorem is basically the same as Theorem 4.5.5 except that the symmetry
condition for A is relaxed.
Theorem 4.6.2 Consider the Dirichlet elliptic problem

Lu = f in 
(4.6.2)
u = 0 on ∂.

where L is a uniformly elliptic operator of the form (4.5.3), such that ai j , c ∈


L ∞ (), f ∈ L 2 (), and c(x) ≥ 0 for some open  that is bounded in at least
some direction in Rn . Then there exists a unique weak solution in H01 () for
the problem (4.6.2).
Proof Define

$$B[u, v] = \int_\Omega \Bigg(\sum_{i,j=1}^n a_{ij}(x)\,\frac{\partial u}{\partial x_i}\,\frac{\partial v}{\partial x_j} + c(x)u(x)v(x)\Bigg)dx. \qquad (4.6.3)$$

Then $B$ is an elliptic bilinear map with $a_{ij}, c \in L^\infty(\Omega)$, and so it is bounded by estimate 1 of Theorem 4.4.6. Moreover, since $L$ is uniformly elliptic, $B[u, v]$ is coercive by Theorem 4.5.4(1). So by the Lax–Milgram theorem, for every

$$f \in L^2(\Omega) \subset H^{-1}(\Omega),$$

there exists a unique $u \in H_0^1(\Omega)$ such that

$$f(v) = B[u, v]$$

for all $v \in H_0^1(\Omega)$, which is precisely the weak formulation of (4.6.2). □
Recall that in an elliptic operator $L$, the coefficient matrix $A$ is positive definite, so its eigenvalues are all positive. In the previous theorem it was assumed that $c \ge 0$, so it cannot be an eigenvalue of $-L$. But for arbitrary $c$, we need to avoid the values that would make $c$ an eigenvalue of $-L$, since this would give zero on the LHS of (4.5.7), and hence $f$ could not be obtained.

Now we will study the solution of the equation

$$Lu + \mu u = f \quad \text{in } \Omega,$$

where $L$ is the operator

$$Lu = -\sum_{i,j=1}^n \frac{\partial}{\partial x_i}\left(a_{ij}(x)\frac{\partial u}{\partial x_j}\right) + c(x)u(x),$$

so the zeroth-order term in the equation is $(c(x) + \mu)u$. When we relax the condition $c \ge 0$, we will make use of

$$\gamma = \|c\|_\infty$$

in Gårding's elliptic estimate. If we assume

$$\mu \ge \gamma,$$

then the zeroth-order term becomes

$$(c(x) + \mu)u \ge (c(x) + \|c\|_\infty)u,$$

from which we obtain

$$c(x) + \|c\|_\infty \ge 0$$

for all choices of $c$, so by Theorem 4.5.4 the elliptic bilinear map $B[u, v]$ is coercive. Thus we have the following.
Theorem 4.6.3 Consider the Dirichlet elliptic problem

$$Lu + \mu u = f \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega, \qquad (4.6.4)$$

for some uniformly elliptic operator $L$ of the form (4.5.3), such that $a_{ij}, c \in L^\infty(\Omega)$ and $f \in L^2(\Omega)$, for some open $\Omega$ that is bounded in at least one direction in $\mathbb{R}^n$. If $\mu \ge \gamma$, where $\gamma$ is the constant obtained in Gårding's inequality, then there exists a unique weak solution of problem (4.6.4).

Proof The result follows immediately from Theorem 4.6.2 since

$$c(x) + \mu \ge c(x) + \|c\|_\infty \ge 0. \ \square$$

In other words, for $\mu \ge \gamma$, the operator

$$L_\mu = (L + \mu I) : H_0^1 \longrightarrow H^{-1} \qquad (4.6.5)$$

is onto (by existence), one-to-one (by uniqueness), and bounded.
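A discrete illustration of this shift (a sketch under our own choice of grid operator and coefficient, not the book's construction): adding $\mu I$ with $\mu \ge \|c\|_\infty$ turns an indefinite discretized operator into a positive definite, hence invertible, one.

```python
import numpy as np

# Discrete sketch of the shift L_mu = L + mu*I (illustrative only): with a
# strongly negative c the 3-point Dirichlet discretization of -u'' + c u on
# (0, 1) fails to be positive definite, but adding mu*I with
# mu >= gamma = ||c||_inf restores positive definiteness, hence invertibility.
n = 60
h = 1.0 / (n + 1)
c = -50.0                                     # zeroth-order coefficient, c < 0
lap = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
       - np.diag(np.ones(n - 1), -1)) / h**2  # Dirichlet discretization of -u''
L = lap + c * np.eye(n)

gamma = abs(c)                                # plays the role of ||c||_inf
L_mu = L + gamma * np.eye(n)

assert np.linalg.eigvalsh(L).min() < 0        # L itself is indefinite here
assert np.linalg.eigvalsh(L_mu).min() > 0     # L + mu*I is positive definite
```

The smallest eigenvalue of the discrete $-u''$ is near $\pi^2 \approx 9.87$, so $c = -50$ pushes $L$ below zero while the shift by $\gamma = 50$ lifts it back.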

4.6.3 Neumann Problems

Now we investigate elliptic PDEs with Neumann conditions. Consider the problem

$$Lu = f \ \text{ in } \Omega, \qquad \nabla u \cdot n = 0 \ \text{ on } \partial\Omega, \qquad (4.6.6)$$

where $L$ is the elliptic operator in (4.5.3). Here, we will assume $\Omega$ is bounded and open in $\mathbb{R}^n$ for $n \ge 2$. The weak formulation of the problem takes the following form: find $u \in H^1(\Omega)$ such that

$$\int_\Omega \Bigg(\sum_{i,j=1}^n a_{ij}(x)\,\frac{\partial u}{\partial x_i}\,\frac{\partial v}{\partial x_j} + c(x)u(x)v(x)\Bigg)dx = \langle f, v\rangle \ \text{ in } \Omega, \qquad \nabla u \cdot n = 0 \ \text{ on } \partial\Omega.$$

The argument for finding the weak formulation of the problem is almost the same as for the Dirichlet problem except for one thing: the solution doesn't vanish on the boundary, so our test functions will be in $C^\infty(\Omega)$, and consequently our solution space will be $H^1(\Omega)$ rather than $H_0^1(\Omega)$. This means that we won't be able to use the Poincaré inequality, since it requires $H_0^1(\Omega)$. To solve the problem we assume that $a_{ij}, c \in L^\infty(\Omega)$, $f \in L^2(\Omega)$, and $c(x) \ge 0$ on $\Omega$. Here, the case $c = 0$ should be treated with extra care for reasons that will be discussed shortly. So we need to discuss two cases separately: the first when $c(x)$ is bounded away from $0$, and the second when $c = 0$.
Theorem 4.6.4 Let $a_{ij}, c \in L^\infty(\Omega)$ and $f \in L^2(\Omega)$ in the problem (4.6.6) defined on a bounded open $\Omega$ in $\mathbb{R}^n$ for $n \ge 2$. If $c(x) \ge m > 0$, then there exists a unique weak solution $u \in H^1(\Omega)$ of problem (4.6.6), and for some $C > 0$ we have

$$\|u\|_{H^1(\Omega)} \le C\,\|f\|_{L^2(\Omega)}. \qquad (4.6.7)$$

Proof The elliptic bilinear map associated with $L$ is

$$B[u, v] = \int_\Omega \Bigg(\sum_{i,j=1}^n a_{ij}(x)\,\frac{\partial u}{\partial x_i}\,\frac{\partial v}{\partial x_j} + c(x)u(x)v(x)\Bigg)dx.$$

Then,

$$|B[u, v]| \le \sum_{i,j=1}^n \|a_{ij}\|_{L^\infty(\Omega)} \int_\Omega |Du|\,|Dv|\,dx + \|c\|_{L^\infty(\Omega)} \int_\Omega |u|\,|v|\,dx \le \alpha\left(\int_\Omega |Du|\,|Dv|\,dx + \int_\Omega |u|\,|v|\,dx\right) \le 2\alpha\,\|u\|_{H^1(\Omega)}\,\|v\|_{H^1(\Omega)},$$

where

$$\alpha = \max\Bigg\{\sum_{i,j=1}^n \|a_{ij}\|_{L^\infty(\Omega)},\ \|c\|_{L^\infty(\Omega)}\Bigg\}.$$

Hence $B$ is bounded on $H^1(\Omega) \times H^1(\Omega)$. Moreover, by letting $\xi = Du$ in the uniform ellipticity condition,

$$B[u, u] \ge \lambda_0 \int_\Omega |Du|^2\,dx + m \int_\Omega u^2\,dx \ge \beta\,\|u\|^2_{H^1(\Omega)},$$

where $\beta = \min\{\lambda_0, m\}$. Hence $B$ is coercive. By applying the Lax–Milgram theorem, we obtain a unique $u \in H^1(\Omega)$ such that

$$\beta\,\|u\|^2_{H^1(\Omega)} \le B[u, u] = \langle f, u\rangle \le \|f\|_{L^2(\Omega)}\,\|u\|_{H^1(\Omega)}.$$

Dividing by $\beta\,\|u\|_{H^1(\Omega)}$, we arrive at estimate (4.6.7). □
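The closing step of the proof has an exact matrix analogue (an illustrative sketch of our own): for a symmetric positive definite system, dividing the coercivity bound by $\beta\|u\|$ yields the stability estimate $\|u\| \le \|f\|/\beta$.

```python
import numpy as np

# Discrete analogue of estimate (4.6.7) (illustrative sketch): for a symmetric
# positive definite B with smallest eigenvalue beta, the solution of B u = f
# satisfies beta |u|^2 <= u . (B u) = f . u <= |f| |u|, hence |u| <= |f| / beta.
rng = np.random.default_rng(1)
n = 30
M = rng.standard_normal((n, n))
B = M @ M.T + 0.5 * np.eye(n)                 # symmetric positive definite
beta = np.linalg.eigvalsh(B).min()            # coercivity constant

f = rng.standard_normal(n)
u = np.linalg.solve(B, f)

assert u @ B @ u >= beta * (u @ u) - 1e-8             # coercivity
assert np.linalg.norm(u) <= np.linalg.norm(f) / beta + 1e-8   # stability bound
```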

Now, we discuss the second case, when $c = 0$. The problem reduces to

$$Lu = f \ \text{ in } \Omega, \qquad \nabla u \cdot n = 0 \ \text{ on } \partial\Omega. \qquad (4.6.8)$$

The weak formulation of the problem takes the following form: find $u \in H^1(\Omega)$ such that

$$\int_\Omega \sum_{i,j=1}^n a_{ij}(x)\,\frac{\partial u}{\partial x_i}\,\frac{\partial v}{\partial x_j}\,dx = \langle f, v\rangle \ \text{ in } \Omega, \qquad \nabla u \cdot n = 0 \ \text{ on } \partial\Omega.$$

The difficulty of this problem stems from the fact that existence and uniqueness are not guaranteed: if $u$ is a weak solution of the problem, then $u + k$ is also a solution for any constant $k$. Moreover, taking a constant $k$ as a test function forces

$$\langle f, k\rangle = 0,$$

which is equivalent to having

$$\bar{f} = 0.$$

In this case, two conditions will be added: the first is that

$$\bar{u}_\Omega = 0,$$

and the second is the compatibility condition

$$\langle f, 1\rangle = 0,$$

and so the Poincaré–Wirtinger inequality and the quotient Sobolev space will be invoked here.
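These two conditions can be seen at work in a discrete model (an illustrative sketch; the stencil and data are our own choices, not the book's): the standard Neumann matrix for $-u''$ annihilates constants, so a solution exists only for zero-mean data and is unique only after fixing the mean of $u$.

```python
import numpy as np

# Compatibility for the pure Neumann problem (discrete sketch, illustrative):
# the Neumann stencil for -u'' on a grid has the constants in its kernel, so
# L u = f is solvable iff <f, 1> = 0, and the solution is unique only after
# normalizing the mean of u.
n = 40
h = 1.0 / (n - 1)
L = np.zeros((n, n))
L[0, 0], L[0, 1] = 1, -1                   # one-sided rows encode u' = 0 at ends
L[-1, -1], L[-1, -2] = 1, -1
for i in range(1, n - 1):
    L[i, i - 1], L[i, i], L[i, i + 1] = -1, 2, -1
L /= h**2

ones = np.ones(n)
assert np.allclose(L @ ones, 0)            # u + k is a solution whenever u is

x = np.linspace(0, 1, n)
f = np.cos(np.pi * x)
f -= f.mean()                              # enforce <f, 1> = 0 exactly on the grid
u, *_ = np.linalg.lstsq(L, f, rcond=None)
assert np.allclose(L @ u, f, atol=1e-8)    # compatible data: a solution exists
u0 = u - u.mean()                          # the representative with zero mean

g = ones.copy()                            # <g, 1> != 0: incompatible data
v, *_ = np.linalg.lstsq(L, g, rcond=None)
assert not np.allclose(L @ v, g, atol=1e-6)  # no solution exists
```

Since the matrix is symmetric, its range is the orthogonal complement of the constants, which is exactly the discrete form of the compatibility condition $\langle f, 1\rangle = 0$.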
Theorem 4.6.5 Let $a_{ij} \in L^\infty(\Omega)$ and $f \in L^2(\Omega)$ in problem (4.6.8) defined on a bounded connected open $\Omega$ in $\mathbb{R}^n$ for $n \ge 2$. If

$$\bar{u}_\Omega = 0$$

and

$$\langle f, 1\rangle_{L^2(\Omega)} = 0,$$

then there exists a unique weak solution $u \in H^1(\Omega)$ of problem (4.6.8). Moreover, for some $C > 0$ we have

$$\|u\|_{\tilde{H}^1(\Omega)} \le C\,\|f\|_{L^2(\Omega)}. \qquad (4.6.9)$$

Proof Consider the quotient Sobolev space $\tilde{H}^1(\Omega)$ with the Poincaré norm

$$\|\tilde{u}\|_{\tilde{H}^1(\Omega)} = \|Du\|_{L^2},$$

which is a Hilbert space by Proposition 4.3.5. The associated elliptic bilinear map satisfies

$$|B[\tilde{u}, \tilde{v}]| \le \sum_{i,j=1}^n \|a_{ij}\|_{L^\infty(\Omega)} \int_\Omega |Du|\,|Dv|\,dx \le \alpha\,\|\tilde{u}\|_{\tilde{H}^1(\Omega)}\,\|\tilde{v}\|_{\tilde{H}^1(\Omega)},$$

where

$$\alpha = \sum_{i,j=1}^n \|a_{ij}\|_{L^\infty(\Omega)},$$

and so $B$ is bounded on $\tilde{H}^1(\Omega) \times \tilde{H}^1(\Omega)$. Moreover, letting $\xi = Du$ in the uniform ellipticity condition,

$$B[\tilde{u}, \tilde{u}] \ge \lambda_0 \int_\Omega |Du|^2\,dx = \lambda_0\,\|u\|^2_{\tilde{H}^1(\Omega)}, \qquad (4.6.10)$$

so $B$ is coercive. Lastly, we show that $f \in (\tilde{H}^1(\Omega))^*$. We have

$$|f(\tilde{v})| = |\langle f, v\rangle| = |\langle f, v - \bar{v}\rangle| \le \|f\|_{L^2}\,\|v - \bar{v}\|_{L^2}.$$

Using the Poincaré–Wirtinger inequality, this gives

$$|f(\tilde{v})| \le C\,\|f\|_{L^2}\,\|Dv\|_{L^2} = C\,\|f\|_{L^2(\Omega)}\,\|\tilde{v}\|_{\tilde{H}^1(\Omega)},$$

thus $f$ is a bounded linear functional on $\tilde{H}^1(\Omega)$. Therefore, applying the Lax–Milgram theorem, we obtain a unique $\tilde{u} \in \tilde{H}^1(\Omega)$ such that

$$B[\tilde{u}, \tilde{v}] = \langle f, v\rangle$$

for every $\tilde{v} \in \tilde{H}^1(\Omega)$. From (4.6.10), this gives

$$\lambda_0\,\|\tilde{u}\|^2_{\tilde{H}^1(\Omega)} \le B[\tilde{u}, \tilde{u}] = \langle f, \tilde{u}\rangle \le \|f\|_{L^2(\Omega)}\,\|\tilde{u}\|_{\tilde{H}^1(\Omega)},$$

and dividing by $\lambda_0\,\|\tilde{u}\|_{\tilde{H}^1(\Omega)}$ we arrive at estimate (4.6.9). □

4.7 Spectral Properties of Elliptic Operators

4.7.1 Resolvent of Elliptic Operators

We return to problem (4.6.4), which states that the equation $Lu + \mu u = g$ with $u = 0$ on $\partial\Omega$ has a unique weak solution in $H_0^1(\Omega)$ for all $\mu \ge \gamma$. We also concluded that the operator $L_\mu$ in (4.6.5) is invertible for $\mu \ge \gamma$, and

$$L_\mu^{-1} : H^{-1}(\Omega) \longrightarrow H_0^1(\Omega).$$

The objective of this section is to investigate the spectral properties of elliptic operators. Our ultimate goal is to show that $L_\mu$ is a Fredholm operator. Consider the problem

$$Lu = f \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega, \qquad (4.7.1)$$

for some uniformly elliptic operator

$$L = -\sum_{i,j} \partial_i\big(a_{ij}\,\partial_j\big) + c(x) + \mu$$

defined on an open bounded set $\Omega \subset \mathbb{R}^n$. It was proved that for every $f \in L^2(\Omega)$, there exists a unique weak solution $u \in H_0^1(\Omega)$ such that $B[u, v] = \langle f, v\rangle$ for every $v \in H_0^1(\Omega)$. Adding the term $\mu u$ to both sides of the equation gives

$$Lu + \mu u = f + \mu u.$$

Writing $g = f + \mu u$ gives the same equation as in problem (4.6.4). Hence, we denote

$$L_\mu = L + \mu I : H_0^1(\Omega) \longrightarrow H^{-1}(\Omega),$$

and the associated bilinear map is

$$B_\mu[u, v] = B[u, v] + \mu\,(u, v) = \langle g, v\rangle. \qquad (4.7.2)$$

Then

$$u = L_\mu^{-1}(g) = L_\mu^{-1}(f) + \mu L_\mu^{-1}(u). \qquad (4.7.3)$$

Let us denote

$$\mu L_\mu^{-1} = K,$$

which is the resolvent of $L$, and

$$L_\mu^{-1}(f) = h,$$

provided that we have $\mu \ge \gamma$. Then (4.7.3) can be written as

$$(I - K)u = h, \qquad (4.7.4)$$

with

$$K = \mu L_\mu^{-1} : H^{-1}(\Omega) \longrightarrow H_0^1(\Omega). \qquad (4.7.5)$$

The following theorem gives the first result of this section, which implies that a uniformly elliptic operator has a compact resolvent.

Theorem 4.7.1 The operator

$$K = \mu L_\mu^{-1} : L^2(\Omega) \longrightarrow L^2(\Omega)$$

defined above is compact, i.e., $I - \lambda K$ is a Fredholm operator for any $\lambda \ne 0$.

Proof If we restrict $K$ in (4.7.4) to $L^2(\Omega)$, still calling it $K$, then

$$K|_{L^2} = K : L^2(\Omega) \longrightarrow H_0^1(\Omega).$$

We prove $K$ is bounded. Indeed, $B_\mu$ is clearly an elliptic bilinear map, so it is bounded by elliptic estimate (1), and using Gårding's inequality we have

$$B_\mu[u, u] \ge \beta\,\|u\|^2_{H_0^1(\Omega)} + (\mu - \gamma)\,\|u\|^2_{L^2(\Omega)} \ge \beta\,\|u\|^2_{H_0^1(\Omega)}. \qquad (4.7.6)$$

On the other hand, from (4.7.2) we have

$$\beta\,\|u\|^2_{H_0^1(\Omega)} \le B_\mu[u, u] = \langle g, u\rangle \le \|g\|_{L^2(\Omega)}\,\|u\|_{L^2(\Omega)} \le \|g\|_{L^2(\Omega)}\,\|u\|_{H_0^1(\Omega)}. \qquad (4.7.7)$$

Then (4.7.6) and (4.7.7) give

$$\|u\|_{H_0^1(\Omega)} \le \frac{1}{\beta}\,\|g\|_{L^2(\Omega)}.$$

From (4.7.3) and the definition of $K$, this implies

$$\|K(g)\|_{H_0^1(\Omega)} = \mu\,\big\|L_\mu^{-1}(g)\big\|_{H_0^1(\Omega)} \le \frac{\mu}{\beta}\,\|g\|_{L^2(\Omega)},$$

and hence $K$ is a bounded linear operator which maps bounded sequences to bounded sequences in $H_0^1(\Omega)$, which, in turn, is compactly embedded in $L^2(\Omega)$ (by the Rellich–Kondrachov theorem for $n > 2$ and Theorem 3.10.7 for $n = 1, 2$); therefore $K = \iota \circ K$, with $\iota : H_0^1(\Omega) \hookrightarrow L^2(\Omega)$ the compact embedding, is compact. □

4.7.2 Fredholm Alternative for Elliptic Operators

Since we concluded that $K$ is compact, we can obtain a Fredholm alternative theorem for elliptic operators.

Theorem 4.7.2 (Fredholm Alternative for Elliptic Operators) Let $L$ be a uniformly elliptic operator defined on an open bounded set $\Omega$ of $\mathbb{R}^n$. Then:
Either
1. For every $f \in L^2(\Omega)$ there exists a unique weak solution $u \in H_0^1(\Omega)$ of the problem

$$Lu = f \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega,$$

or
2. There exists a nonzero weak solution $u \in H_0^1(\Omega)$ of the equation $Lu = 0$.

Proof We start from the fact that the operator $I - K$ is Fredholm by Theorem 4.7.1. So either
(i) for every $h \in L^2(\Omega)$ the equation

$$(I - K)u = h \qquad (4.7.8)$$

has a unique weak solution $u \in H_0^1(\Omega)$, or
(ii) the equation $(I - K)u = 0$ has a nontrivial weak solution $u \in H_0^1(\Omega)$.
Suppose statement (i) holds. Substituting $\mu L_\mu^{-1} = K$ and $L_\mu^{-1}(f) = h$ in (4.7.8) gives

$$(I - \mu L_\mu^{-1})u = L_\mu^{-1}(f) = h. \qquad (4.7.9)$$

Apply $L_\mu$ to both sides of (4.7.9) and rearrange terms to finally obtain statement (1) of the theorem. Suppose statement (ii) above holds. Again, letting $K = \mu L_\mu^{-1}$,

$$(I - \mu L_\mu^{-1})u = 0,$$

which, after applying $L_\mu$ to both sides, implies

$$\frac{1}{\mu}\,L_\mu(u) = u.$$

Multiplying both sides by $\mu$ gives statement (2) of the theorem. This completes the proof. □

4.7.3 Spectral Theorem for Elliptic Operators

An immediate conclusion is

Corollary 4.7.3 Let $K$ be the resolvent of a uniformly elliptic operator of the form

$$L = -\partial_{x_j}\big(a_{ij}\,\partial_{x_i}\big) + b_i\,\partial_{x_i} + c$$

for some $a_{ij} \in C^1(\Omega)$, constants $b_i \in \mathbb{R}$, and $\Omega$ open and bounded in $\mathbb{R}^n$. Then the eigenfunctions of $K$ form a countable orthonormal basis for the space $L^2(\Omega)$, and their corresponding eigenvalues behave as

$$\lambda_1 > \lambda_2 > \lambda_3 > \cdots,$$

with $\lambda_n \longrightarrow 0$.

Proof This is a consequence of the Hilbert–Schmidt theorem and the spectral theorem for self-adjoint compact operators. □

The above corollary provides us with a justification of the Fredholm alternative for elliptic operators. Indeed, if $Lu = 0$ has only the trivial solution, then $L$ doesn't have $0$ as an eigenvalue, so by Proposition 1.6.11 this implies that the orthonormal basis is finite; but such a set cannot span all of the Hilbert space $L^2$, which is infinite-dimensional, i.e., it cannot be surjective, so we cannot find a solution of the equation for every $f$.

4.8 Self-adjoint Elliptic Operators

4.8.1 The Adjoint of the Elliptic Bilinear Map

The adjoint form of $B$ is denoted by $B^*$ and is defined as

$$B^*[u, v] = B[v, u],$$

and the adjoint problem is defined as finding the weak solution $v \in H_0^1(\Omega)$ of the adjoint equation

$$L^*v = f \ \text{ in } \Omega, \qquad v = 0 \ \text{ on } \partial\Omega,$$

such that

$$B^*[v, u] = \langle f, u\rangle$$

for all $u \in H_0^1(\Omega)$. Moreover,

$$B_\mu^*[v, u] = B^*[v, u] + \mu\,\langle v, u\rangle = \langle g, u\rangle.$$

We will investigate the eigenvalue problem of $K = \mu L_\mu^{-1}$ rather than $L$ in order to make use of the spectral properties of compact operators studied in Chap. 1. Consider the elliptic operator

$$L = -\partial_{x_j}\big(a_{ij}\,\partial_{x_i}\big) + b_i\,\partial_{x_i} + c.$$

To make $L$ self-adjoint, it is required that $B^*[u, v] = B[u, v]$, so that $L = L^*$ and

$$\langle Lu, v\rangle = \langle u, Lv\rangle.$$

To achieve this equality, we integrate $(Lu)v$ by parts (ignoring the summations for simplicity). This gives

$$\begin{aligned}
\int_\Omega (Lu)v\,dx &= \int_\Omega \big[-(a_{ij}u_{x_i})_{x_j} + b_i u_{x_i} + cu\big]v\,dx \\
&= \int_\Omega a_{ij}u_{x_i}v_{x_j} + \big[(b_i u)_{x_i} - (b_i)_{x_i}u\big]v + cuv\,dx \\
&= \int_\Omega \big[-(a_{ij}v_{x_i})_{x_j} + b_i v_{x_i} + (c - (b_i)_{x_i})v\big]u\,dx \\
&= \int_\Omega (L^*v)u\,dx.
\end{aligned}$$

Letting $b_i$ be constant gives $(b_i)_{x_i} = 0$, and so

$$\int_\Omega (Lu)v\,dx = \int_\Omega (L^*v)u\,dx = \int_\Omega (Lv)u\,dx.$$


  

Thus:
Theorem 4.8.1 Let $a_{ij} \in C^1(\Omega)$, and let the $b_i$ be constants. Then the elliptic operator

$$L = -\partial_{x_j}\big(a_{ij}\,\partial_{x_i}\big) + b_i\,\partial_{x_i} + c$$

is self-adjoint; consequently its resolvent $K = \mu L_\mu^{-1}$ is self-adjoint.

Proof The argument above implies $L = L^*$, and since $I$ is also self-adjoint, so is $L + \mu I$; hence

$$K^* = \mu\big((L + \mu I)^{-1}\big)^* = \mu\,(L^* + \mu I)^{-1} = \mu\,(L + \mu I)^{-1} = K. \ \square$$

4.8.2 Eigenvalue Problem of Elliptic Operators

Recall that Theorem 4.6.3 asserts that the problem

$$Lu + \mu u = f \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega, \qquad (4.8.1)$$

has a unique weak solution whenever $\mu \ge \gamma$ (where $\gamma$ is the constant of Gårding's elliptic estimate), where

$$L = -\sum_{i,j=1}^n \partial_{x_i}\big(a_{ij}(x)\,\partial_{x_j}\big) + c(x) \qquad (4.8.2)$$

is a uniformly elliptic and self-adjoint operator, $a_{ij}, c \in L^\infty(\Omega)$, and $f \in L^2(\Omega)$, for some open $\Omega$ that is bounded in at least one direction in $\mathbb{R}^n$. If $\mu = 0$ and $c \ge 0$, then Theorem 4.6.2 asserts that the problem has a unique weak solution. If we write $\mu = -\lambda$, then the equation $Lu + \mu u = 0$ becomes $Lu = \lambda u$ and we have an eigenvalue problem. We will discuss two cases. If $\lambda \le -\gamma$, then $\mu \ge \gamma$ and the solution of (4.8.1) exists and is unique. If $\lambda > -\gamma$, then we may not have a nontrivial solution of the equation $(L - \lambda)u = 0$, but rather the following Fredholm alternative: either $Lu - \lambda u = f$ has a unique weak solution for every $f \in L^2(\Omega)$, or the equation $Lu - \lambda u = 0$ has a nontrivial solution, which implies that $\lambda$ is an eigenvalue of $L$, and problem (4.8.1) turns into the eigenvalue problem

$$Lu = \lambda u \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega.$$

4.8.3 Spectral Theorem of Elliptic Operators

The following theorem provides the main property of the spectrum of $L$.

Theorem 4.8.2 (Spectral Theorem of Elliptic Operators) Consider the uniformly elliptic operator $L$ in (4.8.2) defined on an open bounded set $\Omega$ in $\mathbb{R}^n$. Then the eigenfunctions of $L$ form a countable orthonormal basis for the space $L^2(\Omega)$, and their corresponding eigenvalues increase:

$$0 < \lambda_1 \le \lambda_2 \le \lambda_3 \le \cdots,$$

with $\lambda_n \longrightarrow \infty$.

Proof Consider the case when $Lu - \lambda u = 0$ has a nontrivial solution, which identifies $\lambda$ as an eigenvalue of $L$. Add the term $\gamma u$ to both sides of the equation, where $\lambda > -\gamma$. This gives

$$Lu + \gamma u - \lambda u = \gamma u,$$

or

$$L_\gamma u = (\gamma + \lambda)u. \qquad (4.8.3)$$

Apply the resolvent $L_\gamma^{-1}$ to both sides of (4.8.3):

$$u = (\gamma + \lambda)\,L_\gamma^{-1}u.$$

Substituting $K = \gamma L_\gamma^{-1}$,

$$Ku = \frac{\gamma}{\gamma + \lambda}\,u.$$

From Corollary 4.7.3, the eigenvalues of $K$ are countable and decreasing, so let them be

$$\nu_n = \frac{\gamma}{\gamma + \lambda_n};$$

then the $\lambda_n$ are the eigenvalues of $L$, and they increase; since $\nu_n \longrightarrow 0$ we must have $\lambda_n \longrightarrow \infty$. □
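The conclusion of the theorem can be observed numerically for $L = -d^2/dx^2$ on $(0,1)$ with Dirichlet conditions (an illustrative sketch; the discretization is our own choice): the eigenvalues of the discretized $L$ are positive and increase, while those of the resolvent-type operator $K = \gamma L_\gamma^{-1}$ decrease to $0$.

```python
import numpy as np

# Eigenvalue picture of Theorem 4.8.2 for the model operator -u'' (sketch):
# the 3-point Dirichlet discretization has positive, increasing eigenvalues,
# and the eigenvalues nu_n = gamma / (gamma + lambda_n) of K decrease to 0.
n = 200
h = 1.0 / (n + 1)
L = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

lam = np.sort(np.linalg.eigvalsh(L))
assert lam[0] > 0                                # 0 < lambda_1
assert np.all(np.diff(lam) > 0)                  # lambda_1 < lambda_2 < ...
assert np.isclose(lam[0], np.pi**2, rtol=1e-3)   # lambda_1 ~ pi^2 for -u''

gamma = 1.0
nu = gamma / (gamma + lam)                       # eigenvalues of K = gamma * L_gamma^{-1}
assert np.all(np.diff(nu) < 0)                   # nu_1 > nu_2 > ... -> 0
assert nu[-1] < 1e-4
```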

4.9 Regularity for the Poisson Equation

4.9.1 Weyl’s Lemma

Investigating the regularity of weak solutions of elliptic PDEs has been a major research direction since the 1940s. We begin with one of the earliest and most basic regularity results. The result, surprisingly, asserts that a weak solution of the Laplace equation is, in fact, a classical solution.

Theorem 4.9.1 (Weyl's Lemma) If $u \in H^1(\Omega)$ is such that

$$\int_\Omega Du \cdot Dv\,dx = 0$$

for every $v \in C_c^\infty(\Omega)$, then $u \in C^\infty(\Omega)$ and $\nabla^2 u = 0$ in $\Omega$.

Proof Consider the mollification $u_\epsilon \in C^\infty(\Omega_\epsilon)$. Performing integration by parts yields

$$\nabla^2 u_\epsilon = \int_\Omega \nabla^2\varphi_\epsilon(x - y)\,u(y)\,dy = -\int_\Omega \nabla\varphi_\epsilon(x - y) \cdot \nabla u(y)\,dy = 0.$$

Hence $u_\epsilon$ is harmonic on $\Omega_\epsilon$ and

$$\nabla^2 u_\epsilon = 0.$$

Letting $K \subset \Omega$ be a compact set, it can easily be shown that the sequence $u_\epsilon$ is uniformly bounded and equicontinuous on $K$; hence there exists $v \in C^\infty(K)$ such that $u_\epsilon \longrightarrow v$ uniformly on $K$, so $\nabla^2 v = 0$ in $K$, and since $u_\epsilon \longrightarrow u$ in $L^2(\Omega)$ we conclude that $u = v$. □

Corollary 4.9.2 If $u \in H^1(\Omega)$ is a weak solution of the Laplace equation, then $u \in C^\infty(\Omega)$. In other words, weak solutions of the Laplace equation are classical solutions.

The significance of the result is that it shows that the weak solution is actually smooth and gives a classical solution. This demonstrates interior regularity.

4.9.2 Difference Quotients

Now we turn our discussion to the Poisson equation. The treatment for this equation is standard and can be used to establish regularity results for other general elliptic equations. The main tool of this topic is the difference quotient. In calculus, the difference quotient of a function $u \in L^p(\Omega)$ is given by the formula

$$D_k^h u(x) = \frac{u(x + he_k) - u(x)}{h},$$

and this ratio will lead to the derivative of $u$ as $h \longrightarrow 0$. We always need to ensure that $x + he_k$ is inside $\Omega$, and this can be achieved by defining

$$\Omega_h = \{x \in \Omega : d(x, \partial\Omega) > h > 0\},$$

which clearly implies that $\Omega_h \to \Omega$ as $h \to 0$. This is similar to the setting adopted for the mollifiers.

Definition 4.9.3 (Difference Quotient) Let $\Omega \subseteq \mathbb{R}^n$ be an open set, and let $\{e_1, \ldots, e_n\}$ be the standard basis of $\mathbb{R}^n$. Let $u$ be defined on $\Omega$. Then the difference quotient of $u$ in the direction of $e_k$, denoted by $D_k^h u$, is defined on $\Omega_h$ by the ratio

$$D_k^h u(x) = \frac{u(x + he_k) - u(x)}{h}.$$

If we choose $\Omega' \subset\subset \Omega$ such that $\Omega' \subseteq \Omega_h$, we ensure the difference quotient is well-defined, and this setting will be helpful in later results. Since difference quotients are meant to be a pre-stage for derivatives, our guess is that they obey the same basic rules. The following proposition confirms that our guess is correct.
Proposition 4.9.4 Let $u, v \in W^{1,p}(\Omega)$, $1 \le p < \infty$. Let $\Omega' \subset\subset \Omega$ be such that $\Omega' \subseteq \Omega_h$, and suppose that $\mathrm{supp}(v) \subseteq \Omega'$. Then:
(1) Higher Derivative:

$$D(D_k^h u) = D_k^h(Du)$$

for all $x \in \Omega'$.
(2) Sum Rule:

$$D_k^h(u + v)(x) = D_k^h u(x) + D_k^h v(x)$$

for all $x \in \Omega'$.
(3) Product Rule:

$$D_k^h(uv)(x) = u(x)\,D_k^h v(x) + v(x + he_k)\,D_k^h u(x)$$

for all $x \in \Omega'$.
(4) Integration by Parts:

$$\int_\Omega u(x)\,D_k^h v(x)\,dx = -\int_\Omega v(x)\,D_k^{-h}u(x)\,dx.$$

Proof The first three statements can be proved by arguments similar to those for classical derivatives in ordinary calculus, and they are thus left to the reader. For (4), note that by using the substitution $y = x + he_k$,

$$\int_\Omega \frac{u(x)v(x + he_k)}{h}\,dx = \int_\Omega \frac{u(y - he_k)v(y)}{h}\,dy = \int_\Omega \frac{u(x - he_k)v(x)}{h}\,dx.$$

Therefore we have

$$\begin{aligned}
\int_\Omega u(x)\,D_k^h v(x)\,dx &= \int_\Omega \frac{u(x)v(x + he_k) - u(x)v(x)}{h}\,dx \\
&= \int_\Omega \frac{u(x - he_k)v(x) - u(x)v(x)}{h}\,dx \\
&= -\int_\Omega \frac{u(x - he_k)v(x) - u(x)v(x)}{-h}\,dx \\
&= -\int_\Omega v(x)\,\frac{u(x - he_k) - u(x)}{-h}\,dx \\
&= -\int_\Omega v(x)\,D_k^{-h}u(x)\,dx. \ \square
\end{aligned}$$
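Statement (4) is easy to verify numerically on a grid (an illustrative sketch; the functions `u`, `v`, and the helper `Dq` are our own hypothetical choices): with a compactly supported $v$, the index shifts telescope with no boundary terms.

```python
import numpy as np

# Numerical check of Proposition 4.9.4(4) (illustrative sketch): on a uniform
# grid, sum_x u(x) D^h v(x) = - sum_x v(x) D^{-h} u(x), because the shifted
# sums rearrange exactly; the compact support of v kills boundary/wrap terms.
h = 0.01
x = np.arange(-2, 2, h)
u = np.sin(x)
# smooth bump supported in (-1, 1); clip avoids division issues outside
v = np.exp(-1.0 / np.clip(1 - x**2, 1e-12, None)) * (np.abs(x) < 1)

def Dq(w, h, sign=+1):
    # one-dimensional difference quotient D^{sign*h} w via an index shift of 1
    return (np.roll(w, -sign) - w) / (sign * h)

lhs = np.sum(u * Dq(v, h, +1)) * h
rhs = -np.sum(v * Dq(u, h, -1)) * h
assert abs(lhs - rhs) < 1e-10
```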


The next theorem investigates the relation between the difference quotients of a Sobolev function in $W^{1,p}(\Omega)$ and its weak derivatives. Of course, if $u \in W^{1,p}(\Omega)$, then $Du \in L^p(\Omega)$, so

$$\|D_k u\|_{L^p(\Omega)} < \infty.$$

How do the two norms of $D_k u$ and $D_k^h u$ compare? In view of the next theorem, we see that the difference quotients of functions in $W^{1,p}(\Omega)$ are bounded above by their partial derivatives. On the other hand, if $u \in L^p(\Omega)$ and its difference quotients $D_k^h u$ are uniformly bounded above independently of $h$, then its weak derivative exists and is bounded by the same bound; that is, $u \in W^{1,p}(\Omega)$.

Theorem 4.9.5 Let $\Omega \subseteq \mathbb{R}^n$ and suppose $\Omega' \subset\subset \Omega$ and $\Omega' \subseteq \Omega_h$.
(1) If $u \in W^{1,p}(\Omega)$, $1 \le p$, then $D_k^h u \in L^p(\Omega')$ and

$$\big\|D_k^h u\big\|_{L^p(\Omega')} \le \|D_k u\|_{L^p(\Omega)}.$$

(2) If $u \in L^p(\Omega)$, $1 < p$, and there exists $M > 0$ such that for any $\Omega'$ and $h$ as above we have $D_k^h u \in L^p(\Omega')$ and

$$\big\|D_k^h u\big\|_{L^p(\Omega')} \le M,$$

then $D_k u \in L^p(\Omega)$ and

$$\|D_k u\|_{L^p(\Omega)} \le M.$$

Proof (1): By a density argument, it suffices to prove the result for

$$u \in W^{1,p}(\Omega) \cap C^1(\Omega).$$

This enables us to use the fundamental theorem of calculus. We write

$$D_k^h u(x) = \frac{u(x + he_k) - u(x)}{h} = \frac{1}{h}\int_0^h D_k u(x + te_k)\,dt.$$

Now, using Hölder's inequality and integrating over $\Omega'$,

$$\begin{aligned}
\int_{\Omega'} \big|D_k^h u(x)\big|^p\,dx &\le \frac{1}{h}\int_{\Omega'}\int_0^h |D_k u(x + te_k)|^p\,dt\,dx \\
&= \frac{1}{h}\int_0^h\int_{\Omega'} |D_k u(x + te_k)|^p\,dx\,dt \quad \text{(by Fubini's theorem)} \\
&\le \frac{1}{h}\int_0^h\int_\Omega |D_k u(x)|^p\,dx\,dt \\
&= \int_\Omega |D_k u(x)|^p\,dx.
\end{aligned}$$

(2): A well-known result in functional analysis states that every bounded sequence in a reflexive space has a weakly convergent subsequence (see Theorem 5.2.6 in the next chapter). So, letting $h = h_n$ with $h_n \longrightarrow 0$ as $n \longrightarrow \infty$, there exists a weakly convergent subsequence, say $D_k^{h_n}u$, such that $D_k^{h_n}u \xrightarrow{w} v \in L^p(\Omega')$, and so for every $\varphi \in C_0^\infty(\Omega)$ we have, by the definition of weak convergence,

$$\lim \int_{\Omega'} \varphi\,D_k^{h_n}u\,dx = \int_{\Omega'} \varphi v\,dx. \qquad (4.9.1)$$

On the other hand, note that $D_k^{-h_n}\varphi$ converges to $D_k\varphi$ uniformly on $\Omega'$; hence by Proposition 4.9.4(4)

$$\lim \int_{\Omega'} \varphi\,D_k^{h_n}u\,dx = -\lim \int_{\Omega'} u\,D_k^{-h_n}\varphi\,dx = -\int_{\Omega'} u\,D_k\varphi\,dx.$$

Combining this result with (4.9.1) gives

$$\int_{\Omega'} \varphi v\,dx = -\int_{\Omega'} u\,D_k\varphi\,dx,$$

which implies that $v = D_k u \in L^p(\Omega')$, and since $D_k^{h_n}u \xrightarrow{w} v$, we obtain (see Proposition 5.3.3(3) in the next chapter)

$$\|D_k u\|_{L^p(\Omega')} \le \liminf \big\|D_k^{h_n}u\big\|_{L^p(\Omega')} \le M.$$

Note that this holds for all $\Omega' \subset\subset \Omega$ with $\Omega' \subseteq \Omega_h$. Define

$$\Omega_n = \Big\{x \in \Omega : d(x, \partial\Omega) > \frac{1}{n}\Big\}.$$

Then $|D_k u|\,\chi_{\Omega_n} \nearrow |D_k u|$, and using the Dominated Convergence Theorem we get

$$\|D_k u\|_{L^p(\Omega)} \le M. \ \square$$

The above results will help us establish regularity results. The regularity results involve tedious calculations, so we will start with the simplest setting, the Poisson equation in a Hilbert space; the results for general equations can be proved by a similar argument.
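Part (1) of Theorem 4.9.5 can be spot-checked numerically (an illustrative sketch with a function of our own choosing): since $D_k^h u$ averages $D_k u$ over a window, its $L^2$ norm on the smaller set cannot exceed that of $D_k u$ on the larger set.

```python
import numpy as np

# Spot-check of Theorem 4.9.5(1) in one dimension (illustrative sketch):
# ||D^h u||_{L^2(Omega')} <= ||u'||_{L^2(Omega)} for Omega = (0, 1) and
# Omega' = (0, 1 - h), since D^h u is the average of u' over [x, x + h].
dx = 0.001
h = 0.05
x = np.arange(0.0, 1.0, dx)
u = lambda t: np.sin(3 * t) + t**2
Du = 3 * np.cos(3 * x) + 2 * x                # exact derivative on Omega

inner = x < 1.0 - h                           # Omega', contained in Omega_h
Dqu = (u(x + h) - u(x)) / h                   # difference quotient D^h u

lhs = np.sqrt(np.sum(Dqu[inner] ** 2) * dx)   # ||D^h u||_{L^2(Omega')}
rhs = np.sqrt(np.sum(Du ** 2) * dx)           # ||u'||_{L^2(Omega)}
assert lhs <= rhs * 1.001                     # averaging cannot increase the norm
```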

4.9.3 Caccioppoli’s Inequality

The following estimate is helpful for establishing our first regularity theorem.

Lemma 4.9.6 (Caccioppoli's Inequality) Let $u \in H^1(\Omega)$ be a weak solution of the Poisson equation

$$-\nabla^2 u = f,$$

for some $f \in L^2(\Omega)$. Then for any $\Omega' \subset\subset \Omega$ with $\Omega' \subseteq \Omega_h$, we have

$$\|Du\|_{L^2(\Omega')} \le C\big[\|u\|_{L^2(\Omega)} + \|f\|_{L^2(\Omega)}\big].$$

Proof Since $u$ is a weak solution of the Poisson equation, it satisfies the weak formulation

$$\int_\Omega Du \cdot Dv\,dx = \int_\Omega fv\,dx \qquad (4.9.2)$$

for every $v \in H_0^1$. Define $v = \xi^2 u$, where $\xi \in C_c^\infty$ is a cut-off function having the following properties:
1. On $\Omega'$, we have $\xi = 1$.
2. On $\Omega \setminus \Omega'$, we have $0 \le \xi \le 1$, and $|\nabla\xi| \le M$ for some $M > 0$.
3. $\mathrm{supp}(\xi) \subseteq \Omega$.
Substituting it in (4.9.2) gives

$$\int_\Omega \xi^2|Du|^2 + 2\int_\Omega \xi u\,\nabla\xi \cdot Du = \int_\Omega \xi^2 f u\,dx.$$

It follows that

$$\int_\Omega \xi^2|Du|^2 \le 2\int_\Omega |\xi\nabla u|\,|u\nabla\xi|\,dx + \int_\Omega |\xi f|\,|\xi u|\,dx.$$

Use the Cauchy inequality (Lemma 4.4.5) on the first integral on the RHS with $\epsilon = \frac{1}{4}$, and Young's inequality on the second:

$$\int_\Omega \xi^2|Du|^2 \le \frac{1}{2}\int_\Omega \xi^2|Du|^2 + 2\int_\Omega u^2|\nabla\xi|^2 + \frac{1}{2}\int_\Omega \xi^2 f^2 + \frac{1}{2}\int_\Omega \xi^2 u^2.$$

Using the properties of $\xi$ given above, we obtain

$$\int_{\Omega'} |Du|^2 \le \int_\Omega \xi^2|Du|^2 \le C\big[\|u\|^2_{L^2(\Omega)} + \|f\|^2_{L^2(\Omega)}\big]. \ \square$$


 

4.9.4 Interior Regularity for Poisson Equation

Now we state our second regularity result.

Theorem 4.9.7 (Interior Regularity for Poisson Equation) Let $u \in H^1(\Omega)$ be a weak solution of the Poisson equation

$$-\nabla^2 u = f,$$

for some $f \in L^2(\Omega)$. Then for any $\Omega' \subset\subset \Omega$, we have $u \in H^2(\Omega')$ and

$$\|u\|_{H^2(\Omega')} \le C\big[\|u\|_{L^2(\Omega)} + \|f\|_{L^2(\Omega)}\big].$$

Proof We will use the same setting as in the preceding lemma. Let $\Omega' \subset\subset \Omega_h \subset \Omega$, and define a cut-off function $\xi \in C_c^\infty$ having the following properties:
1. In $\Omega'$, we have $\xi = 1$.
2. In $\Omega_h \setminus \Omega'$, we have $0 \le \xi \le 1$, and $|\nabla\xi| \le M$ for some $M > 0$.
3. $\mathrm{supp}(\xi) \subseteq \Omega_h$.
Since $u$ is a weak solution of the Poisson equation, it satisfies the weak formulation (4.9.2). Consider $v \in H_0^1(\Omega)$ with $\mathrm{supp}(v) \subseteq \Omega_h$, and substitute $u$ in (4.9.2) by

$$-D_k^h u.$$

This gives
4.9 Regularity for the Poisson Equation 281

$$\begin{aligned}
\int_{\Omega_h} D(-D_k^h u) \cdot Dv\,dx &= -\int_{\Omega_h} D_k^h(Du) \cdot Dv\,dx && \text{(Proposition 4.9.4(1))} \\
&= \int_{\Omega_h} Du \cdot D_k^{-h}(Dv)\,dx && \text{(Proposition 4.9.4(4))} \\
&= \int_{\Omega_h} Du \cdot D(D_k^{-h}v)\,dx && \text{(Proposition 4.9.4(1))} \\
&= \int_{\Omega_h} f\,D_k^{-h}v\,dx \\
&\le \left(\int_{\Omega_h} f^2\,dx\right)^{1/2}\left(\int_{\Omega_h} |D_k v|^2\,dx\right)^{1/2} && \text{(Theorem 4.9.5(1))}.
\end{aligned}$$

So we have

$$\int_{\Omega_h} D(-D_k^h u) \cdot Dv\,dx \le \|f\|_{L^2(\Omega)}\,\|Dv\|_{L^2(\Omega_h)}. \qquad (4.9.3)$$

Now, define

$$v = -\xi^2 D_k^h u$$

in (4.9.3). Note that, using Proposition 4.9.4(3), the expression $D(\xi^2 D_k^h u)$ can be written as

$$D(\xi^2 D_k^h u) = 2\xi\nabla\xi\,D_k^h u + \xi^2 D(D_k^h u). \qquad (4.9.4)$$

Substituting $v$ in (4.9.3), taking into account (4.9.4), gives

$$\int_{\Omega_h} D(-D_k^h u) \cdot D(-\xi^2 D_k^h u)\,dx = \int_{\Omega_h} D(D_k^h u) \cdot D(\xi^2 D_k^h u)\,dx = \int_{\Omega_h} \xi^2\big|D(D_k^h u)\big|^2\,dx + 2\int_{\Omega_h} \xi\,D_k^h u\,\nabla\xi \cdot D(D_k^h u)\,dx,$$

so

$$\big\|\xi D(D_k^h u)\big\|^2_{L^2(\Omega_h)} = \int_{\Omega_h} D(D_k^h u) \cdot D(\xi^2 D_k^h u)\,dx - 2\int_{\Omega_h} \xi\,D_k^h u\,\nabla\xi \cdot D(D_k^h u)\,dx. \qquad (4.9.5)$$

Given property 2 of $\xi$ above and using it in (4.9.4) gives

$$\big|D(\xi^2 D_k^h u)\big| \le 2M\,\big|D_k^h u\big| + \xi\,\big|D(D_k^h u)\big|. \qquad (4.9.6)$$

Using (4.9.6) in (4.9.5),



 
$$\begin{aligned}
\big\|\xi D(D_k^h u)\big\|^2_{L^2(\Omega_h)} &= \int_{\Omega_h} D(D_k^h u) \cdot D(\xi^2 D_k^h u)\,dx - 2\int_{\Omega_h} \xi\,\nabla\xi \cdot D_k^h u\,D(D_k^h u)\,dx \\
&\le \big\|D(\xi^2 D_k^h u)\big\|_{L^2(\Omega_h)}\,\|f\|_{L^2(\Omega)} + 2\int_{\Omega_h} \xi\,\big|D(D_k^h u)\big|\,|\nabla\xi|\,\big|D_k^h u\big|\,dx \\
&\le 2M\,\big\|D_k^h u\big\|_{L^2(\Omega_h)}\,\|f\|_{L^2(\Omega)} + \big\|\xi D(D_k^h u)\big\|_{L^2(\Omega_h)}\,\|f\|_{L^2(\Omega)} \\
&\quad + \int_{\Omega_h} \xi\,\big|D(D_k^h u)\big|\,2M\,\big|D_k^h u\big|\,dx \qquad \text{(by (4.9.6))}.
\end{aligned}$$

Again, invoking the Cauchy inequality on the RHS of the inequality above, with $\epsilon = \frac{1}{4}$ and the values of $s, t$ taken in the order in which they appear in the inequality, we have

$$\begin{aligned}
\big\|\xi D(D_k^h u)\big\|^2_{L^2(\Omega_h)} &\le M^2\,\big\|D_k^h u\big\|^2_{L^2(\Omega_h)} + \|f\|^2_{L^2(\Omega)} \\
&\quad + \frac{1}{4}\,\big\|\xi D(D_k^h u)\big\|^2_{L^2(\Omega_h)} + \|f\|^2_{L^2(\Omega)} \\
&\quad + \frac{1}{4}\,\big\|\xi D(D_k^h u)\big\|^2_{L^2(\Omega_h)} + 4M^2\,\big\|D_k^h u\big\|^2_{L^2(\Omega_h)} \\
&\le \frac{1}{2}\,\big\|\xi D(D_k^h u)\big\|^2_{L^2(\Omega_h)} + 2\,\|f\|^2_{L^2(\Omega)} + 5M^2\,\|Du\|^2_{L^2(\Omega_h)} \quad \text{(Theorem 4.9.5(1))},
\end{aligned}$$

which implies

$$\big\|\xi D(D_k^h u)\big\|^2_{L^2(\Omega_h)} \le 4\,\|f\|^2_{L^2(\Omega)} + 10M^2\,\|Du\|^2_{L^2(\Omega_h)} \le C\Big[\|f\|^2_{L^2(\Omega)} + \|Du\|^2_{L^2(\Omega_h)}\Big] \le C\Big(\|f\|_{L^2(\Omega)} + \|Du\|_{L^2(\Omega_h)}\Big)^2.$$

Note that $\xi = 1$ on $\Omega'$, so Proposition 4.9.4(1) and Theorem 4.9.5(2) yield

$$\big\|D^2 u\big\|^2_{L^2(\Omega')} \le \big\|\xi D(D_k^h u)\big\|^2_{L^2(\Omega_h)}.$$

Substituting the above gives

$$\big\|D^2 u\big\|_{L^2(\Omega')} \le C\big[\|f\|_{L^2(\Omega)} + \|Du\|_{L^2(\Omega_h)}\big]. \qquad (4.9.7)$$

The combination of (4.9.7) and Caccioppoli's inequality (Lemma 4.9.6, applied with $\Omega_h$ in place of $\Omega'$) yields the result. □

The above theorem asserts that a weak solution of the Poisson equation is in fact a strong solution that belongs to $H^2$, so its second weak derivative exists in $L^2$. Consequently, we can safely perform integration by parts in (4.9.2) to obtain the Poisson equation for almost all $x \in \Omega$, i.e., except for a set of measure $0$.

In standard calculus, it is well known that $u \in C^2$ if $\nabla^2 u \in C$. This observation may lead someone to believe that we can do the same in the case above and conclude that $u \in H^2$ because

$$\nabla^2 u = f \in L^2,$$

which is incorrect. It should also be noted here that having $\nabla^2 u \in L^2$ doesn't necessarily mean that the weak derivatives $D_i^2 u$ exist and belong to $L^2$, because, as discussed earlier in Section 3.1, the existence of pointwise derivatives doesn't always imply the existence of weak derivatives, so our case should be handled with extra care. For example, consider the equation $u' = 0$ on some interval in $\mathbb{R}$. If $u$ is a strong solution of the equation, it is also a weak solution since it satisfies the weak formulation; but we cannot conclude that $u \in W^{1,p}$, because $u$ might be a step function, and step functions of the form (3.1.1) are not weakly differentiable.
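The step-function caveat can be checked numerically (an illustrative sketch of our own): pairing $u = \operatorname{sign}(x)$ against $\varphi'$ produces $2\varphi(0)$, the action of a Dirac mass, which no integrable function, in particular not the a.e. pointwise derivative $0$, can represent.

```python
import numpy as np

# Numerical sketch of the step-function caveat (illustrative): for
# u = sign(x) on (-1, 1), the pairing -integral(u * phi') equals 2*phi(0)
# for test functions phi, so the a.e. derivative 0 is not a weak derivative.
dx = 1e-4
x = np.arange(-1.0, 1.0, dx)
u = np.sign(x)
phi = np.where(np.abs(x) < 0.9, np.cos(np.pi * x / 1.8) ** 2, 0.0)  # C^1 test function
dphi = np.gradient(phi, dx)

pairing = -np.sum(u * dphi) * dx              # candidate value of <u', phi>
phi_at_0 = phi[np.argmin(np.abs(x))]
assert abs(pairing - 2.0 * phi_at_0) < 1e-2   # matches 2*phi(0), not 0
```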

4.10 Regularity for General Elliptic Equations

4.10.1 Interior Regularity

Now we take the result one step further: we will prove it for the general elliptic equation (4.2.4), namely

$$-\sum_{i,j=1}^n \frac{\partial}{\partial x_i}\big(a_{ij}(x)D_j u\big) + \sum_{i=1}^n b_i(x)D_i u(x) + c(x)u(x) = f. \qquad (4.10.1)$$

The weak formulation is given by

$$\int_\Omega \Bigg(\sum_{i,j=1}^n a_{ij}\,D_i u\,D_j v + \sum_{i=1}^n b_i\,D_i u\,v + cuv\Bigg)dx = \langle f, v\rangle_{L^2(\Omega)}. \qquad (4.10.2)$$

The argument of the proof is similar to that of the preceding theorem, so we will give a sketch of the proof, leaving the details to the reader. Note also that the Caccioppoli inequality can be proved for the operator in (4.10.1) by a similar argument.

Theorem 4.10.1 (Interior Regularity Theorem) Consider the elliptic equation $Lu = f$, where $L$ is a uniformly elliptic operator given by (4.10.1) with $a_{ij} \in C^1(\Omega)$, $b_i, c \in L^\infty(\Omega)$, and $f \in L^2(\Omega)$ for some bounded open $\Omega$ in $\mathbb{R}^n$. If $u \in H^1(\Omega)$ is a weak solution of the equation $Lu = f$, then $u \in H^2_{Loc}(\Omega)$. Furthermore, for any $\Omega' \subset\subset \Omega$ and some constant $C > 0$, we have

$$\|u\|_{H^2(\Omega')} \le C\big[\|u\|_{H^1(\Omega)} + \|f\|_{L^2(\Omega)}\big].$$


284 4 Elliptic Theory

Proof We will use the same settings as before. Namely, let $\Omega' \subset\subset \Omega_h \subset \Omega$, and define a cut-off function $\xi \in C_c^\infty$ having the following properties: in $\Omega'$, we have $\xi = 1$; in $\Omega_h \setminus \Omega'$, we have $0 \le \xi \le 1$ and $|\nabla\xi| \le M$ for some $M > 0$; and $\mathrm{supp}(\xi) \subseteq \Omega_h$. Since $u$ is a weak solution of (4.10.1), it satisfies (4.10.2), so

$$\int_\Omega \sum_{i,j=1}^n a_{ij}(x)\,D_i u\,D_j v\,dx = \int_\Omega \hat{f}v\,dx, \qquad (4.10.3)$$

for every $v \in H_0^1(\Omega)$, where

$$\hat{f}(x) = f - \sum_{i=1}^n b_i(x)\,D_i u - c(x)u(x).$$

Choose

$$v(x) = -D_k^{-h}\big(\xi^2 D_k^h u\big),$$

and substitute it in (4.10.3). This gives

$$\int_\Omega \sum_{i,j=1}^n a_{ij}(x)\,D_i u\,D_j\big({-D_k^{-h}(\xi^2 D_k^h u)}\big)\,dx = -\int_\Omega \hat{f}\,D_k^{-h}\big(\xi^2 D_k^h u\big)\,dx. \qquad (4.10.4)$$

Employing all the previously established results used in the preceding theorem, the LHS of the equation can be written as

$$\int_\Omega \sum_{i,j=1}^n a_{ij}(x)\,D_i u\,D_j\big({-D_k^{-h}(\xi^2 D_k^h u)}\big)\,dx = A + B,$$

where

$$A = \int_\Omega \sum_{i,j=1}^n a_{ij}\,D_k^h D_j u\,\big(\xi^2 D_k^h D_i u\big)\,dx,$$

$$B = \int_\Omega \sum_{i,j=1}^n \Big[a_{ij}\,D_k^h D_j u\,\big(2\xi\nabla\xi\,D_k^h u\big) + D_k^h a_{ij}\,D_j u\,\big(2\xi\nabla\xi\,D_k^h u\big) + D_k^h a_{ij}\,D_j u\,\big(\xi^2 D_k^h D_i u\big)\Big]\,dx.$$

Since $L$ is uniformly elliptic, the integral $A$ can be estimated as

$$A \ge \lambda \int_{\Omega_h} \xi^2\,\big|D_k^h Du\big|^2\,dx,$$

and for the integral $B$, we can use Cauchy's inequality with $\epsilon = \frac{\lambda}{2}$, then Theorem 4.9.5(1). We obtain

$$|B| \le \frac{\lambda}{2}\int_{\Omega_h} \xi^2\,\big|D_k^h Du\big|^2\,dx + C\int_\Omega |Du|^2\,dx.$$

Using the two estimates for $A$ and $B$ in (4.10.4) yields

$$\frac{\lambda}{2}\int_{\Omega_h} \xi^2\,\big|D_k^h Du\big|^2\,dx - C_1\int_\Omega |Du|^2\,dx \le \int_\Omega \sum_{i,j=1}^n a_{ij}\,D_i u\,D_j v\,dx = \int_\Omega \hat{f}v\,dx. \qquad (4.10.5)$$

On the other hand, doing the necessary calculations on the RHS of the inequality above and using Cauchy's inequality with $\epsilon = \frac{\lambda}{4}$, Theorem 4.9.5(1) gives

$$\left|\int_\Omega \hat{f}\,D_k^{-h}\big(\xi^2 D_k^h u\big)\,dx\right| \le \frac{\lambda}{4}\int_{\Omega_h} \xi^2\,\big|D_k^h Du\big|^2\,dx + C_2\int_\Omega \big(|f|^2 + |u|^2 + |Du|^2\big)\,dx.$$

Substituting in (4.10.5) and rearranging terms yields

$$\frac{\lambda}{4}\int_{\Omega_h} \xi^2\,\big|D_k^h Du\big|^2\,dx \le C_3\Big[\|u\|^2_{L^2(\Omega)} + \|Du\|^2_{L^2(\Omega)} + \|f\|^2_{L^2(\Omega)}\Big],$$

where $C_3 = \max\{C_1, C_2\}$. Therefore

$$\int_{\Omega_h} \xi^2\,\big|D_k^h Du\big|^2\,dx \le C\Big[\|u\|^2_{L^2(\Omega)} + \|Du\|^2_{L^2(\Omega)} + \|f\|^2_{L^2(\Omega)}\Big].$$

Using the same argument as in the preceding proof, we conclude that $D^2 u \in L^2(\Omega')$ and

$$\|u\|_{H^2(\Omega')} \le C\big[\|u\|_{H^1(\Omega)} + \|f\|_{L^2(\Omega)}\big];$$

therefore $u \in H^2(\Omega')$, and since $\Omega' \subset\subset \Omega$ was arbitrary, we have $u \in H^2_{Loc}(\Omega)$. □

Remark It should be noted that the estimate of Theorem 4.10.1 can be expressed via the $L^2$-norm of $u$ rather than the $H^1$-norm, becoming

$$\|u\|_{H^2(\Omega')} \le C\big[\|u\|_{L^2(\Omega)} + \|f\|_{L^2(\Omega)}\big]$$

(see Problem 4.11.41).



4.10.2 Higher Order Interior Regularity

Now that we have proved the regularity result for $u \in H^1$ and shown that $u \in H^2$, one can use induction to repeat the argument above, iterate the estimates, and obtain higher order regularity. Indeed, if $f \in H^1(\Omega)$, then by the preceding theorem $u \in H^2_{Loc}(\Omega)$. Let us for simplicity consider the Poisson equation. Recall that a weak solution satisfies (4.9.2), which involves only the first derivative; consequently one cannot perform integration by parts to obtain the equation $-\nabla^2 u = f$, because there is no guarantee that $D^2 u$ and $D_i f$ exist and belong to $L^2$. This is the main reason why weak solutions cannot automatically be regarded as strong or classical solutions of the original equation. However, if the weak solution is found to be in $H^2$ and $D_i f \in L^2$, then we can perform integration by parts. In this case, we can choose to work with $Dv$ instead of $v \in C_c^\infty$. Substituting it in (4.9.2) gives

$$\int_\Omega Du \cdot D(Dv)\,dx = \int_\Omega f\,Dv\,dx.$$

Performing integration by parts gives

$$\int_\Omega D^2 u \cdot Dv\,dx = \int_\Omega (Df)\,v\,dx. \qquad (4.10.6)$$

The solution $u$ in this case satisfies the original equation pointwise almost everywhere, and is thus a strong solution.
Corollary 4.10.2 Under the assumptions of the preceding theorem, if $u \in H^1(\Omega)$ is a weak solution of the equation $Lu = f$, then $u$ is a strong solution of the equation.

Another important consequence is that, based on (4.10.6) and the argument above, if we repeat the proof of the preceding theorem and iterate our estimates, we obtain $u \in H^3_{Loc}(\Omega)$. This process can be repeated inductively for $k \in \mathbb{N}$. In general, we have the following.

Theorem 4.10.3 (Higher Order Interior Regularity Theorem) Consider the elliptic equation $Lu = f$, where $L$ is a uniformly elliptic operator given by (4.10.1) with $a_{ij} \in C^{k+1}(\Omega)$, $b_i, c \in L^\infty(\Omega)$, and $f \in H^k(\Omega)$ for some open bounded $\Omega$ in $\mathbb{R}^n$. If $u \in H^1(\Omega)$ is a weak solution of the equation $Lu = f$, then $u \in H^{k+2}_{Loc}(\Omega)$. Furthermore, for any $\Omega' \subset\subset \Omega$ and some constant $C > 0$, we have

$$\|u\|_{H^{k+2}(\Omega')} \le C\big[\|u\|_{L^2(\Omega)} + \|f\|_{H^k(\Omega)}\big].$$

According to the theorem, the smoother the data $a_{ij}, b_i, c, f$ of the equation, the smoother the solution we get. Note here that if $k > \frac{n}{2}$, then using the Sobolev Embedding Theorem we conclude that $f \in C(\Omega)$ and, consequently, $u \in C^2(\Omega)$, which describes a classical solution of the equation.

4.10.3 Interior Smoothness

A natural question arises now: if the preceding theorem holds for all k ∈ N, shouldn’t
that imply that the solution is smooth? The following theorem answers the question
positively.
Theorem 4.10.4 (Interior Smoothness Theorem) Let a_ij, b_i, c, f ∈ C^∞(Ω) and
u ∈ H¹(Ω) be a weak solution to the equation Lu = f. Then u ∈ C^∞(Ω) and u is a
classical solution to the equation.

Proof Since f ∈ C^∞(Ω), we have f ∈ H^k(Ω′) for all k ∈ N and every Ω′ ⊂⊂ Ω,
so by the preceding theorem u ∈ H^{k+2}_Loc(Ω), i.e., u ∈ H^{k+2}(Ω′) for every
Ω′ ⊂⊂ Ω, and so by the Sobolev Embedding Theorem (Theorem 3.10.10(3))
u ∈ C^∞(Ω′); since Ω′ is arbitrary, u ∈ C^∞(Ω). □

4.10.4 Boundary Regularity

All the preceding results obtained so far establish interior regularity for weak
solutions in H^k(Ω′) for sets Ω′ ⊂⊂ Ω. This shows that a solution of an elliptic equa-
tion with regular/smooth data is locally regular/smooth in the interior of its domain
of definition, but it provides no information about the smoothness of the solution
at the boundary. In other words, the results above do not yield a solution in
H^k(Ω). In order to obtain a smooth solution at the boundary, and based on the treat-
ment we gave to obtain interior regularity and smoothness, we require the boundary
itself to be sufficiently regular. The following theorem can also be iterated to yield
smoothness provided the data are smooth. The proof is long and very technical and
falls outside the scope of the present book. The interested reader may consult books
on the theory of partial differential equations for the details of the proof.
Theorem 4.10.5 (Boundary Regularity Theorem) In addition to the assumptions of
Theorem 4.10.1, suppose that Ω is of C¹-class and a_ij ∈ C¹(Ω). If u ∈ H₀¹(Ω) is a
weak solution to the equation Lu = f under the boundary condition u = 0 on ∂Ω,
then u ∈ H²(Ω). Furthermore, for some constant C > 0, we have

‖u‖_{H²(Ω)} ≤ C[‖u‖_{L²(Ω)} + ‖f‖_{L²(Ω)}].

It becomes clear now that the weak solutions of elliptic equations can become
strong or even classical solutions provided the data of the equation are sufficiently
regular. It is well known that establishing the existence of a solution of an elliptic
equation is not an easy task. Now, in light of the present section, one can
seek weak solutions over Sobolev spaces, then use regularity techniques to show
that these weak solutions are in fact strong or classical solutions. In the next chapter,
we will see that weak solutions of elliptic partial differential equations are closely

connected with minimizers of integral functionals that are related to these elliptic
PDEs through their weak formulation.

4.11 Problems

(1) If ai j (x) ∈ C 2 () and bi = 0 in the elliptic operator L in (4.1.1), prove that
B[u, v] is symmetric.
(2) Suppose that L is an elliptic operator and there exists 0 < Λ < ∞ such that

Σ_{i,j=1}^n a_ij ξ_i ξ_j ≤ Λ |ξ|²

for all ξ ∈ Rⁿ. Show that a_ij ∈ L^∞.


(3) Show that an elliptic operator defined on Ω ⊆ Rⁿ makes sense in divergence
form only if a_ij ∈ C¹(Ω).
(4) (a) Prove that ‖·‖_∂ defines a norm on H̃¹(Ω).
(b) Prove that every Cauchy sequence in H̃¹(Ω) converges in H̃¹(Ω).
(c) Prove that (4.3.4) defines an inner product.
(d) Conclude Proposition 4.3.5.
(5) Prove Proposition 4.4.3.
(6) Determine whether the cross-product mapping P : R3 × R3 −→ R3 is bilinear.
(7) Show that if the elliptic bilinear map B[u, v] is symmetric then it is coercive.
(8) (Poincaré–Friedrichs inequality): Let Ω be C¹, open, and bounded in R². Show
that there exists C > 0 such that

∫_Ω |u|² dx ≤ C ( ∫_Ω |Du|² dx + ( ∫_Ω u dx )² )

for all u ∈ H¹(Ω).


(9) Show that if u ∈ C²(Ω) ∩ C(Ω̄) is a weak solution to the problem

Lu = f,  x ∈ Ω
∂u/∂n = 0,  x ∈ ∂Ω,

for some bounded Ω ⊂ Rⁿ, then u is a classical solution to the problem.
(10) Consider H₀¹(Ω) with the Sobolev norm

‖u‖_{H₀¹(Ω)} = ‖u‖_{L²(Ω)} + ‖Du‖_{L²(Ω)}

for u ∈ H₀¹(Ω). Show that the norm ‖Du‖_{L²(Ω)} is equivalent to ‖u‖_{H₀¹(Ω)} on
H₀¹(Ω).
(11) Determine whether or not H₀¹(Rⁿ) with the inner product

(u, v) = ∫_{Rⁿ} ∇u · ∇v dx

is a Hilbert space.
(12) Show that if f ∈ L²(Ω) and Df is its distributional derivative, then

‖Df‖_{H⁻¹(Ω)} ≤ ‖f‖_{L²(Ω)}.

Deduce that L²(Ω) ⊂ H⁻¹(Ω).


(13) Let L be a uniformly elliptic operator with bi = 0 for all i.
(a) If min(c(x)) < λ0 , show that γ in Garding’s Inequality can be estimated as

γ = λ0 − min(c(x)).

(b) Show that the bilinear map associated with the operator

Lu + μu = f

is coercive for μ ≥ γ.
(c) Establish the existence and uniqueness of the weak solution of the Dirichlet
problem for

Lu + μu = f

with u = 0 on ∂Ω.
(14) Let L be a uniformly elliptic general operator with a_ij, b_i, c ∈ L^∞(Ω) for some
open and bounded Ω in R². Write Garding's Inequality with

γ = λ₀/2 − m + M/(2λ₀),

where m = min(c(x)) and M = max ‖b_i‖_∞.


(15) Consider a bounded Ω ⊂ Rⁿ and f ∈ L²(Ω), g ∈ L²(∂Ω). Prove the existence
and uniqueness of the weak solution of the following Neumann problems:
(a) ∇²u = 0 in Ω and ∂u/∂n = g on ∂Ω.
(b) −∇²u + u = f in Ω and ∂u/∂n = 0 on ∂Ω.
∂u
(16) Consider the problem −∇²u = f in Ω with ∂u/∂n = g on ∂Ω, for g ∈ L²(∂Ω)
and bounded Ω ⊂ Rⁿ.
(a) Show that a necessary condition for the solution to exist is that

∫_Ω f dx + ∫_{∂Ω} g dS = 0.

(b) Establish the existence and uniqueness of the weak solution of the problem.
(17) Prove that there exists a weak solution for the nonhomogeneous Helmholtz
problem

−∇²u + u = f,  x ∈ Ω
u = 0,  x ∈ ∂Ω.

Do we need to use the Lax–Milgram theorem? Justify your answer.


(18) Let Ω ⊂ Rⁿ be bounded and f ∈ L²(Ω), g ∈ L²(∂Ω). Consider the problem

−div(∇u) + u = f in Ω
∂u/∂n = g on ∂Ω.
(a) Find the weak formulation of the problem.
(b) Prove the existence and uniqueness of the weak solution of the problem.
(19) Consider the problem

−(pu′)′ + qu = f
u(0) = u(1) = 0,

on I = [0, 1], for p, q ∈ C[I] and f ∈ L²(I), with p, q > 0.


(a) Find the associated bilinear form B on H01 (I ).
(b) Show that B is coercive.
(c) Prove the existence and uniqueness of the weak solution of the problem.
(20) In Theorem 4.5.4, find the best value of c0 > −∞ such that if c(x) ≥ c0 then
B[u, v] is coercive.
(21) Generalize Theorem 4.5.5 assuming that c(x) ≥ m for some m ∈ R (not
necessarily positive) such that m ≥ −λ/C².
(22) Solve the Dirichlet problem (4.6.2) if c(x) ≥ λ (where λ is the ellipticity con-
stant of L).
(23) Prove the existence and uniqueness of the weak solution of problem (4.6.4) for
a symmetric A using the Riesz Representation Theorem.
(24) Let λmax > 0 be the largest eigenvalue of the matrix A, and let μ < −λmax .
Prove the existence and uniqueness of weak solution of the Dirichlet problem

Lu + μu = f x ∈
u = 0, x ∈ ∂

where L is a uniformly elliptic operator of the form (4.5.3) such that a_ij, c ∈
L^∞(Ω), and for some open Ω that is bounded in at least one direction in Rⁿ.

(25) Consider the following uniformly elliptic operator

L = − Σ_{i,j=1}^n ∂/∂x_i ( a_ij(x) ∂/∂x_j ) − Σ_{i=1}^n ∂/∂x_i (b_i ·) + Σ_{i=1}^n c(x) ∂/∂x_i + d,

with a_ij, b_i, c, d ∈ L^∞(Ω) for some open and bounded Ω in R².
a) Show that the associated elliptic bilinear map B[u, v] is bounded.
b) Write Garding's Inequality with

β = λ₀/2,  γ = (1/(2λ₀)) (max ‖b_i‖_∞ + ‖c‖_∞)² + ‖d‖_∞.

c) If μ ≥ γ, show that

B_μ[u, v] = B[u, v] + μ ⟨u, v⟩_{L²}

is bounded and coercive in H₀¹(Ω).
d) Show that for f ∈ L²(Ω), there exists a unique weak solution in H₀¹(Ω) to
the Dirichlet problem

Lu = f,  x ∈ Ω
u = 0,  x ∈ ∂Ω.

(26) Without using Theorem 4.5.4, show that the elliptic bilinear map associated
with
Lu + μu = f

is coercive for μ ≥ γ.
(27) Show that the norm

‖u‖_{∂²} = ‖∇²u‖_{L²(Ω)}

is equivalent to the standard norm ‖u‖_{H₀¹(Ω)} for some open and bounded Ω in
R².
(28) Consider the equation

− Σ_{i,j=1}^n ∂/∂x_i ( a_ij(x) ∂u/∂x_j ) + Σ_{i=1}^n ∂/∂x_i (b_i u) + c(x)u(x) + μu(x) = f in Ω.

Assume a_ij, c ∈ L^∞(Ω) and b_i ∈ W^{1,∞}(Ω), f ∈ L²(Ω) for some open Ω in
R² that is bounded in at least one direction. Discuss existence and uniqueness of
the weak solution of the equation under:
(a) the homogeneous Dirichlet condition u(∂) = 0.
(b) the homogeneous Neumann condition ∇u · n = 0 on ∂.
(c) Impose whatever conditions needed on b, c, μ to ensure the results in (a)
and (b).
(d) Discuss the two cases: μ = 0, and μ > 0.

(29) Consider the fourth-order biharmonic BVP

∇²∇²u = f in Ω
u = 0 on ∂Ω
∇u · n = 0 on ∂Ω

where f ∈ L²(Ω) and Ω is open and bounded in R². Use the preceding problem
to show the existence and uniqueness of the weak solution of the problem.
(30) Show that the eigenvalues of a self-adjoint uniformly elliptic operator are
bounded from below.
(31) Show that

K* = (L* + μI)⁻¹|_{L²}

for the operator K = L_μ⁻¹.
(32) In the preceding problem, show that if the only solution of

K ∗ (v) = −v

is v = 0 then for every g ∈ L 2 , the equation

u + γKu = g

has a unique solution.


(33) Let L be a uniformly elliptic operator defined on some open bounded Ω in R²,
and f, h ∈ L²(Ω), λ ∈ R.
(a) Show that
u − Ku = h

has a weak solution in H₀¹(Ω) iff ⟨h, v⟩ = 0 for every weak solution v ∈ H₀¹(Ω)
of K*v − v = 0.
(b) If Lu = 0 has a nontrivial solution, show that

Lu = f

has a weak solution in H₀¹(Ω) iff ⟨f, v⟩ = 0 for every weak solution v ∈ H₀¹(Ω)
of L*v = 0.
(34) In the previous problem, show that if

Lu − λu = f

has a weak solution in H₀¹(Ω) for every f ∈ L²(Ω), then

‖u‖_{L²(Ω)} ≤ C ‖f‖_{L²(Ω)}

for some C = C(λ, Ω).



(35) Let
L = −∇ 2 ,

and Ω = Rⁿ.
(a) Show that L has a continuous spectrum.
(b) Conclude that the boundedness of Ω in Theorem 4.8.2 is essential to obtain
a discrete set of eigenvalues.
(36) Let {φₙ} be the orthonormal basis for L²(Ω) consisting of the eigenfunctions
of the Laplacian:

−∇²φₙ = λₙφₙ.

Show that {φₙ/√λₙ} is an orthonormal basis for H₀¹(Ω) endowed with the Poincaré
inner product

⟨u, v⟩_∂ = ∫_Ω Du · Dv dx.

(37) Prove the first three statements of Proposition 4.9.4.


(38) Prove that u ∈ W^{1,p}(Rⁿ) if and only if u ∈ L^p(Rⁿ) and

lim sup_{h→0} ‖u(· + h) − u‖_p / |h| < ∞.

(39) Prove the estimate in Theorem 4.9.5(1) for p = ∞.


(40) Give an example to show that the estimate in Theorem 4.9.5(2) is not valid for
p = 1.
(41) (a) Prove the Caccioppoli inequality (Lemma 4.9.6) for the general elliptic
operator in (4.10.1).
(b) Show that the estimate obtained in Theorem 4.10.1 can be refined to

‖u‖_{H²(Ω′)} ≤ C[‖u‖_{H¹(Ω_h)} + ‖f‖_{L²(Ω_h)}].

(c) Choose v = ξ²u for some cut-off function ξ with ξ = 1 in Ω_h and supp(ξ) ⊂ Ω.
Under the same assumptions of Theorem 4.10.1, use (a) and (b) to establish the
estimate

‖u‖_{H²(Ω′)} ≤ C[‖u‖_{L²(Ω)} + ‖f‖_{L²(Ω)}].

(42) In Theorem 4.10.3, show that if k > n/2 then u ∈ C²(Ω) is a classical solution
for the equation.
Chapter 5
Calculus of Variations

5.1 Minimization Problem

5.1.1 Definition of Minimization Problem

The problem of finding the maximum value or minimum value of a function over
some set in the domain of the function is called: variational problem. This problem,
and finding ways to deal with it, is as old as humanity itself. In case we are looking
for a minimum value, the problem is called: minimization problem. We will focus
on minimization problems due to their particular importance: in physics and
engineering, we look for the minimum energy; in geometry, we look for the shortest
distance or the smallest area or volume; and in economics, we look for the minimum
costs. Historically, the minimization problems in physics and geometry were the
main motivation to develop the necessary tools and techniques that laid the founda-
tions of this field of mathematics, which connects functional analysis with applied
mathematics. We will discuss the relations between minimization problems and the
existence and uniqueness problem of solutions of PDEs. We confine ourselves to
elliptic equations. We shall give a formal definition of the problem.
Definition 5.1.1 (Minimization Problem) Let f : X −→ R be a real-valued func-
tion. The minimization problem is the problem of searching for some x₀ ∈ A, for
some set A ⊆ X, such that

f(x₀) = inf_{x∈A} f(x).

If there exists such x₀ ∈ A, then we say that x₀ is the solution to the problem,
and the point x₀ is said to be the minimizer of f over A. If f(x₀) is the minimum
value over X then x₀ is the global minimizer of f. The set A is called: admissible
set, and it consists of all the values of x that compete to attain the minimum value
over A. We observe the following points:

(1) Not every minimization problem has a minimizer.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 295
A. Khanfer, Applied Functional Analysis,
https://doi.org/10.1007/978-981-99-3788-2_5

(2) If the minimizer was not found over a set A, it may still be possible to find it if the
admissible set A is enlarged to a bigger one; the larger the admissible set, the
bigger the chance that the problem will have a solution.
(3) The maximization problem is the dual of the minimization problem, noting that
over an admissible set A, we always have

sup_{x∈A} f(x) = − inf_{x∈A} (−f(x)),

so solving one problem automatically solves the other problem.
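This duality can be checked numerically on a finite admissible set; the function and grid below are arbitrary illustrative choices, not taken from the text:

```python
# arbitrary sample function and finite admissible set (illustrative choices)
xs = [0.1 * k for k in range(-20, 21)]
f = lambda x: x ** 3 - x

sup_f = max(f(x) for x in xs)
neg_inf_neg_f = -min(-f(x) for x in xs)
print(sup_f == neg_inf_neg_f)   # True: sup f = -inf(-f) on the same set
```

Since negation is exact in floating point, the two sides agree exactly, not just approximately.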


We emphasize that we are concerned with searching for the minimizer rather than the
infimum of the function. This "searching" has two levels: the first level is to merely
prove the existence of such a minimizer. The second level (which usually comes
after achieving the goal of the first level) is to find the minimizer (exactly or
approximately) using various analytical, numerical, and computational techniques.
In functional analysis, we usually confine ourselves to the first level, and once
that job is done, it is the task of applied mathematicians to carry out the
second level of the problem. To simplify our task, we will start our discussion with
simple minimization problems in finite-dimensional spaces, which will motivate us
to discuss the problem in infinite-dimensional spaces.
The first time we encountered this type of problem was in a calculus course. We
begin with a lovely theorem that all readers have certainly studied in the first course
of calculus.
Theorem 5.1.2 (Extreme Value Theorem) If f is continuous on a closed interval
[a, b] then f attains both an absolute maximum value and an absolute minimum
value somewhere in [a, b].
The theorem predicts the existence of a minimizer and maximizer over [a, b], and
it can be easily generalized to Rn . The admissible set in the theorem is the compact
interval [a, b] and the only requirement for f is to be continuous. The properties of
continuity of the function and the compactness of the admissible set are the most
important conditions to establish the existence results of minimizers and maximizers.
Another fundamental theorem is:
Theorem 5.1.3 (Bolzano–Weierstrass Theorem) Every sequence in a compact set
in Rn has a subsequence that converges to some point in that set.
The theorem can be stated equivalently as: "Every bounded sequence in Rⁿ has a
convergent subsequence". Note here that both maximum and minimum are required.
The following function

f(x) = { x²,    −1 ≤ x ≤ 0
       { 2 − x,  0 < x ≤ 1

does have a minimum value of 0, but it does not have a maximum value. If we restrict
our goal to exploring minimizers only, then functions such as the above may serve as a
good example to motivate us. In fact, this function is known as: lower semicontinuous.
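A quick numerical check of this behavior (an illustrative sketch outside the text; the grid is a dyadic one chosen so that the point x = 0 is represented exactly):

```python
import numpy as np

def f(x):
    # x^2 on [-1, 0], 2 - x on (0, 1]
    return np.where(x <= 0, x ** 2, 2.0 - x)

xs = np.linspace(-1.0, 1.0, 2 ** 16 + 1)   # dyadic grid containing 0 exactly
vals = f(xs)
print(float(vals.min()))   # 0.0, attained at x = 0
print(float(vals.max()))   # just below 2: the supremum 2 is never attained
```

The minimum 0 is attained at x = 0, while the values on (0, 1] approach 2 without ever reaching it, so no maximizer exists.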

5.1.2 Lower Semicontinuity

Definition 5.1.4 Let X be a normed space and f a real-valued function defined
on X.
(1) Lower semicontinuous. The function f is said to be lower semicontinuous at
x₀ ∈ D(f) if for every ε > 0 there exists δ > 0 such that

f(x₀) < f(x) + ε

whenever ‖x − x₀‖ < δ for all x ∈ D(f). A lower semicontinuous function is
denoted simply by l.s.c.
(2) Upper semicontinuous. The function f is said to be upper semicontinuous at
x₀ ∈ D(f) if for every ε > 0 there exists δ > 0 such that

f(x) − ε < f(x₀)

whenever ‖x − x₀‖ < δ for all x ∈ D(f). An upper semicontinuous function is
denoted simply by u.s.c.
It is evident from the definitions above that a function is continuous at a point iff it is
l.s.c. and u.s.c. at that point. Also, if f is l.s.c. then −f is u.s.c. We will only discuss
l.s.c. functions. A more reliable definition, based on sequences, is the following:
Definition 5.1.5 (Sequentially lower semicontinuous) A function f : X −→
(−∞, ∞] is said to be sequentially lower semicontinuous at a point x₀ ∈ D(f) if

f(x₀) ≤ lim inf f(xₙ)

for every sequence (xₙ) in X converging to x₀.


In normed spaces, the two definitions are equivalent, so we will continue to use
Definition 5.1.5 and omit the term sequentially for convenience.
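As a concrete illustration of Definition 5.1.5 (an illustrative sketch, not from the text), the step function below jumps up at 0 and is l.s.c. there: along every sequence approaching 0, the limit inferior of the values stays at or above f(0):

```python
def f(x):
    return 0.0 if x <= 0 else 1.0   # jumps up at 0

# sequences converging to 0 from the left and from the right
left = [f(-1.0 / n) for n in range(1, 1001)]    # all values 0
right = [f(1.0 / n) for n in range(1, 1001)]    # all values 1

# f(0) <= liminf f(x_n) holds along both sequences, so f is l.s.c. at 0
print(f(0.0) <= min(left), f(0.0) <= min(right))
```

If instead the function jumped down at 0 (value 1 at 0, value 0 to the right), the same check would fail along the right-hand sequence, and the function would only be u.s.c. there.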
The following result is fundamental and some authors use it as an equivalent definition
of lower semicontinuity.
Proposition 5.1.6 Let f be defined on some normed space X. Then f is l.s.c. iff

f⁻¹((−∞, r]) = {x : f(x) ≤ r}

is a closed set in X for every r ∈ R.


The proof is straightforward using the definitions. We leave it to the reader. The
set {x : f (x) ≤ r} is called: the lower-level set of f . Another important notion linked
to real-valued functions is the following:
Definition 5.1.7 (Epigraph) Let f be defined on some normed space X . Then the
epigraph of f , denoted by epi(f ), is given by

epi(f ) = {(x, r) ∈ X × R: r ≥ f (x)}.

In words, the epigraph of a function f is the set of all points in X × R lying above
or on the graph of f . The next result shows that l.s.c. functions can be identified by
their epigraphs.
Proposition 5.1.8 Let f be defined on some normed space X. Then f is l.s.c. iff
epi(f) is closed.

Proof Consider a sequence (xₙ, rₙ) ∈ epi(f), so that f(xₙ) ≤ rₙ, and let
(xₙ, rₙ) −→ (x₀, r₀) for some x₀ ∈ X and r₀ ∈ R. Then by lower semicontinuity,

f(x₀) ≤ lim inf f(xₙ) ≤ lim inf rₙ = lim rₙ = r₀.

Hence

(x₀, r₀) ∈ epi(f),

and this proves one direction. Conversely, let epi(f) be closed, let xₙ −→ x₀ in X,
and suppose, for contradiction, that r₀ = lim inf f(xₙ) < f(x₀). Pick r with
r₀ < r < f(x₀). Then f(xₙ) ≤ r for infinitely many n, so (xₙ, r) ∈ epi(f) along a
subsequence; since (xₙ, r) −→ (x₀, r) and epi(f) is closed, we get (x₀, r) ∈ epi(f),
i.e., f(x₀) ≤ r < f(x₀), a contradiction. Hence f(x₀) ≤ lim inf f(xₙ). □

5.1.3 Minimization Problems in Finite-Dimensional Spaces

With the use of the l.s.c. notion, the Bolzano–Weierstrass Theorem can be reduced to
the following version:
Theorem 5.1.9 If f : X −→ (−∞, ∞] is l.s.c. on a compact set K ⊆ X, then f is
bounded from below and attains its infimum on K, i.e., there exists x₀ ∈ K such that

f(x₀) = inf_{x∈K} f(x).

Proof Let

inf_{x∈K} f(x) = m ≥ −∞.

Define the sets

Cₙ = {x ∈ K : f(x) ≤ m + 1/n}

(if m = −∞, take Cₙ = {x ∈ K : f(x) ≤ −n} instead). Since f is l.s.c., every Cₙ is
closed in X, and since they are subsets of the compact set K, they are all compact;
notice that Cₙ₊₁ ⊆ Cₙ for all n. By Cantor's intersection theorem,

⋂_{n=1}^∞ Cₙ ≠ ∅,

i.e., there exists x₀ ∈ K such that

x₀ ∈ ⋂_{n=1}^∞ Cₙ;

therefore,

f(x₀) = inf_{x∈K} f(x) = m > −∞. □

The above theorem establishes the existence of minimizers without giving a con-
structive procedure to find it. To find the minimizer, we need to implement various
analytical and numerical tools to obtain the exact or approximate value of it, but this
is beyond the scope of the book. To determine whether the minimizer predicted by
the above theorem is global or not, we need to impose further conditions to give
extra information about this minimizer. One interesting and important condition is
the notion of convexity.

5.1.4 Convexity

Recall that a set C is convex if whenever x, y ∈ C we have θx + (1 − θ)y ∈ C for
every 0 ≤ θ ≤ 1. In the following proposition, we remind the reader of some of the
basic properties of convex sets, which can be easily proved using the definition.
Proposition 5.1.10 The following statements hold in every normed space:
(1) The empty set is trivially convex. Moreover, a singleton set is convex.
(2) All open and closed balls in any normed space are convex.
(3) If C is convex then its closure C̄ and its interior Int(C) are convex sets.
(4) The intersection of any collection of convex sets is convex.
(5) The intersection of all convex sets containing a subset A is convex. This is called
the: convex hull, and is denoted by Conv(A). Moreover, if A is compact
then Conv(A) is compact.
(6) For any set A, Conv(Conv(A)) = Conv(A).
Recall that a function is convex if whenever x, y ∈ D(f) we have

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

for all 0 < θ < 1. If the inequality is strict, then f is called: strictly convex. It is
evident from the definition of a convex function that the domain D(f) must be convex.
The following theorem benefits from the property of convexity.

Theorem 5.1.11 Let f be convex. If f has a local minimizer, then it is a global
minimizer.

Proof If x₀ is a local minimizer, then for any y ∈ D(f), we can choose a suitable
0 < θ < 1 (close enough to 1 that θx₀ + (1 − θ)y lies in the neighborhood on which
x₀ is minimal) such that

f(x₀) ≤ f(θx₀ + (1 − θ)y) ≤ θf(x₀) + (1 − θ)f(y).

With a simple calculation, this implies

f(x₀) ≤ f(y). □
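This theorem is what makes local descent methods globally reliable on convex functions. A small sketch (the function, step size, and iteration count are illustrative choices of ours): gradient descent started from different points reaches the same global minimizer of a convex function:

```python
def f(x):
    return (x - 3.0) ** 2 + 1.0   # convex; unique global minimizer at x = 3

def descend(x, lr=0.1, steps=500, h=1e-6):
    # plain gradient descent with a central-difference gradient
    for _ in range(steps):
        grad = (f(x + h) - f(x - h)) / (2.0 * h)
        x -= lr * grad
    return x

# any local minimizer found is the global one, regardless of the start
print(round(descend(-10.0), 6), round(descend(7.0), 6))   # both close to 3.0
```

For a non-convex function, the same procedure can get trapped in different local minima depending on the starting point, which is exactly what convexity rules out.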

5.1.5 Minimization in Infinite-Dimensional Space

We have seen that compactness and lower semicontinuity were the basic tools used
to search for minimizers in finite-dimensional spaces. This situation totally changes
when it comes to infinite-dimensional spaces. As discussed in a previous course of
functional analysis, compactness is not easy to achieve in these spaces, and the
Heine–Borel Theorem, which states that every closed bounded set is compact, fails in
infinite-dimensional spaces. In fact, a well-known result in analysis states that a
space is finite-dimensional if and only if its closed unit ball is compact, and hence
the closed unit ball in infinite-dimensional spaces is never compact. A good sug-
gestion to remedy this difficult situation is to change the topology on the space.
More specifically, we replace the norm topology with another “weaker” topology
that allows more closed and compact sets to appear in the space. This is the weak
topology, which is the subject of the next section.

5.2 Weak Topology

5.2.1 Notion of Weak Topology

It is well-known that in normed spaces, the stronger the norm the more open sets we
obtain in the space, which makes it harder to obtain convergence and compactness.
Replacing the norm topology with a coarser, or weaker topology results in fewer
open sets, which will simplify our task. We are concerned with the smallest topology
that makes every bounded linear functional continuous. This merely requires f⁻¹(O)
to be open in the topology for any O open in R. The generated topology is called the
weak topology.
Historically, the minimization problem was one of the main motivations to explore
the weak topology. As noted in the preface, the reader of this book must have prior
knowledge of weak topology in a previous course of analysis, but we shall give a
quick overview and remind the reader of the important results that will be used later
in the chapter.

5.2.2 Weak Convergence

Remark Throughout, we always assume the space is normed.

Recall that a sequence xₙ ∈ X is weakly convergent, written xₙ ⇀ x, if
f(xₙ) −→ f(x) for all bounded linear functionals f on X. In particular, let uₙ ∈ Lᵖ
for some 1 ≤ p < ∞ and let q be its Hölder conjugate. If uₙ ⇀ u, then
∫ uₙ g dx −→ ∫ u g dx for all g ∈ L^q, since the bounded linear functionals on Lᵖ are
represented by elements of L^q. Moreover, if uₙ ⇀ u in a Hilbert space H, then
⟨uₙ, z⟩ → ⟨u, z⟩ for all z ∈ H. A useful property to obtain convergence in norm from
weak convergence is known as the Radon–Riesz property:

Theorem 5.2.1 (Radon–Riesz Theorem) Let fₙ ∈ Lᵖ, 1 < p < ∞. If fₙ ⇀ f in
Lᵖ and ‖fₙ‖_p −→ ‖f‖_p, then ‖fₙ − f‖_p −→ 0.

Proof For p = 2, use the Riesz Representation Theorem and note that

‖fₙ − f‖₂² = ⟨fₙ − f, fₙ − f⟩ = ‖fₙ‖² − 2⟨fₙ, f⟩ + ‖f‖².

For p ≠ 2, use the Hahn–Banach Theorem to show the existence of g ∈ L^q such that
‖g‖_q = 1 and g(f) = ‖f‖_p = 1. Then use Hölder's inequality and the fact that Lᵖ is
uniformly convex. □
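The standard example behind this discussion is the orthonormal basis (eₙ) of ℓ², which converges weakly to 0 while ‖eₙ‖ = 1 for all n. Truncating ℓ² to finitely many coordinates, this can be checked numerically (an illustrative sketch; the vector z and the truncation N are arbitrary choices):

```python
import numpy as np

N = 10_000                              # truncation of l^2 to R^N
z = 1.0 / np.arange(1.0, N + 1.0)       # a fixed square-summable vector

def e(n):
    v = np.zeros(N)
    v[n] = 1.0                          # n-th standard basis vector
    return v

# <e_n, z> = z[n] -> 0: pairing against any fixed z dies out (weak convergence to 0)
print([float(e(n) @ z) for n in (10, 100, 1000)])
# but ||e_n|| = 1 for every n, so e_n does not converge to 0 in norm
print(float(np.linalg.norm(e(1000))))
```

Here ‖eₙ‖ = 1 does not converge to ‖0‖ = 0, so the norm-convergence hypothesis of the Radon–Riesz Theorem fails, consistent with the fact that eₙ does not converge strongly.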

It is worth noting that when we describe a topological property by “weakly” it means


that the property holds in the weak topology. The following is a brief description of
the sets in the weak topology with some implications:
(1) U is said to be weakly open if for every element x ∈ U there exist f₁, …, fₙ ∈ X*
and O₁, …, Oₙ open in R such that

x ∈ ⋂_{k=1}^n f_k⁻¹(O_k) ⊂ U.

(2) A is said to be weakly closed if Aᶜ is weakly open.
(3) A is said to be weakly sequentially closed if for every xₙ ∈ A with xₙ ⇀ x, we
have x ∈ A.

(4) A is said to be weakly bounded if f(A) is bounded for all f ∈ X*.
(5) A is said to be weakly compact if f(A) is a compact set for all f ∈ X*.
(6) A is said to be weakly sequentially compact if for every sequence xₙ ∈ A there
exists a subsequence x_{n_j} ⇀ x ∈ A.
(7) If {xₙ} is convergent in norm, then it is weakly convergent, but the converse is not
necessarily true.
(8) If A is a weakly open set, then it is an open set, but the converse is not necessarily
true.
(9) If A is a weakly closed set, then it is a closed set, but the converse is not
necessarily true.
(10) A is bounded if and only if it is weakly bounded.
(11) If A is compact, then it is weakly compact, but the converse is not necessarily
true.

5.2.3 Weakly Closed Sets

Now, we state two important theorems about closed sets in the weak topology.
Theorem 5.2.2 (Mazur’s Theorem) If a set is convex and closed, then it is weakly
closed.

Proof Let A be convex and closed, and let x₀ ∈ Aᶜ. By consequences of the Hahn–Banach
Theorem, there exists f ∈ X* such that for all x ∈ A, f(x) < b < f(x₀) for some
b ∈ R. Then it is easy to show that

x₀ ∈ U = {x ∈ X : f(x) > b} ⊂ Aᶜ.

Hence, Aᶜ is weakly open, and therefore, A is weakly closed. □

Theorem 5.2.3 (Heine–Borel Theorem/Weak Version) Any weakly compact set is


weakly closed and weakly bounded. Any weakly closed subset of a weakly compact
set is weakly compact.

Proof Use the fact that every weak topology is Hausdorff. 

5.2.4 Reflexive Spaces

Recall that a space X is said to be reflexive if it is isometrically isomorphic to its


second dual X ∗∗ under a natural embedding. In what follows, we list theorems that
are among the most important results in this theory.
Theorem 5.2.4 A space X is reflexive if and only if its closed unit ball

B_X = {x ∈ X : ‖x‖ ≤ 1}

is weakly compact.

Proof This can be proved using Banach–Alaoglu Theorem. 

Theorem 5.2.5 Let X be normed space and A be a subset of X . Then the following
statements are equivalent:
(1) X is reflexive.
(2) A is weakly compact if and only if A is weakly closed and weakly bounded.

Proof Use Theorems: 5.2.2, 5.2.3, and 5.2.4. 

Theorem 5.2.6 (Kakutani Theorem) If X is reflexive, then every bounded sequence


in X has a weakly convergent subsequence.

Proof Let Y be the closed linear span of {xₙ} in X. Then Y is reflexive and separable.
Now, use the Banach–Alaoglu Theorem. □

The following are well-known results that can be easily proved using the above
theorems:
(1) Every reflexive space is a Banach space.
(2) Every Hilbert space is reflexive.
(3) All Lp spaces for 1 < p < ∞ are reflexive.
(4) A closed subspace of a reflexive space is reflexive.
We provided brief proofs for the theorems above.1

5.2.5 Weakly Lower Semicontinuity

In the weak topology, lower semicontinuity takes the following form:
Definition 5.2.7 (Weakly Lower Semicontinuous Mapping) A mapping f : X −→
(−∞, ∞] is said to be weakly lower semicontinuous, denoted by w.l.s.c., at
x₀ ∈ D(f) if

f(x₀) ≤ lim inf f(xₙ)

for every sequence (xₙ) in X converging weakly to x₀.

Remark We allow convex functions to take the value ∞.

1 For details, the reader can consult volume 2 of this series, or alternatively any other textbook on
functional analysis.

Since convergence in norm implies weak convergence, it is evident that every w.l.s.c.
mapping is indeed l.s.c. The converse, however, is not necessarily true, unless the two
types of convergence are equivalent, which is the case in finite-dimensional spaces.
The convexity property once again proves its power and efficiency. We first need to
prove the following result.
Lemma 5.2.8 A function f is convex if and only if epi(f ) is convex.

Proof Let f be convex and suppose

(x, r), (y, s) ∈ epi(f),

so that r ≥ f(x) and s ≥ f(y). Let 0 < θ < 1 and set

(z, t) = θ(x, r) + (1 − θ)(y, s).

Then

t = θr + (1 − θ)s
 ≥ θf(x) + (1 − θ)f(y)
 ≥ f(θx + (1 − θ)y)
 = f(z).

Hence (z, t) ∈ epi(f). Conversely, let epi(f) be convex and suppose x, y ∈ D(f).
Then

(x, f(x)), (y, f(y)) ∈ epi(f)

and so

(θx + (1 − θ)y, θf(x) + (1 − θ)f(y)) = θ(x, f(x)) + (1 − θ)(y, f(y)) ∈ epi(f).

But this implies

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y). □

Theorem 5.2.9 If f is convex and l.s.c, then f is w.l.s.c.

Proof Since f is convex, by the preceding lemma epi(f) is convex, and since f
is l.s.c., by Proposition 5.1.8 epi(f) is closed; hence by Mazur's theorem (The-
orem 5.2.2) epi(f) is weakly closed, i.e., closed in the weak topology, and hence by
Proposition 5.1.8 again, f is l.s.c. in the weak topology. □

The next result shows that if a sequence converges weakly, then there exists a finite
convex linear combination of the sequence that converges strongly to the weak limit
of the sequence.

Lemma 5.2.10 (Mazur's Lemma) Let xₙ ⇀ x in a normed space X with norm
‖·‖_X. Then for every n ∈ N, there exists a corresponding k = k(n) and another
sequence

yₙ = Σ_{j=n}^k λⱼxⱼ,

for some positive coefficients λⱼ satisfying

λₙ + λₙ₊₁ + · · · + λ_k = 1,

such that

‖yₙ − x‖_X −→ 0.

Proof Fix n ∈ N, and define the set

Cₙ = { Σ_{j=n}^k λⱼxⱼ : k ≥ n, λⱼ ≥ 0, Σ_{j=n}^k λⱼ = 1 }.

Then it is clear from Proposition 5.1.10(5) that Cₙ is the convex hull of the set
{xⱼ : j ≥ n}, so it is convex; hence by Proposition 5.1.10(3) its closure C̄ₙ is convex
and closed. By Mazur's Theorem 5.2.2, this implies that C̄ₙ is weakly closed, and
since (xⱼ)_{j≥n} converges weakly to x, we must have x ∈ C̄ₙ, from which we can
find a member yₙ ∈ Cₙ such that

‖yₙ − x‖_X ≤ 1/n.    (5.2.1)

Letting n −→ ∞ in (5.2.1) gives the desired result. □
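A numerical illustration of Mazur's Lemma (an illustrative sketch of ours, using the truncated ℓ² basis, which converges weakly but not strongly to 0): the uniform convex combinations yₙ = (e₀ + ··· + eₙ₋₁)/n satisfy ‖yₙ‖ = 1/√n → 0, so convex combinations recover strong convergence to the weak limit:

```python
import numpy as np

N = 5000

def e(n):
    v = np.zeros(N)
    v[n] = 1.0      # n-th standard basis vector of truncated l^2
    return v

# averages of the weakly-null basis vectors converge to 0 in norm
for n in (10, 100, 1000):
    y = sum(e(j) for j in range(n)) / n     # convex combination with weights 1/n
    print(n, float(np.linalg.norm(y)))       # equals 1/sqrt(n)
```

The individual eₙ stay at distance 1 from the weak limit 0, but their convex combinations approach it in norm, exactly as the lemma predicts.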

5.3 Direct Method

5.3.1 Direct Versus Indirect Methods

Differential calculus is the most important tool used by mathematicians to study
minimization and maximization problems. This classical method was begun by Euler
in the 1750s, and it is considered today the standard technique. The procedure operates
on the first and second derivatives of the function rather than on the function itself.
More specifically, we find critical points using the first derivative, and
then examine them using the first or the second derivative to determine whether these
critical points refer to minimum, maximum, or saddle points. So the method is described
as an "indirect method" in the sense that no direct work on the function takes place. In 1900,

Hilbert began research on these minimization problems during his investigation of


the Dirichlet principle, which will be discussed in the next section. He soon found a
nonconstructive proof in which only the existence of the minimizer can be provided,
without finding the minimizer itself. The method was described as: “direct method ”
because the procedure is implemented directly on the function without regard to its
derivatives. The direct method is purely analytic, and many fundamental results of
functional analysis were established during the attempts to modify the method or
improve it. It is one of the greatest ideas that laid the foundations of modern analysis.
The indirect calculus-based method is constructive, but it is too technical and
restrictive in the sense that it may require several conditions on the regularity of the
function and its derivatives to facilitate the task of implementing the method. On the
other hand, the direct method is nonconstructive, but it is not too technical. It only
needs a weak topology and some conditions on the function itself. One advantage of
this method is that, under some conditions on the function and the space, it deals with
general functionals defined on spaces, and can provide abstract existence theorems
of minimizers, and consequently they can be applied to different problems and in
various settings.

5.3.2 Minimizing Sequence

The main tool and the key to solving the problem is to start with minimizing sequences.

Definition 5.3.1 (Minimizing Sequence) Let f be a functional defined on A. Then
a minimizing sequence of f on A is a sequence (xn ) such that xn ∈ A and

f (xn ) −→ inf_{x∈A} f (x).

Here are some observations:

(1) The infimum of a function always exists, but it may be −∞ if f is unbounded


from below. However, we usually assume or impose conditions to avoid this
case.
(2) By the definition of infimum, we can show that minimizing sequences always
exist (verify).
(3) The definition doesn’t say anything about the convergence of the sequence.

5.3.3 Procedure of Direct Method

Generally speaking, the method is based on the following main steps:



(1) Construct a minimizing sequence that converges to the infimum of the
function.
(2) Prove that a subsequence of it converges.
(3) Show that the limit is the minimizer.
Step 2 is the crucial one here. The general argument is to extract a subsequence
of the minimizing sequence which converges in the weak topology. We will use
the Kakutani Theorem (Theorem 5.2.6) which is a generalization of the Bolzano–
Weierstrass Theorem.
For step 1, the construction of the minimizing sequence comes naturally, i.e., from
the definition of the infimum. Another way of showing the finiteness of the infimum
is to prove that the function is bounded from below, which implies the infimum can
never be −∞. It is important to note that if the function is not proved to be bounded
from below, its infimum may or may not be −∞, and assuming that

f : X −→ (−∞, ∞]

doesn’t help in this regard because the infimum may still be −∞ even if the function
doesn’t take this value. For example, if f (x) = e−x then f : R −→ (0, ∞) although
inf f = 0. One advantage of defining the range of the function to be (−∞, ∞] is
to guarantee that there is no x0 ∈ D(f ) with f (x0 ) = −∞. So if we can show the
existence of a minimizer x0 , then we have

inf f (x) = f (x0 ) > −∞.


x∈A

In fact, a function that is convex and l.s.c. cannot take the value −∞, since otherwise
it would be nowhere finite (verify). Thus, it is important to avoid this situation
by assuming the function to be proper. A function is called proper if its range is
(−∞, ∞]. So a proper functional f means neither f ≡ ∞ nor does it take the value
−∞.
Remark Throughout, we use the assumption that f is proper in all subsequent
results.

5.3.4 Coercivity

We will assume that our space X is a reflexive Banach space in order to take advantage
of the Banach–Alaoglu Theorem and its consequences, such as the Kakutani
Theorem. We will also assume the function to be proper, convex, and l.s.c., and the
admissible set to be bounded, closed, and convex. If the admissible set is bounded, then
any sequence belonging to the set is also bounded, so we can use the Kakutani Theorem if
the space is a reflexive Banach space, and we extract a subsequence that converges
weakly. If the set is not bounded, then we can control the boundedness of the sequence
using the coercivity of the function. One equivalent variant of coercivity of f is that

f (xn ) −→ ∞

if
‖xn ‖ −→ ∞

for xn ∈ D(f ). It follows that if (xn ) is the minimizing sequence and

f (xn ) −→ inf f

and f is bounded from below, then

f (xn ) ↛ ∞,

and consequently ‖xn ‖ ↛ ∞, and this proves that (xn ) is bounded. We can look at it
from a different point of view. Letting z ∈ D(f ) be such that f (z) < ∞, we can choose
r > 0 large enough that

f (x) > f (z) whenever ‖x‖ > r.

We can then exclude all these members x by taking the following intersection:

M = D(f ) ∩ Br (0),

which is clearly a bounded set, and is also closed if D(f ) is closed, and convex if
D(f ) is convex, and the infimum of f over D(f ) is the same as the infimum over M .
Our minimizing sequence xn is certainly inside the bounded set M , so it is bounded.
If the space is a reflexive Banach space, then M lies in a large fixed closed ball, which
is weakly compact, so we can extract a subsequence from xn that converges weakly
to some member x0 . It remains to show that this limit is the minimizer, and it belongs
to M , keeping in mind that

inf_M f = inf_{D(f )} f .

5.3.5 The Main Theorem on the Existence of Minimizers

Now we state our main result of the section which solves the minimization problem.

Theorem 5.3.2 Let X be a reflexive Banach space. If a functional J : A ⊂ X −→
(−∞, ∞] is proper, convex, bounded from below, coercive, and l.s.c., and is defined
on a closed and convex set A in X , then there exists u ∈ A such that

J [u] = inf_{v∈A} J [v].

If J is strictly convex, then the minimizer u is unique.



Proof Since J is bounded from below, let

inf_{v∈A} J [v] = m > −∞.

Let (un ) be a minimizing sequence such that

J [un ] −→ m,

which implies that (J [un ]) is bounded, and so by coercivity (un ) is bounded in A. Since
X is reflexive, we use Theorem 5.2.6 to conclude that there exists a subsequence (unj ),
for convenience again called (un ), which converges weakly to some u ∈ X . But since A
is closed and convex, by Mazur's Theorem it is weakly closed, and therefore u ∈ A.
Finally, since J is convex and l.s.c., it is w.l.s.c., and so

J [u] ≤ lim inf J [un ].

It follows that
m ≤ J [u] ≤ lim inf J [un ] ≤ m,

and therefore
J [u] = m > −∞.

Now, suppose that J is strictly convex, and let u1 , u2 ∈ A be two distinct minimizers,
so that J [u1 ] = J [u2 ] = m with u1 ≠ u2 . Consider

u0 = (u1 + u2 )/2 ∈ A.

Then by strict convexity,

J [u0 ] = J [(u1 + u2 )/2] < (1/2)(J [u1 ] + J [u2 ]) = m,

which contradicts the fact that m is the infimum. The proof is complete. □

In light of the preceding theorem, we see that the property of being reflexive and
Banach is very useful in establishing the existence of minimizers. We already proved
in Theorem 3.5.3 that all Sobolev spaces W k,p (Ω) are Banach, separable, and reflexive
for 1 < p < ∞, which justifies the great importance of Sobolev spaces in applied
mathematics. Also, we notice the importance of the admissible set to be convex as
this guarantees I to be convex and u to be in the admissible set. Proving a functional
is w.l.s.c. could be the most challenging condition to satisfy. Once the functional is
proved to be l.s.c. and convex, Theorem 5.2.9 can be used.
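As a hedged finite-dimensional caricature of the direct method (the function J and the set A below are illustrative choices, not from the text): minimize a proper, strictly convex, coercive function over a closed convex set by building a minimizing sequence from ever finer samples of the set.

```python
# caricature of the direct method: J is strictly convex and coercive,
# A = [0, 1] is closed, bounded, and convex
def J(x):
    return (x - 2.0) ** 2

def near_minimizer(n):
    # sample A at n + 1 equally spaced points and return the best sample
    pts = [k / n for k in range(n + 1)]
    return min(pts, key=J)

# the near-minimizers form a minimizing sequence; here it sits at the
# unique minimizer x = 1, on the boundary of A
seq = [near_minimizer(2 ** k) for k in range(1, 12)]
print(seq[-1])  # -> 1.0
```

Note that the minimizer lies on the boundary of A, which is why closedness of the admissible set matters.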
We recall some basic results from analysis:
Proposition 5.3.3 Let un , u ∈ Lp , 1 ≤ p < ∞. The following statements hold:

(1) If un −→ u in norm, then there exists a subsequence (unj ) of (un ) such that

unj −→ u a.e.

(2) If (un ) is bounded, then there exists a subsequence (unj ) of (un ) such that

unj −→ lim inf un a.e.

(3) If un ⇀ u weakly, then (un ) is bounded and

‖u‖Lp (Ω) ≤ lim inf ‖un ‖Lp (Ω) .

Proof (1) follows immediately from the convergence in measure. (2) is proved using
the definition of infimum. (3) can be proved using the definition of weak convergence,
noting that the sequence {f (un )} is convergent for every bounded linear functional
f ∈ X ∗ . Now, we use Hahn–Banach Theorem to choose f such that f  = 1 and
f (u) = u . 

The third statement is especially important and can be a very efficient tool in proving
the weak lower semicontinuity property for functionals, as we shall see in the next
section.

5.4 The Dirichlet Problem

5.4.1 Variational Integral

The goal of this section is to employ the direct method in establishing minimizers
of some functionals. Then we proceed to investigate connections between these
functionals and weak solutions of some elliptic PDEs. It turns out that there is a
close relation between the weak formulation (4.2.5) of the form

B[u, v] − ⟨f , v⟩ = 0

of a PDE and some integral functional. If we set v = u in the weak formulation, we
obtain the following integral functional:

J [v] = B[u, v] − ⟨f , v⟩ .

It was observed that the minimizer of the functional J , i.e., the element u0 that
minimizes (locally or globally) the value of J , is the solution of the weak formulation,
which in turn implies that u0 is the weak solution of the associated PDE from which
the bilinear form B was derived, and given that B is identified by an integral, the same

applies to J . The corresponding functional J is thus called a variational functional,
or variational integral. The word “variational” here refers to the process of extremization
of the functional, i.e., finding extrema and extremizers of a functional. As we
mentioned earlier in the chapter, we are only interested in minimization problems.
This observation of the link between the two concepts (i.e., weak solutions of
PDEs and minimizers of their corresponding variational integrals) is old, and its
roots go back to Gauss, but it was Dirichlet who formulated the problem for
the Laplace equation and provided a framework for this principle, which is regarded
as one of the greatest mathematical ideas affecting the shape and structure of modern
analysis.

5.4.2 Dirichlet Principle

Consider the Dirichlet problem for the Laplace equation

∇²u = 0 in Ω,
u = 0 on ∂Ω.        (5.4.1)

Recall that the weak formulation of the Laplace equation takes the form

∫Ω ∇u · ∇v dx = ∫Ω f v dx.

Letting u = v above, we obtain the nonnegative functional

E[v] = (1/2) ∫Ω |∇v|² dx.

Definition 5.4.1 (Dirichlet Variational Integral) The integral

E[v] = (1/2) ∫Ω |∇v|² dx

is called the Dirichlet integral.


Physically, v refers to the electric potential and the integral E stands for the energy
of a continuous charge distribution, and consequently, the integral is sometimes
called the energy integral, or the Dirichlet energy. The problem of minimizing this
functional is justified by a physical principle which asserts that all natural systems
tend to minimize their energy.
Now we introduce the Dirichlet principle, due to Dirichlet in 1876, which is
regarded as a cornerstone of analysis and a landmark in the history of mathematics.

Theorem 5.4.2 (Dirichlet Principle) Let Ω ⊂ Rn be bounded, and consider the
collection

A = {v ∈ Cc²(Ω) : v = 0 on ∂Ω}.

A function

u ∈ Cc²(Ω) ∩ C⁰(Ω̄)

is a solution to problem (5.4.1) if and only if u is the minimizer over A of the Dirichlet
integral

E(u) = (1/2) ∫Ω |∇u|² dx.

Proof Note that for u, v ∈ Cc²(Ω), we have by integration by parts (divergence
theorem)

∫Ω ∇u · ∇v dx = [v∇u]∂Ω − ∫Ω (∇²u)v dx = − ∫Ω (∇²u)v dx.        (5.4.2)

Now, let u ∈ Cc²(Ω) be a solution to the Laplace equation. Let v ∈ A. Then by (5.4.2),

0 = ∫Ω (∇²u)v dx = − ∫Ω ∇u · ∇v dx,

so

∫Ω |∇(u + v)|² dx = ∫Ω |∇u|² dx + 2 ∫Ω ∇u · ∇v dx + ∫Ω |∇v|² dx
                  = ∫Ω |∇u|² dx + ∫Ω |∇v|² dx
                  ≥ ∫Ω |∇u|² dx,

for an arbitrary v in A. Multiplying both sides of the inequality by 1/2, we conclude
that u is a minimizer of E.
Conversely, assume that the functional E has a minimizer u, so that E(u) ≤ E(v)
for all v ∈ A. Let v ∈ A and choose t ∈ R. Then

E(u) ≤ E(u + tv).

Define the function h : R −→ R by

h(t) = E(u + tv);

then its derivative takes the form


 
h′(t) = d/dt [ (1/2) ∫Ω |∇(u + tv)|² dx ]
      = d/dt [ (1/2) ∫Ω ( |∇u|² + 2t ∇u · ∇v + t² |∇v|² ) dx ].

Note that the derivative is taken in t while the integration is in x, and since
u ∈ Cc²(Ω) ∩ C⁰(Ω̄) we can take the derivative inside, and so

h′(t) = ∫Ω ( ∇u · ∇v + t |∇v|² ) dx.

Since h has a minimum at 0, we have

0 = h′(0) = ∫Ω ∇u · ∇v dx.

Again, using (5.4.2) and noting that v = 0 on ∂Ω, this gives

0 = − ∫Ω v ∇²u dx

for every v ∈ A. By Theorem 3.3.4, we conclude that u is a solution to the Laplace
equation. □
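The key identity in the first half of the proof, that the cross term ∫Ω ∇u · ∇v dx vanishes for harmonic u and boundary-vanishing v, so that E(u + v) = E(u) + E(v) ≥ E(u), can be checked numerically in a 1D sketch (the choices u(x) = 2 + 3x and v(x) = sin(πx) are hypothetical):

```python
import math

# hypothetical choices: u(x) = 2 + 3x is harmonic in 1D (u'' = 0), and
# v(x) = sin(pi*x) vanishes at the boundary points 0 and 1
N = 100000
h = 1.0 / N

def du(x):
    return 3.0                              # u'(x)

def dv(x):
    return math.pi * math.cos(math.pi * x)  # v'(x)

def energy(dw):
    # midpoint rule for E(w) = (1/2) * integral of |w'|^2 over (0, 1)
    return 0.5 * sum(dw((i + 0.5) * h) ** 2 for i in range(N)) * h

cross = sum(du((i + 0.5) * h) * dv((i + 0.5) * h) for i in range(N)) * h
E_uv = 0.5 * sum((du((i + 0.5) * h) + dv((i + 0.5) * h)) ** 2 for i in range(N)) * h

# cross term vanishes, so E(u + v) = E(u) + E(v) >= E(u)
print(abs(cross) < 1e-9, abs(E_uv - (energy(du) + energy(dv))) < 1e-9)  # -> True True
```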

5.4.3 Weierstrass Counterexample

The principle gives the equivalence between the two famous problems. Based on this
principle, one can guarantee the existence of a harmonic function (which solves
the Laplace equation) if and only if a minimizer for the Dirichlet integral is obtained.
It is well known that harmonic functions satisfying Laplace's equation exist.
Now, if

∇²u = 0 in Ω

and u = 0 on the boundary, then we must have u = 0 in the entire region, so ∇u = 0
and the energy integral equals zero. However, that
wasn’t the end of the story. If we go the other way around, mathematicians in the late
nineteenth century started to wonder: does there exist a minimizer for the integral?
Riemann, a pupil of Dirichlet, argued that the minimizer of the integral exists since

E ≥ 0,

from which he incorrectly concluded that the infimum must be attained. In
1869, Weierstrass provided the following example of a functional with a finite infimum
but with no minimizer, i.e., the functional doesn't attain its infimum.
Example 5.4.3 (Weierstrass’s Example) Consider the Dirichlet integral
 1
xv (x) dx
2
E[v] =
−1

where
v ∈ A = {v ∈ Cc2 () : v((−1) = −1, v(1) = 1}.

Clearly E ≥ 0. Define on [−1, 1] the sequence

tan−1 nx
un (x) =
tan−1 n
Then, it is easy to see that E[un ] −→ 0, and so we conclude that

inf_{u∈A} E[u] = 0.

Now, if there exists u ∈ A such that E[u] = 0, then xu′(x) = 0 for all x, so u′ ≡ 0 and
u is constant, but this contradicts the boundary conditions.
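Weierstrass's example can be probed numerically (a hedged sketch assuming the reconstructed functional E[v] = ∫_{−1}^{1} (x v′(x))² dx and the sequence un above): the energies E[un] decrease toward 0, even though no admissible function attains 0.

```python
import math

def E(dv, N=200000):
    # midpoint rule for E[v] = integral over (-1, 1) of (x v'(x))^2 dx
    h = 2.0 / N
    total = 0.0
    for i in range(N):
        x = -1.0 + (i + 0.5) * h
        total += (x * dv(x)) ** 2
    return total * h

def dun(n):
    # derivative of u_n(x) = arctan(nx) / arctan(n)
    return lambda x: n / ((1.0 + (n * x) ** 2) * math.atan(n))

energies = [E(dun(n)) for n in (1, 10, 100, 1000)]
print(all(a > b for a, b in zip(energies, energies[1:])))  # -> True: E[u_n] decreases
print(energies[-1] < 1e-3)                                 # -> True: E[u_n] -> 0
```

Each un satisfies the boundary conditions un(±1) = ±1 exactly, so the infimum 0 is approached inside A without ever being reached.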
The example came as a shock to the mathematical community at that time, and the
credibility of the Dirichlet principle became questionable. Several mathematicians
started to rebuild the confidence in the principle by providing a rigorous proof for
the existence of the minimizer of the Dirichlet integral. In 1904, Hilbert provided
a “long and complicated” proof of the existence of the minimizer of E in C 2 using
the technique of minimizing sequence. After the rise of functional analysis, it soon
became clear that the key to achieve a full accessible proof lies in the direct method.
The Dirichlet principle is valid on C 2 but the minimizer cannot be guaranteed unless
we have a reflexive Banach space, and C 2 is not. The best idea was to enlarge the
space to the completion space of C 2 , which is nothing but the Sobolev space. It
came as no surprise to know that defining Sobolev spaces as the completion of
smooth functions C ∞ was for a long time the standard definition of these spaces.
Furthermore, according to Theorem 3.5.3, Sobolev spaces are reflexive Banach space
for 1 < p < ∞. It turns out that Sobolev spaces are the perfect spaces to use in order
to handle these problems in light of the direct method that we discussed in Sect. 5.3.
This was among the main reasons that motivated Sergei Sobolev to develop a theory
to construct these spaces, and this justifies Definition 3.7.4 from a historical point of
view.

5.5 Dirichlet Principle in Sobolev Spaces

5.5.1 Minimizer of the Dirichlet Integral in H01

The next two theorems solve the minimization problem of the Dirichlet integral over
two different admissible sets in a simple manner by applying Theorem 5.3.2.
Theorem 5.5.1 Let Ω be a bounded set in Rn . Then there exists a unique minimizer
of the Dirichlet integral over A = H01 (Ω).

Proof It is clear that E is bounded from below by 0, and so E has a finite infimum.
Moreover, E is coercive by Poincaré's inequality (or by considering the Poincaré
norm on H01 ). Let un ⇀ u in H01 (Ω). Then un ⇀ u and Dun ⇀ Du in L²(Ω).
By Proposition 5.3.3(3),

‖Du‖₂ ≤ lim inf ‖Dun ‖₂ ,

which implies

E[u] ≤ lim inf E[un ],        (5.5.1)

and so E[·] is w.l.s.c. Finally, it can easily be proved that E[·] is strictly convex, due
to the strict convexity of |·|² and the linearity of the integral. We therefore see that E
is bounded from below, coercive, w.l.s.c., and strictly convex, and it is defined on the
reflexive Banach space H01 (Ω). The result now follows from Theorem 5.3.2. □

5.5.2 Minimizer of the Dirichlet Integral in H 1

Now, we investigate the same equation, but with u = g on ∂Ω this time. Here we
assume g ∈ H¹(Ω), and consequently we also have u ∈ H¹(Ω). This boundary condition
can be interpreted by saying that the functions u and g have the same trace on
the boundary. Since both functions are Sobolev functions, they are measurable, and so
pointwise values on a set of measure zero have no effect. To avoid this problematic
issue on the boundary, we reformulate the condition in the form:

u − g ∈ H01 (Ω).

Accordingly, the admissible set consists of all functions in H¹(Ω) that are
equal, in the trace sense, to a fixed function g ∈ H¹(Ω) on ∂Ω. The following result
proves an interesting property of such a set.

Proposition 5.5.2 For a fixed g ∈ H¹(Ω), the admissible set

A = {u ∈ H¹(Ω) : u − g ∈ H01 (Ω)}


is weakly closed.

Proof Notice that the set A is convex. Indeed, let u, v ∈ A, let θ ∈ [0, 1], and set
w = θu + (1 − θ)v; we clearly have w ∈ H¹(Ω), this being a linear space. We also have

w − g = θu + (1 − θ)v − g
      = θ(u − v) + v − g
      = θ(u − g) − θ(v − g) + (v − g)
      ∈ H01 (Ω),

since
(u − g), (v − g) ∈ H01 (Ω),

which is a linear space. Hence, w ∈ A.


Further, let un ∈ A with un −→ u. Then

un − g ∈ H01 (Ω),

which is a closed subspace of H¹(Ω), so

lim(un − g) = u − g ∈ H01 (Ω),

hence u ∈ A, from which we conclude that A is closed, and therefore, being convex,
it is weakly closed. □

The next result is a variant of Theorem 5.5.1 for the space H¹(Ω). The source of
difficulty here is that the Poincaré inequality is not applicable directly.

Theorem 5.5.3 For a bounded set Ω in Rn and a fixed g ∈ H¹(Ω), there exists a
unique minimizer of the Dirichlet integral over the set

A = {u ∈ H¹(Ω) : u − g ∈ H01 (Ω)}.

Proof In view of the proof of Theorem 5.5.1, it suffices to show that the minimizing
sequence is bounded. If uj ∈ A is a minimizing sequence with J [uj ] < ∞, then

sup ‖Duj ‖Lq < ∞.

We also have
uj − g ∈ W0^{1,q} (Ω),

so using Poincaré's inequality (with a constant C > 0) we have

‖uj ‖Lq = ‖uj − g + g‖Lq
       ≤ ‖uj − g‖Lq + ‖g‖Lq
       ≤ C ‖D(uj − g)‖Lq + ‖g‖Lq
       ≤ C ( ‖Duj ‖Lq + ‖Dg‖Lq ) + C1
       ≤ C2 .

Hence,
sup ‖uj ‖Lq < ∞,

and consequently, (uj ) is bounded in W^{1,q} (Ω). □
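A hedged 1D finite-difference sketch of this minimization (the grid, the boundary values, and the Gauss–Seidel iteration are illustrative choices, not the text's method): minimizing the discrete Dirichlet energy over grid functions with fixed boundary values drives the energy down and produces the discrete harmonic function, here a straight line.

```python
# grid size, boundary data, and iteration scheme are illustrative choices
N = 50
h = 1.0 / N

def energy(u):
    # discrete Dirichlet energy (1/2) * sum |(u_{i+1} - u_i)/h|^2 * h
    return 0.5 * sum(((u[i + 1] - u[i]) / h) ** 2 for i in range(N)) * h

u = [0.0] * (N + 1)
u[N] = 2.0                     # boundary data: u(0) = 0, u(1) = 2

e0 = energy(u)
# Gauss-Seidel sweeps: each update minimizes the energy in one coordinate,
# with the boundary values held fixed (the role of the condition u - g in H_0^1)
for _ in range(20000):
    for i in range(1, N):
        u[i] = 0.5 * (u[i - 1] + u[i + 1])

print(energy(u) < e0, abs(u[N // 2] - 1.0) < 1e-9)  # -> True True
```

The limit is the linear function u(x) = 2x, the unique discrete minimizer compatible with the boundary data.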

5.5.3 Dirichlet Principle

Now we are ready to prove the Dirichlet principle in the general setting of Sobolev
spaces.
Theorem 5.5.4 Let Ω ⊂ Rn be bounded. A function u ∈ A = H01 (Ω) is a weak
solution to problem (5.4.1) if and only if u is the minimizer over A of the Dirichlet
integral.

Proof If u ∈ A is a weak solution to the Laplace equation, then u satisfies the weak
formulation of the Laplace equation, namely,

∫Ω Du · Dv dx = 0.

Let w ∈ A, which can be written as w = u + v for some v ∈ A. Then we have

∫Ω |D(u + v)|² dx = ∫Ω |Du|² dx + 2 ∫Ω Du · Dv dx + ∫Ω |Dv|² dx
                  = ∫Ω |Du|² dx + ∫Ω |Dv|² dx
                  ≥ ∫Ω |Du|² dx,

and so u is a minimizer of E[·]. Conversely, let

E(u) ≤ E(v)

for all v ∈ A. Theorem 4.5.2 proved the existence and uniqueness of the weak solution
of the problem (with f = 0), it was proved above that this weak solution is a minimizer
of the variational integral E[·], and Theorem 5.5.1 proved the uniqueness of the
minimizer, so the other direction is proved. □

5.5.4 Dirichlet Principle with Neumann Condition

The next result discusses the Dirichlet principle under a Neumann boundary condition,

−∇²u = 0,    x ∈ Ω,
∂u/∂n = g,   x ∈ ∂Ω,        (5.5.2)

for a bounded Lipschitz domain Ω ⊂ Rn and g ∈ C¹(∂Ω). It can be seen from the problem
that the solution is unique up to a constant. Let u ∈ C²(Ω̄) be a classical solution of
problem (5.5.2). Multiplying the equation by v ∈ C²(Ω̄) and integrating over Ω, then
using Green's formula,

0 = ∫Ω (∇²u)v dx = ∫∂Ω (∂u/∂n) v ds − ∫Ω ∇u · ∇v dx,

or
∫Ω ∇u · ∇v dx = ∫∂Ω (∂u/∂n) v ds,        (5.5.3)

which gives the corresponding variational integral

J [v] = E[v] − ∫∂Ω gv ds.        (5.5.4)

Now, letting v = 1 and substituting in (5.5.3) gives

∫∂Ω (∂u/∂n) ds = 0.

This is a compatibility condition on the data, and it is essential for the existence of a
solution of the problem; the solution is then unique up to an additive constant.
As we did in Theorem 5.4.2, we will first prove the principle in the classical case.
Theorem 5.5.5 A function u ∈ C²(Ω̄) is a solution of problem (5.5.2) if and only if
u is the minimizer of the variational integral (5.5.4) over

A = { v ∈ C²(Ω̄) : ∫∂Ω (∂v/∂n) ds = 0 }.


Proof Note that the admissible space here is contained in C²(Ω̄), so our functions u, v
don't necessarily vanish on ∂Ω. Let u ∈ C²(Ω̄) be a solution of problem (5.5.2). Let
w ∈ A, and write w = u − v for some v ∈ C²(Ω̄). Then

J [w] = J [u − v]
      = (1/2) ∫Ω |∇(u − v)|² dx − ∫∂Ω (u − v)g ds
      = ( (1/2) ∫Ω |∇u|² dx − ∫∂Ω gu ds ) + ( (1/2) ∫Ω |∇v|² dx − ∫Ω ∇u · ∇v dx + ∫∂Ω (∂u/∂n) v ds ).

Note that if v is a constant, then J [w] = J [u]. Substituting v = c in the above
equality, the condition

∫∂Ω (∂u/∂n) ds = 0

is verified, and so u ∈ A. Using Green's first formula in the last two integrals on
the right-hand side of the equation yields

− ∫Ω ∇u · ∇v dx + ∫∂Ω (∂u/∂n) v ds = ∫Ω v ∇²u dx = 0.

Substituting above gives

J [w] = J [u] + (1/2) ∫Ω |∇v|² dx ≥ J [u],

hence u is the minimizer of J .


Conversely, let u be a minimizer of J . Let v ∈ A and choose t ∈ R. Then

J (u) ≤ J (u + tv).

So if we define the function h : R −→ R by

h(t) = J (u + tv)
     = (1/2) ∫Ω |∇(u + tv)|² dx − ∫∂Ω g(u + tv) ds
     = (1/2) ∫Ω |∇u|² dx + t ∫Ω ∇u · ∇v dx + (t²/2) ∫Ω |∇v|² dx − ∫∂Ω gu ds − t ∫∂Ω gv ds,

then it is clear that h has a minimum value at 0. If h is differentiable at 0, then
h′(0) = 0. This gives

0 = h′(0) = ∫Ω ∇u · ∇v dx − ∫∂Ω gv ds.

Applying Green’s formula again gives


  
∂u
0= vds − v∇ 2 udx − gvds,
∂ ∂n  ∂

or   
∂u
v∇ udx =
2
vds − gvds. (5.5.5)
 ∂ ∂n ∂

Remember that we have yet to prove that u is a solution of the Laplace equation, so we
cannot say that ∇²u = 0. The trick here is to reduce the admissible space by adding a
suitable condition, namely v = 0 on ∂Ω. This gives the following reduced admissible
space:

A0 = {v ∈ Cc²(Ω)} ⊂ A.

So, if (5.5.5) holds for all v ∈ A, then it holds for all v ∈ A0 , i.e., for v = 0 on ∂Ω, and
consequently the integrals on the RHS of (5.5.5) vanish; we thus obtain

∫Ω v ∇²u dx = 0

for every v ∈ A0 , from which we conclude that

∇²u = 0.

Now, it remains to show that u satisfies the boundary condition. Going back to our
admissible space A, and since the left-hand side of (5.5.5) is zero, this gives

0 = ∫∂Ω ( ∂u/∂n − g ) v ds

for all v ∈ A. By the Fundamental Lemma of the calculus of variations, we get the
Neumann boundary condition, and the proof is complete. □

It is important to observe how the variational integral changes its formula although it
corresponds to the same equation, namely, the Laplace equation. It is therefore essential
in this type of problem to determine the admissible set in which the candidate
functions compete, since changing the boundary conditions usually causes a change
in the corresponding variational integral.

5.5.5 Dirichlet Principle with Neumann B.C. in Sobolev Spaces

We end the section with the Dirichlet principle with Neumann condition over Sobolev
spaces.
Theorem 5.5.6 A function u ∈ H¹(Ω), for bounded Ω ⊂ Rn , is a weak solution of
problem (5.5.2) if and only if u is the minimizer of the variational integral (5.5.4)
over

A = { v ∈ H¹(Ω) : ∫∂Ω (∂v/∂n) ds = 0 }.

Proof The (if) direction is quite similar to the argument for the preceding theorem.
Conversely, let u be a minimizer of J . It has been shown that the problem (5.5.2) has
a unique weak solution (see Problem 4.11.15(a)), and the argument above showed
that a weak solution is a minimizer to the variational integral (5.5.4), so it suffices
to prove that there exists a unique minimizer for (5.5.4), but this is true since J is
strictly convex. 

We end the section with the following important remark: in the proof of the preceding
theorem, to show that a minimizer is a weak solution of the problem, one may argue
as in the proof of Theorem 5.5.5. Namely, let t ∈ R, and define the function

h(t) = J (u + tv) = (1/2) ∫Ω |∇(u + tv)|² dx − ∫∂Ω (u + tv)g ds.

Again, h has a minimum at 0, and so h′(0) = 0. Differentiating both sides, then
substituting t = 0, gives

0 = h′(0) = ∫Ω ∇u · ∇v dx − ∫∂Ω gv ds.        (5.5.6)

Hence, u satisfies the weak formulation (5.5.3). Although the argument seems valid,
we may in fact have a technical issue with it. Generally speaking, moving
the derivative inside the integral when one of the functions in the integrand is not
smooth can be problematic, and this operation should be performed with extra care.
We should also note that differentiating h implies that the integral functional J should
be differentiable in some sense and the two derivatives are equal. The next section will
elaborate more on this point, and legitimize this previous argument by introducing a
generalization of the notion of derivative that can be applied to functionals that are
defined on infinite-dimensional spaces.

5.6 Gateaux Derivatives of Functionals

5.6.1 Introduction

In Sect. 5.3, we discussed the direct method and indirect method and the comparison
between them. The former is based on functional analysis, whereas the latter is based
on calculus. Two reasons for choosing the direct method are:
1. discussing the direct method fits the objective and scope of this book, and
2. we didn’t have tools to differentiate the variational integrals E[·] and J [·].
In this section, we will introduce the notion of differentiability of functionals and
will apply it to our variational integrals. As the title of the section suggests, we are
merely concerned with the idea of differentiating the variational integrals, and the
topic of differential calculus on metric function spaces is beyond the scope of the
book.

5.6.2 Historical Remark

One of the main reasons that caused the direct method to appear and thrive was the
lack of the differential calculus required to deal with functionals defined on function
spaces at that time. Hilbert and his pupils, in addition to their contemporaries, didn't
have the machinery of calculus to deal with these “functionals”. The theory of
differential calculus in function and metric spaces was developed by René Fréchet and
René Gateaux, and they both were still pupils studying mathematics when Hilbert
published his direct method in 1900. The first work that appeared on differential
calculus was Fréchet's thesis in 1906, but the theory wasn't clear and rich enough
to use at that time. Gateaux started to work on the theory in 1913, and his work was not
published until 1922. The theory of differential calculus in metric and Banach spaces
soon started to attract attention, and it became an indispensable tool in the area
of calculus of variations due to its efficiency and richness in techniques. The present
section gives a brief introduction to the theory. We will not give a comprehensive
treatment of the theory, but rather highlight the important results that suffice for our
needs in this chapter, and show how calculus can enrich the area of variational methods
and simplify the task of solving minimization problems.

5.6.3 Gateaux Derivative

Recall that the directional derivative of a function f : Rn −→ R at x ∈ Rn in the
direction of v is given by

f (x + tv) − f (x)
Dv f (x) = lim . (5.6.1)
t→0 t

To extend this derivative to infinite dimensions in a general normed space X , we let

Dv f (x) = Df (x; v) : X × X −→ R

be given by the limit definition (5.6.1). If the limit above exists, then the quantity
Dv f (x) is called the Gateaux differential of f at x in the direction of v ∈ X , or simply
the G-differential. So at each single point, there are two G-differentials in one dimension
and infinitely many of them in two or more dimensions. Let us take a closer look at
the differential in (5.6.1). Let x0 ∈ X , for some normed space X , be a fixed point at
which the G-differential in the direction of v = x − x0 , for some x ∈ X , exists. Then
for every t > 0, we have

|f (x) − f (x0 ) − Df (x0 ; x − x0 )| ≤ t ‖x − x0 ‖ .

In order for the above inequality to make sense, it is required that

Df (x0 ; x − x0 ) ∈ R.

On the other hand, if f in (5.6.1) is a mapping defined on Rn , then the G-differential


reduces to:
Df (x0 ; v) = ∇f (x0 ) · v. (5.6.2)

It is well known from linear algebra that a function f : Rn −→ R is linear if and only if
there exists w ∈ Rn such that f (x) = x · w for every x ∈ Rn ; hence when we extend to
an infinite-dimensional space X , the dot product ∇f (x0 ) · v gives rise to a functional
Df (x0 , v) from X to R, and we can alternatively use the inner product notation

Df (x0 ; v) = ⟨Df (x0 ), v⟩ .        (5.6.3)

Now, letting v = x − x0 and substituting in (5.6.2) gives

Df (x0 , v) = Df (x0 )(x − x0 ), (5.6.4)

and since x − x0 ∈ X and Df (x0 , v) ∈ R, the quantity Df (x0 ) must be a functional
defined on X . This gives a good definition for the “Gateaux derivative” to exist and
be defined on X ∗ × X rather than X × X , as for the directional derivative in Rn , but
this requires the G-differential to exist in all directions v ∈ X , so that we can define
Df (x0 ) as a functional on X . Moreover, the form (5.6.4) suggests the derivative
Df (x0 ; v) should be linear in v and bounded. Indeed,

|Df (x0 , v)| = |Df (x0 )v| ≤ ‖Df (x0 )‖ ‖v‖ ,



and

Df (x0 , v + w) = Df (x0 )(v + w)


= Df (x0 )v + Df (x0 )w
= Df (x0 , v) + Df (x0 , w).

So we have Df (x0 ) ∈ X ∗ , and Df (x0 , v) : X ∗ × X −→ R. The functional

Dv f (x) = Df (x, v)

is the Gateaux derivative of f at x, and f is said to be Gateaux differentiable at x.


This motivates the following definition for Gateaux derivative of functionals.
Definition 5.6.1 (Gateaux Derivative) Let x0 ∈ X for some normed space X , and
consider the functional f : X −→ R. Define the mapping (Df )(·, ·) : X ∗ × X −→ R
by

Df (x0 , v) = Df (x0 )v = lim_{t→0} ( f (x0 + tv) − f (x0 ) ) / t

(i.e., Df (x0 ) is bounded and linear in v). If Df (x0 , v) exists at x0 in all directions
v ∈ X , then Df (x0 , v) is called the Gateaux derivative of f , and f is said to be
Gateaux differentiable at x0 .
Care must be taken in case the domain of f is not all the space X , as we must
ensure the definition applies to points in the interior of D(f ). Furthermore, as we
observe from the definition, it requires the Gateaux derivative to be bounded and
linear in v. This is especially helpful in the calculus of variations to obtain results
that are consistent with the classical theory of differential calculus.

5.6.4 Basic Properties of G-Derivative

The following proposition shows that the G-derivative rules operate quite similar to
those for the classical derivative.
Proposition 5.6.2 Let f , g : X −→ R be G-differentiable on X , and let c ∈ R. Then

(1) Df (u, cv) = cDf (u, v).
(2) D(cf ) = cDf .
(3) D(f ± g) = Df ± Dg.
(4) D(f · g) = f Dg + g Df .
(5) Dv (ex ) = vex .

Proof Exercise. 
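The product rule (4) can be checked numerically with a finite-difference approximation of the G-differential (the functions f, g, the point, and the direction below are illustrative choices, not from the text):

```python
# finite-difference approximation of the G-differential DF(x; v);
# f, g, the point x, and the direction v are hypothetical examples
def gateaux(F, x, v, t=1e-6):
    # central difference of t -> F(x + t v) at t = 0
    xp = [xi + t * vi for xi, vi in zip(x, v)]
    xm = [xi - t * vi for xi, vi in zip(x, v)]
    return (F(xp) - F(xm)) / (2 * t)

f = lambda x: x[0] ** 2 + x[1]
g = lambda x: 3 * x[0] - x[1] ** 2
fg = lambda x: f(x) * g(x)

x, v = [1.0, 2.0], [0.5, -1.0]
lhs = gateaux(fg, x, v)
rhs = f(x) * gateaux(g, x, v) + g(x) * gateaux(f, x, v)
print(abs(lhs - rhs) < 1e-4)   # -> True: D(f*g) = f*Dg + g*Df at (x; v)
```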

5.6.5 G-Differentiability and Continuity

One significant difference between the classical derivative f ′ and the G-derivative is
that if a functional is differentiable at x0 in the classical sense, then it is continuous at
x0 . This is not the case for the G-derivative Df (x0 , v). The following example demonstrates
this fact.
Example 5.6.3 Consider the function f : R² −→ R given by

f (x, y) = xy⁴/(x² + y⁸) for (x, y) ≠ (0, 0),   f (0, 0) = 0.

It is easy to show that f is not continuous at (0, 0) by showing that the limit at (0, 0)
doesn't exist. However, let v = (v1 , v2 ) ∈ R². Then applying the limit definition at
x0 = (0, 0) gives

lim_{t→0} f (tv1 , tv2 )/t = lim_{t→0} (1/t) · tv1 (tv2 )⁴ / ( (tv1 )² + (tv2 )⁸ )
                          = lim_{t→0} t² v1 v2⁴ / ( v1² + t⁶ v2⁸ )
                          = 0,

so
Df (0, 0)v = 0

for all v ∈ R², which implies that Df (0, 0) = 0.
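Both claims in the example can be probed numerically (an illustrative sketch): along the curve x = y⁴ the values of f stay near 1/2, so f is discontinuous at the origin, while every directional difference quotient at the origin tends to 0.

```python
def f(x, y):
    # Example 5.6.3: f(x, y) = x*y^4 / (x^2 + y^8) away from the origin
    return 0.0 if (x, y) == (0.0, 0.0) else x * y ** 4 / (x ** 2 + y ** 8)

# discontinuity at (0, 0): along the curve x = y^4 the values stay near 1/2
print(all(abs(f(s ** 4, s) - 0.5) < 1e-9 for s in (0.1, 0.01, 0.001)))  # -> True

# yet every directional quotient f(t*v1, t*v2)/t at the origin tends to 0
t = 1e-4
print(all(abs(f(t * v1, t * v2) / t) < 1e-3
          for v1, v2 in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-2.0, 3.0)]))  # -> True
```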


One way to explain the above example is that the G-derivative, by its very definition, does not require a norm on the space; consequently, it is not tied to the convergence properties of the space. This is why a functional can be G-differentiable at a point and discontinuous at that point.
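The behavior in Example 5.6.3 can be checked numerically. The sketch below (my construction, using the setting of the example) verifies that the difference quotients f (tv)/t vanish in every direction, while along the curve x = y⁴ the function stays at 1/2, so it has no limit at the origin:

```python
# Numerical illustration (my construction) of Example 5.6.3: the function
# f(x, y) = x y^4 / (x^2 + y^8) has all directional difference quotients
# f(t v)/t tending to 0 at the origin, yet has no limit there.

def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y ** 4 / (x ** 2 + y ** 8)

# Every directional quotient f(t v)/t shrinks to 0 as t -> 0.
for v in ((1.0, 0.0), (1.0, 1.0), (-2.0, 3.0)):
    q = [f(t * v[0], t * v[1]) / t for t in (1e-2, 1e-4, 1e-6)]
    assert abs(q[-1]) < 1e-6

# But along the curve x = y^4 the value is identically 1/2, so f is not
# continuous at (0, 0).
for y in (1e-1, 1e-3, 1e-5):
    assert abs(f(y ** 4, y) - 0.5) < 1e-12
```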

5.6.6 Frechet Derivative

A stronger form of derivative, which does ensure continuity, is the Frechet derivative. It results from the G-derivative when the convergence of the limit is uniform in all directions, i.e., does not depend on v as in the case of G-differentiability. Writing x − x0 = v, Frechet differentiability of f at x0 requires

lim_{v→0} | f (x0 + v) − f (x0 ) − Df (x0 )v | / ‖v‖X = 0,

where ‖·‖X is the norm on the space X . If the Frechet derivative exists at a point x0 , then the G-derivative exists at x0 and the two coincide; moreover, the function is continuous at x0 . However, the G-derivative may exist while the F-derivative does not, which is why working with the "weaker" type of differentiation is more flexible and realistic in many cases. Moreover, due to the norm and the uniform convergence, demonstrating Frechet differentiability is sometimes not an easy task, and evaluating the G-derivative is usually easier than the F-derivative. More importantly, since the G-derivative can differentiate functions that are not continuous, it suits Sobolev functions and suffices for our needs in this regard, so we will confine the discussion to it throughout the chapter.
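For the function of Example 5.6.3, the Frechet condition fails at the origin: since Df (0, 0) = 0, Frechet differentiability would force |f (v) − f (0, 0)| / ‖v‖ −→ 0, but along v = (y⁴, y) this quotient blows up. A small numerical check (my construction, assuming the Euclidean norm):

```python
import math

# Frechet differentiability at (0,0) would require the remainder quotient
# |f(v) - f(0,0) - 0·v| / ||v|| to vanish as v -> 0. Along v = (y^4, y)
# we have f(v) = 1/2 while ||v|| -> 0, so the quotient diverges.

def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x * y ** 4 / (x ** 2 + y ** 8)

ratios = []
for y in (1e-1, 1e-2, 1e-3):
    norm = math.hypot(y ** 4, y)            # Euclidean norm of v = (y^4, y)
    ratios.append(abs(f(y ** 4, y)) / norm) # remainder quotient ~ 1/(2|y|)

assert ratios[0] < ratios[1] < ratios[2]    # the quotient grows without bound
assert ratios[2] > 100.0
```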

5.6.7 G-Differentiability and Convexity

It is well-known in analysis that one way to interpret convexity geometrically is that the graph of a convex function lies above all of its tangent lines. This can be expressed by saying that

f (x) − f (y) ≥ f ′(y)(x − y)

for all x, y ∈ D(f ). The following result extends the above fact to
infinite-dimensional spaces.
Theorem 5.6.4 Let f : Ω ⊆ X −→ R be G-differentiable on a convex set Ω in a normed space X . Then f is convex if and only if

f (v) − f (u) ≥ Df (u, v − u)

for all u, v ∈ Ω.
Proof Let f be convex. Then for u, v ∈ Ω,

tv + (1 − t)u = u + t(v − u) ∈ Ω,

and

f (tv + (1 − t)u) ≤ tf (v) + (1 − t)f (u)


= f (u) + t(f (v) − f (u)),

which implies that

( f (tv + (1 − t)u) − f (u) ) / t ≤ f (v) − f (u).
Passing to the limit as t −→ 0 gives the first direction. Conversely, let u, v ∈ Ω. Since Ω is convex,

w = tv + (1 − t)u = u + t(v − u) ∈ Ω.

We apply the inequality on the pair u, w, using u − w = −t(v − u) as the direction, and then apply it again on the pair w, v, using

v − w = (1 − t)(v − u)

as the direction; taking into account Proposition 5.6.2(1), this gives

f (u) − f (w) ≥ Df (w, −t(v − u)) = −tDf (w, v − u), (5.6.5)

f (v) − f (w) ≥ Df (w, (1 − t)(v − u)) = (1 − t)Df (w, v − u). (5.6.6)

Multiplying (5.6.5) by (1 − t) and (5.6.6) by t, respectively, and adding the two inequalities gives
f (u) + t(f (v) − f (u)) ≥ f (w),

which implies that f is convex. □
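Theorem 5.6.4 can be illustrated in finite dimensions. For the convex functional f (u) = ‖u‖² on R³, the G-derivative is Df (u, w) = 2u · w, and the inequality f (v) − f (u) ≥ Df (u, v − u) holds with slack exactly ‖v − u‖². A sketch (my own sample functional, not from the text):

```python
import random

# Numerical illustration of Theorem 5.6.4 for the convex functional
# f(u) = ||u||^2 on R^3, whose G-derivative is Df(u, w) = 2 u . w.
# The slack in the convexity inequality is exactly ||v - u||^2 >= 0.

random.seed(0)

def f(u):
    return sum(x * x for x in u)

def Df(u, w):
    return 2.0 * sum(x * y for x, y in zip(u, w))

for _ in range(100):
    u = [random.uniform(-5, 5) for _ in range(3)]
    v = [random.uniform(-5, 5) for _ in range(3)]
    w = [b - a for a, b in zip(u, v)]
    assert f(v) - f(u) >= Df(u, w) - 1e-9   # convexity inequality
```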
One important consequence is the following remarkable result, which may facilitate our efforts in demonstrating the weak lower semicontinuity of functionals, a task that in many cases can be complicated. According to the following result, this property is guaranteed if the functional is G-differentiable and convex.
Theorem 5.6.5 If f : X −→ R is convex and G-differentiable at x, then f is weakly
lower semicontinuous at x.
Proof Let un ⇀ u in X . By Theorem 5.6.4, for each n we write

f (un ) − f (u) ≥ Df (u, un − u) = Df (u)(un − u),

and we know that Df (u) ∈ X ∗ , so by the definition of weak convergence, we have

Df (u)(un − u) −→ 0,

from which we get, after taking the limit inferior of both sides of the inequality above,

lim inf f (un ) − f (u) ≥ 0,

which is precisely the weak lower semicontinuity of f at u. □

5.6.8 Higher Gateaux Derivative

We can use the same discussion above in defining a second order G-derivative. Let
Df (x0 ) ∈ X ∗ be the G-derivative of a functional f : X −→ R. We define the G-
derivative of Df (x0 , v) in the direction of w ∈ X as

D²f (x0 , v, w) = lim_{t→0} ( Df (x0 + tw, v) − Df (x0 , v) ) / t.

This gives the second G-derivative of f in the directions v and w, taking the form

(D²f (x0 )v)w = ⟨D²f (x0 )v, w⟩ ∈ R,

which defines a bilinear pairing on X × X . The second G-derivative D²f (x0 ) thus defines a continuous bilinear map B : X × X −→ R in the directions v, w, given by

B[v, w] = (D²f (x0 )v)w.

This suggests the following Gateaux variant of Taylor's theorem.

Theorem 5.6.6 Let f : X −→ R, for some normed space X , be twice G-differentiable. Then for some t0 ∈ (0, 1), we have

f (u + tv) = f (u) + t⟨Df (u), v⟩ + (t²/2)⟨D²f (u + t0 v)v, v⟩.    (5.6.7)
Proof Define the real-valued function

ϕ(t) = f (u + tv).

Since ϕ is twice differentiable, we can apply a second-order Taylor expansion on (0, t) to obtain

ϕ(t) = ϕ(0) + tϕ′(0) + (t²/2!)ϕ″(t0 ).

This yields the expansion (5.6.7). □

Theorem 5.6.6 will be used to characterize the convexity of functionals in terms of their first and second G-derivatives.
Theorem 5.6.7 Let f : Ω ⊆ X −→ R be twice G-differentiable on a convex set Ω in a normed space X . Then the following are equivalent:

(1) f is convex.
(2) f (v) − f (u) ≥ Df (u, v − u) for all u, v ∈ Ω.
(3) ⟨D²f (u)v, v⟩ ≥ 0 for all u, v ∈ Ω.

Proof The equivalence between (1) and (2) has been proved in Theorem 5.6.4. The
equivalence between (2) and (3) follows easily from (5.6.7). 

The next theorem provides a procedure for finding the G-derivative of a general variational form.

Theorem 5.6.8 Let

J [u] = (1/2)B[u, u] − L(u)

be the variational integral associated with some elliptic PDE, for some bounded symmetric bilinear form B and bounded linear functional L. Then:

(1) DJ (u)v = B[u, v] − L(v).
(2) D²J (u, v)v = B[v, v].

Proof For (1), we have

DJ (u)v = lim_{t→0} ( J [u + tv] − J [u] ) / t
        = lim_{t→0} (1/t) [ (1/2)B[u + tv, u + tv] − L(u + tv) − (1/2)B[u, u] + L(u) ]
        = lim_{t→0} (1/t) [ tB[u, v] + (t²/2)B[v, v] − tL(v) ]
        = B[u, v] − L(v).

For (2), we have

D²J (u, v)v = lim_{t→0} ( DJ (u + tv)v − DJ (u)v ) / t
            = lim_{t→0} (1/t) [ B[u + tv, v] − L(v) − (B[u, v] − L(v)) ]
            = lim_{t→0} (1/t) · tB[v, v] = B[v, v]. □

The arguments were fairly straightforward but we preferred to write the details out
due to the significance of the result. The functional J in the theorem represents the
variational functional from which an equivalent minimization problem can be defined
for an elliptic PDE. One consequence of the above theorem is that the G-derivative
of the variational functional is nothing but the weak formulation of the associated
PDE. Moreover, the second G-derivative of the functional is the elliptic bilinear form
B evaluated at the new variation v. If B is coercive, then the second G-derivative of J is positive for all v ≠ 0.
Another consequence is that any critical point of J is a weak solution of the
PDE associated with the functional J . Note from Theorem 5.6.8(1) that the equation
DJ (u, v) = 0 gives B[u, v] = L(v), which is the weak formulation of the equation
associated to the variational integral J .
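A finite-dimensional sketch of Theorem 5.6.8 (my construction, with Rⁿ standing in for the function space): for B[u, v] = uᵀAv with A symmetric positive definite and L(v) = bᵀv, the G-derivative of J [u] = (1/2)B[u, u] − L(u) is B[u, v] − L(v), and the critical point is exactly the solution of the "weak formulation" Au = b:

```python
# Finite-dimensional sketch of Theorem 5.6.8: for B[u, v] = u^T A v with
# A symmetric positive definite and L(v) = b^T v, the functional
# J[u] = (1/2) B[u, u] - L(u) has G-derivative DJ(u)v = B[u, v] - L(v),
# which vanishes in every direction exactly when A u = b.

A = [[4.0, 1.0], [1.0, 3.0]]              # symmetric positive definite
b = [1.0, 2.0]

def B(u, v):
    return sum(u[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

def J(u):
    return 0.5 * B(u, u) - sum(bi * ui for bi, ui in zip(b, u))

def DJ(u, v, t=1e-6):
    # symmetric difference quotient approximating the G-derivative
    up = [ui + t * vi for ui, vi in zip(u, v)]
    um = [ui - t * vi for ui, vi in zip(u, v)]
    return (J(up) - J(um)) / (2 * t)

u = [0.3, -1.1]
for v in ([1.0, 0.0], [0.0, 1.0], [2.0, -1.0]):
    exact = B(u, v) - sum(bi * vi for bi, vi in zip(b, v))
    assert abs(DJ(u, v) - exact) < 1e-6   # DJ(u)v = B[u, v] - L(v)

# The critical point solves Au = b; the 2x2 system is solved by Cramer's rule.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
u_star = [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
          (A[0][0] * b[1] - A[1][0] * b[0]) / det]
for v in ([1.0, 0.0], [0.0, 1.0]):
    assert abs(DJ(u_star, v)) < 1e-6      # weak formulation: B[u*, v] = L(v)
```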

5.6.9 Minimality Condition

We end the section with some calculus techniques for functionals, used to explore minimizers via the first and second G-derivatives. The first concerns the critical points of a functional. The term "local" will carry the same meaning as in calculus: a local minimum point (or function) is an interior point at which the function (or functional) attains a minimum value.
Theorem 5.6.9 Let J : Ω ⊂ X −→ R be a functional that is G-differentiable on a convex set Ω.

(1) If u ∈ Ω is a local minimizer, then DJ (u, v) = 0 for all v ∈ X .
(2) If J is convex, then DJ (u, v) = 0 for all v ∈ X if and only if u is a local minimizer of J . If Ω = X , then u is a global minimizer.
(3) If J is strictly convex, then the minimizer u is unique.

Proof We will only prove (1) and (2), leaving (3) as an easy exercise for the reader.
(1): Let u be a local minimizer of the G-differentiable functional J . Let v ∈ Ω be any vector, and choose t > 0 small enough that u + tv ∈ Ω; this yields

J (u + tv) − J (u) ≥ 0.

Dividing by t and letting t −→ 0⁺ gives

DJ (u, v) ≥ 0.

Since this holds for arbitrary v, we may replace v by −v, and by Proposition 5.6.2(1),

−DJ (u, v) = DJ (u, −v) ≥ 0.

Hence DJ (u, v) = 0.

(2): Assume J is convex and DJ (u, v) = 0 for all v. Then for any v ∈ Ω, Theorem 5.6.4 gives

J (v) − J (u) ≥ DJ (u, v − u) = 0,

so u is a minimizer. □

In elementary calculus, it is well-known that a critical point, at which f ′ = 0, is not necessarily a local minimum: it may be a maximum or a saddle point, and one invokes the first or second derivative test to decide whether the point is an extremum. We will do the same here. The next result gives the second-order necessary condition: at a local minimizer, the second G-derivative of the functional is nonnegative.
Theorem 5.6.10 Let J : Ω ⊂ X −→ R be a twice G-differentiable functional on a convex set Ω. If u is a local minimizer for J , then

D²J (u)(v, v) ≥ 0 for all v.

Proof The result follows easily using Theorem 5.6.9(1) above and Taylor’s
Formula (5.6.7). 

5.7 Poisson Variational Integral

5.7.1 Gateaux Derivative of Poisson Integral

In this section, we will employ the tools and results of the previous section to establish some existence results and necessary conditions for minimizers. This "calculus-based" method is more flexible than the direct method (which deals only with existence problems) and provides us with various tools from calculus. We pick the Poisson variational integral (a generalization of the Dirichlet integral) as our first example, since we have already established minimization results and the equivalence to the weak solution for the Laplace equation. Recall that the Poisson variational integral takes the form
J [v] = (1/2)∫Ω |Dv|² dx − ∫Ω f v dx.    (5.7.1)
Proposition 5.7.1 The Poisson variational integral J : H1 (Ω) −→ R given by (5.7.1) is G-differentiable.

Proof The G-derivative of J at u can be calculated as follows. Let 0 < t ≤ 1. From (5.7.1), we write

J [u + tv] = (1/2)∫Ω ( |Du|² + 2t Du · Dv + t²|Dv|² ) dx − ∫Ω f (u + tv) dx.

This implies

( J [u + tv] − J [u] ) / t = ∫Ω ( Du · Dv + (t/2)|Dv|² ) dx − ∫Ω f v dx.

The t-dependent part of the integrand is dominated by the integrable function (1/2)|Dv|², so applying the Dominated Convergence Theorem gives

lim_{t→0} ∫Ω (t/2)|Dv|² dx = ∫Ω lim_{t→0} (t/2)|Dv|² dx = 0.

Therefore, the G-derivative of J [·] is

DJ (u)v = ∫Ω Du · Dv dx − ∫Ω f v dx. □
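Proposition 5.7.1 can be illustrated by discretizing the one-dimensional case Ω = (0, 1) (an illustrative construction of mine, with sample data f, u, v): the difference quotient (J [u + tv] − J [u])/t approaches ∫ u′v′ dx − ∫ f v dx as t −→ 0.

```python
import math

# Discrete sketch of Proposition 5.7.1 on Omega = (0, 1): for the energy
# J[v] = (1/2) \int |v'|^2 dx - \int f v dx on a uniform grid, the quotient
# (J[u + t v] - J[u]) / t tends to \int u'v' dx - \int f v dx as t -> 0.

N = 2000
h = 1.0 / N
xs = [i * h for i in range(N + 1)]

fv = [math.cos(math.pi * xi) for xi in xs]       # sample data f
u = [xi * (1.0 - xi) for xi in xs]               # sample u, u(0) = u(1) = 0
v = [math.sin(math.pi * xi) for xi in xs]        # direction v, v(0) = v(1) = 0

def J(w):
    grad = sum((w[i + 1] - w[i]) ** 2 for i in range(N)) / (2.0 * h)
    load = h * sum(fi * wi for fi, wi in zip(fv, w))
    return grad - load

def pairing(a, c):
    # discretization of \int a'c' dx - \int f c dx
    stiff = sum((a[i + 1] - a[i]) * (c[i + 1] - c[i]) for i in range(N)) / h
    load = h * sum(fi * ci for fi, ci in zip(fv, c))
    return stiff - load

t = 1e-6
ut = [ui + t * vi for ui, vi in zip(u, v)]
quotient = (J(ut) - J(u)) / t
# the discrepancy is exactly (t/2) * ||v'||^2, which is O(t)
assert abs(quotient - pairing(u, v)) < 1e-4
```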

We have two important observations.
(1) One advantage of using the G-derivative is that, in many cases (not all!), it is
easy to exchange integration and limits. This is because the integration process
takes place with respect to x, whereas the limit process is with respect to t, which
enables us to find an integrable function that dominates our integrand, and so the
Dominated Convergence Theorem can be applied.
(2) Observe that the G-derivative which was found in the preceding result has the
same form as the one found in (5.5.6) (except possibly the second integral term).
It shows that by defining
h(t) = J (u + tv),

we can evaluate the G-derivative of the functional by differentiating h. The Poisson variational functional is G-differentiable, and consequently the operation h′(t) is legitimate and makes sense. Therefore, according to the results of the preceding section, if u is a minimizer of a G-differentiable variational integral J , and we set
preceding section, if u is a minimizer of a G-differentiable variational integral
J , and we set
h(t) = J (u + tv),

then
h′(0) = DJ (u, v).

In other words, we can think of the minimizer u as a critical point of J .


Now, we will discuss the results using the new tools that we learned in the previous sections. The first result is the Dirichlet principle: the equivalence between the minimization problem and the weak solution of the Poisson equation.

Theorem 5.7.2 There exists a unique minimizer over H01 (Ω), where Ω is bounded in Rn, of the Poisson variational integral (5.7.1) for f ∈ L2 (Ω). Furthermore, u ∈ H01 (Ω) is the weak solution of the Dirichlet problem for the Poisson equation

−∇²u = f in Ω,    u = 0 on ∂Ω,

for f ∈ L2 (Ω), if and only if u is the minimizer of the Poisson variational integral.
Proof For the first part of the theorem, note that J is clearly strictly convex, coercive
by Poincare inequality, G-differentiable by Proposition 5.7.1, and so weakly l.s.c. by
Theorem 5.6.5. The result follows from Theorem 5.3.2. Now we prove the equiv-
alence of the two problems. Let u ∈ H01 () be the weak solution of the Poisson
problem. Let v ∈ H01 (), and write v = u − w for some w ∈ H01 (). Then

1  
J [v] = J [u − w] = |∇(u − w)|2 − (u − w)f dx.
2 

With simple calculations, this can be written as



J [v] = (1/2)∫Ω |∇u|² dx − ∫Ω f u dx + (1/2)∫Ω |∇w|² dx − ∫Ω ∇u · ∇w dx + ∫Ω f w dx.

Performing integration by parts on the fourth integral on the right-hand side, given that w|∂Ω = 0, yields

−∫Ω ∇u · ∇w dx = ∫Ω w ∇²u dx = −∫Ω f w dx.

Substituting above gives

J [v] = J [u] + (1/2)∫Ω |∇w|² dx ≥ J [u],

which holds for every v ∈ H01 (Ω); hence u is the minimizer of J . Conversely, let u be a minimizer of J . Writing J in terms of the Dirichlet bilinear form B[u, v] = ∫Ω ∇u · ∇v dx, we have

J [u + tv] − J [u] = (1/2)B[u + tv, u + tv] − ∫Ω f (u + tv) dx − (1/2)B[u, u] + ∫Ω f u dx
                  = t ( B[u, v] − ∫Ω f v dx ) + (t²/2)B[v, v].

Dividing by t and passing to the limit as t −→ 0 gives

DJ (u)v = B[u, v] − ∫Ω f v dx.

Since u is a minimizer, DJ (u)v = 0 for every v ∈ H01 (Ω) by Theorem 5.6.9(1); that is, B[u, v] = ∫Ω f v dx for all v, so u is a weak solution. □

As a consequence, we have
Corollary 5.7.3 u ∈ H01 (Ω) is a minimizer for the Poisson variational integral J [·] if and only if DJ (u) = 0.

Proof By Theorem 5.6.8, we have

DJ (u)v = B[u, v] − ∫Ω f v dx,
and note that the right-hand side represents the weak formulation of the Poisson
equation with homogeneous Dirichlet condition. Hence, DJ (u) = 0 if and only if u
is a weak solution to the Poisson equation, and by the previous theorem this occurs
if and only if u is a minimizer for the functional J . 

The above corollary views the vanishing of DJ (u) as a necessary and sufficient condition for minimization, but only for variational integrals of Poisson type.
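A discrete sketch of the Dirichlet principle (my construction): for −u″ = f on (0, 1) with u(0) = u(1) = 0 and f (x) = π² sin(πx), setting the gradient of the discretized Poisson energy to zero gives the standard tridiagonal system, whose solution approximates the exact minimizer u(x) = sin(πx).

```python
import math

# Discrete Dirichlet principle for -u'' = f on (0,1), u(0) = u(1) = 0,
# with f(x) = pi^2 sin(pi x) and exact solution u(x) = sin(pi x).
# DJ(u) = 0 gives (-u[i-1] + 2u[i] - u[i+1]) / h^2 = f[i] at interior nodes,
# solved here by the Thomas algorithm for tridiagonal systems.

N = 200
h = 1.0 / N
x = [i * h for i in range(N + 1)]
f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]

n = N - 1                                # number of interior unknowns
a = [-1.0 / h ** 2] * n                  # sub-diagonal
bd = [2.0 / h ** 2] * n                  # main diagonal
c = [-1.0 / h ** 2] * n                  # super-diagonal
d = f[1:N]                               # right-hand side

# forward elimination
for i in range(1, n):
    m = a[i] / bd[i - 1]
    bd[i] -= m * c[i - 1]
    d[i] -= m * d[i - 1]
# back substitution
u = [0.0] * n
u[-1] = d[-1] / bd[-1]
for i in range(n - 2, -1, -1):
    u[i] = (d[i] - c[i] * u[i + 1]) / bd[i]

# the discrete minimizer agrees with sin(pi x) to O(h^2)
err = max(abs(ui - math.sin(math.pi * xi)) for ui, xi in zip(u, x[1:N]))
assert err < 1e-3
```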

5.7.2 Symmetric Elliptic PDEs

Now we attempt to generalize our discussion of the Dirichlet principle to the following problem:

Lu = f in Ω,    u = 0 on ∂Ω,    (5.7.2)

for some uniformly elliptic operator L as in (4.5.3) with symmetric aij , c ∈ L∞(Ω), and c(x) ≥ 0 a.e. x ∈ Ω, for some open and bounded Ω in Rn, and f ∈ L2 (Ω). Recall that the elliptic bilinear form B[u, v] associated with an elliptic operator L is continuous, and B[v, v] is coercive if L is uniformly elliptic. Moreover, by uniform ellipticity, we have

Σ_{i,j=1}^n aij (x) ξi ξj ≥ λ0 |ξ|².

It has been shown (Theorem 4.5.5) that there exists a unique weak solution in H01 (Ω) for the problem. The bilinear map takes the form

B[u, v] = ∫Ω ( Σ_{i,j=1}^n aij (x) (∂u/∂xi )(∂v/∂xj ) + c(x)u(x)v(x) ) dx,    (5.7.3)

and so the variational integral associated to (5.7.2) (written in short form) is

J [v] = (1/2)B[v, v] − ∫Ω f v dx = (1/2)∫Ω ( A(x)|Dv|² + cv² ) dx − ∫Ω f v dx.    (5.7.4)

We will follow the same plan. Namely, we prove the existence of the minimizer, then we prove that the problem of finding the minimizer of (5.7.4) is equivalent to the problem of finding the weak solution of (5.7.2), i.e., the solution of the equation and the minimizer coincide. Remember that Theorem 4.5.5 states that there exists only one weak solution, so if our result is valid, there should be only one minimizer. Before establishing the result, we recall that for bounded sequences, the following identities are known and can be easily proved:

lim inf(−xn ) = − lim sup(xn ),    (5.7.5)

and

lim inf(xn ) + lim inf(yn ) ≤ lim inf(xn + yn ).    (5.7.6)

Theorem 5.7.4 There exists a unique minimizer over H01 (Ω), where Ω is bounded in Rn, of the variational integral (5.7.4) for f ∈ L2 (Ω).

Proof Let u ∈ H01 (Ω). Using the ellipticity condition and c(x) ≥ 0, we have

J [u] = (1/2)∫Ω A(x)|Du|² dx + (1/2)∫Ω cu² dx − ∫Ω f u dx
     ≥ (λ0 /2)‖Du‖²_{L2} − ∫Ω f u dx
     ≥ (λ0 /(2C² + 2))‖u‖²_{H1} − ∫Ω f u dx    (Poincaré's inequality)
     ≥ C1 ‖u‖²_{H1} − ‖f ‖_{L2} ‖u‖_{L2}    (C–S inequality, C1 = λ0 /(2C² + 2))
     ≥ C1 ‖u‖²_{H1} − (1/(4ε))∫Ω f ² dx − ε‖u‖²_{L2}    (Cauchy's inequality with ε, Lemma 4.4.5)
     ≥ ε( ‖u‖²_{H1} − ‖u‖²_{L2} ) − (1/(4ε))∫Ω f ² dx    (choosing ε ≤ C1 )
     = ε‖Du‖²_{L2} − (1/(4ε))∫Ω f ² dx    (definition of ‖·‖_{H1})
     ≥ −(1/(4ε))∫Ω f ² dx
     > −∞    (since f ∈ L2 (Ω)).

Hence, J is bounded from below. Moreover, the same chain of inequalities shows that for every v ∈ H01 (Ω),

J [v] + (1/(4ε))∫Ω f ² dx ≥ C1 ‖v‖²_{H1} − ε‖v‖²_{L2} ≥ ε‖Dv‖²_{L2} ≥ (ε/(C² + 1))‖v‖²_{H1},

using Poincaré's inequality in the last step.
Hence, J [·] is coercive. Further, J is strictly convex, being the sum of the strictly convex term (1/2)∫Ω A(x)|Du|² dx, the convex term (1/2)∫Ω cu² dx, and a linear term. Finally, let un ⇀ u in H1 (Ω). Since v −→ ∫Ω f v dx is a bounded linear functional on H1 (Ω), by the definition of weak convergence we have

lim ∫Ω un f dx = ∫Ω u f dx,

which implies that

lim inf ∫Ω un f dx = lim sup ∫Ω un f dx = ∫Ω u f dx.    (5.7.7)

By Proposition 5.3.3(3), we have

∫Ω cu² dx ≤ lim inf ∫Ω c un² dx,    (5.7.8)

and

‖Du‖²_{L2} ≤ lim inf ‖Dun ‖²_{L2} ,

and given that A = [aij (x)] ∈ L∞(Ω), it is readily seen (verify) that

I [u] := (1/2)∫Ω A |Du|² dx ≤ lim inf (1/2)∫Ω A |Dun |² dx = lim inf I [un ].    (5.7.9)

Using (5.7.5), we subtract lim sup(∫Ω un f dx) from the left-hand side of (5.7.9) and add the equal quantity lim inf(−∫Ω un f dx) to the right-hand side. This gives

I [u] − lim sup ∫Ω un f dx ≤ lim inf I [un ] + lim inf ( −∫Ω un f dx ).    (5.7.10)

Using (5.7.6) and (5.7.7) in (5.7.10) yields

I [u] − ∫Ω u f dx ≤ lim inf ( I [un ] − ∫Ω un f dx ),

and adding the term (1/2)∫Ω cu² dx via (5.7.8) and (5.7.6) in the same way, we conclude that J [u] ≤ lim inf J [un ], i.e., J [·] is w.l.s.c. The result now follows from Theorem 5.3.2. □
The step where we proved strict convexity is not necessary, since we already proved the functional is weakly l.s.c. In fact, establishing strict convexity in these cases matters only for proving the uniqueness of the minimizer, which has already been verified through Theorem 4.5.5.

5.7.3 Dirichlet Principle of Symmetric Elliptic PDEs

Next, we consider the problem of minimizing the functional (5.7.4) over the set of admissible functions A = {v ∈ H01 (Ω)}. The next theorem shows that the problem of finding a weak solution of (5.7.2) and the problem of finding the minimizer of (5.7.4) are equivalent.
Theorem 5.7.5 Consider the uniformly elliptic operator L defined in (5.7.2) for symmetric aij ∈ L∞(Ω), f ∈ L2 (Ω), and c(x) ≥ 0 a.e. x ∈ Ω, for some open and bounded Ω in Rn, and let B[u, v] be the elliptic bilinear map associated with L. Then u ∈ H01 (Ω) is a weak solution to (5.7.2) if and only if u is a minimizer for the variational integral (5.7.4) over A = {v ∈ H01 (Ω)}.
Proof Let u be a weak solution of (5.7.2). Then it satisfies the weak form

B[u, v] = f (v)    (5.7.11)

for every v ∈ H01 (Ω). Let w ∈ H01 (Ω) be arbitrary, and write w = u + v for some v ∈ H01 (Ω). Our claim is that

0 ≤ J [w] − J [u].

We have

J [u + v] − J [u] = (1/2)B[u + v, u + v] − ∫Ω f (u + v) dx − (1/2)B[u, u] + ∫Ω f u dx.    (5.7.12)

By simple computations, taking into account that B is symmetric, that u satisfies (5.7.11), and the fact from Theorem 4.5.4 that if L is uniformly elliptic, then B is coercive, we obtain

J [u + v] − J [u] = −∫Ω f v dx + B[u, v] + (1/2)B[v, v]
                 = (1/2)B[v, v]
                 ≥ (β/2)‖v‖²_{H01(Ω)}    (by coercivity of B)
                 ≥ 0.

This implies that u is a minimizer of (5.7.4); note that the inequality above becomes an equality only when v = 0, so u is in fact the unique minimizer.
Conversely, let u ∈ H01 (Ω) be the minimizer of (5.7.4). By Theorem 4.5.5, problem (5.7.2) has a unique weak solution w, and by the first part of the proof, w is a minimizer of (5.7.4). On the other hand, Theorem 5.7.4 shows that the minimizer of (5.7.4) is unique, so u = w; hence u is the weak solution. This completes the proof. □

5.8 Euler–Lagrange Equation

5.8.1 Lagrangian Integral

Now, it seems that we are ready to generalize the work to more general variational functionals. We have developed all the necessary tools and techniques to implement a "calculus-based" method to solve minimization problems and study their connections with the original PDEs. Our goal is to investigate variational problems and see whether the minimizers of these problems serve as weak solutions of the corresponding partial differential equations. Consider the general variational integral

J [u] = ∫Ω L(∇u, u, x) dx.    (5.8.1)
Here, L is a C² multi-variable function defined as

L : Rn × R × Ω −→ R,

where u, v ∈ C¹(Ω), and Ω is a C¹ open and bounded set. The first variable, in place of ∇u, is denoted by p; the second variable, in place of u, is denoted by z. This is a common practice in the differentiation process when a function and its derivatives are arguments of another function, so that the chain rule is not misused. Such a function with the properties above is known as the Lagrangian, and the functional (5.8.1) shall be called the Lagrangian integral. To differentiate L with respect to any of the variables, we write

∇p L = (Lp1 , . . . , Lpn ),    Lz ,    ∇x L = (Lx1 , . . . , Lxn ).

We will establish an existence theorem for the minimizer of the general variational
functional (5.8.1).

5.8.2 First Variation

One of the consequences of the preceding section is that, by defining the function h : R −→ R by

h(t) = J (u + tv),

we see that, assuming sufficient smoothness of the integrand, the function h is differentiable if and only if J is G-differentiable. Indeed, if L is C², then J is G-differentiable and both ∇p L and Lz are continuous, so we can apply the chain rule, and since u, v ∈ C¹(Ω), dL/dt is continuous, which implies that h is differentiable on R. Now, if u is a minimizer of a G-differentiable variational integral J , then

h′(0) = (∂/∂t) J [u + tv] |_{t=0} = DJ (u, v).    (5.8.2)

Equation (5.8.2) is called the first variation of J [·], and it provides the weak form of the PDE associated with J .

5.8.3 Necessary Condition for Minimality I

Let us see how to obtain the weak formulation explicitly from the first variation. Write

h(t) = J (u + tv) = ∫Ω L(∇(u + tv), u + tv, x) dx,

for v ∈ C01 (Ω) and t ∈ R, so that

u + tv ∈ C01 (Ω).

Let u ∈ C01 (Ω) be a minimizer of J ; then 0 is a minimizer of h, hence h′(0) = 0. Then we have
h′(t) = ∫Ω [ ∇p L(∇u + t∇v, u + tv, x) · ∇v + (∂L/∂z)(∇u + t∇v, u + tv, x) v ] dx
      = ∫Ω [ Σ_{i=1}^n Lpi (∇u + t∇v, u + tv, x) vxi + (∂L/∂z)(∇u + t∇v, u + tv, x) v ] dx.

Thus, we have

0 = h′(0) = ∫Ω [ Σ_{i=1}^n Lpi (∇u, u, x) vxi + (∂L/∂z)(∇u, u, x) v ] dx.

This is the weak formulation of the PDE associated with the variational integral J . To find the equation, we use Green's identity (i.e., integrate by parts with respect to x in the first n terms of the integral); taking into account v|∂Ω = 0, we obtain

0 = ∫Ω [ −Σ_{i=1}^n (∂/∂xi ) Lpi (∇u, u, x) + (∂L/∂z)(∇u, u, x) ] v dx.

By the Fundamental Lemma of COV, we obtain

−Σ_{i=1}^n (∂/∂xi ) Lpi (∇u, u, x) + (∂L/∂z)(∇u, u, x) = 0,

or in vector form

−div(∇p L) + (∂L/∂z)(∇u, u, x) = 0.
Theorem 5.8.1 (Necessary Condition for Minimality I) Let

L(∇u, u, x) ∈ C²(Ω)

for some Ω ⊂ Rn, and consider the Lagrangian integral (5.8.1). If u ∈ C01 (Ω) is a minimizer for J over A = {v ∈ C01 (Ω)}, then u is a solution of the equation

−Σ_{i=1}^n (∂/∂xi ) (∂L/∂pi )(∇u, u, x) + (∂L/∂z)(∇u, u, x) = 0,    (5.8.3)

with the homogeneous Dirichlet condition u = 0 on ∂Ω.
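The passage from the Lagrangian integral to Equation (5.8.3) can be observed numerically. In the sketch below (my construction), for the sample Lagrangian L(p, z, x) = p²/2 + z²/2 in one dimension, whose Euler–Lagrange operator is −u″ + u, the scaled partial derivatives of the discretized J reproduce that operator at the interior grid points:

```python
# Discrete sketch of Theorem 5.8.1: for L(p, z, x) = p^2/2 + z^2/2 in one
# dimension, the Euler-Lagrange operator is -u'' + u. The scaled partials
# (1/h) dJ/du_i of the discretized Lagrangian integral approximate the
# residual -u'' + u at interior grid points; for u = x(1 - x) that
# residual is 2 + u(x).

N = 100
h = 1.0 / N
x = [i * h for i in range(N + 1)]
u = [xi * (1.0 - xi) for xi in x]        # sample function, u(0) = u(1) = 0

def J(w):
    return sum(h * (0.5 * ((w[i + 1] - w[i]) / h) ** 2 + 0.5 * w[i] ** 2)
               for i in range(N))

def dJ(w, i, eps=1e-6):
    # central difference of J with respect to the i-th nodal value
    wp = list(w); wp[i] += eps
    wm = list(w); wm[i] -= eps
    return (J(wp) - J(wm)) / (2 * eps)

for i in (1, N // 2, N - 1):
    residual = 2.0 + u[i]                # -u'' + u for u = x(1 - x)
    assert abs(dJ(u, i) / h - residual) < 1e-3
```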

5.8.4 Euler–Lagrange Equation

Definition 5.8.2 (Euler–Lagrange Equation) Equation (5.8.3) is called the Euler–Lagrange equation, and its weak form is given by

∫Ω [ Σ_{i=1}^n Lpi (∇u, u, x) vxi + (∂L/∂z)(∇u, u, x) v ] dx = 0.

The Euler–Lagrange equation is a quasilinear second-order partial differential equation. It is one of the most important partial differential equations in applied mathematics, and a landmark in the history and development of the field of calculus of variations. If the functional is the variational integral of some PDE, then the associated E-L equation reduces to that PDE, and its weak form reduces to the weak formulation of that PDE. For example, if I [·] is the Dirichlet integral, then it is easy to see that the E-L equation reduces to the Laplace equation. Here,

L(∇u, u, x) = (1/2)|p|²,

so Lz = 0 and ∇p L = p = ∇u, and consequently div(∇p L) = ∇²u; therefore, the E-L equation reduces to the Laplace equation

∇²u = 0.

The Lagrangian of the Poisson variational integral is written in the form

L(∇u, u, x) = (1/2)|p|² − f z.

Then

div(∇p L) = ∇²u,    Lz = −f ,

so the E-L equation reduces to the Poisson equation

−∇²u = f .

In the preceding theorem, the minimizer of the functional J was taken over the space C01 (Ω), and consequently the direction vector v was also chosen in C01 (Ω) so that

u + tv ∈ C01 (Ω).

If the admissible set is chosen to be C¹(Ω), then we must allow v ∈ C¹(Ω). Writing again the first variation,

0 = h′(0) = ∫Ω [ ∇p L(∇u, u, x) · ∇v + (∂L/∂z)(∇u, u, x) v ] dx.

Applying the first Green's identity to the first term yields

0 = ∫∂Ω v ∇p L(∇u, u, x) · n ds + ∫Ω [ −div(∇p L) + (∂L/∂z)(∇u, u, x) ] v dx,

where n is the outward normal vector, and this equation holds for all v ∈ C¹(Ω). We thus have the following:
Theorem 5.8.3 Let L(∇u, u, x) ∈ C²(Ω) for some Ω ⊂ Rn. If u ∈ C²(Ω) is a minimizer of the Lagrangian integral J over C²(Ω), then u is a solution of the Euler–Lagrange equation with the Neumann boundary condition

∇p L(∇u, u, x) · n = 0 on ∂Ω.

5.8.5 Second Variation

A natural question arises: does the converse of Theorem 5.8.1 hold? We know from the discussion above that a weak solution of the E-L equation is a critical point of the associated variational integral. But is it necessarily a minimizer? In general the answer is no, as it may also be a maximizer, or neither. As we usually do in elementary calculus, the second derivative needs to be invoked here. Consider again the function

h(t) = J (u + tv) = ∫Ω L(∇(u + tv), u + tv, x) dx.

If h has a minimum value at 0, then h′(0) = 0 and h″(0) ≥ 0. The first derivative was found to be

h′(t) = ∫Ω [ Σ_{i=1}^n Lpi (∇u + t∇v, u + tv, x) vxi + (∂L/∂z)(∇u + t∇v, u + tv, x) v ] dx.

Then

h″(t) = ∫Ω [ Σ_{i,j=1}^n Lpi pj vxi vxj + 2 Σ_{j=1}^n Lzpj v vxj + Lzz v² ] dx,

where Lpi pj , Lzpj , and Lzz are all evaluated at (∇u + t∇v, u + tv, x).

Thus, the second variation takes the form

0 ≤ h″(0) = ∫Ω [ Σ_{i,j=1}^n Lpi pj (∇u, u, x) vxi vxj + 2 Σ_{i=1}^n Lzpi (∇u, u, x) v vxi + Lzz (∇u, u, x) v² ] dx,

which is valid for all v ∈ Cc∞(Ω). The above integral is called the second variation.

5.8.6 Legendre Condition

Consider the function

v(x) = ε ξ(x) ϕ(x · η / ε),

for some cut-off function ξ(x) ∈ Cc∞(Ω) and a fixed η ∈ Rn, where ϕ is a periodic zigzag function with |ϕ′| = 1, so that v −→ 0 as ε −→ 0. This gives

vxi (x) = ηi ξ(x) ϕ′(x · η / ε) + O(ε).

Substituting into the second variation and letting ε −→ 0, the terms containing a factor v or O(ε) vanish, and we obtain

0 ≤ ∫Ω Σ_{i,j=1}^n Lpi pj (∇u, u, x) ηi ηj ξ² dx,

which holds for every cut-off function ξ; hence, pointwise,

Σ_{i,j=1}^n Lpi pj (∇u, u, x) ηi ηj ≥ 0.    (5.8.4)

Condition (5.8.4) is thus a necessary condition for the critical point u to be a minimizer for J . Moreover, the inequality is reminiscent of the convexity of L with respect to p. We therefore have the following theorem:

Theorem 5.8.4 (Necessary Condition for Minimality II (Legendre)) Let L(∇u, u, x) ∈ C²(Ω) for some Ω ⊂ Rn. If u ∈ C01 (Ω) is a minimizer of the functional J in (5.8.1) over

A = {v ∈ H01 (Ω)},

then for all η ∈ Rn, we have

Σ_{i,j=1}^n Lpi pj (∇u, u, x) ηi ηj ≥ 0.
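The Legendre condition can be checked numerically for a concrete Lagrangian. The sketch below (my choice of example, not from the text) uses the minimal-surface Lagrangian L(p) = √(1 + |p|²), computes its Hessian in p by central differences, and verifies that the quadratic form in (5.8.4) is nonnegative:

```python
import math, random

# Numeric sketch of the Legendre condition (5.8.4) for the minimal-surface
# Lagrangian L(p) = sqrt(1 + |p|^2) on R^2: the Hessian of L in p yields a
# nonnegative quadratic form in every direction eta.

def L(p1, p2):
    return math.sqrt(1.0 + p1 * p1 + p2 * p2)

def d2(p, i, j, eps=1e-4):
    # central-difference approximation of the (i, j) entry of the Hessian
    ei = [eps * (k == i) for k in range(2)]
    ej = [eps * (k == j) for k in range(2)]
    return (L(p[0] + ei[0] + ej[0], p[1] + ei[1] + ej[1])
            - L(p[0] + ei[0] - ej[0], p[1] + ei[1] - ej[1])
            - L(p[0] - ei[0] + ej[0], p[1] - ei[1] + ej[1])
            + L(p[0] - ei[0] - ej[0], p[1] - ei[1] - ej[1])) / (4 * eps * eps)

random.seed(1)
for _ in range(20):
    p = (random.uniform(-3, 3), random.uniform(-3, 3))
    H = [[d2(p, i, j) for j in range(2)] for i in range(2)]
    eta = (random.uniform(-1, 1), random.uniform(-1, 1))
    quad = sum(H[i][j] * eta[i] * eta[j] for i in range(2) for j in range(2))
    assert quad >= -1e-6      # Legendre: sum L_{pi pj} eta_i eta_j >= 0
```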

The second variation can be written as the quadratic form

Q[u, v] = ∫Ω ( A(∇v)² + 2B v∇v + C v² ) dx,

where

A = Σ_{i,j=1}^n Lpi pj ,    B = Σ_{i=1}^n Lzpi ,    C = Lzz .

It can then be seen that if u is a local minimizer, then Q[u, ·] is positive semidefinite, so that the integrand satisfies

A(∇v)² + 2B v∇v + C v² ≥ 0.

If the inequality is strict, then by (5.6.7), we have

J [u + tv] = J [u] + t DJ (u, v) + (t²/2) Q[u, v],

where

Q[u, v] = h″(0) = (d²/dt²) J [u + tv] |_{t=0}.
Therefore, we have
Theorem 5.8.5 Let L(∇u, u, x) ∈ C²(Ω) for some bounded Ω ⊂ Rn, and suppose u ∈ C01 (Ω) is a critical point of the Lagrangian integral J . Then u is a local minimizer for J over

A = {v ∈ C01 (Ω)}

if and only if Q[u, v] > 0 for all v ≠ 0 (i.e., Q is positive definite); equivalently,

A(∇v)² + 2B v∇v + C v² > 0.

5.9 Dirichlet Principle for Euler–Lagrange Equation

5.9.1 The Lagrangian Functional

Consider again the variational Lagrangian integral

J [u] = ∫Ω L(Du, u, x) dx    (5.9.1)

for some C¹ open and bounded Ω ⊂ Rn. We will establish an existence theorem for the minimizer of J over the Sobolev spaces W 1,q (Ω), 1 < q < ∞. To ensure the variational integral is well-defined, we assume throughout that L is Caratheodory. A function L(p, z, x) is called Caratheodory if L is C¹ in z and p for a.e. x, and measurable in x for all (p, z).

5.9.2 Gateaux Derivative of the Lagrangian Integral

Our first task is to prove that J is G-differentiable and to find its derivative. We need to impose further conditions on L. To ensure the functional J [·] is finite, we assume L, Lp , and Lz are all Caratheodory. We also assume the growth condition of order q

|L(p, z, x)| ≤ C ( |p|^q + |z|^q + 1 ),    (5.9.2)

together with

max{ |Lp (p, z, x)|, |Lz (p, z, x)| } ≤ C ( |p|^{q−1} + |z|^{q−1} + 1 ).    (5.9.3)

Theorem 5.9.1 Consider the Lagrangian integral functional J : W 1,q (Ω) −→ R, 1 < q < ∞, given by

J [u] = ∫Ω L(Du, u, x) dx

for some bounded Ω ⊂ Rn, where L is the Lagrangian. If L is Caratheodory and satisfies conditions (5.9.2) and (5.9.3), then the functional J is G-differentiable and

DJ (u, v) = ∫Ω [ Dp L(Du, u, x)Dv + Lz (Du, u, x)v ] dx.    (5.9.4)
Proof The G-derivative of J is found as the limit as t −→ 0 of the expression

(1/t)( J [u + tv] − J [u] ) = ∫Ω (1/t)( L(Du + tDv, u + tv, x) − L(Du, u, x) ) dx.    (5.9.5)

The key to the proof is the Dominated Convergence Theorem. Set

ft = (1/t)( L(Du + tDv, u + tv, x) − L(Du, u, x) ).

If we show that ft −→ f a.e. as t −→ 0 and |ft | ≤ g ∈ L¹(Ω), then by the Dominated Convergence Theorem we conclude that

lim_{t→0} ∫Ω ft dx = ∫Ω f dx.

It is clear that

ft −→ (dL/dt)|_{t=0} = Dp L(Du, u, x)Dv + Lz (Du, u, x)v    a.e.    (5.9.6)
On the other hand, for 0 < t ≤ 1, ft can be written as

ft = (1/t) ∫₀ᵗ (d/dτ ) L(Du + τ Dv, u + τ v, x) dτ
   = (1/t) ∫₀ᵗ [ Dp L(Du + τ Dv, u + τ v, x)Dv + Lz (Du + τ Dv, u + τ v, x)v ] dτ .

Next, we make use of condition (5.9.3). This gives

|ft | ≤ (1/t) ∫₀ᵗ [ |Dp L(Du + τ Dv, u + τ v, x)| |Dv| + |Lz (Du + τ Dv, u + τ v, x)| |v| ] dτ
     ≤ (1/t) ∫₀ᵗ C ( |Du + τ Dv|^{q−1} + |u + τ v|^{q−1} + 1 )( |Dv| + |v| ) dτ .

Note that since u, v ∈ W 1,q (Ω), we have

v, Dv, Du + τ Dv, u + τ v ∈ Lq (Ω).

Given that the Hölder conjugate of q is q∗ = q/(q − 1), so that q∗ (q − 1) = q, we have

( |Du + τ Dv|^{q−1} )^{q∗} = |Du + τ Dv|^q ∈ L¹(Ω),

and similarly for u + τ v. Then, using Young's inequality on the products |Du + τ Dv|^{q−1} |Dv|, |Du + τ Dv|^{q−1} |v|, |u + τ v|^{q−1} |Dv|, and |u + τ v|^{q−1} |v|, each of them is bounded by a sum of the L¹-integrable functions

|Du + τ Dv|^q , |u + τ v|^q , |v|^q , |Dv|^q ∈ L¹(Ω).

Adding these together with the constant C, we obtain a function g(x) ∈ L¹(Ω) with

|ft | ≤ (1/t) ∫₀ᵗ g(x) dτ = g(x) ∈ L¹(Ω).

Now from (5.9.5) and (5.9.6), the Dominated Convergence Theorem gives (5.9.4). □

Theorem 5.9.2 Under the assumptions of the preceding theorem, if u ∈ W 1,q (Ω) is a local minimizer of the functional J over

A = {v ∈ W 1,q (Ω) : v = g on ∂Ω, for some g ∈ W 1,q (Ω)},

then u is a weak solution of the Euler–Lagrange problem

−Σ_{i=1}^n (∂/∂xi ) Lpi (Du, u, x) + (∂L/∂z)(Du, u, x) = 0,    x ∈ Ω,
u = g,    x ∈ ∂Ω.

Proof Multiplying the equation above by v ∈ Cc∞(Ω) and integrating by parts gives

∫Ω [ Dp L(Du, u, x)Dv + Lz (Du, u, x)v ] dx = 0.    (5.9.7)

So (5.9.7) is the weak formulation of the Euler–Lagrange equation. Now, since u is a local minimizer of J , Theorem 5.9.1 and Theorem 5.6.9(1) give DJ (u, v) = 0 for all such v, so (5.9.4) reduces to (5.9.7). □

The task of proving that a minimizer is unique for such general functionals is a bit challenging, and some further conditions must be imposed. One way to deal with this problem is to assume convexity in the two variables (p, z) rather than in p alone. This property is called joint convexity.

Definition 5.9.3 (Jointly Convex Functional) A function F(x, y) : X × Y −→ R is


called jointly convex if for x1 , x2 ∈ X and y1 , y2 ∈ Y and every 0 ≤ θ ≤ 1, we have

F (θx1 + (1 − θ)x2 , θy1 + (1 − θ)y2 ) ≤ θF(x1 , y1 ) + (1 − θ)F(x2 , y2 ).

If the inequality is strict, then the functional F is said to be jointly strictly convex.
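As a numerical aside (the two sample functions are this sketch's own choices, not from the text), the midpoint form of the definition separates F(x, y) = x² + y², which is jointly convex, from F(x, y) = xy, which is convex in each variable separately but not jointly convex.

```python
def midpoint_gap(F, p1, p2):
    """F(midpoint) - average of F at the endpoints; <= 0 is required for convexity."""
    mx, my = 0.5 * (p1[0] + p2[0]), 0.5 * (p1[1] + p2[1])
    return F(mx, my) - 0.5 * (F(*p1) + F(*p2))

F1 = lambda x, y: x**2 + y**2        # jointly convex
F2 = lambda x, y: x * y              # affine in each variable separately

gap_ok  = midpoint_gap(F1, (1.0, -1.0), (-1.0, 1.0))   # -2.0 <= 0: passes
gap_bad = midpoint_gap(F2, (1.0, -1.0), (-1.0, 1.0))   # +1.0 > 0: joint convexity fails
```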

In an analogous way to Theorem 5.6.4, we have the following useful property:


Proposition 5.9.4 If L = L(p, z, x) is jointly convex in (z, p), then

L(x, v, Dv) − L(x, u, Du) ≥ Lp (x, u, Du)(Dv − Du) + Lz (x, u, Du)(v − u).
(5.9.8)
Proof For 0 < t ≤ 1, set
w = tv + (1 − t)u.

Then, by joint convexity, we have

L(x, w, Dw) ≤ tL(x, v, Dv) + (1 − t)L(x, u, Du)


= t(L(x, v, Dv) − L(x, u, Du)) + L(x, u, Du).

This implies

L(x, v, Dv) − L(x, u, Du) ≥ [ L(x, w, Dw) − L(x, u, Du) ] / t
                           = [ L(x, w, Dw) − L(x, w, Du) + L(x, w, Du) − L(x, u, Du) ] / t
                           = [ L(x, w, Dw) − L(x, w, Du) ] / t + [ L(x, w, Du) − L(x, u, Du) ] / t.

Taking the limit t −→ 0, making use of Theorem 5.6.4, and noting that w −→ u,
we get (5.9.8). 
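The gradient inequality (5.9.8) can be spot-checked numerically; in the sketch below (an aside, with the jointly convex sample integrand L(p, z) = p² + pz + z² standing in for a general Lagrangian), the inequality holds at every sampled pair of points.

```python
import numpy as np

L  = lambda p, z: p**2 + p*z + z**2        # Hessian [[2,1],[1,2]], positive definite
Lp = lambda p, z: 2*p + z                  # partial derivative in p
Lz = lambda p, z: p + 2*z                  # partial derivative in z

rng = np.random.default_rng(2)
P, Z, Q, W = (rng.uniform(-3, 3, 500) for _ in range(4))

# (5.9.8): L(Dv, v) - L(Du, u) >= Lp(Du, u)(Dv - Du) + Lz(Du, u)(v - u)
slack = L(Q, W) - L(P, Z) - Lp(P, Z) * (Q - P) - Lz(P, Z) * (W - Z)
worst = np.min(slack)                      # nonnegative for a jointly convex L
```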
Using this property of joint convexity, we can easily establish an existence and
uniqueness theorem for the minimizer.
Theorem 5.9.5 Under the assumptions of Theorem 5.9.1, and assuming L is jointly
strictly convex in (z, p), there exists u ∈ W^{1,q}(Ω), for any 1 < q < ∞, such that u is the unique minimizer of the Lagrangian variational integral

J[u] = ∫_Ω L(Du, u, x) dx.

Proof The variational integral J is G-differentiable by Theorem 5.9.1. Let un ⇀ u in W^{1,q}(Ω). Then by (5.9.8),

L(x, un , Dun ) − L(x, u, Du) ≥ Lp (x, u, Du)(Dun − Du) + Lz (x, u, Du)(un − u).

Note that Lp(x, u, Du)(·) and Lz(x, u, Du)(·) are bounded linear functionals, and un ⇀ u, Dun ⇀ Du, so

Lp(x, u, Du)(Dun − Du) −→ 0,



and
Lz (x, u, Du)(un − u) −→ 0.

This gives
L(x, un , Dun ) ≥ L(x, u, Du).

Integrating both sides over Ω and taking the limit inferior, we obtain

lim inf J [un ] ≥ J [u].

So, J is weakly l.s.c., and therefore, by Theorem 5.3.2, there exists a minimizer.
The uniqueness of the minimizer follows from the joint strict convexity by a similar
argument to that of Theorem 5.3.2, and this will be left to the reader as an easy
exercise. 
Now we are ready to establish the Dirichlet principle for the Lagrangian integral.

5.9.3 Dirichlet Principle for Euler–Lagrange Equation

Theorem 5.9.6 Under the assumptions of Theorem 5.9.1, and assuming L is jointly
convex in (z, p), u ∈ W^{1,q}(Ω) is a weak solution of the Euler–Lagrange equation if and only if u is a minimizer of the Lagrangian integral.

Proof Let u ∈ W^{1,q}(Ω) be a weak solution of the Euler–Lagrange equation. Integrating both sides of the inequality (5.9.8) over Ω yields

J[v] − J[u] ≥ ∫_Ω [ Lp(x, u, Du)(Dv − Du) + Lz(x, u, Du)(v − u) ] dx = 0,

since v − u ∈ W_0^{1,q}(Ω) may be taken as a test function in the weak formulation (5.9.7). This gives J[v] ≥ J[u], hence u is a minimizer. Theorem 5.9.2 gives the other direction. □


5.10 Variational Problem of Euler–Lagrange Equation

5.10.1 p−Convex Lagrangian Functional

In this section, we solve a variational problem for the Euler–Lagrange equation. Namely, we will find a minimizer for the corresponding variational integral, and then we will show that this minimizer is a weak solution of the Euler–Lagrange equation. In the previous section, we already solved a variant of this problem, in addition to establishing a Dirichlet principle for the Euler–Lagrange equation, provided the Lagrangian is
jointly convex. However, the property of joint convexity is restrictive, and not too
many functions satisfy this property. For example, the function

f (x, y) = xy

can be shown to be convex in x and in y separately, but not jointly convex in (x, y). A
main motivation for us is the Legendre condition (Theorem 5.8.4), in the sense that the inequality

Σ_{i,j=1}^{n} L_{p_i p_j}(Du, u, x) η_i η_j ≥ 0

is essential for the critical point to be a minimizer. The above inequality amounts to convexity of L in p, which seems to be the natural property to replace the joint convexity of L, so we will adopt this property in the next two results. A classical result in real analysis shall be invoked here. Recall that Egoroff's theorem states the following: if {fn} is a sequence of measurable functions and fn → f a.e. on a set E of finite measure, then for every ε > 0 there exists a set A with μ(A) < ε such that fn −→ f uniformly on E \ A. The theorem shall be used to prove the following.
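As a numerical aside illustrating Egoroff's theorem (with E = [0, 1) and fn(x) = xⁿ as this sketch's own example): fn → 0 pointwise but not uniformly on E, while removing the small set A = (1 − ε, 1) restores uniform convergence.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 10001)[:-1]      # grid on E = [0, 1)
eps = 0.01
good = x <= 1.0 - eps                      # E \ A with A = (1 - eps, 1)

# sup over all of E stays near 1, so the convergence x^n -> 0 is not uniform;
# sup over E \ A tends to 0, i.e. the convergence is uniform off the small set A.
sup_full = [np.max(x ** n) for n in (10, 100, 1000)]
sup_good = [np.max(x[good] ** n) for n in (10, 100, 1000)]
```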

Theorem 5.10.1 Let L = L(p, z, x) be a Lagrangian functional that is bounded from below, and let Ω be a C¹ open bounded set in Rⁿ. If L is convex in p, then J[u] is weakly l.s.c. on W^{1,q}(Ω) for any 1 < q < ∞; that is, for every un ⇀ u in W^{1,q}(Ω), we have

J[u] ≤ lim inf J[un].



Proof Let un ⇀ u in W^{1,q}(Ω). We divide the proof into three parts. Firstly, since
L is convex in p, we use Theorem 5.6.4 to get

L(Dun , un , x) − L(Du, un , x) ≥ Lp (Du, un , x)(Dun − Du),

but we know that Lp(Du, un, x) is a bounded linear functional, and Dun ⇀ Du, so

Lp(Du, un, x)(Dun − Du) −→ 0,

from which we get

∫_Ω L(Dun, un, x) dx ≥ ∫_Ω L(Du, un, x) dx.   (5.10.1)

Secondly, since L is bounded from below, J is also bounded from below, so let

m = lim inf J [un ].



Moreover, by Proposition 5.3.3(3), (un) is bounded, so by the Rellich–Kondrachov Theorem 3.10.5, there exists a subsequence unm = um of un such that um converges strongly to u in L^p, and so by Proposition 5.3.3(1), there exists a subsequence umj = uj of um such that uj converges to u a.e. Now, since Ω is bounded, by Egoroff's theorem there exists a subsequence ujk = uk of uj such that uk converges to u uniformly on some open set Ω_ε with μ(Ω \ Ω_ε) < ε (where μ denotes the Lebesgue measure), so we can assume that within Ω_ε both un and Dun are bounded, and since L is C¹, this implies

lim ∫_{Ω_ε} L(Du, un, x) dx = ∫_{Ω_ε} lim L(Du, un, x) dx
                            = ∫_{Ω_ε} L(Du, lim un, x) dx
                            = ∫_{Ω_ε} L(Du, u, x) dx,

i.e., we have

lim ∫_{Ω_ε} L(Du, un, x) dx = ∫_{Ω_ε} L(Du, u, x) dx.   (5.10.2)


Thirdly, since L is bounded from below by some c > −∞, WLOG we can assume L ≥ 0 (since we can use the shift transformation L −→ L − c). Also, we note that as ε → 0, Ω_ε ↗ Ω, so we can write

∫_{Ω_ε} L(Du, un, x) dx = ∫_Ω χ_{Ω_ε} L(Du, un, x) dx,

where χ_A is the characteristic function, which equals 1 on A and zero otherwise. Writing ε = 1/n, we see that (χ_{Ω_{1/n}} L) is an increasing sequence of nonnegative measurable functions converging to χ_Ω L. Hence, by the Monotone Convergence Theorem (Theorem 1.1.6),

lim_{ε→0} ∫_{Ω_ε} L(Du, u, x) dx = ∫_Ω L(Du, u, x) dx.   (5.10.3)


From (5.10.2) and (5.10.3), we conclude that

lim ∫_Ω L(Du, un, x) dx = ∫_Ω L(Du, u, x) dx,

and from (5.10.1), we obtain

lim inf J[un] = lim inf ∫_Ω L(Dun, un, x) dx
             ≥ lim inf ∫_Ω L(Du, un, x) dx
             = ∫_Ω L(Du, u, x) dx
             = J[u].  □


5.10.2 Existence of Minimizer

We have seen that the two main conditions to guarantee the existence of minimizers
are the coercivity and lower semicontinuity. The preceding theorem deals with the
latter condition, and we need assumptions to guarantee the former. As in the preceding
theorem, we gave conditions on L rather than J , so we will continue to do that for
the existence theorem.
Theorem 5.10.2 Let L = L(p, z, x) be a Lagrangian functional. Suppose that L is bounded from below and convex in p. Moreover, suppose there exist α > 0 and β ≥ 0 such that

L ≥ α |p|^q − β

for some 1 < q < ∞. Then there exists u ∈ W^{1,q}(Ω), for some C¹ open bounded Ω ⊂ Rⁿ, such that u is the minimizer of the Lagrangian variational integral

J[u] = ∫_Ω L(Du, u, x) dx

over the admissible set

A = {v ∈ W^{1,q}(Ω) : v = g on ∂Ω, for some g ∈ W^{1,q}(Ω)}.


Remark To avoid triviality of the problem, we assume that inf J < ∞ and A ≠ ∅.

Proof WLOG we can assume β = 0 (or use the shift L −→ L + β). The bound condition on L implies that

J[u] = ∫_Ω L(Du, u, x) dx ≥ α ∫_Ω |Du|^q dx,

hence J is bounded from below; note also from the preceding theorem that J is weakly l.s.c. Now let (un) ⊂ A be a minimizing sequence, so that sup_n J[un] < ∞. Then

sup_n ‖Dun‖_{L^q} < ∞.

For any v ∈ A, we have

un − v ∈ W_0^{1,q}(Ω),

so using the Poincaré inequality,

‖un‖_{L^q} = ‖un − v + v‖_{L^q}
           ≤ ‖un − v‖_{L^q} + ‖v‖_{L^q}
           ≤ C ‖D(un − v)‖_{L^q} + ‖v‖_{L^q}
           ≤ C (‖Dun‖_{L^q} + ‖Dv‖_{L^q}) + C1
           ≤ C2 + C1 = C.


Therefore,
sup_n ‖un‖_{L^q} < ∞,

and consequently, (un ) is bounded in W 1,q (), which shows that J is coercive.
Finally, Proposition 5.5.2 shows that A is weakly closed. The result follows now
from Theorem 5.3.2. 
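To make the existence argument concrete, here is a small numerical sketch (not from the text; it uses the model Lagrangian L(p, z, x) = (1/2)|p|² − z f(x) with q = 2 on Ω = (0, 1), all choices of this sketch): the discrete minimizer over grid functions vanishing at the boundary solves the discrete Euler–Lagrange system −u″ = f, and any competitor has strictly larger energy.

```python
import numpy as np

n = 200
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
f = np.ones(n - 1)                             # source term f = 1 at interior nodes

# Discrete Euler-Lagrange system of J[u] = sum( 0.5*|u'|^2 - f*u ) * h with
# u(0) = u(1) = 0: the tridiagonal system -u'' = f.
A = (np.diag(2.0 * np.ones(n - 1))
     - np.diag(np.ones(n - 2), 1)
     - np.diag(np.ones(n - 2), -1)) / h**2
u = np.linalg.solve(A, f)

exact = 0.5 * x[1:-1] * (1.0 - x[1:-1])        # u(x) = x(1 - x)/2 solves -u'' = 1
err = np.max(np.abs(u - exact))

def J(v):
    """Discrete energy of an interior grid function v (zero boundary values)."""
    w = np.concatenate(([0.0], v, [0.0]))
    return np.sum(0.5 * (np.diff(w) / h) ** 2) * h - np.sum(f * v) * h

gap = J(u + 0.01 * np.sin(np.pi * x[1:-1])) - J(u)   # competitors cost more energy
```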

Lastly, we prove that the minimizer of the Lagrangian variational integral is a weak solution of the Euler–Lagrange equation.

Theorem 5.10.3 Suppose that the Lagrangian functional L satisfies all the assumptions of Theorems 5.9.1 and 5.10.2. If u ∈ W^{1,q}(Ω) is a local minimizer of the functional J over

A = {v ∈ W^{1,q}(Ω) : v = g on ∂Ω, for some g ∈ W^{1,q}(Ω)},

then u is a weak solution of the Euler–Lagrange problem

− Σ_{i=1}^{n} ∂/∂x_i [ L_{p_i}(Du, u, x) ] + (∂L/∂z)(Du, u, x) = 0,   x ∈ Ω,
u = g,   x ∈ ∂Ω.

Proof Same as Theorem 5.9.2. 

5.11 Problems

(1) Prove Proposition 5.1.10.


(2) Give an example to show that the result of Mazur’s Lemma 5.2.10 doesn’t hold
for every finite convex combination of the sequence xn .

(3) Prove Theorem 5.2.9 from Mazur’s Lemma.


(4) Let f : X −→ R be coercive and weakly l.s.c. defined on a reflexive Banach
space X . Show that f is bounded from below.
(5) Show that if f : R −→ R and

f (x) ≥ α |x|p − β

for some α, β > 0 and 1 < p < ∞ then f has a minimizer over R.
(6) Give an example of a function f : R −→ R such that f is coercive, bounded
from below, but does not have a minimizer on R.
(7) Let f : X −→ R be convex and l.s.c. If f < ∞ and there exists x0 ∈ X such that f(x0) = −∞, then show that f ≡ −∞.
(8) Give an example of a minimizing sequence with no subsequence converging in
norm.
(9) Let {fi : i ∈ I } be a family of convex functionals defined on a Hilbert space.
Show that sup{fi : i ∈ I } is convex.
(10) Show that if f, g are l.s.c. and both are bounded from below, then f + g is l.s.c.
(11) Show that if fn is a sequence of l.s.c. functions and fn converges uniformly to
f , then f is l.s.c.
(12) Suppose f is bounded from below, convex, and l.s.c. Prove or disprove: f is continuous on its domain.
(13) Use Prop 5.3.3(3) to prove the statement of Theorem 4.9.5(2).
(14) (a) Show that a function f is coercive if and only if its lower level sets

{x : f (x) ≤ b, b ∈ R}

are bounded.
(b) Deduce from (a) that if f : H −→ (−∞, ∞] is proper coercive then every
minimizing sequence of f is bounded.
(15) A function is called quasi-convex if its lower-level sets

{x : f (x) ≤ b, b ∈ R}

are convex.
(a) Show that every convex function is quasi-convex.
(b) Show that every monotone function is quasi-convex.
(c) Let f : H −→ (−∞, ∞] be quasi-convex. Show that f is l.s.c. if and only
if f is weakly l.s.c.
(16) Let f : H −→ (−∞, ∞] be quasi-convex and l.s.c., and suppose C ⊂ H is weakly closed. If there exists b ∈ R such that

C ∩ {x : f (x) ≤ b, b ∈ R}

is bounded, prove that there exists a minimizer of f over C.



(17) (a) Show that the Dirichlet integral I is not coercive on W_0^{1,p}(Ω) for p > 2.
(b) Show that the Dirichlet integral I is strictly convex on W^{1,p}(Ω) for p ≥ 2.
(18) (a) Show that ‖x‖² is weakly l.s.c.
(b) Determine the values of p for which ‖x‖^p is weakly l.s.c.
(19) Let F : Rⁿ −→ (−∞, ∞] be l.s.c. and convex. Let Ω be bounded Lipschitz in Rⁿ, and define the variational integral J : W^{1,p}(Ω) −→ R,

J[u] = ∫_Ω F(Du) dx.

(a) Show that J is convex.


(b) Show that J is l.s.c.
(20) Consider the variational integral J : H¹(0, 1) −→ R,

J[u] = ∫_0^1 [ (|u′| − 1)² + u² ] dx.


(a) Show that J is coercive.


(b) Show thatJ is not convex.
(c) Show that the minimum of J is zero but J doesn’t attain its minimum.
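The nonattainment in part (c) can be visualized numerically (an aside, not part of the exercise): sawtooth functions un with slopes ±1 and n teeth make the first term of J vanish exactly, while ∫ u² behaves like 1/(12n²), so J[un] → 0 although no admissible u achieves J[u] = 0.

```python
import numpy as np

def J(n_teeth, pts_per_seg=50):
    """Discretized J[u] for a sawtooth u with the given number of teeth."""
    n_seg = 2 * n_teeth                        # each tooth has an up and a down segment
    x = np.linspace(0.0, 1.0, n_seg * pts_per_seg + 1)
    h = x[1] - x[0]
    period = 1.0 / n_teeth
    u = period / 2.0 - np.abs(x % period - period / 2.0)   # tent wave, slopes +-1
    du = np.diff(u) / h                        # exactly +-1 up to rounding
    first = np.sum((np.abs(du) - 1.0) ** 2) * h            # vanishes for the sawtooth
    second = np.sum(u[:-1] ** 2) * h                       # ~ 1/(12 n_teeth^2)
    return first + second

vals = [J(n) for n in (1, 2, 4, 8)]            # decreasing toward 0, never 0
```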
(21) Consider the variational integral

J[u] = ∫_0^1 ( (u′)² − 1 )² dx.


(a) Show that J has minimum value 0.


(b) Show that there exists no minimizer over C 1 [0, 1].
(c) Show that there exists a minimizer over C[0, 1].
(22) Consider the variational integral

J[u] = ∫_{−1}^{1} ( u′ − 2|x| )² dx.


(a) Show that J has minimum value 0.


(b) Show that there exists no minimizer over C 2 [−1, 1].
(c) Show that there exists a minimizer over C 1 [−1, 1].
(23) Consider the variational integral J : H_0^1(Ω) −→ R, Ω bounded in Rⁿ, given by

J[u] = ∫_Ω [ (1/2)|Du|² + (1/3)u³ + f(x)u ] dx.


(a) Show that J is strictly convex.


(b) Show that J is l.s.c.
(c) Show that there exists a minimizer for J .

(d) Show that the minimizer of J is a weak solution of the problem

∇²u − u² = f,   x ∈ Ω,
u = 0,   x ∈ ∂Ω.

(24) Let ψ : Rⁿ −→ R be l.s.c. and convex. Consider the functional J : W^{1,p}(Ω) −→ R, for some open and Lipschitz Ω in Rⁿ and 1 < p < ∞, defined by

J[u] = ∫_Ω ψ(Du) dx.


Show that J is weakly l.s.c.


(25) Let f : R² −→ R be given by

f(x, y) = (x/y)(x² + y²)  for (x, y) ∈ R² \ {(0, 0)},   and   f(0, 0) = 0.

(a) Show that f is not continuous at (0, 0).


(b) Find the G-derivative of f at (0, 0).
(26) Prove Proposition 5.6.2.
(27) Let f be G-differentiable on a normed space X. Prove that f is convex if and only if

⟨Df(v) − Df(u), v − u⟩ ≥ 0
for all u, v ∈ X .
(28) Consider the integral functional J : C¹[0, 1] −→ R defined by

J[u] = ∫_0^1 |u| dx.


(a) Find the G-derivative of J at all u ≠ 0.


(b) Show that the G-derivative does not exist at u = 0.
(29) Let f : X −→ R be G-differentiable on a Banach space X. Show that

|f(x1) − f(x2)| ≤ sup_{t∈[0,1]} ‖Df(t x1 + (1 − t) x2)‖_{X*} ‖x2 − x1‖_X.

(30) Show that if f is Frechet differentiable at x, then it is G-Differentiable at x.


(31) Show that ‖·‖_p is not G-differentiable at u = 0.
(32) Consider the integral functional J : W^{1,p}(Ω) −→ R, Ω ⊂ Rⁿ, 1 < p < ∞, defined by

J[u] = ∫_Ω |Du|^p dx.


(a) Show that J is convex.


(b) Find the G-derivative of J .
(c) Show that J has a minimizer u over the set

A = {v ∈ W^{1,p}(Ω) : v − g ∈ W_0^{1,p}(Ω), for some g ∈ W^{1,p}(Ω)}.

(d) Show that the minimizer u is the weak solution of the problem

div( |∇u|^{p−2} ∇u ) = 0,   x ∈ Ω,
u = g,   x ∈ ∂Ω.

(33) Let J[u] = B[u, u] + L[u] for some bilinear form B and linear functional L. Show that

D²J(u, v)w = B[v, w] + B[w, v].

(34) Find the variational integral for the equation

div( ∇u / (1 + |∇u|²)^{1/2} ) = 0.

(35) Consider the problem of minimizing the variational integral (5.7.1) over H_0^1(Ω), where Ω is bounded in at least one direction in Rⁿ and f ∈ L²(Ω).
(a) Prove the following inequality:

inf_{v∈X} J[v] ≤ (1/2) J[ui] + (1/2) J[uj] − (1/4) ∫_Ω |∇(ui − uj)|² dx.

(b) Show that ‖∇(ui − uj)‖_{L²} −→ 0.
(c) Show that (un) is Cauchy in H_0^1.
(d) Use (c) to prove the existence of the minimizer.
(e) Use (a) to prove the uniqueness of the minimizer.
(36) (a) Show that f(x) = |x|^p is strictly convex for 1 < p < ∞.
(b) Show that the variational integral defined in (5.7.4) is strictly convex.
(c) Deduce that the weak solution to problem (5.7.2) is unique.
(37) Alternative proof for (5.7.9): Consider the problem (5.7.2) with all the assump-
tions.
(a) Show that ADv ∈ L²(Ω).
(b) Show that

0 ≤ ∫_Ω A(x) |D(vn − v)|² dx.

(c) Show that

∫_Ω A(x) Dvn Dv dx −→ ∫_Ω A(x) |Dv|² dx.

(d) Prove (5.7.9) in the proof of Theorem 5.7.5.


(38) In Theorem 5.7.5, find the associated variational integral, then use any method to prove the theorem for the same symmetric operator with the boundary condition:
(a) u = f on ∂Ω.
(b) ∂u/∂n = 0 on ∂Ω.
(c) ∂u/∂n = g on ∂Ω.
(39) Consider the following Neumann problem for the Poisson equation:

−∇²u = f,   x ∈ Ω,
∂u/∂n = g,   x ∈ ∂Ω,

for some f, g ∈ L²(Ω), where Ω is bounded in Rⁿ.


(a) Find the associated variational integral J [u] of the problem.
(b) Define the minimization problem and its admissible set.
(c) Show that u ∈ H 1 () is the weak solution of the problem if and only if u is
the minimizer of the associated variational integral J which is obtained in (a)
over the admissible set obtained in (b).
(40) Prove that there exists a minimizer for the functional J : H_0^2(Ω) −→ R, given by

J[u] = ∫_Ω [ (1/2)(D²u)² − f(x)Du − g(x)u ] dx

for some bounded Ω ⊂ Rⁿ, f, g ∈ C_c^∞(Ω), over the admissible set

A = {u ∈ H_0^2(Ω) : u = 0 on ∂Ω}.


(41) The p-Laplacian operator Δ_p is defined by

Δ_p u = ∇ · ( |∇u|^{p−2} ∇u ).

(a) Find the G-derivative of the functional u ↦ ‖u‖_{L^p}^p.
(b) Consider the p-Laplace equation (for 1 < p < ∞)

−Δ_p u = f,   x ∈ Ω,
u = 0,   x ∈ ∂Ω.

Show that the corresponding variational integral is

J_p[u] = (1/p) ∫_Ω |∇u|^p dx − ∫_Ω f u dx.

(c) Show that Jp is G-differentiable and convex. Deduce it is weakly l.s.c.


(d) Show that the functional J_p[·] admits a unique minimizer over H_0^1(Ω).
(e) Establish a Dirichlet principle between the p−Laplacian equation and its
variational integral.
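As a numerical aside for parts (c) and (d) (with Ω = (0, 1), p = 4, f ≡ 1, and a plain gradient-descent scheme, all choices of this sketch), minimizing a discretization of J_p over grid functions with zero boundary values produces a monotonically decreasing energy, consistent with convexity and the existence of a unique minimizer.

```python
import numpy as np

p, n = 4.0, 50
h = 1.0 / n
f = np.ones(n - 1)                                  # right-hand side f = 1

def Jp(v):
    """Discrete J_p[v] = (1/p)*sum|v'|^p*h - sum(f*v)*h, zero boundary values."""
    w = np.concatenate(([0.0], v, [0.0]))
    du = np.diff(w) / h
    return np.sum(np.abs(du) ** p) * h / p - np.sum(f * v) * h

def grad_Jp(v):
    w = np.concatenate(([0.0], v, [0.0]))
    du = np.diff(w) / h
    flux = np.abs(du) ** (p - 2) * du               # the p-Laplacian flux |u'|^{p-2} u'
    return -np.diff(flux) - f * h                   # discrete -(|u'|^{p-2} u')' - f

u = np.zeros(n - 1)
energies = []
for _ in range(20000):                              # plain gradient descent
    energies.append(Jp(u))
    u = u - 1e-3 * grad_Jp(u)
final = Jp(u)
```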

(42) Find the Euler–Lagrange equation corresponding to the following Lagrangians over {u ∈ C¹ : u = 0 on ∂Ω}.
(a) L(p, z, x) = (1/2)|p|² + F(z) for some nonlinear function F.
(b) L(p, z, x) = (1/2)|p|^q + z^q for some 1 < q < ∞.
(c) L(p, z, x) = (1/(r + 2))|p|^{r+2} + f z, f ∈ C¹.
(43) Show that the functional defined by

J[u] = ∫_{−1}^{1} x √(1 + (u′)²) dx

has no minimizer on C¹[−1, 1].


(44) Determine whether the functional J : C¹[0, 1] −→ R defined by

J[u] = ∫_0^1 √(u² + (u′)²) dx

has a minimizer over A = {u ∈ C¹[0, 1] : u(0) = 0 and u(1) = 1}.
(45) Consider the variational integral J : H_0^1(Ω) −→ R, given by

J[u] = ∫_Ω [ (1/2)|Du|² − f(x)Du ] dx

for some bounded Ω ⊂ Rⁿ, f ∈ C_c^∞(Ω).

(a) Prove that there exists a minimizer over the admissible set

A = {u ∈ H_0^1(Ω) : u = 0 on ∂Ω}.

(b) Find the corresponding Euler–Lagrange equation.


(46) Consider the variational integral J : C[a, b] −→ R, given by

J[u] = ∫_a^b √(1 + (u′)²) dx.

(a) Find the corresponding Euler–Lagrange equation.


(b) Show that the associated bilinear B[u, v] is positive.
(c) Conclude that the line is the shortest distance between two points.
(47) Find the Euler–Lagrange equation corresponding to the quadratic form

Q[u] = ∫ [ (u′)² − u² ] dx.


(48) Determine whether the following functions are jointly convex or not.
(a) L(p, z, x) = z p_i.
(b) L(p, z, x) = |p|² − z.
(c) L(p, z, x) = (1/2)|p|² − z x.
(49) Show that the Poisson variational integral is jointly strictly convex in z and p.
(50) Prove or disprove:
(a) If f(x, y) and g(y, z) are strictly convex functions, then

h(x, y, z) = f(x, y) + g(y, z)

is strictly convex.
(b) If f (x, y) is convex in (x, y) and strictly convex in x and strictly convex in
y, then f is strictly convex.
(c) If f (x, y) is jointly convex in (x, y) and strictly convex in x and strictly
convex in y, then f is jointly strictly convex.
(51) A function f : Rⁿ −→ R is said to be strongly convex if there exists β > 0 such that f(x) − β‖x‖² is convex.
(a) Show that if a function is strongly convex, then it is strictly convex.
(b) Give an example of a function that is strictly convex but not strongly convex.
(c) Show that if a function is strongly convex, then

f(y) ≥ f(x) + ∇f(x) · (y − x) + (β/2)‖y − x‖².

(d) In Theorem 5.10.2, in addition to all the assumptions of the theorem, if L = L(x, p) is strongly convex in p, show that the minimizer predicted by the theorem is unique.
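A small numerical check of parts (b) and (c) (an aside; the functions f(x) = x², g(x) = x⁴, and β = 1 are this sketch's own choices): f satisfies the first-order strong-convexity inequality, while g is strictly convex but g(x) − βx² fails convexity near 0 for every β > 0.

```python
import numpy as np

beta = 1.0
f  = lambda x: x ** 2                      # strongly convex: f(x) - 1*x^2 = 0 is convex
df = lambda x: 2.0 * x

rng = np.random.default_rng(1)
xs = rng.uniform(-5.0, 5.0, 200)
ys = rng.uniform(-5.0, 5.0, 200)

# (c): f(y) >= f(x) + f'(x)(y - x) + (beta/2)|y - x|^2
slack = f(ys) - (f(xs) + df(xs) * (ys - xs) + 0.5 * beta * (ys - xs) ** 2)
worst = np.min(slack)                      # here slack = 0.5*(y - x)^2 >= 0

# (b): g(x) = x^4 is strictly convex, but for every beta > 0 the function
# g(x) - beta*x^2 has second derivative 12*x^2 - 2*beta, which is negative at x = 0.
def second_deriv_at_0(b):
    return 12.0 * 0.0 ** 2 - 2.0 * b
```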
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer 361
Nature Singapore Pte Ltd. 2024
A. Khanfer, Applied Functional Analysis,
https://doi.org/10.1007/978-981-99-3788-2
Index

A Convex hull, 299


Adjoint of general operators, 52 Convex set, 299
Adjoint operator, 5 Convolution of distribution, 125
Adjoint operator on Hilbert space, 7 Cut-off function, 144
Admissible set, 295
Arzela–Ascoli theorem, 9
D
Deficiency spaces, 54
B Delta distribution, 91
Banach–Alaoglu theorem, 303 Delta sequence, 91
Banach space, 3 Densely defined operator, 52
Bessel operator, 78 Diagonal operator, 39
Bessel’s inequality, 4 Diffeomorphism, 188
Bilinear form, 251 Difference quotient, 276
Bolzano–Weierstrass theorem, 296 Directional derivative, 322
Boundary regularity theorem, 287 Direct method, 306
Bounded below, 29 Dirichlet energy, 311
Bounded inverse theorem, 4 Dirichlet integral, 311
Bounded linear operator, 5 Dirichlet principle, 311
Dirichlet problem, 243
Distribution, 83
C Distributional derivative, 97
Caccioppoli’s inequality, 279 Dominated convergence theorem, 2
Caratheodory function, 344 Dual of Sobolev space, 175
Cauchy–Schwartz inequality, 3
Cauchy’s inequality, 252
Chain rule for Sobolev spaces, 171 E
Chebyshev, 78 Eigenfunction, 24
Classical solution, 246 Eigenvalue, 24
Closed graph theorem, 4 Eigenvector, 24
Closed operator, 47 Elliptic bilinear map, 252
Closed range theorem, 48 Elliptic equation, 239
Coercivity, 307 Embedding, 219
Compact embedding, 220 Epigraph, 297
Compact inclusion, 163 Euler–Lagrange equation, 340
Compact operator, 8 Extended Holder’s inequality, 202
Convex function, 299 Extension operator, 193
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer 365
Nature Singapore Pte Ltd. 2024
A. Khanfer, Applied Functional Analysis,
https://doi.org/10.1007/978-981-99-3788-2
366 Index

Extreme value theorem, 296

F
Fatou's Lemma, 2
Finite-rank operator, 13
First variation, 338
Fréchet derivative, 325
Fredholm alternative, 43
Fredholm alternative for elliptic operators, 270
Fredholm operator, 44
Functions of slow growth, 113
Fundamental lemma of calculus of variations, 148

G
Gagliardo–Nirenberg–Sobolev inequality, 204
Gårding's inequality, 253
Gâteaux derivative, 324
Gâteaux differential, 323
Gaussian function, 119
Generalized Plancherel theorem, 154
Green's function, 63

H
Heine–Borel theorem, 302
Helmholtz equation, 242
Higher order interior regularity theorem, 286
High-order Sobolev estimate, 225
Hilbert–Schmidt operator, 15
Hilbert–Schmidt theorem, 35
Hilbert space, 3
Hölder-continuous function, 210
Hölder's inequality, 2
Hölder space, 211

I
Inclusion map, 219
Infimum, 296
Inner product space, 3
Interior regularity theorem, 283
Interior smoothness theorem, 287
Interpolation inequality, 203
Invariant subspace, 34

J
Jointly convex function, 346

K
Kakutani's theorem, 303
Kronecker delta function, 62

L
Lagrangian integral, 338
Laguerre operator, 78
Laplace equation, 241
Laplacian operator, 64
Lax–Milgram theorem, 261
Lebesgue space, 1
Legendre operator, 78
Lipschitz domain, 187
Locally finite cover, 146
Locally integrable function, 84
Local Sobolev space, 163
Lower semicontinuous, 297

M
Mazur's lemma, 304
Mazur's theorem, 302
Meyers–Serrin theorem, 177
Minimization problem, 295
Minimizer, 295
Minimizing sequence, 306
Minkowski's inequality, 2
Mollifier, 141
Momentum operator, 69
Monotone convergence theorem, 2
Morrey's inequality, 213
Multidimensional Fourier transform, 102

N
Nested inequality, 203
Neumann series, 32
Normed space, 1

O
Open mapping theorem, 4

P
Parseval's identity, 4
Partition of unity, 146
Plancherel theorem, 104
Poincaré inequality, 207
Poincaré norm, 249
Poincaré–Wirtinger inequality, 249
Poisson equation, 241
Proper function, 307

Q
Quotient Sobolev spaces, 250

R
Radon–Riesz property, 301
Rapidly decreasing function, 106
Reflexive space, 302
Regular distribution, 85
Regular value, 28
Rellich–Kondrachov theorem, 222
Resolvent, 28
Riesz–Fischer theorem, 2
Riesz representation theorem for Hilbert space, 255
Riesz's lemma, 2

S
Schwartz space, 107
Self-adjoint operator, 8
Sequentially lower semicontinuous, 297
Singular distribution, 87
Smooth domain, 187
Smooth functions, 82
Sobolev conjugate, 200
Sobolev embedding theorem, 226
Sobolev exponent, 202
Sobolev's inequality, 208
Sobolev space, 156
Spectral mapping theorem, 33
Spectral theorem for self-adjoint compact operators, 39
Spectral theorem of elliptic operator, 274
Spectrum, 29
Strictly convex, 300
Strongly diffeomorphism, 189
Strong solution, 246
Sturm–Liouville operator, 67
Subordinate, 146

T
Tempered distribution, 113
Test function, 83
Toeplitz theorem, 51

U
Uniform bounded principle, 4
Uniformly elliptic operator, 240
Upper semicontinuous, 297

V
Variational integral, 311
Variational problem, 295
Volterra equation, 45

W
Weak derivative, 134
Weak formulation of elliptic equation, 245
Weakly bounded set, 302
Weakly closed, 301
Weakly closed set, 301
Weakly compact, 302
Weakly compact set, 302
Weakly convergence, 301
Weakly differentiable, 134
Weakly lower semicontinuous, 303
Weakly sequentially closed set, 301
Weakly sequentially compact set, 302
Weak solution, 245
Weak topology, 301
Weierstrass's example, 314
Weyl's lemma, 275

Z
Zero-boundary Sobolev space, 166, 179
Zero extension, 182
