Analysis 1
Alessio Figalli
Preface
Welcome to ETH Zürich and to your exploration of these lecture notes. Originally crafted
in German for the academic year 2016/2017 by Manfred Einsiedler and Andreas Wieser, these
notes were designed for the Analysis I and II courses in the Interdisciplinary Natural Sciences,
Physics, and Mathematics Bachelor programs. In the academic year 2019/2020, a substantial
revision was undertaken by Peter Jossen.
For the academic year 2023/2024, Alessio Figalli has developed this English version. It
differs from the German original in several aspects: reorganization and alternative proofs
of some materials, extensive rewriting and expansion in certain areas, and a more concise
presentation. This version strictly aligns with the material presented in class, offering a
streamlined educational experience.
The courses Analysis I/II and Linear Algebra I/II are fundamental to the mathematics
curriculum at ETH and other universities worldwide. They lay the groundwork upon which
most future studies in mathematics and physics are built.
Throughout Analysis I/II, we will delve into various aspects of differential and integral
calculus. Although some topics might be familiar from high school, our approach requires
minimal prior knowledge beyond an intuitive understanding of variables and basic algebraic
skills. Contrary to high-school methods, our lectures emphasize the development of mathemat-
ical theory over algorithmic practice. Understanding and exploring topics such as differential
equations and multidimensional integral theorems is our primary goal. However, students are
encouraged to engage with numerous exercises from these notes and other resources to deepen
their understanding and proficiency in these new mathematical concepts.
Contents
1 Introduction
1.1 Quadrature of the Parabola
1.2 Tips on Studying
Chapter 1
Introduction
This area was already determined by Archimedes (ca. 287–ca. 212 BCE) in the 3rd century BCE; it was the first curvilinearly bounded area ever computed. For the area calculation, let us assume that we
know what the symbols in the definition in equation (1.1) mean and that P describes the area
in the following figure. In particular, we assume for the moment that we already know the
set of real numbers R.
Of course, calculating the area of P is not a challenge if we use integrals and the associated
calculation rules. However, we do not want to assume we know the integral calculus. Strictly
speaking, we must ask ourselves the following fundamental question before calculating:
What is an area?
If we cannot answer this question exactly, then we cannot know what it means to calculate
the area of P . Therefore, we qualify our goal in the following way:
1.1 Quadrature of the Parabola
Proposition 1.1. — Suppose there is a notion of area in R2 that satisfies the following
properties:
3. For sets F, G in R2 without common points, the area of the union F ∪ G is the sum of
the areas of F and G.
In other words, we have left open the question of whether there is a notion of area and for
what areas it is defined, but we want to show that 1/3 is the only reasonable value for the area
of P .
For the proof of Proposition 1.1 we need a lemma (also called an “auxiliary theorem”):
For every natural number n ≥ 1, we have
\[
1^2 + 2^2 + \cdots + (n-1)^2 + n^2 \;=\; \frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6}. \tag{1.2}
\]
Proof. We perform the proof using induction. For n = 1, the left-hand side of equation (1.2)
is equal to 1 and the right-hand side is equal to 1/3 + 1/2 + 1/6 = 1. So equation (1.2) is true for
n = 1. This part of the proof is called the beginning of induction.
Suppose we already know that equation (1.2) holds for the natural number n. We now
want to show that it follows that equation (1.2) also holds for n + 1. The left-hand side of
equation (1.2), for (n + 1) instead of n, is given by
\[
1^2 + 2^2 + \cdots + n^2 + (n+1)^2 = \frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6} + (n+1)^2 = \frac{n^3}{3} + \frac{3n^2}{2} + \frac{13n}{6} + 1,
\]
where, in the first equality, we have used the validity of equation (1.2) for the number n. The
right-hand side of equation (1.2), for (n + 1) instead of n, is given by
\[
\frac{(n+1)^3}{3} + \frac{(n+1)^2}{2} + \frac{n+1}{6} = \frac{n^3}{3} + \frac{3n^2}{2} + \frac{13n}{6} + 1.
\]
This shows that the left and right sides of equation (1.2) also agree for n + 1. This part of
the proof is called the induction step.
It follows that equation (1.2) is true for n = 1 due to the validity of the beginning of the
induction. Therefore, it is also true for n = 2 due to the induction step, and for n = 3 again
due to the induction step. Continuing in this way, we obtain (1.2) for any natural number.
We say that equation (1.2) follows by means of induction for all natural numbers n ≥ 1.
Furthermore, we indicate the end of the proof with a small square.
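As a quick sanity check (not part of the original notes), the identity (1.2) can also be verified numerically; the sketch below uses Python's exact Fraction arithmetic, an assumption of this illustration, to compare both sides:

```python
from fractions import Fraction

def sum_of_squares(n):
    """Left-hand side of (1.2): 1^2 + 2^2 + ... + n^2."""
    return sum(k * k for k in range(1, n + 1))

def closed_form(n):
    """Right-hand side of (1.2): n^3/3 + n^2/2 + n/6, computed exactly."""
    n = Fraction(n)
    return n ** 3 / 3 + n ** 2 / 2 + n / 6

# Both sides agree for every tested n, as the induction proof guarantees.
for n in range(1, 101):
    assert sum_of_squares(n) == closed_form(n)
```

Exact rational arithmetic avoids floating-point rounding, so the check confirms the identity on the tested range rather than an approximation.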
Proof of Proposition 1.1. We assume that there is a notion of area with the properties
in the proposition and that it is defined for P . Suppose that I is the area of P . We
cover P for a given natural number n ≥ 1 with rectangles whose base has length 1/n, as in
Figure 1.1 on the left.
Note that the straight-line segments where the rectangles touch have area 0, and we may
ignore them.
If, on the other hand, we use rectangles as in Figure 1.1 on the right, we also get
\[
\begin{aligned}
I &\ge \frac{1}{n}\cdot\frac{0}{n^2} + \frac{1}{n}\cdot\frac{1^2}{n^2} + \cdots + \frac{1}{n}\cdot\frac{(n-1)^2}{n^2} \\
&= \frac{1}{n^3}\bigl(1^2 + \cdots + (n-1)^2\bigr) \\
&= \frac{1}{n^3}\bigl(1^2 + \cdots + (n-1)^2 + n^2 - n^2\bigr) \\
&= \frac{1}{n^3}\Bigl(\frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6} - n^2\Bigr) \\
&= \frac{1}{3} - \frac{1}{2n} + \frac{1}{6n^2}.
\end{aligned}
\]
So in summary
\[
-\frac{1}{2n} + \frac{1}{6n^2} \;\le\; I - \frac{1}{3} \;\le\; \frac{1}{2n} + \frac{1}{6n^2} \tag{1.3}
\]
for all natural numbers n ≥ 1. The only number that satisfies this for all natural numbers
n ≥ 1 is 0. Therefore, I − 1/3 = 0 and the proposition follows.
To rigorously prove the statement above, one needs to show that the only real number
satisfying (1.3) for all n ≥ 1 is 0. This is intuitively clear: indeed, taking n larger and larger,
the two expressions 1/(2n) + 1/(6n²) and −1/(2n) + 1/(6n²) get smaller and smaller. However, we cannot give
a proof of it at this point, as we lack a rigorous definition of the real numbers.
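Numerically, the lower and upper rectangle sums from the proof indeed squeeze the value 1/3; the following sketch (plain Python floats, illustrative only) reproduces both sums:

```python
def lower_sum(n):
    # Rectangles below the parabola: sum_{k=0}^{n-1} (1/n) * (k/n)^2
    return sum((k / n) ** 2 / n for k in range(n))

def upper_sum(n):
    # Rectangles above the parabola: sum_{k=1}^{n} (1/n) * (k/n)^2
    return sum((k / n) ** 2 / n for k in range(1, n + 1))

for n in [10, 100, 1000]:
    assert lower_sum(n) <= 1 / 3 <= upper_sum(n)

# The gap between the two sums is exactly 1/n, so it shrinks as n grows,
# mirroring the two-sided bound (1.3).
assert abs(upper_sum(1000) - lower_sum(1000) - 1 / 1000) < 1e-9
```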
As already mentioned, we have not answered the question of whether there is a notion of
area for sets in R2 . Nor have we described precisely what domains in R2 are, but we have
implicitly assumed that domains are those subsets of R2 to which we can assign an area. The
notions of the Riemann and Lebesgue integrals and measurable sets answer these fundamental
questions.
The idea of the proof is illustrated in the following applet.
Applet 1.3 (Estimating an area). We use up to 1000 rectangles to estimate the area from
below and from above. In the proof below, however, we will use an unlimited number of
rectangles and can thus determine the area exactly without any fuzziness.
Note that in the previous examples, we informally used the notion of “set”. Here, for
completeness, we give a more precise definition.
(4) Every statement A about elements of a set X defines the set of elements in X for
which the statement A is true; one writes {x ∈ X | A is true for x}.
The empty set, written as ∅ (or sometimes also {}), is the set containing no elements.
For example, we can write
{n ∈ Z | ∃ m ∈ Z : n = 2m}
to describe the set of even numbers. The symbol ∃ means “there exists”, while the symbol ∀
means “for all”. The symbols “ | ” and “:” in the formula above both mean “such that”. With
time, this mathematical terminology will become familiar; one just needs practice.
You will notice a big difference between school mathematics and university mathematics. The
latter also uses its own language, which you will have to learn. The sooner you take this on,
the more you will take away from the lectures. This brings us to the next tip.
You cannot learn mathematics by watching it; just as you cannot learn tennis or skiing by
watching all available tournaments or world championships on television. Rather, you should
learn mathematics like a language, and a language is taught by using it. Discuss the topics of
the lectures with colleagues. Explain to each other the proofs from the lecture or the solution
of the exercise examples. Above all, solve as many exercises as possible; this is the only way
to be sure that you have mastered the topics.
It is fine to work on the exercises in small groups. This even has the advantage that the
group discussions make the objects of the lectures more lively. However, you should ensure
that you fully understand the solutions, explain them and subsequently solve similar problems
on your own.
“He who asks is a fool for a minute. He who does not ask is a fool all his life.”
Confucius
Ask as many questions as you can and then ask them when they come up. Probably many of
your colleagues have the same question, or have not even noticed the problem. This allows the
lecturer or teaching assistant to simultaneously fix a problem for many and identify problems
in students where she or he thought none existed. Furthermore, good question formulation
needs to be practiced; the first year is the ideal time to do this.
Interlude: Groups
A group is a (non-empty) set G endowed with an operation “⋆” that satisfies the following properties:
• (Associativity) for all a, b, c ∈ G we have
(a ⋆ b) ⋆ c = a ⋆ (b ⋆ c).
• (Neutral element) there exists a neutral element, i.e., e ∈ G such that for all
a ∈ G we have
a ⋆ e = e ⋆ a = a.
• (Inverse element) for each element a ∈ G there is an inverse element, i.e., a−1 ∈ G
such that
a ⋆ a−1 = a−1 ⋆ a = e.
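As an illustration (this finite example is mine, not from the notes), the set {0, 1, 2, 3, 4} with addition modulo 5 satisfies all three group axioms, and a short Python script can verify them exhaustively:

```python
# Addition modulo 5 on G = {0, 1, 2, 3, 4}: a finite illustrative group.
G = range(5)

def op(a, b):
    return (a + b) % 5

# Associativity: (a * b) * c == a * (b * c) for all triples.
assert all(op(op(a, b), c) == op(a, op(b, c)) for a in G for b in G for c in G)

# Neutral element: e = 0.
assert all(op(a, 0) == op(0, a) == a for a in G)

# Inverse element: the inverse of a is (5 - a) % 5.
assert all(op(a, (5 - a) % 5) == 0 for a in G)
```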
Example 2.1. — To make these concepts easier to access, let us assume for the moment
that we already know the natural numbers N and the integers Z. We check whether they
satisfy the above properties if we replace ⋆ with the operations you are already familiar with.
1. Consider the natural numbers N = {0, 1, 2, 3, ...} with the usual addition + that you
probably know since primary school. This operation is associative,
(k + l) + m = k + (l + m),
and 0 is a neutral element,
0 + n = n + 0 = n,
but
• no element apart from 0 has an inverse element. In fact, the inverse element of
n ∈ N\{0} would be −n, which is not included in the natural numbers N.
2. The same arguments show that the integers Z = {..., −3, −2, −1, 0, 1, 2, 3, ...}, again
with the addition, form a group. Moreover, this is a commutative group, since for all
integers n, m ∈ Z we have
n + m = m + n.
3. Consider the set Q \ {0} of non-zero rational numbers with the usual multiplication · between numbers. In this case, one can check that the
multiplication is associative and commutative, the neutral element is 1, and the inverse
of p/q is q/p. Hence, this is a commutative group.
2.2. — It follows directly from the definition of the neutral element that it is unique.
Indeed, assume that additionally to e ∈ G, we have a second element e′ with the property
such that e′ ⋆ a = a ⋆ e′ = a for all elements a ∈ G. Then, we can choose a = e and obtain
e = e ⋆ e′ = e′,
where the first equality follows from the fact that e′ is neutral, while in the second equality
we used that e is neutral.
We can thus speak of the neutral element of a group.
In the same spirit, assume that for an element a ∈ G, there exist two inverse elements a−1
and ã−1. Then, using associativity, we observe that
ã−1 = ã−1 ⋆ e = ã−1 ⋆ (a ⋆ a−1) = (ã−1 ⋆ a) ⋆ a−1 = e ⋆ a−1 = a−1.
So also for inverse elements, we may speak of the inverse element. In particular, since
a ⋆ a−1 = e, we deduce that a is the inverse of a−1, thus (a−1)−1 = a.
a · (b + c) = a · b + a · c
and
(a + b) · c = a · c + b · c.
Example 2.3. — Let us continue with our examples. We have already established that the
integers Z form a commutative group, but are they also a ring with the usual multiplication?
We must check:
• Associativity of the multiplication: for all integers k, l, m ∈ Z, we have
(k · l) · m = k · (l · m).
• Neutral element for the multiplication: The neutral element for the multiplication is
1 ∈ Z as, for all integers k ∈ Z, we have
1 · k = k · 1 = k.
• Distributivity: for all integers k, l, m ∈ Z, we have
k · (l + m) = k · l + k · m
and
(k + l) · m = k · m + l · m.
Example 2.4. — The set of rational numbers Q = {p/q | p, q ∈ Z, q ≠ 0} with the usual
addition and multiplication is a field.
2.5. — Before going on, we look at some immediate consequences of the definition of field.
In the current notation, −a denotes the inverse of a with respect to the addition, while a−1
is the inverse of a with respect to the multiplication. Note that, in the current context, (2.1)
implies that
−(−a) = a, and (a−1 )−1 = a whenever a ̸= 0. (2.2)
(i) 0 · a = 0 and a · 0 = 0.
Proof: Since 0 is the neutral element for the addition, we have 0 = 0 + 0. Hence, using
distributivity, we get
0 · a = (0 + 0) · a = (0 · a) + (0 · a).
Adding −0 · a (i.e., the inverse of 0 · a for the addition), we deduce that 0 · a = 0. The
case of a · 0 is analogous.
(iii) (−a) · (−b) = a · b and, whenever a ≠ 0, (−a)−1 = −(a−1).
Proof: Using distributivity, we have
a · b + a · (−b) = a · (b + (−b)) = a · 0 = 0,
hence
a · b = −(a · (−b)).
On the other hand, applying (ii) with (−b) instead of b, we also have
(−a) · (−b) = −(a · (−b)).
Combining the two identities above, we conclude that (−a) · (−b) = a · b. Finally, taking
b = a−1 yields (−a) · (−(a−1)) = a · a−1 = 1, which gives the second assertion.
Remark 2.6. — A natural question one may ask is the following: Is it possible to construct
a field K where 0 (i.e., the neutral element for +) and 1 (i.e., the one for ·) are equal?
Assume that 0 = 1. Then, using (i) above and the fact that 1 is the neutral element for
multiplication, we get
0=a·0=a·1=a
for every a ∈ K. So, the only possibility for having 0 = 1 is that K consists of the single
element 0. From now on, we shall assume that K always contains at least two elements, so in
particular, 0 and 1 cannot coincide.
Next, we introduce the second ingredient of an ordered field, the order relation. Again we
will do so in steps:
X × Y = {(x, y) | x ∈ X, y ∈ Y }.
Interlude: Subsets
Let P and Q be sets.
• We say that P is subset of Q, and write P ⊂ Q (or P ⊆ Q), if for all x ∈ P also
x ∈ Q holds.
For instance, {x, y} = {z} holds if x = y = z. Note that there are no “multiplicities” for
elements of a set (for instance, {x, x, x} = {x}).
Interlude: Relations
Let X be a set. A relation on X is a subset R ⊂ X × X, that is, a list of ordered
pairs of elements of X. We also write xRy if (x, y) ∈ R and often use symbols such as
<, ≪, ≤, ≅, ≡, ∼ for the relations.
If ∼ is a relation, we write “x ̸∼ y” if “x ∼ y” does not hold. A relation ∼ is called:
Example 2.8. — We look again at the integers Z and two examples of relations on them.
Let m, n, p ∈ Z be integers.
• Consider the relation ≤ of being “less than or equal to", i.e., we write n ≤ m if n is less
than or equal to m. Then we see that this relation is:
• Consider next the relation < of being “strictly smaller than", i.e., we write n < m if n
and m are distinct integers and n is less than m. This relation is:
We conclude that < is neither an equivalence relation nor an order relation, since it does
not satisfy the reflexivity property.
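These properties can be checked mechanically on a finite window of Z; the following sketch (the window {−3, …, 3} is an illustrative choice of mine) tests reflexivity and transitivity for both relations:

```python
X = range(-3, 4)  # a finite window of Z, chosen for illustration

def leq(n, m):
    return n <= m

def lt(n, m):
    return n < m

def reflexive(R):
    return all(R(x, x) for x in X)

def transitive(R):
    return all(not (R(x, y) and R(y, z)) or R(x, z)
               for x in X for y in X for z in X)

# "less than or equal to" is reflexive and transitive.
assert reflexive(leq) and transitive(leq)
# "strictly smaller than" is transitive but fails reflexivity.
assert transitive(lt) and not reflexive(lt)
```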
x ≤ y =⇒ x + z ≤ y + z.
0 ≤ x and 0 ≤ y =⇒ 0 ≤ x · y.
The following terminology is standard and will be used throughout these lecture notes:
• Analogously, we define x > y when y < x, and say “x is greater than y” or “x is strictly
greater than y”.
We often use these symbols in “equidirectional chains”; for example, x ≤ y < z = a stands for “x ≤ y and y < z and z = a”.
Example 2.10. — A well-known example of an ordered field is the one of rational numbers
Q, together with the usual order relation given by
p/q ≤ p′/q′ ⟺ pq′ ≤ p′q, for p, p′ ∈ Z and q, q′ ∈ N.
Here on the right-hand side is the order on the integers, which we assume to be known.
(l) If x + y ≤ x + z, then y ≤ z.
Proof: Exercise 2.12.
Exercise 2.12. — Prove the inferences (k),(l),(m). What happens in (m) when you drop
the condition x > 0, that is, when x < 0 or x = 0? For some of the above inferences, formulate
and prove similar versions for the strict relation “<”.
2.13. — Let (K, ≤) be an ordered field. As usual, we write 2, 3, 4, . . . for the elements of
K given by 2 = 1 + 1, 3 = 2 + 1, et cetera. By the compatibility of + and ≤ in Definition 2.9,
and recalling property (g) in Paragraph 2.11, the inequalities
0 < 1 < 2 < 3 < · · ·
hold in K. In particular, the elements . . . , −2, −1, 0, 1, 2, 3, . . . of K are all distinct. We identify the set Z of integers with a subset of K. That is, we call the elements {. . . , −2, −1, 0, 1, 2, 3, . . .}
of K “integers”. Consequently, we call the elements {pq−1 | p, q ∈ Z, q ≠ 0} in K “rational numbers” and identify them with Q, so that
Z ⊊ Q ⊆ K.
In other words, if (K, ≤) is an ordered field, then it always includes a copy of the rationals
inside it.
The above axioms, inferences, and statements in the exercises represent the usual properties
for inequalities. We can also use them to solve problems like the one in the following exercise:
{x ∈ R \ {0} | x + 3/x + 4 ≥ 0} = {x ∈ R \ {0} | −3 ≤ x ≤ −1 or x > 0}.
Hint: note that x + 3/x + 4 = (x + 3)(x + 1)/x.
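The claimed equality of the two sets can be sampled numerically; this sketch (floating-point sampling on a grid, an illustrative assumption, not a proof) compares membership on both sides:

```python
def lhs(x):
    # Membership in the left-hand set: x + 3/x + 4 >= 0
    return x + 3 / x + 4 >= 0

def rhs(x):
    # Membership in the right-hand set: -3 <= x <= -1 or x > 0
    return (-3 <= x <= -1) or x > 0

# Compare the two conditions on a grid of nonzero sample points.
xs = [k / 10 for k in range(-100, 101) if k != 0]
assert all(lhs(x) == rhs(x) for x in xs)
```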
Interlude: Functions
A function f from a set X to a set Y is a map that assigns to each x ∈ X a uniquely
determined element y = f (x) ∈ Y . We write f : X → Y for a function from X to Y
and sometimes also speak of a mapping or a transformation. We refer to the set X
as domain, and the set Y as domain of values or codomain.
The set F = {(x, f (x)) | x ∈ X} is called the graph of f . In the context of a function
f : X → Y , an element x of the domain of definition is also called argument, and an
element y = f (x) ∈ Y assumed by the function is also called value of the function. If
f : X → Y is a function, one also writes
f : X → Y, x ↦ f(x),
Definition 2.15. — Let (K, ≤) be an ordered field. The absolute value or modulus on
K is the function | · | : K → K given by
|x| = x if x ≥ 0, and |x| = −x if x < 0.
2.16. — In what follows, let (K, ≤) always be an ordered field, and x, y, z, w denote
elements from K.
|x + y| ≤ |x| + |y|.
Proof: Note that, by (e), we have −|x| ≤ x ≤ |x| and −|y| ≤ y ≤ |y|. Adding these two
inequalities, we get
−(|x| + |y|) ≤ x + y ≤ |x| + |y|.
By property (e), this gives |x + y| ≤ |x| + |y|. For the inverse triangle inequality, writing
|x| = |(x − y) + y| ≤ |x − y| + |y| leads to |x| − |y| ≤ |x − y|. Exchanging x and y, we get |y| − |x| ≤ |x − y|. So, again by property
(e), ||x| − |y|| ≤ |x − y|, as desired.
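Both inequalities can be spot-checked on random real inputs; the following sketch (random sampling, illustrative only, not part of the notes) exercises the triangle and inverse triangle inequalities:

```python
import random

random.seed(0)  # reproducible sampling
for _ in range(1000):
    x = random.uniform(-10.0, 10.0)
    y = random.uniform(-10.0, 10.0)
    # Triangle inequality: |x + y| <= |x| + |y|
    assert abs(x + y) <= abs(x) + abs(y) + 1e-12
    # Inverse triangle inequality: ||x| - |y|| <= |x - y|
    assert abs(abs(x) - abs(y)) <= abs(x - y) + 1e-12
```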
Exercise 2.17. — For which x, y ∈ R does equality hold in the triangle inequality? And
in the inverse triangle inequality?
(V) Let X, Y be non-empty subsets of K such that for all x ∈ X and y ∈ Y the
inequality x ≤ y holds. Then there exists c ∈ K lying between X and Y , in the
sense that for all x ∈ X and y ∈ Y the inequality x ≤ c ≤ y holds.
We call statement (V) the completeness axiom.
2.20. — We will often visualise the real numbers as the points on a straight line, which is
why we also call it the number line.
We interpret the relation x < y for x, y ∈ R as “on the straight line, the point y lies to the
right of the point x”. What does the completeness axiom mean in this picture?
Let X, Y be non-empty subsets of R such that for all x ∈ X and all y ∈ Y the inequality
x ≤ y holds. Then all elements of X are to the left of all elements of Y as in the following
figure.
So, according to the completeness axiom, there exists a number c that lies in between. The
existence of the number c is, in a sense, an assurance that R has no “gaps”. It is advisable to
visualize definitions, statements, and their proofs on the number line. However, the number
line should always be used only as a motivation and to develop a good intuition, but not for
rigorous proof.
f (A) = {y ∈ Y | ∃ x ∈ A : f (x) = y}
and call this subset of Y the image of A under the function f . For a subset B ⊂ Y we
write
f −1 (B) = {x ∈ X | ∃ y ∈ B : f (x) = y}
Example 2.24. — Let X, Y be two finite sets with the same number of elements (for
example, X and Y could be the same set). Then, for a function f : X → Y , injectivity and
surjectivity are equivalent.
To show this, assume that X and Y have n elements and write X = {x1 , . . . , xn }. Suppose
first that f is injective. Then all the elements f (xi ) are distinct, which means that the set
f (X) = {f (x1 ), . . . , f (xn )} also has n elements. Since f (X) is a subset of Y and Y has n
elements, the only option is that f (X) = Y . This proves that injectivity implies surjectivity.
Conversely, to show that surjectivity implies injectivity, we prove that if f is not injective
then f is not surjective. So, assume there exist at least two elements xi ̸= xj such that
f (xi ) = f (xj ). This means that f (X) has at most n − 1 elements, so f cannot be surjective.
Remark 2.25. — For infinite sets, injectivity and surjectivity are not necessarily equivalent.
Consider for instance the functions f1 , f2 : N → N defined as
f1(n) = n + 1,    f2(n) = 0 if n = 0, and f2(n) = n − 1 if n ≥ 1.
Then f1 is injective but not surjective, while f2 is surjective but not injective.
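On a finite truncation of N (the truncation is an artifice of this illustration; the statement itself concerns all of N), the asymmetry between f1 and f2 is easy to observe:

```python
def f1(n):
    return n + 1

def f2(n):
    return 0 if n == 0 else n - 1

N = range(100)  # finite truncation of the natural numbers

# f1 is injective (all values distinct) but misses 0, so it is not
# surjective onto N.
values_f1 = {f1(n) for n in N}
assert len(values_f1) == 100 and 0 not in values_f1

# f2 is not injective (0 and 1 share the value 0), yet it hits every
# value below 99 on this truncation.
assert f2(0) == f2(1) == 0
assert set(range(99)) <= {f2(n) for n in N}
```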
Exercise 2.27. — In this exercise, we show the existence and uniqueness of a bijective
function √· : R≥0 → R≥0 with the property (√a)² = a for all a ∈ R≥0.
2. Use Step 1 to deduce that, for every a ∈ R≥0 , there can exist at most one element
c ∈ R≥0 satisfying c2 = a.
3. Given a ∈ R≥0, consider the sets X = {x ∈ R≥0 | x² ≤ a} and Y = {y ∈ R≥0 | y² ≥ a},
and apply the completeness axiom to find c ∈ R with x ≤ c ≤ y for all x ∈ X and
y ∈ Y. Prove that c ∈ X and c ∈ Y to conclude that both c² ≤ a and c² ≥ a hold, thus
c² = a.
Hint: If by contradiction c ∉ X (that is, c² > a), then one can find a suitably small
real number ε > 0 such that (c − ε)² ≥ a. Thus c − ε ∈ Y, which contradicts y ≥ c for
every y ∈ Y. The case of c ∉ Y is analogous.
We call square root function the function √· : R≥0 → R≥0 that assigns to each a ∈ R≥0
the number c ∈ R≥0 uniquely determined by the above construction. We note that c² = a,
and we call c = √a the square root of a. Show that:
4. The function √· is increasing: for x, y ∈ R≥0 with x < y, the inequality √x < √y holds.
5. The function √· : R≥0 → R≥0 is bijective.
6. For all x, y ∈ R≥0, √(xy) = √x √y.
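The construction in Exercise 2.27 pins √a between the numbers whose square is at most a and those whose square is at least a. A bisection sketch in Python (the name sqrt_bisect and the step count are my own choices, not from the notes) mirrors that idea numerically:

```python
def sqrt_bisect(a, steps=60):
    """Approximate the unique c >= 0 with c^2 = a by repeatedly halving the
    gap between a point with square <= a and a point with square >= a."""
    lo, hi = 0.0, max(1.0, a)  # lo^2 <= a and hi^2 >= a
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid <= a:
            lo = mid  # mid's square is still <= a
        else:
            hi = mid  # mid's square exceeds a
    return (lo + hi) / 2

assert abs(sqrt_bisect(2.0) ** 2 - 2.0) < 1e-9
assert abs(sqrt_bisect(9.0) - 3.0) < 1e-9
```

The shrinking pair (lo, hi) plays the role of the sets X and Y: completeness of R is what guarantees the two sides close in on a single number c.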
Exercise 2.28. — For all x ∈ R, show that x² = |x|² and √(x²) = |x|.
2.29. — In summary, in a field of real numbers as defined in Definition 2.19, the usual
arithmetic rules and equation transformations work, although (as usual) division by zero is
not defined. Furthermore, the relations ≤ and < satisfy the usual transformation laws for
inequalities. In particular, when multiplying by negative numbers, the inequalities must be
reversed. We will use these laws in the following without reference. We will see the deep
meaning of the completeness axiom when we use it for further statements. In particular, until
further notice, we will always refer to it when we use it.
It is not clear for the moment that there is indeed a field of real numbers as in Definition
2.19. The fact that we occasionally even speak of the real numbers stems from the fact that,
up to certain identifications, there is only one complete ordered field. In this course,
we assume, in agreement with your high school experience, that a field of real numbers exists
and is unique.
2.1.3 Intervals
[a, b] = {x ∈ R | a ≤ x ≤ b}.
• The unbounded closed intervals
2.31. — The intervals (a, b], [a, b), (a, b) for a, b ∈ R are non-empty exactly when a < b,
and [a, b] is non-empty exactly when a ≤ b. If the interval is non-empty, a is called the left
endpoint, b is called the right endpoint and b − a is called the length of the interval.
Intervals of the kind [a, b], (a, b], [a, b), (a, b) for a, b ∈ R are also called bounded intervals
if we want to distinguish them from the unbounded intervals.
Instead of round brackets, inverted square brackets are sometimes used to denote open and
half-open intervals. For example, instead of (a, b) for a, b ∈ R, one can also find ]a, b[ in the
literature. We will always use round brackets here.
P ∩ Q = {x | x ∈ P and x ∈ Q}
P ∪ Q = {x | x ∈ P or x ∈ Q}
P \ Q = {x | x ∈ P and x ∉ Q}
P △Q = (P ∪ Q) \ (P ∩ Q).
These definitions are illustrated in the following pictures. Sketches of this kind are called
Venn diagrams.
If it is clear from the context that all sets under consideration are subsets of a given basic
set X, then the complement P c of P is defined by P c = X \ P .
2. When is a union of two intervals an interval again? In this case, what happens when
you unite two intervals of the same type (open, closed, half-open)?
2.34. — For example, both [−1, 1] and Q ∪ [−1, 1] are neighbourhoods of 0 ∈ R (since they
both contain, for instance, (−1/2, 1/2)), but [0, 1] is not a neighbourhood of 0.
We note further that, for δ > 0 and x ∈ R, the δ-neighbourhood of x is given by {y ∈
R | |x − y| < δ}. We will interpret |x − y| as the distance from x to y. In terms of “distance”,
a few of the above inferences can be re-expressed more intuitively. For example, property (a)
in Paragraph 2.16 implies that, for x, y ∈ R, the equality |x − y| = | − (x − y)| = |y − x| holds.
In other words, the distance from x to y is equal to the distance from y to x.
2.36. — Open intervals are open, closed intervals are closed. Intuitively, a subset is open
if, for any point x in the set, all points close enough to x are also in the set. Contrary to
conventional usage, “open” is not the opposite of “closed”.
The sets ∅ and R are both open in R. Hence, they are also closed since ∅ = Rc and R = ∅c .
We note that Q ⊆ R and [a, b) ⊂ R are neither open nor closed.
Exercise 2.37. — Show that a subset U ⊆ R is open exactly if, for every element x ∈ U ,
there exists δ > 0 such that (x − δ, x + δ) ⊆ U .
Exercise 2.38. — Let U be a family of open sets, and F be a family of closed subsets of
R. Show that the union ⋃_{U ∈ U} U is open and that the intersection ⋂_{F ∈ F} F is closed.
C = R2 = {(x, y) | x, y ∈ R}.
We call elements z = (x, y) ∈ C complex numbers, and will write them in the form z = x + iy,
where the symbol i is called the imaginary unit. Note that in this identification, the symbol
+ is, for the time being, to be understood as a substitute for the comma. The number x ∈ R is
called the real part of z and one writes x = Re(z); the number y ∈ R is the imaginary part
of z and one writes y = Im(z). The elements of C with imaginary part 0 are also called real,
and the elements with real part 0 are called purely imaginary. Via the injective mapping
x ∈ R 7→ x + i0 ∈ C we identify R with the subset of real elements of C.
The graphical representation of the set C is called the complex plane or also Gaussian
number plane. From this geometric point of view, the set of real points is called the real
axis and the set of purely imaginary points is called the imaginary axis.
As you might expect from previous knowledge, i should correspond to a square root of
−1. Hence, we want to define an addition and a multiplication on the set C so that the set C
together with these operations is a field in which i² = −1 holds. Note that, if i² = −1, then
it follows from commutativity and distributivity that
(x1 + iy1)(x2 + iy2) = (x1x2 − y1y2) + i(x1y2 + y1x2).
Proof. We review the axioms of fields. The associativity and commutativity of the addition,
and the fact that (0, 0) is a neutral element for the addition, are direct consequences of the
corresponding properties of the addition of real numbers. The inverse element of (x, y) for
the addition is given by (−x, −y), since
Proving the properties of multiplication requires a little more effort. We start with associa-
tivity of multiplication: let (x1 , y1 ), (x2 , y2 ), and (x3 , y3 ) be elements of C. Now calculate
((x1, y1) · (x2, y2)) · (x3, y3) = (x1x2 − y1y2, x1y2 + y1x2) · (x3, y3)
= (x1x2x3 − y1y2x3 − x1y2y3 − y1x2y3, x1y2x3 + y1x2x3 + x1x2y3 − y1y2y3).
Analogously, we calculate
(x1, y1) · ((x2, y2) · (x3, y3)) = (x1, y1) · (x2x3 − y2y3, x2y3 + y2x3)
= (x1x2x3 − y1y2x3 − x1y2y3 − y1x2y3, x1y2x3 + y1x2x3 + x1x2y3 − y1y2y3).
Also, in the same way, we check that (1, 0) is the neutral element for multiplication:
Next, we check the distributivity law: Let again (x1 , y1 ), (x2 , y2 ), and (x3 , y3 ) be elements of
C. Then
(x1, y1) · ((x2, y2) + (x3, y3)) = (x1, y1) · (x2 + x3, y2 + y3)
= (x1 x2 + x1 x3 − y1 y2 − y1 y3 , y1 x2 + y1 x3 + x1 y2 + x1 y3 )
= (x1 x2 − y1 y2 , y1 x2 + x1 y2 ) + (x1 x3 − y1 y3 , y1 x3 + x1 y3 )
= (x1 , y1 ) · (x2 , y2 ) + (x1 , y1 ) · (x3 , y3 ),
which shows that C is a ring when endowed with the addition and multiplication given in
Definition 2.39.
To finish the proof, we still need to show the existence of multiplicative inverses. Let
(x, y) ∈ C be such that (x, y) ̸= (0, 0). So either x ̸= 0 or y ̸= 0 holds, and therefore
x² + y² > 0. Then the multiplicative inverse of (x, y) is given by (x/(x² + y²), −y/(x² + y²)), because
\[
(x, y) \cdot \Bigl(\frac{x}{x^2 + y^2}, \frac{-y}{x^2 + y^2}\Bigr)
= \Bigl(x \cdot \frac{x}{x^2 + y^2} - y \cdot \frac{-y}{x^2 + y^2},\; y \cdot \frac{x}{x^2 + y^2} + x \cdot \frac{-y}{x^2 + y^2}\Bigr)
= \Bigl(\frac{x^2 + y^2}{x^2 + y^2}, \frac{yx - xy}{x^2 + y^2}\Bigr) = (1, 0).
\]
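The component formulas from the proof can be transcribed directly into Python and cross-checked against the built-in complex type (the helper names c_mul and c_inv are my own, not from the notes):

```python
def c_mul(z, w):
    # Multiplication of pairs (x, y) as in Definition 2.39
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])

def c_inv(z):
    # Multiplicative inverse (x/(x^2+y^2), -y/(x^2+y^2)) from the proof
    x, y = z
    d = x * x + y * y
    return (x / d, -y / d)

# i * i = -1 in this representation.
assert c_mul((0.0, 1.0), (0.0, 1.0)) == (-1.0, 0.0)

# z * z^{-1} = 1 for a sample z, up to floating-point rounding.
z = (3.0, -4.0)
prod = c_mul(z, c_inv(z))
assert abs(prod[0] - 1.0) < 1e-12 and abs(prod[1]) < 1e-12

# Cross-check against Python's built-in complex arithmetic.
assert abs(complex(*c_inv(z)) - 1 / complex(*z)) < 1e-12
```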
Applet 2.41 (Complex numbers). We consider the field operations (addition, multiplication,
multiplicative inverse) on the complex numbers. The true geometric meaning of multiplication
and of the multiplicative inverse will be discussed later.
As already explained, we do not write (x, y) for complex numbers, but x + iy. Instead of
x + i0 we also simply write x, and instead of 0 + iy we write iy, and finally we also write i
for i1. By construction, i² = −1 holds. With this notation, we regard R as a subset of
C. This makes sense since addition and multiplication on C, restricted to R, agree with
addition and multiplication on R. Also, given z, w ∈ C, we write zw for z · w.
Given z ̸= 0, we shall use both z −1 and z1 to denote the inverse of z for the multiplication.
For instance, i−1 = 1i = −i (since (−i) · i = −i2 = 1).
Proof. Part (1) follows from the fact that, for z = x + iy,
z z̄ = (x + iy)(x − iy) = x2 + y 2 .
To show parts (2) and (3), we write z = x1 + iy1 and w = x2 + iy2 . Then z + w =
(x1 + x2 ) + i(y1 + y2 ) and we get
as desired.
Re(z) = (z + z̄)/2 and Im(z) = (z − z̄)/(2i)
for all z ∈ C. In particular, conclude that R = {z ∈ C | z = z̄}. Can you interpret these
equalities geometrically?
2.45. — Since i2 = −1 < 0, property (f) in Paragraph 2.11 implies that no order compatible
with addition and multiplication can be defined on C. Nevertheless, calculus can be performed
on the complex numbers, which is partly addressed in this course but mainly in the course on
complex analysis in the second year of the study of mathematics and physics. The reason for
this is that C satisfies a generalisation of the completeness axiom, which we can only discuss
after some more theory.
|z| = √(z z̄) = √(x² + y²) for z = x + iy ∈ C.
At this point we note that, given x ∈ R, the absolute value |x| = sgn(x) · x and the absolute
value of x as an element of C coincide, since √(x x̄) = √(x²) = |x| holds. In particular, the newly
introduced notation is consistent, and we have extended the absolute value of R to C.
Note that |z| ≥ 0 for all z ∈ C, and |z| = 0 exactly when z = 0. Also, the absolute value on
C is multiplicative, namely
|zw| = √(zw · z̄w̄) = √(z z̄ w w̄) = √(z z̄) √(w w̄) = |z||w| for all z, w ∈ C.
Furthermore,
z−1 = z̄/|z|² for all z ≠ 0.
These are essential consequences of Lemma 2.43. Finally, the triangle inequality holds, as
shown in the next proposition.
x1 x2 + y1 y2 ≤ |z||w|. (2.3)
Proof. We begin by observing that
(x1x2 + y1y2)² ≤ (x1x2 + y1y2)² + (x1y2 − y1x2)² = (x1² + y1²)(x2² + y2²) = |z|²|w|².
Taking the square root on both sides and recalling Exercise 2.28, we get |x1x2 + y1y2| ≤ |z||w|,
which implies (2.3) (recall that x ≤ |x| for any x ∈ R).
2.49. — The absolute value of the complex number z = x + iy is the square root of x2 + y 2
and, in the geometric notion of complex numbers, is equivalent to the length of the straight
line from the origin 0 + i0 to z. In the same way, for two complex numbers z and w, we
interpret |z − w| as the distance from z to w.
The closed circular disk with radius r > 0 around z ∈ C is the set
B(z, r) = {w ∈ C | |z − w| ≤ r}.
2.51. — The open circular disk B(z, r) thus consists precisely of those points that have
distance strictly less than r from z. Open circular disks in C and open intervals in R are
compatible in the following sense: If x ∈ R and r > 0, then the intersection of the open
circular disk B(x, r) ⊆ C with R is just the open interval (x − r, x + r) lying symmetrically
about x.
Exercise 2.52. — Show the following property of open circular disks: let z1 , z2 ∈ C, r1 > 0
and r2 > 0. For each point z ∈ B(z1 , r1 ) ∩ B(z2 , r2 ) there exists a radius r > 0 such that
The definition of open set in C given below generalizes that in R from Exercise 2.37.
For example, thanks to Exercise 2.52, all open circular disks are open. In addition to open
circular disks, there are many other subsets of C. For example, every union of open subsets
is open. We will return to studying open sets and related notions in much greater generality
in the second semester.
s = max(X)
• The terms bounded from below, lower bound, and minimum are defined
analogously.
In other words, if it exists, the supremum of X is the smallest upper bound of X. We can
describe the supremum s = sup(X) of X directly by
Equivalently, the supremum of X could also be characterised by the fact that no real number
x1 strictly smaller than s = sup(X) is an upper bound of X:
holds for all x ∈ X and a ∈ A. From the first inequality it follows that c is an upper bound
of X, so c ∈ A. From the second inequality it follows that c is the minimum of the set A.
Applet 2.58 (Supremum of a bounded non-empty set). We consider a bounded non-empty
subset of R and two equivalent characterizations of the supremum of this set.
X + Y := {x + y | x ∈ X, y ∈ Y } and XY := {xy | x ∈ X, y ∈ Y }.
ε = x0 + y0 − z0 > 0.
Since x0 is the supremum of X, it follows from (2.5) that there exists x ∈ X with x > x0 − ε/2,
and similarly there exists y ∈ Y with y > y0 − ε/2. Set z = x + y. It follows that
z = x + y > (x0 − ε/2) + (y0 − ε/2) = x0 + y0 − ε = z0,
which contradicts the fact that z0 is an upper bound for X + Y. Thus z0 = x0 + y0, as desired.
The proof of (4) is done in a similar way.
2.60. — For a non-empty subset X ⊆ R bounded from below, the largest lower bound of X
will also be called the infimum inf(X) of X. An existence statement analogous to Theorem
2.57 holds for the infimum. Alternatively, the infimum of X can be written as
inf(X) = − sup{−x | x ∈ X}.
In this way, practically all statements about infima can be traced back to statements about
suprema.
Here we have added the point +∞ to the right of R and the point −∞ to the left of R
to the straight line. We extend the order relation of the real numbers ≤ to R by requiring
−∞ < x < +∞ for all x ∈ R.
To simplify the notation, we also write ∞ in place of +∞. Often used calculation rules for
the symbols −∞ and ∞, such as the following, are standard conventions:
∞ + x = ∞ + ∞ = ∞ and −∞ + x = −∞ − ∞ = −∞ for all x ∈ R.
One should use such conventions as sparingly as possible and be careful with them. The
expressions ∞ − ∞ and 0 · ∞ or similar remain undefined.
∀ x0 ∈ R ∃ x ∈ X : x > x0 .
Corollary 2.65: 1/n is arbitrarily small
For every ε > 0 there exists an integer n ≥ 1 such that 1/n < ε holds.
For every two real numbers a, b ∈ R with a < b, there exists r ∈ Q with a < r < b.
Proof. Set ε = b − a. According to Archimedes' principle in the form of Corollary 2.65, there
exists m ∈ N with 1/m < ε. Similarly, according to Archimedes' principle from Theorem 2.63,
there exists n ∈ Z with n ≤ ma < n + 1, or equivalently n/m ≤ a < (n + 1)/m.
Hence, since 1/m < ε and a + ε = b, we get
a < (n + 1)/m = n/m + 1/m ≤ a + 1/m < a + ε = b.
Stated differently, the above corollary shows that Q intersects any open non-empty interval
I, that is, I ∩ Q ≠ ∅. A subset X of R is called dense in R if every open non-empty interval
of R contains an element of X. Corollary 2.66 thus states: Q is dense in R.
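The proof of Corollary 2.66 is constructive, so it can be turned directly into a procedure that produces a rational number in a given interval. The following sketch (an illustration, not part of the original notes) uses exact rational arithmetic:

```python
from fractions import Fraction
import math

def rational_between(a: Fraction, b: Fraction) -> Fraction:
    """Mirror the proof of Corollary 2.66: choose m with 1/m < b - a, then
    n with n <= m*a < n + 1; the rational (n + 1)/m lies strictly in (a, b)."""
    assert a < b
    eps = b - a
    m = 1
    while Fraction(1, m) >= eps:  # Archimedes' principle guarantees this stops
        m += 1
    n = math.floor(m * a)         # n <= m*a < n + 1
    return Fraction(n + 1, m)

r = rational_between(Fraction(1, 3), Fraction(1, 2))
assert Fraction(1, 3) < r < Fraction(1, 2)
```

The chain of inequalities in the proof, a < (n + 1)/m ≤ a + 1/m < b, is exactly what guarantees the returned value lies in the open interval.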
The following two exercises establish a generalization of Archimedes’ principle, which will
be used in the section about decimal fractions.
a0 , a1 , a2 , a3 , . . .
with a0 ∈ Z, and 0 ≤ an ≤ 9 for all n ≥ 1. Thus, given a decimal fraction, we can assign to
it a unique element of R as follows.
Suppose a0 ≥ 0. We set
x_n = Σ_{k=0}^{n} a_k · 10^(−k) and y_n = 10^(−n) + Σ_{k=0}^{n} a_k · 10^(−k). (2.6)
x0 ≤ x1 ≤ . . . ≤ xn ≤ xn+1 ≤ . . . ≤ yn+1 ≤ yn ≤ . . . ≤ y1 ≤ y0 .
Thus, if we consider the sets X = {x0 , x1 , x2 , . . .} and Y = {y0 , y1 , y2 , . . .}, according to the
completeness axiom we can conclude the existence of c ∈ R with the property that
xn ≤ c ≤ yn (2.7)
for all n ∈ N. Archimedes' principle in the form of Corollary 2.65 shows that there is precisely
one real number c that satisfies the inequality x_n ≤ c ≤ y_n for all n ≥ 0. Indeed, if there were
two different such numbers, say c and d with c < d, then it would follow that
x_n ≤ c < d ≤ y_n
for all n ≥ 0, which contradicts Exercise 2.68 with m_i = 10^i. We call the element c ∈ R
uniquely determined by (2.7) the real number with decimal expansion a0 , a1 a2 a3 a4 . . .. We
note that two possible alternative definitions would be c = sup(X) and c = inf(Y).
If a0 is negative, we first consider the real number c with decimal expansion −a0 , a1 a2 a3 a4 . . .
and then define the real number with decimal expansion a0 , a1 a2 a3 a4 . . . as −c.
2.70. — Now, the following question arises: Can every element of R be written as a decimal
fraction? This is indeed the case: Let c ∈ R and c ≥ 0. Then we can write a0 := ⌊c⌋ and
define
an := ⌊10n c⌋ − 10⌊10n−1 c⌋. (2.8)
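Formula (2.8) can be evaluated mechanically. The sketch below (illustrative, not part of the original notes) uses exact rational arithmetic via Python's Fraction type, since floating-point rounding would corrupt the digits:

```python
from fractions import Fraction
import math

def decimal_digits(c: Fraction, count: int) -> list[int]:
    """Digits a_0 = floor(c) and a_n = floor(10^n c) - 10*floor(10^{n-1} c),
    exactly as in (2.8)."""
    assert c >= 0
    digits = [math.floor(c)]
    for n in range(1, count + 1):
        digits.append(math.floor(10**n * c) - 10 * math.floor(10**(n - 1) * c))
    return digits

assert decimal_digits(Fraction(1, 5), 4) == [0, 2, 0, 0, 0]
assert decimal_digits(Fraction(1, 3), 4) == [0, 3, 3, 3, 3]
```

Note that for 1/5 the formula produces the expansion 0, 2000 . . . rather than 0, 1999 . . ., in line with the discussion of non-uniqueness in Paragraph 2.71 below.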
2.71. — It is important to remark that two different decimal fractions can represent the
same real number. For example, the two decimal fractions
0, 2000 . . . and 0, 1999 . . .
both represent the real number 1/5. However, the problem only occurs when a decimal fraction
becomes a constant sequence . . . 9999 . . . after a certain point. To rule this out, we can consider
the following definition: We call a real decimal fraction any sequence of integers
a0 , a1 , a2 , a3 , . . .
with a0 ∈ Z, 0 ≤ a_n ≤ 9 for all n ≥ 1, and with the property that for every n0 ≥ 1 there exists
an n ≥ n0 with a_n ≠ 9.
Exercise 2.72. — Let c ≥ 0 be a real number. Verify that the sequence a0 , a1 , a2 , . . .
defined by (2.8) is a real decimal fraction. Then show that this gives rise to a bijection
between R and the set of all real decimal fractions.
Interlude: Cardinality
Let X and Y be two sets. We say that X and Y have the same cardinality or the
same number of elements, written X ∼ Y , if there is a bijection f : X → Y . We
say that Y is larger than X, and write X ≲ Y , if there is an injection f : X → Y .
• We say that the cardinality of the empty set is zero, and write |∅| = 0.
• Let X be a set and n ≥ 1 a natural number. We say the set X has cardinality
n, and write |X| = n, if there is a bijection from X to {1, . . . , n}. In this case we
call X a finite set and write |X| < ∞.
X ≲ Y and Y ≲ X =⇒ X ∼ Y.
Proof. The function i : X → P(X) given by i(x) = {x} is injective. So P(X) is larger than
X. It remains to show that there is no bijection from X to P(X). To show this, we assume
that there is a bijection f : X → P(X) and derive a contradiction. For this, we define
the set
A = {x ∈ X | x ̸∈ f (x)}.
In other words, A ∈ P(X) consists of all elements x in X for which x is not an element of the
subset f (x) ⊂ X.
Since, by assumption, f : X → P(X) is a bijection, there exists a ∈ X such that A = f (a).
We now ask ourselves: does a belong to A or not?
If a ∈ A then, by the definition of A, a ∉ f(a). However, this is impossible since f(a) = A.
Vice versa, if a ∉ A, then a ∈ f(a), which is again impossible since f(a) = A.
This proves that there exists no a ∈ X with f (a) = A, which contradicts the surjectivity
of f . So there can be no bijection f : X → P(X).
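For a finite set the diagonal argument can even be verified exhaustively. The sketch below (an illustration, not part of the original notes) checks, for a three-element set X, that for every function f : X → P(X) the diagonal set A = {x | x ∉ f(x)} fails to be in the image:

```python
from itertools import combinations, product

X = [0, 1, 2]
# The power set P(X): all subsets of X.
powerset = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

# Every function f : X -> P(X) is a choice of one subset per element of X.
# For each such f, the diagonal set A = {x in X | x not in f(x)} is never
# in the image of f, so no f is surjective.
for images in product(powerset, repeat=len(X)):
    f = dict(zip(X, images))
    A = frozenset(x for x in X if x not in f[x])
    assert A not in images
```

Here the brute-force check over all 8³ = 512 functions confirms what the proof shows in general: assuming A = f(a) for some a is contradictory, so A is outside the image.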
Proof. By Theorem 2.75, P(N) is strictly larger than N, and thus uncountable. Thus, to show
that R is uncountable, it suffices to prove the existence of an injection
φ : P(N) → R.
We construct such an injection by assigning to each subset A the real number φ(A) whose
decimal fraction expansion is given by a0 , a1 a2 a3 a4 . . . with
a_n = 1 if n ∈ A, and a_n = 0 if n ∉ A.
Injectivity of the function φ can be proved in two ways. As a first option, one can simply
apply Exercise 2.72. Alternatively, one can argue as follows. Let A and B be distinct subsets
of N, and let n be the smallest element of A△B (recall Definition 2). If n ∈ A and n ∈ / B
then φ(A) > φ(B) holds. On the other hand, if n ∈ / A and n ∈ B then φ(A) < φ(B) holds.
Therefore φ(A) ̸= φ(B) holds in all cases, which proves the injectivity.
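Restricting to finite subsets, the injectivity of the map φ can be tested directly. The sketch below (illustrative, not part of the original notes) evaluates a finite truncation of φ with exact rational arithmetic and confirms that distinct subsets receive distinct values:

```python
from fractions import Fraction
from itertools import combinations

def phi(A: set, N: int = 8) -> Fraction:
    """Finite truncation of the injection from the proof: the n-th decimal
    digit of phi(A) is 1 if n is in A and 0 otherwise (for n = 0, ..., N)."""
    return sum(Fraction(1, 10**n) for n in range(N + 1) if n in A)

universe = range(6)
subsets = [set(c) for r in range(7) for c in combinations(universe, r)]
values = [phi(A) for A in subsets]
assert len(values) == len(set(values))  # distinct subsets, distinct reals
```

Distinctness holds because two different subsets first differ at some index n, and there the digit strings (hence the values) differ, exactly as in the comparison argument above.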
Since we primarily use the letter x to denote a real number, for sequences of real numbers
we shall mostly use the notation (x_n)_{n∈N}, (x_n)_{n=0}^∞, or (x_n)_{n≥0}.
Let (x_n)_{n=0}^∞ be a sequence in R. We say that (x_n)_{n=0}^∞ is convergent or converges
to a limit A ∈ R if for every ε > 0 there exists N ∈ N such that |x_n − A| < ε for all n ≥ N.
It is a priori not clear that a converging sequence has only one limit. In the following lemma
we show that the limit is indeed unique, so the notation (2.9) is justified.
Proof. Let A ∈ R and B ∈ R be limits of the sequence (x_n)_{n=0}^∞. Let ε > 0. Then we can
find N_A, N_B ∈ N such that |x_n − A| < ε/2 for all n ≥ N_A, and |x_n − B| < ε/2 for all
n ≥ N_B. Set N = max{N_A, N_B}. Then
|A − B| ≤ |A − x_N| + |x_N − B| < ε/2 + ε/2 = ε.
Since ε > 0 was arbitrary, it follows that A = B.
Example 2.82. — A constant sequence (xn )∞ n=0 with xn = A ∈ R for all n ∈ N converges
to A. Similarly, eventually constant sequences converge to the value they eventually take.
Example 2.83. — The sequence of real numbers (1/n)_{n=1}^∞ converges to zero, i.e.,
lim_{n→∞} 1/n = 0.
Indeed, given ε > 0, by Archimedes' principle (Theorem 2.63) there exists N ∈ N with 1/N < ε.
Therefore, for every n ∈ N with n ≥ N, we have 0 ≤ 1/n ≤ 1/N < ε.
Example 2.84. — The sequence of real numbers (y_n)_{n=0}^∞ given by y_n = (−1)^n for n ∈ N
is not convergent, since the sequence members 1, −1, 1, −1, 1, −1, . . . alternate between 1 and
−1 and, in particular, do not approach any real number.
x0 , x1 , x4 , x9 , x16 , x25 , . . .
Let (x_n)_{n=0}^∞ be a sequence in R. A subsequence of (x_n)_{n=0}^∞ is a sequence of the form
(x_{n_k})_{k=0}^∞, where (n_k)_{k=0}^∞ is a sequence of nonnegative integers such that n_{k+1} > n_k for
all k ∈ N.
Remark 2.86. — In the previous definition, since nk+1 > nk for all k ∈ N, it follows that
nk ≥ k for every k ∈ N. (Exercise: Prove this fact by induction on k ∈ N.)
Let (x_n)_{n=0}^∞ be a sequence in R converging to A ∈ R. Then each subsequence of (x_n)_{n=0}^∞
also converges to A.
2.88. — A sequence can have convergent subsequences without itself converging. For
example, the sequence of real numbers given by x_n = (−1)^n is not convergent, while the
subsequences
(x_{2n})_{n=0}^∞ and (x_{2n+1})_{n=0}^∞
converge (to 1 and −1, respectively).
Let (x_n)_{n=0}^∞ be a sequence in R. A point A ∈ R is called an accumulation point of
the sequence (x_n)_{n=0}^∞ if for every ε > 0 and every N ∈ N there exists a natural number
n ≥ N with |x_n − A| < ε.
Let (x_n)_{n=0}^∞ be a sequence in R. An element A ∈ R is an accumulation point of (x_n)_{n=0}^∞
if and only if there exists a convergent subsequence of (x_n)_{n=0}^∞ with limit A.
Proof. By Proposition 2.90 there exists a subsequence (x_{n_k})_{k≥0} such that lim_{k→∞} x_{n_k} = A.
In particular, given ε > 0, there exists N ∈ N such that all elements x_{n_k} with k ≥ N are inside
(A − ε, A + ε).
A converging sequence has exactly one accumulation point, which coincides with its
limit.
Exercise 2.93. — Let (x_n)_{n=0}^∞ be a sequence in R, and let F ⊆ R be the set of accumulation
points of the sequence (x_n)_{n=0}^∞. Show that F is closed.
(x_n)_{n=0}^∞ + (y_n)_{n=0}^∞ = (x_n + y_n)_{n=0}^∞,
α · (x_n)_{n=0}^∞ = (α x_n)_{n=0}^∞,
(x_n)_{n=0}^∞ · (y_n)_{n=0}^∞ = (x_n y_n)_{n=0}^∞.
Remark 2.95. — With the addition and multiplication defined above, the set of sequences
forms a commutative ring, where the zero element is the constant sequence (0)_{n=0}^∞, and the
neutral element for multiplication is the constant sequence (1)_{n=0}^∞.
|x_n − A| < ε/2 ∀ n ≥ N_A, and |y_n − B| < ε/2 ∀ n ≥ N_B.
Hence, for n ≥ max{N_A, N_B},
|(x_n + y_n) − (A + B)| ≤ |x_n − A| + |y_n − B| < ε/2 + ε/2 = ε,
|x_n − A| ≤ |A|/2.
|x_n| = |A + (x_n − A)| ≥ |A| − |x_n − A| ≥ |A|/2 ∀ n ≥ N_0.
|x_n^{−1} − A^{−1}| ≤ 2 |x_n − A| / |A|² < ε ∀ n ≥ N.
Let (x_n)_{n=0}^∞ and (y_n)_{n=0}^∞ be sequences of real numbers with limits A = lim_{n→∞} x_n
and B = lim_{n→∞} y_n.
1. If A < B, then there exists N ∈ N such that x_n < y_n for all n ≥ N.
Proof. Suppose A < B, and let ε = (B − A)/3 > 0. Then there exist N_A, N_B ∈ N such that
n ≥ N_A =⇒ A − ε < x_n < A + ε,
n ≥ N_B =⇒ B − ε < y_n < B + ε
holds. Note now that by the choice of ε it holds 2ε < B − A, therefore
A + ε < B − ε.
Remark 2.98. — In Proposition 2.97(2), even if one assumes that xn < yn , one can only
deduce that A ≤ B. Indeed, consider the sequences
x_n = −1/n, y_n = 1/n ∀ n ≥ 1.
lim_{n→∞} (7n⁴ + 15)/(3n⁴ + n³ + n − 1),  lim_{n→∞} (n² + 5)/(n³ + n + 1),  lim_{n→∞} (n⁵ − 10)/(n² + 1).
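Before computing such limits by hand, it can be instructive to evaluate the terms for large n. The following sketch (illustrative, not part of the original notes) suggests limits of 7/3, 0, and +∞ respectively:

```python
def terms(n: float):
    """The n-th terms of the three sequences in the exercise."""
    return ((7 * n**4 + 15) / (3 * n**4 + n**3 + n - 1),
            (n**2 + 5) / (n**3 + n + 1),
            (n**5 - 10) / (n**2 + 1))

a, b, c = terms(10**6)
assert abs(a - 7 / 3) < 1e-5   # first limit appears to be 7/3
assert b < 1e-5                # second limit appears to be 0
assert c > 1e12                # third grows without bound (improper limit +infinity)
```

Such numerical evidence is of course no proof; the exercise asks for a rigorous argument, e.g. dividing numerator and denominator by the highest power of n.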
2.5.4 Bounded Sequences
In this section, we study bounded sequences of real numbers.
A sequence (x_n)_{n=0}^∞ in R is called bounded if there exists a real number M ≥ 0 such
that |x_n| ≤ M for all n ∈ N.
Proof. Let (x_n)_{n=0}^∞ be a convergent sequence with limit A ∈ R. Choosing ε = 1 in the
definition of limit, there exists N ∈ N such that |x_n − A| ≤ 1 for all n ≥ N. In particular, by
the triangle inequality (see property (g) in Paragraph 2.16), |x_n| ≤ 1 + |A| for all n ≥ N. Set
M = max{1 + |A|, |x_0|, |x_1|, . . . , |x_{N−1}|}.
Then |x_n| ≤ M for all n ∈ N.
|x_n y_n − AB| = |x_n y_n − x_n B + x_n B − AB| ≤ |x_n| |y_n − B| + |x_n − A| |B|
2.104. — As we will show, bounded sequences of real numbers always have at least one
accumulation point, or equivalently, a convergent subsequence. This fact gives rise to the
important notion of superior limit and inferior limit.
A sequence (x_n)_{n=0}^∞ is called:
2.107. — Monotone bounded sequences are always convergent. We illustrate this with a
picture:
Figure 2.3
A monotone sequence of real numbers (x_n)_{n=0}^∞ converges if and only if it is bounded.
If the sequence (x_n)_{n=0}^∞ is monotonically increasing, then
lim_{n→∞} x_n = sup {x_n | n ∈ N}.
Proof. If (x_n)_{n=0}^∞ is convergent, it follows by Lemma 2.102 that (x_n)_{n=0}^∞ is bounded.
Vice versa, suppose that (x_n)_{n=0}^∞ is monotonically increasing and bounded. Then there
exists M > 0 such that x_n ≤ M for all n ∈ N. This means that the set {x_n | n ∈ N} is
bounded from above, so the supremum A = sup {x_n | n ∈ N} exists. We now want to prove
that x_n converges to A.
By the definition of supremum, we have:
(i) xn ≤ A for all n ∈ N;
(ii) for every ε > 0 there exists N ∈ N with xN > A − ε.
Thus, for n ≥ N, it follows from (i), (ii), and the monotonicity of (x_n)_{n=0}^∞ that
A − ε < x_N ≤ x_n ≤ A < A + ε.
Hence |x_n − A| < ε for all n ≥ N, which proves that x_n converges to A.
Remark 2.109. — If (x_n)_{n=0}^∞ is monotone and there exists a bounded subsequence (x_{n_k})_{k=0}^∞,
then the whole sequence is bounded (and therefore converges, thanks to Theorem 2.108).
Indeed, assume for instance that (x_n)_{n=0}^∞ is increasing and the subsequence (x_{n_k})_{k=0}^∞ is
bounded from above by a number M. Then, recalling Remark 2.86, by monotonicity we have
x_0 ≤ x_k ≤ x_{n_k} ≤ M ∀ k ∈ N.
So, (x_n)_{n=0}^∞ is bounded. The case when (x_n)_{n=0}^∞ is decreasing is analogous.
for n ≥ 1. Show that (x_n)_{n=0}^∞ converges and determine the limit.
Hint: First, prove that the sequence converges to a nonnegative limit. Second, show that if
A ≥ 0 is the limit, then it satisfies A = (2/3)A + 1/A. Use this relation to identify A.
Exercise 2.111. — Let (x_n)_{n=0}^∞ be a monotonically increasing sequence and (y_n)_{n=0}^∞ be
a monotonically decreasing sequence with x_n ≤ y_n for all n ∈ N. Show that both sequences
converge and that lim_{n→∞} x_n ≤ lim_{n→∞} y_n holds. Illustrate your argument with a picture
similar to the one in Figure 2.3.
2.112. — Let (x_n)_{n=0}^∞ be a bounded sequence of real numbers. For the definition of limits
and accumulation points of the sequence (x_n)_{n=0}^∞, only its long-term behavior is relevant, or
more precisely, the end (x_k)_{k=N}^∞ for arbitrarily large N ∈ N. Following this observation, for
every n ∈ N we consider the supremum
s_n = sup {x_k | k ≥ n}
over the final part {x_k | k ≥ n} of the sequence. Since {x_k | k ≥ m} ⊂ {x_k | k ≥ n} for m > n,
it follows that s_m ≤ s_n for m > n. The sequence (s_n)_{n=0}^∞ is therefore monotonically decreasing.
Since (x_n)_{n=0}^∞ is bounded by assumption, (s_n)_{n=0}^∞ is also bounded, and so it is a
monotonically decreasing bounded sequence. Therefore, the sequence (s_n)_{n=0}^∞ converges to
the infimum of the set {s_n | n ∈ N} by Theorem 2.108. This infimum is called the superior limit
of the given sequence (x_n)_{n=0}^∞.
Analogously, one can define the inferior limit of (x_n)_{n=0}^∞ considering
i_n = inf {x_k | k ≥ n}.
in ≤ xn ≤ sn ∀ n ∈ N. (2.10)
The limits
lim sup_{n→∞} x_n = lim_{n→∞} sup{x_k | k ≥ n} and lim inf_{n→∞} x_n = lim_{n→∞} inf{x_k | k ≥ n}
are called superior limit, respectively inferior limit of the sequence (x_n)_{n=0}^∞. Note
that, as a consequence of (2.10) and Proposition 2.97,
n     1     2     3     4     5     6     7     8    . . .
x_n   0    3/2  −2/3   5/4  −4/5   7/6  −6/7   9/8  . . .
s_n  3/2   3/2   5/4   5/4   7/6   7/6   9/8   9/8  . . .
i_n  −1    −1    −1    −1    −1    −1    −1    −1   . . .
Note here that s_n = x_n when n is even, and s_n = x_{n+1} otherwise. Therefore
lim sup_{n→∞} ((−1)^n + 1/n) = lim_{n→∞} ((−1)^{2n} + 1/(2n)) = lim_{n→∞} (2n + 1)/(2n) = 1.
Because x_n ≥ −1 for every n and lim_{n→∞} x_{2n+1} = −1, we get i_n = −1 for all n ∈ N. Therefore
lim inf_{n→∞} ((−1)^n + 1/n) = −1.
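The tail suprema and infima of this example can be approximated numerically. The sketch below (illustrative, not part of the original notes) replaces the infinite tails by long finite windows, which for this particular sequence captures the behavior well:

```python
import math

def x(n: int) -> float:
    return (-1) ** n + 1 / n

def tail_sup(n: int, window: int = 10**4) -> float:
    # Approximates s_n = sup{x_k | k >= n} by a finite window; for this
    # sequence the omitted tail only contributes values below the window's max.
    return max(x(k) for k in range(n, n + window))

def tail_inf(n: int, window: int = 10**4) -> float:
    return min(x(k) for k in range(n, n + window))

assert tail_sup(5) == x(6)  # sup of the tail is attained at the first even index
assert math.isclose(tail_sup(1000), 1.0, abs_tol=1e-2)   # lim sup = 1
assert math.isclose(tail_inf(1000), -1.0, abs_tol=1e-2)  # lim inf = -1
```

Note the asymmetry visible in the table: the tail supremum is attained (at the first even index), whereas the tail infimum −1 is approached but never attained, so the finite window only approximates it.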
Proof. Define
i_N = inf{x_n | n ≥ N}, s_N = sup{x_n | n ≥ N},
A − ε ≤ i_N ≤ s_N ≤ A + ε,
A − ε ≤ I ≤ S ≤ A + ε.
Proof. Let ε > 0, and write s_n = sup{x_k | k ≥ n}. The sequence (s_n)_{n=0}^∞ is monotonically
decreasing and converges to A. So there exists N_0 ∈ N such that
A ≤ s_n < A + ε ∀ n ≥ N_0. (2.11)
Every bounded sequence of real numbers has an accumulation point and has a convergent
subsequence.
Proof. By Theorem 2.116, the limsup (and analogously for the liminf) is always an accumu-
lation point. Also, by Proposition 2.90, every accumulation point is the limit of a converging
subsequence. So, a convergent subsequence always exists.
Exercise 2.118. — Let (x_n)_{n=0}^∞ be a bounded sequence in R, and let E ⊆ R be the set of
accumulation points of the sequence (x_n)_{n=0}^∞. Show that
Exercise 2.119. — Let (a_n)_{n=0}^∞, (b_n)_{n=0}^∞ and (c_n)_{n=0}^∞ be convergent sequences of real
numbers, with limits A, B, and C respectively. Let (x_n)_{n=0}^∞ be the sequence defined by
x_n = a_n if n = 3k, k ∈ N;  x_n = b_n if n = 3k + 1, k ∈ N;  x_n = c_n if n = 3k + 2, k ∈ N.
Calculate lim sup_{n→∞} x_n, lim inf_{n→∞} x_n, and the set of accumulation points of the
sequence (x_n)_{n=0}^∞.
2.5.5 Cauchy-Sequences
Proof. The proof is very similar to the one of Lemma 2.102. Choosing ε = 1 in the definition
of Cauchy sequence, there exists N ∈ N such that |x_n − x_N| ≤ 1 for all n ≥ N. In particular
|x_n| ≤ 1 + |x_N| for all n ≥ N, and therefore the sequence is bounded by
M = max{|x_0|, |x_1|, . . . , |x_{N−1}|, 1 + |x_N|}.
Exercise 2.123. — Show that a Cauchy sequence converges if and only if it has a conver-
gent subsequence.
|x_n − x_m| ≤ |x_n − A| + |x_m − A| < ε/2 + ε/2 = ε ∀ n, m ≥ N,
hence (xn )∞
n=0 is a Cauchy sequence.
Vice versa, let (x_n)_{n=0}^∞ be a Cauchy sequence. Since it is bounded (see Lemma 2.122),
|x_n − x_m| < ε/2 ∀ m, n ≥ N_0,
0, 1, 1 + 1/2, 2, 2 + 1/3, 2 + 2/3, 3, 3 + 1/4, 3 + 2/4, 3 + 3/4, 4, 4 + 1/5, 4 + 2/5, 4 + 3/5, 4 + 4/5, 5, 5 + 1/6, . . .
that progresses between n − 1 and n in steps of length 1/n. This sequence is unbounded and thus
not convergent. On the other hand, the distance between two successive elements decreases
as the sequence progresses, and becomes arbitrarily small.
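This staircase sequence is worth probing numerically, since it separates two properties that are easy to conflate: successive differences tending to zero, versus the Cauchy property. A small sketch (illustrative, not part of the original notes):

```python
def seq(num_blocks: int) -> list[float]:
    """The sequence 0, 1, 1+1/2, 2, 2+1/3, 2+2/3, 3, ...:
    block n walks from n-1 toward n in steps of length 1/n."""
    out = []
    for n in range(1, num_blocks + 1):
        out.extend((n - 1) + k / n for k in range(n))
    return out

xs = seq(200)
# Unbounded: the sequence reaches arbitrarily large values ...
assert max(xs) > 198
# ... yet successive differences become arbitrarily small. Small consecutive
# gaps do NOT make it Cauchy: being Cauchy requires |x_n - x_m| small for
# ALL large n and m, not just for neighbors.
diffs = [abs(b - a) for a, b in zip(xs, xs[1:])]
assert diffs[-1] < 0.006
```

Within block n every step has size 1/n, and the jump between blocks also has size 1/n, so all late differences are small even though the sequence diverges.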
We say that (x_n)_{n=0}^∞ diverges to ∞, and write
lim_{n→∞} x_n = ∞,
if for every M > 0 there exists N ∈ N such that x_n > M for all n ≥ N.
Similarly, we say that (xn )∞
n=0 diverges to −∞ if for every real number M > 0 there
exists N ∈ N such that xn < −M for all n ≥ N .
In both cases, we speak of an improper limit.
2.127. — An unbounded sequence need not diverge to ∞ or −∞. For example, the sequence
Exercise 2.128. — Let (xn )∞ n=0 be an unbounded sequence of real numbers. Show that
there exists a subsequence that diverges to ∞ or to −∞.
2.129. — We can use improper limits to define the superior and inferior limits for unbounded
sequences in R. If the sequence (x_n)_{n=0}^∞ is not bounded from above, then sup{x_k | k ≥ n} = ∞
for all n ∈ N and we write
lim sup_{n→∞} x_n = ∞.
If (x_n)_{n=0}^∞ is bounded from above but not from below, then we write
lim sup_{n→∞} x_n = lim_{n→∞} sup{x_k | k ≥ n},
where the right-hand side is a real limit if the monotonically decreasing sequence sup{xk | k ≥
n} is bounded, and the improper limit −∞ otherwise. We use this terminology analogously
for the inferior limit.
Exercise 2.130. — Prove the following version of the sandwich lemma for improper limits.
For two sequences of real numbers (x_n)_{n=0}^∞ and (y_n)_{n=0}^∞ with x_n ≤ y_n for all n ∈ N,
the following holds:
lim_{n→∞} x_n = ∞ =⇒ lim_{n→∞} y_n = ∞.
A sequence of complex numbers (z_n)_{n=0}^∞ = (x_n + i y_n)_{n=0}^∞ is convergent with limit
A + iB ∈ C if the two sequences of real numbers (x_n)_{n=0}^∞ and (y_n)_{n=0}^∞ converge to A
and B, respectively.
Remark 2.132. — As we did for sequences of real numbers, also for sequences of complex
numbers we can consider subsequences. This corresponds to considering (z_{n_k})_{k=0}^∞ =
(x_{n_k} + i y_{n_k})_{k=0}^∞ for some strictly increasing sequence of nonnegative integers (n_k)_{k=0}^∞.
Exercise 2.133. — Let (z_n)_{n=0}^∞ be a convergent sequence in C. Show that (|z_n|)_{n=0}^∞
converges, and find the limit. Conversely, does the convergence of (|z_n|)_{n=0}^∞ imply the
convergence of (z_n)_{n=0}^∞?
3.1. — For an arbitrary non-empty set D ⊂ R, we define the set of real-valued functions
on D as
F(D) = {f | f : D → R}.
Chapter 3.1 Real-valued Functions
Remark 3.2. — The interested reader may notice that, with the addition and multi-
plication defined above, F(D) is a commutative ring (the neutral element for addition is the
constant function f ≡ 0, the neutral element for multiplication is the constant function f ≡ 1).
We say that x ∈ D is a zero of f ∈ F(D) if f (x) = 0 holds. The zero set of f is defined
by {x ∈ D | f (x) = 0}. Finally, we define an order relation on F(D): given f1 , f2 ∈ F(D) we
say
f1 ≤ f2 ⇐⇒ f1 (x) ≤ f2 (x) ∀ x ∈ D,
Exercise 3.4. — Verify that the relation ≤ defined above on F(D) is indeed an order
relation.
We say that f ∈ F(D) is bounded from above if there exists M > 0 such that
f(x) ≤ M ∀ x ∈ D.
We say that f is bounded from below if there exists M > 0 such that
f (x) ≥ −M ∀ x ∈ D.
Finally, we say that f is bounded if f is bounded from above and from below. Equiv-
alently, f is bounded if there exists M > 0 such that
|f (x)| ≤ M ∀ x ∈ D.
• For any subset D ⊂ R and any odd integer n ≥ 1, the function x ↦ x^n on D is
strictly monotonically increasing.
• The rounding function ⌊·⌋ : R → R (recall Definition 2.64) is increasing, but not strictly
increasing.
Figure 3.1: A strictly monotone function is always injective. However, it need not be surjective.
For example, the function f : R → R given by f(x) = x/8 + sgn(x) is strictly monotone
increasing but not surjective (e.g., 1/2 is not in the image of f).
Exercise 3.8. — Let D ⊆ R, and let f1 , f2 ∈ F(D) be strictly increasing. Show that:
(ii) given a ∈ R, the function af ∈ F(D) is strictly increasing if a > 0, and strictly
decreasing if a < 0;
3.1.2 Continuity
Remark 3.10. — In the definition of continuity, it is only important to check the implication
for ε > 0 small. Indeed, assume that a function f satisfies the following:
There exists ε0 > 0 such that, for all ε ∈ (0, ε0 ] there exists δ > 0 such that
Then f is continuous at x0 . Indeed, for ε > ε0 we can choose the number δ > 0 corresponding
to ε0 to get
∀ x ∈ D, |x − x0 | < δ =⇒ |f (x) − f (x0 )| < ε0 < ε.
3.11. — The following illustration shows a continuous function on D = [a, b) ∪ (c, d] ∪ {e}.
We see that f is continuous at every point x0 : no matter how small one chooses ε > 0, for a
suitable δ > 0 we have that for all x that are δ-close to x0 , f (x) is also ε-close to f (x0 ).
Applet 3.12 (Continuity). We consider a function that is continuous at most (but not all)
points in the domain of definition.
Example 3.13. — • Let a and b be real numbers. The affine function f : R → R given
by f (x) = ax + b is continuous.
Indeed, if a = 0, then the function is constant and therefore continuous. So, let a ≠ 0.
Given x_0 ∈ R and ε > 0, note that |f(x) − f(x_0)| = |a||x − x_0| holds for all x ∈ R.
Thus, choosing δ = ε/|a|, for any x ∈ R with |x − x_0| < δ we have
|f(x) − f(x_0)| = |a||x − x_0| < |a| δ = ε.
• The function f : R → R given by f(x) = |x| is also continuous. Indeed, let x_0 ∈ R and
ε > 0. Choosing δ = ε we notice that if |x − x_0| < δ then
||x| − |x_0|| ≤ |x − x_0| < δ = ε.
• The rounding function f : R → R given by f(x) = ⌊x⌋ (recall Definition 2.64) is not
continuous at points in Z. Indeed, if x_0 ∈ Z, then for any δ > 0 small it holds that
|(x_0 − δ/2) − x_0| < δ and |f(x_0 − δ/2) − f(x_0)| = 1.
Note that we consider f |D′ and f as different functions, since their domains of definition
are not the same – except, of course, when D′ = D.
Proof. Let ε > 0. Since f_1 and f_2 are continuous at x_0, there exist δ_1, δ_2 > 0 such that, for
all x ∈ D, it holds
|x − x_0| < δ_1 =⇒ |f_1(x) − f_1(x_0)| < ε/2,
|x − x_0| < δ_2 =⇒ |f_2(x) − f_2(x_0)| < ε/2.
Setting δ = min{δ_1, δ_2}, for all x ∈ D we get
|x − x_0| < δ =⇒ |(f_1 + f_2)(x) − (f_1 + f_2)(x_0)| ≤ |f_1(x) − f_1(x_0)| + |f_2(x) − f_2(x_0)| < ε/2 + ε/2 = ε.
The argument for f1 f2 is similar but a little more complicated. Given x ∈ D, using the
triangle inequality we have the estimate
|f1 (x)f2 (x) − f1 (x0 )f2 (x0 )| = |f1 (x)f2 (x) − f1 (x0 )f2 (x) + f1 (x0 )f2 (x) − f1 (x0 )f2 (x0 )|
≤ |f1 (x)f2 (x) − f1 (x0 )f2 (x)| + |f1 (x0 )f2 (x) − f1 (x0 )f2 (x0 )|
= |f1 (x) − f1 (x0 )||f2 (x)| + |f1 (x0 )||f2 (x) − f2 (x0 )|.
Given ε > 0, choose δ_1, δ_2 > 0 such that, for all x ∈ D,
|x − x_0| < δ_1 =⇒ |f_1(x) − f_1(x_0)| < ε / (2(|f_2(x_0)| + ε)),
|x − x_0| < δ_2 =⇒ |f_2(x) − f_2(x_0)| < ε / (2(|f_1(x_0)| + ε)).
Set δ = min{δ_1, δ_2}. For x ∈ D with |x − x_0| < δ we have |f_2(x)| ≤ |f_2(x_0)| + ε, therefore
|f_1(x) − f_1(x_0)| |f_2(x)| < [ε / (2(|f_2(x_0)| + ε))] (ε + |f_2(x_0)|) = ε/2.
Analogously, for the second term, for x ∈ D with |x − x_0| < δ we have
|f_1(x_0)| |f_2(x) − f_2(x_0)| < |f_1(x_0)| · ε / (2(|f_1(x_0)| + ε)) < ε/2.
Combining the inequalities above, we obtain |f1 (x)f2 (x) − f1 (x0 )f2 (x0 )| < ε as desired.
The statement about af1 for a ∈ R follows choosing f2 equal to the constant function
f2 ≡ a and applying the previous result.
We will refer to a_j as the summand and j as the index or the running variable of
the sum. If J is a finite set, and if for each j ∈ J a number a_j is given, we write
Σ_{j∈J} a_j
for the sum of all numbers in the set {a_j | j ∈ J}. Finally, we will use the convention
Σ_{j∈∅} a_j = 0 for the sum over the empty index set.
the parentheses are irrelevant because the following is true for all w ∈ W :
Proof. Let ε > 0. Then, due to the continuity of g at f(x_0), there exists η > 0 such that for
all y ∈ D_2
|y − f(x_0)| < η =⇒ |g(y) − g(f(x_0))| < ε.
Since η > 0 and f is continuous at x_0, there exists δ > 0 such that, for all x ∈ D_1,
|x − x_0| < δ =⇒ |f(x) − f(x_0)| < η =⇒ |g(f(x)) − g(f(x_0))| < ε.
Remark 3.21. — Applying Proposition 3.20 with g(x) = |x| (see Example 3.13), we deduce
that if f : D → R is continuous then also the function x 7→ |f (x)| is continuous.
Exercise 3.23. — Let a < b < c be real numbers, and f_1 : [a, b] → R and f_2 : [b, c] → R
be continuous functions. Show that the function f : [a, c] → R given by
f(x) = f_1(x) if x ∈ [a, b), and f(x) = f_2(x) if x ∈ [b, c],
is continuous if and only if f_1(b) = f_2(b).
Exercise 3.24. — Let I ⊂ R be an open interval and let f : I → R be a function. Show that
f is continuous if and only if, for every open set U ⊂ R, f⁻¹(U) is also open.
Thus
n ≥ N =⇒ |f (xn ) − f (x̄)| < ε,
Using this with δ = 2^(−n) > 0, for each n ∈ N we find x_n ∈ D with
converge?
X = {x ∈ [a, b] | f (x) ≤ c} .
Since a ∈ X and X ⊆ [a, b], the set X is non-empty and bounded from above. Therefore, by
Theorem 2.57, the supremum x̄ = sup(X) ∈ [a, b] exists. We will now use the continuity of f
at x̄ to show that f(x̄) = c holds.
First of all, since x̄ is the supremum of X, for any n ≥ 0 we can find an element x_n ∈
X ∩ [x̄ − 2^(−n), x̄]. Since x_n ∈ X it follows that f(x_n) ≤ c. Then, since the sequence (x_n)_{n=0}^∞
converges to x̄ (because |x_n − x̄| ≤ 2^(−n)) and f is continuous, Theorem 3.26 implies that
f(x̄) = lim_{n→∞} f(x_n) ≤ c.
Hence, f(x̄) ≤ c.
Suppose now by contradiction that f(x̄) < c. Since c ≤ f(b), it follows that x̄ < b. Then,
by continuity, given ε = c − f(x̄) > 0 there exists δ > 0 such that, for x ∈ [a, b],
|x − x̄| < δ =⇒ f(x) < f(x̄) + ε = c.
This implies in particular that (x̄ − δ, x̄ + δ) ∩ [a, b] ⊂ X, so X contains the set
(x̄, x̄ + δ) ∩ (x̄, b], a contradiction to the fact that x̄ = sup(X). In conclusion f(x̄) = c, as
desired.
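The intermediate value theorem also underlies a practical root-finding algorithm: repeatedly halve the interval, keeping the half on which the value c is still "crossed". The following sketch (an illustration in the spirit of the supremum argument above, not part of the original notes) implements this bisection idea:

```python
def bisect_root(f, a: float, b: float, c: float, tol: float = 1e-10) -> float:
    """Locate x with f(x) = c on [a, b], assuming f is continuous and
    f(a) <= c <= f(b), by repeated halving of the interval."""
    assert f(a) <= c <= f(b)
    while b - a > tol:
        m = (a + b) / 2
        if f(m) <= c:
            a = m   # keep the half where f(a) <= c still holds
        else:
            b = m
    return (a + b) / 2

# Example: f(t) = t^3 - t is continuous with f(1) = 0 <= 4 <= f(2) = 6.
x = bisect_root(lambda t: t**3 - t, 1.0, 2.0, 4.0)
assert abs(x**3 - x - 4.0) < 1e-8
```

The loop invariant f(a) ≤ c ≤ f(b) plays the same role as the set X in the proof: the left endpoint always stays inside {x | f(x) ≤ c}.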
Remark 3.30. — If f : [a, b] → R is a continuous function such that f (a) > f (b), Theo-
rem 3.29 still holds in the following way:
For every real number c with f (a) ≥ c ≥ f (b) there exists x̄ ∈ [a, b] such that f (x̄) = c.
To prove this statement, there are two possible ways:
(1) repeat the proof of Theorem 3.29 defining X = {x ∈ [a, b] | f (x) ≥ c};
(2) apply Theorem 3.29 to the function g = −f .
The function g is called inverse function (or inverse mapping) of f , and is often
denoted by f −1 .
3.33. — In this subsection, we show that every continuous strictly monotone function has
an inverse function that is also continuous.
Proof. Without loss of generality, we can assume that I is non-empty and not a single point.
Also, we can assume that f is strictly increasing (otherwise replace f with −f ).
We write J = f (I), and first notice that the function f : I → J is bijective, since it is
surjective by definition, and due to strict monotonicity it is also injective. Thus, there exists
a uniquely determined inverse g = f −1 : J → I.
We note that the function g is strictly increasing: since f is strictly increasing,
x_1 < x_2 ⟺ f(x_1) < f(x_2) ∀ x_1, x_2 ∈ I,
which leads to
g(y_1) < g(y_2) ⟺ y_1 < y_2 ∀ y_1, y_2 ∈ J.
Define x_n = g(y_n) and x̄ = g(ȳ). The property above tells us that, for every n ∈ N,
either xn ≤ x̄ − ε, or xn ≥ x̄ + ε.
In particular, there are infinitely many indices n’s for which one of the above options holds.
Without loss of generality, let us assume that there are infinitely many indices n’s for which
x_n ≤ x̄ − ε, and define a subsequence (x_{n_k})_{k=0}^∞ using such indices, so that
x_{n_k} ≤ x̄ − ε ∀ k ∈ N.
Since f is strictly increasing and x̄ − ε ∈ I (since both x_{n_k} = g(y_{n_k}) and x̄ = g(ȳ) belong to
I, and I is an interval), we deduce that
y_{n_k} = f(x_{n_k}) ≤ f(x̄ − ε) < f(x̄) = ȳ ∀ k ∈ N,
a contradiction.
is continuous, strictly increasing, and surjective. According to the Inverse Function Theorem,
there exists a continuous strictly increasing inverse [0, ∞) → [0, ∞), that we express as
x ↦ x^(1/n) = ⁿ√x.
Moreover, for integers m, n ≥ 1 one sets
x^(−m/n) = 1 / x^(m/n) for x ∈ (0, ∞).
Exercise 3.36. — For any real number a > 0, we define the sequence of real numbers
(x_n)_{n=1}^∞ by x_n = ⁿ√a. Show that the sequence (x_n)_{n=1}^∞ converges, and that
lim_{n→∞} ⁿ√a = 1.
Exercise 3.37. — We define a sequence of real numbers (x_n)_{n=1}^∞ by x_n = ⁿ√n. Show that
this sequence converges, with limiting value
lim_{n→∞} ⁿ√n = 1.
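Both limits are easy to probe numerically before proving them. A short sketch (illustrative, not part of the original notes):

```python
import math

n = 10**6
# For any fixed a > 0, the n-th root of a approaches 1 ...
for a in (0.1, 2.0, 1000.0):
    assert math.isclose(a ** (1 / n), 1.0, abs_tol=1e-4)
# ... and, less obviously, so does the n-th root of n itself.
assert math.isclose(n ** (1 / n), 1.0, abs_tol=1e-4)
```

The second assertion is the more delicate one: the base grows with n, yet the exponent 1/n wins, since n^(1/n) = exp(ln(n)/n) and ln(n)/n → 0.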
Proof. Assume by contradiction that f is not bounded. Then, for every n ∈ N there exists
a point x_n ∈ [a, b] such that |f(x_n)| ≥ n. Applying Lemma 3.38, we can find a subsequence
(x_{n_k})_{k=0}^∞ such that lim_{k→∞} x_{n_k} = x̄ ∈ [a, b].
Hence, by the continuity of |f| (recall Remark 3.21) we deduce that the sequence (|f(x_{n_k})|)_{k=0}^∞
converges to the real number |f(x̄)|, a contradiction since
|f(x_{n_k})| ≥ n_k → ∞ as k → ∞.
Proof. Since f is bounded (thanks to Theorem 3.39), Theorem 2.57 implies that the supremum
S = sup(f ([a, b])) exists. By definition of supremum, for any n ∈ N there exists yn = f (xn )
with xn ∈ [a, b], such that S − 2−n ≤ yn ≤ S. Hence, limn→∞ yn = S.
Thanks to Lemma 3.38, there exists a subsequence (x_{n_k})_{k=0}^∞ such that lim_{k→∞} x_{n_k} = x̄ ∈
[a, b]. Hence, by the continuity of f and Theorem 3.26, this implies that
f(x̄) = lim_{k→∞} f(x_{n_k}) = lim_{k→∞} y_{n_k} = S.
Exercise 3.43. — Does any continuous function f on the open interval (0, 1) attain its
maximum?
Exercise 3.45. — Show that the polynomial function f(x) = x² is continuous on R but
not uniformly continuous. Then show that the restriction of f to [0, 1] is uniformly continuous.
Proof. Assume by contradiction that f is not uniformly continuous. This means that there
exists ε > 0 such that, for all δ > 0, one can find x, y ∈ [a, b] with
|x − y| < δ and |f(x) − f(y)| ≥ ε.
Using this with δ = 2^(−n) > 0, for each n ∈ N we can find x_n, y_n ∈ [a, b] satisfying
|ynk − x̄| ≤ |ynk − xnk | + |xnk − x̄| < 2−nk + |xnk − x̄|.
k ≥ N_1 =⇒ |f(x_{n_k}) − f(x̄)| < ε/2, and k ≥ N_2 =⇒ |f(y_{n_k}) − f(x̄)| < ε/2.
Hence, for k ≥ max{N_1, N_2},
|f(x_{n_k}) − f(y_{n_k})| ≤ |f(x_{n_k}) − f(x̄)| + |f(y_{n_k}) − f(x̄)| < ε/2 + ε/2 = ε,
Exercise 3.47. — Does the statement of Theorem 3.46 hold for continuous functions on
the open interval (0, 1)?
Give examples of Lipschitz continuous functions, and show that a Lipschitz continuous
function is also uniformly continuous.
2. Let f : R≥0 → R be the root function, f(x) = √x. Show that:
(i) f|[0,1] : [0, 1] → R is not Lipschitz continuous;
(ii) f|[1,∞) : [1, ∞) → R is Lipschitz continuous;
(iii) f : [0, ∞) → R is uniformly continuous.
x / ((n + 1)(n + x)) ≤ (x + n) / ((n + 1)(n + x)) = 1/(n + 1) ≤ 1,
that is
−x / ((n + 1)(n + x)) ≥ −1 ∀ n ≥ n_0.
a_{n+1} / a_n = (1 + x/(n+1))^(n+1) / (1 + x/n)^n
= (1 + x/n) · [ (1 + x/(n+1)) / (1 + x/n) ]^(n+1)
= (1 + x/n) · [ n(n + 1 + x) / ((n + 1)(n + x)) ]^(n+1)
= ((n + x)/n) · [ 1 − x / ((n + 1)(n + x)) ]^(n+1)
≥ ((n + x)/n) · [ 1 − x/(n + x) ] = 1.
To show boundedness, we first consider the case x ≤ 0. In this case we note that
0 < 1 + x/n ≤ 1 ∀ n ≥ n_0,
thus 0 < (1 + x/n)^n ≤ 1 holds. So 1 is an upper bound for the increasing positive sequence
(a_n)_{n=n_0}^∞. Therefore
lim_{n→∞} (1 + x/n)^n = sup { (1 + x/n)^n | n ≥ n_0 } > 0.
In the case x > 0 we have
1 ≤ (1 + x/n)^n ≤ (1 − x/n)^(−n) = 1 / (1 + (−x)/n)^n ∀ n > x.
Since the sequence ((1 + (−x)/n)^n)_{n=1}^∞ converges to a positive number (by the case x ≤ 0
above), Proposition 2.96(3) implies that also the sequence ((1 + (−x)/n)^(−n))_{n=1}^∞ converges,
and in particular it is bounded (see Lemma 2.102). This implies that the monotonically
increasing sequence ((1 + x/n)^n)_{n=1}^∞ is also bounded, so it converges.
\[
\exp(x) = \lim_{n\to\infty}\Bigl(1+\frac{x}{n}\Bigr)^n \qquad \text{for all } x \in \mathbb{R}.
\]
e = 2.71828182845904523536028747135266249775724709369995 . . .
Proof. It suffices to observe that, given x ∈ R with x > −n, Lemma 3.51 and Definition 3.52
imply that an ≤ an+1 ≤ . . . ≤ exp(x).
\[
\exp(0) = 1, \tag{3.3}
\]
\[
\exp(-x) = \exp(x)^{-1} \quad \text{for all } x \in \mathbb{R}, \tag{3.4}
\]
\[
\exp(x+y) = \exp(x)\exp(y) \quad \text{for all } x, y \in \mathbb{R}. \tag{3.5}
\]
1. The identity (3.3) follows directly from the definition of the exponential function.
Regarding (3.4), the Bernoulli inequality gives, for n large,
\[
1 - \frac{x^2}{n} \le \Bigl(1-\frac{x^2}{n^2}\Bigr)^n = \Bigl(1+\frac{x}{n}\Bigr)^n\Bigl(1-\frac{x}{n}\Bigr)^n \le 1.
\]
Since the left-hand side converges to 1 as n → ∞, the Sandwich Lemma 2.99 implies that lim_{n→∞} (1 − x²/n²)^n = 1, so (3.4) holds.
To prove (3.5), note that
\[
\Bigl(1-\frac{x}{n}\Bigr)\Bigl(1-\frac{y}{n}\Bigr)\Bigl(1+\frac{x+y}{n}\Bigr)
= 1-\frac{(x+y)^2}{n^2}+\frac{xy}{n^2}\Bigl(1+\frac{x+y}{n}\Bigr)
= 1+\frac{c_n}{n^2},
\]
where c_n = −(x + y)² + xy(1 + (x + y)/n). Since
\[
-2|xy| \le -xy + xy\,\frac{x+y}{n} \le 2|xy| \qquad \forall\, n \ge |x|+|y|,
\]
and 2|xy| ≤ x² + y², we deduce that −2(x² + y²) ≤ c_n ≤ 0 for all n ≥ |x| + |y|. Hence, by the Bernoulli inequality,
\[
1-\frac{2(x^2+y^2)}{n} \le 1+\frac{c_n}{n} \le \Bigl(1+\frac{c_n}{n^2}\Bigr)^n \le 1 \qquad \forall\, n \ge |x|+|y|,
\]
so lim_{n→∞} (1 + c_n/n²)^n = 1 by the Sandwich Lemma 2.99. Using (3.4) and Proposition 2.96(2), we get
\[
\frac{\exp(x+y)}{\exp(x)\exp(y)}
= \lim_{n\to\infty}\Bigl(1-\frac{x}{n}\Bigr)^n\Bigl(1-\frac{y}{n}\Bigr)^n\Bigl(1+\frac{x+y}{n}\Bigr)^n
= \lim_{n\to\infty}\Bigl(1+\frac{c_n}{n^2}\Bigr)^n = 1.
\]
It remains to prove the continuity, monotonicity, and bijectivity of the exponential function.
We shall first prove some useful estimates. First of all, we claim that
\[
\exp(x) \ge 1+x \qquad \forall\, x \in \mathbb{R}. \tag{3.6}
\]
Indeed, for x ≤ −1 this is clear since exp(x) > 0. For x > −1, it follows from Corollary 3.53.
Combining (3.6) and (3.4), we also deduce that
\[
\exp(x) = \frac{1}{\exp(-x)} \le \frac{1}{1-x} \qquad \forall\, x < 1. \tag{3.7}
\]
In particular, given δ ∈ (0, 1), we deduce that
\[
x \in [0,\delta) \implies |\exp(x)-1| = \exp(x)-1 \le \frac{1}{1-x}-1 \le \frac{1}{1-\delta}-1 = \frac{\delta}{1-\delta},
\]
\[
x \in (-\delta,0] \implies |\exp(x)-1| = 1-\exp(x) \le 1-(1+x) = -x \le \delta \le \frac{\delta}{1-\delta}.
\]
Recalling Remark 3.10, to prove the continuity of exp at 0 it suffices to consider ε ∈ (0, 1]. So, let ε ∈ (0, 1] and choose δ = ε/(2 + ε). With this choice we see that δ < 1 and that
\[
x \in (-\delta,\delta) \implies |\exp(x)-\exp(0)| \le \frac{\delta}{1-\delta} = \frac{\varepsilon}{2} < \varepsilon.
\]
We can thus write the exponential function as a composition, namely exp = µ ◦ exp ◦ τ, with τ : R → R and µ : R → R given by τ(x) = x − x₀ and µ(y) = exp(x₀)·y (indeed, exp(x) = exp(x₀) exp(x − x₀) by (3.5)). Note that the functions τ and µ are continuous. In particular, τ is continuous at x₀, and exp is continuous at 0 = τ(x₀). It follows from Proposition 3.20 that exp is continuous at x₀.
2. The function exp is strictly increasing. For all real numbers x < y it follows from (3.6) that exp(y − x) ≥ 1 + (y − x) > 1, therefore exp(y) = exp(x) exp(y − x) > exp(x).
3. The function exp : R → R>0 is bijective. The exponential function is strictly increasing and, therefore, injective. To show surjectivity, choose a ∈ R>0 arbitrarily. If we set x₀ = −a^{−1} and x₁ = a, it follows from (3.8) that exp(x₀) < a < exp(x₁).
Since exp is continuous on all of R, it follows from the Intermediate Value Theorem 3.29 that there exists x ∈ [x₀, x₁] such that exp(x) = a. This shows the assertion and finishes the proof.
\[
\log(1) = 0, \tag{3.9}
\]
\[
\log(a^{-1}) = -\log(a) \quad \text{for all } a \in \mathbb{R}_{>0}, \tag{3.10}
\]
\[
\log(ab) = \log(a) + \log(b) \quad \text{for all } a, b \in \mathbb{R}_{>0}. \tag{3.11}
\]
Proof. This follows directly from Theorem 3.54 and the Inverse Function Theorem 3.34. The
equations (3.9), (3.10), and (3.11) follow from the corresponding properties of the exponential,
choosing x = log a and y = log b.
Figure 3.2: The graphs of the exponential function and the logarithm. The auxiliary lines
show exp(x) ≥ x + 1 and log(x) ≤ x − 1.
3.57. — The logarithm function defined here is also called the natural logarithm to distinguish it from the logarithm with another base a > 1, typically a = 10 or a = 2. Let a > 1 be a real number. We can define the logarithm log_a : R>0 → R in base a via
\[
\log_a(x) = \frac{\log x}{\log a} \qquad \forall\, x \in \mathbb{R}_{>0}.
\]
Verify that log10 (10n ) = n holds for all n ∈ Z. However, we will not use this definition, not
even for a = 10 or a = 2, and log(x) will always denote the natural logarithm of x ∈ R>0 to
base e.
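The base-change formula is easy to verify numerically (an illustration only; the helper name `log_base` is ours, not the notes'):

```python
import math

# Logarithm in base a, defined via natural logarithms as in Paragraph 3.57.
def log_base(a, x):
    return math.log(x) / math.log(a)

# Verify log_10(10^n) = n for several integers n, as the text asks.
for n in range(-3, 4):
    print(n, log_base(10.0, 10.0**n))
```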
3.58. — We can use the logarithm and exponential mapping to define more general powers. For a positive number a > 0 and arbitrary exponents x ∈ R, we write
\[
a^x = \exp(x \log(a)).
\]
Similarly, for x > 0 and a ∈ R, we define the power function
\[
x^a = \exp(a \log(x)).
\]
Exercise 3.59. — Show that for x ∈ Q and a > 0 this definition agrees with the definition of rational powers from Example 3.35. Furthermore, check the calculation rules
\[
a^{x+y} = a^x a^y, \qquad (a^x)^y = a^{xy}, \qquad (ab)^x = a^x b^x \qquad \forall\, a, b > 0,\ x, y \in \mathbb{R}.
\]
Exercise 3.60. — Let a > 0 be a positive number. Show that there exists a real number
Ca > 0 such that log(x) ≤ Ca xa holds for all x > 0.
Exercise 3.61. — Show that for all real numbers x ≥ −1 and p ≥ 1, the continuous Bernoulli inequality
\[
(1+x)^p \ge 1 + px
\]
holds.
Exercise 3.62. — In this exercise, we consider another continuity term (compare with
Exercise 3.48).
2. Given α ∈ (0, 1], consider the function f : [0, ∞) → R given by f (x) = xα . Show that
f is α-Hölder continuous.
Hint: use the inequality
Applet 3.63 (Slide rule). Using the slide rule, calculate some products and quotients. Recall
the properties of the logarithm to see how to do these calculations. Before the introduction of
electronic calculators, these mechanical aids were widely used.
holds for all δ > 0. Whenever (3.13) holds, we say that x0 is an accumulation point of D.
Note that when x0 ∈ D, (3.13) is always satisfied.
We remark that (3.13) implies that there exists a sequence of points in D converging to
x0 .
In general, the limit of f(x) as x → x₀ may not exist. But if a limit exists, then it is uniquely determined. Therefore, from now on, we speak of the limit and write
lim f (x) = L
x→x0
if the limit of f (x) as x → x0 exists and is equal to L. Informally, this means that the function
values of f are arbitrarily close to L if x ∈ D is close to x0 .
3.66. — The limit of a function satisfies properties analogous to Proposition 2.96. If f and g are functions on D such that the limits
\[
L = \lim_{\substack{x\to x_0 \\ x\ne x_0}} f(x) \tag{3.15}
\]
in place of (3.14). Note that, in the situation above, the function f̃ : D → R defined by
\[
\tilde f(x) = \begin{cases} f(x) & \text{if } x \in D \setminus \{x_0\} \\ L & \text{if } x = x_0 \end{cases} \tag{3.16}
\]
is continuous at the point x₀. In other words, we have removed the discontinuity by replacing the value of the function f at the location x₀ with L.
3.69. — Suppose instead that x₀ ∉ D but the limit in (3.15) exists. In this situation, we call the function f̃ defined in (3.16) the continuous extension of f to D ∪ {x₀}.
Arguing exactly as in the proof of Theorem 3.26, we also have the validity of the following:
Let f : D → R. Then L = lim_{x→x̄} f(x) if and only if, for every sequence (x_n)_{n=0}^∞ ⊂ D converging to x̄, lim_{n→∞} f(x_n) = L also holds.
Finally, we state a result on the limit of the composition with a continuous function.
3.72. — We can introduce conventions for improper limits of functions, as we have already
done for sequences.
We say that lim_{x→x₀} f(x) = ∞ if for every real number M > 0 there exists δ > 0 with the property that
\[
x \in D \cap (x_0-\delta, x_0+\delta) \implies f(x) \ge M.
\]
Analogously, lim_{x→x₀} f(x) = −∞ if for every real number M > 0 there exists δ > 0 with the property that
\[
x \in D \cap (x_0-\delta, x_0+\delta) \implies f(x) \le -M.
\]
\[
L = \lim_{\substack{x\to x_0 \\ x\ge x_0}} f(x). \tag{3.17}
\]
\[
L = \lim_{\substack{x\to x_0 \\ x> x_0}} f(x), \quad \text{or also} \quad L = \lim_{x\to x_0^+} f(x).
\]
Analogous to limits from the right, we can also define limits from the left, with the notation
\[
\lim_{\substack{x\to x_0 \\ x\le x_0}} f(x) \qquad \text{and} \qquad \lim_{x\to x_0^-} f(x) = \lim_{\substack{x\to x_0 \\ x< x_0}} f(x).
\]
Finally, we can also allow the symbols +∞ and −∞ for one-sided limits, as in Paragraph 3.72.
Instead, we say that f diverges to ∞ as x → ∞ if for every M > 0 there exists R > 0
such that
x ∈ D ∩ (R, ∞) =⇒ f (x) ≥ M.
If the limit of f as x → ∞ exists, then it is unique and we write
\[
L = \lim_{x\to\infty} f(x).
\]
If f diverges to ∞ as x → ∞, we write
\[
\lim_{x\to\infty} f(x) = \infty.
\]
Of course, also limits at −∞ and/or the case when f diverges to −∞ can be considered, and
the definitions are analogous.
3.76. — Limits as x → ∞ can be transformed into limits from the right as x → 0. Indeed,
if D and f are given as in Definition 3.75, consider the set E and the function g : E → R
defined as
E = {x ∈ R>0 | x−1 ∈ D}, g(x) = f (x−1 ).
Then it holds
\[
\lim_{x\to\infty} f(x) = \lim_{x\to 0^+} g(x),
\]
in the sense that one limit exists if and only if the other limit exists, and in that case they coincide.
If the one-sided limit
\[
\lim_{x\to x_0^+} f(x)
\]
exists and is equal to f(x₀), then we say that f is continuous from the right at x₀. Similarly, we define continuity from the left.
We call x0 ∈ D a jump point if the one-sided limits
3.78. — The following graph represents a function with three points of discontinuity x1 ,
x2 , x3 .
Example 3.79. — The domain of definition for all functions in this example is D = R>0. We want to study the limit as x → 0⁺ of the function f : D → R given by f(x) = xˣ = exp(x log x).
1. We claim that
\[
\lim_{y\to\infty} y \exp(-y) = 0 \tag{3.19}
\]
holds. Indeed, Corollary 3.53 implies that exp(y) ≥ (1 + y/2)² holds for y > 0. This gives
\[
0 \le y\exp(-y) \le \frac{y}{(1+\frac{y}{2})^2} \le \frac{4}{y},
\]
which implies (3.19) because of the sandwich lemma.
2. Next, we claim that lim_{x→0⁺} x log(x) = 0. (3.20) So let ε > 0. Because of (3.19) there exists R > 0 such that |y exp(−y)| < ε for all y > R. Set δ = exp(−R) and consider x ∈ (0, δ). Then y = −log x satisfies y > R due to the strict monotonicity of the logarithm, therefore |x log x| = |exp(−y) y| < ε, which shows (3.20).
Because of Proposition 3.71 and since the exponential mapping is continuous, (3.20) yields
\[
\lim_{x\to 0^+} x^x = \lim_{x\to 0^+} \exp(x\log x) = \exp(0) = 1.
\]

Exercise. — Compute the limits
\[
\lim_{x\to 2} \frac{x^3-x^2-x-2}{x-2}, \qquad \lim_{x\to\infty} \frac{3e^{2x}+e^x+1}{2e^{2x}-1}, \qquad \lim_{x\to\infty} \frac{e^x}{x^a}, \qquad \lim_{x\to\infty} \frac{\log(x)}{x^a}.
\]
In each case, choose a suitable domain on which the given formulas define a function.
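The values of these limits can be guessed numerically before computing them rigorously (an illustration only; the sample exponent a = 3 is our choice, not the notes'):

```python
import math

# First limit: x^3 - x^2 - x - 2 = (x - 2)(x^2 + x + 1), so the quotient tends to 7.
f1 = lambda x: (x**3 - x**2 - x - 2) / (x - 2)
# Second limit: divide numerator and denominator by e^(2x); the quotient tends to 3/2.
f2 = lambda x: (3 * math.exp(2 * x) + math.exp(x) + 1) / (2 * math.exp(2 * x) - 1)

print(f1(2 + 1e-7), f2(30.0))
# Sample exponent a = 3: exp beats every power, log loses to every positive power.
print(math.exp(30.0) / 30.0**3, math.log(30.0) / 30.0**3)
```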
3.5.3 Landau Notation
We now introduce two common notations that relate the asymptotic behaviour of a function to
the asymptotic behaviour of another function – that is, describe relative asymptotic behaviour.
These notations are named after the German-Jewish mathematician Edmund Georg Hermann
Landau (1877 - 1938).
f (x) = O(g(x)) as x → x0
If g(x) ≠ 0 for all x ∈ D \ {x₀} sufficiently close to x₀, then f(x) = O(g(x)) is equivalent to f(x)/g(x) being bounded in a neighbourhood of x₀.
We can also allow for x0 the elements ∞ and −∞ of the extended number line, as discussed
in the next definition. We define only the case at ∞, the definition for −∞ is analogous.
f (x) = O(g(x)) as x → ∞
The advantage of this notation is that we do not need to introduce the name for the upper
bound M . If we are not particularly interested in this constant, then we can concentrate on
the essentials in calculations. In this context, one also speaks of implicit constants.
• It holds
\[
x^2 = O(x) \quad \text{as } x \to 0.
\]
• It holds
\[
\frac{3x^3}{x^3+3} = O(1) \quad \text{as } x \to \infty,
\]
but not 3x³/(x³ + 3) = O(x^α) for α < 0.
As discussed above, the big-O means that f is bounded by a multiple of g. One may also
consider a stronger condition, namely that f is asymptotically negligible with respect to g.
This leads to the following definition.
f (x) = o(g(x)) as x → x0
f (x) = o(g(x)) as x → ∞
For example,
\[
\frac{3x^3}{2x^2+x^{10}} = o(|x|^\alpha) \quad \text{as } x \to 0
\]
holds for every α < 1, but not for α ≥ 1. In fact, for any α < 1 the limit
\[
\lim_{x\to 0} \frac{3|x|^3}{|x|^\alpha(2x^2+x^{10})} = \lim_{x\to 0} \frac{3|x|^{1-\alpha}}{2+x^8} = \frac{3}{2}\lim_{x\to 0}|x|^{1-\alpha}
\]
exists and is equal to 0.
1. x^p = o(x) as x → 0;
2. x = o(x^p) as x → ∞;
3. x^a = o(e^x) as x → ∞;
4. log(x) = o(x^b) as x → ∞.
for every α1 , α2 ∈ R. Formulate and show the analogous statement for big-O.
x → x0 ;
• If f1 (x) = o(g1 (x)) and f2 (x) = O(g2 (x)) as x → x0 , then f1 (x)f2 (x) = o(g1 (x)g2 (x)) as
x → x0 ;
• If f1 (x) = O(g1 (x)) and f2 (x) = O(g2 (x)) as x → x0 , then f1 (x)f2 (x) = O(g1 (x)g2 (x)) as
x → x0 .
Example 3.91. — Let f(x) = x + x³ + 4x⁴ + x⁷ and g(x) = x + 3x²/(1 + x³). Then f(x) = x + o(x²) and g(x) = x + O(x²) as x → 0. In particular, their product satisfies the following:
3.92. — Landau notation is often used as a placeholder, for example, to express that one
term in a sum is increasing or decreasing faster than the others. In an expression of the form
f (x) + o(g(x)) as x → x0
the term o(g(x)) stands for a function h : D → R with the property that h(x) = o(g(x)) as x → x₀.
For example, one often writes
\[
\frac{x^3-7x^2+6x+2}{x^2} = x-7+O\Bigl(\frac{1}{x}\Bigr) = x-7+o(1) = x+O(1) = x+o(x) \qquad \text{as } x \to \infty,
\]
and thus keeps on the right-hand side only those terms that make up the bulk of the expression as x → ∞. It may perhaps come as a surprise that, in the above example, all four formulas could be true or useful. The assertions all follow directly from polynomial division with remainder and, depending on the context, one might want to use the slightly more precise assertion with error −7 + O(1/x) or the coarser assertion with error o(x).
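One can see these error terms numerically (an illustration only): the difference between the quotient and x − 7 is exactly 6/x + 2/x², i.e. an O(1/x) quantity.

```python
# h(x) = (x^3 - 7x^2 + 6x + 2) / x^2 = x - 7 + 6/x + 2/x^2.
def h(x):
    return (x**3 - 7 * x**2 + 6 * x + 2) / x**2

for x in [1e2, 1e3, 1e4]:
    # The O(1/x) error term: x * (h(x) - (x - 7)) should stay close to 6.
    print(x, h(x) - (x - 7), x * (h(x) - (x - 7)))
```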
Exercise 3.96. — Show that the pointwise limit of a sequence of functions is uniquely
determined if it exists.
3.97. — In the following example we show that in general the continuity property is not preserved under pointwise convergence.
Example 3.98. — Let D = [0, 1] and let f_n : D → R be given by f_n(x) = xⁿ. Then the sequence of continuous functions (f_n)_{n=0}^∞ converges pointwise to the discontinuous function f : D → R given by
\[
f(x) = \lim_{n\to\infty} f_n(x) = \lim_{n\to\infty} x^n = \begin{cases} 0 & \text{for } x < 1 \\ 1 & \text{for } x = 1. \end{cases}
\]
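A numerical illustration (not part of the notes) of why this convergence is only pointwise: f_n(x) = xⁿ becomes small at any fixed x < 1, yet its supremum over [0, 1) stays close to 1 for every n.

```python
# Pointwise: at a fixed x < 1 the values x^n tend to 0.
print(0.5**50)                                  # essentially 0

# Not uniform: near x = 1 the function x^50 is still close to 1.
grid = [k / 1000 for k in range(1000)]          # sample points in [0, 1)
print(max(x**50 for x in grid))                 # close to 1
```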
3.100. — The estimate |f_n(x) − f(x)| < ε is equivalent to f(x) − ε < f_n(x) < f(x) + ε.
Thus, uniform convergence can also be described by the graph of a function sequence and its
limit function, as the following figure shows: The function sequence fn converges uniformly
to f if for every ε > 0 the graph of fn lies in the “ε-tube” around f for all sufficiently large n.
Exercise 3.101. — Let D be a set and let (f_n)_{n=0}^∞ be a sequence of functions f_n : D → R. Show that if (f_n)_{n=0}^∞ converges uniformly to a function f, then (f_n)_{n=0}^∞ also converges pointwise to f.
Proof. Let x̄ ∈ D and ε > 0. First, by the uniform convergence of f_n to f, there exists N ∈ N such that |f_N(y) − f(y)| < ε/3 for all y ∈ D. Then, since f_N is continuous at x̄, there exists δ > 0 such that
\[
|x-\bar x| < \delta \implies |f_N(x)-f_N(\bar x)| < \frac{\varepsilon}{3}
\]
holds for all x ∈ D. Then, for all x ∈ D with |x − x̄| < δ it follows that
\[
|f(x)-f(\bar x)| \le |f(x)-f_N(x)| + |f_N(x)-f_N(\bar x)| + |f_N(\bar x)-f(\bar x)| < \frac{\varepsilon}{3}+\frac{\varepsilon}{3}+\frac{\varepsilon}{3} = \varepsilon,
\]
which proves that f is continuous at x̄. Since x̄ is arbitrary in D, the theorem follows.
Exercise 3.103. — Let (f_n)_{n=0}^∞ be a sequence of bounded real-valued functions on a set D, and let f : D → R be another real-valued function on D. Suppose that D = D₁ ∪ D₂ for two subsets such that (f_n|_{D₁})_{n=0}^∞ tends uniformly towards f|_{D₁} and (f_n|_{D₂})_{n=0}^∞ tends uniformly towards f|_{D₂}. Show that (f_n)_{n=0}^∞ tends uniformly towards f.
Exercise 3.104. — Let (f_n)_{n=0}^∞ be a sequence of bounded real-valued functions on a set D. Show that if (f_n)_{n=0}^∞ converges uniformly to a function f : D → R, then f is also bounded. Find also an example in which a sequence (f_n)_{n=0}^∞ of bounded functions converges pointwise to an unbounded function.
Exercise 3.105. — Let D ⊂ R and (f_n)_{n=0}^∞ be a sequence of uniformly continuous real-valued functions on D that tends uniformly to f : D → R. Let (x_n)_{n=0}^∞ be a sequence in D that converges towards x̄ ∈ D. Show that lim_{n→∞} f_n(x_n) = f(x̄).
Exercise 3.106. — Let D ⊂ R and (f_n)_{n=0}^∞ be a sequence of uniformly continuous real-valued functions on D, uniformly converging to f : D → R. Show that f is uniformly continuous.
In this chapter we will consider so-called series, i.e., “infinite sums”, which will lead us to the
definitions of known functions, in particular to the definitions of trigonometric functions.
\[
A = \lim_{n\to\infty} \sum_{k=0}^{n} a_k.
\]
In other words, computing the infinite sum Σ_{k=0}^∞ a_k corresponds to finding (if it exists) the limit of the sequence (s_n)_{n=0}^∞ given by the partial sums
\[
s_n = \sum_{k=0}^{n} a_k.
\]
We call a_n the n-th element or the n-th summand of the series. We call the series Σ_{k=0}^∞ a_k convergent if the limit exists, in which case we call it the value of the series. Otherwise, the series is not convergent.
If the sequence (s_n)_{n=0}^∞ diverges to ∞ (respectively, −∞), then we call the series Σ_{k=0}^∞ a_k divergent to ∞ (respectively, −∞).
Remark 4.2. — Unless otherwise specified, all sequences always consist of real numbers.
Chapter 4.1 Series of Real Numbers
Proof. By assumption, the partial sums s_n = Σ_{k=0}^n a_k for n ∈ N have a limit lim_{n→∞} s_n = S. Hence a_n = s_n − s_{n−1} → S − S = 0 as n → ∞.

Example 4.4 (Geometric Series). — The geometric series Σ_{n=0}^∞ qⁿ for q ∈ R converges exactly when |q| < 1 and, in this case,
\[
\sum_{k=0}^{\infty} q^k = \frac{1}{1-q}.
\]
Indeed, by Proposition 4.3, convergence of the series implies that qⁿ → 0, hence |q| < 1. Conversely, for |q| < 1, one first proves by induction on n ∈ N the validity of the "geometric sum formula"
\[
\sum_{k=0}^{n} q^k = \frac{1-q^{n+1}}{1-q} \qquad \forall\, n \in \mathbb{N},\ q \ne 1,
\]
and then lets n → ∞, using that q^{n+1} → 0.
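A quick check of the geometric sum formula and the limit (an illustration only):

```python
# Partial sums of the geometric series versus the closed forms.
def geom_partial(q, n):
    return sum(q**k for k in range(n + 1))

q = 0.5
print(geom_partial(q, 10))               # matches (1 - q^11) / (1 - q)
print(geom_partial(q, 60), 1 / (1 - q))  # the partial sums approach 1/(1 - q) = 2
```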
Example 4.5 (Harmonic Series). — The converse of Proposition 4.3 does not hold. For example, the harmonic series Σ_{k=1}^∞ 1/k is divergent. We prove the divergence with a concrete estimate.
Let ℓ ∈ N and consider n = 2^ℓ. Then the partial sum of the harmonic series for n satisfies the estimate
\[
\sum_{k=1}^{2^\ell} \frac{1}{k}
= 1 + \frac12 + \Bigl(\frac13+\frac14\Bigr) + \Bigl(\frac15+\frac16+\frac17+\frac18\Bigr) + \cdots + \Bigl(\frac{1}{2^{\ell-1}+1}+\cdots+\frac{1}{2^{\ell}}\Bigr)
\ge 1 + \frac12 + 2\cdot\frac14 + 4\cdot\frac18 + \cdots + 2^{\ell-1}\cdot\frac{1}{2^{\ell}}
= 1 + \frac{\ell}{2}.
\]
Since ℓ ∈ N is arbitrary we see that the partial sums are not bounded, and therefore, the
harmonic series is divergent.
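The estimate 1 + ℓ/2 can be verified numerically (an illustration only):

```python
# Partial sums of the harmonic series up to 2^l, compared with the lower bound 1 + l/2.
def H(n):
    return sum(1.0 / k for k in range(1, n + 1))

for l in [1, 5, 10, 15]:
    print(l, H(2**l), 1 + l / 2)   # the partial sum always dominates 1 + l/2
```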
\[
\sum_{k=0}^{\infty} (\alpha a_k + \beta b_k) = \alpha \sum_{k=0}^{\infty} a_k + \beta \sum_{k=0}^{\infty} b_k.
\]
\[
\sum_{k=0}^{\infty} a_k = \sum_{k=0}^{N-1} a_k + \sum_{k=N}^{\infty} a_k.
\]
Indeed, for every n ≥ N,
\[
\sum_{k=0}^{n} a_k = \sum_{k=0}^{N-1} a_k + \sum_{k=N}^{n} a_k.
\]
In particular, the partial sums of Σ_{k=N}^∞ a_k converge exactly when the partial sums of Σ_{k=0}^∞ a_k converge. The case of a divergent sequence is completely analogous.
Proof. From a_{n+1} ≥ 0 it follows that s_{n+1} = s_n + a_{n+1} ≥ s_n for all n ∈ N, so the sequence (s_n)_{n=0}^∞ is increasing. If the partial sums {s_n | n ∈ N} are bounded, then they converge according to Theorem 2.108.
Remark 4.9. — If Σ_{k=0}^∞ a_k is a series with non-negative elements, then the sequence of partial sums (s_n)_{n=0}^∞ is bounded if and only if there exists a bounded subsequence (s_{n_k})_{k=0}^∞ (see Remark 2.109).
\[
\sum_{k=0}^{\infty} b_k \text{ convergent} \implies \sum_{k=0}^{\infty} a_k \text{ convergent},
\qquad
\sum_{k=0}^{\infty} a_k \text{ divergent to } \infty \implies \sum_{k=0}^{\infty} b_k \text{ divergent to } \infty.
\]
Proof. From a_k ≤ b_k it follows that Σ_{k=0}^n a_k ≤ Σ_{k=0}^n b_k for all n ∈ N. Thus, according to the
Under the assumptions of the corollary, one calls the series Σ_{k=0}^∞ b_k a majorant of the series Σ_{k=0}^∞ a_k, and the latter is a minorant of the series Σ_{k=0}^∞ b_k. This is why one refers to the above result as the majorant and the minorant criterion.
\[
\sum_{k=0}^{\infty} a_k \text{ converges} \iff \sum_{k=0}^{\infty} 2^k a_{2^k} \text{ converges}.
\]
Proof. (Extra material) Due to the monotonicity of the sequence (a_k)_{k=0}^∞, the following inequalities hold:
\[
1\cdot a_1 \ge a_2 \ge 1\cdot a_2, \qquad 2\cdot a_2 \ge a_3+a_4 \ge 2\cdot a_4,
\]
\[
4\cdot a_4 \ge a_5+a_6+a_7+a_8 \ge 4\cdot a_8, \qquad 8\cdot a_8 \ge a_9+\ldots+a_{16} \ge 8\cdot a_{16},
\]
and so on. Summing, we get
\[
\sum_{k=0}^{n} 2^k a_{2^k} \ge \sum_{k=0}^{n} \bigl(a_{2^k+1}+\ldots+a_{2^{k+1}}\bigr) = \sum_{j=2}^{2^{n+1}} a_j
\]
and
\[
\sum_{j=2}^{2^{n+1}} a_j = \sum_{k=0}^{n} \bigl(a_{2^k+1}+\ldots+a_{2^{k+1}}\bigr) \ge \sum_{k=0}^{n} 2^k a_{2^{k+1}} = \frac12\sum_{k=0}^{n} 2^{k+1} a_{2^{k+1}} = \frac12\sum_{k=1}^{n+1} 2^k a_{2^k}.
\]
Because of Remark 4.9 and Corollary 4.10, it follows that the series Σ_{k=0}^∞ 2^k a_{2^k} has bounded partial sums (and therefore converges) if and only if the series Σ_{k=0}^∞ a_k has bounded partial sums (and therefore converges).
Example 4.14. — Given p ∈ R, the series Σ_{n=1}^∞ 1/n^p converges exactly when p > 1. Indeed, for p ≤ 0 the terms do not tend to zero, so the series diverges by Proposition 4.3. For p > 0 the sequence (1/n^p)_{n=1}^∞ is decreasing, so by Proposition 4.13 the series converges exactly when
\[
\sum_{k=0}^{\infty} 2^k \frac{1}{(2^k)^p} = \sum_{k=0}^{\infty} (2^{1-p})^k
\]
converges. According to Example 4.4, this series converges exactly when 2^{1−p} < 1, that is, p > 1.
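Numerically (an illustration only), the dichotomy at p = 1 is clearly visible; the p = 2 series happens to converge to π²/6, a classical fact not proved at this point in the notes.

```python
import math

# Partial sums of sum 1/n^p: bounded for p > 1, unbounded for p <= 1.
def partial(p, n):
    return sum(1.0 / k**p for k in range(1, n + 1))

print(partial(2.0, 10**4), math.pi**2 / 6)   # p = 2: converges
print(partial(1.0, 10**4))                   # p = 1: grows like log n
```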
Remark 4.15. — The argument in Example 4.14 gives an alternative proof that the har-
monic series diverges (see Example 4.5).
Exercise 4.16. — Given p ∈ R, the series Σ_{n=2}^∞ 1/(n log(n)^p) converges exactly when p > 1. Hint: for p ≤ 0, compare the series above with the harmonic series; for p > 0, use Proposition 4.13 and Example 4.14.
Exercise 4.17. — Is the series Σ_{n=3}^∞ 1/(n log(n) log log(n)) convergent or divergent?
The series Σ_{k=0}^∞ a_k is called absolutely convergent if the series Σ_{k=0}^∞ |a_k| converges. The series Σ_{k=0}^∞ a_k is conditionally convergent if it converges but is not absolutely convergent.
The critical property of a conditionally convergent series is that one can rearrange the terms to obtain any possible limit!
P = {n ∈ N | an ≥ 0}, N = {n ∈ N | an < 0}
depending on the sign of the corresponding a_n, and we enumerate the elements of P and N in increasing order, i.e., P = {p₀, p₁, ...} and N = {n₀, n₁, ...}, with p₀ < p₁ < p₂ < · · · and n₀ < n₁ < n₂ < · · ·.
divergence of the series (4.2). Since an → 0 as n → ∞, the sequence of partial sums (sn )∞
n=0
converges to A, which shows (4.1).
Exercise 4.20. — Complete the details omitted from the proof of Theorem 4.19. Also,
show that for A one can also take one of the symbols −∞ or ∞.
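The proof idea — alternately spending unused positive and negative terms to steer the partial sums toward A — can be sketched for the alternating harmonic series Σ (−1)^{n+1}/n (an illustration only; the target A = 1.5 is our arbitrary choice):

```python
# Greedy rearrangement of the conditionally convergent series 1 - 1/2 + 1/3 - ...
def rearranged_partial(A, steps):
    pos, neg = 1, 2     # next unused odd denominator (positive term) / even one (negative)
    s = 0.0
    for _ in range(steps):
        if s <= A:      # below the target: take the next positive term 1/pos
            s += 1.0 / pos
            pos += 2
        else:           # above the target: take the next negative term -1/neg
            s -= 1.0 / neg
            neg += 2
    return s

print(rearranged_partial(1.5, 10**5))   # close to the chosen target 1.5
```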
\[
\sum_{k=0}^{2n+1} (-1)^k a_k \;\le\; \sum_{k=0}^{\infty} (-1)^k a_k \;\le\; \sum_{k=0}^{2n} (-1)^k a_k. \tag{4.3}
\]
Proof. For n ∈ N, let s_n = Σ_{k=0}^n (−1)^k a_k. Since the sequence (a_n)_{n=0}^∞ is decreasing and non-negative, we have
\[
s_{2n+2} = s_{2n} - a_{2n+1} + a_{2n+2} \le s_{2n}
\qquad\text{and}\qquad
s_{2n+3} = s_{2n+1} + a_{2n+2} - a_{2n+3} \ge s_{2n+1}.
\]
Hence, the sequence (s_{2n})_{n=0}^∞ is decreasing and bounded from below, while the sequence (s_{2n+1})_{n=0}^∞ is increasing and bounded from above, so the limits A = lim_{n→∞} s_{2n+1} and B = lim_{n→∞} s_{2n} exist. However, since s_{2n+2} − s_{2n+1} = a_{2n+2} converges to zero, then A = B and the result follows.
Consider the alternating harmonic series Σ_{n=1}^∞ (−1)^{n+1}/n. Proposition 4.22 guarantees that this series converges, while the series of its absolute values
\[
\sum_{n=1}^{\infty} \frac{|(-1)^{n+1}|}{n} = \sum_{n=1}^{\infty} \frac{1}{n}
\]
diverges to infinity (see Example 4.5). So this series is only conditionally convergent.
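Numerically (an illustration only — the value log 2 is a classical fact, not established here):

```python
import math

# Partial sums of the alternating harmonic series 1 - 1/2 + 1/3 - ...
def alt_partial(n):
    return sum((-1)**(k + 1) / k for k in range(1, n + 1))

print(alt_partial(10**5), math.log(2))             # the series converges (to log 2)
print(sum(1.0 / k for k in range(1, 10**5 + 1)))   # absolute values: divergent harmonic sums
```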
|sn − sm | < ε ∀ n, m ≥ N.
Example 4.25. — To see the divergence of the harmonic series, we can also use the Cauchy criterion. We do this by setting ε = 1/2 and noticing that, for N ∈ N, the following holds:
\[
\sum_{k=N}^{2N} \frac{1}{k} = \underbrace{\frac{1}{N} + \frac{1}{N+1} + \ldots + \frac{1}{2N}}_{N+1 \text{ terms}} \ge \frac{N+1}{2N} > \frac12.
\]
Hence the harmonic series cannot converge, since it does not satisfy the Cauchy criterion in Theorem 4.24.
Proof. Since the series Σ_{n=0}^∞ |a_n| converges, according to the Cauchy criterion in Theorem
We now prove two important criteria to guarantee the absolute convergence of a series. In
their proofs, we will implicitly use the following fact:
Assume that a sequence (x_n)_{n=0}^∞ converges to a limit α. Then, given q > α (respectively, q < α), there exists N ∈ N such that x_n < q (respectively, x_n > q) for every n ≥ N. This fact is a consequence of Proposition 2.97.
\[
\alpha = \limsup_{n\to\infty} \sqrt[n]{|a_n|} \in \mathbb{R} \cup \{\infty\}.
\]
Then
\[
\alpha < 1 \implies \sum_{n=0}^{\infty} a_n \text{ converges absolutely}
\]
and
\[
\alpha > 1 \implies \sum_{n=0}^{\infty} a_n \text{ does not converge}.
\]
such that
\[
x_N = \sup_{k\ge N} \sqrt[k]{|a_k|} < q.
\]
In particular, |a_k| ≤ q^k for all k ≥ N, so the absolute convergence of the series follows from Corollary 4.10 and the convergence of the geometric series in Example 2.134 (recall that q < 1).
If α > 1 holds, since the limsup is an accumulation point (see Theorem 2.116), Proposition 2.90 implies the existence of a subsequence (a_{n_k})_{k=0}^∞ such that lim_{k→∞} \sqrt[n_k]{|a_{n_k}|} > 1. In particular, |a_{n_k}| ≥ 1 for all k sufficiently large, so (a_n)_{n=0}^∞ does not converge to zero and the series does not converge by Proposition 4.3.
4.28. — Let (a_n)_{n=0}^∞ be a sequence of real numbers and α = limsup_{n→∞} \sqrt[n]{|a_n|} as in the root criterion. If α = 1 holds, then no decision about convergence or divergence of the series Σ_{n=0}^∞ a_n can be made using the root criterion:
• For the harmonic series we have \sqrt[n]{1/n} → 1 as n → ∞, and the series diverges by Example 4.5.
• On the other hand, \sqrt[n]{1/n^2} → 1 as n → ∞, but the series Σ_{n=1}^∞ 1/n² converges according to Example 4.11.
Let (a_n)_{n=0}^∞ be a sequence of real numbers with a_n ≠ 0 for all n ∈ N, and assume that
\[
\alpha = \lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} \quad \text{exists.}
\]
Then
\[
\alpha < 1 \implies \sum_{n=0}^{\infty} a_n \text{ converges absolutely}
\]
and
\[
\alpha > 1 \implies \sum_{n=0}^{\infty} a_n \text{ does not converge}.
\]
Proof. If α < 1, choose q ∈ (α, 1). Then there exists N ∈ N such that
\[
\frac{|a_{k+1}|}{|a_k|} < q \qquad \forall\, k \ge N,
\]
hence |a_k| ≤ q^{k−N} |a_N| for all k ≥ N, and the absolute convergence follows by comparison with the geometric series. If α > 1, there exists N ∈ N such that
\[
\frac{|a_{k+1}|}{|a_k|} > 1 \qquad \forall\, k \ge N,
\]
therefore
\[
|a_k| = \frac{|a_k|}{|a_{k-1}|}\cdot\frac{|a_{k-1}|}{|a_{k-2}|}\cdot\ldots\cdot\frac{|a_{N+1}|}{|a_N|}\cdot|a_N| > |a_N| \qquad \forall\, k > N.
\]
This implies that the sequence (a_n)_{n≥0} does not converge to zero. Hence, according to Proposition 4.3, Σ_{n=0}^∞ a_n does not converge.
\[
\alpha^{+} = \limsup_{n\to\infty} \frac{|a_{n+1}|}{|a_n|}, \qquad \alpha^{-} = \liminf_{n\to\infty} \frac{|a_{n+1}|}{|a_n|}.
\]
\[
\sum_{n=0}^{\infty} a_n = \sum_{n=0}^{\infty} a_{\varphi(n)}. \tag{4.4}
\]
Proof. Let φ : N → N be a bijection, and fix ε > 0. By the convergence of Σ_{n=0}^∞ |a_n|, there exists N ∈ N such that
\[
\sum_{k=N+1}^{\infty} |a_k| < \frac{\varepsilon}{2}.
\]
Let M be the maximum of the finite set {φ^{−1}(k) | 0 ≤ k ≤ N}. Equivalently, M ∈ N is the smallest number such that
\[
\{a_0, \ldots, a_N\} \subset \{a_{\varphi(0)}, \ldots, a_{\varphi(M)}\}.
\]
Then, for every n ≥ M,
\[
\sum_{\ell=0}^{n} a_{\varphi(\ell)} - \sum_{k=0}^{N} a_k = \sum_{\substack{0\le\ell\le n \\ \varphi(\ell)>N}} a_{\varphi(\ell)}.
\]
Hence
\[
\Bigl|\sum_{\ell=0}^{n} a_{\varphi(\ell)} - \sum_{k=0}^{\infty} a_k\Bigr|
= \Bigl|\sum_{\ell=0}^{n} a_{\varphi(\ell)} - \sum_{k=0}^{N} a_k - \sum_{k=N+1}^{\infty} a_k\Bigr|
= \Bigl|\sum_{\substack{0\le\ell\le n \\ \varphi(\ell)>N}} a_{\varphi(\ell)} - \sum_{k=N+1}^{\infty} a_k\Bigr|
\le \Bigl|\sum_{\substack{0\le\ell\le n \\ \varphi(\ell)>N}} a_{\varphi(\ell)}\Bigr| + \Bigl|\sum_{k=N+1}^{\infty} a_k\Bigr|
\le \sum_{\substack{0\le\ell\le n \\ \varphi(\ell)>N}} |a_{\varphi(\ell)}| + \sum_{k=N+1}^{\infty} |a_k|.
\]
Note now that all terms of the form a_{φ(ℓ)} with φ(ℓ) > N are contained inside the infinite set {a_k | k > N}, therefore
\[
\sum_{\substack{0\le\ell\le n \\ \varphi(\ell)>N}} |a_{\varphi(\ell)}| \le \sum_{k=N+1}^{\infty} |a_k|.
\]
Proof. Consider first a bijection α : N → N² written as α(n) = (α₁(n), α₂(n)) such that {α(k) | 0 ≤ k < n²} = {0, 1, ..., n − 1}² for all n ∈ N. For example, (α(n))_{n=0}^∞ could pass through the set N² as in the following figure.
Then
\[
\sup_{n\in\mathbb{N}} \sum_{k=0}^{n^2-1} |a_{\alpha_1(k)}|\,|b_{\alpha_2(k)}| \le \Bigl(\sum_{\ell=0}^{\infty} |a_\ell|\Bigr)\Bigl(\sum_{m=0}^{\infty} |b_m|\Bigr) < \infty,
\]
so the series Σ_{k=0}^∞ a_{α₁(k)} b_{α₂(k)} is absolutely convergent, and in particular converges.
\[
\Bigl(\sum_{n=0}^{\infty} a_n\Bigr)\Bigl(\sum_{n=0}^{\infty} b_n\Bigr) = \sum_{n=0}^{\infty}\sum_{k=0}^{n} a_{n-k} b_k,
\]
where the series Σ_{n=0}^∞ Σ_{k=0}^n a_{n−k} b_k is absolutely convergent.
Proof. Consider the bijection α : N → N × N, α(n) = (α1 (n), α2 (n)), as in the picture below.
Now, if we write explicitly the terms appearing in the sum and we group them in blocks of
length 1, 2, 3, 4, . . . (geometrically, this corresponds to grouping terms that belong to the same
diagonal in the figure above), we see that
\[
\sum_{n=0}^{\infty} a_{\alpha_1(n)} b_{\alpha_2(n)}
= a_0b_0 + (a_0b_1 + a_1b_0) + (a_2b_0 + a_1b_1 + a_0b_2) + (a_0b_3 + a_1b_2 + a_2b_1 + a_3b_0) + \ldots
= \sum_{n=0}^{\infty}\sum_{k=0}^{n} a_{n-k} b_k.
\]
Finally, the absolute convergence follows from the triangle inequality and Theorem 4.32:
\[
\sum_{n=0}^{\infty}\Bigl|\sum_{k=0}^{n} a_{n-k} b_k\Bigr| \le \sum_{n=0}^{\infty}\sum_{k=0}^{n} |a_{n-k} b_k| = \sum_{n=0}^{\infty} |a_{\alpha_1(n)}|\,|b_{\alpha_2(n)}| < \infty.
\]
Example 4.34. — Let q ∈ R be such that |q| < 1. Then Σ_{n=0}^∞ qⁿ converges absolutely according to Example 4.4. If we apply the Cauchy product to this series with itself, we get
\[
\frac{1}{(1-q)^2} = \Bigl(\sum_{n=0}^{\infty} q^n\Bigr)^2 = \sum_{n=0}^{\infty}\sum_{k=0}^{n} q^{n-k} q^k = \sum_{n=0}^{\infty} (n+1) q^n.
\]
In this way, we obtain an explicit formula for Σ_{n=0}^∞ n qⁿ:
\[
\sum_{n=0}^{\infty} n q^n = \sum_{n=0}^{\infty} (n+1) q^n - \sum_{n=0}^{\infty} q^n = \frac{1}{(1-q)^2} - \frac{1}{1-q} = \frac{q}{(1-q)^2}.
\]
The series Σ_{n=0}^∞ z_n is convergent with limit Z if the two series of real numbers Σ_{n=0}^∞ x_n and Σ_{n=0}^∞ y_n are convergent, with limits A and B, respectively. We say that Σ_{n=0}^∞ z_n converges absolutely if the series Σ_{n=0}^∞ |z_n| converges.
Our next goal is to investigate power series. These are series whose elements are powers of the variable x ∈ R (or z ∈ C, if one wants to consider complex power series) multiplied by a coefficient, i.e., series of the form
\[
\sum_{n=0}^{\infty} a_n x^n,
\]
where (a_n)_{n=0}^∞ is a sequence in R, and x ∈ R. Here, x is called the variable, and the element a_n ∈ R is called the coefficient of xⁿ.
4.38. — A power series is a polynomial if only finitely many of its coefficients are nonzero. The convergence of a power series depends heavily on the coefficients a_n; this question is answered in Theorem 4.41.
Exercise 4.40. — Find, for each R ∈ [0, ∞) ∪ {∞}, a power series Σ_{n=0}^∞ a_n xⁿ with radius of convergence R.
Proof. Let x ∈ R, and write ρ = limsup_{n→∞} \sqrt[n]{|a_n|} as in Definition 4.39. It holds
\[
\limsup_{n\to\infty} \sqrt[n]{|a_n x^n|} = \limsup_{n\to\infty} \sqrt[n]{|a_n|}\,|x| = \rho\,|x|.
\]
According to the root criterion applied to the series Σ_{n=0}^∞ b_n with b_n = a_n xⁿ, the series converges absolutely for all x ∈ R with ρ|x| < 1, and does not converge if ρ|x| > 1 (in particular, if ρ = 0, then the series converges absolutely for all x ∈ R).
(f_n)_{n=0}^∞ converges uniformly to f on (−r, r). In particular, the power series defines a continuous function f : (−R, R) → R.
Proof. To prove the result, we note that Theorem 4.41 applied with x = r implies that Σ_{n=0}^∞ |a_n| rⁿ < ∞ holds. Therefore, for every ε > 0 there exists N ∈ N such that Σ_{k=N+1}^∞ |a_k| r^k < ε. Thus, for all x ∈ (−r, r) and n ≥ N,
\[
|f_n(x) - f(x)| = \Bigl|\sum_{k=0}^{n} a_k x^k - \sum_{k=0}^{\infty} a_k x^k\Bigr| = \Bigl|\sum_{k=n+1}^{\infty} a_k x^k\Bigr| \le \sum_{k=n+1}^{\infty} |a_k|\,|x|^k \le \sum_{k=N+1}^{\infty} |a_k| r^k < \varepsilon.
\]
This proves the uniform convergence on (−r, r) of the sequence of continuous functions (f_n)_{n=0}^∞ to f so, by Theorem 3.102, f is continuous on (−r, r). Since r < R is arbitrary, f : (−R, R) → R is continuous.
Example 4.43. — In general, it is not true that the partial sums f_n(x) = Σ_{k=0}^n a_k x^k tend uniformly to the function f(x) = Σ_{k=0}^∞ a_k x^k on the whole open interval (−R, R). We illustrate this with the geometric series, for which f(x) = 1/(1 − x) on (−1, 1). Indeed, if f_N were uniformly close to f on (−1, 1), say sup_{x∈(−1,1)} |f(x) − f_N(x)| ≤ 1 for some N, we would get
\[
\frac{1}{1-x} < 1 + \sum_{k=0}^{N} x^k \le 1 + \sum_{k=0}^{N} |x|^k \le 2+N \qquad \forall\, x \in (-1,1),
\]
which is impossible, since 1/(1 − x) is unbounded as x → 1⁻.
Exercise 4.45. — Let Σ_{n=0}^∞ a_n xⁿ be a power series with a_n ≠ 0 for all n ∈ N, and assume that the limit β = lim_{n→∞} |a_{n+1}|/|a_n| exists in [0, ∞) ∪ {∞}. Show that the radius of convergence of the power series equals 1/β (with the conventions 1/0 = ∞ and 1/∞ = 0).
Proof. Due to the linearity of the limit and Corollary 4.33, the absolute convergence of the power series Σ_{n=0}^∞ a_n xⁿ and Σ_{n=0}^∞ b_n xⁿ for |x| < R implies that both power series
\[
\sum_{n=0}^{\infty} (a_n+b_n) x^n
\qquad\text{and}\qquad
\Bigl(\sum_{n=0}^{\infty} a_n x^n\Bigr)\Bigl(\sum_{n=0}^{\infty} b_n x^n\Bigr) = \sum_{n=0}^{\infty}\sum_{k=0}^{n} a_{n-k} b_k\, x^n
\]
converge absolutely for |x| < R. Since a power series does not converge for |x| larger than its radius of convergence, this implies that both power series have radii of convergence at least R.
Example 4.47. — If Σ_{n=0}^∞ a_n xⁿ has radius of convergence at least 1, then
\[
\frac{1}{1-x}\sum_{n=0}^{\infty} a_n x^n = \sum_{n=0}^{\infty} (a_0+\ldots+a_n) x^n \qquad \forall\, x \in (-1,1). \tag{4.6}
\]
Indeed, the power series Σ_{n=0}^∞ xⁿ has radius of convergence 1, and for x ∈ (−1, 1) we have Σ_{n=0}^∞ xⁿ = 1/(1 − x), so (4.6) follows from Proposition 4.46.
Exercise 4.48. — Calculate Σ_{n=1}^∞ n 2^{−n}.
Addition and multiplication of power series are the same as in the real case, namely
\[
\sum_{n=0}^{\infty} a_n z^n + \sum_{n=0}^{\infty} b_n z^n = \sum_{n=0}^{\infty} (a_n+b_n) z^n, \tag{4.7}
\]
\[
\Bigl(\sum_{n=0}^{\infty} a_n z^n\Bigr)\Bigl(\sum_{n=0}^{\infty} b_n z^n\Bigr) = \sum_{n=0}^{\infty}\sum_{k=0}^{n} a_{n-k} b_k\, z^n. \tag{4.8}
\]
Also, the radius of convergence is still defined as in Definition 4.39. Several results that are true for real power series hold also in the complex case.
We note that, with the very same proof as in the real case, the analogue of Theorem 4.41
holds:
Also in the complex case, the analog of Theorem 4.42 holds. To prove that, one defines
continuous functions exactly as in Definition 3.9, and uniform convergence as in Definition 3.99
(with the only warning that | · | now denotes the absolute value on C, see Definition 2.46). In
this way, one can prove that Theorem 3.102 also holds in the complex case, namely, uniform
limit of continuous functions is continuous (in the courses of Analysis II or Complex Analysis,
this will be proved in full detail), and we get the following: the sequence of polynomials (f_n)_{n=0}^∞ converges uniformly to f on B(0, r). In particular, the power series defines a continuous function f : B(0, R) → C.
where
0! = 1, k! = 1 · 2 · . . . · k.
It follows directly from the quotient criterion (see Exercise 4.45) that this series has infinite
radius of convergence, so in particular the right-hand side of (4.9) is well-defined for all x ∈ R.
Alternatively, one can note that, given N ∈ N and n ≥ N,
\[
n! \ge \underbrace{n\cdot(n-1)\cdot\ldots\cdot N}_{n-N+1 \text{ terms}} \ge N^{\,n-N+1},
\]
hence
\[
\limsup_{n\to\infty} \sqrt[n]{|a_n|} \le \limsup_{n\to\infty} \frac{1}{N^{\frac{n-N+1}{n}}} = \frac{1}{N}.
\]
Since N ∈ N can be chosen arbitrarily large, this implies that limsup_{n→∞} \sqrt[n]{|a_n|} = 0 and, therefore, R = ∞.
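The very fast convergence of the exponential series is easy to observe numerically (an illustration only; the helper `exp_series` is ours):

```python
import math

# Truncated exponential series sum_{k <= n} x^k / k!.
def exp_series(x, n):
    return sum(x**k / math.factorial(k) for k in range(n + 1))

for x in [1.0, -2.5, 10.0]:
    print(x, exp_series(x, 60), math.exp(x))   # the two columns agree closely
```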
The representation of the exponential mapping as a power series is, in many aspects, more
flexible than the representation as a limit. In addition, as we shall see, its complex version is
related to sine and cosine in a very practical way.
We first show that our new definition of exponential coincides with the one in Defini-
tion 3.52.
Proof. (Extra material) We first observe that, for any n ≥ 0, the identity
\[
\Bigl(1+\frac{x}{n}\Bigr)^n = \sum_{k=0}^{n} \frac{n!}{k!\,(n-k)!}\,\frac{x^k}{n^k} = \sum_{k=0}^{n} \frac{x^k}{k!}\,\frac{1}{n^k}\prod_{l=0}^{k-1}(n-l) = \sum_{k=0}^{n} \frac{x^k}{k!}\prod_{l=0}^{k-1}\Bigl(1-\frac{l}{n}\Bigr)
\]
holds. Now, given x ∈ R and ε > 0, since Σ_{k=0}^∞ |x|^k/k! < ∞ we can find N ∈ N such that
\[
\sum_{k=N+1}^{\infty} \frac{|x|^k}{k!} < \varepsilon.
\]
In particular,
\[
\Bigl|\sum_{k=0}^{N} \frac{x^k}{k!} - \sum_{k=0}^{\infty} \frac{x^k}{k!}\Bigr| \le \Bigl|\sum_{k=N+1}^{\infty} \frac{x^k}{k!}\Bigr| \le \sum_{k=N+1}^{\infty} \frac{|x|^k}{k!} < \varepsilon. \tag{4.10}
\]
Moreover,
\[
\Bigl|\sum_{k=0}^{N} \frac{x^k}{k!} - \sum_{k=0}^{n} \frac{x^k}{k!}\prod_{\ell=0}^{k-1}\Bigl(1-\frac{\ell}{n}\Bigr)\Bigr|
\le \sum_{k=0}^{N} \frac{|x|^k}{k!}\Bigl(1-\prod_{\ell=0}^{k-1}\Bigl(1-\frac{\ell}{n}\Bigr)\Bigr)
+ \sum_{k=N+1}^{n} \frac{|x|^k}{k!}\underbrace{\prod_{\ell=0}^{k-1}\Bigl(1-\frac{\ell}{n}\Bigr)}_{\le 1}
\le \sum_{k=0}^{N} \frac{|x|^k}{k!}\Bigl(1-\prod_{\ell=0}^{k-1}\Bigl(1-\frac{\ell}{n}\Bigr)\Bigr) + \sum_{k=N+1}^{\infty} \frac{|x|^k}{k!},
\]
so, by the identity above,
\[
\Bigl|\sum_{k=0}^{N} \frac{x^k}{k!} - \Bigl(1+\frac{x}{n}\Bigr)^n\Bigr| \le \sum_{k=0}^{N} \frac{|x|^k}{k!}\Bigl(1-\prod_{\ell=0}^{k-1}\Bigl(1-\frac{\ell}{n}\Bigr)\Bigr) + \sum_{k=N+1}^{\infty} \frac{|x|^k}{k!}.
\]
Since
\[
\lim_{n\to\infty}\Bigl(1-\prod_{\ell=0}^{k-1}\Bigl(1-\frac{\ell}{n}\Bigr)\Bigr) = 0 \qquad \forall\, k \in \{0, \ldots, N\},
\]
letting n → ∞ we deduce that
\[
\Bigl|\sum_{k=0}^{N} \frac{x^k}{k!} - \lim_{n\to\infty}\Bigl(1+\frac{x}{n}\Bigr)^n\Bigr| \le \sum_{k=N+1}^{\infty} \frac{|x|^k}{k!} < \varepsilon.
\]
Therefore, combining this bound with (4.10),
\[
\Bigl|\sum_{k=0}^{\infty} \frac{x^k}{k!} - \lim_{n\to\infty}\Bigl(1+\frac{x}{n}\Bigr)^n\Bigr|
\le \sum_{k=N+1}^{\infty} \frac{|x|^k}{k!} + \Bigl|\sum_{k=0}^{N} \frac{x^k}{k!} - \lim_{n\to\infty}\Bigl(1+\frac{x}{n}\Bigr)^n\Bigr| < 2\varepsilon.
\]
Since ε > 0 is arbitrary, the two expressions coincide.
Figure 4.1: The graph of the exponential mapping, and the graphs of some partial sums of
the exponential series.
for all z ∈ C.
For a positive real number a ∈ R>0 and z ∈ C we write az = exp(z log(a)), and in
particular also ez = exp(z) for all z ∈ C.
Before stating the main properties of the exponential, we recall the binomial formula: given
z, w ∈ C and n ∈ N,
\[
(z+w)^n = \sum_{k=0}^{n} \binom{n}{k} z^k w^{n-k}, \qquad \binom{n}{k} = \frac{n!}{k!\,(n-k)!}. \tag{4.11}
\]
It remains to prove the formula for the absolute value. Since the conjugation C → C is a
continuous function, the following holds:
\[
\overline{e^z} = \overline{\lim_{n\to\infty}\sum_{k=0}^{n} \frac{1}{k!} z^k} = \lim_{n\to\infty}\sum_{k=0}^{n} \overline{\frac{1}{k!} z^k} = \lim_{n\to\infty}\sum_{k=0}^{n} \frac{1}{k!}\,\overline{z}^{\,k} = e^{\overline{z}},
\]
where the equality z k = z k follows from Lemma 2.43(3). Recalling that for a complex number
w it holds |w|2 = ww and w + w = 2 Re(w), we get
Exercise 4.55. — Show that e^z = lim_{n→∞} (1 + z/n)ⁿ holds for all z ∈ C.
Starting from this formula, we define the sine function and the cosine function as
\[
\sin(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, x^{2n+1}
\quad\text{and}\quad
\cos(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}\, x^{2n}, \tag{4.13}
\]
so that e^{ix} = cos(x) + i sin(x) holds.
As for the exponential, the radius of convergence of these series is infinite, and by Theorems 4.50 and 4.51 they define continuous functions. Since $(-x)^{2n+1} = -x^{2n+1}$ and $(-x)^{2n} = x^{2n}$ for every n ∈ N, it follows directly from the definition as power series that the sine function is odd, i.e., sin(−x) = −sin(x), and the cosine function is even, i.e., cos(−x) = cos(x), for all x ∈ R.
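The series (4.13) can be evaluated directly by partial sums; the following hedged sketch (not part of the notes) compares them with the library sine and cosine and checks the parity just discussed:

```python
import math

# Illustration only: evaluate the power series (4.13) by partial sums and
# compare with math.sin / math.cos; also check that sin is odd and cos even.
def sin_series(x, terms=25):
    return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(terms))

def cos_series(x, terms=25):
    return sum((-1) ** n * x ** (2 * n) / math.factorial(2 * n)
               for n in range(terms))

for x in (-2.0, 0.3, 1.0):
    assert abs(sin_series(x) - math.sin(x)) < 1e-12
    assert abs(cos_series(x) - math.cos(x)) < 1e-12
    assert abs(sin_series(-x) + sin_series(x)) < 1e-15  # odd
    assert abs(cos_series(-x) - cos_series(x)) < 1e-15  # even
```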
If we add (respectively subtract) these two identities, we obtain the formulae for cos(x) and
sin(x).
To prove the addition formulas, we multiply eix by eiy and using (4.12) we get
Recalling that $|e^{ix}|^2 = 1$, we also get the circle equation for sine and cosine:
$$\cos^2(x) + \sin^2(x) = 1 \qquad \forall\, x \in \mathbb{R}.$$
Applet 4.58 (Power Series). We consider the first partial sums of the power series defining
exp, sin and cos (respectively sinh, cosh from the next section). By zooming in and out, you
can get a feeling for the quality of the approximations of the various partial sums. In the case
of the trigonometric functions, you can also clearly see in the picture that the power series
form alternating series.
n
Proof. The sequence of real numbers ( xn! )∞ n=2 is monotonically decreasing for all x ∈ (0, 2].
Therefore, from the Leibniz criterion for alternating series (see Proposition 4.22), the following
estimates hold for every x ∈ (0, 2]:
x3 x3 x5 x2 x2 x4
x− ≤ sin(x) ≤ x − + and 1− ≤ cos(x) ≤ 1 − + .
3! 3! 5! 2 2 24
20 1 1
sin(1) ≥ 1 − >√ .
6 2
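These alternating-series bounds can be sanity-checked numerically; the sketch below (an illustration, with x restricted to (0, 2] as in the proof) tests them on a grid of sample points:

```python
import math

# Illustration only: check the Leibniz-criterion bounds for sin and cos
# on sample points of (0, 2].
def sin_bounds_hold(x):
    return x - x**3 / 6 <= math.sin(x) <= x - x**3 / 6 + x**5 / 120

def cos_bounds_hold(x):
    return 1 - x**2 / 2 <= math.cos(x) <= 1 - x**2 / 2 + x**4 / 24

samples = [2 * i / 20 for i in range(1, 21)]  # 0.1, 0.2, ..., 2.0
assert all(sin_bounds_hold(x) and cos_bounds_hold(x) for x in samples)
```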
Therefore, since the sine function is continuous, it follows from the Intermediate Value Theorem 3.29 that there exists a number p ∈ (0, 1) such that $\sin(p) = \frac{1}{\sqrt{2}}$.
Because $\sin^2(p) + \cos^2(p) = 1$ and $\cos(x) \ge 1 - \frac{1}{2}x^2 > 0$ for x ∈ [0, 1], it also follows that
$$\cos(p) = \sqrt{1 - \sin^2(p)} = \frac{1}{\sqrt{2}}.$$
In other words,
$$e^{ip} = \cos(p) + i\sin(p) = \frac{1+i}{\sqrt{2}}.$$
$$e^{i\frac{\pi}{2}} = e^{i2p} = \left(e^{ip}\right)^{2} = \frac{(1+i)^2}{2} = i, \qquad e^{i\pi} = \left(e^{i\frac{\pi}{2}}\right)^{2} = i^2 = -1, \qquad e^{i2\pi} = (-1)^2 = 1.$$
In particular, from the identity cos(π) + i sin(π) = eiπ = −1 we deduce that sin(π) = 0 and
cos(π) = −1.
It remains to show the uniqueness of π as in the theorem. From the estimate
$$\sin(x) \ge x - \frac{x^3}{3!} = x\left(1 - \frac{x^2}{6}\right) > 0 \qquad \text{for } x \in (0, 2]$$
it follows that the sine function has no zeros in (0, 2]. In particular, π ∈ (2, 4). Suppose now, by contradiction, that there exists another value s ∈ (2, 4) satisfying sin(s) = 0, and define
$$r = \begin{cases} \pi - s & \text{if } 2 < s < \pi, \\ s - \pi & \text{if } 2 < \pi < s. \end{cases}$$
Then r ∈ (0, 2) and using (4.14) we get (the sign ± below depends on whether π − s is positive or negative)
$$\sin(r) = \pm\sin(\pi - s) = \pm\big(\underbrace{\sin(\pi)}_{=0}\cos(s) - \cos(\pi)\underbrace{\sin(s)}_{=0}\big) = \pm(0 - 0) = 0.$$
This is a contradiction, since sin never vanishes on (0, 2]. This proves that π ∈ (0, 4) is uniquely determined by the equation sin(π) = 0.
Proof. Rewriting the formulas in Theorem 4.59 in terms of sine and cosine, we see that
4.61. — From Corollary 4.60 it follows that sine and cosine are both periodic functions
with period length 2π. To know the numerical value of sin(x) or cos(x) for a given real number
x, it is sufficient to know the values of the sine on the interval [0, π2 ].
Exercise 4.62. — Show that the zeros of sin : R → R are exactly the points in πZ ⊂ R,
and the zeros of cos : R → R are exactly the points in πZ + π2 . Also, cos(x) = 1 only when
x = 2nπ with n ∈ Z.
for all x, y ∈ R. Use this to show that $\sin : \left[-\frac{\pi}{2}, \frac{\pi}{2}\right] \to [-1, 1]$ is strictly monotonically increasing.
Exercise 4.64. — Show that 3.1 < π < 3.2 holds. Using an electronic tool to calculate
certain rational numbers may be helpful.
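A hedged numerical companion (not a proof, and not part of the notes): since sin > 0 on (0, 2] and sin(π) = 0, a sign change of the sine on [3, 3.2] locates π by bisection, in the spirit of Exercise 4.64:

```python
import math

# Illustration only: bisection for the sign change of sin on [3, 3.2].
# This relies on sin > 0 on (0, 2] and sin(pi) = 0 from the text.
def first_sine_zero(lo=3.0, hi=3.2, steps=60):
    assert math.sin(lo) > 0 > math.sin(hi)  # sign change on [lo, hi]
    for _ in range(steps):
        mid = (lo + hi) / 2
        if math.sin(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Sixty bisection steps shrink the interval by a factor 2⁶⁰, far below floating-point resolution.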
where r is the distance from the origin 0 ∈ C to z, i.e., the absolute value r = |z| of z, and θ
is the angle enclosed between the half-lines R≥0 and zR≥0 .
If z ̸= 0, then the angle θ is uniquely determined, and is called the argument of z and written
as θ = arg(z). The set of complex numbers with absolute value one is thus
Proof. (Extra material) Let r = |z|, and consider the complex number $w = \frac{z}{r}$. Note that $|w| = \frac{|z|}{r} = 1$. We want to prove that there exists a unique θ ∈ [0, 2π) such that $w = e^{i\theta}$.
Assume first that Im(w) ≥ 0 and recall that Re(w) ∈ [−1, 1] (since Re(w)2 + Im(w)2 = 1).
Hence, since cos(0) = 1 and cos(π) = −1, according to the Intermediate Value Theorem 3.29
there exists θ ∈ [0, π] such that Re(w) = cos(θ). Since Im(w) ≥ 0 by assumption and
sin(θ) ≥ 0 (since θ ∈ [0, π]), this implies that
$$\sin(\theta) = \sqrt{1 - \cos^2(\theta)} = \sqrt{1 - \mathrm{Re}(w)^2} = \mathrm{Im}(w),$$
thus $w = e^{i\theta}$.
If Im(w) < 0, then we apply the above argument to −w to find ϑ ∈ (0, π) such that
−w = eiϑ (note that ϑ must be different from 0 and π, since Im(e0 ) = Im(eiπ ) = 0). Recalling
that eiπ = −1, it follows that w = eiπ eiϑ = eiθ with θ = π + ϑ ∈ (π, 2π).
It remains to show the uniqueness of θ. If θ, θ′ ∈ [0, 2π) satisfy $w = e^{i\theta} = e^{i\theta'}$, then $e^{i(\theta - \theta')} = 1$, that is,
$$\sin(\theta - \theta') = 0, \qquad \cos(\theta - \theta') = 1.$$
Note that θ − θ′ ∈ (−2π, 2π). Hence, from the uniqueness of π in Theorem 4.59 and the formula sin(x + π) = −sin(x) (see Corollary 4.60) it follows that θ − θ′ ∈ {−π, 0, π}. But if θ − θ′ ∈ {−π, π} then cos(θ − θ′) = −1, so the only possibility is θ − θ′ = 0, as desired.
Applet 4.67 (Geometric Meaning of Complex Numbers). From the polar coordinate lines drawn, we can see the geometric meaning of the multiplication of complex numbers, as well as of the inverses and roots of a given number.
Exercise 4.68. — Let $w = re^{i\theta}$ be non-zero. Show that the n-th roots of w (namely, the solutions z ∈ C to the equation $z^n = w$) are given by the n numbers
$$\sqrt[n]{r}\; e^{i\left(2\pi\alpha + \frac{\theta}{n}\right)}, \qquad \alpha = 0, \tfrac{1}{n}, \tfrac{2}{n}, \dots, \tfrac{n-1}{n}.$$
In particular, the n-th roots of unity are the numbers $e^{i2\pi\alpha}$ with $\alpha = 0, \tfrac{1}{n}, \tfrac{2}{n}, \dots, \tfrac{n-1}{n}$.
Exercise 4.69. — For all natural numbers n ≥ 2, show the identity $\sum_{k=0}^{n-1} e^{i2\pi\frac{k}{n}} = 0$.
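Both facts can be checked numerically (an illustration only, using hypothetical sample values for the modulus and argument of w):

```python
import cmath

# Illustration only: the n-th roots of unity sum to zero (Exercise 4.69),
# and a fifth root of w computed via the formula of Exercise 4.68.
def roots_of_unity_sum(n):
    return sum(cmath.exp(2j * cmath.pi * k / n) for k in range(n))

# Sample values (assumptions for the demo): w = 2 e^{i 0.7}, alpha = 2/5.
w = 2 * cmath.exp(0.7j)
z = 2 ** (1 / 5) * cmath.exp(1j * (2 * cmath.pi * 2 / 5 + 0.7 / 5))
assert abs(z ** 5 - w) < 1e-12  # z is indeed a fifth root of w
```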
4.70. — The tangent function and the cotangent function are given by
$$\tan(x) = \frac{\sin(x)}{\cos(x)} \qquad\text{and}\qquad \cot(x) = \frac{\cos(x)}{\sin(x)}.$$
The addition formula
$$\tan(x + y) = \frac{\tan(x) + \tan(y)}{1 - \tan(x)\tan(y)}$$
holds where defined. Find and prove an analogous addition formula for the cotangent.
4.72. — The hyperbolic sine and the hyperbolic cosine are the functions given by the power series
$$\sinh(x) = \sum_{k=0}^{\infty} \frac{1}{(2k+1)!}\,x^{2k+1} \qquad\text{and}\qquad \cosh(x) = \sum_{k=0}^{\infty} \frac{1}{(2k)!}\,x^{2k}.$$
It holds
$$\sinh(x) = \frac{e^x - e^{-x}}{2} \qquad\text{and}\qquad \cosh(x) = \frac{e^x + e^{-x}}{2},$$
and so $e^x = \cosh(x) + \sinh(x)$ for all x ∈ R. The hyperbolic tangent and the hyperbolic cotangent are given by
$$\tanh(x) = \frac{\sinh(x)}{\cosh(x)} \qquad\text{and}\qquad \coth(x) = \frac{\cosh(x)}{\sinh(x)},$$
and the hyperbolic cotangent is defined for all x ∈ R \ {0} (since sinh(x) ≠ 0 for x ≠ 0). The functions sinh and tanh are odd, and cosh is even. The addition formulae
20
Exercise 4.73. — Starting from the definitions of hyperbolic sine and hyperbolic cosine,
prove the above formulae.
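The exponential formulas for sinh and cosh, together with the identities just stated, lend themselves to a quick numerical verification (an illustration, not part of the notes):

```python
import math

# Illustration only: the exponential formulas for sinh and cosh,
# e^x = cosh(x) + sinh(x), and cosh^2 - sinh^2 = 1.
def sinh_exp(x):
    return (math.exp(x) - math.exp(-x)) / 2

def cosh_exp(x):
    return (math.exp(x) + math.exp(-x)) / 2

for x in (-1.5, 0.0, 0.8, 2.0):
    assert abs(sinh_exp(x) - math.sinh(x)) < 1e-12
    assert abs(cosh_exp(x) - math.cosh(x)) < 1e-12
    assert abs(cosh_exp(x) + sinh_exp(x) - math.exp(x)) < 1e-12
```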
Differential Calculus
We deal with differential calculus in one variable. This is of fundamental importance for
understanding functions on R.
Chapter 5.1 The Derivative
Therefore
$$\lim_{x\to x_0} f(x) = \lim_{x\to x_0}\big(f(x_0) + f'(x_0)(x - x_0) + o(x - x_0)\big) = f(x_0),$$
hence f is continuous at x₀.
5.3. — An alternative notation for the derivative of f is $\frac{df}{dx}$. If x₀ ∈ D is an accumulation point from the right of D, then f is differentiable from the right at x₀ if the derivative from the right
$$f'_+(x_0) = \lim_{x\to x_0^+} \frac{f(x) - f(x_0)}{x - x_0}$$
exists. Differentiability from the left and the derivative from the left $f'_-(x_0)$ are defined analogously, considering the limit $x \to x_0^-$.
5.4. — An affine function is a function of the form x 7→ sx + r, for real numbers s and r.
The graph of an affine function is a non-vertical line in R2 . The parameter s in the equation
y = sx + r is called the slope of the straight line. If f : D → R is differentiable at a point
a ∈ D, the function x 7→ f (a) + f ′ (a)(x − a) is called affine approximation of f at a.
Example 5.5. — • Constant functions are everywhere differentiable and have the zero
function as their derivative.
• The identity function f(x) = x is differentiable, and its derivative is the constant function 1. Indeed
$$f'(x_0) = \lim_{x\to x_0} \frac{x - x_0}{x - x_0} = 1 \qquad \forall\, x_0 \in \mathbb{R}.$$
Example 5.6. — The exponential function exp : R → R>0 is differentiable and its derivative is again the exponential function. Indeed, for x ∈ R, since $e^{x+h} = e^x e^h$ we get
$$(e^x)' = \lim_{h\to 0} \frac{e^{x+h} - e^x}{h} = e^x \lim_{h\to 0} \frac{e^h - 1}{h} = e^x \lim_{h\to 0} \sum_{k=1}^{\infty} \frac{1}{k!}\,h^{k-1} = e^x \lim_{h\to 0} \sum_{n=0}^{\infty} \frac{1}{(n+1)!}\,h^{n}.$$
Note now that the power series $h \mapsto \sum_{n=0}^{\infty} \frac{1}{(n+1)!}h^n$ has infinite radius of convergence, as it follows for instance from Exercise 4.45. In particular the function $g(h) = \sum_{n=0}^{\infty} \frac{1}{(n+1)!}h^n$ is continuous at 0, hence the limit above equals $g(0) = 1$ and $(e^x)' = e^x$.
More generally, let α be a complex number and let f : R → C be the complex-valued function given by $f(x) = e^{\alpha x}$. The derivative of f can still be defined as the limit of the incremental ratios, namely
$$f'(x) = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h},$$
and one gets $f'(x) = \alpha e^{\alpha x}$.
Similarly, for the function $f(x) = \frac{1}{x}$ defined for x ≠ 0, one computes
$$f'(x) = \lim_{h\to 0} \frac{\frac{1}{x+h} - \frac{1}{x}}{h} = \lim_{h\to 0} \frac{x - (x+h)}{(x+h)xh} = -\lim_{h\to 0} \frac{1}{(x+h)x} = -\frac{1}{\lim_{h\to 0}(x+h)x} = -\frac{1}{x^2}.$$
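The incremental-ratio computation can be illustrated numerically (a sketch, not part of the notes): the difference quotients of f(x) = 1/x at x = 2 approach f′(2) = −1/4 as h shrinks.

```python
# Illustration only: difference quotients of f(x) = 1/x at x = 2
# approach the derivative f'(2) = -1/4 as h -> 0.
def diff_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

f = lambda t: 1 / t
errors = [abs(diff_quotient(f, 2.0, 10 ** -k) - (-0.25)) for k in range(1, 6)]
# The error shrinks with h, consistent with differentiability at x = 2.
assert all(e2 < e1 for e1, e2 in zip(errors, errors[1:]))
```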
for all n ∈ N. If f (n) exists for any n ∈ N, f is called n-times differentiable. If the n-
th derivative f (n) is also continuous, f is called n-times continuously differentiable.
We denote the set of n-times continuously differentiable functions on D by C n (D).
Differently put, C 0 (D) denotes the set of real-valued continuous functions on D, and
C 1 (D) denotes the set of all differentiable functions whose derivative is continuous. We call
such functions continuously differentiable or of class C 1 . Recursively, for n ≥ 1 we define
Proof. We compute using the properties of the limit introduced in Section 3.5.1:
$$\lim_{x\to x_0} \frac{(f+g)(x) - (f+g)(x_0)}{x - x_0} = \lim_{x\to x_0} \left(\frac{f(x) - f(x_0)}{x - x_0} + \frac{g(x) - g(x_0)}{x - x_0}\right) = \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x - x_0} + \lim_{x\to x_0} \frac{g(x) - g(x_0)}{x - x_0} = f'(x_0) + g'(x_0),$$
and
where we used that g is continuous at x₀ to say that $\lim_{x\to x_0} g(x) = g(x_0)$ (recall Remark 5.2).
Proof. For n = 1 this corresponds to Proposition 5.12. The general case follows by induction
on n ≥ 1.
Polynomial functions are differentiable on all of R. It holds $(1)' = 0$ and $(x^n)' = n x^{n-1}$ for all n ≥ 1.
Proof. The cases n = 0 and n = 1 have already been discussed in Example 5.5. We now prove by induction the case n > 1. Assume that, for some n ≥ 1, $(x^n)' = n x^{n-1}$ holds. Then it follows from (5.3) that $x^{n+1} = x \cdot x^n$ is differentiable and
and analogously cos′ (x) = − sin(x). Similarly, sinh′ (x) = cosh(x) and cosh′ (x) = sinh(x).
Proof. Note that one can write $g(y) = g(y_0) + g'(y_0)(y - y_0) + \varepsilon_g(y)(y - y_0)$ with
$$\varepsilon_g(y) = \begin{cases} \dfrac{g(y) - g(y_0)}{y - y_0} - g'(y_0) & \text{if } y \in E \setminus \{y_0\}, \\[4pt] 0 & \text{if } y = y_0. \end{cases}$$
Hence
$$g(f(x)) = g(f(x_0)) + g'(f(x_0))\big[f(x) - f(x_0)\big] + \varepsilon_g(f(x))\big[f(x) - f(x_0)\big],$$
where we used Proposition 3.71 and the continuity of $\varepsilon_g$ at y₀ = f(x₀) to deduce that $\varepsilon_g(f(x)) \to \varepsilon_g(f(x_0))$ as x → x₀.
If we now use the product rule in Proposition 5.12, it follows that $\frac{f}{g} = f \cdot \frac{1}{g}$ is differentiable at x₀, and
$$\left(\frac{f}{g}\right)'(x_0) = \left(f \cdot \frac{1}{g}\right)'(x_0) = f'(x_0)\,\frac{1}{g(x_0)} - f(x_0)\,\frac{g'(x_0)}{g(x_0)^2} = \frac{f'(x_0)\,g(x_0) - f(x_0)\,g'(x_0)}{g(x_0)^2}.$$
obtain
f ′ (x) = exp(g(x))g ′ (x),
Exercise 5.20. — Determine the derivative of the function $x \mapsto \cos\big(\sin^3(\exp(x))\big)$.
Proof. We first show that ȳ is an accumulation point of E \ {ȳ}, which allows us to speak of differentiability at ȳ. In fact, since by assumption x̄ is an accumulation point of D \ {x̄}, there exists a sequence $(x_n)_{n=0}^{\infty}$ in D \ {x̄} with xₙ → x̄ as n → ∞. Since f is continuous and bijective, the sequence $(f(x_n))_{n=0}^{\infty}$ converges to ȳ = f(x̄).
Now, to compute the derivative, let $(y_n)_{n=0}^{\infty}$ be an arbitrary sequence in E \ {ȳ} converging to ȳ. Then $x_n = f^{-1}(y_n)$ tends towards x̄ (since f⁻¹ is continuous by assumption), and the following holds:
$$\lim_{n\to\infty} \frac{f^{-1}(y_n) - f^{-1}(\bar y)}{y_n - \bar y} = \lim_{n\to\infty} \frac{x_n - \bar x}{f(x_n) - f(\bar x)} = \lim_{n\to\infty} \frac{1}{\frac{f(x_n) - f(\bar x)}{x_n - \bar x}} = \frac{1}{f'(\bar x)}.$$
Hence, if we set $g(y) = \frac{f^{-1}(y) - f^{-1}(\bar y)}{y - \bar y}$, this proves that for every sequence yₙ converging to ȳ it holds $g(y_n) \to \frac{1}{f'(\bar x)}$. Recalling Lemma 3.70, this proves that $\lim_{y\to\bar y} g(y) = \frac{1}{f'(\bar x)}$, as desired.
Figure 5.1: An intuitive representation of Theorem 5.21. Mirroring the graph of f and the tangent line at the point (x₀, y₀) around the straight line x = y in R², we get the graph of f⁻¹ and, this is the assertion, the tangent line at (y₀, x₀). A short calculation shows that the reflection of a straight line with slope m around x = y has slope $\frac{1}{m}$.
$$g'(y) = \log'(y) = \frac{1}{\exp(x)} = \frac{1}{\exp(\log(y))} = \frac{1}{y}.$$
For y < 0, since g(y) = log(−y), we apply the chain rule (Theorem 5.16) to get $g'(y) = -\log'(-y) = -\frac{1}{-y} = \frac{1}{y}$.
Example 5.23. — Given x > 0 and α ∈ R, we can compute the derivative of $x^\alpha$ as follows:
$$x^\alpha = \exp(\alpha \log(x)) \implies (x^\alpha)' = \exp'(\alpha \log(x))\,\alpha \log'(x) = \exp(\alpha \log(x))\,\frac{\alpha}{x} = \alpha\, x^{\alpha - 1}.$$
where $f_n$ is a polynomial. Then, using that the exponential function grows faster than any polynomial, prove by induction that
$$\lim_{x\to 0^+} \frac{\psi(x)}{x^k} = 0 \qquad \forall\, k \in \mathbb{N}.$$
In particular,
$$\psi^{(n+1)}(0) = \lim_{x\to 0} \frac{\psi^{(n)}(x) - \psi^{(n)}(0)}{x - 0} = \lim_{x\to 0^+} \frac{\psi(x)\, f_n\!\left(\frac{1}{x}\right) - 0}{x} = \lim_{x\to 0^+} \frac{1}{x}\,\psi(x)\, f_n\!\left(\frac{1}{x}\right) = 0.$$
$$f'_+(x_0) = \lim_{x\to x_0^+} \frac{f(x) - f(x_0)}{x - x_0} \le 0,$$
since, for x close to x₀ to the right of x₀, f(x) − f(x₀) ≤ 0 and x − x₀ > 0. Analogously,
$$f'_-(x_0) = \lim_{x\to x_0^-} \frac{f(x) - f(x_0)}{x - x_0} \ge 0,$$
since now, for x close to x₀ to the left of x₀, f(x) − f(x₀) ≤ 0 and x − x₀ < 0.
Since f is differentiable at x₀, we have $f'(x_0) = f'_+(x_0) = f'_-(x_0)$, therefore f′(x₀) = 0.
1. x0 ∈ I is an endpoint of I;
2. f is not differentiable at x0 ;
In particular, all local extrema of a differentiable function on an open interval are zeros
of the derivative.
Exercise 5.29. — Let f : R → R be the polynomial function f (x) = x3 − x. Find all local
extrema of f . Find all local extrema of the function |f | on [−3, 3].
Proof. According to Theorem 3.42, the minimum and maximum of f exist in [a, b]. That
is, there exist x0 , x1 ∈ [a, b] with f (x0 ) ≤ f (x) ≤ f (x1 ) for all x ∈ [a, b]. According to
Proposition 5.27, the derivative of f must be zero for all extrema in (a, b). So if either
x0 ∈ (a, b) or x1 ∈ (a, b) holds, then we have already found a ξ ∈ (a, b) with f ′ (ξ) = 0.
Instead, if both x0 and x1 are endpoints of the interval, because f (a) = f (b) it follows that f
is constant, hence f ′ (x) = 0 holds for all x ∈ (a, b).
$$f'(\xi) = \frac{f(b) - f(a)}{b - a}.$$
Proof. Consider the auxiliary function
$$g(x) = f(x) - \frac{f(b) - f(a)}{b - a}(x - a).$$
Then g(a) = g(b) = f(a), and since the two functions
$$x \mapsto f(x) \qquad\text{and}\qquad x \mapsto \frac{f(b) - f(a)}{b - a}(x - a)$$
are differentiable in (a, b), it follows from Proposition 5.12 that g is differentiable in (a, b). Thus, according to Rolle's theorem, there exists ξ ∈ (a, b) such that
$$0 = g'(\xi) = f'(\xi) - \frac{f(b) - f(a)}{b - a},$$
as desired.
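A Mean Value Theorem witness can be located numerically in concrete cases; the following sketch (an illustration with a sample function, not the proof) finds ξ for f(x) = x³ on [0, 2] by bisection, using that f′ − slope changes sign there:

```python
# Illustration only: for f(x) = x^3 on [0, 2], find xi with
# f'(xi) = (f(2) - f(0)) / 2 = 4 by bisection on f'(x) - 4 = 3x^2 - 4.
f = lambda x: x ** 3
fprime = lambda x: 3 * x ** 2
a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)  # average slope, here 4.0
lo, hi = a, b
for _ in range(60):
    mid = (lo + hi) / 2
    if fprime(mid) < slope:
        lo = mid
    else:
        hi = mid
xi = (lo + hi) / 2  # the MVT witness, here 2 / sqrt(3)
```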
5.32. — So, in words, Rolle's theorem states that if a differentiable function on an interval takes the same value at the endpoints, the slope must be zero somewhere between the endpoints. We illustrate this in the following picture on the left.
Instead, according to the Mean Value Theorem, for every differentiable function on an interval, there is at least one point where the slope of the function is exactly the average slope. This can be seen in the picture on the right. One can also see how the proof of the Mean Value Theorem can be traced back to Rolle's theorem: shearing the graph of the function f on the right in such a way that f(a) = f(b) holds afterwards.
Example 5.34. — Let f : [0, 2π] → C be the complex-valued function given by f (t) =
exp(it) = cos(t) + i sin(t). At the endpoints of the interval [0, 2π], f (0) = f (2π) = 1 holds.
However, the derivative of f never takes the value zero because, according to Example 5.5,
f ′ (t) = i exp(it) ̸= 0
for all t ∈ [0, 2π]. Thus, the statements of Rolle’s Theorem and the Mean Value Theorem for
complex-valued functions are false in this generality.
If in addition g ′ (x) ̸= 0 holds for all x ∈ (a, b), then g(a) ̸= g(b) and
Thus, according to Rolle’s Theorem 5.30, there exists ξ ∈ (a, b) such that
This proves (5.5). If in addition g ′ (x) ̸= 0 for all x ∈ (a, b), then it follows from Rolle’s
theorem that g(b) ̸= g(a) holds. After dividing (5.5) by g ′ (ξ)(g(b) − g(a)), we get the second
assertion of the theorem.
5.36. — Just like for the Mean Value Theorem 5.31, Cauchy’s Mean Value Theorem
has a geometrical interpretation, only this time you have to look in the two-dimensional
plane. There, Cauchy’s mean value theorem states, under the assumptions made, that the
curve t 7→ (f (t), g(t)) has a tangent that is parallel to the straight line through the points
(f (a), g(a)), (f (b), g(b)).
3. The limit $L = \lim_{x\to a^+} \frac{f'(x)}{g'(x)}$ exists.
Then $\lim_{x\to a^+} \frac{f(x)}{g(x)}$ also exists and is equal to L.
Proof. By assumption (2), we can extend f and g continuously on [a, b) by setting f(a) = g(a) = 0. Fix ε > 0. According to assumption (3), there exists δ > 0 such that
$$\frac{f'(\xi)}{g'(\xi)} \in (L - \varepsilon, L + \varepsilon) \qquad \forall\, \xi \in (a, a + \delta).$$
Now, given x ∈ (a, a + δ), we apply the Mean Value Theorem 5.35 on the interval [a, x] to find ξₓ ∈ (a, x) such that
$$\frac{f(x)}{g(x)} = \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{f'(\xi_x)}{g'(\xi_x)}.$$
Therefore
$$\frac{f(x)}{g(x)} = \frac{f'(\xi_x)}{g'(\xi_x)} \in (L - \varepsilon, L + \varepsilon) \qquad \forall\, x \in (a, a + \delta).$$
Since ε > 0 is arbitrary, this proves that $\lim_{x\to a^+} \frac{f(x)}{g(x)} = L$, as desired.
Theorem 5.37 is one of several versions of the rule of l'Hôpital. For instance, one can assume that the limits in (2) are both improper, i.e., that $\lim_{x\to a^+} g(x) = \pm\infty$ and $\lim_{x\to a^+} f(x) = \pm\infty$ hold with arbitrary signs. More precisely, the following holds:
3. The limit $L = \lim_{x\to a^+} \frac{f'(x)}{g'(x)}$ exists.
Then $\lim_{x\to a^+} \frac{f(x)}{g(x)}$ also exists and is equal to L.
Proof. (Extra material) Fix ε > 0. According to assumption (3), there exists δ > 0 such that
$$\frac{f'(\xi)}{g'(\xi)} \in (L - \varepsilon, L + \varepsilon) \qquad \forall\, \xi \in (a, a + \delta).$$
Now, given x ∈ (a, a + δ), we apply the Mean Value Theorem 5.35 on the interval [x, a + δ] to find ξₓ ∈ (x, a + δ) such that
$$\frac{f(x) - f(a + \delta)}{g(x) - g(a + \delta)} = \frac{f'(\xi_x)}{g'(\xi_x)}.$$
Therefore
$$\frac{f(x) - f(a + \delta)}{g(x) - g(a + \delta)} = \frac{f'(\xi_x)}{g'(\xi_x)} \in (L - \varepsilon, L + \varepsilon) \qquad \forall\, x \in (a, a + \delta). \tag{5.6}$$
Also, since |f(x)|, |g(x)| → ∞ as x → a⁺,
$$\lim_{x\to a^+} \frac{1 - \frac{g(a + \delta)}{g(x)}}{1 - \frac{f(a + \delta)}{f(x)}} = 1. \tag{5.7}$$
Hence, recalling (5.6) and (5.7), there exists η ∈ (0, δ) such that
$$\frac{f(x)}{g(x)} \in (L - 2\varepsilon, L + 2\varepsilon) \qquad \forall\, x \in (a, a + \eta).$$
Since ε > 0 is arbitrary, this proves that $\lim_{x\to a^+} \frac{f(x)}{g(x)} = L$, as desired.
3. The limit $L = \lim_{x\to\infty} \frac{f'(x)}{g'(x)}$ exists.
Then $\lim_{x\to\infty} \frac{f(x)}{g(x)}$ also exists and is equal to L.
Proof. (Extra material) If $\lim_{x\to\infty} f(x) = \lim_{x\to\infty} g(x) = 0$, apply Theorem 5.37 in the interval $\left(0, \frac{1}{R}\right)$ to the functions $x \mapsto f\!\left(\frac{1}{x}\right)$ and $x \mapsto g\!\left(\frac{1}{x}\right)$.
If $\lim_{x\to\infty} |f(x)| = \lim_{x\to\infty} |g(x)| = \infty$, apply instead Theorem 5.38 in the interval $\left(0, \frac{1}{R}\right)$ to the functions $x \mapsto f\!\left(\frac{1}{x}\right)$ and $x \mapsto g\!\left(\frac{1}{x}\right)$.
(a) $\displaystyle\lim_{x\to 0^+} \frac{\sin(x) - x}{x^2 \sin(x)}$; (b) $\displaystyle\lim_{x\to 0} \frac{e^x - x - 1}{\cos x - 1}$; (c) $\displaystyle\lim_{x\to 2} \frac{x^4 - 4^x}{\sin(\pi x)}$; (d) $\displaystyle\lim_{x\to -\infty} x^3 e^x$.
Exercise 5.42. — Let a < b be real numbers and f : [a, b] → R a continuous function.
Suppose x0 ∈ [a, b] is a point such that f is differentiable on [a, b] \ {x0 } and suppose that the
limit limx→x0 f ′ (x) exists. Show that f is differentiable at x0 and that f ′ is continuous at x0 .
$$f''(x) = \lim_{h\to 0} \frac{f(x + h) - 2f(x) + f(x - h)}{h^2}$$
for all x ∈ I. Using the sign function x ↦ sgn(x), verify that the existence of the above limit does not imply twice differentiability.
Hint: Apply l'Hôpital's Rule twice.
$$f' \ge 0 \iff f \text{ is increasing.}$$
Proof. Suppose f is increasing. Then we can note that f(x + h) − f(x) ≥ 0 for h > 0, and f(x + h) − f(x) ≤ 0 for h < 0. Therefore, in either case $\frac{f(x+h) - f(x)}{h} \ge 0$, and we get
$$f'(x) = \lim_{h\to 0} \frac{f(x + h) - f(x)}{h} \ge 0.$$
To prove the converse implication, assume that f is not increasing. Then there exist two points x₁ < x₂ in I with f(x₂) < f(x₁), and according to the Mean Value Theorem 5.31 there exists ξ ∈ (x₁, x₂) with
$$f'(\xi) = \frac{f(x_2) - f(x_1)}{x_2 - x_1} < 0.$$
So f′ ̸≥ 0 on I.
Remark 5.45. — If f′ > 0, the above argument can be used to show that f is strictly increasing. However, the converse is false: the function f : R → R, x ↦ x³ is strictly increasing, but f′(0) = 0.
Proof. On the one hand, the derivative of a constant function is the zero function.
Conversely, if f is differentiable and f ′ = 0, then f ′ ≥ 0 and −f ′ ≥ 0. Hence, Proposition
5.44 implies that both f and −f are increasing, so f is constant.
$$f\big((1 - t)a + tb\big) \le (1 - t)f(a) + tf(b) \tag{5.8}$$
holds. We say that f is strictly convex if the inequality in (5.8) is strict. A function g : I → R is called (strictly) concave if −g is (strictly) convex.
5.49. — The inequality (5.8) can be understood geometrically as follows: If a < b are
points in the domain of definition of f , then the graph of f in the interval [a, b] lies below
the secant through the points (a, f (a)) and (b, f (b)). Convexity can also be characterized by
means of slopes of secants. Namely, f : I → R is convex if for all x ∈ (a, b) ⊂ I the inequality
holds, and strictly convex if the inequality above is strict. Geometrically, this means that the
slope of the lines through the points (a, f (a)) and (x, f (x)) is smaller than the slope of the
lines through the points (x, f (x)) and (b, f (b)).
23
Exercise 5.50. — Show that the inequality (5.8) for all t ∈ (0, 1) is equivalent to the
inequality (5.9) for all x ∈ (a, b).
Proof. Suppose first that f ′ is increasing. Fix a, b ∈ I with a < b, and consider x ∈ (a, b).
According to the Mean Value Theorem 5.31, there exist ξ ∈ (a, x) and ζ ∈ (x, b) such that
$$\frac{f(b - h) - f(a + h)}{(b - h) - (a + h)} \le \frac{f(b) - f(b - h)}{h}.$$
Combining the two inequalities above, we deduce that for all h > 0 sufficiently small,
$$\frac{f(a + h) - f(a)}{h} \le \frac{f(b) - f(b - h)}{h}. \tag{5.10}$$
Taking the limit as h → 0⁺, we obtain f′(a) ≤ f′(b). Since a < b are arbitrary, this proves that f′ is increasing.
Exercise 5.52. — With the same assumptions as in Proposition 5.51, prove that f is
strictly convex if and only if f ′ is strictly increasing.
Exercise 5.54. — Under the same assumptions as in Corollary 5.53, prove that if f ′′ (x) > 0
for all x ∈ I, then f is strictly convex. Is the converse true?
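The defining inequality (5.8) can be tested numerically for a concrete convex function; the sketch below (an illustration with f = exp, not part of the notes) also shows the inequality failing for a concave function:

```python
import math

# Illustration only: the convexity inequality (5.8),
# f((1-t)a + tb) <= (1-t) f(a) + t f(b), for sample functions.
def convex_inequality_holds(f, a, b, t):
    return f((1 - t) * a + t * b) <= (1 - t) * f(a) + t * f(b)

a, b = -1.0, 2.0
# exp is convex, so (5.8) holds for every t in [0, 1]:
assert all(convex_inequality_holds(math.exp, a, b, i / 10) for i in range(11))
```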
$$f'(x) = \log(x) + x \cdot \frac{1}{x} = \log(x) + 1, \qquad f''(x) = \frac{1}{x} > 0$$
for all x > 0. Furthermore, we already know from Example 3.79 that $\lim_{x\to 0^+} x \log(x) = 0$. Lastly, we note that $\lim_{x\to 0^+} f'(x) = -\infty$, all of which can be seen in the graph of f.
Then, prove that the function x 7→ xα is concave for x ∈ (0, ∞) and conclude the validity of
(5.11).
By Theorem 4.59 and Exercise 4.62, the zeros of cos : R → R are the set { π2 + kπ | k ∈ Z}, and
cos(0) = 1. From the Intermediate Value Theorem 3.29 it follows that sin′ (x) = cos(x) > 0
for all x ∈ (− π2 , π2 ). Thus, by Remark 5.45, the function
is strictly increasing and bijective (recall that $\sin(-\frac{\pi}{2}) = -1$ and $\sin(\frac{\pi}{2}) = 1$). Consequently, the sine function restricted to $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$ has an inverse, which we express as
$$\arcsin : [-1, 1] \to \left[-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right].$$
Remark 5.59. — Since sin″ = −sin, it follows that sin is convex in the interval $\left[-\frac{\pi}{2}, 0\right]$ and concave in $\left[0, \frac{\pi}{2}\right]$.
By Theorem 5.21, for $s = \sin(x)$ with $x \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$,
$$\arcsin'(s) = \frac{1}{\cos(x)} = \frac{1}{\sqrt{1 - \sin^2(x)}} = \frac{1}{\sqrt{1 - s^2}}.$$
5.61. — The discussion above can be done analogously for the cosine. The cosine function is
strictly monotonically decreasing in the interval [0, π] and satisfies cos(0) = 1 and cos(π) = −1.
In particular, the restricted cosine function
is bijective.
Just as for the arcsine, we can apply the differentiation rules for the inverse and get, for s = cos(x) with x ∈ (0, π),
$$\arccos'(s) = \frac{1}{-\sin(x)} = -\frac{1}{\sqrt{1 - \cos^2(x)}} = -\frac{1}{\sqrt{1 - s^2}}.$$
Remark 5.62. — Since cos″ = −cos, it follows that cos is concave in the interval $\left[0, \frac{\pi}{2}\right]$ and convex in $\left[\frac{\pi}{2}, \pi\right]$.
$$\lim_{x\to \frac{\pi}{2}^-} \tan(x) = \lim_{x\to \frac{\pi}{2}^-} \frac{\sin(x)}{\cos(x)} = +\infty \qquad\text{and}\qquad \lim_{x\to -\frac{\pi}{2}^+} \tan(x) = \lim_{x\to -\frac{\pi}{2}^+} \frac{\sin(x)}{\cos(x)} = -\infty.$$
Thus, it follows from the Intermediate Value Theorem 3.29 that the tangent function tan :
(− π2 , π2 ) → R is bijective.
The inverse
$$\arctan : \mathbb{R} \to \left(-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right)$$
is called arctangent. By Theorem 5.21 the arctangent is differentiable, and for $x \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$ and s = tan(x) one gets $\arctan'(s) = \cos^2(x)$. Since
$$s^2 = \frac{\sin^2(x)}{\cos^2(x)} = \frac{\sin^2(x) + \cos^2(x) - \cos^2(x)}{\cos^2(x)} = \frac{1}{\cos^2(x)} - 1,$$
it follows that $1 + s^2 = \frac{1}{\cos^2(x)}$, and therefore
$$\arctan'(s) = \frac{1}{1 + s^2} \qquad \forall\, s \in \mathbb{R}.$$
5.64. — The cotangent and its inverse function behave similarly. The restriction $\cot|_{(0,\pi)} : (0, \pi) \to \mathbb{R}$ is strictly monotonically decreasing and bijective. The inverse
$$\operatorname{arccot} : \mathbb{R} \to (0, \pi)$$
is differentiable, and
$$\operatorname{arccot}'(s) = -\frac{1}{1 + s^2} \qquad \forall\, s \in \mathbb{R}.$$
5.65. — It holds sinh′(x) = cosh(x) > 0 for all x ∈ R. Thus, according to Proposition 5.44, the hyperbolic sine is strictly monotonically increasing. Since $\lim_{x\to\infty} \sinh(x) = \infty$ and $\lim_{x\to-\infty} \sinh(x) = -\infty$, by the Intermediate Value Theorem 3.29 we get that
$$\sinh : \mathbb{R} \to \mathbb{R}$$
is bijective. The inverse
$$\operatorname{arsinh} : \mathbb{R} \to \mathbb{R}$$
is called the inverse hyperbolic sine. According to the theorem on differentiability of the inverse function, arsinh is differentiable and it holds, for x ∈ R and s = sinh(x),
$$\operatorname{arsinh}'(s) = \frac{1}{\cosh(x)} = \frac{1}{\sqrt{1 + \sinh^2(x)}} = \frac{1}{\sqrt{1 + s^2}}.$$
The inverse hyperbolic sine has a closed form, unlike the inverse functions arcsin, arccos, and arctan. In fact, starting from the relation sinh(x) = s we have
$$\frac{e^x - e^{-x}}{2} = s \implies e^{2x} - 2s\,e^x - 1 = 0,$$
and solving this quadratic equation in $e^x$ (whose positive root is $e^x = s + \sqrt{s^2 + 1}$) gives $\operatorname{arsinh}(s) = \log\big(s + \sqrt{s^2 + 1}\big)$.
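The closed form obtained from the quadratic equation in $e^x$ can be checked against the library implementation (an illustration only):

```python
import math

# Illustration only: the closed form arsinh(s) = log(s + sqrt(s^2 + 1)),
# obtained from the positive root of e^{2x} - 2 s e^x - 1 = 0.
def arsinh_closed_form(s):
    return math.log(s + math.sqrt(s * s + 1))

for s in (-3.0, -0.5, 0.0, 1.0, 10.0):
    assert abs(arsinh_closed_form(s) - math.asinh(s)) < 1e-12
    assert abs(math.sinh(arsinh_closed_form(s)) - s) < 1e-9
```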
5.66. — The hyperbolic cosine satisfies cosh′(x) = sinh(x) and cosh″(x) = cosh(x) > 0 for all x ∈ R. In particular, the hyperbolic cosine is strictly convex by Corollary 5.53 and has a global minimum at 0 by Corollary 5.28 (since 0 = cosh′(x) = sinh(x) implies x = 0). For x > 0 we have cosh′(x) > 0, so cosh is strictly monotonically increasing on R≥0. Since cosh(0) = 1 and $\lim_{x\to\infty} \cosh(x) = +\infty$, it follows that cosh : R≥0 → R≥1 is bijective. Its inverse arcosh : R≥1 → R≥0 is differentiable on (1, ∞), and for s = cosh(x) with x > 0,
$$\operatorname{arcosh}'(s) = \frac{1}{\sinh(x)} = \frac{1}{\sqrt{s^2 - 1}}$$
for all s > 1. We leave the proof of the above properties to those interested.
of the strictly monotonically increasing bijection tanh : R → (−1, 1). According to Theorem 5.21, artanh is differentiable and the following holds:
$$\operatorname{artanh}'(s) = \frac{1}{1 - s^2}.$$
Exercise 5.68. — Check all assertions made in Paragraphs 5.65, 5.66, and 5.67.
In this chapter we will take up the idea of Section 1.1 and extend it to the notion of the Riemann integral with the help of the supremum and the infimum, i.e., implicitly, of the completeness axiom.
Interlude: Partitions
Two sets A, B are called disjoint if A ∩ B = ∅. For a collection A of sets, we say
that the sets in A are pairwise disjoint if for all A1 , A2 ∈ A with A1 ̸= A2 it holds
A1 ∩ A2 = ∅.
Let X be a set. A partition of X is a family $\mathcal{P}$ of non-empty pairwise disjoint subsets of X such that
$$X = \bigcup_{P \in \mathcal{P}} P.$$
In other words, the sets $P \in \mathcal{P}$ are non-empty, and each element of X is an element of exactly one $P \in \mathcal{P}$.
For the following discussion, we fix two real numbers a < b, and work with the compact
interval [a, b] ⊂ R.
Chapter 6.1 Step Functions and their Integral
with n ∈ N. The points x0 , . . . , xn ∈ [a, b] are called the division points of the decom-
position.
which we will use implicitly from now on. A decomposition a = y₀ < y₁ < ⋯ < y_m = b is called a refinement of a decomposition a = x₀ < x₁ < ⋯ < xₙ = b if
$$\{x_0, x_1, \dots, x_n\} \subseteq \{y_0, y_1, \dots, y_m\}.$$
The notion of refinement leads to an order relation on the set of all decompositions of [a, b]. Note that any two decompositions of [a, b] always have a common refinement (take the union of the division points).
Figure 6.2: The graph of a step function on the interval [a, b].
Proof. There exist decompositions of [a, b] with respect to which f and g are step functions.
For these decompositions, there exists a common refinement a = x0 < x1 < · · · < xn = b with
respect to which f and g are step functions. Thus, the functions f and g are both constant
on the open intervals (xk−1 , xk ), and consequently so is αf + βg, which means that αf + βg
is a step function with respect to a = x0 < x1 < · · · < xn = b.
Remark 6.6. — Just as in the proof of Proposition 6.4, one can show that the product of
two step functions is again a step function. Also, we note that step functions are bounded,
since they take finitely many values.
For the moment, in (6.1), the individual symbols ∫ and dx have no meaning. Originally, the symbol ∫ stands for an S for "sum", and the symbol dx indicates an "infinitesimal length", i.e. $x_k - x_{k-1}$ for an "infinitesimally fine" decomposition. The notation was introduced by Leibniz (1646-1716).
Figure 6.3: For a non-negative step function f ≥ 0 we interpret (6.1) as the area of the set $\{(x, y) \in \mathbb{R}^2 \mid a \le x \le b,\ 0 \le y \le f(x)\}$, and in general as the signed net area.
6.8. — The equation (6.1) defining the integral is not without problems. A priori, in fact,
the right-hand side depends on the choice of a decomposition of the interval [a, b]. We must
convince ourselves that this is only an apparent dependence. In other words, if a = y0 < · · · <
ym = b is another decomposition of [a, b] with respect to which f is a step function, then
$$\sum_{k=1}^{n} c_k (x_k - x_{k-1}) = \sum_{k=1}^{m} d_k (y_k - y_{k-1}) \tag{6.2}$$
Proof. We have already shown in Proposition 6.4 that αf + βg is a step function. Let a = x₀ < ⋯ < xₙ = b be a decomposition such that the functions f and g (and consequently αf + βg) are constant on the intervals (x_{k-1}, x_k). If cₖ is the value of f and dₖ the value of g on (x_{k-1}, x_k), then αcₖ + βdₖ is the value of αf + βg on (x_{k-1}, x_k). Therefore
$$\int_a^b (\alpha f + \beta g)(x)\,dx = \sum_{k=1}^{n} (\alpha c_k + \beta d_k)(x_k - x_{k-1}) = \alpha \sum_{k=1}^{n} c_k (x_k - x_{k-1}) + \beta \sum_{k=1}^{n} d_k (x_k - x_{k-1}) = \alpha \int_a^b f(x)\,dx + \beta \int_a^b g(x)\,dx,$$
as desired.
Proof. As in the proofs of Proposition 6.4 and Proposition 6.9, we can find a decomposition
a = x0 < · · · < xn = b such that f and g are constant on the intervals (xk−1 , xk ). We again
write ck for the value of f and dk for the value of g on (xk−1 , xk ). Now, because f ≤ g holds,
i.e., f (x) ≤ g(x) for all x ∈ [a, b], we get ck ≤ dk for all k ∈ {1, . . . , n}. Therefore
$$\int_a^b f(x)\,dx = \sum_{k=1}^{n} c_k (x_k - x_{k-1}) \le \sum_{k=1}^{n} d_k (x_k - x_{k-1}) = \int_a^b g(x)\,dx.$$
Exercise 6.11. — Let [a, b], [b, c] be two bounded and closed intervals and let f₁ : [a, b] → R and f₂ : [b, c] → R be step functions. Show that the function
$$f : [a, c] \to \mathbb{R}, \qquad x \mapsto \begin{cases} f_1(x) & \text{if } x \in [a, b), \\ f_2(x) & \text{if } x \in [b, c] \end{cases}$$
is a step function on [a, c]. Then prove that the integral of f is given by
$$\int_a^c f(x)\,dx = \int_a^b f_1(x)\,dx + \int_b^c f_2(x)\,dx.$$
Finally, show that every step function on [a, c] is of the form described above.
Finally, show that every step function on [a, c] is of the form described above.
Note that, if f is bounded, then these sets are non-empty. Indeed, if |f| ≤ M, then ℓ = −M ∈ L(f) and u = M ∈ U(f).
For ℓ, u ∈ SF with ℓ ≤ f ≤ u, Proposition 6.10 implies that
$$\int_a^b \ell\,dx \le \int_a^b u\,dx,$$
therefore s ≤ t for all s ∈ L(f) and t ∈ U(f). In particular, we have the inequality
if f is bounded.
6.14. — We call a the lower (integration) limit and b the upper (integration) limit, and the function f the integrand of the integral $\int_a^b f\,dx$. If f ≥ 0 is Riemann integrable, then we interpret the number $\int_a^b f\,dx$ as the area of the set
Remark 6.15. — Since, for the time being, we only know Riemann integrability and the Riemann integral, we will simply take the liberty of speaking of integrability and of the integral. Note however that, besides the Riemann integration theory, there is another important such theory, called the Lebesgue integral.
In such a case,
$$\int_a^b f\,dx - \int_a^b \ell\,dx < \varepsilon, \qquad \int_a^b u\,dx - \int_a^b f\,dx < \varepsilon.$$
Proof. Let A and B be nonempty subsets of R with the property that a ≤ b holds for all a ∈ A and all b ∈ B. Then sup A ≤ inf B, and equality sup A = inf B holds exactly if for every ε > 0 there is an a ∈ A and a b ∈ B with b − a < ε. This reasoning holds in particular for the sets L(f) and U(f). The implications
6.17. — It is good to know that the Riemann integral is a generalisation of the integral of step functions, and in this sense we can simply speak of the Riemann integral of a step function.
Exercise 6.18. — Let f : [a, b] → R be a step function. Show that f is Riemann integrable
and that the Riemann integral of f is equal to the integral of f as a step function.
Exercise 6.19. — Repeat the proof of Proposition 1.1 and show, in the language of this section, that f : [0, 1] → R, x ↦ x² is Riemann integrable with $\int_0^1 x^2\,dx = \frac{1}{3}$. Also, $L(f) = \left(-\infty, \frac{1}{3}\right)$ and $U(f) = \left(\frac{1}{3}, \infty\right)$.
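As a numerical companion to this exercise (an illustration only), the integrals of the natural lower and upper step functions on a uniform decomposition squeeze the value 1/3:

```python
# Illustration only: integrals of lower and upper step functions for
# f(x) = x^2 on [0, 1], constant on [k/n, (k+1)/n) with the inf/sup of x^2.
def lower_upper_sums(n):
    lower = sum((k / n) ** 2 for k in range(n)) / n
    upper = sum(((k + 1) / n) ** 2 for k in range(n)) / n
    return lower, upper

lo, up = lower_upper_sums(10 ** 4)
assert lo <= 1 / 3 <= up   # the integral 1/3 is squeezed in between
assert up - lo < 1e-3      # the gap is 1/n, arbitrarily small
```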
Example 6.20. — Not all functions are Riemann integrable, as the following example
shows. Consider f : [0, 1] → R defined as
$$f(x) = \begin{cases} 1 & x \in \mathbb{Q}, \\ 0 & x \notin \mathbb{Q}. \end{cases}$$
Since Q is dense in [0, 1], every step function u ≥ f satisfies u ≥ 1 on the interior of each interval of a corresponding decomposition, hence $\int_0^1 u\,dx \ge 1$ using telescopic sums. Thus, the upper integral of f is given by 1, since the step function with constant value 1 has integral 1 and u was arbitrary.
Similarly, one shows that the lower integral of f is given by 0. Thus, f is not Riemann integrable.
Proof. Given ε > 0, thanks to Proposition 6.16 we can find step functions ℓ₁, ℓ₂, u₁, u₂ such that
$$\ell_1 \le f \le u_1, \qquad \ell_2 \le g \le u_2, \qquad \int_a^b (u_1 - \ell_1)\,dx < \varepsilon, \qquad \int_a^b (u_2 - \ell_2)\,dx < \varepsilon,$$
$$\int_a^b f\,dx - \int_a^b \ell_1\,dx < \varepsilon, \qquad \int_a^b g\,dx - \int_a^b \ell_2\,dx < \varepsilon.$$
Assume first that α, β ≥ 0. Then $\alpha\ell_1 + \beta\ell_2 \le \alpha f + \beta g \le \alpha u_1 + \beta u_2$ and
$$\int_a^b \big((\alpha u_1 + \beta u_2) - (\alpha \ell_1 + \beta \ell_2)\big)\,dx = \alpha \int_a^b (u_1 - \ell_1)\,dx + \beta \int_a^b (u_2 - \ell_2)\,dx < (\alpha + \beta)\varepsilon.$$
Since ε is arbitrary, this shows that αf + βg is integrable. Also, by the triangle inequality and Proposition 6.9 applied to ℓ1 and ℓ2, we get
| ∫_a^b (αf + βg) dx − α ∫_a^b f dx − β ∫_a^b g dx |
  ≤ | ∫_a^b (αf + βg) dx − ∫_a^b (αℓ1 + βℓ2) dx |
  + | ∫_a^b (αℓ1 + βℓ2) dx − α ∫_a^b ℓ1 dx − β ∫_a^b ℓ2 dx |   (this term equals 0)
  + α | ∫_a^b ℓ1 dx − ∫_a^b f dx | + β | ∫_a^b ℓ2 dx − ∫_a^b g dx |
  ≤ (α + β)ε + αε + βε = 2(α + β)ε,
which implies, again from the arbitrariness of ε, that ∫_a^b (αf + βg) dx = α ∫_a^b f dx + β ∫_a^b g dx.
The case when α or β is negative is analogous, but one needs to reverse some inequalities.
For instance, if α ≥ 0 but β < 0 then
αℓ1 + βu2 ≤ αf + βg ≤ αu1 + βℓ2
and
∫_a^b ((αu1 + βℓ2) − (αℓ1 + βu2)) dx = α ∫_a^b (u1 − ℓ1) dx + β ∫_a^b (ℓ2 − u2) dx < (α + |β|)ε.
Since ε is arbitrary, this shows that αf + βg is integrable, and analogously one proves that ∫_a^b (αf + βg) dx = α ∫_a^b f dx + β ∫_a^b g dx.
Proof. For any step function u : [a, b] → R, if u ≤ f then u ≤ g. This implies that L(f ) ⊆
L(g), and therefore
∫_a^b f dx = sup L(f) ≤ sup L(g) = ∫_a^b g dx,
as desired.
f⁺(x) = max{0, f(x)},   f⁻(x) = −min{0, f(x)},   |f|(x) = |f(x)|   ∀ x ∈ [a, b].
The function f⁺ is the positive part, f⁻ is the negative part, and |f| is the absolute value of the function f. One can check that
f = f⁺ − f⁻,   |f| = f⁺ + f⁻,   f⁺ = (|f| + f)/2,   f⁻ = (|f| − f)/2.
In addition,
f ≤ g ⟹ f⁺ ≤ g⁺   and   f ≤ g ⟹ f⁻ ≥ g⁻.
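These pointwise identities are elementary to verify by distinguishing the signs; a quick numerical sketch (the sample values are chosen arbitrarily):

```python
# Check f = f+ - f-, |f| = f+ + f-, f+ = (|f|+f)/2, f- = (|f|-f)/2
# at a few sample values y = f(x).
def pos_neg_parts(y):
    return max(0.0, y), -min(0.0, y)  # (f+(x), f-(x))

for y in [-2.5, -1.0, 0.0, 0.7, 3.0]:
    p, m = pos_neg_parts(y)
    assert y == p - m and abs(y) == p + m
    assert p == (abs(y) + y) / 2 and m == (abs(y) - y) / 2
print("identities hold at all sample points")
```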
Proof. We start by showing that f⁺ is Riemann integrable. Let ε > 0. Since f is integrable, there exist step functions ℓ and u with the property
ℓ ≤ f ≤ u   and   ∫_a^b (u − ℓ) dx < ε.
The functions ℓ⁺ and u⁺ are also step functions, and ℓ⁺ ≤ f⁺ ≤ u⁺ holds. Since u − ℓ is non-negative, u − ℓ = (u − ℓ)⁺ holds. Also, considering all possible cases (i.e., u(x) ≥ ℓ(x) ≥ 0, u(x) ≥ 0 > ℓ(x), or 0 > u(x) ≥ ℓ(x)), one checks that u⁺ − ℓ⁺ ≤ (u − ℓ)⁺. Therefore
∫_a^b (u⁺ − ℓ⁺) dx ≤ ∫_a^b (u − ℓ)⁺ dx = ∫_a^b (u − ℓ) dx < ε
where we used Theorem 6.21, as well as ∫_a^b f⁺ dx ≥ 0 and ∫_a^b f⁻ dx ≥ 0.
Exercise 6.26. — Let a < b < c be real numbers. Show that a function f : [a, c] → R is integrable exactly when f|[a,b] and f|[b,c] are integrable, and that in this case
∫_a^c f dx = ∫_a^b f|[a,b] dx + ∫_b^c f|[b,c] dx.
Exercise 6.27. — Let f : [a, b] → R be integrable, and λ > 0 be a real number. Let g : [λa, λb] → R be the function given by g(x) = f(λ⁻¹x). Show that g is integrable, and that
λ ∫_a^b f dx = ∫_{λa}^{λb} g dx.
Exercise 6.28. — Let f : [a, b] → R be an integrable function. Show that the function F : [a, b] → R given by
F(x) = ∫_a^x f(t) dt
is continuous.
Exercise 6.29. — Let C be the space of continuous functions on [a, b], and let I : C → R be the integration map
I(f) = ∫_a^b f dx.
Show that the function I is continuous, in the following sense: for all ε > 0 there exists a δ > 0 such that
|f(x) − g(x)| < δ ∀ x ∈ [a, b]   ⟹   |I(f) − I(g)| ≤ ε.
Exercise 6.30. — Let f : [0, 1] → R be an integrable function and ε > 0. Show that there exists a continuous function g : [0, 1] → R such that
∫_0^1 |f(x) − g(x)| dx < ε. (6.3)
Proof. Without loss of generality, f : [a, b] → R is increasing – if not, replace f with −f and apply Proposition 6.21. We want to apply Proposition 6.16, that is, for a given ε > 0 we want to find two step functions ℓ, u ∈ SF such that ℓ ≤ f ≤ u and ∫_a^b (u − ℓ) dx < ε.
We construct ℓ, u using a natural number n ∈ N which we will specify later, and the equidistant decomposition
a = x0 < x1 < . . . < xn = b,   xk = a + k (b − a)/n.
Since f is increasing, ℓ ≤ f ≤ u holds. Indeed, for x ∈ [a, b] either x = b, where ℓ(x) = f (x), or
there is a k ∈ {1, . . . , n} with x ∈ [xk−1 , xk ). In the latter case we get ℓ(x) = f (xk−1 ) ≤ f (x),
and thus ℓ ≤ f holds. An analogous argument yields f ≤ u.
Recalling that xn = b and x0 = a, this yields
∫_a^b (u − ℓ) dx = Σ_{k=1}^n (f(xk) − f(xk−1))(xk − xk−1) = (b − a)/n Σ_{k=1}^n (f(xk) − f(xk−1))
  = (b − a)/n (f(xn) − f(xn−1) + f(xn−1) − f(xn−2) + . . . + f(x1) − f(x0))
  = (b − a)/n (f(b) − f(a)).
Following Archimedes' principle, we can now choose n ∈ N such that ∫_a^b (u − ℓ) dx < ε. Thus, it follows from Proposition 6.16 that f is Riemann integrable.
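The telescoping estimate in the proof can be observed numerically; the test function exp is our own choice, not from the notes:

```python
import math

# For increasing f on [a, b] with n equal subintervals, the upper minus
# lower sum telescopes to exactly (b - a) * (f(b) - f(a)) / n.
def darboux_gap(f, a, b, n):
    xs = [a + k * (b - a) / n for k in range(n + 1)]
    lower = sum(f(xs[k - 1]) * (xs[k] - xs[k - 1]) for k in range(1, n + 1))
    upper = sum(f(xs[k]) * (xs[k] - xs[k - 1]) for k in range(1, n + 1))
    return upper - lower

gap = darboux_gap(math.exp, 0.0, 1.0, 1000)
print(gap, (math.e - 1) / 1000)  # agree up to rounding
```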
Exercise 6.32. — Show that the function x ∈ [0, 1] ↦ √(1 − x²) ∈ R is Riemann integrable.
Using the addition property in Exercise 6.26, the statement of Theorem 6.31 can be ex-
tended to functions that are only piecewise monotonic.
of [a, b] such that f |(xk−1 ,xk ) is monotone for all k ∈ {1, . . . , n}.
Proof. (Extra Material) This follows from Theorem 6.31, and Exercises 6.26 and 6.22.
Proof. Let f : [a, b] → R be continuous, and ε > 0. By Theorem 3.46 f is uniformly continuous, so there exists δ > 0 such that |f(x) − f(y)| < ε for all x, y ∈ [a, b] with |x − y| < δ. Since ck ≤ dk, we see that ℓ ≤ f ≤ u. Also, because dk − ck < ε, it follows that
∫_a^b (u − ℓ) dx = Σ_{k=1}^n (dk − ck)(xk − xk−1) < ε Σ_{k=1}^n (xk − xk−1) = ε(b − a).
Again by the addition property in Exercise 6.26, the statement of Theorem 6.35 can be
extended to functions that are only piecewise continuous.
of [a, b] such that f|(xk−1, xk) is continuous for all k ∈ {1, . . . , n}, and both limits
lim_{x→xk−1⁺} f(x) and lim_{x→xk⁻} f(x) exist. In other words, each function f|(xk−1, xk) can
be extended to a continuous function on [xk−1 , xk ].
Proof. (Extra Material) This follows from Theorem 6.35 applied to the continuous extension
of f |(xk−1 ,xk ) on [xk−1 , xk ], and Exercises 6.26 and 6.22.
6.38. — Most “common” functions are piecewise continuous or piecewise monotone, and in
particular integrable according to Theorem 6.31 or according to Theorem 6.35. We note that
there exist functions that are continuous on their domain of definition but are not monotone
on any open subinterval.
Applet 6.39 (Integrability of a "shaky" function). We see that a continuous but shaky function as in the graph shown is also Riemann integrable. We also note that GeoGebra sometimes has problems with the function used, and some of the displayed lower or upper sums are not displayed and calculated correctly. Regardless of this, we have proven the Riemann integrability, so we should not worry about some computational errors in GeoGebra.
hold?
One can show that the pointwise limit of integrable functions need not be integrable. More importantly, as the following example shows, even if a sequence of integrable functions (fn)n∈N converges pointwise to an integrable function f, the limit of the integrals does not necessarily coincide with the integral of the limit.
for x ∈ [0, 1] and n ∈ N. Then fn is continuous and, in particular, integrable. Also, its integral is equal to the area of the triangle in the figure, which is ½ · (1/n) · (n/2) = ¼.
Note that the sequence (fn)_{n=0}^∞ converges pointwise to the constant function f(x) = 0. Indeed, fn(0) = 0 for every n, so fn(0) → 0. Also, for every x > 0 it follows that fn(x) = 0 for every n > 1/x (since this is equivalent to x > 1/n), so again fn(x) → 0.
However, for all n ∈ N the following is true:
∫_0^1 fn(x) dx = ¼ ≠ 0 = ∫_0^1 f(x) dx.
So, the limit of the integrals is not equal to the integral of the limit function, although all
functions fn and f are continuous.
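The figure defining fn is not reproduced here; a piecewise-linear tent of height n/2 supported on [0, 1/n] matches the stated area ¼, and with this (assumed) reconstruction both claims can be checked numerically:

```python
# Hypothetical reconstruction of f_n from the (missing) figure: a tent
# rising to height n/2 at x = 1/(2n) and falling back to 0 at x = 1/n.
def f_n(n, x):
    peak = 1.0 / (2 * n)
    if 0.0 <= x <= peak:
        return n * n * x
    if peak < x <= 1.0 / n:
        return n * n * (1.0 / n - x)
    return 0.0

def midpoint_integral(n, steps=100_000):
    h = 1.0 / steps
    return sum(f_n(n, (i + 0.5) * h) for i in range(steps)) * h

print(midpoint_integral(50))  # stays near 1/4 for every n
print(f_n(50, 0.3))           # = 0: pointwise, f_n(x) -> 0 for fixed x > 0
```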
On the other hand, if the convergence fn → f is uniform, then both questions can be
answered affirmatively:
Since ε > 0 is arbitrary, the Riemann integrability of f follows from Proposition 6.16.
From the monotonicity and triangle inequality for the Riemann integral in Propositions
6.23 and 6.25, we also have
| ∫_a^b f dx − ∫_a^b fn dx | = | ∫_a^b (f − fn) dx | ≤ ∫_a^b |f − fn| dx ≤ ε(b − a)   ∀ n ≥ N.
In this chapter we will examine the connections between the Riemann integral from Chapter 6 and the derivative from Chapter 5. These connections are of fundamental importance for the further theory.
Chapter 7.1 The Fundamental Theorem of Calculus
is a primitive of f .
Moreover, any primitive F : [a, b] → R has this form for some constant C ∈ R.
Noticing that in the last integral t ∈ [x0, x] ⊂ [x0, x0 + δ) ∩ [a, b], it follows from the continuity of f that |f(t) − f(x0)| < ε, therefore
| (F(x) − F(x0))/(x − x0) − f(x0) | < 1/(x − x0) ∫_{x0}^x ε dt = ε.
If instead x < x0, then
| (F(x) − F(x0))/(x − x0) − f(x0) | = | −1/(x − x0) ∫_x^{x0} f(t) dt − f(x0) |
  = | 1/(x0 − x) ∫_x^{x0} f(t) dt − f(x0) |
  = | 1/(x0 − x) ∫_x^{x0} (f(t) − f(x0)) dt |
  ≤ 1/(x0 − x) ∫_x^{x0} |f(t) − f(x0)| dt < ε.
In both cases we conclude that
lim_{x→x0} (F(x) − F(x0))/(x − x0) = f(x0),
7.5. — Illustration 7.1 shows the essential estimate in the proof of Theorem 7.4. The value F(x) − F(x0) can be written as f(x0)(x − x0) plus the area in red, which is smaller than ε(x − x0). Thus (F(x) − F(x0))/(x − x0), up to an error less than ε, is given by f(x0).
Figure 7.1
Theorem 7.4, as stated or in the form of one of the following corollaries, is known as the Fundamental Theorem of Integral and Differential Calculus and goes back to the work of Leibniz, Newton and Barrow, which largely marks the starting point of calculus. Isaac Barrow (1630–1677) was a theologian, but also a physics and mathematics professor at Cambridge. His most famous student was Isaac Newton.
Proof. Apply Corollary 7.6 with f = F ′ and x = b.
Exercise 7.9. — Let f : [a, b] → R have at most finitely many discontinuity points. Show that the function F(x) = ∫_a^x f(t) dt is continuous in [a, b], differentiable at all continuity points of f, and at such points it satisfies F′(x) = f(x).
Exercise 7.10. — Let f : [a, b] → R be continuous. Show that there exists ξ ∈ (a, b) with
∫_a^b f(x) dx = f(ξ)(b − a).
f g′ = (f g)′ − f′ g.
Integrating this identity on [a, b] and using Corollary 7.7 yields
∫_a^b f(x) g′(x) dx = f(b)g(b) − f(a)g(a) − ∫_a^b f′(x) g(x) dx,
as desired.
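A numerical sanity check of the formula, with the arbitrary choices f(x) = x and g(x) = −cos(x) on [0, π]:

```python
import math

# Verify ∫_a^b f g' dx = f(b)g(b) - f(a)g(a) - ∫_a^b f' g dx numerically
# for f(x) = x, g(x) = -cos(x), so g'(x) = sin(x) and f'(x) = 1.
def midpoint(h, a, b, n=100_000):
    w = (b - a) / n
    return sum(h(a + (i + 0.5) * w) for i in range(n)) * w

a, b = 0.0, math.pi
lhs = midpoint(lambda x: x * math.sin(x), a, b)
rhs = (b * -math.cos(b) - a * -math.cos(a)) - midpoint(lambda x: -math.cos(x), a, b)
print(lhs, rhs)  # both ≈ pi
```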
Proof. Fix y0 ∈ J, and define G(y) = ∫_{y0}^y g(t) dt. By Theorem 7.4 we know that G′ = g, so it follows from the Chain Rule (see Theorem 5.16) that (g ∘ f) f′ = (G′ ∘ f) f′ = (G ∘ f)′. Hence
∫_a^b g(f(x)) f′(x) dx = ∫_a^b (G ∘ f)′(x) dx = G(f(b)) − G(f(a)) = ∫_{y0}^{f(b)} g(y) dy − ∫_{y0}^{f(a)} g(y) dy = ∫_{f(a)}^{f(b)} g(y) dy.
Before stating the following result, we recall that if h : [a, b] → R is continuously differentiable with h′ ≠ 0, then it follows from the Intermediate Value Theorem 3.29 that h′ > 0 or h′ < 0. So, Remark 5.45 implies that h is strictly monotone and therefore invertible. Thanks to Theorem 3.34 it follows that h⁻¹ is continuous, and then Theorem 5.21 implies that h⁻¹ is differentiable on (h(a), h(b)).
Proof. By Theorem 7.13 applied with g/f′ in place of g we have
∫_a^b g(f(x)) dx = ∫_a^b (g(f(x))/f′(x)) f′(x) dx = ∫_a^b (g(f(x))/((f′ ∘ f⁻¹)(f(x)))) f′(x) dx = ∫_{f(a)}^{f(b)} g(y)/f′(f⁻¹(y)) dy.
Recalling that 1/(f′ ∘ f⁻¹) = (f⁻¹)′ (see Theorem 5.21), the result follows.
whenever both limits exist and the sum makes sense (so, if the limits are infinite, we
do not admit an expression of the form ∞ − ∞).
If the limit is finite, we say that the improper integral converges. If the limit is ∞ or
−∞, we say that the improper integral is divergent. Otherwise, we call the improper
integral not convergent.
In particular, the above improper integral is convergent exactly when α > 1. In fact
∫_1^b x^{−α} dx = [x^{1−α}/(1−α)]_1^b = b^{1−α}/(1−α) − 1/(1−α)   if α ≠ 1,
∫_1^b x^{−1} dx = [log(x)]_1^b = log(b)   if α = 1,
and
lim_{b→∞} b^{1−α}/(1−α) = ∞ if α < 1,  = 0 if α > 1;   lim_{b→∞} log(b) = ∞.
Proof. Since the function b ∈ [a, ∞) ↦ ∫_a^b f(x) dx is monotonically increasing, it always has a limit as b → ∞, which is equal to the supremum sup{ ∫_a^b f(x) dx | b > a }. This supremum is either finite (in which case the improper integral converges) or it is infinite (in which case the improper integral diverges to ∞).
The function x ∈ R ↦ e^{−x²} is called the Gaussian. Due to Lemma 7.17, to prove that the integral above converges it suffices to find a "majorant function" which defines a convergent improper integral. Since x² ≥ x for x ∈ [1, ∞), it follows that e^{−x²} ≤ e^{−x}, and therefore
∫_1^∞ e^{−x²} dx ≤ ∫_1^∞ e^{−x} dx = lim_{b→∞} [−e^{−x}]_1^b = lim_{b→∞} (e^{−1} − e^{−b}) = e^{−1} < ∞.
This shows the convergence of the second improper integral. Therefore, due to the symmetry of the function, ∫_{−∞}^{−1} e^{−x²} dx = ∫_1^∞ e^{−x²} dx is also convergent. Finally, since e^{−x²} ≤ 1, the integral on [−1, 1] is bounded by ∫_{−1}^1 1 dx = 2.
This proves the convergence of the integral. However, we will not be able to calculate the
exact value of this integral until the second semester.
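Numerically, the full integral is already close to √π, anticipating the value computed in the second semester (the quadrature and cutoff parameters below are our choices):

```python
import math

# Midpoint-rule approximation of ∫_{-b}^{b} exp(-x^2) dx; the tail beyond
# |x| = 10 is smaller than exp(-100) and hence negligible here.
def gauss_integral(b=10.0, n=200_000):
    w = 2 * b / n
    return sum(math.exp(-(-b + (i + 0.5) * w) ** 2) for i in range(n)) * w

print(gauss_integral(), math.sqrt(math.pi))  # agree to many digits
```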
Σ_{n=1}^{N+1} f(n) ≤ ∫_0^{N+1} f(x) dx ≤ Σ_{n=0}^{N} f(n).
In particular
Σ_{n=1}^∞ f(n) ≤ ∫_0^∞ f(x) dx ≤ Σ_{n=0}^∞ f(n),
therefore, the series Σ_{n=1}^∞ f(n) converges exactly when the improper integral ∫_0^∞ f(x) dx converges.
Proof. Due to the monotonicity of f, Theorem 6.31 implies that the function f is locally integrable. We consider the functions ℓ, u : [0, ∞) → R≥0 given by
ℓ(x) = f(⌈x⌉)   and   u(x) = f(⌊x⌋),
where ⌊·⌋ denotes the floor function (i.e., ⌊x⌋ is the largest integer n ≤ x) and ⌈·⌉ the ceiling function (i.e., ⌈x⌉ is the smallest integer n ≥ x). With this choice, ℓ ≤ f ≤ u. Therefore, for all N ∈ N with N > 1, we have
Σ_{n=1}^{N+1} f(n) = ∫_0^{N+1} ℓ(x) dx ≤ ∫_0^{N+1} f(x) dx ≤ ∫_0^{N+1} u(x) dx = Σ_{n=0}^{N} f(n),
which can also be seen in the following picture. The statement of the theorem follows by letting N → ∞.
Recalling (7.3), given ε > 0 there exists N ∈ N such that 1 − ε ≤ ⁿ√(1/n) ≤ 1 + ε for all n ≥ N. This implies that
lim sup_{n→∞} ⁿ√|cn| = lim sup_{n→∞} ⁿ√(|a_{n−1}|/n) ≤ (1 + ε) lim sup_{n→∞} ⁿ√|a_{n−1}| = (1 + ε) lim sup_{n→∞} ( ⁿ⁻¹√|a_{n−1}| )^{(n−1)/n} = (1 + ε)ρ,
and analogously
lim sup_{n→∞} ⁿ√|cn| ≥ (1 − ε) lim sup_{n→∞} ( ⁿ⁻¹√|a_{n−1}| )^{(n−1)/n} = (1 − ε)ρ.
Hence
(1 − ε)ρ ≤ ρ̄ ≤ (1 + ε)ρ.
Since ε > 0 is arbitrary, this implies that ρ̄ = ρ, so the power series F(x) = Σ_{n=0}^∞ cn xⁿ has radius of convergence R.
We now want to prove that F′ = f on (−R, R). To prove that, fix an interval [a, b] ⊂ (−R, R), and consider the polynomial functions fn(t) = Σ_{k=0}^n ak t^k. We note that
∫_a^x fn(t) dt = Σ_{k=0}^n ∫_a^x ak t^k dt = Σ_{k=0}^n ak/(k+1) x^{k+1} − Σ_{k=0}^n ak/(k+1) a^{k+1}   ∀ x ∈ [a, b].
By Theorem 4.42, the sequence of functions (fn)_{n=0}^∞ converges uniformly to f in [a, x] ⊂ [a, b], so it follows from Theorem 6.42 that
∫_a^x f(t) dt = lim_{n→∞} ∫_a^x fn(t) dt = F(x) − F(a)   ∀ x ∈ [a, b].
According to Theorem 7.4, this implies that F ′ (x) = f (x) for all x ∈ [a, b]. Since [a, b] ⊂
(−R, R) is an arbitrary interval, this implies that F ′ = f on (−R, R), as desired.
where the power series on the right has also radius of convergence R.
This implies that G and f have the same radius of convergence and g = G′ = (f − a0 )′ = f ′ .
Therefore, we conclude that R̄ = R and f ′ = g.
Exercise 7.23. — Let f(x) = Σ_{n=0}^∞ an xⁿ be a power series with radius of convergence R > 0. Show that f : (−R, R) → R is smooth, and for each n ∈ N find a representation of the n-th derivative f^(n) by a power series.
radii of convergence Rf , Rg > 0. Let R = min{Rf , Rg } and suppose that f (x) = g(x) for all
x ∈ (−R, R). Show that cn = dn for all n ∈ N, and therefore Rf = Rg .
f(x) = Σ_{n=0}^∞ (α choose n) xⁿ
f′(x) = α f(x)/(1 + x)   ∀ x ∈ (−1, 1). (7.5)
(c) Define g(x) = f(x)/(1 + x)^α and use (8.20) to show that g′ = 0 on (−1, 1). Conclude the validity of (7.4) by noticing that f(0) = 1 = g(0).
Example 7.26. — We have already seen in Example 4.23 that, as a consequence of the
Leibniz criterion in Proposition 4.22, the alternating harmonic series converges. However,
with the results of Chapter 4 we could not determine the value of the series. Now, with the
help of the fundamental theorem of integral and differential calculus, we can show that
Σ_{n=1}^∞ (−1)^{n+1}/n = log(2).
To prove this, using the formula for the geometric series (which has radius of convergence 1),
we see that
(log(1 + x))′ = 1/(1 + x) = 1/(1 − (−x)) = Σ_{n=0}^∞ (−1)ⁿ xⁿ   ∀ x ∈ (−1, 1).
Note now that, given x ∈ [0, 1], the sequence ak = x^k/k is non-negative, decreasing, and converging to zero. Hence it follows from Proposition 4.22 that, for x ∈ [0, 1],
Σ_{k=1}^{2n} (−1)^{k+1} x^k/k ≤ Σ_{k=1}^∞ (−1)^{k+1} x^k/k = log(1 + x) ≤ Σ_{k=1}^{2n+1} (−1)^{k+1} x^k/k   ∀ n ∈ N.
In particular, taking x = 1,
Σ_{k=1}^{2n} (−1)^{k+1}/k ≤ log(2) ≤ Σ_{k=1}^{2n+1} (−1)^{k+1}/k   ∀ n ∈ N.
Finally, letting n → ∞ in the above inequalities proves the result.
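The bracketing can be checked directly: even partial sums lie below log 2, odd partial sums above (the truncation level is arbitrary):

```python
import math

# Partial sums of the alternating harmonic series; by the inequalities
# above, S_{2n} <= log(2) <= S_{2n+1} for every n.
def partial_sum(m):
    return sum((-1) ** (k + 1) / k for k in range(1, m + 1))

n = 1000
print(partial_sum(2 * n), math.log(2), partial_sum(2 * n + 1))
```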
Using again the formula for the geometric series, we see that
arctan′(x) = 1/(1 + x²) = Σ_{k=0}^∞ (−1)^k x^{2k}   ∀ x ∈ (−1, 1).
Arguing as before, for x ∈ [0, 1],
Σ_{k=0}^{2n+1} (−1)^k x^{2k+1}/(2k+1) ≤ arctan(x) ≤ Σ_{k=0}^{2n} (−1)^k x^{2k+1}/(2k+1)   ∀ n ∈ N.
In particular, taking x = 1,
Σ_{k=0}^{2n+1} (−1)^k/(2k+1) ≤ arctan(1) = π/4 ≤ Σ_{k=0}^{2n} (−1)^k/(2k+1)   ∀ n ∈ N,
so the result follows by letting n → ∞ (note that the series converges, thanks to Leibniz
criterion in Proposition 4.22).
Sometimes, the above methods for determining an indefinite integral of a function do not
produce a result. This may be because the primitive function we are looking for cannot be
expressed in terms of “known” functions.
Example 7.28 (Integral Sine). — The integral sine is the primitive function Si : R → R of the continuous function
x ∈ R ↦ sin(x)/x if x ≠ 0,   1 if x = 0,
with normalisation Si(0) = 0, that is Si(x) = ∫_0^x sin(t)/t dt. Thanks to Theorem 7.21, the function Si can be written as a power series:
Si(x) = ∫_0^x sin(t)/t dt = ∫_0^x Σ_{n=0}^∞ (−1)ⁿ t^{2n}/(2n+1)! dt = Σ_{n=0}^∞ (−1)ⁿ x^{2n+1}/((2n+1)!(2n+1))
for all x ∈ R.
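The series can be compared against a direct numerical integration of sin(t)/t; the truncation level and quadrature below are our choices:

```python
import math

# Si(x) via the power series from the example, truncated after `terms` terms.
def si_series(x, terms=30):
    return sum((-1) ** n * x ** (2 * n + 1)
               / (math.factorial(2 * n + 1) * (2 * n + 1))
               for n in range(terms))

# Si(x) via midpoint-rule integration of sin(t)/t; the integrand extends
# continuously by 1 at t = 0, and midpoints avoid t = 0 anyway.
def si_numeric(x, n=100_000):
    w = x / n
    return sum(math.sin((i + 0.5) * w) / ((i + 0.5) * w) for i in range(n)) * w

print(si_series(2.0), si_numeric(2.0))  # agree closely
```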
7.29. — In this whole section, I ⊆ R denotes a non-empty interval that does not consist of one point only. Also, all functions in this section are real-valued functions with domain I that are integrable on any compact interval [a, b] ⊆ I.
7.30. — Let f and g be functions with primitives F and G, respectively. Recall that, from the product rule for the derivative in Proposition 5.12, it follows that (F G)′ = f G + F g. This implies the integration by parts formula
∫ F(x) g(x) dx = F(x)G(x) − ∫ f(x)G(x) dx + C. (7.7)
In Leibniz notation, f = dF/dx and g = dG/dx. This leads to the notation f dx = dF and g dx = dG,
∫ g(f(x)) f′(x) dx = ∫ g(u) du + C (7.8)
where we used the change of variables u = f(x). The substitution rule is also called change of variable, as one has replaced the variable u in ∫ g(u) du by u = f(x). In Leibniz notation
∫ x eˣ dx = x eˣ − eˣ + C.
We note that it is sufficient to use only one integration constant C in such calculations, since
several such constants can be combined into one.
= log(x) · x − ∫ x · (1/x) dx + C = log(x) · x − ∫ 1 dx + C = x log(x) − x + C.
Suggestion: To ensure that the final result is correct, differentiate the result and check if you
get the original function. For instance, in this case, one can easily check that
Exercise 7.34. — Give a recursive formula for calculating the indefinite integrals
∫ xⁿ eˣ dx,   ∫ xⁿ sin(x) dx,   ∫ xⁿ cos(x) dx
for n ∈ N.
for all s, a, b ∈ R. Note that the case s = −1 needs to be treated separately, in analogy with
Example 7.8(6)-(7).
Example 7.37. — Given r > 0, we want to compute the indefinite integral ∫ √(r² − x²) dx. Due to the trigonometric identity √(r² − r² sin²(θ)) = r cos(θ), it is convenient to use the substitution x = r sin(θ).
∫ cos²(θ) dθ = ∫ cos(θ) sin′(θ) dθ = cos(θ) sin(θ) − ∫ cos′(θ) sin(θ) dθ + C = cos(θ) sin(θ) + ∫ sin²(θ) dθ + C,
therefore
2 ∫ cos²(θ) dθ = cos(θ) sin(θ) + θ + C   ⟹   ∫ cos²(θ) dθ = ½ (cos(θ) sin(θ) + θ) + C
(note that, since C ∈ R is arbitrary, in the last formula we still write C in place of C/2). This
proves that
∫ √(r² − x²) dx = r² ∫ cos²(θ) dθ = (r²/2)(sin(θ) cos(θ) + θ) + C,
and substituting back θ = arcsin(x/r) we conclude that
∫ √(r² − x²) dx = ½ x √(r² − x²) + (r²/2) arcsin(x/r) + C.
• Although this is not a trigonometric substitution, we still note the following: For the expression x(a² − x²)^{n/2} or the expression x(a² + x²)^{n/2}, the substitutions u = a² − x² and u = a² + x², respectively, allow us to compute the indefinite integrals.
Example 7.39. — (i) Given a > 0, using the substitution x = a tan(θ), recalling that (a² + x²)^{1/2} = a/cos(θ) and dx = a/cos²(θ) dθ, we get
∫ 1/(a² + x²)^{3/2} dx = ∫ (cos³(θ)/a³)(a/cos²(θ)) dθ = (1/a²) ∫ cos(θ) dθ = (1/a²) sin(θ) + C = (1/a²) tan(θ) cos(θ) + C = x/(a² √(a² + x²)) + C.
Certain indefinite integrals can be computed with hyperbolic substitutions. For instance, for expressions of the form (x² − a²)^{n/2} with a ∈ R, the substitution x = a cosh(u) yields dx = a sinh(u) du and (x² − a²)^{1/2} = a sinh(u).
Example 7.40. — Using the substitution x = cosh(u) (so dx = sinh(u) du), we compute
∫ √(x² − 1) dx = ∫ √(cosh²(u) − 1) sinh(u) du = ∫ sinh²(u) du.
In analogy to the argument used in Example 7.37, we compute ∫ sinh²(u) du as follows:
∫ sinh²(u) du = cosh(u) sinh(u) − ∫ cosh²(u) du = cosh(u) sinh(u) − ∫ (1 + sinh²(u)) du + C = cosh(u) sinh(u) − u − ∫ sinh²(u) du + C.
This yields
2 ∫ sinh²(u) du = cosh(u) sinh(u) − u + C   ⟹   ∫ sinh²(u) du = (cosh(u) sinh(u) − u)/2 + C,
hence
∫ √(x² − 1) dx = (cosh(u) sinh(u) − u)/2 + C = (x √(x² − 1) − arcosh(x))/2 + C.
Another method that we would like to mention briefly here is the so-called half-angle method (or Weierstrass substitution). This is useful for the integral of expressions like 1/sin(x) or (cos²(x) + cos(x) + sin(x))/(1 + sin(x)), see also Remark 7.46 below. We show this method in detail in the next example.
Example 7.41. — We want to compute ∫ 1/sin(x) dx, and we consider the change of variable u = tan(x/2). We note that, by the double-angle formulas for sine and cosine, it follows that
sin(x) = 2u/(1 + u²)   and   cos(x) = (1 − u²)/(1 + u²),
and analogously for the second formula. Furthermore, the relation u = tan(x/2) implies that x = 2 arctan(u), therefore dx = 2/(1 + u²) du (recall that arctan′(s) = 1/(1 + s²)). Hence
∫ 1/sin(x) dx = ∫ ((1 + u²)/(2u)) (2/(1 + u²)) du = ∫ (1/u) du = log|u| + C = log|tan(x/2)| + C.
The integrals (7.10) and (7.11) are calculated with the substitution u = x − a; for (7.12) substitute u = x/a, and for (7.13) and (7.14) substitute u = a² + x².
7.43. — To integrate a general rational function, we use what is called the partial fraction
decomposition of rational functions. Let p, q be polynomials without nontrivial common
divisors such that q ̸= 0 and deg p < deg q.
for some k ≤ ki and ℓ ≤ ℓj , and then one needs to integrate each of these individual terms.
Example 7.44. — We calculate the indefinite integral ∫ (x⁴ + 1)/(x²(x + 1)) dx. First, we perform division with remainder:
(x⁴ + 1)/(x³ + x²) = x − 1 + (x² + 1)/(x²(x + 1)).
To obtain the partial fraction decomposition of (x² + 1)/(x²(x + 1)), we set
(x² + 1)/(x²(x + 1)) = (ax + b)/x² + c/(x + 1).
Comparing coefficients gives
a + c = 1,   a + b = 0,   b = 1,
so a = −1, b = 1, c = 2, and hence
(x⁴ + 1)/(x²(x + 1)) = x − 1 − 1/x + 1/x² + 2/(x + 1).
Therefore
∫ (x⁴ + 1)/(x²(x + 1)) dx = ∫ x dx − ∫ 1 dx − ∫ (1/x) dx + ∫ (1/x²) dx + 2 ∫ 1/(x + 1) dx
  = x²/2 − x − log|x| − 1/x + 2 log|x + 1| + C.
polynomial x² + 2x + 2 has no real zeros. For the partial fraction decomposition we use the ansatz
1/(x(x² + 2x + 2)) = a/x + (bx + c)/(x² + 2x + 2),
thus
a + b = 0, 2a + c = 0, 2a = 1,
In some cases, the above procedure may also lead to the integral ∫ 1/(a² + x²)ⁿ dx for an a ∈ R and n ≥ 2, which (as explained previously) we can handle with the trigonometric substitution tan(u) = x/a.
Remark 7.46. — Now that we know how to integrate rational functions, we can rediscuss the half-angle method introduced before. This allows one to compute the integral of rational expressions in sine and cosine. In fact, with the substitution u = tan(x/2), using that
sin(x) = 2u/(1 + u²),   cos(x) = (1 − u²)/(1 + u²),   dx = 2/(1 + u²) du
(see Example 7.41), one ends up with the integral of a rational function in u.
Exercise 7.47. — Calculate the indefinite integral ∫ cos(x)/(2 + sin(x)) dx using the substitution u = tan(x/2).
Remark 7.48. — Sometimes one or the other substitution is carried out because there is a nested function in the function to be integrated and one simply has no other method available. For example, in the integral ∫ sin(√x) dx none of the mentioned methods is available, but one is tempted to set u = √x, and this indeed leads to an integral that one can solve. Similarly, in an integral of the form ∫ 1/(1 + eˣ) dx, one sets u = eˣ.
Example 7.49. — We compute the improper integral ∫_0^1 log(x) dx using
∫_0^1 log(x) dx = lim_{a→0} ∫_a^1 log(x) dx = lim_{a→0} [x log(x) − x]_a^1
Thus the improper integral ∫_0^1 (1/x) dx diverges, and we can assign it the value ∞.
Exercise 7.51. — Calculate ∫_0^1 1/√x dx and ∫_0^{π/2} tan(x) dx.
Exercise 7.52. — Decide for which p ∈ R≥0 the improper integral ∫_0^∞ x sin(x^p) dx converges.
To verify that this improper integral indeed converges, we examine the integration limits 0
and ∞ separately. For 0 < a < b we find, using integration by parts,
∫_a^b x^{s−1} e^{−x} dx = [(1/s) x^s e^{−x}]_a^b + (1/s) ∫_a^b x^s e^{−x} dx. (7.16)
We obtain
∫_0^b x^{s−1} e^{−x} dx = lim_{a→0} ( [(1/s) x^s e^{−x}]_a^b + (1/s) ∫_a^b x^s e^{−x} dx )
  = (1/s) b^s exp(−b) + (1/s) ∫_0^b x^s e^{−x} dx,
where the integral on the right is an actual Riemann integral since the function x^s e^{−x} is continuous on [0, b]. To investigate the upper limit of integration, we note that there exists
R > 0 such that eˣ > x^{s+2} holds for all x > R. Thus
∫_0^∞ x^s e^{−x} dx ≤ ∫_0^R x^s e^{−x} dx + ∫_R^∞ x^{−2} dx < ∞.
This shows that the gamma function satisfies the functional equation
7.54. — The Gamma function extends the factorial function from N to (0, ∞). In fact
Γ(1) = ∫_0^∞ x⁰ e^{−x} dx = e⁰ − lim_{x→∞} e^{−x} = 1
At the moment, it is not clear whether the Gamma function is continuous. Eventually, it will turn out that Γ is smooth. Also, for example, we cannot calculate the value Γ(1/2) with the integration methods we know so far, but we will see later by means of a two-dimensional integral that it is √π.
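Both claims can already be observed via the standard library's implementation of Γ:

```python
import math

# math.gamma agrees with the factorial on the positive integers
# (Gamma(n+1) = n!), and math.gamma(0.5) matches sqrt(pi).
for n in range(1, 8):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))
print(math.gamma(0.5), math.sqrt(math.pi))
```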
7.55. — David Hilbert (1862–1943), in his 1893 article [Hil1893], used improper integrals in
the style of the Gamma function to prove that e (as first proved by Hermite in 1873) and π (as
first proved by Lindemann in 1882) are transcendental. We note here that the irrationality
of these numbers is much easier to prove. Transcendence proofs are generally much more
difficult. The difficulty in making such statements is perhaps illustrated by the fact that it is
still not known whether e + π is a transcendental number or not.
approximates the function f to within an error f (x) − y(x) = o(x − x0 ) as x → x0 . The “qual-
ity” of the approximation can be increased by considering higher polynomial approximations
instead of affine approximations.
In this section, it will be convenient to use the following abuse of notation: Given a, b ∈ R,
irrespective of the order between a and b, [a, b] denotes the interval between them. In other
words, for all a, b ∈ R, [a, b] and [b, a] denote the same interval.
We also recall that, if a < b, then
| ∫_a^b f(x) dx | ≤ ∫_a^b |f(x)| dx,
see Theorem 6.25. If instead b < a, then a minus sign appears (recall (7.2)) and we get
| ∫_a^b f(x) dx | = | ∫_b^a f(x) dx | ≤ ∫_b^a |f(x)| dx = −∫_a^b |f(x)| dx = | ∫_a^b |f(x)| dx |.
The coefficients are chosen so that P (k) (x0 ) = f (k) (x0 ) for k ∈ {0, . . . , n}.
We will state and prove several versions of Taylor’s Theorem. We begin with this first
version:
Remark 7.58. — In the above theorem, the assumption that f is an n-times continuously differentiable function guarantees that the integral of the continuous function t ↦ f^(n)(t) (x − t)^{n−1}/(n−1)! exists.
Proof. The proof follows by induction on n and integration by parts. If n = 1 then f : [a, b] →
R is continuously differentiable and, by Corollary 7.6, we get
f(x) = f(x0) + ∫_{x0}^x f′(t) dt.
If f is twice continuously differentiable, we can apply integration by parts to the above integral
with u(t) = f ′ (t) and v(t) = t − x. Indeed, since v ′ = 1 and v(x) = 0, we get
f(x) = f(x0) + ∫_{x0}^x f′(t) v′(t) dt
  = f(x0) + [f′(t) v(t)]_{x0}^x − ∫_{x0}^x f″(t) v(t) dt
  = f(x0) + f′(x0)(x − x0) + ∫_{x0}^x f″(t)(x − t) dt
  = P1(x) + ∫_{x0}^x f^(2)(t) (x − t)¹/1! dt.
f(x) = Σ_{k=0}^{n−1} f^(k)(x0)/k! (x − x0)^k − [f^(n)(t) (x − t)ⁿ/n!]_{x0}^x + ∫_{x0}^x f^(n+1)(t) (x − t)ⁿ/n! dt
  = Σ_{k=0}^{n} f^(k)(x0)/k! (x − x0)^k + ∫_{x0}^x f^(n+1)(t) (x − t)ⁿ/n! dt.
We can now state our two versions of Taylor’s Approximation, using first the big-O nota-
tion, and then a refined version with the little-o notation.
Also, since f^(n) is continuous on [a, b], it is bounded (recall Theorem 8.25). Hence, there exists a constant M such that |f^(n)| ≤ M on [a, b]. This implies that
|f(x) − P_{n−1}(x)| ≤ | ∫_{x0}^x f^(n)(t) (x − t)^{n−1}/(n−1)! dt | ≤ M | ∫_{x0}^x |x − t|^{n−1}/(n−1)! dt |   ∀ x ∈ [a, b].
Observe now that the sign of (x − t)^{n−1} is constant for t in the interval [x0, x], so
| ∫_{x0}^x |x − t|^{n−1}/(n−1)! dt | = | ∫_{x0}^x (x − t)^{n−1}/(n−1)! dt |
and the last integral can be computed with a change of variable: setting s = x − t we get
| ∫_{x0}^x (x − t)^{n−1}/(n−1)! dt | = | ∫_0^{x−x0} s^{n−1}/(n−1)! ds | = |(x − x0)ⁿ/n!| = |x − x0|ⁿ/n!.
We now show that by replacing Pn−1 with Pn , we can improve the previous result using
the little-o notation.
Therefore
f(x) = P_{n−1}(x) + f^(n)(x0) (x − x0)ⁿ/n! + ∫_{x0}^x (f^(n)(t) − f^(n)(x0)) (x − t)^{n−1}/(n−1)! dt
  = P_n(x) + ∫_{x0}^x (f^(n)(t) − f^(n)(x0)) (x − t)^{n−1}/(n−1)! dt. (7.22)
Now, given ε > 0, it follows from the continuity of f (n) at x0 that there exists δ > 0 such that
|f (n) (x) − f (n) (x0 )| < ε for all x ∈ (x0 − δ, x0 + δ) ∩ [a, b]. Hence, if x ∈ (x0 − δ, x0 + δ) ∩ [a, b],
we can bound the integrand in the last integral by
This implies
|f(x) − P_n(x)| ≤ | ∫_{x0}^x |f^(n)(t) − f^(n)(x0)| (x − t)^{n−1}/(n−1)! dt |
  < ε | ∫_{x0}^x |x − t|^{n−1}/(n−1)! dt | = ε |x − x0|ⁿ/n! ≤ ε |x − x0|ⁿ,
where the last integral has been computed as in the proof of Corollary 7.59. This proves that
|f(x) − P_n(x)| / |x − x0|ⁿ < ε   ∀ x ∈ (x0 − δ, x0 + δ) ∩ [a, b],
f(x) = f(x0) + f′(x0)(x − x0) + ½ f″(x0)(x − x0)² + o(|x − x0|²) as x → x0.
Hence, while in the case where f has a finite number of derivatives Corollary 7.60
provides a stronger result, in the case when f is smooth, the bound on f − Pn provided
by Corollary 7.59 is more convenient.
While for proving Corollary 7.60 the continuity of f (n) plays a crucial role, in the proof
of Corollary 7.59 we mainly used that f (n) is bounded (the continuity of f (n) is needed only
to guarantee that f (n) is integrable). In fact, it is possible to prove Corollary 7.59 under
the weaker assumption that the n-th derivative exists and is bounded (but is not necessarily
continuous). For this, we first prove the following alternative version of Taylor Theorem. Note
that in the case n = 1, this result corresponds to the Mean Value Theorem 5.31.
f(x) − P_{n−1}(x) = (1/n!) f^(n)(ξL)(x − x0)ⁿ. (7.23)
Proof. (Extra Material) Fix x ∈ (a, b) and consider the function F : (a, b) → R defined as
F(t) = f(t) + f^(1)(t)(x − t) + . . . + f^(n−1)(t)/(n−1)! (x − t)^{n−1} = Σ_{k=0}^{n−1} f^(k)(t)/k! (x − t)^k. (7.24)
Then F (x) = f (x) and F (x0 ) = Pn−1 (x). Also, its derivative is given by
F′(t) = Σ_{k=0}^{n−1} f^(k+1)(t)/k! (x − t)^k − Σ_{k=0}^{n−1} f^(k)(t)/k! · k (x − t)^{k−1}
  = Σ_{k=0}^{n−1} f^(k+1)(t)/k! (x − t)^k − Σ_{k=1}^{n−1} f^(k)(t)/(k−1)! (x − t)^{k−1}
  = Σ_{k=0}^{n−1} f^(k+1)(t)/k! (x − t)^k − Σ_{k=0}^{n−2} f^(k+1)(t)/k! (x − t)^k = f^(n)(t)/(n−1)! (x − t)^{n−1}.
Hence, applying the Cauchy Mean Value Theorem 5.35 in the interval [x0, x] to the functions F and g(t) = −(x − t)ⁿ we deduce the existence of a point ξL ∈ (x0, x) such that
(f(x) − P_{n−1}(x))/(x − x0)ⁿ = (F(x) − F(x0))/(g(x) − g(x0)) = F′(ξL)/g′(ξL) = [f^(n)(ξL)/(n−1)! (x − ξL)^{n−1}] / [n (x − ξL)^{n−1}] = f^(n)(ξL)/n!.
This implies (7.23) and concludes the proof.
Proof. (Extra Material) Given x ∈ [a, b], we apply (7.23) to find a point ξL ∈ (x0, x) such that
f(x) − P_{n−1}(x) = (1/n!) f^(n)(ξL)(x − x0)ⁿ.
Since |f^(n)(ξL)| ≤ M, it follows that
|f(x) − P_{n−1}(x)| ≤ (M/n!) |x − x0|ⁿ   ∀ x ∈ [a, b],
Another version of Taylor formula is the one with the so-called Cauchy remainder. We
discuss it in the following exercise.
f(x) − P_{n−1}(x) = (1/(n−1)!) f^(n)(ξC)(x − ξC)^{n−1}(x − x0). (7.26)
Hint: Consider the function F defined in (7.24) and apply to it the Mean Value Theorem 5.31
in the interval [x0 , x].
Example 7.65. — We can use the Taylor approximation to refine the discussion in Section
5.2.1. Let f : (a, b) → R be a n-times continuously differentiable function. Suppose x0 ∈ (a, b)
satisfies
f ′ (x0 ) = . . . = f (n−1) (x0 ) = 0.
• If f^(n)(x0) < 0 and n is even, then f has an isolated local maximum in x0.
• If f^(n)(x0) > 0 and n is even, then f has an isolated local minimum in x0.
• If n is odd, then x0 is not a local extremum of f.
All three statements follow from (7.23), which, in this case, takes the form
f(x) = f(x0) + (1/n!) f^(n)(ξL)(x − x0)ⁿ,   ξL ∈ (x0, x).
Indeed, if f (n) (x0 ) > 0, by continuity there exists δ > 0 such that f (n) (ξL ) > 0 for ξL ∈
(x0 , x) ⊂ (x0 − δ, x0 + δ). If n is even, then (x − x0 )n > 0 for x ̸= 0 and we deduce that
f (x) > f (x0 ) for x ∈ (x0 − δ, x0 + δ) with x ̸= 0. If n is odd, then (x − x0 )n changes sign
when considering x > x0 and x < x0 , so x0 is not a local extremum of f .
On the other hand, if f (n) (x0 ) < 0 and n is even, the same argument as above shows that
f (x) < f (x0 ) for x ∈ (x0 − δ, x0 + δ) with x ̸= 0, while in the case n odd x0 is not a local
extremum of f .
then one should recover f (x). Unfortunately, this is false, and functions that satisfy such a
property are rather special.
Note that the Taylor series is centered at x0 instead of 0 (i.e., xn is replaced with (x−x0 )n ).
Hence, all theorems about power series from Section 4.4 still hold, but taking into account
that now x0 plays the role of the center. In particular, if the series has radius of convergence
R > 0, then it converges for all x ∈ (x0 − R, x0 + R), while it diverges for |x − x0 | > R.
In other words, analytic functions f : I → R are characterized by the fact that, for every
point x0 ∈ I, there exists a power series that converges to f in a neighborhood of x0 .
As the next example shows, there are smooth functions f whose Taylor series converges to
a function different from f .
As shown in Exercise 5.25, ψ is smooth on R and satisfies ψ (n) (0) = 0 for all n ∈ N. Hence,
the Taylor series of the function ψ at the point x0 = 0 is the zero series:
Σ_{n=0}^∞ (ψ^{(n)}(0)/n!) x^n = Σ_{n=0}^∞ (0/n!) x^n = 0.
This series has an infinite radius of convergence and converges to the function 0. Since
ψ(x) > 0 holds for all x > 0, the Taylor series does not converge to ψ, and so ψ is not analytic
at the point x0 = 0.
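Exercise 5.25 is not reproduced here; the standard function with the properties stated above, which we assume in the sketch below, is ψ(x) = e^{−1/x} for x > 0 and ψ(x) = 0 for x ≤ 0. The check illustrates both claims numerically: ψ is strictly positive for x > 0, yet its difference quotients at 0 vanish faster than any fixed power, consistent with ψ^{(n)}(0) = 0 for all n.

```python
import math

def psi(x):
    # smooth but non-analytic at 0 (assumed definition, cf. Exercise 5.25)
    return math.exp(-1.0 / x) if x > 0 else 0.0

assert psi(0.1) > 0              # psi is strictly positive for x > 0 ...
assert psi(-1.0) == 0.0          # ... and identically 0 for x <= 0
# the difference quotients at 0 collapse faster than any fixed power of h,
# matching psi^{(n)}(0) = 0 for all n:
for h_ in [1e-1, 1e-2, 1e-3]:
    assert psi(h_) / h_ < h_
```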
The next result provides a criterion that guarantees that the Taylor series of f converges
to f in a neighborhood of x0 .
Then f is analytic at x0 .
Proof. We first estimate the radius of convergence R of the Taylor series. If we define a_n = f^{(n)}(x_0)/n!, then the Taylor series is equal to Σ_{n=0}^∞ a_n (x − x_0)^n. Also, thanks to our assumption on the size of |f^{(n)}|, it follows that

|a_n| ≤ (c A^n n!)/n! = c A^n.

Moreover, for x ∈ I with |x − x_0| ≤ δ, (7.23) gives

|f(x) − P_{n−1}(x)| ≤ (1/n!) |f^{(n)}(ξ_L)| |x − x_0|^n ≤ c A^n |x − x_0|^n ≤ c (Aδ)^n.
Hence, choosing δ > 0 with Aδ < 1, the right-hand side tends to 0 as n → ∞, therefore

f(x) = lim_{n→∞} P_{n−1}(x) = Σ_{n=0}^∞ (f^{(n)}(x_0)/n!) (x − x_0)^n   ∀ x ∈ (x_0 − δ, x_0 + δ) ∩ I,
as desired.
Exercise 7.71. — 1. Show that the functions exp, sin, sinh satisfy the property (7.27)
on any interval [a, b] ⊂ R.
2. Show that the function log satisfies (7.27) on any interval [a, b] ⊂ (0, ∞).
3. Let f, g : [a, b] → R be functions satisfying (7.27). Show that f + g and f · g also satisfy
this property (possibly with different constants c and A).
Setting up and solving differential equations stands as a primary practical use of calculus.
These equations are instrumental in addressing a wide range of challenges in fields such as
physics, chemistry, biology, and more. Moreover, disciplines like structural analysis, modern
economics, and information technology heavily rely on differential equations, making them
indispensable in these domains.
1. Order: An ODE is of order n if u^{(n)} is the highest-order derivative appearing in the ODE. For instance:
3. Homogeneity (for linear ODEs): A linear ODE is homogeneous if all terms involve the
function or its derivatives (this is equivalent to asking that if u is a solution, then Au is a
solution for all A ∈ R). It is non-homogeneous if there is an additional term independent
of the function.
Example 8.2. — Here we present some classic examples of ODEs and their applications:
1. Newton’s Law of Cooling: In the field of heat transfer, Newton’s Law of Cooling plays
a pivotal role in understanding the dynamics of temperature change. This law states
that:
This principle leads to the formulation of a differential equation that governs the temperature dynamics of an object. The equation is expressed as:

Ṫ(t) = −k (T(t) − T_env),

where T(t) is the temperature of the object at time t, T_env is the temperature of the surrounding environment, and k > 0 is a proportionality constant.
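A quick numerical illustration of the cooling law (with hypothetical values of k, T_env and the initial temperature T_0, not taken from the text): a forward-Euler integration of the ODE should reproduce the standard explicit solution T(t) = T_env + (T_0 − T_env) e^{−kt}.

```python
import math

# Hypothetical parameters, not from the text: cooling rate k, ambient
# temperature Tenv, initial temperature T0.
k, Tenv, T0 = 0.3, 20.0, 90.0

h, steps = 1e-4, 50000          # forward Euler for T' = -k (T - Tenv), up to t = 5
T = T0
for _ in range(steps):
    T += h * (-k) * (T - Tenv)

# compare with the explicit solution T(t) = Tenv + (T0 - Tenv) e^{-k t}
exact = Tenv + (T0 - Tenv) * math.exp(-k * 5.0)
assert abs(T - exact) < 1e-2
```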
From Newton's second law of motion ẍ(t) = −F(x(t)), one obtains the following ODE for the simple harmonic oscillator:
ẍ(t) + ω 2 x(t) = 0,
where ω denotes the angular frequency of the oscillations. This ODE is linear, homogeneous, and of second order.
In real oscillators, friction slows the motion of the system. In many vibrating systems,
the frictional force can be modeled as being proportional to the velocity ẋ of the object.
This leads to the formulation of the following ODE for the damped harmonic oscillator:

m ẍ(t) + d ẋ(t) + k x(t) = 0,

where m is the mass, d ≥ 0 is the damping constant, and k > 0 is the spring constant.
8.3. — So far, we have only talked about single differential equations, but one may also
study systems of differential equations. In addition, solutions of a differential equation are
required to satisfy some “initial conditions” such as u(0) = 0 (for example, this can correspond
to prescribing position at time 0) and/or u′ (0) = 1 (this can correspond to prescribing velocity
at time 0). These are called boundary conditions.
In other words, the set of solutions of (8.2) forms a one-dimensional linear subspace of
C 1 (I).
By Corollary 5.46, we deduce that v(x) = A for some A ∈ R, or equivalently u(x) = Ae^{−F(x)}.
Remark 8.5. — As we have seen, solutions of (8.2) are defined in terms of a primitive of f .
Since primitives are defined up to a constant, one could wonder what happens if one replaces
F by F + C for some constant C ∈ R. This would correspond to replacing Ae−F (x) with
Ae−C e−F (x) , but since A ∈ R is arbitrary, this plays no essential role in the final statement.
8.6. — We can now investigate the solvability of non-homogeneous linear first order ODEs,
namely
u′ (x) + f (x)u(x) = g(x) ∀ x ∈ I. (8.3)
To motivate the next result, we look for a special solution by applying the method of variation
of constants. The idea is that, instead of looking for solutions of the form x 7→ Ae−F (x)
(that we know solve the homogeneous equation), we look for solutions u(x) = H(x)e−F (x) for
some C 1 function H : I → R. With this choice it follows that
u′ (x) = H ′ (x)e−F (x) − H(x)F ′ (x)e−F (x) = H ′ (x)e−F (x) − f (x)u(x).
Hence, if we want u to solve (8.3) we need to impose that H ′ (x)e−F (x) = g(x), or equivalently,
H is a primitive of g(x)eF (x) .
In other words, the set of solutions of (8.3) forms a one-dimensional affine subspace of
C 1 (I).
Proof. First, given A ∈ R and u(x) = (H(x) + A)e^{−F(x)}, it follows that

u′(x) + f(x)u(x) = H′(x)e^{−F(x)} = g(x)e^{F(x)}e^{−F(x)} = g(x),

so u solves (8.3).
Vice versa, if u solves (8.3) then v(x) = u(x) − H(x)e^{−F(x)} solves

v′(x) + f(x)v(x) = 0.

In other words, v(x) solves (8.2), so Proposition 8.7 implies that v(x) = Ae^{−F(x)} for some
A ∈ R. Since u(x) = v(x) + H(x)e−F (x) , this proves the result.
The previous results give us formulas to solve every linear first order ODE. However, in
a concrete case, the difficulty will be determining the primitive F of f and then the one of
g(x)eF (x) . As we have seen above, solutions are uniquely determined up to a free parameter
A ∈ R. This will be used to impose the boundary condition.
u′(x) − 2x u(x) = e^{x²},   u(0) = 1, (8.4)

on R. Following Proposition 8.7, we set f(x) = −2x and g(x) = e^{x²}. Then a primitive of f is the function F(x) = −x², while a primitive of g(x)e^{F(x)} = e^{x²} e^{−x²} = 1 is given by x. So, u must be of the form u(x) = (x + A)e^{x²}. Imposing the boundary condition u(0) = 1 we obtain A = 1, therefore the solution to the above ODE is given by

u(x) = (x + 1)e^{x²}. (8.5)
Remark 8.9. — If one forgets the formula from Proposition 8.7, one can try to remember
the following procedure to solve (8.3).
Recalling that multiplying by a function of the form ew(x) for some function w is “useful”
(based on what we have seen in previous pages), we multiply (8.3) by ew(x) , so to get
Hence, if we choose w = F a primitive of f (note that here we can choose any primitive of
f without worrying about the additional constant C, since all that matters is that F ′ = f ),
then

(u(x)e^{F(x)})′ = g(x)e^{F(x)},

therefore

u(x)e^{F(x)} = ∫ g e^{F} + A,
u′(x) − (4/x + 1) u(x) = x⁴,   u(1) = 1.
u′(x)/f(u(x)) = 1.

Integrating both sides and using the change of variable formula (7.8), we get

∫ 1/f(u) du = ∫ u′(x)/f(u(x)) dx = ∫ 1 dx + C = x + C, (8.7)
Note that, since by assumption f ≠ 0 on the domain of integration (otherwise we could not divide by f(u(x))), it means that H′ = 1/f ≠ 0, so H is invertible (since it is either strictly increasing or strictly decreasing).
Example 8.11. — Consider the logistic growth model used in population dynamics:
u′(x) = r u(x) (1 − u(x)/K), (8.8)
where u(x) ∈ (0, K) represents the population at “time” x, r is the growth rate, and K is the carrying capacity (see Example 8.2(3)). To solve this, we rearrange and integrate:

K u′(x)/(u(x)(K − u(x))) = r   =⇒   ∫ K/(u(K − u)) du = ∫ r dx = rx + C.
The left-hand side is the integral of a rational function, that can be solved as discussed in
Section 7.3.4: one can observe that
K/(u(K − u)) = 1/u + 1/(K − u),
This implies

log(u(x)/(K − u(x))) = C + rx   =⇒   u(x)/(K − u(x)) = e^{C+rx}
=⇒   u(x) = K e^{C+rx} − u(x) e^{C+rx}
=⇒   (1 + e^{C+rx}) u(x) = K e^{C+rx}
=⇒   u(x) = K e^{C+rx}/(1 + e^{C+rx}).
If u0 ∈ (0, K) denotes the initial population, then setting x = 0 (here x plays the role of time)
we get
u_0 = K e^C/(1 + e^C)   =⇒   e^C = u_0/(K − u_0),
which gives
u(x) = K u_0/(u_0 + (K − u_0) e^{−rx}).
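As a quick sanity check of this closed-form solution, one can compare it with a direct numerical integration of (8.8). The parameters r, K, u0 below are illustrative choices, not values from the text; this is a minimal forward-Euler sketch.

```python
import math

# Illustrative parameters (not from the text): growth rate r, carrying
# capacity K, initial population u0.
r, K, u0 = 1.5, 10.0, 2.0

def logistic_exact(x):
    # closed-form solution u(x) = K u0 / (u0 + (K - u0) e^{-r x})
    return K * u0 / (u0 + (K - u0) * math.exp(-r * x))

h, steps = 1e-4, 30000           # forward Euler for u' = r u (1 - u/K), up to x = 3
u = u0
for _ in range(steps):
    u += h * r * u * (1 - u / K)

assert abs(u - logistic_exact(3.0)) < 1e-2
```

The Euler trajectory also exhibits the qualitative behavior of the model: monotone growth from u0 toward the carrying capacity K.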
u′ (x) = −ku(x)
where k > 0 is the decay constant. This ODE could be solved using (8.4), but we show an
alternative approach via separation of variables.
More precisely, if u is identically zero then there is nothing to prove. Otherwise, if u is
non-zero in some interval, then we can divide by u and integrate to get
∫ 1/u du = ∫ −k dx   =⇒   log |u| = −kx + C. (8.9)
This proves that, in each interval I where u does not vanish, there exists a constant C ∈ R such that |u(x)| = e^C e^{−kx}. Since u has to be continuous, this implies that either u(x) = e^C e^{−kx} or u(x) = −e^C e^{−kx} on the whole R. Therefore, in conclusion, u(x) = a e^{−kx} for some a ∈ R. Imposing the condition that u(0) = u_0, this gives

u(x) = u_0 e^{−kx}.
Remark 8.13 (Method of Separation of Variables). — The method described above can
be generalized to ODEs of the form

u′(x) = f(u(x)) g(x)

for some continuous functions f, g : R → R. More precisely, assuming as before that f(u(x)) ≠
0, we can divide both sides and obtain

u′(x)/f(u(x)) = g(x).

Integrating both sides and using the change of variable formula (7.8), we get

∫ 1/f(u) du = ∫ u′(x)/f(u(x)) dx = ∫ g(x) dx + C, (8.12)
u′′ + a1 u′ + a0 u = 0 (8.13)
Example 8.14. — Solutions of u′′ = 0 are affine functions, that is, u(x) = Ax + B for
constants A, B ∈ R.
Solutions of u″ − u = 0 are functions of the form

u(x) = A e^{x} + B e^{−x}

for constants A, B ∈ R.
Solutions of u″ + u = 0 are functions of the form

u(x) = A cos(x) + B sin(x)
for constants A, B ∈ R. Since sine and cosine can be written in terms of e±ix , one can also
look for solutions of the form
u(x) = Ceix + De−ix
with C, D ∈ C and then re-express the solution in terms of sine and cosine (recall that we are
interested in real-valued functions).
8.16. — The last two examples above suggest the approach of looking for solutions of
(8.13) of the form u(x) = e^{αx} for some α ∈ C. Indeed, with this choice we see that

(e^{αx})″ + a_1 (e^{αx})′ + a_0 e^{αx} = (α² + a_1 α + a_0) e^{αx},

so u(x) = e^{αx} solves (8.13) if and only if α is a root of the characteristic polynomial

p(t) = t² + a_1 t + a_0.
We distinguish three cases, depending on whether the discriminant ∆ = a21 − 4a0 is positive,
negative, or zero.
• Case 1: ∆ > 0. In this case, the characteristic polynomial p(t) has two distinct real roots

α = (−a_1 + √∆)/2,   β = (−a_1 − √∆)/2. (8.14)
This implies that the real-valued functions x ↦ e^{αx} and x ↦ e^{βx} are solutions, and therefore so is any linear combination u(x) = A e^{αx} + B e^{βx} with A, B ∈ R.
• Case 3: ∆ = 0. In this case, the characteristic polynomial has the double real root

α = −a_1/2, (8.16)
thus x 7→ eαx is a solution of (8.13). To find another solution we recall the example u′′ = 0.
In this case two solutions are given by 1 and x, and these two solutions can be written as eγx
and xeγx with γ = 0.
Motivated by this observation, one could wonder whether x 7→ xeαx is a solution. This is
indeed the case:
(x e^{αx})″ + a_1 (x e^{αx})′ + a_0 x e^{αx} = (α² + a_1 α + a_0) x e^{αx} + (2α + a_1) e^{αx} = 0,
where the first term vanishes because α is a root of the characteristic polynomial, while the second term vanishes because of (8.16). This shows that the functions x ↦ (A + Bx)e^{αx}, A, B ∈ R, are solutions of (8.13).
It is customary, for second order ODEs, to prescribe both the value of u and the value of
its derivative at some point (for instance, u(0) = 1 and u′ (0) = 0). The fact that we have
two constants A, B guarantees that we can choose them so as to impose these two boundary
conditions.
Now that we have found solutions to (8.13), we want to prove that they are the only ones.
This is the content of the next result.
Following the terminology from Paragraph 8.16 above, consider the following solutions
to the homogeneous ODE (8.13):
If u ∈ C 2 (I) solves (8.13), then there exist A, B ∈ R such that u = Au1 + Bu2 .
In other words, the set of solutions of (8.13) forms a two-dimensional linear subspace
of C 2 (I).
Proof. (Extra material) We consider the case ∆ > 0 (the other cases can be treated analogously). Assume for simplicity that 0 ∈ I (the general case can be treated similarly, choosing a point x_0 ∈ I and arguing in a similar fashion with x_0 in place of 0). Since u_1(x) = e^{αx} and u_2(x) = e^{βx} satisfy u_1(0) = u_2(0) = 1, u_1′(0) = α, and u_2′(0) = β, if we define

v_1(x) = (α u_2(x) − β u_1(x))/(α − β),   v_2(x) = (u_1(x) − u_2(x))/(α − β),
then v_1 and v_2 are two solutions of (8.13) satisfying v_1(0) = 1, v_1′(0) = 0, v_2(0) = 0, v_2′(0) = 1.
Now, given u ∈ C²(I) solution of (8.13), consider w(x) = u(x) − u(0)v_1(x). This is still a solution and w(0) = u(0) − u(0)v_1(0) = 0. We then consider the function

W(x) = w(x)v_2′(x) − w′(x)v_2(x)

(this function is called “Wronskian”). Using that both w and v_2 solve (8.13) one can check
that W ′ = 0, thus W is constant. Since W (0) = 0 (because w(0) = v2 (0) = 0), we conclude
that
w(x)v2′ (x) − w′ (x)v2 (x) = 0.
Now, if w is identically zero, then there is nothing to prove (since this means that u = u(0)v1 ).
Otherwise, if w is not identically zero, by continuity we can find a small interval where both w and v_2 do not vanish. There, dividing by w(x)v_2(x), we get w′(x)/w(x) = v_2′(x)/v_2(x),
which implies that log |w(x)| − log |v_2(x)| = C for some C ∈ R. This shows that, in each
interval I where w and v2 do not vanish, there exists a constant C ∈ R such that |w(x)| =
eC |v2 (x)|. By continuity this implies that as long as v2 does not vanish, then also w does not
vanish and w(x) = a v_2(x) for some constant a ∈ R. Since in our case v_2 vanishes only at 0 (as one can easily check), we deduce that there exist two constants b, c ∈ R such that w(x) = b v_2(x) for x > 0 and w(x) = c v_2(x) for x < 0. So, since w ∈ C²(I) by assumption, the only option is b = c, which proves that w(x) = b v_2(x) in R. In conclusion, u = u(0)v_1 + b v_2, which is a linear combination of u_1 and u_2, as desired.
A component of this force arises from the expansion of the spring and is oriented towards rest.
According to Hooke’s law, this force is given by −ku, where the real number k > 0 is called
the spring constant. Furthermore, friction forces generally act against the movement. We assume that the corresponding force is given by −du′, where d ≥ 0 is the damping constant. The
differential equation describing the motion u(t) of the mass is thus mu′′ = −du′ − ku, or
u″ + (d/m) u′ + (k/m) u = 0,

which is, therefore, a linear homogeneous differential equation of second order. If we set ω = √(k/m) and ζ = d/(2mω), then the ODE becomes
u′′ + 2ζωu′ + ω 2 u = 0
The characteristic polynomial is

p(t) = t² + 2ζω t + ω².

If 0 ≤ ζ < 1, its roots are complex conjugate and the solutions take the form

u(t) = e^{−ζωt} (A cos(γt) + B sin(γt))

with γ = √(1 − ζ²) ω. The constants A and B depend on the initial position u(0) and the initial velocity u′(0) of the mass. In the case ζ = 0, the oscillation is undamped and u is a periodic function.
If friction is large compared to the strength of the spring so that ∆ ≥ 0 (this happens
when ζ ≥ 1), then the oscillating behavior disappears and the weight returns exponentially
fast to its steady state: if ζ > 1 then
p
u(t) = Ae−λ1 t + Be−λ2 t , λ1,2 = ζ ± ζ 2 − 1 ω,
while if ζ = 1 then
u(t) = Ae−ωt + Bte−ωt .
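The damping regimes above can be checked numerically: the sketch below verifies by central finite differences that the stated closed-form solutions satisfy u″ + 2ζωu′ + ω²u = 0. The values of ω, ζ, A, B are illustrative assumptions, not taken from the text.

```python
import math

# Illustrative parameters (not from the text): angular frequency omega,
# damping ratios zeta, and free constants A, B.
omega = 2.0

def u_under(t, zeta=0.2, A=1.0, B=0.5):
    # under-damped solution: e^{-zeta omega t} (A cos(gamma t) + B sin(gamma t))
    gamma = math.sqrt(1 - zeta ** 2) * omega
    return math.exp(-zeta * omega * t) * (A * math.cos(gamma * t) + B * math.sin(gamma * t))

def u_crit(t, zeta=1.0, A=1.0, B=0.5):
    # critically damped solution (zeta = 1): (A + B t) e^{-omega t}
    return (A + B * t) * math.exp(-omega * t)

def residual(u, t, zeta, h=1e-4):
    # central-difference approximation of u'' + 2 zeta omega u' + omega^2 u
    d1 = (u(t + h) - u(t - h)) / (2 * h)
    d2 = (u(t + h) - 2 * u(t) + u(t - h)) / h ** 2
    return d2 + 2 * zeta * omega * d1 + omega ** 2 * u(t)

for t in [0.3, 1.0, 2.5]:
    assert abs(residual(u_under, t, zeta=0.2)) < 1e-4
    assert abs(residual(u_crit, t, zeta=1.0)) < 1e-4
```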
One can note that ζ − √(ζ² − 1) < 1 for ζ > 1, so the fastest exponential convergence is achieved in the critically damped case ζ = 1.
u′′ + a1 u′ + a0 u = g (8.17)
for given a0 , a1 ∈ R and g ∈ C 0 (I). Following the terminology from Paragraph 8.16, consider
the solutions to the homogeneous ODE:
We want to solve (8.17) using the method of variation of constants, that is, we look for a
solution u of the form u(x) = H1 (x)u1 (x) + H2 (x)u2 (x) for some functions H1 , H2 ∈ C 2 (I).
Calculating u′ and u″ gives

u′ = (H_1′u_1 + H_2′u_2) + (H_1u_1′ + H_2u_2′),
u″ = (H_1′u_1 + H_2′u_2)′ + (H_1′u_1′ + H_2′u_2′) + (H_1u_1″ + H_2u_2″).
Therefore
u′′ + a1 u′ + a0 u = (H1′ u1 + H2′ u2 )′ + (H1′ u′1 + H2′ u′2 ) + (H1 u′′1 + H2 u′′2 )
+ a1 (H1′ u1 + H2′ u2 ) + a1 (H1 u′1 + H2 u′2 ) + a0 (H1 u1 + H2 u2 ).
Using that u_1 and u_2 solve the homogeneous equation, the expression above gives

u″ + a_1u′ + a_0u = (H_1′u_1 + H_2′u_2)′ + (H_1′u_1′ + H_2′u_2′) + a_1(H_1′u_1 + H_2′u_2).

Hence, u solves (8.17) if we impose the two conditions

H_1′(x)u_1(x) + H_2′(x)u_2(x) = 0,   H_1′(x)u_1′(x) + H_2′(x)u_2′(x) = g(x).
Using the first equation, one can express H2′ in terms of H1′ , that is,
H_2′ = −(u_1/u_2) H_1′. (8.18)

Substituting this into the second equation gives

H_1′ u_1′ − (u_1/u_2) u_2′ H_1′ = g   =⇒   H_1′ = u_2 g/(u_1′ u_2 − u_2′ u_1).
In other words, if H_1 and H_2 are primitives of the functions above, then u = H_1u_1 + H_2u_2 is a particular solution of (8.17). Finally, the set of all solutions can
Following the terminology from Paragraph 8.16 above, consider the following solutions
to the homogeneous ODE (8.13):
In other words, the set of solutions of (8.17) forms a two-dimensional affine subspace of C²(I).
Although this method is very general, computationally it can be very involved. So, in some (very special) cases it may be easier to “guess” a particular solution to the non-homogeneous equation by looking at functions of the form p(x)e^{γx}, p(x)e^{γx} cos(ηx), or p(x)e^{γx} sin(ηx), where p(x) is a polynomial and γ, η > 0 (depending on the structure of g, one of these functions may work).
In this case, all solutions to the homogeneous equation are A cos(x) + B sin(x). Therefore, we
look for a solution of the form u(x) = H1 (x) cos(x) + H2 (x) sin(x).
This leads to the two equations
H1′ (x) cos(x) + H2′ (x) sin(x) = 0, −H1′ (x) sin(x) + H2′ (x) cos(x) = 1,
Solving this linear system gives H_1′(x) = −sin(x) and

H_2 = ∫ cos(x)/(cos²(x) + sin²(x)) dx = ∫ cos(x) dx.
Hence, we can take H1 = cos(x) and H2 = sin(x), which leads to the particular solution
u(x) = cos2 (x) + sin2 (x) = 1 (you see that, in this case, one may have tried to guess it!). So,
the general solution is given by
particular solution

u(x) = (1/2) cos²(x) sin(x) − (1/2) x cos(x) − (1/2) cos²(x) sin(x) = −(1/2) x cos(x).

So, the general solution is given by

u(x) = −(x cos(x))/2 + A cos(x) + B sin(x).

Imposing the boundary conditions u(0) = 0 and u′(0) = 1, we obtain

u(x) = −(1/2) x cos(x) + (3/2) sin(x).
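One can verify numerically that −x cos(x)/2 is indeed a particular solution of u″ + u = sin(x); the finite-difference sketch below is purely illustrative.

```python
import math

def u_part(x):
    # candidate particular solution of u'' + u = sin(x)
    return -0.5 * x * math.cos(x)

def residual(x, h=1e-4):
    # central-difference approximation of u'' + u - sin(x)
    d2 = (u_part(x + h) - 2 * u_part(x) + u_part(x - h)) / h ** 2
    return d2 + u_part(x) - math.sin(x)

for x in [0.0, 1.0, 2.0, 5.0]:
    assert abs(residual(x)) < 1e-5
```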
Remark 8.23. — In the solution of Example 8.22, one may note the presence of x in front
of cos(x). This is due to the fact that sin(x) and cos(x) are solutions to the homogeneous
equation, so the solution to the non-homogeneous problem cannot be just a linear combination
of them. As a general strategy, in such situations, a particular solution to the non-homogeneous equation is sought by multiplying the solutions of the homogeneous equation by x.
1. f is continuous in R × R;
2. f is Lipschitz with respect to the second variable; that is, there exists a constant L > 0 such that

|f(x, y_1) − f(x, y_2)| ≤ L |y_1 − y_2|   ∀ x, y_1, y_2 ∈ R.
Then, for any point (x0 , y0 ) ∈ R × R there exists a unique C 1 function u : R → R such
that

u′(x) = f(x, u(x)) for all x ∈ R,   u(x_0) = y_0. (8.19)
As we shall see in Section 8.2.2 below, the proof is based on the method of successive
approximations, also known as Picard iterations. It involves constructing a sequence of con-
tinuous functions that converge to the solution of the differential equation. Before diving into
the proof of this important theorem, we first discuss some examples and generalizations.
Theorem 8.26 guarantees that solutions to the first order ODE (8.19) are unique when f
is Lipschitz in the second variable. This assumption is crucial, as the next example shows.
Hence, Theorem 8.26 guarantees that the solution is unique. Since u = 0 is a solution, this is
the unique solution.
• For α < 1 the function f(y) = |y|^α is not Lipschitz. Indeed, choosing y_2 = 0 in the definition of Lipschitz continuity, if this function were Lipschitz there would exist a constant L > 0 such that

|f(y_1) − f(0)| = |y_1|^α ≤ L |y_1|   ∀ y_1 ∈ R,

but this is false since lim_{y_1→0} |y_1|^{1−α} = 0 (recall that α < 1).
Note now that, also in this case, the function u = 0 is a solution. We now try to use
separation of variables (recall Section 8.1.2) to find a second solution that is not zero, say
with u(x) > 0 somewhere:
u′(x) = u(x)^α   =⇒   u′(x)/u(x)^α = 1,

∫ du/u^α = ∫ dx + C   =⇒   u^{1−α}/(1 − α) = x + C.
Motivated by the previous example, one may wonder if the solution of (8.20) is unique for α > 1. We begin with the following observation, which we state as an exercise.
Exercise 8.28. — Let α > 1. Prove that the function f : R → R, y 7→ |y|α , is locally
Lipschitz (i.e., it is Lipschitz in every compact interval [a, b]), but it is not Lipschitz on the
whole R.
Hint: To prove local Lipschitz continuity, use that f(y_1) − f(y_2) = ∫_{y_2}^{y_1} f′(x) dx.
By the previous exercise, we see that Theorem 8.26 does not apply to (8.20) when α > 1. Still, since this function is locally Lipschitz, one may hope that some existence and uniqueness
theorem still holds.
This is indeed the case, as implied by the local version of Cauchy-Lipschitz Theorem
stated below. As we shall discuss later, since now the function f is only assumed to be locally
Lipschitz, in general we cannot find a solution u defined in whole R.
1. f is continuous in I × R;
2. f is locally Lipschitz with respect to the second variable; that is, for all compact intervals [a, b] ⊂ I and [c, d] ⊂ R there exists a constant L > 0 such that

|f(x, y_1) − f(x, y_2)| ≤ L |y_1 − y_2|   ∀ x ∈ [a, b], y_1, y_2 ∈ [c, d].
Then, for any point (x0 , y0 ) ∈ I × R there exist an interval I ′ ⊂ I containing x0 , and
a unique C 1 function u : I ′ → R, such that
u′(x) = f(x, u(x)) for all x ∈ I′,   u(x_0) = y_0. (8.21)
In other words, under a local Lipschitz assumption, one can only guarantee the existence
and uniqueness of a solution for some interval around x0 . Also, as long as the solution u(x)
remains bounded within I ′ , then one can continue applying Theorem 8.29 to extend the
interval I ′ as much as possible.
To better understand why solutions are defined only in some interval I ′ ⊂ I, we consider
the following example.
u′(x) = u(x)²   =⇒   u′(x)/u(x)² = 1   =⇒   −1/u(x) = x + C.

Imposing u(0) = 1 gives C = −1, hence

u(x) = 1/(1 − x).
Note that this function solves the ODE in (−∞, 1), but limx→1− u(x) = ∞, so we cannot
extend this solution beyond x = 1.
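The blow-up at x = 1 is visible numerically as well: a forward-Euler integration (with an illustrative step size h) tracks 1/(1 − x) accurately away from 1 and produces exploding values as x approaches 1.

```python
def exact(x):
    # explicit solution u(x) = 1/(1 - x) of u' = u^2, u(0) = 1
    return 1.0 / (1.0 - x)

h = 1e-5                         # illustrative step size for forward Euler
u, x = 1.0, 0.0
while x < 0.5:                   # Euler tracks the solution well away from 1 ...
    u += h * u * u
    x += h
assert abs(u - exact(0.5)) < 1e-3    # exact(0.5) = 2

while x < 0.999:                 # ... but the values explode near x = 1
    u += h * u * u
    x += h
assert u > 100
```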
with α > 1. Show that Theorem 8.29 applies and find the unique solution.
Although Theorem 8.29 guarantees that most nonlinear ODEs have a unique solution (at
least locally in x), nonlinear ODEs are very difficult to solve and there are no general tech-
niques to tackle such problems, neither in practice nor in theory. Therefore, in applications,
one often resorts to numerical methods.
Proof. We first prove existence in four steps. In the fifth step, we prove uniqueness.
• Step 1: An equivalent integral equation. We claim that u : R → R is a C 1 solution to
(8.19) if and only if u : R → R is a continuous function satisfying
u(x) = y_0 + ∫_{x_0}^x f(s, u(s)) ds   ∀ x ∈ R. (8.22)
Indeed, if u solves the ODE, then by integration (see Corollary 7.6) we deduce the validity of
(8.22).
Vice versa, if u is a continuous function satisfying (8.22), then Theorem 7.4 and Re-
mark 8.32 imply that u(x) is a primitive of the continuous function f (x, u(x)), therefore
u′ (x) = f (x, u(x)). In particular, u is C 1 (since its derivative is continuous). Finally, choos-
ing x = x0 in (8.22) we deduce that u(x0 ) = y0 .
Therefore, to prove the existence, it suffices to construct a solution to (8.22). This will be accomplished by constructing what are known as Picard approximations, a sequence of functions that converge to a solution of (8.22).
• Step 2: Construction of Picard approximations. First, we define the continuous
function u_0 : R → R as

u_0(x) = y_0   ∀ x ∈ R.

Then, we define u_1 : R → R as

u_1(x) = y_0 + ∫_{x_0}^x f(s, u_0(s)) ds.
Note that the integral is well defined since u0 is continuous and therefore also the function
s 7→ f (s, u0 (s)) is continuous (see Remark 8.32). We also observe that u1 is the primitive of
a continuous function, so it is C 1 (and, in particular, continuous).
More generally, given n ∈ N, once the continuous function u_n : R → R is constructed, then we define

u_{n+1}(x) = y_0 + ∫_{x_0}^x f(s, u_n(s)) ds.
Again, since un is continuous, also s 7→ f (s, un (s)) is continuous, and therefore un+1 is C 1
(and in particular continuous).
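For the model problem u′(x) = u(x), u(0) = 1 (that is, f(x, y) = y, x_0 = 0, y_0 = 1), each Picard iterate is a polynomial and the integration step can be carried out exactly on coefficient lists. The sketch below (an illustration, not part of the proof) shows that the iterates are precisely the Taylor polynomials of exp, the solution of this ODE.

```python
import math

# Picard iterations for u'(x) = u(x), u(0) = 1, i.e. f(x, y) = y, x0 = 0,
# y0 = 1. Each iterate is a polynomial stored as a coefficient list
# [c0, c1, ...]; integrating from 0 and adding y0 maps it to [y0, c0/1, c1/2, ...].
def picard_step(coeffs, y0=1.0):
    return [y0] + [c / (k + 1) for k, c in enumerate(coeffs)]

u = [1.0]                        # u_0(x) = y0
for _ in range(10):
    u = picard_step(u)

# u_10 is the degree-10 Taylor polynomial of exp, the exact solution:
assert u[:4] == [1.0, 1.0, 0.5, 1.0 / 6.0]
assert abs(sum(c * 0.5 ** k for k, c in enumerate(u)) - math.exp(0.5)) < 1e-9
```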
• Step 3: Convergence of Picard approximations. Fix τ = 1/(2M), where M denotes the Lipschitz constant of f in the second variable (so that τM = 1/2).
We now prove the uniform convergence of the sequence of Picard approximations un on the
interval [x0 − τ, x0 + τ ] by proving that this sequence corresponds to the partial sums of a
uniformly convergent series of functions. More precisely, define v_k = u_k − u_{k−1}, so that

u_n = u_0 + Σ_{k=1}^n v_k.

We will show that the series

u_∞(x) = u_0(x) + Σ_{k=1}^∞ v_k(x)

converges absolutely for every x ∈ [x_0 − τ, x_0 + τ], so that the function u_∞ : [x_0 − τ, x_0 + τ] → R is well defined, and that the sequence of functions {u_n}_{n=0}^∞ converges uniformly to u_∞ on [x_0 − τ, x_0 + τ].
To prove this, we observe that

v_{n+1}(x) = u_{n+1}(x) − u_n(x) = ∫_{x_0}^x [f(s, u_n(s)) − f(s, u_{n−1}(s))] ds   ∀ x ∈ R,

hence, by the Lipschitz assumption,

|v_{n+1}(x)| ≤ | ∫_{x_0}^x M |v_n(s)| ds |

for every x ∈ R and n ≥ 1. In particular, if we define a_n = max_{x∈[x_0−τ,x_0+τ]} |v_n(x)|, given
x ∈ [x0 − τ, x0 + τ ] it follows that [x0 , x] ⊂ [x0 − τ, x0 + τ ]. Therefore
| ∫_{x_0}^x |v_n(s)| ds | ≤ | ∫_{x_0}^x a_n ds | = a_n |x − x_0| ≤ a_n τ,

therefore

|v_{n+1}(x)| ≤ M a_n τ = a_n/2.

Taking the maximum over x ∈ [x_0 − τ, x_0 + τ], this gives

a_{n+1} ≤ M a_n τ = a_n/2   ∀ n ≥ 1.
This implies (by induction) that a_{n+1} ≤ 2^{−n} a_1, which gives, for x ∈ [x_0 − τ, x_0 + τ],

u_∞(x) = y_0 + Σ_{k=1}^∞ v_k(x),   |v_{k+1}(x)| ≤ 2^{−k} a_1.
By the majorant criterion, the series above converges absolutely. Also, for x ∈ [x_0 − τ, x_0 + τ],

|u_∞(x) − u_n(x)| ≤ Σ_{k=n+1}^∞ |v_k(x)| ≤ a_1 Σ_{k=n+1}^∞ 2^{1−k} = a_1 2^{1−n} → 0 as n → ∞,

which proves that the sequence of functions {u_n}_{n=0}^∞ converges uniformly to u_∞ on [x_0 − τ, x_0 + τ].
• Step 4: The limit function u∞ (x) solves (8.22). We want to take the limit, as n → ∞,
in the formula
u_{n+1}(x) = y_0 + ∫_{x_0}^x f(s, u_n(s)) ds   ∀ x ∈ [x_0 − τ, x_0 + τ].
We first observe that the term on the left-hand side converges to u∞ (x) as n → ∞.
To prove the convergence of the right-hand side, we estimate

| ∫_{x_0}^x |f(s, u_∞(s)) − f(s, u_n(s))| ds | ≤ M | ∫_{x_0}^x |u_∞(s) − u_n(s)| ds |,
and the last term converges to 0 as n → ∞ thanks to Theorem 6.42 (since |u∞ − un | → 0
uniformly). This proves that
∫_{x_0}^x f(s, u_n(s)) ds → ∫_{x_0}^x f(s, u_∞(s)) ds   as n → ∞,
and we find a solution on u∞ : [x−1 − τ, x1 ] → R. Then we define x−2 = x−1 − τ and y−2 =
Hence, if we define a = maxx∈[x0 −τ,x0 +τ ] |u1 (x) − u2 (x)|, then the inequality above yields
|u_1(x) − u_2(x)| ≤ M | ∫_{x_0}^x a ds | = M a |x − x_0| ≤ M τ a   ∀ x ∈ [x_0 − τ, x_0 + τ],

therefore

a ≤ M τ a = a/2.
This implies that a = 0, that is, u1 = u2 on [x0 − τ, x0 + τ ].
We can now repeat this argument in subsequent intervals as we did in Step 4. More
precisely, we first repeat the argument in [x0 + τ, x0 + 2τ ] to show that u1 = u2 there, then
in [x0 + 2τ, x0 + 3τ ], and so on. In this way, we deduce that u1 = u2 on [x0 − τ, ∞). Then,
we repeat the argument in [x0 − 2τ, x0 − τ ], then in [x0 − 3τ, x0 − 2τ ], etc., until we conclude
that u1 = u2 on the whole R.
is possible to isolate the highest order derivative u(n) (x) and express it as a function of x and
lower order derivatives of u. Specifically, we assume that
U_n′(x) = u^{(n)}(x) = f(x, u(x), u′(x), . . . , u^{(n−1)}(x)) = f(x, U_1(x), U_2(x), . . . , U_n(x)),
thus converting the original n-th order ODE into a system of first-order ODEs:
U_1′ = U_2,
U_2′ = U_3,
⋮
U_{n−1}′ = U_n,
U_n′ = f(x, U_1(x), U_2(x), . . . , U_n(x)).
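As an illustration of this reduction, the sketch below rewrites u″ = −u with u(0) = 1, u′(0) = 0 (whose solution is u(x) = cos x) as the system U_1′ = U_2, U_2′ = −U_1, and integrates it with the forward Euler method; the step size and horizon are illustrative choices.

```python
import math

# u'' = -u with u(0) = 1, u'(0) = 0 (solution u(x) = cos x), rewritten as
# the first-order system U1' = U2, U2' = -U1 and integrated by forward Euler.
h, steps = 1e-5, 100000          # up to x = 1
U1, U2 = 1.0, 0.0
for _ in range(steps):
    U1, U2 = U1 + h * U2, U2 - h * U1

assert abs(U1 - math.cos(1.0)) < 1e-4
```

Note that U_1 approximates u and U_2 approximates u′, so the same loop also yields the derivative of the solution for free.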
for some x0 ∈ I, then there exists a unique solution (local or global, depending on the
assumptions) satisfying these boundary conditions. As a consequence of this result, one can
for instance recover (and generalize) Proposition 8.17.
Proposition 8.33: Existence and Uniqueness for Linear 2nd Order ODEs
which has a unique solution once we prescribe the values of U1 (x0 ) and U2 (x0 ). Equivalently,
(8.25) has a unique solution once we prescribe the values of u(x0 ) and u′ (x0 ).
Now, let u1 denote the unique solution satisfying u1 (x0 ) = 1 and u′1 (x0 ) = 0, and let
u2 denote the unique solution satisfying u2 (x0 ) = 0 and u′2 (x0 ) = 1. Then, the function
u = Au1 + Bu2 is a solution of (8.25) for every A, B ∈ R.
Vice versa, if u ∈ C 2 (I) solves (8.25) and we set A = u(x0 ) and B = u′ (x0 ), then
v = u − Au1 − Bu2 is a solution of (8.25) satisfying v(x0 ) = v ′ (x0 ) = 0. By uniqueness v
must be identically zero (since the zero function is a solution), therefore u = Au1 + Bu2 as
desired.
a global Lipschitz condition) will provide a solid foundation for studying the generalized form of the Cauchy-Lipschitz Theorem for systems.