
Analysis

Adam Kelly (ak2316@cam.ac.uk)


July 4, 2021

At its heart, analysis is the study of ideas that depend on the notion of limits. The main
concepts of analysis (such as convergence, continuity, differentiation and integration)
will all depend quite fundamentally on a limiting process.
This article constitutes my notes for the ‘Analysis I’ course, held in Lent 2021 at Cambridge. These notes are not a transcription of the lectures, and differ significantly in quite a few areas. Still, all lectured material should be covered.¹

¹A tiny bit of analysis is assumed, namely the content covered in the ‘Numbers and Sets’ course at Cambridge. Specifically, the reader should be aware of least upper bounds/suprema, along with the least upper bound axiom. If the reader is unfamiliar with this content, they are referred to Chapter 2 of my Numbers and Sets notes.

Contents

1 Sequences and Convergence
  1.1 Limits & The Reals
  1.2 Bolzano–Weierstrass
  1.3 Cauchy Sequences & The General Principle of Convergence
  1.4 Limits of Functions

2 Infinite Series
  2.1 Convergent & Divergent Series
  2.2 Convergence Tests
    2.2.1 Series of Non-Negative Terms
    2.2.2 Alternating Series
  2.3 Absolute Convergence

3 Continuity
  3.1 Continuity of Functions
  3.2 The Intermediate Value Theorem
  3.3 The Boundedness Theorem

4 Differentiability
  4.1 Properties of Differentiable Functions
  4.2 Rolle’s Theorem & The Mean Value Theorem
  4.3 Inverses of Functions
  4.4 Taylor’s Theorem

5 Power Series
  5.1 Radius of Convergence
  5.2 Differentiating Power Series
  5.3 Exponential, Trigonometric & Hyperbolic Functions
    5.3.1 The Exponential & Logarithm Function
    5.3.2 The Trigonometric Functions
    5.3.3 The Hyperbolic Functions

6 Integration
  6.1 Dissections, Upper & Lower Sums, and Riemann Integrals
  6.2 Properties of Integration
  6.3 Integrable Functions
  6.4 The Fundamental Theorem of Calculus
  6.5 Taylor’s Theorem Again
  6.6 Improper Integrals & The Integral Test

§1 Sequences and Convergence


One of the fundamental objects of study in analysis is the sequence. In particular, in almost all areas of this course we will be concerned (either implicitly or explicitly) with the notion of convergence.

§1.1 Limits & The Reals


We will define convergence as follows.

Definition 1.1 (Convergence)


A sequence a1, a2, · · · ∈ R is said to converge to the limit a ∈ R if given any ε > 0, we can find an integer N such that |an − a| < ε for all n ≥ N. We write limn→∞ an = a, or an → a as n → ∞.

This definition is (notably) purely algebraic. We can sensibly define this notion of
convergence for any ordered field (for example, Q). What takes us from algebra to
analysis is the fundamental property of the real numbers.

Axiom 1.2 (Fundamental Axiom of Analysis)


If a1 , a2 , · · · ∈ R is an increasing sequence and there exists A ∈ R such that ai ≤ A
for all i ∈ N, then there exists a ∈ R such that an → a as n → ∞.

In other words: an increasing sequence of real numbers that is bounded above converges. This also clearly implies that a decreasing sequence of reals bounded below converges.
Limits obey the properties that you would naturally expect.

Proposition 1.3 (Uniqueness of Limits)


If an → a and an → b as n → ∞, then a = b.

Proof. Assume that a ≠ b, and let ε = |a − b|/3. We can find integers N1 and N2 such that

|an − a| ≤ ε for all n ≥ N1,
|an − b| ≤ ε for all n ≥ N2.

Taking N = max{N1, N2}, we have by the triangle inequality

|a − b| ≤ |an − a| + |an − b| ≤ 2ε = (2/3)|a − b|

for all n ≥ N. But |a − b| ≤ (2/3)|a − b| forces |a − b| = 0, contradicting a ≠ b. Thus a = b.

Proposition 1.4 (Convergence of Subsequences)


If an → a as n → ∞ and n(1) < n(2) < · · · , then an(j) → a as j → ∞.

Proof. We note that n(j) ≥ j for all j. Now an → a implies that given any ε > 0, we can find an N such that |an − a| < ε for all n ≥ N. But then |an(j) − a| < ε for all j ≥ N, since n(j) ≥ j ≥ N. So an(j) → a also.

Proposition 1.5 (Manipulating Limits)


Let an and bn be sequences. Then the following hold.
(i) If an → a and bn → b, then an + bn → a + b.
(ii) If an → a, then can → ca for any constant c.
(iii) If an → a and bn → b, then an bn → ab.
(iv) If an → a with a ≠ 0, and an ≠ 0 for all n, then 1/an → 1/a.

Proof. We prove each individually.

(i) Given some ε > 0, we can find integers Na and Nb such that |an − a| ≤ ε/2 for all n ≥ Na and |bn − b| ≤ ε/2 for all n ≥ Nb. Then letting N = max{Na, Nb}, by the triangle inequality we have

|(an + bn) − (a + b)| ≤ |an − a| + |bn − b| ≤ ε/2 + ε/2 = ε

for all n ≥ N. Thus an + bn → a + b.

(ii) If c = 0 this is immediate, so suppose c ≠ 0. Given ε > 0, we can find some N such that |an − a| ≤ ε/|c| for all n ≥ N. Then |can − ca| ≤ ε for all n ≥ N. So can → ca.

(iii) Given some ε > 0, we can find integers Na and Nb such that |an − a| ≤ √ε for all n ≥ Na and |bn − b| ≤ √ε for all n ≥ Nb. Then again letting N = max{Na, Nb}, we have

|(an − a)(bn − b)| ≤ ε

for all n ≥ N. Hence (an − a)(bn − b) → 0, so an bn − abn − ban + ab → 0, and using the previous two properties we have an bn → ab.


(iv) Since the sequence converges to a ≠ 0, there must be some r > 0 such that |an| ≥ r for all n (eventually |an| > |a|/2, and the finitely many remaining terms are non-zero). Then given some ε > 0, there exists N ∈ N such that |an − a| < ε|a|r for all n ≥ N. That is,

|1/an − 1/a| = |a − an|/|an a| < ε|a|r/(|a|r) = ε

for all n ≥ N. Hence 1/an → 1/a.

Proposition 1.6 (Squeeze Theorem)


Let an, bn and xn be sequences such that an ≤ xn ≤ bn for all n. Then if an → ℓ and bn → ℓ as n → ∞, we also have xn → ℓ.

Proof. Given some ε > 0, we can find integers Na and Nb such that |an − ℓ| < ε for all n ≥ Na and |bn − ℓ| < ε for all n ≥ Nb. Then letting N = max{Na, Nb}, we have ℓ − ε < an ≤ xn ≤ bn < ℓ + ε for all n ≥ N. That is, |xn − ℓ| < ε for all n ≥ N. Hence xn → ℓ.

With these results in our toolbox, we can then prove our first actual analysis result.

Lemma 1.7 (Axiom of Archimedes)


1/n → 0 as n → ∞.

Proof. The sequence 1/n is a decreasing sequence bounded below, and thus has a limit a. The sequence 1/(2n) = (1/2) · (1/n) then tends to the limit a/2; but since 1/(2n) is a subsequence of 1/n, it also tends to a. Thus a/2 = a, so a = 0 as required.

Now while this is an article about real analysis, we will frequently employ the complex numbers. Of course, to do that we need to be able to do some analysis with C, and indeed the definition of a limit still makes sense in C. All of the above properties also hold, apart from the squeeze theorem (we need to be careful here, because C cannot be ordered like R!).

§1.2 Bolzano–Weierstrass
An equivalent, important, and quite useful form of the fundamental axiom is the ‘Bolzano–Weierstrass theorem’: every bounded sequence has a convergent subsequence.

Theorem 1.8 (Bolzano-Weierstrass)


If an ∈ R and there exists K such that |an | ≤ K for all n, then we can find
n1 < n2 < n3 < · · · and a ∈ R such that anj → a as j → ∞.

Proof. Consider a real sequence an. We say that an element ai of the sequence is good if aj ≥ ai for all j ≥ i. If the sequence has infinitely many good elements, then taking them in order gives an increasing subsequence. If there are only finitely many good elements, then letting N be the index of the last good element, for any aj with j ≥ N + 1 there exists some ak with k > j such that ak < aj. Repeating this, we obtain a decreasing subsequence. Thus every real sequence has a monotonic subsequence. Since the sequence is bounded, this subsequence converges.

Remark. This theorem says nothing about the uniqueness of the subsequence’s limit.
For example, consider the sequence xn = (−1)n . Then x2n+1 → −1 and x2n → 1.
The proof given above is quite clean, but it is not the only standard proof of the theorem. Another common method involves ‘repeated bisection’, colloquially known as ‘lion hunting’. This method will be discussed shortly.
The Bolzano-Weierstrass theorem also holds in C (and the same proof method will show
with a little induction that it holds in Rn , but we won’t dwell on that in this article).

Theorem 1.9 (Bolzano-Weierstrass in C)


If an ∈ C is a bounded sequence, then it has a convergent subsequence.

Proof. Begin by writing an = xn + iyn . Then since an is bounded, the sequences


xn and yn are also bounded. We can apply Bolzano–Weierstrass to xn to get a
subsequence xn(1) , xn(2) , . . . with xn(j) → x as j → ∞, for some x ∈ R. Then we can
apply Bolzano-Weierstrass again to yn(j) to get a subsequence yn(k(1)) , yn(k(2)) , . . .
with yn(k(j)) → y as j → ∞ for some y ∈ R.
Thus xn(k(j)) → x and yn(k(j)) → y and thus an(k(j)) → x + iy as j → ∞, so we have
a convergent subsequence.

Before we end our discussion of the Bolzano–Weierstrass theorem, we will take a short digression on lion hunting (you need to know this proof though!).

Aside: How To Hunt Lions


The term ‘lion hunting’ most likely comes from the satirical paper ‘A Contribution to the Mathematical Theory of Big Game Hunting’, by a group of mathematicians under the pseudonym ‘H. Pétard’. They present the following method of hunting a lion (that we know exists) in the Sahara Desert.

    Bisect the desert by a line running N-S. The lion is either in the E portion or in the W portion; let us suppose him to be in the W portion. Bisect this portion by a line running E-W. The lion is either in the N portion or in the S portion; let us suppose him to be in the N portion. We continue this process indefinitely, constructing a sufficiently strong fence about the chosen portion at each step. The diameter of the chosen portions approaches zero, so that the lion is ultimately surrounded by a fence of arbitrarily small perimeter.

While this is tongue-in-cheek, it does give us a method of finding things by ‘bisecting intervals’, which we can apply to quite a few different theorems in analysis.
Now it’s time to prove Bolzano–Weierstrass by hunting lions.


Proof (Bolzano–Weierstrass, Lion Hunting Style). Let xn be our bounded sequence, with |xn| ≤ K for all n. We are going to define two sequences an and bn inductively as follows. Begin by setting [a1, b1] = [−K, K], and let c1 = (a1 + b1)/2 be the midpoint of this interval.
Then there are two possibilities:
1. xn ∈ [a1, c1] for infinitely many values of n.
2. xn ∈ [c1, b1] for infinitely many values of n.
Of course both of these can hold at the same time, but if the first one holds we set a2 = a1 and b2 = c1, and if it doesn’t then we set a2 = c1 and b2 = b1.
Repeating this process, we construct an and bn such that xm ∈ [an, bn] for infinitely many values of m.ᵃ Then we have an−1 ≤ an ≤ bn ≤ bn−1, and also bn − an = (bn−1 − an−1)/2.
Now an is increasing and bounded above, and bn is decreasing and bounded below, and thus an → a ∈ [a1, b1] and bn → b ∈ [a1, b1]. Taking limits in bn − an = (bn−1 − an−1)/2, we get b − a = (b − a)/2, and thus a = b.
Since xm ∈ [an, bn] for infinitely many values of m, we can construct a sequence nj such that nj+1 > nj and xnj+1 ∈ [aj+1, bj+1]. Then aj ≤ xnj ≤ bj, and thus xnj → a by the squeeze theorem.

ᵃThis is lion hunting! You can kind of imagine that we are hunting for a number that a subsequence converges to, using the fact that there must be infinitely many terms near that number.
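As a quick aside (this code is my addition, not lectured material), the bisection in this proof is easy to simulate. A computer cannot check that a half contains infinitely many terms, so the sketch below uses a finite proxy: taking the first 10,000 terms of the bounded sequence xn = (−1)ⁿ(1 + 1/n), it repeatedly bisects [−K, K], keeping the half that contains more of the sampled terms. The nested intervals close in on −1, a subsequential limit.

    # A finite simulation of the lion hunting proof of Bolzano-Weierstrass.
    # We cannot test 'infinitely many terms' on a computer, so as a proxy we
    # keep whichever half contains more of the first 10,000 terms.
    xs = [(-1) ** n * (1 + 1 / n) for n in range(1, 10001)]  # bounded, K = 2

    a, b = -2.0, 2.0  # [a1, b1] = [-K, K]
    for _ in range(40):
        c = (a + b) / 2
        left = sum(1 for x in xs if a <= x <= c)
        right = sum(1 for x in xs if c <= x <= b)
        if left >= right:
            b = c  # keep [a, c]
        else:
            a = c  # keep [c, b]

    print(a, b)  # both end up near -1, a limit of a subsequence of x_n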

§1.3 Cauchy Sequences & The General Principle of Convergence


So far we have defined the notion of a sequence converging to some explicit limit. How-
ever, it is possible to determine if a sequence converges without considering this limit
explicitly. This is done by considering how ‘close’ the terms in the sequence eventually
get, as we shall see.
We begin by describing what it means for a sequence to be Cauchy.

Definition 1.10 (Cauchy Sequence)


A sequence a1, a2, · · · ∈ C is said to be Cauchy if for every ε > 0 there exists an integer N such that for all n, m ≥ N we have |an − am| ≤ ε.

We can almost immediately write down our first lemma.

Lemma 1.11 (Convergence Implies Cauchy)


If a sequence converges then it is Cauchy.

Proof. Consider a convergent sequence an → a. Given ε > 0, we can find N such that |an − a| ≤ ε/2 for all n ≥ N. Then by the triangle inequality we have

|an − am| ≤ |an − a| + |a − am| ≤ ε/2 + ε/2 = ε

for all m, n ≥ N. Thus the sequence is Cauchy.

The converse of this result is also true! This gives us a powerful result about convergence,
which is quite widely applicable (particularly because we can avoid talking about the
limit explicitly, as we said before).

Lemma 1.12 (Completeness)


If a sequence is Cauchy then it converges.

Proof. Let a1, a2, . . . be a Cauchy sequence. We will first show that the sequence is bounded. Because the sequence is Cauchy, we can find an integer N ∈ N such that n, m ≥ N implies that |an − am| ≤ 1. This implies that if n ≥ N we have

|an| ≤ |an − aN| + |aN| ≤ 1 + |aN|.

Thus we have |an| ≤ max{|a1|, . . . , |aN|} + 1 for all n, so the sequence is bounded.
Applying the Bolzano–Weierstrass theorem, we then have a subsequence an1, an2, . . . with anj → a as j → ∞.
So given ε > 0, we can find M such that |anj − a| ≤ ε/2 for all nj ≥ M. We can also find M′ such that n, m ≥ M′ implies |an − am| < ε/2.
Then for all n ≥ M′, choosing nj such that nj ≥ max{M, M′}, we have

|an − a| ≤ |an − anj| + |anj − a| ≤ ε/2 + ε/2 = ε.

Thus the sequence converges (namely to a).

Stripping away the detail, this result holds because Cauchy sequences are bounded, so we can extract a convergent subsequence. Then, since all of the terms eventually get arbitrarily close together, the whole sequence must converge along with the subsequence.
Combining these two lemmas gives us the ‘general principle of convergence’, a result also
known as Cauchy’s criterion.

Theorem 1.13 (General Principle of Convergence/Cauchy’s Criterion)


A sequence converges if and only if it is a Cauchy sequence.

§1.4 Limits of Functions


So far we have developed the notion of a limit of a sequence, but we can sensibly come up with a definition for what limits are in the context of functions. We will need to do a little bit of setup first, however.
We first need to be able to distinguish the points of C which can be approached by a sequence of elements that all lie in some set A.

Definition 1.14 (Limit Point)


Let A ⊆ C and a ∈ C. We say that a is a limit point of A if for any δ > 0 there is some z ∈ A such that 0 < |z − a| < δ.

We can then define a limit of a function by its behaviour as we approach a given limit
point.

Definition 1.15 (Limit of a Function)


Let A ⊆ C and let a ∈ C be a limit point of A. Then for f : A → C, we say that the limit of f as z approaches a is ℓ, written limz→a f(z) = ℓ, if given any ε > 0 there is some δ > 0 such that whenever 0 < |z − a| < δ and z ∈ A, we have |f(z) − ℓ| < ε.

These definitions should match the informal notion of a limit that would have been given in a calculus course. The reader should note that there is no requirement that f(a) = ℓ, or even that f be defined at a at all.
There is a natural relation between these notions of a limit point and of the limit of a function, and our previous definition of limits of sequences.

Proposition 1.16 (Sequence Definition of Limit Points)


Let A ⊆ C and a ∈ C. Then a is a limit point of A if and only if there is a sequence zn ∈ A such that zn → a as n → ∞ and zn ≠ a for all n.

Proof. If a is a limit point, then for every positive integer n we can take δ = 1/n and obtain a zn ∈ A such that 0 < |zn − a| < 1/n. Then by the squeeze theorem zn → a as n → ∞, and also zn ≠ a for all n.
Conversely, if there is such a sequence zn ∈ A, then given δ > 0 there is an N such that zN ∈ A and 0 < |zN − a| < δ, so a is a limit point of A.

Proposition 1.17 (Sequence Definition of Limits)


Let A ⊆ C and let a ∈ C be a limit point of A. Then for f : A → C, we have limz→a f(z) = ℓ if and only if every sequence zn ∈ A with zn ≠ a and zn → a has f(zn) → ℓ.

Proof. If limz→a f(z) = ℓ, consider a sequence zn ∈ A where zn ≠ a and zn → a. Given ε > 0, there exists δ > 0 such that 0 < |z − a| < δ and z ∈ A implies |f(z) − ℓ| < ε.
Also, there exists N such that n ≥ N implies 0 < |zn − a| < δ. Thus for n ≥ N we have |f(zn) − ℓ| < ε, so limn→∞ f(zn) = ℓ.
Conversely, if limz→a f(z) were not ℓ, then there would be some ε > 0 such that for every δ > 0 there is a z ∈ A with 0 < |z − a| < δ but |f(z) − ℓ| ≥ ε. Then taking δ = 1/n for each positive integer n, we could construct a sequence zn ∈ A with zn ≠ a and 0 < |zn − a| < 1/n but |f(zn) − ℓ| ≥ ε for all n. But then zn → a and f(zn) does not tend to ℓ, which is a contradiction.

In practice, which of these definitions is more useful will depend on context; however, the sequence definition does allow us to use the results we developed at the start of this section.

Proposition 1.18 (Properties of Limits of Functions)


Let A ⊆ C and let a be a limit point of A. Then if f, g : A → C are functions, the following hold.
(i) If limz→a f(z) = ℓ and limz→a g(z) = m, then limz→a (f(z) + g(z)) = ℓ + m.
(ii) If limz→a f(z) = ℓ, then limz→a cf(z) = cℓ for any constant c.
(iii) If limz→a f(z) = ℓ and limz→a g(z) = m, then limz→a f(z)g(z) = ℓm.
(iv) If limz→a f(z) = ℓ and ℓ ≠ 0, then limz→a 1/f(z) = 1/ℓ.

Proof. These all follow directly from the sequence definition of limits along with
Proposition 1.5.


§2 Infinite Series
The notion of convergence lets us talk quite sensibly about what it would mean to add
up an infinite number of things, which is quite exciting.

§2.1 Convergent & Divergent Series


The definition of convergence for an infinite series is rather natural, and comes from
considering the sequence of partial sums.

Definition 2.1 (Convergence of an Infinite Series)


For a sequence aj ∈ C, we say that the series ∑_{j=1}^∞ aj converges to S if the sequence of partial sums converges to S. That is, ∑_{j=1}^n aj → S as n → ∞. Otherwise, we say that it diverges.

Remark (Notation). If a series converges, then we will typically write ∑_{j=1}^∞ aj = S, but some care is needed, as if the series does not converge then this is nonsense. We will also write Sn to denote the nth partial sum of a series, if it is clear from context what series is being referred to.
In this chapter we will be primarily concerned with finding ways to show whether a given
series converges. But first, a few preliminaries.

Proposition 2.2 (Adding Series)


If two series converge, say ∑_{j=1}^∞ aj = a and ∑_{j=1}^∞ bj = b, then ∑_{j=1}^∞ (λaj + µbj) also converges, and we have

∑_{j=1}^∞ (λaj + µbj) = λa + µb.

Proof. Considering the sequence of partial sums, we have

∑_{j=1}^n (λaj + µbj) = λ ∑_{j=1}^n aj + µ ∑_{j=1}^n bj → λa + µb

as n → ∞, as required.

It should also be intuitively clear that the first few terms of an infinite series will not
affect its convergence.

Proposition 2.3 (Ignoring Initial Terms)


Suppose there exists N such that aj = bj for all j ≥ N. Then either ∑_{j=1}^∞ aj and ∑_{j=1}^∞ bj both converge or they both diverge.

Proof. For n ≥ N, we have ∑_{j=1}^n aj − ∑_{j=1}^n bj = ∑_{j=1}^N aj − ∑_{j=1}^N bj, which is a constant. Thus the sequences of partial sums either both converge or both diverge.

We can also apply the general principle of convergence to infinite series. This gives us another useful way of proving that a series converges, particularly when we don’t know what the series converges to (which is often the case).

Theorem 2.4 (General Principle of Convergence/Cauchy’s Criterion for Series)


The infinite series ∑_{j=1}^∞ aj converges if and only if for every ε > 0 there is an integer N such that for all m ≥ n ≥ N we have |∑_{j=n}^m aj| ≤ ε.

Proof. This follows directly from the definition of the sequence of partial sums being
Cauchy.

§2.2 Convergence Tests


In this section we will develop some ‘tests’ for convergence, so that given some infinite series, we have some tools to try and determine relatively quickly whether it converges or diverges.
We begin with the observation that if the terms of the series do not tend to 0, then the
series must diverge.

Proposition 2.5 (Limit of Terms in a Convergent Series)


Suppose that ∑_{j=1}^∞ aj converges. Then aj → 0 as j → ∞.

Proof. Let Sn = ∑_{j=1}^n aj. Then Sn → S for some S, and thus we must also have Sn+1 → S. Subtracting these sequences, we get Sn+1 − Sn → 0, that is, an+1 → 0 as n → ∞, as required.

Somewhat unfortunately, while this condition is necessary, it is not sufficient for convergence. The most commonly cited counterexample to this is the harmonic series.

Example 2.6 (Divergence of the Harmonic Series)


Consider the infinite series with ak = 1/k. Then ak → 0, but ∑_{k=1}^∞ ak diverges.
To see this, let Hn = ∑_{k=1}^n ak be the nth partial sum of the series. Then we have

H2n = Hn + 1/(n + 1) + 1/(n + 2) + · · · + 1/(n + n)
    ≥ Hn + 1/(2n) + 1/(2n) + · · · + 1/(2n)   (n terms)
    = Hn + 1/2.

So if the series converged, we would have Hn → a and H2n → a, giving a ≥ a + 1/2, which is a contradiction.
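A quick numerical sketch (my addition, not part of the notes) makes the blocking argument visible: each doubling of the range adds at least 1/2 to the partial sum, so H_(2^k) ≥ 1 + k/2 grows without bound even though the terms tend to 0.

    # Partial sums of the harmonic series at n = 2^k: each doubling of the
    # range adds at least 1/2, so H_(2^k) >= 1 + k/2 and the sums are unbounded.
    H, n = 0.0, 0
    for k in range(21):
        while n < 2 ** k:
            n += 1
            H += 1 / n
        print(k, round(H, 4), 1 + k / 2)  # H_(2^k) always >= the bound 1 + k/2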

The result about the limit of terms in a series can be helpful though! For example, let’s
consider a series you are likely familiar with.

Example 2.7 (Convergence/Divergence of the Geometric Series)

Consider the infinite series ∑_{j=1}^∞ x^{j−1}. Writing its partial sum as Sn = ∑_{j=1}^n x^{j−1}, a little bit of algebra gives us that

Sn = (1 − xⁿ)/(1 − x) if x ≠ 1,   and   Sn = n if x = 1.

So if |x| < 1, then xⁿ → 0 and Sn → 1/(1 − x). Otherwise, if |x| ≥ 1, then the series cannot converge, since the terms do not tend to 0.

It’s best to think of Proposition 2.5 as giving a sort of ‘bare minimum’ property that a series has to have in order to converge. Of course, it’s also quite helpful for sanity checks!

§2.2.1 Series of Non-Negative Terms


We just saw a necessary condition for convergence; now let’s have a look at some sufficient conditions. For the time being we will restrict ourselves to considering infinite series where all of the terms are real and non-negative.
The first result we have is a direct consequence of our fundamental axiom.

Theorem 2.8 (Bounded Partial Sums)


If an is a non-negative sequence and the partial sums Sn are bounded above, then ∑_{n=1}^∞ an converges.

Proof. Since all of the terms are non-negative, Sn is a bounded and increasing
sequence, and thus converges by our fundamental axiom.

In a similar vein is the comparison test, which allows us to show convergence by com-
paring a series with another series whose convergence we know.

Theorem 2.9 (The Comparison Test)


Suppose that 0 ≤ aj ≤ bj for all j. Then if ∑_{j=1}^∞ bj converges, so does ∑_{j=1}^∞ aj.

Proof. Since Bn = ∑_{j=1}^n bj is an increasing sequence whose limit is B say, we have Bn ≤ B for all n. Thus ∑_{j=1}^n aj ≤ Bn ≤ B, so we have a series of non-negative terms whose partial sums are bounded. Thus ∑_{j=1}^∞ aj converges.

Example 2.10 (Using the Comparison Test)


We will prove that ∑_{n=1}^∞ 1/n² converges using the comparison test.
Since 1/n² < 1/(n(n − 1)) for n ≥ 2, and ∑_{n=2}^∞ 1/(n(n − 1)) converges (as ∑_{n=2}^N 1/(n(n − 1)) = 1 − 1/N → 1 as N → ∞), we get that ∑_{n=1}^∞ 1/n² converges by comparison.
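The telescoping identity driving this comparison is easy to check numerically; the following small sketch (my addition) compares the partial sums of ∑ 1/(n(n − 1)) against the closed form 1 − 1/N.

    # Verify the telescoping identity sum_{n=2}^N 1/(n(n-1)) = 1 - 1/N,
    # which drives the comparison test argument for sum 1/n^2.
    for N in [10, 100, 1000]:
        partial = sum(1 / (n * (n - 1)) for n in range(2, N + 1))
        print(N, partial, 1 - 1 / N)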

Using the comparison test we can derive two more useful tests, both of which come
from comparing a series with the geometric series. You’ll notice that the proofs for both
results are relatively similar!


Theorem 2.11 (The Root Test)

Let an be a non-negative sequence and suppose an^{1/n} → a as n → ∞. Then if a < 1, ∑_{n=1}^∞ an converges, and if a > 1, ∑_{n=1}^∞ an diverges.

Proof. If a < 1, we can choose b such that a < b < 1, and there exists an integer N such that for all n ≥ N we have an^{1/n} < b, that is, an < bⁿ. But then ∑_{n=N}^∞ bⁿ converges since b < 1. Thus by comparison ∑_{n=1}^∞ an converges too.
If a > 1, then an^{1/n} > 1 for all sufficiently large n, which implies an > 1. But then the terms in the series do not tend to zero, and thus ∑_{n=1}^∞ an diverges.

Theorem 2.12 (The Ratio Test)

Let an be a positive sequence and suppose that an+1/an → ℓ as n → ∞. Then if ℓ < 1, ∑_{n=1}^∞ an converges, and if ℓ > 1, ∑_{n=1}^∞ an diverges.

Proof. If ℓ < 1, we can choose b such that ℓ < b < 1, and there exists an integer N such that an+1/an < b for all n ≥ N. Therefore

an = (an/an−1) · (an−1/an−2) · · · (aN+1/aN) · aN < b^{n−N} aN = (aN b^{−N}) bⁿ

for n > N. But then ∑_{n=N+1}^∞ bⁿ converges as b < 1, and since aN b^{−N} is a constant, ∑_{n=1}^∞ an converges by comparison.
If ℓ > 1, then an+1 > an for all sufficiently large n, so the terms in the series do not tend to zero, and thus ∑_{n=1}^∞ an diverges.

Remark (A Deadly Sin). In both the root and ratio tests, if we find that either a = 1 or ℓ = 1, we cannot draw any conclusions about the convergence of the series. To see this, consider ∑ 1/n and ∑ 1/n².
Some examples of using the root and ratio tests are shown below, but we won’t do too many, since there are most likely a few already on your example sheets (and there’s not really much to the basic technique).

Example 2.13 (Using the Ratio Test)

We will prove that ∑_{n=1}^∞ n/2ⁿ converges using the ratio test.
Let an = n/2ⁿ. Then we have an+1/an = ((n + 1)/2^{n+1}) · (2ⁿ/n) = (n + 1)/(2n), and thus an+1/an → 1/2 < 1 as n → ∞. Then by the ratio test ∑_{n=1}^∞ n/2ⁿ converges.

Example 2.14 (Using the Root Test)

We will prove that ∑_{n=1}^∞ ((n + 1)/(3n + 5))ⁿ converges using the root test.
Let an = ((n + 1)/(3n + 5))ⁿ. Then an^{1/n} = (n + 1)/(3n + 5), and thus an^{1/n} → 1/3 < 1 as n → ∞. Then by the root test ∑_{n=1}^∞ ((n + 1)/(3n + 5))ⁿ converges.
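Both tests are mechanical enough to check numerically. The sketch below (my addition) computes the ratios an+1/an and the roots an^{1/n} for the two examples above; they settle near 1/2 and 1/3 respectively, and the partial sums behave accordingly.

    # Numerical check of Examples 2.13 and 2.14: the ratios of n/2^n terms
    # approach 1/2, and the nth roots of ((n+1)/(3n+5))^n approach 1/3.
    ratio_terms = [n / 2 ** n for n in range(1, 41)]
    root_terms = [((n + 1) / (3 * n + 5)) ** n for n in range(1, 41)]

    n = 30
    print(ratio_terms[n] / ratio_terms[n - 1])  # a_31 / a_30, close to 1/2
    print(root_terms[n - 1] ** (1 / n))         # a_30^(1/30), close to 1/3
    print(sum(ratio_terms), sum(root_terms))    # partial sums: ~2 and finite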

In the case that our sequence of non-negative terms is decreasing, we can determine its convergence by looking at a related series involving a relatively sparse subsequence of the terms.

Theorem 2.15 (Cauchy’s Condensation Test)

If an is a decreasing sequence of positive numbers, then ∑_{n=1}^∞ an and ∑_{n=0}^∞ 2ⁿ a_{2ⁿ} either both converge or both diverge.

Proof. It suffices to show that the partial sums of both series are either both bounded or both unbounded. Let Sn = ∑_{j=1}^n aj and Tn = ∑_{j=0}^n 2ʲ a_{2ʲ} be the partial sums of each series.
If n < 2^{k+1}, we have

Sn ≤ a1 + (a2 + a3) + · · · + (a_{2^k} + · · · + a_{2^{k+1}−1})
   ≤ a1 + 2a2 + · · · + 2^k a_{2^k} = Tk,

thus Sn ≤ Tk. So if Tn is bounded, so is Sn. Also, if n ≥ 2^k, we have

Sn ≥ a1 + a2 + (a3 + a4) + · · · + (a_{2^{k−1}+1} + · · · + a_{2^k})
   ≥ (1/2)a1 + a2 + 2a4 + · · · + 2^{k−1} a_{2^k} = (1/2)Tk,

so 2Sn ≥ Tk. So if Sn is bounded, so is Tk. Thus Sn and Tn are either both bounded or both unbounded.

You may notice that this proof somewhat resembles our proof that the harmonic series
diverges, and indeed you can show the following stronger result.

Theorem 2.16 (Convergence of ∑ n^{−α})

∑_{n=1}^∞ 1/n^α converges if and only if α > 1.

Proof. If α ≤ 0, then the terms in the series do not tend to zero, so the series diverges. For α > 0 we apply Cauchy’s condensation test: the series converges if and only if the series

∑_{k=0}^∞ 2^k · 1/2^{αk} = ∑_{k=0}^∞ 2^{(1−α)k}

converges. By comparison with the geometric series, this will converge if and only if 2^{1−α} < 1, that is, α > 1.

In a way, Cauchy’s condensation test is slightly different to the other convergence tests, since it just gives us a different series that may be easier to show convergence or divergence for.
Still, it is natural to reach for Cauchy’s condensation test when the root or ratio tests fail. For example, none of the series below can be shown to converge or diverge using the root or ratio tests:

∑_{n=1}^∞ 1/n^α,   ∑_{n=2}^∞ 1/(n log n),   ∑_{n=2}^∞ 1/(log n)²,   ∑_{n=2}^∞ 1/(n(log n)²),   and   ∑_{n=2}^∞ (log n)/n².

If you try, you will find that these tests are inconclusive. However, all of them can be shown to converge or diverge using Cauchy’s condensation test.²

²You could also use the integral test (which we have not discussed), but that’s most likely going to be slower, as you have to integrate things.
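To see the condensation test do some work, here is a small numerical sketch (my addition): for an = 1/(n(log n)²) the condensed series visibly stabilises, while for an = 1/(n log n) it grows like a multiple of the harmonic series, exactly as the test predicts.

    import math

    # Cauchy condensation numerically: 2^k a_(2^k) equals 1/(k log 2)^2 for
    # a_n = 1/(n (log n)^2) (convergent), and 1/(k log 2) for a_n = 1/(n log n)
    # (divergent, a multiple of the harmonic series).
    def condensed(a, K):
        return sum(2 ** k * a(2 ** k) for k in range(1, K + 1))

    conv = lambda n: 1 / (n * math.log(n) ** 2)
    div = lambda n: 1 / (n * math.log(n))

    for K in [10, 20, 40]:
        print(K, condensed(conv, K), condensed(div, K))
    # First column stabilises (convergence); second keeps growing (divergence).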

§2.2.2 Alternating Series


We will now look at a convergence test for series whose terms alternate from positive to
negative.

Theorem 2.17 (Alternating Series Test)

If an is a decreasing sequence of positive numbers with an → 0 as n → ∞, then ∑_{j=1}^∞ (−1)^{j+1} aj converges.

Proof. Let Sn = ∑_{j=1}^n (−1)^{j+1} aj denote the nth partial sum of the series. Then S2n = S2n−2 + (a2n−1 − a2n) ≥ S2n−2. Also

S2n = a1 − (a2 − a3) − · · · − (a2n−2 − a2n−1) − a2n ≤ a1.

Therefore S2n is increasing and bounded above, and thus converges, say S2n → S as n → ∞. Then S2n+1 = S2n + a2n+1 → S + 0 = S as n → ∞. Thus Sn → S, and ∑_{j=1}^∞ (−1)^{j+1} aj converges.

Example 2.18

We will prove that ∑_{n=1}^∞ (−1)^{n+1}/n converges using the alternating series test.ᵃ
We note that 1/n is a decreasing sequence of positive numbers with 1/n → 0. Thus by the alternating series test, ∑_{n=1}^∞ (−1)^{n+1}/n converges.

ᵃLater on we will see that this is equal to log 2.
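As a numerical aside (my addition), the partial sums of this series oscillate, with consecutive partial sums trapping the limit; and the limit agrees with log 2, as claimed in the footnote.

    import math

    # Partial sums of the alternating harmonic series: the even-indexed sums
    # increase and the odd-indexed sums decrease, trapping the limit log 2.
    S, partial = 0.0, []
    for n in range(1, 10001):
        S += (-1) ** (n + 1) / n
        partial.append(S)

    print(partial[-1], "< log 2 <", partial[-2])  # S_10000 < log 2 < S_9999
    print("log 2 =", math.log(2))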

§2.3 Absolute Convergence


To finish our discussion of infinite series (for now!), we will introduce the stronger notion of absolute convergence. The main motivation for introducing this will become clear as we develop some of its properties, but informally, knowing that a series converges absolutely will allow us to be a little less careful when working with the series.

Definition 2.19 (Absolute Convergence)


For a sequence an ∈ C, we say that the series ∑_{n=1}^∞ an converges absolutely if the series ∑_{n=1}^∞ |an| converges.



We can see that absolute convergence is a strictly stronger property than ‘regular’ con-
vergence. Indeed absolute convergence implies convergence, but not the other way round.

Theorem 2.20 (Absolute Convergence Implies Convergence)


If ∑_{j=1}^∞ aj converges absolutely, then it converges.

Proof. By the triangle inequality we have

|∑_{j=n}^m aj| ≤ ∑_{j=n}^m |aj|,

and since ∑_{j=1}^∞ |aj| converges, by Cauchy’s criterion the right-hand side is at most ε for all m ≥ n ≥ N. Thus ∑_{j=1}^∞ aj converges, again by Cauchy’s criterion.

To see that the converse isn’t true (and that this is indeed a strictly stronger notion), consider the series ∑_{n=1}^∞ (−1)^{n+1}/n, which converges but does not converge absolutely. We sometimes say such series converge conditionally.
When trying to determine whether a series converges absolutely, since |an| ≥ 0 we are free to apply all of the results that we developed in the previous subsection. We will see this in the next example.

Example 2.21 (Showing Absolute Convergence)


Consider the infinite series ∑_{n=1}^∞ zⁿ/2ⁿ. We will determine for which values of z this infinite series converges absolutely.
By the root test, we know that ∑_{n=1}^∞ |zⁿ/2ⁿ| converges if |z| < 2 and diverges if |z| > 2. Also, for |z| = 2 we can see that the terms of the series do not go to zero, so the series diverges.
Thus ∑_{n=1}^∞ zⁿ/2ⁿ converges absolutely for |z| < 2 and diverges for |z| ≥ 2.

Now it was claimed at the start of this subsection that absolute convergence allows us
to be a ‘little less careful’. Allow me to elaborate on this.
In general, you must be very careful when manipulating infinite series. To see why,
consider this informal example.
1 − 1/2 + 1/3 − 1/4 + 1/5 − 1/6 + · · · = log 2
1 + 1/3 − 1/2 + 1/5 + 1/7 − 1/4 + · · · = (3/2) log 2
The two series above both converge, and they both have all of the same terms. The
only difference is that the terms are in a different order, and this change has completely
altered the value of the series.
What’s nice about absolutely convergent series is that we don’t have to worry about this.

Definition 2.22 (Rearrangement)


A rearrangement of the series ∑_{n=1}^∞ an is another series ∑_{n=1}^∞ a_{σ(n)}, where σ : N → N is a bijection.

Theorem 2.23 (Rearranging Absolutely Convergent Series)

If ∑_{j=1}^∞ aj is absolutely convergent, then any rearrangement of the series will converge to the same value.

Proof. Let ∑_{j=1}^∞ a_{σ(j)} be a rearrangement of the series, and write S = ∑_{j=1}^∞ aj. Given ε > 0, we wish to show that there exists an integer N such that |S − ∑_{j=1}^n a_{σ(j)}| < ε for all n ≥ N.
By Cauchy’s criterion there exists an integer m such that ∑_{j=m+1}^∞ |aj| < ε. We then choose N such that {a1, a2, . . . , am} ⊆ {a_{σ(1)}, a_{σ(2)}, . . . , a_{σ(N)}}. This can be done by setting N = max_{1≤i≤m} σ^{−1}(i). Then if n ≥ N we have

S − ∑_{j=1}^n a_{σ(j)} = ∑_{j∈An} aj,

where An ⊆ {m + 1, m + 2, . . . }. Then applying the triangle inequality, we have

|S − ∑_{j=1}^n a_{σ(j)}| = |∑_{j∈An} aj| ≤ ∑_{j=m+1}^∞ |aj| < ε,

as required.

To see just how badly this fails for series that do not converge absolutely, we just need to read the statement of Riemann’s rearrangement theorem.

Theorem 2.24 (Riemann’s Rearrangement Theorem)


Let aj ∈ R. Suppose that ∑_{j=1}^∞ aj is convergent but not absolutely convergent. Then given any x ∈ R, there is a rearrangement of this series such that ∑_{j=1}^∞ a_{σ(j)} = x.

Proof. Left as an exercise – it’s on the example sheet!
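The construction behind this theorem is a simple greedy algorithm: add positive terms until the partial sum exceeds x, then add negative terms until it drops below x, and repeat. Below is a minimal sketch of this (my addition, using the alternating harmonic series and the arbitrarily chosen target x = 0.5).

    # Greedy rearrangement of the conditionally convergent series
    # sum (-1)^(n+1)/n, rearranged so its partial sums converge to x = 0.5.
    x = 0.5
    pos = (1 / n for n in range(1, 10 ** 7, 2))   # positive terms 1, 1/3, ...
    neg = (-1 / n for n in range(2, 10 ** 7, 2))  # negative terms -1/2, -1/4, ...

    S = 0.0
    for _ in range(10 ** 5):
        # Step upwards with positive terms, downwards with negative ones.
        S += next(pos) if S <= x else next(neg)

    print(S)  # close to 0.5; the terms used shrink, so the error tends to 0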


§3 Continuity
The next idea we will develop is that of ‘continuity’.

§3.1 Continuity of Functions


Continuity is all about the behaviour of a function around a point. For a function to be continuous at some x, we need f(y) to be close to f(x) whenever y is sufficiently close to x.

Definition 3.1 (Continuity)


Let A ⊆ C and f : A → C. We say that f is continuous at a ∈ A if given any ε > 0 we can find a δ > 0 such that |f(a) − f(y)| < ε for all y ∈ A such that |a − y| < δ. We say that f is continuous if it is continuous at every a ∈ A.

You may notice that this definition uses exactly the definition of the limit of a function
mentioned in subsection 1.4. This immediately gives us two more equivalent definitions
of continuity.

Proposition 3.2 (Limit Definition of Continuity)


Let f : A → C. Then f is continuous at a ∈ A if and only if limz→a f(z) = f(a).

Proposition 3.3 (Sequence Definition of Continuity)


Let f : A → C. Then f is continuous at a ∈ A if and only if for every sequence
an ∈ A with an → a we have f (an ) → f (a).

Each of the definitions has various advantages and disadvantages. Still, it is worth noting that the sequential and limit definitions of continuity can be used to quite easily show certain properties of continuity that would otherwise be quite fiddly with the ε-δ definition.
Using sequential continuity is also a natural way to show that a function is not continuous
at a point. Let’s take a look at some examples.

Example 3.4 (The Constant Function is Continuous)


If f (x) = c then f is continuous. To see this, we can take any value of δ in the
definition.

Example 3.5 (f (x) = x is Continuous)


If f(x) = x then f is continuous. To see this, we can take δ = ε in the definition.

Example 3.6 (Step Functions are Discontinuous)

Consider the function f : R → R with

f(x) = −1 if x < 1,   f(x) = 1 if x ≥ 1.

This function is not continuous at 1. To see this, note that 1 − 1/n → 1 but f(1 − 1/n) → −1 ≠ f(1) as n → ∞.

Example 3.7 (The Domain Matters)


Consider the function f : Q → R with

f(x) = 1 if x² > 2,   f(x) = 0 if x² < 2.

This function is continuous, since for every a ∈ Q there is an interval about a on which f is constant, so f is continuous at a. If f were defined on R instead of Q, then it would be discontinuous only at ±√2. But these points are not in our domain, so we don’t need to worry about them.

Example 3.8 (Continuity of sin 1/x)


Consider the functionᵃ f : R → R with f(x) = sin(1/x) if x ≠ 0 and f(0) = 0.
This function is not continuous at x = 0. To see this, consider the sequence an = 1/(2πn + π/2). Then an → 0, but f(an) = 1 → 1 ≠ f(0) as n → ∞.

ᵃI know that we haven’t defined what sin is, but we will use it in examples because they are instructive. If this bothers you, feel free to come back after our discussion of power series.

Example 3.9 (A Functional Equation with Continuity)


We wish to find all continuous functions f : R → R such that f(x) + f(2x) = 0.
Letting x = 0, we get f(0) = 0. Then rearranging, we have f(x) = −f(x/2), and repeatedly using this gives

f(x) = −f(x/2) = f(x/4) = −f(x/8) = · · · = (−1)ⁿ f(x/2ⁿ).

Then since f is continuous, we have f(x) = limn→∞ (−1)ⁿ f(x/2ⁿ) = f(0) = 0. Thus f(x) = 0 is the only such function, which clearly works.

When attempting to determine if a function is continuous, one should keep in mind the
following properties of continuity (all of which follow directly from our basic properties
of limits).

Proposition 3.10 (Sums, Products and Reciprocals of Continuous Functions)

Let f, g : A → C be functions. Then the following hold.
(i) If f and g are continuous at a, then so is their sum f + g.
(ii) If f and g are continuous at a, then so is their product f g.
(iii) If f is continuous at a and f(a) ≠ 0, then so is 1/f.

Proof. We prove each individually using the limit definition of continuity.

(i) If limz→a f(z) = f(a) and limz→a g(z) = g(a), then limz→a (f + g)(z) = (f + g)(a).
(ii) If limz→a f(z) = f(a) and limz→a g(z) = g(a), then limz→a (fg)(z) = (fg)(a).
(iii) If limz→a f(z) = f(a) and f(a) ≠ 0, then limz→a 1/f(z) = 1/f(a).

Along with the fact that f (x) = x and f (x) = c are continuous, these properties imply
that all polynomials are continuous.

Proposition 3.11 (The Composition of Continuous Functions is Continuous)


Let f : A → B and g : B → C with A, B ⊆ C. If f is continuous at a ∈ A and g is continuous at f(a), then g ◦ f is also continuous at a.

Proof. If an → a then by the continuity of f we have f (an ) → f (a), and by the


continuity of g we have g(f (an )) → g(f (a)). So g ◦ f is continuous at a.

§3.2 The Intermediate Value Theorem


Continuous functions have a number of nice properties. The first of these that we will prove is the intermediate value theorem, a central theorem in elementary analysis. The idea of the intermediate value theorem is that if a continuous function starts at one value and ends at a different value, then it must take on all of the values in between.

[Figure: the graph of a continuous function f on [a, b], showing a point c between a and b where f(c) = t, for a value t between f(a) and f(b).]

In the proof below we will employ suprema,³ but we will give another proof afterwards that does not use this (at the expense of being slightly longer).

³If you are unfamiliar with this term and/or the least upper bound principle, feel free to have a look at Chapter 2 of my ‘Numbers and Sets’ course notes.

Theorem 3.12 (The Intermediate Value Theorem)


Let f : [a, b] → R be a continuous function. Then for every t between f(a) and f(b), there is a c ∈ [a, b] with f(c) = t.

Proof. Suppose without loss of generalityᵃ that f(a) < t < f(b). Consider the set S = {x ∈ [a, b] : f(x) < t}. This set is bounded, and since a ∈ S it is non-empty. So we can let c = sup S, and note that a ≤ c ≤ b.
Let n ≥ 1 be an integer. Then since c is the supremum, there must exist some xn ∈ S such that c − 1/n < xn ≤ c. By the squeeze theorem we have xn → c as n → ∞. Also f is continuous, so f(xn) → f(c). We also constructed xn to be in S, giving f(xn) < t for all n. This implies that f(c) ≤ t.
We know that c ∈ [a, b] but c ≠ b, so there is some integer N such that for n ≥ N we have c + 1/n ≤ b. Using that c is the supremum, we have c + 1/n ∉ S for all n ≥ N, that is, f(c + 1/n) ≥ t for all n ≥ N. Then by continuity we have f(c) ≥ t.
But then f(c) ≥ t and f(c) ≤ t, so we must have f(c) = t.

ᵃIf t = f(a) or t = f(b) then we are done. Also, the case f(a) > t > f(b) follows similarly.

Before we look at some examples of using the intermediate value theorem, there are a few things worth noting about it.
First of all, the theorem says absolutely nothing about uniqueness. It is very much possible for a function to take on an intermediate value multiple times. Second of all, when applying the intermediate value theorem (commonly abbreviated to IVT), a good general problem-solving technique is to try applying it to other related functions, such as g(x) = f(x) − x or g(x) = f(x) − t. You will see this ‘trick’ show up in the examples below, and also again when we get to Rolle’s theorem and the mean value theorem.

Example 3.13 (Fixed Points of Continuous Maps from [0, 1] to Itself)


We will prove that if f : [0, 1] → [0, 1] is a continuous function, then there is some c ∈ [0, 1] such that f(c) = c.
Consider the function g(x) = f(x) − x. Then g is continuous with g(0) = f(0) and g(1) = f(1) − 1. Since 0 ≤ f(x) ≤ 1, we must have g(0) ≥ 0 and g(1) ≤ 0. Thus by the intermediate value theorem there exists some c ∈ [0, 1] such that g(c) = 0. But then we have f(c) − c = 0, that is, f(c) = c as required.

Example 3.14 (Existence of nth Roots)

We will show that for any positive integer n and any y > 0, there exists a real number x such that xⁿ = y.
Consider the function f(x) = xⁿ. This function is continuous on [0, 1 + y], and since f(0) = 0 and f(1 + y) = (1 + y)ⁿ ≥ 1 + ny > y, we have f(0) < y < f(1 + y). Thus by the intermediate value theorem there is some x ∈ [0, 1 + y] such that f(x) = y, that is, xⁿ = y.

Now before we move on, let’s have a look at an alternative proof of the intermediate value theorem. This proof uses the same idea as the previous proof (try to see why) but avoids the use of suprema.

Aside: Hunting For Intermediate Values


It was mentioned before that the ‘lion hunting’ technique that we used to prove the Bolzano–Weierstrass theorem can also be used to prove other theorems in analysis. One such example is the intermediate value theorem.
The steps in the proof are quite similar to those in the proof of the Bolzano–Weierstrass theorem; the only differences are what we are ‘hunting for’, which in this case is our intermediate value, and how we finish off the proof.
Proof (Intermediate Value Theorem, Lion Hunting Style). Again suppose without loss of generality that f(a) < t < f(b). We are going to define two sequences an and bn inductively as follows. Begin by setting a1 = a and b1 = b, and let c1 = (a1 + b1)/2 be the midpoint of a1 and b1.
Then there are two possibilities:
1. f(c1) ≥ t.
2. f(c1) < t.
If the first one holds, we set a2 = a1 and b2 = c1, and if it doesn’t then we set a2 = c1 and b2 = b1. Repeating this process, we construct an and bn such that f(an) ≤ t ≤ f(bn), an−1 ≤ an ≤ bn ≤ bn−1, and also bn − an = (bn−1 − an−1)/2.
Now an is increasing and bounded above, and bn is decreasing and bounded below, and thus an → c ∈ [a, b] and bn → c′ ∈ [a, b]. Then we have c − c′ = (c − c′)/2 as before, and thus c = c′.
Since f is continuous at c and an → c, we know that f(an) → f(c) as n → ∞. Since f(an) ≤ t, we also know that f(c) ≤ t. Similarly, we also have f(bn) ≥ t, so f(c) ≥ t. But then t ≤ f(c) ≤ t, and thus f(c) = t, and we are done.

To get a feel for the construction, try drawing a rough sketch of the process. If you consider the case of a function that takes on the desired intermediate value multiple times, you should also see that this proof does not necessarily give the same value for c as the previous proof.
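This bisection argument is exactly the classical bisection method for solving f(c) = t numerically. Here is a minimal sketch (my addition), applied to the setting of Example 3.14: solving x³ = 2 on [0, 3].

    # Bisection, following the lion hunting proof of the IVT: given continuous
    # f with f(a) < t < f(b), repeatedly halve [a, b], keeping f(a_n) <= t <= f(b_n).
    def bisect(f, a, b, t, steps=60):
        for _ in range(steps):
            c = (a + b) / 2
            if f(c) >= t:
                b = c  # first case in the proof
            else:
                a = c  # second case
        return (a + b) / 2

    print(bisect(lambda x: x ** 3, 0.0, 3.0, 2.0))  # cube root of 2, ~1.259921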

A natural question upon learning of the intermediate value theorem is whether the ‘intermediate value property’ is equivalent to continuity. The answer to this question is no: there are functions that have the intermediate value property without being continuous. One example is sin(1/x) (with value 0 at 0), which we showed is discontinuous at x = 0 in Example 3.8. Another, more elaborate, example is ‘Conway’s Base-13 Function’, which has the intermediate value property but is discontinuous everywhere. You can read more about this function on Wikipedia.

§3.3 The Boundedness Theorem


The last result about continuous functions that we will prove is the boundedness theorem,
another central theorem in elementary analysis. The statement of this result is straight-
forward: a continuous function on a closed bounded interval is bounded and attains its
bounds.
First we will prove that a continuous function defined on some closed interval is indeed
bounded.

Lemma 3.15 (Continuous Function on Closed Interval is Bounded)


If f : [a, b] → R is continuous then there exists K such that |f (x)| ≤ K for all
x ∈ [a, b].

Proof. If f were unbounded, then given any integer n ≥ 1 there would exist xn ∈ [a, b] such that |f(xn)| > n. By Bolzano–Weierstrass, xn has a convergent subsequence xnj → x, and since a ≤ xnj ≤ b, we must have x ∈ [a, b]. By continuity of f, f(xnj) → f(x); but |f(xnj)| > nj, so f(xnj) does not tend to a limit. Thus we have a contradiction.

Now we can prove that such a continuous function attains its bounds, giving us the
boundedness theorem.

Theorem 3.16 (Boundedness Theorem)


If f : [a, b] → R is continuous, then there exist x1, x2 ∈ [a, b] such that f(x1) ≤ f(x) ≤ f(x2) for all x ∈ [a, b].

Proof. By our previous lemma, we know that f is bounded. Now we define the set

A = {f(x) : x ∈ [a, b]}.

This set is bounded and non-empty, and thus has a supremum M = sup A. Then for each positive integer n, M − 1/n cannot be an upper bound for A. This implies that there is some xn ∈ [a, b] such that M − 1/n < f(xn) ≤ M.
By Bolzano–Weierstrass, xn has a convergent subsequence xnj → x, and since a ≤ xnj ≤ b, we know a ≤ x ≤ b. By the continuity of f, f(xnj) → f(x); but f(xnj) → M by construction, so f(x) = M, and we may take x2 = x. The minimum follows analogously.
Alternate Proof. As before, let M = sup A, and suppose that there was no x2 such that f(x2) = M. Then let g(x) = 1/(M − f(x)) for x ∈ [a, b]. This function is well defined and continuous. Now g must be bounded by the previous lemma, so g(x) ≤ K for all x ∈ [a, b], for some K > 0. This means that f(x) ≤ M − 1/K on [a, b]. This is a contradiction, since we set M as the supremum.


§4 Differentiability
You are likely familiar with differentiability (and particularly the computation of derivatives) from calculus. While this knowledge should certainly not be disregarded, we are going to go from the beginning, doing everything with a little more care than it got in calculus. Throughout this section we will deal with functions f : A ⊆ R → R, though the basic definitions will also apply to C.⁴
Here’s our basic definition:

Definition 4.1 (Differentiability)


Let A ⊆ R and f : A → R. Then if x is a limit point of A, we say that f is differentiable at x with derivative f′(x) if

limh→0 (f(x + h) − f(x))/h = f′(x).

We say that f is differentiable on A if f is differentiable at every x ∈ A.

Let’s pause for a moment. The core idea of differentiation is that we want to approximate a function around some point using a linear map.⁵ Indeed, our definition of the derivative is directly equivalent to the following: f is differentiable at x if

f(x + h) = f(x) + f′(x)h + ε(h),

where limh→0 ε(h)/h = 0. That is, it is differentiable if we can approximate the function with some linear map where the error decreases faster than linearly.

§4.1 Properties of Differentiable Functions


With that spiel over, we can move on to some properties of differentiable functions. The
first is that differentiability implies continuity.

Proposition 4.2 (Differentiability Implies Continuity)


If f is differentiable at x, then f is continuous at x.

Proof. Using the limit definition of continuity, we have

limh→0 [f(x + h) − f(x)] = limh→0 [(f(x + h) − f(x))/h] · h = f′(x) · 0 = 0,

thus limh→0 f(x + h) = f(x), so f is continuous at x.

Now we can prove some basic rules for computing derivatives.

⁴We will use differentiation over C in our discussion of power series, though. One should also note that the other basic rules such as the sum, product and chain rules will also hold over C, but being differentiable over C is a stronger condition than being differentiable over R or R². For example, consider f(z) = z̄ (complex conjugation), which is not complex differentiable.
⁵Indeed, it is this idea, not really that of ‘the tangent to a curve’, that is used to generalise the derivative to functions of multiple variables. Of course, they are closely related.


Proposition 4.3 (Sum, Product & Quotient Rules)

Suppose that f : R → R and g : R → R are differentiable at x, with derivatives f′(x) and g′(x). Then the following hold.
(i) f + g is differentiable at x, with (f + g)′(x) = f′(x) + g′(x).
(ii) fg is differentiable at x, with (fg)′(x) = f′(x)g(x) + f(x)g′(x).
(iii) If g(x) ≠ 0, then f/g is differentiable at x, with (f/g)′(x) = (g(x)f′(x) − g′(x)f(x))/g(x)².

Proof. We prove each individually.

(i) This follows from the sum of limits.

(ii) We have

(f(x + h)g(x + h) − f(x)g(x))/h = f(x + h) · (g(x + h) − g(x))/h + g(x) · (f(x + h) − f(x))/h,

and taking h → 0, using the continuity of f at x, we get (fg)′(x) = f(x)g′(x) + f′(x)g(x).

(iii) Similarly, we have

(f(x + h)/g(x + h) − f(x)/g(x))/h = (1/(g(x)g(x + h))) · [g(x) · (f(x + h) − f(x))/h − f(x) · (g(x + h) − g(x))/h],

and taking h → 0, using the continuity of g at x, we get (f/g)′(x) = (g(x)f′(x) − g′(x)f(x))/g(x)².

We will now prove the chain rule, which tells us how to compute the derivative of the composition of functions. Unfortunately this proof is quite ‘tricky’, and trying to do something like we did in the proof above will not work. Instead, we need to return to our equivalent definition of differentiability, with f(x + h) = f(x) + f′(x)h + ε(h), where limh→0 ε(h)/h = 0.

Proposition 4.4 (Chain Rule)


Suppose f : R → R is differentiable at x and g : R → R is differentiable at f(x). Then g ◦ f is differentiable at x with (g ◦ f)′(x) = g′(f(x)) · f′(x).

Proof. We have

f(x + h) = f(x) + f′(x)h + εf(h),
g(f(x) + k) = g(f(x)) + g′(f(x))k + εg(k),

where εf(h)/h → 0 and εg(k)/k → 0 as h, k → 0. From this we obtain

g(f(x + h)) = g(f(x) + f′(x)h + εf(h))
            = g(f(x)) + g′(f(x))(f′(x)h + εf(h)) + εg(f′(x)h + εf(h)).

Rearranging, we get

(g(f(x + h)) − g(f(x)))/h = g′(f(x))f′(x) + g′(f(x)) · εf(h)/h + εg(f′(x)h + εf(h))/h.

Then as h → 0 we have εf(h)/h → 0, and f′(x)h + εf(h) → 0, so εg(f′(x)h + εf(h))/h → 0. Thus we have (g ◦ f)′(x) = g′(f(x))f′(x).

Example 4.5 (Using the Chain Rule)


Let f(x) = sin(x²). Then since (sin x)′ = cos x, we have f′(x) = 2x cos(x²).

Example 4.6 (A Continuous Everywhere But Non-Differentiable Function)


Consider the function f(x) = x sin(1/x) when x ≠ 0, and f(0) = 0. This function is continuous everywhere,ᵃ but we will show that it is not differentiable at x = 0.
At x ≠ 0 this is differentiable by the chain rule, but at x = 0 we would need the following limit to exist:

limh→0 ((0 + h) sin(1/(0 + h)) − 0)/h = limh→0 sin(1/h),

but this limit does not exist (a similar construction to Example 3.8 will show this), and thus f is not differentiable at x = 0.

ᵃFeel free to check this!

Now in the example above, we have a function which is continuous everywhere but not differentiable at 0. A related example is that of a function which is differentiable everywhere (and hence continuous), but whose derivative is discontinuous.

Example 4.7 (A Function with Discontinuous Derivative)

Consider the function f(x) = x² sin(1/x) when x ≠ 0, and f(0) = 0.
For x ≠ 0, we have that f is differentiable by the chain rule, and the derivative is given by

f′(x) = 2x sin(1/x) − cos(1/x).

For x = 0, we compute the derivative directly. We have

f′(0) = limh→0 (f(0 + h) − f(0))/h = limh→0 (h² sin(1/h))/h = limh→0 h sin(1/h) = 0.

Thus we can see that f′ is not continuous at 0, since f′(x) does not tend to 0 as x → 0 (as cos(1/x) has no limit as x → 0).
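A numerical sketch (my addition) shows both phenomena at once: the difference quotients of f at 0 shrink like h, while f′ keeps oscillating between values near ±1 however close we get to 0.

    import math

    f = lambda x: x ** 2 * math.sin(1 / x)                    # f(0) = 0
    fp = lambda x: 2 * x * math.sin(1 / x) - math.cos(1 / x)  # for x != 0

    # The difference quotients at 0 are h*sin(1/h), bounded by |h|, so f'(0) = 0.
    for h in [1e-1, 1e-3, 1e-5]:
        print(h, f(h) / h)

    # But f' oscillates near 0: at x = 1/(2*pi*n) it is close to -1, while at
    # x = 1/((2n+1)*pi) it is close to +1. So f'(x) does not tend to f'(0) = 0.
    for n in [10, 100, 1000]:
        print(fp(1 / (2 * math.pi * n)), fp(1 / ((2 * n + 1) * math.pi)))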

There is, however, a limit as to how discontinuous a derivative can be. In particular, the derivative of a differentiable function must have the intermediate value property.

Theorem 4.8 (Darboux’s Theorem)


If f : R → R is differentiable, then f′ has the intermediate value property. That is to say, if a < b and f′(a) < z < f′(b), then there exists c ∈ (a, b) with f′(c) = z.

Proof. Given a < b and z ∈ R such that f′(a) < z < f′(b), we wish to show there exists c ∈ (a, b) such that f′(c) = z. We can rewrite the condition as f′(a) − z < 0 < f′(b) − z. Now define g(x) = f(x) − zx, and note that we have g′(a) < 0 < g′(b). We want to find c ∈ (a, b) such that g′(c) = 0. Now, g is continuous (since f is differentiable, hence continuous), and thus is bounded on [a, b], and its minimum is attained, say at some point k. We can’t have k = a, as that would imply that g′(a) ≥ 0, and we also can’t have k = b, as that would imply g′(b) ≤ 0. Thus k ∈ (a, b), and we must then have g′(k) = 0, and we are done.

§4.2 Rolle’s Theorem & The Mean Value Theorem


The sign of a function’s derivative at a point can tell us quite a bit about the behaviour of that function near that point. In particular, knowing that the derivative at a point vanishes can be quite useful.

Definition 4.9 (Local Maxima & Minima)


Let f : R → R. If there exists δ > 0 such that |x − x0| < δ implies that f(x) ≤ f(x0), then we call x0 a local maximum. Similarly, if this implies that f(x) ≥ f(x0), we call x0 a local minimum.

Our important result is as follows:

Lemma 4.10 (Derivative of Maxima)


Let f : R → R. If x is a local maximum or minimum of f, and f is differentiable at x, then f′(x) = 0.

Proof. Without loss of generality, assume that x is a local maximum. Then there exists δ > 0 such that |h| < δ implies that f(x + h) ≤ f(x).
Then if 0 < h < δ, we have

(f(x + h) − f(x))/h ≤ 0,

and thus we have f′(x) ≤ 0.
Similarly, if −δ < h < 0, we have

(f(x + h) − f(x))/h ≥ 0,

and thus we have f′(x) ≥ 0. Hence f′(x) = 0.

When we combine this result with the boundedness theorem from our discussion about
continuity, we end up with Rolle’s theorem.

Theorem 4.11 (Rolle’s Theorem)


Let f : [a, b] → R be a continuous function which is differentiable on (a, b). If f(a) = f(b), then there exists c ∈ (a, b) such that f′(c) = 0.

Proof. Since f is continuous on [a, b], by the boundedness theorem f is bounded on [a, b] and these bounds are attained. Thus we can let M = max_{x∈[a,b]} f(x) and m = min_{x∈[a,b]} f(x).
If M = m, then f must be constant and f′(x) = 0 for all x ∈ (a, b), so we are done. Otherwise M > f(a) or m < f(a). If M > f(a), then since the bounds are attained, there exists c ∈ (a, b) such that f(c) = M. But then M is a local maximum, so by our previous lemma we have f′(c) = 0. A similar argument works if m < f(a).

A direct consequence of Rolle’s theorem⁶ is another classic theorem of analysis, the mean value theorem, which is frequently abbreviated to MVT.

Theorem 4.12 (The Mean Value Theorem)


Let f : [a, b] → R be a continuous function which is differentiable on (a, b). Then there exists c ∈ (a, b) such that f(b) − f(a) = f′(c)(b − a).

Proof. Let k = (f(b) − f(a))/(b − a), and define φ(x) = f(x) − kx. Note that φ(a) = φ(b). Then by Rolle’s theorem there exists c ∈ (a, b) such that φ′(c) = 0, that is, f′(c) = k.

The mean value theorem says something notable: the size of the derivative controls the
size of the function, or (in rougher terms) it puts a restriction on how 'badly behaved'
the function can be. Also, if we appeal to geometric intuition (as we have tried not to,
since it's an easy way to go wrong) we can see that the mean value theorem says, as R. P.
Burn wrote in Numbers and Functions, "for the graph of a differentiable function, there
is always a tangent parallel to the chord". Of course, we will quickly move on from
geometrical thinking.7
6
It’s possible to establish this theorem without Rolle’s theorem, and then Rolle’s theorem pops out as
a special case. The proof is (more or less) just what we did for Rolle’s theorem laid out explicitly in
the proof of this theorem.
7
I guess here is a good place to mention how we can reason geometrically in analysis. Drawing pictures
and thinking geometrically is a great way to understand how things work, to come up with counterexamples
and much more. Still it is important to remember that basically nothing in this course is provable by
appealing to geometric intuition. Instead this type of thinking should just inform the 'analysis' side of
us how to approach things.


Now let’s think about applying the mean value theorem to two different functions. Sup-
pose that f, g : [a, b] → R is continuous and differentiable on (a, b), and g(a) 6= g(b).
Then the mean value theorem gives us s, t ∈ (a, b) such that

f (b) − f (a) (b − a)f 0 (s) f 0 (s)


= = .
g(b) − g(a) (b − a)g 0 (t) g 0 (t)

A stronger version of the mean value theorem says that we can take s = t.

Theorem 4.13 (Cauchy’s Mean Value Theorem)


Let f, g : [a, b] → R be continuous functions which are differentiable on (a, b). Then
there exists c ∈ (a, b) such that

(f(b) − f(a))g′(c) = f′(c)(g(b) − g(a)).

Proof. We define the function

φ(x) = det | 1     1     1    |
           | f(a)  f(x)  f(b) |
           | g(a)  g(x)  g(b) |
     = [f(x)g(b) − f(b)g(x)] − [f(a)g(b) − f(b)g(a)] + [f(a)g(x) − f(x)g(a)].

Then φ is continuous on [a, b] and differentiable on (a, b)a. Also φ(a) = φ(b) = 0.
Then by Rolle's theorem, there exists c ∈ (a, b) such that φ′(c) = 0.
Differentiating φ we then have

φ′(x) = f′(x)[g(b) − g(a)] + g′(x)[f(a) − f(b)],

and thus φ′(c) = 0 gives the desired result.


a
If the use of determinants bothers you, then feel free to just look at the expansion. If you
remember how determinants work, you can also obtain φ by working backwards from what we need to
arrive at.

Cauchy’s mean value theorem has many applications. For example, we can use the mean
value theorem to establish L’Hôpital’s rule for evaluating limits.

Example 4.14 (L’Hôpitel’s Rule)


We wish to evaluate lim_{x→0} (e^x − 1)/sin x.
By Cauchy's mean value theorem, for each x ≠ 0 there exists c between 0 and x such that
(e^x − e^0)/(sin x − sin 0) = e^c/cos c. Then c → 0 as x → 0, and thus we get

lim_{x→0} (e^x − 1)/sin x = lim_{c→0} e^c/cos c = 1.
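As a quick numerical sanity check of this limit (a sketch of ours, not part of the lectured material), we can evaluate the quotient at points shrinking towards 0:

```python
# Numerically evaluate (e^x - 1)/sin(x) as x -> 0; the values should
# approach the limit 1 found above via Cauchy's mean value theorem.
import math

for x in [1.0, 0.1, 0.01, 0.001, 1e-6]:
    print(f"x = {x:>8}: (e^x - 1)/sin x = {(math.exp(x) - 1) / math.sin(x):.10f}")
```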

In more generality:


Theorem 4.15 (L’Hôpital’s Rule)


Suppose f, g : [a, b] → R are continuous and differentiable on (a, b). Suppose that
f(a) = g(a) = 0, that g′(x) does not vanish near a, and that f′(x)/g′(x) → ℓ as x → a.
Then f(x)/g(x) → ℓ as x → a.

Proof. Since g′(x) does not vanish near a, we can suppose that g′(x) ≠ 0 for x ∈
(a, b), as otherwise we could just consider a subinterval (a, b′) chosen so that this
is the case.
By Cauchy's mean value theorem we have

f(x)/g(x) = (f(x) − f(a))/(g(x) − g(a)) = f′(c)/g′(c)

where c ∈ (a, x). Then c → a as x → a, and hence f(x)/g(x) → ℓ as x → a.

The mean value theorem can also help us extend Lemma 4.10. Knowing the sign of
a derivative over some interval can tell us immediately if it is constant, increasing or
decreasing.

Proposition 4.16 (Sign of the Derivative)


Let f : [a, b] → R be differentiable on (a, b). Then the following hold.
(i) If f′(x) ≥ 0 for all x ∈ (a, b), then f is monotonically increasing.
(ii) If f′(x) = 0 for all x ∈ (a, b), then f is constant.
(iii) If f′(x) ≤ 0 for all x ∈ (a, b), then f is monotonically decreasing.

Proof. By the mean value theorem, for any x1 < x2 in (a, b) there is some x ∈ (x1, x2)
with f(x2) − f(x1) = (x2 − x1)f′(x). Then all of the results follow by considering
the sign of f′(x).

§4.3 Inverses of Functions


Before we move to slightly more exciting material, there’s a few things about inverses of
functions that we need to get out of the way. The proofs below are straightforward but
require some care (by which I mean you have to avoid messing up which letter means
what).
First of all, if we know that a continuous function is strictly increasing8 then we can
deduce that the function has an inverse that is continuous and strictly increasing also.
A similar result holds for strictly decreasing functions.

Theorem 4.17 (Continuous Inverse Theorem)


Let f : [a, b] → R be a continuous and strictly increasing function. Then if c = f (a)
and d = f (b), f : [a, b] → [c, d] is bijective and the inverse f −1 : [c, d] → [a, b] is

8
If a continuous function was not strictly increasing or strictly decreasing then we couldn’t have a
unique inverse.


continuous and strictly increasing.

Proof. As f is strictly increasing, it is an injection. It is also a surjection onto [c, d],
since f(a) = c, f(b) = d, and if c < t < d then by the intermediate value theorem there
exists x ∈ (a, b) such that f(x) = t. Thus f : [a, b] → [c, d] is a bijection, and
has an inverse f⁻¹.
Since f is strictly increasing, we can write this as x1 < x2 ⟺ f(x1) < f(x2), so
letting f(x1) = y1 and f(x2) = y2 we have f⁻¹(y1) < f⁻¹(y2) ⟺ y1 < y2. Thus
f⁻¹ is also strictly increasing.
Now we will show f⁻¹ is continuous at some y ∈ [c, d]. Given ε > 0, let x = f⁻¹(y).
If x ≠ a, b (shrinking ε if necessary so that [x − ε, x + ε] ⊆ [a, b]), we can find δ > 0
such that

f(x − ε) ≤ f(x) − δ < f(x) < f(x) + δ ≤ f(x + ε).

But then |z − y| < δ implies that x − ε < f⁻¹(z) < x + ε, so |f⁻¹(z) − f⁻¹(y)| < ε.
Thus f⁻¹ is continuous on (c, d).
Otherwise, if x = a we have f(x) < f(x + ε). Then for z ∈ [c, d], |z − y| < f(x + ε) − f(x)
implies that |f⁻¹(z) − f⁻¹(y)| < ε, so f⁻¹ is continuous at c. A similar argument shows
that it is continuous at d.

Now this is the differentiability section, so we can also describe what the derivative of
a differentiable function’s inverse is. This result is known as the one variable inverse
function theorem.

Theorem 4.18 (Inverse Function Theorem)


Let f : [a, b] → R be continuous, strictly increasing, and differentiable on (a, b),
with f′(x) ≠ 0 for all x ∈ (a, b). Let f(a) = c and f(b) = d. Then f : [a, b] → [c, d]
is a bijection, and f⁻¹ is differentiable on (c, d), with

(f⁻¹)′(x) = 1/f′(f⁻¹(x)).

Proof. By our previous theorem, f is a bijection and f⁻¹ is a continuous and strictly
increasing function. Let y = f(x), with x ∈ (a, b). We wish to show that (f⁻¹)′(y) =
1/f′(x).
Given k ≠ 0, let h be given by y + k = f(x + h), that is, f⁻¹(y + k) = x + h, where
h ≠ 0. Then

(f⁻¹(y + k) − f⁻¹(y))/k = h/(f(x + h) − f(x)).

Then since f′(x) ≠ 0 and f⁻¹ is continuous (so h → 0 as k → 0), we have

lim_{k→0} (f⁻¹(y + k) − f⁻¹(y))/k = lim_{h→0} (x + h − x)/(f(x + h) − f(x)) = 1/f′(x),

as required.
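To see the theorem in action numerically, here is a small Python sketch (our own illustration; the function f(x) = x³ + x is an arbitrary choice satisfying the hypotheses). We invert f by bisection and compare a finite-difference estimate of (f⁻¹)′ with 1/f′(f⁻¹(y)):

```python
# f(x) = x^3 + x is continuous, strictly increasing, with f'(x) = 3x^2 + 1 > 0.
def f(x):
    return x**3 + x

def f_inv(y, lo=-10.0, hi=10.0, tol=1e-12):
    # Bisection works since f is continuous and strictly increasing.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

y, k = 4.0, 1e-6
x = f_inv(y)
finite_difference = (f_inv(y + k) - f_inv(y - k)) / (2 * k)
from_theorem = 1 / (3 * x**2 + 1)       # 1 / f'(f^-1(y))
print(finite_difference, from_theorem)  # the two values agree closely
```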


§4.4 Taylor’s Theorem


We are now going to think about functions where we can take derivatives a number of
times (getting higher-order derivatives). We will begin by looking at how we can get an
analogue of Rolle’s theorem for higher-order derivatives. I will apologise in advance that
this discussion is a bit long, so feel free to just skip to Taylor’s theorem.
The following generalisation will come directly from Rolle’s theorem.

Theorem 4.19 (Higher-Order Rolle’s Theorem)


Let f be continuous and (n − 1)-times differentiable on [a, b], and also n-times differentiable
on (a, b). Then if f(a) = f′(a) = f″(a) = · · · = f^(n−1)(a) = 0 and f(b) = 0,
then there is a c ∈ (a, b) such that f^(n)(c) = 0.

Proof. Since f(a) = f(b), by Rolle's theorem there exists c1 ∈ (a, b) such that
f′(c1) = 0. Then since f′(a) = f′(c1) = 0, we can use Rolle's theorem again
to obtain c2 ∈ (a, c1) such that f″(c2) = 0. Continuing on like this we obtain a
cn ∈ (a, b) such that f^(n)(cn) = 0, as required.

We can also attempt to get some sort of ‘higher-order’ mean value theorem. Recall that
our proof of the mean value theorem was more or less as follows:
Define the function φ(x) = f (x) − kx, where we choose k such that the
conditions of Rolle’s theorem are satisfied. Then apply Rolle’s theorem to φ.
We can try and do the same with a more elaborate construction. We will do this in
three steps.
1. Construct a polynomial P(x) such that P(a) = f(a), P′(a) = f′(a), and so on
until P^(n−1)(a) = f^(n−1)(a).
2. Construct another polynomial9 E(x) such that E (r) (a) = 0 for r = 0, 1, . . . , n − 1,
but with E(b) = f (b) − P (b).
3. Take φ(x) = f (x) − P (x) − E(x).
If we construct φ(x) in this way, then we will have φ(a) = φ′(a) = · · · = φ^(n−1)(a) = 0,
and also φ(b) = 0, so it will satisfy our higher-order Rolle's theorem.

Aside: Constructing Our Polynomials


While it is likely you have seen a construction for P before, I’d like to take a minute to
spell it out in detail because it has some nice ideas.
To construct this polynomial, what we really need is a term whose derivatives at the point
a vanish for every order except one that we choose. We can do this as follows:

Q_k(x) = (x − a)^k/k!.

9
We can think of this as the ‘error correcting term’, because it fixes the discrepancy between f (b) and
P (b) while not ruining the previous construction.


We can then (by taking derivatives) find that the value of this term is

Q_k^(j)(a) = 1 if j = k, and Q_k^(j)(a) = 0 if j ≠ k.

With this we can immediately write down an explicit construction for P(x):

P(x) = Σ_{k=0}^{n−1} f^(k)(a) Q_k(x) = Σ_{k=0}^{n−1} (f^(k)(a)/k!)(x − a)^k
     = f(a) + f′(a)(x − a) + (f″(a)/2)(x − a)² + · · · + (f^(n−1)(a)/(n − 1)!)(x − a)^{n−1}.

This polynomial is known as the Taylor polynomial of degree n − 1 for f about a.


We can use the exact same method to construct E(x). We must not affect the first
n − 1 derivatives at x = a, so we will add some multiple of Q_n(x). However, this
time we care about the value of the undifferentiated term at x = b:

Q_n(b) = (b − a)^n/n!.

Since we want E(b) = f(b) − P(b), if we are going to have E(x) = λQ_n(x) we are going
to need

λQ_n(b) = λ(b − a)^n/n! = f(b) − P(b)  ⟹  λ = [f(b) − P(b)] · n!/(b − a)^n.

And thus we get

E(x) = ([f(b) − P(b)] · n!/(b − a)^n) Q_n(x) = [f(b) − P(b)] ((x − a)/(b − a))^n.

Now with the constructions as above, we can write down φ(x) = f(x) − P(x) − E(x).
Then since φ(x) satisfies all of the conditions of our higher-order Rolle's theorem, there
exists c ∈ (a, b) such that φ^(n)(c) = 0.
Taking the nth derivative, since P is a polynomial of degree n − 1 and thus has vanishing
nth derivative, we get

φ^(n)(c) = f^(n)(c) − [f(b) − P(b)] · n!/(b − a)^n = 0
⟹ f(b) = P(b) + (f^(n)(c)/n!)(b − a)^n

for some c ∈ (a, b).
This result is Taylor’s theorem, specially Taylor’s theorem with the Lagrange form of
the remainder. For completeness, this proof is given in a standalone form below.


Theorem 4.20 (Taylor’s Theorem with Lagrange Remainder)


Let f be continuous and (n − 1)-times differentiable on [a, b], and n-times differentiable
on (a, b). Then we have

f(b) = f(a) + f′(a)(b − a) + · · · + (f^(n−1)(a)/(n − 1)!)(b − a)^{n−1} + (f^(n)(c)/n!)(b − a)^n,

for some c ∈ (a, b).

Proof. We define

φ(x) = f(x) − Σ_{k=0}^{n−1} (f^(k)(a)/k!)(x − a)^k − M((x − a)/(b − a))^n,

where M is chosen such that φ(b) = 0. Then differentiating we have φ(a) = φ′(a) = · · · =
φ^(n−1)(a) = 0.
Then since φ(a) = φ(b), there exists c1 ∈ (a, b) such that φ′(c1) = 0. Similarly
φ′(a) = φ′(c1) = 0, and thus there exists c2 ∈ (a, c1) such that φ″(c2) = 0.
Continuing on in this way, we find c = cn ∈ (a, b) such that φ^(n)(c) = 0.
Differentiating n times, we then have

φ^(n)(c) = f^(n)(c) − M · n!/(b − a)^n = 0.

Now we know that M is such that φ(b) = 0, so we get

M = f(b) − Σ_{k=0}^{n−1} (f^(k)(a)/k!)(b − a)^k.

Thus we have

f^(n)(c) = M · n!/(b − a)^n = (f(b) − Σ_{k=0}^{n−1} (f^(k)(a)/k!)(b − a)^k) · n!/(b − a)^n
⟹ f(b) = f(a) + f′(a)(b − a) + · · · + (f^(n−1)(a)/(n − 1)!)(b − a)^{n−1} + (f^(n)(c)/n!)(b − a)^n,

where c ∈ (a, b), as required.

So we have Taylor’s theorem (which in this case can be thought of as a higher-order mean
value theorem), let’s try and do something with it. One of the main uses of Taylor’s
theorem is in giving us an infinite series which converges to the value of our function.
This involves considering the terms of the degree n taylor polynomial for increasing
values of n, as we shall see.
Doing this type of problem has two main steps.
1. Write down Taylor’s theorem for the case of the function being considered.
2. Show that the error term tends to zero as n goes to infinity.
We see how this is done using a Tripos question from 2010 (so eh spoilers I guess).


Example 4.21 (Using Taylor’s Theorem – Part IA, 2010 Paper 1, Q10)
Problem. Suppose that e : R → R is a differentiable function such that e(0) = 1 and
e0 (x) = e(x) for all x ∈ R. Use Taylor’s theorem with the remainder in Lagrange’s
form to prove that
X xn
e(x) = for all x ∈ R
n!
n>0

[No property of the exponential function may be assumed.]


Solution. We begin by noting that e(x) is infinitely differentiable, with e^(n)(x) =
e(x). Indeed this is true for n = 1, and then it follows by induction for all n. Thus
Taylor's theorem with Lagrange remainder applies for every n ∈ N.
Thus for all n ∈ N we have (using e^(k)(0) = 1 and e^(n)(c) = e(c))

e(x) = Σ_{k=0}^{n−1} x^k/k! + e(c)x^n/n!

for some c between 0 and x. Thus it suffices to show that e(c)x^n/n! → 0 as n → ∞, since
that would imply that e(x) − Σ_{k=0}^{n−1} x^k/k! → 0 as n → ∞. Now e(t) is differentiable
and hence continuous on the closed interval between 0 and x, and thus is bounded there.
So it suffices to show that x^n/n! → 0, which holds.
Thus

e(x) = Σ_{n=0}^∞ x^n/n!,

as required.
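The convergence in this example is easy to watch numerically. Below is a minimal Python sketch (ours, not from the Tripos solution): the partial sums Σ_{k<n} x^k/k! approach exp(x), so the Lagrange remainder e(c)x^n/n! visibly tends to 0.

```python
import math

x = 3.0
partial, term = 0.0, 1.0              # term holds x^k / k!
for n in range(1, 21):
    partial += term                   # partial = sum_{k=0}^{n-1} x^k / k!
    term *= x / n                     # advance term to x^n / n!
    print(n, abs(math.exp(x) - partial))
# The error column decays to 0, roughly like x^n/n!, as the argument predicts.
```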

Now it is not the case that the remainder term always goes to zero. A common counterexample
is f(x) = exp(−1/x²) for x ≠ 0 with f(0) = 0: here f^(n)(0) = 0 for every n, so every
Taylor polynomial of f about 0 is identically zero, even though f(x) ≠ 0 for all x ≠ 0.


§5 Power Series
Following on from our discussion of infinite series in Chapter 2, we are going to discuss
a particular type of infinite series known as a power series. Throughout this section, we
are going to work mostly in C.

Definition 5.1 (Complex Power Series)


A complex power series is an infinite series of the form

Σ_{n=0}^∞ a_n z^n,

where z ∈ C and a_n ∈ C.

In this section we will look at some properties of power series, and we will use them to
(finally) define some functions that we have alluded to quite a bit.

§5.1 Radius of Convergence


Before we develop the concept of the radius of convergence, we first need to develop a
lemma about the convergence of power series.

Lemma 5.2 (Absolute Convergence of Power Series)


Suppose that Σ_{n=0}^∞ a_n z_0^n converges for some z_0 ∈ C. Then Σ_{n=0}^∞ a_n z^n converges
absolutely for all z ∈ C with |z| < |z_0|.

Proof. Since Σ_{n=0}^∞ a_n z_0^n converges, we have a_n z_0^n → 0, implying this sequence is
bounded. Then there is some K > 0 such that |a_n z_0^n| ≤ K for all n.
So if |z| < |z_0|, we have |a_n z^n| ≤ K|z/z_0|^n. Since the geometric series Σ_{n=0}^∞ |z/z_0|^n
converges, our result follows by comparison.

With this lemma we can prove one of our key results in the study of power series, the
existence of the radius of convergence.

Theorem 5.3 (Radius of Convergence)


Suppose that a_n ∈ C. Then either the power series Σ_{n=0}^∞ a_n z^n converges for all
z ∈ C, or there exists a real number R with R ≥ 0 such that
(i) Σ_{n=0}^∞ a_n z^n converges absolutely if |z| < R,
(ii) Σ_{n=0}^∞ a_n z^n diverges if |z| > R.
We say that R is the radius of convergence of the power series.

Proof. If Σ_{n=0}^∞ a_n z^n converges for all z ∈ C, then we are done. Otherwise, there
must exist z_1 ∈ C such that Σ_{n=0}^∞ a_n z_1^n diverges. Then by our previous lemma the
power series must diverge for all z ∈ C with |z| > |z_1|.
Now define the set S = {|z| : Σ_{n=0}^∞ a_n z^n converges}. This set is bounded above by
|z_1| (by the previous paragraph), and is non-empty since the power series converges for
z = 0. Thus it has a supremum, say R = sup S.
Then by definition Σ_{n=0}^∞ a_n z^n diverges if |z| > R. Now suppose |z| < R. Then by
the definition of the supremum there must be some z_0 with |z| < |z_0| ≤ R such that the
power series converges at z_0, and by our previous lemma the power series then converges
absolutely at z.

What’s quite lovely is that when we are inside the radius of convergence, the power series
converges absolutely! This means that we can avoid all of those issues with rearranging
series that show up when things don’t converge absolutely. This can be quite helpful
when actually working with power series.
What’s also quite nice is that we can employ all of those lovely results we developed
in section 2 to find what the radius of convergence of a power series is. Let’s look
at two lemmas that come from the ratio and root tests, which can give the radius of
convergence.

Lemma 5.4 (The Ratio Test for Radius of Convergence)


Let Σ_{n=0}^∞ a_n z^n be a power series with radius of convergence R. Then if |a_{n+1}/a_n| →
ℓ as n → ∞, we have R = 1/ℓ if ℓ ≠ 0, and convergence everywhere if ℓ = 0.

Proof. By the ratio test, we have absolute convergence if

lim_{n→∞} |a_{n+1} z^{n+1} / (a_n z^n)| = ℓ|z| < 1.

So if ℓ = 0, then we have convergence everywhere. Alternatively, if |z| < 1/ℓ we
have absolute convergence, and if |z| > 1/ℓ the series diverges, again by the ratio
test.

Lemma 5.5 (The Root Test for Radius of Convergence)


Let Σ_{n=0}^∞ a_n z^n be a power series with radius of convergence R. Then if |a_n|^{1/n} → ℓ
as n → ∞, we have R = 1/ℓ if ℓ ≠ 0, and convergence everywhere if ℓ = 0.

Proof. By the root test, we have absolute convergence if

lim_{n→∞} |a_n z^n|^{1/n} = ℓ|z| < 1.

So if ℓ = 0, then we have convergence everywhere. Alternatively, if |z| < 1/ℓ we have
absolute convergence, and if |z| > 1/ℓ the series diverges, again by the root test.
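These two lemmas are easy to try out numerically. The sketch below (ours; the series Σ n²zⁿ/2ⁿ is an arbitrary example whose radius of convergence is 2) estimates R from both the ratio and the root quantities, working with logarithms to avoid floating-point underflow:

```python
import math

# a_n = n^2 / 2^n, so the radius of convergence should be R = 2.
log_a = lambda n: 2 * math.log(n) - n * math.log(2)   # log |a_n|

for n in [10, 100, 1000, 10000]:
    ratio_estimate = 1 / math.exp(log_a(n + 1) - log_a(n))  # 1 / |a_{n+1}/a_n|
    root_estimate = 1 / math.exp(log_a(n) / n)              # 1 / |a_n|^(1/n)
    print(n, ratio_estimate, root_estimate)
# Both columns tend to 2 as n grows.
```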

Now it is time for an important conceptual point: if we have a power series with radius
of convergence R, then this only specifies the convergence/divergence inside or outside
the circle |z| = R (the circle of convergence). Specifically it says nothing about the
behaviour on the circle. To see this, we will look at a few examples10 .

10
This should remind you of the deadly sin mentioned in section 2, where the root or ratio test said
nothing if the limit was 1.


Example 5.6 (Convergence on |z| = R)

Consider the following power series:

(i) Σ_{n=0}^∞ z^n,  (ii) Σ_{n=1}^∞ z^n/n,  (iii) Σ_{n=1}^∞ z^n/n².

All of these have radius of convergence R = 1, however on the circle |z| = 1 we have:
1. Divergence everywhere on |z| = 1.
2. Convergencea if |z| = 1 with z ≠ 1, and divergence if z = 1.
3. Convergence everywhere on |z| = 1.
Thus we can't say anything in general about the case |z| = R, and each case really has to be
treated separately.
a
Have a look at Example Sheet 1 for this result

§5.2 Differentiating Power Series


So things are generally quite nice inside the circle of convergence. One of the nice things
we can do is differentiate! Even better, we can differentiate power series just like we do
polynomials! The following results you need to know (including the lemmas), but the
proofs are not examinable (feel free to skip them).
We are going to show that the power series Σ_{n=0}^∞ a_n z^n with radius of convergence R
has the derivative Σ_{n=1}^∞ n a_n z^{n−1}, whenever |z| < R. Before showing this, we will need
a lemma which gives us that this second power series converges for |z| < R.

Lemma 5.7
Let f(z) = Σ_{n=0}^∞ a_n z^n be a power series whose radius of convergence is R. Then if
0 < r < R, the power series Σ_{n=1}^∞ n|a_n|r^{n−1} converges.

Proof. Pick w such that r < |w| < R. Then the power series Σ_{n=0}^∞ a_n w^n converges,
so the terms |a_n w^n| are bounded above by some M. Then we have

n|a_n|r^n = n|a_n w^n| (r^n/|w|^n) ≤ M n (r/|w|)^n.

But the series Σ_{n=1}^∞ M n (r/|w|)^n converges by the ratio test, and thus by the
comparison test Σ_{n=1}^∞ n|a_n|r^{n−1} = (1/r)Σ_{n=1}^∞ n|a_n|r^n converges.

Applying this lemma twice also gives us that the power series Σ_{n=2}^∞ n(n − 1)|a_n|r^{n−2}
converges. We are now ready to prove our theorem about differentiating power series.

Theorem 5.8 (Differentiating Power Series)


Let f(z) = Σ_{n=0}^∞ a_n z^n be a power series whose radius of convergence is R. Then f
is differentiable at all points with |z| < R, and f′(z) = Σ_{n=1}^∞ n a_n z^{n−1}.


Proof. Using the difference of powers factorisation, we have

(f(z + h) − f(z))/h = Σ_{n=0}^∞ a_n ((z + h)^n − z^n)/h
                    = Σ_{n=1}^∞ a_n [(z + h)^{n−1} + (z + h)^{n−2}z + · · · + z^{n−1}].

Then, since Σ_{n=1}^∞ n a_n z^{n−1} converges by our lemma, we have

(f(z + h) − f(z))/h − Σ_{n=1}^∞ n a_n z^{n−1} = Σ_{n=1}^∞ a_n [Σ_{j=0}^{n−1} (z^j (z + h)^{n−1−j} − z^{n−1})]
                                              = Σ_{n=1}^∞ a_n [Σ_{j=0}^{n−1} z^j ((z + h)^{n−1−j} − z^{n−1−j})].

Now we want to bound the terms of this inner sum. Choose r such that |z| < r < R
and h such that |z| + |h| < r. Then employing the difference of powers factorisation
again we get

|(z + h)^{n−1−j} − z^{n−1−j}| = |h| · |(z + h)^{n−2−j} + z(z + h)^{n−3−j} + · · · + z^{n−2−j}|
                              ≤ |h|(n − 1 − j)r^{n−2−j}.

Applying this bound, assuming the right hand sum converges, we have

|(f(z + h) − f(z))/h − Σ_{n=1}^∞ n a_n z^{n−1}| ≤ Σ_{n=1}^∞ |a_n| [Σ_{j=0}^{n−1} r^j |h|(n − 1 − j)r^{n−2−j}]
                                               = |h| Σ_{n=1}^∞ |a_n| r^{n−2} [Σ_{j=0}^{n−1} (n − 1 − j)]
                                               = (1/2)|h| Σ_{n=2}^∞ |a_n| n(n − 1)r^{n−2}.

Using our previous lemma twice shows that the right hand sum converges, and thus

(f(z + h) − f(z))/h − Σ_{n=1}^∞ n a_n z^{n−1} → 0 as h → 0,

that is, f′(z) = Σ_{n=1}^∞ n a_n z^{n−1}.
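As a concrete check of this theorem (our own numerical sketch), take the geometric series f(z) = Σ zⁿ = 1/(1 − z) for |z| < 1; term-by-term differentiation should give f′(z) = 1/(1 − z)²:

```python
# Compare the term-by-term derivative sum n z^(n-1) with 1/(1-z)^2
# at a point strictly inside the circle of convergence |z| < 1.
z = 0.5 + 0.25j
termwise = sum(n * z**(n - 1) for n in range(1, 200))
exact = 1 / (1 - z)**2
print(termwise, exact)   # the two values agree to machine precision
```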

§5.3 Exponential, Trigonometric & Hyperbolic Functions


With all that out of the way, let’s define some functions! You’ll hopefully be familiar
with all of these anyway, but we will now be able to define them properly.
We are going to need one little lemma before we proceed.


Lemma 5.9 (Constant Value Theorem Over C)


If F : C → C has F′(z) = 0 for all z ∈ C, then F(z) is constant for all z.

Proof. For some z ∈ C, we define the functions u_z, v_z : R → R such that F(tz) =
u_z(t) + i v_z(t). By the chain rule F′(tz)z = u_z′(t) + i v_z′(t) = 0 for any t ∈ R. Then
comparing real and imaginary parts, this implies u_z′(t) = 0 and v_z′(t) = 0, for all
t ∈ R. So u_z and v_z are constant.
Then for any z ∈ C we have F(z) = u_z(1) + i v_z(1) = u_z(0) + i v_z(0) = F(0), so F is
constant too.

§5.3.1 The Exponential & Logarithm Function


We will begin with one of the most important functions, the exponential function.

Definition 5.10 (The Exponential Function)


We define the exponential function exp : C → C by the power series

exp(z) = Σ_{n=0}^∞ z^n/n!.

When we state a definition using power series, we pretty much always have to check
convergence straight away so that we do not end up writing complete nonsense. For this
power series, we can see by the ratio test that it converges for all z ∈ C. By the result
in the previous section, we can also see that it’s differentiable.
Now we can prove all of the properties that you already are aware of.

Proposition 5.11 (Properties of the Exponential Function)


Let z, w ∈ C. Then the following hold.
(i) exp′(z) = exp(z).
(ii) exp(z) exp(w) = exp(z + w).
(iii) exp(x) > 0 for x ∈ R.
(iv) exp(x) is strictly increasing on R.
(v) exp(x) → ∞ as x → ∞, exp(x) → 0 as x → −∞.
(vi) exp : R → (0, ∞) is a bijection.

Proof. We prove each individually.


(i) This follows from differentiating the power series.
(ii) Let a, b ∈ C and define F(z) = exp(a + b − z) exp(z). Then F′(z) = −exp(a +
b − z) exp(z) + exp(a + b − z) exp(z) = 0, and thus by the constant value
theorem, F is constant. So exp(a + b − z) exp(z) = F(0) = exp(a + b), and
setting z = b gives exp(a) exp(b) = exp(a + b).


(iii) Clearly exp(x) > 0 for all x ≥ 0, and exp(0) = 1. Then exp(0) = exp(x − x) =
exp(x) exp(−x) = 1, and thus exp(−x) > 0 for x > 0.
(iv) Differentiating, exp′(x) = exp(x) > 0, and thus exp is strictly increasing on
R.
(v) Truncating the power series, exp(x) > 1 + x for x > 0. Thus if x → ∞,
exp(x) → ∞. Then exp(−x) = 1/ exp(x), so exp(x) → 0 as x → −∞.
(vi) Injectivity follows directly from exp being strictly increasing. For surjectivity,
take y > 0. Then by (v) there exists a, b ∈ R such that exp(a) < y < exp(b),
and by the intermediate value theorem there is c ∈ (a, b) such that exp(c) =
y.

Since the exponential function is a bijection R → R+ , it also has a well defined inverse.

Definition 5.12 (The Logarithm Function)


We define the logarithm function log : (0, ∞) → R to be the inverse of exp.

By the inverse function theorem, this is a differentiable function, and its derivative is
given by log′(t) = 1/t for t > 0.
Remark. As defined here, the logarithm function is the inverse of exp : R → (0, ∞), not
of exp on the whole of C. In general exp : C → C is not injective (it is periodic, as we
will see), so a global inverse doesn't make sense with the tools we have developed; this
is remedied in later courses.
Using both the exponential and logarithm functions, we can define powers in the general
case. For x > 0 and α ∈ R, we define x^α = exp(α log x). With this, the normal
'rules of indices' become properties of the exponential function, and it is not hard to see
that this definition matches how powers would be previously defined for α ∈ Q. This
also gives us a new shorthand for the exponential function: exp(z) = e^z.

Aside: Multiplying Infinite Series


In Proposition 5.11 we proved a result involving multiplying two exponential functions
together. We also managed to do this in a way that avoided using the power series
definition of the function. Despite this, I think it’s worth taking a moment to look
properly at the multiplication of infinite series11 .
When we talk about multiplying infinite series, we are really talking about taking
two infinite series and combining them into a new infinite series12, which
hopefully converges to the product of the two original series.
So how can we do this? A natural approach would be to just 'expand everything', but
unfortunately that's not really how infinite series work, so we can't do that directly.
Still, we can use this idea as inspiration for a more mathematically sound approach.
Being quite informal and hand-wavy, let's consider two power series Σ_{n=0}^∞ a_n z^n and
Σ_{n=0}^∞ b_n z^n, multiply them term by term and then group by powers of z. Doing this, we

11
This section is completely not-examinable and the proofs are relatively long, so I suggest that you
skip it if you are using these notes for revision.
12
If we don’t care about obtaining a new infinite series, this discussion can be completely ignored.


get something like

(a0 + a1z + a2z² + a3z³ + · · ·) · (b0 + b1z + b2z² + b3z³ + · · ·)
= a0b0 + (a0b1 + a1b0)z + (a0b2 + a1b1 + a2b0)z² + (a0b3 + a1b2 + a2b1 + a3b0)z³ + · · · .

Taking inspiration from this, for two sequences a_n and b_n we can define their convolution
to be the sequence c_n where

c_n = a_0b_n + a_1b_{n−1} + · · · + a_nb_0.

It is this construction that will form the basis of our product of infinite series.

Definition (Cauchy Product)


Given two infinite series Σ_{n=0}^∞ a_n and Σ_{n=0}^∞ b_n, their Cauchy product is the infinite
series

Σ_{n=0}^∞ c_n,

where c_n = a_0b_n + a_1b_{n−1} + · · · + a_nb_0.
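Before looking at the caveats, here is a small Python sketch (ours) of the definition in action: convolving the coefficients of the series for exp(x) with themselves should produce the coefficients 2ⁿ/n! of the series for exp(2x).

```python
from math import factorial

a = [1 / factorial(n) for n in range(10)]          # a_n = 1/n!
c = [sum(a[k] * a[n - k] for k in range(n + 1))    # c_n = a_0 a_n + ... + a_n a_0
     for n in range(10)]

for n, c_n in enumerate(c):
    print(n, c_n, 2**n / factorial(n))             # the two columns match
```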

The Cauchy product does give us a way to obtain a ‘product’ of infinite series, but before
we go any further some important caveats need to be pointed out. The most notable is
the following: the infinite series obtained with the Cauchy product does not necessarily
converge 13 .

Example (Convergent Series Whose Cauchy Product Diverges)

Consider the convergenta series Σ_{n=0}^∞ (−1)^n/√(n+1). Then the Cauchy product of this series
with itself is given by

Σ_{n=0}^∞ Σ_{k=0}^n ((−1)^k/√(k+1)) · ((−1)^{n−k}/√(n−k+1)).

But then since

(k + 1)(n − k + 1) = (n/2 + 1)² − (n/2 − k)² ≤ (n/2 + 1)²,

the terms of the Cauchy product satisfy

|Σ_{k=0}^n ((−1)^k/√(k+1)) · ((−1)^{n−k}/√(n−k+1))| ≥ Σ_{k=0}^n 1/(n/2 + 1) = (n + 1)/(n/2 + 1),

and thus they don't tend to 0, so the series cannot converge. This implies that we
have two convergent series whose Cauchy product does not converge.
a
Exercise!

This is obviously not a very good thing to happen, and we will spend the rest of this
aside trying to deal with this possible issue. Luckily enough, this issue can be completely
avoided if we know that one of the series being ‘multiplied’ converges absolutely14 .
13
It’s also possible for two divergent series to have a convergent Cauchy product. Try coming up with
an example as an exercise.
14
This is another reason why absolute convergence is a great property to have.


Theorem (Mertens’ Theorem)


Suppose that Σ_{n=0}^∞ a_n = A converges absolutely, and Σ_{n=0}^∞ b_n = B converges. Then
their Cauchy product Σ_{n=0}^∞ c_n converges, and

Σ_{n=0}^∞ c_n = AB.

Proof. Let the partial sums of the series be

A_n = Σ_{k=0}^n a_k,  B_n = Σ_{k=0}^n b_k,  C_n = Σ_{k=0}^n c_k.

Then we have

C_n = a_0b_0 + (a_0b_1 + a_1b_0) + (a_0b_2 + a_1b_1 + a_2b_0) + · · · + (a_0b_n + · · · + a_nb_0)
    = a_0B_n + a_1B_{n−1} + · · · + a_nB_0
    = A_nB + a_0(B_n − B) + a_1(B_{n−1} − B) + · · · + a_n(B_0 − B),

where in the last step we added and subtracted (a_0 + · · · + a_n)B. We wish to show
C_n → AB, and we will do so using three bounds. Given ε > 0, since A_n → A, there is
an integer L such that n ≥ L implies

|A_n − A| ≤ ε/(3(|B| + 1)).

Then since Σ_{k=0}^∞ |a_k| converges and B_n → B, there is an integer M such that n ≥ M
implies

|B_n − B| ≤ ε/(3(1 + Σ_{k=0}^∞ |a_k|)).

Also, since Σ_{k=0}^∞ a_k converges, a_n → 0, and thus there is an integer N such that
n ≥ N implies that

|a_n| ≤ ε/(3(1 + Σ_{k=0}^{M−1} |B_k − B|)).

Combining thesea, for n ≥ max{L, M + N} we obtain

|C_n − AB| = |(A_n − A)B + Σ_{k=0}^n a_{n−k}(B_k − B)|
           ≤ |A_n − A||B| + Σ_{k=0}^{M−1} |a_{n−k}||B_k − B| + Σ_{k=M}^n |a_{n−k}||B_k − B| ≤ ε.

Thus Σ_{n=0}^∞ c_n = AB.
a
We use M + N because we will need n − (M − 1) ≥ N to make the bounds work.

We can apply this result to power series. Recalling that power series converge absolutely
inside their circle of convergence, we can safely multiply power series using the Cauchy
product.


Corollary (Multiplying Power Series)


Let Σ_{n=0}^∞ a_n z^n and Σ_{n=0}^∞ b_n z^n be power series with radii of convergence R1 and
R2 respectively. Then if |z| < min{R1, R2} we have

(Σ_{n=0}^∞ a_n z^n) · (Σ_{n=0}^∞ b_n z^n) = Σ_{n=0}^∞ c_n z^n,

where c_n = a_0b_n + a_1b_{n−1} + · · · + a_nb_0.

So to summarise: when inside the circle of convergence, we can multiply power series in
the natural way, and the result will be as we expect. Using this corollary, we could prove
exp(a + b) = exp(a) exp(b) directly (though admittedly the other proof is much faster).

§5.3.2 The Trigonometric Functions


We can use the exponential function to define the familiar trigonometric functions.

Definition 5.13 (Sine & Cosine)


We define the sine and cosine functions by the power series

sin z = (e^{iz} − e^{−iz})/(2i) = z − z³/3! + z⁵/5! − z⁷/7! + · · ·
cos z = (e^{iz} + e^{−iz})/2 = 1 − z²/2! + z⁴/4! − z⁶/6! + · · ·

where z ∈ C.

Stating these definitions in terms of the exponential function make lots of identities quite
easy to derive.

Proposition 5.14 (Properties of sin and cos)


Let z, w ∈ R. Then the following hold.
(i) sin′z = cos z, cos′z = −sin z.
(ii) sin2 z + cos2 z = 1.
(iii) | sin z|, | cos z| ≤ 1.
(iv) cos(z + w) = cos z cos w − sin z sin w.
(v) sin(z + w) = sin z cos w + cos z sin w.

Proof. We prove each individually, mainly by computation.
(i) For sin we have sin′z = ((e^{iz} − e^{−iz})/(2i))′ = (e^{iz} + e^{−iz})/2 = cos z, and for cos
we have cos′z = ((e^{iz} + e^{−iz})/2)′ = i(e^{iz} − e^{−iz})/2 = −(e^{iz} − e^{−iz})/(2i) = −sin z.
(ii) sin²z + cos²z = −(e^{2iz} + e^{−2iz} − 2)/4 + (e^{2iz} + e^{−2iz} + 2)/4 = 1, as required.
(iii) This follows from the above result, since for real z both sin²z and cos²z are
non-negative and sum to 1.
(iv) Expanding the right hand expression gives

cos z cos w − sin z sin w = ((e^{iz} + e^{−iz})(e^{iw} + e^{−iw}))/4 + ((e^{iz} − e^{−iz})(e^{iw} − e^{−iw}))/4
                          = (e^{i(z+w)} + e^{−i(z+w)})/2
                          = cos(z + w),

where the sign change on the second term comes from (2i)² = −4.
(v) Differentiating the above result with respect to z, we have −sin z cos w −
cos z sin w = −sin(z + w), giving us our desired result.

One of the more notable properties of sin and cos is that they are periodic, which is what
we are going to show next. To establish this, we can try evaluating one of them (say cos)
at some values near where we expect a root to be (guided by our previous knowledge of
these functions).

Lemma 5.15 (Smallest Root of cos)


There is a smallest positive x such that cos(x) = 0.

Proof. We begin by computing the sign of the derivative over (0, 2). Since cos′(x) =
−sin(x), and for 0 < x < 2 we have the inequality x^{2n−1}/(2n − 1)! > x^{2n+1}/(2n + 1)!
(as x² < (2n)(2n + 1) for n ≥ 1), pairing consecutive terms gives the bound

sin x = x − x³/3! + x⁵/5! − x⁷/7! + · · · > 0,

so cos has a negative derivative and must be decreasing on (0, 2).
Evaluating the power series for cos at √2 and √3, we can see that the function
changes sign:

cos √2 = 1 − 2/2! + 2²/4! − 2³/6! + · · · > 0,  cos √3 = 1 − 3/2! + 3²/4! − 3³/6! + · · · < 0.

(In each case, after the first few terms the series alternates with strictly decreasing
magnitudes, so the sign is determined by the initial terms.) Then by the intermediate
value theorem, there must be a root in the interval (√2, √3). Also since cos is
decreasing on (0, 2) and √3 < 2, this must also be the smallest positive root.
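The proof above is entirely constructive, so we can locate this root numerically. Below is a small Python sketch (ours) which evaluates the power series for cos and bisects on [√2, √3], using the sign change and monotonicity established in the proof:

```python
def cos_series(x, terms=30):
    # Partial sum of cos x = sum (-1)^n x^(2n) / (2n)!
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total

lo, hi = 2**0.5, 3**0.5
for _ in range(60):
    mid = (lo + hi) / 2
    if cos_series(mid) > 0:
        lo = mid          # cos is positive and decreasing: the root lies to the right
    else:
        hi = mid
print(lo)                 # ~1.5707963..., i.e. pi/2
```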

The existence of this root gives us a way to define π.

Definition 5.16 (π)


π is defined so that π/2 is the smallest positive real number with cos(π/2) = 0.

We can easily determine the value of sin(π/2) too. We have sin²(π/2) + cos²(π/2) = 1, so
sin²(π/2) = 1, and knowing from the proof of Lemma 5.15 that sin(x) > 0 on (0, 2) (and
π/2 < √3 < 2), we can deduce that sin(π/2) = 1. These results are all we need to show
that sin and cos are periodic, and also that the two functions are just translated
versions of each other.


Proposition 5.17 (Periodicity of sin and cos)


The functions sin and cos are periodic with a period of 2π, and are related as follows.

sin(z + π/2) = cos z, cos(z + π/2) = − sin z,


sin(z + π) = − sin z, cos(z + π) = − cos z,
sin(z + 2π) = sin z, cos(z + 2π) = cos z.

Proof. This follows directly from sin(π/2) = 1, cos(π/2) = 0 and the angle addition
formulas.

Remark (Periodicity of exp). If we use the power series for sin and cos, we can write
exp(iz) = cos z + i sin z, from which the above result implies that z ↦ exp(iz) has
period 2π — equivalently, exp is periodic with period 2πi.
With sin and cos now defined properly, we can then define tan, sec, csc and all of
those fun functions in the normal way.

§5.3.3 The Hyperbolic Functions


The last set of functions we will touch on is the (probably) slightly less familiar hyperbolic
functions. As with the standard trigonometric functions, we can state the definitions
using the exponential function.

Definition 5.18 (Hyperbolic Sine & Cosine)


We define the hyperbolic sine and hyperbolic cosine functions by the power
series

sinh z = (e^z − e^{−z})/2 = z + z³/3! + z⁵/5! + z⁷/7! + · · ·
cosh z = (e^z + e^{−z})/2 = 1 + z²/2! + z⁴/4! + z⁶/6! + · · ·

where z ∈ C.

Remark. The basic hyperbolic and trigonometric functions are related by cosh z =
cos(iz) and sinh z = −i sin(iz). Remembering this can allow you to adapt facts about
sin and cos to facts about sinh and cosh.
We will state a few properties for convenience.

Proposition 5.19 (Properties of sinh & cosh)


Let z, w ∈ R. Then the following hold.
(i) sinh′z = cosh z, cosh′z = sinh z.
(ii) cosh2 z − sinh2 z = 1.
(iii) sinh(z + w) = sinh z cosh w + cosh z sinh w.
(iv) cosh(z + w) = cosh z cosh w + sinh z sinh w.


Proof. We will use the observation made in the above remark.
(i) For sinh we have sinh′z = [−i sin(iz)]′ = cos(iz) = cosh z, and for cosh we
have cosh′z = [cos(iz)]′ = −i sin(iz) = sinh z.
(ii) cosh²z − sinh²z = cos²(iz) − (−i sin(iz))² = cos²(iz) + sin²(iz) = 1.
(iii) sinh(z + w) = −i sin(iz + iw) = −i[sin(iz) cos(iw) + cos(iz) sin(iw)] =
sinh z cosh w + cosh z sinh w.
(iv) cosh(z + w) = cos(iz + iw) = cos(iz) cos(iw) − sin(iz) sin(iw) =
cosh z cosh w + sinh z sinh w.


§6 Integration
We are now prepared, both technically and emotionally, to talk about integrals. The
idea behind integration is to assign some sort of 'area' to sets. The first issue we run
into is that it is not quite clear what 'area' actually means (in mathematical terms), so
to avoid this difficulty, we will instead define the notion of an integral (specifically the
Riemann integral; there are other types) and we will use this as our definition of area
(not vice versa).
Throughout this section, we will deal with the integration of bounded functions defined
on bounded intervals, f : [a, b] → R. We will discuss a slight generalisation later on.

§6.1 Dissections, Upper & Lower Sums, and Riemann Integrals


As with most ideas in analysis, integration depends fundamentally on a ‘limiting process’.
For Riemann integrals, this is conducted by considering dissections of the interval in
which we will be integrating over.

Definition 6.1 (Dissection)


A dissection D of a finite interval [a, b] is a finite subset of [a, b] containing the
endpoints. We typically write D = {x0 , x1 , x2 , . . . , xn } with a = x0 ≤ x1 ≤ · · · ≤
xn = b.

For a given dissection, because we are dealing with bounded functions we can sensibly
define the lower and upper sums of a function as follows.

Definition 6.2 (Upper & Lower Sums)


Let f : [a, b] → R be a bounded function, and let D = {x0, x1, . . . , xn} be a dissection
of [a, b]. We define the upper sum S_D(f) and lower sum s_D(f) by

S_D(f) = Σ_{j=1}^n (x_j − x_{j−1}) sup_{x∈[x_{j−1},x_j]} f(x),
s_D(f) = Σ_{j=1}^n (x_j − x_{j−1}) inf_{x∈[x_{j−1},x_j]} f(x).

(Figure: a dissection x0 < x1 < · · · < x4 of an interval, with sup_{x∈[x1,x2]} f(x) and
inf_{x∈[x1,x2]} f(x) shown as the heights of the upper and lower rectangles over [x1, x2].)

Let’s make some observations. By construction we have SD (f ) ≥ sD (f ), and we can also


see intuitively that replacing a dissection by a more elaborate one will decrease the upper


sum and increase the lower sum. Indeed, if we say that D′ refines D when D′ ⊇ D,
then we can formalise this intuition as follows.

Lemma 6.3 (Refinement Lemma)


If f : [a, b] → R is a bounded function and D, D′ are dissections of [a, b] such that
D ⊆ D′, then
s_D(f) ≤ s_{D′}(f) ≤ S_{D′}(f) ≤ S_D(f).

Proof. It suffices to show that this holds when D′ and D differ by a single element
(since then we can just repeatedly apply this argument to obtain the general case).
Suppose that D′ = {x0, . . . , xn}, and that D = D′\{x_i} where i ≠ 0, n. Then let

A = sup_{x∈[x_{i−1},x_i]} f(x), and B = sup_{x∈[x_i,x_{i+1}]} f(x).

Noting that sup_{x∈[x_{i−1},x_{i+1}]} f(x) = max{A, B}, we can then establish the right hand
side of the inequality with

S_D(f) − S_{D′}(f) = (x_{i+1} − x_{i−1}) max{A, B} − (x_i − x_{i−1})A − (x_{i+1} − x_i)B
                   ≥ max{A, B}[(x_{i+1} − x_{i−1}) − (x_i − x_{i−1}) − (x_{i+1} − x_i)] = 0.

Then the left hand side of the inequality follows similarly (taking infima).

A direct consequence of this lemma is that lower sums can never exceed upper sums.

Lemma 6.4 (Key Integration Property)


If f : [a, b] → R is a bounded function and D1, D2 are dissections of [a, b] then

s_{D1}(f) ≤ S_{D2}(f).

Proof. This follows from the previous lemma by noting that s_{D1}(f) ≤ s_{D1∪D2}(f) ≤
S_{D1∪D2}(f) ≤ S_{D2}(f).

We are now (finally) ready to say what it means for a function to be Riemann integrable.

Definition 6.5 (Upper & Lower Integrals)


For a bounded function f : [a, b] → R we define the upper integral as I*(f) =
inf_D S_D(f) and the lower integral as I_*(f) = sup_D s_D(f).

Note that Lemma 6.4, along with the boundedness of the function, guarantees that the
upper and lower integrals always exist.

Definition 6.6 (Riemann Integrable)


If f : [a, b] → R is a bounded function and I*(f) = I_*(f), then we say f is Riemann
integrable and write

∫_a^b f(x) dx = I*(f).


So this is what it means for a function to be Riemann integrable15 . In general using


this definition directly on a function to show that it’s integrable is quite tricky. As an
example, we can attempt to show that f (x) = x is integrable.

Example 6.7 (f (x) = x is Integrable)


Consider the function f(x) = x defined on the interval [a, b]. Then for any dissection
D = {x0, x1, . . . , xn} of [a, b] we have the upper sum given by

S_D(f) = Σ_{j=1}^n (x_j − x_{j−1})x_j
       = Σ_{j=1}^n (x_j²/2 − x_{j−1}²/2) + Σ_{j=1}^n (x_j²/2 + x_{j−1}²/2 − x_{j−1}x_j)
       = (b²/2 − a²/2) + (1/2)Σ_{j=1}^n (x_j − x_{j−1})².

If x_j − x_{j−1} < δ for all j, then we can obtain the upper bound

S_D(f) ≤ (b²/2 − a²/2) + (δ/2)Σ_{j=1}^n (x_j − x_{j−1}) = (b²/2 − a²/2) + (δ/2)(b − a).

So given ε > 0, there is some dissection D such that S_D(f) ≤ (b² − a²)/2 + ε, and
a similar argument also gives s_D(f) ≥ (b² − a²)/2 − ε. So by our key integration
property, we have

(b² − a²)/2 − ε ≤ I_*(f) ≤ I*(f) ≤ (b² − a²)/2 + ε

for any ε > 0, and thus I*(f) = I_*(f) = (b² − a²)/2. This shows that f(x) = x is
integrable over [a, b], with

∫_a^b x dx = b²/2 − a²/2.
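The squeezing in this example is easy to watch numerically. Here is a short Python sketch (ours) computing the upper and lower sums for evenly spaced dissections:

```python
a, b = 1.0, 3.0
for n in [10, 100, 1000]:
    xs = [a + j * (b - a) / n for j in range(n + 1)]
    # f(x) = x is increasing, so on [x_{j-1}, x_j] the sup is x_j and the inf is x_{j-1}.
    upper = sum((xs[j] - xs[j - 1]) * xs[j] for j in range(1, n + 1))
    lower = sum((xs[j] - xs[j - 1]) * xs[j - 1] for j in range(1, n + 1))
    print(n, lower, upper, (b * b - a * a) / 2)
# Both sums squeeze down to (b^2 - a^2)/2 = 4 as the dissection refines.
```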

There are a few ways that we can streamline this argument. Later on we will develop
more general integrability results, but the following criterion is also quite useful for
showing particular functions are integrable.

Theorem 6.8 (Riemann’s Integrability Criterion)


Let f : [a, b] → R be a bounded function. Then f is Riemann integrable if and only
if given any ε > 0, we can find a dissection D with

S_D(f) − s_D(f) < ε.

15
We will regularly drop the ‘Riemann’, and when referring to ‘integrability’ we will always be talking
only about Riemann integrability.


Proof. If f is Riemann integrable, then given ε > 0 there must exist dissections D1,
D2 such that

S_{D1}(f) < ∫_a^b f(x) dx + ε/2 and s_{D2}(f) > ∫_a^b f(x) dx − ε/2.

Then taking D = D1 ∪ D2 we have S_D(f) ≤ S_{D1}(f), and s_D(f) ≥ s_{D2}(f), and thus

S_D(f) − s_D(f) ≤ S_{D1}(f) − s_{D2}(f) < ε.

Otherwise, if for any ε > 0 we can find a dissection D such that S_D(f) − s_D(f) < ε,
then we would have

inf_{D′} S_{D′}(f) − sup_{D′} s_{D′}(f) ≤ S_D(f) − s_D(f) < ε,

that is, I*(f) − I_*(f) < ε for all ε > 0. So we must have I*(f) = I_*(f), implying f
is Riemann integrable.

§6.2 Properties of Integration


We now need to prove many of the properties of integrals that you are (most likely)
already familiar with. The first result we would like to prove is that an arbitrary linear
combination of integrable functions is integrable, but we will need to prove some smaller
results first.
The proofs below are a little bit fiddly, but they all follow the same basic idea: sim-
plify the problem as much as possible, think a bit about what’s going on with the
suprema/infima, and use our key property of integration, Riemann’s criterion and so on.

Proposition 6.9 (Adding Integrable Functions)


If f, g : [a, b] → R are Riemann integrable, then so is f + g, and

∫_a^b (f(x) + g(x)) dx = ∫_a^b f(x) dx + ∫_a^b g(x) dx.

Proof. We begin by noting the general fact

sup_{x∈[x_{j−1},x_j]} (f(x) + g(x)) ≤ sup_{x∈[x_{j−1},x_j]} f(x) + sup_{x∈[x_{j−1},x_j]} g(x),

which directly implies that S_D(f + g) ≤ S_D(f) + S_D(g). A similar fact about infima
implies s_D(f + g) ≥ s_D(f) + s_D(g).
Now let D1 and D2 be dissections, and let D = D1 ∪ D2. Then we have

S_D(f + g) ≤ S_D(f) + S_D(g) ≤ S_{D1}(f) + S_{D2}(g),

which implies that I*(f + g) ≤ I*(f) + I*(g). Similarly we have I_*(f + g) ≥
I_*(f) + I_*(g).
Since f and g are integrable, their upper and lower integrals are equal, so combining
the two previous inequalities gives

∫_a^b f(x) dx + ∫_a^b g(x) dx ≤ I_*(f + g) ≤ I*(f + g) ≤ ∫_a^b f(x) dx + ∫_a^b g(x) dx,

so I_*(f + g) = I*(f + g), showing f + g is integrable, with the integral given by
∫_a^b (f(x) + g(x)) dx = ∫_a^b f(x) dx + ∫_a^b g(x) dx.

Proposition 6.10 (Negating Integrable Functions)


If f : [a, b] → R is Riemann integrable, then so is −f, and

∫_a^b (−f(x)) dx = −∫_a^b f(x) dx.

Proof. Let D be a dissection of [a, b]. Then since sup_x(−f(x)) = −inf_x f(x), we have

S_D(−f) = Σ_{j=1}^n (x_j − x_{j−1}) sup_{x∈[x_{j−1},x_j]} (−f(x))
        = −Σ_{j=1}^n (x_j − x_{j−1}) inf_{x∈[x_{j−1},x_j]} f(x)
        = −s_D(f).

Similarly s_D(−f) = −S_D(f). This then implies that I*(−f) = −I_*(f) and I_*(−f) =
−I*(f). But then f is integrable, so I*(−f) = I_*(−f) = −I*(f). Thus −f is also
integrable, and ∫_a^b (−f(x)) dx = −∫_a^b f(x) dx.

Proposition 6.11 (Scaling Integrable Functions)


If λ ∈ R and f : [a, b] → R is Riemann integrable, then so is λf, and

∫_a^b λf(x) dx = λ∫_a^b f(x) dx.

Proof. Without loss of generalitya, we may assume that λ ≥ 0. Then if D is a
dissection of [a, b] we have

S_D(λf) = Σ_{j=1}^n (x_j − x_{j−1}) sup_{x∈[x_{j−1},x_j]} λf(x) = λS_D(f),

and similarly s_D(λf) = λs_D(f). Then I*(λf) = λI*(f) and I_*(λf) = λI_*(f). Since
f is integrable, we then have I*(λf) = I_*(λf) = λI*(f), so λf is integrable and
∫_a^b λf(x) dx = λ∫_a^b f(x) dx.
a
If λ < 0, we can consider |λ|f; then by the previous proposition −|λ|f = λf will be integrable.

To summarise these results: if f, g : [a, b] → R are integrable and λ, μ ∈ R then λf + μg
is integrable and

∫_a^b (λf(x) + μg(x)) dx = λ∫_a^b f(x) dx + μ∫_a^b g(x) dx.

Let’s now turn our attention16 to another set of integral properties which are still natural
but probably ever-so-slightly less familiar: integral inequalities. The first result is one
you’d expect to be true.

Proposition 6.12 (Basic Integral Inequality)


If f, g : [a, b] → R are Riemann integrable and f(x) ≤ g(x) for all x ∈ [a, b], then

∫_a^b f(x) dx ≤ ∫_a^b g(x) dx.

Proof. If f ≤ g then for all dissections D we have S_D(f) ≤ S_D(g), and hence
∫_a^b f(x) dx = inf_D S_D(f) ≤ inf_D S_D(g) = ∫_a^b g(x) dx.

The next result shows that |f | is integrable if f is, and also something like a triangle
inequality for integrals.

Proposition 6.13 (Integral Triangle Inequality)


If f : [a, b] → R is Riemann integrable, then so is |f|, and

|∫_a^b f(x) dx| ≤ ∫_a^b |f(x)| dx.

Proof. We define f₊(x) = max{f(x), 0}. We will first show that f₊ is integrable.
For some interval I we have

sup_{x∈I} f₊(x) = max{0, sup_{x∈I} f(x)},  inf_{x∈I} f₊(x) = max{0, inf_{x∈I} f(x)},

and then considering the cases sup_{x∈I} f(x) ≥ 0 and sup_{x∈I} f(x) ≤ 0, we find that
the following inequality holds:

sup_{x∈I} f₊(x) − inf_{x∈I} f₊(x) = max{0, sup_{x∈I} f(x)} − max{0, inf_{x∈I} f(x)}
                                  ≤ sup_{x∈I} f(x) − inf_{x∈I} f(x).

Then since f is integrable, given ε > 0 there is a dissection D such that

S_D(f₊) − s_D(f₊) ≤ S_D(f) − s_D(f) < ε,

and thus f₊ is integrable. We can write |f| = 2f₊ − f, and thus |f| is also integrable.
16
Don’t get too sad – we still have to show that the product of integrable functions is integrable, but
we will come back to that once we have shown |f | is integrable

We also have −|f| ≤ f ≤ |f|, so |∫_a^b f(x) dx| ≤ ∫_a^b |f(x)| dx.

We can use this property to prove that the product fg of two integrable functions is
integrable. To do this, the expansion 2fg = (f + g)² − f² − g² can be used, so all we
need to show is that f² is integrable if f is. Also, since f² = |f| · |f|, we can work only
with the case that f is non-negative.
Unlike some of the previous results, we will not write down what the value of this integral
will be (there’s not really a nice rule). We will only be proving integrability.

Proposition 6.14 (Multiplying Integrable Functions)


If f, g : [a, b] → R are Riemann integrable, then so is f g.

Proof. We will first show that if f : [a, b] → R is an integrable function such that
f(x) ≥ 0 for all x ∈ [a, b], then f² is integrable.
Since f is integrable (and hence bounded), we can choose some K > 0 such that
|f(x)| ≤ K for all x ∈ [a, b]. Then given any ε > 0, there exists some dissection D
where S_D(f) − s_D(f) < ε/2K.
Using D, we define M_j = sup_{x∈[x_{j−1},x_j]} f(x) and m_j = inf_{x∈[x_{j−1},x_j]} f(x). Then we
have

S_D(f²) − s_D(f²) = Σ_{j=1}^n (x_j − x_{j−1})(M_j² − m_j²)
                  = Σ_{j=1}^n (x_j − x_{j−1})(M_j + m_j)(M_j − m_j)
                  ≤ 2K(S_D(f) − s_D(f)) < ε,

showing that f² is integrable.
Since f² = |f| · |f|, and |f| ≥ 0, this argument shows that f² is integrable for
any integrable function f. Using this and previous properties then shows that
[(f + g)² − f² − g²]/2 = fg is integrable.

§6.3 Integrable Functions


At this point we have our definition of integrability, along with some basic properties
of integrals. However, we have so far only really seen that f(x) = x and f(x) = |x|
are integrable. In this section we will prove some results that expand this list quite
widely.
We have so far only looked at bounded functions, and a natural question is whether
boundedness is a sufficient criterion for integrability. This sadly is not the case, and
there is a standard counterexample due to Dirichlet.

Example 6.15 (A Bounded Non-Integrable Function)


The function f : [0, 1] → R given by

f(x) = 1 if x ∈ Q,  f(x) = 0 if x ∉ Q,

is not Riemann integrable.
To see this, note that for any dissection D, since both the rationals and the
irrationals are dense in R, we have s_D(f) = 0 and S_D(f) = 1, so I_*(f) = 0 ≠ 1 = I*(f).

So let’s look at some types of integrable functions. The first is that monotonic functions
are integrable, which can be shown by taking an evenly spaced dissection causing the
sum to telescope.

Proposition 6.16 (Monotonic Functions are Integrable)


Let f : [a, b] → R be monotonic. Then f is Riemann integrable.

Proof. Without loss of generality, assume that f is increasing. Then given ε > 0,
for any integer n > (b − a)(f(b) − f(a))/ε we can define the dissection D =
{x0, x1, . . . , xn} of [a, b], where x_j = a + j(b − a)/n. Since f is increasing, on each
subinterval the sup is f(x_j) and the inf is f(x_{j−1}), so we have

S_D(f) − s_D(f) = Σ_{j=1}^n (x_j − x_{j−1})[f(x_j) − f(x_{j−1})]
               = ((b − a)/n) Σ_{j=1}^n [f(x_j) − f(x_{j−1})]
               = (b − a)(f(b) − f(a))/n < ε,

and thus f is integrable by Riemann's criterion.

An interesting consequence of this is that if we can write f(x) = f1(x) − f2(x) where f1, f2
are increasing, then f is integrable. Such functions are known as 'functions of bounded
variation'.
The next result we would like to show is that every continuous function on a closed
bounded interval is Riemann integrable. Before jumping into the proof, let's take a moment
to reflect on what might be needed17. For some continuous function, we want to get
S_D(f) − s_D(f) < ε with some dissection. What this comes down to is finding some way
to bound

sup_{x∈I} f(x) − inf_{x∈I} f(x)

over each sufficiently small interval I. Now we know that f is continuous, and continuity
tells us that if the interval is of size at most some δ (which can depend on
where the interval is) then we can get |f(x) − f(y)| < ε. This works for one interval,
but we need to construct a dissection, so we would need to make this type of argument
work for intervals which cover the whole of [a, b].
What would be really nice is if we could find some δ that would work no matter where
17
This is not the only possible proof! You can prove things without uniform continuity, but this is a
more natural (and more slick) argument.


the sub-interval in [a, b] is. If this was the case, then the dissection could just be made
up of points which are about a distance of δ apart. It turns out that this is always
possible, and gives us the notion of uniform continuity.18

Definition 6.17 (Uniform Continuity)


Let A ⊂ R and f : A → R. We say that f is uniformly continuous on A if given
any ε > 0 we can find a δ > 0 such that for x, y ∈ A, whenever |x − y| < δ we have
|f(x) − f(y)| < ε.

Note that unlike the previous definition, we have no knowledge of where in the set A
the points x and y are, just how far apart they are. It turns out that on closed bounded
intervals (like what we have when we talk about integration, at least in this section),
continuous functions are always uniformly continuous. The proof of this is quite natural
– we use Bolzano-Weierstrass to show that a contradiction to uniform continuity is a
contradiction to continuity.

Theorem 6.18 (Continuous Functions are Uniformly Continuous)


Let f : [a, b] → R be continuous. Then f is uniformly continuous.

Proof. Suppose f was not uniformly continuous; that is, there exists some ε > 0
such that for all δ > 0 there are some x, y with |x − y| < δ such that |f(x) − f(y)| ≥ ε.
Taking δ = 1/n, we can find sequences x_n, y_n ∈ [a, b] such that |x_n − y_n| < 1/n,
and |f(x_n) − f(y_n)| ≥ ε. Then by Bolzano–Weierstrass, we can find some convergent
subsequence x_{n_j} → x for some x ∈ [a, b]. But then |x_{n_j} − y_{n_j}| < 1/n_j for all j, so
we must have y_{n_j} → x also.
Then |f(x_{n_j}) − f(y_{n_j})| ≥ ε for every j, which implies that f(x_{n_j}) and f(y_{n_j}) cannot
converge to the same value. But then by continuity we have f(x_{n_j}) → f(x) and
f(y_{n_j}) → f(x), which is a contradiction. Thus f must be uniformly continuous.
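To get a feel for what uniform continuity rules out, consider f(x) = 1/x on (0, 1], which is continuous but not uniformly continuous — note the domain is not a closed bounded interval, so the theorem above does not apply. The sketch below (ours) computes, for ε = 1, the largest usable δ at each point; it shrinks to 0 as x → 0, so no single δ works:

```python
# For f(x) = 1/x we have |1/(x+h) - 1/x| = h/(x(x+h)), which is < 1
# exactly when h < x^2/(1 - x) (for 0 < x < 1). So the delta that works
# at the point x for eps = 1 shrinks like x^2 as x -> 0.
for x in [0.5, 0.1, 0.01, 0.001]:
    delta = x * x / (1 - x)
    print(x, delta)
```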

With this result, we can prove that continuous functions are integrable in the way we
discussed previously.

Theorem 6.19 (Continuous Functions are Integrable)


Let f : [a, b] → R be continuous. Then f is Riemann integrable.

Proof. Given ε > 0, since f is uniformly continuous, there is some δ > 0 such that
|f(x) − f(y)| < ε/(b − a) whenever |x − y| < δ and x, y ∈ [a, b].
Now choose some integer n > (b − a)/δ, and define the dissection D = {x0, x1, . . . , xn}
with x_j = a + j(b − a)/n. Each subinterval then has length (b − a)/n < δ, so we have

sup_{x∈[x_{j−1},x_j]} f(x) − inf_{x∈[x_{j−1},x_j]} f(x) ≤ ε/(b − a),

18
The ‘uniform’ part of the definition is part of a larger class of definitions that you will come across in
later analysis courses.


for all 1 ≤ j ≤ n, and thus

S_D(f) − s_D(f) = Σ_{j=1}^n (x_j − x_{j−1})[sup_{x∈[x_{j−1},x_j]} f(x) − inf_{x∈[x_{j−1},x_j]} f(x)]
               ≤ Σ_{j=1}^n ((b − a)/n) · (ε/(b − a)) = ε,

and thus f is Riemann integrable.

This theorem shows that most of the functions we deal with regularly (and all of the
ones discussed in the previous chapter) are integrable as well as continuous.
Also, we have so far spent relatively little time on the ‘evaluating integrals’ side of
things, but knowing a function is integrable can help in determining the actual value of
the integral. A straightforward example is shown below.

Example 6.20 (Integral of f (x) = 1)


Consider the function f : [a, b] → R defined by f(x) = 1. This function is continuous
and hence integrable. Then taking the dissection D = {a, b}, we have S_D(f) =
s_D(f) = b − a, so the value of the integral is

∫_a^b 1 dx = b − a.

§6.4 The Fundamental Theorem of Calculus


With all of that out of the way, we can now prove a central result in the theory of
integration: the fundamental theorem of calculus. The fundamental theorem of calculus
connects the theory that we built up relating to differentiation and integration, showing
that (in some sense) they are ‘inverses’.

Theorem 6.21 (Fundamental Theorem of Calculus)


Let f : [a, b] → R be continuous. For x ∈ [a, b], we define

F(x) = ∫_a^x f(t) dt.

Then F is differentiable, with derivative F′(x) = f(x) for every x.

Proof. For h ≠ 0 (with x + h ∈ [a, b]), we note that

(F(x + h) − F(x))/h = (1/h)∫_x^{x+h} f(t) dt.

Now given ε > 0, since f is continuous at x there is some δ > 0 such that when
|x − y| < δ we have |f(x) − f(y)| < ε. Hence if 0 < |h| < δ we have

|(1/h)∫_x^{x+h} f(t) dt − f(x)| = |(1/h)∫_x^{x+h} (f(t) − f(x)) dt|
                                ≤ (1/|h|)|∫_x^{x+h} |f(t) − f(x)| dt|
                                ≤ (1/|h|) · ε|h| = ε,

hence lim_{h→0} (F(x + h) − F(x))/h = f(x), as required.
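A numerical sketch of the theorem (ours; f = cos is an arbitrary continuous choice): approximate F by Riemann sums, then check that a finite difference of F recovers f.

```python
import math

f, a = math.cos, 0.0

def F(x, n=100_000):
    # Left Riemann sum approximating the integral of f from a to x.
    w = (x - a) / n
    return sum(f(a + i * w) for i in range(n)) * w

x, h = 1.0, 1e-4
print((F(x + h) - F(x - h)) / (2 * h), f(x))   # both are close to cos(1)
```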

Another way to view this theorem (which is quite useful in evaluating integrals) is the
corollary below.

Corollary 6.22 (Fundamental Theorem of Calculus – Again)


Suppose that f : [a, b] → R has a continuous derivative. Then

∫_a^b f′(t) dt = f(b) − f(a).

Proof. Define g(x) = ∫_a^x f′(t) dt. Then by the fundamental theorem of calculus,
g′(x) = f′(x). Also g(a) = 0.
Let h(x) = g(x) − f(x). Then h′(x) = 0, so by the mean value theorem, h is
constant. Evaluating at a gives h(x) = h(a) = g(a) − f(a) = −f(a), that is,
g(x) − f(x) = −f(a) for all x. So if x = b, this becomes g(b) = f(b) − f(a), which is
our desired result.

One of the most useful applications of the fundamental theorem of calculus is that we
can translate results about derivatives into results about integrals. This works both in
finding integrals of particular functions (by thinking about which functions result in a
given function when differentiated), and in the development of general methods for
evaluating integrals. Two results we will look at are integration by parts, which comes
from the product rule of differentiation, and integration by substitution, which comes
from the chain rule.

Corollary 6.23 (Integration By Parts)


Suppose that the derivatives f′ and g′ exist and are continuous on [a, b]. Then

∫_a^b f′(x)g(x) dx = f(b)g(b) − f(a)g(a) − ∫_a^b f(x)g′(x) dx.

Proof. By the product rule, (fg)′ = f′g + fg′. Then by the fundamental theorem of
calculus we have f(b)g(b) − f(a)g(a) = ∫_a^b f′(x)g(x) dx + ∫_a^b f(x)g′(x) dx.

Corollary 6.24 (Integration By Substitution)

Let g : [α, β] → [a, b] with g(α) = a and g(β) = b, and let g' exist and be continuous on [α, β]. Also let f : [a, b] → R be continuous. Then
$$\int_a^b f(x)\,dx = \int_\alpha^\beta f(g(t))\,g'(t)\,dt.$$

Proof. Set $F(x) = \int_a^x f(s)\,ds$, and let h(t) = F(g(t)), which is defined since g takes values in [a, b]. By the fundamental theorem of calculus, F' = f, so
$$\int_\alpha^\beta f(g(t))\,g'(t)\,dt = \int_\alpha^\beta F'(g(t))\,g'(t)\,dt.$$
Then by the chain rule, F'(g(t))g'(t) = h'(t), which is continuous, so by Corollary 6.22 we have
$$\int_\alpha^\beta F'(g(t))\,g'(t)\,dt = \int_\alpha^\beta h'(t)\,dt = h(\beta) - h(\alpha) = F(b) - F(a) = \int_a^b f(x)\,dx.$$
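The same kind of spot check works for substitution. Below (again a sketch of my own) we take g(t) = t² mapping [0, 2] onto [0, 4] and f(x) = √x, so both sides should be 16/3.

```python
# Checking integration by substitution numerically: with g(t) = t^2 and
# f(x) = sqrt(x), Corollary 6.24 says ∫_0^4 sqrt(x) dx = ∫_0^2 sqrt(t^2)·2t dt.
import math

def integrate(h, a, b, steps=100_000):
    """Midpoint-rule approximation to the integral of h over [a, b]."""
    step = (b - a) / steps
    return step * sum(h(a + (i + 0.5) * step) for i in range(steps))

lhs = integrate(math.sqrt, 0.0, 4.0)
rhs = integrate(lambda t: math.sqrt(t * t) * 2.0 * t, 0.0, 2.0)
print(lhs, rhs)  # both approximately 16/3 ≈ 5.333
```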

§6.5 Taylor’s Theorem Again


With some results now built up around integration, we are going to spend the next two
sections discussing two applications of integration to topics we have discussed earlier.
The first will be revisiting Taylor’s theorem, and after that we will look back at the
convergence of infinite series.
A nice application of integration by parts (and by extension the fundamental theorem of calculus) is in proving Taylor's theorem. You may recall from our discussion of differentiation that the proof of Taylor's theorem with Lagrange remainder was somewhat cumbersome. Using integration, we can derive another form of Taylor's theorem, this time with the ‘integral remainder’. This form can then even be used to recover Lagrange's form of the remainder, under a slightly stronger assumption about our function: we will need $f^{(n)}$ to be continuous, not just to exist. (This is why we don't just throw out the old proof!)

Theorem 6.25 (Taylor's Theorem with Integral Remainder)

Let f be n-times continuously differentiable on [a, b]. Then we have
$$f(b) = f(a) + f'(a)(b - a) + \cdots + \frac{f^{(n-1)}(a)}{(n-1)!}(b - a)^{n-1} + \int_a^b \frac{f^{(n)}(t)}{(n-1)!}(b - t)^{n-1}\,dt.$$

Proof. We do induction on n. For n = 1, the result becomes $f(b) - f(a) = \int_a^b f'(t)\,dt$, which is true by the fundamental theorem of calculus.
Now if f is (n + 1)-times continuously differentiable, integrating by parts we have
$$\int_a^b \frac{f^{(n)}(t)}{(n-1)!}(b - t)^{n-1}\,dt = \left[-\frac{f^{(n)}(t)}{n!}(b - t)^n\right]_a^b + \int_a^b \frac{f^{(n+1)}(t)}{n!}(b - t)^n\,dt = \frac{f^{(n)}(a)}{n!}(b - a)^n + \int_a^b \frac{f^{(n+1)}(t)}{n!}(b - t)^n\,dt.$$
Thus if the result is true for n, it is also true for n + 1, so we are done by induction.
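As a sanity check (mine, not the course's), the statement can be verified numerically for f = exp on [0, 1] with n = 3: the polynomial part plus the integral remainder should reproduce e.

```python
# Numerically checking Taylor's theorem with integral remainder for f = exp,
# a = 0, b = 1, n = 3 (every derivative of exp is exp, so f^{(k)}(a) = 1).
import math

def integrate(h, a, b, steps=100_000):
    """Midpoint-rule approximation to the integral of h over [a, b]."""
    step = (b - a) / steps
    return step * sum(h(a + (i + 0.5) * step) for i in range(steps))

f = math.exp
a, b, n = 0.0, 1.0, 3

poly = sum(f(a) * (b - a) ** k / math.factorial(k) for k in range(n))
remainder = integrate(lambda t: f(t) * (b - t) ** (n - 1) / math.factorial(n - 1), a, b)
print(poly + remainder, f(b))  # both approximately e ≈ 2.71828
```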

With this result in hand, we can try to re-obtain the Lagrange form of the remainder. Looking back at that form, we want some way to take the $f^{(n)}(t)$ term ‘out of the integral’, turning it into a scale factor $f^{(n)}(c)$ for some c ∈ (a, b).
The way we worked with this type of idea before was using the mean value theorem, so
we are going to prove an analogue of the mean value theorem for integrals. To do this
we will (naturally) use the mean value theorem along with the fundamental theorem of
calculus.

Lemma 6.26 (Integral Mean Value Theorem)

Let f, g : [a, b] → R be continuous with g(x) ≠ 0 for all x ∈ (a, b). Then there exists c ∈ (a, b) such that
$$\int_a^b f(x)g(x)\,dx = f(c)\int_a^b g(x)\,dx.$$

Proof. Let $F(x) = \int_a^x f(t)g(t)\,dt$ and $G(x) = \int_a^x g(t)\,dt$. By Cauchy's mean value theorem, there exists some c ∈ (a, b) such that
$$(F(b) - F(a))\,G'(c) = F'(c)\,(G(b) - G(a)),$$
which by the fundamental theorem of calculus implies
$$\left(\int_a^b f(t)g(t)\,dt\right) g(c) = f(c)\,g(c)\int_a^b g(t)\,dt.$$
Then since g(c) ≠ 0, the result follows.
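Concretely (in a sketch of my own), the lemma says the ratio $(\int_a^b fg)/(\int_a^b g)$ is a value taken by f on (a, b), so it must lie between the minimum and maximum of f:

```python
# Illustrating the integral mean value theorem with f = exp and g(x) = 1 + x
# on [0, 1]: the ratio (∫ fg)/(∫ g) equals f(c) for some c, so since exp is
# increasing it must lie between f(0) = 1 and f(1) = e.
import math

def integrate(h, a, b, steps=100_000):
    """Midpoint-rule approximation to the integral of h over [a, b]."""
    step = (b - a) / steps
    return step * sum(h(a + (i + 0.5) * step) for i in range(steps))

a, b = 0.0, 1.0
f, g = math.exp, (lambda x: 1.0 + x)  # g is non-zero on (0, 1)

ratio = integrate(lambda x: f(x) * g(x), a, b) / integrate(g, a, b)
print(f(a) <= ratio <= f(b), ratio)  # True, with ratio ≈ 1.812
```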

And now we can write down a weaker form of Taylor’s theorem with Lagrange remainder.

Theorem 6.27 (Taylor's Theorem with Lagrange Remainder – Weaker)

Let f be n-times continuously differentiable on [a, b]. Then we have
$$f(b) = f(a) + f'(a)(b - a) + \cdots + \frac{f^{(n-1)}(a)}{(n-1)!}(b - a)^{n-1} + \frac{f^{(n)}(c)}{n!}(b - a)^n,$$
for some c ∈ (a, b).

Proof. By Taylor's theorem with integral remainder, we have
$$f(b) = f(a) + f'(a)(b - a) + \cdots + \frac{f^{(n-1)}(a)}{(n-1)!}(b - a)^{n-1} + \int_a^b \frac{f^{(n)}(t)}{(n-1)!}(b - t)^{n-1}\,dt.$$
Then the integral mean value theorem (applied with $g(t) = (b - t)^{n-1}/(n-1)!$, which is non-zero on (a, b)) gives us that
$$\int_a^b \frac{f^{(n)}(t)}{(n-1)!}(b - t)^{n-1}\,dt = f^{(n)}(c)\int_a^b \frac{(b - t)^{n-1}}{(n-1)!}\,dt = \frac{f^{(n)}(c)}{n!}(b - a)^n,$$
for some c ∈ (a, b), giving our result.

Now, to just emphasize one last time: this version of the result imposes an extra conti-
nuity condition, and does not replace the result and proof we had earlier. That said, it’s
still a perfectly good result on its own.

§6.6 Improper Integrals & The Integral Test


So far our definition of Riemann integration applies only to functions that are bounded on a closed interval. In this section we extend this definition in a natural way, allowing us to discuss the integration of functions that are not necessarily bounded, over intervals that may not be closed (and may be infinite).

Definition 6.28 (Improper Integrals)

Suppose f : [a, b) → R, where b ∈ R ∪ {∞}, is integrable (and bounded) on every interval [a, R] with a ≤ R < b, and
$$\int_a^R f(x)\,dx \to \ell \quad\text{as } R \to b.$$
Then we say that the improper integral $\int_a^b f(x)\,dx$ converges, and that its value is ℓ. Otherwise, we say that it diverges.

Let’s have a look at some examples.

Example 6.29 (Integral of 1/x^k from 1 to ∞)

We will show that the improper integral $\int_1^\infty \frac{1}{x^k}\,dx$ converges if and only if k > 1.
If k ≠ 1, then considering the integral of $1/x^k$ over [1, R], we have
$$\int_1^R \frac{1}{x^k}\,dx = \left[\frac{x^{1-k}}{1-k}\right]_1^R = \frac{R^{1-k} - 1}{1-k},$$
which tends to a finite limit as R → ∞, namely 1/(k − 1), if and only if k > 1.
Otherwise, if k = 1, then over [1, R] we have
$$\int_1^R \frac{1}{x}\,dx = \log R,$$
which does not tend to a finite limit as R → ∞. Thus the integral converges if and only if k > 1.



Example 6.30 (Integral of 1/√x from 0 to 1)

We will show that the improper integral $\int_0^1 \frac{1}{\sqrt{x}}\,dx$ converges.
For any δ > 0, we have
$$\int_\delta^1 \frac{1}{\sqrt{x}}\,dx = \big[2\sqrt{x}\big]_\delta^1 = 2 - 2\sqrt{\delta} \to 2 \quad\text{as } \delta \to 0,$$
and thus the improper integral converges to 2.
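Both examples can also be watched numerically. The sketch below (my own, using the same midpoint-rule helper as earlier) shows $\int_1^R x^{-k}\,dx$ settling down as R grows only when k > 1, and $\int_\delta^1 x^{-1/2}\,dx$ approaching 2 as δ shrinks.

```python
# Numerically illustrating the two improper integrals above (a rough sketch).
def integrate(h, a, b, steps=200_000):
    """Midpoint-rule approximation to the integral of h over [a, b]."""
    step = (b - a) / steps
    return step * sum(h(a + (i + 0.5) * step) for i in range(steps))

for R in (10.0, 100.0, 1000.0):
    print(R, integrate(lambda x: x ** -2.0, 1.0, R),   # k = 2: tends to 1
             integrate(lambda x: x ** -0.5, 1.0, R))   # k = 1/2: grows without bound

for delta in (1e-2, 1e-4, 1e-6):
    print(delta, integrate(lambda x: x ** -0.5, delta, 1.0))  # tends to 2
```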

Note the example above shows that it is possible to have an improper integral which converges even when the function is not bounded on the interval being integrated over. (It is also possible to have a function f that is unbounded on the interval [1, ∞) and yet $\int_1^\infty f(x)\,dx$ converges. If you haven't seen this before, try coming up with an example; as a hint, one way to do it is by drawing triangles in a somewhat clever way.)

The definition of improper integrals also gives us a sensible way to say whether $\int_{-\infty}^{\infty} f(x)\,dx$ converges. If for some a we have $\int_{-\infty}^{a} f(x)\,dx = \ell_1$ and $\int_a^{\infty} f(x)\,dx = \ell_2$, then we say that $\int_{-\infty}^{\infty} f(x)\,dx = \ell_1 + \ell_2$.

Remark (Warning). This is a strictly stronger notion than saying that $\int_{-R}^{R} f(x)\,dx$ converges to some limit as R → ∞. For instance, $\int_{-R}^{R} x\,dx = 0$ for every R, but $\int_{-\infty}^{\infty} x\,dx$ diverges, since neither $\int_{-\infty}^{0} x\,dx$ nor $\int_0^{\infty} x\,dx$ converges.
It is with improper integrals that we get the last application of integration to our previ-
ously discussed topics: infinite series.

Theorem 6.31 (The Integral Comparison Test)

If f : [1, ∞) → R is a non-negative decreasing function with f(x) → 0 as x → ∞, then
$$\sum_{n=1}^{\infty} f(n) \quad\text{and}\quad \int_1^{\infty} f(x)\,dx$$
either both diverge or both converge.

Proof. Since f is decreasing and non-negative, we have for n ≥ 2 that
$$\int_n^{n+1} f(x)\,dx \le f(n) \le \int_{n-1}^{n} f(x)\,dx.$$
Summing this from n = 2 to N, and noting that $\int_1^2 f(x)\,dx \le f(1)$, we get
$$\int_1^{N+1} f(x)\,dx \le \sum_{n=1}^{N} f(n) \le \int_1^{N} f(x)\,dx + f(1).$$
Since both the partial sums and the integrals increase with N, one side is bounded if and only if the other is, and our result follows as N → ∞.

The integral comparison test is useful because in many cases it is easier to evaluate an integral than it is to evaluate a sum (of course this isn't always true, and there are many examples in both directions). This test also gives us a way to determine the convergence of sums that previously required tools such as Cauchy's condensation test. (It can be rather cumbersome to evaluate the integrals in certain cases, in which case reaching for Cauchy's condensation test first might still be helpful. Using the integral test is also a straightforward way to derive Cauchy's condensation test.) An example is shown below.

Example 6.32 (Using the Integral Test)

We will prove that $\sum_{n=2}^{\infty} \frac{\log n}{n^2}$ converges using the integral test.
We compute using integration by parts that
$$\int_2^R \frac{\log x}{x^2}\,dx = \left[-\frac{\log x}{x}\right]_2^R + \int_2^R \frac{1}{x^2}\,dx = -\frac{\log R}{R} + \frac{\log 2}{2} - \frac{1}{R} + \frac{1}{2}.$$
This converges to (1 + log 2)/2 as R → ∞, and thus $\int_2^\infty \frac{\log x}{x^2}\,dx$ and $\sum_{n=2}^\infty \frac{\log n}{n^2}$ both converge by the integral test.
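As a final numerical aside (my own sketch), the partial sums of this series can be watched levelling off, bracketed by the integral test's bounds adapted to start at n = 2:

```python
# Partial sums of Σ log(n)/n^2 against the integral-test style bounds
#   ∫_2^{N+1} f(x) dx ≤ Σ_{n=2}^{N} f(n) ≤ ∫_2^{N} f(x) dx + f(2),
# which hold since f(x) = log(x)/x^2 is decreasing on [2, ∞).
import math

def integrate(h, a, b, steps=200_000):
    """Midpoint-rule approximation to the integral of h over [a, b]."""
    step = (b - a) / steps
    return step * sum(h(a + (i + 0.5) * step) for i in range(steps))

f = lambda x: math.log(x) / x ** 2

for N in (10, 100, 1000):
    partial = sum(f(n) for n in range(2, N + 1))
    lower = integrate(f, 2.0, N + 1.0)
    upper = integrate(f, 2.0, float(N)) + f(2)
    print(N, lower, partial, upper)  # lower ≤ partial ≤ upper, all bounded
```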

