Analysis I
At its heart, analysis is the study of ideas that depend on the notion of limits. The main
concepts of analysis (such as convergence, continuity, differentiation and integration)
will all depend quite fundamentally on a limiting process.
This article constitutes my notes for the ‘Analysis I’ course, held in Lent 2021 at Cambridge. These notes are not a transcription of the lectures, and differ significantly in quite a few areas. Still, all lectured material should be covered.¹
Contents
1 Sequences and Convergence
  1.1 Limits & The Reals
  1.2 Bolzano–Weierstrass
  1.3 Cauchy Sequences & The General Principle of Convergence
  1.4 Limits of Functions
2 Infinite Series
  2.1 Convergent & Divergent Series
  2.2 Convergence Tests
    2.2.1 Series of Non-Negative Terms
    2.2.2 Alternating Series
  2.3 Absolute Convergence
3 Continuity
  3.1 Continuity of Functions
  3.2 The Intermediate Value Theorem
  3.3 The Boundedness Theorem
4 Differentiability
  4.1 Properties of Differentiable Functions
  4.2 Rolle’s Theorem & The Mean Value Theorem
  4.3 Inverses of Functions
  4.4 Taylor’s Theorem
5 Power Series
  5.1 Radius of Convergence
¹ A tiny bit of analysis is assumed, namely the content covered in the ‘Numbers and Sets’ course at Cambridge. Specifically, the reader should be aware of least upper bounds/suprema, along with the least upper bound axiom. If the reader is unfamiliar with this content, they are referred to Chapter 2 of my Numbers and Sets notes.
Adam Kelly (July 4, 2021) Analysis
6 Integration
  6.1 Dissections, Upper & Lower Sums, and Riemann Integrals
  6.2 Properties of Integration
  6.3 Integrable Functions
  6.4 The Fundamental Theorem of Calculus
  6.5 Taylor’s Theorem Again
  6.6 Improper Integrals & The Integral Test
This definition is (notably) purely algebraic. We can sensibly define this notion of
convergence for any ordered field (for example, Q). What takes us from algebra to
analysis is the fundamental property of the real numbers.
In other words, an increasing sequence of real numbers that is bounded above converges. This also clearly implies that a decreasing sequence of reals bounded below converges.
Limits obey the properties that you would naturally expect.
Proof. Assume that $a \neq b$. Given any $\varepsilon > 0$, we can find integers $N_1$ and $N_2$ such that $|a_n - a| < \varepsilon$ for all $n \geq N_1$, and $|a_n - b| < \varepsilon$ for all $n \geq N_2$. Then for $n \geq \max\{N_1, N_2\}$ we have $|a - b| \leq |a - a_n| + |a_n - b| < 2\varepsilon$. Since $\varepsilon$ was arbitrary, this forces $|a - b| = 0$, a contradiction.
Proof. We note that $n(j) \geq j$. Now $a_n \to a$ implies that given some $\varepsilon > 0$, we can find an $N$ such that $|a_j - a| < \varepsilon$ for all $j \geq N$. But then this implies that $|a_{n(j)} - a| < \varepsilon$ for all $j \geq N$, since $j \geq N$ implies $n(j) \geq N$. So $a_{n(j)} \to a$ also.
(iv) Since the sequence converges to $a \neq 0$, there must be some $r > 0$ such that $|a_n| > r$ for all $n$. Then given some $\varepsilon > 0$, there exists $N \in \mathbb{N}$ such that $|a_n - a| < \varepsilon |a| r$ for all $n \geq N$. That is,
$$\left|\frac{1}{a_n} - \frac{1}{a}\right| = \left|\frac{a - a_n}{a a_n}\right| < \frac{\varepsilon |a| r}{|a| r} = \varepsilon,$$
for all $n \geq N$, so $1/a_n \to 1/a$.
Proof. Given some $\varepsilon > 0$, we can find integers $N_a$ and $N_b$ such that $|a_n - \ell| < \varepsilon$ for all $n \geq N_a$, and $|b_n - \ell| < \varepsilon$ for all $n \geq N_b$. Then letting $N = \max\{N_a, N_b\}$, we have $\ell - \varepsilon < a_n \leq x_n \leq b_n < \ell + \varepsilon$ for all $n \geq N$. That is, $|x_n - \ell| < \varepsilon$ for all $n \geq N$. Hence $x_n \to \ell$.
With these results in our toolbox, we can then prove our first actual analysis result.
Proof. The sequence $\frac{1}{n}$ is a decreasing sequence bounded below, and thus has a limit $a$. Considering the sequence $\frac{1}{2n} = \frac{1}{2} \cdot \frac{1}{n}$, this tends to the limit $\frac{a}{2}$; but since $\frac{1}{2n}$ is a subsequence of $\frac{1}{n}$, it also tends to the limit $a$. Thus $\frac{a}{2} = a$, so $a = 0$ as required.
Now while this is an article about real analysis, we will frequently employ the complex
numbers. Of course, to do that we need to be able to do some analysis with C, and
indeed the definition of a limit still makes sense in C. All of the above properties also hold, apart from the squeeze theorem (we need to be careful here, because C cannot be ordered like R!).
§1.2 Bolzano–Weierstrass
An equivalent, important, and quite useful form of the fundamental axiom is the ‘Bolzano–Weierstrass theorem’: every bounded sequence has a convergent subsequence.
Call a term $a_j$ of the sequence good if $a_k \geq a_j$ for all $k > j$. If there are infinitely many good elements, then they form an increasing subsequence. Otherwise, if there are only finitely many good elements, then letting $N$ be the index of the last good element, we have that for any $a_j$ with $j \geq N + 1$, there exists some $a_k$ with $k > j$ such that $a_k < a_j$. Then repeating this, we can get a decreasing subsequence. Thus every real sequence has a monotonic subsequence. Since the sequence is bounded, we then have that this subsequence converges.
Remark. This theorem says nothing about the uniqueness of the subsequence’s limit.
For example, consider the sequence xn = (−1)n . Then x2n+1 → −1 and x2n → 1.
The proof given above is quite clean, but it is not the only standard proof of the theorem. Another common proof involves the method of ‘repeated bisection’, which is colloquially known as ‘lion hunting’. This method will be discussed shortly.
The Bolzano-Weierstrass theorem also holds in C (and the same proof method will show
with a little induction that it holds in Rn , but we won’t dwell on that in this article).
Before we end our discussion of the Bolzano–Weierstrass theorem, we will take a short digression on lion hunting (you need to know this proof though!).
Proof (Bolzano–Weierstrass, Lion Hunting Style). We are going to define two sequences $a_n$ and $b_n$ inductively as follows. Begin by setting $[a_1, b_1] = [-K, K]$, and let $c_1 = (a_1 + b_1)/2$ be the midpoint of this interval.
Then there are two possibilities:
1. $x_n \in [a_1, c_1]$ for infinitely many values of $n$.
2. $x_n \in [c_1, b_1]$ for infinitely many values of $n$.
Of course both of these can hold at the same time, but if the first one holds we set $a_2 = a_1$ and $b_2 = c_1$, and if it doesn't then we set $a_2 = c_1$ and $b_2 = b_1$.
Repeating this process, we construct $a_n$ and $b_n$ such that $x_m \in [a_n, b_n]$ for infinitely many values of $m$.ᵃ Then we have $a_{n-1} \leq a_n \leq b_n \leq b_{n-1}$, and also $b_n - a_n = (b_{n-1} - a_{n-1})/2$.
Now $a_n$ is increasing and bounded above, and $b_n$ is decreasing and bounded below, and thus $a_n \to a \in [a_1, b_1]$ and $b_n \to b \in [a_1, b_1]$. Then we have $b - a = (b - a)/2$ using the above result, and thus $a = b$.
Since $x_m \in [a_n, b_n]$ for infinitely many values of $m$, we can construct a sequence $n_j$ such that $n_{j+1} > n_j$ and $x_{n_{j+1}} \in [a_{j+1}, b_{j+1}]$. Then $a_j \leq x_{n_j} \leq b_j$, and thus $x_{n_j} \to a$.
ᵃ This is lion hunting! You can kind of imagine that we are hunting for a number that a subsequence converges to, using the fact that there must be infinitely many terms near that number.
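The halving procedure above can be mimicked numerically. The sketch below is my own illustration (not from the lectures): it runs the bisection on a finite prefix of the bounded sequence x_n = (−1)ⁿ + 1/n, keeping at each stage the half-interval containing at least as many of the sampled terms, which plays the role of “infinitely many” on a finite sample.

```python
# A numerical sketch of 'lion hunting' on the bounded sequence
# x_n = (-1)^n + 1/n.  The proof keeps the half-interval containing
# infinitely many terms; on a finite prefix we keep the half containing
# at least as many sampled terms, a finite stand-in for that condition.

def lion_hunt(terms, lo, hi, steps):
    for _ in range(steps):
        mid = (lo + hi) / 2
        left = [t for t in terms if lo <= t <= mid]
        right = [t for t in terms if mid <= t <= hi]
        if len(left) >= len(right):
            terms, hi = left, mid
        else:
            terms, lo = right, mid
    return lo, hi

xs = [(-1) ** n + 1 / n for n in range(1, 10001)]
lo, hi = lion_hunt(xs, -2.0, 2.0, 30)
# The interval [lo, hi] shrinks onto a point near 1, the limit of the
# subsequence x_{2n}.
```

Note that the procedure finds only one subsequential limit (here near 1, not −1), exactly as in the remark above: which limit we hunt down depends on which halves we keep.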
The converse of this result is also true! This gives us a powerful result about convergence,
which is quite widely applicable (particularly because we can avoid talking about the
limit explicitly, as we said before).
Proof. Let $a_1, a_2, \dots$ be a Cauchy sequence. We will first show that the sequence is bounded. Because the sequence is Cauchy, we can find an integer $N \in \mathbb{N}$ such that $n, m \geq N$ implies that $|a_n - a_m| \leq 1$. This implies that if $n \geq N$ we have $|a_n| \leq |a_N| + 1$. Thus we have $|a_n| \leq \max_{1 \leq r \leq N} |a_r| + 1$ for all $n$, so the sequence is bounded.
Stripping away the detail, the reason this result holds is that Cauchy sequences are bounded, so we can extract a convergent subsequence. Then, since all of the terms get arbitrarily close, the whole sequence must converge along with the subsequence.
Combining these two lemmas gives us the ‘general principle of convergence’, a result also
known as Cauchy’s criterion.
We can then define a limit of a function by its behaviour as we approach a given limit
point.
These definitions should match the informal notion of a limit that would have been given in a calculus course. The reader should note that there is no requirement that $f(a) = \ell$, or even that $f$ be defined at $a$ at all.
There is a natural relation between this notion of a limit point and a limit and our
previous definition of limits of sequences.
Proof. If $a$ is a limit point, then for every positive integer $n$ we can take $\delta = 1/n$ and obtain a $z_n \in A$ such that $0 < |z_n - a| < 1/n$. Then by the squeeze theorem $z_n \to a$ as $n \to \infty$, and also $z_n \neq a$ for all $n$.
Conversely, if there is such a sequence $z_n \in A$, then given $\delta > 0$ there is an $N$ such that $z_N \in A$ and $0 < |z_N - a| < \delta$, so $a$ is a limit point of $A$.
In practice, which of these definitions is more useful will depend on context; however, the sequence definition does allow us to use the results we developed at the start of this
section.
Proof. These all follow directly from the sequence definition of limits along with
Proposition 1.5.
§2 Infinite Series
The notion of convergence lets us talk quite sensibly about what it would mean to add
up an infinite number of things, which is quite exciting.
as n → ∞, as required.
It should also be intuitively clear that the first few terms of an infinite series will not
affect its convergence.
We can also apply the general principle of convergence to infinite series. This gives us
another useful way of proving that a series converges, particularly when we don't know what the series converges to, which is often the case.
Proof. This follows directly from the definition of the sequence of partial sums being
Cauchy.
Proof. Let $S_n = \sum_{j=1}^n a_j$. Then $S_n \to S$ for some $S$, and thus we must have $S_{n+1} \to S$. Subtracting these sequences we get $S_{n+1} - S_n \to 0$, that is, $a_{n+1} \to 0$ as $n \to \infty$, as required.
Somewhat unfortunately, while necessary, this is not a sufficient condition for convergence. The most commonly cited counterexample to this is the harmonic series.
$$H_{2n} = H_n + \frac{1}{n+1} + \frac{1}{n+2} + \cdots + \frac{1}{n+n} \geq H_n + \underbrace{\frac{1}{2n} + \frac{1}{2n} + \cdots + \frac{1}{2n}}_{n \text{ times}} = H_n + \frac{1}{2}.$$
The result about the limit of terms in a series can be helpful though! For example, let’s
consider a series you are likely familiar with.
So if $|x| < 1$, then $x^n \to 0$ and $S_n \to \frac{1}{1-x}$. Otherwise, if $|x| \geq 1$, then the series cannot converge since the terms do not tend to 0.
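As a quick numerical sanity check (my own, with x = 0.5 as an arbitrary choice), the partial sums of the geometric series really do settle on 1/(1 − x):

```python
# Partial sums of the geometric series 1 + x + x^2 + ... for |x| < 1,
# compared against the closed form 1/(1 - x).

def geometric_partial_sum(x, n):
    return sum(x ** k for k in range(n + 1))

x = 0.5
approx = geometric_partial_sum(x, 50)
exact = 1 / (1 - x)
# The difference |approx - exact| = x^51 / (1 - x) is already tiny.
```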
It’s best to think of Proposition 2.5 as giving a sort of ‘bare minimum’ property that a series has to have in order to converge. Of course, it’s also quite helpful for sanity checks!
Proof. Since all of the terms are non-negative, Sn is a bounded and increasing
sequence, and thus converges by our fundamental axiom.
In a similar vein is the comparison test, which allows us to show convergence by comparing a series with another series whose convergence we know.
Since $\frac{1}{n^2} < \frac{1}{n(n-1)}$ for $n \geq 2$, and $\sum_{n=2}^\infty \frac{1}{n(n-1)}$ converges as $\sum_{n=2}^N \frac{1}{n(n-1)} = 1 - \frac{1}{N} \to 1$ as $N \to \infty$, we get that $\sum_{n=1}^\infty \frac{1}{n^2}$ converges by comparison.
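The telescoping comparison above is easy to check numerically; a quick illustration of my own:

```python
# The comparison from the text: 1/n^2 < 1/(n(n-1)) for n >= 2, where the
# larger series telescopes: sum_{n=2}^N 1/(n(n-1)) = 1 - 1/N.

N = 1000
assert all(1 / n ** 2 < 1 / (n * (n - 1)) for n in range(2, N + 1))

telescoped = sum(1 / (n * (n - 1)) for n in range(2, N + 1))
partial = 1 + sum(1 / n ** 2 for n in range(2, N + 1))
# telescoped agrees with 1 - 1/N, and the partial sums of 1/n^2 (with the
# n = 1 term added back) stay below the resulting bound of 2.
```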
Using the comparison test we can derive two more useful tests, both of which come
from comparing a series with the geometric series. You’ll notice that the proofs for both
results are relatively similar!
Proof. If $a < 1$, we can choose $b$ such that $a < b < 1$, and there exists an integer $N$ such that for all $n \geq N$ we have $\sqrt[n]{a_n} < b$, that is, $a_n < b^n$. But then $\sum_{n=N}^\infty b^n$ converges (it is a geometric series with ratio $b < 1$), and so $\sum_{n=1}^\infty a_n$ converges by comparison. If $a > 1$, then $a_n > 1$ for infinitely many $n$, so the terms do not tend to zero and the series diverges.
Proof. If $\ell < 1$, we can choose $b$ such that $\ell < b < 1$, and there exists an integer $N$ such that $\frac{a_{n+1}}{a_n} < b$ for all $n \geq N$. Therefore
$$a_n = \frac{a_n}{a_{n-1}} \cdot \frac{a_{n-1}}{a_{n-2}} \cdots \frac{a_{N+1}}{a_N} \cdot a_N < a_N b^{-N} b^n,$$
for $n > N$. But then $\sum_{n=N+1}^\infty b^n$ converges as $b < 1$, and since $a_N b^{-N}$ is a constant, $\sum_{n=1}^\infty a_n$ converges by comparison.
If $\ell > 1$, then eventually $a_{n+1} > a_n$, so the terms in the series do not tend to zero and thus $\sum_{n=1}^\infty a_n$ diverges.
Remark (A Deadly Sin). In both the root and ratio test, if we find that either $a = 1$ or $\ell = 1$, we cannot draw any conclusions about the convergence of the series. To see this, consider $\sum \frac{1}{n}$ and $\sum \frac{1}{n^2}$.
Some examples of using the root and ratio tests are shown below, but we won't do too many, since there are most likely a few already on your example sheets (and there's not really much to the basic technique).
For example, by the root test $\sum_{n=1}^\infty \left(\frac{n+1}{3n+5}\right)^n$ converges, since $\sqrt[n]{a_n} = \frac{n+1}{3n+5} \to \frac{1}{3} < 1$.
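Numerically (an illustration of my own), the n-th roots of the terms do settle at 1/3:

```python
# For a_n = ((n+1)/(3n+5))^n, the n-th root of a_n is just (n+1)/(3n+5),
# which increases towards 1/3 < 1, so the root test gives convergence.

def nth_root_of_term(n):
    return (n + 1) / (3 * n + 5)

roots = [nth_root_of_term(n) for n in (10, 100, 10000)]
# roots climbs towards (but stays below) 1/3 as n grows.
```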
In the case that our sequence of non-negative terms is decreasing, we can determine the convergence by looking at a related series involving a relatively small subsequence of terms.
You may notice that this proof somewhat resembles our proof that the harmonic series
diverges, and indeed you can show the following stronger result.
Theorem 2.16 (Convergence of $\sum n^{-\alpha}$). The series $\sum_{n=1}^\infty \frac{1}{n^\alpha}$ converges if and only if $\alpha > 1$.
Proof. If $\alpha \leq 0$ then the terms in the series do not tend to zero, so the series diverges. So for $\alpha > 0$ we apply Cauchy's condensation test. The series converges if and only if the series
$$\sum_{k=0}^\infty 2^k \cdot \frac{1}{2^{\alpha k}} = \sum_{k=0}^\infty 2^{(1-\alpha)k}$$
converges. By comparison with the geometric series, this will converge if and only if $2^{1-\alpha} < 1$, that is, $\alpha > 1$.
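For a concrete check (my own, with α = 2 as an arbitrary choice), the condensed series really is geometric:

```python
# Cauchy condensation applied to a_n = 1/n^alpha: the condensed series is
# sum_k 2^k * (1/2^k)^alpha = sum_k (2^(1-alpha))^k, a geometric series.

alpha = 2.0
condensed = sum(2 ** k * (1 / 2 ** k) ** alpha for k in range(60))
# With alpha = 2 the ratio is 2^(1-2) = 1/2, so the sum should be
# 1/(1 - 1/2) = 2.
```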
In a way, Cauchy's condensation test is slightly different to the other convergence tests, since it just gives us a different series that may be easier to show convergence or divergence for.
Still, it is natural to reach for Cauchy's condensation test when the root or ratio tests fail: there are many standard series for which those tests are inconclusive but whose convergence or divergence follows quickly from the condensation test.²
Proof. Let $S_n = \sum_{j=1}^n (-1)^{j+1} a_j$ denote the partial sums of the series. Then $S_{2n} = S_{2n-2} + (a_{2n-1} - a_{2n}) \geq S_{2n-2}$. Also $S_{2n} = a_1 - (a_2 - a_3) - \cdots - (a_{2n-2} - a_{2n-1}) - a_{2n} \leq a_1$.
Therefore $S_{2n}$ is increasing and bounded above, and thus converges. Now let $S_{2n} \to S$ as $n \to \infty$. Then $S_{2n+1} = S_{2n} + a_{2n+1} \to S + 0 = S$ as $n \to \infty$. Thus $S_n \to S$, and $\sum_{j=1}^\infty (-1)^{j+1} a_j$ converges.
Example 2.18. We will prove that $\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n}$ converges using the alternating series test.ᵃ
We note that $\frac{1}{n}$ is a decreasing sequence of positive numbers with $\frac{1}{n} \to 0$. Thus by the alternating series test, $\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n}$ converges.
ᵃ Later on we will see that this is equal to $\log 2$.
² You could also use the integral test (which we have not discussed), but that's most likely going to be slower as you have to integrate things.
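Numerically (my own check), consecutive partial sums of the series in Example 2.18 bracket its limit log 2:

```python
import math

# Partial sums of sum (-1)^(n+1)/n.  For an alternating series with
# decreasing terms, consecutive partial sums bracket the limit, which
# here (as the footnote notes) is log 2.

def partial_sum(N):
    return sum((-1) ** (n + 1) / n for n in range(1, N + 1))

s_even, s_odd = partial_sum(1000), partial_sum(1001)
# s_even < log 2 < s_odd, with the gap s_odd - s_even = 1/1001.
```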
We can see that absolute convergence is a strictly stronger property than ‘regular’ convergence. Indeed, absolute convergence implies convergence, but not the other way round.
To see that the converse isn't true (and that it is indeed a strictly stronger notion), consider the series $\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n}$, which converges but does not converge absolutely. We sometimes say such series converge conditionally.
When trying to determine whether a series converges absolutely, since |an | ≥ 0 we are
free to apply all of the results that we developed in the previous subsection. We will see
this in the next example.
Now it was claimed at the start of this subsection that absolute convergence allows us
to be a ‘little less careful’. Allow me to elaborate on this.
In general, you must be very careful when manipulating infinite series. To see why,
consider this informal example.
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots = \log 2$$
$$1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \cdots = \frac{3}{2} \log 2$$
The two series above both converge, and they both have all of the same terms. The
only difference is that the terms are in a different order, and this change has completely
altered the value of the series.
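The effect is easy to observe numerically; in this illustration of my own, summing the rearranged series in blocks of “two positive terms, one negative term” lands near (3/2) log 2 rather than log 2:

```python
import math

# The rearrangement from the text: 1 + 1/3 - 1/2 + 1/5 + 1/7 - 1/4 + ...,
# i.e. two odd-denominator (positive) terms, then one even-denominator
# (negative) term, repeated.

def rearranged_partial_sum(blocks):
    total, odd, even = 0.0, 1, 2
    for _ in range(blocks):
        total += 1 / odd + 1 / (odd + 2) - 1 / even
        odd += 4
        even += 2
    return total

value = rearranged_partial_sum(200_000)
# value is near (3/2) log 2 ~ 1.04, well away from log 2 ~ 0.69.
```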
What’s nice about absolutely convergent series is that we don’t have to worry about this.
A rearrangement of the series $\sum_{j=1}^\infty a_j$ is a series of the form $\sum_{j=1}^\infty a_{\sigma(j)}$, where $\sigma : \mathbb{N} \to \mathbb{N}$ is a bijection.
Proof. Let $\sum_{j=1}^\infty a_{\sigma(j)}$ be a rearrangement of the series. Given $\varepsilon > 0$, we wish to show that there exists an integer $N$ such that $\left|\sum_{j=1}^\infty a_j - \sum_{j=1}^n a_{\sigma(j)}\right| < \varepsilon$ for all $n \geq N$.
By Cauchy's criterion there exists an integer $m$ such that $\sum_{j=m+1}^\infty |a_j| < \varepsilon$. We then choose $N$ such that $\{a_1, a_2, \dots, a_m\} \subseteq \{a_{\sigma(1)}, a_{\sigma(2)}, \dots, a_{\sigma(N)}\}$. This can be done by setting $N = \max_{1 \leq i \leq m} \sigma^{-1}(i)$. Then if $n \geq N$ we have
$$\sum_{j=1}^\infty a_j - \sum_{j=1}^n a_{\sigma(j)} = \sum_{j \in A_n} a_j,$$
where $A_n \subseteq \{m+1, m+2, \dots\}$ consists of the indices not among $\sigma(1), \dots, \sigma(n)$. Hence $\left|\sum_{j=1}^\infty a_j - \sum_{j=1}^n a_{\sigma(j)}\right| \leq \sum_{j=m+1}^\infty |a_j| < \varepsilon$, as required.
To see just how not-true this is for series that do not converge absolutely, we just need
to read the statement of Riemann’s Rearrangement Theorem.
§3 Continuity
The next idea we will develop is that of ‘continuity’.
You may notice that this definition uses exactly the definition of the limit of a function
mentioned in subsection 1.4. This immediately gives us two more equivalent definitions
of continuity.
Each of the definitions has various advantages or disadvantages. Still, it is worth noting that the sequential/limit continuity definitions can be used to quite easily show certain properties of continuity that would otherwise be quite fiddly with the ε-δ definition.
Using sequential continuity is also a natural way to show that a function is not continuous
at a point. Let’s take a look at some examples.
This function is not continuous at 1. To see this, note that $1 - 1/n \to 1$ but $f(1 - 1/n) \to -1 \neq f(1)$ as $n \to \infty$.
This function is continuous, since for every $a \in \mathbb{Q}$ there is an interval about $a$ on which $f$ is constant, so $f$ is continuous at $a$. If $f$ were defined on $\mathbb{R}$ instead of $\mathbb{Q}$, then it would be discontinuous only at $\pm\sqrt{2}$. But these points are not in our domain, so we don't need to worry about them.
When attempting to determine if a function is continuous, one should keep in mind the
following properties of continuity (all of which follow directly from our basic properties
of limits).
(ii) If $\lim_{z \to a} f(z) = f(a)$ and $\lim_{z \to a} g(z) = g(a)$, then $\lim_{z \to a} (fg)(z) = (fg)(a)$.
(iii) If $\lim_{z \to a} f(z) = f(a)$ and $f(a) \neq 0$, then $\lim_{z \to a} 1/f(z) = 1/f(a)$.
Along with the fact that f (x) = x and f (x) = c are continuous, these properties imply
that all polynomials are continuous.
(Figure: the graph of a continuous function $f$ on $[a, b]$, attaining an intermediate value $t = f(c)$ at some $c$ between $a$ and $b$.)
In the proof below we will employ suprema,³ but we will give another proof afterwards that does not use this (at the expense of being slightly longer).
Proof. Suppose without loss of generalityᵃ that $f(a) < t < f(b)$. Consider the set $S = \{x \in [a, b] : f(x) < t\}$. This set is bounded, and since $a \in S$ it is non-empty. So we can let $c = \sup S$, and note that $a \leq c \leq b$.
Let $n \geq 1$ be an integer. Then since $c$ is the supremum, there must exist some $x_n \in S$ such that $c - \frac{1}{n} < x_n \leq c$. By the squeeze theorem we have $x_n \to c$ as $n \to \infty$.
Also f is continuous so f (xn ) → f (c). We also constructed xn to be in S, giving
f (xn ) < t for all n. This implies that f (c) ≤ t.
We know that c ∈ [a, b] but c 6= b, so there is some integer N such that for n ≥ N
we have c + 1/n ≤ b. Using that c is the supremum, we have c + 1/n 6∈ S for all
n ≥ N , that is, f (c + 1/n) ≥ t for all n ≥ N . Then by continuity we have f (c) ≥ t.
But then f (c) ≥ t and f (c) ≤ t, so we must have f (c) = t.
ᵃ If $t = f(a)$ or $t = f(b)$ then we are done. Also, the case $f(a) > t > f(b)$ follows similarly.
Before we look at some examples of using the intermediate value theorem, there are a few things worth noting.
First of all, the theorem says absolutely nothing about uniqueness: it is very much possible for a function to take on an intermediate value multiple times. Second of all, when applying the intermediate value theorem (commonly abbreviated to IVT), a good general problem-solving technique is to try applying it to related functions such as $g(x) = f(x) - x$ or $g(x) = f(x) - t$, and things like that. You will see this ‘trick’ show up in the examples below, and also again when we get to Rolle's theorem and the mean value theorem.
³ If you are unfamiliar with this term and/or the least upper bound principle, feel free to have a look at Chapter 2 of my ‘Numbers and Sets’ course notes.
Then by the intermediate value theorem there exists some $c \in [0, 1]$ such that $g(c) = 0$. But then we have $f(c) - c = 0$, that is, $f(c) = c$ as required.
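The fixed-point argument above is essentially constructive: bisecting on the sign of g locates the point c. A sketch of my own, taking f(x) = cos x (which maps [0, 1] into itself) as the example:

```python
import math

# Bisection on g(x) = f(x) - x.  Here g(0) >= 0 and g(1) <= 0, so by the
# intermediate value theorem g has a zero, i.e. f has a fixed point.

def fixed_point(f, a=0.0, b=1.0, steps=60):
    g = lambda x: f(x) - x
    for _ in range(steps):
        m = (a + b) / 2
        if g(a) * g(m) <= 0:   # sign change in [a, m]
            b = m
        else:                  # otherwise the sign change is in [m, b]
            a = m
    return (a + b) / 2

c = fixed_point(math.cos)
# c satisfies cos(c) = c; this is the 'Dottie number', about 0.739.
```

This is exactly the ‘lion hunting’ idea again, now hunting a zero of g rather than a subsequential limit.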
Before moving on, let's have a look at an alternative proof of the intermediate value theorem. This proof uses the same idea as the previous proof (try to see why) but avoids the use of suprema.
To get a feel for the construction, try drawing a rough sketch of the process. If you consider the case of a function that takes on the desired intermediate value multiple times, you should also see that this proof does not necessarily give the same value for $c$ as the previous proof.
Proof. If $f$ were unbounded, then given any integer $n \geq 1$ there would exist $x_n \in [a, b]$ such that $|f(x_n)| > n$. By Bolzano–Weierstrass, $x_n$ has a convergent subsequence $x_{n_j} \to x$. Since $a \leq x_{n_j} \leq b$, we must have $x \in [a, b]$. By continuity of $f$, $f(x_{n_j}) \to f(x)$; but $|f(x_{n_j})| > n_j$, so $f(x_{n_j})$ does not tend to a limit. Thus we have a contradiction.
Now we can prove that such a continuous function attains its bounds, giving us the
boundedness theorem.
Proof. By our previous lemma, we know that $f$ is bounded. Now define the set $A = \{f(x) : x \in [a, b]\}$. This set is bounded and non-empty and thus has a supremum $M = \sup A$. Then for each positive integer $n$, $M - 1/n$ cannot be an upper bound for $A$. This implies that there is some $x_n \in [a, b]$ such that $M - 1/n < f(x_n) \leq M$.
By Bolzano–Weierstrass, $x_n$ has a convergent subsequence $x_{n_j} \to x_2$. Since $a \leq x_{n_j} \leq b$, we know $a \leq x_2 \leq b$. By the continuity of $f$, $f(x_{n_j}) \to f(x_2)$, but $f(x_{n_j}) \to M$ by construction. So $f(x_2) = M$. The minimum follows analogously.
Alternate Proof. As before, let $M = \sup A$, and suppose that there were no $x_2$ such that $f(x_2) = M$. Then let $g(x) = \frac{1}{M - f(x)}$ for $x \in [a, b]$. This function is well defined and continuous. Now $g$ must be bounded by the previous lemma, so $g(x) \leq K$ for all $x \in [a, b]$, for some $K > 0$. This means that $f(x) \leq M - 1/K$ on $[a, b]$. This is a contradiction, since we set $M$ as the supremum.
§4 Differentiability
You are likely familiar with differentiability (and particularly the computation of derivatives) from calculus. While this knowledge should certainly not be disregarded, we are going to go from the beginning, doing everything with a little more care than it got in calculus. Throughout this section we will deal with functions $f : A \subseteq \mathbb{R} \to \mathbb{R}$, though the basic definitions will also apply to $\mathbb{C}$.⁴
Here’s our basic definition:
$$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = f'(x).$$
Let's pause for a moment. The core idea of differentiation is that we want to approximate a function around some point using a linear map.⁵ Indeed, our definition of the derivative is directly equivalent to the following: $f$ is differentiable at $x$ if
$$f(x + h) = f(x) + f'(x)h + \varepsilon(h),$$
where $\lim_{h \to 0} \varepsilon(h)/h = 0$. That is, it is differentiable if we can approximate the function with some linear map where the error decreases faster than linearly.
$$\lim_{h \to 0} \left[f(x + h) - f(x)\right] = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \cdot h = f'(x) \cdot 0 = 0,$$
thus $\lim_{h \to 0} f(x + h) = f(x)$, so $f$ is continuous at $x$.
⁴ We will use differentiation over $\mathbb{C}$ in our discussion of power series though. One should also note that the other basic rules such as the sum, product and chain rules will also hold over $\mathbb{C}$, but being differentiable over $\mathbb{C}$ is a stronger condition than being differentiable over $\mathbb{R}$ or $\mathbb{R}^2$. For example, consider $f(z) = \bar{z}$, which is not complex differentiable.
⁵ Indeed, it is this idea, and not really that of ‘the tangent to a curve’, that is used to generalise the derivative to functions of multiple variables. Of course, they are closely related.
$$\lim_{h \to 0} \frac{f(x+h)g(x+h) - f(x)g(x)}{h} = \lim_{h \to 0} \left[ f(x+h) \, \frac{g(x+h) - g(x)}{h} + g(x) \, \frac{f(x+h) - f(x)}{h} \right],$$
and by the continuity of $f$ and $g$ at $x$ we have $(fg)'(x) = f(x)g'(x) + f'(x)g(x)$.
(iii) Similarly we have
$$\lim_{h \to 0} \frac{f(x+h)/g(x+h) - f(x)/g(x)}{h} = \lim_{h \to 0} \frac{1}{g(x)g(x+h)} \left[ g(x) \, \frac{f(x+h) - f(x)}{h} - f(x) \, \frac{g(x+h) - g(x)}{h} \right],$$
and by the continuity of $f$ and $g$ we have $(f/g)'(x) = \frac{g(x)f'(x) - g'(x)f(x)}{g(x)^2}$.
We will now prove the chain rule, which tells us how to compute the derivative of a composition of functions. Unfortunately this proof is quite ‘tricky’, and trying to do something like we did in the proof above will not work. Instead, we need to return to our equivalent definition of differentiability, with $f(x + h) = f(x) + f'(x)h + \varepsilon(h)$, where $\lim_{h \to 0} \varepsilon(h)/h = 0$.
Proof. We have
Rearranging we get
but this limit does not exist (a similar construction to Example 3.8 will show this),
and thus f is not differentiable at x = 0.
ᵃ Feel free to check this!
Now in the example above we have a function that is continuous everywhere but not differentiable at 0. A related example is one of a function that is differentiable everywhere (and hence continuous), but whose derivative is discontinuous.
Thus we can see that $f'$ is not continuous, since $f'(x) \not\to f'(0) = 0$ as $x \to 0$ (as $\cos(1/x)$ has no limit as $x \to 0$).
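Assuming the example here is the standard one, f(x) = x² sin(1/x) with f(0) = 0 (which matches the cos 1/x term above; the displayed definition was lost, so take the exact function as an assumption of mine), both behaviours are visible numerically:

```python
import math

# f(x) = x^2 sin(1/x) for x != 0, f(0) = 0 (assumed example).  The
# difference quotient at 0 is h sin(1/h), squeezed to 0, so f'(0) = 0;
# away from 0, f'(x) = 2x sin(1/x) - cos(1/x), which oscillates.

def f(x):
    return x * x * math.sin(1 / x) if x != 0 else 0.0

def f_prime(x):
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

# Difference quotients at 0 shrink with h, since |f(h)/h| <= |h|.
quotients = [abs(f(h) / h) for h in (0.1, 0.01, 0.001)]

# But sampling f' at x = 1/(2*pi*n), where cos(1/x) = 1, gives values
# near -1 however close we get to 0.
samples = [f_prime(1 / (2 * math.pi * n)) for n in (10, 100, 1000)]
```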
There is however a limit as to how discontinuous a derivative can be. In particular the
derivative of a differentiable function must have the intermediate value property.
Proof. Given $a < b$ and $z \in \mathbb{R}$ such that $f'(a) < z < f'(b)$, we wish to show there exists $c \in (a, b)$ such that $f'(c) = z$. We can rewrite the condition as $f'(a) - z < 0 < f'(b) - z$. Now define $g(x) = f(x) - zx$, and note that we have $g'(a) < 0 < g'(b)$. We want to find $c \in (a, b)$ such that $g'(c) = 0$.
Now, $g$ is continuous (since $f$ is continuous), and thus is bounded on $[a, b]$, and the minimum is attained, say at some point $k$. We can't have $k = a$, as that would imply that $g'(a) \geq 0$, and we also can't have $k = b$, as that would imply $g'(b) \leq 0$. Thus $k \in (a, b)$, and we must then have $g'(k) = 0$, and we are done.
Proof. Without loss of generality, assume that $x$ is a local maximum. Then there exists $\delta > 0$ such that $|h| < \delta$ implies that $f(x + h) \leq f(x)$.
Then for $0 < h < \delta$ we have
$$\frac{f(x + h) - f(x)}{h} \leq 0,$$
and thus, taking $h \to 0^+$, we have $f'(x) \leq 0$. Similarly, for $-\delta < h < 0$ the quotient is non-negative, giving $f'(x) \geq 0$. Hence $f'(x) = 0$.
When we combine this result with the boundedness theorem from our discussion about
continuity, we end up with Rolle’s theorem.
A direct consequence of Rolle’s theorem6 is another classic theorem of analysis, the mean
value theorem, which is frequently abbreviated to MVT.
The mean value theorem says something notable: the size of the derivative controls the size of the function, or (in rougher terms) it puts a restriction on how ‘badly behaved’ the function can be. Also, if we appeal to geometric intuition (as we have tried not to, since it's an easy way to go wrong), we can see that the mean value theorem says, as R. P. Burn wrote in Numbers and Functions, “for the graph of a differentiable function, there is always a tangent parallel to the chord”. Of course, we will quickly move on from geometrical thinking.⁷
⁶ It's possible to establish this theorem without Rolle's theorem, and then Rolle's theorem pops out as a special case. The proof is (more or less) just what we did for Rolle's theorem, laid out explicitly in the proof of this theorem.
⁷ I guess here is a good place to mention how we can reason geometrically in analysis. Drawing pictures and thinking geometrically is a great way to understand how things work, to come up with counterexamples and much more. Still, it is important to remember that basically nothing in this course is provable by appealing to geometric intuition. Instead, this type of thinking should just inform the ‘analysis’ side of us how to approach things.
Now let's think about applying the mean value theorem to two different functions. Suppose that $f, g : [a, b] \to \mathbb{R}$ are continuous, differentiable on $(a, b)$, and $g(a) \neq g(b)$. Then the mean value theorem gives us $s, t \in (a, b)$ such that
A stronger version of the mean value theorem says that we can take s = t.
Then $\varphi$ is continuous on $[a, b]$ and differentiable on $(a, b)$.ᵃ Also $\varphi(a) = \varphi(b) = 0$. Then by Rolle's theorem, there exists $c \in (a, b)$ such that $\varphi'(c) = 0$. Differentiating $\varphi$, we then have
Cauchy's mean value theorem has many applications. For example, we can use it to establish L'Hôpital's rule for evaluating limits.
In more generality:
Proof. Since $g'(x)$ does not vanish near $a$, we can suppose that $g'(x) \neq 0$ for $x \in (a, b)$, as otherwise we could just consider a subinterval $(a, b')$, defined so that this is the case.
By Cauchy’s mean value theorem we have
The mean value theorem can also help us extend Lemma 4.10. Knowing the sign of the derivative over some interval can tell us immediately whether the function is constant, increasing, or decreasing there.
Proof. By the mean value theorem we have f (x2 ) − f (x1 ) = (x2 − x1 )f 0 (x) where
a < x1 < x < x2 < b. Then all of the results follow by considering the sign of
f 0 (x).
⁸ If a continuous function were not strictly increasing or strictly decreasing, then we couldn't have a unique inverse.
But then $|z - y| < \delta$ implies that $x - \varepsilon < f^{-1}(z) < x + \varepsilon$, so $|f^{-1}(z) - f^{-1}(y)| < \varepsilon$. Thus $f^{-1}$ is continuous on $(a, b)$.
Otherwise, if $x = a$, we have $f(x) < f(x + \varepsilon)$. Then $|z - y| < f(x + \varepsilon) - f(x)$ implies that $|f^{-1}(z) - f^{-1}(y)| < \varepsilon$, so $f^{-1}$ is continuous at $c$. A similar argument shows that it is continuous at $d$.
Now this is the differentiability section, so we can also describe what the derivative of
a differentiable function’s inverse is. This result is known as the one variable inverse
function theorem.
$$\lim_{k \to 0} \frac{f^{-1}(y + k) - f^{-1}(y)}{k} = \lim_{h \to 0} \frac{x + h - x}{f(x + h) - f(x)} = \frac{1}{f'(x)},$$
as required.
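A numerical check of my own on f(x) = x³ + x, which is strictly increasing with f′(x) = 3x² + 1; at y = f(1) = 2 the theorem predicts (f⁻¹)′(y) = 1/4:

```python
# Inverse function theorem check for f(x) = x^3 + x (strictly increasing,
# f'(x) = 3x^2 + 1).  We invert f by bisection and compare a central
# difference quotient of f^{-1} at y = f(1) = 2 with 1/f'(1) = 1/4.

def f(x):
    return x ** 3 + x

def f_inv(y, lo=-10.0, hi=10.0):
    # Bisection works since f is continuous and strictly increasing.
    for _ in range(100):
        m = (lo + hi) / 2
        if f(m) < y:
            lo = m
        else:
            hi = m
    return (lo + hi) / 2

y, k = 2.0, 1e-6
numeric = (f_inv(y + k) - f_inv(y - k)) / (2 * k)
# numeric should sit close to 1/f'(1) = 0.25.
```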
Proof. Since f (a) = f (b), by Rolle’s theorem there exists c1 ∈ (a, b) such that
f 0 (c1 ) = 0. Then since f 0 (a) = f 0 (c1 ) = 0, we can use Rolle’s theorem again
to obtain c2 ∈ (a, c1 ) such that f 00 (c2 ) = 0. Continuing on like this we obtain a
cn ∈ (a, b) such that f (n) (cn ) = 0, as required.
We can also attempt to get some sort of ‘higher-order’ mean value theorem. Recall that
our proof of the mean value theorem was more or less as follows:
Define the function φ(x) = f (x) − kx, where we choose k such that the
conditions of Rolle’s theorem are satisfied. Then apply Rolle’s theorem to φ.
We can try and do the same with a more elaborate construction. We will do this in
three steps.
1. Construct a polynomial $P(x)$ such that $P(a) = f(a)$, $P'(a) = f'(a)$, and so on until $P^{(n-1)}(a) = f^{(n-1)}(a)$.
2. Construct another polynomial⁹ $E(x)$ such that $E^{(r)}(a) = 0$ for $r = 0, 1, \dots, n-1$, but with $E(b) = f(b) - P(b)$.
3. Take $\varphi(x) = f(x) - P(x) - E(x)$.
If we construct $\varphi(x)$ in this way, then we will have $\varphi(a) = \varphi'(a) = \cdots = \varphi^{(n-1)}(a) = 0$, and also $\varphi(b) = 0$, so it will satisfy our higher-order Rolle's theorem.
⁹ We can think of this as the ‘error correcting term’, because it fixes the discrepancy between $f(b)$ and $P(b)$ while not ruining the previous construction.
We can then (by taking derivatives) find that this term satisfies
$$Q_k^{(j)}(a) = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \neq k. \end{cases}$$
With this we can immediately write down an explicit construction for P (x).
    P(x) = Σ_{k=0}^{n−1} f^(k)(a) Q_k(x) = Σ_{k=0}^{n−1} [f^(k)(a)/k!] (x − a)^k
         = f(a) + f′(a)(x − a) + [f″(a)/2!] (x − a)² + · · · + [f^(n−1)(a)/(n−1)!] (x − a)^(n−1).

Similarly, we can write down E(x) explicitly:

    E(x) = ([f(b) − P(b)] · n!/(b − a)^n) Q_n(x)
         = [f(b) − P(b)] ((x − a)/(b − a))^n.
Now with the constructions as above, we can write down φ(x) = f (x) − P (x) − E(x).
Then since φ(x) satisfies all of the conditions of our higher-order Rolle’s theorem, there
exists c ∈ (a, b) such that φ(n) (c) = 0.
Taking the nth derivative, since P is a polynomial of degree n − 1 and thus has
vanishing nth derivative, we get

    φ^(n)(c) = f^(n)(c) − [f(b) − P(b)] · n!/(b − a)^n = 0
    ⟹ f(b) = P(b) + [f^(n)(c)/n!] (b − a)^n

for some c ∈ (a, b).
This result is Taylor's theorem, specifically Taylor's theorem with the Lagrange form of
the remainder. For completeness, the proof is given in a standalone form below.
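As a quick numerical sanity check (an illustration of mine, not part of the notes), we can verify for f = exp that the remainder f(b) − P(b) really is f^(n)(c)(b − a)^n/n! for some c ∈ (a, b), by checking that it lies strictly between the extreme values that expression can take:

```python
from math import exp, factorial

a, b, n = 0.0, 1.0, 5
# degree n-1 Taylor polynomial of exp about a, evaluated at b
P_b = sum(exp(a) * (b - a)**k / factorial(k) for k in range(n))
remainder = exp(b) - P_b
# Lagrange form: remainder = exp(c) (b-a)^n / n! for some c in (a, b),
# so it must lie between the values at c = a and c = b
lo = exp(a) * (b - a)**n / factorial(n)
hi = exp(b) * (b - a)**n / factorial(n)
assert lo < remainder < hi
```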
Proof. We define

    φ(x) = f(x) − Σ_{k=0}^{n−1} [f^(k)(a)/k!] (x − a)^k − M ((x − a)/(b − a))^n,

where M is chosen such that φ(b) = 0 (note that φ(a) = 0 automatically). Then
differentiating we have φ(a) = φ′(a) = · · · = φ^(n−1)(a) = 0.
Then since φ(a) = φ(b), there exists c₁ ∈ (a, b) such that φ′(c₁) = 0. Similarly
φ′(a) = φ′(c₁) = 0, and thus there exists c₂ ∈ (a, c₁) such that φ″(c₂) = 0.
Continuing on in this way, we find a cₙ such that φ^(n)(cₙ) = 0, where cₙ ∈ (a, b).
Then differentiating again we have (writing c = cₙ)

    φ^(n)(c) = f^(n)(c) − M · n!/(b − a)^n = 0.
Thus we have

    f^(n)(c) = M · n!/(b − a)^n = (f(b) − Σ_{k=0}^{n−1} [f^(k)(a)/k!] (b − a)^k) · n!/(b − a)^n
    ⟹ f(b) = f(a) + f′(a)(b − a) + · · · + [f^(n−1)(a)/(n−1)!] (b − a)^(n−1) + [f^(n)(c)/n!] (b − a)^n,

as required.
Now that we have Taylor's theorem (which in this case can be thought of as a higher-order mean
value theorem), let's try to do something with it. One of the main uses of Taylor's
theorem is in giving us an infinite series which converges to the value of our function.
This involves considering the terms of the degree n Taylor polynomial for increasing
values of n, as we shall see.
Doing this type of problem has two main steps.
1. Write down Taylor’s theorem for the case of the function being considered.
2. Show that the error term tends to zero as n goes to infinity.
We will see how this is done using a Tripos question from 2010 (so eh, spoilers I guess).
Example 4.21 (Using Taylor's Theorem – Part IA, 2010 Paper 1, Q10)
Problem. Suppose that e : R → R is a differentiable function such that e(0) = 1 and
e′(x) = e(x) for all x ∈ R. Use Taylor's theorem with the remainder in Lagrange's
form to prove that

    e(x) = Σ_{n≥0} x^n/n!   for all x ∈ R.
Solution. Since e′ = e, the function e is infinitely differentiable with e^(k) = e for
every k, and e^(k)(0) = 1. So by Taylor's theorem with the Lagrange form of the
remainder,

    e(x) = Σ_{k=0}^{n−1} x^k/k! + e(c) x^n/n!

for some c between 0 and x. Thus it suffices to show that e(c)x^n/n! → 0 as n → ∞, since
that would imply that e(x) − Σ_{k=0}^{n−1} x^k/k! → 0 as n → ∞. Now e(t) is differentiable
and hence continuous on the closed interval between 0 and x, and thus is bounded there.
So it suffices to show that x^n/n! → 0, which holds.
Thus

    e(x) = Σ_{n=0}^∞ x^n/n!,

as required.
Now it is not the case that the remainder term always goes to zero. A common
counterexample is f(x) = exp(−1/x²) for x ≠ 0 with f(0) = 0.
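A numerical illustration of mine (not from the notes): near 0 this f vanishes faster than any power of x, which is why all of its Taylor coefficients at 0 are zero, and yet f is not identically zero, so the (identically zero) Taylor series cannot converge to f:

```python
from math import exp

def f(x):
    return exp(-1.0 / x**2) if x != 0 else 0.0

# f(x)/x^k stays tiny as x -> 0 for every k, consistent with every
# Taylor coefficient of f at 0 vanishing
for k in range(1, 8):
    assert abs(f(0.1)) / 0.1**k < 1e-30

# ...but the zero Taylor series clearly does not converge to f away from 0
assert f(0.5) > 0.01
```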
§5 Power Series
Following on from our discussion of infinite series in Chapter 2, we are going to discuss
a particular type of infinite series known as a power series. Throughout this section, we
are going to work mostly in C.
These are series of the form Σ_{n=0}^∞ aₙ zⁿ, where z ∈ C and aₙ ∈ C.
In this section we will look at some properties of power series, and we will use them to
(finally) define some functions that we have alluded to quite a bit.
Proof. Since Σ_{n=0}^∞ aₙz₁ⁿ converges, we have aₙz₁ⁿ → 0, implying this sequence is
bounded. Then there is some K > 0 such that |aₙz₁ⁿ| < K for all n.
So if |z| < |z₁|, we have |aₙzⁿ| ≤ K|z/z₁|ⁿ. Since the geometric series Σ_{n=0}^∞ |z/z₁|ⁿ
converges, our result follows by comparison.
With this lemma we can prove one of our key results in the study of power series, the
existence of the radius of convergence.
Proof. If Σ_{n=0}^∞ aₙzⁿ converges for all z ∈ C, then we are done. Otherwise, there
must exist z₁ ∈ C such that Σ_{n=0}^∞ aₙz₁ⁿ diverges. Then by our previous lemma the
power series must diverge for all z ∈ C with |z| > |z₁|, and taking
R = sup{|z| : Σ aₙzⁿ converges} gives the result.
What’s quite lovely is that when we are inside the radius of convergence, the power series
converges absolutely! This means that we can avoid all of those issues with rearranging
series that show up when things don’t converge absolutely. This can be quite helpful
when actually working with power series.
What’s also quite nice is that we can employ all of those lovely results we developed
in section 2 to find what the radius of convergence of a power series is. Let’s look
at two lemmas that come from the ratio and root tests, which can give the radius of
convergence.
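As a small numerical sketch of the ratio-test approach (my own illustration, not from the notes): when the limit lim |aₙ/aₙ₊₁| exists, it equals the radius of convergence, and we can estimate it by evaluating the ratio at a large index:

```python
def ratio_radius(a, n=200):
    """Estimate R = lim |a_n / a_{n+1}| (when this limit exists)."""
    return abs(a(n) / a(n + 1))

# sum z^n / 2^n is geometric in z/2, so R = 2
assert abs(ratio_radius(lambda n: 1 / 2**n) - 2.0) < 1e-9
# sum (n+1) z^n has R = 1; the ratio (n+1)/(n+2) approaches 1
assert abs(ratio_radius(lambda n: n + 1) - 1.0) < 0.01
```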
Now it is time for an important conceptual point: if we have a power series with radius
of convergence R, then this only specifies the convergence/divergence inside or outside
the circle |z| = R (the circle of convergence). Specifically it says nothing about the
behaviour on the circle. To see this, we will look at a few examples10 .
¹⁰ This should remind you of the deadly sin mentioned in section 2, where the root or ratio test said nothing if the limit was 1.
All of these have a radius of convergence R = 1; however, on the circle |z| = 1 each
power series behaves differently:
1. Divergence if |z| = 1.
2. Convergencea if |z| = 1 with z 6= 1, and divergence if z = 1.
3. Convergence if |z| = 1.
Thus we can’t say anything about the case |z| = R and each case really has to be
treated separately.
ᵃ Have a look at Example Sheet 1 for this result.
Lemma 5.7
Let f(z) = Σ_{n=0}^∞ aₙzⁿ be a power series whose radius of convergence is R. Then if
0 < r < R, the power series Σ_{n=1}^∞ n|aₙ|r^(n−1) converges.

Proof. Pick w such that r < |w| < R. Then the power series Σ_{n=0}^∞ aₙwⁿ converges,
so the terms |aₙwⁿ| are bounded above by say M. Then we have

    n|aₙ|rⁿ = n|aₙwⁿ| · rⁿ/|w|ⁿ ≤ M n (r/|w|)ⁿ.

But then the series Σ_{n=1}^∞ M n(r/|w|)ⁿ converges by the ratio test, and thus by the
comparison test Σ_{n=1}^∞ n|aₙ|r^(n−1) converges.
Applying this lemma twice also gives us that the power series Σ_{n=2}^∞ n(n − 1)|aₙ|r^(n−2)
converges. We are now ready to prove our theorem about differentiating power series.
Now we want to bound the terms of this inside sum. Choose r such that |z| < r < R
and h such that |z| + |h| < r. Then employing the difference of powers factorisation
again we get
Applying this bound, assuming the right-hand sum converges, we have

    |(f(z + h) − f(z))/h − Σ_{n=1}^∞ n aₙ z^(n−1)| ≤ Σ_{n=1}^∞ |aₙ| Σ_{j=0}^{n−1} r^j |h| (n − 1 − j) r^(n−2−j)
                                                   = |h| Σ_{n=1}^∞ |aₙ| r^(n−2) Σ_{j=0}^{n−1} j
                                                   = (1/2) |h| Σ_{n=2}^∞ |aₙ| n(n − 1) r^(n−2).

Using our previous lemma twice shows that the right-hand sum converges, and thus

    (f(z + h) − f(z))/h − Σ_{n=1}^∞ n aₙ z^(n−1) → 0   as h → 0,

that is, f′(z) = Σ_{n=1}^∞ n aₙ z^(n−1).
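A quick numerical check of term-by-term differentiation (my own sketch): for the geometric series Σ zⁿ = 1/(1 − z), the differentiated series Σ n zⁿ⁻¹ should sum to 1/(1 − z)²:

```python
z, N = 0.3, 200
s  = sum(z**n for n in range(N))               # partial sum of 1/(1 - z)
ds = sum(n * z**(n - 1) for n in range(1, N))  # term-by-term derivative
assert abs(s - 1 / (1 - z)) < 1e-12
assert abs(ds - 1 / (1 - z)**2) < 1e-12
```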
When we state a definition using power series, we pretty much always have to check
convergence straight away so that we do not end up writing complete nonsense. For this
power series, we can see by the ratio test that it converges for all z ∈ C. By the result
in the previous section, we can also see that it’s differentiable.
Now we can prove all of the properties that you already are aware of.
(iii) Clearly exp(x) > 0 for all x ≥ 0, and exp(0) = 1. Then exp(0) = exp(x − x) =
exp(x) exp(−x) = 1, and thus exp(−x) > 0 for x > 0.
(iv) Differentiating, exp′(x) = exp(x) > 0, and thus exp is strictly increasing on
R.
(v) Truncating the power series, exp(x) > 1 + x for x > 0. Thus if x → ∞,
exp(x) → ∞. Then exp(−x) = 1/ exp(x), so exp(x) → 0 as x → −∞.
(vi) Injectivity follows directly from exp being strictly increasing. For surjectivity,
take y > 0. Then by (v) there exists a, b ∈ R such that exp(a) < y < exp(b),
and by the intermediate value theorem there is c ∈ (a, b) such that exp(c) =
y.
Since the exponential function is a bijection R → R+ , it also has a well defined inverse.
By the inverse function theorem, this is a differentiable function, and its derivative is
given by log0 (t) = 1/t for t > 0.
Remark. As defined here, the logarithm function is an inverse of exp : R → (0, ∞), not
over the whole of C. In general, exp is not a bijection, and thus it having an inverse
doesn’t make sense (with the tools we have developed, this is remedied in later courses).
Using both the exponential and logarithm functions, we can define powers in the general
case. For x > 0 and α ∈ R, we define x^α = exp(α log x). With this, the normal
‘rules of indices’ become properties of the exponential function, and it is not hard to see
that this definition matches how powers would previously be defined for α ∈ Q. This
also gives us a new shorthand for the exponential function: exp(z) = e^z.
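A small sketch of mine checking this definition against familiar powers and an index law, using the built-in exp and log:

```python
from math import exp, log, isclose

def power(x, alpha):
    """x^alpha := exp(alpha * log x), defined for x > 0."""
    return exp(alpha * log(x))

assert isclose(power(2.0, 10), 1024.0)       # matches integer powers
assert isclose(power(9.0, 0.5), 3.0)         # matches rational powers
# index law x^a * x^b = x^(a+b) falls out of exp(a)exp(b) = exp(a+b)
assert isclose(power(2.0, 0.3) * power(2.0, 0.4), power(2.0, 0.7))
```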
¹¹ This section is completely non-examinable and the proofs are relatively long, so I suggest that you skip it if you are using these notes for revision.
¹² If we don't care about obtaining a new infinite series, this discussion can be completely ignored.
(a₀ + a₁z + a₂z² + a₃z³ + · · · ) · (b₀ + b₁z + b₂z² + b₃z³ + · · · )
= a₀b₀ + (a₀b₁ + a₁b₀)z + (a₀b₂ + a₁b₁ + a₂b₀)z² + (a₀b₃ + a₁b₂ + a₂b₁ + a₃b₀)z³ + · · · .
Taking inspiration from this, for two sequences aₙ and bₙ we can define their convolution
to be the sequence cₙ where
cn = a0 bn + a1 bn−1 + · · · + an b0 .
It is this construction that will form the basis of our product of infinite series.
The Cauchy product of the series Σ aₙ and Σ bₙ is the series Σ cₙ, where cₙ = a₀bₙ + a₁bₙ₋₁ + · · · + aₙb₀.
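The convolution is easy to compute directly; here is a small sketch of mine, checked against an ordinary polynomial product:

```python
def convolution(a, b):
    """c_n = a_0 b_n + a_1 b_{n-1} + ... + a_n b_0."""
    n = len(a)
    return [sum(a[k] * b[j - k] for k in range(j + 1)) for j in range(n)]

# multiplying (1 + z)(1 + z) should give coefficients of (1 + z)^2 = 1 + 2z + z^2
assert convolution([1, 1, 0], [1, 1, 0]) == [1, 2, 1]
```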
The Cauchy product does give us a way to obtain a ‘product’ of infinite series, but before
we go any further some important caveats need to be pointed out. The most notable is
the following: the infinite series obtained with the Cauchy product does not necessarily
converge¹³.
and thus they don’t tend to 0 so the series cannot converge. This implies that we
have two convergent series whose Cauchy product does not converge.
a
Exercise!
This is obviously not a very good thing to happen, and we will spend the rest of this
aside trying to deal with this possible issue. Luckily enough, this issue can be completely
avoided if we know that one of the series being ‘multiplied’ converges absolutely14 .
¹³ It's also possible for two divergent series to have a convergent Cauchy product. Try coming up with an example as an exercise.
¹⁴ This is another reason why absolute convergence is a great property to have.
Then we have

    Cₙ = AₙB + Σ_{k=0}^n aₙ₋ₖ(Bₖ − B).

We wish to show Cₙ → AB, and we will do so using three bounds. Given ε > 0,
since Aₙ → A, there is an integer L such that n ≥ L implies

    |Aₙ − A| ≤ ε/(3|B|).

Then since Σ_{k=0}^∞ |aₖ| converges and Bₙ → B, there is an integer M such that n ≥ M
implies

    |Bₙ − B| ≤ ε/(3 Σ_{k=0}^∞ |aₖ|).

Also, since Σ_{k=0}^∞ aₖ converges, aₙ → 0, and thus there is an integer N such that
n ≥ N implies that

    |aₙ| ≤ ε/(3 Σ_{k=0}^{M−1} |Bₖ − B|).

Combining theseᵃ, for n ≥ max{L, M + N} we obtain

    |Cₙ − AB| = |(Aₙ − A)B + Σ_{k=0}^n aₙ₋ₖ(Bₖ − B)|
              ≤ |Aₙ − A||B| + Σ_{k=0}^{M−1} |aₙ₋ₖ||Bₖ − B| + Σ_{k=M}^n |aₙ₋ₖ||Bₖ − B| ≤ ε.

Thus Σ_{n=0}^∞ cₙ = AB.
ᵃ We use M + N because we will need n − (M − 1) ≥ N to make the bounds work.
We can apply this result to power series. Recalling that power series converge absolutely
inside their circle of convergence, we can safely multiply power series using the Cauchy
product.
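As a numerical illustration of mine (not from the notes): the exponential series converges absolutely everywhere, so the Cauchy product of the series for exp(a) and exp(b) should sum to exp(a)·exp(b), equivalently exp(a + b):

```python
from math import exp, factorial

N = 30
a_val, b_val = 0.7, 1.1
a = [a_val**n / factorial(n) for n in range(N)]
b = [b_val**n / factorial(n) for n in range(N)]
# Cauchy product coefficients c_n = sum_k a_k b_{n-k}
c = [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(N)]
# both series converge absolutely, so the product series sums to the
# product of the sums
assert abs(sum(c) - exp(a_val) * exp(b_val)) < 1e-9
```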
where, as before, cₙ = a₀bₙ + a₁bₙ₋₁ + · · · + aₙb₀.
So to summarise: when inside the circle of convergence, we can multiply power series in
the natural way, and the result will be as we expect. Using this corollary, we could prove
exp(a + b) = exp(a) exp(b) directly (though admittedly the other proof is much faster).
    sin z = (e^(iz) − e^(−iz))/2i = z − z³/3! + z⁵/5! − z⁷/7! + · · ·
    cos z = (e^(iz) + e^(−iz))/2 = 1 − z²/2! + z⁴/4! − z⁶/6! + · · ·

where z ∈ C.
Stating these definitions in terms of the exponential function makes lots of identities quite
easy to derive.
(ii) sin²z + cos²z = −(e^(2iz) + e^(−2iz) − 2)/4 + (e^(2iz) + e^(−2iz) + 2)/4 = 1, as required.
(v) Differentiating the above result with respect to z, we have − sin z cos w −
cos z sin w = − sin(z + w), giving us our desired result.
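These exponential manipulations are easy to sanity-check numerically, even at complex arguments (a sketch of mine, using Python's complex math library):

```python
import cmath

def sin_c(z):
    # sin z = (e^{iz} - e^{-iz}) / 2i
    return (cmath.exp(1j * z) - cmath.exp(-1j * z)) / 2j

def cos_c(z):
    # cos z = (e^{iz} + e^{-iz}) / 2
    return (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2

z = 0.8 + 0.3j
assert abs(sin_c(z)**2 + cos_c(z)**2 - 1) < 1e-12
# matches the library's own complex sin and cos
assert abs(sin_c(z) - cmath.sin(z)) < 1e-12
assert abs(cos_c(z) - cmath.cos(z)) < 1e-12
```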
One of the more notable properties of sin and cos is that they are periodic, which is what
we are going to show next. To establish this, we can try evaluating one of them (say cos)
at some values near where we expect a root to be (guided by our previous knowledge of
these functions).
Proof. We begin by computing the sign of the derivative over (0, 2). Since cos′(x) =
−sin(x), we can use the inequality x^(2n−1)/(2n−1)! > x^(2n+1)/(2n+1)! (valid for
0 < x < 2) to bound

    sin x = x − x³/3! + x⁵/5! − x⁷/7! + · · · > 0,

so cos has a negative derivative and must be decreasing on (0, 2).
Evaluating the power series for cos at √2 and √3, we can see that the function
changes sign:

    cos √2 = 1 − 2/2! + 2²/4! − 2³/6! + · · · > 0,   cos √3 = 1 − 3/2! + 3²/4! − 3³/6! + · · · < 0.
Then by the intermediate value theorem, there must be a root in the interval
(√2, √3). Also, since cos is decreasing on (0, 2), this must also be the smallest
positive root.
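We can mirror this argument numerically (an illustration of mine): evaluate the cos power series, confirm the sign change on (√2, √3), and bisect to locate the root, which should come out as π/2:

```python
from math import pi, sqrt

def cos_series(x, terms=30):
    """cos x summed from its power series."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))  # next alternating term
    return total

lo, hi = sqrt(2), sqrt(3)
assert cos_series(lo) > 0 and cos_series(hi) < 0  # the sign change above
for _ in range(60):
    mid = (lo + hi) / 2
    if cos_series(mid) > 0:
        lo = mid
    else:
        hi = mid
assert abs((lo + hi) / 2 - pi / 2) < 1e-12
```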
We can easily determine the value of sin π/2 too. We have sin²(π/2) + cos²(π/2) = 1, so
sin²(π/2) = 1, and knowing from the proof of Lemma 5.15 that sin(x) > 0 on (0, 2), we
can deduce that sin π/2 = 1. These results are all we need to show that sin and cos are
periodic, and also that the two functions are just translated versions of each other.
Proof. This follows directly from sin(π/2) = 1, cos(π/2) = 0 and the angle addition
formulas.
Remark (Periodicity of exp). If we use the power series for sin and cos, we can write
exp(iz) = cos z + i sin z, from which the above result implies that exp is periodic with a
period of 2π.
Now that sin and cos have been defined properly, we can define tan, sec, csc and all of
those fun functions in the normal way.
Remark. The basic hyperbolic and trigonometric functions are related by cosh z =
cos(iz) and sinh z = −i sin(iz). Remembering this can allow you to adapt facts about
sin and cos to facts about sinh and cosh.
We will state a few properties for convenience.
§6 Integration
We are now prepared, both technically and emotionally, to talk about integrals. The
idea behind integration is to assign some sort of ‘area’ to sets. The first issue we run
into is that it is not quite clear what ‘area’ actually means (in mathematical terms), so
to avoid this difficulty, we will instead define the notion of an integral (specifically the
Riemann integral, there are other types) and we will use this as our definition of area
(not vice versa).
Throughout this section, we will deal with the integration of bounded functions defined
on bounded intervals, f : [a, b] → R. We will discuss a slight generalisation later on.
For a given dissection, because we are dealing with bounded functions we can sensibly
define the lower and upper sums of a function as follows.
    S_D(f) = Σ_{j=1}^n (x_j − x_{j−1}) sup_{x∈[x_{j−1},x_j]} f(x),
    s_D(f) = Σ_{j=1}^n (x_j − x_{j−1}) inf_{x∈[x_{j−1},x_j]} f(x).

[Figure: a dissection a = x₀ < x₁ < x₂ < x₃ < x₄ = b, with sup_{x∈[x₁,x₂]} f(x) and inf_{x∈[x₁,x₂]} f(x) marked on the graph of f.]
Intuitively, adding points to a dissection can only decrease the upper sum and increase
the lower sum. Indeed, if we say that D′ refines D when D′ ⊇ D,
then we can formalise this intuition as follows.
Proof. It suffices to show that this holds when D0 and D differ by a single element
(since then we can just repeatedly apply this argument to obtain the general case).
Suppose that D′ = {x₀, . . . , xₙ}, and that D = D′\{xᵢ} where i ≠ 0, n. We can then
establish the right-hand side of the inequality by noting that

    (x_{i+1} − x_{i−1}) sup_{x∈[x_{i−1},x_{i+1}]} f(x) ≥ (x_i − x_{i−1}) sup_{x∈[x_{i−1},x_i]} f(x) + (x_{i+1} − x_i) sup_{x∈[x_i,x_{i+1}]} f(x),

so S_{D′}(f) ≤ S_D(f). Then the left-hand side of the inequality follows similarly (taking infima).
A direct consequence of this lemma is that lower sums can never exceed upper sums.
sD1 (f ) ≤ SD2 (f ).
Proof. This follows from the previous lemma by noting that s_{D₁}(f) ≤ s_{D₁∪D₂}(f) ≤
S_{D₁∪D₂}(f) ≤ S_{D₂}(f).
We are now (finally) ready to say what it means for a function to be Riemann integrable.
Note that Lemma 6.4, along with the boundedness of the function, guarantees that the
upper and lower integrals always exist.
If x_j − x_{j−1} < δ for all j, then we can obtain the upper bound

    S_D(f) ≤ b²/2 − a²/2 + (δ/2) Σ_{j=1}^n (x_j − x_{j−1}) = (b² − a²)/2 + (δ/2)(b − a).

So given ε > 0, there is some dissection D such that S_D(f) ≤ (b² − a²)/2 + ε, and
a similar argument also gives s_D(f) ≥ (b² − a²)/2 − ε. So by our key integration
property, we have

    (b² − a²)/2 − ε ≤ I_*(f) ≤ I^*(f) ≤ (b² − a²)/2 + ε

for any ε > 0, and thus I^*(f) = I_*(f) = (b² − a²)/2. This shows that f(x) = x is
integrable over [a, b], with

    ∫_a^b x dx = (b² − a²)/2 = b²/2 − a²/2.
There are a few ways that we can streamline this argument. Later on we will develop
more general integrability results, but the following criterion is also quite useful for
showing particular functions are integrable.
S_D(f) − s_D(f) < ε.
¹⁵ We will regularly drop the ‘Riemann’, and when referring to ‘integrability’ we will always be talking only about Riemann integrability.
Proof. If f is Riemann integrable, then given ε > 0 there must exist dissections D₁,
D₂ such that

    S_{D₁}(f) < ∫_a^b f(x) dx + ε/2   and   s_{D₂}(f) > ∫_a^b f(x) dx − ε/2.

Taking D = D₁ ∪ D₂, we then have S_D(f) − s_D(f) ≤ S_{D₁}(f) − s_{D₂}(f) < ε.

Otherwise, if for any ε > 0 we can find a dissection D such that S_D(f) − s_D(f) < ε,
then we would have I^*(f) − I_*(f) ≤ S_D(f) − s_D(f) < ε for every ε > 0, so
I^*(f) = I_*(f) and f is integrable.
Proof. Let D be a dissection of [a, b]. Then since sup_x −f(x) = −inf_x f(x), we have

    S_D(−f) = Σ_{j=1}^n (x_j − x_{j−1}) sup_{x∈[x_{j−1},x_j]} (−f(x))
            = −Σ_{j=1}^n (x_j − x_{j−1}) inf_{x∈[x_{j−1},x_j]} f(x)
            = −s_D(f).

Similarly s_D(−f) = −S_D(f). This then implies that I^*(−f) = −I_*(f) and I_*(−f) =
−I^*(f). But then f is integrable, so I^*(−f) = I_*(−f) = −I^*(f). Thus −f is also
integrable, and ∫_a^b (−f(x)) dx = −∫_a^b f(x) dx.
For λ > 0 we have S_D(λf) = λS_D(f), and similarly s_D(λf) = λs_D(f). Then I^*(λf) = λI^*(f) and I_*(λf) = λI_*(f). Since
f is integrable, we then have I^*(λf) = I_*(λf) = λI^*(f), so λf is integrable and
∫_a^b λf(x) dx = λ ∫_a^b f(x) dx.
ᵃ If λ < 0, we can consider |λ|f; then by the previous lemma −|λ|f = λf will be integrable.
is integrable and

    ∫_a^b (λf(x) + µg(x)) dx = λ ∫_a^b f(x) dx + µ ∫_a^b g(x) dx.
Let’s now turn our attention16 to another set of integral properties which are still natural
but probably ever-so-slightly less familiar: integral inequalities. The first result is one
you’d expect to be true.
The next result shows that |f | is integrable if f is, and also something like a triangle
inequality for integrals.
Proof. We define f₊(x) = max{f(x), 0}. We will show that f₊ is integrable.
For some interval I we have

    sup_{x∈I} f₊(x) = max{0, sup_{x∈I} f(x)},   inf_{x∈I} f₊(x) = max{0, inf_{x∈I} f(x)},

and then considering the cases sup_{x∈I} f(x) ≥ 0 and sup_{x∈I} f(x) ≤ 0, we find that
the following inequality holds:

    sup_{x∈I} f₊(x) − inf_{x∈I} f₊(x) ≤ sup_{x∈I} f(x) − inf_{x∈I} f(x),

and thus f₊ is integrable. We can write |f| = 2f₊ − f, and thus |f| is also integrable.
¹⁶ Don't get too sad – we still have to show that the product of integrable functions is integrable, but we will come back to that once we have shown |f| is integrable.
We also have −|f| ≤ f ≤ |f|, so |∫_a^b f(x) dx| ≤ ∫_a^b |f(x)| dx.
We can use this property to prove that the product of two integrable functions f g is
integrable. To do this, the expansion 2f g = (f + g)2 − f 2 − g 2 can be used, so all we
need to show is that f² is integrable if f is. Also, since f² = |f| · |f|, we can work only
with the case where f is non-negative.
Unlike some of the previous results, we will not write down what the value of this integral
will be (there’s not really a nice rule). We will only be proving integrability.
Proof. We will first show that if f : [a, b] → R is an integrable function such that
f (x) ≥ 0 for all x ∈ [a, b], then f 2 is integrable.
Since f is integrable (and hence bounded), we can choose some K > 0 such that
|f(x)| ≤ K for all x ∈ [a, b]. Then given any ε > 0, there exists some dissection D
where S_D(f) − s_D(f) < ε/2K.
Using D, we define M_j = sup_{x∈[x_{j−1},x_j]} f(x) and m_j = inf_{x∈[x_{j−1},x_j]} f(x). Then we
have

    S_D(f²) − s_D(f²) = Σ_{j=1}^n (x_j − x_{j−1})(M_j² − m_j²)
                      = Σ_{j=1}^n (x_j − x_{j−1})(M_j + m_j)(M_j − m_j)
                      ≤ 2K (S_D(f) − s_D(f)) < ε,

so f² is integrable.
So let’s look at some types of integrable functions. The first is that monotonic functions
are integrable, which can be shown by taking an evenly spaced dissection causing the
sum to telescope.
Proof. Without loss of generality, assume that f is increasing. Then given ε > 0,
for any integer n > [(b − a)(f(b) − f(a))]/ε we can define the dissection D =
{x₀, x₁, . . . , xₙ} of [a, b], where x₀ = a and x_j − x_{j−1} = (b − a)/n. Then we have

    S_D(f) − s_D(f) = Σ_{j=1}^n (x_j − x_{j−1}) [sup_{x∈[x_{j−1},x_j]} f(x) − inf_{x∈[x_{j−1},x_j]} f(x)]
                    = Σ_{j=1}^n [(b − a)/n] [f(x_j) − f(x_{j−1})]
                    = (b − a)(f(b) − f(a))/n < ε,
and thus f is integrable.
An interesting consequence of this is that if we can write f(x) = f₁(x) − f₂(x) where f₁, f₂
are increasing, then f is integrable. Such functions are known as ‘functions of bounded
variation’.
The next result we would like to show is that every continuous function on a closed
bounded interval is Riemann integrable. Before jumping into the proof, let's take a moment
to reflect on what might be needed¹⁷. For some continuous function, we want to get
S_D(f) − s_D(f) < ε with some dissection. What this comes down to is finding some way
to bound

    sup_{x∈I} f(x) − inf_{x∈I} f(x)

on some sufficiently small interval I. Now we know that f is continuous, and continuity
tells us that if the interval is of size at most some δ (which can depend on
where the interval is) then we can get |f(x) − f(y)| < ε. This works for one interval,
but we need to construct a dissection, so we would need to make this type of argument
work for intervals which cover the whole of [a, b].
What would be really nice is if we could find some δ that would work no matter where
¹⁷ This is not the only possible proof! You can prove things without uniform continuity, but this is a more natural (and more slick) argument.
the sub-interval in [a, b] is. If this was the case, then the dissection could just be made
up of points which are about a distance of δ apart. It turns out that this is always
possible, and gives us the notion of uniform continuity¹⁸.
Note that unlike the previous definition, we have no knowledge of where in the set A
the points x and y are, just how far apart they are. It turns out that on closed bounded
intervals (like what we have when we talk about integration, at least in this section),
continuous functions are always uniformly continuous. The proof of this is quite natural
– we use Bolzano-Weierstrass to show that a contradiction to uniform continuity is a
contradiction to continuity.
Proof. Suppose f was not uniformly continuous; that is, there exists some ε > 0
such that for all δ > 0 there are some x, y with |x − y| < δ such that |f(x) − f(y)| ≥ ε.
Taking δ = 1/n, we can find some sequences xₙ, yₙ ∈ [a, b] such that |xₙ − yₙ| < 1/n
and |f(xₙ) − f(yₙ)| ≥ ε. Then by Bolzano–Weierstrass, we can find some convergent
subsequence x_{n_j} → x for some x ∈ [a, b]. But then |x_{n_j} − y_{n_j}| < 1/n_j for all j, so
we must have y_{n_j} → x also.
Then |f(x_{n_j}) − f(y_{n_j})| ≥ ε for every j, which implies that f(x_{n_j}) and f(y_{n_j}) cannot
converge to the same value. But then by continuity we have f(x_{n_j}) → f(x) and
f(y_{n_j}) → f(x), which is a contradiction. Thus f must be uniformly continuous.
With this result, we can prove that continuous functions are integrable in the way we
discussed previously.
Proof. Given ε > 0, since f is uniformly continuous, there is some δ > 0 such that
|f(x) − f(y)| < ε/(b − a) whenever |x − y| < δ and x, y ∈ [a, b].
Now choose some integer n ≥ (b − a)/δ, and define the dissection D = {x₀, x₁, . . . , xₙ}
with x_j = a + j(b − a)/n. Then we have

    sup_{x∈[x_{j−1},x_j]} f(x) − inf_{x∈[x_{j−1},x_j]} f(x) ≤ ε/(b − a),

and summing over j gives S_D(f) − s_D(f) ≤ ε, so f is integrable.
¹⁸ The ‘uniform’ part of the definition is part of a larger class of definitions that you will come across in later analysis courses.
This theorem shows that most of the functions we deal with regularly (and all of the
ones discussed in the previous chapter) are integrable as well as continuous.
Also, we have so far spent relatively little time on the ‘evaluating integrals’ side of
things, but knowing a function is integrable can help in determining the actual value of
the integral. A straightforward example is shown below.
Now given ε > 0, since f is continuous at x there is some δ > 0 such that when
|x − y| < δ we have |f(x) − f(y)| < ε. Hence if 0 < |h| < δ we have

    |(1/h) ∫_x^{x+h} f(t) dt − f(x)| = |(1/h) ∫_x^{x+h} (f(t) − f(x)) dt|
                                     ≤ (1/|h|) ∫_x^{x+h} |f(t) − f(x)| dt
                                     ≤ (1/|h|) · ε|h| = ε,

hence lim_{h→0} (F(x + h) − F(x))/h = f(x), as required.
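The quantity controlled in this proof, (1/h)∫_x^{x+h} f(t) dt, is easy to examine numerically (a sketch of mine, using a midpoint-rule approximation to the integral):

```python
def integral(f, a, b, n=1000):
    """Midpoint Riemann-sum approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (j + 0.5) * h) for j in range(n)) * h

f = lambda t: t**3 - 2 * t
x, h = 1.2, 1e-4
# (F(x+h) - F(x))/h = (1/h) * integral of f over [x, x+h],
# which should approach f(x) as h -> 0
dq = integral(f, x, x + h) / h
assert abs(dq - f(x)) < 1e-3
```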
Another way to view this theorem (which is quite useful in evaluating integrals) is the
corollary below.
Proof. Define g(x) = ∫_a^x f′(t) dt. Then by the fundamental theorem of calculus,
g′(x) = f′(x). Also g(a) = 0.
Let h(x) = g(x) − f(x). Then h′(x) = 0, so by the mean value theorem, h(x) is
constant. Taking the value at a, we then have h(x) = h(a) = g(a) − f(a) = −f(a),
i.e. g(x) − f(x) = −f(a) for all x. So if x = b, this becomes g(b) = f(b) − f(a), which is our desired result.
One of the most useful applications of the fundamental theorem of calculus is that we
can translate results about derivatives to results about integrals. This works in both
coming up with the integrals of functions (by thinking about which functions result in a
given function when differentiated), but also in the development of general methods for
evaluating integrals. Two results we will look at are integration by parts, which comes
from the product rule of differentiation, and integration by substitution, which comes
from the chain rule.
Proof. Set F(x) = ∫_a^x f(t) dt, and let h(t) = F(g(t)), which is defined since g takes
values in [a, b]. Then by the fundamental theorem of calculus and the chain rule we have

    ∫_α^β f(g(t)) g′(t) dt = ∫_α^β F′(g(t)) g′(t) dt = ∫_α^β h′(t) dt = h(β) − h(α) = ∫_{g(α)}^{g(β)} f(x) dx.
Proof. We do induction on n. For n = 1, the result becomes f(b) − f(a) = ∫_a^b f′(t) dt,
which is true by the fundamental theorem of calculus.
Now if f is (n + 1)-times continuously differentiable, integrating by parts we have

    ∫_a^b [f^(n)(t)/(n − 1)!] (b − t)^(n−1) dt = [−(f^(n)(t)/n!)(b − t)^n]_a^b + ∫_a^b [f^(n+1)(t)/n!] (b − t)^n dt
¹⁹ This is why we don't just throw out the old proof! We need a stronger continuity condition than before – specifically f^(n) being continuous, not just existing.
    = [f^(n)(a)/n!] (b − a)^n + ∫_a^b [f^(n+1)(t)/n!] (b − t)^n dt.
Thus if the result is true for n, it is also true for n+1, so we are done by induction.
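We can check this integral form of the remainder numerically for f = exp, where every derivative is again exp (my own illustration, with a midpoint-rule integral):

```python
from math import exp, factorial

def integral(g, a, b, n=20000):
    """Midpoint Riemann-sum approximation to the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (j + 0.5) * h) for j in range(n)) * h

a, b, n = 0.0, 1.0, 4
# degree n-1 Taylor polynomial of exp about a, evaluated at b
P_b = sum(exp(a) * (b - a)**k / factorial(k) for k in range(n))
# integral form of the remainder: int_a^b f^(n)(t) (b-t)^(n-1)/(n-1)! dt
remainder = integral(lambda t: (b - t)**(n - 1) * exp(t) / factorial(n - 1), a, b)
assert abs(exp(b) - (P_b + remainder)) < 1e-8
```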
With this result in hand, we can try and re-obtain the Lagrange form of the remainder.
Looking back at the form of this, we really want some way to take the f^(n)(t) term ‘out
of the integral’, turning it into some scale factor f^(n)(c), for some c ∈ (a, b).
The way we worked with this type of idea before was using the mean value theorem, so
we are going to prove an analogue of the mean value theorem for integrals. To do this
we will (naturally) use the mean value theorem along with the fundamental theorem of
calculus.
Proof. Let F(x) = ∫_a^x f(t)g(t) dt and G(x) = ∫_a^x g(t) dt. By Cauchy's mean value
theorem, there exists some c ∈ (a, b) such that

    F′(c)(G(b) − G(a)) = G′(c)(F(b) − F(a)),

that is, f(c)g(c) ∫_a^b g(t) dt = g(c) ∫_a^b f(t)g(t) dt, and dividing by g(c) gives the result.
Now, to just emphasize one last time: this version of the result imposes an extra conti-
nuity condition, and does not replace the result and proof we had earlier. That said, it’s
still a perfectly good result on its own.
If k ≠ 1, then over [1, R] we have

    ∫_1^R (1/x^k) dx = [x^(1−k)/(1 − k)]_1^R = (R^(1−k) − 1)/(1 − k),

which tends to a finite limit as R → ∞, namely −1/(1 − k), if and only if k > 1.
Otherwise, if k = 1, then over [1, R] we have

    ∫_1^R (1/x) dx = log R,

which does not tend to a finite limit as R → ∞. Thus the integral converges if and
only if k > 1.
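A numerical check of both cases (my own sketch, via a midpoint rule):

```python
import math

def tail_integral(k, R, n=200000):
    """Midpoint-rule approximation to the integral of x^(-k) over [1, R]."""
    h = (R - 1) / n
    return sum((1 + (j + 0.5) * h) ** (-k) for j in range(n)) * h

# k = 2: the integral over [1, R] is 1 - 1/R, approaching 1/(k - 1) = 1
assert abs(tail_integral(2, 1000.0) - (1 - 1 / 1000.0)) < 1e-4
# k = 1: the integral over [1, R] is log R, which grows without bound
assert abs(tail_integral(1, 1000.0) - math.log(1000.0)) < 1e-4
```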
Example 6.30 (Integral of 1/√x from 0 to 1)
We will show that the improper integral ∫_0^1 (1/√x) dx converges. For any δ > 0,
we have

    ∫_δ^1 (1/√x) dx = [2√x]_δ^1 = 2 − 2√δ → 2   as δ → 0,

so the integral converges, with value 2.
Note the example above shows that it is possible to have an improper integral which
converges when the function is not bounded on the interval being integrated over²⁰.
The definition of improper integrals also gives us a sensible way to say if ∫_{−∞}^∞ f(x) dx
converges. If for some a we have ∫_{−∞}^a f(x) dx = ℓ₁ and ∫_a^∞ f(x) dx = ℓ₂, then we say
that ∫_{−∞}^∞ f(x) dx = ℓ₁ + ℓ₂.
Remark (Warning). This is a strictly stronger notion than saying that ∫_{−R}^R f(x) dx
converges to some limit as R → ∞.
It is with improper integrals that we get the last application of integration to our previ-
ously discussed topics: infinite series.
The integral comparison test is useful because in many cases it is easier to evaluate an
integral than it is to evaluate a sum (of course this isn’t always true, and there’s many
examples in both directions). This test also gives us a way to determine the convergence
of sums that previously required tools such as Cauchy's condensation test²¹. An example
is shown below.

²⁰ It is also possible to have a function f that is unbounded on the interval [1, ∞), and yet ∫_1^∞ f(x) dx converges. If you haven't seen this before, try coming up with an example (as a hint, one way to do it is by drawing triangles in a somewhat clever way).
²¹ Though it can be rather cumbersome to evaluate the integrals in certain cases, in which case reaching for Cauchy's condensation test first might still be helpful. Using the integral test is also a straightforward way to derive Cauchy's condensation test.
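As a numerical sanity check of the comparison between sums and integrals (my own sketch, not the notes' worked example): since 1/x² is decreasing, the integral over [1, N] is squeezed between shifted partial sums of Σ 1/n²:

```python
N = 1000
integral_val = 1 - 1 / N                       # integral of x^(-2) over [1, N], exactly
upper = sum(1 / n**2 for n in range(1, N))     # sum of 1/n^2 for n = 1 .. N-1
lower = sum(1 / n**2 for n in range(2, N + 1)) # sum of 1/n^2 for n = 2 .. N
# 1/x^2 is decreasing, so on [n, n+1] we have 1/(n+1)^2 <= 1/x^2 <= 1/n^2
assert lower <= integral_val <= upper
```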
64