Lecture Notes: Metric Spaces - Sergey Mozgovoy
Lecture Notes: Metric Spaces - Sergey Mozgovoy
Lecture Notes: Metric Spaces - Sergey Mozgovoy
SERGEY MOZGOVOY
Contents
1. Metric spaces 2
1.1. Definition and examples 2
1.2. Balls and bounded sets 5
1.3. Open and closed sets 6
1.4. Convergent sequences 8
1.5. Continuous maps 10
1.6. Complete metric spaces 12
1.7. Completion 14
1.8. Banach fixed point theorem 16
2. Topological spaces 18
2.1. Definition and examples 18
2.2. Subspace and product topologies 20
2.3. Hausdorff spaces 22
2.4. Connected topological spaces 23
2.5. Compact topological spaces 26
2.6. Compact metric spaces 28
2.7. Continuous functions on compact spaces 30
3. Normed vector spaces 32
3.1. Definition of a normed vector space 32
3.2. Examples of normed vector spaces 33
3.3. Bounded linear operators 34
3.4. Equivalent norms 36
3.5. Banach spaces 37
Appendix A. Some inequalities 40
1. Metric spaces
1.1. Definition and examples. The usual distance between points on the plane is given by
p
d(x, y) = (x1 − y1 )2 + (x2 − y2 )2
where x = (x1 , x2 ) ∈ R2 and y = (y1 , y2 ) ∈ R2 . This distance satisfies in particular the triangle inequality
d(x, z) ≤ d(x, y) + d(y, z), x, y, z ∈ R2 .
Using this example as a motivation we can axiomatize the properties of a distance as follows.
Definition 1.1. A metric space is a pair (X, d), where X is a set and d: X × X → R is map such that for
all x, y, z ∈ X
(0) d(x, y) ≥ 0,
(1) d(x, y) = 0 ⇐⇒ x = y,
(2) d(x, y) = d(y, x) (symmetry),
(3) d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).
The map d is called a metric (or a distance function). The elements of X are called points of X.
Remark 1.2. Axiom 0 follows from other axioms. Indeed 0 = d(x, x) ≤ d(x, y) + d(y, x) = 2d(x, y), hence
d(x, y) ≥ 0.
Example 1.3 (Euclidean spaces). Let
Rn = { (x1 , . . . , xn ) | xi ∈ R ∀1 ≤ i ≤ n} .
This is a vector space over R, where the sum of elements x, y ∈ Rn is defined by
x + y = (x1 + y1 , . . . , xn + yn ) ∈ Rn
and the product of x ∈ Rn with a scalar λ ∈ R is defined by
λx = (λx1 , . . . , λxn ) ∈ Rn .
Define the scalar product on Rn by
(x, y) = x1 y1 + · · · + xn yn ∈ R.
n
Define the norm on R by q
p
kxk = (x, x) = x21 + · · · + x2n .
Define the map d: Rn × Rn → R by
p
d(x, y) = kx − yk = (x1 − y1 )2 + · · · + (xn − yn )2 .
To prove that d is a metric on Rn we will apply (see Theorem A.1)
Theorem 1.4 (Cauchy-Schwartz inequality). For all x, y ∈ Rn , we have
|(x, y)| ≤ kxk · kyk .
Corollary 1.5 (Triangle inequality). For all x, y ∈ Rn , we have
kx + yk ≤ kxk + kyk .
Proof. We have
2
kx + yk = (x + y, x + y) = (x, x) + 2(x, y) + (y, y)
2 2
≤ kxk + 2 kxk · kyk + kyk = (kxk + kyk)2 ,
hence kx + yk ≤ kxk + kyk.
n n n
Corollary 1.6. The map d: R × R → R defined above is a metric on R .
Proof. Let us verify the axioms of a metric:
pP
(1) If x = y, then clearly d(x, y) = kx − yk = 0. Conversely, if d(x, y) = (xi − yi )2 = 0 =⇒
2
(xi − yi ) = 0 for all i =⇒ xi = yi for all i =⇒ x = y.
(2) d(x, y) = kx − yk = ky − xk = d(y, x).
3
Remark 1.7. The metric space (Rn , d) is called the Euclidean space. The metric d is called the Euclidean
metric (or Euclidean distance).
Example 1.8. Define a metric d∞ on R2 by
d∞ (x, y) = max {|x1 − y1 | , |x2 − y2 |} .
n
More generally, define for x, y ∈ R
kxk∞ = max |xi | , d∞ (x, y) = kx − yk∞ = max |xi − yi | .
1≤i≤n 1≤i≤n
n
Then d∞ is a metric on R different from the Euclidean metric.
Example 1.9. Define a metric d1 on R2 by
d1 (x, y) = |x1 − y1 | + |x2 − y2 | .
More generally, define for x, y ∈ Rn
n
X n
X
kxk1 = |xi | , d1 (x, y) = kx − yk1 = |xi − yi | .
i=1 i=1
To see that dp satisfies the triangle inequality we will apply (see Theorem A.2)
Theorem 1.11 (Minkowski inequality). For any p ≥ 1 and x, y ∈ Rn , we have
n
!1/p n
!1/p n
!1/p
X p
X p
X p
|xi + yi | ≤ |xi | + |yi | .
i=1 i=1 i=1
We can write this inequality as kx + ykp ≤ kxkp + kykp . Triangle inequality for dp follows from
dp (x, z) = kx − zkp = k(x − y) + (y − z)kp ≤ kx − ykp + ky − zkp = dp (x, y) + dp (y, z).
pP
Remark 1.12. For P p = 2, we obtain d2 (x, y) = (xi − yi )2 which is the Euclidean metric. For p = 1, we
obtain d1 (x, y) = |xi − yi | which is the metric considered in the previous example.
Remark 1.13. One can show that for any x ∈ Rn , we have limp→∞ kxkp = maxi |xi | = kxk∞ . This implies
that limp→∞ dp (x, y) = d∞ (x, y).
Example 1.14 (Discrete metric). Let X be any set. Define the discrete metric on X by
(
0 if x = y
d(x, y) =
1 if x 6= y
Example 1.15 (Hamming distance). This is a distance used in information theory and coding theory. Let
X be a set and let n ≥ 1. Define a metric on X n by the rule
d(x, y) = # { 1 ≤ i ≤ n | xi 6= yi } , x, y ∈ X n .
This metric is called the Hamming distance. It is equal to the number of positions where the messages x
P1 we obtain the discrete metric on X. For X = {0, 1} we can write the distance
and y are different. For n =
also in the form d(x, y) = i |xi − yi |.
4
Example 1.16 (Levenshtein distance). This is S a distance between strings used in computer science and
linguistics. Let X be a finite set and let X ∗ = n≥0 X n be the set of finite sequences x = (x1 , . . . , xn ),
n ≥ 0, called words (or strings) in the alphabet X. Define the distance between two words to be the minimal
number of 1-letter edits (insertion, deletion or substitution) needed to obtain one word from the other.
Formally, let x, y be two words of length m, n respectively. For 0 ≤ i ≤ m and 0 ≤ j ≤ n define
(
max {i, j} ij = 0,
`x,y (i, j) =
min `x,y (i − 1, j) + 1, `x,y (i, j − 1) + 1, `x,y (i − 1, j − 1) + 1xi 6=yj ij > 0
Then define d(x, y) = `x,y (m, n). For example
d(tip, top) = d(ti, to) = d(t, t) + 1 = 1
d(desert, dessert) = d(de, des) = d(de, de) + 1 = 1
Example 1.17 (Distance on graphs). A graph is a pair (V, E), where V is a set, called the set of vertices,
and E is a set of unordered pairs of elements in V , called the set of edges. An edge e = {v, v 0 } ∈ E (also
written as vv 0 ) is called an edge between vertices v and v 0 . Given a function f : E → R>0 , we can interpret
f (e) for an edge e = vv 0 as the time needed to travel between v and v 0 along e. For any path p = e1 . . . en
0
f (ei ). Define d(v, v 0 ) as the minimal value of f (p) for the paths p
P
between vertices v and v , let f (p) =
0
between v and v (and +∞ if there are no such paths). One can show that (V, d) is a metric space (if we
also allow the value +∞). For example, if f (e) = 1 for every e ∈ E, then d(v, v 0 ) is equal to the number of
edges along the shortest path between v and v 0 .
Example 1.18. We can interpret the space Rn as the set of all functions f : {1, . . . , n} → R. Let us
generalize this construction to infinite domains. Given real numbers a < b, let C[a, b] be the set of all
continuous functions f : [a, b] → R. Define a metric d = d∞ on C[a, b] by
d(f, g) = sup |f (x) − g(x)| , f, g ∈ C[a, b].
x∈[a,b]
The set U is open as a union of open balls. Therefore all points of U are internal in U , hence also in A. This
implies that U ⊂ A◦ and we conclude that A◦ = U and it is open.
If V ⊂ A is the union of all open subsets of A, then A◦ ⊂ V . On the other hand all points of V are
internal in A, hence V ⊂ A◦ and we conclude that A◦ = V .
7
1.4. Convergent sequences. Let (X, d) be a metric space. A sequence (xn )∞ n=1 of points in X is a collection
of elements x1 , x2 , . . . in X. We denote a sequence (xn )∞
n=1 also by (x n ) n≥1 or just (xn ).
Definition 1.45. A sequence (xn ) of points in X is called convergent to x ∈ X if ∀ε > 0 ∃N > 0 such that
d(xn , x) < ε ∀n ≥ N.
The point x is called the limit of the sequence and is denoted by limn→∞ xn . We also write xn → x as
n → ∞.
Remark 1.46. The definition of convergence of a sequence (xn ) to x is equivalent to the statement that
d(xn , x) → 0, n → ∞.
Lemma 1.47. Every convergent sequence in a metric space has a unique limit.
Proof. Assume that (xn )n≥1 converges to x, y ∈ X. For any ε > 0 there exist N, N 0 > 0 such that
d(xn , x) < ε for n ≥ N and d(xn , y) < ε for n ≥ N 0 . Then for any n ≥ N, N 0 we have
d(x, y) ≤ d(x, xn ) + d(xn , y) < ε + ε = 2ε.
This inequality is true for any ε > 0 =⇒ d(x, y) = 0 =⇒ x = y.
n−1
Example 1.48. Consider a sequence (xn )n≥1 in R2 with xn = n1 , n . Then limn→∞ xn = x = (0, 1).
Indeed, s 2 2 r √
1 n−1 1 1 2
d(xn , x) = −0 + −1 = + 2 = → 0, n → ∞.
n n n2 n n
This example shows that convergence in Rk can be studied coordinatewise. This is the content of the next
lemma.
(n)
Lemma 1.49. A sequence (x(n) )n≥1 in the Euclidean space Rk converges to x ∈ Rk ⇐⇒ xi → xi as
n → ∞, for all 1 ≤ i ≤ k.
q
(n) (n)
P (n) (n)
Proof. =⇒ If x → x as n → ∞, then d(x , x) = i (xi − xi )2 → 0 =⇒ (xi − xi )2 → 0 for all
(n)
1 ≤ i ≤ k =⇒ xi → xi as n → ∞. q
(n) (n) (n)
− xi )2 → 0 for all i, hence d(x(n) , x) =
P
⇐= If xi → xi as n → ∞ for all i, then (xi i (xi − xi )2 → 0.
This means that x(n) → x as n → ∞.
Sequences are useful for the characterization of limit points of subsets.
Lemma 1.50. Let A ⊂ X be a subset. A point x ∈ X is a limit point of A ⊂ X ⇐⇒ there exists a
sequence in A\ {x} convergent to x.
Proof. =⇒ Let x ∈ X be a limit point of A. Then, for any n ≥ 1 there exists xn ∈ A\ {x} such that
d(xn , x) < 1/n. This implies that the sequence (xn )n≥1 is contained in A\ {x} and converges to x.
⇐= Assume that there exists a sequence (xn )n≥1 in A\ {x} convergent to x. Then, for any ε > 0 there exists
n ≥ 1 such that d(xn , x) < ε. This implies that x is a limit point of A.
Corollary 1.51. A subset A ⊂ X is closed ⇐⇒ the limit of any convergent sequence of points in A is
contained in A.
Proof. =⇒ If A is closed and (xn ) is a sequence in A convergent to x ∈ X, then x is a limit point of A, hence
x ∈ A. ⇐= If x is a limit point of A, then there exists a sequence (xn ) in A\ {x} convergent to x. By our
assumption this limit is contained in A, hence x ∈ A and A is closed set.
Example 1.52. Consider the space C[0, 1] equipped with the metric d1 . Consider the sequence of elements
xn ∈ C[0, 1] defined by xn (t) = tn for t ∈ [0, 1]. For every t ∈ [0, 1), we have limn→∞ xn (t) = limn→∞ tn = 0.
Therefore we can guess that the sequence (xn ) converges to the constant zero function 0 ∈ C[0, 1]. This is
indeed the case as
Z 1 Z 1 1
n n tn+1 1
d1 (xn , 0) = |t − 0| dt = t dt = = → 0, n → ∞.
0 0 n+1 0 n+1
9
On the other hand, let us equip C[0, 1] with the metric d∞ (called the supremum metric). Then
d∞ (xn , 0) = sup |tn − 0| = 1,
t∈[0,1]
or equivalently
|xn (t) − x(t)| ≤ ε ∀n ≥ N ∀t ∈ [a, b].
This leads us to the following definition.
Definition 1.54. Let x(t) be a function and (xn (t))n≥1 be a sequence of functions on the interval [a, b].
(1) A sequence (xn (t))n≥1 is said to converge pointwise to a function x(t) if
∀t ∈ [a, b] ∀ε > 0 ∃N > 0: |xn (t) − x(t)| < ε ∀n ≥ N
or equivalently, ∀t ∈ [a, b] we have limn→∞ xn (t) = x(t).
(2) A sequence (xn (t)) is said to converge uniformly to a function x(t) if
∀ε > 0 ∃N > 0: |xn (t) − x(t)| < ε ∀n ≥ N ∀t ∈ [a, b].
Remark 1.55.
(1) Note that for pointwise convergence we can choose N depending on t, while for uniform convergence
we have to choose N that works for all t simultaneously (uniformly).
(2) Uniform convergence implies pointwise convergence.
(3) We can see from the previous example that uniform convergence is equivalent to the convergence
with respect to the metric d∞ .
Example 1.56. Consider the sequence of functions xn (t) = tn − t2n in C[0, 1]. It converges pointwise to
the zero function as lim xn (t) = limn→∞ (tn − t2n ) = 0 for every t ∈ [0, 1]. On the other hand this sequence
does not converge uniformly to the zero function as
1
d∞ (xn , 0) = sup (tn − t2n ) = ∀n ≥ 1.
t∈[0,1] 4
Example 1.57. We have seen earlier that the sequence of functions xn (t) = tn in C[0, 1] does not converge
to the zero function with respect to the metric d∞ . But does it converge to any other function x ∈ C[0, 1]?
If this is the case, then (xn ) converges uniformly to x, hence also pointwise. Then, for any t ∈ [0, 1], we have
(
0, 0 ≤ t < 1
x(t) = lim xn (t) = lim tn =
n→∞ n→∞ 1, t = 1
But this function is not continuous, hence the sequence (xn ) does not converge in (C[0, 1], d∞ ).
Lemma 1.58. Let (xn ) be a sequence in X convergent to x ∈ X and let y ∈ X. Then
lim d(xn , y) = d(x, y).
n→∞
Proof. In the proof of this lemma we will need the following useful result
Lemma 1.59. For any points x, y, z ∈ X we have
|d(x, y) − d(x, z)| ≤ d(y, z).
Proof. From the triangle inequalities d(x, y) ≤ d(x, z) + d(y, z) and d(x, z) ≤ d(x, y) + d(y, z), we obtain
d(x, y) − d(x, z) ≤ d(y, z) and d(x, z) − d(x, y) ≤ d(y, z), hence the required statement.
Going back to the original lemma, we obtain
|d(xn , y) − d(x, y)| ≤ d(xn , x) → 0, n → ∞.
Therefore d(xn , y) → d(x, y) as n → ∞.
10
⇐= Let us show that f is continuous at any point x ∈ X. For every ε > 0, the set U = B(f (x), ε) is open
in Y . Therefore its preimage f −1 (U ) is open in X. In particular, the point x ∈ f −1 (U ) is an internal point.
Therefore there exists δ > 0 such that B(x, δ) ⊂ f −1 (U ), that is,
f (B(x, δ)) ⊂ U = B(f (x), δ).
This meas that f is continuous at x ∈ X. Therefore f is continuous.
Lemma 1.66. Let f : X → Y and g: Y → Z be continuous maps between metric spaces. Then their compo-
sition gf : X → Z is also a continuous map.
Proof. We have to show that for any open set U ⊂ Z the preimage (gf )−1 (U ) is also open. We have
(gf )−1 (U ) = f −1 (g −1 (U )).
If U ⊂ Z is open, then g −1 (U ) ⊂ Y is open as g is continuous. Therefore f −1 (g −1 (U )) is open as f is
continuous. This implies that (gf )−1 (U ) is open.
11
Definition 1.67. A map f : (X, d) → (Y, d0 ) is called Lipschitz continuous if there exists L ≥ 0 such that
d0 (f (x), f (y)) ≤ L · d(x, y) ∀x, y ∈ X.
The number L above is called a Lipschitz constant for f .
Theorem 1.68. If f : X → Y is Lipschitz continuous, then it is continuous.
Proof. Let L > 0 be a Lipschitz constant for f . For any ε > 0 choose δ = ε/L. Then if d(x, y) < δ =⇒
d0 (f (x), f (y)) ≤ L · d(x, y) < Lδ = ε. Therefore f is continuous.
√
Example 1.69. Consider the function f (t) = t for t ∈ [0, 1]. This function is continuous, although it is
not Lipschitz continuous. Indeed, if f would be Lipschitz continuous with a Lipschitz constant L > 0, then
in particular, √
t − 0 ≤ L · |t − 0| = Lt,
√ √
hence t ≥ L1 for t > 0. But this is obviously wrong for t = 2L 1
.
Lemma 1.70. Let f : [a, b] → R be a function having a continuous derivative. Then f is Lipschitz continuous
with the Lipschitz constant maxt∈[a,b] |f 0 (t)|.
Proof. The function f 0 : [a, b] → R is continuous, hence bounded. Let L = maxt∈[a,b] |f 0 (t)|. By the mean
value theorem, for any a ≤ x < y ≤ b, there exists t ∈ (x, y) such that
f (y) − f (x)
= f 0 (t).
y−x
Therefore
|f (y) − f (x)| = |f 0 (t)| · |x − y| ≤ L · |x − y|
and f is Lipschitz continuous.
Definition 1.71. A map f : (X, d) → (Y, d0 ) between two metric spaces is called an isometry if
d0 (f (x), f (y)) = d(x, y) ∀x, y ∈ X.
An isometry is called a global isometry if it is bijective.
Lemma 1.72. Let f : (X, d) → (Y, d0 ) be an isometry. Then
(1) f is continuous.
(2) f is injective.
(3) If f is a global isometry, then f −1 : Y → X is continuous.
Proof. (1) An isometry is Lipschitz continuous with a Lipschitz constant L = 1. Therefore it is also
continuous.
(2) Assume that x, y ∈ X satisfy f (x) = f (y). Then
d(x, y) = d0 (f (x), f (y)) = 0,
hence x = y. This means that f is injective.
(3) If f is a global isometry then f −1 : Y → X is also an isometry. Therefore by (1) it is continuous.
12
Therefore this sequence converges to some x(t0 ) ∈ R. In this way we obtain a function x: [a, b] → R. We
have to show that (xn ) converges uniformly to x and that x is a continuous function. We can write the
Cauchy condition in the form: ∀ε > 0 ∃N > 0
∀m, n ≥ N ∀t ∈ [a, b]: |xm (t) − xn (t)| < ε.
Taking the limit m → ∞ we obtain
∀n ≥ N ∀t ∈ [a, b]: |x(t) − xn (t)| ≤ ε
hence the sequence of functions (xn ) converges uniformly to x: [a, b] → R. The continuity of x follows from
the next lemma.
Lemma 1.85. Assume that a sequence of functions (xn (t)) in C[a, b] converges uniformly to a function
x: [a, b] → R. Then x ∈ C[a, b].
Proof. Let us show that x: [a, b] → R is continuous at t0 ∈ [a, b]. For any ε > 0 ∃N > 0
ε
|x(t) − xn (t)| < ∀n ≥ N ∀t ∈ [a, b].
3
We will need just n = N . As xn ∈ C[a, b], there exists δ > 0
ε
|xn (t) − xn (t0 )| < ∀t ∈ [t0 − δ, t0 + δ].
3
Then ∀t ∈ [t0 − δ, t0 + δ]:
ε ε ε
|x(t) − x(t0 )| ≤ |x(t) − xn (t)| + |xn (t) − xn (t0 )| + |xn (t0 ) − x(t0 )| < + + = ε.
3 3 3
This implies that x is continuous at t0 . Therefore x is continuous on [a, b].
14
1.7. Completion. Assume that X is a metric space which is not complete. We would like to embed X into
a larger metric space X ⊂ X ∗ such that X ∗ is complete and such that the closure of X is equal to X ∗ .
Definition 1.86. A subset A ⊂ X is called dense in X if A = X.
Example 1.87.
(1) Let X = [0, 1] and A = (0, 1) ⊂ X. Then the closure of A is [0, 1] = X, hence A is dense in X.
(2) The set Q is dense in R. Indeed, any x ∈ R has a decimal representation
x = a0 .a1 a2 . . .
where ai ∈ Z and 0 ≤ ai ≤ 9 for i ≥ 1 (the digits). For any n ≥ 1 the numbers
n
X ai
xn = a0 .a1 . . . an = i
i=0
10
are rational. Moreover, limn→∞ xn = x as
X ai X 9 1
|x − xn | = 0.0 . . . 0an+1 an+2 · · · = i
≤ = n → 0, n → ∞.
10 10i 10
i≥n+1 i≥n+1
This implies that d0 (φ(x∗ ), φ(y ∗ )) = d∗ (x∗ , y ∗ ). We can also show that φ is a bijection.
Existence:
1. Construction of X ∗ . Let X be the set of all Cauchy sequences in X. We call two sequences (xn ),
(yn ) equivalent if limn→∞ d(xn , yn ) = 0. This is an equivalence relation (one can check reflexivity, symmetry,
transitivity). Let X ∗ be the set of equivalence classes.
2. Metric on X ∗ . For any two equivalence classes x∗ = [(xn )] and y ∗ = [(yn )] in X ∗ define
d(x∗ , y ∗ ) = lim d(xn , yn ).
n→∞
(1) The sequence (d(xn , yn ))n≥1 is Cauchy and therefore converges in R.
(2) The limit limn→∞ d(xn , yn ) is independent of the choice of Cauchy sequences in the classes x∗ and
y∗ .
To show that d defines a metric on X ∗ , we have to check the triangle inequality. Let x∗ , y ∗ , z ∗ ∈ X ∗ and let
(xn ), (yn ), (zn ) be their representatives. For any n ≥ 1 we have
d(xn , yn ) ≤ d(xn , zn ) + d(zn , yn ).
Taking the limits, we obtain
d(x∗ , y ∗ ) ≤ d(x∗ , z ∗ ) + d(z ∗ , y ∗ ).
3. Embedding of X into X ∗ . For any x ∈ X, consider the constant sequence xn = x for n ≥ 1. This
is a Cauchy sequence that defines a point in X ∗ . This embedding X ⊂ X ∗ is an isometry.
4. Density of X in X ∗ . Indeed, let x∗ ∈ X ∗ and let (xn ) be its representative. Then (xn ) is a Cauchy
sequence =⇒ ∀ε > 0 ∃N > 0
∀m, n ≥ N : d(xm , xn ) < ε.
Therefore ∀n ≥ N :
d(x∗ , xn ) = lim d(xm , xn ) ≤ ε.
m→∞
This implies that the sequence (xn ) in X converges to x∗ ∈ X ∗ =⇒ X is dense in X ∗ .
5. Completeness of X ∗ . Any Cauchy sequence (xn ) in X converges to x∗ ∈ X ∗ which is an equivalence
class of (xn ). Consider now a Cauchy sequence (x∗n ) in X ∗ . As X is dense in X ∗ , for any n ≥ 1 we can
choose xn ∈ X such that d(xn , x∗n ) < 1/n. The sequence (xn ) is also a Cauchy sequence:
d(xm , xn ) < d(x∗m , x∗ n) + 1/m + 1/n.
Therefore it converges to some x∗ ∈ X ∗ . This implies that (x∗n ) also converges to x∗
d(x∗n , x∗ ) ≤ (x∗n , xn ) + d(xn , x) < 1/n + d(xn , x) → 0, n → ∞.
Example 1.93. We have seen that R is the completion of Q. But we can also apply the above theorem in
order to construct the set R of real numbers as the completion of Q. According to the above construction
R can be identified with the set of equivalence classes of Cauchy sequences Pin Q. One can show that every
n
such class x∗ has a representative Cauchy sequence (xn ) of the form xn = i=0 ai /10i , where ai ∈ Z and
∗
0 ≤ ai ≤ 9 for i ≥ 1. In this way we can represent x using the usual decimal representation a0 .a1 a2 a3 . . . .
Note that two different decimal representations can induce equivalent Cauchy sequences, hence the same
real number. For example,Pnconsider decimal representations 0.999
Pn. . . and 1.000 . . . with the corresponding
Cauchy sequences xn = i=1 9/10n = 1 − 1/10n and yn = 1 + i=1 0 = 1. These sequences are equivalent
as d(xn , yn ) = 1/10n → 0 as n → ∞, hence they represent the same real number.
16
2. Topological spaces
2.1. Definition and examples. Given a metric space (X, d), we proved that the collection of all its open
sets satisfies the following properties: ∅ and X are open sets, unions of open sets are open, finite intersections
of open sets are open. A collection of subsets of X satisfying these conditions is called a topology on X. It
turns out that many properties of the metric space depend only on the underlying topology. For this reason
we will study topologies in more detail.
Definition 2.1. A topological space is a pair (X, τ ), where X is a set and τ is a collection of subsets of
X such that
(1) ∅, X ∈ τ . S
(2) If (Ui )i∈I is a collection of sets in τ then the union i∈I Ui is inTτ .
(3) If (Ui )i∈I is a finite collection of sets in τ then the intersection i∈I Ui is in τ .
The collection τ is called a topology on X. The elements of τ are called open sets.
Example 2.2. Let (X, d) be a metric space and τ be the collection of all open sets in (X, d). Then τ is a
topology on X, called a metric topology generated by metric d.
Definition 2.3. A topological space (X, τ ) is called metrizable if τ is generated by some metric on X.
Example 2.4. Let X be an arbitrary set and τ = {∅, X}. Then τ is a topology, called the trivial topology.
This topology is not metrizable if |X| ≥ 2. Indeed, if τ is generated by some metric d, consider x, y ∈ X and
let r = d(x, y). Then y ∈
/ B(x, r), hence B(x, r) 6= ∅, X. Therefore B(x, r) ∈/ τ , while it should be open. A
contradiction.
Example 2.5. Let X be an arbitrary set and let τ consist of all subsets of X. Then τ is a topology, called
the discrete topology. This topology is actually a metric topology generated by the discrete metric
(
0, x = y,
d(x, y) =
1, x 6= y.
S
Indeed, B(x, 1) = {x} is open with respect to d for all x ∈ X. Then for any A ⊂ X, the set A = x∈A {x}
is also open with respect to d.
Example 2.6. Let X = {a, b}. Then the possible topologies on X are
(1) τ = {∅, X} (trivial topology).
(2) τ = {∅, {a} , X}.
(3) τ = {∅, {b} , X}.
(4) τ = {∅, {a} , {b} , X} (discrete topology).
Only the last topology is metrizable. One can show that a metric on a finite set always generates the discrete
topology.
Definition 2.7. Let (X, τ ) be a topological space.
(1) A subset A ⊂ X is called closed if its complement X\A is open.
(2) Given a subset A ⊂ X, its closure Ā is defined to be the minimal closed set that contains A. It is
the intersection of all closed sets that contain A.
(3) Given a subset A ⊂ X, its interior A◦ is defined to be the maximal closed set contained in A. It is
the union of all closed sets contained in A.
Example 2.8. If X is equipped with the trivial topology, then {x} = X for every x ∈ X. If X is equipped
with the discrete topology, then {x} = {x} for every x ∈ X.
Example 2.9. Let X = {a, b} and τ = {∅, {a} , X}. Then {a} = {a} and {b} = X.
Definition 2.10. A map f : (X, τ ) → (Y, τ 0 ) is called continuous if for every open set U ⊂ Y , the preimage
f −1 (U ) is open in X. A map f is called a homeomorphism if f is bijective and both f, f −1 are continuous.
Example 2.11. Let X be a set with |X| ≥ 2, τdisc be the discrete topology on X and τtriv be the trivial
topology on X. Then the identity map
f : (X, τdisc ) → (X, τtriv ), x 7→ x,
19
is bijective and continuos. On the other hand g = f −1 (X, τtriv ) → (X, τdisc ) is not continuos. Indeed a one-
point set {x} is open in (X, τdisc ), but its preimage g −1 ({x}) = {x} is not open in (X, τtriv ). We conclude
that f is not a homeomorphism, although it is bijective and continuous.
Lemma 2.12. If f : X → Y and g: Y → Z are continuous maps between topological spaces, then gf : X → Z
is also continuous.
Proof. For any open set U ⊂ Z, we have (gf )−1 (U ) = f −1 (g −1 (U )). Note that g −1 (U ) is open in Y , hence
f −1 (g −1 (U )) is open in X.
Example 2.13. Let X be an arbitrary set and let τ consist of ∅ and all U ⊂ X with X\U finite. This is a
topology, called a cofinite topology (or Zariski topology). Let us verify the axioms.
(1) It is clear that ∅ and X are in τ . S
(2) Let (Ui )i∈I be a collection of elements in τ . To show that the union i Ui is in τ , we can assume
that Ui 6= ∅, hence X\Ui is finite for all i ∈ I. Consider the complement
[ \
X\ Ui = (X\Ui ),
i∈I i∈I
where
S we applied De Morgan’s law. This set is finite as an intersection of finite sets. Therefore
i U i ∈ τ. T
T a finite collection of elements in τ and let us show that i Ui ∈ τ . If some set Ui is
(3) Let (Ui )i∈I be
empty, then i Ui = ∅ ∈ τ . Therefore we can assume that Ui 6= ∅, hence X\Ui is finite for all i ∈ I.
Consider the complement \ [
X\ Ui = (X\Ui ),
i∈I i∈I
where we again applied
T De Morgan’s law. This set is a finite union of finite sets, hence is itself finite.
This implies that i∈I Ui ∈ τ .
We conclude that τ is a topology on X. Note that if X is finite, then τ is a discrete topology and in particular
metrizable. If X is infinite, then τ is not metrizable. Indeed, assume that τ is generated by a metric d. Let
x, y ∈ X be two distinct points and let r = 21 d(x, y). Then open sets U = B(x, r) and V = B(y, r) satisfy
U ∩ V = ∅ and therefore
X = X\(U ∩ V ) = (X\U ) ∪ (X\V ).
But U, V are open and non-empty, hence X\U, X\V are finite. This implies that X is finite as a union of
two finite sets. A contradiction.
20
The intersections Ui ∩ Uj0 are open in X and the intersections Vi ∩ Vj0 are open in Y . Therefore the
above union is an element of τ .
This implies that τ satisfies all axioms of a topology.
Theorem 2.18. Let X, Y be two topological spaces and let X × Y be equipped with a product topology. Then
(1) The projection maps
p1 : X × Y → X, (x, y) 7→ x, p2 : X × Y → Y, (x, y) 7→ y
are continuous.
(2) Given continuous maps f : Z → X, g: Z → Y , there exists a unique continuous map h: Z → X × Y
that makes the diagram commute (p1 h = f and p2 h = g).
Z
g
h
f
X ×Y p2 Y
p1
Proof. 1. Let us show that p1 : X ×Y → X is continuous. For any open U ⊂ X its preimage is p−1
1 (U ) = U ×Y
which is an open subset of X × Y .
2. It is clear that h: Z → X × Y should satisfy
h(z) = (f (z), g(z)), z ∈ Z.
21
We just have to show that this map is continuous. An open set in X × Y can be written as ∪i∈I Ui × Vi ,
where Ui are open in X and Vi are open in Y . Its preimage is equal to
[
h−1 (Ui × Vi )
i∈I
−1
and it is enough to show that h (U × V ) is open for any open U ⊂ X and any open V ⊂ Y . We have
−1
h (U × V ) = { z ∈ Z | f (z) ∈ U, g(z) ∈ V } = f −1 (U ) ∩ g −1 (V )
and the last set is open as an intersection of two open sets f −1 (U ) and g −1 (V ). This means that preimages
of open sets in X × Y with respect to h are open in Z. Therefore h: Z → X × Y is continuous.
Example 2.19. Let us show that the following three topologies on R2 coincide
(1) Topology τ2 generated by the Euclidean metric d2 .
(2) Topology τ∞ generated by the supremum metric d∞ .
(3) The product topology τ̄ on R × R = R2 .
Similarly one can show that these topologies are equivalent to the topology τp generated by the metric dp
for every p ≥ 1. To show that two topologies τ, τ 0 on a set X are equal one needs to prove that the identity
map f : (X, τ ) → (X, τ 0 ), x 7→ x, is a homeomorphism. Indeed, continuity of f implies that τ 0 ⊂ τ (if U ∈ τ 0
=⇒ U = f −1 (U ) ∈ τ ) and continuity of f −1 implies that τ ⊂ τ 0 . We conclude that τ = τ 0 . In our case we
consider the identity map f : (R2 , τ2 ) → (R2 , τ∞ ). Then
p
d∞ (f (x), f (y)) = d∞ (x, y) = max |xi − yi | ≤ (x1 − y1 )2 + (x2 − y2 )2 = d2 (x, y).
i=1,2
This implies that f is Lipschitz continuous, hence continuous. The inverse function g = f −1 : (R2 , τ∞ ) →
(R2 , τ2 ) satisfies
p √
d2 (g(x), g(y)) = d2 (x, y) = (x1 − y1 )2 + (x2 − y2 )2 ≤ 2 · d∞ (x, y).
This implies that g is Lipschitz continuous, hence continuous. We conclude that f is a homeomorphism, hence
τ2 = τ∞ . Let us compare now τ∞ and τ̄ . The identity map h: (R2 , τ∞ ) → (R2 , τ̄ ) is continuous by the previous
theorem (take Z = (R2 , τ∞ ) and X = Y = R). We need to show that its inverse g = h−1 : (R2 , τ̄ ) → (R2 , τ∞ )
is also continuous. Every open set in (R2 , τ∞ ) is a union of open balls with respect to the metric d∞ . Such
an open ball has a form
B(x, ε) = y ∈ R d∞ (x, y) = max |xi − yi | < ε = y ∈ R2 yi ∈ (xi − ε, xi + ε) ∀i
2
i=1,2
= (x1 − ε, x1 + ε) × (x2 − ε, x2 + ε) = U1 × U2 ,
where Ui = (xi − ε, xi + ε) is open in R. We conclude that g −1 (B(x, ε)) = B(x, ε) = U1 × U2 . But this set
is open in the product topology τ̄ , hence g is continuous. We conclude that h is a homeomorphism, hence
τ∞ = τ̄ .
Example 2.20. In contrast to the previous example, consider the identity map
Id: (C[0, 1], d∞ ) → (C[0, 1], d1 ).
This map is Lipschitz continuous, hence continuous. On the other hand its inverse is not continuous as there
exists a sequence of functions xn (t) = tn that converges to zero with respect to d1 , but diverges with respect
to d∞ . This implies that the corresponding topologies satisfy τ1 ⊂ τ∞ , but they are not equal.
22
2.3. Hausdorff spaces. To motivate the introduction of Hausdorff topological spaces let us discuss first
convergent sequences in topological spaces. In the case of metric spaces we required open balls for the
definition of convergent sequences. In the case of topological spaces the analogue of open balls are open
neighborhoods of a point.
Definition 2.21. Let (X, τ ) be a topological space and x ∈ X.
(1) An open set U ⊂ X that contains x is called an open neighborhood of x.
(2) A subset V ⊂ X is called a neighborhood of x if there exists an open set U ⊂ X such that
x∈U ⊂V.
Remark 2.22. Note that a subset V ⊂ X is open ⇐⇒ for every point x ∈ V there exists an open
neighborhood U 3 x contained in V . Indeed, if V is open, then we can take U = V for everyS x ∈ V .
Conversely, assume that for every x ∈ V , there exists open Ux 3 x contained in V . Then V = x∈V Ux ,
hence V is open.
Definition 2.23. Let (X, τ ) be a topological space. A sequence (xn ) in X is called convergent to x ∈ X
if for any neighborhoodU 3 x there exists N > 0 such that xn ∈ U for n ≥ N .
Example 2.24. Let (X, τ ) be a topological space with the trivial topology τ = {∅, X} and let x ∈ X. Then
the sequence (xn = x)n≥1 converges to any y ∈ X. Indeed, the only open set that contains y is X and the
elements xn are contained in X for all n ≥ 1.
This example shows that arbitrary topological spaces can have rather undesirable properties. In order to
ensure that sequences have at most one limit, we have to put some restrictions on the topological space.
Definition 2.25. A topological space (X, τ ) is called a Hausdorff space (or a Hausdorff topological space)
if for any x, y ∈ X with x 6= y there exist open sets U 3 x, V 3 y such that U ∩ V = ∅.
Example 2.26. Any metric space (X, d) is Hausdorff. For x 6= y, let r = 21 d(x, y) and
U = B(x, r), V = B(y, r).
Then U, V are open and U ∩ V = ∅.
Example 2.27. Let (X, τ ) have the trivial topology and let |X| ≥ 2. Then X is not Hausdorff. Indeed, let
x, y ∈ X with x 6= y and let U 3 x, V 3 y be open. Then U = V = X, hence U ∩ V 6= 0.
Lemma 2.28. Let (X, τ ) be a Hausdorff topological space. Then any convergent sequence in X has a unique
limit.
Proof. Let (xn ) be a sequence convergent to x, y ∈ X and assume that x 6= y. There exist open sets U 3 x,
V 3 y with U ∩ V = ∅. As xn → x, y for n → ∞, there exist N, N 0 > 0 s.t.
xn ∈ U ∀n ≥ N, xn ∈ V ∀n ≥ N 0 .
Then for any n ≥ max {N, N 0 }: xn ∈ U ∩V =⇒ U ∩V 6= ∅ which contradicts to our assumption. Therefore
x = y.
Lemma 2.29. Let (X, τ ) be a Hausdorff topological space. For any point x ∈ X the set {x} is closed.
Proof. Let U = X\ {x}. By the Hausdorff condition, for any y ∈ U , there exists an open set Vy 3 y such
that x ∈
/ Vy . Therefore Vy ⊂ U . It is clear that U = ∪y∈U Vy and this set is open as a union of open sets.
This means that {x} = X\U is closed.
Example 2.30. Let X be an infinite set with the Zariski topology (closed sets are finite sets and X). Then
every 1-point set {x} is closed. However, X is not Hausdorff. Indeed, let x 6= y and let U 3 x, V 3 y be
open sets such that U ∩ V = ∅. Then
X = X\(U ∩ V ) = (X\U ) ∪ (X\V )
is finite as a union of two finite sets. This contradicts to our assumption that X is infinite.
Exercise 2.31. A topological space X is Hausdorff if and only if the diagonal { (x, x) | x ∈ X} is closed in
X × X.
23
Example 2.50. There exist connected topological spaces that are not path-connected. For example, consider
a subset E ⊂ R2 , called a deleted comb space,
[
1
E= n × [0, 1] ∪ [0, 1] × {0} ∪ {(0, 1)}
n≥1
One can show that the set E 0 = E\ {p}, where p = (0, 1), is path-connected, hence also connected. Its
closure (in E) is equal to E, hence E is also connected. On the other hand there is no path (contained in
E) between p and E 0 , hence E is not path-connected.
26
cubcover.
Theorem 2.55 (Heine-Borel Theorem on R). A closed interval [a, b] is compact.
Proof. Assume that I0 = [a, b] is not compact and let (Ui )i∈S be an infinite cover without a finite subcover.
Divide I0 into two equal subintervals. At least one of them does not have a finite subcover. Denote it by I1 .
Continuing this process we get intervals
I0 ⊃ I1 ⊃ I2 ⊃ . . .
b−a
with In having length 2n . Let xn = min In . Then
x0 ≤ x1 ≤ x2 ≤ . . .
is an increasing sequence in [a, b]. Therefore it converges to some x ∈ [a, b]. For any n ≥ 1 and k ≥ n, we
have xk ∈ Ik ⊂ In . Therefore x = limk→∞ xk ∈ In as In is closed. As (Ui )i∈S is a cover of [a, b], we can
find i ∈ S such that x ∈ Ui . Then there exists ε > 0 such that (x − ε, x + ε) ⊂ Ui . There exists n > 0 such
that b−a
2n < ε and therefore In ⊂ (x − ε, x + ε) ⊂ Ui (In contains x and has length < ε). This implies that
In ⊂ Ui , contradicting to our assumption that In does not have a finite subcover.
Lemma 2.56. Let (X, τ ) be a compact topological space and let A ⊂ X be closed. Then A is compact.
S
Proof. Let (Ui )i∈I be an open cover of A, that is, A ⊂ i∈I Ui and Ui ⊂ X are open. The set U = X\A is
open, so there is an open cover of X [
X= Ui ∪ U.
i∈I
0
S
S compactness of X, there exists a finite subset I ⊂ I such that X = (
By the i∈I 0 Ui ) ∪ U. This implies
A ⊂ i∈I 0 Ui , hence A is compact.
Theorem 2.57. Let X, Y be compact topological spaces. Then X × Y (with product topology) is compact.
Proof. Let (Wi )i∈I be an open cover of X × Y . Refining the cover (Wi ), we can assume that Wi = Ui × Vi ,
where Ui , Vi are open in X, Y respectively. For any x ∈ X, the space {x} × Y is compact and has an
open coverS(Ui × Vi )i∈I . ThereforeT there exists a finite subset Ix ⊂ I such that x ∈ Ui for all i ∈ Ix
and Y = i∈Ix Vi . Let Ux = i∈Ix Ui 3 x. The family (Ux )x∈X covers X, hence has a finite subcover
(Ux1 , . . . , Uxn ). We claim that
U i × V i | i ∈ Ixj , 1 ≤ j ≤ n
S cover of X × Y . Indeed, for any (x, y) ∈ X × Y , we have x ∈ Uxj for some 1 ≤ j ≤ n. We also have
is a finite
Y = i∈Ix Vi , hence y ∈ Vi for some i ∈ Ixj . Then x ∈ Uxj ⊂ Ui , hence (x, y) ∈ Ui × Vi . This proves the
j
above statement, hence the cover of X × Y has a finite subcover and we conclude that X × Y is compact.
27
Example 2.58. Let us show that every closed and bounded subset A ⊂ Rn is compact (part of the Heine-
Borel theorem on Rn to be proved later). As A is bounded, we can embed A ⊂ [−N, N ]n for some N > 0.
We know that the interval [−N, N ] is compact. Therefore the product [−N, N ]n is compact by the previous
theorem. As A ⊂ [−N, N ]n is closed, we conclude that it is also compact.
Proposition 2.59. Let f : X → Y be a continuous map between topological spaces. If X is compact then
f (X) is compact.
Proof. Let (Ui )i∈I be an open cover of f (X), that is, Ui ⊂ Y are open and f (X) ⊂ ∪i∈I Ui . Then
[
X= f −1 (Ui ),
i∈I
−1
where f (Ui ) are open. By the compactness of X there exists a finite subset I 0 ⊂ I such that
[
X= f −1 (Ui ).
i∈I 0
S
This implies f (X) ⊂ Ui , hence we found a finite subcover of f (X). Therefore f (X) is compact.
i∈I 0
Example 2.60. Let us show that the unit circle S 1 = (x, y) ∈ R2 x2 + y 2 = 1 is compact. Consider a
continuous map
f : [0, 2π] → R2 , θ 7→ (cos θ, sin θ).
Then S 1 = f ([0, 2π]). As the closed interval [0, 2π] is compact, we conclude that S 1 = f ([0, 2π]) is also
compact.
Proposition 2.61. Let (X, τ ) be a Hausdorff topological space and let A ⊂ X be a compact subspace. Then
A is closed in X.
Proof. We need to show that X\A is open. Consider S some x ∈ X\A. For every y ∈ A there exist open sets
Uy 3 x, Vy 3 y such that Uy ∩ Vy = ∅. Then A ⊂ y∈A Vy and by the compactness of A there exists a finite
set {y1 , . . . , yn } ⊂ A such that A ⊂ Vy1 ∪ · · · ∪ Vyn . We claim that
U = Uy1 ∩ . . . Uyn
T
is an open neighborhood of x contained in X\A. It is clear that U is open and x ∈ i Uyi = U . Moreover,
we know that Uyi ∩ Vyi = ∅, hence U ∩ Vyi = ∅ and therefore
U ∩ (Vy1 ∪ · · · ∪ Vyn ) = ∅.
But A ⊂ Vy1 ∪ · · · ∪ Vyn , hence U ∩ A = ∅. This implies that U ⊂ X\A and x is an internal point of X\A.
Therefore X\A is open.
Corollary 2.62. Let f : X → Y be a continuous and bijective map between topological spaces. If X is
compact and Y is Hausdorff, then f is a homeomorphism.
Proof. We have to prove that g = f −1 : Y → X is continuous. To do this it is enough to show that for any
closed subset A ⊂ X the set g −1 (A) = f (A) is closed. As X is compact and A ⊂ X is closed, we obtain
that A is also compact. Therefore f (A) ⊂ Y is compact. By the assumption that Y is Hausdorff and the
previous statement, we conclude that f (A) ⊂ Y is closed.
Example 2.63. We have seen previously that the identity map
Id: (C[0, 1], d∞ ) → (C[0, 1], d1 ), f →f
is Lipschitz continuous, hence also continuous. On the other hand we have seen that the inverse map is not
continuous (there is a sequence of functions xn (t) = tn convergent to zero with respect to d1 , but divergent
with respect to d∞ ). This means that the previous statement can not be applied. As any topology generated
by a metric is automatically Hausdorff, we conclude that (C[0, 1], d∞ ) is not compact.
28
Theorem 2.65 (Heine-Borel Theorem on Rn ). A subset of Rn is compact if and only if it is closed and
bounded.
Proof. We have seen that a compact set in a metric space is always closed and bounded. Conversely, assume
that A ⊂ Rn is closed and bounded. Then there exists N > 0 such that A ⊂ [−N, N ]n . We know that
[−N, N ] is compact in R. Therefore [−N, N ]n is compact. Therefore its closed subset A is compact.
Example 2.66. It is not true in general that a closed and bounded set is compact. For example, let X be
an infinite set equipped with the discrete metric. Then X is closed
S and bounded Sas X = B(x, 2) for any
x ∈ X. But X is not compact. For example, the open cover X = x∈X B(x, 1) = x∈X {x} does not have
a finite subcover.
We will prove now some useful characterizations of compact metric spaces.
Definition 2.67. A metric space (X, d) is called sequentially compact if any sequence (xn ) in X has a
convergent subsequence.
Definition 2.68. Let (X, d) be a metric space.
(1) GivenSε > 0, a subset A ⊂ X is called an ε-net of X if ∀x ∈ X ∃y ∈ A: d(x, y) < ε. Equivalently,
X = y∈A B(y, ε).
(2) A metric space X is called totally bounded if for any ε > 0 there exists a finite ε-net of X.
Theorem 2.69. Let (X, d) be a metric space. Then the following conditions are equivalent
(1) (X, d) is compact.
(2) (X, d) is sequentially compact.
(3) (X, d) is complete and totally bounded.
Proof. (1) =⇒ (2). Assume that (X, d) is compact. Let (xn )n≥1 be a sequence in X. Then it contains a
subsequence convergent to y ∈ X if and only if ∀ε > 0 and ∀N > 0
∃n ≥ N : xn ∈ B(y, ε).
Assuming that (xn ) does not contain a convergent subsequence, for any y ∈ X there exist εy > 0 and Ny > 0
such that
xn 6∈ B(y, εy ) ∀n ≥ Ny .
The open balls B(y, εy ) with y ∈ X form a cover of X. By the compactness of X there exists a finite subset
I ⊂ X s.t. [
X= B(y, εy ).
y∈I
S then for any n ≥ max { Ny | y ∈ I} we would have xn 6∈ B(y, εy ) for y ∈ I and therefore xn 6∈
But
y∈I B(y, εy ) = X, a contradiction.
(2) =⇒ (3). Assume that X is sequentially compact. We will prove that it is complete and totally bounded.
Let (xn ) be a Cauchy sequence. We can find a convergent subsequence (xnk )k≥1 with a limit x ∈ X. For
any ε > 0 there exists N > 0 such that d(xm , xn ) < ε/2 for m, n ≥ N . There exists k > 0 such that nk ≥ N
and d(xnk , x) < ε/2. Then for any n ≥ N
d(xn , x) ≤ d(xn , xnk ) + d(xnk , x) < ε/2 + ε/2 = ε.
This implies that the sequence (xn ) converges to x. This proves that X is complete.
29
Assume that X is not totally bounded. Then there exists ε > 0 s.t. for any finite set of points {x1 , . . . , xn }
the balls B(xi , ε) don’t cover X. Then we can choose a sequence (xn )n≥1 in such a way that
xn+1 6∈ B(x1 , ε) ∪ · · · ∪ B(xn , ε), n ≥ 1.
This implies that d(xm , xn ) ≥ ε for m 6= n. This sequence can not contain a convergent subsequence (any
subsequence is not Cauchy).
(3) =⇒ (1). Assume that X is complete and totally bounded and let’s prove that X is compact. Let (Ui )i∈I
be an open cover that does not contain a finite subcover. By our assumption, for any n ≥ 1, there exists a
finite set An ⊂ X such that [
X= B(x, 1/n).
x∈An
At least one of the sets B(x, 1) with x ∈ A1 can not be covered by a finite number of Ui . Let x1 ∈ A1 be
the corresponding point. At least one of the sets
B(x1 , 1) ∩ B(x, 1/2), x ∈ A2
can not be covered by a finite number of Ui . Let x2 ∈ A2 be the corresponding point. Continuing this
process we find a sequence of points xn ∈ An such that
B(x1 , 1) ∩ B(x2 , 1/2) ∩ · · · ∩ B(xn , 1/n)
can not be covered by a finite number of Ui and in particular is non-empty. The sequence (xn ) is Cauchy.
Indeed, for m > n we have B(xm , 1/m) ∩ B(xn , 1/n) 6= ∅, hence
d(xm , xn ) < 1/m + 1/n < 2/n.
As X is complete, the sequence (xn ) converges to some x ∈ X. Choose i ∈ I such that x ∈ Ui . Then
B(x, ε) ⊂ Ui for some ε > 0. There exists n > 0 such that d(xn , x) < ε/2 and n1 < ε/2. Then B(xn , 1/n) ⊂
B(x, ε) ⊂ Ui , hence
B(x1 , 1) ∩ B(x2 , 1/2) ∩ · · · ∩ B(xn , 1/n)
is covered by just one Ui , contradicting to our choice of the sequence (xn ).
Example 2.70. Consider the space C[0, 1] with the supremum metric d∞ . The set A = B̄(0, 1) =
{ f ∈ C[0, 1] | d∞ (f, 0) ≤ 1} is closed and bounded, but it is not compact. By the previous theorem it is
enough to show that A is not sequentially compact. The sequence xn (t) = tn in A converges pointwise to
the function x(t) = δ1 (t) which is 0 for t 6= 1 and 1 for t = 1. This function is not continuous, hence the
sequence (xn ) does not have a convergent subsequence.
We can use Theorem 2.69 to give an alternative proof of the Heine-Borel theorem:
Theorem 2.71 (Heine-Borel theorem, see 2.55). A closed interval [a, b] is compact.
Proof. We know that [a, b] is complete. To show that it is totally bounded, for every ε > 0, choose N > 1/ε
and divide [a, b] into N equal subintervals. The set A of the end-points of these subintervals forms a finite
ε-net of [a, b]. We conclude that [a, b] is totally bounded, hence compact.
Remark 2.72. In the same way we can show that an n-dimensional cube [a, b]n ⊂ Rn is complete and
totally bounded, hence compact. This can be used to prove Theorem 2.65.
We can also use Theorem 2.69 to give an alternative proof of the Bolzano-Weierstrass theorem:
Theorem 2.73 (Bolzano-Weierstrass theorem, see 1.80). A bounded sequence in Rk has a convergent sub-
sequence.
Proof. Let (xn ) be a bounded sequence in Rk . Let A = { xn | n ≥ 1} ⊂ Rk . Then Ā is closed and bounded.
Therefore Ā is compact, hence also sequentially compact. This implies that a sequence (xn ) in Ā has a
convergent subsequence.
30
Proof. Let A = f (X) ⊂ R. Then A is compact, hence closed and bounded. Therefore
M = sup f (x) = sup A < +∞
x∈X
and M ∈ A. This implies that there exists x1 ∈ X with M = f (x1 ). The proof for the infimum is similar.
Remark 2.76. Let us generalize the space of continuous functions C[a, b]. Given a topological space X,
define C(X) to be the set of continuous functions f : X → R. If X is compact, define the supremum metric
d∞ on C(X)
d∞ (f, g) = sup |f (x) − g(x)| .
x∈X
This metric is well-defined as the function h: X → R, x 7→ |f (x) − g(x)| is continuous, hence bounded by
the above result.
Remark 2.77. Even more generally, given topological spaces X, Y , define C(X, Y ) to be the set of con-
tinuous maps f : X → Y . If X is compact and (Y, dY ) is a metric space, define the supremum metric on
C(X, Y )
d∞ (f, g) = sup dY (f (x), g(x)).
x∈X
This metric is well-defined as the function h: X → R, x 7→ dY (f (x), g(x)) is continuous, hence bounded by
the above result. One can show that the generated topology on C(X, Y ) coincides with the compact-open
topology. This is the minimal topology that contains the sets W (K, U ) = { f ∈ C(X, Y ) | f (K) ⊂ U } for all
compact K ⊂ X and open U ⊂ Y .
Example 2.78. If X is finite and has the discrete topology, then C(X) ' Rn , where n = |X|, hence a
subset A ⊂ C(X) is compact if it is closed and bounded (by the Heine-Borel theorem). But this is not true
in general. For example, the subset A = B̄(0, 1) ⊂ C[0, 1] is closed and bounded, but not compact (see
Example 2.70). The next result is a useful characterization of compact subsets of C(X).
Theorem 2.79 (Arzela-Ascoli Theorem). Let X be a compact metric space and C(X) be the space of
continuous maps f : X → R with the supremum metric. A subset A ⊂ C(X) is compact if and only if
(1) A is closed and bounded.
31
where in the last equality we used the fact that [a, b] is compact.
P p
Example 3.7. For any p ≥ 1, let `p be the set of sequences (xn )n≥1 in R such that n≥1 |xn | < ∞. For
1/p
P p
any such sequence, define kxkp = n≥1 |xn | . Define addition and scalar multiplication of sequences
componentwise:
(x + y)n = xn + yn , (λx)n = λ · xn .
It is clear that if x ∈ `p and λ ∈ R, then λx ∈ `p . Let us show that if x, y ∈ `p , then x + y ∈ `p . By the
Minkowski inequality, for any m ≥ 1,
X m 1/p X m 1/p Xm 1/p
p p p
|xn + yn | ≤ |xn | + |yn | ≤ kxkp + kykp .
n=1 n=1 n=1
Taking the limit as m → ∞, we obtain
X 1/p
p
|xn + yn | ≤ kxkp + kykp < ∞,
n≥1
hence x + y ∈ `p . This implies that `p is a vector space. The map k·kp defines a norm on `p .
Similarly, define `∞ to be the set of all sequences (xn )n≥1 in R such that supn≥1 |xn | < ∞ (that is,
bounded sequences). Then `p is a vector space with addition and scalar multiplication defined as before. It
has a norm
kxk∞ = sup |xn | ∀x ∈ `∞ .
n≥1
34
Definition 3.10. Let X, Y be normed vector spaces. Define L(X, Y ) to be the set of all bounded linear
operators from X to Y . The set L(X, Y ) has a structure of a vector space:
(1) Given A, B ∈ L(X, Y ), define A + B ∈ L(X, Y ) by
(A + B)x = Ax + Bx, x ∈ X.
If A is bounded by M > 0 and B is bounded by N > 0, then A + B is also bounded:
k(A + B)xk = kAx + Bxk ≤ kAxk + kBxk ≤ M kxk + M kxk = (M + N ) kxk .
(2) Given λ ∈ K and A ∈ L(X, Y ) define λA ∈ L(X, Y ) by
(λA)x = λ · Ax.
It is clear that if A is bounded, then λA is bounded.
Define L(X) = L(X, X) and define the dual space X ∗ = L(X, R).
Definition 3.11. Define the operator norm kAk of a bounded linear operator A ∈ L(X, Y ) to be the
minimal number M ≥ 0 satisfying kAxk ≤ M kxk for all x ∈ X. That is,
kAk = inf { M ≥ 0 | kAxk ≤ M kxk ∀x ∈ X} .
Lemma 3.12. For any A ∈ L(X, Y ), we have
kAxk
kAk = sup = sup kAxk .
x6=0 kxk kxk=1
Proof. For any x 6= 0, consider the vector y = x/ kxk. Then kyk = 1 and kAxk kAyk
kxk = kyk = kAyk. This implies
the second equality of the lemma.
Let a = supx6=0 kAxk kAxk
kxk . Then kxk ≤ a for all x 6= 0, hence kAxk ≤ a kxk for all x ∈ X. Therefore kAk ≤ a.
kAxk kAxk
Conversely, we have kAxk ≤ kAk · kxk, hence kxk ≤ kAk for all x 6= 0. Therefore a = supx6=0 kxk ≤ kAk.
We conclude that kAk = a.
35
Example 3.13. Consider a linear map A: (Rn , k·k1 ) → (Rn , k·k∞ ) given by a matrix (aij )ni,j=1 . Then
n
X X
kAxk∞ = max aij xj ≤ max max |aij | |xj | = max |aij | · kxk1 .
i i j i,j
j=1 j
This implies that kAk ≤ maxi,j |aij |. To prove the inverse inequality, consider x = ej ∈ Rn for 1 ≤ j ≤ n.
Then
kAxk∞ = k(a1j , a2j , . . . , anj )k∞ = max |aij | .
i
Therefore
kAxk∞
kAk ≥ = max |aij | .
kxk1 i
(2)
kλAxk kAxk
kλAk = sup = |λ| sup = |λ| · kAk .
x6=0 kxk x6=0 kxk
(3) For any x ∈ X
k(A + B)xk = kAx + Bxk ≤ kAxk + kBxk ≤ kAk kxk + kBk kxk .
Therefore kA + Bk ≤ kAk + kBk.
Example 3.15. Let X = (Rn , k−k1 ). We will show that the dual space X ∗ = L(X, R) is given by
(Rn , k−k∞ ). A linear map f : Rn → R can be written in the form
n
X
f (x) = fi xi x ∈ Rn
i=1
for some (f1 , . . . , fn ) ∈ Rn (we will denote this vector also by f ). Then
X X
kf (x)k = fi xi ≤ max |fi | · |x|i = kf k∞ · kxk1 ,
i
hence kf k = supx6=0 kf (x)k / kxk1 ≤ kf k∞ . On the other hand, if kf k∞ = |fi | for some i, then for x = ei
(element of the standard basis) we have kf (x)k / kxk1 = |fi |, hence kf k = supx6=0 kf (x)k / kxk1 = kf k∞ .
Similarly one can show that the dual space of (Rn , k−k∞ ) is given by (Rn , k−k1 ). More generally, for any
p, q ≥ 1 such that p1 + 1q = 1, the dual space of (Rn , k−kp ) is given by (Rn , k−kq ).
Lemma 3.16. Let A ∈ L(X, Y ) and B ∈ L(Y, Z). Then BA is bounded and kBAk ≤ kBk · kAk.
Proof. For any x ∈ X we have
kBAxk ≤ kBk · kAxk ≤ kBk · kAk · kxk .
This implies kBAk ≤ kBk · kAk.
Remark 3.17. The previous result implies that the normed vector space L(X) = L(X, X) has a multipli-
cation given by composition of operators. This means that L(X) is equipped with a structure of an algebra.
Its identity element is the identity operator.
36
The last minimum is attained as the set { x ∈ Rn | kxk1 = 1} is compact and the map (Rn , k·k1 ) → R,
x 7→ kAxk is continuous by the previous argument. This implies that a > 0. We have kxk / kxk1 ≥ a for all
x 6= 0, hence
1
kxk1 ≤ kxk .
a
Therefore the norms are equivalent.
Theorem 3.22. Let A: X → Y be a linear operator between normed vector spaces. If X is finite-dimensional,
then A is continuous.
n
Proof. Choosing a basis (e1 , . . . , en ) of X, we can assumePnthat X = R . As all norms on X are equivalent,
we can assume that the norm on X is given by kxk1 = i=1 |xi |. Then
X
X
X
kAxk =
A xi ei
=
xi Aei
≤ kxi Aei k
X X
= |xi | kAei k ≤ C |xi | = C kxk1 ,
where C = maxi kAei k. This implies that A is bounded.
37
is a Banach space.
R1
Example 3.26. Consider the space C[0, 1] equipped with the norm kxk1 = 0 |x(t)| dt. Let us show that
it is non-Banach. Consider the function f (t) = 0 for t ∈ [0, 21 ] and f (t) = 1 for t ∈ ( 12 , 1]. This is a
non-continuous function which can be interpolated by the continuous functions (for n ≥ 2)
0
t ≤ 12
xn (t) = n(t − 2 ) 12 ≤ t ≤ 12 + n1
1
t ≥ 21 + n1
1
We have kf − xn k1 ≤ 1/n, hence the sequence (xn ) is Cauchy. On the other hand if it converges to some
continuous function g with respect to the norm k−k1 , then kf − gk1 = 0. This would imply that g(t) = 0
for t < 12 and g(t) = 1 for t > 12 , contradicting the continuity of g.
Theorem 3.27. The space `p is Banach.
Proof. Let (x(n) )n≥1 be a Cauchy sequence in `p . Then ∀ε > 0 ∃N > 0:
X p 1/p
(m) (m) (n)
(2) − x(n)
= xi − xi <ε ∀m, n ≥ N.
x
p
i≥1
Theorem 3.28. Let X, Y be two normed vector spaces and assume that Y is a Banach space. Then the
space of bounded linear operators L(X, Y ) is a Banach space.
Proof. Let (An )n≥1 be a Cauchy sequence in L(X, Y ). For any x ∈ X and for any ε > 0 there exists N > 0
s.t.
kAm − An k < ε ∀m, n ≥ N.
Therefore
kAm x − An xk ≤ kAm − An k kxk ≤ ε kxk , m, n ≥ N
and this implies that (An x) is a Cauchy sequence in Y . Therefore it is convergent to some Ax ∈ Y . Let us
show that A: X → Y is linear. For any x, y ∈ X, we have
A(x + y) = lim An (x + y) = lim (An x + An y) = lim An x + lim An y = Ax + Ay.
n→∞ n→∞ n→∞ n→∞
38
Similarly one proves that A(λx) = λAx for any A ∈ R, x ∈ X. Let us show that A is bounded. For any
ε > 0 there exists N > 0 such that
kAm − An k < ε ∀m, n ≥ N.
Therefore
kAm x − An xk ≤ kAm − An k kxk = ε kxk
and
kAx − An xk ≤ ε kxk .
This implies that A − An is bounded and therefore A is bounded. We also obtain that
kA − An k ≤ ε ∀n ≥ N.
This implies that An → A as n → ∞.
∗
Remark 3.29. Given a normed vector space (X, k·k) over R, we defined its dual space to be X = L(X, R).
Its elements are continuous linear maps f : X → R, called linear functionals. Note that X ∗ is a Banach space
(as R is complete), although X can be non-Banach. This implies that X ∗∗ is not necessarily isomorphic to
X (although there is a canonical map X → X ∗∗ ).
Definition 3.30. Let X be a normed vector space and (xn )n≥1 be a sequence in X.
P
(1) The (formal) series n≥1 xn is called convergent to x ∈ X if the sequence of partial sums (sn )n≥1
Pn P
with sn = k=1 xk converges to x. We denote the limit limn→∞ sn by n≥1 xn .
P P
(2) The series n≥1 xn is called absolutely convergent if the series n≥1 kxn k converges in R.
P∞
Theorem
P∞ 3.31. Let X be a Banach space and n=1 xn be an absolutely convergent series in X. Then
n=1 xn is convergent.
hence the sequence (sn )n≥1 in X is Cauchy. As X is complete, we conclude that (sn )n≥1 is convergent.
Theorem 3.32. Let X be a Banach space and A ∈ L(X). Then the series
X Ak
eA =
k!
k≥0
is convergent.
Proof. As X is Banach, the space L(X) = L(X, X) equipped with an operator norm is also Banach. To
prove the theorem, we have to show that the above series is absolutely convergent. We have
X
X 1
k
X 1 k
Ak /k!
=
A
≤ kAk = ekAk ,
k! k!
k≥0 k≥0 k≥0
k
where we used the fact that kABk ≤ kAk · kBk (and therefore
Ak
≤ kAk ). This means that the series
k
A
P
k≥0 k! in L(X) is absolutely convergent and therefore convergent.
Definition 3.33. Let X be a normed vector space and let I ∈ L(X) be the identity operator defined by
I(x) = x for x ∈ X. An operator A ∈ L(X) is called invertible in L(X) if there exists an operator B ∈ L(X)
such that AB = BA = I. The operator B is called the inverse of A and is denoted by A−1 .
39
Theorem 3.34. Let X be a Banach Pspace and let A ∈ L(X) with kAk < 1. Then the operator I − A is
invertible in L(X) and it inverse is k≥0 Ak .
Proof. The normed vector space L(X) is Banach. To show that k≥0 Ak is convergent, we have to show
P
that it is absolutely convergent. We have
X
X
Ak
≤ k 1
kAk = < ∞.
1 − kAk
k≥0 k≥0
Pn
Ak is absolutely convergent and therefore convergent. Let sn = k
P
This means that k≥0 k=0 A and
B = limn→∞ sn = k≥0 Ak . Then
P
Intuitive P
explanation.
P Put particles with mass pi at the points
P (xi , f (xi ))
Pof the plain. Then the center of
gravity ( pi xi , pi f (xi )) will be over the curve, hence pi f (xi ) ≥ f ( pi xi )).
Proof. By induction on n
n+1
! n
!
X X pi
f pi xi = f (1 − pn+1 ) xi + pn+1 xn+1
i=1 i=1
1 − pn+1
n
! n
X pi X
≤ (1 − pn+1 )f xi + pn+1 f (xn+1 ) ≤ pi f (xi ) + pn+1 f (xn+1 ).
i=1
1 − pn+1 i=1
Remark A.9. We can consider the sequence (pi ) as a collection
P of probabilities Pand the sequence x = (xi )
as a random variable. Then we define expectations E[x] = pi xi , E[f (x)] = pi f (xi ) and we can write
Jensen’s inequality in the form
f (E[x]) ≤ E[f (x)].