Ran
Ran
Ran
DAVID RAN
Contents
1. Measure Theory Preliminaries 2
2. Measure-Preserving Transformations 3
3. Recurrence 5
4. Ergodicity 6
5. Proving Ergodicity Using Fourier Series 9
6. Conditional Expectation 11
7. Birkhoff’s Ergodic Theorem 11
8. Consequences of Birkhoff’s Theorem 15
9. Application to Normal Numbers 17
10. Application to Continued Fractions 18
11. Ehrenfests’ Model 20
Acknowledgments 21
References 21
In ergodic theory one focuses on the long term behavior of a flow or an iterated
transformation T and studies properties such as recurrence, equidistribution, mix-
ing, and other behavior. Typically, the transformation T preserves the structure
of the space, such as the measure of a probability space. The field arose in or-
der to better understand problems in statistical physics, where a measure-theoretic
approach to mechanics proved useful when dealing with difficult equations.
Birkhoff’s Ergodic Theorem is one of two fundamental theorems in ergodic the-
ory, the other being Von Neumann’s Ergodic Theorem. We begin by presenting a
simplified version of Birkhoff’s Theorem.
Theorem 0.1 (Birkhoff’s Ergodic Theorem). Let (X, B, µ) be a σ-finite measure
space and let T : (X, B, µ) → (X, B, µ) be a measure-preserving ergodic transforma-
tion. Let A be measurable. Then for µ-a.e. x ∈ X we have:
1 µ(A)
lim card{j | T j (x) ∈ A, 1 ≤ j ≤ n − 1} → .
n→∞ n µ(X)
1
2 DAVID RAN
Given the orbit {T j x}∞j=0 , the theorem states that the time average (LHS) of
the orbit is equal to its space average (RHS). This paper will aim to prove this
result in a more general form and explore some of its applications. The examples
and applications in this paper are taken from [1].
Theorem 1.6 (Radon-Nikodym). Consider the measurable space (X, B) and two
measures on the space, ν and µ. Let ν µ, meaning that for all A ∈ B if µ(A) = 0
then ν(A) = 0. Then there exists a measurable function f : (X, B, µ) → [0, ∞) such
that for all A ∈ B we have Z
ν(A) = f dµ
A
The function f is called the Radon-Nikodym derivative.
BIRKHOFF’S ERGODIC THEOREM 3
2. Measure-Preserving Transformations
Given an invertible transformation T : X → X we often want to study the
doubly infinite time orbit of a point x ∈ X, which we denote ω:
ω = (. . . , T −2 x, T −1 x, x, T x, T 2 x, . . . )
Here we work in discrete time, which is often easier to analyze. We expect the long
term behavior of the discretized system to approximate the continuous time one.
Additionally, the transformation T typically preserves some structure of the space.
Definition 2.1. A map T : (X, B, µ) → (X, B, µ) is measure-preserving if for every
measurable set B ∈ B we have µ(T −1 B) = µ(B)
Now note that if T is both measure-preserving and invertible, then T −1 is also
measure-preserving, because µ((T −1 )−1 A) = µ(T A) = µ(T −1 T A) = µ(A). How-
ever, if T is not invertible it is not necessarily true that µ(A) = µ(T A). Consider
the following example.
Example 2.2 (The Doubling Map). Consider the map T : [0, 1] → [0, 1] where
T (x) = 2x mod 1. Then with respect to Lebesgue measure µ we have µ([0, 21 ]) = 12
while µ(T [0, 21 ]) = µ([0, 1]) = 1. However, if we work with preimages, we have
µ([0, 12 ]) = 21 and µ(T −1 [0, 12 ]) = µ([0, 41 ] ∪ [ 12 , 34 ]) = 12 . Moreover, measure is
preserved for any general interval [a, b]. Because intervals generate the Borel σ-
algebra, measure is preserved for all Borel sets. We will later use the Doubling
Map to prove results in number theory.
Theorem 2.3. Let T be a measurable transformation on the probability space
(X, B, µ). The following are equivalent:
(1) T is measure-preservingR
(2) for all f ∈ L1 we have R f dµ = R f ◦ T dµ
R
3. Recurrence
One property we are interested in is the frequency with which a point from a set
returns to that set. Poincaré’s Theorem is a first step in that direction. We will
prove stronger results later, so one may skip this proof.
∞ S
∞ ∞
T −n E =
T T
Let x ∈ EN . For all N ≥ 0 there exists n ≥ N such that
N =0n=N N =0
∞
T n x ∈ E. Therefore the set
T
EN contains points of X that enter E infinitely
N =0
∞
T
many times. Let F = E ∩ EN . We have
N =0
∞
\ ∞
\
µ(F ) = µ(E ∩ EN ) = µ E ∩ E0 \ (E0 \ EN )
N =0 N =0
∞
\
= µ(E ∩ E0 ) − µ E ∩ (E0 \ EN ) = µ(E ∩ E0 ) = µ(E)
N =0
Thus, almost every point in E returns to E infinitely many times. Moreover,
points in F return to F infinitely many times. Let x ∈ F . Then there exist
0 < n1 < n2 < · · · such that T ni x ∈ E for all i. Then the orbit of T ni x enters E
infinitely many times, because for all k > i we have T nk −ni (T ni (x)) = T nk ∈ E.
Therefore T ni (x) ∈ F , and this is true for all i. Therefore the orbit of x enters F
infinitely many times.
4. Ergodicity
Definition 4.1. A measure preserving map T is ergodic if for all sets B ∈ B such
that T −1 B = B we have µ(B) ∈ {0, 1}.
Ergodicity is a non-decomposability condition. If there exists B ∈ B such that
T −1 B = B and µ(B) ∈ (0, 1) then we can decompose X into two disjoint probability
1 1
spaces (B, B|B , µ(B) µ) and (X \ B, B |X\B , 1−µ(B) µ) that are invariant under T .
Definition 4.2. The symmetric difference of two sets A, B is A4B = (A \ B) ∪
(B \ A). If µ(A4B) = 0 we say A = B almost everywhere (a.e.)
If T is ergodic then the weaker condition B = T −1 B a.e. also implies µ(B) ∈
{0, 1}. We prove this in Corollary 4.5.
Proposition 4.3. If µ(A4B) = 0 then µ(A) = µ(B).
Lemma 4.4. Let T be measure-preserving. If µ(T −1 B4B) = 0 then there exists
B 0 ∈ B such that T −1 B 0 = B 0 and µ(B4B 0 ) = 0.
Proof. Let
∞ [
\ ∞
B0 = T −j B.
n=0j=n
∞
∞ S ∞ S∞ ∞ ∞
Then T −1 B 0 = T −1 T −j B = T −(j+1) B = T −j B = B 0
T T T S
n=0j=n n=0j=n n=0j=n+1
Therefore
n−1
[ n−1
X
µ(T −n B4B) ≤ µ( T −j (T −1 B4B)) ≤ µ(T −j (T −1 B4B))
j=0 j=0
Because T is measure-preserving:
n−1
X
= µ(T −1 B4B) = nµ(T −1 B4B) = 0
j=0
Corollary 4.5. If T is ergodic and µ(T −1 B4B) = 0 then µ(B) ∈ {0, 1}.
Proof. By the lemma, there exists B 0 such that T −1 B 0 = B 0 . Because T is ergodic
µ(B 0 ) ∈ {0, 1}. Because µ(B4B 0 ) = 0 we have µ(B) = µ(B 0 ) ∈ {0, 1}.
Recall Poincaré’s Theorem. Does the orbit of a point ever enter or re-enter a
different set?
Proposition 4.6. Given a set of positive measure A ⊂ X, if T is measure-
preserving and ergodic then the orbit of almost every point of X enters A even-
tually.
∞
T −n A. Then T −1 B ⊂ B and µ(T −1 B) = µ(B)
S
Proof. Consider the set B =
n=0
because T is measure-preserving. Because T is ergodic, µ(B) ∈ {0, 1}. Because
A has positive measure, we have µ(B) = 1. If x ∈ B, there exists k such that
x ∈ T −k A, so T k x ∈ A. Thus, almost every point in X enters A.
We summarize the previous results in the following statement.
Theorem 4.7. If T is a measure-preserving transformation then the following are
equivalent:
(1) T is ergodic
(2) µ(T −1 B4B) = 0 =⇒ µ(B) ∈ {0, 1}
8 DAVID RAN
(3) For all sets A, B with positive measure, there exists n > 0 such that
µ(T −n A ∩ B) > 0.
For each n there exists kn such that A(kn , n) = 1 and A(k, n) = 0 for k 6= kn . This
is true for arbitrary n, so we take the intersection over all n:
∞
T
Let Y = A(kn , n). By Proposition 1.2, we still have µ(Y ) = 1. Moreover, for
n=1
x, y ∈ Y we have |f (x) − f (y)| ≤ 21n for all n. Therefore for all x, y ∈ Y we have
f (x) = f (y). Thus, f is constant on a set of full measure Y ⊂ X.
The above theorem is also true for f ∈ L2 , so we can apply the above theorem
using L2 Fourier Series. In many cases it is easier to change to Fourier series to
BIRKHOFF’S ERGODIC THEOREM 9
show that f ◦ T = f implies f is constant a.e. With or without using Fourier series,
proving ergodicity can often be a difficult task.
Example 5.5 (Doubling Map). Recall the Doubling Map from before, but this
time generalized to the b-adic map for some b ∈ Z, b ≥ 2:
T : R/Z → R/Z : x 7→ bx mod 1.
Showing that T is measure-preserving is straightforward. We can show T is ergodic
by once again applying Proposition 4.8. Let f ∈ L2 ⊂ L1 be such that f ◦ T = f
a.e. By induction, f ◦ T k = f a.e. for all k. This is true because for each additional
composition of T only a null set of points are the exception to the equality. Now
let f have the Fourier series
X∞
cn e2πinx .
n=−∞
Then f ◦ T k has the Fourier series
∞
X k
cn e2πinb x .
n=−∞
The above equality holds because for any 1×k coordinate vectors v, w and any k ×k
matrix A we have h(v T A)T , wi = (v T A)w = v T (Aw) = hv, Awi. Next, observe that
f is A-invariant. We can also check that f is non-constant unless n = 0. This
raises a contradiction with Proposition 4.8.
Now suppose that A has no eigenvalue that is a root of unity. Suppose f ∈ L2
such that f ◦ A = f . Suppose f has the Fourier series
X
cn e2πihn,xi .
n∈Zk
BIRKHOFF’S ERGODIC THEOREM 11
6. Conditional Expectation
We introduce the idea of the conditional expectation of a function f as a nuance
that we will later encounter in Birkhoff’s Theorem. If we know T is measure-
preserving but not necessarily ergodic, we can still use Birkhoff’s Theorem to attain
a result regarding the conditional expectation of f , which is relevant to probability
theory and martingales.
Proposition 6.1. Let (X, B, µ) be a probability space and let A ⊂ B be a σ-
algebra Rof B. Let f ∈ L1 (X, B, µ). We can define a measure ν on A by taking
ν(A) = A f dµ for all A ∈ A. This is a measure and ν µ|A .
Proposition 6.2. By the Radon-Nikodym theorem there exists a unique function
E(f | A) : (X, A)
R → [0, ∞) that is measurable
R with respect to A such that for all
A ∈ A we have A E(f | A)dµ = ν(A) = A f dµ.
Since A ⊂ B, being A-measurable is more restrictive than begin B-measurable.
The function f need not be measurable with respect to A, as opposed to the function
dν
E(f | A). We refer to E(f | A) as the Radon Nikodym derivative dµ| A
or as the
conditional expectation of f with respect to A.
We can consider the expression on the left as a time average. It is the asymptotic
average of the points f (T n x) over time. Meanwhile, the expression on the right is
a space average. If we let f be the indicator function for a set A, we can measure
the long term frequency with which the orbit of x enters A as a proportion of time.
µ(A)
If T is ergodic, this frequency is µ(X) .
Let us proceed to the proof of the theorem, which will be broken into several
steps.
Proposition 7.2. Let T be a measure-preserving transformation. The operator
U : L1 → L1 that maps f 7→ f ◦ T is a positive linear operator with norm ||U || = 1.
(i.e. if f ≥ 0 then U f ≥ 0)
Proof. That U is positive follows from its definition, and ||U || = 1 by Theorem
2.3.
Proof. For each x there exists nx ≤ n such that Fn (x) = fnx (x). Then
n
X
|Fn (x)| = |fnx (x)| ≤ |fj (x)|.
j=1
R Pn
|fj |dµ < ∞, which proves Fn ∈ L1 .
R
Then |Fn |dµ ≤ j=1
BIRKHOFF’S ERGODIC THEOREM 13
These functions are defined for all x, but may be infinite. This proof will have
R ∗ distinct
several steps. We show (1) f ∗ ◦ T = f ∗ , (2) f ∗ = f∗ a.e., (3) f ∗ ∈ L1 , (4)
f dµ = f dµ, (5) f ∗ = E(f | I).
R
Pn−1
(1) We claim f ∗ ◦ T = f ∗ . Indeed, let an (x) = 1
n j=0 f (T j (x)). Then
n+1 1
n an+1 (x) = an (T x) + n f (x). As n → ∞ we have
Thus f ∗ ◦ T = f ∗ . Similarly f∗ ◦ T = f∗ .
We then prove the same inequality again, but this time we replace f, u, v by
−f, −v, −u respectively and note that (−f ∗ ) = −f∗ and (−f )∗ = −f ∗ . We see,
14 DAVID RAN
D−v,−u = {x ∈ X | (−f )∗ < −v, −u < (−f )∗ } = Eu,v . We apply Corollary 7.4 to
(−f ) and find
Z Z
(−f )dµ = (−f )dµ ≥ −uµ(D−v,−u ) = −uµ(E(u, v)).
Eu,v D(−v,−u)
Therefore
Z
(7.6) f dµ ≤ uµ(Eu,v ).
Eu,v
(4) Now we show f ∗ dµ = f dµ if µ(X) < ∞. This is much like part (1).
R R
(5) Now we show f ∗ = E(f | I). Because f ∗ = f ∗ ◦ T , we know that the preim-
age of any set under f ∗ will be invariant under T . Therefore f ∗ is measurable with
respect to I, the
R σ-algebraR of sets invariant under T . Also, if A is a set invariant
under T , then A f ∗ dµ = A f dµ which is true because we can restrict to the space
A and the function T |A : A → A. These two properties uniquely determine the
conditional expectation E(f | I).
We can think of the Birkhoff Theorem not only in terms of orbits, but also in
terms of pre-images of sets, as in the following corollary.
Corollary 8.2. Let (X, B, µ) be a probability space and let T : X → X be a
measure-preserving transformation. Then T is ergodic if and only if for all A, B ∈ B
n−1
1X
lim µ(T −j A ∩ B) = µ(A)µ(B).
n→∞ n
j=0
Then integrate
Z n−1 Z
1X
lim χA (T j x)χB (x)dµ = µ(A)χB (x)dµ = µ(A)µ(B).
n→∞ n
j=0
Pn−1
Observe that χA (T j )χB dµ = µ(T −j A ∩ B) and | n1 j=0 χA (T j x)χB (x)| ≤ 1 for
R
Recall Poincaré’s Theorem. Given a measure-preserving transformation T and a
set A of positive measure, almost every point in A will return to A infinitely many
times. While Poincaré’s Theorem shows that almost every point of A returns to A
under a measure-preserving transformation, it does not give a sense of how long we
would have to wait for a point to return to the set. Kac’s Lemma will give a sense
of how long we can expect to wait. As one might guess, a point from a larger set
on average takes less time to return to that set, whereas return times are longer for
smaller sets.
Define the first return time for a point x as nA (x) = min{n ≥ 1 | T n (x) ∈ A}.
We already know nA is finite for almost every x ∈ A, but Kac’s Lemma will tell us
more.
Theorem 8.3 (Kac’s Lemma). Let T be an ergodic measure-preserving transfor-
mation of the probability space (X, B, µ). Let A be a set of positive measure. Then
Z
nA dµ = 1.
A
Proof. Define the two sets
[
Lj := {T j (x) | x ∈ A, T k (x) 6∈ Y for 0 < k < j} and B := Lj .
j≥0
we have shown T −1 B = B a.e. Because T is ergodic and µ(B) has positive measure,
we see µ(B) = 1. S
Define Ak := {x ∈ A : nA (x) = k}. By Poincaré’s Theorem, Ak = A a.e. It
S j k≥0
is also clear that the sets Ak are disjoint. Also, Lj = T Ak up to a null set.
k≥j
Therefore, by switching the order of the unions and noting T preserves measure,
[ [[ [ [
µ(X) = µ(B) = µ( Lj ) = µ( T j Ak ) = µ( T j Ak )
j≥0 j≥0k≥j k≥0k≥j≥0
∞
X X∞ X∞ ∞
X X∞ Z
= µ(T j Ak ) = µ(Ak ) = kµ(Ak ) = nA dµ.
k≥0 k≥j≥0 k≥0 k≥j≥0 k≥0 A
Therefore Z
nA dµ = 1.
A
1
By Kac’s Lemma, the average return time for a point in A is nA (x) = µ(A) ,
inversely proportional to the size of the set.
As a side note, we can generalize the Birkhoff Theorem from functions in L1 to
functions in Lp with Lp convergence. We present the theorem for reference. More
information can be found in [3].
Theorem 8.4 (Lp Ergodic Theorem). Let T be a measure-preserving transforma-
tion of (X, B, µ) and let f ∈ Lp for some 1 ≤ p < ∞. There exists f ∗ ∈ Lp such
Pn−1
that f ∗ ◦ T = f ∗ a.e. and lim || n1 j=0 f (T j ) − f ∗ ||p = 0
n→∞
Recall that Tb is ergodic. Then by applying the Birkhoff ergodic theorem, for
Lebesgue almost all x ∈ [0, 1] we have
n−1 Z
1X 1
lim χ[ k , k+1 ) ◦ T j = χ[ k , k+1 ) = .
n→∞ n b b b b b
j=0
Thus, over the long term the proportion of digits in the base b expansion that are
k is 1b .
We now generalize the Definition 9.1.
Definition 9.3. A number x is normal in base b if each finite word of length k
occurs in x with the frequency b1k . A word of length k is a block of k adjacent digits
i0 i1 . . . ik−1 .
Proposition 9.4. For b ≥ 2, Lebesgue almost all numbers x ∈ [0, 1] are normal in
base b.
Proof. We generalize the proof from before. The first k digits of x in base b are
i0 i1 . . . ik−1 if and only if
n−1
X ij n−1X ij
1
x∈ , + k =: I.
j=0
bj j=0 bj b
By applying the b-adic map n times, we then see that the digits n through n + k − 1
form the word i0 i1 . . . ik−1 if and only if Tbn x ∈ I. Observe that I has measure b1k
and apply Birkhoff’s Theorem as we did before.
Definition 9.5. A number x is normal if it is normal in base b for all b ≥ 2.
Theorem 9.6. Lebesgue almost all numbers x ∈ [0, 1] are normal.
Proof. From the previous proposition, for all b ≥ 2 there exists a set Xb withT full
measure such that all x ∈ Xb are normal in base b. The countable intersection Xb
b≥2
is the set of normal numbers. By Proposition 1.2 this set has full measure.
1
Proof. By a quick computation we see that xn = k if and only if Tn x ∈ ( k+1 , k1 ].
By the Birkhoff Theorem for Lebesgue almost every x the number k appears in the
continued fraction of x with frequency
n−1
(k + 1)2
Z
1X j 1 1 1
lim χ( k+1
1 1 ◦T = χ 1 1 dν = ν ( , ] = log .
,k] ( k+1 , k ]
n→∞ n k+1 k log 2 k(k + 2)
j=0
BIRKHOFF’S ERGODIC THEOREM 19
For a.e. x we can also make some statements about the arithmetic and geometric
means of the terms in the continued fraction.
Proposition 10.2. For Lebesgue a.e. x ∈ [0, 1] the continued fraction expansion
satisfies
n−1
1X
lim xj = ∞.
n→∞ n
j=0
Proof. For each n we have xn = [ T 1n x ], where [x] denotes the integerR part of
x. Let f be the function f (x) = [ x1 ]. Note that f 6∈ L1 (ν), because f dν =
P∞ 1 1
P∞ 1 (k+1)2
k=0 kµ(( k+1 , k ]) = k=0 k log 2 log k(k+2) which does not converge. (We can
check by L’Hopital’s Rule that the terms in the sum do not converge to 0). How-
ever, we can “chop off” the unbounded part of f by defining a new function fN
that takes values of f when |f | < N and 0 otherwise. Clearly f ≥ fN . We have
fN ∈ L1 so we can apply Birkhoff’s Theorem. For Lebesgue a.e. x ∈ [0, 1]
n−1 n−1 n−1 Z
1X 1X 1X
lim xj = lim f ◦ T j x ≥ lim fN ◦ T j x = fN dν.
n→∞ n n→∞ n n→∞ n
j=0 j=0 j=0
R
This is true for all N . We claim lim fN dν = ∞. Therefore
N →∞
n−1
1X
lim xj = ∞.
n→∞ n
j=0
Proposition 10.3. For Lebesgue a.e. x ∈ [0, 1] the long-term geometric mean is
n−1
Y 1/n Y ∞ log k/ log 2
1
lim xj = 1+ 2 .
n→∞
j=0
k + 2k
k=1
Proof. We want to apply Birkhoff’s Theorem, but we are not dealing with a sum.
To remedy this we work with logarithms. Consider the function f : [0, 1] → R
1
where f (x) = log(k) when x ∈ ( k+1 , k1 ] for all k ∈ N and f (0) = 0. As one might
expect from prior examples, this function maps x to log k if x0 = k. We show
f ∈ L1 (ν).
Z ∞ X ∞
X 1 1 1 1
f dν = log k · ν ( , ] ≤ log k · 2λ ( , ] <∞
k+1 k k+1 k
k=1 k=1
We have
∞ ∞ log k/ log 2
(k + 1)2 (k + 1)2
Z
X log k X
f dν = log = log
log 2 k 2 + 2k k 2 + 2k
k=1 k=1
20 DAVID RAN
Thus
n−1 1/n ∞ log k/ log 2
(k + 1)2
Y X
lim log xj = log
n→∞
j=0
k 2 + 2k
k=1
We then exponentiate both sides. We then interchange the series in the exponent
of the RHS with an infinite product instead, which is allowed because the series
converges. Doing so proves the theorem.
minutes. That is equal to over 2.4 × 102 4 years, much longer than the accepted age
of the universe.
Acknowledgments. I would like to thank my mentor Daniel Campos Salas for
helping me explore different paper topics and for introducing me to ergodic theory.
I am very grateful for his patience and encouragement in guiding me through a
variety of different topics and helping me write this paper.
I would also like to thank Peter May for a wonderful REU experience. I believe
that the program would not have been so interesting or even possible without the
tremendous effort he put in.
References
[1] Charles Walkden. Ergodic Theory. http://www.maths.manchester.ac.uk/ cwalkden/ergodic-
theory/ergodic theory.pdf
[2] Patrick Billingsley. Ergodic Theory and Information. Wiley & Sons. 1965
[3] Peter Walters. Ergodic Theory - Introductory Lectures. Springer-Verlag. 1975.
[4] Henk Bruin. Notes on Ergodic Theory. http://www.mat.univie.ac.at/ bruin/ET2.pdf