Wang F. Foundation of Probability Theory 2024
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
概率论基础
Originally published in Chinese by Beijing Normal University Press (Group) Co., Ltd.
Copyright © Beijing Normal University Press (Group) Co., Ltd., 2010
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance
Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy
is not required from the publisher.
Printed in Singapore
Contents

Preface v
About the Authors ix
Bibliography 189
Index 191
Chapter 1

Class of Sets and Measure
1.1.1 Semi-algebra
We first introduce operations for subsets of the global set Ω. Let ∪ and ∩ denote the union and intersection, respectively, let A^c be the complement of the set A, and let A − B := A ∩ B^c be the difference of A and B, which is called a proper difference if B ⊂ A. For simplicity, we will use AB to stand for A ∩ B, A + B for A ∪ B with A ∩ B = ∅, and Σ_n A_n for the union of finitely or countably many disjoint sets {A_n}. The semi-algebra of sets is then defined in terms of the above-mentioned features of intervals.
Definition 1.1. A class S of subsets of Ω is called a semi-algebra (of sets) in Ω if
(1) Ω, ∅ ∈ S,
(2) A ∩ B ∈ S for A, B ∈ S,
(3) for A₁, A ∈ S with A₁ ⊂ A, there exist n ≥ 1 and mutually disjoint A₁, A₂, . . . , A_n ∈ S such that A = Σ_{i=1}^n A_i.
Property 1.2. Under items (1) and (2) in Definition 1.1, item (3) is equivalent to the following:
(3′) if A ∈ S, then ∃ n ≥ 1 and mutually disjoint A₁, A₂, . . . , A_n ∈ S such that A^c = Σ_{i=1}^n A_i.
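For instance, on Ω = R the left-closed intervals form a semi-algebra; the three items of Definition 1.1 read as follows:

```latex
% S = { [a,b) : -\infty \le a \le b \le \infty } on \Omega = \mathbb{R},
% with the conventions [-\infty,b) = (-\infty,b) and [a,a) = \emptyset.
\begin{align*}
&(1)\quad \Omega = [-\infty,\infty) \in \mathscr{S},\qquad \emptyset = [a,a) \in \mathscr{S};\\
&(2)\quad [a,b) \cap [c,d) = [a \vee c,\; b \wedge d) \in \mathscr{S};\\
&(3)\quad [c,d) \subset [a,b)\ \Longrightarrow\ [a,b) = [a,c) + [c,d) + [d,b).
\end{align*}
```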
1.1.2 Algebra

Definition 1.4. A class F of subsets of Ω is called an algebra (of sets) in Ω if
(1) Ω ∈ F,
(2) A, B ∈ F implies A − B ∈ F.
Property 1.5. Under item (1) in Definition 1.4, item (2) is equivalent to any one of the following:
(2′) A, B ∈ F implies A ∪ B, A^c, B^c ∈ F,
(2″) A, B ∈ F implies A ∩ B, A^c, B^c ∈ F.
1.1.3 σ-algebra

According to the properties of Lebesgue measurable sets, a σ-algebra should be closed under countably many set operations. Since the union and intersection of sets are dual to each other by complement, it suffices to require closedness under complement and countable unions.

Definition 1.9. A class A of subsets of Ω is called a σ-algebra in Ω if
(1) Ω ∈ A,
(2) A^c ∈ A holds for A ∈ A,
(3) ⋃_{n=1}^∞ A_n ∈ A holds for any {A_n}_{n≥1} ⊂ A.
Property 1.11. Under items (1) and (2) in Definition 1.9, (3) is equivalent to
(3′) ⋂_{n=1}^∞ A_n ∈ A for A_n ∈ A, n = 1, 2, . . .

Proof. Note that ⋂_{n=1}^∞ A_n = (⋃_{n=1}^∞ A_n^c)^c.
Proof. Let {A_r : r ∈ Γ} be a family of σ-algebras in Ω. Then A = ⋂_{r∈Γ} A_r is a σ-algebra in Ω as well.

A = 2^Ω := {A : A ⊂ Ω}
the following: the former is closed only under finitely many set operations, while the latter is closed under countably many. Intuitively, countably infinite operations can be characterized as limits of finite ones, so it is reasonable to consider limits of sequences of sets.

Note that the limit of a sequence of sets is defined only in the monotone case: by the union (respectively, intersection) for an increasing (respectively, decreasing) sequence. This leads to the notion of monotone class.

where alg. stands for algebra, mon.cl. for monotone class, and s.-alg. for semi-algebra.
C = {A₁ × · · · × A_n : A_i ∈ A_i, 1 ≤ i ≤ n},  Ω := Ω₁ × · · · × Ω_n.

A₁ × A₂ × · · · × A_n = (A₁ × · · · × A_k) × (A_{k+1} × · · · × A_n).
1.2 Measure
From (3), it follows that Φ(A_n) < ∞ (∀n ≥ 1), which implies Φ(B_k) < ∞ (∀k ≥ 1). Then the desired assertion follows by letting A′_n = A_n ∩ B_n.
Proposition 1.32.
(1) (Finite subadditivity) Let μ be a finitely additive measure on an algebra F. For any A, A₁, . . . , A_n ∈ F with A ⊂ ⋃_{k=1}^n A_k, there holds μ(A) ≤ Σ_{k=1}^n μ(A_k).
(2) (σ-subadditivity) Let μ be a measure on an algebra F. If A ∈ F and {A_n}_{n≥1} ⊂ F are such that A ⊂ ⋃_{n=1}^∞ A_n, then μ(A) ≤ Σ_{n=1}^∞ μ(A_n).
Let (b) hold, and let {A_n}_{n≥1} and {B_n}_{n≥1} be as above. We have F ∋ A − B_n ↓ ∅. By the continuity at ∅ and the subtractive property, we obtain 0 = lim_{n→∞} Φ(A − B_n) = Φ(A) − lim_{n→∞} Φ(B_n). Thus, Φ(A) = Σ_{k=1}^∞ Φ(A_k).
= Σ_{ℓ=1}^n (−1)^{ℓ−1} Σ_{1≤i₁<i₂<···<i_ℓ≤n} P(A_{i₁} · · · A_{i_ℓ}).
Σ_{i=1}^n μ(B_i) = Σ_{i=1}^n Σ_{j=1}^n μ(B_i ∩ B_j) = Σ_{j=1}^n Σ_{i=1}^n μ(B_i ∩ B_j) = Σ_{j=1}^n μ(B_j) = μ(A).

(2) Obvious.
(3) For any ε > 0 and n ≥ 1, take A_{n1}, A_{n2}, . . . ∈ S such that ⋃_{i=1}^∞ A_{ni} ⊃ A_n and μ*(A_n) ≥ Σ_{i=1}^∞ μ(A_{ni}) − ε/2^n. Thus, ⋃_{n=1}^∞ ⋃_{i=1}^∞ A_{ni} ⊃ ⋃_{n=1}^∞ A_n, and by the definition of μ*,

μ*(⋃_{n=1}^∞ A_n) ≤ Σ_{n=1}^∞ Σ_{i=1}^∞ μ(A_{ni}) ≤ Σ_{n=1}^∞ (μ*(A_n) + ε/2^n) = Σ_{n=1}^∞ μ*(A_n) + ε.
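On a finite Ω the infimum over covers defining μ* can be computed by brute force, which makes the monotonicity and subadditivity just proved directly checkable. The following sketch is illustrative only; the class S and the set function μ below are an arbitrary toy choice, not an example from the text.

```python
from itertools import chain, combinations

def outer_measure(A, S, mu):
    """Brute-force mu*(A): the infimum of sum(mu(B)) over all
    subfamilies of S covering A (finite Omega, finite S)."""
    best = float("inf")
    for r in range(1, len(S) + 1):
        for fam in combinations(S, r):
            if A <= frozenset(chain.from_iterable(fam)):   # fam covers A
                best = min(best, sum(mu[B] for B in fam))
    return best

# a toy covering class on Omega = {0, 1, 2, 3}
Omega = frozenset({0, 1, 2, 3})
S = [frozenset(), Omega, frozenset({0, 1}), frozenset({2, 3}),
     frozenset({0}), frozenset({1}), frozenset({2}), frozenset({3})]
mu = {B: len(B) for B in S}                    # counting measure on S

A1, A2 = frozenset({0, 2}), frozenset({1})
# subadditivity and monotonicity of mu*
assert outer_measure(A1 | A2, S, mu) <= outer_measure(A1, S, mu) + outer_measure(A2, S, mu)
assert outer_measure(A1, S, mu) <= outer_measure(Omega, S, mu)
```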
If μ* were a measure on 2^Ω, then the restriction μ*|_{σ(S)} would be the desired extended measure. However, this is in general not true: the Lebesgue outer measure is already a counterexample. So, we need to find a class A* of "regular" sets such that A* ⊃ σ(S) and μ* is σ-additive on A*. The intuition for selecting a "regular" set is that it does not lead to any loss of outer measure when used to cut other sets. In this spirit, we introduce the notion of μ*-measurable set as follows.
Definition 1.45. A set A ⊂ Ω is called μ*-measurable if

μ*(D) = μ*(D ∩ A) + μ*(D ∩ A^c), ∀D ⊂ Ω.

Let A* = {A ⊂ Ω : A is μ*-measurable}. We shall prove that A* is a σ-algebra including S and that μ* is a measure on A*. For this, we first study the properties of μ* and A*. The following is a consequence of Property 1.44(3).

Property 1.46. A is μ*-measurable if and only if

μ*(D) ≥ μ*(D ∩ A) + μ*(D ∩ A^c), ∀D ⊂ Ω.
Property 1.47. A ∗ ⊃ S .
Proof. Let A ∈ S and D ⊂ Ω. For any ε > 0, take {A_n} ⊂ S such that ⋃_{n=1}^∞ A_n ⊃ D and μ*(D) ≥ Σ_{n=1}^∞ μ(A_n) − ε. Then by the σ-subadditivity of μ* and the finite additivity of μ on F(S), it follows that

μ*(A^c ∩ D) + μ*(A ∩ D) ≤ Σ_{n=1}^∞ [μ(A_n ∩ A) + μ(A_n ∩ A^c)] = Σ_{n=1}^∞ μ(A_n) ≤ μ*(D) + ε.
Theorem 1.48.
(1) A* is a σ-algebra, so that A* ⊃ σ(S).
(2) If {A_n} ⊂ A* are mutually disjoint and A = Σ_{n=1}^∞ A_n, then ∀D ⊂ Ω,

μ*(D ∩ A) = Σ_{n=1}^∞ μ*(D ∩ A_n).

(3) μ* is a measure on A*.

Letting n → ∞, we derive

μ*(D) ≥ μ*(D ∩ A^c) + Σ_{i=1}^∞ μ*(D ∩ (A_{i−1} − A_i)) ≥ μ*(D ∩ A^c) + μ*(D ∩ A).

Thus, A ∈ A*. Therefore, A* is a σ-algebra by the monotone class theorem.
(2) Let A = Σ_{n=1}^∞ A_n with A_n ∈ A* mutually disjoint. Then A ∈ A*. By Property 1.44(2), it suffices to prove μ*(D ∩ A) ≥ Σ_{n=1}^∞ μ*(D ∩ A_n). Replacing D by A ∩ D and A_n by ⋃_{i=1}^n A_i in (1.1), we obtain μ*(D ∩ A) ≥ Σ_{i=1}^n μ*(D ∩ A_i). Then the proof is finished by letting n ↑ ∞.
(3) The σ-additivity of μ∗ on A ∗ is obtained by letting D = Ω in (2).
Proof of Theorem 1.42. Since A* ⊃ σ(S), the restriction of μ* to σ(S) is obviously a measure, and μ*(A) = μ(A) for A ∈ S. Thus, there exists an extension of μ to σ(S).

Now, let μ be σ-finite on S. By Property 1.31(4), there exist mutually disjoint {A_n} ⊂ S such that Ω = Σ_{n=1}^∞ A_n and μ(A_n) < ∞, n ≥ 1. If both μ₁ and μ₂ are measures on σ(S) extended from μ, it suffices to prove μ₁(A ∩ A_n) = μ₂(A ∩ A_n) for A ∈ σ(S) and n ≥ 1. For this, let M_n = {A : A ∈ σ(S), μ₁(A ∩ A_n) = μ₂(A ∩ A_n)}. Then M_n ⊃ S. By the unique extension of μ to F(S), we have M_n ⊃ F(S); thus, by the monotone class theorem, it is sufficient to show that M_n is a monotone class, which follows from the continuity of measures.
Ā = {A ∪ N : A ∈ A, N is a μ-null set}
1.4 Exercises
C := {A1 × · · · × An : Ai ∈ Ai }
is a semi-algebra in Ω := Ω1 × · · · × Ωn .
9. Prove Theorem 1.27.
10. Exemplify that an additive measure on a class of sets may not
be finitely additive.
11. Exemplify that the σ-algebra generated by a semi-algebra S cannot be expressed as

σ(S) = {Σ_{n=1}^∞ A_n : ∀n ≥ 1, A_n ∈ S}.
Prove A ∗ ⊃ A∗ .
23. Let (Ω, A , μ) be a measure space. Prove that N ⊂ Ω is μ-null if
and only if μ∗ (N ) = 0.
24. For a measure space (Ω, A, μ), let A_i, B_i ⊂ Ω satisfy μ*(A_i Δ B_i) = 0, i ≥ 1. Prove that

μ*(⋃_{i=1}^∞ A_i) = μ*(⋃_{i=1}^∞ B_i).
25. Let C = {Ca,b = [−b, −a) ∪ (a, b] : 0 < a < b} and define
μ(Ca,b ) = b − a. Prove that μ can be extended to a measure
on σ(C ). Ask whether [1, 2] is μ∗ -measurable.
26. Let f : [0, ∞) → [0, ∞) be strictly increasing and strictly convex with f(0) = 0. ∀A ⊂ (0, 1], define μ*(A) = f(λ*(A)), where λ* is the Lebesgue outer measure. Prove that μ* satisfies μ*(∅) = 0, nonnegativity, monotonicity, and σ-subadditivity.
27. Let (Ω, A, P) be a probability space, and let Ω ⊃ A ∉ A. Prove that P can be extended to a probability measure on A₁ := σ(A ∪ {A}).
28. Let f : R ∋ x ↦ x/3 ∈ R and A₀ = [0, 1]. Prove that A_{n+1} := f(A_n) ∪ (2/3 + f(A_n)) (n ≥ 0) is decreasing in n ≥ 0, where f(A_n) := {f(x) : x ∈ A_n}. The limit of A_n is denoted by C, which is called the Cantor set. Prove that the Lebesgue measure of C is 0.
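The iteration in Exercise 28 is easy to run numerically on lists of interval endpoints; each step scales the measure by a factor 2/3, which is the heart of the requested proof. A small sketch (illustrative only, not part of the exercises):

```python
def step(intervals):
    """One Cantor iteration A -> A/3 ∪ (2/3 + A/3) on a list of (a, b) pairs."""
    scaled = [(a / 3, b / 3) for (a, b) in intervals]
    return scaled + [(a + 2 / 3, b + 2 / 3) for (a, b) in scaled]

def length(intervals):
    """Total Lebesgue measure of a disjoint union of intervals."""
    return sum(b - a for (a, b) in intervals)

A = [(0.0, 1.0)]                      # A_0 = [0, 1]
lengths = []
for n in range(10):
    A = step(A)
    lengths.append(length(A))

# |A_1| = 2/3 and |A_{n+1}| = (2/3)|A_n|, so |A_n| -> 0
assert abs(lengths[0] - 2 / 3) < 1e-12
assert all(abs(lengths[k + 1] - lengths[k] * 2 / 3) < 1e-9 for k in range(9))
```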
Chapter 2

Random Variable and Measurable Function
Property 2.4. Let (E, E ) be a measurable space. Then for any map
f : Ω → E, f −1 (E ) is the smallest σ-algebra in Ω such that f is
measurable.
Property 2.5. Let C be a class of subsets of E and let f : Ω → E
be a map. Then σ(f ) := f −1 (σ(C )) = σ(f −1 (C )).
Proof. Since f −1 (σ(C )) is a σ-algebra, including f −1 (C ), we have
f −1 (σ(C )) ⊃ σ(f −1 (C )). So, it suffices to prove
A := {C ⊂ E : f −1 (C) ∈ σ(f −1 (C ))} ⊃ σ(C ).
In fact, we have (1) A ⊃ C; (2) f^{-1}(E) = Ω ∈ σ(f^{-1}(C)) ⇒ E ∈ A; (3) C ∈ A ⇒ f^{-1}(C^c) = (f^{-1}(C))^c ∈ σ(f^{-1}(C)) ⇒ C^c ∈ A; (4) {C_n}_{n≥1} ⊂ A ⇒ f^{-1}(⋃_{n=1}^∞ C_n) = ⋃_{n=1}^∞ f^{-1}(C_n) ∈ σ(f^{-1}(C)) ⇒ ⋃_{n=1}^∞ C_n ∈ A. Thus, A is a σ-algebra including C; hence A ⊃ σ(C).
Theorem 2.6.
(1) f is a real measurable function on (Ω, A) if and only if {f < x} ∈ A for every x ∈ R.
(2) f = (f₁, . . . , f_n) is an n-dimensional measurable function on (Ω, A) if and only if f_k is a real measurable function on (Ω, A) for 1 ≤ k ≤ n.

Proof. (1) The necessity is obvious. To prove the sufficiency, let S = {[−∞, x) : x ∈ R}. Then σ(S) = B̄, so by Property 2.5, we have f^{-1}(B̄) = f^{-1}(σ(S)) = σ(f^{-1}(S)) ⊂ σ(A) = A. Thus, f is a measurable function on (Ω, A).
(2) Let f be measurable. Then for any 1 ≤ k ≤ n and A_k ∈ B̄, we have {f_k ∈ A_k} = {f ∈ R̄ × · · · × A_k × · · · × R̄} ∈ A. So, f_k is measurable for any 1 ≤ k ≤ n. On the other hand, let f_k be measurable for every 1 ≤ k ≤ n. To prove the measurability of f, we take S = {{f_k < r} : 1 ≤ k ≤ n, r ∈ R}. Since
1_A(ω) = 1 if ω ∈ A, and 1_A(ω) = 0 otherwise.

(2) Let {A_k}_{1≤k≤n} be a finite measurable partition of Ω, i.e. mutually disjoint sets in A such that Ω = Σ_{k=1}^n A_k. Then for any a₁, . . . , a_n ∈ R, f := Σ_{k=1}^n a_k 1_{A_k} is called a simple function.
(3) If we take n = ∞ in (2) above, then f is called an elementary function.
Property 2.11.
(1) 1A is a measurable function on (Ω, A ) if and only if A ∈ A .
(2) Simple functions and elementary functions are all measurable
functions on (Ω, A ).
Proof. Let f = Σ_{k=1}^n a_k 1_{A_k} be a simple (or elementary if n = ∞) function. Then ∀B ∈ B̄, we have f^{-1}(B) = Σ_{k: a_k ∈ B} A_k ∈ A.
Theorem 2.12.
(1) A measurable function is the pointwise limit of a sequence of
simple functions.
(2) A measurable function is the uniform limit of a sequence of ele-
mentary functions.
(3) A bounded measurable function is the uniform limit of a sequence
of simple functions.
(4) A nonnegative measurable function is the (uniform) limit of a
sequence of increasing simple (elementary) functions.
Then the f_n are simple functions and |f_n − f|1_{{−n≤f<n}} < 1/2^n; when f = ∞, f_n = n; when f = −∞, f_n = −n. Thus, the sequence {f_n}_{n≥1} converges pointwise to f.

(2) For any n ∈ N, let

f_n = Σ_{k=−∞}^∞ (k/2^n) 1_{{k/2^n ≤ f < (k+1)/2^n}} + ∞·1_{{f=∞}} − ∞·1_{{f=−∞}}.

Then {f_n}_{n≥1} are elementary functions such that for any n ≥ 1,

|f_n − f|1_{{|f|<∞}} < 1/2^n;  f_n = f when |f| = ∞.

Thus, f_n converges uniformly to f as n → ∞.
(3) If f is bounded, then by (1), the sequence {f_n}_{n≥1} of simple functions converges uniformly to f.
(4) If f is nonnegative, then the sequences {f_n}_{n≥1} constructed in (1) and (2) are increasing.
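The dyadic construction in (2) is short enough to test directly: on finite values it is f_n = ⌊2^n f⌋/2^n, the error is below 1/2^n uniformly, and on nonnegative values the approximants increase, as claimed in (4). A quick check on a few sample values (the values themselves are arbitrary):

```python
import math

def dyadic_approx(y, n):
    """f_n(ω) = floor(2^n · f(ω)) / 2^n, the elementary approximation of the proof."""
    return math.floor((2 ** n) * y) / (2 ** n)

samples = [0.0, 0.3, 1.7, -2.25, 10.01]     # sample finite values of f
for n in range(1, 12):
    # uniform error bound |f_n - f| < 1/2^n on {|f| < ∞}
    assert max(abs(dyadic_approx(y, n) - y) for y in samples) < 2 ** -n

# monotone increase on nonnegative values (item (4))
for y in [0.0, 0.3, 1.7, 10.01]:
    approxs = [dyadic_approx(y, n) for n in range(1, 12)]
    assert all(a <= b for a, b in zip(approxs, approxs[1:]))
```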
Let f be a real function on Ω. Define f⁺ = max{f, 0} and f⁻ = max{−f, 0}, which are called the positive and negative parts of f, respectively. Then f = f⁺ − f⁻, |f| = f⁺ + f⁻, f⁺ = (|f| + f)/2, and f⁻ = (|f| − f)/2.
Theorem 2.13. The positive part and the negative part of a mea-
surable function are measurable. So, any measurable function can be
expressed as the difference of two nonnegative measurable functions.
(f1 , . . . , fn )(Ω) ⊂ D.
(1) 1 ∈ L;
(2) L is closed under linear combinations;
(3) for any nonnegative and increasing sequence, {fn }n1 ⊂ L, such
that fn ↑ f , if either f is bounded or f ∈ L , then f ∈ L.
(1) 1Ω = 1E ◦ f ∈ L.
(2) ∀g1 ◦ f, g2 ◦ f ∈ L and a1 , a2 ∈ R such that a1 (g1 ◦ f ) + a2 (g2 ◦ f )
makes sense, we have
a1 g1 ◦ f + a2 g2 ◦ f = [(a1 g1 + a2 g2 )1A ] ◦ f,
We have

Σ_{k=1}^n μ_F(A_k) = μ_F(Σ_{k=1}^n A_k) ≤ μ_F(A).

Letting n → ∞, we obtain Σ_{k=1}^∞ μ_F(A_k) ≤ μ_F(A).
It remains to prove that Σ_{k=1}^∞ μ_F(A_k) ≥ μ_F(A). By an approximation argument, we may assume that A is a finite interval; i.e. we first use A^{(N)} = A ∩ [−N, N)^n and A_k^{(N)} = A_k ∩ [−N, N)^n in place of A and A_k for N ∈ N, and then let N ↑ ∞.

Now, let A = [a, b) and A_k = [a^{(k)}, b^{(k)}) with a, b, a^{(k)}, b^{(k)} ∈ R^n such that a ≤ b, a^{(k)} ≤ b^{(k)} (k ≥ 1) and Σ_{k=1}^∞ A_k = A. By the left continuity of F, ∀ε > 0, ∃δ > 0 such that

μ_F([a, b − δ)) ≥ μ_F([a, b)) − ε,

where δ = (δ, . . . , δ). Moreover, for each k ≥ 1, there exists δ^{(k)} > 0 such that

μ_F([a^{(k)} − δ^{(k)}, b^{(k)})) ≤ μ_F(A_k) + ε/2^k.
Since

[a, b − δ] ⊂ [a, b) = ⋃_{k=1}^∞ [a^{(k)}, b^{(k)}) ⊂ ⋃_{k=1}^∞ (a^{(k)} − δ^{(k)}, b^{(k)}),

and [a, b − δ] is compact, there exists N ≥ 1 such that [a, b − δ] ⊂ ⋃_{k=1}^N (a^{(k)} − δ^{(k)}, b^{(k)}). Thus,

μ_F([a, b)) ≤ μ_F([a, b − δ)) + ε ≤ ε + Σ_{k=1}^N μ_F([a^{(k)} − δ^{(k)}, b^{(k)})) ≤ 2ε + Σ_{k=1}^∞ μ_F(A_k).

Letting ε ↓ 0, we obtain Σ_{k=1}^∞ μ_F(A_k) ≥ μ_F(A).
So far, we have proved that μ_F is a σ-finite measure on the semi-algebra C. The proof is then finished by the measure extension theorem (Theorem 1.42).

It is clear that the L–S measure induced by a distribution function is finite on compact sets. Such a measure is called a Radon measure. Indeed, the converse of Theorem 2.27 also holds, i.e. a Radon measure must be the L–S measure induced by a distribution function; see Exercise 6 at the end of this chapter.
Proof of Theorem 2.25. Let μ_F be the measure induced by F on B^n. By (c) and (d), μ_F is a probability measure. On the probability space (Ω, A, P) = (R^n, B^n, μ_F), define the random variable ξ(x) := x. Then ξ is an n-dimensional random variable such that P(ξ < x) := μ_F((−∞, x)) = F(x).

P(ξ^{(t₁)} < x^{(t₁)}, . . . , ξ^{(t_l)} < x^{(t_l)}) = ∏_{i=1}^l P(ξ^{(t_i)} < x^{(t_i)}).
Property 2.31.
(1) {ξ^{(t)} : t ∈ T} are independent if and only if {ξ^{(t)} : t ∈ T′} are independent for every finite T′ ⊂ T.
(2) Let T = Σ_{r∈I} T_r with the T_r mutually disjoint and |T_r| < ∞. Set ξ̄^{(r)} = (ξ^{(t)} : t ∈ T_r). If {ξ^{(t)} : t ∈ T} are independent, then {ξ̄^{(r)} : r ∈ I} are independent as well.
Corollary 2.34. {ξ^{(k)} : 1 ≤ k ≤ n} are independent if and only if the distribution function of (ξ^{(1)}, . . . , ξ^{(n)}) can be expressed as

F(x^{(1)}, . . . , x^{(n)}) = ∏_{k=1}^n F_k(x^{(k)})

By letting x^{(k)} → ∞, we derive ∏_{i=1}^n F_i(∞) = 1, and hence, the distribution function of ξ^{(k)} is given by F_k(x^{(k)})/F_k(∞). Thus,

F(x^{(1)}, . . . , x^{(n)}) = ∏_{k=1}^n F_k(x^{(k)}) = ∏_{k=1}^n F_k(x^{(k)})/F_k(∞),
g(f_n^{(1)}, . . . , f_n^{(m)}) →^{a.e.} g(f^{(1)}, . . . , f^{(m)}).

(2) f_n − f_m →^{a.e.} 0 if and only if ∀ε > 0,

μ(⋂_{n=1}^∞ ⋃_{v=1}^∞ {|f_{n+v} − f_n| ≥ ε}) = 0.

In particular, when μ is finite, f_n − f_m →^{a.e.} 0 if and only if ∀ε > 0,

μ(⋃_{v=1}^∞ {|f_{n+v} − f_n| ≥ ε}) → 0 (n → ∞).

sup_{v≥1} μ(|f_{n+v} − f_n| ≥ ε) → 0, n → ∞,
Clearly, if f_n →^μ f, then f is finite a.e. The following properties are obvious.

Property 2.40.
(1) If f_n →^μ f, then any subsequence f_{n_k} →^μ f.
(2) If f_n →^μ f and f_n →^μ f′, then f = f′ a.e.
(3) If f_n →^μ f and g_n = f_n a.e., g = f a.e., then g_n →^μ g.
Proof. Let

D_N = {x ∈ R^n : |x| ≤ N, d(x, D^c) ≥ 1/N},  d(x, ∅) := ∞.
Let k ↑ ∞ to derive that f_{n_k} →^{a.e.} f by Theorem 2.38(1).

+ μ(|f_{n_k} − f_k| ≥ ε) ≤ 2^{1−k} + μ(|f_{n_k} − f_k| ≥ ε).

Hence, f_n →^μ f.
(3) Let μ be a finite measure and f_n →^{a.e.} f. Then

μ(|f_n − f| ≥ ε) ≤ μ(⋃_{m=n}^∞ {|f_m − f| ≥ ε}).

Combining this with f_n →^{a.e.} f and the upper continuity of the measure, we obtain

lim_{n→∞} μ(|f_n − f| ≥ ε) ≤ μ(⋂_{n=1}^∞ ⋃_{m=n}^∞ {|f_m − f| ≥ ε}) = 0.
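The converse fails: convergence in measure does not imply a.e. convergence, which is why Theorem 2.44 only extracts an a.e.-convergent subsequence. The classical "typewriter" sequence on [0, 1) with Lebesgue measure illustrates this; the indicator 1_{[j/2^k,(j+1)/2^k)} sweeps the interval in ever shorter blocks. A sketch (standard example, not from the text):

```python
from fractions import Fraction

def typewriter(n):
    """Interval [j/2^k, (j+1)/2^k) carried by the n-th typewriter indicator
    (n >= 1): the k-th sweep consists of 2^k consecutive dyadic blocks."""
    k, start = 0, 1
    while start + 2 ** k <= n:
        start += 2 ** k
        k += 1
    j = n - start
    return Fraction(j, 2 ** k), Fraction(j + 1, 2 ** k)

# mu({f_n >= eps}) equals the block length, which -> 0: convergence in measure
a, b = typewriter(100)
assert b - a == Fraction(1, 64)

# yet every point is hit once per sweep, so f_n(x) = 1 infinitely often:
# the sequence converges to 0 nowhere on [0, 1)
x = Fraction(1, 3)
hits = [n for n in range(1, 1000) if typewriter(n)[0] <= x < typewriter(n)[1]]
assert len(hits) == 10          # sweeps k = 0, 1, ..., 9 each hit x once
```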
Corollary 2.48. Let a be a constant. Then ξ_n →^P a if and only if ξ_n →^d a.

Proof. We only need to prove the sufficiency. For simplicity, we only consider the one-dimensional case. Since the distribution function of the random variable ξ ≡ a is F(x) = 1_{(a,∞)}(x), both a − ε and a + ε are continuity points of F for any ε > 0. By ξ_n →^d a, it follows that for any ε > 0, P(|ξ_n − a| > ε) = P(ξ_n < a − ε) + P(ξ_n > a + ε) → 0 (n → ∞).

Similarly, it is easy to check the following two assertions.
Theorem 2.49. If ξ_n − ξ′_n →^P 0 and ξ′_n →^d ξ, then ξ_n →^d ξ.

Theorem 2.50. If ξ_n →^d ξ and η_n →^d a, where a is a constant, then ξ_n + η_n →^d ξ + a.
2.5 Exercises
(b) When ξ_n →^P ξ, does it hold that S_n →^P ξ?
20. (Ω, A, P) is called a purely atomic probability space if Ω has a partition {A_n}_{n≥1} such that A = σ({A_n : 1 ≤ n < ∞}); each A_n (≠ ∅) is called an atom. Prove that for a sequence of random variables on a purely atomic probability space, convergence in probability is equivalent to a.s. convergence.
21. (Egorov's theorem) Let (Ω, A, μ) be a finite measure space, and let f_n, f be finite measurable functions such that f_n →^{a.e.} f. Then ∀ε > 0, ∃N ∈ A with μ(N) ≤ ε such that f_n converges uniformly to f on N^c.
22. For any sequence of random variables {ξ_n}, there exists a sequence of positive numbers {a_n} such that a_n ξ_n →^P 0.
23. Exemplify that Theorem 2.41 may fail when g is only a contin-
uous function.
24. Prove Theorems 2.49 and 2.50.
25. Let F_n and F be the distribution functions of ξ_n and ξ, respectively. If ξ_n →^d ξ, then P(ξ_n ≤ x) → P(ξ ≤ x) and P(ξ_n > x) → P(ξ > x) for every continuity point x of F.
Chapter 3

Integral and Expectation
f = Σ_{k=1}^n a_k 1_{A_k},

we call ∫_Ω f dμ := Σ_{k=1}^n a_k μ(A_k) the integral of f with respect to μ. For any A ∈ A, we call ∫_A f dμ = ∫_Ω f 1_A dμ the integral of f on A with respect to μ.

Clearly, the value of ∫_Ω f dμ is independent of the expression of the simple function f and is hence well defined. As the integral is the measurement result of f under μ, we also denote μ(f) = ∫_Ω f dμ. The following properties are obvious.
g_n := Σ_{j=1}^m (a_j − ε) 1_{A_j ∩ {|f_n − f| ≤ ε}} + N·1_{{f=∞, |f_n| ≥ N}}, n ≥ 1.

= Σ_{j=1}^m (a_j − ε) μ(A_j) + N·μ(f = ∞).
Proposition 3.7. If f = g a.e. and their integrals exist, then μ(f) = μ(g).

Proof. Let g = c·1_{A∩{f≥c}}. Then g ≤ 1_A f and ∫_Ω g dμ ≤ ∫_A f dμ, so

c·μ({f ≥ c} ∩ A) = μ(g) ≤ ∫_A f dμ.
(2) Replacing fn by −fn in the above proof, (2) follows from (1)
immediately.
Combining this with Theorem 3.8, (1)(a) and (2)(d), we finish the
proof.
μ
Proof. By Theorem 3.10(3), we only need to prove for fn − → f . By
Theorem 3.8(2)(b), it suffices to show that lim Ω |fn − f | dμ = 0.
n→∞
If this does not hold, then there exist nk ↑ ∞ and ε > 0 such that
μ
Ω |fnk −f | dμ ε, ∀k 1. Since fnk −
→ f , there exists a subsequence
a.e.
fnk −−→ f , so that by Theorem 3.10(3), we derive lim Ω |fnk −
n→∞
f | dμ = 0, which is a contradiction.
Proof. Let g_n = Σ_{k=1}^n f_k. If the f_n are nonnegative, then g_n ↑ Σ_{n=1}^∞ f_n, so the assertion follows from the monotone convergence theorem.

Assume Σ_{n=1}^∞ ∫_Ω |f_n| dμ < ∞. Let g = Σ_{n=1}^∞ |f_n| and ḡ_n = Σ_{k=1}^n |f_k|. Then 0 ≤ ḡ_n ↑ g. It follows from the monotone convergence theorem that

Σ_{n=1}^∞ ∫_Ω |f_n| dμ = lim_{n→∞} ∫_Ω ḡ_n dμ = ∫_Ω g dμ = ∫_Ω Σ_{n=1}^∞ |f_n| dμ,

so g is integrable and |g_n| ≤ g. Since Σ_{n=1}^∞ ∫_Ω |f_n| dμ < ∞, g is a.e. finite. Hence, g_n →^{a.e.} Σ_{n=1}^∞ f_n. Then the assertion follows from the dominated convergence theorem.
Corollary 3.13. If μ(f) exists, then for A ∈ A and {A_n}_{n=1}^∞ ⊂ A mutually disjoint such that A = Σ_{n=1}^∞ A_n, we have ∫_A f dμ = Σ_{n=1}^∞ ∫_{A_n} f dμ.

Proof. As f^± 1_A = Σ_{n=1}^∞ f^± 1_{A_n}, we have ∫_A f^± dμ = Σ_{n=1}^∞ ∫_{A_n} f^± dμ. Since the integral of f exists, at least one of these two series is finite, so we can subtract term by term, which gives the assertion.
The following result provides the definition of product measures.

Corollary 3.14. Let (Ω_i, A_i, μ_i), 1 ≤ i ≤ n, be finitely many σ-finite measure spaces. Then there exists a unique σ-finite measure μ on the product measurable space (Ω₁ × · · · × Ω_n, A₁ × · · · × A_n) such that

μ(A₁ × · · · × A_n) = μ₁(A₁) · · · μ_n(A_n), A_i ∈ A_i, 1 ≤ i ≤ n. (3.1)

The measure μ is called the product measure of μ_i, 1 ≤ i ≤ n, and is denoted by μ₁ × · · · × μ_n.

Proof. It is easy to see that

C := {A₁ × · · · × A_n : A_i ∈ A_i, 1 ≤ i ≤ n}

is a semi-algebra in Ω₁ × · · · × Ω_n. By Theorem 1.42, it suffices to show that μ defined by (3.1) is a σ-finite measure on C. Since each μ_i is σ-finite, so is μ. It remains to prove the σ-additivity of μ. By induction, we only prove the case n = 2.

Let {A × B, A_i × B_i : i ≥ 1} ⊂ C be such that A × B = Σ_{i=1}^∞ A_i × B_i. Then for any ω₂ ∈ Ω₂, we have

1_A · 1_B(ω₂) = 1_{A×B}(·, ω₂) = Σ_{i=1}^∞ 1_{A_i} · 1_{B_i}(ω₂).
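On finite spaces, the defining property (3.1) and the additivity over a disjoint decomposition of a rectangle can be verified by direct summation. A small sketch (the two weight tables below are an arbitrary toy choice, not taken from the text):

```python
from itertools import product

# two finite measure spaces: mu1 on {0, 1}, mu2 on {0, 1, 2}
mu1 = {0: 0.5, 1: 1.5}
mu2 = {0: 1.0, 1: 2.0, 2: 0.25}

def prod_measure(C):
    """(mu1 x mu2)(C) for an arbitrary subset C of the product space."""
    return sum(mu1[x] * mu2[y] for (x, y) in C)

A, B = {1}, {0, 2}
rect = set(product(A, B))
# defining property (3.1): mu(A x B) = mu1(A) * mu2(B)
lhs = prod_measure(rect)
rhs = sum(mu1[x] for x in A) * sum(mu2[y] for y in B)
assert abs(lhs - rhs) < 1e-12

# additivity over a disjoint decomposition of the rectangle
pieces = [set(product({1}, {0})), set(product({1}, {2}))]
assert abs(lhs - sum(prod_measure(p) for p in pieces)) < 1e-12
```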
Proof. By applying the mean value theorem of differentiation, we have |f_t − f_{t₀}|/|t − t₀| ≤ g, t ∈ (a, b), for ∀t₀ ∈ (a, b). The assertion follows from Corollary 3.18 immediately.
3.3 Expectation

This integral is called the expectation of ξ, denoted by Eξ = ∫_Ω ξ dP. If E|ξ| < ∞, we say that ξ has finite expectation.

As in elementary probability theory, by using the expectation, we define the characteristic function and the numerical characteristics of random variables as follows.
Definition 3.24.
(1) Let ξ = (ξ₁, . . . , ξ_n) be an n-dimensional random variable. We call

R^n ∋ (t₁, . . . , t_n) ↦ ϕ_ξ(t₁, . . . , t_n) := E e^{i⟨t,ξ⟩}

the characteristic function of ξ, where i := √−1 and ⟨t, ξ⟩ := Σ_{j=1}^n t_j ξ_j.
(2) Let ξ be a random variable such that Eξ exists. Then Dξ := E|ξ − Eξ|² is called the variance of ξ.
(3) Let ξ be a random variable and r > 0. E|ξ|^r is called the rth moment of ξ. When Eξ exists, E|ξ − Eξ|^r is called the rth central moment of ξ.
(4) Let ξ and η be two random variables such that Eξ and Eη are finite, and

b_{ξ,η} = E(ξ − Eξ)(η − Eη)

exists. Then b_{ξ,η} is called the covariance of ξ and η. If DξDη ≠ 0 and is finite, then r_{ξ,η} = b_{ξ,η}/√(DξDη) is called the correlation coefficient of ξ and η.
(5) Let ξ = (ξ₁, . . . , ξ_n) be an n-dimensional random variable such that Eξ = (Eξ₁, . . . , Eξ_n) and (b_{ij} = b_{ξ_i,ξ_j})_{1≤i,j≤n} exist. Then B(ξ) = (b_{ij})_{1≤i,j≤n} is called the covariance matrix of ξ. If (r_{ij} = r_{ξ_i,ξ_j})_{1≤i,j≤n} exist, then R(ξ) = (r_{ij})_{1≤i,j≤n} is called the correlation matrix of ξ.
ξη = Σ_{i=1}^n Σ_{j=1}^m a_i b_j 1_{A_i ∩ B_j},  Eξη = Σ_{i=1}^n Σ_{j=1}^m a_i b_j P(A_i)P(B_j) = EξEη.
Proposition 3.26.
(1) Random variables ξ1 , . . . , ξn are mutually independent if and only
if
(2) By step (1), the linearity of the integral, and the monotone convergence theorem, we derive the formula first for simple f, then for nonnegative measurable f, and finally for measurable f such that μ(g ∘ f) exists.

In the literature, the L–S measure μ induced by a distribution function F is also denoted by dμ = dF, and the associated integral is called the L–S integral.
Definition 3.28. Let μ be the Lebesgue–Stieltjes (L–S, in short) measure on (R^n, B^n) induced by a distribution function F. Let f be a measurable function on (R^n, B^n) such that μ(f) exists. Then the integral of f with respect to μ is called an L–S integral, denoted by

μ(f) = ∫_{R^n} f dμ = ∫_{R^n} f dF.

η := g(ξ) is

(P ∘ η^{-1})(A) = ∫_{g^{-1}(A)} dF, A ∈ B^m.
we obtain

Eξ = ∫_R x (P ∘ ξ^{-1})(dx) = Σ_{i=1}^∞ a_i p_i.
3.4 L^r-space

a = |f|^r/μ(|f|^r),  b = |g|^s/μ(|g|^s),  α = 1/r,  β = 1/s,

|fg|/(‖f‖_r ‖g‖_s) ≤ |f|^r/(rμ(|f|^r)) + |g|^s/(sμ(|g|^s)),

where the equality holds if and only if |f|^r/μ(|f|^r) = |g|^s/μ(|g|^s). Combining this with Theorem 3.8, we may take integrals with respect to μ on both sides to derive (3.3), with equality if and only if |f|^r/μ(|f|^r) = |g|^s/μ(|g|^s) holds μ-a.e., which implies (3.4) for c₁ = 1/μ(|f|^r) and c₂ = −1/μ(|g|^s). Finally, it is clear that (3.4) implies the equality in (3.3).
Corollary 3.35 (Jensen's inequality). ∀r > 1, E|ξ| ≤ (E|ξ|^r)^{1/r}, and the equality holds if and only if |ξ| is a.s. constant.
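Corollary 3.35 is easy to spot-check for discrete distributions; the three-point distribution below is an arbitrary choice for illustration, and the last lines verify the equality case for a constant |ξ|:

```python
# a discrete random variable: values with probabilities (toy choice)
vals = [0.5, 2.0, 3.0]
probs = [0.2, 0.5, 0.3]

def moment(r):
    """E|xi|^r for the discrete distribution above."""
    return sum(p * abs(v) ** r for v, p in zip(vals, probs))

E_abs = moment(1)
for r in [1.5, 2, 3, 7]:
    # E|xi| <= (E|xi|^r)^(1/r), with a small float tolerance
    assert E_abs <= moment(r) ** (1 / r) + 1e-12

# equality when |xi| is constant: |xi| = 2 gives E|xi| = (E|xi|^2)^(1/2) = 2
const_gap = 2.0 - (sum(p * 2.0 ** 2 for p in probs)) ** 0.5
assert abs(const_gap) < 1e-12
```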
∀a₁, . . . , a_n ∈ R, |a₁ + · · · + a_n|^r ≤ n^{(r−1)⁺} (|a₁|^r + · · · + |a_n|^r).

equality holds if and only if the |a_i| are constant, i.e. |a_i| = |a_j|, ∀i, j. But |Σ_{i=1}^n a_i| = Σ_{i=1}^n |a_i| if and only if the a_i have the same sign, so a_i = a_j, ∀i, j.

(2) Case r ≤ 1. We may assume the a_i are not all null. Note that

|a_k| / Σ_{i=1}^n |a_i| ≤ |a_k|^r / (Σ_{i=1}^n |a_i|)^r, r ≤ 1.

μ(|f₁ + · · · + f_n|^r) ≤ n^{(r−1)⁺} Σ_{i=1}^n μ(|f_i|^r),

(μ|f + g|^r)^{1/r} ≤ (μ|f|^r)^{1/r} + (μ|g|^r)^{1/r},

(1) when r > 1, there exist c₁, c₂ ∈ R, not both null and having the same sign, such that c₁f − c₂g = 0, a.e.;
(2) when r = 1, f and g have the same sign, a.e.
Proposition 3.40.
(1) Let μ be a finite measure. If f_n →^{L^r(μ)} f, then f_n →^{L^{r′}(μ)} f for r′ ∈ (0, r).
(2) If f_n →^{L^r(μ)} f, then μ(|f_n|^r) → μ(|f|^r).

Proof. (1) and (2) follow from Hölder's inequality and the triangle inequality of d_r, respectively.

(1) f_n →^{L^r(μ)} f.
(2) f_n →^μ f and {|f_n − f|^r}_{n≥1} is uniformly continuous in integral.
(3) f_n →^μ f and {|f_n|^r}_{n≥1} is uniformly continuous in integral.
(4) f_n →^μ f and {|f_n|^r}_{n≥1} is uniformly integrable.
We have

sup_{n≥1} ∫_A |f_n − f|^r dμ ≤ ε + Σ_{n=1}^{n_ε} μ(1_A |f_n − f|^r).

As for each fixed n, lim_{μ(A)→0} μ(1_A |f_n − f|^r) = 0, the right-hand side tends to ε as μ(A) → 0. Since ε is arbitrary, we have f_n →^{L^r(μ)} f.
(b) Again, by Theorem 2.44, f_n →^μ f implies that there exists a subsequence {f_{n_k}} such that f_{n_k} →^{a.e.} f. Thus, by Fatou's lemma, ∀A ∈ A, μ(f1_A) ≤ lim inf_{k→∞} μ(f_{n_k}1_A) ≤ sup_n μ(f_n 1_A).
Proof. Take {A_n} such that ϕ(A_n) ↓ inf_{A∈A} ϕ(A). Since inf_{A∈A} ϕ(A) ≤ 0, we may assume that the ϕ(A_n) are finite. Let A = ⋃_{n=1}^∞ A_n.

B_n = ⋃_{1≤i₁,i₂,...,i_n≤2 : ϕ(A_{1,i₁}∩A_{2,i₂}∩···∩A_{n,i_n})≤0} A_{1,i₁} ∩ A_{2,i₂} ∩ · · · ∩ A_{n,i_n}.
Clearly, Φ is not empty and α ∈ [0, ϕ(Ω)]. Take {f_n}_{n≥1} ⊂ Φ such that α_n := μ(f_n) ↑ α ≤ ϕ(Ω) < ∞. Set g_n = sup_{k≤n} f_k. Then 0 ≤ g_n ↑ f := sup_{k≥1} f_k. For given n ≥ 1, put A_k = {ω : g_n(ω) = f_k(ω)} (1 ≤ k ≤ n). Then ⋃_{k=1}^n A_k = Ω. Moreover, let B_k = A_k − ⋃_{i=1}^{k−1} A_i. Then the {B_k} are mutually disjoint and Σ_{k=1}^n B_k = Ω. Thus, ∀A ∈ A,

∫_A g_n dμ = Σ_{k=1}^n ∫_{A∩B_k} f_k dμ ≤ Σ_{k=1}^n ϕ(B_k ∩ A) = ϕ(A).

Hence, ∫_A f dμ ≤ ϕ(A). By this and the definition of α, it follows that μ(f) = α.
Let

ϕ_c(A) = ∫_A f dμ,  ϕ_s(A) = ϕ(A) − ∫_A f dμ.

ϕ_n(D_n ∩ A) ≥ 0, ϕ_n(D_n^c ∩ A) ≤ 0, ∀A ∈ A.

Let D = ⋂_{n=1}^∞ D_n. Then ∀n,

D ⊂ D_n,  ϕ_s(D ∩ A) ≤ (1/n) μ(D ∩ A),

hence μ(D_n^c) = 0.
(ii) Let μ and ϕ be σ-finite measures. There exist mutually disjoint {A_n}_{n≥1} such that Σ_{n=1}^∞ A_n = Ω and μ(A_n), ϕ(A_n) < ∞ (∀n). From (i), it follows that for each n there exist corresponding ϕ_c^{(n)} and ϕ_s^{(n)}.

Let N_n be a μ-null set such that ϕ_s^{(n)}(N_n^c ∩ A ∩ A_n) = 0 for any A ∈ A. Set

f = Σ_{n=1}^∞ 1_{A_n} f^{(n)},  ϕ_c(A) = ∫_A f dμ,  ϕ_s(A) = Σ_{n=1}^∞ ϕ_s^{(n)}(A_n ∩ A).

Again, let N = ⋃_{n=1}^∞ N_n. Then ∀A ∈ A, we have ϕ_s^{(n)}(N^c ∩ A ∩ A_n) ≤ ϕ_s^{(n)}(N_n^c ∩ A ∩ A_n) = 0. It follows that ϕ_s(N^c ∩ A) = Σ_n ϕ_s^{(n)}(N^c ∩ A ∩ A_n) = 0. Hence, ϕ_s and μ are singular.
3.6 Exercises
13. Let A₁, . . . , A_n be events and A = ⋃_{i=1}^n A_i. Prove
(a) 1_A ≤ Σ_{i=1}^n 1_{A_i}.
(b) P(A) ≥ Σ_{i=1}^n P(A_i) − Σ_{i<j} P(A_i ∩ A_j).
(c) P(A) ≤ Σ_{i=1}^n P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + Σ_{i<j<k} P(A_i ∩ A_j ∩ A_k).
Ef (ξ)g(η) = Ef (ξ)Eg(η).
18. (a) If events A₁, A₂, . . . satisfy Σ_{n=1}^∞ P(A_n) < ∞, then

P(⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k) = 0.

(b) If events A₁, A₂, . . . are independent and Σ_{n=1}^∞ P(A_n) = ∞, then

P(⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k) = 1.

19. Let p_n ∈ [0, 1). Apply the previous exercise to prove that ∏_{n=1}^∞ (1 − p_n) = 0 if and only if Σ_{n=1}^∞ p_n = ∞.
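The dichotomy in Exercise 19 is visible numerically: for p_n = 1/(n+1) the partial products telescope to 1/(N+1) and collapse, while for p_n = 1/(n+1)² they telescope to (N+2)/(2(N+1)) and stay bounded away from 0. A sketch of this check (the two choices of p_n are illustrative):

```python
def partial_products(p, N):
    """Partial products prod_{n=1}^{N} (1 - p(n))."""
    prods, acc = [], 1.0
    for n in range(1, N + 1):
        acc *= 1.0 - p(n)
        prods.append(acc)
    return prods

N = 10_000
# sum p_n = infinity: the product tends to 0 (here exactly 1/(N+1))
div = partial_products(lambda n: 1.0 / (n + 1), N)
assert abs(div[-1] - 1 / (N + 1)) < 1e-8

# sum p_n < infinity: the product stays positive (here (N+2)/(2(N+1)) -> 1/2)
conv = partial_products(lambda n: 1.0 / (n + 1) ** 2, N)
assert abs(conv[-1] - (N + 2) / (2 * (N + 1))) < 1e-8
```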
26. Let r ∈ (0, ∞) and let (Ω, A , μ) be a measure space. Prove that
the class of integrable simple functions is dense in Lr (μ).
27. Let 1/p + 1/q = 1, p, q > 1. Prove ‖f‖_p = sup{μ(fg) : ‖g‖_q ≤ 1}.
28. If a sequence of random variables {ξn }n1 is uniformly bounded,
then ξn converges in probability if and only if it converges in
Lr (P), where r ∈ (0, ∞).
29. For a measurable function f, define the essential supremum by

‖f‖_∞ = inf{M : μ({ω : |f(ω)| > M}) = 0}.

(a) Prove that ‖·‖_∞ satisfies the triangle inequality.
(b) If μ(Ω) < ∞, then ‖f‖_∞ = lim_{r→∞} ‖f‖_r.
and

P(ξ ∈ A, (ξ, η) ∈ B) = ∫_A P((x, η) ∈ B) μ(dx).

{x ∈ Ω : μ({x}) > 0}

is at most countable.
Chapter 4

Product Measure Space
∫_{A_{ω₁}} f(ω₁, ω₂) μ₂(dω₂) is A₁-measurable in ω₁ and has an integral with respect to μ₁.

Proof. Let

f_{ω₁}^{-1}(B) = {ω₂ ∈ Ω₂ : f_{ω₁}(ω₂) ∈ B} = {ω₂ ∈ Ω₂ : (ω₁, ω₂) ∈ f^{-1}(B)} = (f^{-1}(B))_{ω₁},

The functions f_{ω₁} and f^{ω₂} are called the section functions of f at ω₁ and ω₂, respectively.

Combining this with the linearity of the integral and step (2), we finish the proof.
where

T_N ⊂ T,  Ω_{T_N^c} = ∏_{t∉T_N} Ω_t,  A_{T_N} ∈ A^{T_N}.

Hence, (⊗_{t∈T_N} P_t)(A_{T_N}) = (⊗_{t∈T} P_t)(A_{T_N} × Ω_{T_N^c}).

Let P_T = ⊗_{t∈T} P_t be the product measure on (Ω_T, A_T); then P_{T\T′} is a finitely additive function on A^{T\T′}, defined by (4.2) with T \ T′ as its total index set. Since P_{T_N} := ⊗_{t∈T_N} P_t is a probability measure, it follows from the definition of P and the finite additivity of measures that Σ_{k=1}^n P(A_k) = P(A₀).
(3) Since A^T is a set algebra and P is finitely additive, to get the σ-additivity of P, we only need to prove that it is continuous at ∅. We use proof by contradiction.

Let {A_n}_{n≥1} ⊂ A^T be decreasing and suppose ∃ε > 0 such that P(A_n) ≥ ε for every n ≥ 1. We now prove ⋂_{n=1}^∞ A_n ≠ ∅. Note that for any n ≥ 1, there exist a finite set T_n ⊂ T and A_n^{T_n} ∈ ⊗_{t∈T_n} A_t such that A_n = A_n^{T_n} × ∏_{t∉T_n} Ω_t.

Let T_∞ = ⋃_{n=1}^∞ T_n. Then T_∞ is countable, denoted by T_∞ = {t₁, t₂, . . .}. To prove ⋂_{n=1}^∞ A_n ≠ ∅, we only need to prove that there exists (ω̄_{t₁}, . . . , ω̄_{t_n}, . . .) ∈ ∏_{t∈T_∞} Ω_t such that ⋂_{j=1}^∞ A_j(ω̄_{t₁}, . . . , ω̄_{t_n}, . . .) ≠ ∅, where A_j(ω̄_{t₁}, . . . , ω̄_{t_n}, . . .) is the section of A_j at (ω̄_{t₁}, . . . , ω̄_{t_n}, . . .).
First, we set B₁^{(j)} = {ω_{t₁} ∈ Ω_{t₁} : P^{{t₁}}(A_j(ω_{t₁})) ≥ ε/2}. Since

P^{{t₁}}(A_j(ω_{t₁})) = (⊗_{t∈T_j\{t₁}} P_t)(A_j^{T_j}(ω_{t₁}))

is A_{t₁}-measurable, B₁^{(j)} ∈ A_{t₁}. Fubini's theorem gives

ε ≤ P(A_j) = ∫_{Ω_{t₁}} P^{{t₁}}(A_j(ω_{t₁})) dP_{t₁} ≤ P_{t₁}(B₁^{(j)}) + ε/2,

which implies that P_{t₁}(B₁^{(j)}) ≥ ε/2. Since {B₁^{(j)}}_{j=1}^∞ is decreasing, we have P_{t₁}(⋂_{j=1}^∞ B₁^{(j)}) ≥ ε/2, so ∃ω̄_{t₁} ∈ ⋂_{j=1}^∞ B₁^{(j)}, that is, P^{{t₁}}(A_j(ω̄_{t₁})) ≥ ε/2 for every j ≥ 1.
In general, assume that for some k ≥ 1 we have (ω̄_{t₁}, . . . , ω̄_{t_k}) ∈ Ω_{t₁} × · · · × Ω_{t_k} such that

P^{{t₁,...,t_k}}(A_j(ω̄_{t₁}, . . . , ω̄_{t_k})) ≥ ε/2^k, ∀j ≥ 1.

Let

B_{k+1}^{(j)} = {ω_{t_{k+1}} ∈ Ω_{t_{k+1}} : P^{{t₁,...,t_{k+1}}}(A_j(ω̄_{t₁}, . . . , ω̄_{t_k}, ω_{t_{k+1}})) ≥ ε/2^{k+1}}.
Then (ξ_t(ω) := ω_t)_{t∈T} are random variables on (Ω, A, P), and

Ω = ∏_{i=1}^k Ω_i^{(k)},  A = ⊗_{i=1}^k A_i^{(k)},  k = 1, . . . , n.
4.4 Exercises
(a) (μ₁ × μ₂)(A) = 0.
(b) μ₁(A_{ω₂}) = 0, μ₂-a.e.
(c) μ₂(A_{ω₁}) = 0, μ₁-a.e.

6. If an infinite matrix P = (p_{ij})_{i,j∈N} satisfies p_{ij} ≥ 0 and Σ_{j∈N} p_{ij} = 1, ∀i ∈ N, then P is called a transition probability matrix. Let λ(i, A) = Σ_{j∈A} p_{ij}, i ∈ N, A ⊂ N. Prove that λ is a transition probability on N × 2^N.
7. Let μ be the counting measure on N, i.e. μ({i}) = 1 for any i ∈ N. Let

f(i, j) = i if i = j;  f(i, j) = −i if j = i + 1;  f(i, j) = 0 for other i, j.

Prove

∫_N (∫_N f(ω₁, ω₂) μ(dω₂)) μ(dω₁) = 0, but ∫_N (∫_N f(ω₁, ω₂) μ(dω₁)) μ(dω₂) = ∞.
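The two iterated sums in Exercise 7 can be computed directly on a truncation of N × N: every row sums to 0, while every column sums to 1, so the second iterated integral diverges (Fubini fails here because ∫|f| dμ×μ = ∞). A sketch:

```python
def f(i, j):
    """The function of Exercise 7 on N x N (indices starting at 1)."""
    if j == i:
        return i
    if j == i + 1:
        return -i
    return 0

N = 200
# integrate in omega_2 first: every row sums to zero
row_sums = [sum(f(i, j) for j in range(1, N + 2)) for i in range(1, N + 1)]
assert all(s == 0 for s in row_sums)

# integrate in omega_1 first: every column sums to one, so the total diverges
col_sums = [sum(f(i, j) for i in range(1, N + 1)) for j in range(1, N + 1)]
assert all(s == 1 for s in col_sums)
assert sum(col_sums) == N      # grows without bound as N -> infinity
```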
12. Let F be a probability distribution function on R. Prove that ∫_R (F(x + c) − F(x)) dx = c for any constant c ∈ R, and if F is continuous, then ∫_R F(x) dF(x) = 1/2.
Chapter 5
To see that Definition 5.3 makes sense, we need to show the existence and uniqueness of E(ξ|C) when ξ has an expectation. Without loss of generality, we may assume Eξ⁻ < ∞, so that C ∋ B ↦ ϕ(B) := ∫_B ξ dP is a signed measure with ϕ ≪ P|_C. By Theorem 3.51, there exists a P|_C-a.s. unique f ∈ C such that dϕ = f dP|_C, i.e. ϕ(B) = ∫_B f dP = ∫_B ξ dP for B ∈ C.

By Theorem 2.12 and Property 5.5(4) and (6), it suffices to give the proof for ξ = 1_A and η = 1_B with A ∈ A and B ∈ C; then the proof is finished since in this case, for C ∈ C, we have

∫_C η E(ξ|C) dP = ∫_{C∩B} ξ dP = P(A ∩ B ∩ C) = ∫_C ξη dP.
Property 5.7. Let $r\in[1,\infty)$. If $\xi_n\xrightarrow{L^r(P)}\xi$, then $E(\xi_n|\mathscr C)\xrightarrow{L^r(P)}E(\xi|\mathscr C)$.

Proof. By Jensen's inequality and the properties of conditional expectation, it follows that
$$E\big|E(\xi_n|\mathscr C)-E(\xi|\mathscr C)\big|^r=E\big|E(\xi_n-\xi|\mathscr C)\big|^r\ \le\ E\big(E(|\xi_n-\xi|^r\,|\,\mathscr C)\big)=E|\xi_n-\xi|^r\to0\quad(n\to\infty).$$
$$E|\xi-E(\xi|\mathscr C)|^2\ \le\ E|\xi-\eta|^2,\qquad E\big(|\xi-E(\xi|\mathscr C)|^2\,\big|\,\mathscr C\big)\ \le\ E\big(|\xi-\eta|^2\,\big|\,\mathscr C\big).$$
Hence, $E(|\xi-\eta|^2|\mathscr C)\ \ge\ E(|\xi-E(\xi|\mathscr C)|^2|\mathscr C)$, and the equality holds if and only if $\eta=E(\xi|\mathscr C)$, $P$-a.s.
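On a finite probability space the optimality is easy to verify: $E(\xi|\mathscr C)$ averages $\xi$ over each atom of $\mathscr C$, and no other $\mathscr C$-measurable $\eta$ does better in $L^2(P)$. A small sketch (the four-point space and the numbers are illustrative):

```python
import random

P = [0.1, 0.2, 0.3, 0.4]     # point masses on a four-point space
xi = [1.0, 3.0, 2.0, 5.0]
atoms = [[0, 1], [2, 3]]     # C is generated by this partition

cond = [0.0] * 4             # E(xi | C): P-weighted average on each atom
for atom in atoms:
    m = sum(P[i] * xi[i] for i in atom) / sum(P[i] for i in atom)
    for i in atom:
        cond[i] = m

def l2(eta):                 # E|xi - eta|^2
    return sum(P[i] * (xi[i] - eta[i]) ** 2 for i in range(4))

random.seed(0)
for _ in range(100):         # any C-measurable eta is constant on atoms
    a, b = random.uniform(-5, 5), random.uniform(-5, 5)
    assert l2(cond) <= l2([a, a, b, b]) + 1e-12
```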
(3) for any $1\le i\le n$, let $r_m^{(i)}\in\bar{\mathbb Q}^n$ be such that the $i$th component is $-m$ and the others are $\infty$; then $\lim_{m\to\infty}F(\omega;r_m^{(i)})=0$, $\omega\notin N$;
(4) $\forall r_0\in\mathbb Q^n$, $\lim_{m\to\infty}F\big(\omega;r_0-\frac1m\big)=F(\omega;r_0)$, $\omega\notin N$.
Let
$$\tilde F(\omega;r)=\begin{cases}F(\omega;r),&\omega\in N^c,\\1_{(0,\infty)}(r),&\omega\in N,\end{cases}\qquad r\in\mathbb Q^n.$$
$$P_S(A_S)=P_{S'}\big(A_S\times\Omega_{S'\setminus S}\big),\qquad A_S\in\mathscr A^S,\ S\subset S'\subset T.$$
5.5 Exercises
$$E(\xi_{n+1}|\mathscr A_n)=\xi_n,\qquad n\ge1.$$
Prove that $\{\xi_n\}_{n\ge1}$ is a Markov chain if and only if one of the following conditions holds:
(a) $E(\xi_m|\mathscr A_n)=E(\xi_m|\xi_n)$, $m\ge n\ge1$;
(b) $E(\eta|\mathscr A_n)=E(\eta|\xi_n)$, $\eta\in\mathscr A^n$, $n\ge1$;
12. Let the matrix $P=(p_{ij})_{i,j=0}^\infty$ satisfy $p_{ij}\ge0$ and $\sum_{j=0}^\infty p_{ij}=1$. Construct a probability space $(\Omega,\mathscr A,P)$ and a sequence of random variables $\{\xi_n\}_{n\ge0}$ such that $P(\xi_{n+1}=j|\xi_n=i)=p_{ij}$ for $n\ge0$ and $i,j\ge0$.
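A simulation sketch of the construction in Exercise 12 (the 2×2 matrix is an illustrative choice): realise the chain from i.i.d. uniforms, moving from $i$ to $j$ with probability $p_{ij}$, and check the empirical transition frequency.

```python
import random

random.seed(2)
P = [[0.5, 0.5], [0.25, 0.75]]   # illustrative transition matrix

def step(i, u):
    """Map state i and a uniform u in [0, 1) to the next state."""
    acc = 0.0
    for j, p in enumerate(P[i]):
        acc += p
        if u < acc:
            return j
    return len(P[i]) - 1

state, visits_1, count_11 = 0, 0, 0
for _ in range(200_000):
    prev, state = state, step(state, random.random())
    if prev == 1:
        visits_1 += 1
        count_11 += (state == 1)

# empirical frequency of 1 -> 1 transitions approximates p_11 = 0.75
assert abs(count_11 / visits_1 - 0.75) < 0.01
```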
13. Let $\xi$ and $\eta$ be random variables such that $E(\xi|\mathscr C)=\eta$ and $E\xi^2=E\eta^2<\infty$. Prove that $\xi=\eta$, a.s.
14. Let ξ ∈ L1 (P). Prove that the family of random variables
$\{E(\xi|\mathscr C):\mathscr C\text{ is a sub-}\sigma\text{-algebra of }\mathscr A\}$
is uniformly integrable.
15. Let $\xi$ and $\eta$ be independent and identically distributed random variables such that $E\xi$ exists. Prove that $E(\xi|\xi+\eta)=(\xi+\eta)/2$.
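The conclusion of Exercise 15 can be verified exactly on a small discrete example ($\xi,\eta$ i.i.d. uniform on $\{1,2,3\}$, an illustrative choice), computing $E(\xi\,|\,\xi+\eta=s)$ from the joint law:

```python
from fractions import Fraction
from itertools import product

vals = [1, 2, 3]                       # xi, eta i.i.d. uniform on this set
for s in range(2, 7):                  # possible values of xi + eta
    pairs = [(x, y) for x, y in product(vals, vals) if x + y == s]
    cond = Fraction(sum(x for x, _ in pairs), len(pairs))
    assert cond == Fraction(s, 2)      # E(xi | xi + eta = s) = s / 2
```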
16. Let $(\Omega,\mathscr A,P)$ be a probability space, let $(E,\mathscr E)$ be a measurable space, and let $T:\Omega\to E$ be a measurable map. Prove that for any sub-σ-algebra $\mathscr C$ of $\mathscr E$,
$$P\big(T^{-1}(B)\,\big|\,T^{-1}(\mathscr C)\big)=\big(P\circ T^{-1}\big)(B|\mathscr C)\circ T,\qquad B\in\mathscr E.$$
17. For an event $A$ with $P(A)>0$, denote $P_A=P(\cdot|A)$. Prove that for any $B\in\mathscr A$ and sub-σ-algebra $\mathscr C$ of $\mathscr A$,
$$P_A(B|\mathscr C)=\frac{P(A\cap B|\mathscr C)}{P(A|\mathscr C)}.$$
$$\mu([a,b))=\lim_{T\to\infty}\frac{1}{(2\pi)^n}\int_{[-T,T]^n}\prod_{k=1}^n\frac{e^{-it_ka_k}-e^{-it_kb_k}}{it_k}\,f_\mu(t_1,\ldots,t_n)\,dt_1\cdots dt_n.$$
Proof. Let $I(T)$ denote the integral on the right-hand side over $[-T,T]^n$. By the definition of $f_\mu$ and Fubini's theorem, we obtain
$$I(T)=\int_{\mathbb R^n}\mu(dx)\int_{-T}^{T}\cdots\int_{-T}^{T}\prod_{k=1}^n\frac{e^{-it_ka_k}-e^{-it_kb_k}}{it_k}\,e^{i\sum_{k=1}^nt_kx_k}\,dt_1\cdots dt_n$$
$$=\int_{\mathbb R^n}\prod_{k=1}^n\bigg(\int_{-T}^{T}\frac{e^{-it_ka_k}-e^{-it_kb_k}}{it_k}\,e^{it_kx_k}\,dt_k\bigg)\mu(dx)$$
$$=2^n\int_{\mathbb R^n}\prod_{k=1}^n\bigg(\int_0^T\frac{\sin t_k(x_k-a_k)-\sin t_k(x_k-b_k)}{t_k}\,dt_k\bigg)\mu(dx)$$
$$=2^n\int_{\mathbb R^n}\prod_{k=1}^n\bigg(\int_{T(x_k-b_k)}^{T(x_k-a_k)}\frac{\sin t}{t}\,dt\bigg)\mu(dx).$$
Since $\int_s^r\frac{\sin t}{t}\,dt$ is bounded in $s\le r\in\mathbb R$, and $\int_{-\infty}^\infty\frac{\sin t}{t}\,dt=\pi$, the dominated convergence theorem implies
is at most countable.

Proof. Let
$$D_{m,k}(\mu)=\Big\{a\in\mathbb R:\mu(\{x:x_k=a\})\ \ge\ \frac1m\Big\},\qquad m\ge1,\ 1\le k\le n.$$
Proof. Let $a=(a_k)_{1\le k\le n}$ and $b=(b_k)_{1\le k\le n}$ be such that $\{a_k,b_k:1\le k\le n\}\subset C(\mu)$. Then
$$\partial[a,b]\subset\bigcup_{k=1}^n\{x:x_k=a_k\text{ or }x_k=b_k\}$$
is a $\mu$-null set.
and
$$\lim_{n\to\infty}|\mu(C_n)-\mu(A)|=\lim_{n\to\infty}|\mu(C_n)-\mu(A_n)|.$$
equivalently,
(2) We say that $(\mu_n)_{n\ge1}$ converges strongly to $\mu$, denoted by $\mu_n\xrightarrow{s}\mu$, if
$$f_m(x)=\frac{1}{1+m\,d(x,C)},\qquad x\in E,\ m\ge1.$$
$\lim_{n\to\infty}\mu_n(C)$.
(4) and (5) ⇒ (6). Let A be a μ-continuous set. Then μ(A) = μ(Ā) =
μ(A◦ ), where Ā and A◦ are the closure and interior of A, respectively.
This together with (4) and (5) yields
such that
Let
$$f_n=\sum_{i=1}^{n-1}r_i\,1_{\{r_i\le f<r_{i+1}\}}.$$
Then
so that
$$|\mu(f)-\mu_m(f)|\ \le\ |\mu(f)-\mu(f_n)|+|\mu(f_n)-\mu_m(f_n)|+|\mu_m(f_n)-\mu_m(f)|$$
$$\le\ \delta(I_n)\,\big(\mu(E)+\mu_m(E)\big)+\sum_{i=1}^{n-1}r_i\,\big|\mu(r_i\le f<r_{i+1})-\mu_m(r_i\le f<r_{i+1})\big|.$$
Noting that $\{r_i\}\subset D^c$ implies that each set $\{r_i\le f<r_{i+1}\}$ is $\mu$-continuous, by (6), we may first let $m\uparrow\infty$ and then $n\uparrow\infty$ to derive (1).
Since $\{f_n\}_{n\ge1}$ is dense in $C(E)$, for any $f\in C(E)$ and $\varepsilon>0$, there exists $m_0\ge1$ such that $\|f_{m_0}-f\|_\infty\le\varepsilon$. So,
$$|\mu_{n_k}(f)-\mu_{n_l}(f)|\ \le\ |\mu_{n_k}(f-f_{m_0})|+|\mu_{n_l}(f-f_{m_0})|+|\mu_{n_k}(f_{m_0})-\mu_{n_l}(f_{m_0})|\ \le\ 2\varepsilon C+|\mu_{n_k}(f_{m_0})-\mu_{n_l}(f_{m_0})|.$$
$$\mu_{n_k}|_{K_m}\ \xrightarrow{w}\ \mu^{(m)}\quad(k\to\infty),\qquad m\ge1.$$
Clearly,
$$h_l=\frac{1}{1+l\,d(x,A)},\qquad l\ge1.$$
Then
$$=\mu^{(m)}(A\cap K_m).$$
$$|\mu_{n_k}(f)-\mu(f)|$$
By first letting $k\uparrow\infty$ and then $m\uparrow\infty$, we prove $\mu_{n_k}\xrightarrow{w}\mu$.
holds for any increasing open sets $G_n\uparrow E$. To see this, for any $n\ge1$, we take $\mu_n\in M$ such that
$$\mu_n(G_n^c)\ \ge\ \sup_{\mu\in M}\mu(G_n^c)-1/n.$$
In this section, we first characterize the weak convergence of finite measures on $\mathbb R^n$ using the convergence of characteristic functions, and then prove that a complex function on $\mathbb R^n$ is a characteristic function if and only if it is continuous and nonnegative definite.
Theorem 6.18. Let $\{\mu_k,\mu\}_{k\ge1}$ be finite measures on $\mathbb R^n$. Then $\mu_k\xrightarrow{w}\mu$ $(k\to\infty)$ if and only if $f_{\mu_k}\to f_\mu$ pointwise.
By the dominated convergence theorem, the necessity is obvious.
The sufficiency follows from Theorem 6.22 on the convergence of
integral characteristic functions.
Definition 6.19. Let $f_\mu$ be the characteristic function of a finite measure $\mu$. The indefinite integral of $f_\mu$,
$$\tilde f_\mu(u_1,\ldots,u_n)=\int_0^{u_1}\cdots\int_0^{u_n}f_\mu(t_1,\ldots,t_n)\,dt_1\cdots dt_n,\qquad u\in\mathbb R^n,$$
is called the integral characteristic function of $\mu$, where $\int_0^{u_i}=-\int_{u_i}^0$ if $u_i<0$.
Let
$$F(x,u)=\prod_{k=1}^n\frac{e^{iu_kx_k}-1}{ix_k},\qquad x,u\in\mathbb R^n.$$
$$\frac1n\sum_{k=1}^n\xi_k\ \xrightarrow{P}\ a.$$
which implies
$$\lim_{n\to\infty}P\bigg(\bigg|\frac1n\sum_{k=1}^n\xi_k-a\bigg|\ \ge\ \varepsilon\bigg)=0.$$
(2) Let $\xi_n'=\xi_n-a$. Then $\eta_n=\frac1n\sum_{k=1}^n\xi_k'$, so
$$f_{\eta_n}(t)=\prod_{k=1}^nf_{\xi_k'}(t/n)=\big[f(t/n)\big]^n,$$
Thus,
$$\lim_{n\to\infty}\log f_{\eta_n}(t)=\lim_{n\to\infty}\log\big(1+o(1/n)\big)^n=\lim_{n\to\infty}n\log\big(1+o(1/n)\big)=0.$$
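A Monte Carlo illustration of the weak law just proved (Bernoulli(0.3) summands, an arbitrary choice made only for the demonstration): the sample average concentrates at the mean.

```python
import random

random.seed(1)
n = 100_000
# average of n i.i.d. Bernoulli(0.3) indicators
avg = sum(random.random() < 0.3 for _ in range(n)) / n
# typical deviation is of order sqrt(pq/n) ~ 0.0015 here
assert abs(avg - 0.3) < 0.01
```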
Proof. Let $\eta^{(k)}=\xi^{(k)}-m$. Then $\{\eta^{(k)}\}$ are i.i.d. with zero mean. Let $f$ be the characteristic function of $\eta^{(k)}$. Then the characteristic function of $\frac{1}{\sqrt N}\sum_{k=1}^N\eta^{(k)}$ is
$$f_N(t)=\big[f(t/\sqrt N)\big]^N,\qquad t\in\mathbb R^n.$$
By Theorem 6.18, this implies that $\frac{1}{\sqrt N}\sum_{k=1}^N\eta^{(k)}$ converges in distribution to $N(0,D)$, the centered normal distribution with covariance matrix $D$.
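A one-dimensional Monte Carlo check (illustrative parameters, not from the text): sums of $N$ i.i.d. centred Uniform$[-1,1]$ variables, scaled by $N^{-1/2}$, should have mean $\approx 0$ and variance $\approx 1/3$, the variance $D$ of a single summand.

```python
import math
import random
import statistics

random.seed(0)
N, trials = 400, 2000
sums = [sum(random.uniform(-1, 1) for _ in range(N)) / math.sqrt(N)
        for _ in range(trials)]
assert abs(statistics.mean(sums)) < 0.1                 # centred
assert abs(statistics.pvariance(sums) - 1 / 3) < 0.1    # variance -> D = 1/3
```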
$$\sum_{j,k=1}^mf_\mu\big(t^{(j)}-t^{(k)}\big)\alpha_j\bar\alpha_k=\int_{\mathbb R^n}\bigg|\sum_{k=1}^m\alpha_ke^{i\langle t^{(k)},x\rangle}\bigg|^2\mu(dx)\ \ge\ 0.$$
$$0\ \le\ \frac{1}{m^n}\sum_{j_1,\ldots,j_n,k_1,\ldots,k_n=0}^{m-1}f\big(c(j-k)\big)e^{-i\langle c(j-k),x\rangle}=\sum_{r_1,\ldots,r_n=-m}^{m}\prod_{\ell=1}^n\Big(1-\frac{|r_\ell|}{m}\Big)f(cr)\,e^{-i\langle cr,x\rangle}=:G_m(x).$$
Let
$$\mu_m(dx)=\Big(\frac{c}{2\pi}\Big)^nG_m(x)\,1_{[-\frac\pi c,\frac\pi c]^n}(x)\,dx.$$
Then
$$\mu_m(\mathbb R^n)=\mu_m\Big(\Big[-\frac\pi c,\frac\pi c\Big]^n\Big)=\sum_{r_1,\ldots,r_n=-m}^{m}\prod_{\ell=1}^n\Big(1-\frac{|r_\ell|}{m}\Big)f(cr)\Big(\frac{c}{2\pi}\Big)^n\prod_{\ell=1}^n\int_{-\pi/c}^{\pi/c}e^{icr_\ell x_\ell}\,dx_\ell=f(0).$$
Since $\{\mu_m\}_{m\ge1}$ is tight, there exist $\mu$ and a subsequence $\{\mu_{m_k}\}$ such that $\mu_{m_k}\xrightarrow{w}\mu$ $(k\to\infty)$. Then
$$\mu(\mathbb R^n)=\mu\Big(\Big[-\frac\pi c,\frac\pi c\Big]^n\Big)=f(0)$$
and
$$\sum_{i=0}^{n-1}\Big|f_m\big(t_1,\ldots,t_i,t_{i+1}^{(m)},\ldots,t_n^{(m)}\big)-f_m\big(t_1,\ldots,t_{i+1},t_{i+2}^{(m)},\ldots,t_n^{(m)}\big)\Big|$$
$$\le\ \sum_{i=0}^{n-1}\sqrt{2f(0)\Big(f(0)-\mathrm{Re}\,f_m\big(e_{i+1}(t_{i+1}-t_{i+1}^{(m)})\big)\Big)}.\tag{6.4}$$
6.5 Exercises
is equivalent to
Chapter 7

Probability Distances
Let $(E,\rho)$ be a metric space with Borel σ-algebra $\mathscr E$, and let $\mathscr P(E)$ be the class of all probability measures on $(E,\mathscr E)$. In this chapter, we introduce some distances on $\mathscr P(E)$, including the metrization of the weak topology, the total variation distance for uniform convergence, and the Wasserstein distance arising from optimal transport.
$$d_w(\mu,\nu):=\sum_{n=1}^\infty2^{-n}\big\{|\mu(f_n)-\nu(f_n)|\wedge1\big\},\qquad\mu,\nu\in\mathscr P(E).$$
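A hypothetical finite truncation of $d_w$ for two discrete measures on $[0,1]$, taking $f_n(x)=x^n$ as a stand-in for the unspecified sequence $\{f_n\}$ (this choice, and hence the numerical value, is purely illustrative — the computation pattern is what matters):

```python
mu = {0.0: 0.5, 1.0: 0.5}   # (delta_0 + delta_1) / 2
nu = {0.5: 1.0}             # delta_{1/2}

def integral(m, f):
    return sum(p * f(x) for x, p in m.items())

dw = sum(
    2.0 ** -n * min(abs(integral(mu, lambda x: x ** n)
                        - integral(nu, lambda x: x ** n)), 1.0)
    for n in range(1, 40)   # truncated series; tail is < 2**-39
)
# terms: n = 1 gives 0, each n >= 2 gives 0.5 - 2**-n; the series sums to 1/6
assert abs(dw - 1 / 6) < 1e-9
```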
(b) Equivalence to the weak topology: Obviously, if $\mu_n\xrightarrow{w}\mu$, then $d_w(\mu_n,\mu)\to0$. Conversely, let $d_w(\mu_n,\mu)\to0$. We are going to prove $\mu_n(f)-\mu(f)\to0$ for any $f\in C_b(E)$. Given $f\in C_b(E)$, since $\{f_n\}$ is dense in $C_b(E)$, for any $\varepsilon>0$ there exists $n_0\ge1$ such that $\|f_{n_0}-f\|_\infty<\varepsilon$. So,
$$\lim_{n\to\infty}|\mu_n(f)-\mu(f)|\ \le\ 2\varepsilon+\lim_{n\to\infty}|\mu_n(f_{n_0})-\mu(f_{n_0})|=2\varepsilon.$$
As $\varepsilon$ is arbitrary, we have $\mu_n(f)\to\mu(f)$.
(c) Separability: $\forall m\ge1$, let $U_m=\{(\mu(f_1),\ldots,\mu(f_m)):\mu\in\mathscr P(E)\}\subset\mathbb R^m$. Since $\mathbb R^m$ is separable, so is $U_m$. Thus, there exists a countable set $\mathscr P_m\subset\mathscr P(E)$ such that
$$\tilde U_m:=\{(\mu(f_1),\ldots,\mu(f_m)):\mu\in\mathscr P_m\}$$
is dense in $U_m$. Thus, $\mathscr P_\infty:=\bigcup_{m=1}^\infty\mathscr P_m$ is a countable subset of $\mathscr P(E)$, so that it suffices to prove that $\mathscr P_\infty$ is dense in $\mathscr P(E)$ under the distance $d_w$. In fact, for any $\mu\in\mathscr P(E)$, there exists $\mu_m\in\mathscr P_m$ such that
$$|\mu_m(f_i)-\mu(f_i)|\ \le\ \frac1m,\qquad\forall\,1\le i\le m.$$
Thus,
$$d_w(\mu_m,\mu)\ \le\ 2^{-m}+\frac1m\ \to\ 0\quad(m\to\infty).$$
(d) Completeness of $d_w$: Assume $E$ is locally compact. Suppose $\{\mu_n\}_{n\ge1}\subset\mathscr P(E)$ is a Cauchy sequence under $d_w$. Then $\forall m\ge1$, $\{\mu_n(f_m)\}_{n\ge1}$ is a Cauchy sequence, so it converges to some number, denoted by $\varphi(f_m)$. Moreover, given $f\in C_b(E)$, $\forall\varepsilon>0$, $\exists m_0\ge1$ such that $\|f_{m_0}-f\|_\infty<\varepsilon$. Thus,
$$\lim_{m,n\to\infty}|\mu_m(f)-\mu_n(f)|\ \le\ 2\varepsilon+\lim_{m,n\to\infty}|\mu_m(f_{m_0})-\mu_n(f_{m_0})|=2\varepsilon.$$
As $\varepsilon$ is arbitrary, $\{\mu_n(f)\}_{n\ge1}$ is also a Cauchy sequence, which converges to some number, denoted by $\varphi(f)$. By the properties
φ : Cb (E) → R
< ∞.
Thus, for any $n\ge1$, there exists $\pi_n\in\mathscr C(\mu,\nu)$ such that
$$W_p(\mu,\nu)^p\ \ge\ \pi_n(\rho^p)-\frac1n.\tag{7.1}$$
So, if $\pi_n$ converges weakly to some $\pi_0$, then $\pi_0$ should be an optimal coupling. For this, we first prove that $\{\pi_n\}_{n\ge1}$ is tight. In fact, by Theorem 6.16, the finite set $\{\mu,\nu\}$ is tight, so for any $\varepsilon>0$ there exists a compact set $K\subset E$ such that $\mu(K^c)+\nu(K^c)<\varepsilon$. Thus, $\forall\pi\in\mathscr C(\mu,\nu)$,
$$\pi\big((K\times K)^c\big)\ \le\ \pi(K^c\times E)+\pi(E\times K^c)=\mu(K^c)+\nu(K^c)<\varepsilon.$$
Therefore, $\mathscr C(\mu,\nu)$ is tight. Hence, there exist a subsequence $\{\pi_{n_k}\}_{k\ge1}$ and $\pi_0\in\mathscr P(E\times E)$ such that $\pi_{n_k}\xrightarrow{w}\pi_0$ $(k\to\infty)$. Obviously, $\pi_0\in\mathscr C(\mu,\nu)$. Combining this with (7.1), we obtain that for any $N\in(0,\infty)$,
$$\pi_0(\rho^p\wedge N)=\lim_{k\to\infty}\pi_{n_k}(\rho^p\wedge N)\ \le\ W_p(\mu,\nu)^p.$$
$$\pi(dx_1,dx_2,dx_3):=\mu_1(dx_1)\,\pi_{12}(x_1,dx_2)\,\pi_{23}(x_2,dx_3)$$
we have
Hence,
As $\varepsilon$ is arbitrary and $\mu_n\xrightarrow{w}\mu$, we have
$$\liminf_{n\to\infty}\mu_n\big(\rho(o,\cdot)^p\big)\ \ge\ \lim_{n\to\infty}\mu_n\big(\rho(o,\cdot)^p\wedge N\big)=\mu\big(\rho(o,\cdot)^p\wedge N\big)\ \uparrow\ \mu\big(\rho(o,\cdot)^p\big)\quad(N\uparrow\infty).$$
From this and (7.3), it follows that $\mu(\rho(o,\cdot)^p)=\lim_{n\to\infty}\mu_n(\rho(o,\cdot)^p)$. Thus, by Kantorovich's dual formula, the dominated convergence theorem, and $W_p(\mu_n,\mu_m)\to0$ as $n,m\to\infty$, we obtain
$$\mu_N:=\frac{\mu(\cdot\cap B(o,N))}{\mu(B(o,N))}\ \xrightarrow{W_p}\ \mu,$$
so $\bigcup_{N=1}^\infty\mathscr P_p^{(N)}(E)$ is dense in $(\mathscr P_p(E),W_p)$. Thus, we only need to prove that each $\mathscr P_p^{(N)}(E)$ is separable. Since $\rho(o,\cdot)$ is bounded on $\bar B(o,N)$, as shown in step (b), the weak convergence is equivalent to the convergence in $W_p$ (see Exercise 6). Then the proof is finished by Theorem 7.1, which says that $\mathscr P_p^{(N)}(E)$ is separable under the weak topology.
Thus,
$$\mu\big((\rho(o,\cdot)^p-N)^+\big)\ \le\ \max_{1\le i\le n}\mu_i\big((\rho(o,\cdot)^p-N)^+\big)+2^{p-1}W_p(\mu_i,\mu)^p\ \le\ \sum_{i=1}^n\mu_i\big((\rho(o,\cdot)^p-N)^+\big)+2^{p-1}\varepsilon,\qquad\mu\in M.$$
Hence,
$$\limsup_{N\to\infty}\sup_{\mu\in M}\mu\big(\rho(o,\cdot)^p1_{\{\rho(o,\cdot)\ge N\}}\big)\ \le\ 2\limsup_{N\to\infty}\sup_{\mu\in M}\mu\Big(\Big(\rho(o,\cdot)^p-\frac N2\Big)^+\Big)\ \le\ 2^p\varepsilon.$$
As $\varepsilon$ is arbitrary, we get (7.4) immediately.
(b) Sufficiency. Let $M$ be weakly compact and (7.4) hold. We intend to prove that $M$ is compact under $W_p$. For this, we only need to prove that any sequence $\{\mu_n\}_{n\ge1}\subset M$ has a subsequence convergent under $W_p$. By the weak compactness of $M$, we may and do assume that $\mu_n\xrightarrow{w}\mu$. Let $\{x_1,x_2,\ldots\}$ be a dense subset of $E$. Then $\bigcup_{i=1}^\infty B(x_i,\varepsilon)=E$ for any $\varepsilon>0$, where $B(x_i,\varepsilon)$ is the open ball with radius $\varepsilon$ centered at $x_i$. Since the set $\{\varepsilon>0:\exists i\ge1\text{ such that }\mu(\partial B(x_i,\varepsilon))>0\}$ is at most countable, for $m\ge1$ we take $\varepsilon_m\in(0,1/m)$ such that the $B(x_i,\varepsilon_m)$ are all $\mu$-continuous sets. Let
$$U_1=B(x_1,\varepsilon_m),\qquad U_{i+1}=B(x_{i+1},\varepsilon_m)\setminus\bigcup_{j=1}^iB(x_j,\varepsilon_m).$$
Then
$$\pi_n(dx,dy):=\sum_{i=1}^\infty\frac{\mu_n(U_i)\wedge\mu(U_i)}{\mu_n(U_i)\mu(U_i)}\,1_{U_i}(x)1_{U_i}(y)\,\mu_n(dx)\mu(dy)+\frac{1}{1-r_n}Q_n(dx)Q(dy)$$
is a coupling of $\mu_n$ and $\mu$ (if $r_n=1$, then the last term is set to 0); hence
$$W_p(\mu_n,\mu)^p\ \le\ \pi_n(\rho^p)\ \le\ m^{-p}+\frac{2^{p-1}}{1-r_n}\big(Q_n(\rho(o,\cdot)^p)+Q(\rho(o,\cdot)^p)\big)\ \le\ m^{-p}+2^pN^p(1-r_n)+2^{p-1}\sup_{k\ge1}\mu_k\big(\rho(o,\cdot)^p1_{\{\rho(o,\cdot)\ge N\}}\big)$$
||μ − ν||Var := |μ − ν|(E) = 2(μ − ν)+ (E) = 2(ν − μ)+ (E). (7.5)
μ ∧ ν := μ − (μ − ν)+ = ν − (ν − μ)+
Thus,
π0 (A × E) = π0 (E × A) = μ(A), A ∈ E.
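For discrete measures the identities around (7.5) reduce to coordinatewise positive and negative parts; a minimal check with illustrative weights:

```python
mu = [0.2, 0.5, 0.3]
nu = [0.4, 0.4, 0.2]

var = sum(abs(m - n) for m, n in zip(mu, nu))          # |mu - nu|(E)
pos = sum(max(m - n, 0.0) for m, n in zip(mu, nu))     # (mu - nu)^+(E)
neg = sum(max(n - m, 0.0) for m, n in zip(mu, nu))     # (nu - mu)^+(E)
wedge = sum(min(m, n) for m, n in zip(mu, nu))         # (mu ^ nu)(E)

# for probabilities the positive and negative parts have equal mass,
# which gives the factor 2 in (7.5)
assert abs(var - 2 * pos) < 1e-12 and abs(pos - neg) < 1e-12
assert abs(wedge - (1 - pos)) < 1e-12   # mu ^ nu = mu - (mu - nu)^+
```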
7.4 Exercises
4. Let $(E,\rho)$ be a Polish space. Prove that under the total variation distance the space $\mathscr P(E)$ is complete. Give an example showing that it may fail to be separable.
5. Let $(E,\mathscr E)$ be a Polish space with its Borel σ-algebra, and let $\mu,\nu\in\mathscr P(E)$. Construct a probability space $(\Omega,\mathscr A,P)$ and measurable maps $\xi,\eta:\Omega\to E$ (called random variables on $E$) such that $P\circ\xi^{-1}=\mu$, $P\circ\eta^{-1}=\nu$ and $W_p(\mu,\nu)^p=E\rho(\xi,\eta)^p$.
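In one dimension the monotone (quantile) coupling is optimal, so for empirical measures with equally many atoms $W_1$ is the mean gap between sorted samples, and any other pairing can only do worse. A sketch (the sample points are illustrative):

```python
xs = sorted([0.0, 1.0, 3.0])     # atoms of mu (weight 1/3 each)
ys = sorted([0.5, 1.5, 2.0])     # atoms of nu

# the optimal (monotone) coupling pairs the order statistics
w1 = sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)
assert abs(w1 - 2 / 3) < 1e-12

# a non-monotone pairing gives a worse transport cost
w1_reversed = sum(abs(a - b) for a, b in zip(xs, reversed(ys))) / len(xs)
assert w1 <= w1_reversed
```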
6. Let $(E,\rho)$ be a Polish space and $\{\mu_n,\mu\}_{n\ge1}\subset\mathscr P_p(E)$. Prove that $W_p(\mu_n,\mu)\to0$ if and only if $\mu_n\xrightarrow{w}\mu$ and $\lim_{n\to\infty}\mu_n(\rho(o,\cdot)^p)=\mu(\rho(o,\cdot)^p)$, where $o\in E$ is a fixed point.
7. Let $(E,\rho)$ be a compact metric space. Prove that for any $p\in[1,\infty)$, $(\mathscr P(E),W_p)$ is also a compact metric space. Give an example where $(E,\rho)$ is locally compact but $(\mathscr P(E),W_p)$ is not.
8. (Lévy distance) For any probability distribution functions $F,G$ on $\mathbb R$, let
$$\rho_L(F,G):=\inf\big\{\varepsilon>0:F(x-\varepsilon)-\varepsilon\ \le\ G(x)\ \le\ F(x+\varepsilon)+\varepsilon,\ x\in\mathbb R\big\}.$$
For random variables $\xi,\eta$, let
$$\alpha(\xi,\eta)=\inf\{\varepsilon>0:P(|\xi-\eta|>\varepsilon)\ \le\ \varepsilon\}$$
and
$$\beta(\xi,\eta)=E\frac{|\xi-\eta|}{1+|\xi-\eta|}.$$
Prove
$$\rho_L(F_\xi,F_\eta)\ \le\ \alpha(\xi,\eta)$$
and
$$\frac{\alpha(\xi,\eta)^2}{1+\alpha(\xi,\eta)}\ \le\ \beta(\xi,\eta)\ \le\ \alpha(\xi,\eta)+\frac{(1-\alpha(\xi,\eta))\alpha(\xi,\eta)}{1+\alpha(\xi,\eta)}.$$
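A grid-search sketch of the two-sided Lévy condition $F(x-\varepsilon)-\varepsilon\le G(x)\le F(x+\varepsilon)+\varepsilon$ for an assumed concrete pair: $F$ uniform on $(0,1)$ and $G$ its shift by $a=0.4$; for these slope-one distribution functions $\rho_L(F,G)=a/2$.

```python
def F(x):                      # Uniform(0, 1) distribution function
    return min(max(x, 0.0), 1.0)

def G(x):                      # F shifted by a = 0.4
    return F(x - 0.4)

def levy(F, G, grid, eps_grid):
    """Smallest eps on the grid with F(x-eps)-eps <= G(x) <= F(x+eps)+eps."""
    for eps in eps_grid:
        if all(F(x - eps) - eps <= G(x) <= F(x + eps) + eps for x in grid):
            return eps
    return float("inf")

grid = [k / 1000 for k in range(-500, 2000)]
eps_grid = [k / 1000 for k in range(0, 1001)]
assert abs(levy(F, G, grid, eps_grid) - 0.2) < 5e-3   # a / 2 = 0.2
```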
Chapter 8

Calculus on the Space of Finite Measures
$$\int_{\mathbb R^d}|v|^k\,d\mu<\infty,$$
since by the integral transformation (Theorem 3.27),
where $k^*:=\frac{k}{k-1}\in(1,\infty]$, such that
$$\tau:\Omega_1\to\Omega_2,\qquad\tau^{-1}:\Omega_2\to\Omega_1$$
such that
$$P_1(\tau^{-1}\circ\tau=\mathrm{id}_{\Omega_1})=P_2(\tau\circ\tau^{-1}=\mathrm{id}_{\Omega_2})=1,\qquad P_1=P_2\circ\tau,\quad P_2=P_1\circ\tau^{-1},$$
$$\|X_1-X_2\circ\tau\|_{L^\infty(P_1)}+\|X_2-X_1\circ\tau^{-1}\|_{L^\infty(P_2)}\ \le\ \varepsilon,$$
where $\mathrm{id}_{\Omega_i}$ stands for the identity map on $\Omega_i$, $i=1,2$.
Proof. Since $\mathbb R^d$ is separable, there is a measurable partition $(A_n)_{n\ge1}$ of $\mathbb R^d$ such that $\mathrm{diam}(A_n)<\varepsilon$, $n\ge1$. Let $A_n^i=\{X_i\in A_n\}$, $n\ge1$, $i=1,2$. Then $(A_n^i)_{n\ge1}$ forms a measurable partition of $\Omega_i$, so that $\bigcup_{n\ge1}A_n^i=\Omega_i$, $i=1,2$, and, due to $\mathscr L_{X_1}|_{P_1}=\mathscr L_{X_2}|_{P_2}$,
$$P_1(A_n^1)=P_2(A_n^2),\qquad n\ge1.$$
Since the probabilities $(P_i)_{i=1,2}$ are atomless, according to Theorem C in Section 41 of [12], for any $n\ge1$ there exist measurable sets $\tilde A_n^i\subset A_n^i$ with $P_i(A_n^i\setminus\tilde A_n^i)=0$, $i=1,2$, and a measurable bijective map
$$\tau_n:\tilde A_n^1\to\tilde A_n^2$$
such that
$$P_1|_{\tilde A_n^1}=P_2\circ\tau_n|_{\tilde A_n^1},\qquad P_2|_{\tilde A_n^2}=P_1\circ\tau_n^{-1}|_{\tilde A_n^2}.$$
$$\lim_{\varepsilon\downarrow0}\frac{f(\mathscr L_{\xi_\varepsilon})-f(\mathscr L_{\xi_0})}{\varepsilon}=E\big[\langle Df(\mu_0)(\xi_0),\dot\xi_0\rangle\big].\tag{8.1.4}$$
$$\phi_{n,\varepsilon}:=(\xi_\varepsilon-\xi_0)\circ\tau_n^{-1}\in T_{\mu,k},\qquad\|\phi_{n,\varepsilon}\|_{T_{\mu,k}}=\|\xi_\varepsilon-\xi_0\|_{L^k(P)}.\tag{8.1.8}$$
does not have an atom for any $s>0$, $\varepsilon\in[0,1]$. By the conditions in Theorem 8.7(2), there exists a small constant $s_0\in(0,1)$ such that for any
Then
$$|f(\mu_1)-f(\mu_2)|\ \le\ K_0W_k(\mu_1,\mu_2),\qquad\mu_1,\mu_2\in\mathscr P_k.\tag{8.1.14}$$
Proof. By Theorem 7.4, there exists $\pi\in\mathscr C(\mu_1,\mu_2)$ such that
$$W_k(\mu_1,\mu_2)=\bigg(\int_{\mathbb R^d\times\mathbb R^d}|x-y|^k\,\pi(dx,dy)\bigg)^{\frac1k}.$$
$$\big(E[|\xi_1-\xi_2|^k]\big)^{\frac1k}\int_0^1\big\|Df(\mathscr L_{\gamma_\varepsilon(r)})\big\|_{L^k(\mathscr L_{\gamma_\varepsilon(r)})}\,dr\ \le\ KW_k(\mu_1,\mu_2),\qquad\varepsilon\in(0,1].$$
Letting $\varepsilon\to0$, we derive (8.1.14).
In this case, for any $\varepsilon\in[0,\varepsilon_0)$ and $s\in(0,\varepsilon_0-\varepsilon)$, by the definition of $D^E$, we have
(2) In general, for any $\mu\in\mathbb M_k$, let $\{\mu_n\}_{n\ge1}\subset\mathbb M_{\mathrm{disc}}$ be such that $\mu_n\to\mu$ in $\mathbb M_k$. By (8.2.2), for any $\varepsilon\in(0,\varepsilon_0)$ and $s\in(0,\varepsilon_0-\varepsilon)$, we have
$$=0.$$
$$\lim_{n\to\infty}\int_0^\varepsilon dr\int_{\mathbb R^d}\{D^Ef\}\big((1+h_r)\mu_n\big)(x)\,\dot h_r(x)\,\mu_n(dx)=\int_0^\varepsilon dr\int_{\mathbb R^d}\{D^Ef\}\big((1+h_r)\mu\big)(x)\,\dot h_r(x)\,\mu(dx).\tag{8.2.5}$$
Proposition 8.13. Let $k\in[1,\infty)$. Then for any $f\in C_K^{E,1}(\mathbb M_k)$ and $\mu,\gamma\in\mathbb M_k$,
$$\frac{d}{dr}f\big((1-r)\mu+r\gamma\big):=\lim_{\varepsilon\downarrow0}\frac{f\big((1-r-\varepsilon)\mu+(r+\varepsilon)\gamma\big)-f\big((1-r)\mu+r\gamma\big)}{\varepsilon}$$
$$D^E\tilde f(\mu)=\tilde D^Ef(\mu),\qquad\mu\in\mathscr P.$$
Then the desired formula is implied by Proposition 8.13 with $r=0$.
$$f(\mu\circ\phi_{\varepsilon v}^{-1})-f(\mu)=\int_0^\varepsilon dr\int_{\mathbb R^d}\big\langle\nabla\{D^Ef(\mu\circ\phi_{rv}^{-1})\},v\big\rangle\,d(\mu\circ\phi_{rv}^{-1}),\qquad\varepsilon\in(0,\varepsilon_0).\tag{8.3.5}$$
$$\rho_\varepsilon^v:=\frac{d(\mu\circ\phi_{\varepsilon v}^{-1})}{d\mu},\qquad\dot\rho_\varepsilon^v:=\lim_{s\downarrow0}\frac{\rho_{\varepsilon+s}^v-\rho_\varepsilon^v}{s}$$
$$f(\mu\circ\phi_{\varepsilon v}^{-1})-f(\mu)=\int_0^\varepsilon dr\int_{\mathbb R^d}\{D^Ef(\mu\circ\phi_{rv}^{-1})\}\,\dot\rho_r^v\,d\mu,\qquad\varepsilon\in[0,\varepsilon_0].\tag{8.3.6}$$
$$\frac{d}{dr}\{g\circ\phi_{rv}\}=\big\langle\nabla g(\phi_{rv}),v(\phi_{rv})\big\rangle=\langle\nabla g,v\rangle(\phi_{rv}),\qquad r\ge0,$$
$$\int_{\mathbb R^d}g\,\dot\rho_r^v\,d\mu=\int_{\mathbb R^d}g\lim_{s\downarrow0}\frac{\rho_{r+s}^v-\rho_r^v}{s}\,d\mu=\lim_{s\downarrow0}\frac1s\int_{\mathbb R^d}g\,d\big(\mu\circ\phi_{(r+s)v}^{-1}-\mu\circ\phi_{rv}^{-1}\big)=\lim_{s\downarrow0}\frac1s\int_{\mathbb R^d}\big(g\circ\phi_{(r+s)v}-g\circ\phi_{rv}\big)\,d\mu=\int_{\mathbb R^d}\frac{d}{dr}(g\circ\phi_{rv})\,d\mu$$
where $\mathrm{div}_{\mu\circ\phi_{rv}^{-1}}(v)=\mathrm{div}(v)+\langle v,\nabla\log(\rho_r^v\rho)\rangle$. This implies
$$\dot\rho_r^v=-\mathrm{div}_{\mu\circ\phi_{rv}^{-1}}(v)\,\rho_r^v,$$
which together with $\rho_r^v\mu=\mu\circ\phi_{rv}^{-1}$ leads to
$$\int_{\mathbb R^d}D^Ef(\mu\circ\phi_{rv}^{-1})\,\dot\rho_r^v\,d\mu=-\int_{\mathbb R^d}D^Ef(\mu\circ\phi_{rv}^{-1})\,\mathrm{div}_{\mu\circ\phi_{rv}^{-1}}(v)\,d(\mu\circ\phi_{rv}^{-1})=\int_{\mathbb R^d}\big\langle\nabla\{D^Ef(\mu\circ\phi_{rv}^{-1})\},v\big\rangle\,d(\mu\circ\phi_{rv}^{-1}).$$
$$\sup_{s\in[0,1]}(\mu\circ\phi_{sv}^{-1})(|\cdot|^k)=\sup_{s\in[0,1]}\mu(|\phi_{sv}|^k)\ \le\ 2^k\mu\big(|\cdot|^k+|v|^k\big)<\infty.$$
$$=\int_0^1dr\int_{\mathbb R^d}\mu(dx)\int_0^1\Big\langle\nabla\Big\{(D^Ef)\big(r\mu\circ\phi_{rv}^{-1}+(1-r)\mu\big)\Big\}(\phi_{sv}(x)),v(x)\Big\rangle\,ds.$$
Thus,
$$I_v:=\frac{\big|f(\mu\circ\phi_v^{-1})-f(\mu)-\int_{\mathbb R^d}\langle\nabla\{D^Ef(\mu)\},v\rangle\,d\mu\big|}{\mu(|v|^2)^{1/2}}$$
By $\mu(\{x\})=0$, we have
$$(\mu+s\delta_x)\circ\phi_{rv}^{-1}=\mu+s\delta_{x+rv_0}.\tag{8.3.8}$$
$$\lim_{r\downarrow0}\frac{f(\mu+s\delta_{x+rv_0})-f(\mu+s\delta_x)}{r}=s\big\langle Df(\mu+s\delta_x)(x),v_0\big\rangle.$$
$$\frac{f(\mu+s\delta_{x+rv_0})-f(\mu+s\delta_x)}{r}=\frac1r\int_0^r\frac{d}{d\theta}f(\mu+s\delta_{x+\theta v_0})\,d\theta=\frac1r\int_0^r\big\langle\nabla f(\mu+s\delta_\cdot)(x+\theta v_0),v(x+\theta v_0)\big\rangle\,d\theta=\frac sr\int_0^r\big\langle Df(\mu+s\delta_\cdot)(x+\theta v_0),v(x+\theta v_0)\big\rangle\,d\theta,\qquad r\in(0,r_0).$$
$$\lim_{s\downarrow0}E\bigg|\frac{\xi_s-\xi_0}{s}-\dot\xi_0\bigg|^q=0.\tag{8.3.9}$$
Corollary 8.16. Let $k\in[1,\infty)$.
(1) Let $f\in C^{E,1,1}(\mathscr P_k)$. Then $f$ is intrinsically differentiable and
$$Df(\mu)(x)=\nabla\{\tilde D^Ef(\mu)(\cdot)\}(x),\qquad(x,\mu)\in\mathbb R^d\times\mathscr P_k.\tag{8.3.10}$$
When $k\ge2$ and $f\in C_B^{E,1,1}(\mathscr P_k)$, we have $f\in C^1(\mathscr P_k)$.
(2) If $f\in C^{E,1}(\mathscr P_k)$, then $f((1-s)\mu+s\delta_\cdot)\in C^1(\mathbb R^d)$ with
$$\nabla f\big((1-s)\mu+s\delta_\cdot\big)(x)=s\,Df\big((1-s)\mu+s\delta_x\big)(x),\qquad x\in\mathbb R^d.\tag{8.3.11}$$
Consequently,
$$Df(\mu)(x)=\lim_{s\downarrow0}\frac1s\nabla f\big((1-s)\mu+s\delta_\cdot\big)(x),\qquad f\in C^{E,1}(\mathscr P_k),\ (x,\mu)\in\mathbb R^d\times\mathbb M.\tag{8.3.12}$$
(3) Let $\{\xi_s\}_{s\in[0,s_0)}$ be random variables on $\mathbb M$ with $\mathscr L_{\xi_s}\in\mathscr P_k$ continuous in $s$, such that $\dot\xi_0:=\frac{d}{ds}\xi_s\big|_{s=0}$ exists in $L^q(\Omega\to T\mathbb M;P)$ for some $q\ge1$. Then
$$\lim_{s\downarrow0}\frac{f(\mathscr L_{\xi_s})-f(\mathscr L_{\xi_0})}{s}=E\big\langle Df(\mathscr L_{\xi_0})(\xi_0),\dot\xi_0\big\rangle\tag{8.3.13}$$
holds for any $f\in C^{E,1,1}(\mathscr P_k)$ such that for any compact set $K\subset\mathscr P_k$,
$$\sup_{\mu\in K}\big|\nabla\{\tilde D^Ef(\mu)\}\big|(x)\ \le\ C(1+|x|)^{\frac{p(q-1)}{q}},\qquad x\in\mathbb R^d\tag{8.3.14}$$
Definition 8.17. Let $\{\xi_i\}_{i\ge1}$ be i.i.d. random variables with standard normal distribution $N(0,1)$. Then
$$\xi=\sum_{i=1}^\infty\alpha_i^{\frac12}\,\xi_i\,e_i$$
In this case,
$$\nabla^*v:=\sum_{i=1}^\infty\nabla_{e_i}v_i\in C(H)$$
is supported on $\ell^2$, since
$$\int_{\mathbb R^{\mathbb N}}\sum_{i=1}^\infty r_i^2\,\Lambda(dr)=\sum_{i=1}^\infty\alpha_i<\infty.$$
It is easy to see that
$$G_L=\Lambda\circ\Phi^{-1}.$$
By combining this with the integral transformation theorem (Theorem 3.27) and the integration by parts formula
$$\int_{\mathbb R}h(r_i)g'(r_i)\,d\Lambda_i=-\int_{\mathbb R}\Big(h'(r_i)-\frac{r_ih(r_i)}{\alpha_i}\Big)g(r_i)\,\Lambda_i(dr_i),\qquad h,g\in C_b^1(\mathbb R),$$
we finish the proof.
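A numeric check of the Gaussian integration by parts formula with $\alpha_i=1$ (standard normal), $h(r)=r^2$ and $g(r)=\sin 2r$ — these test functions are chosen only for the demonstration:

```python
import math

def phi(r):                     # standard normal density
    return math.exp(-r * r / 2) / math.sqrt(2 * math.pi)

def integrate(f, T=10.0, n=20000):
    """Midpoint rule on [-T, T]; Gaussian tails beyond are negligible."""
    h = 2 * T / n
    return sum(f(-T + (k + 0.5) * h) * h for k in range(n))

# int h g' dN(0,1)  should equal  int (r h(r) - h'(r)) g(r) dN(0,1)
lhs = integrate(lambda r: r * r * 2 * math.cos(2 * r) * phi(r))
rhs = integrate(lambda r: (r ** 3 - 2 * r) * math.sin(2 * r) * phi(r))
assert abs(lhs - rhs) < 1e-4
# both sides equal -6 e^{-2} analytically for this choice of h, g
assert abs(lhs + 6 * math.exp(-2)) < 1e-3
```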
$$N_L:=G_L\circ\Phi_2^{-1}$$
$$f:=u\circ\Phi_2,\qquad g:=v\circ\Phi_2\in C_b^1(H)$$
and
8.5 Exercises
For any $f\in C_K^{E,1,1}(\mathscr P_k)$ such that
$$|\tilde Df(\mu)(x)|\ \le\ 1+|x|^k,\qquad\mu\in\mathscr P_k,\ x\in\mathbb R^d,$$
prove
$$|f(\mu)-f(\nu)|\ \le\ \|\mu-\nu\|_{k,\mathrm{var}}.$$
5. For $k\in[0,1]$, let $\mathscr P_k$, $\mathbb M_k$, $D^E$, $\tilde D^E$, $C^{E,1,1}(\mathbb M_k)$, $C_K^{E,1,1}(\mathscr P_k)$ and $C_K^{E,1,1}(\mathbb M_k)$ be defined as before. Prove that Propositions 8.12–8.14 still hold. Can we also define the directional intrinsic derivative and the intrinsic derivative on $\mathscr P_k$ and $\mathbb M_k$ for $k\in[0,1]$?
6. Let $k\in[0,\infty)$. Prove that there exists a constant $c>0$ such that
$$\|\mu-\nu\|_{\mathrm{var}}+W_k(\mu,\nu)^{1\vee k}\ \le\ c\|\mu-\nu\|_{k,\mathrm{var}},\qquad\mu,\nu\in\mathscr P_k.$$
Moreover, when $k>1$, find a counterexample showing that for any constant $c>0$, the inequality
[1] Billingsley, P. Probability and Measure. 3rd Ed. John Wiley and Sons,
New York, 1995.
[2] Bogachev, V. I. Measure Theory (Vol. I). Springer, Berlin, 2007.
[3] Chen, Mu-Fa. From Markov Chains to Non-equilibrium Particle
Systems. 2nd Ed. World Scientific, River Edge, NJ, 2004.
[4] Dudley, R. Real Analysis and Probability. Wadsworth, Pacific Grove,
CA, 1989.
[5] Durrett, R. Probability: Theory and Examples. 5th Ed. Cambridge
University Press, Cambridge, 2019.
[6] Kallenberg, O. Foundations of Modern Probability. 3rd Ed. Springer,
Cham, 2021.
[7] Feller, W. An Introduction to Probability and Its Applications
(Vol. I, II). John Wiley and Sons, London, 1971.
[8] Loève, M. Probability Theory II. 4th Ed. Springer-Verlag, New York-Heidelberg, 1978.
[9] Neveu, J. Mathematical Foundations of the Calculus of Probability.
Holden-Day, San Francisco, Calif.-London-Amsterdam, 1965.
[10] Rachev, S. The Monge–Kantorovich Mass Transference Problem and Its Stochastic Applications. Theory of Probability and Its Applications, Vol. XXIX, 1985, 647–676.
[11] Reed, M., Simon, B. Methods of Modern Mathematical Physics (Vol. I). Academic Press, New York, 1980.
[12] Shiryayev, A. Probability. Springer-Verlag, New York, 1984.
[13] Villani, C. Optimal Transport. Springer-Verlag, Berlin, 2009.
[14] Wang, Jaigang. Foundation of Modern Probability Theory
(in Chinese). 2nd Ed. Fudan University Press, Shanghai, 2005.
[15] Yan, Jiaan. Lectures on Measure Theory (in Chinese). 2nd Ed. Science
Press, Beijing, 2004.
[16] Yan, Shi-Jian, Liu, Xiu-Fang. Measure and Probability (in Chinese). Beijing Normal University Press, Beijing, 2003.
[17] Yan, Shi-Jian, Wang, Jun-Xiang, Liu, Xiu-Fang. Foundation of Prob-
ability Theory (in Chinese). 2nd Ed. Science Press, Beijing, 2009.
[18] Yosida, K. Functional Analysis. 6th Ed. Springer-Verlag, Berlin,
1980.
[19] Zhang, Gongqing, Lin, Yuanqu. Lectures on Functional Analysis
(Vol I) (in Chinese). Peking University Press, Beijing, 1990.
[20] Zhang, Gongqing, Guo, Maozheng. Lectures on Functional Analysis
(Vol II) (in Chinese). Peking University Press, Beijing, 2001.
Index

A
absolute continuity, 80
additivity, 11
algebra, 3
almost everywhere convergence, 45
almost everywhere (a.e.), 45
almost sure (a.s.), 45

B
Boolean algebra, 3
Borel σ-algebra, 6
Borel field, 6
boundedness in integral, 75

C
central limit theorem, 136
characteristic function, 66
  of finite measure, 121
complete measure space, 22
conditional expectation, 106
conditional probability, 107
convergence in distribution, 50
convergence in rth mean, 71
convergence in measure, 46
convexity extrinsic derivative, 168
coupling, 145
covariance coefficient, 66
covariance matrix, 66
Cr inequality, 72–73

D
decomposition of distribution function, 84
directional derivative, 160
distribution function, 29, 39
  probability, 42
distribution law, 42
dominated convergence theorem, 61

E
L-system, 36
elementary function, 33
essential supremum, 88
existence of integral, 58
extension of measure, 17
extrinsic derivative, 168

F
family of consistent probability measures, 115
Fatou-Lebesgue theorem, 60
Fourier-Stieltjes transform, 121
Fubini's theorem, 94
  generalized, 100
function of sets, 11
  σ-additivity, 11
  σ-finite, 12
  continuous, 14
  finite, 12
  finite additivity, 11

G
Gauss measure on Hilbert space, 182
Gaussian measure on M, 185
Gaussian measure on P2, 184
geometric probability model, 16

H
Hölder's inequality, 71
Hahn's decomposition theorem, 79

I
indefinite integral, 63
independent, 42
indicator function, 33
infinite product σ-algebra, 95
integrable, 58
integral, 56
integral characteristic function, 133
integral transformation theorem, 69
intrinsic derivative, 160
inverse formula, 123
inverse image, 30

J
Jensen's inequality, 72

K
Kantorovich dual formula, 148
Kolmogorov's consistent theorem, 116

L
λ-system, 8
law of large numbers, 135
Lebesgue's decomposition theorem, 80
Lebesgue-Stieltjes (L-S) integral, 69
Lebesgue-Stieltjes (L-S) measure, 39
L-derivative, 161
Lr space, 71

M
μ*-measurable, 20
mathematical expectation, 65
measurable cover, 23
measurable cylindrical set, 95
measurable function, 30
measurable map, 30
measurable space, 5
measure, 12
measure extension theorem, 18
measure space, 15
metrization of weak topology, 143
Minkowski's inequality, 73
mixed conditional distribution, 112
monotone class theorem, 7
  for functions, 37
  for set classes, 10
monotone convergence theorem, 57
mutually singular, 80

N
nonnegative definite function, 137
null set, 22

O
optimal mean square approximation, 109
optimal transport, 145
outer measure, 19

P
π-system, 8
positive part and negative part of function, 34
probability measure, 12
probability space, 15
product σ-algebra, 10
Prohorov's theorem, 130

R
rth central moment, 66
rth moment, 66
Radon-Nikodym derivative, 84
Radon-Nikodym theorem, 83