
Lectures on Probability Theory

General recommendations.
• These lectures assume that the audience is familiar with measure theory.
• The videos do not replace the books. I suggest choosing one of the many books listed at the end of these notes and reading the corresponding sections before or after the videos.
• After the statement of a result, interrupt the video and try to prove the
assertion. It is the only way to understand the difficulty of the problem, to
differentiate simple steps from crucial ones, and to appreciate the ingenuity
of the solution. Sometimes you find an alternative proof of the result.
• You can speed up or slow down the video: by pressing the settings button at the bottom-right corner, you can modify the playback speed.
• Exercises highlighted in blue present results which will be used later in the
lectures and are highly recommended, as well as those indicated with ∗.
• Send me an e-mail if you find a mistake which is not reported in these notes.
• If you have typed solutions to some of the exercises proposed below in LaTeX, with no personal definitions and no special packages, send me the file. Hopefully, I’ll create a note with solutions to the exercises, acknowledging the authors of the solutions.
• A note about the methodology. I ask the students to view the video(s)
before the class. In the first part of the lecture, I recall the content of the
video. Sometimes, I ask one of the students to replace me. Occasionally,
the student is randomly chosen. This is the opportunity for the students to
ask questions on the content of the class. In the second part of the lecture, I
present some of the applications included in the “Further Readings” topic.

December 2, 2020

Lecture 1: Introduction

Summary. This lecture is based on Sections 3.1 and 3.2 of [Chung].


Content and Comments.
0:00 Definition of random variables, probability distribution measures and dis-
tribution functions.
8:14 One-to-one correspondence between probability distribution measures and distribution functions.
12:32 Definition of discrete random variables and discrete distribution functions.
15:38 Definition of absolutely continuous and singular distributions. The Cantor
distribution is constructed at the end of [Chung, Section 1.3].
20:24 F = Fd + Fac + Fs . [Chung, Theorem 1.3.2]
22:09 Definition of expectation
24:28 For a non-negative random variable,
Σ_{n≥1} P[ X ≥ n ] ≤ E[X] ≤ 1 + Σ_{n≥1} P[ X ≥ n ] .
This is [Chung, Theorem 3.2.1].
37:05 ∫_Ω f(X) dP = ∫_R f(x) µ_X(dx), [Chung, Theorem 3.2.2].
40:32 Jensen’s inequality, [Chung, Section 3.2]. See comment below on convex
functions.
47:28 Chebyshev’s inequality, [Chung, Section 3.2].
On convex functions. Let I be an open interval of R (which may be equal to R).
Consider a real-valued convex function ϕ : I → R. Show that ϕ is continuous on I
and that it has left and right-derivatives at every point. Denote by (D+ ϕ)(x) the
right-derivative of ϕ at x. Show that for all x0 ∈ I, (D+ ϕ)(x0 ) (x − x0 ) + ϕ(x0 ) ≤
ϕ(x). This bound is used in the proof of Jensen’s inequality.
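The following short Python sketch (not part of the notes; the convex function and the law of X below are arbitrary choices) illustrates both Jensen's inequality and the supporting-line bound above:

import numpy as np

rng = np.random.default_rng(0)
phi = np.exp                              # a convex function on R
x = rng.standard_normal(100_000)          # samples from an arbitrary integrable law

# Jensen's inequality: phi(E[X]) <= E[phi(X)]
print(phi(x.mean()), "<=", phi(x).mean())

# supporting line at x0: (D+ phi)(x0) (y - x0) + phi(x0) <= phi(y)
x0, h = 1.0, 1e-7
d_plus = (phi(x0 + h) - phi(x0)) / h      # numerical right-derivative
grid = np.linspace(-3.0, 3.0, 601)
assert np.all(d_plus * (grid - x0) + phi(x0) <= phi(grid) + 1e-9)
print("supporting-line bound holds on the grid")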
Further Readings.
A. [Varadhan, Chapter 1] presents a review on measure theory.
B. [Breiman, Chapter 2] examines the properties of the distribution functions
of random vectors and presents Kolmogorov’s extension theorem. Be aware
that Breiman defined the distribution function as FX (x) = P [X < x], while
we adopt in these lectures the convention that FX (x) = P [X ≤ x].
C. [Durrett, Sections 1.1 – 1.3] has many examples.
Recommended exercises.
a. In Section 3.1 of [Chung], prove Theorems 1 to 6 (that is Theorems 3.1.1
to 3.1.6).
b. Section 3.1 of [Chung], exercises 3, 4, 5, 11.
c. In Section 3.2 of [Chung], prove Theorems 2 and 3.
d. Section 3.2 of [Chung], exercises 2, 5, 6, 7, 11, 12, 13, 14, 16, 19
Suggested exercises.
a. Section 3.1 of [Chung], exercises 6, 10
b. Section 3.2 of [Chung], exercises 1, 4, 8, 10, 15, 17, 18

Lecture 2: Independence

Summary. This lecture is based on Section 3.3 of [Chung].

Content.
0:00 Definition of independent random variables
4:47 Subfamilies of independent random variables are independent
8:12 Definition of distribution function and probability measure of a random
vector
12:43 Lemma. A finite set of random variables is independent if and only if the
distribution function of the random vector is equal to the product of the
distribution functions.
18:15 Lemma. A finite set of random variables is independent if and only if the
probability measure of the random vector is equal to the product of the
probability measures.
19:12 Theorem. Let X1 , . . . , XN be independent random variables and f1 , . . . , fN ,
fj : R → R, measurable functions. Then, f1 (X1 ), . . . , fN (XN ) are indepen-
dent random variables.
22:52 Theorem. Let X_1, . . . , X_N be independent random variables, n_0 = 0, 1 ≤ n_1 < n_2 < · · · < n_p = N and f_1, . . . , f_p, f_j : R^{n_j − n_{j−1}} → R, measurable functions. Then, f_1(X_1, . . . , X_{n_1}), . . . , f_p(X_{n_{p−1}+1}, . . . , X_{n_p}) are independent random variables.
26:19 Theorem. Let X, Y be independent random variables such that E[ |X| ] < ∞, E[ |Y| ] < ∞. Then, E[ X Y ] = E[ X ] E[ Y ].
43:17 ∫_Ω f(X) dP = ∫_R f(x) µ_X(dx).
45:53 Second proof of the identity E[ X Y ] = E[ X ] E[ Y ].
52:03 Construction of a product measure on an infinite product space.

Comments and References.


17:40 Lemma. Assume that n = 2. Fix x_1. Denote by M the class of sets B which satisfy the identity
P[ X_1 ≤ x_1 , X_2 ∈ B ] = P[ X_1 ≤ x_1 ] P[ X_2 ∈ B ] .
Show that M is a monotone class and contains the algebra generated by the intervals (−∞, a]. Apply the monotone class theorem, Theorem 1.5 of [Taylor, Section 1.5], to conclude that the previous identity holds for all Borel sets B. Fix a Borel set B_0. Denote by M the class of sets B which satisfy the identity
P[ X_1 ∈ B , X_2 ∈ B_0 ] = P[ X_1 ∈ B ] P[ X_2 ∈ B_0 ] .
Show that M is a monotone class and contains the algebra generated by the intervals (−∞, a]. Apply the monotone class theorem, Theorem 1.5 of [Taylor, Section 1.5], to conclude that the previous identity holds for all Borel sets B.
28:00 Theorem. The construction of the integral is presented from Section 5.1 to
5.3 of [Taylor]. Fubini’s and Tonelli’s theorems can be found in Section 6.3
of [Taylor].

52:03 Details of the construction can be found in Lecture 17 of the course on


measure theory and in Section 6.6 of [Taylor].
Further readings.
A. [Taylor] for all results on measure theory used in this lecture.
B. [Chung, Section 3.3] provides many details skipped in the lecture and fur-
ther examples.
C. [Breiman, Section 3.1] presents independence from a slightly different point
of view.
D. [Durrett, Section 1.4] gives many examples.
Suggested exercises.
a. [Chung, Section 3.3], exercises 4, 8, 9, 10, 14, 15.
b. [Varadhan], exercise 23 of Chapter 1 and exercise 4 of chapter 3
c. Give an example of three random variables X, Y and Z defined on the same probability space and such that X and Y are independent, Y and Z are independent, X and Z are independent, but X, Y and Z are not independent.
d. [Breiman, Section 3.1], problems 1 and 2.
e. [Durrett, Section 1.4] exercises 2, 4, 5, 6, 12, 13, 16, 17, 19.

Lecture 3: Applications of Independence

Summary. The first two applications of this lecture can be found in Section 1.5
of [Durrett], the last one in Chapter 3 of [Varadhan-LD].
Content and Comments.
0:00 Weak law of large numbers.
8:06 Convergence in Lp implies convergence in probability.
14:25 Bernstein polynomials approximate uniformly continuous functions.
33:57 An upper bound for large deviations. This result is known as Cramer’s
theorem.
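As a numerical companion to the 14:25 item above (a sketch, not part of the notes; the target function f below is an arbitrary choice), the Bernstein polynomial B_n f(x) = Σ_{k=0}^{n} f(k/n) C(n,k) x^k (1−x)^{n−k} = E[f(S_n/n)], with S_n a Binomial(n, x) random variable, can be evaluated and compared with f:

import numpy as np
from scipy.stats import binom

def bernstein(f, n, x):
    """B_n f(x) = E[f(S_n/n)] with S_n ~ Binomial(n, x)."""
    k = np.arange(n + 1)
    return binom.pmf(k, n, x) @ f(k / n)

f = lambda t: np.abs(t - 0.3)             # continuous on [0, 1], not smooth
xs = np.linspace(0.0, 1.0, 201)
for n in (10, 100, 1000):
    err = max(abs(bernstein(f, n, x) - f(x)) for x in xs)
    print(n, err)                         # the sup-distance decreases with n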
Further readings.
A. Interesting examples (coupon collector, random permutation, occupancy
problems, St. Petersburg paradox) can be found in Section 1.5 of [Durrett].
B. More details on large deviations of i.i.d. random variables can be found in
[Varadhan-LD, Deuschel-Stroock, Dembo-Zeitouni].
Suggested exercises.
a. Exercises in Section 1.5 of [Durrett].

Lecture 4: Convergence of random variables

Summary. This lecture is based on [Chung, Section 4.1].


Content and Comments.
0:00 Definition of almost sure convergence
2:09 A necessary and sufficient condition for almost sure convergence. [Chung,
Theorem 4.1.1]
12:24 Definition of convergence in probability
13:19 Almost sure convergence implies convergence in probability. [Chung, The-
orem 4.1.2]
15:18 A sequence which converges in probability admits a subsequence which
converges almost surely. [Chung, Theorem 4.2.3]
25:17 Definition of convergence in Lp
26:03 Convergence in Lp implies convergence in probability. [Chung, Theorem
4.1.4]
27:55 A sequence dominated by a random variable in Lp and which converges in
probability also converges in Lp. [Chung, Theorem 4.1.4]
35:07 The Corollary of [Taylor, Theorem 5.6] is applied here.
37:33 An example of a sequence which converges almost surely and does not
converge in Lp . Note that this sequence converges almost surely to 0 and
not only in probability. See Example 2 of [Chung, Section 4.1].
42:11 An example of a sequence which converges in Lp and does not converge
almost surely. See Example 1 of [Chung, Section 4.1]
Further readings.
A. [Breiman, Section 2.8] for the definition of Cauchy sequences and their
properties.
Recommended exercises.
a. [Chung, Section 4.1], exercises 4, 7, 8, 9, 10, 15, 18, 20
b. [Breiman, Section 2.8], problems 12, 13 and 14. Problem 14 asks to prove
a result similar to the one used in the lecture at time 0:00.
Suggested exercises.
a. [Chung, Section 4.1], exercises 1, 3, 5, 6, 11, 12, 19

Lecture 5: Borel-Cantelli lemma

Summary. This lecture is based on [Chung, Section 4.2] and [Durrett, Section
1.6].
Content and Comments.
0:00 Definition of lim sup_n E_n, lim inf_n E_n
1:50 P[ lim sup_n E_n ] = lim_n P[ ∪_{m≥n} E_m ]
5:14 lim sup_n E_n = { E_n i. o. } := { ω : ω ∈ E_n i. o. }
15:07 [Chung, Theorem 4.2.1]. Σ_{n≥1} P[E_n] < ∞ ⇒ P[ E_n i. o. ] = 0.
17:28 Application. [Durrett, Theorem 1.6.5] X_n i.i.d., E[X_1^4] < ∞ ⇒ (X_1 + · · · + X_n)/n → E[X_1] a. e.
29:46 [Chung, Theorem 4.2.4]. (E_n : n ≥ 1), independent, Σ_{n≥1} P[E_n] = ∞ ⇒ P[ E_n i. o. ] = 1.
35:21 Remark: (E_n : n ≥ 1), independent. Then, P[ E_n i. o. ] = 1 or 0. Moreover, P[ E_n i. o. ] = 1 if and only if Σ_{n≥1} P[E_n] = ∞.
37:05 Application. [Durrett, Theorem 1.6.7]. X_n i.i.d., E[ |X_1| ] = ∞ ⇒ P[ |X_n| ≥ n i. o. ] = 1.
38:57 We are using here [Chung, Theorem 3.2.1] (cf. Lecture 1, time 0:00)
39:39 In particular, P[ lim_n (X_1 + · · · + X_n)/n exists and belongs to R ] = 0.
40:36 To prove that the set {ω ∈ Ω : lim_n (X_1 + · · · + X_n)/n exists and belongs to R} is an element of the σ-algebra F, recall that this set corresponds to the set {ω ∈ Ω : (X_1 + · · · + X_n)/n is a Cauchy sequence}.
50:45 Thus, the hypothesis E[ |X_1| ] < ∞ is needed for a strong law of large numbers for i.i.d. random variables.
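The following simulation sketch (not part of the notes; standard Cauchy variables serve as an example with E[|X_1|] = ∞) illustrates the 37:05 and 39:39 items: events {|X_n| ≥ n} keep occurring, and the sample averages do not settle down.

import numpy as np

rng = np.random.default_rng(1)
n = 10**6
x = rng.standard_cauchy(n)                    # E[|X_1|] = infinity
idx = np.arange(1, n + 1)

hits = np.flatnonzero(np.abs(x) >= idx) + 1   # times n at which |X_n| >= n
print("times with |X_n| >= n:", hits[-5:])    # by Borel-Cantelli II, such times occur i.o. a.s.

running_mean = np.cumsum(x) / idx
print("S_n/n at n = 10^2, 10^4, 10^6:",
      running_mean[99], running_mean[9_999], running_mean[-1])
# as stated in the lecture, lim sup |S_n|/n = +infinity a.s., so no law of large numbers holds here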
Further readings.
A. [Chung, Theorem 4.2.5]. Only pairwise independence is needed in [Chung,
Theorem 4.2.4]
B. [Durrett, Examples 1.6.2 and 1.6.3] [If R_n represents the number of records up to time n, R_n / log n → 1 a. e.] and [if L_n represents the size of the largest sequence of consecutive 1’s in a Bernoulli sequence, L_n / log_2 n → 1 a. e.]
C. [Breiman, Propositions 3.16-3.18] investigates the number of returns to the
origin in a coin-tossing problem.
Recommended exercises.
a. [Chung, Section 4.2] 2, 5, 6, 7, 10, 12, 14, 16,
b. [Durrett, Section 1.6] Exercises 2 – 8, 10
c. [Breiman, Section 3.3] Problems 6 [we presented a proof of this result earlier.
Use Borel-Cantelli to derive a second proof], 7, 9, 10
Suggested exercises.
a. [Chung, Section 4.2] 1, 3, 4, 8, 9, 11, 13, 15, 18, 19, 20
b. [Durrett, Section 1.6] Exercises 12, 14, 17, 18

Lecture 6: Weak convergence: Helly’s selection theorem and tightness

Summary. This lecture is based on [Breiman, Sections 8.1 and 8.2] and [Varadhan,
Section 2.3].
Content and Comments.
0:00 Definition of weak convergence and convergence in distribution.
8:55 The space of distributions N and the space of generalized distributions M.
13:38 [Breiman, Theorem 8.6] Helly’s selection theorem. This result corresponds to Steps 1, 2, 3 of [Varadhan, Theorem 2.4]
49:18 Examples where the limit is a generalized distribution and not a distribu-
tion.
51:54 Uniqueness of limit points yields convergence. [Breiman, Corollary 8.8]
1:00:18 Tightness of probability measures [Breiman, Definition 8.9]
1:05:52 A set of distribution functions {F_α : α ∈ I} is tight if and only if the following statement holds: [ F_{α_n} → G ⇒ G ∈ N ]. [Breiman, Proposition 8.10]
1:21:11 Let me clarify. By hypothesis, F_{α_n}(n) − F_{α_n}(−n) ≤ 1 − ε for all n ≥ 1. We introduced a subsequence α_n^{(1)}. This means that α_n^{(1)} = α_{p(n)}, where p(n) ≥ n and p(n + 1) > p(n). By definition of the sequence p(n) and the above inequality, F_{α_n^{(1)}}(p(n)) − F_{α_n^{(1)}}(−p(n)) = F_{α_{p(n)}}(p(n)) − F_{α_{p(n)}}(−p(n)) ≤ 1 − ε. Hence, instead of writing F_{α_n^{(1)}}(n) − F_{α_n^{(1)}}(−n), I should have written F_{α_{p(n)}}(p(n)) − F_{α_{p(n)}}(−p(n)). This is what I meant when saying that n is the one which corresponds to the subsequence α_n^{(1)}.
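A toy numerical illustration of the 49:18 and 1:00:18 items (a sketch, not in the notes; the shifted normal family is an arbitrary choice): for F_n(x) = Φ(x − n), mass escapes to +∞, every pointwise limit is the generalized distribution G ≡ 0, and the family is not tight.

import numpy as np
from scipy.stats import norm

def F(n, x):
    """Distribution function of Z + n, Z standard normal."""
    return norm.cdf(x - n)

for a in (10.0, 100.0):
    # mass kept inside [-a, a] by F_n vanishes as n grows: no tightness
    print(a, [F(n, a) - F(n, -a) for n in (1, 10, 100, 1000)])

print([F(n, 5.0) for n in (1, 10, 100)])      # F_n(x) -> 0 for every fixed x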
Further readings.
A. [Chung, Sections 4.3 and 4.4] provides examples and an alternative view on
convergence in distribution.
B. [Durrett, Section 2.2] presents many examples of sequences of random vari-
ables which converge in distribution.
Recommended exercises.
a. Prove [Chung, Theorems 4.3.1 and 4.3.2]
b. [Breiman, Chapter 8] Problems 1, 2, 4, 5, 6, 7, 9, 10.
c. [Durrett, Section 2.2] Exercise 6
Suggested exercises.
a. [Breiman, Chapter 8] Problems 3, 8, 11.
b. [Chung, Section 4.3] Exercises 3, 8
c. [Durrett, Section 2.2] Exercises 2, 3, 7

Lecture 7: Weak convergence: Helly-Bray’s theorem

Summary. This lecture is based on [Breiman, Section 8.3], [Chung, Section 4.4]
and [Varadhan, Section 2.3].
Content and Comments.
0:00 Helly-Bray’s theorem. [Breiman, Proposition 8.12] presents a stronger version of the first part of the theorem; see also [Varadhan, Theorem 2.3].
32:26 Convergence in probability implies convergence in distribution. [Chung,
Theorem 4.4.5].
41:18 Convergence in distribution to a Dirac mass implies convergence in proba-
bility. [Chung, Section 4.4, Exercise 4].
46:15 If X_n → X and Y_n → y ∈ R in distribution, then X_n + Y_n → X + y and X_n Y_n → X y in distribution. [Chung, Theorem 4.4.6].
Further readings.
A. [Breiman, Section 8.3] presents a stronger version of Helly-Bray’s theorem [continuity is replaced by the hypothesis that the set of discontinuity points has measure 0 for the limiting probability measure].
B. [Chung, Theorem 4.4.1] states that to prove convergence in distribution it
is enough to show that E[f (Xn )] → E[f (X)] for all continuous functions f
with compact support. The corollary of [Chung, Theorem 4.4.6] provides
necessary and sufficient conditions for convergence in distribution in terms
of open and closed sets.
Recommended exercises.
*a. Let f : R → R be a bounded function which is continuous everywhere, except at a finite number of points, represented by D = {x_1, . . . , x_p}. Let µ_n be a sequence of probability measures which converges weakly to µ. If µ(D) = 0, then ∫ f dµ_n → ∫ f dµ.
b. [Breiman, Section 8.3], problems 13, 14
c. [Chung, Section 4.4], exercises 1, 4, 6, 7, 9, 11
d. [Varadhan, Chapter 2], exercises 9, 10, 11
e. Prove [Breiman, Propositions 8.12, 8.15, 8.17, 8.19]
Remark 0.1. The assertion of the recommended exercise [a] will be used several
times in the next lectures.

Suggested exercises.
a. [Chung, Section 4.4], exercises 2, 3, 8, 10, 12

Lecture 8: Characteristic functions

Summary. This lecture is based on [Breiman, Section 8.7]


Content and Comments.
0:00 Definition of a characteristic function.
2:58 Elementary properties of characteristic functions. [Breiman, Proposition
8.27]. We also prove that the characteristic function is positive-definite.
13:25 Let µ and ν be two probability measures on R such that ϕµ = ϕν . Then,
µ = ν. [Breiman, Theorem 8.24]. The theorem on approximation by
trigonometric polynomials is [Rudin, Theorem 8.15].
44:28 There exists 0 < K < ∞ such that for all a > 0 and probability measures µ on R,
µ( [ −1/a , 1/a ]^c ) ≤ (K/a) ∫_0^a [ 1 − Re ϕ_µ(t) ] dt .
This is [Breiman, Proposition 8.29]. sin(t)/t < 1 because sin(t) = ∫_0^t cos(r) dr.
56:47 Let (Xn : n ≥ 1) be a sequence of random variables. Denote by ϕn (t) the
associated characteristic functions. Assume that there exists δ > 0 and
ϕ : [−δ, δ] → C such that ϕn (t) → ϕ(t) for all |t| ≤ δ. Assume that ϕ
is continuous at 0. Then, the sequence is tight. This is part of [Breiman,
Theorem 8.28].
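A quick numerical companion to the definition at 0:00 (a sketch, not in the notes; the standard normal is an arbitrary choice): the characteristic function ϕ(t) = E[e^{itX}] can be approximated by an empirical average and compared with the closed form e^{−t²/2}.

import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(200_000)

t = np.linspace(-3.0, 3.0, 13)
phi_emp = np.exp(1j * np.outer(t, x)).mean(axis=1)   # empirical E[exp(itX)]
phi_exact = np.exp(-t**2 / 2)                        # characteristic function of N(0,1)
print(np.max(np.abs(phi_emp - phi_exact)))           # small Monte Carlo error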
Further readings.
A. [Varadhan, Section 2.1] has many interesting comments and examples. It
provides the inversion formula: a formula for the distribution function in
terms of the characteristic function. This is an alternative way to prove
that the characteristic function identifies the distribution measure.
B. [Chung, Sections 6.1 and 6.2] provide further examples of characteristic
functions.
Recommended exercises.
a. [Varadhan, Section 2.1], exercises 2, 3, 4, 5.
b. [Chung, Section 6.1]. Prove properties (iii), (iv) and (v) of characteristic
functions, the corollary of Theorem 6.1.4 and Theorem 6.1.5.
c. [Chung, Sections 6.1], exercise 16.
Suggested exercises.
a. [Varadhan, Section 2.1], exercise 1.
b. [Chung, Section 6.1], exercise 11, 12.
c. [Chung, Section 6.2], exercises 3, 7, 9.

Lecture 9: The Lévy continuity theorem

Summary. This lecture is based on [Breiman, Sections 8.6, 8.7, 8.9 and 8.11] and
on [Varadhan, Section 3.6].
Content and Comments.
0:00 Lévy continuity theorem. [Breiman, Theorem 8.28].
7:22 Expansion of the characteristic function. [Breiman, Proposition 8.44].
27:30 The characteristic function of the sum of two independent random variables.
[Breiman, Proposition 8.33].
30:32 Application: The central limit theorem for i.i.d. random variables with
finite second moments. This is [Varadhan, Theorem 3.17]. See [Breiman,
Theorem 8.20] for another proof of this result. We use here the expansion
of log(1 + z) for z ∈ C. See [Breiman, Proposition 8.46].
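As a simulation sketch of the 30:32 item (not part of the notes; uniform summands are an arbitrary choice of i.i.d. variables with finite second moment), the distribution function of the standardized sums is close to the standard normal one:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, reps = 500, 10_000
u = rng.uniform(size=(reps, n))               # i.i.d. uniforms: mean 1/2, variance 1/12
z = (u.sum(axis=1) - n / 2) / np.sqrt(n / 12) # standardized sums

for x in (-1.0, 0.0, 1.5):
    print(x, (z <= x).mean(), norm.cdf(x))    # empirical CDF of z vs N(0,1)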
Further readings.
A. Read [Breiman, Section 8.12] and [Varadhan, Section 2.2]. The starting question is: do there exist two distinct distributions with the same moments, or does the convergence of moments entail convergence in distribution?
B. Read [Breiman, Section 8.13]. It is proved there that the Laplace transforms characterize the distribution of positive random variables.
Recommended exercises.
*a. Let (Xn : n ≥ 1) be a sequence of random variables which converges in
distribution to X. Denote by ϕn (t), ϕ(t) the characteristic functions of
Xn , X, respectively. Show that ϕn converges to ϕ uniformly on bounded
intervals. This is [Breiman, Proposition 8.31] and will be used many times
below.
b. Prove [Breiman, Propositions 8.30, 8.33, 8.37]
c. [Breiman, Chapter 8], exercises 16, 17 (see [Varadhan, Theorem 2.7]), 21
d. Prove [Varadhan, Theorem 2.6]
e. [Chung, Section 6.4], exercises 4, 7, 11, 24
Suggested exercises.
a. [Chung, Section 6.3], exercises 6, 8.
b. Prove [Chung, Theorem 6.4.6]
c. [Chung, Section 6.4], exercise 6.

Lecture 10: Weak law of large numbers

Summary. This lecture is based on [Varadhan, Sections 3.2 – 3.4].


Content and Comments.
0:00 Weak law of large numbers, first proof based on truncation. [Varadhan,
Theorem 3.3].
16:40 Weak law of large numbers, second proof based on characteristic functions.
[Varadhan, Theorem 3.3]. See [Chung, Lemma of Section 6.4] for the proof
that (1 + z_n/n)^n → e^z if z_n → z ∈ C.
24:32 Comments on the strong law of large numbers.
25:47 Kolmogorov inequality. [Varadhan, Lemma 3.7].
44:47 Lévy inequality. [Varadhan, Lemma 3.8].
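A Monte Carlo check of the Kolmogorov inequality at 25:47 (a sketch, not in the notes; Rademacher steps are an arbitrary choice of mean-zero, variance-one summands):

import numpy as np

rng = np.random.default_rng(4)
n, reps, lam = 400, 20_000, 40.0
steps = rng.choice([-1.0, 1.0], size=(reps, n))    # independent, mean zero, variance 1
s = steps.cumsum(axis=1)                           # partial sums S_1, ..., S_n

lhs = (np.abs(s).max(axis=1) >= lam).mean()        # P[ max_k |S_k| >= lam ]
rhs = n / lam**2                                   # Var(S_n) / lam^2
print(lhs, "<=", rhs)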
Further readings.
A. [Chung, Sections 5.1 – 5.3] covers substantially the same material. It provides a proof of the strong law of large numbers under the hypothesis of a finite second moment.
B. Read the example of [Chung, Section 5.2]
Recommended exercises.
a. [Varadhan, Chapter 3], exercises 5, 6, 7.
b. Prove [Chung, Theorems 5.1.1 – 5.1.3].
c. [Chung, Section 5.1], exercises 1, 2, 8, 9.
d. Prove [Chung, Theorems 5.2.1 – 5.2.3].
e. [Chung, Section 5.2], exercises 2, 5, 6, 9, 10, 13

Lecture 11: Convergence of series

Summary. This lecture is based on [Varadhan, Section 3.4].


Content and Comments.
0:00 Theorem: Let (X_j : j ≥ 1) be a sequence of independent random variables and S_n = Σ_{1≤j≤n} X_j. Then S_n converges in distribution if and only if it converges in probability if and only if it converges a.s. [Varadhan, Theorem 3.9].
2:22 Lemma 1: A Cauchy sequence in probability converges in probability. This
is exercise 3.11 of [Varadhan, Section 3.4].
18:39 Lemma 2: For all ε > 0, lim_{m,n→∞} P[ max_{m<k≤n} |X_k − X_m| > ε ] = 0 ⇒ ∃ X s.t. X_n → X a. s. This is exercise 3.12 of [Varadhan, Section 3.4].
40:56 Proof of the theorem, divided in several claims.
46:18 Claim 1: ϕ_{S_n − S_m}(t) → 1 for all |t| ≤ t_0.
52:18 Claim 2: ϕ_{S_n − S_m}(t) → 1 for all t ∈ R. To prove the exercise, show that 1 − cos(2t) ≤ 4 [ 1 − cos(t) ] for all t ∈ R.
57:12 Claim 3: Sn − Sm → 0 in probability.
1:03:09 Claim 4: There exists a r.v. S such that Sn → S in probability.
1:04:25 Claim 5: FS = F .
1:06:54 Sn converges a.s. to S.
1:12:38 Actually, k = m + 1 and runs from p + 1 to q. This is corrected at time [1:14:10].
1:14:54 Some steps have been skipped. Here is a complete argument. To apply Lévy’s inequality, exactly as stated in the previous lecture, set Y_i = X_{p+i}, N = q − p, M = k − p. Note that M varies from 1 to N and that the bound can be written as
P[ | Y_M + · · · + Y_N | > ε/2 ] ≤ δ for 1 ≤ M ≤ N .
Lévy’s inequality yields that
P[ max_{1≤M≤N} | Y_1 + · · · + Y_M | > ε ] ≤ δ/(1 − δ) .
Rewriting this in terms of the variables X_i yields that
P[ max_{1≤M≤N} | X_{p+1} + · · · + X_{p+M} | > ε ] ≤ δ/(1 − δ) .
That is,
P[ max_{p<k≤q} | S_k − S_p | > ε ] = P[ max_{1≤M≤N} | S_{p+M} − S_p | > ε ] ≤ δ/(1 − δ) .
Recommended exercises.
a. Prove [Chung, Theorems 5.3.2].
b. [Chung, Section 5.3], exercises 1, 2, 3, 6.
c. [Durrett, Section 1.8], exercises 9, 11.

Lecture 12: Kolmogorov’s three series theorem

Summary. This lecture is based on [Varadhan, Section 3.4].


Content and Comments.
0:00 Kolmogorov’s one series theorem, first proof. [Varadhan, Theorem 3.10]
4:50 One series theorem, second proof.
11:12 Kolmogorov’s two series theorem, [Varadhan, Theorem 3.11]
15:48 Kolmogorov’s three series theorem, direct statement [Varadhan, Theorem
3.12]
28:01 Application: the convergence of the random series Σ_{1≤j≤n} Z_j/j^θ, where Z_j is a sequence of i.i.d. random variables such that P[Z_1 = 1] = 1/2 = P[Z_1 = −1].
34:10 Kolmogorov’s three series theorem, the converse statement [Varadhan, The-
orem 3.12]
36:28 Lemma: (Y_j)_{j≥1} sequence of independent variables such that |Y_j| ≤ C, E[Y_j] = 0, and Σ_{1≤j≤n} Y_j converges a.s. Then, Σ_{j≥1} Var[Y_j] < ∞.
1:11:00 Kolmogorov’s three series theorem, proof of the converse statement.
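A simulation sketch of the 28:01 application (not part of the notes): by the two/three series theorems the series Σ_j Z_j/j^θ converges a.s. exactly when Σ_j j^{−2θ} < ∞, that is, θ > 1/2; the partial sums below stabilize for θ = 0.75 but not for θ = 0.25.

import numpy as np

rng = np.random.default_rng(5)
n = 10**6
z = rng.choice([-1.0, 1.0], size=n)        # P[Z_j = 1] = 1/2 = P[Z_j = -1]
j = np.arange(1, n + 1)

for theta in (0.75, 0.25):
    s = np.cumsum(z / j**theta)            # partial sums of the random series
    print(theta, s[10**3 - 1], s[10**5 - 1], s[10**6 - 1])
# for theta = 0.75 the three printed values are close; for theta = 0.25 they keep drifting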
Recommended exercises.
a. [Breiman, Chapter 3], problem 11.
b. [Durrett, Section 1.8], exercise 1.
c. [Durrett, Section 1.8], solve example 3.

Lecture 13: The strong law of large numbers

Summary. This lecture is based on [Varadhan, Section 3.5].


Content and Comments.
0:00 Kronecker’s lemma. [Chung, Section 5.4].
12:27 The strong law of large numbers for mean-zero random variables. [Varadhan,
Theorem 3.14].
32:21 Exercise: Let (X_j : j ≥ 1) be a sequence of i.i.d. random variables such that E[X_1^+] = ∞, E[X_1^−] < ∞. Then, lim_N S_N/N = +∞ almost surely.
33:18 Let (X_j : j ≥ 1) be a sequence of i.i.d. random variables such that E[ |X_1| ] = ∞. Then, lim sup_N |S_N|/N = +∞ almost surely. [Chung, Theorem 5.4.2].
42:00 Kolmogorov’s 0-1 law. [Varadhan, Theorem 3.15].
Further readings.
A. [Chung, Section 5.4] states more general results on the almost sure conver-
gence of series
B. [Durrett, Sections 1.7 and 1.8] also states more general results and presents
instructive examples.
Recommended exercises.
*a. Let (Xj : j ≥ 1) be a sequence of i.i.d. random variables such that
E[ |X1 | ] < ∞. Prove that (X1 + · · · + XN )/N → E[X1 ] almost surely.
*b. Prove [Varadhan, Corollary 3.16] and solve exercises 3.16 and 3.15.
*c. Fill the gaps left in the proof of Kolmogorov’s 0-1 law.
d. Prove [Chung, Theorem 5.4.1] and its corollary.
e. Prove [Chung, Theorem 5.4.3] and its corollary.
f. [Chung, Section 5.4], exercises 1, 5, 7, 9.
g. Fill the details of Examples 1.8.1 – 1.8.3 in [Durrett]
h. Prove [Durrett, Theorem 1.8.7 and 1.8.8]
i. [Durrett, Section 1.8], exercises 4, 5, 6, 9, 10, 11, 12.
Suggested exercises.
a. Prove [Durrett, Theorem 1.7.3]
b. [Durrett, Section 1.7], exercises 1, 2, 3, 4.
c. Prove [Durrett, Theorem 1.7.4]
d. Fill the details of [Durrett, Example 1.7.3]
e. [Durrett, Section 1.8], exercises 1, 2, 3, 7, 8.

Lecture 14: Law of large numbers II.

Summary. This lecture is based on [Chung, Sections 5.4 and 5.5].


Content and Comments.
0:00 Let (X_j : j ≥ 1) be mean zero, independent random variables. Let (a_n : n ≥ 1) be an increasing sequence of real numbers diverging to +∞. Let ϕ : R → R be an even, non-negative function. Assume that ψ_1 : (0, ∞) → R, defined by ψ_1(x) = ϕ(x)/x, is increasing and ψ_2 : (0, ∞) → R, defined by ψ_2(x) = ϕ(x)/x², is decreasing. Then, Σ_{1≤j≤n} (X_j/a_j) converges almost surely if Σ_{j≥1} E[ϕ(X_j)/ϕ(a_j)] < ∞. This is [Chung, Theorem 5.4.1].
24:55 Corollary: Under the hypotheses of the previous theorem, (X1 + · · · +
XN )/aN → 0 almost surely. Kronecker lemma was stated at the beginning
of the previous lecture.
26:20 Example 1: ϕ(x) = |x|^p for 1 ≤ p ≤ 2.
29:29 Example 2: Let (X_j : j ≥ 1) be mean zero, independent random variables. Assume that there exist 1 < p ≤ 2 and M < ∞ such that E[ |X_j|^p ] ≤ M for all j ≥ 1. Then, (X_1 + · · · + X_N)/N → 0 almost surely. Note that if there exist q > 2 and A < ∞ such that E[ |X_j|^q ] ≤ A for all j ≥ 1, then, by Hölder’s inequality, E[ |X_j|^2 ] ≤ A^{2/q} for all j ≥ 1. In particular, the thesis holds if there exist p > 1 and M < ∞ such that E[ |X_j|^p ] ≤ M for all j ≥ 1.
32:55 Let (X_j : j ≥ 1) be mean zero, i.i.d. random variables. Assume that σ² = E[X_1²] < ∞. Then, for all ε > 0, (X_1 + · · · + X_N)/(√N (log N)^{(1/2)+ε}) → 0 almost surely.
38:57 Let F be a distribution function and (Xj : j ≥ 1) be a sequence of i.i.d.
random variables whose distribution function is F . Then, the empirical
distribution function converges, uniformly in R, to the distribution function
almost surely. This is [Chung, Theorem 5.5.1].
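A simulation sketch of the 38:57 item (not part of the notes; the exponential law is an arbitrary choice): the sup-distance between the empirical distribution function and F goes to 0 as the sample grows.

import numpy as np

rng = np.random.default_rng(6)
F = lambda x: 1.0 - np.exp(-x)                   # Exp(1) distribution function

for n in (10**2, 10**4, 10**6):
    x = np.sort(rng.exponential(size=n))
    right = np.arange(1, n + 1) / n              # F_n evaluated at the order statistics
    left = np.arange(0, n) / n                   # left limits of F_n there
    sup_dist = max(np.abs(right - F(x)).max(), np.abs(left - F(x)).max())
    print(n, sup_dist)                           # decreases to 0 (Glivenko-Cantelli)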
Recommended exercises.
a. [Chung, Section 5.4], exercises 2, 3, 4, 6, 10, 12, 13.
b. [Chung, Section 5.5], exercises 1, 2, 3.

Lecture 15: Applications of the Law of large numbers.

Summary. This lecture is based on [Chung, Section 5.5] and [Durrett-4th, Section 2.4].
Content and Comments.
0:00 Shannon’s entropy. This is [Durrett-4th, Example 2.4.3]
7:41 Renewal process. I followed [Chung, Section 5.5]. This is also [Durrett-4th, Example 2.4.1].
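A numerical companion to the 0:00 item (a sketch, not in the notes; a biased coin is an arbitrary choice): by the strong law of large numbers, −(1/n) log p(X_1, …, X_n) converges a.s. to the entropy H = −Σ_x p(x) log p(x).

import numpy as np

rng = np.random.default_rng(7)
p = 0.3                                          # P[X_i = 1] = p, P[X_i = 0] = 1 - p
H = -(p * np.log(p) + (1 - p) * np.log(1 - p))   # Shannon entropy, in nats

for n in (10**2, 10**4, 10**6):
    x = rng.random(n) < p                        # i.i.d. Bernoulli(p) sample
    log_prob = np.where(x, np.log(p), np.log(1 - p)).sum()
    print(n, -log_prob / n, "->", H)             # -(1/n) log p(X_1,...,X_n) approaches H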

Lecture 16: Central Limit Theorem

Summary. This lecture is based on [Varadhan, Section 3.6].


Content and Comments.
0:00 Statement of Lindeberg’s theorem. [Varadhan, Theorem 3.18].
6:10 Claim 1: max_{1≤j≤n} (σ_j²/s_n²) → 0.
9:20 Claim 2: For all ε > 0, lim_{n→∞} Σ_{1≤j≤n} P[ |X_{n,j}| > ε ] = 0.
13:17 Claim 3: For all T > 0, sup_{|t|≤T} max_{1≤j≤n} | ϕ_{n,j}(t) − 1 | → 0.
20:39 Proof of Lindeberg’s theorem, Part 1:
lim_{n→∞} sup_{|t|≤T} Σ_{j=1}^{n} | ϕ_{n,j}(t) − 1 |² = 0 .
28:03 Part 1A: sup_{|t|≤T} max_{1≤j≤n} | ϕ_{n,j}(t) − 1 | → 0.
32:04 Part 1B: There exists a constant C_T such that sup_{|t|≤T} Σ_{1≤j≤n} | ϕ_{n,j}(t) − 1 | ≤ C_T.
33:38 Proof of Lindeberg’s theorem, Part 2:
lim_{n→∞} sup_{|t|≤T} | Σ_{j=1}^{n} ( ϕ_{n,j}(t) − 1 ) + t²/2 | = 0 .
42:41 Lyapounov’s condition implies Lindeberg’s.
46:32 The same proof yields the following result. For each n ≥ 1, let (X_{n,j} : 1 ≤ j ≤ k_n) be independent random variables. Assume that k_n → ∞, E[X_{n,j}] = 0, s_n² = Σ_{1≤j≤k_n} E[X_{n,j}²] = 1. If, for all ε > 0,
lim_{n→∞} Σ_{j=1}^{k_n} ∫_{|x|>ε} x² µ_{n,j}(dx) = 0 ,
then Σ_{1≤j≤k_n} X_{n,j} converges in distribution to a mean-zero Gaussian random variable with variance equal to 1.
Further readings.
A. [Chung, Sections 7.1 and 7.2].
B. [Breiman, Sections 9.1 – 9.3].
C. [Durrett-4th, Section 3.4].
D. In [Durrett-4th, Section 3.4], read example 8.
Recommended exercises.
a. For each n ≥ 1, let (X_{n,j} : 1 ≤ j ≤ k_n) be independent random variables. Assume that k_n → ∞, E[X_{n,j}] = 0, Σ_{1≤j≤k_n} E[X_{n,j}²] = 1. Assume, furthermore, that for all ε > 0,
lim_{n→∞} Σ_{j=1}^{k_n} ∫_{|x|>ε} x² µ_{n,j}(dx) = 0 .
Then, Σ_{1≤j≤k_n} X_{n,j} converges in distribution to a mean-zero Gaussian random variable with variance equal to 1.
b. Prove [Chung, Theorem 7.1.1].
c. Fill the gaps of all examples in [Durrett-4th, Section 3.4].

d. [Varadhan, Section 3.6], exercises 17, 19.


e. [Chung, Section 7.1], exercise 1.
f. [Durrett-4th, Section 3.4], exercises 5, 6, 7.
Suggested exercises.
a. [Chung, Section 7.1], exercises 2, 4.
b. [Durrett-4th, Section 3.4], exercises 2, 3, 4, 8.

Lecture 17: Central Limit Theorem, II.

Summary. This lecture is based on [Chung, Section 7.2].


Content and Comments.
0:00 Comments on sums of small independent random variables.
11:27 The converse of Lindeberg’s theorem. This is part of [Chung, Theorem
7.2.1].
12:14 Lemma:
lim_{n→∞} max_{1≤j≤n} P[ |X_{n,j}| > ε ] = 0 for all ε > 0
if and only if
lim_{n→∞} sup_{|t|≤T} max_{1≤j≤n} | ϕ_{n,j}(t) − 1 | = 0 for all T > 0.

24:00 Proof of the theorem, initial considerations.


30:12 Proof of the theorem, part 1: We have that
lim_{n→∞} Σ_{1≤j≤n} ( ϕ_{n,j}(t) − 1 ) = − t²/2 .

39:51 Proof of the theorem, part 2: conclusion.


Further readings.
A. [Chung, Section 7.2]
B. In [Durrett-4th, Section 3.4], read subsection 3.4.3.
Recommended exercises.
*a. [Chung, Section 7.2], exercise 3.
b. [Chung, Section 7.2], exercise 7, 10.
c. [Durrett-4th, Section 3.4], exercise 9
Suggested exercises.
a. [Chung, Section 7.2], exercise 5, 8, 9, 12.
b. [Durrett-4th, Section 3.4], exercises 10, 13

Lecture 18: Infinitely Divisible Laws.

Summary. This lecture is based on [Breiman, Sections 9.4 and 9.5]


Content and Comments.
0:00 Statement of the problem: Let (X_{n,j} : 1 ≤ j ≤ k_n), k_n → ∞, be an array of independent random variables. Assume that they are uniformly negligible: for all ε > 0,
lim_{n→∞} max_{1≤j≤k_n} P[ |X_{n,j}| > ε ] = 0 .
Let S_n = Σ_{1≤j≤k_n} X_{n,j}. 1. What are the possible limits (in distribution) of S_n? 2. Give necessary conditions on the sequence to guarantee the convergence.
2:41 Poisson convergence. Let (X_{n,j} : 1 ≤ j ≤ n) be an array of i.i.d. random variables such that P[X_{n,1} = 1] = p_n = 1 − P[X_{n,1} = 0]. Then, they are uniformly negligible if p_n → 0. Moreover, S_n = Σ_{1≤j≤k_n} X_{n,j} converges in distribution to a Poisson law if the sequence n p_n converges to some λ ∈ [0, ∞).
9:57 Proposition. S_n converges in distribution if and only if n p_n → λ ∈ [0, ∞). In this case the limit is the Poisson distribution with parameter λ. This is [Breiman, Theorem 9.4]. The index n is sometimes denoted by N.
25:50 The characteristic function of a Poisson distribution. The one of a Poisson distribution with jump size a > 0 (P[X = ak] = (λ^k/k!) e^{−λ}, k ≥ 0). The characteristic function of finite sums of independent Poisson distributions with different parameters and jump sizes.
32:10 Definition of infinitely divisible laws.
38:20 Proposition. Let (X_{n,j} : 1 ≤ j ≤ k_n) be an array of i.i.d. random variables. Suppose that k_n → ∞ and that S_n converges in distribution to S. Then, S has an infinitely divisible law. This is [Breiman, Proposition 9.9]. Note that the converse statement is trivial. That is, if S has an infinitely divisible law, then there exists an array of i.i.d. random variables (X_{n,j} : 1 ≤ j ≤ n) such that S_n = Σ_{1≤j≤n} X_{n,j} converges in distribution to S.
47:57 Examples of IDL: Dirac, Gaussian, Poisson and independent Poisson sums
of i.i.d. random variables.
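A simulation sketch of the 2:41 and 9:57 items (not part of the notes): with p_n = λ/n the row sums S_n of Bernoulli(p_n) variables are close in distribution to Poisson(λ).

import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(8)
lam, n, reps = 3.0, 1_000, 100_000
s = rng.binomial(n, lam / n, size=reps)          # row sums of n Bernoulli(lam/n) variables

for k in range(6):
    print(k, (s == k).mean(), poisson.pmf(k, lam))   # empirical law of S_n vs Poisson(lam)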
Further readings.
A. [Chung, Section 7.6]
B. [Durrett-4th, Sections 3.6 and 3.8]
Recommended exercises.
a. Prove [Durrett-4th, Theorem 3.6.1].
b. [Durrett-4th, Section 3.6], exercises 1, 2, 3, 4, 5, 6, 7.
c. Prove [Durrett-4th, Lemmata 3.6.2 – 3.6.4].
d. Prove [Durrett-4th, Theorems 3.6.6 and 3.6.7].
e. [Durrett-4th, Section 3.8], exercise 2.
f. Prove [Chung, Theorem 7.6.1]
g. [Chung, Section 7.6], exercises 1, 3.

Lecture 19: Accompanying laws.

Summary. This lecture is based on [Varadhan, Section 3.7].


Content and Comments.
0:00 Formulation of the one-dimensional central limit problem. Recall that kn →
∞.
2:01 The Poisson transformation of a law into an infinitely divisible law.
4:51 The uniform negligibility is equivalent to the convergence to 1 of the char-
acteristic functions, uniformly over bounded intervals.
6:00 Definition of the accompanying laws.
10:18 Statement of the theorem on accompanying laws. This is [Varadhan, The-
orem 3.19].
12:33 Lemma 1: The sequence a_{n,j} converges uniformly to 0:
lim_{n→∞} max_{1≤j≤k_n} | a_{n,j} | = 0 .
16:20 Corollary: The sequence X̃_{n,j} is uniformly negligible.
20:04 Lemma 2: There exists a finite constant C_0 such that
| ã_{n,j} | ≤ C_0 P[ | X̃_{n,j} | ≥ 1/2 ] .
34:41 Recollection of the statement of the theorem and of the results proved so far.
36:52 Lemma 3: Let B_n = Σ_j a_{n,j}. Then, Σ_j X_{n,j} − A_n converges in distribution to S if and only if Σ_j X̃_{n,j} + B_n − A_n converges in distribution to S. A similar statement holds with Y in place of X. At time 38:17, I write B_n + A_n instead of B_n − A_n.
Conclusion: It is enough to prove the theorem with X_{n,j}, Y_{n,j} replaced by X̃_{n,j}, Ỹ_{n,j}, respectively.

Further readings.
A. A different approach to the one-dimensional central limit problem is pre-
sented in [Breiman, Section 9.5 – 9.7].

Lecture 20A: Proof of the Accompanying Laws theorem, Part 1.

Summary. This lecture is based on [Varadhan, Section 3.7].


Content and Comments.
0:00 Statement of the first main proposition. This is [Varadhan, Theorem 3.19] with X_{n,j}, Y_{n,j} replaced by X̃_{n,j}, Ỹ_{n,j}, respectively.
1:44 Step 0: It is enough to prove that for all sequences (A_n : n ≥ 1), the difference ϕ_{Σ_j X̃_{n,j} − A_n}(t) − ϕ_{Σ_j Ỹ_{n,j} − A_n}(t) → 0 for all t ∈ R.
3:59 Step 1: It is enough to prove that for all T > 0, there exists a finite constant C_T such that Σ_j | ϕ_{n,j}(t) − 1 | ≤ C_T for all |t| ≤ T.
14:14 Step 2: It is enough to prove that
(a) There exists a finite constant C_0 such that Σ_j P[ | X̃_{n,j} | ≥ 1 ] ≤ C_0 for all n ≥ 1,
(b) There exists a finite constant C_0 such that Σ_j E[ X̃_{n,j}² χ_{|X̃_{n,j}|≤1} ] ≤ C_0 for all n ≥ 1.
27:25 Proposition: Assume that Σ_j Ỹ_{n,j} − A_n converges in distribution to some law. Then the previous estimates (a) and (b) hold.
28:08 Step 1: Under the hypothesis of the proposition, for all T > 0, there exists a finite constant C_T such that
Σ_{j=1}^{k_n} E[ 1 − cos(t X̃_{n,j}) ] ≤ C_T  (0.1)
for all |t| ≤ T and n ≥ 1. Note that the estimate
Σ_{j=1}^{k_n} E[ 1 − cos(t X̃_{n,j}) ] ≤ C_0
obtained at time [37:00] holds for all |t| ≤ t_0 and all n large, say n ≥ n_0. By changing the value of the constant C_0 this inequality can be extended to 1 ≤ n ≤ n_0. This explains why (0.1) is in force for all n ≥ 1.
38:31 Step 2: Condition (0.1) implies (a) and (b).
48:34 Actually, we proved the following result. Let (Z_{n,j} : 1 ≤ j ≤ k_n) be an array of independent random variables such that k_n → ∞. Assume that for all T > 0, there exists a finite constant C_T such that
Σ_{j=1}^{k_n} E[ 1 − cos(t Z_{n,j}) ] ≤ C_T  (0.2)
for all |t| ≤ T and n ≥ 1. Then,
(a) There exists a finite constant C_0 such that Σ_j P[ | Z_{n,j} | ≥ 1 ] ≤ C_0 for all n ≥ 1, and
(b) There exists a finite constant C_0 such that Σ_j E[ Z_{n,j}² χ_{|Z_{n,j}|≤1} ] ≤ C_0 for all n ≥ 1.

Recommended exercise.
(*a) Assume that for all T > 0, there exists a finite constant C_T such that
Σ_{j=1}^{k_n} E[ 1 − cos(t X̃_{n,j}) ] ≤ C_T
for all |t| ≤ T and n ≥ 1. Show that for all δ > 0, there exists a finite constant C_δ such that Σ_j P[ | X̃_{n,j} | ≥ δ ] ≤ C_δ for all n ≥ 1.

Lecture 20B: Proof of the Accompanying Laws theorem, Part 2.

Summary. This lecture is based on [Varadhan, Section 3.7].


Content and Comments.
0:00 Statement of the second main result: Assume that Σ_j X̃_{n,j} − A_n converges in distribution to some law. Then conditions (a) and (b) of the first part of the lecture are in force.
2:23 Remarks on symmetric random variables. ϕ_{X−X′}(t) = | ϕ_X(t) |².
4:45 Remark 2: Let (Z_n : n ≥ 1) be a sequence of random variables which converges in distribution to some random variable S, and let (Z′_n : n ≥ 1) be an independent copy of (Z_n : n ≥ 1). Then, Z_n − Z′_n converges in distribution to S − S′, where S′ is an independent copy of S.
7:09 Remark 3: Let U_{n,j} = X̃_{n,j} − X̃′_{n,j}, where (X̃′_{n,j} : 1 ≤ j ≤ k_n) is an independent copy of (X̃_{n,j} : 1 ≤ j ≤ k_n). The sequence Σ_j U_{n,j} converges in distribution.
11:35 Strategy of the proof.
12:32 Step 1: The conditions (a) and (b) hold for the sequence U_{n,j}. In view of [48:34] of the previous lecture, it is enough to prove (0.2) for U_{n,j}.
27:30 Step 2(a): Condition (a) holds for X̃_{n,j}.
34:02 Step 2(b): Condition (b) holds for X̃_{n,j}.

Claim: Condition (0.1) implies that for every m > 0 there exists a finite constant C_m such that Σ_j E[ X̃_{n,j}² χ_{|X̃_{n,j}|≤m} ] ≤ C_m. Note that 1 has been replaced by m in the indicator function.
Proof. Since 1 − cos(x) ≥ 0 for all x ∈ R, it follows from (0.1) that
Σ_j E[ { 1 − cos(t X̃_{n,j}) } χ_{|t X̃_{n,j}|≤π/4} ] ≤ C_T
for all |t| ≤ T. Since there exists a > 0 such that 1 − cos(x) ≥ a x² for |x| ≤ π/4,
a t² Σ_j E[ X̃_{n,j}² χ_{|t X̃_{n,j}|≤π/4} ] ≤ C_T
for |t| ≤ T. Choosing t = T = π/(4m) yields that
a (π/4m)² Σ_j E[ X̃_{n,j}² χ_{|X̃_{n,j}|≤m} ] ≤ C_{π/4m} ,
as claimed.

Lecture 21: The Lévy–Khintchine theorem.

Summary. This lecture is based on [Varadhan, Section 3.8].


Content and Comments.
0:00 Let (X_{n,j} : 1 ≤ j ≤ k_n) be an array of independent random variables. According to the previous lecture, in order to examine the convergence in distribution of the sequence of random variables Σ_j X̃_{n,j} − A_n, we have to consider the same problem for the sequence Σ_j Y_{n,j} − A′_n. That is, to examine the convergence of the sequence of characteristic functions exp{ ∫ [e^{itx} − 1] M_n(dx) − i a_n t }, where M_n = Σ_j µ̃_{n,j}, µ̃_{n,j} being the distribution measure of the random variable X̃_{n,j}.
6:22 The measure M_n is not a probability measure. Definition of Lévy measure. The measures M_n are Lévy measures. Introduction of the function θ(x).
19:50 Theorem: For every Lévy measure M, b ∈ R, σ² ≥ 0, the function
ϕ_{M,b,σ²}(t) := exp{ ∫ ( e^{itx} − 1 − itθ(x) ) M(dx) + itb − σ²t²/2 }  (0.3)
is the characteristic function of an infinitely divisible law. This is [Varadhan, Theorem 3.20].
23:45 Step 1: For all δ > 0, the claim holds for
exp{ ∫_{|x|≥δ} ( e^{itx} − 1 − itθ(x) ) M(dx) + itb }
33:24 Step 2: The claim holds for
exp{ ∫ ( e^{itx} − 1 − itθ(x) ) M(dx) + itb }
42:52 Proof of the theorem.


46:23 Theorem (uniqueness of the representation): Assume that ϕ_{M_1,b_1,σ_1²}(t) = ϕ_{M_2,b_2,σ_2²}(t) for all t ∈ R and that M_1({0}) = M_2({0}) = 0. Then, M_1 = M_2, b_1 = b_2, σ_1² = σ_2². In the statement of the theorem, I forgot to add the hypothesis M_1({0}) = M_2({0}) = 0. We can always assume that M({0}) = 0 for a Lévy measure M because we only integrate continuous functions which vanish at the origin.
48:18 Step 1: σ_1² = σ_2².
51:07 Step 2: M_1 = M_2.
59:24 Proof of the exercise left in the lecture. We showed that the bounded Borel measures M_{1,s}(dx) = [1 − cos(sx)] M_1(dx) and M_{2,s}(dx) = [1 − cos(sx)] M_2(dx) are equal for all s ∈ R. To prove that M_1 = M_2, it is enough to show that ∫ F dM_1 = ∫ F dM_2 for all continuous functions F with compact support in R \ {0}. Fix 0 < a < b < ∞ and a continuous function F with support contained in [a, b]. Choose δ small so that δb < (2π − δ)a. Choose s so that (δ/a) < s < (2π − δ)/b. It follows from this choice that sx ∈ [δ, 2π − δ] for all x ∈ [a, b]. In particular, there exists c_δ > 0 such that 1 − cos(sx) ≥ c_δ for all x ∈ [a, b]. Thus, F(x)/[1 − cos(sx)] is a continuous function with support contained in [a, b]. Hence, as M_{1,s} = M_{2,s}, ∫ F dM_1 = ∫ F(x)/[1 − cos(sx)] M_{1,s}(dx) = ∫ F(x)/[1 − cos(sx)] M_{2,s}(dx) = ∫ F dM_2, as claimed.

Recommended exercises.
a. [Varadhan, Section 3.8], exercises 21, 22, 23, 24, 25.

Lecture 22A: The one-dimensional central limit problem, part 1.

Summary. This lecture is based on [Varadhan, Section 3.8].

Remark 0.1 of Lecture 7 is used many times in this lecture.

Content and Comments.


0:00 Theorem: The sequence of random variables X_{M_n,σ_n²,a_n} converges to a random variable X if and only if there exists a Lévy measure M, σ² ≥ 0 and a ∈ R such that X = X_{M,σ²,a} and
(i) For all f ∈ C_b(R) for which there exists δ > 0 such that f(x) = 0 for all |x| ≤ δ, we have that ∫ f dM_n → ∫ f dM.
(ii) There exists x_0 > 0 such that M({−x_0, x_0}) = 0 and
lim_{n→∞} { ∫_{−x_0}^{x_0} x² M_n(dx) + σ_n² } = ∫_{−x_0}^{x_0} x² M(dx) + σ² .  (0.4)
(iii) lim_{n→∞} a_n = a.
This is [Varadhan, Theorem 3.21].
12:17 If there exists x_0 > 0 such that M({x_0} ∪ {−x_0}) = 0 and (0.4) holds, then for all x_1 > 0 such that M({x_1} ∪ {−x_1}) = 0 we have that
lim_{n→∞} { ∫_{−x_1}^{x_1} x² M_n(dx) + σ_n² } = ∫_{−x_1}^{x_1} x² M(dx) + σ² .
20:25 Proof that the conditions are sufficient.
38:27 Comments on the proof

Lecture 22B: The one-dimensional central limit problem, part 2.

Summary. This lecture is based on [Varadhan, Section 3.8].


Content and Comments.
0:00 Statement of the theorem.
1:53 Strategy of the proof.
3:30 Step A1: lim_{t→0} sup_n { ∫ [ 1 − cos(tx) ] dM_n + σ_n² t²/2 } = 0.
12:04 Step A2: For all T > 0, there exists a finite constant C_T such that
∫ { 1 − cos(tx) } dM_n ≤ C_T
for all |t| ≤ T and n ≥ 1. This has been proved in Lecture 20A, see equation (0.1). It follows from this bound (see Lecture 20A, time [38:31] and exercise (*a)) that there exists a finite constant C_0 such that
(a) ∫_{|x|≥δ} M_n(dx) ≤ C_0 and (b) ∫_{|x|≤1} x² dM_n ≤ C_0
for all n ≥ 1.
16:04 Step A3: For all ε > 0, there exists A_ε > 0 such that M_n([−A, A]^c) ≤ ε for all n ≥ 1, A ≥ A_ε. I refer here to the result proved at time [44:28] in Lecture 8.
23:01 Step A4: Let ω(x) = x²/(1 + x²). Then, there exists C_0 < ∞ such that α_n = ∫ ω(x) M_n(dx) ≤ C_0 for all n ≥ 1.
26:42 Step B: Claim: Given a subsequence (n_k : k ≥ 1), there exist a sub-subsequence (n_{k_j} : j ≥ 1) and a Lévy measure M such that
∫ f(x) M_{n_{k_j}}(dx) → ∫ f(x) M(dx) and ∫_{−x_0}^{x_0} x² M_{n_{k_j}}(dx) → ∫_{−x_0}^{x_0} x² M(dx)
for all f ∈ C_b^*(R) and x_0 > 0 such that M({−x_0, x_0}) = 0. Here, C_b^*(R) is the set of bounded, continuous functions f : R → R for which there exists δ > 0 such that f(x) = 0 for all |x| ≤ δ.
29:10 Step B1: Assume that α_n → 0. Then, the claim formulated in Step B holds and M = 0.
37:00 Step B2: Assume that α_n → α > 0. Then, the claim formulated in Step B holds.
52:57 Step C: Claim: Given a subsequence (n_k : k ≥ 1), there exist a sub-subsequence (n_{k_j} : j ≥ 1) and σ² ≥ 0 such that σ_{n_{k_j}}² → σ². In particular, given a subsequence (n_k : k ≥ 1), there exist a sub-subsequence (n_{k_j} : j ≥ 1) and a triple (M, σ², 0) such that (M_{n_{k_j}}, σ_{n_{k_j}}², 0) → (M, σ², 0).
56:56 Step D: Lemma: Assume that X_n → X and that X_n + a_n → Y in distribution. Then a_n → a for some a and Y = X + a in distribution.
1:01:32 Step E: The sequence a_{n_{k_j}} converges to some a ∈ R.
1:03:58 Conclusion of the tightness part of the proof: Given a subsequence (n_k : k ≥ 1), there exist a sub-subsequence (n_{k_j} : j ≥ 1) and a triple (M, σ², a) such that (M_{n_{k_j}}, σ_{n_{k_j}}², a_{n_{k_j}}) → (M, σ², a).
1:05:15 Uniqueness of limits.

Further readings.
A. The introduction of the measures dν_n = [x²/(1 + x²)] dM_n in the previous proof is taken from [Breiman, Theorem 9.17].

Lecture 23: Applications.

Summary. This lecture is based on [Varadhan, Section 3.8].


Content and Comments.
0 A reminder of the result proved in the previous lectures. The convergence of the sum Σ_j X_{n,j} − A_n is reduced to the convergence of a triple (M̃_n, 0, b_n).
0 Poisson convergence: Assume that (X_{n,j} : 1 ≤ j ≤ k_n) is an array of independent random variables such that P[X_{n,j} = 1] = p_{n,j} = 1 − P[X_{n,j} = 0]. Suppose, further, that max_{1≤j≤k_n} p_{n,j} → 0. Then, the sequence Σ_j X_{n,j} − A_n converges in distribution to a random variable S if and only if p_n = Σ_j p_{n,j} → p and A_n → A. Moreover, S = N − A, where N is a Poisson distribution of parameter p and A ∈ R the limit of A_n. Note that when p = 0, N is the degenerate Poisson distribution, that is, P[N = 0] = 1.
0 Let ϕ_{M,σ²,b}(t) be the characteristic function associated to a triple (M, σ², b). Then, the distribution is Gaussian if and only if M = 0.
0 Let (X_{n,j} : 1 ≤ j ≤ k_n) be an array of independent, uniformly negligible random variables. Assume that Σ_j X_{n,j} converges in distribution to a random variable S. Then, S is Gaussian if and only if for all δ > 0,
lim_{n→∞} Σ_{j=1}^{k_n} P[ |X_{n,j}| > δ ] = 0  (0.5)

0 Under the previous hypotheses, the mean µ and variance σ² of S are given by
σ² = lim_{n→∞} Σ_{j=1}^{k_n} E[ (X_{n,j} − a_{n,j})² χ_{|X_{n,j}−a_{n,j}|≤x_0} ]  (0.6)
µ = lim_{n→∞} Σ_{j=1}^{k_n} { a_{n,j} + E[ θ(X_{n,j} − a_{n,j}) ] }
Note that in the first line the limit does not depend on x_0 because M̃_n → 0.
0 Assume, further, that E[X_{n,j}] = 0, σ_{n,j}² = E[X_{n,j}²] < ∞, σ_n² = Σ_j σ_{n,j}² → σ² and S ∼ N(0, σ²). Then, Lindeberg’s condition holds. This (re)proves the assertion that Lindeberg’s condition is not only sufficient for convergence to a Gaussian random variable, but also necessary.
0 Claim 1: Σ_j a_{n,j} → 0. This claim is not necessary for the argument. For this reason I skipped its proof in the lecture. It is presented below.
0 Claim 2: Σ_j a_{n,j}² → 0.
0 Conclusion: Lindeberg’s condition holds. For all δ > 0,
lim_{n→∞} Σ_{j=1}^{k_n} E[ X_{n,j}² χ_{|X_{n,j}|>δ} ] = 0
Proof that Σ_j a_{n,j} → 0. Since µ = 0,
lim_{n→∞} Σ_{j=1}^{k_n} { a_{n,j} + E[ θ(X_{n,j} − a_{n,j}) ] } = 0 .
Fix 0 < δ < 1/2. By (0.5) and since θ is a bounded function, we may introduce the indicator of |X_{n,j}| ≤ δ inside the expectation to get that
lim_{n→∞} Σ_{j=1}^{k_n} { a_{n,j} + E[ θ(X_{n,j} − a_{n,j}) χ_{|X_{n,j}|≤δ} ] } = 0  (0.7)

We claim that there exists a finite constant C_0 such that
lim sup_{n→∞} Σ_{j=1}^{k_n} | E[ { θ(X_{n,j} − a_{n,j}) − (X_{n,j} − a_{n,j}) } χ_{|X_{n,j}|≤δ} ] | ≤ C_0 δ .
By definition of θ, there exists a finite constant C_0 such that | θ(x) − x | ≤ C_0 |x|³ for |x| ≤ 1. Hence, as max_j a_{n,j} → 0 and δ < 1/2, the previous sum is bounded by
C_0 Σ_{j=1}^{k_n} E[ | X_{n,j} − a_{n,j} |³ χ_{|X_{n,j}|≤δ} ] ≤ C_0 δ Σ_{j=1}^{k_n} E[ ( X_{n,j} − a_{n,j} )² χ_{|X_{n,j}|≤δ} ] ,
where the value of the constant C_0 may change from line to line. The last sum is bounded by the sum appearing in (0.6) provided we choose x_0 sufficiently large. This proves the claim.
It follows from this claim and (0.7) that there exists a finite constant C_0 such that
lim sup_{n→∞} Σ_{j=1}^{k_n} | a_{n,j} + E[ (X_{n,j} − a_{n,j}) χ_{|X_{n,j}|≤δ} ] | ≤ C_0 δ .
Note that the function θ disappeared. Since max_j a_{n,j} is bounded, (0.5) and the previous equation yield that
lim sup_{n→∞} | Σ_{j=1}^{k_n} E[ X_{n,j} χ_{|X_{n,j}|≤δ} ] | ≤ C_0 δ .
By (0.5) again,
lim sup_{n→∞} | Σ_{j=1}^{k_n} E[ X_{n,j} χ_{δ<|X_{n,j}|≤1} ] | = 0 .
Hence,
lim sup_{n→∞} | Σ_{j=1}^{k_n} E[ X_{n,j} χ_{|X_{n,j}|≤1} ] | ≤ C_0 δ .
To complete the proof, it remains to recall that a_{n,j} = E[ X_{n,j} χ_{|X_{n,j}|≤1} ] and to let δ → 0.

Lecture 24: Conditional expectation.

Summary. This lecture is based on [Chung, Section 9.1].


Content and Comments.
0:0 Let (Ω, F, P ) be a probability space, fixed throughout this lecture. Defini-
tion of P [ B | A ] and of PA [ · ].
2:10 Definition of E[ f | A ].
6:02 Example with a deck of cards
12:38 Let (A_n : n ≥ 1) be a partition of Ω such that A_n ∈ F, P[A_n] > 0. Then,
P[B] = Σ_{n≥1} P[ B | A_n ] P[A_n]
15:12 For all f integrable,
E[f] = Σ_{n≥1} E[ f | A_n ] P[A_n]

17:11 Let G = σ(Aj : j ≥ 1). Exercise: show that B belongs to G if and only if
B = ∪j∈M Aj for some M ⊂ N.
18:31 A function h : Ω → R is measurable with respect to G if and only if it is
constant on each set Aj . Proof that it is measurable if it is constant.
26:18 Proof that it is constant
P if it is measurable.
33:44 Let E[ f | G ] = n≥1 E[ f | An ] χAn . Then, E[ f | G ] is G-measurable.
37:35 For every bounded function h, measurable with respect to G,
Z Z
E[ f | G ] h dP = f h dP
B
43:27 Step 1: For every set B measurable with respect to G,
Z Z
E[ f | G ] dP = f dP
B B
48:17 Step 2: Extension to bounded functions, measurable with respect to G.
51:07 Summary of the properties of E[ f | G ] in the case where G = σ(Aj : j ≥ 1).
56:44 Definition of the conditional expectation E[ f | G ] for an integrable function
f ∈ F and a σ-algebra G ⊂ F.
58:30 Existence and uniqueness of the conditional expectation.
1:04:06 Uniqueness of the conditional expectation.
1:06:20 The conditional expectation E[ f | G ] is integrable.
1:10:56 If f is G-measurable, then E[ f | G ] = f .
1:12:32 Let f be an integrable function. Then, f can be decomposed as f = f1 +f2 ,
where f1 is G-measurable and f2 ⊥ G in the sense that E[ f h ] = 0 for all
bounded functions h which are G-measurable.
1:17:10 Definition of E[ f | X ] and of P [ A | G ].
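A finite toy version of the construction at 33:44–48:17 (a sketch, not in the notes; the space, the partition and the weights below are arbitrary): on a finite probability space, E[f | G] with G generated by a partition is constant on each block, and its integral over any B ∈ G agrees with that of f.

import numpy as np

# Omega = {0,...,5}; partition A_1 = {0,1}, A_2 = {2,3,4}, A_3 = {5}
P = np.array([0.1, 0.2, 0.25, 0.15, 0.1, 0.2])
f = np.array([1.0, 3.0, -2.0, 0.5, 4.0, 7.0])
blocks = [np.array([0, 1]), np.array([2, 3, 4]), np.array([5])]

cond = np.empty_like(f)
for A in blocks:
    cond[A] = (f[A] * P[A]).sum() / P[A].sum()   # E[f | A_j], repeated on the block A_j

# defining property: integrals over any G-measurable set agree, e.g. B = A_1 U A_3
B = np.array([0, 1, 5])
print((cond[B] * P[B]).sum(), (f[B] * P[B]).sum())   # both integrals coincide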

Further readings.
A. The proof of the Radon-Nikodym theorem can be found in [Taylor, Section 6.4] or in my lectures on measure theory.
B. [Varadhan, Section 4.1] provides a short proof of the Radon-Nikodym theorem, which relies on the Hahn-Jordan decomposition of a measure.
C. [Durrett-4th, Section 5.1], examples 1 – 6
D. [Breiman, Section 4.1] proposes a slightly different approach to conditional
expectation.
Recommended exercises.
a. Show that E[ E[ Z | G ] ] = E[Z] if Z is integrable.
b. [Varadhan, Section 4.1], exercises 1 – 8 recall important facts from mea-
sure theory which are used in the proofs of the properties of conditional
expectation.
c. [Chung, Section 9.1], exercises 1 – 6
Suggested exercises.
a. [Breiman, Section 4.1], problems 2, 3, 5, 7, 8.

Lecture 25: Properties of conditional expectation.

Summary. This lecture is based on [Chung, Section 9.1].

Throughout this lecture, (Ω, F, P ) is a fixed probability space, G is a σ-algebra


contained in F and X, Y , (Xn : n ≥ 1) are integrable random variables which are
measurable with respect to F.

Content and Comments.


0:0 The conditional expectation is linear: E[ X + Y | G ] = E[ X | G ] +
E[ Y | G ]
5:14 The conditional expectation is monotone: E[ X | G ] ≤ E[ Y | G ] if X ≤ Y .
10:06 If Y is G-measurable
and X, XY are integrable, then E[ X Y | G ] = Y E[ X | G ]
19:54 | E[ X | G ] | ≤ E[ |X| | G ].
22:50 Conditional monotone convergence theorem
29:40 Conditional Fatou’s lemma
34:37 Conditional dominated convergence theorem
44:07 Jensen’s conditional inequality
1:01:46 Schwarz’ conditional inequality. The argument could be slightly simpler to prove that the integral of X²/E[X² | G] is equal to 1. We have that
E[ (X²/E[ X² | G ]) χ_{ε ≤ E[ X² | G ] ≤ ε^{−1}} | G ] = ( χ_{ε ≤ E[ X² | G ] ≤ ε^{−1}} / E[ X² | G ] ) E[ X² | G ]
because X² and (X²/E[ X² | G ]) χ_{ε ≤ E[ X² | G ] ≤ ε^{−1}} are integrable. Taking expectations on both sides and simplifying the right-hand side yields that
∫ (X²/E[ X² | G ]) χ_{ε ≤ E[ X² | G ] ≤ ε^{−1}} dP = ∫ χ_{ε ≤ E[ X² | G ] ≤ ε^{−1}} dP
because E[ E[ Z | G ] ] = E[Z] if Z is integrable. It remains to let ε → 0 and invoke the monotone convergence theorem.
1:15:26 If G_1 ⊂ G_2, then
E[ E[ X | G_2 ] | G_1 ] = E[ X | G_1 ] = E[ E[ X | G_1 ] | G_2 ]
Further readings.
A. [Varadhan, Section 4.2], especially Remark 4.5.
B. [Breiman, Section 4.2]
C. [Durrett-4th, Section 5-1]
Exercises a, b, c and d below are strongly recommended.
Recommended exercises.
*a. [Varadhan, Section 4.2], exercise 9
*b. Prove [Durrett-4th, Section 5-1], Theorems 4 and 8
*c. [Durrett-4th, Section 5-1], exercises 8, 9
*d. Prove [Breiman, Section 4.2] Proposition 4.20.(4)
e. [Chung, Section 9.1], exercises 7, 8, 9, 12
f. [Durrett-4th, Section 5-1], exercises 3, 4, 6
Suggested exercises.

a. [Chung, Section 9.1], exercises 10, 11, 13


b. [Durrett-4th, Section 5-1], exercises 7, 10, 11, 12
c. [Breiman, Section 4.2], problems 12, 14, 15

Lecture 26: Regular conditional probability.

Summary. This lecture is based on [Varadhan, Section 4.3].


Content and Comments.
Throughout this section, (Ω, F, P ) is a probability space and G ⊂ F a σ-algebra.
0:00 Definition of the conditional probability µ(ω, A) := P [A | G ], for A ∈ F.
3:30 Properties of the conditional probability µ( · , A), A ∈ F.
9:03 Definition of a regular conditional probability (RCP) given a σ-algebra.
13:54 Let µ(ω, A) be a RCP on (Ω, F, P) given G. Then, for every integrable (F-measurable) function f : Ω → R, ω ↦ ∫ f dµ_ω is a version of the conditional expectation E[ f | G ].
20:30 Theorem: Let P be a probability measure on ([0, 1], B), where B is the
Borel σ-algebra, and let G ⊂ B. Then, there exists a RCP given G. This is
[Varadhan, Theorem 4.7].
29:18 Step 1: For x ∈ Q, define F_x : Ω → [0, 1] as F_x = P[ (−∞, x] | G ]. Main properties of F_x.
37:46 Step 2: On a set of full measure N^c, define G_x = inf{ F_y : y > x , y ∈ Q }. For ω ∈ N^c, y ↦ G_y(ω) is a distribution function.
44:53 Step 3: For ω ∈ N^c, let Q(ω, ·) be the measure on ([0, 1], B) associated to the distribution function G_·(ω). For ω ∈ N, define Q(ω, ·) as the Lebesgue measure on ([0, 1], B). For all A ∈ B, ω ↦ Q(ω, A) is G-measurable.
54:04 Step 4: For all A ∈ B, B ∈ G, ∫_B Q( · , A) dP = P[A ∩ B]. This concludes the proof. Keep in mind that the definition of Q is different on N, but this set has measure 0 and is measurable with respect to G. We may therefore replace B by B ∩ N^c in the previous argument.
Further readings.
A. [Breiman, Section 4.3] defines regular conditional distributions (RCD) and
proves the existence of regular conditional distributions for random vectors
taking values in Borel spaces. In the notes of this chapter, there is an
example of space where a RCP does not exist.
B. [Durrett-4th, Section 5.1.3] proves the existence of RCD’s on Borel spaces. Exercise 15 illustrates the interest of the existence of RCP’s.

Suggested exercises.
a. [Varadhan, Section 4.3], exercise 11
b. [Durrett-4th, Section 5-1], exercises 13, 15.
c. [Breiman, Section 4.2], problem 17

References
[Breiman] L. Breiman, Probability. Corrected reprint of the 1968 original. Classics in Applied
Mathematics, 7. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA,
1992. xiv+421 pp. ISBN: 0-89871-296-3
[Chung] K. L. Chung, A course in Probability Theory. Third edition. Academic Press, Inc., San
Diego, CA, 2001. xviii+419 pp. ISBN: 0-12-174151-6
[Dembo-Zeitouni] A. Dembo, O. Zeitouni, Large deviations techniques and applications. Cor-
rected reprint of the second (1998) edition. Stochastic Modelling and Applied Probability,
38. Springer-Verlag, Berlin, 2010. ISBN: 978-3-642-03310-0
[Deuschel-Stroock] J-D. Deuschel, D. W. Stroock, Large deviations. Pure and Applied Mathemat-
ics, 137. Academic Press, Inc., Boston, MA, 1989. ISBN: 0-12-213150-9
[Durrett] R. Durrett, Probability: Theory and Examples. Second edition. Duxbury Press, Bel-
mont, CA, 1996. xiii+503 pp. ISBN: 0-534-24318-5
[Durrett-4th] R. Durrett, Probability: Theory and Examples. Fourth edition. Cambridge Series in
Statistical and Probabilistic Mathematics, 31. Cambridge University Press, Cambridge, 2010.
ISBN: 978-0-521-76539-8
[Rudin] W. Rudin, Principles of Mathematical Analysis. Third edition. McGraw-Hill, Inc. New
York (1964).
[Taylor] S. J. Taylor, Introduction to Measure and Integration, Cambridge University Press, 1973.
ISBN 978-0-521-09804-5
[Varadhan-LD] S. R. S. Varadhan, Large deviations and applications. CBMS-NSF Regional Conference Series in Applied Mathematics, 46. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1984. ISBN: 0-89871-189-4
[Varadhan] S. R. S. Varadhan, Probability Theory. Courant Lecture Notes in Mathematics, 7.
New York University, Courant Institute of Mathematical Sciences, New York; American
Mathematical Society, Providence, RI, 2001. ISBN: 0-8218-2852-5
