Probability Theory Presentation 10

BST 401 Probability Theory
Xing Qiu Ha Youn Lee
Department of Biostatistics and Computational Biology

University of Rochester
October 7, 2010
Qiu, Lee BST 401

Outline
1 Introduction to functional analysis
2 Convergence of Sequence of Measurable Functions

Motivation (I)
Functional analysis is in some sense the linear algebra of measurable functions/random variables. You have already seen that linear combinations of r.v.s are r.v.s.
The usual linear algebra deals with finite-dimensional vectors. In general, spaces of random variables are inherently infinite dimensional.
For a Euclidean space, all linear transformations can be expressed as matrix multiplications in a basis system.
There is also a way to define an (infinite) basis system (and coordinates) for a function space. So linear transformations of r.v.s can be expressed in this basis system explicitly.
It turns out that, in such a basis system, linear transformations become integrals.

Motivation (II)
The functional norm will act as vector length, and sometimes we can even define an inner product between two vectors. Consequently two r.v.s may have an "angle" between them; they may be orthogonal to each other.
Many important mathematical concepts, such as continuity, convergence, and completeness, can be derived from the norm of a function space.
Unlike in n-dimensional Euclidean vector spaces, norms defined on an infinite-dimensional function space are in general not equivalent. Different norms give different function spaces.
The $L^p(\Omega)$ spaces, $1 \le p \le \infty$, are the most important function spaces for studying probability theory.
Other spaces, such as the Sobolev spaces, are useful for nonparametric regression, functional analysis, SDE, etc.

$L^p$ space
$(\Omega, \mathcal{F}, \mu)$ is a measure space.
For $p \ge 1$, we define $L^p(\Omega, \mathcal{F}, \mu)$ (in short, $L^p$) to be the space of measurable functions $f$ such that
$$\|f\|_p = \left( \int_\Omega |f|^p \, d\mu \right)^{1/p} < \infty.$$
Special case: random variables with finite mean ($L^1$); random variables with finite variance ($L^2$).
Another special case: $L^\infty(\Omega)$, the space of all almost surely bounded r.v.s:
$$\|f\|_\infty = \lim_{p \to \infty} \|f\|_p = \operatorname*{ess\,sup}_{\Omega} |f(x)|.$$
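As a rough numerical illustration of the definition (not part of the original slides), the sketch below approximates $\|f\|_p$ for the assumed example $f(x) = x$ on $[0, 1]$ with Lebesgue measure, using a midpoint rule. The helper name `lp_norm` and the grid size are illustrative choices. For this $f$ the exact value is $(p+1)^{-1/p}$, which increases toward the essential supremum 1 as $p$ grows.

```python
def lp_norm(f, p, n=20_000):
    """Midpoint-rule approximation of (integral over [0,1] of |f|^p)^(1/p)."""
    h = 1.0 / n
    total = sum(abs(f((i + 0.5) * h)) ** p for i in range(n))
    return (total * h) ** (1.0 / p)


def f(x):
    # Assumed example function; its essential supremum on [0, 1] is 1.
    return x


norms = {p: lp_norm(f, p) for p in (1, 2, 10, 100)}
for p, value in norms.items():
    exact = (1.0 / (p + 1)) ** (1.0 / p)  # closed form for f(x) = x
    print(f"p={p:>3}: numeric={value:.5f}  exact={exact:.5f}")
```

The printed norms grow with $p$ and stay below 1, consistent with $\|f\|_p \to \|f\|_\infty$ on a probability space.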

Basic properties
$L^p$ norms behave like lengths:
1 Non-negativity. $\|f\|_p \ge 0$.
2 The zero function has length zero. $\|0\|_p = 0$.
3 Absolute homogeneity under scalar multiplication. $\|cf\|_p = |c| \, \|f\|_p$.
4 The triangle inequality (Minkowski's inequality). $\|f + g\|_p \le \|f\|_p + \|g\|_p$.
Therefore, $L^p$ spaces are linear spaces: $f, g \in L^p$ implies $c_1 f + c_2 g \in L^p$, because $\|c_1 f + c_2 g\|_p \le |c_1| \, \|f\|_p + |c_2| \, \|g\|_p$.
The $L^p$ norm defines $L^p$-convergence. For $f^*$ and $f_1, f_2, \ldots \in L^p(\Omega)$, we say $f_n \xrightarrow{L^p} f^*$ if $\|f_n - f^*\|_p \to 0$.
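The norm properties can be checked numerically on a toy example. The sketch below assumes a finite sample space with the uniform probability measure, so the integral is just an average; the simulated values, the constant `c`, and the helper `lp_norm` are illustrative assumptions, not part of the slides.

```python
import random


def lp_norm(values, p):
    """L^p norm under the uniform probability measure on len(values) points."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


random.seed(0)
n = 1000
f = [random.gauss(0, 1) for _ in range(n)]
g = [random.gauss(0, 1) for _ in range(n)]
c = -3.0  # a negative scalar, to show why |c| is needed

checks = []
for p in (1, 1.5, 2, 4):
    lhs = lp_norm([a + b for a, b in zip(f, g)], p)
    rhs = lp_norm(f, p) + lp_norm(g, p)
    triangle_ok = lhs <= rhs + 1e-12
    scaled = lp_norm([c * a for a in f], p)
    homogeneity_ok = abs(scaled - abs(c) * lp_norm(f, p)) < 1e-9
    checks.append((p, triangle_ok, homogeneity_ok))
    print(f"p={p}: ||f+g||={lhs:.4f} <= {rhs:.4f}; homogeneity holds: {homogeneity_ok}")
```

With a negative `c`, the scaled norm matches $|c| \, \|f\|_p$ rather than $c \, \|f\|_p$.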

Basic properties (II)
A norm induces a distance: $\mathrm{dist}_p(f, g) = \|f - g\|_p$. With a distance we can define Cauchy sequences. $f_1, f_2, \ldots$ is a Cauchy sequence (relative to the given distance) if for every $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that
$$\mathrm{dist}_p(f_n, f_m) < \epsilon, \quad \forall n, m > N.$$
Completeness. A function space $\mathcal{X}$ is complete if every Cauchy sequence converges to a member of $\mathcal{X}$.
$L^p$ spaces are complete.
Implication: if a sequence of r.v.s $X_1, X_2, \ldots \in L^p(\Omega)$ satisfies $\lim_{n,m \to \infty} E|X_n - X_m|^p = 0$, then there must be a r.v. $X^*$ to which $X_n$ converges in $L^p$, and $X^* \in L^p(\Omega)$ as well. For example, if the $X_n$ have finite variances (the case $p = 2$), $X^*$ must have finite variance as well.
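A short worked derivation, added here as a sketch, of why the limit inherits a finite $p$-th moment. It uses only the triangle inequality and the $L^p$ convergence $\|X_n - X^*\|_p \to 0$; the index $N$ is simply any index at which the distance has dropped below 1.

```latex
% Choose N such that \|X_N - X^*\|_p \le 1. Then, by Minkowski's inequality,
\[
  \|X^*\|_p \;\le\; \|X^* - X_N\|_p + \|X_N\|_p \;\le\; 1 + \|X_N\|_p \;<\; \infty .
\]
% For p = 2 this gives a finite second moment, hence a finite variance:
\[
  \operatorname{Var}(X^*) \;=\; E (X^*)^2 - (E X^*)^2 \;\le\; E (X^*)^2 \;=\; \|X^*\|_2^2 \;<\; \infty .
\]
```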

Dense subset/approximation
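For simplicity, assume $\Omega = \mathbb{R}$.
Recall $\mathbb{Q}$ is dense in $\mathbb{R}$. Dense subsets of $L^p$, $1 \le p < \infty$:
set of simple functions;
set of continuous functions (those that belong to $L^p$);
set of smooth functions (functions with derivatives of every order) in $L^p$;
set of polynomials, when $\Omega$ is restricted to a bounded interval such as $[0, 1]$ (see the Bernstein polynomials on Wikipedia).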
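The polynomial case can be illustrated with Bernstein polynomials on $[0, 1]$. This is a minimal sketch under assumed choices: the target $f(x) = |x - 1/2|$ (continuous but not smooth), $p = 2$, and a midpoint-rule approximation of the $L^p$ distance. The helper names are hypothetical.

```python
import math


def bernstein(f, n, x):
    """Degree-n Bernstein polynomial of f evaluated at x in [0, 1]."""
    return sum(
        f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
        for k in range(n + 1)
    )


def lp_distance(f, g, p, grid=2000):
    """Midpoint-rule approximation of ||f - g||_p on [0, 1]."""
    h = 1.0 / grid
    total = sum(abs(f((i + 0.5) * h) - g((i + 0.5) * h)) ** p for i in range(grid))
    return (total * h) ** (1.0 / p)


def target(x):
    # Assumed example: continuous, with a kink at 1/2.
    return abs(x - 0.5)


degrees = (4, 16, 64)
errors = [
    lp_distance(target, lambda x, n=n: bernstein(target, n, x), p=2)
    for n in degrees
]
for n, err in zip(degrees, errors):
    print(f"degree {n:>2}: L^2 error = {err:.5f}")
```

The error shrinks as the degree grows, which is what density of polynomials in $L^p[0, 1]$ predicts for this target.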

Basis
A basis $(e_1, e_2, \ldots, e_n)$ of an $n$-dimensional linear space $\mathcal{X}$ (not necessarily orthogonal):
1 the $e_i$ are linearly independent;
2 every $X \in \mathcal{X}$ can be written as a linear combination of $(e_1, e_2, \ldots, e_n)$: $X = \sum_{i=1}^{n} x_i e_i$.
For a Banach space, condition 2 becomes: every $X \in \mathcal{X}$ can be written as
$$X = \sum_{i=1}^{\infty} x_i e_i,$$
where the summation is understood as a limit.
Example: Taylor expansion + smooth function approximation of an $L^p([0, 1], \mathcal{B}, L)$ function.

Inner product and Hilbert space
A complete normed linear space such as $L^p$ is called a Banach space.
A Hilbert space $H$ is a Banach space whose norm comes from an inner product $\langle f, g \rangle : H \times H \to \mathbb{R}$ which satisfies¹
1 Bilinearity: $\langle aX + bY, Z \rangle = a \langle X, Z \rangle + b \langle Y, Z \rangle$.
2 Symmetry: $\langle X, Y \rangle = \langle Y, X \rangle$.²
3 $\langle X, X \rangle \ge 0$, and $\langle X, X \rangle = 0$ iff $X = 0$.
An inner product induces a norm: $\|X\| := \sqrt{\langle X, X \rangle}$. But a norm in general cannot be extended to an inner product.
$L^2$ is a Hilbert space, and the only Hilbert space among the $L^p$ spaces. Its inner product: $\langle X, Y \rangle_2 = E(XY) = \int_\Omega XY \, d\mu$.
¹ $\mathbb{R}$ should be replaced by $\mathbb{C}$ for spaces of complex-valued functions.
² For complex Hilbert spaces, $\langle X, Y \rangle = \overline{\langle Y, X \rangle}$, where $\overline{\,\cdot\,}$ is the complex conjugate.
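One standard way to see that a norm does not come from an inner product is the parallelogram law, $\|f+g\|^2 + \|f-g\|^2 = 2\|f\|^2 + 2\|g\|^2$, which holds exactly when the norm is induced by an inner product. The sketch below is an illustration that is not in the slides. It evaluates the gap in that identity on an assumed two-point sample space with equal weights, for two indicator functions.

```python
def lp_norm(values, p):
    """L^p norm under the uniform probability measure on len(values) points."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def parallelogram_gap(f, g, p):
    """||f+g||^2 + ||f-g||^2 - 2(||f||^2 + ||g||^2), all norms taken in L^p."""
    plus = [a + b for a, b in zip(f, g)]
    minus = [a - b for a, b in zip(f, g)]
    return (
        lp_norm(plus, p) ** 2
        + lp_norm(minus, p) ** 2
        - 2 * (lp_norm(f, p) ** 2 + lp_norm(g, p) ** 2)
    )


# Indicators of the two points of the sample space (assumed example).
f = [1.0, 0.0]
g = [0.0, 1.0]

gaps = {p: parallelogram_gap(f, g, p) for p in (1, 2, 3)}
for p, gap in gaps.items():
    print(f"p={p}: parallelogram gap = {gap:+.4f}")
```

The gap vanishes (up to rounding) only for $p = 2$, in line with $L^2$ being the only Hilbert space among the $L^p$ spaces.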

Properties of a Hilbert space
With an inner product, we can define orthogonality. $X$ is orthogonal to $Y$ if $\langle X, Y \rangle = 0$.
Also the angle between two vectors: $\cos \alpha := \frac{\langle X, Y \rangle}{\|X\| \, \|Y\|}$.
A Hilbert space is a Banach space, and here we can go one step further than a general basis: a separable Hilbert space has an orthonormal basis $(e_1, e_2, \ldots)$ such that: a) $(e_i)$ is a basis; b) $\|e_i\| = 1$; c) $\langle e_i, e_j \rangle = 0$ for $i \ne j$. Given an orthonormal basis, every $X \in \mathcal{X}$ can be expressed as:
$$X = \sum_{i=1}^{\infty} \langle X, e_i \rangle e_i.$$
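To make the expansion concrete, here is a sketch using the cosine system $e_0 = 1$, $e_k(x) = \sqrt{2} \cos(k \pi x)$, which is an orthonormal basis of $L^2[0, 1]$. Several things are assumptions made for illustration: the index starts at 0 rather than 1, the target is $f(x) = x$, and inner products are approximated by a midpoint rule. The code compares the directly computed truncation error with the value obtained from the coefficients alone, $\big(\|f\|^2 - \sum_k \langle f, e_k \rangle^2\big)^{1/2}$.

```python
import math

GRID = 2000
XS = [(i + 0.5) / GRID for i in range(GRID)]


def inner(f, g):
    """Midpoint-rule approximation of the L^2[0,1] inner product."""
    return sum(f(x) * g(x) for x in XS) / GRID


def e(k):
    """k-th element of the cosine orthonormal system (index starts at 0)."""
    if k == 0:
        return lambda x: 1.0
    return lambda x: math.sqrt(2.0) * math.cos(k * math.pi * x)


def target(x):
    # Assumed example function.
    return x


def truncation_error(n):
    """Error of the expansion using e_0, ..., e_{n-1}, computed two ways."""
    basis = [e(k) for k in range(n)]
    coeffs = [inner(target, b) for b in basis]

    def residual(x):
        return target(x) - sum(c * b(x) for c, b in zip(coeffs, basis))

    direct = math.sqrt(inner(residual, residual))
    tail_sq = inner(target, target) - sum(c * c for c in coeffs)
    return direct, math.sqrt(max(tail_sq, 0.0))


results = {n: truncation_error(n) for n in (1, 3, 9)}
for n, (direct, tail) in results.items():
    print(f"{n} terms: direct error = {direct:.5f}, from coefficients = {tail:.5f}")
```

The two columns agree, and the error decreases as more terms are kept.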


Applications
The first $n$ terms provide a good approximation of $X$:
$$\left\| X - \sum_{i=1}^{n} \langle X, e_i \rangle e_i \right\| = \left\| \sum_{i=n+1}^{\infty} \langle X, e_i \rangle e_i \right\| = \left( \sum_{i=n+1}^{\infty} \langle X, e_i \rangle^2 \right)^{1/2} \downarrow 0.$$
This approximation is the foundation of nonparametric regression (splines are $n$-term approximations of an unknown regression function in an abstract Hilbert space), Fourier analysis, wavelet analysis, PDE, and much more.
We can define projections in a Hilbert space. A projection onto a Hilbert (closed) subspace $M \subsetneq \mathcal{X}$ breaks $X$ into two parts, $X = \mathrm{Proj}_M X + X^\perp$. $\mathrm{Proj}_M X$ is the element of $M$ with the smallest distance to $X$. This is the theoretical foundation of regression theory.
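The link between projection and regression can be sketched on simulated data. The assumptions here are a finite sample space with the uniform measure, so that $\langle U, V \rangle = E(UV)$ is a sample average, and the subspace $M = \mathrm{span}\{1, X\}$. The simulated model and all variable names are illustrative. Projecting $Y$ onto $M$ through an orthonormal basis of $M$ reproduces the simple linear regression fit.

```python
import math
import random


def inner(u, v):
    """L^2 inner product E[UV] under the uniform measure on the sample points."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


random.seed(1)
n = 500
x = [random.uniform(0, 1) for _ in range(n)]
y = [2.0 + 3.0 * a + random.gauss(0, 0.5) for a in x]  # assumed toy model

# Gram-Schmidt: orthonormal basis (u1, u2) of M = span{1, X}.
ones = [1.0] * n
mean_x = inner(x, ones)
centered = [a - mean_x for a in x]
scale = math.sqrt(inner(centered, centered))  # standard deviation of X
u1 = ones
u2 = [a / scale for a in centered]

# Proj_M Y = <Y, u1> u1 + <Y, u2> u2, and the residual Y - Proj_M Y.
c1, c2 = inner(y, u1), inner(y, u2)
proj = [c1 * a + c2 * b for a, b in zip(u1, u2)]
resid = [a - b for a, b in zip(y, proj)]

# The same projection written as intercept + slope * x.
slope = c2 / scale
intercept = c1 - slope * mean_x

# Any other element of M should be farther from Y than the projection.
other = [intercept + (slope + 0.1) * a for a in x]
diff_other = [a - b for a, b in zip(y, other)]
dist_proj = math.sqrt(inner(resid, resid))
dist_other = math.sqrt(inner(diff_other, diff_other))

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"distance to projection = {dist_proj:.4f}, to perturbed line = {dist_other:.4f}")
```

The residual is orthogonal to both `ones` and `x`, and the slope equals $\mathrm{Cov}(X, Y) / \mathrm{Var}(X)$, the usual least-squares formula.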