Kolmogorov-Smirnov Distance

1.
3 Kolmogorov-Smirnov distance
Only for probability measures on R.
Kolm(µ, ν) := sup |µ ((−∞, x]) − ν ((−∞, x])|

x∈R
≤ TV(µ, ν).
1.4 Facts
• All three distances defined above are stronger than weak convergence (i.e. convergence
in distribution, which is weak* convergence on the space of probaility measures, seen
as a dual space). That is, if any of these metrics go to zero as n → ∞, then we have
weak convergence. But converse is not true. However, weak convergence is metrizable
(e.g. by the Prokhorov metric).
• Important coupling interpretation of total variation distance:
T V (µ, ν) = inf {P (X 6= Y ) : (X, Y ) is a r.v. s.t. X ∼ µ, Y ∼ ν}
(i.e. infimum over all joint distributions with given marginals.)
• Similarly, for µ, ν on the real line,
Wass(µ, ν) = inf {E |X − Y | : (X, Y ) is a r.v. s.t. X ∼ µ, Y ∼ ν}
(So it’s often called the Wass1 , because of L1 norm.)
• TV isP
a very strong notion, often too strong to be useful. Suppose X1 , X2 , . . . iid ±1.
Sn = n1 Xi . Then
S
√n =⇒ N (0, 1)
n
Sn
But T V ( √ , Z) = 1 for all n. Both Wasserstein and Kolmogorov distances go to 0 at
√ n
rate 1/ n.
Lemma 1 Suppose W, Z are two r.v.’s and

p Z has a density w.r.t. Lebesgue measure bounded
by a constant C. Then Kolm(W, Z) ≤ 2 CWass(W, Z).
Proof: Consider a point t, and fix an . Define two functions g1 and g2 as follows. Let
g1 (x) = 1 on (−∞, t), 0 on [t + , ∞) and linear interpolation in between. Let g2 (x) = 1 on
(−∞, t − ], 0 on [t, ∞), and linear interpolation in between. Then g1 and g2 form upper
and lower ‘envelopes’ for 1(−∞,t] . So
P (W ≤ t) − P (Z ≤ t) ≤ E g1 (W ) − E g1 (Z) + E g1 (Z) − P (Z ≤ T ).
2-2
Now E g1 (W ) − E g1 (Z) ≤ 1 Wass(W, Z) since g1 is (1/)-Lipschitz, and E g1 (Z) − P (Z ≤
t) ≤ C since Z has density bdd by C.
Now using g2 , same bound holds for the other side: P (Z ≤ t) − P (W ≤ t). Optimize over
to get the required bound. 2
1.5 A stronger notion of distance

Pn
Exercise 1: Sn a simple random walk (SRW). Sn = 1 Xi , with Xi iid ±1. Then
S
√n =⇒ Z ∼ N (0, 1).
n
The Berry-Esseen bound: Suppose X1 , X2 , . . . iid E(X1 ) = 0, E(X12 ) = 1, E |X|3 < ∞.

Then
3 E |X1 |3

Sn
Kolm √ , Z ≤ √
n n
Can also show that for SRW,

S Const
Wass √n , Z ≤ √
n n
Sn
This means that it is possible to construct √
n
and Z on the same space such that

Sn C
E √ − Z ≤ √
n n
Can we do it in the strong sense? That is:

Sn t
≤ Ce−ct .

P √ − Z > √

n n
This is known as Tusnády’s Lemma. Will come back to this later.
2 Integration by parts for the gaussian measure
The following result is sometimes called ‘Stein’s Lemma’.
Lemma 2 If Z ∼ N (0, 1), and f : R → R is an absolutely continuous function such that

E |f 0 (Z)| < ∞, then E Zf (Z) = E f 0 (Z).
2-3
Proof: First assume f has compact support contained in (a, b). Then the result follows
from integration by parts:
Z b h ib Z b
−x2 /2 −x2 /2 2 /2
xf (x)e dx = f (x)e + f 0 (x)e−x dx.
a a a
Now take any f s.t. E |Zf (Z)| < ∞, E |f 0 (Z)| < ∞, E |f (Z)| < ∞.
Take a piecewise linear function g that takes value 1 in [−1, 1], 0 outside [−2, 2], and between
0 and 1 elsewhere. Let
fn (x) := f (x)g(x/n).
Then clearly,
|fn (x)| ≤ |f (x)| for all x and fn (x) → f (x) pointwise.
Similarly, fn0 → f 0 pointwise. Rest follows by DCT. The last step is to show that the
finiteness of E |f 0 (Z)| implies the finiteness of the other two expectations.
Suppose E |f 0 (Z)| < ∞. Then

Z ∞ Z ∞ Z x
−x2 /2 f (y) dy e−x2 /2 dx
0
|xf (x)| e dx ≤ x
0
Z0 ∞ 0
0 ∞ −x2 /2
Z
= f (y) xe dx dy.
0 y
| {z }
2 /2
e−y
Finiteness of E |f (Z)| follows from the inequality |f (x)| ≤ sup|t|≤1 |f (t)| + |xf (x)|. 2
Exercise 2: Find f s.t. E |Zf (Z)| < ∞ but E |f 0 (Z)| = ∞.
Next time, Stein’s method. Sketch:
Suppose you have a r.v. W and Z ∼ N (0, 1) and you want to bound
sup |E g(W ) − E g(Z)| ≤ sup E f 0 (W ) − W f (W )

g∈D f ∈D0
Main difference between stein’s method and characteristic functions is that Stein’s method
is a local technique. We transfer a global problem to a local problem. It’s a theme that is
present in many branches of mathematics.
2-4

Kolmogorov-Smirnov Distance

Uploaded by

Copyright:

Available Formats

Kolmogorov-Smirnov Distance

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Kolmogorov-Smirnov Distance

Uploaded by

Copyright:

Available Formats

1.

Only for probability measures on R.

Kolm(µ, ν) := sup |µ ((−∞, x]) − ν ((−∞, x])|

• Important coupling interpretation of total variation distance:

T V (µ, ν) = inf {P (X 6= Y ) : (X, Y ) is a r.v. s.t. X ∼ µ, Y ∼ ν}

(i.e. infimum over all joint distributions with given marginals.)

• Similarly, for µ, ν on the real line,

Wass(µ, ν) = inf {E |X − Y | : (X, Y ) is a r.v. s.t. X ∼ µ, Y ∼ ν}

(So it’s often called the Wass1 , because of L1 norm.)

Lemma 1 Suppose W, Z are two r.v.’s and

1.5 A stronger notion of distance

The Berry-Esseen bound: Suppose X1 , X2 , . . . iid E(X1 ) = 0, E(X12 ) = 1, E |X|3 < ∞.

Can also show that for SRW,

Can we do it in the strong sense? That is:

This is known as Tusnády’s Lemma. Will come back to this later.

2 Integration by parts for the gaussian measure

The following result is sometimes called ‘Stein’s Lemma’.

Lemma 2 If Z ∼ N (0, 1), and f : R → R is an absolutely continuous function such that

|fn (x)| ≤ |f (x)| for all x and fn (x) → f (x) pointwise.

Suppose E |f 0 (Z)| < ∞. Then

Exercise 2: Find f s.t. E |Zf (Z)| < ∞ but E |f 0 (Z)| = ∞.

Next time, Stein’s method. Sketch:

sup |E g(W ) − E g(Z)| ≤ sup E f 0 (W ) − W f (W )

You might also like