Kolmogorov-Smirnov Distance
Kolmogorov-Smirnov Distance
Kolmogorov-Smirnov Distance
3 Kolmogorov-Smirnov distance
1.4 Facts
• All three distances defined above are stronger than weak convergence (i.e. convergence
in distribution, which is weak* convergence on the space of probaility measures, seen
as a dual space). That is, if any of these metrics go to zero as n → ∞, then we have
weak convergence. But converse is not true. However, weak convergence is metrizable
(e.g. by the Prokhorov metric).
• TV isP
a very strong notion, often too strong to be useful. Suppose X1 , X2 , . . . iid ±1.
Sn = n1 Xi . Then
S
√n =⇒ N (0, 1)
n
Sn
But T V ( √ , Z) = 1 for all n. Both Wasserstein and Kolmogorov distances go to 0 at
√ n
rate 1/ n.
Proof: Consider a point t, and fix an . Define two functions g1 and g2 as follows. Let
g1 (x) = 1 on (−∞, t), 0 on [t + , ∞) and linear interpolation in between. Let g2 (x) = 1 on
(−∞, t − ], 0 on [t, ∞), and linear interpolation in between. Then g1 and g2 form upper
and lower ‘envelopes’ for 1(−∞,t] . So
P (W ≤ t) − P (Z ≤ t) ≤ E g1 (W ) − E g1 (Z) + E g1 (Z) − P (Z ≤ T ).
2-2
Now E g1 (W ) − E g1 (Z) ≤ 1 Wass(W, Z) since g1 is (1/)-Lipschitz, and E g1 (Z) − P (Z ≤
t) ≤ C since Z has density bdd by C.
Now using g2 , same bound holds for the other side: P (Z ≤ t) − P (W ≤ t). Optimize over
to get the required bound. 2
S
√n =⇒ Z ∼ N (0, 1).
n
2-3
Proof: First assume f has compact support contained in (a, b). Then the result follows
from integration by parts:
Z b h ib Z b
−x2 /2 −x2 /2 2 /2
xf (x)e dx = f (x)e + f 0 (x)e−x dx.
a a a
Now take any f s.t. E |Zf (Z)| < ∞, E |f 0 (Z)| < ∞, E |f (Z)| < ∞.
Take a piecewise linear function g that takes value 1 in [−1, 1], 0 outside [−2, 2], and between
0 and 1 elsewhere. Let
fn (x) := f (x)g(x/n).
Then clearly,
Similarly, fn0 → f 0 pointwise. Rest follows by DCT. The last step is to show that the
finiteness of E |f 0 (Z)| implies the finiteness of the other two expectations.
Finiteness of E |f (Z)| follows from the inequality |f (x)| ≤ sup|t|≤1 |f (t)| + |xf (x)|. 2
Suppose you have a r.v. W and Z ∼ N (0, 1) and you want to bound
Main difference between stein’s method and characteristic functions is that Stein’s method
is a local technique. We transfer a global problem to a local problem. It’s a theme that is
present in many branches of mathematics.
2-4