Background material
Christoph Kopp
\[ F_X(t) = P(X \le t), \]
¹ The set of all real numbers is denoted by R.
² Not all subsets of R can be assigned a probability, but this does not matter for our purposes.
Module D2/3 2019 SS MSLS Tools from Probability Theory
for all real numbers t.³ It assigns to every t ∈ R the probability that the random variable
X is at most t. By definition, 0 ≤ F(t) ≤ 1 for all t. Further, if s < t, then F(s) ≤ F(t),
i.e. F is increasing (though not necessarily strictly).⁴
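The two properties just stated can be checked numerically. The following is a minimal sketch in Python, assuming for illustration that X is standard normal; `NormalDist` comes from Python's standard library and is not part of these notes.

```python
from statistics import NormalDist

# Assumption for illustration: X is standard normal, so F is its cdf.
F = NormalDist(mu=0.0, sigma=1.0).cdf

ts = [-3.0, -1.0, 0.0, 0.5, 2.0, 3.0]   # increasing grid of points t
values = [F(t) for t in ts]

# Every F(t) is a probability, so it lies between 0 and 1.
all_in_unit_interval = all(0.0 <= v <= 1.0 for v in values)

# F is increasing: moving to a larger t never lowers F(t).
is_increasing = all(a <= b for a, b in zip(values, values[1:]))

# For s < t, the difference F(t) - F(s) equals P(s < X <= t),
# which is a probability and therefore nonnegative.
s, t = 0.5, 2.0
prob_between = F(t) - F(s)

print(all_in_unit_interval, is_increasing, prob_between)
```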
\[ F_X^{-1}(p) = \inf\{\, t \in \mathbb{R} : F_X(t) \ge p \,\} \]
for every p ∈ (0, 1).⁵ It assigns to every number p ∈ (0, 1) the smallest value t such that
the distribution function of X at t is at least p. (This is a bit like reverse engineering.)
For example, F_X^{-1}(1/2) is the smallest number t such that F_X(t) is at least 1/2.
F^{-1} is also increasing. In mathematical terms, F^{-1} is the generalized inverse of F and
has the property that F^{-1}(F(t)) ≤ t. If F is strictly increasing, then F^{-1}(F(t)) = t.
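The inequality F^{-1}(F(t)) ≤ t is easiest to see when F has jumps. Here is a small sketch in Python with a made-up discrete random variable; the probabilities and the function names `cdf` and `generalized_inverse` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

# Hypothetical example: X takes the values 1, 2, 3 with these probabilities.
pmf = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}

def cdf(t):
    """Distribution function F(t) = P(X <= t)."""
    return sum(prob for value, prob in pmf.items() if value <= t)

def generalized_inverse(p):
    """Smallest t with F(t) >= p, for p strictly between 0 and 1."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    # F only jumps at the possible values of X, so the smallest t is one of them.
    for value in sorted(pmf):
        if cdf(value) >= p:
            return value
    raise AssertionError("unreachable: F equals 1 at the largest value")

# F(1) = 1/4 is below 1/2, while F(2) = 3/4 is at least 1/2.
median = generalized_inverse(Fraction(1, 2))

# F(2.5) = 3/4, and the smallest t with F(t) >= 3/4 is 2, which is below 2.5.
back = generalized_inverse(cdf(2.5))

print(median, back)
```

Because F is flat between 2 and 3, applying F and then F^{-1} to t = 2.5 moves back to the left end of the flat piece. For a strictly increasing F no such flat pieces exist.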
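The plot below contains the graph of the density function f of some random variable
X. The shaded area is
\[ P(X \le 3) = F_X(3) = \int_{-\infty}^{3} f(x)\,dx \approx 0.93. \]
[Figure: graph of the density f; the area under the graph to the left of 3 is shaded.]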
³ If it is clear which random variable is meant, we write only F instead of F_X.
⁴ Because P(X ≤ t) = P(X ≤ s) + P(s < X ≤ t) and probabilities are nonnegative.
⁵ The infimum inf B of a set B ⊂ R is the biggest lower bound of B. In many cases, it is equal to
the minimum. Also note that in the expression F^{-1}, the part −1 is a superscript, not an exponent.
Do not worry about the integral sign if you are not familiar with it. The integral above
computes the area under the graph of f, bounded below by the x-axis and to the right
by the value t. Absolute continuity of X means that we can find a function f such that
for each t, the value of F(t) = P(X ≤ t) is obtained by integrating f up to t.
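To make "area under the graph" concrete, the area can be approximated by summing thin trapezoids. The sketch below is in Python and assumes, purely for illustration, that X is standard normal. The helper `area_up_to` and the cut-off at -10 in place of −∞ are choices made for this example.

```python
from statistics import NormalDist

# Assumption for illustration: X is standard normal with density f = dist.pdf
# and distribution function F = dist.cdf.
dist = NormalDist(mu=0.0, sigma=1.0)

def area_up_to(t, lower=-10.0, steps=20_000):
    """Approximate the area under f to the left of t (trapezoidal rule).

    The lower limit minus infinity is replaced by `lower`; for the standard
    normal density the area to the left of -10 is negligible.
    """
    width = (t - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(t))
    for i in range(1, steps):
        total += dist.pdf(lower + i * width)
    return total * width

approx = area_up_to(1.0)   # area under f up to t = 1
exact = dist.cdf(1.0)      # F(1) = P(X <= 1)

print(approx, exact)
```

The two printed numbers agree to several decimal places, which is what absolute continuity promises: F(t) is the area under f up to t.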
Its square root traditionally serves as an estimator of the standard deviation.
From an independent and identically distributed sample (x_1, y_1), ..., (x_n, y_n) from (X, Y),
we can estimate the covariance with the sample covariance
\[ s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}). \]
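The formula translates almost word for word into code. The following Python sketch uses a hypothetical helper `sample_covariance` and made-up data; neither is taken from these notes.

```python
def sample_covariance(xs, ys):
    """Sample covariance with divisor n - 1, as in the formula for s_xy."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]

# Means are 2.5 and 5.0; the products of deviations are 4.5, 0.5, 0, 6,
# so s_xy = 11 / 3.
s_xy = sample_covariance(xs, ys)

# Applying the same formula to (xs, xs) gives the sample variance of xs.
s_xx = sample_covariance(xs, xs)

print(s_xy, s_xx)
```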
3 Special distributions
All random variables in this chapter are absolutely continuous, but we do not need their
densities. Furthermore, their distribution functions are strictly increasing over their
respective support.
[Figure: three panels; the horizontal axes are t, t and p.]
[Figure: two panels; curves for k = 1, 2, 3 (left) and k = 10, 20, 30 (right).]
The chi-squared distribution occurs in the context of variances in linear models. This
hints at the deep connection between geometry and linear models.
3.3 t distributions
Let Z ∼ N(0, 1) and V ∼ χ²(k) be independent. Then
\[ T = \frac{Z}{\sqrt{V/k}} \]
has a t distribution with k degrees of freedom.
[Figure: two panels; curves for k = 5, k = 10 and the normal distribution.]
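The construction T = Z / √(V/k) can be tried out by simulation. The sketch below is in Python and rests on one assumption that is labeled in the code: a χ²(k) variable is generated as the sum of k squared independent standard normal draws. The function names and the seed are hypothetical.

```python
import random

rng = random.Random(2019)  # fixed seed so that the run is reproducible

def draw_chi_squared(k):
    """One draw of V. Assumption: V is built as the sum of k squared
    independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

def draw_t(k):
    """One draw of T = Z / sqrt(V / k) with Z and V independent."""
    z = rng.gauss(0.0, 1.0)
    v = draw_chi_squared(k)
    return z / (v / k) ** 0.5

def share_beyond(draws, bound):
    """Fraction of draws whose absolute value exceeds `bound`."""
    return sum(abs(d) > bound for d in draws) / len(draws)

n = 20_000
t_draws = [draw_t(5) for _ in range(n)]
z_draws = [rng.gauss(0.0, 1.0) for _ in range(n)]

tail_t = share_beyond(t_draws, 3.0)   # simulated P(|T| > 3) for k = 5
tail_z = share_beyond(z_draws, 3.0)   # simulated P(|Z| > 3)

print(tail_t, tail_z)
```

In such a run, values beyond ±3 turn up noticeably more often for T with k = 5 than for Z, so the t distribution puts more probability far from zero than the standard normal.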
3.4 F distributions
Let V ∼ χ²(p) and W ∼ χ²(q) be independent. Then the random variable
\[ F = \frac{V/p}{W/q} \]
has an F distribution with p and q degrees of freedom.
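The ratio above can be simulated in the same spirit. This Python sketch again assumes that a chi-squared variable with k degrees of freedom is the sum of k squared independent standard normal draws; the values p = 3 and q = 20, the seed and the function names are arbitrary choices for illustration.

```python
import random

rng = random.Random(2019)  # fixed seed so that the run is reproducible

def draw_chi_squared(k):
    """Assumption: sum of k squared independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

def draw_f(p, q):
    """One draw of (V / p) / (W / q) with V and W independent."""
    v = draw_chi_squared(p)
    w = draw_chi_squared(q)
    return (v / p) / (w / q)

p, q = 3, 20
f_draws = [draw_f(p, q) for _ in range(20_000)]

simulated_mean = sum(f_draws) / len(f_draws)
# Known property (not stated in these notes): for q > 2 the mean is q / (q - 2).
theoretical_mean = q / (q - 2)

print(simulated_mean, theoretical_mean)
```

All simulated values are positive, since numerator and denominator are both sums of squares.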
A function called rnorm can be used to simulate normally distributed random variables
using pseudo-random numbers.⁶
⁶ The naming scheme is generic: e.g. for the t distribution, the commands are dt, pt, qt and rt.
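What "pseudo-random" means in practice is that a simulation can be repeated exactly by fixing the seed of the generator. The sketch below shows this in Python; it is an analogue for illustration and not the rnorm command itself, and the helper `simulate_normal` is hypothetical.

```python
import random
from statistics import mean, stdev

def simulate_normal(n, mu=0.0, sigma=1.0, seed=None):
    """Return n pseudo-random draws from a normal distribution with mean mu
    and standard deviation sigma."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

# The same seed reproduces exactly the same numbers.
first = simulate_normal(5, seed=1)
second = simulate_normal(5, seed=1)

# For a large sample, the sample mean and sample standard deviation
# should be close to mu = 2 and sigma = 3.
large = simulate_normal(10_000, mu=2.0, sigma=3.0, seed=1)

print(first == second, mean(large), stdev(large))
```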