
Tools from Probability Theory

Background material

Christoph Kopp

Bern University of Applied Sciences


School of Agricultural, Forest and Food Sciences HAFL

March 12, 2019

1 Random variables and general tools


We provide no formal definition of a random variable here. For us, a random variable
is a mechanism which provides values according to given probabilities.
This mechanism itself is not directly observable in statistics (think of it as a black box),
and the true probabilities are not known. However, we can “run” the mechanism several
times and each time, observe one so-called realisation of the random variable.
A good example is casting a die: we do not know the true probabilities, but we can cast
the die several times, collect the results and infer statements about the probabilities
based on the observed data.
The variables which interest us in this course all take real numbers as values (the set of all real numbers is denoted by R). Good examples are weights, lengths, durations or other quantities. The fact that a random variable X takes a value in a set B ⊂ R is called the event X ∈ B. (Not all subsets of R can be assigned a probability, but this does not matter for our purposes.) For example, if X is at most equal to t, we write X ≤ t (instead of the equivalent X ∈ (−∞, t]).
Throughout these notes, we write P(A) to denote the probability of any event A. For example, we write P(X ≤ t) to denote the probability of the event X ≤ t.

1.1 Distribution functions


The probability distribution of a real-valued random variable X is completely characterized by its cumulative distribution function F_X defined by

    F_X(t) = P(X ≤ t)

for all real numbers t. (If it is clear which random variable is meant, we write only F instead of F_X.) It assigns to every t ∈ R the probability that the random variable X is at most t. By definition, 0 ≤ F(t) ≤ 1 for all t. Further, if s < t, then F(s) ≤ F(t), i.e. F is increasing (maybe not strictly), because P(X ≤ t) = P(X ≤ s) + P(s < X ≤ t) and probabilities are nonnegative.
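
A minimal R sketch (assuming, for illustration only, that X is standard normal; the sample x and its size are hypothetical choices): the empirical distribution function of observed realisations approximates F_X.

    x <- rnorm(1000)  # 1000 simulated realisations of X ~ N(0, 1)
    Fn <- ecdf(x)     # empirical counterpart of the distribution function
    Fn(0)             # close to F_X(0) = P(X <= 0) = 0.5
    pnorm(0)          # the true value, 0.5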

1.2 Quantile functions


The quantile function F_X^{-1} of a random variable X is defined by

    F_X^{-1}(p) = inf{t : F_X(t) ≥ p}

for every p ∈ (0, 1). (The infimum inf B of a set B ⊂ R is the greatest lower bound of B; in many cases, it is equal to the minimum. Also note that in the expression F^{-1}, the −1 is a superscript, not an exponent.) It assigns to every number p ∈ (0, 1) the smallest value t such that the distribution function of X at t is at least p. (This is a bit like reverse engineering.) For example, F_X^{-1}(1/2) is the smallest number t such that F_X(t) is at least 1/2.
F^{-1} is also increasing. In mathematical terms, F^{-1} is the generalized inverse of F and has the property that F^{-1}(F(t)) ≤ t. If F is strictly increasing, then F^{-1}(F(t)) = t.
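
A quick R sketch (again assuming a standard normal X; the sample x is hypothetical): qnorm evaluates F^{-1} exactly, and the type = 1 sample quantile is the generalized inverse of the empirical distribution function.

    qnorm(0.5)                 # F^{-1}(1/2) = 0 for the standard normal
    x <- rnorm(1000)
    quantile(x, 0.5, type = 1) # smallest observed t with ecdf(t) >= 0.5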

1.3 Absolutely continuous distributions


A random variable X is called absolutely continuous if there exists a non-negative so-called density function f such that for all t ∈ R,

    F(t) = ∫_{−∞}^{t} f(x) dx .

The plot below contains the graph of the density function f of some random variable X. The shaded area is P(X ≤ 3) = F_X(3) = ∫_{−∞}^{3} f(x) dx ≈ 0.93.

[Figure: density function f with the area under the curve up to t = 3 shaded.]


Do not worry about the integral sign if you are not familiar with it. The integral above
computes the area under the graph of f , bounded below by the x axis and to the right
by the value t. Absolute continuity of X means that we can find a function f such that
for each t, the value of F (t) = P(X ≤ t) is obtained by integrating f up to t.
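
As a numerical illustration in R (a sketch only: we take f to be the standard normal density dnorm, which need not be the density in the plot above), integrating the density up to t reproduces the distribution function:

    integrate(dnorm, lower = -Inf, upper = 1)  # area under f up to t = 1
    pnorm(1)                                   # F(1), approx. 0.841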

2 Moments and estimators


2.1 Mutual independence of random variables
The random variables X_1, . . . , X_d are called mutually independent if they have no influence on each other in the sense that for any k ≤ d and x_1, . . . , x_k ∈ R, we have that

    P(X_1 ≤ x_1, . . . , X_k ≤ x_k) = P(X_1 ≤ x_1) · · · P(X_k ≤ x_k) .

If the X_i are mutually independent, then knowing the value of some of them does not enable you to make a better prediction about the value of the rest.
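
A small simulation sketch in R (assuming two independent standard normal variables; the thresholds 0.5 and 1 are arbitrary) illustrates the product rule:

    x <- rnorm(1e5)
    y <- rnorm(1e5)                 # generated independently of x
    mean(x <= 0.5 & y <= 1)         # empirical P(X <= 0.5, Y <= 1)
    mean(x <= 0.5) * mean(y <= 1)   # product of the marginals; roughly equal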
For the remainder of Section 2, we restrict ourselves to absolutely continuous random
variables. We also assume that all integrals are finite.

2.2 Expected value and sample mean


The expected value of a random variable X with density f is defined as

    E(X) = ∫_{−∞}^{∞} x f(x) dx .

It quantifies the theoretical average of this random variable.


From an independent and identically distributed sample x_1, . . . , x_n from X, we can estimate the expected value with the sample average

    x̄ = (1/n) ∑_{i=1}^{n} x_i .
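
In R (a sketch with simulated data, assuming X ∼ N(5, 4), i.e. µ = 5 and σ = 2):

    x <- rnorm(50, mean = 5, sd = 2)  # sample of size n = 50
    mean(x)                           # sample average, estimates E(X) = 5
    sum(x) / length(x)                # the same, written out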

2.3 Variance and sample variance


The variance of a random variable X with expected value µ is defined as

    Var(X) = E((X − µ)²) .

It is the expectation of the squared distance between X and µ.


From an independent and identically distributed sample x_1, . . . , x_n from X, we can estimate the variance with the sample variance

    s_x² = (1/(n − 1)) ∑_{i=1}^{n} (x_i − x̄)² .

Its square root traditionally serves as estimator for the standard deviation.
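
In R, var and sd use exactly this 1/(n − 1) convention (a sketch assuming X ∼ N(0, 9)):

    x <- rnorm(100, mean = 0, sd = 3)
    var(x)  # sample variance s_x^2, estimates Var(X) = 9
    sd(x)   # its square root, estimates the standard deviation 3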

3
Module D2/3 2019 SS MSLS Tools from Probability Theory

2.4 Covariance and sample covariance


The covariance of two random variables X with expectation µ and Y with expectation ν is defined as

    Cov(X, Y) = E((X − µ)(Y − ν)) .

It is the expectation of the product of the centered variables X − µ and Y − ν. The covariance of independent random variables is zero. From the covariance, the correlation may be defined, but we do not need it here.

From an independent and identically distributed sample (x_1, y_1), . . . , (x_n, y_n) from (X, Y), we can estimate the covariance with the sample covariance

    s_xy = (1/(n − 1)) ∑_{i=1}^{n} (x_i − x̄)(y_i − ȳ) .
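
In R (a sketch with a constructed dependent pair; the factor 2 is an arbitrary choice, so Cov(X, Y) = 2 Var(X) = 2 here):

    x <- rnorm(100)
    y <- 2 * x + rnorm(100)  # built to depend on x
    cov(x, y)                # sample covariance s_xy, estimates 2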

3 Special distributions
All random variables in this chapter are absolutely continuous, but we do not need their
densities. Furthermore, their distribution functions are strictly increasing over their
respective support.

3.1 Normal distributions


A random variable X with a normal distribution is determined by its expectation µ and
variance σ 2 ; we write X ∼ N (µ, σ 2 ). The density function is bell-shaped and symmetric
about µ. The special case µ = 0 and σ² = 1 is called the standard normal distribution; we write Z ∼ N(0, 1):
[Figure: density function, cumulative distribution function (both plotted against t) and quantile function (plotted against p) of the standard normal distribution.]

The distribution function of Z ∼ N(0, 1) is denoted by Φ, i.e. Φ(t) = P(Z ≤ t). The quantile function of Z is denoted by Φ^{-1}. It is often assumed that residuals of a model come from a normal distribution.
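
In R, Φ and Φ^{-1} are available as pnorm and qnorm (see Section 3.5); a brief sketch:

    pnorm(0)       # Phi(0) = 0.5
    pnorm(1.96)    # Phi(1.96), approx. 0.975
    qnorm(0.975)   # Phi^{-1}(0.975), approx. 1.96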


3.2 Chi-squared distributions


Let Z_1, Z_2, . . . , Z_k be k ≥ 1 independent standard normal random variables. Then the sum

    X = Z_1² + · · · + Z_k²

has a chi-squared distribution with k degrees of freedom, symbolically X ∼ χ²(k). Only positive values are possible and the parameter k determines the location and shape of the distribution.
The norm of a vector x = (x_1, . . . , x_k) in k-dimensional space R^k is defined as

    ‖x‖ = √(x_1² + · · · + x_k²) .

If we interpret Z = (Z_1, . . . , Z_k) as a random vector in k dimensions, then the square of its length has a χ²(k) distribution, ‖Z‖² ∼ χ²(k). The density function is plotted for some values of k below.

[Figure: χ²(k) density functions for k = 1, 2, 3 (left) and for k = 10, 20, 30 (right).]

The chi-squared distribution occurs in the context of variances in linear models. This
hints at the deep connection between geometry and linear models.
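
A simulation sketch in R (k = 3 and the evaluation point 5 are arbitrary choices): squared norms of standard normal vectors behave like χ²(k) draws.

    k <- 3
    z <- matrix(rnorm(1e5 * k), ncol = k)  # 1e5 random vectors in R^k
    x <- rowSums(z^2)                      # their squared norms ||Z||^2
    mean(x <= 5)                           # empirical P(X <= 5)
    pchisq(5, df = k)                      # theoretical value, approx. 0.83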

3.3 t distributions
Let Z ∼ N(0, 1) and V ∼ χ²(k) be independent, then

    T = Z / √(V/k)

has a t distribution with k degrees of freedom, in symbols, T ∼ t(k). The bigger k, the more the shape resembles a normal distribution; the main difference is that the t distributions have much heavier tails than the normal distribution. The t distribution is related to the distribution of the estimator of the slope of a regression line.


[Figure: t(k) densities for k = 5, k = 10 and the normal density (left); zoom on the quantiles between p = 0.975 and p = 0.995 (right).]
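
A simulation sketch in R (k = 5 is an arbitrary choice) mirrors the construction above:

    k <- 5
    z <- rnorm(1e5)
    v <- rchisq(1e5, df = k)
    t_sim <- z / sqrt(v / k)   # should follow t(k)
    quantile(t_sim, 0.975)     # empirical 0.975 quantile
    qt(0.975, df = k)          # theoretical value, approx. 2.57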

3.4 F distributions
Let V ∼ χ²(p) and W ∼ χ²(q) be independent, then the random variable

    F = (V/p) / (W/q)

has an F distribution with p and q degrees of freedom, denoted as F ∼ F(p, q) (mind the order of the two degrees of freedom). The F distribution only takes positive values and often occurs as a ratio of two variances or sums of squares, which we will need when we test certain statements about the regression model.
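
The same construction can be checked by simulation in R (p = 4 and q = 20 are arbitrary choices):

    p <- 4; q <- 20
    v <- rchisq(1e5, df = p)
    w <- rchisq(1e5, df = q)
    f_sim <- (v / p) / (w / q)     # should follow F(p, q)
    mean(f_sim > qf(0.95, p, q))   # approx. 0.05 by construction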

3.5 Special distributions in R


The normal cumulative distribution function and the normal quantile function (and because of that, the t, χ² and F distributions) do not have simple analytic representations, but all of them are numerically available in R. Specifically, let X ∼ N(µ, σ²), then:

    R name   function                code             result
    dnorm    density function        dnorm(x, µ, σ)   f(x)
    pnorm    distribution function   pnorm(q, µ, σ)   F_X(q)
    qnorm    quantile function       qnorm(p, µ, σ)   F_X^{-1}(p)

Note that the second and third arguments are the expectation µ and the standard deviation σ, not the variance σ².

A function called rnorm can be used to simulate normally distributed random variables using pseudo-random numbers. (The naming scheme is generic: e.g. for the t distribution, the commands are dt, pt, qt and rt.)
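
For example (a short sketch, again with the hypothetical choice X ∼ N(5, 4), i.e. µ = 5 and σ = 2):

    dnorm(5, 5, 2)      # density at the mean, approx. 0.199
    pnorm(7, 5, 2)      # P(X <= 7) = Phi(1), approx. 0.841
    qnorm(0.841, 5, 2)  # approx. 7: qnorm inverts pnorm
    rnorm(3, 5, 2)      # three simulated realisations of X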
