Statistikskript VWL Final E v2 Slides


Statistics for

Economics

1
Contents
Statistics for Economics
Part I: Probability
1. Probability theory: the building blocks Slide 7
1.1. Events and the sample space Slide 8
1.2. Relations of set theory Slide 19
1.3. The concept of probability Slide 27
1.4. Axiomatic definition of probability Slide 41
1.5. Basic theorems Slide 44
1.6. Probability spaces Slide 50
1.7. Conditional probability and stochastic independence Slide 64
1.8. Law of total probability Slide 77
1.9. Bayes’ theorem Slide 85
2. Combinatorial methods Slide 93
2.1. Factorials and binomial coefficients Slide 94
2.2. Multiplication rule Slide 97
2.3. Permutations Slide 99
2.4. Combinations Slide 101
2.5. Sampling with replacement Slide 103

2
3. Random variables Slide 105
3.1. The (cumulative) distribution function Slide 117
3.2. Discrete random variables Slide 125
3.3. Continuous random variables Slide 129
3.4. The expectation of a random variable Slide 138
3.5. Variance Slide 156
3.6. Standardization Slide 169
4. Special distributions Slide 173
4.1. The uniform discrete distribution Slide 176
4.2. The Bernoulli distribution (discrete) Slide 183
4.3. The Binomial distribution (discrete) Slide 187
4.4. The Poisson distribution (discrete) Slide 195
4.5. The uniform continuous distribution Slide 201
4.6. The exponential distribution (continuous) Slide 205
4.7. The normal distribution (continuous) Slide 209
5. Multivariate random variables Slide 218
5.1. Joint distribution and marginal distributions Slide 223
5.2. Conditional distributions and stochastic independence Slide 240
5.3. Covariance and correlation Slide 248
5.4. Sums and sample means of random variables Slide 254
6. The Central Limit Theorem Slide 261
3
Part II: Statistics
7. Descriptive statistics Slide 278
7.1. Frequency tables, histograms, and empirical distributions Slide 281
7.2. Summarizing data using numerical techniques Slide 289
7.3. Boxplot Slide 302
7.4. Quantile-Quantile-plot Slide 305
7.5. Scatter diagram Slide 310
8. Estimation of unknown parameters Slide 313
8.1. Intuitive examples of estimators Slide 318
8.2. Properties of estimators Slide 328
8.3. Main methods to get estimators Slide 346
9. Confidence intervals Slide 361
9.1. The idea Slide 362
9.2. Example of a confidence interval
(mean of a distribution, large samples) Slide 366
9.3. Relation with testing hypotheses Slide 370

Part III: Exercises Slide 375

4
Part I: Probability

5
Reading: Chapter 1.3/4, DeGroot and Schervish
1. Probability theory:
building blocks

7
1.1. Events and the sample space

A (random) experiment is any process, real or
hypothetical, in which the possible outcomes can be
identified ahead of time:

• it is performed under clear rules;
• it can be repeated as often as necessary under
  the same conditions; and
• the outcome is unknown and cannot be predicted.

8
Definition

Every experiment has a number of possible


single outcomes (elementary events).

9
Definition

The collection of all possible outcomes of an


experiment is called the

sample space

of the experiment.

10
Example 1.1.1:
• When a six-sided die is rolled:
  S = {1, 2, 3, 4, 5, 6}
• If we flip a coin twice:
  S = {HH, HT, TH, TT}
• If we flip a coin until we get a head:
  S = {H, TH, TTH, TTTH, …}
(countably many elements)

11
• For the lifespan of a light bulb we have:
  S = {t : t ≥ 0} = [0, ∞) = ℝ⁺
  (uncountably many elements)
  → a continuum of outcomes.
• For modeling the behavior of a share price, a
  possible choice is
  S = {all functions: ℝ⁺ → ℝ⁺}.

12
Remark:

An experiment might be described by


several sample spaces.

13
Example 1.1.2:
Two coins are tossed:
→ if we are interested in the outcomes heads or tails
of the two coins:
S1 = {(H, H), (H, T), (T, H), (T, T)}
→ if, instead, we count the number of heads/tails:
S2 = {( 2, 0 ) , (1, 1), ( 0, 2)}
→ finally, if we only want to see whether they show the
same (s) or a different (d) result:
S3 = {{s}, {d}}.
14
Definition

An event A is a well-defined, arbitrary set of possible
outcomes of the experiment. It is a subset of the
sample space S.

We say that an event A occurred if the outcome of
the experiment is an element contained in A.
15
Definition

All events of an experiment with sample space S
form the set of events E(S)
(→ the set of all subsets of S).

16
We assume that two specific events must always be
contained in E:

1. The sample space, which as an event we call the


sure event: S є E;

2. The empty set, which as an event we call the


impossible event: Ø є E.

17
Reading: Chapter 1.4, DeGroot and Schervish
18
1.2. Relations of set theory

Complement: Ā
is the set that contains all elements of S
that do not belong to A.

19
Definition

The complement Ā = S\A occurs when A does not occur.

(Venn diagram: Ā is the part of S outside A.)

20
Definition
Union: A∪B (‘‘A or B’’) is defined to be the set
containing all outcomes that belong to A alone, to B
alone, or both A and B.

21
Definition
Intersection: A ∩ B (‘‘A and B’’) is defined to be the
set that contains all outcomes that belong both to A
and to B.

22
Definition

Two events A and D are called disjoint or


mutually exclusive if A and D have no outcomes in
common: A ∩ D = Ø.

Disjoint events 23
Definition

Difference: A\B (‘‘A without B’’) occurs if A, but not B,
occurs.

(Venn diagram: A\B is the part of A outside B.)

24
Definition

Containment: It is said that a set C is contained in a


set A if every element of C also belongs to the set A.

(Venn diagram: C inside A, inside S.)

C is contained in A  ⇔  C ⊂ A
25
Reading: Chapter 1.2, DeGroot and Schervish
26
1.3. The concept of probability

P: E → ℝ
A ↦ P(A)

is a real-valued function:
to each event in E is assigned exactly one
element of ℝ.
27
1) The subjective interpretation of probability

• is often used for one-time events;

• is the probability that a person assigns to a
  possible outcome of some process, representing
  her own judgement of the likelihood that the
  outcome will be obtained.

“P(A) = the degree of belief that someone holds
about the likelihood of A occurring’’

28
2) The frequency interpretation of probability

• the probability that some specific outcome of a
  process will be obtained is interpreted to mean the
  relative frequency with which that outcome would be
  obtained if the process were repeated a large number
  of times under similar conditions.

29
Definition

The limit

    P(A) = lim (n→∞) h_n(A)

denotes the frequentist probability of A, where

    h_n(A) = "relative frequency that A occurs".

30
Example 1.3.1:
A die is rolled 3,000 times in succession. A running
tally is kept of the number of times we get a ‘‘6’’.
What do we expect for P(‘‘6’’)?

P(‘‘6’’) = 1/6 ≅ 0.1666   → third concept:
“classical definition of probability”

31
Illustration of Example 1.3.1:

(Figure: running relative frequency of ‘‘6’’ over the
3,000 rolls, settling near 1/6.)
32
3) The classical interpretation of probability

• attributed to Laplace, 1812;
• nevertheless, Bernoulli had already discussed the
  same concept more than 100 years earlier.

33
The classical interpretation of probability is based
on the concept of equally likely outcomes.

If the outcome of some process must be one of n
different outcomes, and if these n outcomes are
equally likely to occur, then the probability that an
event A occurs is given by the ratio between the
number of outcomes in A and the total number of
outcomes n.

34
P(A) = (# outcomes belonging to A) / (# possible outcomes) = |A| / n
35
Definition

An experiment with finitely many equally likely
elementary events is called a

Laplace experiment.

36
Example 1.3.2:
We toss a die and a coin simultaneously.
We want to compute the probability of the event
A = "heads and number larger than 4"

Definition of the sample space:


Possible outcomes: ( H, ≤ 4 ) , ( H, > 4 ) , (T , ≤ 4 ) , (T , > 4 )
Are they equally likely? No!

37
→ Alternative: elementary events
  (H,1), (T,1), (H,2), (T,2), …, (H,6), (T,6)
  are equally likely!

We can use the Laplace theory:

⇒ A = {(H,5), (H,6)}  and  P(A) = 2/12 = 1/6

38
Example 1.3.3:
When flipping a coin twice:

S = {HH, HT, TH, TT}.

P["at least one head"] = 3/4

39
Reading: Chapter 1.5, DeGroot and Schervish
40
1.4. Axiomatic definition of probability
Definition
Each function P

    P: E → ℝ
    A ↦ P(A)

that assigns to each event A in E a real number is
called a probability function (or measure).
P(A) is called the probability of the event A when
the following axioms hold (Kolmogorov, 1933):

41
Definition (continued)

Axiom 1: P(A) ≥ 0, ∀ A є E

Axiom 2: P(S) = 1

Axiom 3: P(A ∪ B) = P(A) + P(B),
         when A ∩ B = Ø
         (addition rule for disjoint events)

42
Reading: Chapter 1.5, DeGroot and Schervish
1.5. Basic theorems

Theorem 1
The probability of the complement of an event A is
given by
P(Ā) = 1 - P(A), for each event A є E

Theorem 2
The probability of the impossible event is given by:
P(Ø) = 0

44
Theorem 3
For every finite sequence of n pairwise disjoint events
A_1, A_2, ..., A_n є E the probability of the union of the
events equals the sum of the individual probabilities.
That is

    P(A_1 ∪ A_2 ∪ … ∪ A_n) = Σ_{i=1}^{n} P(A_i).

Theorem 4
For an event resulting from a difference A\B we have
that P(A\B) = P(A) – P(A∩B).
45
Theorem 5 (addition rule)
For every two events A and B in E we have that
P(A∪B) = P(A) + P(B) – P(A∩B).

Theorem 6 (monotonicity property)
If an event A is contained in an event B, then its
probability will never be larger than that of B, that is

A ⊂ B  ⇒  P(A) ≤ P(B)

46
Example 1.5.1:

What is the probability that an arbitrarily chosen
number with three digits will have at least two of the
same digit? (Use the Laplace theory.)

Let us define the sample space as

S = {000, …, 999},   |S| = # outcomes in S = 10³ = 1000

47
We consider the event

A = ''number with at least two of the same digits'',

then (Th. 1):

P(A) = 1 − P(Ā) = 1 − |Ā| / |S|,   and

|Ā| = # three-digit numbers with all different digits
    = 10 · 9 · 8 = 720

⇒ P(A) = 1 − 720/1000 = 0.28
48
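A quick cross-check of this result (an added illustration, not part of the original slides) is to enumerate S = {000, …, 999} directly in Python; all names below are made up:

    hits = 0
    for n in range(1000):
        digits = f"{n:03d}"          # write n with leading zeros, as in S = {000, ..., 999}
        if len(set(digits)) < 3:     # at least two of the three digits coincide
            hits += 1
    print(hits / 1000)               # 0.28, matching 1 - 720/1000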
Reading: Chapter 1.6, DeGroot and Schervish
1.6. Probability spaces
Discrete probability spaces
Definition (discrete sample space)

A sample space S with a finite number or a


countable number of outcomes is called discrete.

50
Let us now consider a probability function P(·) that
satisfies the axioms.

Then we assign to each elementary event e_i a
probability p_i = P(e_i). The number p_i indicates the
probability that the outcome will be exactly e_i.

That is:
e_1    e_2    e_3    .....    e_i    .....
p_1    p_2    p_3    .....    p_i    .....

In order to satisfy the axioms of probability, the
numbers p_i must satisfy the following conditions:

51
(1) p_i ≥ 0 for each i = 1, 2, ...

(2) Σ_{all i} p_i = 1, because e_i ∩ e_j = Ø ∀ i ≠ j
    (e_i pairwise disjoint) and ∪_{all i} e_i = S, so that
    (Th. 3)  Σ_{all i} p_i = P(∪_{all i} e_i) = P(S) = 1

(3) P(A) = Σ_{e_i є A} p_i,

that is, the probability of each event A є E is computed
as the sum of the probabilities of all outcomes e_i
contained in A.
52
Example 1.6.1:
1. Probability space with m equally likely outcomes
(Laplace-experiment):

e_1, …, e_m;   p_i = 1/m   ∀ i = 1, …, m

⇒ Σ_{i=1}^{m} p_i = m · (1/m) = 1

53
2. Probability space with infinitely, countably many
outcomes:
Experiment: “Flip a coin until we get a head’’.

S = {H, TH, TTH, TTTH, TTTTH, …}
     e_1  e_2   e_3    e_4     e_5   ......

p_1 = P(H) = 1/2,  p_2 = P(TH) = 1/4,  p_3 = 1/8, …

p_i = P(TT…TH) = 1/2^i   (i−1 tails followed by a head)

⇒ Σ_{all i} p_i = Σ_{i=1}^{∞} 1/2^i = Σ_{i=1}^{∞} (1/2)^i = 1,

a geometric series that converges to 1!

54
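As an added illustration (not from the slides), a few lines of Python show how the partial sums of this geometric series approach 1:

    s = 0.0
    for i in range(1, 21):
        s += (1 / 2) ** i            # add p_i = 1/2^i
        print(i, s)                  # partial sums 0.5, 0.75, 0.875, ... -> 1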
Proof using the properties of the geometric
series or through visualization:

(Figure: a unit square of area 1 divided into pieces of
area 1/2 = p_1, 1/4 = p_2, 1/8, 1/16, …; the pieces fill
the square, so the series sums to 1.)

55
General probability spaces

Definition (continuous sample space)

A sample space S with uncountably many
outcomes is called continuous.

56
Example 1.6.2:

Consider a piece of the line going from 0 to 1 (closed
interval):

    [0, 1]

In this interval we can identify infinitely, uncountably
many points with zero length.

57
Let us now conduct an experiment constructed as
follows: choose randomly an arbitrary real number
0 ≤ a ≤ 1 in the interval [0,1].
In any case we have that
P(a) = 0   ∀ a є S = [0, 1].

We define the events:

A = {a | a < 0.4}
B = {a | 0.6 < a < 0.9}
C = {a | a > 0.8}
58
Intuition:
P(A) = 0.4;  P(B) = 0.3  and  P(C) = 0.2.  Why?
→ S can be seen as a Laplace continuous sample space,
where all real numbers are equally likely to be chosen.

Thus for an event A:  P(A) = length(A) / length(S);

P[B ∩ C] = length(B ∩ C) / length(S) = 0.1 / 1 = 0.1

(Figure: the interval [0, 1] with the events A, B, C marked.)

Th.5:
P[B ∪ C] = P(B) + P(C) − P[B ∩ C] = 0.3 + 0.2 − 0.1 = 0.4

59
Remark:
The intervals (events) can be open or closed.
For example

B+ = {a | 0.6 ≤ a ≤ 0.9} → P ( B+ ) = 0.3 = P ( B ) ,

In both cases the probability is the same, because


even though B+ has two additional points (boundaries),
their probability is “of zero measure’’.

60
2. Define  S = {(a, b) | 0 < a < 3 and 0 < b < 2}
           D = {(a, b) | 0 < a < b < 2}
           K = {(a, b) | (a−2)² + (b−1)² < 1}
               (circle included in the rectangle)

(Figure: the rectangle S in the (a, b)-plane with the
circle K inside it.)
61
Using the classic definition of probability:

P(D) = Area of D / Area of S = 2/6 = 1/3

P(K) = Area of K / Area of S = π/6 ≅ 0.5236

62
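A Monte Carlo sketch (added for illustration; sample size and names are arbitrary) approximates P(K) by drawing points uniformly from the rectangle S:

    import random

    random.seed(1)
    N = 200_000
    inside = 0
    for _ in range(N):
        a = random.uniform(0, 3)                 # uniform point in the rectangle S
        b = random.uniform(0, 2)
        if (a - 2) ** 2 + (b - 1) ** 2 < 1:      # the point falls inside the circle K
            inside += 1
    print(inside / N)                            # close to pi/6 = 0.5236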
Reading: Chapter 2.1/2, DeGroot and Schervish
1.7. Conditional probability and stochastic
independence

64
Example 1.7.1:
Rolling a die:

What is the probability of getting a “6’’?

→ P(“6’’) = 1/6

And when we additionally know that the resulting
number is even?

P(“6’’ | “result is an even number’’) = 1/3

65
Definition (conditional probability)

Suppose that we learn that an event B has occurred


and we wish to compute the probability of another
event A, taking into account our knowledge that B
has occurred.
This probability is called the conditional probability of
A given that B has occurred, is denoted by P(A|B),
and can be computed as

    P(A|B) = P(A ∩ B) / P(B),   if P(B) > 0,

and is not defined if P(B) = 0.

66
Example 1.7.2:

Suppose that two dice were rolled.


What is the probability (Laplace experiment) that at
least one of the dice results in a ‘‘6’’ if we already
observe that the sum of the two numbers is larger
than 9?

67
(1,1) (1,2) ...... ...... ...... (1,6)
(2,1) (2,2) ...... ...... ...... (2,6)
(3,1) (3,2) ...... ...... ...... (3,6)
(4,1) (4,2) ...... ...... (4,5) (4,6)
(5,1) (5,2) ...... ...... (5,5) (5,6)
(6,1) (6,2) ...... (6,4) (6,5) (6,6)

A = "at least one 6":   P(A) = 11/36

B = "sum > 9":   P(B) = 6/36   and   P(A ∩ B) = 5/36

⇒ P(A|B) = P(A ∩ B) / P(B) = (5/36) / (6/36) = 5/6
68
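The same conditional probability can be checked by listing the 36 equally likely outcomes; the short Python sketch below is an added illustration:

    outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]   # 36 equally likely pairs
    B = [o for o in outcomes if o[0] + o[1] > 9]                    # "sum > 9"
    A_and_B = [o for o in B if 6 in o]                              # "at least one 6" within B
    print(len(A_and_B) / len(B))                                    # 5/6 = 0.8333...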
Theorem 7 (multiplication rule)
Let A and B be events with positive probability. The
probability of the intersection of A and B is given by

P(A ∩ B) = P(A) · P(B | A)

or

P(B ∩ A) = P(B) · P(A | B).

69
Example 1.7.3:

An urn contains 4 colored balls, 3 of them being red and


1 blue.
What is the probability that, after randomly drawing 2
balls without replacement, 2 red balls are observed?

Let Ri = "i-th drawn ball is red" for i=1, 2. Then we have

P[R_1 ∩ R_2] = P[R_1] · P[R_2 | R_1] = (3/4) · (2/3) = 1/2.

70
Definition (stochastic independent events)

Two events A and B with positive probability are


(stochastically) independent if

P(A | B) = P(A).

Analogously, it also holds that

P(B | A) = P(B).

71
Theorem 8 (multiplication rule for independent events)

If two events A and B are independent, then it follows


that
P(A ∩ B) = P(A) · P(B).

72
Example 1.7.4:

A coin is flipped twice.


S= {HH, HT, TH, TT} , all equally likely (Laplace).
Consider the events: A="head in 1st toss" = {HH, HT}
B="head in 2nd toss" = {HH, TH}
Then:  P(A) = P(B) = 2/4 = 1/2   and

P(A ∩ B) = P({HH}) = 1/4 = P(A) · P(B).
Thus, A and B are independent (as expected).

73
Example 1.7.5:

See exercise 1.6.1 b) “discrete probability spaces’’


(24 tosses of 2 dice; consider the sum of the numbers
→ compute P["at least one result equal 12"] = ? ).

74
Remark:
Stochastic independence is not a transitive relation!
From "A and B independent" and "B and C
independent" does not necessarily follow
"A and C independent"!

Example: Rolling two dice.

Let A be the event "1,2 or 3 in first die", B "4, 5 or 6 in


second die", and C "4,5 or 6 in first die".

Clearly A and B as well as B and C are independent,


but in any case A and C will be dependent.
75
Reading: Chapter 2.1, DeGroot and Schervish
1.8. Law of total probability
Example 1.8.1: (identifying defective items)
A manufactured article can be produced using two
different machines.
Machine M1 produces twice as many articles as the
slower machine M2, but 10% of M1's articles are
defective while only 7% of M2's articles are defective.

What is the probability that a randomly chosen article


from the total production is defective?
77
Known proportions:

machine   production   defective items
M1          2/3           10%
M2          1/3            7%

P("defective article" | M1) = 0.1 ;
P("defective article" | M2) = 0.07
P("article produced by M1") = 2/3
P("article produced by M2") = 1/3

→ P(A = ‘‘defective article") = ?

A = "defective article",   M_i : "produced by M_i"

(Figure: S partitioned into M1 and M2, with the pieces
A ∩ M1 and A ∩ M2 of the event A.)
78
We have (Theorem 7, multiplication rule):

P(A ∩ M_i) = P(A | M_i) · P(M_i),   i = 1, 2

Following from Axiom 3:

P(A) = P(A ∩ M1) + P(A ∩ M2)
     = P(A | M1) · P(M1) + P(A | M2) · P(M2)
     = 0.1 · (2/3) + 0.07 · (1/3) = 0.09
79
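The computation is a two-term law-of-total-probability sum; the following Python sketch (illustrative, names invented) reproduces P(A) = 0.09:

    p_machine = {"M1": 2 / 3, "M2": 1 / 3}      # P(M_i)
    p_def = {"M1": 0.10, "M2": 0.07}            # P(A | M_i)
    p_A = sum(p_def[m] * p_machine[m] for m in p_machine)
    print(p_A)                                  # 0.09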
Definition (partition)

Let S denote the sample space of some experiment,


and consider n events H1, H2,..., Hn in S such that

Hi ∩ Hj = Ø for i ≠ j (pairwise disjoint)


and
H1 ∪ H2 ∪ ... ∪ Hn = S.

It is said that these events form a partition of S.

80
Theorem 9 (law of total probability)

Suppose that the events H1, H2, ... , Hn form a


partition of the sample space S, and that all have
positive probability. Then, for every event A є E,

    P(A) = Σ_{j=1}^{n} P(A | H_j) · P(H_j)

(Figure: S partitioned into H_1, H_2, …, H_n, with the
event A overlapping several of the H_j.)
81
Example 1.8.2:
In Example 1.7.3, an urn with 4 balls (3 red and 1
blue):

R_1 = "first ball drawn is red"
and
R̄_1 = "first ball drawn is blue"

form a partition of S.
We can therefore compute the probability of the event
R_2 = "second ball drawn is red" as

P(R_2) = P(R_2 | R_1) · P(R_1) + P(R_2 | R̄_1) · P(R̄_1)
       = (2/3) · (3/4) + (3/3) · (1/4) = 3/4
82
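The partition argument can also be verified by brute force over all ordered draws; the Python sketch below is an added illustration that treats the three red balls as distinguishable:

    from itertools import permutations

    balls = ["r", "r", "r", "b"]                 # 3 red balls, 1 blue ball
    draws = list(permutations(balls, 2))         # 12 equally likely ordered pairs
    p_r2 = sum(1 for d in draws if d[1] == "r") / len(draws)
    print(p_r2)                                  # 0.75 = 3/4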
Remark:
Computations like those appearing in the law of total
probability can often be visualized using a tree
diagram.

R_1 → R_2    red-red
R_1 → R̄_2    red-blue
R̄_1 → R_2    blue-red
R̄_1 → R̄_2    blue-blue (impossible)

83
Reading: Chapter 2.3, DeGroot and Schervish
1.9. Bayes’ theorem
Bayes’ theorem:

Let the events H1,...,Hn form a partition of the


space S such that P(Hi) > 0, for all i=1,…,n,
and let B be an event such that P(B) > 0.
Then, for each Hi

    P(H_i | B) = P(B | H_i) · P(H_i) / Σ_{j=1}^{n} P(B | H_j) · P(H_j)

85
Example 1.9.1: (identifying defective items)
In Example 1.8.1 the probability that an article randomly
chosen from the total production is produced by
machine M1 was (a-priori)
P("article produced by machine M1") = 2/3 ≅ 0.66.
If we now observe that the chosen article is defective,
we will surely increase that probability, given that
machine M1 leaves a larger portion of defective items
behind.

86
Example 1.9.1 (continued):
From Bayes’ theorem we get:

P("article from M1" | "article defective")
    = 0.1 · (2/3) / (0.1 · (2/3) + 0.07 · (1/3))
    = 20/27 ≅ 0.741

In Bayes’ language, H1,...., Hn are called alternative


hypotheses, P(Hi) is called the prior probability of the i-
th hypothesis and P(Hi|B) is called the posterior
probability of the i-th hypothesis after having observed
that B has occurred.
87
Example 1.9.2: (test for a disease)
The following numbers are known about a free
medical test for a certain disease:

• If a person has the disease, there is a probability
  of 90% that the test will give a positive response.

• If a person does not have the disease, there is a
  probability of 99% that the test will be negative.

• The chances of having the disease are only 0.1%.


88
Example 1.9.2 (continued):
Given that the test is free, fast and harmless, you
decide to take it. A few days later you learn that you
had a positive response to the test.

What is the probability that you have the disease?

Let:  H_1 = "you have the disease";
      B = "test is positive"

→ P(B | H_1) = 0.9 ;     P(B̄ | H_1) = 0.1 ;
  P(B̄ | H̄_1) = 0.99 ;    P(B | H̄_1) = 0.01
89
Example 1.9.2 (continued):

→ P(H_1) = 0.001  and  P(H̄_1) = 0.999      (prior probabilities)

Law of total probability:

⇒ P(B) = P(B | H_1) · P(H_1) + P(B | H̄_1) · P(H̄_1)
       = 0.9 · 0.001 + 0.01 · 0.999 = 0.01089,   and

Bayes’ theorem:

⇒ P(H_1 | B) = P(B | H_1) · P(H_1) / P(B)
             = 0.9 · 0.001 / 0.01089 ≅ 0.0826,

that is, the reliability of a positive response
is about 8.3%.
90
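The whole calculation fits in a few lines; this Python sketch (added for illustration, names invented) reproduces P(B) and the posterior P(H_1 | B):

    p_h1 = 0.001                       # prior P(H_1): having the disease
    p_pos_given_h1 = 0.90              # P(B | H_1)
    p_pos_given_not_h1 = 0.01          # P(B | complement of H_1)
    p_b = p_pos_given_h1 * p_h1 + p_pos_given_not_h1 * (1 - p_h1)   # law of total probability
    posterior = p_pos_given_h1 * p_h1 / p_b                         # Bayes' theorem
    print(p_b, posterior)              # 0.01089 and about 0.0826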
Tree diagram:
(Tree: first branch on the condition, H_1 vs. H̄_1; then
on the test result, B vs. B̄.)
91
Reading: Chapter 1.7/8/9, DeGroot and Schervish
2. Combinatorial methods

93
2.1. Factorials and binomial coefficients

Definition

The notation n! indicates the product of all integer


numbers between 1 and n, that is
n! = 1 · 2 · 3 · … · (n−1) · n
and is read n factorial.

Moreover, we assume that: 0! = 1

94
Definition

The binomial coefficient, denoted by the symbol
(n over k), is defined for integers n > 0 and k ≥ 0
with n ≥ k by

    (n over k) = n! / (k! · (n − k)!)

95
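In Python, factorials and binomial coefficients are available in the standard library; a minimal sketch added for illustration:

    import math

    print(math.factorial(5))      # 5! = 120
    print(math.comb(5, 2))        # "5 choose 2" = 5! / (2! * 3!) = 10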
2.2. Multiplication rule
Suppose that an experiment has k parts, that the i-th part
of the experiment can have ni possible outcomes, and
that all the outcomes in each part can occur regardless of
which specific outcomes have occurred in the other parts.

Then, the total number of outcomes of the experiment will


be equal to the product

n_1 · n_2 · n_3 · … · n_k.

97
2.3. Permutations
Definition

Suppose that a set has n elements.


Suppose that an experiment consists of selecting k of the
elements at a time without replacement.
Let each outcome consist of the k elements in the order
selected.
Each such outcome is called a permutation of n
elements taken k at a time.
The number of permutations is given by n! / (n – k)!
99
2.4. Combinations

Definition

Consider a set with n elements.


Each subset of size k chosen from this set is called
a combination of n elements taken k at a time.

The number of distinct subsets of size k that can be


chosen from a set of size n is given by the binomial
coefficient (n over k) and therefore equals n! / (k! · (n−k)!).

101
2.5. Sampling with replacement

Definition

How many distinct sequences of length m can we
obtain using a set of size n (with replacement)?   n^m

And if we do not care about the ordering in the
different samples?   (n+m−1 over m)

103
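Both counts are easy to evaluate; the sketch below (added for illustration, with the arbitrary values n = 6 and m = 3) uses exactly the two formulas just stated:

    import math

    n, m = 6, 3
    ordered = n ** m                        # ordered samples with replacement: n^m = 216
    unordered = math.comb(n + m - 1, m)     # ordering ignored: (n+m-1 over m) = 56
    print(ordered, unordered)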
Reading: Chapters 3 and 4, DeGroot and Schervish
3. Random variables

105
Definition

Let us consider a probability space [S, E, P(·)].
A real-valued function

    X: S → ℝ
    e ↦ X(e) є ℝ,

assigning to each elementary event e in S a real
number X(e) is called a random variable when
an event A_r є E with A_r = {e | X(e) ≤ r} can be
defined for every arbitrary real number r.

106
(Figure: elements of S mapped by X to points x є ℝ
on the real line.)

107
Example 3.0.1:
If we roll one die once, we have that

S = {1, 2,…,6}.

The number resulting on the die defines a random


variable that can be described by the function

X (e) = e .

108
Example 3.0.2:
A coin is tossed once.
Let X denote the number of heads. X has only
two possible values:
X ( "tail" ) = 0 und X ( "head" ) = 1.
The set of events E includes the four events:
E = {0,
/ "tail", "head", S} .

We have: if -∞ < r < 0 then A r =


0/
if 0 ≤ r < 1 then A r =
" tail"
if 1 ≤ r < ∞ then A r =
S.
109
Example 3.0.3:
Two dice are rolled once. The sample space contains
the following 36 single outcomes:
S= {( i, j) | i=1,...,6, j=1,...,6}.
We can consider different random variables, such as:
X = "sum of the numbers":
X ( i, j ) = i+j = x, x = 2, 3,…,12.
Y = "absolute difference of the numbers":
Y ( i, j ) = i-j = y, y = 0, 1,…,5.
110
Definition
Let W be the set of values a given function can take.
W is called the image (or range) of the function.

We say that a given random variable X is discrete if X


can take only a finite number or a countably infinite
number of different values (i.e. W ⊂ ℝ is discrete).

If the image W ⊂ ℝ of a random variable X contains an


interval of the real line (or the whole real line), then X is
called continuous (uncountably many values in W).
111
Example 3.0.4: (sick notes, see exercise 1.5.2)

O: "number of sick notes"

S = {{-}, {X},{Y},{Z},{XY},{XZ},{YZ},{XYZ}}
→ W = {0, 1, 2, 3}

112
Example 3.0.5: (revenue under uncertain conditions)

The total revenue of a company results from the order


volumes of few major contracts.
For next year the company management hopes to get
three big orders A, B, and C.
The management estimates the chances of success of
getting each specific order differently.
Let us assume for simplicity that getting order A, B, or
C are independent events.

113
Example 3.0.5 (continued):

Order Order volume Probability of


[Mio. CHF] getting the order

A 10 0.8
B 14 0.5
C 24 0.75

The revenue is a random variable X that depends on


the chances of success of the orders:
P(X = 34) = P(A ∩ B̄ ∩ C) = 0.8 · (1 − 0.5) · 0.75 = 0.3

Image of X ?
114
Order positions (e_i)    Revenue X(e_i)    P({e_i})
−                              0             0.025
A                             10             0.1
B                             14             0.025
C                             24             0.075
AB                            24             0.1
AC                            34             0.3
BC                            38             0.075
ABC                           48             0.3
                                              Σ = 1

W = {0, 10, 14, 24, 34, 38, 48}

→ P(X = 24) = P({C, AB}) = 0.175.
115
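The table can be generated mechanically by running over the eight order constellations; the following Python sketch (added for illustration, names invented) rebuilds the distribution of X:

    from itertools import product

    orders = {"A": (10, 0.8), "B": (14, 0.5), "C": (24, 0.75)}   # volume, success probability
    dist = {}
    for won in product([True, False], repeat=3):                 # which of A, B, C are obtained
        revenue, prob = 0, 1.0
        for got, (volume, p) in zip(won, orders.values()):
            revenue += volume if got else 0
            prob *= p if got else (1 - p)
        dist[revenue] = dist.get(revenue, 0) + prob
    print(sorted(dist))      # image W = [0, 10, 14, 24, 34, 38, 48]
    print(dist[24])          # P(X = 24) = 0.175 (constellations C and AB)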
Reading: Chapter 3.3, DeGroot and Schervish
3.1. The (cumulative) distribution function

Definition

The (cumulative) distribution function (for short


c.d.f.) F of a random variable X is the function
F(x) = P(X ≤ x) = P({e|X(e) ≤ x}), for all x є ℝ,
that assigns to each real value x the probability that
the random variable takes a value X ≤ x.

117
Example 3.1.1:
X = “number of heads when tossing a coin once’’.

X can only take the values 0 or 1.


If the coin is fair, that is P(‘‘head’’) = 0.5, then:
         0   , if x < 0
F(x) =   1/2 , if 0 ≤ x < 1
         1   , if x ≥ 1

(Figure: graph of F(x), a step function jumping from 0
to 1/2 at x = 0 and from 1/2 to 1 at x = 1.)

118
Example 3.1.2:
Two dice are rolled once.
Let Y be ‘‘the absolute difference between the two numbers’’.
Then:
Y = 0 , if (i, j) = (1,1); (2,2); (3,3); ...; (6,6) : 6 couples
Y = 1 , if (i, j) = (1,2); (2,1); ... : 10 couples
Y = 2 , if (i, j) = (1,3); (3,1); (2,4); ... : 8 couples
Y = 3 , if (i, j) = (1,4); (4,1); ... : 6 couples
Y = 4 , if (i, j) = (1,5); (5,1); (2,6); (6,2) : 4 couples
Y = 5 , if (i, j) = (1,6); (6,1) : 2 couples
36 couples
119
Example 3.1.2 (continued):

Thus:   P(Y = 0) = 6/36 = 1/6 ;      P(Y = 3) = 6/36 = 1/6 ;
        P(Y = 1) = 10/36 = 5/18 ;    P(Y = 4) = 4/36 = 1/9 ;
        P(Y = 2) = 8/36 = 2/9 ;      P(Y = 5) = 2/36 = 1/18

120
          0     , y < 0
          1/6   , 0 ≤ y < 1
          4/9   , 1 ≤ y < 2
⇒ F(y) =  2/3   , 2 ≤ y < 3
          5/6   , 3 ≤ y < 4
          17/18 , 4 ≤ y < 5
          1     , y ≥ 5

121
(Figure: graph of F(y), a step function with jumps at
y = 0, 1, 2, 3, 4, 5.)

122
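The probability function and the c.d.f. of Y can also be tabulated by enumeration; the short Python sketch below (added for illustration, using exact fractions) reproduces the values above:

    from fractions import Fraction

    counts = {}
    for i in range(1, 7):
        for j in range(1, 7):
            d = abs(i - j)                      # Y = absolute difference of the two numbers
            counts[d] = counts.get(d, 0) + 1
    acc = Fraction(0)
    for y in sorted(counts):
        p = Fraction(counts[y], 36)             # probability function value f(y)
        acc += p                                # c.d.f. F(y) at the jump points
        print(y, p, acc)                        # 0 1/6 1/6, 1 5/18 4/9, ..., 5 1/18 1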
Properties of the distribution function:

(1) Continuity from the right: a c.d.f. is always continuous
    from the right; that is

    lim (Δx↓0) F(x + Δx) = F(x)   at every point x.

(2) Nondecreasing: the function F(x) is nondecreasing as
    x increases; that is

    F(a) ≤ F(b) for any arbitrary a < b.

(3) Limits: F(x) has the limits:

    lim (x→−∞) F(x) = 0   and   lim (x→+∞) F(x) = 1.
123
Reading: Chapter 3.1/3, DeGroot and Schervish
3.2. Discrete random variables
Definition
If a random variable X has a discrete distribution,
the probability (mass) function of X is defined as
the function f such that for every real number x
f(x) = P [X=x].

Clearly, the values pi = P(X = xi) are positive only at the points x = xi belonging to the image W of X:

                    pi , if x = xi ∈ W,
f(x) = P(X = x) =
                    0  , else.
125
Every probability function f satisfies the properties:

(1) f(xi) ≥ 0 (probabilities non-negative);

(2) ∑_{all i} f(xi) = 1 (sure event has probability 1).

From (1) and (2) we get directly the following


property:

(3) f(xi) ≤ 1.
126
Remark:
For real-valued intervals, we can generally compute the probabilities using the following formula:

P(a < X ≤ b) = F(b) − F(a) = ∑_{a < xi ≤ b} p(xi)

For discrete random variables, the formula has to be modified when the boundaries are included/excluded:

P(a < X < b) = F(b) − F(a) − f(b)
P(a ≤ X ≤ b) = F(b) − F(a) + f(a)
P(a ≤ X < b) = F(b) − F(a) + f(a) − f(b)

127
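These boundary corrections are easy to verify numerically, for instance for the dice-difference variable Y of Example 3.1.2; the following sketch (assuming NumPy) is only an illustration:

    import numpy as np

    y_vals = np.array([0, 1, 2, 3, 4, 5])
    p = np.array([6, 10, 8, 6, 4, 2]) / 36       # probability function of Y

    def F(y):                                    # c.d.f. of Y
        return p[y_vals <= y].sum()

    def f(y):                                    # probability function of Y
        return p[y_vals == y].sum()

    a, b = 1, 4
    print(F(b) - F(a))                           # P(a < Y <= b)
    print(F(b) - F(a) - f(b))                    # P(a < Y < b)
    print(F(b) - F(a) + f(a))                    # P(a <= Y <= b)
    print(F(b) - F(a) + f(a) - f(b))             # P(a <= Y < b)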
Reading: DeGroot and Schervish, Chapter 3.2/3.
3.3. Continuous random variables

Definition

Let X be a continuous random variable with distribution function F. The first derivative of the function in x,
f(x) = (d/dx) F(x),
is called the (probability) density function of X.
In this case:
P(a ≤ X ≤ b) = ∫_a^b f(x) dx.
129
Every density function satisfies the following properties:

(1) f(x) ≥ 0 (distribution function nondecreasing)

(2) ∫_{−∞}^{+∞} f(x) dx = 1 (area under the density is exactly 1)

130
Example 3.3.1:

A continuous random variable X is characterized by


the distribution function

        0                    , x < 0
F(x) =  (1/27)·(x − 3)³ + 1  , 0 ≤ x < 3
        1                    , x ≥ 3

131
Example 3.3.1 (continued):

Computation of the density function:


(take the derivative of F(∙) in each part of the c.d.f.)

        0               , x < 0
f(x) =  (1/9)·(x − 3)²  , 0 ≤ x < 3
        0               , x ≥ 3

132
Example 3.3.1 (continued):
[Figure: the c.d.f. F(x) (left) and the density f(x) (right) of Example 3.3.1, plotted for x between 0 and 3.]

133
What is the probability P (1 ≤ X ≤ 2 ) ?

P(1 ≤ X ≤ 2) = F(2) − F(1) = (−1/27 + 1) − ((−2)³/27 + 1) = 7/27 = 0.2593

or

P(1 ≤ X ≤ 2) = ∫_1^2 (1/9)·(x − 3)² dx = [ (1/27)·(x − 3)³ ]_1^2 = −1/27 − (−8/27) = 7/27.

In both cases we clearly get the same result!


The information content of the two functions F and f is
the same.
134
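As a small sanity check (not in the original slides), the same probability can be reproduced numerically with SciPy's quadrature:

    from scipy.integrate import quad

    f = lambda x: (x - 3) ** 2 / 9          # density on [0, 3)
    F = lambda x: (x - 3) ** 3 / 27 + 1     # c.d.f. on [0, 3)

    p_int, _ = quad(f, 1, 2)                # integrate the density
    p_cdf = F(2) - F(1)                     # difference of the c.d.f.
    print(p_int, p_cdf)                     # both equal 7/27 = 0.2593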
Example 3.3.2: (waiting time at the ‘S-Bahn’ station)

Trains come every 12 minutes at a given


‘S-Bahn’ station. Suppose you do not know the exact
schedule and arrive at the station at a randomly
chosen point in time.

Let us define
X = ‘‘waiting time at the station’’
as the random variable of interest.
The image set of X is W = [0,12] (minutes).
135

Density function:   f(x) =  1/12 , if x ∈ [0, 12]
                            0    , else.

Distribution function:

        0     , x < 0
F(x) =  x/12  , 0 ≤ x ≤ 12      ← ∫_0^x (1/12) du = [ u/12 ]_0^x = x/12
        1     , x > 12

Then:  P(10 < X < 15) = F(15) − F(10) = 1 − 10/12 = 0.1667
       P(X > 9) = 1 − F(9) = 1 − 9/12 = 0.25
136
Reading: DeGroot and Schervish, Chapter 4.1/2.
3.4. The expectation of a random variable

Definition
Let X be a random variable and f be its probability
or density function (discrete or continuous X).

The expected value of X is defined as

E[X] = ∑_{all j} xj·f(xj) = ∑_{all j} xj·pj , if X discrete;

E[X] = ∫ x·f(x) dx , if X continuous.

It is usually denoted by μX.
138
The expectation of a random variable can be regarded
as being the center of gravity of that distribution.

Theorem:

The expected value of the difference between


the random variable X and its expectation μx
equals zero, that is

E[X - μx] = 0 (central property).

139
Example 3.4.1:

Let X be a random variable with probability function

        x/3 , x = 1, 2
f(x) =
        0   , else.

⇒ E[X] = 1·(1/3) + 2·(2/3) = 5/3

140
Example 3.4.2: (rolling two dice)

Y = "absolute difference between the two numbers"


We computed the probability function as follows:
Y       0      1      2      3      4      5
f(yi)   6/36   10/36  8/36   6/36   4/36   2/36

⇒ E[Y] = 0·(1/6) + 1·(5/18) + 2·(2/9) + 3·(1/6) + 4·(1/9) + 5·(1/18)
       = (5 + 8 + 9 + 8 + 5)/18 = 35/18
141
Example 3.4.3:

Let X be a random variable with density function

        (1/4)·x , 1 ≤ x ≤ 3
f(x) =
        0       , else.

⇒ E[X] = ∫_{−∞}^{∞} x·f(x) dx = ∫_1^3 (1/4)·x² dx = [ x³/12 ]_1^3
       = (27 − 1)/12 = 26/12 = 13/6.

142
The expectation of a function of a random
variable (law of the unconscious statistician)

Let X be a random variable with probability


function or density function f, and g(X) be a real-
valued function. Then:

E[g(X)] = ∑_{all j} g(xj)·pj , if X discrete;

E[g(X)] = ∫ g(x)·f(x) dx , if X continuous.


143
Example 3.4.4:
Breakdowns are observed during the activity of a
production center.

Analyzing past data, we get for


X= "number of breakdowns per day"
the following probability function:

X 0 1 2 3
f( x ) 0.35 0.4 0.15 0.1

144
Example 3.4.4 (continued):

To eliminate the breakdowns the firm incurs the following costs:

g(x) = 5 − 4/(x + 1)   (per thousand CHF).

If we first compute
E[X] = 0·0.35 + 1·0.4 + 2·0.15 + 3·0.1 = 1
(the expected number of breakdowns) and plug this value into the cost function, we get expected costs
g(E[X]) = 5 − 4/(1 + 1) = 3.
145
Example 3.4.4 (continued):

But:
The correct way to compute the expected costs is

E[g(X)] = g(0)·0.35 + g(1)·0.4 + g(2)·0.15 + g(3)·0.1
        = 1·0.35 + 3·0.4 + (11/3)·0.15 + 4·0.1
        = 2.5.

146
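The gap between E[g(X)] and g(E[X]) is easy to reproduce; a minimal sketch in Python (assuming NumPy):

    import numpy as np

    x = np.array([0, 1, 2, 3])                # breakdowns per day
    p = np.array([0.35, 0.4, 0.15, 0.1])      # probability function
    g = lambda x: 5 - 4 / (x + 1)             # cost function (thousand CHF)

    print(g(p @ x))                           # g(E[X]) = 3.0  (the shortcut)
    print(p @ g(x))                           # E[g(X)] = 2.5  (the correct value)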
Example 3.4.5:

X       0     1     2     3
fx(x)   0.1   0.3   0.2   0.4

Y = (X − 2)²  →  fy ?    Wy = {0, 1, 4}

Y       0     1     4
fy(y)   0.2   0.7   0.1        → E[Y] = 0·0.2 + 1·0.7 + 4·0.1 = 1.1

147
Computing expectations: linear function

Theorem

Let X be a random variable with expected value E[X]


and a, b two real-valued finite constants, if
Y = aX + b,
then the expected value of Y equals
E[Y] = E[aX + b] = a·E[X] + b.

148
Example 3.4.6:

Let X be a random variable with density function


         e^(−x) , x ≥ 0
fx(x) =                          and   Y = 2X + 1
         0      , else

What is E[Y]? Linear case: two ways.

E[Y] = E[2X + 1] = 2·E[X] + 1 = 3

E[X] = ∫_0^{+∞} x·e^(−x) dx = [ −x·e^(−x) ]_0^{+∞} + ∫_0^{+∞} e^(−x) dx      (integration by parts)
     = [ −e^(−x) ]_0^{∞} = 1

Computing the density fy = ?
149


Theorem (density transformation)

Assume that some regularity conditions on the function g(∙) are satisfied. Then

fy(y) = fx(g⁻¹(y)) · | d g⁻¹(y) / dy | .

Example 3.4.6 (continued):

→ y = g(x) = 2x + 1  →  g⁻¹(y) = (y − 1)/2

⇒ fy(y) = e^(−(y−1)/2) · (1/2) , if (y − 1)/2 ≥ 0 ⇔ y ≥ 1.
150
Example 3.4.6 (continued):

Do we have a density function?

∫_1^∞ (1/2)·e^(−(y−1)/2) dy = [ −e^(−(y−1)/2) ]_1^∞ = 1  ✓

Then:  E[Y] = (1/2)·∫_1^∞ y·e^(−(y−1)/2) dy
            = [ −y·e^(−(y−1)/2) ]_1^∞ + ∫_1^∞ e^(−(y−1)/2) dy      (integration by parts)
            = 1 + 2·[ −e^(−(y−1)/2) ]_1^∞ = 1 + 2 = 3.
151
Example 3.4.7:
Let us consider a random variable X with p.f.
X -1 0 1.5 2
f (x) 0.3 0.1 0.4 0.2

⇒ E[X] = (−1)·0.3 + 0·0.1 + 1.5·0.4 + 2·0.2 = 0.7
  E[X + 3] = E[X] + 3 = 3.7
  E[4X] = 4·E[X] = 4·0.7 = 2.8

152
Example 3.4.8: (rolling die game)

Player 1 promises Player 2 that he will pay the


following amounts when rolling one die:

10 cents, if the number is a 1 or 2;


20 cents, if the number is a 3 or 4;
40 cents, if the number is a 5; and
80 cents, if the number is a 6.

How much does Player 2 have to pay before each


roll of the die such that the game is fair?

153
Example 3.4.8 (continued):
“Fair game’’ means that the fee one has to pay
exactly equals the expected gain. Let X denote the
gain (in cents).
X = x    10    20    40    80
f(x)     2/6   2/6   1/6   1/6

Then, the expected gain is given by

E[X] = 10·(2/6) + 20·(2/6) + 40·(1/6) + 80·(1/6) = 30.
Thus the fee must be set equal to 30 cents to get a
fair game. (→ Casino-games are not fair!) 154
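A small simulation (not part of the slides, assuming NumPy) makes the idea of a fair fee tangible:

    import numpy as np

    rng = np.random.default_rng(1)
    pay = np.array([0, 10, 10, 20, 20, 40, 80])   # pay[k] = gain in cents for face k
    rolls = rng.integers(1, 7, size=1_000_000)    # one million simulated rolls
    print(pay[rolls].mean())                      # close to the exact value E[X] = 30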
Reading: DeGroot and Schervish, Chapter 4.3.
3.5. Variance

Definition
Let X be a random variable with finite mean μx.
The variance of X is defined as follows:
σx² = V(X) = E[(X − μx)²],

provided that the sum or the integral exists.


If the mean of X does not exist, we say that V(X)
does not exist as well.
The standard deviation of X is the nonnegative
square root of V(X), that is σx = + √ V(X) .
156
The variance of a random variable is a measure
of how spread out the distribution of X is.
It can be computed by:
σx² = ∑_{all j} (xj − μx)²·pj , if X discrete;

σx² = ∫ (x − μx)²·f(x) dx , if X continuous.

157
Example 3.5.1: (flipping a coin)

Let X be the discrete random variable defined as


X = “number of heads when flipping a coin twice’’

Then:   X       0     1     2
        f(x)    1/4   1/2   1/4

and μx = 1

σx² = (0 − 1)²·(1/4) + (1 − 1)²·(1/2) + (2 − 1)²·(1/4) = 1/2

158
Example 3.5.2: (rolling two dice)

Let us consider again (see Example 3.4.2)


Y = “absolute difference of the numbers’’.

Above we computed that


E[Y] = 35/18.

Then, we get
V(Y) = σY² = 2.05247.

159
Example 3.5.3: (continuous case)

Let X be a continuous random variable with density


function given by

        c·(x − (1/2)·x²) , 0 < x < 2
f(x) =
        0                , else.

160
Example 3.5.3 (continued):

What is the value of the constant c such that the


function f is a density function?

c·∫_0^2 (x − (1/2)·x²) dx = c·[ x²/2 − x³/6 ]_0^2 = c·(2 − 8/6) = c·(2/3) ≝ 1

⇒ c = 3/2

161
Example 3.5.3 (continued):

Then:

E[X] = (3/2)·∫_0^2 (x² − (1/2)·x³) dx = (3/2)·[ x³/3 − x⁴/8 ]_0^2 = (3/2)·(8/3 − 2) = 1

and

V(X) = (3/2)·∫_0^2 (x − 1)²·(x − (1/2)·x²) dx = (3/2)·∫_0^2 (x − (5/2)·x² + 2·x³ − (1/2)·x⁴) dx
     = (3/2)·(2 − 40/6 + 16/2 − 32/10) = 1/5 ;      σx = √(1/5) = 0.4472.

162
Computation of variances: simple rules

Theorem (linear function)

Let X be a random variable with existing variance


V(X), and a and b two real constants. Then
Y = aX + b
has the variance
V(Y) = a²·V(X)
and standard deviation
σY = |a|·σx.

163
Example 3.5.4:
Let us consider the random variable X with p.f.
X 6060 6100 6140
f (x) 0.2 0.3 0.5

For the computation of the variance V(X) let us first


consider the linear transformation
Y = (X − 6100)/40
with associated probability function
Y -1 0 1
f ( y ) 0.2 0.3 0.5 164
Example 3.5.4 (continued):

We then get:

E[Y] = 0.3
V(Y) = (−1 − 0.3)²·0.2 + (0 − 0.3)²·0.3 + (1 − 0.3)²·0.5 = 0.61

⇒ V(X) = V(6100 + 40·Y) = 40²·V(Y) = 40²·0.61 = 976

165
Theorem (alternative method for computing variances)
For every random variable X: V(X) = E[X²] − μx².

Example 3.5.5: (rolling two dice)


Let us consider again
Y = “absolute difference of the numbers’’
We already computed:  E[Y] = 35/18 ,   V(Y) = 2.05247.

Check using the theorem:  E[Y²] = 0²·(1/6) + 1²·(10/36) + ... = 210/36

⇒ V(Y) = 210/36 − (70/36)² = 5.8333 − 3.78086 = 2.05247.
166
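Both routes to V(Y) can be checked directly from the probability function; a short Python/NumPy sketch (illustration only):

    import numpy as np

    y = np.array([0, 1, 2, 3, 4, 5])
    p = np.array([6, 10, 8, 6, 4, 2]) / 36

    mu = p @ y                        # E[Y] = 35/18
    var_def = p @ (y - mu) ** 2       # definition: E[(Y - mu)^2]
    var_alt = p @ y**2 - mu**2        # alternative: E[Y^2] - mu^2
    print(mu, var_def, var_alt)       # 1.9444..., 2.05247, 2.05247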
Steiner rule:
Let X be a random variable with E[X] = μ and
let d be a real-valued constant. Then

V(X) = E[(X − d)²] − (μ − d)².

167
3.6. Standardization

Definition

Let X be a random variable with μx = μ and


σx = σ > 0 (both finite). Then we call the
random variable Z resulting from the transformation
Z = (X − μ)/σ   standardized.

169
Every standardized random variable has
expectation 0 and variance 1:
             translation       stretching
X            Y = X − μ         Z = (1/σ)·Y = (X − μ)/σ

E[X] = μ     E[Y] = 0          E[Z] = 0
V(X) = σ²    V(Y) = σ²         V(Z) = 1

170
Example 3.6.1: (flipping a coin)

X = “number of heads when flipping a coin twice’’


We have: E(X) = 1 and V(X) = ½ .
Standardization:  Z = (X − μ)/σ = (X − 1)/√(1/2)

We then get for Z:

Z       −√2    0     √2
f(z)    1/4    1/2   1/4

and, as expected,
E[Z] = 0  and  V(Z) = 1.
171
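The same check can be done in a few lines (NumPy assumed); this is only an illustrative sketch:

    import numpy as np

    x = np.array([0, 1, 2])
    p = np.array([0.25, 0.5, 0.25])       # number of heads in two tosses

    mu = p @ x                            # 1.0
    sigma = np.sqrt(p @ (x - mu) ** 2)    # sqrt(1/2)

    z = (x - mu) / sigma                  # standardized values: -sqrt(2), 0, sqrt(2)
    print(p @ z, p @ z**2)                # E[Z] = 0 and E[Z^2] = V(Z) = 1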
Reading: DeGroot and Schervish, Chapter 5.
4. Special distributions

173
Several distributions play a special role in probability
and statistics: they are known to be useful in a wide
variety of applied problems.

Each special class constitutes a whole family of


distributions.

The different members of each family can be


obtained by specifying the value of the underlying
parameter(s).

174
4.1. The uniform discrete distribution
Let us consider a random variable X with m (finite)
possible outcomes.
We assume that all outcomes are equally likely:

X x1 x2 x3 … xm-1 xm
f(X) 1/m 1/m 1/m .... 1/m 1/m

Notation: fUni(x; m), where m is the parameter of the family.

176
Example 4.1.1: (rolling one die)

For m = 6, we have that the probability function of

X = ‘‘number when rolling one die’’

is given by

1
 , x = 1, 2,...,6
f Uni ( x;6 ) =  6
 0, else.
177
Example 4.1.1 (continued):
What about the distribution function?
[Figure: step plot of the c.d.f. FUni(x; 6), increasing by 1/6 at each of x = 1, 2, ..., 6.]
178
Example 4.1.1 (continued):

What is the expected value?


E[X] = ∑_{i=1}^m xi·pi = ∑_{i=1}^m xi·P[X = xi] = ∑_{i=1}^m xi·(1/m) = (1/m)·∑_{i=1}^m xi

→ E[X] = (1/6)·∑_{i=1}^6 i = (1/6)·(6·(6 + 1)/2) = 3.5

179
Example 4.1.1 (continued):

And the variance (or standard deviation)?

E[X²] = (1/m)·∑_{i=1}^m xi² = (1/6)·∑_{i=1}^6 i² = (1/6)·((6 + 1)·(2·6 + 1)·6/6) = 7·13/6 = 91/6

V(X) = E[X²] − (E[X])² = (1/m)·∑_{i=1}^m xi² − ((1/m)·∑_{i=1}^m xi)²
     = 91/6 − (7/2)² = 91/6 − 49/4 = (182 − 147)/12 = 35/12 = 2.9167

σX = √V(X) = √2.9167 = 1.7078
180
Remark: (no additive property)

Unfortunately the sum of two (or more) independent uniform discrete random variables does not belong to the same family of distributions, that is, it is not uniform discrete.

Ex: X = ‘‘sum of the numbers when rolling two dice’’:

Value:        2      3      4     ...   12
Probability:  1/36   2/36   3/36  ...          ← not equally likely!

not a uniform discrete distribution!
181
Reading: DeGroot and Schervish, Chapter 5.2.
4.2. The Bernoulli distribution (discrete)
The random variable X has only two possible
outcomes, denoted by 0 and 1.

result of an experiment with two outcomes A


and Ā, where A is called success and Ā failure.
The probability that event A occurs is called
success probability, is the parameter of the
distribution, and is denoted by p.

Such experiments are called


Bernoulli experiments/trials.
183
Definition

A random variable X has the Bernoulli distribution


with parameter p if its probability function equals

               1 − p , x = 0
fBe(x; p) =    p     , x = 1
               0     , else.

The parameter p can take any value between 0 and


1.

184
Example 4.2.1:

Let us consider a game where in order to win the


player has to score at least one six when rolling three
dice (‘‘success’’). Let X denote success/failure.

→ What is the success probability?

p = 1 − P["no six"] = 1 − (5/6)³ = (216 − 125)/216 = 91/216.

⇒ E[X] = 91/216 = 0.4213,

V(X) = (91/216)·(1 − 91/216) = 0.2438,

σX = 0.4938.
185
Reading: DeGroot and Schervish, Chapter 5.2.
4.3. The Binomial distribution (discrete)

Definition

A random variable X has the binomial distribution


with parameters n and p if its probability function
equals

n x x = 0,1,2,..., n
fBi ( x; p, n )=   ⋅ p ⋅ (1 − p ) ,
n−x

x 0 ≤ p ≤ 1.

187
Derivation of the binomial distribution:
Let Yi, i=1,…n, be independent random
variables, each one Bernoulli distributed
with parameter p (Bernoulli trials).

Then, the sum


X = ∑_{i=1}^n Yi
is distributed according to a binomial
distribution with parameters n and p.

188
Example 4.3.1: (urn with replacement)

Let us consider an urn containing colored balls. In


particular we have 10 black and 20 white balls.

We draw four balls from the urn with replacement.

We are interested in the total number of black balls


we have drawn.

189
Example 4.3.1 (continued):

Let X denote the random variable

X = ‘‘number of black balls’’.

→ possible outcomes of X: x = 0, 1, 2, 3, 4
→ parameter values:  p = 10/30 = 1/3 ;   n = 4

→ P(X=0) = (4 choose 0)·p⁰·(1 − p)⁴ = 16/81 ;
  P(X=1) = (4 choose 1)·p¹·(1 − p)³ = 32/81 ;
190
Example 4.3.1 (continued):

That is:
X is binomially distributed with fBi(x; 1/3, 4)

→ E[X] = n·p = 4·(1/3) = 4/3
→ V(X) = n·p·(1 − p) = 4·(1/3)·(2/3) = 8/9

191
Example 4.3.2: (election)

Let us assume that 35% of the whole voting


population of a given country decides to vote for
party C.

We ask 12 randomly chosen persons whom they


intend to vote for (random sample of size n=12).

What will be the result of our sample test?

192
Example 4.3.2 (continued):

Expected number of electors voting for C in the sample:


μ = 12·0.35 = 4.2.

Standard deviation:
σ = √(12·0.35·0.65) = 1.6523.

P("electors voting for C reach majority")
= P[X > 6] = ∑_{x=7}^{12} P[X = x] = ∑_{x=7}^{12} fBi(x; 0.35, 12) = 0.0846.

193
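The tail probability above can be reproduced with SciPy's binomial distribution (an illustrative sketch, not part of the original slides):

    from scipy.stats import binom

    n, p = 12, 0.35
    print(binom.mean(n, p), binom.std(n, p))   # 4.2 and 1.6523
    print(binom.sf(6, n, p))                   # P(X > 6) = 0.0846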
Reading: DeGroot and Schervish, Chapter 5.4.
4.4. The Poisson distribution (discrete)
Definition

Let λ > 0. A random variable X has the Poisson distribution with mean λ if its probability function is as follows:

               (λ^x / x!)·e^(−λ) , for x = 0, 1, 2, ...,
fPo(x; λ) =
               0                 , else.

195
Example 4.4.1: (roulette)

A roulette player is convinced that the number 17 is


his lucky number.

Therefore he continuously bets on that number.

What is the probability that he is going to win exactly 8


times out of 200 trials?

→ success probability by one trial: p = P(‘‘17’’) = 1/37;


number of trials: n=200.

196
Example 4.4.1 (continued):

Let us use the Poisson distribution with λ = n·p = 5.4054.

Then, we find:  fPo(8; λ) = (5.4054⁸ / 8!)·e^(−5.4054) = 0.0812.

Remark: If, instead, we use the correct binomial distribution, we find:

fBi(8; 1/37, 200) = 0.0814.

197
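Both numbers are easy to reproduce with SciPy (sketch only):

    from scipy.stats import binom, poisson

    n, p = 200, 1 / 37
    lam = n * p                                # 5.4054

    print(poisson.pmf(8, lam))                 # Poisson approximation: 0.0812
    print(binom.pmf(8, n, p))                  # exact binomial value:  0.0814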
Example 4.4.2: (minigolf)

The professional minigolf player E. Findhole claims that, even in the most difficult conditions, he fails to make a hole in one in only 20% of the cases.

Today many journalists are present at the minigolf


course. The player will demonstrate his ability with a
series of 50 attempts.

198
Example 4.4.2 (continued):
Let us compute the probability that Mr. Findhole
makes a hole in one in only 38 out of the 50
attempts: (a) exact; (b) with a suitable approximation.

a) X = "# failed attempts" follows fBi ( x ; 0.2 , 50 )


P [ X=12] = 0.1033

b) Let us use the Poisson distribution: λ = np = 10


P[X = 12] ≈ P(Y = 12) = 0.0948 , where Y ~ Poisson(10)

Approximation error: too large! p = 0.2 not suitable!


199
4.5. The uniform continuous distribution
Definition
A random variable X has the uniform (continuous)
distribution if its density function is defined as
follows (a, b two real-valued constants):

                  1/(b − a) , if a ≤ x ≤ b
fUni(x; a, b) =
                  0         , else.

The density function is constant in the interval [a, b].

201
Example 4.5.1:
According to schedule, a bus is expected to arrive
every 30 minutes between midnight and 6am.
What is the probability that a passenger has to wait
more than 10 minutes?
T = "waiting time for the next bus"

is a random variable with   fUni(t; 0, 30) =  1/30 , 0 ≤ t ≤ 30
                                              0    , else.

Thus:  P[T > 10] = 1 − P[T ≤ 10] = 1 − FUni(10) = 1 − 10/30 = 2/3

Moreover:  E[T] = 15 min  and  V(T) = 75 min²
202
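With SciPy the same quantities follow from the uniform distribution on [0, 30] (illustrative sketch):

    from scipy.stats import uniform

    T = uniform(loc=0, scale=30)       # waiting time in minutes
    print(1 - T.cdf(10))               # P(T > 10) = 2/3
    print(T.mean(), T.var())           # 15 minutes and 75 minutes^2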
Example 4.5.2: (waiting time at the ‘S-Bahn’ station)
Trains are coming every 12 minutes at a given
‘S-Bahn’ station. Suppose you do not know the exact
schedule and arrive at the station at a randomly
chosen point in time.

Let us define
X = ‘‘waiting time at the station’’
as the random variable of interest.
The image set of X is W = [0,12] (minutes).

203
Reading: DeGroot and Schervish, Chapter 5.7.
4.6. The exponential distribution (continuous)

Definition

Let λ > 0. A random variable X has the exponential distribution with parameter λ if its density function is as follows:

              λ·e^(−λx) , if 0 ≤ x < ∞
fEx(x; λ) =
              0         , else.

Then: E[X] = 1/λ and V(X) = 1/λ².

205
Example 4.6.1: (life test)

If we believe in the manufacturer's claim, the


expected life of a light bulb is 5000 hours.
We assume as a good approximation for the
distribution of the random variable
X = ‘‘lifetime of the light bulb’’:

fEx(x) = (1/5000)·e^(−x/5000) ,   0 ≤ x < ∞.

Remark:  λ = 1/E[X]
206
Example 4.6.1 (continued):

What is the probability that a light bulb:

(1) runs less than 2500 hours?

P[X ≤ 2500] = FEx(2500) = 1 − e^(−1/2) = 0.3935;

(2) runs more than 10000 hours?

P[X > 10000] = 1 − FEx(10000) = 1 − (1 − e^(−2)) = 0.1353.

207
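The two probabilities can be checked with SciPy's exponential distribution, parametrized by the scale 1/λ = 5000 (sketch only):

    from scipy.stats import expon

    X = expon(scale=5000)              # lifetime in hours, E[X] = 5000
    print(X.cdf(2500))                 # P(X <= 2500) = 1 - e^(-1/2) = 0.3935
    print(X.sf(10000))                 # P(X > 10000) = e^(-2)       = 0.1353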
Reading: DeGroot and Schervish, Chapter 5.6.
4.7. The normal distribution (continuous)
Definition (standard normal distribution)
A random variable X has the standard normal
distribution if its density function is defined as
follows:
fZ(x) = (1/√(2π))·e^(−x²/2) ,   for −∞ < x < ∞.

The expected value and the variance of this


distribution are 0 and 1, respectively.

209
Definition (general normal distribution)
A random variable X has the normal distribution
with mean μ and variance σ² (−∞ < μ < ∞; σ > 0) if its density function is defined as follows:

fN(x; μ, σ²) = (1/√(2πσ²))·e^(−(x − μ)²/(2σ²)) ,   −∞ < x < ∞.

In fact, a parameter μ for the center of gravity and a


parameter σ for the dispersion of the distribution are
introduced.
210
Illustration of the normal distribution (density function)
for different (μ,σ) values.
[Figure: 3×3 grid of normal density plots, columns σ = 1, 2, 3 and rows μ = −5, 0, 5.]
211
Example 4.7.1: (working with normal distribution)

Let the random variable X be normally distributed with E[X] = 5 and V(X) = 9.
What is the probability P(−2 < X ≤ 4)?

P(−2 < X ≤ 4) = P( (−2 − μ)/σ < (X − μ)/σ ≤ (4 − μ)/σ )
              = P( −7/3 < Z ≤ −1/3 )
              = FZ(−1/3) − FZ(−7/3)
              = 0.3694 − 0.0098 = 0.3596
212
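The same probability follows directly from SciPy's normal c.d.f. (sketch only):

    from scipy.stats import norm

    X = norm(loc=5, scale=3)           # E[X] = 5, V(X) = 9
    print(X.cdf(4) - X.cdf(-2))        # P(-2 < X <= 4) = 0.3596

    # equivalent computation via the standardized variable Z
    print(norm.cdf(-1/3) - norm.cdf(-7/3))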
Example 4.7.1 (continued):
The probability P(−2 < X ≤ 4) for different values of E[X] = μ and V(X) = σ².

[Figure: 3×3 grid of shaded normal densities, columns σ = 1, 2, 3 and rows μ = −5, 0, 5.]
213
Example 4.7.2: (finance: asset allocation)
It is usually assumed in the portfolio theory that financial
asset returns are normally distributed random variables.
→ E[R] = μ : expected return;
  σR = √V(R) : volatility (risk).
An investor wants to invest a certain amount of money in
three different shares:

share expected return volatility


A1 44% 22%
A2 36% 20%
A3 10% 4%
214
Example 4.7.2 (continued)

a) Investing the whole amount in a single share (trying to avoid a loss):

P(R1 < 0) = P( (R1 − μ1)/σ1 < (0 − 0.44)/0.22 ) = P(Z < −2)   = 0.0228
P(R2 < 0) = P( (R2 − μ2)/σ2 < (0 − 0.36)/0.20 ) = P(Z < −1.8) = 0.0359
P(R3 < 0) = P( (R3 − μ3)/σ3 < (0 − 0.1)/0.04 )  = P(Z < −2.5) = 0.0062

i.e., the risk of a loss is smaller for the third share (but the expected return is smaller, too).
215
Example 4.7.2 (continued)

b) Investing the same amount in each of the three shares:

portfolio return:  R = (1/3)·(R1 + R2 + R3)  is normally distributed with

μR = E[R] = (1/3)·(44 + 36 + 10) = 30 [%]

σR² = V(R) = (1/3)²·(22² + 20² + 4²) = 100 [%²]

Then:  P(R < 0) = 0.0013 < P(R3 < 0) !

→ Splitting the capital among different investment options turns out to be the best strategy (diversification).
216
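A short SciPy sketch reproduces the loss probabilities; the portfolio variance uses the slides' implicit assumption of independent returns:

    import numpy as np
    from scipy.stats import norm

    mu = np.array([0.44, 0.36, 0.10])             # expected returns
    sigma = np.array([0.22, 0.20, 0.04])          # volatilities

    # single shares
    print(norm.cdf(0, loc=mu, scale=sigma))       # 0.0228, 0.0359, 0.0062

    # equally weighted portfolio (independent returns assumed)
    mu_p = mu.mean()                              # 0.30
    sigma_p = np.sqrt((sigma ** 2).sum()) / 3     # 0.10
    print(norm.cdf(0, loc=mu_p, scale=sigma_p))   # 0.0013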
5. Multivariate random variables

218
To formalize many underlying theories as well as to
solve many applied problems, one also needs to
consider the relation among the different random
variables under investigation. In fact, that information
might play a prominent role and cannot be
neglected.

219
Example 5.0.1: (finance: portfolio selection)

Let us consider a portfolio composed of two indices,


namely the S&P 500 and the FTSE 100.
To analyze the behavior of the portfolio returns, one
possibility is to model the returns of the two indices
separately using two univariate random variables.
Proceeding this way, however, we would completely
neglect the stochastic relation between the two
indices, and the results of the analysis might be
misleading.
Therefore what we need is a multivariate approach
that takes into account such a relation.
220
→ In general, the relation among the variables is
stochastic and must be taken into account.

→ If we consider two random variables:


→ bivariate distribution.

→ If we consider more than two random variables:


→ multivariate distribution.

221
Reading: DeGroot and Schervish, Chapter 3.4/5.
5.1. Joint distribution and marginal
distributions

Discrete random variables:

Let X and Y be discrete random variables, and


consider the ordered pair (X,Y).
The joint probability function of X and Y is defined as the function f such that, for every point (x, y) in the xy-plane,
f(x, y) = P[{X = x} ∩ {Y = y}].
223
Properties:

(1) f(xi, yj) ≥ 0
(2) ∑_{∀i} ∑_{∀j} f(xi, yj) = 1        ⇒ (3) f(xi, yj) ≤ 1, ∀i, j

Finally, for each set C of ordered pairs,

P((X, Y) ∈ C) = ∑_{(xi, yj) ∈ C} f(xi, yj).

224
Example 5.1.1: (urn without replacement)

An urn contains 6 balls: three balls are labelled with


"1", two balls with "2", and the last ball with "3".

[Figure: urn containing six balls labelled 1, 1, 1, 2, 2, 3.]
Two balls are drawn from the urn without
replacement. The joint probability of
(X, Y) = (" label first ball ", " label second ball ")
is: [see next slide]
225
Example 5.1.1 (continued):

   Y     y1=1    y2=2    y3=3    fx
X
x1=1     1/5     1/5     1/10    1/2
x2=2     1/5     1/15    1/15    1/3
x3=3     1/10    1/15    0       1/6
fy       1/2     1/3     1/6     1

For example:  f(1, 1) = (3/6)·(2/5) = 1/5 ;   f(2, 1) = (2/6)·(3/5) = 1/5 ;   f(3, 1) = (1/6)·(3/5) = 1/10.

Such a table is called a table of probabilities.


226
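Such a table and its marginals can be handled conveniently as a NumPy array; a sketch using exact fractions (illustration only):

    import numpy as np
    from fractions import Fraction as F

    # rows: X = 1, 2, 3;  columns: Y = 1, 2, 3
    joint = np.array([[F(1, 5),  F(1, 5),  F(1, 10)],
                      [F(1, 5),  F(1, 15), F(1, 15)],
                      [F(1, 10), F(1, 15), F(0)]])

    print(joint.sum(axis=1))   # marginal of X: [1/2, 1/3, 1/6]
    print(joint.sum(axis=0))   # marginal of Y: [1/2, 1/3, 1/6]
    print(joint.sum())         # total probability: 1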
Example 5.1.2: (urn with replacement)
Consider the same example as in 5.1.1 with
replacement.

   Y     y1=1    y2=2    y3=3    fx
X
x1=1     1/4     1/6     1/12    1/2
x2=2     1/6     1/9     1/18    1/3
x3=3     1/12    1/18    1/36    1/6
fy       1/2     1/3     1/6     1

For example:  f(1, 1) = (3/6)·(3/6) = 1/4.

227
Example 5.1.3: (tossing a coin)

A fair coin is tossed four times. Let


(X, Y) = (" number of heads ", " number of changes ").

→ # outcomes: 2⁴ = 16

TTTT: (0,0); TTTH: (1,1); TTHT: (1,2); THTT: (1,2); HTTT: (1,1);
TTHH: (2,1); THTH: (2,3); THHT: (2,2); HTHT: (2,3); HHTT: (2,1);
HTTH: (2,2); THHH: (3,1); HTHH: (3,2); HHTH: (3,2); HHHT: (3,1);
HHHH: (4,0).

228
Example 5.1.3 (continued):
Y y1 = 0 y2 = 1 y3 = 2 y4 = 3 fx
X

x1 = 0 1/16 0 0 0 1/16

x2 = 1 0 1/8 1/8 0 1/4

x3 = 2 0 1/8 1/8 1/8 3/8

x4 = 3 0 1/8 1/8 0 1/4

x5 = 4 1/16 0 0 0 1/16

fy 1/8 3/8 3/8 1/8 1


229
Remark:
In the previous three examples:

fx(xi) = P[X = xi] = ∑_j f(xi, yj) = p_{i,•} ;

fy(yj) = P[Y = yj] = ∑_i f(xi, yj) = p_{•,j} ;

are called marginal probability functions of X and


Y, respectively.

230
Continuous random variables:
Let X and Y be continuous random variables. The
function f(x,y) with
∫_a^b ∫_c^d f(x, y) dy dx = P[{a < X ≤ b} ∩ {c < Y ≤ d}]

for real-valued constants a < b and c < d is called the joint (probability) density function of X and Y.

Properties:  (1) f(x, y) ≥ 0, ∀x, y ;

             (2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dy dx = 1.
231
Example 5.1.4:
The joint density function of a (specific) two-dimensional
normal distribution could be defined by:

f(x, y) = (1/(2π))·e^(−(x² + y²)/2) ,   −∞ < x, y < +∞.

232
Example 5.1.5:
The joint density function of (X, Y) is given by

12 2
 ( x + xy ) , if 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1;
f ( x,y ) =  7
 0, else.

→ non-negativity property ? 
→ 12
( )
2
12 xy 12
1 1 1 1
1 1
x
∫ ∫ ( x +xy ) dydx = ∫ x y + ∫
2
2
dx = 2
x + dx
7 0 0
7 0 0 2 0 7 0 2
12  x 3 x 
( )
12 1 1 12 7
2 1
=  +  = + = ⋅ = 1
7  3 4  0 7 3 4 7 12
233
In the continuous case, the marginal (probability)
density functions of X and Y are


fx(x) = ∫_{−∞}^{∞} f(x, y) dy
and
fy(y) = ∫_{−∞}^{∞} f(x, y) dx,
respectively.

234
Example 5.1.5 (continued):
fx(x) = (12/7)·∫_0^1 (x² + x·y) dy = (12/7)·[ x²·y + x·y²/2 ]_0^1
      = (12/7)·(x² + x/2) ,   x ∈ [0, 1].

fy(y) = (12/7)·∫_0^1 (x² + x·y) dx = (12/7)·[ x³/3 + x²·y/2 ]_0^1
      = (12/7)·(1/3 + y/2) ,   y ∈ [0, 1].
235
Remark:
The expected values and variances of the marginal
distributions of a bivariate random vector can be
computed using the marginal probability/density
functions:
→ μx = E[X] = ∑_i xi·fx(xi) ;
  σx² = V(X) = ∑_i (xi − μx)²·fx(xi)              (discrete)

→ μx = E[X] = ∫_{−∞}^{∞} x·fx(x) dx ;
  σx² = V(X) = ∫_{−∞}^{∞} (x − μx)²·fx(x) dx      (continuous)
236
Definition

The joint (cumulative) distribution function of


two random variables X and Y is defined as the
function F such that for all real values x and y,

F ( x, y ) = P [ X ≤ x, Y ≤ y ] .

It is clear that F(x,y) is monotone increasing in x for


each fixed y and is monotone increasing in y for
each fixed x.

237
Practical computation:

If (X, Y) is discrete:

F(x, y) = ∑_{xi ≤ x} ∑_{yj ≤ y} f(xi, yj).

If (X, Y) is continuous:

F(x, y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) dv du.

238
Reading: DeGroot and Schervish, Chapter 3.6.
5.2. Conditional distributions and stochastic
independence
Definition
a) Let X and Y be two discrete random variables with joint probability function pij. The conditional probability function of X given that Y = yj is defined as follows:

f_{X|Y=yj}(xi) = f_{X,Y}(xi, yj) / fY(yj) = p_{i,j} / p_{•,j} ,  if p_{•,j} > 0, and 0 else.

b) Let X and Y be two continuous random variables with joint density function f(x, y). The conditional density function of X given that Y = y is defined as follows:

f_{X|Y=y}(x) = f_{X,Y}(x, y) / fY(y) ,  if fY(y) > 0, and 0 else.
240
Example 5.2.1: (discrete case)
Consider tossing a coin three times. Define

(X,Y)=(" # heads in first toss ", " # heads ").

We compute the conditional distribution of X given


that Y=1.
        Y    y1=0   y2=1   y3=2   y4=3   f_X
X
x1=0         1/8    1/4    1/8    0      1/2
x2=1         0      1/8    1/4    1/8    1/2
f_Y          1/8    3/8    3/8    1/8    1
241
Example 5.2.1 (continued):

f_{X|Y=1}(x):   X = x1 = 0:   p_{1,2} / p_{·,2} = (1/4) / (3/8) = 2/3
                X = x2 = 1:   p_{2,2} / p_{·,2} = (1/8) / (3/8) = 1/3

Another example:
f_{Y|X=0}(y):   Y = 0: 1/4,   Y = 1: 1/2,   Y = 2: 1/4,   Y = 3: 0,  ...
242
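A minimal sketch in Python of the computation above: the conditional probability function is obtained by dividing a column (or row) of the joint table of Example 5.2.1 by the corresponding marginal probability. Variable names are illustrative.

import numpy as np

# Joint probabilities p_ij from Example 5.2.1: rows X = 0, 1; columns Y = 0, 1, 2, 3
p = np.array([[1/8, 1/4, 1/8, 0.0],
              [0.0, 1/8, 1/4, 1/8]])

p_X = p.sum(axis=1)     # marginal of X: [1/2, 1/2]
p_Y = p.sum(axis=0)     # marginal of Y: [1/8, 3/8, 3/8, 1/8]

# Conditional distribution of X given Y = 1 (column index 1)
print(p[:, 1] / p_Y[1])     # [2/3, 1/3]

# Conditional distribution of Y given X = 0 (row index 0)
print(p[0, :] / p_X[0])     # [1/4, 1/2, 1/4, 0]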
Example 5.2.2: (continuous case)
Let (X,Y) denote a two-dimensional continuous
random vector with joint density given by
   f(x, y) = λ² · e^(−λx)   for 0 ≤ y ≤ x,   and 0 else.

→ Marginal densities:

   f_X(x) = ∫₀ˣ λ² · e^(−λx) dy = λ² · e^(−λx) · x,   x ≥ 0

   f_Y(y) = ∫_y^{+∞} λ² · e^(−λx) dx = λ² · [−(1/λ) · e^(−λx)]_y^{+∞} = λ · e^(−λy),   y ≥ 0

(→ Y is exponentially distributed: Exp(λ)).
243
Example 5.2.2 (continued):

How should we specify the domain of the joint density
function (and therefore the integrals' boundaries)?

[Sketch: the region A between the x-axis and the line y = x]

A = {(x, y) | x ∈ [0, ∞), y ∈ [0, x]}
or
A = {(x, y) | y ∈ [0, ∞), x ∈ [y, ∞)}
244
Example 5.2.2 (continued):

→ conditional density of X given Y:

   f_{X|Y=y}(x) = [λ² · e^(−λx) · 1{0 ≤ y ≤ x}] / [λ · e^(−λy) · 1{y ≥ 0}] = λ · e^(−λ(x−y)),   x ≥ y ≥ 0.

→ conditional density of Y given X:

   f_{Y|X=x}(y) = [λ² · e^(−λx) · 1{0 ≤ y ≤ x}] / [λ² · e^(−λx) · x · 1{x ≥ 0}] = 1/x,   0 ≤ y ≤ x.
245
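A simulation sketch of Example 5.2.2 (assumptions: λ = 2 is an arbitrary illustrative value; the marginal density λ²·x·e^(−λx) of X is the Gamma density with shape 2 and rate λ, and Y given X = x is uniform on [0, x], as derived above). If the derivation is right, the simulated Y should behave like an Exp(λ) variable.

import numpy as np

rng = np.random.default_rng(seed=1)    # seed chosen arbitrarily
lam = 2.0                              # illustrative value of lambda
n = 200_000

# X has marginal density lambda^2 * x * exp(-lambda*x), i.e. Gamma(shape=2, scale=1/lambda)
x = rng.gamma(shape=2.0, scale=1/lam, size=n)
# Given X = x, Y is uniform on [0, x] (conditional density 1/x)
y = rng.uniform(0.0, x)

print(y.mean(), 1/lam)       # sample mean of Y vs. theoretical mean 1/lambda
print(y.var(),  1/lam**2)    # sample variance of Y vs. theoretical variance 1/lambda^2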
Definition (independent random variables)

Two random variables X and Y with joint probability or
density function f_{X,Y} and marginal probability or density
functions f_X and f_Y are said to be independent if and only if

   f_{X,Y}(x, y) = f_X(x) · f_Y(y),   −∞ < x, y < +∞.

It also follows that (for all y with f_Y(y) > 0 and all x
with f_X(x) > 0, respectively)

   f_{X|Y=y}(x) = f_X(x)   and   f_{Y|X=x}(y) = f_Y(y).
246
Reading: Chapter 4.6, DeGroot and Schervish
5.3. Covariance and correlation
Definition
Let X and Y be two random variables with joint probability
or density function f(x,y). The expectation of the two-
dimensional function g(X,Y) is defined as

   E[g(X,Y)] = Σ_i Σ_j g(x_i, y_j) · f(x_i, y_j),   if (X,Y) discrete;

   E[g(X,Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) · f(x, y) dy dx,   if (X,Y) continuous.

248
We introduce summaries of a joint distribution that
enable us to measure the relationship between two
random variables, i.e. their tendency to vary together
rather than independently.
Definition (covariance)

Let X and Y be random variables with finite means
μ_X and μ_Y.
The covariance of X and Y is defined as

   Cov(X, Y) = E[(X − μ_X) · (Y − μ_Y)],

if the expectation exists.
249
Multiplication rule for expectations:
For all random variables X and Y with finite variance:
E [ XY ] = E [ X ] ⋅ E [Y ] + Cov ( X ,Y ) .

If X and Y are independent random variables with


finite variance, then
E [ XY ] = E [ X ] ⋅ E [Y ].
As a consequence, the covariance yields a necessary condition
for independence that can be used to rule it out:

   X, Y independent  ⇒  Cov(X, Y) = 0,
   equivalently,  Cov(X, Y) ≠ 0  ⇒  X, Y not independent.
250
Computational rules for the covariance

The covariance is a so-called bilinear operator, which


means that it is linear in both arguments. Let U, V, X, Y
be random variables with finite means and variances and
a, b, c, d real-valued constants.
Then: Cov(a·U+b·V, c·X+d·Y) =
a·c·Cov(U,X) + a·d·Cov(U,Y) + b·c·Cov(V,X)+ b·d·Cov(V,Y)

Remark: The covariance can also be seen as a scalar


product.
Thus, when we say that X and Y are orthogonal, we
mean that Cov(X,Y) = 0.
251
Definition (correlation)
Let X and Y be random variables with finite variances σx
and σy, respectively.
Then the correlation of X and Y is defined as follows:
   ρ_{X,Y} = Cov(X, Y) / (σ_X · σ_Y).

It is said that X and Y are positively correlated if ρ_{X,Y} > 0,
that X and Y are negatively correlated if ρ_{X,Y} < 0,
and that X and Y are uncorrelated if ρ_{X,Y} = 0.

252
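A minimal sketch computing Cov(X, Y) and ρ_{X,Y} directly from the joint table of Example 5.2.1 (X = number of heads in the first toss, Y = total number of heads); the printed values follow from the table, not from simulation.

import numpy as np

x_vals = np.array([0.0, 1.0])               # values of X
y_vals = np.array([0.0, 1.0, 2.0, 3.0])     # values of Y
p = np.array([[1/8, 1/4, 1/8, 0.0],         # joint probabilities p_ij
              [0.0, 1/8, 1/4, 1/8]])

EX  = (x_vals[:, None] * p).sum()           # E[X]  = 1/2
EY  = (y_vals[None, :] * p).sum()           # E[Y]  = 3/2
EXY = (np.outer(x_vals, y_vals) * p).sum()  # E[XY] = 1

cov = EXY - EX * EY                         # multiplication rule: Cov = E[XY] - E[X]E[Y]
VX  = ((x_vals - EX)[:, None]**2 * p).sum()
VY  = ((y_vals - EY)[None, :]**2 * p).sum()
rho = cov / np.sqrt(VX * VY)

print(cov, rho)                             # 0.25 and about 0.577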
Reading: Chapter 4.2/6, DeGroot and Schervish
5.4. Sums and sample means of random
variables

Expected value of a sum of two random variables?

   E[X + Y] = Σ_i Σ_j (x_i + y_j) · f(x_i, y_j)

            = Σ_i Σ_j x_i · f(x_i, y_j) + Σ_i Σ_j y_j · f(x_i, y_j)

            = E[X] + E[Y]

This result can be generalized to:

   E[X_1 + ... + X_n] = E[X_1] + ... + E[X_n].
254
Variance of a sum of two random variables?

   V(X + Y) = E[((X + Y) − (μ_X + μ_Y))²]

            = E[((X − μ_X) + (Y − μ_Y))²]

            = E[(X − μ_X)² + (Y − μ_Y)² + 2·(X − μ_X)(Y − μ_Y)]

            = E[(X − μ_X)²] + E[(Y − μ_Y)²] + 2·E[(X − μ_X)(Y − μ_Y)]

            = V(X) + V(Y) + 2·Cov(X, Y)

Thus, in the case of uncorrelated random variables,
this can be generalized to:

   V(X_1 + ... + X_n) = V(X_1) + ... + V(X_n).
255
Sample mean of uncorrelated random variables:

Let X_1, ..., X_n be uncorrelated random variables with
common mean μ and common variance σ² (both finite). Then:

   E(X̄_n) = E[(1/n)(X_1 + ... + X_n)] = (1/n) · Σ_{i=1}^{n} E[X_i] = (1/n) · n · μ = μ

   V(X̄_n) = V[(1/n)(X_1 + ... + X_n)] = (1/n²) · Σ_{i=1}^{n} V(X_i) = (1/n²) · n · σ² = σ²/n

   ⇒ σ_{X̄_n} = σ / √n

(the standard deviation is reduced by the factor √n: the so-called √n-rule!)
256
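A short simulation sketch of the √n-rule (the normal population, its parameters, and the sample sizes are arbitrary illustrative choices): the empirical standard deviation of the sample mean shrinks like σ/√n.

import numpy as np

rng = np.random.default_rng(seed=2)
sigma = 2.0                          # population standard deviation (illustrative)
reps = 20_000

for n in (4, 16, 64, 256):
    means = rng.normal(loc=5.0, scale=sigma, size=(reps, n)).mean(axis=1)
    print(n, means.std(), sigma / np.sqrt(n))   # empirical vs. theoretical sigma / sqrt(n)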
Example 5.4.1:

Let us consider the following game:


To participate the player has to pay 1 Euro (fee).
A fair coin is tossed three times: for each ‘‘head’’ the
player wins exactly “1 Euro’’.

257
Example 5.4.1 (continued):
a) Describe the random variable
X: “player’s net winnings’’
using Bernoulli distributed random variables.

Show that E[X] = ½ and V(X) = ¾.


 1
3 y
 i ,1 =1, "head" , p = , Yi are independent,
X = ∑Yi -1, Yi =  2
i =1  y i ,2 =0, else.

 3  3 3
1 1
→ E  X  = E  ∑Yi -1 = ∑ E Yi  -1 = ∑ p -1 = 3 ⋅ − 1 = ;
 i =1  i =1 i=1 2 2
 3  3 3
1 3
→ V ( X ) = V  ∑Yi -1 = ∑ V ( Yi ) = ∑ p ⋅ (1-p ) = ⋅ 3 = .
 i =1  i=1 i=1 4 4
258
Example 5.4.1 (continued):
b) Anton plays the game three times consecutively.
Let U be the winnings after three games. Express
the random variable U using

X_i = ''player's net winnings in game i'', i = 1, 2, 3.

Compute E[U] and V(U).


   U = X_1 + X_2 + X_3,   with the X_i independent.

→ E[U] = 3 · E[X] = 3/2;

→ V(U) = 3 · V(X) = 3 · (3/4) = 9/4.
259
Reading: Chapter 6.3, DeGroot and Schervish
6. The Central Limit Theorem

261
Let us consider a sequence of n random variables.
Assume that the random variables X1,..., Xn are
independent and identically distributed (i.i.d) with
(both finite)
   E[X_i] = μ   and   V(X_i) = σ².

This sequence of random variables is called a


random sample of size n.
How can we generate it in practice? → two (main)
cases:
1) random sampling with replacement;

2) series of tests / experiments. 262


Example 6.1:
Let X describe the winnings from a gambling game.
If we play that game several times in a row we get the random
sample X1,..., Xn.
n
The total winnings after n games is Sn =∑ Xi .
i=1
1
The average winnings per game after n games is Xn = Sn .
n
Question: What is the probability that after a lot of games the
total winnings is between a and b, i.e., P[a ≤ Sn ≤ b] = ?

→ The central limit theorem gives us an approximate way to


answer this type of question, in particular when n is large
(n→∞).
263
Theorem: Central Limit Theorem (CLT)

Let X1, X2,..., Xn be i.i.d. random variables with μ = E[Xi]


and σ² = V(X_i) (both finite). Let S_n be the sum and
X̄_n = S_n / n the sample mean of the random sample. Then, the
distribution function F_n of the standardized variable

   Z_n = (S_n − n·μ) / (σ·√n) = (X̄_n − μ) / (σ/√n)

converges for n → ∞ to the standard normal distribution:

   F_n(z) → F_Z(z)   for every z.

Special case: Xi i.i.d. Bernoulli distributed.


264
Illustration of the Central Limit Theorem:
Let X1, X2,..., Xn be a random sample from a
continuous uniform distribution on [0,1].

265
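A simulation sketch of this illustration (the sample sizes and the number of replications are arbitrary choices): standardized sample means of uniform [0,1] samples are compared with the standard normal distribution via a few quantiles.

import numpy as np

rng = np.random.default_rng(seed=3)
mu, sigma = 0.5, np.sqrt(1/12)       # mean and standard deviation of the uniform [0,1] distribution
reps = 50_000

for n in (2, 5, 30):
    xbar = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)
    z = (xbar - mu) / (sigma / np.sqrt(n))          # standardized sample mean Z_n
    print(n, np.quantile(z, [0.025, 0.5, 0.975]))   # N(0,1) reference: about [-1.96, 0, 1.96]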
Theorem: Limit Theorem of De Moivre and Laplace
Let Sn be a binomially distributed random variable with
parameters n and p.
Then its distribution function converges with increasing n
towards a normal distribution with corresponding moments:
   F_Bi(s_n; n, p) → F_N(s_n; n·p, n·p·(1 − p))

Similarly, the distribution function F_n of the standardized
variable

   Z_n = (S_n − n·p) / √(n·p·(1 − p)) ≡ (X̄_n − p) / √(p·(1 − p)/n)

converges for n → ∞ to the standard normal distribution:

   F_n(z) → F_Z(z).
266
Example 6.2: (finance)
A financial theory assumes that the share (log-) prices
in efficient markets behave according to the so-called
random walk:   K_t = K_{t−1} + ε_t.

As a consequence, the returns are given by

   ε_t = K_t − K_{t−1},   with E[ε_t] = 0 and V(ε_t) = σ²,

i.e., they all have the same expected value zero and the
same variance σ².
267
Example 6.2 (continued):

The monthly return would then be a sum


   ε_t + ε_{t+1} + ... + ε_{t+n},   with n = 22

(approximate number of working days in a month).

→ By the CLT, the monthly returns are approximately
normally distributed with expected value zero and n
times the variance.
→ naive prediction: the price stays at the same level
it is today.

268
Example 6.3:

Let X1,..., X12 be i.i.d. from a uniform distribution on


[−½, ½], and S_12 = Σ_{i=1}^{12} X_i.

We know that E[X_i] = 0 = μ and V(X_i) = (b − a)²/12 = 1/12 = σ².

Applying the CLT we get that

   S_12 ~ (approx.) N(n·μ, n·σ²) = N(0, 1).

Remark: With as few as n=12 we get a reasonably good


approximation of the true distribution using the CLT.
269
Example 6.4:
We toss a coin 100 times and we get ‘‘heads’’ exactly
60 times. Is the coin fair?

Let X_i = 1 if toss i shows heads (i = 1, ..., 100), and X_i = 0 else.

Xi is Bernoulli distributed with p=½ (under the


assumption that the coin is fair).

270
Example 6.4 (continued):

S100 is binomially distributed with p=½, n=100.


Standardization (with n·p = 50 and n·p·(1 − p) = 25):

   P[S_100 ≥ 60] = P[ Z_100 = (S_100 − 50)/√25 ≥ (60 − 50)/√25 ]

   CLT: approx. = 1 − F_Z(2) = 0.0228.

Thus: Assuming a fair coin, the probability of the


event {S100 ≥ 60} is very small. Given that we observed
this event, we may question the fairness of the coin.

Remark: Use a correction for continuity when


approximating discrete distributions using the CLT.
271
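A numerical sketch of Example 6.4 using only the Python standard library: the exact binomial probability P[S_100 ≥ 60] is compared with the CLT approximation from the slide and with the continuity-corrected version mentioned in the remark.

import math

n, p = 100, 0.5
mu, sd = n * p, math.sqrt(n * p * (1 - p))      # 50 and 5

def phi(z):                                     # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

exact      = sum(math.comb(n, k) for k in range(60, n + 1)) / 2**n
approx_clt = 1 - phi((60 - mu) / sd)            # plain CLT approximation (slide value 0.0228)
approx_cc  = 1 - phi((59.5 - mu) / sd)          # with continuity correction

print(exact, approx_clt, approx_cc)             # about 0.0284, 0.0228, 0.0287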
Part II: Statistics

272
Representative random sample:

The difficulty in analyzing many phenomena, be they


economic, social, or otherwise, is that there is simply too
much information for the mind to assimilate.
It would be more useful to have much less information, but
information which was still representative of the original
data. In achieving this, much of the original information
would be deliberately lost.

Remark: There is no formal statistical definition of


representative.
273
A very important characteristic of statistical variables is
the scale in which they are measured.
Depending on the scale, variables can be divided into
different classes.
The appropriate method of analyzing the data as well as
the possible statistical evaluation depend on this
classification.

Other relevant factors are the sophistication of the
audience and the 'message' which one intends to convey.
274
1. Nominal scale:

A variable is measured on a nominal scale when


there is not an obvious natural ordering of the
outcomes: only equality or inequality of the outcomes
can be determined.

2. Ordinal scale:

A variable is measured on an ordinal scale when the


different outcomes can be naturally ordered, but the
‘distance’ between them cannot be measured.

275
3. Ratio scale:

A variable is measured on a ratio scale when not


only the different outcomes can be ordered or
ranked, but also the distance between them can be
computed.
The outcomes in this case must be numbers.

Finally, another important distinction is whether we


have to analyze a sample of cross-sectional data
(measured at one specific point in time) or a sample
of time-series data.
276
Reading: Chapter 1, Barrow; Chapters 1-3, ASWFS
7. Descriptive statistics

278
Goal:
The task of descriptive statistics is to introduce a
number of descriptive, in most cases graphical
methods to summarize all the information about the
variables under investigation and illustrate the main
features, without distorting the picture.

279
7.1. Frequency tables, histograms, and
empirical distributions

281
Example 7.1.1: (radioactive decay of Americium-241)

The radioactive element Am-241 emits by decay α -


particles. We are interested in the number of emissions
in given intervals of a fixed length, for example 10
seconds.

We therefore observe the decay process for some time


and we want to find a suitable model for the recorded
data.

For measuring purposes, we split the whole recording


period in 1207 intervals, each lasting 10 seconds.
282
Example 7.1.1 (continued):

In each interval we count the number of emissions, yielding


the data x1,…, x n ; n =
1207. The total number of recorded
emissions is 10,129.

In a first step we build the following classes:
- for y_j = 3, 4, ..., 16, we count the number of intervals with
exactly y_j emissions;
- for intervals with 0-2 emissions and for intervals with 17 or
more emissions we build two boundary classes.

This results in the following frequency table (next slide):


283
Example 7.1.1 (continued):
Frequency table:
Class
interval 0-2 3 4 5 6 7 8 9
(emissions)
Numbers 18 28 56 105 126 146 164 161

Class
interval 10 11 12 13 14 15 16 >17
(emissions)
Numbers 123 101 74 53 23 15 9 5

Let us draw a histogram for the recorded data (see


next slide):
284
Example 7.1.1 (continued):

285
Example 7.1.2: (‘population pyramids’, book page 22)
Remark: Histograms can also be used for variables
measured on a nominal or ordinal scale (bar charts).

Another helpful method is the empirical distribution
function F_n, defined as follows:

   F_n(y) = (# observations x_i with x_i ≤ y) / n = (1/n) · Σ_{j: y_j ≤ y} f_j.

Fn gives an estimate of the distribution function F that


generated the observed data. Clearly, Fn is constant
except for the outcomes y1, …., ym. Usually one plots
the points ( y j , Fn ( y j )) , for j = 1, ...., m. 286
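A minimal sketch of the empirical distribution function, applied to the small data set of Example 7.2.1 below (any sample would do); it prints the points (y_j, F_n(y_j)) that one would plot.

import numpy as np

x = np.array([4, 7, 7, 7, 12, 12, 13, 16, 19, 23, 23, 97])   # observations (Example 7.2.1)

def F_n(y, data=x):
    # Empirical distribution function: share of observations <= y
    return np.sum(data <= y) / len(data)

for y_j in np.unique(x):              # the distinct outcomes y_1 < ... < y_m
    print(y_j, F_n(y_j))              # points (y_j, F_n(y_j))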
Example 7.1.1 (Am-241, continued):
Empirical distribution function:

   y_j        0   1   2       3       4       5       6       7       ...
   F_n(y_j)   ?   ?   0.0149  0.0381  0.0845  0.1715  0.2759  0.3969  ...

[Plot: empirical distribution function F_n(y_j) against y_j]
287
7.2. Summarizing data using numerical
techniques

Definition (measures of location)

a1) Arithmetic mean (or average) is the most


familiar measure of location: (→ ratio scale)

   x̄ = (1/n) · Σ_{i=1}^{n} x_i

289
Definition (measures of location)
a2) Median: (→ ordinal and ratio scale)
   x_Med = x_((n+1)/2),                     if n odd,
   x_Med = (1/2) · (x_(n/2) + x_(n/2+1)),   if n even
   (where x_(1) ≤ ... ≤ x_(n) are the ordered observations).

a3) Mode: (→ nominal, ordinal, and ratio scale)

   x_M = x_i with h_i ≥ h_j for all j ≠ i   (the most frequent outcome).

290
Example 7.2.1:
Let us consider the following observations:
4, 7, 7, 7, 12, 12, 13, 16, 19, 23, 23, 97 .

We compute the different measures of location:

   mean     x̄ = 20;
   mode     x_M = 7;
   median   x_Med = (12 + 13)/2 = 12.5.
291
Definition (quantiles)

b) From the ordered observations x(1), ...., x(n) we can


compute the so-called empirical α - quantiles for
different probability levels α ∈ (0,1) as follows:

Compute first K = ⌊α·n⌋ + 1, where ⌊·⌋ denotes the
integer part of the number α·n. Then get the
empirical α-quantile as

   x_(K),                        if α·n is not an integer number;
   (1/2) · (x_(K−1) + x_(K)),    if α·n is an integer number.
Interpretation: α -percent of the observations lie
below the empirical α - quantile. 292
Example 7.2.2:
Let α = 75%. For n = 100, α·n = 75 is an integer
number, thus K = 76.
We have to choose a value z between x_(75) and x_(76)
if we want 75% of the observations to lie below z.
The empirical 75%-quantile (also called Q3 or third
quartile) (1/2)·(x_(75) + x_(76)) fulfills the requirement.

→ For n = 101: α ⋅ n = 75.75 ⇒ K=76,


α ⋅ n not an integer number ⇒ Q3 = x (76) .

293
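A direct translation of the quantile rule above into code (a sketch: note that statistical software often uses other interpolation conventions, so library defaults may give slightly different values). It is checked against the quartiles of Example 7.2.4 below.

import math

def empirical_quantile(data, alpha):
    # Empirical alpha-quantile following the rule K = floor(alpha * n) + 1
    x = sorted(data)
    n = len(x)
    k = math.floor(alpha * n) + 1
    if alpha * n == int(alpha * n):           # alpha * n is an integer
        return 0.5 * (x[k - 2] + x[k - 1])    # (x_(K-1) + x_(K)) / 2
    return x[k - 1]                           # x_(K)

data = [11, 12.5, 15, 18, 19.5, 23, 25.6, 28, 29, 30, 31.5, 34, 35, 38]  # Example 7.2.4
print(empirical_quantile(data, 0.25),    # Q1 = 18
      empirical_quantile(data, 0.50),    # Q2 = 26.8
      empirical_quantile(data, 0.75))    # Q3 = 31.5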
Location measures like the mean, median, or mode
give only some information about the central
tendency of a distribution.

Two distributions might have the same location


parameter while being different.

A measure of dispersion can help distinguish


among distributions with the same location measure.

294
Definition (measures of dispersion)
c1) Range, which is the difference between the
smallest and largest observations, is the
simplest measure of dispersion:
range = xmax – xmin.

c2) Mean-quartile range as dispersion measure:

   MQA = ((Q3 − Q2) + (Q2 − Q1)) / 2 = IQA / 2,

where IQA = Q3 − Q1 is called the inter-quartile range.

295
Example 7.2.4:
n = 14 observations
range = 38 − 11 = 27

   11  12.5  15  18  19.5  23  25.6  28  29  30  31.5  34  35  38
               Q1              Q2                Q3

   Q2 = (1/2) · (x_(7) + x_(8)) = (25.6 + 28)/2 = 26.8
   Q1 = x_(4) = 18
   Q3 = x_(11) = 31.5
   ⇒ IQA = Q3 − Q1 = 13.5  ⇒  MQA = 6.75.
296
Definition (measures of dispersion)
c3) Variance and standard deviation as measures
of dispersion:
The mean of the squared distances of the
observations from the arithmetic mean
   s_x² = (1/n) · Σ_{i=1}^{n} (x_i − x̄)²

is called the (empirical) variance.
The positive square root of the variance

   s_x = +√(s_x²)
is called (empirical) standard deviation. 297
Example 7.2.5:
Consider the observations
3, 5, 9, 9, 6, 6, 3, 7, 7, 6, 7, 6, 5, 7, 6, 9, 6, 5, 3, 5.
Let us compute the empirical variance:

   j   x_j   n_j   h_j    h_j·x_j   x_j − x̄   (x_j − x̄)²   h_j·(x_j − x̄)²
   1   3     3     0.15   0.45      −3         9            1.35
   2   5     4     0.20   1.00      −1         1            0.20
   3   6     6     0.30   1.80       0         0            0
   4   7     4     0.20   1.40       1         1            0.20
   5   9     3     0.15   1.35       3         9            1.35

   n = 20    Σ h_j = 1    x̄ = Σ h_j·x_j = 6              s_x² = Σ h_j·(x_j − x̄)² = 3.1
298
Definition

The measures of dispersion introduced so far are all


measures of absolute dispersion and their values depend
upon the units in which the variable is measured.

To compare the degrees of dispersion of two variables


measured in different units we have to define a measure
of relative dispersion such as the coefficient of variation
defined (provided that x ≠ 0) as
sx
VK x = .
x

299
Example 7.2.6: (stock prices, 250 working days)

   Daimler Chrysler share:   x̄ = 50.59 Euro,   s_x = 36.18 Euro
   Porsche AG share:         ȳ = 396.10 Euro,  s_y = 182.96 Euro

   ⇒ VK_x = 36.18 / 50.59 = 0.72
     VK_y = 182.96 / 396.10 = 0.46
Thus, although it shows a smaller standard deviation, the
Daimler Chrysler-share has larger relative dispersion.

→ VK is often used to measure the volatility of stock prices!

300
7.3. Boxplot

[Schematic of a boxplot (use the right scale!):]
- upper whisker end: d = largest observation x_i with x_i − Q3 < 1.5 · IQA
- Q3 = 75%-quantile (25% of the data lie above Q3)
- Q2 = median
- Q1 = 25%-quantile (25% of the data lie below Q1)
- box height: IQA = Q3 − Q1
- lower whisker end: c = smallest observation x_i with Q1 − x_i < 1.5 · IQA
- observations beyond the whiskers (marked x) are outliers
302
Example 7.3.1: (lifetime of 16 devices in months)

1.5; 3.5; 6.5; 11.5; 12.5; 14; 17; 17; 19; 20; 23.5;
32.5; 34.5; 39; 55.5; 119

   Q2 = (x_(8) + x_(9))/2 = 18;   Q1 = (x_(4) + x_(5))/2 = 12;   Q3 = (x_(12) + x_(13))/2 = 33.5

   ⇒ IQA = 21.5,   1.5 · IQA = 32.25,   Q3 + 1.5 · IQA = 65.75,   Q1 − 1.5 · IQA < 0

[Boxplot of the data:]
- upper whisker ends at 55.5; 119 is an outlier
- lower whisker ends at 1.5 (the smallest observation, since Q1 − 1.5 · IQA < 0)
- small relative dispersion
- not symmetric (more concentration around Q1/Q2)
303
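A sketch reproducing the numbers behind the boxplot of Example 7.3.1 (the quantile rule of Section 7.2 and the 1.5·IQA convention from the schematic above are used; everything else is an implementation choice).

import math

def quantile(x_sorted, alpha):
    n = len(x_sorted)
    k = math.floor(alpha * n) + 1
    if alpha * n == int(alpha * n):
        return 0.5 * (x_sorted[k - 2] + x_sorted[k - 1])
    return x_sorted[k - 1]

data = sorted([1.5, 3.5, 6.5, 11.5, 12.5, 14, 17, 17, 19, 20,
               23.5, 32.5, 34.5, 39, 55.5, 119])

q1, q2, q3 = (quantile(data, a) for a in (0.25, 0.50, 0.75))
iqa = q3 - q1
d = max(x for x in data if x - q3 < 1.5 * iqa)      # end of the upper whisker
c = min(x for x in data if q1 - x < 1.5 * iqa)      # end of the lower whisker
outliers = [x for x in data if x - q3 >= 1.5 * iqa or q1 - x >= 1.5 * iqa]

print(q1, q2, q3, iqa)      # 12, 18, 33.5, 21.5
print(c, d, outliers)       # 1.5, 55.5, [119]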
7.4. Quantile-Quantile-plot (QQ-plot)
Sometimes we have an idea about the distribution of
the stochastic process that generated the data.

A QQ-plot is a graphical device that allows us to


investigate whether our assumption about the
distribution is supported by the observed data.

For practical purposes, generally one uses specific


software.

But: what is the theory underlying the QQ-plot?


305
Assumption:
The data x1, …., xn are the realizations of random
variables X1,….., Xn, from a common distribution F.

Considering the analogy between empirical and


theoretical quantiles, we expect
x ( α n  +1) ≈ F-1 (α )
 
given that about α -percent of the data are smaller than
x ( α n  +1) per construction, and, on the other hand, values
of Xi are smaller than F-1(α ) with probability α :

P  X i ≤ F−1 (α ) = F ( F-1 (α ) ) ≅ α .


306
Let K = ⌊αn⌋ + 1; then (K − 1)/n ≈ α, and thus we expect

   x_(K) ≈ F^(−1)(α) ≈ F^(−1)((K − 1)/n) ≈ F^(−1)((K − ½)/n).

Therefore, if the two distributions being compared
are similar, the points

   ( F^(−1)((K − ½)/n), x_(K) ),   K = 1, ..., n,

must lie approximately on the line y = x.
307
Practically, we plot the theoretical quantiles F^(−1)((K − ½)/n)
vs. the empirical quantiles x_(K) and graphically
investigate how much the points deviate from a line.

Example 7.3.1 (continued):
[QQ-plot of the data; theoretical distribution: normal distribution.]
308
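A sketch of the QQ-plot construction described above, using the data of Example 7.3.1 and a normal distribution as the theoretical model. Fitting the normal by the sample mean and standard deviation, and using statistics.NormalDist for F^(−1), are implementation choices, not part of the slides.

import numpy as np
from statistics import NormalDist

data = np.sort(np.array([1.5, 3.5, 6.5, 11.5, 12.5, 14, 17, 17, 19, 20,
                         23.5, 32.5, 34.5, 39, 55.5, 119]))   # Example 7.3.1
n = len(data)

# Theoretical quantiles F^(-1)((K - 1/2)/n) of a normal distribution fitted to the data
normal = NormalDist(mu=data.mean(), sigma=data.std())
theo = [normal.inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]

for t, x_k in zip(theo, data):
    print(round(t, 1), x_k)    # the plotted points; strong deviations from the line y = x
                               # (here, e.g., for the value 119) speak against normality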
7.5. Scatter diagram
For variables measured on the ratio scale, we can compute
the differences among the observations.

One-dimensional: we plot the individual observations as


points on the x-axis.
Two-dimensional: we display the data as a collection of
points, each having the value of one variable on the x-axis
and the value of the other variable on the y-axis.

Usually the goal of this graphical method is to get an idea


about how to define the classes in a histogram and/or about
the relation between two variables.
310
Example 7.5.1:
One-dimensional:
x xxx xxx xxx xxx x
Two-dimensional:
waiting time between eruptions
and the duration of eruption for
the Old Faithful geyser in
Yellowstone National Park.

Scatter diagrams are easy to construct and to interpret


(provided that the sample size is not too large): one can see
the domain, concentrations of values (clusters), outliers,
relation between two variables,… 311
Reading: Chapter 7.1, DeGroot and Schervish
8. Estimation of unknown
parameters

313
A central problem in statistics consists of the
identification of random variables of interest, the
specification of a joint distribution or a family of
possible joint distributions for the observable random
variables, and the identification of any parameters of
those distributions that are assumed unknown.

These tasks must be done using an observable


random sample X1,…, Xn with realizations x1,…, xn.
In particular, the unknown parameters must be
estimated as accurately as possible.
314
For the estimation of the unknown parameters, we
have two main approaches:

1) Point estimation:
For each parameter one gets a single value from
the sample as a result of the estimation procedure.

2) Confidence intervals:
The idea is to get some intervals of values in which
the true unknown parameters are contained with
high probability (confidence).

315
The starting point in both approaches is the definition of a
so-called estimator (or statistic).

Suppose that the observable random variables of interest


are X1 ,…, Xn . Any arbitrary real-valued function

θ n ( X1 ,…, X n )

of the n random variables is called an estimator.


The estimator tells us how the random variables must be
handled to get an optimal estimation of the parameter(s).

316
Reading: Chapter 7.1, DeGroot and Schervish
8.1. Intuitive examples of estimators

We often have to deal in practice with the problem that


although we can sometimes safely identify the
underlying distribution generating the data, we have no
clue about the unknown parameters of that distribution.
Sometimes even the distribution of the data-generating process
is not easily understood from the data available.
However, we can still try to understand the main
features of the underlying distribution such as the
location or the variance parameter.
318
We want to estimate µ=E[X] thanks to a random
sample X1,..., Xn, generated from the population
distribution having probability or density function fx.

Idea: From the natural interpretation of µ as location


parameter of the distribution, use the arithmetic mean
of the sample (sample mean) as estimator:
   μ̂ = t(X_1, ..., X_n) = X̄_n = (1/n) · Σ_{i=1}^{n} X_i.

This type of estimation is called point estimation,
given that the result is a single value (and not a
plausible interval of values). 319
Example 8.1.1: (height of students)
We select a random sample of 10 students from all
students participating in a specific class. The height X
in cm is determined and reported in the following table:
i 1 2 3 4 5 6 7 8 9 10

xi 176 180 181 168 177 186 184 173 182 177

The estimate of the average height of the students in


the class is computed as

µˆ = x n = 178.4 cm.
320
Question: Is this a good estimator?

To answer this kind of question we need to introduce


properties that estimators must satisfy to be good estimators.
This is done in the next section.

Example 8.1.2: (binomial experiment)


Following the same reasoning as in the last example, if we
consider a binomial experiment, we should use as estimators
for the population variance σ2 and the success probability
parameter p (respectively):
   σ̂² = t(X_1, ..., X_n) = S_x² = (1/n) · Σ_{i=1}^{n} (X_i − X̄_n)²

and

   p̂ = t(X_1, ..., X_n) = (1/n) · Σ_{i=1}^{n} X_i.
321
Illustration: binomial experiment

322
Example 8.1.1 (continued):

Using the data summarized in the table on slide 320,


we get for the variance

   s_x² = 25.84.

Thus, a natural estimate for the variance (scale
parameter) of the underlying distribution is

   σ̂² = s_x² = 25.84.

323
Example 8.1.3: (firm’s lifetime)

324
Estimators t(X1,…, Xn) are random variables and therefore
have their own distribution.
Notation:
• symbol "^" means estimator (pronounced "Hat")
• T = t(X1,…, Xn) is a random variable; its realization (called
estimate) is based on the sample observations xi (i = 1,…,
n) of the corresponding random variables in the sample.
• E[T] = µT denotes the expected value of the estimator T.
• V(T) = E[(T − μ_T)²] = σ_T² denotes the variance of the
estimator T.

Note that often for a given parameter there may be different


estimators.
325
Example 8.1.4:
Let X be Poisson distributed with parameter λ; then:

   P(X = x) = (λ^x / x!) · e^(−λ),   with E[X] = V(X) = λ.

Now, should we use

   λ̂ = t(X_1, ..., X_n) = X̄,
   λ̂ = t(X_1, ..., X_n) = S_x²,

or even other functions as the estimator of λ?


This decision must be made based on whether the
candidate estimators satisfy some relevant goodness
properties.
326
Reading: Chapters 7.4 and 8.7, DeGroot and Schervish
8.2. Properties of estimators
Definition (goodness properties)

A) Unbiased estimators
An estimator T = t ( X1 ,…, X n ) = θˆ is an unbiased estimator
of θ if E [ T ] =μT exists and E [ T ] =μT =θ for all values of θ.

In fact, it is desirable to use an estimator T with probability


distribution concentrated around θ.

The difference between the expected value of T and the


parameter θ is called bias: bias = E [ T ] − θ.
328
Example 8.2.1: (sample mean and variance)

i) T = t(X_1, ..., X_n) = X̄_n = (1/n) · Σ_{i=1}^{n} X_i is an unbiased
   estimator for μ = E[X] (i.e., μ = E[X_i], i = 1, ..., n):

   E[T] = E[X̄_n] = E[(1/n) · Σ_{i=1}^{n} X_i] = (1/n) · Σ_{i=1}^{n} E[X_i] = (1/n) · n · μ = μ.

ii) T = t(X_1, ..., X_n) = S² = (1/n) · Σ_{i=1}^{n} (X_i − X̄)² is a biased
    estimator for σ² = V(X); in fact, its expectation
    equals ((n−1)/n) · σ² and not σ² (for proof see next slide).
329
Example 8.2.1 (continued):

   E[T] = E[S²] = E[(1/n) · Σ_{i=1}^{n} (X_i − X̄)²] = E[(1/n) · Σ_{i=1}^{n} ((X_i − μ) − (X̄ − μ))²]

        = (1/n) · E[Σ_{i=1}^{n} (X_i − μ)² − n · (X̄ − μ)²]

        = (1/n) · ( Σ_{i=1}^{n} E[(X_i − μ)²] − n · E[(X̄ − μ)²] )

        = (1/n) · ( n · V(X) − n · V(X̄) )

        = (1/n) · ( n · σ² − n · σ²/n ) = (1/n) · σ² · (n − 1) = ((n−1)/n) · σ²

=> Therefore, an unbiased estimator of σ² is given by

   T = t(X_1, ..., X_n) = (n/(n−1)) · S² = (1/(n−1)) · Σ_{i=1}^{n} (X_i − X̄)².
330
Example 8.2.1 (continued):
iii) Consider a Bernoulli random sample Z1,…, Zn:
   f_Z(z) = p^z · (1 − p)^(1−z),   z ∈ {0, 1}.

Then, the estimator

   t(Z_1, ..., Z_n) = Z̄ = (1/n) · Σ_{i=1}^{n} Z_i

is unbiased for the success probability parameter p.

This means that the share of successes in the


sample is an unbiased estimator for the success
probability parameter p in a binomial experiment.
331
Question: Which estimator for V(Z) = p(1 − p)?

Idea: Consider Z̄(1 − Z̄).

Is this estimator unbiased?

   E[Z̄(1 − Z̄)] = E[Z̄] − E[Z̄²] = p − (n²·p² + n·p·(1 − p)) / n²

                = ((n − 1)/n) · p·(1 − p) ≠ p·(1 − p)

But: E[Z̄(1 − Z̄)] → p·(1 − p) for n → ∞.
332
Graphical illustration:

333
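A simulation sketch of the bias just derived (p and the sample sizes are arbitrary choices): the average of Z̄(1 − Z̄) over many samples is close to ((n−1)/n)·p·(1−p) and approaches p·(1−p) as n grows.

import numpy as np

rng = np.random.default_rng(seed=4)
p = 0.3
reps = 100_000

for n in (5, 20, 100):
    zbar = rng.binomial(n, p, size=reps) / n          # sample shares Z-bar
    est = zbar * (1 - zbar)
    print(n, est.mean(), (n - 1) / n * p * (1 - p), p * (1 - p))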
Definition (goodness properties)

B) Asymptotically unbiased estimators

If the bias of an estimator becomes monotonically


smaller when the sample size increases, and in the
limit as n → ∞ vanishes, then we say that the
estimator is asymptotically unbiased:

lim E [ T ] = θ.
n →∞

334
Definition (goodness properties)

C) Consistent estimators
A sequence of estimators {θˆ n = t ( X1,…, Xn )}n that
converges in probability to the unknown parameter θ
being estimated, as n → ∞, is called consistent;
that is:

   P(|θ̂_n − θ| > ε) → 0 as n → ∞, ∀ε > 0.

Notation:

   θ̂_n →(P) θ as n → ∞.

335
Graphical illustration of consistency:
Data: firm’s lifetime example 8.1.3.

336
Theorem (practical consistency check):

An estimator is consistent when the following two


conditions are satisfied:

• it is unbiased (or at least asymptotically unbiased);

and

• as n → ∞, its variance vanishes.

337
Example 8.2.2: (sample mean)

Let X1,…, Xn be a random sample from a distribution


with expected value parameter µ and standard
deviation parameter σ.
Let T_n = t(X_1, ..., X_n) = X̄_n = (1/n) · Σ_{i=1}^{n} X_i.

We proved already that T_n is unbiased for μ.

Now: V(T_n) = V((1/n) · Σ_{i=1}^{n} X_i) = σ²/n → 0 as n → ∞.

⇒ By the theorem above, T_n is consistent for μ, i.e., T_n →(P) μ.
338
Definition (goodness properties)
D) (Relative) efficient estimators
Let T and U1, U2,…, UK denote unbiased estimators for
the unknown parameter θ :
E [ T ] = E [U1 ] =  = E [UK ] = θ.
Then, T is called efficient if
V ( T ) ≤ V (Ui ) , i=1,…, K.
If more than one unbiased estimator exists, we choose
the one with the smallest variance.
(→ we expect that the realized estimates are less
dispersed around the true unknown θ ).
339
Graphical illustration of efficiency
Let X1, …, Xn be a random sample from a distribution
with expected value μ and variance σ², and let n be
an even number. Then:

   X̄_a = (1/n) · Σ_{i=1}^{n} X_i

   X̄_b = (2/n) · Σ_{i=1}^{n/2} X_{2i}

are two unbiased estimators for the mean parameter,
given that

   E(X̄_a) = E((1/n) · Σ_{i=1}^{n} X_i) = (1/n) · Σ_{i=1}^{n} E(X_i) = (1/n) · n · E(X_i) = μ;

   E(X̄_b) = E((2/n) · Σ_{i=1}^{n/2} X_{2i}) = (2/n) · Σ_{i=1}^{n/2} E(X_{2i}) = (2/n) · (n/2) · E(X_{2i}) = μ.
340
Graphical illustration of efficiency (continued)
What about the variance? We can compute that

   V(X̄_a) = (1/n) · σ²   and   V(X̄_b) = (2/n) · σ².

[Histograms of the sampling distributions of X̄_a and X̄_b]
341
Graphical illustration of efficiency (regression)

342
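A simulation sketch of the efficiency comparison above (normal data, the parameter values, and the sample size are arbitrary choices): both estimators are unbiased, but X̄_b, which uses only every second observation, has about twice the variance of X̄_a.

import numpy as np

rng = np.random.default_rng(seed=5)
mu, sigma, n, reps = 10.0, 3.0, 50, 100_000     # n even

x = rng.normal(mu, sigma, size=(reps, n))
xbar_a = x.mean(axis=1)                 # uses all n observations
xbar_b = x[:, 1::2].mean(axis=1)        # uses only X_2, X_4, ..., X_n (n/2 observations)

print(xbar_a.mean(), xbar_b.mean())     # both close to mu (unbiased)
print(xbar_a.var(), sigma**2 / n)       # about sigma^2 / n
print(xbar_b.var(), 2 * sigma**2 / n)   # about 2 * sigma^2 / n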
Definition (goodness properties)

E) Mean squared error (MSE)

The mean squared error (MSE) complements the criteria
introduced so far to judge the goodness of an estimator.

Let T = t(X_1, ..., X_n) denote an estimator of the unknown
parameter θ. Then

   MSE(θ) = E[(T − θ)²]

is the mean squared error of the estimator T.


343
From the definition we see that for an estimator T with
finite variance, the MSE of T as an estimator of θ
equals its variance plus the square of its bias:

   MSE(θ) = E[(T − θ)²] = E[((T − μ_T) − (θ − μ_T))²]

          = E[(T − μ_T)²] − 2·(θ − μ_T)·E[T − μ_T] + (θ − μ_T)²        (note: E[T − μ_T] = 0)

          = E[(T − μ_T)²] + (θ − μ_T)²

          = V(T) + bias².
344
Reading: Chapters 7.5 and 7.6, DeGroot and Schervish
8.3. Main methods to get estimators

So far we focused our attention on the properties


required for estimators to be good estimators and
therefore yield accurate estimates.

Question: What general methods are available to yield


estimators satisfying the goodness
properties?

We have several possible approaches. Among them:


 Method of moments
 Least-squares method (regression)
 Maximum Likelihood method 346
A) Method of moments
Idea: estimate the moment of the underlying distribution
assumed to generate the data using the corresponding
sample moments.

Assume that X1 ,…, Xn form a random sample from a


distribution with at least k existing moments. Define
   μ_j(θ) = E_θ[X^j],   j = 1, ..., k.

Suppose that the function M^(−1)(θ) = (μ_1(θ), ..., μ_k(θ)) is a one-
to-one function of θ. Define the sample moments by

   m_j = (1/n) · Σ_{i=1}^{n} X_i^j,   j = 1, ..., k.
347
The method of moments estimator (MME) of θ is
M(m_1, ..., m_k).

   theoretical moments                sample moments
   μ_1 = E[X] = μ                     m_1 = (1/n) · Σ_{j=1}^{n} X_j
   μ_2 = E[X²] = μ² + σ²              m_2 = (1/n) · Σ_{j=1}^{n} X_j²
   ...                                ...
   μ_k = E[X^k]                       m_k = (1/n) · Σ_{j=1}^{n} X_j^k
348
The usual way of implementing the method of
moments is to set up the k equations m_j = μ_j(θ)
and then solve for θ.

For example, we then get the following method of
moments estimators for the mean and the variance
parameters, respectively:

   μ̂ = X̄_n;

   σ̂² = (1/n) · Σ_{i=1}^{n} (X_i − X̄_n)² = S².

Goodness properties of MMEs: Method of moments
estimators are consistent and asymptotically unbiased.
349
Illustration: normal distribution

350
B) Least-squares method
Idea: It is generally used in linear regressions.
Suppose that the goal is to estimate the unknown mean
parameter μ.
This method implies choosing as estimator of μ the
function μ̂ LS defined as
   μ̂_LS = argmin_μ Σ_{i=1}^{n} (X_i − μ)²,

that is, the statistic that minimizes the sum of the squared
distances between the sample observations and μ.

Solution:

   μ̂_LS = (1/n) · Σ_{i=1}^{n} X_i = X̄_n.
351
C) Maximum Likelihood method

Illustrative starting example:

A certain lion has three possible states of activity each
night; it is "very active" (denoted by θ_1), "moderately
active" (denoted by θ_2), or "lethargic" (denoted by θ_3).

Also, each night this lion eats people; it eats i people with
probability

   p(i | θ),   θ ∈ Θ = {θ_j, j = 1, 2, 3}.
The numerical values are given in the following table (see
next slide):
352
   i           0      1      2      3
   p(i | θ_1)  0.00   0.05   0.05   0.90
   p(i | θ_2)  0.05   0.05   0.80   0.10
   p(i | θ_3)  0.90   0.08   0.02   0.00
If we are told that exactly X = x people were eaten last
night, how should we estimate the lion's activity state?

One seemingly reasonable method is to estimate θ as
that θ ∈ Θ for which p(x|θ) is largest.

This kind of reasoning is the core of the maximum
likelihood method: we choose the most plausible
parameter, in the sense that it maximizes the
probability of the event we observe.
353
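To make this concrete, the following small Python sketch (an addition, not part of the original slides) encodes the table above and returns the state θ that maximizes p(x|θ) for an observed count x.

# Probabilities p(i | theta) taken from the table above.
p = {
    "theta1": [0.00, 0.05, 0.05, 0.90],  # very active
    "theta2": [0.05, 0.05, 0.80, 0.10],  # moderately active
    "theta3": [0.90, 0.08, 0.02, 0.00],  # lethargic
}

def ml_state(x):
    """Return the state theta maximizing p(x | theta)."""
    return max(p, key=lambda theta: p[theta][x])

for x in range(4):
    print(x, ml_state(x))
# x = 0 -> theta3, x = 1 -> theta3, x = 2 -> theta2, x = 3 -> theta1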
Maximum Likelihood method:
Let X1,…, Xn be a random sample from a known
population distribution fX depending on the parameter θ
to be estimated.
When the joint probability/density function of the
observations in a random sample is regarded as a
function of θ,
$L(\theta;\, x_1,\dots,x_n) = f_{(X_1,\dots,X_n)}(\theta;\, x_1,\dots,x_n) = \prod_{i=1}^{n} f_X(\theta;\, x_i),$

for given values of x1,…, xn, it is called the likelihood
function.
354
For each possible observed vector x1,…, xn, let
T(x1,…, xn) denote a value of θ for which the
likelihood function is a maximum, and let
$\hat{\theta}_{ML} = T(X_1,\dots,X_n)$
be the estimator of θ defined in this way. The
estimator $\hat{\theta}_{ML}$ is called a maximum likelihood
estimator (MLE) of θ:
$\hat{\theta}_{ML} = \arg\max_{\theta} L(\theta;\, X_1,\dots,X_n) = \arg\max_{\theta} \log L(\theta;\, X_1,\dots,X_n),$
where the two maximizers coincide because the logarithm is strictly increasing.

After (X1,…, Xn) = (x1,…, xn) is observed, the value
T(x1,…, xn) is called the maximum likelihood estimate.
355
Example 8.3.1:
MLE of p in a Bernoulli population:
$L(p;\, x_1,\dots,x_n) = \prod_{i=1}^{n} p^{x_i} (1-p)^{1-x_i}, \quad x_i \in \{0,1\}.$

$\log L(p;\, x_1,\dots,x_n) = \sum_{i=1}^{n} \big( x_i \log p + (1-x_i) \log(1-p) \big)$

$\frac{\partial \log L(p;\, x_1,\dots,x_n)}{\partial p} = \frac{\sum_{i=1}^{n} x_i}{\hat{p}} - \frac{\sum_{i=1}^{n} (1-x_i)}{1-\hat{p}} \overset{!}{=} 0$

$\Leftrightarrow \; (1-\hat{p}) \cdot \sum_{i=1}^{n} x_i = \hat{p} \cdot \Big( n - \sum_{i=1}^{n} x_i \Big)$

$\Rightarrow \; \hat{p}_{ML} = \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X}_n.$
356
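As a quick numerical cross-check (added here, not in the original slides), the sketch below maximizes the Bernoulli log-likelihood over a grid of p values and compares the grid maximizer with the closed-form MLE, the sample mean.

import numpy as np

def bernoulli_loglik(p, x):
    """Log-likelihood of a 0/1 sample x for parameter p."""
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=200)   # hypothetical 0/1 sample

grid = np.linspace(0.001, 0.999, 999)
p_grid = grid[np.argmax([bernoulli_loglik(p, x) for p in grid])]

print(p_grid, x.mean())  # the two values agree up to the grid resolution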
Example 8.3.2: Let X1,…, Xn be a random sample
from a uniform continuous distribution:
$f_{Uni}(x) = \begin{cases} \frac{1}{\theta}, & 0 \le x \le \theta \\ 0, & \text{else} \end{cases}$

$L(\theta;\, x_1,\dots,x_n) = \left(\frac{1}{\theta}\right)^{\!n}, \quad \text{if all } x_i \in [0,\theta].$

L is monotonically decreasing in θ, but all xi must lie in [0,θ],
that is, θ ≥ xi for i = 1,…, n

$\Rightarrow \; \hat{\theta}_{ML} = \max(X_1,\dots,X_n).$

357
Graphical illustration of example 8.3.2:

358
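Since the figure itself did not survive extraction, here is a minimal Python sketch (an assumed stand-in for the illustration) that evaluates the uniform likelihood for a small hypothetical sample: L(θ) is zero for θ below max(x) and decreases like θ^(-n) above it, so the maximum sits exactly at max(x).

import numpy as np

x = np.array([0.9, 2.1, 1.4, 3.2, 0.5])   # hypothetical sample

def uniform_likelihood(theta, x):
    """L(theta) = theta^(-n) if all observations lie in [0, theta], else 0."""
    return theta ** (-len(x)) if theta >= x.max() else 0.0

thetas = np.linspace(0.1, 6.0, 60)
values = [uniform_likelihood(t, x) for t in thetas]
print(x.max(), thetas[int(np.argmax(values))])  # the grid maximizer sits at (or just above) max(x)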
Properties of maximum likelihood estimators

1. Invariance property of MLEs:
If $\hat{\theta}$ is the maximum likelihood estimator of θ and
if g is a one-to-one function, then $g(\hat{\theta})$ is the
maximum likelihood estimator of g(θ).

2. Goodness properties of MLEs:
Maximum likelihood estimators are consistent
and asymptotically unbiased.

359
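A brief illustration of the invariance property (an added example, assuming a normal model): the MLE of the variance is the uncorrected sample variance, so invariance with the one-to-one map $g(t) = \sqrt{t}$ on $[0,\infty)$ gives the MLE of the standard deviation,

$\hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2 \quad \Longrightarrow \quad \hat{\sigma}_{ML} = g\big(\hat{\sigma}^2_{ML}\big) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2}.$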
Reading: DeGroot and Schervish, Chapter 8.5
9. Confidence intervals

361
9.1. The idea
Confidence intervals provide a method of adding
more information to an estimator when we wish to
estimate an unknown parameter θ.
We can find an interval (A,B) that we think has high
probability of containing θ. The length of such an
interval gives us an idea about how closely we can
estimate θ and how large the sampling error is.

362
Definition
Symmetric (1-α)-confidence interval:
$\mathrm{CONF}_{1-\alpha}(\theta) = \big[\, \hat{\theta}_n - f_n \;;\; \hat{\theta}_n + f_n \,\big],$
where $f_n$ denotes the sampling error and is
computed in such a way that the confidence interval
contains the unknown parameter θ with a given
probability (1-α):
$P\big[\, \theta \in \mathrm{CONF}_{1-\alpha}(\theta) \,\big] = 1-\alpha.$

The probability (1-α) is called the confidence level.
Classical values for confidence levels are 0.95 and
0.99.
363
Interpretation: The confidence interval can be regarded
as the observed value of the random interval (A,B) after
observing the data.

In fact, one way to think of the random interval (A,B) is to
imagine that the sample that we observed is one of many
possible samples that we could have observed.

Each such sample would allow us to compute an observed
interval.

[Figure: observed intervals from repeated samples; axis labels X and θ.]
364
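To make the "many possible samples" picture concrete, the following Python sketch (an addition, not from the slides) simulates repeated samples from a normal population, builds the large-sample 95% interval for the mean from each, and reports the fraction of intervals that cover the true μ; under these assumptions the fraction should be close to 0.95.

import numpy as np

rng = np.random.default_rng(42)
mu, sigma, n, reps = 5.0, 2.0, 100, 10_000
q = 1.959964  # 0.975-quantile of the standard normal distribution

covered = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, size=n)
    half_width = q * x.std(ddof=0) / np.sqrt(n)   # sampling error with estimated sigma
    if x.mean() - half_width <= mu <= x.mean() + half_width:
        covered += 1

print(covered / reps)  # empirical coverage, close to 0.95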
9.2. Example of a confidence interval
(mean of a distribution, large samples)

Let us consider the construction of the (1-α)-confidence
interval for the mean parameter μ of the population
distribution.

CLT: $\; P\!\left[ \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le q \right] \xrightarrow[n \to \infty]{} F_Z(q).$

Then (if n is large enough):
$P\!\left[ -q_{1-\frac{\alpha}{2}} \le \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le q_{1-\frac{\alpha}{2}} \right] \cong 1-\alpha,$

where $q_{1-\frac{\alpha}{2}}$ denotes the $\big(1-\frac{\alpha}{2}\big)$-quantile of the standard
normal distribution.
366
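The step to the next slide is a single rearrangement of the event inside the probability, added here as a bridging step:

$-q_{1-\frac{\alpha}{2}} \le \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le q_{1-\frac{\alpha}{2}} \;\Longleftrightarrow\; \bar{X}_n - q_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{X}_n + q_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}.$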


We can now solve with respect to the parameter of
interest, getting the (1-α)-confidence interval for μ:
$\mathrm{CONF}_{1-\alpha}(\mu) = \left[\, \bar{X}_n - q_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \;;\; \bar{X}_n + q_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \,\right].$

If σ is also unknown, for n ≥ 50 we have that
$\mathrm{CONF}_{1-\alpha}(\mu) = \left[\, \bar{X}_n - q_{1-\frac{\alpha}{2}} \frac{S_n}{\sqrt{n}} \;;\; \bar{X}_n + q_{1-\frac{\alpha}{2}} \frac{S_n}{\sqrt{n}} \,\right]$

is the (1-α)-confidence interval for μ, with
$S_n^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2.$

367
Example 9.2.1: (rent index)

The municipal administration asks 50 households about
their rent per m² (exclusive of heating) to compute the
local rent index. This results in the following numbers:
$\bar{X}_{50} = 8.30\,€ \quad \text{and} \quad S_{50} = 2.07\,€.$

What is the 0.9-confidence interval for the average rent
parameter μ?

$\mathrm{CONF}_{90\%}(\mu) = \left[\, 8.30 - 1.645 \cdot \frac{2.07}{\sqrt{50}} \;;\; 8.30 + 1.645 \cdot \frac{2.07}{\sqrt{50}} \,\right] = [\,7.82\;;\;8.78\,].$
368
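The interval is easy to reproduce directly; the Python sketch below (an addition to the slides) implements the large-sample formula from slide 367 and checks the rent-index numbers, with the 0.95-quantile 1.645 of the standard normal hard-coded rather than looked up in a table.

import math

def large_sample_ci(xbar, s, n, q):
    """(1-alpha)-confidence interval for the mean; q is the (1 - alpha/2)-quantile of N(0,1)."""
    half_width = q * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Rent-index example: n = 50, mean 8.30, standard deviation 2.07, alpha = 0.10
print(large_sample_ci(8.30, 2.07, 50, 1.645))  # approximately (7.82, 8.78)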
Reading: DeGroot and Schervish, Chapter 9.1
9.3. Relation with testing hypotheses

Consider again a statistical problem involving a
parameter θ whose true value is unknown but must
lie in a certain parameter space Θ.

Suppose Θ can be partitioned into two disjoint
subsets Θ0 and Θ1, and we are interested in verifying
whether θ lies in Θ0 or in Θ1.

A problem of this type is called a problem of
hypothesis testing. The observed values provide
information about θ on which to base the decision.
370
$H_0: \theta \in \Theta_0$
$H_1: \theta \in \Theta_1$
are called the target (null) and alternative hypotheses.

When performing a test, if we decide that θ lies in
Θ1, we are said to reject the target hypothesis.

If we decide that θ lies in Θ0, we are said not to
reject H0.

One possible way to make our decision about θ is
to construct a confidence interval.
371
In such a case we are interested in the following
type of hypotheses:
$H_0: \theta = \theta_0$
$H_1: \theta \neq \theta_0.$

Example: Mean of a distribution (large samples)

Idea: Reject $H_0: \mu = \mu_0$ if the distance between the
arithmetic mean and $\mu_0$ is large enough:
$\left| \bar{X}_n - \mu_0 \right| \ge c,$
where the value c is determined by the significance
level α of the test: $c = q_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}.$
372
If $\bar{X}_n = \bar{x}_n$ is observed, the set of $\mu_0$ such that we
would not reject $H_0$ is the set of $\mu_0$ such that
$\left| \bar{x}_n - \mu_0 \right| < q_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$
in case σ is known (otherwise see slide 367).

This inequality translates directly into the formula
given on slide 367 for the confidence interval
(with $\mu = \mu_0$):
$\mathrm{CONF}_{1-\alpha}(\mu) = \left[\, \bar{X}_n - q_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \;;\; \bar{X}_n + q_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \,\right].$
373
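Operationally, the duality means: reject H0: μ = μ0 at level α exactly when μ0 falls outside the (1-α)-confidence interval. The short Python sketch below (an added illustration) applies this to the rent-index numbers from example 9.2.1 for two hypothetical values of μ0.

import math

def large_sample_ci(xbar, s, n, q):
    """(1-alpha)-confidence interval for the mean; q is the (1 - alpha/2)-quantile of N(0,1)."""
    half_width = q * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def reject_h0(mu0, xbar, s, n, q):
    """Two-sided large-sample test of H0: mu = mu0 via the confidence-interval duality."""
    lower, upper = large_sample_ci(xbar, s, n, q)
    return not (lower <= mu0 <= upper)

print(reject_h0(9.00, 8.30, 2.07, 50, 1.645))  # True: 9.00 lies outside [7.82, 8.78]
print(reject_h0(8.50, 8.30, 2.07, 50, 1.645))  # False: 8.50 lies inside the interval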
Part III: Exercises

375
Exercise 1.1.1:
Give the sample space in the following cases:
1) A person is asked about her birthday.
2) K persons are asked about their birthdays.
3) Position of a locator on the unit circle.
4) Let S = {"0 times six", "2 times six"}.
Is it a sample space for the experiment of rolling
two dice?

376
Exercise 1.2.1:
1) Rolling one die:
   A: "number smaller than 4"
   B: "odd number"
   → A ∪ B = ?   A ∩ B = ?

2) A = {(x, y) | ax + by + c = 0}
   B = {(x, y) | ax + by + d = 0}
   → A ∩ B = ?

3) Prove De Morgan's laws:
$\overline{A \cup B} = \bar{A} \cap \bar{B} \quad \text{and} \quad \overline{A \cap B} = \bar{A} \cup \bar{B}.$
377
Exercise 1.2.1 (continued):

4) Rolling two dice:


S = {(1,1) , (1,2 ) ,..., (1,6 ) , ( 2,1) , ( 2,2 ) ,..., ( 6,6 )}
and
A: ‘‘at least one die is a six’’
B: ‘‘the two dice show the same number’’
C: ‘‘the two dice show odd numbers’’

378
Exercise 1.2.1 (continued):
Write the following events as subsets of S:

A = ?
B = ?
C = ?
$\bar{A}$ = ?
$\bar{C}$ = ?

379
Exercise 1.2.1 (continued):

B∩C = ?
B\C =?
A ∪C = ?
A ∩B ∩C = ?

380
Exercise 1.3.1:

Which probability concept?

(1) The probability that, playing "Lotto", we pick three
or more correct numbers equals only about 2%.

(2) The probability that next June in Frankfurt it is


going to snow is smaller than 5%.

(3) The probability that the applicant X for the


announced position Y is invited for an interview
equals 80%.

381
Exercise 1.5.1:

“Rolling two dice” (see exercise 1.2.1 (4))

A: “at least one die is a six”


B: “the two dice show the same number”
C: “the two dice show odd numbers”

Compute the probabilities of the events above
and of the derived events from exercise 1.2.1 (continued).

382
Exercise 1.5.2: (sick notes)

The (frequentist) probabilities for the sick notes of three


employees X, Y and Z are summarized in the following
table:
Ei {-} {X} {Y} {Z} {XY} {XZ} {YZ} {XYZ}

P(Ei) 0.751 0.1 0.063 0.061 0.011 0.008 0.005 0.001 Σ=1

Compute:
→ P(“X ill”) = ?
→ P(“Y ill”) = ?
→ P(“X and Y ill”) = ?
→ P(“X or Y ill”) = ?

383
Exercise 1.6.1:
Compute the probability of the following two events:
a) A: “Rolling four dice we get at least one six’’
b) B: “Rolling two dice 24 times we get at least one
twelve as the sum of the numbers’’

384
Exercise 1.6.2:

Let us consider the sample space

S= {( a,b ) | 0 < a < 3 and 0 < b < 2}.


Define the event
$U = \left\{ (a,b) \;\middle|\; a \in [0,3],\; b < 1 - \tfrac{a}{6} \right\}.$
P(U) = ?

385
Exercise 1.7.1: (sick notes, see exercise 1.5.2)

→ Are the sick notes of the three employees X, Y


and Z pairwise independent?
I) P(‘‘X and Y ill’’) = ?
II) P(‘‘X and Z ill’’) = ?
III) P(‘‘Y and Z ill’’) = ?
⇒ How does the probability of sick notes of Y change
in reaction to a sick note of X?
P(‘‘Y ill’’ | ‘‘X ill’’) = ?

386
Exercise 1.7.2:
Two independent elevators A and B, identical from
both a technical and a functional point of view, are
located in an office building.
The probability that the elevator A (or B) at a given
point in time is on the ground floor equals 0.2.

387
Exercise 1.7.2 (continued):

→ What is the probability that a visitor coming at a


randomly chosen point in time....
I) finds both elevators on the ground floor?
II) finds at least one elevator on the ground floor?
III) finds exactly one elevator on the ground floor?
→ Both elevators have a failure probability equal to 5%
when they are not on the ground floor.
What is the probability that...
IV) elevator A fails given that it is not on the ground
floor?
388
Exercise 1.7.3:
The transmission of a communication from A to B can
be done using 3 independent channels. Each channel
can fail with a given probability ‘‘p’’.

Compute the following probability:


P[“transmission successful’’] = ?

389
Exercise 1.8.1: (draw of the ‘Zusatzzahl’ in Lotto)

Lotto works as follows: seven balls are drawn without


replacement from an urn containing 49 balls numbered
consecutively (with integer numbers from 1 to 49).

The number of the last ball is called ‘Zusatzzahl’.

What is the probability of the event

A: “Zusatzzahl 1 is drawn’’?

390
Exercise 1.8.2: (supplier with differences in quality)
An automaker equips its vehicles with air conditioning
systems that it obtains from three different suppliers.

Supplier Share Defective items


A 50% 5%
B 30% 9%
C 20% 24%

M: “a randomly chosen vehicle has a malfunctioning


air conditioning system’’
→ P(M) = ?
→ when M is observed: P(A|M) = ? = P(A) = 0.5 ?
P(B|M) = ? and P(C|M) = ?
391
Exercise 1.9.1: (supplier with differences in quality)
(see exercise 1.8.2, continued)

P(A | M) = ?
P(B | M) = ?
P(C | M) = ?

392
Exercise 1.9.2: (urn)

Let us consider two urns U1 and U2.


U1 contains 5 white and 7 red balls. U2 contains one
white and 5 red balls.

We randomly choose an urn and from that urn we


randomly draw one ball.

The ball drawn is red. What is the probability that urn


U1 was chosen?
393
Exercise 3.0.1:

Define the appropriate random variable for:

→ the number of customers in a given store;

→ the burning time of a light bulb (in hours).

394
Exercise 3.1.1:

Let us consider rolling two dice.

S = {(i, j) | i , j ∈ {1,...,6}}
X = "sum of the numbers": ( i, j ) → i+j
W = {2,...,12}

What is the distribution function of X?

395
Exercise 3.1.2:
A machine produces defective items with probability 5%.

We choose randomly 4 items from the total production.

Let us denote with X the random variable


X = ‘‘number of defective items in the sample’’.

Then: W={0,1,2,3,4}.

What is the distribution function of X?

396
Exercise 3.2.1:

A random variable has the following probability
function:

$f_X(x) = \begin{cases} K & \text{for } x = 0 \\ 2K & \text{for } x = 1 \\ 3K & \text{for } x = 2 \\ 5K & \text{for } x = 3 \\ 0 & \text{else.} \end{cases}$

397
Exercise 3.2.1 (continued):

→ Compute the constant K.

→ P (1 < X ≤ 3 ) = ?
P ( X > 1) = ?
P ( X = 1) = ?

→ Find the smallest value x of X such that


P ( X ≤ x ) = Fx ( x ) ≥ 0.5.

398
Exercise 3.2.2: (revenue under uncertain conditions)

(see example 3.0.5, continued)

The management is interested in the probability that

1. the total order volume next year amounts to at most
   30 million (M);
2. next year's absolute deviation from the fixed target of
   36 M for the total order volume amounts to at most 6 M.

399
Exercise 3.3.1:
Let us consider a random variable X with density
function f given by
$f(x) = \begin{cases} 3x^2, & 0 \le x \le \tfrac{1}{2} \\ \tfrac{3}{2}x, & \tfrac{1}{2} < x \le c \\ 0, & \text{else.} \end{cases}$
a) Determine the constant c such that the function f is
a density function. Sketch the density function.
b) Compute the distribution function of X.
400
Exercise 3.4.1:

1. Consider a random variable X with
$f(x) = \begin{cases} 3x^2, & 0 \le x \le \tfrac{1}{2} \\ \tfrac{3}{2}x, & \tfrac{1}{2} < x \le c \\ 0, & \text{else.} \end{cases}$
E[X] = ?

401
2. Waiting time at the 'S-Bahn' station (continued)

(see example 3.3.2)


E [ X] = ?
3. Two players A and B roll one die alternately.

The rolling player wins from his competitor


• 3 Euro if he gets 1 or 2;
• 6 Euro if he gets 6; and
• 0 Euro if the number obtained is 3, 4 or 5.
Compute the expected value of the random variable
X = "winning of A" when each player rolls the die once.
402
Exercise 3.5.1:
A random variable X has the density function
$f(x) = \begin{cases} \tfrac{3}{8}x^2, & 0 \le x \le c \\ 0, & \text{else.} \end{cases}$

Compute E[X] and V(X).

403
Exercise 3.5.2:
Compute the expected value and the variance of the
random variable
Y = 3X + 2,
where X has the probability function

X 1 2 5
f ( x ) 0.2 0.3 0.5

404
Exercise 3.5.3: (defective piping)
A piping is made of 20 segments. Given that the
outflow quantity is smaller than the inflow quantity,
there must be a leak somewhere.
Let us assume that there is exactly one leak and
that it is located in each one of the segments with
equal probability 1/20.
We would like to find the segment in which the
leak is located with the smallest possible number
of inspections (that is, measuring the flow rate at
each segment’s borders).
405
Exercise 3.5.3 (continued):

a) Compute the distribution of


X = ‘‘number of inspections’’
in the case that we check gradually each segment’s
border starting with the first one.
Compute E(X) and $\sigma_X^2$.

b) Is there a better, cheaper strategy?

406
Exercise 4.3.1: (quality check)
In the production of high-quality drinking glasses the
percentage of defective items equals 20%.
In the course of a quality check we take randomly
four drinking glasses with replacement.
X: ‘‘# of defective glasses in the sample’’
Y: ‘‘# of flawless glasses in the sample’’
Compute the probability that:
(1) exactly one glass in the sample is defective;
(2) at least two glasses in the sample are defective;
(3) exactly one glass in the sample is flawless.
Compute E[X], E[Y], V(X), V(Y).
407
Exercise 4.4.1: (clients at the bank counter)
Clients come to a given bank counter at some unpredictable
point in time: in the morning (8-12) on average 12 clients per
hour and in the afternoon (14-16) on average 10 clients per
hour.
Assume that clients arrive independently of each other,
whether it is morning or afternoon. Compute the probability…
1) that on a given day between 09.00h and 09.15h no client
comes to the bank counter;
2) that on a given day between 15.00h and 15.15h no client
comes to the bank counter;
3) that on a given day between 15.30h and 16.00h more than
6 clients show up at the bank counter.

408
Exercise 4.6.1: (clients at the bank counter)
(see exercise 4.4.1, continued)

→ in the morning (8-12): 12 clients per hour


→ in the afternoon (14-16): 10 clients per hour

What is the probability that at a given point in time in


the morning/afternoon a client shows up at the bank
counter in the following five minutes?

→ Xmor = ‘‘time until the arrival of the next client’’


(morning) ~ fEx (x; 12)
→ Xaft = ‘‘time until the arrival of the next client’’
(afternoon) ~ fEx (x; 10)
409
Exercise 4.7.1:
Working with general normal distributions N(µ, σ2):

Let X ~ fN (x ; 2,16)

→ P ( X ≤ 0) =?
→ P ( X ≤ 2) =?

Now consider X ~ fN (x ; 5, 100). Find q such that:

→ P(X ≤ q) = 0.25;
→ P(X ≤ q) = 0.75.

410
Exercise 5.1.1:

Let us consider a box containing 2 white balls, 3 black
balls, and 1 blue ball. We draw 2 balls with replacement.

Define the random variables:


X = ‘‘number of white balls in the sample’’
Y = ‘‘number of blue balls in the sample’’

Find the joint (bivariate) distribution and the marginal


distributions of X and Y.

411
Exercise 5.4.1:

Let {X1, X2, …} be a random sample with E(Xi) = μ
and V(Xi) = σ². Let
$Z = \frac{1}{22} \cdot \sum_{i=1}^{2} \left( 4X_i^2 + 3X_i + 2 \right).$

Compute E[Z] and V(Z).

412
Exercise 5.4.2:

Let X and Y denote two independent, Poisson


distributed random variables. We know that V(X) +
V(Y) = 5.

Compute the probability P(X+Y ≤ 2).

Hint: The Poisson family of distributions satisfies the
additive property: if X ~ fPo and Y ~ fPo are
independent, then (X + Y) ~ fPo.
413
Exercise 6.1:
The (historical) probability that on a given day in June
in a Mediterranean holiday resort it rains equals 0.08.

a) What is the distribution of the number of rainy


days in a week (X7) and in the whole month of
June (X30)?
b) Compute expected value and variance of X7.
c) What is the probability that...
- it does not rain for a whole week in June?
- it rains at least three days in a week in June?
- in the whole month of June we observe at most
two rainy days?
414
Exercise 6.1 (continued):

In the same resort, the sunshine duration of a day in


June can be modeled as a normally distributed
random variable with µ = 10 [hours] and σ2 = 10.8
[hours2].

d) What is the distribution of the total sunshine
duration in June (Y30) and that of the average
sunshine duration in June ($\bar{Y}_{30}$), respectively?

e) What is the probability that the sun shines in the


whole month of June of a given year on average
more than 11 hours per day?
415
Exercise 6.2:
100 integer numbers from 1 to 5 are randomly chosen
and summed. What is the probability that the sum…

a) equals at most the value 250?

b) lies between the values 275 and 305 (boundaries


included)?

416
Exercise 6.3:

A die is rolled 300 times. Let X denote the number of


‘3’ that are observed.

Compute P(50 < X ≤ 53) and P(X < 40).

417
Exercise 7.4.1:

Below are summarized the results (in points) of an


exam (23 students):

6.2; 4.82; 2.96; 6.18; 6.52; 7.9; 9.62; 6.22; 0.42;


9.06; 11.7; 6.54; 3.14; 4.74; 2.66; 7.04; 7.78; 11.8;
9.44; 20.76; 2.9; 8.42; 8.02

Boxplot? QQ-Plot? Other summary statistics?

418
Exercise 8.2.1: (estimation of λ in a Poisson distribution)

Let us consider the following two estimators for the
parameter λ of a Poisson distributed population:
$T_1 = \bar{X}_n \quad \text{and} \quad T_2 = \frac{n}{n-1}\, S^2.$

i) Are T1 and T2 unbiased for λ?
ii) Is T1 consistent for λ?
iii) Is T1 (relatively) efficient with respect to T2?
419
Exercise 8.2.2:

Let
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad \tilde{X} = \frac{1}{n+1}\left( 2X_1 + \sum_{i=2}^{n} X_i \right)$
be two competing estimators for the mean
parameter E[X] = μ of the population distribution.
Assume that V(X) = σ² exists.

1. Show that both estimators are unbiased.
2. Compute their variances.
3. Which of the two estimators is (relatively)
efficient?
420
Exercise 8.3.1:

We toss n identical coins, each one until we get
'heads' for the first time.

Find the maximum likelihood estimator of


p = P[ "heads" ]
based on the random sample X1,…, Xn, where Xi
denotes the number of ‘tails’ needed before we get
for the first time ‘heads’ for the coin i.

421
Exercise 9.2.1:

A sector is made of N=12,100 individual companies.


We consider a random sample of size n=225.
The variable of interest is P = “annual profit” (in
Swiss francs).
Summary statistics for the results in 2006 are:
$\bar{p}_{225} = 600{,}000.- \quad ; \quad s_P = 90{,}000.-$
Find:
1. a confidence interval for the mean annual profit
at the level α=4.55%;
2. a confidence interval for the total annual profit of
the sector at the level α=4.55%.
422
Exercise 9.2.2:

In the June 1986 issue of Consumer Reports, some data on the


calorie content of beef hot dogs is given. Here are the numbers
of calories for 20 different hot dog brands:

186, 181, 176, 149, 184, 190, 158, 139, 175, 148,
152, 111, 141, 153, 190, 157, 131, 149, 135, 132.

Assume that these numbers are the observed values from a


random sample of twenty independent normal random variables
with mean µ and standard deviation σ, both unknown.

Find a 90% confidence interval for the mean number of calories


µ.

423
