Sampling, CI and HT
Applied Statistics and Decision Support
AA
Christopher Grigoriou
It seems bombers came back with holes in them, sometimes lots of holes.
Somebody in the brass suggested that they put more armor in the places
that had the most holes, figuring those places must be getting hit more often.
Distribution of the sample mean
A machine is set up so that the average content of juice per bottle equals µ.
The actual amounts per bottle are distributed around this average
with a standard deviation σ = 5cl. Consider a sample of 50 bottles.
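This slide builds on one fact: the sample mean x̄ is centred on µ, and its standard deviation shrinks as σ/√n. A minimal Python sketch for the 50-bottle sample, using only the σ = 5cl stated above:

```python
import math

sigma = 5.0  # population standard deviation in cl, from the slide
n = 50       # sample size

# Standard deviation of the sample mean (the standard error): sigma / sqrt(n)
se = sigma / math.sqrt(n)
print(f"SD(x̄) = {se:.3f} cl")
```

A single bottle varies by about 5cl, but the average of 50 bottles varies by only about 0.71cl.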
[Figure: distribution of the sample mean for n = 1, 2, 3 and 4. Possible values of the mean: 0 and 1 (n = 1); 0, 0.5 and 1 (n = 2); 0, 0.33, 0.67 and 1 (n = 3); 0, 0.25, 0.5, 0.75 and 1 (n = 4).]
Confidence intervals: Motivation
The latest poll (1,100 respondents) reveals that 54% of the population supports
the government's budgetary decisions. The margin of error is ± 3%.
Confidence intervals: Motivation (2)
Confidence intervals: The basic idea
Population (µ, σ, p) → sampling → Sample
Sample → summarising data → Estimates (x̄, s, p̂), with SE = SD(x̄)
Estimates → inference → Confidence interval
Example
A machine is set up such that the average content of juice per bottle equals µ.
A sample of 100 bottles yields an average content of 48cl.
Calculate a 90% and a 95% confidence interval for the average content.
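Here is a minimal Python sketch of the z-interval x̄ ± z·σ/√n. It assumes σ = 5cl carries over from the earlier slide, since this slide does not restate it. `statistics.NormalDist().inv_cdf` supplies the critical values.

```python
import math
from statistics import NormalDist

sigma = 5.0   # assumed: the sigma = 5cl given on the earlier slide
n = 100
x_bar = 48.0

se = sigma / math.sqrt(n)  # standard error of the mean: 5 / 10 = 0.5

def z_interval(level):
    """Two-sided z confidence interval for mu at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.645 for 90%, 1.960 for 95%
    return x_bar - z * se, x_bar + z * se

for level in (0.90, 0.95):
    lower, upper = z_interval(level)
    print(f"{level:.0%} CI: [{lower:.2f}, {upper:.2f}] cl")
```

Raising the confidence level from 90% to 95% widens the interval, because the critical value grows from about 1.645 to about 1.960.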
Sample size
One day there was a fire in a wastebasket in the dean’s office
and in rushed a physicist, a chemist and a statistician.
While they were doing this, the statistician was setting fires to
all the other wastebaskets in the office.
Sample size
What sample size is required to estimate the average contents
to within 0.5cl at the 95% confidence level?
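Solving z·σ/√n ≤ E for n gives n ≥ (zσ/E)². A minimal sketch, again assuming σ = 5cl from the earlier slide:

```python
import math
from statistics import NormalDist

sigma = 5.0    # assumed: sigma = 5cl from the earlier slide
margin = 0.5   # desired margin of error E in cl
level = 0.95

z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96
n_exact = (z * sigma / margin) ** 2             # solves z * sigma / sqrt(n) = E
n_required = math.ceil(n_exact)                 # always round up to a whole bottle
print(f"n = {n_exact:.1f} -> sample at least {n_required} bottles")
```

The requirement of 385 bottles comes from rounding 384.1 up. Rounding down would leave a margin slightly larger than 0.5cl.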
Hypothesis testing
Population (µ, σ, p) → sampling → Sample
Sample → summarising data → Estimates (x̄, s, p̂), with SE = SD(x̄)
Estimates → inference → Hypotheses about the population: H0, Ha
Carrying out a hypothesis test: The classical approach
Step 3. Significance level: how unlikely the observed value must be before we decide to reject H0.
Step 5. Take the sample and check whether the observed value justifies rejecting H0.
Example
A machine is set up such that the average content of juice per bottle equals µ.
A sample of 36 bottles yields an average content of 48.5cl.
Test the hypothesis that the average content per bottle is 50cl
at the 5% significance level.
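Here is a sketch of the classical two-tailed z-test for this example. It assumes σ = 5cl (from the earlier slide) and a Normal sampling distribution for x̄.

```python
import math
from statistics import NormalDist

mu_0 = 50.0    # value of mu under H0
sigma = 5.0    # assumed: sigma = 5cl from the earlier slide
n = 36
x_bar = 48.5
alpha = 0.05

se = sigma / math.sqrt(n)                        # 5 / 6, about 0.833
z = (x_bar - mu_0) / se                          # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96 for a two-tailed test
reject = abs(z) > z_crit
print(f"z = {z:.2f}, critical value ±{z_crit:.2f}, reject H0: {reject}")
```

The statistic z = −1.80 lies inside ±1.96, so H0 is not rejected at the 5% level.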
The impact of sample size
A machine is set up such that the average content of juice per bottle equals µ.
A sample of 100 bottles yields an average content of 48.8cl.
Test the hypothesis that the average content per bottle is 50cl
at the 5% significance level.
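To see the effect of n directly, a small hypothetical helper `two_tailed_z` (our name, not from the slides) runs the same test on both samples. It again assumes σ = 5cl.

```python
import math
from statistics import NormalDist

MU_0, SIGMA, ALPHA = 50.0, 5.0, 0.05   # H0 value; sigma assumed 5cl from the earlier slide

def two_tailed_z(x_bar, n):
    """Return (z, reject) for H0: mu = MU_0 against Ha: mu != MU_0."""
    se = SIGMA / math.sqrt(n)
    z = (x_bar - MU_0) / se
    return z, abs(z) > NormalDist().inv_cdf(1 - ALPHA / 2)

for x_bar, n in [(48.5, 36), (48.8, 100)]:
    z, reject = two_tailed_z(x_bar, n)
    print(f"n = {n:3d}, x̄ = {x_bar}: z = {z:.2f}, reject H0: {reject}")
```

Although 48.8cl is closer to 50cl than 48.5cl, the larger sample shrinks the standard error from about 0.83cl to 0.5cl. As a result, the second test rejects H0 (z = −2.40).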
One-tailed tests
A machine is set up such that the average content of juice per bottle equals µ.
A sample of 36 bottles yields an average content of 48.5cl.
Can you reject the hypothesis that the average content per bottle is less than or
equal to 45cl in favour of the alternative that it exceeds 45cl (5% significance level)?
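In a one-tailed test the whole 5% goes into the upper tail, so the critical value drops to about 1.645. A sketch, assuming σ = 5cl from the earlier slide:

```python
import math
from statistics import NormalDist

mu_0 = 45.0    # boundary value of H0: mu <= 45
sigma = 5.0    # assumed: sigma = 5cl from the earlier slide
n, x_bar, alpha = 36, 48.5, 0.05

z = (x_bar - mu_0) / (sigma / math.sqrt(n))   # (48.5 - 45) / (5 / 6)
z_crit = NormalDist().inv_cdf(1 - alpha)      # all of alpha in the upper tail: about 1.645
print(f"z = {z:.2f} vs critical value {z_crit:.3f}: reject H0: {z > z_crit}")
```

With z = 4.20, far beyond 1.645, H0 (µ ≤ 45cl) is rejected in favour of µ > 45cl.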
Exercise: Formulating H0
The manager claims that the average content of juice per bottle is less than 50cl.
The machine operator disagrees. A sample of 100 bottles yields an average
content of 49cl per bottle.
Does this sample allow the manager to claim he is right (5% significance level)?
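One reasonable formulation puts the manager's claim in the alternative, Ha: µ < 50. That makes H0: µ ≥ 50, and the test is lower-tailed. A sketch, assuming σ = 5cl from the earlier slide:

```python
import math
from statistics import NormalDist

# The claim to be supported goes into Ha: mu < 50, so H0: mu >= 50.
mu_0 = 50.0
sigma = 5.0    # assumed: sigma = 5cl from the earlier slide
n, x_bar, alpha = 100, 49.0, 0.05

z = (x_bar - mu_0) / (sigma / math.sqrt(n))   # (49 - 50) / 0.5 = -2.0
z_crit = NormalDist().inv_cdf(alpha)          # lower-tail critical value, about -1.645
manager_supported = z < z_crit
print(f"z = {z:.2f}, critical value {z_crit:.3f}, reject H0 (manager supported): {manager_supported}")
```

With z = −2.00 below about −1.645, H0 is rejected at the 5% level, so the sample supports the manager's claim.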
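[Figure: the P-value as a tail area. Two-tailed test: area P/2 in each tail, beyond x̄ and its mirror image about µ. One-tailed test: area P in the single tail beyond x̄.]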
A machine is set up such that the average content of juice per bottle equals µ.
A sample of 100 bottles yields an average content of 48.8cl.
Calculate the P-value for the hypothesis that the average content per bottle
equals 50cl.
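The P-value is the tail area beyond the observed z. A minimal sketch, assuming σ = 5cl from the earlier slide:

```python
import math
from statistics import NormalDist

mu_0, sigma = 50.0, 5.0   # H0 value; sigma assumed 5cl from the earlier slide
n, x_bar = 100, 48.8

z = (x_bar - mu_0) / (sigma / math.sqrt(n))        # about -2.4
p_one = 1 - NormalDist().cdf(abs(z))               # one tail, on the side of the observed deviation
p_two = 2 * p_one                                  # both tails: P/2 on each side
print(f"z = {z:.2f}, two-tailed P = {p_two:.4f}, one-tailed P = {p_one:.4f}")
```

The two-tailed P-value (about 0.016) is below 0.05, which is consistent with rejecting H0 for this sample at the 5% level.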
Unknown population standard deviation σ
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
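Below is a direct translation of the formula for s, checked against Python's `statistics.stdev`, which also uses the n − 1 divisor. The bottle contents are hypothetical, for illustration only.

```python
import math
import statistics

def sample_sd(xs):
    """Sample standard deviation s, dividing by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(xs) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))

# Hypothetical bottle contents in cl, for illustration only
contents = [48.2, 51.0, 47.5, 49.9, 50.3, 46.8]
print(round(sample_sd(contents), 3), round(statistics.stdev(contents), 3))
```

Dividing by n − 1 rather than n compensates for measuring deviations from x̄ instead of the unknown µ.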
Small samples
1. The population must follow a Normal distribution.
2a. σ known: the sample mean follows a Normal distribution.
2b. σ unknown: replace σ by its estimate s. The standardised mean (x̄ − µ)/(s/√n) then follows a t-distribution with n − 1 degrees of freedom.
[Figure: t-distribution compared with the standard Normal distribution N(0,1).]

Upper-tail probabilities P(T > t) of the t-distribution:

          Degrees of freedom
t-score   1      5      10     20     30     50     100
0.0       0.500  0.500  0.500  0.500  0.500  0.500  0.500
1.0       0.250  0.182  0.170  0.165  0.163  0.161  0.160
2.0       0.148  0.051  0.037  0.030  0.027  0.025  0.024
2.5       0.121  0.027  0.016  0.011  0.009  0.008  0.007
3.0       0.102  0.015  0.007  0.004  0.003  0.002  0.002
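The table values can be reproduced numerically. Here is a minimal standard-library sketch that integrates the t density with Simpson's rule. That integration method is our own choice; in practice a library routine such as SciPy's `scipy.stats.t.sf` computes this directly. The function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the area on [0, t] (Simpson's rule, even steps)."""
    if t < 0:
        raise ValueError("this sketch only handles t >= 0")
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

for t in (0.0, 1.0, 2.0, 2.5, 3.0):
    row = "  ".join(f"{t_upper_tail(t, df):.3f}" for df in (1, 5, 10, 20, 30, 50, 100))
    print(f"{t:<6}{row}")
```

As the degrees of freedom grow, each row approaches the corresponding N(0,1) tail probability, which is why the 100 column is close to the Normal values.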
Standard deviation of p̂: $SD(\hat{p}) = \sqrt{p(1-p)/n}$
Distribution of p̂: approximately Normal for large n
Newspaper reports
The latest poll (1,100 respondents) reveals that 54% of the population supports
the government's budgetary decisions. The margin of error is ± 3%.
Observed value p̂ = 0.54
Standard deviation of p̂: $\sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{0.54 \times 0.46 / 1100} \approx 0.015$
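This sketch reproduces the reported ± 3% margin. It assumes the margin refers to a 95% confidence level, which the report does not state.

```python
import math
from statistics import NormalDist

n = 1100        # respondents
p_hat = 0.54    # observed proportion supporting the decisions

se = math.sqrt(p_hat * (1 - p_hat) / n)        # estimated SD of p-hat
z = NormalDist().inv_cdf(0.975)                # 95% two-sided, about 1.96
margin = z * se
print(f"SD(p̂) ≈ {se:.4f}, 95% margin of error ≈ ±{margin:.1%}")
```

The result, about ±2.9%, rounds to the reported ± 3%.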