Notes 6


22S:101 Biostatistics: J. Huang

Chapter 9: Confidence Intervals

• Statistical Estimation
    Point Estimation
    Interval Estimation
• Confidence Intervals
    Two-sided Confidence Intervals
    One-sided Confidence Intervals
• Student’s t Distribution

Statistical Estimation

Point Estimation: using the data to calculate a single estimate of the
parameter of interest. For example, we often use the sample mean x̄
to estimate the population mean µ.

Interval Estimation: provides a range of values (an interval) that may
contain the unknown parameter (such as the population mean µ).

Confidence Intervals: an interval that contains the unknown
parameter (such as the population mean µ) with a certain degree of
confidence.

Example: Consider the distribution of serum cholesterol levels
for all males in the US who are hypertensive and who smoke. This
distribution has an unknown mean µ and a standard deviation of 46
mg/100 ml. Suppose we draw a random sample of 12 individuals
from this population and find that the mean cholesterol level is
x̄ = 217 mg/100 ml.

x̄ = 217 mg/100 ml is a point estimate of the unknown mean
cholesterol level µ in the population.

However, a point estimate alone says nothing about sampling
variability, so it is important to also construct an interval estimate
of µ. A 95% confidence interval for µ is

$$\left(217 - 1.96\,\frac{46}{\sqrt{12}},\; 217 + 1.96\,\frac{46}{\sqrt{12}}\right),$$

or
(191, 243).

A 99% confidence interval for µ is

$$\left(217 - 2.58\,\frac{46}{\sqrt{12}},\; 217 + 2.58\,\frac{46}{\sqrt{12}}\right),$$

or
(183, 251).
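
As a quick arithmetic check, the two intervals in this example can be reproduced in a few lines of Python. This is only a sketch: the function name `z_interval` is made up for illustration, and the reliability coefficients 1.96 and 2.58 are taken directly from the text rather than computed.

```python
import math

def z_interval(xbar, sigma, n, z):
    """Return (lower, upper) for xbar ± z * sigma / sqrt(n), with sigma known.

    `z_interval` is a hypothetical helper name, not something from the notes.
    """
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Cholesterol example from the text: xbar = 217, sigma = 46, n = 12
ci95 = z_interval(217, 46, 12, 1.96)
ci99 = z_interval(217, 46, 12, 2.58)

print([round(v) for v in ci95])  # [191, 243]
print([round(v) for v in ci99])  # [183, 251]
```

The 99% interval is wider than the 95% interval because a larger coefficient multiplies the same standard error, 46/√12.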

Confidence Intervals

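Under the normality assumption,

$$P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95.$$

In general, by the central limit theorem (CLT), for a reasonably large
sample size n the above equation is still approximately true. Thus a
95% confidence interval for µ when σ is known is

$$\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\; \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right).$$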

Let $z_{\alpha/2}$ be the value that cuts off an area of α/2 in the upper tail of
the standard normal distribution. A 1 − α confidence interval for the
population mean µ is

$$\left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right).$$
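
The general formula can be sketched in Python using the standard library's normal distribution. The function name `z_confidence_interval` is an illustrative assumption. The only substantive step is that the value cutting off an upper-tail area of α/2 is the (1 − α/2) quantile of the standard normal.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, level=0.95):
    """Two-sided (1 - alpha) interval for the mean when sigma is known.

    Hypothetical helper for illustration; `level` is 1 - alpha.
    """
    alpha = 1.0 - level
    # Upper-tail area alpha/2 corresponds to the (1 - alpha/2) quantile.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Critical values for the usual confidence levels
for level in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    print(level, round(z, 3))

# Cholesterol example: xbar = 217, sigma = 46, n = 12
lo, hi = z_confidence_interval(217, 46, 12, level=0.95)
print(round(lo), round(hi))  # 191 243
```

The printed critical values are approximately 1.645, 1.960, and 2.576. The text rounds the last two to 1.96 and 2.58.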

Confidence Intervals: What do they mean?

In repeated sampling from a normally distributed population with a
known standard deviation, 100(1 − α) percent of all intervals of the
form

$$\left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$$

will in the long run cover the population mean µ.

See the simulations in R.
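
The R simulations mentioned here are not reproduced in these notes. A rough Python sketch of the same idea is given below, under stated assumptions: the "true" mean of 217 is chosen only for the simulation (in the real example µ is unknown), and the function name `coverage`, the number of trials, and the seed are arbitrary.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, level=0.95, trials=2000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu.

    Hypothetical helper: draws `trials` samples of size n from N(mu, sigma^2)
    and builds xbar ± z_{alpha/2} * sigma / sqrt(n) for each one.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

# Assumed true mean 217, sigma = 46, n = 12 (values borrowed from the example)
rate = coverage(mu=217, sigma=46, n=12)
print(rate)  # should be close to 0.95
```

Each individual interval either covers µ or it does not. The 95% describes the long-run proportion of intervals that do.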



Confidence Intervals

In general, a confidence interval for an unknown quantity has the form

point estimate ± (reliability coefficient) × (standard error).

Sometimes, we call

margin of error = (reliability coefficient) × (standard error)
                = half of the length of the confidence interval.
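
As a check against the cholesterol example, where the 95% interval was (191, 243), the margin of error works out as follows (a worked calculation using only numbers already given in the example):

```latex
\text{margin of error}
  = 1.96 \times \frac{46}{\sqrt{12}}
  \approx 26.0
  = \frac{243 - 191}{2}.
```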

Sample size calculation based on specified length of CI

In the cholesterol level example, the 95% confidence interval is
(191, 243). Its length is 243 − 191 = 52. How large a sample would
we need to reduce its length to 20?

Recall that the 95% confidence interval for a sample of size n is

$$\left(217 - 1.96\,\frac{46}{\sqrt{n}},\; 217 + 1.96\,\frac{46}{\sqrt{n}}\right).$$

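The length of this confidence interval is 2 × 1.96 × 46/√n. So to find
the required sample size n, we can solve the equation

$$2 \times 1.96\,\frac{46}{\sqrt{n}} = 20.$$

We find

$$n = \left(\frac{1.96 \times 46}{10}\right)^2 = 81.3,$$

which we round up to n = 82, because n must be a whole number and
rounding down would give an interval slightly longer than 20.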
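
The sample size calculation can be sketched as a small Python function. The name `sample_size_for_length` is an assumption made for illustration. The rounding is always upward, so that the resulting interval is no longer than requested.

```python
import math

def sample_size_for_length(sigma, length, z=1.96):
    """Return (exact solution, smallest integer n) for a z-interval of given length.

    Solves 2 * z * sigma / sqrt(n) = length for n. Hypothetical helper name.
    """
    n_exact = (2.0 * z * sigma / length) ** 2
    return n_exact, math.ceil(n_exact)

# Cholesterol example: sigma = 46, target length 20, 95% confidence
n_exact, n = sample_size_for_length(sigma=46, length=20)
print(round(n_exact, 1), n)  # 81.3 82
```

Because the length is proportional to 1/√n, halving the length of an interval requires roughly four times as many observations.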

One-sided confidence interval


Sometimes, we are interested in an upper limit for the population mean
µ or a lower limit for µ. In such cases, one-sided confidence intervals
are appropriate.
Example: Consider the distribution of hemoglobin levels for the
population of children under 6 who have been exposed to high levels
of lead. Suppose that this distribution has standard deviation
σ = 0.85 g/100 ml. Because children who have lead poisoning tend to
have much lower levels of hemoglobin than children who do not, we are
interested in an upper confidence limit for µ, the mean of the
hemoglobin levels in this population.
Suppose that we have a random sample of 74 children from this
population. The sample mean is x̄ = 10.6 g/100 ml. We construct a 95%
upper confidence limit. The idea is to find c such that

$$P\left(\mu \le \bar{X} + c\,\frac{\sigma}{\sqrt{n}}\right) = 0.95.$$

That is,

$$P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \ge -c\right) = 0.95.$$

Since the standardized mean is standard normal, c is the value that
cuts off an area of 0.05 in the upper tail. Thus c = 1.645. The upper
confidence limit is

$$10.6 + 1.645 \times \frac{0.85}{\sqrt{74}} = 10.8.$$
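
A minimal Python sketch of the one-sided calculation follows. The function name `upper_confidence_limit` is hypothetical. The point it illustrates is that a one-sided limit puts all of α in one tail, so the coefficient is the 0.95 quantile (about 1.645) rather than the 0.975 quantile (about 1.96).

```python
import math
from statistics import NormalDist

def upper_confidence_limit(xbar, sigma, n, level=0.95):
    """One-sided upper confidence limit for the mean, with sigma known.

    Hypothetical helper: the whole of alpha = 1 - level sits in one tail,
    so the coefficient is the `level` quantile of the standard normal.
    """
    c = NormalDist().inv_cdf(level)
    return xbar + c * sigma / math.sqrt(n)

# Hemoglobin example: xbar = 10.6, sigma = 0.85, n = 74
limit = upper_confidence_limit(10.6, 0.85, 74)
print(round(limit, 1))  # 10.8
```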

Student’s t-distribution

So far we have assumed that σ is known. However, in reality, both µ
and σ are usually unknown. All we have is the data. Let $x_1, \ldots, x_n$
be the observations. Let the sample mean and sample variance be

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2.$$

The confidence intervals can be constructed based on the following
t-statistic:

$$T = \frac{\bar{x} - \mu}{s/\sqrt{n}}.$$

We can compare the t-statistic with the z-statistic:

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}.$$

The difference is
• In Z, we use σ (when σ is known).
• In T, we use s (when σ is unknown).
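
To make the comparison concrete, here is a small Python sketch that computes both statistics for the same data. The data values, the hypothesized mean, and the "known" σ are all made up for illustration, and the function names are hypothetical. `statistics.stdev` uses the n − 1 divisor, matching the definition of s above.

```python
import math
from statistics import mean, stdev

def t_statistic(data, mu):
    """(xbar - mu) / (s / sqrt(n)), with s estimated from the data."""
    n = len(data)
    return (mean(data) - mu) / (stdev(data) / math.sqrt(n))

def z_statistic(data, mu, sigma):
    """(xbar - mu) / (sigma / sqrt(n)), with sigma assumed known."""
    n = len(data)
    return (mean(data) - mu) / (sigma / math.sqrt(n))

# Hypothetical observations, not from the notes
data = [4.0, 6.0, 8.0, 10.0]
t_value = t_statistic(data, mu=6.0)
z_value = z_statistic(data, mu=6.0, sigma=2.0)
print(round(t_value, 4), round(z_value, 4))
```

The two values differ only because s (estimated from these four points) is not equal to the assumed σ.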

Suppose the data are from the normal distribution N(µ, σ²). Then
T has a t-distribution with n − 1 degrees of freedom. This is often
denoted as

$$T \sim t_{n-1}.$$

This result was first obtained by W. S. Gosset in the paper “The
Probable Error of a Mean,” Biometrika, 6 (1908), 1-25. Gosset used
the pseudonym “Student”, so this distribution is called Student’s t
distribution, or the t-distribution for short.

Example: A sample of 16 ten-year-old girls gave a mean weight
of 71.5 pounds and a standard deviation of 12 pounds. Assuming normality,
find the 90, 95, and 99 percent confidence intervals for the population
mean weight µ.

Let $t_{\alpha/2}(n-1)$ be the value that cuts off an upper-tail area of α/2 in a
t-distribution with n − 1 degrees of freedom. The general form of the
confidence interval based on the t-distribution is

$$\left(\bar{x} - t_{\alpha/2}(n-1)\,\frac{s}{\sqrt{n}},\; \bar{x} + t_{\alpha/2}(n-1)\,\frac{s}{\sqrt{n}}\right).$$

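With n = 16 (so √n = 4 and n − 1 = 15 degrees of freedom), the 90, 95,
and 99 percent confidence intervals are

$$\left(71.5 - 1.75\,\frac{12}{4},\; 71.5 + 1.75\,\frac{12}{4}\right),$$

$$\left(71.5 - 2.13\,\frac{12}{4},\; 71.5 + 2.13\,\frac{12}{4}\right),$$

$$\left(71.5 - 2.95\,\frac{12}{4},\; 71.5 + 2.95\,\frac{12}{4}\right),$$

or

(66.25, 76.75),

(65.11, 77.89),

(62.65, 80.35),

respectively.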
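
These three intervals can be checked with a short Python sketch. The function name `t_interval` is hypothetical, and the critical values 1.75, 2.13, and 2.95 are simply the table values quoted in the example (15 degrees of freedom). They are entered by hand because the Python standard library has no t-distribution quantile function.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar ± t_crit * s / sqrt(n).

    Hypothetical helper; t_crit must be supplied from a t table.
    """
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Critical values for 15 degrees of freedom, as quoted in the example
T_TABLE_15_DF = {0.90: 1.75, 0.95: 2.13, 0.99: 2.95}

# Weight example: xbar = 71.5, s = 12, n = 16
intervals = {
    level: t_interval(71.5, 12, 16, t_crit)
    for level, t_crit in T_TABLE_15_DF.items()
}
for level, (lo, hi) in intervals.items():
    print(level, round(lo, 2), round(hi, 2))
```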

Let X̄ be the sample mean and S² be the sample variance of a random
sample from a N(µ, σ²) distribution. Denote

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}.$$

Then

$$T \sim t_{n-1}.$$


Example: Lloyd and Mailloux [1988, Analysis of S-100 Protein
Positive Folliculo-Stellate Cells in Rat Pituitary Tissues, American
Journal of Pathology, 133, 338-348] reported the following data
on the pituitary gland weight in a sample of four Wistar Furth rats:
mean = 9.0 mg, standard error of the mean = 1.0 mg.

(a) What was the sample standard deviation?

(b) Construct a 95% confidence interval for the mean pituitary
weight of a population of similar rats.
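
One possible way to work this example is sketched below in Python. It relies on two things not stated in the exercise itself: the relation standard error = s/√n, and the t-table value 3.182 for an upper-tail area of 0.025 with 3 degrees of freedom, which is entered by hand. Normality of the weights is also assumed, as in the previous example.

```python
import math

n = 4       # sample size from the example
xbar = 9.0  # sample mean, mg
sem = 1.0   # standard error of the mean, mg

# (a) Standard error = s / sqrt(n), so s = standard error * sqrt(n)
s = sem * math.sqrt(n)

# (b) 95% t-interval with n - 1 = 3 degrees of freedom.
# 3.182 is a standard t-table value (assumption: not given in the notes).
T_CRIT_3_DF = 3.182
lower = xbar - T_CRIT_3_DF * sem
upper = xbar + T_CRIT_3_DF * sem

print(s)                                 # 2.0
print(round(lower, 1), round(upper, 1))  # 5.8 12.2
```

With only four observations the t coefficient is much larger than 1.96, so the interval is considerably wider than a z-interval built from the same standard error.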
