Notes 6

22S:101 Biostatistics: J. Huang
Chapter 9: Confidence Intervals
• Statistical Estimation
Point Estimation
Interval Estimation
• Confidence Intervals
Two-sided Confidence Intervals
One-sided Confidence Intervals
• Student’s t Distribution
Statistical Estimation
Point Estimation: using the data to calculate a single estimate of the
parameter of interest. For example, we often use the sample mean x̄
to estimate the population mean µ.
Interval Estimation: provides a range of values (an interval) that may
contain the unknown parameter (such as the population mean µ).
Confidence Intervals: an interval that contains the unknown
parameter (such as the population mean µ) with a specified degree of
confidence.
Example: Consider the distribution of serum cholesterol levels
for all males in the US who are hypertensive and who smoke. This
distribution has an unknown mean µ and a standard deviation of 46
mg/100 ml. Suppose we draw a random sample of 12 individuals
from this population and find that the mean cholesterol level is
x̄ = 217 mg/100 ml.
x̄ = 217 mg/100 ml is a point estimate of the unknown mean
cholesterol level µ in the population.
However, because of sampling variability, it is important to
construct an interval estimate of µ that accounts for this
variability. A 95% confidence interval for µ is
\[
\left( 217 - 1.96 \frac{46}{\sqrt{12}},\ 217 + 1.96 \frac{46}{\sqrt{12}} \right),
\]
or
(191, 243).
A 99% confidence interval for µ is
\[
\left( 217 - 2.58 \frac{46}{\sqrt{12}},\ 217 + 2.58 \frac{46}{\sqrt{12}} \right),
\]
or
(183, 251).
Confidence Intervals
Under the normality assumption,
\[
P\left( \bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}} \right) = 0.95.
\]
In general, by the CLT, for a reasonably large sample size n, the above
equation is still approximately true. Thus a 95% confidence interval
for µ when σ is known is
\[
\left( \bar{x} - 1.96 \frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96 \frac{\sigma}{\sqrt{n}} \right).
\]
Let z_{α/2} be the value that cuts off an area of α/2 in the upper tail of
the standard normal distribution. A 1 − α confidence interval for the
population mean µ is
\[
\left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right).
\]
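The general formula above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name z_interval is an illustrative choice, not something from the notes. It recomputes the cholesterol intervals, obtaining z_{α/2} from the standard normal quantile function instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """Two-sided confidence interval for mu when sigma is known."""
    alpha = 1 - conf
    # z_{alpha/2}: the value cutting off an area of alpha/2 in the
    # upper tail of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Cholesterol example from the notes: xbar = 217, sigma = 46, n = 12.
lo95, hi95 = z_interval(217, 46, 12, conf=0.95)
lo99, hi99 = z_interval(217, 46, 12, conf=0.99)
print(round(lo95), round(hi95))  # 191 243
print(round(lo99), round(hi99))  # 183 251
```

The quantile function returns about 1.960 and 2.576, so the rounded limits agree with the table-based values 1.96 and 2.58 used in the notes.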
Confidence Intervals: What do they mean?
In repeated sampling from a normally distributed population with a
known standard deviation, 100(1 − α) percent of all intervals of the
form
\[
\left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)
\]
will in the long run cover the population mean µ.
See the simulations in R.
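The R simulations referred to here are not reproduced in these notes. As a rough stand-in, the sketch below (Python standard library only) illustrates the repeated-sampling interpretation: it draws many samples from a normal population with known σ and counts how often the interval covers µ. The function name coverage, the true mean µ = 200, the number of repetitions, and the seed are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, conf=0.95, reps=10_000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sigma / sqrt(n)  # same for every sample, since sigma is known
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and compute its mean.
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


# Hypothetical population: mu = 200 (assumed), sigma = 46, samples of size 12.
rate = coverage(mu=200, sigma=46, n=12)
print(rate)  # should be close to 0.95
```

Any single interval either covers µ or does not; the 95% refers to the long-run proportion of intervals that do.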

Confidence Intervals
In general, a confidence interval for an unknown quantity is
point estimate ± (reliability coefficient) × (standard error).
Sometimes, we call
margin of error = (reliability coefficient) × (standard error)
= half of the length of the confidence interval.
Sample size calculation based on specified length of CI
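In the cholesterol level example, the 95% confidence interval is
(191, 243). Its length is 243 − 191 = 52. How large a sample would
we need to reduce its length to 20?
Recall that the 95% confidence interval is
\[
\left( 217 - 1.96 \frac{46}{\sqrt{n}},\ 217 + 1.96 \frac{46}{\sqrt{n}} \right).
\]
The length of this confidence interval is 2 × 1.96 × 46/√n. So to find
the required sample size n, we can solve the equation
\[
2 \times 1.96 \frac{46}{\sqrt{n}} = 20.
\]
We find
\[
n = \left( \frac{1.96 \times 46}{10} \right)^2 = 81.3,
\]
which we round up to n = 82, since the sample size must be a whole
number and rounding down would give an interval longer than 20.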
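The same calculation can be written as a small function. This is a sketch under the assumptions of the example (σ known, two-sided interval); the name sample_size_for_length is hypothetical. The sample size is always rounded up so that the interval is no longer than requested.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_length(sigma, length, conf=0.95, z=None):
    """Smallest n for which the z-interval has at most the given length.

    Solves 2 * z * sigma / sqrt(n) = length for n, then rounds up.
    Pass z explicitly to use a rounded table value such as 1.96.
    """
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n_exact = (2 * z * sigma / length) ** 2
    return n_exact, ceil(n_exact)


# Cholesterol example: sigma = 46, target length 20, z = 1.96 as in the notes.
n_exact, n_required = sample_size_for_length(sigma=46, length=20, z=1.96)
print(round(n_exact, 1), n_required)  # 81.3 82
```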
One-sided confidence interval

Sometimes, we are interested in an upper limit for the population mean
µ or a lower limit for µ. In such cases, one-sided confidence intervals
are appropriate.
Example: Consider the distribution of hemoglobin levels for the
population of children under 6 who have been exposed to high levels
of lead. Suppose that this distribution has sd σ = 0.85 g/100 ml.
Because children who have lead poisoning tend to have much lower
levels of hemoglobin than children who do not, we are interested in
an upper confidence limit for µ, the mean of the hemoglobin levels in
this population.
Suppose that we have a random sample of 74 children from this
population. The sample mean is x̄ = 10.6 g/100 ml. We construct a 95%
upper confidence limit. The idea is to find c such that
\[
P\left( \mu \le \bar{X} + c \frac{\sigma}{\sqrt{n}} \right) = 0.95.
\]
That is,
\[
P\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \ge -c \right) = 0.95.
\]
Since (X̄ − µ)/(σ/√n) has a standard normal distribution, c is the
value that cuts off an area of 0.05 in the upper tail. Thus c = 1.645.
The upper confidence limit is
\[
10.6 + 1.645 \times \frac{0.85}{\sqrt{74}} = 10.8.
\]
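As a quick numerical check of the one-sided limit, here is a minimal sketch using the Python standard library. The function name upper_limit is an illustrative choice. Note that the one-sided limit uses the 0.95 quantile of the standard normal distribution (about 1.645), not the 0.975 quantile used for a two-sided 95% interval.

```python
from math import sqrt
from statistics import NormalDist


def upper_limit(xbar, sigma, n, conf=0.95):
    """One-sided upper confidence limit for mu when sigma is known."""
    # c cuts off an area of 1 - conf in the upper tail of N(0, 1).
    c = NormalDist().inv_cdf(conf)
    return xbar + c * sigma / sqrt(n)


# Hemoglobin example: xbar = 10.6, sigma = 0.85, n = 74.
ucl = upper_limit(10.6, 0.85, 74)
print(round(ucl, 1))  # 10.8
```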
Student’s t-distribution
So far we have assumed that σ is known. However, in reality, both µ
and σ are usually unknown. All we have is the data. Let x_1, . . . , x_n
be the observations. Let the sample mean and sample variance be
\[
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2.
\]
The confidence intervals can be constructed based on the following
t-statistic:
\[
T = \frac{\bar{x} - \mu}{s/\sqrt{n}}.
\]
We can compare the t-statistic with the z-statistic:
\[
Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}.
\]
The difference is
• In Z, we use σ (when σ is known).
• In T, we use s (when σ is unknown).
Suppose the data is from the normal distribution N(µ, σ²). Then
T has a t-distribution with n − 1 degrees of freedom. This is often
denoted as
\[
T \sim t_{n-1}.
\]
This result was first obtained by W. S. Gosset in the paper “The
Probable Error of a Mean,” Biometrika, 6 (1908), 1-25. Gosset used
the pseudonym “Student”, so this distribution is called Student’s t
distribution, or t-distribution for short.
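To make the definitions of the sample mean, the sample standard deviation, and the t-statistic concrete, the sketch below computes T for a small data set. The data values and the candidate mean are hypothetical, chosen only so the arithmetic is easy to follow, and the function name t_statistic is an illustrative choice.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(data, mu):
    """T = (xbar - mu) / (s / sqrt(n)) for a candidate population mean mu."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation, with divisor n - 1
    return (xbar - mu) / (s / sqrt(n))


# Hypothetical data: xbar = 8, s^2 = 40 / 4 = 10, n = 5.
sample = [4.0, 6.0, 8.0, 10.0, 12.0]
t_stat = t_statistic(sample, mu=6.0)
print(round(t_stat, 4))  # 1.4142
```

Here s/√n = √10/√5 = √2, so T = (8 − 6)/√2 = √2. Under normality this value would be compared with a t-distribution with n − 1 = 4 degrees of freedom.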
Example: A sample of 16 ten-year-old girls gave a mean weight
of 71.5 pounds and a standard deviation of 12 pounds. Assuming normality,
find the 90, 95, and 99 percent confidence intervals for the population
mean weight µ.
Let t_{α/2}(n − 1) be the value that cuts off an area of α/2 in the upper
tail of a t-distribution with n − 1 degrees of freedom. The general form
of the confidence interval based on the t-distribution is
\[
\left( \bar{x} - t_{\alpha/2}(n-1) \frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}(n-1) \frac{s}{\sqrt{n}} \right).
\]
Here n = 16, so √n = 4 and there are 15 degrees of freedom. The 90, 95,
and 99 percent confidence intervals are
\[
\left( 71.5 - 1.75 \times \frac{12}{4},\ 71.5 + 1.75 \times \frac{12}{4} \right),
\]
\[
\left( 71.5 - 2.13 \times \frac{12}{4},\ 71.5 + 2.13 \times \frac{12}{4} \right),
\]
\[
\left( 71.5 - 2.95 \times \frac{12}{4},\ 71.5 + 2.95 \times \frac{12}{4} \right),
\]
or
(66.25, 76.75),
(65.11, 77.89),
(62.65, 80.35),
respectively.
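The three intervals can be reproduced with a few lines of Python. The standard library has no t quantile function, so this sketch hard-codes the critical values for 15 degrees of freedom exactly as they appear in the notes (1.75, 2.13, 2.95, rounded table values); the names T_CRIT_15 and t_interval are illustrative.

```python
from math import sqrt

# Upper-tail critical values t_{alpha/2}(15) as used in the notes
# (rounded values from a t table), keyed by confidence level.
T_CRIT_15 = {0.90: 1.75, 0.95: 2.13, 0.99: 2.95}


def t_interval(xbar, s, n, t_crit):
    """Two-sided confidence interval for mu based on the t-distribution."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


# Weight example: xbar = 71.5, s = 12, n = 16.
intervals = {conf: t_interval(71.5, 12, 16, t) for conf, t in T_CRIT_15.items()}
for conf, (lo, hi) in intervals.items():
    print(f"{conf:.0%}: ({lo:.2f}, {hi:.2f})")
```

The intervals widen as the confidence level increases, because a larger critical value is needed to cover µ more often.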
Let X̄ be the sample mean and S² be the sample variance of a random
sample from a N(µ, σ²) distribution. Denote
\[
T = \frac{\bar{X} - \mu}{S/\sqrt{n}}.
\]
Then
\[
T \sim t_{n-1}.
\]
Example: Lloyd and Mailloux [1988, Analysis of S-100 Protein
Positive Folliculo-Stellate Cells in Rat Pituitary Tissues, American
Journal of Pathology, 133, 338-348] reported the following data
on the pituitary gland weight in a sample of four Wistar Furth rats:
mean = 9.0 mg, standard error of the mean = 1.0 mg.
(a) What was the sample standard deviation?
(b) Construct a 95% confidence interval for the mean pituitary
weight of a population of similar rats.
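One way to work through this exercise is sketched below; it is not a solution taken from the notes. It uses the relation standard error = s/√n for part (a) and the t-interval for part (b). The critical value t_{0.025}(3) = 3.182 is not given in the notes; it is taken from a standard t table and hard-coded here.

```python
from math import sqrt

n = 4        # number of rats
xbar = 9.0   # sample mean, mg
se = 1.0     # standard error of the mean, mg

# (a) The standard error is s / sqrt(n), so s = se * sqrt(n).
s = se * sqrt(n)

# (b) 95% interval with n - 1 = 3 degrees of freedom.
t_crit = 3.182  # t_{0.025}(3) from a standard t table (not from the notes)
lo, hi = xbar - t_crit * se, xbar + t_crit * se

print(s)                           # 2.0
print(round(lo, 2), round(hi, 2))  # 5.82 12.18
```

With only four observations the critical value is much larger than 1.96, so the interval is considerably wider than a normal-based interval would be.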
