
STA408: Statistics for Science and Engineering

Chapter 2: Estimation

2.1 Sampling Distributions

Populations and Samples


 A population consists of all subjects that are being studied.
 A sample is a subset of a population.

Population Parameters and Sample Statistics


 A numerical measure calculated for a population data set is called a population parameter.
 A summary measure calculated for a sample data set is called a sample statistic.

Population Distribution and Sampling Distribution


 The population distribution is the probability distribution of the population data.
 The probability distribution of a sample statistic is called its sampling distribution.

Note: A sample statistic that is used to estimate a population parameter is called an estimator.

Example 1
Suppose there are only five students in the STA 408 class and their Test 1 scores are as
follows.
70 78 80 80 95
(a) Find the population distribution for the scores of the students.
(b) Find the mean and standard deviation of the data.

(a) Let 𝑥 be the score of a student. The population probability distribution is:

𝑥 𝑃(𝑥)
70
78
80
95

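(b) Mean,
$$\mu = \frac{\sum x}{N} = \underline{\qquad\qquad}$$

Standard deviation,
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \quad \text{or} \quad \sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}}$$

$$\sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}} = \underline{\qquad\qquad}$$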

Sampling Distribution of 𝒙̅
Example 2
Refer to the data in Example 1.
(a) Find all possible samples of three scores each that can be selected, without replacement.
(b) Find the mean of each sample.
(c) Find the sampling distribution of 𝑥̅ .

(a) Let 𝑀1 = 70, 𝑀2 = 78, 𝑀3 = 80, 𝑀4 = 80 and 𝑀5 = 95.

Possible samples                    (b) Mean, 𝑥̅

{𝑀1, 𝑀2, 𝑀3}    {70, 78, 80}    (70 + 78 + 80)/3 = 228/3 = 76
{𝑀1, 𝑀2, 𝑀4}    {70, 78, 80}    (70 + 78 + 80)/3 = 228/3 = 76
{𝑀1, 𝑀2, 𝑀5}    {70, 78, 95}    (70 + 78 + 95)/3 = 243/3 = 81
{𝑀1, 𝑀3, 𝑀4}    {70, 80, 80}
{𝑀1, 𝑀3, 𝑀5}    {70, 80, 95}
{𝑀1, 𝑀4, 𝑀5}    {70, 80, 95}
{𝑀2, 𝑀3, 𝑀4}    {78, 80, 80}
{𝑀2, 𝑀3, 𝑀5}
{𝑀2, 𝑀4, 𝑀5}
{𝑀3, 𝑀4, 𝑀5}

(c) Sampling distribution of 𝑥̅ :

𝑥̅ 76 76.67 79.33 ∑ 𝑃(𝑋̅ = 𝑥̅ )

𝑃(𝑋̅ = 𝑥̅ ) 0.2 0.1 0.1
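The table in Example 2 can be cross-checked by enumerating every sample. The following is a minimal Python sketch, not part of the original notes; the variable names (`scores`, `samples`, `sampling_dist`) are illustrative. It treats the two scores of 80 as belonging to different students, as the notes do with 𝑀3 and 𝑀4.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

scores = [70, 78, 80, 80, 95]   # population from Example 1 (M1..M5)
n = 3                           # sample size, drawn without replacement

# combinations() picks by position, so the two 80s (M3 and M4)
# count as distinct students and we get C(5, 3) = 10 samples.
samples = list(combinations(scores, n))
means = [Fraction(sum(s), n) for s in samples]

# Every sample is equally likely, so P(x-bar) = (number of samples) / 10.
counts = Counter(means)
sampling_dist = {m: Fraction(c, len(samples)) for m, c in counts.items()}

for m in sorted(sampling_dist):
    print(f"x-bar = {float(m):6.2f}   P = {float(sampling_dist[m]):.1f}")

# The mean of the sampling distribution equals the population mean.
mu = Fraction(sum(scores), len(scores))
mu_xbar = sum(m * p for m, p in sampling_dist.items())
print("population mean:", float(mu), " mean of x-bar:", float(mu_xbar))
```

Using `Fraction` keeps the sample means exact, so equal means such as 228/3 are grouped together without rounding issues.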

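Mean and Standard Deviation of 𝒙̅

 The mean of the sampling distribution of 𝑥̅, denoted by 𝜇𝑥̅, is equal to the population mean 𝜇, i.e.,
$$\mu_{\bar{x}} = \mu$$

 The standard deviation of the sampling distribution of 𝑥̅, denoted by 𝜎𝑥̅, is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
where 𝜎 is the standard deviation of the population and 𝑛 is the sample size. This formula applies when
sampling is with replacement or when the sample is small relative to the population (commonly taken as
𝑛/𝑁 ≤ 0.05).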


Estimates and their notations


The population mean and variance are usually not known, therefore we estimate them as follows.
 To estimate population mean, 𝝁, we use the sample mean, 𝒙̅.
 To estimate population variance, 𝝈𝟐, we use the sample variance, 𝒔𝟐.

Note: The standard error of the mean is the standard deviation of the sample mean, $\sigma/\sqrt{n}$,
which is estimated by $s/\sqrt{n}$ when 𝜎 is unknown.

Example 3
The mean wage per hour for all 5000 employees who work at a large hotel is RM 27.50, and the standard
deviation is RM 3.70. Let 𝑥̅ be the mean wage per hour for a random sample of certain employees selected
from this company. Find the mean and standard deviation of 𝑥̅ for a sample size of
(a) 30, (b) 75, (c) 200.
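A short Python sketch of the calculation in Example 3 follows. It is an illustration rather than the worked solution from the notes, and the function name `mean_and_sd_of_xbar` is made up for this example. It assumes the formula 𝜎/√𝑛 is appropriate, which is commonly accepted when 𝑛/𝑁 ≤ 0.05; the ratio is printed so that assumption can be checked.

```python
import math

mu, sigma, N = 27.50, 3.70, 5000   # population values given in Example 3

def mean_and_sd_of_xbar(n):
    """Return (mean, standard deviation) of x-bar for samples of size n."""
    return mu, sigma / math.sqrt(n)

for n in (30, 75, 200):
    m, sd = mean_and_sd_of_xbar(n)
    print(f"n = {n:3d}: n/N = {n / N:.3f}, mean = {m:.2f}, sd = {sd:.4f}")
```

The mean of 𝑥̅ stays at the population mean for every sample size, while its standard deviation shrinks as 𝑛 grows.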

Sampling from a normally distributed population


If the population from which the samples are drawn is normally distributed with mean 𝜇 and standard
deviation 𝜎, then the sampling distribution of the sample mean, 𝑥̅ will be normally distributed with
mean 𝜇𝑥̅ and standard deviation 𝜎𝑥̅ , irrespective of the sample size.
The distribution of 𝑥̅ is
$$\bar{X} \sim N\left(\mu_{\bar{x}} = \mu,\ \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\right)$$

Sampling from a population that is not normally distributed


Most of the time the population from which the samples are selected is NOT normally distributed. In such
cases, the shape of the sampling distribution of 𝑥̅ is inferred from the Central Limit
Theorem.

Central Limit Theorem


For a large sample size, the sampling distribution of 𝑥̅ is approximately normal, irrespective of the
shape of the population distribution. The mean and standard deviation of the sampling distribution of 𝑥̅
are, respectively,
$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.$$
The sample size is usually considered to be large if 𝒏 ≥ 𝟑𝟎.


Population distributions and sampling distributions of 𝒙̅


Table 1: The normal and not normal population distributions together with their respective
sampling distributions of 𝑥̅ for different sample sizes, 𝑛.

Applications of the sampling distribution of sample mean, 𝒙̅


Example 4
Assume that the weights of all packages of a certain brand of cookies are normally distributed with a
mean of 32 grams and a standard deviation of 0.3 grams. Find the probability that the mean weight of a
random sample of 20 packages of this brand of cookies will be between 31.8 and 31.9 grams.
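As a numerical check on Example 4, the probability can be computed directly from the sampling distribution of 𝑥̅. This is a sketch using Python's standard `statistics.NormalDist`, not the table-based working the notes expect; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 32.0, 0.3, 20   # values given in Example 4

# The population is normal, so x-bar ~ N(mu, sigma / sqrt(n)) for any n.
xbar_dist = NormalDist(mu, sigma / math.sqrt(n))

# P(31.8 < x-bar < 31.9) as a difference of two cumulative probabilities.
p = xbar_dist.cdf(31.9) - xbar_dist.cdf(31.8)
print(f"sd of x-bar = {xbar_dist.stdev:.4f}")
print(f"P(31.8 < x-bar < 31.9) = {p:.4f}")
```

A table-based answer may differ slightly in the last decimal place because 𝑧 values are rounded to two decimals before the table lookup.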


Example 5
The actual weights, 𝑊 kilograms, of fertilizer in a 5 kg bag may be modelled by a normal random variable
with mean 5.25 kg and variance 0.25 kg². A random sample of four 5 kg bags is selected. Calculate the
probability that the mean weight of fertilizer of the four bags is less than 5.30 kg.

2.2 Estimation

Definitions
 The assignment of value(s) to a population parameter based on a value of the corresponding
statistic is called estimation.
 The value(s) assigned to a population parameter based on a value of a sample statistic is called an
estimate.
 The sample statistic that is used to estimate a population parameter is called an estimator.

Properties of a good estimator


 Unbiased
The expected value of the estimator is equal to the parameter being estimated.

 Consistent
As the sample size increases, the value of the estimator approaches the value of the parameter
being estimated.

 Relatively efficient
Of all the statistics that can be used to estimate a parameter, the relatively efficient estimator has
the smallest variance.

Estimation procedure
Step 1: Select a sample
Step 2: Collect required information from the members of the sample.
Step 3: Calculate the value(s) of the sample statistic(s).
Step 4: Assign value(s) to the corresponding population parameter(s).


 The value of a sample statistic that is used to estimate a population parameter is called a point
estimate.
 In interval estimation, an interval is constructed around the point estimate, and it is stated that
this interval is likely to contain the corresponding population parameter.
 Each interval is constructed with regard to a given confidence level and is called a confidence
interval. The confidence interval is given as
Point estimate ± margin of error.
 The confidence level associated with a confidence interval states how much confidence we have
that this interval contains the true population parameter. The confidence level is denoted by
(1 − 𝛼)100%
where 𝛼 is the significance level.
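Examples of point estimates

Mean
Population (parameter): $$\mu = \frac{\sum X}{N}$$
Sample (statistic / point estimate): $$\bar{x} = \frac{\sum x}{n}$$

Variance
Population (parameter): $$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$
Sample (statistic / point estimate): $$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$

Standard deviation
Population (parameter): $$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}$$
Sample (statistic / point estimate): $$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$

where 𝑋1, 𝑋2, 𝑋3, …, 𝑋𝑁 are the members of a population and 𝑥1, 𝑥2, 𝑥3, …, 𝑥𝑛 are the elements in a
sample.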

Example 6
Following are the 2009 earnings (in thousands of dollars) before taxes for all six employees of a small
company.
88.50 108.40 65.50 52.50 79.80 54.60
Calculate the mean and standard deviation of these data.


Example 7
Assume that the data given in Example 6 are the earnings for six employees of a large company. Calculate
the mean and standard deviation of those data.
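Examples 6 and 7 use the same six numbers but differ in whether they form a population (divide by 𝑁) or a sample (divide by 𝑛 − 1). The sketch below, which is not part of the original notes, contrasts the two using Python's `statistics` module; `pstdev` is the population formula and `stdev` is the sample formula.

```python
import statistics

# 2009 earnings in thousands of dollars (Example 6)
earnings = [88.50, 108.40, 65.50, 52.50, 79.80, 54.60]

mean = statistics.mean(earnings)

# Example 6: the six employees are the whole population, so divide by N.
sigma = statistics.pstdev(earnings)

# Example 7: the six employees are a sample from a large company,
# so divide by n - 1.
s = statistics.stdev(earnings)

print(f"mean = {mean:.2f}")
print(f"population sd (Example 6) = {sigma:.2f}")
print(f"sample sd     (Example 7) = {s:.2f}")
```

The mean is the same in both examples; only the standard deviation changes, and the sample version is always the larger of the two.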

Estimation of a population mean: 𝝈𝟐 known


Confidence interval of mean for a specific 𝛼 when $\sigma^2$ is known:
$$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$

For a 90% confidence interval, 𝛼 = 10% = 0.10 and $z_{\alpha/2} = z_{0.05}$ = _________________;

For a 95% confidence interval, 𝛼 = 5% = 0.05 and $z_{\alpha/2} = z_{0.025}$ = _________________;

For a 99% confidence interval, 𝛼 = 1% = 0.01 and $z_{\alpha/2} = z_{0.005}$ = _________________;

and the margin of error is $z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$.
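The critical values above are normally read from a standard normal table. As a cross-check, they can also be computed with the inverse normal CDF. The following is a small sketch; the helper names `z_critical` and `margin_of_error` are illustrative and do not come from the notes.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval, e.g. confidence=0.95."""
    alpha = 1 - confidence
    # z_{alpha/2} leaves an area of alpha/2 in the upper tail.
    return NormalDist().inv_cdf(1 - alpha / 2)

def margin_of_error(confidence, sigma, n):
    """Margin of error z_{alpha/2} * sigma / sqrt(n) when sigma is known."""
    return z_critical(confidence) * sigma / n ** 0.5

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

For instance, `margin_of_error(0.95, sigma, n)` gives the half-width of a 95% interval for any known `sigma` and sample size `n`.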

Confidence intervals

Figure 1: The 90% confidence intervals for 𝜇 constructed by 𝑥̅1 , 𝑥̅2 and 𝑥̅3 .


2.3 Interval Estimation of one Population Mean

(I) A sample (be it large or small) drawn from a normal distribution with a known 𝝈
Example 8
A publishing company has just published a new college textbook. Before the company decides the price
at which to sell this textbook, it wants to know the average price of all such textbooks in the market. The
research department at the company took a sample of 25 comparable textbooks and collected
information on their prices. This information produced a mean price of RM 145 for this sample. It is
known that the standard deviation of the prices of all such textbooks is RM 35 and the population of such
prices is normal.
(a) What is the point estimate of the mean price of all such college textbooks?
(b) Construct a 90% confidence interval for the mean price of all such textbooks and interpret the
interval.

Example 9
The following data represent a sample of assets (in millions of RM) of 10 companies in Selangor. Find the
90% confidence interval of the mean. Assume that the assets (in millions of RM) of all companies in
Selangor are approximately normally distributed and the standard deviation of the population is 21.154.
12.23 2.89 13.19 73.25 11.59 8.74 7.92 40.22 5.01 2.27


Below is the Minitab output for the data in Example 9. Note that the output reports a 95% confidence
interval, whereas the example asks for a 90% interval.
One-Sample Z: Assets_Value

The assumed standard deviation = 21.154

Variable N Mean StDev SE Mean 95% CI


Assets_Value 10 17.73 22.30 6.69 (4.62, 30.84)
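The Minitab figures can be reproduced by hand or with a few lines of code. The sketch below is an illustration under the assumptions stated in Example 9 (normal population, 𝜎 = 21.154 known); the function name `z_interval` is hypothetical. It computes both the 95% interval shown in the output and the 90% interval requested in the example.

```python
import math
import statistics
from statistics import NormalDist

# Assets in millions of RM (Example 9)
assets = [12.23, 2.89, 13.19, 73.25, 11.59, 8.74, 7.92, 40.22, 5.01, 2.27]
sigma = 21.154   # population standard deviation, given in the example

def z_interval(data, sigma, confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    xbar = statistics.mean(data)
    se = sigma / math.sqrt(len(data))                   # SE Mean in Minitab
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    return xbar - z * se, xbar + z * se

ci95 = z_interval(assets, sigma, 0.95)   # level used in the Minitab output
ci90 = z_interval(assets, sigma, 0.90)   # level asked for in Example 9
print("95% CI:", tuple(round(v, 2) for v in ci95))
print("90% CI:", tuple(round(v, 2) for v in ci90))
```

The 90% interval is narrower than the 95% interval because it uses a smaller critical value.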

(II) A large sample drawn from an unknown distribution with a known 𝝈


Example 10
A machine is regulated to dispense liquid into cartons in such a way that the amount of liquid dispensed
on all occasions is known to have a standard deviation of 20 ml.
(a) Find the 95% confidence limits for the mean amount of liquid dispensed if a random sample of
40 cartons had an average content of 266ml.
(b) Find the 99% confidence limits for the mean amount of liquid dispensed if a random sample of
40 cartons had an average content of 266ml.
(c) Find the 95% confidence limits for the mean amount of liquid dispensed if a random sample of
120 cartons had an average content of 266ml.


Note: The width of the confidence interval depends on the size of the margin of error, which depends
on the values of 𝑧, 𝜎 and 𝑛. However, the value of 𝜎 is beyond our control. Therefore, the width of
the confidence interval can be controlled either through the value of 𝑧 (which depends on 𝛼) or
through the size of the sample, 𝑛.
Confidence level and the width of confidence interval
- The higher the confidence level, the wider the confidence interval, and vice versa.
Sample size and the width of confidence interval
- The larger the sample, the narrower the confidence interval, and vice versa.
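The effect of confidence level and sample size on the width can be seen numerically with the three cases of Example 10 (𝜎 = 20 ml, 𝑥̅ = 266 ml). This is a minimal sketch; the helper `z_interval` and the dictionary names are illustrative only.

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Two-sided z interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5
    return xbar - margin, xbar + margin

xbar, sigma = 266, 20   # values given in Example 10

# part label -> (sample size, confidence level)
cases = {"a": (40, 0.95), "b": (40, 0.99), "c": (120, 0.95)}
widths = {}
for part, (n, level) in cases.items():
    lower, upper = z_interval(xbar, sigma, n, level)
    widths[part] = upper - lower
    print(f"({part}) n = {n:3d}, {level:.0%}: "
          f"({lower:.2f}, {upper:.2f}), width = {widths[part]:.2f}")
```

Comparing (a) with (b) shows the interval widening as the confidence level rises, and comparing (a) with (c) shows it narrowing as the sample size grows.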

t Distribution
Characteristics of a t distribution:
 It is bell-shaped
 It is symmetric about the mean.
 The mean, median and mode are equal to 0 and are located in the centre of the distribution.
 The curve never touches the 𝑥-axis.
 The variance is greater than 1.
 The t distribution is a family of curves based on the concept of degrees of freedom, 𝜈 which is
related to the sample size.
 As the sample size increases, the t distribution approaches the normal distribution.

Estimation of a population mean: 𝝈𝟐 unknown


Confidence interval of mean for a specific 𝛼 when $\sigma^2$ is unknown:
$$\bar{x} - t_{\alpha/2,\nu}\left(\frac{s}{\sqrt{n}}\right) < \mu < \bar{x} + t_{\alpha/2,\nu}\left(\frac{s}{\sqrt{n}}\right)$$
where the degrees of freedom, 𝝂 = 𝒏 − 𝟏.

Below is the Minitab output for the data in Example 9 when 𝜎 is treated as unknown.
One-Sample T: Assets_Value

Variable N Mean StDev SE Mean 95% CI


Assets_Value 10 17.73 22.30 7.05 (1.78, 33.68)
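The one-sample t output can be checked in the same way as the z output. In the sketch below, the critical value 𝑡 = 2.262 (area 0.025 in the upper tail, 9 degrees of freedom) is taken from a standard t table and hard-coded, since Python's standard library has no t distribution; this is an assumption of the example, not something stated in the notes.

```python
import math
import statistics

# Assets in millions of RM (Example 9), now with sigma unknown
assets = [12.23, 2.89, 13.19, 73.25, 11.59, 8.74, 7.92, 40.22, 5.01, 2.27]

n = len(assets)
xbar = statistics.mean(assets)
s = statistics.stdev(assets)      # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)             # estimated standard error of the mean

t_crit = 2.262                    # t_{0.025, 9} from a t table (hard-coded)
lower, upper = xbar - t_crit * se, xbar + t_crit * se

print(f"mean = {xbar:.2f}, s = {s:.2f}, SE = {se:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The t interval is wider than the z interval for the same data because the t critical value exceeds 1.96 and 𝑠 replaces 𝜎.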

(III) A small sample drawn from a normal distribution with an unknown 𝝈𝟐


Example 11
Dr. K wants to estimate the mean cholesterol level for all adult men living in Shah Alam. He took a sample
of 25 adult men from Shah Alam and found that the mean cholesterol level for this sample is 186mg/dL
with a standard deviation of 12 mg/dL. Assume that the cholesterol levels for all adult men in Shah Alam
are (approximately) normally distributed. Construct a 95% confidence interval for the population mean.


(IV) A large sample drawn from an unknown distribution with an unknown 𝝈𝟐 (use 𝒕 table)

Example 12
Forty-one randomly selected adults who buy books for general reading were asked how much they
actually spend on books per year. The sample produced a mean of RM 145 and a standard deviation of
RM 30 for such annual expenses. Determine a 99% confidence interval for the corresponding population
mean.

Example 13
An experienced poultry farmer knows that the mean weight 𝜇 kg for a large population of chickens will
vary from season to season but the standard deviation of the weights should remain at 0.70 kg. A random
sample of 100 chickens is taken from the population and the weight 𝑥 kg of each chicken in the sample is
recorded giving ∑ 𝑥 = 190.2. Find a 95% confidence interval for 𝜇.


2.4 Interval Estimation of Two Population Means (Independent variables)

(I) Difference in means of two normal populations, 𝝁𝟏 − 𝝁𝟐 (variances $\sigma_1^2$ and $\sigma_2^2$ are known)
The sampling distribution of 𝑥̅1 − 𝑥̅2 is (approximately) normal, with mean and standard deviation
as follows:
Mean: $$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$$
Standard deviation: $$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Interval Estimation of 𝝁𝟏 − 𝝁𝟐
When using the normal distribution, the (1 − 𝛼)100% confidence interval for 𝜇1 − 𝜇2 is
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2}$$

Example 14
A survey of low- and middle-income households shows that consumers aged 65 years and older had an
average credit card debt of RM 10,235 and consumers in the 50- to 64-year age group had an average credit
card debt of RM 9,342 at the time of the survey. Suppose that these averages were based on random
samples of 1200 and 1400 people for the two groups, respectively. Further, assume that the population
standard deviations for the two groups were RM 2,800 and RM 2,500, respectively. Let 𝜇1 and 𝜇2 be the
respective population means for the two groups, people aged 65 years and older and people in the 50- to
64-year age group. Construct a 95% confidence interval for 𝜇1 − 𝜇2. Based on the interval, are 𝜇1 and
𝜇2 equal? Explain.
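A sketch of the calculation for Example 14 is given below, using the two-sample formula with known population standard deviations. It is meant as a check on hand working, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Example 14: group 1 = aged 65 and older, group 2 = aged 50 to 64
xbar1, sigma1, n1 = 10235, 2800, 1200
xbar2, sigma2, n2 = 9342, 2500, 1400

diff = xbar1 - xbar2                                  # point estimate
se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)       # sd of x1-bar - x2-bar
z = NormalDist().inv_cdf(0.975)                       # z_{0.025} for 95%

lower, upper = diff - z * se, diff + z * se
print(f"difference = {diff}, standard deviation = {se:.2f}")
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
print("0 inside interval:", lower <= 0 <= upper)
```

Because the whole interval lies above 0, the data support a higher mean debt for the older group.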


(II) Difference in means of two normal populations, 𝝁𝟏 − 𝝁𝟐 (variances $\sigma_1^2 = \sigma_2^2$ and unknown)
When the standard deviations of the two populations are equal, we can use 𝜎 for both 𝜎1 and 𝜎2. However,
since 𝜎 is unknown, we replace it by its point estimator, 𝑠𝑝, called the pooled standard deviation.

Pooled standard deviation for two samples
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Estimator of the standard deviation of 𝒙̅𝟏 − 𝒙̅𝟐
$$s_{\bar{x}_1 - \bar{x}_2} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Interval Estimation of 𝝁𝟏 − 𝝁𝟐
The (1 − 𝛼)100% confidence interval for 𝜇1 − 𝜇2 is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\nu}\,s_{\bar{x}_1 - \bar{x}_2}$$
where the degrees of freedom, 𝜈 = 𝑛1 + 𝑛2 − 2.

Example 15
A consumer agency wanted to estimate the difference in mean amounts of caffeine in two different brands
of coffee. The agency took a sample of 15 one-pound jars of Brand I coffee that showed the mean amount
of caffeine in these jars to be 80 milligrams per jar with a standard deviation of 5 milligrams. Another sample
of 12 one-pound jars of Brand II coffee gave a mean amount of caffeine equal to 77 milligrams per jar
with a standard deviation of 6 milligrams. Construct a 98% confidence interval for the difference between
the mean amounts of caffeine in one-pound jars of these two brands of coffee. Assume that the two
populations are normally distributed and that the standard deviations of two populations are equal.


Example 16
The following Minitab output was obtained from two independent samples selected from two normally
distributed populations with unknown but equal standard deviations.

Two-Sample T-Test and CI: S1, S2

Two-sample T for S1 vs S2

N Mean StDev SE Mean


S1 13 48.94 8.31 2.3
S2 10 45.95 9.97 3.2

Difference = μ (S1) - μ (S2)


Estimate for difference: 2.99
95% CI for difference: (-4.94, 10.91)
T-Test of difference = 0 (vs ≠): T-Value = 0.78 P-Value = 0.442 DF = 21
Both use Pooled StDev = 9.0587

(a) Verify that the pooled standard deviation of the data is 9.0587.
(b) Show that the 95% confidence interval for the difference in the means of the two populations is
between −4.94 and 10.91.
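Both parts of Example 16 can be checked from the summary statistics in the output. In the sketch below the critical value 𝑡 = 2.080 (upper-tail area 0.025, 21 degrees of freedom) is hard-coded from a standard t table, which is an assumption of this example. Because the inputs are the rounded values printed by Minitab, the results may differ from the output in the last decimal place.

```python
import math

# Summary statistics read from the Minitab output in Example 16
n1, s1 = 13, 8.31
n2, s2 = 10, 9.97
diff = 2.99                     # estimate for mu(S1) - mu(S2)

df = n1 + n2 - 2                # degrees of freedom for the pooled method

# (a) pooled standard deviation
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)

# (b) 95% confidence interval for the difference in means
se = sp * math.sqrt(1 / n1 + 1 / n2)
t_crit = 2.080                  # t_{0.025, 21} from a t table (hard-coded)
lower, upper = diff - t_crit * se, diff + t_crit * se

print(f"df = {df}, pooled sd = {sp:.4f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```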


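(III) Difference in means of two normal populations, 𝝁𝟏 − 𝝁𝟐 (variances $\sigma_1^2 \neq \sigma_2^2$ and unknown)

Degrees of freedom
$$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(\frac{s_1^2}{n_1}\right)^2}{n_1 - 1} + \frac{\left(\frac{s_2^2}{n_2}\right)^2}{n_2 - 1}}$$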

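Estimator of the standard deviation of 𝒙̅𝟏 − 𝒙̅𝟐
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Interval Estimation of 𝝁𝟏 − 𝝁𝟐
The (1 − 𝛼)100% confidence interval for 𝜇1 − 𝜇2 is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\nu}\,s_{\bar{x}_1 - \bar{x}_2}$$
where the degrees of freedom are
$$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(\frac{s_1^2}{n_1}\right)^2}{n_1 - 1} + \frac{\left(\frac{s_2^2}{n_2}\right)^2}{n_2 - 1}}$$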

Example 17
Refer to Example 15. Construct a 98% confidence interval for the difference between the mean amounts
of caffeine in one-pound jars of these two brands. Assume that two populations are normally distributed
and that the standard deviations of the two populations are not equal.


Example 18
The following Minitab output was obtained from two independent samples selected from two normally
distributed populations with unknown and unequal standard deviations.

Two-Sample T-Test and CI: S1, S2

Two-sample T for S1 vs S2

N Mean StDev SE Mean


S1 13 48.94 8.31 2.3
S2 10 45.95 9.97 3.2

Difference = μ (S1) - μ (S2)


Estimate for difference: 2.99
95% CI for difference: (-5.25, 11.23)
T-Test of difference = 0 (vs ≠): T-Value = 0.77 P-Value = 0.455 DF = 17

(a) Verify that the degrees of freedom is 17.


(b) Show that the 95% confidence interval for the difference in the means of the two populations is
between −5.25 and 11.23.
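The degrees of freedom and the interval in Example 18 can be verified from the summary statistics. The sketch below rounds the computed degrees of freedom down to a whole number, which is a common textbook convention and is consistent with DF = 17 in the output. The critical value 𝑡 = 2.110 (upper-tail area 0.025, 17 degrees of freedom) is hard-coded from a standard t table; both choices are assumptions of this example.

```python
import math

# Summary statistics read from the Minitab output in Example 18
n1, s1 = 13, 8.31
n2, s2 = 10, 9.97
diff = 2.99                      # estimate for mu(S1) - mu(S2)

v1, v2 = s1**2 / n1, s2**2 / n2  # s1^2/n1 and s2^2/n2

# (a) degrees of freedom for unequal variances
df_exact = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
df = math.floor(df_exact)        # rounded down to a whole number

# (b) 95% confidence interval for the difference in means
se = math.sqrt(v1 + v2)
t_crit = 2.110                   # t_{0.025, 17} from a t table (hard-coded)
lower, upper = diff - t_crit * se, diff + t_crit * se

print(f"df = {df_exact:.2f}, rounded down to {df}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Compared with the pooled method of Example 16, the interval is slightly wider because fewer degrees of freedom are used.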


2.5 Interval Estimation of Two Population Means (Dependent variables)

Mean difference of two normal distributions for paired samples, 𝝁𝒅


Two samples are said to be paired samples when for each data value collected from one sample there is
a corresponding data value collected from the second sample, and both these data values are collected
from the same source.

Notation for paired samples


In paired samples, the difference between the two data values for each element of the two samples is
denoted by 𝒅, called the paired difference. The degrees of freedom for the paired samples, 𝝂 = 𝒏 − 𝟏.
 𝜇𝑑 = the mean of the paired differences of the population.
 𝜎𝑑 = the standard deviation of the paired differences of the population (usually unknown).
 𝑑̅ = the mean of the paired differences of the sample.
 𝑠𝑑 = the standard deviation of the paired differences of the sample.
 𝑛 = the number of paired difference values.

Mean and standard deviation of the paired differences of two samples


Mean, 𝑑̅:
$$\bar{d} = \frac{\sum d}{n}$$

Standard deviation, 𝑠𝑑:
$$s_d = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}}$$

Mean and standard deviation of 𝒅̅

If 𝜎𝑑 is known and either the sample size is large (𝑛 ≥ 30) or the population is normally distributed, then
the sampling distribution of 𝑑̅ is approximately normal with its mean and standard deviation given as
Mean, 𝜇𝑑̅:
$$\mu_{\bar{d}} = \mu_d$$
Standard deviation, 𝜎𝑑̅:
$$\sigma_{\bar{d}} = \frac{\sigma_d}{\sqrt{n}}$$

If 𝜎𝑑 is unknown, then 𝜎𝑑̅ is estimated by 𝑠𝑑̅, i.e.,
$$s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$$

Confidence interval for 𝝁𝒅


The (1 − 𝛼)100% confidence interval for 𝜇𝑑 is
$$\bar{d} \pm t_{\alpha/2,\nu}\,s_{\bar{d}}$$

where the degrees of freedom, 𝜈 = 𝑛 − 1.


Example 19
A researcher wanted to find the effect of a special diet on systolic blood pressure. She selected a sample of
seven adults and put them on this dietary plan for 3 months. The table below gives the systolic blood
pressure (in mm Hg) of these seven adults before and after the completion of this plan.

Before 210 180 195 220 231 199 224


After 193 186 186 223 220 183 233

Let 𝜇𝑑 be the mean reduction in the systolic blood pressures due to this special dietary plan for the
population of all adults. Construct a 95% confidence interval for 𝜇𝑑. Assume that the population of paired
differences is (approximately) normally distributed.

The following Minitab outputs are obtained from the data in Example 19. Note how the sign of the
difference, 𝑑, and of the interval limits depends on the order of subtraction.
Paired T-Test and CI: Before, After

Paired T for Before - After

N Mean StDev SE Mean


Before 7 208.43 18.10 6.84
After 7 203.43 21.08 7.97
Difference 7 5.00 10.79 4.08

95% CI for mean difference: (-4.98, 14.98)


T-Test of mean difference = 0 (vs ≠ 0): T-Value = 1.23 P-Value = 0.266


Paired T-Test and CI: After, Before

Paired T for After - Before

N Mean StDev SE Mean


After 7 203.43 21.08 7.97
Before 7 208.43 18.10 6.84
Difference 7 -5.00 10.79 4.08

95% CI for mean difference: (-14.98, 4.98)


T-Test of mean difference = 0 (vs ≠ 0): T-Value = -1.23 P-Value = 0.266
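The paired calculation behind both outputs can be sketched as follows, taking 𝑑 = Before − After as in the first output. The critical value 𝑡 = 2.447 (upper-tail area 0.025, 6 degrees of freedom) is hard-coded from a standard t table, which is an assumption of this example.

```python
import math
import statistics

# Systolic blood pressure in mm Hg (Example 19)
before = [210, 180, 195, 220, 231, 199, 224]
after = [193, 186, 186, 223, 220, 183, 233]

# Paired differences, d = Before - After (a positive d is a reduction)
d = [b - a for b, a in zip(before, after)]
n = len(d)

d_bar = statistics.mean(d)        # mean of the paired differences
s_d = statistics.stdev(d)         # sample sd of the paired differences
se = s_d / math.sqrt(n)           # estimated sd of d-bar

t_crit = 2.447                    # t_{0.025, 6} from a t table (hard-coded)
lower, upper = d_bar - t_crit * se, d_bar + t_crit * se

print("d =", d)
print(f"d-bar = {d_bar:.2f}, s_d = {s_d:.2f}, SE = {se:.2f}")
print(f"95% CI for mu_d: ({lower:.2f}, {upper:.2f})")
```

Reversing the subtraction to After − Before flips the sign of every difference, so the interval becomes the mirror image shown in the second output.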

Chi-square (𝝌𝟐 ) Distribution


 A distribution based on degrees of freedom, 𝜈.
 The symbol is $\chi^2$.
 The chi-square distribution is obtained from the values of $\frac{(n-1)s^2}{\sigma^2}$ when random samples are
selected from a normally distributed population whose variance is $\sigma^2$.
 A chi-square variable cannot be negative.
 The distribution is skewed to the right.
 At about 100 degrees of freedom, the chi-square distribution becomes approximately normal.
 The area under each chi-square distribution is equal to 1.00 or 100%.

Figure 1: The Chi-Square Family of Curves

Figure 2: Chi-Square Distribution for d.f. = 𝑛 − 1.


Example 20
Find the values of $\chi^2_{\text{right}}$ and $\chi^2_{\text{left}}$ for a 90% confidence interval when 𝑛 = 25.

2.6 Interval Estimation of One Population Variance


The assumptions for finding a confidence interval for a variance:
 The sample is a random sample.
 The population must be normally distributed.

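The (1 − 𝛼)100% confidence interval for $\sigma^2$ is
$$\frac{(n-1)s^2}{\chi^2_{\text{right}}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{\text{left}}}$$
where degrees of freedom, 𝜈 = 𝑛 − 1, and $\chi^2_{\text{right}}$ and $\chi^2_{\text{left}}$ are the chi-square values with areas
𝛼/2 and 1 − 𝛼/2, respectively, to their right.

The (1 − 𝛼)100% confidence interval for 𝜎 is
$$\sqrt{\frac{(n-1)s^2}{\chi^2_{\text{right}}}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_{\text{left}}}}$$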

Example 21
Find the 95% confidence interval for the variance and standard deviation of the nicotine content of
cigarettes manufactured if a sample of 20 cigarettes has a standard deviation of 1.6 milligrams.


Example 22
Find the 90% confidence interval for the variance and standard deviation for the price in dollars of an
adult single-day ski lift ticket. The data represent a selected sample of nationwide ski resorts. Assume the
variable is normally distributed.
59 54 53 52 51 39 49 46 49 48

Below is the Minitab output of the confidence interval for one variance using the data given in Example
22.
Test and CI for One Variance: ski_lift_ticket

Method

The chi-square method is only for the normal distribution.


The Bonett method is for any continuous distribution.

Statistics

Variable N StDev Variance


ski_lift_ticket 10 5.31 28.2

90% Confidence Intervals

CI for CI for
Variable Method StDev Variance
ski_lift_ticket Chi-Square (3.87, 8.74) (15.0, 76.4)
Bonett (3.35, 10.09) (11.2, 101.8)
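The chi-square row of the output can be reproduced from the raw data of Example 22. In the sketch below, the critical values 16.919 and 3.325 (areas 0.05 and 0.95 to the right, 9 degrees of freedom) are hard-coded from a standard chi-square table, which is an assumption of this example. The Bonett interval uses a different method and is not reproduced here.

```python
import math
import statistics

# Ski lift ticket prices in dollars (Example 22)
prices = [59, 54, 53, 52, 51, 39, 49, 46, 49, 48]

n = len(prices)
s2 = statistics.variance(prices)   # sample variance (n - 1 divisor)

# Chi-square table values for a 90% interval with 9 degrees of freedom
chi2_right = 16.919                # area 0.05 to the right (hard-coded)
chi2_left = 3.325                  # area 0.95 to the right (hard-coded)

var_lower = (n - 1) * s2 / chi2_right
var_upper = (n - 1) * s2 / chi2_left
sd_lower, sd_upper = math.sqrt(var_lower), math.sqrt(var_upper)

print(f"s^2 = {s2:.1f}, s = {math.sqrt(s2):.2f}")
print(f"90% CI for variance: ({var_lower:.1f}, {var_upper:.1f})")
print(f"90% CI for st. dev.: ({sd_lower:.2f}, {sd_upper:.2f})")
```

Note that the larger chi-square value gives the lower limit, because it appears in the denominator.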


F distribution
 The values of F cannot be negative because variances are always positive or zero.
 The distribution is positively skewed.
 The mean value of F is approximately equal to 1.
 The F distribution is a family of curves based on the degrees of freedom of the variance of the
numerator and the degrees of freedom of the variance of the denominator.

Figure 3: The F family of curves.

2.7 Interval Estimation of Two Population Variances: Estimating the Ratio of Two Variances
 The point estimate of the ratio of two population variances $\sigma_1^2/\sigma_2^2$ is given by the ratio $s_1^2/s_2^2$ of the
sample variances.
 If $\sigma_1^2$ and $\sigma_2^2$ are the variances of normal populations, we can establish an interval estimate of
$\sigma_1^2/\sigma_2^2$ by using the statistic
$$F = \frac{\sigma_2^2\,s_1^2}{\sigma_1^2\,s_2^2}$$
 The random variable 𝐹 has an 𝐹-distribution with 𝜈1 = 𝑛1 − 1 and 𝜈2 = 𝑛2 − 1 degrees of
freedom.

The (1 − 𝛼)100% confidence interval for the ratio of two variances, $\sigma_1^2/\sigma_2^2$, is
$$\frac{s_1^2}{s_2^2}\,\frac{1}{F_{\alpha/2,\nu_1,\nu_2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2}\,F_{\alpha/2,\nu_2,\nu_1}$$

The (1 − 𝛼)100% confidence interval for the ratio of two standard deviations, $\sigma_1/\sigma_2$, is
$$\sqrt{\frac{s_1^2}{s_2^2}\,\frac{1}{F_{\alpha/2,\nu_1,\nu_2}}} < \frac{\sigma_1}{\sigma_2} < \sqrt{\frac{s_1^2}{s_2^2}\,F_{\alpha/2,\nu_2,\nu_1}}$$


Example 23
A study was conducted by the Department of Zoology at Virginia Tech to estimate the difference in the
amounts of the chemical orthophosphorus measured at two different stations on the James River.
Orthophosphorus was measured in milligrams per litre. Thirteen samples were collected from station 1,
and 11 samples were obtained from station 2. The 13 samples from station 1 had an average
orthophosphorus content of 3.84 milligrams per litre and a standard deviation of 3.07 milligrams per
litre, while the 11 samples from station 2 had an average content of 1.49 milligrams per litre and a
standard deviation of 0.80 milligrams per litre. Assume that the observations came from normal
populations.
(a) Construct a 98% confidence interval for the ratio of two variances and standard deviations. Based
on the confidence interval, what can you conclude about the two population variances?
(b) From the result in (a), construct the 98% confidence interval for the difference in the population
mean amounts of the chemical orthophosphorus measured at two different stations. Based on
the interval, is there a significant difference in the two population means?


Example 24
The following Minitab output was obtained from two independent samples selected from two normally
distributed populations with unknown and unequal variances. Show that the lower limits of the 95%
confidence intervals for the ratio of variances and the ratio of standard deviations of the two
populations are as given in the output.

Test and CI for Two Variances: S1, S3

Statistics

95% CI for
Variable N StDev Variance StDevs
S1 13 8.309 69.038 (5.958, 13.716)
S3 9 6.564 43.092 (4.434, 12.576)

Ratio of standard deviations = 1.266


Ratio of variances = 1.602

95% Confidence Intervals

          CI for StDev      CI for Variance
Method    Ratio             Ratio
F         (0.618, 2.372)    (0.381, 5.626)
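The lower limits in Example 24 follow from dividing the ratio of sample variances by an F critical value. In the sketch below, the value 4.20 for the F distribution with 12 and 8 degrees of freedom and upper-tail area 0.025 is hard-coded from a standard F table; it is an assumption of this example and should be confirmed against the table used in class.

```python
import math

# Summary statistics read from the Minitab output in Example 24
s1_sq, n1 = 69.038, 13    # variance and size of sample S1
s2_sq, n2 = 43.092, 9     # variance and size of sample S3

nu1, nu2 = n1 - 1, n2 - 1  # degrees of freedom: 12 and 8

f_crit = 4.20              # F_{0.025}(12, 8) from an F table (hard-coded)

ratio = s1_sq / s2_sq              # point estimate of the variance ratio
var_lower = ratio / f_crit         # lower limit for the variance ratio
sd_lower = math.sqrt(var_lower)    # lower limit for the st. dev. ratio

print(f"ratio of variances = {ratio:.3f}")
print(f"lower limit, variance ratio = {var_lower:.3f}")
print(f"lower limit, st. dev. ratio = {sd_lower:.3f}")
```

The upper limits would be obtained in the same way by multiplying the ratio by the F value with the degrees of freedom reversed.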


Additional Notes

Interval Estimation for Two Populations

For the confidence interval for the difference in two population means, 𝜇1 − 𝜇2:

 If the value 0 is in the interval, then 𝜇1 − 𝜇2 = 0 is a plausible value, so we cannot conclude that
𝜇1 and 𝜇2 differ (there is no significant difference between the means).
 However, if the value 0 is not in the interval, then we can conclude that 𝜇1 ≠ 𝜇2 because
𝜇1 − 𝜇2 ≠ 0.

In a similar manner, when we consider the confidence interval for the ratio of two population variances,
$\sigma_1^2/\sigma_2^2$:

 If the value 1 is in the interval, then $\sigma_1^2/\sigma_2^2 = 1$ is a plausible value, so we cannot conclude that
$\sigma_1^2$ and $\sigma_2^2$ differ.
 However, if the value 1 is not in the interval, then we can conclude that $\sigma_1^2 \neq \sigma_2^2$ because
$\sigma_1^2/\sigma_2^2 \neq 1$.

For example, in Example 23 the 98% confidence interval for $\sigma_1^2/\sigma_2^2$ is (3.127, 63.326). Since the value
1 is not in the interval, we can conclude that $\sigma_1^2 \neq \sigma_2^2$ because $\sigma_1^2/\sigma_2^2 \neq 1$.

Remember:
To draw a conclusion from the confidence interval for the difference in two population means, check
whether the value 0 is in the interval.
To draw a conclusion from the confidence interval for the ratio of two population variances, check
whether the value 1 is in the interval.
