Chapter 2 - Estimation PDF
Chapter 2: Estimation
Note: A sample statistic that is used to estimate a population parameter is called an estimator.
Example 1
Suppose there are only five students in the STA 408 class and the Test 1 scores of these five students are
as follows.
70 78 80 80 95
(a) Find the population distribution for the scores of the students.
(b) Find the mean and standard deviation of the data.
(a) Let 𝑥 be the score of a student. The population probability distribution is:
𝑥 𝑃(𝑥)
70
78
80
95
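(b) Mean,
$\mu = \frac{\sum x}{N} =$
Standard deviation,
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ or $\sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}}$
$\sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}} =$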
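As a quick numerical check of the two standard deviation formulas in Example 1(b), the following minimal Python sketch evaluates both forms on the five scores. The variable names are illustrative and not part of the original notes.

```python
from math import sqrt
from statistics import fmean, pstdev

scores = [70, 78, 80, 80, 95]   # population of five Test 1 scores
N = len(scores)

mu = fmean(scores)              # population mean: sum(x) / N

# Definitional form: sqrt( sum((x - mu)^2) / N )
sigma_def = sqrt(sum((x - mu) ** 2 for x in scores) / N)

# Shortcut form: sqrt( (sum(x^2) - (sum(x))^2 / N) / N )
sigma_short = sqrt((sum(x * x for x in scores) - sum(scores) ** 2 / N) / N)

print(f"mu = {mu:.2f}")
print(f"sigma (definition) = {sigma_def:.4f}")
print(f"sigma (shortcut)   = {sigma_short:.4f}")
print(f"statistics.pstdev  = {pstdev(scores):.4f}")
```

Both forms should agree with the library's population standard deviation, since they are algebraically equivalent.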
STA408 Chapter 2: Estimation
Sampling Distribution of 𝑥̅
Example 2
Refer to the data in Example 1.
(a) Find all possible samples of three scores each that can be selected, without replacement.
(b) Find the mean of each sample.
(c) Find the sampling distribution of 𝑥̅ .
{𝑀2 , 𝑀3 , 𝑀5 }
{𝑀2 , 𝑀4 , 𝑀5 }
{𝑀3 , 𝑀4 , 𝑀5 }
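The enumeration asked for in Example 2 can be sketched in Python as follows. This is a minimal illustration that assumes the labels M1 to M5 stand for the five scores of Example 1 in the order listed, so the two scores of 80 are treated as different students.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

scores = [70, 78, 80, 80, 95]   # assumed to correspond to M1..M5

# (a) every sample of size 3 drawn without replacement (order ignored)
samples = list(combinations(scores, 3))

# (b) mean of each sample, kept exact as a fraction
means = [Fraction(sum(s), 3) for s in samples]

# (c) sampling distribution: P(x_bar) = frequency / number of samples
counts = Counter(means)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m, p in dist.items():
    print(f"x_bar = {float(m):6.2f}   P(x_bar) = {p}")

# Mean of the sampling distribution of x_bar
mean_of_means = sum(m * p for m, p in dist.items())
print("mean of x_bar =", float(mean_of_means))
```

There are 10 possible samples, and the mean of the sampling distribution equals the population mean of the five scores.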
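Note: The standard deviation of the sample mean is also called the standard error of the mean. It is
denoted $\sigma_{\bar{x}}$ and equals $\frac{\sigma}{\sqrt{n}}$; when 𝜎 is unknown it is estimated by $\frac{s}{\sqrt{n}}$.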
Example 3
The mean wage per hour for all 5000 employees who work at a large hotel is RM 27.50, and the standard
deviation is RM 3.70. Let 𝑥̅ be the mean wage per hour for a random sample of employees selected
from this hotel. Find the mean and standard deviation of 𝑥̅ for a sample size of
(a) 30, (b) 75, (c) 200.
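A minimal sketch of the calculation in Example 3 is given below. It assumes that the sampling fraction n/N is small enough (at most 0.05, a common textbook convention) for the standard error to be taken as σ/√n without a finite population correction. The function name `standard_error` is illustrative only.

```python
from math import sqrt

mu = 27.50      # population mean wage per hour (RM)
sigma = 3.70    # population standard deviation (RM)
N = 5000        # population size

def standard_error(sigma, n):
    """Standard deviation of x_bar, sigma / sqrt(n), with no finite population correction."""
    return sigma / sqrt(n)

for n in (30, 75, 200):
    fraction = n / N   # sampling fraction, at most 0.04 for these sample sizes
    print(f"n = {n:3d}: n/N = {fraction:.3f}, mean of x_bar = {mu:.2f}, "
          f"sd of x_bar = {standard_error(sigma, n):.4f}")
```

The mean of 𝑥̅ is the population mean for every sample size, while the standard deviation of 𝑥̅ shrinks as n grows.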
Example 5
The actual weights, 𝑊 kilograms, of fertilizer in a 5 kg bag may be modelled by a normal random variable
with mean 5.25 kg and variance 0.25 kg². A random sample of four 5 kg bags is selected. Calculate the
probability that the mean weight of fertilizer of the four bags is less than 5.30 kg.
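The probability in Example 5 can be checked with the standard library, as in the sketch below. It relies on the fact that the mean of a sample from a normal population is itself normal, with standard deviation σ/√n. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 5.25         # mean weight per bag (kg)
variance = 0.25   # variance of the weight (kg^2)
n = 4             # number of bags sampled

se = sqrt(variance) / sqrt(n)       # standard deviation of the sample mean
x_bar_dist = NormalDist(mu, se)     # distribution of the sample mean

p = x_bar_dist.cdf(5.30)            # P(x_bar < 5.30)
z = (5.30 - mu) / se                # corresponding z value
print(f"z = {z:.2f}, P(x_bar < 5.30) = {p:.4f}")
```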
2.2 Estimation
Definitions
The assignment of value(s) to a population parameter based on a value of the corresponding
statistic is called estimation.
The value(s) assigned to a population parameter based on a value of a sample statistic is called an
estimate.
The sample statistic that is used to estimate a population parameter is called an estimator.
Consistent
As the sample size increases, the value of the estimator approaches the value of the parameter
being estimated.
Relatively efficient
Of all the statistics that can be used to estimate a parameter, the relatively efficient estimator has
the smallest variance.
Estimation procedure
Step 1: Select a sample
Step 2: Collect required information from the members of the sample.
Step 3: Calculate the value(s) of the sample statistic(s).
Step 4: Assign value(s) to the corresponding population parameter(s).
The value of a sample statistic that is used to estimate a population parameter is called a point
estimate.
In interval estimation, an interval is constructed around the point estimate and it is stated that
this interval is likely to contain the corresponding population parameter.
Each interval is constructed with regard to a given confidence level and is called a confidence
interval. The confidence interval is given as
Point estimate ± margin of error.
The confidence level associated with a confidence interval states how much confidence we have
that this interval contains the true population parameter. The confidence level is denoted by
(1 − 𝛼)100%
where 𝛼 is the significance level.
Examples of point estimates
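Mean
Population (parameter): $\mu = \frac{\sum X}{N}$
Sample (statistic / point estimate): $\bar{x} = \frac{\sum x}{n}$
Variance
Population (parameter): $\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$
Sample (statistic / point estimate): $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Standard deviation
Population (parameter): $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}$
Sample (statistic / point estimate): $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$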
Example 6
Following are the 2009 earnings (in thousands of dollars) before taxes for all six employees of a small
company.
88.50 108.40 65.50 52.50 79.80 54.60
Calculate the mean and standard deviation of these data.
Example 7
Assume that the data given in Example 6 are the earnings for six employees of a large company. Calculate
the mean and standard deviation of those data.
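Examples 6 and 7 use the same six values but differ in whether they form the whole population (divisor N) or a sample from a larger one (divisor n − 1). A minimal Python sketch of the contrast, using the standard library's `pstdev` and `stdev`:

```python
from statistics import fmean, pstdev, stdev

# Earnings before taxes (thousands of dollars) from Example 6
earnings = [88.50, 108.40, 65.50, 52.50, 79.80, 54.60]

mean = fmean(earnings)     # same formula for mu (Example 6) and x_bar (Example 7)
sigma = pstdev(earnings)   # Example 6: all six employees are the population, divisor N
s = stdev(earnings)        # Example 7: six employees are a sample, divisor n - 1

print(f"mean = {mean:.4f}")
print(f"population sd (Example 6) = {sigma:.4f}")
print(f"sample sd (Example 7)     = {s:.4f}")
```

The sample standard deviation is always the larger of the two, because it divides the same sum of squares by n − 1 instead of N.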
and the margin of error is $z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$.
Confidence intervals
Figure 1: The 90% confidence intervals for 𝜇 constructed by 𝑥̅1 , 𝑥̅2 and 𝑥̅3 .
(I) A sample (be it large or small) drawn from a normal distribution with a known 𝝈
Example 8
A publishing company has just published a new college textbook. Before the company decides the price
at which to sell this textbook, it wants to know the average price of all such textbooks in the market. The
research department at the company took a sample of 25 comparable textbooks and collected
information on their prices. This information produced a mean price of RM 145 for this sample. It is
known that the standard deviation of the prices of all such textbooks is RM 35 and the population of such
prices is normal.
(a) What is the point estimate of the mean price of all such college textbooks?
(b) Construct a 90% confidence interval for the mean price of all such textbooks and interpret the
interval.
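The interval in Example 8 has the form point estimate ± margin of error, with the margin based on a z value because σ is known and the population is normal. A minimal sketch follows; the helper name `z_interval` is hypothetical and chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, conf_level):
    """Confidence interval for mu when sigma is known: x_bar +/- z * sigma / sqrt(n)."""
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z value with area alpha/2 in the upper tail
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 8: x_bar = RM 145, sigma = RM 35, n = 25, 90% confidence level
lower, upper = z_interval(145, 35, 25, 0.90)
print(f"Point estimate: RM 145")
print(f"90% confidence interval: RM {lower:.2f} to RM {upper:.2f}")
```

Because the z value is computed rather than read from a table, the limits may differ in the second decimal place from a hand calculation with z = 1.65.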
Example 9
The following data represent a sample of assets (in millions of RM) of 10 companies in Selangor. Find the
90% confidence interval of the mean. Assume that the assets (in millions of RM) of all companies in
Selangor are approximately normally distributed and the standard deviation of the population is 21.154.
12.23 2.89 13.19 73.25 11.59 8.74 7.92 40.22 5.01 2.27
Below is the output for the analysis of data in Example 9 using Minitab software.
One-Sample Z: Assets_Value
Note : The width of the confidence interval depends on the size of the margin of error which depends
on the values of 𝑧, 𝜎 and 𝑛. However, the value of 𝜎 is beyond our control. Therefore the width of
the confidence interval can be controlled either through the value of 𝑧 (depends on 𝛼) or the size
of the sample, 𝑛.
Confidence level and the width of confidence interval
- The larger the confidence level, the wider the confidence interval is and vice versa.
Sample size and the width of confidence interval
- The bigger the size of the sample, the narrower the confidence interval is and vice versa.
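The two statements above can be illustrated numerically. The sketch below computes the width of a z interval, 2·z·σ/√n, for several confidence levels and sample sizes. The value σ = 35 is borrowed from Example 8 purely for illustration, and the function name `ci_width` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sigma, n, conf_level):
    """Width of the z interval for mu: twice the margin of error."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return 2 * z * sigma / sqrt(n)

sigma = 35   # illustrative value only

print("Fixed n = 25, varying confidence level:")
for level in (0.90, 0.95, 0.99):
    print(f"  {level:.0%}: width = {ci_width(sigma, 25, level):.2f}")

print("Fixed 90% level, varying sample size:")
for n in (25, 100, 400):
    print(f"  n = {n:3d}: width = {ci_width(sigma, n, 0.90):.2f}")
```

Quadrupling the sample size halves the width, since the margin of error is proportional to 1/√n.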
t Distribution
Characteristics of a t distribution:
It is bell-shaped
It is symmetric about the mean.
The mean, median and mode are equal to 0 and are located in the centre of the distribution.
The curve never touches the 𝑥-axis.
The variance is greater than 1.
The t distribution is a family of curves based on the concept of degrees of freedom, 𝜈 which is
related to the sample size.
As the sample size increases, the t distribution approaches the normal distribution.
Below is the output for the analysis of data in Example 9 using Minitab software (where 𝜎 is unknown)
One-Sample T: Assets_Value
(IV) A large sample drawn from an unknown distribution with an unknown 𝝈𝟐 (use 𝒕 table)
Example 12
Forty-one randomly selected adults who buy books for general reading were asked how much they
actually spend on books per year. The sample produced a mean of RM 145 and a standard deviation of
RM 30 for such annual expenses. Determine a 99% confidence interval for the corresponding population
mean.
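A minimal sketch of the t interval for Example 12 is shown below. The critical value 2.704 (40 degrees of freedom, area 0.005 in each tail) is assumed to be read from a standard t table, as the section heading suggests; the helper name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for mu when sigma is unknown: x_bar +/- t * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 12: n = 41, so df = n - 1 = 40
# Assumed table value: t with df = 40 and area 0.005 in the upper tail
T_CRIT_DF40_99 = 2.704

lower, upper = t_interval(145, 30, 41, T_CRIT_DF40_99)
print(f"99% confidence interval: RM {lower:.2f} to RM {upper:.2f}")
```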
Example 13
An experienced poultry farmer knows that the mean weight 𝜇 kg for a large population of chickens will
vary from season to season but the standard deviation of the weights should remain at 0.70 kg. A random
sample of 100 chickens is taken from the population and the weight 𝑥 kg of each chicken in the sample is
recorded giving ∑ 𝑥 = 190.2. Find a 95% confidence interval for 𝜇.
(I) Difference in means of two normal populations, 𝝁𝟏 − 𝝁𝟐 (variances 𝝈𝟐𝟏 and 𝝈𝟐𝟐 are known)
The sampling distribution of 𝑥̅1 − 𝑥̅2 is (approximately) normal, with mean and standard deviation as
follows:
Mean: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
Standard deviation: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Interval Estimation of 𝝁𝟏 − 𝝁𝟐
When using the normal distribution, the (1 − 𝛼)100% confidence interval for 𝜇1 − 𝜇2 is
$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2}$
Example 14
A survey of low- and middle-income households shows that consumers aged 65 years and older had an
average credit card debt of RM 10,235 and consumers in the 50- to 64-year age group had an average credit
card debt of RM 9,342 at the time of the survey. Suppose that these averages were based on random
samples of 1200 and 1400 people for the two groups, respectively. Further, assume that the population
standard deviations for the two groups were RM 2,800 and RM 2,500, respectively. Let 𝜇1 and 𝜇2 be the
respective population means for the two groups, people aged 65 years and older and people in the 50- to
64-year age group. Construct a 95% confidence interval for 𝜇1 − 𝜇2 . Based on the interval, are 𝜇1 and
𝜇2 equal? Explain.
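The interval for Example 14 can be sketched as follows, using the standard deviation of 𝑥̅1 − 𝑥̅2 for known population variances. The function name `two_mean_z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_z_interval(x1, x2, sigma1, sigma2, n1, n2, conf_level):
    """CI for mu1 - mu2 with known sigmas: (x1 - x2) +/- z * sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    diff = x1 - x2
    return diff - z * se, diff + z * se

# Example 14: group 1 is aged 65 and older, group 2 is aged 50 to 64
lower, upper = two_mean_z_interval(10235, 9342, 2800, 2500, 1200, 1400, 0.95)
print(f"95% confidence interval for mu1 - mu2: RM {lower:.2f} to RM {upper:.2f}")
print("0 inside interval:", lower <= 0 <= upper)
```

Since 0 lies outside the interval, the data do not support 𝜇1 = 𝜇2 at this confidence level.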
(II) Difference in means of two normal populations, 𝝁𝟏 − 𝝁𝟐 (variances 𝝈𝟐𝟏 = 𝝈𝟐𝟐 and unknown)
When the standard deviations of the two populations are equal, we can use 𝜎 for both 𝜎1 and 𝜎2 . However,
since 𝜎 is unknown, we replace it by its point estimator, 𝑠𝑝 , called the pooled standard deviation.
Example 15
A consumer agency wanted to estimate the difference in mean amounts of caffeine in two different brands
of coffee. The agency took a sample of 15 one-pound jars of Brand I coffee that showed the mean amount
of caffeine in these jars to be 80 milligrams per jar with a standard deviation of 5 milligrams. Another sample
of 12 one-pound jars of Brand II coffee gave a mean amount of caffeine equal to 77 milligrams per jar
with a standard deviation of 6 milligrams. Construct a 98% confidence interval for the difference between
the mean amounts of caffeine in one-pound jars of these two brands of coffee. Assume that the two
populations are normally distributed and that the standard deviations of two populations are equal.
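A minimal sketch of the pooled calculation for Example 15 is given below. It assumes the usual pooled estimator, $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$, with $n_1 + n_2 - 2$ degrees of freedom, and a critical value of 2.485 (25 degrees of freedom, area 0.01 in each tail) taken from a standard t table. The function names are hypothetical.

```python
from math import sqrt

def pooled_sd(s1, s2, n1, n2):
    """Pooled standard deviation for two samples with equal population variances."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def pooled_t_interval(x1, x2, s1, s2, n1, n2, t_crit):
    """CI for mu1 - mu2: (x1 - x2) +/- t * sp * sqrt(1/n1 + 1/n2)."""
    sp = pooled_sd(s1, s2, n1, n2)
    margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin

# Example 15: Brand I (n1 = 15) and Brand II (n2 = 12), df = 15 + 12 - 2 = 25
T_CRIT_DF25_98 = 2.485   # assumed table value, area 0.01 in the upper tail

sp = pooled_sd(5, 6, 15, 12)
lower, upper = pooled_t_interval(80, 77, 5, 6, 15, 12, T_CRIT_DF25_98)
print(f"pooled sd = {sp:.4f}")
print(f"98% confidence interval: {lower:.2f} to {upper:.2f} milligrams")
```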
Example 16
The following Minitab output was obtained from two independent samples selected from two normally
distributed populations with unknown but equal standard deviations.
Two-sample T for S1 vs S2
(a) Verify that the pooled standard deviation of the data is 9.0587.
(b) Show that the 95% confidence interval of the difference in means of the two populations is
between −4.94 and 10.91.
(III) Difference in means of two normal populations, 𝝁𝟏 − 𝝁𝟐 (variances 𝝈𝟐𝟏 ≠ 𝝈𝟐𝟐 and unknown)
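Degrees of freedom
$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(\frac{s_1^2}{n_1}\right)^2}{n_1 - 1} + \frac{\left(\frac{s_2^2}{n_2}\right)^2}{n_2 - 1}}$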
Example 17
Refer to Example 15. Construct a 98% confidence interval for the difference between the mean amounts
of caffeine in one-pound jars of these two brands. Assume that two populations are normally distributed
and that the standard deviations of the two populations are not equal.
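For Example 17, the degrees of freedom formula and the resulting interval can be sketched as below. The sketch assumes that ν is rounded down to a whole number (a common textbook convention) and that the critical value 2.518 (21 degrees of freedom, area 0.01 in each tail) comes from a standard t table. The function names are hypothetical.

```python
from math import floor, sqrt

def unequal_var_df(s1, s2, n1, n2):
    """Degrees of freedom for the unequal-variance case (not yet rounded)."""
    a = s1 ** 2 / n1
    b = s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def unequal_var_interval(x1, x2, s1, s2, n1, n2, t_crit):
    """CI for mu1 - mu2: (x1 - x2) +/- t * sqrt(s1^2/n1 + s2^2/n2)."""
    margin = t_crit * sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin

# Data of Example 15: Brand I (n1 = 15) and Brand II (n2 = 12)
nu = unequal_var_df(5, 6, 15, 12)
df = floor(nu)                       # rounded down by assumption
T_CRIT_DF21_98 = 2.518               # assumed table value for df = 21

lower, upper = unequal_var_interval(80, 77, 5, 6, 15, 12, T_CRIT_DF21_98)
print(f"nu = {nu:.2f}, df used = {df}")
print(f"98% confidence interval: {lower:.2f} to {upper:.2f} milligrams")
```

The interval is slightly wider than the pooled one for the same data, because fewer degrees of freedom give a larger t value.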
Example 18
The following Minitab output was obtained from two independent samples selected from two normally
distributed populations with unknown and unequal standard deviations.
Two-sample T for S1 vs S2
Standard deviation, 𝑠𝑑 :
$s_d = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}}$
Example 19
A researcher wanted to find the effect of special diet on systolic blood pressure. She selected a sample of
seven adults and put them on this dietary plan for 3 months. The table below gives the systolic blood
pressure (in mm Hg) of these seven adults before and after the completion of this plan.
Let 𝜇𝑑 be the mean reduction in the systolic blood pressures due to this special dietary plan for the
population of all adults. Construct a 95% confidence interval for 𝜇𝑑 . Assume that the population of paired
differences is (approximately) normally distributed.
The following Minitab outputs are obtained from the data in Example 19. Take note of the difference, d
value.
Paired T-Test and CI: Before, After
Example 20
Find the values of $\chi^2_{\text{right}}$ and $\chi^2_{\text{left}}$ for a 90% confidence interval when 𝑛 = 25.
$\sqrt{\frac{(n - 1)s^2}{\chi^2_{\text{right}}}} < \sigma < \sqrt{\frac{(n - 1)s^2}{\chi^2_{\text{left}}}}$
Example 21
Find the 95% confidence interval for the variance and standard deviation of the nicotine content of
cigarettes manufactured if a sample of 20 cigarettes has a standard deviation of 1.6 milligrams.
Example 22
Find the 90% confidence interval for the variance and standard deviation for the price in dollars of an
adult single-day ski lift ticket. The data represent a selected sample of nationwide ski resorts. Assume the
variable is normally distributed.
59 54 53 52 51 39 49 46 49 48
Below is the Minitab output of the confidence interval for one variance using the data given in Example
22.
Test and CI for One Variance: ski_lift_ticket
Method
Statistics
Variable         Method      CI for StDev    CI for Variance
ski_lift_ticket  Chi-Square  (3.87, 8.74)    (15.0, 76.4)
                 Bonett      (3.35, 10.09)   (11.2, 101.8)
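The Chi-Square row of the output can be reproduced by hand, as in the following sketch for Example 22. The chi-square values 16.919 and 3.325 (9 degrees of freedom, areas 0.05 and 0.95 to the right) are assumed to be read from a standard chi-square table; the constant names are illustrative.

```python
from math import sqrt
from statistics import variance

# Ski lift ticket prices (dollars) from Example 22
prices = [59, 54, 53, 52, 51, 39, 49, 46, 49, 48]
n = len(prices)
s2 = variance(prices)     # sample variance, divisor n - 1

# 90% interval with df = n - 1 = 9 (assumed table values)
CHI2_RIGHT = 16.919       # area 0.05 to the right
CHI2_LEFT = 3.325         # area 0.95 to the right

var_lower = (n - 1) * s2 / CHI2_RIGHT
var_upper = (n - 1) * s2 / CHI2_LEFT
sd_lower, sd_upper = sqrt(var_lower), sqrt(var_upper)

print(f"s^2 = {s2:.3f}")
print(f"90% CI for variance: ({var_lower:.1f}, {var_upper:.1f})")
print(f"90% CI for standard deviation: ({sd_lower:.2f}, {sd_upper:.2f})")
```

The interval for the standard deviation is obtained by taking square roots of the limits for the variance.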
F distribution
The values of F cannot be negative because variances are always positive or zero.
The distribution is positively skewed.
The mean value of F is approximately equal to 1.
The F distribution is a family of curves based on the degrees of freedom of the variance in the numerator
and the degrees of freedom of the variance in the denominator.
2.7 Interval Estimation of Two Population Variances: Estimating the Ratio of Two Variances
The point estimate of the ratio of two population variances $\frac{\sigma_1^2}{\sigma_2^2}$ is given by the ratio $\frac{s_1^2}{s_2^2}$ of the
sample variances.
If $\sigma_1^2$ and $\sigma_2^2$ are the variances of normal populations, we can establish an interval estimate of $\frac{\sigma_1^2}{\sigma_2^2}$
by using the statistic
$F = \frac{\sigma_2^2 s_1^2}{\sigma_1^2 s_2^2}$
The random variable 𝐹 has an 𝐹-distribution with 𝜈1 = 𝑛1 − 1 and 𝜈2 = 𝑛2 − 1 degrees of
freedom.
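The (1 − 𝛼)100% confidence interval for the ratio of two variances, $\frac{\sigma_1^2}{\sigma_2^2}$, is
$\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2,\nu_1,\nu_2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2} \cdot F_{\alpha/2,\nu_2,\nu_1}$
The (1 − 𝛼)100% confidence interval for the ratio of two standard deviations, $\frac{\sigma_1}{\sigma_2}$, is
$\sqrt{\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2,\nu_1,\nu_2}}} < \frac{\sigma_1}{\sigma_2} < \sqrt{\frac{s_1^2}{s_2^2} \cdot F_{\alpha/2,\nu_2,\nu_1}}$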
Example 23
A study was conducted by the Department of Zoology at Virginia Tech to estimate the difference in the
amounts of the chemical orthophosphorus measured at two different stations on the James River.
Orthophosphorus was measured in milligrams per litre. Thirteen samples were collected from station 1,
and 11 samples were obtained from station 2. The 13 samples from station 1 had an average
orthophosphorus content of 3.84 milligrams per litre and a standard deviation of 3.07 milligrams per
litre, while the 11 samples from station 2 had an average content of 1.49 milligrams per litre and a
standard deviation of 0.80 milligrams per litre. Assume that the observations came from normal
populations.
(a) Construct a 98% confidence interval for the ratio of two variances and standard deviations. Based
on the confidence interval, what can you conclude about the two population variances?
(b) From the result in (a), construct the 98% confidence interval for the difference in the population
mean amounts of the chemical orthophosphorus measured at two different stations. Based on
the interval, is there a significant difference in the two population means?
Example 24
The following Minitab output was obtained from two independent samples selected from two normally
distributed populations with unknown and unequal variances. Show that the lower limits of the 95%
confidence intervals for the ratio of variances and the ratio of standard deviations of the two populations
are as given in the output.
Statistics
Variable  N   StDev  Variance  95% CI for StDevs
S1        13  8.309  69.038    (5.958, 13.716)
S3        9   6.564  43.092    (4.434, 12.576)

Method  CI for StDev Ratio  CI for Variance Ratio
F       (0.618, 2.372)      (0.381, 5.626)
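The limits in this output can be checked against the interval for the ratio of two variances, as in the sketch below. The critical values 4.20 and 3.51 (upper 0.025 points of F with (12, 8) and (8, 12) degrees of freedom) are assumed to be read from a standard F table; the function name `variance_ratio_interval` is hypothetical.

```python
from math import sqrt

def variance_ratio_interval(s1_sq, s2_sq, f_12, f_21):
    """CI for sigma1^2 / sigma2^2.

    f_12: upper alpha/2 critical value of F with (n1 - 1, n2 - 1) degrees of freedom
    f_21: upper alpha/2 critical value of F with (n2 - 1, n1 - 1) degrees of freedom
    """
    ratio = s1_sq / s2_sq
    return ratio / f_12, ratio * f_21

# Example 24: n1 = 13, n2 = 9, so the degrees of freedom are 12 and 8
F_12_8 = 4.20   # assumed table value, area 0.025 to the right
F_8_12 = 3.51   # assumed table value, area 0.025 to the right

lower, upper = variance_ratio_interval(69.038, 43.092, F_12_8, F_8_12)
print(f"95% CI for variance ratio: ({lower:.3f}, {upper:.3f})")
print(f"95% CI for StDev ratio:    ({sqrt(lower):.3f}, {sqrt(upper):.3f})")
```

The lower limits agree with the output to three decimal places. The upper limits differ slightly in the third decimal place because the table values are rounded to two decimals.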
Additional Notes
In a similar manner, when we consider the confidence interval for the ratio of two population variances,
$\frac{\sigma_1^2}{\sigma_2^2}$:
If the value of 1 is in the interval, we can conclude that $\sigma_1^2 = \sigma_2^2$ is plausible, because the interval
includes $\frac{\sigma_1^2}{\sigma_2^2} = 1$.
However, if the value of 1 is not in the interval, then we can conclude that $\sigma_1^2 \neq \sigma_2^2$ because $\frac{\sigma_1^2}{\sigma_2^2} \neq 1$.
For example, if we consider the confidence interval for Example 23, the 98% confidence interval for $\frac{\sigma_1^2}{\sigma_2^2}$ is
(3.127, 63.326). Since the value of 1 is not in the interval, we can conclude that $\sigma_1^2 \neq \sigma_2^2$ because $\frac{\sigma_1^2}{\sigma_2^2} \neq 1$.
Remember:
To draw a conclusion from the confidence interval for the difference in two population means, check if the
value of 0 is in the interval.
To draw a conclusion from the confidence interval for the ratio of two population variances, check if the
value of 1 is in the interval.