Estimation Confidence Interval


Estimation and Confidence Intervals

Introduction
This chapter considers several important aspects of sampling. We begin by
studying point estimates. A point estimate is a single value (point) derived
from a sample and used to estimate a population value.

For example, suppose we select a sample of 50 junior executives and ask how
many hours they worked last week. We compute the mean of this sample of 50
and use the value of the sample mean as a point estimate of the unknown
population mean.

However, a point estimate is a single value. A more informative approach is
to present a range of values in which we expect the population parameter to
occur. Such a range of values is called a confidence interval.
Point Estimate for a Population Mean

A point estimate is a single statistic used to estimate a population parameter.


For example, recent medical studies indicate that exercise is an important part of a
person’s overall health. The director of human resources at OCF, a large glass
manufacturer, wants an estimate of the number of hours per week employees spend
exercising. A sample of 70 employees reveals the mean number of hours of exercise last
week is 3.3. This point estimate of 3.3 hours estimates the unknown population mean.

The sample mean, X̅, is not the only point estimate of a population parameter. For
example, p, a sample proportion, is a point estimate of π, the population proportion; and
s, the sample standard deviation, is a point estimate of σ, the population standard
deviation.
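As a minimal sketch of these three point estimates, the Python snippet below computes X̅, s, and p from a small sample. The data values and the "at least 3 hours" trait are made up for illustration and are not taken from the OCF study.

```python
from statistics import mean, stdev

# Hypothetical weekly exercise hours for 8 sampled employees (illustrative only).
hours = [2.0, 4.5, 3.0, 0.0, 5.5, 3.5, 1.0, 4.5]

x_bar = mean(hours)   # point estimate of the population mean
s = stdev(hours)      # sample standard deviation (n - 1 divisor), estimates sigma

# Sample proportion: share of sampled employees exercising at least 3 hours.
successes = sum(1 for h in hours if h >= 3.0)
p = successes / len(hours)   # point estimate of the population proportion

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, p = {p:.3f}")
```

Each of the three values is a single number computed from the sample, which is what makes it a point estimate rather than an interval.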
Confidence Intervals for a Population Mean

CONFIDENCE INTERVAL: A range of values constructed from sample data so that
the population parameter is likely to occur within that range at a specified probability.
The specified probability is called the level of confidence.

For example, we estimate the mean yearly income for construction workers in the New
York–New Jersey area is $85,000. The range of this estimate might be from $81,000 to
$89,000. We can describe how confident we are that the population parameter is in the
interval by making a probability statement. We might say, for instance, that we are 90
percent sure that the mean yearly income of construction workers in the New York–
New Jersey area is between $81,000 and $89,000.
Confidence Intervals for a Population Mean
Population Standard Deviation (σ) Known

A confidence interval is computed using two statistics: the sample mean, X̅, and the
standard deviation. The sample mean locates the center of the interval, and the
standard deviation is used to compute its width.
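The sketch below shows one way such an interval could be computed when σ is known. It assumes the standard form X̅ ± z·σ/√n, since the slide's own formula is not reproduced in the text; the function name `z_interval` and the value σ = 1.2 are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population sigma is known."""
    # Two-sided critical value, about 1.96 for 95 percent confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)   # half-width of the interval
    return x_bar - margin, x_bar + margin

# Hypothetical inputs: sample mean 3.3 hours, assumed sigma 1.2 hours, n = 70.
low, high = z_interval(3.3, 1.2, 70)
print(f"{low:.2f} to {high:.2f}")
```

The sample mean sets the center of the interval, while σ, n, and the level of confidence together set its width.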
Although it may seem reasonable to assume the standard deviation of the population is
available, in most sampling situations the population standard deviation (σ) is not known.
Here are some examples where we wish to estimate a population mean and it is
unlikely we would know the population standard deviation. Suppose each of these
studies involves students at Feni University.

• The Dean of Business Administration wants to estimate the mean number of
hours full-time students work at paying jobs each week. He selects a sample of 30
students, contacts each student, and asks how many hours they worked last
week. From the sample information, he can calculate the sample mean, but it is not
likely he would know or be able to find the population standard deviation (σ)
required in the formula for a known population standard deviation. He could
calculate the standard deviation of the sample and use that as an estimate, but he
would not likely know the population standard deviation.

• The Dean of the Faculty of Business wants to estimate the distance the typical
commuter student travels to class. He selects a sample of 40 commuter students,
contacts each, and determines the one-way distance from each student’s home to the
campus. From the sample data, he calculates the mean travel distance, that is, X̅. It
is unlikely the standard deviation of the population would be known or available,
again making the formula unusable.

• The Director of Student Loans wants to know the mean amount owed on student
loans at the time of graduation. The director selects a sample of 20
graduating students and contacts each to find the information. From the sample
information, the director can estimate the mean amount. However, to develop a
confidence interval using the formula, the population standard deviation is
necessary. It is not likely this information is available.
When the population standard deviation is not known, we can use the sample standard
deviation to estimate it. That is, we use s, the sample standard deviation, to
estimate σ, the population standard deviation. But in doing so, we cannot use the
formula we used for a known population standard deviation. Because we do not know σ,
we cannot use the z distribution. However, there is a remedy. We use the sample
standard deviation and replace the z distribution with the t distribution.

The t distribution is a continuous probability distribution with many
characteristics similar to those of the z distribution. William Gosset, an English
brewmaster, was the first to study the t distribution. He was particularly
concerned with the exact behavior of the distribution of the following statistic:

t = (X̅ − μ) / (s / √n)
The t distribution and the standard normal distribution are shown graphically
below. Note particularly that the t distribution is flatter and more spread out than the
standard normal distribution. This is because the standard deviation of the t
distribution is larger than that of the standard normal distribution.
The following characteristics of the t distribution are based on the assumption that the
population of interest is normal, or nearly normal.
• It is, like the z distribution, a continuous distribution.
• It is, like the z distribution, bell-shaped and symmetrical.
• There is not one t distribution, but rather a family of t distributions. All t distributions
have a mean of 0, but their standard deviations differ according to the sample size, n.
• The t distribution is more spread out and flatter at the center than the standard normal
distribution. As the sample size increases, however, the t distribution approaches the
standard normal distribution, because the errors in using s to estimate σ decrease with
larger samples.
Because Student’s t distribution has a greater spread than the z distribution, the
value of t for a given level of confidence is larger in magnitude than the
corresponding z value. The chart below shows the values of z for a 95 percent level
of confidence and of t for the same level of confidence when the sample size is n = 5.
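The comparison for n = 5 can be reproduced numerically. The sketch below uses only the Python standard library, so the t critical value is found by integrating the t density with Simpson's rule and then bisecting; this is one illustrative approach, not the method used to build published t tables. The helper names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus a Simpson's-rule integral of the density on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical value, located by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 5
z_95 = NormalDist().inv_cdf(0.975)   # z for 95 percent confidence
t_95 = t_critical(0.95, n - 1)       # t for 95 percent confidence, n - 1 = 4 df
print(f"z = {z_95:.3f}, t (df = {n - 1}) = {t_95:.3f}")
```

With n = 5 the t value is roughly 2.776 against a z value of about 1.960, so the t-based interval is noticeably wider for the same level of confidence.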
To develop a confidence interval for the population mean using the t distribution, we
adjust the formula for a known population standard deviation as follows:

X̅ ± t s / √n

To determine a confidence interval for the population mean with an unknown
standard deviation, we:
1. Assume the sampled population is either normal or approximately normal.
From the central limit theorem, we know that this assumption is questionable
for small sample sizes, and becomes more valid with larger sample sizes.
2. Estimate the population standard deviation (σ) with the sample standard
deviation (s).
3. Use the t distribution rather than the z distribution.
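The three steps above can be sketched in a few lines of Python. The sample data are made up, the function name `t_interval` is hypothetical, and the t value is passed in as it would be read from a t table (2.776 for 95 percent confidence with 4 degrees of freedom).

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(sample, t_value):
    """Confidence interval for the mean: x_bar +/- t * s / sqrt(n)."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)                # step 2: s estimates sigma
    margin = t_value * s / sqrt(n)   # step 3: t replaces z
    return x_bar - margin, x_bar + margin

# Hypothetical hours worked last week by 5 sampled students (illustrative data).
hours = [12.0, 15.0, 9.0, 20.0, 14.0]

# Tabled t value for 95 percent confidence with n - 1 = 4 degrees of freedom.
low, high = t_interval(hours, 2.776)
print(f"{low:.2f} to {high:.2f}")
```

Step 1, the normality assumption, is not something the code can verify; it has to be judged from knowledge of the population or from a plot of the data.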
What Are Degrees of Freedom?
Degrees of freedom refers to the maximum number of logically independent values,
that is, values that have the freedom to vary, in a data sample. The easiest way to
understand degrees of freedom conceptually is through an example:

Consider a data sample consisting of, for the sake of simplicity, five positive integers.
The values could be any number with no known relationship between them. This data
sample would, theoretically, have five degrees of freedom.

Four of the numbers in the sample are {3, 8, 5, 4}, and the average of the entire
data sample is revealed to be 6.

This must mean that the fifth number has to be 10, because the five values must sum
to 5 × 6 = 30 and the first four sum to 20. It can be nothing else. It does not
have the freedom to vary.

So the degrees of freedom for this data sample is 4, that is, n − 1.
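The arithmetic behind this example can be checked with a short Python sketch. The function name `missing_value` is a hypothetical helper introduced only for illustration.

```python
def missing_value(known_values, sample_mean, n):
    """Return the single value that is fixed once the mean and n - 1 values are known."""
    return sample_mean * n - sum(known_values)

known = [3, 8, 5, 4]
fifth = missing_value(known, sample_mean=6, n=5)

# Once the mean is fixed, only n - 1 of the n values are free to vary.
degrees_of_freedom = 5 - 1

print(fifth, degrees_of_freedom)
```

The same reasoning is why the sample standard deviation, which is computed around the sample mean, carries n − 1 degrees of freedom.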


A Confidence Interval for a Proportion

PROPORTION: The fraction, ratio, or percent indicating the part of the sample
or the population having a particular trait of interest.

As an example of a proportion, a recent survey indicated that 92 out of 100
surveyed favored the continued use of daylight saving time in the summer. The
sample proportion is 92/100, or .92, or 92 percent. If we let p represent the
sample proportion, X the number of “successes,” and n the number of items
sampled, we can determine a sample proportion as follows.

p = X / n
The population proportion is identified by π. Therefore, π refers to the proportion of
successes in the population. Recall from the binomial distribution that π is the
proportion of “successes” in a binomial distribution. This continues our
practice of using Greek letters, such as π, to identify population parameters and Roman
letters, such as X, to identify sample statistics. To develop a confidence interval for a
proportion, we need to meet the following assumptions.
1. The binomial conditions have been met. Briefly, these conditions are:
• The sample data is the result of counts.
• There are only two possible outcomes. (We usually label one of the outcomes
a “success” and the other a “failure.”)
• The probability of a success remains the same from one trial to the next.
• The trials are independent. This means the outcome on one trial does not
affect the outcome on another.
2. The values nπ and n(1 − π) should both be greater than or equal to 5. This
condition allows us to invoke the central limit theorem and employ the standard
normal distribution, that is, z, to complete a confidence interval.

Developing a point estimate for a population proportion and a confidence
interval for a population proportion is similar to doing so for a mean.

To develop a confidence interval for a population proportion, we use:

p ± z √(p(1 − p) / n)
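As an illustration, the sketch below applies the usual interval p ± z·√(p(1 − p)/n) to the daylight saving time survey (92 of 100 in favor). The 95 percent level of confidence and the function name `proportion_interval` are assumptions made for this example; the slides do not state a level for that survey.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Confidence interval for a population proportion using the z distribution."""
    p = successes / n
    # Check the n*pi >= 5 and n*(1 - pi) >= 5 condition, with p standing in for pi.
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("normal approximation condition is not met")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Survey from the text: 92 of 100 respondents in favor.
low, high = proportion_interval(92, 100)
print(f"{low:.3f} to {high:.3f}")
```

Under these assumptions the interval runs from roughly .867 to .973. The condition check passes here because 100 × .92 = 92 and 100 × .08 = 8 are both at least 5.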


Choosing an Appropriate Sample Size

When working with confidence intervals, one important variable is sample size.
However, in practice, sample size is not a variable. It is a decision we make so
that our estimate of a population parameter is a good one. Our decision is based
on three variables:
1. The margin of error the researcher will tolerate.
2. The level of confidence desired, for example, 95 percent.
3. The variation or dispersion of the population being studied.
Margin of Error
The first variable is the margin of error. It is designated as E and is the amount that is
added to and subtracted from the sample mean (or sample proportion) to determine the
endpoints of the confidence interval. For example, in a study of wages, we may decide
that we want to estimate the population average wage with a margin of error of plus or
minus $1000.

The margin of error is the amount of error we are willing to tolerate in estimating
a population parameter. You may wonder why we do not choose small margins of
error. There is a trade-off between the margin of error and sample size. A small margin
of error will require a larger sample and more money and time to collect the sample. A
larger margin of error will permit a smaller sample and a wider confidence interval.
Level of Confidence

The second choice is the level of confidence. In working with confidence
intervals, we logically choose relatively high levels of confidence such as 95
percent and 99 percent. To compute the sample size, we need the z-statistic that
corresponds to the chosen level of confidence. The 95 percent level of
confidence corresponds to a z value of 1.96, and a 99 percent level of
confidence corresponds to a z value of 2.58. Notice that larger sample sizes (and
more time and money to collect the sample) correspond with higher levels of
confidence. Also, notice that we use a z-statistic.
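The z values quoted above can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_for_confidence` is hypothetical.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided z value: the central `confidence` share of the standard normal."""
    upper_tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - upper_tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_for_confidence(level):.2f}")
```

The 90 percent value computes to about 1.645, which printed tables commonly round to 1.65.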
Population Standard Deviation
The third choice to determine the sample size is the population standard deviation.
If the population is widely dispersed, a large sample is required. On the other
hand, if the population is concentrated (homogeneous), the required sample size
will be smaller. Often, we do not know the population standard deviation. Here
are three suggestions for finding a value for the population standard deviation.

Conduct a pilot study: This is the most common method. Suppose we want an
estimate of the number of hours per week worked by students enrolled in the
College of Business at the University of Texas. To test the validity of our
questionnaire, we use it on a small sample of students. From this small sample, we
compute the standard deviation of the number of hours worked and use this value
as the population standard deviation.

Use a comparable study: Use this approach when there is an estimate of the
standard deviation from another study. Suppose we want to estimate the number of
hours worked per week by refuse workers. Information from certain state or federal
agencies that regularly study the workforce may provide a reliable value to use for
the population standard deviation.

Use a range-based approach: To use this approach, we need to know or have an
estimate of the largest and smallest values in the population. Recall the
Empirical Rule, which states that virtually all the observations can be expected to
be within plus or minus 3 standard deviations of the mean, assuming that the
distribution follows the normal distribution. Thus, the distance between the largest
and the smallest values is about 6 standard deviations, and we can estimate the
standard deviation as one-sixth of the range.
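A minimal sketch of the range-based approach follows. The function name `sd_from_range` and the 0-to-60-hour range are hypothetical values chosen for illustration.

```python
def sd_from_range(smallest, largest):
    """Rough standard deviation estimate: one-sixth of the range (Empirical Rule)."""
    return (largest - smallest) / 6

# Hypothetical: weekly hours worked are believed to run from 0 to 60.
sigma_guess = sd_from_range(0, 60)
print(sigma_guess)
```

This estimate is only as good as the guessed extremes and the assumption of an approximately normal population.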
Sample Size to Estimate a Population Mean

To estimate a population mean, we can express the interaction among these three
factors and the sample size in the following formula. Notice that this formula is the
margin of error used to calculate the endpoints of confidence intervals to estimate a
population mean.

E = z σ / √n

Solving this equation for n yields the following result.

n = (z σ / E)²


A student in public administration wants to determine the mean amount members of
city councils in large cities earn per month as remuneration for being a council member.
The error in estimating the mean is to be less than $100 with a 95 percent level of
confidence. The student found a report by the Department of Labor that reported a
standard deviation of $1,000. What is the required sample size?

The maximum allowable error, E, is $100. The value of z for a 95 percent level of
confidence is 1.96, and the value of the standard deviation is $1,000. Substituting
these values into the formula for n gives the required sample size as:

n = (z σ / E)² = ((1.96)(1,000) / 100)² = (19.6)² = 384.16

The computed value of 384.16 is rounded up to 385. A sample of 385 is required to
meet the specifications. If the student wants to increase the level of confidence, for
example to 99 percent, this will require a larger sample. The z value corresponding
to the 99 percent level of confidence is 2.58.

n = (z σ / E)² = ((2.58)(1,000) / 100)² = (25.8)² = 665.64

We recommend a sample of 666. Observe how much the change in the confidence
level changed the size of the sample. An increase from the 95 percent to the 99
percent level of confidence resulted in an increase of 281 observations, or 73
percent [(281/385) × 100]. This would greatly increase the cost of the study, both in
terms of time and money. Hence, the level of confidence should be considered
carefully.
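The two sample sizes in this example can be reproduced with a short Python sketch of n = (zσ/E)², rounding up to the next whole observation. The function name `sample_size_for_mean` is a hypothetical helper.

```python
from math import ceil

def sample_size_for_mean(z, sigma, margin_of_error):
    """Required n to estimate a mean: (z * sigma / E) squared, rounded up."""
    return ceil((z * sigma / margin_of_error) ** 2)

n_95 = sample_size_for_mean(1.96, 1000, 100)   # 95 percent level of confidence
n_99 = sample_size_for_mean(2.58, 1000, 100)   # 99 percent level of confidence

increase = n_99 - n_95
percent_increase = 100 * increase / n_95

print(n_95, n_99, increase, round(percent_increase))
```

Rounding up rather than to the nearest integer ensures the margin of error requirement is met rather than narrowly missed.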
Sample Size to Estimate a Population Proportion
To determine the sample size for a proportion, the same three variables need to be
specified:
1. The margin of error.
2. The desired level of confidence.
3. The variation or dispersion of the population being studied.

For the binomial distribution, the margin of error is:

E = z √(π(1 − π) / n)

Solving this equation for n yields the following equation:

n = π(1 − π)(z / E)²
The study in the previous example also estimates the proportion of cities that have
private refuse collectors. The student wants the margin of error to be within .10 of the
population proportion, the desired level of confidence is 90 percent, and no estimate
is available for the population proportion. What is the required sample size?

The estimate of the population proportion is to be within .10, so E = .10. The desired
level of confidence is .90, which corresponds to a z value of 1.65. Because no
estimate of the population proportion is available, we use .50. The suggested number
of observations is

n = (.50)(1 − .50)(1.65 / .10)² = 68.0625

The student needs a random sample of 69 cities.
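The same calculation can be sketched in Python using n = π(1 − π)(z/E)². The function name `sample_size_for_proportion` is hypothetical; the default of .50 follows the text's rule for the case where no estimate of π is available.

```python
from math import ceil

def sample_size_for_proportion(z, margin_of_error, pi_estimate=0.5):
    """Required n to estimate a proportion, rounded up to a whole observation."""
    return ceil(pi_estimate * (1 - pi_estimate) * (z / margin_of_error) ** 2)

# 90 percent level of confidence (z = 1.65), E = .10, no prior estimate of pi.
n_cities = sample_size_for_proportion(1.65, 0.10)
print(n_cities)
```

Using .50 is the conservative choice: the product π(1 − π) is largest at .50, so any other value of π would call for a smaller sample.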
