Unit 18
Structure
18.0 Objectives
18.1 Introduction
18.2 Statistical Background
18.3 Concept of Statistical Inference
18.4 Point Estimation
18.5 Confidence Interval for Known Variance
18.6 Confidence Interval for Unknown Variance
18.7 Let Us Sum Up
18.8 Key Words
18.9 Some Useful Books
18.10 Answers/Hints to Check Your Progress Exercises
18.1 INTRODUCTION
Many times, due to constraints such as inadequate funds, manpower or time,
we are not in a position to survey all the units in a population. In such situations we
resort to sampling, that is, we survey only a part of the population. On the basis of
the information contained in the sample we try to draw conclusions about the population.
This process is called statistical inference. We must emphasise that statistical inference
is widely applied in economics as well as in many other fields such as sociology,
psychology, political science, medicine, etc. For example, before election process starts
or just before declaration of election results many newspapers and television channels
conduct exit polls. The purpose is to predict election results before the actual results
are declared. At that point of time, it is not possible for the surveyors to ask all the
voters about their preferences for electoral candidates: the time is too short, resources
are scarce, manpower is not available, and a complete census before election defeats
the very purpose of election!
In the above example the surveyor actually does not know the result, which is the
outcome of votes cast by all the voters. Here all the voters taken together comprise the
population. The surveyor has collected data from a representative sample of the
population, not all the voters. On the basis of the information contained in the sample,
(s)he makes a forecast about the entire population.
In this Unit we deal with the concept of statistical inference and methods of statistical
estimation. A parameter, as you know, is a function of population units while a statistic is
a function of sampling units. There could be a number of parameters and corresponding
statistics. However, in order to keep our presentation simple, we will confine ourselves
mostly to the arithmetic mean.
In the case of the standard normal curve we measure $z = \frac{x - \mu}{\sigma}$ on the x-axis and the
probability density of z, that is p(z), on the y-axis. Thus if we consider a particular segment
of the normal curve (bounded by two values of z, say, $z_1$ and $z_2$) the area under the
curve gives its probability. Remember that the normal curve is different from the frequency
curve considered in Block 1 of this course. Area under the normal curve does not give
frequencies; it gives probabilities.
In Unit 16 of Block 6 we learnt that very often it is not possible to study the entire
population and we undertake a sample survey. If the sample is drawn in a random
manner through appropriate probability attached to each population unit and the sample
size is not very small, the sample can be a representative one of the population. Recall
that we can draw a number of samples from a given population and each sample
provides us with a sample mean. Thus the sample means can be arranged in the form
of a frequency distribution, called the 'sampling distribution'.
We know from Unit 16, Block 6 that the sample mean ($\bar{x}$) assumes different values and
for each value we can attach a probability. Thus the sample mean can be considered as a
random variable. In real life situations we have a finite population and the number of
samples (and therefore the number of sample means) is finite. In this case $\bar{x}$ is a
discrete random variable, but when there is an infinite number of samples, $\bar{x}$ could be a
continuous random variable.
Now let us consider another important concept discussed in Unit 16: the central limit
theorem. It says that the sampling distribution of $\bar{x}$ is normal if the population from
which the sample is drawn is normal. However, the sampling distribution of $\bar{x}$ is
approximately normal if sample size (n) is large, even if the parent population (that is,
the population from which it is drawn) is not normal. If the parent population is
approximately normal then sampling distribution of sample means is approximately
normal even when sample size is small.
We know that dispersion of sample means is smaller in value than dispersion of the
parent population from which the sample is drawn. Recall that the standard deviation
of the sampling distribution is called standard error. Thus if the population has a
standard deviation $\sigma$ then the standard error of the sample mean is $\frac{\sigma}{\sqrt{n}}$.
From the above we learn that the sample mean can be considered as a random variable and
it approximates the normal distribution when sample size is large. Usually we consider a
sample to be large in size if n > 30. For small samples ($n \le 30$), the sampling distribution
of sample means is similar to Student's t distribution. Recall that in the case of t
distribution the shape of the probability curve changes according to its 'degrees of
freedom'.
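The behaviour of sample means described above can be checked with a small simulation. The sketch below is only an illustration: the exponential parent population, the sample size of 36 and the number of repeated samples are assumptions chosen for the demonstration, not values taken from this Unit.

```python
import random
import statistics


def simulate_sample_means(draw, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(n))
        for _ in range(num_samples)
    ]


def draw_exponential(rng):
    # Assumed (hypothetical) parent population: exponential with mean 1, sd 1.
    # It is strongly skewed, so it is clearly not normal.
    return rng.expovariate(1.0)


n = 36
means = simulate_sample_means(draw_exponential, n, num_samples=5000)

mean_of_means = statistics.fmean(means)   # should be close to the population mean
sd_of_means = statistics.stdev(means)     # observed spread of the sample means
theoretical_se = 1.0 / n ** 0.5           # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"sd of sample means:   {sd_of_means:.3f}")
print(f"sigma / sqrt(n):      {theoretical_se:.3f}")
```

The spread of the simulated sample means should come out close to $\sigma/\sqrt{n}$ and much smaller than the spread of the parent population itself.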
2) State whether the following statements are true or false.
a) Normal distribution is a limiting case of binomial distribution.
b) Standard deviation of sampling distribution of a statistic is termed as
standard error.
c) Poisson distribution is an example of continuous distribution.
d) Statistical estimation is a part of statistical inference.
18.4 POINT ESTIMATION
As mentioned earlier we do not know the parameter value and want to guess it by using
sample information. Obviously the best guess will be the value of the sample statistic.
For example, if we do not know the population mean the best guess would be the
sample mean. Here in this case we use a single value or point as 'estimate' of the
parameter.
In Unit 16 we have explained the concepts of estimate and estimator. Also we have
pointed out the distinction between the two. Recall that the estimator is the formula and
the estimate is the particular value obtained by using the formula. For example, if we use
the sample mean for estimation of the population mean, then $\frac{1}{n}\sum_{i=1}^{n} x_i$ is the estimator. Suppose
I collect data on a sample, put the sample values into this formula and obtain a
particular value for the sample mean, say 120. Then 120 is an estimate of the population
mean. It is possible that you draw another sample from the same population, use the
formula for the sample mean, that is $\frac{1}{n}\sum_{i=1}^{n} x_i$, and obtain a different value, say 123. Here
both 120 and 123 are estimates of the population mean. But in both the cases the estimator
is the same, which is $\frac{1}{n}\sum_{i=1}^{n} x_i$. Remember that the term statistic, which is used to mean
a function of sample values, is a synonym for estimator.
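The distinction between estimator and estimate can be made concrete in a few lines of code. In the sketch below the function plays the role of the estimator and its return values are the estimates. The two samples are made-up numbers, chosen only so that the estimates come out as 120 and 123.

```python
def sample_mean(values):
    """Estimator: the rule (1/n) * sum of x_i, applicable to any sample."""
    return sum(values) / len(values)


# Two hypothetical samples from the same population (illustrative numbers only).
sample_1 = [118, 121, 119, 122, 120]
sample_2 = [124, 122, 121, 125, 123]

# Estimates: the particular values the estimator gives for each sample.
estimate_1 = sample_mean(sample_1)
estimate_2 = sample_mean(sample_2)

print(estimate_1, estimate_2)
```

The same function is applied in both cases; only the data, and hence the estimate, changes.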
There may be situations when you would find more than one potential estimator
(alternative formulae) for a parameter. In order to choose the best among these
estimators, we need to follow certain criteria. Based on these criteria an estimator
should fulfill certain desirable properties. There are quite a few desirable properties
for an estimator, but the most important is its unbiasedness.
Unbiasedness means that an individual estimate may be higher or lower than the unknown value
of the parameter, but the expected value of the estimator should be equal to the parameter.
For example, the sample mean ($\bar{x}$) may fluctuate from sample to sample but on an average
it would be equal to the population mean. In other words, $E(\bar{x}) = \mu$.
However, $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is not an unbiased estimator of the population variance.
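Unbiasedness can be illustrated by averaging an estimator over many repeated samples. The sketch below assumes a normal population with mean 50 and standard deviation 10 and samples of size 5; these numbers are hypothetical. It compares the variance formula with divisor n against the one with divisor n - 1, which is the standard unbiased alternative.

```python
import random
import statistics

rng = random.Random(42)

# Assumed population and design (illustrative values, not from the text).
mu, sigma, n, trials = 50.0, 10.0, 5, 20000

means, var_div_n, var_div_n_minus_1 = [], [], []
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
    means.append(xbar)
    var_div_n.append(ss / n)                   # divisor n (biased)
    var_div_n_minus_1.append(ss / (n - 1))     # divisor n - 1 (unbiased)

avg_mean = statistics.fmean(means)
avg_biased = statistics.fmean(var_div_n)
avg_unbiased = statistics.fmean(var_div_n_minus_1)

print(f"average of sample means:       {avg_mean:.2f} (mu = {mu})")
print(f"average variance, divisor n:   {avg_biased:.1f} (sigma^2 = {sigma ** 2})")
print(f"average variance, divisor n-1: {avg_unbiased:.1f}")
```

On average the sample mean settles near $\mu$, while the divisor-n variance settles near $\frac{n-1}{n}\sigma^2$, which is below $\sigma^2$.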
Here a question may be shaping up in your mind, 'How do we find out the confidence
interval and confidence coefficient?' Let us begin with the confidence coefficient. We know
that the sampling distribution of $\bar{x}$ for large samples is normal with mean
$\mu$ and standard error $\frac{\sigma}{\sqrt{n}}$, where n is the size of the sample. By transforming the
sample mean ($z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$) we obtain the standard normal variate, which has zero mean
and unit variance. The standard normal curve is symmetrical and therefore the area
under the curve for $0 \le z \le \infty$ is 0.5, which is presented in the form of a table (See Table
15.1 in Unit 15 of Block 5). Let us assume that we want our confidence coefficient to
be 95 per cent (that is, 0.95). Thus we should find out a range for z which will cover
0.95 area of the standard normal curve. Since the distribution of z is symmetrical, 0.475
area should remain to the right and 0.475 area should remain to the left of z = 0. If
we look into the normal area table (Table 15.1) we find that 0.475 area is covered between
z = 0 and z = 1.96. Thus the probability that z ranges between -1.96 and 1.96 is 0.95, that is,
$P\left(-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$
From this information let us work backward and find the range within which $\mu$ will remain.
We find that
$P\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$   ...(18.2)
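Let us interpret the above. Recall that each sample would provide us with a different
value of $\bar{x}$. Accordingly, the confidence interval would be different. In each case
the confidence interval would contain the unknown parameter or it would not.
Equation (18.2) means that if a large number of random samples, each of size n,
are drawn from the given population and if for each such sample the interval
$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$ is worked out, then about 95 per cent of such intervals would contain $\mu$.
Similarly, for a confidence coefficient of 0.99 the normal area table gives z = 2.58. Thus
$P\left(\bar{x} - 2.58\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 2.58\frac{\sigma}{\sqrt{n}}\right) = 0.99$   ...(18.4)
Equation (18.4) implies that the 99 per cent confidence interval for $\mu$ is given by
$\bar{x} \pm 2.58\frac{\sigma}{\sqrt{n}}$.
By looking into the normal area table you can work out the confidence interval for
confidence coefficient of 0.90 and find that
$P\left(\bar{x} - 1.645\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.645\frac{\sigma}{\sqrt{n}}\right) = 0.90$   ...(18.5)
We observe from (18.2), (18.4) and (18.5) that as the interval widens, the chance for
the interval holding a population parameter (in this case $\mu$) increases.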
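The z values read from the normal area table can also be obtained numerically. The following sketch uses Python's standard library; the helper name `z_multiplier` is introduced here only for illustration. It prints the multiplier for confidence coefficients of 0.90, 0.95 and 0.99, together with the total width of the interval measured in standard errors.

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Return z such that P(-z <= Z <= z) = confidence for a standard normal Z."""
    # Half of the excluded area lies in each tail, so we need the quantile
    # at 0.5 + confidence / 2.
    return NormalDist().inv_cdf(0.5 + confidence / 2)


multipliers = {c: z_multiplier(c) for c in (0.90, 0.95, 0.99)}

for c, z in multipliers.items():
    print(f"confidence {c:.2f}: z = {z:.3f}, width = {2 * z:.3f} standard errors")
```

Printed tables usually round the 99 per cent value to 2.58. The output shows directly that a higher confidence coefficient requires a wider interval.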
The two limits of the confidence interval are called confidence limits. For example, for
95 per cent confidence level we have the lower confidence limit as $\bar{x} - 1.96\frac{\sigma}{\sqrt{n}}$ and the
upper confidence limit as $\bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$. The confidence coefficient can be interpreted
as the confidence or trust that we place in these limits for actually holding $\mu$.
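The repeated-sampling meaning of the confidence coefficient can be illustrated with a simulation. The sketch below assumes a normal population with mean 100 and known standard deviation 15, and samples of size 36. These values and the function name `coverage` are hypothetical. It counts how often the 95 per cent interval actually holds $\mu$.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, confidence, trials, seed=1):
    """Share of simulated samples whose confidence interval contains mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / n ** 0.5   # z times the standard error
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


# Assumed population values, for illustration only.
share = coverage(mu=100.0, sigma=15.0, n=36, confidence=0.95, trials=10000)
print(f"share of intervals containing mu: {share:.3f}")
```

Any single interval either contains $\mu$ or it does not; the share over many samples is what should be close to 0.95.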
Example 18.1
A paper company wants to estimate the average time required for a new machine to
produce a ream of paper. A random sample of 36 reams shows an average production
time of 1.5 minutes per ream of paper. The population standard deviation is known to
be 0.30 minute. Construct an interval estimate with 95% confidence limits.
The information given is
$\bar{x} = 1.5$, $\sigma = 0.30$ and n = 36
Since n = 36 (> 30), we can take the sample as a large sample and accordingly $\bar{x}$ is
normally distributed with mean $\mu$ and standard error $\frac{\sigma}{\sqrt{n}}$. Now, the standard error is
$\frac{\sigma}{\sqrt{n}} = \frac{0.30}{\sqrt{36}} = 0.05$
The 95% confidence interval is given by
$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 1.5 \pm 1.96 \times 0.05 = 1.5 \pm 0.098$
Thus with 95% confidence, we can state that the average production time for the new
machine will be between 1.402 minutes and 1.598 minutes. Here, 1.402 is the lower
confidence limit and 1.598 is the upper confidence limit.
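The arithmetic of Example 18.1 can be verified with a short script. This is a minimal sketch; the function name `z_interval` is introduced here for illustration, and the multiplier 1.96 is the table value for a 95 per cent confidence coefficient.

```python
import math


def z_interval(xbar, sigma, n, z=1.96):
    """Confidence limits xbar -/+ z * sigma / sqrt(n) for known sigma."""
    standard_error = sigma / math.sqrt(n)
    return xbar - z * standard_error, xbar + z * standard_error


# Figures given in Example 18.1.
lower, upper = z_interval(xbar=1.5, sigma=0.30, n=36)
print(f"lower limit = {lower:.3f}, upper limit = {upper:.3f}")
```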
18.6 CONFIDENCE INTERVAL FOR UNKNOWN VARIANCE
In the previous Section we estimated confidence interval for population mean on the
assumption that population variance is known. It is a bit unrealistic that we do not
know population mean (we want to estimate it) but know population variance. A realistic
case would be the assumption that both population mean and variance are unknown.
On the basis of sample mean and variance we want to find out confidence interval for
population mean.
Since the population standard deviation ($\sigma$) is not known we use the sample standard
deviation (s) in its place. However, in such a case the standardised sample mean does
not follow the normal distribution; rather it follows Student's t distribution. The
estimated standard error of the sample mean would be $\frac{s}{\sqrt{n}}$.
Like the standard normal variate z, the t-distribution has a mean of zero, is symmetrical
about mean and ranges between $-\infty$ to $\infty$. But its variance is greater than 1. Actually
its variance changes according to degrees of freedom. However, when n > 30 the
t-distribution has a variance very close to 1 and thus resembles z-distribution.
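How quickly the variance of t approaches 1 can be seen from the standard result that, for more than 2 degrees of freedom, the variance of Student's t is df / (df - 2). That formula is not stated in this Unit and is brought in here only as an illustration; the function name is likewise hypothetical.

```python
def t_variance(df):
    """Variance of Student's t with df degrees of freedom (finite only for df > 2)."""
    if df <= 2:
        raise ValueError("the variance is finite only for df > 2")
    return df / (df - 2)


for df in (5, 10, 30, 100):
    print(f"df = {df:3d}: variance = {t_variance(df):.3f}")
```

The variance is always above 1 and shrinks towards 1 as the degrees of freedom grow, which is why the t and z distributions are close for large samples.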
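The t-statistic, like the z-statistic, is calculated as
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
By looking into the area table for t-distribution (see Table 15.3 in Unit 15) we find the
value of t for the confidence level that we require. The degrees of freedom is
(n - 1). Thus the confidence interval would be
$\bar{x} \pm t\,\frac{s}{\sqrt{n}}$
where t is the table value for (n - 1) degrees of freedom at the required confidence level.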
Example 18.2
The mean weight (in kilogram) of 20 children is found to be 15 with a standard
deviation of 4. On the basis of the above information estimate the 95 per cent confidence
interval for mean weight of the population from which the sample is drawn. Assume
that the parent population is normal.
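One way to work through Example 18.2 is sketched below. Instead of reading the t value from a table, the code finds it numerically by integrating the t density with Simpson's rule and then bisecting. This numerical route, and every function name used, are choices made for the illustration. The sketch also assumes that the given standard deviation of 4 is the sample standard deviation s.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_critical(confidence, df):
    """Value t with P(-t <= T <= t) = confidence, found by bisection."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 50.0  # bracket is ample for df >= 2 at usual confidence levels
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(xbar, s, n, confidence):
    """Confidence limits xbar -/+ t * s / sqrt(n) with n - 1 degrees of freedom."""
    half_width = t_critical(confidence, n - 1) * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Figures given in Example 18.2 (s = 4 is assumed to be the sample sd).
t_value = t_critical(0.95, 19)
lower, upper = t_interval(xbar=15, s=4, n=20, confidence=0.95)
print(f"t = {t_value:.3f}, interval = ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the t value for 19 degrees of freedom is about 2.093, and the 95 per cent confidence interval runs from roughly 13.13 kg to 16.87 kg.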
Similarly you can find out confidence intervals for different sample sizes and confidence
coefficients.
Let us summarise the rules for application of z or t statistic for estimation of confidence
interval.
1) If sample size is large (n > 30) apply z-statistic; it does not matter whether
i) parent population is normal or not, and ii) variance is known or not.
2) If sample size is small ($n \le 30$) check whether i) parent population is normal, and
ii) variance is known.
a) If parent population is not normal apply nonparametric tests.
b) If parent population is normal and variance is known apply z.
c) If parent population is normal and variance is not known apply t.
In Fig. 18.2 we present the above in the form of a chart.
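The same rules can also be written as a small decision function. This is only a sketch of the summary above; the function name, its arguments and the returned labels are hypothetical.

```python
def choose_statistic(n, population_normal, variance_known):
    """Return which approach the summary rules suggest: 'z', 't' or 'nonparametric'."""
    if n > 30:
        # Rule 1: large sample, normality and known variance do not matter.
        return "z"
    if not population_normal:
        # Rule 2a: small sample from a non-normal population.
        return "nonparametric"
    # Rules 2b and 2c: small sample from a normal population.
    return "z" if variance_known else "t"


print(choose_statistic(36, population_normal=False, variance_known=True))
print(choose_statistic(20, population_normal=True, variance_known=False))
```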
2) For a sample of 25 students in school the mean height was found to be 95 cm.
with a standard deviation of 4 cm. Find the 99 percent confidence interval.
If we know population variance, we apply normal distribution to construct the confidence
interval. In cases where population variance is not known, we use Student's t for the
above purpose. Remember that when sample size is large (n > 30) t-distribution
approximates normal distribution. Thus for large samples, even if population variance
is not known, we can use normal distribution for construction of confidence interval on
the basis of sample mean and sample variance.