Sampling Notes Part 01
A population having a finite number of items is called a finite population. A population having an infinite number of items is called an infinite population. The population size is usually denoted by N.

Random sampling
A random sample is one in which each and every unit of the population has an equal chance of being included in the sample.

For example, if we take a sample of 4 students out of 30 students in a class, we get 30C4 samples, each having the same chance of being selected.

Parameters
A population is considered to be known when we know the probability distribution f(x) of the associated random variable X.

If X is normally distributed, we say that the population is normally distributed or that we have a normal population. Similarly, if X is binomially distributed, we say that the population is binomially distributed or that we have a binomial population.

The sample statistics are generally denoted by Roman letters. For example,
$\bar{x}$ - Sample mean
$s$ - Sample standard deviation

Note:
1. Mathematically, a sample statistic for a sample of size n can be defined as a function of the random variables X1, X2, ..., Xn, i.e., t(X1, X2, ..., Xn). The function t(X1, X2, ..., Xn) is another random variable, whose values can be represented by t(x1, x2, ..., xn).
2. Generally, the parameter value is not known. So, it is estimated with the help of a statistic. A statistic t = t(x1, x2, ..., xn) is an unbiased estimator of the population parameter $\theta$ if $E(t) = \theta$ (i.e., if E(Statistic) = Parameter).
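The count of possible samples and the unbiasedness condition E(t) = θ can both be checked by direct enumeration. The following is a minimal sketch; the four-value population is a made-up illustration that does not appear in these notes, and the statistic examined is the sample mean.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Number of distinct samples of 4 students out of 30 (30C4).
n_samples = comb(30, 4)

# Hypothetical small population, used only for illustration.
population = [2, 4, 6, 12]
mu = Fraction(sum(population), len(population))  # the parameter (population mean)

# Every sample of size 2 drawn without replacement is equally likely.
sample_means = [Fraction(sum(s), 2) for s in combinations(population, 2)]

# E(statistic): the average of the sample mean over all equally likely samples.
expected_sample_mean = sum(sample_means) / len(sample_means)

print(n_samples)                   # 27405
print(expected_sample_mean == mu)  # True: the sample mean is unbiased for mu
```

Exact fractions are used so that the equality E(t) = θ is tested without floating-point rounding.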
Sampling Theory
$$\sigma_{\bar{x}}^2 = \frac{(5-9)^2 + (7-9)^2 + (9-9)^2 + (9-9)^2 + (11-9)^2 + (13-9)^2}{6} = \frac{20}{3}$$

Note that $\sigma_{\bar{x}}^2 \neq \sigma^2$.

In fact
$$\frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right) = \frac{20}{2}\left(\frac{4-2}{4-1}\right) = \frac{20}{3} = \sigma_{\bar{x}}^2$$

Theorem-01
The expected value of the sample mean, $\mu_{\bar{x}}$, is the population mean $\mu$, i.e.,
$$\mu_{\bar{x}} = \mu \qquad (1)$$

Note: Theorem-01 is illustrated in both Case-I and Case-II of the above example.

Theorem-02
If a population is infinite and the sampling is random, or if the population is finite and sampling is with replacement, then the variance of the sampling distribution of means, denoted by $\sigma_{\bar{x}}^2$, is given by
$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \qquad (2)$$
where $\sigma^2$ is the population variance and n is the sample size.

Note: Theorem-02 is illustrated in Case-I of the above example.

Theorem-03
If a population is finite of size N and if sampling is without replacement, then
$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right) \qquad (3)$$

Theorem-04
If the population from which samples are taken is normally distributed with mean $\mu$ and variance $\sigma^2$, then the sample mean is normally distributed with mean $\mu$ and variance $\sigma^2/n$.

Theorem-05 (Central limit theorem)
Suppose that the population from which samples are taken has a probability distribution with mean $\mu$ and variance $\sigma^2$ that is not necessarily a normal distribution. Then the standardized variable associated with $\bar{x}$, given by
$$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \qquad (4)$$
is asymptotically normal, i.e.,
$$\lim_{n\to\infty} P(Z \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^2/2}\,du \qquad (5)$$

Note:
It is assumed here that the population is infinite or that sampling is with replacement. Otherwise, the above is true if we replace $\sigma/\sqrt{n}$ in Eqn (4) by $\sigma_{\bar{x}}$ from Eqn (3).

Conclusion:
If the population is normal, then the sampling distribution of the mean is also normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

For large samples (usually n ≥ 30 is considered a large sample), the same result holds approximately even if the distribution of the population is non-normal.
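Theorems 01 to 03 can be checked by listing every possible sample. The sketch below assumes the population {3, 7, 11, 15}; this part of the notes does not state the population, and these values are chosen only because they give σ² = 20 with N = 4 and n = 2, matching the figures in the worked calculation.

```python
from fractions import Fraction
from itertools import combinations, product

def variance(values):
    """Mean squared deviation from the mean (divisor = number of values)."""
    values = [Fraction(v) for v in values]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Assumed population (not given in this part of the notes).
population = [3, 7, 11, 15]
N, n = len(population), 2
mu = Fraction(sum(population), N)
sigma2 = variance(population)

# Sampling with replacement: every ordered pair is a possible sample.
means_with = [Fraction(sum(s), n) for s in product(population, repeat=n)]
# Sampling without replacement: every unordered pair of distinct items.
means_without = [Fraction(sum(s), n) for s in combinations(population, n)]

# Theorem-01: the mean of the sample means equals mu in both schemes.
print(sum(means_with) / len(means_with) == mu)
print(sum(means_without) / len(means_without) == mu)

# Theorem-02: with replacement, variance of the sample mean is sigma^2 / n.
print(variance(means_with), sigma2 / n)

# Theorem-03: without replacement, the factor (N - n)/(N - 1) is applied.
print(variance(means_without), sigma2 / n * Fraction(N - n, N - 1))
```

Under this assumed population the six samples drawn without replacement have means 5, 7, 9, 9, 11 and 13, which are the values appearing in the variance calculation.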
For example, the head of the department would make a point estimate when he says that 70% of the students will get S grade in SEE.

A point estimate is not quite sufficient as it would be either right or wrong. If it is wrong, we will not know how wrong it is. So, it is always better if an estimate lies within an interval.

Interval Estimation
In interval estimation, an interval (T1, T2) which is likely to contain the parameter is proposed as an estimator of the parameter. The interval (T1, T2) is called the confidence interval. The limits T1 and T2 of the confidence interval are called the confidence limits.

For example, the head of the department would make an interval estimate when he says that the percentage of students getting S grade could be between 65% and 75%.

Testing of Hypothesis
Let us assume that the population parameter has a certain value. Then the unknown parameter value is estimated using sample values. If the sample value is exactly the same as or very close to our assumption, then it can be straight away accepted as the parameter. If it is far away, then we can totally reject it. But if it is neither close nor far away, then we have to develop a procedure to decide whether to accept the presumed value or not, on the basis of sample values. This procedure is called the Testing of Hypothesis.

distribution the test is conducted.
In general,
$$\text{Test statistic} = \frac{\text{Relevant statistic} - E(t)}{S.E.(t)} = \frac{t - E(t)}{S.E.(t)}$$

6. Critical region
The test procedure divides the possible values of the test statistic into two regions, namely an acceptance region for H0 and a rejection region for H0. The region where H0 is rejected is known as the critical region.

If the value of the test statistic falls in the critical region, we reject the null hypothesis H0.

7. Errors of the first and second kind

No.  Actual fact      Decision based on the sample   Result             Error
1    H0 is true       Accept H0                      Correct decision   -----
2    H0 is true       Reject H0                      Wrong decision     Type I
3    H0 is not true   Accept H0                      Wrong decision     Type II
4    H0 is not true   Reject H0                      Correct decision   -----

The probability of occurrence of the Type I error is denoted by $\alpha$.
The probability of occurrence of the Type II error is denoted by $\beta$.
The value $1 - \beta$ is called the power of the test.

8. Power of the test is the probability of rejecting H0 when H0 is not true.
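The relation between α, β and the power can be made concrete with a small calculation. The sketch below is only an illustration under stated assumptions: the test is two-tailed, the test statistic is standard normal under H0, and under the alternative its mean is shifted by `delta` standard errors. The function name and the parameter `delta` are introduced here and are not part of the notes.

```python
from statistics import NormalDist

def two_tailed_power(delta, alpha=0.05):
    """Power (1 - beta) of a two-tailed z-test when the true mean of the
    test statistic is `delta` standard errors away from its H0 value."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # boundary of the critical region
    # beta = P(statistic falls in the acceptance region | H0 is not true)
    beta = std_normal.cdf(z_crit - delta) - std_normal.cdf(-z_crit - delta)
    return 1 - beta

# With delta = 0, H0 is actually true, so the rejection probability is alpha.
print(round(two_tailed_power(0.0), 4))  # 0.05

# The power grows as the true value moves away from the hypothesized one.
for delta in (1.0, 2.0, 3.0):
    print(delta, round(two_tailed_power(delta), 3))
```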
Note
If $\sigma$ is not known, then the test statistic
$$z = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$$
is used, where $s$ is the sample standard deviation.
Note (2)
Since the probable limits for a normal variate $X$ are $E(X) \pm 3\sqrt{var(X)}$, the probable limits for the observed proportion of successes are
$$E(p) \pm 3\,S.E.(p) \quad \text{i.e.,} \quad P \pm 3\sqrt{PQ/n}$$

Note (3)
If P is not known, then the probable limits for the proportion in the population are
$$p \pm 3\sqrt{pq/n}$$

P = P[getting 3 or 4] = $\frac{1}{6} + \frac{1}{6} = \frac{1}{3}$

H0: The die is unbiased (P = 1/3)
H1: The die is biased (P ≠ 1/3); Two tailed test

Test statistic is
$$z = \frac{p - E(p)}{S.E.(p)} = \frac{p - P}{\sqrt{PQ/n}}$$
$$z = \frac{0.02666667}{0.004969} \approx 5.4 = z_{cal}$$
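The test statistic for a proportion and the probable limits of Note (2) can be sketched as follows. The counts used (9000 throws with 3240 successes) are an assumption: they are not stated in this part of the notes, and are chosen only because they reproduce the numerator 0.02666667 and the denominator 0.004969 shown above.

```python
from math import sqrt

def proportion_z(successes, n, P):
    """z = (p - P) / sqrt(PQ/n), with p the observed proportion."""
    p = successes / n
    Q = 1 - P
    standard_error = sqrt(P * Q / n)
    return (p - P) / standard_error

def probable_limits(P, n):
    """Probable limits P +/- 3*sqrt(PQ/n) for the observed proportion."""
    half_width = 3 * sqrt(P * (1 - P) / n)
    return P - half_width, P + half_width

# Assumed data (hypothetical, see the note above): n = 9000, X = 3240.
z_cal = proportion_z(3240, 9000, 1 / 3)
print(round(z_cal, 2))

# Two-tailed test at the 5% level: the critical value of z is 1.96.
print("reject H0" if abs(z_cal) > 1.96 else "do not reject H0")

low, high = probable_limits(1 / 3, 9000)
print(round(low, 4), round(high, 4))
```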
2. A coin was tossed 400 times and head turned up 216 times. Test the hypothesis that the coin is unbiased at 5% level of significance.

Soln: n = 400; P = P(H) = 1/2; Q = 1/2
X = number of heads = 216

H0: The coin is unbiased (P = 1/2)

4. A survey was conducted in a slum locality of 2000 families by selecting a sample of size 800. It was revealed that 180 families were illiterate. Find the probable limits of the illiterate families in the population of 2000.

Soln: n = 800;
X = number of illiterate families = 180
∴ $p = \frac{X}{n} = \frac{180}{800} = \frac{9}{40}$
$P$ is not known and hence we take $P = p = 9/40$
∴ $Q = 1 - P = 31/40$

Similar problems for practice
1. In 324 throws of a six-faced die, an odd number turned up 181 times. Is it reasonable to think that the die is an unbiased one?
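The computations for the coin problem (problem 2) and the survey problem (problem 4) can be sketched as below. This is an illustration rather than the notes' own worked solution: the helper names are introduced here, the value 1.96 is the standard two-tailed 5% critical value of z, and the sample proportion is computed directly as X/n from the stated counts.

```python
from math import sqrt

def z_for_proportion(x, n, P):
    """z = (p - P) / sqrt(PQ/n) for x successes in n trials."""
    p = x / n
    return (p - P) / sqrt(P * (1 - P) / n)

def probable_limits_from_sample(x, n):
    """p +/- 3*sqrt(pq/n), used when the population proportion P is unknown."""
    p = x / n
    half_width = 3 * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Problem 2: 216 heads in 400 tosses, H0: P = 1/2.
z_coin = z_for_proportion(216, 400, 0.5)
print(round(z_coin, 2))  # 1.6
print("reject H0" if abs(z_coin) > 1.96 else "do not reject H0 at the 5% level")

# Problem 4: 180 illiterate families in a sample of 800, population of 2000.
low, high = probable_limits_from_sample(180, 800)
print(round(low, 4), round(high, 4))                  # limits for the proportion
print(round(low * 2000, 1), round(high * 2000, 1))    # limits for the number of families
```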
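t is approximately normally distributed for large samples.

t-test for single mean
The following are the assumptions of the t-test for a single mean.

1. The parent population is Normal.
2. $\sigma^2$ is unknown.
3. Sample size is small.

Under the null hypothesis
H0: $\mu = \mu_0$ (the population mean equals the specified value $\mu_0$, i.e., the sample mean does not differ significantly from it)

& $\bar{x} + t_{0.05}\dfrac{s}{\sqrt{n-1}} = 8.20289$

Similarly, the 99% confidence limits for $\mu$ (actual diameter) are
$$\bar{x} - t_{0.01}\frac{s}{\sqrt{n-1}}, \quad \bar{x} + t_{0.01}\frac{s}{\sqrt{n-1}}$$

From table, $t_{0.01} = 3.106$ for 11 degrees of freedom.

∴ $\bar{x} - t_{0.01}\dfrac{s}{\sqrt{n-1}} = 6.2187$ & $\bar{x} + t_{0.01}\dfrac{s}{\sqrt{n-1}} = 8.5413$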
2. Show that 95% confidence limits for the mean $\mu$ of the population are $\bar{x} \pm t_{0.05}\dfrac{s}{\sqrt{n-1}}$. Deduce that for a random sample of 16 values with mean 41.5 inches and sum of the squares of the deviations from the mean 135 inches², drawn from a normal population, the 95% confidence limits for the mean of the population are 39.9 and 43.1 inches.

Soln: (a) The probability that $|t| \le t_{0.05}$ is 0.95. Hence the 95% confidence limits for $\mu$ are given by
$$\frac{|\bar{x}-\mu|}{s/\sqrt{n-1}} \le t_{0.05}$$
$$\Longrightarrow |\bar{x}-\mu| \le t_{0.05}\frac{s}{\sqrt{n-1}}$$
$$\Longrightarrow -t_{0.05}\frac{s}{\sqrt{n-1}} \le \bar{x}-\mu \le t_{0.05}\frac{s}{\sqrt{n-1}}$$
$$\Longrightarrow -t_{0.05}\frac{s}{\sqrt{n-1}} - \bar{x} \le -\mu \le -\bar{x} + t_{0.05}\frac{s}{\sqrt{n-1}}$$
$$\Longrightarrow t_{0.05}\frac{s}{\sqrt{n-1}} + \bar{x} \ge \mu \ge \bar{x} - t_{0.05}\frac{s}{\sqrt{n-1}}$$
or
$$\bar{x} - t_{0.05}\frac{s}{\sqrt{n-1}} \le \mu \le \bar{x} + t_{0.05}\frac{s}{\sqrt{n-1}}$$

(b) Given n = 16, $\nu = n - 1 = 15$ d.f.
From table, $t_{0.05} = 2.131$ for 15 d.f.
$$\therefore s^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{135}{16} = 8.4375$$
∴ $s = 2.9047$
$$\bar{x} - t_{0.05}\frac{s}{\sqrt{n-1}} = 39.9 \quad \& \quad \bar{x} + t_{0.05}\frac{s}{\sqrt{n-1}} = 43.1$$
∴ 95% confidence limits are 39.9 and 43.1

3. A random sample of 10 boys had the following I.Q.: 70, 120, 110, 101, 88, 83, 95, 98, 107, 100. Do these data support the assumption of a population mean I.Q. = 100 at 5% level of significance?

Soln: n = 10, $\mu = 100$
H0: $\mu = 100$
H1: $\mu \neq 100$ (Two tailed test)

Test statistic
$$t = \frac{\bar{x}-\mu}{s/\sqrt{n-1}}$$
$$\bar{x} = \frac{\sum x_i}{n} = 97.2 ; \quad s^2 = \frac{\sum (x_i - \bar{x})^2}{n} = 183.36 ; \quad \therefore s = 13.54$$
Thus $t = -0.6204$ ; $|t| = 0.6204$
The table value for 9 d.f. at 5% level of significance is $t_{0.05} = 2.26$
Clearly $|t| = 0.6204 < t_{0.05}$
∴ H0 is accepted.

4. The average breaking strength of steel rods is specified to be 18.5 thousand pounds. To test this, a sample of 14 rods was tested. The mean and standard deviation obtained were 17.85 and 1.955 respectively. Is the result of the experiment significant with 95% confidence?

H0: $\mu = 18.5$
H1: $\mu \neq 18.5$ (Two tailed test)

Test statistic
$$t = \frac{\bar{x}-\mu}{s/\sqrt{n-1}} = -1.20$$
Thus $|t| = 1.20$
The table value for 13 d.f. at 5% level of significance is $t_{0.05} = 2.16$, so $|t| < t_{0.05}$
∴ H0 is accepted.
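The arithmetic in problems 2 to 4 can be reproduced with a few lines of code. The sketch below follows the convention used in these notes (s² computed with divisor n, and the statistic using √(n−1)); the function names are introduced here for illustration, and the table values are passed in by hand rather than looked up.

```python
from math import sqrt

def mean_and_sd(data):
    """Sample mean and standard deviation s, with divisor n as in the notes."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    return mean, sqrt(variance)

def t_single_mean(mean, s, n, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n - 1))."""
    return (mean - mu0) / (s / sqrt(n - 1))

def confidence_limits(mean, s, n, t_table):
    """x_bar -/+ t_table * s / sqrt(n - 1)."""
    half_width = t_table * s / sqrt(n - 1)
    return mean - half_width, mean + half_width

# Problem 2(b): n = 16, mean 41.5, sum of squared deviations 135, t_0.05 = 2.131.
low, high = confidence_limits(41.5, sqrt(135 / 16), 16, 2.131)
print(round(low, 1), round(high, 1))  # 39.9 43.1

# Problem 3: I.Q. data, H0: mu = 100, table value 2.26 for 9 d.f.
iq = [70, 120, 110, 101, 88, 83, 95, 98, 107, 100]
x_bar, s = mean_and_sd(iq)
t_iq = t_single_mean(x_bar, s, len(iq), 100)
print(round(t_iq, 2), abs(t_iq) < 2.26)

# Problem 4: steel rods, summary values taken from the problem statement.
t_rods = t_single_mean(17.85, 1.955, 14, 18.5)
print(round(t_rods, 2))
```

The computed value for the I.Q. data agrees with the hand calculation to three decimal places; the small difference in the fourth decimal comes from rounding s to 13.54 by hand.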
5. In the past, a machine has produced washers having a thickness of 0.50 mm. To determine whether the machine is in proper working condition, a sample of 10 washers is chosen, for which the mean thickness is found to be 0.53 mm with standard deviation 0.03 mm. Test the hypothesis that the machine is in proper working condition, using a level of significance of (i) 0.05, (ii) 0.01.

Soln: n = 10, $\mu = 0.50$, $\bar{x} = 0.53$, $s = 0.03$

H0: $\mu = 0.50$ (the machine is working properly)
H1: $\mu \neq 0.50$ (Two tailed test)

Test statistic
$$t = \frac{\bar{x}-\mu}{s/\sqrt{n-1}} = 3$$
Thus $|t| = 3$

Case (i):
The table value for 9 d.f. at 5% level of significance is $t_{0.05} = 2.26$
Clearly $|t| > t_{0.05}$
∴ H0 is rejected at 5% level of significance.

Case (ii):
The table value for 9 d.f. at 1% level of significance is $t_{0.01} = 3.25$
Clearly $|t| < t_{0.01}$
∴ H0 is accepted at 1% level of significance.

Note
Since we can reject the null hypothesis at 5% level of significance but not at 1% level, it is advisable to check the machine or at least to take another sample.

Similar problems for practice
1. The nine items of a sample have the following values: 45, 47, 50, 52, 48, 47, 49, 53, 51. Does the mean of these differ significantly from the assumed mean of 47.5?

2. A machinist is making engine parts with axle diameter of 0.7 inch. A random sample of 10 parts shows mean diameter 0.742 inch with a SD of 0.04 inch. On the basis of the sample, would you say that the work is inferior?

3. Consider the sample consisting of the numbers 45, 47, 50, 52, 48, 47, 49, 53 and 51. The sample is drawn from a population whose mean is 48.5. Find whether the sample mean differs significantly from the population mean at 5% level of significance.

Test for the difference between means of two independent samples of sizes n1 and n2.
Test statistic is
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
which follows the t-distribution with $n_1 + n_2 - 2$ d.f.
Here $s_1$ and $s_2$ are the standard deviations of the samples.

Problems

1. A group of 10 rats fed on a diet A and another group of 8 rats fed on a different diet B recorded the following increase in weights in gms. Test whether diet A is superior to diet B.

Diet A   Diet B
5        2
6        3
8        6
1        8
12       1
4        10
3        2
9        8
6        -
10       -

Soln: n1 = 10, n2 = 8, $\bar{x}_1 = 6.4$, $\bar{x}_2 = 5$
$$s_1^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \bar{x}^2 = 10.24$$
Similarly, for the second sample $s_2^2 = 10.25$

H0: $\mu_1 = \mu_2$ (No significant difference between the two diets)
H1: $\mu_1 > \mu_2$ (Diet A is superior to Diet B; right tailed test)

Test statistic
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \approx 0.869$$
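The two-sample statistic for the diet data can be evaluated directly from the formula. The following is a minimal sketch using the notes' convention (sample variances with divisor n); the function names are introduced here, and the value it prints may differ slightly from a hand calculation that rounds intermediate results. The critical value 1.746 (right-tailed, 5% level, 16 d.f.) is a standard table value and is not quoted in this part of the notes.

```python
from math import sqrt

def mean_and_variance(data):
    """Sample mean and variance s^2 with divisor n, as used in the notes."""
    n = len(data)
    mean = sum(data) / n
    return mean, sum((x - mean) ** 2 for x in data) / n

def two_sample_t(sample1, sample2):
    """t for the difference of means of two independent samples,
    with n1 + n2 - 2 degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    mean1, var1 = mean_and_variance(sample1)
    mean2, var2 = mean_and_variance(sample2)
    pooled = (n1 * var1 + n2 * var2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled * (1 / n1 + 1 / n2))

diet_a = [5, 6, 8, 1, 12, 4, 3, 9, 6, 10]
diet_b = [2, 3, 6, 8, 1, 10, 2, 8]

t = two_sample_t(diet_a, diet_b)
print(round(t, 3))

# Right-tailed test at the 5% level with 16 d.f.
t_table = 1.746
print("reject H0" if t > t_table else "do not reject H0")
```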
Let $\bar{d}$ be the sample mean and $s_d$ be the sample standard deviation of these observations.
$$\bar{d} = \frac{\sum d}{n} = -2 ; \quad s_d^2 = \frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2 = 23.9992$$
∴ $s_d = 4.8989$
$$t = \frac{\bar{d}}{s_d/\sqrt{n-1}} = -0.8164$$

The table value for $n - 1 = 4$ d.f. at 5% level of significance is $t_{0.05} = -2.132$ (left tailed)
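The paired calculation above can be sketched in code. The list of differences below is hypothetical: the original observations are not shown in this part of the notes, and these five values are chosen only so that they give d̄ = −2 and s_d² = 24, in line with the figures above.

```python
from math import sqrt

def paired_t(differences):
    """Return (d_bar, s_d, t) with s_d^2 = sum(d^2)/n - d_bar^2 and
    t = d_bar / (s_d / sqrt(n - 1)), as in the notes."""
    n = len(differences)
    d_bar = sum(differences) / n
    s_d = sqrt(sum(d * d for d in differences) / n - d_bar ** 2)
    t = d_bar / (s_d / sqrt(n - 1))
    return d_bar, s_d, t

# Hypothetical differences (illustration only, see the note above).
differences = [6, -6, -8, 0, -2]
d_bar, s_d, t = paired_t(differences)
print(d_bar, round(s_d, 4), round(t, 4))

# Left-tailed test: H0 is rejected only if t falls below the table value.
t_table = -2.132  # 4 d.f., 5% level, as quoted in the notes
print("reject H0" if t < t_table else "do not reject H0")
```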