Sampling Distribution Revised For IBS 2020 Batch


Sampling Distribution:

Sampling distribution is the foundation of hypothesis testing and confidence intervals.

In order to use the sample mean to estimate the population mean, we should examine every possible sample (and its mean) that could have occurred in the process of selecting one sample of a certain size. If all possible samples were actually selected, the distribution of the results would be referred to as a sampling distribution. In practice only one such sample is actually selected, but the concept of sampling distributions must be examined so that probability theory and its distributions can be used in making inferences about the population parameter values.

Sampling distribution is the distribution of all possible values of a statistic obtained from various possible samples of equal size drawn from a population or a process.

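Suppose we have a population consisting of N = 4 values: 2, 5, 6, 8. The population mean is $\mu = (2 + 5 + 6 + 8)/4 = 5.25$ and the population standard deviation is

$$\sigma = \sqrt{\frac{\sum x^2}{N} - \mu^2} = \sqrt{\frac{129}{4} - 27.5625} = \sqrt{4.6875} = 2.165$$

Now select all possible samples of size 2, drawing with replacement, which gives $4^2 = 16$ ordered samples.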

SAMPLING DISTRIBUTION OF SAMPLE MEANS:

Sample   Mean
2,2      2
2,5      3.5
2,6      4
2,8      5
5,2      3.5
5,5      5
5,6      5.5
5,8      6.5
6,2      4
6,5      5.5
6,6      6
6,8      7
8,2      5
8,5      6.5
8,6      7
8,8      8

FREQUENCY DISTRIBUTION OF SAMPLE MEAN WHEN SAMPLE SIZE IS 2:

x̄      f     f·x̄    x̄²      f·x̄²
2      1     2      4       4
3.5    2     7      12.25   24.5
4      2     8      16      32
5      3     15     25      75
5.5    2     11     30.25   60.5
6      1     6      36      36
6.5    2     13     42.25   84.5
7      2     14     49      98
8      1     8      64      64

Total frequency $\sum f = 16$ and $\sum f\bar{x} = 84$, so $\mu_{\bar{x}} = 84/16 = 5.25$, which equals the original population mean $\mu$.

$$\sigma_{\bar{x}} = \sqrt{\frac{\sum f\bar{x}^2}{\sum f} - \mu_{\bar{x}}^2} = \sqrt{\frac{478.5}{16} - 27.5625} = \sqrt{29.90625 - 27.5625} = \sqrt{2.34375} \approx 1.5309$$

The same value follows from the population standard deviation:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.165}{\sqrt{2}} = 1.5309$$
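
The enumeration above can be checked with a short script. The following is a minimal sketch in Python using only the standard library; the variable names are illustrative and not part of the original notes. It lists all 16 ordered samples of size 2 drawn with replacement from the population 2, 5, 6, 8 and compares the mean and S.D of the sample means with $\mu$ and $\sigma/\sqrt{n}$.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2, 5, 6, 8]
n = 2  # sample size

# All ordered samples of size n drawn with replacement (4**2 = 16 of them).
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mu = mean(population)        # population mean
sigma = pstdev(population)   # population S.D (divides by N, not N - 1)

mu_xbar = mean(sample_means)       # mean of the sampling distribution
sigma_xbar = pstdev(sample_means)  # standard error found by enumeration

print(len(samples), mu, round(sigma, 4))
print(mu_xbar, round(sigma_xbar, 4), round(sigma / sqrt(n), 4))
```

The last line prints the enumerated standard error next to $\sigma/\sqrt{n}$, so the two can be compared directly.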

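$\sigma_{\bar{x}}$ is the standard deviation of the sample means, called the Standard Error.

Therefore $\mu_{\bar{x}} = \mu$ always, and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

[ The sampling distribution of the mean values has its own mean, denoted by $\mu_{\bar{x}}$, with $\mu_{\bar{x}} = \mu$ always, and its own standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, or $s/\sqrt{n}$ when the sample S.D $s$ is used in place of $\sigma$. ]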

Therefore, from a population with mean $\mu$ and S.D $\sigma$:

Select a random sample of size n and find $\bar{x}_1$; select another random sample of size n and find $\bar{x}_2$; and so on.

When we put all the sample means together, we get the sampling distribution of the sample mean $\bar{x}$. It turns out that the mean of this sampling distribution is $\mu$, i.e., $\mu_{\bar{x}} = \mu$, but it does not have the same S.D. The S.D of the sampling distribution of $\bar{x}$ is $\sigma/\sqrt{n}$.

This implies that as the sample size increases, the S.D of the sampling distribution decreases.

To sum it up, we conclude that:

Let $\bar{x}$ be the mean of a random sample of size n from a population having mean $\mu$ and S.D $\sigma$. Then the mean and S.D of the sampling distribution of $\bar{x}$, denoted by $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$, are $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

NOTE: Sampling distribution of sample means depends on the distribution of the population from which samples are drawn.

{ SAMPLING DISTRIBUTION OF STATISTIC ; CENTRAL LIMIT THEOREM :


For any given series, a population parameter is constant: there can be only one value of $\mu$. But in the case of a sampling distribution, we cannot say that the sample mean is constant, because it varies from sample to sample.

Example:
Suppose there are 5 workers A, B, C, D, E comprising the population. Their monthly wages in thousands of rupees are:

Worker   A   B   C   D   E
Wage     3   5   7   7   8

Suppose we have to select a sample of size 3 out of the 5 workers. Then in all there can be 10 samples ($^5C_3 = 10$) of size 3 out of a population of size 5.

All possible samples and their means:

Sample   Wages in the sample ('000 Rs)   Sample mean
ABC      3, 5, 7                         5.00
ABD      3, 5, 7                         5.00
ABE      3, 5, 8                         5.33
ACD      3, 7, 7                         5.67
ACE      3, 7, 8                         6.00
ADE      3, 7, 8                         6.00
BCD      5, 7, 7                         6.33
BCE      5, 7, 8                         6.67
BDE      5, 7, 8                         6.67
CDE      7, 7, 8                         7.33
[The values of $\bar{x}$ are rounded to two decimal places.]

Now, on the basis of the values of $\bar{x}$ given in the above table, we can form a frequency distribution of $\bar{x}$.

Frequency distribution of $\bar{x}$ when the sample size is 3 (sampling distribution of the mean):

x̄       Frequency
5.00    2
5.33    1
5.67    1
6.00    2
6.33    1
6.67    2
7.33    1
Total   10

$$\mu_{\bar{x}} = \frac{\sum f\bar{x}}{\sum f} = \frac{60}{10} = 6$$

and the mean of the population is $\mu = (3 + 5 + 7 + 7 + 8)/5 = 6$.

By dividing the frequencies of the different $\bar{x}$ values by the total frequency, we obtain the relative frequencies of these values. The relative frequencies are used as probabilities, as shown below.

Sampling distribution of $\bar{x}$ when the sample size is 3 (probability distribution of the sample statistic):

x̄       P(x̄)
5.00    2/10 = 0.2
5.33    1/10 = 0.1
5.67    1/10 = 0.1
6.00    2/10 = 0.2
6.33    1/10 = 0.1
6.67    2/10 = 0.2
7.33    1/10 = 0.1
Total   ∑ P(x̄) = 1.0
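
The two tables for the wage example can be reproduced by enumeration. This is a minimal sketch in Python, with illustrative variable names that are not part of the original notes. It builds all $^5C_3 = 10$ samples without replacement, keeps the sample means as exact fractions, and derives the frequency and probability distributions.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

wages = {"A": 3, "B": 5, "C": 7, "D": 7, "E": 8}  # in thousands of rupees
n = 3  # sample size

# All samples of size n drawn without replacement (order ignored): 5C3 = 10.
samples = list(combinations(wages, n))

# Exact sample means, kept as fractions to avoid rounding issues.
sample_means = [Fraction(sum(wages[w] for w in s), n) for s in samples]

# Frequency distribution of the sample mean, then relative frequencies.
frequency = Counter(sample_means)
probability = {m: Fraction(f, len(samples)) for m, f in frequency.items()}

mu = Fraction(sum(wages.values()), len(wages))        # population mean
mu_xbar = sum(m * p for m, p in probability.items())  # mean of sample means

for m in sorted(probability):
    print(f"{float(m):.2f}  {frequency[m]}  {float(probability[m]):.1f}")
print("population mean:", mu, " mean of sample means:", mu_xbar)
```

Working with `Fraction` is a design choice made here so that means such as 16/3 and 20/3 are grouped exactly rather than by their rounded values.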

Note: If we draw one sample of size 3 from the population of size 5, any of the 10 possible samples can be drawn, which implies that the sample mean $\bar{x}$ can take any of the values shown in the table above with the corresponding probability. For example, the probability of drawing a sample with mean 6.67 is 0.2, which can be written as $P(\bar{x} = 6.67) = 0.2$.

Similarly, we can form the sampling distribution of the median, mode, etc.
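
As an illustration of the last remark, the sketch below forms the sampling distribution of the median for the same five wages and sample size 3. It is a hedged example in Python; the names are illustrative and the approach (plain enumeration) is an assumption about how one might do this, not a method given in the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import median

wages = [3, 5, 7, 7, 8]  # workers A to E, in thousands of rupees
n = 3

# combinations() works by position, so the two wages of 7 (C and D)
# are treated as belonging to different workers.
samples = list(combinations(wages, n))
medians = [median(s) for s in samples]

counts = Counter(medians)
median_distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m, p in median_distribution.items():
    print(m, float(p))
```

Only two median values occur, 5 and 7, so the sampling distribution of the median is much coarser than that of the mean for this population.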
I). Sampling Distribution of the Mean:
If a population distribution is normal, the sampling distribution of the mean ($\bar{X}$) is also normal for samples of all sizes and shows the following important properties.
a). The sampling distribution has a mean that is equal to the population mean; in symbols, $\mu_{\bar{x}} = \mu$.
b). The sampling distribution has a standard deviation (the standard error) that is equal to the population S.D divided by the square root of the sample size, i.e., $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
c). If all possible samples of size n are drawn with replacement from a population having a normal distribution with mean $\mu$ and S.D $\sigma$, then the sampling distribution of the mean $\bar{X}$, with standard error $\sigma_{\bar{x}}$, will also be normally distributed irrespective of the size of the sample.
In particular, if the sampling distribution of the mean $\bar{X}$ is normal, then the S.E of the mean $\sigma_{\bar{x}}$ can be used to determine the probability of various values of the sample mean.
For this purpose the sample mean is converted into a value of the standard normal variate using the rule:

$$Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

PROCEDURE: The procedure for making statistical inference about the population mean $\mu$ using the sampling distribution of the sample mean is summarized as follows.
i). The population S.D $\sigma$ is known and either:
➢ the population distribution is normal, or
➢ the population distribution is not normal but the sample size is large (n >= 30).
In such a case the sampling distribution of the mean has $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, and the statistic

$$Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

follows, exactly or very closely, the standard normal distribution. This is valid as long as the population size is infinite or the sample is drawn with replacement.
However, this is not true when the samples are drawn without replacement from a finite population. In that case the sampling distribution of the mean will have $\mu_{\bar{x}} = \mu$ and standard error

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$

where $\sqrt{\frac{N - n}{N - 1}}$ is called the finite population correction factor (fpc factor).

NOTE: i). If the sampling fraction $n/N < 0.05$, i.e., N is large relative to the sample size n, then the fpc factor $\sqrt{\frac{N - n}{N - 1}}$ is approximately equal to 1 and may be ignored.
ii). If the sampling fraction $n/N \geq 0.05$, then the fpc factor $\sqrt{\frac{N - n}{N - 1}}$ needs to be used.
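
The effect of the fpc factor can be seen numerically. The following is a minimal sketch in Python; the function names `fpc_factor` and `standard_error` are hypothetical helpers introduced here for illustration. The figures are taken from Q2 and Q3 of the numeric problems in these notes.

```python
from math import sqrt

def fpc_factor(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))

def standard_error(sigma, n, N=None):
    """Standard error of the mean; applies the fpc factor when a finite N is given."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= fpc_factor(N, n)
    return se

# Figures from Q2: 15 companies, sample of 3, sigma = 60 (n/N = 0.2).
se_small_population = standard_error(60, 3, N=15)

# Figures from Q3: 800 companies, sample of 3 (n/N = 0.00375).
fpc_large_population = fpc_factor(800, 3)

print(round(se_small_population, 2), round(fpc_large_population, 4))
```

With N = 15 the correction reduces the standard error noticeably, while with N = 800 the factor is so close to 1 that it makes almost no difference, which is the point of the rule of thumb on the sampling fraction.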
Sampling from Non-Normal Population:
In the case of a normal population, the sampling distribution of the mean is also normal.
However, in most cases the population from which a sample is taken is not normally distributed. In such cases, we use an important theorem called THE CENTRAL LIMIT THEOREM to infer the shape of the sampling distribution of the mean for large samples.
The central limit theorem states that the sampling distribution of the mean can be approximated by the normal distribution once the sample size gets large enough. This is true regardless of the shape of the distribution of the individual values in the population.
How large is large enough? The following guidelines are helpful in deciding on an appropriate value of 'n'.
i). If the population under study is normal, then the sampling distribution of the mean $\bar{x}$ will also be normal, regardless of the size of the sample.
ii). If the population under study is approximately symmetric, then the sampling distribution of the mean $\bar{x}$ becomes approximately normal for relatively small values of 'n'.
iii). If the population under study is skewed, the sample size 'n' must be larger, at least 30, before the sampling distribution of the mean $\bar{x}$ becomes approximately normal.
NOTE: The standard deviation of the sampling distribution of the mean is known as the standard error of the mean. It helps in constructing confidence limits and determining sample size.
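
The central limit theorem can be illustrated by simulation. The sketch below is a hedged example in Python: the exponential population (positively skewed, with mean and S.D both equal to 1), the number of repetitions, and the seed are all assumptions chosen for illustration, not values from the notes.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

population_mean = 1.0  # assumed exponential population: mean = S.D = 1
population_sd = 1.0
n = 30                 # sample size
num_samples = 5000     # number of repeated samples

# Draw many samples from the skewed population and record each sample mean.
sample_means = [
    sum(random.expovariate(1 / population_mean) for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)
theoretical_se = population_sd / sqrt(n)

# Share of sample means within 1.96 standard errors of mu;
# this is close to 0.95 when the sampling distribution is nearly normal.
within = sum(
    abs(m - population_mean) <= 1.96 * theoretical_se for m in sample_means
) / num_samples

print(round(mean_of_means, 3), round(sd_of_means, 3),
      round(theoretical_se, 3), round(within, 3))
```

Rerunning with a smaller n (say 5) should show the share drifting further from 0.95, since the sampling distribution is then still visibly skewed.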
Numeric problems:
Q1). In a certain locality the average rent paid by all tenants amounts to Rs. 1,500 PM with a S.D of Rs. 450. However, the population distribution of rent pertaining to all tenants in that city is positively skewed. Find the mean and S.D of $\bar{x}$ when the sample size is i). 30 ii). 100. Also describe the shape of its sampling distribution in both cases.
Q2). Suppose we are interested in 15 electronic companies of the same size. All these companies are confronting a serious problem in the form of excessive turnover of their employees. It has been found that the S.D of the distribution of annual turnover is 60 employees. Take a sample of 3 electronic companies without replacement and determine the standard error of the mean.
Q3). Suppose we have a large number of electronic companies, say 800 instead of 15, and our sample continues to be of 3 companies with S.D 450. What will be the fpc factor? Do you think that the fpc factor should be used in this case?
Q4). Suppose in a normally distributed population the average income per household is Rs. 10,000 PM with S.D Rs. 800. A survey based on a random sample of 100 households is undertaken. What is the probability that the sample mean will be between Rs. 9,800 and Rs. 10,100?
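
One way to approach Q4 is to apply the Z rule directly. The following is a minimal sketch in Python; the helper name `prob_sample_mean_between` is hypothetical, and the calculation assumes the sample mean is normally distributed with standard error $\sigma/\sqrt{n}$, as stated earlier for a normal population.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_between(low, high, mu, sigma, n):
    """P(low < x-bar < high) when x-bar is normal with mean mu and S.E sigma / sqrt(n)."""
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)

# Figures from Q4: mu = 10,000, sigma = 800, n = 100, so S.E = 80.
p = prob_sample_mean_between(9800, 10100, mu=10000, sigma=800, n=100)
print(round(p, 4))
```

Here the limits correspond to Z = -2.5 and Z = 1.25, so the result is the area under the standard normal curve between those two values.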
Student's t – distribution:
While considering the sampling distribution of $\bar{x}$ we made two assumptions: i). the mean of the population $\mu$ is known; ii). the S.D of the population $\sigma$ is also known, OR the sample is large enough to justify replacing $\sigma$ by its sample estimate (s).
However, if $\sigma$ is not known and the sample size n is not large, we define a new variable known as Student's t – variable as:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \quad \text{where } s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \text{ is the sample S.D.}$$
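
The t formula can be computed directly from sample data. This is a minimal sketch in Python; the function name `t_statistic` is a hypothetical helper, and the five observations and the hypothesised mean of 49 are invented purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu):
    """Return (t, degrees of freedom), with t = (x-bar - mu) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample S.D with divisor n - 1
    return (x_bar - mu) / (s / sqrt(n)), n - 1

# Hypothetical data, invented for illustration only.
sample = [48, 52, 50, 47, 53]
t, degrees_of_freedom = t_statistic(sample, mu=49)
print(round(t, 3), degrees_of_freedom)
```

Note that `statistics.stdev` uses the divisor n - 1, matching the definition of s above, whereas `statistics.pstdev` would divide by n.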
NOTE:
➢ The t – distribution has an advantage in that t can be computed from the sample data, unlike the value of Z, which cannot be computed unless $\sigma$ is known. The distribution of t is very close to the distribution of the standard normal variate Z except for small values of 'n'.
➢ The degrees of freedom for a sample of size n is n – 1.
➢ With increasing degrees of freedom, i.e., with increased sample size, the t – distribution tends to coincide with the standard normal distribution. The larger the sample size 'n', the more accurate 's' will be as an estimate of the population S.D $\sigma$, and vice versa.
➢ Degrees of freedom: The number of degrees of freedom refers to the number of values that are free to vary in a random sample. The shape of the t – distribution varies with the degrees of freedom. The larger the sample size n, the higher the degrees of freedom.}

Q5). A particular brand of bottled water sells 500 ml bottles of water. Of course, not every bottle has exactly 500 ml of water in it. Suppose the true amount of water in each bottle varies according to a normal distribution with mean = 500 ml and S.D = 2 ml. What is the probability that a randomly selected bottle of water contains less than 497 ml of water?

Q6). Big Bazaar, a chain of 130 shopping malls, has been bought out by another larger nationwide supermarket chain. Before the deal is finalized, the larger chain wants to have some assurance that Big Bazaar will be a consistent money maker. The larger chain has decided to look at the financial records of 25 of the Big Bazaar outlets. Big Bazaar claims that each outlet's profits have an approximately normal distribution with the same mean and a S.D of Rs. 40 million. If the Big Bazaar management is correct, then what is the probability that the sample mean for 25 outlets will fall within Rs. 30 million of the actual mean?
Q7). A few years back, a policy was introduced to give loans to unemployed engineers to start their own business. Out of 1,00,000 unemployed engineers, 60,000 accepted the policy and got the loan. A sample of 100 unemployed engineers is taken at the time of allotment of the loan. What is the probability that the sample proportion would have exceeded 50 percent acceptance?
Q8). The mean length of life of a certain cutting tool is 41.5 hours with a S.D of 2.5 hours. What is the probability that a simple random sample of size 50 drawn from this population will have a mean between 40.5 hours and 42 hours?
Q9). A continuous manufacturing process produces items whose weights are normally distributed with a mean weight of 800 gms and a S.D of 300 gms. A random sample of 16 items is to be drawn from the process.
i). What is the probability that the arithmetic mean of the sample exceeds 900 gms?
ii). Find the values of the sample AM within which the middle 95% of all sample means will fall.
Q10). Safal, a tea manufacturing company, is interested in determining the consumption rate of tea per household in Delhi. The management believes that yearly consumption per household is normally distributed with an unknown mean $\mu$ and S.D of 1.50 Kg.
i). If a sample of 25 households is taken to record their consumption of tea for one year, what is the probability that the sample mean is within 500 gms of the population mean?
ii). How large must a sample be in order to be 98% certain that the sample mean is within 500 gms of the population mean?
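
Both parts of Q10 rest on the relation $Z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$. The sketch below is one possible way to set up the calculation in Python, assuming the normal model stated in the question; the variable names are illustrative. Part ii) uses the rearrangement $n = (z\,\sigma/E)^2$, where $E$ is the allowed margin and $z$ the two-sided critical value, rounded up to the next whole household.

```python
from math import ceil, sqrt
from statistics import NormalDist

sigma = 1.5    # population S.D in kg (from Q10)
margin = 0.5   # 500 gms expressed in kg

# Part i): P(|x-bar - mu| <= margin) for n = 25.
n = 25
z = margin / (sigma / sqrt(n))
prob_within = NormalDist().cdf(z) - NormalDist().cdf(-z)

# Part ii): smallest n with P(|x-bar - mu| <= margin) >= 0.98.
confidence = 0.98
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
required_n = ceil((z_star * sigma / margin) ** 2)

print(round(prob_within, 4), round(z_star, 4), required_n)
```

Converting 500 gms to 0.5 kg before dividing keeps the margin and the S.D in the same units, which is the step most easily missed in this problem.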
Q11). Suppose in a normally distributed population, average income per household is Rs. 10,000 PM with a S.D of Rs. 800. A survey based on a random sample of 100 households is undertaken. What is the probability that the sample mean will be between Rs. 9,800 and Rs. 10,100?
Q12). Suppose a random sample of n = 25 observations is selected from a population with mean $\mu = 15$ and $\sigma = 11$. What is the probability that the sample mean $\bar{x}$ will be:
i). less than 14
ii). more than 14
iii). within ±1 of the population mean $\mu = 15$.
Q13). Exam Question

Q14)

Q15)

Q16).

QUESTION TO UNDERSTAND THE ABOVE CONCEPT:

SAMPLING DISTRIBUTION OF PROPORTION:

Q1). A manufacturer of watches has determined from experience that 3% of the watches he produces are defective. If a random sample of 300 watches is examined, what is the probability that the proportion of defectives is between 0.02 and 0.035?
Q2). It has been found that 7% of the tools manufactured by a factory are defective. What is the probability that in a shipment of 625 such tools:
A). 8% or more will be defective?
B). 7% or less will be defective?
Q3). A few years back, a policy was introduced to give loans to unemployed engineers to start their own business. Out of 1,00,000 unemployed engineers, 60,000 accepted the policy and got the loan. A sample of 100 unemployed engineers is taken at the time of allotment of the loan. What is the probability that the sample proportion would have exceeded 50% acceptance?
Q4). Assume that 2% of the items produced in an assembly line operation are defective, but that the firm's production manager is not aware of this situation. What is the probability that in a lot of 400 such items, 3% or more will be defective?
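NOTE: If n >= 30, the sampling distribution of the sample proportion $\hat{p}$ (p-cap) can be approximated by a normal distribution. The approximation will be adequate if np >= 5 and n(1 − p) >= 5.
It may be noted that the number of successes in the sample actually follows a binomial distribution, because each item in the population is either a success or a failure.
Therefore, for the number of successes $X = n\hat{p}$, the mean is $np$ and the S.D is $\sqrt{npq}$, where $q = 1 - p$. Equivalently, for the sample proportion, $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{pq/n}$.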
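
The normal approximation for a sample proportion can be sketched as follows. This is a hedged Python example: the function name `prob_proportion_between` is hypothetical, no continuity correction is applied, and the figures are those of Q1 in this section.

```python
from math import sqrt
from statistics import NormalDist

def prob_proportion_between(low, high, p, n):
    """Normal approximation to P(low < p-hat < high) for a sample of size n."""
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("normal approximation not adequate: need np >= 5 and n(1 - p) >= 5")
    se = sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    dist = NormalDist(p, se)
    return dist.cdf(high) - dist.cdf(low)

# Figures from Q1 of this section: p = 0.03, n = 300.
prob = prob_proportion_between(0.02, 0.035, p=0.03, n=300)
print(round(prob, 4))
```

The adequacy check mirrors the rule in the note (np >= 5 and n(1 − p) >= 5); when it fails, the exact binomial distribution is the safer choice.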
Q5). A coin is tossed 20 times, and each toss that lands on heads counts as a success. Suppose the probability of success is 0.5. What is the probability that the number of successes is less than or equal to 12?
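
Because n = 20 is small, Q5 can also be handled with the exact binomial distribution rather than the normal approximation. The following is a minimal sketch in Python; the helper name `binomial_cdf` is hypothetical.

```python
from math import comb

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Figures from Q5: 20 tosses, probability of a head 0.5, at most 12 heads.
prob_at_most_12 = binomial_cdf(12, n=20, p=0.5)
print(round(prob_at_most_12, 4))
```

The same function can be used to judge how close the normal approximation comes for larger n.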
Q6). The quality control department of a paints manufacturing company, at the time of dispatch of decorative paints, discovered that 30% of the containers are defective. If a random sample of 500 containers is drawn with replacement from the population, what is the probability that the sample proportion will be less than or equal to 25% defective?
Q7).

Question: 8)