business stat ch 1
CHAPTER ONE
SAMPLING AND SAMPLING DISTRIBUTIONS
3.1. Introduction
Usually the population under study is very large or infinite, which makes studying it very
difficult or impossible. Under such circumstances we take a sample, a subset of the
population, and use it to study the population.
What is sampling?
Data are collected from the target population using a survey. If a survey covers the entire
population, it is called a census; if it covers only part of the population, it is
called sampling.
Why is sampling preferable?
➢ Cheaper than a census
➢ Takes less time than a census
➢ Economy of effort, as relatively fewer staff are needed
➢ More detailed information can be collected using a sample
➢ Better quality of interviewing, supervision and other related activities
ii. Adequacy: The size of sample should be adequate; otherwise it may not represent the
characteristics of the universe.
iii. Independence: All items of the sample should be selected independently of one
another and all items of the universe should have the same chance of being selected in the
sample. By independence of selection we mean that the selection of a particular item in
one draw has no influence on the probabilities of selection in any other draw.
iv. Homogeneity: When we talk of homogeneity we mean that there is no basic
difference in the nature of units of the universe and that of the sample. If two samples
from the same universe are taken, they should give more or less the same unit.
is considered that your respondents are typical of the population as a whole. For this
reason, an awareness of the principles of sampling is essential to the implementation of
most methods of research, both quantitative and qualitative.
In the lottery method, each unit of the population is numbered and the number is written on
a chit of paper or a disc. The chits are then folded and put in a box, from which a sample of
a predetermined size is drawn.
In the random number method, a table of random numbers is used. The units of the population
are numbered from 1 to N, and n units are selected from them.
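The random number method can be sketched in Python as follows. This is a minimal illustration rather than part of the original notes: the function name `simple_random_sample` and its `seed` parameter are assumptions, and the standard library's `random.sample` stands in for a printed table of random numbers.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Draw n distinct unit numbers from a population numbered 1..N.

    Hypothetical helper: random.sample plays the role of the random number table.
    """
    if not 0 < n <= N:
        raise ValueError("sample size n must satisfy 0 < n <= N")
    rng = random.Random(seed)          # the seed only makes the draw repeatable
    return sorted(rng.sample(range(1, N + 1), n))

# Example: choose 10 units out of a population of 500.
chosen_units = simple_random_sample(N=500, n=10, seed=1)
print(chosen_units)
```

Because `random.sample` draws without replacement, no unit can appear twice, which matches drawing chits from a box without putting them back.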
In a non-probability sample, some people have a greater, but unknown, chance than
others of selection.
The possible samples of size two from the population B, C, D & E are BC, BD, BE, CD, CE
and DE.
Note that B appears in three of the six samples, so the probability of B being selected is
p(B) = 3/6 = ½. Similarly, p(C) = p(D) = p(E) = ½, so (1) each element of the
population has the same chance of being chosen. Moreover, (2) each of the possible
samples of size two has the same chance of being selected [p(BC) = p(BD) = p(BE) =
p(CD) = p(CE) = p(DE) = 1/6]. Consequently, we can say the conditions are satisfied.
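The two conditions can be checked by listing every sample of size two. The sketch below uses only the Python standard library; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

population = ["B", "C", "D", "E"]
samples = list(combinations(population, 2))   # all unordered samples of size 2

# Condition (2): every sample is equally likely.
p_sample = Fraction(1, len(samples))

# Condition (1): the chance a unit is chosen is the share of samples containing it.
p_unit = {
    unit: Fraction(sum(unit in s for s in samples), len(samples))
    for unit in population
}

print(["".join(s) for s in samples])
print(p_sample, p_unit)
```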
2. Systematic sampling
Systematic sampling is similar to simple random sampling, but instead of selecting random
numbers from tables, you move through the list (the sampling frame) picking every kth name,
where k = N/n.
You must first work out the sampling interval k by dividing the population size by the
required sample size. E.g. for a population of 500 and a sample of 100, k = 500/100 = 5, so
the sampling fraction is 1/5, i.e. you will select one person out of every five in the
population. A random number needs to be used only to decide on the starting point. With a
sampling fraction of 1/5, the starting point must be within the first 5 people in your list.
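A minimal Python sketch of this procedure is given below. The function name `systematic_sample` is an assumption made for illustration, and the sketch simply takes k as the whole-number part of N/n.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick every k-th unit from the sampling frame after a random start."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size n must satisfy 0 < n <= N")
    k = N // n                         # sampling interval (assumed: integer part of N/n)
    rng = random.Random(seed)
    start = rng.randrange(k)           # random start among the first k units (0-based)
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 501))            # population of 500, numbered 1..500
sample = systematic_sample(frame, n=100, seed=7)
print(sample[:5])
```

Only the starting point is random; once it is fixed, every later selection is determined by the interval k.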
Advantages:
• May be saving in time
• Bias may be reduced because interviewer has to call at clearly defined addresses -
not able to choose
Problems:
• Characteristics of particular areas (e.g. poor / rich) may mean that sample is not
representative
• Open to abuse by interviewer because difficult to check that instructions fully
carried out
4. Stratified Sampling
Dividing a population into non-overlapping groups is called stratification. A stratified
random sample is one where the population is divided into non-overlapping
subgroups, or strata, and then a simple random sample is selected within each of the strata.
Thus a population can be stratified if its members have readily identifiable
characteristics that can be used to separate them into subgroups.
For example, we can stratify a human population by dividing it into
different strata on the basis of age, sex, occupation, education, religion,
region, etc. Notice that stratification does not mean absence of randomness.
All it means is that the population is first divided into strata and then a simple
random sample is selected from each stratum of the population. The advantages of using
stratified random sampling are:
➢ It more accurately reflects the characteristics of the population than simple
random sampling & systematic random sampling.
➢ It is more cost effective than simple random sampling.
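The two-step idea (divide into strata, then draw a simple random sample within each stratum) can be sketched as below. The notes do not say how many units to take from each stratum, so the sketch assumes proportional allocation (the same sampling fraction in every stratum). The function name `stratified_sample` and the example population are made up for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Simple random sample of the same fraction from every stratum.

    units      : list of population members
    stratum_of : function mapping a unit to its stratum label
    fraction   : sampling fraction used in each stratum (assumption:
                 proportional allocation; other allocations are possible)
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:                 # step 1: split into non-overlapping strata
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for label, members in strata.items():   # step 2: random sample per stratum
        size = max(1, round(fraction * len(members)))
        sample[label] = rng.sample(members, size)
    return sample

# Made-up population of (id, sex) pairs: 60 women and 40 men.
people = [(i, "F") for i in range(60)] + [(i, "M") for i in range(60, 100)]
chosen = stratified_sample(people, stratum_of=lambda p: p[1], fraction=0.1, seed=3)
print({label: len(members) for label, members in chosen.items()})
```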
workbook, but you might find that a probability sample with a poor response rate doesn't
in the end give you a particularly good representation of the population being examined.
1. Purposive Sampling
A purposive sample is one that is selected by the researcher subjectively. The
researcher attempts to obtain a sample that appears to him/her to be representative of the
population and will usually try to ensure that a range from one extreme to the other is
included.
It is often used in political polling: districts are chosen because their pattern has in the
past provided a good idea of outcomes for the whole electorate.
2. Quota Sampling
Quota sampling involves the fixation of certain quotas, which are to be fulfilled by the
interviewers.
Quota sampling is often used in market research. Interviewers are required to find cases
with particular characteristics. They are given quota of particular types of people to
interview and the quotas are organized so that final sample should be representative of
population.
Stages:
• Decide on characteristic of which sample is to be representative, e.g. age
• Find out distribution of this variable in population and set quota accordingly. E.g.
if 20% of population is between 20 and 30, and sample is to be 1,000 then 200 of
sample (20%) will be in this age group
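The quota-setting stage is simple arithmetic: each group's quota is its population share times the sample size. In the sketch below only the 20% share for ages 20 to 30 comes from the text; the other shares and the function name `quotas` are invented for illustration.

```python
def quotas(sample_size, shares):
    """Turn population shares (fractions that sum to 1) into interview quotas."""
    return {group: round(sample_size * share) for group, share in shares.items()}

# Only the 20% figure is from the text; the other two shares are made up.
shares = {"under 20": 0.25, "20-30": 0.20, "over 30": 0.55}
age_quotas = quotas(1000, shares)
print(age_quotas)
```

With less convenient shares, rounding can leave the quotas one or two interviews away from the intended total, so the sum should be checked.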
Complex quotas can be developed so that several characteristics (e.g. age, sex, marital
status) are used simultaneously. By the end of the day, the researcher may be looking for
a widowed man in his nineties who looks as though he might buy a particular brand of
detergent.
Disadvantage of quota sampling - Interviewers choose who they like (within above
criteria) and may therefore select those who are easiest to interview, so bias can result.
Also, impossible to estimate accuracy (because not random sample)
3. Convenience sampling
A convenience sample is used when you simply stop anybody in the street who is
prepared to stop, or when you wander round a business, a shop, a restaurant, a theatre or
whatever, asking people you meet whether they will answer your questions. In other
words, the sample comprises subjects who are simply available in a convenient way to
the researcher. There is no randomness and the likelihood of bias is high. You can't draw
any meaningful conclusions from the results you obtain.
However, this method is often the only feasible one, particularly for students or others
with restricted time and resources, and can legitimately be used provided its limitations
are clearly understood and stated.
Because it is an extremely haphazard approach, students are often tempted to use the
word "random" when describing their sample where they have stopped people in the
street, as they see it "at random". You should avoid using the word "random" when
describing anything to do with sampling unless you are absolutely certain that you
selected respondents from a sampling frame using truly random methods.
4. Snowball sampling
With this approach, you initially contact a few potential respondents and then ask them
whether they know of anybody with the same characteristics that you are looking for in
your research. For example, if you wanted to interview a sample of vegetarians / cyclists /
people with a particular disability / people who support a particular political party etc.,
your initial contacts may well have knowledge (through e.g. support group) of others.
5. Self-selection
Self-selection is perhaps self-explanatory. Respondents themselves decide that they
would like to take part in your survey.
Sampling error
What can make a sample unrepresentative of its population? One of the most frequent
causes is sampling error.
Sampling error comprises the differences between the sample and the population that are
due solely to the particular units that happen to have been selected.
For example, suppose that a sample of 100 Arbaminch women are measured and are all
found to be taller than six feet. It is very clear even without any statistical proof that this
would be a highly unrepresentative sample leading to invalid conclusions. This is a very
unlikely occurrence because naturally such rare cases are widely distributed among the
population. But it can occur. Luckily, this is a very obvious error and can be detected
very easily.
The more dangerous error is the less obvious sampling error against which nature offers
very little protection. An example would be like a sample in which the average height is
overstated by only one inch or two rather than one foot which is more obvious. It is the
unobvious error that is of much concern.
There are two basic causes for sampling error. One is chance: That is the error that occurs
just because of bad luck. This may result in untypical choices. Unusual units in a
population do exist and there is always a possibility that an abnormally large number of
them will be chosen. The main protection against this kind of error is to use a large
enough sample. The second cause of sampling error is sampling bias.
Sampling bias is a tendency to favour the selection of units that have particular
characteristics. Sampling bias is usually the result of a poor sampling plan. The most
notable is the bias of non-response when for some reason some units have no chance of
appearing in the sample. For example, take a hypothetical case where a survey was
conducted recently by a Graduate School to find out the level of stress that graduate
students were going through. A mail questionnaire was sent to 100 randomly selected
graduate students. Only 52 responded and the results were that students were not under
stress at that time when the actual case was that it was the highest time of stress for all
students except those who were writing their thesis at their own pace. Apparently, this is
the group that had the time to respond. The researcher who was conducting the study
went back to the questionnaire to find out what the problem was and found that all those
who had responded were third- and fourth-year PhD students. Bias can be very costly and has
to be guarded against as much as possible. A means of selecting the units of analysis
must be designed to avoid the more obvious forms of bias. Another example would be
where you would like to know the average income of some community and you decide to
use the telephone numbers to select a sample of the total population in a locality where
only the rich and middle class households have telephone lines. You will end up with
high average income, which will lead to the wrong policy decisions.
A non-sampling error is an error that results solely from the manner in which the
observations are made.
Biased observations due to inaccurate measurement can be innocent but very devastating.
A story is told of a French astronomer who once proposed a new theory based on
spectroscopic measurements of light emitted by a particular star. When his colleagues
discovered that the measuring instrument had been contaminated by cigarette smoke, they
rejected his findings.
In surveys of personal characteristics, unintended errors may result from:
• The manner in which the response is elicited
• The social desirability of the persons surveyed
• The purpose of the study
• The personal biases of the interviewer or survey writer
population mean: $\mu = \mu_{\bar{X}} = \bar{\bar{X}}$.
2. The standard deviation of the sampling distribution of the means
(the standard error) is equal to the population standard deviation divided by
the square root of the sample size: $\sigma_{\bar{X}} = \delta/\sqrt{n}$. This holds true
when n < 0.05N and N is very large. If N is finite and n ≥ 0.05N,
$$\sigma_{\bar{X}} = \frac{\delta}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}}.$$
The expression $\sqrt{(N-n)/(N-1)}$ is called the finite population
correction factor (finite population multiplier). In the calculation of the
standard error of the mean, if the population standard deviation δ is
unknown, the standard error of the mean $\sigma_{\bar{X}}$ can be estimated by using
the sample standard error of the mean $S_{\bar{X}}$, which is calculated as follows:
$$S_{\bar{X}} = \frac{S}{\sqrt{n}} \quad \text{or} \quad S_{\bar{X}} = \frac{S}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}}.$$
3. The sampling distribution of means is approximately normal for
sufficiently large sample sizes (n≥ 30).
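The standard error rule in property 2 can be written as a small function. This is a sketch under the 5% rule stated above; the function name `standard_error` and its parameter names are assumptions.

```python
import math

def standard_error(sd, n, N=None):
    """Standard error of the mean.

    sd : population standard deviation (or the sample value S when it is unknown)
    n  : sample size
    N  : population size; None means the population is treated as infinite
    The finite population correction is applied when n is at least 5% of N.
    """
    se = sd / math.sqrt(n)
    if N is not None and n >= 0.05 * N:
        se *= math.sqrt((N - n) / (N - 1))   # finite population correction factor
    return se

print(standard_error(21, 49))          # very large population, no correction
print(standard_error(8.3, 45, N=350))  # 45/350 exceeds 5%, so the correction applies
```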
Example:
A population consists of the following ages: 10, 20, 30, 40, and 50. A random
sample of three is to be selected from this population and mean computed.
Develop the sampling distribution of the mean.
Solution:
The number of simple random samples of size n that can be drawn without
replacement from a population of size N is NCn. With N= 5 and n = 3, 5C3 = 10
samples can be drawn from the population as:
Sampled items     Sample means ($\bar{X}$)
10, 20, 30        20.00
10, 20, 40        23.33
10, 20, 50        26.67
10, 30, 40        26.67
10, 30, 50        30.00
10, 40, 50        33.33
20, 30, 40        30.00
20, 30, 50        33.33
20, 40, 50        36.67
30, 40, 50        40.00
Total             300.00
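The mean of the sample means equals the population mean, regardless of the sample size:
$$\mu_{\bar{X}} = \bar{\bar{X}} = \frac{\sum \bar{X}}{{}_{N}C_{n}} = \frac{300}{10} = 30 = \mu$$
The population standard deviation is
$$\delta = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{1000}{5}} = 14.142$$
The standard error from the formula is
$$\sigma_{\bar{X}} = \frac{\delta}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}} = \frac{14.142}{\sqrt{3}} \cdot \sqrt{\frac{5-3}{5-1}} = 5.774$$
and the standard error computed directly from the ten sample means is
$$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \bar{\bar{X}})^2}{{}_{N}C_{n}}} = \sqrt{\frac{333.4}{10}} = 5.774$$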
Since averaging reduces variability, $\sigma_{\bar{X}} < \delta$, except in the cases where
δ = 0 or n = 1.
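The whole example can be reproduced by listing every possible sample. The sketch below uses only the Python standard library; the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

ages = [10, 20, 30, 40, 50]
N, n = len(ages), 3

# Every sample of size 3 drawn without replacement, and its mean.
sample_means = [sum(s) / n for s in combinations(ages, n)]

mu = mean(ages)                        # population mean
sd = pstdev(ages)                      # population standard deviation
mean_of_means = mean(sample_means)     # centre of the sampling distribution
se_direct = pstdev(sample_means)       # spread of the sample means
se_formula = sd / sqrt(n) * sqrt((N - n) / (N - 1))   # with the correction factor

print(len(sample_means), mu, round(sd, 3))
print(round(mean_of_means, 2), round(se_direct, 3), round(se_formula, 3))
```

The direct calculation and the formula agree, which is the point of the finite population correction when the sample is a large share of the population.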
The significance of the Central Limit Theorem is that it permits us to use sample
statistics to make inferences about population parameters without knowing
anything about the shape of the frequency distribution of that population other
than what we can get from the sample. It also permits us to use the normal
distribution (curve) for analyzing distributions whose shape is unknown. It
creates the potential for applying the normal distribution to many problems
when the sample is sufficiently large.
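A small simulation can make the theorem concrete. The sketch below is an illustration only: it assumes an exponential population (strongly skewed, with mean 1 and standard deviation 1), a sample size of 30 and a fixed random seed, none of which come from the notes.

```python
import random
from math import sqrt

rng = random.Random(42)                # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, sd 1, skewed)."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

n = 30
means = [sample_mean(n) for _ in range(5000)]

center = sum(means) / len(means)       # should be close to the population mean, 1
spread = sqrt(sum((m - center) ** 2 for m in means) / len(means))
# Share of sample means within two standard errors of the centre;
# a normal curve would put roughly 95% of them there.
within_2se = sum(abs(m - center) <= 2 * spread for m in means) / len(means)

print(round(center, 3), round(spread, 3), round(within_2se, 3))
```

Even though the population is far from normal, the spread of the sample means comes out close to 1/√30 and about 95% of them fall within two standard errors of the centre.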
Example:
1. The distribution of annual earnings of all bank tellers with five years of
experience is skewed negatively. This distribution has a mean of Birr 15,000
and a standard deviation of Birr 2000. If we draw a random sample of 30
tellers, what is the probability that their earnings will average more than Birr
15,750 annually?
Solution:
Steps:
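1. Calculate µ and $\sigma_{\bar{X}}$
µ = Birr 15,000
$\sigma_{\bar{X}} = \delta/\sqrt{n} = 2000/\sqrt{30}$ = Birr 365.15
2. Calculate Z for $\bar{X}$
$$Z_{\bar{X}} = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
$$Z_{15{,}750} = \frac{15{,}750 - 15{,}000}{365} = +2.05$$
3. Find the probability from the standard normal table
$P(\bar{X} > 15{,}750) = P(Z > 2.05)$ = 0.5 – P (0 to +2.05) = 0.5 – 0.4798 = 0.0202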
2. Suppose that during any hour in a large department store, the average
number of shoppers is 448, with a standard deviation of 21 shoppers. What is the
probability of randomly selecting 49 different shopping hours, counting the
shoppers, and having the sample mean fall between 441 and 446 shoppers,
inclusive?
Solution:
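1. Calculate µ and $\sigma_{\bar{X}}$
µ = 448 shoppers
$\sigma_{\bar{X}} = \delta/\sqrt{n} = 21/\sqrt{49} = 3$
2. Calculate Z for $\bar{X}$
$$Z_{\bar{X}} = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
$$Z_{441} = \frac{441 - 448}{3} = -2.33 \qquad Z_{446} = \frac{446 - 448}{3} = -0.67$$
3. Find the probability from the standard normal table
$P(441 \le \bar{X} \le 446) = P(-2.33 \le Z \le -0.67)$ = P (0 to 2.33) – P (0 to 0.67) = 0.4901 – 0.2486 = 0.2415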
3. A production company’s 350 hourly employees average 37.6 years of age, with
a standard deviation of 8.3 years. If a random sample of 45 hourly employees is
taken, what is the probability that the sample will have an average age of less
than 40 years?
Solution:
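1. Calculate µ and $\sigma_{\bar{X}}$
µ = 37.6 years; n/N = 45/350 > 5%, so the finite population correction factor (FPCF) is needed
$$\sigma_{\bar{X}} = \frac{\delta}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}} = \frac{8.3}{\sqrt{45}} \cdot \sqrt{\frac{350-45}{350-1}} = 1.16$$
2. Calculate Z for $\bar{X}$
$$Z_{\bar{X}} = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
$$Z_{40} = \frac{40 - 37.6}{1.16} = +2.07$$
3. Find the probability from the standard normal table
$P(\bar{X} < 40) = P(Z < 2.07)$ = 0.5 + P (0 to +2.07) = 0.5 + 0.4808 = 0.9808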
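The three worked examples can be cross-checked without a Z table. The sketch below computes the standard normal cumulative probability from `math.erf`; the helper name `phi` is an assumption. Results can differ in the third decimal place from values read off a printed table, because Z is not rounded to two decimals here.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Example 1: P(mean > 15,750) with mu = 15,000, sd = 2,000, n = 30
se1 = 2000 / sqrt(30)
p1 = 1 - phi((15750 - 15000) / se1)

# Example 2: P(441 <= mean <= 446) with mu = 448, sd = 21, n = 49
se2 = 21 / sqrt(49)
p2 = phi((446 - 448) / se2) - phi((441 - 448) / se2)

# Example 3: P(mean < 40) with mu = 37.6, sd = 8.3, n = 45, N = 350
# (finite population correction applied because 45/350 exceeds 5%)
se3 = 8.3 / sqrt(45) * sqrt((350 - 45) / (350 - 1))
p3 = phi((40 - 37.6) / se3)

print(round(p1, 4), round(p2, 4), round(p3, 4))
```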
Solution:
a)
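n = 36, δ = 12
$$\sigma_{\bar{X}} = \frac{\delta}{\sqrt{n}} = \frac{12}{\sqrt{36}} = \frac{12}{6} = 2$$
$P(\bar{X} > \mu + 6) + P(\bar{X} < \mu - 6) = ?$
$$Z_{\mu+6} = \frac{(\mu + 6) - \mu}{2} = +3 \qquad Z_{\mu-6} = \frac{(\mu - 6) - \mu}{2} = -3$$
$P(\bar{X} > \mu + 6) + P(\bar{X} < \mu - 6)$ = P (Z > 3) + P (Z < - 3)
= [0.5 – P (0 to +3)] + [0.5 – P (0 to -3)]
= (0.5 – 0.49865) + (0.5 – 0.49865)
= 0.00135(2) = 0.00270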
b)
n = 36, δ = 12
$$\sigma_{\bar{X}} = \frac{\delta}{\sqrt{n}} = \frac{12}{\sqrt{36}} = \frac{12}{6} = 2$$
$P(\mu - 6 \le \bar{X} \le \mu + 6)$ = P (- 3 ≤ Z ≤ 3)
= P (0 to 3)*2
= 0.49865*2
= 0.9973
If the population standard deviation is 12, in a random sample of 36 scores there
is a 99.73% chance of getting a sample mean score to lie within 6 points of the
population mean.
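Both parts of this solution follow from one quantity, the probability that the sample mean lies within a given distance of the population mean. The sketch below is an illustration; the function name `prob_within` is an assumption.

```python
from math import erf, sqrt

def prob_within(d, sd, n):
    """P(|sample mean - population mean| <= d) under the normal approximation."""
    z = d / (sd / sqrt(n))             # distance expressed in standard errors
    return erf(z / sqrt(2.0))          # equals P(-z <= Z <= z)

inside = prob_within(6, sd=12, n=36)   # part (b): within 6 points of the mean
outside = 1 - inside                   # part (a): more than 6 points away

print(round(inside, 4), round(outside, 4))
```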