Lecture 13 Sampling


Sampling

Contents

• Definition of sampling
• Why do we use samples?
• Concept of representativeness
• Sampling questions
• Main methods of sampling
What is sampling?

• Sampling is the process of selecting a number
of study units from a defined study population

• Research conclusions and generalizations are
only as good as the sample they are based on
Sampling techniques and
sample size

• Necessary to understand: the principles
of sampling and the sample size
adequate to obtain results
approximating the true parameters
Sampling questions
(What we need to know)
We need to answer 3 questions:
• What is the population from which we
want to draw a sample? (study population)

• How will these members be selected?
(sampling methods)

• How many members do we need in our
sample? (sample size)
Definition of sampling terms
Target Population (Universe):
• A group of people living in one geographical area

• The target population is the entire group a
researcher is interested in; the group about which
the researcher wishes to draw conclusions

• Samples are usually obtained from populations, and
thus it is logical that samples should be
representative of the populations from which they
are derived
Definition of sampling terms
Example

• Suppose we take a group of men aged 35-40
who have suffered an initial heart attack

• The purpose of this study could be to compare
the effectiveness of two drug regimens for
delaying or preventing further attacks.

• The target population here would be all men
meeting the same general conditions as those
actually included in the study
Definition of sampling terms
• The study population
It is a collection of objects, events or individuals having
characteristics that a researcher may be interested in studying.

• Example:
If your study is to explore and describe the characteristics of
diabetic patients, then your population is diabetic patients.

The study population has to be clearly defined (for example,
according to age, gender, and residence); otherwise we cannot
do the sampling. Apart from persons, a study population may
consist of villages, institutions, records, etc.

Each study population consists of study units.


Definition of sampling terms
The population unit (study unit):

• This is the individual person or thing
being counted

• Thus the population units could be health
facilities, schools, ANC attendants, T.B.
cases, etc.
Definition of sampling terms
Sampling unit
• Subject under observation on which information
is collected
– Example: Children <5 years, hospital discharges…
Sampling fraction
• Ratio between the sample size and the
population size
– Example: 100 out of 2000 (5%).
Sampling interval
• Number of units divided by the desired sample size
– Example: 2000 / 100 = 20
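
The two ratios defined above can be checked with a few lines of Python. This is only an illustrative sketch using the slide's own figures (100 out of 2000); the function names are made up for the example.

```python
def sampling_fraction(sample_size: int, population_size: int) -> float:
    """Share of the population that ends up in the sample."""
    return sample_size / population_size


def sampling_interval(population_size: int, sample_size: int) -> int:
    """Number of population units per sampled unit (whole units only)."""
    return population_size // sample_size


fraction = sampling_fraction(100, 2000)
interval = sampling_interval(2000, 100)
print(f"sampling fraction = {fraction:.0%}")
print(f"sampling interval = {interval}")
```

The sampling interval is simply the inverse of the sampling fraction: a fraction of 5% corresponds to taking one unit out of every 20.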
Definition of sampling terms
Sampling frame:
• This is the source from which the study subjects are
selected

• List of all the sampling units in the population
– List of households, health care units…

Examples of sampling frames:

• Hospital admission lists
• Hospital outpatient lists
• The list of the workers in a factory
• List of students
Definition of sampling terms

Sampling technique (scheme):

• Method of selecting sampling units from
the sampling frame
– Randomly, convenience sample…


Definition of sampling terms
The sample:

• Is the part of the population from which
information is actually obtained

• The help of a statistician is always needed but
the researcher should understand the basic
rules of sampling
Why do we use samples ?

Get information from large populations
– At minimal cost
– At maximum speed
– At increased accuracy
– Using enhanced tools
Sampling

Precision
Cost
Representativeness
• Person
• Demographic characteristics (age, sex…)
• Exposure/susceptibility
• Place (ex : urban vs. rural)
• Time
• Seasonality
• Day of the week
• Time of the day
Representativeness

• A properly selected sample of appropriate size
will enable generalizations to be made about the
population.
For example
If you intend to interview 100 mothers in order
to obtain a complete picture of the weaning
practices in District X you would have to
select these mothers from a representative
sample of villages

It would be unwise to select them from only one
or two villages as this might give you a
distorted (biased) picture
Sampling and representativeness

[Figure: diagram relating the Target Population, the Sampling Population and the Sample]

Target Population → Sampling Population → Sample


Sampling and representativeness
Study on prevalence of skin infections among school children in
Khartoum state

[Figure: diagram with the labels "School children in Khartoum state", "School children of 10 schools" and "Children in state."]

Target Population → Sampling Population → Sample


Sample size criteria

Sample size depends on:

• Objective of the study
• Population size
• The level of precision
• The level of confidence
• The degree of variability in the data measured
Level of precision
• Level of precision: sampling error

• The range in which the true population value is estimated
to be

• No sample is the exact mirror image of the population

• Magnitude of error can be measured in probability samples

• Expressed by standard error
– of mean, proportion, differences, etc
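
For reference, the usual textbook forms of the standard error of a mean and of a proportion under simple random sampling are sketched below. The symbols are assumptions introduced for this note and do not appear on the slides: $s$ is the sample standard deviation, $n$ the sample size and $\hat{p}$ the sample proportion.

$$SE(\bar{x}) = \frac{s}{\sqrt{n}}, \qquad SE(\hat{p}) = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$$

Both shrink as $n$ grows, which is why a larger sample gives a narrower range around the estimate.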
The level of precision

• Suppose a researcher found that 60% of the health care
providers in the sample have adopted the new
protocol for T.B. case management

• Precision is often expressed in percentage points, e.g. ± 5%

• Thus the researcher can conclude that between 55%
and 65% of the HCPs adopted the new protocol for TB
case management
The level of confidence
• In a normal distribution, approximately 95% of the sample
values are within two standard deviations of the true
population value (e.g.: mean)

• Furthermore, the values obtained by these samples are
distributed normally about the true value, with some
samples having a higher value and some obtaining a lower
score than the true population value

• If a 95% confidence level is selected, 95 out of 100 samples will
have the true population value within the range of precision
specified earlier
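
The "95 out of 100 samples" statement can be illustrated with a small simulation. The sketch below is not from the lecture: it assumes a true proportion of 0.60, samples of 400 units and a fixed random seed, and it uses the simple normal-approximation interval around each sample proportion.

```python
import random


def coverage(true_p=0.60, n=400, repeats=2000, z=1.96, seed=1):
    """Share of repeated samples whose interval contains the true proportion."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(repeats):
        # Draw one sample of n units and compute its proportion.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        # Standard error of the sample proportion.
        se = (p_hat * (1 - p_hat) / n) ** 0.5
        if p_hat - z * se <= true_p <= p_hat + z * se:
            hits += 1
    return hits / repeats


rate = coverage()
print(f"share of samples whose interval contains the true value: {rate:.3f}")
```

With these assumed settings the share should come out close to 0.95, though the exact figure depends on the seed and the number of repeats.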
Figure 1.
Degree of variability
• The more heterogeneous the population, the larger
the sample size required

• The more homogeneous the population, the smaller
the sample size required

• A proportion of 50% indicates a greater level of
variability than 20% or 80%

• This is because 20% and 80% indicate that a large
majority do not or do have the attribute of interest, e.g.
disease
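
A quick way to see why 50% is the most variable case is to compute the product p × (1 − p), which is the variance of a yes/no attribute. The snippet is a minimal sketch; the function name is illustrative only.

```python
def binomial_variability(p: float) -> float:
    """Variance of a yes/no attribute with proportion p, i.e. p * q."""
    return p * (1 - p)


for p in (0.2, 0.5, 0.8):
    print(f"p = {p:.1f}: p*q = {binomial_variability(p):.2f}")
```

The product is largest at p = 0.5 and identical for 0.2 and 0.8, which matches the statement that 20% and 80% are equally (and less) variable.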
Conceptual framework of sampling
Firstly, the study of the whole population under
investigation is expected to generate the true
population parameters

Secondly, conduct the study on a portion of the
population in order to obtain results
approximating the true population parameters

Thirdly, the researcher is expected to calculate
a sample size, which would yield prevalence
approximating the true prevalence
Example:

• Consider the prevalence rate of Ankylostomiasis among school children
in Umbadda province

• In figures, if the school children population is 20 000 and all those
school children are studied, the prevalence rate then obtained
is 15/1000

• This is not feasible, as the cost of such a study is extremely
high and the time required to study the 20 000 school children is
impractical

• As long as studying the whole population is not possible,
the only option remaining is to conduct the study on a
portion of the population in order to obtain results
approximating the true population parameters
• Referring to the above-mentioned example, the
question is what portion of the population should be
selected to obtain the prevalence rate of
Ankylostomiasis among school children in Umbadda
province

• Logically, if only a portion of the population is included
in the study, the prevalence rate is not expected to
be exactly the same as the prevalence rate obtained if all school
children in Umbadda province are studied

• These two principles are the main pillars of
sampling techniques and sample size
Strategies for determining the sample
size

There are several approaches for determining
the sample size:

• Census for small populations
• Imitating a sample size of similar studies
• Using published tables
• Using formulas for calculation of sample
size
• Computerized statistical package
1. Census for small populations

• Entire population as the sample: total coverage
• Usually 200 or less
• Eliminates sampling error and provides data for
all individuals
• Achieve desirable level of precision
2. Using the sample size of similar
studies

• Using the same sample size as studies similar
to the one you plan

• Risk: repeating the errors made in determining
the sample size of another study
3. Using published tables
• Rely on published tables which provide the
sample size for a given set of criteria

4. Using formulas for calculation of the
sample size
• It is a widespread belief among researchers
that the bigger the sample, the better the study
becomes (This is not necessarily true)
• In general it is much better to increase the
accuracy of data collection and the quality of
the collected data rather than to increase the
sample size after a certain point

• It is better to make extra efforts to get a
representative sample rather than to get a very
large sample

• The eventual sample size is usually a
compromise between what is desirable and
what is feasible
• The feasible sample size is usually determined
by the availability of resources in terms of time,
personnel, transport and money

• In a descriptive study in a certain village, we
want to measure with a certain precision the
proportion of children aged 12-23
months who are vaccinated against measles
using a simple random sample
The following steps are used for
estimating the desirable sample size

• Estimate how big the proportion of
children who are immunized is (say 80%)

• Such an estimate is not made haphazardly but
is based on the results of previous studies in
the same area or a similar one
• Choose the confidence level with which you
want to state that the vaccination
coverage in the whole population is indeed
between 70% and 90%

• You can never be 100% sure. Do you want it to
be 90%, 95% or 99%?
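
To make the choice of confidence level concrete, the sketch below computes the sample size for the measles example (expected coverage 80%, range 70% to 90%, i.e. a precision of 0.10) at 90%, 95% and 99% confidence. It assumes the simple random sampling formula n = z²pq/d² used later in this lecture; the helper names are illustrative, and the z-values are derived with Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Two-sided standard normal deviate for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def required_n(p: float, d: float, confidence: float) -> float:
    """Sample size n = z^2 * p * q / d^2 for a simple random sample."""
    z = z_for_confidence(confidence)
    return z ** 2 * p * (1 - p) / d ** 2


sizes = {level: required_n(0.80, 0.10, level) for level in (0.90, 0.95, 0.99)}
for level, n in sizes.items():
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.3f}, n = {n:.1f}")
```

The higher the confidence demanded for the same range, the larger the sample that is needed.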
Steps in estimating sample size
• Identify major study variable
• Determine type of estimate (%, mean, ratio,...)
• Indicate expected frequency of factor of interest
• Decide on desired precision of the estimate
• Decide on acceptable risk that estimate will fall outside
its real population value
• Adjust for estimated design effect
• Adjust for expected response rate
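
The last two steps can be sketched as a simple adjustment of a sample size computed for simple random sampling. This is a common convention rather than something spelled out on the slide: multiply by the design effect and divide by the expected response rate. The design effect of 2.0 and the response rate of 90% below are hypothetical values chosen only for illustration.

```python
import math


def adjusted_sample_size(n_srs: float, design_effect: float = 1.0,
                         response_rate: float = 1.0) -> int:
    """Inflate a simple-random-sample size for design effect and non-response."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    # Round up so the adjusted size is never smaller than required.
    return math.ceil(n_srs * design_effect / response_rate)


adjusted = adjusted_sample_size(384, design_effect=2.0, response_rate=0.9)
print(adjusted)
```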
Sample size formulas

Simple random/systematic sampling

Cluster sampling

Sampling when population is known


n = sample size
z = the standard normal deviate (z = 1.96 for 95% confidence)
p = the frequency of occurrence of an event
q = 1-p (the frequency of non occurrence of an event)
d = degree of precision (e.g. 0.04, i.e. 4 percentage points)
• A researcher is intending to conduct a survey to
determine the prevalence of Ankylostomiasis
among school children in Umbadda Province.

• From previous surveys conducted in the same
province the prevalence rate of Ankylostomiasis
was found to be 100/1000 (10%). How will the
researcher calculate the estimated sample
size?
• Z = 1.96

• p = The frequency of occurrence of
Ankylostomiasis. From the previous surveys:
10% of the school children. In other words the
probability of having Ankylostomiasis is 0.1

• q = 1-0.1=0.9: the probability of not having
Ankylostomiasis among school children in
Umbadda Province is 0.9
• A researcher is intending to conduct a survey to
determine the prevalence of Diabetes among
adults in Omdurman Province.

• From previous surveys conducted in the same
province the prevalence rate of Diabetes was
found to be 20%. How will the researcher
calculate the estimated sample size?
Thus, applying the above sample size formula:

$$n = \frac{z^2 pq}{d^2}$$

$$n = \frac{1.96 \times 1.96 \times 0.2 \times 0.8}{0.04 \times 0.04} \approx 384$$
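
The same calculation can be written as a short Python function, shown here as a minimal sketch. It assumes the precision d is 0.04 as a proportion (4 percentage points), which is the value used in the worked example; the function name is illustrative.

```python
def srs_sample_size(p: float, d: float, z: float = 1.96) -> float:
    """Sample size for estimating a proportion: n = z^2 * p * q / d^2."""
    q = 1 - p
    return z ** 2 * p * q / d ** 2


n_diabetes = srs_sample_size(p=0.20, d=0.04)          # Diabetes example, p = 20%
n_ankylostomiasis = srs_sample_size(p=0.10, d=0.04)   # Ankylostomiasis example, p = 10%
print(f"Diabetes (p = 0.20): n = {n_diabetes:.2f}")
print(f"Ankylostomiasis (p = 0.10): n = {n_ankylostomiasis:.2f}")
```

For the Diabetes example the function returns about 384.16, which the slide rounds to 384; rounding up to the next whole unit is the more conservative choice.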
• If the researcher cannot find the prevalence of
Ankylostomiasis in previous studies, which is common
in practice, the only remaining option is to assume
that the expected prevalence is 50%

• z = 1.96

• p = the frequency of occurrence of Ankylostomiasis (assumed
to be 50% of the school children)

• In other words the probability of having Ankylostomiasis is 0.5

• q = 1-0.5=0.5: the probability of not having Ankylostomiasis
among school children in Umbadda province is 0.5
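
As a worked illustration of this case, and assuming the same precision d = 0.04 used in the lecture's other example (the slide does not state d here), the formula gives:

$$n = \frac{z^2 pq}{d^2} = \frac{1.96 \times 1.96 \times 0.5 \times 0.5}{0.04 \times 0.04} = \frac{0.9604}{0.0016} \approx 600$$

Because pq is largest at p = 0.5, this is the largest sample size the formula can return for a given z and d, which is why 50% is the safe assumption when no prior estimate exists.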
Sample size equation for reduction of
sample
Types of samples

• Non-probability samples (no sampling frame)

• Probability samples (sampling frame)


Non-probability samples
• The technique does not employ random
procedures and the selection of the sample
unit is not made by chance.

• Such samples are not representative of the
parent population

• The use of non-probability samples in
health and medical research is not desirable
as the generalization of the research results
would be impossible
Non probability samples
• Quota sample
• Sample reflects population structure
• population divided into categories, a quota is to
be selected from each category
• Time/resources constraints

• Convenience samples (purposive units)
• Include in the study those units accessible at the time
of data collection
• High risk of biased results
• Samples not representative of the population
Probability of being chosen : unknown
Samples do not involve any mathematical rules
Non probability Sample

• Purposive sample:
• Researcher includes specific units in the sample
which are believed to affect variables in the study

• Volunteer sample:
• Some surveys involve tests, therefore
those included in the study are the volunteers who accept
to take the test
Methods used in probability samples

• Simple random sampling
• Systematic sampling
• Stratified sampling
• Multistage sampling
• Cluster sampling
Probability samples
• Random sampling
• Each subject has a known probability or an
equal chance of being chosen or included in the
study
• The technique used employs random
procedures whereby the selection of the
sampling unit or element (individuals, objects,
villages) is done on the basis of chance.
• Reduces possibility of selection bias
• Allows application of statistical theory to results
1. Simple random sampling

• Principle
• Equal chance of drawing each unit

• Each individual is chosen randomly and
entirely by chance, such that each individual
has the same probability of being chosen at
any stage during the sampling process

• Procedure
• Number all units
• Randomly draw units
Simple random sampling

• Advantages
• Simple
• Sampling error easily measured

• Disadvantages
• Need complete list of units
• Does not always achieve best
representativeness
• Units may be scattered
Simple random sampling
Example: evaluate the prevalence of tooth decay
among the 1200 children attending a school

• List of children attending the school
• Children numbered from 1 to 1200
• Sample size = 100 children
• Random sampling of 100 numbers between 1
and 1200

How to randomly select?


Simple random sampling
Table of random numbers
57172 42088 70098 11333 26902 29959 43909 49607
33883 87680 28923 15659 09839 45817 89405 70743
77950 67344 10609 87119 15859 74577 42791 75889
11607 11596 01796 24498 17009 67119 00614 49529
56149 55678 38169 47228 49931 94303 67448 31286
80719 65101 77729 83949 83358 75230 56624 27549
93809 19505 82000 79068 45552 86776 48980 56684
40950 86216 48161 17646 24164 35513 94057 51834
12182 59744 65695 83710 41125 14291 74773 66391
13382 48076 73151 48724 35670 38453 63154 58116
38629 94576 48859 75654 17152 66516 78796 73099
60728 32063 12431 23898 23683 10853 04038 75246
01881 99056 46747 08846 01331 88163 74462 14551
23094 29831 95387 23917 07421 97869 88092 72201
15243 21100 48125 05243 16181 39641 36970 99522
53501 58431 68149 25405 23463 49168 02048 31522
07698 24181 01161 01527 17046 31460 91507 16050
22921 25930 79579 43488 13211 71120 91715 49881
68127 00501 37484 99278 28751 80855 02035 10910
55309 10713 36439 65660 72554 77021 46279 22705
92034 90892 69853 06175 61221 76825 18239 47687
50612 84077 41387 54107 09190 74305 68196 75634
81415 98504 32168 17822 49946 37545 47201 85224
38461 44528 30953 08633 08049 68698 08759 45611
07556 24587 88753 71626 64864 54986 38964 83534
60557 50031 75829 05622 30237 77795 41870 26300
EPITABLE: random number listing

Also possible in Excel
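
The draw in the tooth decay example (100 children out of 1200) can also be done in Python. The snippet below is a minimal sketch using the standard `random` module; the seed is fixed only so that the draw can be reproduced and is otherwise arbitrary.

```python
import random

rng = random.Random(2024)          # arbitrary seed, for a reproducible draw
child_numbers = range(1, 1201)     # children numbered from 1 to 1200

# Draw 100 distinct numbers; every child has the same chance of selection.
selected_children = sorted(rng.sample(child_numbers, 100))
print(selected_children[:10])
```

`random.sample` draws without replacement, so no child can be selected twice.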


2. SYSTEMATIC SAMPLE

This is defined as the process in which the
selection is done systematically according
to a list of the targeted population
FIRST: determine the sampling interval, which is
symbolized by “k” (the population size divided by
the desired sample size)

SECOND: randomly select a number between 1 and k,
and include that person in your sample

THIRD: also include every kth element after it in your sample; for
example, if k is 10 and your randomly selected number
between 1 and 10 was 5, then you will select persons 5,
15, 25, 35, 45, etc. When you get to the end of your
sampling frame you will have all the people to be
included in your sample
Systematic sampling

• N = 1200, and n = 60
→ sampling interval = 1200/60 = 20
• List persons from 1 to 1200
• Randomly select a number between 1 and
20 (ex : 8)
→ 1st person selected = the 8th on the
list
→ 2nd person = 8 + 20 = the 28th
etc.
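
The procedure in this example can be sketched in Python as follows. The function is illustrative rather than a standard routine; it assumes the population size is an exact multiple of the sample size, as in the slide (1200 and 60).

```python
import random


def systematic_sample(population_size, sample_size, start=None, rng=None):
    """Select every k-th position from a list numbered 1..population_size."""
    k = population_size // sample_size           # sampling interval
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, k)                # random start between 1 and k
    return list(range(start, population_size + 1, k))[:sample_size]


# Reproduce the slide's example: N = 1200, n = 60, random start = 8.
selected = systematic_sample(1200, 60, start=8)
print(selected[:5], "...", selected[-1])
```

Passing `start=8` fixes the random start so the result matches the example; leaving it out draws the start at random.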
Systematic sampling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

31 32 33 34 35 36 37 38 39 40 41 42 43 44 45

46 47 48 49 50 51 52 53 54 55 ……..
3. STRATIFIED SAMPLE

• In this type of sampling, the population is subdivided
into strata

• Thus for the sample to be representative, it should be
drawn from each stratum separately

• If the sample is drawn from the population without
considering the stratification, then the whole sample
may be drawn from one stratum rather than all strata
• First, stratify your sampling frame (e.g., divide it
into the males and the females if you are using
gender as your stratification variable).

• Second, take a random sample from each
group

• There are actually two different types of
stratified sampling
• The first type and most common is called
proportional stratified sampling

• In proportional stratified sampling you must make sure
the subsamples (e.g., the samples of males and
females) are proportional to their sizes in the
population

• The second type of stratified sampling is called
disproportional stratified sampling

• In disproportional stratified sampling, the subsamples
are not proportional to their sizes in the population
• Here is an example: assume that your population is
75% female and 25% male

• Assume also that you want a sample of size 100 and
you want to stratify on the variable called gender

• For proportional stratified sampling, you would
randomly select 75 females and 25 males from the
population

• For disproportional stratified sampling, you might
randomly select 50 females and 50 males from the
population
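
The gender example can be sketched in Python as below. The sampling frame of 750 females and 250 males is hypothetical, invented only so that the code has something to draw from; the 75%/25% split and the sample size of 100 come from the example.

```python
import random


def proportional_allocation(shares, total):
    """Split a total sample size across strata according to population shares."""
    return {stratum: round(share * total) for stratum, share in shares.items()}


def stratified_sample(frame, allocation, rng):
    """Draw a simple random sample of the allocated size within each stratum."""
    return {stratum: rng.sample(frame[stratum], size)
            for stratum, size in allocation.items()}


shares = {"female": 0.75, "male": 0.25}
# Hypothetical sampling frame, already divided by the stratification variable.
frame = {"female": [f"F{i}" for i in range(1, 751)],
         "male": [f"M{i}" for i in range(1, 251)]}

rng = random.Random(7)  # arbitrary seed for reproducibility
proportional = proportional_allocation(shares, 100)
disproportional = {"female": 50, "male": 50}

sample_a = stratified_sample(frame, proportional, rng)
sample_b = stratified_sample(frame, disproportional, rng)
print(proportional, {s: len(v) for s, v in sample_b.items()})
```

With other shares the rounded allocations may not add up exactly to the total, so the sizes should be checked before drawing.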
Stratified sampling
• Principle:

• Particularly useful when one is
interested in analyzing data by a certain
characteristic

• Classify population into internally
homogeneous subgroups (strata), e.g.
age group: 0-10, 11-20, 21-30

• Draw a sample from each stratum

• Combine results of all strata


Stratified sampling
• Advantages
• All subgroups represented, allowing separate
conclusions about each of them

• Disadvantages
• Sampling error difficult to measure
• Loss of precision if very small numbers
sampled in individual strata
Example: Stratified sampling

• Determine vaccination coverage in Sudan
• One sample drawn in each State (stratum)
• Estimates calculated for each stratum
• Each stratum weighted to obtain estimate for
Sudan (average)
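
The weighting step can be sketched as a population-weighted average of the stratum estimates. The three states, their population sizes and their coverage figures below are entirely hypothetical and serve only to show the arithmetic.

```python
def weighted_estimate(strata):
    """Population-weighted average of stratum estimates.

    strata maps a stratum name to (population size, estimate in that stratum).
    """
    total_population = sum(pop for pop, _ in strata.values())
    return sum(pop * est for pop, est in strata.values()) / total_population


# Hypothetical strata: (population, estimated vaccination coverage).
strata = {
    "State A": (2_000_000, 0.80),
    "State B": (1_000_000, 0.60),
    "State C": (1_000_000, 0.70),
}

national = weighted_estimate(strata)
simple_mean = sum(est for _, est in strata.values()) / len(strata)
print(f"weighted estimate = {national:.3f}, unweighted mean = {simple_mean:.3f}")
```

The weighted figure differs from the plain average of the three estimates because the largest stratum counts for more.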
4. MULTI-STAGE SAMPLE

• This is a sample in which the parent population
is divided into large units, from which a first
stage sample is drawn randomly

• Then a second stage sample is drawn from
those sample units selected in the first stage
Multiple stage sampling
Principle
• consecutive samplings
• example:
sampling unit = household

• 1st stage : drawing areas or blocks
• 2nd stage : drawing buildings, houses
• 3rd stage : drawing households
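
The three stages can be sketched in Python with a nested, made-up sampling frame. Everything about the frame (10 areas, 5 buildings per area, 6 households per building) and the numbers drawn at each stage is hypothetical.

```python
import random


def multistage_sample(areas, n_areas, n_buildings, n_households, rng):
    """Draw areas, then buildings within them, then households within those."""
    households = []
    for area in rng.sample(sorted(areas), n_areas):                    # 1st stage
        buildings = areas[area]
        for building in rng.sample(sorted(buildings), n_buildings):    # 2nd stage
            households.extend(rng.sample(buildings[building], n_households))  # 3rd stage
    return households


# Hypothetical frame: area -> building -> list of household identifiers.
areas = {
    f"area-{a}": {
        f"building-{a}-{b}": [f"hh-{a}-{b}-{h}" for h in range(1, 7)]
        for b in range(1, 6)
    }
    for a in range(1, 11)
}

rng = random.Random(3)  # arbitrary seed for reproducibility
chosen = multistage_sample(areas, n_areas=3, n_buildings=2, n_households=2, rng=rng)
print(len(chosen), chosen[:3])
```

Only the selected areas and buildings need a detailed list of their contents, which is the practical appeal of sampling in stages.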
5. Cluster sampling

• Principle

• Random sample of groups (“clusters”) of
units

• In selected clusters, all units or a proportion
(sample) of units are included
Cluster sampling
In cluster sampling, we follow these steps:
• Divide population into clusters (usually along
geographic boundaries)
• Randomly sample clusters
• Measure all units within sampled clusters

• In single-stage cluster sampling, all the elements
from each of the selected clusters are used

• In two-stage cluster sampling, a random sampling
technique is applied to the elements from each of the
selected clusters
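
The difference between single-stage and two-stage cluster sampling can be sketched as follows. The five sections of 20 units each are a hypothetical frame, and the function is illustrative rather than a standard routine.

```python
import random


def cluster_sample(clusters, n_clusters, rng, units_per_cluster=None):
    """Randomly select clusters; keep all units or a random subsample of each."""
    selected = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in selected:
        units = clusters[name]
        if units_per_cluster is None:
            sample[name] = list(units)                           # single-stage
        else:
            sample[name] = rng.sample(units, units_per_cluster)  # two-stage
    return sample


# Hypothetical population: 5 sections (clusters) of 20 units each.
clusters = {f"Section {i}": [f"s{i}-unit{j}" for j in range(1, 21)]
            for i in range(1, 6)}

rng = random.Random(11)  # arbitrary seed for reproducibility
single_stage = cluster_sample(clusters, 2, rng)
two_stage = cluster_sample(clusters, 2, rng, units_per_cluster=5)
print({k: len(v) for k, v in single_stage.items()})
print({k: len(v) for k, v in two_stage.items()})
```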
Cluster sampling
• The main difference between cluster sampling
and stratified sampling is that in cluster
sampling the cluster is treated as the
sampling unit

• While in stratified sampling analysis is done on
the elements within the strata
Example: Cluster sampling
[Figure: area divided into five clusters, labelled Section 1 to Section 5]
Cluster sampling
• Advantages

• Simple as complete list of sampling units within
population not required

• Less travel/resources required

• Disadvantages

• Sampling error difficult to measure


Selecting a sampling method

• Population to be studied
• Size/geographical distribution
• Heterogeneity with respect to variable

• Level of precision required

• Resources available

• Importance of having a precise estimate of
the sampling error
EPITABLE: Cluster sample size
calculation
Conclusions

• Probability samples are the best
• Beware of …
• refusals
• absentees
• “do not know”
Conclusions

• If in doubt…

Call a statistician !!!!
