Research Methodology: NOVEMBER 26, 2010
Step 1: Define the aims of the
study
• Write out the problem and primary and
secondary aims using one sentence per
aim. Formulate a plan for the statistical
analysis of each aim. For example,
prepare a set of dummy tables
Step 2: Define the variables
to be collected
• Write a detailed list of the information to be
collected and the concepts to be measured
in the study. Are you trying to identify:
– Attitudes
– Needs
– Behavior
– Demographics
– Some combination of these concepts
Step 3: Review the literature
• Review current literature to identify related
surveys and data collection instruments
that have measured concepts similar to
those related to your study’s aims.
• Saves development time and allows for
comparison with other studies if used
appropriately.
Step 4: Specify the Type of
Interviewing method
• Determine how the questionnaire will be
administered: self-administered, telephone
survey, face-to-face interview, web-based
survey.
• At the top, clearly state:
– The purpose of the study
– How the data will be used
– Instructions on how to fill out the questionnaire
– Your policy on confidentiality
Step 5: Decide on the question
structure, wordings etc.
• There should be a logical progression such that the informant is drawn into the interview by awakening his/her interest. For this, the introductory item should be as attention-catching as possible: start with neutral (but relevant) items rather than controversial ones.
• Move from simple to complex questions. Time-consuming questions should come neither too early nor too late.
• The respondent should not be affronted by an early and sudden request for personal information.
• Avoid ambiguous and complex wording: avoid vague words such as “usually”, “normally”, “often” and “regularly”.
• Never ask an embarrassing question without giving the respondent an opportunity to explain. This should be done whether or not the researcher is interested in the “why” question. For example, if the question “Did you vote in the last local election?” is asked, it should be followed, for a respondent who answers no, by “Why did you not vote?”
• Respondents must be brought as
smoothly as possible from one frame of
reference to another rather than made to
jump back and forth.
• No interview or questionnaire should be completed without an expression of appreciation for the effort put in by the respondents.
Step 6: Design the Questions
• Question: How many cups of coffee or tea
do you drink in a day?
• Principle: Ask for an answer in only one
dimension.
• Solution: Separate the question into two
– (1) How many cups of coffee do you drink
during a typical day?
– (2) How many cups of tea do you drink during a
typical day?
Step 6: Design the Questions
• Question: What brand of computer do you own?
– (A) IBM PC
– (B) Apple
• Principle: Avoid hidden assumptions. Make sure to
accommodate all possible answers.
• Solution:
– (1) Make each response a separate dichotomous item
• Do you own an IBM PC? (Circle: Yes or No)
• Do you own an Apple computer? (Circle: Yes or No)
– (2) Add necessary response categories and allow for
multiple responses.
• What brand of computer do you own? (Circle all that apply)
– Do not own computer
– IBM PC
– Apple
– Other
Step 6: Design the Questions
• Question: Have you had pain in the last
week?
[ ] Never [ ] Seldom [ ] Often [ ] Very
often
• Principle: Make sure question and answer
options match.
• Solution: Reword either question or answer
to match.
– How often have you had pain in the last week?
[ ] Never [ ] Seldom [ ] Often [ ] Very Often
Step 6: Design the Questions
• Question: Where did you grow up?
– Country
– Farm
– City
• Principle: Avoid questions having non-
mutually exclusive answers.
• Solution: Design the question with
mutually exclusive options.
– Where did you grow up?
• House in the country
• Farm in the country
• City
Step 6: Design the Questions
• Question: Are you against drug abuse?
(Circle: Yes or No)
Step 6: Design the Questions
• Question: […] household annual income comes from:
– Agricultural wages
– Livestock
– Non-agricultural wages
– Salary
Step 6: Design the Questions
• Question: Which one of the following do you think
increases a person’s chance of having a heart attack the
most? (Check one.)
[ ] Smoking [ ] Being overweight [ ] Stress
• Principle: Encourage the respondent to consider each
possible response to avoid the uncertainty of whether a
missing item may represent either an answer that does not
apply or an overlooked item.
• Solution: Which of the following increases the chance of
having a heart attack?
– Smoking: [ ] Yes [ ] No [ ] Don’t know
– Being overweight: [ ] Yes [ ] No [ ] Don’t know
– Stress: [ ] Yes [ ] No [ ] Don’t know
Step 6: Design the Questions
• Question:
– (1) Do you currently have a life insurance policy?
(Circle: Yes or No)
– If no, go to question 3.
– (2) How much is your annual life insurance
premium?
• Principle: Avoid branching as much as
possible to avoid confusing respondents.
• Solution: If possible, write as one question.
– How much did you spend last year for life
insurance? (Write 0 if none).
Step 7: Revise
• Shorten the set of questions for the study. If
a question does not address one of your
aims, discard it.
• Refine the questions and their wording by
testing them with a variety of respondents.
– Ensure the flow is natural.
– Verify that terms and concepts are familiar and
easy to understand for your target audience.
– Keep recall to a minimum and focus on the
recent past.
Step 8: Assemble the final
questionnaire
• Include identifying data on each page of a multi-
page questionnaire such as a respondent ID
number in case the pages separate.
• Group questions concerning major subject areas
together and introduce them by heading or short
descriptive statements.
• Ordering of questions should serve to stimulate
recall.
• Ordering and formatting of questions should be
unbiased and balanced.
• Include white space to make answers clear and to
help increase response rate.
• Space response scales widely enough so that it is
easy to circle or check the correct answer without
the mark accidentally including the answer above
or below.
– Open-ended questions: the space for the response
should be big enough to allow respondents with large
handwriting to write comfortably in the space.
– Closed-ended questions: line up answers vertically and
precede them with boxes or brackets to check, or by
numbers to circle, rather than open blanks.
• Use larger font size (e.g., 14) and high contrast
(black on white).
Conclusions
• You need plenty of time!
– Design your questionnaire from research hypotheses
that have been carefully studied and thought out.
– Discussing the research problem with colleagues and
subject matter experts is critical to developing good
questions.
– Review, revise and test the questions on an iterative
basis.
– Examine the questionnaire as a whole for flow and
presentation.
HOW TO COLLECT
PRIMARY DATA?
How to Collect Data from a Survey?
Two ways of data collection:
• Census
- An attempt to contact every individual in the entire population
- Costly in time and money; practical only for very small populations
- Populations rarely stand still: the population changes while the census is being taken
• Sampling (discussed in detail below)
Defining Population
• Population : an aggregate of objects,
animate or inanimate under study
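Sampling and representativeness
[Figure: the sampling process draws a sample from the sampling (accessible) population, which is part of the target population. Statistics computed from the sample data are known; they are used to estimate the unknown population parameters and to draw conclusions.]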
Population versus sample
• Population: the entire group of individuals in which we are interested but can’t usually assess directly.
• Sample: the part of the population we actually examine and for which we do have data.
• A parameter is a number describing a characteristic of the population. The parameter value for the population is a fixed number but is sometimes unknown, which is why we try to estimate it.
• A statistic is a number describing a characteristic of a sample. The value of a statistic is always known, as it comes from a known sample, but its value changes from sample to sample.
• We use the statistic to estimate the unknown population parameter.
Why sampling, not census?
• To get information from large populations
• In some situations a census is not practical:
- If testing is destructive, for example testing the quality of milk or a chemical salt by analysis, the breaking strength of chalk, crackers and explosives, or the life of an electric tube or bulb.
- If the population is hypothetical, as in a coin-tossing problem where the process may continue indefinitely. Sampling is then the only scientific method of estimating the parameters of the universe.
Sampling: Balance between
Precision and cost
[Figure: the trade-off between precision and cost]
Principal Steps in a Sample Survey
• Define the objectives of the survey
• Define the target population
• Prepare the questionnaire or schedule
Step 3. Select a sampling frame
Types of Sampling
• Probability sampling
• Nonprobability sampling
Probability Sampling
A probability provides a quantitative description of the likely occurrence of a particular event.
Probability sampling is the scientific method of selecting samples according to some law of chance, in which each unit in the population has a definite, pre-assigned, known, non-zero probability of being selected in the sample.
The pre-assigned probabilities may be equal for all units, may differ from unit to unit, or may be proportional to the size of the unit.
Types of Probability Sampling
• Stratified sampling
• Cluster sampling
Simple Random Sampling (SRS)
[Figure: a random subsample drawn from a list of clients]
Types of SRS
SRS can be used in two ways:
1. SRS with replacement (SRSWR): each drawn unit is returned to the population before the next draw, so the population size is the same at every draw.
2. SRS without replacement (SRSWOR): a unit, once selected, is not returned to the population and cannot be selected again.
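The two variants can be sketched in a few lines of Python. This is a minimal illustration only: the 20-unit frame, the seed and the function names are assumptions made for the example, not part of the lecture.

```python
import random

def srs_with_replacement(units, n, rng):
    """Draw n units; each unit is returned to the population after it is drawn."""
    return [rng.choice(units) for _ in range(n)]

def srs_without_replacement(units, n, rng):
    """Draw n distinct units; a selected unit cannot be drawn again."""
    return rng.sample(units, n)

rng = random.Random(2010)            # fixed seed so the sketch is repeatable
population = list(range(1, 21))      # hypothetical frame of 20 numbered units

srswr = srs_with_replacement(population, 8, rng)
srswor = srs_without_replacement(population, 8, rng)
print("SRSWR :", srswr)    # may contain repeated units
print("SRSWOR:", srswor)   # never contains repeats
```

Under SRSWR the same unit can appear more than once in the sample; under SRSWOR every selected unit is distinct.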
How to Choose a Random Sample
Procedure: use a table of random numbers, a computer random number generator or a mechanical device.
Step I: Assign each unit within the sampling frame a unique number from 1 to N (the size of the sampled population).
Step II: Identify a random start in the random number table.
Step III: Determine how the digits in the random number table will be assigned to the sampling frame.
Step IV: Select the sample elements from the sampling frame.
A Practical Example
Problem: Draw a simple random sample without replacement (SRSWOR) of size 10 from a
population of size 1000.
Solution:
Step I: Assign each unit within the sampling frame a unique number from
1 to 1000.
Step II: Identify a random start from the random
number table say we start from.
Step III: Determine how the digits in the random
number table will be assigned to the
sampling frame.
Step IV: Select the sample elements from the sampling
frame.
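The four steps can be mimicked in Python. This is a sketch under stated assumptions: a computer-generated digit stream stands in for the printed random number table, digits are read in groups of three, and the group 000 is taken to mean unit 1000. The function name and the seed are hypothetical.

```python
import random

def draw_from_digit_stream(digits, population_size, sample_size, width):
    """Read the digit stream in groups of `width` digits and map each group to a unit.

    The all-zero group stands for the last unit (e.g. 000 -> unit 1000);
    groups that fall outside the frame or repeat an earlier unit are skipped.
    """
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        unit = number if number != 0 else 10 ** width
        if unit <= population_size and unit not in chosen:
            chosen.append(unit)
        if len(chosen) == sample_size:
            return chosen
    raise ValueError("digit stream too short for the requested sample")

rng = random.Random(78)  # arbitrary seed standing in for a random start in the table
digit_stream = "".join(rng.choice("0123456789") for _ in range(600))
sample = draw_from_digit_stream(digit_stream, population_size=1000, sample_size=10, width=3)
print(sample)
```

Skipping repeated numbers is what makes the draw "without replacement".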
Merits and Demerits of SRS
Merits:
1. Eliminates the element of subjectivity or personal bias, since every unit has an equal probability of selection
2. Simple statistical analysis
Demerits:
1. Problem with the sampling frame: an up-to-date frame is needed but is difficult to get
2. Useful only in small and/or homogeneous areas
3. Requires a larger sample size
4. Time consuming/inefficient
Stratified Random Sampling
A population is divided into mutually
exclusive subpopulations of known size, and
a simple random sample is selected in each
subpopulation.
Each subpopulation is called a stratum.
The criterion that enables us to classify the sampling units into different strata is called the stratifying factor (s.f.). For example: age, sex, educational or income level, geographical area, economic status and so on.
• A s.f. is effective if it divides the given population into strata such that the units are
- homogeneous within each stratum
- heterogeneous (unlike) between different strata
• Suppose a farmer wishes to work out the average milk yield of each cow type in his herd, which consists of Ayrshire, Friesian, Galloway and Jersey cows. He could divide his herd into the four sub-groups and take samples from these (Easton and McColl 2004).
Stratified Random Sampling
[Figure: the population is divided into four strata of sizes N1, N2, N3 and N4; samples of sizes n1, n2, n3 and n4 are drawn from Stratum 1 to Stratum 4 respectively]
N = N1 + N2 + N3 + N4
n = n1 + n2 + n3 + n4
Allocation of Sample Size
Proportional Allocation
Here the sample size allocated to each stratum is directly proportional to the population size of that stratum, i.e. the larger the population of a stratum, the larger the sample drawn from it.
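In symbols, proportional allocation gives stratum h a sample of n_h = n × N_h / N. A minimal Python sketch follows. The herd sizes are invented for illustration, and the largest-remainder rounding is one possible choice for making the allocations add up to n, not something the lecture prescribes.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate a total sample of n to strata in proportion to their sizes N_h.

    Uses n_h = n * N_h / N, rounded with the largest-remainder rule so that
    the allocations add up to exactly n.
    """
    total = sum(stratum_sizes.values())
    exact = {h: n * size / total for h, size in stratum_sizes.items()}
    allocation = {h: int(x) for h, x in exact.items()}   # round down first
    leftover = n - sum(allocation.values())
    # hand the remaining units to the strata with the largest fractional parts
    by_remainder = sorted(exact, key=lambda h: exact[h] - allocation[h], reverse=True)
    for h in by_remainder[:leftover]:
        allocation[h] += 1
    return allocation

herd = {"Ayrshire": 120, "Friesian": 200, "Galloway": 50, "Jersey": 130}  # invented sizes
print(proportional_allocation(herd, 50))
```

With these invented sizes (N = 500, n = 50) every stratum is sampled at the same rate of one cow in ten.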
Systematic Random Sampling
Procedure
• Identify the total number of elements in the population and number the units from 1 to N
• Decide on the desired sample size n
• Compute the sampling interval k = N/n
• Randomly select a starting number from 1 to k
• Draw the sample by choosing every kth entry from that starting number onward
Systematic Random Sampling: example
[Figure: the units of the population listed as the numbers 1 to 100]
• N = 100
• Want n = 20
• N/n = 5
• Select a random number from 1 to 5: chose 4
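The N = 100, n = 20 example can be reproduced with a short Python sketch. The function name and its parameters are assumptions made for illustration, and the code assumes N is a multiple of n.

```python
import random

def systematic_sample(N, n, start=None, rng=random):
    """Select every k-th unit from a frame numbered 1..N, where k = N // n.

    `start` is the random start between 1 and k; it is drawn at random if not given.
    Assumes N is a multiple of n, as in the N = 100, n = 20 example.
    """
    k = N // n
    if start is None:
        start = rng.randint(1, k)
    return list(range(start, N + 1, k))

sample = systematic_sample(100, 20, start=4)
print(sample)   # units 4, 9, 14, ..., 99
```

With the random start 4 and interval 5, the sample consists of the 20 units 4, 9, 14 and so on up to 99.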
Use of Systematic Sampling
1. Often used in industry, where an item is selected for testing
from a production line (say, every fifteen minutes) to ensure
that machines and equipment are working to specification.
Alternatively, the manufacturer might decide to select every
20th item on a production line to test for defects and quality.
This technique requires the first item to be selected at
random as a starting point for testing and, thereafter, every
20th item is chosen.
Demerits of Systematic Sampling
1. Biased if the sampling interval coincides with a periodic pattern in the population, e.g. an environmental or vegetation pattern
2. Not in general random, since the requirement of a randomly arranged frame is rarely fulfilled
3. If N is not a multiple of n, the actual sample size may differ from that required
[Figure: sampling layouts compared: systematic, random and stratified]
Cluster sampling
- is a simple random sample of groups or clusters of elements, i.e. a sampling technique where the entire population is divided into groups, or clusters, and a random sample of these clusters is selected. All observations in the selected clusters are included in the sample.
- Every element should have a specified (equal) chance of being selected into the final sample.
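A minimal Python sketch of one-stage cluster sampling, assuming a hypothetical frame of six blocks with four households each. The block contents, the seed and the function name are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Select n_clusters clusters by SRSWOR and keep every element in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    elements = [e for name in chosen for e in clusters[name]]
    return chosen, elements

# Hypothetical frame: six blocks with four households each
blocks = {f"Block {b}": [f"{b}{i}" for i in range(1, 5)] for b in "ABCDEF"}
chosen, households = cluster_sample(blocks, 2, random.Random(101))
print(chosen)
print(households)
```

Only a list of the blocks is needed to draw the sample; households are listed only inside the blocks that were selected.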
[Figure: a population divided into clusters labelled Block A to Block F]
Use of Cluster sampling
This procedure is useful when
- it is difficult and costly to develop a complete list of the population members, but a complete list of groups or 'clusters' of the population can be obtained
- the survey involves a large, geographically dispersed population, as in surveys of developing societies
- one wants to sample economically while retaining the characteristics of a probability sample.
Merits & Demerits of Cluster
sampling
Merits
1. Requires less prior information than stratified sampling.
Demerits
1. Cluster sampling increases sampling error, because there are probably similarities among cluster members.
Cautions while using Cluster
sampling
1. Clusters should be as small as possible, consistent with the cost and limitations of the survey.
2. The number of sampling units in each cluster should be approximately the same.
3. Cluster sampling is not recommended when sampling areas in a city where there are private residential houses, business and industrial complexes, apartment buildings, etc., with widely varying numbers of persons or households.
Multistage sampling
Sampling is carried out in stages.
- First-stage units are selected by some suitable method of sampling.
- From among the selected first-stage units, a sub-sample of second-stage units is drawn by some suitable method of sampling, which may be the same as or different from the method used in selecting the first-stage units, and so on.
Any one of the previous sampling schemes can be applied during each stage.
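A two-stage version can be sketched in Python, assuming SRSWOR is used at both stages. The villages, households, seed and function name are hypothetical; any other scheme could be substituted at either stage.

```python
import random

def two_stage_sample(frame, n_first, n_second, rng):
    """Stage 1: SRSWOR of first-stage units. Stage 2: SRSWOR within each selected unit."""
    first_stage = rng.sample(sorted(frame), n_first)
    return {unit: rng.sample(frame[unit], min(n_second, len(frame[unit])))
            for unit in first_stage}

# Hypothetical frame: 5 villages, each listing 10 households
villages = {f"village_{v}": [f"v{v}_hh{h}" for h in range(1, 11)] for v in range(1, 6)}
sample = two_stage_sample(villages, n_first=2, n_second=3, rng=random.Random(106))
for village, households in sample.items():
    print(village, households)
```

The household lists are consulted only for the villages selected in the first stage, which is where the cost saving comes from.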
[Figure: multi-stage sampling. In the first stage, units are drawn from the sampled population by SRSWOR; the labels N’ and N’’ appear in the original figure.]
Merits and Demerits of
Multistage sampling
Merits
1. More flexible
2. Simple to carry out
3. Administrative convenience: permits the field work to be concentrated while still covering a large area
4. Cost effective: a second-stage frame is needed only for those units selected in the first-stage sample
Demerits
1. Less efficient
Types of Nonprobability Sampling
• Convenience
• Judgemental
• Quota
• Snowball
Accidental, Haphazard or
Convenience Sampling
- Attempts to obtain a sample of people or units that are most convenient
Examples: “man on the street” interviews, mental health branch students, available or accessible clients, volunteer samples, etc.
• Least expensive and least time consuming; the sampling units are accessible, easily measured and cooperative.
• Problem: we have no evidence of representativeness
Judgemental Sampling
- is a form of convenience sampling in which the
population elements are selected based on the
judgement of the researcher about some
appropriate characteristics required from the
sample member
Examples: Test markets selected to determine the
potential of a new product, expert witnesses used
in court, purchase engineers selected in industrial
marketing research because they are considered
to be representative of the company
Quota Sampling
- May be viewed as two-stage restricted judgemental sampling: the first stage consists of developing control categories, or quotas, of population elements, and in the second stage sample elements are selected based on convenience or judgement.
- People are selected non-randomly according to some quotas
Example: very effective in magazine readership studies
• Requires that the various subgroups in a population are represented.
• It should not be confused with stratified sampling.
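The two stages can be sketched in Python as follows. This is an illustration only: the quotas, the stream of passers-by and the function name are invented, and respondents are taken in order of arrival to mimic convenience selection.

```python
def quota_sample(respondents, quotas, group_of):
    """Fill each quota with the first respondents encountered from that group.

    Selection within a group is by convenience (order of arrival), not at random.
    """
    remaining = dict(quotas)
    selected = []
    for person in respondents:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            selected.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break   # every quota is full
    return selected

# Invented stream of passers-by: (name, age group)
stream = [("p1", "under 30"), ("p2", "under 30"), ("p3", "30 plus"),
          ("p4", "under 30"), ("p5", "30 plus"), ("p6", "30 plus")]
picked = quota_sample(stream, {"under 30": 2, "30 plus": 2}, group_of=lambda p: p[1])
print(picked)
```

Unlike stratified sampling, nothing here is drawn at random within a group, so selection probabilities are unknown.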
Snowball Sampling
• An initial group of respondents is selected randomly. Subsequent respondents are selected based on the referrals or information provided by the initial respondents.
• One person recommends another, who recommends another, who recommends another, etc.
• A good way to identify hard-to-reach populations, for example homeless persons or drug users
• Advantage: it substantially increases the likelihood of locating the desired characteristic in the population, resulting in low sampling variance and cost.
Choosing Nonprobability vs.
Probability Sampling
The choice should be based on:
1. Nature of research: e.g. in exploratory research, the findings are treated as preliminary and the use of probability sampling may not be warranted. On the other hand, in conclusive research, where the researcher wishes to use the results to estimate overall market shares or the size of the total market, probability sampling is favored.
2. Relative magnitude of nonsampling vs. sampling errors: if nonsampling error is an important factor, then nonprobability sampling may be preferable, as the use of judgment may allow greater control over the sampling process.
3. Variability in the Population: A more
heterogeneous population would favor probability
sampling as it would be more important to secure
a representative sample.
4. Statistical considerations: As probability
sampling is the basis of most common statistical
techniques, it is preferable from a statistical point
of view.
5. Operational considerations: Probability sampling is sophisticated, requires statistically trained researchers, costs more and takes longer, so it is more difficult to use in practice.
Choosing Nonprobability vs.
Probability Sampling: Summary
Conditions favoring the use of nonprobability vs. probability sampling
Factors | Nonprobability sampling | Probability sampling
Nature of research | Exploratory | Conclusive
Relative magnitude of sampling and nonsampling errors | Nonsampling errors are larger | Sampling errors are larger
Variability in the population | Homogeneous (low) | Heterogeneous (high)
Statistical considerations | Unfavorable | Favorable
Operational considerations | Favorable | Unfavorable
Step 5: Determine sample size
Factors influencing sample size include:
1. Importance of the decisions
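Formula for Determining Sample Size with an Example
Example: determining a sample size to estimate the mean number of years of schooling completed by persons with foreign-born parents.
• Confidence level = 95%, so the interval is $\bar{X} \pm 1.96\,\sigma/\sqrt{N}$
• Accuracy level = 0.1
• $\sigma$ (standard deviation of the population) = 2.5
$$0.1 = 1.96\,\frac{\sigma}{\sqrt{N}} = 1.96\,\frac{2.5}{\sqrt{N}}$$
$$\sqrt{N} = \frac{1.96 \times 2.5}{0.1} = 49$$
$$N = 2{,}401$$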
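The same calculation can be checked in Python. This is a sketch: the function name and its string arguments are assumptions, and `Decimal` is used only so that the arithmetic is exact and rounding up does not overshoot.

```python
from decimal import Decimal, ROUND_CEILING

def sample_size_for_mean(sigma, margin, z="1.96"):
    """Smallest N with z * sigma / sqrt(N) <= margin, i.e. (z * sigma / margin)^2 rounded up.

    Arguments are passed as strings so Decimal can represent them exactly.
    """
    n = (Decimal(z) * Decimal(sigma) / Decimal(margin)) ** 2
    return int(n.to_integral_value(rounding=ROUND_CEILING))

print(sample_size_for_mean(sigma="2.5", margin="0.1"))   # 2401
```

Because N grows with the square of σ/accuracy, halving the accuracy level (a tighter margin) quadruples the required sample size.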
Sampling Variability
• When sampling from a population, statistics vary from
sample to sample.
• Ideally we would like the values of the statistic to
randomly fluctuate around the true parameter value.
i.e. not to always be higher or always be lower than
the true parameter value.
• If the statistic does not randomly fluctuate, we say the
statistic is biased.
– Bias is a consistent repeated deviation of the
sample statistic from the population parameter in
the same direction when we take many samples.
We don’t want a sample that favors one result.
• Variability describes how spread out the values of the sample statistic are. Large variability means that the result of the sampling is not repeatable.
Sampling error
• Sampling errors originate in sampling: they arise because only a part of the population has been used to estimate population parameters and draw inferences about the population
• No sample is the exact mirror image of the population
• The magnitude of the error can be measured in probability samples
• Expressed by the standard error
– of a mean, proportion, difference, etc.
• A function of
– the amount of variability in the factor of interest being measured
– the sample size
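For a sample mean, the standard error is estimated as s/√n, where s is the sample standard deviation and n the sample size. A minimal Python sketch, with invented measurements and a hypothetical function name:

```python
import math
import statistics

def standard_error_of_mean(values):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

sample = [2, 4, 4, 4, 5, 5, 7, 9]        # invented measurements
print(round(standard_error_of_mean(sample), 3))
# Repeating the same values four times keeps the spread roughly the same
# but quadruples n, so the standard error is roughly halved.
print(round(standard_error_of_mean(sample * 4), 3))
```

The sketch shows both dependencies listed above: more variability raises the standard error, and a larger sample lowers it.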
Non-Sampling error
Non-sampling errors arise at the stages of observation, ascertainment and processing of the data, and can occur at every stage of the planning or execution of a census or sample survey.
Sources of Non-sampling error
I. Faulty planning or definitions: inadequate and inconsistent data specification with respect to the objectives of the survey; errors due to the location of the units and the actual measurement of the characteristics; errors in recording the measurements; an ill-designed questionnaire; and a lack of trained, qualified and adequate numbers of investigators and supervisory staff.
II. Defective frame in a sample survey
III. Response errors: due to misunderstanding by the respondent; prestige bias, by virtue of which a respondent may upgrade education, intelligence quotient, occupation, income, etc., or downgrade age; self-interest, such as understating salary or production and overstating expenses or requirements; bias due to an interviewer who influences responses; and failure of the respondent’s memory of past events or conditions.
IV. Non-response bias: arises if full information is not obtained on all the sampling units, or when selected individuals are not contacted or do not respond
- non-response is usually about 30%
- results in bias
V. Errors in coverage: inclusion of irrelevant units and/or exclusion of relevant survey units. For example, under-representation is found in surveys of the poor, the homeless and prison inmates, and in telephone opinion polls, which miss the part of the population that does not have a phone.
VI. Interviewing skills: important not to introduce bias
- types of questions asked
- attitude during interviewing
- wording of questions: confusing, misleading, intimidating
VII. Compilation and publication errors: arise in data editing, coding of the responses, entering data into computers, tabulation and summarizing the original observations made in the survey. They can be controlled through verification, consistency checks, etc.
How to control Non-sampling
error
• In a sample survey, non-sampling errors can be controlled more easily than in a complete census by employing qualified, trained and experienced personnel, better supervision and better equipment for processing and analyzing data