Research Methods For MSC MPH
VARIABLES
‘What information are we going to collect in our
study to meet our objectives?’
FORMULATING VARIABLES
What is a variable?
• A VARIABLE is a characteristic of a person, object or phenomenon which can take
on different values.
• In the form of numbers (e.g., age) or
• non-numerical characteristics (e.g., sex)
• A simple example of a variable in the form of numbers is ‘a person’s age’.
• Other examples of variables are:
• weight (expressed in kilograms or in pounds);
• home-clinic distance (expressed in kilometres or in minutes of walking distance);
• monthly income (expressed in dollars, rupees, or kwachas); and
• number of children (1, 2, etc.).
What is a variable?
• Because the values of all these variables are expressed in numbers, we call them
NUMERICAL VARIABLES.
• Some variables may also be expressed in categories.
• For example, the variable sex has two distinct categories (groups): male and female.
Numerical variables can either be continuous or discrete
1. Continuous
With this type of data, one can develop more and more accurate measurements
depending on the instrument used, e.g.:
• height in centimeters (2.5 cm or 2.546 cm or 2.543216 cm)
• temperature in degrees Celsius (37.2 °C or 37.19999 °C, etc.)
2. Discrete
These are variables whose values can only be whole numbers, e.g.:
• number of visits to a clinic (0, 1, 2, 3, 4, etc).
• number of sexual partners (0, 1, 2, 3, 4, 5, etc.)
Categorical variables, on the other hand, can either be ordinal or nominal
1. Ordinal variables
These are grouped variables that are ordered or ranked in increasing or decreasing order, for example income level:
• High income (above 300 per month)
• Middle income (100-300 per month)
• Low income (less than 100 per month)
Other examples are:
• Disability:
• No disability,
• Partial disability,
• Serious or total disability
Categorical variables, on the other hand, can either be ordinal or nominal
1. Ordinal variables
• Seriousness of a disease:
• severe,
• moderate,
• mild
• Agreement with a statement:
• fully agree,
• partially agree,
• fully disagree
• Fear of leprosy:
• will not share food with a patient;
• will not enter the house of a patient;
• will not allow patient to live in the community.
Categorical variables, on the other hand, can either be ordinal or nominal
2. Nominal variables.
The groups in these variables do not have an order or ranking in them.
For example:
Sex: male, female
Main food crops: maize, millet, rice, etc.
Religion: Christian, Muslim, Hindu, Buddhist, etc.
Variables
• When you selected the variables for your study, you did so with the assumption that they would either help to define your problem and its different components (dependent variables) or be contributory factors to your problem (independent variables).
Operationalizing variables by choosing appropriate indicators
• If you want to determine the level of knowledge concerning a specific issue, in order to find out to what extent the factor ‘poor knowledge’ contributes to the problem, you first have to make ‘knowledge’ measurable by choosing appropriate indicators.
• Nutritional status of under-5 year olds: widely used indicators for nutritional
status include:
- Weight in relation to age (W/A)
- Weight in relation to height (W/H)
- Height in relation to age (H/A)
- Upper-arm circumference (UAC)
• For the classification of nutritional status, internationally accepted categories
already exist, which are based on so-called standard growth curves.
• For the indicator ‘Weight/Age’, for example, children are:
- well-nourished if they are above 80% of the standard,
- moderately malnourished if they are between 60% and 80%,
- severely malnourished if they are below 60%.
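As an illustration only, here is a minimal Python sketch of how the weight-for-age cut-offs above could be applied; the function name and the caller-supplied reference (standard) weight are hypothetical, and only the percentage cut-offs come from the classification above.

```python
def classify_weight_for_age(weight_kg: float, standard_weight_kg: float) -> str:
    """Classify nutritional status from weight-for-age as a percentage of the
    reference standard, using the cut-offs listed above (illustrative sketch)."""
    percent_of_standard = 100 * weight_kg / standard_weight_kg
    if percent_of_standard > 80:
        return "well-nourished"
    elif percent_of_standard >= 60:
        return "moderately malnourished"
    else:
        return "severely malnourished"

# Example: a child weighing 9.5 kg whose reference standard weight is 12.0 kg
print(classify_weight_for_age(9.5, 12.0))   # ~79% of standard -> "moderately malnourished"
```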
Defining variables and indicators of variables
• To ensure that everyone understands exactly what has been measured
and to ensure that there will be consistency in the measurement, it is
necessary to clearly define the variables (and indicators of variables)
Dependent and independent variables
• Because in health systems research you often look for causal
explanations, it is important to make a distinction between dependent
and independent variables
• The variables that are used to describe or measure the factors that are
assumed to cause or at least to influence the problem are called the
INDEPENDENT variables
Dependent and independent variables
• In a study of the relationship between smoking and lung cancer, ‘suffering from lung cancer’ (with the values yes, no) would be the dependent variable and ‘smoking’ would be the independent variable.
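To make the distinction concrete, the minimal sketch below uses invented study records in which ‘smoking’ is the independent variable and ‘lung cancer’ the dependent variable, and simply cross-tabulates the two; all names and values are illustrative.

```python
from collections import Counter

# Hypothetical study records: each subject carries one independent variable
# (smoking) and one dependent variable (lung cancer: yes/no).
records = [
    {"smoking": "yes", "lung_cancer": "yes"},
    {"smoking": "yes", "lung_cancer": "no"},
    {"smoking": "no",  "lung_cancer": "no"},
    {"smoking": "no",  "lung_cancer": "no"},
]

# Cross-tabulate the dependent variable by the levels of the independent variable.
table = Counter((r["smoking"], r["lung_cancer"]) for r in records)
for (smoking, cancer), n in sorted(table.items()):
    print(f"smoking={smoking:3s} lung_cancer={cancer:3s} n={n}")
```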
Questionnaire designs and
data sources
Questionnaire designs and data sources
• Questionnaire design
• Types of questions
• Steps in Designing a questionnaire
• Considerations for writing open or closed questions
• Formatting and interpreting questionnaires
• Data sources
• Data collection
• Data quality control
Questionnaire design
Questionnaire …
Types of questions
1. Open-ended questions: the respondent is free to answer in his or her own words
2. Closed questions: supply the respondent with two or more specified alternative responses
Steps in designing a questionnaire
Writing the Questionnaire
Types of Data
Types of data…
• Based on the type of research method, data can be classified into:
1. Quantitative data: that which can be easily measured and recorded in numerical form
2. Qualitative data: that which is expressed in words or descriptions rather than in numbers
Data sources
Data collection
Factors that affect data collection methods
Field staff recruitment
– A pre-test is usually done on a similar population, but in an area different from the actual survey area
Purpose of pre‐test
Quality control at field level
Monitoring at the field
• Completeness of questionnaires
Quality control at data entry and processing
Quality control for Qualitative studies
Data processing
Coding Data
• For example, instead of recording ‘male’ and ‘female’ for the variable sex, it can be coded as 1 = male and 2 = female
• The meaning of the codes will depend on the level of measurement of the variable
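A minimal sketch of such a coding scheme; the codes follow the example above, but the dictionary and function names are purely illustrative.

```python
# Codebook for the nominal variable 'sex': the codes carry no order, only labels.
SEX_CODES = {"male": 1, "female": 2}
SEX_LABELS = {code: label for label, code in SEX_CODES.items()}

def code_sex(response: str) -> int:
    """Convert a recorded response to its numeric code (raises KeyError if unknown)."""
    return SEX_CODES[response.strip().lower()]

print(code_sex("Female"))   # 2
print(SEX_LABELS[1])        # "male"
```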
Coding data …
• The ID number can direct the person back to the original data for
correction
• Used for follow-up
Data entry
Measurement Error
Measurement
• Measurements range from questions about symptoms asked during history-taking, to physical examinations, and to tests such as:
• laboratory tests,
• imaging techniques, and
• self-report questionnaires
Characteristics of measurements
Development of a measurement instrument
Measurement Error
• Measurement error is the difference between the recorded response to a question
and the ‘true’ value.
• Measurement error occurs as part of data collection and may arise from four main sources:
1. The questionnaire,
2. The interviewer,
3. The context, and
4. The respondent
Sources of Measurement Error
1. Questionnaire Effects
1.1. Specification problems:
• Error can occur because the data specification is inadequate and/or
inconsistent with what the survey requires
1. Questionnaire Effects
1.2. Question wording:
• The designer wants the respondent to interpret the question as the designer
would interpret the question
• The potential sources of error are many:
• First, the questionnaire designer may not have a clear formulation of the
concept he/she is trying to measure
• Next, even if he/she has a clear concept, it may not be clearly represented in the
question.
• Even if the concept is clear and faithfully reproduced, the respondent may not
interpret the request as intended.
• Not all respondents will understand the request for information, due to
language or cultural differences, affective response to the wording, or
differences in experience and context between the questionnaire author and the
respondent
1. Questionnaire Effects
1.3. Length of the questions
• The questionnaire designer is faced with the dilemma of keeping
questions short and simple while assuring sufficient information is
provided to respondents so they are able to answer a question
accurately and completely.
• Common sense and good writing practice tell us that keeping
questions short and simple will lead to clear interpretation.
• Research, however, suggests that longer questions actually yield more
accurate detail from respondents than shorter questions, at least as they
relate to behavioral reports of symptoms and doctors’ visits
1. Questionnaire Effects
1.5. Open and closed formats
• Question formats in which respondents are asked to respond using a
specified set of options (closed format) may yield different responses
than when respondents are not given categories (open format).
• A given response is less likely to be volunteered by a respondent in an
open format than when included as an option in a closed format.
• The closed format may remind respondents of something they may not
have otherwise remembered to include.
• The response options to a question cue the respondent as to the level
or type of responses considered appropriate
1. Questionnaire Effects
1.6. Data Collection Mode Effects
Face-to-face interviewing
• Face-to-face interviewing is the mode in which an interviewer administers a structured
questionnaire to respondents.
• Using a paper questionnaire or via computer assisted personal interviewing, the
interviewer completes the questionnaire by asking questions of the respondent.
• One problem for face-to-face interviewing is the effect of interviewers on respondents’
answers to questions, resulting in increases to the variances of survey estimates.
• Another possible source of measurement error is the presence of other household members, who may affect the respondent’s answers.
• This is especially true for topics viewed as sensitive by the respondents.
• Measurement error may also occur because respondents are reluctant to report
socially undesirable traits or acts
2. Respondent Effects
• Survey respondents vary considerably in their abilities and willingness to provide accurate answers to questions regarding their behaviors.
• Respondent behaviors can be understood within the framework of the generally accepted cognitive model of survey response, which recognizes four basic tasks required of respondents when they answer each survey question:
(a) question interpretation,
(b) memory retrieval,
(c) judgment formation,
(d) response editing
• This is a useful model for understanding how variability across respondents
may influence the quality of self-reported information
2. Respondent Effects
2.1. Question Interpretation
• Respondents sometimes employ terminology that differs from that employed in
research questionnaires.
• A related concern is the degree to which respondent cultural background may
influence the interpretation and/or comprehension of survey questions.
• Some disease patterns and risk practices are known to vary cross-culturally and those
varied experiences and beliefs regarding the problem can also be expected to influence
respondent knowledge and familiarity with the topic in general and related
terminology in particular.
• Experienced researchers, of course, recognize the importance of investigating and addressing these potential problems by employing focus group discussions.
3. Interviewer Effects
• Because of individual differences, each interviewer handles the survey
situation in a different way, that is, in asking questions, probing and
recording answers, or interacting with the respondent, some
interviewers appear to obtain different responses from others.
• The interviewer situation is dynamic and relies on an interviewer
establishing rapport with the respondent.
• Interviewers may not ask questions exactly as worded, follow skip patterns correctly, or probe for answers nondirectively.
• They may not follow directions exactly, either purposefully or because
those directions have not been made clear enough.
• Interviewers may vary their inflection, tone of voice, or other personal
mannerisms without even knowing it.
3. Interviewer Effects
• Errors, both over-reports and under-reports, can occur for each interviewer.
• When over-reporting and under-reporting of approximately the same magnitude occur, the resulting interviewer bias will be small.
• However, these individual interviewer errors may be large and in the same direction, resulting in large errors for individual interviewers.
• Another possible mechanism that may account for interviewer effects involves social distance.
• It is possible that the social distance between respondents and interviewers
may influence respondent willingness to report sensitive behaviors
• Interviewer-respondent familiarity with one another may also influence the
quality of self-reported behaviours or practice
3. Interviewer Effects
• Helping the respondents in different ways (even with gestures), putting emphasis on different questions, misreading questions, failing to probe answers correctly, or not following other elements of standardized survey protocols.
• Social distance: if the interviewer differs from the interviewee in sex, culture, education, age, or dress, respondents may not feel comfortable responding genuinely and freely about their morbidity and other contributing factors.
• Interviewer-respondent familiarity with one another may also influence the quality of self-reported behaviours or practice.
• Between-interviewer bias: variability in knowledge and professional experience may influence the results.
• Language barrier: if the interviewers are not fluent speakers of the respondents’ language, they cannot explain a question when respondents are unable to understand it.
4. Context Effects
• Various aspects of the social and physical environment within which
survey data are collected may also influence the quality of the information
collected
• One aspect of the social environment that has received attention is the
absence or presence of other individuals during the interview, as this is
believed to influence the social desirability demands or pressures that
respondents may perceive.
• In general, the presence of others during survey interviews is known to be
associated with lower reporting of sensitive behaviors
• The physical context within which interviews take place may also
influence social desirability pressures and self-report quality
5. Processing Errors.
• Once data collection is complete, the construction of a final survey data set
requires the implementation of numerous coding and editing rules.
• The integrity of these rules is particularly critical in surveys, as they
typically involve assumptions about the reporting intentions of respondents
5.1. Data Entry Errors
• Data entry errors occur in the process of transferring collected data to an
electronic medium.
• The frequency of these errors varies by the types of information collected
(e.g., numeric versus character) and the mode of data collection.
• For example, with paper and pencil enumeration, survey data are key-
entered after the survey interview takes place
5. Processing Errors.
5.3. Pre-Edit Coding Errors
• Most surveys require some type of pre-edit coding of the survey returns before
they can be further processed in edit, imputation, and summary systems.
• The required coding is generally of two types—unit and item response
coding.
• The unit response coding assigns and records the status of the interview.
• It is designed to indicate the response status of the return for a sampled unit so
that it can be appropriately handled in subsequent processing.
5. Processing Errors.
5.3. Pre-Edit Coding Errors
• Item response coding is the more commonly discussed form of
questionnaire coding
• This can involve coding an actual response for a survey question into a
category
• This situation occurs for questions that elicit open-ended responses.
• For example, sometimes one of the responses in categorical questions is
“other-specify,” which results in a free-text response.
• In other surveys, the respondent may be asked by design to provide an
open-ended response
5. Processing Errors.
5.3. Pre-Edit Coding Errors
• The recoding of open-ended responses into a categorical variable is
performed by coders who interpret and catalogue each response.
• This process can result in error or bias, since different coders are likely to
interpret and code some responses differently.
• Even the same coders may change the way they code as they gain more experience or get bored.
• Another problem with open-ended responses is that respondents sometimes supply more than one answer to the question.
• A technique employed to measure and reduce coding errors is the use of
multiple coders for the same set of responses.
• This is analogous to double key entry.
• Once both sets of coders have completed the coding, a comparison is made between the two sets of data, with any discrepancies resolved by committee or by an expert.
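A minimal sketch of such a double-coding comparison; the questionnaire IDs and category codes are invented, and the discrepancy list would be passed to a committee or an expert for resolution.

```python
# Codes assigned independently by two coders to the same open-ended responses,
# keyed by questionnaire ID (all values hypothetical).
coder_a = {"001": "transport", "002": "cost", "003": "distance"}
coder_b = {"001": "transport", "002": "cost", "003": "staff attitude"}

# Flag every response the two coders classified differently.
discrepancies = {
    qid: (coder_a[qid], coder_b[qid])
    for qid in coder_a
    if coder_a[qid] != coder_b[qid]
}
agreement = 1 - len(discrepancies) / len(coder_a)
print(f"Coder agreement: {agreement:.0%}")   # 67%
print("To resolve:", discrepancies)          # {'003': ('distance', 'staff attitude')}
```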
5. Processing Errors.
5.4. Editing Errors
• Editing is a procedure designed and used for detecting erroneous and/or
questionable survey data (survey response data or identification type data)
with the goal of correcting (manually and/or via electronic means) as much
erroneous data (not necessarily all of the questioned data) as possible,
usually prior to data imputation and summary procedures
• Editing generally occurs at various points in the survey processing and can,
in itself, generate errors at each juncture.
• The editing process, in general, allows survey managers to review each report for accuracy, an activity that usually results in a feeling of control over the process while obtaining a sense of the data.
5. Processing Errors
5.4. Editing Errors
• While, indeed, there are benefits from editing, recent studies
documented by various survey organizations have shown that data are
often over-edited.
• This over-editing unnecessarily uses valuable resources and can actually add more error to the data than it eliminates.
• Processing error can also arise during edit processing due to edit
model failure in an automated system.
• Whether the editing is based on a very sophisticated mathematical
model or on simple range checks, the manner in which erroneous data
are flagged, and how they are handled, can introduce processing error
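As an illustration of the simple range checks mentioned above, the sketch below flags questionable values for manual review rather than changing them automatically; the variable names and plausible ranges are hypothetical.

```python
# Plausible ranges for selected variables (hypothetical limits for illustration).
RANGE_CHECKS = {
    "age_years": (0, 110),
    "weight_kg": (1, 250),
    "n_children": (0, 20),
}

def flag_out_of_range(record: dict) -> list:
    """Return the names of variables whose values fall outside the edit ranges."""
    flags = []
    for var, (low, high) in RANGE_CHECKS.items():
        value = record.get(var)
        if value is not None and not (low <= value <= high):
            flags.append(var)
    return flags

print(flag_out_of_range({"age_years": 340, "weight_kg": 62, "n_children": 3}))
# ['age_years'] -> flagged for review, not silently corrected
```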
RELIABILITY &
VALIDITY
RELIABILITY
Reliability is defined as ‘the degree to which the measurement is free from measurement error’.
• In full, this is ‘the extent to which scores for patients who have not changed are the same for repeated measurement under several conditions’:
• internal consistency
• test–retest
• inter-rater
• intra-rater
Types of Reliability
• Parameters of reliability for continuous variables
1. Intraclass correlation coefficients for single measurements
2. Pearson’s r
• Parameters of measurement error for continuous variables
1. Standard error of measurement
2. Coefficient of variation
• Parameters of reliability for categorical variables
1. Cohen’s kappa for nominal variables
• No parameters of measurement error for categorical variables
• It can be examined, however, what percentage of the measurements is classified in the same categories. We call this the percentage of agreement.
• Cronbach’s alpha as a reliability parameter
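As an illustration, here is a minimal sketch computing two of these parameters for a categorical variable, the percentage of agreement and Cohen's kappa, from two raters' classifications; the data are invented.

```python
from collections import Counter

# Classifications of the same 10 subjects by two raters (invented data).
rater1 = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos", "neg", "neg"]
rater2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "neg"]

n = len(rater1)
observed = sum(a == b for a, b in zip(rater1, rater2)) / n   # percentage of agreement

# Chance-expected agreement from each rater's marginal distribution.
p1, p2 = Counter(rater1), Counter(rater2)
expected = sum((p1[c] / n) * (p2[c] / n) for c in set(rater1) | set(rater2))

kappa = (observed - expected) / (1 - expected)               # Cohen's kappa
print(f"agreement = {observed:.0%}, kappa = {kappa:.2f}")    # agreement = 80%, kappa = 0.58
```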
Validity
Definition: ‘the degree to which an instrument truly measures the construct(s) it purports to measure’
Types of validity
1. Content validity (including face validity): ‘the degree to which the content of a measurement instrument is an adequate reflection of the construct to be measured’
• If the construct we want to measure is body weight, a weighing scale is
sufficient.
• To measure the construct of obesity, defined as a body mass index (BMI = weight/height²) > 30 kg/m², a weighing scale and a measuring rod are needed.
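Using the definition above, a minimal sketch of this obesity indicator (the example values are illustrative):

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

def is_obese(weight_kg: float, height_m: float) -> bool:
    """Obesity defined as BMI > 30 kg/m^2, the cut-off given above."""
    return bmi(weight_kg, height_m) > 30

print(round(bmi(95, 1.75), 1), is_obese(95, 1.75))   # 31.0 True
```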
Types of validity
A. Face validity: A first aspect of content validity is face validity, ‘the degree to which a measurement instrument, indeed, looks as though it is an adequate reflection of the construct to be measured’.
It is a subjective assessment and, therefore, there are no standards with regard to
how it should be assessed, and it cannot be quantified.
B. Content validity: When an instrument has passed the test of face validation, we have to consider its content in more detail.
• The purpose of a content validation study is to assess whether the
measurement instrument adequately represents the construct under
study.
• We again emphasize the importance of a good description of the
construct to be measured.
2. Criterion validity: ‘the degree to which the scores of a measurement instrument are an adequate reflection of a gold standard’
• This implies that criterion validity can only be assessed when a gold standard
(i.e. a criterion) is available.
3. Construct validity: In situations in which a gold standard is lacking, construct
validation should be used to provide evidence of validity.
• Construct validity was defined as the degree to which the scores of a measurement
instrument are consistent with hypotheses, e.g. with regard to internal
relationships, relationships with scores of other instruments or differences
between relevant groups
• Construct validation is often considered to be less powerful than criterion
validation
4. Cross-cultural validity: defined as ‘the degree to which the performance of the items on a translated or culturally adapted PRO (patient-reported outcome) instrument is an adequate reflection of the performance of the items in the original version of the instrument’
Validity and reliability
Thank you