GRADUATE PROGRAMME IN EPIDEMIOLOGY

PRINCE OF SONGKLA UNIVERSITY

COURSE: EPIDEMIOLOGICAL METHODS II


MODULE: OVERVIEW OF STUDY DESIGN (revised October 2021 from
earlier slide presentation)
_____________________________________________________________________________

MODULE CONTENTS

Part 1: Conventional classification of studies


Descriptive and analytical (observational and experimental) designs
Study design hierarchy
Descriptive studies
Analytical studies – observational
Correlational studies
Cross-sectional studies
Case-control studies and their variants
Cohort studies, concurrent, historical and other variations
Analytical studies – experimental (intervention)
Randomized controlled trials
Community trials

Part 2: A components approach


Type of information required
Informational goal
Width of information sought
Directionality of investigation
Type of outcome
Time relationship
Type of sample
Design as an assembly of components
____________________________________________________________________________

OBJECTIVES
The objective of Part 1 is to provide an overview of conventional designs, together with
some mention of variants of the traditional analytical study designs. Most of the content
of Part 1 is probably quite familiar to you already, and it is advised that you read
through Part 1 beforehand and then discuss any points of particular interest or confusion
in the session.
After completing Part 1, you should be familiar with the following:
What is meant by a descriptive study design;
What is known as a correlational or ecological study design, and its limitations;
The meaning, strengths and limitations of the following study designs:
cross-sectional, cohort, case-control, experimental and randomized controlled trial.

Part 2 attempts to examine epidemiological study design from a different viewpoint.


Instead of starting with the name of a design and then examining what is required for a
study to conform to the requirements of the design, Part 2 starts with a number of
components or attributes of any particular research objective and/or strategy,
highlighting the variability in these components. When the nature of each component is
decided, the components can be combined to construct a study design, tailored to the
particular investigation of concern.
After completing Part 2, you should have a better appreciation of the wide variety of
study designs and also appreciate the difficulties of trying to classify some studies in
terms of the conventional epidemiological design categories.
______________________________________________________________________
TEACHING AND LEARNING ACTIVITIES
Self-directed learning and discussion session. Participants should try to apply the ideas
presented in Part 2 to their own proposed project.
______________________________________________________________________
PERIODS REQUIRED
One session
______________________________________________________________________
ADDITIONAL READING
Hulley SB, Cummings SR, Browner WS, Grady D, Hearst N, Newman TB. Designing
Clinical Research. 4th edition. Baltimore: Lippincott Williams & Wilkins, 2013.
Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. 3rd edition. Philadelphia:
Lippincott Williams & Wilkins, 2008.
Kramer MS, Boivin JF. Toward an “unconfounded” classification of epidemiological
research design. J Chronic Dis 1987;40(7):683-8. [Old, but still useful reference]
Maclure M. Taxonomic axes of epidemiologic study design: a refutationist perspective.
J Clin Epidemiol 1991;44(10):1045-53. [Also old, but still useful reference]
http://www.jerrydallal.com/lhsp/study.htm
http://www.teachepi.org/documents/courses/fundamentals/Pai_Lecture2_Study_designs.pdf
http://www.epidemiolog.net/evolving/AnalyticStudyDesigns.pdf
http://www.med.wayne.edu/pdfs/epidemiologic%20principles.pdf

Introduction
Epidemiology has been defined as the study of health outcomes in populations, usually
human populations. Common goals of epidemiology are:
To describe the health status of populations (frequencies and trends);
To explain the aetiology of diseases;
To predict disease occurrence in populations;
To control disease (prevention, eradication, prolonging life).
Thus, epidemiology may involve issues of disease quantification, distribution
(questions of who, when and where), the development of hypotheses, the testing of
hypotheses and the identification of determinants (causal and preventive factors).
Conventionally, epidemiological studies are classified into descriptive studies (describing
the occurrence of outcomes) and analytical studies (generally describing associations
between exposure and disease). Typical questions posed include:
Are exposure and outcome related?
What is the exposure?
Who are exposed?
What are the potential health outcomes?

Part 1: Conventional study designs


Descriptive studies include
– Case reports
– Case series studies
– Population-based descriptive studies
Analytical studies are divided into two types, observational and experimental.
Observational analytical studies include
– Correlational (ecological) studies
– Before-after studies
– Cross-sectional studies
– Case-control studies (and variants)
– Cohort studies
Experimental analytical studies include
– Non-randomized interventional studies
– Randomized controlled trials

Hierarchy of epidemiological study design
A schematic arrangement of epidemiological study designs is shown in the diagram
below.

Partly adapted from Songer TJ, www.pitt.edu/~super7/19011-20001/19101-ppt

Moving from left to right, the study designs potentially reveal increasing knowledge
(or increasing strength of evidence) of the exposure-disease relationship. Study designs in
the upper row are considered to be useful in terms of hypothesis generation, while
those in the lower row are appropriate for hypothesis testing. The distinction between
observational and experimental analytical studies is also indicated.
Laboratory and animal studies can provide valuable knowledge related to exposure-
disease relationships, but are generally not considered among epidemiological study
designs.

Descriptive studies
Case reports and case series are detailed reports of one patient or a series of patients and
may reveal, for instance, new findings (such as a previously undescribed disease),
unexpected links between diseases, new therapeutic effects or adverse events, or
experience with a group of patients with a similar diagnosis.

Population-based descriptive studies are descriptions of whole populations
or a defined subset of a population, commonly used to report incidence
rates and prevalence levels.
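The two summary measures most often reported by population-based descriptive studies can be computed as below. This is a minimal Python sketch; the function names and all counts are hypothetical and chosen only for illustration. The incidence rate uses new cases over person-time at risk, whereas point prevalence uses existing cases over the population size at one time.

```python
def incidence_rate(new_cases: int, person_time: float, per: float = 1000.0) -> float:
    """Crude incidence rate: new cases per `per` units of person-time at risk."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return new_cases * per / person_time


def point_prevalence(existing_cases: int, population_size: int) -> float:
    """Point prevalence: proportion of the population with the condition at one time."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    return existing_cases / population_size


# Hypothetical figures, for illustration only.
rate = incidence_rate(new_cases=45, person_time=30_000)  # per 1000 person-years
prevalence = point_prevalence(existing_cases=120, population_size=4_000)
print(f"incidence rate: {rate} per 1000 person-years")
print(f"point prevalence: {prevalence:.1%}")
```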

Analytical studies - observational
Important characteristics of observational analytical epidemiological studies are
that they are non-experimental (no intervention is part of the study design), that
exposure and outcome occur in an uncontrolled situation, and that the data can be
obtained retrospectively (historical data), concurrently or prospectively.

Correlational (ecological) studies


These are studies that use aggregated data from whole populations, commonly
secondary data that may have been collected for routine purposes or in (national)
surveys. They may be used to compare frequencies of disease or patterns of exposure
and disease across different populations at one time, or across the same population at
different times. They may be useful for the generation of hypotheses, but not for
testing hypotheses, as data are not available on individuals. Inferring individual-level
exposure-outcome relationships from such studies is liable to error, commonly referred
to as the “ecological fallacy”. Some authors classify these studies among descriptive studies.
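The ecological fallacy can be made concrete with a small constructed example. In the Python sketch below, three hypothetical populations are built so that, inside each one, exposed individuals have a lower risk than unexposed individuals; yet across populations the aggregate exposure prevalence and the overall disease risk are strongly positively correlated. All numbers are invented for illustration, and the overall risk in each population is simply the prevalence-weighted average of the two group risks.

```python
import math

# Hypothetical populations: (exposure prevalence, risk if exposed, risk if unexposed).
# Within every population the exposure is associated with LOWER risk.
populations = {
    "A": (0.2, 0.01, 0.02),
    "B": (0.5, 0.03, 0.05),
    "C": (0.8, 0.06, 0.08),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Ecological view: only aggregate exposure prevalence and overall risk per population.
exposure_prevalence = [p for p, _, _ in populations.values()]
overall_risk = [p * r1 + (1 - p) * r0 for p, r1, r0 in populations.values()]
ecological_r = pearson(exposure_prevalence, overall_risk)

# Individual-level view: risk ratio, exposed vs unexposed, inside each population.
within_rr = {name: r1 / r0 for name, (_, r1, r0) in populations.items()}

print(f"ecological correlation: {ecological_r:.2f}")
print("within-population risk ratios:", {k: round(v, 2) for k, v in within_rr.items()})
```

Here the populations with more exposure also have higher baseline risk for other reasons, which is exactly what aggregated data cannot reveal.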

Cross-sectional studies
These are studies in which data are collected at a single point
or over a single period of time. Except for time-constant characteristics, no
temporal sequence is revealed, which strictly limits the identification of cause and effect.
Nevertheless, the investigator may decide to consider some characteristic(s) to be
possible cause(s) and others to be possible outcome(s).
Despite this limitation on drawing causal inferences, cross-sectional studies
may be used to identify associations, albeit not necessarily causal ones, and
may therefore be a useful resource for the generation of causal hypotheses.
As cross-sectional studies collect data more or less at one time point, they are
unsuitable for studies involving rare characteristics or rapidly changing variables
(such as highly fatal diseases or rapidly emerging diseases), but are better suited to the
study of long-duration conditions. That is, they can be used to investigate prevalent
conditions, not incident events. They are considered to provide the weakest evidence
for causality among the observational analytical designs that use individual-level data.
An example of a cross-sectional design is the community survey.
The basic design pattern of cross-sectional studies and a simple example of the
analytical process are shown below.
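As a sketch of the usual analytical step in a cross-sectional study, the Python code below compares the prevalence of a condition between exposed and unexposed respondents, giving a prevalence ratio and a prevalence odds ratio. The class name and the survey counts are hypothetical. Note that when the condition is common, the prevalence odds ratio is noticeably further from 1 than the prevalence ratio.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    a: int  # exposed, with the condition
    b: int  # exposed, without the condition
    c: int  # unexposed, with the condition
    d: int  # unexposed, without the condition

    def prevalence_ratio(self) -> float:
        """Prevalence in exposed divided by prevalence in unexposed."""
        return (self.a / (self.a + self.b)) / (self.c / (self.c + self.d))

    def prevalence_odds_ratio(self) -> float:
        """Odds of the condition in exposed divided by odds in unexposed."""
        return (self.a * self.d) / (self.b * self.c)


# Hypothetical community survey of 1000 people.
survey = TwoByTwo(a=60, b=140, c=90, d=710)
print(round(survey.prevalence_ratio(), 2))       # 2.67
print(round(survey.prevalence_odds_ratio(), 2))  # 3.38
```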

Case control studies
These are studies in which subjects who have developed the disease of interest, or who
have the disease, are included as “cases”, and others from the same population who have
not developed that disease or do not have it (i.e., are free of the disease of interest) are
included as “controls”. The proportions of cases and controls that have been exposed are
then compared. A difference in proportions between the two groups suggests an
association between the exposure and the disease.

Advantages of the case-control design are:


Multiple exposures can be studied in a single study;
Rare outcomes can be investigated;
Study can be completed in a relatively short time;
Fewer subjects are required than with a cohort study with a similar objective;
Less expensive than an equivalent cohort study.

However, case-control studies also have their limitations:


The temporal sequence of disease and exposure is not always clear; therefore,
evidence for causality may be weak;
Exposure information often relies on recall, which may be somewhat
unreliable and may be subject to recall bias if, for instance, cases recall past
exposures more thoroughly than controls;
Not suitable for rare exposures;
It is necessary to adjust for the confounding effects of other variables (matching
on some potential confounders can be done at the time of selection of cases
and controls);
Only recognized potential confounders can be controlled.

The analytical process used in a simple case-control study is shown below. Note
that the analysis starts with a comparison of the exposure prevalence between the two
outcome groups.
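The comparison of exposure between cases and controls is usually summarized as an exposure odds ratio. The Python sketch below uses the notation of this module (A1, A0 for exposed and unexposed cases; B1, B0 for exposed and unexposed controls) and adds a 95% confidence interval by Woolf's log method, which is one common choice but not the only one. The counts are hypothetical.

```python
import math


def exposure_odds_ratio(A1: int, A0: int, B1: int, B0: int, z: float = 1.96):
    """Exposure odds ratio (A1/A0)/(B1/B0) with a Woolf (log-scale) confidence interval.

    A1, A0 -- exposed and unexposed cases
    B1, B0 -- exposed and unexposed controls
    """
    if min(A1, A0, B1, B0) == 0:
        raise ValueError("a zero cell makes the Woolf interval undefined")
    odds_ratio = (A1 / A0) / (B1 / B0)
    se_log_or = math.sqrt(1 / A1 + 1 / A0 + 1 / B1 + 1 / B0)
    lower = odds_ratio * math.exp(-z * se_log_or)
    upper = odds_ratio * math.exp(z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 40 of 100 cases and 20 of 100 controls were exposed.
or_, lo, hi = exposure_odds_ratio(A1=40, A0=60, B1=20, B0=80)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```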

Interpretation of the odds ratio in basic case-control studies


To understand the interpretation of the odds ratio in case-control studies, it is useful to
consider the cases and controls as being drawn from within a (possibly unidentified)
cohort.
Let T1 and T0 be the amounts of exposed and unexposed person-time in the
hypothetical cohort, and let B1 and B0 be the numbers of exposed and unexposed
controls in the case-control study.
If we sample the controls such that the ratio of the number of exposed controls
to unexposed controls is equal to the ratio of person-time of exposure to
person-time of non-exposure (this should be the case if controls are sampled
independently of exposure experience and over the same time period as the
cases),
then B1/B0 = T1/T0 and therefore B1/T1 = B0/T0; that is, the controls represent
the same fraction of person-time in the exposed and the unexposed.
If, also, A1 and A0 are the numbers of cases in the exposed and unexposed groups
in the study, and therefore in the “cohort”, then the actual incidence rate ratio
in the “cohort” can be estimated by replacing T with B, because this common
sampling fraction cancels from the ratio.
Thus, the incidence rate ratio = (A1/T1) / (A0/T0) = (A1/B1) / (A0/B0).
But this is the same as (A1/A0) / (B1/B0), which is the exposure odds ratio
between cases and controls.

Thus, the odds ratio determined from such a case-control study is actually an
estimate of the incidence rate ratio between exposed and unexposed groups
(assuming a case is a person with an incident event).
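The derivation above can be checked numerically. The Python sketch below builds a hypothetical cohort (person-years T1, T0 and incident cases A1, A0), then takes expected control counts B1 = f·T1 and B0 = f·T0 for several sampling fractions f, as the derivation assumes. Exact fractions are used so that the equality holds without rounding error; real studies would of course show sampling variation around this value.

```python
from fractions import Fraction

# Hypothetical underlying cohort: person-years and incident cases by exposure.
T1, T0 = Fraction(2_000), Fraction(8_000)
A1, A0 = Fraction(40), Fraction(40)

incidence_rate_ratio = (A1 / T1) / (A0 / T0)

# Sample controls from person-time at the same fraction f in both exposure groups,
# so that B1/B0 = T1/T0 (expected counts, no sampling variation).
for f in (Fraction(1, 100), Fraction(1, 20), Fraction(1, 5)):
    B1, B0 = f * T1, f * T0
    exposure_or = (A1 / A0) / (B1 / B0)
    print(f"f = {f}: IRR = {incidence_rate_ratio}, exposure OR = {exposure_or}")
    assert exposure_or == incidence_rate_ratio  # f cancels exactly
```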

Variants of the basic case-control design


Case-cohort study: Controls are sampled from the entire cohort, with every
member having an equal chance of being selected as a control. The odds ratio then
estimates the risk ratio in the cohort.
Epidemic case-control study: Controls are selected from among non-cases at
the end of an epidemic.
Case-crossover study: For each case, an earlier period in the same individual's
experience is selected to act as a matched control.
Proportional mortality study: Cases are deaths from an index disease;
controls are deaths from other diseases.
Prevalent case-control study: Cases are subjects with a prevalent condition;
controls are those without. The odds ratio is then called a prevalence odds
ratio.
Nested case-control study: Cases and controls are selected from a previously
defined cohort.
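The choice of control sampling determines what the odds ratio estimates. The Python sketch below uses a hypothetical closed cohort with no losses to follow-up and expected (not random) control counts. Controls drawn from the entire cohort (case-cohort) give an odds ratio equal to the risk ratio, while controls drawn only from non-cases at the end of the epidemic give the risk odds ratio, which exceeds the risk ratio here because the outcome is not rare.

```python
from fractions import Fraction

# Hypothetical closed cohort followed through an epidemic.
N1, N0 = Fraction(1_000), Fraction(4_000)  # exposed, unexposed cohort members
A1, A0 = Fraction(100), Fraction(100)      # cases arising in each group

risk_ratio = (A1 / N1) / (A0 / N0)
risk_odds_ratio = (A1 / (N1 - A1)) / (A0 / (N0 - A0))


def exposure_odds_ratio(B1, B0):
    """Exposure odds in cases divided by exposure odds in controls."""
    return (A1 / A0) / (B1 / B0)


f = Fraction(1, 10)  # sampling fraction for controls (expected counts)

# Case-cohort: controls drawn from the entire cohort, whether or not they became cases.
case_cohort_or = exposure_odds_ratio(f * N1, f * N0)
# Epidemic case-control: controls drawn from non-cases at the end of the epidemic.
epidemic_or = exposure_odds_ratio(f * (N1 - A1), f * (N0 - A0))

print(f"risk ratio {risk_ratio}, case-cohort OR {case_cohort_or}")
print(f"risk odds ratio {risk_odds_ratio}, epidemic case-control OR {epidemic_or}")
```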

Cohort studies
In the basic cohort design, subjects are classified on the basis of presence or absence
of exposure, then followed up over time to determine the development of disease in
each exposure group.
Since the outcome has not yet occurred at the time of classification on exposure, the
temporal relationship between exposure and outcome is usually clear. Therefore, the
design is fairly good for establishing a cause-effect relationship. Other advantages are
that it is suitable for investigating multiple outcomes and can be used for rare exposures.
There are, however, a number of limitations. Because subjects are followed up to
determine the occurrence of outcomes, a long period and/or large sample sizes may be
required for sufficient numbers of outcomes to occur, especially if the outcomes are
rare. For these reasons, the cost of conducting a cohort study also tends to be high.
There may also be subjects lost to follow-up. It may be necessary to adjust for
potential confounding. This may be done in the analysis phase, provided data on
confounding variables have been collected. Alternatively, it may be possible to remove
some of the confounding by matching on potentially confounding variables at the time
of subject selection.

The basic design of a cohort study (actually a “double cohort” study) is shown in the
figure below.

Although cohort studies involve follow-up of participants, the follow-up
period may be concurrent with the conduct of the study, as shown in the following
diagram.

On the other hand, the follow-up may occur in the past, or even partly in the past and
partly during the conduct of the study, as shown below.

Historical cohort studies often involve the extraction of data from large hospital
medical databases.

The analytical process follows the scheme below:

In this scheme, the outcome is considered as occurring or not occurring, and the
measure of association is commonly the risk ratio or odds ratio. However, a study may be
more interested in the occurrence of the outcome as a function of follow-up time. In
this case, the measure of association is likely to be the incidence rate ratio.
If T1 and T0 are the amounts of exposed person-time and unexposed person-time
respectively, and A1 and A0 are the numbers of cases in the exposed and unexposed
groups, then, using incidence density to estimate the average incidence rate:

Incidence rate ratio = (A1/T1) / (A0/T0)
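Both cohort measures described above can be computed as in the Python sketch below: the risk ratio when the outcome is treated simply as occurring or not, and the incidence rate ratio (A1/T1)/(A0/T0) when person-time is recorded. The confidence interval uses the usual large-sample approximation SE(log IRR) = sqrt(1/A1 + 1/A0); this is an assumption of the sketch, not something specified in this module, and all counts are hypothetical.

```python
import math


def risk_ratio(a1: int, n1: int, a0: int, n0: int) -> float:
    """Ratio of cumulative incidence (risk) in exposed vs unexposed subjects."""
    return (a1 / n1) / (a0 / n0)


def incidence_rate_ratio(A1: int, T1: float, A0: int, T0: float, z: float = 1.96):
    """IRR = (A1/T1)/(A0/T0), with an approximate CI using SE(log IRR) = sqrt(1/A1 + 1/A0)."""
    irr = (A1 / T1) / (A0 / T0)
    se_log = math.sqrt(1 / A1 + 1 / A0)
    return irr, irr * math.exp(-z * se_log), irr * math.exp(z * se_log)


# Hypothetical data: 30 cases among 500 exposed and 20 among 1000 unexposed ...
rr = risk_ratio(a1=30, n1=500, a0=20, n0=1000)
# ... and the same cases related to person-years of follow-up in each group.
irr, lo, hi = incidence_rate_ratio(A1=30, T1=1_500.0, A0=20, T0=2_000.0)
print(f"risk ratio = {rr:.2f}")
print(f"IRR = {irr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

The two measures differ here because the exposed group was followed for longer on average, which the rate ratio takes into account.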


Various other measures of association may be used in some cohort studies:

Variants of the basic cohort design
Instead of separately sampling exposed and unexposed populations, the population
may be sampled and exposed and unexposed groups determined after sampling. This
may be used when the proportion of exposed in the population is reasonably high. It
also has the advantage that the sample as a whole is representative of the population.
There may be more than 2 exposure groups.
There may be multiple outcome groups.
The outcome could be a continuous variable.
Time to outcome may be a parameter of interest – survival analysis techniques
may be necessary.
Subjects’ time frame may be totally or partially earlier than the study time
frame.
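When time to outcome is of interest and some subjects are censored (lost to follow-up or still outcome-free at the end), the survival profile is commonly estimated with the Kaplan-Meier (product-limit) method. The Python sketch below is a minimal, dependency-free version; it assumes the common convention that subjects censored at the same time as an outcome are still counted as at risk at that time. The follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- follow-up time for each subject
    events -- 1 if the outcome occurred at that time, 0 if the subject was censored
    Returns a list of (time, S(time)) pairs at each distinct outcome time.
    """
    records = sorted(zip(times, events))
    n_at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        outcomes = leaving = 0
        # Gather everyone whose follow-up ends at time t (outcomes and censorings).
        while i < len(records) and records[i][0] == t:
            outcomes += records[i][1]
            leaving += 1
            i += 1
        if outcomes:
            survival *= 1 - outcomes / n_at_risk
            curve.append((t, survival))
        # Subjects censored at t were counted as at risk at t; now remove them.
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up times (months) for six patients; 0 = censored.
months = [2, 3, 3, 5, 6, 8]
outcome = [1, 0, 1, 1, 0, 1]
for t, s in kaplan_meier(months, outcome):
    print(f"month {t}: S(t) = {s:.3f}")
```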

Analytical studies - experimental
Experimental studies are similar to cohort studies, except that the exposure is allocated
by the investigator and therefore controlled. If the exposure is randomly assigned, then
the study is described as a Randomized Controlled Trial (RCT). Randomization should, on
average, effectively reduce the problem of confounding, as it ideally balances the
distribution of all potential confounders, even those that are not recognized at the time
of the study.
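The claim that randomization balances even unrecognized confounders "on average" can be illustrated by simulation. In the Python sketch below, each hypothetical subject carries an unmeasured confounder with probability 0.3, and allocation is by a fair coin toss that never looks at the confounder (simple, unrestricted randomization; the seed and prevalence are arbitrary assumptions). With larger trials, the confounder prevalence in the two arms tends to converge, while small trials can still show chance imbalance.

```python
import random


def confounder_balance(n_subjects: int = 10_000, p_confounder: float = 0.3, seed: int = 2021):
    """Simulate simple randomization and return the confounder prevalence in each arm."""
    rng = random.Random(seed)
    # Hypothetical subjects: True if they carry a confounder the investigator never measured.
    confounder = [rng.random() < p_confounder for _ in range(n_subjects)]
    # Allocation by a fair coin toss, made without looking at the confounder.
    treated = [rng.random() < 0.5 for _ in range(n_subjects)]
    in_treatment = [c for c, t in zip(confounder, treated) if t]
    in_control = [c for c, t in zip(confounder, treated) if not t]
    return sum(in_treatment) / len(in_treatment), sum(in_control) / len(in_control)


for n in (40, 400, 10_000):
    p_treat, p_ctrl = confounder_balance(n_subjects=n)
    print(f"n = {n:>6}: confounder in treatment {p_treat:.3f}, in control {p_ctrl:.3f}")
```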
The RCT is the most powerful design for establishing causality and is suitable for
studying small to moderate effects.
Limitations include the ethical problem of investigating a possibly harmful
experimental exposure. In some situations, the effect of a suspected harmful exposure
may be investigated by removing the exposure in the experimental group. Ethical
problems may also arise owing to the withholding of a treatment believed to be
beneficial from patients in the control arm.
RCTs generally require extensive planning and tend to be very expensive.
The basic design of an RCT is shown below.

Analytical processes appropriate for RCTs are shown below.

Community trials are another type of experimental or interventional study design. In
this case, randomization of individual subjects is rarely possible. It may be possible,
however, to conduct treatment allocation using cluster randomization, in which whole
communities rather than individuals are randomly allocated.
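A minimal Python sketch of cluster randomization is shown below: whole clusters are shuffled and split into two arms of equal size, and every individual then receives the allocation of their cluster. The village identifiers and the seed are hypothetical, and real community trials often add refinements (for example, matching or stratifying clusters before randomizing) that this sketch does not attempt.

```python
import random


def cluster_randomize(clusters, seed=None):
    """Randomly allocate whole clusters (e.g. communities) to two arms of equal size.

    With an odd number of clusters, the control arm receives the extra one.
    """
    rng = random.Random(seed)
    shuffled = list(clusters)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": sorted(shuffled[:half]), "control": sorted(shuffled[half:])}


# Hypothetical cluster identifiers; every resident follows their village's allocation.
villages = [f"village_{i:02d}" for i in range(1, 11)]
allocation = cluster_randomize(villages, seed=7)
print(allocation["intervention"])
print(allocation["control"])
```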

Part 2: A components approach
As well as understanding the “ready-made” designs outlined above, it may also be
useful to consider the various components (or attributes) of a study, choose those
which are most suitable to the intended investigation, and build up a design
appropriate to the particular research project being planned.
Some of the components that might usefully be considered are the following:
1. Type of information required.
2. Informational goal.
3. Width of information sought.
4. Directionality of the information, and of the study.
5. Type of outcome.
6. Time relationship between conduct of the study and the data.
7. Representativeness of the sample.

This approach allows the research planner to appreciate more clearly the particular
requirements of the intended study, to understand how the conventional designs may
be modified, and to realize that some studies are in effect intermediates or combinations
of conventional designs, or may not be adequately described by any of the “ready-made”
design labels.
Most importantly, considering the components of a study design can help to
plan each stage of a study, from sample selection and data collection through to the
analytical strategy appropriate to the study objective.

Type of information required


The information sought in a study may be considered to be descriptive, comparative
or associative.
Descriptive information may be thought of as applying to a single group of subjects or
health settings from which quantitative or qualitative information is sought. There is
some lack of clarity in the conventional classification of studies as “descriptive” or
“analytical”. For instance, a study of the survival of a group of patients given a
certain diagnosis would, according to the components approach, be considered to be
seeking descriptive information. However, describing the survival profile may involve
considerable analysis. Similar arguments can be made for studies aimed
at estimating population prevalences, incidence rates, etc. Even describing non-numeric
characteristics of a population from a sample may involve some analytical
work.
The following are examples of descriptive information:
• Prevalence of an exposure
• Prevalence of a health condition

• Mean incidence rate for some event
• Survival profile following some event.
• Attitudes towards health-care provision.
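Even a purely descriptive estimate such as a prevalence involves analysis once it is estimated from a sample, because its uncertainty has to be quantified. The Python sketch below reports a sample prevalence with a Wilson score 95% confidence interval; the Wilson method is one reasonable choice (it behaves better than the simple normal interval near 0 or 1), and the survey counts are hypothetical.

```python
import math


def wilson_interval(cases: int, n: int, z: float = 1.96):
    """Prevalence estimate with a Wilson score confidence interval."""
    if n <= 0 or not 0 <= cases <= n:
        raise ValueError("need 0 <= cases <= n and n > 0")
    p = cases / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical survey: 120 of 400 sampled adults have the condition.
p, lo, hi = wilson_interval(120, 400)
print(f"prevalence {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```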

When descriptive information about a population is the objective of the study, an
important concern is that the sample represents the target population well.
Comparative information may be thought of as an extension of descriptive
information, in which descriptive information on two or more groups or populations is
compared; the focus is on the comparison.
An important concern when seeking comparative information is ensuring comparability
of the populations or groups. Direct or indirect age standardization is one way to
improve comparability. Other techniques include stratification or adjustment on age,
sex and other variables in non-interventional studies, and randomization in
intervention studies.
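Direct age standardization can be sketched in a few lines of Python: each population's age-specific rates are averaged using the same standard population's age structure as weights, so differences in age structure no longer distort the comparison. In the hypothetical example below, population A looks far worse on crude rates only because it is older; after standardization its rate is slightly lower than B's. The rates, population sizes and standard population are all invented for illustration.

```python
def direct_standardized_rate(age_specific_rates, standard_population):
    """Weighted average of age-specific rates, weighted by a standard population's age structure."""
    total = sum(standard_population)
    return sum(r * w for r, w in zip(age_specific_rates, standard_population)) / total


def crude_rate(age_specific_rates, own_population):
    """Crude rate: the same weighting, but using the population's own age structure."""
    return direct_standardized_rate(age_specific_rates, own_population)


# Hypothetical rates per 1000 in three age groups (young, middle, old).
rates_a, pop_a = [1, 5, 20], [2_000, 3_000, 5_000]  # older population
rates_b, pop_b = [2, 6, 18], [6_000, 3_000, 1_000]  # younger population
standard = [5_000, 3_000, 2_000]                    # hypothetical standard population

print("crude:       ", crude_rate(rates_a, pop_a), crude_rate(rates_b, pop_b))
print("standardized:", direct_standardized_rate(rates_a, standard),
      direct_standardized_rate(rates_b, standard))
```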
If aggregate data (instead of individual data) are compared (so-called correlational
study), there is a danger of “ecological fallacy”.

Associative information is information that pertains generally to the relationship
between characteristics and/or events at the level of the individual. Associative
information may be causal or non-causal, or the nature of the relationship may be
unknown.
If the association is causal, then one characteristic can be regarded as dependent
(outcome variable) and others as independent (exposures or explanatory variables).
However, independent characteristics can be “predictors” of another characteristic
without necessarily being causally related (usually the case in studies of a diagnostic
test).
The following are examples of associative information in a variety of settings.
Long-term exposure to arsenic and occurrence of bladder cancer.
Proximity of residence to a tin-mine and blood lead levels.
History of Chlamydia infection and occurrence of ectopic pregnancy.
Lack of exercise and childhood obesity.
The major concerns in obtaining associative information, when causal association is of
interest, are assessing the evidence for causality and controlling for confounding.

Informational goal
The goal of the study may be to test a hypothesis or to estimate the value of a
parameter. When a hypothesis is to be tested, consideration should be given to the
power of the study; when estimation is the goal, an important consideration is the
precision with which the estimate can be made.
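The two planning considerations above lead to different calculations, sketched in Python below. For estimation, the sample size needed to estimate a proportion within a chosen margin uses n = z²p(1 − p)/d². For hypothesis testing, the approximate power of a two-sided comparison of two proportions uses the normal approximation (ignoring the negligible contribution of the opposite tail). Both formulas assume simple random sampling with no design effect or finite population correction, and the planning values are hypothetical.

```python
import math
from statistics import NormalDist


def n_for_prevalence(p_expected: float, margin: float, confidence: float = 0.95) -> int:
    """Sample size to estimate a proportion within +/- margin (simple random sampling)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p_expected * (1 - p_expected) / margin**2)


def power_two_proportions(p1: float, p2: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two proportions (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return NormalDist().cdf(abs(p1 - p2) / se - z_alpha)


# Hypothetical planning values.
n_needed = n_for_prevalence(p_expected=0.30, margin=0.05)          # estimation goal: precision
power = power_two_proportions(0.30, 0.20, n_per_group=300)        # testing goal: power
print(n_needed, round(power, 2))
```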

Width of the information sought


A study may aim to obtain some broad information or to answer a narrow question.
Thus, we may think of:
Exploratory studies, such as a study to identify risk factors for a disease, or
Directed studies, such as a study to determine the strength of association between a
specific risk factor and some disease.

Directionality of information and of the study


The information sought may refer to a single time point for each individual enrolled in
the study or each unit of analysis, and thus referred to as “cross-sectional” or
“transverse” information. Examples may be
HIV status of mothers attending for antenatal care;
The presence of carcino-embryonic antigen in blood and presence of colon cancer.
On the other hand, the information may refer to a period of time for each subject
enrolled in the study, i.e., longitudinal information. For longitudinal information, the
direction of the study may be forward or backward.
Consider the following examples:
Follow-up of patients diagnosed with stomach cancer to determine their
survival duration.
Follow-up of smokers and non-smokers to determine the cumulative incidence
of myocardial infarction.
Follow-up of patients with disease D randomized to treatment A or treatment
B to compare the percentage cure in each treatment.
Comparison of history of exposure to betel-quid chewing among subjects with
and without oral cancer. (categorical outcome)
Comparison of history of playing in contaminated streams and blood lead
concentrations in children. (continuous outcome)
Think back to the conventional study designs outlined in the first part of this module.
Can you fit each of the above examples into one or other of those designs?

Type of outcome
The outcome may be categorical or continuous. It may also be an incident event or a
prevalent characteristic.

Time relationship between the conduct of the study and the time to
which the data refer.
If the data refer to the time at which or during which the study is conducted, then the
study can be called a concurrent study. On the other hand, if the data sought refer to
conditions and/or events in the past, then the timing of the study is historical, for
instance:
A study of association using a historical cohort in which both exposure and
outcome have already occurred before the onset of the study.
However, many studies have both concurrent and historical components, and may
therefore be considered to have mixed timing.

Representativeness of the sample


Samples may be drawn that are fully representative of the target population except for
sampling error. Such samples may be called “non-contrived” samples; that is, the sample
has not been altered or contrived in any way from the composition of the target
population. Such samples may be obtained by simple random sampling, but also by a
number of other sampling strategies, such as cluster sampling, systematic sampling with
a random start, etc.
A contrived sample, on the other hand, is one whose composition has been
purposely altered (that is, contrived) to suit the objective of the study. It is, therefore,
not directly representative of the population from which it is drawn. The aspect of the
population that is deliberately altered may be an exposure of interest, an outcome of
interest, or some other characteristic of the population. If the proportion of subjects in
the sample with characteristic “x” is deliberately altered, then the sample is said to
be “contrived on x”.
Consider the following study scenarios:
A “backward”-type of longitudinal study might sample separately from the
sub-populations with and without disease.
A study to examine the association between exercise and childhood obesity
might sample separately from boys and girls.
A follow-up study might sample separately from exposed and non-exposed
sub-populations.
While the above examples use contrived samples (contrived on outcome, on some other
characteristic (here, sex), and on exposure, respectively), the sample within each level
of “x” is likely to be non-contrived, even though the sample as a whole is, in each case,
contrived.
Consideration of whether the sample is or is not contrived is important, as it has
implications for the analytical strategies that are appropriate and on the interpretations
that can be drawn from these analyses.
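One consequence for analysis can be shown with a small Python sketch. A hypothetical source population is sampled in a way contrived on outcome (half of all cases but only 1 in 20 non-cases, using expected counts rather than random draws). In the contrived sample, the proportion with disease and the risk ratio no longer reflect the population, but the odds ratio is unchanged, which is why the odds ratio is the natural measure for samples contrived on outcome.

```python
from fractions import Fraction

# Hypothetical source population: counts by exposure and disease.
population = {"exposed_case": 200, "unexposed_case": 300,
              "exposed_noncase": 2_000, "unexposed_noncase": 8_000}


def odds_ratio(t):
    """Cross-product ratio of the 2 x 2 table."""
    return Fraction(t["exposed_case"] * t["unexposed_noncase"],
                    t["unexposed_case"] * t["exposed_noncase"])


def risk_ratio(t):
    """Proportion diseased among exposed divided by proportion among unexposed."""
    risk_exposed = Fraction(t["exposed_case"], t["exposed_case"] + t["exposed_noncase"])
    risk_unexposed = Fraction(t["unexposed_case"], t["unexposed_case"] + t["unexposed_noncase"])
    return risk_exposed / risk_unexposed


# Sample contrived on outcome: sampling fraction depends on disease status only.
fraction_by_outcome = {"case": Fraction(1, 2), "noncase": Fraction(1, 20)}
sample = {k: v * fraction_by_outcome[k.split("_")[1]] for k, v in population.items()}

print("odds ratio: population", odds_ratio(population), "sample", odds_ratio(sample))
print("risk ratio: population", float(risk_ratio(population)), "sample", float(risk_ratio(sample)))
```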
Question: Would you consider the sample used in experimental studies (non-randomized
studies and randomized controlled trials) to be contrived or not? And under what
circumstances?

Combining these attributes to build up a study design


The above attributes can be combined in a wide variety of ways. Some combinations
correspond to the conventional epidemiological study designs outlined in the first part
of this module. Other combinations deviate from the conventional designs or may be
intermediate between two conventional designs. There are yet other combinations
which make no logical sense and therefore would be totally inappropriate for an
epidemiological study.

Exercise 1: Construct a two-dimensional 4 x 3 table in which the rows represent the
representativeness of the sample (respectively non-contrived, contrived on exposure,
contrived on outcome and contrived on some other characteristic) and the columns
represent directionality of the study (respectively transverse, backward longitudinal
and forward longitudinal) and indicate which combinations correspond to which
conventional design (if any), which are still possible and which impossible.

Exercise 2: Then add in the types of outcome (categorical or continuous, and
event or characteristic) and again see which of the added combinations correspond to
conventional study designs, and which may not correspond but are still possible.

Exercise 3: Consider your own proposed research. Which of the above attributes
apply to your study? Please indicate the attributes of your study on a table similar to
that created in the previous exercise.
