Applying Quantitative Bias Analysis to Epidemiologic Data
Timothy L. Lash
Boston University
School of Public Health
715 Albany St.
Boston, MA 02118, USA

Matthew P. Fox
Boston University
School of Public Health
715 Albany St.
Boston, MA 02118, USA
Aliza K. Fink
Boston University
School of Public Health
715 Albany St.
Boston, MA 02118, USA
Series Editors

M. Gail
National Cancer Institute
Bethesda, MD 20892
USA

K. Krickeberg
Le Chatelet
F-63270 Manglieu
France

J. Samet
Department of Preventive Medicine
Keck School of Medicine
University of Southern California
1441 Eastlake Ave. Room 4436, MC 9175
Los Angeles, CA 90089
USA

A. Tsiatis
Department of Statistics
North Carolina State University
Raleigh, NC 27695
USA

W. Wong
Department of Statistics
Stanford University
Stanford, CA 94305-4065
USA
Excel and Visual Basic are trademarks of the Microsoft group of companies. SAS and all other SAS
Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in
the USA and other countries. Stata is a registered trademark of StataCorp LP.
Preface
analyzing epidemiologic data without it. We have aimed the text at readers who
have some familiarity with epidemiologic research and intermediate data analysis
skills. For those without those skills, we suggest a comprehensive methods text,
such as Modern Epidemiology, which can be used in conjunction with this text to
provide a foundation in epidemiologic terminology, study design, and data analysis.
Readers with advanced skills, particularly statistical skills, might yearn for a fully
Bayesian treatment of the topic of bias analysis. Our approach is intentionally more
fundamental, in the hope that a wider audience of epidemiologists and data analysts
will adopt bias analysis methods if they do not have to simultaneously confront the
barriers (real or perceived) of Bayesian statistics.
An important adjunct resource for this textbook is the suite of freely available
spreadsheets and software available for download at https://sites.google.com/site/
biasanalysis/. (N.B. The credit for development of these tools goes solely to
Matthew Fox, whose perseverance and vision to enable bias analysis by these tech-
niques have added substantial value to the text.) We encourage readers to download
the software, follow the examples in the text, and then modify the fields to imple-
ment their own bias analysis. We would be delighted to hear from anyone who
improves the tools or detects an error, and we will post revised tools as they become
available. Likewise, we welcome comments, criticisms, and errata regarding the
text from readers and will maintain a log of this feedback on the aforementioned
web site.
In closing, we thank our friends and colleagues who contributed to the text
directly or indirectly. We appreciate Charles Poole's suggestion to the American
College of Epidemiology that a course on bias analysis would be of value, and we
appreciate John Acquavella's decision to accept that suggestion on behalf of the
college. Sander Greenland participated in the development and presentation of the
American College of Epidemiology workshops, and has been instrumental in
improving the methods of bias analysis. We are very grateful for his input and dedi-
cation to the topic. We also thank our colleagues who have a particular interest in
bias analysis methods; they have challenged us to develop our ideas and to com-
municate them clearly. We cannot list them all, so we acknowledge especially Charles
Poole, Carl Phillips, George Maldonado, Anne Jurek, Ken Rothman, Rebecca
Silliman, Soe Soe Thwin, Dan Brooks, and Steve Cole. We also acknowledge the
important contribution of three anonymous reviewers recruited by our patient
publisher; perhaps some have already been named above.
Funding for this project was made possible by grant 1 G13 LM008530 from the
National Library of Medicine, NIH, DHHS. The views expressed in any written
publication, or other media, do not necessarily reflect the official policies of the
Department of Health and Human Services; nor does mention by trade names, com-
mercial practices, or organizations imply endorsement by the U.S. Government.
Sections of Chapter 1 first appeared in: Lash TL. Heuristic thinking and inference
from observational epidemiology. Epidemiology 2007;18(1):67–72 and are used
here with permission. Examples of misclassified data in Chapters 6, 7, and 8 are
used with kind permission from Springer Science+Business Media and first
appeared in: Fink AK, Lash TL. A null association between smoking during
pregnancy and breast cancer using Massachusetts registry data. Cancer Causes and
Control 2003;14(5):497–503. The example of multidimensional bias analysis
involving colorectal cancer mortality and 5-fluorouracil treatment is used with
permission and first appeared in: Sundararajan V, Mitra N, Jacobson JS, Grann VR,
Heitjan DF, Neugut AI. Survival associated with 5-fluorouracil-based adjuvant
chemotherapy among elderly patients with node-positive colon cancer. Annals of
Internal Medicine 2002;136(5):349–357.
Boston, MA Timothy L. Lash
Matthew P. Fox
Aliza K. Fink
Contents
6 Misclassification ......................................................................................... 79
Introduction .................................................................................................. 79
Definitions and Terms .................................................................................. 80
Conceptual .............................................................................................. 80
Calculating Classification Bias Parameters from Validation Data.......... 82
Sources of Data ....................................................................................... 83
Introduction
provide a measure of the effectiveness of therapies for which efficacy has been
established by randomized designs in clinical practice settings that involve patients
with characteristics that differ from the clinical trial subjects (e.g., the elderly or other
underserved subpopulations). Thus, nonrandomized epidemiologic research contrib-
utes to the knowledge base for disease prevention, early detection, and treatment.
Objective
The objective of this text is to reduce the aforementioned barriers to regular imple-
mentation of quantitative bias analysis. Epidemiologic studies yield effect
estimates such as the risk ratio, rate ratio, odds ratio, or risk difference, all of which
compare measurements of the occurrence of an outcome in a group with some com-
mon characteristic (such as an exposure) with the occurrence of the outcome in a sec-
ond group with some other common characteristic (such as the absence of exposure).
The error accompanying an effect estimate can be measured by the expected square of
its difference from the true effect, which conventionally parses into random error
(variance) and systematic error (bias squared). Under this construct, random error is that which approaches zero
as the study size increases and systematic error is that which does not. The amount of
random error in an effect estimate is measured by its precision, which is usually
quantified by p-values or confidence intervals that accompany the effect estimate.
The amount of systematic error in an effect estimate is measured by its validity, which
is seldom quantified. A quantitative assessment of the systematic error about an effect
estimate can be made using bias analysis.
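Written as an equation, this decomposition is the familiar mean squared error identity, shown here in generic notation (for ratio measures of effect it applies on the log scale):

    \mathrm{MSE}(\hat{\theta}) \;=\; \mathrm{E}\big[(\hat{\theta}-\theta)^{2}\big] \;=\; \mathrm{Var}(\hat{\theta}) \;+\; \big(\mathrm{E}[\hat{\theta}]-\theta\big)^{2}

where \hat{\theta} is the effect estimate, \theta is the true effect, the variance term is the random error, and the squared bias term is the systematic error.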
In this text, we have collected existing methods of quantitative bias analysis,
explained them, illustrated them with examples, and linked them to tools for imple-
mentation. The second chapter provides a guide to choosing the method most
appropriate for the problem at hand and for making inference from the methods
results. The software tools automate the analysis in familiar software and provide
output that reduces the resources required for presentation. Probabilistic bias analysis
and multiple biases modeling, for example, yield output that is no more compli-
cated to present and interpret than the conventional point estimate and its associated
confidence interval.
While we have compiled a set of methods to address comprehensively the most
common threats to a study result's validity (selection bias, information bias, and
unmeasured confounding), we have not addressed all possible threats to validity or even
all methods to address these common threats. For example, we have not addressed
model misspecification or bias from missing data. We have not addressed empirical
methods of bias analysis or Bayesian methods of bias analysis, although these methods
are related to many of the methods we do present. The interested reader can find
textbooks and journal articles that describe these methods, some of which can be
implemented by freely available software that can be downloaded from the internet.
We have not presented these methods for several reasons. First, this text is
directed to practicing epidemiologists who are familiar with these threats to validity
and who are comfortable with spreadsheets and relatively fundamental SAS
software programming. The alternative methods often require more sophisticated
computer programming than required to implement the methods we present. Second,
the empirical methods often require assumptions about the accuracy of the data
source used to inform the bias analysis, which we believe can seldom be supported.
We prefer to recognize that the validation data are often themselves measured with
error, and that this error should be incorporated into the bias analysis. The methods we
present more readily accommodate this preference. Third, the Bayesian methods are
similar to the probabilistic bias analysis methods and probabilistic multiple bias
analysis methods we present toward the end of this text. The primary difference is that
the Bayesian methods require specification of a prior for the parameter to be esti-
mated (i.e., ordinarily the association between an exposure and an outcome). While we
recognize and even agree with this Bayesian approach to data analysis and inference,
particularly compared with the inherent frequentist prior that any association is equally
likely, this text is not the forum to continue that debate.
An Alternative
Heuristics
A substantial literature from the field of cognitive science has demonstrated that
humans are frequently biased in their judgments about probabilities and at choosing
between alternative explanations for observations (Piattelli-Palmarini, 1994b; Kahneman
et al., 1982; Gilovich et al., 2002), such as epidemiologic associations. Some cognitive
scientists postulate that the mind uses dual processes to solve problems that require
such evaluations or choices (Kahneman and Frederick, 2002; Sloman, 2002).
The first system, labeled the Associative System, uses patterns to draw inferences.
We can think of this system as intuition, although any pejorative connotation of that
label should not be applied to the associative system. The second system, labeled the
Rule-Based System, applies a logical structure to a set of variables to draw infer-
ences. We can think of this system as reason, although the label alone should not
connote that this system is superior. The Associative System is not necessarily less
capable than the Rule-Based System; in fact, skills can migrate from the Rule-Based
System to the Associative System with experience. The Associative System is
in constant action, while the Rule-Based System is constantly monitoring the
Associative System to intervene when necessary. This paradigm ought to be familiar;
we have all said at some time "Wait a minute, let me think," by which we do not mean
that we have not yet thought, but that we are not satisfied with the solution our
Associative System's thought has delivered. After the chance to implement the
Rule-Based System, we might say "On second thought, I have changed my mind,"
by which we mean that the Rule-Based System has overwritten the solution initially
delivered by the Associative System.
The process used by the Associative System to reach a solution relies on heuristics.
A heuristic reduces the complex problem of assessing probabilities or predicting
uncertain values to simpler judgmental operations (Tversky and Kahneman,
1982b). An example of a heuristic often encountered in epidemiologic research is
the notion that nondifferential misclassification biases an association toward the
null. Heuristics often serve us well because their solutions are correlated with the
truth, but they can sometimes lead to systematic and severe errors (Tversky and
Kahneman, 1982b). Nondifferential and nondependent misclassification of a
dichotomous exposure leads to the expectation that an association will be biased
toward the null, but many exceptions exist. For example, any particular association
influenced by nondifferential misclassification may not be biased toward the null
(Jurek et al., 2005), dependent errors in classification can substantially bias an
association away from the null even if classification errors are nondifferential
(Kristensen, 1992), nondifferential misclassification of disease may not lead to any
bias in some circumstances (Brenner and Savitz, 1990), and a true association may
not provide stronger evidence against the null hypothesis than the observed associa-
tion based on the misclassified data even if the observed association is biased
toward the null (Gustafson and Greenland, 2006). Application of the misclassifica-
tion heuristic without deliberation can lead to errors in an estimate of the strength
and direction of the bias (Lash and Fink, 2003a), as is true for more general cogni-
tive heuristics (Tversky and Kahneman, 1982b).
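To illustrate the expectation that underlies this heuristic (rather than its exceptions), the sketch below applies nondifferential, nondependent misclassification to the expected cells of a two-by-two table; the cell counts, sensitivity, and specificity are invented for the illustration:

    def misclassify(a, b, c, d, se, sp):
        # Apply the same sensitivity (se) and specificity (sp) to cases and
        # controls (nondifferential, nondependent), acting on expected counts.
        # a, b = exposed/unexposed cases; c, d = exposed/unexposed controls.
        a_obs = a * se + b * (1 - sp)
        b_obs = b * sp + a * (1 - se)
        c_obs = c * se + d * (1 - sp)
        d_obs = d * sp + c * (1 - se)
        return a_obs, b_obs, c_obs, d_obs

    def odds_ratio(a, b, c, d):
        return (a / c) / (b / d)

    # True table with odds ratio = 4.0
    a, b, c, d = 200, 100, 100, 200
    print(odds_ratio(a, b, c, d))                                # 4.0
    print(odds_ratio(*misclassify(a, b, c, d, se=0.8, sp=0.9)))  # ~2.6, toward the null

Reversing this calculation (solving for the true cells given the observed cells and the classification parameters) is the core of the simple bias analysis developed in Chap. 6.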
Cognitive scientists have identified several classes of general heuristics, three of
which are described below because they may be most relevant to causal inference
based on nonrandomized epidemiologic results. These heuristics have the follow-
ing characteristics in common (Piattelli-Palmarini, 1994a). First, the errors in judg-
ments attributable to the heuristic are systematic and directional; that is, they
always act in the same way and in the same direction. Second, they are general and
nontransferable; that is, all humans are susceptible to the errors and knowledge of
how they act does not immunize us against them. Third, they are independent of
intelligence and education; that is, experts make the same mistakes as novices,
particularly if the problem is made a little more difficult or moved a small distance
outside of their expertise. While studies that have elicited an understanding of these
heuristics have most often been conducted in settings that are not very analogous to
causal inference using epidemiologic data, one such study has been conducted and its
results corresponded to results elicited in the cognitive science setting (Holman et al.,
2001). In addition, these heuristics have been shown to affect evidence-based forecasts
of medical doctors, meteorologists, attorneys, financiers, and sports prognosticators
(Koehler et al., 2002). It seems unlikely that epidemiologists would be immune.
association to account for the bias only so far as is plausible, which adjustment will,
on average, be insufficient.
Overconfidence
and cultures (Yates et al., 2002), and does not depend strongly on the accuracy with
which respondents estimate the median (Alpert and Raiffa, 1982). In fact, the
discrepancy between correctness of response and overconfidence increases with
the knowledge of the respondent. That is, when a response requires considerable
reasoning or specialized knowledge, the accuracy of experts exceeds the accuracy of
novices, but the extent of their overconfidence compared with novices increases
faster than the extent of their accuracy (Piattelli-Palmarini, 1994b).
How might the overconfidence heuristic affect inference from nonrandomized
epidemiologic results? Consider the conventional frequentist confidence interval
about a point estimate associating an exposure with a disease, derived from a
study's results, to be an uncertainty range like the interquartile range described
above. Further consider that stakeholders may be aware that the interval fails to
account for uncertainty beyond random error, so should be considered a minimum
description of the true uncertainty. Can the stakeholders be expected to inflate the
interval sufficiently to account for sources of uncertainty aside from random error?
An understanding of the overconfidence heuristic suggests that the intuitive infla-
tion will be predictably insufficient.
An implicit assumption is that all who are truly diseased will also test positive.
Almost half of the respondents answered 95%, which takes account of only the specific
evidence (the patient's positive test) and completely ignores the base-rate information
(the prevalence of the disease in the population). The correct response of 2% was
given by 11 students. Base-rate information is more likely to be taken into account
when it is perceived to be causal rather than incidental (Tversky and Kahneman,
1982a). So, for example, respondents asked to estimate the probability that a
particular student passed an examination, given specific evidence about the student
(past academic performance), were more likely to integrate base-rate information
when it appeared causal (the examination was more or less difficult, based on the
proportion passing or failing among all students who took the examination) than
when it appeared incidental (the proportion of students passing or failing was fixed
by selection from all students who took the examination). Failure to account for the
base-rate does not derive solely from innumeracy (Nisbett et al., 1982). Rather, the
specific evidence is ordinarily perceived to be concrete and emotionally interesting,
thereby more readily inspiring the respondent to write a mental script to explain its
relevance. Base-rate information is perceived to be abstract and emotionally unin-
teresting, so less likely to inspire the Associative System to write an explanatory
script that contradicts the specific evidence.
How might failure to account for the base-rate affect inference from nonrandomized
epidemiologic results? Consider a conventional epidemiologic result, composed of
a point estimate associating an exposure with a disease and its frequentist confidence
interval, to be specific evidence about a hypothesis that the exposure causes the
disease. Further consider that stakeholders have devoted considerable effort to gen-
erating and understanding the research results. Can the stakeholders be expected to
take account of the base-rate of true hypotheses among those studied by epidemi-
ologists, which may not be very high? An understanding of this last heuristic
suggests that stakeholders are not likely to adequately account for the base-rate,
despite exhortations to use base-rate information in epidemiologic inference (Poole,
2001; Wacholder et al., 2004; Greenland and Robins, 1991; Greenland, 2006).
Conclusion
Epidemiologists are not alone among scientists in their susceptibility to the system-
atic errors in inference engendered, in part, by influence of the heuristics described
above. For example, a review of measurements of physical constants reported con-
sistent underestimation of uncertainty (Henrion and Fischhoff, 2002). Measurements
of the speed of light overestimated the currently accepted value from 1876 to 1902
and then underestimated it from 1905 to 1950. This pattern prompted one investigator
to hypothesize a linear trend in the speed of light as a function of time and a second
investigator to hypothesize a sinusoidal relation. In reaction, Birge adjusted a set of
measurements for systematic errors, produced corrected values and intervals that
overstated rather than understated the uncertainty, and concluded that the speed
of light was constant (Birge, 1941). Henrion and Fischhoff (2002) attribute the
consistent underassessment of uncertainty in measurements of physical constants
to investigators using the standard error as the full expression of the uncertainty
regarding their measurements, to the impact on their inferences of heuristics such
as those described above, and to real-world pressures that discourage a candid
expression of total uncertainty. These same three forces likely affect inference from
nonrandomized epidemiology studies as well.
Henrion and Fischhoff (2002) recommend three solutions. First, those who measure
physical constants should strive to account for systematic errors in their quantitative
assessments of uncertainty. Second, with an awareness of the cognitive literature,
those who measure physical constants should temper their inference by subjecting it
to tests that counter the tendencies imposed by the heuristics. For example, overcon-
fidence arises in part from a natural tendency to overweight confirming evidence and
to underweight disconfirming evidence. Forcing oneself to write down hypotheses
that counter the preferred (i.e., causal) hypothesis can reduce overconfidence in that
hypothesis. Finally, students should be taught how to obtain better measurements,
including how to better account for all sources of uncertainty and how to counter the
role of heuristic biases in reaching an inference. These same recommendations would
well serve those who measure epidemiologic associations.
It may seem that an alternative would be to reduce one's enthusiasm about
research results. In the cognitive science literature, this approach is called debiasing,
and sorts into three categories (Wilson et al., 2002): resistance, a mental operation
that attempts to prevent a stimulus from having an adverse effect; remediation, a
mental operation that attempts to undo the damage done by the stimulus; and
behavior control, an attempt to prevent the stimulus from influencing behavior.
These strategies have been shown to be ineffective solutions
to tempering the impact of the heuristics described above (Wilson et al., 2002).
Nonetheless, epidemiologists are taught to rely on debiasing when making infer-
ence. We are told to interpret our results carefully, and to claim causation only with
trepidation. Consider, for example, the advice offered by three commentators on the
disparity between randomized (Rossouw et al., 2002) and nonrandomized (Stampfer
and Colditz, 1991) studies of the association between hormone replacement therapy
and cardiovascular disease. Each included in their commentary some advice tanta-
mount to a warning to "be careful out there." One wrote that we should be "reason-
ably cautious in the interpretation of our observations" (Michels, 2003), the second
wrote that "we must remain vigilant and recognize the limitations of research
designs that do not control unobserved effects" (Piantadosi, 2003), and the third
wrote that future challenges include "continued rigorous attention to the pitfalls of
confounding in observational studies" (Whittemore and McGuire, 2003). Similar
warnings are easy to find in classroom lecture notes or textbooks.
The reality is that such trepidation, even if implemented, is ineffective. Just as
people overstate their certainty about uncertain events in the future, we also over-
state the certainty with which we believe that uncertain events could have been
predicted with the data that were available in advance, had they been more carefully
examined. The tendency to overstate retrospectively our own predictive ability is
colloquially known as "20/20 hindsight." Cognitive scientists, however, label the
tendency "creeping determinism" (Piattelli-Palmarini, 1994b).
Creeping determinism can impair one's ability to judge the past or to learn from it.
It seems that when a result such as the trial of hormone therapy becomes available,
we immediately seek to make sense of the result by integrating it into what we
already know about the subject. In this example, the trial result made sense only
with the conclusion that the nonrandomized studies must have been biased by
unmeasured confounders, selection forces, and measurement errors, and that the
previous consensus must have been held only because of poor vigilance against
biases that act on nonrandomized studies. With this reinterpretation, the trial results
seem a more or less inevitable outcome of the reinterpreted situation. Making sense
of the past consensus is so natural that we are unaware of the impact that the out-
come knowledge (the trial result) has had on the reinterpretation. Therefore, merely
warning people about the dangers apparent in hindsight, such as the recommenda-
tions for heightened vigilance quoted above, has little effect on future problems of
Introduction
Reducing Error
Every study has limited sample size, which means that every study contains some
random error. Similarly, every study is susceptible to sources of systematic error.
Even randomized epidemiologic studies are susceptible to selection bias from
losses to follow-up and to misclassification of analytic variables. Since the research
objective can never be achieved perfectly, epidemiologists should instead strive to
reduce the impact of error as much as possible. These efforts are made in the design
and analysis of the study.
To reduce random error in a study's design, epidemiologists can increase the size
of the study or improve the efficiency with which the data are distributed into the
categories of the analytic variables. Increasing the size of the study requires enrolling
more subjects and/or following the enrolled subjects for a longer period, and this
additional information ought to reduce the estimate's standard error. A second
strategy to improve an estimates precision is to improve the efficiency with which
the data are distributed into categories of the analytic variables. This strategy also
reduces the standard error of the estimate of association, which improves its precision.
Consider the standard error of the odds ratio, which equals the square root of
the sum of inverses of the frequencies of the interior cells of a two-by-two table.
As displayed in Table 2.1, the two-by-two table is the simplest contingency table
relating exposure to disease.
With this data arrangement, the odds ratio equals (a/c)/(b/d) and its standard error
equals √(1/a + 1/b + 1/c + 1/d). In a study with 100 subjects and each interior cell
frequency equal to 25, the odds ratio equals its null value of 1.0 and the standard error
of the odds ratio equals √(1/25 + 1/25 + 1/25 + 1/25) = 0.4. The 95% confidence
interval about the null odds ratio equals 0.46 to 2.19. If only 40% of the cases were
located, but the sample size remained constant by increasing the case to control ratio
to 1 to 4, rather than 1 to 1, the odds ratio would remain null. The odds ratio's
standard error would then equal √(1/10 + 1/10 + 1/40 + 1/40) = 0.5 and the 95%
confidence interval would equal 0.38 to 2.66. Although the sample size (100 subjects)
did not change, the standard error and the width of the confidence interval (measured
on the log scale) have both increased by 25% due only to the less efficient distribution
of the subjects within the contingency table. This example illustrates how the
efficiency of the distribution of data within the categories of analytic variables affects
the study precision, given a fixed sample size, even when no bias is present.
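For readers who wish to reproduce this arithmetic, a minimal sketch follows; the function name and layout are illustrative rather than taken from the tools described in the Preface:

    import math

    def or_with_ci(a, b, c, d, z=1.96):
        # a, b = exposed/unexposed cases; c, d = exposed/unexposed controls
        or_ = (a / c) / (b / d)
        se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
        lower = math.exp(math.log(or_) - z * se)
        upper = math.exp(math.log(or_) + z * se)
        return or_, se, lower, upper

    print(or_with_ci(25, 25, 25, 25))  # OR 1.0, SE 0.4, 95% CI 0.46 to 2.19
    print(or_with_ci(10, 10, 40, 40))  # OR 1.0, SE 0.5, 95% CI 0.38 to 2.66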
Improving the efficiency with which the data are distributed requires an under-
standing of the distribution of the exposure and disease in the source population. If
one is interested in studying the relation between sunlight exposure and melanoma
incidence, then a population in the northern United States might not have an efficient
distribution of the exposure compared with a population in the southern United
States where sunlight exposure is more common. If one is interested in studying the
relation between tanning bed exposure and melanoma incidence, then a population
in the northern United States might have a more efficient distribution of the exposure
than a population in the southern United States. Careful selection of the source popula-
tion is one strategy that investigators can use to improve the efficiency of the distri-
bution of subjects within the categories of the analytic variables.
Matching is a second strategy to improve the efficiency of this distribution.
Matching a predetermined number of controls to each case on potential confound-
ers assures that the controls will appear in a constant ratio to cases within the cat-
egories of the confounder. For example, skin type (freckled vs unfreckled) might
confound the relation between sunlight exposure and melanoma incidence. Cases
of melanoma may be more likely to have freckled skin than the source population
that gives rise to cases, and people with freckled skin might have different exposure
to sunlight than people with unfreckled skin. Without matching, most of the cases
will be in the category of the confounder denoting freckled skin, and most of the
controls will be in the category of the confounder denoting unfreckled skin because
it is more common in the source population. This disparity yields an inefficient
analysis, and therefore a wider confidence interval. Matching controls to cases
assures that controls appear most frequently in the stratum where cases appear most
frequently (e.g., freckled skin), so the analysis is more efficient and the confidence
interval narrower. Matching unexposed to exposed persons in cohort studies can
achieve a similar gain in efficiency.
To reduce systematic error in a study's design, epidemiologists should focus on
the fundamental criterion that must be satisfied to obtain a valid comparison of the
disease incidence in the exposed group with the disease incidence in the unexposed
group. That is, the unexposed group must have the disease incidence that the
exposed group would have had, had they been unexposed (Greenland and Robins,
1986), within the strata of measured confounders. The ideal study would compare
the disease occurrence in the exposed group (a factual, or observable, disease inci-
dence) with the incidence they would have had, had they been unexposed (a coun-
terfactual, or unobservable, disease incidence). Since the ideal comparison can never
be realized, the disease incidence is measured in a surrogate group: a second group
of subjects who are unexposed and whose disease experience we substitute for the
counterfactual ideal. The validity of that substitution, which cannot be verified,
directly impacts the validity of the estimate of association. The investigator must
strive for the desired balance in the collapsed data, which is achievable within prob-
ability limits by randomization, or within strata of measured confounders.
With this criterion in mind, the design principles to enhance validity follow
directly. The study population should be selected such that participation is not
conditional on exposure status or disease status. When both exposure status and
disease status affect the probability that a member of the source population partici-
pates in the study, the estimate of association will be susceptible to selection bias.
Enrolling subjects and/or documenting their exposure status before the disease
occurs (i.e., prospectively) assures that disease status cannot be associated with
initial participation.
Second, the study population should be selected such that the net effects of all
other predictors of the outcome, aside from exposure itself, are in balance between
the exposed and unexposed groups. This balance is commonly referred to as having
no confounding. Randomization achieves this objective within limits that are statis-
tically quantifiable. When exposure status cannot be assigned by randomization,
which is usually the situation in studies of disease etiology, the investigator can
limit confounding by restricting the study population to one level of the confounder
or ensuring that data are collected on potential confounders so that their effects can
be assessed in the analysis.
Finally, the data should be collected and converted to electronic form with as
few classification errors as possible. Some errors in classification are, however,
inevitable. Investigators often strive to assure that the rates of classification errors
do not depend on the values of other variables (e.g., rates of exposure classification
errors do not depend on disease status, which is called nondifferential exposure
misclassification) or on the proper classification of other variables (e.g., errors in
classification of exposure are as likely among those properly classified as diseased
as among those improperly classified as diseased, which is called independent
exposure misclassification). This second objective can be readily achieved by using
different methods to collect information on disease status from those used to collect
information on exposure status (as well as information on confounders). The data
collection for disease status should be conducted so that the data collector is
blinded to the information on exposure and confounder status. Nondifferential and
independent errors in classification often yield the most predictable, and therefore
most readily correctable, bias of the estimate of association. Nonetheless, one may
choose to select a design expected to yield relatively small differential classification
errors in preference to a design expected to yield relatively large nondifferential
classification errors, since the former would yield less bias and uncertainty.
Generalized advice always to balance information quality across compared catego-
ries (i.e., to strive for nondifferential classification errors) ignores the potential for
this trade-off to favor small differential errors.
shows the demographic characteristics of the people in the study. For example,
it might show the proportion of the population enrolled at each of the study centers, the
distribution of age, and the proportion belonging to each sex. Descriptive analyses
should include the proportion of the study population with a missing value assigned
to each analytic variable. The proportion with missing data helps to identify analytic
variables with problems in data collection, definition, or format conversion.
Examination of the bivariate relations between analytic variables is the third step
in data analysis. Bivariate relations compare proportions, means, or medians for
one study variable within categories of a second. These comparisons inform the
analyst's understanding of the data distributions and can also identify data errors
that would prompt an inspection of the data collection, variable definitions, or for-
mat conversions. The number of bivariate relations that must be examined grows
quadratically (n variables yield n(n-1)/2 pairs) as the number of analytic variables increases. If the number grows
too large to be manageable, the analyst should restrict the examination to pairs that
make sense a priori. However, whenever possible, all pairs ought to be examined
because a surprising and important finding might easily arise from a pair that would
be ignored a priori.
The comparisons of the proportions with the disease of interest within the cate-
gories of the analytic variables are a special subset of bivariate comparisons. These
proportions can be explicitly compared with one another by difference or division,
yielding estimates of association such as the risk difference, risk ratio, or a
difference in means. When estimates of association are calculated as a part of the
bivariate comparison, the analysis is also called a stratified analysis. Often one
comparison is a focus of the stratified analysis, which is the comparison of the
disease proportions in those exposed to the agent of interest with those unexposed
to the agent of interest. This comparison relates directly to the original objective: a
valid and precise estimate of the effect of an exposure on the occurrence of a dis-
ease. To continue the stratified analysis, the comparisons of disease proportions in
exposed versus unexposed are expanded to comparisons within levels of other ana-
lytic variables. For example, the risk ratio comparing exposed with unexposed
might be calculated within each of the three age groups. An average risk ratio can
be calculated by standardization or pooling. Comparison of this average or sum-
marized risk ratio with the crude or collapsed risk ratio (including all ages in one
stratum) indicates whether age is an important confounder of the risk ratio. If the
pooled risk ratio is substantially different from the crude risk ratio, then the pooled risk
ratio will provide an estimate of association that is unconfounded (by age) and is
precision-enhancing, in that its confidence interval will be narrower than those
obtained from alternative methods for averaging the risk ratio across strata of age.
Pooling reduces both the random error (by yielding a precision-enhancing estimate
of association) and the systematic error (by yielding an estimate of association
unconfounded by age). The correspondence between noncollapsibility and con-
founding holds also for the odds ratio, hazard ratio, rate ratio, and rate difference,
so long as the risk of disease is low (<10%) in every combination of the categories
of exposure and the categories of controlled confounders. When the risk of disease
is greater than 10%, these estimates of association may not be collapsible across
strata of a control variable, even if that variable is not a confounder.
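A short sketch with invented counts makes the crude-versus-pooled comparison concrete: the risk ratio is null within each age stratum, but the crude (collapsed) estimate is confounded by age; the Mantel-Haenszel estimator shown is one common pooling method:

    def crude_rr(strata):
        # Collapse all strata into one table.
        # Each stratum: (cases_exp, n_exp, cases_unexp, n_unexp)
        a = sum(s[0] for s in strata); n1 = sum(s[1] for s in strata)
        b = sum(s[2] for s in strata); n0 = sum(s[3] for s in strata)
        return (a / n1) / (b / n0)

    def mh_rr(strata):
        # Mantel-Haenszel pooled risk ratio across strata
        num = sum(a * n0 / (n1 + n0) for a, n1, b, n0 in strata)
        den = sum(b * n1 / (n1 + n0) for a, n1, b, n0 in strata)
        return num / den

    strata = [(10, 100, 20, 200),   # younger: risks 0.10 vs 0.10
              (60, 200, 30, 100)]   # older:   risks 0.30 vs 0.30
    print(crude_rr(strata))  # 1.4, confounded by age
    print(mh_rr(strata))     # 1.0, pooled (age-adjusted)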
The analysis can proceed by further stratification on a second variable (e.g., sex
groups) and pooling to simultaneously adjust for confounding by both age and sex.
The number of strata increases geometrically as additional variables are analyzed,
which can become confusing as the number of strata increases beyond what can be
easily reviewed on a single page. In addition, the data quickly become too sparse
for pooling as the frequencies in some cells fall below about five and may reach
zero. A common solution to the problem engendered by this geometric progression
is to use regression modeling rather than stratification. Regression models yield
estimates of association that simultaneously adjust for multiple confounders and
that are also precision-enhancing. Their advantage over stratification is that they do
not become cumbersome or suffer from small numbers as easily as multiple strati-
fication. However, regression modeling does not show the data distribution, so
should not be used without first conducting the bivariate analysis and stratification
on the critical confounders.
This analytic plan describes the conventional epidemiologic approach to data
analysis. It yields a quantitative assessment of random error by producing confi-
dence intervals about the crude or pooled estimates of association. It also adjusts
the estimate of association for confounding variables included in the stratification
or regression model. However, there is no adjustment for selection bias, measure-
ment error, confounding by unmeasured confounders, or residual confounding by
measured confounders that are poorly specified or poorly measured. Nor is there
any quantification of uncertainty arising from these sources of bias. Quantitative
bias analysis addresses these shortcomings in the conventional approach to epide-
miologic data analysis.
Quantifying Error
The goal of quality study design and analysis is to reduce the amount of error in an
estimate of association. With that goal in mind, investigators have an obligation to
quantify how far they are from this goal. Quantitative bias analysis achieves this
objective. Conducting a study that will yield a measure of association with as little bias
as practical requires careful planning and choices in the design of data collection and
analysis. Similarly, quantifying the amount of residual bias requires choices in the
design of data collection and analysis. Since conducting a high-quality bias analysis
follows the same steps as conducting a high-quality epidemiologic study, plans for
both should be integrated at each phase of the study, as depicted in Fig. 2.1.
Fig. 2.1 Integration of planning for bias analysis with conventional study design and analysis

Before discussing the steps involved in planning and conducting a quantitative bias
analysis, it is important to first consider when it makes the most sense to conduct a
bias analysis and, alternatively, when a bias analysis is likely to
not be productive. Studies with wide conventional confidence intervals or that are susceptible to
many large systematic errors might instead be useful for generating ideas for better-
designed and larger subsequent studies. They should seldom, however, provide a
basis for inference or policy action, so the additional effort of quantitative bias
analysis would not be an efficient use of resources.
Quantitative bias analysis is therefore most valuable when studies yield narrow
conventional confidence intervals (and so have little residual random error) and when
these studies are susceptible to a limited number of systematic errors. Such studies
often appear to be an adequate basis for inference or for policy action, even though
only random error has been quantified by the conventional confidence interval.
Quantification of the error due to the limited number of biases will safeguard
against inference or policy action that takes account of only random error. Without
a quantitative assessment of the second important source of error (systematic error),
the inference or policy action would usually be premature.
Quantitative bias analysis is best accomplished with foresight, just as with all
aspects of epidemiologic research. The process of conducting a well-designed bias
analysis goes beyond simply understanding the methods used for the analysis, but
also includes a thorough planning phase to ensure that the information needed for
quantification of bias is carefully collected. To facilitate this collection, investiga-
tors should consider the important threats to the validity of their research while
designing their study. This consideration should immediately suggest the quantita-
tive analyses that will explore these threats, and should thereby inform the data
collection that will be required to complete the quantitative analyses.
For example, an investigator may design a retrospective case-control study of the
relation between leisure exposure to sunlight and the occurrence of melanoma.
Cases of melanoma and controls sampled from the source population will be inter-
viewed by telephone regarding their exposures to sunlight and other risk factors for
melanoma. The investigator should recognize the potential for selection bias to be
an important threat to the study's validity: cases may be more likely than controls
to agree to the interview, and those who spend substantial time in sunlight might
also participate at a different rate than those who do not spend much time in the
sun. To quantitatively address the potential selection bias (Chap. 4), the investigator
will need to know the participation proportions in cases and controls, within groups
of high and low exposure to sunlight. Case and control status will be known by
design, but to characterize each eligible subject's sunlight exposure, the investiga-
tor will need to complete the interview. Sunlight exposure will not, therefore, be
known for subjects who refuse to participate. However, in planning for a quantita-
tive bias analysis, the investigator might ask even those who refuse to participate
whether they would be willing to answer a single question regarding their sunlight
exposure. If the proportion of refusals who did agree to answer this one question
was high, this alone would allow the investigator to crudely compare sunlight
exposure history among cases and controls who refuse to participate, and to adjust
the observed estimate of association for the selection bias.
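The adjustment that this planning enables can be sketched as follows; the participation proportions are hypothetical, and the correction shown is the standard selection-odds-ratio factor of the kind developed in Chap. 4:

    def selection_adjusted_or(or_obs, s_case_exp, s_case_unexp,
                              s_ctrl_exp, s_ctrl_unexp):
        # Participation proportions within each case/control by
        # exposed/unexposed group; their cross-product ratio is the
        # factor by which selection distorts the observed odds ratio.
        selection_or = (s_case_exp * s_ctrl_unexp) / (s_case_unexp * s_ctrl_exp)
        return or_obs / selection_or

    # Hypothetical values: exposed cases participate most readily
    print(selection_adjusted_or(1.5, s_case_exp=0.9, s_case_unexp=0.8,
                                s_ctrl_exp=0.6, s_ctrl_unexp=0.7))  # ~1.14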
To continue the example, the investigators might be concerned about the accuracy
of subjects' self-report of history of leisure-time sunlight exposure. In particular,
melanoma cases might recall or report their history of sunlight exposure differently
than controls sampled from the source population. This threat to validity would be
an example of measurement error (Chap. 6), which can also be addressed by quan-
titative bias analysis. To implement a bias analysis, the investigators would require
estimates of the sensitivity and specificity of sunlight exposure classification among
melanoma cases and among members of the source population. Classification error
rates might be obtained by an internal validation study (e.g., comparing self-report
of sunlight exposure history with a diary of sunlight exposure kept by subsets of the
cases and controls) or by external validation studies (e.g., comparing self-report of
sunlight exposure history with a diary of sunlight exposure kept by melanoma cases
and noncases in a similar second population).
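From such a validation substudy, the classification parameters are simple proportions; the counts below are hypothetical:

    def se_sp(tp, fn, tn, fp):
        # Self-report (test) compared against a diary gold standard (truth)
        sensitivity = tp / (tp + fn)  # truly exposed who report exposure
        specificity = tn / (tn + fp)  # truly unexposed who deny exposure
        return sensitivity, specificity

    # Hypothetical: of 50 diary-confirmed exposed, 45 self-report exposure;
    # of 50 diary-confirmed unexposed, 40 deny exposure.
    print(se_sp(tp=45, fn=5, tn=40, fp=10))  # (0.9, 0.8)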
Finally, imagine that the investigator was concerned that the relation between
leisure time exposure to sunlight and risk of melanoma was confounded by expo-
sure to tanning beds. Subjects who use tanning beds might be more or less likely to
have leisure time exposure to sunlight, and tanning bed use itself might be a risk
factor for melanoma. If each subjects use of tanning beds was not queried in the
interview, then tanning bed use would be an unmeasured confounder (Chap. 5).
While tanning bed use would ideally have been assessed during the interview, it is
possible that its relation to melanoma risk was only understood after the study
began. To plan for bias analysis, the investigator might turn to published literature
on similar populations to research the strength of association between tanning bed
use and leisure time exposure to sunlight, the strength of association between tan-
ning bed use and melanoma, and the prevalence of tanning bed use. In combination,
these three factors would allow a quantitative bias analysis of the potential impact
of the unmeasured confounder on the study's estimate of the association of leisure
time exposure to sunlight on risk of melanoma.
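A minimal sketch of that bias analysis follows, using the classical bias-factor formula for a single unmeasured dichotomous confounder of the kind developed in Chap. 5; all parameter values are invented placeholders for what the literature review would supply:

    def confounding_adjusted_rr(rr_obs, rr_cd, p1, p0):
        # rr_cd: confounder-disease risk ratio (e.g., tanning beds and melanoma)
        # p1, p0: confounder prevalence among the exposed and the unexposed
        bias_factor = (p1 * (rr_cd - 1) + 1) / (p0 * (rr_cd - 1) + 1)
        return rr_obs / bias_factor

    print(confounding_adjusted_rr(rr_obs=2.0, rr_cd=1.5, p1=0.4, p0=0.2))  # ~1.83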
In these examples, planning for quantitative bias analysis facilitates the actual
analysis. Selection forces can be best quantified if the investigator plans to ask for
sunlight information among those who refuse the full interview. Classification error
can be best quantified if the investigator plans for an internal validation study or
assures that the interview and population correspond well enough to the circum-
stances used for an external validation study. Unmeasured confounding can be best
quantified if the investigator collects data from publications that studied similar
populations to quantify the bias parameters. Table 2.2 outlines the topics to consider
while planning for quantitative bias analysis. These topics are further explained in
the sections that follow.
confounding arises at the level of the population, so data used to correct for
an unmeasured confounder should arise from the same or a similar population,
but should not necessarily be limited to the population sampled to participate in
the study. The study sample is included in the source population, but is rarely the
entire source population. Selection bias arises from disease- and exposure-dependent
participation. In assessing selection bias, the exposure and disease informa-
tion are available for study participants, so the information required should be
collected from nonparticipants. In contrast, information bias from classification
error arises within the actual study population, so the data required for assessing
information bias should be collected from a subset of participants (an internal
validity study) or from a population similar to the participants (an external validity
study). Careful consideration of the target population will lead to a more appropriate
bias analysis.
Once the major threats to validity have been ascertained, and the population from
which validity data will be collected has been identified, the investigator should
devise a plan for collecting the validity data. If the validity data will be external, then
the investigator should conduct a systematic review of the published literature to find
applicable validity studies. For example, if the investigator of the sunlight-melanoma
relation is concerned about errors in reporting of sunlight exposure, then she should
collect all of the relevant literature on the accuracy of self-report of sunlight exposure.
Studies that separate the accuracy of exposure by melanoma cases and noncases will
be most relevant. From each of these studies, she should abstract the sensitivities and
specificities (or predictive values) of self-report of sunlight exposure. Some estimates
might be discarded if the population is not similar to the study population. Studies of
the accuracy of self-report of sunlight exposure in teenagers would not provide good
external validity information for a study of melanoma cases and controls, because
there would be little overlap in the age range of the teenagers who participated in the
validity study and the melanoma cases and controls who participated in the investiga-
tors study. Even after discarding the poorly applicable validity data, there will often
be a range of values reported in the literature, and the investigator should decide how
to best use these ranges. An average value or a preferred value (e.g., the value from
the external population most like the study population) can be used with simple bias
analysis, or the range can be used with multidimensional bias analysis, probabilistic
bias analysis, or multiple biases modeling.
If the validity data will be internal, then the investigator should allocate study
resources to conduct the data collection required for the quantitative bias analysis.
If nonparticipants will be crudely characterized with regard to basic demographic
information such as age and sex, so that they can be compared to participants, then
the data collection system and electronic database should allow for designation of
nonparticipant status and for the data items that will be sought for nonparticipants.
If a validity substudy will be implemented to characterize the sensitivity and
specificity of exposure, then resources should be allocated to accomplish the substudy.
A protocol should be written to sample cases and controls (usually at random) to
participate in the diary verification of self-reported sunlight exposure. The substudy
protocol might require additional informed consent, additional recruitment materials,
and will certainly require instructions for subjects on how to record sunlight exposure
in the diary and a protocol for data entry.
These examples do not fully articulate the protocols required to plan and collect
the data that will inform a quantitative bias analysis. The same principles for designing
well-conducted epidemiologic studies apply to the design of well-conducted validity
studies. The reader is again referred to texts on epidemiologic study design, such as
Modern Epidemiology (Rothman et al., 2008b), to research the details of valid
study design. The larger point, though, is that the data collection for a validity sub-
study should not be underestimated. The investigator should plan such studies at the
outset, should allocate study resources to the data collection effort, and should
assure that the validation substudy is completed with the same rigor as applied to
the principal study.
Valid epidemiologic data analysis should begin with an analytic strategy that
includes plans for quantitative bias analysis at the outset. The plan for quantitative
bias analysis should make the best use of the validation data collected per the
design described above.
When multiple sources of systematic error are to be assessed in a single study, the
order of corrections in the analysis can be important. In particular, adjustments for
classification errors as a function of sensitivity and specificity do not reduce to a
multiplicative bias factor. The place in the order in which an adjustment for classifica-
tion error will be made can therefore affect the result of the bias analysis. In general,
the investigator should reverse the order in which the errors arose. Errors in classifi-
cation arise in the study population, as an inherent part of the data collection and
analysis, so should ordinarily be corrected first. Selection bias arises from differences
between the study participants and the source population, so should ordinarily be
corrected second. Confounding exists at the level of the source population, so error
arising from an unmeasured confounder should ordinarily be analyzed last.
While this order holds in general, exceptions may occur. For example, if internal
validation data on classification errors are used to correct for information bias, and
the internal validation data were collected after participants were selected into the
study population, then one would correct first for classification error and then for
selection bias. Were the internal validation data collected before participants were
selected into the study population, then one would correct first for selection bias
and then for classification error. In short, one should follow the study design in
reverse to determine the appropriate order of bias analysis. See Chap. 9 on multiple
bias analysis for a more complete discussion of the order of corrections.
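To see concretely that the order can matter (because the misclassification correction does not reduce to a multiplicative factor), consider the sketch below; the observed cells, classification parameters, and cell-specific participation proportions are all invented:

    def correct_misclassification(cells, se, sp):
        # Back-calculate true cells from observed cells given nondifferential
        # sensitivity/specificity; case and control totals are preserved.
        a, b, c, d = cells
        a_t = (a - (a + b) * (1 - sp)) / (se + sp - 1)
        c_t = (c - (c + d) * (1 - sp)) / (se + sp - 1)
        return a_t, (a + b) - a_t, c_t, (c + d) - c_t

    def correct_selection(cells, s):
        # Divide each observed cell by its participation proportion
        return tuple(x / p for x, p in zip(cells, s))

    def odds_ratio(cells):
        a, b, c, d = cells
        return (a / c) / (b / d)

    obs = (170, 130, 100, 200)   # observed cells (invented)
    se, sp = 0.8, 0.9            # classification parameters (invented)
    s = (0.9, 0.8, 0.6, 0.7)     # participation proportions (invented)

    print(odds_ratio(correct_selection(correct_misclassification(obs, se, sp), s)))  # ~3.05
    print(odds_ratio(correct_misclassification(correct_selection(obs, s), se, sp)))  # ~2.68

The two orderings give different adjusted odds ratios (about 3.05 versus 2.68), which is why the corrections should follow the study design in reverse.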
Many of the techniques for quantitative bias analysis described herein assume that
the investigator has access to record-level data. That is, they assume that the original
data set with information on each subject in the study is available for analysis.
Record-level data, or original data, allow for a wider range of methods for quantitative
bias analysis. With record-level data, corrections for classification errors can be
made at the level of the individual subjects, which preserves correlations between the
study variables and allows the analyst to adjust the corrected estimates of association
for other confounders. Furthermore, when multiple sources of systematic error are
assessed in the analysis, applying the bias analyses in the proper order to the record-level
data can easily preserve the interactions of the biases.
Some of the techniques described herein apply to summary data, or collapsed
data. That is, they apply to data displayed as frequencies in summary contingency
tables or as estimates of association and their accompanying conventional confi-
dence intervals. Investigators or stakeholders with access to only summary data
(e.g., a reader of a published epidemiology study) can use these techniques to con-
duct quantitative bias analysis. In addition, investigators with access to record-level
data can generate these summary data and so might also use the techniques.
However, these techniques do not necessarily preserve the interrelations between
study variables and usually assume that multiple biases in an analysis are independent
of one another (i.e., the biases do not interact). These assumptions are not usually
testable and may often be incorrect. Investigators with access to record-level data
should therefore use the analyses designed for record-level data in preference to the
analyses designed for summary data.
SAS code available on the web site (see Preface), the computational difficulty varies
widely. As the computational difficulty grows, the researcher should expect to devote
more time and effort to completing the analysis, and more time and presentation
space to explaining and interpreting the method. In general, investigators should
choose the computationally simplest technique that satisfies their inferential goal
given the number of biases to be examined and whether multiple biases can be appro-
priately treated as independent of one another. When only one bias is to be examined,
and only its impact on the estimate of association is central to the inference, then
computationally straightforward simple bias analysis is sufficient. When more than
one bias is to be examined, the biases are not likely independent, and an assessment
of total error is required to satisfy the inferential goal, then the computationally most
difficult and resource-intensive multiple bias modeling will be required.
The following paragraphs summarize each of the analytic techniques and illustrate
the method with a brief example. The detailed chapters that follow show how to imple-
ment each technique and provide guidance for choosing from among the methods used
to accomplish each of the techniques. That choice usually depends on the available
bias parameters (e.g., the sensitivity and specificity of classification vs the positive and
negative predictive values), the source of the bias parameters (i.e., internal or external
validation data), and the data form (i.e., record-level or summary data).
With a simple bias analysis, the estimate of association obtained in the study is
adjusted once, to account for a single bias. The output is a single
revised estimate of association, which does not incorporate random error. For
example, Marshall et al. (2003) investigated the association between little league
injury claims and type of baseball used (safety baseball vs traditional baseball).
They observed that safety baseballs were associated with a reduced risk of ball-
related injury (rate ratio = 0.77; 95% CI 0.64, 0.93). They were concerned that
injuries might be less likely to be reported when safety baseballs were used than
when traditional baseballs were used, which would create a biased estimate of a
protective effect. To conduct a simple bias analysis, they estimated that no more
than 30% of injuries were unreported and that the difference in reporting rates was
no more than 10% (the bias parameters). Their inferential goal was to adjust the
estimate of association to take account of this differential underreporting. With this
single set of bias parameters, the estimate of association would equal a rate ratio of
0.88. They concluded that a protective effect of the safety ball persisted after taking
account of the potential for differential underreporting of injury, at least conditional
on the accuracy of the values assigned to the bias parameters.
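The text does not show the intermediate arithmetic, but one way to reproduce the adjusted rate ratio of 0.88 is to assume that underreporting acts multiplicatively on the observed injury rates. The short Python sketch below makes that assumption explicit; the reporting-completeness values (70% for safety baseballs, 80% for traditional baseballs) are illustrative choices consistent with the stated bias parameters, not values reported by Marshall et al.

    # Sketch: adjust a rate ratio for differential underreporting of the outcome.
    # Assumes underreporting scales the observed rates multiplicatively.
    rr_observed = 0.77
    report_safety = 0.70       # assumed: 30% of safety-baseball injuries unreported
    report_traditional = 0.80  # assumed: 10 percentage points less underreporting

    # Observed rates understate true rates by the reporting completeness, so the
    # true rate ratio equals the observed ratio times the completeness in the
    # reference group divided by the completeness in the index group.
    rr_adjusted = rr_observed * report_traditional / report_safety
    print(round(rr_adjusted, 2))  # 0.88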
Cain et al. (2006, 2007) conducted a simple bias analysis with the inferential
goal of determining whether their estimate of association could be completely
attributed to bias. Their study objective was to estimate the association between
highly active antiretroviral therapy (HAART) and multiple acquired immunodefi-
ciency syndrome (AIDS)-defining illnesses. Averaging over multiple AIDS-
defining illnesses, the hazard of an AIDS-defining illness in the HAART calendar
period was 0.34 (95% CI 0.25, 0.45) relative to the reference calendar period.
The authors were concerned that differential loss-to-follow-up might account for
the observed protective effect. They conducted a worst-case simple bias analysis
by assuming that the 68 men lost-to-follow-up in the HAART calendar period had
an AIDS-defining illness on the date of their last follow-up, and that the 16 men
lost-to-follow-up in the calendar periods before HAART was introduced did not
have an AIDS-defining illness by the end of follow-up. With these bounding
assumptions, the estimated effect of HAART equaled a hazard ratio of 0.52. The
inference is that differential loss-to-follow-up could not account for all of the
observed protective effect of HAART against multiple AIDS-defining illnesses,
presuming that this analysis did in fact reflect the worst case influence of this bias.
Note that in both examples, the estimate of association was adjusted for only one
source of error, that the adjustment was not reflected in an accompanying interval
(only a point estimate was given), and that random error was not simultaneously
incorporated to reflect total error. These are hallmarks of simple bias analysis.
on the accuracy of the ranges assigned as values to the bias parameters. While
multidimensional bias analysis provides more information than simple bias analy-
sis in that it provides a set of corrected estimates, it does not yield a frequency
distribution of adjusted estimates of association. Each adjusted estimate of associa-
tion stands alone, so the analyst or reader gains no sense of the most likely adjusted
estimate of association (i.e., there is no central tendency) and no sense of the width
of the distribution of the adjusted estimate of association (i.e., there is no frequency
distribution of corrected estimates). Multidimensional bias analysis also addresses
only one bias at a time and does not simultaneously incorporate random error, dis-
advantages that it shares with simple bias analysis.
analysis somewhat overestimated the protective effect and the conventional interval
somewhat underestimated the total error, at least conditional on the accuracy of the
distributions assigned to the bias parameters.
Multiple bias modeling is also an extension of simple bias analysis in which the
analyst assigns probability distributions to the bias parameters, rather than single
values or ranges, but now the analyst examines the impact of more than one bias at
a time. For example, we conducted a case-control study of the effect of pregnancy
termination (induced and spontaneous) on breast cancer risk among parous resi-
dents of Massachusetts ages 25-55 years at breast cancer diagnosis (Lash and Fink,
2004). The study included all Massachusetts breast cancer cases reported to the
Massachusetts cancer registry between 1988 and 2000 arising from the population
of women who gave birth in Massachusetts between 1987 and 1999. The condi-
tional adjusted odds ratio estimate of the risk ratio of breast cancer, comparing
women who had any history of pregnancy termination with women who had no
history of pregnancy termination, equaled 0.91 (95% CI 0.79, 1.0). Information on
history of pregnancy termination and potential confounders was recorded on birth
certificates before the breast cancer diagnosis, so errors in recall or reporting of this
history should have been nondifferentially and independently related to breast can-
cer status (Rothman et al., 2008d). It may be that the observed null result derives
from nondifferential, independent misclassification of history of termination,
thereby masking a truly nonnull result. In addition, the study may have been subject
to a selection bias if women who migrated from Massachusetts between the time
they gave birth and the time they developed breast cancer differed from those who
did not migrate with respect to pregnancy terminations. The inferential goal was to
adjust the estimate of association and its interval to account for these biases. We
first implemented a probabilistic bias analysis with the following bias parameters:
(1) a triangular distribution of sensitivity of termination classification ranging from
69% to 94% with a mode of 85%, (2) a triangular distribution of specificity of
termination classification ranging from 95% to 100% with a mode of 99%, and (3)
a prevalence of termination in the source population ranging from 20% to 30%
with a mode of 25% (Holt et al., 1989; Werler et al., 1989; Wilcox and Horney,
1984). To allow for small deviations from perfectly nondifferential misclassifica-
tion, we allowed the sensitivity and specificity of termination classification in
cases, versus controls, to vary independently of one another between 0.9-fold and
1.1-fold (e.g., if the sensitivity in cases was chosen to be 85%, then the sensitivity
in the controls could be no less than 76.5% and no greater than 93.5%). These were
the bias parameters used to address misclassification. The probabilistic bias analy-
sis yielded a median odds ratio estimate of 0.90 (95% simulation interval 0.62, 1.2).
Conditional on the accuracy of the distributions assigned to the bias parameters,
this probabilistic bias analysis (which only accounts for one source of bias) sup-
ports the notion that the result is unlikely to arise from a bias toward the null induced
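A minimal Python sketch of this kind of probabilistic bias analysis appears below. The triangular distributions and the 0.9- to 1.1-fold differential factor match the description above, but the cell counts are hypothetical stand-ins (the study's record-level data are not reproduced here) and random error is not incorporated, so this illustrates the sampling-and-correction loop rather than reproducing the published result.

    # Sketch: probabilistic bias analysis for exposure misclassification
    # in a case-control study, with triangular bias-parameter distributions.
    import numpy as np

    rng = np.random.default_rng(2009)
    a, b = 500, 4500   # hypothetical cases: exposed, unexposed
    c, d = 550, 4450   # hypothetical controls: exposed, unexposed

    or_sim = []
    for _ in range(50_000):
        se_ca = rng.triangular(0.69, 0.85, 0.94)  # sensitivity, cases
        sp_ca = rng.triangular(0.95, 0.99, 1.00)  # specificity, cases
        # let control values vary 0.9- to 1.1-fold around the case values
        se_co = min(1.0, se_ca * rng.uniform(0.9, 1.1))
        sp_co = min(1.0, sp_ca * rng.uniform(0.9, 1.1))
        # back-calculate the expected true counts from the observed counts
        A = (a - (1 - sp_ca) * (a + b)) / (se_ca - (1 - sp_ca))
        C = (c - (1 - sp_co) * (c + d)) / (se_co - (1 - sp_co))
        B, D = (a + b) - A, (c + d) - C
        if min(A, B, C, D) <= 0:  # discard impossible combinations
            continue
        or_sim.append((A / B) / (C / D))

    print(np.percentile(or_sim, [2.5, 50, 97.5]))  # simulation interval and median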
A Note on Inference
In the inference segment of each of the preceding examples, the inference was
always said to be conditional on the accuracy of the values or distributions assigned
to the bias parameters. It is, of course, impossible to know the accuracy of these
assignments. Nonetheless, the analyst should believe that the assignments are more
accurate than the inherent assignments made to these bias parameters in a conven-
tional data analysis (e.g., no unmeasured confounding and perfect classification).
If stakeholders other than the analyst support a different set of values, the bias analysis
can and should be repeated with the alternate set of values to see whether the results
of the bias analysis and the inference change substantially.
As will be described in Chap. 3, the assignment of values and distributions to
bias parameters is equal parts art, educated guess, and science. Were the values
known with certainty, then a bias analysis would not be necessary because alternate
empirical methods would be superior. Circumstances such as this are rare. It is
imperative, therefore, that in any bias analysis the values assigned to the bias
parameters are explicitly given, the basis for the assignment explicitly provided,
and any inference resting on the results of the bias analysis explicitly conditioned
on the accuracy of the assignments.
Conclusion
Bias Parameters
All bias analyses modify a conventional estimate of association to account for bias
introduced by systematic error. These quantitative modifications revise the con-
ventional estimate of association (e.g., a risk difference or a rate ratio) with equa-
tions that adjust it for the estimated impact of the systematic error. These equations
have parameters, called bias parameters, that ultimately determine the direction and
magnitude of the adjustment. For example:
• The proportions of all eligible subjects who participate in a study, simultaneously
stratified into subgroups of persons with and without the disease outcome and
within categories of the exposure variable of interest, are bias parameters. These
parameters determine the direction and magnitude of selection bias.
• The sensitivity and specificity of exposure classification, within subgroups of
persons with and without the disease outcome of interest, are bias parameters that
affect the direction and magnitude of bias introduced by exposure misclassification.
• The strength of association between an unmeasured confounder and the exposure
of interest, and between the unmeasured confounder and the disease outcome of
interest, are bias parameters that affect the direction and magnitude of bias
introduced by an unmeasured confounder.
Complete lists of bias parameters affecting selection bias, unmeasured confounding,
and information bias will be presented in their corresponding simple bias analysis
chapters, along with a definition of each bias parameter. This chapter pertains to the
data sources that can be examined to assign numeric values or probability distributions
to these bias parameters.
the values to assign to bias parameters and satisfies the assumptions of empirical
methods, then these empirical methods of bias analysis may be preferable to the
methods described in this text (see Chap. 1).
Selection Bias
Unmeasured Confounder
Ordinarily, a potential confounder that was not measured would be missing in all
members of the study population, not in only a subgroup of the study population.
However, it is possible that, by design or happenstance, an unmeasured confounder
would be available for a subsample of the study population. For example, in a study
of the relation between an occupational hazard and a lung cancer occurrence, smok-
ing history might be an unmeasured confounder. However, were the occupational
cohort assembled from several different factory locations, it is possible that infor-
mation from some factories would include smoking history and information
from others would not. Thus, smoking would, by happenstance, be unmeasured in
only a portion of the study population. Alternatively, the investigator may not have
the resources (time and money) to survey all members of the occupational cohort
with respect to their smoking history, but might have the resources to survey a
sample of the cohort. Thus, smoking, by design, would be unmeasured in the mem-
bers of the occupational cohort who are not sampled for the survey or who refuse
to participate in the survey. While it may seem counterproductive to gather infor-
mation on only a portion of the study population, such a design strategy can yield
very valuable information to inform an adjustment for the confounder, either by
bias analysis or empirical methods mentioned in Chap. 1.
In both examples, the smoking information from the internal subgroup could be
used to inform a bias analysis of the impact of the unmeasured confounder (smok-
ing history) on the estimate of the effect of the occupation on lung cancer mortality.
The information derived from the subsample would include the strength of associa-
tion between occupation and smoking categories, the association between smoking
history and lung cancer mortality, and the prevalence of smoking. These bias
parameters determine the impact of the unmeasured confounder on the estimate of
association derived from the entire study population. One should realize, however,
that these bias parameters might themselves be measured with error in the subsam-
ple. The subcohorts in which smoking information was available might not be
representative of the entire cohort. It may be, for example, that factories where
smoking history was recorded and retained encouraged healthy habits in their
workforce, whereas factories where smoking history was not recorded did not or
did so to a lesser extent. Even when smoking history was collected by design, it
may be that those who agreed to participate in the survey would differ with respect
to smoking history from those who refused to participate in the survey. At the
least, the subsample is of finite size, so sampling
error must be taken into account for any adjusted estimate of association that also
reports an interval (e.g., a confidence interval).
Information Bias
Often, a variable can be measured by several different methods, each with its
own accuracy. The most accurate method may be the most
expensive, most time-consuming, or most invasive. This method may not be applied
to all members of a study population. In this case, another less expensive, less time-
consuming, or less invasive method would be used to collect the information about
the variable for all the participants. An internal validation study would collect infor-
mation about the variable in a subsample of the study population from both the less
accurate method and the more accurate method. The information about the variable
would then be compared in the subsample, and this comparison would yield values
for the bias parameters (e.g., sensitivity, specificity, or predictive values) used to
correct for classification errors in the complete study population.
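For a dichotomous variable, these bias parameters fall directly out of the validation 2 × 2 table that cross-classifies the routine measure against the more accurate measure. The Python sketch below shows the arithmetic; the counts are hypothetical.

    # Sketch: classification bias parameters from an internal validation substudy.
    # Truth is taken from the more accurate (gold standard) measure.
    tp, fp = 45, 5     # routine positive: truly positive, truly negative (hypothetical)
    fn, tn = 10, 140   # routine negative: truly positive, truly negative (hypothetical)

    sensitivity = tp / (tp + fn)  # 0.82
    specificity = tn / (tn + fp)  # 0.97
    ppv = tp / (tp + fp)          # positive predictive value, 0.90
    npv = tn / (tn + fn)          # negative predictive value, 0.93
    print(sensitivity, specificity, ppv, npv)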
If the more accurate method is thought to be perfect, then it is often called the
gold-standard method for measuring the variable. For example, the gold-standard
the values assigned to the bias parameter to vary, using multidimensional or proba-
bilistic bias analysis.
A second nuance to consider with all validation studies, including internal vali-
dation studies, is that the measurement of the variable may not exactly equate with
the concept that is being measured. In the blood alcohol example, sobriety is the
concept that is being measured. Blood alcohol concentration over some threshold,
which is an objectively measured laboratory value, has been set as a sharp line of
demarcation to dichotomize study subjects into those who are sober and those who
are not. However, given differences in experience with alcohol drinking and other
factors, some participants with blood alcohol concentrations above the threshold
may be sober and those below the threshold may not be sober. If the sobriety con-
cept is task-oriented, the field sobriety test may be a better method of measuring
sobriety than blood alcohol content.
Finally, for many variables, there is no gold-standard method of measurement.
For example, human intelligence has been measured by many tests and methods,
but there is no single test recognized as the gold standard by which intelligence
could be optimally measured. When there is no gold-standard test, comparisons of
measurement methods are reported as agreement (concordant and discordant
classifications, or correlation of continuous measurements, by the two methods).
These measures of agreement can be used to inform values assigned to the bias
parameters, but should not be mistaken for direct estimates of the bias parameters.
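As an illustration, the Python sketch below computes two common agreement summaries, overall percent agreement and Cohen's chance-corrected kappa, from a hypothetical 2 × 2 comparison of two imperfect methods; neither summary should be read as the sensitivity or specificity of either method.

    # Sketch: agreement between two imperfect classification methods (no gold standard).
    both_yes, m1_only, m2_only, both_no = 40, 10, 15, 135  # hypothetical counts
    n = both_yes + m1_only + m2_only + both_no

    p_obs = (both_yes + both_no) / n        # observed agreement
    p1 = (both_yes + m1_only) / n           # method 1 'yes' prevalence
    p2 = (both_yes + m2_only) / n           # method 2 'yes' prevalence
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)   # agreement expected by chance
    kappa = (p_obs - p_exp) / (1 - p_exp)   # Cohen's kappa
    print(round(p_obs, 2), round(kappa, 2)) # 0.88 0.68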
External Data Sources

When no data are collected from a subsample of the study population to inform bias
parameters, the values assigned to bias parameters must be informed by data
collected from a second study population or by an educated guess about these values.
These assignments rely on external data sources: the second study population or the
experience relied upon to reach an educated guess. Some analysts resist these
options, but such resistance is tantamount to assigning a value to the bias parameter
such that no adjustment would be made. For example, refusing to assign the sensitivity
of exposure classification observed in a second study is tantamount to assigning
perfect sensitivity to the exposure classification scheme, so that there is no quantitative
correction to account for exposed study participants who were misclassified as unex-
posed. Refusing to guess at the association between an unmeasured confounder
and the outcome is tantamount to saying there is no confounding by the unmeasured
confounder, so that there is no quantitative correction to account for it. While values
assigned to bias parameters from external data sources might well be imperfect,
they are almost certainly more accurate than the assumption that there is no error,
which is implicit in conventional analysis. Furthermore, the values assigned to bias
parameters from internal data sources may be imperfect as well, as described in the
preceding sections. Therefore, external data sources and educated guesses should
Selection Bias
The bias parameters required to assess selection bias are the proportions of eligible
subjects included in the study within each combination of exposure and disease
status. The subjects included in the study provide an initial internal estimate of the
exposure prevalence and disease occurrence within each combination. Bias arises,
however, only when the exposure prevalence and disease occurrence differ between
those who were not included in the study and those who were included in
the study. To address selection bias, one can begin with the exposure prevalence
and disease occurrence observed in those included in the study, and then adjust
the exposure prevalence and disease occurrence by making educated guesses
about selection forces that would act in concert with the exposure prevalence and
disease occurrence.
For example, in a case-control study, one ordinarily knows the proportion of
cases and controls who agree to participate, so are included in the analysis. The
exposure prevalence of participating cases and controls is known from the data
gathered on participants. The exposure prevalence of cases and controls who did
not participate is unknown and requires an educated guess. One might reason that
exposure increases the probability that a case would participate in the study.
Imagine, for example, that the exposure under study is an occupational hazard.
Cases with a history of the occupation might be inclined to participate for reasons
including an altruistic concern that the hazard should be identified or an interest in
secondary gain (e.g., workers' compensation). The prevalence of the occupational
hazard among participating cases would therefore overestimate the true prevalence
of the occupational hazard among all cases, so a reasonable educated guess of the
prevalence of the occupational hazard in nonparticipating cases would be lower
than its prevalence in participating cases. Controls, on the other hand, would not
have the same secondary motivations as cases, so the exposure prevalence of the
occupational hazard among nonparticipating controls might be about the same as
the exposure prevalence of the occupational hazard among participating controls.
One might further inform these educated guesses at exposure prevalence or
disease occurrence from research studies outside the study setting. For example,
other studies might have investigated the occupational hazard and reported the
exposure prevalence in a similar source population. This exposure prevalence might
be used to inform the overall exposure prevalence expected in controls, and then
one can solve for the exposure prevalence in nonparticipants by setting the second
study's exposure prevalence equal to the average of the exposure prevalence in
participating and nonparticipating controls, weighted by their proportions among
all selected controls. When using exposure prevalence or disease occurrence information
from other studies, one should assure that the second study's population is similar
enough to the study population of interest to allow the extrapolation. For example,
if age and sex are related to disease occurrence, then the second study's distribution
of age and sex ought to be similar to the distribution of age and sex in the study
population of interest before the second study's estimate of disease occurrence is
adopted to inform the selection bias parameters. Multidimensional bias analysis or
probabilistic bias analysis can be implemented to incorporate uncertainty in the
extrapolation or a range of educated guesses and information from more than one
external source.
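The algebra of that weighted average is simple to carry out; the Python sketch below uses entirely hypothetical values.

    # Sketch: solve for exposure prevalence among nonparticipating controls,
    # given an external (second-study) estimate for the source population.
    p_external = 0.20      # hypothetical prevalence from a comparable second study
    p_participants = 0.25  # hypothetical prevalence in participating controls
    w = 0.60               # hypothetical proportion of selected controls participating

    # p_external = w * p_participants + (1 - w) * p_nonparticipants, so:
    p_nonparticipants = (p_external - w * p_participants) / (1 - w)
    print(p_nonparticipants)  # 0.125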
Unmeasured Confounder
Risky behaviors tend to cluster within individuals, so one can reasonably assume
that the prevalence of a risky behavior that is an
unmeasured confounder will be higher in those with an exposure that is also a risky
behavior than in the unexposed (e.g., alcohol consumption is higher, on average,
among cigarette smokers than nonsmokers). In contrast, some behaviors
preclude or reduce the prevalence of another behavior. For example, people gravitate
to lifetime sports that may be mutually exclusive, on average, either because they
are most common during the same seasons (e.g., running or biking) or because they
can only afford one choice (e.g., golfing or downhill skiing). Of course, some runners
also bike, and some skiers also golf, but the prevalence of bikers among runners and
the prevalence of golfers among skiers may be lower than their respective prevalence
among nonrunners and nonskiers, at least in some populations. Educated guesses at
such relations should take best advantage of external data sources and the analyst's
experience, and should also allow for uncertainty in the assigned values by using
multidimensional or probabilistic bias analysis.
Information Bias
A contingency table comparing incident cases with fatal cases over some time period
would provide an estimate of the bias parameters. As will be discussed in Chap. 6,
when the sensitivity of disease classification is poor but does not depend on the exposure
status, and there are few false-positive cases (very high or perfect specificity), risk
ratio measures of association are not expected to be substantially biased by the disease
classification errors. This will also hold for rate ratio and odds ratio measures of
association, so long as these errors do not substantially affect person-time.
In this example, one could reasonably expect to find external information to
estimate the classification bias parameters. In addition, one could make reasonable
educated guesses about which bias parameter was likely to be high (near 100%;
sensitivity for vasectomy classification and specificity for prostate cancer classifi-
cation) and which to be low. These educated guesses are often sufficient to provide
values for the bias parameters that allow an assessment of the direction and approximate
magnitude of the bias and additional uncertainty arising from mismeasurement.
External information on these bias parameters should not be adopted for use in
a study without consideration of its applicability. Consider, for example, the strategy
of using visualization to assign sex categories (male or female) to study participants.
In most adult populations, such a strategy would be cost-efficient and accurate.
A validation study comparing the category assigned by visualization with the category
assigned by karyotyping (a possible gold standard) would presumably show high
sensitivity and specificity. However, were visualization used as the method of
assigning sex category in a newborn ward (presuming no visual clues like pink or
blue blankets), it may not work so well. The sensitivity and specificity of visualization
as a method for assigning sex category, as measured in a validation study in which
the strategy was applied to adults, should obviously not be adopted to provide
values for these bias parameters when visualization is used as a strategy to classify
newborns as male or female. More generally, sensitivity and specificity are sometimes
said to be characteristics of a test, so more readily applied to other populations in
which the test is used than predictive values (as explained in Chap. 6, predictive
values depend on both the test characteristics and the prevalence of the condition in the
study population). While this mantra is correct on its surface, the preceding example
illustrates that sensitivity and specificity can also depend on the population in which
the validation data were collected.
Summary
Internal and external data sources can be used to inform the values assigned to bias
parameters. In addition, educated guesses about the plausible range and distribu-
tional form of these values should be considered. Subsequent chapters will outline
specific methods that make use of the bias parameters and the values assigned to
them to estimate the direction and magnitude of the bias and uncertainty arising
from systematic errors. In all cases, the best analysis makes optimal use of all three
sources of information: internal validation data, external validation data, and the reason
and judgment of the investigator.
Chapter 4
Selection Bias
Introduction
Selection bias occurs when the two variables whose association is under study, usually
an exposure and a disease outcome, both affect participation in the study. Selection
bias can arise when participants enroll in a study, if the variables affect initial partici-
pation rates, and it can also arise when participants withdraw from the study, if the
variables affect follow-up rates. The association between the exposure and outcome
must be measured among participants, so is effectively conditioned on participation.
Conditioning the measurement of the relationship between the exposure and the dis-
ease on an effect (participation) of both variables can induce a noncausal association
between them that mixes with their causal relation, if any (Hernan et al., 2004).
Table 4.1 illustrates how such a noncausal association arises. Among all those
eligible to participate, the proportion with disease among the exposed and the unex-
posed both equals 10% (100/1,000). However, the study must be conducted among
those who actually do participate. Participation is associated with both the exposure
and the disease. The odds ratio associating participation with exposure equals
[(80 + 400)/(500 + 20)] / [(200 + 60)/(700 + 40)] = 2.6, reflecting the fact that a higher
proportion of the exposed (48%) participated than of the unexposed (26%). The odds
ratio associating participation with disease equals [(80 + 60)/(20 + 40)] /
[(400 + 200)/(500 + 700)] = 4.7, reflecting the fact that a higher proportion of the
diseased (70%) participated than of the undiseased (33%). The net result is to induce
an association between exposure and disease among participants [risk ratio =
(80/480)/(60/260) = 0.72], when no association exists among all those eligible.
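The Python sketch below reproduces the Table 4.1 arithmetic from the counts given in the text.

    # Sketch: participation affected by both exposure (e) and disease (d)
    # induces an association among participants where none exists overall.
    part = {"e1d1": 80, "e1d0": 400, "e0d1": 60, "e0d0": 200}      # participants
    nonpart = {"e1d1": 20, "e1d0": 500, "e0d1": 40, "e0d0": 700}   # nonparticipants

    # odds ratio associating participation with exposure
    or_exposure = ((part["e1d1"] + part["e1d0"]) / (nonpart["e1d1"] + nonpart["e1d0"])) / (
        (part["e0d1"] + part["e0d0"]) / (nonpart["e0d1"] + nonpart["e0d0"]))
    # odds ratio associating participation with disease
    or_disease = ((part["e1d1"] + part["e0d1"]) / (nonpart["e1d1"] + nonpart["e0d1"])) / (
        (part["e1d0"] + part["e0d0"]) / (nonpart["e1d0"] + nonpart["e0d0"]))
    # risk ratio among participants (the truth among all eligible is 1.0)
    rr_participants = (part["e1d1"] / (part["e1d1"] + part["e1d0"])) / (
        part["e0d1"] / (part["e0d1"] + part["e0d0"]))
    print(round(or_exposure, 2), round(or_disease, 2), round(rr_participants, 2))
    # 2.63 4.67 0.72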
Fig. 4.1 Causal graph showing selection bias affecting the relation between exposure to a hazardous waste site and occurrence of leukemia (both the local hazardous waste site exposure and leukemia affect initial participation)
Figure 4.1 illustrates the concept of selection bias in a study of the relation
between exposure to a local hazardous waste site and occurrence of leukemia. If the
study were conducted by case-control design, then one could imagine that leukemia
cases might be more likely to participate than controls selected from the population
that gave rise to the cases, since cases may have greater interest in a study of their
disease's etiology than controls. Similarly, persons who live near the hazardous
waste site (both cases and controls) may perceive it to be a threat to their health, so
may be more motivated to participate in a study than persons who do not live near
the hazardous waste site. In this example, both the exposure (residence near a local
hazardous waste site) and the outcome (leukemia diagnosis) affect the probability
of initial participation in the study. The association between residence near the
hazardous waste site and leukemia occurrence can only be measured among the
participants, so is conditioned on initial consent to participate. The proportion of
exposed cases among participants would be expected to exceed the proportion of exposed
cases among all those eligible to participate. Conversely, the proportion of unexposed
controls among participants would be expected to be less than the proportion of
unexposed controls among all eligible participants. That is, self-selection of participants
into the study is likely to induce an association between residence near the hazardous
waste site and occurrence of leukemia, even if there is no association between
them among all those eligible to participate, as displayed in Table 4.1. If the
selection proportions happen to cancel one anothers effect, then there may be no
bias, as will be explained further below. When the association between variables
measured among participants is different from the association that would have been
measured among all those eligible to participate, selection bias has occurred.

Definitions and Terms
Below are definitions and terms that will be used to explain selection bias and
bias analysis to account for selection bias. The first section addresses concepts
relevant to selection bias and the second section explains the motivation for
bias analysis.
Conceptual
As noted above, selection bias arises when the exposure and outcome under study
affect participation, meaning either initial participation or continued participation.
The design of an epidemiologic study requires the investigator to specify criteria
for membership in the study population. These criteria list the inclusion criteria and
exclusion criteria. To become a member of the study population, all of the inclusion
criteria must be met and none of the exclusion criteria may be met. Inclusion criteria
specify the characteristics of the study population with respect to personal
characteristics (e.g., sex, age range, geographic location, and calendar period) and
exposure or behavioral characteristics (e.g., tobacco use, alcohol use, exercise regimen,
diet, and occupation). Exclusion criteria specify the characteristics that restrict a
subset of the persons who met the inclusion criteria from becoming members of the
study population. For example, exclusion criteria may limit the population with
respect to history of the disease under study (outcomes are often limited to first
occurrence), language (e.g., non-English speakers and those with poor hearing may not
be able to complete an interview), or an administrative requirement (e.g., those with
a telephone or driver license, if these sources will be used to access participants).
The subset of the source population that satisfies all of the inclusion criteria and
none of the exclusion criteria constitutes the target population for enrollment into
the study. If any of these persons do not participate in the study, the potential for
selection bias arising at the stage of initial enrollment must be considered. Note that
there may be persons who satisfy the inclusion criteria and none of the exclusion
criteria but who are not identified. Their absence from the study does not induce a
selection bias, although it may affect the study's generalizability to the target popu-
lation. The ability to identify participants is therefore an inherent component of the
inclusion criteria.
Most epidemiologic studies follow the participants who are enrolled for some
period of time. If population members become lost to follow-up, a second opportunity
for selection bias arises. Differential loss-to-follow-up occurs when the exposure
and disease affect the probability of loss-to-follow-up. Because the longitudinal
exposure-disease association can only be measured among those who are followed,
the differential loss-to-follow-up causes a difference between the true and the
measured association. Bias from differential continued participation is sometimes
called attrition bias.
Figure 4.2 depicts the general concept of selection bias. An investigator is interested
in the causal relation between exposure and disease. If all of the study population
participates initially and throughout follow-up, then the exposure-disease association
will not be conditioned on participation (there is no stratum of nonparticipants).
This point illustrates the connection between selection bias and missing data methods.
All biases can be conceptualized as a missing data problem (Little and Rubin, 2002;
Rubin, 1991; Cole et al., 2006). For selection bias, the conceptualization is quite
simple: persons who refuse to participate or are lost to follow-up do not contribute
data to the analysis. Their data are missing, and if the missingness is predicted by
exposure and disease status, then analyses conducted among those who do contribute
data will yield associations different from the association that would be observed,
were the data from all subjects available.
Fig. 4.2 Causal graph depicting the general concept of selection bias (exposure and disease each may affect initial or ongoing participation)
If only one of the exposure and the disease affects participation, then selection bias
cannot arise, because the analysis is not conditioned on a common consequence.
That is, having only the exposure or only the disease predicting participation in a
study or loss-to-follow-up is not sufficient to create selection bias. It is possible,
however, for the association between exposure and disease to be indirect (i.e., not
depicted by a direct path from exposure or disease to participation or continued
participation). This possibility will be discussed later under the topic of M-bias.
This latter protection against selection bias is the basis for a distinction between
case-control and cohort studies. Case-control studies select cases (usually all cases)
from the source population and select controls as a sample of the source population.
If a different proportion of cases participate than controls, then the opportunity
arises for selection bias. Cohort studies enroll the study population and follow it
forward in time to observe incident cases. Because the study population excludes
prevalent cases, there can be no association between disease and initial participation.
It seems, therefore, that cohort studies are immune to selection bias arising from the
initial participation.
This apparent distinction between susceptibility of case-control and cohort
designs to selection bias provides one basis for a common misperception that
case-control studies are inherently less valid than cohort studies. The distinction
is not, however, as clear as the simple comparison in the preceding paragraph
makes it seem. First, in order for selection bias from differences in initial participation
to occur, both disease status and exposure status must affect initial participation.
While case status may affect participation in a case-control study, exposure status
may not, in which case no selection bias would be expected. Second, case-control
studies that rely on registries in which both exposure and disease status are
recorded before conception of the study hypothesis will be immune from selection
bias. In these registry-based studies, the complete study population participates,
and selection of cases and controls is a matter of efficiency. There is no selection
bias because all cases participate and controls are sampled without regard to exposure
status. Similarly, case-control studies can be nested in prospective cohort studies.
In these designs, all cases are included and controls are selected from the cohort
that gave rise to the cases because some information required for the study would
be too expensive to gather on the whole cohort. For example, case-control studies
of gene-environment interaction are frequently nested in prospective cohort studies
because genotyping is too expensive to complete for the entire cohort. Again, the
design is dictated by cost efficiency. Third, cohort studies can be conducted
retrospectively, in which case disease and exposure status may affect participation.
For example, in a retrospective cohort study of the relation between mastic asphalt
work and lung cancer mortality (Hansen, 1991), two of the sources of information
used to identify asphalt workers (i.e., exposed subjects) were a registry of men
enrolled in a benefit society and a list of benefit recipients (Cole et al., 1999). The
latter ought to have been a subset of the former; however, the fact that some ben-
efit recipients were not listed as benefit society members suggested that the mem-
bership roster included only survivors at the time it was disclosed to the study.
That is, members of the benefit society were deleted from its roster when they
died. No similar force acted on the unexposed reference group (comparable Danish
men). Thus, both disease status (mortality) and exposure (asphalt work) were
related to initial participation, giving rise to a selection bias in this cohort study.
Finally, case-control studies and cohort studies are both susceptible to differential
loss-to-follow-up, which is a selection bias.
To summarize, both case-control studies and cohort studies can be designed to
prevent selection bias, and both can be conducted such that selection bias occurs.
The simple dichotomization that holds case-control studies susceptible to selection
bias, and cohort studies immune to it, will inevitably lead to avoidable errors. Each
study must be examined on its own merits to ascertain its susceptibility to
selection bias.
The motivation for bias analysis to address selection bias follows directly from its
conceptualization. One wishes to adjust an estimate of association measured among
participants to account for the bias introduced by conditioning on participation,
when participation is affected by both exposure and disease. Participation may be
initial or ongoing. The adjustment is often difficult since it ideally requires an
assessment of the participation proportion among each of the four combinations of
exposure (when exposure is dichotomous) and disease status. Often the exposure status
and disease status of the nonparticipants will be unknown, since their participation
would be required to ascertain this information. In this case, one might approach the bias analysis
from a different perspective. That is, one might ask whether reasonable estimates
of the participation proportions could account for all of an observed association.
A slight variation of this approach is to ask whether the participation proportions
required to counter an observed association, so that the estimate adjusted for postulated
selection bias equals the null, are reasonable. These approaches will be discussed
in the methods for simple bias analysis below.
Sources of Data
With internal validation data, only the subset of participants who did not participate in the validation
substudy has missing data. The subset that did participate in the validation substudy
has complete data (both the data to estimate the effect and the data to address the
selection bias). With external validation data, no participants in the validation study
provide data to estimate the effect. This nuance affects the analytic technique used
to address the selection bias, as explained further below.
The last source of information about selection bias does not derive from validation
data per se, but rather from the experience of the investigator. That is, the information
used to inform the bias analysis is a series of educated guesses postulated by the
investigator for the purpose of completing the bias analysis. Each educated guess
derives from the investigators experience and familiarity with the problem at hand.
While the notion of making an educated guess to address selection bias may engender
some discomfort, since the bias analysis seems to be entirely fabricated, the alternative
is to ignore (at least quantitatively) the potential for selection bias to affect the
studys results. This alternative is also a fabrication, and often runs contrary to
evidence from the study, such as an observed difference in participation rates
between cases and controls.
Example
The example used to illustrate methods of simple bias analysis to address selection
bias derives from a case-control study of the relation between mobile phone use and
uveal melanoma (Stang et al., 2009). Melanoma of the uveal tract is a rare cancer
of the iris, ciliary body, or choroid. It is the most common primary intraocular
malignancy in adults, with an age-adjusted incidence of approximately 4.3 new
cases/million population (http://www.cancer.gov/cancertopics/pdq/treatment/
intraocularmelanoma/healthprofessional, 2006). Figure 4.3 illustrates the flow of
participant enrollment into the study. Participation frequencies are depicted as
number of cases/number of controls, and participation percentages, relative to the
whole, are depicted as (% cases)/(% controls).

Fig. 4.3 Flow of subject enrollment into the study by Stang et al. (2009). Consented: 459/864 (94%)/(57%); refused, no short questionnaire: 17/379 (3.5%)/(25%); interviewed: 458/840 (94%)/(55%); consented but not interviewed: 1/24 (0.2%)/(1.6%)
Stang et al. identified 486 incident cases of uveal melanoma between September
2002 and September 2004 at a tertiary care facility in Essen, Germany that receives
cases from all of Europe (Schmidt-Pokrzywniak et al., 2004; Stang et al., 2006). Of
these 486 cases, 458 (94%) agreed to participate in the case-control study and
completed the interview. One hundred thirty-six (30%) of the interviewed cases
reported regular mobile phone use and 107 (23%) reported no mobile phone use;
the remainder used mobile phones irregularly. Three control groups were constructed;
this example will use only the population-based controls that were matched to cases
on age, sex, and region of residence. There were 1,527 eligible population-based
controls, of which 840 (55%) agreed to participate and were interviewed. Two
hundred and ninety-seven (35%) of the 840 interviewed controls reported regular
mobile phone use and 165 (20%) reported no mobile phone use. The odds ratio
associating regular mobile phone use, compared with no mobile phone use, with
uveal melanoma incidence equaled 0.71 (95% CI 0.51, 0.97). The substantial
difference in participation rates between cases and controls (94% vs 55%, respectively)
motivates a concern for the impact of selection bias on this estimate of association.
As shown in Fig. 4.3, Stang et al. (2009) asked those who refused to participate
whether they would answer a short questionnaire to estimate the prevalence
of mobile phone use among nonparticipants. Of the 27 nonparticipating cases,
10 completed the short questionnaire, and 3 of the 10 (30%) reported regular
mobile phone use. Of the 663 nonparticipating controls, 284 completed the short
questionnaire and 72 (25%) reported regular mobile phone use. Only two categories
were available on the short questionnaire, so those who did not report regular
mobile phone use were categorized as nonusers.
Introduction to Correction
Table 4.2 Depiction of participation and mobile phone use in a study of the
relation between mobile phone use and the occurrence of uveal melanoma
(Stang et al., 2009)

              Participants          Nonparticipants/short questionnaire   Nonparticipants
              Regular use   No use  Regular use   No use                  Cannot categorize
Cases         136           107     3             7                       17
Controls      297           165     72            212                     379
Among the participants, the crude odds ratio equals:

OR_c,p = (136/297) / (107/165) = 0.71    (4.1)

which approximately equals the matched odds ratio reported in the study. The
matching will therefore be ignored for the purpose of illustrating the selection bias
correction.
Among nonparticipants who answered the short questionnaire, the crude odds
ratio equals:
OR_c,np = (3/72) / (7/210) = 1.25    (4.2)
which is in the opposite direction from the crude odds ratio observed among par-
ticipants. This difference illustrates the potential impact of selection bias.
To correct for the selection bias, one could collapse the participant and nonparticipant
data, but that would ignore the nonparticipants who did not complete the short
questionnaire. A simple solution would be to assume this second group of nonpar-
ticipants (i.e., those who also did not participate in the short questionnaire) had the
same exposure prevalence as those who did agree to participate in the short
questionnaire. To accomplish this solution, divide those in the second group of
nonparticipants into exposure groups in proportion to the exposure prevalence
observed among nonparticipating cases and controls who did complete the short
questionnaire. For example, multiply the 17 nonparticipant cases by 3/10 to obtain
the number expected to be regular users. The results are added to the number of
observed exposed cases (136) and the number of exposed cases who answered the
short questionnaire (3) to obtain an estimate of the number of total exposed cases
among those eligible for the study. Similar algebra is applied for the unexposed
cases and the exposed and unexposed controls, as shown in Eq. (4.3):
OR_adj = {[136 + 3 + 17(3/10)] / [297 + 72 + 379(72/284)]} / {[107 + 7 + 17(7/10)] / [165 + 212 + 379(212/284)]} = 1.62    (4.3)
As noted above, this solution assumes that the nonparticipants who did not answer
the short questionnaire have the same prevalence of mobile phone use, within strata
of cases and controls, as the nonparticipants who did answer the short questionnaire.
While this assumption cannot be tested empirically with the available data, one
could conduct alternative bias analysis methods that explore the impact of viola-
tions of the assumption using multidimensional bias analysis (Chap. 7) or probabi-
listic bias analysis (Chap. 8).
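The redistribution in Eq. (4.3) is straightforward to automate. The Python sketch below reproduces the calculation from the counts in Table 4.2.

    # Sketch: redistribute uncategorized nonparticipants in proportion to the
    # short-questionnaire exposure prevalence, then recompute the odds ratio.
    cases_part, cases_sq, cases_unknown = (136, 107), (3, 7), 17
    controls_part, controls_sq, controls_unknown = (297, 165), (72, 212), 379

    def corrected(part, sq, unknown):
        n_sq = sum(sq)
        # observed + short questionnaire + proportional share of the unknowns
        return tuple(p + s + unknown * s / n_sq for p, s in zip(part, sq))

    A, B = corrected(cases_part, cases_sq, cases_unknown)  # exposed, unexposed cases
    C, D = corrected(controls_part, controls_sq, controls_unknown)
    print(round((A / B) / (C / D), 2))  # 1.62, matching Eq. (4.3)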
This example is unusual in that data on the exposure prevalence among nonparticipants
are available. Ordinarily, only the number of nonparticipating cases and controls would
be available, and perhaps the participation rate among eligible cases and controls, but
no information about the exposure prevalence among nonparticipants would be
known. In this circumstance, one must postulate selection proportions, guided by the
participation rates in cases and controls, as shown in Table 4.3 and Eq. (4.4).
With these bias parameters, one can adjust the observed odds ratio by multiplying
it by the selection bias odds ratio to account for differential initial participation
using Eq. (4.4):

OR_adj = OR_observed × (S_case,0 × S_control,1) / (S_case,1 × S_control,0)    (4.4)
Continuing with the example, one can calculate the selection proportions, again
assuming that the exposure prevalence among the nonparticipants who answered
the questionnaire equals the exposure prevalence among the nonparticipants who
did not answer the short questionnaire. With this assumption, the selection proba-
bilities are as shown in Table 4.4. When these are combined per Eq. (4.4), they
give a selection bias odds ratio of 2.28 [the fraction in Eq. (4.4)], and the
adjusted odds ratio (1.61) is as shown in Eq. (4.5).
Table 4.4 Selection proportions in a study of the relation between mobile phone use and the
occurrence of uveal melanoma (Stang et al., 2009)

            Regular users                                        Nonusers
Cases       S_case,1 = 136/(136 + 3 + 3 × 17/10) = 0.94          S_case,0 = 107/(107 + 7 + 7 × 17/10) = 0.85
Noncases    S_control,1 = 297/(297 + 72 + 72 × 379/284) = 0.64   S_control,0 = 165/(165 + 210 + 210 × 379/284) = 0.25
OR_adj = 0.71 × (0.85 × 0.64) / (0.94 × 0.25) = 1.61    (4.5)
Note that the adjusted odds ratio calculated by this method equals the value calcu-
lated earlier by the first method (with rounding error). Figure 4.4 shows a screen-
shot of the solution to this selection bias problem using the Excel spreadsheet available
on the text's web site (see Preface). The bias parameters are entered in the top left
table and the crude data are entered in the center left table. The center middle and
center right tables show the missing data and the combination of the missing data
and the observed data, respectively. The results, displayed in the tables at the
bottom, show the association between mobile phone use and uveal melanoma
corrected for the selection bias.

Fig. 4.4 Screenshot of the solution to the selection bias problem using the selection proportion approach
A reasonable second step in this selection bias example would be to calculate the
limits of selection bias by presuming that all of the nonparticipants who did not answer
the short questionnaire were either regular mobile phone users or nonusers. This is an
initial introduction to multidimensional bias analysis, which is the subject of Chap. 7.
Table 4.5 shows that the selection bias odds ratio ranged from 0.75 to 3.96 and
the resulting adjusted odds ratios ranged from 0.53 to 2.80.
Table 4.5 Selection bias odds ratio and adjusted odds ratio in a study of the relation between
mobile phone use and the occurrence of uveal melanoma (Stang et al., 2009), assuming that non-
participating cases and controls who did not answer the short questionnaire were all either regular
mobile phone users or nonusers

Case/control assumption   Selection proportions (E = 1 cases,       Selection bias   Adjusted
                          E = 1 controls, E = 0 cases,              odds ratio       odds ratio
                          E = 0 controls)
Regular/regular           0.87, 0.40, 0.94, 0.44                    0.97             0.69
Regular/non               0.87, 0.80, 0.94, 0.22                    3.96             2.80
Non/regular               0.98, 0.40, 0.82, 0.44                    0.75             0.53
Non/non                   0.98, 0.80, 0.82, 0.22                    3.07             2.17
Simple Correction for Differential Loss-to-Follow-up

Example
The example used to illustrate methods of simple bias analysis to address differen-
tial loss-to-follow-up derives from a cohort study of the relation between receipt of
guideline breast cancer therapy and breast cancer mortality (Lash et al., 2000; Lash
and Silliman, 2000). Although effective primary therapy for early stage breast cancer
has been well characterized and enjoys a broad consensus, this standard has not
fully penetrated medical practice. The example study examined the effect of less
than definitive guideline therapy on breast cancer mortality over 12 years of follow-up
after diagnosis of local or regional breast cancer.
The study population comprised 449 women diagnosed with local or regional
breast cancer at eight Rhode Island hospitals between July 1984 and February 1986
(Silliman et al., 1989). Patients' identifying variables were expunged after the
enrollment project was completed to comply with the human subjects oversight
committee. Subjects were reidentified for the follow-up study by matching unique
patient characteristics to the Cancer Registry of the Hospital Association of Rhode
Island. The Hospital Association of Rhode Island reidentified 390 of the original
449 patients (87%), and the remaining 59 patients (13%) were lost to follow-up.
The probability of reidentification depended most strongly on the hospital of
diagnosis, because two affiliated hospitals participated in the Hospital Association
of Rhode Island cancer registry for only part of the enrollment period.
The vital status of the 390 reidentified patients was ascertained by matching
their identifying variables to the National Death Index. The outcome (breast cancer
mortality) was assigned to subjects with a death certificate that listed breast cancer
as the underlying cause or a contributing cause of death. The date of last follow-up
was assigned as the date of death recorded on the death certificate for decedents.
For subjects with no National Death Index match, 31 December 1996 was assigned
as the date of last follow-up.
The 59 patients lost to follow-up were not ascertained by the Hospital
Association of Rhode Island, so their identifying information could not be
matched to the National Death Index. It is not known, therefore, whether they
were dead or alive by the end of follow-up and, if dead, whether they died of
breast cancer. The hospital where these women were treated was known, and
whether or not the women received guideline therapy was known, since this
information was ascertained before the identifying variables were deleted from
the original data set. Given this information, one is able to conduct a simple bias
analysis to assess the selection bias potentially introduced by differential loss-to-
follow-up.
Correction
To correct the rate ratio for selection bias induced by differential loss-to-follow-up,
one first depicts the follow-up in a contingency table as shown in Table 4.6. In this
example, the rate of breast cancer mortality cannot be depicted among women who
were lost to follow-up.
The crude rate difference associating less than guideline therapy with breast cancer
mortality equaled:
IRD_c,p = 40/687 - 65/2560 = 3.3/100 PY    (4.6)
All that is known about those lost to follow-up is whether they initially received
guideline therapy and other baseline characteristics, including their hospitals of
diagnosis. To estimate the impact of their missing information, first estimate the
number of missing person-years by multiplying the average follow-up duration
among those with and without guideline therapy by the number of persons missing
in each category.
PY_<guideline = (687 × 13)/104 = 85.9 PY and PY_guideline = (2560 × 46)/286 = 411.7 PY    (4.7)
Table 4.7 Depiction of breast cancer mortality over 12 years of follow-up among those receiving
guideline therapy and those receiving less than guideline therapy, imputed for those lost to follow-up
on the basis of breast cancer mortality rates observed for women diagnosed at two hospitals where
tumor registries operated during only part of the study (observed data from Silliman et al., 1989)

                         Completed follow-up at two hospitals   Imputed loss-to-follow-up information
                         <Guideline      Guideline              <Guideline      Guideline
Breast cancer deaths     3               5                      4.2             18.7
Person-years             60.8            110.2                  85.9            411.7
Crude rate               4.9/100 PY      4.5/100 PY             4.9/100 PY      4.5/100 PY
Crude rate difference    0.4/100 PY                             0.4/100 PY
Crude rate ratio         1.1                                    1.1
Adding the imputed deaths and person-years to those observed yields the imputed rates:

IR_<guideline = (40 + 4.2)/(687 + 85.9) = 5.7/100 PY and
IR_guideline = (65 + 18.7)/(2560 + 411.7) = 2.7/100 PY    (4.8)
Using these imputed rates, the incidence rate difference equals 3.0/100 PY and
the incidence rate ratio equals 2.1. These associations are nearer the null than the
associations observed among those with complete data, but not so near the null as
the associations observed in just the two hospitals where most women lost to
follow-up had been diagnosed. The bias analysis provides some assurance that the
observed association between receipt of guideline therapy and breast cancer mor-
tality was not entirely attributable to differential loss-to-follow-up, conditional on
the accuracy of the values assigned to the bias parameters. To further buttress that
inference, one could subject the analysis to the most extreme case, in which all of
those lost to follow-up in the guideline therapy group died of breast cancer and
none of those lost to follow-up in the less than guideline therapy group died of
breast cancer, as shown in Eq. (4.9).
$$IR_{<guideline} = \frac{40 + 0}{687 + 85.9} = 5.2/100\ \mathrm{PY} \quad\text{and}\quad IR_{guideline} = \frac{65 + 46}{2560 + 411.7} = 3.7/100\ \mathrm{PY} \qquad (4.9)$$
Even with this extreme assumption, the rate difference (1.4/100 PY) remains above
the null. This bounding simple bias analysis shows that the entire association between
receipt of guideline therapy and breast cancer mortality could not be attributable to
differential loss-to-follow-up.
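For readers who prefer to script the calculation, the imputation and extreme-case bound above take only a few lines. The following is a minimal Python sketch, with variable and function names of our own choosing; small differences from the text's figures reflect intermediate rounding there.

```python
# A minimal sketch of the loss-to-follow-up bias analysis above; names are ours.

def rate(deaths, py):
    """Incidence rate per 100 person-years."""
    return 100 * deaths / py

# Observed (complete follow-up) data: deaths and person-years.
deaths_lt, py_lt = 40, 687       # less than guideline therapy
deaths_gl, py_gl = 65, 2560      # guideline therapy

# Person-years imputed for the 13 and 46 women lost to follow-up (Eq. 4.7).
py_lt_lost = py_lt / 104 * 13    # about 85.9 PY
py_gl_lost = py_gl / 286 * 46    # about 411.7 PY

# Deaths imputed from the two-hospital rates in Table 4.7 (4.9 and 4.5/100 PY).
deaths_lt_lost = 3 / 60.8 * py_lt_lost     # about 4.2
deaths_gl_lost = 5 / 110.2 * py_gl_lost    # about 18.7

ir_lt = rate(deaths_lt + deaths_lt_lost, py_lt + py_lt_lost)   # about 5.7
ir_gl = rate(deaths_gl + deaths_gl_lost, py_gl + py_gl_lost)   # about 2.8
print(f"imputed IRD = {ir_lt - ir_gl:.1f}/100 PY, IRR = {ir_lt / ir_gl:.1f}")

# Extreme-case bound (Eq. 4.9): all 46 lost guideline patients die of breast
# cancer; none of the 13 lost less-than-guideline patients do.
ir_lt_x = rate(deaths_lt + 0, py_lt + py_lt_lost)              # about 5.2
ir_gl_x = rate(deaths_gl + 46, py_gl + py_gl_lost)             # about 3.7
print(f"extreme-case IRD = {ir_lt_x - ir_gl_x:.1f}/100 PY")
```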
Chapter 5
Unmeasured and Unknown Confounders
Introduction
Confounding occurs when the effect of the exposure of interest mixes with the
effects of other variables that are causes of the exposure or that share common
causal ancestors with the exposure (Kleinbaum et al., 1982). Understanding and
adjusting for confounding in epidemiologic research is central to addressing
whether an observed association is indeed causal. It is imperative to control
confounding because it can make an association appear greater or smaller than it
truly is, and can even reverse the apparent direction of an association. Confounding
can also make a null effect (i.e., no causal relation between the exposure and
the disease) appear either causal or preventive. For a covariate variable to induce
confounding, there must be a relation between both the exposure and the covariate in
the source population and between the covariate and the disease (among those
unexposed). In addition, the covariate must not be affected by the exposure.
In nonrandomized epidemiologic studies, confounding can be controlled by design
or in the analysis, but only for known and measured confounders.
As an example, a nonrandomized study of the effect of male circumcision on
male acquisition of human immunodeficiency virus (HIV) could be confounded by
religion, a marker for sexual behavior. This potential confounding has been
depicted in the causal graph in Fig. 5.1 that encodes the presumed relations
between the variables of interest. In the target population, being Muslim is a strong
predictor of being circumcised compared with being a member of another religion.
This tendency creates a relation between the covariate (religion) and the exposure
(circumcision). In addition, Muslim men are often at decreased risk of acquisition
of HIV, even among those who are uncircumcised, which establishes the relation
between the covariate and the disease among the unexposed or lowest exposure group.
To obtain a valid estimate of the association between male circumcision and male
acquisition of HIV, religion must be controlled in the design (e.g., by restricting the
study to either Muslims or non-Muslims) or in the analysis (e.g., by adjusting for
the effect of being Muslim on HIV acquisition in a stratified analysis or in a multi-
variate regression model).
Fig. 5.1 Causal graph showing confounding by religion in the relation between male circumcision and male acquisition of HIV
Conceptual
There are two types of confounding problems for which bias analysis is particularly
useful. First, an investigator may be faced with a situation in which an important
confounder was not measured in the course of data collection and therefore she cannot
control for it in the analysis. In this case, the investigator will likely have, or be able
to acquire, some knowledge about the confounder and its distribution in the study
population. Ordinarily, this knowledge is the same information that gives rise to the
notion that an unmeasured confounder has been left uncontrolled. This situation might
occur if a study was based primarily on record reviews during which data on the
confounder of interest were never recorded.
Sources of Data
As with any bias analysis, estimates of the bias parameters can come either from
the subjects who were in the study that the investigator is analyzing (typically data
collected in a subset of the study population) or from studies conducted in similar
populations (typically acquired from searching the literature). The problem of
unmeasured confounding is no different. If the investigator is aware of a potential
confounder for which data cannot be collected because the confounder is too
expensive to measure in the entire study sample, then a substudy in which the confounder is measured in
a sample of the population can be used to inform the bias analysis.
When no internal data have been collected, estimates from the literature have to
be used. This method for bias analysis has been referred to as indirect adjustment
because the data for the bias analysis are extracted from the literature and do not
come from an internal substudy. In these cases, the investigator should strive to find
populations as similar to the one under investigation as possible.
The problem is more difficult when dealing with an unknown confounder
because by definition the investigator will not be able to acquire any data on the
bias parameters used for the bias analysis. In this case, the confounder is hypothetical,
and neither a substudy nor a search of the literature will provide insight into the
distribution of the confounder or its effect on the outcome. The bias analysis will
instead explore the impact of adjustment for unknown confounders with different
size associations and distributions, which will be informed by educated guesses.
Approach
In the following discussion, unless otherwise stated, the simple bias analysis for
confounding will consider the case in which an observed association between a
dichotomous exposure (E) and a dichotomous outcome (D) will be adjusted for a
dichotomous confounder (C). The independent variables (E and C) are coded such
that 0 means subjects are in the reference or unexposed category of the variable,
while 1 denotes that subjects are in the index or exposed category of the variable.
The outcome (D) is coded such that + means subjects developed the disease and
− means subjects did not develop the disease. To begin a bias analysis for an
unmeasured or unknown confounder, the investigator must first specify the bias
parameters necessary for the analysis. Given estimates of the bias parameters, one
can correct the estimate of association that was observed for the potential con-
founding by calculating what would have been observed, had we collected data on
the confounder and presuming the values assigned to the bias parameters are accu-
rate. The methods explained below have a long history (Bross, 1967; Gail et al.,
1988; Greenland, 1998; Yanagawa, 1984; Schlesselman, 1978; Bross, 1966),
although to date, use of them has been limited.
To begin, one must assign values to the bias parameters (informed by a substudy,
external literature, or other data sources) and then relate the observed data to the
bias parameters by applying mathematical equations. These equations can be
solved to postulate what results would have been observed, had data on the con-
founder been collected and the confounder controlled in the analysis.
To make the discussion more concrete, the example of the association between
male circumcision and the risk of acquiring HIV, which might be confounded by
religion, will be continued below. Before randomized trials were conducted (Bailey
et al., 2007; Gray et al., 2007; Auvert et al., 2005), all data exploring this relation
came from nonrandomized studies, each of which adjusted for a different set of
confounders [see Weiss et al. (2000) and Siegfried et al. (2005) for reviews]. Many
of these studies reported a protective association between circumcision and risk of
acquiring HIV, while some reported a null association, and in a limited number
of studies harmful associations were reported.
A Cochrane Library Systematic Review published in 2003 concluded, "It seems
unlikely that potential confounding factors were completely accounted for in any of
the included studies" (Siegfried et al., 2003). While important confounders were
likely unmeasured in some of the studies, this conclusion gives no indication of
which studies were more likely to have been confounded, or of the degree to which
the observed associations were confounded. A simple bias analysis can be used to
adjust the observed data for the unmeasured confounding and can be used when a
study is already complete or when only summary data are available.
The example will use data from a cross-sectional study conducted among men
with genital ulcer disease in Kenya. Crude data from the study showed that men
who were circumcised had about one-third the risk of HIV infection of men who were
uncircumcised (RR 0.35; 95% CI 0.28, 0.44) (Tyndall et al., 1996). Adjustment for
measured potential confounders had little impact on the results, so the example will
use the crude data. However, the authors made no adjustment for religion, a poten-
tially important confounding factor.
A study conducted in Rakai, Uganda noted that the protective effects of circum-
cision on HIV could be overestimated if there was no adjustment for religion (Gray
et al., 2000). Being Muslim is often associated with a lower risk of HIV acquisition,
and being Muslim strongly predicts male circumcision, so if male circumcision
were truly protective (as appears to be the case based on the randomized trials
specifically designed to address this question (Bailey et al., 2007; Gray et al., 2007;
Auvert et al., 2005)), then studies that did not collect data on religion and adjust for
it could overestimate the protective effect of male circumcision on HIV acquisition.
The purpose of the bias analysis is to answer the question: What estimate of asso-
ciation between male circumcision and HIV would have been observed in this study,
had the authors collected data on religion and adjusted for it in the analysis?
Bias Parameters
The first required bias parameter is the association between the confounder (being
Muslim) and the outcome (incident HIV). In the Rakai cohort (Gray et al., 2000),
Muslim men in Uganda had a 0.63-fold (95% CI 0.23, 1.76) reduced rate of incident
HIV compared with non-Muslim men. Because the authors did not report the risk
ratio (which is needed for this bias analysis), we used the reported data from this
Ugandan study to generate an estimate of the risk ratio.
The second required bias parameter is the distribution of the confounder within
strata of the exposure. This distribution is typically expressed as the proportion of
subjects with the confounder (C = 1) among the exposed (E = 1) and unexposed (E = 0)
populations. In the Kenyan study, the distribution of religion within levels of
circumcision is unknown. To conduct a bias analysis, one might assume that 80% of
circumcised and 5% of uncircumcised men were Muslim, roughly based on the
prevalences reported in the Ugandan cohort (Gray et al., 2000).
Assignment of these estimates to the bias parameters now permits a bias analysis
to assess whether adjusting for religion might explain some or all of the observed
protective association between circumcision and acquisition of HIV.
Ratio Measures
The framework for exploring the effect of the unmeasured confounder was derived
by Schlesselman (1978), building on the work of Bross (1966). They developed a
simple method to relate an observed risk ratio to the risk ratio adjusted for an
unmeasured confounder. Given assumptions about the distribution of the confounder
in the population and the effect of the confounder on the outcome in the absence of
the exposure, the adjusted risk ratio can be expressed as:
$$RR_{adj} = RR_{obs}\,\frac{RR_{CD}\,p_0 + (1 - p_0)}{RR_{CD}\,p_1 + (1 - p_1)} \qquad (5.1)$$
where RRadj is the risk ratio associating the exposure with the disease adjusted for
the confounder, RRobs is the observed risk ratio associating the exposure with the
disease without adjustment for the confounder, RRCD is the risk ratio associating the
confounder with the disease (assuming no effect measure modification by the exposure),
and p1 and p0 are the proportions of subjects with the confounder in the exposed and
unexposed groups respectively. [See Schneeweiss et al. (2005) for alternative ways
of expressing this equation.]
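Equation (5.1) is easy to implement directly. The sketch below is a minimal Python rendering (the function name and the hypothetical input values are ours, not from the text's software); it shows how a common, strongly imbalanced confounder can account for much of a crude association.

```python
# A minimal sketch of Eq. (5.1); names and example values are ours.

def rr_adjusted(rr_obs, rr_cd, p1, p0):
    """Risk ratio adjusted for an unmeasured binary confounder.

    rr_obs: observed (crude) exposure-disease risk ratio
    rr_cd:  confounder-disease risk ratio (no effect measure modification)
    p1, p0: prevalence of the confounder among the exposed and unexposed
    """
    return rr_obs * (rr_cd * p0 + (1 - p0)) / (rr_cd * p1 + (1 - p1))

# Hypothetical values: a crude RR of 2.0 with a confounder that triples risk
# and is much more common among the exposed than the unexposed.
print(rr_adjusted(2.0, 3.0, 0.5, 0.2))  # 1.4
```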
By making estimates of these three bias parameters (RRCD, p1, and p0), one can
calculate the association between the exposure and the disease expected after
adjustment for the confounder. One can also calculate the stratified analysis
expected had data on the confounder been collected, conditional on the assump-
tions about the bias parameters being correct. Calculating the stratified data, given
the bias parameters, is useful for understanding the distribution of the data within
the levels of the unmeasured confounder.
If the joint distribution of the exposure (E), disease (D), and confounder (C) were
known, the data could be stratified as in Table 5.1. Small letters denote actual observed
data and capital letters denote expected data, had the confounder been measured. The
observed data collected in the study are located in the far left of the table. To adjust for
the confounder, one needs to complete the middle and far right stratified tables.
With assumptions about the bias parameters, the stratified data (cells in the middle
and right tables) can be calculated from the collapsed data (left table) using a few
simple procedures. First, complete the margins of the stratified tables using the
observed data margins (m and n), the prevalence of the confounder among the
exposed (p1), and the prevalence of the confounder among the unexposed (p0).
$$M_1 = p_1\,m, \quad M_0 = m - M_1, \quad N_1 = p_0\,n, \quad\text{and}\quad N_0 = n - N_1$$
Assuming no effect measure modification on the ratio scale, the same logic can
be applied to the A and M cells.
$$RR_{CD} = \frac{A_1/M_1}{(a - A_1)/(m - M_1)} \qquad (5.7)$$

which can be rearranged to:

$$A_1 = \frac{RR_{CD}\,M_1\,a}{RR_{CD}\,M_1 + m - M_1} \qquad (5.8)$$
We can then solve for A0 as it is equal to (a − A1). For each column in the stratified
tables, the number of noncases (e.g., C1) equals the total number (e.g., M1) less the
number of cases (e.g., A1). After solving for each of the cells in the stratified table,
use the completed table to adjust the estimate of effect for the confounder using a
standardized morbidity ratio (SMR) or a Mantel-Haenszel (MH) estimate of the
risk ratio.
If, instead of risk data summarized by the risk ratio, one wanted to correct case-
control data summarized by the odds ratio, one could substitute c and C for m and
M, respectively, as well as d and D for n and N, respectively. Then solve for C1 and
D1 as:

$$C_1 = p_1\,c \qquad (5.9)$$

and

$$D_1 = p_0\,d \qquad (5.10)$$
After solving for the C and D cells, one could solve for the A and B cells using an
estimate of the odds ratio between the confounder and the disease as below.
$$A_1 = \frac{OR_{CD}\,C_1\,a}{OR_{CD}\,C_1 + c - C_1} \qquad (5.11)$$

and

$$B_1 = \frac{OR_{CD}\,D_1\,b}{OR_{CD}\,D_1 + d - D_1} \qquad (5.12)$$
Using the completed table, one can now summarize the adjusted odds ratio using
an SMR or MH estimator, just as for the risk ratio.
The approach explained here assumes that the effect of the unknown or unmeas-
ured confounder on the outcome is constant across strata of the exposure (i.e., no
effect measure modification on the ratio scale). If effect measure modification is
suspected, then multiple estimates of the effect of the confounder on the outcome
within strata of the exposure need to be specified in addition to the prevalence of
the confounder within levels of the exposure. The bias analysis method to address
an unmeasured confounder in the presence of effect measure modification is
described later in this chapter.
Example
The method described above and the information on religion and its association with
HIV are now used to adjust the observed effect of male circumcision on HIV acquisition.
The values assigned to the bias parameters are summarized in Table 5.2.
Table 5.2 Bias parameters for a simple bias analysis of the association between male circumcision (E) and HIV (D) stratified by an unmeasured confounder, being Muslim (C)

Bias parameter   Description                                      Value assigned to the bias parameter
RRCD             Association between being Muslim and HIV         0.63
p1               Prevalence of Muslims among the circumcised      0.80
p0               Prevalence of Muslims among the uncircumcised    0.05
The left side of Table 5.3 shows the observed data associating male circumcision
(E) with incident HIV (D).
Table 5.3 Data on the association between male circumcision (E) and incident HIV (D) stratified by an unmeasured confounder, being Muslim (C), with crude data from Tyndall et al. (1996)

         Total           C1              C0
         E1      E0      E1      E0      E1      E0
D+       105     85      A1      B1      A0      B0
D−       527     93      C1      D1      C0      D0
Total    632     178     M1      N1      M0      N0
The crude data and the assumptions about the bias parameters, in conjunction
with the equations derived earlier, provide a solution to allow stratification by religion.
First solve for the M1 and N1 cells using the prevalence of the confounder among
the exposed and the unexposed:
$$M_1 = 632 \times 0.80 = 505.6 \quad\text{and}\quad N_1 = 178 \times 0.05 = 8.9 \qquad (5.13)$$
and then for M0 and N0:

$$M_0 = 632 - 505.6 = 126.4 \quad\text{and}\quad N_0 = 178 - 8.9 = 169.1 \qquad (5.14)$$
The relation between being Muslim and HIV infection (0.63) can be used to
solve for the interior cells of the stratified table:
$$A_1 = \frac{RR_{CD}\,M_1\,a}{RR_{CD}\,M_1 + m - M_1} = \frac{0.63 \times 505.6 \times 105}{0.63 \times 505.6 + 632 - 505.6} = 75.2 \qquad (5.15)$$
and

$$B_1 = \frac{RR_{CD}\,N_1\,b}{RR_{CD}\,N_1 + n - N_1} = \frac{0.63 \times 8.9 \times 85}{0.63 \times 8.9 + 178 - 8.9} = 2.7 \qquad (5.16)$$
The crude risk ratio in this study was 0.35. The calculated stratified data allow
adjustment of the association between circumcision and incident HIV for con-
founding by religion. After adjusting for religion, the SMR would be:
$$SMR = \frac{105}{505.6 \times (2.7/8.9) + 126.4 \times (82.3/169.1)} = 0.49 \qquad (5.17)$$
If the assumptions about the values assigned to the bias parameters are accurate,
then had the authors collected data on religion, they would have observed a protective
effect of male circumcision on acquisition of HIV, similar to the effect observed in
a randomized trial (risk ratio 0.40; 95% CI 0.24, 0.68) (Auvert et al., 2005). Thus,
unmeasured confounding by religion may explain some of the overestimate in the
protective effect seen in the nonrandomized study.
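The table-completion procedure can also be scripted. The following minimal Python sketch (function and variable names are ours; the text's own tools are the spreadsheets described below) reproduces the stratified interior cells and the SMR of 0.49 from the crude data in Table 5.3 and the bias parameters in Table 5.2.

```python
# A minimal sketch of the stratification approach above; names are ours.

def stratify(a, b, m, n, p1, p0, rr_cd):
    """Split crude risk data into confounder strata C1 and C0.

    a, b: exposed and unexposed cases; m, n: exposed and unexposed totals;
    p1, p0: confounder prevalence among the exposed and unexposed;
    rr_cd: confounder-disease risk ratio (no effect measure modification).
    Returns ((A1, B1, M1, N1), (A0, B0, M0, N0)).
    """
    m1, n1 = p1 * m, p0 * n                      # margins, as in Eq. (5.13)
    a1 = rr_cd * m1 * a / (rr_cd * m1 + m - m1)  # Eq. (5.8)
    b1 = rr_cd * n1 * b / (rr_cd * n1 + n - n1)  # Eq. (5.16)
    return (a1, b1, m1, n1), (a - a1, b - b1, m - m1, n - n1)

def smr(strata, exposed_cases):
    """Standardized morbidity ratio, with the exposed group as the standard."""
    expected = sum(m_i * (b_i / n_i) for (_, b_i, m_i, n_i) in strata)
    return exposed_cases / expected

s1, s0 = stratify(a=105, b=85, m=632, n=178, p1=0.80, p0=0.05, rr_cd=0.63)
print(round(smr([s1, s0], 105), 2))  # 0.49, matching Eq. (5.17)
```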
The results of a bias analysis provide an estimate of the impact of the bias caused
by unmeasured confounding, assuming the bias parameters are correct. The results
of the bias analysis are only as good as the estimate of the values assigned to the
bias parameters used to conduct the analysis. The values assigned to the bias
parameters for this bias analysis came from a similar population, but not the same
population, as the original study. In addition, the analysis does not account for any
other confounders already controlled in the main study analysis. While this could
be accounted for by further stratifying data by confounders already adjusted, this
solution is only possible when one has access to the record-level data and when the
sample size is large enough to allow stratification by multiple confounders simul-
taneously. Therefore, while the method described in this section can be used to
explore the impact of any unmeasured confounder, the method is limited by the
accuracy of the values assigned to the bias parameters and the limitations of the
data. Extensions of this approach, detailed in Chaps. 7 and 8, show how to account
for uncertainty about the values assigned to the bias parameters by making multiple
estimates of them.
This approach is an improvement over intuitive estimates of the impact of
unmeasured confounding because the assumptions made are explicit and the impacts
given those assumptions have been quantified. Presentation of the results of a simple
bias analysis allows others to dispute the bias parameters chosen and to recalculate
estimates of effect, given alternative parameters. For example, one might argue that
the effect of being Muslim in this population (RRCD) has been overstated, particularly
given adjustment for other variables. In that case, one could change the risk ratio
associating being Muslim with incident HIV infection to 0.8; this change would
yield an adjusted estimate of 0.41 instead of the 0.49 presented above.
This example is also implemented in the Excel spreadsheet available on the
text's web site (see Preface), which can be adapted to other data sets. Figure 5.2
shows a screenshot of the spreadsheet. The values assigned to the bias parameters
and crude data are entered into the cells in the top right. Below that is a presentation
of the expected data within strata of the unmeasured confounder calculated using
the method described above. The results displayed at the bottom of the spreadsheet
show an adjusted risk ratio of 0.49 after adjusting for religion.
Note that the investigator could also have arrived at the same adjusted estimate
of association using Eq. (5.1) as follows:

$$RR_{adj} = 0.35 \times \frac{0.63 \times 0.05 + (1 - 0.05)}{0.63 \times 0.80 + (1 - 0.80)} = 0.49 \qquad (5.18)$$
Fig. 5.2 Excel spreadsheet used to correct for an unmeasured binary confounder when no effect
measure modification is present
The spreadsheet allows visual inspection of the interior cells of the stratified data as
well as calculation of the adjusted SMR. Other free spreadsheets for external adjustment
for an unmeasured confounder, using three-dimensional plots of the bias parameters,
can be found on the internet (Pharmacoepidemiology and Pharmacoeconomics
Software Downloads. http://www.brighamandwomens.org/pharmacoepid/
Software%20DownloadsSensi.Ana.Confounding.May5.xls).
Difference Measures
When a risk difference is the measure of association, the above methods can be
revised to complete the stratified table with bias parameters expressed as difference
measures (Gail et al., 1988). In this case, an estimate of the prevalence of the con-
founder in both the exposed and unexposed groups is still required, but instead of
an estimate of the risk ratio associating the confounder with the outcome, one needs
to know the risk difference associating the confounder with the outcome (RDCD). M
and N cells can be completed as above for ratio measures, but to determine the
values in the A and B cells, one uses RDCD as follows:
$$RD_{CD} = \frac{B_1}{N_1} - \frac{B_0}{N_0} \qquad (5.19)$$

Substituting (b − B1) for B0 and (n − N1) for N0:

$$RD_{CD} = \frac{B_1}{N_1} - \frac{b - B_1}{n - N_1} \qquad (5.20)$$
Fig. 5.3 Excel spreadsheet used to correct for an unmeasured binary confounder when a risk
difference is the measure of association
The preceding approach assumes that the investigator has estimates of the risk ratio
or risk difference associating the confounder with the disease and of the distribution
of the confounder within levels of the exposure. To this point, the methods have
assumed that there is no effect measure modification, for example, the effect of
being Muslim on incident HIV is not modified by male circumcision. If the association
between the confounder and outcome is different in strata of the exposure (e.g., if
the risk ratio associating being Muslim and incident HIV was different in strata of
male circumcision), then one might wish to report the stratum-specific estimates
rather than summarized or adjusted estimates of association. However, except in
Use the stratified data to calculate the adjusted risk ratio as:
$$SMR = \frac{105}{505.6 \times (3.0/8.9) + 126.4 \times (82.0/169.1)} = 0.45 \qquad (5.25)$$
Alternatively, one can modify Eq. (5.1) to make the correction directly:
$$RR_{adj} = RR_{obs}\,\frac{RR_{CD_0}\,p_0 + (1 - p_0)}{RR_{CD_1}\,p_1 + (1 - p_1)} \qquad (5.26)$$
where RRCD1 and RRCD0 represent the association between the confounder and the
disease in the presence of the exposure and in the absence of the exposure,
respectively. Using the data on HIV and circumcision, the adjusted estimate of
effect would be
$$RR_{adj} = 0.35 \times \frac{0.4 \times 0.05 + (1 - 0.05)}{0.7 \times 0.8 + (1 - 0.8)} = 0.45 \qquad (5.27)$$
Note that the advantage of calculating the interior cells over using Eq. (5.27) is that
the data can be used to calculate the risk ratio of the exposure (circumcision) on the
outcome (HIV acquisition) within strata of the confounder (religion).
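As a hedged illustration, Eq. (5.26) can also be computed directly in code; the function below (names ours) reproduces the adjusted estimate of 0.45 in Eq. (5.27).

```python
# A minimal sketch of Eq. (5.26), allowing the confounder-disease risk ratio
# to differ by exposure group; names are ours.

def rr_adjusted_emm(rr_obs, rr_cd1, rr_cd0, p1, p0):
    """rr_cd1 / rr_cd0: confounder-disease RR among the exposed / unexposed."""
    return rr_obs * (rr_cd0 * p0 + (1 - p0)) / (rr_cd1 * p1 + (1 - p1))

# Values from Eq. (5.27): RRCD is 0.7 among the circumcised and 0.4 among
# the uncircumcised.
print(round(rr_adjusted_emm(0.35, rr_cd1=0.7, rr_cd0=0.4, p1=0.80, p0=0.05), 2))  # 0.45
```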
Figure 5.4 shows a screenshot from an Excel spreadsheet for adjusting for an
unmeasured confounder in the presence of effect modification.
Fig. 5.4 Excel spreadsheet used to correct for an unmeasured binary confounder when effect
measure modification is present
Polytomous Confounders
To this point, the bias analysis methods have focused on a confounder with two
levels, for example, a person either is or is not Muslim. In some cases, it will be more
accurate to represent the confounder with more than two levels. For example, one
might be interested in adjusting for three levels of religion: Muslim, Christian, and
all other religions. Those who are Muslim might have 0.4 times the risk of incident
HIV of those who are categorized as other, whereas those who are Christian might
have only 0.8 times the risk of those categorized as other. In this case, one would
want to incorporate into the bias analysis model the fact that the potential confounder
has more than two levels. One can expand the bias analysis framework to account
for multiple levels of the confounder by stratifying the observed data into three
groups (Muslim, Christian, and other) as in Table 5.7. However, the number of bias
parameters increases substantially, as now the associations between being Muslim and
incident HIV (RRCD1) and between being Christian and incident HIV (RRCD0) must be
estimated to calculate the expected cell counts, as well as the proportion of subjects
among those circumcised who are Muslim (p11) and Christian (p01), and the proportion
of subjects among those uncircumcised who are Muslim (p10) and Christian (p00).
While earlier values were assigned to only three total bias parameters, now values
must be assigned to six total bias parameters.
Table 5.7 Data on the relation between male circumcision and incident HIV stratified by religion (unmeasured confounder), with crude data from Tyndall et al. (1996)

         Total           Muslim          Christian       Other
         E1      E0      E1      E0      E1      E0      E1      E0
D+       105     85      A2      B2      A1      B1      A0      B0
D−       527     93      C2      D2      C1      D1      C0      D0
Total    632     178     M2      N2      M1      N1      M0      N0
Though it now becomes more complicated to fill in the interior cells (see the
spreadsheet for interior cell calculations), Greenland (1987) and Axelson and
Steenland (1988) have shown how to use the same general bias analysis approach
to relate the bias parameters to the projected data:
$$RR_{adj} = RR_{obs}\,\frac{RR_{CD_1}\,p_{10} + RR_{CD_0}\,p_{00} + (1 - p_{10} - p_{00})}{RR_{CD_1}\,p_{11} + RR_{CD_0}\,p_{01} + (1 - p_{11} - p_{01})} \qquad (5.28)$$
For this example, assume that among the exposed population (circumcised), 60%
are Muslim and a further 20% are Christian, while among the unexposed population
(uncircumcised), 5% are Muslim and a further 20% are Christian. In this case, the
fully adjusted risk ratio would be:

$$RR_{adj} = 0.35 \times \frac{0.4 \times 0.05 + 0.8 \times 0.20 + (1 - 0.05 - 0.20)}{0.4 \times 0.60 + 0.8 \times 0.20 + (1 - 0.60 - 0.20)} = 0.54 \qquad (5.29)$$
Thus, the estimate of effect fully adjusted for the multilevel confounder would
be 0.54, assuming that the values assigned to the bias parameters are accurate.
Figure 5.5 shows a screenshot from an Excel spreadsheet for adjusting for an
unmeasured confounder when the confounder is polytomous.
Fig. 5.5 Excel spreadsheet used to correct for an unmeasured confounder when the confounder
is polytomous
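The polytomous adjustment in Eq. (5.28) generalizes naturally to any number of confounder levels. The following minimal Python sketch (names ours; the referent level, here "other", is the remainder) reproduces the fully adjusted risk ratio of 0.54 under the values assumed above.

```python
# A minimal sketch of Eq. (5.28) generalized to several confounder levels;
# names are ours. Each non-referent level supplies its confounder-disease
# risk ratio and its prevalence among the exposed and unexposed.

def rr_adjusted_poly(rr_obs, levels):
    """levels: list of (rr_cd, prev_exposed, prev_unexposed) tuples."""
    num = sum(rr * p0 for rr, _, p0 in levels) + (1 - sum(p0 for _, _, p0 in levels))
    den = sum(rr * p1 for rr, p1, _ in levels) + (1 - sum(p1 for _, p1, _ in levels))
    return rr_obs * num / den

# Muslim: RRCD 0.4, prevalence 60% (circumcised) and 5% (uncircumcised);
# Christian: RRCD 0.8, prevalence 20% in both groups.
levels = [(0.4, 0.60, 0.05), (0.8, 0.20, 0.20)]
print(round(rr_adjusted_poly(0.35, levels), 2))  # 0.54, matching Eq. (5.29)
```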
The method detailed above is useful when all of the bias parameters can be specified
reasonably by the investigator; however, this complete specification is not always
possible. When only a subset of the components necessary to correct for an unmeasured
confounder can be specified, the bounding approach described next can still inform
the analysis.
Analytic Approach
Flanders and Khoury (1990) employ the relative risk due to confounding (hereafter
referred to as RRconf), as defined by Miettinen (1972), to assess the impact of an unmeasured con-
founder. The RRconf is the ratio of the crude risk ratio (RRcrude) and the standardized
(adjusted) risk ratio using the exposed group as the standard (RRadj).
$$RR_{conf} = \frac{RR_{crude}}{RR_{adj}} \quad\text{and}\quad RR_{adj} = \frac{RR_{crude}}{RR_{conf}} \qquad (5.30)$$
Any information about the crude risk ratio and the components that make up the
relative risk due to confounding can be used to inform what the adjusted measure
would have been, had the confounder been measured.
Assuming the odds ratios (or risk ratios) associating the exposure and the confounder
and associating the confounder and the outcome are greater than or equal to 1, then
the RRconf must be greater than or equal to 1 and cannot be greater than the minimum
of the bounding factors listed in Table 5.8 (Flanders and Khoury, 1990).
These bounds are useful when only some of the information about the unmeas-
ured confounder is available. Even if some of these measures cannot be estimated,
simply knowing the minimum of those that can be estimated will inform an estimate
of the upper limits of the confounding. For example, note that since RRadj = [RRcrude/
RRconf], the minimum value of the factors in Table 5.8 (which we will refer to as
Parmmin) can be substituted for RRconf, since it is the largest possible value for the
relative risk due to confounding. This substitution allows us to determine the largest
possible adjustment to the crude risk ratio as:
RR crude (5.31)
RR adj =
Parm min
Given that RRDC and OREC must be greater than 1, each of the values in Table 5.8
must be greater than 1; accordingly, Parmmin must also be greater than 1. Thus, the
adjusted risk ratio must be between RRcrude/Parmmin and RRcrude. This approach is
most useful when little is known about the unmeasured confounder or when exist-
ing data on some of the factors (e.g., prevalence of the confounder in the exposed
and unexposed groups) is believed not to apply to the current study population. When
either RRDC or OREC is less than 1, one can consider recoding the variables so that
both parameters are greater than 1 and specifying p1 and p0 for these newly coded
relationships.
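Because only some of the bounding factors in Table 5.8 may be estimable in practice, a small helper can compute the resulting bounds. The sketch below is a minimal Python rendering under the stated assumption that all supplied factors are at least 1; the function name and the illustrative values are ours, not from the text.

```python
# A minimal sketch of the bounding approach above; names and example values
# are ours. The smallest available bounding factor caps RRconf (Eq. 5.31).

def bound_adjusted_rr(rr_crude, bounding_factors):
    """Return the (lower, upper) bounds on RRadj, given factors all >= 1."""
    parm_min = min(bounding_factors)  # largest possible RRconf
    return rr_crude / parm_min, rr_crude

# Hypothetical example: only two bounding factors can be estimated (say 3.0
# and 2.0), so confounding can explain at most a 2-fold distortion.
low, high = bound_adjusted_rr(2.5, [3.0, 2.0])
print(f"RRadj lies between {low:.2f} and {high:.2f}")  # between 1.25 and 2.50
```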
Chapter 6
Misclassification
Introduction
It is often assumed that, when misclassification is nondifferential, the true association must
be at least as strong as observed, and therefore misclassification error could not explain
the observed association. This superficial approach to assessing misclassification bias
prevents a thorough understanding of the extent to which the true association was
greater than the observed association, and also sometimes results in a misinterpretation
of the inference that can be made from the bias toward the null (Gustafson and
Greenland, 2006). Furthermore, there are many exceptions to this predictability that
are too easily overlooked when the general expectation is too readily applied. More
detail on these exceptions will be provided below in the relevant sections.
Conceptual
Throughout this discussion of misclassification we will use terms that describe the
type of misclassification and measure the extent of the misclassification. We will
begin by providing a conceptual definition and then below, where applicable, we
will provide equations for calculating these different measures. These terms
have comparable meanings whether the misclassified variable is the exposure,
disease, or a covariate. In this section, we will consider the misclassified variable
as the exposure variable.
Whenever data cannot be collected perfectly, some subjects' recorded values will
differ from their true values. If we knew each subject's true categorization with respect
to some exposure E, we could assess the likelihood that a classification system
correctly separated subjects into exposed and unexposed groups. Typically, four
measures summarize the performance of a classification scheme: its sensitivity,
specificity, positive predictive value, and negative predictive value.
The probability that a subject who was truly exposed was correctly classified as
exposed is the classification scheme's sensitivity (SE). The proportion of those who
were truly exposed who were incorrectly classified as unexposed is the false-negative
proportion (FN). Likewise, the probability that a subject who was truly unexposed
was correctly classified as unexposed is the classification scheme's specificity (SP),
and the proportion incorrectly classified as exposed is the false-positive proportion
(FP). Because these measures are proportions, their values range from 0 to 1.
We present them throughout this chapter as percents, although they can also be
presented as decimals or even as fractions.
The sensitivity and specificity of a classification scheme measure how well
the scheme sorts subjects into exposed and unexposed groups, so they use the
total numbers who truly do and do not have the exposure, respectively, as their
denominators. Sensitivity and specificity are therefore measures of the quality of
the classification scheme. They can be applied to populations aside from the one in
which they were measured, although the external population should be similar
enough to the source population that the classification scheme can be expected to
have reasonably similar performance.
One might also want to know, given that a subject was classified as exposed
(or unexposed), the probability that the assigned classification was correct.
These values are the positive predictive value (PPV) and negative predictive value
(NPV), respectively. These measures use the number of subjects who are classified
as exposed or unexposed as their denominators. The PPV calculates the probability
of truly being exposed if classified as exposed while the NPV calculates the
probability of truly being unexposed if classified as unexposed.
Predictive values are related to sensitivity and specificity; however, the relation
depends on the true prevalence of the exposure in the population in which the
predictive value is measured. For example, even if the sensitivity and specificity of
exposure classification do not depend on disease status (i.e., exposure classification
errors are nondifferential), the PPV will be greater among cases than among noncases
if the exposure is positively associated with the disease, since the exposure prevalence
will be higher among cases than among noncases. Because of this dependence on
prevalence, predictive values are less readily applied than sensitivity and specificity
to populations or subcohorts other than the one in which they were measured.
In many cases, there is concern that the ability of the classification scheme to
classify study participants with respect to some variable may depend on the value
of another variable. The classification errors can be described as either differential
or nondifferential, and as either dependent or independent. It is
important to realize that these terms apply to the mechanism of classification and
its susceptibility to interdependencies. While these terms are sometimes applied to
a result, such as a contingency table, to say that the result was misclassified in some
respect (e.g., nondifferentially and independently) requires comparison of the result
with the classification obtained without error, which is seldom available.
Nondifferential (ND) exposure misclassification occurs when exposure status is
recorded before the onset of the disease or when the method of assigning exposure
status is otherwise blind to the outcome status (so that errors in the classification of
the exposure are not likely to be related to disease status). ND exposure misclassifica-
tion is also expected when exposure is ascertained after the occurrence of the disease,
but is a fixed value that would not change over time, so could not be influenced by
disease onset. An expectation of nondifferential exposure classification errors can
also be created by blinding those collecting exposure data to participants disease
status and by blinding those collecting disease data to participants exposure status.
Differential exposure misclassification is likely to occur when exposure is
assessed after the disease has occurred (e.g., by subject interview); this expectation
arises because having the disease may trigger cases to recall or report its perceived
causes differently than noncases (i.e., recall bias). Differential disease misclassification
might also occur if exposed individuals were more likely to have their disease found
than unexposed individuals (i.e., detection bias). For example, if study participants with
diabetes are more likely to be screened for glaucoma than participants without diabetes,
then false-negative cases of glaucoma may be more likely among nondiabetics.
Dependent misclassification occurs when individuals misclassified on one
variable (e.g., exposure) are more likely than individuals correctly classified on
that variable to also be misclassified on a second variable (e.g., disease status).
We will now turn to the equations used to calculate the measures of classification.
Again, we will use exposure classification as an example for this section, but the
equations also apply to disease and covariate variables. Assume that we had collected
data on some exposure by self-report, but also had a way to know with certainty each
subject's true exposure status. In that case, the data could be laid out as the 2 × 2
contingency table shown in Table 6.1, also called a validation table. Along the interior
columns, subjects are classified according to their true exposure status, while along
the interior rows subjects are classified according to their self-reported status.
Table 6.1 Nomenclature for the equations for calculating measures of misclassification

                          Truly exposed    Truly unexposed
Classified as exposed     A                B
Classified as unexposed   C                D
Below are Eqs. (6.1)–(6.6) for calculating the six measures of classification.
These equations assume that the participants in the validation study were randomly
selected from the source population.
$$\text{Sensitivity (SE)} = \frac{A}{A+C} \qquad (6.1)$$

$$\text{Specificity (SP)} = \frac{D}{B+D} \qquad (6.2)$$

$$\text{False-negative proportion (FN)} = \frac{C}{A+C} \qquad (6.3)$$

$$\text{False-positive proportion (FP)} = \frac{B}{B+D} \qquad (6.4)$$

$$\text{Positive predictive value (PPV)} = \frac{A}{A+B} \qquad (6.5)$$

$$\text{Negative predictive value (NPV)} = \frac{D}{C+D} \qquad (6.6)$$
The predictive values can also be expressed in terms of sensitivity, specificity,
and the true prevalence of exposure, with P(E1) = 1 − P(E0):

$$PPV = \frac{P(E_1)\,SE}{P(E_1)\,SE + P(E_0)(1 - SP)} \qquad (6.7)$$

$$NPV = \frac{P(E_0)\,SP}{P(E_0)\,SP + P(E_1)(1 - SE)} \qquad (6.8)$$
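As a hedged illustration of Eqs. (6.7) and (6.8), the short Python sketch below (names ours) shows how strongly the predictive values depend on the true exposure prevalence even when sensitivity and specificity are held fixed.

```python
# A minimal sketch of Eqs. (6.7) and (6.8); names are ours. prev is the true
# exposure prevalence P(E1), and P(E0) = 1 - prev.

def ppv(se, sp, prev):
    return se * prev / (se * prev + (1 - sp) * (1 - prev))

def npv(se, sp, prev):
    return sp * (1 - prev) / (sp * (1 - prev) + (1 - se) * prev)

# With SE = 78% and SP = 99% (the smoking validation values used later in
# this chapter), the predictive values shift with prevalence:
for prev in (0.10, 0.25):
    print(f"prev={prev:.2f}: PPV={ppv(0.78, 0.99, prev):.2f}, "
          f"NPV={npv(0.78, 0.99, prev):.2f}")
# prev=0.10: PPV=0.90, NPV=0.98
# prev=0.25: PPV=0.96, NPV=0.93
```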
Sources of Data
Estimates of the classification parameters are often needed for an exposure,
disease, or covariate that is either too logistically difficult or too expensive to
collect in the entire study population. The investigator has the option to conduct an
internal validation study or, if that is not feasible, to use estimates calculated for a
different study in a similar population. When an internal validation study is feasi-
ble, then the researcher has three options for selecting individuals to be included in
the validation study (Marshall, 1990). This choice will affect the classification bias
parameters that can be validly estimated (Table 6.3).
The classification parameters obtained for the exposure classification scheme can
now be used to conduct a simple bias analysis to correct for misclassification.
As outlined above, the design of a validation study will determine which measures
of classification can be calculated. First, we will outline the methods for misclas-
sification corrections for situations in which the sensitivity and specificity are
available and then methods where the predictive values are available.
To begin, we will consider the hypothetical situation in which we know the true
classification status of study participants and would like to calculate what we expect
would be observed using an imperfect classification scheme with known sensitivity
and specificity. Considering this scenario first helps to explain the process for deriving
the equations used to calculate the expected true estimate of effect given misclassified
observed data (i.e., the situation ordinarily encountered).
In Table 6.4, the equations for calculating the cell frequencies in the 2 × 2 table
are presented. As throughout, we will use upper case letters to designate true clas-
sification frequencies and lower case letters to designate expected observed or
actual observed frequencies. E1 and D+ represent the exposed and diseased popula-
tions, respectively, and E0 and D− represent the unexposed and undiseased popu-
lations, respectively. In the first two columns, we present the true data and in the
last two columns we present the equations that relate the true data to what we would
expect to observe, given the bias parameters of sensitivity and specificity. For this
example, we will assume that the mechanism of misclassification of exposure is
nondifferential with respect to disease (i.e., SED+ = SED− and SPD+ = SPD−) and that
disease is correctly classified.
Table 6.4 Equations for calculating the expected observed data from the true data given sensitivity and specificity

         Truth         Expected observed
         E1     E0     E1                             E0
D+       A      B      a = A(SED+) + B(1 − SPD+)      b = A(1 − SED+) + B(SPD+)
D−       C      D      c = C(SED−) + D(1 − SPD−)      d = C(1 − SED−) + D(SPD−)
Total    A+C    B+D    a+c                            b+d
The equations for the expectation for what will be observed divide the individuals
in each cell into two groups: those who will remain in the cell (i.e., are correctly
classified) and those who will be shifted to the adjacent cell in the row (i.e., are
misclassified). For disease and covariate misclassification, the equations are essentially
the same as presented above, with the difference being the direction in which
misclassified subjects are shifted. For exposure misclassification, subjects always
stay in the same row (because their disease is assumed to be correctly classified)
but misclassified subjects are shifted from the exposed to unexposed column or vice
versa. For disease misclassification, subjects always stay in the same column
(because their exposure is assumed to be correctly classified), but misclassified
subjects are shifted from the diseased to nondiseased row or vice versa. For covariate
misclassification, if we can assume that the exposure and disease are classified
correctly, then the misclassified data remain in the same exposure and disease cell
but are shifted from one stratum of the confounder to another.
To calculate the a cell (i.e., the number of subjects we would expect to be
classified as exposed and diseased given the bias parameters), we take the frequency
of truly exposed individuals who were classified correctly [A(SED+)] and add
the frequency of truly unexposed individuals who were incorrectly classified as
exposed [B(1 − SPD+)].
Table 6.5 presents hypothetical data to demonstrate how to calculate the
expected observed cell frequencies. We will begin with an example of nondif-
ferential exposure misclassification in which the classification scheme has a sensi-
tivity of 85% and a specificity of 95%. The odds and risk ratios (OR and RR,
respectively) are calculated for the truth and expected observed data.
Table 6.5 Hypothetical example for calculating the expected observed data given the true data assuming nondifferential misclassification

         Truth           Expected observed
         E1      E0      E1                                E0
D+       200     100     200(0.85) + 100(0.05) = 175       200(0.15) + 100(0.95) = 125
D−       800     900     800(0.85) + 900(0.05) = 725       800(0.15) + 900(0.95) = 975
Total    1,000   1,000   900                               1,100
OR       2.3             1.9
RR       2.0             1.7
In this example, the result expected to be observed is biased toward the null by
the nondifferential exposure misclassification (in this case, a true risk ratio of 2.0
appears as 1.7 with misclassification). It is this calculation that is the basis for the
(often mistaken) generalization that nondifferential misclassification will bias
associations toward the null.
Using the same data, we now assume a classification scheme with nondifferential
specificity of 90% (i.e., the same in both the diseased and nondiseased) but with a
sensitivity of exposure classification that depends on disease status (100% among
the diseased and 70% among the nondiseased). Table 6.6 presents the results.

Table 6.6 Hypothetical example for calculating the expected observed data given the true data and assuming differential misclassification

         Truth           Expected observed
         E1      E0      E1                              E0
D+       200     100     200(1.0) + 100(0.1) = 210       200(0) + 100(0.9) = 90
D−       800     900     800(0.7) + 900(0.1) = 650       800(0.3) + 900(0.9) = 1,050
Total    1,000   1,000   860                             1,140
OR       2.3             3.8
RR       2.0             3.1
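Both calculations can be reproduced with a short script. The following minimal Python sketch (names ours) applies the equations of Table 6.4 to the true data, first with the nondifferential parameters of Table 6.5 and then with the differential parameters of Table 6.6.

```python
# A minimal sketch of the forward calculations in Tables 6.5 and 6.6;
# names are ours.

def expected_observed(A, B, C, D, se_case, sp_case, se_noncase, sp_noncase):
    """Shift true exposure counts into the expected observed 2x2 table."""
    a = A * se_case + B * (1 - sp_case)
    b = A * (1 - se_case) + B * sp_case
    c = C * se_noncase + D * (1 - sp_noncase)
    d = C * (1 - se_noncase) + D * sp_noncase
    return a, b, c, d

# Nondifferential example (Table 6.5): SE = 85%, SP = 95% in both rows.
a, b, c, d = expected_observed(200, 100, 800, 900, 0.85, 0.95, 0.85, 0.95)
print(a, b, c, d)                  # 175.0 125.0 725.0 975.0
print(round(a * d / (b * c), 1))   # observed OR 1.9 (true OR 2.3)

# Differential example (Table 6.6): SE is 100% among cases, 70% among noncases.
a, b, c, d = expected_observed(200, 100, 800, 900, 1.0, 0.9, 0.7, 0.9)
print(round(a * d / (b * c), 1))   # observed OR 3.8, biased away from the null
```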
These examples show how misclassification can affect results. However, as epide-
miologists, we are usually faced with a different problem. Our data classification
scheme is imperfect, and we wish to estimate what data arrangement we would have
observed had participants been correctly classified. To understand how to deal with
this problem, we will algebraically rearrange the equations above to reflect what we
would expect the true data to be, given the observed data and the bias parameters.
We will demonstrate the approach using the A cell.
Equations (6.9) and (6.10) below, from Table 6.4, show the relation of the true
A and B cells and the bias parameters (SE and SP) to the observed a cell. D+ Total
is the total number of subjects with the disease (a + b), and D− Total is the total
number of subjects without the disease (c + d).

$$a = A(SE_{D+}) + B(1 - SP_{D+}) \qquad (6.9)$$

$$B = \text{D+ Total} - A \qquad (6.10)$$

When the a and D+ Total cells are known, we can substitute D+ Total − A for B in
Eq. (6.9):

$$a = A(SE_{D+}) + (\text{D+ Total} - A)(1 - SP_{D+}) \qquad (6.11)$$

We can then algebraically rearrange this equation to solve for A (the expected number
of truly exposed cases) as follows:

$$A = \frac{a - \text{D+ Total}\,(1 - SP_{D+})}{SE_{D+} - (1 - SP_{D+})} \qquad (6.12)$$

We can similarly rearrange the equations in Table 6.4 to solve for the C cell. Table 6.7
shows all the equations for calculating the expected truth from observed data, given
estimates of the sensitivity and specificity.
Table 6.7 Equations for calculating expected true data given the observed data with exposure misclassification(a)

         Observed      Corrected data
         E1     E0     E1                                                E0
D+       a      b      A = [a − D+ Total(1 − SPD+)]/[SED+ − (1 − SPD+)]  B = D+ Total − A
D−       c      d      C = [c − D− Total(1 − SPD−)]/[SED− − (1 − SPD−)]  D = D− Total − C
Total    a+c    b+d    A + C                                             B + D

(a) A + C is the corrected total number of exposed individuals and B + D is the corrected total number of unexposed individuals.
We can now apply these equations to correct misclassified data using the validation
data presented earlier. In a study of the effect of smoking during pregnancy on breast
cancer risk, information on maternal smoking during pregnancy was ascertained
from the mother's self-report on the birth certificate. There was concern that mothers
may not accurately report their smoking history when completing a birth certificate,
but since the pregnancy was before their breast cancer diagnosis, the misclassification
mechanism was expected to be nondifferential. This study observed no association
between smoking during pregnancy and risk of breast cancer (OR = 0.95 95%;
CI = 0.81, 1.1) (Fink and Lash, 2003). However, it was hypothesized that the
lack of an observed association could have been caused by nondifferential
misclassification of smoking.
In Table 6.2, we used data from a validation study to calculate a sensitivity of 78%
and a specificity of 99% for the accuracy of smoking information on the birth certificate
compared to medical record. We will use these values and the observed data from the
study to calculate the expected true association between smoking during pregnancy
and breast cancer risk, conditional on the accuracy of the sensitivity and specificity.
Table 6.8 shows the observed and corrected data given the bias parameters.
In this example, the nondifferential misclassification had a minimal effect on the
odds ratio, suggesting it is unlikely that the observed null effect was the result of
inaccurate reporting of smoking status on the birth certificate, presuming that the
sensitivity and specificity were approximately accurate.
Table 6.8 Example of correction for misclassification of smoking in a study of the effect of smoking during pregnancy on breast cancer risk assuming nondifferential misclassification (observed data from Fink and Lash, 2003)

SE = 78%, SP = 99%   Observed                   Corrected data
                     Smokers     Nonsmokers     Smokers     Nonsmokers
Cases                215         1,449          257.6       1,406.4
Controls             668         4,296          803.1       4,160.9
OR                   0.954                      0.949
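The correction equations in Table 6.7 are straightforward to script. The minimal Python sketch below (names ours) reproduces the corrected cell frequencies and odds ratio in Table 6.8; it is an illustration, not the text's spreadsheet.

```python
# A minimal sketch of the correction in Table 6.7; names are ours.

def correct_exposure(a, b, se, sp):
    """Back-calculate true exposed/unexposed counts within one disease row."""
    total = a + b
    A = (a - total * (1 - sp)) / (se - (1 - sp))
    return A, total - A

A, B = correct_exposure(215, 1449, se=0.78, sp=0.99)   # cases
C, D = correct_exposure(668, 4296, se=0.78, sp=0.99)   # controls
print(round(A, 1), round(B, 1))                        # 257.6 1406.4
print(round(C, 1), round(D, 1))                        # 803.1 4160.9
print(round((A * D) / (B * C), 3))                     # corrected OR 0.949
```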
Fig. 6.1 Screenshot of Excel spreadsheet to perform simple bias analysis for nondifferential
misclassification of exposure
Figure 6.1 is a screenshot from the Excel spreadsheet available on the text's web
site (see Preface) showing the same results. In the spreadsheet, it is necessary to
input the sensitivity and specificity for the exposed and unexposed in the top panel
and then the a, b, c, and d cell frequencies from the observed 2 × 2 table in the
center panel. The spreadsheet will calculate the corrected cell frequencies and the
corrected odds ratio and risk ratio as displayed in the bottom panel.
All the methods presented to this point use sensitivity and specificity to complete
the bias analysis. However, when PPV and NPV are available, a second method can
be used for simple bias analysis to correct for misclassification. Table 6.9 displays
the formulas for recalculating cell counts using PPV and NPV (Marshall, 1990).
Table 6.9 Equations for calculating expected true frequencies given the observed frequencies and classification predictive values

         Observed      Corrected data
         E1     E0     E1                               E0
D+       a      b      A = a(PPVD+) + b(1 − NPVD+)      B = D+ Total − A
D−       c      d      C = c(PPVD−) + d(1 − NPVD−)      D = D− Total − C
Total    a+c    b+d    A + C                            B + D
The validation study by Piper et al. (1993) used the third design strategy in Table 6.3.
That is, subjects were selected without regard to their observed or validated
smoking status. Therefore, PPV and NPV can be computed. As calculated above
using the data in Table 6.2, a PPV of 96% and an NPV of 92% were observed.
Validation data were not available from breast cancer cases, so the control data will
be applied to both groups. Note that this assumption inherently presumes the
prevalence of smoking is the same in cases and controls, which is the same as
assuming a null odds ratio associating smoking with breast cancer occurrence. That
is, the assumption forces a null prior on the association that the study is designed
to measure. Table 6.10 displays the original and corrected data and estimate of
effect using the predictive values.
Table 6.10 Example of correction for misclassification of smoking in a study of the effect of smoking during pregnancy on breast cancer risk assuming nondifferential misclassification (observed data from Fink and Lash, 2003)

PPV = 96%, NPV = 92%   Observed                   Corrected data
                       Smokers     Nonsmokers     Smokers     Nonsmokers
Cases                  215         1,449          322.3       1,341.7
Controls               668         4,296          985.0       3,979.0
OR                     0.95                       0.97
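The predictive-value correction in Table 6.9 can be scripted the same way. The minimal Python sketch below (names ours) reproduces the corrected data in Table 6.10.

```python
# A minimal sketch of the correction in Table 6.9; names are ours.
# Predictive values are applied within each disease row.

def correct_with_pv(a, b, ppv, npv):
    """Reallocate one disease row using positive and negative predictive values."""
    A = a * ppv + b * (1 - npv)
    return A, (a + b) - A

A, B = correct_with_pv(215, 1449, ppv=0.96, npv=0.92)   # cases
C, D = correct_with_pv(668, 4296, ppv=0.96, npv=0.92)   # controls
print(round(A, 1), round(C, 1))        # 322.3 985.0
print(round((A * D) / (B * C), 2))     # corrected OR 0.97
```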
The result using this method differs slightly from the result using sensitivity and
specificity. This difference may be due to a limitation of the predictive value method.
As noted above, predictive values are influenced by the underlying prevalence of
exposure in the population. The observed data had a prevalence of smoking of
12.9% (215/1,664) among the cases and 13.5% (668/4,964) among the controls
(Fink and Lash, 2003). In the validation study, the observed prevalence of smoking
during pregnancy was 26.5%, almost double the smoking prevalence observed in
the data. Therefore, applying the predictive values from the validation study to the
birth certificate study may not be appropriate, and as a result the corrected data using
this method do not replicate the results obtained using measures of sensitivity and
specificity. In general, using predictive values will be most useful when the validation data
derive from a subset of the study participants.
The following section provides an example in which the scheme for classifying the
exposure exhibits differential misclassification. This example uses data from a
case-control study measuring the association between a family history of hematopoietic
cancer and an individual's risk of lymphoma in Sweden (Chang et al.,
2005). In this study, the investigators validated self-reports of family history by
searching for family members' diagnoses in the Swedish Cancer Registry and
examined whether the validity of self-report differed between cases and controls.
Specificity was nearly perfect in both groups (98% and 99% among cases and con-
trols, respectively), but sensitivity was low among cases (60%) and even lower
among controls (38%). Table 6.11 presents the observed data from the study and
the corrected data accounting for the sensitivity and specificity of report in the
cases and controls.
In this example, differential misclassification of family history of hematopoietic
cancer had a substantial effect on the observed estimate of association, biasing it
away from the null: the corrected odds ratio of 1.6 was observed as 2.3, about a
120% increase in the excess relative risk.
Table 6.11 The association between family history of hematopoietic cancer and risk of lymphoma before and after correction for differential misclassification of family history (observed data from Chang et al., 2005)

           Observed data                        Corrected data
           Family history   No family history   Family history   No family history
Cases      57               1,148               56.7             1,148.3
Controls   26               1,201               37.1             1,189.9
OR         2.3                                  1.6
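The same back-calculation, run with group-specific sensitivities and specificities, reproduces the corrected data in Table 6.11; the sketch below (names ours) is again an illustration rather than the authors' code.

```python
# A minimal sketch of the differential correction behind Table 6.11; names ours.

def correct_exposure(a, b, se, sp):
    total = a + b
    A = (a - total * (1 - sp)) / (se - (1 - sp))
    return A, total - A

A, B = correct_exposure(57, 1148, se=0.60, sp=0.98)   # cases: SE 60%, SP 98%
C, D = correct_exposure(26, 1201, se=0.38, sp=0.99)   # controls: SE 38%, SP 99%
print(round((A * D) / (B * C), 1))                    # corrected OR 1.6
```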
Table 6.12 The association between self-reported maternal residential proximity to agricultural fields and risk of neural tube defects in their offspring (adapted from Shaw et al., 1999)

           Total population            Observed data among the validated subset   Observed data among the remainder
           <0.25 miles   0.25+ miles   <0.25 miles   0.25+ miles                  <0.25 miles   0.25+ miles
Cases      82            177           64            163                          18            14
Controls   110           360           91            333                          19            27
OR         1.5                         1.4                                        1.8
Table 6.13 displays the true data for the validated subset and the corrected data
for the remainder of the population. In the validation subset, the sensitivity of
exposure classification was 65.7% among the cases and 50.0% among the controls,
and specificity was 87.5% among the cases and 89.3% among the controls (Rull et al.,
2006). These values were used to correct the data for the subset of the population
without validated data. For subjects in the validation study, we can simply use their
validated exposure measure.
Table 6.13 The association between residential proximity to agricultural fields and risk of neural tube defects (adapted from Shaw et al., 1999 and Rull et al., 2006)

           Total                       True data among the validated subset   Corrected data among the remainder
           <0.25 miles   0.25+ miles   <0.25 miles   0.25+ miles              <0.25 miles   0.25+ miles
Cases      93.3          165.7         67            160                      26.3          5.7
Controls   151.8         318.2         116           308                      35.8          10.2
OR         1.2                         1.1                                    1.3
There are two options for calculating an overall odds ratio. First, we can add
the cells in the validated and corrected groups to get totals (the left table in Table
6.13). A second approach, described by Greenland (1988), is to weight the log of
the odds ratios by the inverse of their variance:
$$\ln(OR_{total}) = \frac{\ln(OR_V)\,V_R + \ln(OR_R)\,V_V}{V_V + V_R} \qquad (6.13)$$
where VV is the variance of the log odds ratio among the validated group, calculated
in the standard way as the sum of the inverses of each cell in the 2 × 2 table, and VR
is the corresponding variance in the remainder. Applying this equation to these
data yields a pooled odds ratio of 1.1.
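Equation (6.13) amounts to an inverse-variance weighted average on the log scale. The minimal Python sketch below (names ours) reproduces the pooled odds ratio of 1.1 from the two tables in Table 6.13.

```python
# A minimal sketch of Eq. (6.13); names are ours.

import math

def pooled_or(table_v, table_r):
    """Inverse-variance weighted average of two 2x2 tables' log odds ratios."""
    def log_or_and_var(a, b, c, d):
        return math.log(a * d / (b * c)), 1/a + 1/b + 1/c + 1/d
    lv, vv = log_or_and_var(*table_v)
    lr, vr = log_or_and_var(*table_r)
    return math.exp((lv * vr + lr * vv) / (vv + vr))

# Validated subset and corrected remainder from Table 6.13, as (a, b, c, d).
print(round(pooled_or((67, 160, 116, 308), (26.3, 5.7, 35.8, 10.2)), 1))  # 1.1
```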
Note that using the validated measure of exposure decreased the estimate of
effect from 1.4 to 1.1 in the validated subset and from 1.8 to 1.3 in the nonvalidated
group. The higher estimate of effect in the nonvalidated group suggests that this
group may not be representative of the total and that it is necessary to consider
whether it is appropriate to apply the sensitivity and specificity from the validated
sample to the nonvalidated sample. Alternatively, given the small sample size of the
remainder sample, the discrepancy may arise in large part due to chance variation.
The BRFSS surveys the US population using telephone interviews and collects
height and weight information by self-report. A second study compared the national
prevalence of body mass index in the BRFSS and the National Health and Nutrition
Examination Survey (NHANES), which collects height and weight information during
a physical examination by a trained health technician (Yun et al., 2006). This study
suggested that the prevalence of being overweight and obese were underestimated in
the BRFSS study. For illustrative purposes, we will assume that there was no overre-
porting of body mass index (i.e., specificity was 100%) and that misclassification of
BMI is independent of diabetes status (i.e., nondifferential misclassification).
Extrapolating from the data provided in the manuscript of the comparison study, a
sensitivity of 91% for assigning an overweight BMI (i.e., 9% of those overweight had
been incorrectly classified as healthy), and 68.2% for an obese BMI (i.e., 31.8% of
those obese had been incorrectly classified as overweight) were calculated. We assume
that no one who was obese had been misclassified as having a healthy body mass
index. Table 6.15 displays the recalculated table cells and odds ratio after applying the
estimates of sensitivity assuming that misclassified data only shifted one column.
Note that the odds ratio for the obese group compared to the healthy group was
biased toward the null, but the estimate of association for the overweight group
compared to the healthy group was substantially biased away from the null, despite
the fact that the misclassification was nondifferential.
To summarize, the generalization that nondifferential misclassification biases
results toward the null should be refined to say, "on average, nondifferential
misclassification of a dichotomous variable will bias results toward the null."
Further refinements to this statement will be made throughout this chapter.
Disease Misclassification
Whereas exposure misclassification results in misclassified data staying in the same
row but moving columns, with disease misclassification the data remain in
the same column but shift rows.
Table 6.16 Equations for calculating the expected observed data from the true data when there is disease misclassification

         Truth         Expected observed
         E1     E0     E1                             E0
D+       A      B      a = A(SEE1) + C(1 − SPE1)      b = B(SEE0) + D(1 − SPE0)
D−       C      D      c = C(SPE1) + A(1 − SEE1)      d = D(SPE0) + B(1 − SEE0)
Total    A+C    B+D    a+c                            b+d
As with the exposure misclassification, we will use the equations in Table 6.16
to derive the equations for calculating expected truth given observed data. These
equations are shown in Table 6.17.
Table 6.17 Equations for correcting observed data, given presumably correct estimates of disease classification bias parameters^a
          Observed            Expected truth
          E1       E0         E1                                                E0
D+        a        b          [a − E1Total(1 − SP_E1)]/[SE_E1 − (1 − SP_E1)]    [b − E0Total(1 − SP_E0)]/[SE_E0 − (1 − SP_E0)]
D−        c        d          E1Total − A                                       E0Total − B
Total     a + c    b + d      A + C                                             B + D
^a E1Total is the total number of exposed subjects (a + c) and E0Total is the total number of unexposed subjects (b + d)
A validation study comparing the cause of death recorded on death certificates with a more accurate diagnostic standard provides validation data, and from these data the sensitivity and specificity of AMI classification
can be calculated. We can then correct the outcome misclassification by applying the
bias parameters to the sex-specific misclassified vital statistics registry data.
Death certificates have a sensitivity of 53.0% (1,750/3,300) and a specificity of
98.5% (12,075/12,259) for AMI classification. These values for sensitivity and
specificity will be applied to the data in Table 6.19, examining the relation between
death from acute myocardial infarction and sex. We will assume that the disease
misclassification was nondifferential.
Table 6.19 Observed and corrected data for the association between
sex and death from acute myocardial infarction (AMI) (observed data
from De et al., 1998)
Observed data Corrected data
Males Females Males Females
AMI deaths 4,558 3,428 7,369.0 5,214.2
Other deaths 46,305 46,085 43,494.0 44,298.8
Total 50,863 49,513 50,863 49,513
Risk (%) 9.0 6.9 14.5 10.5
Risk ratio 1.3 Reference 1.4 Reference
Here we see that the observed result was biased toward the null by 25%, compared
to the result corrected for the misclassification of acute myocardial infarction,
assuming that the values assigned to the bias parameters are accurate.
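The arithmetic behind Table 6.19 can be checked with a few lines of Python; this is only our sketch of the Table 6.17 equations applied to the De et al. data, not the text's spreadsheet:

    def correct_disease(a, c, se, sp):
        # Apply the Table 6.17 equations within one exposure group:
        # a = observed AMI deaths, c = observed other deaths
        total = a + c
        A = (a - total * (1 - sp)) / (se - (1 - sp))  # corrected AMI deaths
        return A, total - A

    se, sp = 0.530, 0.985  # death-certificate sensitivity and specificity
    for sex, (a, c) in {"Males": (4558, 46305), "Females": (3428, 46085)}.items():
        A, C = correct_disease(a, c, se, sp)
        print(sex, round(A, 1), round(C, 1), round(100 * A / (A + C), 1))
    # Males: 7369.0 AMI deaths, risk 14.5%; Females: 5214.2, risk 10.5%;
    # the corrected risk ratio is therefore about 1.4 (observed 1.3)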
Figure 6.2 shows a screenshot of the Excel spreadsheet to conduct the same
correction for disease misclassification. Once the user inputs the sensitivity and
specificity of disease misclassification (by exposure status if necessary), the program
computes the corrected cell frequencies and measures of effect.
Fig. 6.2 Screenshot of Excel spreadsheet for adjusting estimates of effect for nondifferential
disease misclassification
correctly diagnosed with disease (e.g., by arising earlier in the follow-up than the
true case). However, this exception remains controversial because of the latter
requirement (Brenner and Savitz, 1990).
To examine why ratio measures of association are unbiased when specificity is
perfect and the sensitivity is the same among the exposed and unexposed, consider the
following equations from Table 6.4 that relate true cell counts to observed counts.
Because the specificity is 100%, the second half of each of the above equations, (1 − SP), is zero and can be eliminated. Therefore, the observed risk ratio would be

RR_observed = [A(SE_E1)/(a + c)] / [B(SE_E0)/(b + d)]   (6.15)
The confidence interval will be wider than had the classification been perfect,
however, because of the reduction in the number of cases caused by the imperfect
sensitivity (i.e., a is reduced from A and b from B by the same factor, the common sensitivity, such that a/b = A/B).
In Table 6.20, data are presented for a hypothetical study with 2000 participants,
half with the exposure and the other half without. The true association between
exposure and disease is a risk ratio of 2 and the true risk difference is 0.1.
Using these data, Fig. 6.3 shows the relation between sensitivity, specificity, and the expected observed risk ratio and risk difference.
Fig. 6.3 Effect of disease misclassification on estimates of the risk ratio and risk difference (black curves: risk ratio, with specificity = 1 or sensitivity = 1; gray curves: risk difference; x-axis: assigned proportion for the varied classification parameter)
The black curves refer to the risk ratios. When specificity is held at 100% and sensitivity ranges from 50% to 100%, the analysis yields the unbiased risk ratio of 2 regardless of the sensitivity. Conversely, when sensitivity is held at 100% and specificity ranges from 50% to
100%, there is an inverse relation between the bias and the specificity. Even a
specificity of 99% will yield a risk ratio of 1.9, very close to the truth (2.0) though
not completely unbiased. The gray curves refer to risk differences. When specificity
is held at 100% and the sensitivity ranges from 50% to 100%, the risk difference
approaches the truth linearly as the sensitivity approaches 100%. In this situation,
the risk difference can be corrected by dividing the observed risk difference by
the sensitivity, which is then expected to equal the true risk difference. The same
adjustment can be done if sensitivity is equal to 100%; the observed result can
be divided by the specificity to obtain the expected truth.
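The algebra behind these two special cases is short; the following sketch (our derivation, with N1 and N0 the exposed and unexposed denominators and R1 and R0 the true risks) shows why the simple division works:

RD_observed = a/N1 − b/N0 = (A·SE)/N1 − (B·SE)/N0 = SE·(R1 − R0) = SE·RD_true   (when SP = 1)

RD_observed = [R1 + (1 − R1)(1 − SP)] − [R0 + (1 − R0)(1 − SP)] = SP·RD_true   (when SE = 1)

Dividing the observed risk difference by the imperfect classification parameter therefore recovers the true risk difference.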
In the example of death from myocardial infarction and sex (Table 6.19), a risk
ratio of 1.3 was observed and, after conducting a bias analysis, a corrected risk ratio
of 1.4 was obtained. The validation data indicated that the specificity of disease was
close to perfect (98.5%), but, as we see in Fig. 6.3, this near-perfect specificity is
not always sufficient to produce an unbiased estimate. However, the high specificity
does explain why the observed estimate was close to the corrected estimate despite
the poor sensitivity (53%).
Figure 6.3 suggests that when conducting case-control studies in which the
measure of effect will be an odds ratio estimate of the risk ratio, it is useful to define
the outcome with high (perfect) specificity and nondifferential sensitivity with
respect to disease classification. Near-perfect specificity can be accomplished by
requiring pathological confirmation of disease (rather than a clinical confirmation),
for example, and serves to remind us that for research purposes, clinical definitions
of disease may not produce the most valid research results. Nondifferential disease
misclassification can be achieved by using individuals blinded to exposure status or
independent sources for disease classification.
Covariate Misclassification
Methods for conducting bias analysis to correct for a misclassified covariate follow
the same approach as for exposure misclassification, but the misclassified data stay
in the same exposure and disease cells and move from one stratum of the covariate
to another. The covariate could be a potential confounder, effect measure modifier,
or both (Table 6.21).
Table 6.21 Equations for correcting observed data that are biased by covariate misclassification
Observed data
          Total            C1                 C0
          E1      E0       E1       E0        E1       E0
D+        a       b        a_C1     b_C1      a_C0     b_C0
D−        c       d        c_C1     d_C1      c_C0     d_C0
Corrected data
          C1                               C0
          E1             E0                E1                 E0
D+        A_C1           B_C1              A_C0 = a − A_C1    B_C0 = b − B_C1
D−        C_C1           D_C1              C_C0 = c − C_C1    D_C0 = d − D_C1
Total     A_C1 + C_C1    B_C1 + D_C1       A_C0 + C_C0        B_C0 + D_C0
where
A_C1 = [a_C1 − a(1 − SP_C1)]/[SE_C1 − (1 − SP_C1)]
B_C1 = [b_C1 − b(1 − SP_C1)]/[SE_C1 − (1 − SP_C1)]
C_C1 = [c_C1 − c(1 − SP_C0)]/[SE_C0 − (1 − SP_C0)]
D_C1 = [d_C1 − d(1 − SP_C0)]/[SE_C0 − (1 − SP_C0)]
The equations for correcting misclassified data are derived in the same manner
as in the sections on exposure and disease misclassification (Table 6.4). Note that
the sensitivities and specificities of the confounder classification are given within
levels of the outcome.
To illustrate a correction for covariate misclassification, we will use an example
of a study that examined whether there was an association between prenatal folic
acid vitamins and having twins. In unadjusted analyses, those who took vitamins
were 2.4-fold more likely to have twins than those who did not. Use of in vitro
fertilization (IVF) procedures increases the risk of twins and it was suspected that
women undergoing IVF would be more likely to take folic acid supplements. Thus,
IVF could potentially be a strong confounder of the relation between vitamins and twins. This study did not have data on use of IVF but used a period of involuntary childlessness as a proxy. Adjustment for this variable yielded an adjusted risk ratio of 2.3, suggesting that IVF treatment was not a strong confounder of the relation between
folic acid supplements and having twins. However, further data made available after
the completion of this study indicated that involuntary childlessness was not a good
proxy for IVF use. The use of this proxy had a sensitivity of 60% and a specificity
of 95% (Berry et al., 2005), and these bias parameters were assumed to be nondif-
ferential. The observed data are presented in Table 6.22, and the data corrected for the misclassification of IVF treatment by use of its proxy variable, compared with actual IVF treatment, are presented in Table 6.23.
Table 6.22 Observed data from a proxy for use of IVF treatment on a study estimating the effect
of folic acid supplements on having twins (adapted from Berry et al., 2005)
Total IVF+ IVF−
Folic acid Folic acid Folic acid
Yes No Yes No Yes No
Twins Yes 1,319 5,641 565 781 754 4,860
No 38,054 405,545 3,583 21,958 34,471 383,588
Total 39,373 411,186 4,148 22,739 35,225 388,448
Risk ratio 2.4 4.0 1.7
Standardized morbidity ratio 2.3
Mantel-Haenszel risk ratio 2.2
After correcting for the misclassification of IVF, the risk ratios in both strata of
IVF are null. Assuming the values assigned to the bias parameters are accurate,
misclassification of the IVF confounder created the appearance of an effect between
folic acid and twins when there was truly no association. Figure 6.4 shows the
screenshot from the Excel spreadsheet with this example.
Table 6.23 Corrected data from a proxy for use of IVF treatment on
a study estimating the effect of folic acid supplements on having twins
(adapted from Berry et al., 2005)
IVF+ IVF−
Folic acid Folic acid
Yes No Yes No
Twins Yes 907.4 907.2 411.6 4,733.8
No 3,055.1 3,055.8 34,998.9 402,490.2
Total 3,962.5 3,963.0 35,410.5 407,224.0
Risk ratio 1.0 1.0
Fig. 6.4 Screenshot of Excel spreadsheet for conducting simple bias analysis for misclassified
covariates
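As a check on Table 6.23, this Python sketch (ours, not the spreadsheet's code) applies the Table 6.21 equations to the observed data in Table 6.22:

    def correct_stratum(x_c1, x_total, se, sp):
        # Corrected count in stratum C1 for one exposure-disease cell;
        # the C0 count is the total minus the C1 count (Table 6.21)
        xc1 = (x_c1 - x_total * (1 - sp)) / (se - (1 - sp))
        return xc1, x_total - xc1

    se, sp = 0.60, 0.95  # proxy for IVF use (Berry et al., 2005)
    cells = {  # (observed count in the IVF+ stratum, total over strata)
        "twins, folic acid": (565, 1319),
        "twins, no folic acid": (781, 5641),
        "no twins, folic acid": (3583, 38054),
        "no twins, no folic acid": (21958, 405545),
    }
    for name, (c1, total) in cells.items():
        ivf_pos, ivf_neg = correct_stratum(c1, total, se, sp)
        print(name, round(ivf_pos, 1), round(ivf_neg, 1))
    # IVF+ risk ratio: (907.4/3962.5)/(907.2/3963.0) = 1.0; IVF- likewise 1.0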
The relative risk due to confounding is the ratio of the crude estimate of association to the adjusted estimate:

RR_conf = RR_crude / SMR   (6.17)
In the example above, the crude risk ratio relating folic acid intake to giving birth
to twins was 2.4 and the SMR adjusted for the proxy for IVF was 2.3. This result
suggests that IVF was not an important confounder in the association between use
of folic acid during pregnancy and having twins, because the adjusted RR differed
from the crude RR by less than 10% (relative risk due to confounding equal to 1.08,
which is less than 1.1). In contrast, after correcting for the misclassification of IVF, the SMR equals 1.0, so the relative risk due to confounding equals 2.4, a change of 240% after adjustment. This change in the relative risk due to confounding with
the correction for misclassification indicates that the entire observed association
between folic acid use and having twins may have been due to confounding by IVF
use. In this situation, misclassification of the confounder prevented the adjustment
from removing the confounding and residual bias remained. As a general rule,
nondifferential misclassification of a confounder yields residual confounding when
adjusting for the confounder, and biases the relative risk due to confounding, not
the adjusted estimate itself, toward the null (Greenland, 1980). Therefore, an estimate
of association adjusted for a misclassified covariate will lie somewhere between the
crude result (i.e., the estimate of association not adjusted for the confounder) and
the true adjusted result (i.e., the estimate of association that one would get when
adjusting for the correctly classified confounder). The estimate adjusted for the
misclassified confounder is therefore biased away from the truth in the direction of
the confounding. In our example, because the confounding gave the appearance of
an effect when in fact the association was apparently null, the relative risk due to
confounding was biased toward the null from 2.4 to 1.08.
Unlike nondifferential misclassification of a covariate, differential misclassifi-
cation leads to an unpredictable bias of the relative risk due to confounding, which
can still be corrected using the methods described above.
To this point, we have considered the impact of a covariate that is acting as a
confounder. If the covariate is an effect measure modifier, then nondifferential
misclassification of either the exposure or the modifier in an analysis of interaction
can create the appearance of interaction when none truly exists or can mask interaction
when it truly exists (Greenland, 1980). In the example of folic acid and twins, if we
hypothesized that IVF treatment could modify the effect of folic acid intake on risk
of giving birth to twins, then relying on the data from the misclassified variable for
IVF would have suggested that IVF was a modifier, while in truth the association
between folic acid intake and twins does not depend on IVF status.
Dependent Misclassification
As briefly noted earlier, dependent errors occur when study participants misclas-
sified with respect to one axis of the association or analysis (e.g., exposure) are more
likely than those not misclassified to also be misclassified with respect to a second
axis of the association (e.g., disease). Small amounts of dependent misclassification
can have a large impact on estimates of effect, and the direction of the bias is not
predictable (Kristensen, 1992). Unlike the misclassification errors previously
discussed in this chapter, errors from dependent misclassification are more difficult
to assess in the analysis phase. However, since a major source of dependent errors
arises when the same method is used to ascertain information on more than one
variable, such as exposure and disease status, the best option is to design studies
that minimize the risk of this error. For example, an interview used to collect data
on exposure and disease would be susceptible to dependent errors, but if possible,
the disease information could be collected by medical record review to remove
the risk of dependent errors.
In the situations where there is no option of preventing this error in the design,
Kristensen (1992) describes a set of equations to calculate corrected risk ratios. One
barrier to the functionality of these equations is that they require estimates for the
probability of misclassification in many different directions. For example, what
proportions of respondents who are truly unexposed and undiseased are classified as
(1) unexposed and diseased, (2) exposed and undiseased, and (3) exposed and diseased?
There are seldom validation data to provide estimates of these probabilities.
In the absence of estimates of the probability of misclassification, one option is to
calculate the extent of misclassification that would be necessary to produce a specific
odds ratio, for example, a null result. Then the researcher can consider whether it is
plausible that the calculated misclassification occurred in the study. As an example,
in a study of self-reported neighborhood problems and functional decline (FD), indi-
viduals with one or more neighborhood problems were 2.5 times more likely to report
functional decline over a year of follow-up (Balfour and Kaplan, 2002). Both the
neighborhood characteristics and functional decline were assessed by an interview
with the participants, so it is possible that the personality characteristics that would
lead an individual to overstate the problems in their neighborhood may also lead them
to overstate their functional decline. We used the equations described by Kristensen
(1992) to find the minimum change in cell count necessary to remove the association
between neighborhood characteristics and functional decline (Lash and Fink, 2003a).
We found that if there were truly no association between neighborhood problems and functional decline, dependent errors in only 1.6% of the study population (participants misclassified as having both neighborhood problems and functional decline) would be enough to create the observed effect. Table 6.24 displays the association extrapolated from information in the original paper and the hypothetical 2 × 2 table if 1.6% of the population had dependent errors.
Identifying the structure of dependent error that would produce a null result is
complex because the number of bias parameters increases to as many as 12. As a
result, we recommend using iterative methods in which small amounts of dependent
errors are simulated to see if even these small errors can make a variable with no
association with the outcome appear to be a strong risk factor. Figure 6.5 shows a
screenshot from an Excel spreadsheet for the preceding example.
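An iterative simulation of this kind is easy to sketch; in the Python fragment below the baseline 2 × 2 table is invented (chosen to be null), and a fraction of the population is shifted into the exposed-and-diseased cell to mimic dependent errors:

    def odds_ratio(a, b, c, d):
        return (a * d) / (b * c)

    # Hypothetical null table: 10% exposed, 10% diseased, OR = 1
    n = 10000.0
    a, b, c, d = 100.0, 900.0, 900.0, 8100.0

    for pct in (0.005, 0.01, 0.016, 0.02):
        shift = pct * n  # participants with dependent errors: truly (E-, D-),
        # but misclassified as both exposed and diseased
        print(pct, round(odds_ratio(a + shift, b, c, d - shift), 2))
    # With these made-up counts, 1.6% dependent errors already push the OR
    # from 1 to about 2.5, illustrating how little dependent error is needed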
This chapter, so far, has described the methods and impact of misclassification on estimates of association. It would be invalid to use corrected cell frequencies to calculate a conventional 95% confidence interval; this CI would be too narrow, as it would reflect only the random error from the study and not the additional error introduced by the bias parameters, the reclassifications, and potential correlation between the bias parameters (Chu et al., 2006). This second component of random
error is what precludes applying standard equations for calculating standard errors
to the reclassified cell frequencies as if they were observed. Appropriate equations
for the variance of the corrected estimate of effect differ depending on whether
sensitivity and specificity were used to correct the estimates of effect or predictive
values were used. In addition, whether the bias parameters were derived from an
internal substudy or externally will affect the computation.
If the bias analysis was conducted using estimates of sensitivity and specificity,
then equations described by Greenland (1988) for adjusting the variance should be
used. If the exposure misclassification was differential and sensitivity and specificity
were estimated using external validation data, the variance for the corrected odds
ratio can be calculated as follows:
var(ln OR_corrected) = Σ_i [P(e1)P(e0)/D_i + P(E1)²·var(SE_Di) + P(E0)²·var(SP_Di)] / [(SE_Di + SP_Di − 1)²·(P(E1)·P(E0))²]   (6.18)

where the summation is over the outcome strata i (cases and controls), and the variance of sensitivity and specificity is p(1 − p)/n, with p being the sensitivity or specificity and n being the sample size used to estimate the sensitivity or specificity. P(E1) and P(E0) are the proportions of subjects in the stratum that, after reclassification, are exposed and unexposed, respectively, and P(e1) and P(e0) are the proportions of subjects in the stratum that are observed as exposed and unexposed, respectively. D_i is the total number of subjects within the stratum.
For further equations for calculating variance for nondifferential misclassification
and outcome misclassification see Greenland (1988). For a Bayesian and graphical
solution applied to misclassification, see Chu et al. (2006).
When predictive values are used to correct the odds ratios, the equations derived by
Marshall (1990) should be used to calculate the variance. To calculate the variance of
the corrected odds ratio, Eq. (6.19) below is first used to calculate the variance of the probability of exposure among the cases and is then repeated for the variance of the probability of exposure among the controls.
var[P̂(E)] = [PPV − (1 − NPV)]²·P(E1)P(E0)/N + P(E1)²·PPV(1 − PPV)/n(E1) + P(E0)²·NPV(1 − NPV)/n(E0)   (6.19)
where n(E1) is the total number of exposed individuals and n(E0) is the total number
of unexposed individuals.
For other equations depending on the validation sampling methods, refer to the
paper by Marshall (1990).
The results from Eq. (6.19) are used as the numerator in Eq. (6.20) below to
calculate the variance of the log of the corrected odds ratio.
Conclusions
Extensions
The methods presented above assume that the exposure, disease, and covariates
were each dichotomous variables. Here, a method for dealing with polytomous
variables, with misclassification patterns that can be differential and/or dependent,
is presented (Greenland and Lash, 2008). This equation allows the investigator to
have an m × n table; both exposure and disease can have multiple levels, and the data can be stratified by a confounder such that the table has K cells. Then for each cell there is the observed count of individuals in the cell (C*) and the true count of individuals in the cell (C). For each of the K cells, there is a probability (P) that individuals classified into that cell truly belong in the cell whose count is being corrected. Therefore, given the observed data, the corrected count in each cell can be computed as

C = Σ_{k=1}^{K} C*_k·P_k   (6.21)
While the equation is relatively simple, the difficult issue is whether the investigator
has sufficient knowledge and information to estimate the probability of misclassi-
fication for each cell of the table.
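Applied over all cells at once, Eq. (6.21) is just a matrix-vector product; the following Python sketch (ours, with an invented three-cell example and made-up classification probabilities) illustrates the idea:

    import numpy as np

    # Hypothetical example: 3 cells, with P[k, j] the probability that a person
    # classified into cell k truly belongs in cell j (each row sums to 1)
    P = np.array([
        [0.90, 0.08, 0.02],
        [0.05, 0.90, 0.05],
        [0.02, 0.08, 0.90],
    ])
    observed = np.array([500.0, 300.0, 200.0])  # observed counts C* by cell

    # Corrected count in cell j is the sum over cells k of C*_k times the
    # probability of truly belonging in j, as in Eq. (6.21)
    corrected = P.T @ observed
    print(corrected, corrected.sum())  # total count is preserved (1000)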
The examples so far have implied that a study only had one source of bias from
misclassification. For example, the study of smoking and the risk of breast cancer
used the cancer registry to identify cases. The cancer registry requires histological
confirmation of cancer diagnoses, so disease misclassification was thought to be mini-
mal and there was no evidence of substantial confounder misclassification.
Therefore, in this example, exposure misclassification was the source of bias that
was most likely to have affected the results.
When an analysis is susceptible to more than one set of classification errors, the
corrections above can be performed sequentially, so long as the classification errors
are independent. However, if the errors are dependent, then these methods will not
be valid when applied sequentially and the method described by Eq. (6.21) must be
implemented.
Limitations
The methods presented in this chapter will provide results for simple bias analysis to
correct for misclassification errors. Methods using two different sets of bias param-
eters were presented (sensitivity and specificity or PPV and NPV) and each has
strengths and limitations. The method that used PPV and NPV will provide more
precise estimates, but because they are heavily influenced by the prevalence of expo-
sure in the population, applying results from validation studies in populations differ-
ent from the studies of interest is often inappropriate. Measures of sensitivity and
specificity can be applied more readily from one population to another, so long as the
populations and classification methods are similar. One limitation of this method is
that there are often combinations of sensitivity, specificity, and exposure or disease
prevalence that will produce negative cell counts. In these cases, the values assigned to
the bias parameters will need to be reevaluated before the method can be used. For
exposure misclassification, negative cell counts will occur when the number of observed exposed cases (the a cell) is less than the product of the total number of cases and the false-positive proportion (i.e., when the false-positive proportion is greater than the proportion of cases that are exposed). Comparably, for disease misclassification, a negative value will occur when the frequency of observed exposed cases is less than the total number of exposed individuals multiplied by the false-positive proportion; that is, negative cell counts will occur when the false-positive proportion is greater than the proportion of exposed persons that are cases. When these relations do not hold
within strata of a confounder, negative cells will occur for confounder misclassifica-
tion. Similarly, bounding equations for negative c cells can also be determined.
Negative cells can also occur when SE × SP is equal to or less than FN × FP, which happens when the sum of sensitivity and specificity is less than or equal to 1; in that case sensitivity is no greater than the complement of specificity, and the classification system performs no better, and possibly worse, than random classification.
Chapter 7
Multidimensional Bias Analysis
Introduction
The preceding three chapters have described the techniques for conducting simple
bias analysis to assess errors caused by selection bias, residual confounding, or
misclassification. However, simple bias analysis implies that the researcher has one
and only one estimate to assign to each of the values for the error model's bias
parameters. In many situations, that is not the case. There are many bias parameters
for which validation data do not exist, so the values assigned to the bias parameter
are educated guesses. In this situation, the analyst is better served by making more
than one educated guess for each value and then combining values in different sets.
In other situations, multiple different measures of the bias parameter may exist, and
there may be no basis for the analyst to select just one as the best estimate of the
truth from among those available. For example, when both internal and external
validation studies have been conducted, or there were multiple external estimates
each in populations slightly different from the one under study, the analyst has no basis
to select one value for the bias parameter over another. Frequently, internal esti-
mates are more useful than external estimates because they derive from the same
source population as yielded the study's estimate of association. If there is the pos-
sibility of selection bias into the internal validation study, however, then it is pos-
sible that the subjects included in the validation study do not provide a good
estimate of the bias parameter in the remainder of the study population. In this situ-
ation, the analyst may want to use values informed by all of the available validation
studies as independent estimates.
Multidimensional bias analysis is a direct extension of simple bias analysis
whereby the methods for simple bias analysis are repeated with a range of values
for the bias parameter(s). This method provides the researcher with some information
regarding the range of estimates of association that are possible, given different
assumptions regarding the value of the bias parameters. For example, if there are
no data regarding the bias parameters, then multidimensional bias analysis could be
used to determine the minimum amount of bias that would convert a positive asso-
ciation to a null association. The analyst could then assess the plausibility of the
values that must be assigned to the bias parameters to accomplish the conversion.
Furthermore, multidimensional bias analysis can provide some insight into the
impact of changing the value assigned to the bias parameter on the estimate of
association. This analysis may find that the value assigned to the sensitivity of
exposure classification, for example, has less impact on the change in the estimated
association than the value assigned to the specificity of exposure classification.
Similarly, the analysis may find that the strength of the assumed association
between an unmeasured confounder and the outcome is more important than the
assumed prevalence of the confounder in the control population. Such insights both
expand the information obtained from the bias analysis and guide future research
efforts. In the first example, one might want to direct research resources toward a
study that would accurately measure the sensitivity of exposure classification. In
the second example, one might want to direct research resources toward a study that
would accurately measure the strength of association between the unmeasured
confounder and the outcome. In addition, multidimensional bias analysis can provide
boundaries to the range of expected estimates of association, given a known or potential
source of error, a bias model for the error, and the values assigned to the model's
bias parameters.
The methodologies and equations used for multidimensional bias analysis are
the same as described in Chaps. 4 (selection bias), 5 (unmeasured and unknown
confounders), and 6 (misclassification). In a multidimensional bias analysis, the
analytic procedures are simply repeated with multiple combinations of values
assigned to the bias parameters. A multidimensional bias analysis might also vary
the error model, although that is a less common strategy than simply varying the
values assigned to the model's bias parameters. This chapter will provide some
examples that highlight situations in which multiple bias analysis has been imple-
mented, strategies for performing multiple bias analysis, and formats for presenting
the results of multidimensional bias analysis.
Selection Bias
Of the 773 potential cases meeting the inclusion criteria, 361 could not be enrolled because the physician refused to participate. A further 162 of the cases did not have available matched controls, so they were excluded. Therefore, only 250 (32%) of potential cases were included in the study. There are
at least two potential causes of selection bias. First, the exposure prevalence in the
controls is about 1%, and therefore it is possible that even a slightly higher prevalence
among the controls that were not recruited for the 162 cases could impact the
results. Second, of the 250 included cases, 27% had the vaccine during the relevant
time period, whereas of the 162 cases without available controls, 14% received the
vaccine. In addition, the exposure prevalence is unknown for the 361 cases not
recruited into the study. Because of the concern over the potential selection bias,
multidimensional bias analysis methods were used to assess these selection bias issues.
To assess the impact of incomplete control selection, the exposure prevalence in
the controls matched to the 162 cases was varied. Even if the exposure prevalence in
the controls was 5%, rather than the observed 1%, the odds ratio between receipt of
the intranasal influenza vaccine and Bell's palsy would have equaled 10, with a 95%
confidence interval of 6.8, 16. While the strength of the association would have been
weaker, the observed effect cannot be explained completely by selection bias, given
the error model and the assumed values assigned to the bias parameters in Table 7.1.
Table 7.1 Multidimensional bias analysis to examine selection bias from not recruiting controls for 162 cases in a case-control study of the association between receipt of the intranasal influenza vaccine and the risk of Bell's palsy (adapted from Mutsch et al., 2004)
Control exposure prevalence    Vaccinated case/control    Unvaccinated case/control    Odds ratio    95% confidence interval^a
0.01 68/8.0 182/714.0 33.3 15.7, 70.6
0.01 23/5.4 139/480.6 14.8 5.7, 38.5
0.02 91/17.7 321/1190.3 19.0 11.3, 32.1
0.03 91/22.6 321/1185.4 14.9 9.2, 24.0
0.04 91/27.4 321/1180.6 12.2 7.8, 19.0
0.05 91/32.3 321/1175.7 10.3 6.8, 15.7
^a The 95% confidence intervals were reported by Mutsch et al. (2004) and do not properly account for error arising from the error model and values assigned to the bias parameters
Next, the potential impact of selection bias from the nonrecruited cases was
examined. It is unknown whether the exposure prevalence of the 361 cases would
be the 27% observed in the included cases, the 14% observed among the cases
without available controls, or some other value. To assess the impact from exclud-
ing these cases, the odds ratio was calculated with a range of exposure prevalence
from 27% to 1% (Table 7.2). If the exposure prevalence in the excluded cases was
14%, as with the other cases not included in the case-control analysis because they
had no matched controls, then the odds ratio would be 21 (95% CI: 13, 35). If the
exposure prevalence in these cases equaled the 1% prevalence of the observed controls,
then the overall odds ratio would have been 13 with 95% CI 8.0, 22. These calculations
indicate that the observed effect cannot be attributed completely to selection bias,
at least given this error model for the selection bias and the values assigned to the
model's bias parameters.
Table 7.2 Multidimensional bias analysis to examine selection bias from excluding 361 cases in a case-control study of the association between receipt of the intranasal influenza vaccine and the risk of Bell's palsy (adapted from Mutsch et al., 2004)
Case exposure prevalence    Vaccinated case/control    Unvaccinated case/control    Odds ratio    95% confidence interval^a
0.272 189.2/18.8 583.8/1,786.2 30.7 19.0, 49.8
0.26 184.9/18.8 588.1/1,786.2 29.8 18.4, 48.3
0.25 181.3/18.8 591.8/1,786.2 29.1 17.9, 47.1
0.2 163.2/18.8 609.8/1,786.2 25.4 15.6, 41.3
0.15 145.2/18.8 627.9/1,786.2 21.9 13.5, 35.7
0.14 141.5/18.8 631.5/1,786.2 21.3 13.0, 34.7
0.1 127.1/18.8 645.9/1,786.2 18.7 11.4, 30.5
0.05 109.1/18.8 664.0/1,786.2 15.6 9.5, 25.6
0.01 94.6/18.8 678.4/1,786.2 13.2 8.0, 21.9
Assumes constant control exposure prevalence of 0.01
^a The 95% confidence intervals were reported by Mutsch et al. (2004) and do not properly account for error arising from the error model and values assigned to the bias parameters
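Each row of Table 7.2 can be reproduced by simple arithmetic; this Python sketch (ours) adds the 361 excluded cases back at an assumed exposure prevalence and recomputes the crude odds ratio:

    def or_with_excluded_cases(prevalence):
        # Crude OR after adding the 361 excluded cases at the assumed
        # vaccination prevalence; controls held at 18.8/1786.2
        vac_cases = 91 + 361 * prevalence
        unvac_cases = 321 + 361 * (1 - prevalence)
        return (vac_cases * 1786.2) / (unvac_cases * 18.8)

    for p in (0.272, 0.14, 0.01):
        print(p, round(or_with_excluded_cases(p), 1))
    # ~30.8, ~21.3, ~13.3; Table 7.2 reports 30.7, 21.3, and 13.2, the small
    # differences reflecting rounding of the table's inputs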
Unmeasured Confounder
Table 7.3 Results of multidimensional bias analysis to assess the impact of confounding by
indication by an unknown binary confounder (UBC) on a study of 5-fluorouracil (5-FU) and
colorectal cancer mortality (Sundararajan et al., 2002)
Prevalence of UBC in those without 5-FU    Prevalence of UBC in those with 5-FU    Hazard ratio for UBC    Hazard ratio for 5-FU, adjusted for UBC    95% confidence interval^a
0.5 0.1 3.00 1.11 1.00, 1.21
0.6 0.1 2.00 0.96 0.88, 1.06
0.5 0.1 2.00 0.90 0.83, 0.99
0.9 0.5 2.00 0.84 0.77, 0.92
0.9 0.1 1.75 1.03 0.94, 1.13
0.9 0.5 1.75 0.81 0.74, 0.88
0.5 0.1 1.75 0.85 0.77, 0.93
0.9 0.1 1.5 0.92 0.84, 1.00
0.9 0.5 1.5 0.77 0.70, 0.84
0.5 0.1 1.5 0.79 0.72, 0.86
0.9 0.1 1.25 0.79 0.72, 0.87
0.9 0.5 1.25 0.72 0.66, 0.79
0.5 0.1 1.25 0.73 0.66, 0.80
0.9 0.1 1.1 0.72 0.65, 0.78
0.9 0.5 1.1 0.69 0.63, 0.75
0.5 0.1 1.1 0.69 0.63, 0.75
^a The 95% confidence intervals were reported by Sundararajan et al. (2002) and do not properly account for error arising from the error model and values assigned to the bias parameters
Misclassification
None of these studies were conducted in the population under study, nor is infor-
mation obtained from the medical report or via questionnaire a true gold standard.
Therefore, there is no clear answer as to which studys results should be used to
inform the assignment of values to the bias parameters in the bias analysis.
Multidimensional bias analysis provides a solution by allowing the analyst to exam-
ine the effect of different estimates of sensitivity and specificity on the observed
estimate of association. For this example, the validation studies consistently indi-
cated that specificity was very high, as would also be expected because it is unlikely
that a woman who did not smoke during her pregnancy would say that she smoked
during her pregnancy when she reported the information ultimately recorded on the
birth certificate. The focus of the bias analysis therefore rests on the impact on the
estimate of association of the value assigned to the sensitivity of maternal smoking.
As shown in Table 7.4, the sensitivity of maternal smoking status reported on a
childs birth certificate is not very high. It is not surprising that some women who
smoked during pregnancy would report that they did not when providing the infor-
mation ultimately recorded on the birth certificate. Using the equations outlined in
Chap. 6, we adjusted the observed odds ratio for a range of values of 0.5–1 assigned
to the sensitivity of exposure classification, while holding the specificity constant at
1. Figure 7.1 displays the results of this multidimensional bias analysis.
The figure indicates that, despite uncertainty regarding the value for sensitivity
of classification of maternal smoking status, it is unlikely that the nondifferential
misclassification of maternal smoking status biased a truly harmful exposure
toward the null.
Fig. 7.1 Multiple bias analysis resulting from assigning values ranging from 0.5 to 1 to the sensitivity of classification of maternal smoking status, holding specificity equal to 1, using the results of a study of the association between maternal smoking during pregnancy and breast cancer risk (Fink and Lash, 2003). The markers correspond to sensitivity of smoking classification reported in the validation studies shown in Table 7.4 and to sensitivity equal to 100% (y-axis: odds ratio, expected truth; x-axis: sensitivity)
Note that in this example, the results of the multiple bias analysis
were effectively displayed using graphical techniques rather than a table. Figure 7.2
displays a screenshot of the table generated by this same multiple bias analysis
result in an Excel spreadsheet available at the text's web site (see Preface).
The figure expands the values assigned to the bias parameters by allowing differential misclassification. This aspect shows that if differential misclassification were likely (contrary to the study's design) and the sensitivity of smoking classification
was higher for the cases than the controls (one possible manifestation of recall
bias), then it is possible that there was a weak positive association between mater-
nal smoking and breast cancer risk that was masked by the smoking misclassifica-
tion. Given the studys design protection against differential misclassification,
however, this scenario seems unlikely.
Fig. 7.2 Screenshot of Excel spreadsheet to perform multidimensional bias analysis for misclas-
sification of exposure. Similar rationale and methodologies could be applied to research studies
with potential misclassification of outcome or a confounder
Limitations
Chapter 8
Probabilistic Bias Analysis
Introduction
To this point we have considered situations in which the bias parameters for a bias
analysis are known with certainty or modeled as if they are known with certainty
(i.e., simple bias analysis, see Chaps. 4–6). We have also considered models of the
impact of combinations of bias parameters on an observed estimate of association
(i.e., multidimensional bias analysis, see Chap. 7). Simple bias analysis is an
improvement over conventional analyses, which implicitly assume that all the bias
parameters are fixed at values that confer no bias. However, the usefulness of sim-
ple bias analysis is limited by its assumption that the bias parameters are known
without error, a situation that is rarely, if ever, a reality. Multidimensional bias
analysis improves on simple bias analysis by examining the impact of more than
one set of bias parameters, but even this approach only examines the bias conferred
by a limited set of bias parameters. For any analysis, many other possible combina-
tions of bias parameters are plausible, and a multidimensional analysis will not
describe the impact of these possibilities. More important, multidimensional analy-
sis gives no sense of which corrected estimate of association is the most likely
under the assumed bias model, which can make interpretation of the results
challenging.
One solution to these limitations of simple and multidimensional bias analysis
is to use probabilistic bias analysis. With probabilistic bias analysis, rather than
specifying one set or a limited number of sets of bias parameters, the investigator
specifies probability distributions for each of the bias parameters, and then uses
Monte Carlo sampling techniques to generate a frequency distribution of corrected
estimates of effect. By Monte Carlo sampling we mean that we choose bias
parameter values from a probability distribution specified by the analyst, and then
use these chosen values to conduct a single simple bias analysis. We then repeat this
process many times, each time sampling a set of bias parameters and correcting the
estimate of association. The utility of the analysis therefore relies substantially on
the accuracy of the probability distributions assigned to the bias parameters.
As an example, in Chap. 6 we examined a misclassification problem in a
study of the relation between smoking during pregnancy and risk of breast cancer.
The observed data were as shown in Table 8.1. Women who reported smoking at
the time of the birth of their child were 0.95 times as likely to develop breast cancer
as were women who did not report smoking.
Table 8.1 Observed data in a study of the
effect of smoking during pregnancy on breast
cancer risk (Fink and Lash, 2003)
Observed
Smokers Nonsmokers
Cases 215 1,449
Controls 668 4,296
OR 0.95
In this example, we were concerned that smoking during pregnancy, which was
measured by self-report on a birth certificate, was not perfectly measured. Based on
an external validation study that used data from medical records as the gold stand-
ard, measuring smoking by self-report on a birth certificate had a sensitivity of
78%. However, if we use only this estimate of sensitivity to correct for the misclas-
sification using a simple bias analysis, we are expressing a strong confidence in our
knowledge that the actual sensitivity is 78%. It is the same confidence, in fact, that
is inherently implied when the sensitivity is implicitly set to 100% in a conventional
analysis. In reality, that value of 78% is itself measured with error, and this addi-
tional uncertainty should be incorporated into the bias analysis. In fact, two other
similar validation studies reported estimates of sensitivity ranging from 68 to 88%
(see Table 7.4). A more realistic approach might be to specify that sensitivity of
smoking classification could be any value between 68 and 88%, with any value in
the range being equally likely to be the true value for sensitivity (see discussion
below for the rationale and literature supporting this range).
This simple probability distribution for sensitivity of smoking classification
(a uniform distribution with a minimum of 68% and a maximum of 88%) will allow
a more realistic depiction of the potential impact of this information bias in the
study. We can now ask, given the probability distribution assigned to this bias
parameter, what is the central tendency of the corrected estimate of the effect? This
method improves upon the multidimensional approach, which gave many corrections,
but no sense of which was the most likely correction. We could also ask, given the
distributions specified for the bias parameters, what are the limits of an interval
encompassing some proportion of the corrected estimates (e.g., 95% of the distribution)?
Knowing this information would allow summarization of the results of a bias analysis
that incorporated many possibilities for the bias parameters and could give a sense
of the variability of possible corrected estimates, given an assigned plausible probability
distribution for the bias parameters. This summary would allow stronger statements
about the impact that the biases might have had, and allow these stronger statements
to be made more compactly than with the multidimensional approach.
Probabilistic bias analysis extends the methods explained in the earlier chapters
on simple and multidimensional bias analysis by repeatedly sampling from a
distribution of bias parameters. Probabilistic bias analysis can be used for any
method of simple bias analysis that is based on specifying bias parameters. Thus,
misclassification, unmeasured confounding, and selection bias problems can all
be approached with a probabilistic bias analysis extension to the simple bias
analysis methods that have already been discussed.
Probability Distributions
Uniform Distribution
Fig. 8.1 Example of a uniform probability density distribution for sensitivity (solid line) and the
results of drawing randomly from that distribution (bars) with a minimum value of 70% and a
maximum value of 95%. While it is expected that each value in the allowed range will be selected
equally, the observed data do not match our expectations exactly since this process was not
repeated infinitely
Beliefs that some values within the assigned range are more plausible than others are not reflected in the probability density assigned by a uniform distribution. There
are some bounds that make good sense, such as an upper limit on a proportion (e.g.,
a classification parameter) equal to 1 or a lower limit on a proportion equal to 0 or
equal to a value below which negative cell frequencies are returned by the correction
algorithm. These upper and lower bounds can also be incorporated into other probability
density distributions, as we will discuss further below.
Trapezoidal Distribution
Fig. 8.2 Example of a trapezoidal probability distribution for sensitivity (solid line) and the
results of drawing randomly from that distribution (bars) with a minimum value of 70%, a lower
mode of 75%, an upper mode of 85%, and a maximum value of 95%
Not all probability distributions can be randomly sampled in all statistical computing
packages. However, one can sample from any probability distribution [f(x)] by
recognizing that the integral from negative infinity to positive infinity over any
probability distribution or density function [i.e., its cumulative probability distribution
F(x) over all possible values] equals 1:
F(+∞) = ∫_{−∞}^{+∞} f(x) dx = 1   (8.2)
By recognizing that F(x) is uniformly distributed over the interval [0,1], one can
choose a random number u from a standard uniform distribution and interpret it as
a random value selected from F(x). To obtain x [a random value selected from f(x)],
one solves
F(x) = ∫_{−∞}^{x} f(t) dt = u   (8.3)
for x. Numeric integration and interpolation will ordinarily suffice if f(x) is not
integrable or discrete. This procedure is implemented in SAS by calling quantile(function, u, parm-1, ..., parm-k), where function is the density function and parm-1, ..., parm-k are optional shape and location parameters.
Most statistical analysis software packages do not provide the ability to sample
randomly from a trapezoidal distribution, but this sampling can be created using a
standard uniform distribution random number generator. To generate a random draw
from a trapezoidal distribution, first calculate an intermediate value s as

s = [min + mod_low + u(max + mod_up − min − mod_low)] / 2   (8.4)

where min, mod_low, mod_up, and max are the minimum, lower mode, upper mode, and maximum values of the specified trapezoidal distribution, and u is the random draw from a standard (0,1) uniform distribution. Next, calculate the random value trap from the trapezoidal distribution as follows. If the value of s is within the two modes (inclusive), then:

trap = s   (8.5)

If s is greater than the upper mode, then:

trap = max − √[(max − mod_up)(max + mod_up − 2s)]   (8.6)

If s is less than the lower mode, then:

trap = min + √[(mod_low − min)(2s − min − mod_low)]   (8.7)
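These steps translate directly into code; here is a small Python implementation of (8.4)–(8.7) (our sketch of the algorithm just described):

    import math
    import random

    def trapezoidal(minimum, mod_low, mod_up, maximum):
        # Random draw from a trapezoidal distribution via Eqs. (8.4)-(8.7)
        u = random.random()  # standard uniform draw
        s = (minimum + mod_low + u * (maximum + mod_up - minimum - mod_low)) / 2
        if mod_low <= s <= mod_up:   # flat region, Eq. (8.5)
            return s
        if s > mod_up:               # upper tail, Eq. (8.6)
            return maximum - math.sqrt((maximum - mod_up) * (maximum + mod_up - 2 * s))
        # lower tail, Eq. (8.7)
        return minimum + math.sqrt((mod_low - minimum) * (2 * s - minimum - mod_low))

    draws = [trapezoidal(0.70, 0.75, 0.85, 0.95) for _ in range(50000)]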
Triangular Distribution
Fig. 8.3 Example of a triangular probability distribution for sensitivity (solid line) and the results
of drawing randomly from that distribution (bars) with a minimum value of 70%, a mode of 80%
and a maximum value of 95%
Normal Distribution
For some analyses, a normal distribution might be a useful choice for a bias parameter,
particularly because many statistical packages allow random sampling of a standard
normal deviate and because it is a commonly known distribution. A normal distri-
bution requires the user to specify a mean and a standard deviation. Software packages
that allow for the creation of a random normal deviate can be used to create a draw
from a normal distribution as:

draw = mean + (standard deviation × random(0,1))   (8.8)

where random(0,1) denotes a random draw from a standard normal distribution with mean 0 and standard deviation 1. In SAS, for example, quantile('normal', u, mean, SD) returns such a draw; the mean and standard deviation are optional location and scale parameters with default values of 0 and 1, respectively, and may be omitted if these values are desired, in which case the quantile function returns the equivalent of random(0,1) in (8.8). Figure 8.4 depicts 50,000 draws from a normal distribution for sensitivity with a mean of 80% and a standard deviation of 10%.
Unlike the trapezoidal distribution, the normal distribution does not have a range
of values that are considered equally likely, and in some cases the investigator may
feel this model is a better fit to their prior assumptions about the bias parameter.
However, Fig. 8.4 demonstrates one problem with using a normal distribution.
Because the tails of the normal distribution extend past values that might not be
logically possible (i.e., values outside the range of 0–1 required for sensitivity), this
distribution might not be as useful as it first appears.
While a probit or logit transformation will avoid this problem, a straightforward
alternative choice in this situation is a truncated normal distribution in which the
user truncates the probability distribution to values that are possible given the data.
This truncation can be done explicitly by not allowing values for the bias parame-
ters that are impossible, or by removing from the results values that produce impossible data (i.e., negative cells in a contingency table). Figure 8.5
shows the same sampled values as in Fig. 8.4 (normal, mean 80%, standard devia-
tion 10%), but all values outside the range 0100% have been discarded.
Even with the truncation, values are still allowed that may be either implausible
(e.g., values below 50% can still be chosen, which suggest the data classification
process was poor) or impossible given the data (see below). Impossible values
could be removed during the analysis, but many implausible corrections would still
remain. Therefore, one might further truncate the distribution below the limit that
seems plausible to the user.
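Truncation by rejection is straightforward to implement; here is a minimal Python sketch (ours), discarding draws outside user-specified plausibility limits:

    import random

    def truncated_normal(mean, sd, low, high):
        # Draw from a normal distribution, rejecting draws outside [low, high]
        while True:
            x = random.gauss(mean, sd)
            if low <= x <= high:
                return x

    # e.g., sensitivity ~ normal(0.80, 0.10), truncated to a plausible range
    draws = [truncated_normal(0.80, 0.10, 0.50, 1.00) for _ in range(50000)]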
Fig. 8.4 Example of a normal probability distribution for sensitivity (solid line) and the results of draw-
ing randomly from that distribution (bars) with a mean value of 80% and a standard deviation of 10%
Fig. 8.5 Example of a normal probability distribution for sensitivity (solid line) and the results
of drawing randomly from that distribution (bars) with a mean value of 80% and a standard
deviation of 10% truncated at 0 and 100%
Beta Distribution
The beta distribution provides a probability density function that is well-suited for
assignment to proportions, such as sensitivity, specificity, or predictive values,
because it is defined on the interval [0,1]. The beta distribution is parameterized by
two positive shape parameters, often denoted as a and b. The expected value of a
random variable X drawn from a beta distribution and its variance are as follows.
E(X) = a/(a + b)

Var(X) = ab/[(a + b)²(a + b + 1)]   (8.9)
A second advantage of the beta distribution is its flexibility to model a wide range
of probability density shapes. Figure 8.6 illustrates this point. With a and b both set
equal to 1.1, the beta distribution yields a near uniform density, but without the sharp
boundaries of the uniform distribution. With a and b both set equal to 7, the beta
distribution yields a symmetric density similar to the normal distribution. With a and
b set to different values, the beta distribution yields asymmetric densities centered on
different means. In general, if a and b are both less than 1, then the density distribution
Fig. 8.6 Examples of probability density functions generated by the beta distribution by setting different combinations of values for a and b (curves shown for a = 15, b = 2; a = 3, b = 7; a = 7, b = 7; and a = 1.1, b = 1.1)
will be u-shaped, giving more weight to values at the tails than values at the center.
If a and b are both greater than 1, then the density distribution will be unimodal, giving
more weight to values at the center than values at the tails.
The beta distribution addresses some of the shortcomings described for the pre-
ceding distributions. Unlike the uniform, trapezoidal, and triangular distributions,
the beta distribution does not yield sharp boundaries. Unlike the normal distribu-
tion, the beta distribution does not yield values outside of an allowed range (such
as proportions less than zero or greater than one). The major shortcoming of the beta
distribution is that the connection between the desired density shape and the values
assigned to a and b are not directly apparent. For example, the density distribution
in Fig. 8.6 with a = 15 and b = 2 yields a density approximately centered between
0.9 and 0.95 and with almost the entire density between 0.55 and 1.0. This density
distribution may be a reasonable choice for a classification parameter, such as the
sensitivity of disease classification, for example.
In general, for a classification parameter (sensitivity, specificity, positive predic-
tive value, or negative predictive value) one can use the results from a validation
study to assign values to the beta distribution parameters. For example, to param-
eterize the sensitivity of classification, one can set a equal to the number of test
positives plus one, and set b equal to the number of test negatives plus one. This
method adopts the results of the validation study to parameterize directly the beta
distribution, so should be implemented only after consideration of the direct
applicability of the validation study results to the classification problem at hand.
This topic was discussed in Chap. 3.
When one wishes to parameterize the beta distribution less directly, an approximate guide for choosing values for a and b is to specify a range of likely values with minimum = a and maximum = c, and also a mode = b. The mean of the beta distribution then approximately equals (a + 4b + c)/6 and the standard deviation approximately equals (c − a)/6. In the preceding example, if we set a = 0.55, b = 0.925, and c = 1.0, the mean of the beta distribution would be (0.55 + 4 × 0.925 + 1.0)/6 or 0.875 and the standard deviation would be (1.0 − 0.55)/6 or 0.075. Given the estimated mean (x̄) and standard deviation (s), we can solve for a and b with the following equations:

a = x̄[x̄(1 − x̄)/s² − 1]

b = (1 − x̄)[x̄(1 − x̄)/s² − 1]   (8.10)
Using these equations with this example yields a = 16.1 and b = 2.3, which are
close to the actual shape parameters (a = 15, b = 2).
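A short Python sketch (ours) of this approximation, which also draws from the resulting beta distribution:

    import random

    def beta_params_from_range(a_min, mode, c_max):
        # Approximate beta shape parameters from a min/mode/max specification
        # using mean ~ (a + 4b + c)/6 and SD ~ (c - a)/6, then Eq. (8.10)
        mean = (a_min + 4 * mode + c_max) / 6
        sd = (c_max - a_min) / 6
        common = mean * (1 - mean) / sd**2 - 1
        return mean * common, (1 - mean) * common

    alpha, beta = beta_params_from_range(0.55, 0.925, 1.0)
    print(round(alpha, 1), round(beta, 1))   # ~16.1, ~2.3, as in the text
    draw = random.betavariate(alpha, beta)   # one random draw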
A second shortcoming of the beta distribution is that some statistical computing
software packages have no built-in function that allows one to sample from it. For
example, Microsoft Excel has no such function. In SAS, the function quantile('beta', u, a, b, l, r) returns a random draw from a beta distribution with shape parameters a = a and b = b, and with optional lower and upper bounds l and r,
respectively. These last two are location parameters with default values of 0 and 1,
respectively, and may be omitted if these values are desired. Figure 8.7 depicts
50,000 draws from a beta distribution for sensitivity with a = 46 and b = 6. This
distribution would correspond with a perfect validation study in which 45 of 50 true
positives tested positive (a = 45 + 1 = 46 and b = 5 + 1 = 6), so with measured
sensitivity of 90%.
Fig. 8.7 Example of a beta probability distribution with a = 46 and b = 6 (solid line) and the
results of drawing randomly from that distribution (bars)
Fig. 8.8 Five probability density functions centered on a proportion of 0.85 and with the vast
majority of the density lying between 0.85 and 1.0
For example, assume that the study of the association between smoking and
breast cancer occurrence, with observed results shown in Table 8.1, was susceptible
to misclassification of smoking status. In particular, some women who smoked dur-
ing pregnancy may not have reported their smoking on their childs birth certificate,
so exposure classification would have imperfect sensitivity. On the other hand, it is
fair to assume that no woman who truly did not smoke during her pregnancy mis-
reported that she did smoke during pregnancy, so specificity would be perfect.
Fig. 8.9 Five cumulative probability functions centered on a proportion of 0.85 and with the vast
majority of the density lying between 0.7 and 1.0, corresponding to the five probability density
functions depicted in Fig. 8.8
This scenario is consistent with the validation data described in Chap. 6. In this
circumstance, the correction equations described in Chap. 6 simplify to a function
of the observed frequency of exposed cases (or controls) and the sensitivity of
exposure classification, as shown in Table 8.2.
[Figure 8.10 appears here: median and 2.5th/97.5th percentile corrected ln(OR) plotted for each density distribution (uniform, triangular, trapezoidal, normal, beta).]
Fig. 8.10 Probabilistic bias analysis results as a function of choice of density distribution applied to
the sensitivity of exposure classification. The horizontal gray line depicts the conventional odds ratio
calculated from the observed data. For each density distribution, the midpoint depicts the median result
after drawing 100,000 values of sensitivity (SE) from the density distribution (with 120 iterations
excluded), and the error bars depict the 2.5 and 97.5 percentiles of the ranked corrected odds ratios
Analytic Approach
As with any bias analysis, the first step is to identify the likely sources of important
bias affecting the study results. For our example, we have previously identified
misclassification of self-reported smoking status as a likely source of important bias.
To correct for the bias, the investigator next has to identify the bias parameters
necessary to address the bias in a formal bias analysis. This identification would
be accomplished just as in a simple bias analysis, using methods presented in
earlier chapters. In this analysis we will use estimates of sensitivity and specifi-
city of exposure classification as the bias parameters to inform the probabilistic
bias analysis.
In the simple bias analysis (Chap. 6) we specified a single value for each of the bias
parameters. In this probabilistic bias analysis, we have now assigned probability
distributions to both the sensitivity and the specificity of smoking classification. To
begin making corrections to the data set, we take an initial draw from both of the prob-
ability distributions. We then use these values to conduct a single simple bias analysis.
From the previous section on sampling from a uniform distribution (see (8.1))
we can sample from the sensitivity distribution by drawing a random number u
from a standard uniform distribution. The initial draw yielded a value of 0.16098,
which we translate to a sensitivity within the range of 68–88% as:

SE = 0.68 + u (0.88 − 0.68) = 0.68 + 0.16098 × 0.20 = 0.7122
Thus, in this example, the initial draw of the value assigned to classification sensi-
tivity is 71.22%. Repeating the same procedure for specificity, we drew a value of
0.6417 from a standard uniform distribution, which gave an initial draw of the value
assigned to classification specificity of 95.78%.
In Chap. 6 we used a series of calculations to combine the observed data and the
estimates of sensitivity and specificity to correct the odds ratio associating smoking
during pregnancy with breast cancer occurrence for the misclassification of smok-
ing status. We now follow the same approach using our randomly drawn estimates
of sensitivity and specificity. Table 8.5 shows how we calculate the expected true
data given the values assigned to the bias parameters in the first draw.
Substituting the observed data into the equations in Table 8.5, we generate the
observed and corrected data in Table 8.6.
After correcting for the exposure misclassification using the randomly sampled
estimates of sensitivity and specificity, we find a corrected odds ratio of 0.933, not
much different from the observed odds ratio of 0.954. If we examined only this
corrected estimate, we would conclude that there is likely little impact of exposure
misclassification on the results, presuming the values assigned to the bias parameters
are accurate.

Table 8.5 Process for calculating expected truth given the observed data with
exposure misclassification

                   Observed         Expected truth
                   E+      E−       E+                                                 E−
D+ (cases)         a       b        A = [a − D+(1 − SP_D+)] / [SE_D+ − (1 − SP_D+)]    B = D+ − A
D− (controls)      c       d        C = [c − D−(1 − SP_D−)] / [SE_D− − (1 − SP_D−)]    D = D− − C
Total              a + c   b + d    A + C                                              B + D

where D+ = a + b and D− = c + d are the observed case and control totals, and SE and SP
denote the sensitivity and specificity of exposure classification among cases (subscript D+)
and controls (subscript D−).
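To make the arithmetic concrete, here is a sketch of this single correction in SAS, assuming the Table 8.1 frequencies (a = 215 and b = 1,449 exposed and unexposed cases; c = 668 and d = 4,296 exposed and unexposed controls, consistent with the reported crude odds ratio of 0.954):

    data correct_once;
      a = 215; b = 1449;          /* observed exposed and unexposed cases    */
      c = 668; d = 4296;          /* observed exposed and unexposed controls */
      se = 0.7122; sp = 0.9578;   /* first draws from the uniform densities  */
      a_t = (a - (a + b)*(1 - sp)) / (se - (1 - sp));  /* expected true exposed cases     */
      b_t = (a + b) - a_t;                             /* expected true unexposed cases   */
      c_t = (c - (c + d)*(1 - sp)) / (se - (1 - sp));  /* expected true exposed controls  */
      d_t = (c + d) - c_t;                             /* expected true unexposed controls*/
      or_corr = (a_t*d_t) / (b_t*c_t);                 /* approximately 0.933 */
    run;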
The corrected odds ratio of 0.933 is the result of a single simulation to correct for
misclassification of smoking. It represents only one possible estimate corrected for
misclassification given the probability distribution we specified for sensitivity
(uniform from 68 to 88%) and specificity (uniform from 90 to 99%). Drawing
conclusions based on these corrections alone would ignore the probability distribu-
tions and would assume that the bias parameters were known with certainty.
Alternatively, repeating step 4 would almost certainly give us different estimates of
sensitivity and specificity of exposure classification and would therefore give us a
different corrected estimate of association. Ideally, we would like to know the
frequency distribution for the corrected estimates of association yielded by the
probability distributions assigned to the bias parameters and the method used to
correct for the bias. This frequency distribution can be approximated by repeating
steps 4 and 5 over and over, each time saving the corrected estimate of association.
These corrections can then be used to create a distribution of estimates of association
corrected for the bias, which approximates the desired frequency distribution.
We note that it is difficult to implement probabilistic bias analysis without the use
of statistical software that contains both random number generators and the ability
to correct the data many times while each time saving the corrected estimates.
This process can be accomplished by a few lines of code in statistical analysis pro-
grams such as SAS, but is slightly more complicated in programs such as Microsoft
Excel. In the appendix to this chapter we show one method that can be used in
Excel. For any probabilistic bias analysis, the user must decide how many simula-
tions to run, such that the resulting frequency distribution is sufficiently precise to
satisfy the inferential objective. This can be tens or hundreds of thousands of simula-
tions if there are many bias parameters and the probability distributions assigned to
those parameters are wide.
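As a sketch, the few lines of SAS mentioned above might look like the following (the seed and data set names are arbitrary; the cell frequencies are those of Table 8.1):

    data probsim;
      call streaminit(12345);                 /* arbitrary seed */
      a = 215; b = 1449; c = 668; d = 4296;   /* observed data (Table 8.1) */
      do iter = 1 to 50000;
        se = 0.68 + rand('uniform')*(0.88 - 0.68);  /* draw sensitivity */
        sp = 0.90 + rand('uniform')*(0.99 - 0.90);  /* draw specificity */
        a_t = (a - (a + b)*(1 - sp)) / (se - (1 - sp));
        c_t = (c - (c + d)*(1 - sp)) / (se - (1 - sp));
        b_t = (a + b) - a_t;
        d_t = (c + d) - c_t;
        or_corr = (a_t*d_t) / (b_t*c_t);      /* corrected odds ratio */
        output;                               /* save this iteration  */
      end;
    run;

    proc univariate data=probsim noprint;
      var or_corr;
      output out=limits median=p50 pctlpts=2.5 97.5 pctlpre=pct;
    run;

The PROC UNIVARIATE step returns the median and the 2.5th and 97.5th percentiles used for the simulation interval described below.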
We repeated the simulation described above 50,000 times using Excel, each time
choosing new estimates of sensitivity and specificity, correcting for the bias, and
then saving the corrected odds ratios. The distribution of corrected odds ratios
yielded by these simulations ranges from 0.85 to 0.95.
This distribution can be summarized to create a 95% simulation interval. For this
interval we will use the median value of the corrected estimates as our measure of
central tendency. We then take the 2.5th percentile and 97.5th percentile of the
distribution as indicating the limits between which 95% of the corrected odds ratios
lie. We refer to this interval as the systematic error simulation interval. The results
are presented in Table 8.7.
Without accounting for the classification errors, the result showed that women
who smoked during pregnancy had 0.95 times the odds of breast cancer as those
who did not, with a conventional frequentist 95% confidence interval from 0.81 to
1.13. We refer to this as the conventional result because it includes an assessment
of random error only, as is conventional in epidemiologic analyses.
In this case, the probabilistic bias analysis yielded a median of the odds ratios
corrected for the misclassification equal to 0.92, with a 95% simulation interval
ranging from 0.85 to 0.95. Thus, even with the distributions set around the bias
parameters, the median of the corrected estimates was not far from the observed
odds ratio. In addition, 95% of the distribution of corrected odds ratios lay
between 0.85 and 0.95, suggesting little effect of the misclassification, presuming
that the distributions assigned to the bias parameters are accurate.
Table 8.7 Results of a probabilistic bias analysis of the relationship between smoking
and breast cancer correcting for nondifferential misclassification of smoking
Analysis Median 95% Interval (2.5th, 97.5th percentile)
Random error (Conventional result) 0.95 (0.81, 1.13)
Systematic error 0.92 (0.85, 0.95)
Fig. 8.11 Screenshot of Excel spreadsheet to conduct probabilistic bias analysis for exposure
misclassification
We can apply the same approach we used to model misclassification error to con-
duct a probabilistic bias analysis that models error from unmeasured confounding.
To do so, we will continue an example that first appeared in Chap. 5, which looked
at the association between male circumcision and acquisition of HIV. In the
observed data, men who were circumcised had approximately one third the risk of
HIV infection as men who were not circumcised (RR 0.35; 95% CI 0.28, 0.44). The
observed data from this study are given in Table 8.8.
In the study of the association between male circumcision and acquisition of HIV,
we were concerned that the results could be confounded by religious affiliation, a
variable for which data were not collected. We will use probabilistic bias analysis
to investigate the error that the unmeasured confounder might have introduced.
In the simple bias analysis we assumed that the values assigned to the bias param-
eters were known. Now, as with the misclassification problem above, we will
assign probability distributions to each of the three bias parameters (p1, p0, and
RRCD) above.
For this analysis, we assigned a trapezoidal distribution for each of the three bias
parameters. For each bias parameter, we centered the modes approximately on the
value used in the simple sensitivity analysis, chose a range for the mode that
seemed reasonable, and then extended the trapezoidal distribution to lower and
upper bounds such that the width of the trapezoid was approximately twice the
width of the range between modes. The minimum, modes and maximum for these
distributions are depicted in Table 8.9.
Table 8.9 Bias parameter distributions for a probabilistic bias analysis of the relationship between
male circumcision and HIV stratified by an unmeasured confounder (religion)

Bias parameter   Description                                       Min   Mod_low   Mod_up   Max
p1 (%)           Prevalence of being Muslim among circumcised      70    75        85       90
p0 (%)           Prevalence of being Muslim among uncircumcised    3     4         7        10
RRCD             Association between being Muslim and HIV          0.5   0.6       0.7      0.8
                 acquisition
With these trapezoidal distributions assigned to each of the three bias parameters,
we can now sample from each to choose initial values for p1, p0, and RRCD. In
our initial samplings we obtained a value of 0.589 for RRCD, 73.2% for p1, and
7.5% for p0.
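SAS has no built-in trapezoidal generator, but a trapezoidal(min, mod_low, mod_up, max) density can be simulated as a mixture of an ascending triangle, a flat plateau, and a descending triangle, with mixing weights proportional to their areas. This is a minimal sketch of that approach (an implementation choice, equivalent to inverting the trapezoidal cumulative distribution), shown for p1 with the Table 8.9 limits:

    data trap_p1;
      call streaminit(24680);                       /* arbitrary seed */
      lo = 0.70; m1 = 0.75; m2 = 0.85; hi = 0.90;   /* min, mod_low, mod_up, max */
      w1 = (m1 - lo)/2;                             /* area of ascending edge  */
      w2 = (m2 - m1);                               /* area of flat top        */
      w3 = (hi - m2)/2;                             /* area of descending edge */
      do iter = 1 to 50000;
        u = rand('uniform')*(w1 + w2 + w3);
        if u < w1 then p1 = lo + (m1 - lo)*sqrt(rand('uniform'));
        else if u < w1 + w2 then p1 = m1 + (m2 - m1)*rand('uniform');
        else p1 = hi - (hi - m2)*sqrt(rand('uniform'));
        output;
      end;
    run;

The same step, with the appropriate Table 8.9 limits substituted, yields draws for p0 and RRCD.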
The approach used in Chap. 5 to create data stratified by the unmeasured confounder
can now be applied to the observed data using the three bias parameter values sampled
in step 4. In this case, the simulated data would be as depicted in Table 8.10.
This simple bias analysis gives a standardized morbidity ratio (SMR) adjusted
for religion of 0.48, closer to the null than the 0.35 risk ratio without adjusting for
religion.
In the final two steps we repeat steps 4 and 5 above, each time saving the SMR
adjusted for religion. We repeated this process in Excel 50,000 times and summa-
rized the results in Table 8.11.
Table 8.11 Results of a probabilistic bias analysis of the relationship between male circumcision
and HIV adjusting for an unmeasured confounder, religion
Analysis Median 95% Interval (2.5th, 97.5th percentile)
Random error (Conventional result) 0.35 (0.28, 0.50)
Systematic error 0.47 (0.42, 0.55)
As seen in Table 8.11, the simulation interval, which corrects for the unmeasured
confounding conditional on the accuracy of the distributions assigned to the
bias parameters, has a median SMR of 0.47, closer to the null than the conventional
point estimate of 0.35. This median is similar to the result from the simple
bias analysis (Chap. 5). However, in this example, we can now see that the limits
of our simulation interval extend from 0.42 to 0.55. Thus, after correcting for the
confounding conditional on the accuracy of the assigned distributions, the median
estimate of association is shifted towards the null, but a value of 1 is still excluded
from the simulation interval (without simultaneously incorporating random error).
Figure 8.12 shows a screenshot of the Excel spreadsheet we used to conduct
the simulation.
Fig. 8.12 Screenshot of Excel spreadsheet to conduct a probabilistic bias analysis for an unmeas-
ured confounder
By sampling from this distribution, we could then divide the observed crude risk
ratio by the relative risk due to confounding to get a distribution of estimates
adjusted for the unmeasured confounder. Ideally, distributions around the bias
parameters may be informed by the limits to the relative risk due to confounding
described in Chap. 5 (Flanders and Khoury, 1990; Yanagawa, 1984).
This approach has the advantage of being simpler than the one described above,
in that one only needs to assign a distribution to one bias parameter (the confounding
bias itself). This simplification makes it easier to program in statistical analysis
software and simpler to explain in a methods section of a manuscript. However, the
relative risk due to confounding may be more difficult to parameterize correctly,
because it is less intuitive than specifying the individual parameters that make up
the RRC, and it may be difficult to get data from the literature to specify a distribution
for RRC. In addition, users may specify distributions for the RRC that portray less
uncertainty in the corrected estimates of effect (i.e. a narrower simulation interval)
than they would have obtained by specifying the individual parameters that
influence the RRC.
With those caveats, this approach would proceed as above and be summarized
in the same way. Bodnar et al. (2006) used this approach in a report on the association
between periconceptional vitamin use and risk of preeclampsia. Because data were
not collected on fruit and vegetable consumption (a variable considered an important
confounder), the authors assigned a trapezoidal distribution for the relative risk due
to confounding (RRC) by fruit and vegetable intake and simulated the impact of the
unmeasured confounding on the results.
Following the same approach we used above for both misclassification and unmeasured
confounding, we can apply similar techniques to a selection bias problem. In Chap.
4, we described how one can use the selection odds ratio (i.e., the odds ratio for the
probability of being selected into the study given exposure and outcome status) to
correct for selection bias. In the example given, there was concern about selection
bias in a study of the association between mobile phone use and the occurrence of
uveal melanoma. Table 8.12 shows the data from the study and the number of
subjects who filled out a shorter questionnaire indicating only their exposure and
disease status.
Table 8.12 Depiction of participation and mobile phone use in a study of the relation between
mobile phone use and the occurrence of uveal melanoma (Stang et al., 2008)
Participants Nonparticipants/short questionnaire Nonparticipants
Regular use No use Regular use No use Cannot categorize
Cases 136 107 3 7 17
Controls 297 165 72 212 379
These data can be corrected for the selection bias in a simple bias analysis with
an estimate of the selection proportions for each combination of the exposure and
disease. These can be calculated from the data above using the nonparticipants who
filled out the short questionnaire. To correct for the selection bias, we then
calculate:
OR_adj = ÔR × [S_case,0 × S_control,1] / [S_case,1 × S_control,0] = ÔR / OR_sel          (8.13)
where S represents the selection probability within levels of the exposure and the
outcome, OR_adj is the odds ratio adjusted for the selection bias, ÔR is the observed
odds ratio, and OR_sel is the selection odds ratio.
This simple bias analysis can be extended to a probabilistic bias analysis either
by assigning distributions for each of the individual selection proportions, or by
assigning a single probability distribution to the selection odds ratio. In this exam-
ple, we will assign a triangular distribution to the selection odds ratio.
In the example from Chap. 4, we estimated that the selection odds ratio was:

OR_sel = (0.94 × 0.25) / (0.85 × 0.64) = 0.432                    (8.14)
We might assign a distribution to the selection odds ratio with a minimum of 0.35,
a mode of 0.432 (the value we calculated above), and a maximum value of 1.1. By
sampling from this distribution and then dividing the observed odds ratio of 0.71
by the sampled selection odds ratio 50,000 times, we generated a simulation inter-
val corrected for the selection bias, conditional on the accuracy of the distribution
assigned to the bias parameter. The results are in Table 8.13.
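As a sketch in SAS, the triangular draws can be obtained by inverting the triangular cumulative distribution function (seed and names arbitrary):

    data selsim;
      call streaminit(13579);             /* arbitrary seed */
      lo = 0.35; md = 0.432; hi = 1.1;    /* minimum, mode, maximum */
      or_obs = 0.71;                      /* observed odds ratio */
      do iter = 1 to 50000;
        u = rand('uniform');
        if u < (md - lo)/(hi - lo) then
          or_sel = lo + sqrt(u*(hi - lo)*(md - lo));        /* ascending side  */
        else
          or_sel = hi - sqrt((1 - u)*(hi - lo)*(hi - md));  /* descending side */
        or_adj = or_obs / or_sel;         /* equation (8.13) */
        output;
      end;
    run;

Equivalently, rand('triangle', (md − lo)/(hi − lo)) returns a draw on (0, 1) that can be rescaled to the interval (lo, hi).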
Table 8.13 Results of a probabilistic bias analysis of the relation between mobile phone
use and uveal cancer adjusting for selection bias
Analysis Median 95% Interval (2.5th, 97.5th percentile)
Random error (Conventional result) 0.71 (0.51, 0.97)
Systematic error 1.18 (0.71, 1.81)
In the conventional analysis, the point estimate for the odds ratio was 0.71. After
correcting for the selection bias, the median odds ratio was 1.18, which nearly
equals the result of the simple bias analysis. This equivalence makes sense, since
we specified the mode of the triangular distribution to be the same value we cal-
culated in the simple bias analysis. However, this analysis gives more information
than the simple bias analysis, as we can now see that the limits of the simulation
interval extend from 0.71 to 1.81 given the distribution we assigned to the selection
odds ratio (and without incorporating random error). Fig. 8.13 is a screenshot of the
Excel file used to conduct this simulation.
Fig. 8.13 Screenshot of Excel spreadsheet to conduct probabilistic bias analysis for selection bias
Correlated Distributions
In the preceding sections, we have assumed that the values sampled from the dis-
tributions assigned to the bias parameters are independent of one another. For
example, in the unmeasured confounding example, we assumed that when we
chose a value for the prevalence of the unmeasured confounder (religion) among
the exposed (circumcised) it was independent of the value we chose for the preva-
lence of the unmeasured confounder among the unexposed. This assumption
might be realistic in some cases, but in many other cases it might not. For example,
even if the prevalence of Muslims is higher among circumcised than uncircum-
cised, choosing a lower prevalence of Muslims among the circumcised should
suggest that we choose a lower prevalence of Muslims among the uncircumcised,
simply reflecting the fact that the overall prevalence of Muslims is lower condi-
tional on selection of a low prevalence from the distribution assigned to the
circumcised.
An example of a situation in which we did not assume independence of the
distributions assigned to the bias parameters was the nondifferential misclassifi-
cation of exposure example. In that case, we assumed that the sensitivity and
specificity distributions were independent of each other, but by specifying non-
differential misclassification, each value we chose for sensitivity among the
cases was the same value we chose for sensitivity among the controls. In that
example, sensitivity among cases and sensitivity among controls were perfectly
correlated (i.e. correlation of 1).
Had we been concerned about differential misclassification instead, we would
not want to conduct a bias analysis in which sensitivity (and/or specificity) of
exposure classification among the cases was the same as among the controls. Still,
we also might not want to specify that the two distributions were independent (i.e.
correlation of 0) (Fox et al., 2005). It is perhaps unrealistic to assume that, even in
situations in which the misclassification is differential, there is no correlation
between the sensitivity in cases and the sensitivity in controls and between the
specificity in cases and the specificity in controls. Higher sensitivities among cases
are likely associated with higher sensitivities among controls, even if they are not
the same, presuming that the same classification method was used for both cases
and controls (e.g., a telephone interview). This correlation can be included in the
bias analysis by inducing a correlation between the random values (u) selected
from the standard uniform distribution. The following approach can be used to
induce this correlation. First, select three random variables, u1, u2, and u3, from a
standard uniform distribution and calculate the logit of each as:
g_i = logit(u_i) = ln[u_i / (1 − u_i)]                    (8.15)
To create two correlated random variables, calculate two values, c1 and c2, as:
where corr is a number between 0 and 1 that denotes the desired correlation
between the two variables. These two random variables (c1 and c2) will be corre-
lated with strength approximately corr and can be used as the random uniform vari-
ables in the general (see section with equations (8.2) or (8.3)) or specific approaches
to sampling from probability distributions that were described above. We use this
approach whenever correlated draws from the distributions assigned to the bias
parameters are needed.
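As a minimal sketch of one way to implement this idea in SAS (the square-root weighting shown here is our assumption, chosen so that the logit-scale correlation equals corr; it is not necessarily the exact published parameterization):

    data corrdraws;
      call streaminit(11223);      /* arbitrary seed */
      corr = 0.8;                  /* desired approximate correlation */
      do iter = 1 to 50000;
        u1 = rand('uniform'); u2 = rand('uniform'); u3 = rand('uniform');
        g1 = log(u1/(1 - u1));     /* logits, per equation (8.15) */
        g2 = log(u2/(1 - u2));
        g3 = log(u3/(1 - u3));     /* shared component inducing the correlation */
        c1 = 1/(1 + exp(-(sqrt(1 - corr)*g1 + sqrt(corr)*g3)));
        c2 = 1/(1 + exp(-(sqrt(1 - corr)*g2 + sqrt(corr)*g3)));
        output;
      end;
    run;

The values c1 and c2 lie in (0, 1) and are correlated with strength approximately corr; they can replace the independent uniform draws when sampling, for example, the case and control sensitivities.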
As noted in the section on the normal distribution, an analysis may assign values to
bias parameters that are logically impossible (e.g., a proportion greater than 1) or
inconsistent with the observed data. In fact, this circumstance can arise with any
assigned distribution. For example, in the misclassification example given above,
we assigned probability distributions for the sensitivity and specificity of exposure
classification. We specified sensitivity as uniform from 70 to 95% and specificity
as uniform from 90 to 99%. If we had specified that specificity was distributed as
uniform from 85 to 95%, at some point in our simulation we might have drawn a
value of 90% for sensitivity and 87% for specificity. Table 8.14 depicts the simulated
data implied by a correction for misclassification with these values.
Note that, after correcting for the misclassification with these values assigned to
the bias parameters, the expected frequency of breast cancer cases who were smokers
equals −1.7. The combination of values assigned to the bias parameters, even if it did not
seem implausible, is impossible given the data. In this case, either the values assigned
to the bias parameters are incorrect, or random error in the collection of the data
makes the data incompatible with these assigned values. When conducting simple
bias analysis, the investigator would immediately notice the negative cell frequencies
and would adjust the bias parameters accordingly. With probabilistic bias analysis, it
can be more difficult to note the problem and to know what to do about it.
The analyst has several options when impossible values are generated. These
impossible values suggest that the bias parameter distributions and the observed
data are not perfectly compatible. One could simply leave the distributions assigned
to the bias parameters as they are, but remove all simulations that produce impos-
sible values. If this is done, it is preferable to report the number of simulations that
have been removed from the simulation so that consumers of a bias analysis can
inspect for themselves whether or not the probability distributions were compatible
with the data.
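Continuing the earlier SAS sketch, iterations that imply negative corrected cell frequencies can be split off and counted before summarizing:

    data possible impossible;
      set probsim;                 /* simulated corrections from the earlier sketch */
      if a_t < 0 or b_t < 0 or c_t < 0 or d_t < 0
        then output impossible;    /* retain for inspection and reporting */
      else output possible;
    run;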
Fig. 8.14 Output sensitivity and specificity from a probabilistic bias analysis of the relation
between smoking and breast cancer correcting for nondifferential misclassification of smoking
with 5,000 iterations chosen from uniform distributions from 50 to 100% for both Se and Sp
Alternatively, one could implement the probabilistic bias analysis and examine
the distribution of chosen values for the bias parameters after discarding the illogi-
cal values. One could then use the output distributions to adjust the distributions
assigned to the bias parameters. The analyst could then repeat the analysis with
distribution assignments that are more compatible with the data and therefore have
fewer or no impossible values. For example, in the misclassification of smoking
status example that we have been using, we could specify very wide distributions
for sensitivity and specificity initially. Any combination of bias parameters that
produces an impossible value for the odds ratio will be removed from the data set.
Figure 8.14 shows the output sensitivity and specificity distributions when we
specified uniform distributions from 50 to 100% for both sensitivity and specificity.
In examining the output distributions, we see that the data are compatible with
a wide range of values for sensitivity, but values of specificity below 88% produce
negative cell frequencies and are therefore impossible given the data. One could
then use this information to create new distributions for specificity that have 88%
as a minimum value.
The probabilistic bias analysis approaches we have discussed so far have focused
only on depicting systematic error. The simulation intervals we have presented
show the distribution of estimates of effect corrected for the systematic errors the
investigator believes exist in the data. The conventional 95% confidence interval
depicts only the amount of random error, or sampling error, in a result. Investigators
may also like to see an interval that includes the total error in the study (i.e., an
interval that includes both systematic and random error).
There are several ways to combine systematic and random error into a single total error interval.
In this chapter we will describe one simple approach to approximating the total
error. Each simulation that we have described so far represents a single possible
corrected estimate of association given the data and the values sampled from the
assigned bias parameter distributions. Each simulation only accounts for the systematic
error in the data. To simulate the additional random error, we could choose a random
standard normal deviate and multiply it by the standard error from the conventional
estimate of association. This would simulate the random error in the study. Then,
for each simulation, we could combine the systematic error and random error as:

ln(OR_total,i) = ln(OR_corrected,i) − z_i × SE_conventional

where z_i is the standard normal deviate chosen in simulation i and SE_conventional
is the standard error of the conventional log odds ratio.
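Continuing the earlier SAS sketch, the conventional standard error can be recovered from the reported 95% confidence interval (0.81, 1.13) and combined with each corrected estimate:

    data total_error;
      set probsim;                         /* corrected estimates from the earlier sketch */
      se_conv = log(1.13/0.81)/(2*1.96);   /* approximately 0.085 */
      z = rand('normal');                  /* standard normal deviate */
      or_total = exp(log(or_corr) - z*se_conv);
    run;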
Table 8.15 Results of a probabilistic bias analysis of the relationship between smoking and breast
cancer correcting for nondifferential misclassification of the exposure including both systematic
and random error

Analysis                              Median   95% interval (2.5th, 97.5th percentile)   Width of interval*
Random error (conventional result)    0.95     (0.81, 1.13)                              1.40
Systematic error                      0.92     (0.85, 0.95)                              1.12
Total error analysis                  0.92     (0.77, 1.09)                              1.42

* Upper limit of interval divided by its lower limit
The first two rows are exactly as seen in Table 8.7. The last line shows the results
of adding random error to the systematic error interval. After adding the random
error, the median corrected estimate (0.92) is the same as in the systematic error
only analysis, but the simulation interval is wider with limits extending from 0.77
to 1.09. Notice that the width of the total error interval (as measured by dividing
the upper limit by the lower limit) is greater than that of the conventional interval. Thus,
while the simple bias analysis alerted us to the fact that exposure misclassification
was not likely to explain the observed results, the total error analysis shows that
there is less precision in the results than we might originally have understood from
the conventional 95% confidence interval. By combining the two sources of error
into a single interval, one can succinctly summarize the results of the bias analysis
and convey them to an audience.
The probabilistic bias analyses shown in this chapter have all been conducted
using summarized data. A major limitation of using summarized data is that it is
difficult to adjust for other confounders for which adjustment may have been
made in the conventional analysis. However, probabilistic bias analysis can be
conducted using individual level data (record level correction), which retains
information on other covariates and allows for multiple adjustments to be made
in the final analysis. When using record level correction, one can also incorporate
random error into the probabilistic bias analysis using either the methods
described above or bootstrapping.
Conclusions
Probabilistic bias analysis has many advantages over simple bias analysis, despite
the fact that simple bias analyses are more commonly seen in the literature. First,
simple bias analysis assumes that the bias parameters are known with certainty, a
condition which rarely, if ever, occurs. Probabilistic bias analysis addresses this
shortcoming by assigning probability distributions to the bias parameters, which
more accurately reflect the uncertainty in the values assigned to those parameters.
In the example above of smoking misclassification, the sensitivity of using
birth certificate records as a measure of maternal smoking status was assessed
in a validation study. The validation study found that birth certificates had a
sensitivity of 78%, but this estimate of sensitivity was measured with random error
that is not reflected in a simple bias analysis. The resulting simulation intervals
from a probabilistic bias analysis reflect this uncertainty in the bias parameters and
give a more accurate representation of the total uncertainty about a given result.
Another important advantage of the probabilistic bias analysis approach is that
the results of the bias analysis can be easily summarized into a median estimate of
association and a 95% simulation interval. Tables such as those presented above can
be used to easily compare the conventional estimates with the systematic error
correction interval and an interval adjusted for both systematic and random error.
Bias analysis is sometimes criticized as being subjective, because the investigator
chooses what values or distributions to assign to the bias parameters. The bias parameters
chosen should reflect a credible estimate of the actual biases at work in a study, but one
never knows for sure if accurate distributions have been assigned to the bias parameters.
Reporting estimates that reflect only random error, as if there were no systematic error in
the data, can be a more misleading approach. By providing the distributions for the bias
parameters and the rationale for the assignments, reviewers and readers can judge for
themselves whether the distributions were appropriate and alternative distributions can be
assigned to reflect the views of stakeholders other than the investigators.
The fact that simple bias analysis is more common may be due, in part, to the
perception that probabilistic bias analysis is difficult to implement and that no
standard software exists to accomplish it.
Appendix
Random number generators in Excel are recalculated every time a change is made
to any cell in a workbook. Thus any set of corrections created in Excel using random
draws from a probability distribution can be used to create a corrected estimate of
association. A short Visual Basic program can then be written to cut and paste the
single corrected estimate into a new cell. The act of cutting and pasting will regener-
ate all the random numbers, effectively repeating steps 4 and 5 each time a corrected
estimate is saved. The Visual Basic code in Fig. 8.15 will create 1,000 corrected
estimates that are calculated in cell A3 and paste them into cells B1 through B1000.
Sub RunSimulation()
    ' Each paste recalculates the workbook, redrawing from the probability distributions
    cntr = 1
    While cntr <= 1000
        Range("A3").Copy
        Range("B" & cntr).PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, _
            SkipBlanks:=False, Transpose:=False
        cntr = cntr + 1
    Wend
End Sub
Fig. 8.15 Visual Basic code to run a probabilistic bias analysis in Microsoft Excel
Chapter 9
Multiple Bias Modeling
Introduction
Many nonrandomized epidemiologic studies are susceptible to more than one threat
to validity (i.e., multiple biases). Bias analysis applied to these studies requires a
strategy to address each important threat. The methods described in earlier chapters
can be applied serially or in parallel to quantify bias and uncertainty from these
multiple biases.
The most easily accomplished serial multiple bias analysis applies the simple
bias analysis methods of Chaps. 4–6. For each threat to validity, the analyst
conducts one simple sensitivity analysis, without attempting to account for biases
simultaneously. While this method is the most straightforward, it has an important
shortcoming beyond the general disadvantages of simple bias analysis. That short-
coming is the absence of an estimate of the joint or simultaneous effect of the
biases. Each simple sensitivity analysis will provide an estimate of the direction of
the bias (i.e., toward or away from the null) and of the strength of the bias (i.e., how
much does the estimate of association change with adjustment for the bias).
The analyst can only guess that the joint effect will somehow average the effects of
the individual biases, but such an intuitive solution is neither necessarily correct nor
easy to calculate.
For example, imagine a bias analysis that addresses nondifferential independent
misclassification of a dichotomous exposure and an unmeasured confounder that is
more prevalent in the exposed than the unexposed and increases the risk of the disease
outcome. The simple bias analysis to address exposure misclassification will suggest
that the original estimate of association was biased toward the null; the adjustment
for misclassification will yield an estimate of the association farther from the null.
The simple bias analysis to address the unmeasured confounder will suggest that the
original estimate of association was biased away from the null (presuming an associa-
tion in the causal direction); the adjustment for misclassification will yield an estimate
of the association nearer to the null. The analyst may think that the joint effect will
be a simple average of the two corrections, but that solution is unlikely to be correct.
Parallel simple bias analysis may therefore yield an incorrect impression about the
joint effects of more than one threat to validity, so ought to be avoided. At the least,
such parallel bias analyses should be presented with the aforementioned caveat that
the joint effects cannot be reliably estimated by averaging the individual
effects. This warning applies particularly to corrections that involve misclassification,
because such corrections do not yield a simple bias factor that can be included as a summed
error term.
A second approach to multiple bias analysis using simple methods is to apply
corrections within strata, and then pool the results to obtain a summary estimate
corrected for the various threats to validity. For example, to solve the multiple bias
problem introduced in the preceding paragraph, one might first apply the simple
bias analysis methods to address exposure misclassification, which were explained
in Chap. 6. These misclassification-corrected data could then be stratified to correct
for the unmeasured confounder per the methods in Chap. 5. Within each stratum,
the stratum-specific associations calculated after rearrangement to address misclas-
sification can then be pooled by conventional methods (e.g., standardization or
information-weighted averaging such as Mantel-Haenszel methods). As described
earlier, the cell frequencies cannot be used to estimate the pooled association's
variance, so cannot yield a p value or confidence interval.
The order in which these serial corrections ought to be made is an important
consideration in any multiple bias analysis, including the one introduced in the pre-
ceding paragraph. Correction for classification errors does not reduce to a multipli-
cative bias factor, so its place in the order of corrections will affect the ultimate
result. Corrections should be made in the reverse of the order in which they occurred
as the data were generated. For example, confounding is ordinarily perceived as a
population-level phenomenon, whereas misclassification occurs as data are col-
lected or tabulated. Confounding therefore occurs before misclassification as the
data are generated, and corrections should proceed in the reverse order. The simple
bias analysis to address misclassification should precede the simple bias analysis to
address the unmeasured confounder.
Unfortunately, there is no constant rule regarding the order in which threats to
validity arise. Furthermore, the order of correction can be affected by the data
source that informs the values assigned to bias parameters. For example, classification
parameters might be measured in a population-based setting (i.e., negligible selection
bias), but be applied to a data set where selection bias is a concern. In this setting,
the analyst should correct for selection bias before correcting for misclassification,
even if the selection bias preceded the misclassification in the data generation process.
In this case, the data source requires application of the classification parameters to
unselected data, even though the more general rule would suggest application of
the bias analyses in the opposite order.
For each multiple bias analysis, the analyst must determine the correct order of
application of the corrections, taking account of the order in which the threats to
validity arise during the data generation process, the influence of the data source
used to assign values to the bias parameters, and (for probabilistic bias analysis) the
point in the analysis when random error should be included. The last topic will be
discussed further in the section of this chapter on multiple bias analysis using
probabilistic methods.
Multiple Bias Analysis Example
In this chapter, we will apply multiple bias analysis methods to a study of the asso-
ciation between use of antidepressant medications and the occurrence of breast
cancer (Chien et al., 2006). This population-based case–control study used the US
Surveillance Epidemiology and End Results cancer registry to enroll 975 women
with primary breast cancer diagnosed in western Washington State between 1997
and 1999. The Centers for Medicare and Medicaid Services records were used to
enroll 1,007 controls. All women were 65–79 years old at enrollment. Participation
rates were 80.6% among cases and 73.8% among controls.
In-person interviews were used to collect information on medication use in
the 20 years preceding breast cancer diagnosis (cases) or a comparable index
date (controls) and on other known and suspected breast cancer risk factors.
Table 9.1 shows the frequency of ever and never use of antidepressants among
cases and controls, as well as the crude and adjusted odds ratios associating
antidepressant use with breast cancer occurrence. The crude and adjusted odds
ratios are nearly identical, so we will use the crude frequencies to conduct the
bias analyses.
To illustrate the application of multiple bias analysis methods, we will consider
three threats to the validity of the reported association. First, cases were more
likely to participate than controls. If participation is also related to use of
antidepressants, then the results are susceptible to selection bias. Second, while
the investigators adjusted for many potential confounders, they did not adjust
for physical activity. Physical activity may be related to use of antidepressants,
and physical activity has been shown to reduce the risk of breast cancer in some
studies. The study results may therefore be susceptible to bias from this
unmeasured confounder. Last, medication history was self-reported, so the
study results are susceptible to misclassification of the exposure. Bias analysis
methods to address each of these threats to validity have been explained
in the preceding chapters. The focus of this analysis, therefore, is to show
how multiple bias analysis can be used to address all three threats to validity
simultaneously.
Table 9.1 Crude and adjusted odds ratio associating ever use of antidepressants, vs. never
use of antidepressants, with incident breast cancer (Chien et al., 2006)
Ever use of antidepressants Never use of antidepressants
Cases 118 832
Controls 103 884
Crude odds ratio (95% CI) 1.21 (0.92, 1.61)
Adjusted odds ratio (95% CI) 1.2 (0.9, 1.6)
The adjusted odds ratio adjusts for confounding by age and county of residence. Adjustment
for other measured potential confounders changed the odds ratio by less than 10%
The threats to validity in this study were introduced in the following order: confounding,
selection bias, misclassification. Confounding exists in the population as a relation
between physical activity and breast cancer and an association between physical activity
and use of antidepressant medications. While physical activity may not directly
increase or decrease the use of antidepressant medications (although it might), physical
activity and antidepressant use likely share common ancestors in a causal graph.
Selection bias arises because the analysis can be conducted only among participants.
It is clear from the enrollment proportions that cases were more likely to participate
than controls. We do not know from the study data whether use of antidepressant
medications affected participation rates. It is reasonable to assume, however, that post-
menopausal women who use antidepressants, or with a history of antidepressant use,
might participate at different rates than women who never used antidepressants. If so,
then the result is conditioned on a factor (participation) that is a descendant of both the
exposure and the disease outcome, which introduced a selection bias. Finally, among
the participants, self-reported history of antidepressant use may be misclassified.
Because cases were aware of their disease status at the time of their interview, they
may have recalled or reported their history of medication use differently than did the
controls. The investigators validated self-report of antidepressant medication use against
pharmacy records in a subset of participants (Boudreau et al., 2004). This internal vali-
dation study will inform the misclassification bias analysis. Given that the classification
errors were introduced last in the data generation process, misclassification will be the
first threat to validity examined in the bias analysis.
At the time of enrollment into the parent study (Chien et al., 2006), the investigators
also sought consent from participants to compare their interview answers regarding
medication use with pharmacy records (Boudreau et al., 2004). Not every participant
in the parent study agreed to this validation; 1.6% of cases refused to allow the
validation and 7.3% of controls refused to allow the validation. Among those who
agreed to allow the validation, pharmacy records were only available for those
whose medical care was provided through an integrated health care provider in
western Washington and for those who said they always filled their prescriptions at
one of two large retail pharmacies. Finally, for those who satisfied these criteria, not
all had pharmacy records that could be located. After these restrictions, the validation
study was conducted among 403 of the 1,937 parent study participants.
The pharmacy records were available for only the preceding 2 years for all
validation study participants, whereas the exposure classification in the parent study
was based on a self-reported 20-year medication history. Comparing the self-report
of medication use in the preceding 2 years with the pharmacy records yielded the
cross-classification frequencies shown in Table 9.2.
Multiple Bias Analysis, Simple Methods
As reported in the summary of the study, controls and cases enrolled in the study at
different rates. The proportion of controls who agreed to participate equaled 73.8%
and the proportion of cases who agreed to participate equaled 80.6%. It is not unu-
sual for the participation rate in cases to exceed the participation rate in controls,
presumably because cases are motivated by their disease to participate in research
related to it. Controls do not have so salient an altruistic motivation.
Among those who did not participate, we do not know the proportion who used
antidepressant medications in the preceding 20 years. Note that we are interested
now in the gold-standard proportion, not in the proportion that would have self-
reported such a history of antidepressant medication use in the preceding 20 years.
Because we believe that the results have been corrected for classification errors, the
gold-standard exposure classification is the classification concept to apply when
conducting the simple bias analysis to address selection bias.
Not only is the proportion who used antidepressant medications unknown
among nonparticipants, but it is also unknown whether this proportion is different
for controls and cases. We have no internal data from the study to inform this selec-
tion bias analysis, so will have to assign values to the bias parameters from external
data sources. To begin, we will assume that the proportion participating among
controls or cases is a simple weighted average of the participating proportions
among antidepressant users and nonusers. That is:

p_observed = w_AD+ × p_AD+ + (1 − w_AD+) × p_AD−

where w_AD+ is the proportion of controls (or cases) who used antidepressants, and
p_AD+ and p_AD− are the participation proportions among users and nonusers, respectively.
Fig. 9.1 Screenshot of the misclassification spreadsheet applied to the multiple bias analysis
example
with benign breast disease (12%) (Maguire et al., 1978). By assuming that this
prevalence is the same in controls as cases, we effectively introduce a prior on
the association we wish to measure that is centered on the null. Furthermore, we
recognize that not all such women will have used antidepressant medications, and
that not all women taking antidepressants will have been diagnosed with moderate
or severe depression. Nonetheless, this exposure prevalence is consistent with the
20-year antidepressant exposure prevalence (10.4%) observed in the Chien et al.
study. We assign R a value of 0.8 for controls and 0.9 for cases, indicating that
women with a history of antidepressant use are less likely to participate than those
without, but that this difference is likely to be less pronounced among cases than
among controls.
When we use these bias parameters in the simple sensitivity analysis spreadsheet
used to address selection bias (Chap. 4), the odds ratio corrected for selection bias
and misclassification equals 1.49, as shown in Fig. 9.2.
Fig. 9.2 Screenshot of the selection bias spreadsheet applied to the multiple bias analysis example
We will use the interior cell frequencies shown in the table labeled "corrected
for selected proportions" to proceed with the multiple bias analysis.
Chien et al. (2006) considered adjusting the association between antidepressant use
and breast cancer occurrence for confounding by race/ethnicity, income, marital
status, education, time since last routine medical check-up, age at menarche, parity,
age at first birth, type of menopause, age at menopause, duration of contraceptive
use, use of menopausal hormone therapy, first-degree family history of breast cancer,
cigarette smoking status, alcohol consumption, body mass index, and medical history
of depression, hypertension, hypercholesterolemia, arthritis, diabetes mellitus, or
thyroid problems. None of these potential confounders resulted in a change of the
association by more than 10% upon adjustment, so ultimately the investigators
adjusted only for age, county of residence, and reference year. While some of these
potential confounders may not satisfy the prerequisite conditions for a confounder
of the association between antidepressant use and breast cancer occurrence, the fact
that adjustment for such a comprehensive set of breast cancer risk factors (known
and suspected) resulted in negligible change in the odds ratio suggests that the
crude association is little affected by confounding. Nonetheless, for the purpose
of illustration, we will proceed with a bias analysis to account for the potential
confounding by physical activity. It is likely that physical activity would correlate
with some of the confounders for which adjustment was made (e.g., cigarette smoking,
alcohol consumption and body mass index) and for which there was negligible
evidence of important confounding. The simple bias analysis does not account for
the strong prior evidence that confounding by physical activity is likely to be
negligible. Rather, the simple bias analysis treats confounding by physical activity
as independent of the confounding by the measured confounders.
Recall from Chap. 5 that a simple bias analysis to address an unmeasured
confounder required information on the strength of association between the confounder
and the outcome, as well as information on the prevalence of the confounder in the
exposed and unexposed groups. For this simple bias analysis, we assigned a value
of 0.92 to the association between any strenuous physical activity at age 50, vs. no
strenuous physical activity, and incident breast cancer. This relative risk was
reported in the large Women's Health Initiative Cohort study of postmenopausal
women (McTiernan et al., 2003). For the prevalence of physical activity in post-
menopausal antidepressant users and nonusers, we will substitute the prevalence
observed in a community-based sample of perimenopausal women aged 45–54
(Gallicchio et al., 2007). In that cross-sectional survey, 43.6% of women with a
Center for Epidemiologic Studies-Depression Scale (CES-D) score below 16
reported regular moderate or heavy exercise, whereas 29.9% of women with a
CES-D score of 16 or greater reported regular moderate or heavy exercise. We again
recognize that this measure of depression is not the same as the measure of
exposure to antidepressants. Some women with high CES-D scores would not have
been taking antidepressants, and some women with low CES-D scores would have been
taking antidepressants. Nonetheless, because we could not find a report of exercise
habits in women taking antidepressant medication, and because these prevalences
correspond with our expectation that exercise habits would be lower in depressed
women (and hence in women taking antidepressants), they are a sound basis to start
with the simple sensitivity analysis.
When we use these bias parameters in the simple sensitivity analysis spreadsheet
used to address an unmeasured confounder (Chap. 5), the odds ratio corrected for
selection bias and misclassification equals 1.46, as shown in Fig. 9.3. The odds
ratio associating breast cancer occurrence with antidepressant use therefore changes
little (1.49 to 1.46) with adjustment for this unmeasured potential confounder.
Fig. 9.3 Screenshot of the unmeasured confounder spreadsheet applied to the multiple bias
analysis example
Multiple Bias Analysis, Multidimensional Methods
The preceding simple multiple bias analysis used one set of bias parameters to
adjust serially the estimate of association for each threat to validity we identified
(exposure misclassification, selection bias, and an unmeasured confounder). The
multidimensional multiple bias analysis repeats these calculations using different
values for the bias parameters, and different combinations of those assigned
values. We begin by creating multiple bias parameter scenarios for each simple
bias analysis, including for each a scenario that corresponds with no bias.
Misclassification Scenarios
A None, all case and control sensitivity and specificity set to 100%
B As reported by Boudreau et al. (2004) and used in the simple misclassification
bias analysis section; sensitivity in cases = 56%, specificity in cases = 99%,
sensitivity in controls = 58%, specificity in controls = 97%
C Nondifferential; sensitivity = 50%, specificity = 100%
D Differential sensitivity; sensitivity in cases = 60%, sensitivity in controls = 50%;
specificity in cases and controls = 100%
E Differential; sensitivity in cases = 60%, sensitivity in controls = 50%; specificity
in cases = 97%, specificity in controls = 100%
Table 9.3 Values assigned to bias parameters used in the simple selection
bias analysis

                                                        Controls   Cases
p_observed (observed participation proportion)          73.8%      80.6%
w_AD+ (proportion using antidepressants;                12%        12%
  Maguire et al., 1978)
R (ratio participating, AD+ vs. AD−)                    0.8        0.9
p_AD− (participation proportion, AD−)                   75.6%      81.6%
p_AD+ (participation proportion, AD+)                   60.5%      73.4%
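These participation proportions follow from the weighted-average assumption stated above: substituting p_AD+ = R × p_AD− into p_observed = w_AD+ × p_AD+ + (1 − w_AD+) × p_AD− and solving gives

p_AD− = p_observed / (w_AD+ × R + 1 − w_AD+)

For controls, for example, p_AD− = 0.738/(0.12 × 0.8 + 0.88) = 75.6%, and p_AD+ = 0.8 × 75.6% = 60.5%, matching the table.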
Table 9.4 Values assigned to bias parameters used in the second selection
bias scenario

                                                        Controls   Cases
p_observed (observed participation proportion)          73.8%      80.6%
w_AD+ (proportion using antidepressants)                20%        25%
R (ratio participating, AD+ vs. AD−)                    0.7        0.8
p_AD− (participation proportion, AD−)                   78.5%      84.8%
p_AD+ (participation proportion, AD+)                   55.0%      67.9%
Table 9.5 Estimates of the association between ever-use of antidepressants in the preceding 20
years, vs. never use (Chien et al., 2006), adjusted for combinations of simple bias analyses to
address misclassification (scenarios A–E), selection bias (scenarios 1–3), and an unmeasured
confounder (scenarios α–γ)

         α                   β                   γ
     1     2     3       1     2     3       1     2     3
A  1.22  1.08  1.07    1.19  1.05  1.04    1.14  1.01  1.00
B  1.68  1.49  1.47    1.64  1.45  1.44    1.57  1.39  1.38
C  1.25  1.11  1.10    1.22  1.08  1.07    1.17  1.03  1.02
D  0.99  0.88  0.87    0.97  0.85  0.85    0.93  0.82  0.81
E  0.75  0.66  0.66    0.73  0.65  0.64    0.70  0.62  0.61
One can see from the table that, given these scenarios, misclassification of the
exposure (comparisons across rows) and selection bias (comparisons across minor
columns) have a larger influence than the unmeasured confounder (comparisons
across major columns). The approximately nondifferential scenario informed by the
validation substudy suggests that the classification errors created a bias toward the null
(i.e., comparing row B with row A), although the size of the bias is largely influenced
by assigning an imperfect specificity (i.e., comparing the change from row A to row B
with the change from row A to row C). The differential scenarios that correspond with
errors one might expect given the retrospective design [(i.e., recall bias, modeled by
better sensitivity of exposure classification in cases than in controls (scenarios D and
E) and some false-positive reports of antidepressant use in cases but not in controls
(scenario E)] suggest that the classification errors created a bias away from the null.
The selection bias, given these scenarios, was away from the null (comparing
minor columns 2 or 3 with column 1 in any major column), and the two choices of
values assigned to the bias parameters had little effect on the estimated size of the
bias (comparing minor column 2 with minor column 3 in any major column).
The unmeasured confounder, given these scenarios, created little bias of the
estimate of association (comparing any cell in major columns β or γ with its
corresponding cell in major column α).
Finally, one can see that the range of adjusted odds ratios in the interior cells
(0.61–1.68, for a width of 1.68/0.61 = 2.75) is roughly 55% wider than the conventional
95% frequentist interval (0.9, 1.6, for a width of 1.6/0.9 = 1.78), despite the fact that
the range of values in the table incorporates no sampling error.
different and equally reasonable distributions, which would allow an analysis of the
sensitivity of the bias analysis to the assigned distributions. We have provided the
SAS code we used to conduct the probabilistic multiple bias analysis on the text's
web site (see Preface). Users can implement this code with different assigned
density distributions to perform this sensitivity analysis of the bias analysis. We
will point out reasonable alternatives in each of the following sections to provide
a starting point.
Recall that the simple and multidimensional analyses of bias from misclassification
addressed errors in self-reported use of antidepressant medication in the 20 years
preceding the interview. An internal validation study showed 56% sensitivity in
cases, 58% sensitivity in controls, 99% specificity in cases, and 97% specificity in
controls. These internal validation data were collected from a subset of the study
population (see the section on simple misclassification bias analysis). While the
internal validation data suggest that misclassification was approximately nondif-
ferential, the study design should suggest the very real potential for differential
misclassification. Breast cancer cases interviewed shortly after their diagnosis are
likely to recall and report antidepressant use differently than are controls without
the memory stimulation of their diagnosis. One would expect, therefore, that the
sensitivity of classification would be greater in cases than in controls (yielding
fewer false-negatives in cases) and that the specificity of classification would be
lower in cases than in controls (yielding more false-positives among cases).
The tension one faces in assigning bias parameters in this case, therefore, is to
decide how strongly to be influenced by the internal validation study vs. the inclination
to favor recall bias. The internal validation data, if accepted without scrutiny, would
require that we assign exactly the classification parameters observed and reported
by Boudreau et al. (2004). The probabilistic bias analysis would assign the entire
probability density to the reported classification parameters, so would then become
equivalent to the simple misclassification bias analysis shown earlier.
These classification parameters clearly do not pertain exactly as measured, however,
since they are reports of the 2-year recall of antidepressant use compared with
pharmacy records from the same period, whereas the exposure contrast is a measure
of the 20-year history of antidepressant use. In addition, the reported classification
parameters are proportions, so measured with binomial error. At the least, one
would want to incorporate that error into the probabilistic bias analysis. Most important, an understanding of the study design's susceptibility to differential classification errors would support an expectation of higher sensitivity in cases (yet slightly higher sensitivity was actually observed in controls) and of lower specificity in cases (yet lower specificity was actually observed in controls). Recall that the study population was restricted to members of a single health maintenance organization and to study subjects who reported that they filled all prescriptions at one of two large retail pharmacies.
[Fig. 9.4, panel a: probability density (y-axis) against classification proportion (x-axis), with curves for case sensitivity, control sensitivity, case specificity, and control specificity.]
Fig. 9.4 Probability density distributions assigned to the sensitivity and specificity of self-reported
20-year antidepressant use classification in the multiple bias analysis example
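The trapezoidal densities plotted in Fig. 9.4 are those assigned in bias model 2 of Table 9.6. The sketch below shows one generic way to sample them, by inverting the trapezoidal cumulative distribution function; it is illustrative Python, not the authors' SAS implementation.

import math
import numpy as np

def trap_ppf(u, a, b, c, d):
    """Inverse CDF of the trapezoidal(a, b, c, d) density: the density rises
    linearly on [a, b], is flat on [b, c], and falls linearly on [c, d]."""
    h = 2.0 / (d + c - b - a)          # height of the flat top
    p_rise = h * (b - a) / 2.0         # probability mass under the rising edge
    p_flat = h * (c - b)               # probability mass under the flat top
    if u < p_rise:
        return a + math.sqrt(2.0 * u * (b - a) / h)
    if u < p_rise + p_flat:
        return b + (u - p_rise) / h
    return d - math.sqrt(2.0 * (1.0 - u) * (d - c) / h)

rng = np.random.default_rng(2)
densities = {                          # corner values from bias model 2, Table 9.6
    "case sensitivity":    (0.45, 0.50, 0.60, 0.65),
    "control sensitivity": (0.40, 0.48, 0.58, 0.63),
    "case specificity":    (0.95, 0.97, 0.99, 1.00),
    "control specificity": (0.96, 0.98, 0.99, 1.00),
}
for name, corners in densities.items():
    draws = [trap_ppf(rng.random(), *corners) for _ in range(100_000)]
    print(f"{name:20s} median {np.median(draws):.3f}")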
[Fig. 9.5, panel b: case values (y-axis) plotted against control values (x-axis), on 0.0–1.0 scales, with points for the sensitivity and specificity draws and the nondifferential diagonal.]
Fig. 9.5 Scatter plot of the case sensitivity against the control sensitivity, and of the case specifi-
city against the control specificity, as drawn from the assigned probability density distribution in
the first 200 of 100,000 iterations
Figure 9.5 is a scatter plot of the values selected from these distributions in the
first 200 of 100,000 iterations. The plot shows the sensitivity in cases plotted
against the sensitivity in controls, and of the specificity in cases plotted against the
specificity in controls. The ascending diagonal line represents nondifferential sce-
narios, when the case sensitivity equals the control sensitivity or the case specificity
equals the control specificity. Note that most of the points drawn from the sensitivity distributions lie above the ascending diagonal, which is consistent with our parameterization that expects better sensitivity in cases than in controls. Most of the points drawn from the specificity distributions lie below the ascending diagonal, which is consistent with our parameterization that expects better specificity in controls than in cases.
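The correlation between the case and control draws that produces this pattern can be induced in several ways (see the treatment of correlated distributions in Chap. 8). The sketch below uses a Gaussian copula, one standard alternative rather than necessarily the authors' method: correlated standard normal deviates are transformed to correlated uniforms and then passed through each trapezoidal inverse CDF. The correlation of 0.8 is a hypothetical choice.

import math
import numpy as np
from scipy.stats import norm

def trap_ppf(u, a, b, c, d):   # trapezoidal inverse CDF, as in the sketch above
    h = 2.0 / (d + c - b - a)
    if u < h * (b - a) / 2.0:
        return a + math.sqrt(2.0 * u * (b - a) / h)
    if u < h * (b - a) / 2.0 + h * (c - b):
        return b + (u - h * (b - a) / 2.0) / h
    return d - math.sqrt(2.0 * (1.0 - u) * (d - c) / h)

rng = np.random.default_rng(3)
r = 0.8                                    # hypothetical correlation parameter
z = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=200)
u_case, u_ctrl = norm.cdf(z[:, 0]), norm.cdf(z[:, 1])   # correlated uniforms

se_case = [trap_ppf(u, 0.45, 0.50, 0.60, 0.65) for u in u_case]
se_ctrl = [trap_ppf(u, 0.40, 0.48, 0.58, 0.63) for u in u_ctrl]
above = sum(cs > ct for cs, ct in zip(se_case, se_ctrl))
print(f"{above} of 200 sensitivity points lie above the nondifferential diagonal")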
The median odds ratio in the 100,000 iterations equaled 1.15, and the simulation interval (2.5th and 97.5th percentiles) equaled 0.91 and 1.43, respectively (see bias model 2 in Table 9.6). The width of this interval (measured as the ratio of the 97.5th percentile to the 2.5th percentile) equaled 1.57. Recall that the conventional odds ratio equaled 1.22, with a 95% frequentist confidence interval of 0.92, 1.61 (width of 1.75; see bias model 1 in Table 9.6). The uncertainty due to misclassification, as measured by the interval width under the density distributions assigned to the classification parameters, is therefore nearly as large as the uncertainty due to random error. These results are depicted in Fig. 9.6.
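For orientation, computing these summary measures from the vector of simulated odds ratios is a one-liner per quantity. The draws below are lognormal stand-ins, chosen only to mimic the approximate location and spread of bias model 2; real simulation output would replace them.

import numpy as np

rng = np.random.default_rng(4)
ors = rng.lognormal(mean=np.log(1.15), sigma=0.12, size=100_000)  # stand-in draws

median = np.median(ors)
lo, hi = np.percentile(ors, [2.5, 97.5])
print(f"median {median:.2f}; 95% simulation interval {lo:.2f}, {hi:.2f}; "
      f"width (ratio of limits) {hi / lo:.2f}")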
Table 9.6 Description of bias models and their results implemented to illustrate probabilistic multiple bias analysis

                                                     Without random error          With random error
                                                     Median  2.5, 97.5    Ratio    Median  2.5, 97.5    Ratio
Bias model                                                   percentiles  of limits        percentiles  of limits
1. None (conventional)                               1.22    1.22, 1.22   1.00     1.22    0.92, 1.61   1.75
2. Case sensitivity trapezoidal(0.45, 0.50,          1.15    0.91, 1.43   1.57     1.15    0.80, 1.64   2.05
   0.60, 0.65); control sensitivity trapezoidal
   (0.40, 0.48, 0.58, 0.63); case specificity
   trapezoidal(0.95, 0.97, 0.99, 1.00); control
   specificity trapezoidal(0.96, 0.98, 0.99, 1.00)
3. Case AD prevalence normal(0.25, 0.075);           1.16    1.05, 1.31   1.25     1.17    0.87, 1.58   1.82
   control AD prevalence normal(0.20, 0.05);
   case participation ratio, AD users to
   nonusers, trapezoidal(0.75, 0.85, 0.95, 1.00);
   control participation ratio, AD users to
   nonusers, trapezoidal(0.70, 0.80, 0.90, 1.00)
4. OR associating exercise with incident breast      1.14    0.89, 1.27   1.43     1.12    0.80, 1.54   1.93
   cancer trapezoidal(0.2, 0.58, 1.01, 1.24);
   exercise prevalence in AD users normal
   (0.30, 0.05); exercise prevalence in
   nonusers normal(0.44, 0.05)
5. Bias model 2, then bias model 3                   1.10    0.86, 1.40   1.63     1.10    0.75, 1.60   2.13
6. Bias model 2, then bias model 3, then bias        1.01    0.71, 1.35   1.90     1.01    0.65, 1.52   2.34
   model 4
In Fig. 9.6, the conventional odds ratio and its interval are depicted in a histogram (Panel A), as is the result of the probabilistic bias analysis to address exposure misclassification (Panel B). The figure shows that the distributions are of approximately the same width, and that the distribution of odds ratios from the bias analysis is shifted slightly toward the null compared with the conventional result.
Fig. 9.6 Histograms depicting the central tendency and variability of odds ratios generated by 100,000 iterations of the bias models described in Table 9.6. Panel A corresponds with bias model 1, with random error incorporated. Panel B corresponds with bias model 2, without random error incorporated. Panel C corresponds with bias model 3, without random error incorporated. Panel D corresponds with bias model 4, without random error incorporated. Panel E corresponds with bias model 6, without random error incorporated. Panel F corresponds with bias model 6, with random error incorporated
The simple and multidimensional analyses of selection bias addressed errors arising
from the observed difference in participation between cases (80.6%) and controls
(73.8%) and from the expectation that those with a history of antidepressant use
in the preceding 20 years may have agreed to participate at different rates from
never users of antidepressants. This latter expectation cannot be verified from the
data because information on antidepressant use was gathered by interview, which
of course could not be administered to nonparticipants. The tension one faces in
assigning bias parameters in this case, therefore, is to decide what prevalence to
assign to history of antidepressant use in cases and controls, what participation ratio
to assign to those with a history of antidepressant use in the preceding 20 years vs.
those without such a history, and whether to allow this ratio to vary for cases and
controls.
In our probabilistic selection bias analysis, we assigned a normal distribution
with a mean of 0.25 and a standard deviation of 0.075 to the prevalence of antidepressant
use in the preceding 20 years among cases. We assigned a normal distribution with
a mean of 0.2 and a standard deviation of 0.05 to the prevalence of antidepressant
use in the preceding 20 years among controls. Note that these assignments effectively
induced a prior for the association between antidepressant use and breast cancer
occurrence, which is the very association measured by the study.
We also parameterized the participation ratio among cases (AD use history to
never use history) as trapezoidal(0.75,0.85,0.95,1.0). This parameterization reflects
our belief that antidepressant users, or those with a history of antidepressant use,
would be less likely to participate in the case–control study than never users of
antidepressants. We parameterized the participation ratio among controls (AD use
history to never use history) as trapezoidal(0.7,0.8,0.9,1.0), which reflects the same
belief. The trapezoidal distribution for cases is located nearer the null to reflect our
further belief that the tendency of antidepressant users to refuse to participate in a
research study would be attenuated by a diagnosis of breast cancer. All of the selec-
tion bias models were constrained so that the overall participation rates among
cases and controls equaled the observed participation rates.
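The following is a minimal sketch of one iteration of this selection bias model, assuming the standard selection-odds-ratio correction rather than the authors' exact code. The prevalence and participation-ratio draws follow bias model 3 in Table 9.6; the four selection proportions are solved from the observed overall participation (80.6% in cases, 73.8% in controls); and the conventional odds ratio of 1.22 is divided by the resulting selection odds ratio.

import math
import numpy as np

def trap_ppf(u, a, b, c, d):   # trapezoidal inverse CDF (as in the earlier sketch)
    h = 2.0 / (d + c - b - a)
    if u < h * (b - a) / 2.0:
        return a + math.sqrt(2.0 * u * (b - a) / h)
    if u < h * (b - a) / 2.0 + h * (c - b):
        return b + (u - h * (b - a) / 2.0) / h
    return d - math.sqrt(2.0 * (1.0 - u) * (d - c) / h)

rng = np.random.default_rng(5)
OR_OBS, PART_CASE, PART_CTRL = 1.22, 0.806, 0.738       # observed values

p_case = rng.normal(0.25, 0.075)                         # AD prevalence, cases
p_ctrl = rng.normal(0.20, 0.05)                          # AD prevalence, controls
r_case = trap_ppf(rng.random(), 0.75, 0.85, 0.95, 1.0)   # participation ratio, cases
r_ctrl = trap_ppf(rng.random(), 0.70, 0.80, 0.90, 1.0)   # participation ratio, controls

# Solve the never-user participation from the overall participation constraint,
# then scale by the ratio to obtain the user participation:
s0_case = PART_CASE / (p_case * r_case + 1 - p_case); s1_case = r_case * s0_case
s0_ctrl = PART_CTRL / (p_ctrl * r_ctrl + 1 - p_ctrl); s1_ctrl = r_ctrl * s0_ctrl

sel_or = (s1_case * s0_ctrl) / (s0_case * s1_ctrl)       # selection odds ratio
print(f"selection proportions: cases {s1_case:.3f}/{s0_case:.3f}, "
      f"controls {s1_ctrl:.3f}/{s0_ctrl:.3f}; adjusted OR {OR_OBS / sel_or:.2f}")

Note that in this simplified formulation the selection odds ratio reduces algebraically to the ratio of the two participation ratios, so the prevalence draws matter mainly through the constraint that each selection proportion remain a valid probability; a cell-level implementation that reconstructs source-population frequencies uses the prevalences more directly.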
Figure 9.7 shows the cumulative frequency of selection proportions in 100,000
iterations, under the probabilistic selection bias model, by case/control and antide-
pressant use/never antidepressant use in the preceding 20 years. Under the model,
the selection proportion is consistently lowest among controls with antidepressant
use in the preceding 20 years and next lowest among cases with a history of anti-
depressant use in the preceding 20 years. The selection proportions in never users
of antidepressants were approximately the same in cases and controls, which
reflects the countervailing influence of lower participation in controls than cases,
but generally less difference in participation between antidepressant users and
never antidepressant users in the cases than in the controls.
The median odds ratio in the 100,000 iterations equaled 1.16, and the 95% simulation interval equaled 1.05, 1.31 (see bias model 3 in Table 9.6). The width of this interval equaled 1.25.
[Fig. 9.7, panel c: cumulative percent of iterations (y-axis) against selection proportion (x-axis), with curves for cases AD+, cases AD−, controls AD+, and controls AD−.]
Fig. 9.7 Cumulative frequency of selection proportions in 100,000 iterations, under the probabi-
listic selection bias model, by case/control and antidepressant use/never antidepressant use in the
preceding 20 years
This interval is narrower than the interval from the misclassification bias analysis (bias model 2 in Table 9.6) and narrower than the interval reflecting the uncertainty arising from random error (bias model 1 in Table 9.6). The selection bias model results are located nearer the null than the conventional results, and are centered at approximately the same median odds ratio as the probabilistic misclassification bias analysis results. The probabilistic selection bias analysis results are depicted in Panel C of Fig. 9.6. Comparison of this histogram with the histograms in Panels A (conventional analysis) and B (probabilistic misclassification bias analysis) reinforces the impressions described earlier in this paragraph.
[Fig. 9.8, panel d: cumulative percent of iterations (y-axis) against odds ratio (x-axis), with curves for the AD–exercise OR, the BC–exercise OR, and the bound on confounding.]
Fig. 9.8 Cumulative frequency in 100,000 iterations of the odds ratio associating exercise with
antidepressant use, the odds ratio associating exercise with breast cancer occurrence, and the
strength of confounding, under the probabilistic unmeasured confounder bias model
The probabilistic multiple bias analysis for this example follows directly from the
preceding series of probabilistic bias analyses. To complete the multiple bias analysis,
the cell frequencies estimated in the first iteration of the misclassification bias
analysis are carried forward and used as the starting cell frequencies in the first
iteration of the selection bias analysis. We have computed the odds ratio based on
these cell frequencies. This analysis addresses misclassification and selection bias,
the two threats to validity for which there is the most compelling evidence (i.e., the
misclassification of antidepressant medication use reported in the internal valida-
tion study (Boudreau et al., 2004) and the different participation rates for cases and
controls reported in the original study (Chien et al., 2006)). This process was
repeated for 100,000 iterations. The median odds ratio in the 100,000 iterations equaled 1.10, and the 95% simulation interval equaled 0.86, 1.40 (see bias model 5 in Table 9.6). The width of this interval equaled 1.63, which is narrower than the interval reflecting the uncertainty arising from random error (bias model 1 in Table 9.6), but wider than the intervals in either the misclassification bias analysis (bias model 2 in Table 9.6) or the selection bias analysis (bias model 3 in Table 9.6) alone. The combined misclassification and selection bias analysis results are located only slightly nearer the null than the individual probabilistic misclassification bias analysis results and the probabilistic selection bias analysis results.
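A condensed sketch of the chained analysis follows, again in Python rather than the posted SAS code. Each iteration corrects hypothetical observed cell counts for misclassification (bias model 2), carries the corrected counts forward into the selection bias correction (bias model 3), and stores the adjusted odds ratio. For brevity the sketch draws the classification parameters independently (omitting the correlation discussed earlier) and omits the unmeasured confounder step, which would be chained in the same way.

import math
import numpy as np

def trap_ppf(u, a, b, c, d):   # trapezoidal inverse CDF (repeated so this runs standalone)
    h = 2.0 / (d + c - b - a)
    if u < h * (b - a) / 2.0:
        return a + math.sqrt(2.0 * u * (b - a) / h)
    if u < h * (b - a) / 2.0 + h * (c - b):
        return b + (u - h * (b - a) / 2.0) / h
    return d - math.sqrt(2.0 * (1.0 - u) * (d - c) / h)

rng = np.random.default_rng(6)
a_obs, n1, b_obs, n0 = 120, 600, 150, 900    # hypothetical observed 2x2 margins
ors = []
for _ in range(100_000):
    # Step 1 (bias model 2): correct the cells for exposure misclassification
    se1 = trap_ppf(rng.random(), 0.45, 0.50, 0.60, 0.65)   # case sensitivity
    se0 = trap_ppf(rng.random(), 0.40, 0.48, 0.58, 0.63)   # control sensitivity
    sp1 = trap_ppf(rng.random(), 0.95, 0.97, 0.99, 1.00)   # case specificity
    sp0 = trap_ppf(rng.random(), 0.96, 0.98, 0.99, 1.00)   # control specificity
    A = (a_obs - (1 - sp1) * n1) / (se1 - (1 - sp1))
    B = (b_obs - (1 - sp0) * n0) / (se0 - (1 - sp0))
    if not (0 < A < n1 and 0 < B < n0):
        continue                                  # discard negative-cell iterations
    or_step1 = (A / (n1 - A)) / (B / (n0 - B))
    # Step 2 (bias model 3): divide by the selection odds ratio
    r_case = trap_ppf(rng.random(), 0.75, 0.85, 0.95, 1.00)
    r_ctrl = trap_ppf(rng.random(), 0.70, 0.80, 0.90, 1.00)
    ors.append(or_step1 / (r_case / r_ctrl))
ors = np.array(ors)
print(f"median {np.median(ors):.2f}; 95% simulation interval "
      f"{np.percentile(ors, 2.5):.2f}, {np.percentile(ors, 97.5):.2f}")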
The cell frequencies estimated in the first iteration of the combined misclassification
bias analysis and selection bias analysis were then carried forward and used as the
starting cell frequencies in the first iteration of the unmeasured confounder bias
analysis (bias model 6 in Table 9.6). This process was repeated for 100,000 itera-
tions. The median odds ratio in the 100,000 iterations equaled 1.01 and the 95%
simulation interval equaled 0.71, 1.35 (see bias model 6 in Table 9.6 and panel E in
Fig. 9.6). The width of this interval equaled 1.90, which is wider than the uncer-
tainty arising from random error (bias model 1 in Table 9.6) and wider than the
intervals in any of the individual bias analysis models. When we incorporated random
error as well, the 95% simulation interval equaled 0.65, 1.52 (see panel F in Fig. 9.6).
The width of the interval equaled 2.34, which is 34% wider than the conventional
frequentist confidence interval and located much nearer to the null (compare panel
F with panel A of Fig. 9.6). This interval is also 23% wider than the same bias
model without incorporating random error (compare panel E with panel F of Fig. 9.6).
This comparison shows that sampling error and systematic error are both important
components of the uncertainty in this study's result, under the assumptions inherent
in the bias models we have implemented.
One of the most important assumptions is that the original study results (Chien
et al., 2006) were confounded by regular exercise. As we have explained above,
there is good reason to believe that such confounding did not exist or was negligible.
As a first step, a sensitivity analysis of this bias analysis might assign different
distributions to the bias parameters for the unmeasured confounder bias analysis.
These distributions would reduce both the magnitude of confounding by exercise
habits and the uncertainty attributable to it. We have already provided the limiting
case in bias model 5 in Table 9.6, for which this source of bias and uncertainty was
not included. One can think of bias model 5 as a probabilistic multiple bias analysis
in which the odds ratio associating antidepressant use with exercise was assigned a
point probability density distribution at a value equal to 1.0, or the odds ratio asso-
ciating regular exercise with breast cancer was assigned a point probability density
distribution at a value equal to 1.0, or both of these assignments were made.
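In code, this limiting case amounts to replacing a density with a degenerate sampler, for example (an illustrative stub, with a hypothetical name):

def draw_confounder_exposure_or():
    """Point-mass density at the null: every iteration draws exactly 1.0."""
    return 1.0

# With this assignment, the relative risk due to confounding equals 1 in every
# iteration, so the confounder step leaves each odds ratio unchanged and bias
# model 6 collapses to bias model 5.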
Of course, there are multiple alternative parameterizations that could be assigned
and defended for the bias parameters of the unmeasured confounding bias analysis,
and for the two other bias analyses as well. There are also sources of uncertainty
left unaddressed, such as errors in model specification, missing data bias, and data
corruption by miscoding, programming errors, interviewer errors, or data processing
errors. None of these sources of uncertainty have been incorporated, although they
are often presumed to be small compared with the sources of error we have
addressed quantitatively. This assumption, particularly as it pertains to data collection
and management, rests on the foundation that the investigators are well-trained
in research design and methods, that they have designed the data collection proce-
dures to reduce the potential for errors to an acceptably low threshold, and that they
have hired and trained staff to implement these procedures.
Writing the computer code to implement a probabilistic multiple bias analysis
can be a daunting challenge (we have posted the code for this analysis on the text's
web site; see Preface). However, once it is written, the distributions and values
assigned to the bias parameters can be readily changed to reflect alternative reason-
able choices. Both the investigator and other stakeholders can access the code to
implement their particular choices, always bearing in mind that the result depends
on the choices being approximately accurate. When such a sensitivity analysis of
the bias analysis identifies results that depend substantially on the chosen assign-
ments, this dependence will help to identify an area for further research. For exam-
ple, if two analysts assign different sensitivity and specificity distributions to the misclassification of antidepressant use, and the results of the bias analyses based on those assignments differ importantly, then an additional validation study would be a priority for future research. This guide to directed research can be one of
the most valuable consequences of a bias analysis.
Chapter 10
Presentation and Inference
Introduction
Throughout the text we have illustrated methods to present the results of bias analysis
and explained the inferences that might derive from those results. This chapter will
briefly describe the overarching considerations we recommend for presentation and
inference, and the reader should refer to specific examples throughout the text for
the detailed implementations of these principles.
Presentation
Methods
As of this writing, the methods of bias analysis applied to epidemiologic data are
not readily familiar to most stakeholders in epidemiologic research (by stakeholders,
we mean research colleagues, editors, reviewers, readers, policymakers, and the
interested public). In addition, there are no established standards for good practice
to which stakeholders can compare the methods of a particular bias analysis for a
preliminary assessment of its quality. The first principle of presentation, therefore,
must be complete and accurate description of the bias analysis methods.
This description should begin with a clear statement of the bias analysis objectives.
The description should include the nature of the biases to be evaluated (e.g., selection
bias, unmeasured confounding, or classification errors), and this description should
relate to sections of the methods section pertaining to the conventional data collection
or data analysis. In general, the objectives will parse into two related types. The first
type of objective is to assess the direction and magnitude of the error introduced by
the bias, and the second type of objective is to assess whether an observed association
could be explained by the bias.
The description should continue with a mathematical depiction of the bias model
used to achieve the objective. Given the space limitations sometimes imposed by
journals, the full description of the model might be reserved for an appendix or
online supplement. In many cases, this depiction will be most easily accomplished
with mathematical equations, although citations to the methods literature will suffice
for common equations such as those used to implement the simple bias analyses
described in earlier chapters. With this depiction of the model, a stakeholder should
be able to relate some component of the study's conventional results (e.g., the
frequencies of exposed and unexposed cases and controls) to the bias analysis
results (e.g., the frequencies of exposed and unexposed cases and controls after
correction for exposure classification errors) through a set of bias parameters (e.g.,
the sensitivities and specificities of exposure classification in cases and controls).
If the equations that establish this relation are not presented, then the stakeholder
should be able to find these equations in cited literature.
The description should then explain the values assigned to the bias parameters
and the rationale for the assignment. If the assignments were based on a validation
substudy, then the methods by which substudy participants were selected and valida-
tion data collected should be clearly explained, and if the participants were not a
representative sample, then the implications of the selection should be considered.
If the assignments were based on external validity studies, then those studies should
be cited and the generalizability of the results to the present study should be considered.
If the assignments were based on educated guesses, then the reasoning behind the
educated guess should be well-described. When there are multiple sources of data to
inform the assignments, the description should include the rationale for how these
multiple sources were used in the analysis. For example, if the analyst chose a simple
bias analysis, then the description should explain why one set of bias parameters was
preferred over all others. Better still, the bias analysis should use multidimensional
or probabilistic methods to incorporate all of the sources of information, and the
description should explain how these sources of data were used to inform the bias
analysis. For example, all reported values of sensitivity of exposure classification
might be used in a multidimensional bias analysis, or might be used to influence the
choice of probability density and the values assigned to its parameters. Graphical
depictions of the probability density functions portray the analyst's understanding of
the range of values assigned to bias parameters and relative preferences for certain
values within the allowed range. Any data sources that might have informed the
values assigned to the bias parameters, but were rejected by the analyst, should be
listed and the reasons for rejection should be explained.
Finally, if the bias analysis involves multiple biases, the description should be com-
pleted for each bias and the rationale for the order of corrections should be
explained.
Results
Guidelines for presentation of bias analysis results parallel good practices for
presenting the results of conventional epidemiologic data analyses. If the bias analysis
yields corrected frequencies in a 2 2 table, then it can be useful to present these
or nondifferential misclassification). These series can also show the change in location
of the estimated association and the width of its simulation interval as the bias model
incorporates ever more sources of uncertainty. For example, the series may begin
with the conventional result (i.e., depicting only random error), then add uncertainty
from a selection bias, then add uncertainty due to a classification error, and finally add
uncertainty due to an unmeasured confounder. With each addition, one can ascertain
the change in location of the estimated association and the width of its simulation
interval, and these changes will suggest which biases are most important with
regard to both location and width, assuming the bias model and assigned values are
accurate. Note that it is important to maintain constant scales on both the y-axis
(frequency of result) and x-axis (measures of association).
An alternative to histograms of bias analysis results is a simple plot of the point
estimates and simulation interval against the bias analysis description. For example,
in Chap. 8 we used this graphical depiction to compare the results of a series of
probabilistic bias analyses, each with a different probability density function assigned
to the sensitivity of exposure classification. This method is somewhat simpler to
prepare and allows stakeholders to visualize all of the relevant bias analyses in a
single figure, rather than comparing histograms in different panels with one another.
The disadvantage of this method is that the stakeholder loses the ability to see the
flattening out of the histogram heights that occurs when the width of the distribu-
tion grows. This method is therefore best used when the distribution width changes
little, as with the example in Chap. 8.
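As a sketch of this simpler display, the medians and 95% simulation intervals of Table 9.6 (without random error incorporated) can be plotted on a log scale against the bias model labels. The plotting choices below are illustrative, not a prescribed format.

import matplotlib.pyplot as plt

models = ["1. Conventional", "2. Misclassification", "3. Selection bias",
          "4. Unmeasured confounder", "5. Models 2 + 3", "6. Models 2 + 3 + 4"]
median = [1.22, 1.15, 1.16, 1.14, 1.10, 1.01]
lo = [1.22, 0.91, 1.05, 0.89, 0.86, 0.71]
hi = [1.22, 1.43, 1.31, 1.27, 1.40, 1.35]

y = range(len(models))
xerr = [[m - l for m, l in zip(median, lo)],     # distance to the lower limit
        [h - m for h, m in zip(hi, median)]]     # distance to the upper limit
plt.errorbar(median, y, xerr=xerr, fmt="o", capsize=3)
plt.axvline(1.0, linestyle="--")                 # the null value
plt.yticks(y, models)
plt.xscale("log")                                # ratio measures belong on a log scale
plt.xlabel("Odds ratio (median and 95% simulation interval)")
plt.tight_layout()
plt.show()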
Inference
Inferential Framework
We have often been asked whether these methods, particularly the probabilistic bias
analysis methods, rest on a frequentist or Bayesian foundation. The best answer is
that the probabilistic bias analysis methods are semi-Bayesian, in that they place
priors on some (i.e., the bias parameters), but not all, model parameters. Most impor-
tant, they do not place a prior distribution on the parameter of interest (the association
between the exposure contrast and the outcome). Probabilistic bias analysis, then,
assumes the same noninformative prior on the effect of interest as inherently assumed
by frequentist statistical methods. This assumption allows a computationally simpler
solution, traded off against the ordinarily untenable belief that all possible values for
the association between the exposure contrast and the outcome are equally likely.
Furthermore, unlike formal Bayesian methods, the probabilistic bias analysis methods
do not update the priors based on the data (save for the instance when negative cell
frequencies result from misclassification corrections, although even then the method
simply discards these iterations rather than formally updating the prior).
Although there is freely available software to implement fully Bayesian methods
(e.g., WinBUGS) and texts to guide Bayesian bias analysis (Gustafson, 2003),
we have presented only approximate semi-Bayesian methods because they are
Given that the bias analysis methods described herein do not belong fully to either
the frequentist or Bayesian schools of statistical methods and inference, one should
be careful to avoid interpretations or inferences that rest on the assumption that they
are solidly grounded in either school. For example, one should not characterize the
simulation intervals described in Chaps. 8 and 9 as the intervals within which the
true parameter value is likely to lie with some stated probability (e.g., 95%). Such
an interpretation requires a fully Bayesian analysis. Likewise, one should not char-
acterize the simulation interval as the interval which, with unlimited repetitions, will
contain the true parameter value no less frequently than some stated probability (e.g.,
95%). Such an interpretation requires a proper frequentist experiment and analysis.
The simulation interval is, instead, a reflection of the combined data and bias
analysis assumptions. That is, estimation of the simulation interval begins with an
assumption that the data were collected without intent to defraud or methodologic
errors (e.g., coding errors that reverse exposure categories). Second, estimation of the
simulation interval requires a model of the relation between the observed data and
systematic errors to be assessed (e.g., that the observed data relate to classification
errors through the equations provided in Chap. 6). Third, estimation of the simulation
interval requires assumptions about the values to be assigned to the bias parameters
used in the error model. These three together (observed data, error model, and values assigned to the model's bias parameters) generate a simulation interval.
Comparison of the median of the simulation interval with the conventional point
estimate of association provides some idea of the direction and magnitude of the
systematic error acting on the conventional estimate of association, assuming that
the model adequately depicts the influence of the systematic error and the values
assigned to the bias parameters are near to the true values. Comparison of the limits
of the simulation interval (e.g., the 2.5th and 97.5th percentiles) with the conven-
tional frequentist 95% confidence interval gives some insight into the direction and
magnitude of the systematic error acting on the conventional estimate of association.
It also provides some insight into the understatement of total error reflected in the
width of the conventional frequentist confidence interval, also assuming that the
model adequately depicts the influence of the systematic error and that the values
assigned to the bias parameters are near to the true values. Simple bias analysis is
just the point-density distribution simplification of the equivalent probabilistic bias
analysis, and multidimensional bias analysis divides the probability density over
several such point-density simplifications.
Utility
The major utility of bias analysis is as a safeguard against inferential errors resulting
from overconfidence. When epidemiologic results appear precise and valid, stake-
holders may be tempted to action by unwarranted confidence in the accuracy and
stability of the results. Bias analysis provides methods to test the susceptibility of
the results to alternative assumptions about the strength of systematic errors. These
analyses may reveal that the original confidence was overstated and should slow a
rush to action. As noted in Chap. 1, the very act of bias analysis, by virtue of requiring
alternative explanations for observed associations, will often appropriately reduce
confidence in research findings.
References
Alpert M, Raiffa H (1982) A progress report on the training of probability assessors. In Judgment Under Uncertainty: Heuristics and Biases, Kahneman D, Slovic P, Tversky A (eds) pp 294–305. Cambridge University Press: New York
Auvert B, Taljaard D, Lagarde E, Sobngwi-Tambekou J, Sitta R, Puren A (2005) Randomized, controlled intervention trial of male circumcision for reduction of HIV infection risk: the ANRS 1265 Trial. PLoS Med 2: e298
Axelson O, Steenland K (1988) Indirect methods of assessing the effects of tobacco use in occupational studies. Am J Ind Med 13: 105–118
Bailey RC, Moses S, Parker CB, Agot K, Maclean I, Krieger JN, Williams CF, Campbell RT, Ndinya-Achola JO (2007) Male circumcision for HIV prevention in young men in Kisumu, Kenya: a randomised controlled trial. Lancet 369: 643–656
Balfour JL, Kaplan JA (2002) Neighborhood environment and loss of physical function in older adults: evidence from the Alameda County Study. Am J Epidemiol 155: 507–515
Barton S (2000) Which clinical studies provide the best evidence? The best RCT still trumps the best observational study. BMJ 321: 255–256
Berry RJ, Kihlberg R, Devine O (2005) Impact of misclassification of in vitro fertilisation in studies of folic acid and twinning: modelling using population based Swedish vital records. BMJ 330: 815
Birge R (1941) The general physical constants: as of August 1941 with details on the velocity of light only. Rep Prog Phys 8: 90–134
Bodnar LM, Tang G, Ness RB, Harger G, Roberts JM (2006) Periconceptional multivitamin use reduces the risk of preeclampsia. Am J Epidemiol 164: 470–477
Boudreau DM, Daling JR, Malone KE, Gardner JS, Blough DK, Heckbert SR (2004) A validation study of patient interview data and pharmacy records for antihypertensive, statin, and antidepressant medication use among older women. Am J Epidemiol 159: 308–317
Brenner H, Savitz DA (1990) The effects of sensitivity and specificity of case selection on validity, sample size, precision, and power in hospital-based case-control studies. Am J Epidemiol 132: 181–192
Bross ID (1966) Spurious effects from an extraneous variable. J Chronic Dis 19: 637–647
Bross ID (1967) Pertinency of an extraneous variable. J Chronic Dis 20: 487–495
Buescher PA, Taylor KP, Davis MH, Bowling JM (1993) The quality of the new birth certificate data: a validation study in North Carolina. Am J Public Health 83: 1163–1165
Cain LE, Cole SR, Chmiel JS, Margolick JB, Rinaldo CR, Jr., Detels R (2006) Effect of highly active antiretroviral therapy on multiple AIDS-defining illnesses among male HIV seroconverters. Am J Epidemiol 163: 310–315
Cameron DW, Simonsen JN, D'Costa LJ, Ronald AR, Maitha GM, Gakinya MN, Cheang M, Ndinya-Achola JO, Piot P, Brunham RC (1989) Female to male transmission of human immunodeficiency virus type 1: risk factors for seroconversion in men. Lancet 2: 403–407
Casscells W, Schoenberger A, Graboys TB (1978) Interpretation by physicians of clinical laboratory results. N Engl J Med 299: 999–1001
Koehler D, Brenner L, Griffin D (2002) The calibration of expert judgment: Heuristics and biases beyond the laboratory. In Heuristics and Biases: The Psychology of Intuitive Judgment, Gilovich T, Griffin D, Kahneman D (eds) pp 686–715. Cambridge University Press: New York
Kristensen P (1992) Bias from nondifferential but dependent misclassification of exposure and outcome. Epidemiology 3: 210–215
Lang J, Rothman KJ, Cann C (1998) That confounded p-value. Epidemiology 9: 7–8
Lash TL (1998) Re: insulin-like growth factor 1 and prostate cancer risk: a population-based case-control study. J Natl Cancer Inst 90: 1841
Lash TL, Fink AK (2003a) Re: Neighborhood environment and loss of physical function in older adults: evidence from the Alameda County Study. Am J Epidemiol 157: 472–473
Lash T, Fink AK (2003b) Semi-automated sensitivity analysis to assess systematic errors in observational data. Epidemiology 14: 451–458
Lash TL, Fink AK (2004) Null association between pregnancy termination and breast cancer in a registry-based study of parous women. Int J Cancer 110: 443–448
Lash TL, Silliman RA (2000) A sensitivity analysis to separate bias due to confounding from bias due to predicting misclassification by a variable that does both. Epidemiology 11: 544–549
Lash TL, Silliman RA, Guadagnoli E, Mor V (2000) The effect of less than definitive care on breast carcinoma recurrence and mortality. Cancer 89: 1739–1747
Lichtenstein S, Fischoff B, Phillips L (1982) Calibration of probabilities: the state of the art to 1980. In Judgment Under Uncertainty: Heuristics and Biases, Kahneman D, Slovic P, Tversky A (eds) pp 306–334. Cambridge University Press: New York
Little RJA, Rubin DB (2002) Statistical Analysis with Missing Data. Wiley: New York
Maguire GP, Lee EG, Bevington DJ, Kuchemann CS, Crabtree RJ, Cornell CE (1978) Psychiatric problems in the first year after mastectomy. Br Med J 1: 963–965
Maldonado G (2008) Adjusting a relative-risk estimate for study imperfections. J Epidemiol Community Health 62: 655–663
Marshall RJ (1990) Validation study methods for estimating exposure proportions and odds ratios with misclassified data. J Clin Epidemiol 43: 941–947
Marshall SW, Mueller FO, Kirby DP, Yang J (2003) Evaluation of safety balls and faceguards for prevention of injuries in youth baseball. JAMA 289: 568–574
McTiernan A, Kooperberg C, White E, Wilcox S, Coates R, Adams-Campbell LL, Woods N, Ockene J (2003) Recreational physical activity and the risk of breast cancer in postmenopausal women: the Women's Health Initiative Cohort Study. JAMA 290: 1331–1336
Michels KB (2003) Hormone replacement therapy in epidemiologic studies and randomized clinical trials – are we checkmate? Epidemiology 14: 3–5
Miettinen OS (1972) Components of the crude risk ratio. Am J Epidemiol 96: 168–172
Miettinen OS (1985) Theoretical Epidemiology: Principles of Occurrence Research in Medicine. Delmar: Albany, NY
Mokdad AH, Ford ES, Bowman BA, Dietz WH, Vinicor F, Bales VS, Marks JS (2003) Prevalence of obesity, diabetes, and obesity-related health risk factors, 2001. JAMA 289: 76–79
Monninkhof EM, Elias SG, Vlems FA, van der Tweel I, Schuit AJ, Voskuil DW, van Leeuwen FE (2007) Physical activity and breast cancer: a systematic review. Epidemiology 18: 137–157
Mutsch M, Zhou W, Rhodes P, Bopp M, Chen RT, Linder T, Spyr C, Steffen R (2004) Use of the inactivated intranasal influenza vaccine and the risk of Bell's palsy in Switzerland. N Engl J Med 350: 896–903
Nisbett R, Borgida E, Crandall R, Reed H (1982) Popular induction: Information is not necessarily informative. In Judgment Under Uncertainty: Heuristics and Biases, Kahneman D, Slovic P, Tversky A (eds) pp 101–116. Cambridge University Press: New York
Phillips CV (2003) Quantifying and reporting uncertainty from systematic errors. Epidemiology 14: 459–466
Piantadosi S (2003) Larger lessons from the Women's Health Initiative. Epidemiology 14: 6–7
Piattelli-Palmarini M (1994a) How to emerge from the tunnel of pessimism. In Inevitable Illusions, Piattelli-Palmarini M (ed) pp 139–145. Wiley: New York
Piattelli-Palmarini M (1994b) Inevitable Illusions. Wiley: New York
Piper JM, Mitchel EF, Jr., Snowden M, Hall C, Adams M, Taylor P (1993) Validation of 1989 Tennessee birth certificates using maternal and newborn hospital records. Am J Epidemiol 137: 758–768
Poikolainen K, Vahtera J, Virtanen M, Linna A, Kivimaki M (2005) Alcohol and coronary heart disease risk – is there an unknown confounder? Addiction 100: 1150–1157
Poole C (1987a) Beyond the confidence interval. Am J Public Health 77: 195–199
Poole C (1987b) Confidence intervals exclude nothing. Am J Public Health 77: 492–493
Poole C (2001) Low P-values or narrow confidence intervals: which are more durable? Epidemiology 12: 291–294
Robins JM, Rotnitzkey A, Zhao LP (1994) Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc 89: 846–866
Rossouw JE, Anderson GL, Prentice RL, LaCroix AZ, Kooperberg C, Stefanick ML, Jackson RD, Beresford SA, Howard BV, Johnson KC, Kotchen JM, Ockene J (2002) Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women's Health Initiative randomized controlled trial. JAMA 288: 321–333
Rothman KJ (1999) Is flutamide effective in patients with bilateral orchiectomy? Lancet 353: 1184
Rothman KJ, Greenland S, Lash TL (2008a) Design strategies to improve study accuracy. In Modern Epidemiology, Rothman KJ, Greenland S, Lash TL (eds) pp 168–182. Lippincott Williams & Wilkins: Philadelphia
Rothman KJ, Greenland S, Lash TL (2008b) Modern Epidemiology. Lippincott Williams & Wilkins: Philadelphia
Rothman KJ, Greenland S, Lash TL (2008c) Precision and statistics in epidemiologic studies. In Modern Epidemiology, Rothman KJ, Greenland S, Lash TL (eds) pp 148–167. Lippincott Williams & Wilkins: Philadelphia
Rothman KJ, Greenland S, Lash TL (2008d) Types of epidemiologic studies. In Modern Epidemiology, Rothman KJ, Greenland S, Lash TL (eds) pp 87–99. Lippincott Williams & Wilkins: Philadelphia
Rubin DB (1991) Practical implications of modes of statistical inference for causal effects and the critical role of the assignment mechanism. Biometrics 47: 1213–1234
Rull RP, Ritz B, Shaw GM (2006) Validation of self-reported proximity to agricultural crops in a case-control study of neural tube defects. J Expo Sci Environ Epidemiol 16: 147–155
Savitz DA (2003) Interpreting Epidemiologic Evidence: Strategies for Study Design and Analysis. Oxford University Press: Oxford
Schlesselman JJ (1978) Assessing effects of confounding variables. Am J Epidemiol 108: 3–8
Schmidt-Pokrzywniak A, Jockel KH, Bornfeld N, Stang A (2004) Case-control study on uveal melanoma (RIFA): rationale and design. BMC Ophthalmol 4: 11
Schneeweiss S, Glynn RJ, Tsai EH, Avorn J, Solomon DH (2005) Adjusting for unmeasured confounders in pharmacoepidemiologic claims data using external information: the example of COX2 inhibitors and myocardial infarction. Epidemiology 16: 17–24
Shaw GM, Wasserman CR, O'Malley CD, Nelson V, Jackson RJ (1999) Maternal pesticide exposure from multiple sources and selected congenital anomalies. Epidemiology 10: 60–66
Siegfried N, Muller M, Volmink J, Deeks J, Egger M, Low N, Weiss H, Walker S, Williamson P (2003) Male circumcision for prevention of heterosexual acquisition of HIV in men. Cochrane Database Syst Rev CD003362
Siegfried N, Muller M, Deeks J, Volmink J, Egger M, Low N, Walker S, Williamson P (2005) HIV and male circumcision – a systematic review with assessment of the quality of studies. Lancet Infect Dis 5: 165–173
Silliman RA, Guadagnoli E, Weitberg AB, Mor V (1989) Age as a predictor of diagnostic and initial treatment intensity in newly diagnosed breast cancer patients. J Gerontol 44: M46–M50
Sloman S (2002) Two systems of reasoning. In Heuristics and Biases: The Psychology of Intuitive Judgment, Gilovich T, Griffin D, Kahneman D (eds) pp 379–396. Cambridge University Press: New York
Sorensen HT, Lash TL, Rothman KJ (2006) Beyond randomized controlled trials: a critical comparison of trials with nonrandomized studies. Hepatology 44: 1075–1082
Spiegelman D, Rosner B, Logan R (2000) Estimation and inference for logistic regression with covariate misclassification and measurement error, in main study/validation study designs. J Am Stat Assoc 95: 51–61
Stampfer MJ, Colditz GA (1991) Estrogen replacement therapy and coronary heart disease: a quantitative assessment of the epidemiologic evidence. Prev Med 20: 47–63
Stang A, Schmidt-Pokrzywniak A, Lehnert M, Parkin DM, Ferlay J, Bornfeld N, Marr A, Jockel KH (2006) Population-based incidence estimates of uveal melanoma in Germany: supplementing cancer registry data by case-control data. Eur J Cancer Prev 15: 165–170
Stang A, Schmidt-Pokrzywniak A, Lash TL, Lommatzsch P, Taubert G, Bornfeld N, Jockel KH (2009) Mobile phone use and risk of uveal melanoma: results of the RIFA case-control study. J Natl Cancer Inst 101: 120–123
Steenland K, Greenland S (2004) Monte Carlo sensitivity analysis and Bayesian analysis of smoking as an unmeasured confounder in a study of silica and lung cancer. Am J Epidemiol 160: 384–392
Sundararajan V, Mitra N, Jacobson JS, Grann VR, Heitjan DF, Neugut AI (2002) Survival associated with 5-fluorouracil-based adjuvant chemotherapy among elderly patients with node-positive colon cancer. Ann Intern Med 136: 349–357
Tang MT, Weiss NS, Malone KE (2000) Induced abortion in relation to breast cancer among parous women: a birth certificate registry study. Epidemiology 11: 177–180
The Editors (2001) The value of P. Epidemiology 12: 286
Thompson WD (1987a) On the comparison of effects. Am J Public Health 77: 491–492
Thompson WD (1987b) Statistical criteria in the interpretation of epidemiologic data. Am J Public Health 77: 191–194
Tversky A, Kahneman D (1982a) Evidential impact of base-rates. In Judgment Under Uncertainty: Heuristics and Biases, Kahneman D, Slovic P, Tversky A (eds) pp 153–162. Cambridge University Press: New York
Tversky A, Kahneman D (1982b) Judgment under uncertainty: heuristics and biases. In Judgment Under Uncertainty: Heuristics and Biases, Kahneman D, Slovic P, Tversky A (eds) pp 3–22. Cambridge University Press: New York
Tyndall MW, Ronald AR, Agoki E, Malisa W, Bwayo JJ, Ndinya-Achola JO, Moses S, Plummer FA (1996) Increased risk of infection with human immunodeficiency virus type 1 among uncircumcised men presenting with genital ulcer disease in Kenya. Clin Infect Dis 23: 449–453
Wacholder S, McLaughlin JK, Silverman DT, Mandel JS (1992) Selection of controls in case-control studies. I. Principles. Am J Epidemiol 135: 1019–1028
Wacholder S, Chanock S, Garcia-Closas M, El GL, Rothman N (2004) Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J Natl Cancer Inst 96: 434–442
Weinberg CR (2001) It's time to rehabilitate the P-value. Epidemiology 12: 288–290
Weinberg CR, Umbach DM, Greenland S (1994) When will nondifferential misclassification of an exposure preserve the direction of a trend? Am J Epidemiol 140: 565–571
Weiss NS (1994) Application of the case-control method in the evaluation of screening. Epidemiol Rev 16: 102–108
Weiss NS (2003) Adjusting for screening history in epidemiologic studies of cancer: why, when, and how to do it. Am J Epidemiol 157: 957–961
Weiss HA, Quigley MA, Hayes RJ (2000) Male circumcision and risk of HIV infection in sub-Saharan Africa: a systematic review and meta-analysis. AIDS 14: 2361–2370
Werler MM, Pober BR, Nelson K, Holmes LB (1989) Reporting accuracy among mothers of malformed and nonmalformed infants. Am J Epidemiol 129: 415–421
Whittemore AS, McGuire V (2003) Observational studies and randomized trials of hormone replacement therapy: what can we learn from them? Epidemiology 14: 8–10
Wilcox AJ, Horney LF (1984) Accuracy of spontaneous abortion recall. Am J Epidemiol 120: 727–733
Wilson T, Centerbar D, Brekke N (2002) Mental contamination and the debiasing problem. In Heuristics and Biases: The Psychology of Intuitive Judgment, Gilovich T, Griffin D, Kahneman D (eds) pp 185–200. Cambridge University Press: New York
Yanagawa T (1984) Case-control studies: assessing the effect of a confounding factor. Biometrika 71: 191–194
Yates J, Lee J, Sieck W, Choi I, Price P (2002) Probability judgment across cultures. In Heuristics and Biases: The Psychology of Intuitive Judgment, Gilovich T, Griffin D, Kahneman D (eds) pp 271–291. Cambridge University Press: New York
Yun S, Zhu BP, Black W, Brownson RC (2006) A comparison of national estimates of obesity prevalence from the behavioral risk factor surveillance system and the National Health and Nutrition Examination Survey. Int J Obes (Lond) 30: 164–170
Index

A
Anchoring, 78

B
Bayesian statistics, 4, 5, 106, 178–180
Beta distribution, 126–131
Bias parameter, 21, 25–31, 33–41, 52, 53, 56, 61–66, 68

C
Case-control study, 20, 30, 38, 47, 49, 50, 91, 110–112, 153, 168
Causal graph, 44, 46, 59, 60, 154
Classification errors, 5, 6, 16, 21–25, 35, 36, 41, 81, 93, 136, 152, 154, 155, 161, 163, 164, 175, 176, 178, 180, 181. See also Misclassification
Cohort study, 40, 47, 54, 158
Confidence interval, 2–4, 9, 10, 13–15, 17–20, 25, 26, 29, 31, 32, 35, 36, 98, 106, 111, 136, 137, 147, 148, 152, 165, 172, 179, 180
Confounding, 2, 4, 11, 16–18, 21, 23, 24, 28, 29, 31, 33, 37, 59–64, 69, 71, 76–77, 84, 101–103, 109, 113, 119, 132, 138–144, 152, 154, 158, 169–172, 178–179
Contingency table, 14, 25, 41, 50, 55, 81, 82, 125
Correlated distributions, 144–146
Covariate misclassification, 83, 85, 86, 100–103
Creeping determinism, 11, 33
Cumulative probability function, 129, 130

D
Dependent misclassification, 81–82, 103–106
Differential misclassification, 86, 87, 91, 103, 115, 116, 145, 146, 163, 164
Disease misclassification, 81, 85, 94–100, 108

E
Exposure misclassification, 16, 33, 81, 85–97, 100, 106, 108, 131–137, 148, 151, 152, 160, 166
External validation study, 21, 40, 118

F
Failure to account for the base-rate, 9, 10
Frequentist statistics, 4, 5, 178

H
Heuristics, 5–12

I
Information bias, 4, 23, 24, 33, 35–36, 40–41, 118, 179. See also Misclassification; Measurement error
Internal validation study, 21, 35, 36, 84, 109, 136, 154, 163, 171

L
Losses to follow-up, 14. See also Selection bias

M
Matching, 15, 51, 54, 55
Measurement error, 2, 3, 11, 18, 21, 40, 79
Misclassification, 6, 14, 16, 25, 30, 31, 33, 36, 79–110, 113–119, 129, 131–139, 142, 145–149, 151–157, 159–173, 177, 178
Multidimensional bias analysis, 23, 26, 28–29, 39, 52, 53, 84, 109–118, 161, 162, 176, 177, 179, 180

R
Random error, 2, 3, 9, 13, 14, 17–20, 25–29, 31, 32, 106, 136, 137, 140, 146–149, 152, 165, 167, 169, 170, 172, 178
Record-level data, 25
Relative risk due to confounding, 29, 77, 101–103, 141–142

U
Uniform distribution, 118–123, 126, 131, 133, 134, 145, 147
Unknown confounder, 59–78, 110, 112
Unmeasured confounder, 11, 18, 21–25, 28, 33–35, 37, 39–40, 60–62, 64–78, 110, 112–113, 138–142, 151–153, 158–162, 169–172, 178