Article
ORIGINAL REPORT
SUMMARY
Purpose Data mining may enhance traditional surveillance of vaccine adverse events by identifying events that are reported more commonly after administering one vaccine than other vaccines. Data mining methods find signals as the proportion of times a condition or group of conditions is reported soon after the administration of a vaccine; thus it is a relative proportion compared across vaccines, and not an absolute rate for the condition. The Vaccine Adverse Event Reporting System (VAERS) contains approximately 150 000 reports of adverse events that are possibly associated with vaccine administration.
Methods We studied four data mining techniques: empirical Bayes geometric mean (EBGM), lower-bound of the EBGM’s 90% confidence interval (EB05), proportional reporting ratio (PRR), and screened PRR (SPRR). We applied these to the VAERS database and compared the agreement among methods and other performance properties, particularly focusing on the vaccine–event combinations with the highest numerical scores in the various methods.
Results The vaccine–event combinations with the highest numerical scores varied substantially among the methods. Not all combinations representing known associations appeared in the top 100 vaccine–event pairs for all methods.
Conclusions The four methods differ in their ranking of vaccine–COSTART pairs. A given method may be superior in certain situations but inferior in others. This paper examines the statistical relationships among the four estimators. Determining which method is best for public health will require additional analysis that focuses on the true alarm and false alarm rates using known vaccine–event associations. Evaluating the properties of these data mining methods will help determine the value of such methods in vaccine safety surveillance. Copyright © 2005 John Wiley & Sons, Ltd.
key words — adverse event; empirical Bayes; proportional reporting ratio; vaccine
Copyright © 2005 John Wiley & Sons, Ltd. Pharmacoepidemiology and Drug Safety, 2005; 14: 601–609
vaers data mining 603
approximately correct. A small deviation from this model is that sometimes a single report will generate a handful of different COSTARTs (e.g., both nausea and headache), but the effect of this is likely to be small. A second deviation is that sometimes a news story or popular television show can trigger a burst of reports, and these reports are not independent. But again, the overall magnitude of these effects is probably small, and the authenticity of signals generated in this way could be evaluated through examination of longitudinal trends.
One aspect of the VAERS data that has not yet played a substantial role in signal detection research is the association among COSTART terms. One could potentially ‘borrow strength’ by pooling signals from similar COSTART terms. For example, reports of dizziness and vertigo might be usefully combined to improve the power of the signal detection algorithm. However, we do not address this extension.
Instead, this paper focuses upon a statistical comparison of four signal detection methods that have been discussed in the literature. We call these methods proportional reporting ratio (PRR), screened PRR (SPRR), empirical Bayes geometric mean (EBGM), and lower-bound of the EBGM’s 90% confidence interval (EB05). We do not address the relative risk,10 nor do we consider a conditional probability measure developed by Friedman et al.11 and critiqued by DuMouchel et al.12
There are other methods that can be used for signal detection in large contingency tables without true measures of exposure. For example, the U.S. Census Bureau and the Consumer Product Safety Commission have explored the use of ‘raking’ to detect interactions in large tables (cf. Little and Wu13). Bate et al.14 propose using a Bayesian Confidence Propagation Neural Network for adverse event detection in the WHO database (but DuMouchel15 argues that this method is an approximation to EBGM based on beta-binomial Bayesian estimates). Hauben and Zhou7 review much of this literature.
Although these and other methods could be considered, this research has focused upon the four main techniques that have been piloted within the FDA to date; this paper is not intended to be a comprehensive overview of all currently available methods. A key concern is that methods used for official purposes ideally should be transparent and sufficiently interpretable that expert knowledge can guide the evaluation of new signals. Also, it is highly desirable that the signal detection system used in VAERS not be radically different from systems already in place.
Analysis
One objective of this comparison is to determine whether all four methods agree with each other, as shown by scatterplots and as measured by rank correlation. Comparison of the methods’ sensitivity and specificity is desirable, but the paucity of gold standards for vaccine–event causality limits the ability to estimate these properties. The theoretical properties of the procedures are also an important consideration. This paper addresses all three bases of comparison; we measure the agreement between methods, we discuss performance with respect to a handful of known adverse effects, and we evaluate both kinds of information on the basis of the performance differences expected from theory.
The Vaccine Injury Table is a list of vaccine–adverse event associations that the Institute of Medicine has determined are causal.9 By operationalizing these associations as 32 vaccine–COSTART pairs (Table 1), we compare the ability of the methods to signal those pairs. Such operationalization is imperfect, since COSTARTs are applied without standardized definitions or diagnostic confirmation. For example, ARTHRITIS may refer to acute or chronic inflammation of joints. We then evaluate the efficiency of the methods by comparing the number of vaccine–COSTART pairs signaled by each method.
Injection site reactions are accepted as being caused by injectable vaccines. We also look at the methods’ ability to signal injection site reactions, represented by COSTART codes ABSCESS INJECT SITE, ATROPHY INJECT SITE, CYST INJECT SITE, EDEMA INJECT SITE, GRANULOMA INJECT SITE, HEM INJECT SITE, HYSN INJECT SITE, INFLAM INJECT SITE, INJECT SITE REACT, MASS INJECT SITE, NECRO INJECT SITE, and PAIN INJECT SITE. This comparison allows us to evaluate the methods’ abilities to detect an adverse effect which is known to be caused by many vaccines.
A given method may be superior in some situations but inferior in others. There are six possible pairwise comparisons among the four data mining methods. Since our primary interest is to determine whether any method is the most effective for discovering adverse event risks, we focus on four comparisons that seem most informative in terms of identifying plausible vaccine–event pairs.
Data mining methods assessed
Proportional reporting ratio (PRR). The PRR approach was first described by Finney16 and further developed recently by Evans, Waller, and
604 d. banks ET AL.
Table 1. Vaccine–event associations from the Vaccine Injury Table, operationalized as vaccine–COSTART pairs
Association from Vaccine Injury Table | COSTART | Vaccine code(s)
Anaphylaxis or anaphylactic shock after any of the following: tetanus toxoid-containing vaccines; pertussis antigen-containing vaccines; measles, mumps and rubella virus-containing vaccines in any combination; polio inactivated-virus containing vaccines; hepatitis B antigen-containing vaccines | Anaphylaxis | DT, DTAP, DTAPH*, DTP, HEP, IPV, MMR, TD
Chronic arthritis after rubella virus-containing vaccines | Arthritis | MMR, MR, MUR, R
Brachial neuritis after tetanus toxoid-containing vaccines | Brachial neuritis | DTAP*, DTAPH*, DT*, DTP*, TD*, TTOX
Encephalitis after any of the following: pertussis antigen-containing vaccines; measles, mumps and rubella virus-containing vaccines in any combination | Encephalitis | DTAP, DTAPH*, DTP, MMR
Encephalopathy after any of the following: pertussis antigen-containing vaccines; measles, mumps and rubella virus-containing vaccines in any combination | Encephalopathy | DTAP, DTAPH*, DTP, MMR
Intussusception after rotavirus vaccine | Intussusception | RV
Paralytic polio after polio live virus-containing vaccines | Poliomyelitis | OPV
Thrombocytopenic purpura after measles virus-containing vaccines | Thrombocytopenic purpura | M, MM*, MMR, MR
*VAERS did not contain any reports of these vaccine–COSTART pairs.
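As a rough illustration of how the known associations in Table 1 could be used to score a ranking, the following Python sketch encodes the table as data and counts how many known pairs fall in the top k of a method's output. The dictionary keys are the COSTART labels as printed in Table 1 (not necessarily the exact VAERS code strings), and the names `KNOWN_ASSOCIATIONS`, `known_pairs`, and `hits_in_top_k` are hypothetical helpers introduced here, not part of the study's software.

```python
# Table 1 as data. A trailing "*" marks vaccine codes for which VAERS
# contained no reports of the pair (as in the table footnote).
KNOWN_ASSOCIATIONS = {
    "Anaphylaxis": ["DT", "DTAP", "DTAPH*", "DTP", "HEP", "IPV", "MMR", "TD"],
    "Arthritis": ["MMR", "MR", "MUR", "R"],
    "Brachial neuritis": ["DTAP*", "DTAPH*", "DT*", "DTP*", "TD*", "TTOX"],
    "Encephalitis": ["DTAP", "DTAPH*", "DTP", "MMR"],
    "Encephalopathy": ["DTAP", "DTAPH*", "DTP", "MMR"],
    "Intussusception": ["RV"],
    "Poliomyelitis": ["OPV"],
    "Thrombocytopenic purpura": ["M", "MM*", "MMR", "MR"],
}


def known_pairs(table=KNOWN_ASSOCIATIONS, reported_only=False):
    """Return the set of (vaccine code, COSTART) pairs in the table.

    With reported_only=True, pairs flagged with "*" (no VAERS reports)
    are left out, since no method could signal them.
    """
    pairs = set()
    for costart, codes in table.items():
        for code in codes:
            if reported_only and code.endswith("*"):
                continue
            pairs.add((code.rstrip("*"), costart))
    return pairs


def hits_in_top_k(ranked_pairs, k=100, table=KNOWN_ASSOCIATIONS):
    """Known, reported pairs that appear among the first k ranked pairs."""
    return known_pairs(table, reported_only=True) & set(ranked_pairs[:k])


if __name__ == "__main__":
    # Hypothetical ranking, highest score first; not real VAERS output.
    ranking = [("RV", "Intussusception"), ("FLU", "Fever")]
    print(len(known_pairs()), "pairs in Table 1")
    print(sorted(hits_in_top_k(ranking)))
```

Counting the entries this way reproduces the 32 pairs stated in the text, of which the starred ones cannot be signaled by any method because no reports exist.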
continuity adjustment to improve the accuracy of the chi-squared approximation to the distribution of the Pearson’s test for independence in a contingency table18). These requirements help to address the first two of the three concerns about use of the raw PRR score. The formula is
$$\text{Yates-corrected } X^2 = \sum_{r,s} \frac{(|O_{rs} - E_{rs}| - 0.5)^2}{E_{rs}}$$
Here $O_{rs}$ is the observed number in cell (r,s) for r = 1,2 and s = 1,2 and thus takes the values a, b, c, and d, as in the contingency table in 2.1. The $E_{rs}$ are the numbers expected in those cells under the assumption that the adverse events are independent of the vaccine, and this is given by the row sum times the column sum divided by the total, so
$$E_{11} = (a + b)(a + c)/(a + b + c + d)$$
$$E_{12} = (a + b)(b + d)/(a + b + c + d)$$
$$E_{21} = (a + c)(c + d)/(a + b + c + d)$$
$$E_{22} = (b + d)(c + d)/(a + b + c + d)$$
Under the null hypothesis of no relationship between vaccine and COSTART, the Yates-corrected $X^2$-statistic follows a chi-squared distribution with one degree of freedom. A value of 3.84 would be significant at the 0.05 level, which agrees closely with the screening criterion of Yates-corrected $X^2 \ge 4$.
Empirical Bayes geometric mean (EBGM). DuMouchel15 developed the empirical Bayes approach to analysis of spontaneous reporting systems such as VAERS. The empirical Bayes model assumes that the counts $n_{ij}$ in each cell are random variables from Poisson distributions with unknown means $\mu_{ij}$, where the $\mu_{ij}$ are themselves random variables with a common distribution. Usually this common distribution is taken to be a mixture of two gamma distributions, one of which is centered at the null value corresponding to a coincidental adverse event, and the other of which is more dispersed and centered at a value corresponding to a true causal relationship between the vaccine and the adverse event. There are many alternative models that lead to similar results; this is a simple mixture model with two gamma components, one of which is highly dispersed and the other of which is concentrated near 1. Simple alternative models assume a mixture of different distributions and use the observed counts $n_{ij}$ to estimate the parameters, but one could also consider nonparametric techniques that allow the data to determine the shapes of the mixture components.
This kind of framework, called a hierarchical model, is widely used in Bayesian practice (see Carlin and Louis19 for details). It allows one to exploit a simple Bayesian computational structure for inference while avoiding the need to choose a subjective prior for the unknown distribution of $\mu_{ij}$. Formally, the measure corresponding to vaccine i and COSTART j is given by
$$\log_2 \mathrm{EBGM}_{ij} = E[\log_2(\mu_{ij}/E_{ij}) \mid n_{ij}]$$
where the right-hand side of the equation denotes a conditional expectation and $E_{ij}$ is the value (a + b)(a + c)/(a + b + c + d) in the notation of the SPRR section (for the vaccine–COSTART pair of interest, i and j correspond to cell $E_{11}$). This expression calculates the expected value of the base 2 logarithm of the ratio between the estimated reporting ratio and that under the assumption of no causal relationship, given the observed count of the spontaneous reports for that vaccine and that COSTART. Large values suggest that vaccine i might provoke the adverse event described by COSTART j.
The practical effect of this hierarchical model framework is that it ‘shrinks’ the estimates of the reporting ratio parameters in the Poisson distributions towards each other, thereby reducing the effect of sampling variation in the data. The shrinkage is greatest when $E_{ij}$ is small and/or $n_{ij}/E_{ij}$ is small, which typically occurs when a or b is small. Another advantage is that the model preserves the interpretability of the parameters and their estimates. The main drawback of this approach is that it is computationally intensive, taking several minutes to run and requiring investment in well-tested, special-purpose code. The computational burden depends upon the number of rows and columns in the matrix, not the number of reports, so from the standpoint of scaling concerns, this performance is adequate for all foreseeable VAERS applications.
Lower-bound of EBGM’s 90% confidence interval (EB05). The EB05 is the lower-bound of the 90% confidence interval of EBGM. DuMouchel and Pregibon20 recommend that one use the 5th percentile point of the posterior distribution of the ratio as the metric. If the 5th percentile is large, then the association is unlikely to be due to chance alone and warrants further exploration. The rationale for selecting the 5th percentile point is based upon a loose analogy with frequentist inference, in which one wants to indicate associations that are
significant at the 0.05 level. The EB05 signal is conservative and this quality should minimize false positives, but because it represents the lower bound of the confidence interval, it is theoretically less sensitive than EBGM.
A small modification of the EBGM method takes better account of the uncertainty in the posterior distribution of the ratio $\mu_{ij}/E_{ij}$. As part of the EBGM computation, one finds the distribution of this ratio. This distribution can be asymmetric and highly dispersed, in which case use of the expected value could overemphasize the apparent relationship between the vaccine and the COSTART.
RESULTS
Of 69 230 theoretical vaccine–COSTART pairs, 14 800 actually occurred in VAERS at the time of this analysis. Figure 1 illustrates the number of vaccine–COSTART pairs versus the total number of occurrences in VAERS, for these 14 800 pairs. The point at (1, 4857) indicates that 4857 vaccine–COSTART pairs each occurred only once in VAERS. The pairs that occurred most frequently, at the far right of Figure 1, correspond to pairs in which the COSTART is a common and expected event (such as fever) that occurs after many vaccines. Many vaccine–COSTART pairs occurred rarely and some pairs occurred at high frequency, but overall the curve is very smooth.
Figure 1. Frequency of occurrence of vaccine–COSTART pairs. This scatterplot illustrates the number of vaccine–COSTART pairs versus the total number of occurrences of a particular vaccine–COSTART pair, for the 14 800 pairs that occurred at least once in VAERS. For each of 4857 pairs, only one occurrence has been reported to VAERS (far left of graph). The pairs that occurred most frequently (far right of graph) correspond to pairs in which the COSTART is a common and expected event (such as fever) that occurs after many vaccines.
Comparison of EBGM and PRR
Figure 2 displays the natural logarithm of the EBGM signal versus the natural logarithm of the PRR signal (175 points for which PRR is infinite are omitted from the graph). The logarithmic plot demonstrates a strong filamentary structure. The lowest filament consists entirely of cases for which $n_{ij} = 1$, which accounts for most of the largest (rightmost) values of the PRR scores. In fact, of the 175 vaccine–COSTART pairs with infinite values of PRR, 146 have $n_{ij} = 1$, confirming that PRR gives undue weight to singleton reports, and thus is highly susceptible to sampling variation. Only two of these 175 vaccine–COSTART pairs are also in the top 100 EBGM. The known association of rotavirus vaccine and intussusception is not in the top 100 PRR scores, because the value, while very large, is finite.
Figure 2. Scatterplot of ln EBGM vs ln PRR. This plot demonstrates a filament (arrow) that consists of vaccine–COSTART pairs for which only one report was received. For these singleton reports, the range of the PRR scores is large compared to that of the EBGM scores, suggesting that PRR gives undue weight to singleton reports relative to EBGM. EBGM: Empirical Bayesian Geometric Mean. PRR: Proportional Reporting Ratio.
Comparison of EBGM and SPRR
The screened version of PRR is intended to repair the deficiencies noted in the previous comparison. The SPRR drops cells for which $n_{ij} < 3$ and additionally requires both statistical significance and a raw PRR ≥ 2; there are 1596 vaccine–COSTART pairs for which SPRR is defined. Figure 3 shows a plot of the natural logarithm of the EBGM score against the natural logarithm of the SPRR score, for the cells for which both SPRR and EBGM are defined (nine points for which SPRR is infinite are omitted from the graph). In comparison with Figure 2, the figure is left-truncated at 0.693, the natural logarithm of 2, because SPRR does not generate a score for cells with PRR less than 2. The lower filament, which consisted of cells for which $n_{ij} = 1$, has disappeared. Several of the upper filaments are also gone, since they corresponded to cells in which Yates-corrected $X^2$ was not statistically significant. Note that because of the large number of points, there is considerable overplotting.
Among the top 100 vaccine–COSTART pairs from EBGM and SPRR, 54 appear in both, including nine pairs for which SPRR is infinite. Of those cells flagged by both methods, the Pearson correlation coefficient for the signal ranks is 0.543 (p < 0.0001). Among the top 100 EBGM scores (EBGM ≥ 7.16, ln EBGM ≥ 1.97), there are nine cases in which the SPRR method does not signal because $n_{ij} < 3$. There are 37 cases in which all criteria are met, but the rank is simply greater than 100. Among the top 100 SPRR scores (SPRR ≥ 20.08, ln SPRR ≥ 3.0), there are 46 cases in which the EBGM score is not in the top 100 of scores by that method. The top 100 EBGM scores include the rotavirus–intussusception and rubella–arthritis associations, whereas the top 100 SPRR scores include the rotavirus–intussusception and oral polio vaccine–poliomyelitis associations. The top 100 SPRR scores include three injection site reaction COSTARTs, two of which are also among the top 100 EBGM scores.
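To make the contrast between the raw PRR, the SPRR screen, and a shrunken estimate concrete, here is a minimal Python sketch. It rests on several assumptions that are not taken from this excerpt: the 2×2 cell layout (a = reports of the event for the vaccine of interest, b = other events for that vaccine, c = the event for all other vaccines, d = everything else) and the PRR formula follow the usual convention for this method; the screening thresholds are read from the text above (count of at least 3, PRR of at least 2, Yates-corrected chi-squared of at least 4); and `shrunk_ratio` replaces the two-component gamma mixture of EBGM with a single Gamma(alpha, beta) prior whose parameters are arbitrary illustrative values, so it only mimics the shrinkage behavior and is not the EBGM score used in the study.

```python
import math


def expected_counts(a, b, c, d):
    """Expected cell counts under independence, in the order E11, E12, E21, E22."""
    n = a + b + c + d
    return [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (a + c) * (c + d) / n,
        (b + d) * (c + d) / n,
    ]


def yates_chi2(a, b, c, d):
    """Yates-corrected chi-squared statistic (all margins must be positive)."""
    observed = [a, b, c, d]
    expected = expected_counts(a, b, c, d)
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


def prr(a, b, c, d):
    """Raw PRR; infinite when the event was never reported for other vaccines."""
    if c == 0:
        return math.inf
    return (a / (a + b)) / (c / (c + d))


def sprr(a, b, c, d, min_count=3, min_prr=2.0, min_chi2=4.0):
    """Return the PRR if the cell passes all three screens, otherwise None."""
    if a < min_count:
        return None
    score = prr(a, b, c, d)
    if score < min_prr or yates_chi2(a, b, c, d) < min_chi2:
        return None
    return score


def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def shrunk_ratio(n, e, alpha=1.0, beta=1.0):
    """Posterior geometric mean of the reporting ratio under a single
    Gamma(alpha, beta) prior (ASSUMED simplification of the EBGM mixture).

    With n ~ Poisson(ratio * e), the posterior is Gamma(alpha + n, beta + e),
    whose mean log is digamma(alpha + n) - log(beta + e).
    """
    return math.exp(digamma(alpha + n) - math.log(beta + e))


if __name__ == "__main__":
    # Hypothetical cells: a singleton report and a well-supported cell.
    for cell in [(1, 9, 10, 99980), (30, 970, 100, 98900)]:
        a = cell[0]
        e11 = expected_counts(*cell)[0]
        print(cell, prr(*cell), sprr(*cell), shrunk_ratio(a, e11))
```

For the hypothetical singleton cell the raw PRR is close to 1000, the SPRR screen drops the cell, and the simplified shrunken ratio stays near 1.5, which mirrors the filament of singleton reports described for Figure 2. The well-supported cell passes the screen and keeps a large score under both approaches.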
Comparison of EBGM and EB05
As shown in Figure 4, the natural logs of EBGM and EB05 generally have a linear relationship, as is expected since the posterior distributions are reasonably symmetrical. Sixty-seven vaccine–COSTART pairs appear in the top 100 scores of both EBGM and EB05. The top 100 EBGM scores (EBGM ≥ 7.16, ln EBGM ≥ 1.97) include the rotavirus–intussusception and rubella–arthritis associations; the top 100 EB05 scores (EB05 ≥ 3.98, ln EB05 ≥ 1.38) include these two, as well as the oral live polio vaccine–poliomyelitis association. The top 100 EBGM scores include two injection site reaction COSTARTs, one of which appears among the top 100 EB05 scores.
Comparison of EB05 and SPRR
Figure 5 plots the natural log of EB05 against the natural log of SPRR for the cells for which both SPRR and EB05 are defined (nine points for which SPRR is infinite are omitted from the graph). From the plot of the natural logs of the scores, it is clear that many of the top-ranked signals from one method are not the same as the top-ranked signals from the other method. We have examined the top 100 vaccine–COSTART pairs flagged by the SPRR method and the EB05 method. Among these, 42 are in common, including one infinite value of SPRR. The discrepancies are due purely to differences in score ranks (top 100 or not) and not due to restrictions on count, Yates-corrected chi-squared, or SPRR value. Of those cells flagged by both methods, the Pearson correlation coefficient for the signal ranks is 0.416 (p < 0.0001). The top 100 SPRR scores (SPRR ≥ 20.08, ln SPRR ≥ 3.0) include the rotavirus–intussusception and oral live polio vaccine–poliomyelitis associations; the top 100 EB05 scores (EB05 ≥ 3.98, ln EB05 ≥ 1.38) include these two, as well as the rubella–arthritis association. The top 100 SPRR scores include three injection site reaction COSTARTs, one of which appears among the top 100 EB05 scores.
Figure 5. Scatterplot of ln EB05 vs ln SPRR. Lines indicate the cutoff for the top 100 pairs in each method (EB05 ≥ 3.98, ln EB05 ≥ 1.38; SPRR ≥ 20.08, ln SPRR ≥ 3.0). Forty-two vaccine–COSTART pairs appear in the top 100 of both methods (upper right quadrant). EB05: lower-bound of the 90% confidence interval of the empirical Bayesian geometric mean. SPRR: screened proportional reporting ratio.
DISCUSSION
Data mining methods have been proposed as screening tools for improving the efficiency of adverse event reports. This is the first analysis comparing several proposed methods using the VAERS database. Several data mining methods exist, and our purpose is to compare four approaches that have been piloted within the FDA. The qualitative features of the comparisons are as follows. The PRR signal appears less useful for postmarketing safety surveillance than SPRR, EBGM, and EB05. The large number of PRR signals for singleton reports could result in many false alarms and divert resources from more consequential relationships. Because of these limitations, PRR was removed from further consideration in the analysis.
Even the best method for detecting clinically important signals among spontaneous report data is subject to limitations. First, if nearly all vaccines are associated with the same adverse event, such as injection site reactions, then automatic signal detection systems are unlikely to discover this association from VAERS data. No single vaccine would likely emerge as markedly different from others, with regard to this event, even if the event were extremely common. Some vaccines are commonly administered simultaneously, e.g., Hemophilus influenzae type B vaccine, inactivated polio vaccine, pneumococcal conjugate vaccine, and diphtheria and tetanus toxoids with acellular pertussis vaccine in children. Determining whether a given adverse event results from one of several simultaneously administered vaccines (thereby exonerating the ‘innocent bystanders’), from the simple additive effects of multiple vaccines, or from the synergistic effect of multiple vaccines, is a topic for further research.
We found that the SPRR method was generally competitive with the EBGM method. In comparing EBGM versus SPRR, one should consider the
bias-variance tradeoff.21 SPRR estimates have large variance; EBGM estimates are shrunk towards a common mean, which reduces variance at the expense of a small bias. From a public health standpoint, good methods will agree on the strongest signals; close correlation among the other signals is not as helpful. EB05 is designed with statistical principles in mind and takes explicit account of the asymmetry in the distribution of signals. However, these properties may not ensure superior performance. We have evaluated the ability of the different methods to detect some well-known adverse effects. The causal relationship of the vast majority of vaccine–event pairs is unknown, making estimates of sensitivity and specificity unreliable. This paper brings together the comparative information that is currently available, relying on both theory and some empirical work. The number of vaccine–COSTART pairs that ranked in the top 100 by each of two methods (EBGM, EB05, or SPRR) ranged from 42 to 67. Few known associations were in the top 100 scores of any of the methods that we studied, but the known associations that were signaled overlapped and were more similar than different. Under the limitations described above, our research finds that each method has strengths and limitations, and knowledge of these differences has practical value.
ACKNOWLEDGEMENTS
The study was part of routine activities by the Office of Biostatistics and Epidemiology/Food and Drug Administration and therefore did not require supplemental funding. The authors thank Dr. Susan Ellenberg for helpful critique and Dr. Vitali Pool for assistance with data mining method definitions. We also greatly appreciate the efforts of the VAERS Working Group for their dedication to the maintenance of VAERS. The members of the VAERS Working Group include: Marthe Bryant-Genevier, Soju Chang, Hector Izurieta, Ann W. McMahon, Lise Stevens, Frederick Varricchio, and Robert Wise (Food and Drug Administration); Scott Campbell, Robert Chen, Penina Haber, John Iskander, Alena Khromova, Elaine Miller, Gina T. Mootrey, Vitali Pool, and Sean Shadomy (Centers for Disease Control and Prevention).
REFERENCES
1. Chen RT, Rastogi SC, Mullen JR, et al. The vaccine adverse event reporting system (VAERS). Vaccine 1994; 12: 542–550.
2. Meyboom RH, Egberts AC, Edwards IR, Hekster YA, de Koning FH, Gribnau FW. Principles of signal detection in pharmacovigilance. Drug Saf 1997; 16: 355–365.
3. Martin M, Weld LH, Tsai TF, et al. Advanced age a risk factor for illness temporally associated with yellow fever vaccination. Emerg Infect Dis 2001; 7: 945–951.
4. Rosenthal S, Chen R. The reporting sensitivities of two passive surveillance systems for vaccine adverse events. Am J Public Health 1995; 85: 1706–1709.
5. Szarfman A, Machado SG, O’Neill RT. Use of screening algorithms and computer systems to efficiently signal higher-than-expected combinations of drugs and events in the US FDA’s spontaneous reports database. Drug Saf 2002; 25: 381–392.
6. Hauben M. A brief primer on automated signal detection. Ann Pharmacother 2003; 37: 1117–1123.
7. Hauben M, Zhou X. Quantitative methods in pharmacovigilance: focus on signal detection. Drug Saf 2003; 26: 159–186.
8. Niu MT, Erwin DE, Braun MM. Data mining in the US Vaccine Adverse Event Reporting System (VAERS): early detection of intussusception and other events after rotavirus vaccination. Vaccine 2001; 19: 4627–4634.
9. Vaccine Safety Committee, Institute of Medicine. Adverse Events Associated with Childhood Vaccines: Evidence Bearing on Causality, Stratton KR, Howe CJ, Johnston RB (eds). National Academy Press: Washington, DC, 1994. See also http://www.hrsa.gov/osp/vicp/table.htm.
10. Church KW, Hanks P. Word association norms, mutual information, and lexicography. Computational Linguistics 1990; 16: 22–29.
11. Friedman C, Hripcsak G, DuMouchel W, Johnson SB, Clayton PD. Natural language processing in an operational clinical information system. Natural Lang Eng 1995; 2(1): 83–108.
12. DuMouchel W, Friedman C, Hripcsak G, Johnson SB, Clayton PD. Two applications of statistical modeling to natural language processing. In AI and Statistics V, Fisher D, Lenz H (eds). Springer-Verlag: New York, 1996; 413–422.
13. Little RJA, Wu M-M. Models for contingency tables with known margins when target and sampled populations differ. J Am Stat Assoc 1991; 86: 87–95.
14. Bate A, Lindquist M, Orre R, Edwards IR, Meyboom RHB. Data-mining analyses of pharmacovigilance signals in relation to relevant comparison drugs. Eur J Clin Pharmacol 2002; 58: 483–490.
15. DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. Am Statistician 1999; 53: 177–190.
16. Finney DJ. Systematic signalling of adverse reactions to drugs. Methods Inf Med 1974; 13: 1–10.
17. Evans SJ, Waller PC, Davis S. Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiol Drug Saf 2001; 10: 483–486.
18. Yates F. Contingency tables involving small numbers and the chi-square test. Suppl J Roy Stat Soc 1934; 1: 217–235.
19. Carlin BP, Louis TA. Bayes and Empirical Bayes Methods for Data Analysis. Chapman & Hall/CRC: Boca Raton, 2000.
20. DuMouchel W, Pregibon D. Empirical Bayes screening for multi-item associations. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001, pp. 67–76.
21. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. Springer-Verlag: New York, 2001 (see chapter 7).