1. Introduction
Methods of evaluating and comparing the performance of diagnostic tests or indices are of
increasing importance as new tests or indices are developed or measured. When a test is
based on an observed variable that lies on a continuous or graded scale, an assessment of
the overall value of the test can be made through the use of a receiver operating characteristic
(ROC) curve (Hanley and McNeil, 1982; Metz, 1978). The underlying population curve is
theoretically given by varying the cutpoint used to determine the values of the observed
variable to be considered abnormal and then plotting the resulting sensitivities against the
corresponding false positive rates. If a test could perfectly discriminate, it would have a
value above which the entire abnormal population would fall and below which all normal
values would fall (or vice versa). The curve would then pass through the point (0, 1) on the
unit grid. The closer an ROC curve comes to this ideal point, the better its discriminating
ability. A test with no discriminating ability will produce a curve that follows the diagonal
of the grid.
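The construction just described can be sketched directly from two samples. The function below is a minimal illustration, not code from the paper. Its name, `empirical_roc`, is hypothetical, and it assumes that larger values indicate abnormality, so a value at or above the cutpoint is called abnormal.

```python
def empirical_roc(abnormal, normal):
    """Return (false positive rate, sensitivity) points of the empirical ROC curve.

    Assumes larger values indicate abnormality: a value at or above the
    cutpoint is classified as abnormal.
    """
    # Every observed value is a candidate cutpoint; sweep from high to low.
    cutpoints = sorted(set(abnormal) | set(normal), reverse=True)
    points = [(0.0, 0.0)]  # cutpoint above every observation
    for c in cutpoints:
        sensitivity = sum(x >= c for x in abnormal) / len(abnormal)
        false_positive_rate = sum(y >= c for y in normal) / len(normal)
        points.append((false_positive_rate, sensitivity))
    return points


# A perfectly discriminating test passes through the ideal point (0, 1).
print(empirical_roc([4, 5], [1, 2]))
# -> [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
```

A test with no discriminating ability (identical distributions in both groups) would instead give points scattered around the diagonal.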
For statistical analysis, a recommended index of accuracy associated with an ROC curve
is the area under the curve (Swets and Pickett, 1982). The area under the population ROC
$$\operatorname{var}(\hat\theta) = \frac{(n-1)\xi_{10} + (m-1)\xi_{01}}{mn} + \frac{\xi_{11}}{mn} \qquad (2)$$
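Equation (2) can be evaluated once the three $\xi$ terms are available. The sketch below also computes the empirical area as a Mann-Whitney statistic. It assumes the kernel $\psi(X, Y) = 1$ if $Y < X$, $\frac{1}{2}$ if $Y = X$, and 0 otherwise, which matches the definition $\theta = \Pr(Y < X) + \frac{1}{2}\Pr(Y = X)$ used in the example section; the function names are illustrative only.

```python
def psi(x, y):
    """Kernel: 1 if y < x, 1/2 if tied, 0 otherwise (x from C1, y from C2)."""
    if y < x:
        return 1.0
    if y == x:
        return 0.5
    return 0.0


def empirical_area(xs, ys):
    """Mann-Whitney estimate of the area: mean of psi over all m * n pairs."""
    return sum(psi(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def var_theta_hat(xi10, xi01, xi11, m, n):
    """Equation (2): variance of the estimated area, given the xi terms."""
    return ((n - 1) * xi10 + (m - 1) * xi01) / (m * n) + xi11 / (m * n)


print(empirical_area([1, 3, 5], [2, 3]))  # 3.5 of 6 pairs favour X: about 0.583
```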
Bamber (1975) provides a method of estimating the variance in the context of testing the
significance of a single ROC curve. Bamber introduces a quantity $B_{XXY}$, which is the
probability that two randomly chosen elements of the population $C_1$ will both be greater
than, or both be less than, a randomly chosen element of $C_2$, minus the complementary
probability that the observation from $C_2$ will lie between the two from $C_1$. A similar
quantity, $B_{YYX}$, is also defined, and the variance of the estimated area is given in terms
of $B_{XXY}$ and $B_{YYX}$.
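Bamber's $B_{XXY}$ is a population probability. One simple way to see what it measures is a plug-in sample proportion over all triples, sketched below. This estimator and the name `b_xxy` are illustrative assumptions, not Bamber's published estimator. Tied triples are given weight 0, since the definition above is stated for continuous data.

```python
from itertools import combinations


def b_xxy(xs, ys):
    """Plug-in estimate of Bamber's B_XXY (illustrative; assumes no ties).

    For every pair of distinct C1 observations and every C2 observation:
    +1 if the C2 value lies below both or above both C1 values,
    -1 if it lies strictly between them; triples with a tie contribute 0.
    """
    total, count = 0, 0
    for x1, x2 in combinations(xs, 2):
        lo, hi = min(x1, x2), max(x1, x2)
        for y in ys:
            if y < lo or y > hi:
                total += 1
            elif lo < y < hi:
                total -= 1
            count += 1
    return total / count


print(b_xxy([1, 3, 5], [2]))  # two pairs straddle 2, one pair does not: -1/3
```

Because the definition has the same structure with the populations swapped, $B_{YYX}$ follows as `b_xxy(ys, xs)`.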
Sen (1960) has provided a method of structural components to obtain consistent estimates
of the elements of the variance-covariance matrix of a vector of U-statistics. This approach
turns out to be equivalent to jackknifing, but is conceptually simpler when dealing with
U-statistics. We will exploit this methodology to compare the areas under two or more
ROC curves. For the $r$th statistic, $\hat\theta^r$, the X-components and Y-components are defined as

$$V_{10}^{r}(X_i) = \frac{1}{n}\sum_{j=1}^{n}\psi(X_i^{r}, Y_j^{r}) \qquad (i = 1, 2, \ldots, m),$$

$$V_{01}^{r}(Y_j) = \frac{1}{m}\sum_{i=1}^{m}\psi(X_i^{r}, Y_j^{r}) \qquad (j = 1, 2, \ldots, n).$$
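The two definitions translate directly into code. The sketch below assumes the same kernel $\psi$ as in the example section ($1$ if $Y < X$, $\frac{1}{2}$ if tied, $0$ otherwise). The function name `structural_components` is hypothetical.

```python
def psi(x, y):
    """Kernel: 1 if y < x, 1/2 if tied, 0 otherwise."""
    return 1.0 if y < x else (0.5 if y == x else 0.0)


def structural_components(xs, ys):
    """X-components V10(X_i) and Y-components V01(Y_j) for one statistic."""
    m, n = len(xs), len(ys)
    v10 = [sum(psi(x, y) for y in ys) / n for x in xs]  # average over the Y's
    v01 = [sum(psi(x, y) for x in xs) / m for y in ys]  # average over the X's
    return v10, v01


v10, v01 = structural_components([1, 3, 5], [2, 3])
# Both component vectors average to the same estimated area.
print(sum(v10) / len(v10), sum(v01) / len(v01))
```

Each X-component is the proportion of Y's that the given X "beats", so the mean of either set of components reproduces $\hat\theta^r$.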
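Let $S_{10}$ be the matrix whose $(r, s)$th element is

$$s_{10}^{r,s} = \frac{1}{m-1}\sum_{i=1}^{m}\left[V_{10}^{r}(X_i) - \hat\theta^{r}\right]\left[V_{10}^{s}(X_i) - \hat\theta^{s}\right],$$

and similarly $S_{01}$, which has $(r, s)$th element

$$s_{01}^{r,s} = \frac{1}{n-1}\sum_{j=1}^{n}\left[V_{01}^{r}(Y_j) - \hat\theta^{r}\right]\left[V_{01}^{s}(Y_j) - \hat\theta^{s}\right].$$

The estimated covariance matrix for the vector of estimated areas, $\hat\theta = (\hat\theta^{1}, \hat\theta^{2}, \ldots)$, is thus

$$S = \frac{1}{m}S_{10} + \frac{1}{n}S_{01}.$$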
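Putting the component definitions and the two covariance matrices together gives the full estimate $S$ for several correlated areas. The sketch below is a plain-Python illustration. The function name `structural_covariance` and the data layout are assumptions: one row per subject, one column per test, with every test measured on the same subjects.

```python
def psi(x, y):
    """Kernel: 1 if y < x, 1/2 if tied, 0 otherwise."""
    return 1.0 if y < x else (0.5 if y == x else 0.0)


def structural_covariance(X, Y):
    """Estimated areas theta and covariance matrix S = S10/m + S01/n.

    X: list of m rows (C1 subjects), each holding that subject's k test values.
    Y: list of n rows (C2 subjects), same layout. Column r is the r-th test.
    """
    m, n, k = len(X), len(Y), len(X[0])
    # Structural components: one row per subject, one column per test.
    V10 = [[sum(psi(x[r], y[r]) for y in Y) / n for r in range(k)] for x in X]
    V01 = [[sum(psi(x[r], y[r]) for x in X) / m for r in range(k)] for y in Y]
    theta = [sum(row[r] for row in V10) / m for r in range(k)]

    def cov(V, denom):
        # (r, s)th element: sum of cross-products of deviations from theta.
        return [[sum((v[r] - theta[r]) * (v[s] - theta[s]) for v in V) / denom
                 for s in range(k)] for r in range(k)]

    S10, S01 = cov(V10, m - 1), cov(V01, n - 1)
    S = [[S10[r][s] / m + S01[r][s] / n for s in range(k)] for r in range(k)]
    return theta, S


theta, S = structural_covariance([[1, 2], [3, 1], [5, 4]], [[2, 0], [3, 3]])
print(theta, S)
```

With two or more columns, the off-diagonal entries of $S$ carry the correlation between areas computed on the same subjects.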
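Let $g$ be a real-valued function of $\theta$ that has bounded second derivatives in a neighborhood
of $\theta$. Combining results from Sen (1960) and Arvesen (1969, Theorem 16), it follows that
if $\lim_{N\to\infty} m/n$ is bounded and nonzero, then $N^{1/2}[g(\hat\theta) - g(\theta)]$ is asymptotically normally
distributed with mean 0 and variance $\sigma^2$, where

$$\sigma^2 = \lim_{N\to\infty} N \sum_{j}\sum_{k} \frac{\partial g}{\partial \theta^{j}}\frac{\partial g}{\partial \theta^{k}}\left[\frac{1}{m}\xi_{10}^{jk} + \frac{1}{n}\xi_{01}^{jk}\right],$$

and $\xi_{10}^{jk}$ and $\xi_{01}^{jk}$ are the population quantities estimated by the $(j, k)$th elements of $S_{10}$ and $S_{01}$. In particular, for a contrast $L\theta'$, where $L$ is a row vector of coefficients, the statistic

$$\frac{L\hat\theta' - L\theta'}{\left[L\left(\frac{1}{m}S_{10} + \frac{1}{n}S_{01}\right)L'\right]^{1/2}}$$

has an asymptotically standard normal distribution. A confidence interval for $L\theta'$ naturally follows.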
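For a single contrast, the statistic above reduces to a few sums. The sketch below is illustrative: `contrast_z_test` is a hypothetical name, it tests $L\theta' = 0$, and it uses the normal critical value 1.96 for a 95% interval. As a usage example it plugs in the rounded areas and the Table 2 covariance matrix from the example section for K-G versus albumin. The paper does not report this particular comparison, so the numbers are only a demonstration of the arithmetic.

```python
import math


def contrast_z_test(theta, S, L, z_crit=1.96):
    """Z statistic, two-sided P-value and CI for one contrast L * theta'.

    theta: estimated areas; S: their estimated covariance matrix
    (S10/m + S01/n); L: one row of contrast coefficients. Tests L theta' = 0.
    """
    k = len(theta)
    estimate = sum(L[r] * theta[r] for r in range(k))
    variance = sum(L[r] * S[r][s] * L[s] for r in range(k) for s in range(k))
    se = math.sqrt(variance)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return z, p, (estimate - z_crit * se, estimate + z_crit * se)


S_table2 = [[.0110, .0033, .0028],
            [.0033, .0086, .0076],
            [.0028, .0076, .0100]]
z, p, ci = contrast_z_test([.69, .72, .65], S_table2, [1, -1, 0])
print(round(z, 2), round(p, 2), ci)  # z is about -0.26
```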
By a modest generalization of these results, we can also apply any set of linear contrasts
to a vector of areas under correlated ROC curves and perform a test of significance on
$L\theta'$, where the rows of $L$ are now the contrasts. The test then takes the form

$$(\hat\theta - \theta)L'\left[L\left(\frac{1}{m}S_{10} + \frac{1}{n}S_{01}\right)L'\right]^{-1}L(\hat\theta - \theta)' \qquad (5)$$

which has an asymptotic chi-square distribution with degrees of freedom equal to the rank of $LSL'$. A
confidence region can also be constructed.
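Statistic (5) needs only a small linear solve. The sketch below tests $L\theta' = 0$. It assumes that $L$ has full row rank, so the degrees of freedom equal the number of rows. The helper names (`solve`, `chi2_sf`, `contrast_chi2_test`) are hypothetical. The chi-square tail uses the standard recurrence $Q(\nu + 2, x) = Q(\nu, x) + (x/2)^{\nu/2} e^{-x/2} / \Gamma(\nu/2 + 1)$, started from $Q(1, x) = \operatorname{erfc}(\sqrt{x/2})$ or $Q(0, x) = 0$.

```python
import math


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for c in range(p):
        piv = max(range(c, p), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, p):
            f = M[i][c] / M[c][c]
            for j in range(c, p + 1):
                M[i][j] -= f * M[c][j]
    w = [0.0] * p
    for i in range(p - 1, -1, -1):
        w[i] = (M[i][p] - sum(M[i][j] * w[j] for j in range(i + 1, p))) / M[i][i]
    return w


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the df -> df + 2 recurrence."""
    q, nu = (0.0, 0) if df % 2 == 0 else (math.erfc(math.sqrt(x / 2)), 1)
    while nu < df:
        q += (x / 2) ** (nu / 2) * math.exp(-x / 2) / math.gamma(nu / 2 + 1)
        nu += 2
    return q


def contrast_chi2_test(theta, S, L):
    """Statistic (5) for H0: L theta' = 0, assuming L has full row rank."""
    k = len(theta)
    d = [sum(row[r] * theta[r] for r in range(k)) for row in L]
    LSLt = [[sum(a[r] * S[r][s] * b[s] for r in range(k) for s in range(k))
             for b in L] for a in L]
    stat = sum(di * wi for di, wi in zip(d, solve(LSLt, d)))
    df = len(L)  # equals rank(L S L') when L has full row rank
    return stat, df, chi2_sf(stat, df)


S_toy = [[0.01, 0, 0], [0, 0.01, 0], [0, 0, 0.01]]
print(contrast_chi2_test([0.7, 0.6, 0.5], S_toy, [[1, -1, 0], [0, 1, -1]]))
```

Any two contrast matrices with the same row space, for example `[[1, -1, 0], [0, 1, -1]]` and `[[1, -1, 0], [1, 0, -1]]`, give the same statistic, because the quadratic form is invariant under nonsingular recombination of the rows of $L$.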
A computer program written in the SAS language is available from the authors for
computing components, covariance matrices, and contrasts. However, as indicated in the
next section, the components can be computed easily by hand or by a simple computer
program. The components can then be input to any program that computes sums of
squares and cross-products in order to obtain the covariance matrix $S$.
3. Example
When to perform surgical correction of intestinal obstruction in patients known to have
ovarian carcinoma is an unresolved problem. The dilemma centers on determining
those patients for whom surgery presents a benefit. Castelado et al. (1981) and other
authors have proposed that patients who survive longer than 2 months postoperatively can
be declared to have "benefited" from the surgery. Using this criterion, Krebs and Goplerud
(1983) devised a preoperative scoring system for use as a screening test in determining
a patient's risk of failing to benefit from surgery. The scoring algorithm is presented in
Table 1. According to this scoring system, patients with low scores should be good candidates
for surgery, and those with higher scores should be considered at risk of failing to benefit
from surgery.
The following example evaluates the discriminating ability of the proposed screening
algorithm on 49 consecutive ovarian cancer patients undergoing correction of intestinal
obstruction at Duke University Medical Center. Of the 49 patients, 12 survived more than
2 months postoperatively and could be considered surgical successes; the remaining 37 are
considered failures. The Krebs-Goplerud score (K-G) is compared against two other
preoperatively measured indices: total protein (TP) and albumin (ALB), both of which are
positively associated with the patient's nutritional status. Because ALB is one component
of TP, these two measures are highly correlated, with a Kendall's tau-b value of .65.
Increasing levels of ALB and TP are associated with better nutritional status, whereas
increasing levels of K-G are associated with poorer prognosis. Thus, to simplify computations,
we transformed K-G by subtracting it from 12, its maximum possible value, so that
all indices would prognosticate in the same direction.
Figure 1 displays the empirical ROC curves for the three indices. From this figure, it
appears that K-G offers little improvement over either ALB or TP. The estimated areas
under the curves for K-G, ALB, and TP are .69, .72, and .65, respectively. To analyze and
compare these areas, the covariance matrix for the vector of areas is needed. The method
of structural components easily produces this matrix. For each of the variables of interest
(K-G, ALB, TP), we denote by $X^r$ ($r = 1, 2, 3$) the values associated with surgical success and
by $Y^r$ ($r = 1, 2, 3$) the values associated with surgical failure. Then $\theta^r = \Pr(Y^r < X^r) + \frac{1}{2}\Pr(Y^r = X^r)$,
and we compute the components individually for each of the three variables.
If the data are first sorted by the variable of interest, it is a simple matter to calculate, for
each $X$, the number of $Y$'s less than $X$ ($\mathrm{NYL}_X$) and the number of $Y$'s equal to $X$ ($\mathrm{NYEQ}_X$).
The component for $X$ is then $(\mathrm{NYL}_X + \frac{1}{2}\mathrm{NYEQ}_X)/n$. Likewise, for each $Y$ we calculate the
number of $X$'s greater than $Y$ ($\mathrm{NXG}_Y$) and the number of $X$'s equal to $Y$ ($\mathrm{NXEQ}_Y$). The
component for $Y$ is $(\mathrm{NXG}_Y + \frac{1}{2}\mathrm{NXEQ}_Y)/m$.
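The counting procedure can be sketched with a binary search on each sorted sample, rather than the hand tally described above. The divisions by $n$ and $m$ follow the component definitions $V_{10}$ and $V_{01}$. The function name `components_by_counting` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def components_by_counting(xs, ys):
    """Structural components from counts, after sorting each sample once.

    X-component: (NYL_X + NYEQ_X / 2) / n, with NYL_X = #Y < X, NYEQ_X = #Y == X.
    Y-component: (NXG_Y + NXEQ_Y / 2) / m, with NXG_Y = #X > Y, NXEQ_Y = #X == Y.
    """
    m, n = len(xs), len(ys)
    sy, sx = sorted(ys), sorted(xs)
    v10 = []
    for x in xs:
        nyl = bisect_left(sy, x)              # Y's strictly less than x
        nyeq = bisect_right(sy, x) - nyl      # Y's equal to x
        v10.append((nyl + 0.5 * nyeq) / n)
    v01 = []
    for y in ys:
        right = bisect_right(sx, y)
        nxg = m - right                       # X's strictly greater than y
        nxeq = right - bisect_left(sx, y)     # X's equal to y
        v01.append((nxg + 0.5 * nxeq) / m)
    return v10, v01


print(components_by_counting([1, 3, 5], [2, 3]))
```

Sorting once and searching makes each count logarithmic in the sample size. With 12 successes and 37 failures either approach is instant, so the choice mainly matters for large samples.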
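For this example, there are 12 X's and three variables of interest, so the X-components
form a 12 × 3 matrix, $V_{10}$. The 37 Y's yield a component matrix of dimension 37 × 3,
$V_{01}$. With $V_{10}(X_i)$ and $V_{01}(Y_j)$ denoting the rows of these matrices, the 3 × 3 matrices $S_{10}$ and $S_{01}$ are then computed as

$$S_{10} = \frac{1}{11}\sum_{i=1}^{12}\left[V_{10}(X_i) - \hat\theta\right]'\left[V_{10}(X_i) - \hat\theta\right], \qquad S_{01} = \frac{1}{36}\sum_{j=1}^{37}\left[V_{01}(Y_j) - \hat\theta\right]'\left[V_{01}(Y_j) - \hat\theta\right].$$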
By construction, $S_{10}$ and $S_{01}$ are the sample covariance matrices of the columns of $V_{10}$ and $V_{01}$, respectively. They
can readily be obtained from any computer program that computes covariance matrices.
The covariance matrix for the vector of areas is then

$$S = \frac{1}{12}S_{10} + \frac{1}{37}S_{01},$$

which is given in Table 2.
Table 2
Estimated covariance matrix between areas under the three ROC curves

                 K-G score   Albumin   Total protein
K-G score          .0110      .0033       .0028
Albumin                       .0086       .0076
Total protein                             .0100
Table 3
Correlation coefficients of pairs of areas calculated from the estimated covariance matrix (ECM) and
also from the method of Hanley and McNeil (HM)

            Correlation   Kendall's tau-b   Kendall's tau-b   Correlation
            (ECM)         Survivors         Nonsurvivors      (HM)
K-G, ALB    .34           .20               .18               .17
K-G, TP     .27           -.01              .21               .10
ALB, TP     .82           .61               .66               .61
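The ECM column of Table 3 follows from Table 2 by dividing each covariance by the product of the two standard deviations. The short check below uses only the published Table 2 entries.

```python
import math

# Upper triangle of the estimated covariance matrix reported in Table 2.
cov = {
    ("K-G", "K-G"): .0110, ("K-G", "ALB"): .0033, ("K-G", "TP"): .0028,
    ("ALB", "ALB"): .0086, ("ALB", "TP"): .0076,
    ("TP", "TP"): .0100,
}


def correlation(a, b):
    """Correlation between two estimated areas from their covariance."""
    return cov[(a, b)] / math.sqrt(cov[(a, a)] * cov[(b, b)])


for pair in [("K-G", "ALB"), ("K-G", "TP"), ("ALB", "TP")]:
    print(pair, round(correlation(*pair), 2))
# -> .34, .27 and .82, matching the ECM column of Table 3
```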
Then, based on (5), with $L$ a 2 × 3 matrix of contrasts for testing the equality of the three areas, the $\chi^2$ statistic with 2 degrees of freedom is 1.51, with
a P-value of .47. Based on this sample of 49 patients, there appears to be no advantage in
using the Krebs-Goplerud score over other routinely collected nutritional parameters,
although power in this situation is likely to be very small because of the small sample size.
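The reported test can be approximately re-derived from the published summary numbers. The sketch below assumes one particular contrast matrix, $L = ((1, -1, 0), (0, 1, -1))$; any rank-2 set of contrasts comparing the three areas gives the same value. It also uses the areas as printed (rounded to two decimals), so it yields about 1.47 rather than 1.51. The gap is presumably due to that rounding. For 2 degrees of freedom the chi-square upper tail is simply $e^{-x/2}$.

```python
import math

theta = [0.69, 0.72, 0.65]      # K-G, ALB, TP areas as reported (rounded)
S = [[.0110, .0033, .0028],
     [.0033, .0086, .0076],
     [.0028, .0076, .0100]]     # Table 2, filled in symmetrically
L = [[1, -1, 0], [0, 1, -1]]    # assumed contrasts for "all areas equal"

d = [sum(row[r] * theta[r] for r in range(3)) for row in L]
A = [[sum(a[r] * S[r][s] * b[s] for r in range(3) for s in range(3)) for b in L]
     for a in L]
# Quadratic form d' A^{-1} d, using the explicit inverse of the 2 x 2 matrix A.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
stat = (A[1][1] * d[0] ** 2 - (A[0][1] + A[1][0]) * d[0] * d[1]
        + A[0][0] * d[1] ** 2) / det
print(round(stat, 2))                   # about 1.47 with these rounded inputs

# Upper-tail probability of chi-square with 2 df is exp(-x / 2).
print(round(math.exp(-1.51 / 2), 2))    # 0.47 for the reported statistic
```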
4. Discussion
ROC curves are frequently applied to the evaluation of diagnostic or prognostic tests
and indices. When two or more such indices derived from the same test units or subjects
are compared, the correlation between the curves induced by this pairing should be taken
into account. This paper has presented a fully nonparametric approach to the comparison
of the areas under two or more ROC curves, based on the theory developed for generalized
U-statistics. A covariance matrix can be estimated using the method of structural components,
and the resulting test statistic is asymptotically chi-square distributed. The
covariance matrix may also be used to construct confidence regions.
This work was supported in part by the Veterans' Administration Region 2 Health Services
Research and Development Field Program.
The importance of methods for evaluating and comparing the performance of diagnostic tests
grows as new tests are developed and brought to market. When a test is based on an observed
variable that is continuous or takes its values on a graded scale, an overall assessment of the
value of the test can be made using the receiver operating characteristic (ROC) curve. The curve
is constructed by varying the cutpoint used to determine which values of the observed variable
are to be considered abnormal, and then plotting the resulting sensitivities against the
corresponding false positive rates. The correlated nature of the data must be taken into account
in the statistical analysis of differences between curves when two or more empirical curves are
constructed from tests performed on the same individuals. This paper presents a nonparametric
approach to the analysis of areas under correlated ROC curves, using the theory of generalized
U-statistics, to generate a matrix of