Med Decis Making Vol. 4, No. 2, 1984

Statistical Approaches to the Analysis of Receiver Operating Characteristic (ROC) Curves

Barbara J. McNeil, M.D., Ph.D., and James A. Hanley, Ph.D.

In this article we review published and some unpublished work in statistical analyses of ROC curves. We describe both single and joint indices and indicate the approaches that have been taken to consider between-reader variations and correlations, within-reader variations, and variations and correlations between cases. We then discuss in detail a single index, the TP ratio at a fixed FP ratio (designated $TP_{FP}$), or the FP ratio at a fixed TP ratio (designated $FP_{TP}$). We show how to calculate confidence limits around any point on the curve; we further show, using the conventional Dorfman and Alf program and the jackknifing technique, how to calculate these confidence limits for multiple curves derived from the same sample of patients. (Med Decis Making 4:137-150, 1984)

Over the past 20 years investigators have proposed a number of indices to describe receiver operating characteristic (ROC) curves, and have developed statistical techniques to compare two or more curves [1-3]. In this article we will briefly review past work in this field and will indicate an approach to dealing with the problem of comparing differences between two or more ROC curves at a single operating point in either the true positive (TP) or false positive (FP) dimension. We shall first review commonly used indices and discuss their statistical evaluation for experiments with unpaired and with paired designs.

This work was supported in part by a grant from the Hartford Foundation and the National Cancer Institute (CA 28847).

From the Departments of Radiology and Preventive Medicine and Clinical Epidemiology, Harvard Medical School and the Brigham and Women's Hospital, and the Department of Epidemiology and Health, McGill University. Please address requests for reprints to Dr. McNeil, Department of Radiology, Harvard Medical School, 25 Shattuck Street, Boston, Massachusetts 02115, USA.

Indices to Describe ROC Curves

AREA-RELATED MEASUREMENTS. Much recent work has involved the area under the ROC curve. When discrete rating data are used (e.g., 5-point or 6-point rating scales for radiology imaging experiments), and when ROC curves are assumed to be based on two underlying Gaussian distributions, a maximum likelihood estimation program by Dorfman and Alf [4] can be used to fit data points to a smooth curve and to derive thereby (along with other indices) the area under this fitted curve and its associated standard error. This area is designated $A_z$ and ranges in value from 0.0 to 1.0.

If the ROC curve is drawn by connecting the pairs of observed TP and FP ratios, and if the trapezoidal areas are summed, the resulting nonparametric area is designated P(A). The common availability of the Dorfman and Alf program, underestimation of the area, and undue dependence on extreme points have decreased the use of the P(A) index for rating data.

When continuous data are available (as from chemistry laboratory tests, white cell counts, individualized predictions from logistic regression or discriminant analysis) and ROC curves are created, no assumptions on underlying distributions need be made to obtain area measurements. Instead, Bamber's recognition of the equivalence between the area under the ROC curve and the Wilcoxon statistic W allows immediate and direct calculation of W and hence the area [5]. Hanley and McNeil's [6] derivation of a closed form approximate expression for the standard error associated with the Wilcoxon statistic can be used to approximate the standard error of the area.
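The nonparametric route described above can be sketched in a few lines. This is a minimal illustration, not the authors' program: it assumes that higher scores indicate abnormality, counts ties as one half, and uses the closed-form standard error as it is usually quoted from reference [6] (the formula itself is not reproduced in this article, so treat it as an assumption to check against [6]). The function names are hypothetical.

```python
import math

def wilcoxon_area(normal_scores, abnormal_scores):
    """Area under the empirical ROC curve, computed as the Wilcoxon statistic W.

    Assumes higher scores point toward abnormality; ties contribute one half.
    """
    total = 0.0
    for xa in abnormal_scores:
        for xn in normal_scores:
            if xa > xn:
                total += 1.0
            elif xa == xn:
                total += 0.5
    return total / (len(normal_scores) * len(abnormal_scores))

def hanley_mcneil_se(area, n_normal, n_abnormal):
    """Approximate SE of the area, using the closed form usually attributed to [6]."""
    q1 = area / (2.0 - area)
    q2 = 2.0 * area ** 2 / (1.0 + area)
    variance = (
        area * (1.0 - area)
        + (n_abnormal - 1) * (q1 - area ** 2)
        + (n_normal - 1) * (q2 - area ** 2)
    ) / (n_normal * n_abnormal)
    return math.sqrt(variance)

# Made-up scores, for illustration only.
normals = [1, 2, 3, 4]
abnormals = [3, 5, 6]
w = wilcoxon_area(normals, abnormals)
se = hanley_mcneil_se(w, len(normals), len(abnormals))
```

The double loop is quadratic in the sample sizes, which is adequate for the sample sizes typical of reader studies; a rank-based computation would give the same W.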

SLOPE-RELATED AND INTERCEPT-RELATED INDICES. When ROC curves are assumed to be based on underlying Gaussian distributions, the expected ROC points should follow a straight line when plotted on binormal coordinate paper [2]. This assumption has led to the development of a series of indices related to the slope and intercept of the straight line fit to the observed ROC points. In general, these are derived after observed data are fitted with the Dorfman and Alf program. The true slope of the line is designated b and its true intercept a (estimates of these are designated $\hat{a}$ and $\hat{b}$; however, for simplicity, throughout this manuscript the estimate sign will be omitted and all symbols a and b will designate estimates); a divided by b is called $\Delta m$, an index commonly used in radiology phantom studies. Conceptually, $\Delta m$ is a form of a standardized difference between the means of two normal distributions (which may have different variances). Other derived indices are also available (see [2] for a complete summary). The Dorfman and Alf maximum likelihood program provides estimates for a and b, var(a), var(b), and covar(a, b). Either of the above pairs of indices (e.g., $\Delta m$ and the slope, or a and b), or any other pair derived from them, is sufficient to describe fully a binormal ROC curve.
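As a small illustration of how a pair (a, b) determines other summary indices, the sketch below computes $\Delta m = a/b$ and the binormal area. The area expression $\Phi(a/\sqrt{1+b^2})$ is the standard binormal result and is an addition here, not a formula stated in this article; the function name is hypothetical, and the values a = 1.657 and b = 0.713 are borrowed from the worked example in the Results section.

```python
import math
from statistics import NormalDist

def binormal_summary(a, b):
    """Return (delta_m, area) for a binormal ROC curve with intercept a and slope b.

    delta_m = a / b, as defined in the text.
    area uses the standard binormal identity Phi(a / sqrt(1 + b^2)),
    which is assumed here rather than taken from the article.
    """
    delta_m = a / b
    area = NormalDist().cdf(a / math.sqrt(1.0 + b * b))
    return delta_m, area

delta_m, area = binormal_summary(1.657, 0.713)
```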

Statistical Treatment: Area Index

UNPAIRED DATA. When the area index is used and we have only one reader (or if we have more than one reader but negligible between-reader variations), statistical comparisons between two areas can be made by testing their difference using the formula

$$\text{critical ratio} = \frac{\text{Area}_1 - \text{Area}_2}{SE(\text{Area}_1 - \text{Area}_2)} \tag{1}$$

and comparing the critical ratio with the table of the normal distribution. The areas and associated standard errors can be obtained directly from the Dorfman and Alf program if data consist of ratings, or from the Wilcoxon statistic if data consist of measurements [6]. The Wilcoxon statistic can also be used to provide a closed form expression for the standard error and thereby to estimate sample size and power.

PAIRED DATA. Statistical tests for paired comparisons are more difficult, whatever the index used. Swets and colleagues have provided a general expression to take into account three types of variances (and associated correlations) that may be present in a paired comparison: the variances and covariances induced by using the same cases; the variance induced by having a reader read the same set of cases more than once (within-reader consistency); and the variances and covariances induced by having multiple readers read the same set of cases (between-reader consistency) [1]. For the area index, Hanley and McNeil [7] suggested a more feasible method of calculating the correlation between scans induced by studying the same cases, a quantity that is otherwise difficult to assess.

Slope-Intercept Index

Comparing ROC curves simultaneously on both their slopes and intercepts is the most rigorous statistical approach in that identity between two curves can exist only if there is a complete coincidence of the curves. (This coincidence is not necessary, as indicated below, for area measurements to be equal.) For unpaired comparison of slope-intercept pairs, Metz has developed a test statistic that follows a chi-square distribution if the two sets of rating scale data (and hence their summary indices) derive from the same underlying ROC curve [8,9]. For paired data, Metz has extended this work by using a "two-dimensional" Dorfman and Alf approach to estimate two correlated slope-intercept pairs [10]; this is not yet generally available, however.

TP Point Index

All of the above indices and resulting statistical techniques assume that the investigator is interested in an overall one-dimensional index of performance for the entire ROC curve, or in a description of the entire curve. Such may not always be the case, however. In some cases, for example, two ROC curves might cross, and although the areas for the two imaging modalities may be almost the same, in the clinical range of interest one may be superior to the other. In addition, even if two curves do not cross, one could imagine that differences would exist at one point (the clinically relevant one, perhaps) on the curve but would not be detected in any global test. In either of these two situations raw rating scale data may not allow direct comparisons to be made, since it is unusual for identical interpretive criteria (i.e., identical FP ratios) to exist in different experiments. In particular, this means that the observed TP ratio on one curve at a particular criterion cannot be directly compared with that on another curve, because the associated FP ratios observed may be different.

Methods

In this section we elaborate an approach to comparing differences between two ROC curves at one point (either TP or FP). This is also part of a comprehensive computer package being prepared by Metz [10]. Our approach is based on fitted TP and FP ratios, obtained from fitted parameters from the Dorfman and Alf maximum likelihood estimation program for rating-based ROC curves [4]. The method yields confidence intervals around true positive ratios at a fixed false positive ratio (designated $TP_{FP}$) or around false positive ratios at a fixed true positive ratio (designated $FP_{TP}$). Illustrative examples are presented.

The maximum likelihood estimation program of Dorfman and Alf provides parameters that allow calculation of TP ratios at any FP ratio and thus provides the basis for comparing two ROC curves at either the same TP ratio or the same FP ratio. In brief, the relevant outputs for this purpose are: (1) a, the normal deviate value of the intercept of the ROC curve with the vertical ($Z_{TP}$) axis; and (2) b, the slope of the ROC curve obtained from ROC curves plotted on normal deviate axes. The equation for this purpose in normal deviate space (Z) for the TP ratio is:

$$Z_{TP} = b Z_{FP} - a \tag{2}$$

The quantity $Z_{TP}$ on the Z scale can be converted to TP on the 0-100% scale by determining what percentage of the normal probability distribution lies above (i.e., to the right of) $Z_{TP}$.

The Dorfman and Alf output also provides the variance and covariance terms var(a), var(b), and covar(a, b). These can be used to calculate the sampling variance or uncertainty of $Z_{TP}$ and thus of TP itself. This is done in two steps. First we calculate a confidence interval for $Z_{TP}$ (in the Z scale), and second we transform the confidence interval back into the usual 0-100% scale. The relevant equation for calculating the variance around a $Z_{TP}$ ratio is:

$$\mathrm{var}(b Z_{FP} - a) = Z_{FP}^2\,\mathrm{var}(b) + \mathrm{var}(a) - 2 Z_{FP}\,\mathrm{covar}(a, b) \tag{3}$$
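Equations (2) and (3) translate directly into code. The sketch below is a minimal illustration under the article's convention that $Z_{FP}$ is the normal deviate above which a fraction FP of the distribution lies, and that TP is the area to the right of $Z_{TP}$. The function names are hypothetical; the inputs a = 1.657, b = 0.713, var(a) = 0.0974, var(b) = 0.0467, and covar(a, b) = 0.0478 are the values quoted in the Results section.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

def z_fp_from_fp(fp):
    """Normal deviate with a fraction fp of the distribution above it."""
    return _STD.inv_cdf(1.0 - fp)

def z_tp(fp, a, b):
    """Equation (2): Z_TP = b * Z_FP - a."""
    return b * z_fp_from_fp(fp) - a

def var_z_tp(fp, var_a, var_b, covar_ab):
    """Equation (3): sampling variance of Z_TP at a fixed FP ratio."""
    z = z_fp_from_fp(fp)
    return z * z * var_b + var_a - 2.0 * z * covar_ab

def tp_from_z(z):
    """TP is the normal probability to the right of Z_TP."""
    return 1.0 - _STD.cdf(z)

z = z_tp(0.05, a=1.657, b=0.713)
se = math.sqrt(var_z_tp(0.05, var_a=0.0974, var_b=0.0467, covar_ab=0.0478))
tp = tp_from_z(z)
```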

The confidence interval for $Z_{TP}$ becomes $Z_{TP}$ $\pm$ some multiple of the SE of $Z_{TP}$, i.e., $Z_{TP} \pm$ some multiple of $[\mathrm{var}(Z_{TP})]^{1/2}$. The "multiple" can be chosen from the tables of the normal distribution: since the parameter estimates a and b are maximum likelihood estimates, if the sample is large they should have Gaussian distributions, regardless of the underlying models. (Moreover, Metz has shown empirically [8] that a and b have distributions that are reasonably near normal even for n as low as about 35.) Once one has obtained the confidence interval for $Z_{TP}$, it is a simple matter of translating the upper and lower limits for TP, using the normal probability tables. Generally, the confidence interval around the TP (and also FP) ratios will be asymmetric, especially as the values move away from 50 percent.

SIGNIFICANCE TESTS FOR COMPARING TWO TP'S AT A FIXED FP. Because there is a one-to-one relation between $Z_{TP}$ and TP itself, two fitted TP's (from two different experiments) can be statistically compared on either the $Z_{TP}$ or the TP scale. Since the sampling distributions tend to be more symmetric in the (open-ended) Z scale, it is more appropriate to perform tests on this scale. To test whether there is a statistically significant difference between two fitted TP's (e.g., $TP_1$ and $TP_2$, both at the same FP ratio), the following critical ratio is calculated

$$\text{critical ratio} = \frac{Z_1 - Z_2}{SE(Z_1 - Z_2)} \tag{4}$$

and compared to the normal distribution. The denominator of the critical ratio will depend on whether $Z_1$ and $Z_2$ are from two modalities evaluated on the same or a different sample of patients and on whether there are several readers or several rereadings within one reader. For the moment we ignore the latter two sources of variation in the denominator of equation (4).

In comparisons of TP's involving separate (independent, unpaired) samples, the SE of the difference is simply $[\mathrm{var}(Z_1) + \mathrm{var}(Z_2)]^{1/2}$, where $\mathrm{var}(Z_1)$ and $\mathrm{var}(Z_2)$ are each calculated as in equation (3). For comparisons involving paired samples, $Z_1$ and $Z_2$ will tend to be correlated on repeated samples of patients. There are two ways to calculate this (reduced) variance. The first method uses the variance and covariance terms given as output from Metz's "two-dimensional" Dorfman and Alf program [10]. For this purpose we write $Z_1 = b_1 Z_{FP} - a_1$ and $Z_2 = b_2 Z_{FP} - a_2$, so that

$$Z_1 - Z_2 = (b_1 - b_2) Z_{FP} - (a_1 - a_2)$$

$$\mathrm{var}(Z_1 - Z_2) = Z_{FP}^2\,\mathrm{var}(b_1 - b_2) + \mathrm{var}(a_1 - a_2) - 2 Z_{FP}\,\mathrm{covar}(b_1 - b_2,\, a_1 - a_2) \tag{5}$$

$$SE(Z_1 - Z_2) = [\mathrm{var}(Z_1 - Z_2)]^{1/2}$$

To obtain the above components we use the following identities:

$$\mathrm{var}(b_1 - b_2) = \mathrm{var}(b_1) + \mathrm{var}(b_2) - 2\,\mathrm{covar}(b_1, b_2)$$

$$\mathrm{var}(a_1 - a_2) = \mathrm{var}(a_1) + \mathrm{var}(a_2) - 2\,\mathrm{covar}(a_1, a_2)$$

$$\mathrm{covar}(b_1 - b_2,\, a_1 - a_2) = \mathrm{covar}(b_1, a_1) - \mathrm{covar}(b_1, a_2) - \mathrm{covar}(b_2, a_1) + \mathrm{covar}(b_2, a_2)$$
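The identities and equation (5) can be sketched as follows. This is an illustration only: it assumes the four parameter estimates are supplied with a full 4 x 4 covariance matrix ordered as (a1, b1, a2, b2), which is a hypothetical layout and not necessarily how Metz's program reports its output. The matrix entries below are made up purely to exercise the arithmetic.

```python
# Index positions in the assumed (hypothetical) parameter ordering.
A1, B1, A2, B2 = 0, 1, 2, 3

def difference_components(cov):
    """Apply the three identities to a symmetric 4 x 4 covariance matrix."""
    var_db = cov[B1][B1] + cov[B2][B2] - 2.0 * cov[B1][B2]
    var_da = cov[A1][A1] + cov[A2][A2] - 2.0 * cov[A1][A2]
    covar_db_da = cov[B1][A1] - cov[B1][A2] - cov[B2][A1] + cov[B2][A2]
    return var_db, var_da, covar_db_da

def var_paired_difference(z_fp, var_db, var_da, covar_db_da):
    """Equation (5): variance of Z1 - Z2 at a common Z_FP."""
    return z_fp * z_fp * var_db + var_da - 2.0 * z_fp * covar_db_da

# Made-up covariance matrix, for illustration only.
cov = [
    [1.00, 0.50, 0.20, 0.10],
    [0.50, 0.40, 0.10, 0.05],
    [0.20, 0.10, 0.60, 0.30],
    [0.10, 0.05, 0.30, 0.20],
]
z_fp = 1.28  # deviate corresponding to FP = 10%
components = difference_components(cov)
variance = var_paired_difference(z_fp, *components)
```

Because $Z_1 - Z_2$ is a linear combination of the four parameters, the same variance can be obtained as a quadratic form in the covariance matrix, which gives a convenient cross-check on the identities.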

The second method is useful if Metz's program is unavailable. We can approximate $\mathrm{var}(Z_1 - Z_2)$ by the method of jackknifing (see Fleiss [11] for a general introduction to jackknifing, and Efron [12], equations 6.11 and 6.17, for the jackknife variance in two-sample problems). When the rating data come from $n_N$ normals and $n_A$ abnormals, the jackknife method consists of obtaining $n_N + n_A$ different estimates of $Z_1 - Z_2$ and using the quantity

$$\sum_{i=1}^{n_N + n_A} \left[ (Z_1 - Z_2) - (Z_1^i - Z_2^i) \right]^2 \tag{6}$$

as the jackknife variance of $(Z_1 - Z_2)$. The quantity $(Z_1 - Z_2)$ is obtained from the entire data set. The ith jackknife estimate $(Z_1^i - Z_2^i)$ is obtained by fitting two separate ROC curves to the data set (of $n_N + n_A - 1$ subjects) formed by deleting the paired ratings of the ith subject from the original data set. Although this may sound computer-intensive, the number of pairs of ROC curves to be fitted depends on the number of rating categories and not on the number of patients. For example, data on a five-point rating scale will involve at most 10 distinct values for each modality; the $n_N + n_A$ quantities being summed in equation (6) will occur in multiples.
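The bookkeeping behind equation (6), including the saving that comes from identical rating patterns, can be sketched as below. This is a minimal illustration under stated assumptions: `estimator` stands for whatever procedure returns $Z_1 - Z_2$ from a data set (in the article, two Dorfman and Alf fits), and the stand-in used in the example is a simple difference of mean ratings, chosen only so that the code runs without a maximum likelihood fitter. All names and data are hypothetical.

```python
from collections import Counter

def jackknife_variance(subjects, estimator):
    """Equation (6): sum of squared deviations of leave-one-out estimates.

    subjects  : list of hashable records, e.g. (status, rating_1, rating_2)
    estimator : function mapping a list of subjects to an estimate of Z1 - Z2
    Subjects with identical records give identical leave-one-out estimates,
    so each distinct record is refitted once and weighted by its multiplicity.
    """
    full_estimate = estimator(subjects)
    total = 0.0
    for record, multiplicity in Counter(subjects).items():
        reduced = list(subjects)
        reduced.remove(record)  # delete one subject with this rating pattern
        deviation = full_estimate - estimator(reduced)
        total += multiplicity * deviation ** 2
    return total

def mean_rating_difference(subjects):
    """Stand-in estimator (NOT an ROC fit): mean rating 1 minus mean rating 2."""
    n = len(subjects)
    return sum(s[1] for s in subjects) / n - sum(s[2] for s in subjects) / n

# Made-up paired ratings: (status, rating with modality 1, rating with modality 2).
data = [("D", 5, 4), ("D", 5, 4), ("N", 1, 2), ("N", 2, 2)]
jk_var = jackknife_variance(data, mean_rating_difference)
```

Replacing `mean_rating_difference` with a function that fits the two binormal curves and returns $Z_1 - Z_2$ at the chosen FP would give the quantity the article describes.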

EXTENSION OF THE METHOD TO MULTIPLE READERS. As we have done so far, we still base the analysis on equation (4), but now need to include between-reader ($S_{br}$) and within-reader ($S_{wr}$) variances. For this purpose we use equation (5) (Chapter 4) from Swets and Pickett [1], as shown here:

$$SE(\text{diff}) = 2^{1/2} \left[ S_{c+wr}^2 (1 - r_{c-wr}) + \frac{S_{br+wr}^2}{\ell} (1 - r_{br-wr}) - S_{wr}^2 \right]^{1/2} \tag{7}$$

where

$r_{br-wr}$ = the observable correlation between the $Z_{TP}$'s obtained when a set of readers reads the same cases in the two settings

$r_{c-wr}$ = the observable correlation between the $Z_{TP}$'s obtained when a single reader reads the same set of cases in two settings

$S_{c+wr}^2 = S_c^2 + S_{wr}^2$, the observable variance in $Z_{TP}$ that would be found by having one reader read once each of a set of different case samples

$S_{br+wr}^2 = S_{br}^2 + S_{wr}^2$, the observable variance in $Z_{TP}$ that would be found by having one case sample read once by each of a set of different readers

$S_{wr}^2$ = the observable variance in $Z_{TP}$ that would be found by having one reader read one case sample on two or more independent occasions

$\ell$ = the number of independent readers
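Equation (7) is a one-line computation once its components have been estimated. The sketch below simply evaluates the expression as reconstructed above; the function and argument names are hypothetical, and the numerical inputs are made up, since the article gives no worked multi-reader example.

```python
import math

def se_difference_multireader(s2_c_wr, r_c_wr, s2_br_wr, r_br_wr, s2_wr, n_readers):
    """Evaluate equation (7) from its observable components.

    s2_c_wr   : S^2_{c+wr}, case-sample plus within-reader variance
    r_c_wr    : r_{c-wr}, correlation for one reader reading the same cases twice
    s2_br_wr  : S^2_{br+wr}, between-reader plus within-reader variance
    r_br_wr   : r_{br-wr}, correlation for a set of readers across the two settings
    s2_wr     : S^2_{wr}, within-reader variance
    n_readers : number of independent readers
    """
    inside = (
        s2_c_wr * (1.0 - r_c_wr)
        + s2_br_wr * (1.0 - r_br_wr) / n_readers
        - s2_wr
    )
    if inside < 0:
        raise ValueError("variance components are mutually inconsistent")
    return math.sqrt(2.0 * inside)

# Made-up component values, for illustration only.
se_diff = se_difference_multireader(
    s2_c_wr=0.04, r_c_wr=0.5, s2_br_wr=0.02, r_br_wr=0.5, s2_wr=0.01, n_readers=2
)
```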

The quantity $2 S_{c+wr}^2 (1 - r_{c-wr})$ (taking the $2^{1/2}$ inside the square brackets) is the same quantity whose calculations we have described as $\mathrm{var}(Z_1 - Z_2)$. The different notation has two explanations: (1) Swets and Pickett assume equal variances for each of the three components; and (2) Swets and Pickett talk explicitly in terms of the correlation $r_{c-wr}$, while our variance formula calculates it implicitly, using covariances of the component parameters $a_1$, $a_2$, $b_1$, and $b_2$. Also, Swets and Pickett work in the closed-ended TP scale, whereas we prefer the open-ended $Z_{TP}$ scale. The reader is referred to [1] for calculation of the other components; in all cases estimates are determined in Z space rather than in the TP-FP space. (The reader should note that even though the examples in [1] are based on area measures, they apply equally well to other measures.)

Results

CALCULATION OF CONFIDENCE LIMITS FOR A SINGLE POINT ON ANY ROC CURVE. Consider the sample ROC curve data in Table 1. The Dorfman and Alf maximum likelihood program [4] provides the following data:

a = 1.657     var(a) = 0.0974
b = 0.713     var(b) = 0.0467
              covar(a, b) = 0.0478

Equations (2) and (3) allow calculation of $Z_{TP}$ and $TP_{FP}$ ratios for any FP values; they are shown in Table 2 for FP values of 5%, 10%, and 20%. Consider one detailed calculation for illustrative purposes. At an FP ratio of 5%, $Z_{FP}$ (the value above which lies 5% of the normal distribution) equals 1.645, and equation (2) becomes

$$Z_{TP} = 0.713(1.645) - 1.657 = -0.48.$$

The value -0.48 becomes 68.44 percent by referring to the table of the normal distribution, and represents the probability to the right of the value -0.48.

Table 1. ROC Curve Obtained on a Five-point Rating Scale*

[Rows: Nondiseased, Diseased. Columns: ratings 1 to 5 and Total. The cell counts are not legible in the source.]

*A rating of 1 corresponds to definitely abnormal and a value of 5 to definitely normal.

Equation (3) gives confidence limits about this value as follows:

$$\mathrm{var}(Z_{TP}) = (1.645)^2 (0.0467) + 0.0974 - 2(1.645)(0.0478) = 0.0665$$

$$SE(Z_{TP}) = 0.2579.$$

The 95 percent confidence interval for $Z_{TP}$ is thus $-0.48 \pm 1.96(0.2579)$, or (-0.99 to 0.03). Using the same method used to transform Z = -0.48 to a rounded TP value of 68 percent, we can transform Z = -0.99 and 0.03 back to upper and lower TP limits of 84 percent and 49 percent. Table 2 summarizes the standard errors obtained in this way for three points on the above ROC curve.

Table 2. Predicted True-Positive Values at Fixed False-Positive Values, Using Equation (2)

FP                     5%                10%               20%
Z_TP                  -0.48             -0.74             -1.06
TP (%)                68.44             77.04             85.54
SE of Z_TP             0.26              0.23              0.35
95% CI on Z_TP    (-1.03, 0.07)    (-1.19, -0.28)    (-1.16, -0.36)
95% CI on TP      (84%, 49%)       (88%, 39%)        (96%, 36%)
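The two-step confidence interval calculation (interval on the Z scale, then transformation back to the TP scale) can be sketched as follows, using the rounded values $Z_{TP}$ = -0.48 and SE = 0.2579 from the worked example at FP = 5%. The function name is hypothetical. Note the reversal: because TP is the area to the right of $Z_{TP}$, the lower Z limit gives the upper TP limit.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

def tp_confidence_interval(z_tp, se_z, level=0.95):
    """Confidence limits for Z_TP and the corresponding limits for TP."""
    multiple = _STD.inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95 percent
    z_low = z_tp - multiple * se_z
    z_high = z_tp + multiple * se_z
    tp_upper = 1.0 - _STD.cdf(z_low)   # lower Z limit -> upper TP limit
    tp_lower = 1.0 - _STD.cdf(z_high)  # upper Z limit -> lower TP limit
    return (z_low, z_high), (tp_lower, tp_upper)

(z_low, z_high), (tp_lower, tp_upper) = tp_confidence_interval(-0.48, 0.2579)
```

The resulting TP interval is not centered on the point estimate of 68 percent, which illustrates the asymmetry on the closed TP scale discussed in the Methods.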

COMPARING TWO CURVES (UNPAIRED DATA) AT ONE $TP_{FP}$. Table 3 displays two sets of rating data for previously published results on gallium citrate imaging in the search for a focal source of sepsis [13]. The top set of data was obtained at the Brigham and Women's Hospital (BWH) using an Anger camera, and the bottom set at Johns Hopkins Hospital (JHH) using a rectilinear scanner. The output from the Dorfman and Alf program is shown in Table 3 and allows calculation, as shown above, of $Z_{TP}$ (or TP) values at FP = 10% for each of the two settings. They are as follows:

(BWH) $Z_{TP} = 0.4316(1.28) - 0.6665 = -0.11$, or 54%

(JHH) $Z_{TP} = 0.6969(1.28) - 0.7631 = +0.13$, or 45%

Using the approach described above, 95 percent confidence limits for the BWH data are 31.2 percent to 76.1 percent, and for the JHH data approximately 16 percent to 76 percent.

Table 3. Rating Data for an Unpaired Experiment [13]

[Rows: Nondiseased, Diseased, for each of BWH and JHH. Columns: ratings* 1 to 5. The cell counts are not legible in the source.]

                 BWH        JHH
a                0.6665     0.7631
b                0.4316     0.6969
var(a)           0.07234    0.1822
var(b)           0.03639    0.2021
covar(a, b)      0.0163     0.1257

*1 = definitely abnormal; 5 = definitely normal

COMPARING TWO CURVES (PAIRED DATA) AT ONE FP. Table 4 displays two sets of rating data from a single individual reading computed tomograms of the head without (rows) and with (columns) clinical history [14]. The values of $a_1$ and $b_1$ from the Metz program were 3.60 and 1.29, respectively, for the data with history, and the values for $a_2$ and $b_2$ were 1.80 and 0.59, respectively, for the data without history. The covariances are shown in the bottom of Table 4. Thus, at FP = 10% ($Z_{FP}$ = 1.28):

$$Z_1 = 1.29(1.28) - 3.60 = -1.95 \quad \text{(or 97.44\%)}$$

$$Z_2 = 0.59(1.28) - 1.80 = -1.05 \quad \text{(or 85.31\%)}$$

$$Z_1 - Z_2 = (1.29 - 0.59)(1.28) - (3.60 - 1.80) = -0.90.$$

Table 4. Rating Data on CT Scans from a Paired Experiment [14]

[Cross-tabulation of ratings read with history against ratings read without history, given separately for nondiseased and diseased patients. The cell counts are not legible in the source.]

ROC curve parameters (1 = with history; 2 = without history)*

Maximum likelihood estimates: $a_1$ = 3.60, $b_1$ = 1.29, $a_2$ = 1.80, $b_2$ = 0.59

[Variances and covariances of $a_1$, $b_1$, $a_2$, and $b_2$: the tabulated values are not legible in the source.]

*Derived using the method from [10]

The three subcomponents, calculated as for equation (5), yield values of 0.3820, 1.0416, and 0.5877, so that equation (5) yields

$$\mathrm{var}(Z_1 - Z_2) = (1.28)^2 (0.3820) + 1.0416 - 2(1.28)(0.5877) = 0.1630$$

$$SE(Z_1 - Z_2) = \sqrt{0.1630} = 0.40.$$

Thus the critical ratio is -0.90/0.40, or -2.25, indicating that at FP = 10%, TP's obtained with history are statistically higher.

With the jackknife technique, the paired ratings of successive patients were eliminated, as described in the Appendix, to create 54 + 35 = 89 different data sets, each with 88 patients. Using the steps in the Appendix, we obtained the jackknife estimate of $SE(Z_1 - Z_2) = \sqrt{0.1966} = 0.44$, which is only 10 percent higher than the more parametric SE of 0.40 calculated by the "paired binormal" model of Metz.
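The paired comparison above can be reproduced from the three quoted subcomponents. The sketch below is illustrative only: the function name is hypothetical, and the two-sided p-value is an addition that the article does not report. Working with the unrounded SE gives a critical ratio slightly smaller in magnitude than the -2.25 obtained from the rounded value 0.40.

```python
import math
from statistics import NormalDist

def critical_ratio(z_difference, variance):
    """Equation (4) with a two-sided normal p-value (the p-value is an addition)."""
    se = math.sqrt(variance)
    ratio = z_difference / se
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(ratio)))
    return se, ratio, p_two_sided

z_fp = 1.28                          # deviate for FP = 10%
var_db, var_da, covar_db_da = 0.3820, 1.0416, 0.5877  # subcomponents quoted in the text
var_diff = z_fp ** 2 * var_db + var_da - 2.0 * z_fp * covar_db_da  # equation (5)

se_metz, ratio_metz, p_metz = critical_ratio(-0.90, var_diff)
se_jack, ratio_jack, p_jack = critical_ratio(-0.90, 0.1966)  # jackknife variance from the text
```

Running both variance estimates side by side shows how little the conclusion depends on which of the two methods supplies the denominator.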

Discussion

The work was motivated by two concerns with the use of area indices for comparing two ROC curves: (1) that two curves might cross and in such cases similar areas might result; and (2) that even in the absence of crossing curves similar areas might result when, in fact, statistical differences could exist in the region of clinical interest. These concerns pointed to the need to make comparisons at single points on either the TP or the FP axis. In the process of developing the analysis discussed here we realized that there was a need to provide a brief overview of commonly used indices for ROC analysis, whether paired or unpaired experimental designs were used.

The major point of our review and analysis is this: Once we assume binormal distributions of an ROC curve, all statistical properties are determined by the parameters a and b of the maximum likelihood fit to the data. This paper has emphasized the use of these parameters for calculating confidence limits around single TP or FP points anywhere along the ROC curve. Others, particularly Swets and Pickett [1], have discussed confidence limits explicitly in relationship to area measurements and TP points corresponding to observed TP-FP pairs.

To put this work in perspective it is worthwhile to summarize previous work in the general area of statistical analyses of ROC curves. The work falls along two lines: (1) the use of a single index (for example, the area or $TP_{FP}$) versus joint indices (for example, slope and intercept); and (2) consideration of between-reader variations and correlations, within-reader variations, and variations and correlations between cases.

Swets and Pickett [1] give formulas for single indices that cover all possible experimental designs. Hanley and McNeil [6] elaborated on statistical considerations relating to a specific single index, namely the area; because of the unique relationship of the area to the Wilcoxon statistic it is possible to calculate explicitly the associated standard error due to case sampling.


The second paper of Hanley and McNeil [7] also deals with areas, but for ROC curves derived from the same set of cases. It thus complements the work of and general formulas suggested by Swets and Pickett [1]. The present investigation deals with another single index, this time $TP_{FP}$, and gives explicit formulas for confidence intervals at any point on a single ROC curve. For comparing two $TP_{FP}$'s derived from the same sets of patients, we were able to use the conventional Dorfman and Alf program coupled with the jackknifing technique. We could obviously extend this approach to multiple readers by using the general approach of Swets and Pickett.

Metz [8-10] has emphasized joint indices (slope and intercept), although of course any single index can be derived from them. His programs consider only variations due to case sampling.

One other point is worth emphasizing. All of the comparisons made in this paper have been done in the Z space rather than in the linear TP-FP space. The latter has the disadvantage of having asymmetric sampling distributions around 0 percent and around 100 percent, because of the closed nature of the scale. The Z scale, being more open-ended, seems more appropriate because the sample distributions are more likely to be symmetric throughout at TP and FP ratios of interest.

Appendix: Illustration of Jackknife Technique

To obtain $\mathrm{var}(Z_1 - Z_2)$, where $Z_1$ and $Z_2$ refer to the two $Z_{TP}$'s obtained from the same set of patients evaluated by two modalities:

(1) Data from Table 4 are used to fit separate ROC curves for each modality (ND = nondiseased; D = diseased). Note that the $Z_1$ and $Z_2$ values here are derived from two separate runs of the Dorfman and Alf program, whereas those shown earlier in connection with Table 4 were estimated jointly from a single run of the Metz program. Thus, they are slightly different.

(2) The new data sets are obtained by leaving out successive patients.

[For step (1) and for each deletion below, the source tabulates the rating counts by modality together with the resulting $Z_1$, $Z_2$, and $Z_1 - Z_2$; these tabulated values are not legible in the source.]

Delete a nondiseased patient who received ratings of 2 with history and 3 without history.

Delete nondiseased patients with ratings of 1 with history and 2 without history: there are five such patients.

Delete nondiseased patients with ratings of 2 with history and 2 without history: there are four such patients. Notice that the four $Z_1$ values for these last four patients are each equal to the $Z_1$ value for the very first patient. Similarly, the $Z_2$ values for these four are equal to the $Z_2$ values for the previous five patients.

Delete diseased patients with ratings of 4 with history and 3 without history; there are two such patients.

Delete diseased patients with ratings of five with history and five without history; there are 24 such patients.

Delete diseased patients with ratings of 4 with history and 1 without history (last entry in Table 4).

Computation of $SE(Z_1 - Z_2)$ is shown in Table 5.

Table 5. [Jackknife computation of $SE(Z_1 - Z_2)$; the tabulated values are not legible in the source.]

Acknowledgment

The authors are indebted to Dr. Charles E. Metz for his helpful comments.

References

1. Swets JA, Pickett RM: Evaluation of Diagnostic Systems: Methods from Signal Detection Theory. New York: Academic, 1982

2. Swets JA: ROC analysis applied to the evaluation of medical imaging techniques. Invest Radiol 14:109-121, 1979

3. Metz CE: Basic principles of ROC analysis. Semin Nucl Med 8:283-298, 1978

4. Dorfman DD, Alf E Jr: Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals. Rating method data. J Math Psychol 6:487-496, 1969

5. Bamber D: The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J Math Psychol 12:387-415, 1975

6. Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29-36, 1982

7. Hanley JA, McNeil BJ: Method for comparing the area under two ROC curves derived from the same cases. Radiology 148:839-843, 1983

8. Metz CE, Kronman HB: Statistical significance tests for binormal ROC curves. J Math Psychol 22:218-243, 1980

9. Metz CE, Kronman HB: A test for the statistical significance of differences between ROC curves. Inserm 88:647-660, 1979

10. Metz CE, Wang P-L, Kronman HB: A new approach for testing the significance of differences between ROC curves measured from correlated data. In Deconinck F, ed: Information Processing in Medical Imaging VIII. The Hague: Martinus Nijhoff, 1984

11. Fleiss J, Davis M: Jackknifing functions of multinomial frequencies, with an application to a measure of concordance. Am J Epidemiol 115:841-845, 1982

12. Efron B: Bootstrap methods: another look at the jackknife. Ann Stat 7:1-26, 1979

13. McNeil BJ, Sanders R, Alderson PO, et al: A prospective study of computed tomography, ultrasound and gallium imaging in patients with fever. Radiology 139:647-653, 1981

14. McNeil BJ, Hanley JA, Funkenstein HH, Wallman J: The use of paired ROC curves in studying the impact of history on radiography interpretation: CT of the head as a case study. Radiology 149:75-77, 1983