Diagnostic Testing
Ethan Cowan, MD, MS
Department of Emergency Medicine, Jacobi Medical Center
Department of Epidemiology and Population Health, Albert Einstein College of Medicine
The Provider Dilemma
A 26-year-old pregnant woman presents after twisting her ankle. She has no abdominal or urinary complaints. The nurse sends a urinalysis (UA) and a Uricult dipslide before you see the patient. What should you do with the results of these tests?
The Provider Dilemma
Should a provider give antibiotics if either one or both of these tests come back positive?
Why Order a Diagnostic Test?
When the diagnosis is uncertain
Incorrect diagnosis leads to clinically significant morbidity or mortality
Diagnostic test result changes management
The test is cost-effective
Clinician Thought Process
The clinician derives the patient's prior probability of disease from the H&P, the literature, and experience
"Index of suspicion": 0% - 100%, or "low, medium, high"
Threshold Approach to Diagnostic Testing
P < P(-): diagnostic testing and therapy not indicated
P(-) < P < P(+): diagnostic testing needed prior to therapy
P > P(+): intervention indicated without further testing
Pauker and Kassirer, 1980; Gallagher, 1998
[Figure: probability of disease scale from 0% to 100%, with the testing zone between P(-) and P(+)]
Threshold Approach to Diagnostic Testing
Width of the testing zone depends on:
Test properties
Risk of excess morbidity/mortality attributable to the test
Risk/benefit ratio of the available therapies for the diagnosis
[Figure: probability of disease scale, as above, with the testing zone between P(-) and P(+)]
Pauker and Kassirer, 1980; Gallagher, 1998
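To make the decision rule concrete, here is a minimal Python sketch of the threshold approach as stated above; the function name and the example thresholds are hypothetical, not from the talk.

```python
def threshold_decision(p, p_minus, p_plus):
    """Threshold approach to diagnostic testing (after Pauker and Kassirer).

    p       -- prior probability of disease (0.0 to 1.0)
    p_minus -- testing threshold: below it, neither testing nor therapy is indicated
    p_plus  -- treatment threshold: above it, intervene without further testing
    """
    if p < p_minus:
        return "No testing or therapy indicated"
    if p > p_plus:
        return "Intervene without further testing"
    return "Diagnostic testing needed before therapy"

# Hypothetical example: prior probability 30%, testing zone from 10% to 70%
print(threshold_decision(0.30, 0.10, 0.70))
```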
Test Characteristics
Reliability
  Inter-observer
  Intra-observer
  Correlation
  Bland & Altman plot
  Simple agreement
  Kappa statistics
Validity
  Sensitivity
  Specificity
  NPV
  PPV
  ROC curves
Reliability
The extent to which results obtained with a test are reproducible.
Reliability
[Figure: illustration contrasting a not reliable test with a reliable one]
Intra-rater Reliability
Extent to which a measure produces the same result at different times for the same subjects
Inter-rater Reliability
Extent to which a measure produces the same result on each subject regardless of who makes the observation
Correlation (r)
For continuous data
r = 1: perfect correlation
r = 0: no correlation
[Figure: scatter plot of O1 versus O2 with the line O1 = O2]
Bland & Altman, 1986
Correlation (r)
Measures relation strength, not agreement
Problem: even near-perfect correlation may indicate significant differences between observations
[Figure: scatter plot of O1 versus O2 with r = 0.8 and the line O1 = O2]
Bland & Altman, 1986
Bland & Altman Plot
For continuous data
Plot of the differences between observations versus their means
Data that are evenly distributed around 0 and lie within 2 standard deviations exhibit good agreement
[Figure: Bland & Altman plot of O1 - O2 (from -10 to 10) versus (O1 + O2) / 2, centered on 0]
Bland & Altman, 1986
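As an illustration of how such a plot is built, here is a short Python sketch on simulated data; the observer measurements, bias, and noise are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical paired measurements from two observers (or two methods)
rng = np.random.default_rng(0)
o1 = rng.normal(100, 15, size=50)
o2 = o1 + rng.normal(2, 5, size=50)  # second observer with a small bias and noise

diff = o1 - o2                 # differences between observations
mean = (o1 + o2) / 2           # means of each pair
bias = diff.mean()             # average difference (systematic bias)
sd = diff.std(ddof=1)          # standard deviation of the differences

plt.scatter(mean, diff)
plt.axhline(bias, label="mean difference")
plt.axhline(bias + 2 * sd, linestyle="--", label="+2 SD")
plt.axhline(bias - 2 * sd, linestyle="--", label="-2 SD")
plt.xlabel("(O1 + O2) / 2")
plt.ylabel("O1 - O2")
plt.legend()
plt.show()
```

Good agreement is suggested when most differences fall within the 2 SD limits and show no trend across the range of means.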
Simple Agreement
Extent to which two or more raters agree on the classifications of all subjects
Percent concordance in the 2 x 2 table: (a + d) / N
Not ideal: subjects may fall on the diagonal by chance
                Rater 1
Rater 2       -        +       total
   -          a        b       a + b
   +          c        d       c + d
 total      a + c    b + d       N
Kappa
The proportion of the best possible improvement in agreement beyond chance obtained by the observers
K = (pa - po) / (1 - po)
pa = (a + d) / N (observed proportion of agreement, the main diagonal)
po = [(a + b)(a + c) + (c + d)(b + d)] / N² (proportion of agreement expected by chance)
                Rater 1
Rater 2       -        +       total
   -          a        b       a + b
   +          c        d       c + d
 total      a + c    b + d       N
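A minimal Python sketch of this calculation from the four cell counts; the function name and the example counts are hypothetical.

```python
def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2 x 2 agreement table.

    a, d -- counts where the two raters agree (main diagonal)
    b, c -- counts where they disagree
    """
    n = a + b + c + d
    pa = (a + d) / n                                      # observed agreement
    po = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # agreement expected by chance
    return (pa - po) / (1 - po)

# Hypothetical example: 40 + 45 agreements, 10 + 5 disagreements out of 100 subjects
print(round(cohen_kappa(40, 10, 5, 45), 2))  # 0.7
```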
Interpreting Kappa Values
K = 1              Perfect
K > 0.80           Excellent
0.60 < K < 0.80    Good
0.40 < K < 0.60    Fair
0 < K < 0.40       Poor
K = 0              Chance (pa = po)
K < 0              Less than chance
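As a small illustration of applying this scale, a Python sketch; the function name is hypothetical and the boundaries simply restate the table above.

```python
def interpret_kappa(k):
    """Qualitative interpretation of a kappa value, per the scale above."""
    if k < 0:
        return "Less than chance"
    if k == 0:
        return "Chance"
    if k < 0.40:
        return "Poor"
    if k < 0.60:
        return "Fair"
    if k < 0.80:
        return "Good"
    if k < 1:
        return "Excellent"
    return "Perfect"

print(interpret_kappa(0.70))  # Good
```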
Weighted Kappa
Used for more than 2 observers or categories
Perfect agreement on the main diagonal is weighted more than partial agreement off of it
                    Rater 1
Rater 2      1      2     ...     C      total
   1        n11    n12    ...    n1C      n1.
   2        n21    n22    ...    n2C      n2.
   .         .      .             .        .
   C        nC1    nC2    ...    nCC      nC.
 total      n.1    n.2    ...    n.C       N
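A brief Python sketch of a weighted kappa for a C x C table, using linear agreement weights as one common choice; the function name, the weighting scheme, and the example table are assumptions for illustration, not taken from the talk.

```python
import numpy as np

def weighted_kappa(table):
    """Weighted kappa for a C x C agreement table of counts from two raters.

    Uses linear weights: full credit on the main diagonal, partial credit
    that decreases with the distance between the two raters' categories.
    """
    table = np.asarray(table, dtype=float)
    c = table.shape[0]
    n = table.sum()
    p = table / n                              # observed cell proportions
    row = p.sum(axis=1)                        # one rater's marginals
    col = p.sum(axis=0)                        # the other rater's marginals
    i, j = np.indices((c, c))
    w = 1 - np.abs(i - j) / (c - 1)            # linear agreement weights
    po_obs = (w * p).sum()                     # weighted observed agreement
    po_exp = (w * np.outer(row, col)).sum()    # weighted chance agreement
    return (po_obs - po_exp) / (1 - po_exp)

# Hypothetical 3-category example
print(round(weighted_kappa([[20, 5, 1], [4, 30, 6], [1, 5, 28]]), 2))
```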
Validity
The degree to which a test correctly diagnoses people as having or not having a condition
Internal Validity
External Validity
Validity
[Figure: illustration contrasting a test that is valid but not reliable with one that is both reliable and valid]
Internal Validity
Performance characteristics: sensitivity, specificity, NPV, PPV, ROC curves
2 x 2 Table
                    Disease Status
Test Result      cases     noncases     total
     +             TP         FP        positives
     -             FN         TN        negatives
   total         cases     noncases        N

TP = True Positives
FP = False Positives
FN = False Negatives
TN = True Negatives
Gold Standard
Definitive test used to identify cases
Example: traditional agar culture
The dipstick and dipslide are measured against the gold standard
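To make the bookkeeping concrete, a short Python sketch that tallies TP, FP, FN, and TN by comparing each test result against the gold standard; the paired results listed here are hypothetical.

```python
# Hypothetical paired results: each entry is (test_positive, gold_standard_positive)
results = [(True, True), (True, False), (False, True), (False, False),
           (True, True), (False, False), (False, False), (True, False)]

tp = sum(t and g for t, g in results)                # test (+), gold standard (+)
fp = sum(t and not g for t, g in results)            # test (+), gold standard (-)
fn = sum((not t) and g for t, g in results)          # test (-), gold standard (+)
tn = sum((not t) and (not g) for t, g in results)    # test (-), gold standard (-)

print(tp, fp, fn, tn)  # 2 2 1 3
```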
Sensitivity (SN)
[2 x 2 table of TP, FP, FN, and TN, as above]
Probability of correctly identifying a true case
SN = TP / (TP + FN) = TP / cases
High SN: a negative test result rules out the diagnosis (SnNout)
Sackett & Straus, 1998
Specificity (SP)
[2 x 2 table of TP, FP, FN, and TN, as above]
Probability of correctly identifying a true noncase
SP = TN / (TN + FP) = TN / noncases
High SP: a positive test result rules in the diagnosis (SpPin)
Sackett & Straus, 1998
Problems with Sensitivity and Specificity
Remain constant across patient populations
But SN and SP convey how likely a test result is to be positive or negative given that the patient does or does not have disease
Paradoxical inversion of clinical logic: prior knowledge of disease status would obviate the need for the diagnostic test
Gallagher, 1998
Positive Predictive Value (PPV)
[2 x 2 table of TP, FP, FN, and TN, as above]
Probability that a subject labeled (+) is a true case
PPV = TP / (TP + FP) = TP / total positives
High SP corresponds to very high PPV (SpPin)
Sackett & Straus, 1998
Negative Predictive Value (NPV)
[2 x 2 table of TP, FP, FN, and TN, as above]
Probability that a subject labeled (-) is a true noncase
NPV = TN / (TN + FN) = TN / total negatives
High SN corresponds to very high NPV (SnNout)
Sackett & Straus, 1998
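A minimal Python sketch computing all four quantities from the 2 x 2 counts; the function name and the example counts are hypothetical.

```python
def test_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2 x 2 table."""
    return {
        "SN":  tp / (tp + fn),   # true cases correctly labeled positive
        "SP":  tn / (tn + fp),   # true noncases correctly labeled negative
        "PPV": tp / (tp + fp),   # positives that are true cases
        "NPV": tn / (tn + fn),   # negatives that are true noncases
    }

# Hypothetical counts: 80 TP, 20 FP, 10 FN, 90 TN
print(test_metrics(80, 20, 10, 90))
```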
Predictive Value Problems
Vulnerable to shifts in disease prevalence (P)
Do not remain constant across patient populations
As P increases, PPV increases and NPV decreases; as P decreases, PPV decreases and NPV increases
Gallagher, 1998
Flipping a Coin to Dx AMI for People with Chest Pain
ED AMI prevalence: 6%

              AMI    No AMI    total
Heads (+)      3       47        50
Tails (-)      3       47        50
  total        6       94       100

SN = 3 / 6 = 50%      SP = 47 / 94 = 50%
PPV = 3 / 50 = 6%     NPV = 47 / 50 = 94%
Worster, 2002
Flipping a Coin to Dx AMI for People with Chest Pain
CCU AMI prevalence: 90%

              AMI    No AMI    total
Heads (+)     45        5        50
Tails (-)     45        5        50
  total       90       10       100

SN = 45 / 90 = 50%     SP = 5 / 10 = 50%
PPV = 45 / 50 = 90%    NPV = 5 / 50 = 10%
Worster, 2002
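The same prevalence dependence can be reproduced from SN, SP, and prevalence alone; a minimal Python sketch (function name chosen for illustration) applied to the coin-flip example, where SN = SP = 50%.

```python
def predictive_values(sn, sp, prevalence):
    """PPV and NPV from sensitivity, specificity, and disease prevalence."""
    p = prevalence
    ppv = sn * p / (sn * p + (1 - sp) * (1 - p))
    npv = sp * (1 - p) / (sp * (1 - p) + (1 - sn) * p)
    return ppv, npv

# Coin flip as a "test": SN = SP = 0.5
ppv, npv = predictive_values(0.5, 0.5, 0.06)
print(f"ED:  PPV {ppv:.0%}, NPV {npv:.0%}")   # PPV 6%, NPV 94%
ppv, npv = predictive_values(0.5, 0.5, 0.90)
print(f"CCU: PPV {ppv:.0%}, NPV {npv:.0%}")   # PPV 90%, NPV 10%
```

With a completely uninformative test (SN = SP = 50%), PPV simply tracks the prevalence, which is exactly the pattern in the two tables above.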
Receiver Operator Curve
Allows consideration of test performance across a range of threshold values
Well suited for diagnostic tests based on a continuous variable
[Figure: ROC curve, sensitivity (TPR) on the y-axis versus 1-specificity (FPR) on the x-axis, both from 0.0 to 1.0]
Receiver Operator Curve
Avoids the “single cutoff trap”
[Figure: WBC count distributions with and without sepsis, illustrating the single cutoff trap]
Gallagher, 1998
Area Under the Curve (θ)
[Figure: ROC curve with the area under the curve (θ) shaded; axes as above]
Measure of test accuracy:
θ = 0.5 - 0.7: no to low discriminatory power
θ = 0.7 - 0.9: moderate discriminatory power
θ > 0.9: high discriminatory power
Gryzybowski, 1997
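A short Python sketch of how an ROC curve and its area can be computed for a continuous marker such as the WBC count; the simulated data and variable names are hypothetical.

```python
import numpy as np

# Hypothetical continuous marker (e.g., WBC count) and true disease labels
rng = np.random.default_rng(1)
marker = np.concatenate([rng.normal(14, 3, 100),    # cases
                         rng.normal(10, 3, 300)])   # noncases
disease = np.concatenate([np.ones(100), np.zeros(300)])

# Sweep every observed value as a cutoff: call the test positive if marker >= cutoff
cutoffs = np.sort(np.unique(marker))[::-1]
tpr = [0.0] + [(marker[disease == 1] >= c).mean() for c in cutoffs]  # sensitivity
fpr = [0.0] + [(marker[disease == 0] >= c).mean() for c in cutoffs]  # 1 - specificity

# Area under the ROC curve by the trapezoid rule
auc = sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2 for i in range(1, len(tpr)))
print(round(auc, 2))
```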
Problems with ROC Curves
Same problems as SN and SP: the same "reverse logic"
Mainly used to describe diagnostic test performance
Appendicitis Example
Study design: prospective cohort
Gold standard: pathology report from appendectomy, or CT finding (for negatives)
Diagnostic test: total WBC
Cardall, 2004
[Figure: study flow diagram with physical exam leading to CT scan or OR, classifying patients as Appy (+) or No Appy (-)]
Appendicitis Example
WBC         Appy    Not Appy    Total
> 10,000      66        89        155
< 10,000      21        98        119
Total         87       187        274

SN 76% (65%-84%)      SP 52% (45%-60%)
PPV 42% (35%-51%)     NPV 82% (74%-89%)
Cardall, 2004
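As a quick check, the point estimates above follow directly from the table counts; a brief Python sketch (the confidence intervals are not reproduced here):

```python
tp, fp, fn, tn = 66, 89, 21, 98   # WBC > 10,000 taken as the positive test
print(f"SN  {tp / (tp + fn):.1%}")   # 75.9%
print(f"SP  {tn / (tn + fp):.1%}")   # 52.4%
print(f"PPV {tp / (tp + fp):.1%}")   # 42.6%
print(f"NPV {tn / (tn + fn):.1%}")   # 82.4%
```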
Appendicitis Example
Patient WBC: 13,000
Management: get CT with PO and IV contrast
Cardall, 2004
[Figure: study flow diagram, as above]
Abdominal CT
Follow Up
CT result: acute appendicitis
Patient taken to OR for appendectomy
But was the WBC necessary?
The answer is given in the talk on Likelihood Ratios.