Imperfect Gold Standards for Biomarker Evaluation

Rebecca A. Betensky

Conference on Statistical Issues in Clinical Trials

University of Pennsylvania

April 18, 2012




Outline

• Motivation: need for kidney injury biomarkers for diagnosis of acute kidney injury (AKI)

• Impact of imperfect gold standard on apparent sensitivity and specificity of perfect biomarker

• Examine conditional independence assumption: implicit restrictions

• Bounds on true sensitivity and specificity


Serum creatinine for AKI

• Clinicians have used SCr to diagnose AKI for decades.

• Acknowledged as inadequate gold standard:
– Poor specificity in some settings that are not associated with kidney injury
– Poor sensitivity in the setting of adequate renal reserve
– Relatively slow kinetics after injury

• Considerable interest in identifying better biomarkers of tubular injury: potentially more accurate and earlier diagnosis.


How to evaluate new biomarkers?

• Studies have used changes in SCr as the gold standard against which to test novel tubular injury biomarkers.

• Aside from problems of specificity and sensitivity:
– SCr does not directly reflect tubular function or injury
– SCr-based definitions of AKI rely on a cutoff, which affects SCr's true specificity and sensitivity, and thus the apparent performance of the novel marker


Conceptual framework

• Actual disease that is the target of the diagnostic test (AKI) is not synonymous with clinical conditions identified by imperfect gold standard (SCr).

• AKI is difficult to establish without invasive and risky histopathological assessment.

• Using imperfect gold standard (i.e., imperfect reference test) may distort apparent diagnostic performance of novel biomarker.


Idealized example of perfect novel biomarker


disease prevalence=20%

imperfect gold standard sensitivity=80%, specificity=80%

Relative to imperfect gold standard, a perfect novel biomarker will have apparent sensitivity of 50% and apparent specificity of 64/68=94%.
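As a check on the arithmetic, here is the same example worked per 100 patients (the per-100 scaling is ours). Because the perfect biomarker B coincides with disease status D, its apparent sensitivity is the fraction of G-positive patients who truly have AKI, and its apparent specificity is the fraction of G-negative patients who truly do not.

```latex
\begin{aligned}
\text{G-positive:}\quad & 0.8 \times 20 = 16 \text{ with AKI}, \qquad 0.2 \times 80 = 16 \text{ without AKI}\\
\text{G-negative:}\quad & 0.2 \times 20 = 4 \text{ with AKI}, \qquad 0.8 \times 80 = 64 \text{ without AKI}\\
\text{apparent sens} &= P(B=1 \mid G=1) = \frac{16}{16+16} = 50\%\\
\text{apparent spec} &= P(B=0 \mid G=0) = \frac{64}{4+64} \approx 94\%
\end{aligned}
```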


At lower prevalence, dominant effect of imperfect gold standard is on perfect biomarker’s apparent sensitivity:

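apparent sens:

$$\frac{p\,\mathrm{Se}_G}{p\,\mathrm{Se}_G+(1-p)(1-\mathrm{Sp}_G)}$$

apparent spec:

$$\frac{(1-p)\,\mathrm{Sp}_G}{(1-p)\,\mathrm{Sp}_G+p\,(1-\mathrm{Se}_G)}$$

where p is the disease prevalence and Se_G, Sp_G are the sensitivity and specificity of the imperfect gold standard G.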
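A minimal Python sketch of the perfect-biomarker calculation, computing apparent sens = p·Se_G / (p·Se_G + (1-p)(1-Sp_G)) and apparent spec = (1-p)·Sp_G / ((1-p)·Sp_G + p(1-Se_G)). The function name `apparent_perfect` is ours, not from the talk. It reproduces the 20%-prevalence example and then lowers the prevalence, holding Se_G = Sp_G = 80%, to show the apparent sensitivity collapsing while the apparent specificity barely moves.

```python
def apparent_perfect(p: float, se_g: float, sp_g: float) -> tuple[float, float]:
    """Apparent (sensitivity, specificity) of a perfect biomarker B = D
    when it is scored against an imperfect gold standard G."""
    tp = p * se_g              # P(D=1, G=1): G-positive with disease
    fp = (1 - p) * (1 - sp_g)  # P(D=0, G=1): G-positive without disease
    fn = p * (1 - se_g)        # P(D=1, G=0): G-negative with disease
    tn = (1 - p) * sp_g        # P(D=0, G=0): G-negative without disease
    return tp / (tp + fp), tn / (tn + fn)


# Slide example (prevalence 20%, Se_G = Sp_G = 80%), then lower prevalences.
for p in (0.20, 0.10, 0.05, 0.01):
    sens, spec = apparent_perfect(p, se_g=0.80, sp_g=0.80)
    print(f"p={p:.2f}: apparent sens={sens:.2f}, apparent spec={spec:.2f}")
```

At p = 0.20 this prints 0.50 and 0.94, matching the slide; by p = 0.01 the apparent sensitivity has fallen below 0.05 while the apparent specificity stays above 0.99.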


This is similar to using “need for dialysis” as the imperfect gold standard (perfect specificity, limited sensitivity).

At prevalence of 20%, apparent sensitivity of perfect biomarker is 100% and apparent specificity is 84%. The bounds of the apparent AUC are 0.84-1.00.

Even rare false positives (imperfect gold standard spec=99%) lead to apparent sensitivity of 86% and bounds of apparent AUC of 0.72-0.98.
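The numbers on this slide can be reproduced with a short sketch. The slide does not state the gold standard's sensitivity; `SE_G_ASSUMED = 0.25` below is our assumption, chosen because it reproduces the quoted values. The AUC range uses the fact that any monotone ROC curve through the single observed point (1 - spec, sens) has area at least sens × spec and at most sens + (1 - sens) × spec.

```python
P = 0.20             # prevalence, from the slide
SE_G_ASSUMED = 0.25  # ASSUMPTION: G's sensitivity is not given on the slide


def apparent_perfect(p, se_g, sp_g):
    """Apparent (sens, spec) of a perfect biomarker B = D scored against G."""
    tp, fp = p * se_g, (1 - p) * (1 - sp_g)
    fn, tn = p * (1 - se_g), (1 - p) * sp_g
    return tp / (tp + fp), tn / (tn + fn)


def auc_bounds(sens, spec):
    """Smallest and largest AUC of a monotone ROC curve through (1 - spec, sens)."""
    # Lowest curve: stays at 0 until FPR = 1 - spec, then at `sens` until FPR = 1.
    lower = sens * spec
    # Highest curve: rises to `sens` at FPR = 0, then to 1 at FPR = 1 - spec.
    upper = sens + (1 - sens) * spec
    return lower, upper


for sp_g in (1.00, 0.99):
    sens, spec = apparent_perfect(P, SE_G_ASSUMED, sp_g)
    lo, hi = auc_bounds(sens, spec)
    print(f"Sp_G={sp_g:.2f}: sens={sens:.2f}, spec={spec:.2f}, AUC in [{lo:.2f}, {hi:.2f}]")
```

Under this assumption the loop prints an AUC range of [0.84, 1.00] for Sp_G = 100% and [0.72, 0.98] for Sp_G = 99%, with apparent sensitivity 1.00 and 0.86 respectively.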


Cut-offs for SCr

• Recent clinical studies of novel AKI biomarkers have used a variety of SCr criteria to define AKI.

• These examples illustrate that different choices of cut-off can lead to very different apparent properties of a novel biomarker.


What if new biomarker is not perfect?

• Evaluating a new biomarker requires assumptions about how it relates to the imperfect gold standard and to disease.

• Conditional independence is convenient; allows for latent class models.

• However, it introduces implicit restrictions.


What can we learn about an imperfect novel biomarker?

• Previous illustration assumes perfect novel biomarker.

• Common assumption is conditional independence: P(B=b|G=g,D=d)=P(B=b|D=d)

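• Apparent sensitivity of B relative to G:

$$P(B=1\mid G=1)=\frac{p\,\mathrm{Se}_G\,\mathrm{Se}_B+(1-p)(1-\mathrm{Sp}_G)(1-\mathrm{Sp}_B)}{p\,\mathrm{Se}_G+(1-p)(1-\mathrm{Sp}_G)}$$

• Apparent specificity of B relative to G:

$$P(B=0\mid G=0)=\frac{(1-p)\,\mathrm{Sp}_G\,\mathrm{Sp}_B+p\,(1-\mathrm{Se}_G)(1-\mathrm{Se}_B)}{(1-p)\,\mathrm{Sp}_G+p\,(1-\mathrm{Se}_G)}$$

• Use these to solve for the “true” sensitivity and specificity of B relative to D.

• Bounds on apparent AUC:
– Apparent AUC ≥ apparent sens × apparent spec
– Apparent AUC ≤ apparent sens + (1 - apparent sens) × apparent spec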
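Under conditional independence the two equations for apparent sensitivity and specificity are linear in (Se_B, Sp_B), so they can be inverted directly. The sketch below (function names are ours) does the forward calculation and the inversion by Cramer's rule; the determinant is p(1 - p)(Se_G + Sp_G - 1), so the inversion needs 0 < p < 1 and an informative gold standard. Estimates that fall outside [0, 1] when fed observed apparent values are one symptom that the assumption does not fit the data.

```python
def apparent_under_ci(p, se_g, sp_g, se_b, sp_b):
    """Apparent (sens, spec) of B relative to G, assuming B and G are
    conditionally independent given disease status D."""
    pg1 = p * se_g + (1 - p) * (1 - sp_g)   # P(G=1)
    pg0 = 1 - pg1                            # P(G=0)
    sens = (p * se_g * se_b + (1 - p) * (1 - sp_g) * (1 - sp_b)) / pg1
    spec = ((1 - p) * sp_g * sp_b + p * (1 - se_g) * (1 - se_b)) / pg0
    return sens, spec


def true_from_apparent(p, se_g, sp_g, app_sens, app_spec):
    """Solve the two linear equations above for (Se_B, Sp_B) by Cramer's rule."""
    pg1 = p * se_g + (1 - p) * (1 - sp_g)
    pg0 = 1 - pg1
    # Rearranged system:  a11*Se_B + a12*Sp_B = r1,  a21*Se_B + a22*Sp_B = r2
    a11, a12 = p * se_g, -(1 - p) * (1 - sp_g)
    r1 = app_sens * pg1 - (1 - p) * (1 - sp_g)
    a21, a22 = -p * (1 - se_g), (1 - p) * sp_g
    r2 = app_spec * pg0 - p * (1 - se_g)
    det = a11 * a22 - a12 * a21              # equals p(1-p)(Se_G + Sp_G - 1)
    if abs(det) < 1e-12:
        raise ValueError("need 0 < p < 1 and Se_G + Sp_G != 1")
    se_b = (r1 * a22 - a12 * r2) / det
    sp_b = (a11 * r2 - a21 * r1) / det
    return se_b, sp_b


# Round trip with illustrative (hypothetical) values.
app = apparent_under_ci(0.3, 0.85, 0.90, se_b=0.70, sp_b=0.90)
print(true_from_apparent(0.3, 0.85, 0.90, *app))  # recovers ~(0.70, 0.90)
```

For instance, taking the first example table later in this section (apparent sens 25/35, apparent spec 60/65) with Se_G = 90%, Sp_G = 95%, and the prevalence p = 6/17 implied by P(G=1) = 0.35, the inversion gives Se_B ≈ 78% and Sp_B ≈ 96%, both inside the nonparametric bounds reported there.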


Problems with conditional independence

• May not be plausible from mechanistic or physiological perspective; the two tests measure related phenomena.

• May be association between disease severity and test results; two tests may be conditionally independent given disease severity, but not conditionally independent given presence or absence of disease.

• Assumption of conditional independence constrains the disease prevalence; may not be plausible.


Conditional Independence: disease severity

• Independence given disease severity:

P(G=1, B=1|D=1,X)=P(G=1|D=1,X)×P(B=1|D=1,X)

does not imply independence given disease:

P(G=1,B=1|D=1)=P(G=1|D=1)×P(B=1|D=1)
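A small numerical illustration; all probabilities here are hypothetical, chosen only for the example. Among patients with AKI, suppose severity X is mild or severe with equal probability, and G and B are independent within each severity level. Both tests are positive more often in severe disease, so after averaging over X they become positively associated even given D=1.

```python
# HYPOTHETICAL severity strata among patients with D=1 (not from the slides).
# level: (P(X=level | D=1), P(G=1 | D=1, X), P(B=1 | D=1, X))
severity = {
    "mild": (0.5, 0.2, 0.3),
    "severe": (0.5, 0.9, 0.9),
}


def joint_and_marginals(strata):
    """Return P(G=1,B=1|D=1), P(G=1|D=1), P(B=1|D=1), assuming G and B
    are independent within each severity level X."""
    both = sum(w * g * b for w, g, b in strata.values())
    g1 = sum(w * g for w, g, _ in strata.values())
    b1 = sum(w * b for w, _, b in strata.values())
    return both, g1, b1


both, g1, b1 = joint_and_marginals(severity)
print(f"P(G=1,B=1|D=1)          = {both:.3f}")     # 0.435
print(f"P(G=1|D=1) * P(B=1|D=1) = {g1 * b1:.3f}")  # 0.330
```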


Conditional Independence: disease prevalence

Conditional independence may not be possible at a given disease prevalence.


Bounds on prevalence under conditional independence

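Observed cross-classification of B and G (cell proportions, a + b + c + d = 1):

G=1 G=0

B=1 a b

B=0 c d

Under conditional independence, split into two tables, D=1 and D=0, where α, β, γ, δ are the fractions of each observed cell that have D=1:

D=1

G=1 G=0

B=1 αa βb

B=0 γc δd

D=0

G=1 G=0

B=1 (1-α)a (1-β)b

B=0 (1-γ)c (1-δ)d

Constraints: 0 ≤ α, β, γ, δ ≤ 1, and B and G must be independent within each table, i.e. αa·δd = βb·γc and (1-α)a·(1-δ)d = (1-β)b·(1-γ)c.

p = P(D=1) = αa + βb + γc + δd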
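One way to see why conditional independence restricts p (a short derivation of ours, not taken from the slides): write b_d = P(B=1 | D=d) and g_d = P(G=1 | D=d), so that b_1 = Se_B, b_0 = 1 - Sp_B, g_1 = Se_G, g_0 = 1 - Sp_G. Under conditional independence the observed covariance between B and G factors as follows.

```latex
\begin{aligned}
ad - bc &= P(B=1,G=1) - P(B=1)\,P(G=1)\\
 &= p\,b_1 g_1 + (1-p)\,b_0 g_0 - \bigl(p\,b_1 + (1-p)\,b_0\bigr)\bigl(p\,g_1 + (1-p)\,g_0\bigr)\\
 &= p(1-p)\,(b_1 - b_0)(g_1 - g_0)\\
 &= p(1-p)\,(\mathrm{Se}_B + \mathrm{Sp}_B - 1)(\mathrm{Se}_G + \mathrm{Sp}_G - 1)
\end{aligned}
```

Because each Youden index Se + Sp - 1 lies in [-1, 1], the observed ad - bc can be no larger in magnitude than p(1 - p); the marginal totals a + b and a + c restrict the attainable values further.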


Example

Ignoring sampling variability, conditional independence is possible only for p ∈ [0.285, 0.715]; for any prevalence outside this interval it is not possible.

G=1 G=0

B=1 30% 5%

B=0 15% 50%
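The interval can be checked numerically with the sketch below (our reconstruction; function names are hypothetical). Under conditional independence, ad - bc = p(1 - p)(b_1 - b_0)(g_1 - g_0), where b_d = P(B=1 | D=d) and g_d = P(G=1 | D=d). For a given p, the marginal totals P(B=1) and P(G=1) cap how far apart b_1, b_0 and g_1, g_0 can be, so the split is possible exactly when the largest attainable product reaches the observed ad - bc. Scanning p on a grid of step 0.001 recovers the slide's interval to within the grid resolution.

```python
def max_stratum_gap(m, p):
    """Largest attainable P(X=1|D=1) - P(X=1|D=0) ("up") and
    P(X=1|D=0) - P(X=1|D=1) ("down"), given P(X=1) = m and P(D=1) = p."""
    up = min(m / p, (1 - m) / (1 - p))
    down = min((1 - m) / p, m / (1 - p))
    return up, down


def ci_possible(a, b, c, d, p):
    """Can the observed B-by-G table (proportions a, b, c, d summing to 1,
    laid out as on the slide) be split into D=1 and D=0 tables in which
    B and G are independent?"""
    pb, pg = a + b, a + c            # P(B=1), P(G=1)
    cov = a * d - b * c              # = P(B=1,G=1) - P(B=1)P(G=1)
    b_up, b_down = max_stratum_gap(pb, p)
    g_up, g_down = max_stratum_gap(pg, p)
    if cov >= 0:                     # the two gaps must have the same sign
        best = max(b_up * g_up, b_down * g_down)
    else:                            # the two gaps must have opposite signs
        best = max(b_up * g_down, b_down * g_up)
    # Any covariance between 0 and the maximum is attainable by shrinking the gaps.
    return p * (1 - p) * best >= abs(cov)


grid = [i / 1000 for i in range(1, 1000)]
feasible = [p for p in grid if ci_possible(0.30, 0.05, 0.15, 0.50, p)]
print(f"CI possible for p in [{min(feasible):.3f}, {max(feasible):.3f}]")
```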


Other dependence assumptions

• With more tests, some methods model the dependence between some of the tests. This is arbitrary and cannot be tested without a sufficiently rich study.

• Discrepant resolution method; disfavored due to bias.

• Composite reference method; success depends on reliability of reference tests.


Bounds on true sensitivity and specificity of a new biomarker

• Explore information available from the comparison of B and G, when no assumptions are made regarding their dependence.

• Assume operating characteristics of G are known.

• Derive bounds for operating characteristics of B.


Idea

• Simply by constraining the cells in the cross-tabulation of G and (B, D) to lie between 0 and 1, we derive bounds for:
– P(D=1, B=1 | G=1)
– P(D=0, B=0 | G=0)

• True sensitivity and specificity of B are maximized at the maxima of these and minimized at their minima.
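The cell-bounding idea can be sketched as follows; this is our reconstruction, not the authors' code, and `bounds_for_b` is a hypothetical name. G's sensitivity and specificity fix the prevalence p through P(G=1) = p·Se_G + (1-p)(1-Sp_G), and hence the D=1 and D=0 counts within each G stratum. Within each stratum, the count of patients with a given (B, D) combination is then bounded by the usual limits for a 2×2 table with known margins. Applied to the two example tables in this section, it gives roughly 62% to 81% and 87% to 98% for the first, and 33% to 67% and 71% to 78% for the second, matching the slides to within about one percentage point.

```python
def bounds_for_b(n11, n10, n01, n00, se_g, sp_g):
    """Nonparametric bounds on the true sensitivity and specificity of B.

    n11 = #(B=1, G=1), n10 = #(B=1, G=0), n01 = #(B=0, G=1), n00 = #(B=0, G=0).
    Se_G and Sp_G are treated as known; nothing is assumed about how B and G
    depend on each other.
    """
    n = n11 + n10 + n01 + n00
    gtot1, gtot0 = n11 + n01, n10 + n00
    p = (gtot1 / n - (1 - sp_g)) / (se_g + sp_g - 1)  # prevalence implied by G
    if not 0 < p < 1:
        raise ValueError("table is incompatible with the given Se_G, Sp_G")
    # Disease counts within each G stratum, fixed once Se_G and Sp_G are known.
    d1_g1, d1_g0 = n * p * se_g, n * p * (1 - se_g)
    d0_g1, d0_g0 = n * (1 - p) * (1 - sp_g), n * (1 - p) * sp_g

    def cell_range(row, col, total):
        # Range of a cell count in a 2x2 table with this row and column total.
        return max(0.0, row + col - total), min(row, col)

    # #(B=1, D=1), summed over the two G strata, gives sensitivity of B.
    lo1, hi1 = cell_range(n11, d1_g1, gtot1)
    lo0, hi0 = cell_range(n10, d1_g0, gtot0)
    sens = ((lo1 + lo0) / (n * p), (hi1 + hi0) / (n * p))
    # #(B=0, D=0), summed over the two G strata, gives specificity of B.
    lo1, hi1 = cell_range(n01, d0_g1, gtot1)
    lo0, hi0 = cell_range(n00, d0_g0, gtot0)
    spec = ((lo1 + lo0) / (n * (1 - p)), (hi1 + hi0) / (n * (1 - p)))
    return sens, spec


# The two example tables (B=1,G=1; B=1,G=0; B=0,G=1; B=0,G=0), Se_G=90%, Sp_G=95%.
for table in ((25, 5, 10, 60), (10, 20, 10, 60)):
    (s_lo, s_hi), (t_lo, t_hi) = bounds_for_b(*table, se_g=0.90, sp_g=0.95)
    print(f"{table}: sens of B in [{s_lo:.0%}, {s_hi:.0%}], "
          f"spec of B in [{t_lo:.0%}, {t_hi:.0%}]")
```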


Example

G=1 G=0

B=1 25 5

B=0 10 60

• Apparent sens=25/35=71%

• Apparent spec=60/65=92%

• Suppose sens of G is 90% and spec of G is 95%

• True sens of B is (61%,81%)

• True spec of B is (87%,98%)

• These bounds are reasonably narrow.


Example

G=1 G=0

B=1 10 20

B=0 10 60

• Apparent sens=50%

• Apparent spec=75%

• Suppose sens of G is 90% and spec of G is 95%

• The true sens of B is (33%,67%)

• True spec of B is (71%,78%)

• Bound for sens is quite wide, ranging from poor test to possibly adequate; bound for spec is narrow.


Conclusions

• Low sensitivity of a promising kidney injury biomarker when the expected prevalence of disease is low (e.g., contrast nephropathy – NGAL sensitivity=78%) raises the question of imperfect specificity of the “gold standard”.

• Likewise, low specificity when the expected prevalence is high (e.g., ICU with hypotension and sepsis – NGAL spec=76% when applied to critically ill patients) raises the question of imperfect sensitivity of the gold standard.


Conclusions

• Need “hard” clinical endpoints for use as gold standard, but even these have potential problems (e.g., long latency, confounding by other risk factors).

• Could use exposure status, such as to nephrotoxic drug, to avoid SCr.

• Amount of information in comparing new biomarker to imperfect gold standard may not be very high, even if imperfect gold standard is a good test itself.

• Conditional independence is problematic – physiologically and technically.

• Nonparametric bounds may or may not be useful, but they certainly reflect the true information content of the data.

• Ultimate validation of a biomarker’s utility is demonstration in a randomized clinical trial that it alters clinical management and improves clinical outcomes.


Acknowledgments

• Sarah Emerson, PhD
• Sushrut Waikar, MD
• Joseph Bonventre, MD

Waikar SS, Betensky RA, Emerson SC, Bonventre JV (2012). Imperfect gold standards for kidney injury biomarker evaluation. J Am Soc Nephrol 23: 13-21.

Emerson SC, Waikar SS, Bonventre JV, Betensky RA (2012). Biomarker validation with an imperfect reference: issues and bounds. Unpublished manuscript.


With low prevalence, maintaining high specificity is more important than high sensitivity.