Aggregating Published Prediction Models with Individual Patient Data


A Comparison of Approaches

T.P.A. Debray, H. Koffijberg, Y. Vergouwe, K.G.M. Moons, E.W. Steyerberg

Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht
Center for Medical Decision Sciences, Erasmus Medical Center Rotterdam

The Netherlands

T.Debray@umcutrecht.nl

Introduction

Generalization of Prediction Models

Sensible Modeling Strategies

Large Sample Size

Penalization and Shrinkage

Incorporation of Prior Evidence

Thomas Debray (Julius Center) Aggregating Prediction Models ISCB 2011 2 / 13

Practical Example

Traumatic Brain Injury (TBI)

Mushkudiani review (J Clin Epidemiol 2008)

Although most models agree on the three most important predictors, many were developed on small sample sizes within single centers and hence lack generalizability. Modeling strategies have to be improved and should include external validation.

Illustration: Derivation of a novel Prediction Model

Outcome: favorable vs unfavorable outcome 6 months after TBI
Patient data at hand (n = 603)
Predictors: age, motor score (6 categories), pupillary reactivity (3 categories)

Modeling Strategies

Regular Logistic Regression (LRM)
Penalized Likelihood Regression (PMLE)


Practical Example

Traumatic Brain Injury (TBI)

External Validation (AUC)

Validation cohort     LRM    PMLE
1. EBIC (n = 822)     0.79   0.80
2. SLIN (n = 409)     0.72   0.72
3. NABIS (n = 385)    0.71   0.72

Can we improve the external validity of a new prediction model by aggregating newly collected IPD with prior evidence?

Prior evidence

11 studies on TBI
Similar set of predictors
Heterogeneity


Methods

Approach 1: Independent Pooling

Typical meta-analysis

Fixed (or random) effects

Multivariable regression coefficients are pooled independently
Weights are based on standard errors or sample size

Update intercept using IPD

Example (coefficient of the first predictor, pooled over m studies):

β̂₁ = (Σⱼ β₁ⱼ σ₁ⱼ⁻¹) / (Σⱼ σ₁ⱼ⁻¹),  with  σ̂₁ = 1 / Σⱼ σ₁ⱼ⁻¹  (j = 1, ..., m)
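For illustration, this pooling step can be sketched in a few lines of Python; the coefficients and standard errors below are invented numbers, not the TBI estimates.

```python
import numpy as np

# Hypothetical coefficients for one predictor from m = 4 published models,
# together with their standard errors (illustrative values only)
beta = np.array([0.9, 1.1, 1.0, 1.2])
se = np.array([0.20, 0.25, 0.15, 0.30])

# Fixed-effect pooling with weights based on the standard errors,
# following the formula above (w_j = 1 / sigma_1j)
w = 1.0 / se
beta_pooled = (w * beta).sum() / w.sum()
se_pooled = 1.0 / w.sum()

print(beta_pooled, se_pooled)
```

The intercept of the pooled model would then still be updated using the IPD, as the slide notes.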


Methods

Approach 2: Bayesian Inference

Prior: previously published prediction models

Regression coefficients: (β₀, β₁, ..., βₖ)ⱼ
Variances: diag(Σⱼ)
Multivariate random effects with penalization: µ_PRIOR, Σ_PRIOR

Likelihood: newly derived prediction model

Regression coefficients: µ_IPD = (β₀, β₁, ..., βₖ)ᵀ_IPD

Variance–covariance matrix: Σ_IPD

Posterior: aggregated prediction model

µ_POST = Σ_POST (Σ_PRIOR⁻¹ µ_PRIOR + Σ_IPD⁻¹ µ_IPD)

Σ_POST = (Σ_PRIOR⁻¹ + Σ_IPD⁻¹)⁻¹
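As a sketch, this normal–normal update can be written directly with NumPy; the prior and IPD values below are invented for illustration, not taken from the talk.

```python
import numpy as np

# Illustrative 2-coefficient case (intercept + one predictor); the actual
# models carry a full coefficient vector and covariance matrix
mu_prior = np.array([-3.0, 1.0])      # summary of the literature models
Sigma_prior = np.diag([0.5, 0.2])
mu_ipd = np.array([-2.7, 1.3])        # model fitted on the IPD at hand
Sigma_ipd = np.diag([0.3, 0.1])

# Posterior of the aggregated model:
#   Sigma_post = (Sigma_prior^-1 + Sigma_ipd^-1)^-1
#   mu_post    = Sigma_post (Sigma_prior^-1 mu_prior + Sigma_ipd^-1 mu_ipd)
P_prior = np.linalg.inv(Sigma_prior)
P_ipd = np.linalg.inv(Sigma_ipd)
Sigma_post = np.linalg.inv(P_prior + P_ipd)
mu_post = Sigma_post @ (P_prior @ mu_prior + P_ipd @ mu_ipd)

print(mu_post)
```

Note how the posterior mean is a precision-weighted compromise between the literature prior and the IPD estimate, which is exactly what makes the approach attractive when the IPD sample is small.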


Results

Practical Example: Validation Results

Discrimination

[Figure: AUC (y-axis, 0.5–1.0) in the EBIC, SLIN, and NABIS validation cohorts, comparing Logistic Regression, PMLE, Independent Pooling, and Bayesian Inference]


Results

Practical Example: Validation Results

Calibration

[Figure: calibration slope (y-axis, 0.6–1.3) in the EBIC, SLIN, and NABIS validation cohorts, comparing Logistic Regression, PMLE, Independent Pooling, and Bayesian Inference]


Results

Simulation Study

Reference model with 5 predictors for generating data

Individual patient data at hand (n_IPD = 200 to 1000):
(b₀, b₁, ..., b₅)_IPD = {−3, 1, 1, 1, 1, 1}

4 heterogeneous literature studies (nⱼ = 500; data discarded after derivation of the literature models):
(b₀, b₁, ..., b₅)ⱼ ~ N₆((−3, 1, 1, 1, 1, 1), Γ) with Γ ~ Wish⁻¹(v, S)

Validation data (n_VAL = 5000):
(b₀, b₁, ..., b₅)_VAL = {−3, 1, 1, 1, 1, 1}
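A minimal sketch of this data-generating step in Python. For simplicity the between-study covariance Γ is fixed to a diagonal matrix here rather than drawn from an inverse-Wishart, and all numbers besides the reference coefficients are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Reference coefficients of the data-generating logistic model
b_ref = np.array([-3.0, 1.0, 1.0, 1.0, 1.0, 1.0])

# Between-study covariance; the talk draws Gamma from an inverse-Wishart,
# but a fixed diagonal matrix keeps this sketch dependency-free
Gamma = 0.1 * np.eye(6)

# Coefficients of 4 heterogeneous "literature" models
b_lit = rng.multivariate_normal(b_ref, Gamma, size=4)

# Simulate one literature data set (n_j = 500) with a binary outcome
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
p = 1.0 / (1.0 + np.exp(-X @ b_lit[0]))
y = rng.binomial(1, p)
print(b_lit.shape)
```

In the actual study the literature data sets would be discarded after fitting, so that only the published coefficients and standard errors remain available.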

Evaluation of Performance

Mean Squared Error of predictor associations:

MSE_coef = (1/r) Σᵨ Σₖ (β̂ₖ − bₖ,VAL)², summing over replicates ρ = 1, ..., r and predictors k = 1, ..., 5

Area Under the ROC curve (AUC)
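The MSE criterion can be computed directly; the estimated coefficients below stand in for r = 3 hypothetical simulation replicates.

```python
import numpy as np

# True predictor effects in the validation population (b_1..b_5 = 1)
b_val = np.ones(5)

# Hypothetical estimated coefficients from r = 3 simulation replicates
beta_hat = np.array([
    [0.9, 1.1, 1.0, 0.8, 1.2],
    [1.1, 0.9, 1.2, 1.0, 0.9],
    [1.0, 1.0, 0.9, 1.1, 1.0],
])

# MSE_coef = (1/r) * sum over replicates of sum over k of (beta_hat_k - b_k,VAL)^2
mse_coef = np.mean(np.sum((beta_hat - b_val) ** 2, axis=1))
print(mse_coef)
```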



Results

Weak heterogeneity

[Figure: MSE_coef (0–1.0) and AUC (0.86–0.90) against n_IPD/Σn_LIT (0.1–0.5) under weak heterogeneity, comparing Logistic Regression, Independent Pooling, and Bayesian Inference]


Results

Moderate heterogeneity

[Figure: MSE_coef (0–1.0) and AUC (0.86–0.90) against n_IPD/Σn_LIT (0.1–0.5) under moderate heterogeneity, comparing Logistic Regression, Independent Pooling, and Bayesian Inference]


Results

Strong heterogeneity

[Figure: MSE_coef (0–1.0) and AUC (0.86–0.90) against n_IPD/Σn_LIT (0.1–0.5) under strong heterogeneity, comparing Logistic Regression, Independent Pooling, and Bayesian Inference]


Discussion


Recommendations

Standard approach: strongly heterogeneous evidence and relatively many data at hand
Independent Pooling: homogeneous evidence and relatively few data at hand
Bayesian Inference: moderately heterogeneous evidence and relatively few data at hand

Conclusion

Bayesian Inference performs similarly to or better than the standard approach, except in the presence of strong heterogeneity. Aggregation of prior evidence may not be desirable in those scenarios.

Future Work

Optimization of the shrinkage intensity in the Bayesian approach
Incorporation of prediction models with different predictors
Quantification of heterogeneity across prediction models


