Uncertainty in QSAR Predictions – Bayesian Inference and the Magic of Bootstrap
Ullrika Sahlin PhD, Centre for Environmental and Climate Research (CEC)

Quantitative uncertainty in QSAR predictions - Bayesian predictive inference and the magic of bootstrap

DESCRIPTION

This is the presentation from my talk at the excellent Gordon Research Conference on Computer Aided Drug Design 2013.

Page 1: Quantitative uncertainty in QSAR predictions - Bayesian predictive inference and the magic of bootstrap

Uncertainty in QSAR Predictions – Bayesian Inference and the Magic of Bootstrap

Ullrika Sahlin PhD

Centre for Environmental and Climate Research (CEC)

Page 2

QSAR integrated assessment

[Diagram: Input 1, Input 2 and Input 3 feed an assessment model that leads to a decision node. The inputs shown are two QSAR predictions and one experimental value.]

Page 3

Uncertainty in hazard assessment – does it matter?

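Classification of the 386 items under each hazard assessment (HA) approach:

0. No HA: 386 unknown (?)
1. QSAR predictions without uncertainty: 281 not toxic*, 105 very toxic
2. Median toxicity: 265 not toxic*, +16 very toxic
3. Expected toxicity: 262 not toxic*, +3 very toxic
4. Conservative value of toxicity: 153 not toxic*, +109 very toxic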

Sahlin et al. 2013. Arguments for Considering Uncertainty in QSAR Predictions in Hazard and Risk Assessments. ATLA

Page 4

QSAR integrated hazard assessment and the applicability domain (AD) problem

[Figure: Predicted No Effect Concentration of 386 Triazoles. x-axis: log min{EC50}; y-axis: molecular weight. Legend: relative toxicity potential; low confidence in prediction.]

Page 5

Modes of statistical inference

• Parametric inference
  – Explain
  – Hypothesis-driven
• Predictive inference
  – Predict to support decision making
  – Generate hypothesis
• Evidence synthesis
  – Consider quality

Geisser. Introduction to predictive inference 1993. Sutton and Abrams 2001. Bayesian methods in meta-analysis and evidence synthesis. Statistical Methods in Medical Research.

Page 6

To predict…
• is to make a statement about something we have not yet observed
• always involves uncertainty
• requires at least one model

Page 7

How can I…

• Assess uncertainty in a prediction?
• Take my judgement of confidence in the model into account?
• Validate the assessment?

Principle for QSAR modelling

Principle to judge confidence in predictions

Principle to assess uncertainty

Page 8

Uncertainty in a prediction

• Predictive error: discrepancy between model and reality (quantitative)
• Predictive reliability: our confidence in using a model to predict what we want to predict (qualitative)

[Figures: predictive mean against hat value; log EC50 against nC.]

Page 9

Different kinds of errors

[Figure: predicted y against nC.]

Page 10

Predictive reliability

[Figure: scatter plot of prediction against distance from model (logarithmic x-axis, 5e-02 to 5e+02).]

Page 11

Different measures of predictive reliability

• Similarity to points in the training data set
• Distance from the centre of training data
• Density of training data around the item to be predicted
• Sensitivity analysis, e.g. standard deviation in perturbed predictions
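Three of the measures listed above can be sketched for a model with a single descriptor. This is a minimal illustration, not the talk's own implementation: the function names, the training values and the choice of k are assumptions, and the leverage formula is the standard hat value for a one-descriptor linear regression with intercept.

```python
import statistics

def leverage(x_train, x_query):
    """Hat value of a query point for a one-descriptor linear regression."""
    n = len(x_train)
    mean_x = statistics.fmean(x_train)
    sxx = sum((x - mean_x) ** 2 for x in x_train)
    return 1.0 / n + (x_query - mean_x) ** 2 / sxx

def distance_from_centre(x_train, x_query):
    """Absolute distance from the mean of the training descriptor values."""
    return abs(x_query - statistics.fmean(x_train))

def knn_mean_distance(x_train, x_query, k=3):
    """Mean distance to the k nearest training points (small = dense region)."""
    nearest = sorted(abs(x_query - x) for x in x_train)[:k]
    return sum(nearest) / k

# Hypothetical training descriptor values, e.g. number of carbon atoms (nC).
x_train = [2, 3, 4, 5, 6, 7, 8]

for x_query in (5, 14):
    print(x_query,
          round(leverage(x_train, x_query), 3),
          distance_from_centre(x_train, x_query),
          round(knn_mean_distance(x_train, x_query), 3))
```

All three measures are larger for the query at 14 than for the query at 5, reflecting that it lies far outside the training data.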

Page 12

Predictive error of a regression

Page 13

Predictive error of a regression

Predictive distribution

p(Y < y |X,θ)
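As a worked instance, assume an ordinary linear regression with Gaussian errors and known parameters θ = (β, σ). This model choice is an assumption for illustration; the slide does not fix one. The predictive distribution is then:

```latex
\[
  Y \mid X, \theta \sim \mathcal{N}\!\left(X^{\top}\beta,\ \sigma^{2}\right),
  \qquad
  p(Y < y \mid X, \theta) = \Phi\!\left(\frac{y - X^{\top}\beta}{\sigma}\right)
\]
```

Here Φ is the standard normal cumulative distribution function and X is the descriptor vector of the item to be predicted.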

Page 15

Predictive error of a regression

Use likelihood to compare!
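One way to compare assessments by likelihood is to sum the log predictive densities of the observed values. The sketch below assumes Gaussian predictive distributions; the data and both sets of standard deviations are made up for illustration.

```python
import math

def gaussian_log_density(y, mean, sd):
    """Log density of N(mean, sd^2) evaluated at y."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (y - mean) ** 2 / (2 * sd ** 2)

def log_likelihood_score(observed, means, sds):
    """Sum of log predictive densities over the evaluation items (higher is better)."""
    return sum(gaussian_log_density(y, m, s)
               for y, m, s in zip(observed, means, sds))

# Hypothetical observations and point predictions (shared by both assessments).
observed = [0.1, -0.4, 1.2, 2.5]
means = [0.0, -0.5, 1.0, 1.0]

# Assessment A: one common standard deviation for every prediction.
sds_equal = [0.5, 0.5, 0.5, 0.5]
# Assessment B: a wider distribution for the last, less reliable prediction.
sds_local = [0.3, 0.3, 0.3, 1.5]

score_equal = log_likelihood_score(observed, means, sds_equal)
score_local = log_likelihood_score(observed, means, sds_local)
print(round(score_equal, 2), round(score_local, 2))
```

In this toy case assessment B scores higher, because it is sharp where the predictions are good and wide where the prediction misses.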

Page 16

Different ways to assess

Assessment of predictive distribution
• Frequentist framework
  – Frequentist analytical
  – Sampling ("external data")
  – Re-sampling: jackknifing ("without replacement") or bootstrapping ("with replacement")
• Bayesian framework
  – Bayesian analytical
  – Bayesian sampling

Page 17

1. Bayesian modelling

Page 18

1. Bayesian modelling

• Model parameters are uncertain

• Uncertainty is described by probability

• Prior information is subjective

• Data enters through Bayesian updating

[Figure: MCMC sampling, parameter 2 against parameter 1.]
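The idea of Bayesian updating by MCMC sampling can be sketched with a random-walk Metropolis sampler for a two-parameter regression. Everything here is an assumption for illustration: the model (intercept and slope with known noise), the Gaussian priors, the step size and the simulated data. It is not the model used in the talk.

```python
import math
import random

def log_posterior(a, b, xs, ys, noise_sd=1.0, prior_sd=10.0):
    """Unnormalised log posterior of intercept a and slope b."""
    log_prior = -(a ** 2 + b ** 2) / (2 * prior_sd ** 2)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return log_prior - sse / (2 * noise_sd ** 2)

def metropolis(xs, ys, n_iter=10000, step=0.1, seed=1):
    """Random-walk Metropolis sampler; returns the list of (a, b) draws."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    current = log_posterior(a, b, xs, ys)
    draws = []
    for _ in range(n_iter):
        a_new = a + rng.gauss(0.0, step)
        b_new = b + rng.gauss(0.0, step)
        proposed = log_posterior(a_new, b_new, xs, ys)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            a, b, current = a_new, b_new, proposed
        draws.append((a, b))
    return draws

# Hypothetical data simulated from y = 1 + 2x with unit-variance noise.
data_rng = random.Random(0)
xs = [i / 25 - 2.0 for i in range(100)]
ys = [1.0 + 2.0 * x + data_rng.gauss(0.0, 1.0) for x in xs]

draws = metropolis(xs, ys)[2000:]  # discard burn-in
mean_a = sum(a for a, _ in draws) / len(draws)
mean_b = sum(b for _, b in draws) / len(draws)
print(round(mean_a, 2), round(mean_b, 2))
```

The retained draws describe the joint uncertainty in the two parameters; pushing each draw through the model gives a sample from the predictive distribution.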

Page 19

1. Bayesian modelling

Pros
• Uncertainty is measured by probability
• Links to decision theory
• Motivated under small data

Cons
• Treatment of high-dimensional descriptor space?
• Limitation to specific models?
• Re-modelling of QSARs needed

Page 20

Validation

Fathead Minnow QSARdata R-package

Park and Casella (2008) Journal of the American Statistical Association, Gramacy and Pantaleo (2010) Bayesian Analysis.

[Figure, two panels of predicted against observed values. Training data: R2_Blasso = 0.79. Test data: R2_Blasso = 0.75.]

Page 21

Validation

Empirical coverage

[Figure, two panels (training data, test data): hit rate against confidence, both axes from 0 to 1.]
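Empirical coverage can be computed as the fraction of observations that fall inside the central predictive interval at each confidence level. The sketch below assumes Gaussian predictive distributions and uses simulated data; the function name and all values are hypothetical.

```python
import random
from statistics import NormalDist

def empirical_coverage(observed, means, sds, levels):
    """Hit rate of central Gaussian predictive intervals at each confidence level."""
    rates = []
    for level in levels:
        # Half-width of the central interval in standard-deviation units.
        z = NormalDist().inv_cdf(0.5 + level / 2)
        hits = sum(abs(y - m) <= z * s for y, m, s in zip(observed, means, sds))
        rates.append(hits / len(observed))
    return rates

rng = random.Random(42)
n = 2000
means = [rng.uniform(-2, 2) for _ in range(n)]
# Simulated observations whose true spread (0.5) matches the honest assessment.
observed = [rng.gauss(m, 0.5) for m in means]

levels = [0.1, 0.5, 0.9]
honest = empirical_coverage(observed, means, [0.5] * n, levels)
# Overconfident assessment: the stated sd is half of the true spread.
overconfident = empirical_coverage(observed, means, [0.25] * n, levels)
print(honest, overconfident)
```

An honest assessment gives hit rates close to the 1:1 line; an overconfident one falls below it.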

Page 22

2. Bootstrap sampling
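A minimal sketch of bootstrap sampling for a predictive distribution, assuming a one-descriptor least-squares model: training pairs are resampled with replacement, the model is refitted, and a resampled residual is added to each prediction. The model, the residual step and the simulated data are illustrative assumptions, not the procedure used in the talk.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares intercept and slope for one descriptor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def bootstrap_predictions(xs, ys, x_query, n_boot=2000, seed=1):
    """Draws from an approximate predictive distribution at x_query."""
    rng = random.Random(seed)
    n = len(xs)
    a_hat, b_hat = fit_line(xs, ys)
    residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]
    draws = []
    for _ in range(n_boot):
        # Resample training pairs with replacement and refit the model.
        idx = [rng.randrange(n) for _ in range(n)]
        if len({xs[i] for i in idx}) < 2:
            continue  # degenerate resample: the slope is undefined
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        # Add a resampled residual to represent the error of a new observation.
        draws.append(a + b * x_query + rng.choice(residuals))
    return draws

# Hypothetical data simulated from y = 0.5x - 3 with unit-variance noise.
data_rng = random.Random(0)
xs = [float(i) for i in range(1, 31)]
ys = [0.5 * x - 3.0 + data_rng.gauss(0.0, 1.0) for x in xs]

draws = sorted(bootstrap_predictions(xs, ys, x_query=15.0))
lower = draws[int(0.05 * len(draws))]
upper = draws[int(0.95 * len(draws)) - 1]
print(round(lower, 2), round(upper, 2))  # approximate 90% predictive interval
```

The spread of the draws combines uncertainty in the fitted parameters with the residual error, without requiring the QSAR to be re-modelled in a Bayesian framework.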

Page 23

3. Assessment considering judgment in predictive reliability

Inspired by Denham 1997 and Clark 2009

Type of distribution: Gaussian

Mean: Point prediction y_q

Variance: Local Predictive Error Sum of Squares divided by denominator

Page 24

3. Assessment considering judgment in predictive reliability

• Observed prediction errors: $y_j - \hat{y}_j$
• Measure of predictive reliability
• Sampling from distribution of modified residuals

Page 25

3. Assessment considering judgment in predictive reliability

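Local Predictive Error Sum of Squares for a query item q, in a weighted (W) and a k-nearest-neighbour (kNN) variant, followed by the global PRESS:

\[
  \mathrm{W.PRESS}_q = \frac{\sum_{j=1}^{n} w_{q,j}\,(y_j - \hat{y}_j)^2}{\sum_{j=1}^{n} w_{q,j}}
\]

\[
  \mathrm{kNN.PRESS}_q = \sum_{j \in \mathrm{kNN}(w_{q,j})} (y_j - \hat{y}_j)^2
\]

\[
  \mathrm{PRESS} = \sum_{j=1}^{n} (y_j - \hat{y}_j)^2
\]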
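The local variance could be computed along the following lines. This is a sketch under stated assumptions: the Gaussian kernel weights, the bandwidth, the choice of k as the denominator for the kNN variant, and all data are hypothetical, since the slides only say "divided by denominator".

```python
import math

def w_press(weights, errors):
    """Weighted PRESS for one query item: sum(w * e^2) / sum(w)."""
    return sum(w * e ** 2 for w, e in zip(weights, errors)) / sum(weights)

def knn_press(weights, errors, k):
    """Sum of squared errors over the k training items with the largest weights."""
    ranked = sorted(zip(weights, errors), key=lambda we: we[0], reverse=True)
    return sum(e ** 2 for _, e in ranked[:k])

def gaussian_weights(x_train, x_query, bandwidth=1.0):
    """Hypothetical similarity weights from Euclidean distance in one descriptor."""
    return [math.exp(-((x - x_query) / bandwidth) ** 2 / 2) for x in x_train]

# Hypothetical training descriptors and prediction errors y_j - y_hat_j;
# the errors grow towards the upper edge of the training data.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
errors = [0.1, -0.2, 0.1, 0.3, -0.9, 1.2]

k = 3
for x_query in (2.0, 6.0):
    w = gaussian_weights(x_train, x_query)
    var_w = w_press(w, errors)             # already normalised by the sum of weights
    var_knn = knn_press(w, errors, k) / k  # assumed denominator: k
    # Standard deviations of the Gaussian predictive distribution around y_q.
    print(x_query, round(math.sqrt(var_w), 3), round(math.sqrt(var_knn), 3))
```

Both variants give a wider predictive distribution for the query at 6.0, where nearby training items were predicted poorly, than for the query at 2.0.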

Page 26

Validate the assessment

[Figure, first panel: "Evaluation on External data". Log likelihood score (axis from -100 to 0) for each assessment of predictive error: equal, W euclidean, W leverage, W ADdens, kNN euclidean, kNN leverage, kNN ADdens.]

[Figure, second panel: "Empirical coverage (External data)". Hit rate against confidence level, with a 1:1 reference line, for the same seven assessments.]

Page 27

So – which approach is the best?

[Figure, two panels of predicted against observed values.
Training data: R2_pls = 0.77, R2_boot = 0.83, R2_Blasso = 0.79.
Test data: R2_pls = 0.77, R2_boot = 0.78, R2_Blasso = 0.75.]

Page 28

So – which approach is the best?

[Figure, two panels of hit rate against confidence, each with a 1:1 reference line.
Training data: Blasso, Bootstrap, kNN leverage, equal.
Test data: Blasso, Bootstrap, W euclidean, equal.]

Page 29

So – which approach is the best?

[Figure, repeated: hit rate against confidence for training data (Blasso, Bootstrap, kNN leverage, equal) and test data (Blasso, Bootstrap, W euclidean, equal), each with a 1:1 reference line.]

[Figure: "Evaluation on training data". Log likelihood score (axis from -200 to 0) for each assessment of predictive error: Blasso, Bootstrap, kNN leverage, equal.]

Page 30

Take home messages

• A prediction is complete when given with uncertainty specified by probability

• Assessment of uncertainty needs to be both theoretically motivated and proven honest in empirical evaluation of performance measures

• Three useful approaches are to assess uncertainty through modelling (Bayesian), sampling (e.g. bootstrapping), or post-modelling of predictive error

• Use appropriate measures to validate the assessment of uncertainty

Page 31

Thank you for your attention

Drive safely in the statistical jungle!