Quantitative uncertainty in QSAR predictions - Bayesian predictive inference and the magic of bootstrap

Uncertainty in QSAR Predictions – Bayesian Inference and the Magic of Bootstrap

Ullrika Sahlin PhD

Centre for Environmental and Climate Research (CEC)

QSAR integrated assessment

[Diagram: an assessment model in which Input 1, Input 2 and Input 3 feed a decision node; the inputs are labelled QSAR prediction, QSAR prediction and Experimental value.]

Uncertainty in hazard assessment – does it matter?

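Classification of 386 compounds under each approach (a "+" marks additional compounds classified as very toxic compared with the previous approach):

0. No HA (hazard assessment): 386 unclassified (?)

1. QSAR predictions without uncertainty: 281 not toxic*, 105 very toxic

2. Median toxicity: 265 not toxic*, +16 very toxic

3. Expected toxicity: 262 not toxic*, +3 very toxic

4. Conservative value of toxicity: 153 not toxic*, +109 very toxic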

Sahlin et al. 2013. Arguments for Considering Uncertainty in QSAR Predictions in Hazard and Risk Assessments. ATLA

QSAR integrated hazard assessment and the applicability domain (AD) problem

[Figure: Predicted No Effect Concentration of 386 Triazoles. Molecular weight plotted against log min{EC50}; annotations: relative toxicity potential, low confidence in prediction.]

Modes of statistical inference

• Parametric inference
  – Explain
  – Hypothesis-driven

• Predictive inference
  – Predict to support decision making
  – Generate hypotheses

• Evidence synthesis
  – Consider quality

Geisser. Introduction to predictive inference 1993.

Sutton and Abrams 2001. Bayesian methods in meta-analysis and evidence synthesis. Statistical Methods in Medical Research.

To predict…

is to make a statement of something we have not yet observed

is always made with uncertainty

is made using at least one model

How can I…

• Assess uncertainty in a prediction?

• Take my judgement of confidence in the model into account?

• Validate the assessment?

Principle for QSAR modelling

Principle to judge confidence in predictions

Principle to assess uncertainty

Uncertainty in a prediction

Predictive error

Predictive reliability: our confidence in using a model to predict what we want to predict

[Figure: two panels. Predictive mean against hat value; log EC50 against nC.]

Discrepancy between model and reality

Quantitative

Qualitative

[Figure: predicted y against nC.]

Different kinds of errors

[Figure: scatter plot of prediction against distance from model (logarithmic axis).]

Predictive reliability

Different measures of predictive reliability

• Similarity to points in the training data set

• Distance from the centre of training data

• Density of training data around the item to be predicted

• Sensitivity analysis, e.g. standard deviation in perturbed predictions
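The distance-type measures listed above can be illustrated with a minimal sketch. The function name `reliability_measures`, the toy descriptor matrix and the choice of k are assumptions made for illustration only; the slides do not specify how each measure was computed.

```python
import numpy as np

def reliability_measures(X_train, x_query, k=3):
    """Three simple reliability measures for one query item (illustrative only)."""
    X_train = np.asarray(X_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    centre = X_train.mean(axis=0)

    # Euclidean distance from the centre of the training data.
    dist_centre = float(np.linalg.norm(x_query - centre))

    # Leverage (hat value) of the query for a linear model with intercept:
    # h = 1/n + (x - centre)' (Xc' Xc)^-1 (x - centre)
    Xc = X_train - centre
    xc = x_query - centre
    n = X_train.shape[0]
    leverage = float(1.0 / n + xc @ np.linalg.pinv(Xc.T @ Xc) @ xc)

    # Local density proxy: mean distance to the k nearest training points
    # (a small value means dense training data around the query).
    d = np.linalg.norm(X_train - x_query, axis=1)
    knn_dist = float(np.sort(d)[:k].mean())

    return {"dist_centre": dist_centre, "leverage": leverage, "knn_dist": knn_dist}

# Toy training set with two descriptors (hypothetical values).
X_train = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
inside = reliability_measures(X_train, [1.0, 1.0], k=2)
outside = reliability_measures(X_train, [10.0, 10.0], k=2)
print(inside)
print(outside)
```

A query far from the training data gets a larger distance, a larger leverage and a larger nearest-neighbour distance, which is the sense in which these measures express lower confidence in the prediction.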

Predictive error of a regression


Predictive distribution

p(Y < y |X,θ)




Use likelihood to compare!
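As a minimal sketch of comparing assessments by likelihood, the example below sums the log density that each predictive distribution assigns to the observed values. It assumes Gaussian predictive distributions described by a mean and a standard deviation per item; the function name and the toy numbers are hypothetical.

```python
import math

def gaussian_log_likelihood_score(y_obs, pred_mean, pred_sd):
    """Sum of Gaussian log densities of the observations under their predictive distributions."""
    total = 0.0
    for y, m, s in zip(y_obs, pred_mean, pred_sd):
        z = (y - m) / s
        total += -0.5 * math.log(2.0 * math.pi) - math.log(s) - 0.5 * z * z
    return total

# Two assessments with the same point predictions but different stated uncertainty.
y_obs = [0.1, -0.4, 1.2]
pred_mean = [0.0, 0.0, 1.0]
score_narrow = gaussian_log_likelihood_score(y_obs, pred_mean, [0.5, 0.5, 0.5])
score_wide = gaussian_log_likelihood_score(y_obs, pred_mean, [5.0, 5.0, 5.0])
print(score_narrow, score_wide)  # the higher (less negative) score is preferred
```

The score penalises both overconfident and needlessly wide distributions, so it compares the assessed uncertainty and not only the point predictions.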

Assessment of predictive distribution

Frequentist framework

• Frequentist analytical

• Sampling "external data"

• Re-sampling
  – Jackknifing "without replacement"
  – Bootstrapping "with replacement"

Bayesian framework

• Bayesian analytical

• Bayesian sampling

Different ways to assess

1. Bayesian modelling




• Model parameters are uncertain

• Uncertainty is described by probability

• Prior information is subjective

• Data enters through Bayesian updating
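The four points above can be sketched with the simplest analytical case: a linear model with a Gaussian prior on the coefficients and a known noise variance. Both of these choices, and all names and numbers below, are assumptions for illustration; this is not the Bayesian lasso that the validation results labelled "Blasso" appear to refer to, which needs MCMC sampling.

```python
import numpy as np

def posterior(X, y, prior_mean, prior_cov, noise_var):
    """Bayesian updating of a Gaussian prior on regression coefficients."""
    prior_prec = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(prior_prec + X.T @ X / noise_var)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_cov

def predictive(x_new, post_mean, post_cov, noise_var):
    """Mean and variance of the Gaussian predictive distribution at x_new."""
    mean = float(x_new @ post_mean)
    # Noise variance plus the contribution of parameter uncertainty.
    var = float(noise_var + x_new @ post_cov @ x_new)
    return mean, var

# Synthetic data: intercept plus one descriptor (hypothetical values).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=30)

prior_mean = np.zeros(2)          # subjective prior: coefficients near zero
prior_cov = 10.0 * np.eye(2)      # with large prior uncertainty
post_mean, post_cov = posterior(X, y, prior_mean, prior_cov, noise_var=0.25)
pred_mean, pred_var = predictive(np.array([1.0, 0.5]), post_mean, post_cov, noise_var=0.25)
print(post_mean, pred_mean, pred_var)
```

The predictive variance is always larger than the noise variance alone, because the uncertainty in the model parameters is carried through to the prediction.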

[Figure: MCMC sampling; parameter 2 plotted against parameter 1.]


Pros

• Uncertainty is measured by probability

• Links to decision theory

• Motivated under small data

Cons

• Treatment of high-dimensional descriptor space?

• Limitation to specific models?

• Re-modelling of QSARs needed

Validation

Fathead Minnow QSARdata R-package

Park and Casella (2008) Journal of the American Statistical Association, Gramacy and Pantaleo (2010) Bayesian Analysis.

[Figure: predicted against observed values. Training data: R2_Blasso = 0.79. Test data: R2_Blasso = 0.75.]

Validation

Empirical coverage

[Figure: hit rate against confidence, for training data and for test data.]
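Empirical coverage can be sketched as follows: for each confidence level, count how often the observed value falls inside the central predictive interval of that level. The sketch assumes Gaussian predictive distributions and confidence levels strictly below 1; the function name and the toy data are hypothetical.

```python
from statistics import NormalDist

def empirical_coverage(y_obs, pred_mean, pred_sd, levels):
    """Hit rate of central Gaussian predictive intervals at each confidence level."""
    std_normal = NormalDist()
    rates = []
    for level in levels:
        # Half-width of the central interval, in standard deviations.
        z = std_normal.inv_cdf(0.5 + level / 2.0)
        hits = sum(abs(y - m) <= z * s for y, m, s in zip(y_obs, pred_mean, pred_sd))
        rates.append(hits / len(y_obs))
    return rates

# Toy example: three observations, all predicted as N(0, 1).
levels = [0.5, 0.95]
rates = empirical_coverage([0.0, 1.0, 3.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], levels)
for level, rate in zip(levels, rates):
    print(f"confidence {level:.2f}: hit rate {rate:.2f}")
```

Plotting hit rate against confidence level gives a curve that lies on the 1:1 line when the assessed uncertainty is honest, below it when the assessment is overconfident and above it when it is too cautious.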

2. Bootstrap sampling
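A minimal sketch of a bootstrap assessment of a predictive distribution is given here. The slides do not state which model or bootstrap scheme was used, so the choices below (ordinary least squares, resampling of cases with replacement, adding one resampled residual to each prediction) are assumptions, as are all names and numbers.

```python
import numpy as np

def bootstrap_predictions(X, y, x_query, n_boot=500, seed=0):
    """Draws from an approximate predictive distribution at x_query."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(len(X)), X])            # add intercept column
    y = np.asarray(y, dtype=float)
    xq = np.concatenate(([1.0], np.atleast_1d(np.asarray(x_query, dtype=float))))
    n = len(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                  # sample cases WITH replacement
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        resid = y[idx] - X[idx] @ beta
        # Refitted prediction (parameter uncertainty) plus a resampled residual (noise).
        draws[b] = xq @ beta + rng.choice(resid)
    return draws

# Synthetic data with one descriptor (hypothetical values).
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, size=40)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=40)

draws = bootstrap_predictions(x, y, x_query=0.5)
lower, upper = np.percentile(draws, [2.5, 97.5])
print(f"predictive mean {draws.mean():.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```

Sampling with replacement is what distinguishes the bootstrap from the jackknife, which instead leaves out observations without replacement.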



3. Assessment considering judgment in predictive reliability

Inspired by Denham 1997 and Clark 2009

Type of distribution: Gaussian

Mean: Point prediction yq

Variance: Local Predictive Error Sum of Squares divided by denominator




Observed prediction errors: \( y_j - \hat{y}_j \)

Measure of predictive reliability

Sampling from distribution of modified residuals

\[ \mathrm{PRESS} = \sum_{j=1}^{n} (y_j - \hat{y}_j)^2 \]

\[ \mathrm{W.PRESS}_q = \frac{\sum_{j=1}^{n} w_{j,q} \, (y_j - \hat{y}_j)^2}{\sum_{j=1}^{n} w_{j,q}} \]

\[ \mathrm{kNN.PRESS}_q = \sum_{j \in \mathrm{kNN}(w_{j,q})} (y_j - \hat{y}_j)^2 \]
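The idea of a local predictive error sum of squares can be sketched as follows: squared prediction errors of the training items are combined using a weight that expresses how relevant each item is to the query q. The slides do not state the denominators, so dividing PRESS by n and the k-nearest-neighbour sum by k are assumptions here, as are the function name and the toy numbers.

```python
import numpy as np

def press_variances(sq_errors, weights, k=5):
    """Three candidate variances for a Gaussian predictive distribution at a query q.

    sq_errors[j] is the squared prediction error (y_j - yhat_j)^2 of training item j;
    weights[j] is the reliability weight of item j relative to q (larger = more similar).
    """
    e2 = np.asarray(sq_errors, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = len(e2)

    equal = e2.sum() / n                     # global PRESS / n (denominator assumed)
    weighted = (w * e2).sum() / w.sum()      # weighted mean of squared errors
    nearest = np.argsort(-w)[:k]             # the k items with the largest weights
    knn = e2[nearest].sum() / k              # local sum over neighbours / k (assumed)
    return equal, weighted, knn

# Toy example: the two items most similar to q have small errors.
equal, weighted, knn = press_variances([1.0, 4.0, 9.0, 16.0], [1.0, 1.0, 0.0, 0.0], k=2)
print(equal, weighted, knn)
```

Each variance, combined with the point prediction as the mean, defines a Gaussian predictive distribution; the local versions give narrower distributions when the items most similar to the query were predicted well.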


Validate the assessment

Evaluation on external data

[Figure: log likelihood score for each assessment of predictive error (equal, W euclidean, W leverage, W ADdens, kNN euclidean, kNN leverage, kNN ADdens).]

[Figure: Empirical coverage (External data). Hit rate against confidence level with a 1:1 reference line, for equal, W euclidean, W leverage, W ADdens, kNN euclidean, kNN leverage, kNN ADdens.]

So – which approach is the best?

[Figure: predicted against observed values. Training data: R2_pls = 0.77, R2_boot = 0.83, R2_Blasso = 0.79. Test data: R2_pls = 0.77, R2_boot = 0.78, R2_Blasso = 0.75.]


[Figure: empirical coverage, hit rate against confidence. Training data: 1:1 line, Blasso, Bootstrap, kNN leverage, equal. Test data: 1:1 line, Blasso, Bootstrap, W euclidean, equal.]


Evaluation on training data

[Figure: log likelihood score for each assessment of predictive error (Blasso, Bootstrap, kNN leverage, equal).]

Take home messages

• A prediction is complete when it is given with uncertainty specified by probability

• Assessment of uncertainty needs to be both theoretically motivated and proven honest in empirical evaluation of performance measures

• Three useful approaches are to assess uncertainty through modelling (Bayesian), sampling (e.g. bootstrapping), or post modelling of predictive error

• Use appropriate measures to validate the assessment of uncertainty

Thank you for your attention

Drive safely in the statistical jungle!
