
Applicability Domain

Towards a more formal definition

Thierry Hanser

Research Leader

[email protected]

Can we trust a specific individual prediction?

83% accuracy

Screening: × 1 000 000 predictions → global model accuracy estimate

Risk assessment: × 1 prediction → individual prediction accuracy estimate

Articulation of the method

• Applicability domain is not a monolithic concept; there are 3 key layers

• Separation of concerns can help clarify and formalise the notion of AD

• Purpose: initiate a constructive discussion among our QSAR community to build a common understanding together

• Harmonize the way we define and present AD to the end users across models and applications

• Remove confusion for the end user and improve the value of our AD models

Setubal workshop report: Jaworska, J. S.; Comber, M.; Auer, C.; Van Leeuwen, C. Environ. Health Perspect. 2003, 111, 1358–1360

Guidance Document on the Validation of (Quantitative) Structure–Activity Relationship [(Q)SAR] Models; OECD Series on Testing and Assessment No. 69; OECD Environment Directorate, Environment, Health and Safety Division: Paris, 2007

Current understanding and definitions

Common definition2

“AD is the response and chemical structure space in which the model makes predictions with a given reliability”.

• A defined endpoint

• An unambiguous algorithm

• A defined domain of applicability

• Appropriate measures of goodness-of-fit, robustness and predictivity

• A mechanistic interpretation, if possible

QSAR principles1

Annotated view of the same definition: it speaks of Boundaries, Applicability and Reliability. But what about Likelihood?

A good foundation to build on

Mathea M, Klingspohn W, Baumann K. Chemoinformatic Classification Methods and their Applicability Domain. Mol Inf. 2016 May 1;35(5):160–80.

Gadaleta D, Mangiatordi GF, Catto M, Carotti A, Nicolotti O. Applicability Domain for QSAR Models: Where Theory Meets Reality. International Journal of Quantitative Structure-Property Relationships. 2016 Jan;1(1):45–63.

Norinder U, Rybacka A, Andersson PL. Conformal prediction to define applicability domain – A case study on predicting ER and AR binding. SAR and QSAR in Environmental Research. 2016 Apr 2;27(4):303–16.

Toccaceli P, Nouretdinov I, Gammerman A. Conformal Predictors for Compound Activity Prediction. arXiv:1603.04506 [cs] [Internet]. 2016 Mar 14 [cited 2016 May 11]; Available from: http://arxiv.org/abs/1603.04506

Roy K, Kar S, Ambure P. On a simple approach for determining applicability domain of QSAR models. Chemometrics and Intelligent Laboratory Systems. 2015 Jul 15;145:22–9.

Sheridan RP. The Relative Importance of Domain Applicability Metrics for Estimating Prediction Errors in QSAR Varies with Training Set Diversity. J Chem Inf Model. 2015 Jun 22;55(6):1098–107.

Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, et al. QSAR Modeling: Where have you been? Where are you going to? J Med Chem. 2014 Jun 26;57(12):4977–5010.

Carrió P, Pinto M, Ecker G, Sanz F, Pastor M. Applicability Domain Analysis (ADAN): A Robust Method for Assessing the Reliability of Drug Property Predictions. J Chem Inf Model. 2014 May 27;54(5):1500–11.

Toplak M, Močnik R, Polajnar M, Bosnić Z, Carlsson L, Hasselgren C, et al. Assessment of Machine Learning Reliability Methods for Quantifying the Applicability Domain of QSAR Regression Models. J Chem Inf Model. 2014 Feb 24;54(2):431–41.

Dragos H, Gilles M, Alexandre V. Predicting the Predictability: A Unified Approach to the Applicability Domain Problem of QSAR Models. J Chem Inf Model. 2009 Jul 27;49(7):1762–76.

And many more...

Current common methods

Molecule classes

• Organic-Organometallic-Inorganic

• Class of molecules (Arom. Amines)

Feature representation

• Unseen features

Agreement based

• RF consensus

• kNN

Descriptor ranges

• Box

• Convex hull

Distance based methods

• Distance to data points

• Density

Response domain

Current common methods: illustrated examples

• Feature representation: no boronic acids in the training set, so the boronic acid group in the query compound is an unseen feature

• Agreement based: nearest neighbours and random forest consensus provide a likelihood estimate

• Descriptor ranges: bounding box; convex hull

• Distance based: distance to the training data points; descriptor density of the data

• Response domain: distribution of the predicted property

(A minimal illustrative sketch of a distance-based check follows below.)
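For illustration, a minimal Python sketch of the distance-based idea above (distance to the k nearest training compounds in descriptor space); the choice of k, the descriptors and the 95th-percentile threshold are assumptions made for this example, not part of the original method.

import numpy as np

def knn_mean_distance(query, train_X, k=5):
    """Mean Euclidean distance from a query descriptor vector to its k
    nearest training compounds (smaller = closer to the training data)."""
    dists = np.linalg.norm(train_X - query, axis=1)
    return np.sort(dists)[:k].mean()

def in_distance_domain(query, train_X, k=5, percentile=95):
    """Inside the distance-based domain if the query's mean kNN distance does
    not exceed the chosen percentile of the same statistic computed for the
    training compounds themselves (leave-one-out style)."""
    self_dists = [knn_mean_distance(train_X[i], np.delete(train_X, i, axis=0), k)
                  for i in range(len(train_X))]
    threshold = np.percentile(self_dists, percentile)
    return knn_mean_distance(query, train_X, k) <= threshold, threshold

# Toy usage with random descriptor vectors
rng = np.random.default_rng(0)
train_X = rng.normal(size=(100, 8))                      # 100 compounds, 8 descriptors
near_ok, thr = in_distance_domain(rng.normal(size=8), train_X)
far_ok, _ = in_distance_domain(rng.normal(size=8) + 10.0, train_X)
print(near_ok, far_ok, round(thr, 2))                    # typically: True False ...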

Mixture of different concepts

Applicability (can I use this model to make a prediction?)

Reliability (is the prediction reliable?)

Decidability (can I make a clear decision?)

Likelihood

All of these notions are currently bundled into the single label "Applicability Domain".

Towards an extended and more formal framework

My model can be applied for this query compound → Applicability domain

The prediction is reliable enough for my use case → Reliability domain

I can make a clear decision → Decidability domain

Confidence in the prediction if ...

Applicability (of the model)

Can I apply my model? → Yes: perform the prediction / No: out of Applicability domain

Applicability = Model boundaries (Designer's specifications)

• Is the class of my query compound supported by the model? e.g. exclude polymers, proteins, inorganic molecules, etc.

• Is my query compound in the range of the descriptors of the training set? e.g. inside the convex hull, minimum information density

• Did my model see all the structural features present in the query compound? e.g. not in domain: contains an unseen boronic acid functional group

(An illustrative sketch of such checks follows below.)
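As referenced above, a minimal sketch of such applicability checks; the compound classes, fragment keys and helper names are hypothetical and only illustrate the idea of designer-specified model boundaries.

def unseen_features(query_features, training_features):
    """Structural features of the query that the model never saw in training
    (e.g. a boronic acid fragment key absent from the training set)."""
    return set(query_features) - set(training_features)

def is_applicable(query_features, training_features, query_class,
                  supported_classes=("organic",)):
    """Designer-specified model boundaries: supported compound class and no
    structural features unseen during training."""
    if query_class not in supported_classes:
        return False                     # e.g. inorganic, polymer: out of domain
    return not unseen_features(query_features, training_features)

# Toy usage with hypothetical fragment keys
training = {"arom_amine", "phenol", "sulfonamide"}
print(is_applicable({"arom_amine", "phenol"}, training, "organic"))   # True
print(is_applicable({"boronic_acid"}, training, "organic"))           # False (unseen feature)
print(is_applicable({"phenol"}, training, "inorganic"))               # False (class not supported)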

Reliability (of the prediction)

Can I apply my model? → Yes: perform the prediction / No: out of Applicability domain

Can I trust my prediction? → Yes: look at the prediction / No: out of Reliability domain

Reliability = Prediction boundaries (User defined)

• How close are the nearest neighbours?

• How reliable are these nearest data points? e.g. GLP compliance

• How well did my model predict these data points? e.g. performance during CV

(An illustrative sketch of a kNN-based reliability score follows below.)
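A minimal sketch of a kNN-based reliability score in the spirit of the bullets above; the distance weighting and the use of cross-validation outcomes for the neighbours are illustrative assumptions, not a specific published implementation.

import numpy as np

def reliability_score(query, train_X, cv_correct, k=5, length_scale=1.0):
    """Distance-weighted local CV accuracy around the query compound.
    train_X    : (n, d) training descriptor matrix
    cv_correct : (n,) 1/0 flags, was each training compound predicted
                 correctly during cross-validation
    Returns a value in [0, 1]; higher = more reliable neighbourhood."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nn = np.argsort(dists)[:k]
    weights = np.exp(-dists[nn] / length_scale)   # closer neighbours count more
    return float(np.sum(weights * cv_correct[nn]) / np.sum(weights))

# Toy usage
rng = np.random.default_rng(1)
train_X = rng.normal(size=(50, 4))
cv_correct = rng.integers(0, 2, size=50)          # stand-in for real CV outcomes
print(round(reliability_score(train_X[0] + 0.1, train_X, cv_correct), 2))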

Decidability (of the outcome)

• Does my evidence converge or conflict? e.g. k nearest neighbours distribution

• Is there a consensus between intermediate conclusions? e.g. RF tree distribution

• Is my posterior likelihood strong enough? e.g. Naïve Bayes posterior probability

Can I apply my model? → Yes: perform the prediction / No: out of Applicability domain

Can I trust my prediction? → Yes: look at the prediction / No: out of Reliability domain

Can I make a clear call? → Yes: make a statement / No: equivocal or undecided

Decidability = Likelihood boundaries (User defined)

(An illustrative decidability sketch follows below.)
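A minimal sketch of a decidability check based on how decisive the class likelihoods are (for example random forest vote fractions or a Naïve Bayes posterior); the margin threshold is a user-defined likelihood boundary assumed for illustration.

def decidability(class_probs, min_margin=0.3):
    """Return (top_class, margin, decidable). class_probs maps each class to
    its likelihood (summing to ~1); min_margin is the user-defined boundary
    below which the call is treated as equivocal / undecided."""
    ranked = sorted(class_probs.items(), key=lambda kv: kv[1], reverse=True)
    top_class, top_p = ranked[0]
    runner_up_p = ranked[1][1] if len(ranked) > 1 else 0.0
    margin = top_p - runner_up_p
    return top_class, margin, margin >= min_margin

# Toy usage: vote fractions of an active/inactive classifier
print(decidability({"active": 0.92, "inactive": 0.08}))   # clear call
print(decidability({"active": 0.55, "inactive": 0.45}))   # equivocal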

Implementation

Decision flow: Can I apply my model? → perform the prediction or out of applicability domain. Can I trust my prediction? → look at the prediction or out of Reliability domain. Can I make a clear call? → make a statement or equivocal / undecided.

Implementation approaches: rule driven; use local model performance; calibrate (null hypothesis); calibrate (conformal analysis based on likelihood).

(A minimal end-to-end sketch of this decision flow follows below.)
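A minimal end-to-end sketch, assuming the hypothetical helpers above, of how the three checks could be chained into a single decision-domain verdict; the thresholds and status names are illustrative choices, not prescribed by the slides.

from dataclasses import dataclass

@dataclass
class Verdict:
    status: str                 # "out_of_applicability", "out_of_reliability",
                                # "equivocal" or "decision"
    call: str = ""
    detail: str = ""

def decision_domain(applicable, reliability, class_probs,
                    min_reliability=0.7, min_margin=0.3):
    """Chain the applicability, reliability and decidability checks."""
    if not applicable:
        return Verdict("out_of_applicability")
    if reliability < min_reliability:
        return Verdict("out_of_reliability", detail=f"reliability={reliability:.2f}")
    ranked = sorted(class_probs.values(), reverse=True)
    margin = ranked[0] - (ranked[1] if len(ranked) > 1 else 0.0)
    top = max(class_probs, key=class_probs.get)
    if margin < min_margin:
        return Verdict("equivocal", detail=f"margin={margin:.2f}")
    return Verdict("decision", call=top, detail=f"margin={margin:.2f}")

# Toy usage
print(decision_domain(True, 0.85, {"active": 0.9, "inactive": 0.1}))   # clear decision
print(decision_domain(True, 0.40, {"active": 0.9, "inactive": 0.1}))   # out of reliability domain
print(decision_domain(False, 0.90, {"active": 0.9, "inactive": 0.1}))  # out of applicability domain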

Confidence (subjective) builds on Applicability, Reliability and Predictability.

(Plots: confidence in the decision, from low to high reliability and from low to high decidability)

Can I trust my prediction? We can't trust this high likelihood: it lies outside the Reliability Domain.

Can I make a clear call? Confidence in the decision ranges from a confident call, to a difficult call (activity cliff), to equivocal / undecided, to outside the Reliability Domain.

Reliability and Likelihood are different concepts

Reliability and Likelihood are not interconvertible; they can't compensate for each other

(e.g. low likelihood can't be compensated for by high reliability)

Intuitive, unambiguous and formal decision framework

Can I apply my model? → Yes: perform the prediction / No: out of applicability domain (Applicability)

Is my prediction reliable enough? → Yes: look at the prediction / No: out of reliability domain (Reliability)

Can I make a clear call? → Yes: make a statement / No: equivocal / undecided (Decidability)

Taken together, the Applicability, Reliability and Decidability steps define the Decision domain.

Decision Domain

"The Decision Domain (DD) is the scope in which it is possible to apply the model to make a non-equivocal decision based on a reliable prediction."

Proposal: extend and refine the Applicability Domain into a Decision Domain

Transparency of the decision process

Decision domain: my model can be applied for this query compound (Applicability); the prediction is reliable enough for my use case (Reliability); I can make a clear decision (Decidability).

Support: I know which data (examples) support the model's conclusion.

Interpretation: I understand which assumptions the model has used in its reasoning.

Decision domain + Support + Interpretation → transparent decision support.

TARDIS principle

Requirements to usefully support a decision process:

Transparency (of the method)

Applicability (of the model)

Reliability (of the prediction)

Decidability (non-equivocal result)

Interpretability (of the result)

Support (evidence supporting the result)

Conclusion

• In the context of risk assessment, expert and statistical models are tools to support a decision process.

• These models should provide transparent and interpretable conclusions that can easily be assessed by a human expert using their own knowledge.

• Finally, it is the human experts who make the decision, based on their knowledge and the helpful information provided by the tools.

Lhasa Limited

Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS

Registered Charity (290866)

Company Registration Number 01765239

+44(0)113 394 6020

[email protected]

www.lhasalimited.org

Thank you

Confidence vs Observed accuracy

(Plot: observed accuracy vs. likelihood)

Accuracy ∼ Confidence ?

Confidence vs Observed accuracy (current)

Likelihood / Reliability estimated directly from the training dataset → individual prediction accuracy estimate ("At what level can I trust my prediction?"):

• RF, SVM, etc.

• Local CV accuracy (kNN)

• Polynomial fitting (Sarah)

• Confidence descriptors

Likelihood / Reliability calibrated as a meta-model on a calibration dataset ("Can I trust my prediction to a given level?"):

• Null hypothesis (p-value)

• Conformal prediction (target accuracy)

• Confidence metric + desired accuracy A% → prediction within A% or no prediction

Confidence vs Observed accuracy (proposed)

Likelihood / Reliability → Confidence

Reliability and Likelihood are not interchangeable; they can't compensate for each other

(low likelihood can't be compensated for by high reliability)

C ∼ R ⋅ L
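The slide states C ∼ R ⋅ L; the simple product on a 0 to 1 scale below is an assumed functional form, used only to illustrate why a low likelihood cannot be compensated for by a high reliability.

def confidence(reliability, likelihood):
    """Both inputs in [0, 1]; the combined confidence is their product."""
    return reliability * likelihood

print(confidence(0.95, 0.95))   # ~0.90: reliable neighbourhood and decisive likelihood
print(confidence(0.95, 0.55))   # ~0.52: reliable, but the likelihood is weak
print(confidence(0.40, 0.95))   # ~0.38: decisive likelihood that we cannot trust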

Confidence model (proposed)

Reliability (noise estimate) + Likelihood (accuracy estimate) → Confidence

Confidence model (proposed)

(Plot: observed accuracy vs. likelihood; low reliability, high RMSE)

Confidence model (proposed)

(Plot: observed accuracy vs. likelihood; high reliability, low RMSE)

Confidence model (proposed)

(Plot: observed accuracy vs. likelihood, with the RMSE error of the fit as the reliability measure)

Use-case-defined threshold: "...with a given reliability"

Which level of likelihood ~ accuracy fitting do I need for my use case?

If the reliability of the prediction meets this criterion, then I can use the likelihood as an accuracy metric.

Captured during cross validation: the meta-model Likelihood ~ Accuracy estimate

(A minimal sketch of such a likelihood-calibration meta-model follows below.)
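A minimal sketch of building such a Likelihood ~ Accuracy meta-model during cross-validation, by binning CV predictions on their likelihood and recording the observed accuracy per bin; the binning scheme and the synthetic data are illustrative assumptions.

import numpy as np

def likelihood_accuracy_curve(likelihoods, correct, n_bins=10):
    """likelihoods : class likelihoods collected for each CV prediction
    correct     : 1/0 flags, was each CV prediction right
    Returns (bin_centres, observed_accuracy) as a lookup table."""
    likelihoods = np.asarray(likelihoods, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    centres, accuracies = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (likelihoods >= lo) & (likelihoods < hi)
        if mask.any():
            centres.append((lo + hi) / 2)
            accuracies.append(correct[mask].mean())
    return np.array(centres), np.array(accuracies)

# Toy usage: synthetic, roughly calibrated CV results
rng = np.random.default_rng(2)
lk = rng.uniform(0.5, 1.0, size=2000)
ok = (rng.uniform(size=2000) < lk).astype(int)    # correct with probability ~ likelihood
for c, a in zip(*likelihood_accuracy_curve(lk, ok)):
    print(f"likelihood ~ {c:.2f}  observed accuracy {a:.2f}")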

Confidence vs Observed accuracy (proposed)

(Plot: observed accuracy vs. likelihood)

Likelihood is the meta model

Meta model Calibration using Conformal Prediction

Likelihood becomes the non-conformity metric (under the requested reliability level assumption)

Conformity with the calibration dataset

Required accuracy A% → prediction with at least an accuracy of A%

Conformal prediction framework

Norinder U, Carlsson L, Boyer S, Eklund M. Introducing Conformal Prediction in Predictive Modeling. A Transparent and Flexible Alternative to Applicability Domain Determination. J Chem Inf Model. 2014;54(6):1596–1603.


Likelihood ~ Accuracy estimate
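A minimal sketch of inductive conformal prediction with the likelihood as the basis of the non-conformity score, in the spirit of the Norinder et al. framework cited above; the label names, calibration data and the particular score (1 - likelihood of a label) are illustrative assumptions, not the authors' exact procedure.

import numpy as np

def calibration_scores(calib_likelihoods, calib_labels):
    """Non-conformity scores on the calibration dataset:
    1 - likelihood that the model assigned to each compound's true label."""
    return np.array([1.0 - lk[y] for lk, y in zip(calib_likelihoods, calib_labels)])

def conformal_set(query_likelihoods, calib_scores, required_accuracy=0.8):
    """Prediction set for one query: every label whose conformity with the
    calibration data supports the required accuracy level. An empty set means
    no prediction; more than one label means an equivocal result."""
    eps = 1.0 - required_accuracy
    n = len(calib_scores)
    kept = []
    for label, lk in query_likelihoods.items():
        score = 1.0 - lk
        p_value = (np.sum(calib_scores >= score) + 1) / (n + 1)
        if p_value > eps:
            kept.append(label)
    return kept

# Toy usage with hypothetical likelihoods
calib_lk = [{"active": 0.9, "inactive": 0.1}, {"active": 0.2, "inactive": 0.8},
            {"active": 0.7, "inactive": 0.3}, {"active": 0.4, "inactive": 0.6}]
calib_y = ["active", "inactive", "active", "inactive"]
scores = calibration_scores(calib_lk, calib_y)
print(conformal_set({"active": 0.85, "inactive": 0.15}, scores, 0.8))  # ['active']
print(conformal_set({"active": 0.55, "inactive": 0.45}, scores, 0.8))  # [] -> no prediction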