Applicability Domain - Towards a More Formal Definition

Thierry Hanser

Research Leader

thierry.hanser@lhasalimited.org

Can we trust a specific individual prediction?

83% accuracy (x 1000000 predictions)

Screening vs. risk assessment

Individual prediction accuracy estimate vs. global model accuracy estimate

Articulation of the method

• Applicability domain is not a monolithic concept; there are 3 key layers

• Separation of concerns can help clarify and formalise the notion of AD

• Purpose: initiate a constructive discussion within our QSAR community to build a common understanding together

• Harmonise the way we define and present AD to end users across models and applications

• Remove confusion for the end user and improve the value of our AD model

1. Setubal workshop report: Jaworska, J. S.; Comber, M.; Auer, C.; Van Leeuwen, C. Environ. Health Perspect. 2003, 111, 1358−1360

2. Guidance Document on the Validation of (Quantitative) Structure−Activity Relationship [(Q)SAR] Models; OECD Series on Testing and Assessment No. 69; OECD Environment Directorate, Environment, Health and Safety Division: Paris, 2007

Current understanding and definitions

Common definition²

“AD is the response and chemical structure space in which the model makes predictions with a given reliability”.

QSAR principles¹

• A defined endpoint

• An unambiguous algorithm

• A defined domain of applicability

• Appropriate measures of goodness-of-fit, robustness and predictivity

• A mechanistic interpretation, if possible

Key notions highlighted in these definitions:

• Boundaries

• Reliability

• Applicability

• Likelihood?

A good foundation to build on

Mathea M, Klingspohn W, Baumann K. Chemoinformatic Classification Methods and their Applicability Domain. Mol Inf. 2016 May 1;35(5):160–80.

Gadaleta D, Mangiatordi GF, Catto M, Carotti A, Nicolotti O. Applicability Domain for QSAR Models: Where Theory Meets Reality. International Journal of Quantitative Structure-Property Relationships. 2016 Jan;1(1):45–63.

Norinder U, Rybacka A, Andersson PL. Conformal prediction to define applicability domain – A case study on predicting ER and AR binding. SAR and QSAR in Environmental Research. 2016 Apr 2;27(4):303–16.

Toccaceli P, Nouretdinov I, Gammerman A. Conformal Predictors for Compound Activity Prediction. arXiv:1603.04506 [cs] [Internet]. 2016 Mar 14 [cited 2016 May 11]; Available from: http://arxiv.org/abs/1603.04506

Roy K, Kar S, Ambure P. On a simple approach for determining applicability domain of QSAR models. Chemometrics and Intelligent Laboratory Systems. 2015 Jul 15;145:22–9.

Sheridan RP. The Relative Importance of Domain Applicability Metrics for Estimating Prediction Errors in QSAR Varies with Training Set Diversity. J Chem Inf Model. 2015 Jun 22;55(6):1098–107.

Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, et al. QSAR Modeling: Where have you been? Where are you going to? J Med Chem. 2014 Jun 26;57(12):4977–5010.

Carrió P, Pinto M, Ecker G, Sanz F, Pastor M. Applicability Domain Analysis (ADAN): A Robust Method for Assessing the Reliability of Drug Property Predictions. J Chem Inf Model. 2014 May 27;54(5):1500–11.

Toplak M, Močnik R, Polajnar M, Bosnić Z, Carlsson L, Hasselgren C, et al. Assessment of Machine Learning Reliability Methods for Quantifying the Applicability Domain of QSAR Regression Models. J Chem Inf Model. 2014 Feb 24;54(2):431–41.

Horvath D, Marcou G, Varnek A. Predicting the Predictability: A Unified Approach to the Applicability Domain Problem of QSAR Models. J Chem Inf Model. 2009 Jul 27;49(7):1762–76.

And many more...

Current common methods

Molecule classes

• Organic-Organometallic-Inorganic

• Class of molecules (aromatic amines)

Feature representation

• Unseen features

Agreement based

• RF consensus

• kNN

Descriptor ranges

• Box

• Convex hull

Distance based methods

• Distance to data points

• Density

Response domain
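Two of the methods listed above, the descriptor range "box" and the distance to data points, can be sketched in a few lines. This is only an illustration under stated assumptions: descriptors are plain numeric tuples, the distance is Euclidean, and the function names and toy data are hypothetical rather than taken from any particular tool.

```python
import math


def in_descriptor_box(training, query):
    """True if every query descriptor lies within the training min/max range."""
    for j, value in enumerate(query):
        column = [row[j] for row in training]
        if not (min(column) <= value <= max(column)):
            return False
    return True


def mean_knn_distance(training, query, k=3):
    """Mean Euclidean distance from the query to its k nearest training points."""
    distances = sorted(math.dist(row, query) for row in training)
    return sum(distances[:k]) / k


# Toy training set with two descriptors per compound (hypothetical values).
training = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (2.0, 2.0)]
inside = (2.0, 2.5)
outside = (6.0, 2.0)

print(in_descriptor_box(training, inside), in_descriptor_box(training, outside))
print(mean_knn_distance(training, inside) < mean_knn_distance(training, outside))
```

The box test is cheap but treats empty regions inside the ranges as covered, which is one reason the distance and density based methods are listed separately.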

Illustrations of these methods:

• Unseen features: no boronic acids in the training set

• Agreement based: nearest neighbours, random forest

• Descriptor ranges: convex hull

• Distance to data points: distance to data

• Density: descriptor density (density of data)

• Response domain: predicted property

Mixture of different concepts

Reliability (is the prediction reliable?)

Applicability (can I use this model to make a prediction?)

Decidability (can I make a clear decision?)

All three are currently presented under the single label “Applicability Domain”.

Towards an extended and more formal framework

Confidence in the prediction if ...

• Applicability domain: my model can be applied to this query compound

• Reliability domain: the prediction is reliable enough for my use case

• Decidability domain: I can make a clear decision

Applicability (of the model)

Applicability: model boundaries (designer's specifications)

Can I apply my model? If yes, perform the prediction; if no, the query is out of the applicability domain.

• Is the class of my query compound supported by the model? e.g. exclude polymers, proteins, inorganic molecules, etc.

• Is my query compound within the descriptor ranges of the training set? e.g. inside the convex hull, minimum information density

• Did my model see all the structural features present in the query compound? e.g. not in domain: contains an unseen boronic acid functional group
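The class and unseen-feature questions above lend themselves to a simple rule-driven check. The sketch below assumes each compound is already described by a class label and a set of structural feature names; the labels, feature names and function names are hypothetical.

```python
# Hypothetical designer's specification: classes the model supports.
SUPPORTED_CLASSES = {"organic"}


def unseen_features(training_feature_sets, query_features):
    """Features of the query that never occur in any training compound."""
    seen = set().union(*training_feature_sets)
    return set(query_features) - seen


def check_applicability(query_class, query_features, training_feature_sets):
    """Return (is_applicable, reason) for a query compound."""
    if query_class not in SUPPORTED_CLASSES:
        return False, "unsupported class: " + query_class
    missing = unseen_features(training_feature_sets, query_features)
    if missing:
        return False, "unseen features: " + ", ".join(sorted(missing))
    return True, "in applicability domain"


training_feature_sets = [{"aromatic_amine", "nitro"}, {"aromatic_amine", "halide"}]

print(check_applicability("organic", {"nitro", "halide"}, training_feature_sets))
print(check_applicability("organic", {"nitro", "boronic_acid"}, training_feature_sets))
print(check_applicability("polymer", {"nitro"}, training_feature_sets))
```

Returning the reason alongside the verdict keeps the check transparent for the end user.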

Reliability (of the prediction)

Reliability: prediction boundaries (user defined)

Can I trust my prediction? If yes, look at the prediction; if no, the prediction is out of the reliability domain.

• How close are the nearest neighbours?

• How reliable are these nearest data points? e.g. GLP compliance

• How well did my model predict these data points? e.g. performance during cross-validation (CV)
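One possible reading of the reliability questions, sketched here under explicit assumptions: each training compound carries a flag recording whether the model predicted it correctly during cross-validation, neighbours are found by Euclidean distance, and the `max_distance` cut-off is a user-defined value. None of these names or thresholds come from the slides.

```python
import math


def local_reliability(training, query, k=3, max_distance=1.0):
    """Fraction of the k nearest training compounds predicted correctly in CV.

    training: list of (descriptors, cv_correct) pairs, assumed to hold at
    least k compounds. Returns 0.0 when the k-th neighbour is farther away
    than max_distance, i.e. the neighbourhood is too sparse to say anything.
    """
    neighbours = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    if math.dist(neighbours[-1][0], query) > max_distance:
        return 0.0
    return sum(1 for _, cv_correct in neighbours if cv_correct) / k


training = [
    ((0.0, 0.0), True),
    ((0.1, 0.0), True),
    ((0.0, 0.1), False),
    ((5.0, 5.0), True),
]

print(local_reliability(training, (0.0, 0.0)))  # dense neighbourhood
print(local_reliability(training, (5.0, 5.0)))  # isolated query
```

The score would then be compared with a threshold chosen for the use case; a quality weight for each data point (e.g. GLP compliance) could be added in the same way.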

Decidability (of the outcome)

Decidability: likelihood boundaries (user defined)

Can I make a clear call? If yes, make a statement; if no, the outcome is equivocal or undecided.

• Does my evidence converge or conflict? e.g. k nearest neighbours distribution

• Is there a consensus between intermediate conclusions? e.g. RF tree distribution

• Is my posterior likelihood strong enough? e.g. Naïve Bayes posterior probability
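As a minimal sketch of the decidability step, the labels of the k nearest neighbours (or the votes of the trees in a random forest) can be reduced to a call only when they agree strongly enough. The `min_agreement` threshold is a hypothetical user-defined value, not one given in the talk.

```python
from collections import Counter


def make_call(votes, min_agreement=0.7):
    """Return the majority label, or "equivocal" if agreement is too weak."""
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    return label if agreement >= min_agreement else "equivocal"


converging = ["active", "active", "active", "active", "inactive"]
conflicting = ["active", "inactive", "active", "inactive", "active"]

print(make_call(converging))
print(make_call(conflicting))
```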

Implementation

Decision flow: perform the prediction (else out of applicability domain), ask “Can I trust my prediction?”, then make a statement (else equivocal or undecided).

Implementation options shown for the steps:

• Calibrate (conformal analysis based on likelihood)

• Use local model performance

• Calibrate (null hypothesis)

• Rule driven

Confidence

Predictability

Reliability

Confidence (subjective)

Applicability

Confidence in the decision

Axes: reliability (low to high) and decidability (low to high)

Can I trust my prediction?

• Low reliability: outside the Reliability Domain, whatever the decidability. With high decidability: “We can't trust this high likelihood!”

• High reliability: confident call (high decidability), difficult call (activity cliff), or equivocal / undecided (low decidability)

Reliability and Likelihood are different concepts

Reliability and Likelihood are not interconvertible

They can't compensate for each other (e.g. low likelihood can't be compensated for by high reliability)

Intuitive, unambiguous and formal decision framework

• Applicability: can I apply my model? If not: out of applicability domain

• Reliability: is my prediction reliable enough? If not: out of reliability domain. If so: look at the prediction

• Decidability: can I make a clear call? If not: equivocal / undecided. If so: make a statement

Together these three steps form the decision domain.

Decision Domain

“The Decision Domain (DC) is the scope in which it is possible to apply the model to make a non-equivocal decision based on a reliable prediction”.

Proposal: Extend and refine the Applicability Domain into a Decision Domain
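The three layers can be chained into a single decision function. The sketch below is only one way to read the proposal: it assumes a binary endpoint, a reliability score and a likelihood both scaled to [0, 1], and hypothetical use-case thresholds. It also shows that a high likelihood does not rescue a prediction whose reliability is too low.

```python
def decide(applicable, reliability, likelihood,
           min_reliability=0.8, min_likelihood=0.7):
    """Walk through applicability, reliability and decidability in order."""
    if not applicable:
        return "out of applicability domain"
    if reliability < min_reliability:
        return "out of reliability domain"
    # likelihood is taken as the probability of the positive class
    if max(likelihood, 1.0 - likelihood) < min_likelihood:
        return "equivocal / undecided"
    return "positive" if likelihood >= 0.5 else "negative"


print(decide(False, 0.9, 0.9))   # model cannot be applied
print(decide(True, 0.3, 0.99))   # high likelihood, but not trustworthy
print(decide(True, 0.9, 0.55))   # reliable, but no clear call
print(decide(True, 0.9, 0.9))    # inside the decision domain
```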

Transparent decision process

Decision domain: Applicability, Reliability, Decidability

Decision support:

• Interpretation: I understand which assumptions the model has used in its reasoning

• Support: I know which data (examples) support the model's conclusion

Transparent decision: decision domain plus decision support

TARDIS principle

Requirements to usefully support a decision process:

• Transparency (of the method)

• Applicability (of the model)

• Reliability (of the prediction)

• Decidability (non-equivocal result)

• Interpretability (of the result)

• Support (evidence supporting the result)

Conclusion

• In the context of risk assessment, expert and statistical models are tools to support a decision process.

• These models should provide transparent and interpretable conclusions that can be easily assessed by human experts using their own knowledge.

• Ultimately, it is the human experts who make the decision, based on their knowledge and the helpful information provided by the tools.

Lhasa Limited

Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS

Registered Charity (290866)

Company Registration Number 01765239

+44(0)113 394 6020

info@lhasalimited.org

www.lhasalimited.org

Thank you

Confidence vs Observed accuracy

Likelihood

Accuracy ∼ Confidence ?

Confidence vs Observed accuracy (current)

Confidence descriptors: likelihood (RF, SVM, etc.) and reliability (local CV accuracy, kNN)

Meta model (training dataset): polynomial fitting (Sarah), giving an individual prediction accuracy estimate. At what level can I trust my prediction?

Calibration (calibration dataset): null hypothesis (p-value) or conformal prediction (target accuracy). Confidence metric + desired accuracy A%: prediction within A% or no prediction. Can I trust my prediction to a given level?

Confidence vs Observed accuracy (proposed)

Confidence

Reliability and Likelihood are not interchangeable

They can't compensate for each other (low likelihood can't be compensated for by high reliability)

C ∼ R ⋅ L

Confidence model (proposed)

Confidence has two components: reliability (noise estimate) and likelihood (accuracy estimate)

[Two plots against likelihood: low reliability gives a high RMSE; high reliability gives a low RMSE]

Reliability: use-case-defined threshold (“...with a given reliability”). Which level of likelihood ~ accuracy fitting do I need for my use case?

If the reliability of the prediction meets this criterion, then I can use the likelihood as an accuracy metric.

Likelihood: meta model captured during cross-validation. Likelihood ~ accuracy estimate

Confidence vs Observed accuracy (proposed)

Likelihood is the meta model (likelihood ~ accuracy estimate)

Meta model calibration using conformal prediction:

• Likelihood becomes the non-conformity metric (under the requested reliability level assumption)

• Conformity with the calibration dataset

• Required accuracy A%: prediction with at least an accuracy of A%

Conformal prediction framework: Norinder U, Carlsson L, Boyer S, Eklund M. Introducing Conformal Prediction in Predictive Modeling. A Transparent and Flexible Alternative to Applicability Domain Determination. J Chem Inf Model. 2014;54(6):1596–603.
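A minimal sketch of how the likelihood could serve as a non-conformity metric in a class-conditional inductive conformal predictor. The assumptions are explicit: the non-conformity score is taken as 1 minus the likelihood of a class, the calibration scores are those of calibration compounds for their true class, and a single label is returned only when exactly one class remains in the prediction region. Function names and data are hypothetical; this is not presented as the method used in the talk or in the cited paper.

```python
def conformal_p_value(calibration_scores, query_score):
    """Share of calibration non-conformity scores at least as large as the query's."""
    at_least = sum(1 for s in calibration_scores if s >= query_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def predict_or_abstain(likelihoods, calibration_scores, required_accuracy):
    """Return a single label, or None when no single-label call can be made.

    likelihoods: {label: model likelihood for the query compound}
    calibration_scores: {label: non-conformity scores (1 - likelihood of the
        true class) of calibration compounds belonging to that label}
    """
    significance = 1.0 - required_accuracy
    region = []
    for label, likelihood in likelihoods.items():
        p_value = conformal_p_value(calibration_scores[label], 1.0 - likelihood)
        if p_value > significance:
            region.append(label)
    return region[0] if len(region) == 1 else None


# Tiny hypothetical calibration set (real ones need far more compounds).
calibration_scores = {
    "active": [0.1, 0.2, 0.3, 0.4],
    "inactive": [0.1, 0.2, 0.3, 0.4],
}

print(predict_or_abstain({"active": 0.9, "inactive": 0.1}, calibration_scores, 0.7))
print(predict_or_abstain({"active": 0.55, "inactive": 0.45}, calibration_scores, 0.7))
print(predict_or_abstain({"active": 0.9, "inactive": 0.1}, calibration_scores, 0.9))
```

With only four calibration compounds per class the smallest possible p-value is 0.2, so asking for 90% accuracy leaves both classes in the region and no call is made: the required accuracy limits what the calibration data can support.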
