© L’Oréal - Reproduction interdite sans l’accord préalable écrit de L’Oréal.Reproduction prohibited without written agreement of L’Oréal.
Research & Innovation, Advanced Research
C. Gomes, H. Noçairi, M. Thomas
Alternative approaches for skin sensitization evaluation: Statistical and integrated
approach for the combination of non animal methods
ESTIV2012
Overview
Introduction/Context
Specific methodology
• Visualization of the methodology
• Process of validation rules
Data and Application
Conclusions and Perspectives
Statutory Context
L'Oréal is developing alternative safety evaluation approaches for the skin sensitization of ingredients by combining multiple in vitro and in silico data.
Purpose: develop a predictive model for hazard identification (Sensitizer / Non Sensitizer).
Data: a full data set of 165 chemicals described by 35 variables: in silico predictions (Derek, TIMES, Toxtree), in vitro test results (DPRA, MUSST, Nrf-2 and PGE-2), and numerous experimental or calculated physico-chemical parameters.
Specific Methodology
Objective: prediction of a binary outcome (Sensitizer / Non Sensitizer)
A large number of supervised classification models have been proposed in the literature. Which one to choose?
Using one single statistical approach induces a bias.
Solution: a "stacking" meta-model.
Specific Methodology
Repeated sub-sampling for variable selection
• Small number of observations
• Choice of different models: Boosting, Naïve Bayes, SVM, Sparse PLS-DA, and Expert Scoring
Visualization of the methodology
Stacking is a combination of 5 supervised classification methods: Boosting, Sparse PLS-DA, SVM, Naïve Bayes and the Score Method.
[Figure: for N subjects, the input variables (qualitative and quantitative) feed the five models (1 to 5). The five model outputs, together with the binary response variable (Sensitizer class S / Non Sensitizer class NS), feed the stacking meta-model, fitted by logistic PLS-DA.]
Each model provides a probability of being dangerous.
The stacking meta-model gives a robust prediction.
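The meta-level of the stacking idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the meta-model here is a plain logistic regression fitted by gradient descent (swapped in for the logistic PLS-DA named on the slide), and the base-model probabilities are made-up toy values, one column per model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta_model(base_probs, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights on the base-model probabilities."""
    n_models = len(base_probs[0])
    n = len(labels)
    weights = [0.0] * n_models
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for row, y in zip(base_probs, labels):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            err = p - y
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_meta(weights, bias, row):
    """Stacked probability of the Sensitizer class for one chemical."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))

# Toy, made-up outputs of the five base models
# (columns: Boosting, Sparse PLS-DA, SVM, Naive Bayes, Expert Score).
base_probs = [
    [0.90, 0.80, 0.85, 0.70, 0.95],
    [0.80, 0.60, 0.90, 0.75, 0.70],
    [0.70, 0.90, 0.60, 0.80, 0.85],
    [0.20, 0.30, 0.10, 0.40, 0.15],
    [0.10, 0.25, 0.30, 0.20, 0.05],
    [0.30, 0.10, 0.20, 0.35, 0.25],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = Sensitizer, 0 = Non Sensitizer

weights, bias = fit_meta_model(base_probs, labels)
p_s = predict_meta(weights, bias, [0.85, 0.75, 0.80, 0.70, 0.90])
p_ns = predict_meta(weights, bias, [0.15, 0.20, 0.25, 0.30, 0.10])
```

In practice the base-model probabilities fed to the meta-model would come from held-out predictions, so that the meta-model is not fitted on outputs the base models produced for their own training data.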
Process of validation rules
Step 1: Data (N observations) split into a learning set (70%) and a validation set (30%)
Step 2: Learning set split into Q subsets; each subset q (1st, ..., qth, ..., Qth) is divided into learning (80%) and test (20%)
Step 3: Parameterization of each model and of a stacking meta-model in each subset (1st, ..., qth, ..., Qth), and selection of the variables common to all subsets
Global stacking
Step 4: Stacking model on the learning set with the variables selected in step 3
Step 5: Stacking model on the validation set
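Steps 1 to 3 of this process can be sketched as follows. This is only an illustration under stated assumptions: `select_variables` is a hypothetical stand-in for the per-model variable selection (it keeps variables whose class means differ by at least `min_gap`), the value Q = 20 is arbitrary, and the data are synthetic.

```python
import random

def split(indices, first_fraction, rng):
    """Shuffle the indices and split them into two parts."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    cut = round(first_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def select_variables(X, y, rows, min_gap):
    """Hypothetical selector: keep variables whose class means differ by >= min_gap."""
    selected = set()
    for j in range(len(X[0])):
        pos = [X[i][j] for i in rows if y[i] == 1]
        neg = [X[i][j] for i in rows if y[i] == 0]
        if pos and neg and abs(sum(pos) / len(pos) - sum(neg) / len(neg)) >= min_gap:
            selected.add(j)
    return selected

def common_variables(X, y, Q=20, min_gap=1.0, seed=0):
    rng = random.Random(seed)
    # Step 1: learning set (70%) / validation set (30%)
    learning, validation = split(list(range(len(y))), 0.70, rng)
    common = None
    for _ in range(Q):
        # Step 2: each subset is a fresh 80% / 20% split of the learning set
        sub_learning, _sub_test = split(learning, 0.80, rng)  # test part would be used for tuning (not shown)
        # Step 3: keep only the variables selected in every subset
        chosen = select_variables(X, y, sub_learning, min_gap)
        common = chosen if common is None else common & chosen
    return learning, validation, sorted(common)

# Synthetic data: variable 0 is informative, variables 1 and 2 are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[3.0 * y[i] + data_rng.uniform(-0.4, 0.4),
      data_rng.uniform(-0.4, 0.4),
      data_rng.uniform(-0.4, 0.4)] for i in range(40)]

learning, validation, common = common_variables(X, y)
```

Steps 4 and 5 would then refit the stacking model on the full learning set using only the variables in `common` and evaluate it once on the validation set.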
Data and Application (Sensitizer / Non Sensitizer)
[Figure: predicted probabilities, one panel for the Stacking model and one for the Boosting model.]
• Probability ≥ 85%: Sensitizer conclusion
• Probability ≤ 15%: Non Sensitizer conclusion
• In between: inconclusive
Panel counts: (N=67: ≥ 85% or ≤ 15%) (N=135: ≥ 85% or ≤ 15%)
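The 85% / 15% decision rule shown on this slide amounts to a three-way classification of the predicted probability. A minimal sketch (the function name and default arguments are illustrative, not from the original):

```python
def conclusion(prob_sensitizer, high=0.85, low=0.15):
    """Map a predicted probability of being a sensitizer to a conclusion."""
    if prob_sensitizer >= high:
        return "Sensitizer"
    if prob_sensitizer <= low:
        return "Non Sensitizer"
    return "Inconclusive"

examples = [conclusion(p) for p in (0.92, 0.50, 0.08)]
```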
Performances on the validation set (N = 50)
Performance comparisons on a validation set (25 Sensitizers and 25 Non Sensitizers):
Taking into account only high-confidence probabilities (≥ 85% or ≤ 15%):
Results show that the stacking model performs better than all the other models taken separately, on a larger set.
Conclusions and Perspectives
Conclusions:
• The Stacking Meta-Model gives a prediction model with better performance, for the development of alternative approaches in the safety evaluation of chemicals, than each of the five initial models taken separately.
• This kind of alternative prediction tool will ultimately contribute to risk assessment decision making in a Weight of Evidence approach.
Perspectives:
• Implement other prediction models in the Stacking meta-model.
• Link the outputs of the statistical approach with the understanding of biological mechanisms.
• Obtain a predictive model for potency evaluation of sensitizers (multi-class case).
Thank you for your attention
Back up
Naïve Bayes
To refine the a priori probability for each test, a quality criterion (Quality Factor, noted QF) is used, based on Klimisch-like codes 1, 2 and 3:
o Klimisch-like "code 1": reliable results, QF = 1
o Klimisch-like "code 2": doubtful results, QF = 0.8
o Klimisch-like "code 3": unreliable results, QF = 0.2
The aim of this criterion is to correct the observed "raw" prediction by taking into account the reliability of the test, in the following way:
o Corrected Sensitivity = 0.5 + QF × (Sensitivity - 0.5)
o Corrected Specificity = 0.5 + QF × (Specificity - 0.5)
Example with QF = 0.2, sensitivity 0.75 and specificity 0.875:
o Corrected Sensitivity = 0.5 + 0.2 × (0.75 - 0.5) = 0.55
o Corrected Specificity = 0.5 + 0.2 × (0.875 - 0.5) = 0.575
Bayes' theorem relates the conditional and marginal probabilities of stochastic events A and B:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
With $p_0$ the a priori probability, $p_1$ the test sensitivity and $p_2$ the test specificity, the posterior probability after a positive test is:
$$P = \frac{p_0\,p_1}{p_0\,p_1 + (1 - p_0)(1 - p_2)}$$
Worked example, each posterior becoming the prior of the next test:
                  Test1    Test2    Test3
Sensitivity       0.8      0.75     0.67
Specificity       0.67     0.875    0.88
Result            A (=1)   B (=0)   A (=1)
Prior P(A)        0.5      0.705    0.41
Prior P(B)        0.5      0.29     0.59
Posterior P(A)    0.705    0.41     0.80
Posterior P(B)    0.29     0.59     0.20
Test1 (result A): P(A) = 0.5 × 0.8 / (0.5 × 0.8 + 0.5 × (1 - 0.67)) = 0.705
Test2 (result B): P(A) = 1 - 0.29 × 0.875 / (0.29 × 0.875 + 0.705 × (1 - 0.75)) = 0.41
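The sequential Bayesian update and the quality-factor correction described on this slide can be sketched as follows. The function names are illustrative; the numeric inputs are the sensitivities, specificities and results of the three tests in the worked example, used without quality correction (QF = 1).

```python
def corrected(value, qf):
    """Shrink a sensitivity or specificity towards 0.5 according to the quality factor."""
    return 0.5 + qf * (value - 0.5)

def update(prior_a, sensitivity, specificity, positive, qf=1.0):
    """Posterior probability of class A after one test result."""
    sens = corrected(sensitivity, qf)
    spec = corrected(specificity, qf)
    if positive:            # test result A (=1)
        like_a, like_b = sens, 1.0 - spec
    else:                   # test result B (=0)
        like_a, like_b = 1.0 - sens, spec
    numerator = prior_a * like_a
    return numerator / (numerator + (1.0 - prior_a) * like_b)

# (sensitivity, specificity, positive result?) for Test1, Test2, Test3
tests = [(0.8, 0.67, True), (0.75, 0.875, False), (0.67, 0.88, True)]

p = 0.5                     # initial prior P(A)
history = []
for sens, spec, positive in tests:
    p = update(p, sens, spec, positive)
    history.append(p)
```

Run on these inputs, the chain gives roughly 0.71, 0.41 and 0.79, which agrees with the slide's 0.705, 0.41 and 0.80 up to rounding. Passing `qf=0.2` for a test replaces its sensitivity and specificity by the corrected values (0.55 and 0.575 for 0.75 and 0.875) before the update.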
Expert Score
The score method makes it possible, through graphical visualization, to select important variables and to set thresholds.
Example for qualitative variables:
Score scenario for Var 1
              A Score (+1,+2,+3)   B Score (-1,-2,-3)
Modality 1    0                    -1
Modality 2    0                    0
Modality 3    +2                   0
[Figure: number N of class A and class B subjects in each modality (1, 2, 3) of Parameter 1.]
Expert Score
The score method makes it possible, through graphical visualization, to select important variables and to set thresholds.
Example for continuous variables:
Score scenario for Var 2
               B Score (+1,+2,+3)   A Score (-1,-2,-3)
<= Threshold   +2                   0
> Threshold    0                    0
[Figure: distribution of the value of Var 2 for classes B and A, with the threshold marked.]
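The two score scenarios (Var 1 qualitative, Var 2 continuous) can be turned into a per-subject score as sketched below. The assumptions are loud ones: the global score is taken to be the plain sum of the per-variable scores, the signed values are used exactly as they appear in the two scenario tables, and the threshold value 3.0 is hypothetical.

```python
# Net score per modality of Var 1 (A score + B score from the scenario table).
VAR1_SCORES = {"Modality 1": -1, "Modality 2": 0, "Modality 3": +2}

def score_var1(modality):
    """Score of a qualitative variable: looked up from its modality."""
    return VAR1_SCORES[modality]

def score_var2(value, threshold):
    """Score of a continuous variable: +2 at or below the threshold, 0 above."""
    return 2 if value <= threshold else 0

def global_score(subject, threshold):
    """Assumed aggregation: sum of the per-variable scores."""
    return score_var1(subject["var1"]) + score_var2(subject["var2"], threshold)

subjects = [
    {"var1": "Modality 3", "var2": 1.0},
    {"var1": "Modality 1", "var2": 5.0},
]
scores = [global_score(s, threshold=3.0) for s in subjects]
```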
Expert Score
Table: global score values for all subjects
              Global Score
Subject-1     7
...
Subject-n     6
Choice of the threshold: best compromise between sensitivity and specificity
[Figure: ROC curve, sensitivity versus 1 - specificity, both axes from 0 to 1.]
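One common way to formalize the "best compromise between sensitivity and specificity" is to maximize Youden's index (sensitivity + specificity - 1) along the ROC curve. The slide does not say which criterion was used, so this choice is an assumption, as are the toy global scores and the convention that a score at or above the threshold predicts the sensitizer class.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when 'score >= threshold' predicts class 1."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def best_threshold(scores, labels):
    """Threshold maximizing Youden's index over all observed score values."""
    best = None
    for t in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, t)
        youden = sens + spec - 1.0
        if best is None or youden > best[1]:
            best = (t, youden, sens, spec)
    return best

# Toy global scores: four sensitizers (1) and four non-sensitizers (0).
scores = [7, 6, 5, 4, 5, 3, 2, 1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
threshold, youden, sens, spec = best_threshold(scores, labels)
```

Each candidate threshold corresponds to one point (1 - specificity, sensitivity) of the ROC curve; other criteria, such as the point closest to the top-left corner, would be equally defensible.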
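Sparse PLS-DA
PLS-DA is a classification technique that combines the properties of PLS regression with the discriminating power of discriminant analysis.
[Figure: the scaled n × p matrix X and the n × q response matrix Y are linked by the PLS-DA model Y = b.X, where b is the regression vector; the score plot (t1, t2) shows the maximum variation between classes (B).]
PLS solves the optimization problem:
$$\max_{\|u_h\|=1,\ \|v_h\|=1} \operatorname{cov}(X u_h,\ Y v_h) \quad\Longleftrightarrow\quad \min_{\|u_h\|=1,\ \|v_h\|=1} \left\| X^T Y - u_h v_h^T \right\|^2$$
Sparse PLS solves the optimization problem:
$$\min_{u_h,\ v_h} \left\| X^T Y - u_h v_h^T \right\|^2 + P_{\lambda_1}(u_h) + P_{\lambda_2}(v_h)$$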
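To give a feel for how the penalty terms produce variable selection, here is a small sketch of a sparse rank-one approximation of M = XᵀY by alternating soft-thresholded updates. It rests on several assumptions not stated on the slide: the penalty is taken to be a lasso-type penalty (handled by soft-thresholding), only the X-side loading vector u is penalized, and M is a made-up 4 × 1 matrix, as for a single binary outcome.

```python
import math

def soft_threshold(values, lam):
    """Shrink each entry towards zero by lam; small entries become exactly zero."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in values]

def normalize(values):
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else values

def sparse_rank_one(M, lam, iterations=50):
    """Sparse loading vectors u (length p) and v (length q) for the p x q matrix M."""
    p, q = len(M), len(M[0])
    v = normalize([1.0] * q)
    u = [0.0] * p
    for _ in range(iterations):
        Mv = [sum(M[i][j] * v[j] for j in range(q)) for i in range(p)]
        u = normalize(soft_threshold(Mv, lam))      # penalized update of u
        Mtu = [sum(M[i][j] * u[i] for i in range(p)) for j in range(q)]
        v = normalize(Mtu)                          # unpenalized update of v
    return u, v

# Made-up cross-product matrix: variables 0 and 2 covary strongly with the outcome.
M = [[3.0], [0.2], [-2.0], [0.1]]
u, v = sparse_rank_one(M, lam=0.5)
selected = [i for i, ui in enumerate(u) if ui != 0]
```

Variables whose loading is shrunk to exactly zero drop out of the component, which is how the sparse version performs variable selection; larger values of `lam` keep fewer variables.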
Boosting
What are SVMs?
Support Vector Machines (SVMs) are a set of machine learning approaches used for classification and regression, developed by Vladimir Vapnik.
How does it work?
An SVM is based on the concept of decision planes that define decision boundaries.
Example 1: Linear SVMs
[Figure: Class A and Class B points in the (x1, x2) plane.]
How would you classify these points using a linear discriminant function in order to minimize the error rate?
Infinite number of answers!
Which one is the best?
Example 1: Linear SVMs
[Figure: Class A and Class B points in the (x1, x2) plane, with the separating boundary, its margin ("safety zone") and the support vectors x+ and x- lying on the margin.]
The linear discriminant function with the maximum margin is the best. The margin is defined as the maximal width by which the boundary could be moved away from the separating hyperplane before hitting the first data point.
Why is it the best? It is robust to outliers and thus has strong generalization ability.
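The margin of a candidate separating line can be computed directly as the smallest signed distance from the data points to the line. The sketch below compares two hand-picked separating lines on made-up points; it does not solve the SVM optimization problem, it only illustrates why one separating line is preferred over another.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w.x + b) / ||w|| over all points."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Made-up points: Class A labelled +1, Class B labelled -1.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two candidate boundaries w.x + b = 0; both separate the classes.
margin_diagonal = geometric_margin((1.0, 1.0), -2.5, points, labels)
margin_vertical = geometric_margin((1.0, 0.0), -1.5, points, labels)
```

Both margins are positive, so both lines classify every point correctly, but the diagonal line leaves a wider safety zone and would be preferred. A linear SVM searches over all (w, b) for the line with the largest such margin.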
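Example 2: Non-Linear SVMs
Datasets that are linearly separable with some noise work out great:
[Figure: points of the two classes along the x axis, separable by a single cut.]
But what are we going to do if the data set is just impossible to separate into 2 parts?
[Figure: points of the two classes along the x axis, not separable by a single cut.]
How about… mapping the data to a higher-dimensional space?
[Figure: the same points mapped to the plane (x, x²).]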
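The idea of mapping data to a higher-dimensional space can be illustrated on one-dimensional toy data. The mapping x → (x, x²) and the cut value 2.5 are assumptions chosen for this made-up example; in an actual SVM the mapping is usually applied implicitly through a kernel.

```python
def feature_map(x):
    """Map a 1-D point to the 2-D space (x, x squared)."""
    return (x, x * x)

def separable_1d(xs, labels):
    """True if a single threshold on x separates the two classes."""
    pos = [x for x, y in zip(xs, labels) if y == 1]
    neg = [x for x, y in zip(xs, labels) if y == -1]
    return max(pos) < min(neg) or max(neg) < min(pos)

def classify_mapped(x, cut=2.5):
    """Linear rule in the mapped space: compare the second coordinate to a cut."""
    _, x_squared = feature_map(x)
    return 1 if x_squared > cut else -1

# Made-up data: outer points in one class, inner points in the other.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]

linearly_separable = separable_1d(xs, labels)
predictions = [classify_mapped(x) for x in xs]
```

No single cut on the x axis separates these classes, but in the mapped space the horizontal line at height 2.5 does.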