A. Cornuéjols
AgroParisTech
(based in part on Sebastian Thrun's CMU class and on the tutorial of Padraic Cunningham at ECML-09)
Evaluating learning algorithms
Evaluating ML algorithms 2 A. Cornuéjols
Questions
Since induction is fallible, it is necessary to be able to assess its reliability.
! Typical questions:
– What is the true performance of my (learned) classification rule?
– Is my learning algorithm better than this other one?
Evaluating ML algorithms 3 A. Cornuéjols
Outline
1. Measuring the error rate
2. Confusion matrices and various performance criteria
3. The ROC curve
Evaluating ML algorithms 4
Estimating the true error rate
A. Cornuéjols
Evaluating ML algorithms 5 A. Cornuéjols
Evaluating classification rules
Large data sample
Very small data sample
Unlimited sample
Evaluating ML algorithms 6 A. Cornuéjols
Various sets of data
The whole available data set
Learning set Validation set Test set
Evaluating ML algorithms 7 A. Cornuéjols
Asymptotic behaviour (ideal case)
! Useful for very large data sets
Evaluating ML algorithms 8 A. Cornuéjols
Over-fitting (over-learning)
[Figure: error as a function of training time t. The error on the training set keeps decreasing while the error on the test set rises again past the point where learning should be stopped: beyond it, over-fitting sets in.]
Evaluating ML algorithms 9 A. Cornuéjols
Over-fitting (NNs)
[Figures: error curves for … examples]
Evaluating ML algorithms 10 A. Cornuéjols
Why use a test set?
! The control parameters of the learning algorithm (e.g., number of hidden layers, number of neurons, ...)
– are tuned in order to reduce the error on the validation set.
! In order to have a non-optimistically biased estimate of the error, one must measure it on an independent data set: the test set.
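As a concrete sketch of such a split (the 60/20/20 fractions and the function name are arbitrary choices for this example, not a recommendation from the slides):

```python
import random

def three_way_split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the data, then cut it into learning, validation and test sets."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n_train = int(train_frac * len(data))
    n_val = int(val_frac * len(data))
    # The test set is held out until the very end, for an unbiased estimate.
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])
```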
Evaluating ML algorithms 11 A. Cornuéjols
Evaluating classification rules
Amount of data: a lot … few
Evaluating ML algorithms 12 A. Cornuéjols
Evaluating the error rate
! True error (real risk):

$$e_D \;=\; \int \delta\bigl(y, f(x)\bigr)\, p(x, y)\; dx\, dy$$

! Test error (empirical risk):

$$\hat{e}_T \;=\; \frac{1}{m} \sum_{(x,y) \in T} \delta\bigl(y, f(x)\bigr)$$

D = the true distribution
m = # of test examples
T = test data
δ(y, f(x)) = 1 if f(x) ≠ y, 0 otherwise (the 0/1 loss)
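A minimal sketch of the empirical-risk computation, assuming (for this example only) that the classifier f is a Python function and the test set a list of (x, y) pairs:

```python
def empirical_error(f, test_set):
    """Empirical risk: the fraction of test examples (x, y) that f misclassifies."""
    return sum(1 for x, y in test_set if f(x) != y) / len(test_set)
```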
Evaluating ML algorithms 13 A. Cornuéjols
Example:
! The learned hypothesis incorrectly classifies 12 out of 40 examples in the test set T.
! Q : What will be the true error rate?
! A: ???
Evaluating ML algorithms 14 A. Cornuéjols
Confidence intervals
! We want to estimate errorD(h).
! We estimate it by errorT(h), which (up to the scaling by m) follows a binomial distribution
– with mean errorD(h)
– and standard error $\sqrt{error_D(h)\,(1 - error_D(h))/m}$
! Confidence intervals are then obtained from the normal approximation to this distribution, with the same mean and standard deviation.
Evaluating ML algorithms 15 A. Cornuéjols
Confidence intervals
! The normal distribution
Evaluating ML algorithms 16 A. Cornuéjols
Confidence intervals
With probability N%, the true error errorD lies in the interval:

$$\hat{e}_T \;\pm\; z_N \sqrt{\frac{\hat{e}_T\,(1 - \hat{e}_T)}{m}}$$

N%:   50%   68%   80%   90%   95%   98%   99%
z_N:  0.67  1.00  1.28  1.64  1.96  2.33  2.58
Evaluating ML algorithms 17 A. Cornuéjols
Confidence intervals (cf. Mitchell 97)
If
– T contains m examples, independently sampled
– m ≥ 30
Then
– With probability 95%, the true error e_D lies within:

$$\hat{e}_T \;\pm\; 1.96 \sqrt{\frac{\hat{e}_T\,(1 - \hat{e}_T)}{m}}$$
Evaluating ML algorithms 18 A. Cornuéjols
Example:
! The learned hypothesis incorrectly classifies 12 out of 40 test examples in T.
! Q: What will be the true error on unseen examples?
! A: With 95% confidence, the true error will lie within $[0.16;\ 0.44]$:

$$\hat{e}_T = \frac{12}{40} = 0.3, \qquad m = 40, \qquad 1.96\sqrt{\frac{\hat{e}_T\,(1 - \hat{e}_T)}{m}} \approx 0.14$$
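The same computation as a small sketch (the helper function is ours, not from the slides); run on the slide's numbers it reproduces the interval:

```python
import math

def error_confidence_interval(e_hat, m, z=1.96):
    """Normal-approximation interval for the true error (valid for m >= 30)."""
    half_width = z * math.sqrt(e_hat * (1 - e_hat) / m)
    return e_hat - half_width, e_hat + half_width

# The slide's example: 12 errors out of 40 test examples.
print(error_confidence_interval(12 / 40, 40))   # -> approximately (0.158, 0.442)
```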
Evaluating ML algorithms 19 A. Cornuéjols
95% confidence intervals
Evaluating ML algorithms 20 A. Cornuéjols
Performance curves
[Figure: test error and training error curves, with 95% confidence intervals]
Evaluating ML algorithms 21 A. Cornuéjols
Evaluating learned hypotheses
Amount of data: a lot … few
Evaluating ML algorithms 22 A. Cornuéjols
Various sets
Data
Learn on one part; test on the other → error
Evaluating ML algorithms 23 A. Cornuéjols
Small data sets: a dilemma
Evaluating ML algorithms 24 A. Cornuéjols
Small data sets: a dilemma
Evaluating ML algorithms 25 A. Cornuéjols
Cross validation (k-fold)

Data: k-way split.
For each fold i = 1, …, k: learn on the yellow parts, test on the pink part → error_i

error = (1/k) Σ_i error_i
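A minimal sketch of the procedure, where `train_and_test(train, test)` stands for whichever learning algorithm is under evaluation (an assumed callback, not something the slides fix):

```python
def k_fold_cv(data, k, train_and_test):
    """k-fold cross-validation: every example lands in the test part exactly once.

    train_and_test(train, test) is assumed to learn on `train` and return
    the error rate measured on `test`.
    """
    folds = [data[i::k] for i in range(k)]   # k roughly equal parts
    errors = []
    for i in range(k):
        train = [x for j in range(k) if j != i for x in folds[j]]
        errors.append(train_and_test(train, folds[i]))
    return sum(errors) / k                   # error = (1/k) * sum_i error_i
```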
Evaluating ML algorithms 26 A. Cornuéjols
The “leave-one-out” procedure
Data: each of the m examples is left out in turn; learn on the other m − 1 and test on the left-out example.
! Low bias
! High variance
! Tends to under-estimate the error if the data are not fully i.i.d. [Guyon & Elisseeff, JMLR, 03]
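Leave-one-out is simply k-fold cross-validation pushed to the limit k = m, so reusing the `k_fold_cv` sketch above:

```python
def leave_one_out(data, train_and_test):
    """Leave-one-out = k-fold cross-validation with k = m = len(data)."""
    return k_fold_cv(data, len(data), train_and_test)
```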
Evaluating ML algorithms 27 A. Cornuéjols
The Bootstrap estimate
Data: draw a sample of the same size, with replacement.
– Learn on the drawn (yellow) examples, test on the never-drawn (pink) ones → error
– Repeat and compute the mean
Evaluating ML algorithms 28 A. Cornuéjols
Problem
! The calculation of the confidence interval assumes that the estimations are independent.
! But our estimations are not independent:
the estimate of the true risk for the final h is the mean of the risks over the k test samples, i.e., the mean risk over the whole data set.
Evaluating ML algorithms 29 A. Cornuéjols
Types of performance criteria
Evaluating ML algorithms 30
Confusion matrices
and various performance criteria
A. Cornuéjols
Evaluating ML algorithms 31 A. Cornuéjols
Confusion matrix
              Actual
Predicted      +     −
   +           TP    FP
   −           FN    TN
Evaluating ML algorithms 32 A. Cornuéjols
Confusion matrix
14% of the butterflies are recognized as fishes
Evaluating ML algorithms 33 A. Cornuéjols
Types of performance criteria
Evaluating ML algorithms 34 A. Cornuéjols
Types of performance criteria
Evaluating ML algorithms 35 A. Cornuéjols
Types of performance criteria
Evaluating ML algorithms 36 A. Cornuéjols
Types of performance criteria
Evaluating ML algorithms 37 A. Cornuéjols
Types of performance measures
Evaluating ML algorithms 38 A. Cornuéjols
Performance measures
! Sensitivity = TP / (TP + FN)
! Specificity = TN / (TN + FP)
! Recall = TP / (TP + FN)
! Precision = TP / (TP + FP)

              Actual
Predicted      +     −
   +           TP    FP
   −           FN    TN
Evaluating ML algorithms 39 A. Cornuéjols
Performance measures
! FN-rate = FN / (TP + FN)
! FP-rate = FP / (FP + TN)
! F-measure = (2 × recall × precision) / (recall + precision) = 2 TP / (2 TP + FP + FN)

              Actual
Predicted      +     −
   +           TP    FP
   −           FN    TN
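All of these measures follow directly from the four cells of the confusion matrix; a small sketch gathering them (the function and its names are ours, not from the slides):

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard measures derived from a binary confusion matrix."""
    return {
        "recall":      tp / (tp + fn),        # = sensitivity
        "precision":   tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "fp_rate":     fp / (fp + tn),        # = 1 - specificity
        "fn_rate":     fn / (tp + fn),        # = 1 - recall
        "f_measure":   2 * tp / (2 * tp + fp + fn),
    }
```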
Evaluating ML algorithms 40 A. Cornuéjols
Performance measures
Evaluating ML algorithms 41 A. Cornuéjols
Performance measures
Evaluating ML algorithms 42 A. Cornuéjols
Performance measures
Evaluating ML algorithms 43 A. Cornuéjols
Performance measures
[Table: a worked example computing Precision(good) from a confusion matrix over the classes "good" and "bad"]
Evaluating ML algorithms 44
The ROC curve
A. Cornuéjols
Evaluating ML algorithms 45 A. Cornuéjols
The ROC curve
Evaluating ML algorithms 46 A. Cornuéjols
Types of errors
Evaluating ML algorithms 47 A. Cornuéjols
The ROC curve
[Figure: distribution of the estimated class probability for class '+' and class '−', with the decision criterion (threshold) marked]
ROC = Receiver Operating Characteristic
Evaluating ML algorithms 48 A. Cornuéjols
The ROC curve
[Figure: the same two class-probability distributions, now partitioned by the decision criterion into true positives, false negatives, false positives and true negatives, with example proportions (50%)/(50%) and (90%)/(10%)]
Evaluating ML algorithms 49 A. Cornuéjols
The ROC curve
Evaluating ML algorithms 50 A. Cornuéjols
The ROC curve

[Figure: ROC plot, proportion of true positives vs. proportion of false positives, both from 0 to 1: the chance line (relevance = 0.5) and a ROC curve (relevance = 0.90)]
Evaluating ML algorithms 51 A. Cornuéjols
The ROC curve
[Figure: a "lenient" threshold and a "strict" threshold on the class-probability distributions, and the corresponding points on the ROC plot: each position of the decision criterion yields one (false-positive, true-positive) pair]
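Sweeping the decision criterion over the classifier's scores therefore traces the whole curve. A minimal sketch (the input format is an assumption of this example, and tied scores are not handled specially):

```python
def roc_points(scores, labels):
    """One (FP-rate, TP-rate) point per decision threshold.

    scores: estimated probability of class '+' for each example;
    labels: True for actual '+', False for actual '-'.
    """
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    # Sorted by decreasing score: lowering the threshold admits one example at a time.
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]                 # the "strict" extreme: reject everything
    for _, is_pos in ranked:
        if is_pos:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points                         # ends at (1.0, 1.0), the "lenient" extreme
```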
Evaluating ML algorithms 52 A. Cornuéjols
The ROC curve
Evaluating ML algorithms 53 A. Cornuéjols
The ROC curve
Evaluating ML algorithms 54 A. Cornuéjols
Comparison of learning algorithms: summary
! Comparison on a single data set
– [Dietterich, 1998] recommends using:
• 5 x 2 cross-validation
• Paired t-test
– The McNemar test on a validation set (see the sketch below)
! Comparison on multiple (different) data sets
– [Demsar, 2006] recommends using:
• Wilcoxon Signed Ranks Test
• The Friedman test
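As an illustration of the McNemar test mentioned above (the function and the counts b = 15, c = 5 are made up for this example): b and c count the validation examples misclassified by exactly one of the two algorithms, and the continuity-corrected statistic is referred to a chi-square distribution with one degree of freedom.

```python
import math

def mcnemar(b, c):
    """McNemar test: b = # examples misclassified only by algorithm A,
    c = # examples misclassified only by algorithm B (same validation set).
    Returns the corrected statistic and its p-value under the
    chi-square(1) null hypothesis of equal error rates."""
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square(1) survival function
    return stat, p_value

stat, p = mcnemar(15, 5)
print(stat, p)  # stat = 4.05, p ~ 0.044 -> significant at the 5% level
```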
Evaluating ML algorithms 55 A. Cornuéjols
Summary
! Pay attention to your cost function:
– what matters for the performance measure?
! Finite amounts of data:
– compute the confidence intervals
! Scarce data:
– be careful about the split between training data and test data; use cross-validation
! Do not forget the validation set
! Evaluation is very important
– be critical
– convince yourself!
Evaluating ML algorithms 56 A. Cornuéjols
Specific problems
! The distribution of the classes is very unbalanced (e.g. 1% or 1‰ for one of the two classes)
! “Gray zone” (uncertain labels)
! Multi-valued functions
Evaluating ML algorithms 57 A. Cornuéjols
Other evaluation criteria
! Intelligibility of the learned decision function
– e.g., SVMs or boosting score poorly on this criterion
! Performance in generalization
– often not correlated with the previous criterion
! Various costs – Data preparation
– Computational cost – Cost of the ML expertise – Cost of the domain expertise
Evaluating ML algorithms 58 A. Cornuéjols
References
! Dietterich, T. G. (1998). Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10(7), 1895-1924.
! Japkowicz, N. & Shah, M. (2011). Evaluating Learning Algorithms: A Classification Perspective. Cambridge University Press. (An interesting book)
Evaluating ML algorithms 59 A. Cornuéjols
The Weka ML toolkit ! http://www.cs.waikato.ac.nz/ml/weka/
Evaluating ML algorithms 60 A. Cornuéjols
The Weka ML toolkit