ECML-2001 1
Estimating the predictive accuracy of a classifier
Hilan Bensusan
Alexandros Kalousis
Why do we need to estimate classifier performance?
To perform model selection without a previously established pool of classifiers.
To make meta-learning more automatic and less dependent on human experts.
To gain insight into the areas of expertise of different classifiers.
Meta-learning
Meta-learning is the endeavour to learn something about the expected performance of a classifier from previous applications.
It depends heavily on the way datasets are characterised.
It has concentrated on predicting the suitability of a classifier and on classifier selection from a pool.
Regression to predict performance
In this paper we examine an approach to the direct estimation of performances through regression.
The work is related to zooming for ranking, but in zooming no knowledge about the classifiers themselves is gained.
Previous work includes
João Gama and Pavel Brazdil, in work related to StatLog (a single dataset characterisation only, with poor results reported in NMSE).
So Young Sohn (StatLog datasets again, with boosting and better results).
A recent paper by Christian Koepf (good results, but with few classifiers and artificial datasets only).
Our approach
Broaden the research by comparing different dataset characterisation strategies and different regression methods.
A metadataset for each classifier is composed of a set of dataset characterisation attributes together with the performance of the classifier on each dataset.
We concentrate on 8 classifiers:
two decision tree classifiers (C5.0tree and Ltree),
Naive Bayes,
Linear discriminant,
two rule methods (C5.0rules and Ripper),
nearest neighbour,
and a combination method (C5.0boost).
Strategies of dataset characterisation:
A set of information-theoretical and statistical features of the datasets developed after StatLog (dct).
A finer grained development of the StatLog characteristics, where histograms are used to describe the distributions of features computed for each attribute of a dataset (histo).
Landmarking (land).
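To make the setup concrete, here is a minimal sketch of how a metadataset can be assembled: a handful of StatLog-style meta-features per dataset, paired with the classifier's measured accuracy as the regression target. The feature names and helper interface are hypothetical, and the features shown are an illustrative subset, not the paper's full dct set.

```python
import numpy as np

def skewness(x):
    """Sample skewness E[(x - mu)^3] / sigma^3 (0 for a constant column)."""
    mu, sigma = x.mean(), x.std()
    if sigma == 0:
        return 0.0
    return float(np.mean((x - mu) ** 3) / sigma ** 3)

def dct_style_features(X, y):
    """An illustrative subset of StatLog-style (dct) meta-features."""
    n, p = X.shape
    return {
        "n_examples": n,
        "n_attributes": p,
        "n_classes": int(len(np.unique(y))),
        "mean_abs_skew": float(np.mean([abs(skewness(X[:, j])) for j in range(p)])),
    }

def build_metadataset(datasets, evaluate_classifier):
    """One meta-example per dataset: characterisation attributes plus the
    classifier's measured accuracy as the regression target."""
    meta = []
    for X, y in datasets:
        row = dct_style_features(X, y)
        row["accuracy"] = evaluate_classifier(X, y)  # e.g. mean 10-fold CV accuracy
        meta.append(row)
    return meta
```

A regression method is then trained on these rows to predict `accuracy` from the other attributes.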
Landmarking
A characterisation technique where the performance of simple, bare-bones learners on a dataset is used to characterise it.
In this paper we use seven landmarkers: Decision node, Worst node, Randomly chosen node, Naive Bayes, 1-Nearest Neighbour, Elite 1-Nearest Neighbour and Linear Discriminant.
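As an illustration, two of the seven landmarkers can be sketched in plain Python. This is a simplified, assumption-laden version: the paper's exact estimation procedure for each landmarker may differ.

```python
import numpy as np

def one_nn_landmarker(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the point itself
        correct += (y[np.argmin(d)] == y[i])
    return correct / len(X)

def decision_node_landmarker(X, y):
    """Accuracy of the best single-attribute, single-threshold split (a
    'decision node'), evaluated on the training data for simplicity."""
    best = np.mean(y == np.bincount(y).argmax())   # majority-class fallback
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:          # both sides stay non-empty
            left = X[:, j] <= t
            acc = (np.mean(y[left] == np.bincount(y[left]).argmax()) * left.mean()
                   + np.mean(y[~left] == np.bincount(y[~left]).argmax()) * (~left).mean())
            best = max(best, acc)
    return best

def landmark_features(X, y):
    """Characterise a dataset by the performance of bare-bones learners."""
    return {"one_nn": one_nn_landmarker(X, y),
            "decision_node": decision_node_landmarker(X, y)}
```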
Regression on accuracies
The quality of the estimate depends on its closeness to the actual accuracy achieved by the classifier, measured by the Mean Absolute Deviation (MAD) using 10-fold cross-validation.
MAD is defined as the sum of the absolute differences between real and predicted values divided by the number of test items.
dMAD, the MAD obtained by always predicting the mean error, is used as a baseline reference.
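These two definitions translate directly into code, assuming the true and predicted accuracies are given as plain arrays:

```python
import numpy as np

def mad(true, predicted):
    """Mean Absolute Deviation: mean |true - predicted| over test items."""
    true, predicted = np.asarray(true), np.asarray(predicted)
    return float(np.mean(np.abs(true - predicted)))

def dmad(true_train, true_test):
    """Default MAD: the error of always predicting the training mean."""
    return mad(true_test, np.full(len(true_test), np.mean(true_train)))
```

A characterisation strategy is useful only if its MAD beats dMAD.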
Regression methods and datasets
We used a kernel method and Cubist for regression.
65 datasets from UCI and METAL were used.
Classifier performance is the mean of the accuracies over the 10 cross-validation folds.
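A minimal sketch of that evaluation loop, with `fit_predict` standing in for any of the eight classifiers (the name and interface are hypothetical):

```python
import numpy as np

def ten_fold_accuracy(X, y, fit_predict, k=10, seed=0):
    """Mean accuracy over k cross-validation folds. `fit_predict` is any
    function (X_train, y_train, X_test) -> predicted labels."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    accs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)          # everything outside the fold
        preds = fit_predict(X[train], y[train], X[fold])
        accs.append(np.mean(preds == y[fold]))
    return float(np.mean(accs))
```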
Estimating with kernel
Classifier   dct    histo  land   dMAD
C5.0boost    0.112  0.123  0.050  0.134
C5.0rules    0.110  0.121  0.051  0.133
C5.0tree     0.110  0.123  0.054  0.137
lindisc      0.118  0.129  0.063  0.137
ltree        0.105  0.113  0.041  0.132
Near.Nei.    0.120  0.138  0.081  0.153
NaiBayes     0.121  0.143  0.064  0.146
ripper       0.113  0.128  0.056  0.145
Estimating with Cubist
Classifier   dct    histo  land   dMAD
C5.0boost    0.103  0.128  0.033  0.134
C5.0rules    0.121  0.126  0.036  0.133
C5.0tree     0.114  0.130  0.044  0.137
lindisc      0.118  0.140  0.054  0.137
ltree        0.114  0.121  0.032  0.132
Near.Nei.    0.150  0.149  0.067  0.153
NaiBayes     0.126  0.149  0.044  0.146
ripper       0.128  0.131  0.041  0.145
Using estimates to rank
Rankings are compared for similarity using Spearman's rank correlation.
Zooming cannot be applied to land since we should not use a classifier to rank itself (we use land- instead).
We compare the ranking estimates with the true ranking.
The default ranking is computed over all datasets (C5b,r,t,Lt,rip,nn,nb,lind downwards).
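For rankings without ties, Spearman's coefficient can be computed with the classic closed form; a small sketch over the eight classifier positions:

```python
import numpy as np

def spearman(rank_a, rank_b):
    """Spearman's rank correlation for two tie-free rankings:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    a, b = np.asarray(rank_a), np.asarray(rank_b)
    d = a - b
    n = len(a)
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))
```

Identical rankings give 1, a fully reversed ranking gives -1.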
Average Spearman's Correlation Coefficients with the True Ranking
Rankings  Kernel  Cubist  Zooming
Default   0.330   0.330   0.330
dct       0.435   0.083   0.341
histo     0.405   0.174   0.371
land      0.180   0.185   -
land-     -       0.090   0.190
Gaining insight about classifiers
Example of a land rule:
(34 cases, mean error 0.218)
IF Rand_Node <= 0.57
Elite_Node > 0.084
THEN mlcnb = 0.167
+ 0.239 Rand_Node
- 0.18 Worst_Node
+ 0.105 Elite_Node
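Read as code, the rule fires only when both conditions hold and then returns a linear estimate of the Naive Bayes error (mlcnb); a minimal sketch, where returning None stands in for falling through to another rule of the Cubist model:

```python
def mlcnb_rule(rand_node, worst_node, elite_node):
    """The land rule above, as a function over landmarker values."""
    if rand_node <= 0.57 and elite_node > 0.084:
        return 0.167 + 0.239 * rand_node - 0.18 * worst_node + 0.105 * elite_node
    return None  # another rule in the model would apply instead
```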
Conclusions
Regression can be used to estimate performances.
Meta-learning needs good dataset characterisation.
Landmarking is the best dataset characterisation strategy for performance estimation, but not the best one for ranking.
Future work includes further exploration of dataset characterisation strategies and of the results of combining them, as well as explaining the still puzzling result of landmarking in ranking.