ECML-2001 1
Estimating the predictive accuracy of a classifier
Hilan Bensusan
Alexandros Kalousis
Why do we need to estimate classifier performance?
To perform model selection without a previously established pool of classifiers.
To make meta-learning more automatic and less dependent on human experts.
To gain insight into the areas of expertise of different classifiers.
Meta-learning
Meta-learning is the endeavour to learn something about the expected performance of a classifier from previous applications.
It depends heavily on the way datasets are characterised.
It has concentrated on predicting the suitability of a classifier and on classifier selection from a pool.
Regression to predict performance
In this paper we examine an approach to the direct estimation of performances through regression.
The work is related to zooming for ranking, but in zooming no knowledge about the classifiers themselves is gained.
Previous work includes
João Gama and Pavel Brazdil, in work related to StatLog (a single dataset characterisation only, with poor results reported in NMSE).
So Young Sohn (StatLog datasets again, with boosting and better results).
A recent paper by Christian Koepf (good results, but with few classifiers and artificial datasets only).
Our approach
Broaden the research by comparing different dataset characterisation strategies and different regression methods.
A metadataset for each classifier is composed of a set of dataset characterisation attributes together with the performance of the classifier on each dataset.
We concentrate on 8 classifiers:
two decision tree classifiers (C5.0tree and Ltree),
Naive Bayes,
Linear discriminant,
two rule methods (C5.0rules and Ripper),
nearest neighbour,
and a combination method (C5.0boost).
Strategies of dataset characterisation:
A set of information-theoretical and statistical features of the datasets developed after StatLog (dct).
A finer grained development of the StatLog characteristics, where histograms are used to describe the distributions of features computed for each attribute of a dataset (histo).
Landmarking (land).
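To make the setup concrete, here is a minimal sketch of how a metadataset can be assembled: a handful of StatLog-style meta-features per dataset, paired with the classifier's measured accuracy as the regression target. The feature names and helper interface are hypothetical, and the features shown are an illustrative subset, not the paper's full dct set.

```python
import numpy as np

def skewness(x):
    """Sample skewness E[(x - mu)^3] / sigma^3 (0 for a constant column)."""
    mu, sigma = x.mean(), x.std()
    if sigma == 0:
        return 0.0
    return float(np.mean((x - mu) ** 3) / sigma ** 3)

def dct_style_features(X, y):
    """An illustrative subset of StatLog-style (dct) meta-features."""
    n, p = X.shape
    return {
        "n_examples": n,
        "n_attributes": p,
        "n_classes": int(len(np.unique(y))),
        "mean_abs_skew": float(np.mean([abs(skewness(X[:, j])) for j in range(p)])),
    }

def build_metadataset(datasets, evaluate_classifier):
    """One meta-example per dataset: characterisation attributes plus the
    classifier's measured accuracy as the regression target."""
    meta = []
    for X, y in datasets:
        row = dct_style_features(X, y)
        row["accuracy"] = evaluate_classifier(X, y)  # e.g. mean 10-fold CV accuracy
        meta.append(row)
    return meta
```

A regression method is then trained on these rows to predict `accuracy` from the other attributes.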
Landmarking
A characterisation technique where the performance of simple, bare-bones learners on a dataset is used to characterise it.
In this paper we use seven landmarkers: Decision node, Worst node, Randomly chosen node, Naive Bayes, 1-Nearest Neighbour, Elite 1-Nearest Neighbour and Linear Discriminant.
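As an illustration, two of the seven landmarkers can be sketched in plain Python. This is a simplified, assumption-laden version: the paper's exact estimation procedure for each landmarker may differ.

```python
import numpy as np

def one_nn_landmarker(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the point itself
        correct += (y[np.argmin(d)] == y[i])
    return correct / len(X)

def decision_node_landmarker(X, y):
    """Accuracy of the best single-attribute, single-threshold split (a
    'decision node'), evaluated on the training data for simplicity."""
    best = np.mean(y == np.bincount(y).argmax())   # majority-class fallback
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:          # both sides stay non-empty
            left = X[:, j] <= t
            acc = (np.mean(y[left] == np.bincount(y[left]).argmax()) * left.mean()
                   + np.mean(y[~left] == np.bincount(y[~left]).argmax()) * (~left).mean())
            best = max(best, acc)
    return best

def landmark_features(X, y):
    """Characterise a dataset by the performance of bare-bones learners."""
    return {"one_nn": one_nn_landmarker(X, y),
            "decision_node": decision_node_landmarker(X, y)}
```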
Regression on accuracies
The quality of the estimate depends on its closeness to the actual accuracy achieved by the classifier, measured by the Mean Absolute Deviation (MAD) using 10-fold cross-validation.
MAD is defined as the sum of the absolute differences between real and predicted values divided by the number of test items.
dMAD, the MAD obtained by always predicting the mean error, is used as a baseline reference.
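These two definitions translate directly into code, assuming the true and predicted accuracies are given as plain arrays:

```python
import numpy as np

def mad(true, predicted):
    """Mean Absolute Deviation: mean |true - predicted| over test items."""
    true, predicted = np.asarray(true), np.asarray(predicted)
    return float(np.mean(np.abs(true - predicted)))

def dmad(true_train, true_test):
    """Default MAD: the error of always predicting the training mean."""
    return mad(true_test, np.full(len(true_test), np.mean(true_train)))
```

A characterisation strategy is useful only if its MAD beats dMAD.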
Regression methods and datasets
We used a kernel method and Cubist for regression.
65 datasets from UCI and METAL were used.
Classifier performance is the mean of the accuracies over the 10 cross-validation folds.
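A minimal sketch of that evaluation loop, with `fit_predict` standing in for any of the eight classifiers (the name and interface are hypothetical):

```python
import numpy as np

def ten_fold_accuracy(X, y, fit_predict, k=10, seed=0):
    """Mean accuracy over k cross-validation folds. `fit_predict` is any
    function (X_train, y_train, X_test) -> predicted labels."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    accs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)          # everything outside the fold
        preds = fit_predict(X[train], y[train], X[fold])
        accs.append(np.mean(preds == y[fold]))
    return float(np.mean(accs))
```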
Estimating with kernel
Classifier   dct    histo  land   dMAD
C5.0boost    0.112  0.123  0.050  0.134
C5.0rules    0.110  0.121  0.051  0.133
C5.0tree     0.110  0.123  0.054  0.137
lindisc      0.118  0.129  0.063  0.137
ltree        0.105  0.113  0.041  0.132
Near.Nei.    0.120  0.138  0.081  0.153
NaiBayes     0.121  0.143  0.064  0.146
ripper       0.113  0.128  0.056  0.145
Estimating with Cubist
Classifier   dct    histo  land   dMAD
C5.0boost    0.103  0.128  0.033  0.134
C5.0rules    0.121  0.126  0.036  0.133
C5.0tree     0.114  0.130  0.044  0.137
lindisc      0.118  0.140  0.054  0.137
ltree        0.114  0.121  0.032  0.132
Near.Nei.    0.150  0.149  0.067  0.153
NaiBayes     0.126  0.149  0.044  0.146
ripper       0.128  0.131  0.041  0.145
Using estimates to rank
Rankings are compared for similarity using Spearman's rank correlation.
Zooming cannot be applied to land since we should not use a classifier to rank itself (we use land- instead).
We compare the ranking estimates with the true ranking.
The default ranking is computed over all datasets (C5b,r,t,Lt,rip,nn,nb,lind downwards).
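For rankings without ties, Spearman's coefficient can be computed with the classic closed form; a small sketch over the eight classifier positions:

```python
import numpy as np

def spearman(rank_a, rank_b):
    """Spearman's rank correlation for two tie-free rankings:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    a, b = np.asarray(rank_a), np.asarray(rank_b)
    d = a - b
    n = len(a)
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))
```

Identical rankings give 1, a fully reversed ranking gives -1.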
Average Spearman's Correlation Coefficients with the True Ranking
Rankings  Kernel  Cubist  Zooming
Default   0.330   0.330   0.330
dct       0.435   0.083   0.341
histo     0.405   0.174   0.371
land      0.180   0.185   -
land-     -       0.090   0.190
Gaining insight about classifiers
Example of a land rule:
(34 cases, mean error 0.218)
IF Rand_Node <= 0.57
Elite_Node > 0.084
THEN mlcnb = 0.167
+ 0.239 Rand_Node
- 0.18 Worst_Node
+ 0.105 Elite_Node
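Read as code, the rule fires only when both conditions hold and then returns a linear estimate of the Naive Bayes error (mlcnb); a minimal sketch, where returning None stands in for falling through to another rule of the Cubist model:

```python
def mlcnb_rule(rand_node, worst_node, elite_node):
    """The land rule above, as a function over landmarker values."""
    if rand_node <= 0.57 and elite_node > 0.084:
        return 0.167 + 0.239 * rand_node - 0.18 * worst_node + 0.105 * elite_node
    return None  # another rule in the model would apply instead
```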
Conclusions
Regression can be used to estimate performances.
Meta-learning needs good dataset characterisation.
Landmarking is the best dataset characterisation strategy for performance estimation, but not the best one for ranking.
Future work includes further exploration of dataset characterisation strategies and of the results of combining them, as well as explaining the still puzzling result of landmarking in ranking.