Evaluation of classification performance on small, imbalanced datasets

Kay H. Brodersen 1,2, Cheng Soon Ong 1, Klaas E. Stephan 1,2,3, Joachim M. Buhmann 1

1 Department of Computer Science, ETH Zurich, Switzerland
2 Laboratory for Social and Neural Systems Research, University of Zurich, Switzerland
3 Wellcome Trust Centre for Neuroimaging, University College London, United Kingdom




1. The balanced accuracy


Is the accuracy a faithful performance measure?

[Figure: two example confusion matrices (rows: predicted +, predicted –; columns: actual +, actual –). Cell values are not recoverable.]


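Assessing classification performance

Setting

Observations $x$ with labels $y \in \{-1, +1\}$

Classification-based confusion matrix:

              actual +   actual –
predicted +   TP         FP
predicted –   FN         TN
total         P          N

$C := TP + TN$ (correct predictions), $I := FP + FN$ (incorrect predictions)

Performance assessment

Accuracy: $A = \frac{TP + TN}{n}$

Balanced accuracy: $B = \frac{1}{2} \left( \frac{TP}{TP + FN} + \frac{TN}{TN + FP} \right)$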
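As a minimal sketch of the two measures defined on this slide, the following Python snippet computes the accuracy and the balanced accuracy from the four cells of a confusion matrix. The function names and the counts are hypothetical and chosen only to illustrate how the two measures diverge under class imbalance.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all n = TP + FP + FN + TN observations classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def balanced_accuracy(tp, fp, fn, tn):
    """Mean of the accuracies on the positive and on the negative class."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Hypothetical counts: 10 actual positives, 90 actual negatives, and a
# classifier that predicts "negative" for every observation.
tp, fp, fn, tn = 0, 0, 10, 90
acc = accuracy(tp, fp, fn, tn)
bal = balanced_accuracy(tp, fp, fn, tn)
print(f"accuracy = {acc:.2f}, balanced accuracy = {bal:.2f}")
# prints: accuracy = 0.90, balanced accuracy = 0.50
```

In this illustration the accuracy looks high although the classifier has learned nothing beyond the class frequencies, whereas the balanced accuracy sits at chance level.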


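The posterior distribution of the accuracy

Assuming a flat prior on the interval [0,1], the posterior of the accuracy follows a Beta distribution:

$A \sim \mathrm{Beta}(a, b)$, with $a = C + 1$, $b = I + 1$

From this we can compute:

the mean: $\frac{C + 1}{C + I + 2}$

the mode: $\frac{C}{C + I}$

a posterior probability interval (with mass $1 - \alpha$): $\left[ F_B^{-1}\left(\frac{\alpha}{2};\, C+1, I+1\right),\ F_B^{-1}\left(1 - \frac{\alpha}{2};\, C+1, I+1\right) \right]$

with $p_A(x; C, I) = \frac{1}{B(C+1, I+1)}\, x^C (1-x)^I$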
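The summaries listed on this slide can be sketched in a few lines of Python. The snippet below is an illustration under stated assumptions, not the authors' implementation: it uses only the standard library, approximates the inverse Beta CDF by simple numerical integration (in practice a routine such as `scipy.stats.beta.ppf` would be the more usual choice), and the function names and counts are hypothetical.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at a point x strictly inside (0, 1)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)


def beta_quantile(q, a, b, steps=20000):
    """Approximate inverse CDF of Beta(a, b) by accumulating midpoint-rule mass."""
    h = 1.0 / steps
    mass = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        mass += beta_pdf(x, a, b) * h
        if mass >= q:
            return x
    return 1.0


def accuracy_posterior(correct, incorrect, alpha=0.05):
    """Mean, mode and central (1 - alpha) interval of Beta(C + 1, I + 1)."""
    a, b = correct + 1, incorrect + 1
    mean = a / (a + b)
    mode = correct / (correct + incorrect)  # requires at least one observation
    interval = (beta_quantile(alpha / 2, a, b), beta_quantile(1 - alpha / 2, a, b))
    return mean, mode, interval


# Hypothetical counts: 90 correct and 10 incorrect predictions.
mean, mode, (lower, upper) = accuracy_posterior(90, 10)
print(f"mean = {mean:.3f}, mode = {mode:.3f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
```

The posterior mean (C + 1)/(C + I + 2) is pulled slightly towards 0.5 relative to the mode C/(C + I), which is the effect of the flat prior.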


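The posterior distribution of the balanced accuracy

Assuming a flat prior on the interval [0,1], the posterior of the balanced accuracy is given by the convolution of two Beta distributions:

$B = \frac{1}{2}(A_P + A_N) \sim \mathrm{Beta}_{\mathrm{avg}}$

$p_B(x) = \int_0^1 p_A\big(2(x - z);\, TP+1, FN+1\big)\, p_A\big(2z;\, TN+1, FP+1\big)\, dz$

where $A_P$ and $A_N$ are the accuracies on the positive and the negative class, and $p_A(\cdot\,; a, b)$ here denotes the density of $\mathrm{Beta}(a, b)$.

Based on this density, we can compute:

the mean

the mode

a posterior probability interval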
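The convolution can be approximated on a grid. The sketch below is one possible way to do this, assuming independent posteriors Beta(TP + 1, FN + 1) and Beta(TN + 1, FP + 1) for the two class-wise accuracies. It writes the density of the average as 2 ∫ p_P(2x − z) p_N(z) dz and then normalises numerically, so constant factors from the change of variables do not matter. Function names, grid size, and counts are hypothetical.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b); treated as zero outside the open interval (0, 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)


def balanced_accuracy_posterior(tp, fp, fn, tn, steps=200):
    """Grid approximation of the density of B = (A_P + A_N) / 2."""
    h = 1.0 / steps
    grid = [(i + 0.5) * h for i in range(steps)]
    # Posterior density of the accuracy on the negative class, on the grid.
    p_neg = [beta_pdf(z, tn + 1, fp + 1) for z in grid]
    density = []
    for x in grid:
        # Density of the sum A_P + A_N at 2x, times 2 for the rescaling to B.
        conv = sum(beta_pdf(2 * x - z, tp + 1, fn + 1) * pz
                   for z, pz in zip(grid, p_neg)) * h
        density.append(2 * conv)
    total = sum(density) * h
    return grid, [d / total for d in density]


# Hypothetical imbalanced example: 10 actual positives, 90 actual negatives.
grid, density = balanced_accuracy_posterior(tp=2, fp=2, fn=8, tn=88)
h = 1.0 / len(grid)
post_mean = sum(x * d for x, d in zip(grid, density)) * h
post_mode = grid[max(range(len(grid)), key=density.__getitem__)]
print(f"posterior mean = {post_mean:.3f}, posterior mode = {post_mode:.3f}")
```

A posterior probability interval follows from the same grid by accumulating `density * h` until the desired tail masses are reached.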


Two examples

Example 1: fair overall accuracy, high class imbalance, strong prediction bias

Example 2: high accuracies on both classes, no imbalance, no bias

[Figure: for each example, a confusion matrix (predicted +/– by actual +/–) and a plot comparing the average accuracy ± 2 std. errors, the mean accuracy and 95% mass, and the mean bal. acc. and 95% mass against chance. Numerical values are not recoverable.]


Posterior densities

[Figure: posterior accuracy and posterior balanced accuracy for a confusion matrix (predicted +/– by actual +/–), marking the mean, median, mode, 95% post. prob. int., average bal. acc., and chance. Numerical values are not recoverable.]


2. Smooth precision-recall curves


Decision values


Decision values and the binormal assumption

[Figure: decision values of negative examples and decision values of positive examples.]
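Under the binormal assumption, the decision values of each class are modelled by a normal distribution, which yields a smooth ROC curve. The following sketch fits one normal per class with the standard library and evaluates the resulting curve. The decision values, the threshold grid, and the function name are hypothetical, and the plain sample mean and standard deviation are only one possible fitting choice.

```python
import math
from statistics import NormalDist

# Hypothetical decision values, e.g. signed distances from a decision boundary.
neg_values = [-1.2, -0.8, -0.5, -0.1, 0.3, -1.0, -0.4, 0.1]
pos_values = [0.4, 0.9, 1.5, 0.2]

# Fit one normal distribution per class (sample mean and standard deviation).
neg = NormalDist.from_samples(neg_values)
pos = NormalDist.from_samples(pos_values)


def binormal_roc(neg, pos, thresholds):
    """(FPR, TPR) pairs: probability that a decision value exceeds each threshold."""
    return [(1 - neg.cdf(t), 1 - pos.cdf(t)) for t in thresholds]


thresholds = [-10 + 0.5 * i for i in range(41)]  # from -10 to 10
roc = binormal_roc(neg, pos, thresholds)

# For two independent normals, P(positive value > negative value) has a closed form.
auc = NormalDist().cdf((pos.mean - neg.mean) / math.hypot(pos.stdev, neg.stdev))
print(f"binormal AUC = {auc:.3f}")
```

Because the curve is a function of four fitted parameters only, it stays smooth even when very few positive examples are available.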


Empirical and parametric curves

[Figure: ROC curve and PR curve. Legend: empirical, true. Axis labels: TPR (recall), FPR (1 – specificity).]
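For comparison, an empirical curve is obtained by sweeping a threshold over the observed decision values and counting. The sketch below does this for both the ROC and the PR curve. The data and the function name are hypothetical, and ties are handled in the simplest way (a value equal to the threshold counts as predicted positive).

```python
def empirical_curves(neg_values, pos_values):
    """Empirical (FPR, TPR) and (recall, precision) points, one per threshold."""
    thresholds = sorted(set(neg_values) | set(pos_values), reverse=True)
    roc, pr = [], []
    for t in thresholds:
        tp = sum(v >= t for v in pos_values)  # positives predicted positive
        fp = sum(v >= t for v in neg_values)  # negatives predicted positive
        roc.append((fp / len(neg_values), tp / len(pos_values)))
        pr.append((tp / len(pos_values), tp / (tp + fp)))
    return roc, pr


# Hypothetical small, imbalanced sample: 5 negatives, 2 positives.
neg_values = [-1.2, -0.8, -0.5, -0.1, 0.3]
pos_values = [0.2, 0.9]
roc, pr = empirical_curves(neg_values, pos_values)
for (fpr, tpr), (recall, precision) in zip(roc, pr):
    print(f"FPR={fpr:.2f} TPR={tpr:.2f} | recall={recall:.2f} precision={precision:.2f}")
```

With only two positive examples, recall can take just the values 0.5 and 1.0, and precision jumps from 1.0 to 0.5 as soon as a single negative example crosses the threshold. This illustrates how coarse empirical curves become on small samples.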



Empirical and parametric curves

[Figure: ROC curve and PR curve. Legend: empirical, binormal, α-binormal, true. Axis labels: TPR (recall), FPR (1 – specificity).]
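Unlike the ROC curve, a PR curve depends on the class proportions, because precision mixes true and false positives in absolute terms. The sketch below derives a smooth PR curve from two class-conditional normals and an explicit fraction of positive examples, using Bayes' rule for the precision. The distributions, the parameter name `pos_fraction`, and the threshold grid are hypothetical, and whether this corresponds exactly to the parametric curves in the figure legend is an assumption.

```python
from statistics import NormalDist


def binormal_pr(neg, pos, pos_fraction, thresholds):
    """(recall, precision) pairs under two class-conditional normal distributions."""
    points = []
    for t in thresholds:
        tpr = 1 - pos.cdf(t)  # recall at threshold t
        fpr = 1 - neg.cdf(t)
        predicted_pos = pos_fraction * tpr + (1 - pos_fraction) * fpr
        if predicted_pos > 0:  # precision is undefined if nothing is predicted positive
            points.append((tpr, pos_fraction * tpr / predicted_pos))
    return points


# Hypothetical class-conditional distributions of the decision values.
neg = NormalDist(0.0, 1.0)
pos = NormalDist(1.5, 1.0)
thresholds = [-5 + 0.1 * i for i in range(101)]  # from -5 to 5

pr_balanced = binormal_pr(neg, pos, 0.5, thresholds)
pr_imbalanced = binormal_pr(neg, pos, 0.1, thresholds)
print(f"precision at recall close to 1: balanced = {pr_balanced[0][1]:.2f}, "
      f"imbalanced = {pr_imbalanced[0][1]:.2f}")
```

The recall values are identical in both settings, but the precision is lower at every threshold when positive examples are rare, which is why the class imbalance has to enter the parametric PR curve explicitly.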


The effect of class imbalance on the PR curve

[Figure: AP RMSE plotted against the fraction of positive examples.]


The effect of class imbalance on the PR curve

[Figure: estimated minus true average precision (AP). Legend: empirical, binormal, α-binormal.]


Take-home messages

Don’ts

report the average and the standard error of the accuracy across cross-validation folds

look at empirical ROC or PR curves

Do’s

report a statistic of the posterior distribution of the balanced accuracy

compute a smooth ROC or PR curve under parametric assumptions

K.H. Brodersen, C.S. Ong, K.E. Stephan, J.M. Buhmann (2010) The balanced accuracy and its posterior distribution. Proceedings of the 20th International Conference on Pattern Recognition (in press).

K.H. Brodersen, C.S. Ong, K.E. Stephan, J.M. Buhmann (2010) The binormal assumption on precision-recall curves. Proceedings of the 20th International Conference on Pattern Recognition (in press).