Analysis of Classification-based Error Functions
Mike Rimer, Dr. Tony Martinez
BYU Computer Science Dept.
18 March 2006


Page 1: Analysis of Classification-based Error Functions

Analysis of Classification-based Error Functions

Mike Rimer

Dr. Tony Martinez

BYU Computer Science Dept.

18 March 2006

Page 2: Analysis of Classification-based Error Functions

Overview

• Machine learning
• Teaching artificial neural networks with an error function
• Problems with conventional error functions
• CB algorithms
• Experimental results
• Conclusion and future work

Page 3: Analysis of Classification-based Error Functions

Machine Learning

• Goal: automating the learning of problem domains
• Given a training sample from a problem domain, induce a correct solution-hypothesis over the entire problem population
• The learning model is often used as a black box

[Figure: black box f(x) mapping input to output]

Page 4: Analysis of Classification-based Error Functions

Teaching ANNs with an Error Function

• An error function guides the gradient descent procedure that trains a multi-layer perceptron (MLP) toward an optimal state
• Conventional error metrics are sum-squared error (SSE) and cross entropy (CE)
• SSE is suited to function approximation
• CE is aimed at classification problems
• CB error functions [Rimer & Martinez 06] work better for classification
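As a minimal illustration of the training loop an error function drives, the sketch below runs gradient descent on a toy one-dimensional quadratic error (not the MLP setting of these slides; the error function and names are illustrative):

```python
def gradient_descent_step(w, grad_error, lr=0.1):
    """One gradient-descent update: move the weight against the error gradient."""
    return w - lr * grad_error(w)

# Toy error E(w) = (w - 3)^2 with gradient E'(w) = 2(w - 3); optimum at w* = 3.
w = 0.0
for _ in range(100):
    w = gradient_descent_step(w, lambda v: 2 * (v - 3.0))
# w has converged very close to the optimum w* = 3
```

The same loop applies to an MLP, with `w` a weight vector and `grad_error` computed by backpropagation of whichever error function (SSE, CE, CB) is in use.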

Page 5: Analysis of Classification-based Error Functions

SSE, CE

Both attempt to approximate 0–1 targets in order to represent making a decision

[Figure: network outputs O1 and O2 on a 0–1 scale against target value OT, showing ERROR 1 and ERROR 2 for a pattern labeled as class 2]
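Concretely, for a two-output network, SSE and CE against hard 0–1 targets can be sketched as follows (the output values here are made up for illustration):

```python
import numpy as np

def sse(outputs, targets):
    """Sum-squared error against hard 0-1 targets."""
    return float(np.sum((outputs - targets) ** 2))

def cross_entropy(outputs, targets, eps=1e-12):
    """Cross-entropy error against hard 0-1 targets."""
    o = np.clip(outputs, eps, 1 - eps)  # guard against log(0)
    return float(-np.sum(targets * np.log(o) + (1 - targets) * np.log(1 - o)))

# Pattern labeled as class 2: target vector (0, 1).
targets = np.array([0.0, 1.0])
outputs = np.array([0.3, 0.6])  # hypothetical MLP outputs o1, o2
print(sse(outputs, targets))            # 0.25
print(cross_entropy(outputs, targets))  # ~0.867
```

Note that both penalize the pattern even though it is already classified correctly (o2 > o1), which is exactly the behavior the CB functions remove.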

Page 6: Analysis of Classification-based Error Functions

Issues with approximating hard targets

• Requires weights to be large to achieve optimality
• Leads to premature weight saturation
• Weight decay, etc., can improve the situation
• Learns areas of the problem space unevenly and at different times during training
• Makes global learning problematic

Page 7: Analysis of Classification-based Error Functions

Classification-based Error Functions

Designed to more closely match the goal of learning a classification task (i.e. correct classifications, not low error on 0–1 targets), avoiding premature weight saturation and discouraging overfitting

• CB1 [Rimer & Martinez 02, 06]
• CB2 [Rimer & Martinez 04]
• CB3 (submitted to ICML '06)

Page 8: Analysis of Classification-based Error Functions

CB1

Only backpropagates error on misclassified training patterns

$$
\varepsilon_k = \begin{cases}
\tilde{o}_{\max} - o_k & \text{if } c(k) = T \text{ and } o_k \le \tilde{o}_{\max} \\
o_T - o_k & \text{if } c(k) \ne T \text{ and } o_k \ge o_T \\
0 & \text{otherwise}
\end{cases}
$$

where $o_T$ is the output of the target class $T$ and $\tilde{o}_{\max} = \max_{c(k) \ne T} o_k$ is the highest competing output.

[Figure: output scale 0–1 for a correctly classified pattern (no error) and a misclassified pattern (ERROR between the ~T and T outputs)]
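A sketch of this rule in code (my reading of the CB1 criterion described above; variable names are illustrative, not the paper's):

```python
import numpy as np

def cb1_error(outputs, target_idx):
    """CB1 sketch: error is nonzero only when the target output fails to
    beat the highest competing output, i.e. the pattern is misclassified."""
    err = np.zeros_like(outputs)
    o_t = outputs[target_idx]
    o_max = np.delete(outputs, target_idx).max()  # top competing output
    if o_t <= o_max:                              # misclassified (or tied)
        err[target_idx] = o_max - o_t             # push the target output up
        for k, o_k in enumerate(outputs):
            if k != target_idx and o_k >= o_t:
                err[k] = o_t - o_k                # push high competitors down
    return err

print(cb1_error(np.array([0.7, 0.4]), 1))  # misclassified: nonzero error
print(cb1_error(np.array([0.3, 0.8]), 1))  # correctly classified: all zeros
```

Correctly classified patterns contribute no error at all, so weights are never pushed toward saturated 0/1 outputs.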

Page 9: Analysis of Classification-based Error Functions

CB2

Adds a confidence margin, μ, that is increased globally as training progresses

$$
\varepsilon_k = \begin{cases}
(\tilde{o}_{\max} + \mu) - o_k & \text{if } c(k) = T \text{ and } o_k \le \tilde{o}_{\max} + \mu \\
(o_T - \mu) - o_k & \text{if } c(k) \ne T \text{ and } o_k \ge o_T - \mu \\
0 & \text{otherwise}
\end{cases}
$$

[Figure: output scale 0–1 for three cases: misclassified (error), correct but not satisfying the margin μ (error), and correct and satisfying the margin (no error)]
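The margin variant can be sketched the same way (again my reading of the rule, with illustrative names; a correct pattern still receives error until its target output clears the top competitor by μ):

```python
import numpy as np

def cb2_error(outputs, target_idx, mu):
    """CB2 sketch: CB1 plus a global margin mu, grown during training,
    that the target output must clear above the top competitor."""
    err = np.zeros_like(outputs)
    o_t = outputs[target_idx]
    o_max = np.delete(outputs, target_idx).max()
    if o_t <= o_max + mu:                   # margin not yet satisfied
        err[target_idx] = (o_max + mu) - o_t
        for k, o_k in enumerate(outputs):
            if k != target_idx and o_k >= o_t - mu:
                err[k] = (o_t - mu) - o_k
    return err

# Correct and satisfies a margin of 0.1: no error.
print(cb2_error(np.array([0.5, 0.7]), 1, mu=0.1))
# Correct but does not satisfy a margin of 0.3: small corrective error.
print(cb2_error(np.array([0.5, 0.7]), 1, mu=0.3))
```

With μ = 0 this reduces to CB1; growing μ gradually demands more separation as training progresses.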

Page 10: Analysis of Classification-based Error Functions

CB3

• Learns a confidence Ci for each training pattern i as training progresses
• Patterns often misclassified have low confidence
• Patterns consistently classified correctly gain confidence

[Figure: error on the 0–1 output scale for a misclassified pattern, a correct pattern with learned low confidence Ci, and a correct pattern with learned high confidence Ci]
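The per-pattern confidence dynamic can be sketched as a simple running update (an illustrative rule of my own, not the exact CB3 update from the paper):

```python
def update_confidence(conf, correct, rate=0.05):
    """Illustrative CB3-style confidence update: a pattern's confidence
    drifts up while it is classified correctly and down when it is not."""
    if correct:
        return min(1.0, conf + rate)
    return max(0.0, conf - rate)

c = 0.5
for outcome in [True, True, True, False]:  # three correct epochs, one miss
    c = update_confidence(c, outcome)
# c ends at 0.60: net gain of two correct outcomes over the start
```

The learned Ci then scales the per-pattern margin, so confidently learned patterns are held to a stricter standard than chronically misclassified ones.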

Page 11: Analysis of Classification-based Error Functions

Neural Network Training

Influenced by:
• Initial parameter (weight) settings
• Pattern presentation order (stochastic training)
• Learning rate
• # of hidden nodes

Goal of training:
• High generalization
• Low bias and variance

Page 12: Analysis of Classification-based Error Functions

Experiments

• Empirical comparison of six error functions: SSE, CE, CE w/ WD, CB1, CB2, CB3
• Eleven benchmark problems from the UC Irvine Machine Learning Repository: ann, balance, bcw, derm, ecoli, iono, iris, musk2, pima, sonar, wine
• Testing performed using stratified 10-fold cross-validation
• Model selection by hold-out set
• Results averaged over ten tests
• LR = 0.1, M = 0.7
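Stratified folds preserve each class's proportion in every fold; a minimal sketch of the fold assignment (round-robin within each class; loading the UCI datasets themselves is omitted):

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Deal the indices of each class round-robin into k folds so that
    class proportions are (approximately) preserved in every fold."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return [sorted(f) for f in folds]

labels = [0] * 6 + [1] * 4        # toy dataset: 60% class 0, 40% class 1
print(stratified_folds(labels, k=2))  # each fold keeps the 3:2 class ratio
```

Each fold serves once as the test set while the remaining nine folds train the network.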

Page 13: Analysis of Classification-based Error Functions

Classifier output difference (COD)

Evaluation of behavioral difference of two hypotheses (e.g. classifiers)

$$
\hat{D}(H_1, H_2) = \frac{\sum_{x \in T} I\big(H_1(x) \ne H_2(x)\big)}{|T|}
$$

where T is the test set and I is the indicator (characteristic) function.
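In code, the COD of two classifiers reduces to their disagreement rate over the test set:

```python
def cod(h1_preds, h2_preds):
    """Classifier output difference: fraction of test patterns on which
    two hypotheses H1 and H2 predict different classes."""
    assert len(h1_preds) == len(h2_preds) and h1_preds
    disagreements = sum(a != b for a, b in zip(h1_preds, h2_preds))
    return disagreements / len(h1_preds)

print(cod([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.25: disagree on 1 of 4 patterns
```

A low average COD across runs means an error function produces behaviorally similar classifiers regardless of initialization, i.e. it is robust.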

Page 14: Analysis of Classification-based Error Functions

Robustness to initial network weights

Averaged 30 random runs over all datasets

Algorithm | Test acc (%) | St Dev | Epoch
CB3 | 93.468 | 4.7792 | 200.67
CB2 | 92.839 | 4.0800 | 366.69
CB1 | 92.828 | 5.3290 | 514.14
CE | 92.789 | 5.3937 | 319.57
CE w/ WD | 92.251 | 5.4735 | 197.24
SSE | 91.951 | 5.6131 | 774.70

Page 15: Analysis of Classification-based Error Functions

Robustness to initial network weights

Averaged over all tests:

Algorithm | Test error | COD
CB3 | 0.0653 | 0.0221
CB2 | 0.0716 | 0.0274
CB1 | 0.0717 | 0.0244
CE | 0.0721 | 0.0248
CE w/ WD | 0.0774 | 0.0255
SSE | 0.0804 | 0.0368

[Figure: bar chart of COD (scale 0.020–0.038) for CB1, CB2, CB3, CE, CE w/ WD, SSE]

Page 16: Analysis of Classification-based Error Functions

Robustness to pattern presentation order

Averaged 30 random runs over all datasets

Algorithm | Test acc (%) | St Dev | Epoch
CB3 | 93.446 | 5.0409 | 200.46
CB2 | 92.641 | 5.4197 | 402.52
CB1 | 92.542 | 5.473 | 560.09
CE | 92.290 | 5.6020 | 329.65
CE w/ WD | 91.818 | 5.6278 | 221.21
SSE | 91.817 | 5.6653 | 593.30

Page 17: Analysis of Classification-based Error Functions

Robustness to pattern presentation order

Averaged over all tests:

Algorithm | Test error | COD
CB3 | 0.0655 | 0.0259
CB2 | 0.0736 | 0.0302
CB1 | 0.0746 | 0.0282
CE | 0.0771 | 0.0329
CE w/ WD | 0.0818 | 0.0338
SSE | 0.0818 | 0.0344

[Figure: bar chart of COD (scale 0.020–0.038) for CB1, CB2, CB3, CE, CE w/ WD, SSE]

Page 18: Analysis of Classification-based Error Functions

Robustness to learning rate

Average of varying the learning rate from 0.01 – 0.3

Algorithm | Test acc (%) | St Dev | Epoch
CB3 | 93.175 | 3.514 | 334.8
CB2 | 92.285 | 3.437 | 617.8
SSE | 92.211 | 3.449 | 525.7
CB1 | 91.908 | 3.880 | 505.4
CE | 91.629 | 3.813 | 466.2
CE w/ WD | 91.330 | 3.845 | 234.6

Page 19: Analysis of Classification-based Error Functions

Robustness to learning rate

[Figure: test accuracy (90–94%) vs. learning rate (0.01–0.3) for CB1, CB2, CB3, CE, CE w/ WD, SSE]

Page 20: Analysis of Classification-based Error Functions

Robustness to number of hidden nodes

Average of varying the number of nodes in the hidden layer from 1 - 30

Algorithm | Test acc (%) | St Dev | Epoch
CB3 | 93.026 | 3.397 | 303.9
CB1 | 92.291 | 3.610 | 381.0
CB2 | 92.136 | 3.410 | 609.4
SSE | 92.066 | 3.402 | 623.1
CE | 91.956 | 3.563 | 397.0
CE w/ WD | 91.74 | 3.493 | 190.6

Page 21: Analysis of Classification-based Error Functions

Robustness to number of hidden nodes

[Figure: test accuracy (90–94%) vs. number of hidden nodes (1–30) for CB1, CB2, CB3, CE, CE w/ WD, SSE]

Page 22: Analysis of Classification-based Error Functions

Conclusion

CB1–CB3 are generally more robust than SSE, CE, and CE w/ WD with respect to:
• Initial weight settings
• Pattern presentation order
• Pattern variance
• Learning rate
• # of hidden nodes

CB3 is the strongest overall: the most robust, with the most consistent results

Page 23: Analysis of Classification-based Error Functions

Questions?

Page 24: Analysis of Classification-based Error Functions

[Figure: SSE — histogram over output bins (0–9.9+) by training epoch (300, 600, 900), percentage of patterns on a 0–100 scale]

Page 25: Analysis of Classification-based Error Functions

[Figure: Cross-entropy — histogram over output bins (0–9.9+) by training epoch (300, 600, 900), percentage of patterns on a 0–100 scale]

Page 26: Analysis of Classification-based Error Functions

[Figure: Cross-entropy w/ weight decay — histogram over output bins (0–9.9+) by training epoch (300, 600, 900), percentage of patterns on a 0–100 scale]

Page 27: Analysis of Classification-based Error Functions

[Figure: CB1 — histogram over output bins (0–9.5) by training epoch (300, 600, 900), percentage of patterns on a 0–100 scale]

Page 28: Analysis of Classification-based Error Functions

[Figure: CB2 — histogram over output bins (0–9.5) by training epoch (300, 600, 900), percentage of patterns on a 0–100 scale]

Page 29: Analysis of Classification-based Error Functions

[Figure: CB3 — histogram over output bins (0–9.5) by training epoch (300, 600, 900), percentage of patterns on a 0–100 scale]