
Using Error-Correcting Codes For Text Classification


Page 1: Using Error-Correcting Codes For Text Classification

Using Error-Correcting Codes For Text Classification

Rayid [email protected]

Center for Automated Learning & Discovery, Carnegie Mellon University

This presentation can be accessed at http://www.cs.cmu.edu/~rayid/icmltalk

Page 2: Using Error-Correcting Codes For Text Classification

Outline
- Review of ECOC
- Previous Work
- Types of Codes
- Experimental Results
- Semi-Theoretical Model
- Drawbacks
- Conclusions & Work in Progress

Page 3: Using Error-Correcting Codes For Text Classification

Overview of ECOC
- Decompose a multiclass problem into multiple binary problems
- The conversion can be independent of, or dependent on, the data (it does depend on the number of classes)
- Any learner that can learn binary functions can then be used to learn the original multivalued function

Page 4: Using Error-Correcting Codes For Text Classification

ECOC Picture

[Diagram: a three-class example. Classes A, B, and C are each assigned a 4-bit codeword (the codewords 1001, 0100, and 0111 appear in the figure); each bit position corresponds to one binary function f1 through f4, i.e. one binary classification problem.]

Page 5: Using Error-Correcting Codes For Text Classification

Training ECOC
- Given m distinct classes
- Create an m x n binary matrix M
- Each class is assigned ONE row of M
- Each column of the matrix divides the classes into TWO groups
- Train the base classifier to learn the n binary problems (a minimal training sketch follows below)
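A minimal training sketch of the procedure above (not from the slides; it assumes NumPy arrays and uses scikit-learn's MultinomialNB as the base binary learner, matching the Naive Bayes setup used later in the talk):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB  # assumed base binary learner


def train_ecoc(X, y, code_matrix, make_learner=MultinomialNB):
    """Train one binary classifier per column of the m x n code matrix.

    X           : (num_docs, num_features) feature matrix (e.g. word counts)
    y           : (num_docs,) integer class labels in 0..m-1
    code_matrix : (m, n) 0/1 array; row i is the codeword assigned to class i
    """
    classifiers = []
    for j in range(code_matrix.shape[1]):
        # Column j relabels every document with the j-th bit of its class
        # codeword, i.e. it splits the m classes into two groups.
        binary_labels = code_matrix[y, j]
        clf = make_learner()
        clf.fit(X, binary_labels)
        classifiers.append(clf)
    return classifiers
```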

Page 6: Using Error-Correcting Codes For Text Classification

Testing ECOC
To test a new instance:
- Apply each of the n classifiers to the new instance
- Combine the predictions to obtain a binary string (codeword) for the new point
- Classify to the class with the nearest codeword (usually Hamming distance is used as the distance measure); see the decoding sketch below
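A matching decoding sketch (same assumptions as the training sketch above): each classifier contributes one bit, and the document is assigned to the class whose codeword is nearest in Hamming distance.

```python
import numpy as np


def predict_ecoc(X, classifiers, code_matrix):
    """Predict classes by nearest-codeword (Hamming-distance) decoding."""
    # Stack the n per-bit predictions into a (num_docs, n) matrix of predicted codewords
    bits = np.column_stack([clf.predict(X) for clf in classifiers])
    # Hamming distance from every predicted codeword to every class codeword
    distances = (bits[:, None, :] != code_matrix[None, :, :]).sum(axis=2)
    # Pick the class whose codeword is nearest
    return distances.argmin(axis=1)
```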

Page 7: Using Error-Correcting Codes For Text Classification

Previous Work
- Combine with Boosting: ADABOOST.OC (Schapire, 1997)

Page 8: Using Error-Correcting Codes For Text Classification

Types of Codes
- Random
- Algebraic
- Constructed / Meaningful

Page 9: Using Error-Correcting Codes For Text Classification

Experimental Setup
- Generate the code (see the sketch below)
- Choose a base learner
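One way to realize the "generate the code" step for random codes, sketched under the assumption that codewords are drawn uniformly and re-drawn until the rows are sufficiently separated (the min_row_distance threshold is illustrative, not from the slides):

```python
import numpy as np


def random_code(num_classes, num_bits, min_row_distance=1, seed=0):
    """Draw a random 0/1 code matrix whose rows (class codewords) are at least
    min_row_distance apart in Hamming distance.  Note: too strict a threshold
    can make this rejection loop run for a long time."""
    rng = np.random.default_rng(seed)
    while True:
        M = rng.integers(0, 2, size=(num_classes, num_bits))
        d = (M[:, None, :] != M[None, :, :]).sum(axis=2)  # pairwise Hamming distances
        np.fill_diagonal(d, num_bits)                      # ignore self-distances
        if d.min() >= min_row_distance:
            return M
```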

Page 10: Using Error-Correcting Codes For Text Classification

Dataset
- Industry Sector dataset: company web pages classified into 105 economic sectors
- Standard stoplist
- No stemming
- Skip all MIME and HTML headers (a preprocessing sketch follows below)
- Experimental approach similar to McCallum et al. (1997) for comparison purposes
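A rough sketch of this preprocessing, under stated assumptions: raw_pages is a hypothetical list of raw page strings, scikit-learn's CountVectorizer stands in for the feature pipeline, and its frequency-based max_features cutoff is only an approximation of the Information Gain selection used in the experiments.

```python
import re
from sklearn.feature_extraction.text import CountVectorizer


def strip_headers(page: str) -> str:
    """Drop MIME headers (everything before the first blank line) and HTML tags."""
    body = page.split("\n\n", 1)[-1]
    return re.sub(r"<[^>]+>", " ", body)


# Standard English stoplist, no stemming, vocabulary capped at 10,000 terms
vectorizer = CountVectorizer(stop_words="english", max_features=10000)
# features = vectorizer.fit_transform(strip_headers(p) for p in raw_pages)
```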

Page 11: Using Error-Correcting Codes For Text Classification

Results

Classification Accuracies on five random 50-50 train-test splits of the Industry Sector dataset with a vocabulary size of 10000.

ECOC - 88% accurate!

Comparison with NBC

[Bar chart: classification accuracy (%) on Trials 1 through 5 for the Naive Bayes Classifier vs. the 63-bit ECOC.]

Page 12: Using Error-Correcting Codes For Text Classification

How does the length of the code matter?

Classifier      Naive Bayes   15-bit ECOC   31-bit ECOC   63-bit ECOC
Accuracy (%)    65.3          77.4          83.6          88.1

Table 2: Average Classification Accuracy on 5 random 50-50 train-test splits of the Industry Sector dataset with a vocabulary size of 10000 words selected using Information Gain.

Longer codes mean larger codeword separation

The minimum Hamming distance of a code C is the smallest distance between any pair of distinct codewords in C

If the minimum Hamming distance is h, then the code can correct ⌊(h - 1)/2⌋ errors
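A small sketch for checking this property of a given code matrix (for instance, the 63-bit code in the table on the next slide has Hmin = 31, so it corrects 15 errors):

```python
import numpy as np


def min_hamming_distance(code_matrix):
    """Smallest Hamming distance between any pair of distinct codewords."""
    d = (code_matrix[:, None, :] != code_matrix[None, :, :]).sum(axis=2)
    np.fill_diagonal(d, code_matrix.shape[1])  # exclude a codeword's distance to itself
    return int(d.min())


def correctable_errors(code_matrix):
    """A code with minimum distance h corrects floor((h - 1) / 2) bit errors."""
    return (min_hamming_distance(code_matrix) - 1) // 2
```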

Page 13: Using Error-Correcting Codes For Text Classification

Theoretical Evidence
- Model ECOC by a binomial distribution B(n, p)
  - n = length of the code
  - p = probability of each bit being classified correctly

No. of Bits   Training Size   Hmin   Emin   p (average)   Theoretical (%)   Experimental (%)
15            20              5      2      0.846         58.68             64.54
15            50              5      2      0.895         79.64             77.37
15            80              5      2      0.907         84.23             79.42
31            20              11     5      0.847         66.53             71.76
31            50              11     5      0.899         91.34             83.57
31            80              11     5      0.908         93.97             84.76
63            50              31     15     0.897         99.95             88.12
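Making the model explicit (an interpretation consistent with the table: bit errors assumed independent, p taken as the per-bit accuracy), a test point is decoded correctly whenever at most $E_{\min} = \lfloor (H_{\min}-1)/2 \rfloor$ bits are wrong, so

$$P(\text{correct}) \;=\; \sum_{i=0}^{E_{\min}} \binom{n}{i}\,(1-p)^{i}\,p^{\,n-i}$$

For the 63-bit row above ($n = 63$, $E_{\min} = 15$, $p = 0.897$) this sum is about 0.9995, i.e. the 99.95% theoretical accuracy listed in the table.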

Page 14: Using Error-Correcting Codes For Text Classification

Theoretical vs. Experimental Accuracy (vocabulary size = 10000)

[Chart: accuracy (%) against length of code (15, 31, and 63 bits), comparing theoretical and experimental values.]

Page 15: Using Error-Correcting Codes For Text Classification

Size Matters?

Variation of accuracy with code length and training size

[Chart: accuracy (%) against training size per class (0 to 100), for SBC and 15-bit, 31-bit, and 63-bit ECOC.]

Page 16: Using Error-Correcting Codes For Text Classification

Size does NOT matter!

Percent Decrease in Error with Training size and length of code

[Chart: % decrease in error against training size (0 to 100), for 15-bit, 31-bit, and 63-bit codes.]

Page 17: Using Error-Correcting Codes For Text Classification

Choosing Codes

Page 18: Using Error-Correcting Codes For Text Classification

Interesting Observations
- NBC does not give good probability estimates; using ECOC results in better estimates

Page 19: Using Error-Correcting Codes For Text Classification

Drawbacks
- Can be computationally expensive
- Random codes throw away the real-world nature of the data by picking random partitions to create artificial binary problems

Page 20: Using Error-Correcting Codes For Text Classification

Conclusion
- Improves classification accuracy considerably!
- Extends a binary learner to a multiclass learner
- Can be used when training data is sparse

Page 21: Using Error-Correcting Codes For Text Classification

Future Work
- Use meaningful codes (hierarchy or distinguishing between particularly difficult classes)
- Use artificial datasets
- Combine ECOC with Co-Training or Shrinkage Methods
- Sufficient and necessary conditions for optimal behavior