A Robust Bagging Method using Median as a Combination Rule

Zaman Md. Faisal and Hideo Hirose
Department of Information Design and Informatics
Kyushu Institute of Technology
Fukuoka, Japan



Contents of the Study

1. Bagging Algorithm
2. Comparative view of the Bagging and Bragging procedures
3. Nice Bagging Algorithm
4. Robust Bagging (Robag) Procedure
5. Classifiers used in the study
6. Datasets used in the study
7. Relative Improvement Measure
8. Results based on different classifiers
9. Conclusion
10. References


Objectives and Features of the Study


Objectives of the study:
1) Propose a robust bagging algorithm that
   a) performs comparatively well with linear classifiers (FLD, NMC), and
   b) overcomes the overfitting problem.

Features of the study:
1) A new bagging algorithm named Robust Bagging (Robag).
2) A Relative Improvement Measure (RIM) to quantify the relative improvement of bagging algorithms over the base classifiers.
3) A comparison of four bagging algorithms, i.e., bagging [1], bragging [2], nice bagging [3], and Robag, using the RIM.


Standard Bagging Algorithm


Bagging was proposed in 1994 by Leo Breiman [1].

The Bagging Algorithm:
1. Create B bootstrap replicates of the dataset.
2. Fit a model to each of the replicates.
3. Average (or vote over) the predictions of the B models.

In each bootstrap sample, on average about 63% of the observations are drawn, and the remaining 37% are left out (these are called the out-of-bag samples). To exploit the variation across bootstrap replicates, the base classifier should be unstable. Examples of unstable classifiers are decision trees, neural networks, MARS, etc.
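As an illustration, here is a minimal Python sketch of this procedure with a decision tree as the unstable base classifier (binary 0/1 labels assumed; the function name and defaults are ours, not from the slides):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_tree_predict(X_train, y_train, X_test, B=50, seed=0):
    """Standard bagging: B bootstrap replicates, combined by majority vote."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = np.zeros((B, len(X_test)), dtype=int)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # bootstrap replicate (with replacement)
        # ~63% of the distinct observations are drawn; the rest are out-of-bag.
        model = DecisionTreeClassifier(random_state=b)
        model.fit(X_train[idx], y_train[idx])
        votes[b] = model.predict(X_test)
    return (votes.mean(axis=0) > 0.5).astype(int)   # majority vote over B models
```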


Bragging Algorithm


Bragging (Bootstrap Robust Aggregating) was proposed by P. Bühlmann in 2003 [2]. In bragging:

1) A robust location estimator, the median, is used instead of the mean to combine the multiple classifiers.
2) It was proposed to improve the performance of MARS (Multivariate Adaptive Regression Splines).
3) In the case of CART, it performs quite similarly to bagging.
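A minimal sketch of the median combination rule, with a regression tree standing in for MARS (the helper name and defaults are ours):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bragging_predict(X_train, y_train, X_test, B=50, seed=0):
    """Bragging: aggregate the B bootstrap predictions with the median."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = np.empty((B, len(X_test)))
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # bootstrap replicate
        model = DecisionTreeRegressor(random_state=b)
        preds[b] = model.fit(X_train[idx], y_train[idx]).predict(X_test)
    # The median is a robust location estimator: extreme predictions
    # from unlucky replicates are filtered out automatically.
    return np.median(preds, axis=0)
```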


Bagging and Bragging Algorithms (Comparative View)

[Figure: Side-by-side view of the two procedures. Bootstrapping the training set T creates multiple training sets (leaving out-of-bag samples); multiple versions of the base classifier are created; the classifier outputs (class labels w1/w2) are combined by majority voting in standard bagging and by the median in bragging.]


Nice Bagging

The nice bagging algorithm was proposed by Skurichina and Duin in 1998 [3]. They proposed to:
a) use a validation (tuning) set to validate the classifiers before combining;
b) use the bootstrapped training sets for the validation;
c) select only the "nice" classifiers (classifiers having a misclassification error less than the APER, the apparent error);
d) combine the coefficients of the selected linear classifiers using the average rule.
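A rough sketch of this selection-and-averaging idea, using scikit-learn's LinearDiscriminantAnalysis as the linear classifier; the empty-selection fallback and other details are our assumptions:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def nice_bagging_fit(X, y, B=50, seed=0):
    """Average the coefficients of the 'nice' bootstrap classifiers only."""
    base = LinearDiscriminantAnalysis().fit(X, y)
    aper = 1.0 - base.score(X, y)            # apparent error of the base model
    rng = np.random.default_rng(seed)
    n = len(X)
    coefs, intercepts = [], []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)     # bootstrap training set
        clf = LinearDiscriminantAnalysis().fit(X[idx], y[idx])
        if 1.0 - clf.score(X[idx], y[idx]) < aper:   # keep only 'nice' classifiers
            coefs.append(clf.coef_)
            intercepts.append(clf.intercept_)
    if coefs:                                # average rule on the coefficients
        base.coef_ = np.mean(coefs, axis=0)
        base.intercept_ = np.mean(intercepts, axis=0)
    return base
```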


Robust Bagging (Robag)

In the Robag algorithm we use the out-of-bag (OOB) samples as the validation set, because for validation it is better to use data that are independent of the training set (the OOB samples are independent of the bootstrap training samples). We also use the median as the combiner, because then any extreme results yielded by individual classifiers are automatically filtered out. The procedure is sketched after the figure below.


Robust Bagging (Robag) contd.

[Figure: The Robag procedure. Bootstrapping the training set T creates multiple training sets; the out-of-bag samples form the validation sets used to validate the multiple versions of the base classifier; the validated classifiers' outputs are then combined using the median to give the combined classifier.]
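Under our reading of the slides, a minimal Python sketch of Robag: each bootstrap classifier is validated on its own OOB samples, and the validated classifiers' outputs (LDA decision values, with 0/1 labels assumed) are combined by the median. The acceptance rule (OOB error below the APER) is borrowed from nice bagging and is our assumption, since the slides do not spell out a threshold:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def robag_predict(X_train, y_train, X_test, B=50, seed=0):
    """Robag sketch: OOB validation, then median combination of outputs."""
    base = LinearDiscriminantAnalysis().fit(X_train, y_train)
    aper = 1.0 - base.score(X_train, y_train)   # apparent error of the base model
    rng = np.random.default_rng(seed)
    n = len(X_train)
    outputs = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)        # bootstrap training set
        oob = np.setdiff1d(np.arange(n), idx)   # OOB samples, independent of idx
        clf = LinearDiscriminantAnalysis().fit(X_train[idx], y_train[idx])
        if 1.0 - clf.score(X_train[oob], y_train[oob]) < aper:  # validated
            outputs.append(clf.decision_function(X_test))
    # Median over the validated outputs filters out extreme results.
    # (No fallback for the empty-selection case in this sketch.)
    return (np.median(np.array(outputs), axis=0) > 0).astype(int)
```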


Relative Improvement Measure (RIM)

To check whether bagging, or any other variant of bagging, improves or degrades the performance of the base classifier, we use a relative improvement measure.

Relative Improvement = (Errbagging − Errbase) / Errbase

Here, Errbase = test error of the base classifier, and Errbagging = test error of the bagged classifier.

So the RIM measures the decrease (negative values) or increase (positive values) in the test error of the bagged classifier relative to the base classifier.
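A quick check of the sign convention (the helper name is ours):

```python
def rim(err_base, err_bagged):
    """Relative Improvement Measure; negative values mean the bagged
    classifier reduced the base classifier's test error."""
    return (err_bagged - err_base) / err_base

print(rim(0.25, 0.20))   # -> -0.2, i.e. a 20% relative error reduction
```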


Classifiers and the Datasets Used


We used one unstable classifier, a tree classifier (CART), and two stable (linear) classifiers, i.e., the Fisher Linear Discriminant (FLD) and the Nearest Mean Classifier (NMC).

The stable classifiers are used to check the performance of the Robag algorithm, since bagging algorithms usually do not perform well with stable classifiers.

We use 5 well-known data sets from the UCI (University of California, Irvine) Machine Learning Repository [4].

Datasets        N    q
Austral         690  14
Breast Cancer   699  9
Diabetes        768  8
Ionosphere      351  33
Spectral        267  23

N = no. of observations, q = no. of features


Experimental Setup


In the experiments:
1) We randomly divide each dataset into two parts: a training part (80% of the data) and a testing part (20%).
2) We repeat this random partition 50 times.
3) In each partition we a) calculate the APER, b) use 50 bootstrap replicates to generate the bagged classifiers, and c) calculate the RIM.
4) We average the results over the 50 iterations. A sketch of this protocol follows.
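A sketch of the protocol with CART as the base classifier (mean_rim and the use of train_test_split are ours; bagged_predict is any of the bagging sketches above):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def mean_rim(X, y, bagged_predict, n_splits=50):
    """Average the RIM over 50 random 80/20 partitions."""
    rims = []
    for s in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=s)   # one random partition
        base = DecisionTreeClassifier(random_state=s).fit(X_tr, y_tr)
        err_base = np.mean(base.predict(X_te) != y_te)
        # 50 bootstrap replicates inside each partition.
        err_bag = np.mean(bagged_predict(X_tr, y_tr, X_te, B=50) != y_te)
        rims.append((err_bag - err_base) / err_base)
    return np.mean(rims)                            # average over 50 iterations
```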


Results of CART


Datasets        Bagging  Bragging  Nice Bagging  Robag
Austral         -0.11    -0.13     -0.05         -0.10
Breast Cancer   -0.28    -0.25     -0.14         -0.20
Diabetes        -0.19    -0.23     -0.22         -0.27
Ionosphere      -0.27    -0.29     -0.13         -0.28
Spectral        -0.07    -0.03     -0.02         -0.05

Table: Mean relative improvement in error rate of Bagging, Bragging, Nice Bagging, and Robag with respect to a classification tree (CART)


Results of FLD


Datasets        Bagging  Bragging  Nice Bagging  Robag
Austral         -0.01    -0.01      0.00         -0.01
Breast Cancer   -0.02    -0.02     -0.01         -0.03
Diabetes        -0.02    -0.01     -0.02         -0.02
Ionosphere      -0.02    -0.02     -0.03         -0.04
Spectral        -0.02    -0.04     -0.07         -0.05

Table: Mean relative improvement in error rate of Bagging, Bragging, Nice Bagging, and Robag with respect to the FLD


Results of NMC


Datasets        Bagging  Bragging  Nice Bagging  Robag
Austral          0.01    -0.01      0.01         -0.01
Breast Cancer   -0.02    -0.02     -0.01         -0.03
Diabetes         0.02     0.01      0.03         -0.01
Ionosphere       0.04     0.02      0.02          0.01
Spectral        -0.01    -0.01     -0.01          0.00

Table: Mean relative improvement in error rate of Bagging, Bragging, Nice Bagging, and Robag with respect to the NMC


Conclusion


We see from the results that the Robag algorithm:
1) performed nearly the same as bagging and its variants when applied to CART on 2 datasets, and performed better on 1 dataset;
2) performed well on 4 datasets when applied to the FLD;
3) performed well on 4 datasets when applied to the NMC.

So we can say that the Robag algorithm, when applied with linear classifiers, performed better than the other bagging variants.


References


[1] L. Breiman, "Bagging Predictors", Machine Learning, 24, 1996, pp. 123-140.

[2] P. Bühlmann, "Bagging, subbagging and bragging for improving some prediction algorithms", in Recent Advances and Trends in Nonparametric Statistics, M. G. Akritas and D. N. Politis (Eds.), Elsevier, 2003, pp. 9-34.

[3] M. Skurichina and R. P. W. Duin, "Bagging for linear classifiers", Pattern Recognition, 31, 1998, pp. 909-930.

[4] UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/MLRepository.html.