1. An Overview of Adaptive Boosting (AdaBoost)
Presented by Kato Mivule
Dr. Manohar Mareboyana, Professor
Data Mining, Spring 2013
Computer Science Department, Bowie State University



2. OUTLINE
Introduction
How AdaBoost Works
The Experiment
Results
Conclusion and Discussion

3. Adaptive Boosting (AdaBoost)
Adaptive Boosting (AdaBoost) was proposed by Freund and Schapire (1995).
AdaBoost is a machine learning classifier that runs several iterations, adding a weak learner at each one to generate a new learner with improved performance.
AdaBoost is adaptive in that, with each iteration, a new weak learner is added to the ensemble and the data weights are fine-tuned so that priority is given to data misclassified in prior iterations.
AdaBoost is less vulnerable to over-fitting but is sensitive to noise and outliers.

4. AdaBoost Fit Ensemble
[Figure: an overview of the AdaBoost Fit Ensemble procedure.]

5.-8. [Figures: step-by-step illustrations of the AdaBoost procedure.]

9. How AdaBoost Works: Weak Learners (Decision Stump)
For this overview, we choose decision stumps as our weak learner.
A decision stump generates a decision tree with only one single split.
The resulting tree can be used for classifying unseen (untrained) instances.
Each leaf node is a class label; a non-leaf node is a decision node.

10. How AdaBoost Works: Weak Learners
How a decision stump chooses the best attribute:
Information gain: the attribute with the highest information gain is chosen.
Gain ratio.
Gini index.

11. AdaBoost: The Experiment
For illustration purposes, we utilized RapidMiner's AdaBoost functionality.
We used the UCI breast cancer dataset with 683 data points.
We employed 10-fold cross-validation.

12. AdaBoost: The Experiment
We used RapidMiner's Decision Stump as our weak learner.

13. AdaBoost Results: Generated AdaBoost Model
The following AdaBoost model was generated (in this dataset, class 2 is benign and class 4 is malignant):
AdaBoost (prediction model for label Class)
Number of inner models: 3
Embedded model #0 (weight: 2.582):
Uniformity of Cell Size > 3.500: 4 {2=11, 4=202}
Uniformity of Cell Size ≤ 3.500: 2 {2=433, 4=37}
Embedded model #1 (weight: 1.352):
Uniformity of Cell Shape > 1.500: 4 {2=100, 4=237}
Uniformity of Cell Shape ≤ 1.500: 2 {2=344, 4=2}
Embedded model #2 (weight: 1.016):
Clump Thickness > 8.500: 4 {2=0, 4=83}
Clump Thickness ≤ 8.500: 2 {2=444, 4=156}

14. AdaBoost Results
AdaBoost using decision stumps: classification accuracy of 93.12%.
Decision stump without AdaBoost: classification accuracy of 92.97%.

15. AdaBoost Results
AdaBoost confusion matrix: classification accuracy of 93.12%.
Decision stump confusion matrix: classification accuracy of 92.97%.

16. AdaBoost Results: ROC and AUC
The Receiver Operating Characteristic (ROC) curve plots the false positive rate (1 - specificity) on the X-axis: the probability that the model predicts target = 1 when the true value is 0.
It plots the true positive rate (sensitivity) on the Y-axis: the probability that the model predicts target = 1 when the true value is 1.
In the ideal case, the curve rises quickly toward the top-left, indicating that the model made correct predictions.
Area Under the Curve (AUC): the AUC is the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance.
The AUC summarizes the overall performance of the classifier; a higher AUC indicates better performance.
An AUC of 0.50 indicates random performance; an AUC of 1.00 indicates perfect performance.
[Figure: the ROC/AUC plot for AdaBoost, with an AUC of 0.975.]
[Figure: the ROC/AUC plot for the decision stump, with an AUC of 0.911.]
(Minimal Python sketches illustrating slides 3, 9-13, and 16 follow.)
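To make slide 3's description concrete, here is a minimal from-scratch sketch of the AdaBoost reweighting loop in Python. This is not the RapidMiner implementation used in the experiment; the function names and the `stump_fit` parameter are illustrative.

```python
import numpy as np

def adaboost_fit(X, y, stump_fit, n_rounds=3):
    """Minimal AdaBoost sketch. Labels y must be in {-1, +1}; stump_fit
    is any weak-learner trainer taking (X, y, sample_weights) and
    returning an object with a predict(X) -> {-1, +1} method."""
    n = len(y)
    w = np.full(n, 1.0 / n)                 # start with uniform weights
    models, alphas = [], []
    for _ in range(n_rounds):
        stump = stump_fit(X, y, w)          # fit weak learner on weighted data
        pred = stump.predict(X)
        err = w[pred != y].sum()            # weighted training error
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)      # up-weight misclassified points
        w /= w.sum()                        # renormalize to a distribution
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    """Weighted majority vote over the weak learners."""
    score = sum(a * m.predict(X) for m, a in zip(models, alphas))
    return np.sign(score)
```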
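Slides 9-10 describe the decision stump and its attribute-selection criteria. A minimal sketch of choosing a stump split by information gain, assuming numeric attributes in a NumPy matrix `X`:

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_stump_split(X, y):
    """Pick the (attribute, threshold) pair with the highest information
    gain -- a decision tree with only one split, i.e. a stump."""
    base = entropy(y)
    best = (None, None, -1.0)               # (attribute, threshold, gain)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue                     # degenerate split, skip
            children = (len(left) * entropy(left) +
                        len(right) * entropy(right)) / len(y)
            gain = base - children           # reduction in entropy
            if gain > best[2]:
                best = (j, t, gain)
    return best
```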
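The experiment on slides 11-12 was run in RapidMiner's GUI, so there is no code to quote. A rough scikit-learn analogue, assuming the standard UCI download URL and column layout for the Breast Cancer Wisconsin (Original) data set, might look like this:

```python
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "breast-cancer-wisconsin/breast-cancer-wisconsin.data")
cols = ["id", "clump_thickness", "cell_size", "cell_shape", "adhesion",
        "epithelial_size", "bare_nuclei", "chromatin", "nucleoli",
        "mitoses", "class"]                  # class: 2 = benign, 4 = malignant
df = (pd.read_csv(URL, names=cols, na_values="?")
        .dropna())                           # 683 rows remain, matching slide 11
X, y = df[cols[1:-1]].values, df["class"].values

stump = DecisionTreeClassifier(max_depth=1)  # one-split weak learner
clf = AdaBoostClassifier(estimator=stump,    # base_estimator in sklearn < 1.2
                         n_estimators=3)     # three inner models, as on slide 13
print("10-fold CV accuracy:",
      cross_val_score(clf, X, y, cv=10).mean())
```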
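The generated model on slide 13 is just a weighted vote of three stumps, H(x) = sign(sum over t of alpha_t * h_t(x)). A hand-encoded sketch of how that vote plays out; the dictionary keys are illustrative shorthand for the attribute names on slide 13:

```python
# The three stumps and weights from the RapidMiner model above,
# with label 4 (malignant) mapped to +1 and 2 (benign) to -1.
stumps = [
    (2.582, lambda r: 1 if r["cell_size"] > 3.5 else -1),
    (1.352, lambda r: 1 if r["cell_shape"] > 1.5 else -1),
    (1.016, lambda r: 1 if r["clump_thickness"] > 8.5 else -1),
]

def predict(record):
    """Weighted majority vote of the three embedded stumps."""
    score = sum(w * h(record) for w, h in stumps)
    return 4 if score > 0 else 2

# Example where the stumps disagree: the heavily weighted first stump
# (2.582) outvotes the other two (1.352 + 1.016 = 2.368), so -> 4.
print(predict({"cell_size": 5, "cell_shape": 1, "clump_thickness": 2}))
```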
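Finally, for the ROC/AUC analysis on slide 16, a sketch of computing the curve and the area with scikit-learn, reusing `clf`, `X`, and `y` from the cross-validation sketch above (a single held-out split here, rather than the slides' 10-fold setup):

```python
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf.fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]       # predicted P(class = 4) per row
y_bin = (y_te == 4).astype(int)              # 1 = malignant, 0 = benign
fpr, tpr, _ = roc_curve(y_bin, scores)       # points along the ROC curve
print("AUC:", roc_auc_score(y_bin, scores))  # 0.5 = random, 1.0 = perfect
```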
17. CONCLUSION
As shown in the preliminary results, AdaBoost performs better than a decision stump alone. However, much of AdaBoost's success depends largely on fine-tuning the parameters of the machine learning classifier and on the weak learner that is chosen.

18. References
1. Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119-139, Aug. 1997.
2. T. G. Dietterich, "Ensemble methods in machine learning," Lecture Notes in Computer Science, vol. 1857, pp. 1-15, 2000.
3. K. Mivule, C. Turner, and S.-Y. Ji, "Towards a differential privacy and utility preserving machine learning classifier," Procedia Computer Science, vol. 12, pp. 176-181, 2012.
4. T. Fawcett, "An introduction to ROC analysis," Pattern Recognition Letters, vol. 27, no. 8, pp. 861-874, 2006.
5. K. Bache and M. Lichman, Breast Cancer Wisconsin (Original) Data Set - UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA, 2013.
6. MATLAB, AdaBoost - MATLAB. Online, accessed May 3, 2013. Available: http://www.mathworks.com/discovery/adaboost.html
7. MATLAB, Ensemble Methods: Nonparametric Supervised Learning (Statistics Toolbox). Online, accessed May 3, 2013. Available: http://www.mathworks.com/help/toolbox/stats/bsvjye9.html#bsvjyi5
8. ROC Charts, Model Evaluation - Classification. Online, accessed May 3, 2013. Available: http://chem-eng.utoronto.ca/~datamining/dmc/model_evaluation_c.htm
9. MedCalc, ROC curve analysis in MedCalc. Online, accessed May 3, 2013. Available: http://www.medcalc.org/manual/roc-curves.php

19. THANK YOU.
Contact: kmivule at gmail dot com