
1

A Brief Introduction to Adaboost

Hongbo Deng

6 Feb, 2007

Some of the slides are borrowed from Derek Hoiem & Jan Šochman.

2

Outline

Background

Adaboost Algorithm

Theory/Interpretations

3

What’s So Good About Adaboost

Can be used with many different classifiers

Improves classification accuracy

Commonly used in many areas

Simple to implement

Not prone to overfitting

4

A Brief History

Resampling for estimating a statistic:
Bootstrapping

Resampling for classifier design:
Bagging
Boosting (Schapire 1989)
Adaboost (Schapire 1995)

5

Bootstrap Estimation

Repeatedly draw n samples from D
For each set of samples, estimate a statistic
The bootstrap estimate is the mean of the individual estimates
Used to estimate a statistic (parameter) and its variance
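A minimal Python sketch of the procedure above (illustrative only; the function name and the default choice of statistic are mine, not the slide's):

import numpy as np

def bootstrap_estimate(data, statistic=np.median, B=1000, seed=0):
    """Bootstrap estimate of a statistic and its variance."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Repeatedly draw n samples from the data with replacement and evaluate the statistic
    estimates = np.array([statistic(rng.choice(data, size=n, replace=True))
                          for _ in range(B)])
    # The bootstrap estimate is the mean of the individual estimates; also report its variance
    return estimates.mean(), estimates.var(ddof=1)

For example, bootstrap_estimate(some_1d_array) returns the bootstrapped median of that (hypothetical) sample together with an estimate of its variance.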

6

Bagging - Aggregate Bootstrapping

For i = 1 .. M:
Draw n* < n samples from D with replacement
Learn classifier Ci

Final classifier is a vote of C1 .. CM

Increases classifier stability/reduces variance

[Figure: bootstrap samples D1, D2, D3 drawn from D; a code sketch follows]
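A minimal bagging sketch, assuming scikit-learn decision trees as the base classifier, labels in {-1, +1}, and a sample fraction frac to mirror the slide's n* < n (all of these are my assumptions, not the slide's):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging(X, y, M=25, frac=0.8, seed=0):
    """Train M classifiers on bootstrap samples of size n* = frac * n and vote."""
    rng = np.random.default_rng(seed)
    n = len(X)
    classifiers = []
    for _ in range(M):
        idx = rng.choice(n, size=int(frac * n), replace=True)  # draw n* < n samples with replacement
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    def vote(X_new):
        # Final classifier is a (majority) vote of C1 .. CM
        return np.sign(sum(c.predict(X_new) for c in classifiers))
    return vote

Averaging many classifiers trained on resampled versions of D is what increases stability and reduces variance; setting frac = 1.0 recovers the more common convention of bootstrap samples of the full size n.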

7

Boosting (Schapire 1989)

Consider creating three component classifiers for a two-category problem through boosting.

Randomly select n1 < n samples from D without replacement to obtain D1; train weak learner C1
Select n2 < n samples from D, with half of them misclassified by C1, to obtain D2; train weak learner C2
Select all remaining samples from D on which C1 and C2 disagree to obtain D3; train weak learner C3
The final classifier is a vote of the three weak learners

[Figure: the sets D1, D2, D3 drawn from D; a code sketch follows]
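An illustrative Python sketch of this procedure, under assumptions not stated on the slide: decision stumps as the weak learners, labels in {-1, +1}, and enough misclassifications and disagreements for D2 and D3 to be non-empty.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boosting_1989(X, y, n1, n2, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    weak = lambda: DecisionTreeClassifier(max_depth=1)  # decision stump (assumed weak learner)
    # D1: n1 < n samples drawn without replacement; train C1
    idx1 = rng.choice(n, size=n1, replace=False)
    C1 = weak().fit(X[idx1], y[idx1])
    # D2: n2 samples, half misclassified by C1 and half classified correctly; train C2
    p1 = C1.predict(X)
    wrong, right = np.where(p1 != y)[0], np.where(p1 == y)[0]
    idx2 = np.concatenate([rng.choice(wrong, size=n2 // 2, replace=False),
                           rng.choice(right, size=n2 - n2 // 2, replace=False)])
    C2 = weak().fit(X[idx2], y[idx2])
    # D3: all remaining samples on which C1 and C2 disagree; train C3
    idx3 = np.where(C1.predict(X) != C2.predict(X))[0]
    C3 = weak().fit(X[idx3], y[idx3])
    # Final classifier is a vote of the three weak learners
    return lambda X_new: np.sign(sum(C.predict(X_new) for C in (C1, C2, C3)))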

8

Adaboost - Adaptive Boosting

Instead of resampling, AdaBoost re-weights the training set: each training sample carries a weight that determines its probability of being selected for training a component classifier.

AdaBoost is an algorithm for constructing a “strong” classifier as a linear combination of “simple” “weak” classifiers.

Final classification based on weighted vote of weak classifiers

9

Adaboost Terminology

ht(x) … “weak” or basis classifier (Classifier = Learner = Hypothesis)

… “strong” or final classifier

Weak Classifier: < 50% error over any distribution

Strong Classifier: thresholded linear combination of weak classifier outputs
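The formula for the final classifier is not in the extracted text; in standard notation, the strong classifier is the thresholded (sign of the) weighted combination of the weak classifier outputs:

H(x) = \operatorname{sign}\!\left( \sum_{t=1}^{T} \alpha_t\, h_t(x) \right)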

10

Discrete Adaboost Algorithm

Each training sample has a weight, which determines the probability of being selected for training the component classifier.
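The algorithm listing on this slide did not survive the conversion. Below is a minimal, illustrative sketch of Discrete AdaBoost with decision stumps; labels are assumed to be in {-1, +1}, weights are updated rather than used for resampling, and the function names (fit_stump, adaboost) are mine, not the slide's.

import numpy as np

def fit_stump(X, y, w):
    """Pick the threshold stump (feature, threshold, polarity) with lowest weighted error."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = pol * np.where(X[:, j] <= thr, 1, -1)
                err = w[pred != y].sum()              # weighted training error
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best

def adaboost(X, y, T=50):
    n = len(y)
    w = np.full(n, 1.0 / n)                           # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(T):
        j, thr, pol, err = fit_stump(X, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)          # guard against log(0) / division by zero
        alpha = 0.5 * np.log((1 - err) / err)         # vote weight of this weak classifier
        pred = pol * np.where(X[:, j] <= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)                # increase weights of misclassified samples
        w /= w.sum()                                  # renormalize (Z_t)
        stumps.append((j, thr, pol))
        alphas.append(alpha)
    def H(X_new):
        # Strong classifier: thresholded weighted vote of the weak classifiers
        s = sum(a * p * np.where(X_new[:, j] <= t, 1, -1)
                for (j, t, p), a in zip(stumps, alphas))
        return np.sign(s)
    return H

As a rough sanity check on hypothetical arrays: H = adaboost(X_train, y_train, T=50), then (H(X_test) == y_test).mean() gives test accuracy.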

11

Find the Weak Classifier

12

Find the Weak Classifier
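The equations on these two slides were lost in conversion. In the standard Discrete AdaBoost formulation (N training samples, weights w_t(i) at round t), the weak classifier is chosen to minimize the weighted error, and its vote weight follows from that error:

\epsilon_t = \sum_{i=1}^{N} w_t(i)\, \mathbf{1}[h(x_i) \neq y_i], \qquad
h_t = \arg\min_{h} \epsilon_t, \qquad
\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}

Because a weak classifier has error below 50%, \epsilon_t < 1/2 and hence \alpha_t > 0.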

13

The algorithm core

14

Reweighting

Samples with y * h(x) = 1 (correctly classified) have their weights decreased.
Samples with y * h(x) = -1 (misclassified) have their weights increased.
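The update formula itself did not survive the conversion; the standard AdaBoost re-weighting rule, which produces exactly these two cases, is

w_{t+1}(i) = \frac{w_t(i)\, \exp(-\alpha_t\, y_i\, h_t(x_i))}{Z_t}

where Z_t normalizes the weights to sum to one: correctly classified samples (y_i h_t(x_i) = +1) are scaled down by e^{-\alpha_t}, misclassified samples (y_i h_t(x_i) = -1) are scaled up by e^{+\alpha_t}.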

15

Reweighting

In this way, AdaBoost “focuses on” the informative or “difficult” examples.


17

Algorithm recapitulation

[Figures: slides 17–24 step through the algorithm graphically, round by round, starting at t = 1]

25

Pros and cons of AdaBoost

Advantages
Very simple to implement
Does feature selection, resulting in a relatively simple classifier
Fairly good generalization

Disadvantages
Suboptimal solution
Sensitive to noisy data and outliers

26

References

Duda, Hart, et al. – Pattern Classification

Freund – “An Adaptive Version of the Boost by Majority Algorithm”

Freund, Schapire – “Experiments with a New Boosting Algorithm”

Freund, Schapire – “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting”

Friedman, Hastie, et al. – “Additive Logistic Regression: A Statistical View of Boosting”

Jin, Liu, et al. (CMU) – “A New Boosting Algorithm Using Input-Dependent Regularizer”

Li, Zhang, et al. – “FloatBoost Learning for Classification”

Opitz, Maclin – “Popular Ensemble Methods: An Empirical Study”

Rätsch, Warmuth – “Efficient Margin Maximization with Boosting”

Schapire, Freund, et al. – “Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods”

Schapire, Singer – “Improved Boosting Algorithms Using Confidence-rated Predictions”

Schapire – “The Boosting Approach to Machine Learning: An Overview”

Zhang, Li, et al. – “Multi-view Face Detection with FloatBoost”

27

Appendix

Bound on training error
Adaboost Variants

28

Bound on Training Error (Schapire)
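The derivation on this slide was lost in conversion. Schapire's standard result bounds the training error of the final classifier H by the product of the per-round normalizers Z_t; writing \gamma_t = 1/2 - \epsilon_t for the edge of the t-th weak classifier:

\frac{1}{N} \sum_{i=1}^{N} \mathbf{1}[H(x_i) \neq y_i]
\;\le\; \prod_{t=1}^{T} Z_t
\;=\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t (1 - \epsilon_t)}
\;\le\; \exp\!\Big(-2 \sum_{t=1}^{T} \gamma_t^2\Big)

So as long as every weak classifier is even slightly better than chance, the training error drops exponentially fast in the number of rounds.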

29

Discrete Adaboost (DiscreteAB) (Friedman’s wording)

30

Discrete Adaboost (DiscreteAB) (Freund and Schapire’s wording)

31

Adaboost with Confidence Weighted Predictions (RealAB)
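The algorithm listing for this slide was lost. In the confidence-weighted (Real AdaBoost) variant, as described by Friedman, Hastie and Tibshirani, each round fits a weighted class-probability estimate and uses half its log-odds as a real-valued weak hypothesis:

p_t(x) = P_w(y = 1 \mid x), \qquad
f_t(x) = \frac{1}{2} \ln \frac{p_t(x)}{1 - p_t(x)}, \qquad
w_i \leftarrow w_i\, e^{-y_i f_t(x_i)} \ \text{(then renormalize)},

with final classifier \operatorname{sign}\big(\sum_t f_t(x)\big).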

32

Adaboost Variants Proposed By Friedman

LogitBoost
Solves the binomial (logistic) log-likelihood by stage-wise Newton steps
Requires care to avoid numerical problems

GentleBoost
Update is fm(x) = P(y=1 | x) – P(y=0 | x) instead of the half log-ratio used by Real AdaBoost
Bounded in [-1, 1]

33

Adaboost Variants Proposed By Friedman: LogitBoost
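The LogitBoost listing was lost in conversion; a sketch of the two-class algorithm from Friedman, Hastie and Tibshirani's “Additive Logistic Regression” (their notation, not the slide's): start with F(x) = 0 and p(x_i) = 1/2, then for m = 1, ..., M compute the working response and weights

z_i = \frac{y_i^{*} - p(x_i)}{p(x_i)\,(1 - p(x_i))}, \qquad
w_i = p(x_i)\,(1 - p(x_i)), \qquad
y_i^{*} = \frac{y_i + 1}{2} \in \{0, 1\},

fit f_m by weighted least-squares regression of z_i on x_i, and update F(x) \leftarrow F(x) + \frac{1}{2} f_m(x) with p(x) = e^{F(x)} / (e^{F(x)} + e^{-F(x)}). The output is \operatorname{sign}(F(x)). The division by p(1-p) is what requires numerical care when p(x_i) approaches 0 or 1.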

34

Adaboost Variants Proposed By Friedman: GentleBoost
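The GentleBoost listing was likewise lost; in Friedman et al.'s formulation the Newton step is replaced by a simple weighted regression of the labels: start with F(x) = 0 and w_i = 1/N, then for m = 1, ..., M fit f_m(x) by weighted least squares of y_i on x_i with weights w_i, and update

F(x) \leftarrow F(x) + f_m(x), \qquad
w_i \leftarrow w_i\, e^{-y_i f_m(x_i)} \ \text{(then renormalize)}.

The output is \operatorname{sign}(F(x)). With a class-probability base learner this fitted update reduces to f_m(x) = P_w(y = 1 \mid x) - P_w(y = -1 \mid x), which is bounded in [-1, 1], unlike Real AdaBoost's unbounded log-ratio.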

35

Thanks! Any comments or questions?
