A speech about Boosting
Presenter: Roberto Valenti


The Paper*

*R. Schapire. The Boosting Approach to Machine Learning: An Overview, 2001.

I want YOU…

…TO UNDERSTAND

Overview

• Introduction
• Adaboost
  – How does it work?
  – Why does it work?
  – Demo
  – Extensions
  – Performance & Applications
• Summary & Conclusions
• Questions

Introduction to Boosting
Let's start

Introduction

• An example of Machine Learning: Spam classifier

• Highly accurate rule: difficult to find

• Inaccurate rule: "BUY NOW"
• Introducing Boosting: "an effective method of producing an accurate prediction rule from inaccurate rules"

Introduction

• History of boosting:
  – 1989: Schapire
    • First provable polynomial-time boosting algorithm
  – 1990: Freund
    • Much more efficient, but with practical drawbacks
  – 1995: Freund & Schapire
    • Adaboost: the focus of this presentation
  – …

Introduction

• The Boosting Approach:
  – Many weak classifiers
  – One strong classifier
• Boosting key points:
  – Give importance to misclassified data
  – Find a way to combine the weak classifiers into a general rule

Adaboost
How does it work?

Adaboost – How does it work?

• Base learner job:
  – Find a base hypothesis: ht : X → {−1, +1}
  – Minimize the weighted error: εt = Pr(i~Dt)[ht(xi) ≠ yi]
• Choose αt = ½ ln((1 − εt) / εt)

Adaboost – How does it work?
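A minimal Python sketch of the full loop (toy decision stumps as the base learner; the function names are illustrative, not from the paper): each round fits a stump to the current distribution Dt, sets αt from its weighted error, and re-weights the data so that mistakes gain weight.

```python
import numpy as np

def train_stump(X, y, D):
    """Base learner: choose the (feature, threshold, sign) stump that
    minimizes the weighted error eps_t = sum of D[i] over mistakes."""
    best_eps, best_stump = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = sign * np.where(X[:, j] <= thr, 1, -1)
                eps = D[pred != y].sum()
                if eps < best_eps:
                    best_eps, best_stump = eps, (j, thr, sign)
    return best_eps, best_stump

def stump_predict(X, stump):
    j, thr, sign = stump
    return sign * np.where(X[:, j] <= thr, 1, -1)

def adaboost(X, y, T=20):
    """y must be in {-1, +1}. Returns a list of (alpha_t, stump_t)."""
    D = np.full(len(y), 1.0 / len(y))      # D_1: uniform over examples
    ensemble = []
    for t in range(T):
        eps, stump = train_stump(X, y, D)
        if eps >= 0.5:                      # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        D *= np.exp(-alpha * y * stump_predict(X, stump))  # boost mistakes
        D /= D.sum()                        # normalize (divide by Z_t)
        ensemble.append((alpha, stump))
    return ensemble

def predict(X, ensemble):
    """Final hypothesis H(x) = sign(sum_t alpha_t h_t(x))."""
    return np.sign(sum(a * stump_predict(X, s) for a, s in ensemble))
```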

Adaboost
Why does it work?

Adaboost – Why does it work?

• Basic property: it reduces the training error

• On binary distributions, write the weighted error of round t as εt = 1/2 − γt (γt is the "edge" over random guessing)

• Training error bounded by:

  ∏t 2√(εt(1 − εt)) = ∏t √(1 − 4γt²) ≤ exp(−2 Σt γt²)

• If every edge satisfies γt ≥ γ, this is at most e^(−2γ²T) → drops exponentially!

• Generalization error bounded by:

  Pr[H(x) ≠ y] ≤ P̂r[H(x) ≠ y] + Õ(√(T·d / m))

  – T = number of boosting rounds
  – m = sample size
  – d = Vapnik–Chervonenkis dimension of the base classifier space
  – P̂r[·] = empirical probability
  – Õ = hides logarithmic and constant factors

• This bound suggests overfitting as T grows!
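A small numeric check of the training error bound (illustrative only, assuming a constant edge γt = γ = 0.1 in every round):

```python
# If every round has edge gamma_t = 1/2 - eps_t >= gamma, then
#   prod_t sqrt(1 - 4*gamma_t^2)  <=  exp(-2*T*gamma^2),
# i.e. the bound drops exponentially in T.
import math

gamma = 0.1                       # assumed constant edge over chance
for T in (10, 50, 100):
    product_bound = math.sqrt(1 - 4 * gamma**2) ** T
    exp_bound = math.exp(-2 * T * gamma**2)
    print(T, product_bound, exp_bound, product_bound <= exp_bound)
```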

Adaboost – Why does it work?

• Margins of the training examples:

margin(x, y) = y · Σt αt ht(x) / Σt αt ∈ [−1, +1]

• Positive only if x is correctly classified by H
• Confidence in the prediction: the magnitude of the margin
• A qualitative explanation of effectiveness – not quantitative
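As an illustration, the margins of a trained combination can be computed in a few lines (hypothetical helper; it assumes the base predictions and weights are already given as arrays):

```python
import numpy as np

def margins(y, base_preds, alphas):
    """y: (n,) labels in {-1,+1}; base_preds: (T, n) in {-1,+1};
    alphas: (T,) nonnegative weights. Returns values in [-1, +1]."""
    f = alphas @ base_preds            # weighted vote, shape (n,)
    return y * f / np.sum(alphas)      # normalize by total weight
```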

Adaboost – Other View

• Adaboost as a zero-sum game:
  – Game matrix M
  – Row player: Adaboost
  – Column player: base learner
  – Row player plays rows with distribution P
  – Column player plays columns with distribution Q
  – Expected loss: PᵀMQ

• The matrix game is played repeatedly

Adaboost – Other View

• Von Neumann's minmax theorem:

  min_P max_Q PᵀMQ = max_Q min_P PᵀMQ

• If for every distribution there exists a base classifier with edge γ over random guessing
• Then there exists a combination of base classifiers with margin ≥ 2γ
• Adaboost has the potential of success

• Relations with Linear Programming and Online Learning
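A toy illustration of the expected loss PᵀMQ (the matrix and mixed strategies below are made-up values, not from the paper):

```python
import numpy as np

M = np.array([[0.0, 1.0],     # loss when row plays i and column plays j
              [1.0, 0.0]])
P = np.array([0.5, 0.5])      # row player's (Adaboost's) mixed strategy
Q = np.array([0.5, 0.5])      # column player's (base learner's) strategy
print(P @ M @ Q)              # expected loss P^T M Q = 0.5, the game value
```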

Adaboost
Demo

Adaboost
Extensions

Adaboost – Extensions

• History of Boosting:
  – …
  – 1997: Freund & Schapire
    • Adaboost.M1 – first multiclass generalization
      – Fails if the weak learner achieves less than 50% accuracy
    • Adaboost.M2
      – Creates a set of binary problems
      – For x: is label l1 better, or label l2?
  – 1999: Schapire & Singer
    • Adaboost.MH
      – For x: is label l1 better, or one of the others? (see the sketch below)
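A rough sketch of that reduction (the helper name and example values are made up; a full Adaboost.MH also maintains a distribution over the (example, label) pairs):

```python
def mh_expand(examples, labels):
    """Turn each multiclass example (x, y) into one binary example per
    label l: ((x, l), +1) if l == y, else ((x, l), -1)."""
    binary = []
    for x, y in examples:
        for l in labels:
            binary.append(((x, l), +1 if l == y else -1))
    return binary

# e.g. mh_expand([("doc1", "sports")], ["sports", "politics"])
# -> [(("doc1", "sports"), 1), (("doc1", "politics"), -1)]
```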

Adaboost – Extensions

– 2001: Rochery, Schapire et al.
  • Incorporating human knowledge

• Adaboost is data-driven
• Human knowledge can compensate for lack of data
• Human expert:
  – Chooses a rule p mapping each x to p(x) ∈ [0, 1]
  – Difficult!
  – Simple rules should work…

Adaboost – Extensions

• To incorporate human knowledge, add a term to the objective that penalizes disagreement between the model's predictions and the human rule p

• where the penalty is the binary relative entropy:

  RE(p||q) = p ln(p/q) + (1 − p) ln((1 − p)/(1 − q))
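The slide's relative entropy transcribed directly into Python (the example values below are made up):

```python
import math

def RE(p, q):
    """RE(p||q) = p ln(p/q) + (1-p) ln((1-p)/(1-q)), for p, q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

print(RE(0.9, 0.6))   # positive when model q disagrees with human rule p
print(RE(0.9, 0.9))   # zero when they agree
```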

Adaboost
Performance and Applications

Adaboost – Performance & Applications

Error rates on text categorization (figure: Reuters newswire articles; AP newswire headlines)

Adaboost – Performance & Applications

Six-class text classification (TREC) (figure: training error and test error)

Adaboost – Performance & Applications

Spoken language classification (figure: "How may I help you", "Help desk")

Adaboost - Performance & Adaboost - Performance & ApplicationsApplications

class, label1/weight1,label2/weight2

OCR: Outliers

Rounds:

12

25

4

Adaboost – Applications

• Text filtering
  – Schapire, Singer, Singhal. Boosting and Rocchio applied to text filtering. 1998
• Routing
  – Iyer, Lewis, Schapire, Singer, Singhal. Boosting for document routing. 2000
• "Ranking" problems
  – Freund, Iyer, Schapire, Singer. An efficient boosting algorithm for combining preferences. 1998
• Image retrieval
  – Tieu, Viola. Boosting image retrieval. 2000
• Medical diagnosis
  – Merler, Furlanello, Larcher, Sboner. Tuning cost-sensitive boosting and its application to melanoma diagnosis. 2001

Adaboost – Applications

• Learning problems in natural language processing
  – Abney, Schapire, Singer. Boosting applied to tagging and PP attachment. 1999
  – Collins. Discriminative reranking for natural language parsing. 2000
  – Escudero, Marquez, Rigau. Boosting applied to word sense disambiguation. 2000
  – Haruno, Shirai, Ooyama. Using decision trees to construct a practical parser. 1999
  – Moreno, Logan, Raj. A boosting approach for confidence scoring. 2001
  – Walker, Rambow, Rogati. SPoT: A trainable sentence planner. 2001

Summary and Conclusions
At last…

Summary

• Boosting takes a weak learner and converts it to a strong one

• Works by asymptotically minimizing the training error

• Effectively maximizes the margin of the combined hypothesis

• Adaboost is related to many other topics

• It works!

Conclusions

• Adaboost advantages:
  – Fast, simple, and easy to program
  – No parameters to tune (apart from the number of rounds T)
• Performance depends on:
  – Sample size: (Skurichina, 2001) boosting is only useful for large sample sizes
  – Choice of the weak classifier
  – Incorporation of the classifier weights
  – Data distribution

Questions

? (don't be mean)
