Improving Supervised Classification using Confidence Weighted Learning

Page 1: Improving Supervised Classification  using  Confidence Weighted Learning

1

Improving Supervised Classification

using Confidence Weighted Learning

Koby Crammer

Joint work with

Mark Dredze, Alex Kulesza and Fernando Pereira

Workshop in Machine Learning, EE Department, Technion, January 20, 2010

Page 2: Improving Supervised Classification  using  Confidence Weighted Learning

2

Linear Classifiers

Input Instance to be classified

Weight vector of classifier
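For concreteness, the prediction rule a linear classifier applies to an instance x with weight vector w (the standard formulation the slide's figure is assumed to depict):

```latex
\hat{y} \;=\; \operatorname{sign}\left( \mathbf{w} \cdot \mathbf{x} \right)
```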

Page 3: Improving Supervised Classification  using  Confidence Weighted Learning

3

Natural Language Processing

• Big datasets, large number of features

• Many features are only weakly correlated with target label

• Linear classifiers: features are associated with word-counts

• Heavy-tailed feature distribution

[Plot: feature counts vs. feature rank, illustrating the heavy-tailed feature distribution]

Page 4: Improving Supervised Classification  using  Confidence Weighted Learning

4

Sentiment Classification

• Who needs this Simpsons book? You DOOOOOOOO. This is one of the most extraordinary volumes I've ever encountered … . Exhaustive, informative, and ridiculously entertaining, it is the best accompaniment to the best television show … . … Very highly recommended!

Pang, Lee, Vaithyanathan, EMNLP 2002

Page 5: Improving Supervised Classification  using  Confidence Weighted Learning

Online Learning

1. Maintain model M

2. Get instance x

3. Predict label ŷ = M(x)

4. Get true label y; suffer loss ℓ(ŷ, y)

5. Update model M
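A minimal sketch of this protocol in Python; `model`, with its `predict` and `update` methods, is a hypothetical stand-in for any online learner, and `stream` yields (instance, true label) pairs:

```python
def run_online(model, stream, loss):
    """Generic online protocol: predict, observe the true label, suffer loss, update."""
    total_loss = 0.0
    for x, y_true in stream:                 # get instance x
        y_pred = model.predict(x)            # predict label y_hat = M(x)
        total_loss += loss(y_pred, y_true)   # suffer loss l(y_hat, y)
        model.update(x, y_true)              # update model M
    return total_loss
```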

Page 6: Improving Supervised Classification  using  Confidence Weighted Learning

6

Sentiment Classification

• Many positive reviews contain the word best, so w_best grows large

• A later negative review: "boring book – best if you want to sleep in seconds"

• A linear update will reduce both w_best and w_boring by the same amount

• But best has appeared far more often than boring, so it is better to reduce the two weights at different rates

Page 7: Improving Supervised Classification  using  Confidence Weighted Learning

7

From a Linear Model to a Distribution over Linear Models

[Figure: an example instance and the mean weight vector]

Page 8: Improving Supervised Classification  using  Confidence Weighted Learning

8

New Prediction Models

• Gaussian distributions over weight vectors

• The covariance is either full or diagonal

• In NLP we have many features and use a diagonal covariance
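A minimal sketch of such a model with a diagonal covariance, using dense NumPy vectors. The closed-form update shown is AROW (Crammer, Kulesza & Dredze, 2009), a close relative of the CW update chosen here only because its formula is compact; the exact CW step is given by the projection described on the following slides.

```python
import numpy as np

class DiagonalGaussianClassifier:
    """Gaussian over weight vectors: mean mu and per-feature variance sigma."""

    def __init__(self, dim, r=1.0, init_var=1.0):
        self.mu = np.zeros(dim)                 # mean weight vector
        self.sigma = np.full(dim, init_var)     # diagonal of the covariance
        self.r = r                              # AROW regularization parameter

    def predict(self, x):
        return 1 if self.mu @ x >= 0 else -1    # predict with the mean weights

    def update(self, x, y):
        margin = y * (self.mu @ x)              # signed margin of the mean
        variance = (self.sigma * x) @ x         # x^T Sigma x, confidence along x
        if margin < 1.0:                        # suffered loss: update mean and covariance
            beta = 1.0 / (variance + self.r)
            alpha = (1.0 - margin) * beta
            self.mu += alpha * y * self.sigma * x       # move the mean toward a correct, large-margin answer
            self.sigma -= beta * (self.sigma * x) ** 2  # shrink the variance of the features just seen
```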

Page 9: Improving Supervised Classification  using  Confidence Weighted Learning

9

The algorithm forces most of the probability mass of the weight vector to reside in this region

Weight Vector (Version) Space

Page 10: Improving Supervised Classification  using  Confidence Weighted Learning

10

Nothing to do: most of the weight vectors already classify the example correctly

Passive Step

Page 11: Improving Supervised Classification  using  Confidence Weighted Learning

11

The mean is moved beyond the mistake line (large margin)

Aggressive Step

The covariance is shrunk in the direction of the input example

The algorithm projects the current Gaussian distribution onto the half-space

Page 12: Improving Supervised Classification  using  Confidence Weighted Learning

12

The Update

• Projection update

• Can be solved analytically
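In symbols, the projection update of Dredze, Crammer & Pereira (2008) replaces the current Gaussian by the closest one, in KL divergence, that classifies the example correctly with probability at least η:

```latex
(\mu_{t+1}, \Sigma_{t+1})
  \;=\; \arg\min_{\mu,\,\Sigma}\;
  D_{\mathrm{KL}}\!\left( \mathcal{N}(\mu,\Sigma) \;\middle\|\; \mathcal{N}(\mu_t,\Sigma_t) \right)
\quad\text{s.t.}\quad
\Pr_{\mathbf{w}\sim\mathcal{N}(\mu,\Sigma)}\!\left[\, y_t\,(\mathbf{w}\cdot\mathbf{x}_t) \ge 0 \,\right] \;\ge\; \eta
```

The constraint can be rewritten as y_t (μ · x_t) ≥ φ √(x_tᵀ Σ x_t) with φ = Φ⁻¹(η), which is the form that admits an analytic solution.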

Page 13: Improving Supervised Classification  using  Confidence Weighted Learning

13

Synthetic Data

• 20 features

• 2 informative (rotated skewed Gaussian)

• 18 noisy

• Using a single feature is as good as random prediction

Page 14: Improving Supervised Classification  using  Confidence Weighted Learning

14

Synthetic Data (cntd.)

Distribution after 50 examples (x1)

Page 15: Improving Supervised Classification  using  Confidence Weighted Learning

15

Synthetic Data (results)

[Plot: results on the synthetic data comparing Perceptron, PA, 2nd-order perceptron, CW-full, and CW-diag]

Page 16: Improving Supervised Classification  using  Confidence Weighted Learning

16

Data

• Binary document classification
  – Sentiment reviews: 6 Amazon domains (Blitzer et al.)
  – Reuters (RCV1): 3 pairs of labels
  – 20 Newsgroups: 3 pairs of labels

• About 2,000 instances per dataset

• Bag-of-words representation

• 10-fold cross-validation; 5 epochs

Page 17: Improving Supervised Classification  using  Confidence Weighted Learning

17

Results vs Batch - Sentiment

• Always better than the batch methods

• Significantly better on 3 of 6 datasets

Page 18: Improving Supervised Classification  using  Confidence Weighted Learning

18

Results vs Batch - 20NG + Reuters

• Better than the batch methods on 5 of 6 datasets

• Significantly better on 3 of those 5; significantly worse on the remaining 1

Page 19: Improving Supervised Classification  using  Confidence Weighted Learning

19

Parallel Training

• Split the large dataset into disjoint sets

• Train on each set independently

• Combine the resulting classifiers
  – Average performance of the individual classifiers
  – Uniform mean of the linear weights
  – Weighted mean of the linear weights, using the confidence information (sketched below)
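A minimal sketch of the two weight-combination schemes, assuming each trained model exposes its mean weights `mu` and per-feature variances `sigma` as in the class sketched earlier. Reading "weighted mean using confidence information" as an inverse-variance (precision-weighted) average is an assumption about the exact rule:

```python
import numpy as np

def combine_uniform(models):
    """Uniform mean of the linear weights."""
    return np.mean([m.mu for m in models], axis=0)

def combine_by_confidence(models):
    """Per-feature weighted mean, weighting each model by its precision (1 / variance)."""
    precision = np.array([1.0 / m.sigma for m in models])   # shape: (num_models, dim)
    means = np.array([m.mu for m in models])
    return (precision * means).sum(axis=0) / precision.sum(axis=0)
```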

Page 20: Improving Supervised Classification  using  Confidence Weighted Learning

20

Parallel Training

• Data Size: Sentiment ~1M ; Reuters ~0.8M

• #Features/#Docs: Sentiment ~13 ; Reuters ~0.35

• Performance degrades as the number of splits grows

• Weighting improves performance

Baseline (CW)

Page 21: Improving Supervised Classification  using  Confidence Weighted Learning

21

Multi-Class Update

• Multiple constraints per instance (one per competing label)

• Approximate using a single constraint

Crammer, Dredze, Kulesza. EMNLP 2008
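Roughly, with a weight vector w_r per label r, the exact multi-class update would impose one probabilistic constraint per competing label, and the approximation keeps only the constraint for the highest-scoring wrong label. This is a hedged paraphrase of the EMNLP 2008 formulation, not its exact notation:

```latex
\Pr\!\left[\, \mathbf{w}_{y_t}\cdot\mathbf{x}_t \;\ge\; \mathbf{w}_{r}\cdot\mathbf{x}_t \,\right] \ge \eta
\;\;\text{for all } r \ne y_t
\qquad\Longrightarrow\qquad
\Pr\!\left[\, \mathbf{w}_{y_t}\cdot\mathbf{x}_t \;\ge\; \mathbf{w}_{\hat r}\cdot\mathbf{x}_t \,\right] \ge \eta,
\quad \hat r = \arg\max_{r \ne y_t}\; \mu_r\cdot\mathbf{x}_t
```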

Page 22: Improving Supervised Classification  using  Confidence Weighted Learning

22

Evaluation Setup

• Nine multi-class datasets

Task         Instances  Features  Labels  Balanced
20 News      18,828     252,115   20      Yes
Amazon 7     13,580     686,724   7       Yes
Amazon 3     7,000      494,481   3       Yes
Enron A      3,000      13,559    10      No
Enron B      3,000      18,065    10      No
NYT Desk     10,000     108,671   26      No
NYT Online   10,000     108,671   34      No
NYT Section  10,000     108,671   20      No
Reuters      4,000      23,699    4       No

Crammer, Dredze, Kulesza. EMNLP 2008

Page 23: Improving Supervised Classification  using  Confidence Weighted Learning

23

Evaluation

Better than all baselines (online and batch): 8 of 9 datasets

Crammer, Dredze, Kulesza. EMNLP 2008

Page 24: Improving Supervised Classification  using  Confidence Weighted Learning

24

20 Newsgroups

Better than all online baselines: 8 of 9 datasets

Crammer, Dredze, Kulesza. EMNLP 2008

Page 25: Improving Supervised Classification  using  Confidence Weighted Learning

25

Multi-Domain Learning

• Task: sentiment classification

• Data: reviews come from several domains
  – Electronics
  – Books
  – Movies
  – Kitchen Appliances

• Challenge: domains differ
  – Domains use different features
  – Domains may behave differently toward the same features

Blitzer, Dredze, Pereira, ACL 2007

Dredze, Kulesza, Crammer. MLJ 2009

Page 26: Improving Supervised Classification  using  Confidence Weighted Learning

26

Differing Feature Behaviors

• Share similar behaviors across domains

• Learn domain-specific behaviors

Shared parameters: parameters used for every domain

Domain parameters: separate parameters for every domain

Page 27: Improving Supervised Classification  using  Confidence Weighted Learning

27

Combining Domain Parameters

[Diagram: a shared weight and a domain-specific weight are combined into w_combined; example entries 2, -1, .5]

Page 28: Improving Supervised Classification  using  Confidence Weighted Learning

28

Classifier Combination

• Individual classifiers are combined into a single combined classifier

• Weighting: a CW classifier is a distribution over weight vectors, so its confidence information can weight the combination

Page 29: Improving Supervised Classification  using  Confidence Weighted Learning

29

Multi-Domain Regularization

• Combined classifier for prediction and updates
  – Based on Evgeniou and Pontil, KDD 2004

• Passive-aggressive update rule
  – Find the shared model and the individual model closest to the current corresponding models
  – Such that their combination performs well on the current example

In other words: 1) the smallest parameter change 2) that classifies the example correctly
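A hedged sketch of these two requirements as a passive-aggressive-style optimization, writing z_s for the shared parameters, z_k for the parameters of the example's domain, and taking their average as the combined classifier (the averaging coefficient is an assumption; the exact objective in the MLJ 2009 paper may differ in details):

```latex
(\mathbf{z}_s^{t+1}, \mathbf{z}_k^{t+1})
 \;=\; \arg\min_{\mathbf{z}_s,\,\mathbf{z}_k}\;
 \tfrac{1}{2}\,\lVert \mathbf{z}_s - \mathbf{z}_s^{t} \rVert^2
 \;+\; \tfrac{1}{2}\,\lVert \mathbf{z}_k - \mathbf{z}_k^{t} \rVert^2
 \quad\text{s.t.}\quad
 y_t \left( \tfrac{1}{2}\,(\mathbf{z}_s + \mathbf{z}_k) \cdot \mathbf{x}_t \right) \;\ge\; 1
```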

Page 30: Improving Supervised Classification  using  Confidence Weighted Learning

30

Evaluation on Sentiment

• Sentiment classification
  – Rate product reviews: positive/negative

• 4 datasets
  – All: 7 Amazon product types
  – Books: different rating thresholds
  – DVDs: different rating thresholds
  – Books+DVDs

• 1500 train, 100 test per domain

Page 31: Improving Supervised Classification  using  Confidence Weighted Learning

31

Results

[Bar chart: test error (smaller is better) for Single, Separate, and MDR on Books, DVD, Books+DVD, and All; 10-fold CV, one pass of online training; improvements on Books, DVDs, and Books+DVDs significant at p = .001]

Page 32: Improving Supervised Classification  using  Confidence Weighted Learning

32

Active Learning (Dredze & Crammer, ACL 2008)

• Start with a pool of unlabeled examples

• Use a few labeled examples to choose an initial hypothesis

• Iterative algorithm:
  – Use the current classifier to pick an example to be labeled
  – Train using all labeled examples


Page 39: Improving Supervised Classification  using  Confidence Weighted Learning

39

Picking the Next Example

• Random

• Linear classifiers: pick the example with the lowest margin

• Active Confidence Learning: pick the example with the least confidence, equivalent to the lowest normalized margin (sketched below)

Dredze & Crammer, ACL 2008
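A minimal sketch of the three selection rules, assuming the `DiagonalGaussianClassifier` sketched earlier (so `mu` and `sigma` are available); the "least confidence" criterion is written here as the normalized margin mentioned on the slide:

```python
import numpy as np

def pick_random(pool, rng=None):
    rng = rng or np.random.default_rng()
    return int(rng.integers(len(pool)))

def pick_lowest_margin(model, pool):
    """Classic active learning for linear classifiers: smallest |mu . x|."""
    return int(np.argmin([abs(model.mu @ x) for x in pool]))

def pick_least_confident(model, pool):
    """Active Confidence Learning: smallest margin normalized by the variance along x."""
    scores = [abs(model.mu @ x) / np.sqrt((model.sigma * x) @ x) for x in pool]
    return int(np.argmin(scores))
```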

Page 40: Improving Supervised Classification  using  Confidence Weighted Learning

40

Active Learning

• 13 datasets: Sentiment (4), 20NG (3), Reuters (3), Spam (3)

Dredze & Crammer, ACL 2008

Page 41: Improving Supervised Classification  using  Confidence Weighted Learning

41

Active Learning

• Number of labels needed by CW Margin and ACL to achieve 80% of the accuracy of training with all of the data

Dredze & Crammer, ACL 2008

Page 42: Improving Supervised Classification  using  Confidence Weighted Learning

42

Summary

• Online training is fast and effective …

• … but NLP data has a heavy-tailed feature distribution

• New model:
  – Add per-feature confidence parameters

• Benefits:
  – Better than state-of-the-art training algorithms for linear classifiers
  – Converges faster
  – Theoretical guarantees
  – Allows better combination of models trained in parallel, better active learning, and better domain adaptation