On-line Learning with Passive-Aggressive Algorithms
Joseph Keshet, The Hebrew University
Learning Seminar, 2004
Supervised Learning Components
• Instance space X
• Label space Y
• Mappings from X to Y are called classifiers
• There exists an unknown target classifier f: X → Y
• Goal: produce a hypothesis that is a good approximation of the target f
Outline
• Binary classification
  – Problem setting
  – On-line algorithm
  – Mistake bound analysis
  – Kernels
• Regression
• Novelty detection / “one-class”
• Hierarchical classification
Binary Classification
• Input: examples (x_t, y_t), where x_t ∈ R^n and y_t ∈ {−1, +1}
• Restriction: linear classification functions, f(x) = sign(w · x)
• Goal: find w that attains small error
Online Learning
Initialize: w_1 = (0, …, 0)
For t = 1, 2, …
  Receive vector x_t
  Predict label ŷ_t = sign(w_t · x_t)
  Receive correct label y_t
  Suffer error if ŷ_t ≠ y_t
  Apply update rule to obtain w_{t+1}
Margin & Loss
• Margin: y (w · x)
• The binary (“0-1”) error is a combinatorial quantity and thus difficult to minimize directly
• Define the hinge loss instead: ℓ(w; (x, y)) = max{0, 1 − y (w · x)}
[Figure: hinge loss vs. the binary “0-1” error as a function of the margin]
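The margin and hinge loss above take only a few lines; a minimal sketch in plain Python (helper names are mine):

```python
def margin(w, x, y):
    """Signed margin y * (w . x): positive iff (x, y) is classified correctly."""
    return y * sum(wi * xi for wi, xi in zip(w, x))

def hinge_loss(w, x, y):
    """Hinge loss max(0, 1 - y (w . x)): zero only when the margin is at least 1."""
    return max(0.0, 1.0 - margin(w, x, y))
```

Note that the loss is zero not merely on correct classification, but only once the margin reaches 1, which is what drives the updates below.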
The Update Rule
w_{t+1} = argmin_w ½ ‖w − w_t‖²  s.t.  ℓ(w; (x_t, y_t)) = 0
• Classify the current example correctly
• Keep the new hyperplane close to the last one
The Update Rule
Solving the optimization gives the closed form
w_{t+1} = w_t + τ_t y_t x_t,  where τ_t = ℓ_t / ‖x_t‖² and ℓ_t = ℓ(w_t; (x_t, y_t))
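The closed-form step plugs directly into the online protocol; a sketch of the resulting passive-aggressive loop (plain Python, function names are mine, and the zero-vector input case is ignored):

```python
def pa_update(w, x, y):
    """One passive-aggressive step: w <- w + tau * y * x, tau = loss / ||x||^2."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - y * dot)
    if loss == 0.0:                              # passive: margin already >= 1
        return w
    tau = loss / sum(xi * xi for xi in x)        # aggressive: zero loss on (x, y)
    return [wi + tau * y * xi for wi, xi in zip(w, x)]

def pa_train(examples, dim):
    """Run the online loop over (x, y) pairs, counting prediction mistakes."""
    w, mistakes = [0.0] * dim, 0
    for x, y in examples:
        y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
        mistakes += (y_hat != y)
        w = pa_update(w, x, y)
    return w, mistakes
```

After each aggressive step the margin on the current example is exactly 1, i.e. its hinge loss is driven to zero, which is the constraint of the optimization above.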
Loss Bound Theorem
• Let (x_1, y_1), …, (x_T, y_T) be a sequence of examples with ‖x_t‖ ≤ R
• Assume there exists u that satisfies ℓ(u; (x_t, y_t)) = 0 for all t
• Then  Σ_t ℓ_t² ≤ R² ‖u‖²
where ℓ_t = ℓ(w_t; (x_t, y_t))
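The bound can be checked numerically: on a separable sequence the cumulative squared hinge loss of the algorithm stays below R² ‖u‖². A small self-check on synthetic data of my own construction (u = (2, −2) attains zero loss on all four examples):

```python
def run_pa(examples):
    """Accumulate the squared hinge losses of the PA algorithm over a sequence."""
    dim = len(examples[0][0])
    w, sq_losses = [0.0] * dim, 0.0
    for x, y in examples:
        dot = sum(wi * xi for wi, xi in zip(w, x))
        loss = max(0.0, 1.0 - y * dot)
        sq_losses += loss ** 2
        if loss > 0.0:
            tau = loss / sum(xi * xi for xi in x)
            w = [wi + tau * y * xi for wi, xi in zip(w, x)]
    return sq_losses

# u = (2, -2) classifies every example below with margin >= 1 (zero hinge loss)
data = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([0.5, -0.5], 1), ([-1.0, 1.0], -1)]
u = [2.0, -2.0]
R2 = max(sum(xi * xi for xi in x) for x, _ in data)   # R^2
bound = R2 * sum(ui * ui for ui in u)                 # R^2 * ||u||^2
```

On this sequence the algorithm's cumulative squared loss is 2.0, well under the bound of 16.0.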
The Non-Separable Case
Introduce a slack variable ξ ≥ 0:
w_{t+1} = argmin_w ½ ‖w − w_t‖² + C ξ  s.t.  ℓ(w; (x_t, y_t)) ≤ ξ
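With the slack term, the effect on the closed form is simply that the step size is capped at the aggressiveness parameter C (the PA-I variant); a one-line sketch:

```python
def pa1_tau(loss, x, C=1.0):
    """PA-I step size: the separable-case tau, clipped at the aggressiveness C."""
    return min(C, loss / sum(xi * xi for xi in x))
```

When the loss is small the update behaves exactly as before; large losses (likely noise) can move the hyperplane by at most C.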
The Update StoryThe Update Story
Correct classification-
No update
Misclassification Misclassification
The non-separable case
C
Kernels
• Since w_{t+1} = w_t + τ_t y_t x_t, the weight vector is a linear combination of the examples: w_t = Σ_{i<t} τ_i y_i x_i
• Note that predictions depend on the examples only through inner products: w_t · x = Σ_{i<t} τ_i y_i (x_i · x)
• Therefore each inner product can be replaced by a kernel K(x_i, x)
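Since w_t is a linear combination of past examples, the algorithm can be run entirely in the dual, storing coefficients instead of a weight vector; a sketch with an RBF kernel (helper names are mine):

```python
import math

def rbf(a, b, gamma=1.0):
    """Gaussian RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def kernel_pa(examples, kernel=rbf):
    """Dual PA: keep (coefficient, example) pairs; score(x) = sum_i alpha_i K(x_i, x)."""
    support = []  # list of (alpha_i = tau_i * y_i, x_i)
    def score(x):
        return sum(a * kernel(xi, x) for a, xi in support)
    for x, y in examples:
        loss = max(0.0, 1.0 - y * score(x))
        if loss > 0.0:
            tau = loss / kernel(x, x)   # ||phi(x)||^2 = K(x, x)
            support.append((tau * y, x))
    return score
```

Only examples that suffered a positive loss enter the support set, so the prediction cost grows with the number of aggressive rounds, not with the stream length.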
On-line Regression
• Input: examples (x_t, y_t), where x_t ∈ R^n and y_t ∈ R
• Restriction: linear regression functions, f(x) = w · x
• Goal: find w that attains small discrepancy |w · x − y|
On-line Regression
• Define the ε-insensitive loss: ℓ(w; (x, y)) = max{0, |w · x − y| − ε}
• Update rule: w_{t+1} = w_t + sign(y_t − w_t · x_t) τ_t x_t, with τ_t = ℓ_t / ‖x_t‖²
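The regression update mirrors the classification one: it pushes the prediction back to the boundary of the ε-tube around the target. A sketch (function name is mine):

```python
def pa_regress_update(w, x, y, eps=0.1):
    """PA regression step for the eps-insensitive loss max(0, |w.x - y| - eps)."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, abs(pred - y) - eps)
    if loss == 0.0:                      # prediction already inside the tube
        return w
    tau = loss / sum(xi * xi for xi in x)
    sign = 1.0 if y > pred else -1.0     # move the prediction toward y
    return [wi + sign * tau * xi for wi, xi in zip(w, x)]
```

After an aggressive step the new prediction sits exactly at distance ε from y, so the loss on the current example is driven to zero, just as in the classification case.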
On-line Novelty Detection
• Input: examples x_t ∈ R^n (no labels)
• Restriction: a single center point w ∈ R^n (“uniclass”)
• Goal: find w that is the center of the smallest ball enclosing the examples
On-line Novelty Detection
• Define the loss: ℓ(w; x) = max{0, ‖w − x‖ − ε}
• Update rule: move the center toward x_t by exactly the loss, w_{t+1} = w_t + ℓ_t (x_t − w_t) / ‖x_t − w_t‖
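The one-class update moves the center only when an example falls outside the ε-ball, and then just far enough that the example lands on its surface; a sketch:

```python
import math

def uniclass_update(w, x, eps=1.0):
    """Move center w toward x just enough that x lies on the eps-ball's surface."""
    dist = math.sqrt(sum((wi - xi) ** 2 for wi, xi in zip(w, x)))
    loss = max(0.0, dist - eps)
    if loss == 0.0:          # x is already inside the ball: passive
        return w
    step = loss / dist       # fraction of the way from w to x
    return [wi + step * (xi - wi) for wi, xi in zip(w, x)]
```

After an aggressive step, ‖w_{t+1} − x_t‖ = ε exactly, so the loss on the current example is zero.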
Hierarchical Classification
• Goal: spoken phoneme recognition
[Figure: phonetic hierarchy — PHONEMES splits into Sonorants, Silences, and Obstruents; Sonorants into Nasals (n, m, ng), Liquids (l, y, w, r), and Vowels (Front, Center, Back: iy, ih, ey, eh, ae, aa, ao, er, ay, aw, oy, ow, uh, uw); Obstruents into Plosives (b, g, d, k, p, t), Fricatives (f, v, sh, s, th, dh, zh, z), and Affricates (jh, ch)]
Metric Over Label Tree
• A given hierarchy induces a metric over the set of labels: the tree distance γ(a, b), the number of edges along the path between labels a and b
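The tree distance can be computed from parent pointers by walking both labels up to their lowest common ancestor; a small sketch (the parent map in the test is my own toy fragment of the phoneme tree):

```python
def tree_distance(a, b, parent):
    """Number of edges between labels a and b in the tree given by parent[node]."""
    def path_to_root(v):
        path = [v]
        while v in parent:
            v = parent[v]
            path.append(v)
        return path
    pa, pb = path_to_root(a), path_to_root(b)
    ancestors = set(pa)
    # walk b upward until we hit an ancestor of a (the lowest common ancestor)
    for depth_b, v in enumerate(pb):
        if v in ancestors:
            return pa.index(v) + depth_b
    raise ValueError("labels are not in the same tree")
```

Siblings are at distance 2; labels in distant subtrees pay the full path through their common ancestor.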
Metric Over Labels
• Metric semantics: γ(a, b) is the severity of predicting label “b” instead of the correct label “a”
• Our high-level goal: tolerate minor errors …
  – Sibling errors
  – Under-confident predictions (predicting a parent)
  … but avoid major errors
Hierarchical Classifier
• Assume instances x ∈ R^n and labels that are nodes of a tree
• Associate a prototype W_v ∈ R^n with each label v
• Define the score of label v as the sum Σ_{u ∈ path(v)} W_u · x over the path from the root to v
• Classify by: ŷ = argmax_v Σ_{u ∈ path(v)} W_u · x
[Figure: label tree with prototypes W_0, …, W_10]
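Classification by path sums can be sketched directly (the prototype and parent maps in the test are my own toy tree):

```python
def path_to_root(v, parent):
    """Labels on the path from v up to the root (inclusive)."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path

def classify(x, labels, prototypes, parent):
    """Predict the label whose root-to-label path has the largest summed score."""
    def score(v):
        return sum(
            sum(wi * xi for wi, xi in zip(prototypes[u], x))
            for u in path_to_root(v, parent)
        )
    return max(labels, key=score)
```

Because a label's score includes every ancestor's prototype, nearby labels share most of their score, which is what lets the hierarchy bias mistakes toward nearby labels.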
Hierarchical Classifier
• Goal: keep each prototype W_v close to that of its parent
• Define the difference vectors w_v = W_v − W_{parent(v)}
• Then W_v = Σ_{u ∈ path(v)} w_u
• Goal: keep each ‖w_v‖ small
[Figure: label tree with difference vectors w_0, …, w_10]
Online Learning
For t = 1, 2, …
  Receive instance x_t
  Predict label ŷ_t = argmax_v Σ_{u ∈ path(v)} w_u · x_t
  Receive correct label y_t
  Suffer tree-based penalty γ(y_t, ŷ_t)
  Apply update rule to obtain the new set {w_v}
Goal: suffer a small cumulative tree error Σ_t γ(y_t, ŷ_t)
Tree Loss
• The tree error γ(y_t, ŷ_t) is difficult to minimize directly
• Instead, upper-bound it by ℓ_t², where
  ℓ_t = max{0, √γ(y_t, ŷ_t) − (Σ_{u ∈ path(y_t)} w_u · x_t − Σ_{u ∈ path(ŷ_t)} w_u · x_t)}
This ℓ_t is the tree loss: whenever ŷ_t ≠ y_t the score gap is non-positive, so ℓ_t ≥ √γ(y_t, ŷ_t) and ℓ_t² ≥ γ(y_t, ŷ_t)
The Update Rule
• Add τ_t x_t to w_u for every u ∈ path(y_t) \ path(ŷ_t), and subtract τ_t x_t from w_u for every u ∈ path(ŷ_t) \ path(y_t), where τ_t is the step size obtained from the passive-aggressive optimization
• Local update: only nodes along the path from y_t to ŷ_t are updated
[Figure: label tree with the updated nodes highlighted]
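In code, the local update touches only the symmetric difference of the two paths: nodes shared by both paths would receive +τx and −τx and so are left alone. A sketch, taking τ as a given parameter (the path walk and names are mine):

```python
def hieron_update(w, x, y, y_hat, parent, tau):
    """Update difference vectors: +tau*x along path(y), -tau*x along path(y_hat).
    Nodes shared by both paths receive no net change and are skipped."""
    def path_to_root(v):
        path = [v]
        while v in parent:
            v = parent[v]
            path.append(v)
        return path
    shared = set(path_to_root(y)) & set(path_to_root(y_hat))
    for u in path_to_root(y):
        if u not in shared:
            w[u] = [wi + tau * xi for wi, xi in zip(w[u], x)]
    for u in path_to_root(y_hat):
        if u not in shared:
            w[u] = [wi - tau * xi for wi, xi in zip(w[u], x)]
    return w
```

The number of vectors updated equals the tree distance γ(y_t, ŷ_t), so updates stay cheap even in large label trees.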
Loss Bound Theorem
• Let (x_1, y_1), …, (x_T, y_T) be a sequence of examples with ‖x_t‖ ≤ R
• Assume there exists a set of vectors {u_v} that attains zero tree loss on every example
• Then the cumulative tree error satisfies Σ_t γ(y_t, ŷ_t) ≤ Σ_t ℓ_t², which is bounded in terms of R and Σ_v ‖u_v‖²
Extension: Kernels
• Since each w_v is a linear combination of the instances seen so far
• Note that predictions depend on the instances only through inner products
• Therefore each inner product can be replaced by a kernel K(·, ·)
Experiments
• Synthetic data: depth-4 tree, 121 labels; generated using an orthogonal set with Gaussian noise (variance 0.16); 100 train and 50 test instances per label
• Phoneme recognizer: 41 phonemes taken from the TIMIT corpus; MFCC+∆+∆∆ front-end, concatenation of 5 frames, RBF kernel; 2000 train and 500 test vectors per phoneme
Experiments
• Flat: ignore the hierarchy and solve as a single multiclass problem
• Greedy: solve a separate multiclass problem at each node with at least 2 children
[Figure: flat vs. greedy classifier schematics]
Results

                          Averaged tree error   Multiclass error (%)
Synthetic data (tree)            0.05                   5
Synthetic data (flat)            0.11                   8.6
Synthetic data (greedy)          0.52                  34.9
Phonemes (tree)                  1.3                   40.6
Phonemes (flat)                  1.41                  41.8
Phonemes (greedy)                2.48                  58.2
Results
• Difference between the error rates of Hieron and the flat multiclass classifier
[Figure: histograms of (Hieron error − Flat error) as a function of tree distance, for synthetic data and phonemes; Hieron makes fewer gross errors at the price of more minor errors]
Hierarchy vs. Flat
• Similarity between the learned prototypes
[Figure/animation: visualization of the prototype similarities under hierarchical vs. flat training]