A KTEC Center of Excellence 1
Pattern Analysis using Convex Optimization: Part 2 of
Chapter 7 Discussion
Presenter: Brian Quanz
A KTEC Center of Excellence 2
About today’s discussion…
• Last time: discussed convex optimization
• Today: apply what we learned to 4 pattern analysis problems given in the book:
• (1) Smallest enclosing hypersphere (one-class SVM)
• (2) SVM classification
• (3) Support vector regression (SVR)
• (4) On-line classification and regression
A KTEC Center of Excellence 3
About today’s discussion…
• This time, for the most part:
• Describe problems
• Derive solutions ourselves on the board!
• Apply convex opt. knowledge to solve
• Mostly board work today
A KTEC Center of Excellence 4
Recall: KKT Conditions
• What we will use (see the sketch below):
• Key to remember for ch. 7:
• Complementary slackness -> sparse dual representation
• Convexity -> efficient global solution
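The slide’s equations did not survive extraction; as a reminder, a standard statement of the KKT conditions for a convex problem min f(x) s.t. g_i(x) <= 0, h_j(x) = 0 is sketched here:
\[
\nabla f(x^*) + \sum_i \lambda_i \nabla g_i(x^*) + \sum_j \nu_j \nabla h_j(x^*) = 0 \quad \text{(stationarity)}
\]
\[
g_i(x^*) \le 0,\; h_j(x^*) = 0 \quad \text{(primal feasibility)}, \qquad \lambda_i \ge 0 \quad \text{(dual feasibility)}
\]
\[
\lambda_i \, g_i(x^*) = 0 \quad \text{(complementary slackness)}
\]
Complementary slackness is what forces most dual variables to zero and hence gives the sparse dual representation noted above.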
A KTEC Center of Excellence 5
Novelty Detection: Hypersphere
• Training data – learn its support
• Capture the support with a hypersphere
• Points outside – ‘novel’, ‘abnormal’, or ‘anomalous’
• Smaller sphere = more fine-tuned novelty detection
A KTEC Center of Excellence 6
1st: Smallest Enclosing Hypersphere
• Given: a training set S
• Find the center c of the smallest hypersphere containing S
A KTEC Center of Excellence 7
S.E.H. Optimization Problem
• O.P.: (sketched below)
• Let’s solve it using the Lagrangian and KKT conditions and discuss
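The optimization problem itself is not in the extracted text; a sketch of the standard smallest-enclosing-hypersphere primal, over a training set S = {x_1, …, x_ℓ} mapped by φ, is:
\[
\min_{c,\, r} \; r^2 \quad \text{s.t.} \quad \|\phi(x_i) - c\|^2 \le r^2, \quad i = 1, \dots, \ell,
\]
with Lagrangian
\[
L(c, r, \alpha) = r^2 + \sum_{i=1}^{\ell} \alpha_i \left( \|\phi(x_i) - c\|^2 - r^2 \right), \qquad \alpha_i \ge 0.
\]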
A KTEC Center of Excellence 8
Cheat
A KTEC Center of Excellence 9
S.E.H.: Solution
• H(x) = 1 if x >= 0, 0 otherwise (Heaviside step function)
• Dual = primal at the optimum (strong duality)
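The equations on this slide are missing from the extract; setting the derivatives of the Lagrangian to zero gives Σ_i α_i = 1 and c = Σ_i α_i φ(x_i), and substituting back yields the dual (a sketch):
\[
\max_{\alpha} \; \sum_i \alpha_i k(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j k(x_i, x_j)
\quad \text{s.t.} \quad \sum_i \alpha_i = 1, \; \alpha_i \ge 0,
\]
\[
f(x) = H\!\left( \|\phi(x) - c\|^2 - r^2 \right), \qquad c = \sum_i \alpha_i \phi(x_i),
\]
with r^2 recovered from any support vector (α_i > 0) via complementary slackness.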
A KTEC Center of Excellence 10
Theorem: bound on the false-positive rate
A KTEC Center of Excellence 11
Hypersphere that only contains some of the data – soft hypersphere
• Balance between missing some points and reducing the radius
• Robustness – a single outlying point could throw off the estimate
• Introduce slack variables (a repeated approach)
• Slack is 0 within the sphere, the squared distance outside
A KTEC Center of Excellence 12
Hypersphere optimization problem
• Now with a trade-off between the radius and training-point error (sketched below):
• Let’s derive the solution again
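A sketch of the soft version, with slack variables ξ_i and trade-off parameter C (the slide’s equation is not in the extract):
\[
\min_{c,\, r,\, \xi} \; r^2 + C \sum_{i=1}^{\ell} \xi_i
\quad \text{s.t.} \quad \|\phi(x_i) - c\|^2 \le r^2 + \xi_i, \; \xi_i \ge 0.
\]
The dual is the same as before except for the box constraint 0 ≤ α_i ≤ C.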
A KTEC Center of Excellence 13
Cheat
A KTEC Center of Excellence 14
Soft hypersphere solution
A KTEC Center of Excellence 15
Linear Kernel Example
A KTEC Center of Excellence 16
Similar theorem
A KTEC Center of Excellence 17
Remarks
• If the data lies in a subspace of the feature space:
• The hypersphere overestimates the support in the perpendicular directions
• Can use kernel PCA (next week’s discussion)
• If the data is normalized (k(x,x) = 1):
• Corresponds to separating the data from the origin with a hyperplane
A KTEC Center of Excellence 18
Maximal Margin Classifier
• Data and a linear classifier
• Hinge loss, margin gamma (see the sketch below)
• Linearly separable if a positive margin gamma can be achieved on all training points
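For reference (the slide’s definitions are not in the extract), the functional margin of (w, b) on example (x_i, y_i) and the hinge loss relative to a target margin γ are, in standard form:
\[
\gamma_i = y_i \left( \langle w, \phi(x_i) \rangle + b \right), \qquad
\mathcal{L}_{\text{hinge}} = \max\!\left( 0, \; \gamma - y_i \left( \langle w, \phi(x_i) \rangle + b \right) \right),
\]
and the data is linearly separable in feature space if some (w, b) achieves γ_i ≥ γ > 0 for all i.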
A KTEC Center of Excellence 19
Margin Example
A KTEC Center of Excellence 20
Typical formulation
• The typical formulation fixes the functional margin gamma to 1 and allows ||w|| to vary; since rescaling (w, b) doesn’t affect the decision boundary, the geometric margin, proportional to 1/||w||, varies instead.
• Here we instead fix ||w|| = 1 and vary the functional margin gamma
A KTEC Center of Excellence 21
Hard Margin SVM
• Arrive at the optimization problem (sketched below)
• Let’s solve it
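In the fixed-norm formulation described on the previous slide, a sketch of the hard-margin problem is:
\[
\max_{w,\, b,\, \gamma} \; \gamma
\quad \text{s.t.} \quad y_i \left( \langle w, \phi(x_i) \rangle + b \right) \ge \gamma, \quad \|w\|^2 = 1,
\]
which is equivalent (after rescaling so that the functional margin is 1) to the conventional form
\[
\min_{w,\, b} \; \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i \left( \langle w, \phi(x_i) \rangle + b \right) \ge 1.
\]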
A KTEC Center of Excellence 22
Cheat
A KTEC Center of Excellence 23
Solution
• Recall:
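The slide’s equations are missing; for the conventional (functional margin = 1) form, the dual and the resulting classifier are, as a sketch:
\[
\max_{\alpha} \; \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j k(x_i, x_j)
\quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0, \; \alpha_i \ge 0,
\]
\[
w = \sum_i \alpha_i y_i \phi(x_i), \qquad f(x) = \operatorname{sgn}\!\left( \sum_i \alpha_i y_i k(x_i, x) + b \right),
\]
with b recovered from any support vector via complementary slackness.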
A KTEC Center of Excellence 24
Example with Gaussian kernel
A KTEC Center of Excellence 25
Soft Margin Classifier
• Non-separable case – introduce slack variables as before
• Trade off with the 1-norm of the error vector (sketched below)
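A sketch of the 1-norm soft-margin primal (trading the margin against the 1-norm of the slack vector, as stated above):
\[
\min_{w,\, b,\, \xi} \; \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} \xi_i
\quad \text{s.t.} \quad y_i \left( \langle w, \phi(x_i) \rangle + b \right) \ge 1 - \xi_i, \; \xi_i \ge 0.
\]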
A KTEC Center of Excellence 26
Solve Soft Margin SVM
• Let’s solve it!
A KTEC Center of Excellence 27
Soft Margin Solution
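The solution is not in the extract; the dual is the same as the hard-margin dual except that the slack term adds a box constraint (a sketch):
\[
\max_{\alpha} \; \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j k(x_i, x_j)
\quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0, \; 0 \le \alpha_i \le C.
\]
Points with 0 < α_i < C lie exactly on the margin; α_i = C marks margin errors.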
A KTEC Center of Excellence 28
Soft Margin Example
A KTEC Center of Excellence 29
Support Vector Regression
• Similar idea to classification, except turned inside-out
• Epsilon-insensitive loss instead of hinge
• Ridge Regression: Squared-error loss
A KTEC Center of Excellence 30
Support Vector Regression
• But we want to encourage sparseness
• Need inequality constraints: epsilon-insensitive loss
A KTEC Center of Excellence 31
Epsilon-insensitive loss
• Defines a band around the function within which the loss is 0
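In symbols (the slide’s formula is not in the extract), the ε-insensitive loss is:
\[
L_{\varepsilon}(y, f(x)) = \max\!\left( 0, \; |y - f(x)| - \varepsilon \right),
\]
which is zero whenever the prediction lies within the band of half-width ε around the target.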
A KTEC Center of Excellence 32
SVR (linear epsilon-insensitive loss)
• Optimization problem (sketched below):
• Let’s solve it again
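A sketch of the standard ε-insensitive SVR primal, with slacks ξ_i for points above the band and ξ̂_i for points below it:
\[
\min_{w,\, b,\, \xi,\, \hat{\xi}} \; \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} \left( \xi_i + \hat{\xi}_i \right)
\]
\[
\text{s.t.} \quad y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i, \quad
\langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \hat{\xi}_i, \quad
\xi_i, \hat{\xi}_i \ge 0.
\]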
A KTEC Center of Excellence 33
SVR Dual and Solution
• Dual problem (sketched below)
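The dual and the resulting regression function, as a sketch consistent with the primal above (α_i for the first constraint, α̂_i for the second):
\[
\max_{\alpha,\, \hat{\alpha}} \; \sum_i y_i \left( \alpha_i - \hat{\alpha}_i \right) - \varepsilon \sum_i \left( \alpha_i + \hat{\alpha}_i \right)
- \tfrac{1}{2} \sum_{i,j} \left( \alpha_i - \hat{\alpha}_i \right)\left( \alpha_j - \hat{\alpha}_j \right) k(x_i, x_j)
\]
\[
\text{s.t.} \quad \sum_i \left( \alpha_i - \hat{\alpha}_i \right) = 0, \quad 0 \le \alpha_i, \hat{\alpha}_i \le C,
\qquad f(x) = \sum_i \left( \alpha_i - \hat{\alpha}_i \right) k(x_i, x) + b.
\]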
A KTEC Center of Excellence 34
Online
• So far batch learning: all data processed at once
• Many tasks require data to be processed one example at a time, from the start
• Learner:
• Makes a prediction
• Gets feedback (the correct value)
• Updates its hypothesis
• A conservative algorithm only updates when the loss is non-zero
A KTEC Center of Excellence 35
Simple On-line Alg.: Perceptron
• Thresholded linear function
• At step t+1 the weight vector is updated if an error is made
• Dual update rule:
• If y_i ⟨w_t, φ(x_i)⟩ <= 0: alpha_i ← alpha_i + 1
A KTEC Center of Excellence 36
Algorithm Pseudocode
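The pseudocode itself is not in the extracted text; below is a minimal, hedged sketch of the dual (kernel) perceptron with the conservative update just described. Function and variable names (kernel_perceptron, epochs, etc.) are illustrative, not taken from the slides.

import numpy as np

def kernel_perceptron(X, y, kernel, epochs=10):
    """Dual-form perceptron: alpha[i] counts the updates made on example i."""
    n = len(y)
    alpha = np.zeros(n)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])  # Gram matrix
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            # dual prediction: sum_j alpha_j * y_j * k(x_j, x_i)
            if y[i] * np.dot(alpha * y, K[:, i]) <= 0:
                alpha[i] += 1      # conservative update: only on a mistake
                mistakes += 1
        if mistakes == 0:          # no errors in a full pass: converged
            break
    return alpha

# toy usage with a linear kernel
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
alpha = kernel_perceptron(X, y, lambda a, b: float(np.dot(a, b)))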
A KTEC Center of Excellence 37
Novikoff Theorem
• Convergence bound for the hard-margin case
• If the training points are contained in a ball of radius R around the origin
• w* is the hard-margin SVM solution with no bias and geometric margin gamma
• Initial weight: (see the statement sketched below)
• Number of updates bounded by: (see the statement sketched below)
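The missing quantities are the standard ones from Novikoff’s theorem; a sketch of the statement:
\[
w_0 = 0, \qquad \text{number of updates } t \le \left( \frac{R}{\gamma} \right)^{2},
\]
assuming every training point satisfies \(\|\phi(x_i)\| \le R\) and w* is a unit-norm separator with geometric margin γ.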
A KTEC Center of Excellence 38
Proof
• From 2 inequalities (sketched below):
• Putting these together we have:
• Which leads to the bound:
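The two inequalities and the resulting bound, sketched in the standard form:
\[
\langle w^*, w_t \rangle \ge t\gamma \quad \text{(each update adds at least } \gamma\text{)}, \qquad
\|w_t\|^2 \le t R^2 \quad \text{(each update adds at most } R^2\text{)}.
\]
\[
t\gamma \le \langle w^*, w_t \rangle \le \|w^*\| \, \|w_t\| \le \sqrt{t}\, R
\;\;\Longrightarrow\;\; t \le \left( \frac{R}{\gamma} \right)^{2}.
\]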
A KTEC Center of Excellence 39
Kernel Adatron
• A simple modification of the perceptron; models the hard-margin SVM with 0 threshold
• At convergence alpha stops changing: for each point, either alpha_i is positive and the right-hand term is 0, or the right-hand term is negative and alpha_i = 0
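A sketch of the usual kernel Adatron update for point i, with learning rate η, consistent with the fixed-point remark above:
\[
\alpha_i \leftarrow \max\!\left( 0, \; \alpha_i + \eta \left( 1 - y_i \sum_j \alpha_j y_j k(x_j, x_i) \right) \right).
\]
At a fixed point, either α_i > 0 and the bracketed term is 0, or the bracketed term is negative and α_i is clipped at 0.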
A KTEC Center of Excellence 40
Kernel Adatron – Soft Margin
• 1-norm soft margin version
• Add upper bound to the values of alpha (C)
• 2-norm soft margin version
• Add constant to diagonal of kernel matrix
• SMO
• To allow a variable threshold, updates must be made on a pair of examples at once
• Results in SMO
• The rate of convergence of both algorithms is sensitive to the order of the examples
• Good heuristics exist, e.g. choose the points that most violate the conditions first
A KTEC Center of Excellence 41
On-line regression
• The approach also works for the regression case
• Basic gradient ascent with additional constraints
A KTEC Center of Excellence 42
Online SVR
A KTEC Center of Excellence 43
Questions
• Questions, Comments?