Soft Large Margin Classifiers


David Kauchak, CS 451 – Fall 2013

Admin

Assignment 5

Midterm

Friday’s class will be in MBH 632

CS lunch talk Thursday

Java tip for the data: the JVM's default heap may be too small for this dataset, so raise the maximum heap size with the -Xmx flag, e.g. -Xmx2g for a 2 GB heap.
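A typical invocation with the larger heap (the class name here is just a placeholder, not from the assignment):

    java -Xmx2g MyClassifier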

http://www.youtube.com/watch?v=u0VoFU82GSw

Large margin classifiers

The margin of a classifier is the distance from the decision boundary to the closest points of either class

Large margin classifiers attempt to maximize this

[Figure: a separating hyperplane with the margin marked on each side]
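To make this concrete: the distance from a point x to the hyperplane wx + b = 0 is |wx + b| / ||w||. If w and b are scaled so that the closest points satisfy yi(wxi + b) = 1, the margin is 1/||w||, and maximizing the margin is equivalent to minimizing ||w||.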

Support vector machine problem

minimize: ½ ||w||²

subject to: yi(wxi + b) ≥ 1 for all i

This is a quadratic optimization problem:

Maximize/minimize a quadratic function

Subject to a set of linear constraints

Many, many variants of solving this problem (we’ll see one in a bit)

Soft Margin Classification

What about a problem like this, where the two classes are not linearly separable?

minimize: ½ ||w||²

subject to: yi(wxi + b) ≥ 1 for all i

We'd like to learn a hyperplane that separates the data as well as possible, but our constraints won't allow it: every example must lie on the correct side of the margin, and here no hyperplane achieves that.

Slack variables

minimize: ½ ||w||² + C Σi ξi

subject to: yi(wxi + b) ≥ 1 − ξi and ξi ≥ 0 for all i

The ξi are slack variables, one for each example. What effect does this have? Each example is now allowed to make a mistake: the constraint for xi can be satisfied even when yi(wxi + b) < 1 by taking ξi > 0. In exchange, the C Σi ξi term (the slack penalty) charges each example by how far it is from "correct", so the objective trades off margin maximization against penalization.

Soft margin SVM

minimize: ½ ||w||² + C Σi ξi

subject to: yi(wxi + b) ≥ 1 − ξi and ξi ≥ 0 for all i

Still a quadratic optimization problem! The objective is quadratic and the constraints are linear.

Demo

http://cs.stanford.edu/people/karpathy/svmjs/demo/

Solving the SVM problem

Understanding the Soft Margin SVM

minimize: ½ ||w||² + C Σi ξi

subject to: yi(wxi + b) ≥ 1 − ξi and ξi ≥ 0 for all i

Given the optimal solution w, b: can we figure out what the slack penalties are for each point?

Understanding the Soft Margin SVM

What do the margin lines represent with respect to w and b? They are the lines where wx + b = 1 and wx + b = −1, one on either side of the decision boundary wx + b = 0.

Understanding the Soft Margin SVM

What are the slack values for points outside (or on) the margin AND correctly classified?

0! The slack variables have to be greater than or equal to zero, and for points on or beyond the margin yi(wxi + b) ≥ 1 already holds, so no slack is needed.

Understanding the Soft Margin SVM

What are the slack values for points inside the margin AND classified correctly?

The difference from the point to the margin. Which is? ξi = 1 − yi(wxi + b): here 0 ≤ yi(wxi + b) < 1, so exactly enough slack is added to satisfy the constraint, and 0 < ξi ≤ 1.

Understanding the Soft Margin SVM

What are the slack values for points that are incorrectly classified?

The "distance" to the hyperplane plus the "distance" to the margin. For a misclassified point, yi(wxi + b) < 0, so −yi(wxi + b) measures how far the point sits on the wrong side of the hyperplane (hence the minus sign), and the "distance" from the hyperplane to the margin is 1. Adding the two gives

ξi = 1 − yi(wxi + b)

the same expression as before, only now ξi > 1.
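For example (the numbers here are mine, not from the slides): if yi(wxi + b) = −0.5 for a misclassified point, its slack is ξi = 1 − (−0.5) = 1.5: a "distance" of 0.5 past the hyperplane plus 1 more to reach the margin.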

Understanding the Soft Margin SVM

Putting the three cases together, the optimal slack for every example is

ξi = max(0, 1 − yi(wxi + b))
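As a sanity check, here is a small Java sketch (the helper name is mine) that computes this slack for one example given the learned w and b:

    // Slack for one example given learned weights w and bias b.
    // Outside/on the margin and correct -> 0; inside the margin -> (0, 1];
    // misclassified -> greater than 1.
    static double slack(double[] w, double b, double[] x, double y) {
        double score = b;
        for (int d = 0; d < w.length; d++) {
            score += w[d] * x[d];
        }
        return Math.max(0, 1 - y * score);
    }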

Understanding the Soft Margin SVM

Does this look familiar?

Hinge loss!

0/1 loss: l(y, y') = 1 if yy' ≤ 0, otherwise 0

Hinge: l(y, y') = max(0, 1 − yy')

Exponential: l(y, y') = exp(−yy')

Squared loss: l(y, y') = (y − y')²

(writing y' = wx + b for the model's prediction)
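For concreteness, the four losses as one-line Java helpers, each written as a function of z = y(wx + b) (the names are mine; the squared form assumes y is +1 or −1):

    // Each loss as a function of z = y * (w.x + b).
    static double zeroOneLoss(double z)     { return z <= 0 ? 1 : 0; }
    static double hingeLoss(double z)       { return Math.max(0, 1 - z); }
    static double exponentialLoss(double z) { return Math.exp(-z); }
    static double squaredLoss(double z)     { return (1 - z) * (1 - z); } // (y - y')^2 when y is +1 or -1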

Understanding the Soft Margin SVM

Substituting the optimal slack values into the objective:

minimize: ½ ||w||² + C Σi max(0, 1 − yi(wxi + b))

Do we still need the constraints? No: max(0, ·) makes ξi ≥ 0 automatic, and the margin constraint is satisfied by construction. Unconstrained problem!

Understanding the Soft Margin SVM

Does this look like something we’ve seen before?

Gradient descent problem!

Soft margin SVM as gradient descent

minimize: ½ ||w||² + C Σi max(0, 1 − yi(wxi + b))

Multiply through by 1/C and rearrange, letting λ = 1/C:

minimize: Σi max(0, 1 − yi(wxi + b)) + (λ/2) ||w||²

What type of gradient descent problem is this?

Soft margin SVM as gradient descent

Σi max(0, 1 − yi(wxi + b)) is the hinge loss; (λ/2) ||w||² is L2 regularization.

One way to solve the soft margin SVM problem is using gradient descent on this regularized loss.

Gradient descent SVM solver

pick a starting point (w)
repeat until the loss doesn't decrease in any dimension:
    pick a dimension
    move a small amount in that dimension towards decreasing loss (using the derivative)

Applied to hinge loss + L2 regularization, this finds the largest margin hyperplane while allowing for a soft margin.
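A minimal Java sketch of this solver (the class name, toy data, and hyperparameters are mine, not from the course; for simplicity it updates all dimensions at once using the full subgradient, rather than one dimension at a time as in the pseudocode above):

    public class SVMGradientDescent {
        public static void main(String[] args) {
            // Toy 2D data; labels must be +1 or -1.
            double[][] x = {{2, 2}, {3, 1}, {-1, -2}, {-2, -1}};
            double[] y = {1, 1, -1, -1};

            double[] w = new double[2]; // starting point: w = (0, 0)
            double b = 0;
            double lambda = 0.01;       // regularization weight (lambda = 1/C)
            double eta = 0.01;          // step size

            for (int iter = 0; iter < 1000; iter++) {
                // Subgradient of: sum_i max(0, 1 - yi(w.xi + b)) + (lambda/2)||w||^2
                double[] gradW = {lambda * w[0], lambda * w[1]};
                double gradB = 0;
                for (int i = 0; i < x.length; i++) {
                    double score = w[0] * x[i][0] + w[1] * x[i][1] + b;
                    if (y[i] * score < 1) { // hinge active: inside the margin or misclassified
                        gradW[0] -= y[i] * x[i][0];
                        gradW[1] -= y[i] * x[i][1];
                        gradB -= y[i];
                    }
                }
                // Move a small amount in the direction that decreases the loss.
                w[0] -= eta * gradW[0];
                w[1] -= eta * gradW[1];
                b -= eta * gradB;
            }
            System.out.printf("w = (%.3f, %.3f), b = %.3f%n", w[0], w[1], b);
        }
    }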

Support vector machines

One of the most successful (if not the most successful) classification approaches:

Support vector machine

perceptron algorithm

k nearest neighbor

decision tree

Trends over time
