38
SOFT LARGE MARGIN CLASSIFIERS David Kauchak CS 451 – Fall 2013

Soft Large Margin classifiers

  • Upload
    trung

  • View
    41

  • Download
    0

Embed Size (px)

DESCRIPTION

Soft Large Margin classifiers. David Kauchak CS 451 – Fall 2013. Admin. Assignment 5 Midterm Friday’s class will be in MBH 632 CS lunch talk Thursday. Java tips for the data. - Xmx -Xmx2g http:// www.youtube.com / watch?v =u0VoFU82GSw. Large margin classifiers. margin. margin. - PowerPoint PPT Presentation

Citation preview

Page 1: Soft Large Margin classifiers

SOFT LARGE MARGIN CLASSIFIERS

David KauchakCS 451 – Fall 2013

Page 2: Soft Large Margin classifiers

Admin

Assignment 5

Midterm

Friday’s class will be in MBH 632

CS lunch talk Thursday

Page 3: Soft Large Margin classifiers

Java tips for the data-Xmx

-Xmx2g

http://www.youtube.com/watch?v=u0VoFU82GSw

Page 4: Soft Large Margin classifiers

Large margin classifiers

The margin of a classifier is the distance to the closest points of either class

Large margin classifiers attempt to maximize this

margin margin

Page 5: Soft Large Margin classifiers

Support vector machine problem

subject to:

This is a a quadratic optimization problem

Maximize/minimize a quadratic function

Subject to a set of linear constraints

Many, many variants of solving this problem (we’ll see one in a bit)

Page 6: Soft Large Margin classifiers

Soft Margin Classification

What about this problem?

subject to:

Page 7: Soft Large Margin classifiers

Soft Margin Classification

subject to:

We’d like to learn something like this, but our constraints won’t allow it

Page 8: Soft Large Margin classifiers

Slack variables

subject to:

subject to:slack variables (one for each example)

What effect does this have?

Page 9: Soft Large Margin classifiers

Slack variables

subject to:

slack penalties

Page 10: Soft Large Margin classifiers

Slack variables

subject to:allowed to make a mistake

penalized by how far from “correct”

trade-off between margin maximization and penalization

margin

Page 11: Soft Large Margin classifiers

Soft margin SVM

Still a quadratic optimization problem!

subject to:

Page 12: Soft Large Margin classifiers

Demo

http://cs.stanford.edu/people/karpathy/svmjs/demo/

Page 13: Soft Large Margin classifiers

Solving the SVM problem

Page 14: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

Given the optimal solution, w, b:

Can we figure out what the slack penalties are for each point?

Page 15: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

What do the margin linesrepresent wrt w,b?

Page 16: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

Or:

Page 17: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

What are the slack values for points outside (or on) the margin AND correctly classified?

Page 18: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

0! The slack variables have to be greater than or equal to zero and if they’re on or beyond the margin then yi(wxi+b) ≥ 1 already

Page 19: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

What are the slack values for points inside the margin AND classified correctly?

Page 20: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

Difference from point to the margin. Which is?

Page 21: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

What are the slack values for points that are incorrectly classified?

Page 22: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

Which is?

Page 23: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

“distance” to the hyperplane plus the “distance” to the margin?

Page 24: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

“distance” to the hyperplane plus the “distance” to the margin

Why -?

Page 25: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

“distance” to the hyperplane plus the “distance” to the margin

?

Page 26: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

“distance” to the hyperplane plus the “distance” to the margin

1

Page 27: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

“distance” to the hyperplane plus the “distance” to the margin

Page 28: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

Page 29: Soft Large Margin classifiers

Understanding the Soft Margin SVM

Does this look familiar?

Page 30: Soft Large Margin classifiers

Hinge loss!

0/1 loss:

Hinge:

Exponential:

Squared loss:

Page 31: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

Do we need the constraints still?

Page 32: Soft Large Margin classifiers

Understanding the Soft Margin SVM

subject to:

Unconstrained problem!

Page 33: Soft Large Margin classifiers

Understanding the Soft Margin SVM

Does this look like something we’ve seen before?

Gradient descent problem!

Page 34: Soft Large Margin classifiers

Soft margin SVM as gradient descent

multiply through by 1/Cand rearrange

let λ=1/C

What type of gradient descent problem?

Page 35: Soft Large Margin classifiers

Soft margin SVM as gradient descent

One way to solve the soft margin SVM problem is using gradient descent

hinge loss L2 regularization

Page 36: Soft Large Margin classifiers

Gradient descent SVM solver

pick a starting point (w) repeat until loss doesn’t decrease in all

dimensions: pick a dimension move a small amount in that dimension towards

decreasing loss (using the derivative)

hinge loss L2 regularizationFinds the largest margin hyperplane while allowing for a soft margin

Page 37: Soft Large Margin classifiers

Support vector machinesOne of the most successful (if not the most successful) classification approach:

Support vector machine

perceptron algorithm

k nearest neighbor

decision tree

Page 38: Soft Large Margin classifiers

Trends over time