
Transcript
Page 1: SVMs in a Nutshell

SVMs in a Nutshell

Page 2: SVMs in a Nutshell

What is an SVM?

• Support Vector Machine

– More accurately called a support vector classifier

– Separates training data into two classes so that they are maximally far apart

Page 3: SVMs in a Nutshell

Simpler version

• Suppose the data is linearly separable

• Then we could draw a line between the two classes

Page 4: SVMs in a Nutshell

Simpler version

• But what is the best line? In SVM, we’ll use the maximum margin hyperplane
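
As a concrete sketch (using scikit-learn's SVC, which is one common implementation; the toy points below are invented for illustration), the maximum margin hyperplane can be fit like this:

    import numpy as np
    from sklearn.svm import SVC

    # Two linearly separable classes in 2D (toy data for illustration)
    X = np.array([[1.0, 1.0], [2.0, 1.0], [1.5, 2.0],   # class -1
                  [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])  # class +1
    c = np.array([-1, -1, -1, 1, 1, 1])

    # A linear kernel with a very large C approximates a hard margin
    clf = SVC(kernel="linear", C=1e6)
    clf.fit(X, c)

    print("w =", clf.coef_[0])        # normal vector of the hyperplane
    print("b =", clf.intercept_[0])   # offset (sklearn writes w·x + b = 0)
    print("support vectors:\n", clf.support_vectors_)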

Page 5: SVMs in a Nutshell

Maximum Margin Hyperplane

Page 6: SVMs in a Nutshell

What if it’s non-linear?

Page 7: SVMs in a Nutshell

Higher dimensions

• SVM uses a kernel function to map the data into a different space where it can be separated
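
For instance (a minimal sketch; scikit-learn's make_circles generator and the RBF kernel choice are assumptions, not named on the slide), an RBF kernel separates data that no straight line can:

    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    # Concentric circles: not linearly separable in the original 2D space
    X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

    # The RBF kernel implicitly maps the data into a space where a
    # linear separator exists
    clf = SVC(kernel="rbf", gamma=1.0)
    clf.fit(X, y)
    print("training accuracy:", clf.score(X, y))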

Page 8: SVMs in a Nutshell

What if it’s not separable?

• Use linear separation, but allow training errors

• This is called using a “soft margin”

• Higher cost for errors = a more accurate model, but it may not generalize

• Choice of parameters (kernel and cost) determines accuracy of SVM model

• To avoid over- or under-fitting, use cross-validation to choose parameters
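
A sketch of that parameter search with scikit-learn's GridSearchCV (the grid values are illustrative choices, not prescribed by the slide):

    from sklearn.datasets import make_circles
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)

    # Cross-validate over the error penalty C and the RBF kernel width gamma
    param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X, y)

    print("best parameters:", search.best_params_)
    print("cross-validated accuracy:", search.best_score_)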

Page 9: SVMs in a Nutshell

Some math

• Data: {(x1, c1), (x2, c2), …, (xn, cn)}

• xi is a vector of attributes/features, scaled

• ci is the class of the vector (-1 or +1)

• Dividing hyperplane: w·x - b = 0

• Linearly separable means there exists a hyperplane such that w·xi - b > 0 for positive examples and w·xi - b < 0 for negative examples

• w points perpendicular to the hyperplane
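
In code, that decision rule is just a sign check (pure-NumPy sketch; the w and b values here are made up):

    import numpy as np

    w = np.array([1.0, 1.0])   # hypothetical normal vector
    b = 3.0                    # hypothetical offset

    def classify(x):
        # Positive side of the hyperplane w·x - b = 0 means class +1
        return 1 if np.dot(w, x) - b > 0 else -1

    print(classify(np.array([2.5, 2.0])))   # w·x - b =  1.5 -> +1
    print(classify(np.array([1.0, 1.0])))   # w·x - b = -1.0 -> -1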

Page 10: SVMs in a Nutshell

More math

• Dividing hyperplane: w·x - b = 0

• Margin hyperplanes through the support vectors: w·x - b = 1 and w·x - b = -1

• Distance between the margin hyperplanes is 2/|w|, so minimize |w|
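
Given a fitted linear SVC (a sketch; note that scikit-learn writes the hyperplane as w·x + b = 0 with b in intercept_, a sign convention differing from these slides), the margin width 2/|w| can be read off the model:

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 4.5]])
    c = np.array([-1, -1, 1, 1])

    clf = SVC(kernel="linear", C=1e6).fit(X, c)

    w = clf.coef_[0]
    margin = 2.0 / np.linalg.norm(w)   # distance between the margin hyperplanes
    print("margin width:", margin)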

Page 11: SVMs in a Nutshell

More math

• For all i, either w·xi - b ≥ 1 or w·xi - b ≤ -1

• Can be rewritten: ci(w·xi - b) ≥ 1

• Minimize (1/2)|w|² subject to ci(w·xi - b) ≥ 1

• This is a quadratic programming problem and can be solved in polynomial time
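
A minimal sketch of that optimization with scipy.optimize.minimize (SLSQP is a general constrained solver standing in for the specialized QP solvers real SVM libraries use; the toy points are invented):

    import numpy as np
    from scipy.optimize import minimize

    # Tiny linearly separable dataset (for illustration)
    X = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 4.5]])
    c = np.array([-1.0, -1.0, 1.0, 1.0])

    # Variables packed as z = [w1, w2, b]
    def objective(z):
        w = z[:2]
        return 0.5 * np.dot(w, w)   # (1/2)|w|²

    # One inequality per training point: ci(w·xi - b) - 1 >= 0
    constraints = [
        {"type": "ineq",
         "fun": lambda z, i=i: c[i] * (np.dot(z[:2], X[i]) - z[2]) - 1.0}
        for i in range(len(X))
    ]

    res = minimize(objective, x0=np.zeros(3), constraints=constraints)
    print("w =", res.x[:2], " b =", res.x[2])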

Page 12: SVMs in a Nutshell

A few more details

• So far, assumed linearly separable

– To get to higher dimensions, use a kernel function instead of the dot product; may be a nonlinear transform

– Radial Basis Function is a commonly used kernel (sketched in code below): k(x, x’) = exp(-γ||x - x’||²) [need to choose γ]

• So far, no errors; soft margin:

– Minimize (1/2)|w|² + C Σi ξi

– Subject to ci(w·xi - b) ≥ 1 - ξi, with slack variables ξi ≥ 0

– C is the error penalty
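
A sketch of the RBF kernel itself in NumPy (the gamma value is just an example of the choice the slide mentions):

    import numpy as np

    def rbf_kernel(x, x_prime, gamma=0.5):
        # k(x, x') = exp(-gamma * ||x - x'||²)
        return np.exp(-gamma * np.sum((x - x_prime) ** 2))

    a = np.array([1.0, 2.0])
    b = np.array([2.0, 0.0])
    print(rbf_kernel(a, a))   # 1.0: a point is maximally similar to itself
    print(rbf_kernel(a, b))   # decays toward 0 as points move apart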