Slide credits to Andrew Moore: http://www.cs.cmu.edu/~awm/tutorials
Slide 1: Support Vector Machine
Foundations of Data Analysis, 04/14/2020
Slide 2: Linear Classifiers
(Figure: 2-D data; one marker denotes +1, the other denotes -1.)
How would you classify this data?
f(x, w) = sign(w · x)
Slides 3-6: Linear Classifiers
(Figures: the same data with a different candidate separating line on each slide, each of the form f(x, w) = sign(w · x).)
How would you classify this data?
Slide 7: Linear Classifiers
(Figure: the same data with several candidate separating lines f(x, w) = sign(w · x).)
Which is the best?
Slide 8: Linear Classifiers
(Figure: a separating line with its margin shaded; one marker denotes +1, the other denotes -1.)
Define the margin of a linear classifier f(x, w) = sign(w · x) as the width by which the boundary could be increased before hitting a data point.
Slide 9: Maximum Margin
(Figure: the same data with the widest-margin separating line.)
The maximum-margin classifier f(x, w) = sign(w · x) is the linear classifier whose margin between the two classes is as wide as possible.
Slide 10: Support Vector Machines
The maximum-margin linear classifier f(x, w) = sign(w · x) is the simplest kind of SVM (a linear SVM).
Support vectors: the data points that the margin pushes up against.
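The maximum-margin idea can be made concrete with a small sketch (assuming scikit-learn is available; its SVC solves a soft-margin problem, so a large C approximates the hard margin described here, and the data below is made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable toy classes in 2-D.
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],        # class +1
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)  # very large C approximates the hard margin
clf.fit(X, y)

print(clf.support_vectors_)            # the points the margin pushes up against
print(clf.predict([[3.0, 3.0]]))       # a point deep in the positive zone -> +1
```

Only a few of the training points end up as support vectors; the rest could be deleted without changing the learned boundary.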
Slide 11: Specifying a Line and Margin
(Figure: plus-plane, classifier boundary, and minus-plane, with a "Predict Class = +1" zone above and a "Predict Class = -1" zone below.)
• How do we represent this mathematically?
• ...in m input dimensions?
Slide 12: Specifying a Line and Margin
(Figure: plus-plane, classifier boundary, and minus-plane, separating the "Predict Class = +1" and "Predict Class = -1" zones.)
Classify as:
• +1 if w · x >= 1
• -1 if w · x <= -1
• undefined ("embarrassing" points) if -1 < w · x < 1
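The three-zone rule above can be sketched directly (the weight vector w here is arbitrary, chosen only for illustration):

```python
import numpy as np

def classify(x, w):
    """Three-zone rule: +1 beyond the plus-plane, -1 beyond the
    minus-plane, 0 for 'embarrassing' points inside the margin."""
    s = np.dot(w, x)
    if s >= 1:
        return +1      # on or beyond the plus-plane  (w · x >= 1)
    if s <= -1:
        return -1      # on or beyond the minus-plane (w · x <= -1)
    return 0           # inside the margin            (-1 < w · x < 1)

w = np.array([2.0, 0.0])                    # hypothetical weight vector
print(classify(np.array([1.0, 0.0]), w))    # w · x = 2.0  -> +1
print(classify(np.array([-1.0, 0.0]), w))   # w · x = -2.0 -> -1
print(classify(np.array([0.1, 0.0]), w))    # w · x = 0.2  -> 0
```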
Slide 13: Computing the Margin Width
(Figure: planes w · x = 1, w · x = 0, and w · x = -1, with the "Predict Class = +1" and "Predict Class = -1" zones; M = margin width.)
How do we compute M?
Slide 14: Computing the Margin Width
(Figure: the same planes, with a point x- on the minus-plane and a point x+ on the plus-plane; M = margin width.)
• Let x- be any point on the minus-plane.
• Let x+ be the closest plus-plane point to x-.
• Claim: x+ = x- + λw for some value of λ. Why?
Slide 15: Computing the Margin Width
The line from x- to x+ is perpendicular to the planes, so to get from x- to x+ we travel some distance in the direction of w.
Slide 16: Computing the Margin Width
What we know:
• w · x+ = +1
• w · x- = -1
• x+ = x- + λw
• |x+ - x-| = M
Slide 17: Computing the Margin Width
Substituting x+ = x- + λw into w · x+ = 1:
w · (x- + λw) = 1
=> w · x- + λ w · w = 1
=> -1 + λ w · w = 1
=> λ = 2 / (w · w)
Slide 18: Computing the Margin Width
M = |x+ - x-| = |λw| = λ|w| = λ sqrt(w · w) = 2 sqrt(w · w) / (w · w) = 2 / sqrt(w · w)
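A quick numeric check of this derivation (the weight vector w is chosen arbitrarily for illustration):

```python
import numpy as np

w = np.array([3.0, 4.0])                 # w · w = 25, |w| = 5
x_minus = -w / np.dot(w, w)              # a point satisfying w · x- = -1
lam = 2.0 / np.dot(w, w)                 # λ = 2 / (w · w) from the derivation

x_plus = x_minus + lam * w               # x+ = x- + λw
print(np.dot(w, x_plus))                 # lands on the plus-plane: w · x+ = 1

M = np.linalg.norm(x_plus - x_minus)     # |x+ - x-|
print(M, 2.0 / np.sqrt(np.dot(w, w)))    # both equal 2 / sqrt(w · w) = 0.4
```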
Slide 19: Learning the Maximum Margin Classifier
(Figure: planes w · x = 1, w · x = 0, and w · x = -1, with x+ and x- marked.)
M = margin width = 2 / sqrt(w · w)
Use optimization to search the space of w for the weight vector that classifies every data point correctly with the widest margin.
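In practice that search is a convex optimization handled by a solver. A sketch using scikit-learn (whose linear SVC learns w · x + b and solves a soft-margin problem, so a large C approximates the hard margin; the data is made up so the answer can be checked by eye):

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data: the closest opposing points are (2, 0) and (-2, 0),
# so the widest possible margin is the gap of width 4 between them.
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ≈ hard margin
w = clf.coef_[0]                              # learned weight vector
margin = 2.0 / np.sqrt(np.dot(w, w))          # M = 2 / sqrt(w · w)
print(margin)                                 # ≈ 4.0
```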
Slide 20: Simple 1-D Example
(Figure: two classes of points on a number line around x = 0.)
What would SVMs do?
Slide 21: Simple 1-D Example
(Figure: the same data around x = 0 with the positive "plane", negative "plane", and boundary marked.)
Not a big surprise.
Slide 22: Harder 1-D Example
(Figure: 1-D data around x = 0 where the two classes are interleaved, so no single threshold separates them.)
What can SVM do about this?
Slide 23: Harder 1-D Example
• Permitting non-linear basis functions made linear regression more flexible.
• Let's permit them here too:
z_k = (x_k, x_k²)
(Figure: the same data around x = 0, mapped into 2-D where it is linearly separable.)
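A sketch of why this mapping helps: 1-D data with one class surrounding the other has no separating threshold, but in (x, x²)-space a horizontal line separates it (the data and the threshold 2.5 are made up for illustration):

```python
import numpy as np

x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([-1, -1, 1, 1, -1, -1])    # the +1 class sits in the middle

z = np.column_stack([x, x**2])          # z_k = (x_k, x_k²)

# In z-space the +1 points have x² <= 1 and the -1 points have x² >= 4,
# so the horizontal line x² = 2.5 separates the classes perfectly:
pred = np.where(z[:, 1] < 2.5, 1, -1)
print(np.array_equal(pred, y))          # -> True
```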
Slide 24: Nonlinear SVMs: Feature Space
• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:
Φ: x → φ(x)
Slide 25: Nonlinear SVM Basis Functions
• Φ(x_k) = (polynomial terms of x_k of degree 1 to q)
• Φ(x_k) = (Gaussian radial basis functions of x_k)
• Φ(x_k) = (sigmoid functions of x_k)
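These three families are what kernel SVM libraries typically expose. A sketch with scikit-learn's SVC, which supports "poly", "rbf", and "sigmoid" kernels (the ring-shaped dataset is generated here purely for illustration):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original space.
X, y = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)

scores = {}
for kernel in ["poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel).fit(X, y)   # kernel applies the basis implicitly
    scores[kernel] = clf.score(X, y)
    print(kernel, scores[kernel])
```

The Gaussian (RBF) kernel handles the rings essentially perfectly, which is why it is the usual default choice.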