Slide credits to Andrew Moore: http://www.cs.cmu.edu/~awm/tutorials
Slide 1: Support Vector Machine
Foundations of Data Analysis, 04/14/2020
Slide 2: Linear Classifiers
(Figure: 2-D data; one marker denotes +1, the other denotes -1.)
How would you classify this data?
f(x, w) = sign(w · x)
Slides 3-6: Linear Classifiers
(Figures: the same data with a different candidate separating line on each slide, each of the form f(x, w) = sign(w · x).)
How would you classify this data?
Slide 7: Linear Classifiers
(Figure: the same data with several candidate separating lines f(x, w) = sign(w · x).)
Which is the best?
Slide 8: Linear Classifiers
(Figure: a separating line with its margin shaded; one marker denotes +1, the other denotes -1.)
Define the margin of a linear classifier f(x, w) = sign(w · x) as the width by which the boundary could be increased before hitting a data point.
Slide 9: Maximum Margin
(Figure: the same data with the widest-margin separating line.)
The maximum-margin classifier f(x, w) = sign(w · x) is the linear classifier whose margin between the two classes is as wide as possible.
Slide 10: Support Vector Machines
The maximum-margin linear classifier f(x, w) = sign(w · x) is the simplest kind of SVM (a linear SVM).
Support vectors: the data points that the margin pushes up against.
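The maximum-margin idea can be made concrete with a small sketch (assuming scikit-learn is available; its SVC solves a soft-margin problem, so a large C approximates the hard margin described here, and the data below is made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable toy classes in 2-D.
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],        # class +1
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)  # very large C approximates the hard margin
clf.fit(X, y)

print(clf.support_vectors_)            # the points the margin pushes up against
print(clf.predict([[3.0, 3.0]]))       # a point deep in the positive zone -> +1
```

Only a few of the training points end up as support vectors; the rest could be deleted without changing the learned boundary.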
Slide 11: Specifying a Line and Margin
(Figure: plus-plane, classifier boundary, and minus-plane, with a "Predict Class = +1" zone above and a "Predict Class = -1" zone below.)
• How do we represent this mathematically?
• ...in m input dimensions?
Slide 12: Specifying a Line and Margin
(Figure: plus-plane, classifier boundary, and minus-plane, separating the "Predict Class = +1" and "Predict Class = -1" zones.)
Classify as:
• +1 if w · x >= 1
• -1 if w · x <= -1
• undefined ("embarrassing" points) if -1 < w · x < 1
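The three-zone rule above can be sketched directly (the weight vector w here is arbitrary, chosen only for illustration):

```python
import numpy as np

def classify(x, w):
    """Three-zone rule: +1 beyond the plus-plane, -1 beyond the
    minus-plane, 0 for 'embarrassing' points inside the margin."""
    s = np.dot(w, x)
    if s >= 1:
        return +1      # on or beyond the plus-plane  (w · x >= 1)
    if s <= -1:
        return -1      # on or beyond the minus-plane (w · x <= -1)
    return 0           # inside the margin            (-1 < w · x < 1)

w = np.array([2.0, 0.0])                    # hypothetical weight vector
print(classify(np.array([1.0, 0.0]), w))    # w · x = 2.0  -> +1
print(classify(np.array([-1.0, 0.0]), w))   # w · x = -2.0 -> -1
print(classify(np.array([0.1, 0.0]), w))    # w · x = 0.2  -> 0
```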
Slide 13: Computing the Margin Width
(Figure: planes w · x = 1, w · x = 0, and w · x = -1, with the "Predict Class = +1" and "Predict Class = -1" zones; M = margin width.)
How do we compute M?
Slide 14: Computing the Margin Width
(Figure: the same planes, with a point x- on the minus-plane and a point x+ on the plus-plane; M = margin width.)
• Let x- be any point on the minus-plane.
• Let x+ be the closest plus-plane point to x-.
• Claim: x+ = x- + λw for some value of λ. Why?
Slide 15: Computing the Margin Width
The line from x- to x+ is perpendicular to the planes, so to get from x- to x+ we travel some distance in the direction of w.
Slide 16: Computing the Margin Width
What we know:
• w · x+ = +1
• w · x- = -1
• x+ = x- + λw
• |x+ - x-| = M
Slide 17: Computing the Margin Width
Substituting x+ = x- + λw into w · x+ = 1:
w · (x- + λw) = 1
=> w · x- + λ w · w = 1
=> -1 + λ w · w = 1
=> λ = 2 / (w · w)
Slide 18: Computing the Margin Width
M = |x+ - x-| = |λw| = λ|w| = λ sqrt(w · w) = 2 sqrt(w · w) / (w · w) = 2 / sqrt(w · w)
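A quick numeric check of this derivation (the weight vector w is chosen arbitrarily for illustration):

```python
import numpy as np

w = np.array([3.0, 4.0])                 # w · w = 25, |w| = 5
x_minus = -w / np.dot(w, w)              # a point satisfying w · x- = -1
lam = 2.0 / np.dot(w, w)                 # λ = 2 / (w · w) from the derivation

x_plus = x_minus + lam * w               # x+ = x- + λw
print(np.dot(w, x_plus))                 # lands on the plus-plane: w · x+ = 1

M = np.linalg.norm(x_plus - x_minus)     # |x+ - x-|
print(M, 2.0 / np.sqrt(np.dot(w, w)))    # both equal 2 / sqrt(w · w) = 0.4
```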
Slide 19: Learning the Maximum Margin Classifier
(Figure: planes w · x = 1, w · x = 0, and w · x = -1, with x+ and x- marked.)
M = margin width = 2 / sqrt(w · w)
Use optimization to search the space of w for the weight vector that classifies every data point correctly with the widest margin.
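In practice that search is a convex optimization handled by a solver. A sketch using scikit-learn (whose linear SVC learns w · x + b and solves a soft-margin problem, so a large C approximates the hard margin; the data is made up so the answer can be checked by eye):

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data: the closest opposing points are (2, 0) and (-2, 0),
# so the widest possible margin is the gap of width 4 between them.
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ≈ hard margin
w = clf.coef_[0]                              # learned weight vector
margin = 2.0 / np.sqrt(np.dot(w, w))          # M = 2 / sqrt(w · w)
print(margin)                                 # ≈ 4.0
```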
Slide 20: Simple 1-D Example
(Figure: two classes of points on a number line around x = 0.)
What would SVMs do?
Slide 21: Simple 1-D Example
(Figure: the same data around x = 0 with the positive "plane", negative "plane", and boundary marked.)
Not a big surprise.
Slide 22: Harder 1-D Example
(Figure: 1-D data around x = 0 where the two classes are interleaved, so no single threshold separates them.)
What can SVM do about this?
Slide 23: Harder 1-D Example
• Permitting non-linear basis functions made linear regression more flexible.
• Let's permit them here too:
z_k = (x_k, x_k²)
(Figure: the same data around x = 0, mapped into 2-D where it is linearly separable.)
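A sketch of why this mapping helps: 1-D data with one class surrounding the other has no separating threshold, but in (x, x²)-space a horizontal line separates it (the data and the threshold 2.5 are made up for illustration):

```python
import numpy as np

x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([-1, -1, 1, 1, -1, -1])    # the +1 class sits in the middle

z = np.column_stack([x, x**2])          # z_k = (x_k, x_k²)

# In z-space the +1 points have x² <= 1 and the -1 points have x² >= 4,
# so the horizontal line x² = 2.5 separates the classes perfectly:
pred = np.where(z[:, 1] < 2.5, 1, -1)
print(np.array_equal(pred, y))          # -> True
```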
Slide 24: Nonlinear SVMs: Feature Space
• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:
Φ: x → φ(x)
Slide 25: Nonlinear SVM Basis Functions
• Φ(x_k) = (polynomial terms of x_k of degree 1 to q)
• Φ(x_k) = (Gaussian radial basis functions of x_k)
• Φ(x_k) = (sigmoid functions of x_k)
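These three families are what kernel SVM libraries typically expose. A sketch with scikit-learn's SVC, which supports "poly", "rbf", and "sigmoid" kernels (the ring-shaped dataset is generated here purely for illustration):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original space.
X, y = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)

scores = {}
for kernel in ["poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel).fit(X, y)   # kernel applies the basis implicitly
    scores[kernel] = clf.score(X, y)
    print(kernel, scores[kernel])
```

The Gaussian (RBF) kernel handles the rings essentially perfectly, which is why it is the usual default choice.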