Classification IV
Lecturer: Dr. Bo Yuan
E-mail: yuanb@sz.tsinghua.edu.cn
Overview
Support Vector Machines
Linear Classifier
The decision hyperplane: w·x + b = 0, with w·x + b > 0 on one side and w·x + b < 0 on the other.
f(x; w, b) = sign(g(x)) = sign(w·x + b), where w·x = Σ_{i=1}^{n} wᵢxᵢ
Just in case: for any two points x₁, x₂ on the hyperplane, w·x₁ + b = w·x₂ + b = 0, so w·(x₁ − x₂) = 0, i.e., w is normal to the hyperplane.
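The decision rule above can be sketched in a few lines of Python (the hyperplane w, b and the sample points here are illustrative values, not data from the slides):

```python
def g(w, x, b):
    """Signed score of the linear classifier: g(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, x, b):
    """f(x; w, b) = sign(g(x)); returns +1 or -1."""
    return 1 if g(w, x, b) > 0 else -1

w, b = [1.0, 1.0], -1.0              # example hyperplane: x1 + x2 - 1 = 0
print(classify(w, [2.0, 2.0], b))    # point on the positive side -> 1
print(classify(w, [0.0, 0.0], b))    # point on the negative side -> -1
```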
Distance to Hyperplane
g(x) = w·x + b
Write x = xₚ + r·(w/‖w‖), where xₚ is the projection of x onto the hyperplane (g(xₚ) = 0).
Then g(x) = w·xₚ + b + r·(w·w)/‖w‖ = r‖w‖, so the distance from x to the hyperplane is
    r = |g(x)| / ‖w‖
In particular, the distance from the origin to the hyperplane is |b| / ‖w‖.
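The distance formula is direct to compute; a minimal sketch (the hyperplane here is an illustrative choice):

```python
import math

def distance_to_hyperplane(w, b, x):
    """r = |g(x)| / ||w||, with g(x) = w.x + b."""
    g = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(g) / norm

# Hyperplane x1 + x2 - 1 = 0: distance from the origin is |b|/||w|| = 1/sqrt(2)
print(distance_to_hyperplane([1.0, 1.0], -1.0, [0.0, 0.0]))
```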
Selection of Classifiers
Which classifier is the best?
All have the same training error.
How about generalization?
Unknown Samples
Of the two classifiers A and B, classifier B divides the space more consistently (unbiased).
Margins
The margin of a linear classifier is the width by which the decision boundary could be increased before hitting a data point.
Intuitively, it is safer to choose a classifier with a larger margin: a wider buffer zone for mistakes.
The hyperplane is decided by only a few data points, the support vectors; the others can be discarded.
Selecting the classifier with the maximum margin gives Linear Support Vector Machines (LSVM), which work very well in practice.
How do we specify the margin formally?
Margins
The "Predict Class = +1" zone lies where w·x + b ≥ 1; the "Predict Class = −1" zone lies where w·x + b ≤ −1.
The hyperplanes w·x + b = 1, w·x + b = 0, and w·x + b = −1 bound the margin; x⁺ and x⁻ are the closest points on either side.
M = margin width = 2 / ‖w‖
Objective Function
Correctly classify all data points:
    w·xᵢ + b ≥ +1 if yᵢ = +1
    w·xᵢ + b ≤ −1 if yᵢ = −1
    i.e., yᵢ(w·xᵢ + b) − 1 ≥ 0
Maximize the margin: max M = 2/‖w‖ is equivalent to min Φ(w) = ½ wᵀw.
Quadratic Optimization Problem:
    Minimize Φ(w) = ½ wᵀw
    Subject to yᵢ(w·xᵢ + b) ≥ 1
Lagrange Multipliers
Primal:
    L_P = ½‖w‖² − Σ_{i=1}^{l} αᵢ [yᵢ(w·xᵢ + b) − 1]
Setting the derivatives to zero:
    ∂L_P/∂w = 0  ⇒  w = Σ_{i=1}^{l} αᵢ yᵢ xᵢ
    ∂L_P/∂b = 0  ⇒  Σ_{i=1}^{l} αᵢ yᵢ = 0
Dual:
    L_D = Σᵢ αᵢ − ½ Σᵢ Σⱼ αᵢ αⱼ yᵢ yⱼ (xᵢ·xⱼ) = Σᵢ αᵢ − ½ αᵀHα, where Hᵢⱼ = yᵢ yⱼ (xᵢ·xⱼ)
    subject to αᵢ ≥ 0 and Σᵢ αᵢ yᵢ = 0
Dual Problem
A quadratic problem again.
Solutions of w & b:
    w = Σ_{i=1}^{l} αᵢ yᵢ xᵢ
    g(x) = w·x + b = Σ_{i=1}^{l} αᵢ yᵢ (xᵢ·x) + b   ← inner product
Support vectors: samples with positive αₛ. Each support vector xₛ satisfies yₛ(w·xₛ + b) = 1, so
    b = (1/N_S) Σ_{s∈S} ( yₛ − Σ_{m∈S} αₘ yₘ (xₘ·xₛ) )
An Example
Training data: the point (1, 1) with label +1 and the point (0, 0) with label −1.
    H₁₁ = y₁y₁(x₁·x₁) = 2,  H₁₂ = H₂₁ = H₂₂ = 0
    L_D = α₁ + α₂ − α₁²
    Constraint: Σᵢ αᵢyᵢ = α₁ − α₂ = 0, so α₁ = α₂
Maximizing L_D = 2α₁ − α₁² gives α₁ = α₂ = 1.
    w = Σᵢ αᵢ yᵢ xᵢ = 1·(+1)·[1, 1] + 1·(−1)·[0, 0] = [1, 1]
    b: from y₁(w·x₁ + b) = 1: 2 + b = 1, so b = −1
    g(x) = w·x + b;  decision boundary: x₁ + x₂ − 1 = 0
    Margin: M = 2/‖w‖ = 2/√2 = √2
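The worked example above can be checked numerically; this sketch plugs the dual solution α₁ = α₂ = 1 into the formulas for w, b, and the margin:

```python
import math

# Training data from the example: (1,1) labeled +1 and (0,0) labeled -1
X = [(1.0, 1.0), (0.0, 0.0)]
y = [1.0, -1.0]
alpha = [1.0, 1.0]  # solution of the dual found above

# w = sum_i alpha_i y_i x_i
w = [sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X)) for d in range(2)]
# b from the support-vector condition y_1 (w.x_1 + b) = 1
b = 1.0 / y[0] - sum(wd * xd for wd, xd in zip(w, X[0]))
# margin M = 2 / ||w||
margin = 2.0 / math.sqrt(sum(wd * wd for wd in w))

print(w, b, margin)  # [1.0, 1.0] -1.0 1.414... (= sqrt(2))
```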
Soft Margin
Slack variables εᵢ (e.g., ε₂, ε₇, ε₁₁ in the figure) allow violations of the margin:
    yᵢ(w·xᵢ + b) ≥ 1 − εᵢ,  εᵢ ≥ 0
New objective:
    Minimize Φ(w) = ½ wᵀw + C Σᵢ εᵢ
Primal Lagrangian:
    L_P = ½‖w‖² + C Σ_{i=1}^{l} εᵢ − Σ_{i=1}^{l} αᵢ [yᵢ(w·xᵢ + b) − 1 + εᵢ] − Σ_{i=1}^{l} μᵢ εᵢ
Soft Margin
Setting the derivatives of L_P to zero:
    ∂L_P/∂w = 0  ⇒  w = Σᵢ αᵢ yᵢ xᵢ
    ∂L_P/∂b = 0  ⇒  Σᵢ αᵢ yᵢ = 0
    ∂L_P/∂εᵢ = 0  ⇒  C = αᵢ + μᵢ
Dual:
    L_D = Σᵢ αᵢ − ½ αᵀHα,  s.t. 0 ≤ αᵢ ≤ C and Σᵢ αᵢ yᵢ = 0
The only change from the hard-margin dual is the upper bound C on each αᵢ.
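An equivalent view of the soft margin is the unconstrained primal ½‖w‖² + C Σᵢ max(0, 1 − yᵢ(w·xᵢ + b)). Below is a minimal subgradient-descent sketch of that objective; the toy data, learning rate, and iteration count are illustrative assumptions, and a real solver would work on the dual QP instead:

```python
# Toy data: linearly separable in 2-D with a wide gap
X = [(2.0, 2.0), (1.5, 2.5), (-2.0, -2.0), (-2.5, -1.5)]
y = [1.0, 1.0, -1.0, -1.0]
C, lr, steps = 1.0, 0.01, 2000

w, b = [0.0, 0.0], 0.0
for _ in range(steps):
    gw, gb = [w[0], w[1]], 0.0          # gradient of the ||w||^2 / 2 term
    for xi, yi in zip(X, y):
        if yi * (w[0] * xi[0] + w[1] * xi[1] + b) < 1:   # hinge loss is active
            gw[0] -= C * yi * xi[0]
            gw[1] -= C * yi * xi[1]
            gb    -= C * yi
    w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
    b -= lr * gb

preds = [1 if w[0] * xi[0] + w[1] * xi[1] + b > 0 else -1 for xi in X]
print(preds)  # expect [1, 1, -1, -1]
```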
Non-linear SVMs
Some datasets are not linearly separable in the input space x, but become separable after a mapping such as x → x².
Feature Space
Φ: x → φ(x)
Mapping the input coordinates (x₁, x₂) to features such as (x₁², x₂²) can turn a boundary that is non-linear in input space into a linear one in feature space.
Quadratic Basis Functions
Φ(x) = [ 1,                                        ← Constant Term
         √2·x₁, √2·x₂, …, √2·x_m,                  ← Linear Terms
         x₁², x₂², …, x_m²,                        ← Pure Quadratic Terms
         √2·x₁x₂, √2·x₁x₃, …, √2·x_{m−1}x_m ]ᵀ     ← Quadratic Cross-Terms
Number of terms: 1 + m + m + C(m, 2) = (m + 2)(m + 1)/2 ≈ m²/2
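The expansion and its term count can be written out directly; a small sketch:

```python
import math
from itertools import combinations

def phi(x):
    """Quadratic basis expansion of x (length m), as on the slide."""
    m = len(x)
    out = [1.0]                                        # constant term
    out += [math.sqrt(2) * xi for xi in x]             # linear terms
    out += [xi * xi for xi in x]                       # pure quadratic terms
    out += [math.sqrt(2) * x[i] * x[j]
            for i, j in combinations(range(m), 2)]     # quadratic cross-terms
    return out

m = 5
v = phi([1.0] * m)
print(len(v), (m + 2) * (m + 1) // 2)  # both 21
```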
Calculation of Φ(a)·Φ(b)
Taking the dot product of the two expanded vectors term by term:
    Φ(a)·Φ(b) = 1 + 2 Σ_{i=1}^{m} aᵢbᵢ + Σ_{i=1}^{m} aᵢ²bᵢ² + 2 Σ_{i=1}^{m} Σ_{j=i+1}^{m} aᵢaⱼbᵢbⱼ
Computing this directly costs O(m²), since Φ(a) and Φ(b) each have about m²/2 components.
In the dual, every dot product xᵢ·xⱼ is simply replaced by φ(xᵢ)·φ(xⱼ).
It turns out …
    (a·b + 1)² = (a·b)² + 2(a·b) + 1
               = Σᵢ Σⱼ aᵢaⱼbᵢbⱼ + 2 Σᵢ aᵢbᵢ + 1
               = Σᵢ aᵢ²bᵢ² + 2 Σᵢ Σ_{j>i} aᵢaⱼbᵢbⱼ + 2 Σᵢ aᵢbᵢ + 1
               = Φ(a)·Φ(b)
So K(a, b) = (a·b + 1)² = Φ(a)·Φ(b), computable in O(m) instead of O(m²).
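The identity can be verified numerically: the O(m) kernel evaluation and the O(m²) explicit feature dot product agree. A quick check with arbitrary test vectors:

```python
import math
from itertools import combinations

def phi(x):
    """Quadratic basis expansion: constant, linear, pure quadratic, cross-terms."""
    out = [1.0]
    out += [math.sqrt(2) * xi for xi in x]
    out += [xi * xi for xi in x]
    out += [math.sqrt(2) * x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return out

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a, b = [1.0, 2.0, 3.0], [0.5, -1.0, 2.0]
lhs = (dot(a, b) + 1) ** 2          # O(m) kernel evaluation
rhs = dot(phi(a), phi(b))           # O(m^2) explicit feature dot product
print(abs(lhs - rhs) < 1e-9)        # True: both equal 30.25 here
```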
Kernel Trick
The linear classifier relies on dot products between vectors: xᵢ·xⱼ.
If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes φ(xᵢ)·φ(xⱼ).
A kernel function is a function that corresponds to an inner product in some expanded feature space: K(xᵢ, xⱼ) = φ(xᵢ)·φ(xⱼ).
Example: for x = [x₁, x₂], take K(xᵢ, xⱼ) = (1 + xᵢ·xⱼ)²:
    K(xᵢ, xⱼ) = (1 + xᵢ₁xⱼ₁ + xᵢ₂xⱼ₂)²
              = 1 + xᵢ₁²xⱼ₁² + 2xᵢ₁xⱼ₁xᵢ₂xⱼ₂ + xᵢ₂²xⱼ₂² + 2xᵢ₁xⱼ₁ + 2xᵢ₂xⱼ₂
              = [1, xᵢ₁², √2·xᵢ₁xᵢ₂, xᵢ₂², √2·xᵢ₁, √2·xᵢ₂] · [1, xⱼ₁², √2·xⱼ₁xⱼ₂, xⱼ₂², √2·xⱼ₁, √2·xⱼ₂]
              = φ(xᵢ)·φ(xⱼ), where φ(x) = [1, x₁², √2·x₁x₂, x₂², √2·x₁, √2·x₂]
Kernels
    Polynomial:          K(xᵢ, xⱼ) = (1 + xᵢ·xⱼ)^d
    Gaussian:            K(xᵢ, xⱼ) = exp(−‖xᵢ − xⱼ‖² / 2σ²)
    Hyperbolic Tangent:  K(xᵢ, xⱼ) = tanh(β·xᵢ·xⱼ + c)
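These three kernels are one-liners in code. A sketch (the default parameter values d = 2, σ = 1, β = 1, c = −1 are illustrative choices, not prescribed by the slides):

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def polynomial(xi, xj, d=2):
    """(1 + xi.xj)^d"""
    return (1 + dot(xi, xj)) ** d

def gaussian(xi, xj, sigma=1.0):
    """exp(-||xi - xj||^2 / (2 sigma^2))"""
    sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq / (2 * sigma ** 2))

def tanh_kernel(xi, xj, beta=1.0, c=-1.0):
    """tanh(beta * xi.xj + c)"""
    return math.tanh(beta * dot(xi, xj) + c)

x = [1.0, 2.0]
print(gaussian(x, x))          # 1.0: a point is maximally similar to itself
print(polynomial(x, x, d=2))   # (1 + 5)^2 = 36.0
```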
String Kernel
Kernels can also be defined on non-vector data such as text strings, e.g., based on shared subsequences: K(car, car) = K(cat, cat), while K(car, cat) is smaller, reflecting only partial overlap.
Similarity between text strings: "car" vs "custard".
Solutions of w & b
With a kernel, w = Σᵢ αᵢ yᵢ φ(xᵢ) may live in a very high-dimensional feature space, but it never needs to be computed explicitly:
    g(x) = w·φ(x) + b = Σ_{i=1}^{l} αᵢ yᵢ K(xᵢ, x) + b
    b = (1/N_S) Σ_{s∈S} ( yₛ − Σ_{m∈S} αₘ yₘ K(xₘ, xₛ) )
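A sketch of the kernelized decision function. The multipliers and bias reuse the hard-margin example from the earlier slide (where the kernel is just the dot product), so g(x) should reduce to x₁ + x₂ − 1:

```python
def linear_kernel(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def decision(x, support, alphas, labels, b, kernel):
    """g(x) = sum_i alpha_i y_i K(x_i, x) + b over the support vectors."""
    return sum(a * yi * kernel(xi, x)
               for a, yi, xi in zip(alphas, labels, support)) + b

support = [(1.0, 1.0), (0.0, 0.0)]   # from the earlier worked example
alphas, labels, b = [1.0, 1.0], [1.0, -1.0], -1.0

print(decision((2.0, 3.0), support, alphas, labels, b, linear_kernel))  # 2 + 3 - 1 = 4.0
```

Swapping `linear_kernel` for a polynomial or Gaussian kernel changes the decision boundary without changing this code path.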
Decision Boundaries
More Maths …
Quadratic Optimization
Lagrange Duality
Karush–Kuhn–Tucker Conditions
SVM Roadmap
Linear Classifier → (Maximum Margin) → Linear SVM
Linear SVM → (Noise) → Soft Margin
Linear SVM → (Nonlinear Problem) → a·b becomes Φ(a)·Φ(b) → (High Computational Cost) → Kernel Trick: K(a, b) = Φ(a)·Φ(b)
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
Online Resources
http://www.kernel-machines.org
http://www.support-vector-machines.org
http://www.tristanfletcher.co.uk/SVM%20Explained.pdf
http://www.csie.ntu.edu.tw/~cjlin/libsvm
A list of papers uploaded to the web learning portal
Wikipedia & Google
Review
What is the definition of margin in a linear classifier?
Why do we want to maximize the margin?
What is the mathematical expression of the margin?
How is the objective function in SVM solved?
What are support vectors?
What is a soft margin?
How does SVM solve nonlinear problems?
What is the so-called "kernel trick"?
What are the commonly used kernels?
Next Week's Class Talk
Volunteers are required for next week's class talk.
Topic: SVM in Practice
Hints:
Applications
Demos
Multi-Class Problems
Software: a very popular toolbox, LIBSVM
Any other interesting topics beyond this lecture
Length: 20 minutes plus question time
Overview
Support Vector Machines
2
Linear Classifier
3
wx + b gt0
wx + b lt0
wx + b =0
)(
))(()(
bxwsign
xgsignbwxf
w
n
iiixwxw
caseinJust
1
0)( 21
21
xxw
bxwbxw
Distance to Hyperplane
4
x
x
bxwxg )(
ww
wwbxw
bwxwxg
wxx
)()(
||||
|)(||||||)(|
||||||||
w
xg
ww
wxg
wxxM
||||
||
w
b
Selection of Classifiers
5
Which classifier is the best
All have the same training error
How about generalization
Unknown Samples
6
A
B
Classifier B divides the space more consistently (unbiased)
Margins
7
Support Vectors Support Vectors
Margins
The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point
Intuitively it is safer to choose a classifier with a larger margin
Wider buffer zone for mistakes
The hyperplane is decided by only a few data points
Support Vectors
Others can be discarded
Select the classifier with the maximum margin Linear Support Vector Machines (LSVM)
Works very well in practice
How to specify the margin formally
8
Margins
9
ldquoPredict Class
= +1rdquo
zone
ldquoPredict Class
= -1rdquo
zone
wx+b=1
wx+b=
0wx+b=-1
X-
x+M=Margin
Width
||||
2
wM
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
Linear Classifier
3
wx + b gt0
wx + b lt0
wx + b =0
)(
))(()(
bxwsign
xgsignbwxf
w
n
iiixwxw
caseinJust
1
0)( 21
21
xxw
bxwbxw
Distance to Hyperplane
4
x
x
bxwxg )(
ww
wwbxw
bwxwxg
wxx
)()(
||||
|)(||||||)(|
||||||||
w
xg
ww
wxg
wxxM
||||
||
w
b
Selection of Classifiers
5
Which classifier is the best
All have the same training error
How about generalization
Unknown Samples
6
A
B
Classifier B divides the space more consistently (unbiased)
Margins
7
Support Vectors Support Vectors
Margins
The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point
Intuitively it is safer to choose a classifier with a larger margin
Wider buffer zone for mistakes
The hyperplane is decided by only a few data points
Support Vectors
Others can be discarded
Select the classifier with the maximum margin Linear Support Vector Machines (LSVM)
Works very well in practice
How to specify the margin formally
8
Margins
9
ldquoPredict Class
= +1rdquo
zone
ldquoPredict Class
= -1rdquo
zone
wx+b=1
wx+b=
0wx+b=-1
X-
x+M=Margin
Width
||||
2
wM
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
Distance to Hyperplane
4
x
x
bxwxg )(
ww
wwbxw
bwxwxg
wxx
)()(
||||
|)(||||||)(|
||||||||
w
xg
ww
wxg
wxxM
||||
||
w
b
Selection of Classifiers
5
Which classifier is the best
All have the same training error
How about generalization
Unknown Samples
6
A
B
Classifier B divides the space more consistently (unbiased)
Margins
7
Support Vectors Support Vectors
Margins
The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point
Intuitively it is safer to choose a classifier with a larger margin
Wider buffer zone for mistakes
The hyperplane is decided by only a few data points
Support Vectors
Others can be discarded
Select the classifier with the maximum margin Linear Support Vector Machines (LSVM)
Works very well in practice
How to specify the margin formally
8
Margins
9
ldquoPredict Class
= +1rdquo
zone
ldquoPredict Class
= -1rdquo
zone
wx+b=1
wx+b=
0wx+b=-1
X-
x+M=Margin
Width
||||
2
wM
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
Selection of Classifiers
5
Which classifier is the best
All have the same training error
How about generalization
Unknown Samples
6
A
B
Classifier B divides the space more consistently (unbiased)
Margins
7
Support Vectors Support Vectors
Margins
The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point
Intuitively it is safer to choose a classifier with a larger margin
Wider buffer zone for mistakes
The hyperplane is decided by only a few data points
Support Vectors
Others can be discarded
Select the classifier with the maximum margin Linear Support Vector Machines (LSVM)
Works very well in practice
How to specify the margin formally
8
Margins
9
ldquoPredict Class
= +1rdquo
zone
ldquoPredict Class
= -1rdquo
zone
wx+b=1
wx+b=
0wx+b=-1
X-
x+M=Margin
Width
||||
2
wM
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
Unknown Samples
6
A
B
Classifier B divides the space more consistently (unbiased)
Margins
7
Support Vectors Support Vectors
Margins
The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point
Intuitively it is safer to choose a classifier with a larger margin
Wider buffer zone for mistakes
The hyperplane is decided by only a few data points
Support Vectors
Others can be discarded
Select the classifier with the maximum margin Linear Support Vector Machines (LSVM)
Works very well in practice
How to specify the margin formally
8
Margins
9
ldquoPredict Class
= +1rdquo
zone
ldquoPredict Class
= -1rdquo
zone
wx+b=1
wx+b=
0wx+b=-1
X-
x+M=Margin
Width
||||
2
wM
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
Margin width for this example: M = 2/||w|| = 2/√2 = √2
[Figure: the two points plotted in the (x₁, x₂) plane with the separating line x₁ + x₂ − 1 = 0]
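The worked example above can be checked numerically. This is a small sketch (not part of the original slides) using the solution the slides derive, w = [1, 1] and b = −1:

```python
import math

# Training data from the slide example: (x, y) pairs
xs = [(1.0, 1.0), (0.0, 0.0)]
ys = [+1, -1]

# Solution derived on the slide: w = [1, 1], b = -1
w = (1.0, 1.0)
b = -1.0

def g(x):
    # Decision function g(x) = w.x + b
    return w[0] * x[0] + w[1] * x[1] + b

# Both points lie exactly on the margin: y_i * g(x_i) = 1
for x, y in zip(xs, ys):
    assert abs(y * g(x) - 1.0) < 1e-12

# Margin width M = 2 / ||w|| = sqrt(2)
M = 2.0 / math.hypot(w[0], w[1])
assert abs(M - math.sqrt(2.0)) < 1e-12
```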
Soft Margin
14
[Figure: hyperplanes w·x + b = 1, 0, −1, with slack variables ε₂, ε₇, ε₁₁ marking points inside or beyond the margin]
Relaxed constraints: yᵢ(w·xᵢ + b) ≥ 1 − εᵢ, with εᵢ ≥ 0
Minimize: (1/2)wᵀw + C Σᵢ₌₁ˡ εᵢ
Primal Lagrangian:
L_P = (1/2)||w||² + C Σᵢ₌₁ˡ εᵢ − Σᵢ₌₁ˡ αᵢ[yᵢ(w·xᵢ + b) − 1 + εᵢ] − Σᵢ₌₁ˡ μᵢεᵢ
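The soft-margin objective (1/2)||w||² + C Σ εᵢ can also be minimized directly in its equivalent hinge-loss form by subgradient descent. The sketch below illustrates that idea only; it is not the dual QP route the slides take, and the two extra data points and the step size are made-up choices:

```python
import random

# Hinge-loss form of the soft-margin objective:
#   (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b))
# minimized by plain stochastic subgradient descent.

data = [((1.0, 1.0), +1), ((0.0, 0.0), -1),   # the slide's two points
        ((2.0, 1.0), +1), ((0.5, 0.0), -1)]   # extra made-up points
C = 1.0
w = [0.0, 0.0]
b = 0.0
lr = 0.01

random.seed(0)
for step in range(5000):
    x, y = random.choice(data)
    margin = y * (w[0] * x[0] + w[1] * x[1] + b)
    # Subgradient of the regularizer ...
    gw = [w[0], w[1]]
    gb = 0.0
    # ... plus the hinge term when the margin constraint is violated
    if margin < 1.0:
        gw[0] -= C * y * x[0]
        gw[1] -= C * y * x[1]
        gb -= C * y
    w[0] -= lr * gw[0]
    w[1] -= lr * gw[1]
    b -= lr * gb

# All four points end up on the correct side of the learned hyperplane
for x, y in data:
    assert y * (w[0] * x[0] + w[1] * x[1] + b) > 0
```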
Soft Margin
15
Setting the derivatives of L_P to zero:
∂L_P/∂w = 0 ⇒ w = Σᵢ₌₁ˡ αᵢyᵢxᵢ
∂L_P/∂b = 0 ⇒ Σᵢ₌₁ˡ αᵢyᵢ = 0
∂L_P/∂εᵢ = 0 ⇒ C = αᵢ + μᵢ
Dual: maximize L_D = Σᵢ αᵢ − (1/2)αᵀHα, subject to 0 ≤ αᵢ ≤ C and Σᵢ αᵢyᵢ = 0
where L_P = (1/2)||w||² + C Σᵢ₌₁ˡ εᵢ − Σᵢ₌₁ˡ αᵢ[yᵢ(w·xᵢ + b) − 1 + εᵢ] − Σᵢ₌₁ˡ μᵢεᵢ
Non-linear SVMs
16
[Figure: 1-D data on the x-axis that no single threshold can separate becomes separable after mapping x → (x, x²)]
Feature Space
17
Φ: x → φ(x)
[Figure: input space (x₁, x₂) mapped to feature space (x₁², x₂²)]
Feature Space
18
Φ: x → φ(x)
[Figure: a nonlinear boundary in input space becomes linear in the feature space]
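The idea of the two figures above can be shown with a tiny sketch (the 1-D data below is made up): points that no threshold on x can separate become linearly separable after the mapping x → (x, x²):

```python
# 1-D labels that no threshold on x can separate: negatives surround the positives
data = [(-2.0, -1), (-1.0, +1), (1.0, +1), (2.0, -1)]

def phi(x):
    # The feature mapping from the slide: x -> (x, x^2)
    return (x, x * x)

# In feature space the line x2 = 2.5 (i.e. w = (0, -1), b = 2.5) separates
# the classes: positives have x^2 < 2.5, negatives have x^2 > 2.5.
w, b = (0.0, -1.0), 2.5
for x, y in data:
    fx = w[0] * phi(x)[0] + w[1] * phi(x)[1] + b
    assert y * fx > 0   # every point is on the correct side
```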
Quadratic Basis Functions
19
Φ(x) = (1, √2x₁, √2x₂, …, √2xₘ, x₁², x₂², …, xₘ², √2x₁x₂, √2x₁x₃, …, √2xₘ₋₁xₘ)
Constant term; linear terms; pure quadratic terms; quadratic cross-terms.
Number of terms: 1 + m + m + m(m − 1)/2 = (m + 2)(m + 1)/2 ≈ m²/2
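The quadratic basis expansion and its term count can be written out directly; a minimal sketch:

```python
import math

def quad_phi(x):
    """Quadratic basis expansion from the slide:
    constant, sqrt(2)*linear, pure quadratic, sqrt(2)*cross terms."""
    m = len(x)
    s2 = math.sqrt(2.0)
    phi = [1.0]                                   # constant term
    phi += [s2 * xi for xi in x]                  # linear terms
    phi += [xi * xi for xi in x]                  # pure quadratic terms
    phi += [s2 * x[i] * x[j]                      # cross terms, i < j
            for i in range(m) for j in range(i + 1, m)]
    return phi

# The number of terms matches 1 + 2m + m(m-1)/2 = (m+1)(m+2)/2
for m in (2, 3, 5, 10):
    assert len(quad_phi([1.0] * m)) == (m + 1) * (m + 2) // 2
```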
Calculation of Φ(a)·Φ(b)
20
Φ(a)·Φ(b) = 1 + 2Σᵢ₌₁ᵐ aᵢbᵢ + Σᵢ₌₁ᵐ aᵢ²bᵢ² + 2Σᵢ₌₁ᵐ Σⱼ₌ᵢ₊₁ᵐ aᵢaⱼbᵢbⱼ
Evaluating this feature-space inner product term by term costs O(m²).
It turns out …
21
(a·b + 1)² = (a·b)² + 2(a·b) + 1
= (Σᵢ₌₁ᵐ aᵢbᵢ)² + 2Σᵢ₌₁ᵐ aᵢbᵢ + 1
= Σᵢ₌₁ᵐ aᵢ²bᵢ² + 2Σᵢ₌₁ᵐ Σⱼ₌ᵢ₊₁ᵐ aᵢaⱼbᵢbⱼ + 2Σᵢ₌₁ᵐ aᵢbᵢ + 1
= Φ(a)·Φ(b)
So K(a, b) = (a·b + 1)² evaluates the same inner product in O(m) instead of O(m²).
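The identity (a·b + 1)² = Φ(a)·Φ(b) can be verified numerically on random vectors; a self-contained sketch:

```python
import math
import random

def quad_phi(x):
    # Quadratic basis expansion from the slide (constant, sqrt(2)*linear,
    # pure quadratic, sqrt(2)*cross terms)
    m, s2 = len(x), math.sqrt(2.0)
    return ([1.0] + [s2 * v for v in x] + [v * v for v in x]
            + [s2 * x[i] * x[j] for i in range(m) for j in range(i + 1, m)])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

random.seed(1)
for _ in range(100):
    m = random.randint(2, 6)
    a = [random.uniform(-2, 2) for _ in range(m)]
    b = [random.uniform(-2, 2) for _ in range(m)]
    explicit = dot(quad_phi(a), quad_phi(b))   # O(m^2) feature-space product
    kernel = (dot(a, b) + 1.0) ** 2            # O(m) kernel evaluation
    assert abs(explicit - kernel) < 1e-9       # identical up to rounding
```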
Kernel Trick
The linear classifier relies on dot products between vectors: xᵢ·xⱼ.
If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes φ(xᵢ)·φ(xⱼ).
A kernel function is a function that corresponds to an inner product in some expanded feature space: K(xᵢ, xⱼ) = φ(xᵢ)·φ(xⱼ).
Example: x = [x₁, x₂], K(xᵢ, xⱼ) = (1 + xᵢ·xⱼ)²
22
K(xᵢ, xⱼ) = (1 + xᵢ·xⱼ)²
= 1 + xᵢ₁²xⱼ₁² + 2xᵢ₁xⱼ₁xᵢ₂xⱼ₂ + xᵢ₂²xⱼ₂² + 2xᵢ₁xⱼ₁ + 2xᵢ₂xⱼ₂
= [1, xᵢ₁², √2xᵢ₁xᵢ₂, xᵢ₂², √2xᵢ₁, √2xᵢ₂] · [1, xⱼ₁², √2xⱼ₁xⱼ₂, xⱼ₂², √2xⱼ₁, √2xⱼ₂]
= φ(xᵢ)·φ(xⱼ), where φ(x) = [1, x₁², √2x₁x₂, x₂², √2x₁, √2x₂]
Kernels
23
Polynomial: K(xᵢ, xⱼ) = (1 + xᵢ·xⱼ)ᵈ
Gaussian: K(xᵢ, xⱼ) = exp(−||xᵢ − xⱼ||² / 2σ²)
Hyperbolic tangent: K(xᵢ, xⱼ) = tanh(xᵢ·xⱼ + c)
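The three kernels above can be written as plain functions; a sketch where d, sigma and c are free parameters chosen by the user:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(xi, xj, d=2):
    # Polynomial kernel: (1 + xi.xj)^d
    return (1.0 + dot(xi, xj)) ** d

def gaussian_kernel(xi, xj, sigma=1.0):
    # Gaussian (RBF) kernel: exp(-||xi - xj||^2 / (2 sigma^2))
    sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq / (2.0 * sigma ** 2))

def tanh_kernel(xi, xj, c=-1.0):
    # Hyperbolic tangent kernel: tanh(xi.xj + c)
    return math.tanh(dot(xi, xj) + c)

x, y = [1.0, 0.0], [0.0, 1.0]
assert poly_kernel(x, y) == 1.0                        # (1 + 0)^2
assert abs(gaussian_kernel(x, y) - math.exp(-1.0)) < 1e-12   # ||x-y||^2 = 2
```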
String Kernel
24
Kernels can also be defined on non-vector data, e.g. similarity between text strings ("car" vs. "custard") via shared substrings: compare K(car, cat) against K(car, car) and K(cat, cat).
Solutions of w & b
25
w = Σᵢ₌₁ˡ αᵢyᵢφ(xᵢ), so w need not be formed explicitly:
w·φ(x) = Σᵢ₌₁ˡ αᵢyᵢ(φ(xᵢ)·φ(x)) = Σᵢ₌₁ˡ αᵢyᵢK(xᵢ, x)
g(x) = Σᵢ₌₁ˡ αᵢyᵢK(xᵢ, x) + b
b = (1/N_s) Σ_{s∈S} (y_s − Σ_{m∈S} α_m y_m K(x_m, x_s))
where S is the set of support vectors (samples with positive α).
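These formulas can be exercised on the earlier two-point example with a linear kernel. This is a sketch: the dual solution α = [1, 1] is assumed from that example, and b is recovered with the support-vector formula above:

```python
# Kernelized decision function from the slide:
#   g(x) = sum_i alpha_i y_i K(x_i, x) + b
#   b = (1/N_s) * sum_{s in S} ( y_s - sum_{m in S} alpha_m y_m K(x_m, x_s) )

def K(u, v):
    # Linear kernel: plain dot product
    return sum(a * b for a, b in zip(u, v))

xs = [(1.0, 1.0), (0.0, 0.0)]
ys = [+1, -1]
alphas = [1.0, 1.0]     # dual solution assumed from the worked example
S = [0, 1]              # both points are support vectors here

b = sum(ys[s] - sum(alphas[m] * ys[m] * K(xs[m], xs[s]) for m in S)
        for s in S) / len(S)

def g(x):
    return sum(alphas[i] * ys[i] * K(xs[i], x) for i in range(len(xs))) + b

assert b == -1.0               # matches the slide's b
assert g((1.0, 1.0)) == 1.0    # positive point lies on the +1 margin
assert g((0.0, 0.0)) == -1.0   # negative point lies on the -1 margin
```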
Decision Boundaries
26
More Maths …
27
Quadratic Optimization
Lagrange Duality
Karush–Kuhn–Tucker Conditions
SVM Roadmap
28
Linear Classifier → (maximum margin) → Linear SVM
Linear SVM → (noise) → Soft Margin
Soft Margin → (nonlinear problem) → mapping a·b → Φ(a)·Φ(b)
Mapping → (high computational cost) → Kernel Trick: K(a, b) = Φ(a)·Φ(b)
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000
Online Resources
http://www.kernel-machines.org
http://www.support-vector-machines.org
http://www.tristanfletcher.co.uk/SVM%20Explained.pdf
http://www.csie.ntu.edu.tw/~cjlin/libsvm
A list of papers uploaded to the web learning portal
Wikipedia & Google
29
Review
What is the definition of margin in a linear classifier?
Why do we want to maximize the margin?
What is the mathematical expression of the margin?
How is the objective function in SVM solved?
What are support vectors?
What is a soft margin?
How does SVM solve nonlinear problems?
What is the so-called "kernel trick"?
What are the commonly used kernels?
30
Next Week's Class Talk
Volunteers are required for next week's class talk.
Topic: SVM in Practice
Hints:
Applications
Demos
Multi-Class Problems
Software: a very popular toolbox, LIBSVM
Any other interesting topics beyond this lecture
Length: 20 minutes plus question time
31
Margins
The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point
Intuitively it is safer to choose a classifier with a larger margin
Wider buffer zone for mistakes
The hyperplane is decided by only a few data points
Support Vectors
Others can be discarded
Select the classifier with the maximum margin Linear Support Vector Machines (LSVM)
Works very well in practice
How to specify the margin formally
8
Margins
9
ldquoPredict Class
= +1rdquo
zone
ldquoPredict Class
= -1rdquo
zone
wx+b=1
wx+b=
0wx+b=-1
X-
x+M=Margin
Width
||||
2
wM
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
Margins
9
ldquoPredict Class
= +1rdquo
zone
ldquoPredict Class
= -1rdquo
zone
wx+b=1
wx+b=
0wx+b=-1
X-
x+M=Margin
Width
||||
2
wM
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
Objective Function
Correctly classify all data points
Maximize the margin
Quadratic Optimization Problem
Minimize
Subject to 10
1 1 ii yifbxw
1 1 ii yifbxw
01)( bxwy ii
www
M T
2
1min
2max
www t
2
1)(
1)( bxwy ii
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
Lagrange Multipliers
11
l
i
l
iiiiiP bxwywL
1 1
2 )(||||2
1
00
0
1
1
l
iii
p
l
iiii
p
yb
L
xyww
L
0amp0
2
1
2
1
ii
ii
jijiijT
ii
jijijiji
iiD
ytosubject
xxyyHwhereH
xxyyL
Dual Problem
Quadratic problem again
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
Solutions of w amp b
12
bxxyxgl
iiii
1
)(
inner product
positivewithSamplesVectorsSupport
Ss Smsmmms
s
Smsmmms
ssmmSm
ms
Smsmmms
ss
xxyyN
b
xxyyb
ybxxyy
bxxyy
bwxy
)(1
)(
1)(
1)(
2
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
An Example
13
(1 1 +1)
(0 0 -1)
1
2
212
2
11 00
iii y
00
02
22221212
21211111
2221
1211
xxyyxxyy
xxyyxxyy
HH
HHH
211
2
12
121 2
2
1
HLi
iD
11 21
1)(
1121
]11[]00[)1(1]11[11
21
1
2
1
xxbwxxg
wxb
xywi
iii
0121 xx
22
22
wM
x1
x2
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
m
iii
m
i
m
j
m
iiijjii
m
iii
m
iii
babababa
bababa
bababababa
)()()1()( 2 bababaK )(mO )( 2mO
Kernel Trick
The linear classifier relies on dot products between vectors xixj
If every data point is mapped into a high-dimensional space via some transformation Φ x rarr φ(x) the dot product becomes φ(xi) φ(xj)
A kernel function is some function that corresponds to an inner product in some expanded feature space K(xi xj) = φ(xi) φ(xj)
Example x=[x1 x2] K(xi xj) = (1 + xi xj)2
22
]2221[)( where)(x)(x
]2221[]2221[
2221)1()(
212221
21ji
212221
2121
2221
21
221122
222211
21
21
2
xxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxxxxxxK
jjjjjjiiiiii
jijijijijijijiji
Kernels
23
)tanh()(
2exp)(
)1()(
2
2
cxxxxKTangentHyperbolic
xxxxKGaussian
xxxxKPolynomial
jiji
ji
ji
djiji
String Kernel
24
64
4
2)()(
)(
catcatKcarcarK
catcarK
Similarity between text strings Car vs Custard
Solutions of w amp b
25
bxxKyxg
xxKyyN
xxyyN
b
xxKyxxyxw
xyw
l
iiii
Ss Smsmmms
sSs Smsmmms
s
l
i
l
ijiiijiiij
l
iiii
1
1 1
1
)()(
))((1
))()((1
)()()()(
)(
bxxybxwxgl
iiii
1
)(
Decision Boundaries
26
More Maths hellip
27
Quadratic Optimization
Lagrange Duality
KarushndashKuhnndashTucker Conditions
SVM Roadmap
28
Kernel Trick
K(ab)=Φ(a)Φ(b)
ab rarr Φ(a)Φ(b)High Computational Cost
Soft MarginNonlinear Problem
Linear SVMNoise
Linear ClassifierMaximum Margin
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods Cambridge University Press 2000
Online Resources
httpwwwkernel-machinesorg httpwwwsupport-vector-machinesorg
httpwwwtristanfletchercoukSVM20Explainedpdf httpwwwcsientuedutw~cjlinlibsvm
A list of papers uploaded to the web learning portal
Wikipedia amp Google
29
Review
What is the definition of margin in a linear classifier
Why do we want to maximize the margin
What is the mathematical expression of margin
How to solve the objective function in SVM
What are support vectors
What is soft margin
How does SVM solve nonlinear problems
What is so called ldquokernel trickrdquo
What are the commonly used kernels
30
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic SVM in Practice
Hints
Applications
Demos
Multi-Class Problems
Softwarebull A very popular toolbox Libsvm
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31
Soft Margin
14
wx+b=1
wx+b=0
wx+b=-
1
e7
e11 e2 01)( iii bwxy
0
2
1)(
i
ii
t Cwww
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Soft Margin
15
iii
p
l
iii
p
l
iiii
p
uCL
yb
L
xyww
L
0
00
0
1
1
i
iiiT
iiD yandCtsHL 00
2
1
l
i
l
iii
l
iiiiiiP bxwyCwL
1 11
2]1)([
2
1
Non-linear SVMs
16
0 x
0 x
x2
x
Feature Space
17
Φ x rarr φ(x)
x1
x2
x12
x22
Feature Space
18
Φ x rarr φ(x)
x2
x1
Quadratic Basis Functions
19
mm
m
m
m
m
xx
xx
xx
xx
xx
xx
x
x
x
x
x
x
x
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
)(
Constant Terms
Linear Terms
Pure Quadratic Terms
Quadratic Cross-Terms
Number of terms
22
)1)(2( 22
2
mmmCm
Calculation of Φ(xi )Φ(xj )
20
mm
m
m
m
m
mm
m
m
m
m
bb
bb
bb
bb
bb
bb
b
b
b
b
b
b
aa
aa
aa
aa
aa
aa
a
a
a
a
a
a
ba
1
2
32
1
31
21
2
22
21
2
1
1
2
32
1
31
21
2
22
21
2
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
)()(
1
m
iiiba
1
2
m
iii ba
1
22
1
1 1
2m
ijiji
m
ij
bbaa
)()( jiji xxxx
It turns out hellip
21
m
i
m
i
m
i
m
ijjijiiiii bbaabababa
1 1
1
1 1
22 221)()(
1
1 111
2
1 1 1
1
2
1
22
122)(
12
12)(12)()1(
m
i
m
iii
m
ijjjii
Soft Margin
15
Allow some data points to violate the margin: introduce slack variables ξi ≥ 0, penalized by a constant C.

L_P = (1/2)||w||² + C Σi ξi − Σi αi [yi(w·xi + b) − 1 + ξi] − Σi μi ξi

∂L_P/∂w = 0  →  w = Σi αi yi xi
∂L_P/∂b = 0  →  Σi αi yi = 0
∂L_P/∂ξi = 0  →  C − αi − μi = 0

Dual problem: maximize L_D = Σi αi − (1/2) αᵀHα,  subject to 0 ≤ αi ≤ C and Σi αi yi = 0
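As a concrete sketch of the primal above (with made-up data; the function name and values are ours, not from the lecture): at the optimum the slack is ξi = max(0, 1 − yi(w·xi + b)), which turns L_P into the hinge-loss form and makes it easy to evaluate.

```python
# Sketch: evaluating the soft-margin primal objective in its hinge-loss form,
# where the optimal slack is xi_i = max(0, 1 - y_i (w.x_i + b)).
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """(1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b))."""
    slacks = np.maximum(0.0, 1.0 - y * (X @ w + b))  # xi_i; zero outside the margin
    return 0.5 * np.dot(w, w) + C * slacks.sum()

# Hypothetical toy data: the last point sits on the wrong side of its margin.
X = np.array([[2.0, 2.0], [0.0, 0.0], [0.9, 0.9]])
y = np.array([1.0, -1.0, -1.0])
w = np.array([1.0, 1.0])
b = -1.0

print(soft_margin_objective(w, b, X, y, C=1.0))  # 0.5*2 + 1.0*1.8 = 2.8
```

A larger C punishes margin violations more heavily, pushing the solution back toward the hard-margin SVM.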
Non-linear SVMs
16
[Figure: 1-D data points on the x axis that no single threshold can separate become linearly separable after mapping each point x to x².]
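The idea in the figure can be checked in a few lines (a sketch with hypothetical data; `separable_by_threshold` is our own helper): labels that depend on |x| admit no separating threshold on x, but do on x².

```python
# Sketch: 1-D labels depending on distance from 0 are not separable by any
# threshold on x, but become separable after the mapping x -> x^2.
import numpy as np

x = np.array([-3.0, -2.0, -0.5, 0.5, 2.0, 3.0])
y = np.array([1, 1, -1, -1, 1, 1])   # class +1 iff the point is far from 0

def separable_by_threshold(values, labels):
    """True if some threshold puts all -1's on one side and all +1's on the other."""
    return max(values[labels == -1]) < min(values[labels == 1]) or \
           max(values[labels == 1]) < min(values[labels == -1])

print(separable_by_threshold(x, y))      # False: no threshold on x works
print(separable_by_threshold(x**2, y))   # True: e.g. the threshold x^2 = 2 works
```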
Feature Space
17
Φ: x → φ(x)
[Figure: data plotted in the original space (x1, x2) and in the feature space (x1², x2²), where it becomes linearly separable.]
Feature Space
18
Φ: x → φ(x)
[Figure: a nonlinear decision boundary in the original (x1, x2) space corresponds to a linear boundary in the feature space.]
Quadratic Basis Functions
19
Φ(x) = (1, √2·x1, √2·x2, …, √2·xm, x1², x2², …, xm², √2·x1x2, √2·x1x3, …, √2·x1xm, √2·x2x3, …, √2·x(m−1)xm)ᵀ

Constant term: 1
Linear terms: √2·xi
Pure quadratic terms: xi²
Quadratic cross-terms: √2·xixj (i < j)

Number of terms: C(m+2, 2) = (m+2)(m+1)/2 ≈ m²/2
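The expansion above is easy to build explicitly; a minimal sketch (function name ours) that also confirms the term count (m+2)(m+1)/2:

```python
# Sketch: building the quadratic basis Phi(x) listed above and checking that
# its length matches C(m+2, 2) = (m+2)(m+1)/2.
import math
import numpy as np

def quadratic_basis(x):
    m = len(x)
    terms = [1.0]                                        # constant term
    terms += [math.sqrt(2) * xi for xi in x]             # linear terms
    terms += [xi * xi for xi in x]                       # pure quadratic terms
    terms += [math.sqrt(2) * x[i] * x[j]                 # quadratic cross-terms, i < j
              for i in range(m) for j in range(i + 1, m)]
    return np.array(terms)

x = np.array([1.0, 2.0, 3.0])
phi = quadratic_basis(x)
m = len(x)
print(len(phi), (m + 2) * (m + 1) // 2)   # 10 10
```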
Calculation of Φ(xi)·Φ(xj)
20
Writing a = xi and b = xj:

Φ(a)·Φ(b) = 1 + 2 Σi aibi + Σi ai²bi² + 2 Σi Σj>i aiajbibj

Computing Φ(a) and Φ(b) explicitly and then taking their dot product costs O(m²).
It turns out …
21
(a·b + 1)² = (a·b)² + 2(a·b) + 1
           = Σi Σj aiajbibj + 2 Σi aibi + 1
           = Σi ai²bi² + 2 Σi Σj>i aiajbibj + 2 Σi aibi + 1
           = Φ(a)·Φ(b)

K(a, b) = (a·b + 1)² = Φ(a)·Φ(b): the kernel can be evaluated in O(m) instead of O(m²).
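The identity above can be verified numerically (a sketch; `quadratic_basis` builds the Φ from the Quadratic Basis Functions slide, and the random vectors are arbitrary):

```python
# Sketch: checking that Phi(a).Phi(b), computed explicitly in O(m^2),
# equals the kernel (a.b + 1)^2, computed in O(m).
import math
import numpy as np

def quadratic_basis(x):
    m = len(x)
    return np.array([1.0]
                    + [math.sqrt(2) * xi for xi in x]
                    + [xi * xi for xi in x]
                    + [math.sqrt(2) * x[i] * x[j]
                       for i in range(m) for j in range(i + 1, m)])

rng = np.random.default_rng(0)
a, b = rng.standard_normal(5), rng.standard_normal(5)

explicit = np.dot(quadratic_basis(a), quadratic_basis(b))   # O(m^2) route
kernel = (np.dot(a, b) + 1.0) ** 2                          # O(m) route
print(np.isclose(explicit, kernel))                         # True
```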
Kernel Trick
22
The linear classifier relies on dot products between vectors: K(xi, xj) = xi·xj.
If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes φ(xi)·φ(xj).
A kernel function is a function that corresponds to an inner product in some expanded feature space: K(xi, xj) = φ(xi)·φ(xj).
Example: x = [x1, x2], K(xi, xj) = (1 + xi·xj)²

K(xi, xj) = (1 + xi·xj)²
          = 1 + xi1²xj1² + 2xi1xj1xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2
          = [1, xi1², √2·xi1xi2, xi2², √2·xi1, √2·xi2] · [1, xj1², √2·xj1xj2, xj2², √2·xj1, √2·xj2]
          = φ(xi)·φ(xj),  where φ(x) = [1, x1², √2·x1x2, x2², √2·x1, √2·x2]
Kernels
23
Polynomial: K(xi, xj) = (1 + xi·xj)^d
Gaussian: K(xi, xj) = exp(−||xi − xj||² / 2σ²)
Hyperbolic Tangent: K(xi, xj) = tanh(xi·xj + c)
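The three kernels above take only a few lines each (a sketch; function names and the sample vectors are ours, and the parameter defaults are illustrative, not prescribed by the lecture):

```python
# Sketch: the three commonly used kernels listed above.
import numpy as np

def polynomial(xi, xj, d=2):
    return (1.0 + np.dot(xi, xj)) ** d

def gaussian(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

def hyperbolic_tangent(xi, xj, c=-1.0):
    return np.tanh(np.dot(xi, xj) + c)

xi = np.array([1.0, 0.0])
xj = np.array([0.0, 1.0])
print(polynomial(xi, xj), gaussian(xi, xj), hyperbolic_tangent(xi, xj))
# 1.0  exp(-1) ≈ 0.3679  tanh(-1) ≈ -0.7616
```

Any of these can be dropped into the dual problem and the decision function wherever a dot product xi·xj appears.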
String Kernel
24
Kernels can also be defined on non-vector data, e.g., to measure the similarity between text strings (Car vs. Custard) by comparing shared substrings: K(car, car), K(cat, cat), K(car, cat), …
Solutions of w & b
25
With a kernel, every dot product is replaced by K(·,·):

Hij = yi yj K(xi, xj)

w·φ(x) = Σi αi yi φ(xi)·φ(x) = Σi αi yi K(xi, x)

b = (1/Ns) Σs∈S ( ys − Σm∈S αm ym K(xm, xs) )

g(x) = Σi αi yi K(xi, x) + b
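A minimal sketch of these formulas, using a linear kernel and the two-point example from earlier in the lecture ((1, 1, +1) and (0, 0, −1), for which α1 = α2 = 1 and both points are support vectors); the variable names are ours:

```python
# Sketch: computing b and g(x) from the formulas above with a linear kernel.
import numpy as np

X = np.array([[1.0, 1.0], [0.0, 0.0]])   # support vectors
y = np.array([1.0, -1.0])
alpha = np.array([1.0, 1.0])

def K(x, z):
    return np.dot(x, z)                  # linear kernel; swap in any kernel here

# b = (1/N_s) * sum_s ( y_s - sum_m alpha_m y_m K(x_m, x_s) )
b = np.mean([y_s - sum(a_m * y_m * K(x_m, x_s)
                       for a_m, y_m, x_m in zip(alpha, y, X))
             for x_s, y_s in zip(X, y)])

def g(x):
    return sum(a_i * y_i * K(x_i, x) for a_i, y_i, x_i in zip(alpha, y, X)) + b

print(b)                          # -1.0, matching the earlier example
print(g(np.array([2.0, 0.0])))    # 1.0 > 0, so this point is classified +1
```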
Decision Boundaries
26
[Figure: examples of nonlinear decision boundaries produced by kernel SVMs.]
More Maths …
27
Quadratic Optimization
Lagrange Duality
Karush–Kuhn–Tucker Conditions
SVM Roadmap
28
Linear Classifier → Maximum Margin → Linear SVM
Linear SVM + Noise → Soft Margin
Nonlinear Problem → map a, b → Φ(a), Φ(b) → High Computational Cost
Kernel Trick: K(a, b) = Φ(a)·Φ(b)
Reading Materials
Text Book
Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
Online Resources
http://www.kernel-machines.org
http://www.support-vector-machines.org
http://www.tristanfletcher.co.uk/SVM%20Explained.pdf
http://www.csie.ntu.edu.tw/~cjlin/libsvm
A list of papers uploaded to the web learning portal
Wikipedia & Google
29
Review
What is the definition of margin in a linear classifier?
Why do we want to maximize the margin?
What is the mathematical expression of the margin?
How do we solve the objective function in SVM?
What are support vectors?
What is a soft margin?
How does SVM solve nonlinear problems?
What is the so-called "kernel trick"?
What are the commonly used kernels?
30
Next Week's Class Talk
Volunteers are required for next week's class talk.
Topic: SVM in Practice
Hints:
Applications
Demos
Multi-Class Problems
Software: a very popular toolbox, LIBSVM
Any other interesting topics beyond this lecture
Length: 20 minutes plus question time
31
Any other interesting topics beyond this lecture
Length 20 minutes plus question time
31