Survey of Kernel Methods
by Jinsan Yang
(c) 2003 SNU Biointelligence Lab.
Introduction
Support Vector Machines: Formulation of SVM, Optimization Theorem, Dual Formulation of SVM
Reproducing Kernel Hilbert Space
Kernel Machines
SVM Formulation
Support vector classifiers: a separating hyperplane in the input space is
$L = \{ x : f(x) = \beta_0 + \beta' x = 0 \}$
Properties:
- $\beta^* = \beta / \|\beta\|$ is the unit vector normal to $L$: for any $x_1, x_2 \in L$, $\beta'(x_1 - x_2) = 0$.
- For any $x_0 \in L$, $\beta' x_0 = -\beta_0$.
- The signed distance from a point $x$ to $L$ is
$dist(x, L) = \beta^{*\prime}(x - x_0) = (\beta' x + \beta_0) / \|\beta\| = f(x) / \|f'(x)\|$
[Figure: a separating hyperplane in the $(x_1, x_2)$ plane with margin $C = 1/\|\beta\|$.]
SVM Formulation
[Figure: the decision boundary $\{(x_1, x_2) : \beta_0 + \beta' x = 0\}$ shown with the two margin hyperplanes $\{(x_1, x_2) : \beta_0 + \beta' x = \pm 1\}$.]
Optimal separating hyperplane
Optimize:
$\max_{\beta_0, \beta} C$ subject to $\frac{1}{\|\beta\|} y_i (x_i' \beta + \beta_0) \ge C, \; i = 1, \dots, N$, where $y_i \in \{-1, 1\}$ (i.e., $y_i \, dist(x_i, L) \ge C$).
Note: any positively scaled multiple of $(\beta, \beta_0)$ satisfies the constraints, so set $\|\beta\| = 1/C$. The problem becomes
$\min_{\beta_0, \beta} \frac{1}{2} \|\beta\|^2$ subject to $y_i (x_i' \beta + \beta_0) \ge 1, \; i = 1, \dots, N$
Optimization Theorem
Because of the many constraints, the SVM optimization problem is still too complicated to solve directly.
Change it to the corresponding dual formulation.
This requires some theorems about duality: the Kuhn-Tucker theorem, the Kuhn-Tucker saddle point condition (saddle point theorem), and Wolfe's theorem (existence of a dual solution).
Optimization Theorem
Generalization of the following optimization problems.
Theorem (Fermat, 1629): For a convex $f$, $w^*$ is a minimum of $f(w)$ iff
$\frac{\partial f(w^*)}{\partial w} = 0$
Theorem (Lagrange, 1797): For a convex Lagrangian
$L(w, \alpha) = f(w) + \sum_{i=1}^{m} \alpha_i h_i(w),$
$w^*$ is a minimum of $f(w)$ subject to $h_i(w) = 0, \; i = 1, \dots, m$ iff
$\frac{\partial L(w^*, \alpha^*)}{\partial w} = 0, \quad \frac{\partial L(w^*, \alpha^*)}{\partial \alpha} = 0$
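As a concrete check of Lagrange's condition, the stationarity equations $\partial L / \partial w = 0$ and $\partial L / \partial \alpha = 0$ form a linear system for a quadratic objective with an affine constraint. The toy problem below is my own illustration, not one from the slides:

```python
import numpy as np

# Toy instance of Lagrange's theorem (a sketch):
#   minimize f(w) = w1^2 + w2^2  subject to  h(w) = w1 + w2 - 1 = 0
# L(w, a) = w1^2 + w2^2 + a*(w1 + w2 - 1); stationarity gives
#   2*w1 + a = 0
#   2*w2 + a = 0
#   w1 + w2  = 1
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, 1.0])
w1, w2, a = np.linalg.solve(A, rhs)
print(w1, w2, a)  # -> 0.5 0.5 -1.0
```

The solution $w^* = (1/2, 1/2)$ is the closest point of the constraint line to the origin, as expected.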
Optimization Theorem
Kuhn and Tucker suggested a solution to the so-called convex optimization problem, where one minimizes a (convex) objective function under (convex) constraints of inequality type.
Problem: minimize $f(w)$
subject to:
$g_i(w) \le 0, \; i = 1, \dots, k$
$h_i(w) = 0, \; i = 1, \dots, m$
Generalized Lagrangian function:
$L(w, \alpha, \beta) = f(w) + \alpha' g(w) + \beta' h(w)$
Optimization Theorem
Lagrangian dual problem: maximize $\theta(\alpha, \beta)$ subject to $\alpha \ge 0$, where $\theta(\alpha, \beta) = \inf_w L(w, \alpha, \beta)$.
Theorem (weak duality theorem): for feasible solutions of the primal and dual problems, $\theta(\alpha, \beta) \le f(w)$.
Corollary: $\sup \{ \theta(\alpha, \beta) : \alpha \ge 0 \} \le \inf \{ f(w) : g(w) \le 0, \; h(w) = 0 \}$
Corollary: If $f(w^*) = \theta(\alpha^*, \beta^*)$ for feasible $w^*, \alpha^*, \beta^*$, then they are optimal and $\alpha_i^* g_i(w^*) = 0, \; i = 1, \dots, k$.
Duality gap: the difference between the optimal values of the primal and dual problems.
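Weak duality and a zero duality gap can be seen on a 1-D convex problem. The example below is my own, not from the slides: for $f(w) = w^2$ with $g(w) = 1 - w \le 0$, the dual function works out to $\theta(\alpha) = \alpha - \alpha^2/4$.

```python
# Sketch of weak/strong duality on a 1-D convex toy problem:
#   primal: minimize f(w) = w^2  subject to  g(w) = 1 - w <= 0
# L(w, a) = w^2 + a*(1 - w); for a >= 0 the inner inf over w is
# attained at w = a/2, giving theta(a) = a - a^2/4.
def f(w):
    return w * w

def theta(a):
    w = a / 2.0                    # argmin_w of the Lagrangian
    return w * w + a * (1.0 - w)

grid = [i / 100.0 for i in range(0, 501)]
dual_best = max(theta(a) for a in grid)                 # attained at a = 2
primal_best = min(f(w) for w in grid if 1.0 - w <= 0)   # attained at w = 1
print(dual_best, primal_best)  # both 1.0: zero duality gap
```

Every grid value satisfies $\theta(\alpha) \le f(w)$ for feasible $w$, and the two optima coincide, as strong duality predicts for this convex problem with an affine constraint.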
Optimization Theorem
Saddle point:
$L(w^*, \alpha, \beta) \le L(w^*, \alpha^*, \beta^*) \le L(w, \alpha^*, \beta^*)$
Theorem: $(w^*, \alpha^*, \beta^*)$ is a saddle point of the Lagrangian function for the primal problem iff there is no duality gap for the optimal solutions.
Theorem (strong duality theorem, Wolfe): if the domain of the primal problem is convex and the functions $h$ and $g$ are affine, the duality gap is zero.
Optimization Theorem
Theorem (Kuhn-Tucker, 1951): For a primal optimization problem with convex domain and affine $g$ and $h$, $w^*$ is an optimal solution iff there are $\alpha^*, \beta^*$ such that
$\frac{\partial L(w^*, \alpha^*, \beta^*)}{\partial w} = 0, \quad \frac{\partial L(w^*, \alpha^*, \beta^*)}{\partial \beta} = 0$
$\alpha_i^* g_i(w^*) = 0, \; i = 1, \dots, k$
$g_i(w^*) \le 0, \quad \alpha_i^* \ge 0, \; i = 1, \dots, k$
(the Kuhn-Tucker conditions)
Optimization Theorem
In the Kuhn-Tucker conditions, if $g_i(w^*) < 0$ then $\alpha_i^* = 0$, and in that case the corresponding constraint is inactive in the primal optimization problem (since $\alpha_i^* g_i(w^*) = 0$). The constraint can be active ($g_i(w^*) = 0$) with either $\alpha_i^* = 0$ or $\alpha_i^* > 0$.
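The active/inactive distinction can be checked numerically at the optimum of a small toy problem; the problem and the optimal point below are my own illustration:

```python
# KKT conditions on a toy problem with one active and one inactive constraint:
#   minimize f(w) = w^2
#   subject to g1(w) = 1 - w  <= 0   (forces w >= 1  -> active at the optimum)
#              g2(w) = -w - 5 <= 0   (forces w >= -5 -> inactive at the optimum)
# The optimum is w* = 1 with multipliers a1* = 2 and a2* = 0.
w, a1, a2 = 1.0, 2.0, 0.0

g1 = 1.0 - w                 # = 0  -> active,   paired with a1 > 0
g2 = -w - 5.0                # = -6 -> inactive, paired with a2 = 0
grad_L = 2.0 * w - a1 - a2   # dL/dw for L = w^2 + a1*g1 + a2*g2

print(grad_L)            # 0.0     (stationarity)
print(a1 * g1, a2 * g2)  # 0.0 0.0 (complementary slackness)
```

All four Kuhn-Tucker conditions hold: stationarity, complementary slackness, primal feasibility ($g_1, g_2 \le 0$), and dual feasibility ($\alpha_1, \alpha_2 \ge 0$).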
Dual form of SVM
Primal problem:
$\min_{w, b} \frac{1}{2} \|w\|^2$ subject to $y_i (w' x_i + b) \ge 1, \; i = 1, \dots, l$
Lagrangian:
$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{l} \alpha_i \left( y_i (w' x_i + b) - 1 \right)$
$\left( \frac{\partial L}{\partial w} = 0: \; w = \sum_{i=1}^{l} \alpha_i y_i x_i, \quad \frac{\partial L}{\partial b} = 0: \; \sum_{i=1}^{l} \alpha_i y_i = 0 \right)$
Dual problem: with $\theta(\alpha) = \inf_{w, b} L(w, b, \alpha)$,
$\max_{\alpha} \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$
subject to $\alpha_i \ge 0 \; (i = 1, \dots, l), \quad \sum_{i=1}^{l} \alpha_i y_i = 0$
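The dual above can be solved numerically. The following sketch runs projected gradient ascent on the dual objective for a two-point toy set; the data, step size, and solver choice are my own (a real implementation would use a QP solver):

```python
import numpy as np

# Minimal sketch of solving the SVM dual for a tiny symmetric toy set.
X = np.array([[1.0, 1.0], [-1.0, -1.0]])   # two points
y = np.array([1.0, -1.0])                   # their labels

K = X @ X.T                                 # Gram matrix <x_i, x_j>
Q = (y[:, None] * y[None, :]) * K           # y_i y_j <x_i, x_j>

alpha = np.zeros(2)
eta = 0.05                                  # step size
for _ in range(500):
    grad = 1.0 - Q @ alpha                  # gradient of the dual objective
    # project the gradient onto {d : sum_i d_i y_i = 0} so the equality
    # constraint stays satisfied, then take an ascent step and clip at 0
    grad -= (grad @ y) / (y @ y) * y
    alpha = np.maximum(alpha + eta * grad, 0.0)

w = (alpha * y) @ X                         # w = sum_i alpha_i y_i x_i
b = y[0] - w @ X[0]                         # support vector: y_i(w'x_i + b) = 1
print(alpha, w, b)  # alpha -> [0.25 0.25], w -> [0.5 0.5], b -> 0.0
```

Both points are support vectors ($\alpha_i > 0$), and the recovered hyperplane $w = (1/2, 1/2)$, $b = 0$ attains the maximal margin $1/\|w\| = \sqrt{2}$.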
Nonlinear SVM
Dual problem: replace the inner product $\langle x_i, x_j \rangle$ by a kernel evaluation,
$\max_{\alpha} \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \langle \Phi(x_i), \Phi(x_j) \rangle = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)$
subject to $\alpha_i \ge 0 \; (i = 1, \dots, l), \quad \sum_{i=1}^{l} \alpha_i y_i = 0$
Reproducing Kernel Hilbert Space
Dual representation of the hypothesis:
$f(x) = \langle w, x \rangle + b = \sum_{i=1}^{l} \alpha_i y_i \langle x_i, x \rangle + b$
Kernel: a function $K$ such that for all $x, z \in X$,
$K(x, z) = \langle \Phi(x), \Phi(z) \rangle$
Using a kernel, we can compute the inner product $\langle \Phi(x), \Phi(z) \rangle$ in the feature space directly as a function of the original input space:
$f(x) = \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b$
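A minimal numeric illustration of this identity, using a polynomial kernel and its explicit feature map (the kernel and sample points are my own choices):

```python
import math

# Kernel trick sketch: for x, z in R^2, the polynomial kernel
# K(x, z) = <x, z>^2 equals the ordinary dot product of the explicit
# feature map Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2) in R^3.
def K(x, z):
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

x, z = (1.0, 2.0), (3.0, -1.0)
lhs = K(x, z)                                     # kernel in input space
rhs = sum(p * q for p, q in zip(phi(x), phi(z)))  # dot product in feature space
print(lhs, rhs)  # both equal 1 (up to floating point)
```

The kernel evaluation needs $O(d)$ work in the input dimension, while the explicit map lives in a higher-dimensional space; for higher-degree polynomials or the RBF kernel this gap becomes the whole point of the trick.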
Reproducing Kernel Hilbert Space
For a given kernel, what is the corresponding feature mapping? (Answer: Mercer's theorem.)
Theorem (Mercer): If $k$ is a continuous symmetric kernel of a positive integral operator $K$, that is,
$(Kf)(y) = \int_C k(x, y) f(x) \, dx$
with
$\int_{C \times C} k(x, y) f(x) f(y) \, dx \, dy \ge 0$ for all $f \in L_2(C)$,
it can be expanded in a uniformly convergent series in terms of eigenfunctions $\psi_j$ and positive eigenvalues $\lambda_j$:
$k(x, y) = \sum_{j=1}^{N_F} \lambda_j \psi_j(x) \psi_j(y), \quad N_F \le \infty$
(cf. the matrix eigendecomposition $A = \lambda_1 v_1 v_1' + \lambda_2 v_2 v_2' + \cdots = \sum_m \lambda_m v_m v_m'$.)
Reproducing Kernel Hilbert Space
Note: construction of a feature map corresponding to a kernel $k$.
Proposition: If $k$ is a continuous kernel of a positive integral operator (positive semi-definite in the discrete case), one can construct a mapping $\Phi$ into a space where $k$ acts as a dot product:
$\Phi : x \mapsto (\sqrt{\lambda_1} \psi_1(x), \sqrt{\lambda_2} \psi_2(x), \dots)$
$\langle \Phi(x), \Phi(y) \rangle = k(x, y)$
The kernel $k$ in Mercer's theorem is called a Mercer kernel.
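In the discrete case the proposition can be checked directly: eigendecompose the Gram matrix and take $\Phi = V \sqrt{\Lambda}$. The sample points and the RBF kernel below are my own choices:

```python
import numpy as np

# Discrete analogue of the proposition: K = V diag(lam) V', feature map
# Phi = V * sqrt(lam), so that <Phi(x_i), Phi(x_j)> recovers K_ij.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))              # five sample points in R^2

def rbf(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

K = np.array([[rbf(a, b) for b in X] for a in X])

lam, V = np.linalg.eigh(K)               # eigenvalues ascending, columns = eigenvectors
Phi = V * np.sqrt(np.clip(lam, 0.0, None))   # row i is the feature vector of x_i
print(lam.min() > -1e-10)                # True: K is positive semi-definite
print(np.allclose(Phi @ Phi.T, K))       # True: the kernel acts as a dot product
```

The clipping guards against tiny negative eigenvalues produced by round-off; mathematically the RBF Gram matrix of distinct points is positive definite.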
Reproducing Kernel Hilbert Space
A vector space $X$ is called an inner product space if there is a real bilinear map $\langle \cdot, \cdot \rangle$ satisfying:
- $\langle x, y \rangle = \langle y, x \rangle$
- $\langle x, x \rangle \ge 0$, and $\langle x, x \rangle = 0$ iff $x = 0$
Hilbert space: a complete separable inner product space. (A space $H$ is separable if there exists a countable subset $D$ such that every element of $H$ is the limit of a sequence of elements of $D$.)
RKHS: a Hilbert space of functions $f$ on some set $C$ such that all evaluation functionals
$T_y(f) = f(y)$
are continuous (Wahba).
Reproducing Kernel Hilbert Space
Riesz representation theorem: Let $H$ be a Hilbert space and let $T \in H^*$ be given. Then there is a unique $f_0 \in H$ such that $T(f) = \langle f, f_0 \rangle$ for all $f \in H$, and $\|T\| = \|f_0\|$.
Recall: if $H_{RKHS}$ is an RKHS, then for each $y \in C$, $T_y : H_{RKHS} \to \mathbb{R}$ (defined as $T_y(f) = f(y)$) is continuous.
By the Riesz representation theorem, for each $y \in C$ there exists a unique function of $x$, say $k(\cdot, y) \in H_{RKHS}$, such that
$f(y) = \langle f, k(\cdot, y) \rangle \quad (*)$
Reproducing Kernel Hilbert Space
$\{ k(\cdot, y) : y \in C \}$ spans the whole RKHS: by (*), $\langle f, k(\cdot, y) \rangle = 0$ for all $y$ implies $f = 0$.
By (*),
$\langle k(\cdot, x), k(\cdot, y) \rangle = k(x, y) \quad (**)$
the inner product on the RKHS corresponds to a value of the reproducing kernel $k$.
Cf. $L_2(\mathbb{R}^n)$ is the completion of the continuous functions w.r.t. the $L_2$ norm.
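The reproducing property (**) can be exercised numerically on the span of finitely many kernel sections; the RBF kernel and sample points below are my own choices for the demonstration:

```python
import numpy as np

# Checking the reproducing property (**) on the span of {k(., x_i)}:
# for f = sum_i a_i k(., x_i), (**) gives <f, k(., y)> = sum_i a_i k(x_i, y),
# which must equal the pointwise evaluation f(y).
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))              # six points in R^3

def k(x, z):
    return np.exp(-0.5 * np.sum((x - z) ** 2))   # an RBF kernel

a = rng.normal(size=6)                   # coefficients of f

# f evaluated directly at each sample point
f_vals = np.array([sum(a[i] * k(X[j], X[i]) for i in range(6)) for j in range(6)])
# <f, k(., x_j)> expanded through (**): <k(., x_i), k(., x_j)> = k(x_i, x_j)
inner = np.array([sum(a[i] * k(X[i], X[j]) for i in range(6)) for j in range(6)])
print(np.allclose(f_vals, inner))        # True
```

For such finite expansions the inner product reduces to Gram-matrix arithmetic, $\langle f, g \rangle = a' K b$, which is how RKHS computations are carried out in practice.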
Reproducing Kernel Hilbert Space
For a Mercer kernel $k$, it is possible to construct a dot product such that $k$ becomes a reproducing kernel for a Hilbert space of functions of the form
$f(x) = \sum_i a_i k(x, x_i) = \sum_i a_i \sum_{j=1}^{N_F} \lambda_j \psi_j(x_i) \psi_j(x)$
(check) Since $k$ is symmetric, choose the $\psi_j$ orthogonal, with
$\langle \psi_n, \psi_j \rangle = \delta_{nj} / \lambda_j$
Then
$\langle f, k(\cdot, y) \rangle = \sum_i a_i \sum_{j,n=1}^{N_F} \lambda_j \psi_j(x_i) \, \lambda_n \psi_n(y) \, \langle \psi_j, \psi_n \rangle = \sum_i a_i \sum_{j=1}^{N_F} \lambda_j \psi_j(x_i) \psi_j(y) = f(y)$
Reproducing Kernel Hilbert Space
Feature space vs RKHS: the feature space is an RKHS. Rewriting the functions of the RKHS w.r.t. the orthonormal basis $(\sqrt{\lambda_n} \psi_n)_{n=1,\dots,N_F}$ of Mercer's theorem:
$f(x) = \sum_{n=1}^{N_F} \gamma_n \sqrt{\lambda_n} \psi_n(x) = \langle \gamma, \Phi(x) \rangle = \gamma' \Phi(x), \quad \text{where } \gamma_n = \sum_i a_i \sqrt{\lambda_n} \psi_n(x_i) \text{ for } f = \sum_i a_i k(\cdot, x_i)$
$\Phi(x)$ is nothing but the coordinate representation of the kernel as a function of one argument:
$\Phi(x) = (\sqrt{\lambda_1} \psi_1(x), \sqrt{\lambda_2} \psi_2(x), \dots) \leftrightarrow k(\cdot, x)$, so that $f(x) = \langle f, k(\cdot, x) \rangle$, where $k(\cdot, x) \in H_{RKHS}$.
Reproducing Kernel Hilbert Space
The representation ability of a kernel $k$ and $l$ data points: the corresponding feature space $H$ is spanned by $\{ k(\cdot, x_1), \dots, k(\cdot, x_l) \}$.
The feature mapping is w.r.t. the corresponding Mercer eigenfunctions, and an objective function $f(t)$ may be expressed as a linear combination of these eigenfunctions.
Since $H$ is an RKHS, any such nonlinear function $f(t)$ can be approximated with these kernels.
Example
Nonlinear regression for a training set $S = \{ (x_1, y_1), \dots, (x_l, y_l) \}$ generated from a target function $t(x)$.
Assume a dual representation:
$f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i)$
Minimize the norm
$\|f - t\|_H^2 = \left\langle \sum_{i=1}^{l} \alpha_i K(\cdot, x_i) - t, \; \sum_{i=1}^{l} \alpha_i K(\cdot, x_i) - t \right\rangle = \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j K(x_i, x_j) - 2 \sum_{i=1}^{l} \alpha_i y_i + \|t\|^2$
using the reproducing property $\langle K(\cdot, x_i), t \rangle = t(x_i) = y_i$.
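Setting the gradient of the last expression with respect to $\alpha$ to zero gives the linear system $K \alpha = y$. A sketch under my own choices of target $t(x) = \sin x$ and an RBF kernel:

```python
import numpy as np

# Minimizing sum_ij a_i a_j K_ij - 2 sum_i a_i y_i over alpha gives the
# normal equations K alpha = y.
def K_rbf(x, z, gamma=1.0):
    return np.exp(-gamma * (x - z) ** 2)

x_train = np.linspace(0.0, 2.0 * np.pi, 8)
y_train = np.sin(x_train)                       # y_i = t(x_i)

K = K_rbf(x_train[:, None], x_train[None, :])   # Gram matrix K_ij = K(x_i, x_j)
alpha = np.linalg.solve(K, y_train)             # minimizer of the quadratic

def f(x):                                       # dual representation of the fit
    return K_rbf(x, x_train) @ alpha

residual = max(abs(f(xi) - yi) for xi, yi in zip(x_train, y_train))
print(residual < 1e-6)   # True: f interpolates the training data
```

With an invertible Gram matrix the minimizer interpolates the data exactly; in practice one would add a regularization term (ridge), solving $(K + \mu I)\alpha = y$ instead, to trade interpolation for smoothness.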