Survey of Kernel Methods by Jinsan Yang



Page 1:

Survey of Kernel Methods

by Jinsan Yang

Page 2:

(c) 2003 SNU Biointelligence Lab.

Introduction

Support Vector Machines
- Formulation of SVM
- Optimization Theorem
- Dual Formulation of SVM

Reproducing Kernel Hilbert Space

Kernel Machines

Page 3:

SVM Formulation

Support vector classifiers: the separating hyperplane is
$$L = \{x : f(x) = \beta_0 + \beta' x = 0\}.$$

Properties of $L$:
- For $x_1, x_2 \in L$, $\beta'(x_1 - x_2) = 0$, so $\beta^* = \beta / \|\beta\|$ is the unit normal vector to $L$.
- For any $x_0 \in L$, $\beta' x_0 = -\beta_0$.
- The signed distance of a point $x$ to $L$ is
$$\mathrm{dist}(x, L) = \beta^{*\prime}(x - x_0) = (\beta' x + \beta_0) / \|\beta\| = f(x) / \|f'(x)\|.$$

[Figure: the hyperplane $L$ in the $(x_1, x_2)$ plane, with margin $C = 1/\|\beta\|$ on each side.]
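The signed-distance formula can be checked numerically; a minimal sketch assuming NumPy, with an illustrative hyperplane of my own choosing (not from the slides):

```python
import numpy as np

# Hyperplane f(x) = beta0 + beta'x with illustrative coefficients
beta = np.array([3.0, 4.0])        # normal direction, ||beta|| = 5
beta0 = -5.0

def signed_distance(x):
    """Signed distance of x to L = {x : f(x) = 0}: f(x) / ||beta||."""
    return (beta0 + beta @ x) / np.linalg.norm(beta)

print(signed_distance(np.array([3.0, 4.0])))   # 4.0: f(x) = 20, ||beta|| = 5
```

Points with positive distance lie on the side of $L$ that $\beta$ points toward; points on $L$ get distance zero.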

Page 4:

SVM Formulation

[Figure: the $(x_1, x_2)$ plane partitioned by the hyperplane $L = \{(x_1, x_2) : \beta_0 + \beta' x = 0\}$ into the half-spaces $\beta_0 + \beta' x > 0$ and $\beta_0 + \beta' x < 0$.]

Page 5:

Optimal separating hyperplane

Optimize:
$$\max_{\beta,\, \beta_0,\, \|\beta\| = 1} C \quad \text{subject to } y_i(x_i' \beta + \beta_0) \ge C, \; i = 1, \dots, N,$$
where $y_i \in \{-1, 1\}$.

Note: any positively scaled multiple of $(\beta, \beta_0)$ satisfies $y_i(x_i' \beta + \beta_0) \ge C \|\beta\|$, so we may drop the condition $\|\beta\| = 1$ and set $\|\beta\| = 1/C$. The problem becomes
$$\min_{\beta,\, \beta_0} \frac{1}{2} \|\beta\|^2 \quad \text{subject to } y_i(x_i' \beta + \beta_0) \ge 1, \; i = 1, \dots, N.$$

Page 6:

Optimization Theorem

Because of the many inequality constraints, the SVM optimization problem is still too complicated to solve directly.

We change it to the corresponding dual formulation.

This requires some theorems about duality:
- Kuhn-Tucker theorem
- Kuhn-Tucker saddle point condition (saddle point theorem)
- Wolfe (existence of a dual solution)

Page 7:

Optimization Theorem

Generalization of the following optimization problems:

Theorem (Fermat, 1629): For a convex $f$, $w^*$ is a minimum of $f(w)$ iff
$$\frac{\partial f(w^*)}{\partial w} = 0.$$

Theorem (Lagrange, 1797): For a convex Lagrangian
$$L(w, \beta) = f(w) + \sum_{i=1}^{m} \beta_i h_i(w),$$
$w^*$ is a minimum of $f(w)$ subject to $h_i(w) = 0$, $i = 1, \dots, m$, iff
$$\frac{\partial L(w^*, \beta^*)}{\partial w} = 0, \qquad \frac{\partial L(w^*, \beta^*)}{\partial \beta} = 0.$$

Page 8:

Optimization Theorem

Kuhn and Tucker suggested a solution to the so-called convex optimization problem, where one minimizes a certain type of (convex) objective function under certain (convex) constraints of inequality type.

Problem: minimize $f(w)$ subject to
$$g_i(w) \le 0, \; i = 1, \dots, k, \qquad h_i(w) = 0, \; i = 1, \dots, m.$$

Generalized Lagrangian function:
$$L(w, \alpha, \beta) = f(w) + \alpha' g(w) + \beta' h(w).$$

Page 9:

Optimization Theorem

Lagrangian dual problem: maximize $\theta(\alpha, \beta)$ subject to $\alpha \ge 0$, where
$$\theta(\alpha, \beta) = \inf_{w} L(w, \alpha, \beta).$$

Theorem (weak duality theorem): for feasible solutions of the primal and dual problems, $\theta(\alpha, \beta) \le f(w)$.

Corollary: $\sup\{\theta(\alpha, \beta) : \alpha \ge 0\} \le \inf\{f(w) : g(w) \le 0,\; h(w) = 0\}$.

Corollary: If $\theta(\alpha^*, \beta^*) = f(w^*)$ for feasible $w^*$ and $\alpha^* \ge 0$, then these are the optimal solutions and $\alpha_i^* g_i(w^*) = 0$, $i = 1, \dots, k$.

Duality gap: the difference between the primal and dual optimal values.

Page 10:

Optimization Theorem

Saddle point: $(w^*, \alpha^*, \beta^*)$ is a saddle point of the Lagrangian if
$$L(w^*, \alpha, \beta) \le L(w^*, \alpha^*, \beta^*) \le L(w, \alpha^*, \beta^*)$$
for all $w$, $\alpha \ge 0$, $\beta$.

Theorem: $(w^*, \alpha^*, \beta^*)$ is a saddle point of the Lagrangian function for the primal problem iff there is no duality gap for the optimal solutions.

Theorem (strong duality theorem, Wolfe): if the domain of the primal problem is convex and the functions $g$ and $h$ are affine, the duality gap is zero.

Page 11:

Optimization Theorem

Theorem (Kuhn-Tucker, 1951): For a primal optimization problem with convex domain and affine $g$ and $h$, $w^*$ is an optimal solution iff there are $\alpha^*, \beta^*$ such that
$$\frac{\partial L(w^*, \alpha^*, \beta^*)}{\partial w} = 0, \qquad \frac{\partial L(w^*, \alpha^*, \beta^*)}{\partial \beta} = 0,$$
$$\alpha_i^* g_i(w^*) = 0, \quad g_i(w^*) \le 0, \quad \alpha_i^* \ge 0, \quad i = 1, \dots, k.$$

(The Kuhn-Tucker conditions)
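The conditions can be verified on a one-variable toy problem (my illustration, not from the slides): minimize $f(w) = w^2$ subject to $g(w) = 1 - w \le 0$, with Lagrangian $L(w, \alpha) = w^2 + \alpha(1 - w)$. Stationarity gives $\alpha^* = 2w^*$, and complementary slackness forces $w^* = 1$, $\alpha^* = 2$:

```python
# Toy problem:  minimize f(w) = w^2  subject to  g(w) = 1 - w <= 0
# Generalized Lagrangian: L(w, a) = w^2 + a*(1 - w)
# Stationarity 2w - a = 0 and complementarity a*(1 - w) = 0 give w* = 1, a* = 2.
w_star, a_star = 1.0, 2.0

stationarity = 2 * w_star - a_star        # dL/dw at (w*, a*) -> 0
feasibility = 1 - w_star                  # g(w*) <= 0 -> 0 (active constraint)
complementarity = a_star * (1 - w_star)   # a* g(w*) -> 0

print(stationarity, feasibility, complementarity, a_star >= 0)
# 0.0 0.0 0.0 True
```

All four Kuhn-Tucker conditions hold, and the constraint is active with a strictly positive multiplier.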

Page 12:

Optimization Theorem

In the Kuhn-Tucker conditions, if $g_i(w^*) < 0$ then $\alpha_i^* = 0$, and in that case the corresponding constraint is inactive in the primal optimization problem (since $\alpha_i^* g_i(w^*) = 0$). The constraint can be active ($g_i(w^*) = 0$) when $\alpha_i^* > 0$.

Page 13:

Dual form of SVM

Primal problem:
$$\min_{w,\, b} \frac{1}{2} \|w\|^2 \quad \text{subject to } y_i(w' x_i + b) \ge 1, \; i = 1, \dots, l,$$
with Lagrangian
$$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{l} \alpha_i \left[ y_i(w' x_i + b) - 1 \right], \quad \alpha_i \ge 0.$$

Setting $\partial L / \partial w = 0$ and $\partial L / \partial b = 0$ gives
$$w = \sum_{i=1}^{l} \alpha_i y_i x_i, \qquad \sum_{i=1}^{l} \alpha_i y_i = 0.$$

Dual problem: with $\theta(\alpha) = \inf_{w, b} L(w, b, \alpha)$,
$$\max_{\alpha} \; \theta(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \, x_i' x_j$$
$$\text{subject to } \alpha_i \ge 0 \; (i = 1, \dots, l), \quad \sum_{i=1}^{l} \alpha_i y_i = 0.$$
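The dual can be solved numerically on a small example; a sketch assuming NumPy and SciPy, with illustrative data of my own (two points per class on the line $x_2 = x_1$):

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (illustrative)
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

G = (y[:, None] * X) @ (y[:, None] * X).T      # G_ij = y_i y_j x_i' x_j

def neg_dual(a):
    """Negated dual objective: minimize 1/2 a'Ga - sum(a)."""
    return 0.5 * a @ G @ a - a.sum()

res = minimize(neg_dual, np.zeros(len(y)),
               bounds=[(0.0, None)] * len(y),
               constraints={'type': 'eq', 'fun': lambda a: a @ y})
a = res.x
w = (a * y) @ X                                 # w = sum_i a_i y_i x_i

# Recover b from a support vector (a_i > 0): y_i (w'x_i + b) = 1
sv = np.argmax(a)
b = y[sv] - w @ X[sv]
print(w, b)                                     # about [0.5 0.5] and 0.0
```

Only the two points nearest the boundary, $(1,1)$ and $(-1,-1)$, receive nonzero $\alpha_i$; they are the support vectors.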

Page 14:

Nonlinear SVM

Dual problem: replacing the inner product $x_i' x_j$ by an inner product in feature space,
$$\max_{\alpha} \; \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \, \langle \Phi(x_i), \Phi(x_j) \rangle
= \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)$$
$$\text{subject to } \alpha_i \ge 0 \; (i = 1, \dots, l), \quad \sum_{i=1}^{l} \alpha_i y_i = 0.$$

Page 15:

Reproducing Kernel Hilbert Space

Dual representation of the hypothesis:
$$f(x) = w' \Phi(x) + b = \sum_{i=1}^{l} \alpha_i y_i \langle \Phi(x_i), \Phi(x) \rangle + b.$$

Kernel: a function $K$ such that for all $x, z \in X$,
$$K(x, z) = \langle \Phi(x), \Phi(z) \rangle.$$

Using a kernel, we can compute the inner product $\langle \Phi(x), \Phi(z) \rangle$ in the feature space directly as a function of the original input space:
$$f(x) = \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b.$$
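A minimal sketch of evaluating $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$ with a Gaussian RBF kernel; the kernel choice, data, and dual variables below are illustrative assumptions, not fitted values:

```python
import numpy as np

def k(x, z, gamma=1.0):
    """Gaussian RBF kernel k(x, z) = exp(-gamma ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Illustrative training points and dual variables (assumed, not fitted)
X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])
b = 0.0

def f(x):
    """Dual representation f(x) = sum_i alpha_i y_i k(x_i, x) + b."""
    return sum(a * yi * k(xi, x) for a, yi, xi in zip(alpha, y, X)) + b

print(f(np.array([0.0, 0.0])) > 0)   # True: nearer the +1 training point
```

No explicit feature map $\Phi$ is ever computed; only kernel evaluations on input points are needed.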

Page 16:

Reproducing Kernel Hilbert Space

For a given kernel, what is the corresponding feature mapping? (Answer: Mercer's theorem.)

Theorem (Mercer): If $k$ is a continuous symmetric kernel of a positive integral operator $K$, that is,
$$(Kf)(y) = \int_C k(x, y) f(x)\, dx \quad \text{with} \quad \int_C \int_C k(x, y) f(x) f(y)\, dx\, dy \ge 0 \; \text{for all } f \in L_2(C),$$
then it can be expanded in a uniformly convergent series in terms of eigenfunctions $\psi_j$ and positive eigenvalues $\lambda_j$:
$$k(x, y) = \sum_{j=1}^{N_F} \lambda_j \psi_j(x) \psi_j(y), \quad N_F \le \infty.$$

(Cf. the finite-dimensional spectral decomposition $A = \lambda_1 v_1 v_1' + \lambda_2 v_2 v_2' + \cdots$ with $A v_m = \lambda_m v_m$.)

Page 17:

Reproducing Kernel Hilbert Space

Note: construction of a feature map corresponding to a kernel $k$.

Proposition: If $k$ is a continuous kernel of a positive integral operator (positive semi-definite in the discrete case), one can construct a mapping
$$\Phi : x \mapsto \left( \sqrt{\lambda_1}\, \psi_1(x), \sqrt{\lambda_2}\, \psi_2(x), \dots \right)$$
into a space where $k$ acts as a dot product:
$$\langle \Phi(x), \Phi(y) \rangle = k(x, y).$$

A kernel $k$ as in Mercer's theorem is called a Mercer kernel.
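In the discrete case, Mercer positivity means the Gram matrix on any finite point set is positive semi-definite; a quick numerical check, assuming NumPy and using the Gaussian RBF kernel as an example of a Mercer kernel:

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix K_ij = exp(-gamma ||x_i - x_j||^2) for the RBF kernel."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)

# Positive semi-definiteness on an arbitrary point set:
# all eigenvalues of K should be nonnegative (up to floating-point error).
X = np.random.default_rng(0).normal(size=(10, 3))
K = rbf_gram(X)
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # True
```

The eigendecomposition of this Gram matrix is exactly the discrete analogue of the eigenfunction expansion in Mercer's theorem.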

Page 18:

Reproducing Kernel Hilbert Space

A vector space $X$ is called an inner product space if there is a real bilinear map $\langle \cdot, \cdot \rangle$ satisfying:
- $\langle x, y \rangle = \langle y, x \rangle$
- $\langle x, x \rangle \ge 0$, and $\langle x, x \rangle = 0$ iff $x = 0$

Hilbert space: a complete separable inner product space. (A space $H$ is separable if there exists a countable subset $D$ such that every element of $H$ is the limit of a sequence of elements of $D$.)

RKHS: a reproducing kernel Hilbert space is a Hilbert space of functions $f$ on some set $C$ such that all evaluation functionals
$$T_y(f) = f(y)$$
are continuous. (Wahba)

Page 19:

Reproducing Kernel Hilbert Space

Riesz representation theorem: Let $H$ be a Hilbert space and let $T \in H^*$ be given. Then there is a unique $f_0 \in H$ such that
$$T(f) = \langle f, f_0 \rangle \text{ for all } f \in H, \quad \text{and} \quad \|T\| = \|f_0\|.$$

Recall: if $H_{RKHS}$ is a RKHS, then for each $y \in C$, the functional $T_y : H_{RKHS} \to \mathbb{R}$, $T_y(f) = f(y)$, is continuous.

By the Riesz representation theorem, for each $y \in C$ there exists a unique function of $x$, say $k(\cdot, y) \in H_{RKHS}$, such that
$$f(y) = \langle f, k(\cdot, y) \rangle. \tag{*}$$

Page 20:

Reproducing Kernel Hilbert Space

$\{k(\cdot, y) : y \in C\}$ spans the whole RKHS: by (*), $\langle f, k(\cdot, y) \rangle = 0$ for all $y$ implies $f = 0$.

By (*),
$$\langle k(\cdot, x), k(\cdot, y) \rangle = k(\cdot, y)(x) = k(x, y). \tag{**}$$
The inner product on the RKHS corresponds to a value of the reproducing kernel $k$.

Cf. $L_2(\mathbb{R}^n)$ is the completion of the continuous functions wrt. the $L_2$-norm.

Page 21:

Reproducing Kernel Hilbert Space

For a Mercer kernel $k$, it is possible to construct a dot product such that $k$ becomes a reproducing kernel for a Hilbert space of functions of the form
$$f(x) = \sum_i a_i\, k(x_i, x) = \sum_i a_i \sum_{j=1}^{N_F} \lambda_j \psi_j(x_i) \psi_j(x).$$

(Check) Since $k$ is symmetric, choose the $\psi_j$ orthogonal with
$$\langle \psi_j, \psi_n \rangle = \delta_{jn} / \lambda_j.$$
Then
$$\langle f, k(\cdot, y) \rangle = \sum_i a_i \sum_{j,n=1}^{N_F} \lambda_j \psi_j(x_i)\, \lambda_n \psi_n(y)\, \langle \psi_j, \psi_n \rangle = \sum_i a_i \sum_{j=1}^{N_F} \lambda_j \psi_j(x_i) \psi_j(y) = f(y).$$

Page 22:

Reproducing Kernel Hilbert Space

Feature space vs RKHS: the feature space is a RKHS. Rewriting the functions of the RKHS wrt. the orthonormal basis $(\sqrt{\lambda_n}\, \psi_n)_{n=1}^{N_F}$ from Mercer's theorem,
$$f = \sum_{n=1}^{N_F} \alpha_n \sqrt{\lambda_n}\, \psi_n, \qquad
f(x) = \langle f, k(\cdot, x) \rangle = \sum_{n=1}^{N_F} \alpha_n \sqrt{\lambda_n}\, \psi_n(x) = \langle \alpha, \Phi(x) \rangle,$$
so that $\Phi(x) = \left( \sqrt{\lambda_n}\, \psi_n(x) \right)_{n=1}^{N_F}$.

$\Phi(x)$ is nothing but the coordinate representation of the kernel as a function of one argument:
$$(\Phi(x))_n = \langle \sqrt{\lambda_n}\, \psi_n, k(\cdot, x) \rangle, \quad \text{where } k(\cdot, x) \in H_{RKHS}.$$

Page 23:

Reproducing Kernel Hilbert Space

The representation ability of a kernel $k$ and $l$ data points: the corresponding feature space $H$ is spanned by
$$\{ k(\cdot, x_1), \dots, k(\cdot, x_l) \}.$$

The feature mapping is wrt. the corresponding Mercer eigenfunctions, and an objective function $f(t)$ may be expressed as a linear combination of these eigenfunctions.

Since $H$ is a RKHS, any such nonlinear function $f(t)$ can be approximated with these kernels.

Page 24:

Example

Nonlinear regression for a training set $S = \{(x_1, y_1), \dots, (x_l, y_l)\}$ generated from a target function $t(x)$.

Assume a dual representation
$$f(x) = \sum_{i=1}^{l} \alpha_i K(x_i, x).$$

Minimize the norm
$$\|f - t\|_H^2 = \left\| \sum_{i=1}^{l} \alpha_i K(x_i, \cdot) - t \right\|_H^2
= \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j K(x_i, x_j) - 2 \sum_{i=1}^{l} \alpha_i \langle K(x_i, \cdot), t \rangle + \|t\|_H^2.$$
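Since $\langle K(x_i, \cdot), t \rangle = t(x_i)$ by the reproducing property, setting the gradient in $\alpha$ to zero on observed values $y_i = t(x_i)$ gives the linear system $K\alpha = y$. A minimal sketch assuming NumPy; the target function and kernel width are illustrative:

```python
import numpy as np

# Fit the dual coefficients alpha by solving K alpha = y on the training inputs
X = np.linspace(-1.0, 1.0, 8)[:, None]
y_train = np.sin(3 * X[:, 0])                 # observed values y_i = t(x_i)

gamma = 2.0
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-gamma * sq)                       # Gram matrix K_ij = K(x_i, x_j)
alpha = np.linalg.solve(K, y_train)           # first-order condition K alpha = y

def f(x):
    """Dual representation f(x) = sum_i alpha_i K(x_i, x)."""
    k_x = np.exp(-gamma * np.sum((X - x) ** 2, axis=1))
    return k_x @ alpha

# f interpolates the training data: f(x_i) = y_i
print(abs(f(X[0]) - y_train[0]))
```

In practice one regularizes, solving $(K + \lambda I)\alpha = y$ instead, since the exact interpolant can be ill-conditioned and overfit noisy targets.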