ADVANCED TOPIC: KERNELS

The kernel trick
Remember in our alternate perceptron: the learned weight vector is just a sum over the examples we made mistakes on,

    v_k = y_{i_1} x_{i_1} + ... + y_{i_k} x_{i_k},   where i_1, ..., i_k are the mistakes...

so the prediction on a new example x is

    sign(v_k . x) = sign( y_{i_1} (x_{i_1} . x) + ... + y_{i_k} (x_{i_k} . x) ).
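
A minimal sketch of this dual view in NumPy (the names dual_perceptron and dual_predict are mine, not from the slides): the weight vector is never formed explicitly, since both training and prediction only need dot products between the stored mistake examples and the current x.

    import numpy as np

    def dual_perceptron(X, y, epochs=10):
        """Train a perceptron, but store only the indices of the mistakes.
        The implicit weight vector is v = sum over mistakes j of y[j] * X[j]."""
        mistakes = []
        for _ in range(epochs):
            for i in range(len(X)):
                # v . x_i, written purely in terms of dot products with past mistakes
                score = sum(y[j] * np.dot(X[j], X[i]) for j in mistakes)
                if y[i] * score <= 0:        # wrong (or undecided): record a mistake
                    mistakes.append(i)
        return mistakes

    def dual_predict(mistakes, X, y, x_new):
        score = sum(y[j] * np.dot(X[j], x_new) for j in mistakes)
        return 1 if score > 0 else -1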

The kernel trick – con't
Since v_k = y_{i_1} x_{i_1} + ... + y_{i_k} x_{i_k}, where i_1, ..., i_k are the mistakes, the prediction is

    sign( y_{i_1} (x_{i_1} . x) + ... + y_{i_k} (x_{i_k} . x) ).

Now consider a preprocessor that replaces every x with x', which includes, directly in the example, all the pairwise variable interactions, so that what is learned is a vector v'. This has an advantage: everything is in terms of dot products, so I can stick my preprocessor right there, just before the dot product gets called.
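
A sketch of one such preprocessor, under the assumption that "pairwise variable interactions" means every product x_i * x_j (the helper name add_pairwise is mine): the dual perceptron above can then be run unchanged on the expanded vectors x'.

    import numpy as np
    from itertools import combinations_with_replacement

    def add_pairwise(x):
        """Map x to x': the original features plus every pairwise product x_i * x_j."""
        idx = range(len(x))
        pairs = [x[i] * x[j] for i, j in combinations_with_replacement(idx, 2)]
        return np.concatenate([x, pairs])

    # Preprocess every example once, then learn a linear v' over the expanded data:
    # X_prime = np.array([add_pairwise(x) for x in X])
    # mistakes = dual_perceptron(X_prime, y)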

The kernel trick – con't
A voted perceptron over vectors like u, v is a linear function applied to the features of x. Replacing u with u' (the expanded vector) would lead to non-linear functions of the original features, e.g. f(x, y, xy, x^2, ...).

The kernel trick – con't
But notice... if we replace u.v with (u.v + 1)^2, then

    (u.v + 1)^2 = sum_i sum_j (u_i u_j)(v_i v_j) + 2 sum_i u_i v_i + 1.

Compare this to the explicit inner product u'.v' of the expanded vectors.

The kernel trick – con't
So, up to constants on the cross-product terms, (u.v + 1)^2 computes the same thing as the inner product of the expanded vectors. Why not replace the computation of u'.v' with the computation of K(u, v), where K(u, v) = (u.v + 1)^2? (A small numerical check of this appears after the last slide.)

The kernel trick – con't
Consider a preprocessor that replaces every x with x' to include, directly in the example, all the pairwise variable interactions, so what is learned is a vector v'. I can stick my preprocessor here, before the dot product gets called. Better yet: use K(x_i, x) = (x_i . x + 1)^2 directly. No preprocessor! I never build x'!

Example of separability (figure)

Some results with polynomial kernels (figures)

The kernel trick – con't
General idea: replace an expensive preprocessor x -> x' and the ordinary inner product with no preprocessor and a function K(x, x_i), where K(x, x_i) = x' . x_i'. This is really useful when you want to learn over objects x with some non-trivial structure. (A kernelized version of the perceptron is sketched after the last slide.)

The kernel trick – con't
Even more general idea: use any function K that is continuous, symmetric (i.e., K(u, v) = K(v, u)), and positive semidefinite (i.e., for any x_1, ..., x_n, the matrix M[i,j] = K(x_i, x_j) has no negative eigenvalues). Then, by an ancient theorem due to Mercer, K corresponds to some combination of a preprocessor and an inner product: i.e., K(u, v) = phi(u) . phi(v) for some mapping phi. Terminology: K is a Mercer kernel. The space the mapped examples x' live in is a reproducing kernel Hilbert space (RKHS). The matrix M[i,j] = K(x_i, x_j) is a Gram matrix.
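
To make the "up to constants on the cross-product terms" remark concrete, here is a small numerical check (assuming NumPy; the helper expand is mine): with the pairwise products, a sqrt(2)-scaled copy of the original features, and a constant 1, the explicit inner product equals (u.v + 1)^2 exactly.

    import numpy as np

    def expand(u):
        """phi(u): all pairwise products u_i*u_j, the original features scaled
        by sqrt(2), and a constant 1 -- chosen so phi(u).phi(v) == (u.v + 1)**2."""
        pairs = np.outer(u, u).ravel()          # u_i * u_j for all i, j
        return np.concatenate([pairs, np.sqrt(2) * u, [1.0]])

    rng = np.random.default_rng(0)
    u, v = rng.normal(size=5), rng.normal(size=5)

    lhs = np.dot(expand(u), expand(v))   # explicit preprocessor + inner product
    rhs = (np.dot(u, v) + 1) ** 2        # the kernel: no expanded vectors built
    print(np.isclose(lhs, rhs))          # True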
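
And here is the dual perceptron from the first sketch with the dot product swapped for a kernel, a sketch assuming the polynomial kernel K(u, v) = (u.v + 1)^2 from the slides: no preprocessor runs and no x' is ever built.

    import numpy as np

    def poly_kernel(u, v):
        return (np.dot(u, v) + 1) ** 2   # K(u, v) = (u.v + 1)^2

    def kernel_perceptron(X, y, K=poly_kernel, epochs=10):
        """Dual perceptron with K in place of the plain dot product."""
        mistakes = []
        for _ in range(epochs):
            for i in range(len(X)):
                score = sum(y[j] * K(X[j], X[i]) for j in mistakes)
                if y[i] * score <= 0:
                    mistakes.append(i)
        return mistakes

    def kernel_predict(mistakes, X, y, x_new, K=poly_kernel):
        score = sum(y[j] * K(X[j], x_new) for j in mistakes)
        return 1 if score > 0 else -1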
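
Finally, a small sketch of the Gram-matrix view (again assuming NumPy): build M[i,j] = K(x_i, x_j) on a sample and spot-check the Mercer conditions, symmetry and positive semidefiniteness (no negative eigenvalues, up to numerical tolerance). This checks the conditions on one sample only; it is not a proof that K is a Mercer kernel.

    import numpy as np

    def gram_matrix(X, K):
        """M[i,j] = K(x_i, x_j) for every pair of rows of X."""
        n = len(X)
        return np.array([[K(X[i], X[j]) for j in range(n)] for i in range(n)])

    rng = np.random.default_rng(1)
    X = rng.normal(size=(20, 4))
    K = lambda u, v: (np.dot(u, v) + 1) ** 2   # the polynomial kernel again

    M = gram_matrix(X, K)
    print(np.allclose(M, M.T))                     # symmetric
    print(np.linalg.eigvalsh(M).min() >= -1e-8)    # eigenvalues >= 0 (PSD)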