Class 19, spring 2001

CBCL/AI MIT

Review by Evgeniou, Pontil and Poggio, Advances in Computational Mathematics, 2000

The “b” problem

We said that the solution to

$$\min_{f \in H} \; \frac{1}{l} \sum_{i=1}^{l} V(f(x_i) - y_i) + \lambda \|f\|_K^2$$

with K positive definite, is

$$f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i)$$


For RN, when K is a positive definite kernel, the solution is

$$f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i)$$

For SVM, ever since Vapnik introduced them, the solution is given as

$$f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i) + b$$

even when K is a positive definite kernel.

This is puzzling, since we have seen that RN and SVM are the same apart from the loss function, which is independent of K. What is the solution of this small puzzle?


The “b” problem: Regularization Networks

We know that the function f that solves

$$\min_{f \in H} \; \frac{1}{l} \sum_{i=1}^{l} (f(x_i) - y_i)^2 + \lambda \|f\|_K^2$$

has the form

$$f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i) + p(x)$$

where the kernel K is conditionally positive definite of order k and p(x) is a polynomial of degree (k-1).

Wahba 1990; Poggio and Girosi, 1989; Smale and Cucker, 2001


Definition: Positive definite kernels

A real-valued kernel K(x, y) is positive definite if and only if K is symmetric and

$$\sum_{j,k=1}^{n} c_j c_k K(x_j, x_k) \ge 0$$

for any n distinct points x_1, ..., x_n in the bounded domain X and any real scalars c_1, ..., c_n.


Definition: Conditionally positive definite kernels of order 1 (in this case -K is called negative definite)

A real-valued kernel K(x, y) is conditionally positive definite of order 1 if and only if K is symmetric and, for any n distinct points x_1, ..., x_n in the domain X and scalars c_1, ..., c_n such that

$$\sum_{i=1}^{n} c_i = 0$$

the quadratic form is nonnegative, that is

$$\sum_{j,k=1}^{n} c_j c_k K(x_j, x_k) \ge 0$$
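The two definitions can be checked numerically on small examples. The sketch below is illustrative only: the Gaussian kernel and the kernel -|x - y| are choices made here, not kernels named in the slides, and `quad_form` is a hypothetical helper. The kernel -|x - y| is a standard example that is conditionally positive definite of order 1 without being positive definite.

```python
import math
import random

def quad_form(kernel, xs, c):
    """Return sum_{j,k} c_j c_k K(x_j, x_k)."""
    return sum(cj * ck * kernel(xj, xk)
               for cj, xj in zip(c, xs)
               for ck, xk in zip(c, xs))

def gaussian(x, y):
    # Positive definite kernel (illustrative choice).
    return math.exp(-(x - y) ** 2)

def neg_distance(x, y):
    # Conditionally positive definite of order 1, but not positive definite.
    return -abs(x - y)

xs = [0.0, 1.0]
print(quad_form(neg_distance, xs, [1.0, 1.0]))   # -2.0: negative for unconstrained c
print(quad_form(neg_distance, xs, [1.0, -1.0]))  # 2.0: nonnegative once sum(c) == 0

# Random check on six points: any c for the Gaussian kernel, zero-sum c for -|x - y|.
random.seed(0)
pts = [random.uniform(-3, 3) for _ in range(6)]
for _ in range(100):
    c = [random.gauss(0, 1) for _ in pts]
    mean = sum(c) / len(c)
    c0 = [ci - mean for ci in c]  # project c onto the constraint sum(c) == 0
    assert quad_form(gaussian, pts, c) >= -1e-12
    assert quad_form(neg_distance, pts, c0) >= -1e-12
```

The small negative tolerance only absorbs floating-point rounding; the exact quadratic forms are nonnegative.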


Notice that positive definite kernels are conditionally positive definite of order 1 (from the definition).

Thus in RN with a positive definite K we can always look for a solution

$$f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i)$$

but also (considering K to be conditionally positive definite)

$$f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i) + b$$

What does it mean?


Theorem:

If K is positive definite then

$$K'(x_i, x_j) = K(x_i, x_j) - \mathrm{const}$$

is conditionally positive definite of order 1.

The proof is easy from Corollary 2.1 b of Micchelli, 1986. Notice that this corresponds to subtracting out the constant term in the expansion of K in the eigenfunctions, which in turn means that the constant term in the expansion of f is not penalized in the RKHS norm.
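As a quick sanity check, separate from the route through Micchelli's corollary cited on the slide, the claim can also be verified directly from the definition. For any scalars c_1, ..., c_n with zero sum, the constant drops out of the quadratic form:

$$\sum_{j,k=1}^{n} c_j c_k K'(x_j, x_k) = \sum_{j,k=1}^{n} c_j c_k K(x_j, x_k) - \mathrm{const} \left( \sum_{j=1}^{n} c_j \right)^2 = \sum_{j,k=1}^{n} c_j c_k K(x_j, x_k) \ge 0$$

The last inequality is the positive definiteness of K, and K' inherits symmetry from K.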


Reproducing Kernel Hilbert Spaces

Given a positive definite function

$$K(x, t) = \sum_{n=0}^{N} \lambda_n \phi_n(x) \phi_n(t)$$

on a bounded domain, the space

$$H = \left\{ f : f(x) = \sum_{n=0}^{N} c_n \phi_n(x) \right\}$$

is a RKHS with K as the reproducing kernel.

The RKHS norm is

$$\|f\|_K^2 = \sum_{n=0}^{N} \frac{c_n^2}{\lambda_n}$$


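Suppose in RN we start with K positive definite and then consider K' = K - const, which is conditionally positive definite. If I regularize with K', that is, I look for

$$\min_{f \in H} \; \frac{1}{l} \sum_{i=1}^{l} (f(x_i) - y_i)^2 + \lambda \|f\|_{K'}^2$$

the solution is

$$f(x) = \sum_{i=1}^{l} \alpha_i K'(x, x_i) + b = \sum_{i=1}^{l} \alpha_i K(x, x_i) + b$$

The second equality holds because of the condition on the coefficients, $\sum_{i=1}^{l} \alpha_i = 0$, under which the constant cancels.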


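In this case the coefficients are given by solving

$$(K' + \lambda I)\alpha + \mathbf{1} b = (K + \lambda I)\alpha + \mathbf{1} b = y$$

$$\mathbf{1}^T \alpha = 0$$

(here and below, the factor l coming from the 1/l normalization is absorbed into λ).

These equations correspond to the minimization problem with K'.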
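The linear system above can be solved as one augmented system in the unknowns (α, b). The following is a minimal sketch, assuming a Gaussian kernel, a small made-up data set, and an arbitrary constant of 0.5 for K'; `solve` and `fit_with_b` are hypothetical helper names, not from the slides. It also checks that K and K' lead to the same α and b.

```python
import math

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def fit_with_b(kernel, xs, ys, lam):
    """Solve (K + lam I) alpha + 1 b = y together with 1^T alpha = 0."""
    n = len(xs)
    A = [[kernel(xi, xj) + (lam if i == j else 0.0)
          for j, xj in enumerate(xs)] + [1.0]
         for i, xi in enumerate(xs)]
    A.append([1.0] * n + [0.0])  # constraint row: sum of alpha equals 0
    z = solve(A, list(ys) + [0.0])
    return z[:n], z[n]

def gaussian(x, y):
    # Positive definite kernel K (illustrative choice).
    return math.exp(-(x - y) ** 2)

def shifted(x, y, const=0.5):
    # K' = K - const; the value 0.5 is arbitrary.
    return gaussian(x, y) - const

xs = [-1.0, 0.0, 0.5, 2.0]   # made-up inputs
ys = [1.2, 0.7, 1.9, 3.1]    # made-up targets
alpha, b = fit_with_b(gaussian, xs, ys, lam=0.1)
alpha_p, b_p = fit_with_b(shifted, xs, ys, lam=0.1)
print(sum(alpha), b, b_p)
```

Up to rounding, the two calls return the same coefficients, because the constraint row forces the constant in K' to cancel.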


If I use K positive definite and I look for

$$\min_{f \in H} \; \frac{1}{l} \sum_{i=1}^{l} (f(x_i) - y_i)^2 + \lambda \|f\|_K^2$$

the solution is

$$f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i)$$

with

$$(K + \lambda I)\alpha = y$$


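Thus I have two different approximations for f, corresponding to whether constant shifts in f are penalized or not. The two representations are

$$f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i) + b$$

and

$$f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i)$$

In the case of RN (quadratic loss function), the coefficient vector of the first representation equals

$$\alpha - (K + \lambda I)^{-1} \mathbf{1} b$$

where $\alpha = (K + \lambda I)^{-1} y$ is the coefficient vector of the second.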
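The difference between the two representations shows up when a constant is added to the targets. The sketch below, assuming a Gaussian kernel and made-up data (all helper names are hypothetical), fits both forms before and after shifting y by 10. It also checks the RN relation between the two coefficient vectors, α - (K + λI)^{-1} 1 b, which follows from subtracting the two linear systems.

```python
import math

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def gram(xs, lam):
    # K + lam I for a Gaussian kernel (illustrative choice).
    return [[math.exp(-(xi - xj) ** 2) + (lam if i == j else 0.0)
             for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]

def fit_no_b(xs, ys, lam):
    # (K + lam I) alpha = y
    return solve(gram(xs, lam), list(ys))

def fit_with_b(xs, ys, lam):
    # (K + lam I) alpha + 1 b = y, with 1^T alpha = 0
    n = len(xs)
    A = [row + [1.0] for row in gram(xs, lam)] + [[1.0] * n + [0.0]]
    z = solve(A, list(ys) + [0.0])
    return z[:n], z[n]

def predict(alpha, b, xs, x):
    return sum(a * math.exp(-(x - xi) ** 2) for a, xi in zip(alpha, xs)) + b

xs = [-1.0, 0.0, 0.5, 2.0]   # made-up inputs
ys = [1.2, 0.7, 1.9, 3.1]    # made-up targets
lam, shift, x_test = 0.1, 10.0, 1.25

alpha = fit_no_b(xs, ys, lam)
alpha_b, b = fit_with_b(xs, ys, lam)

# Coefficient relation: alpha_b = alpha - (K + lam I)^{-1} 1 b
u = solve(gram(xs, lam), [1.0] * len(xs))
relation_gap = max(abs(ab - (a - b * ui)) for ab, a, ui in zip(alpha_b, alpha, u))

# Shift every target by the same constant and refit both forms.
ys_shifted = [y + shift for y in ys]
alpha_s = fit_no_b(xs, ys_shifted, lam)
alpha_bs, b_s = fit_with_b(xs, ys_shifted, lam)
change_no_b = predict(alpha_s, 0.0, xs, x_test) - predict(alpha, 0.0, xs, x_test)
change_with_b = predict(alpha_bs, b_s, xs, x_test) - predict(alpha_b, b, xs, x_test)
print(change_no_b, change_with_b)
```

With the offset b the fitted function moves by exactly the shift, since only b changes. Without b, the constant is penalized and the fit moves by less.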


The same (but not the last equation) is true for SVMs. Thus Vapnik’s solution corresponds to using a conditionally positive definite K' instead of K in the RKHS norm, thereby NOT penalizing constants in f. This choice depends on the problem, but it is often reasonable, especially in classification problems. On the other hand, our argument shows that we can directly use a positive definite stabilizer in SVM. This implies that the QP problem can be solved WITHOUT the constraint

$$\sum_{i=1}^{l} \alpha_i = 0$$

which makes the QP optimization an easier problem…
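Without the equality constraint, only box constraints remain in the SVM dual, and these can be handled one coordinate at a time. The sketch below uses plain coordinate ascent, which is a technique chosen here for illustration and not a method given in the slides. It assumes the standard hinge-loss dual with nonnegative multipliers a_i, so that the expansion coefficients are α_i = a_i y_i, together with a Gaussian kernel and a made-up one-dimensional data set.

```python
import math

def kernel(x, y):
    # Gaussian kernel, positive definite (illustrative choice).
    return math.exp(-0.5 * (x - y) ** 2)

def svm_dual_no_b(xs, ys, C=10.0, sweeps=500):
    """Coordinate ascent on the box-constrained SVM dual, no equality constraint.

    maximize  sum_i a_i - 1/2 sum_{i,j} a_i a_j y_i y_j K(x_i, x_j)
    subject to 0 <= a_i <= C
    """
    n = len(xs)
    K = [[kernel(xi, xj) for xj in xs] for xi in xs]
    a = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            margin = ys[i] * sum(a[j] * ys[j] * K[i][j] for j in range(n))
            # Exact maximizer along coordinate i, clipped to the box [0, C].
            a[i] = min(C, max(0.0, a[i] + (1.0 - margin) / K[i][i]))
    return a

def decision(a, xs, ys, x):
    # f(x) = sum_i alpha_i K(x, x_i) with alpha_i = a_i y_i and no offset b.
    return sum(ai * yi * kernel(x, xi) for ai, yi, xi in zip(a, ys, xs))

xs = [-2.0, -1.0, 1.0, 2.0]    # made-up inputs
ys = [-1.0, -1.0, 1.0, 1.0]    # made-up labels
a = svm_dual_no_b(xs, ys)
margins = [y * decision(a, xs, ys, x) for x, y in zip(xs, ys)]
print(a, margins)
```

Each update is a one-dimensional maximization followed by clipping. With the equality constraint present, single-coordinate updates are not possible, because moving one multiplier alone would violate the constraint.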


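The “b” problem: summary

Given a positive definite K there are two possible choices. The first is to find the function solving

$$\min_{f \in H} \; \frac{1}{l} \sum_{i=1}^{l} (f(x_i) - y_i)^2 + \lambda \|f\|_K^2$$

which is

$$f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i)$$

The other is to consider K' = K - const and solve

$$\min_{f \in H} \; \frac{1}{l} \sum_{i=1}^{l} (f(x_i) - y_i)^2 + \lambda \|f\|_{K'}^2$$

which gives

$$f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i) + b$$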
