Class 19, spring 2001
CBCl/AI MIT
Review by Evgeniou, Pontil and Poggio, Advances in Computational Mathematics, 2000
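The “b” problem
We said that the solution to
$$\min_{f \in \mathcal{H}} \frac{1}{\ell} \sum_{i=1}^{\ell} V\big(y_i, f(\mathbf{x}_i)\big) + \lambda \|f\|_K^2,$$
with $K$ positive definite, is
$$f(\mathbf{x}) = \sum_{i=1}^{\ell} \alpha_i K(\mathbf{x}, \mathbf{x}_i).$$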
The “b” problem
For RN, when $K$ is a positive definite kernel, the solution is
$$f(\mathbf{x}) = \sum_{i=1}^{\ell} \alpha_i K(\mathbf{x}, \mathbf{x}_i).$$
For SVM, ever since Vapnik introduced them, the solution has been given as
$$f(\mathbf{x}) = \sum_{i=1}^{\ell} \alpha_i K(\mathbf{x}, \mathbf{x}_i) + b,$$
even when $K$ is a positive definite kernel.
This is puzzling, since we have seen that RN and SVM are the same (just with a different loss function, which is independent of $K$). What is the solution to this small puzzle?
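The “b” problem: Regularization Networks
We know that the function $f$ that solves
$$\min_{f \in \mathcal{H}} \frac{1}{\ell} \sum_{i=1}^{\ell} \big(y_i - f(\mathbf{x}_i)\big)^2 + \lambda \|f\|_K^2$$
has the form
$$f(\mathbf{x}) = \sum_{i=1}^{\ell} \alpha_i K(\mathbf{x}, \mathbf{x}_i) + p(\mathbf{x}),$$
where the kernel $K$ is conditionally positive definite of order $k$ and $p(\mathbf{x})$ is a polynomial of degree $k-1$ (Wahba, 1990; Poggio and Girosi, 1989; Smale and Cucker, 2001).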
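With the quadratic loss, the coefficients of this expansion are usually obtained from a bordered linear system. The form below is the standard one from the smoothing-spline literature, written here as an aid rather than taken from these slides. It assumes that $p_1, \dots, p_m$ is a basis for the polynomials of degree at most $k-1$, writes $p(\mathbf{x}) = \sum_{j=1}^{m} d_j p_j(\mathbf{x})$, and absorbs the factor $\ell$ into $\lambda$, as the $(K + \lambda I)$ systems later in this class do:

$$
(K + \lambda I)\,\boldsymbol{\alpha} + P\,\mathbf{d} = \mathbf{y},
\qquad
P^{\top}\boldsymbol{\alpha} = \mathbf{0},
\qquad
K_{ij} = K(\mathbf{x}_i, \mathbf{x}_j), \quad P_{ij} = p_j(\mathbf{x}_i).
$$

For $k = 1$ the polynomial is a constant, so $P$ reduces to the column of ones $\mathbf{1}$ and $\mathbf{d}$ to a scalar $b$. The system then reads $(K + \lambda I)\boldsymbol{\alpha} + b\mathbf{1} = \mathbf{y}$ with $\mathbf{1}^{\top}\boldsymbol{\alpha} = 0$, for $f(\mathbf{x}) = \sum_{i} \alpha_i K(\mathbf{x}, \mathbf{x}_i) + b$.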
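The “b” problem
Definition: Positive definite kernels. A real-valued kernel $K(\mathbf{x}, \mathbf{y})$ is positive definite if and only if $K$ is symmetric and, for any $n$ distinct points $\mathbf{x}_1, \dots, \mathbf{x}_n$ in the bounded domain $X$ and any real scalars $c_1, \dots, c_n$,
$$\sum_{j,k=1}^{n} c_j c_k K(\mathbf{x}_j, \mathbf{x}_k) \ge 0.$$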
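A quick numerical illustration of the definition, as a minimal sketch. It assumes a Gaussian kernel on scalar inputs with width 1 (a kernel not named on this slide) and five arbitrary distinct points. It tests the Gram matrix with a Cholesky factorization, which succeeds with positive pivots exactly when the matrix is strictly positive definite. A failed factorization alone does not rule out the nonnegative (semidefinite) case allowed by the definition. It also plugs random coefficient vectors $c$ into the quadratic form as a spot check. All function names are hypothetical helpers.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-(x - y)^2 / (2 sigma^2)) on scalars (illustrative choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def gram_matrix(kernel, points):
    """Matrix with entries K(x_j, x_k) for the given points."""
    return [[kernel(a, b) for b in points] for a in points]


def quadratic_form(K, c):
    """sum_{j,k} c_j c_k K[j][k]."""
    n = len(c)
    return sum(c[j] * c[k] * K[j][k] for j in range(n) for k in range(n))


def is_strictly_positive_definite(K):
    """Attempt a Cholesky factorization K = L L^T; every pivot must be positive."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


if __name__ == "__main__":
    points = [0.0, 0.3, 1.1, 2.0, 3.5]  # distinct points in a bounded domain
    K = gram_matrix(gaussian_kernel, points)
    print("Cholesky succeeds:", is_strictly_positive_definite(K))
    rng = random.Random(0)
    samples = [quadratic_form(K, [rng.uniform(-1.0, 1.0) for _ in points])
               for _ in range(1000)]
    print("smallest sampled quadratic form:", min(samples))
```

A symmetric matrix such as [[1, 2], [2, 1]] fails the check: with $c = (1, -1)$ its quadratic form equals $-2$.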
The “b” problem
Definition: Conditionally positive definite kernels of order 1 ($-K$ is then called negative definite). A real-valued kernel $K(\mathbf{x}, \mathbf{y})$ is conditionally positive definite of order 1 if and only if $K$ is symmetric and, for any $n$ distinct points $\mathbf{x}_1, \dots, \mathbf{x}_n$ in the domain $X$ and any real scalars $c_1, \dots, c_n$ such that
$$\sum_{i=1}^{n} c_i = 0,$$
the quadratic form is nonnegative, that is,
$$\sum_{j,k=1}^{n} c_j c_k K(\mathbf{x}_j, \mathbf{x}_k) \ge 0.$$
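A standard example that separates the two definitions is $K(x, y) = -|x - y|$ on the real line. It is conditionally positive definite of order 1, so $|x - y|$ is negative definite in the sense above, but it is not positive definite. The minimal sketch below checks this numerically with arbitrarily chosen points and random zero-sum coefficients. The helper names are hypothetical.

```python
import random


def neg_abs_kernel(x, y):
    """K(x, y) = -|x - y| on the real line."""
    return -abs(x - y)


def quadratic_form(kernel, points, c):
    """sum_{j,k} c_j c_k K(x_j, x_k)."""
    n = len(points)
    return sum(c[j] * c[k] * kernel(points[j], points[k])
               for j in range(n) for k in range(n))


def random_zero_sum(n, rng):
    """Random coefficients c_1..c_n shifted so that sum(c) = 0 (up to rounding)."""
    c = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mean = sum(c) / n
    return [ci - mean for ci in c]


if __name__ == "__main__":
    rng = random.Random(1)
    points = [0.0, 0.4, 1.0, 2.5, 3.0, 4.2]
    # Admissible c (sum zero): the quadratic form stays nonnegative.
    worst = min(quadratic_form(neg_abs_kernel, points, random_zero_sum(len(points), rng))
                for _ in range(1000))
    print("smallest value over zero-sum c:", worst)
    # Without the constraint the form can be negative, so K is not positive definite.
    # Here the value is -2.0.
    print("c = (1, 1) on x = (0, 1):", quadratic_form(neg_abs_kernel, [0.0, 1.0], [1.0, 1.0]))
```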
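The “b” problem
Notice that positive definite kernels are conditionally positive definite of order 1 (this follows from the definitions: the inequality holds for all $c$, hence in particular for those with $\sum_i c_i = 0$).
Thus in RN with a positive definite $K$ we can always look for a solution
$$f(\mathbf{x}) = \sum_{i=1}^{\ell} \alpha_i K(\mathbf{x}, \mathbf{x}_i),$$
but also (considering $K$ to be conditionally positive definite)
$$f(\mathbf{x}) = \sum_{i=1}^{\ell} \alpha_i K(\mathbf{x}, \mathbf{x}_i) + b.$$
What does it mean?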
The “b” problem
Theorem: If $K$ is positive definite, then
$$K'(\mathbf{x}_i, \mathbf{x}_j) = K(\mathbf{x}_i, \mathbf{x}_j) - \text{const}$$
is conditionally positive definite of order 1.
The proof is easy from Corollary 2.1 b of Micchelli, 1986. Notice that this corresponds to subtracting out the constant term in the expansion of $K$ in the eigenfunctions, which in turn means that the constant term in the expansion of $f$ is not penalized in the RKHS norm.
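For order 1 the theorem can also be checked in one line directly from the definition, without going through Micchelli's corollary. Writing $\kappa$ for the constant, for any points and any coefficients with $\sum_{j} c_j = 0$:

$$
\sum_{j,k=1}^{n} c_j c_k K'(\mathbf{x}_j, \mathbf{x}_k)
= \sum_{j,k=1}^{n} c_j c_k K(\mathbf{x}_j, \mathbf{x}_k) - \kappa \Big( \sum_{j=1}^{n} c_j \Big)^{2}
= \sum_{j,k=1}^{n} c_j c_k K(\mathbf{x}_j, \mathbf{x}_k) \;\ge\; 0 .
$$

The last step uses only that $K$ is positive definite, and $K'$ is symmetric because $K$ is. The argument holds for any real constant $\kappa$.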
Reproducing Kernel Hilbert Spaces
Given a positive definite function
$$K(\mathbf{x}, \mathbf{t}) = \sum_{n=0}^{N} \lambda_n \phi_n(\mathbf{x}) \phi_n(\mathbf{t}),$$
the space
$$\mathcal{H} = \Big\{ f(\mathbf{x}) = \sum_{n=0}^{N} c_n \phi_n(\mathbf{x}) \Big\},$$
with $f(\mathbf{x})$ defined on a bounded domain, is an RKHS with $K$ as the reproducing kernel.
The RKHS norm is
$$\|f\|_K^2 = \sum_{n=0}^{N} \frac{c_n^2}{\lambda_n}.$$
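For a function of the form $f = \sum_i \alpha_i K(\cdot, \mathbf{x}_i)$ the coefficients are $c_n = \lambda_n \sum_i \alpha_i \phi_n(\mathbf{x}_i)$, and the norm above equals the Gram-matrix expression $\boldsymbol{\alpha}^{\top} K \boldsymbol{\alpha}$. The sketch below checks this numerically. The eigen-expansion is a hypothetical choice made only for illustration ($\phi_0 = 1$, $\phi_n(x) = \sqrt{2}\cos(n\pi x)$ on $[0, 1]$, $\lambda_n = 1/(1+n)^2$, $N = 6$), as are the centers, the coefficients and the helper names.

```python
import math

# Hypothetical eigen-expansion, chosen only for illustration (not from the slides):
# phi_0(x) = 1, phi_n(x) = sqrt(2) cos(n pi x) on [0, 1], lambda_n = 1 / (1 + n)^2.
N = 6


def phi(n, x):
    return 1.0 if n == 0 else math.sqrt(2.0) * math.cos(n * math.pi * x)


def lam(n):
    return 1.0 / (1.0 + n) ** 2


def kernel(x, t):
    """K(x, t) = sum_{n=0}^{N} lambda_n phi_n(x) phi_n(t)."""
    return sum(lam(n) * phi(n, x) * phi(n, t) for n in range(N + 1))


def expansion_coefficients(alpha, centers):
    """c_n for f = sum_i alpha_i K(., x_i):  c_n = lambda_n sum_i alpha_i phi_n(x_i)."""
    return [lam(n) * sum(a * phi(n, xi) for a, xi in zip(alpha, centers))
            for n in range(N + 1)]


def evaluate(c, x):
    """f(x) = sum_n c_n phi_n(x)."""
    return sum(cn * phi(n, x) for n, cn in enumerate(c))


def rkhs_norm_sq(c):
    """||f||_K^2 = sum_n c_n^2 / lambda_n."""
    return sum(cn ** 2 / lam(n) for n, cn in enumerate(c))


def gram_quadratic_form(alpha, centers):
    """alpha^T K alpha with K_ij = K(x_i, x_j)."""
    return sum(ai * aj * kernel(xi, xj)
               for ai, xi in zip(alpha, centers)
               for aj, xj in zip(alpha, centers))


if __name__ == "__main__":
    centers = [0.1, 0.35, 0.6, 0.9]
    alpha = [1.0, -0.5, 2.0, 0.25]
    c = expansion_coefficients(alpha, centers)
    print("sum c_n^2 / lambda_n :", rkhs_norm_sq(c))
    print("alpha^T K alpha      :", gram_quadratic_form(alpha, centers))
    print("f(0.5) two ways      :", evaluate(c, 0.5),
          sum(a * kernel(0.5, xi) for a, xi in zip(alpha, centers)))
```

In this toy expansion $\phi_0 \equiv 1$, so taking const $= \lambda_0$ in $K' = K - \text{const}$ deletes exactly the $n = 0$ term. The constant part $c_0$ of $f$ then no longer enters the sum that defines the norm, which is the eigenfunction remark made on the theorem slide.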
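The “b” problem
Suppose in RN we start with $K$ positive definite and then consider $K' = K - \text{const}$, which is conditionally positive definite of order 1. If I regularize with $K'$, that is, I look for
$$\min_{f \in \mathcal{H}} \frac{1}{\ell} \sum_{i=1}^{\ell} \big(y_i - f(\mathbf{x}_i)\big)^2 + \lambda \|f\|_{K'}^2,$$
the solution is
$$f(\mathbf{x}) = \sum_{i=1}^{\ell} \alpha_i K'(\mathbf{x}, \mathbf{x}_i) + b = \sum_{i=1}^{\ell} \alpha_i K(\mathbf{x}, \mathbf{x}_i) + b$$
because of the condition on the coefficients: $\sum_{i=1}^{\ell} \alpha_i = 0$, so the extra term $-\text{const} \sum_{i} \alpha_i$ vanishes.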
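The “b” problem
In this case the coefficients are given by solving
$$(K' + \lambda I)\,\boldsymbol{\alpha} + b\mathbf{1} = (K + \lambda I)\,\boldsymbol{\alpha} + b\mathbf{1} = \mathbf{y}, \qquad \mathbf{1}^{\top}\boldsymbol{\alpha} = 0.$$
These equations correspond to the minimization problem with $K'$.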
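The claim that the $K'$ system and the $K$ system have the same solution can be checked numerically. The sketch below is illustrative only. It assumes a Gaussian kernel of width 0.5, five made-up data points, $\lambda = 0.1$ and an arbitrary constant 0.7 for $K' = K - \text{const}$, and it uses a small hand-written Gaussian-elimination solver. `solve`, `fit_with_b` and the other names are hypothetical helpers, not part of any library.

```python
import math


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gaussian(x, t, sigma=0.5):
    """Illustrative positive definite kernel on scalars."""
    return math.exp(-((x - t) ** 2) / (2.0 * sigma ** 2))


def fit_with_b(kernel, xs, ys, lam):
    """Solve (K + lam I) alpha + b 1 = y together with 1^T alpha = 0; return (alpha, b)."""
    n = len(xs)
    A = [[kernel(xs[i], xs[j]) + (lam if i == j else 0.0) for j in range(n)] + [1.0]
         for i in range(n)]
    A.append([1.0] * n + [0.0])  # constraint row: 1^T alpha = 0
    sol = solve(A, list(ys) + [0.0])
    return sol[:n], sol[n]


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    ys = [3.1, 3.9, 4.2, 3.8, 3.0]
    lam = 0.1
    const = 0.7  # arbitrary constant for K' = K - const (assumption)

    def k_prime(x, t):
        return gaussian(x, t) - const

    alpha, b = fit_with_b(gaussian, xs, ys, lam)
    alpha_p, b_p = fit_with_b(k_prime, xs, ys, lam)
    print("with K :", [round(a, 6) for a in alpha], round(b, 6))
    print("with K':", [round(a, 6) for a in alpha_p], round(b_p, 6))
```

Every solution of the bordered system satisfies $\mathbf{1}^{\top}\boldsymbol{\alpha} = 0$, so the term $-\text{const}\,\mathbf{1}\mathbf{1}^{\top}\boldsymbol{\alpha}$ that distinguishes $K'$ from $K$ drops out, and both fits print the same $\boldsymbol{\alpha}$ and $b$ up to rounding.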
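The “b” problem
If I use $K$ positive definite and I look for
$$\min_{f \in \mathcal{H}} \frac{1}{\ell} \sum_{i=1}^{\ell} \big(y_i - f(\mathbf{x}_i)\big)^2 + \lambda \|f\|_K^2,$$
the solution is
$$f(\mathbf{x}) = \sum_{i=1}^{\ell} \alpha_i K(\mathbf{x}, \mathbf{x}_i)$$
with
$$(K + \lambda I)\,\boldsymbol{\alpha} = \mathbf{y}.$$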
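The “b” problem
Thus I have two different approximations for $f$, corresponding to whether constant shifts in $f$ are penalized or not. The two representations are
$$f(\mathbf{x}) = \sum_{i=1}^{\ell} \alpha_i K(\mathbf{x}, \mathbf{x}_i) + b$$
and
$$f(\mathbf{x}) = \sum_{i=1}^{\ell} \alpha_i K(\mathbf{x}, \mathbf{x}_i).$$
In the case of RN (quadratic loss function), the coefficients of the first representation are
$$\boldsymbol{\alpha} = (K + \lambda I)^{-1}(\mathbf{y} - b\mathbf{1}).$$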
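To see the difference between the two approximations, here is a minimal sketch on data that is a pure constant shift, $y_i = 5$. It assumes a Gaussian kernel of width 0.5, $\lambda = 0.1$ and five made-up input points, and it uses a hand-written solver. All names are hypothetical helpers.

```python
import math


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gaussian(x, t, sigma=0.5):
    """Illustrative positive definite kernel on scalars."""
    return math.exp(-((x - t) ** 2) / (2.0 * sigma ** 2))


def regularized_gram(xs, lam):
    """Rows of K + lam I for the training points xs."""
    n = len(xs)
    return [[gaussian(xs[i], xs[j]) + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def fit_without_b(xs, ys, lam):
    """Constants penalized: solve (K + lam I) alpha = y."""
    return solve(regularized_gram(xs, lam), list(ys))


def fit_with_b(xs, ys, lam):
    """Constants not penalized: (K + lam I) alpha + b 1 = y and 1^T alpha = 0."""
    n = len(xs)
    A = [row + [1.0] for row in regularized_gram(xs, lam)]
    A.append([1.0] * n + [0.0])
    sol = solve(A, list(ys) + [0.0])
    return sol[:n], sol[n]


def predict(alpha, xs, x, b=0.0):
    """f(x) = sum_i alpha_i K(x, x_i) + b."""
    return sum(a * gaussian(x, xi) for a, xi in zip(alpha, xs)) + b


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    ys = [5.0] * len(xs)  # constant data: a pure constant shift
    lam = 0.1
    alpha_b, b = fit_with_b(xs, ys, lam)
    alpha_0 = fit_without_b(xs, ys, lam)
    print("with b   :", [round(predict(alpha_b, xs, x, b), 4) for x in xs])
    print("without b:", [round(predict(alpha_0, xs, x), 4) for x in xs])
```

With $b$, the solution is $\boldsymbol{\alpha} = 0$, $b = 5$: that pair satisfies both equations, and the system has a unique solution. Without $b$, the fitted values $K(K + \lambda I)^{-1}\mathbf{y}$ are shrunk toward zero in Euclidean norm, because each eigenvalue $\mu$ of $K$ is mapped to $\mu/(\mu + \lambda) < 1$. This is the penalty on constant shifts at work.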
The “b” problem
The same (except for the last equation) is true for SVMs. Thus Vapnik’s solution corresponds to using a conditionally positive definite $K'$ instead of $K$ in the RKHS norm, thereby NOT penalizing constants in $f$. This choice depends on the problem, but it is often reasonable, especially in classification problems. On the other hand, our argument shows that we can use a positive definite stabilizer directly in SVM. This implies that the QP problem can be solved WITHOUT the constraint
$$\sum_{i=1}^{\ell} \alpha_i = 0,$$
which makes the QP optimization an easier problem.
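The “b” problem: summary
Given a positive definite $K$, there are two possible choices. The first is to find the function solving
$$\min_{f \in \mathcal{H}} \frac{1}{\ell} \sum_{i=1}^{\ell} \big(y_i - f(\mathbf{x}_i)\big)^2 + \lambda \|f\|_K^2,$$
which is
$$f(\mathbf{x}) = \sum_{i=1}^{\ell} \alpha_i K(\mathbf{x}, \mathbf{x}_i).$$
The other is to consider $K' = K - \text{const}$ and solve
$$\min_{f \in \mathcal{H}} \frac{1}{\ell} \sum_{i=1}^{\ell} \big(y_i - f(\mathbf{x}_i)\big)^2 + \lambda \|f\|_{K'}^2,$$
which gives
$$f(\mathbf{x}) = \sum_{i=1}^{\ell} \alpha_i K(\mathbf{x}, \mathbf{x}_i) + b.$$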