Short courses on functional data analysis and statistical learning, part 4
Influence of the sampling on Functional Data Analysis
Nathalie Villa-Vialaneix - [email protected] - http://www.nathalievilla.org
Institut de Mathématiques de Toulouse - IUT de Carcassonne, Université de Perpignan, France
Havana, September 18th, 2008
Table of contents
1 Introduction to the sampling problem
2 Approximating functions with splines
3 Using splines in functional models based on sampling
4 References
We do not observe functional data!

In most theoretical works, the functional observations $x_1, \dots, x_n$ are directly the true functions. But, in fact, we can only observe sampled versions, for instance on a regular grid of $[a, b]$:

$x_i = \left( x_i(a),\ x_i\big(a + \tfrac{b-a}{L}\big),\ x_i\big(a + \tfrac{2(b-a)}{L}\big),\ \dots,\ x_i(b) \right)$

Or on an irregular, curve-specific grid:

$x_i = \left( x_i(t^i_1),\ x_i(t^i_2),\ \dots,\ x_i(t^i_{d_i}) \right)$

Or even, not the true sampled values but a noisy version of them:

$x_i = \left( x_i(t^i_1) + \varepsilon_{i,t^i_1},\ x_i(t^i_2) + \varepsilon_{i,t^i_2},\ \dots,\ x_i(t^i_{d_i}) + \varepsilon_{i,t^i_{d_i}} \right)$
Consequences on the estimators and their errors

Hence, most of the time, functional data analysis consists in:
1. building estimators of the $x_i$ from their sampled values, $\hat{x}_i = \Upsilon(x_i^\tau)$;
2. using the $\hat{x}_i$ (or their derivatives) as if they were the true functions $x_i$.

Problem: most of the theoretical results presented in the past days were based on the knowledge of the $x_i$, not on the estimates $\hat{x}_i$. What are the consequences, when using approximations of the $x_i$, on:
- the estimate $\Psi_n$ of the regression function $\Psi$?
- the consistency of the error to the optimal Bayes error?
Notations and assumptions

Suppose that we are studying the random pair $(X, Y)$ where:
- $X$ is functional and takes its values in the Hilbert space $(\mathcal{X}, \langle \cdot, \cdot \rangle_{\mathcal{X}})$;
- $Y$ takes its values in $\{-1, 1\}$ (classification case) or in $\mathbb{R}$ (regression case).

Suppose that we observe $(x_i^\tau, y_i)_{i=1,\dots,n}$ where:
- $x_i^\tau = (x_i(t))_{t \in \tau}$ (non-noisy case) or $x_i^\tau = (x_i(t) + \varepsilon_{i,t})_{t \in \tau}$ (noisy case);
- $\tau$ is the set of sampling points (the same for all functions);
- $(x_i, y_i)_i$ are i.i.d. copies of $(X, Y)$.
A smooth representation of sampled functions

Given $x^\tau$, splines aim at providing a representation of $x : [0,1] \to \mathbb{R}$ that is as smooth as possible. More precisely, $x$ is approximated by:

$\hat{x}^{\lambda,\tau} = \arg\min_{h \in \mathcal{H}^m} \frac{1}{|\tau|} \sum_{t \in \tau} \left( h(t) - x^\tau_t \right)^2 + \lambda \int_{[0,1]} \left( h^{(m)}(t) \right)^2 dt$

where, for $m > 3/2$, the Sobolev space $\mathcal{H}^m$ is defined by

$\mathcal{H}^m = \left\{ h \in L^2([0,1]) : \forall\, k = 1, \dots, m,\ h^{(k)} \text{ exists in a weak sense and } h^{(m)} \in L^2([0,1]) \right\}.$
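For the standard case $m = 2$ (cubic smoothing splines), this optimization problem can be solved numerically off the shelf. A minimal sketch, assuming NumPy and a recent SciPy (make_smoothing_spline, available from SciPy 1.10, solves this penalized criterion for $m = 2$, up to the $1/|\tau|$ normalization of the data-fit term); the sampled curve and the value of the smoothing parameter are purely illustrative:

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

# Sample a noisy curve on an irregular grid (illustrative data).
rng = np.random.default_rng(0)
tau = np.sort(rng.uniform(0, 1, 50))                  # sampling points
x_tau = np.sin(2 * np.pi * tau) + rng.normal(0, 0.1, 50)

# Cubic smoothing spline (m = 2): minimizes
#   sum_t (h(t) - x_t)^2 + lam * int (h''(t))^2 dt.
x_hat = make_smoothing_spline(tau, x_tau, lam=1e-4)

grid = np.linspace(0, 1, 200)
values = x_hat(grid)                                  # the estimate on a fine grid
derivs = x_hat.derivative()(grid)                     # its derivative, used later on
```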
Decomposition of $\mathcal{H}^m$

The key point for solving the previous optimization problem is to build a Hilbert structure on $\mathcal{H}^m$ such that $\|h\|_{\mathcal{H}^m} \simeq \|h^{(m)}\|_{L^2}$. This can be done by decomposing $\mathcal{H}^m$ into $\mathcal{H}^m = \mathcal{H}^m_0 \oplus \mathcal{H}^m_1$ where:
- $\mathcal{H}^m_0 = \operatorname{Ker} D^m = \mathcal{P}_{m-1}$ (the space of polynomial functions of degree at most $m - 1$);
- $\mathcal{H}^m_1$ is an infinite-dimensional subspace of $\mathcal{H}^m$ defined via $m$ boundary conditions, denoted $B : \mathcal{H}^m \to \mathbb{R}^m$, such that $\operatorname{Ker} B \cap \mathcal{P}_{m-1} = \{0\}$.

Example 1: for $m = 2$, $B : h \mapsto (h(0), h(1))$ and $\mathcal{H}^2_1 = \{h \in \mathcal{H}^2 : h(0) = h(1) = 0\}$.
Example 2: for $m > 3/2$, $B : h \mapsto (h(0), h'(0), \dots, h^{(m-1)}(0))$ and $\mathcal{H}^m_1 = \{h \in \mathcal{H}^m : Bh = 0\}$.
Hilbert structure of $\mathcal{H}^m$

$\mathcal{H}^m_0$ and $\mathcal{H}^m_1$ are Hilbert spaces with respect to the inner products:

$\forall\, u, v \in \mathcal{H}^m_0, \quad \langle u, v \rangle_{\mathcal{H}^m_0} = (Bu)^T (Bv),$
$\forall\, u, v \in \mathcal{H}^m_1, \quad \langle u, v \rangle_{\mathcal{H}^m_1} = \langle D^m u, D^m v \rangle_{L^2}.$

Hence, we obtain in this way an inner product on $\mathcal{H}^m$:

$\langle u, v \rangle_{\mathcal{H}^m} = \langle P_0(u), P_0(v) \rangle_{\mathcal{H}^m_0} + \langle P_1(u), P_1(v) \rangle_{\mathcal{H}^m_1} = \langle D^m u, D^m v \rangle_{L^2} + (Bu)^T (Bv)$

where $P_j$ is the projector on $\mathcal{H}^m_j$ for $j = 0, 1$.
RKHS structure of $\mathcal{H}^m$

Equipped with $\langle \cdot, \cdot \rangle_{\mathcal{H}^m}$, the spaces $\mathcal{H}^m_0$, $\mathcal{H}^m_1$ and $\mathcal{H}^m$ are Reproducing Kernel Hilbert Spaces. More precisely, there exist kernels $k_j : [0,1]^2 \to \mathbb{R}$ ($j = 0, 1$) such that:

$\forall\, u \in \mathcal{H}^m_j,\ \forall\, t \in [0,1], \quad \langle k_j(t, \cdot), u \rangle_{\mathcal{H}^m_j} = u(t).$

Hence, $k = k_0 + k_1$ is the reproducing kernel of $\mathcal{H}^m$.

Example 1: $k_0$ is easy to compute: if $(e_0, \dots, e_{m-1})$ is an orthonormal basis of $\mathcal{H}^m_0 = \mathcal{P}_{m-1}$ for the norm $\|\cdot\|_{\mathcal{H}^m_0}$, then $k_0(s,t) = \sum_{i=0}^{m-1} e_i(s) e_i(t)$.
- If $m = 2$ and the boundary conditions are $u(0) = u(1) = 0$, then $\{t \mapsto t,\ t \mapsto 1 - t\}$ is an orthonormal basis of $\mathcal{H}^m_0$ and $k_0(s,t) = (1-t)(1-s) + st$.
- If $m > 3/2$ and the boundary conditions are $h(0) = h'(0) = \dots = h^{(m-1)}(0) = 0$, then $\{t \mapsto t^i / i!\}_{i=0,\dots,m-1}$ is an orthonormal basis of $\mathcal{H}^m_0$ and $k_0(s,t) = \sum_{k=0}^{m-1} \frac{t^k s^k}{(k!)^2}$.

Example 2: $k_1$ can be found by way of the Green function $G : [0,1]^2 \to \mathbb{R}$ satisfying $u = \int_{[0,1]} G(\cdot, t)\, D^m u(t)\, dt$. We then have $k_1(s,t) = \int_{[0,1]} G(s,w)\, G(t,w)\, dw$.
- If $m = 2$ and the boundary conditions are $u(0) = u(1) = 0$, then $k_1(s,t) = (s-t)^3_+ - s(1-t)(s^2 - 2t + t^2)/6$.
- If $m > 3/2$ and the boundary conditions are $h(0) = h'(0) = \dots = h^{(m-1)}(0) = 0$, then $k_1(s,t) = \int_0^1 \frac{(t-w)^{m-1}_+ (s-w)^{m-1}_+}{((m-1)!)^2}\, dw.$

See [Berlinet and Thomas-Agnan, 2004] for further details and examples.
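These closed forms are straightforward to evaluate numerically. A minimal sketch (assuming SciPy for the quadrature) for the boundary conditions $h(0) = h'(0) = \dots = h^{(m-1)}(0) = 0$; the function names are illustrative:

```python
from math import factorial
from scipy.integrate import quad

def k0(s, t, m):
    # Polynomial part: k0(s, t) = sum_{k < m} (st)^k / (k!)^2, for the
    # boundary conditions h(0) = h'(0) = ... = h^{(m-1)}(0) = 0.
    return sum((s * t) ** k / factorial(k) ** 2 for k in range(m))

def k1(s, t, m):
    # k1(s, t) = int_0^1 (t-w)_+^{m-1} (s-w)_+^{m-1} / ((m-1)!)^2 dw;
    # the integrand vanishes for w > min(s, t), hence the upper bound.
    c = factorial(m - 1) ** 2
    f = lambda w: ((t - w) * (s - w)) ** (m - 1) / c
    val, _ = quad(f, 0.0, min(s, t))
    return val

def k(s, t, m=2):
    # Reproducing kernel of H^m: k = k0 + k1.
    return k0(s, t, m) + k1(s, t, m)
```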
Assumptions for existence and uniqueness of a spline

(A1) $|\tau| \geq m - 1$;
(A2) the sampling points are distinct in $[0,1]$;
(A3) the $m$ boundary conditions $B_j$ are linearly independent of the $|\tau|$ linear forms $h \in \mathcal{H}^m \mapsto h(t)$ for $t \in \tau$.
Computing the splines

Theorem [Kimeldorf and Wahba, 1971]. Under assumptions (A1)-(A3), for any given $x^\tau$, the unique solution of the optimization problem is:

$\hat{x}^{\lambda,\tau} = \omega^T \left( U (K_1 + \lambda I_{|\tau|})^{-1} U^T \right)^{-1} U (K_1 + \lambda I_{|\tau|})^{-1} x^\tau$
$\qquad + \eta^T (K_1 + \lambda I_{|\tau|})^{-1} \left( I_{|\tau|} - U^T \left( U (K_1 + \lambda I_{|\tau|})^{-1} U^T \right)^{-1} U (K_1 + \lambda I_{|\tau|})^{-1} \right) x^\tau$
$\qquad = (\omega^T M_0 + \eta^T M_1)\, x^\tau \qquad (1)$

where:
- $\{\omega_1, \dots, \omega_m\}$ is a basis of $\mathcal{P}_{m-1}$, $\omega = (\omega_1, \dots, \omega_m)^T$ and $U = (\omega_i(t))_{i=1,\dots,m,\ t \in \tau}$;
- $\eta = (k_1(t, \cdot))^T_{t \in \tau}$ and $K_1 = (k_1(t, t'))_{t, t' \in \tau}$.
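A direct NumPy transcription of formula (1), reusing the kernel k1 from the sketch above and taking the monomials as the (arbitrary) basis of $\mathcal{P}_{m-1}$; a sketch under those assumptions, not an optimized implementation (tau is a NumPy array of sampling points):

```python
import numpy as np

def spline_matrices(tau, lam, m=2):
    # Build M0 (m x |tau|) and M1 (|tau| x |tau|) from formula (1),
    # with omega_i(t) = t^{i-1} as basis of P_{m-1} and k1 as above.
    tau = np.asarray(tau)
    T = len(tau)
    U = np.vander(tau, N=m, increasing=True).T        # U[i, j] = tau_j ** i
    K1 = np.array([[k1(s, t, m) for t in tau] for s in tau])
    A_inv = np.linalg.inv(K1 + lam * np.eye(T))
    M0 = np.linalg.solve(U @ A_inv @ U.T, U @ A_inv)  # (U A^-1 U^T)^-1 U A^-1
    M1 = A_inv @ (np.eye(T) - U.T @ M0)
    return U, K1, M0, M1

def spline_eval(t, x_tau, tau, M0, M1, m=2):
    # \hat{x}^{lambda,tau}(t) = omega(t)^T M0 x^tau + eta(t)^T M1 x^tau.
    omega = np.array([t ** i for i in range(m)])
    eta = np.array([k1(s, t, m) for s in tau])
    return omega @ (M0 @ x_tau) + eta @ (M1 @ x_tau)
```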
Computing inner products between splines

Corollary. Under assumptions (A1)-(A3),

$\langle \hat{u}^{\lambda,\tau}, \hat{v}^{\lambda,\tau} \rangle_{\mathcal{H}^m} = (u^\tau)^T M_0^T W M_0\, v^\tau + (u^\tau)^T M_1^T K_1 M_1\, v^\tau = (u^\tau)^T M_\tau\, v^\tau$

(with $W$ the Gram matrix of the basis $(\omega_i)_i$ for $\langle \cdot, \cdot \rangle_{\mathcal{H}^m_0}$), where the matrix $M_\tau$ is symmetric and positive definite (and therefore defines an inner product on $\mathbb{R}^{|\tau|}$).
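Continuing the sketch above: for the monomial basis with $B h = (h(0), \dots, h^{(m-1)}(0))$, the Gram matrix $W$ is diagonal with entries $(i!)^2$, so $M_\tau$ and its Cholesky factor (used below) can be computed as follows; this assumes the helpers defined previously:

```python
from math import factorial
import numpy as np

def inner_product_matrix(tau, lam, m=2):
    # M_tau = M0^T W M0 + M1^T K1 M1; for the monomial basis with
    # B h = (h(0), ..., h^{(m-1)}(0)), W = diag((i!)^2, i = 0..m-1).
    U, K1, M0, M1 = spline_matrices(tau, lam, m)
    W = np.diag([float(factorial(i) ** 2) for i in range(m)])
    M_tau = M0.T @ W @ M0 + M1.T @ K1 @ M1
    M_tau = 0.5 * (M_tau + M_tau.T)      # enforce symmetry numerically
    Q = np.linalg.cholesky(M_tau).T      # Q^T Q = M_tau
    return M_tau, Q
```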
Assumptions for convergence of spline estimates

If $\tau = \{t_1, t_2, \dots, t_{|\tau|}\}$, denote:

$\overline{\Delta}_\tau = \max\{t_1, t_2 - t_1, \dots, 1 - t_{|\tau|}\}, \quad \underline{\Delta}_\tau = \min\{t_2 - t_1, t_3 - t_2, \dots, t_{|\tau|} - t_{|\tau|-1}\},$

and suppose that we are given a sequence of sampling sets $\tau_1, \tau_2, \dots, \tau_d, \dots$. The parameter $\lambda$ should now depend on $\tau$; we then write $(\tau_d)_d$ for the sequence of sampling sets and $(\lambda_d)_d$ for the associated sequence of regularization parameters. Suppose:

(A4) there is $R \in \mathbb{R}$ such that $\overline{\Delta}_\tau / \underline{\Delta}_\tau \leq R$;
(A5) $\lim_{d \to +\infty} |\tau_d| = +\infty$ and $\lim_{d \to +\infty} \lambda_d = 0$.
Convergence of spline estimates

Theorem [Ragozin, 1983]. Under assumptions (A1)-(A3), there are two constants, $A_{R,m}$ and $B_{R,m}$, depending only on $R$ and $m$, such that for any $x \in \mathcal{H}^m$ and any positive $\lambda$:

$\left\| \hat{x}^{\lambda,\tau} - x \right\|^2_{L^2} \leq \left( A_{R,m}\, \lambda + B_{R,m}\, \frac{1}{|\tau|^{2m}} \right) \left\| D^m x \right\|^2_{L^2}.$

Thus, under the additional assumptions (A4)-(A5), $\left\| \hat{x}^{\lambda_d,\tau_d} - x \right\|_{L^2} \xrightarrow{d \to +\infty} 0$.
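A quick numerical illustration of this rate; a sketch, assuming SciPy's make_smoothing_spline for the $m = 2$ case (the test function and the schedule $\lambda_d = |\tau_d|^{-4}$ are illustrative choices, not prescribed by the theorem): the $L^2$ error shrinks as the grid is refined and $\lambda$ decreases.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

x = lambda t: np.sin(2 * np.pi * t)      # a smooth test function in H^2
grid = np.linspace(0, 1, 2000)

for size in (10, 50, 250):
    tau = np.linspace(0, 1, size)        # regular grids satisfy (A4)
    x_hat = make_smoothing_spline(tau, x(tau), lam=size ** -4.0)
    err = np.sqrt(np.mean((x_hat(grid) - x(grid)) ** 2))  # ~ L^2([0,1]) norm
    print(f"|tau| = {size:4d}   L2 error ~ {err:.2e}")
```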
Just a single example: Tecator dataset
Notations and method

Suppose that we are given a pair of random variables $(X, Y)$ taking their values in $\mathcal{H}^m \times \{-1, 1\}$ (classification case) or in $\mathcal{H}^m \times \mathbb{R}$ (regression case). We are given a training set of size $n$, $\{(x_i^{\tau_d}, y_i)\}_{i=1,\dots,n}$, where:
- $x_i^{\tau_d} = (x_i(t_1), \dots, x_i(t_{|\tau_d|}))$;
- $\{(x_i, y_i)\}_i$ are i.i.d. copies of $(X, Y)$.
A general consistent method based on derivatives

Consider a consistent classification or regression scheme for data in $\mathbb{R}^p \times \{-1, 1\}$ (or $\mathbb{R}^p \times \mathbb{R}$) and denote by $\psi_{\mathcal{D}}$ the classifier (or regression function) obtained from a learning set $\mathcal{D} = \{(u_1, y_1), \dots, (u_n, y_n)\}$. If the definition of $\psi_{\mathcal{D}}$ is based on norms of, or inner products between, the $(u_i)_i$ (and, of course, on the $y_i$ values), then this method can be generalized to work with derivatives of the $x_i$ by replacing this inner product by:

$\langle D^m x_i, D^m x_j \rangle_{L^2} \simeq \langle \hat{x}_i^{\lambda_d,\tau_d}, \hat{x}_j^{\lambda_d,\tau_d} \rangle_{\mathcal{H}^m} = \langle M_{\tau_d} x_i^{\tau_d}, x_j^{\tau_d} \rangle_{\mathbb{R}^{|\tau_d|}}$

Writing $Q_{\tau_d}$ for the transpose of the Cholesky triangle of $M_{\tau_d}$ (so that $(Q_{\tau_d})^T Q_{\tau_d} = M_{\tau_d}$), we thus define a classifier or a regression function on the $(D^m x_i)_i$, using only the discrete samplings $(x_i^{\tau_d})_i$, by

$\phi_{n,\tau_d} = \psi_{\mathcal{E}_{n,\tau_d}} \quad \text{where} \quad \mathcal{E}_{n,\tau_d} = \{(Q_{\tau_d} x_i^{\tau_d}, y_i)\}_i.$
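In practice this is a one-line preprocessing step: map each sampled curve through $Q_{\tau_d}$ and hand the transformed vectors to any consistent multivariate learner. A sketch assuming scikit-learn and the inner_product_matrix helper above; X_tau (an (n, |tau|) array of sampled curves), y (labels in {-1, 1}), X_new and tau are illustrative names, and k-nearest neighbours stands in for the generic consistent scheme $\psi$:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Q is the transposed Cholesky triangle of M_tau (Q^T Q = M_tau).
_, Q = inner_product_matrix(tau, lam=1e-4, m=2)

# Training on Q x_i^tau makes Euclidean inner products in R^{|tau|}
# approximate <D^m x_i, D^m x_j>_{L^2}.
psi = KNeighborsClassifier(n_neighbors=5)
psi.fit(X_tau @ Q.T, y)

predictions = psi.predict(X_new @ Q.T)   # new curves sampled on the same grid
```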
Consistency property

Theorem [Rossi and Villa, 2008]. Under assumptions (A1)-(A4) and either
(A5a) $\mathbb{E}\left( \left\| D^m X \right\|_{L^2} \right)$ is finite and $Y \in \{-1, 1\}$; or
(A5b) $\tau_d \subset \tau_{d+1}$ and $\mathbb{E}(Y^2)$ is finite,
we have

$\lim_{d \to +\infty} \lim_{n \to +\infty} L\phi_{n,\tau_d} = L^*.$

Sketch of the proof: using the convergence of the splines (case (A5a)) or a martingale argument (case (A5b)), we show that

$L^*_{\tau_d} - L^* \xrightarrow{d \to +\infty} 0$

where $L^*_{\tau_d} = \inf_{\phi : \mathbb{R}^{|\tau_d|} \to \mathbb{R}} \mathbb{P}\left( \phi(X^{\tau_d}) \neq Y \right)$ (classification case) or $L^*_{\tau_d} = \inf_{\phi : \mathbb{R}^{|\tau_d|} \to \mathbb{R}} \mathbb{E}\left( [\phi(X^{\tau_d}) - Y]^2 \right)$ (regression case). Then, using the consistency assumption on the multidimensional method, we have, for all $d$,

$L\phi_{n,\tau_d} - L^*_{\tau_d} \xrightarrow{n \to +\infty} 0.$
Application to kernel methods (SVM and kernel ridge regression)

Under additional assumptions, kernel methods of the form

$F_{\mathcal{D}} = \arg\min_F \sum_{i=1}^{n} L(y_i, F(u_i)) + C \left\| F \right\|_{\mathcal{S}}$

are consistent both for classification and for regression purposes [Steinwart, 2002, Christmann and Steinwart, 2007]. Applying the general framework to kernel methods leads to the definition of the following kernel:

$K^{\tau_d} = K \circ Q_{\tau_d}$

from $(\mathbb{R}^{|\tau_d|})^2$ to $\mathbb{R}$, where $K$ is any usual multidimensional kernel. This kernel uses the sampling $x_i^{\tau_d}$ to approximately compute the functional kernel

$\mathcal{K} : (u, v) \in \mathcal{H}^m \mapsto K(\left\| u - v \right\|_{\mathcal{H}^m}).$
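Concretely, composing a Gaussian kernel with $Q_{\tau_d}$ amounts to running a standard SVM on the transformed samples. A sketch with scikit-learn, reusing the illustrative objects X_tau, y, X_new and Q from the previous snippet:

```python
from sklearn.svm import SVC

# K^{tau}(u, v) = exp(-gamma ||Q u - Q v||^2): a Gaussian kernel on the
# Q-transformed samples approximates K(||u_hat - v_hat||_{H^m}).
svm = SVC(kernel="rbf", gamma=1.0, C=1.0)
svm.fit(X_tau @ Q.T, y)
predictions = svm.predict(X_new @ Q.T)
```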
Corollary: consistency of kernel-based methods for classification

Derivative-based SVM consistency. Suppose that assumptions (A1)-(A5) are fulfilled. Suppose also:
(A6) for all $d$, the kernel $K$ is universal on any compact subset of $\mathbb{R}^{|\tau_d|}$ and has a covering number of the form $\mathcal{N}(K, \epsilon) = O_n(\epsilon^{-\alpha_d})$ for some $\alpha_d > 0$ on this compact subset, and the sequence of regularization parameters $C \equiv (C_n^d)$ is such that, for each $d$, $\lim_{n \to +\infty} n C_n^d = +\infty$ and $C_n^d = O_n\left( n^{\beta_d - 1} \right)$ for some $0 < \beta_d < \frac{1}{\alpha_d}$;
(A7) for all $d$, there is a bounded subset $B_d$ of $\mathbb{R}^{|\tau_d|}$ such that $X^{\tau_d}$ belongs to $B_d$.
Then, the SVM classifier is universally consistent:

$\lim_{d \to +\infty} \lim_{n \to +\infty} L\phi_{n,\tau_d} = L^*.$
Corollary: consistency of kernel-based methods for regression

Derivative-based kernel ridge regression consistency. Suppose that assumptions (A1)-(A5) are fulfilled. Suppose also:
(A6) for all $d$, the kernel $K$ is universal on any compact subset of $\mathbb{R}^{|\tau_d|}$, and the sequence of regularization parameters $C \equiv (C_n^d)$ is such that, for each $d$, $\lim_{n \to +\infty} n C_n^d = +\infty$ and $\lim_{n \to +\infty} n (C_n^d)^{4/3} = 0$;
(A7) for all $d$, there is a bounded subset $B_d$ of $\mathbb{R}^{|\tau_d|}$ such that $X^{\tau_d}$ belongs to $B_d$.
Then, kernel ridge regression is universally consistent:

$\lim_{d \to +\infty} \lim_{n \to +\infty} L\phi_{n,\tau_d} = L^*.$
Linear regression with noisy covariates

Let us finally come back to the linear model

$Y = \langle X, a \rangle_{L^2([0,1])} + \epsilon$

with $Y$ a real random variable, $X$ a random variable taking its values in $L^2([0,1])$ and $\epsilon$ a centered real random variable independent of $X$. But in the case of "noisy covariates", we do not observe the pair $(X, Y)$ but $(w_i(t_1), \dots, w_i(t_p), y_i)_{i=1,\dots,n}$ with

$w_i(t_j) = x_i(t_j) + \delta_{i,j}$

where $(x_i, y_i)_i$ are (not necessarily independent) copies of $(X, Y)$ and the $\delta_{i,j}$ are independent, centered real random variables with a finite fourth moment; moreover, the $\delta_{i,j}$ are supposed to be independent of $X$ and $\epsilon$. In the following, we will assume that the observations are centered, to avoid notational difficulties.
Spline estimators

In the case where the $x_i$ are known, a spline estimate of $a$ would be:

$a_n := \arg\min_{h \in \mathcal{H}^m} \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \frac{1}{p} \sum_{j=1}^{p} h(t_j) x_i(t_j) \right)^2 + \rho \left\| h \right\|_{\mathcal{H}^m}$

The solution is given by

$a_n = \frac{1}{n} \left( \frac{1}{np} X^T X + \rho A_m \right)^{-1} X^T Y$

where $X = (x_i(t_j))_{i=1,\dots,n,\ j=1,\dots,p}$, $Y = (y_1, \dots, y_n)^T$, and $A_m$ is the matrix that defines the $\mathcal{H}^m$-norm from the discrete sampling at $(t_j)_j$.

In the case where the $x_i$ are only known through noisy covariates, a spline estimate of $a$ can be obtained through the Total Least Squares approach [Cardot et al., 2007]:

$a_n := \arg\min_{h \in \mathcal{H}^m,\ (x_{i,j})_{i,j}} \frac{1}{n} \sum_{i=1}^{n} \left[ \left( y_i - \frac{1}{p} \sum_{j=1}^{p} h(t_j) x_{i,j} \right)^2 + \frac{1}{p} \sum_{j=1}^{p} \left( x_{i,j} - w_i(t_j) \right)^2 \right] + \rho \left\| h \right\|_{\mathcal{H}^m}$

The solution is given by

$a_n = \frac{1}{n} \left( \frac{1}{np} X^T X + \rho A_m - p \sigma_k^2 I_p \right)^{-1} X^T Y$

where $\sigma_k$ is replaced, in practice, by $\frac{\sigma_\delta}{p}$, with $\sigma_\delta$ an estimate of the standard deviation of $\delta$.
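A direct NumPy transcription of the TLS formula above; a sketch, assuming the noisy design matrix W_obs = (w_i(t_j)), the response vector Y_vec, the norm matrix A_m and an estimate sigma_delta of the noise standard deviation are available (all names illustrative):

```python
import numpy as np

def tls_spline_estimate(W_obs, Y_vec, A_m, rho, sigma_delta):
    # a_n = (1/n) ((1/(np)) W^T W + rho A_m - p sigma_k^2 I_p)^{-1} W^T Y,
    # with sigma_k replaced in practice by sigma_delta / p.
    n, p = W_obs.shape
    sigma_k = sigma_delta / p
    G = W_obs.T @ W_obs / (n * p) + rho * A_m - p * sigma_k ** 2 * np.eye(p)
    return np.linalg.solve(G, W_obs.T @ Y_vec) / n   # values of a_n at (t_j)_j
```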
Assumptions for convergence of $a_n$

- $a$ belongs to $\mathcal{H}^m$;
- there exists a constant $\kappa$, $0 < \kappa < 1$, such that, for every $\delta > 0$, there exists $C$: $\mathbb{P}\left( |X(t) - X(s)| \leq C |t - s|^\kappa,\ s, t \in [0,1] \right) \geq 1 - \delta$;
- there exist $E \in \mathbb{R}$ and, for all $k \in \mathbb{N}$, a $k$-dimensional subspace $L_k$ of $L^2$ with $\mathbb{E}\left( \inf_{h \in L_k} \sup_t \left| X(t) - h(t) \right|^2 \right) \leq E k^{-2q}$;
- there is a constant $F$: $\operatorname{Var}\left( \langle \Gamma_n^X \zeta_s, \zeta_t \rangle_{L^2} \right) \leq \frac{F}{n}\, \mathbb{E}\left( \langle X - \mathbb{E}(X), \zeta_s \rangle^2_{L^2} \right) \mathbb{E}\left( \langle X - \mathbb{E}(X), \zeta_t \rangle^2_{L^2} \right)$;
- for each $\delta > 0$, there exists $D$: $\mathbb{P}\left( \frac{1}{\sqrt{p}} \left\| \frac{1}{np} X^T X a \right\|_{\mathbb{R}^p} \leq D \right) \geq 1 - \delta$;
- $n p^{-2\kappa} = O(1)$, $\lim_{n,p \to +\infty} \rho = 0$ and $\lim_{n,p \to +\infty} \frac{1}{n\rho} = 0$.
Convergence of $a_n$

Theorem [Crambes et al., 2008]. Under the previous assumptions,

$\left\| a_n - a \right\|_{\Gamma_X} = O_P\left( \frac{1}{np\rho} + \frac{1}{n} + n^{-(2q+1)/2} \right).$
Application to prediction of ozone
The data is a time series of the daily maximum ozone concentration in Toulouse (France).
References
Further details on the references are given in the accompanying document.

Berlinet, A. and Thomas-Agnan, C. (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publishers.

Cardot, H., Crambes, C., Kneip, A., and Sarda, P. (2007). Smoothing splines estimators in functional linear regression with errors-in-variables. Computational Statistics and Data Analysis, 51:4832-4848.

Christmann, A. and Steinwart, I. (2007). Consistency and robustness of kernel-based regression in convex risk minimization. Bernoulli, 13(3):799-819.

Crambes, C., Kneip, A., and Sarda, P. (2008). Smoothing splines estimators for functional linear regression. Annals of Statistics.

Kimeldorf, G. and Wahba, G. (1971). Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33(1):82-95.

Ragozin, D. (1983). Error bounds for derivative estimation based on spline smoothing of exact or noisy data. Journal of Approximation Theory, 37:335-355.

Rossi, F. and Villa, N. (2008). Classification and regression based on derivatives: a consistency result applied to functional kernel based classification and regression. Work in progress.

Steinwart, I. (2002). Support vector machines are universally consistent. Journal of Complexity, 18:768-791.