Short courses on functional data analysis and statistical learning, part 4
Influence of the sampling on Functional Data Analysis
Nathalie Villa-Vialaneix - [email protected] - http://www.nathalievilla.org
Institut de Mathématiques de Toulouse - IUT de Carcassonne, Université de Perpignan, France
Havana, September 18th, 2008
Table of contents
1 Introduction to the sampling problem
2 Approximating functions with splines
3 Using splines in functional models based on sampling
4 References
We do not observe functional data!

In most theoretical works, the functional observations $x_1, \dots, x_n$ are directly the true functions. But, in fact, we can only observe sampled versions, for instance on a regular grid of $[a, b]$:

$x_i = \left( x_i(a),\ x_i\big(a + \tfrac{b-a}{L}\big),\ x_i\big(a + \tfrac{2(b-a)}{L}\big),\ \dots,\ x_i(b) \right)$

Or on an irregular, curve-specific grid:

$x_i = \left( x_i(t^i_1),\ x_i(t^i_2),\ \dots,\ x_i(t^i_{d_i}) \right)$

Or even, not the true sampled values but a noisy version of them:

$x_i = \left( x_i(t^i_1) + \varepsilon_{i,t^i_1},\ x_i(t^i_2) + \varepsilon_{i,t^i_2},\ \dots,\ x_i(t^i_{d_i}) + \varepsilon_{i,t^i_{d_i}} \right)$
Consequences on the estimators and their errors

Hence, most of the time, functional data analysis consists in:
1. building estimators of the $x_i$ from their sampled values, $\hat{x}_i = \Upsilon(x_i^\tau)$;
2. using the $\hat{x}_i$ (or their derivatives) as if they were the true functions $x_i$.

Problem: most of the theoretical results presented in the past days were based on the knowledge of the $x_i$, not on the estimates $\hat{x}_i$. What are the consequences, when using approximations of the $x_i$, on:
- the estimate $\Psi_n$ of the regression function $\Psi$?
- the consistency of the error to the optimal Bayes error?
Notations and assumptions

Suppose that we are studying the random pair $(X, Y)$ where:
- $X$ is functional and takes its values in the Hilbert space $(\mathcal{X}, \langle \cdot, \cdot \rangle_{\mathcal{X}})$;
- $Y$ takes its values in $\{-1, 1\}$ (classification case) or in $\mathbb{R}$ (regression case).

Suppose that we observe $(x_i^\tau, y_i)_{i=1,\dots,n}$ where:
- $x_i^\tau = (x_i(t))_{t \in \tau}$ (non-noisy case) or $x_i^\tau = (x_i(t) + \varepsilon_{i,t})_{t \in \tau}$ (noisy case);
- $\tau$ is the set of sampling points (the same for all functions);
- $(x_i, y_i)_i$ are i.i.d. copies of $(X, Y)$.
A smooth representation of sampled functions

Given $x^\tau$, splines aim at providing a representation of $x : [0,1] \to \mathbb{R}$ that is as smooth as possible. More precisely, $x$ is approximated by:

$\hat{x}^{\lambda,\tau} = \arg\min_{h \in \mathcal{H}^m} \frac{1}{|\tau|} \sum_{t \in \tau} \left( h(t) - x^\tau_t \right)^2 + \lambda \int_{[0,1]} \left( h^{(m)}(t) \right)^2 dt$

where, for $m > 3/2$, the Sobolev space $\mathcal{H}^m$ is defined by

$\mathcal{H}^m = \left\{ h \in L^2([0,1]) : \forall\, k = 1, \dots, m,\ h^{(k)} \text{ exists in a weak sense and } h^{(m)} \in L^2([0,1]) \right\}.$
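For the standard case $m = 2$ (cubic smoothing splines), this optimization problem can be solved numerically off the shelf. A minimal sketch, assuming NumPy and a recent SciPy (make_smoothing_spline, available from SciPy 1.10, solves this penalized criterion for $m = 2$, up to the $1/|\tau|$ normalization of the data-fit term); the sampled curve and the value of the smoothing parameter are purely illustrative:

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

# Sample a noisy curve on an irregular grid (illustrative data).
rng = np.random.default_rng(0)
tau = np.sort(rng.uniform(0, 1, 50))                  # sampling points
x_tau = np.sin(2 * np.pi * tau) + rng.normal(0, 0.1, 50)

# Cubic smoothing spline (m = 2): minimizes
#   sum_t (h(t) - x_t)^2 + lam * int (h''(t))^2 dt.
x_hat = make_smoothing_spline(tau, x_tau, lam=1e-4)

grid = np.linspace(0, 1, 200)
values = x_hat(grid)                                  # the estimate on a fine grid
derivs = x_hat.derivative()(grid)                     # its derivative, used later on
```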
Decomposition of $\mathcal{H}^m$

The key point for solving the previous optimization problem is to build a Hilbert structure on $\mathcal{H}^m$ such that $\|h\|_{\mathcal{H}^m} \simeq \|h^{(m)}\|_{L^2}$. This can be done by decomposing $\mathcal{H}^m$ into $\mathcal{H}^m = \mathcal{H}^m_0 \oplus \mathcal{H}^m_1$ where:
- $\mathcal{H}^m_0 = \operatorname{Ker} D^m = \mathcal{P}_{m-1}$ (the space of polynomial functions of degree at most $m - 1$);
- $\mathcal{H}^m_1$ is an infinite-dimensional subspace of $\mathcal{H}^m$ defined via $m$ boundary conditions, denoted $B : \mathcal{H}^m \to \mathbb{R}^m$, such that $\operatorname{Ker} B \cap \mathcal{P}_{m-1} = \{0\}$.

Example 1: for $m = 2$, $B : h \mapsto (h(0), h(1))$ and $\mathcal{H}^2_1 = \{h \in \mathcal{H}^2 : h(0) = h(1) = 0\}$.
Example 2: for $m > 3/2$, $B : h \mapsto (h(0), h'(0), \dots, h^{(m-1)}(0))$ and $\mathcal{H}^m_1 = \{h \in \mathcal{H}^m : Bh = 0\}$.
Hilbert structure of $\mathcal{H}^m$

$\mathcal{H}^m_0$ and $\mathcal{H}^m_1$ are Hilbert spaces with respect to the inner products:

$\forall\, u, v \in \mathcal{H}^m_0, \quad \langle u, v \rangle_{\mathcal{H}^m_0} = (Bu)^T (Bv),$
$\forall\, u, v \in \mathcal{H}^m_1, \quad \langle u, v \rangle_{\mathcal{H}^m_1} = \langle D^m u, D^m v \rangle_{L^2}.$

Hence, we obtain in this way an inner product on $\mathcal{H}^m$:

$\langle u, v \rangle_{\mathcal{H}^m} = \langle P_0(u), P_0(v) \rangle_{\mathcal{H}^m_0} + \langle P_1(u), P_1(v) \rangle_{\mathcal{H}^m_1} = \langle D^m u, D^m v \rangle_{L^2} + (Bu)^T (Bv)$

where $P_j$ is the projector on $\mathcal{H}^m_j$ for $j = 0, 1$.
RKHS structure of $\mathcal{H}^m$

Equipped with $\langle \cdot, \cdot \rangle_{\mathcal{H}^m}$, the spaces $\mathcal{H}^m_0$, $\mathcal{H}^m_1$ and $\mathcal{H}^m$ are Reproducing Kernel Hilbert Spaces. More precisely, there exist kernels $k_j : [0,1]^2 \to \mathbb{R}$ ($j = 0, 1$) such that:

$\forall\, u \in \mathcal{H}^m_j,\ \forall\, t \in [0,1], \quad \langle k_j(t, \cdot), u \rangle_{\mathcal{H}^m_j} = u(t).$

Hence, $k = k_0 + k_1$ is the reproducing kernel of $\mathcal{H}^m$.

Example 1: $k_0$ is easy to compute: if $(e_0, \dots, e_{m-1})$ is an orthonormal basis of $\mathcal{H}^m_0 = \mathcal{P}_{m-1}$ for the norm $\|\cdot\|_{\mathcal{H}^m_0}$, then $k_0(s,t) = \sum_{i=0}^{m-1} e_i(s) e_i(t)$.
- If $m = 2$ and the boundary conditions are $u(0) = u(1) = 0$, then $\{t \mapsto t,\ t \mapsto 1 - t\}$ is an orthonormal basis of $\mathcal{H}^m_0$ and $k_0(s,t) = (1-t)(1-s) + st$.
- If $m > 3/2$ and the boundary conditions are $h(0) = h'(0) = \dots = h^{(m-1)}(0) = 0$, then $\{t \mapsto t^i / i!\}_{i=0,\dots,m-1}$ is an orthonormal basis of $\mathcal{H}^m_0$ and $k_0(s,t) = \sum_{k=0}^{m-1} \frac{t^k s^k}{(k!)^2}$.

Example 2: $k_1$ can be found by way of the Green function $G : [0,1]^2 \to \mathbb{R}$ satisfying $u = \int_{[0,1]} G(\cdot, t)\, D^m u(t)\, dt$. We then have $k_1(s,t) = \int_{[0,1]} G(s,w)\, G(t,w)\, dw$.
- If $m = 2$ and the boundary conditions are $u(0) = u(1) = 0$, then $k_1(s,t) = (s-t)^3_+ - s(1-t)(s^2 - 2t + t^2)/6$.
- If $m > 3/2$ and the boundary conditions are $h(0) = h'(0) = \dots = h^{(m-1)}(0) = 0$, then $k_1(s,t) = \int_0^1 \frac{(t-w)^{m-1}_+ (s-w)^{m-1}_+}{((m-1)!)^2}\, dw.$

See [Berlinet and Thomas-Agnan, 2004] for further details and examples.
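These closed forms are straightforward to evaluate numerically. A minimal sketch (assuming SciPy for the quadrature) for the boundary conditions $h(0) = h'(0) = \dots = h^{(m-1)}(0) = 0$; the function names are illustrative:

```python
from math import factorial
from scipy.integrate import quad

def k0(s, t, m):
    # Polynomial part: k0(s, t) = sum_{k < m} (st)^k / (k!)^2, for the
    # boundary conditions h(0) = h'(0) = ... = h^{(m-1)}(0) = 0.
    return sum((s * t) ** k / factorial(k) ** 2 for k in range(m))

def k1(s, t, m):
    # k1(s, t) = int_0^1 (t-w)_+^{m-1} (s-w)_+^{m-1} / ((m-1)!)^2 dw;
    # the integrand vanishes for w > min(s, t), hence the upper bound.
    c = factorial(m - 1) ** 2
    f = lambda w: ((t - w) * (s - w)) ** (m - 1) / c
    val, _ = quad(f, 0.0, min(s, t))
    return val

def k(s, t, m=2):
    # Reproducing kernel of H^m: k = k0 + k1.
    return k0(s, t, m) + k1(s, t, m)
```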
Assumptions for existence and uniqueness of a spline

(A1) $|\tau| \geq m - 1$;
(A2) the sampling points are distinct in $[0,1]$;
(A3) the $m$ boundary conditions $B_j$ are linearly independent of the $|\tau|$ linear forms $h \in \mathcal{H}^m \mapsto h(t)$ for $t \in \tau$.
Computing the splines

Theorem [Kimeldorf and Wahba, 1971]. Under assumptions (A1)-(A3), for any given $x^\tau$, the unique solution of the optimization problem is:

$\hat{x}^{\lambda,\tau} = \omega^T \left( U (K_1 + \lambda I_{|\tau|})^{-1} U^T \right)^{-1} U (K_1 + \lambda I_{|\tau|})^{-1} x^\tau$
$\qquad + \eta^T (K_1 + \lambda I_{|\tau|})^{-1} \left( I_{|\tau|} - U^T \left( U (K_1 + \lambda I_{|\tau|})^{-1} U^T \right)^{-1} U (K_1 + \lambda I_{|\tau|})^{-1} \right) x^\tau$
$\qquad = (\omega^T M_0 + \eta^T M_1)\, x^\tau \qquad (1)$

where:
- $\{\omega_1, \dots, \omega_m\}$ is a basis of $\mathcal{P}_{m-1}$, $\omega = (\omega_1, \dots, \omega_m)^T$ and $U = (\omega_i(t))_{i=1,\dots,m,\ t \in \tau}$;
- $\eta = (k_1(t, \cdot))^T_{t \in \tau}$ and $K_1 = (k_1(t, t'))_{t, t' \in \tau}$.
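A direct NumPy transcription of formula (1), reusing the kernel k1 from the sketch above and taking the monomials as the (arbitrary) basis of $\mathcal{P}_{m-1}$; a sketch under those assumptions, not an optimized implementation (tau is a NumPy array of sampling points):

```python
import numpy as np

def spline_matrices(tau, lam, m=2):
    # Build M0 (m x |tau|) and M1 (|tau| x |tau|) from formula (1),
    # with omega_i(t) = t^{i-1} as basis of P_{m-1} and k1 as above.
    tau = np.asarray(tau)
    T = len(tau)
    U = np.vander(tau, N=m, increasing=True).T        # U[i, j] = tau_j ** i
    K1 = np.array([[k1(s, t, m) for t in tau] for s in tau])
    A_inv = np.linalg.inv(K1 + lam * np.eye(T))
    M0 = np.linalg.solve(U @ A_inv @ U.T, U @ A_inv)  # (U A^-1 U^T)^-1 U A^-1
    M1 = A_inv @ (np.eye(T) - U.T @ M0)
    return U, K1, M0, M1

def spline_eval(t, x_tau, tau, M0, M1, m=2):
    # \hat{x}^{lambda,tau}(t) = omega(t)^T M0 x^tau + eta(t)^T M1 x^tau.
    omega = np.array([t ** i for i in range(m)])
    eta = np.array([k1(s, t, m) for s in tau])
    return omega @ (M0 @ x_tau) + eta @ (M1 @ x_tau)
```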
Computing inner products between splines

Corollary. Under assumptions (A1)-(A3),

$\langle \hat{u}^{\lambda,\tau}, \hat{v}^{\lambda,\tau} \rangle_{\mathcal{H}^m} = (u^\tau)^T M_0^T W M_0\, v^\tau + (u^\tau)^T M_1^T K_1 M_1\, v^\tau = (u^\tau)^T M_\tau\, v^\tau$

(with $W$ the Gram matrix of the basis $(\omega_i)_i$ for $\langle \cdot, \cdot \rangle_{\mathcal{H}^m_0}$), where the matrix $M_\tau$ is symmetric and positive definite (and therefore defines an inner product on $\mathbb{R}^{|\tau|}$).
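Continuing the sketch above: for the monomial basis with $B h = (h(0), \dots, h^{(m-1)}(0))$, the Gram matrix $W$ is diagonal with entries $(i!)^2$, so $M_\tau$ and its Cholesky factor (used below) can be computed as follows; this assumes the helpers defined previously:

```python
from math import factorial
import numpy as np

def inner_product_matrix(tau, lam, m=2):
    # M_tau = M0^T W M0 + M1^T K1 M1; for the monomial basis with
    # B h = (h(0), ..., h^{(m-1)}(0)), W = diag((i!)^2, i = 0..m-1).
    U, K1, M0, M1 = spline_matrices(tau, lam, m)
    W = np.diag([float(factorial(i) ** 2) for i in range(m)])
    M_tau = M0.T @ W @ M0 + M1.T @ K1 @ M1
    M_tau = 0.5 * (M_tau + M_tau.T)      # enforce symmetry numerically
    Q = np.linalg.cholesky(M_tau).T      # Q^T Q = M_tau
    return M_tau, Q
```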
Assumptions for convergence of spline estimates

If $\tau = \{t_1, t_2, \dots, t_{|\tau|}\}$, denote:

$\overline{\Delta}_\tau = \max\{t_1, t_2 - t_1, \dots, 1 - t_{|\tau|}\}, \quad \underline{\Delta}_\tau = \min\{t_2 - t_1, t_3 - t_2, \dots, t_{|\tau|} - t_{|\tau|-1}\},$

and suppose that we are given a sequence of sampling sets $\tau_1, \tau_2, \dots, \tau_d, \dots$. The parameter $\lambda$ should now depend on $\tau$; we then write $(\tau_d)_d$ for the sequence of sampling sets and $(\lambda_d)_d$ for the associated sequence of regularization parameters. Suppose:

(A4) there is $R \in \mathbb{R}$ such that $\overline{\Delta}_\tau / \underline{\Delta}_\tau \leq R$;
(A5) $\lim_{d \to +\infty} |\tau_d| = +\infty$ and $\lim_{d \to +\infty} \lambda_d = 0$.
Convergence of spline estimates

Theorem [Ragozin, 1983]. Under assumptions (A1)-(A3), there are two constants, $A_{R,m}$ and $B_{R,m}$, depending only on $R$ and $m$, such that for any $x \in \mathcal{H}^m$ and any positive $\lambda$:

$\left\| \hat{x}^{\lambda,\tau} - x \right\|^2_{L^2} \leq \left( A_{R,m}\, \lambda + B_{R,m}\, \frac{1}{|\tau|^{2m}} \right) \left\| D^m x \right\|^2_{L^2}.$

Thus, under the additional assumptions (A4)-(A5), $\left\| \hat{x}^{\lambda_d,\tau_d} - x \right\|_{L^2} \xrightarrow{d \to +\infty} 0$.
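A quick numerical illustration of this rate; a sketch, assuming SciPy's make_smoothing_spline for the $m = 2$ case (the test function and the schedule $\lambda_d = |\tau_d|^{-4}$ are illustrative choices, not prescribed by the theorem): the $L^2$ error shrinks as the grid is refined and $\lambda$ decreases.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

x = lambda t: np.sin(2 * np.pi * t)      # a smooth test function in H^2
grid = np.linspace(0, 1, 2000)

for size in (10, 50, 250):
    tau = np.linspace(0, 1, size)        # regular grids satisfy (A4)
    x_hat = make_smoothing_spline(tau, x(tau), lam=size ** -4.0)
    err = np.sqrt(np.mean((x_hat(grid) - x(grid)) ** 2))  # ~ L^2([0,1]) norm
    print(f"|tau| = {size:4d}   L2 error ~ {err:.2e}")
```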
Just a single example: Tecator dataset
Notations and method

Suppose that we are given a pair of random variables $(X, Y)$ taking their values in $\mathcal{H}^m \times \{-1, 1\}$ (classification case) or in $\mathcal{H}^m \times \mathbb{R}$ (regression case). We are given a training set of size $n$, $\{(x_i^{\tau_d}, y_i)\}_{i=1,\dots,n}$, where:
- $x_i^{\tau_d} = (x_i(t_1), \dots, x_i(t_{|\tau_d|}))$;
- $\{(x_i, y_i)\}_i$ are i.i.d. copies of $(X, Y)$.
A general consistent method based on derivatives

Consider a consistent classification or regression scheme for data in $\mathbb{R}^p \times \{-1, 1\}$ (or $\mathbb{R}^p \times \mathbb{R}$) and denote by $\psi_{\mathcal{D}}$ the classifier (or regression function) obtained from a learning set $\mathcal{D} = \{(u_1, y_1), \dots, (u_n, y_n)\}$. If the definition of $\psi_{\mathcal{D}}$ is based on norms of, or inner products between, the $(u_i)_i$ (and, of course, on the $y_i$ values), then this method can be generalized to work with derivatives of the $x_i$ by replacing this inner product by:

$\langle D^m x_i, D^m x_j \rangle_{L^2} \simeq \langle \hat{x}_i^{\lambda_d,\tau_d}, \hat{x}_j^{\lambda_d,\tau_d} \rangle_{\mathcal{H}^m} = \langle M_{\tau_d} x_i^{\tau_d}, x_j^{\tau_d} \rangle_{\mathbb{R}^{|\tau_d|}}$

Writing $Q_{\tau_d}$ for the transpose of the Cholesky triangle of $M_{\tau_d}$ (so that $(Q_{\tau_d})^T Q_{\tau_d} = M_{\tau_d}$), we thus define a classifier or a regression function on the $(D^m x_i)_i$, using only the discrete samplings $(x_i^{\tau_d})_i$, by

$\phi_{n,\tau_d} = \psi_{\mathcal{E}_{n,\tau_d}} \quad \text{where} \quad \mathcal{E}_{n,\tau_d} = \{(Q_{\tau_d} x_i^{\tau_d}, y_i)\}_i.$
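In practice this is a one-line preprocessing step: map each sampled curve through $Q_{\tau_d}$ and hand the transformed vectors to any consistent multivariate learner. A sketch assuming scikit-learn and the inner_product_matrix helper above; X_tau (an (n, |tau|) array of sampled curves), y (labels in {-1, 1}), X_new and tau are illustrative names, and k-nearest neighbours stands in for the generic consistent scheme $\psi$:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Q is the transposed Cholesky triangle of M_tau (Q^T Q = M_tau).
_, Q = inner_product_matrix(tau, lam=1e-4, m=2)

# Training on Q x_i^tau makes Euclidean inner products in R^{|tau|}
# approximate <D^m x_i, D^m x_j>_{L^2}.
psi = KNeighborsClassifier(n_neighbors=5)
psi.fit(X_tau @ Q.T, y)

predictions = psi.predict(X_new @ Q.T)   # new curves sampled on the same grid
```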
Consistency property

Theorem [Rossi and Villa, 2008]. Under assumptions (A1)-(A4) and either
(A5a) $\mathbb{E}\left( \left\| D^m X \right\|_{L^2} \right)$ is finite and $Y \in \{-1, 1\}$; or
(A5b) $\tau_d \subset \tau_{d+1}$ and $\mathbb{E}(Y^2)$ is finite,
we have

$\lim_{d \to +\infty} \lim_{n \to +\infty} L\phi_{n,\tau_d} = L^*.$

Sketch of the proof: using the convergence of the splines (case (A5a)) or a martingale argument (case (A5b)), we show that

$L^*_{\tau_d} - L^* \xrightarrow{d \to +\infty} 0$

where $L^*_{\tau_d} = \inf_{\phi : \mathbb{R}^{|\tau_d|} \to \mathbb{R}} \mathbb{P}\left( \phi(X^{\tau_d}) \neq Y \right)$ (classification case) or $L^*_{\tau_d} = \inf_{\phi : \mathbb{R}^{|\tau_d|} \to \mathbb{R}} \mathbb{E}\left( [\phi(X^{\tau_d}) - Y]^2 \right)$ (regression case). Then, using the consistency assumption on the multidimensional method, we have, for all $d$,

$L\phi_{n,\tau_d} - L^*_{\tau_d} \xrightarrow{n \to +\infty} 0.$
Application to kernel methods (SVM and kernel ridge regression)

Under additional assumptions, kernel methods of the form

$F_{\mathcal{D}} = \arg\min_F \sum_{i=1}^{n} L(y_i, F(u_i)) + C \left\| F \right\|_{\mathcal{S}}$

are consistent both for classification and for regression purposes [Steinwart, 2002, Christmann and Steinwart, 2007]. Applying the general framework to kernel methods leads to the definition of the following kernel:

$K^{\tau_d} = K \circ Q_{\tau_d}$

from $(\mathbb{R}^{|\tau_d|})^2$ to $\mathbb{R}$, where $K$ is any usual multidimensional kernel. This kernel uses the sampling $x_i^{\tau_d}$ to approximately compute the functional kernel

$\mathcal{K} : (u, v) \in \mathcal{H}^m \mapsto K(\left\| u - v \right\|_{\mathcal{H}^m}).$
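Concretely, composing a Gaussian kernel with $Q_{\tau_d}$ amounts to running a standard SVM on the transformed samples. A sketch with scikit-learn, reusing the illustrative objects X_tau, y, X_new and Q from the previous snippet:

```python
from sklearn.svm import SVC

# K^{tau}(u, v) = exp(-gamma ||Q u - Q v||^2): a Gaussian kernel on the
# Q-transformed samples approximates K(||u_hat - v_hat||_{H^m}).
svm = SVC(kernel="rbf", gamma=1.0, C=1.0)
svm.fit(X_tau @ Q.T, y)
predictions = svm.predict(X_new @ Q.T)
```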
Corollary: consistency of kernel-based methods for classification

Derivative-based SVM consistency. Suppose that assumptions (A1)-(A5) are fulfilled. Suppose also:
(A6) for all $d$, the kernel $K$ is universal on any compact subset of $\mathbb{R}^{|\tau_d|}$ and has a covering number of the form $\mathcal{N}(K, \epsilon) = O_n(\epsilon^{-\alpha_d})$ for some $\alpha_d > 0$ on this compact subset, and the sequence of regularization parameters $C \equiv (C_n^d)$ is such that, for each $d$, $\lim_{n \to +\infty} n C_n^d = +\infty$ and $C_n^d = O_n\left( n^{\beta_d - 1} \right)$ for some $0 < \beta_d < \frac{1}{\alpha_d}$;
(A7) for all $d$, there is a bounded subset $B_d$ of $\mathbb{R}^{|\tau_d|}$ such that $X^{\tau_d}$ belongs to $B_d$.
Then, the SVM classifier is universally consistent:

$\lim_{d \to +\infty} \lim_{n \to +\infty} L\phi_{n,\tau_d} = L^*.$
Corollary: consistency of kernel-based methods for regression

Derivative-based kernel ridge regression consistency. Suppose that assumptions (A1)-(A5) are fulfilled. Suppose also:
(A6) for all $d$, the kernel $K$ is universal on any compact subset of $\mathbb{R}^{|\tau_d|}$, and the sequence of regularization parameters $C \equiv (C_n^d)$ is such that, for each $d$, $\lim_{n \to +\infty} n C_n^d = +\infty$ and $\lim_{n \to +\infty} n (C_n^d)^{4/3} = 0$;
(A7) for all $d$, there is a bounded subset $B_d$ of $\mathbb{R}^{|\tau_d|}$ such that $X^{\tau_d}$ belongs to $B_d$.
Then, kernel ridge regression is universally consistent:

$\lim_{d \to +\infty} \lim_{n \to +\infty} L\phi_{n,\tau_d} = L^*.$
Linear regression with noisy covariates

Let us finally come back to the linear model

$Y = \langle X, a \rangle_{L^2([0,1])} + \epsilon$

with $Y$ a real random variable, $X$ a random variable taking its values in $L^2([0,1])$ and $\epsilon$ a centered real random variable independent of $X$. But in the case of "noisy covariates", we do not observe the pair $(X, Y)$ but $(w_i(t_1), \dots, w_i(t_p), y_i)_{i=1,\dots,n}$ with

$w_i(t_j) = x_i(t_j) + \delta_{i,j}$

where $(x_i, y_i)_i$ are (not necessarily independent) copies of $(X, Y)$ and the $\delta_{i,j}$ are independent, centered real random variables with a finite fourth moment; moreover, the $\delta_{i,j}$ are supposed to be independent of $X$ and $\epsilon$. In the following, we will assume that the observations are centered, to avoid notational difficulties.
Spline estimators

In the case where the $x_i$ are known, a spline estimate of $a$ would be:

$a_n := \arg\min_{h \in \mathcal{H}^m} \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \frac{1}{p} \sum_{j=1}^{p} h(t_j) x_i(t_j) \right)^2 + \rho \left\| h \right\|_{\mathcal{H}^m}$

The solution is given by

$a_n = \frac{1}{n} \left( \frac{1}{np} X^T X + \rho A_m \right)^{-1} X^T Y$

where $X = (x_i(t_j))_{i=1,\dots,n,\ j=1,\dots,p}$, $Y = (y_1, \dots, y_n)^T$, and $A_m$ is the matrix that defines the $\mathcal{H}^m$-norm from the discrete sampling at $(t_j)_j$.

In the case where the $x_i$ are only known through noisy covariates, a spline estimate of $a$ can be obtained through the Total Least Squares approach [Cardot et al., 2007]:

$a_n := \arg\min_{h \in \mathcal{H}^m,\ (x_{i,j})_{i,j}} \frac{1}{n} \sum_{i=1}^{n} \left[ \left( y_i - \frac{1}{p} \sum_{j=1}^{p} h(t_j) x_{i,j} \right)^2 + \frac{1}{p} \sum_{j=1}^{p} \left( x_{i,j} - w_i(t_j) \right)^2 \right] + \rho \left\| h \right\|_{\mathcal{H}^m}$

The solution is given by

$a_n = \frac{1}{n} \left( \frac{1}{np} X^T X + \rho A_m - p \sigma_k^2 I_p \right)^{-1} X^T Y$

where $\sigma_k$ is replaced, in practice, by $\frac{\sigma_\delta}{p}$, with $\sigma_\delta$ an estimate of the standard deviation of $\delta$.
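A direct NumPy transcription of the TLS formula above; a sketch, assuming the noisy design matrix W_obs = (w_i(t_j)), the response vector Y_vec, the norm matrix A_m and an estimate sigma_delta of the noise standard deviation are available (all names illustrative):

```python
import numpy as np

def tls_spline_estimate(W_obs, Y_vec, A_m, rho, sigma_delta):
    # a_n = (1/n) ((1/(np)) W^T W + rho A_m - p sigma_k^2 I_p)^{-1} W^T Y,
    # with sigma_k replaced in practice by sigma_delta / p.
    n, p = W_obs.shape
    sigma_k = sigma_delta / p
    G = W_obs.T @ W_obs / (n * p) + rho * A_m - p * sigma_k ** 2 * np.eye(p)
    return np.linalg.solve(G, W_obs.T @ Y_vec) / n   # values of a_n at (t_j)_j
```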
Assumptions for convergence of $a_n$

- $a$ belongs to $\mathcal{H}^m$;
- there exists a constant $\kappa$, $0 < \kappa < 1$, such that, for every $\delta > 0$, there exists $C$: $\mathbb{P}\left( |X(t) - X(s)| \leq C |t - s|^\kappa,\ s, t \in [0,1] \right) \geq 1 - \delta$;
- there exist $E \in \mathbb{R}$ and, for all $k \in \mathbb{N}$, a $k$-dimensional subspace $L_k$ of $L^2$ with $\mathbb{E}\left( \inf_{h \in L_k} \sup_t \left| X(t) - h(t) \right|^2 \right) \leq E k^{-2q}$;
- there is a constant $F$: $\operatorname{Var}\left( \langle \Gamma_n^X \zeta_s, \zeta_t \rangle_{L^2} \right) \leq \frac{F}{n}\, \mathbb{E}\left( \langle X - \mathbb{E}(X), \zeta_s \rangle^2_{L^2} \right) \mathbb{E}\left( \langle X - \mathbb{E}(X), \zeta_t \rangle^2_{L^2} \right)$;
- for each $\delta > 0$, there exists $D$: $\mathbb{P}\left( \frac{1}{\sqrt{p}} \left\| \frac{1}{np} X^T X a \right\|_{\mathbb{R}^p} \leq D \right) \geq 1 - \delta$;
- $n p^{-2\kappa} = O(1)$, $\lim_{n,p \to +\infty} \rho = 0$ and $\lim_{n,p \to +\infty} \frac{1}{n\rho} = 0$.
Convergence of $a_n$

Theorem [Crambes et al., 2008]. Under the previous assumptions,

$\left\| a_n - a \right\|_{\Gamma_X} = O_P\left( \frac{1}{np\rho} + \frac{1}{n} + n^{-(2q+1)/2} \right).$
Application to prediction of ozone
The data is a time series of the daily maximum ozone concentration in Toulouse (France).
References
Further details on the references are given in the accompanying document.

Berlinet, A. and Thomas-Agnan, C. (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publishers.

Cardot, H., Crambes, C., Kneip, A., and Sarda, P. (2007). Smoothing splines estimators in functional linear regression with errors-in-variables. Computational Statistics and Data Analysis, 51:4832-4848.

Christmann, A. and Steinwart, I. (2007). Consistency and robustness of kernel-based regression in convex risk minimization. Bernoulli, 13(3):799-819.

Crambes, C., Kneip, A., and Sarda, P. (2008). Smoothing splines estimators for functional linear regression. Annals of Statistics.

Kimeldorf, G. and Wahba, G. (1971). Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33(1):82-95.

Ragozin, D. (1983). Error bounds for derivative estimation based on spline smoothing of exact or noisy data. Journal of Approximation Theory, 37:335-355.

Rossi, F. and Villa, N. (2008). Classification and regression based on derivatives: a consistency result applied to functional kernel based classification and regression. Work in progress.

Steinwart, I. (2002). Support vector machines are universally consistent. Journal of Complexity, 18:768-791.