Influence of the sampling on Functional Data Analysis

Nathalie Villa-Vialaneix - [email protected]
http://www.nathalievilla.org

Institut de Mathématiques de Toulouse - IUT de Carcassonne, Université de Perpignan, France

La Havane, September 18th, 2008


Short courses on functional data analysis and statistical learning, part 4



Table of contents

1 Introduction to the sampling problem
2 Approximating functions with splines
3 Using splines in functional models based on sampling
4 References

We do not observe functional data!

In most theoretical work, the functional observations $x_1, \ldots, x_n$ are directly the true functions.

But, in fact, we can observe:

$$x_i = \left( x_i(a),\ x_i\!\left(a + \frac{b-a}{L}\right),\ x_i\!\left(a + \frac{2(b-a)}{L}\right), \ldots, x_i(b) \right)$$

Or:

$$x_i = \left( x_i(t^i_1), x_i(t^i_2), \ldots, x_i(t^i_{d_i}) \right)$$

Or even, not the true sampling:

$$x_i = \left( x_i(t^i_1) + \varepsilon_{i,t^i_1},\ x_i(t^i_2) + \varepsilon_{i,t^i_2}, \ldots,\ x_i(t^i_{d_i}) + \varepsilon_{i,t^i_{d_i}} \right)$$

Consequences on the estimators and their errors

Hence, most of the time, functional data analysis consists in:

1. building estimators of the $x_i$ from their sampling, $\hat{x}_i = \Upsilon(x_i^\tau)$;
2. using the $\hat{x}_i$ (or their derivatives) as if they were the true functions $x_i$.

Problem: most of the theoretical results presented in the past days were based on the knowledge of the $x_i$, not on the estimations $\hat{x}_i$. What are the consequences on

- the estimate $\Psi_n$ of the regression function $\Psi$,
- the consistency of the error toward the optimal Bayes error,

when using an approximation of the $x_i$?

Notations and assumptions

Suppose that we are studying the random pair $(X, Y)$ where:

- $X$ is functional and takes its values in the Hilbert space $(\mathcal{X}, \langle \cdot, \cdot \rangle_{\mathcal{X}})$;
- $Y$ takes its values in $\{-1, 1\}$ (classification case) or in $\mathbb{R}$ (regression case).

Suppose that we observe $(x_i^\tau, y_i)_{i=1,\ldots,n}$ where:

- $x_i^\tau = (x_i(t))_{t \in \tau}$ (non-noisy case) or $x_i^\tau = (x_i(t) + \varepsilon_{i,t})_{t \in \tau}$ (noisy case);
- $\tau$ is the set of sampling points (the same for all functions);
- $(x_i, y_i)_i$ are i.i.d. copies of $(X, Y)$.
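To fix ideas, here is a minimal simulation sketch in Python (illustrative code, not from the slides; all names and constants are mine) producing exactly this kind of data: smooth random curves sampled on a common grid $\tau$, with observation noise added.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 50, 100                      # number of curves, number of sampling points
tau = np.linspace(0.01, 1, d)       # common sampling grid (the same for all curves)

# Smooth random curves: random combinations of a few sine basis functions
coefs = rng.normal(size=(n, 5))
basis = np.array([np.sin((k + 1) * np.pi * tau) for k in range(5)])  # (5, d)
x_true = coefs @ basis              # row i holds (x_i(t))_{t in tau}

# Noisy case: x_i^tau = (x_i(t) + eps_{i,t})_{t in tau}
x_tau = x_true + 0.1 * rng.normal(size=(n, d))

# A scalar response, e.g. a noisy linear functional of x_i (regression case)
y = x_true.mean(axis=1) + 0.05 * rng.normal(size=n)
```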

2 Approximating functions with splines

A smooth representation of sampled functions

Given $x^\tau$, splines aim at providing a representation of $x : [0,1] \to \mathbb{R}$ that is as smooth as possible. More precisely, $x$ is approximated by:

$$\hat{x}^{\lambda,\tau} = \arg\min_{h \in \mathcal{H}^m} \frac{1}{|\tau|} \sum_{t \in \tau} \left( h(t) - x^\tau_t \right)^2 + \lambda \int_{[0,1]} \left( h^{(m)}(t) \right)^2 dt$$

where, for $m > 3/2$, the Sobolev space $\mathcal{H}^m$ is defined by

$$\mathcal{H}^m = \left\{ h \in L^2([0,1]) : \forall\, k = 1, \ldots, m,\ h^{(k)} \text{ exists in a weak sense and } h^{(m)} \in L^2([0,1]) \right\}.$$
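For the cubic case $m = 2$, this penalized criterion is available directly in SciPy (assuming SciPy >= 1.10, where `make_smoothing_spline` solves this problem with penalty $\lambda \int (h'')^2$, up to the $1/|\tau|$ rescaling of $\lambda$); a minimal sketch:

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(1)
tau = np.linspace(0, 1, 100)
x_tau = np.sin(2 * np.pi * tau) + 0.1 * rng.normal(size=tau.size)

# Cubic smoothing spline (m = 2): minimizes sum_t (h(t) - x_t)^2 + lam * int (h'')^2
spline = make_smoothing_spline(tau, x_tau, lam=1e-4)

t_grid = np.linspace(0, 1, 500)
x_hat = spline(t_grid)                 # smooth estimate of x
dx_hat = spline.derivative()(t_grid)   # its derivative, used later in the talk
```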

Decomposition of $\mathcal{H}^m$

The key point for solving the previous optimization problem is to build a Hilbert structure on $\mathcal{H}^m$ such that $\|h\|_{\mathcal{H}^m} \simeq \|h^{(m)}\|_{L^2}$.

This can be done by decomposing $\mathcal{H}^m$ into $\mathcal{H}^m = \mathcal{H}^m_0 \oplus \mathcal{H}^m_1$ where

- $\mathcal{H}^m_0 = \operatorname{Ker} D^m = \mathcal{P}_{m-1}$ (the space of polynomial functions of degree at most $m - 1$);
- $\mathcal{H}^m_1$ is an infinite dimensional subspace of $\mathcal{H}^m$ defined via $m$ boundary conditions, denoted $B : \mathcal{H}^m \to \mathbb{R}^m$, such that $\operatorname{Ker} B \cap \mathcal{P}_{m-1} = \{0\}$.

Example 1: for $m = 2$, $B : h \mapsto (h(0), h(1))$ and $\mathcal{H}^m_1 = \{h \in \mathcal{H}^2 : h(0) = h(1) = 0\}$.

Example 2: for $m > 3/2$, $B : h \mapsto (h(0), h'(0), \ldots, h^{(m-1)}(0))$ and $\mathcal{H}^m_1 = \{h \in \mathcal{H}^m : Bh = 0\}$.

Hilbert structure of $\mathcal{H}^m$

$\mathcal{H}^m_0$ and $\mathcal{H}^m_1$ are Hilbert spaces with respect to the inner products:

$$\forall\, u, v \in \mathcal{H}^m_0, \quad \langle u, v \rangle_{\mathcal{H}^m_0} = (Bu)^T (Bv),$$
$$\forall\, u, v \in \mathcal{H}^m_1, \quad \langle u, v \rangle_{\mathcal{H}^m_1} = \langle D^m u, D^m v \rangle_{L^2}.$$

Hence, we obtain in this way an inner product on $\mathcal{H}^m$:

$$\langle u, v \rangle_{\mathcal{H}^m} = \langle P_0(u), P_0(v) \rangle_{\mathcal{H}^m_0} + \langle P_1(u), P_1(v) \rangle_{\mathcal{H}^m_1} = \langle D^m u, D^m v \rangle_{L^2} + (Bu)^T (Bv)$$

where $P_j$ is the projector on $\mathcal{H}^m_j$ for $j = 0, 1$.

RKHS structure of $\mathcal{H}^m$

Equipped with $\langle \cdot, \cdot \rangle_{\mathcal{H}^m}$, the spaces $\mathcal{H}^m_0$, $\mathcal{H}^m_1$ and $\mathcal{H}^m$ are reproducing kernel Hilbert spaces. More precisely, there exist kernels $k_j : [0,1]^2 \to \mathbb{R}$ ($j = 0, 1$) such that:

$$\forall\, u \in \mathcal{H}^m_j,\ \forall\, t \in [0,1], \quad \langle k_j(t, \cdot), u \rangle_{\mathcal{H}^m_j} = u(t).$$

Hence, $k = k_0 + k_1$ is the reproducing kernel of $\mathcal{H}^m$.

Example 1: $k_0$ is easy to compute: if $(e_0, \ldots, e_{m-1})$ is an orthonormal basis of $\mathcal{H}^m_0 = \mathcal{P}_{m-1}$ for the norm $\|\cdot\|_{\mathcal{H}^m_0}$, then $k_0(s,t) = \sum_{i=0}^{m-1} e_i(s)\, e_i(t)$.

- If $m = 2$ and the boundary conditions are $u(0) = u(1) = 0$, then $\{t \mapsto t,\ t \mapsto 1 - t\}$ is an orthonormal basis of $\mathcal{H}^m_0$ and $k_0(s,t) = (1-t)(1-s) + st$.
- If $m > 3/2$ and the boundary conditions are $h(0) = h'(0) = \ldots = h^{(m-1)}(0) = 0$, then $\{t \mapsto t^i / i!\}_{i=0,\ldots,m-1}$ is an orthonormal basis of $\mathcal{H}^m_0$ and $k_0(s,t) = \sum_{k=0}^{m-1} \frac{t^k s^k}{(k!)^2}$.

Example 2: $k_1$ can be found by way of the Green function $G : [0,1]^2 \to \mathbb{R}$ satisfying $u = \int_{[0,1]} G(\cdot, t)\, D^m u(t)\, dt$ on $\mathcal{H}^m_1$. We have:

$$k_1(s,t) = \int_{[0,1]} G(s,w)\, G(t,w)\, dw.$$

- If $m = 2$ and the boundary conditions are $u(0) = u(1) = 0$, then $k_1(s,t) = \frac{(s-t)^3_+}{6} - \frac{s(1-t)(s^2 - 2t + t^2)}{6}$.
- If $m > 3/2$ and the boundary conditions are $h(0) = h'(0) = \ldots = h^{(m-1)}(0) = 0$, then
$$k_1(s,t) = \int_0^1 \frac{(t-w)^{m-1}_+ (s-w)^{m-1}_+}{\left((m-1)!\right)^2}\, dw.$$

See [Berlinet and Thomas-Agnan, 2004] for further details and examples.
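As a sanity check, both kernels of Example 2 are easy to evaluate numerically; the sketch below (illustrative code, not from the slides) computes $k_0$ in closed form and $k_1$ by quadrature, for the boundary conditions $h(0) = h'(0) = \ldots = h^{(m-1)}(0) = 0$.

```python
import numpy as np
from math import factorial
from scipy.integrate import quad

def k0(s, t, m):
    """k0(s, t) = sum_{k=0}^{m-1} (s t)^k / (k!)^2 (Example 1, second case)."""
    return sum((s * t) ** k / factorial(k) ** 2 for k in range(m))

def k1(s, t, m):
    """k1(s, t) = int_0^1 (t-w)_+^{m-1} (s-w)_+^{m-1} / ((m-1)!)^2 dw."""
    integrand = lambda w: (max(t - w, 0.0) * max(s - w, 0.0)) ** (m - 1)
    val, _ = quad(integrand, 0.0, min(s, t))  # the integrand vanishes past min(s, t)
    return val / factorial(m - 1) ** 2

m = 2
k = lambda s, t: k0(s, t, m) + k1(s, t, m)   # reproducing kernel of H^m
print(k(0.3, 0.7), k(0.7, 0.3))              # symmetric, as a kernel must be
```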

Assumptions for existence and uniqueness of a spline

(A1) $|\tau| \geq m - 1$;
(A2) the sampling points are distinct in $[0, 1]$;
(A3) the $m$ boundary conditions $B_j$ are linearly independent from the $|\tau|$ linear forms $h \in \mathcal{H}^m \mapsto h(t)$, for $t \in \tau$.

Computing the splines

Theorem [Kimeldorf and Wahba, 1971]. Under assumptions (A1)-(A3), for any given $x^\tau$, the unique solution of the optimization problem is:

$$\hat{x}^{\lambda,\tau} = \omega^T \left( U (K_1 + \lambda I_{|\tau|})^{-1} U^T \right)^{-1} U (K_1 + \lambda I_{|\tau|})^{-1} x^\tau + \eta^T (K_1 + \lambda I_{|\tau|})^{-1} \left( I_{|\tau|} - U^T \left( U (K_1 + \lambda I_{|\tau|})^{-1} U^T \right)^{-1} U (K_1 + \lambda I_{|\tau|})^{-1} \right) x^\tau = (\omega^T M_0 + \eta^T M_1)\, x^\tau \quad (1)$$

where

- $\{\omega_1, \ldots, \omega_m\}$ is a basis of $\mathcal{P}_{m-1}$, $\omega = (\omega_1, \ldots, \omega_m)^T$ and $U = (\omega_i(t))_{i=1,\ldots,m,\ t \in \tau}$;
- $\eta = (k_1(t, \cdot))^T_{t \in \tau}$ and $K_1 = (k_1(t, t'))_{t, t' \in \tau}$.
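A direct NumPy transcription of formula (1), for $m = 2$, the monomial basis $\{1, t\}$ and the kernel $k_1$ of the boundary conditions $h(0) = h'(0) = 0$ (a sketch under those choices; function and variable names are mine):

```python
import numpy as np

def k1(s, t):
    """Closed form of int_0^1 (s-w)_+ (t-w)_+ dw for m = 2:
    min(s,t)^2 * max(s,t) / 2 - min(s,t)^3 / 6."""
    lo, hi = np.minimum(s, t), np.maximum(s, t)
    return lo**2 * hi / 2 - lo**3 / 6

def spline_matrices(tau, lam, m=2):
    """Build U, K1 and the matrices M0, M1 of formula (1)."""
    U = np.vstack([tau**i for i in range(m)])        # (m, |tau|), basis {1, t}
    K1 = k1(tau[:, None], tau[None, :])              # (|tau|, |tau|)
    A = np.linalg.inv(K1 + lam * np.eye(tau.size))   # (K1 + lam I)^{-1}
    M0 = np.linalg.inv(U @ A @ U.T) @ U @ A          # maps x^tau to polynomial coefficients
    M1 = A @ (np.eye(tau.size) - U.T @ M0)           # maps x^tau to kernel coefficients
    return U, K1, M0, M1

def eval_spline(t, tau, x_tau, M0, M1, m=2):
    """x_hat(t) = omega(t)^T M0 x^tau + eta(t)^T M1 x^tau."""
    omega = np.array([t**i for i in range(m)])
    eta = k1(t, tau)
    return omega @ (M0 @ x_tau) + eta @ (M1 @ x_tau)

tau = np.linspace(0.01, 1, 30)
x_tau = np.sin(2 * np.pi * tau) + 0.05 * np.random.default_rng(2).normal(size=30)
U, K1, M0, M1 = spline_matrices(tau, lam=1e-4)
print(eval_spline(0.5, tau, x_tau, M0, M1))
```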

Computing inner products between splines

Corollary. Under assumptions (A1)-(A3),

$$\langle \hat{u}^{\lambda,\tau}, \hat{v}^{\lambda,\tau} \rangle_{\mathcal{H}^m} = (u^\tau)^T M_0^T W M_0\, v^\tau + (u^\tau)^T M_1^T K_1 M_1\, v^\tau = (u^\tau)^T M_\tau\, v^\tau$$

where the matrix $M_\tau$ is symmetric and positive definite (and therefore defines an inner product on $\mathbb{R}^{|\tau|}$).
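Continuing the previous sketch, $M_\tau$ and its Cholesky factor (used below to feed multidimensional methods) can be assembled as follows. The slide leaves $W$ undefined; reading it as the Gram matrix $((B\omega_i)^T(B\omega_j))_{i,j}$ of the polynomial basis is an assumption on my part, and that matrix is the identity for the basis $\{1, t\}$ with $Bh = (h(0), h'(0))$:

```python
# Gram matrix of the polynomial basis in H^m_0; for the basis {1, t} and
# B h = (h(0), h'(0)), the basis is orthonormal, so W is the identity.
W = np.eye(2)

M_tau = M0.T @ W @ M0 + M1.T @ K1 @ M1   # symmetric positive definite
Q_tau = np.linalg.cholesky(M_tau).T      # (Q_tau)^T Q_tau = M_tau

# <u_hat, v_hat>_{H^m} = (u^tau)^T M_tau v^tau = <Q_tau u^tau, Q_tau v^tau>_{R^|tau|}
```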

Assumptions for convergence of spline estimates

If $\tau = \{t_1, t_2, \ldots, t_{|\tau|}\}$, denote

$$\overline{\Delta}_\tau = \max\{t_1,\ t_2 - t_1,\ \ldots,\ 1 - t_{|\tau|}\}, \qquad \underline{\Delta}_\tau = \min\{t_2 - t_1,\ t_3 - t_2,\ \ldots,\ t_{|\tau|} - t_{|\tau|-1}\},$$

and suppose that we are given a sequence of sampling sets $\tau_1, \tau_2, \ldots, \tau_d, \ldots$. The parameter $\lambda$ should now depend on $\tau$: we write $(\tau_d)_d$ for the sequence of sampling sets and $(\lambda_d)_d$ for the associated sequence of regularization parameters. Suppose:

(A4) there is $R \in \mathbb{R}$ such that $\overline{\Delta}_\tau / \underline{\Delta}_\tau \leq R$;
(A5) $\lim_{d \to +\infty} |\tau_d| = +\infty$ and $\lim_{d \to +\infty} \lambda_d = 0$.

Convergence of spline estimates

Theorem [Ragozin, 1983]. Under assumptions (A1)-(A3), there are two constants $A_{R,m}$ and $B_{R,m}$, depending only on $R$ and $m$, such that for any $x \in \mathcal{H}^m$ and any positive $\lambda$:

$$\left\| \hat{x}^{\lambda,\tau} - x \right\|^2_{L^2} \leq \left( A_{R,m}\, \lambda + B_{R,m}\, \frac{1}{|\tau|^{2m}} \right) \left\| D^m x \right\|^2_{L^2}.$$

Thus, under the additional assumptions (A4)-(A5), $\left\| \hat{x}^{\lambda_d,\tau_d} - x \right\|_{L^2} \xrightarrow{d \to +\infty} 0$.

Just a single example: Tecator dataset

3 Using splines in functional models based on sampling

Notations and method

Suppose that we are given a pair of random variables $(X, Y)$ taking their values in $\mathcal{H}^m \times \{-1, 1\}$ (classification case) or in $\mathcal{H}^m \times \mathbb{R}$ (regression case).

We are given a training set of size $n$, $\{(x_i^{\tau_d}, y_i)\}_{i=1,\ldots,n}$, where

- $x_i^{\tau_d} = (x_i(t_1), \ldots, x_i(t_{|\tau_d|}))$;
- $\{(x_i, y_i)\}_i$ are i.i.d. copies of $(X, Y)$.

A general consistent method based on derivatives

Consider a consistent classification or regression scheme for data in $\mathbb{R}^p \times \{-1, 1\}$ (or $\mathbb{R}^p \times \mathbb{R}$) and denote by $\psi_D$ the classifier (or regression function) obtained from a learning set $D = \{(u_1, y_1), \ldots, (u_n, y_n)\}$.

If the definition of $\psi_D$ is based on norms or inner products between the $(u_i)_i$ (and, of course, on the $y_i$ values), then this method can be generalized to work with derivatives of the $x_i$ by replacing this inner product by:

$$\langle D^m x_i, D^m x_j \rangle_{L^2} \simeq \langle \hat{x}_i^{\lambda_d,\tau_d}, \hat{x}_j^{\lambda_d,\tau_d} \rangle_{\mathcal{H}^m} = \langle M_{\tau_d} x_i^{\tau_d}, x_j^{\tau_d} \rangle_{\mathbb{R}^{|\tau_d|}}.$$

Writing $Q_{\tau_d}$ for the transpose of the Cholesky triangle of $M_{\tau_d}$ (so that $(Q_{\tau_d})^T Q_{\tau_d} = M_{\tau_d}$), we thus define a classifier or a regression function on the $(D^m x_i)_i$, using only the discrete samplings $(x_i^{\tau_d})_i$, by

$$\varphi_{n,\tau_d} = \psi_{\varepsilon_{n,\tau_d}} \quad \text{where} \quad \varepsilon_{n,\tau_d} = \{(Q_{\tau_d} x_i^{\tau_d}, y_i)\}_i.$$
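A minimal sketch of this recipe, with an RBF support vector machine standing in for $\psi$ (scikit-learn assumed; `tau` and `Q_tau` come from the sketches above, and the two-class curves are simulated for illustration):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
labels = np.where(rng.uniform(size=50) < 0.5, -1, 1)
curves = np.array([np.sin(2 * np.pi * tau) * (1 + 0.3 * lab)
                   + 0.05 * rng.normal(size=tau.size) for lab in labels])

# Transformed learning set {(Q_tau x_i^tau, y_i)}_i: Euclidean inner products of
# the transformed points equal the H^m inner products of the spline estimates.
Z = curves @ Q_tau.T
phi = SVC(kernel="rbf").fit(Z, labels)
print(phi.score(Z, labels))
```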

Consistency property

Theorem [Rossi and Villa, 2008]. Under assumptions (A1)-(A4) and either

(A5a) $E\left( \| D^m X \|_{L^2} \right)$ is finite and $Y \in \{-1, 1\}$; or
(A5b) $\tau_d \subset \tau_{d+1}$ and $E(Y^2)$ is finite,

we have

$$\lim_{d \to +\infty} \lim_{n \to +\infty} L(\varphi_{n,\tau_d}) = L^*.$$

Sketch of the proof: using the convergence of the splines (under (A5a)) or a martingale argument (under (A5b)), we show that

$$L^*_{\tau_d} - L^* \xrightarrow{d \to +\infty} 0$$

where $L^*_{\tau_d} = \inf_{\varphi : \mathbb{R}^{|\tau_d|} \to \mathbb{R}} P\left( \varphi(X^{\tau_d}) \neq Y \right)$ (classification case) or $L^*_{\tau_d} = \inf_{\varphi : \mathbb{R}^{|\tau_d|} \to \mathbb{R}} E\left( [\varphi(X^{\tau_d}) - Y]^2 \right)$ (regression case). Then, using the consistency assumption on the multidimensional method, we have, for all $d$,

$$L(\varphi_{n,\tau_d}) - L^*_{\tau_d} \xrightarrow{n \to +\infty} 0.$$

Application to kernel methods (SVM and kernel ridge regression)

Provided additional assumptions hold, kernel methods

$$F_D = \arg\min_F \sum_{i=1}^n L(y_i, F(u_i)) + C \|F\|_{\mathcal{S}}$$

are consistent both for classification and for regression purposes [Steinwart, 2002, Christmann and Steinwart, 2007].

Applying the general framework to kernel methods leads to the definition of the following kernel:

$$K^{\tau_d} = K \circ Q_{\tau_d}$$

from $(\mathbb{R}^{|\tau_d|})^2$ to $\mathbb{R}$, where $K$ is any usual multidimensional kernel. This kernel uses the samplings $x_i^{\tau_d}$ to approximately compute the functional kernel

$$\mathcal{K} : (u, v) \in \mathcal{H}^m \mapsto K(\|u - v\|_{\mathcal{H}^m}).$$
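With scikit-learn, $K^{\tau_d} = K \circ Q_{\tau_d}$ can be passed directly as a callable kernel (a sketch reusing `curves`, `labels` and `Q_tau` from above; `rbf_kernel` plays the role of the usual multidimensional kernel $K$):

```python
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def K_tau(A, B):
    # K^{tau_d} = K o Q_{tau_d}: a usual RBF kernel applied after the Q transform
    return rbf_kernel(A @ Q_tau.T, B @ Q_tau.T)

svc = SVC(kernel=K_tau).fit(curves, labels)
print(svc.score(curves, labels))
```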

Corollary: consistency of kernel-based methods for classification

Derivative-based SVM consistency. Suppose that assumptions (A1)-(A5) are fulfilled. Suppose also that:

(A6) for all $d$, the kernel $K$ is universal on any compact subset of $\mathbb{R}^{|\tau_d|}$ and has a covering number of the form $\mathcal{N}(K, \epsilon) = O(\epsilon^{-\alpha_d})$ for some $\alpha_d > 0$ on this compact subset, and the sequence of regularization parameters $C \equiv (C_n^d)$ is such that, for each $d$, $\lim_{n \to +\infty} n C_n^d = +\infty$ and $C_n^d = O(n^{\beta_d - 1})$ for some $0 < \beta_d < \frac{1}{\alpha_d}$;

(A7) for all $d$, there is a bounded subset $B_d$ of $\mathbb{R}^{|\tau_d|}$ such that $X^{\tau_d}$ belongs to $B_d$.

Then the SVM classifier is universally consistent:

$$\lim_{d \to +\infty} \lim_{n \to +\infty} E(\varphi_{n,\tau_d}) = L^*.$$

Corollary: consistency of kernel-based methods for regression

Derivative-based kernel ridge regression consistency. Suppose that assumptions (A1)-(A5) are fulfilled. Suppose also that:

(A6) for all $d$, the kernel $K$ is universal on any compact subset of $\mathbb{R}^{|\tau_d|}$, and the sequence of regularization parameters $C \equiv (C_n^d)$ is such that, for each $d$, $\lim_{n \to +\infty} n C_n^d = +\infty$ and $\lim_{n \to +\infty} n (C_n^d)^{4/3} = 0$;

(A7) for all $d$, there is a bounded subset $B_d$ of $\mathbb{R}^{|\tau_d|}$ such that $X^{\tau_d}$ belongs to $B_d$.

Then kernel ridge regression is universally consistent:

$$\lim_{d \to +\infty} \lim_{n \to +\infty} E(\varphi_{n,\tau_d}) = L^*.$$

Linear regression with noisy covariates

Let's finally come back to the linear model

$$Y = \langle X, a \rangle_{L^2([0,1])} + \epsilon$$

with $Y$ a real random variable, $X$ a random variable taking its values in $L^2([0,1])$ and $\epsilon$ a centered real random variable independent of $X$.

But in the case of "noisy covariates", we do not observe the pair $(X, Y)$ but $(w_i(t_1), \ldots, w_i(t_p), y_i)_{i=1,\ldots,n}$ with

$$w_i(t_j) = x_i(t_j) + \delta_{i,j}$$

where $(x_i, y_i)_i$ are (not necessarily independent) copies of $(X, Y)$ and $(\delta_{i,j})_{i,j}$ is a sequence of independent, centered real random variables with finite fourth moment, supposed to be independent of $X$ and $\epsilon$.

In the following, we assume that the observations are centered, to keep the notations simple.

Spline estimators

In the case where the $x_i$ are known, a spline estimate of $a$ would be:

$$\hat{a}_n := \arg\min_{h \in \mathcal{H}^m} \frac{1}{n} \sum_{i=1}^n \left( y_i - \frac{1}{p} \sum_{j=1}^p h(t_j)\, x_i(t_j) \right)^2 + \rho\, \|h\|^2_{\mathcal{H}^m}.$$

The solution is given by

$$\hat{a}_n = \frac{1}{n} \left( \frac{1}{np} \mathbf{X}^T \mathbf{X} + \rho A_m \right)^{-1} \mathbf{X}^T Y$$

where

- $\mathbf{X} = (x_i(t_j))_{i=1,\ldots,n,\ j=1,\ldots,p}$;
- $Y = (y_1, \ldots, y_n)^T$;
- $A_m$ is the matrix that defines the $\mathcal{H}^m$-norm from the discrete sampling at $(t_j)_j$.

In the case where the $x_i$ are only known through noisy covariates, a spline estimate of $a$ can be obtained through the Total Least Squares approach [Cardot et al., 2007]:

$$\hat{a}_n := \arg\min_{h \in \mathcal{H}^m,\ (x_{i,j})_{i,j}} \frac{1}{n} \sum_{i=1}^n \left[ \left( y_i - \frac{1}{p} \sum_{j=1}^p h(t_j)\, x_{i,j} \right)^2 + \frac{1}{p} \sum_{j=1}^p \left( x_{i,j} - w_i(t_j) \right)^2 \right] + \rho\, \|h\|^2_{\mathcal{H}^m}.$$

The solution is given by

$$\hat{a}_n = \frac{1}{n} \left( \frac{1}{np} \mathbf{X}^T \mathbf{X} + \rho A_m - p \sigma_k^2 I_p \right)^{-1} \mathbf{X}^T Y$$

where $\sigma_k$ is replaced, in practice, by $\hat{\sigma}_\delta / p$, with $\hat{\sigma}_\delta$ an estimate of the standard deviation of $\delta$.
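A NumPy sketch of the closed-form TLS solution above (my transcription; `A_m` could be taken as the matrix $M_\tau$ from the earlier sketch, which defines the $\mathcal{H}^m$ inner product from the sampled values, and `sigma_k` would in practice be an estimated $\hat{\sigma}_\delta / p$; the call names are hypothetical):

```python
import numpy as np

def tls_spline_estimate(X, y, A_m, rho, sigma_k):
    """a_n = (1/n) ((1/(np)) X^T X + rho A_m - p sigma_k^2 I_p)^{-1} X^T y,
    where X holds the noisy observations w_i(t_j); the returned p-vector is
    read as the sampled values of the coefficient function a at the t_j."""
    n, p = X.shape
    G = X.T @ X / (n * p) + rho * A_m - p * sigma_k**2 * np.eye(p)
    return np.linalg.solve(G, X.T @ y) / n

# Illustrative call (shapes must agree: X is n x p, A_m is p x p), e.g.
# a_hat = tls_spline_estimate(X_obs, y_obs, A_m=M_tau, rho=1e-2, sigma_k=sigma_hat / p)
```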

Assumptions for convergence of $\hat{a}_n$

- $a$ belongs to $\mathcal{H}^m$;
- there exists a constant $\kappa$, $0 < \kappa < 1$, such that, for every $\delta > 0$, there is a $C$ with $P\left( |X(t) - X(s)| \leq C |t - s|^\kappa,\ s, t \in [0,1] \right) \geq 1 - \delta$;
- there exist $E \in \mathbb{R}$ and, for all $k \in \mathbb{N}$, a $k$-dimensional subspace $L_k$ of $L^2$ with $E\left( \inf_{h \in L_k} \sup_t |X(t) - h(t)|^2 \right) \leq E\, k^{-2q}$;
- there is a constant $F$ such that $\mathrm{Var}\left( \langle \Gamma_n^X \zeta_s, \zeta_t \rangle_{L^2} \right) \leq \frac{F}{n}\, E\left( \langle X - E(X), \zeta_s \rangle^2_{L^2} \right) E\left( \langle X - E(X), \zeta_t \rangle^2_{L^2} \right)$;
- for each $\delta > 0$, there is a $D$ with $P\left( \frac{1}{\sqrt{p}} \left\| \frac{1}{np} \mathbf{X}^T \mathbf{X}\, a \right\|_{\mathbb{R}^p} > D \right) \geq 1 - \delta$;
- $n p^{-2\kappa} = O(1)$, $\lim_{n,p \to +\infty} \rho = 0$ and $\lim_{n,p \to +\infty} \frac{1}{n\rho} = 0$.

Convergence of $\hat{a}_n$

Theorem [Crambes et al., 2008]. Under the previous assumptions,

$$\left\| \hat{a}_n - a \right\|_{\Gamma_X} = O_P\left( \frac{1}{np\rho} + \frac{1}{n} + n^{-(2q+1)/2} \right).$$

Application to prediction of ozone

The data is a time series of the daily maximum ozone concentration in Toulouse (France).

4 References

References

Further details for the references are given in the accompanying document.

Berlinet, A. and Thomas-Agnan, C. (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publisher.

Cardot, H., Crambes, C., Kneip, A., and Sarda, P. (2007). Smoothing splines estimators in functional linear regression with errors-in-variables. Computational Statistics and Data Analysis, 51:4832-4848.

Christmann, A. and Steinwart, I. (2007). Consistency and robustness of kernel-based regression in convex risk minimization. Bernoulli, 13(3):799-819.

Crambes, C., Kneip, A., and Sarda, P. (2008). Smoothing splines estimators for functional linear regression. Annals of Statistics.

Kimeldorf, G. and Wahba, G. (1971). Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33(1):82-95.

Ragozin, D. (1983). Error bounds for derivative estimation based on spline smoothing of exact or noisy data. Journal of Approximation Theory, 37:335-355.

Rossi, F. and Villa, N. (2008). Classification and regression based on derivatives: a consistency result applied to functional kernel based classification and regression. Work in progress.

Steinwart, I. (2002). Support vector machines are universally consistent. Journal of Complexity, 18:768-791.