Several nonlinear models and methods for FDA

Nathalie Villa-Vialaneix - [email protected] - http://www.nathalievilla.org
Institut de Mathématiques de Toulouse - IUT de Carcassonne, Université de Perpignan, France
La Havane, September 16th, 2008


Short course on functional data analysis and statistical learning, part 2. CENATAV, Havana, Cuba, September 16th, 2008.



Page 2: Several nonlinear models and methods for FDA

Table of contents

1 Nonparametric kernel

2 Neural networks

3 References


Page 3: Several nonlinear models and methods for FDA

Nonparametric model in FDA

In this section, X is a random variable taking its values in a metric space(X, ‖.‖X) where ‖.‖X denotes a semi-norm (i.e., ‖x‖X = 0; x = 0).

In the following presentation, we are interesting in the followingnonparametric functional model:

Y = Φ(X) + ε

where Y is a real random variable (regression case), X is a functionalrandom variable taking its values in (X, ‖.‖X) and ε is a centered randomvariable, independant from X and Φ is a unknown operator from X to R.We also suppose that we are given a set of n i.i.d. realizations of therandom pair (X ,Y):

(xi , yi), (x2, y2), . . . , (xn, yn).

From this training set, we aim at building an estimate, Φn, of Φ suchthat Φn converge to the true Φ when n tends to infinity, in a sense thatwill be developed later.



Page 7: Several nonlinear models and methods for FDA

Nadaraya-Watson kernel estimate [Nadaraya, 1964, Watson, 1964]

Returning to the real case (i.e., X ∈ R), the Nadaraya-Watson kernel estimate is the regression function

Φn : x ∈ R ↦ [ ∑_{i=1}^n yi K((x − xi)/h) ] / [ ∑_{i=1}^n K((x − xi)/h) ]

where

- K is the so-called kernel, i.e., K : R → R is a bounded, integrable function. Additionally, K is often nonnegative and null outside a compact subset of R.
- h is the smoothing parameter: it controls the smoothness of the estimate Φn.

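The estimate above can be sketched in a few lines of Python (a minimal illustration, not code from the course; the Gaussian kernel, the bandwidth and the toy data are arbitrary choices):

```python
import numpy as np

def nw_estimate(x, xs, ys, h, K=lambda u: np.exp(-u**2)):
    """Nadaraya-Watson estimate Phi_n(x): kernel-weighted mean of the y_i."""
    w = K((x - xs) / h)               # weights K((x - x_i) / h)
    return np.sum(w * ys) / np.sum(w)

# Toy data: noisy observations of y = sin(x) on [0, 2*pi]
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 2 * np.pi, 200)
ys = np.sin(xs) + 0.1 * rng.standard_normal(200)

yhat = nw_estimate(np.pi / 2, xs, ys, h=0.3)   # close to sin(pi/2) = 1
```

A small h makes the estimate follow the data closely (rough), a large h averages over many observations (smooth), matching the role of the smoothing parameter described above.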


Page 10: Several nonlinear models and methods for FDA

Remark on the parameters

Two parameters of the estimator Φn have to be set:1 The kernel K : its choice does not have much influence on the

accuracy of Φn. Several common choices for K are:the uniform kernel: K : x ∈ R → I[−1,1](x) (“moving averageestimate”),

the Epanechnikov kernel: K : x ∈ R → (1 − u2)I[−1,1](x),the Gaussian kernel: K : x ∈ R → e−u2

.. . .

2 The smoothing parameter h: it is of a main importance to obtain agood approximation.Several methods have been proposed to choose it, such as crossvalidation strategies, to name a few.

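One of those cross-validation strategies, leave-one-out CV, can be sketched as follows (illustrative Python; the function names, the Gaussian kernel and the bandwidth grid are our choices, not course material):

```python
import numpy as np

def loo_cv_score(xs, ys, h, K=lambda u: np.exp(-u**2)):
    """Leave-one-out CV error of the Nadaraya-Watson estimate for bandwidth h."""
    n = len(xs)
    err = 0.0
    for i in range(n):
        mask = np.arange(n) != i                     # drop observation i
        w = K((xs[i] - xs[mask]) / h)
        err += (ys[i] - np.sum(w * ys[mask]) / np.sum(w)) ** 2
    return err / n

def select_bandwidth(xs, ys, grid):
    """Pick the bandwidth of the grid minimizing the LOO-CV error."""
    return min(grid, key=lambda h: loo_cv_score(xs, ys, h))

rng = np.random.default_rng(1)
xs = np.sort(rng.uniform(0.0, 2 * np.pi, 100))
ys = np.sin(xs) + 0.2 * rng.standard_normal(100)
h_star = select_bandwidth(xs, ys, grid=[0.05, 0.1, 0.2, 0.5, 1.0, 2.0])
```

Too large a bandwidth oversmooths (high bias), too small a bandwidth undersmooths (high variance); LOO-CV balances the two.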


Page 16: Several nonlinear models and methods for FDA

Generalization of the N.W. kernel to FDA [Ferraty and Vieu, 2002, Ferraty and Vieu, 2000]

When X takes its values in the Hilbert space (X, 〈., .〉X), this estimate can be generalized by

Φn : x ∈ X ↦ ∑_{i=1}^n wi(x) yi

where

wi(x) = K(‖xi − x‖X / h) / ∑_{k=1}^n K(‖xk − x‖X / h)

and K and h are defined as previously (i.e., as in the real case).

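A direct Python transcription, with curves stored as vectors of discretized values and the chosen (semi-)norm passed as a function (all names and the toy functional data are illustrative):

```python
import numpy as np

def fnw_estimate(x, xs, ys, h, norm, K=lambda u: np.exp(-u**2)):
    """Functional N.W. estimate: `xs` is a list of discretized curves,
    `norm` computes the chosen (semi-)norm of a curve."""
    d = np.array([norm(xi - x) for xi in xs])   # ||x_i - x||_X
    w = K(d / h)
    return np.sum(w * ys) / np.sum(w)

# Toy functional data: curves x_i(t) = a_i sin(t), responses y_i = a_i^2
t = np.linspace(0.0, 2 * np.pi, 100)
l2 = lambda f: np.sqrt(np.sum(f**2) * (t[1] - t[0]))   # discretized L2 norm
amps = np.linspace(0.5, 2.0, 30)
xs = [a * np.sin(t) for a in amps]
ys = amps**2

yhat = fnw_estimate(np.sin(t), xs, ys, h=0.5, norm=l2)   # close to 1.0
```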

Page 17: Several nonlinear models and methods for FDA

Basic assumption about the fractal dimension

The main assumption for theoretical results on the convergence of the N.W. kernel estimate in the Hilbertian case is:

(Afd1) Fractal dimension assumption (also called "small balls assumption"):

lim_{α→0+} P(X ∈ B(x, α)) / α^{a(x)} = c(x)

where a(x) and c(x) are positive real numbers and B(x, α) := {u ∈ X : ‖u − x‖X ≤ α}.

a(x) is named the fractal dimension of the probability distribution of X.

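As a sanity check of (Afd1), in the finite-dimensional case X = R^d with X admitting a density f that is continuous and positive at x, the assumption holds with a(x) equal to the dimension d:

```latex
% X with density f on R^d, f continuous and positive at x;
% V_d = volume of the unit ball of R^d. Then, as alpha -> 0+:
\mathbb{P}\big(X \in B(x,\alpha)\big)
   = \int_{B(x,\alpha)} f(u)\, du
   = f(x)\, V_d\, \alpha^{d} + o(\alpha^{d}),
\qquad \text{so that } a(x) = d \quad \text{and} \quad c(x) = f(x)\, V_d .
```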


Page 19: Several nonlinear models and methods for FDA

Assumptions for pointwise convergence

(A1) lim_{n→+∞} hn = 0 and lim_{n→+∞} n hn^{a(x)} / log n = +∞

(A2) K is bounded and null except on a compact subset of R+; moreover, K satisfies

∀ t, t′ ∈ R+, |K(t) − K(t′)| ≤ |t − t′|

(A3) Y is bounded

(A4) Φ is continuous at x ∈ X


Page 20: Several nonlinear models and methods for FDA

Pointwise convergence

Theorem [Ferraty and Vieu, 2000, Ferraty and Vieu, 2002]

Under assumptions (A1)-(A4) and assumption (Afd1),

lim_{n→+∞} Φn(x) = Φ(x).


Page 21: Several nonlinear models and methods for FDA

Assumptions for optimal rate of pointwise convergence

(Afd2) P(X ∈ B(x, α)) = α^{a(x)} c(x) + O_P(α^{a(x)+b(x)})

(A4′) There exist B > 0, C > 0 and β > 0 such that, for all u and v in B(x, B),

|Φ(u) − Φ(v)| ≤ C ‖u − v‖X^β

(A1′) hn = (log n / n)^{1/(2γ(x)+a(x))}, where γ(x) = min{b(x), β}, and lim_{n→+∞} log n / (n hn^{a(x)}) = 0


Page 22: Several nonlinear models and methods for FDA

Rate of convergence for pointwise convergence

Theorem [Ferraty and Vieu, 2000, Ferraty and Vieu, 2002]

Under assumptions (A1’), (A2), (A3), (A4’) and assumption (Afd2),

Φn(x) − Φ(x) = O( (log n / n)^{γ(x)/(2γ(x)+a(x))} )


Page 23: Several nonlinear models and methods for FDA

Assumptions for uniform convergence on a compact subset C of X

(Afd3) lim_{α→0+} P(X ∈ B(x, α)) / α^{a(x)} = c(x), uniformly in x ∈ C, where inf_{x∈C} c(x) > 0

(A5) The covering number of C, N(C, l) (i.e., the minimum number of balls of radius l that are needed to cover C), is such that there exist α > 0 and C′ > 0 with N(C, l) = C′ l^{−α}

(A4″) Φ is continuous on C


Page 24: Several nonlinear models and methods for FDA

Uniform convergence

Theorem [Ferraty and Vieu, 2004, Ferraty and Vieu, 2008]

Under assumptions (A1)-(A3), (A4”), (A5) and assumption (Afd3),

Φn(x) − Φ(x) = O( (log n / n)^{γ(x)/(2γ(x)+a(x))} ), uniformly in x ∈ C.


Page 25: Several nonlinear models and methods for FDA

Note on possible choices for the semi-norm

1. PCA semi-norm: suppose that X is a square integrable random variable of L² (i.e., E(‖X‖²_{L²}) < +∞). Then its covariance operator ΓX can be written as ∑_{k≥1} λk vk ⊗ vk, where ((λk), (vk))k is the eigensystem of ΓX, the (λk) are in decreasing order and the (vk) are orthonormal vectors of L². This defines a semi-norm on X = L²: for a given K ∈ N*,

∀ x ∈ X, ‖x‖²X = ∑_{k=1}^K 〈vk, x〉²_{L²} = ‖P_{Span{v1,…,vK}}(x)‖²_{L²}.

This semi-norm emphasizes the main directions for the representation of the random variable X.

2. q-th derivative semi-norm: suppose now that X = {h ∈ L² : h^{(q)} exists and is in L²}. Then

∀ x ∈ X, ‖x‖²X = ‖x^{(q)}‖²_{L²}.

This semi-norm is strongly related to RKHS (Sobolev spaces) and splines and will be further investigated in Presentation 4.

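A discretized sketch of the PCA semi-norm, built from the empirical covariance of a sample of curves (all names and the toy curves are illustrative):

```python
import numpy as np

def pca_seminorm(X, K):
    """Return x -> ||x||_X built from the top K eigenvectors of the empirical
    covariance of the sample X (n curves x m grid points). It is only a
    semi-norm: any x orthogonal to v_1, ..., v_K has ||x||_X = 0."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)         # empirical covariance operator
    _, eigvec = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    V = eigvec[:, ::-1][:, :K]       # top K directions v_1, ..., v_K
    return lambda x: np.sqrt(np.sum((V.T @ x) ** 2))

# Toy sample living in span{sin, cos}
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 50)
X = np.array([rng.normal() * np.sin(2 * np.pi * t)
              + rng.normal() * np.cos(2 * np.pi * t) for _ in range(200)])
seminorm = pca_seminorm(X, K=2)
```

Here seminorm applied to sin(2πt) is large, while a curve (nearly) orthogonal to the first two eigenvectors, such as sin(6πt), gets a value close to zero, illustrating that ‖x‖X = 0 does not imply x = 0.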


Page 30: Several nonlinear models and methods for FDA

Generalization to curve classification [Ferraty and Vieu, 2003, Ferraty and Vieu, 2004]

Suppose that X ∈ X but also that Y ∈ {1, 2, …, G}. Then a classification rule is given by:

∀ x ∈ X, g(x) = arg max_{g=1,…,G} P(Y = g | X = x).

This rule needs an estimate of the probability P(Y = g | X = x):

Pn(Y = g | X = x) := ∑_{i=1}^n wi(x) I_{[yi = g]}

where wi(x) = K(‖x − xi‖X / h) / ∑_{l=1}^n K(‖x − xl‖X / h).

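This rule can be sketched by reusing the functional kernel weights (illustrative Python; the L² norm, the toy classes and all names are our choices):

```python
import numpy as np

def fnw_classify(x, xs, ys, h, norm, K=lambda u: np.exp(-u**2)):
    """Estimate P(Y = g | X = x) by the kernel-weighted frequency of
    class g, then apply the arg-max rule."""
    d = np.array([norm(xi - x) for xi in xs])
    w = K(d / h)
    w = w / np.sum(w)
    classes = np.unique(ys)
    probs = {g: np.sum(w[ys == g]) for g in classes}  # P_n(Y = g | X = x)
    return max(probs, key=probs.get)

t = np.linspace(0.0, 1.0, 60)
l2 = lambda f: np.sqrt(np.sum(f**2) * (t[1] - t[0]))

# Two classes of noisy curves: sines (class 1) and cosines (class 2)
rng = np.random.default_rng(3)
xs = ([np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(60) for _ in range(20)]
      + [np.cos(2 * np.pi * t) + 0.3 * rng.standard_normal(60) for _ in range(20)])
ys = np.array([1] * 20 + [2] * 20)

pred = fnw_classify(np.sin(2 * np.pi * t), xs, ys, h=0.5, norm=l2)  # class 1
```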


Page 33: Several nonlinear models and methods for FDA

Example of nonparametric curve classification with PCA semi-norm

Problem: discriminating 5 phonemes from their log-periodograms.

Competitor methods:
- Ridge PDA (i.e., Principal Discriminant Analysis penalized by the norm),
- Partial Least Squares with the L² norm (denoted by MPLSR),
- Partial Least Squares with the PCA semi-norm (denoted by NPCD/MPLSR),
- nonparametric kernel estimator with the PCA semi-norm (denoted by NPCD/PCA).



Page 35: Several nonlinear models and methods for FDA

Obtained result


Page 36: Several nonlinear models and methods for FDA

Generalization to time series [Ferraty and Vieu, 2004]

Problem and notations: given a time series (Z(t))_{t∈R}, one is often interested, knowing {Z(t), t ∈ [Tmax − T, Tmax]}, in predicting Z(Tmax + τ).

Denoting

X = {Z(t), t ∈ [Tmax − T, Tmax]},  Y = Z(Tmax + τ),

we can see that this problem is strongly related to a functional regression model. The observations are given by

xi = {z(t), t ∈ [(i − 1)T, iT]},  yi = z(iT + τ)

for i = 1, …, n, but they are not independent.

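The cutting of a sampled series into (xi, yi) pairs can be sketched as follows (a discrete-grid analogue; the function name and the toy series are our choices):

```python
import numpy as np

def embed_series(z, T, tau):
    """Cut a sampled series z into pairs: x_i = z[(i-1)T : iT] (the
    functional 'past') and y_i = z[iT + tau - 1] (the value tau steps
    later), the discrete analogue of x_i = {z(t), t in [(i-1)T, iT]},
    y_i = z(iT + tau)."""
    xs, ys = [], []
    i = 1
    while i * T + tau - 1 < len(z):
        xs.append(z[(i - 1) * T: i * T])
        ys.append(z[i * T + tau - 1])
        i += 1
    return np.array(xs), np.array(ys)

z = np.sin(np.linspace(0.0, 20 * np.pi, 1000))  # toy series
xs, ys = embed_series(z, T=50, tau=10)          # consecutive, hence dependent
```

These pairs can then be fed to a functional kernel estimate, but since they are not independent, the theory has to account for their dependence, hence the mixing assumptions.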


Page 39: Several nonlinear models and methods for FDA

Mixing assumptions

Under mixing assumptions on ((xi, yi))i and other assumptions, the same convergence results hold. More precisely, suppose that, for

α(n) = sup_{k∈Z, A∈σ^k_{−∞}, B∈σ^{+∞}_{k+n}} |P(A ∩ B) − P(A)P(B)|

where σ^l_k = σ({(xi, yi) : k ≤ i ≤ l}), we have

lim_{n→+∞} α(n) = 0.

α is named the mixing coefficient.



Page 42: Several nonlinear models and methods for FDA

Table of contents

1 Nonparametric kernel

2 Neural networks

3 References


Page 43: Several nonlinear models and methods for FDA

Biological neural networks


[Figure: a neuron computes the sum ∑ of its weighted inputs; if ∑ > activation threshold the neuron fires, if ∑ < activation threshold it does not.]

Page 48: Several nonlinear models and methods for FDA

Mathematical multilayer perceptrons: multidimensional case

[Figure: the inputs (Variable 1, Variable 2, …, Variable d) are multiplied by weights and summed; each sum is passed through an activation function G (hidden layer, units "∑ + G"); the hidden-layer outputs are in turn multiplied by weights, summed and transformed ("∑ + f") to produce the outputs.]

Page 52: Several nonlinear models and methods for FDA

In summary, multilayer perceptrons with 1 hidden layer are a class of functions of the form:

φ^p_w : x ∈ R^d ↦ ∑_{k=1}^p w_k^{(2)} G(w_k^{(0)} + (w_k^{(1)})^T x)

where

- G is given and called the activation function. Popular examples of activation functions are the linear and the sigmoid activation functions.
- p is also given and called the number of neurons on the hidden layer. p is a main parameter to obtain a good generalization ability.
- For all k = 1, …, p, w_k^{(2)} ∈ R, w_k^{(0)} ∈ R and w_k^{(1)} ∈ R^d are the weights that have to be set from the learning data set. More precisely, given ((xi, yi))_{i=1,…,n}, n i.i.d. realizations of the random pair (X, Y) taking its values in R^d × R, an estimate of Φ is given by Φn = φ_{w*n}, where w*n is a solution of:

w*n = arg min_{w ∈ (R×R×R^d)^p} ∑_{i=1}^n (yi − φw(xi))².

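A minimal numpy sketch of such a perceptron, fitted by plain gradient descent on the squared error (gradient descent is only one common way to search for w*, not necessarily the course's choice; all names and the toy data are ours):

```python
import numpy as np

def mlp(x, W1, b1, w2, G=np.tanh):
    """phi_w(x) = sum_k w2[k] * G(b1[k] + W1[k] . x) for a single input x."""
    return w2 @ G(b1 + W1 @ x)

def fit_mlp(xs, ys, p, steps=4000, lr=0.1, seed=0):
    """Fit the weights by full-batch gradient descent on the squared error."""
    rng = np.random.default_rng(seed)
    d = xs.shape[1]
    W1 = rng.normal(size=(p, d))
    b1 = rng.normal(size=p)
    w2 = rng.normal(scale=0.5, size=p)
    for _ in range(steps):
        H = np.tanh(b1 + xs @ W1.T)         # n x p hidden activations
        r = H @ w2 - ys                     # residuals phi_w(x_i) - y_i
        gH = np.outer(r, w2) * (1 - H**2)   # backprop through tanh
        w2 -= lr * (H.T @ r) / len(xs)
        b1 -= lr * gH.mean(axis=0)
        W1 -= lr * (gH.T @ xs) / len(xs)
    return W1, b1, w2

# Toy regression: y = sin(3x), x in [-1, 1], p = 8 hidden neurons
xs = np.linspace(-1.0, 1.0, 100)[:, None]
ys = np.sin(3 * xs[:, 0])
W1, b1, w2 = fit_mlp(xs, ys, p=8)
mse = np.mean((np.tanh(b1 + xs @ W1.T) @ w2 - ys) ** 2)
```

The number of hidden neurons p plays the same role as the bandwidth h for the kernel estimate: too small underfits, too large can overfit the training set.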

Page 53: Several nonlinear models and methods for FDA

In summary, multilayer perceptrons with 1 hiddenlayer are:

a class of functions of the form:

φpw : x ∈ Rd →

p∑k=1

w(2)k G

(w(0)

k + (w(1)k )T x

)where

G is given and called the activation function.

p is also given and called the number of neurons on the hiddenlayer.For all k = 1, . . . , p, w(2)

k ∈ R, w(0)k ∈ R and w(1)

k ∈ Rd are theweights that have to be set from the learning data set.More precisely, given ((xi , yi))i=1,...,n n i.i.d. random realization of therandom pair (X ,Y) that takes its values in Rd × R, an estimate of Ψ isgiven by: Φn = φw∗n where w∗n is a solution of:

w∗n = arg minw∈(R×R×Rd)p

n∑i=1

(yi − φw(xi))2 .

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 23 / 42

Page 54: Several nonlinear models and methods for FDA

In summary, multilayer perceptrons with 1 hiddenlayer are:

a class of functions of the form:

φpw : x ∈ Rd →

p∑k=1

w(2)k G

(w(0)

k + (w(1)k )T x

)where

G is given and called the activation function. Popular examples ofactivation functions are:

the linear activation function

p is also given and called the number of neurons on the hiddenlayer.For all k = 1, . . . , p, w(2)

k ∈ R, w(0)k ∈ R and w(1)

k ∈ Rd are theweights that have to be set from the learning data set.More precisely, given ((xi , yi))i=1,...,n n i.i.d. random realization of therandom pair (X ,Y) that takes its values in Rd × R, an estimate of Ψ isgiven by: Φn = φw∗n where w∗n is a solution of:

w∗n = arg minw∈(R×R×Rd)p

n∑i=1

(yi − φw(xi))2 .

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 23 / 42

Page 55: Several nonlinear models and methods for FDA

In summary, multilayer perceptrons with 1 hiddenlayer are:

a class of functions of the form:

φpw : x ∈ Rd →

p∑k=1

w(2)k G

(w(0)

k + (w(1)k )T x

)where

G is given and called the activation function. Popular examples ofactivation functions are:

the sigmoid activation function

p is also given and called the number of neurons on the hiddenlayer.For all k = 1, . . . , p, w(2)

k ∈ R, w(0)k ∈ R and w(1)

k ∈ Rd are theweights that have to be set from the learning data set.More precisely, given ((xi , yi))i=1,...,n n i.i.d. random realization of therandom pair (X ,Y) that takes its values in Rd × R, an estimate of Ψ isgiven by: Φn = φw∗n where w∗n is a solution of:

w∗n = arg minw∈(R×R×Rd)p

n∑i=1

(yi − φw(xi))2 .

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 23 / 42


Page 62: Several nonlinear models and methods for FDA

Universal approximation

The popularity of MLPs in the multidimensional context comes from two main properties. The first one is called the universal approximation capability:

Theorem [Hornik, 1991, Hornik, 1993, Stinchcombe, 1999]
If the activation function is continuous and non polynomial, then

{ φ^p_w : w ∈ (R × R × R^d)^p, p ∈ N* }

is dense in the set of continuous functions from R^d to R for the topology induced by the uniform norm on compact sets.

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 24 / 42
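The density statement can be probed informally. The sketch below (an illustration, not Hornik's proof construction) fixes random hidden-layer weights, solves for the output weights w^(2) by least squares, and shows the sup-norm error on a compact set typically shrinking as p grows; the target function, scales and grid are made up for the demo:

```python
import numpy as np

def random_mlp_fit(x, y, p, seed=0):
    """Fix random hidden-layer weights and solve for the output weights
    by least squares -- a crude way to probe how the class {phi^p_w}
    fills out as p grows (not the theorem's construction)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=3.0, size=(x.shape[1], p))
    b1 = rng.normal(scale=3.0, size=p)
    H = np.tanh(x @ W1 + b1)
    w2, *_ = np.linalg.lstsq(H, y, rcond=None)
    return lambda t: np.tanh(t @ W1 + b1) @ w2

x = np.linspace(-1, 1, 400).reshape(-1, 1)
y = (np.exp(x) * np.cos(4 * x)).ravel()   # a continuous target on [-1, 1]
for p in (2, 10, 50):
    f = random_mlp_fit(x, y, p)
    print(p, np.max(np.abs(f(x) - y)))    # sup-norm error on the grid
```

The printed errors typically decrease by orders of magnitude between p = 2 and p = 50, which is the qualitative content of the density result.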

Page 63: Several nonlinear models and methods for FDA

Consistency to optimal weights

The second property deals with the fact that the optimal empirical weights tend to the optimal weights when the size of the training data tends to infinity.

Theorem [White, 1989, White, 1990]
Denote w* = arg min_{w ∈ (R × R × R^d)^p} E( (Y − φ^p_w(X))² ) and

Θ(p, ∆) = { φ^p_w : ∑_{k=1}^{p} |w^(2)_k| ≤ ∆ and ∑_{k=1}^{p} ( |w^(0)_k| + ∑_{l=1}^{d} |w^(1)_kl| ) ≤ ∆p }.

If
1. the activation function G is bounded;
2. lim_{n→+∞} p_n = +∞, lim_{n→+∞} ∆_n = +∞, ∆_n = o(n) and p_n ∆^4_n log(p_n ∆_n) = o(n);
3. w* exists and is unique;
4. w*_n = arg min_{w : φ_w ∈ Θ(p_n, ∆_n)} ∑_{i=1}^{n} (y_i − φ_w(x_i))²;

then lim_{n→+∞} ‖w*_n − w*‖ = 0.

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 25 / 42

Page 64: Several nonlinear models and methods for FDA

Multilayer perceptrons with functional input

Suppose now that (X, Y) is a random pair taking its values in X × R, where (X, ⟨., .⟩_X) is a Hilbert space.

Multilayer perceptrons with 1 hidden layer generalize to functional inputs by:

φ^p_w : x ∈ X ↦ ∑_{k=1}^{p} w^(2)_k G( w^(0)_k + ⟨x, w^(1)_k⟩_X )

where the weights w^(1)_k ∈ X (functional weights).

With relevant representations of the weights (w^(1)_k)_k (rich enough representations), the universal approximation property of this model remains valid.

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 26 / 42


Page 67: Several nonlinear models and methods for FDA

Practical implementation: discrete sampling case

Suppose now that the (x_i)_i are known on a discrete sampling grid t_1, t_2, ..., t_d. Then ⟨w^(1)_k, x_i⟩_X can be approximated by

⟨w^(1)_k, x_i⟩_X ≃ (1/d) ∑_{l=1}^{d} x_i(t_l) w^(1)_k(t_l).

w^(1)_k should be searched in a class of functions F such that, for any set of real numbers (c_l)_{l=1,...,d}, there exists a function w ∈ F for which

∀ l = 1, ..., d, w(t_l) = c_l.

Splines have such a property (see Presentation 4).

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 27 / 42
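The Riemann-sum approximation of the inner product can be checked in a few lines. The grid size and the two functions below are illustrative choices with a closed-form L² inner product on [0, 1]:

```python
import numpy as np

# Check the slide's approximation <w^(1)_k, x_i> ~ (1/d) sum_l x_i(t_l) w^(1)_k(t_l)
# on an illustrative grid and pair of functions.
d = 500
t = np.linspace(0.0, 1.0, d)           # the sampling grid t_1, ..., t_d
x_i = np.sin(2.0 * np.pi * t)          # a sampled functional observation
w_k = t ** 2                           # a weight function evaluated on the grid

approx = np.mean(x_i * w_k)            # (1/d) sum_l x_i(t_l) w_k(t_l)
exact = -1.0 / (2.0 * np.pi)           # integral of t^2 sin(2 pi t) over [0, 1]
print(approx, exact)
```

The approximation error is of order 1/d here, which is why a reasonably fine sampling grid suffices in practice.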


Page 71: Several nonlinear models and methods for FDA

Practical implementation: projection approach

Suppose now that X admits a family of functions (ψ^q_k)_{q ∈ N*, 1 ≤ k ≤ q} such that:

- for all q ∈ N*, (ψ^q_k)_k is an orthonormal system;
- if P_q denotes the projection in X on Span{ψ^q_1, ..., ψ^q_q}, then the pointwise consistency property is satisfied:

∀ x ∈ X, ∀ ε > 0, ∃ Q ∈ N* : ∀ q ≥ Q, ‖P_q(x) − x‖_∞ ≤ ε.

Hence, ⟨w^(1)_k, x_i⟩_X can be approximated by

⟨w^(1)_k, x_i⟩_X ≃ ⟨w^(1)_k, P_q(x_i)⟩_X = ⟨P_q(w^(1)_k), x_i⟩_X = ⟨P_q(w^(1)_k), P_q(x_i)⟩_X.

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 28 / 42
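A minimal sketch of this projection step, assuming an (approximately) orthonormal cosine basis on [0, 1] and a Riemann-sum inner product; the basis, grid, q and the two functions are illustrative choices, not from the slides:

```python
import numpy as np

d, q = 2000, 6
t = np.linspace(0.0, 1.0, d)
# rows: 1, sqrt(2) cos(pi k t) for k = 1, ..., q-1 -- an orthonormal system
basis = np.vstack([np.ones(d)] +
                  [np.sqrt(2.0) * np.cos(np.pi * k * t) for k in range(1, q)])

def coeffs(f):
    """Coordinates of P_q(f) in the basis, via the Riemann-sum inner product."""
    return basis @ f / d

x = np.exp(t)                          # a sampled functional observation
w = np.sin(3.0 * t)                    # a weight function
# <P_q(w), P_q(x)> reduces to a finite sum of coefficient products:
inner_proj = coeffs(w) @ coeffs(x)
inner_full = np.mean(w * x)            # <w, x> before projecting
print(inner_proj, inner_full)
```

For smooth functions the basis coefficients decay quickly, so the truncated inner product is already close to the full one at small q.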


Page 74: Several nonlinear models and methods for FDA

Universal approximation for projection based approach

Theorem [Rossi and Conan-Guez, 2005, Rossi and Conan-Guez, 2006]
If G is continuous and non polynomial, then the set

{ x ∈ X ↦ ∑_{k=1}^{p} w^(2)_k G( w^(0)_k + ∑_{l=1}^{q} β_kl (P_q(x))_l ) }

is dense in the set of all continuous functions defined on X for the uniform norm on any compact subset of X.

Remark 1: A similar result exists for the pointwise consistency approach.
Remark 2: A convergence result for the optimal weights of the functional MLP also exists, but it requires many technical assumptions, so we do not detail it here. See [Rossi and Conan-Guez, 2005] for more details.

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 29 / 42


Page 77: Several nonlinear models and methods for FDA

Functional Inverse Regression

Suppose that we are given a random pair (X, Y) taking its values in X × R, for which we are given n i.i.d. realizations (x_1, y_1), ..., (x_n, y_n).

Model: Moreover, we suppose that (X, Y) satisfies the following model:

Y = Ψ(⟨X, a_1⟩_X, ..., ⟨X, a_q⟩_X, ε)

where ε is a centered real random variable independent of X, Ψ is an unknown function that has to be estimated, and {a_1, ..., a_q} are unknown, linearly independent elements of X that also have to be estimated.

We call the space Span{a_1, ..., a_q} the Effective Dimension Reduction subspace of X, denoted by EDR.

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 30 / 42


Page 80: Several nonlinear models and methods for FDA

Fundamental property of EDR space

Denote A = (⟨X, a_1⟩_X, ..., ⟨X, a_q⟩_X)^T.

Li's condition [Li, 1991]
If

(A-Li)  ∀ u ∈ X, ∃ v ∈ R^q : E(⟨u, X⟩_X | A) = v^T A,

then E(X | Y) ∈ Γ_X(EDR).

⇒ EDR is estimated through the estimation of a_1, ..., a_q, the eigenvectors of Γ_X^{-1} Γ_{E(X|Y)}.
But, as for the functional linear model, Γ_X^{-1} has to be estimated by a penalized or a regularized approach.

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 31 / 42


Page 83: Several nonlinear models and methods for FDA

PCA approach [Ferré and Yao, 2003, Ferré and Yao, 2005]

Note:

- ((λ^n_i, v^n_i))_{i ≥ 1} the eigenvalue decomposition of Γ^n_X (the (λ^n_i)_i are ordered in decreasing order and at most n eigenvalues are nonzero; the (v^n_i)_i are orthonormal);
- k_n an integer such that k_n ≤ n and lim_{n→+∞} k_n = +∞;
- P_{k_n} the projector P_{k_n}(u) = ∑_{i=1}^{k_n} ⟨v^n_i, u⟩_X v^n_i;
- Γ^{n,k_n}_X = P_{k_n} ∘ Γ^n_X ∘ P_{k_n} = ∑_{i=1}^{k_n} λ^n_i ⟨v^n_i, .⟩_X v^n_i;
- if (I_s)_{s=1,...,S} is a partition of the subspace of R where Y takes its values, then we estimate
  - P(Y ∈ I_s) by p^n_s = (1/n) ∑_{i=1}^{n} I{y_i ∈ I_s},
  - E(X | Y ∈ I_s) by µ^n_s = (1/n) ∑_{i=1}^{n} x_i I{y_i ∈ I_s},
  - Γ_{E(X|Y)} by Γ^n_{E(X|Y)} = ∑_{s=1}^{S} p^n_s µ^n_s ⊗ µ^n_s;
- ((α^{k_n}_k, a^n_k))_{k=1,...,q} the eigensystem of (Γ^{n,k_n}_X)^+ Γ^n_{E(X|Y)}.

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 32 / 42


Page 91: Several nonlinear models and methods for FDA

Assumptions for convergence of the estimated EDRspace

(A1) X has 4 th moment;

(A2) the (λk )k≥1 are all distinct and positive;

(A3) if we note b1 = 2√

2λ1−λ2

and bj = 2√

2min(λj−1−λj ,λj−λj+1)

for j ≥ 2,

limn→+∞1√nλkn

∑knj=1 bj = 0 and limn→+∞

1√nλ2

kn

= 0;

(A4) the eigenvalues of Γ−1X ΓE(X |Y) are all distinct.

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 33 / 42

Page 92: Several nonlinear models and methods for FDA

Convergence of the estimated EDR space

Theorem [Ferré and Yao, 2003, Ferré and Yao, 2005]
Under assumption (A-Li) and assumptions (A1)-(A4),

‖a^n_k − a_k‖_X → 0 in probability as n → +∞.

Remark: A similar result for a smoothing approach is described in [Ferré and Villa, 2005, Ferré and Villa, 2006].

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 34 / 42


Page 94: Several nonlinear models and methods for FDA

FIR multilayer perceptrons

The idea of this model, developed in [Ferré and Villa, 2006], is to estimate the function Ψ by a multilayer perceptron.

[Figure: diagram of a one-hidden-layer MLP whose inputs are the projections ⟨X, a_1⟩_X, ⟨X, a_2⟩_X, ..., ⟨X, a_q⟩_X, with first-layer weights w^(1) and biases w^(0), output weights w^(2) and an output bias, producing Y.]

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 35 / 42


Page 96: Several nonlinear models and methods for FDA

Notations

Outputs: ∀ u ∈ R^q,

φ_w(u) = ∑_{k=1}^{p} w^(2)_k G( (w^(1)_k)^T u + w^(0)_k );

For a given error function L (e.g., mean square error), ζ((u, y), w) = L(φ_w(u), y);

Variables: Z = ((⟨X, a_j⟩_X)_{j=1,...,q}, Y) and z^n_i = ((⟨x_i, a_j⟩_X)_{j=1,...,q}, y_i): Z and the (z^n_i)_i take their values in an open subset, O, of R^{q+1};

The weights are chosen in a compact subset, W, of (R × R^q × R)^p:
- theoretical: w* = arg min_{w ∈ W} E(ζ(Z, w)); W* is the set of all such w*;
- empirical: w*_n = arg min_{w ∈ W} ∑_{i=1}^{n} ζ(z^n_i, w).

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 36 / 42


Page 102: Several nonlinear models and methods for FDA

Assumptions for convergence of the optimal empirical weights

(A1) ∀ z ∈ O, ζ(z, .) is continuous;
(A2) ∀ w ∈ W, ζ(., w) is measurable;
(A3) there exists a measurable function ζ̃ from O to R such that, ∀ z ∈ O, ∀ w ∈ W, |ζ(z, w)| < ζ̃(z) and E(ζ̃(Z)) < +∞;
(A4) ∀ w ∈ W, ∃ C(w) > 0 such that, for all (x, y) and (x′, y) in O, |ζ((x, y), w) − ζ((x′, y), w)| ≤ C(w) ‖x − x′‖_{R^q}.

Nathalie Villa (IMT & UPVD) Presentation 2 La Havane, Sept. 16th, 2008 37 / 42


Page 106: Several nonlinear models and methods for FDA

Convergence of the optimal empirical weights

Theorem [Ferré and Villa, 2006]
Under assumption (A-Li), the assumptions ensuring the convergence of the estimated EDR space, and assumptions (A1)-(A4),

d(w∗n, W∗) → 0 in probability as n → +∞,

where d is defined by d(w, W) = inf_{w̃ ∈ W} ‖w − w̃‖ℝ(q+2)p.


Page 107: Several nonlinear models and methods for FDA

Application: Tecator Data Set

Aim: Predicting the fat content of pieces of meat from their infrared spectra.

Comparison of:

PCA and then MLP [Thodberg, 1996];

Functional MLP (with a discrete-sampling-based approach) [Rossi and Conan-Guez, 2005];

smoothing FIR and then MLP [Ferré and Villa, 2006];

projection-based FIR and then MLP [Ferré and Yao, 2003, Ferré and Villa, 2006];

smoothing FIR and then linear model.

Methodology: Repetition of 50 experiments with random training/test sets.
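The evaluation protocol on this slide (repeated random training/test splits, recording the test mean squared error) can be sketched in Python. Since the Tecator spectra are not reproduced here, synthetic smooth curves stand in for them, and the "PCA and then MLP" pipeline is built with scikit-learn; all sizes and hyperparameters below are illustrative, not those of the cited studies:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n, d = 215, 100                       # 215 curves sampled at 100 points (Tecator-like sizes)
t = np.linspace(0, 1, d)
scores = rng.normal(size=(n, 3))      # random coefficients on three smooth basis curves
X = scores @ np.array([np.sin(np.pi * t), np.cos(np.pi * t), t])
y = np.sin(scores[:, 0]) + scores[:, 1] ** 2 + 0.1 * rng.normal(size=n)

errors = []
for seed in range(50):                # 50 random training/test splits, as on the slide
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    model = make_pipeline(PCA(n_components=3),
                          MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000,
                                       random_state=seed))
    model.fit(X_tr, y_tr)
    errors.append(mean_squared_error(y_te, model.predict(X_te)))

print(f"test MSE over 50 splits: {np.mean(errors):.3f} ± {np.std(errors):.3f}")
```

Swapping the `PCA` step for a dimension-reduction step estimated by FIR would reproduce the structure of the other compared pipelines.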


Page 110: Several nonlinear models and methods for FDA

Results of the experiments

[Figure: distribution of test errors for the five compared methods (1-5): PCA-MLP, NN f, SIRr-MLP, SIRp-MLP, SIRr-ML.]

Remark: A similar experiment has been made on the phoneme dataset to compare "FIR-MLP" to the functional nonparametric kernel.
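The functional nonparametric kernel used as the competitor in that comparison is the Nadaraya-Watson estimator Φn(x) = Σi yi K(‖x − xi‖X / h) / Σi K(‖x − xi‖X / h). A minimal sketch, assuming curves observed on a common grid, a Gaussian kernel, the L2 norm of the discretized curves as the semi-metric, and a bandwidth h fixed by hand rather than cross-validated:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 50
t = np.linspace(0, 1, d)
shift = rng.uniform(0, 1, size=n)
X = np.sin(2 * np.pi * (t[None, :] + shift[:, None]))   # n synthetic curves on a common grid
y = np.cos(2 * np.pi * shift) + 0.05 * rng.normal(size=n)

def nw_predict(x_new, X, y, h=0.5):
    # Gaussian kernel on the (discretized) L2 distance between curves
    dists = np.sqrt(np.mean((X - x_new[None, :]) ** 2, axis=1))
    w = np.exp(-0.5 * (dists / h) ** 2)
    return np.sum(w * y) / np.sum(w)

x_new = np.sin(2 * np.pi * (t + 0.25))                  # a new curve to predict at
print(nw_predict(x_new, X, y))
```

The prediction is a convex combination of the observed responses, weighted by how close each training curve is to the new one in the chosen semi-metric.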


Page 112: Several nonlinear models and methods for FDA

Table of contents

1 Nonparametric kernel

2 Neural networks

3 References


Page 113: Several nonlinear models and methods for FDA

References

Further details for the references are given in the accompanying document.

Ferraty, F. and Vieu, P. (2000). Dimension fractale et estimation de la régression dans des espaces vectoriels semi-normés. Comptes Rendus Mathématique. Académie des Sciences. Paris, 330:139-142.

Ferraty, F. and Vieu, P. (2002). The functional nonparametric model and application to spectrometric data. Computational Statistics, 17:515-561.

Ferraty, F. and Vieu, P. (2003). Curves discrimination: a nonparametric approach. Computational Statistics & Data Analysis, 44:161-173.

Ferraty, F. and Vieu, P. (2004). Nonparametric models for functional data, with application in regression, time series prediction and curves discrimination. Journal of Nonparametric Statistics, 16:111-125.

Ferraty, F. and Vieu, P. (2008). Erratum of: 'Non-parametric models for functional data, with application in regression, time-series prediction and curve discrimination'. Journal of Nonparametric Statistics, 20(2):187-189.

Ferré, L. and Villa, N. (2005). Discrimination de courbes par régression inverse fonctionnelle. Revue de Statistique Appliquée, LIII(1):39-57.

Ferré, L. and Villa, N. (2006). Multi-layer perceptron with functional inputs: an inverse regression approach. Scandinavian Journal of Statistics, 33(4):807-823.

Ferré, L. and Yao, A. (2003). Functional sliced inverse regression analysis. Statistics, 37(6):475-488.

Ferré, L. and Yao, A. (2005). Smoothed functional inverse regression. Statistica Sinica, 15(3):665-683.

Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251-257.

Hornik, K. (1993). Some new results on neural network approximation. Neural Networks, 6(8):1069-1072.

Li, K. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86:316-342.

Nadaraya, E. (1964). On estimating regression. Theory of Probability and its Applications, 10:186-196.

Rossi, F. and Conan-Guez, B. (2005). Functional multi-layer perceptron: a nonlinear tool for functional data analysis. Neural Networks, 18(1):45-60.

Rossi, F. and Conan-Guez, B. (2006). Theoretical properties of projection based multilayer perceptrons with functional inputs. Neural Processing Letters, 23(1):55-70.

Stinchcombe, M. (1999). Neural network approximation of continuous functionals and continuous functions on compactifications. Neural Networks, 12(3):467-477.

Thodberg, H. (1996). A review of Bayesian neural networks with an application to near infrared spectroscopy. IEEE Transactions on Neural Networks, 7(1):56-72.

Watson, G. (1964). Smooth regression analysis. Sankhyā, Series A, 26:359-372.

White, H. (1990). Connectionist nonparametric regression: multilayer feedforward networks can learn arbitrary mappings. Neural Networks, 3:535-549.

White, H. (1989). Learning in artificial neural networks: a statistical perspective. Neural Computation, 1:425-464.
