25
1 Functional Data Analysis in Matlab and R James Ramsay, Professor, McGill U., Montreal Hadley Wickham, Grad student, Iowa State, Ames, IA Spencer Graves, Statistician, PDF Solutions, San José, CA

fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

Embed Size (px)

Citation preview

Page 1: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

1

Functional Data Analysis in Matlab and R

James Ramsay, Professor, McGill U., Montreal Hadley Wickham, Grad student, Iowa State, Ames, IA

Spencer Graves, Statistician, PDF Solutions, San José, CA

Page 2: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

2

Outline • What is Functional Data Analysis? • FDA and Differential Equations • Examples:

– Squid Neurons– Continuously Stirred Tank Reactor (CSTR)

• Conclusions • References

Page 3: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

3

What is FDA? • Functional data analysis is a collection of

techniques to model data from dynamic systems – possibly governed by differential equations – in terms of some set of basis functions

• The ‘fda’ package supports the use of 8 different types of basis functions: constant, monomial, polynomial, polygonal, B-splines, power, exponential, and Fourier.

Page 4: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

4

Observations of different lengths • Observation vectors of different lengths

can be mapped to coordinates of a fixed basis set

• All examples in the ‘fda’ package have the same numbers of observations

• No conceptual obstacles to handling observation vectors of different lengths

Page 5: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

5

Time Warping • “start” and “stop” are sometimes

determined by certain transitions• Example: growth spurts in the life cycle of

various species do not occur at exactly the same ages in different individuals (even within the same species)

Page 6: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

6

10 Girls: Berkeley Growth Study• Tuddenham, R. D.,

and Snyder, M. M. (1954) "Physical growth of California boys and girls from birth to age 18", _University of California Publications in Child Development_, 1, 183-364.

ooooo

oo

oo

ooooooo

ooooooooooooooo

5 10 15

8010

012

014

016

018

0

age

Hei

ght (

cm.)

ooooo

oo

oo

oooooooooooooooooooooo

ooooo

o

oo

oo

ooooooooooooooooooooo

oooo

oo

oo

oo

ooooooooooooooooooooo

oooo

oo

oo

oo

ooooooooooooooooooooo

ooooo

oo

oo

oooooooooooooooooooooo

ooo

ooo

oo

oo

ooooooooooooooooooooo

oooo

oo

oo

oo

ooooo

oooooooooooooooo

oooo

oo

oo

oo

ooooooooooooooooooooo

oooo

oo

oo

oo

ooooooooooooooooooooo

Page 7: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

7

Acceleration • Growth spurts

occur at different ages

• Average shows the basic trend, but features are damped by improper registration

ooooo

oo

oo

ooooooo

ooooooooooooooo

5 10 1580

100

120

140

160

180

age

Hei

ght (

cm.)

ooooo

oo

oo

oooooooooooooooooooooo

ooooo

o

oo

oo

ooooooooooooooooooooo

oooo

oo

oo

oo

ooooooooooooooooooooo

oooo

oo

oo

oo

ooooooooooooooooooooo

ooooo

oo

oo

oooooooooooooooooooooo

ooo

ooo

oo

oo

ooooooooooooooooooooo

oooo

oo

oo

oo

ooooo

oooooooooooooooo

oooo

oo

oo

oo

ooooooooooooooooooooo

oooo

oo

oo

oo

ooooooooooooooooooooo

5 10 15

-4-3

-2-1

01

2

age

Gro

wth

acc

eler

atio

n (c

m/y

ear^

2)

Page 8: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

8

Registration • register.fd all

to the mean

• Not perfect, but better

5 10 15

-4-3

-2-1

01

ageG

row

th a

ccel

erat

ion

(cm

/yea

r^2)

5 10 15

-4-3

-2-1

01

2

Gro

wth

acc

eler

atio

n (c

m/y

r^2)

Page 9: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

9

A Stroll Along the Beach

• Light intensity over 365 days at each of 190*143 = 27140 pixels was – smoothed – functional principal components

• http://www.stat.berkeley.edu/~wickham/userposter.pdf

Page 10: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

10

Other fdacapabilities

• Correlations – even with

series of different lengths!

• Phase plane plots – good

estimates of derivatives

Month

Mea

n Te

mpe

ratu

re

Jan Apr Jun Sep Dec

-10

05

15

j F

m

A

MJ

J AS

O

N

D

Montreal average daily tempdeviation from average (C)

-10 -5 0 5 10 15 20

-0.0

060.

000

0.00

6

Temperature (C)

Acce

lera

tion

jF

m

A

M JJA

S

O

N

D

j

Montreal average daily tempdeviation from average (C)

afda-ch03.Rfda-ch01.Rfda-ch02.R

Page 11: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

11

Script files for fda books • Ramsay and Silverman

– (2002) Applied Functional Data Analysis (Springer)

– (2006) Functional Data Analysis, 2nd ed. (Springer)

• ~R\library\fda\scripts– Some but not all data sets discussed in the

books are in the ‘fda’ package – Script files are available to reproduce some but

not all of the analyses in the books. – plus CSTR demo

Page 12: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

12

FDA and Differential Equations• Many dynamic systems are believed to

follow processes where output changes are a function of the outputs, x, and inputs, u(and unknown parameters θ):

( ) ( ) [ ]Tttt ,0,|, ∈= θux,fx&

• Matlab was designed in part for these types of models

Page 13: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

13

Squid Neurons • FitzHugh (1961) - Nagumo et al. (1962) Equations:

Estimate a, b and c in: ( )[ ]( ) cbRaVR

RVVcV+−−=

+−=&

& 33

Vol

tage

acr

oss

Axo

n M

embr

ane

Rec

over

y vi

a O

utw

ard

Cur

rent

s

V

R

Page 14: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

14

Tank Reactions • Continuously Stirred Tank Reactor (CSTR)

Tem

pera

ture

C

once

ntra

tion

Page 15: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

15

Functional Data Analysis Process1. Select Basis Set 2. Select Smoothing Operator

– e.g., differential equation– equivalent to a Bayesian prior over coefficients

to estimate3. Estimate coefficients to optimize some

objective function 4. Model criticism, residual plots, etc. 5. Hypothesis testing

Page 16: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

16

Inputs to Tank Reaction Simulation

Page 17: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

17

( )( ) ( ) ( )

( ) ( ){ }( ) ( )( ) ( )

( ) ( )ba

aFFaFF

FTFTFFFF

FTTFT

TFTFCFTTFFdtdTCFCFTdtdC

bbCCTC

TT

CC

TCTT

CC

,,,:parameters 42

,130,,

1110exp,

,,,

co

co

1 coco

inin

incoinco

inref4

in

cocoininininco

ininin

τκα

ββαβ

τκβ

αβββ

+=

=+=

+−−=

+++−=+−=

+

Computations: Nonlinear ODE

• Compute Input vectors • Define functions • Call differential

equation solver • Summarize, plot Te

mpe

ratu

re

Con

cent

ratio

n

estimate parameters (κ, τ, a, b)

Page 18: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

18

Three problems • Estimate (κ, τ, a, b) to minimize SSE in

Temperature only

function SSE SSE-minMatlab lsqnonlin 5.09888 0.00236R nls 5.09652 0

optim Nelder-Mead 5.09652 0BFGS 5.09652 0CG 5.09900 0.00248SANN 5.17504 0.07852

nlminb 5.09652 0

Page 19: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

19

0 10 20 30 40 50 601.2

1.4

1.6

C(t)

Concentration (red = true, blue = estimated)

0 10 20 30 40 50 60330

340

350

360

T(t)

Temperature

SSE(Temp, Conc)

• Matlab: lsqnonlin • R: nls

0 10 20 30 40 50 60

1.2

1.4

1.6

1.8

Concentration (red = true, blue = estimate)

C(t)

0 10 20 30 40 50 60

330

340

350

360

TemperatureC

(t)

Matlab RConcentration 1.149E-03 1.145E-03Temperature 2.640E-04 2.636E-04

Median absolute relative error

Page 20: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

20

R vs. Matlab• Gave comparable answers • R code for CSTR slightly more accurate but

requires much more compute time – coded by different people

• R has helper functions not so easily replicated in Matlab– summary.nls– confint.nls– profile.nls

Estimate StdErr t Pr(>|t|) kref 0.466 0.004 113.0 < 2e-16 ***EoverR 0.840 0.009 94.7 < 2e-16 ***a 1.720 0.232 7.4 8.2e-13 ***b 0.496 0.050 10.0 < 2e-16 ***

Page 21: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

21

confint.nls• Likelihood-based confidence intervals:

generally more accurate than Wald intervals – Wald subject to parameter effects curvature – Likelihood: only affected by intrinsic curvature

> confintNlsFit2.5% 97.5%

kref 0.458 0.474EoverR 0.823 0.858a 1.300 2.222b 0.401 0.599

Page 22: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

22

0.455 0.465 0.475

0.0

1.0

2.0

τ

0.82 0.84 0.86

0.0

1.0

2.0

τ

1.2 1.6 2.0 2.4

0.0

1.0

2.0

τ

0.40 0.50 0.60

0.0

1.0

2.0

τ

plot.profile.nls• for a plot

showing the sqrt(log(LR))

0.455 0.465 0.475

0.0

1.0

2.0

τ

0.82 0.84 0.86

0.0

1.0

2.0

τ

1.2 1.6 2.0 2.4

0.0

1.0

2.0

τ

0.40 0.50 0.60

0.0

1.0

2.0

τ

kref EoverR

a b

50

99

80

9590

Page 23: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

23

Conclusions

• R and Matlab give comparable answers • R:nls has helper functions absent from

Matlab:lsqnonlin

• Functional data analysis tools are key for – estimating derivatives and – working with differential operators

Page 24: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

24

References• www.functionaldata.org• Ramsay and Silverman (2006) Functional Data

Analysis, 2nd ed. (Springer) • ________(2002) Applied Functional Data

Analysis (Springer) • Ramsay, J. O., Hooker, G., Cao, J. and

Campbell, D. (2007) Parameter estimation for differential equations: A generalized smoothing approach (with discussion). Journal of the Royal Statistical Society, Series B. To appear.

Page 25: fda in Matlab & R · PDF filein Matlab and R James Ramsay, Professor, McGill U., ... – plus CSTR demo . 12 FDA and Differential Equations • Many dynamic systems are believed to

25

NOT free-knot splines• For this, see

– DierckxSpline package

– Companion to Dierckx, P. (1993). Curve and Surface Fitting with Splines. Oxford Science Publications, New York.

• R package by Sundar Dorai-Raj– links to Fortran code by Dierckx available

from www.netlib.org/dierckx• soon to appear on CRAN