Upload
hoangbao
View
227
Download
3
Embed Size (px)
Citation preview
1
Functional Data Analysis in Matlab and R
James Ramsay, Professor, McGill U., Montreal Hadley Wickham, Grad student, Iowa State, Ames, IA
Spencer Graves, Statistician, PDF Solutions, San José, CA
2
Outline • What is Functional Data Analysis? • FDA and Differential Equations • Examples:
– Squid Neurons– Continuously Stirred Tank Reactor (CSTR)
• Conclusions • References
3
What is FDA? • Functional data analysis is a collection of
techniques to model data from dynamic systems – possibly governed by differential equations – in terms of some set of basis functions
• The ‘fda’ package supports the use of 8 different types of basis functions: constant, monomial, polynomial, polygonal, B-splines, power, exponential, and Fourier.
4
Observations of different lengths • Observation vectors of different lengths
can be mapped to coordinates of a fixed basis set
• All examples in the ‘fda’ package have the same numbers of observations
• No conceptual obstacles to handling observation vectors of different lengths
5
Time Warping • “start” and “stop” are sometimes
determined by certain transitions• Example: growth spurts in the life cycle of
various species do not occur at exactly the same ages in different individuals (even within the same species)
6
10 Girls: Berkeley Growth Study• Tuddenham, R. D.,
and Snyder, M. M. (1954) "Physical growth of California boys and girls from birth to age 18", _University of California Publications in Child Development_, 1, 183-364.
ooooo
oo
oo
ooooooo
ooooooooooooooo
5 10 15
8010
012
014
016
018
0
age
Hei
ght (
cm.)
ooooo
oo
oo
oooooooooooooooooooooo
ooooo
o
oo
oo
ooooooooooooooooooooo
oooo
oo
oo
oo
ooooooooooooooooooooo
oooo
oo
oo
oo
ooooooooooooooooooooo
ooooo
oo
oo
oooooooooooooooooooooo
ooo
ooo
oo
oo
ooooooooooooooooooooo
oooo
oo
oo
oo
ooooo
oooooooooooooooo
oooo
oo
oo
oo
ooooooooooooooooooooo
oooo
oo
oo
oo
ooooooooooooooooooooo
7
Acceleration • Growth spurts
occur at different ages
• Average shows the basic trend, but features are damped by improper registration
ooooo
oo
oo
ooooooo
ooooooooooooooo
5 10 1580
100
120
140
160
180
age
Hei
ght (
cm.)
ooooo
oo
oo
oooooooooooooooooooooo
ooooo
o
oo
oo
ooooooooooooooooooooo
oooo
oo
oo
oo
ooooooooooooooooooooo
oooo
oo
oo
oo
ooooooooooooooooooooo
ooooo
oo
oo
oooooooooooooooooooooo
ooo
ooo
oo
oo
ooooooooooooooooooooo
oooo
oo
oo
oo
ooooo
oooooooooooooooo
oooo
oo
oo
oo
ooooooooooooooooooooo
oooo
oo
oo
oo
ooooooooooooooooooooo
5 10 15
-4-3
-2-1
01
2
age
Gro
wth
acc
eler
atio
n (c
m/y
ear^
2)
8
Registration • register.fd all
to the mean
• Not perfect, but better
5 10 15
-4-3
-2-1
01
ageG
row
th a
ccel
erat
ion
(cm
/yea
r^2)
5 10 15
-4-3
-2-1
01
2
Gro
wth
acc
eler
atio
n (c
m/y
r^2)
9
A Stroll Along the Beach
• Light intensity over 365 days at each of 190*143 = 27140 pixels was – smoothed – functional principal components
• http://www.stat.berkeley.edu/~wickham/userposter.pdf
10
Other fdacapabilities
• Correlations – even with
series of different lengths!
• Phase plane plots – good
estimates of derivatives
Month
Mea
n Te
mpe
ratu
re
Jan Apr Jun Sep Dec
-10
05
15
j F
m
A
MJ
J AS
O
N
D
Montreal average daily tempdeviation from average (C)
-10 -5 0 5 10 15 20
-0.0
060.
000
0.00
6
Temperature (C)
Acce
lera
tion
jF
m
A
M JJA
S
O
N
D
j
Montreal average daily tempdeviation from average (C)
afda-ch03.Rfda-ch01.Rfda-ch02.R
11
Script files for fda books • Ramsay and Silverman
– (2002) Applied Functional Data Analysis (Springer)
– (2006) Functional Data Analysis, 2nd ed. (Springer)
• ~R\library\fda\scripts– Some but not all data sets discussed in the
books are in the ‘fda’ package – Script files are available to reproduce some but
not all of the analyses in the books. – plus CSTR demo
12
FDA and Differential Equations• Many dynamic systems are believed to
follow processes where output changes are a function of the outputs, x, and inputs, u(and unknown parameters θ):
( ) ( ) [ ]Tttt ,0,|, ∈= θux,fx&
• Matlab was designed in part for these types of models
13
Squid Neurons • FitzHugh (1961) - Nagumo et al. (1962) Equations:
Estimate a, b and c in: ( )[ ]( ) cbRaVR
RVVcV+−−=
+−=&
& 33
Vol
tage
acr
oss
Axo
n M
embr
ane
Rec
over
y vi
a O
utw
ard
Cur
rent
s
V
R
14
Tank Reactions • Continuously Stirred Tank Reactor (CSTR)
Tem
pera
ture
C
once
ntra
tion
15
Functional Data Analysis Process1. Select Basis Set 2. Select Smoothing Operator
– e.g., differential equation– equivalent to a Bayesian prior over coefficients
to estimate3. Estimate coefficients to optimize some
objective function 4. Model criticism, residual plots, etc. 5. Hypothesis testing
16
Inputs to Tank Reaction Simulation
17
( )( ) ( ) ( )
( ) ( ){ }( ) ( )( ) ( )
( ) ( )ba
aFFaFF
FTFTFFFF
FTTFT
TFTFCFTTFFdtdTCFCFTdtdC
bbCCTC
TT
CC
TCTT
CC
,,,:parameters 42
,130,,
1110exp,
,,,
co
co
1 coco
inin
incoinco
inref4
in
cocoininininco
ininin
τκα
ββαβ
τκβ
αβββ
+=
=+=
+−−=
+++−=+−=
+
Computations: Nonlinear ODE
• Compute Input vectors • Define functions • Call differential
equation solver • Summarize, plot Te
mpe
ratu
re
Con
cent
ratio
n
estimate parameters (κ, τ, a, b)
18
Three problems • Estimate (κ, τ, a, b) to minimize SSE in
Temperature only
function SSE SSE-minMatlab lsqnonlin 5.09888 0.00236R nls 5.09652 0
optim Nelder-Mead 5.09652 0BFGS 5.09652 0CG 5.09900 0.00248SANN 5.17504 0.07852
nlminb 5.09652 0
19
0 10 20 30 40 50 601.2
1.4
1.6
C(t)
Concentration (red = true, blue = estimated)
0 10 20 30 40 50 60330
340
350
360
T(t)
Temperature
SSE(Temp, Conc)
• Matlab: lsqnonlin • R: nls
0 10 20 30 40 50 60
1.2
1.4
1.6
1.8
Concentration (red = true, blue = estimate)
C(t)
0 10 20 30 40 50 60
330
340
350
360
TemperatureC
(t)
Matlab RConcentration 1.149E-03 1.145E-03Temperature 2.640E-04 2.636E-04
Median absolute relative error
20
R vs. Matlab• Gave comparable answers • R code for CSTR slightly more accurate but
requires much more compute time – coded by different people
• R has helper functions not so easily replicated in Matlab– summary.nls– confint.nls– profile.nls
Estimate StdErr t Pr(>|t|) kref 0.466 0.004 113.0 < 2e-16 ***EoverR 0.840 0.009 94.7 < 2e-16 ***a 1.720 0.232 7.4 8.2e-13 ***b 0.496 0.050 10.0 < 2e-16 ***
21
confint.nls• Likelihood-based confidence intervals:
generally more accurate than Wald intervals – Wald subject to parameter effects curvature – Likelihood: only affected by intrinsic curvature
> confintNlsFit2.5% 97.5%
kref 0.458 0.474EoverR 0.823 0.858a 1.300 2.222b 0.401 0.599
22
0.455 0.465 0.475
0.0
1.0
2.0
τ
0.82 0.84 0.86
0.0
1.0
2.0
τ
1.2 1.6 2.0 2.4
0.0
1.0
2.0
τ
0.40 0.50 0.60
0.0
1.0
2.0
τ
plot.profile.nls• for a plot
showing the sqrt(log(LR))
0.455 0.465 0.475
0.0
1.0
2.0
τ
0.82 0.84 0.86
0.0
1.0
2.0
τ
1.2 1.6 2.0 2.4
0.0
1.0
2.0
τ
0.40 0.50 0.60
0.0
1.0
2.0
τ
kref EoverR
a b
50
99
80
9590
23
Conclusions
• R and Matlab give comparable answers • R:nls has helper functions absent from
Matlab:lsqnonlin
• Functional data analysis tools are key for – estimating derivatives and – working with differential operators
24
References• www.functionaldata.org• Ramsay and Silverman (2006) Functional Data
Analysis, 2nd ed. (Springer) • ________(2002) Applied Functional Data
Analysis (Springer) • Ramsay, J. O., Hooker, G., Cao, J. and
Campbell, D. (2007) Parameter estimation for differential equations: A generalized smoothing approach (with discussion). Journal of the Royal Statistical Society, Series B. To appear.
25
NOT free-knot splines• For this, see
– DierckxSpline package
– Companion to Dierckx, P. (1993). Curve and Surface Fitting with Splines. Oxford Science Publications, New York.
• R package by Sundar Dorai-Raj– links to Fortran code by Dierckx available
from www.netlib.org/dierckx• soon to appear on CRAN