From Data to Differential Equations Jim Ramsay McGill University

From Data to Differential From Data to Differential EquationsEquations

Jim RamsayJim Ramsay

McGill UniversityMcGill University

The themesThe themes

Differential equations are powerful Differential equations are powerful tools for modeling data.tools for modeling data.

There are new methods for There are new methods for estimating differential equations estimating differential equations directly from data.directly from data.

Some examples are offered, drawn Some examples are offered, drawn from chemical engineering and from chemical engineering and medicine.medicine.

Differential Equations as Differential Equations as ModelsModels

DIFE’S make DIFE’S make explicit the relation explicit the relation between one or between one or more derivatives more derivatives and the function and the function itself.itself.

An example is the An example is the harmonic motion harmonic motion equation:equation:

2 2( ) ( )D x t x t

Why Differential Equations?Why Differential Equations?

The behavior of a derivative is often of The behavior of a derivative is often of more interest than the function itself, more interest than the function itself, especially over short and medium time especially over short and medium time periods.periods.

Recall equations like Recall equations like f = maf = ma and and e = mce = mc22.. What often counts is how rapidly a system What often counts is how rapidly a system

responds rather than its level of response.responds rather than its level of response. Velocity and acceleration can reflect Velocity and acceleration can reflect

energy exchange within a system.energy exchange within a system.

A DIFE requires that derivatives behave A DIFE requires that derivatives behave smoothly, since they are linked to the smoothly, since they are linked to the function itself.function itself.

Natural scientists often provide theory to Natural scientists often provide theory to biologists and engineers in the form of biologists and engineers in the form of DIFE’s.DIFE’s.

Many fields such as pharmacokinetics and Many fields such as pharmacokinetics and industrial process control routinely use industrial process control routinely use DIFE’s as models, especially for DIFE’s as models, especially for input/output systems.input/output systems.

DIFE’s are especially useful when feedback DIFE’s are especially useful when feedback systems must be developed to control the systems must be developed to control the behavior of systems.behavior of systems.

The solution to an The solution to an mmth order linear DIFE is th order linear DIFE is an an mm-dimensional function space, and thus -dimensional function space, and thus the equation can model the equation can model variation variation over over replications as well as replications as well as averageaverage behavior. behavior.

Systems of DIFE’s are important models for Systems of DIFE’s are important models for processes mutually influencing each other, processes mutually influencing each other, such as treatments and symptoms, such as treatments and symptoms, predator and prey, and etc.predator and prey, and etc.

Nonlinear DIFE’s can provide compact and Nonlinear DIFE’s can provide compact and elegant models for systems exhibiting elegant models for systems exhibiting exceedingly complex behavior.exceedingly complex behavior.

Differential equations and time Differential equations and time scalesscales

DIFE’s are important where there are DIFE’s are important where there are events at different time scales.events at different time scales.

The order of the equation corresponds The order of the equation corresponds to the number of time scales plus one.to the number of time scales plus one.

A first-order equation can model A first-order equation can model events on two time scales: long-term, events on two time scales: long-term, modeled by modeled by x(t),x(t), and short-term, and short-term, modeled by modeled by Dx(t).Dx(t).

Handwriting has four time Handwriting has four time scalesscales

Average spatial position needs only Average spatial position needs only x(t),x(t), time scale is many seconds.time scale is many seconds.

Overall left-to-right trend requires Overall left-to-right trend requires Dx(t)Dx(t) with time scale a second or less.with time scale a second or less.

Cusps, loops, strokes require Cusps, loops, strokes require DD22x(t)x(t) with with scale of 100 msec.scale of 100 msec.

Transient effects such from pen contacting Transient effects such from pen contacting paper require paper require DD33x(t)x(t) with scale of 10 msec. with scale of 10 msec.

The Rössler EquationsThe Rössler Equations

This nearly linear system exhibits chaotic This nearly linear system exhibits chaotic behavior that would be virtually impossible behavior that would be virtually impossible to model without using a DIFE:to model without using a DIFE:

( ) ( ) ( )

( ) ( ) ( )

( ) ( ( ) ) ( )

Dx t y t z t

Dy t x t ay t

Dz t b x t c z t

Stochastic DIFE’sStochastic DIFE’s

We can introduce stochastic elements We can introduce stochastic elements into DIFE’s in many ways:into DIFE’s in many ways:

Random coefficient functions.Random coefficient functions. Random forcing functions.Random forcing functions. Random initial, boundary, and other Random initial, boundary, and other

constraints.constraints. Stochastic time.Stochastic time.

If we can model data on functions or If we can model data on functions or functional input/output systems, we functional input/output systems, we will have a modeling tool that greatly will have a modeling tool that greatly extends the power and scope of extends the power and scope of existing nonparametric curve-fitting existing nonparametric curve-fitting techniques. techniques.

We may also get better estimates of We may also get better estimates of functional parameters and their functional parameters and their derivatives. derivatives.

A simple input/output A simple input/output systemsystem

We begin by looking at a first order We begin by looking at a first order DIFE for a single output function DIFE for a single output function x(t)x(t) and a single input function and a single input function u(t). u(t). (SISO)(SISO)

But our goal is the linking of multiple But our goal is the linking of multiple inputs to multiple outputs (MIMO) by inputs to multiple outputs (MIMO) by linear or nonlinear systems of linear or nonlinear systems of arbitrary order arbitrary order mm..

( ) ( ) ( ) ( ) ( )Dx t t x t t u t

•u(t)u(t) is often called the is often called the forcing functionforcing function..•αα(t) (t) and and ββ(t) (t) are the are the coefficient functionscoefficient functions that define the DIFE.that define the DIFE.•The system is The system is linearlinear in these coefficient in these coefficient functions, and in the input functions, and in the input u(t)u(t) and output and output x(t).x(t).

In this simple case, an analytic solution is In this simple case, an analytic solution is possible:possible:

0 0( ) ( )

0( ) [ (0) ( ) ( ) ]

t utu du v dv

x t e x t e f u du

However, in most situations involving However, in most situations involving DIFE’s it is necessary to use numerical DIFE’s it is necessary to use numerical methods to find the solution. methods to find the solution.

A constant coefficient A constant coefficient exampleexample

We can see more clearly what happens We can see more clearly what happens when coefficient when coefficient ββ is a constant, is a constant, αα = 1, = 1, and and u(t)u(t) is a function stepping from 0 to is a function stepping from 0 to 1 at time 1 at time tt11::

1( )1

1( ) [1 ],t tx t e t t

Constant Constant αα//ββ is the is the gaingain in in the system.the system.

Constant Constant ββ controls the controls the responsivityresponsivity of the of the system to a system to a change in change in input. input.

Lupus treatmentLupus treatment

Lupus is an incurable auto-immune Lupus is an incurable auto-immune disease that mainly afflicts women.disease that mainly afflicts women.

It flares unpredictably, inflicting wide It flares unpredictably, inflicting wide damage with severe symptoms.damage with severe symptoms.

The treatment is prednisone, an The treatment is prednisone, an immune system suppressant used in immune system suppressant used in transplants.transplants.

But prednisone has serious short- and But prednisone has serious short- and long-term side affects.long-term side affects.

0 1 2 3 4 5

0

5

10

15

20

25

30

Year

SLE

DA

I (R

/G),

Pre

dnis

one

(B)

Patient 13, Nobs = 70, NSLEDAI = 40, Nflare = 13

How can we estimate a How can we estimate a DIFE from data?DIFE from data?

The DIFE as a linear differential The DIFE as a linear differential operatoroperator

We can express the first order DIFE as a We can express the first order DIFE as a linear differential operator:linear differential operator:

( ) ( ) ( ) ( ) ( ) ( ) 0Lx t t x t Dx t t u t

More generally, assuming “(t)”,More generally, assuming “(t)”,

1

0 1

m Kj m

j k kj k

Lx D x D x u

Smoothing data with the Smoothing data with the operator Loperator L

If we know the differential equation, then the If we know the differential equation, then the operator L can define a data smoother. The operator L can define a data smoother. The fitting criterion is:fitting criterion is:

2

2

1

( ) ( )N

i ii

PENSSE y x t Lx t dt

The larger The larger λλ is, the more the fitting function is, the more the fitting function x(t)x(t) is forced to be a solution of the differential equation is forced to be a solution of the differential equation Lx(t) = 0Lx(t) = 0..

The smooth valuesThe smooth values

If If x(t)x(t) is expanded in terms of a set is expanded in terms of a set K K basis basis functions functions φφkk(t),(t), and if and if NN by by KK matrix matrix ZZ contains the values of these functions at contains the values of these functions at time points time points ttii, then, then

1( ' ) '( )

( )( ) ', ( )

y Z Z Z R Z y s

R L L s L u

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

How to estimate How to estimate LLLL is a function of weight coefficients is a function of weight coefficients αα(t)(t) and and ββ(t).(t).

If these have the basis function expansionsIf these have the basis function expansions

( ) ;J L

j j l lj l

t a t t b t then we can optimize the profiled penalized error then we can optimize the profiled penalized error sum of squaressum of squares

2

( , )N

i ii

PENSSE a b y y

with respect to coefficient vectors with respect to coefficient vectors aa and and bb..

Adding constraintsAdding constraints

It is a simple matter to:It is a simple matter to: Constrain some coefficient functions Constrain some coefficient functions

to be zero or a constant.to be zero or a constant. Force others to be smooth, Force others to be smooth,

employing specific linear differential employing specific linear differential operators to smooth them towards operators to smooth them towards specific target spaces.specific target spaces.

And more …And more …

This approach is easily generalizable to:This approach is easily generalizable to: DIFE’s and differential operators of DIFE’s and differential operators of

any order.any order. Multiple inputs uMultiple inputs ujj(t) and outputs x(t) and outputs xii(t).(t). Replicated functional data.Replicated functional data. Nonlinear DIFE’s and operators.Nonlinear DIFE’s and operators.

What about choosing What about choosing λλ??

Choosing the smoothing parameter Choosing the smoothing parameter λλ is is always a delicate matter.always a delicate matter.

The right value of The right value of λλ will be rather large if will be rather large if the data can be well-modelled by a low-the data can be well-modelled by a low-order DIFE, but not so large as to fail to order DIFE, but not so large as to fail to smooth observational noise and small smooth observational noise and small additional functional variation.additional functional variation.

However, generalized cross-validation However, generalized cross-validation seems to work well in this context, too.seems to work well in this context, too.

A simple harmonic exampleA simple harmonic example

For For i=1,…,Ni=1,…,N and and j=1,…,nj=1,…,n, let, let

1 2 3 4sin 6 cos 6ij i i j i j i j ijy c c t c t c t

where the where the ccikik’s’s and the and the εεijij’s’s are are N(0,1);N(0,1); and and t = 0(0.01)1t = 0(0.01)1..

The functional variation satisfies the differential equationThe functional variation satisfies the differential equation2 2 4( ) (6 ) ( ) ( ) 0Lx t D x t D x t

so that so that ββ00(t) = (t) = ββ11(t) = (t) = ββ33(t)=0(t)=0 and and ββ22(t)(t) = (6 = (6ππ))2 2 = 355.3.= 355.3.

For simulated data with For simulated data with N = 20N = 20 and constant and constant bases for bases for ββ00(t) ,…, (t) ,…, ββ33(t), (t), we getwe get

for for L = DL = D44, , best results are forbest results are for λλ=10=10-10-10 and the and the RIMSE’s for derivatives 0, 1 and 2 are 0.32, 9.3 RIMSE’s for derivatives 0, 1 and 2 are 0.32, 9.3 and 315.6, resp.and 315.6, resp.

for for LL estimated, best results are for estimated, best results are for λλ=10=10-5-5 and and the RIMSE’s are 0.18, 2.8, and 49.3, resp.the RIMSE’s are 0.18, 2.8, and 49.3, resp.

giving precision ratios of 1.8, 3.3 and 6.4, giving precision ratios of 1.8, 3.3 and 6.4, resp.resp.

ββ22 was estimated as 353.6 whereas the true was estimated as 353.6 whereas the true value was 355.3.value was 355.3.

ββ33 was 0.1, with true value 0.0. was 0.1, with true value 0.0.

In addition to better curve estimates In addition to better curve estimates and much better derivative estimates, and much better derivative estimates, note that the derivative RMSE’s do note that the derivative RMSE’s do not go wild at the end points.not go wild at the end points.

This is because the DIFE ties the This is because the DIFE ties the derivatives to the function values, and derivatives to the function values, and the function values are tamed by the the function values are tamed by the data.data.

A decaying harmonic A decaying harmonic exampleexample

A second order equation defining A second order equation defining harmonic behavior with decay, forced harmonic behavior with decay, forced by a step function:by a step function:

ββ0 0 = 4.04, = 4.04, ββ1 1 = 0.4, = 0.4, αα = -2.0. = -2.0. u(t) = 0, t < 2u(t) = 0, t < 2ππ, u(t) = 1, t ≥ 2, u(t) = 1, t ≥ 2ππ.. Noise with std. dev. 0.2 added to 100 Noise with std. dev. 0.2 added to 100

randomly generated solution randomly generated solution functions.functions.

ParameterParameter True True ValueValue

Mean Mean EstimateEstimate

Std. ErrorStd. Error

ββ00 4.0404.040 4.0414.041 0.0730.073

ββ11 0.4000.400 0.3970.397 0.0480.048

αα -2.000-2.000 -1.998-1.998 0.0880.088

Results from 100 samples using minimum Results from 100 samples using minimum generalized cross-validation to choose generalized cross-validation to choose λλ::

0 2 4 6 8 10 12-1.5

-1

-0.5

0

0.5

1

1.5

2

t

datax(t)u(t)

An oil refinery exampleAn oil refinery example

The single input is “reflux flow” and The single input is “reflux flow” and the output is “tray 47” level in a the output is “tray 47” level in a distillation column.distillation column.

There were 194 sampling points.There were 194 sampling points. 30 B-spline basis functions were used 30 B-spline basis functions were used

to fit the output, and a step function to fit the output, and a step function was used to model the input.was used to model the input.

Results for the refinery dataResults for the refinery data

After some experimentation with first and After some experimentation with first and second order models, and with constant second order models, and with constant and varying coefficient models, the clear and varying coefficient models, the clear conclusion seems to be the constant conclusion seems to be the constant coefficient model:coefficient model:

( ) 0.02 ( ) 0.19 ( )Dx t x t u t

Monotone smoothingMonotone smoothing

Some constrained Some constrained functions can be functions can be expressed as expressed as DIFE’s.DIFE’s.

A smooth strictly A smooth strictly monotone function monotone function can be expressed can be expressed as the second as the second order DIFEorder DIFE

2 ( ) ( ) ( )D x t t Dx t

We can monotonically smooth data by We can monotonically smooth data by estimating the second order DIFE estimating the second order DIFE directly.directly.

We constrain We constrain ββ00(t) = 0(t) = 0, and give , and give ββ11(t)(t) enough flexibility to smooth the data.enough flexibility to smooth the data.

In the following artificial example, the In the following artificial example, the smoothing parameter was chosen by smoothing parameter was chosen by generalized cross-validation. generalized cross-validation. ββ11(t)(t) was was expanded in terms of 13 B-splines. expanded in terms of 13 B-splines.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-0.5

0

0.5

1

1.5

2

2.5

3

3.5

4

t

x(t

)

DataEstimateTrue

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

2

4

6

8

10

12

t

Dx(

t) DataEstimateTrue

Analyzing the Lupus dataAnalyzing the Lupus data

Weight function Weight function ββ(t)(t) defining an defining an order 1 DIFE for symptoms estimated order 1 DIFE for symptoms estimated with and without prednisone dose as with and without prednisone dose as a forcing function.a forcing function.

Weight expanded using B-splines Weight expanded using B-splines with knots at every observation time.with knots at every observation time.

Weight Weight αα(t)(t) for prednisone is for prednisone is constant.constant.

The forced DIFE for lupusThe forced DIFE for lupus

0 1 2 3 4 5

-20

0

20

40

60

80

0(t

)

0 1 2 3 4 5 6-7

-6.5

-6

-5.5

-5

-4.5

1(t

)

The data fitThe data fit

0 1 2 3 4 5 6-10

-5

0

5

10

15

20

25

30

35

t

x(t

)

DataNonhomogeneousHomogeneous

AssessmentAssessment

Adding the forcing function halved the Adding the forcing function halved the quadratic loss function being quadratic loss function being minimized.minimized.

We see that the fit improves where the We see that the fit improves where the dose is used to control the symptoms, dose is used to control the symptoms, but not where it is not used.but not where it is not used.

These results are only suggestive, and These results are only suggestive, and much more needs to be done.much more needs to be done.

SummarySummary

We can estimate differential equations We can estimate differential equations directly from noisy data with little bias and directly from noisy data with little bias and good precision.good precision.

This gives us a lot more modeling power, This gives us a lot more modeling power, especially for fitting input/output functional especially for fitting input/output functional data.data.

Estimates of derivatives can be much Estimates of derivatives can be much better, relative to smoothing methods.better, relative to smoothing methods.

Special functions such as monotone can be Special functions such as monotone can be fit by estimating the DIFE that defines them.fit by estimating the DIFE that defines them.

Documents

From Data to Differential Equations Jim Ramsay McGill University