Statistical Data Assimilation: Exact Formulation and Variational Approximations Henry D. I. Abarbanel Department of Physics and Marine Physical Laboratory (Scripps Institution of Oceanography) Center for Theoretical Biological Physics University of California, San Diego [email protected] =================== In collaboration with Philip Gill, Elizabeth Wong, Dan Creveling, Mark Kostuk, Jack Quinn, Bryan Toth, William Whartenby, Chris Knowlton

Page 1: Statistical Data Assimilation: Exact Formulation and

Statistical Data Assimilation: Exact Formulation and Variational Approximations

Henry D. I. Abarbanel

Department of Physics and

Marine Physical Laboratory (Scripps Institution of Oceanography)

Center for Theoretical Biological Physics

University of California, San Diego

[email protected]

In collaboration with Philip Gill, Elizabeth Wong, Dan Creveling, Mark Kostuk, Jack Quinn, Bryan Toth, William Whartenby, Chris Knowlton

Page 2: Statistical Data Assimilation: Exact Formulation and

Data assimilation is a unidirectional communications system: data = transmitter, model = receiver. Synchronize the model and the data to achieve communication of information from data to model.

We present an exact formula for the probability of the performance of the communications channel. The exact formula is an integral along a path in the space of states and parameters of the receiver = model.

We analyze approximations to the path integral: variational and direct evaluations.

We present an example from a core climate model element: shallow water equations.

Page 3: Statistical Data Assimilation: Exact Formulation and

Common features in developing predictive models of observed nonlinear systems:

I Few of the many state variables are observable.

II We must estimate the unobserved state variables and the fixed parameters to make predictions, using $x(t_{n+1}) = f(x(t_n), p)$.

III Using observations to guide the dynamics to the right sector of phase space can allow prediction in chaos.

Page 4: Statistical Data Assimilation: Exact Formulation and

\[ t_0, t_1, \ldots, t_n, \ldots, t_m = T \]

\[ y_l(t_n), \quad l = 1, 2, \ldots, L \]

\[ x_a(t_{n+1}) = f_a(x(t_n), p), \quad a = 1, 2, \ldots, D \]

\[ y_l(n) = h_l(x(n)), \quad L \le D \]

Page 5: Statistical Data Assimilation: Exact Formulation and

Our general interest is in making models of observed physical systems. These models often have unknown parameters: conductivity, transport coefficients, reaction rates, coupling strengths, ...

These models often have unobservable state variables: gating variables for ion channels, population inversion in lasers, ...

\[ \frac{dx(t)}{dt} = F(x(t), p); \quad \text{need all } x(0) \text{ and } p \]

\[ x(t_{n+1}) = f(x(t_n), p); \quad \text{need all } x(t_0) \text{ and } p \]

Our discussion starts with the model and the data, and an interest in determining the unknown parameters and unobserved state variables.

Page 6: Statistical Data Assimilation: Exact Formulation and

Data Assimilation: transfer of information from observations to models.

Noisy data, errors in the model (low resolution, ...), unknown initial states when data acquisition begins.

Exact Formulation as a Path Integral

Page 7: Statistical Data Assimilation: Exact Formulation and

$P(x(m) \mid Y(m))$: probability distribution for the model state at time $t_m$, given measurements

\[ Y(m) = \{y(0), y(1), \ldots, y(m)\} \]

Page 8: Statistical Data Assimilation: Exact Formulation and

Path Integral for $P(x(m) \mid Y(m))$:

\[ P(x(m) \mid Y(m)) = \int \prod_{n=0}^{m-1} d^D x(n)\; e^{TMI(X, Y(m))} \prod_{n=0}^{m-1} P(x(n+1) \mid x(n))\; P(x(0)) = \int \prod_{n=0}^{m-1} d^D x(n)\; e^{-A_0(X, Y(m))} \]

Path: $X = \{x(m), x(m-1), \ldots, x(0)\}$

\[ TMI(X, Y(m)) = \sum_{n=0}^{m} CMI(x(n), y(n) \mid Y(n-1)) \]

$TMI(X, Y(m))$ is the Total Conditional Mutual Information between the path $X$ and the observations $Y(m)$.

What is different about this path integral: nonlinear "propagators", dissipative dynamics, orbits on strange attractors.

Page 9: Statistical Data Assimilation: Exact Formulation and

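Action for State and Parameter Estimation

\[ A_0(X \mid Y(m)) = -\sum_{n=0}^{m} CMI(x(n), y(n) \mid Y(n-1)) - \sum_{n=0}^{m-1} \log\{P(x(n+1) \mid x(n))\} - \log\{P(x(0))\} \]

\[ CMI(x(n), y(n) \mid Y(n-1)) = \log\left\{ \frac{P(x(n), y(n) \mid Y(n-1))}{P(x(n) \mid Y(n-1))\, P(y(n) \mid Y(n-1))} \right\} \]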
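A common way to make this action concrete is to assume Gaussian errors; that assumption is not made on the slide, and the inverse variances $R_m$ (measurement) and $R_f$ (model) are symbols introduced here for illustration only. If the measurement noise is Gaussian and independent at each time, and the model error is Gaussian, then up to terms independent of the path $X$ the action takes a weighted least-squares form:

```latex
A_0(X \mid Y(m)) \approx
  \sum_{n=0}^{m} \sum_{l=1}^{L} \frac{R_m}{2}\,\bigl(y_l(n) - h_l(x(n))\bigr)^2
  + \sum_{n=0}^{m-1} \sum_{a=1}^{D} \frac{R_f}{2}\,\bigl(x_a(n+1) - f_a(x(n), p)\bigr)^2
  - \log P(x(0))
```

The first sum comes from the conditional mutual information term (its denominator does not depend on $X$), and the second from the transition probabilities $P(x(n+1) \mid x(n))$. Letting $R_f \to \infty$ enforces the model as a hard constraint.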

Page 10: Statistical Data Assimilation: Exact Formulation and

With the density of paths $\exp[-A_0(X \mid Y(m))]$, we are able to evaluate any conditional expectation value of a function $F(X) = F(x(m), x(m-1), \ldots, x(1), x(0), p)$ as

\[ \langle F(X) \rangle = E[F(X) \mid Y] = \frac{\int dX\, e^{-A_0(X \mid Y(m))}\, F(X)}{\int dX\, e^{-A_0(X \mid Y(m))}} \]

The path is through state variable + parameter space: $X = \{x(m), x(m-1), \ldots, x(0), p\}$.

Page 11: Statistical Data Assimilation: Exact Formulation and

\[ \langle F(X) \rangle = E[F(X) \mid Y] = \frac{\int dX\, e^{-A_0(X \mid Y(m))}\, F(X)}{\int dX\, e^{-A_0(X \mid Y(m))}} \]

We have explored Monte Carlo numerical evaluation of the integral representation of the data assimilation task. We produced the results here using single-CPU machines.

This is eminently parallelizable. The same problem on GPU machines runs 60-300 times faster! Bigger problems utilize more GPU threads.

This is the source of numerical optimism.
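A minimal sketch of such a Monte Carlo evaluation, under assumptions that are not from the slides: a scalar linear model $x(n+1) = a\,x(n)$, a Gaussian action with hypothetical inverse variances `Rm` and `Rf`, and plain single-site Metropolis sampling. All function and variable names are illustrative.

```python
import numpy as np

def action(X, y, a, Rm, Rf):
    """Assumed Gaussian action A_0(X|Y) for the scalar map x(n+1) = a*x(n)."""
    measurement = 0.5 * Rm * np.sum((X - y) ** 2)          # data-mismatch term
    model = 0.5 * Rf * np.sum((X[1:] - a * X[:-1]) ** 2)   # model-error term
    return measurement + model

def metropolis_path_mean(y, a, Rm, Rf, n_sweeps=4000, burn_in=1000,
                         step=0.3, seed=0):
    """Estimate E[X | Y] by Metropolis sampling of the density exp(-A_0)."""
    rng = np.random.default_rng(seed)
    X = np.array(y, dtype=float)        # start the path at the observations
    A = action(X, y, a, Rm, Rf)
    total = np.zeros_like(X)
    for sweep in range(n_sweeps):
        for i in range(X.size):         # perturb one path component at a time
            old = X[i]
            X[i] = old + step * rng.normal()
            A_new = action(X, y, a, Rm, Rf)
            if np.log(rng.random()) < A - A_new:
                A = A_new               # accept the proposal
            else:
                X[i] = old              # reject and restore
        if sweep >= burn_in:
            total += X                  # accumulate samples after burn-in
    return total / (n_sweeps - burn_in)

# Twin experiment: noisy observations of a decaying orbit (illustrative values).
rng = np.random.default_rng(1)
truth = 2.0 * 0.9 ** np.arange(10)
y = truth + 0.3 * rng.normal(size=10)
estimate = metropolis_path_mean(y, a=0.9, Rm=10.0, Rf=10.0)
print(estimate)
```

Because this toy action is quadratic, the sampled mean can be checked against the solution of a linear system; for nonlinear models no such shortcut exists and the sampling carries the full estimate. Each proposal is independent of the others within a sweep only through its neighbors, which is what makes the method easy to parallelize.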

Page 12: Statistical Data Assimilation: Exact Formulation and
Page 13: Statistical Data Assimilation: Exact Formulation and

\[ \langle F(X) \rangle = E[F(X) \mid Y] = \frac{\int dX\, e^{-A_0(X \mid Y(m))}\, F(X)}{\int dX\, e^{-A_0(X \mid Y(m))}} \]

Another approach is expansion about a saddle path $S$:

\[ \frac{\partial A_0(X)}{\partial X_\alpha}\Big|_{X=S} = 0, \quad \alpha = 1, 2, \ldots, (m+1)D + K \]

This is 4DVar. This is a numerical optimization problem (see Gill tomorrow). There are significant problems with this when trajectories are chaotic.

The path integral representation allows corrections to 4DVar: do the Gaussian integral about $X = S$. This requires

\[ \frac{\partial^2 A_0(X)}{\partial X\, \partial X}\Big|_{X=S} \]

Page 14: Statistical Data Assimilation: Exact Formulation and

Lorenz96 Model

Dynamical variables: longitudinal 'activity'; no vertical levels, no latitude variations.

\[ y_a(t), \quad a = 1, 2, \ldots, D; \qquad y_{-1}(t) = y_{D-1}(t), \quad y_0(t) = y_D(t), \quad y_{D+1}(t) = y_1(t) \]

\[ \frac{dy_a(t)}{dt} = y_{a-1}(t)\,\bigl(y_{a+1}(t) - y_{a-2}(t)\bigr) - y_a(t) + \text{Forcing} \]

We explored D = 5. Chaotic at Forcing = 7.9; we used Forcing = 8.17
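A minimal sketch of integrating the Lorenz96 equations for D = 5 and Forcing = 8.17. The fourth-order Runge-Kutta scheme, the step size `dt`, and the initial condition are assumptions made for illustration; the slides do not say which integrator was used.

```python
import numpy as np

def lorenz96(y, forcing):
    """dy_a/dt = y_{a-1} (y_{a+1} - y_{a-2}) - y_a + forcing, cyclic in a."""
    return np.roll(y, 1) * (np.roll(y, -1) - np.roll(y, 2)) - y + forcing

def rk4_step(y, dt, forcing):
    """One fourth-order Runge-Kutta step."""
    k1 = lorenz96(y, forcing)
    k2 = lorenz96(y + 0.5 * dt * k1, forcing)
    k3 = lorenz96(y + 0.5 * dt * k2, forcing)
    k4 = lorenz96(y + dt * k3, forcing)
    return y + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def integrate(y0, n_steps, dt=0.01, forcing=8.17):
    """Return the trajectory as an array of shape (n_steps + 1, D)."""
    traj = np.empty((n_steps + 1, len(y0)))
    traj[0] = y0
    for n in range(n_steps):
        traj[n + 1] = rk4_step(traj[n], dt, forcing)
    return traj

# Start near the fixed point y_a = forcing with a small perturbation.
y0 = 8.17 + 0.01 * np.arange(5)
trajectory = integrate(y0, n_steps=5000)
print(trajectory.shape)
```

The cyclic boundary conditions are handled by `np.roll`, so the same function works for any D.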

Page 15: Statistical Data Assimilation: Exact Formulation and

\[ \frac{dy_1(t)}{dt} = y_5(t)\,(y_2(t) - y_4(t)) - y_1(t) + F + u_1(t)\,(x_1(t) - y_1(t)) \]

\[ \frac{dy_2(t)}{dt} = y_1(t)\,(y_3(t) - y_5(t)) - y_2(t) + F \]

\[ \frac{dy_3(t)}{dt} = y_2(t)\,(y_4(t) - y_1(t)) - y_3(t) + F \]

\[ \frac{dy_4(t)}{dt} = y_3(t)\,(y_5(t) - y_2(t)) - y_4(t) + F \]

\[ \frac{dy_5(t)}{dt} = y_4(t)\,(y_1(t) - y_3(t)) - y_5(t) + F \]

A second coupling term of the same form, with gain $u_2(t)$, is added to one other equation; the component it acts on is not recoverable from the source text.

The model does not synchronize with the 'data' $x(t)$ with only one coupling: there are two positive Conditional Lyapunov Exponents, so two different couplings to the data are required.

Page 16: Statistical Data Assimilation: Exact Formulation and

Calculate the synchronization error or cost function

\[ C(k_1, k_2) = \sum_{n=0}^{N} \sum_{l=1}^{L} \bigl(x_l(t_n) - y_l(t_n)\bigr)^2 \]

as a function of the couplings ($u_1(t) = k_1$, $u_2(t) = k_2$).

Examine the dependence on an initial condition, here $y_2(0)$, for various regions of $(k_1, k_2)$.
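A minimal sketch of evaluating this cost for the D = 5 Lorenz96 model. The choice of coupled components (`OBSERVED`), the Runge-Kutta integrator, the step size, and the initial conditions are all assumptions for illustration; in particular, the slides do not make clear which two components carry the couplings.

```python
import numpy as np

FORCING = 8.17
OBSERVED = [0, 2]   # hypothetical choice of the two coupled components

def lorenz96(y):
    """Lorenz96 vector field with cyclic indices."""
    return np.roll(y, 1) * (np.roll(y, -1) - np.roll(y, 2)) - y + FORCING

def coupled_rhs(z, gains):
    """Data x evolve freely; model y is nudged toward x with constant gains."""
    d = z.size // 2
    x, y = z[:d], z[d:]
    return np.concatenate([lorenz96(x), lorenz96(y) + gains * (x - y)])

def sync_cost(k1, k2, x0, y0, n_steps=2000, dt=0.01):
    """C(k1, k2): summed squared mismatch of the observed components."""
    d = len(x0)
    gains = np.zeros(d)
    gains[OBSERVED] = (k1, k2)
    z = np.concatenate([x0, y0]).astype(float)
    cost = 0.0
    for _ in range(n_steps + 1):
        x, y = z[:d], z[d:]
        cost += np.sum((x[OBSERVED] - y[OBSERVED]) ** 2)
        # Fourth-order Runge-Kutta step for the joint data + model system.
        s1 = coupled_rhs(z, gains)
        s2 = coupled_rhs(z + 0.5 * dt * s1, gains)
        s3 = coupled_rhs(z + 0.5 * dt * s2, gains)
        s4 = coupled_rhs(z + dt * s3, gains)
        z = z + dt * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return cost

x0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])                # 'data' initial state
y0 = x0 + np.array([0.5, -0.5, 0.5, -0.5, 0.5])         # mismatched model state
for k in (0.0, 20.0):
    print(k, sync_cost(k, k, x0, y0))
```

Scanning `sync_cost` over a grid of `(k1, k2)` and over one component of `y0` gives the kind of cost surface whose smoothness indicates whether the couplings have regularized the search.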

Page 17: Statistical Data Assimilation: Exact Formulation and
Page 18: Statistical Data Assimilation: Exact Formulation and
Page 19: Statistical Data Assimilation: Exact Formulation and

$y_2^{\mathrm{data}}(0) = 4.9$

Page 20: Statistical Data Assimilation: Exact Formulation and

$y_2^{\mathrm{data}}(0) = 4.9$

Page 21: Statistical Data Assimilation: Exact Formulation and

$y_2^{\mathrm{data}}(0) = 4.9$

Page 22: Statistical Data Assimilation: Exact Formulation and

$y_2^{\mathrm{data}}(0) = 4.9$

Page 23: Statistical Data Assimilation: Exact Formulation and

Dynamical State and Parameter Estimation (DSPE)

Minimize:

\[ C(y, u, p) = \frac{1}{2T} \sum_{t=0}^{T} \bigl\{ (x_1(t) - y_1(t))^2 + u(t)^2 \bigr\} \]

Subject to the equality constraints:

\[ \frac{dy_1(t)}{dt} = F_1(y_1(t), y_R(t), p) + u(t)\,(x_1(t) - y_1(t)) \]

\[ \frac{dy_R(t)}{dt} = F_R(y_1(t), y_R(t), p) \]

Page 24: Statistical Data Assimilation: Exact Formulation and

Use the variational answer, the saddle path from optimization, as a first approximation, and use the path integral to evaluate fluctuations about this optimal path. We use IPOPT to solve the optimization problem via a direct method.

Saddle path:

\[ \frac{\partial A_0(X)}{\partial X_\alpha}\Big|_{X=S} = 0, \quad \alpha = 1, 2, \ldots, (m+1)D + K \]

Page 25: Statistical Data Assimilation: Exact Formulation and

Shallow Water Equations

Coordinates and velocities: $x$, $u(x,y,t)$; $y$, $v(x,y,t)$; $z$, $w(x,y,z,t)$

\[ H(x, y, t) = H_0 + h(x, y, t) \]

Page 26: Statistical Data Assimilation: Exact Formulation and

Twin Experiment: Generate 'data' from the N by N shallow water equations. Present L observed variables chosen from u, v, h. How big must L be to allow estimation of the other, unobserved state variables? That is, how many measurements must one make to accurately estimate the unobserved states?

For 8 by 8 example with 192 state variables, the number is 112 out of 192, about 60%.

2L N

Page 27: Statistical Data Assimilation: Exact Formulation and

In the 8 by 8 shallow water equations, present all measurements of h(x,y,t) as well as 2, 3, 4, ... columns of data from the wind velocities. Columns contain data on shears in the wind fields.

Page 28: Statistical Data Assimilation: Exact Formulation and
Page 29: Statistical Data Assimilation: Exact Formulation and
Page 30: Statistical Data Assimilation: Exact Formulation and
Page 31: Statistical Data Assimilation: Exact Formulation and
Page 32: Statistical Data Assimilation: Exact Formulation and
Page 33: Statistical Data Assimilation: Exact Formulation and

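Effective Actions for Path Integrals

\[ \langle F(X) \rangle = \int dX\, F(X)\, e^{-A_0(X)} \]

Stationary path approximation:

\[ F(X) = F(S) + (X - S)\,\frac{\partial F(X)}{\partial X}\Big|_{X=S} + \frac{1}{2}\,(X - S)(X - S)\,\frac{\partial^2 F(X)}{\partial X\, \partial X}\Big|_{X=S} + \ldots \]

4DVar is $\frac{\partial F(X)}{\partial X}\Big|_{X=S} = 0$, so

\[ F(X) = F(S) + \frac{1}{2}\,(X - S)(X - S)\,\frac{\partial^2 F(X)}{\partial X\, \partial X}\Big|_{X=S} + \ldots \]

\[ \int dX\, e^{-F(X)} \approx e^{-F(S)} \sqrt{\frac{(2\pi)^D}{\det(F''(S))}} \]

Performing an integral involves an optimization problem.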
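A minimal one-dimensional sketch of the stationary path (Laplace) approximation, comparing it with direct quadrature. The exponent `F`, its sharpness `LAM`, and the Newton iteration used to find the stationary point are illustrative assumptions, not taken from the slides.

```python
import numpy as np

LAM = 50.0  # hypothetical sharpness of the exponent

def F(x):
    return LAM * ((x - 1.0) ** 2 / 2.0 + (x - 1.0) ** 4 / 4.0)

def dF(x):
    return LAM * ((x - 1.0) + (x - 1.0) ** 3)

def d2F(x):
    return LAM * (1.0 + 3.0 * (x - 1.0) ** 2)

def stationary_point(x, n_iter=50):
    """Newton iteration for dF(S) = 0: the optimization step."""
    for _ in range(n_iter):
        x = x - dF(x) / d2F(x)
    return x

S = stationary_point(0.0)

# Gaussian integral about S (D = 1): exp(-F(S)) * sqrt(2*pi / F''(S)).
laplace = np.exp(-F(S)) * np.sqrt(2.0 * np.pi / d2F(S))

# Direct evaluation of the same integral on a fine grid (Riemann sum).
grid = np.linspace(-4.0, 6.0, 200001)
numeric = np.sum(np.exp(-F(grid))) * (grid[1] - grid[0])

print(S, laplace, numeric)
```

The two values agree to within a few percent here; the gap comes from the quartic term that the Gaussian approximation discards, which is the kind of correction the path integral formulation is meant to supply.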

Page 34: Statistical Data Assimilation: Exact Formulation and

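Effective Actions for Path Integrals

Generalize 4DVar to another, exact variational principle.

Generating function for moments along the path:

\[ e^{C(K)} = \int dX\, e^{-A_0(X) + K \cdot X} \]

\[ \langle X \rangle = \frac{\partial C(K)}{\partial K}\Big|_{K=0} = E[X \mid Y] = \frac{\int dX\, X\, e^{-A_0(X)}}{\int dX\, e^{-A_0(X)}} \]

\[ \langle X X \rangle - \langle X \rangle \langle X \rangle = \frac{\partial^2 C(K)}{\partial K\, \partial K}\Big|_{K=0} = \frac{\int dX\, X X\, e^{-A_0(X)}}{\int dX\, e^{-A_0(X)}} - \left( \frac{\int dX\, X\, e^{-A_0(X)}}{\int dX\, e^{-A_0(X)}} \right)^2 \]

and similarly for higher moments.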

Page 35: Statistical Data Assimilation: Exact Formulation and

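Now trade in the 'current' $K$ for the mean field $\phi$ via

\[ A(\phi) = -C(K) + K \cdot \phi \]

\[ \phi = \frac{\partial C(K)}{\partial K} \quad \text{implies} \quad \frac{\partial A(\phi)}{\partial \phi} = K \]

\[ e^{-A(\phi)} = \int dX\, e^{-A_0(X) + K \cdot (X - \phi)} = \int dX\, e^{-A_0(X) + (X - \phi) \cdot \frac{\partial A(\phi)}{\partial \phi}} \]

\[ \frac{\partial^2 A(\phi)}{\partial \phi\, \partial \phi} = \left[ \frac{\partial^2 C(K)}{\partial K\, \partial K} \right]^{-1} \]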

Page 36: Statistical Data Assimilation: Exact Formulation and

$A(\phi)$ contains all the moment information. It is like the Free Energy in statistical physics.

$\frac{\partial A_0(X)}{\partial X} = 0$ gives the 'bare' orbit.

$\frac{\partial A(\phi)}{\partial \phi} = 0$ gives the complete expected state variable $\langle X \rangle$, including all corrections due to fluctuations in measurements and model error.
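A one-dimensional Gaussian action gives a case where each step of this construction can be checked by hand. The symbols $\mu$, $\sigma$, and $\phi$ (the mean field), and the sign convention $A(\phi) = -C(K) + K\phi$, are assumptions chosen for this illustration.

```latex
A_0(X) = \frac{(X - \mu)^2}{2\sigma^2}
\quad\Rightarrow\quad
e^{C(K)} = \int dX\, e^{-A_0(X) + K X}
         = \sqrt{2\pi\sigma^2}\; e^{K\mu + \sigma^2 K^2 / 2}

\phi = \frac{\partial C(K)}{\partial K} = \mu + \sigma^2 K,
\qquad
\frac{\partial^2 C(K)}{\partial K^2} = \sigma^2

A(\phi) = -C(K) + K\phi
        = \frac{(\phi - \mu)^2}{2\sigma^2} - \tfrac{1}{2}\log(2\pi\sigma^2),
\qquad
\frac{\partial A(\phi)}{\partial \phi} = \frac{\phi - \mu}{\sigma^2} = K
```

Setting $\partial A(\phi)/\partial\phi = 0$ gives $\phi = \mu = \langle X \rangle$. Because this action is quadratic, the 'bare' orbit and the expected state coincide; the two differ only when the action is non-Gaussian.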

Page 37: Statistical Data Assimilation: Exact Formulation and

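\[ e^{-A(\phi)} = \int dX\, e^{-A_0(X) + (X - \phi) \cdot \frac{\partial A(\phi)}{\partial \phi}} \]

Shifting $X = \phi + U$ and doing the Gaussian integral over $U$:

\[ e^{-A(\phi)} = e^{-A_0(\phi)} \sqrt{\frac{(2\pi)^D}{\det(A_0''(\phi))}}\; \bigl[ 1 + \ldots \bigr] \]

\[ A(\phi) = A_0(\phi) + \frac{1}{2}\, \mathrm{tr}\bigl(\log A_0''(\phi)\bigr) + \ldots \]

$\frac{\partial A(\phi)}{\partial \phi} = 0$ is the generalized 4DVar.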

Page 38: Statistical Data Assimilation: Exact Formulation and

Using the integral representation of the data assimilation questions, one can ask how many degrees of freedom are actually required to capture the physics: when do we stop improving the resolution?

By a consideration of the continuum limit in space and time, one can integrate out the high frequency and high wavenumber components and provide a “universal” parametrization of the answer, then evaluate the remaining structure now containing all the Physics on the desired scales.

Page 39: Statistical Data Assimilation: Exact Formulation and

Summary

Data assimilation is a unidirectional communications system: data = transmitter, model = receiver. Synchronize the model and the data to achieve communication of information from data to model.

We gave an exact integral representation of the solution of the master equation for the conditional probability distribution in state space. It is an integral along a path in the space of states and parameters of the receiver = model.

Evaluation of high dimensional path integral using Monte Carlo methods. Works well in “low” dimensions. Eminently parallelizable, so numerical possibilities for LARGE models are available.

Page 40: Statistical Data Assimilation: Exact Formulation and
Page 41: Statistical Data Assimilation: Exact Formulation and