
ESAIM: Mathematical Modelling and Numerical Analysis (M2AN)
DOI: (will be inserted later)
www.esaim-m2an.org

TOWARDS EFFECTIVE DYNAMICS IN COMPLEX SYSTEMS BY MARKOV KERNEL APPROXIMATION ∗

Christof Schütte¹ and Tobias Jahnke²

Abstract. Many complex systems occurring in various applications share the property that the underlying Markov process remains in certain regions of the state space for long times, and that transitions between such metastable sets occur only rarely. Often the dynamics within each metastable set is of minor importance, but the transitions between these sets are crucial for the behavior and the understanding of the system. Since simulations of the original process are usually prohibitively expensive, the effective dynamics of the system, i.e., the switching between metastable sets, has to be approximated in a reliable way. This is usually done by computing the dominant eigenvectors and eigenvalues of the transfer operator associated with the Markov process. In many real applications, however, the matrix representing the spatially discretized transfer operator can be extremely large, such that approximating eigenvectors and eigenvalues is a computationally critical problem. In this article we present a novel method to determine the effective dynamics via the transfer operator without computing its dominant spectral elements. The main idea is that a time series of the process allows one to approximate the sampling kernel of the process, which is an integral kernel closely related to the transition function of the transfer operator. Metastability is taken into account by representing the approximate sampling kernel by a linear combination of kernels, each of which represents the process on one of the metastable sets. The effect of the approximation error on the dynamics of the system is discussed, and the potential of the new approach is illustrated by numerical examples.

Mathematics Subject Classification. — Please, give AMS classification codes —.

Received July 7, 2008.

∗ Supported by the DFG research center Matheon "Mathematics for key technologies" in Berlin and by Microsoft Research within the project "Conceptionalization of Molecular Dynamics".
1 Institut für Mathematik II, Freie Universität Berlin, Arnimallee 2–6, 14195 Berlin, Germany. [email protected]
2 Institut für Angewandte und Numerische Mathematik, Universität Karlsruhe (TH), Kaiserstr. 93, 76133 Karlsruhe, Germany. [email protected]

© EDP Sciences, SMAI 2009

Introduction

This article deals with a novel approach to the identification of the effective dynamical behavior of complex systems. It will be assumed that the evolution of the system under consideration can be described by a Markov process. Furthermore, we will mainly be interested in systems that (A) are high dimensional and (B) exhibit metastable or almost invariant sets, characterized by the property that the expected exit times from these sets define the timescales of the effective dynamical behavior of the system. Systems with these properties are pervasive; for example, they occur in the geo-sciences (e.g., climate dynamics with warm and ice ages or atmospheric blocking dynamics), in the economic sciences (e.g., financial markets and their dynamical regimes or employment dynamics), or in the life sciences. From the latter field our guiding example is taken: in biomolecular systems the metastable sets are called conformations, and the effective dynamics can be described as transitions between these conformations with specific internal motion within each conformation [16,38]. Transitions between conformations are critical to the function of proteins and nucleic acids. They include ligand binding [30], complex conformational rearrangements between native protein substates [14,28], and folding [23], so that their analysis is the key to understanding the biophysics of living cells.

In many of these applications the mathematical models used to describe the dynamics of the respective system are high dimensional ordinary differential equations (ODEs). Mostly, appropriate models do not only contain the system's degrees of freedom but also additional degrees of freedom that represent the environment or heat bath in which the system is embedded or to which it is coupled. Considering the evolution of the system thus often means considering a process that results from the projection of some higher dimensional ODE onto the system's state space. A typical example of this approach are thermostat models in molecular dynamics [25]. Under certain conditions (e.g., on the time scales of the effective dynamics) the resulting process is still a Markov process. In many cases, however, the projected process is remodeled in the form of a stochastic differential equation (SDE). Because of this, we will mainly consider SDE models for Markov processes on the state space of the system under consideration. Projected ODEs are briefly discussed in the appendix.

Mathematically, each Markov process can be described by the associated Markov or transfer operator, and the effective dynamical behavior of the process by the essential properties (e.g., dominant eigenvalues and eigenvectors) of its transfer operator; there is a remarkably long list of articles about this topic, of which the following may fit best to the topic considered herein: [2,5,7,8,11,21,24,26,32,37]. Using the transfer operator in order to study the effective dynamics has one main advantage and one main disadvantage: the advantage is that, independent of the (mostly strong) nonlinearity of the dynamics, the transfer operator is a linear operator that governs the propagation of a probability density by the underlying dynamics. The disadvantage, however, is that the transfer operator lives in a high dimensional function space whose dimension is that of the state space of the system. In many real-world cases, computation of its essential properties (dominant spectral elements) thus suffers from the curse of dimensionality, although there are several articles that offer cures to this problem for specific problems, mainly for biomolecular systems; see [6,33,34] for example.

This article provides a novel approach to the analysis of the effective dynamics via the transfer operator without computing its dominant spectral elements. Instead, we will consider the approximation of the integration kernel of the transfer operator. We will demonstrate that for non-deterministic dynamics this kernel has some nice structural properties that may allow one to approximate it well even in high dimensional cases. We will then study how some mathematical properties of the dynamics, particularly its metastability properties, may change if we exchange the original transfer operator with the one that results from approximation of the kernel. Furthermore, we will introduce some algorithmic kernel approximation techniques that have the potential to work well even in high dimensions. Finally, we will present some numerical experiments to illustrate the concept itself and the performance of our kernel approximation techniques. However, it should be emphasized that this article can only give a first introduction to the key ideas: we will mainly consider diffusive dynamics in low dimensions for the sake of clarity and completeness, and we will base our kernel approximation algorithm on just one kind of ansatz functions. Generalizations are under investigation but will not be discussed herein.

1. Transfer operators and kernels

Throughout this article we study homogeneous Markov processes X_t = {X_t}_{t∈I} on a state space X ⊂ R^d, where I is an interval in R. The dynamics of X_t is given by the stochastic transition function

$$p(t,x,A) = \mathbb{P}[X_{t+s} \in A \mid X_s = x], \qquad (1.1)$$

for every t, s ∈ I, x ∈ X and A ⊂ X. We say that the process X_t admits an invariant probability measure μ on the corresponding measure space (X, 𝒜, μ) if

$$\int_X p(t,x,A)\,\mu(dx) = \mu(A) \quad \text{for all } A \in \mathcal{A}.$$

In the following we shall always assume that the invariant measure of the process exists and is unique. A Markov process is called reversible with respect to an invariant probability measure μ if it satisfies

$$\int_A p(t,x,B)\,\mu(dx) = \int_B p(t,x,A)\,\mu(dx)$$

for every t ∈ I and A, B ∈ 𝒜. If moreover p(t, x, ·) is absolutely continuous with respect to the Lebesgue measure, then we denote by ρ(t, x, y) the associated flat-space transition density, i.e., we have

$$p(t,x,A) = \int_A \rho(t,x,y)\,dy.$$

Transfer operator. Given some measure ν, we consider the function spaces

$$L^p_\nu = \Big\{ f : X \to \mathbb{C} : \int |f(x)|^p\,\nu(dx) < \infty \Big\}, \qquad L^p_\nu(X \times X) = \Big\{ f : X \times X \to \mathbb{C} : \int |f(x,y)|^p\,\nu(dx)\,\nu(dy) < \infty \Big\}$$

with p = 1 or p = 2. The associated norms will be denoted by ‖·‖_{ν,p}. We will consider two cases: when ν stands for the Lebesgue measure we call the spaces flat, and when ν is equal to the invariant measure μ, the spaces are called weighted. We define the semigroup of Markov propagators or forward transfer operators P^t : L^r_μ → L^r_μ with t ∈ I and 1 ≤ r < ∞ by

$$\int_A P^t f(y)\,\mu(dy) = \int_X f(x)\,p(t,x,A)\,\mu(dx)$$

for any measurable A ⊂ X. If μ is invariant under the dynamics X_t, then it is easy to see that the characteristic function 1_X ∈ L^1_μ of the entire state space is an invariant density of P^t, i.e., we have P^t 1_X = 1_X. As follows from its definition, P^t conserves the norm, ‖P^t f‖_1 = ‖f‖_1, and positivity, i.e., P^t f ≥ 0 whenever f ≥ 0. Hence, P^t is a Markov operator. The perhaps simplest case is that of an ODE ż = F(z). Let its solution be unique for all initial values z(0) = z_0 and denote its flow map by Φ^t, such that z(t) = Φ^t z_0. Then the associated transition function is p(t, x, A) = 1_A(Φ^t x), where 1_A denotes the characteristic function of the set A. Let μ be some measure that is invariant under Φ^t. The corresponding transfer operator then reads P^t f(x) = f(Φ^{-t} x).

Basic assumption. In all of what follows we will suppose that p(t, x, ·) as well as the associated invariant measure are absolutely continuous with respect to the Lebesgue measure. This simplifies our considerations. Needless to say, all of the subsequent definitions can be generalized to the case where absolute continuity cannot be assumed. Moreover, we tacitly assume that the invariant measure μ is nonzero almost everywhere. Again, this assumption can be dropped if the following arguments are restricted to the subset {x ∈ X : μ(x) > 0} ⊂ X, but this would make the notation somewhat more complicated.

Kernels. With these assumptions, the expression for the propagator P^t becomes

$$P^t f(y) = \int_X k_t(y,x)\,f(x)\,\mu(x)\,dx, \qquad f \in L^p_\mu, \qquad (1.2)$$

where μ(dx) =: μ(x) dx, and we have introduced the transition kernel

$$k_t(y,x)\,\mu(y) = \rho(t,x,y), \qquad (1.3)$$

which is defined for all x, y for which μ > 0. Obviously, the transition kernel satisfies

$$\int_X k_t(y,x)\,\mu(y)\,dy = 1 \qquad \forall (x,t) \in X \times I. \qquad (1.4)$$

A kernel with property (1.4) is called a Markov kernel. For a reversible process the transition kernel is symmetric, i.e., k_t(x,y) = k_t(y,x). We will furthermore consider a second kernel function, called the sampling kernel, defined by

$$\kappa_t(x,y) = \mu(x)\,\rho(t,x,y) = \mu(x)\,k_t(y,x)\,\mu(y). \qquad (1.5)$$

The sampling kernel is particularly important because it can be sampled directly from a given realization of the investigated process (see Sect. 2.4 for an example). In the following we will often fix a time t and then suppress the index t, so that we simply write the transition density as ρ(x,y), the sampling kernel as κ(x,y), and the transfer operator as P f(y) = ∫_X k(y,x) f(x) μ(x) dx in L^p_μ. For convenience we introduce the abbreviation P f = k ∗ f, keeping in mind that it has to be understood relative to the space, especially the weighting, considered.

2. Gaussian kernels and Ornstein-Uhlenbeck processes

2.1. Ornstein-Uhlenbeck sampling kernels

Consider an Ornstein-Uhlenbeck process

$$\dot{x} = -F(x - \bar{x}) + \Sigma \dot{W}, \qquad (2.1)$$

with symmetric, positive definite matrices F ∈ R^{d×d} and Σ ∈ R^{d×d}, and define B = Σ². The transition function of such a process is absolutely continuous with respect to the Lebesgue measure. The corresponding transition density at time t, with respect to the initial condition of starting in x_0 at time t = 0, is given by

$$\rho(t, x_0, x) = Z(t)\,\exp\Big(-\tfrac{1}{2}(x - \xi(t))^T C(t)^{-1}(x - \xi(t))\Big), \qquad (2.2)$$

where

$$\xi(t) = \bar{x} + \exp(-tF)(x_0 - \bar{x}), \qquad Z(t) = (2\pi)^{-d/2}(\det C(t))^{-1/2},$$

and C(t) is the solution of

$$C(t)F + FC(t) = B - \exp(-tF)\,B\,\exp(-tF). \qquad (2.3)$$

It is well known that for any M_1, M_2, M_3 ∈ R^{d×d} the matrix equation X M_1 + M_2 X = M_3 has a unique solution if λ^{(1)} + λ^{(2)} ≠ 0 for all eigenvalues λ^{(1)} of M_1 and λ^{(2)} of M_2. Since F is positive definite, a unique solution of equation (2.3) exists. As a consequence of (2.2), the invariant measure is absolutely continuous with respect to the Lebesgue measure and has the form

$$\mu(x) = Z_\infty \exp\Big(-\tfrac{1}{2}(x - \bar{x})^T C_\infty^{-1}(x - \bar{x})\Big), \qquad (2.4)$$

with C_∞ such that C_∞F + FC_∞ = B. The last equation again has a unique solution; it satisfies

$$C(t) = C_\infty - \exp(-tF)\,C_\infty\,\exp(-tF). \qquad (2.5)$$

The associated Markov operator P_t in the flat L^p space is obtained from P_t f(x) = ∫ ρ(t, x_0, x) f(x_0) dx_0. We consider the sampling kernel

$$\kappa_t(x_0, x) = \rho(t, x_0, x)\,\mu(x_0), \qquad (2.6)$$

because this is the object that can be sampled directly from a given realization of the Ornstein-Uhlenbeck process (for details see Sect. 2.4 below). Equations (2.2) and (2.4) yield that the sampling kernel can be expressed as

$$\kappa_t(x_0, x) = Z(t)\,Z_\infty \exp\Big(-\tfrac{1}{2}\big((x - \bar{x})^T, (x_0 - \bar{x})^T\big)\,\mathbf{C}(t)^{-1}\begin{pmatrix} x - \bar{x} \\ x_0 - \bar{x} \end{pmatrix}\Big), \qquad (2.7)$$

with

$$\mathbf{C}^{-1}(t) = \begin{pmatrix} C(t)^{-1} & -C(t)^{-1}\exp(-tF) \\ -\exp(-tF)\,C(t)^{-1} & \exp(-tF)\,C(t)^{-1}\exp(-tF) + C_\infty^{-1} \end{pmatrix}. \qquad (2.8)$$

According to (1.5), the associated Markov kernel with respect to the invariant measure μ has the form

$$k_t(x, x_0) = \frac{1}{\mu(x)}\,\kappa_t(x_0, x)\,\frac{1}{\mu(x_0)},$$

and we observe that k_t is indeed symmetric (this follows from (2.5)), as it should be, since the Ornstein-Uhlenbeck process is a reversible Markov process.

2.2. Parameter estimation from a time series of the sampling kernel

Suppose now that the parameters F, x̄, and Σ of the process are unknown, and that only a sampling of the sampling kernel is given. Based on this sampling, the parameters of the Ornstein-Uhlenbeck process can be estimated as follows:

(1) Approximate the covariance matrix Ĉ ≈ C of the sampling kernel and its inverse

$$\hat{\mathbf{C}}^{-1}(t) = \begin{pmatrix} M_{11} & M_{12} \\ M_{12}^T & M_{22} \end{pmatrix}.$$

(2) Compute an estimate F̂ ≈ F by solving

$$\hat{F} = -\log\big(-M_{11}^{-1} M_{12}\big)/t,$$

where log(·) denotes the matrix logarithm. Since (−M_{11}^{-1} M_{12}) is (at least approximately) a symmetric and positive definite matrix, the matrix logarithm is well-defined via the logarithm of the eigenvalues.

(3) Approximate Σ̂ ≈ Σ via

$$\hat{C}_\infty^{-1} = M_{22} - M_{12}^T M_{11}^{-1} M_{12}, \qquad \hat{B} = \hat{C}_\infty \hat{F}^T + \hat{F}\hat{C}_\infty, \qquad \hat{\Sigma} = \sqrt{\hat{B}}.$$

If Ĉ_∞ and F̂ are symmetric and positive definite, then so is B̂, and the matrix Σ̂ = √B̂ can be obtained by computing the eigendecomposition of B̂ and taking the square roots of the eigenvalues.
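As a sanity check, steps (1)–(3) can be applied to the exact covariance matrix (2.8) instead of an empirical estimate; the parameters must then be recovered exactly. A minimal sketch (parameter values are our own; we take isotropic noise Σ = σI, so that C_∞ = σ²F⁻¹/2 is available in closed form):

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm, inv

# Assumed true parameters: symmetric positive definite F, isotropic noise.
F = np.array([[3.0, 1.0],
              [1.0, 2.0]])
sigma = 0.5
C_inf = 0.5 * sigma**2 * inv(F)       # solves C_inf F + F C_inf = sigma^2 I
t = 0.4
E = expm(-t * F)

# Exact joint covariance of (x_t - xbar, x_0 - xbar) in equilibrium,
# i.e., the covariance matrix of the sampling kernel (2.7).
C_big = np.block([[C_inf, E @ C_inf],
                  [C_inf @ E.T, C_inf]])

# Steps (1)-(3), applied to the exact covariance.
M = inv(C_big)
M11, M12, M22 = M[:2, :2], M[:2, 2:], M[2:, 2:]
F_est = -logm(-inv(M11) @ M12) / t                    # step (2)
C_inf_est = inv(M22 - M12.T @ inv(M11) @ M12)         # step (3)
B_est = C_inf_est @ F_est.T + F_est @ C_inf_est
Sigma_est = sqrtm(B_est)

assert np.allclose(F_est, F)
assert np.allclose(Sigma_est, sigma * np.eye(2))
```

The block inverse of the joint covariance reproduces exactly the structure (2.8), so −M₁₁⁻¹M₁₂ = exp(−tF) and the Schur complement in step (3) returns C_∞⁻¹.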

Figure 1. Sampling (as resulting from DNS) of the Ornstein-Uhlenbeck transition kernel for F = 4 and Σ = 0.45 with t = 0.5 and m = 1000 steps.

2.3. Invariant measure of sampling kernels

Whenever a sampling kernel κ is known, the associated invariant measure is computable by means of simple integration:

$$\mu(x) = \int \rho(x_0, x)\,\mu(x_0)\,dx_0 = \int \kappa(x_0, x)\,dx_0.$$

As in (1.5), ρ(·,·) denotes the flat-space transition density of the underlying process. For Gaussian sampling kernels

$$\kappa(x, x_0) \propto \exp\Big(-\tfrac{1}{2}\,(x^T, x_0^T)\begin{pmatrix} M_{xx} & M_{xx_0} \\ M_{xx_0}^T & M_{x_0x_0} \end{pmatrix}\begin{pmatrix} x \\ x_0 \end{pmatrix}\Big),$$

we thus get for the associated measure

$$\mu(x) \propto \exp\Big(-\tfrac{1}{2}\,x^T M x\Big), \quad \text{with } M = M_{x_0x_0} - M_{xx_0}^T M_{xx}^{-1} M_{xx_0}. \qquad (2.9)$$

The kernels are symmetric (and thus induce a reversible process) iff M_{xx} = M_{x_0x_0} and M_{xx_0} = M_{xx_0}^T.

2.4. Numerical illustration

Let us consider the 1d-case with F = 4, x̄ = 0 and Σ = 0.45. A direct numerical simulation (DNS) of the system (with Euler-Maruyama discretization in time with timestep 0.001) yields a time series x_{0:m} = {x_0, ..., x_m} with x_n ∈ X. If the DNS is ergodic in the sense that the generated ensemble of points in state space is approximately distributed according to the invariant measure μ of the process, then the sampling x_{0:m} directly induces a sampling z_{0:m−1} = {z_0, ..., z_{m−1}} of the sampling kernel by letting z_n = (x_n, x_{n+1}) ∈ X × X. This sampling is illustrated in Figure 1 for t = 0.5 and m = 1000 sampling points.

In the one-dimensional case, one obtains C_∞ = Σ²/(2F), C(t) = C_∞(1 − exp(−2tF)), and

$$\mathbf{C}^{-1}(t) = C^{-1}(t)\begin{pmatrix} 1 & -\exp(-tF) \\ -\exp(-tF) & 1 \end{pmatrix}.$$

Hence, the covariance matrix of the sampling kernel is

$$\mathbf{C}(t) = \frac{\Sigma^2}{2F}\begin{pmatrix} 1 & \exp(-tF) \\ \exp(-tF) & 1 \end{pmatrix}.$$

Table 1. Dependence of the estimators on the sampling length m.

m     1000    2000    10000   ∞
F̂     4.530   4.050   4.010   4.000
Σ̂     0.503   0.456   0.452   0.450

An estimate Ĉ = (Ĉ_{ij})_{i,j} of this matrix is directly available from the given data. The approximations F̂ ≈ F and Σ̂ ≈ Σ can be computed via

$$\hat{F} = \log\big(\hat{C}_{11}/\hat{C}_{12}\big)/t, \qquad \hat{\Sigma} = \sqrt{2\hat{F}\hat{C}_{11}}.$$

Table 1 shows that the estimates converge to the correct values as the length m of the time series increases. The example shows that the sampling kernel allows one to estimate the parameters of the underlying Ornstein-Uhlenbeck process in an easy way. However, in many realistic applications the underlying SDE does not have the simple form (2.1), because the process is metastable. Nevertheless, it will be shown how such situations can be treated by a superposition of kernels, each of which represents an Ornstein-Uhlenbeck process.
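The experiment of this section can be reproduced in a few lines. The sketch below (our own illustration with a seeded RNG; Euler-Maruyama with the stated stepsize) forms the pairs z_n = (x_n, x_{n+lag}) at lag t = 0.5 and applies the one-dimensional estimators; with a finite time series the estimates carry sampling error, as in Table 1:

```python
import numpy as np

# 1-D Ornstein-Uhlenbeck DNS as in Section 2.4: F = 4, xbar = 0, Sigma = 0.45.
F_true, Sigma_true, dt, t_lag = 4.0, 0.45, 0.001, 0.5
m = 1_000_000                                  # Euler-Maruyama steps
rng = np.random.default_rng(0)
noise = Sigma_true * np.sqrt(dt) * rng.standard_normal(m - 1)
x = np.empty(m)
x[0] = 0.0
for n in range(m - 1):
    x[n + 1] = x[n] - F_true * x[n] * dt + noise[n]

# Pairs (x_n, x_{n+lag}) sample the sampling kernel at time t = t_lag.
lag = round(t_lag / dt)
C = np.cov(np.stack([x[:-lag], x[lag:]]))      # empirical 2x2 covariance

# One-dimensional estimators: F = log(C11/C12)/t, Sigma = sqrt(2 F C11).
F_est = np.log(C[0, 0] / C[0, 1]) / t_lag
Sigma_est = np.sqrt(2.0 * F_est * C[0, 0])
print(F_est, Sigma_est)
```

The printed estimates scatter around the true values F = 4 and Σ = 0.45; lengthening the series tightens them, in line with Table 1.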

3. Metastability

Let us make a simple Gedankenexperiment (thought experiment): Assume we consider a diffusion process x_t in state space R, governed by the SDE

$$\dot{x}_t = -F(x_t) + \Sigma \dot{W}_t,$$

where F = DV(x) is the gradient of a smooth potential V with several local minima. Then it is well known that for small enough Σ the process stays for long periods of time in the disjoint wells M_1, ..., M_N ⊂ R of V, while the exit from the well M_i has an expected exit time that scales like exp(2ΔV_i/Σ²), where ΔV_i is the lowest energy barrier from the respective well into a neighboring one. That is, the process is metastable, and the sampling kernel of this process will then have the following approximate form: it will be a superposition of "peaks" in each of the wells, i.e., in M_i × M_i, i = 1, ..., N. Such processes occur in many real-life applications; examples have been given in the introduction. This is our motivation to study Markov processes that belong to sampling kernels constructed by superposition, and subsequently to analyse their metastability properties.

3.1. Superposition kernels

Let us now discuss how to construct Markov operators and kernels by superposition. Suppose that P_i, i = 1, ..., N, are Markov operators in L^p_{μ_i} with respective invariant probability measures μ_i, and that α_i ∈ R are non-negative weights with α_1 + ... + α_N = 1. In this case, we consider the Markov operator P in L^p_μ given by the superposition

$$\int_A P f(x)\,\mu(dx) = \sum_{i=1}^N \alpha_i \int_A P_i f(x)\,\mu_i(dx), \quad \text{with } P_i f(x) = \int k_i(x,y)\,f(y)\,\mu_i(dy),$$

where μ is the invariant probability measure of P. These facts guarantee that P_i 1_X = 1_X ∈ L^p_{μ_i} and P 1_X = 1_X ∈ L^p_μ. Inserting this into the above equation yields that μ is the invariant probability measure of P if and only if

$$\mu = \sum_{i=1}^N \alpha_i\,\mu_i. \qquad (3.1)$$

The kernel associated with P is then given by

$$\mu(x)\,k(x,y)\,\mu(y) = \sum_{i=1}^N \alpha_i\,\mu_i(x)\,k_i(x,y)\,\mu_i(y), \qquad (3.2)$$

where k_i denotes the Markov kernel of P_i, and it is assumed that all of the invariant measures are absolutely continuous with respect to the Lebesgue measure. A kernel with this structure will be called a superposition kernel.

The kernels live in the respectively weighted spaces. What is the flat-space transition density ρ_flat(·,·) if the flat-space transition densities ρ_{i,flat}(·,·) belong to the kernels k_i? By using (1.5) and (3.2), we find the answer

$$\rho_{\mathrm{flat}}(x,y) = \sum_{j=1}^N \alpha_j\,\frac{\mu_j(x)}{\mu(x)}\,\rho_{j,\mathrm{flat}}(x,y), \qquad (3.3)$$

where we assumed that μ is positive (almost) everywhere.

Remark 3.1. A realization of the process corresponding to the superposition kernel can be computed by repeating the following steps. Draw a random variable r from the uniform distribution on [0, 1] and choose the index j such that r ∈ [β_{j−1}(x), β_j(x)), where β_j(x) = Σ_{k=1}^j α_k μ_k(x)/μ(x), and j = N if r = 1. Then the current state x is updated according to the jth transition density ρ_{j,flat}(x, y). However, we emphasize that our goal is to solve the inverse problem: how can the parameters α_i and the densities ρ_i be estimated if a realization of the stochastic process is given? Before this question is addressed, we investigate the relation between superposition kernels and metastability.
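The sampling procedure of Remark 3.1 can be sketched directly. The toy model below is our own construction: two 1-D Ornstein-Uhlenbeck components whose invariant densities overlap substantially, so that the chain mixes quickly. A component j is picked with probability α_j μ_j(x)/μ(x) and the next state is drawn from the jth Gaussian transition density; the long-run statistics then follow μ = Σ α_j μ_j:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical 1-D OU components with invariant densities mu_i = N(c_i, s_i^2).
alpha = np.array([0.5, 0.5])
c = np.array([-1.0, 1.0])             # component means
F_i = np.array([1.0, 1.0])
Sig_i = np.array([1.0, 1.0])
s2 = Sig_i**2 / (2.0 * F_i)           # stationary variances, here 0.5

def mu_comp(x):
    """Component densities mu_1(x), mu_2(x)."""
    return np.exp(-(x - c)**2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)

def step(x, t=0.5):
    w = alpha * mu_comp(x)
    j = rng.choice(2, p=w / w.sum())                  # P(j) = alpha_j mu_j(x) / mu(x)
    mean = c[j] + np.exp(-t * F_i[j]) * (x - c[j])    # OU transition density rho_j(t, x, .)
    var = s2[j] * (1.0 - np.exp(-2.0 * t * F_i[j]))
    return mean + np.sqrt(var) * rng.standard_normal()

x, traj = 0.0, np.empty(50_000)
for n in range(traj.size):
    x = step(x)
    traj[n] = x
```

For this mixture the invariant measure has mean 0 and variance s² + c² = 1.5, which the empirical moments of the trajectory reproduce.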

3.2. Almost invariance and superposition kernels

As above, let k_i, i = 1, ..., N, be Markov kernels with invariant probability measures μ_i. We consider the mixed kernel μ(x) k(x,y) μ(y) = Σ_{i=1}^N α_i μ_i(x) k_i(x,y) μ_i(y), with the invariant measure μ = Σ_i α_i μ_i and weights α_i such that Σ_i α_i = 1. Let us assume that the measures μ_i are all absolutely continuous with respect to the Lebesgue measure, and that μ is positive (almost) everywhere.

Definition 3.2. The Markov kernel k is called ε-metastable if

$$O_{ij} = \int \frac{\mu_i(x)\,\mu_j(x)}{\mu(x)}\,dx \le \varepsilon$$

for all i, j = 1, ..., N with j ≠ i.

Almost invariant densities of k. In the μ-weighted space, the invariant densities of the Markov operators P_i are

$$\Phi_i(x) = \mu_i(x)/\mu(x).$$

Obviously, these functions are densities in L^1_μ. A nice property of these densities is formulated in the following lemma.

Lemma 3.3. The kernel k is ε-metastable if and only if

$$O_{ij} = \int \Phi_i(x)\,\Phi_j(x)\,\mu(x)\,dx = \langle \Phi_i, \Phi_j \rangle_\mu \le \varepsilon \qquad (3.4)$$

for all i, j = 1, ..., N with j ≠ i.

The proof follows directly from Definition 3.2.

Other useful properties are:

$$\sum_{j=1}^N \alpha_j\,\Phi_j(x) = 1 \quad \forall x, \quad \text{i.e., } \{\Phi_j\}_{j=1,\dots,N} \text{ is a partition of unity},$$

$$0 \le \Phi_i(x) \le 1/\alpha_i \quad \forall i, \qquad (3.5)$$

$$\Big|\Phi_i(x) - \frac{1}{\alpha_i}\Big|\,\mu(x) = \frac{1}{\alpha_i}\Big(\sum_{j=1,\,j\ne i}^N \alpha_j\,\mu_j(x)\Big). \qquad (3.6)$$

The latter equation follows from

$$\Big|\Phi_i(x) - \frac{1}{\alpha_i}\Big|\,\mu(x) = \Big|\mu_i(x) - \sum_{j=1}^N \frac{\alpha_j}{\alpha_i}\,\mu_j(x)\Big| = \sum_{j=1,\,j\ne i}^N \frac{\alpha_j}{\alpha_i}\,\mu_j(x).$$
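On a grid, these properties are immediate to verify for a concrete pair of densities. A small sketch (two discretized Gaussians as μ_1, μ_2; an illustrative choice of our own):

```python
import numpy as np

# Discretize two Gaussian invariant densities mu_1, mu_2 on a grid (illustrative).
xs = np.linspace(-4.0, 4.0, 2001)
def gauss(center, s):
    g = np.exp(-(xs - center)**2 / (2.0 * s**2))
    return g / g.sum()                # discrete probability vector

alpha = np.array([0.3, 0.7])
mu_i = np.stack([gauss(-1.0, 0.25), gauss(1.0, 0.25)])
mu = alpha @ mu_i
Phi = mu_i / mu                       # Phi_i = mu_i / mu

# Partition of unity and the bound 0 <= Phi_i <= 1/alpha_i, cf. (3.5).
assert np.allclose(alpha @ Phi, 1.0)
assert np.all(Phi >= 0.0) and np.all(Phi <= 1.0 / alpha[:, None] + 1e-12)

# The overlap O_12 = <Phi_1, Phi_2>_mu of (3.4) is tiny for well-separated peaks:
O12 = np.sum(Phi[0] * Phi[1] * mu)
assert O12 < 1e-3
```

With the peaks this far apart the mixed kernel is ε-metastable for a very small ε; moving the centers together makes O_12, and hence ε, grow.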

With these properties we can prove that the propagator with kernel k leaves the density Φ_i nearly invariant with respect to the measure μ:

Theorem 3.4. Let the kernel k defined as above be ε-metastable, let all measures μ_i be absolutely continuous with respect to the Lebesgue measure, and let μ be positive (almost) everywhere. Then, for all i = 1, ..., N:

$$\|k * \Phi_i - \Phi_i\|_{1,\mu} \le 2(1-\alpha_i)\,\varepsilon,$$

where (k ∗ Φ_i)(y) = ∫_X k(y,x) Φ_i(x) μ(x) dx.

Proof. The proof requires some preparations:

$$\begin{aligned}
\|k * \Phi_i - \Phi_i\|_{1,\mu}
&= \int \Big| \int \mu(y)\,k(y,x)\,\Phi_i(x)\,\mu(x)\,dx - \Phi_i(y)\,\mu(y) \Big|\,dy \\
&= \int \Big| \sum_j \alpha_j \int \mu_j(y)\,k_j(y,x)\,\Phi_i(x)\,\mu_j(x)\,dx - \Phi_i(y)\,\mu(y) \Big|\,dy \\
&= \int \Big| \sum_j \alpha_j\,\mu_j(y) \int k_j(y,x)\,\Phi_j(x)\,\mu_i(x)\,dx - \Phi_i(y)\,\mu(y) \Big|\,dy \\
&\le \int \sum_{j=1,\,j\ne i}^{N} \alpha_j \int k_j(y,x)\,\Phi_j(x)\,\mu_i(x)\,dx\;\mu_j(y)\,dy \\
&\quad + \int \Big| \alpha_i\,\mu_i(y) \int k_i(y,x)\,\Phi_i(x)\,\mu_i(x)\,dx - \Phi_i(y)\,\mu(y) \Big|\,dy.
\end{aligned}$$

The first of these terms can be estimated using that $\int k_j(y,x)\,\mu_j(y)\,dy = 1$:

$$\int \sum_{j=1,\,j\ne i}^{N} \alpha_j \int k_j(y,x)\,\Phi_j(x)\,\mu_i(x)\,dx\;\mu_j(y)\,dy \;\le\; \sum_{j=1,\,j\ne i}^{N} \alpha_j\,\langle \Phi_i, \Phi_j \rangle_\mu \;\le\; (1-\alpha_i)\,\varepsilon.$$

The second term is split into two parts:

$$\begin{aligned}
\int \Big| \alpha_i\,\mu_i(y) \int k_i(y,x)\,\Phi_i(x)\,\mu_i(x)\,dx - \Phi_i(y)\,\mu(y) \Big|\,dy
&\le \int \Big| \alpha_i\,\mu_i(y) \int k_i(y,x)\,\frac{1}{\alpha_i}\,\mu_i(x)\,dx - \Phi_i(y)\,\mu(y) \Big|\,dy \\
&\quad + \alpha_i \int\!\!\int k_i(y,x)\,\Big| \Phi_i(x) - \frac{1}{\alpha_i} \Big|\,\mu_i(x)\,dx\;\mu_i(y)\,dy.
\end{aligned}$$

The first part vanishes because

$$\int \Big| \mu_i(y) \int k_i(y,x)\,\mu_i(x)\,dx - \Phi_i(y)\,\mu(y) \Big|\,dy = \int \Big| \int k_i(y,x)\,\mu_i(x)\,dx - 1_X \Big|\,\mu_i(y)\,dy = \int \big| P_i 1_X(y) - 1_X \big|\,\mu_i(y)\,dy = 0,$$

since $P_i 1_X = 1_X$ in $L^2_{\mu_i}$. The second part allows the following estimate, based on (3.6) and the fact that $\int k_i(y,x)\,\mu_i(y)\,dy = 1$:

$$\begin{aligned}
\int\!\!\int k_i(y,x)\,\Big|\Phi_i(x) - \frac{1}{\alpha_i}\Big|\,\mu_i(x)\,dx\;\mu_i(y)\,dy
&\le \int \frac{1}{\alpha_i}\Big( \sum_{j=1,\,j\ne i}^{N} \alpha_j\,\mu_j(x) \Big)\,\Phi_i(x)\,dx \\
&\le \frac{1}{\alpha_i} \sum_{j=1,\,j\ne i}^{N} \alpha_j \int \mu_j(x)\,\Phi_i(x)\,dx \;\le\; \frac{1}{\alpha_i}\,(1-\alpha_i)\,\varepsilon.
\end{aligned}$$

Putting the two terms together again we get

$$\|k * \Phi_i - \Phi_i\|_{1,\mu} \le (1-\alpha_i)\,\varepsilon + (1-\alpha_i)\,\varepsilon. \qquad \square$$
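The bound of Theorem 3.4 can be checked on a discrete toy model. In the sketch below (our own construction), each component kernel simply resamples from its invariant density μ_i, the mixed flat transition density is assembled per (3.3), and ‖k ∗ Φ_i − Φ_i‖_{1,μ} is compared against 2(1 − α_i)ε:

```python
import numpy as np

# Discrete state space; two component densities with small overlap (illustrative).
xs = np.linspace(-3.0, 3.0, 400)
def gauss(center, s):
    g = np.exp(-(xs - center)**2 / (2.0 * s**2))
    return g / g.sum()

alpha = np.array([0.3, 0.7])
mu_i = np.stack([gauss(-1.0, 0.3), gauss(1.0, 0.3)])
mu = alpha @ mu_i

# eps-metastability constant: eps = O_12 = sum_x mu_1 mu_2 / mu.
eps = np.sum(mu_i[0] * mu_i[1] / mu)

# Component kernels that resample from mu_i; mixed flat density per (3.3):
# rho(x, y) = sum_j alpha_j mu_j(x)/mu(x) * mu_j(y).
rho = (alpha[:, None] * mu_i / mu).T @ mu_i           # rho[x, y], rows sum to 1

for i in range(2):
    Phi = mu_i[i] / mu
    P_Phi = (rho * mu_i[i][:, None]).sum(axis=0) / mu  # (k * Phi_i)(y)
    err = np.sum(np.abs(P_Phi - Phi) * mu)             # || . ||_{1, mu}
    assert err <= 2.0 * (1.0 - alpha[i]) * eps + 1e-12
```

Here the propagated density is computed via P Φ_i(y) μ(y) = Σ_x ρ(x,y) μ_i(x); the assertion holds for both i, in agreement with the theorem.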

Let us now assume that K is a Markov kernel which cannot be represented exactly by a weighted sum of other kernels k_i but which can be approximated by such a representation, i.e.,

$$\mu(x)\,K(x,y)\,\mu(y) \approx \mu(x)\,k(x,y)\,\mu(y) = \sum_{i=1}^N \alpha_i\,\mu_i(x)\,k_i(x,y)\,\mu_i(y).$$

As before, let μ_i be the invariant measures of the Markov kernels k_i and let μ be the invariant measure of k. It is assumed that μ is positive almost everywhere, and that all measures are absolutely continuous with respect to the Lebesgue measure. The coefficients α_i are positive with Σ_j α_j = 1.

Definition 3.5. The kernel K is said to be (ε, δ)-metastable if there is an ε-metastable kernel k that satisfies the above assumptions such that

$$\|K - k\|_{1,\mu} = \int\!\!\int |K - k|(y,x)\,\mu(x)\,dx\,\mu(y)\,dy \le \delta.$$

Now we again consider the functions Φ_i(x) = μ_i(x)/μ(x) as candidates for almost invariant densities. By definition, we obtain

$$\|K * \Phi_i - \Phi_i\|_{1,\mu} \le \|k * \Phi_i - \Phi_i\|_{1,\mu} + \int\!\!\int |K - k|(y,x)\,\Phi_i(x)\,\mu(x)\,dx\,\mu(y)\,dy,$$

where the first term can be estimated due to Theorem 3.4, while the second term can be simplified again if K is (ε, δ)-metastable:

$$\int\!\!\int |K - k|(y,x)\,\Phi_i(x)\,\mu(x)\,dx\,\mu(y)\,dy \le \frac{1}{\alpha_i}\,\|K - k\|_{1,\mu}.$$

That is, we have shown the following result:

Theorem 3.6. Let K be an (ε, δ)-metastable Markov kernel with associated mixed kernel

$$\mu(y)\,k(x,y)\,\mu(x) = \sum_{j=1}^N \alpha_j\,\mu_j(y)\,k_j(x,y)\,\mu_j(x).$$

Then the bounded functions Φ_i(x) = μ_i(x)/μ(x) are almost invariant under K:

$$\|K * \Phi_i - \Phi_i\|_{1,\mu} \le 2(1-\alpha_i)\,\varepsilon + \frac{\delta}{\alpha_i}.$$

It should be pointed out that in Theorems 3.4 and 3.6 the invariance is measured with respect to the weighted norm ‖·‖_{1,μ}. This has important consequences, which will be discussed below (cf. Sect. 5.1).

3.3. Perturbation of the spectrum

What consequences does the approximation of the kernel function have for the eigenvalues of the associated operators? This question can be answered, at least under additional assumptions. Assume that the original transfer operator P has invariant measure μ and kernel K(·, ·), and that the associated Markov process is reversible. Then, P is self-adjoint in the Hilbert space L²μ. Furthermore, consider a symmetric approximation k(·, ·) ≈ K(·, ·), and let k induce the operator P̃ with the same invariant measure μ. This assumption simplifies the analysis, because now both P and P̃ can be considered as operators in L²μ. Next, we assume that P − P̃ is a Hilbert-Schmidt operator, which is the case if and only if

‖K − k‖²2,μ = ∫∫ |K − k|²(y, x) μ(x) dx μ(y) dy < ∞. (3.7)

In this situation, Theorem 3 of [13] applies and yields:

Corollary 3.7. The above assumptions on P, P̃, K and k imply that there exist enumerations {λi} and {νi} of the spectra of P and P̃, respectively, in L²μ, such that

∑_{i=1}^{∞} |λi − νi|² ≤ ‖K − k‖²2,μ.
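In a spatially discretized setting, the transfer operators become symmetric matrices and the corollary reduces to a Hoffman–Wielandt-type bound, which can be checked numerically. The following sketch (our own randomly generated example, not from the paper) verifies the bound for a symmetric matrix and a small symmetric perturbation playing the roles of the discretized P and P̃:

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetric matrix A and a small symmetric perturbation of it.
A = rng.standard_normal((50, 50))
A = (A + A.T) / 2
E = rng.standard_normal((50, 50))
B = A + 1e-3 * (E + E.T) / 2

# eigvalsh returns eigenvalues in ascending order, which is the
# enumeration that realizes the Hoffman-Wielandt pairing here.
lam = np.linalg.eigvalsh(A)
nu = np.linalg.eigvalsh(B)

# sum_i |lam_i - nu_i|^2 <= ||A - B||_F^2
lhs = np.sum((lam - nu) ** 2)
rhs = np.linalg.norm(A - B, "fro") ** 2
print(lhs <= rhs)  # True
```

The Frobenius norm of A − B is the matrix analogue of the Hilbert-Schmidt norm ‖K − k‖2,μ in (3.7).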

The corollary indicates that if K and its approximation k are close enough in the Hilbert-Schmidt norm, then the spectra of the associated transfer operators are very similar. Since the dominant eigenvalues define the most important metastable timescales, the approximation of the kernel of a transfer operator thus at least has the potential of also approximating the effective dynamics. However, the result does not imply that the Hilbert-Schmidt norm is the appropriate norm for getting an optimal approximation of the dominant part of the spectrum.

Remark 3.8. The above results on the approximation of essential features of the dynamics lead to two different norms, ‖ · ‖1,μ and ‖ · ‖2,μ. Let k be a kernel and κ the associated sampling kernel. Then, these norms read

‖k‖1,μ = ∫∫ |κ(x, y)| dx dy,

‖k‖2,μ = ( ∫∫ ( κ(x, y) / (μ(x)^{1/2} μ(y)^{1/2}) )² dx dy )^{1/2},

that is, 1-norm approximation means sampling kernel approximation in an unweighted (Lebesgue) sense, while 2-norm approximation means sampling kernel approximation with a weighting which is large where the invariant measure is small.

The results of this section mean, roughly speaking, that an (ε, δ)-metastable kernel K can be approximated by a superposition of kernels k1, . . . , kN, and that the corresponding invariant measures μ1, . . . , μN allow us to construct functions Φ1, . . . , ΦN which are almost invariant densities of K in the weighted space. It will now be our goal to construct such an approximate kernel in a situation where K is not known explicitly and only a sampling of the associated process is given. We will see that under appropriate conditions this process can locally be approximated by the linear Ornstein-Uhlenbeck process discussed in Section 2. The goal is then to find the parameters (means and covariance matrices) of these local Ornstein-Uhlenbeck processes, and to combine these processes in such a way that the metastability behavior is correctly reproduced.

4. Kernel approximation by mixture models

Let us assume that we have a sampling z0:m−1 = {z0, . . . , zm−1} of the sampling kernel κ of the Markov process Xt under consideration (zn ∈ X × X for all n = 0, . . . , m − 1). At the moment it is of no importance whether this sampling results from a long-term observation of Xt or from an ensemble of short-term observations. We want to exploit the sampling z0:m−1 in order to get an approximation of κ by a superposition kernel relative to the Lebesgue measure (i.e. in the 1-norm sense with respect to the associated kernel function, see Rem. 3.8). To this end we assume that z0:m−1 is an observation of m independent and identically distributed realizations of some random variable Z that is distributed according to a mixture model, i.e., the density ρ(z|θ) of the probability P(Z ∈ A|θ) = ∫_A ρ(z|θ) dz for measurable sets A ⊆ X × X has the form of a weighted sum of N component densities:

ρ(z|θ) = ∑_{y=1}^{N} ρ(z|y, θ) P(Y = y|θ).

Here, θ denotes unknown parameters that determine the form of the probability densities. For example, if the component densities ρ(z|y, θ), y = 1, . . . , N, are Gaussian, then ρ(z|θ) is a weighted sum of N Gaussians, and we speak of a Gaussian mixture model. Note that we treat Y as a random variable whose value y ∈ {1, . . . , N} we do not know. Therefore, Y is called the hidden assignment variable that assigns Z to one of the component densities, and P(Y = y|θ) is the probability of a certain value y of Y given θ. If we knew the value of Y, say Y = y, then the density for Z would just be the y-th component density.

According to our assumption, each of our independent samples zn of Z has its own assignment value yn, i.e., the realization zn of Z comes from the realization yn of Y. By this assumption, for given θ, the y0, . . . , ym−1 are statistically independent, as are the z0, . . . , zm−1. Our aim is to choose the parameters θ such that the likelihood

L(θ | z0:m−1) = ∏_{n=0}^{m−1} ρ(zn | θ) = ∏_{n=0}^{m−1} ∑_{y=1}^{N} ρ(zn|y, θ) P(Y_{tn} = y|θ),


is maximized, i.e., we seek the parameters for which the probability of the given observation over the family of models under consideration is maximal. Thus, the maximum likelihood estimate (MLE) θ̂ satisfies

θ̂ = argmax_θ L(θ | z0:m−1).

Equivalently, the MLE can be defined via the logarithm of L, which, according to the assumed statistical independencies, reads

θ̂ = argmax_θ log L(θ | z0:m−1) with log L(θ | z0:m−1) = ∑_{n=0}^{m−1} log ( ∑_{y=1}^{N} ρ(zn|y, θ) P(Y_{tn} = y|θ) ). (4.1)

Here, however, we do not have observations of Yt, i.e., in some sense we have to consider the optimization task for all possible probability densities for Yt. This is indeed feasible by means of the expectation-maximization (EM) algorithm. Hartley [17] pioneered the research on the EM algorithm in the late 1950s, followed by Dempster et al. [9] in the late 1970s. Over the years, the EM algorithm has found many applications in various domains and has become a powerful estimation tool [12,15].

The EM algorithm is an iterative optimization procedure. Starting with an initial parameter estimate θ^(0), each iteration monotonically increases the likelihood function L(θ | z0:m−1). Each iteration consists of two steps: (a) the E-step or expectation step and (b) the M-step or maximization step.

EM for the Gaussian mixture model. Let us now specify that the component densities are Gaussian with means z̄y and covariance matrices Σy, y = 1, . . . , N:

ρ(z | y, θ) = G(z; z̄y, Σy). (4.2)

Thus, the free parameters θ of our model are the means and covariances of the Gaussian component densities, and the probabilities that component y is active,

αy = P(Y = y | θ). (4.3)

These probabilities obviously have to satisfy the constraint ∑_{y=1}^{N} αy = 1. We thus have the parameter set

θ = (z̄1, . . . , z̄N, Σ1, . . . , ΣN, α1, . . . , αN).

In this case, the EM iteration takes the form of Algorithm 1.

The meaning of the algorithm becomes clearer with the interpretation of γ_n^(i)(y) as the probability that, according to the mixture model with parameters θ^(i), at time tn the observation zn has to be assigned to hidden state y. Thus, a sample zn can be assigned to several of the hidden states with probability between 0 and 1.

Remark 4.1. Let x0:m = {x0, . . . , xm} be some given observation sequence of the Markov process Xt under consideration. It induces a sampling zn = (xn, xn+1) ∈ X × X of the associated sampling kernel κ. In this case the basic assumption of the mixture model that z0:m−1 results from repeated independent and identically distributed realizations of some random variable Z seems strange, since we know that the zn are correlated by the Markov process. Nevertheless the EM algorithm for the Gaussian mixture model often results in excellent approximations of the sampling (provided the underlying structure of the data is that of a superposition of Gaussian kernels). However, one can take the Markov-like correlations in z0:m−1 into account by generalizing the mixture model to hidden Markov models (HMM), which again leads to a specific version of the EM algorithm [4,31].


Algorithm 1 EM-algorithm for the Gaussian mixture model

Require: Time series z0:m−1 = {z0, . . . , zm−1}, tolerance tol, initial guess of parameters θ^(0) = (z̄_y^(0), Σ_y^(0), α_y^(0))_{y=1,...,N}.
Ensure: Maximum likelihood estimate θ̂.
(1) Formally set i = −1.
(2) i := i + 1.
(3) Expectation step (E-step): Compute the occupation probabilities

γ_n^(i)(y) = G(zn; z̄_y^(i), Σ_y^(i)) α_y^(i) / ∑_{y'=1}^{N} G(zn; z̄_{y'}^(i), Σ_{y'}^(i)) α_{y'}^(i), y = 1, . . . , N, n = 0, . . . , m − 1.

(4) Maximization step (M-step): For y = 1, . . . , N, compute the new optimal parameter estimates

Σ_y^(i+1) = (1/γ^(i)(y)) ∑_{n=0}^{m−1} γ_n^(i)(y) (zn − z̄_y^(i))(zn − z̄_y^(i))ᵀ,
z̄_y^(i+1) = (1/γ^(i)(y)) ∑_{n=0}^{m−1} γ_n^(i)(y) zn,
α_y^(i+1) = γ^(i)(y) / ∑_{y'=1}^{N} γ^(i)(y'),

where γ^(i)(y) = ∑_{n=0}^{m−1} γ_n^(i)(y).

(5) Compute the log-likelihood log L(θ^(i+1)) using equations (4.1), (4.2), and (4.3). If log L(θ^(i+1)) − log L(θ^(i)) > tol, go to Step (2). Otherwise, terminate with θ^(i+1) as the desired approximation of θ̂.
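For illustration, a minimal NumPy sketch of Algorithm 1 (all function and variable names are ours; for simplicity the means are updated before the covariances, a common variant of the M-step):

```python
import numpy as np

def gaussian(z, mean, cov):
    """Multivariate Gaussian density G(z; mean, cov) for z of shape (m, d)."""
    d = z.shape[1]
    diff = z - mean
    inv = np.linalg.inv(cov)
    expo = -0.5 * np.einsum("nd,de,ne->n", diff, inv, diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(expo) / norm

def em_gmm(z, N, iters=200, tol=1e-8, seed=0):
    """EM iteration for an N-component Gaussian mixture on data z (m, d)."""
    rng = np.random.default_rng(seed)
    m, d = z.shape
    means = z[rng.choice(m, N, replace=False)]          # initial guesses
    covs = np.array([np.cov(z.T) + 1e-6 * np.eye(d) for _ in range(N)])
    alphas = np.full(N, 1.0 / N)
    loglik_old = -np.inf
    for _ in range(iters):
        # E-step: occupation probabilities gamma_n(y).
        dens = np.stack([alphas[y] * gaussian(z, means[y], covs[y])
                         for y in range(N)], axis=1)    # shape (m, N)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: reweighted means, covariances and weights.
        g = gamma.sum(axis=0)                           # gamma(y)
        means = (gamma.T @ z) / g[:, None]
        for y in range(N):
            diff = z - means[y]
            covs[y] = (gamma[:, y, None] * diff).T @ diff / g[y] \
                      + 1e-9 * np.eye(d)                # small ridge
        alphas = g / g.sum()
        # Log-likelihood of the current parameters (from the E-step densities).
        loglik = np.sum(np.log(dens.sum(axis=1)))
        if loglik - loglik_old <= tol:
            break
        loglik_old = loglik
    return means, covs, alphas
```

A basic fixed-point property of the M-step is that the mixture mean ∑_y α_y z̄_y always equals the sample mean of the data, which provides a cheap sanity check of any implementation.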

Remark 4.2. Whenever we want to estimate the Gaussian parameters of the sampling kernel of some reversible Markov process, it can be of interest to add appropriate symmetry constraints to the EM iteration. This is indeed possible by different means. The easiest way, when considering a sampling z0:m−1 with zn = (xn, xn+1) that is induced by a long-term time series, is to extend it into a reversible one by adding zm:2m−1 = {zm, . . . , z2m−1} with zn+m = (xn+1, xn), and then to apply Algorithm 1 to the extended sampling.
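Assuming the time-reversed pairs (x_{n+1}, x_n) are the intended extension, the symmetrization can be sketched as:

```python
import numpy as np

def reversible_pairs(x):
    """Build the sampling z_n = (x_n, x_{n+1}) from a 1d time series and
    append the time-reversed pairs (x_{n+1}, x_n) to enforce symmetry."""
    z = np.column_stack([x[:-1], x[1:]])
    return np.vstack([z, z[:, ::-1]])

print(reversible_pairs(np.array([0.0, 1.0, 2.0])))
# [[0. 1.]
#  [1. 2.]
#  [1. 0.]
#  [2. 1.]]
```

The extended sampling then has a symmetric empirical distribution on X × X, as required for a reversible sampling kernel.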

5. Kernel approximation of metastable processes

We will now study metastable dynamics associated with the nonsymmetric double-well potential

V(x) = (x² − 1)² + 0.25x,

which is illustrated in Figure 2.

5.1. Diffusion in a double well potential

Let us return to a 1d Ornstein-Uhlenbeck process, this time of the form

ẋ = −F(x) + Σ Ẇ

with Σ = 0.45, and force field F(x) = dV/dx(x), where V denotes the above nonsymmetric double-well potential.


Figure 2. Nonsymmetric double well potential as used below for numerical tests.

Figure 3. Sampling and histogram of sampling kernel for the OU process as discussed in the text. Points outside of the black box on the left hand side have not been hit by the sampling.

After again performing a DNS of the system (now with Euler-Maruyama discretization in time with timestep 0.001), we directly get a sampling of the sampling kernel κt(x, x0) with respect to the invariant measure μ of the process. For t = 0.5 and N = 10 000 sampling points, the sampling is illustrated in the left panel of Figure 3. The right panel of Figure 3 shows the corresponding histogram.
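A DNS of this kind can be sketched in a few lines (function and parameter names are ours; dt = 0.001, lag t = 0.5 and Σ = 0.45 as in the text, with a shorter run for illustration):

```python
import numpy as np

def force(x):
    # F = dV/dx for V(x) = (x^2 - 1)^2 + 0.25 x
    return 4.0 * x * (x**2 - 1.0) + 0.25

def sample_kernel(n_pairs=10_000, lag=0.5, dt=1e-3, sigma=0.45, seed=0):
    """Euler-Maruyama DNS of dx = -F(x) dt + sigma dW; returns the
    pairs z_n = (x(n*lag), x((n+1)*lag)) sampling the kernel kappa_t."""
    rng = np.random.default_rng(seed)
    steps = int(round(lag / dt))
    x = -1.0                       # start in the deeper well
    xs = [x]
    for _ in range(n_pairs * steps):
        x += -force(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        xs.append(x)
    traj = np.array(xs[::steps])   # states at multiples of the lag
    return np.column_stack([traj[:-1], traj[1:]])

z = sample_kernel(n_pairs=200)     # small run for illustration
```

A 2d histogram of the pairs z then reproduces the picture in Figure 3: two diagonal blobs around the two wells, with only few points in between.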

We assume that the underlying kernel is (ε, δ)-metastable and apply Algorithm 1 to this sampling. This yields the following results for the mean values (estimated equilibria (x, x0)i for i = 1, 2) and the respective covariance matrices Σi(t), i = 1, 2:

(x, x0)1 = (−1.012, −1.012), (x, x0)2 = (0.933, 0.933),

and

Σ1(t) = [0.0208, 0.0010; 0.0010, 0.0211], Σ2(t) = [0.0299, 0.0047; 0.0047, 0.0308].

We observe that the two parts of the kernel can be well approximated by Gaussians that both have the structure and form of Gaussian kernels resulting from an Ornstein-Uhlenbeck process.


Figure 4. Left: invariant measures μ1 (dashed), μ2 (dashed) and the full measure μ (solid) as computed from the DNS sampling with τ = 0.5. Right: corresponding approximate invariant densities Φ1 and Φ2.

We now aim at understanding the full kernel Kt(x, x0) as a superposition kernel kt(x, x0) in the sense of Section 3.1:

μ(x)Kt(x, x0)μ(x0) ≈ μ(x)kt(x, x0)μ(x0) = ∑_{j=1}^{2} αj μj(x) kj(x, x0) μj(x0),

where the kj are the two Gaussian kernels that have been determined above, and

α1 = 0.948, α2 = 0.052,

as a further result of the Gaussian mixture model. But this means that we have determined everything that is required to construct the superposition kernel kt(x, x0) that approximates Kt(x, x0). In terms of the weighted 1-norm ‖kt − Kt‖1,μ the agreement is very good (as far as this can be checked based on histograms of kt).

Based on these results we are able to inspect the approximate invariant densities Φ1 and Φ2 (see Fig. 4) and compute the corresponding overlap O12 = 〈Φ1, Φ2〉μ. We get

O12 = 2.8 × 10⁻⁹,

such that we can enter the above metastability results with ε = O12 = 2.8 × 10⁻⁹. This shows that the kernel is ε-metastable, but it also shows that the current approximation contains “too much metastability” in the following sense: the metastable sets exhibit stronger metastability with respect to the approximate superposition kernel than with respect to the original process. This is a consequence of the fact that the approximation of the sampling kernel was based on the 1-norm instead of the 2-norm (see Rem. 3.8); in the 1-norm the weights of the transition regions are small since the sampling kernels are small there.

5.2. Assignment to metastable and transition states

Let us consider the above example again. Let us denote the available time series generated by DNS by x0:m = {x0, . . . , xm}. From this we get the time series underlying the sampling kernel; this will be denoted z0:m−1 = {z0, . . . , zm−1}, where zn = (xn, xn+1).

According to the previous analysis we have two main metastable states. Points in the shortened time series x0:m−1 can be assigned to these states via the almost invariant densities Φi, i = 1, 2, in the sense of constructing the two sets

Mi = {xn : 0 ≤ n ≤ m − 1, Φi(xn) > θ‖Φi‖∞}, i = 1, 2,


Figure 5. Sampling of the sampling kernel with coloring of points according to assignment asdescribed in the text (θ = 0.95). The two clouds of grey and black dots represent M1 and M2,respectively. The crosses indicate M10 (black) and M01 (green), the circles M12 (black) andM21 (light grey), and the stars M20 (black) and M02 (light grey).

where θ > 0.5 is some appropriate user-selected threshold (e.g., 0.95). The properties of the Φi guarantee that M1 ∩ M2 = ∅. All other xn will be collected in the transition set

M0 = {xn : 0 ≤ n ≤ m − 1, Φi(xn) ≤ θ‖Φi‖∞, i = 1, 2}.

Transitions are events n where xn ∈ Mj for some j = 0, 1, 2 but xn+1 ∉ Mj. We can classify these via the time series z as follows: first introduce

Mij = {zn = (xn, xn+1) : 0 ≤ n ≤ m − 1, xn ∈ Mi and xn+1 ∈ Mj}.

See Figures 5 and 6 for illustrations of these sets for different values of θ.

Let now #A denote the number of elements in the set A. Then, we observe that for all i = 0, 1, 2

#Mi = ∑_j #Mij,

and the optimal Markov transition matrix (in the MLE sense, i.e., under the condition of the observation X made) between the sets Mi, i = 0, 1, 2, has transition probabilities

p(i, j) = #Mij / #Mi.
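The counting scheme can be sketched as follows (a hypothetical label sequence stands in for the assignment of the xn to M0, M1, M2):

```python
import numpy as np

def mle_transition_matrix(labels, n_states=3):
    """MLE transition matrix p(i, j) = #M_ij / #M_i from a label
    sequence labels[n] in {0, 1, 2}, where 0 is the transition set M_0."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(labels[:-1], labels[1:]):
        counts[i, j] += 1                     # one pair z_n = (x_n, x_{n+1})
    rows = counts.sum(axis=1, keepdims=True)
    # Normalize rows; rows with no visits stay zero.
    return np.divide(counts, rows, out=np.zeros_like(counts),
                     where=rows > 0)

labels = [1, 1, 1, 0, 2, 2, 2, 2, 0, 1, 1]    # toy assignment sequence
P = mle_transition_matrix(labels)
print(P)  # each visited row sums to 1
```

For a long trajectory of the double-well diffusion, this yields a matrix of the kind reported below, with near-unit diagonal entries for the metastable states.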


Figure 6. Sampling of the sampling kernel with coloring of points according to assignment asdescribed in the text (θ = 0.99). The two clouds of grey and black dots represent M1 and M2,respectively. The crosses indicate M10 (black) and M01 (light grey), the circles M12 (black)and M21 (light grey), and the stars M20 (black) and M02 (light grey).

In our case we get (θ = 0.95)

P = [0, 0.8112, 0.1888; 0.0001, 0.9997, 0.0002; 0.0003, 0.0030, 0.9967].

The eigenvalues of this matrix are 1.0000, 0.9965, −0.0001, which illustrates that, as long as we are interested in metastability, the process can be further aggregated by means of the PCCA algorithm [10,11] into the 2 × 2 process

P̂ = [0.9998, 0.0002; 0.0031, 0.9969],

which describes the jumps between M1 and M2.

Let us conclude our considerations with the following sketch of the algorithmic approach we are advocating herein:

• Use Algorithm 1 to find an approximate superposition kernel ∑_{i=1}^{M} αi κi based on a sampling (time series with lag time τ) of the sampling kernel of the system under investigation.
• Construct the almost invariant functions Φi based on the invariant measures of the approximate superposition kernel and compute the transition matrix between the resulting sets as outlined in this section.
• Take the Markov chain associated with this transition matrix as a description of the effective dynamics of the system on timescale τ, and the dynamics associated to the sampling kernels κi, i = 1, . . . , M, as local dynamics within each of the M metastable sets.

In Section 2.3 of [36] this approach has been applied to the small peptide trialanine; it has been demonstrated that its results coincide with the results of other algorithmic approaches to metastable dynamical behavior in molecular systems.


6. Relation to other approaches

6.1. Extended state space

Let us consider again Markov kernels ki, i ∈ I = {1, . . . , N}, with invariant probability measures μi, and associated flat space transition densities ρi(x, ·). Assume that ki and ρi are associated with a lag time τ. Let us assume that the measures μi are absolutely continuous with respect to the Lebesgue measure and that the ρi are positive (almost) everywhere. Let the underlying state space be Ω. Now we consider the extension of the process to the extended state space Ω × I, i.e., the number of the respective component ki of the process now is part of the state information. Now, consider the following Markov transition density on this extended state space:

ρext(x, i, y, j) = ρi(x, y) Tij(y), (6.1)

where i, j ∈ I, and Tij denotes the (i, j)-entry of the transition matrix T = exp(τR) of the Markov jump process with rate matrix R on I. R and T are supposed to depend on y such that π = (πj(y))j∈I, the invariant measure of T(y), satisfies

πj(y) = αj μj(y)/μ(y),

with some fixed positive numbers αj that do not depend on y and satisfy ∑_{j∈I} αj = 1, and μ(y) = ∑_j αj μj(y). Under these conditions we easily verify that the invariant measure of the extended transition function with density ρext is

μext(x, i) = αi μi(x).

We can now lump the extended transition function together again, i.e., we consider its marginal transition density

ρmar(x, y) = ∑_{i,j∈I} πi(x) ρext(x, i, y, j). (6.2)

Interestingly, we then get back the transition density of the superposition process on Ω:

ρmar(x, y) = ∑_{i∈I} (αi μi(x)/μ(x)) ρi(x, y).
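The lumping identity can be verified numerically in a discrete-state analogue, where the ρi become row-stochastic matrices; our sketch (not from the paper) uses the rank-one choice Tij(y) = πj(y), which is a stochastic matrix with invariant measure π(y):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 6, 3                        # state space size, number of components

# Row-stochastic matrices rho_i and their invariant measures mu_i.
rho = rng.random((N, n, n))
rho /= rho.sum(axis=2, keepdims=True)

def invariant(Pm):
    """Invariant distribution of a row-stochastic matrix (Perron vector)."""
    w, v = np.linalg.eig(Pm.T)
    m = np.real(v[:, np.argmax(np.real(w))])
    return m / m.sum()

mu_i = np.array([invariant(rho[i]) for i in range(N)])
alpha = np.array([0.5, 0.3, 0.2])
mu = alpha @ mu_i                  # mixture measure mu(y)
pi = alpha[:, None] * mu_i / mu    # pi_i(y) = alpha_i mu_i(y) / mu(y)

# Extended density rho_ext(x, i, y, j) = rho_i(x, y) T_ij(y), T_ij(y) = pi_j(y).
rho_ext = np.einsum("ixy,jy->ixjy", rho, pi)

# Lumping (6.2): rho_mar(x, y) = sum_{i,j} pi_i(x) rho_ext(x, i, y, j).
rho_mar = np.einsum("ix,ixjy->xy", pi, rho_ext)
superpos = np.einsum("ix,ixy->xy", pi, rho)   # superposition density
print(np.allclose(rho_mar, superpos))  # True
```

The identity holds because the rows of T(y) sum to one and ∑_j πj(y) = 1.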

6.2. Towards HMMSDE

Let ρi, i = 1, . . . , N, be Markov transition densities and R ∈ R^{N×N} a rate matrix. The transition matrix associated with R and step t = 1 is T = exp(R). Let π be the invariant measure of T, i.e., πᵀT = πᵀ. Consider the extended state space Ω̄ = Ω × {1, . . . , N}. Then introduce the transition density

ρ(x, i, y, j) = ρi(x, y) Tij,

which defines a “1-step” Markov kernel on Ω̄. In contrast to what has been considered above, we no longer assume that T depends on the target state y ∈ Ω.

In case the ρi(x, y) are transition functions of Ornstein-Uhlenbeck processes

ẋ = −DV^(i)(x) + Σ^(i) Ẇ,

the process defined in this way is governed by the HMMSDE model [18]

ẋ(t) = −DV^(i(t))(x(t)) + Σ^(i(t)) Ẇ,
i(t) = Markov jump process with rate matrix R.

Concerning parameterization of this process by the time series at hand, we get the ρi by approximation of the sampling kernel, and the transition matrix from the above EM algorithm and counting scheme. In [18–20],


another approach to parameter estimation for the HMMSDE model has been presented. Further investigations will have to work out whether these algorithms can also be used for the advanced kernel approximation scheme.

Appendix A

As an example in a deterministic setting, we derive an explicit formula for the transition kernel in the case of a partially observed Hamiltonian system. Consider the ordinary differential equation ż = F(z) with flow map Φt such that z(t) = Φt z0 if z(0) = z0. Now assume that the trajectory is only partially observed, i.e., instead of the full state z = (x, ξ) we only consider the part x = Qz, where Q denotes the projection from the state space onto the subspace corresponding to x. Then, the observed process has the form x(t) = QΦt(x0, ξ0). Furthermore assume that the flow map Φt leaves the measure π = μ ⊗ ν invariant, of which we assume that it is absolutely continuous with respect to the Lebesgue measure and decomposes according to π(x, ξ) = μ(x)ν(ξ). Under these conditions the transfer operator of the observed process x(t) on time scale τ has the following form [33,34] in the function space L²μ:

Pτ f(x) = ∫ f(QΦ−τ(x, ξ)) ν(ξ) dξ.

Rewriting it in the above notation under the assumption that μ is almost everywhere positive exhibits that the associated kernel in L²μ has the form

kτ(x, y) = (1/μ(y)) ∫ δ(y − QΦ−τ(x, ξ)) ν(ξ) dξ,

where δ denotes the usual delta distribution, which is used here for the sake of simplicity.

In order to understand what kind of function kτ may be, consider the following scenario which originates from molecular dynamics applications: there ż = F(z) should be thought of as a Hamiltonian system with position x, momentum ξ, and Hamiltonian H(z) = T(ξ) + V(x). Hence, F(z) = −J DH(z), where DH denotes the derivative of H with respect to z = (x, ξ), and J is the typical skew-symmetric block matrix J = [0, I; −I, 0]. The associated flow then leaves the measure

π(x, ξ) = μ(x)ν(ξ) = (1/Zx) exp(−βV(x)) · (1/Zξ) exp(−βT(ξ))

invariant, where Zx and Zξ are appropriate normalization constants and β is some arbitrary positive number. In this context, all the above assumptions are satisfied for arbitrary potential energies V and kinetic energies T that grow strongly enough. In order to allow a glimpse on the structure of kτ, let us specifically choose the case of a one-dimensional position coordinate x with V(x) = x²/2 and T(ξ) = ξ²/2, and τ such that s = sin(τ) ≠ 0. Then QΦ−τ(x0, ξ0) = cos(τ)x0 − sin(τ)ξ0, and we find (set c = cos(τ))

kτ(x, y) = (1/μ(y)) (1/s) ν((cx − y)/s).

This then results in a sampling kernel of Gaussian form

κτ(x, y) = (1/(s Zx Zξ)) exp( −(β/2) [x² + ((cx − y)/s)²] ) = (1/(s Zx Zξ)) exp( −(β/2) (x² − 2cxy + y²)/s² ).
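The Gaussian form of κτ can be checked by direct sampling: drawing x0 ∼ μ and ξ0 ∼ ν and setting y = cos(τ)x0 − sin(τ)ξ0, the pair (x0, y) should be jointly Gaussian with variances 1/β and covariance cos(τ)/β (our sketch, with β and τ chosen as in the following paragraph):

```python
import numpy as np

beta, tau = 5.0, 0.5
c, s = np.cos(tau), np.sin(tau)

rng = np.random.default_rng(0)
m = 200_000
x0 = rng.normal(0.0, 1.0 / np.sqrt(beta), m)   # positions sampled from mu
xi0 = rng.normal(0.0, 1.0 / np.sqrt(beta), m)  # momenta sampled from nu
y = c * x0 - s * xi0                           # y = Q Phi^{-tau}(x0, xi0)

# The Gaussian sampling kernel predicts Cov(x0, y) = cos(tau)/beta.
cov = np.cov(x0, y)
print(cov[0, 1], c / beta)
```

Inverting the 2 × 2 covariance matrix (1/β)[1, c; c, 1] indeed reproduces the exponent −(β/2)(x² − 2cxy + y²)/s² of κτ.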

Next, consider the Hamiltonian H(x, ξ) = T(ξ) + V(x) where V(x) now denotes the double well potential shown in Figure 2 (cf. Sect. 5.1). Let β = 5 and τ = 0.5. After performing a DNS of the projected Hamiltonian system (with Verlet discretization in time with timestep 0.005), we directly get a sampling of the associated


Figure 7. Sampling of the sampling kernel for projected Hamiltonian system as described in the text.

sampling kernel κτ(x, x0) with respect to the invariant measure μ of the process. This sampling is shown in Figure 7. We observe that the sampling kernel can be well approximated by a superposition of two Gaussians.

References

[1] D. Achlioptas, F. McSherry and B. Scholkopf, Sampling techniques for kernel methods, in Advances in Neural Information Processing Systems 14, T.G. Diettrich, S. Becker and Z. Ghahramani Eds., MIT Press (2002) 335–342.
[2] M. Belkin and P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in Advances in Neural Information Processing Systems 14, T.G. Diettrich, S. Becker and Z. Ghahramani Eds., MIT Press (2002) 585–591.
[3] M. Belkin and P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15 (2003) 1373–1396.
[4] J. Bilmes, A Gentle Tutorial on the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. ICSI-TR-97-021 (1997).
[5] A. Bovier, M. Eckhoff, V. Gayrard and M. Klein, Metastability in stochastic dynamics of disordered mean-field models. Probab. Theor. Rel. Fields 119 (2001) 99–161.
[6] J. Chodera, N. Singhal, V. Pande, K. Dill and W. Swope, Automatic discovery of metastable states for the construction of Markov models of macromolecular conformational dynamics. J. Comp. Chem. 126 (2007) 155101.
[7] E.B. Davies, Metastable states of symmetric Markov semigroups I. Proc. London Math. Soc. 45 (1982) 133–150.
[8] M. Dellnitz and O. Junge, On the approximation of complicated dynamical behavior. SIAM J. Numer. Anal. 36 (1999) 491–515.
[9] A.P. Dempster, N.M. Laird and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. B 39 (1977) 1–38.
[10] P. Deuflhard and M. Weber, Robust Perron cluster analysis in conformation dynamics. Lin. Alg. Appl. 398 (2005) 161–184.
[11] P. Deuflhard, W. Huisinga, A. Fischer and Ch. Schutte, Identification of almost invariant aggregates in reversible nearly uncoupled Markov chains. Lin. Alg. Appl. 315 (2000) 39–59.
[12] R. Duda, P. Hart and D. Stork, Pattern Classification. Wiley (2001).
[13] L. Elsner and S. Friedland, Variation of the discrete eigenvalues of normal operators. P. Am. Math. Soc. 123 (1995) 2511–2517.
[14] S. Fischer, B. Windshugel, D. Horak, K.C. Holmes and J.C. Smith, Structural mechanism of the recovery stroke in the myosin molecular motor. Proc. Natl. Acad. Sci. USA 102 (2005) 6873–6878.
[15] A. Fischer, S. Waldhausen, I. Horenko, E. Meerbach and Ch. Schutte, Identification of biomolecular conformations from incomplete torsion angle observations by Hidden Markov Models. J. Comp. Chem. 28 (2007) 1384–1399.
[16] H. Frauenfelder, S.G. Sligar and P.G. Wolynes, The energy landscapes and motions of proteins. Science 254 (1991) 1598–1603.
[17] H.O. Hartley, Maximum likelihood estimation from incomplete data. Biometrics 14 (1958) 174–194.
[18] I. Horenko and Ch. Schutte, Likelihood-based estimation of multidimensional Langevin models and its application to biomolecular dynamics. Mult. Mod. Sim. (to appear).
[19] I. Horenko, E. Dittmer, A. Fischer and Ch. Schutte, Automated model reduction for complex systems exhibiting metastability. Mult. Mod. Sim. 5 (2006) 802–827.
[20] I. Horenko, C. Hartmann, Ch. Schuette and F. Noe, Data-based parameter estimation of generalized multidimensional Langevin processes. Phys. Rev. E 76 (2007) 016706.
[21] W. Huisinga and B. Schmidt, Metastability and dominant eigenvalues of transfer operators, in Advances in Algorithms for Macromolecular Simulation, C. Chipot, R. Elber, A. Laaksonen, B. Leimkuhler, A. Mark, T. Schlick, C. Schutte and R. Skeel Eds., Lect. Notes Comput. Sci. Eng. 49, Springer (2005) 167–182.
[22] W. Huisinga, S. Meyn and Ch. Schuette, Phase transitions and metastability in Markovian and molecular systems. Ann. Appl. Probab. 14 (2004) 419–458.
[23] M. Jager, Y. Zhang, J. Bieschke, H. Nguyen, M. Dendle, M.E. Bowman, J. Noel, M. Gruebele and J. Kelly, Structure-function-folding relationship in a ww domain. Proc. Natl. Acad. Sci. USA 103 (2006) 10648–10653.
[24] S. Lafon and A.B. Lee, Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning and data set parameterization. IEEE Trans. Pattern Anal. Mach. Intell. 28 (2006) 1393–1403.
[25] B.B. Laird and B.J. Leimkuhler, Generalized dynamical thermostating technique. Phys. Rev. E 68 (2003) 016704.
[26] B. Nadler, S. Lafon, R.R. Coifman and I.G. Kevrekidis, Diffusion maps, spectral clustering and reaction coordinates of dynamical systems. Appl. Comput. Harmon. Anal. 21 (2006) 113–127.
[27] H. Neuweiler, M. Lollmann, S. Doose and M. Sauer, Dynamics of unfolded polypeptide chains in crowded environment studied by fluorescence correlation spectroscopy. J. Mol. Biol. 365 (2007) 856–869.
[28] F. Noe, D. Krachtus, J.C. Smith and S. Fischer, Transition networks for the comprehensive characterization of complex conformational change in proteins. J. Chem. Theory Comput. 2 (2006) 840–857.
[29] F. Noe, M. Oswald, G. Reinelt, S. Fischer and J.C. Smith, Computing best transition pathways in high-dimensional dynamical systems: application to the αL ⇌ β ⇌ αR transitions in octaalanine. Mult. Mod. Sim. 5 (2006) 393–419.
[30] A. Ostermann, R. Waschipky, F.G. Parak and G.U. Nienhaus, Ligand binding and conformational motions in myoglobin. Nature 404 (2000) 205–208.
[31] L.R. Rabiner, A tutorial on HMMs and selected applications in speech recognition. Proc. IEEE 77 (1989).
[32] Ch. Schutte and W. Huisinga, On conformational dynamics induced by Langevin processes, in EQUADIFF 99 – International Conference on Differential Equations 2, B. Fiedler, K. Groger and J. Sprekels Eds., World Scientific (2000) 1247–1262.
[33] Ch. Schutte and W. Huisinga, Biomolecular conformations can be identified as metastable sets of molecular dynamics, in Handbook of Numerical Analysis X, P.G. Ciarlet and C. Le Bris Eds., Elsevier (2003) 699–744.
[34] Ch. Schutte, A. Fischer, W. Huisinga and P. Deuflhard, A direct approach to conformational dynamics based on hybrid Monte Carlo. J. Comput. Phys., Special Issue on Computational Biophysics 151 (1999) 146–168.
[35] Ch. Schutte, W. Huisinga and P. Deuflhard, Transfer operator approach to conformational dynamics in biomolecular systems, in Ergodic Theory, Analysis, and Efficient Simulation of Dynamical Systems, B. Fiedler Ed., Springer (2001) 191–223.
[36] C. Schutte, F. Noe, E. Meerbach, P. Metzner and C. Hartmann, Conformations dynamics, in Proceedings of ICIAM 2007, Section on Public Talks (submitted).
[37] G. Singleton, Asymptotically exact estimates for metastable Markov semigroups. Quart. J. Math. Oxford 35 (1984) 321–329.
[38] D. Wales, Energy Landscapes. Cambridge University Press, Cambridge (2003).