


NONLINEAR DYNAMICS II, III:

ANALYSIS OF TIME SERIES

Jaroslav Stark
Centre for Nonlinear Dynamics and its Applications,
University College London,
Gower Street,
London, WC1E 6BT.

SUMMARY

Perhaps the single most important lesson to be drawn from the study of non-linear dynamical systems over the last few decades is that even simple deterministic non-linear systems can give rise to complex behaviour which is statistically indistinguishable from that produced by a completely random process. One obvious consequence of this is that it may be possible to describe apparently complex signals using simple non-linear models. This has led to the development of a variety of novel techniques for the manipulation of such “chaotic” time series. In appropriate circumstances, such algorithms are capable of achieving levels of performance which are far superior to those obtained using classical linear signal processing techniques.


1. INTRODUCTION

Traditionally, uncertainty has been modelled using probabilistic methods: in other words by incorporating random variables and processes in a model. Such stochastic models have been extremely successful in a wide variety of applications and over the years a vast range of techniques for estimating and manipulating such models has been developed. These form the subject matter of the other lectures making up this course.

By contrast, it is only relatively recently that mathematicians, scientists and engineers have come to realize that uncertainty can also be generated in certain circumstances by purely deterministic mechanisms. This phenomenon has come to be known as “chaos” and is primarily due to the capacity of many nonlinear dynamical systems to rapidly amplify an insignificant amount of uncertainty about the initial state of a system into an almost total lack of knowledge of its state at a later time. This aspect of nonlinear systems has already been dealt with in some detail in Colin Sparrow’s lecture on Chaotic Behaviour.

One of the most promising applications of this observation is within the field of time series analysis, where nonlinear dynamics offers the potential of a new class of models which in certain circumstances can lead to forecasting and filtering algorithms of vastly superior performance compared to traditional signal processing methods based on the stochastic paradigm. The aim of these two lectures is therefore to outline the framework within which such models are constructed and to describe some of the uses to which these models can be put.

Before we do this, however, a word of caution is in order. Within the last few years some rather extravagant claims for this so-called “theory of chaos” have been made, particularly in the popular media. Many of these are difficult to justify, and it is important to stress that “chaotic time series” models are only appropriate in certain classes of problems; they are not a universal panacea. In particular, the sorts of models that we shall describe are deterministic and dynamic. We should only attempt to apply them in situations where we have reasonable cause to suspect that deterministic mechanisms are important, and where we are interested in longer term behaviour rather than single events. Some promising applications include:

• Diagnostic monitoring of engineering systems such as turbines, generators and gearboxes. These all undergo complex deterministic vibrations and it is often important to detect as early as possible slight changes in their operating behaviour which indicate some malfunction, such as a shaft, turbine blade or gear tooth beginning to crack.

• Similarly, many biological systems exhibit highly deterministic nonlinear oscillations, e.g. the heart beat, circadian rhythms, certain EEG signals etc. There is thus scope for medical applications of techniques based on nonlinear dynamics, for instance in foetal distress monitoring during childbirth.

• Agglomerated properties of populations can in some circumstances show strongly deterministic features. Apart from obvious applications such as forecasting the transmission and development of epidemics, there is also interest in developing and validating models of ecological systems and their interaction with man.

Considerable effort has also been made to apply these techniques to financial and economic data such as foreign exchange rates and stock market indices. On the whole, methods based on non-linear dynamics do not perform too badly, but nor do they perform spectacularly well. It seems clear that the behaviour of such systems has more structure than can be explained by traditional linear stochastic models, but the best way of incorporating concepts from nonlinear dynamics remains to be seen. This is currently the subject of intense research activity.


2. TIME SERIES GENERATED BY CHAOTIC SYSTEMS

Recall from Colin Sparrow’s lecture that the state of a deterministic dynamical system at a given time t0 is described by a point x lying in IRm. The time evolution of the system is given by a map x(t0) → x(t) which gives the state of the system at time t given that it was initially in state x(t0) at time t0. Often this map will be obtained as the solution of some ordinary differential equation.

In many situations, however, we do not have access to the state x and the map x(t0) → x(t) is unknown. All that we can do is observe some function ϕ(x) of the state x, where ϕ : IRm → IR is some measurement function. It corresponds to measuring some observable property of the system, such as position or temperature. The evolution of this quantity with time is then given by the time series ϕ(x(t)). In practice we can only observe this at discrete time intervals. For simplicity we assume that these intervals are all the same, so that in fact we observe the discrete time series ϕn = ϕ(x(nτ)) for n = 1, 2, … where τ is some sampling interval.

Note that in some circumstances we may be able to measure more than one observable simultaneously. This gives rise to a multi-variable time series. For simplicity, we shall not consider this case, but all the methods described below are equally applicable to such time series, with essentially only trivial modifications. In practice it often turns out that there is no advantage in taking more variables (e.g. Muldoon [1994]). One interesting idea in this context is that we can use nonlinear dynamics techniques to easily test whether or not two simultaneous time series have been generated by the same dynamical system. Smith [1993] has suggested using this to test for sensor malfunction in multi-sensor systems.

A simple example of such a time series is given in Figure 1, which was obtained by observing the x co-ordinate of the Hénon map

xn+1 = 1 − a(xn)2 + yn (1a)
yn+1 = bxn (1b)

where a and b are parameters, which in this case are taken to be a = 1.4 and b = 0.3. In this case the observation function is just ϕ(xn,yn) = xn, and ϕn = xn.
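For concreteness, such a series can be generated in a few lines of Python; this is a minimal sketch, in which the initial condition (x0,y0) = (0,0) and the helper name henon_series are illustrative choices rather than anything specified above.

```python
import numpy as np

def henon_series(n, a=1.4, b=0.3, x0=0.0, y0=0.0):
    """Iterate the Henon map (Eq. 1) and return the observed series phi_n = x_n."""
    xs = np.empty(n)
    x, y = x0, y0
    for i in range(n):
        x, y = 1.0 - a * x * x + y, b * x   # Eq. 1a and 1b, updated simultaneously
        xs[i] = x                           # observation function phi(x, y) = x
    return xs

phi = henon_series(1000)   # the kind of series plotted in Figure 1
```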

[Figure 1 (xn against n, 0 ≤ n ≤ 1000): Sample chaotic time series {xn} derived from the Hénon map (Eq. 1).]


[Figure 2 (ϕn against n, 0 ≤ n ≤ 1000): Time series {ϕn} derived from the Lorenz equations (Eq. 2).]

Figure 2 shows another example, this time obtained by observing the x co-ordinate of the Lorenz equations

ẋ = 10(y − x)
ẏ = −xz + 28x − y (2)
ż = xy − (8/3)z

These were integrated using a fourth order Runge-Kutta scheme with a time step of 10−2 and sampled every 10 integration steps (i.e. so that ϕn = x(0.1n)).
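This integration scheme is easy to reproduce; the following Python sketch is one way to do it, assuming an arbitrary initial condition (1,1,1) and discarding no transient, neither of which is specified above.

```python
import numpy as np

def lorenz_rhs(u):
    """Right-hand side of the Lorenz equations (Eq. 2)."""
    x, y, z = u
    return np.array([10.0 * (y - x),
                     -x * z + 28.0 * x - y,
                     x * y - (8.0 / 3.0) * z])

def rk4_step(u, h):
    """One classical fourth order Runge-Kutta step of size h."""
    k1 = lorenz_rhs(u)
    k2 = lorenz_rhs(u + 0.5 * h * k1)
    k3 = lorenz_rhs(u + 0.5 * h * k2)
    k4 = lorenz_rhs(u + h * k3)
    return u + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def lorenz_series(n, h=0.01, steps_per_sample=10, u0=(1.0, 1.0, 1.0)):
    """Sample phi_n = x(n * h * steps_per_sample), i.e. tau = 0.1 by default."""
    u = np.array(u0, dtype=float)
    phi = np.empty(n)
    for i in range(n):
        for _ in range(steps_per_sample):
            u = rk4_step(u, h)
        phi[i] = u[0]
    return phi
```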

In contrast to the simple form of the equations generating them, the behaviour in Figures 1 and 2 appears to be rather complex and, particularly in the case of the first figure, to be more or less random and unpredictable. Indeed, most conventional statistical tests of such data would conclude that nearly all of the variation in this time series was due to random fluctuations. Classically, it would be modelled by linear stochastic processes, such as for instance an autoregressive moving average (ARMA) model:

ϕn = α0 + Σj=1..k αjϕn-j + Σj=0..l βjεn-j (3)

Here εn is a sequence of uncorrelated random variables and the αi and βi are the parameters of the model. Such a model would assign most of the complexity seen in Figure 1 to the stochastic terms represented by the εn and hence would lead to rather poor estimates of the future behaviour of the time series. By contrast, as we shall see below, methods based on nonlinear dynamics are able to make extremely good short term forecasts of these time series, even when the underlying dynamical equations (e.g. Equation 1 or 2) are not known. This has led to a re-examination of many systems to determine whether or not their behaviour could be predicted using such techniques. Apart from the inherent interest in making predictions in their own right, the predictability or not of a given set of data can yield important clues about the mechanisms which generated it (see [Sugihara and May, 1990] and [Tsonis and Elsner, 1992]). Finally, such chaotic prediction algorithms in turn can be used as the basis of novel techniques for signal processing, including filtering, noise reduction and signal separation (see §5 below).

Examples of the sorts of time series that have been analysed, often with mixed results, include data from electronic circuits, fluid dynamics experiments, chemical reactions, sunspots, speech, electroencephalograms and other physiological time series, population dynamics and economic and financial data (see for instance [Casdagli, 1992] and the references therein). An example of such real data is shown in Figure 3. This represents the square root of the amplitude of the output of a semiconductor laser, digitized using an 8 bit A/D converter.

[Figure 3 (ϕn against n, 0 ≤ n ≤ 1000): Time series of observed laser data.]

Before we describe some of the algorithms used to model such time series, it would seem appropriate to analyse the way in which uncertainty manifests itself in time series such as those in Figures 1 and 2. At first sight, it might seem strange that a purely deterministic system, such as that given by Equation 1 or 2, should have any uncertainty at all associated with it. After all, if say we know (x0,y0) for the Hénon map, then in principle we can predict (xn,yn) perfectly for all time. However, as Colin Sparrow has already described, even the tiniest inaccuracies in our knowledge of either (x0,y0) or of the equations themselves will be rapidly amplified. In the context of time series, this is illustrated in Figure 4, where the initial conditions (x0,y0) used to generate Figure 1 were slightly perturbed to give (x′0, y′0), where x′0 = x0 + 10−8 and y′0 = y0 + 10−8. We see that although the gross features of the resulting time series are very similar to Figure 1, detailed inspection shows that xn and x′n rapidly diverge from each other. This is much more apparent in Figure 5, where we plot ∆n = log|xn − x′n|. We see that for about the initial 50 iterations ∆n increases more or less linearly until there is essentially no correlation left between xn and x′n. In other words, an initial uncertainty in the state of the system of the order of 10−8 results after 50 time steps in a complete lack of knowledge of its behaviour. The rate at which uncertainty grows is given by the initial slope of the graph of log|xn − x′n| and corresponds to the largest Liapunov exponent which was introduced in Colin Sparrow’s lecture.
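The divergence experiment of Figures 4 and 5 takes only a few more lines; this sketch assumes the henon_series helper defined earlier, and the 10−8 perturbation follows the text.

```python
import numpy as np

x  = henon_series(1000, x0=0.0, y0=0.0)
xp = henon_series(1000, x0=1e-8, y0=1e-8)   # slightly perturbed initial conditions

delta = np.log10(np.abs(x - xp))   # Delta_n = log|x_n - x'_n|, as plotted in Fig. 5
# The initial slope of delta approximates the largest Liapunov exponent
# (in units of log10 per iteration).
```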


[Figure 4 (x′n against n, 0 ≤ n ≤ 1000): Time series derived from Eq. 1 and modified initial conditions (x′0, y′0).]

[Figure 5 (∆n against n, 0 ≤ n ≤ 1000): Separation ∆n = log|xn − x′n| between Figure 1 and Figure 4.]

Such exponential growth of small perturbations is characteristic of chaotic systems and indeed is usually used to define chaos. It places severe limits on the long term predictability of such systems. This is because in practice we never know the initial state of the system to infinite precision. Any errors, no matter how small, in the determination of this state will grow as above and rapidly render our long term forecasts meaningless. We thus have the dichotomy that chaotic systems are highly predictable in the short term (due to their deterministic time evolution) but completely unpredictable in the long term (due to their sensitive dependence on initial conditions).


The above discussion indicates that the apparent uncertainty in a chaotic time series will depend on the choice of time scales, and in particular on the choice of the sampling interval τ. Recall Figure 2, and note that this has much more structure and appears much less random than Figure 1. If however we choose a larger τ, we get a much more random looking time series, as in Figure 6, where τ = 1 is used.

[Figure 6 (ϕn against n, 0 ≤ n ≤ 1000): Time series derived from the Lorenz equations with τ = 1.]

On the other hand, if we choose to sample much more frequently, the deterministic nature of the behaviour becomes much more apparent, as in Figure 7, which was generated using τ = 0.01.

[Figure 7 (ϕn against n, 0 ≤ n ≤ 1000): Time series derived from the Lorenz equations with τ = 0.01.]


3. RECONSTRUCTING THE DYNAMICS

For the Hénon map of Equation 1 the time evolution of xn depends on both x and y. Thus to generate Figure 1, we had to compute both xn and yn for n = 1, …, 1000. Similarly, for the Lorenz equations we had to compute x(t), y(t) and z(t) simultaneously. More generally, as described at the beginning of §2, the state x will lie in IRm and we will need all m co-ordinates to determine the future behaviour of the system. On the other hand, the observed time series ϕn is only one dimensional. At first sight it might thus appear that ϕn contains relatively little information about the behaviour of x(t) and that the fact that ϕn was generated by a deterministic system is of little value when the state x and the dynamics x(t0) → x(t) are not known. Remarkably, this intuition turns out to be false, due to a powerful result known as the Takens Embedding Theorem ([Takens, 1980], see also [Sauer et al., 1991] and [Noakes, 1991]). This shows that for typical measurement functions ϕ, and typical dynamics x(t0) → x(t), it is possible to reconstruct the state x and the map x((n-1)τ) → x(nτ) just from knowledge of the observed time series {ϕn}. More precisely, this theorem says that typically there exists an integer d (called the embedding dimension) and a function G : IRd → IR such that ϕn satisfies

ϕn = G(ϕn-d, ϕn-d+1, …, ϕn-1) (4)

for all n. Furthermore, if we define the point vn ∈ IRd by

vn = (ϕn-d+1, ϕn-d+2, …, ϕn) (5)

then the dynamics on IRd given by vn-1 → vn = (ϕn-d+1, ϕn-d+2, …, ϕn-1, G(vn-1)) is completely equivalent to the original dynamics x((n-1)τ) → x(nτ) under some (unknown) smooth invertible co-ordinate change. In other words, by working with the points vn we can reconstruct both the original state space and the dynamics on it. This procedure is illustrated by the next few figures.
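The delay reconstruction itself is trivial to implement; a minimal sketch (the function name delay_embed is ours) is:

```python
import numpy as np

def delay_embed(phi, d):
    """Return the delay vectors v_n = (phi_{n-d+1}, ..., phi_n) of Eq. 5 as rows.

    With 0-indexed data, row i corresponds to n = i + d - 1."""
    phi = np.asarray(phi)
    n = len(phi)
    return np.column_stack([phi[i : n - d + 1 + i] for i in range(d)])
```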

Figure 8 shows 5000 points of the attractor of the Hénon map, plotted in the state space given by (xn,yn). Figure 9, by contrast, shows the reconstructed attractor in (xn,xn-1) space, using the above procedure. We stress that this figure was plotted using only the observed time series xn (i.e. as in Fig. 1), with no knowledge of the variable yn. Apart from the scale change, the figures are identical.

[Figure 8 (yn against xn): the Hénon attractor in its original state space. Figure 9 (xn-1 against xn): the attractor reconstructed from the time series {xn} alone.]


In the case of the Hénon map, it is easy to see why this procedure works: Eq. 1b gives a simple relation between yn and xn-1. This however is a very special feature of the Hénon map, and will not hold for more general dynamics, such as those of the Lorenz equations. The above reconstruction procedure will then yield a somewhat different looking picture of the dynamics, as is seen in Figures 10 and 11. The first of these shows a plot of the Lorenz attractor projected onto the (x(t),y(t)) plane, from the full three dimensional state space given by (x(t),y(t),z(t)). Figure 11 shows a reconstruction using the x co-ordinate alone: thus x(t) is shown plotted against x(t-τ) with τ = 0.1. The similarity between the two figures is obvious.

[Figure 10 (y(t) against x(t)): the Lorenz attractor projected onto the (x,y) plane. Figure 11 (x(t-τ) against x(t)): the reconstruction from the x co-ordinate alone.]

Note that the map vn-1 → vn just advances a block of elements of the time series {ϕn} forward by one time step. Thus, whilst the original state space and dynamics might be inaccessible to us, the reconstructed dynamics (which is completely equivalent to the original dynamics) operates purely in terms of the observed time series and can be approximated using any one of a wide range of non-linear function fitting algorithms (some of which are described in the next section). We thus conclude that despite the fact that ϕn is one dimensional whilst x(t) is m-dimensional, all of the important information about the behaviour of x(t) is actually encoded in ϕn. It is this fundamental fact which lies behind all practical schemes for processing chaotic time series.

At this point, the observant reader will ask: how do I know what value of d to take? Clearly, for efficiency and numerical accuracy, this should be chosen as small as possible whilst still ensuring that we have a proper reconstruction. Takens in fact shows that any value of d ≥ 2m+1 always results in an equivalence between the original and the reconstructed system, though in fact often a smaller value will also be sufficient (e.g. for the Hénon map above we took d = m = 2). Unfortunately, this is not as useful a bound as it might at first seem, since usually m itself will also be unknown. In practice d therefore has to be determined empirically; one possibility is to compute a trial G for each value of d and then use the smallest d which gives a good fit to Eq. 4. This is discussed in more detail below in §7, since a value of d is obtained as a by-product of most tests for the presence of chaotic behaviour in a time series.

In practical applications the choice of τ can also be crucial. We have already seen in Figures 2, 6 and 7 that τ can have a profound effect on the apparent complexity of the observed time series. Now, as far as Takens’s theorem is concerned, any value of τ is as good as any other (provided that certain genericity conditions are met). However, in practice, some values of τ will give far “better” reconstructions than others. Thus if τ is too small, since ϕ(x(t)) is continuous as a function of t, successive values of ϕn will be almost identical and hence ϕn-d+2, …, ϕn will contain very little useful information beyond that already obtained from ϕn-d+1. In such circumstances, if measurements are made only to a finite precision, it will be difficult to make reasonable predictions of the future behaviour of the time series. On the other hand, if τ is too large then the map G will become very complicated and difficult to estimate accurately. A number of methods for choosing a suitable value have been proposed ([Fraser and Swinney, 1986], [Liebert and Schuster, 1989], [Liebert et al., 1991] and [Buzug and Pfister, 1992]). The last two of these references attempt to determine an optimal choice for both d and τ simultaneously. Note that the issue of choice of time delay arises even if we are just given a time series {ϕn} and have no direct control over the original sampling interval τ. This is because in such a case we still have the option of resampling the time series at some integer multiple of the original rate, in other words of using reconstruction co-ordinates of the form (ϕn-q(d-1), ϕn-q(d-2), …, ϕn) for any positive integer q.

4. PREDICTING CHAOTIC TIME SERIES

So, what do we gain from the above reconstruction procedure? Well, consider Eq. 4: this tells us that ϕn is uniquely determined in terms of the previous d values of the time series. We can thus use the function G to predict the future behaviour of the time series. In this sense Eq. 4 is a non-linear deterministic analogue of the ARMA model in Eq. 3; we have simply removed the stochastic terms due to the εn and replaced the linear combination αdϕn-d + αd-1ϕn-d+1 + … + α1ϕn-1 by the non-linear function G.

The big problem, of course, is to determine G. In the case of the Hénon map, we can in fact write down G for the time series in Fig. 1 in closed form. From Eq. 1b, we have that yn-1 = bxn-2 and substituting this into Eq. 1a we get

xn = 1 - a(xn-1)2 + bxn-2 (6)

However, this is a very special feature of the Hénon map: in general, for typical dynamical systems and typical observation functions such an explicit derivation is not possible. Nevertheless, given a sample {ϕn : n = 1, …, N} of the time series we can use any one of a large variety of non-linear function fitting techniques to approximate G.

Such techniques fall into two categories: local and global (e.g. [Farmer and Sidorowich, 1987], [Crutchfield and McNamara, 1987], [Lapedes and Farber, 1987], [Abarbanel et al., 1989], [Giona et al., 1991], [Pawelzik and Schuster, 1991], [Linsay, 1991], [Jiménez et al., 1992] and [Casdagli, 1989 and 1992]). Global methods construct a single approximation G using all the available data. Examples include polynomials, rational functions, various kinds of neural networks and radial basis functions (see below).

By contrast, local techniques estimate the value of G(v) using only a small number of data points vn in the immediate neighbourhood of v. These are used to form a local estimate Gv, usually in the form of a low order polynomial (i.e. linear or quadratic). Note that if we simply take the nearest point to v and employ a 0th order estimate (i.e. a constant function) we end up with the well known technique of “forecasting by analogy”. This is because a point v in our reconstructed state space corresponds to a pattern of successive values of the observable (Figure 12). To predict the next value in the sequence we look through our past history of the data {ϕn : n = 1, …, N} to find the pattern vj = (ϕj-d+1, ϕj-d+2, …, ϕj) which most closely matches v (Figure 13). The value following vj is just ϕj+1 = G(vj): hence our best 0th order estimate of the value Gv(v) following v is ϕj+1. More sophisticated local techniques simply aim to obtain a better estimate of Gv(v) using other previously seen nearby patterns. Observe that one big disadvantage of this approach is that we have to permanently store the sample data {ϕn : n = 1, …, N}, and we need to search through this looking for nearby points each time we want to make a prediction. Sophisticated algorithms exist to make this feasible, even for large data sets, but nevertheless local prediction methods are much more computationally intensive and much less straightforward to implement than global algorithms. The latter are also easier to explain and conceptually much more elegant. On the other hand, local methods will probably yield more accurate approximations, particularly if the derivatives of G are required.
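A minimal sketch of this 0th order scheme, using a brute force search rather than the sophisticated data structures mentioned above, might read:

```python
import numpy as np

def analog_forecast(phi, d, v):
    """Forecast by analogy: find the past pattern closest to v and
    return the value that followed it (a 0th order local estimate)."""
    phi = np.asarray(phi)
    n = len(phi)
    # Rows are the patterns v_j for which the next value phi_{j+1} is known.
    patterns = np.column_stack([phi[i : n - d + i] for i in range(d)])
    j = np.argmin(np.linalg.norm(patterns - v, axis=1))   # best matching pattern
    return phi[j + d]                                     # the value that followed it
```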

[Figure 12: the point v corresponds to a pattern of successive values of the time series. Figure 13: previous patterns which are a good match to v.]

In these notes we shall concentrate on global techniques. One of the most successful methods has been that of radial basis function approximation. This was originally conceived of as an interpolation technique (see for instance Powell [1987]) but was subsequently generalized to an approximation method using linear least squares estimation (Broomhead and Lowe [1988]). Within this context radial basis function methods have also received considerable attention within the field of system identification (Chen et al. [1990 a,b and 1991], [Pottmann and Seborg, 1992]). Also note that radial basis function approximations can be regarded as a particular class of neural networks; indeed this was the context of Broomhead and Lowe’s original approach.


The basic idea behind radial basis function approximation is to choose a finite number of points, called radial basis centres, w1, …, wp ∈ IRd and some fixed radial basis function ρ : IR → IR. In the examples described below we use the basis function ρ(r) = √(r2 + c) with an appropriately chosen constant c, but many other choices are possible. We then look for an approximation Gλ of a given function G : IRd → IR in the form of

Gλ(v) = Σj=1..p λjρ(‖v − wj‖) (7)

where λ = (λ1, …, λp) are parameters which determine the function Gλ and ‖·‖ is the Euclidean norm on IRd.
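Evaluating such an approximation is straightforward; a minimal sketch, with ρ(r) = √(r2 + c) as in the text:

```python
import numpy as np

def rbf_eval(v, centres, lam, c):
    """Evaluate G_lambda(v) = sum_j lam_j * rho(||v - w_j||) of Eq. 7,
    with centres as a p x d array of the w_j and rho(r) = sqrt(r^2 + c)."""
    r = np.linalg.norm(centres - np.asarray(v), axis=1)   # distances to the w_j
    return float(np.dot(lam, np.sqrt(r * r + c)))
```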

Observe that, although Gλ(v) is a nonlinear function of v, it depends linearly on the parameters λ = (λ1, …, λp). As first suggested by Broomhead and Lowe [1988], this means that we can use standard linear least squares methods to obtain an estimate of λ from a sample of data {ϕn : n = 1, …, N}. More precisely, we choose λ to minimize the least squares error

E2 = Σn=d..N (Gλ(vn) − G(vn))2 (8)

where of course vn = (ϕn-d+1, ϕn-d+2, …, ϕn) are just the points drawn from the data sample. Observe that G(vn) = ϕn+1, so that substituting Eq. 7 into Eq. 8, we get

E2 = Σn=d..N (Σj=1..p λjρ(‖vn − wj‖) − ϕn+1)2 (9)

If we now define the (N-d)×p matrix A by Anj = ρ(‖vn − wj‖) and the vector b ∈ IRN-d by bn = ϕn+1, we can rewrite this as

E2 = Σn=d..N (Σj=1..p Anjλj − bn)2 = ‖Aλ − b‖2 (10)

In other words, minimizing E is equivalent to finding a λ such that the vector Aλ − b has minimum Euclidean norm. This is a completely standard linear least squares problem which can be solved in a variety of ways (e.g. [Stoer and Bulirsch, 1980] or [Lawson and Hanson, 1974]). The best approach is generally considered to be via the orthogonal decomposition of A. We shall explain how such a decomposition can be obtained below, but first let us show how it leads to the solution of the least squares problem. So, suppose that we have found p+1 mutually orthogonal vectors q1, …, qp, q̄ such that

A = QK
b = Qk̄ + q̄ (11)

where Q is the (N-d)×p matrix whose columns consist of the vectors q1, …, qp, and the p×p matrix K is upper triangular with unit diagonal elements (i.e. Kij = 0 for i > j and Kii = 1 for all i). Then

Aλ − b = QKλ − Qk̄ − q̄


and hence

‖Aλ − b‖2 = ‖q̄‖2 + ‖Q(Kλ − k̄)‖2 − 2q̄†Q(Kλ − k̄)

where † indicates transpose. But q̄ is orthogonal to q1, …, qp, so q̄†Q = 0. Hence

‖Aλ − b‖2 = ‖q̄‖2 + ‖Q(Kλ − k̄)‖2

Thus the required λ to minimize ‖Aλ − b‖2 can be obtained as a solution of Kλ = k̄. Since K is upper triangular, this can be solved extremely easily by backsubstitution, i.e. set λp = k̄p and

λi = k̄i − Σj=i+1..p Kijλj (12)

inductively for i = p-1, …, 1.

Note that we allow any of the q’s to be zero: all that we require is that qi†qj = 0 for all i ≠ j and qi†q̄ = 0 for all i. If one of the qi = 0, this means that the columns of A are linearly dependent, in which case the minimizing λ is not unique (in fact λ is minimizing if and only if QKλ = Qk̄). If q̄ = 0, this simply means that the residual Aλ − b is zero.

The q’s can be obtained using a variety of algorithms. Standard techniques are based on the Gram-Schmidt algorithm or on Givens or Householder transformations (e.g. [Stoer and Bulirsch, 1980] or [Lawson and Hanson, 1974]). Here we shall present the Gram-Schmidt algorithm because it is the simplest to describe and will already be familiar to many readers.

Recall that given a set of vectors a1, …, ap ∈ IRN-d (which in this context are the columns of A) the Gram-Schmidt algorithm generates a set of mutually orthogonal vectors q1, …, qp such that q1, …, qj spans the same subspace of IRN-d as a1, …, aj, for any 1 ≤ j ≤ p. This is achieved by setting q1 = a1 and then inductively

qj = aj − Σi=1..j-1 Kijqi (13)

where for i < j

Kij = (aj†qi)/(qi†qi) if qi ≠ 0, and Kij = 0 if qi = 0 (14)

Note that this is precisely the K which gives the required orthogonal decomposition in Eq. 11. We also need to find q̄ and k̄; this is done by treating b as ap+1 in the above procedure (so that q̄ is then given by qp+1 and k̄i by Ki,p+1). Also observe that sometimes one also requires the qj to be orthonormal. This can easily be achieved by normalizing the qj as defined above, but for our purposes it is more convenient not to do so.

In practice, it is preferable to calculate the q’s in a slightly different fashion using the so-called Modified Gram-Schmidt algorithm (e.g. [Lawson and Hanson, 1974]). This leads to exactly the same Q, q̄, K and k̄ but is numerically better conditioned. It makes use of a set of auxiliary vectors qj(i) for i ≤ j defined by

qj(i+1) = qj(i) - Kij qi (15)

with qj(1) = aj and qi = qi(i). By induction we see that


Kij = (qj(i)†qi)/(qi†qi) if qi ≠ 0, and Kij = 0 if qi = 0 (16)

Thus qj(i+1) can be computed directly from the qj(i). The decomposition of b is similarly performed using q̄(i) for i = 1, …, p+1:

q̄(i+1) = q̄(i) − k̄iqi (17)

with q̄(1) = b and q̄ = q̄(p+1). In terms of q̄(i), the k̄i are given by

k̄i = (q̄(i)†qi)/(qi†qi) if qi ≠ 0, and k̄i = 0 if qi = 0 (18)

Note that despite initial appearances the Modified Gram-Schmidt algorithm requires no more storage than the conventional Gram-Schmidt scheme, since at each stage qj(i+1) can be used to overwrite qj(i).
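Putting Eqs. 12 and 15-18 together gives a compact least squares solver; the following sketch assumes, for simplicity, that no qi degenerates to zero (i.e. that the columns of A are linearly independent).

```python
import numpy as np

def mgs_least_squares(A, b):
    """Solve min ||A lam - b|| by Modified Gram-Schmidt (Eqs. 11-18)
    followed by backsubstitution (Eq. 12)."""
    A = np.array(A, dtype=float)   # columns are overwritten by the q_j^(i)
    b = np.array(b, dtype=float)   # overwritten by the qbar^(i)
    p = A.shape[1]
    K = np.eye(p)
    k = np.zeros(p)
    for i in range(p):
        qi = A[:, i].copy()                 # q_i = q_i^(i)
        nrm = qi @ qi
        for j in range(i + 1, p):
            K[i, j] = (A[:, j] @ qi) / nrm  # Eq. 16
            A[:, j] -= K[i, j] * qi         # Eq. 15
        k[i] = (b @ qi) / nrm               # Eq. 18
        b -= k[i] * qi                      # Eq. 17
    lam = k.copy()
    for i in range(p - 2, -1, -1):          # Eq. 12, i = p-1, ..., 1
        lam[i] -= K[i, i + 1:] @ lam[i + 1:]
    return lam
```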

The one remaining issue we need to address before we have a complete algorithm is the choice of the basis centres wj. In Powell’s original approach, these were placed uniformly on a grid, and it was assumed that G(wj) was known, i.e. that the centres were chosen from amongst the data points. In our application to chaotic time series, we cannot of course ensure that the data points are arranged in a regular fashion, but it still seems desirable that the centres should be reasonably distributed throughout the region where the data lies. In the absence of any a priori knowledge it is thus still usual to choose the centres from amongst the data points vn = (ϕn-d+1, ϕn-d+2, …, ϕn), so that wj = vn(j) for some n(j). This was for instance done by Casdagli [1989], who was the first to apply radial basis functions to the prediction of chaotic time series (though he used interpolation rather than least squares approximation). Since we require the number of centres to be substantially smaller than the number of data points, the simplest solution is to choose the centres randomly from amongst the data set. A more sophisticated approach might be to require that none of the centres lie closer together than some predetermined minimal distance. More complex criteria are discussed in [Smith, 1992], whilst Chen et al. [1990 a,b and 1991] (see also [Pottmann and Seborg, 1992] and Stark [1994]) present a model selection algorithm whereby initially a very large number of centres is chosen and then all but the most significant are discarded during the estimation procedure. Note that in all of these approaches the matrix elements Aij can then be expressed directly in terms of the observed sample data {ϕn : n = 1, …, N} as

Aij = ρ(rin(j)) (19)

where rij is the Euclidean distance between vi and vj, given by

rij = (Σk=0..d-1 (ϕi-k − ϕj-k)2)1/2 (20)


THE COMPLETE ALGORITHM

To summarize, here is the complete algorithm:

1. Choose the centres wj = vn(j) from amongst the data points {vn = (ϕn-d+1, ϕn-d+2, …, ϕn) : n = 1, …, N}.

2. Compute the distances rin(j) using Eq. 20 and define the vectors q1(1), …, qp(1) by [qj(1)]i = ρ(rin(j)), where [qj(1)]i is the ith component of qj(1).

3. For i = 1, …, p-1, set qi = qi(i), compute Kij for j > i using Eq. 16, and then form qj(i+1) using Eq. 15.

4. Define q̄(1) by [q̄(1)]i = ϕi+1.

5. For i = 1, …, p, compute k̄i using Eq. 18, and then form q̄(i+1) using Eq. 17.

6. Set λp = k̄p and compute λp-1, …, λ1 (in that order) using Eq. 12.

The resulting (λ1, …, λp) is the parameter vector λ which gives the optimal approximation Gλ of G.
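A minimal end-to-end sketch of these six steps, reusing the henon_series, delay_embed, rbf_eval and mgs_least_squares helpers from the earlier sketches, might look as follows; here the centres are simply taken to be the first p data points, one of the simple choices discussed above.

```python
import numpy as np

def fit_rbf_predictor(phi, d, p, c):
    """Estimate G_lambda from a sample {phi_n}; returns (centres, lam)."""
    V = delay_embed(phi, d)          # rows v_n (Eq. 5)
    X, y = V[:-1], phi[d:]           # each v_n paired with its successor phi_{n+1}
    centres = X[:p]                  # step 1: centres from amongst the data points
    # Step 2: A_nj = rho(r_{n,n(j)}) with rho(r) = sqrt(r^2 + c) (Eqs. 19-20)
    r = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    A = np.sqrt(r * r + c)
    lam = mgs_least_squares(A, y)    # steps 3-6
    return centres, lam

# One step prediction errors in the style of Fig. 14:
phi = henon_series(1000)
centres, lam = fit_rbf_predictor(phi[:500], d=2, p=150, c=10.0)
eps = [phi[n] - rbf_eval(phi[n - 2 : n], centres, lam, 10.0)
       for n in range(2, 1000)]
```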

Let us give some examples of the application of this algorithm. Figure 14 shows the one step prediction error εn = ϕn − Gλ(ϕn-2,ϕn-1) for the Hénon time series of Fig. 1. The first 500 data points were used as the data sample and the first 150 were used to define the radial basis centres. The basis function used was ρ(r) = √(r2 + 10). The error is down to the level of 10−9 or less. Furthermore, observe that the in-sample error for the first 500 points is indistinguishable from the out-of-sample error for the second half of the time series. This indicates that a good overall fit to G has been obtained and “overfitting” has not occurred.

[Figure 14 (εn against n, 0 ≤ n ≤ 1000): One step prediction error εn for radial basis prediction of Fig. 1.]

Of course this example is no great test of the technique: all we are trying to do is fit the function G(xn-2,xn-1) = 1 − 1.4(xn-1)2 + 0.3xn-2 (albeit without any knowledge of its functional form). A more difficult example is the Lorenz time series of Fig. 2; recall that for this no explicit expression for G is available. The results are shown in Figure 15. As before, the first 500 data points were used as the data sample and the first 150 were used to define the radial basis centres. The basis function was ρ(r) = √(r2 + 4000), and the embedding dimension was d = 4. The error is much larger than in the Hénon case, but even so is on average on the order of 5×10−3 of the amplitude of the original time series. This is more than adequate for most applications, and indeed is more accurate than the basic precision of any data we would be able to collect in most real world situations. More importantly, the prediction errors are at least a factor of 102 smaller than those which we would get by predicting ϕn to be the same as ϕn-1 (which is perhaps the simplest prediction technique possible, and a fundamental baseline against which we should always test).

[Figure 15 (εn against n, 0 ≤ n ≤ 1000): One step prediction error εn for radial basis prediction of Fig. 2.]

Figure 16. A trajectory segment corresponding to poor predictions in Fig. 15.

The occasional very poor predictions, which are such a noticeable feature of Fig. 15, are due to the fact that close to the origin (x,y,z) = (0,0,0) the Lorenz system is much more sensitive to small errors than in other parts of the phase space. In particular, in this region it is very difficult to tell into which of the two halves of the attractor (contained in x > 0 or x < 0, respectively) the orbit will go next. This is illustrated in Figure 16, where we highlight a segment of the trajectory corresponding to one of the “glitches” in Figure 15.

Such variability in the predictability of the system throughout the state space appears to occur in a wide variety of systems, and is currently the subject of much interest (e.g. [Abarbanel et al., 1991], [Smith, 1992], [May, 1992] and [Lewin, 1992]). One approach to quantifying such differences in predictability is through the concept of local Liapunov exponents ([Abarbanel et al., 1991] and [Wolff, 1992]).

Finally, in Figure 17 we apply the above algorithm to the laser data from Fig. 3. Again, the first 500 data points were used as the data sample, but now only the first 75 points were used as basis centres. The basis function was ρ(r) = √(r2 + 640), and the embedding dimension was d = 5. Despite the very poor quality of the original data (only 8 bit precision) we find that we are still able to model the dynamics to an accuracy of better than 5% throughout most of the data. Furthermore, if we use more data points we can do much better. Figure 18 shows the prediction error if we use a sample of 1500 points to estimate G. In this case the increased amount of data even allowed us to reduce the number of radial basis functions down to 50.

[Figure 17 (εn against n, 0 ≤ n ≤ 1000): One step prediction error εn for laser data of Fig. 3.]

It must be stressed that the prediction errors shown in Figs. 14-18 are the result of only predicting forward one step at a time. Thus for each n the prediction for ϕn is obtained by applying Gλ to the last d observed values ϕn-d, ϕn-d+1, …, ϕn-1. In many applications one may wish to forecast further ahead. This can be done either by directly estimating a function which predicts k steps ahead, or by iterating a one step forecast. In general, the latter method appears to be preferable ([Farmer and Sidorowich, 1988] and [Casdagli, 1989]).
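Iterating a one step forecast is simple to express in code; a minimal sketch, where predict is any one step forecaster (for instance the radial basis approximation fitted above):

```python
import numpy as np

def iterate_forecast(history, steps, predict, d):
    """Forecast several steps ahead by feeding one step predictions back in."""
    buf = list(history[-d:])        # the last d observed values
    out = []
    for _ in range(steps):
        nxt = predict(np.array(buf[-d:]))   # one step ahead
        out.append(nxt)
        buf.append(nxt)             # predicted value treated as observed
    return np.array(out)

# e.g. xhat = iterate_forecast(phi[:501], 1000,
#                              lambda v: rbf_eval(v, centres, lam, 10.0), d=2)
```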

Both techniques, however, will rapidly encounter fundamental limits to the long term predictability of chaotic time series due to the exponential divergence of trajectories found in such systems (again see [Farmer and Sidorowich, 1988] and [Casdagli, 1989] for a fuller discussion). This is illustrated in Figure 19, where we plot the prediction error εn = x̂n − xn for multi-step predictions, always predicting from the same point v500 = (x499,x500). Here x̂n is defined by iterating Gλ forward from this point. Thus x̂n = Gλ(x̂n-2, x̂n-1) with initial conditions (x̂499, x̂500) = (x499,x500). Other details are as for Fig. 14. We see that the error rapidly rises, so that it is impossible to make any kind of prediction beyond about 50 time steps ahead. Notice the obvious similarity to Fig. 5. Sugihara and May [1990] have suggested that this phenomenon can be used to detect chaotic behaviour in time series (see also [Tsonis and Elsner, 1992]).

[Figure 18 (εn against n, 0 ≤ n ≤ 2500): Prediction error εn for the laser data using a data sample of 1500 points.]

[Figure 19 (εn against n, 500 ≤ n ≤ 1500): Multi-step prediction error εn for the Hénon time series.]


RECURSIVE PREDICTION

One disadvantage of the algorithm described above is that it is very much a “batch” technique. It thus calculates an estimate Gλ of G once and for all using a predetermined block of observations ϕ1, …, ϕN. There is then no way of updating Gλ using further observations ϕN+1, ϕN+2, … as they are made. Should one decide to use a larger data sample to estimate G, one has to discard the previous estimate and recalculate a new estimate from the beginning. This leads to several disadvantages:

a) it limits the number of data points ϕ1, …, ϕN that can be used in the estimation process. This is because for a given value of N, we have to form and manipulate the (N-d)×p matrix A and hence the above algorithm has a memory requirement of order Np. When p is of the order of 102 as above, or even larger, this rapidly becomes a serious restriction on the size of N.

b) no useful predictions can be made until all the observations ϕ1, …, ϕN have been made and processed. In many applications it would be preferable to start making predictions (albeit rather bad ones) right from the start and have their quality improve as more and more data is assimilated.

c) in many situations the function G may not be stationary but will vary slowly with time, or occasionally change suddenly. In such an environment a batch estimation scheme will be very unsatisfactory, since it will repeatedly have to discard its previous estimate of G and compute a new one starting from scratch. Furthermore, as mentioned in b), during each such recalculation there will be a delay before the estimate based on new data becomes available.

It would thus be better, particularly for real time signal processing applications, to use prediction algorithms which continuously update the estimate Gλ using new observations ϕN+1, ϕN+2, … as they are made. It turns out that this can be done using the framework of recursive least squares estimation (e.g. [Young, 1984]). Recursive least squares techniques are of course well known in linear signal processing and form the basis of most adaptive filter architectures (e.g. [Alexander, 1986]). Unfortunately, the standard least squares algorithms used in such linear schemes are not sufficiently stable or accurate for application to chaotic time series and it proves necessary to use more sophisticated recursive approaches such as the Recursive Modified Gram-Schmidt (RMGS) algorithm of Ling et al. [1986]. This yields results comparable to those obtained from the batch procedure described above [Stark, 1993].

5. NOISE REDUCTION AND SIGNAL SEPARATION

So far we have considered the time series {ϕn} in isolation. In most applications, however, we are unlikely to be given such a pure chaotic signal. Instead, we will be asked to manipulate a mixture ψn = ϕn + ξn of a chaotic time series {ϕn} and some other signal {ξn}. The latter may represent noise, in which case we want to remove it from {ψn}, or it may be a signal that we wish to detect, in which case we want to extract it from {ψn} and discard {ϕn}. An example of the latter might be a faint speech signal {ξn} masked by deterministic “noise” {ϕn} coming from some kind of vibrating machinery, such as an air conditioner (e.g. [Taylor, 1991]). In both cases, the mathematical problem that we face amounts to separating {ψn} into its two components {ϕn} and {ξn}. This is of course closely related to the property of “shadowing” which is found in many chaotic systems, and has already been mentioned by Colin Sparrow. This is because in the reconstructed state space the points un = (ψn-d+1, ψn-d+2, …, ψn) form a pseudo-orbit close to the real orbit vn = (ϕn-d+1, ϕn-d+2, …, ϕn). Thus finding {ϕn}, given {ψn}, amounts to finding a real orbit which shadows the given pseudo-orbit.


Several schemes have been developed in the last few years to perform this task (e.g. [Kostelich and Yorke, 1988], [Hammel, 1990], [Farmer and Sidorowich, 1991], [Taylor, 1991], [Schreiber and Grassberger, 1991], [Stark and Arumugam, 1992] and Davies [1992, 1993]). Here, we shall outline the overall framework behind all these approaches and illustrate it with a particularly simple method which is aimed at the situation when {ξn} is a relatively slowly varying signal [Stark and Arumugam, 1992].

First observe that the decomposition of {ψn} into {ϕn} and {ξn} is not unique. In fact, we can choose ϕ̃1, …, ϕ̃d arbitrarily, define the sequence {ϕ̃n} recursively by ϕ̃n = G(ϕ̃n-d, ϕ̃n-d+1, …, ϕ̃n-1) and obtain the trivial decomposition ψn = ϕ̃n + (ψn − ϕ̃n). It is thus necessary to impose some additional constraints on {ξn}. The most common is to minimize the size of {ξn} with respect to some appropriate norm [Farmer and Sidorowich, 1991]. In many cases, just requiring {ξn} to be “small” for all n is sufficient to establish uniqueness. To see this, note that if {ϕn} is chaotic, then for most choices of ϕ̃1, …, ϕ̃d, the distance |ϕ̃n − ϕn| will grow rapidly (if it is not large already). Hence if {ξn} is small, ξ̃n = ψn − ϕ̃n = ϕn − ϕ̃n + ξn will be large for at least some values of n. This is in fact the same argument that shows that for many chaotic systems there can only be one shadowing orbit which stays close to a given pseudo-orbit.

From now on we shall therefore restrict ourselves to the situation where {ξn} is small in comparison to {ϕn}. The problem of separating {ψn} into its components then naturally falls into two parts:

a) Performing the decomposition when the function G is known.

b) Estimating the function G from the combined time series {ψn} (rather than from {ϕn}).

We already have all the tools required to solve the second problem. The basic idea is to apply the techniques of the last section to a large sample of {ψn}, in which case, with some luck, most of the effects of {ξn} will average out and a reasonable estimate of G can be made. We can then proceed iteratively, using this estimate of G to obtain an estimate {ξ̃n} of {ξn}. Then ψ̃n = ψn − ξ̃n should be much more deterministic than {ψn} and hence we should be able to obtain a better estimate of G from it. This procedure can then be repeated as often as necessary.

[Figure 20 (ξn against n, 0 ≤ n ≤ 1000): Signal added to the Hénon time series for the signal extraction examples.]


Let us thus turn our attention to the first problem, namely that of performing the decomposition when G is known. The basic approach is to look for a discrepancy δn = ψn − G(ψn-d, ψn-d+1, …, ψn-1) between the observed value of ψn and that which is predicted by the deterministic dynamics G(ψn-d, ψn-d+1, …, ψn-1). If ξn-d = ξn-d+1 = … = ξn = 0 then this discrepancy will be zero. A non-zero value of δn will therefore indicate the presence of some non-trivial signal. This is illustrated by Figures 20 and 21. The signal {ξn} from Fig. 20 was added to the Hénon time series from Fig. 1. An estimate Gλ of G was made from the combined time series {ψn} and a one step prediction was made, just as for Fig. 14. This was then subtracted from {ψn} to yield {δn}, as shown in Fig. 21. It is clear that this procedure is able to detect the presence of the pulse and even to some extent extract its qualitative features, but does not yield much in the way of quantitative information.

[Figure 21 (δn against n, 0 ≤ n ≤ 1000): Discrepancy δn when the above signal is added to the Hénon time series.]

To proceed further we expand δn assuming that {ξn} is small:

δn = ψn − G(ψn-d, ψn-d+1, …, ψn-1)
= ϕn + ξn − G(ϕn-d + ξn-d, ϕn-d+1 + ξn-d+1, …, ϕn-1 + ξn-1) (21)
≅ ξn − Σi=1..d ξn-i ∂G(ψn-d, …, ψn-1)/∂ψn-i (22)

Allowing n to vary, Eq. 22 gives a set of simultaneous linear equations for {δn}. These lie at the heart of most approaches to signal separation. Although several different techniques can be used to solve these equations, great care has to be taken when the dynamics of {ϕn} is chaotic, since in that case this set of equations becomes very badly conditioned. One possibility is to use the singular value decomposition, which is able to cope with very badly conditioned linear problems. This is essentially the technique used by Farmer and Sidorowich [1991]. They in fact solve Eq. 22 repeatedly, regarding it as a Newton step in solving the full non-linear problem given by Eq. 21. They also impose additional equations designed to ensure that the final {ϕn} has minimal norm. Davies [1992 and 1993] gives an elegant framework for this kind of approach.


Here, however, we shall describe a simpler approach that can be used if we make additional assumptions about the properties of {ξn}. In particular we assume that {ξn} is slowly varying. This for instance is the case for the signal in Fig. 20, except in a small neighbourhood of the transient (and as we shall see our scheme works reasonably well even there). We thus assume that |ξn − ξn-1| is small in comparison to |ξn|. It is then reasonable to set ξn-d ≅ ξn-d+1 ≅ … ≅ ξn in Eq. 22, which gives

δn ≅ ξn (1 - Jn) (23)

where

Jn = Σi=1..d ∂G(ψn-d, …, ψn-1)/∂ψn-i (24)

This gives

ξ̃n ≅ δn/(1 − Jn) (25)

as an estimate for ξn. This turns out to work extremely well, except for the occasional cases when Jn comes close to 1. This can be overcome by noting that under the assumption ξn-d ≅ ξn-d+1 ≅ … ≅ ξn, Eq. 25 is an equally valid estimate for any of ξn-d, ξn-d+1, …, ξn. Thus, it is reasonable to use any of ξ̃n, ξ̃n+1, …, ξ̃n+d as an estimate for ξn. Since the error in Eq. 21 is essentially inversely proportional to |1 − Jn|, the best estimate will be given by that ξ̃k for which the corresponding |1 − Jk| is maximized. Let k(n) be this value, so that n ≤ k(n) ≤ n+d and |1 − Jk(n)| ≥ |1 − Ji| for all n ≤ i ≤ n+d. Our best estimate for ξn is then ξ̃k(n). This is shown plotted in Figure 22, and as we can see it has managed to recover the original pulse almost perfectly. We stress that all the information required for this has come from the combined signal {ψn}, and in particular the dynamics Gλ was estimated from this.
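For the radial basis approximation used here the partial derivatives in Eq. 24 are available in closed form, since dρ(‖v − w‖)/dvi = (vi − wi)/ρ. A minimal sketch of the whole scheme (Eqs. 23-25), assuming the rbf_eval helper from earlier and a model (centres, lam, c) already fitted to {ψn}:

```python
import numpy as np

def rbf_J(v, centres, lam, c):
    """J_n of Eq. 24: the sum of the partial derivatives of G_lambda at v."""
    diff = np.asarray(v) - centres                    # rows v - w_j
    rho = np.sqrt(np.sum(diff * diff, axis=1) + c)
    return float(np.sum(lam[:, None] * diff / rho[:, None]))

def extract_signal(psi, centres, lam, c, d):
    """Estimate a slowly varying {xi_n} hidden in psi_n = phi_n + xi_n."""
    N = len(psi)
    delta = np.full(N, np.nan)
    J = np.full(N, np.nan)
    for n in range(d, N):
        v = psi[n - d : n]                                 # (psi_{n-d}, ..., psi_{n-1})
        delta[n] = psi[n] - rbf_eval(v, centres, lam, c)   # discrepancy delta_n
        J[n] = rbf_J(v, centres, lam, c)
    xi = delta / (1.0 - J)                                 # Eq. 25
    out = np.full(N, np.nan)
    for n in range(d, N - d):
        # use the estimate xi_k, n <= k <= n+d, with |1 - J_k| largest
        k = n + int(np.argmax(np.abs(1.0 - J[n : n + d + 1])))
        out[n] = xi[k]
    return out
```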

[Figure 22 (ξ̃k(n) against n, 0 ≤ n ≤ 1000): Recovered signal ξ̃k(n).]

Observe that in traditional signal processing terms, if we identify {ϕn} as the “noise” contaminating the signal {ξn}, we are able to recover the signal at a signal to noise ratio of -80dB. This is of course far in excess of what could be done using conventional linear filtering. In this context, it should be pointed out that, in common with most such schemes, the above algorithm’s performance increases as the amplitude of {ξn} decreases, down to some lower limit set by numerical inaccuracy. This is in complete contrast to conventional signal processing techniques, where signal extraction performance deteriorates with decreasing signal amplitude.

6. CONTROLLING CHAOTIC SYSTEMS

So far, we have considered the presence or absence of chaos in a given system as something beyond our control. In many cases, however, we would like to avoid chaotic behaviour, or at least modify it in such a way as to perform a useful function. One approach would be to make large and possibly costly alterations to the system to completely change its dynamics and ensure that the modified system is not capable of chaotic behaviour. This may, however, not be feasible for many practical systems. An alternative, due to Ott et al. [1990], is to attempt to control the system using only small time dependent perturbations of an accessible system parameter. The key fact behind their idea is that a typical chaotic system, such as the Hénon map, will contain an infinite number of unstable periodic orbits. Normally, these will not be observed, precisely because they are unstable, but any typical trajectory of the system will come arbitrarily close to them infinitely often. This can be seen in Fig. 1, where {xn} comes close to an unstable fixed point near to x = 0.6, at about n = 150 and again at n = 640 and n = 690.

The effects of the unstable periodic orbits can thus be seen in an observed time series {ϕn}. Several groups have shown that an accurate estimate of their position and eigenvalues can be derived from the time series (e.g. [Auerbach et al., 1987] and [Lathrop and Kostelich, 1989]). Such estimates make use of many of the techniques for modelling chaotic time series that we have described in the previous sections. As usual, Takens’s theorem is used to reconstruct the original dynamical system, whilst a local estimate to the function G around a periodic orbit can be used to calculate the orbit’s eigenvalues.

The basic idea behind controlling a chaotic system is to choose one of these periodic orbits and attempt to stabilize it by small perturbations of a parameter. This is possible precisely because chaotic systems are so sensitive to small changes. Normally, such sensitivity simply leads to instability and complex behaviour of the kind seen in Fig. 1. However, if the perturbations are carefully chosen, they can push the system into the desired periodic regime and then keep it there. Ott et al. [1990] first demonstrated the feasibility of their algorithm using numerical simulations, but since then it has been applied successfully to the control of a variety of real systems ([Ditto et al., 1990], [Singer et al., 1991] and [Hunt, 1991]). Once again, this algorithm relies on a local approximation of the function G around the periodic orbit.
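As an illustration, here is a minimal sketch of this kind of parameter control for the fixed point of the Hénon map. For simplicity the linearization A at the fixed point, the unstable eigenvalue λu with its left eigenvector fu, and the parameter sensitivity g are computed directly from the known map; in the time series setting they would instead be estimated from the reconstructed dynamics as described above. The perturbation δa is chosen so that the next deviation from the fixed point has no component along the unstable direction.

```python
import numpy as np

a0, b = 1.4, 0.3                          # standard Henon parameters

def henon(x, y, a):
    return 1.0 - a * x * x + y, b * x

# Fixed point with y* = b x*: solves a x^2 + (1 - b) x - 1 = 0.
xf = (-(1 - b) + np.sqrt((1 - b) ** 2 + 4 * a0)) / (2 * a0)
fixed = np.array([xf, b * xf])

A = np.array([[-2 * a0 * xf, 1.0],        # Jacobian of the map at the fixed point
              [b, 0.0]])
g = np.array([-xf * xf, 0.0])             # sensitivity of the map to the parameter a

# Left unstable eigenvector: eigenvectors of A.T are left eigenvectors of A.
evals, evecs = np.linalg.eig(A.T)
u = int(np.argmax(np.abs(evals)))
lam_u, fu = evals[u].real, evecs[:, u].real

x, y = 0.1, 0.1
da_max = 0.02                             # only small perturbations are permitted
for n in range(2000):
    dx = np.array([x, y]) - fixed
    # Require fu . (A dx + g da) = 0, i.e. kill the unstable component
    da = -lam_u * (fu @ dx) / (fu @ g)
    if abs(da) > da_max:                  # too far away: let the chaos run freely
        da = 0.0
    x, y = henon(x, y, a0 + da)
```

Because a typical chaotic trajectory comes arbitrarily close to the fixed point infinitely often, the uncontrolled transient eventually enters the region where the required |δa| is within the permitted bound, after which the orbit is captured and held.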

One of the great advantages of this approach is that there is potentially a large number of different periodic orbits which we can stabilize. We can thus choose precisely that orbit which gives the best system performance in a given application. Furthermore, we can easily switch amongst the different orbits available, again using only small changes in the control parameter. In principle, it should thus be possible to obtain many substantially different classes of behaviour from the same chaotic system. This is in complete contrast to systems which lack a chaotic attractor and operate at a stable equilibrium or periodic orbit. For such systems, small parameter perturbations can only move the orbit by a small amount, but cannot generally lead to dramatically different behaviour. One is thus essentially restricted to whatever behaviour is given by the stable orbit, and it is difficult to make substantial improvements in performance without major changes to the system.

A further extension of this idea is described in [Shinbrot et al., 1990]. Here, rather than aiming to operate the system in a given periodic steady state, one tries to direct the state of the system to a desired target state in as short a time as possible, using only small perturbations of the control parameter. It turns out that essentially the same framework as above can be used to achieve this, and once again a chaotic system’s extreme sensitivity to small perturbations can be turned to our advantage.

7. CHARACTERIZING CHAOTIC TIME SERIES

Given an apparently complex time series {ϕn} such as those in Figs. 1–3, how do we tell whether it has come from a deterministic chaotic system or from some kind of stochastic process such as the ARMA model of Eq. 2? A variety of techniques for answering this question exist, all based upon the reconstruction technique described in §3.

Thus, from the observed series {ϕn : n = 1, …, N} we form the d-dimensional reconstructed orbit {vn = (ϕn-d+1, ϕn-d+2, …, ϕn) : n = d, …, N}. As mentioned above, this is in some sense equivalent to the original orbit {xn = x(nτ) : n = d, …, N} in the unknown phase space; more precisely, the co-ordinate independent properties of {vn} are the same as those of {xn}. The basic idea for characterizing chaotic behaviour is therefore to compute one or more quantities which are invariant under co-ordinate changes and which indicate the presence or absence of chaos.
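For concreteness, here is a minimal sketch of this reconstruction step, assuming the observations are held in a NumPy array phi; the helper name delay_embed is our own and is reused in the sketches below.

```python
import numpy as np

def delay_embed(phi, d):
    """Return the reconstructed orbit, one d-dimensional vector per row:
    v_n = (phi_{n-d+1}, ..., phi_n) for n = d, ..., N."""
    N = len(phi)
    return np.column_stack([phi[i : N - d + 1 + i] for i in range(d)])
```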

As already described in Colin Sparrow’s lecture, and in §2, one of the most fundamental characteristics of chaotic behaviour is the exponential divergence of nearby trajectories. We saw this in Fig. 5, and saw its effects on our ability to make long range predictions in Fig. 19. Recall that the rate of this divergence is measured using Liapunov exponents. Thus if x′n is a small perturbation of xn we expect to see that

|x′n − xn| ~ e^{λn}    (26)

for some constant λ. In fact, there will be m different values of λ, depending on the direction of the initial perturbation. These m values are known as the Liapunov exponents of the original dynamical system (e.g. [Eckmann and Ruelle, 1985], [Abarbanel et al., 1993]), and it is usual to describe the system as chaotic if at least one is positive. Now, note that if x0 were an equilibrium point then λ would just be an eigenvalue of the linearized differential equation around this equilibrium. It is well known that eigenvalues are preserved under co-ordinate changes (this point was already implicit in our discussion of chaotic control above), and in exactly the same way it can be shown that Liapunov exponents remain invariant under smooth co-ordinate changes, and in particular that the exponents for the reconstructed system are the same as those for the original dynamics. In other words, we can estimate the original Liapunov exponents by observing the separation of two nearby reconstructed trajectories v′n and vn. The best way of doing this is to estimate the local dynamics at vn for each n, as briefly described under local prediction in §3. The linear part of this (i.e. the Jacobian matrix) gives us the behaviour of small perturbations close to vn. The resulting linearizations for each n can then be composed (i.e. by multiplication of the corresponding matrices) to yield an estimate of the rate of divergence over the whole of the observed time series. Details of practical algorithms can be found in [Eckmann and Ruelle, 1985], [Wolf et al., 1985], [Eckmann et al., 1986] and [Abarbanel et al., 1993].
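A minimal sketch of this composition step, assuming the local d × d Jacobian matrices have already been estimated from the reconstructed data as in §3. Re-orthonormalizing with a QR decomposition at each step is a standard device in the practical algorithms cited above: it prevents numerical overflow and lets the diagonal of R accumulate the stretching rates in every direction.

```python
import numpy as np

def liapunov_exponents(jacobians):
    """Estimate all d Liapunov exponents (per time step) from a sequence
    of local d x d Jacobian matrices along the reconstructed orbit."""
    d = jacobians[0].shape[0]
    Q = np.eye(d)
    log_stretch = np.zeros(d)
    for J in jacobians:
        # propagate an orthonormal frame, then re-orthonormalize
        Q, R = np.linalg.qr(J @ Q)
        log_stretch += np.log(np.abs(np.diag(R)))
    return log_stretch / len(jacobians)
```

A positive leading entry of the returned vector is then the signature of chaos referred to above.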

Such numerical determination of Liapunov exponents has been very popular, and has been attempted with a wide variety of observed time series. Unfortunately, estimates of Liapunov exponents tend to converge rather slowly, and hence large quantities of high quality data are required to obtain accurate results. In many cases sufficient data will not be available, and attempting to draw conclusions from such estimates is at best a waste of time and at worst can be highly misleading. It is probably fair to say that many of the published papers claiming to have detected chaos in observed time series should therefore be treated with healthy scepticism.

An alternative invariant is a quantity called the correlation dimension DC (e.g. [Eckmann and Ruelle, 1985], [Abarbanel et al., 1993]). This ignores the dynamics and concentrates on the properties of the attractor in IRm on which the points {xn} lie. In particular, it attempts to measure the number of variables required to describe this set. If all the xn are identical (so that all the ϕn are constant) then DC will be zero. If they lie on some smooth curve then DC will equal 1, and if they fill a plane it will equal 2. At the other extreme, if the xn completely fill IRm then we will have DC = m. Intriguingly, DC need not be an integer (it is about 1.21… for the Hénon map) and is thus an example of a fractal dimension.

As with the Liapunov exponents, the correlation dimension is co-ordinate independent and hence is the same for {xn} and {vn}. It can be estimated from a finite sample {vn : n = 1, …, N} as follows. First form all the N² possible pairs (vi, vj) of such points. Calculate the Euclidean distance rij between each pair vi and vj. Recall that an expression for this in terms of the original time series {ϕn} was already given in Eq. 20.

Now, for a given ε, let N(ε) be the number of pairs (i, j) such that rij ≤ ε. Then C(ε) = N(ε)/N² is the proportion of pairs of points within a distance ε of each other. A simple calculation shows that if all the points {vn} lie randomly on some curve then roughly C(ε) ~ ε for large N and small ε. Similarly, if the {vn} lie on a surface we get C(ε) ~ ε². This suggests that C(ε) behaves as a power of ε for small ε, with the exponent giving the dimension of the set on which the {vn} lie. This motivates the definition of the correlation dimension

DC = lim(ε→0) log C(ε) / log ε    (27)

The quantity C(ε) is called a correlation integral and can be readily estimated numerically. The main difficulty lies in finding all the points within a distance ε of a given point vi. Several efficient algorithms exist for doing this, even for moderately large data sets (e.g. N = 10⁶) (e.g. [Theiler, 1987], [Bingham and Kot, 1989], [Grassberger, 1990]). Note that exactly the same problem (of finding all the points in a given neighbourhood) arises in the local prediction algorithms referred to in §3, and indeed such searching is by far the slowest part of such techniques. Once C(ε) has been computed for a range of ε, we can estimate DC by plotting log C(ε) against log ε and estimating the slope of the resulting line. Again, a reasonable size data set is required to obtain meaningful values of DC, but probably not as large as is needed for the computation of Liapunov exponents. Nevertheless, published conclusions should be treated with some caution.
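A brute-force sketch of this estimate, assuming NumPy and SciPy are available; it counts all pairs directly (O(N²) work) rather than using one of the efficient neighbour-search schemes cited above, so it is suitable only for modest N. delay_embed is the helper sketched earlier.

```python
import numpy as np
from scipy.spatial.distance import pdist

def correlation_integral(V, eps_values):
    """C(eps): proportion of pairs of reconstructed vectors within eps."""
    r = pdist(V)                  # all pairwise Euclidean distances
    return np.array([np.mean(r <= eps) for eps in eps_values])

def correlation_dimension(V, eps_values):
    """Slope of log C(eps) against log eps over the chosen range of eps."""
    C = correlation_integral(V, eps_values)
    slope, _ = np.polyfit(np.log(eps_values), np.log(C), 1)
    return slope
```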

So far, we have assumed that we know the size of d required to reconstruct the time series {ϕn}. When this is not the case, we proceed by trial and error. Thus, we calculate a correlation dimension DC(d) for each trial choice of d. When d is too small, the set {vn} will completely fill IRd and we will get DC(d) ≅ d (we must of course always have 0 ≤ DC(d) ≤ d). Conversely, once d is sufficiently large, the computed value DC(d) should stabilize at approximately the correct correlation dimension of {xn}. As an example, for the Hénon map we get DC(1) ≅ 1 and DC(2) ≅ DC(3) ≅ DC(4) ≅ 1.21.
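In terms of the sketches above, this trial-and-error procedure is simply the following loop, with phi the observed series and eps_values a hypothetical range of scales chosen by inspection of the log-log plot.

```python
# watch DC(d) saturate as the trial embedding dimension grows
for d in range(1, 11):
    V = delay_embed(phi, d)
    print(d, correlation_dimension(V, eps_values))
```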

When DC is not an integer, as in this case, we say that the system contains a strange attractor. This is usually a sign of chaos (as indicated by a positive Liapunov exponent), although strange non-chaotic systems do exist (but are currently believed to be pathological). Note also that many other fractal dimensions exist to characterize fractal sets, but DC is probably the easiest to estimate numerically.


It may of course happen that DC(d) continues to grow with d. This usually suggests that the time series {ϕn} was generated by a stochastic process rather than by a chaotic dynamical system. Thus, for example, a random process such as white noise will have DC = ∞. Of course, we would also get this result if the original dynamical system was genuinely infinite dimensional, but from many points of view such a system is indistinguishable from a random one. A variety of statistical tests for distinguishing chaos from stochastic noise have been devised, but their utility in real applications is still rather limited.

In the above procedure, the smallest value of d at which DC(d) begins to stabilize yields the minimal embedding dimension required to adequately represent the dynamics of the system. Computing the correlation dimension in this way thus yields both a measure of a time series’s complexity and an estimate of the embedding dimension required for any further processing. It is thus usually the first step in analysing a time series which we suspect might have been generated by a chaotic system. In practice, lack of data and limited numerical precision restrict calculations to about d ≤ 10 (and hence DC ≤ 10). From the point of view of these notes, therefore, any time series with DC appreciably larger than 10 can be treated more or less as a truly random one.

Recently, several authors ([Čenys and Pyragas, 1988], [Liebert et al., 1991], [Kennel et al., 1992] and [Buzug and Pfister, 1992]) have pointed out that estimating d by the above procedure is unnecessarily complicated, and that one can attempt to determine it directly. The basic idea is that if d is sufficiently large, then if two points (ϕi-d+1, ϕi-d+2, …, ϕi) and (ϕj-d+1, ϕj-d+2, …, ϕj) are close in IRd, then when we increase d by one the two points (ϕi-d+1, ϕi-d+2, …, ϕi, ϕi+1) and (ϕj-d+1, ϕj-d+2, …, ϕj, ϕj+1) should still be close in IRd+1. On the other hand, if d is too small, then (ϕi-d+1, ϕi-d+2, …, ϕi, ϕi+1) and (ϕj-d+1, ϕj-d+2, …, ϕj, ϕj+1) may be much further away from each other than (ϕi-d+1, ϕi-d+2, …, ϕi) and (ϕj-d+1, ϕj-d+2, …, ϕj) were. Of course, all this is saying is that if d is sufficiently large to give us a proper reconstruction, then if (ϕi-d+1, ϕi-d+2, …, ϕi) and (ϕj-d+1, ϕj-d+2, …, ϕj) are close, then so must be the values ϕi+1 and ϕj+1. Hence, in the context of §3, a local 0th order predictor (which we called “forecasting by analogy”) will yield reasonable forecasts. We thus see that the above approach is simply testing the crude predictability of the time series for various values of d. There seems to be no reason to restrict oneself to 0th order local predictors, which suggests that the best way of determining d is a purely empirical one: estimate G for each candidate value of d and use the smallest one which gives a good fit.
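The neighbour comparison described above is easy to make explicit. The following minimal sketch is in the spirit of the false nearest neighbours test of [Kennel et al., 1992], with a brute-force nearest-neighbour search and an illustrative threshold; delay_embed is as above.

```python
import numpy as np

def false_neighbour_fraction(phi, d, threshold=10.0):
    """Fraction of nearest neighbours in IR^d which separate sharply when
    the embedding is extended to IR^(d+1); a small value suggests that d
    is already large enough."""
    V = delay_embed(phi, d)       # row i holds (phi[i], ..., phi[i+d-1])
    n = len(V) - 1                # keep one extra sample for the extension
    false = 0
    for i in range(n):
        dists = np.linalg.norm(V[:n] - V[i], axis=1)
        dists[i] = np.inf         # exclude the point itself
        j = int(np.argmin(dists))
        # separation gained from the additional (d+1)-th coordinate
        if abs(phi[i + d] - phi[j + d]) > threshold * dists[j]:
            false += 1
    return false / n
```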


FURTHER READING

The recently published reprint collection by Ott et al. [1994] contains an excellent selection of the most important papers in this area, and includes a 60 page introduction by the authors. Two recent reviews, by Abarbanel et al. [1993] and by Grassberger et al. [1992], also provide comprehensive surveys of chaotic time series analysis, with many examples of applications. Finally, a large part of volume 54 of the Journal of the Royal Statistical Society B is devoted to this subject, and may be particularly useful to those interested in the more statistical aspects of chaotic time series.

REFERENCES AND BIBLIOGRAPHY

H. D. I. Abarbanel, R. Brown and J. B. Kadtke, 1989, Prediction and System Identification in Chaotic Time Series with Broadband Fourier Spectra, Phys. Lett. A, 138, 401-408.

H. D. I. Abarbanel, R. Brown, J. J. Sidorowich, L. S. Tsimring, 1993, The Analysis of Observed Chaotic Data in Physical Systems, Rev. Mod. Phys., 65, 1331-1392.

S. T. Alexander, 1986, Adaptive Signal Processing, Theory and Applications, Springer-Verlag.

D. Auerbach, P. Cvitanović, J.-P. Eckmann, G. Gunaratne and I. Procaccia, 1987, Exploring Chaotic Motion Through Periodic Orbits, Phys. Rev. Lett., 58, 2387-2389.

S.A. Billings, M.J. Korenberg and S. Chen, 1988, Identification of Non-linear Output-Affine Systems Using an Orthogonal Least Squares Algorithm, Int. J. Systems Sci., 19, 1559-1568.

S. Bingham and M. Kot, 1989, Multidimensional Trees, Range Searching, and a Correlation Dimension Algorithm of Reduced Complexity, Phys. Lett. A, 140, 327-330.

A. Blyth, 1992, Least Squares Approximation of Chaotic Time Series Using Radial Basis Functions, MSc Dissertation, Department of Mathematics, Imperial College.

D. S. Broomhead and D. Lowe, 1988, Multivariable Functional Interpolation and Adaptive Networks, Complex Systems, 2, 321-355.

J. R. Bunch and C. P. Nielsen, 1978, Updating the Singular Value Decomposition, Numerische Mathematik, 31, 111-129.

Th. Buzug and G. Pfister, 1992, Optimal Delay Time and Embedding Dimension for Delay-Time Coordinates by Analysis of the Global Static and Local Dynamical Behaviour of Strange Attractors, Phys. Rev. A, 45, 7073-7084.

M. Casdagli, 1989, Nonlinear Prediction of Chaotic Time Series, Physica D, 35, 335-356.

M. Casdagli, 1992, Chaos and Deterministic versus Stochastic Non-linear Modelling, J. Roy. Stat. Soc. B, 54, 303-328.

M. Casdagli, S. Eubank, J. D. Farmer, and J. Gibson, 1991, State Space Reconstruction in the Presence of Noise, Physica D, 51, 52-98.

A. Čenys and K. Pyragas, 1988, Estimation of the Number of Degrees of Freedom from Chaotic Time Series, Phys. Lett. A, 129, 227-230.

S. Chen, S.A. Billings and W. Luo, 1989, Orthogonal Least Squares Methods and Their Application to Non-linear Systems Identification, Int. J. Control, 50, 1873-1896.

S. Chen, S.A. Billings, C. F. N. Cowan and P. M. Grant, 1990a, Practical Identification of NARMAX Models Using Radial Basis Functions, Int. J. Control, 52, 1327-1350.


S. Chen, S.A. Billings, C. F. N. Cowan and P. M. Grant, 1990b, Non-linear Systems Identification Using Radial Basis Functions, Int. J. Systems Sci., 21, 2513-2539.

S. Chen, C. F. N. Cowan and P. M. Grant, 1991, Orthogonal Least Squares Learning Algorithm forRadial Basis Function Networks, IEEE Trans. Neural Networks, 2, 302-309.

J. P. Crutchfield and B. S. McNamara, 1987, Equations of Motion from a Data Series, Complex Systems, 1, 417-452.

J.-P. Eckmann and D. Ruelle, 1985, Ergodic Theory of Chaos and Strange Attractors, Rev. Mod. Phys., 57, 617-656.

J.-P. Eckmann, S. Oliffson Kamphorst, D. Ruelle and S. Ciliberto, 1986, Liapunov Exponents from a Time Series, Phys. Rev. A, 34, 4971-4979.

M.E. Davies, 1992, An Iterated Function Approximation in Shadowing Time Series, Phys. Lett. A, 169, 251-258.

M.E. Davies, 1993, Noise Reduction by Gradient Descent, Int. J. of Bifurcation and Chaos, 3, 113-118.

W.L. Ditto, S.N. Rauseo and M.L. Spano, 1990, Experimental Control of Chaos, Phys. Rev. Lett., 65, 3211-3214.

J. D. Farmer and J. J. Sidorowich, 1987, Predicting Chaotic Time Series, Phys. Rev. Lett., 59, 845-848.

J. D. Farmer and J. J. Sidorowich, 1991, Optimal Shadowing and Noise Reduction, Physica D, 47, 373-392.

A. M. Fraser and H. L. Swinney, 1986, Independent Coordinates for Strange Attractors from Mutual Information, Phys. Rev. A, 33, 1134-1140.

M. Giona, F. Lentini and V. Cimagalli, 1991, Functional Reconstruction and Local Prediction of Chaotic Time Series, Phys. Rev. A, 44, 3496-3502.

P. Grassberger, 1990, An Optimized Box-Assisted Algorithm for Fractal Dimensions, Phys. Lett. A,148, 63-68.

A. Block, W. von Bloh and H.J. Schellnhuber, 1990, Efficient Box-Counting Determination of Generalized Fractal Dimensions, Phys. Rev. A, 42, 1869-1874.

P. Grassberger, T. Schreiber and C. Schaffrath, 1992, Non-linear Time Sequence Analysis, Int. J. of Bifurcation and Chaos, 1, 521-547.

S. M. Hammel, 1990, A Noise Reduction Method for Chaotic Systems, Phys. Lett. A, 148, 421-428.

E.R. Hunt, 1991, Stabilizing High-Period Orbits in a Chaotic System: the Diode Resonator, Phys. Rev. Lett., 67, 1953-1955.

J. Jiménez, J. A. Moreno and G. J. Ruggeri, 1992, Forecasting on Chaotic Time Series: A Local Optimal Linear-Reconstruction Method, Phys. Rev. A, 45, 3553-3558.

M. B. Kennel, R. Brown and H. D. I. Abarbanel, 1992, Determining Embedding Dimension for Phase-Space Reconstruction Using a Geometrical Construction, Phys. Rev. A, 45, 3403-3411.

M.J. Korenberg, S.A. Billings, Y.P. Liu and P.J. McIlroy, 1988, Orthogonal Parameter Estimation Algorithm for Non-linear Stochastic Systems, Int. J. Control, 48, 193-210.

E. J. Kostelich and J. A. Yorke, 1988, Noise Reduction in Dynamical Systems, Phys. Rev. A, 38, 1649-1652.


A. S. Lapedes and R. Farber, 1987, Nonlinear Signal Processing Using Neural Networks: Prediction and System Modelling, Technical Report, LA-UR-87-2662, Los Alamos National Laboratory.

D. P. Lathrop and E. J. Kostelich, 1989, Characterization of an Experimental Strange Attractor by Periodic Orbits, Phys. Rev. A, 40, 4028-4031.

C. L. Lawson and R. J. Hanson, 1974, Solving Least Squares Problems, Prentice-Hall, New Jersey.

W. Liebert and H. G. Schuster, 1989, Proper Choice of the Time Delay for the Analysis of Chaotic Time Series, Phys. Lett. A, 142, 107-111.

W. Liebert, K. Pawelzik and H. G. Schuster, 1991, Optimal Embeddings of Chaotic Attractors from Topological Considerations, Europhysics Lett., 14, 521-526.

F. Ling, D. Manolakis and J. G. Proakis, 1986, A Recursive Modified Gram-Schmidt Algorithm for Least-Squares Estimation, IEEE Trans. ASSP, 34, 829-835.

P. S. Linsay, 1991, An Efficient Method of Forecasting Chaotic Time Series Using Linear Interpolation, Phys. Lett. A, 153, 353-356.

P. F. Marteau and H. D. I. Abarbanel, 1991, Noise Reduction in Chaotic Time Series Using Scaled Probabilistic Methods, J. Nonlin. Sci., 1, 313-343.

M. Muldoon, 1994, Delay Reconstruction for Multiprobe Signals, proceedings of the IEE Colloquium on Exploiting Chaos in Signal Processing.

L. Noakes, 1991, The Takens Embedding Theorem, Int. J. of Bifurcation and Chaos, 1, 867-872.

D. Nychka, S. Ellner, D. McCaffrey and A. D. Gallant, 1992, Finding Chaos in Noisy Systems, J. Roy. Stat. Soc. B, 54, 399-426.

E. Ott, C. Grebogi and J.A. Yorke, 1990, Controlling Chaos, Phys. Rev. Lett., 64, 1196-1199.

E. Ott, T. Sauer and J.A. Yorke, 1994, Coping with Chaos, Wiley-Interscience.

K. Pawelzik and H. G. Schuster, 1991, Unstable Periodic Orbits and Prediction, Phys. Rev. A, 43,1808-1812.

M. Pottmann and D.E. Seborg, 1992, Identification of Non-linear Processes Using Reciprocal Multiquadric Functions, J. Proc. Cont., 2, 189-.

W. H. Press, B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, 1988, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, Cambridge.

M. J. D. Powell, 1987, Radial Basis Functions for Multivariable Interpolation: A Review, in Algorithms for Approximation, ed. J. C. Mason and M. G. Cox, Clarendon Press, Oxford.

T. Sauer, J. A. Yorke and M. Casdagli, 1991, Embedology, J. Stat. Phys., 65, 579-616.

T. Schreiber and P. Grassberger, 1991, A Simple Noise-Reduction Method for Real Data, Phys. Lett. A, 160, 411-418.

T. Shinbrot, E. Ott, C. Grebogi and J.A. Yorke, 1990, Using Chaos to Direct Trajectories to Targets, Phys. Rev. Lett., 65, 3215-3218.

J. Singer, Y.-Z. Wang and H.H. Bau, 1991, Controlling a Chaotic System, Phys. Rev. Lett., 66, 1123-1125.

L. A. Smith, 1992, Identification and Prediction of Low Dimensional Dynamics, Physica D, 58, 50-76.

L. A. Smith, K. Godfrey, P. Fox, K. Warwick, 1991, A New Technique for Fault Detection in Multi-Sensor Probes, Control 91, IEE Conference Publication, 332, 1062-1064.


J. Stark, 1993, Recursive Prediction of Chaotic Time Series, J. Nonlinear Sci., 3, 197-223.

J. Stark, 1994, Adaptive Model Selection Using Orthogonal Least Squares Methods, submitted toProc. Roy. Soc. A.

J. Stark and B. Arumugam, 1992, Extracting Slowly Varying Signals from a Chaotic Background,Int. J. of Bifurcation and Chaos, 2, 413-419.

J. Stoer and R. Bulirsch, 1980, An Introduction to Numerical Analysis, Springer-Verlag.

G. Sugihara and R. M. May, 1990, Nonlinear Forecasting as a Way of Distinguishing Chaos from Measurement Error in Time Series, Nature, 344, 734-741.

F. Takens, 1980, Detecting Strange Attractors in Turbulence, in Dynamical Systems and Turbulence, Warwick, 1980, ed. D.A. Rand and L.-S. Young, Lecture Notes in Mathematics, 898, Springer-Verlag.

W. W. Taylor, 1991, Quantifying Predictability for Applications in Signal Separation, to appear inSPIE Proceedings, 1565.

J. Theiler, 1987, Efficient Algorithm for Estimating the Correlation Dimension from a Set of Discrete Points, Phys. Rev. A, 36, 4456-4462.

A. A. Tsonis and J. B. Elsner, 1992, Nonlinear Prediction as a Way of Distinguishing Chaos from Random Fractal Sequences, Nature, 358, 217-220.

A. Wolf, J.B. Swift, H.L. Swinney and J.A. Vastano, 1985, Determining Lyapunov Exponents from a Time Series, Physica D, 16, 285-317.

P. Young, 1984, Recursive Estimation and Time-Series Analysis, An Introduction, Springer-Verlag.