Simple statistical methods for complex ecological dynamicssw15190/talks/isec14.pdf · based models...

Simple statistical methods for complex

ecological dynamics

Simon Wood & Matteo Fasiolo

Mathematical Sciences, University of Bath, U.K.National Centre for Statistical Ecology EPSRC

Two ecological timeseries

0 50 100 150 200 250 300 350

Nicholson’s blowfly’s

Popula

1960 1970 1980 1990

Pine looper moth, Tentsmuir

Two modelling issues

It is appealing to make statistical inferences about mechanismbased models of ecological timeseries, but. . .

◮ Many scientifically useful ecological dynamic models do nottry to get every feature of the data right.

◮ The population dynamics of small animals (insects,rodents, etc.) can be highly non-linear.

Simulated Ricker Model Data (toy)

0 10 20 30 40 50

popula

Nt+1 = erNte−Nt+et , et ∼ N(0, σ2

e), Yt ∼ Poi(φNt) [observed]

. . . a simple toy model, but like many ecological dynamicsystems, it is very non-linear. . .

Varying r in the Ricker

popula

Varying e1 in the Ricker

popula

Naive Bayes or Likelihood inference for Ricker

◮ Suppose we want inferences about θT = (r, φ, σ2e ), from an

observed population data series, y.

◮ Simple Bayesian or Likelihood approaches both require thejoint density fθ(e,y).

◮ MLE requires (approximate) evaluation of∫fθ(e,y)de (or

an EM approach).◮ Bayesian inference requires samples of θ, e from something

∝ fθ(e,y).

◮ We should look at fθ(e,y). . .

log{fθ(e,y)} versus e1 and r

−2 −1 0 1 2

−41000

−39000

−37000

2.5 3.0 3.5 4.0 4.5−50000

−45000

−40000

−35000

. . . trying to integrate out the random effects, e, or sample fromthis density is hopeless.

Solution 1: change what we be try to model

◮ We should not try to make the model reproduce features ofdata that the system would not reproduce itself.

◮ The exact phase of the series is just a noise feature, whichit is not interesting to model.

◮ We should concentrate on phase insensitive statistics of theseries, and on statistics summarizing short term dynamicstructure.

◮ e.g. the ACF, the coefficients of short term autoregressivemodels, and summaries of the marginal distribution of theobservations, or their increments.

◮ This should be done in a way that is flexible enough to dealwith missing data, multiple series and unobserved states.

But hang on a minute. . .

◮ 1D sections, like that shown through fθ(e,y) are only apartial view.

◮ For example, here is a section through a function f(x, z). . .

−3 −2 −1 0 1 2 3

−0.1

◮ Lots of local minimum right?

f(x, y). . .

◮ Wrong!

◮ . . . it has one global minimum.

Solution 2: Work with the state

◮ Recast Ricker problem in terms of state, nt, rather thannoise, et.

◮ Then plots of fθ(n,y) against any nt are benignlyuni-modal.

◮ Suggests that problem ought to be tractable using statespace methods.

◮ But 1D views are still partial: these transects look lovely

0.0 0.2 0.4 0.6 0.8 1.0

◮ but are transects through this. . .

g(x,y)

◮ . . . need to proceed with caution.

State space methods

1. Use direct MCMC on states nt, and parameter, θ. Appearsto work for Ricker, but

◮ Need good starting values for state, which is difficult for anon-linear model with hidden states.

◮ Mixing often slow.◮ Hard to prove that it is sampling the target properly.

2. Use filtering, within MCMC or for MC estimation oflikelihood.

◮ Seems to work, without the initialization and mixing issues.◮ Still hard to prove that the target is being explored

properly.

Filtering in brief

◮ Idea is to work forward in time, iterating the steps

1. Prediction:

p(nt|y0:t−1) =

∫p(nt|nt−1)p(nt−1|y0:t−1)dnt−1

2. Update:

p(nt|y0:t) = p(yt|nt)p(nt|y0:t−1)/p(yt|y0:t−1)

◮ In practice replace p(n∗|y·) by discrete set of {n(i)t } values

(particles) with ‘importance weights’ {w(i)t }.

1. Prediction step then moves the particles, and updates theirweights, perhaps by importance sampling.

2. Update just updates the importance weights.

◮ The model ought to provide a basis for moving particles.

◮ In practice periodically resample particles in proportion totheir weights to avoid particle depletion problems.

More particle Filtering

◮ Likelihood is p(y0:T ) = p(y0)∏

t p(yt|y0:t−1), where

p(yt|y0:t−1) =

∫p(yt|nt)p(nt|y0:t−1)dnt ≃

p(yt|n(i)t )w

◮ Particle filtering should work, provided the modellikelihood is not irregular.

◮ On toy problems, like the Ricker, it appears to work well,provided there is enough process noise.

◮ Intuitively it seems that process noise could smooth out theirregularities seen earlier, but generally it’s hard to proveanything. . .

◮ Let’s look at a toy toy model: If we discretise the statespace of the Ricker model then the likelihood is exactly andefficiently computable (discrete HMM).

Exact likelihood for Discrete Ricker as noise declines

2.0 2.5 3.0 3.5 4.0 4.5

σe = 0.3

log(r)

log−

2.0 2.5 3.0 3.5 4.0 4.5

σe = 0.1

log(r)

log−

2.0 2.5 3.0 3.5 4.0 4.5

σe = 0.05

log(r)

log−

2.0 2.5 3.0 3.5 4.0 4.5

σe = 0.01

log(r)

log−

How the particle filter tracks p(nt|y0:t, θ)

5 10 15 20

p(Nt|θ) when σe = 0.3

5 10 15 20

p(Nt|θ) when σe = 0.01

◮ . . . Density of the state, with attempts to sample byparticle filtering. Mixed success.

Statistical dimension reduction. ABC

◮ The alternative is to try to match carefully chosen phase

insensitive statistics, s, of the data, in a principled manner.

◮ Approximate Bayesian Computation, is one approach forstochastic simulation.

1. For each trial θ∗, data, y∗, are simulated and transformedto statistics, s∗.

2. The accept/reject decision about θ∗ is then based onreplacing the likelihood with I(‖s∗ − s‖k < ǫ).

◮ As ǫ → 0 we simulate from f(θ|s).

◮ Snags: need to choose ‖ · ‖k; as ǫ → 0, the acceptance ratealso → 0; works best with few statistics. Tuning needed.

Less tuning, more assumptions: synthetic likelihood

◮ Convert the observed data y into a vector of phaseinsensitive summary statistics that can be modelled asapprox. normal. i.e. s ∼ N(µθ,Vθ), where θ are the modelparameters.

◮ A ‘synthetic’ likelihood can be evaluated as follows.◮ Given θ, simulate Nr (=500, here) replicates, y∗

1,y∗

2. . ., and

process these exactly as y was processed to obtain replicatestatistics, s∗1, s

2, . . ..◮ Define µθ =

∑is∗i/Nr, S = [s∗

1− µθ, s

2− µθ, . . .] and hence

Vθ = STS/(Nr − 1).◮ ls(θ) = −(s− µθ)

TV−1

θ(s − µθ)/2− log |Vθ|/2 is the log

synthetic likelihood.

Synthetic likelihood - picture

y1* y2

* y3* yNr

data to statistics transform

s1* s2

* s3* sNr

estimate µθ, Σθ µθ ,Σθ MVN log likelihood

ls(θ)

Ricker example: what statistics?

0 10 20 30 40 50

0 5 10 15

0 1 2 3 4 5

yt−10.3

y t0.3

0 10 20 30 40 50

uniform quantiles

Ricker example: statistics ∼ N?

−3 −2 −1 0 1 2 3

Theoretical Quantiles

−3 −2 −1 0 1 2 3

Ricker example: transects through synthetic likelihood

3.0 3.5 4.0 4.5 5.0

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

5 10 15 20

◮ . . . variability is random and small scale.

◮ Easy to explore ls by MCMC, or stochastic optimizationmethods.

◮ Can estimate ls in vicinity of maximum by quadraticregression of ls on parameters.

Ricker example: MCMC results (25% acceptance)

0 10000 20000 30000 40000 50000

Ricker example: parameter estimates

3.2 3.4 3.6 3.8 4.0

0.2 0.3 0.4 0.5 0.6

φ10 11 12 13

◮ Truth in red, 10000 burn in discarded.

Ricker direct MCMC on state and parameters

3.2 3.4 3.6 3.8 4.0

0.0 0.2 0.4 0.6 0.8

9.5 10.5 11.5

◮ Truth in red, based on second half of every 100th of1000000 single term update steps.

Taking stock

◮ Nothing is perfect (and in practice nothing is fast)!◮ Direct MCMC is difficult to start and difficult to get to mix.◮ Filtering is problematic at low process noise, and when the

parameters are not close to correct.◮ ABC seems to need quite a bit of tuning.◮ Synthetic likelihood has the dodgy normality assumption.

◮ In simulations with toy models, the direct methods(MCMC and Filtering) have better statistical efficiency, forthe correct model, when the process noise is not too low.

◮ When the model is not aiming to capture everything thenonly ABC and Synthetic likelihood seem reasonable.

◮ Let’s look at a less toy example. . .

Nicholson’s Blowflies

Nicholson’s Blowfly Data

0 50 100 150 200 250 300 350

Nicholson’s blowfly’s

◮ Lab population of adult flies known exactly every 2 days,from counts of dead flies and freshly discarded pupal cases.

◮ Time from egg to adult is around 2 weeks.

◮ Fecundity is likely to be density dependent.

Blowfly model

◮ DDE model (Nisbet and Gurney, 1981) is

dt= PN(t− τ)e−N(t−τ)/N0 − δN(t).

◮ Daily discretisation is

Nt+1 = PNt−τ exp(−Nt−τ/N0)et +Nt exp(−δǫt)

where et and ǫt are independent Gamma random deviateswith mean 1 and variance σ2

p and σ2d.

◮ Statistics are ACF coefficients to lag 11, coefficients of theincrements regression as for Ricker example, the meanpopulation and coefficients of the auto-regressionyi = β0yi−6 + β1y

2i−6 + β2y

3i−6 + β3yi−1 + β4y

2i−1 + ǫi.

◮ Parameter group Pτ and δτ controls stability.

4 Experimental replicates

0 50 100 150 200 250 300 350

50 100 150 200 250 300

250 300 350 400 450

Blowfly MCMC chain (31% acceptance)

0 10000 20000 30000 40000 50000

0 10000 20000 30000 40000 50000−

0 10000 20000 30000 40000 50000−20

0 10000 20000 30000 40000 50000

Blowfly checking plots

2.0 2.5 3.0 3.5

log theoretical quantiles

−3 −2 −1 0 1 2 3

N(0,1) quantilesm

−2 −1 0 1 2

Normal Q−Q Plot

◮ Left is QQ-plot for simulated (s− µθ)TΣ−1

θ (s− µθ)(observed is dashed).

◮ Middle gives scaled normal QQ plots for each element ofsimulated s.

◮ Normal QQ plot for observed Σ−1/2θ (s− µθ).

Blowflies: data left, model reps right

0 50 100 150 200 250 300 350

50 100 150 200 250 300

250 300 350 400 450

250 300 350 400 4500

250 300 350 400 450

Blowfly stability

0.01 0.05 0.50 5.00

no equilibrium

overdamped

underdamped

cyclic

◮ The dynamics are all limit cycles. The fluctuations are notnoise driven.

◮ This is not obvious a priori: e.g. noise has to be muchlarger than demographic stochasticity would give, togenerate observed variability.

Blowflies by Filtering

◮ We tried a similar analysis via filtering.

◮ Have to add observation noise to model to make this work.

0.01 0.05 0.50 5.00

no equilibrium

overdamped

underdamped

cyclic

0.01 0.05 0.50 5.00

E1 PMMH

no equilibrium

overdamped

underdamped

cyclic

0.01 0.05 0.50 5.00

no equilibrium

overdamped

underdamped

cyclic

0.01 0.05 0.50 5.00

E2 PMMH

no equilibrium

overdamped

underdamped

cyclic

0.01 0.05 0.50 5.00

no equilibrium

overdamped

underdamped

cyclic

0.01 0.05 0.50 5.00

E3 PMMH

no equilibrium

overdamped

underdamped

cyclic

0.01 0.05 0.50 5.00

no equilibrium

overdamped

underdamped

cyclic

0.01 0.05 0.50 5.00

E4 PMMH

no equilibrium

overdamped

underdamped

cyclic

Which is right?

◮ The filtering approach puts much more weight on theregion of greater dynamic stability.

◮ For the experimental conditions in E2 there is separateexperimental data which is inconsistent with this.

◮ Further investigation suggests that the filter is sufferingsevere particle depletion at a couple of times.

◮ Switching to a heavier tailed error distribution improvesmatters somewhat.

◮ But there are at least some clear diagnostic indicators. . .

Typical simulations from the fitted models

0 100 250

50 150 250

000 E2

250 350 450

0 100 250

000 E1 SL

50 150 250

000 E2 SL

250 350 450

0 100 250

E1 PMMH

50 150 250

000 E2 PMMH

250 350 450

E3 PMMH

250 350 450

E4 PMMH

Conclusions?

◮ For a correct enough non-linear dynamic models withenough process noise, filtering is probably best.

◮ The problem is knowing what’s enough in the above.

◮ If you are unsure that the model is right, or it’s notsupposed to be correct, then information reductionmethods such as ABC or Synthetic Likelihood (SL) may beless misleading, and don’t seem to be massively worse thanfiltering in any case.

◮ SL requires a bit less tuning than ABC, but at the cost of anormality assumption. The latter may be removable via asaddlepoint approximation.

Some References

Arulampalam, Maskell, Gordon, and Clapp (2002) A tutorial onparticle filters for online nonlinear/non-gaussian bayesian tracking.

Beaumont, Zhang, and Balding (2002) Approximate bayesiancomputation in population genetics. Genetics, 162(4):20252035. IEEETransactions on Signal Processing, 50(2):174188.

Ionides, Bhadra, Atchade, and King (2011) Iterated filtering. TheAnnals of Statistics, 39(3):17761802.

Nicholson (1954) An outline of the dynamics of animal populations.Australian Journal of Zoology, 2(1):965.

Wood (2010) Statistical inference for noisy nonlinear ecologicaldynamic systems. Nature, 466(7310):11021104

Fasiolo, Pya and Wood (nearly complete) Statistical inference forhighly nonlinear dynamic ecological models: method comparison.

Simple statistical methods for complex ecological dynamicssw15190/talks/isec14.pdf · based models...

Documents

MELTS:Aframeworkforapplyingmachinelearningto timeseries

Using Informix TimeSeries and the Internet of Thingsadvancedatatools.com/Downloads/Internet_of_Things_Webcast-2.pdf · Using Informix TimeSeries and the Internet of Things • Summary

Azure Data Explorer - Timeseries Database Features

Gaussian Processes for Timeseries Modelling

Analysis of fMRI Timeseries - Wellcome Trust Centre for ... · Analysis of fMRI Timeseries: ... email r.henson@ucl.ac.uk Contents ... timeseries also allows us to view statistical

Timeseries based deep hybrid transfer learning frameworks

Scientiﬁcally formulated range of specialised hair care

TimeSeries - Oxford Statisticsreinert/time/notesht08short.pdf · 1 What are TimeSeries? Manystatistical methodsrelatetodatawhichareindependent,oratleastuncorrelated. Thereare many

Example timeseries in OceanSITES The OceanSITES program of fixed open-ocean sustained timeseries Uwe Send Scripps Institution of Oceanography, La Jolla,

TSAR A TimeSeries AggregatoR · TimeSeries Aggregation at Twitter A common problem! ⇢Data products (analytics.twitter.com, embedded analytics, internal dashboards)! ⇢Business

Hierarchical TimeSeries Clustering for Data Streams

Timeseries 586

EPL Timeseries

univaritae timeseries

IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak

Modeling God’s Attributes and the Biblical God is a Scientiﬁcally … · 2015. 12. 30. · Modeling God’s Attributes and the Biblical God is a Scientiﬁcally Rational Concept

Time series analysis projects - Chalmersrootzen/timeseries/timeseries... · Financial Time Series Parameter Value StandardError+ Tstasc Constant 0.0648532 0.0333538 1.9444 GARCH(1))

Kisters WATER TimeSeries Documentation

DADI contribution to IVOA TimeSeries priority DADI contribution to IVOA TimeSeries priority F.Bonnarel (CDS) On behalf of DADI TimeSeries group. 13/12/2017 Summary of presentation

09 F&DM TimeSeries Smooth 37