View
5
Download
0
Category
Preview:
Citation preview
Simple statistical methods for complex
ecological dynamics
Simon Wood & Matteo Fasiolo
Mathematical Sciences, University of Bath, U.K.National Centre for Statistical Ecology EPSRC
Two ecological timeseries
0 50 100 150 200 250 300 350
04000
8000
Nicholson’s blowfly’s
day
Popula
tion
1960 1970 1980 1990
0.5
1.0
1.5
2.0
Pine looper moth, Tentsmuir
year
pop
0.2
5
Two modelling issues
It is appealing to make statistical inferences about mechanismbased models of ecological timeseries, but. . .
◮ Many scientifically useful ecological dynamic models do nottry to get every feature of the data right.
◮ The population dynamics of small animals (insects,rodents, etc.) can be highly non-linear.
Simulated Ricker Model Data (toy)
0 10 20 30 40 50
05
01
00
15
02
00
t
popula
tion
Nt+1 = erNte−Nt+et , et ∼ N(0, σ2
e), Yt ∼ Poi(φNt) [observed]
. . . a simple toy model, but like many ecological dynamicsystems, it is very non-linear. . .
Varying r in the Ricker
r
3.790
3.795
3.800
3.805
3.810
t
10
20
30
40
50
popula
tion
0
50
100
150
200
Varying e1 in the Ricker
e1
−2
−1
0
1
2
t
10
20
30
40
50
popula
tion
0
50
100
150
200
Naive Bayes or Likelihood inference for Ricker
◮ Suppose we want inferences about θT = (r, φ, σ2e ), from an
observed population data series, y.
◮ Simple Bayesian or Likelihood approaches both require thejoint density fθ(e,y).
◮ MLE requires (approximate) evaluation of∫fθ(e,y)de (or
an EM approach).◮ Bayesian inference requires samples of θ, e from something
∝ fθ(e,y).
◮ We should look at fθ(e,y). . .
log{fθ(e,y)} versus e1 and r
−2 −1 0 1 2
−41000
−39000
−37000
e1
log(f
θ(e
, y))
2.5 3.0 3.5 4.0 4.5−50000
−45000
−40000
−35000
r
log(f
θ(e
, y))
. . . trying to integrate out the random effects, e, or sample fromthis density is hopeless.
Solution 1: change what we be try to model
◮ We should not try to make the model reproduce features ofdata that the system would not reproduce itself.
◮ The exact phase of the series is just a noise feature, whichit is not interesting to model.
◮ We should concentrate on phase insensitive statistics of theseries, and on statistics summarizing short term dynamicstructure.
◮ e.g. the ACF, the coefficients of short term autoregressivemodels, and summaries of the marginal distribution of theobservations, or their increments.
◮ This should be done in a way that is flexible enough to dealwith missing data, multiple series and unobserved states.
But hang on a minute. . .
◮ 1D sections, like that shown through fθ(e,y) are only apartial view.
◮ For example, here is a section through a function f(x, z). . .
−3 −2 −1 0 1 2 3
−0.1
2−
0.1
0−
0.0
8−
0.0
6−
0.0
4−
0.0
20.0
0
x
f(x,y
)
◮ Lots of local minimum right?
f(x, y). . .
◮ Wrong!
x
yz
◮ . . . it has one global minimum.
Solution 2: Work with the state
◮ Recast Ricker problem in terms of state, nt, rather thannoise, et.
◮ Then plots of fθ(n,y) against any nt are benignlyuni-modal.
◮ Suggests that problem ought to be tractable using statespace methods.
◮ But 1D views are still partial: these transects look lovely
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.1
0.2
0.3
0.4
0.5
x
g(x
,y)
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.1
0.2
0.3
0.4
0.5
y
g(x
,y)
◮ but are transects through this. . .
x
y
g(x,y)
◮ . . . need to proceed with caution.
State space methods
1. Use direct MCMC on states nt, and parameter, θ. Appearsto work for Ricker, but
◮ Need good starting values for state, which is difficult for anon-linear model with hidden states.
◮ Mixing often slow.◮ Hard to prove that it is sampling the target properly.
2. Use filtering, within MCMC or for MC estimation oflikelihood.
◮ Seems to work, without the initialization and mixing issues.◮ Still hard to prove that the target is being explored
properly.
Filtering in brief
◮ Idea is to work forward in time, iterating the steps
1. Prediction:
p(nt|y0:t−1) =
∫p(nt|nt−1)p(nt−1|y0:t−1)dnt−1
2. Update:
p(nt|y0:t) = p(yt|nt)p(nt|y0:t−1)/p(yt|y0:t−1)
◮ In practice replace p(n∗|y·) by discrete set of {n(i)t } values
(particles) with ‘importance weights’ {w(i)t }.
1. Prediction step then moves the particles, and updates theirweights, perhaps by importance sampling.
2. Update just updates the importance weights.
◮ The model ought to provide a basis for moving particles.
◮ In practice periodically resample particles in proportion totheir weights to avoid particle depletion problems.
More particle Filtering
◮ Likelihood is p(y0:T ) = p(y0)∏
t p(yt|y0:t−1), where
p(yt|y0:t−1) =
∫p(yt|nt)p(nt|y0:t−1)dnt ≃
1
N
∑i
p(yt|n(i)t )w
(i)t
◮ Particle filtering should work, provided the modellikelihood is not irregular.
◮ On toy problems, like the Ricker, it appears to work well,provided there is enough process noise.
◮ Intuitively it seems that process noise could smooth out theirregularities seen earlier, but generally it’s hard to proveanything. . .
◮ Let’s look at a toy toy model: If we discretise the statespace of the Ricker model then the likelihood is exactly andefficiently computable (discrete HMM).
Exact likelihood for Discrete Ricker as noise declines
2.0 2.5 3.0 3.5 4.0 4.5
−12
0−
100
−90
−80
σe = 0.3
log(r)
log−
likel
ihoo
d
2.0 2.5 3.0 3.5 4.0 4.5
−14
0−
120
−10
0−
80
σe = 0.1
log(r)
log−
likel
ihoo
d
2.0 2.5 3.0 3.5 4.0 4.5
−14
0−
120
−10
0−
80
σe = 0.05
log(r)
log−
likel
ihoo
d
2.0 2.5 3.0 3.5 4.0 4.5
−18
0−
140
−10
0
σe = 0.01
log(r)
log−
likel
ihoo
d
How the particle filter tracks p(nt|y0:t, θ)
5 10 15 20
05
10
15
20
25
30
p(Nt|θ) when σe = 0.3
t
Nt
5 10 15 20
05
10
15
20
25
30
p(Nt|θ) when σe = 0.01
t
Nt
◮ . . . Density of the state, with attempts to sample byparticle filtering. Mixed success.
Statistical dimension reduction. ABC
◮ The alternative is to try to match carefully chosen phase
insensitive statistics, s, of the data, in a principled manner.
◮ Approximate Bayesian Computation, is one approach forstochastic simulation.
1. For each trial θ∗, data, y∗, are simulated and transformedto statistics, s∗.
2. The accept/reject decision about θ∗ is then based onreplacing the likelihood with I(‖s∗ − s‖k < ǫ).
◮ As ǫ → 0 we simulate from f(θ|s).
◮ Snags: need to choose ‖ · ‖k; as ǫ → 0, the acceptance ratealso → 0; works best with few statistics. Tuning needed.
Less tuning, more assumptions: synthetic likelihood
◮ Convert the observed data y into a vector of phaseinsensitive summary statistics that can be modelled asapprox. normal. i.e. s ∼ N(µθ,Vθ), where θ are the modelparameters.
◮ A ‘synthetic’ likelihood can be evaluated as follows.◮ Given θ, simulate Nr (=500, here) replicates, y∗
1,y∗
2. . ., and
process these exactly as y was processed to obtain replicatestatistics, s∗1, s
∗
2, . . ..◮ Define µθ =
∑is∗i/Nr, S = [s∗
1− µθ, s
∗
2− µθ, . . .] and hence
Vθ = STS/(Nr − 1).◮ ls(θ) = −(s− µθ)
TV−1
θ(s − µθ)/2− log |Vθ|/2 is the log
synthetic likelihood.
Synthetic likelihood - picture
θ y
model
y1* y2
* y3* yNr
*
data to statistics transform
s1* s2
* s3* sNr
* s
estimate µθ, Σθ µθ ,Σθ MVN log likelihood
ls(θ)
Ricker example: what statistics?
0 10 20 30 40 50
050
100
150
200
time
obse
rved
pop
ulat
ion
0 5 10 15
−0.
20.
20.
61.
0
Lag
AC
F
0 1 2 3 4 5
01
23
45
yt−10.3
y t0.3
0 10 20 30 40 50
−20
00
100
200
uniform quantiles
orde
red
first
diff
eren
ces
Ricker example: statistics ∼ N?
−3 −2 −1 0 1 2 3
2000
4000
6000
Theoretical Quantiles
Sam
ple
Qua
ntile
s
−3 −2 −1 0 1 2 3
−14
00−
1000
Theoretical Quantiles
Sam
ple
Qua
ntile
s
−3 −2 −1 0 1 2 3
−15
00−
500
0
Theoretical Quantiles
Sam
ple
Qua
ntile
s
−3 −2 −1 0 1 2 3
−10
000
500
Theoretical Quantiles
Sam
ple
Qua
ntile
s
−3 −2 −1 0 1 2 3
−10
000
500
1500
Theoretical Quantiles
Sam
ple
Qua
ntile
s
−3 −2 −1 0 1 2 3
−10
000
1000
2000
Theoretical Quantiles
Sam
ple
Qua
ntile
s
−3 −2 −1 0 1 2 3
−0.
40.
00.
20.
4
Theoretical Quantiles
Sam
ple
Qua
ntile
s
−3 −2 −1 0 1 2 3
−0.
30−
0.25
−0.
20
Theoretical Quantiles
Sam
ple
Qua
ntile
s
−3 −2 −1 0 1 2 3
0.7
0.9
1.1
Theoretical Quantiles
Sam
ple
Qua
ntile
s
−3 −2 −1 0 1 2 3
−8e
−04
−2e
−04
2e−
04
Theoretical Quantiles
Sam
ple
Qua
ntile
s
−3 −2 −1 0 1 2 3
−5e
−06
5e−
06
Theoretical Quantiles
Sam
ple
Qua
ntile
s
−3 −2 −1 0 1 2 3
3236
4044
Theoretical Quantiles
Sam
ple
Qua
ntile
s
−3 −2 −1 0 1 2 3
1015
2025
Theoretical Quantiles
Sam
ple
Qua
ntile
s
Ricker example: transects through synthetic likelihood
3.0 3.5 4.0 4.5 5.0
−10
00−
800
−60
0−
400
−20
00
r
l s
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
−16
−14
−12
−10
−8
−6
−4
σl s
5 10 15 20
−10
000
−60
00−
2000
0
φ
l s
◮ . . . variability is random and small scale.
◮ Easy to explore ls by MCMC, or stochastic optimizationmethods.
◮ Can estimate ls in vicinity of maximum by quadraticregression of ls on parameters.
Ricker example: MCMC results (25% acceptance)
0 10000 20000 30000 40000 50000
3.2
3.4
3.6
3.8
4.0
4.2
step
r
0 10000 20000 30000 40000 50000
0.1
0.3
0.5
step
σ e
0 10000 20000 30000 40000 50000
68
1012
step
φ
0 10000 20000 30000 40000 50000
−70
0−
500
−30
0−
100
step
log
synt
hetic
like
lihoo
d
Ricker example: parameter estimates
r
Den
sity
3.2 3.4 3.6 3.8 4.0
01
23
σe
0.2 0.3 0.4 0.5 0.6
02
46
8
φ10 11 12 13
0.0
0.2
0.4
0.6
0.8
◮ Truth in red, 10000 burn in discarded.
Ricker direct MCMC on state and parameters
r
Den
sity
3.2 3.4 3.6 3.8 4.0
0.0
0.5
1.0
1.5
2.0
2.5
3.0
σe
Den
sity
0.0 0.2 0.4 0.6 0.8
01
23
45
6
φ
Den
sity
9.5 10.5 11.5
0.0
0.2
0.4
0.6
0.8
◮ Truth in red, based on second half of every 100th of1000000 single term update steps.
Taking stock
◮ Nothing is perfect (and in practice nothing is fast)!◮ Direct MCMC is difficult to start and difficult to get to mix.◮ Filtering is problematic at low process noise, and when the
parameters are not close to correct.◮ ABC seems to need quite a bit of tuning.◮ Synthetic likelihood has the dodgy normality assumption.
◮ In simulations with toy models, the direct methods(MCMC and Filtering) have better statistical efficiency, forthe correct model, when the process noise is not too low.
◮ When the model is not aiming to capture everything thenonly ABC and Synthetic likelihood seem reasonable.
◮ Let’s look at a less toy example. . .
Nicholson’s Blowflies
Nicholson’s Blowfly Data
0 50 100 150 200 250 300 350
040
0080
00
Nicholson’s blowfly’s
day
Pop
ulat
ion
◮ Lab population of adult flies known exactly every 2 days,from counts of dead flies and freshly discarded pupal cases.
◮ Time from egg to adult is around 2 weeks.
◮ Fecundity is likely to be density dependent.
Blowfly model
◮ DDE model (Nisbet and Gurney, 1981) is
dN
dt= PN(t− τ)e−N(t−τ)/N0 − δN(t).
◮ Daily discretisation is
Nt+1 = PNt−τ exp(−Nt−τ/N0)et +Nt exp(−δǫt)
where et and ǫt are independent Gamma random deviateswith mean 1 and variance σ2
p and σ2d.
◮ Statistics are ACF coefficients to lag 11, coefficients of theincrements regression as for Ricker example, the meanpopulation and coefficients of the auto-regressionyi = β0yi−6 + β1y
2i−6 + β2y
3i−6 + β3yi−1 + β4y
2i−1 + ǫi.
◮ Parameter group Pτ and δτ controls stability.
4 Experimental replicates
0 50 100 150 200 250 300 350
020
0060
0010
000
day
pop
50 100 150 200 250 300
020
0060
0010
000
day
pop
250 300 350 400 450
010
0020
0030
0040
00
day
pop
250 300 350 400 450
050
010
0015
0020
00
day
pop
Blowfly MCMC chain (31% acceptance)
0 10000 20000 30000 40000 50000
−2.
4−
2.0
−1.
6
log
δ
0 10000 20000 30000 40000 50000
1.5
2.0
2.5
log
P
0 10000 20000 30000 40000 50000
5.4
5.8
6.2
log
N0
0 10000 20000 30000 40000 50000
−2.
0−
1.0
0.0
log
σ p
0 10000 20000 30000 40000 50000
1012
1416
τ
0 10000 20000 30000 40000 50000−
2.5
−1.
00.
5
log
σ d
0 10000 20000 30000 40000 50000−20
0000
0−
5000
00
l s
0 10000 20000 30000 40000 50000
−75
−70
−65
l s
Blowfly checking plots
2.0 2.5 3.0 3.5
2.0
3.0
4.0
5.0
log theoretical quantiles
log
obse
rved
qua
ntile
s
−3 −2 −1 0 1 2 3
−4
−2
02
4
N(0,1) quantilesm
argi
nal q
uant
iles
−2 −1 0 1 2
−1.
5−
0.5
0.5
1.5
Normal Q−Q Plot
Theoretical Quantiles
Sam
ple
Qua
ntile
s
◮ Left is QQ-plot for simulated (s− µθ)TΣ−1
θ (s− µθ)(observed is dashed).
◮ Middle gives scaled normal QQ plots for each element ofsimulated s.
◮ Normal QQ plot for observed Σ−1/2θ (s− µθ).
Blowflies: data left, model reps right
0 50 100 150 200 250 300 350
040
0010
000
popu
latio
n
0 50 100 150 200 250 300 350
040
0010
000
50 100 150 200 250 300
040
0010
000
popu
latio
n
50 100 150 200 250 300
040
0010
000
250 300 350 400 450
020
0040
00
popu
latio
n
250 300 350 400 4500
2000
4000
250 300 350 400 450
010
0020
00
day
popu
latio
n
250 300 350 400 450
010
0020
00
day
Blowfly stability
0.01 0.05 0.50 5.00
15
1050
100
500
δτ
Pτ
no equilibrium
overdamped
underdamped
cyclic
◮ The dynamics are all limit cycles. The fluctuations are notnoise driven.
◮ This is not obvious a priori: e.g. noise has to be muchlarger than demographic stochasticity would give, togenerate observed variability.
Blowflies by Filtering
◮ We tried a similar analysis via filtering.
◮ Have to add observation noise to model to make this work.
0.01 0.05 0.50 5.00
15
1050
100
500
E1 SL
δτ
Pτ
no equilibrium
overdamped
underdamped
cyclic
0.01 0.05 0.50 5.00
15
1050
100
500
E1 PMMH
δτ
Pτ
no equilibrium
overdamped
underdamped
cyclic
0.01 0.05 0.50 5.00
15
1050
100
500
E2 SL
δτ
Pτ
no equilibrium
overdamped
underdamped
cyclic
0.01 0.05 0.50 5.00
15
1050
100
500
E2 PMMH
δτ
Pτ
no equilibrium
overdamped
underdamped
cyclic
0.01 0.05 0.50 5.00
15
1050
100
500
E3 SL
δτ
Pτ
no equilibrium
overdamped
underdamped
cyclic
0.01 0.05 0.50 5.00
15
1050
100
500
E3 PMMH
δτ
Pτ
no equilibrium
overdamped
underdamped
cyclic
0.01 0.05 0.50 5.00
15
1050
100
500
E4 SL
δτ
Pτ
no equilibrium
overdamped
underdamped
cyclic
0.01 0.05 0.50 5.00
15
1050
100
500
E4 PMMH
δτ
Pτ
no equilibrium
overdamped
underdamped
cyclic
Which is right?
◮ The filtering approach puts much more weight on theregion of greater dynamic stability.
◮ For the experimental conditions in E2 there is separateexperimental data which is inconsistent with this.
◮ Further investigation suggests that the filter is sufferingsevere particle depletion at a couple of times.
◮ Switching to a heavier tailed error distribution improvesmatters somewhat.
◮ But there are at least some clear diagnostic indicators. . .
Typical simulations from the fitted models
0 100 250
040
00
E1
Day
Pop
ulat
ion
50 150 250
040
0010
000 E2
DayP
opul
atio
n
250 350 450
015
0035
00
E3
Day
Pop
ulat
ion
250 350 450
050
015
00
E4
Day
Pop
ulat
ion
0 100 250
040
0010
000 E1 SL
Day
Pop
ulat
ion
50 150 250
040
0010
000 E2 SL
Day
Pop
ulat
ion
250 350 450
020
00
E3 SL
Day
Pop
ulat
ion
250 350 450
060
014
00
E4 SL
Day
Pop
ulat
ion
0 100 250
010
000
E1 PMMH
Day
Pop
ulat
ion
50 150 250
060
0014
000 E2 PMMH
Day
Pop
ulat
ion
250 350 450
020
0050
00
E3 PMMH
Day
Pop
ulat
ion
250 350 450
015
0035
00
E4 PMMH
Day
Pop
ulat
ion
Conclusions?
◮ For a correct enough non-linear dynamic models withenough process noise, filtering is probably best.
◮ The problem is knowing what’s enough in the above.
◮ If you are unsure that the model is right, or it’s notsupposed to be correct, then information reductionmethods such as ABC or Synthetic Likelihood (SL) may beless misleading, and don’t seem to be massively worse thanfiltering in any case.
◮ SL requires a bit less tuning than ABC, but at the cost of anormality assumption. The latter may be removable via asaddlepoint approximation.
Some References
Arulampalam, Maskell, Gordon, and Clapp (2002) A tutorial onparticle filters for online nonlinear/non-gaussian bayesian tracking.
Beaumont, Zhang, and Balding (2002) Approximate bayesiancomputation in population genetics. Genetics, 162(4):20252035. IEEETransactions on Signal Processing, 50(2):174188.
Ionides, Bhadra, Atchade, and King (2011) Iterated filtering. TheAnnals of Statistics, 39(3):17761802.
Nicholson (1954) An outline of the dynamics of animal populations.Australian Journal of Zoology, 2(1):965.
Wood (2010) Statistical inference for noisy nonlinear ecologicaldynamic systems. Nature, 466(7310):11021104
Fasiolo, Pya and Wood (nearly complete) Statistical inference forhighly nonlinear dynamic ecological models: method comparison.
Recommended