A likelihood-free version of the stochastic approximation EM algorithm (SAEM) for parameter estimation in complex models

Umberto Picchini
Centre for Mathematical Sciences, Lund University
twitter: @uPicchini, [email protected]

18 October 2016, Department of Computer and Information Science, Linköping University.



Page 2

This presentation is based on the working paper:

P. (2016). Likelihood-free stochastic approximation EM for inference in complex models, arXiv:1609.03508.

Page 3

I will consider:

the problem of parameter inference for “complex models”, i.e. models having an intractable likelihood;

the inference problem for “incomplete data”, in the sense given by the seminal EM paper [Dempster et al. 1977].

In short, what I investigate is this: we have data Y arising from a generic model depending on an unobservable X and a parameter θ.

How do we estimate θ from Y in the presence of the latent X?

Page 4

The presence of the latent (unobservable) X means that we deal with an incomplete data problem.

The EM algorithm[1] is the standard way to conduct maximum-likelihood inference for θ in the presence of incomplete data.

The complete data is the couple (Y, X), and the corresponding complete likelihood is p(Y, X;θ). The incomplete (data) likelihood is p(Y;θ).

We are interested in finding the MLE

θ̂ = arg max_{θ∈Θ} p(Y;θ)

given observations Y = (Y1, ..., Yn).

[1] Dempster, Laird and Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm. JRSS-B.

Page 5

In the rest of this presentation I will discuss:

SAEM: a popular stochastic version of EM, for when EM is not directly applicable.

Implementing SAEM is difficult! And impossible for models with intractable likelihoods. What to do?

A quick intro to Wood’s synthetic likelihoods (SL).

Our contribution: embedding SL within SAEM.

Simulation studies.

Page 6

EM in one slide

EM is a two-step procedure: an E-step followed by an M-step. Define

Q(θ|θ′) = ∫ log pY,X(Y, X;θ) pX|Y(X|Y;θ′) dX ≡ EX|Y log pY,X(Y, X;θ).

At iteration k ≥ 1:

E-step: compute Q(θ|θ̂(k−1));

M-step: obtain θ̂(k) = arg max_{θ∈Θ} Q(θ|θ̂(k−1)).

As k → ∞ the sequence {θ̂(k)}k converges to a stationary point of the data likelihood p(Y;θ) under weak assumptions.

Typically the E-step is hard while the M-step is “easy”.

Page 7

How to get around the E-step

The E-step requires the evaluation of

Q(θ|θ′) = ∫ log pY,X(Y, X;θ) pX|Y(X|Y;θ′) dX.

This is hard, as pX|Y(X|Y; ·) is typically unknown.

MCEM [Wei and Tanner 1990]: assume we are able to simulate draws from pX|Y(X|Y; ·), say mk of them, and use a Monte Carlo approximation:

generate xr ∼ pX|Y(X|Y; ·), r = 1, ..., mk;

Q(θ|θ′) ≈ (1/mk) ∑_{r=1}^{mk} log pY,X(Y, xr;θ).

Problem: mk needs to increase as k increases. A double asymptotic problem!

Page 8

SAEM (stochastic approximation EM)

A more efficient approximation to the E-step is given by SAEM[2]:

generate xr ∼ pX|Y(X|Y; ·), r = 1, ..., mk;

Q̃(θ|θ̂(k)) = (1 − γk) Q̃(θ|θ̂(k−1)) + γk ( (1/mk) ∑_{r=1}^{mk} log pY,X(Y, xr;θ) ),

with {γk} a decreasing sequence such that ∑k γk = ∞ and ∑k γk² < ∞.

As k → ∞ it is not required that mk increase; in fact it is possible to take mk ≡ 1 for all k. However, see the next slide for convergence properties.

[2] Delyon, Lavielle and Moulines, 1999. Convergence of a stochastic approximation version of the EM algorithm. Annals of Statistics.
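The stochastic-approximation recursion above can be sketched in a few lines. This is a minimal illustration (not the paper's code): with the common choice γk = 1/k, which satisfies both conditions on {γk}, the recursion reduces exactly to a running average of the per-iteration quantities, which the snippet checks numerically.

```python
import random

def sa_update(s_prev, draw, gamma):
    # One stochastic-approximation step: s_k = (1 - gamma_k) s_{k-1} + gamma_k * draw.
    return (1.0 - gamma) * s_prev + gamma * draw

random.seed(1)
draws = [random.gauss(0.0, 1.0) for _ in range(1000)]

s = 0.0
for k, d in enumerate(draws, start=1):
    s = sa_update(s, d, 1.0 / k)  # gamma_k = 1/k: sum diverges, sum of squares converges

running_mean = sum(draws) / len(draws)
print(abs(s - running_mean))  # agrees up to floating-point rounding
```

In SAEM the averaged quantity is the Monte Carlo estimate of the complete loglikelihood (or its sufficient statistics), not a scalar draw, but the update rule is identical.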

Page 9

Beautiful things happen if you manage to write log p(Y, X) as a member of the curved exponential family, e.g.

log p(Y, X;θ) = −Λ(θ) + ⟨Sc(Y, X), Γ(θ)⟩.   (1)

Here ⟨·, ·⟩ is the scalar product, Λ and Γ are two functions of θ, and Sc(Y, X) is the minimal sufficient statistic of the complete model.

Then we only need to update the sufficient statistics:

sk = sk−1 + γk (Sc(Y, X(k)) − sk−1).

Computing Sc(Y, X) for most non-trivial models is hard! But if you manage, the M-step is often explicit:

θ̂(k) = arg max_{θ∈Θ} (−Λ(θ) + ⟨sk, Γ(θ)⟩).

Only for case (1) do Delyon et al. (1999) prove convergence of the sequence {θ̂(k)}k to a stationary point of p(Y;θ) under weak conditions.

Page 10

Some considerations

A general problem with all EM-type algorithms: we assumed the ability to simulate latent states from p(X|Y). This is often not trivial.

For state-space models, plenty of possibilities are given by particle filters (sequential Monte Carlo). In this case the sampling issue is “solvable”.

What to do outside of state-space models? What if the model has no dynamic structure?

What if the model is so complex that we can’t write pY,X(Y, X) in closed form?

For example, for SDE models the transition density of the underlying Markov process is unknown.

Then we cannot write p(X0:n) = ∏_{j=1}^{n} p(Xj|Xj−1), hence we cannot write

pY,X(Y0:n, X0:n) = p(Y0:n|X0:n) p(X0:n).

Page 11

If we can’t write the complete likelihood, we certainly cannot hope to find the sufficient statistics Sc(·).

Specifically: it is impossible to apply SAEM to models having intractable likelihoods, e.g. models for which we can’t write p(Y, X) in closed form.

Likelihood-free methods use the ability to simulate from a model to compensate for our ignorance about the underlying likelihood.

Page 12

Say we formulate a statistical model p(Y;θ) such that the n observations are assumed Yj ∼ p(Y;θ), j = 1, ..., n.

Suppose we do not know p(Y; ·); however, we do know how to implement a simulator that generates draws from p(Y; ·).

Trivial example (but you get the idea):

y = x + ε,  x ∼ px,  ε ∼ N(0, σε²);

simulate x* ∼ px [possible even when px is unknown!];

simulate y* ∼ N(x*, σε²); then y* ∼ py(Y|σε).

Therefore, in the following we consider the case where the only thing we know is how to forward-simulate from an assumed model.

Page 13

Bayes: complex networks might not allow for trivial (Gibbs-type) sampling, i.e. when the conditional densities are unknown.

[Figure from Schadt et al. (2009), doi:10.1038/nrd2826]

Page 14

The ability to simulate from a model, even when we have no knowledge of the analytic expression of the underlying likelihood(s), is central to likelihood-free methods for intractable likelihoods.

There are several ways to deal with “intractable likelihoods”.

“Plug-and-play” methods: the only requirement is the ability to simulate from the data-generating model.

particle marginal methods (PMMH, PMCMC) based on SMC filters [Andrieu et al. 2010];

(improved) iterated filtering [Ionides et al. 2015];

approximate Bayesian computation (ABC) [Marin et al. 2012];

synthetic likelihoods [Wood 2010].

In the following I focus on synthetic likelihoods.

Page 15

A nearly chaotic model

Two realizations from a Ricker model:

yt ∼ Poi(φ Nt),  Nt = r · Nt−1 · e^{−Nt−1}.

Small changes in r cause major departures from the data.

Figure: one path generated with log r = 3.8 (black) and one generated with log r = 3.799 (red).

Page 16

The resulting likelihood can be difficult to explore if algorithms arebadly initialized.

Figure: the log-likelihood is in black. Panels: Ricker (vs log r), Pennycuick (vs log a), Varley (vs log b), Maynard-Smith (vs log r); vertical axis in units of 10³.

Page 17

A change of paradigm

From S. Wood, Nature 2010:

“Naive methods of statistical inference try to make the model reproduce the exact course of the observed data in a way that the real system itself would not do if repeated.”

“What is important is to identify a set of statistics that is sensitive to the scientifically important and repeatable features of the data, but insensitive to replicate-specific details of phase.”

In other words, with complex, stochastic and/or chaotic models we could try to match features of the data, not the path of the data itself.

A similar approach is considered in ABC (approximate Bayesian computation).

Page 18

Synthetic likelihoods

y: observed data, from static or dynamic models.

s(y): (vector of) summary statistics of the data, e.g. mean, autocorrelations, marginal quantiles, etc.

Assume

s(y) ∼ N(μθ, Σθ),

an assumption justifiable via a second-order Taylor expansion (same as in Laplace approximations).

μθ and Σθ are unknown: estimate them via simulations.

Page 19

Figure: schematic representation of the synthetic likelihoods procedure.

Page 20

For fixed θ, simulate R artificial datasets y*1, ..., y*R from your model and compute the corresponding (possibly vector-valued) summaries s*1, ..., s*R.

Compute

μ̂θ = (1/R) ∑_{r=1}^{R} s*r,  Σ̂θ = (1/(R−1)) ∑_{r=1}^{R} (s*r − μ̂θ)(s*r − μ̂θ)′.

Compute the statistics sobs for the observed data y.

Evaluate a multivariate Gaussian likelihood at sobs:

liksyn(θ) := N(sobs; μ̂θ, Σ̂θ) ∝ |Σ̂θ|^{−1/2} exp( −(sobs − μ̂θ)′ Σ̂θ^{−1} (sobs − μ̂θ)/2 ).

This likelihood can be maximized over θ, or be plugged into an MCMC algorithm targeting

π̂(θ|sobs) ∝ liksyn(θ) π(θ).
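The recipe above can be sketched in a few lines. This is a minimal illustration for a scalar summary (the vector case replaces the variance with the sample covariance matrix); the toy model, its Gaussian data, and the sample-mean summary are assumptions made up for the example, not anything from Wood's paper.

```python
import math
import random

def synthetic_loglik(theta, s_obs, simulate, summary, R=500, rng=random):
    # Estimate mu_theta and Sigma_theta from R simulated summaries,
    # then evaluate the Gaussian log-likelihood at the observed summary.
    sims = [summary(simulate(theta, rng)) for _ in range(R)]
    mu = sum(sims) / R
    var = sum((s - mu) ** 2 for s in sims) / (R - 1)
    return -0.5 * math.log(2 * math.pi * var) - (s_obs - mu) ** 2 / (2 * var)

# Hypothetical toy model: 50 observations from N(theta, 1); summary = sample mean.
def simulate(theta, rng):
    return [rng.gauss(theta, 1.0) for _ in range(50)]

def summary(y):
    return sum(y) / len(y)

random.seed(3)
s_obs = 0.9  # a pretend observed summary
ll_good = synthetic_loglik(1.0, s_obs, simulate, summary)
ll_bad = synthetic_loglik(5.0, s_obs, simulate, summary)
print(ll_good > ll_bad)  # a theta near the observed summary scores higher
```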

Page 21

So the synthetic likelihood methodology assumes no specific knowledge of the probabilistic features of the model.

It only assumes the ability to forward-generate from the model.

It assumes that the analyst is able to specify “informative” summaries.

It assumes that said summaries are (approximately) Gaussian, s ∼ N(·).

Transforming the summaries to be approximately Gaussian is often not an issue (just as we do in linear regression).

Of course the major issue (still open, also in ABC) is how to build informative summaries. This is left unsolved.

Page 22

I intend to use the synthetic likelihoods approach to enable likelihood-free inference using SAEM.

This should allow SAEM to be applied to models with intractable likelihoods.

Page 23

We use synthetic likelihoods to construct a Gaussian approximation over a set of complete summaries (S(Y), S(X)), defining a complete synthetic loglikelihood:

log p(s;θ) = log N(s; μ(θ), Σ(θ)),   (2)

with s = (S(Y), S(X)).

In (2), μ(θ) and Σ(θ) are unknown but can be estimated using synthetic likelihoods (SL), conditionally on θ.

However, we need to obtain a maximizer for the (incomplete) synthetic loglikelihood log p(S(Y);θ).

Page 24

SAEM with synthetic likelihoods (SL)

For a given θ, SL returns the estimates μ̂(θ) and Σ̂(θ) (sample mean and sample covariance).

Crucial result: for a Gaussian likelihood, μ̂(θ) and Σ̂(θ) are sufficient statistics for μ(θ) and Σ(θ). And a Gaussian is a member of the exponential family.

Recall: what SAEM does is update sufficient statistics, which is perfect for us!

At the kth SAEM iteration:

μ̂(k)(θ) = μ̂(k−1)(θ) + γk (μ̂(θ) − μ̂(k−1)(θ)),   (3)

Σ̂(k)(θ) = Σ̂(k−1)(θ) + γk (Σ̂(θ) − Σ̂(k−1)(θ)).   (4)

Page 25

Updating the latent variable X

At the kth iteration of SAEM we need to sample S(X(k))|S(Y). This is trivial!

We have

S(X(k))|S(Y) ∼ N(μ̂(k)x|y(θ), Σ̂(k)x|y(θ)),

where

μ̂(k)x|y = μ̂x + Σ̂xy Σ̂y^{−1} (S(Y) − μ̂y),

Σ̂(k)x|y = Σ̂x − Σ̂xy Σ̂y^{−1} Σ̂yx,

and μ̂x, μ̂y, Σ̂x, Σ̂y, Σ̂xy and Σ̂yx are extracted from (μ̂(k), Σ̂(k)). That is, μ̂(k)(θ) = (μ̂x, μ̂y) and

Σ̂(k)(θ) = [ Σ̂x  Σ̂xy ; Σ̂yx  Σ̂y ].
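The conditioning formulas above are standard Gaussian algebra. A minimal sketch with scalar blocks (in the vector case the divisions become products with Σ̂y^{−1}); the numbers in the usage example are made up for illustration:

```python
def conditional_moments(mu_x, mu_y, sig_x, sig_y, sig_xy, s_y):
    # Gaussian conditioning, scalar-block version of
    #   mu_{x|y}    = mu_x + Sigma_xy Sigma_y^{-1} (S(Y) - mu_y)
    #   Sigma_{x|y} = Sigma_x - Sigma_xy Sigma_y^{-1} Sigma_yx
    mu_cond = mu_x + sig_xy / sig_y * (s_y - mu_y)
    var_cond = sig_x - sig_xy ** 2 / sig_y
    return mu_cond, var_cond

m, v = conditional_moments(mu_x=0.0, mu_y=0.0, sig_x=1.0, sig_y=1.0,
                           sig_xy=0.5, s_y=2.0)
print(m, v)  # 1.0 0.75
```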

Page 26

The M-step

Now that we have simulated S(X(k)) (conditionally on the data), let's produce the complete summaries at iteration k,

s(k) := (S(Y), S(X(k))),

and maximize (M-step) the complete synthetic loglikelihood:

θ̂(k) = arg max_{θ∈Θ} log N(s(k); μ(θ), Σ(θ)).   (5)

For each perturbation of θ, the M-step performs a synthetic likelihood simulation.

It returns the best found maximizer of (5) and the corresponding best (μ̂, Σ̂). Plug these into the moment-updating equations (3)-(4).

Page 27

The slide that follows describes a single iteration of SAEM-SL.

Page 28

Input: observed summaries S(Y), positive integers L and R, values for θ̂(k−1), μ̂(k−1) and Σ̂(k−1).
Output: θ̂(k).

At iteration k:
1. Extract μ̂x, μ̂y, Σ̂x, Σ̂y, Σ̂xy and Σ̂yx from μ̂(k−1) and Σ̂(k−1). Compute the conditional moments μ̂x|y, Σ̂x|y.
2. Sample S(X(k−1))|S(Y) ∼ N(μ̂(k−1)x|y(θ), Σ̂(k−1)x|y(θ)) and form s(k−1) := (S(Y), S(X(k−1))).
3. Obtain (θ(k), μ̂(k), Σ̂(k)) from InternalSL(s(k−1), θ̂(k−1), R), starting at θ̂(k−1).
4. Increase k := k + 1 and go to step 1.

Function InternalSL(s(k−1), θstart, R):
Input: s(k−1), starting parameters θstart, a positive integer R. Functions to compute simulated summaries S(y*) and S(x*) must be available.
Output: the best found θ* maximizing log N(s(k); μ̂, Σ̂) and the corresponding (μ̂*, Σ̂*).
Here θc denotes a generic candidate value.
i. Simulate x*r ∼ pX(X0:N;θc), y*r ∼ pY|X(Y1:n|X1:n;θc), for r = 1, ..., R.
ii. Compute user-defined summaries s*r = (S(y*r), S(x*r)) for r = 1, ..., R. Construct the corresponding (μ̂, Σ̂).
iii. Evaluate log N(s(k); μ̂, Σ̂).
Use a numerical procedure that performs (i)-(iii) L times to find the best θ* maximizing log N(s(k); μ̂, Σ̂) over varying θc. Denote with (μ̂*, Σ̂*) the simulated moments corresponding to the best found θ*. Set θ(k) := θ*.
iv. Update the moments:

μ̂(k) = μ̂(k−1) + γk (μ̂* − μ̂(k−1)),

Σ̂(k) = Σ̂(k−1) + γk (Σ̂* − Σ̂(k−1)).

Return (θ(k), μ̂(k), Σ̂(k)).

Page 29

We have now completed all the steps required to implement a likelihood-free version of SAEM.

Main inference problem: it is not clear how to construct a set of summaries (S(Y), S(X)) informative for θ. These are user-defined, hence arbitrary.

Main computational bottleneck: compared to regular SAEM, our M-step is a numerical optimization routine. We used Nelder-Mead, which is rather slow.

Ideal case (typically unattainable): if

1. s = (S(Y), S(X)) is jointly sufficient for θ, and
2. s is multivariate Gaussian,

then our likelihood-free SAEM converges to a stationary point of p(Y;θ) under the conditions given in Delyon et al. 1999.

Page 30

I have two examples to show:

a state-space model driven by an SDE: I compare SAEM-SL with regular SAEM and with direct optimization of the synthetic likelihood;

a simple nonlinear Gaussian state-space model: I compare SAEM-SL with regular SAEM, iterated filtering and particle marginal methods.

A “static model” example is available in my paper[3].

[3] P. 2016. Likelihood-free stochastic approximation EM for inference in complex models, arXiv:1609.03508.

Page 31

Example: a nonlinear Gaussian state-space model

We study a standard toy model (e.g. Jasra et al.[4]):

Yj = Xj + σy νj,  j ≥ 1,
Xj = 2 sin(e^{Xj−1}) + σx τj,

with νj, τj ∼ N(0, 1) i.i.d. and X0 = 0.

θ = (σx, σy).

[4] Jasra, Singh, Martin and McCoy, 2012. Filtering via approximate Bayesian computation. Statistics and Computing.
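Forward simulation from this toy model takes a few lines. A minimal sketch (the seed and the plain-Python generator are illustrative choices, not from the paper):

```python
import math
import random

def simulate_ssm(sigma_x, sigma_y, n=50, rng=random):
    # X_j = 2 sin(exp(X_{j-1})) + sigma_x * tau_j, with X_0 = 0;
    # Y_j = X_j + sigma_y * nu_j.
    x, xs, ys = 0.0, [], []
    for _ in range(n):
        x = 2.0 * math.sin(math.exp(x)) + sigma_x * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(x + sigma_y * rng.gauss(0.0, 1.0))
    return xs, ys

random.seed(0)
xs, ys = simulate_ssm(2.23, 2.23)
print(len(xs), len(ys))
```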

Page 32

We generate n = 50 observations from the model with σx = σy = 2.23.

Figure: the simulated observations Y against time.

Page 33

The standard SAEM

Let's set up the “standard” SAEM. We need the complete likelihood and the sufficient statistics. Easy for this model:

p(Y, X) = p(Y|X) p(X) = ∏_{j=1}^{n} p(Yj|Xj) p(Xj|Xj−1),

Yj|Xj ∼ N(Xj, σy²),

Xj|Xj−1 ∼ N(2 sin(e^{Xj−1}), σx²).

Sσx² = ∑_{j=1}^{n} (Xj − 2 sin(e^{Xj−1}))² and Sσy² = ∑_{j=1}^{n} (Yj − Xj)² are sufficient for σx² and σy².

Page 34

Plug the sufficient statistics into the complete (log)likelihood, and set to zero the gradient w.r.t. (σx², σy²).

Explicit M-step at the kth iteration:

σ̂x²(k) = Sσx²/n,

σ̂y²(k) = Sσy²/n.

To run SAEM, the only thing left is a way to sample X(k)|Y. For this we use sequential Monte Carlo, e.g. the bootstrap filter (in the backup slides, if needed).

I skip this sampling step. Just know that it is easily accomplished for state-space models.
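The sufficient statistics and the explicit M-step above can be sketched directly. A minimal illustration with made-up state and observation values (not simulated from the model):

```python
import math

def sufficient_stats(xs, ys):
    # S_{sigma_x^2} = sum_j (X_j - 2 sin(exp(X_{j-1})))^2, with X_0 = 0;
    # S_{sigma_y^2} = sum_j (Y_j - X_j)^2.
    s_x, prev = 0.0, 0.0
    for x in xs:
        s_x += (x - 2.0 * math.sin(math.exp(prev))) ** 2
        prev = x
    s_y = sum((y - x) ** 2 for x, y in zip(xs, ys))
    return s_x, s_y

def m_step(s_x, s_y, n):
    # Explicit M-step: sigma_hat^2 = S / n for each parameter.
    return s_x / n, s_y / n

xs = [1.0, -0.5, 0.3]  # hypothetical latent path
ys = [1.2, -0.4, 0.1]  # hypothetical observations
sx2, sy2 = m_step(*sufficient_stats(xs, ys), n=len(xs))
print(sx2, sy2)
```

In SAEM, Sσx² and Sσy² would be computed on the filtered path X(k) and then passed through the stochastic-approximation update before the M-step.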

Page 35

SAEM-SL: SAEM with synthetic likelihoods

To implement SAEM-SL, no knowledge of the complete likelihood is required, nor any analytic derivation of the sufficient statistics.

We just have to postulate some “reasonable” summaries for X and Y.

For each synthetic likelihood step, we simulate R = 500 realizations of S(Xr) and S(Yr), containing:

the sample median of Xr, r = 1, ..., R;

the median absolute deviation of Xr;

the 10th, 20th, 75th and 90th percentiles of Xr;

the sample median of Yr;

the median absolute deviation of Yr;

the 10th, 20th, 75th and 90th percentiles of Yr.
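These summaries are all computable without any library. A minimal sketch, assuming a simple linear-interpolation percentile convention (one of several common definitions; the paper does not specify which is used):

```python
def percentile(v, p):
    # Percentile by linear interpolation on the sorted sample.
    v = sorted(v)
    k = (len(v) - 1) * p / 100.0
    i = int(k)
    frac = k - i
    return v[i] if i + 1 >= len(v) else v[i] * (1 - frac) + v[i + 1] * frac

def summaries(v):
    # Median, median absolute deviation, and the four percentiles listed above.
    med = percentile(v, 50)
    mad = percentile([abs(x - med) for x in v], 50)
    return [med, mad] + [percentile(v, p) for p in (10, 20, 75, 90)]

s = summaries([1.0, 2.0, 3.0, 4.0, 5.0])
print(s)
```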

Page 36

Results with SAEM-SL on 30 different datasets

Starting parameter values are randomly initialised. Here R = 500.

Figure: trace plots for SAEM-SL (σx, left; σy, right) for the thirty estimation procedures. Horizontal lines are the true parameter values.

Page 37

(M, M̄)          (500, 200)          (1000, 200)         (1000, 20)

σx (true value 2.23)
SAEM-SMC        2.54 [2.53, 2.54]   2.55 [2.54, 2.56]   1.99 [1.85, 2.14]
IF2             1.26 [1.21, 1.41]   1.35 [1.28, 1.41]   1.35 [1.28, 1.41]

σy (true value 2.23)
SAEM-SMC        0.11 [0.10, 0.13]   0.06 [0.06, 0.07]   1.23 [1.00, 1.39]
IF2             1.62 [1.56, 1.75]   1.64 [1.58, 1.67]   1.64 [1.58, 1.67]

Table: SAEM with bootstrap filter using M particles; IF2 = iterated filtering.

R               500                 1000

σx (true value 2.23)
SAEM-SL         1.67 [0.42, 1.97]   1.51 [0.82, 2.03]

σy (true value 2.23)
SAEM-SL         2.40 [2.01, 2.63]   2.27 [1.57, 2.57]

Table: SAEM with synthetic likelihoods. K = 60 iterations.

Page 38

Example: state-space SDE model [P., 2016]

We consider a one-dimensional state-space model driven by an SDE.

Suppose we administer 4 mg of theophylline [Dose] to a subject.

Xt is the level of theophylline concentration in blood at time t (hrs). Consider the following state-space model:

Yj = Xj + εj,  εj ∼iid N(0, σε²),

dXt = ( (Dose · Ka · Ke / Cl) e^{−Ka t} − Ke Xt ) dt + σ √(Xt) dWt,  t ≥ t0.

Ke is the elimination rate constant, Ka the absorption rate constant, Cl the clearance of the drug, and σ the intensity of the intrinsic stochastic noise.

Page 39

We simulate a set of n = 30 observations from the model at equispaced times. But how do we simulate from this model? No analytic solution of the SDE is available.

We resort to the Euler-Maruyama discretization with a small stepsize h = 0.05 on the time interval [0, 30]:

Xt+h = Xt + ( (Dose · Ka · Ke / Cl) e^{−Ka t} − Ke Xt ) h + σ √(h Xt) Zt+h,  {Zt} ∼iid N(0, 1).

This implies a latent simulated process of length N + 1:

X0:N = {X0, Xh, ..., XNh}.
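The Euler-Maruyama scheme above is easy to code. A minimal sketch; the parameter values in the usage example are illustrative only (not the paper's ground-truth values), and negative excursions of Xt are clipped inside the square root, one common pragmatic guard:

```python
import math
import random

def euler_maruyama(Ke, Ka, Cl, sigma, dose=4.0, x0=0.0, h=0.05, T=30.0,
                   rng=random):
    xs, x, t = [x0], x0, 0.0
    for _ in range(int(round(T / h))):
        drift = (dose * Ka * Ke / Cl) * math.exp(-Ka * t) - Ke * x
        diffusion = sigma * math.sqrt(h * max(x, 0.0))  # guard: sqrt needs X_t >= 0
        x = x + drift * h + diffusion * rng.gauss(0.0, 1.0)
        t += h
        xs.append(x)
    return xs

random.seed(42)
path = euler_maruyama(Ke=0.05, Ka=1.5, Cl=0.04, sigma=0.1)  # hypothetical values
print(len(path))  # 601 points on [0, 30] with h = 0.05
```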

Page 40

A typical realization of the process:

Figure: data (circles) and the latent process (black line).

Page 41

The classic SAEM

Applying the “standard” SAEM is not really trivial here.

The complete likelihood:

p(Y, X) = p(Y|X) p(X) = ∏_{j=1}^{n} p(Yj|Xj) ∏_{i=1}^{N} p(Xi|Xi−1),

Yj|Xj ∼ N(Xj, σε²),

Xi|Xi−1 ∼ not available.

Euler-Maruyama induces a Gaussian approximation:

p(xi|xi−1) ≈ (1 / (σ √(2π xi−1 h))) exp{ −[xi − xi−1 − ((Dose · Ka · Ke / Cl) e^{−Ka τi−1} − Ke xi−1) h]² / (2σ² xi−1 h) }.

Page 42

The classic SAEM

I am not going to show how to obtain all the sufficient summary statistics (see the paper). Just trust me that it requires a bit of work. And this is just a one-dimensional model!

We sample X(k)|Y using the bootstrap filter, a sequential Monte Carlo method.

If you are not familiar with sequential Monte Carlo, worry not. Just consider it a method returning a “best” filtered X(k) based on Y (for linear Gaussian models you would use Kalman).

Page 43

SAEM-SL with synthetic likelihoods

User-defined summaries for a simulation r: (s(x*r), s(y*r)).

s(x*r) contains:

(i) the median value of X*0:N;
(ii) the median absolute deviation of X*0:N;
(iii) a statistic for σ computed from X*0:N (see next slide);
(iv) (∑j (Y*j − X*j)² / n)^{1/2}.

s(y*r) contains:

(i) the median value of y*r;
(ii) its median absolute deviation;
(iii) the slope of the line connecting the first and last simulated observations, (Y*n − Y*1)/(tn − t1).

Page 44

In Miao 2014: for an SDE of the type dXt = μ(Xt) dt + σ g(Xt) dWt with t ∈ [0, T], we have

∑Γ |Xi+1 − Xi|² / ∑Γ g²(Xi)(ti+1 − ti) → σ²  as |Γ| → 0,

where the convergence is in probability and Γ is a partition of [0, T].

We deduce that, using the discretization {X0, X1, ..., XN} produced by the Euler-Maruyama scheme, we can take the square root of the left-hand side in the limit above, which should be informative for σ.
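This quadratic-variation statistic is a one-liner to check. A minimal sketch on the simplest possible case, a pure diffusion dX = σ dW (so g ≡ 1); the chosen σ, stepsize, and seed are illustrative:

```python
import math
import random

def sigma_stat(path, g, h):
    # sqrt( sum |X_{i+1}-X_i|^2 / sum g(X_i)^2 * h ), the statistic above
    # on an equispaced grid with stepsize h.
    num = sum((path[i + 1] - path[i]) ** 2 for i in range(len(path) - 1))
    den = sum(g(x) ** 2 * h for x in path[:-1])
    return math.sqrt(num / den)

# Simulate dX = sigma dW; the statistic should recover sigma for small h.
random.seed(7)
sigma, h = 0.5, 1e-4
x, path = 0.0, [0.0]
for _ in range(10_000):
    x += sigma * math.sqrt(h) * random.gauss(0.0, 1.0)
    path.append(x)
print(round(sigma_stat(path, lambda v: 1.0, h), 3))
```

For the theophylline SDE one would use g(x) = √x on the Euler-Maruyama path instead.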

Page 45

100 different datasets are simulated from ground-truth parameters. All optimizations start away from the ground-truth values.

SAEM-SL: at each iteration of the M-step we simulate R = 500 summaries, with L = 10 Nelder-Mead iterations (M-step) and K = 100 SAEM iterations.

Figure: trace plots of Ke, Cl, σ and σε across the SAEM-SL iterations.

Page 46

SAEM-SMC uses the bootstrap filter with M = 500 particles to obtain X(k)|Y.

Cl and σ are essentially unidentified.

Page 47

                Ke                  Cl                  σ                   σε
true values     0.050               0.040               0.100               0.319
SAEM-SMC        0.045 [0.042,0.049] 0.085 [0.078,0.094] 0.171 [0.158,0.184] 0.395 [0.329,0.465]
SAEM-SL         0.044 [0.038,0.051] 0.033 [0.028,0.039] 0.106 [0.083,0.132] 0.266 [0.209,0.307]
optim. SL       0.063 [0.054,0.069] 0.089 [0.068,0.110] 0.304 [0.249,0.370] 0.543 [0.485,0.625]

SAEM-SMC uses M = 500 particles to filter X(k)|Y via SMC, and runs for K = 300 SAEM iterations.

SAEM-SL: at each iteration of the M-step we simulate R = 500 summaries, with L = 10 Nelder-Mead iterations (M-step) and K = 100 SAEM iterations.

“optim. SL” denotes the direct maximization of Wood’s synthetic (incomplete) likelihood:

θ̂ = arg max_{θ∈Θ} log N(S(Y); μ(θ), Σ(θ)).   (6)

Page 48

How about Gaussianity of the summaries?

Here we have normal qq-plots for the 7 postulated summaries at the obtained optimum (500 simulations each).

Figure: normal qq-plots for the summaries sx(1)-sx(4) and sy(1)-sy(3).

The summaries' quantiles nicely follow the line (not visible) marking a perfect match with Gaussian quantiles.

Page 49

Summary

We introduced SAEM-SL, a version of SAEM that is able to deal with intractable likelihoods.

It only requires the formulation and simulation of “informative” summaries s.

How to construct informative summaries automatically is a difficult open problem.

If said user-defined summaries s are sufficient for θ (very unlikely), and if s ∼ N(·), then SAEM-SL converges to the true maximum likelihood estimate for p(Y;θ).

The method can be used for intractable models, or even just to initialize starting values for more refined algorithms (e.g. particle MCMC).

Page 50

Key references

Andrieu et al., 2010. Particle Markov chain Monte Carlo methods. JRSS-B.

Delyon, Lavielle and Moulines, 1999. Convergence of a stochastic approximation version of the EM algorithm. Annals of Statistics.

Dempster, Laird and Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm. JRSS-B.

Ionides et al., 2015. Inference for dynamic and latent variable models via iterated, perturbed Bayes maps. PNAS.

Marin et al., 2012. Approximate Bayesian computational methods. Statistics and Computing.

Picchini, 2016. Likelihood-free stochastic approximation EM for inference in complex models. arXiv:1609.03508.

Wood, 2010. Statistical inference for noisy nonlinear ecological dynamic systems. Nature.

Page 51

Appendix

Page 52

Justification of Gaussianity (Wood 2010)

Assuming Gaussianity for the summaries s(·) can be justified via a standard Taylor expansion.

Say that fθ(s) is the true (unknown) joint density of s. Expand log fθ(s) around its mode μθ (the first-order term vanishes at the mode):

log fθ(s) ≈ log fθ(μθ) + (1/2)(s − μθ)′ (∂² log fθ / ∂s∂s′) (s − μθ),

hence

fθ(s) ≈ const × exp{ −(1/2)(s − μθ)′ (−∂² log fθ / ∂s∂s′) (s − μθ) },

so that, approximately when s ≈ μθ,

s ∼ N( μθ, {−∂² log fθ / ∂s∂s′}^{−1} ).

Page 53

Asymptotic properties of synthetic likelihoods (Wood 2010)

As the number of simulated statistics R → ∞:

the maximizer θ̂ of liksyn(θ) is a consistent estimator;

θ̂ is an unbiased estimator;

θ̂ need not be Gaussian in general. It will be Gaussian if Σθ depends weakly on θ or when d = dim(s) is large.

Page 54

Algorithm 1: Bootstrap filter with M particles and threshold 1 ≤ M̄ ≤ M. Resamples only when ESS < M̄.

Step 0. Set j = 1: for m = 1, ..., M sample X1(m) ∼ p(X0), compute the weights W1(m) = f(Y1|X1(m)) and normalize them, w1(m) := W1(m) / ∑_{m=1}^{M} W1(m).

Step 1.
if ESS({wj(m)}) < M̄ then
  resample M particles {Xj(m), wj(m)} and set Wj(m) = 1/M.
end if
Set j := j + 1; if j = n + 1, stop and return all constructed weights {Wj(m)}, m = 1:M, j = 1:n, to sample a single path. Otherwise go to step 2.

Step 2. For m = 1, ..., M sample Xj(m) ∼ p(·|Xj−1(m)). Compute

Wj(m) := wj−1(m) p(Yj|Xj(m)),

normalize the weights, wj(m) := Wj(m) / ∑_{m=1}^{M} Wj(m), and go to step 1.
