A likelihood-free version of the stochastic approximation EM algorithm (SAEM) for parameter estimation in complex models

Umberto Picchini
Centre for Mathematical Sciences, Lund University
twitter: @uPicchini, [email protected]

18 October 2016, Department of Computer and Information Science, Linköping University.



Page 2

This presentation is based on the working paper:

P. (2016). Likelihood-free stochastic approximation EM for inference in complex models, arXiv:1609.03508.

Page 3

I will consider:

the problem of parameter inference for “complex models”, i.e. models having an intractable likelihood;

the inference problem for “incomplete data”, in the sense given by the seminal EM paper [Dempster et al. 1977].

In short, what I investigate is this: we have data Y arising from a generic model depending on an unobservable X and a parameter θ.

How do we estimate θ from Y in the presence of the latent X?

Page 4

The presence of the latent (unobservable) X means that we deal with an incomplete data problem.

The EM algorithm[1] is the standard way to conduct maximum-likelihood inference for θ in the presence of incomplete data.

The complete data is the couple (Y, X), and the corresponding complete likelihood is p(Y, X;θ). The incomplete (data) likelihood is p(Y;θ).

We are interested in finding the MLE

θ̂ = arg max_{θ∈Θ} p(Y;θ)

given observations Y = (Y1, ..., Yn).

[1] Dempster, Laird and Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm. JRSS-B.

Page 5

In the rest of this presentation I will discuss:

SAEM: a popular stochastic version of EM, for when EM is not directly applicable.

Implementing SAEM is difficult! And impossible for models with intractable likelihoods. What to do?

A quick intro to Wood’s synthetic likelihoods (SL).

Our contribution: embedding SL within SAEM.

Simulation studies.

Page 6

EM in one slide

EM is a two-step procedure: an E-step followed by an M-step. Define

Q(θ|θ′) = ∫ log pY,X(Y, X;θ) pX|Y(X|Y;θ′) dX ≡ EX|Y log pY,X(Y, X;θ).

At iteration k ≥ 1:

E-step: compute Q(θ|θ̂(k−1));

M-step: obtain θ̂(k) = arg max_{θ∈Θ} Q(θ|θ̂(k−1)).

As k → ∞ the sequence {θ̂(k)}k converges to a stationary point of the data likelihood p(Y;θ) under weak assumptions.

Typically the E-step is hard while the M-step is “easy”.

Page 7

How to get around the E-step

The E-step requires the evaluation of

Q(θ|θ′) = ∫ log pY,X(Y, X;θ) pX|Y(X|Y;θ′) dX.

This is hard, as pX|Y(X|Y; ·) is typically unknown.

MCEM [Wei and Tanner 1990]: assume we are able to simulate draws from pX|Y(X|Y; ·), say mk of them, and use a Monte Carlo approximation:

generate xr ∼ pX|Y(X|Y; ·), r = 1, ..., mk;

Q(θ|θ′) ≈ (1/mk) ∑_{r=1}^{mk} log pY,X(Y, xr;θ).

Problem: mk needs to increase as k increases. A double asymptotic problem!

Page 8

SAEM (stochastic approximation EM)

A more efficient approximation to the E-step is given by SAEM[2]:

generate xr ∼ pX|Y(X|Y; ·), r = 1, ..., mk;

Q̃(θ|θ̂(k)) = (1 − γk) Q̃(θ|θ̂(k−1)) + γk ( (1/mk) ∑_{r=1}^{mk} log pY,X(Y, xr;θ) ),

with {γk} a decreasing sequence such that ∑k γk = ∞ and ∑k γk² < ∞.

As k → ∞ it is not required that mk increase; in fact it is possible to take mk ≡ 1 for all k. However, see the next slide for convergence properties.

[2] Delyon, Lavielle and Moulines, 1999. Convergence of a stochastic approximation version of the EM algorithm. Annals of Statistics.
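The stochastic-approximation recursion above can be sketched in a few lines. This is a minimal illustration (not the paper's code): with the common choice γk = 1/k, which satisfies both conditions on {γk}, the recursion reduces exactly to a running average of the per-iteration quantities, which the snippet checks numerically.

```python
import random

def sa_update(s_prev, draw, gamma):
    # One stochastic-approximation step: s_k = (1 - gamma_k) s_{k-1} + gamma_k * draw.
    return (1.0 - gamma) * s_prev + gamma * draw

random.seed(1)
draws = [random.gauss(0.0, 1.0) for _ in range(1000)]

s = 0.0
for k, d in enumerate(draws, start=1):
    s = sa_update(s, d, 1.0 / k)  # gamma_k = 1/k: sum diverges, sum of squares converges

running_mean = sum(draws) / len(draws)
print(abs(s - running_mean))  # agrees up to floating-point rounding
```

In SAEM the averaged quantity is the Monte Carlo estimate of the complete loglikelihood (or its sufficient statistics), not a scalar draw, but the update rule is identical.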

Page 9

Beautiful things happen if you manage to write log p(Y, X) as a member of the curved exponential family, e.g.

log p(Y, X;θ) = −Λ(θ) + ⟨Sc(Y, X), Γ(θ)⟩.   (1)

Here ⟨·, ·⟩ is the scalar product, Λ and Γ are two functions of θ, and Sc(Y, X) is the minimal sufficient statistic of the complete model.

Then we only need to update the sufficient statistics:

sk = sk−1 + γk (Sc(Y, X(k)) − sk−1).

Computing Sc(Y, X) for most non-trivial models is hard! But if you manage, the M-step is often explicit:

θ̂(k) = arg max_{θ∈Θ} (−Λ(θ) + ⟨sk, Γ(θ)⟩).

Only for case (1) do Delyon et al. (1999) prove convergence of the sequence {θ̂(k)}k to a stationary point of p(Y;θ) under weak conditions.

Page 10

Some considerations

A general problem with all EM-type algorithms: we assumed the ability to simulate latent states from p(X|Y). This is often not trivial.

For state-space models, plenty of possibilities are given by particle filters (sequential Monte Carlo). In this case the sampling issue is “solvable”.

What to do outside of state-space models? What if the model has no dynamic structure?

What if the model is so complex that we can’t write pY,X(Y, X) in closed form?

For example, for SDE models the transition density of the underlying Markov process is unknown.

Then we cannot write p(X0:n) = ∏_{j=1}^{n} p(Xj|Xj−1), hence we cannot write

pY,X(Y0:n, X0:n) = p(Y0:n|X0:n) p(X0:n).

Page 11

If we can’t write the complete likelihood, we certainly cannot hope to find the sufficient statistics Sc(·).

Specifically: it is impossible to apply SAEM to models having intractable likelihoods, e.g. models for which we can’t write p(Y, X) in closed form.

Likelihood-free methods use the ability to simulate from a model to compensate for our ignorance about the underlying likelihood.

Page 12

Say we formulate a statistical model p(Y;θ) such that the n observations are assumed Yj ∼ p(Y;θ), j = 1, ..., n.

Suppose we do not know p(Y; ·); however, we do know how to implement a simulator that generates draws from p(Y; ·).

Trivial example (but you get the idea):

y = x + ε,  x ∼ px,  ε ∼ N(0, σε²);

simulate x* ∼ px [possible even when px is unknown!];

simulate y* ∼ N(x*, σε²); then y* ∼ py(Y|σε).

Therefore, in the following we consider the case where the only thing we know is how to forward-simulate from an assumed model.

Page 13

Bayes: complex networks might not allow for trivial (Gibbs-type) sampling, i.e. when the conditional densities are unknown.

[Figure from Schadt et al. (2009), doi:10.1038/nrd2826]

Page 14

The ability to simulate from a model, even when we have no knowledge of the analytic expression of the underlying likelihood(s), is central to likelihood-free methods for intractable likelihoods.

There are several ways to deal with “intractable likelihoods”.

“Plug-and-play” methods: the only requirement is the ability to simulate from the data-generating model.

particle marginal methods (PMMH, PMCMC) based on SMC filters [Andrieu et al. 2010];

(improved) iterated filtering [Ionides et al. 2015];

approximate Bayesian computation (ABC) [Marin et al. 2012];

synthetic likelihoods [Wood 2010].

In the following I focus on synthetic likelihoods.

Page 15

A nearly chaotic model

Two realizations from a Ricker model:

yt ∼ Poi(φ Nt),  Nt = r · Nt−1 · e^{−Nt−1}.

Small changes in r cause major departures from the data.

Figure: one path generated with log r = 3.8 (black) and one generated with log r = 3.799 (red).

Page 16

The resulting likelihood can be difficult to explore if algorithms arebadly initialized.

Figure: the log-likelihood is in black. Panels: Ricker (vs log r), Pennycuick (vs log a), Varley (vs log b), Maynard-Smith (vs log r); vertical axis in units of 10³.

Page 17

A change of paradigm

From S. Wood, Nature 2010:

“Naive methods of statistical inference try to make the model reproduce the exact course of the observed data in a way that the real system itself would not do if repeated.”

“What is important is to identify a set of statistics that is sensitive to the scientifically important and repeatable features of the data, but insensitive to replicate-specific details of phase.”

In other words, with complex, stochastic and/or chaotic models we could try to match features of the data, not the path of the data itself.

A similar approach is considered in ABC (approximate Bayesian computation).

Page 18

Synthetic likelihoods

y: observed data, from static or dynamic models.

s(y): (vector of) summary statistics of the data, e.g. mean, autocorrelations, marginal quantiles, etc.

Assume

s(y) ∼ N(μθ, Σθ),

an assumption justifiable via a second-order Taylor expansion (same as in Laplace approximations).

μθ and Σθ are unknown: estimate them via simulations.

Page 19

Figure: schematic representation of the synthetic likelihoods procedure.

Page 20

For fixed θ, simulate R artificial datasets y*1, ..., y*R from your model and compute the corresponding (possibly vector-valued) summaries s*1, ..., s*R.

Compute

μ̂θ = (1/R) ∑_{r=1}^{R} s*r,  Σ̂θ = (1/(R−1)) ∑_{r=1}^{R} (s*r − μ̂θ)(s*r − μ̂θ)′.

Compute the statistics sobs for the observed data y.

Evaluate a multivariate Gaussian likelihood at sobs:

liksyn(θ) := N(sobs; μ̂θ, Σ̂θ) ∝ |Σ̂θ|^{−1/2} exp( −(sobs − μ̂θ)′ Σ̂θ^{−1} (sobs − μ̂θ)/2 ).

This likelihood can be maximized over θ, or be plugged into an MCMC algorithm targeting

π̂(θ|sobs) ∝ liksyn(θ) π(θ).
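The recipe above can be sketched in a few lines. This is a minimal illustration for a scalar summary (the vector case replaces the variance with the sample covariance matrix); the toy model, its Gaussian data, and the sample-mean summary are assumptions made up for the example, not anything from Wood's paper.

```python
import math
import random

def synthetic_loglik(theta, s_obs, simulate, summary, R=500, rng=random):
    # Estimate mu_theta and Sigma_theta from R simulated summaries,
    # then evaluate the Gaussian log-likelihood at the observed summary.
    sims = [summary(simulate(theta, rng)) for _ in range(R)]
    mu = sum(sims) / R
    var = sum((s - mu) ** 2 for s in sims) / (R - 1)
    return -0.5 * math.log(2 * math.pi * var) - (s_obs - mu) ** 2 / (2 * var)

# Hypothetical toy model: 50 observations from N(theta, 1); summary = sample mean.
def simulate(theta, rng):
    return [rng.gauss(theta, 1.0) for _ in range(50)]

def summary(y):
    return sum(y) / len(y)

random.seed(3)
s_obs = 0.9  # a pretend observed summary
ll_good = synthetic_loglik(1.0, s_obs, simulate, summary)
ll_bad = synthetic_loglik(5.0, s_obs, simulate, summary)
print(ll_good > ll_bad)  # a theta near the observed summary scores higher
```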

Page 21

So the synthetic likelihood methodology assumes no specific knowledge of the probabilistic features of the model.

It only assumes the ability to forward-generate from the model.

It assumes that the analyst is able to specify “informative” summaries.

It assumes that said summaries are (approximately) Gaussian, s ∼ N(·).

Transforming the summaries to be approximately Gaussian is often not an issue (just as we do in linear regression).

Of course the major issue (still open, also in ABC) is how to build informative summaries. This is left unsolved.

Page 22

I intend to use the synthetic likelihoods approach to enable likelihood-free inference using SAEM.

This should allow SAEM to be applied to models with intractable likelihoods.

Page 23

We use synthetic likelihoods to construct a Gaussian approximation over a set of complete summaries (S(Y), S(X)), defining a complete synthetic loglikelihood:

log p(s;θ) = log N(s; μ(θ), Σ(θ)),   (2)

with s = (S(Y), S(X)).

In (2), μ(θ) and Σ(θ) are unknown but can be estimated using synthetic likelihoods (SL), conditionally on θ.

However, we need to obtain a maximizer for the (incomplete) synthetic loglikelihood log p(S(Y);θ).

Page 24

SAEM with synthetic likelihoods (SL)

For a given θ, SL returns the estimates μ̂(θ) and Σ̂(θ) (sample mean and sample covariance).

Crucial result: for a Gaussian likelihood, μ̂(θ) and Σ̂(θ) are sufficient statistics for μ(θ) and Σ(θ). And a Gaussian is a member of the exponential family.

Recall: what SAEM does is update sufficient statistics, which is perfect for us!

At the kth SAEM iteration:

μ̂(k)(θ) = μ̂(k−1)(θ) + γk (μ̂(θ) − μ̂(k−1)(θ)),   (3)

Σ̂(k)(θ) = Σ̂(k−1)(θ) + γk (Σ̂(θ) − Σ̂(k−1)(θ)).   (4)

Page 25

Updating the latent variable X

At the kth iteration of SAEM we need to sample S(X(k))|S(Y). This is trivial!

We have

S(X(k))|S(Y) ∼ N(μ̂(k)x|y(θ), Σ̂(k)x|y(θ)),

where

μ̂(k)x|y = μ̂x + Σ̂xy Σ̂y^{−1} (S(Y) − μ̂y),

Σ̂(k)x|y = Σ̂x − Σ̂xy Σ̂y^{−1} Σ̂yx,

and μ̂x, μ̂y, Σ̂x, Σ̂y, Σ̂xy and Σ̂yx are extracted from (μ̂(k), Σ̂(k)). That is, μ̂(k)(θ) = (μ̂x, μ̂y) and

Σ̂(k)(θ) = [ Σ̂x  Σ̂xy ; Σ̂yx  Σ̂y ].
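The conditioning formulas above are standard Gaussian algebra. A minimal sketch with scalar blocks (in the vector case the divisions become products with Σ̂y^{−1}); the numbers in the usage example are made up for illustration:

```python
def conditional_moments(mu_x, mu_y, sig_x, sig_y, sig_xy, s_y):
    # Gaussian conditioning, scalar-block version of
    #   mu_{x|y}    = mu_x + Sigma_xy Sigma_y^{-1} (S(Y) - mu_y)
    #   Sigma_{x|y} = Sigma_x - Sigma_xy Sigma_y^{-1} Sigma_yx
    mu_cond = mu_x + sig_xy / sig_y * (s_y - mu_y)
    var_cond = sig_x - sig_xy ** 2 / sig_y
    return mu_cond, var_cond

m, v = conditional_moments(mu_x=0.0, mu_y=0.0, sig_x=1.0, sig_y=1.0,
                           sig_xy=0.5, s_y=2.0)
print(m, v)  # 1.0 0.75
```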

Page 26

The M-step

Now that we have simulated S(X(k)) (conditionally on the data), let's produce the complete summaries at iteration k,

s(k) := (S(Y), S(X(k))),

and maximize (M-step) the complete synthetic loglikelihood:

θ̂(k) = arg max_{θ∈Θ} log N(s(k); μ(θ), Σ(θ)).   (5)

For each perturbation of θ, the M-step performs a synthetic likelihood simulation.

It returns the best found maximizer of (5) and the corresponding best (μ̂, Σ̂). Plug these into the moment-updating equations (3)-(4).

Page 27

The slide that follows describes a single iteration of SAEM-SL.

Page 28

Input: observed summaries S(Y), positive integers L and R, values for θ̂(k−1), μ̂(k−1) and Σ̂(k−1).
Output: θ̂(k).

At iteration k:
1. Extract μ̂x, μ̂y, Σ̂x, Σ̂y, Σ̂xy and Σ̂yx from μ̂(k−1) and Σ̂(k−1). Compute the conditional moments μ̂x|y, Σ̂x|y.
2. Sample S(X(k−1))|S(Y) ∼ N(μ̂(k−1)x|y(θ), Σ̂(k−1)x|y(θ)) and form s(k−1) := (S(Y), S(X(k−1))).
3. Obtain (θ(k), μ̂(k), Σ̂(k)) from InternalSL(s(k−1), θ̂(k−1), R), starting at θ̂(k−1).
4. Increase k := k + 1 and go to step 1.

Function InternalSL(s(k−1), θstart, R):
Input: s(k−1), starting parameters θstart, a positive integer R. Functions to compute simulated summaries S(y*) and S(x*) must be available.
Output: the best found θ* maximizing log N(s(k); μ̂, Σ̂) and the corresponding (μ̂*, Σ̂*).
Here θc denotes a generic candidate value.
i. Simulate x*r ∼ pX(X0:N;θc), y*r ∼ pY|X(Y1:n|X1:n;θc), for r = 1, ..., R.
ii. Compute user-defined summaries s*r = (S(y*r), S(x*r)) for r = 1, ..., R. Construct the corresponding (μ̂, Σ̂).
iii. Evaluate log N(s(k); μ̂, Σ̂).
Use a numerical procedure that performs (i)-(iii) L times to find the best θ* maximizing log N(s(k); μ̂, Σ̂) over varying θc. Denote with (μ̂*, Σ̂*) the simulated moments corresponding to the best found θ*. Set θ(k) := θ*.
iv. Update the moments:

μ̂(k) = μ̂(k−1) + γk (μ̂* − μ̂(k−1)),

Σ̂(k) = Σ̂(k−1) + γk (Σ̂* − Σ̂(k−1)).

Return (θ(k), μ̂(k), Σ̂(k)).

Page 29

We have now completed all the steps required to implement a likelihood-free version of SAEM.

Main inference problem: it is not clear how to construct a set of summaries (S(Y), S(X)) informative for θ. These are user-defined, hence arbitrary.

Main computational bottleneck: compared to regular SAEM, our M-step is a numerical optimization routine. We used Nelder-Mead, which is rather slow.

Ideal case (typically unattainable): if

1. s = (S(Y), S(X)) is jointly sufficient for θ, and
2. s is multivariate Gaussian,

then our likelihood-free SAEM converges to a stationary point of p(Y;θ) under the conditions given in Delyon et al. 1999.

Page 30

I have two examples to show:

a state-space model driven by an SDE: I compare SAEM-SL with regular SAEM and with direct optimization of the synthetic likelihood;

a simple nonlinear Gaussian state-space model: I compare SAEM-SL with regular SAEM, iterated filtering and particle marginal methods.

A “static model” example is available in my paper[3].

[3] P. 2016. Likelihood-free stochastic approximation EM for inference in complex models, arXiv:1609.03508.

Page 31

Example: a nonlinear Gaussian state-space model

We study a standard toy model (e.g. Jasra et al.[4]):

Yj = Xj + σy νj,  j ≥ 1,
Xj = 2 sin(e^{Xj−1}) + σx τj,

with νj, τj ∼ N(0, 1) i.i.d. and X0 = 0.

θ = (σx, σy).

[4] Jasra, Singh, Martin and McCoy, 2012. Filtering via approximate Bayesian computation. Statistics and Computing.
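Forward simulation from this toy model takes a few lines. A minimal sketch (the seed and the plain-Python generator are illustrative choices, not from the paper):

```python
import math
import random

def simulate_ssm(sigma_x, sigma_y, n=50, rng=random):
    # X_j = 2 sin(exp(X_{j-1})) + sigma_x * tau_j, with X_0 = 0;
    # Y_j = X_j + sigma_y * nu_j.
    x, xs, ys = 0.0, [], []
    for _ in range(n):
        x = 2.0 * math.sin(math.exp(x)) + sigma_x * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(x + sigma_y * rng.gauss(0.0, 1.0))
    return xs, ys

random.seed(0)
xs, ys = simulate_ssm(2.23, 2.23)
print(len(xs), len(ys))
```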

Page 32

We generate n = 50 observations from the model with σx = σy = 2.23.

Figure: the simulated observations Y against time.

Page 33

The standard SAEM

Let's set up the “standard” SAEM. We need the complete likelihood and the sufficient statistics. Easy for this model:

p(Y, X) = p(Y|X) p(X) = ∏_{j=1}^{n} p(Yj|Xj) p(Xj|Xj−1),

Yj|Xj ∼ N(Xj, σy²),

Xj|Xj−1 ∼ N(2 sin(e^{Xj−1}), σx²).

Sσx² = ∑_{j=1}^{n} (Xj − 2 sin(e^{Xj−1}))² and Sσy² = ∑_{j=1}^{n} (Yj − Xj)² are sufficient for σx² and σy².

Page 34

Plug the sufficient statistics into the complete (log)likelihood, and set to zero the gradient w.r.t. (σx², σy²).

Explicit M-step at the kth iteration:

σ̂x²(k) = Sσx²/n,

σ̂y²(k) = Sσy²/n.

To run SAEM, the only thing left is a way to sample X(k)|Y. For this we use sequential Monte Carlo, e.g. the bootstrap filter (in the backup slides, if needed).

I skip this sampling step. Just know that it is easily accomplished for state-space models.
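The sufficient statistics and the explicit M-step above can be sketched directly. A minimal illustration with made-up state and observation values (not simulated from the model):

```python
import math

def sufficient_stats(xs, ys):
    # S_{sigma_x^2} = sum_j (X_j - 2 sin(exp(X_{j-1})))^2, with X_0 = 0;
    # S_{sigma_y^2} = sum_j (Y_j - X_j)^2.
    s_x, prev = 0.0, 0.0
    for x in xs:
        s_x += (x - 2.0 * math.sin(math.exp(prev))) ** 2
        prev = x
    s_y = sum((y - x) ** 2 for x, y in zip(xs, ys))
    return s_x, s_y

def m_step(s_x, s_y, n):
    # Explicit M-step: sigma_hat^2 = S / n for each parameter.
    return s_x / n, s_y / n

xs = [1.0, -0.5, 0.3]  # hypothetical latent path
ys = [1.2, -0.4, 0.1]  # hypothetical observations
sx2, sy2 = m_step(*sufficient_stats(xs, ys), n=len(xs))
print(sx2, sy2)
```

In SAEM, Sσx² and Sσy² would be computed on the filtered path X(k) and then passed through the stochastic-approximation update before the M-step.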

Page 35

SAEM-SL: SAEM with synthetic likelihoods

To implement SAEM-SL, no knowledge of the complete likelihood is required, nor any analytic derivation of the sufficient statistics.

We just have to postulate some “reasonable” summaries for X and Y.

For each synthetic likelihood step, we simulate R = 500 realizations of S(Xr) and S(Yr), containing:

the sample median of Xr, r = 1, ..., R;

the median absolute deviation of Xr;

the 10th, 20th, 75th and 90th percentiles of Xr;

the sample median of Yr;

the median absolute deviation of Yr;

the 10th, 20th, 75th and 90th percentiles of Yr.
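These summaries are all computable without any library. A minimal sketch, assuming a simple linear-interpolation percentile convention (one of several common definitions; the paper does not specify which is used):

```python
def percentile(v, p):
    # Percentile by linear interpolation on the sorted sample.
    v = sorted(v)
    k = (len(v) - 1) * p / 100.0
    i = int(k)
    frac = k - i
    return v[i] if i + 1 >= len(v) else v[i] * (1 - frac) + v[i + 1] * frac

def summaries(v):
    # Median, median absolute deviation, and the four percentiles listed above.
    med = percentile(v, 50)
    mad = percentile([abs(x - med) for x in v], 50)
    return [med, mad] + [percentile(v, p) for p in (10, 20, 75, 90)]

s = summaries([1.0, 2.0, 3.0, 4.0, 5.0])
print(s)
```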

Page 36

Results with SAEM-SL on 30 different datasets

Starting parameter values are randomly initialised. Here R = 500.

Figure: trace plots for SAEM-SL (σx, left; σy, right) for the thirty estimation procedures. Horizontal lines are the true parameter values.

Page 37

(M, M̄)          (500, 200)          (1000, 200)         (1000, 20)

σx (true value 2.23)
SAEM-SMC        2.54 [2.53, 2.54]   2.55 [2.54, 2.56]   1.99 [1.85, 2.14]
IF2             1.26 [1.21, 1.41]   1.35 [1.28, 1.41]   1.35 [1.28, 1.41]

σy (true value 2.23)
SAEM-SMC        0.11 [0.10, 0.13]   0.06 [0.06, 0.07]   1.23 [1.00, 1.39]
IF2             1.62 [1.56, 1.75]   1.64 [1.58, 1.67]   1.64 [1.58, 1.67]

Table: SAEM with bootstrap filter using M particles; IF2 = iterated filtering.

R               500                 1000

σx (true value 2.23)
SAEM-SL         1.67 [0.42, 1.97]   1.51 [0.82, 2.03]

σy (true value 2.23)
SAEM-SL         2.40 [2.01, 2.63]   2.27 [1.57, 2.57]

Table: SAEM with synthetic likelihoods. K = 60 iterations.

Page 38

Example: state-space SDE model [P., 2016]

We consider a one-dimensional state-space model driven by an SDE.

Suppose we administer 4 mg of theophylline [Dose] to a subject.

Xt is the level of theophylline concentration in blood at time t (hrs). Consider the following state-space model:

Yj = Xj + εj,  εj ∼iid N(0, σε²),

dXt = ( (Dose · Ka · Ke / Cl) e^{−Ka t} − Ke Xt ) dt + σ √(Xt) dWt,  t ≥ t0.

Ke is the elimination rate constant, Ka the absorption rate constant, Cl the clearance of the drug, and σ the intensity of the intrinsic stochastic noise.

Page 39

We simulate a set of n = 30 observations from the model at equispaced times. But how do we simulate from this model? No analytic solution of the SDE is available.

We resort to the Euler-Maruyama discretization with a small stepsize h = 0.05 on the time interval [0, 30]:

Xt+h = Xt + ( (Dose · Ka · Ke / Cl) e^{−Ka t} − Ke Xt ) h + σ √(h Xt) Zt+h,  {Zt} ∼iid N(0, 1).

This implies a latent simulated process of length N + 1:

X0:N = {X0, Xh, ..., XNh}.
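The Euler-Maruyama scheme above is easy to code. A minimal sketch; the parameter values in the usage example are illustrative only (not the paper's ground-truth values), and negative excursions of Xt are clipped inside the square root, one common pragmatic guard:

```python
import math
import random

def euler_maruyama(Ke, Ka, Cl, sigma, dose=4.0, x0=0.0, h=0.05, T=30.0,
                   rng=random):
    xs, x, t = [x0], x0, 0.0
    for _ in range(int(round(T / h))):
        drift = (dose * Ka * Ke / Cl) * math.exp(-Ka * t) - Ke * x
        diffusion = sigma * math.sqrt(h * max(x, 0.0))  # guard: sqrt needs X_t >= 0
        x = x + drift * h + diffusion * rng.gauss(0.0, 1.0)
        t += h
        xs.append(x)
    return xs

random.seed(42)
path = euler_maruyama(Ke=0.05, Ka=1.5, Cl=0.04, sigma=0.1)  # hypothetical values
print(len(path))  # 601 points on [0, 30] with h = 0.05
```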

Page 40

A typical realization of the process:

Figure: data (circles) and the latent process (black line).

Page 41

The classic SAEM

Applying the “standard” SAEM is not really trivial here.

The complete likelihood:

p(Y, X) = p(Y|X) p(X) = ∏_{j=1}^{n} p(Yj|Xj) ∏_{i=1}^{N} p(Xi|Xi−1),

Yj|Xj ∼ N(Xj, σε²),

Xi|Xi−1 ∼ not available.

Euler-Maruyama induces a Gaussian approximation:

p(xi|xi−1) ≈ (1 / (σ √(2π xi−1 h))) exp{ −[xi − xi−1 − ((Dose · Ka · Ke / Cl) e^{−Ka τi−1} − Ke xi−1) h]² / (2σ² xi−1 h) }.

Page 42

The classic SAEM

I am not going to show how to obtain all the sufficient summary statistics (see the paper). Just trust me that it requires a bit of work. And this is just a one-dimensional model!

We sample X(k)|Y using the bootstrap filter, a sequential Monte Carlo method.

If you are not familiar with sequential Monte Carlo, worry not. Just consider it a method returning a “best” filtered X(k) based on Y (for linear Gaussian models you would use Kalman).

Page 43

SAEM-SL with synthetic likelihoods

User-defined summaries for a simulation r: (s(x*r), s(y*r)).

s(x*r) contains:

(i) the median value of X*0:N;
(ii) the median absolute deviation of X*0:N;
(iii) a statistic for σ computed from X*0:N (see next slide);
(iv) (∑j (Y*j − X*j)² / n)^{1/2}.

s(y*r) contains:

(i) the median value of y*r;
(ii) its median absolute deviation;
(iii) the slope of the line connecting the first and last simulated observations, (Y*n − Y*1)/(tn − t1).

Page 44

In Miao 2014: for an SDE of the type dXt = μ(Xt) dt + σ g(Xt) dWt with t ∈ [0, T], we have

∑Γ |Xi+1 − Xi|² / ∑Γ g²(Xi)(ti+1 − ti) → σ²  as |Γ| → 0,

where the convergence is in probability and Γ is a partition of [0, T].

We deduce that, using the discretization {X0, X1, ..., XN} produced by the Euler-Maruyama scheme, we can take the square root of the left-hand side in the limit above, which should be informative for σ.
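This quadratic-variation statistic is a one-liner to check. A minimal sketch on the simplest possible case, a pure diffusion dX = σ dW (so g ≡ 1); the chosen σ, stepsize, and seed are illustrative:

```python
import math
import random

def sigma_stat(path, g, h):
    # sqrt( sum |X_{i+1}-X_i|^2 / sum g(X_i)^2 * h ), the statistic above
    # on an equispaced grid with stepsize h.
    num = sum((path[i + 1] - path[i]) ** 2 for i in range(len(path) - 1))
    den = sum(g(x) ** 2 * h for x in path[:-1])
    return math.sqrt(num / den)

# Simulate dX = sigma dW; the statistic should recover sigma for small h.
random.seed(7)
sigma, h = 0.5, 1e-4
x, path = 0.0, [0.0]
for _ in range(10_000):
    x += sigma * math.sqrt(h) * random.gauss(0.0, 1.0)
    path.append(x)
print(round(sigma_stat(path, lambda v: 1.0, h), 3))
```

For the theophylline SDE one would use g(x) = √x on the Euler-Maruyama path instead.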

Page 45

100 different datasets are simulated from ground-truth parameters. All optimizations start away from the ground-truth values.

SAEM-SL: at each iteration of the M-step we simulate R = 500 summaries, with L = 10 Nelder-Mead iterations (M-step) and K = 100 SAEM iterations.

Figure: trace plots of Ke, Cl, σ and σε across the SAEM-SL iterations.

Page 46

SAEM-SMC uses the bootstrap filter with M = 500 particles to obtain X(k)|Y.

Cl and σ are essentially unidentified.

Page 47

                Ke                  Cl                  σ                   σε
true values     0.050               0.040               0.100               0.319
SAEM-SMC        0.045 [0.042,0.049] 0.085 [0.078,0.094] 0.171 [0.158,0.184] 0.395 [0.329,0.465]
SAEM-SL         0.044 [0.038,0.051] 0.033 [0.028,0.039] 0.106 [0.083,0.132] 0.266 [0.209,0.307]
optim. SL       0.063 [0.054,0.069] 0.089 [0.068,0.110] 0.304 [0.249,0.370] 0.543 [0.485,0.625]

SAEM-SMC uses M = 500 particles to filter X(k)|Y via SMC, and runs for K = 300 SAEM iterations.

SAEM-SL: at each iteration of the M-step we simulate R = 500 summaries, with L = 10 Nelder-Mead iterations (M-step) and K = 100 SAEM iterations.

“optim. SL” denotes the direct maximization of Wood’s synthetic (incomplete) likelihood:

θ̂ = arg max_{θ∈Θ} log N(S(Y); μ(θ), Σ(θ)).   (6)

Page 48

How about Gaussianity of the summaries?

Here we have normal qq-plots for the 7 postulated summaries at the obtained optimum (500 simulations each).

Figure: normal qq-plots for the summaries sx(1)-sx(4) and sy(1)-sy(3).

The summaries' quantiles nicely follow the line (not visible) marking a perfect match with Gaussian quantiles.

Page 49

Summary

We introduced SAEM-SL, a version of SAEM that is able to deal with intractable likelihoods.

It only requires the formulation and simulation of “informative” summaries s.

How to construct informative summaries automatically is a difficult open problem.

If said user-defined summaries s are sufficient for θ (very unlikely), and if s ∼ N(·), then SAEM-SL converges to the true maximum likelihood estimate for p(Y;θ).

The method can be used for intractable models, or even just to initialize starting values for more refined algorithms (e.g. particle MCMC).

Page 50

Key references

Andrieu et al., 2010. Particle Markov chain Monte Carlo methods. JRSS-B.

Delyon, Lavielle and Moulines, 1999. Convergence of a stochastic approximation version of the EM algorithm. Annals of Statistics.

Dempster, Laird and Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm. JRSS-B.

Ionides et al., 2015. Inference for dynamic and latent variable models via iterated, perturbed Bayes maps. PNAS.

Marin et al., 2012. Approximate Bayesian computational methods. Statistics and Computing.

Picchini, 2016. Likelihood-free stochastic approximation EM for inference in complex models. arXiv:1609.03508.

Wood, 2010. Statistical inference for noisy nonlinear ecological dynamic systems. Nature.

Page 51

Appendix

Page 52

Justification of Gaussianity (Wood 2010)

Assuming Gaussianity for the summaries s(·) can be justified via a standard Taylor expansion.

Say that fθ(s) is the true (unknown) joint density of s. Expand log fθ(s) around its mode μθ (the first-order term vanishes at the mode):

log fθ(s) ≈ log fθ(μθ) + (1/2)(s − μθ)′ (∂² log fθ / ∂s∂s′) (s − μθ),

hence

fθ(s) ≈ const × exp{ −(1/2)(s − μθ)′ (−∂² log fθ / ∂s∂s′) (s − μθ) },

so that, approximately when s ≈ μθ,

s ∼ N( μθ, {−∂² log fθ / ∂s∂s′}^{−1} ).

Page 53

Asymptotic properties of synthetic likelihoods (Wood 2010)

As the number of simulated statistics R → ∞:

the maximizer θ̂ of liksyn(θ) is a consistent estimator;

θ̂ is an unbiased estimator;

θ̂ need not be Gaussian in general. It will be Gaussian if Σθ depends weakly on θ or when d = dim(s) is large.

Page 54

Algorithm 1: Bootstrap filter with M particles and threshold 1 ≤ M̄ ≤ M. Resamples only when ESS < M̄.

Step 0. Set j = 1: for m = 1, ..., M sample X1(m) ∼ p(X0), compute the weights W1(m) = f(Y1|X1(m)) and normalize them, w1(m) := W1(m) / ∑_{m=1}^{M} W1(m).

Step 1.
if ESS({wj(m)}) < M̄ then
  resample M particles {Xj(m), wj(m)} and set Wj(m) = 1/M.
end if
Set j := j + 1; if j = n + 1, stop and return all constructed weights {Wj(m)}, m = 1:M, j = 1:n, to sample a single path. Otherwise go to step 2.

Step 2. For m = 1, ..., M sample Xj(m) ∼ p(·|Xj−1(m)). Compute

Wj(m) := wj−1(m) p(Yj|Xj(m)),

normalize the weights, wj(m) := Wj(m) / ∑_{m=1}^{M} Wj(m), and go to step 1.
