Statistical modeling of summary values leads to accurate ...mwl25/mcmski/slides/or... · Oliver Ratmann (Imperial College London, UK) Anton Camacho (London School of Hygiene & Tropical

Centre for Outbreak Analysis and Modelling

Statistical modeling of summary values leads to accurate Approximate Bayesian Computations

Oliver Ratmann (Imperial College London, UK)

Anton Camacho (London School of Hygiene & Tropical Medicine, UK)

Adam Meijer (National Institute of the Environment & Public Health, NL)

Gé Donker (Netherlands Institute for Health Services Research, NL)

Tuesday, 7 January 14

Standard ABC

The ABC approximation to the likelihood, built from a summary statistic and a tolerance, is exact if (1) the summary statistics are sufficient and (2) the upper and lower tolerances coincide (Beaumont 2002).



In practice this is not feasible, so standard ABC relies on an 'asymptotic' argument.


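[Figure: n-ABC estimate of $\pi_\tau(\sigma^2 \mid x)$ against $\sigma^2$ for n = 60 with naive tolerances $\tau^- = 0.35$, $\tau^+ = 1.65$, compared with the exact posterior $\pi(\sigma^2 \mid x)$ and its mode $\arg\max_{\sigma^2} \pi(\sigma^2 \mid x)$.]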

Standard ABC is noisy, even with sufficient summary statistics (Fearnhead & Prangle 2012).
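A minimal sketch of standard rejection ABC for the variance example used later in these slides, assuming a hypothetical Uniform(0.1, 3) prior on σ², the ratio of sample variances as the summary discrepancy, and the naive tolerances τ⁻ = 0.35, τ⁺ = 1.65 from the figure. The function name `rejection_abc` and all settings are illustrative, not the authors' code:

```python
import numpy as np


def rejection_abc(x, m, tau_lo, tau_hi, n_draws, rng):
    """Standard rejection ABC for sigma^2 when the data are N(0, sigma^2).

    Hypothetical choices: prior sigma^2 ~ Uniform(0.1, 3) and the summary
    discrepancy S^2(y) / S^2(x), accepted if it lies in [tau_lo, tau_hi].
    """
    s2_x = np.var(x, ddof=1)
    sigma2 = rng.uniform(0.1, 3.0, size=n_draws)                      # prior draws
    y = rng.normal(0.0, np.sqrt(sigma2)[:, None], size=(n_draws, m))  # one data set per draw
    ratio = np.var(y, axis=1, ddof=1) / s2_x                          # summary discrepancy
    accept = (ratio >= tau_lo) & (ratio <= tau_hi)                    # naive tolerances
    return sigma2[accept]


rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=60)  # synthetic observed data, n = 60
post = rejection_abc(x, m=60, tau_lo=0.35, tau_hi=1.65, n_draws=20_000, rng=rng)
print(post.size, post.mean())
```

The accepted draws form the standard ABC estimate of the posterior; nothing in this sampler ties the tolerances to the exact posterior, which is the gap ABC* addresses.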


ABC*

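[Figure: n-ABC estimate of $\pi_\tau(\sigma^2 \mid x)$ against $\sigma^2$ for n = 60 with calibrated tolerances $\tau^- = 0.572$, $\tau^+ = 1.808$ and m = 97, compared with the exact posterior $\pi(\sigma^2 \mid x)$ and its mode $\arg\max_{\sigma^2} \pi(\sigma^2 \mid x)$.]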

Can we construct ABC such that inference is accurate
• with respect to point estimates, e.g. the MAP,
• with respect to overall similarity in distribution, e.g. the KL divergence,
• while maintaining computational feasibility?

If yes, under which conditions?

How general are these?

(Ratmann, Camacho, Meijer, Donker; arXiv 2013)


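ABC* step 1
To avoid asymptotic arguments, interpret the ABC accept/reject step as the outcome of a decision test with test statistic $T(s_{1:n}(x), s_{1:m}(y))$.

T-test objective: declare $\mu(\theta)$ and $\mu_x$ unequal. $H_0: \mu(\theta) = \mu_x$ (equal); $H_1: \mu(\theta) \neq \mu_x$ (unequal); rejection region $(-\infty, c^-] \cup [c^+, \infty)$.

ABC objective: declare $\mu(\theta)$ and $\mu_x$ equal. $H_0: \mu(\theta) \neq \mu_x$ (unequal); $H_1: \mu(\theta) = \mu_x$ (equal); rejection region $[c^-, c^+]$.

ABC accepts exactly on the rejection event $R = \{c^- \le T(s_{1:n}(x), s_{1:m}(y)) \le c^+\}$.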
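The flip between the two objectives can be made concrete with a small sketch; `in_rejection_region` is a hypothetical helper, and the critical values are arbitrary illustrative numbers:

```python
def in_rejection_region(t, c_lo, c_hi, objective):
    """Return True if t lies in the rejection region of the chosen objective.

    "t-test": H0 is mu(theta) == mu_x; reject on (-inf, c_lo] or [c_hi, inf).
    "abc":    H0 is mu(theta) != mu_x; reject on [c_lo, c_hi], so ABC accepts
              exactly when H0 is rejected.
    """
    if objective == "t-test":
        return t <= c_lo or t >= c_hi
    if objective == "abc":
        return c_lo <= t <= c_hi
    raise ValueError(f"unknown objective: {objective!r}")


# Hypothetical critical values c_lo = -2, c_hi = 2
for t in (-3.0, 0.0, 3.0):
    print(t, in_rejection_region(t, -2.0, 2.0, "t-test"), in_rejection_region(t, -2.0, 2.0, "abc"))
```

Away from the boundary points the two regions are complements: a statistic that would make a t-test declare "unequal" makes the ABC test keep its null of "unequal", and vice versa.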


The critical values $c^-$ and $c^+$ are fully determined such that $P(R \mid H_0) \le \alpha$.
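For interval tests of this kind, one standard way to meet the size condition, assuming T comes from a one-parameter exponential family (as the scaled χ² statistic in the variance example does), is the classical construction described in Lehmann and Romano's Testing Statistical Hypotheses: fix the size to α at both boundaries of the equivalence region. The power function is then unimodal, so the size over all of $H_0$ is α:

```latex
R = \{c^- \le T \le c^+\}, \qquad
P(R \mid \rho = \tau^-) = P(R \mid \rho = \tau^+) = \alpha
\quad\Longrightarrow\quad
\sup_{\rho \notin [\tau^-, \tau^+]} P(R \mid \rho) = \alpha .
```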


Let $\rho = \mu(\theta) - \mu_x$; then the ABC approximation to the likelihood is the power function of the test, $\rho \mapsto P(R \mid \rho)$.



This holds for a specific test: the two-sided, one-sample equivalence hypothesis test.
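When T can be sampled, the power function can always be estimated by simulation. A minimal Monte Carlo sketch, assuming for illustration a scaled χ² statistic $T \sim (\rho/s)\,\chi^2_{df}$ with s = df = 59 and hypothetical critical values c⁻ = 0.8, c⁺ = 1.25; `mc_power` and `simulate_T` are illustrative names:

```python
import numpy as np


def mc_power(rho_grid, simulate_T, c_lo, c_hi, n_rep, rng):
    """Monte Carlo estimate of rho -> P(c_lo <= T <= c_hi | rho)."""
    power = []
    for rho in rho_grid:
        t = simulate_T(rho, n_rep, rng)
        power.append(np.mean((t >= c_lo) & (t <= c_hi)))
    return np.array(power)


def simulate_T(rho, size, rng, s=59, df=59):
    """Hypothetical test statistic T ~ (rho / s) * chi2_df."""
    return rho / s * rng.chisquare(df, size=size)


rng = np.random.default_rng(0)
grid = np.linspace(0.5, 2.0, 16)  # rho = 0.5, 0.6, ..., 2.0
pw = mc_power(grid, simulate_T, c_lo=0.8, c_hi=1.25, n_rep=20_000, rng=rng)
print(grid[np.argmax(pw)], pw.max())
```

Read as a function of ρ, the estimated acceptance probability is exactly the ABC approximation to the likelihood on ρ-space.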


Example: test variance

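Suppose $x_{1:n} \sim \mathcal{N}(0, \sigma_x^2)$ and $y_{1:m} \sim \mathcal{N}(0, \sigma^2)$; for simplicity, the summary values equal the data. Let $\rho = \sigma^2/\sigma_x^2$ with $\rho^\star = 1$, and test

$H_0: \rho \notin [\tau^-, \tau^+]$ versus $H_1: \rho \in [\tau^-, \tau^+]$.

Then

$T = S^2(y_{1:m})/S^2(x_{1:n}) = \rho\,\frac{1}{n-1}\sum_{i=1}^{m}\frac{(y_i - \bar{y})^2}{\sigma^2} \sim \frac{\rho}{n-1}\,\chi^2_{m-1}$.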



Here $\rho^\star = 1$ is the point of equality.



$\tau^-$ and $\tau^+$ are tolerances on the population level.
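A quick Monte Carlo check of the stated distribution of T, assuming, as the identity above requires, that $S^2(x_{1:n})$ is identified with $\sigma_x^2$ and the sum of squares is normalised by n − 1. The values n = 60, m = 97 and σ² = 1.3 are hypothetical. The simulated T should match the mean $\rho(m-1)/(n-1)$ and variance $2\rho^2(m-1)/(n-1)^2$ of $\frac{\rho}{n-1}\chi^2_{m-1}$:

```python
import numpy as np

# Hypothetical settings
n, m = 60, 97
sigma2_x = 1.0            # observed variance, identified with S^2(x_{1:n})
sigma2 = 1.3              # variance proposed for the simulated data
rho = sigma2 / sigma2_x   # population-level discrepancy

rng = np.random.default_rng(42)
y = rng.normal(0.0, np.sqrt(sigma2), size=(20_000, m))  # 20,000 simulated y_{1:m}
ss = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
T = ss / ((n - 1) * sigma2_x)  # = rho/(n-1) * sum (y_i - ybar)^2 / sigma^2

mean_theory = rho * (m - 1) / (n - 1)
var_theory = 2 * rho**2 * (m - 1) / (n - 1) ** 2
print(T.mean(), mean_theory)
print(T.var(), var_theory)
```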



Since the distribution of T is known, we can work out $c^-$ and $c^+$.
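A sketch of this calculation, assuming level α = 0.01 and hypothetical n = 60, m = 97 with the naive tolerances. The critical values solve $P(R \mid \tau^-) = P(R \mid \tau^+) = \alpha$, found here with a one-dimensional root search on the χ² scale using SciPy's `chi2` and `brentq`; `calibrate_c` is an illustrative name, not the authors' implementation:

```python
from scipy.optimize import brentq
from scipy.stats import chi2


def power(rho, c_lo, c_hi, s, df):
    """P(c_lo <= T <= c_hi | rho) when T ~ (rho / s) * chi2_df."""
    return chi2.cdf(s * c_hi / rho, df) - chi2.cdf(s * c_lo / rho, df)


def calibrate_c(tau_lo, tau_hi, alpha, s, df):
    """Critical values with P(R | tau_lo) = P(R | tau_hi) = alpha.

    Works on the chi2 scale, u = s * c_lo and w = s * c_hi.
    """
    def w_of(u):
        # Choose w so that the size at rho = tau_hi is exactly alpha.
        p = min(alpha + chi2.cdf(u / tau_hi, df), 1.0)
        return tau_hi * chi2.ppf(p, df)

    def gap(u):
        # Size at rho = tau_lo minus alpha; sf keeps upper tails accurate.
        return chi2.sf(u / tau_lo, df) - chi2.sf(w_of(u) / tau_lo, df) - alpha

    u_max = tau_hi * chi2.ppf(1.0 - alpha, df)  # gap(0) > 0 > gap(u_max)
    u = brentq(gap, 0.0, u_max)
    return u / s, w_of(u) / s


# Variance example with hypothetical n = 60, m = 97 and the naive tolerances
n, m, alpha = 60, 97, 0.01
c_lo, c_hi = calibrate_c(0.35, 1.65, alpha, s=n - 1, df=m - 1)
print(c_lo, c_hi)
print(power(0.35, c_lo, c_hi, n - 1, m - 1), power(1.65, c_lo, c_hi, n - 1, m - 1))
```

Fixing the size at τ⁺ first and searching over the lower cut keeps both conditions in numerically well-behaved tails, and the search interval is guaranteed to bracket a root whenever τ⁻ < τ⁺.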






Knowing the distribution of T, we can also work out the power function.

[Figure: power of the test as a function of ρ.]
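For the scaled χ² statistic the power function is $P(R \mid \rho) = F_{m-1}\big((n-1)c^+/\rho\big) - F_{m-1}\big((n-1)c^-/\rho\big)$, where $F_{m-1}$ is the $\chi^2_{m-1}$ distribution function. Setting its derivative in ρ to zero gives the mode in closed form, $\rho_{\max} = (n-1)(c^+ - c^-)/\big((m-1)\log(c^+/c^-)\big)$; this is our own derivation, not taken from the slides. A sketch that checks it against a fine grid, with hypothetical c⁻ = 1.2 and c⁺ = 2.0:

```python
import numpy as np
from scipy.stats import chi2


def power(rho, c_lo, c_hi, s, df):
    """Power function rho -> P(c_lo <= T <= c_hi | rho) for T ~ (rho / s) * chi2_df."""
    rho = np.asarray(rho, dtype=float)
    return chi2.cdf(s * c_hi / rho, df) - chi2.cdf(s * c_lo / rho, df)


def power_mode(c_lo, c_hi, s, df):
    """Closed-form maximiser of the power function (its derivative in rho set to zero)."""
    return s * (c_hi - c_lo) / (df * np.log(c_hi / c_lo))


# Hypothetical critical values; s = n - 1 and df = m - 1 as in the variance example
s, df = 59, 96
c_lo, c_hi = 1.2, 2.0
rho = np.linspace(0.3, 3.0, 2701)  # step 0.001
p = power(rho, c_lo, c_hi, s, df)
print(rho[np.argmax(p)], power_mode(c_lo, c_hi, s, df))
```

The closed form makes it cheap to see where the ABC likelihood approximation peaks for any pair of critical values, which is what a calibration of the mode needs.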



Knowing the distribution of T, we can also calibrate the test.

[Figure: power as a function of ρ, annotated "move mode", "tighten" and "increase".]
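One reading of the "move mode" step, sketched under assumptions: fix τ⁻ and search for the τ⁺ whose calibrated test has its power mode at the point of equality ρ* = 1. This only moves the mode; it does not reproduce the full calibration of the tolerances and m (the "tighten" step). The inputs τ⁻ = 0.477, α = 0.01, n = 60, m = 97 are illustrative, so the resulting τ⁺ need not match the figure, and `calibrate_tau_hi` is a hypothetical name:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2


def calibrate_c(tau_lo, tau_hi, alpha, s, df):
    """Critical values with size alpha at both rho = tau_lo and rho = tau_hi."""
    def w_of(u):
        p = min(alpha + chi2.cdf(u / tau_hi, df), 1.0)
        return tau_hi * chi2.ppf(p, df)

    def gap(u):
        return chi2.sf(u / tau_lo, df) - chi2.sf(w_of(u) / tau_lo, df) - alpha

    u = brentq(gap, 0.0, tau_hi * chi2.ppf(1.0 - alpha, df))
    return u / s, w_of(u) / s


def power_mode(c_lo, c_hi, s, df):
    """Location of the maximum of rho -> P(c_lo <= T <= c_hi | rho)."""
    return s * (c_hi - c_lo) / (df * np.log(c_hi / c_lo))


def calibrate_tau_hi(tau_lo, alpha, s, df, rho_star=1.0, upper=20.0):
    """Hypothetical 'move mode' step: pick tau_hi so that the power peaks at rho_star."""
    def mode_gap(tau_hi):
        return power_mode(*calibrate_c(tau_lo, tau_hi, alpha, s, df), s, df) - rho_star

    # With tau_hi = rho_star the mode lies strictly inside (tau_lo, rho_star), so mode_gap < 0
    return brentq(mode_gap, rho_star, upper)


n, m, alpha = 60, 97, 0.01
tau_hi = calibrate_tau_hi(0.477, alpha, s=n - 1, df=m - 1)
c_lo, c_hi = calibrate_c(0.477, tau_hi, alpha, n - 1, m - 1)
print(tau_hi, power_mode(c_lo, c_hi, n - 1, m - 1))
```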



Calibrated tolerances:

[Figure: n-ABC estimate of $\pi_\tau(\sigma^2 \mid x)$ against $\sigma^2$ for n = 60 with calibrated tolerances $\tau^- = 0.477$, $\tau^+ = 2.2$ and naive tolerances $\tau^- = 0.35$, $\tau^+ = 1.65$, compared with the exact posterior $\pi(\sigma^2 \mid x)$ and its mode $\arg\max_{\sigma^2} \pi(\sigma^2 \mid x)$.]



Calibrated tolerances and calibrated m:

[Figure: n-ABC estimate of $\pi_\tau(\sigma^2 \mid x)$ against $\sigma^2$ for n = 60 with calibrated tolerances $\tau^- = 0.572$, $\tau^+ = 1.808$ and m = 97, and $\tau^- = 0.726$, $\tau^+ = 1.392$ and m = 300 (annotated "tighten"), compared with the exact posterior $\pi(\sigma^2 \mid x)$ and its mode $\arg\max_{\sigma^2} \pi(\sigma^2 \mid x)$.]





Conclusions 1
Using statistical decision theory, the ABC accept/reject step can be set up such that

• the ABC* MAP equals the MAP of the exact posterior

• the KL divergence of the ABC* posterior to the exact posterior is minimal


ABC* step 2

Statistical decision testing on summary level
1. Repeated data points on the summary level, "summary values" ➣ we can model their distribution, e.g. $s_{1:n}(x) \sim \mathcal{N}(\mu_x, \sigma_x^2)$ and $s_{1:m}(y) \sim \mathcal{N}(\mu(\theta), \sigma^2(\theta))$.
2. Testing on the auxiliary space ➣ given $s_{1:n}(x)$ and $s_{1:m}(y)$, is the underlying $\rho = \mu(\theta) - \mu_x$ small?
3. Indirect inference ➣ link the auxiliary space back to the original space.
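For Gaussian summary values, a familiar equivalence test for a mean discrepancy is Schuirmann's two one-sided tests (TOST). The sketch below swaps it in for illustration only: it is not the calibrated ABC* test, it treats the observed mean $\mu_x$ as fixed, and the tolerances ±1 on ρ and all data are hypothetical:

```python
import numpy as np
from scipy.stats import t as student_t


def tost_accept(s_y, mu_x, tau_lo, tau_hi, alpha=0.05):
    """Declare rho = mean(s_y) - mu_x to lie in [tau_lo, tau_hi].

    Two one-sided t-tests: reject H0 (rho outside the interval) only if both
    one-sided tests reject at level alpha.
    """
    m = len(s_y)
    se = np.std(s_y, ddof=1) / np.sqrt(m)
    diff = np.mean(s_y) - mu_x
    crit = student_t.ppf(1.0 - alpha, df=m - 1)
    return (diff - tau_lo) / se > crit and (diff - tau_hi) / se < -crit


rng = np.random.default_rng(7)
s_x = rng.normal(10.0, 2.0, size=60)           # synthetic observed summary values
mu_x = s_x.mean()
close = rng.normal(mu_x + 0.1, 2.0, size=200)  # simulated, mu(theta) near mu_x
far = rng.normal(mu_x + 2.0, 2.0, size=200)    # simulated, mu(theta) far from mu_x
print(tost_accept(close, mu_x, -1.0, 1.0), tost_accept(far, mu_x, -1.0, 1.0))
```

As in step 1, "accept" here means rejecting the null of non-equivalence.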


Assumptions: summary values can be found such that
A1 they are sufficient for θ;
A2 their distribution can be modeled in an elementary way, so that test statistics are available and can be calibrated.

Further conditions are needed to transport the accurate ABC* density to the original space.




Summary values
Suitable data points on a summary level can be found.

[Figure: data.] The data time series is biennial, so odd and even time-series values can be modeled as iid Gaussian.



obs: $s_{1:n}(x) \sim \mathcal{N}(\mu_x, \sigma_x^2)$
sim: $s_{1:m}(y) \sim \mathcal{N}(\mu(\theta), \sigma^2(\theta))$
population error: $\rho = \mu(\theta) - \mu_x$
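A sketch of how such summary values could be extracted, assuming a biennial series in which alternate years form two iid Gaussian samples. The data are synthetic, "even" means 0-based positions 0, 2, 4, ..., and `summary_values` / `gaussian_fit` are hypothetical helpers:

```python
import numpy as np


def summary_values(series):
    """Split a biennial series into even- and odd-indexed values (0-based).

    Each subset is treated as an iid sample of summary values; whether that
    holds must be checked for the data at hand.
    """
    series = np.asarray(series, dtype=float)
    return {"even": series[0::2], "odd": series[1::2]}


def gaussian_fit(values):
    """Sample estimates of (mu, sigma^2) for s_{1:n} ~ N(mu, sigma^2)."""
    return values.mean(), values.var(ddof=1)


# Synthetic biennial series: alternating high and low seasons plus noise
rng = np.random.default_rng(3)
years = 30
level = np.where(np.arange(years) % 2 == 0, 100.0, 40.0)
series = level + rng.normal(0.0, 5.0, size=years)

for name, vals in summary_values(series).items():
    mu, s2 = gaussian_fit(vals)
    print(name, len(vals), round(mu, 1), round(s2, 1))
```

Each subset then supplies one set of summary values, with its own population parameters to compare against the simulated ones.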

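Link function: $L: \Theta \subset \mathbb{R}^D \to \mathbb{R}^K$, $\theta \mapsto (\rho_1, \dots, \rho_K)$, where each $\rho_k$ is a function of the observed population parameter $\nu_{xk}$ and the simulated one $\nu_k(\theta)$; $\theta = (\theta_1, \dots, \theta_D)$ are the D original parameters and $\rho = (\rho_1, \dots, \rho_K)$ the K error parameters.

Modeling summary values constructs an auxiliary probability space.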

Discussion with respect to indirect inference (Gouriéroux 1993):
• a difficulty in indirect inference is which auxiliary space to choose;
• here, it is constructed empirically from the distribution of the summary values.


ABC* indirect inference

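Using assumptions A1, A2:

$\pi_{\text{true}}(\theta \mid x) \propto \ell(x \mid \theta)\,\pi(\theta) \propto \ell\big(s^k_{1:n_k}(x),\, k = 1, \dots, K \mid \theta\big)\,\pi(\theta) = \ell\big(s^k_{1:n_k}(x),\, k = 1, \dots, K \mid \rho\big)\,\pi(\rho)\,|\partial L(\theta)|$

$\pi_{\text{abc}}(\theta \mid x) \propto P_x(\text{ABC accept} \mid \rho)\,\pi(\rho)\,|\partial L(\theta)|$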






The ABC* approximation on ρ-space is $P_x(\text{ABC accept} \mid \rho)\,\pi(\rho)$; through calibration of the ABC tolerances and m, it is matched to $\ell\big(s^k_{1:n_k}(x),\, k = 1, \dots, K \mid \rho\big)\,\pi(\rho)$.


Regularity conditions on the link function
A3 the link function is bijective and continuously differentiable.


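Example: moving average
There are no sufficient statistics other than the data, but the model is simple enough that the link function is known analytically.

$x_t = u_t + a\,u_{t-1}$, $u_t \sim \mathcal{N}(0, \sigma^2)$, $\theta = (a, \sigma^2)$

$\nu_1 = (1 + a^2)\,\sigma^2$, $\nu_2 = a/(1 + a^2)$

$\rho_1 = (1 + a^2)\,\sigma^2/\nu_{x1}$, $\rho_2 = \operatorname{atanh}\big(a/(1 + a^2)\big) - \operatorname{atanh}(\nu_{x2})$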
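A sketch of this link function and its Jacobian factor $|\partial L(\theta)|$. The determinant $(1 - a^4)/\big(\nu_{x1}(1 + a^2 + a^4)\big)$ is derived by hand here (not taken from the slides) and checked numerically, and the inverse uses the root with $|a| < 1$. Because $a$ and $1/a$ give the same autocorrelation and the determinant vanishes at $|a| = 1$, bijectivity (A3) holds only on $|a| < 1$. The observed values $\nu_{x1} = 1.1$, $\nu_{x2} = 0.25$ are hypothetical:

```python
import numpy as np


def link(a, sigma2, nu_x1, nu_x2):
    """L(theta) = (rho_1, rho_2) for the MA(1) example, theta = (a, sigma^2)."""
    rho1 = (1.0 + a**2) * sigma2 / nu_x1
    rho2 = np.arctanh(a / (1.0 + a**2)) - np.arctanh(nu_x2)
    return np.array([rho1, rho2])


def abs_jacobian(a, nu_x1):
    """|det dL/dtheta|, derived by hand from the link above (free of sigma^2)."""
    return abs((1.0 - a**4) / (nu_x1 * (1.0 + a**2 + a**4)))


def inverse_link(rho1, rho2, nu_x1, nu_x2):
    """Invert L on the invertible region |a| < 1."""
    nu2 = np.tanh(rho2 + np.arctanh(nu_x2))  # lag-1 autocorrelation a / (1 + a^2)
    a = 0.0 if nu2 == 0 else (1.0 - np.sqrt(1.0 - 4.0 * nu2**2)) / (2.0 * nu2)
    sigma2 = rho1 * nu_x1 / (1.0 + a**2)
    return a, sigma2


nu_x1, nu_x2 = 1.1, 0.25  # hypothetical observed variance and lag-1 autocorrelation
theta = (0.3, 1.2)
rho = link(*theta, nu_x1, nu_x2)
print(inverse_link(*rho, nu_x1, nu_x2))  # recovers (0.3, 1.2) up to rounding

# Central-difference check of the Jacobian determinant
h = 1e-6
J = np.column_stack([
    (link(theta[0] + h, theta[1], nu_x1, nu_x2) - link(theta[0] - h, theta[1], nu_x1, nu_x2)) / (2 * h),
    (link(theta[0], theta[1] + h, nu_x1, nu_x2) - link(theta[0], theta[1] - h, nu_x1, nu_x2)) / (2 * h),
])
print(abs(np.linalg.det(J)), abs_jacobian(theta[0], nu_x1))
```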





[Figure: two panels of posterior contours over a and σ², compared with the exact posterior. First panel, testing only the variance: the link function is not bijective. Second panel, testing variance and autocorrelation with even values only: the summary values are not sufficient.]






[Figure: a further panel of posterior contours over a and σ², compared with the exact posterior.]

With 5 tests, the link function is bijective and the summary values are sufficient.


Example: flu time series data
Stochastic transmission model, derived from ODEs.

Three parameters of interest: the reproductive number R0, the duration of immunity and the reporting rate.

6 sets of iid summary values from 3 time series, obtained by subsetting odd and even values.




Example: flu time series data
Test from the ABC* output whether the link function is bijective.

[Figure panels: previous standard MCMC ABC; MCMC ABC* with calibrated tolerances.]










Conclusions

Using statistical decision theory in ABC,

• we can entirely avoid previous asymptotic arguments,

• and construct accurate ABC algorithms by calibrating the decision tests appropriately.

It is necessary to understand the distribution of the data on a summary level: identifying replicate structures and modeling them is key in ABC, as it is in any approach for which the likelihood is tractable.


Thank you

Co-workers on this project:

Anton Camacho (London School of Hygiene & Tropical Medicine, UK)

Adam Meijer (National Institute of the Environment & Public Health, NL)

Gé Donker (Netherlands Institute for Health Services Research, NL)

Acknowledgements:

Ioanna Manolopoulou (University College London)

Christian Robert (Paris Dauphine)

