Centre for Outbreak Analysis and Modelling
Statistical modeling of summary values leads to accurate Approximate
Bayesian Computations
Oliver Ratmann (Imperial College London, UK)
Anton Camacho (London School of Hygiene & Tropical Medicine, UK)
Adam Meijer (National Institute of the Environment & Public Health, NL)
Gé Donker (Netherlands Institute for Health Services Research, NL)
Tuesday, 7 January 14
Standard ABC
ABC approximation to likelihood, built from summary statistics and tolerances (Beaumont 2002)
is exact if
1) summary statistics are sufficient
2) upper and lower tolerances coincide
in practice not feasible, ‘asymptotic’ argument
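A minimal sketch of a standard ABC rejection sampler may help fix ideas. The toy problem is not specified on this slide and is assumed here: inferring the variance of zero-mean Gaussian data, with the sample variance as summary statistic. The function name `abc_rejection`, the flat prior range and the tolerance values are illustrative assumptions.

```python
import random
import statistics


def abc_rejection(x, n_draws, tau_lo, tau_hi, prior_lo=0.05, prior_hi=4.0, seed=0):
    """Standard ABC rejection for the variance of zero-mean Gaussian data.

    Hypothetical toy setup: flat prior on sigma^2, sample variance as summary.
    """
    rng = random.Random(seed)
    s_obs = statistics.variance(x)                        # observed summary statistic
    accepted = []
    for _ in range(n_draws):
        sigma2 = rng.uniform(prior_lo, prior_hi)          # draw from the prior
        y = [rng.gauss(0.0, sigma2 ** 0.5) for _ in x]    # simulate data of the same size
        ratio = statistics.variance(y) / s_obs            # compare simulated and observed summaries
        if tau_lo <= ratio <= tau_hi:                     # accept if within the tolerances
            accepted.append(sigma2)
    return accepted


rng = random.Random(1)
x_obs = [rng.gauss(0.0, 1.0) for _ in range(60)]
sample = abc_rejection(x_obs, n_draws=2000, tau_lo=0.35, tau_hi=1.65)
print(len(sample), statistics.mean(sample))
```

The accepted draws approximate the posterior only up to the error introduced by the tolerances, which is the issue the rest of the talk addresses.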
[Figure: n-ABC estimate of $\pi_\tau(\sigma^2 \mid x)$ plotted against $\sigma^2$, n=60, naive tolerances $\tau^-=0.35$, $\tau^+=1.65$, shown with $\pi(\sigma^2 \mid x)$ and $\arg\max_{\sigma^2} \pi(\sigma^2 \mid x)$]
Standard ABC is noisy
even with sufficient summary statistics (Fearnhead & Prangle 2012)
ABC*
[Figure: n-ABC estimate of $\pi_\tau(\sigma^2 \mid x)$ plotted against $\sigma^2$, n=60, calibrated tolerances $\tau^-=0.572$, $\tau^+=1.808$, m=97, shown with $\pi(\sigma^2 \mid x)$ and $\arg\max_{\sigma^2} \pi(\sigma^2 \mid x)$]
Can we construct ABC such that inference is accurate
• wrt point estimate, eg MAP
• wrt overall similarity in distribution, eg KL divergence
• and maintain computational feasibility
If yes, under which conditions?
How general are these?
(Ratmann, Camacho, Meijer, Donker; arXiv 2013)
ABC* step 1
To avoid asymptotics, interpret ABC accept/reject step as the outcome of a decision test
$R = \{ c^- \le T(s_{1:n}(x), s_{1:m}(y)) \le c^+ \}$
T-test
objective: declare $\mu(\theta)$, $\mu_x$ unequal
H0: $\mu(\theta)$, $\mu_x$ equal
H1: $\mu(\theta)$, $\mu_x$ unequal
rejection region: $(-\infty, c^-] \cup [c^+, \infty)$
ABC
objective: declare $\mu(\theta)$, $\mu_x$ equal
H0: $\mu(\theta)$, $\mu_x$ unequal
H1: $\mu(\theta)$, $\mu_x$ equal
rejection region: $[c^-, c^+]$
$c^-$, $c^+$ are fully determined such that $P(R \mid H_0) \le \alpha$
Let $\rho = \mu(\theta) - \mu_x$; then the ABC approximation to likelihood is the power function of the test, $\rho \to P(R \mid \rho)$
holds for specific test: two sided, one sample equivalence hypothesis test
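The idea that the critical values are pinned down by the size constraint, and that the acceptance probability as a function of $\rho$ is a power function, can be sketched in a simplified setting. The assumptions here are not from the slides: the simulated summary values are Gaussian with known standard deviation `sigma`, the statistic is the mean difference so that $T \sim N(\rho, \sigma^2/m)$, and the tolerances are symmetric, $[-\tau, \tau]$. The names `power` and `calibrate_c` are hypothetical.

```python
from statistics import NormalDist


def power(rho, c, m, sigma=1.0):
    """P(-c <= T <= c | rho) for T ~ N(rho, sigma^2 / m): the ABC acceptance probability."""
    se = sigma / m ** 0.5
    z = NormalDist()
    return z.cdf((c - rho) / se) - z.cdf((-c - rho) / se)


def calibrate_c(tau, m, alpha=0.05, sigma=1.0):
    """Bisection for c such that the acceptance probability at rho = tau equals alpha.

    For |rho| >= tau the acceptance probability is largest at the boundary,
    so this gives a size-alpha equivalence test of H0: |rho| >= tau.
    """
    lo, hi = 0.0, tau + 10.0 * sigma / m ** 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if power(tau, mid, m, sigma) < alpha:
            lo = mid      # region too narrow: accepts too rarely at the boundary
        else:
            hi = mid      # region too wide: size would exceed alpha
    return 0.5 * (lo + hi)


tau, m, alpha = 0.5, 100, 0.05
c = calibrate_c(tau, m, alpha)
print(round(c, 4), round(power(0.0, c, m), 4))
```

In this sketch the acceptance region is strictly narrower than the population-level tolerance interval, which illustrates why tolerances and critical values should not be confused.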
Example: test variance
suppose $x_{1:n} \sim N(0, \sigma_x^2)$, $y_{1:m} \sim N(0, \sigma^2)$ (for simplicity, summary values equal data)
$\rho = \sigma^2 / \sigma_x^2$, $\rho^\star = 1$ (point of equality)
$H_0: \rho \notin [\tau^-, \tau^+]$, $H_1: \rho \in [\tau^-, \tau^+]$ (tolerances on population level)
then $T = S^2(y_{1:m}) / S^2(x_{1:n}) = \rho \, \frac{1}{n-1} \sum_{i=1}^{m} \frac{(y_i - \bar{y})^2}{\sigma^2} \sim \frac{\rho}{n-1} \chi^2_{m-1}$
know distribution of T, can work out $c^-$, $c^+$ and power function, and calibrate
[Figure: power plotted against $\rho$; annotations: move mode, tighten, increase]
[Figure: n-ABC estimate of $\pi_\tau(\sigma^2 \mid x)$ against $\sigma^2$, n=60, calibrated tolerances $\tau^-=0.477$, $\tau^+=2.2$ versus naive tolerances $\tau^-=0.35$, $\tau^+=1.65$, with exact posterior $\pi(\sigma^2 \mid x)$ and $\arg\max_{\sigma^2} \pi(\sigma^2 \mid x)$]
[Figure: same axes, n=60, calibrated tolerances and calibrated m: $\tau^-=0.572$, $\tau^+=1.808$, m=97 versus $\tau^-=0.726$, $\tau^+=1.392$, m=300 (annotation: tighten), with exact posterior]
Conclusions-1
using statistical decision theory, the ABC accept/reject step can be set up such that
• the ABC* MAP equals the MAP of the exact posterior
• the KL divergence of the ABC* posterior to the exact posterior is minimal
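The variance example can be explored numerically. The sketch below uses only the stated distribution $T \sim \frac{\rho}{n-1}\chi^2_{m-1}$, so that the acceptance probability is $F((n-1)c^+/\rho) - F((n-1)c^-/\rho)$ with $F$ the $\chi^2_{m-1}$ CDF. Setting the derivative to zero gives the maximiser $\rho = (n-1)(c^+ - c^-) / ((m-1)\log(c^+/c^-))$. Two things are assumptions for illustration and not from the slides: that m = n = 60, and that the naive tolerances 0.35 and 1.65 are applied directly as critical values. The helper names are hypothetical, and this is not the calibration routine of the paper.

```python
import math


def chi2_cdf(t, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma function."""
    if t <= 0.0:
        return 0.0
    a, x = 0.5 * df, 0.5 * t
    ap, term = a, 1.0 / a
    total = term
    for _ in range(100000):
        ap += 1.0
        term *= x / ap
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def power(rho, c_lo, c_hi, n, m):
    """P(c_lo <= T <= c_hi | rho) when T ~ rho / (n - 1) * chi2_{m-1}."""
    k = n - 1
    return chi2_cdf(k * c_hi / rho, m - 1) - chi2_cdf(k * c_lo / rho, m - 1)


def power_mode(c_lo, c_hi, n, m):
    """Closed-form maximiser of the power function in rho."""
    return (n - 1) * (c_hi - c_lo) / ((m - 1) * math.log(c_hi / c_lo))


n = m = 60                     # assumption: as many simulated as observed data points
c_lo, c_hi = 0.35, 1.65        # assumption: naive tolerances used directly as critical values
grid = [0.30 + 0.01 * i for i in range(171)]
rho_best = max(grid, key=lambda r: power(r, c_lo, c_hi, n, m))
print(round(power_mode(c_lo, c_hi, n, m), 3), round(rho_best, 2))
```

Under these assumptions the power function peaks below the point of equality $\rho^\star = 1$, which is one way to see why uncalibrated tolerances displace the mode of the ABC estimate.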
ABC* step 2
Statistical decision testing on summary level
1. repeat data points on summary level, “summary values” ➣ can model their distribution, eg $s_{1:n}(x) \sim N(\mu_x, \sigma_x^2)$, $s_{1:m}(y) \sim N(\mu(\theta), \sigma^2(\theta))$
2. testing on auxiliary space ➣ given $s_{1:n}(x)$, $s_{1:m}(y)$, is the underlying $\rho = \mu(\theta) - \mu_x$ small?
3. indirect inference ➣ link auxiliary space back to original space
Assumptions
summary values can be found such that
A1 they are sufficient for θ
A2 their distribution can be modeled in an elementary way so that test statistics are available and can be calibrated
further conditions to transport the accurate ABC* density to the original space
Summary values
suitable data points on a summary level can be found
[Figure: data]
data time series is biennial
odd and even time series values can be modeled as iid Gaussian
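Splitting a biennial series into odd and even values is simple to sketch. The function name `odd_even_summary_values` and the toy series are made up for illustration; whether the resulting subsets really look iid Gaussian has to be checked on the actual data.

```python
import statistics


def odd_even_summary_values(series):
    """Split a time series into odd- and even-indexed values and summarise each subset."""
    odd = series[0::2]      # 1st, 3rd, 5th, ... observations
    even = series[1::2]     # 2nd, 4th, 6th, ... observations
    return {
        name: {"values": vals, "mean": statistics.mean(vals), "var": statistics.variance(vals)}
        for name, vals in (("odd", odd), ("even", even))
    }


toy_series = [12.0, 3.0, 11.0, 4.0, 13.0, 2.0, 12.0, 3.0]   # made-up biennial pattern
summ = odd_even_summary_values(toy_series)
print(summ["odd"]["mean"], summ["even"]["mean"])
```

Each subset then plays the role of one set of summary values $s_{1:n}(x)$ whose mean and variance can be tested separately.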
Modeling summary values
constructs an auxiliary probability space
obs: $s_{1:n}(x) \sim N(\mu_x, \sigma_x^2)$
sim: $s_{1:n}(y) \sim N(\mu(\theta), \sigma^2(\theta))$
population error: $\rho = \mu(\theta) - \mu_x$
Link function
$L : \Theta \subset \mathbb{R}^D \to$ auxiliary space $\subset \mathbb{R}^K$, $\theta \to (\rho_1, \ldots, \rho_K)$, with each $\rho_k$ a function of $(\nu_{xk}, \nu_k(\theta))$
$\theta = (\theta_1, \ldots, \theta_D)$: D orig parameters
$\rho = (\rho_1, \ldots, \rho_K)$: K error parameters
Discussion wrt indirect inference (Gouriéroux 1993)
• difficulty in indirect inference: which aux space chosen
• here constructed empirically from distr of summary values
ABC* indirect inference
using assumptions A1, A2:
true posterior
$\pi(\theta \mid x) \propto \ell(x \mid \theta)\, \pi(\theta)$
$\propto \ell(s_k^{1:n_k}(x),\ k = 1, \ldots, K \mid \theta)\, \pi(\theta)$
$= \ell(s_k^{1:n_k}(x),\ k = 1, \ldots, K \mid \rho)\, \pi(\rho)\, |\partial L(\theta)|$
ABC* approximation on $\rho$-space is
$\pi_{abc}(\theta \mid x) \propto P_x(\text{ABC accept} \mid \rho)\, \pi(\rho)\, |\partial L(\theta)|$
match through calibration of ABC tolerances and m
Regularity conditions on the link function
A3 the link function is bijective and continuously differentiable
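The role of calibration in the two posterior expressions can be spelled out in one step. This is a sketch of the reasoning rather than a statement from the slides: it assumes A1 to A3 hold and writes $s(x)$ as shorthand, introduced here, for the collection $s_k^{1:n_k}(x)$, $k = 1, \ldots, K$.

```latex
\frac{\pi_{abc}(\theta \mid x)}{\pi(\theta \mid x)}
\;\propto\;
\frac{P_x(\text{ABC accept} \mid \rho)\,\pi(\rho)\,|\partial L(\theta)|}
     {\ell(s(x) \mid \rho)\,\pi(\rho)\,|\partial L(\theta)|}
=
\frac{P_x(\text{ABC accept} \mid \rho)}{\ell(s(x) \mid \rho)},
\qquad \rho = L(\theta).
```

The prior and Jacobian terms cancel, so the two posteriors on $\theta$ agree to the extent that the calibrated acceptance probability is proportional, as a function of $\rho$, to the likelihood of the summary values.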
Example: moving average
no sufficient statistics other than data, simple enough so that link function is analytically known
$x_t = u_t + a u_{t-1}$, $u_t \sim N(0, \sigma^2)$
$\theta = (a, \sigma^2)$
$\nu_1 = (1 + a^2)\sigma^2$
$\nu_2 = a / (1 + a^2)$
$\rho_1 = (1 + a^2)\sigma^2 / \nu_{x1}$
$\rho_2 = \operatorname{atanh}(a / (1 + a^2)) - \operatorname{atanh}(\nu_{x2})$
[Figure: contour plots over $a$ (horizontal axis) and $\sigma^2$ (vertical axis), compared with the exact posterior. Panels: "Testing only variance: link not bijective"; "Testing variance and autocorrelation with even values: summary values not sufficient"; "5 tests: link bijective and summary values sufficient"]
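The link function of the moving average example is simple enough to write down and invert. The sketch below follows the formulas for $\rho_1$ and $\rho_2$ given on the slide; the restriction to $|a| < 1$ for the inverse, the function names and the observed values `nu_x1`, `nu_x2` are assumptions made for illustration.

```python
import math


def link(a, sigma2, nu_x1, nu_x2):
    """Map theta = (a, sigma^2) to the error parameters (rho_1, rho_2)."""
    nu1 = (1.0 + a * a) * sigma2          # variance of x_t
    nu2 = a / (1.0 + a * a)               # lag-1 autocorrelation of x_t
    return nu1 / nu_x1, math.atanh(nu2) - math.atanh(nu_x2)


def inverse_link(rho1, rho2, nu_x1, nu_x2):
    """Invert the link on |a| < 1, where a -> a / (1 + a^2) is one-to-one."""
    nu2 = math.tanh(rho2 + math.atanh(nu_x2))
    a = 0.0 if nu2 == 0.0 else (1.0 - math.sqrt(1.0 - 4.0 * nu2 * nu2)) / (2.0 * nu2)
    sigma2 = rho1 * nu_x1 / (1.0 + a * a)
    return a, sigma2


nu_x1, nu_x2 = 1.2, 0.05      # hypothetical observed variance and autocorrelation
rho = link(0.1, 1.0, nu_x1, nu_x2)
theta_back = inverse_link(*rho, nu_x1, nu_x2)

# Testing only the variance: two different theta share the same rho_1,
# so the link restricted to rho_1 cannot be bijective.
rho1_a = link(0.2, 1.0, nu_x1, nu_x2)[0]
rho1_b = link(-0.2, 1.0, nu_x1, nu_x2)[0]
print(theta_back, rho1_a == rho1_b)
```

The round trip recovers $\theta$ when both error parameters are used, whereas the variance alone cannot distinguish $a$ from $-a$.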
Example: flu time series data
stochastic transmission model, derived from ODEs
three parameters of interest: reproductive number R0, duration of immunity, reporting rate
6 sets of iid summary values, from 3 time series, subsetting odd and even values
Example: flu time series data
Test if link bijective from ABC* output
[Figure: previous standard MCMC ABC compared with MCMC ABC* with calibrated tolerances]
Conclusions
using statistical decision theory in ABC,
• we can entirely avoid previous asymptotic arguments
• and construct accurate ABC algorithms by calibrating the decision tests appropriately
necessary to understand the distribution of the data on a summary level
identifying replicate structures and modeling them is key in ABC, as in any other approach for which the likelihood is tractable
Thank you
co-workers on this project
Anton Camacho (London School of Hygiene & Tropical Medicine, UK)
Adam Meijer (National Institute of the Environment & Public Health, NL)
Gé Donker (Netherlands Institute for Health Services Research, NL)
acknowledgements
Ioanna Manolopoulou (University College London)
Christian Robert (Paris Dauphine)