"reflections on the probability space induced by moment conditions with implications for Bayesian inference": a discussion

Christian P. Robert

Université Paris-Dauphine, Paris & University of Warwick, Coventry

Outline

what is the question?

what could the question be?

what is the answer?

what could the answer be?

what is the question?

“If one specifies a set of moment functions collected together into a vector m(x, θ) of dimension M, regards θ as random and asserts that some transformation Z(x, θ) has distribution ψ, then what is required to use this information and then possibly a prior to make valid inference?” R. Gallant, p.4

Priors without efforts

- quest for model-induced priors dating back to the early 1900s [Lhoste, 1923]

- reference priors such as Jeffreys' prior, induced by the sampling distribution [Jeffreys, 1939]

- fiducial distributions as Fisher's attempted answer [Fisher, 1956]

Fisher’s t fiducial distribution

When considering

$$t = \frac{\bar{x} - \theta}{s/\sqrt{n}}$$

the ratio has a frequentist $t$ distribution with $n - 1$ degrees of freedom


However, there is no equivalent justification for asserting that

$$t = \frac{\bar{x} - \theta}{s/\sqrt{n}}$$

has a $t$ posterior distribution with $n - 1$ degrees of freedom on $\theta$, given $(\bar{x}, s)$, except when using a non-informative and improper prior $\pi(\theta, \sigma^2) \propto 1/\sigma^2$, since then

$$\theta \sim \mathcal{T}_{n-1}(\bar{x}, s/\sqrt{n})$$
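The improper-prior case can be checked by simulation. The following is a minimal sketch, assuming normal data; the sample size, seed and number of draws are hypothetical values chosen for illustration. It draws (θ, σ²) from the posterior under π(θ, σ²) ∝ 1/σ² with the data held fixed, and compares the resulting pivot with a Student t distribution with n − 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data set: any fixed sample works, since we condition on it.
x = rng.normal(loc=2.0, scale=3.0, size=10)
n, xbar, s = x.size, x.mean(), x.std(ddof=1)

# Posterior under pi(theta, sigma^2) proportional to 1/sigma^2 (normal model):
#   (n-1) s^2 / sigma^2 | x ~ chi^2_{n-1}
#   theta | sigma^2, x   ~ N(xbar, sigma^2 / n)
draws = 200_000
sigma2 = (n - 1) * s**2 / rng.chisquare(df=n - 1, size=draws)
theta = rng.normal(loc=xbar, scale=np.sqrt(sigma2 / n))

# Pivot evaluated at posterior draws of theta, with (xbar, s) fixed.
t = (xbar - theta) / (s / np.sqrt(n))

# Kolmogorov-Smirnov distance to the Student t with n-1 degrees of freedom.
ks = stats.kstest(t, stats.t(df=n - 1).cdf)
print(ks.statistic)
```

A small Kolmogorov-Smirnov distance is consistent with the t posterior in this special case; it says nothing about other priors.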


Furthermore, neither the Bayesian nor the frequentist interpretation implies that

$$t = \frac{\bar{x} - \theta}{s/\sqrt{n}}$$

has a $t$ posterior distribution with $n - 1$ degrees of freedom jointly

what could the question be?

Given a set of moment equations

$$E[m(X_1, \ldots, X_n, \theta)] = 0$$

(where both the $X_i$'s and $\theta$ are random), can one derive a likelihood function and a prior distribution compatible with those constraints?

coherence across sample sizes n

Highly complex question, since it implies that the integral equation

$$\int_{\Theta \times \mathcal{X}^n} m(x_1, \ldots, x_n, \theta)\, \pi(\theta)\, f(x_1|\theta) \cdots f(x_n|\theta)\, d\theta\, dx_1 \cdots dx_n = 0$$

must or should have a solution in $(\pi, f)$ for all $n$'s. Is this possible outside of a likelihood × prior modelling?
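To make the integral equation concrete, here is a minimal Monte Carlo sketch for one hypothetical choice of (π, f, m), none of which comes from the paper under discussion: π is N(0, 1), f(·|θ) is N(θ, 1), and m is the difference between the sample mean and θ. For this particular triple the joint expectation vanishes for every n; the sketch only illustrates what "a solution for all n" means and does not address existence for a general m.

```python
import numpy as np

def joint_moment(n, draws=100_000, seed=0):
    """Monte Carlo estimate of E[m(X_1..X_n, theta)] under pi(theta) * prod f(x_i|theta).

    Assumed (hypothetical) ingredients:
      pi = N(0, 1), f(.|theta) = N(theta, 1), m = sample mean minus theta.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 1.0, size=draws)                # theta ~ pi
    x = rng.normal(theta[:, None], 1.0, size=(draws, n))    # X_i | theta ~ f
    m = x.mean(axis=1) - theta                              # moment function
    return m.mean()

# Estimates for several sample sizes; each should be close to zero.
estimates = {n: joint_moment(n) for n in (1, 5, 25)}
print(estimates)
```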

Zellner’s Bayesian method of moments

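Given moment conditions on the parameters $\theta$ and $\sigma^2$

$$E[\theta \mid x_1, \ldots, x_n] = \bar{x}_n, \qquad E[\sigma^2 \mid x_1, \ldots] = s_n^2, \qquad \mathrm{var}(\theta \mid \sigma^2, x_1, \ldots) = \sigma^2/n$$

derivation of a maximum entropy posterior

$$\theta \mid \sigma^2, x_1, \ldots \sim \mathcal{N}(\bar{x}_n, \sigma^2/n), \qquad \sigma^{-2} \mid x_1, \ldots \sim \mathrm{Exp}(s_n^2)$$

[Zellner, 1996]

but incompatible with the corresponding predictive distribution [Geisser & Seidenfeld, 1999]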

what is the answer?

Under the condition that Z(·, θ) is surjective,

$$p^*(x \mid \theta) = \psi(Z(x, \theta))$$

with an arbitrary choice of prior π(θ)

- lhs and rhs operate on different spaces

- no reason why the density ψ should integrate against Lebesgue measure in n-dimensional Euclidean space

- no direct connection with a genuine likelihood function, i.e., the product of the densities of the $X_i$'s (conditional on θ)

what could the answer be?

“A common situation that requires consideration of the notions that follow is that deriving the likelihood from a structural model is analytically intractable and one cannot verify that the numerical approximations one would have to make to circumvent the intractability are sufficiently accurate.” R. Gallant, p.7

Approximative Bayesian answers

Defining a joint distribution on $(\theta, x_1, \ldots, x_n)$ through moment equations prevents regular Bayesian inference, as the likelihood is unavailable. There may be alternatives available:

- approximate Bayesian computation (ABC) and empirical likelihood based Bayesian inference [Tavaré et al., 1999; Owen, 201; Mengersen et al., 2013]

- INLA (Laplace), EP (expectation propagation) [Martino et al., 2008; Barthelmé & Chopin, 2014]

- variational Bayes [Jaakkola & Jordan, 2000]

Bayesian approximative answers

- Using a fake likelihood does not prohibit Bayesian analysis, as shown in the paper with the model in eqn. (45)

- However, this requires a case-by-case consistency analysis, since pseudo-likelihoods do not offer the same guarantees

- Example of ABC model choice based on insufficient statistics [Marin et al., 2014]

Empirical likelihood (EL)

Dataset x made of n independent replicates $x = (x_1, \ldots, x_n)$ of a random variable $X \sim F$

Generalized moment condition pseudo-model

$$\mathbb{E}_F[h(X, \phi)] = 0,$$

where h is a known function and φ an unknown parameter

Induced empirical likelihood

$$L_{el}(\phi \mid x) = \max_p \prod_{i=1}^n p_i$$

for all p such that $0 \le p_i \le 1$, $\sum_i p_i = 1$, $\sum_i p_i h(x_i, \phi) = 0$

[Owen, 1988, Biometrika, & Empirical Likelihood, 2001]
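The constrained maximisation can be made concrete in the simplest case. The sketch below assumes a scalar mean constraint, h(x, φ) = x − φ, which is an illustrative choice and not one fixed by the slides; the function name `log_el_mean` is hypothetical. It uses the standard Lagrangian form of the solution, p_i = 1 / (n (1 + λ h_i)), where λ solves Σ_i h_i / (1 + λ h_i) = 0.

```python
import numpy as np
from scipy.optimize import brentq

def log_el_mean(phi, x):
    """Log empirical likelihood of phi under E_F[X - phi] = 0 (scalar case)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = x - phi                              # h(x_i, phi)
    if h.min() >= 0.0 or h.max() <= 0.0:
        return -np.inf                       # phi outside the data range: no feasible p
    # Lagrange multiplier: root of sum_i h_i / (1 + lam * h_i).
    score = lambda lam: np.sum(h / (1.0 + lam * h))
    lo = (1.0 / n - 1.0) / h.max()           # bounds implied by p_i <= 1
    hi = (1.0 / n - 1.0) / h.min()
    lam = brentq(score, lo, hi)
    p = 1.0 / (n * (1.0 + lam * h))          # optimal weights, summing to one
    return np.sum(np.log(p))

x = np.array([1.2, 0.4, 2.3, 1.9, 0.7])      # hypothetical data
at_mean = log_el_mean(x.mean(), x)           # maximum, with p_i = 1/n
elsewhere = log_el_mean(1.0, x)
outside = log_el_mean(5.0, x)
```

At the sample mean the weights are uniform, so the log empirical likelihood equals −n log n; it decreases as φ moves away and is −∞ once φ leaves the range of the data.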

Raw ABCel sampler

Naïve implementation: act as if EL were an exact likelihood [Lazar, 2003, Biometrika]

for i = 1 → N do
    generate φ_i from the prior distribution π(·)
    set the weight ω_i = L_el(φ_i | x_obs)
end for
return (φ_i, ω_i), i = 1, . . . , N

- Output: weighted sample of size N

[Mengersen et al., 2013, PNAS]
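The loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of Mengersen et al.: the moment condition is the scalar mean constraint h(x, φ) = x − φ, the prior is a hypothetical N(0, 2²), the data are simulated, and the names `log_el_mean` and `raw_abcel` are invented for the example. The weights are rescaled by their maximum for numerical stability, which leaves all weighted averages unchanged.

```python
import numpy as np
from scipy.optimize import brentq

def log_el_mean(phi, x):
    """Log empirical likelihood of phi under the (assumed) constraint E_F[X - phi] = 0."""
    h = x - phi
    n = x.size
    if h.min() >= 0.0 or h.max() <= 0.0:
        return -np.inf                       # no feasible weights
    lam = brentq(lambda l: np.sum(h / (1.0 + l * h)),
                 (1.0 / n - 1.0) / h.max(), (1.0 / n - 1.0) / h.min())
    return -np.sum(np.log(n * (1.0 + lam * h)))

def raw_abcel(x_obs, prior_sampler, N, rng):
    """Draw N values from the prior and weight each by its empirical likelihood."""
    phis = prior_sampler(rng, N)
    log_w = np.array([log_el_mean(phi, x_obs) for phi in phis])
    # Rescaled weights; assumes at least one draw has a finite log-weight.
    weights = np.exp(log_w - log_w.max())
    return phis, weights

rng = np.random.default_rng(1)
x_obs = rng.normal(1.0, 1.0, size=30)                    # hypothetical observations
prior = lambda r, N: r.normal(0.0, 2.0, size=N)          # hypothetical prior N(0, 2^2)
phis, weights = raw_abcel(x_obs, prior, 2000, rng)

# Weighted posterior mean estimate of phi.
post_mean = np.sum(weights * phis) / np.sum(weights)
```

Prior draws falling outside the range of the data receive zero weight, which is one reason a diffuse prior can make this raw sampler inefficient.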

- Performance evaluated through effective sample size

$$\mathrm{ESS} = 1 \bigg/ \sum_{i=1}^N \bigg\{ \omega_i \bigg/ \sum_{j=1}^N \omega_j \bigg\}^2$$

[Mengersen et al., 2013, PNAS]
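The effective sample size is the inverse of the sum of squared normalised weights. A minimal sketch follows; the function name `effective_sample_size` is hypothetical.

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = 1 / sum_i (w_i / sum_j w_j)^2 for non-negative importance weights."""
    w = np.asarray(weights, dtype=float)
    w_norm = w / w.sum()                 # normalised weights, summing to one
    return 1.0 / np.sum(w_norm**2)

uniform = effective_sample_size(np.ones(100))        # equal weights
degenerate = effective_sample_size([1.0, 0.0, 0.0])  # one dominant weight
```

Equal weights give an ESS equal to the sample size, while a single non-zero weight gives an ESS of one, so the quantity ranges between 1 and N.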
