Xiao-Li Meng's slides for his talks at Columbia, Sept. 2011, and ICERM, Nov. 2012
Let's Practice What We Preach: Likelihood Methods for Monte Carlo Data

Xiao-Li Meng
Department of Statistics, Harvard University
September 24, 2011

Based on: Kong, McCullagh, Meng, Nicolae, and Tan (2003, JRSS-B, with discussions); Kong, McCullagh, Meng, and Nicolae (2006, Doksum Festschrift); Tan (2004, JASA); ...; Meng and Tan (201X)
Xiao-Li Meng (Harvard) MCMC+likelihood September 24, 2011 1 / 23
Importance sampling (IS)

Estimand:
\[ c_1 = \int_\Gamma q_1(x)\,\mu(dx) = \int_\Gamma \frac{q_1(x)}{p_2(x)}\, p_2(x)\,\mu(dx). \]

Data: $\{X_{i2},\ i = 1, \dots, n_2\} \sim p_2 = q_2/c_2$

Estimating Equation (EE):
\[ r \equiv \frac{c_1}{c_2} = E_2\!\left[\frac{q_1(X)}{q_2(X)}\right]. \]

The EE estimator:
\[ \hat r = \frac{1}{n_2} \sum_{i=1}^{n_2} \frac{q_1(X_{i2})}{q_2(X_{i2})}. \]

Standard IS estimator for $c_1$ when $c_2 = 1$.
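The EE estimator above can be sketched numerically. The kernels below are illustrative assumptions, not from the slides: $q_1$ is an unnormalized $N(0, 0.5^2)$ kernel (so $c_1 = 0.5\sqrt{2\pi} \approx 1.2533$) and $q_2 = p_2$ is the standard normal density ($c_2 = 1$).

```python
import math, random

def q1(x):  # assumed target kernel: N(0, 0.5^2), so c1 = 0.5 * sqrt(2*pi)
    return math.exp(-x * x / (2 * 0.25))

def q2(x):  # standard normal density, already normalized (c2 = 1)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def is_estimate(n2, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n2):
        x = rng.gauss(0.0, 1.0)   # X ~ p2
        total += q1(x) / q2(x)    # importance weight q1/q2
    return total / n2

r_hat = is_estimate(100_000)  # estimates r = c1/c2 = c1 ≈ 1.2533
```

Note the proposal is wider than the target here, so the weights $q_1/q_2$ are bounded and the estimator has finite variance; reversing the roles would give heavy-tailed weights.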
What about MLE?

The "likelihood" is:
\[ f(X_{12}, \dots, X_{n_2 2}) = \prod_{i=1}^{n_2} p_2(X_{i2}) \quad \text{--- free of the estimand } c_1! \]

So why are $\{X_{i2},\ i = 1, \dots, n_2\}$ even relevant? Violation of the likelihood principle?

What are we "inferring"? What is the "unknown" model parameter?
Bridge sampling (BS)

Data: $\{X_{ij},\ i = 1, \dots, n_j\} \sim p_j = q_j/c_j$, $j = 1, 2$

Estimating Equation (Meng and Wong, 1996):
\[ r \equiv \frac{c_1}{c_2} = \frac{E_2[\alpha(X)\, q_1(X)]}{E_1[\alpha(X)\, q_2(X)]}, \quad \forall\, \alpha: \ 0 < \Big|\int \alpha\, q_1 q_2\, d\mu\Big| < \infty \]

Optimal choice: $\alpha_O(x) \propto [n_1 q_1(x) + n_2 r q_2(x)]^{-1}$

Optimal estimator $\hat r_O$, the limit of
\[ \hat r_O^{(t+1)} = \frac{\dfrac{1}{n_2} \displaystyle\sum_{i=1}^{n_2} \frac{q_1(X_{i2})}{s_1 q_1(X_{i2}) + s_2\, \hat r_O^{(t)} q_2(X_{i2})}}{\dfrac{1}{n_1} \displaystyle\sum_{i=1}^{n_1} \frac{q_2(X_{i1})}{s_1 q_1(X_{i1}) + s_2\, \hat r_O^{(t)} q_2(X_{i1})}}, \qquad s_j = \frac{n_j}{n_1 + n_2}. \]
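The iteration for $\hat r_O$ can be sketched directly. This is a minimal sketch assuming $s_j = n_j/(n_1+n_2)$ and two toy unit-variance normal kernels with means 0 and 1, chosen so both normalizing constants equal $\sqrt{2\pi}$ and the true ratio is $r = 1$.

```python
import math, random

def q1(x):  # N(0,1) kernel; c1 = sqrt(2*pi)
    return math.exp(-x * x / 2)

def q2(x):  # N(1,1) kernel; c2 = sqrt(2*pi), so the true ratio is r = 1
    return math.exp(-(x - 1) ** 2 / 2)

def bridge_ratio(x1, x2, iters=30):
    n1, n2 = len(x1), len(x2)
    s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)
    r = 1.0  # any positive starting value works
    for _ in range(iters):
        num = sum(q1(x) / (s1 * q1(x) + s2 * r * q2(x)) for x in x2) / n2
        den = sum(q2(x) / (s1 * q1(x) + s2 * r * q2(x)) for x in x1) / n1
        r = num / den
    return r

rng = random.Random(1)
x1 = [rng.gauss(0.0, 1.0) for _ in range(5_000)]  # draws from p1
x2 = [rng.gauss(1.0, 1.0) for _ in range(5_000)]  # draws from p2
r_hat = bridge_ratio(x1, x2)  # ≈ 1
```

The iteration needs an estimate of $r$ inside the optimal $\alpha$, which is why it is run to a fixed point rather than evaluated once.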
What about MLE?

The "likelihood" is:
\[ \prod_{j=1}^{2} \prod_{i=1}^{n_j} \frac{q_j(X_{ij})}{c_j} \propto c_1^{-n_1} c_2^{-n_2} \quad \text{--- free of data!} \]

What went wrong: $c_j$ is not a "free parameter" because $c_j = \int_\Gamma q_j(x)\,\mu(dx)$ and $q_j$ is known.

So what is the "unknown" model parameter?

Turns out $\hat r_O$ is the same as Bennett's (1976) optimal acceptance ratio estimator, as well as Geyer's (1994) reversed logistic regression estimator.

So why is that? Can it be improved upon without any "sleight of hand"?
Pretending the measure is unknown!

Because
\[ c = \int_\Gamma q(x)\,\mu(dx), \]
and $q$ is known in the sense that we can evaluate it at any sample value, the only way to make $c$ "unknown" is to assume the underlying measure $\mu$ is "unknown".

This is natural because Monte Carlo simulation means we use samples to represent, and thus estimate/infer, the underlying population $q(x)\mu(dx)$, and hence to estimate/infer $\mu$, since $q$ is known.

Monte Carlo integration is about finding a tractable discrete $\hat\mu$ to approximate the intractable $\mu$.
Importance Sampling Likelihood

Estimand: $c_1 = \int_\Gamma q_1(x)\,\mu(dx)$

Data: $\{X_{i2},\ i = 1, \dots, n_2\} \stackrel{i.i.d.}{\sim} c_2^{-1} q_2(x)\,\mu(dx)$

Likelihood for $\mu$:
\[ L(\mu) = \prod_{i=1}^{n_2} c_2^{-1} q_2(X_{i2})\, \mu(X_{i2}) \]

Note that $c_2$ is a functional of $\mu$.

The nonparametric MLE of $\mu$ is
\[ \hat\mu(dx) = \frac{\hat P(dx)}{q_2(x)}, \quad \hat P \text{ --- the empirical measure.} \]
Importance Sampling Likelihood

Thus the MLE for $r \equiv c_1/c_2$ is
\[ \hat r = \int q_1(x)\,\hat\mu(dx) = \frac{1}{n_2} \sum_{i=1}^{n_2} \frac{q_1(X_{i2})}{q_2(X_{i2})}. \]

When $c_2 = 1$ ($q_2 = p_2$), the standard IS estimator for $c_1$ is obtained.

$\{X_{i2},\ i = 1, \dots, n_2\}$ is (minimal) sufficient for $\mu$ on $S_2 = \{x : q_2(x) > 0\}$, and hence $\hat c_1$ is guaranteed to be consistent only when $S_1 \subset S_2$.
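The support condition $S_1 \subset S_2$ can be seen failing in a toy example (my construction, not from the slides): take $q_1$ a full normal kernel on $\mathbb{R}$ but a half-normal proposal supported only on $\mathbb{R}_+$. The IS estimator then converges to the positive-half integral, not to $c_1$.

```python
import math, random

def q1(x):  # target kernel: full normal on R, c1 = sqrt(2*pi)
    return math.exp(-x * x / 2)

def q2(x):  # proposal: half-normal density, support S2 = R_+ only (c2 = 1)
    return 2 * math.exp(-x * x / 2) / math.sqrt(2 * math.pi) if x > 0 else 0.0

rng = random.Random(2)
n2 = 10_000
xs = [abs(rng.gauss(0.0, 1.0)) for _ in range(n2)]  # half-normal draws

r_hat = sum(q1(x) / q2(x) for x in xs) / n2
# Converges to the positive-half integral sqrt(2*pi)/2 ≈ 1.2533,
# not to c1 = sqrt(2*pi) ≈ 2.5066: the mass on S1 \ S2 is invisible.
```

No diagnostic computed from the draws alone can detect this: the data carry no information about $\mu$ outside $S_2$, which is exactly the sufficiency statement above.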
Bridge Sampling Likelihood

Estimand: $c_j = \int_\Gamma q_j(x)\,\mu(dx)$, $j = 1, \dots, J$.

Data: $\{X_{ij},\ 1 \le i \le n_j\} \sim c_j^{-1} q_j(x)\,\mu(dx)$, $1 \le j \le J$

Likelihood for $\mu$: $L(\mu) = \prod_{j=1}^{J} \prod_{i=1}^{n_j} c_j^{-1} q_j(X_{ij})\,\mu(X_{ij})$

Writing $\theta(x) = \log \mu(x)$, then
\[ \log L(\mu) = n \int_\Gamma \theta(x)\, d\hat P - \sum_{j=1}^{J} n_j \log c_j(\theta), \]
where $\hat P$ is the empirical measure on $\{X_{ij},\ 1 \le i \le n_j,\ 1 \le j \le J\}$.
Bridge Sampling Likelihood

The MLE for $\mu$ is given by equating the canonical sufficient statistic $\hat P$ to its expectation:
\[ n \hat P(dx) = \sum_{j=1}^{J} n_j \hat c_j^{-1} q_j(x)\, \hat\mu(dx), \quad \text{i.e.,} \quad \hat\mu(dx) = \frac{n \hat P(dx)}{\sum_{j=1}^{J} n_j \hat c_j^{-1} q_j(x)}. \tag{A} \]

Consequently, the MLE for $\{c_1, \dots, c_J\}$ must satisfy
\[ \hat c_r = \int_\Gamma q_r(x)\, d\hat\mu = \sum_{j=1}^{J} \sum_{i=1}^{n_j} \frac{q_r(x_{ij})}{\sum_{s=1}^{J} n_s \hat c_s^{-1} q_s(x_{ij})}. \tag{B} \]

(B) is the "dual" equation of (A), and is also the same as the equation for the optimal multiple bridge sampling estimator (Tan 2004).
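Equation (B) can be solved by fixed-point iteration. A sketch under assumed kernels (three unnormalized normals with means 0, 1, 2, all sharing the same normalizing constant): since (B) is scale-invariant in $\hat c$, only the ratios $\hat c_r/\hat c_1$ are identified, and here they should all be 1.

```python
import math, random

# Hypothetical toy setup: three unnormalized normal kernels with means 0, 1, 2.
mus = [0.0, 1.0, 2.0]
qs = [(lambda m: (lambda x: math.exp(-(x - m) ** 2 / 2)))(m) for m in mus]

rng = random.Random(3)
n = [4_000, 4_000, 4_000]
pooled = [rng.gauss(m, 1.0) for m, nj in zip(mus, n) for _ in range(nj)]

# Pre-evaluate every kernel at every pooled draw (the only costly step).
qvals = [[q(x) for x in pooled] for q in qs]

# Fixed-point iteration on equation (B); c is identified only up to scale.
c = [1.0] * len(qs)
for _ in range(50):
    denom = [sum(n[s] * qvals[s][i] / c[s] for s in range(len(qs)))
             for i in range(len(pooled))]
    c = [sum(qvals[r][i] / denom[i] for i in range(len(pooled)))
         for r in range(len(qs))]

ratios = [cj / c[0] for cj in c]  # all ≈ 1
```

Each draw enters every $\hat c_r$ through the common denominator $\sum_s n_s \hat c_s^{-1} q_s$, which is how information is shared across all $J$ samples at once.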
But We Can Ignore Less ...

To restrict the parameter space for $\mu$ by using some knowledge of the known $\mu$, that is, to set up a sub-model.

The new MLE has a smaller asymptotic variance under the sub-model than under the full model.

Examples:
- Group-invariance sub-model
- Linear sub-model
- Log-linear sub-model
A Universally Improved IS

Estimand: $r = c_1/c_2$; $c_j = \int_{\mathbb{R}^d} q_j(x)\,\mu(dx)$

Data: $\{X_{i2},\ i = 1, \dots, n_2\} \stackrel{i.i.d.}{\sim} c_2^{-1} q_2\,\mu(dx)$

Taking $G = \{I_d, -I_d\}$ leads to
\[ \hat r_G = \frac{1}{n_2} \sum_{i=1}^{n_2} \frac{q_1(X_{i2}) + q_1(-X_{i2})}{q_2(X_{i2}) + q_2(-X_{i2})}. \]

Because of the Rao-Blackwellization, $V(\hat r_G) \le V(\hat r)$.

It needs twice as many function evaluations, but typically this is a small insurance premium.

Consider $S_1 = \mathbb{R}$ and $S_2 = \mathbb{R}_+$. Then $\hat r_G$ is consistent for $r$:
\[ \hat r_G = \frac{1}{n_2} \sum_{i=1}^{n_2} \frac{q_1(X_{i2})}{q_2(X_{i2})} + \frac{1}{n_2} \sum_{i=1}^{n_2} \frac{q_1(-X_{i2})}{q_2(X_{i2})}. \]

But the standard IS $\hat r$ only estimates $\int_0^\infty q_1(x)\,\mu(dx)/c_2$.
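The $S_1 = \mathbb{R}$, $S_2 = \mathbb{R}_+$ case above can be checked numerically: with a half-normal proposal and a full-normal target kernel, the plain estimator settles at $\sqrt{2\pi}/2$ while the sign-symmetrized $\hat r_G$ recovers the full $r = \sqrt{2\pi}$.

```python
import math, random

def q1(x):  # target kernel on all of R; c1 = sqrt(2*pi)
    return math.exp(-x * x / 2)

def q2(x):  # half-normal proposal density, support R_+ (c2 = 1)
    return 2 * math.exp(-x * x / 2) / math.sqrt(2 * math.pi) if x > 0 else 0.0

rng = random.Random(4)
xs = [abs(rng.gauss(0.0, 1.0)) for _ in range(10_000)]
n2 = len(xs)

r_plain = sum(q1(x) / q2(x) for x in xs) / n2  # misses the negative axis
r_G = sum((q1(x) + q1(-x)) / (q2(x) + q2(-x)) for x in xs) / n2
# r_G recovers r = c1/c2 = sqrt(2*pi); r_plain gives only sqrt(2*pi)/2
```

Note the per-draw cost: $\hat r_G$ evaluates each kernel at both $x$ and $-x$, the "twice as many evaluations" insurance premium mentioned above.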
There are many more improvements ...

Define a sub-model by requiring $\mu$ to be $G$-invariant, where $G$ is a finite group on $\Gamma$.

The new MLE of $\mu$ is
\[ \hat\mu_G(dx) = \frac{n \hat P_G(dx)}{\sum_{j=1}^{J} n_j \hat c_j^{-1} q^G_j(x)}, \]
where $\hat P_G(A) = \mathrm{ave}_{g \in G}\, \hat P(gA)$ and $q^G_j(x) = \mathrm{ave}_{g \in G}\, q_j(gx)$.

When the draws are i.i.d. within each $p_j\,d\mu$,
\[ \hat\mu_G = E[\hat\mu \,|\, G_X], \]
i.e., the Rao-Blackwellization of $\hat\mu$ given the orbit.

Consequently,
\[ \hat c^G_j = \int_\Gamma q_j(x)\, \hat\mu_G(dx) = E[\hat c_j \,|\, G_X]. \]
Using Groups to model trade-off

If $G_1 \supseteq G_2$, then
\[ \mathrm{Var}\big(\hat{\vec c}^{\,G_1}\big) \le \mathrm{Var}\big(\hat{\vec c}^{\,G_2}\big). \]

The statistical efficiency increases with the size of $G_i$, but so does the computational cost needed for function evaluation (but not for sampling, because there are no additional samples involved).
Linear submodel: stratified sampling (Tan 2004)

Data: $\{X_{ij},\ 1 \le i \le n_j\} \stackrel{i.i.d.}{\sim} p_j(x)\,\mu(dx)$, $1 \le j \le J$.

The sub-model has parameter space
\[ \Big\{\mu : \int_\Gamma p_j(x)\,\mu(dx),\ 1 \le j \le J, \text{ are equal (to 1)}\Big\}. \]

Likelihood for $\mu$: $L(\mu) = \prod_{j=1}^{J} \prod_{i=1}^{n_j} p_j(X_{ij})\,\mu(X_{ij})$

The MLE is
\[ \hat\mu_{\mathrm{lin}}(dx) = \frac{\hat P(dx)}{\sum_{j=1}^{J} \hat\pi_j p_j(x)}, \]
where the $\hat\pi_j$ are MLEs from a mixture model: the data $\stackrel{i.i.d.}{\sim} \sum_{j=1}^{J} \pi_j p_j(\cdot)$ with the $\pi_j$ unknown.
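The mixture step can be sketched with EM: the strata labels are ignored, the pooled draws are treated as i.i.d. from $\sum_j \pi_j p_j$, and the weights $\pi_j$ are estimated. The two-component setup below ($p_1 = N(0,1)$, $p_2 = N(3,1)$, sampling proportions 0.25/0.75) is an illustrative assumption.

```python
import math, random

def norm_pdf(x, mu):  # unit-variance normal density
    return math.exp(-(x - mu) ** 2 / 2) / math.sqrt(2 * math.pi)

mus = [0.0, 3.0]  # known component densities p1, p2
rng = random.Random(5)
data = ([rng.gauss(mus[0], 1.0) for _ in range(2_000)] +
        [rng.gauss(mus[1], 1.0) for _ in range(6_000)])  # true proportions 0.25/0.75

pi = [0.5, 0.5]       # unknown mixture weights, to be estimated
for _ in range(100):  # EM: E-step posteriors, M-step averages them
    post = [0.0, 0.0]
    for x in data:
        w = [p * norm_pdf(x, m) for p, m in zip(pi, mus)]
        tot = sum(w)
        for j in range(2):
            post[j] += w[j] / tot
    pi = [s / len(data) for s in post]
# pi ≈ [0.25, 0.75]
```

The estimated $\hat\pi_j$ will generally differ from the design proportions $n_j/n$, and it is the $\hat\pi_j$ version that enters $\hat\mu_{\mathrm{lin}}$.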
logo
So why MLE?

Goal: to estimate c = ∫_Γ q(x) µ(dx).

For an arbitrary vector b, consider the control-variate estimator (Owen and Zhou 2000)

    ĉ_b ≡ ∑_{j=1}^{J} ∑_{i=1}^{n_j} [q(x_ji) − bᵀg(x_ji)] / ∑_{s=1}^{J} n_s p_s(x_ji),

where g = (p_2 − p_1, ..., p_J − p_1)ᵀ.

A more general class: for ∑_{j=1}^{J} λ_j(x) ≡ 1 and ∑_{j=1}^{J} λ_j(x) b_j(x) ≡ b, consider (Veach and Guibas 1995 for b_j ≡ 0; Tan 2004)

    ĉ_{λ,B} = ∑_{j=1}^{J} (1/n_j) ∑_{i=1}^{n_j} [λ_j(x_ji) q(x_ji) − b_jᵀ(x_ji) g(x_ji)] / p_j(x_ji)

Should ĉ_{λ,B} be more efficient than ĉ_b? Could there be something even more efficient?

Xiao-Li Meng (Harvard) MCMC+likelihood September 24, 2011 16 / 23
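ĉ_b is straightforward to compute once the densities are tabulated on the pooled draws. A minimal sketch, assuming each p_j can be evaluated pointwise (names are illustrative):

```python
import numpy as np

def cv_estimator(q_vals, P, n_sizes, b):
    """Owen-Zhou-style control-variate estimator over pooled draws (a sketch).

    q_vals  : (n,) values q(x_i) on the pooled draws from all J samplers.
    P       : (n, J) array with P[i, j] = p_j(x_i).
    n_sizes : (J,) design sizes n_j (so n = n_sizes.sum()).
    b       : (J-1,) coefficient vector; any b gives an unbiased estimator
              because each g_k = p_{k+1} - p_1 integrates to zero.
    """
    denom = P @ n_sizes                  # sum_s n_s p_s(x_i)
    g = P[:, 1:] - P[:, [0]]             # g = (p_2 - p_1, ..., p_J - p_1)
    return float(np.sum((q_vals - g @ b) / denom))
```

Since each component of g integrates to zero under µ, subtracting bᵀg changes the variance but not the expectation, which is why b is free to be chosen for efficiency.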
Three estimators for c = ∫_Γ q(x) µ(dx):

IS:   (1/n) ∑_{i=1}^{n} q(x_i) / ∑_{j=1}^{J} π_j p_j(x_i),
      where π_j = n_j/n are the true proportions.

Reg:  (1/n) ∑_{i=1}^{n} [q(x_i) − β̂ᵀg(x_i)] / ∑_{j=1}^{J} π_j p_j(x_i),
      where β̂ is the estimated regression coefficient, ignoring stratification.

Lik:  (1/n) ∑_{i=1}^{n} q(x_i) / ∑_{j=1}^{J} π̂_j p_j(x_i),
      where the π̂_j are the estimated proportions, ignoring stratification.

Which one is most efficient? Least efficient?

Xiao-Li Meng (Harvard) MCMC+likelihood September 24, 2011 17 / 23
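The three estimators differ only in the numerator correction and in which proportions enter the denominator, so they can share one implementation. A sketch under the same pooled-data setup; here β̂ is obtained by ordinary least squares of q/∑_j π_j p_j on the control variates, one common choice, and all names are mine:

```python
import numpy as np

def three_estimators(q_vals, P, n_sizes, pi_tilde):
    """IS / Reg / Lik estimators of c = integral of q dmu (a sketch).

    q_vals   : (n,) q(x_i);  P : (n, J) with P[i, j] = p_j(x_i);
    n_sizes  : (J,) design sizes;  pi_tilde : (J,) estimated proportions
               (e.g. mixture-model MLEs), used only by Lik.
    """
    n = q_vals.size
    pi = n_sizes / n                          # true proportions n_j / n
    denom = P @ pi
    y = q_vals / denom
    c_is = y.mean()                           # plain (multiple) IS
    # Reg: OLS of y on the control variates g_k / sum_j pi_j p_j (mean ~ 0)
    G = (P[:, 1:] - P[:, [0]]) / denom[:, None]
    X = np.column_stack([np.ones(n), G])
    beta = np.linalg.lstsq(X, y, rcond=None)[0][1:]
    c_reg = np.mean(y - G @ beta)
    # Lik: same form as IS but with estimated proportions in the denominator
    c_lik = np.mean(q_vals / (P @ pi_tilde))
    return c_is, c_reg, c_lik
```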
Let’s find it out ...

Γ = R^10 and µ is Lebesgue measure.

The integrand is

    q(x) = 0.8 ∏_{j=1}^{10} φ(x_j) + 0.2 ∏_{j=1}^{10} ψ(x_j; 4),

where φ(·) is the standard normal density and ψ(·; 4) is the t_4 density.

Two sampling designs:

(i) q_2(x) with n draws, or
(ii) q_1(x) and q_2(x) each with n/2 draws,

where

    q_1(x) = ∏_{j=1}^{10} φ(x_j),   q_2(x) = ∏_{j=1}^{10} ψ(x_j; 1),

with ψ(·; 1) the Cauchy density.

Xiao-Li Meng (Harvard) MCMC+likelihood September 24, 2011 18 / 23
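This design is easy to reproduce. Since q is a mixture of two normalized densities, the true value of ∫ q dµ is 0.8 + 0.2 = 1. A sketch of the integrand and of design (ii), assuming numpy and writing the t density explicitly to stay self-contained:

```python
import numpy as np
from math import gamma, sqrt, pi

def t_pdf(x, nu):
    """Density of Student's t with nu degrees of freedom; nu = 1 is Cauchy."""
    c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return c * (1 + x**2 / nu) ** (-(nu + 1) / 2)

def phi(x):
    """Standard normal density."""
    return np.exp(-x**2 / 2) / sqrt(2 * pi)

def q(X):
    """Integrand on R^10; X is (n, 10). True value of the integral is 1."""
    return 0.8 * np.prod(phi(X), axis=1) + 0.2 * np.prod(t_pdf(X, 4), axis=1)

def draw_design_ii(n, rng):
    """Design (ii): n/2 draws from q1 = prod phi, n/2 from q2 = prod Cauchy."""
    x1 = rng.standard_normal((n // 2, 10))
    x2 = rng.standard_t(1, size=(n // 2, 10))   # t_1 = Cauchy
    return np.vstack([x1, x2])
```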
A little surprise?

Table: Comparison of design and estimator

                  one sampler               two samplers
             IS      Reg      Lik      IS       Reg      Lik
  Sqrt MSE  .162   .00942   .00931   .0175    .00881   .00881
  Std Err   .162   .00919   .00920   .0174    .00885   .00884

Note: Sqrt MSE is √(mean squared error of the point estimates) and Std Err is √(mean of the variance estimates), from 10000 repeated simulations of size n = 500.

Xiao-Li Meng (Harvard) MCMC+likelihood September 24, 2011 19 / 23
Comparison of efficiency:

Statistical efficiency: IS < Reg ≈ Lik

IS is a stratified estimator, which uses only the labels.

Reg is the conventional method of control variates.

Lik is the constrained MLE, which uses the p_j's but ignores the labels; it is exact if q = p_j for any particular j.

Xiao-Li Meng (Harvard) MCMC+likelihood September 24, 2011 20 / 23
Building intuition ...

Suppose we make n = 2 draws, one from N(0, 1) and one from Cauchy(0, 1), hence π_1 = π_2 = 50%.

Suppose the draws are {1, 1}; what would be the MLE (π̂_1, π̂_2)?

Suppose the draws are {1, 3}; what would be the MLE (π̂_1, π̂_2)?

Suppose the draws are {3, 3}; what would be the MLE (π̂_1, π̂_2)?

Xiao-Li Meng (Harvard) MCMC+likelihood September 24, 2011 21 / 23
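A quick numerical way to work through these questions is to maximize the label-free likelihood π ↦ ∏_i [π φ(x_i) + (1 − π) Cauchy(x_i)] over a grid. This check is my addition, not from the slides, and the grid search is just the simplest thing that works for a scalar parameter:

```python
import numpy as np

def phi(x):
    """Standard normal density."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def cauchy(x):
    """Cauchy(0, 1) density, i.e. t_1."""
    return 1.0 / (np.pi * (1 + x**2))

def pi1_mle(draws, grid_size=100001):
    """Grid-search MLE of pi_1 (the N(0,1) weight) in the label-free
    mixture likelihood prod_i [pi_1 phi(x_i) + (1 - pi_1) cauchy(x_i)]."""
    x = np.asarray(draws, dtype=float)
    grid = np.linspace(0.0, 1.0, grid_size)
    mix = grid[:, None] * phi(x) + (1 - grid)[:, None] * cauchy(x)
    return grid[np.argmax(np.log(mix).sum(axis=1))]
```

The point of the exercise: the MLE ignores the known 50/50 design and infers the proportions from where the draws actually fell, which is exactly the "model what we ignore, not what we know" idea.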
What Did I Learn?

Model what we ignore, not what we know!

Model comparison/selection is not about which model is true (as all of them are “true”), but which model represents a better compromise among human, computational, and statistical efficiency.

There is a cure for our “schizophrenia” — we can now analyze Monte Carlo data using the same sound statistical principles and methods for analyzing real data.

Xiao-Li Meng (Harvard) MCMC+likelihood September 24, 2011 22 / 23
If you are looking for theoretical research topics ...

RE-EXAMINE OLD ONES AND DERIVE NEW ONES!

Prove it is the MLE, or a good approximation to the MLE. Or derive the MLE, or a cost-effective approximation to it.

Markov chain Monte Carlo (Tan 2006, 2008)

More ......

Xiao-Li Meng (Harvard) MCMC+likelihood September 24, 2011 23 / 23