06/14/22 MCMC and Statistics
Multiple-Try Metropolis
Jun Liu, Department of Statistics, Stanford University
Based on joint work with F. Liang and W.H. Wong


Page 1

Jun Liu
Department of Statistics

Stanford University

Based on joint work with F. Liang and W.H. Wong.

Page 2

The Basic Problems of Monte Carlo

• Draw a random variable X ~ π(x).

• Estimate the integral

  I = ∫ f(x) π(x) dx = E_π[f(X)].

Sometimes π is known only up to an unknown normalizing constant.
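These two basic tasks can be made concrete with a minimal plain Monte Carlo sketch; the function names and the N(0,1) target with f(x) = x² are illustrative choices, not from the slides:

```python
import random

def mc_estimate(f, sampler, n=100_000):
    """Approximate I = E_pi[f(X)] by the average of f over n draws from pi."""
    return sum(f(sampler()) for _ in range(n)) / n

# Illustrative choice: pi = N(0,1) (easy to draw from directly), f(x) = x^2,
# so the true integral is I = E[X^2] = 1.
random.seed(0)
I_hat = mc_estimate(lambda x: x * x, lambda: random.gauss(0.0, 1.0))
```

When π is known only up to a normalizing constant, direct draws like this are unavailable, which is what motivates the rejection and Markov chain methods on the following slides.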

Page 3

How to Sample from π(x)

• The Inversion Method. If U ~ Unif(0,1), then X = F⁻¹(U) ~ π, where F is the cdf of π.

• The Rejection Method, with an "envelope" distribution g chosen so that c·g(x) ≥ π(x) for all x:
  – Generate x from g(x);
  – Draw u from Unif(0,1);
  – Accept x if u < π(x)/(c·g(x));
  – The accepted x follows π(x).
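The rejection steps above can be sketched as follows; the Beta(2,2) target and the Unif(0,1) envelope with c = 1.5 are illustrative choices, not from the slides:

```python
import random

def rejection_sample(target, envelope_sample, envelope_pdf, c, rng):
    """Draw one sample from `target` using an envelope with c*g(x) >= target(x)."""
    while True:
        x = envelope_sample(rng)        # generate x from g(x)
        u = rng.random()                # draw u from Unif(0,1)
        if u < target(x) / (c * envelope_pdf(x)):
            return x                    # the accepted x follows target(x)

# Target: Beta(2,2) density 6x(1-x) on [0,1]; envelope g = Unif(0,1) with
# c = 1.5 (the target's maximum), so c*g(x) >= target(x) everywhere.
rng = random.Random(1)
draws = [rejection_sample(lambda x: 6 * x * (1 - x), lambda r: r.random(),
                          lambda x: 1.0, 1.5, rng) for _ in range(20_000)]
mean = sum(draws) / len(draws)          # Beta(2,2) has mean 0.5
```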

Page 4

High Dimensional Problems?

Ising Model

• X = (x_σ, σ ∈ Λ), where Λ is a set of lattice points.
• π(X) = Z⁻¹ exp{−Eng(X)}, where Z is the partition function and the energy is

  Eng(X) ∝ −(1/T) Σ_{σ~σ'} x_σ x_σ'  (sum over neighboring pairs).

Metropolis Algorithm:
(a) pick a lattice point, say σ, at random;
(b) change the current x_σ to 1 − x_σ (so X(t) → X*);
(c) compute r = π(X*)/π(X(t));
(d) make the acceptance/rejection decision: accept X* with probability min{1, r}.
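A minimal sketch of steps (a)-(d) for a small periodic lattice, using ±1 spins with the flip x_σ → −x_σ (equivalent to the {0,1} coding with x → 1 − x on the slide); the lattice size, β = 1/T, and sweep count are illustrative:

```python
import math, random

def ising_metropolis(n, beta, sweeps, rng):
    """Metropolis sampling of an n x n Ising model with energy
    Eng(X) = -beta * sum over neighboring pairs of x_s * x_s' (periodic boundary)."""
    x = [[rng.choice([-1, 1]) for _ in range(n)] for _ in range(n)]
    for _ in range(sweeps * n * n):
        i, j = rng.randrange(n), rng.randrange(n)            # (a) random lattice point
        nb = (x[(i + 1) % n][j] + x[(i - 1) % n][j] +
              x[i][(j + 1) % n] + x[i][(j - 1) % n])
        # (b)-(c): flipping x[i][j] changes the energy by delta, so
        # r = pi(X*) / pi(X(t)) = exp(-delta)
        delta = 2.0 * beta * x[i][j] * nb
        # (d) accept the flip with probability min(1, r)
        if delta <= 0 or rng.random() < math.exp(-delta):
            x[i][j] = -x[i][j]
    return x

rng = random.Random(2)
x = ising_metropolis(8, 0.6, 200, rng)
mag = abs(sum(s for row in x for s in row)) / 64.0  # |magnetization| per spin
```

At β = 0.6 (above the critical coupling of the 2-D Ising model) the chain tends toward ordered configurations, which previews the local-trapping issue discussed a few slides later.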

Page 5

General Metropolis-Hastings Recipe

• Start with any X(0) = x₀ and a "proposal chain" T(x, y).

• Suppose X(t) = x_t. At time t+1,
  – Draw y ~ T(x_t, y) (i.e., propose a move for the next step).
  – Compute the Metropolis ratio (or "goodness" ratio)

    r = [π(y) T(y, x_t)] / [π(x_t) T(x_t, y)].

  – Acceptance/rejection decision ("thinning down"): let

    X(t+1) = y with probability p = min{1, r}, and X(t+1) = x_t with probability 1 − p.
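The recipe translates almost line by line into code; the N(0,1) target and the symmetric Unif(−1,1) random-walk proposal are illustrative choices:

```python
import math, random

def metropolis_hastings(log_pi, propose, log_T, x0, n, rng):
    """General M-H: draw y ~ T(x, .), accept with probability
    min(1, pi(y) T(y, x) / (pi(x) T(x, y))), working on the log scale."""
    x, chain = x0, []
    for _ in range(n):
        y = propose(x, rng)
        log_r = (log_pi(y) + log_T(y, x)) - (log_pi(x) + log_T(x, y))
        if rng.random() < math.exp(min(0.0, log_r)):
            x = y                        # accept: X(t+1) = y
        chain.append(x)                  # otherwise X(t+1) = x_t
    return chain

# Illustrative target pi = N(0,1), symmetric random-walk proposal
# y = x + Unif(-1,1), for which log T(x,y) = log T(y,x) cancels.
rng = random.Random(3)
chain = metropolis_hastings(lambda v: -0.5 * v * v,
                            lambda v, r: v + r.uniform(-1.0, 1.0),
                            lambda a, b: 0.0, 0.0, 50_000, rng)
mean = sum(chain) / len(chain)
```

Only the ratio of target values enters, so the unknown normalizing constant from slide 2 never needs to be computed.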

Page 6

• The detailed balance:

  π(x) · [T(x,y) min{1, π(y)T(y,x) / (π(x)T(x,y))}]
    = min{π(x)T(x,y), π(y)T(y,x)}
    = π(y) · [T(y,x) min{1, π(x)T(x,y) / (π(y)T(y,x))}]

The bracketed factor on the left is the actual transition probability from x to y (where x ≠ y); the bracketed factor on the right is the transition probability from y to x.
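Detailed balance can be checked numerically on a small discrete state space; the 3-state π and proposal matrix T below are arbitrary illustrative choices:

```python
# Build the M-H kernel A(x,y) = T(x,y) * min{1, pi(y)T(y,x) / (pi(x)T(x,y))}
# on a 3-state space and verify pi(x)A(x,y) = pi(y)A(y,x) and pi A = pi.
pi = [0.2, 0.3, 0.5]
T = [[0.1, 0.4, 0.5],
     [0.3, 0.3, 0.4],
     [0.2, 0.6, 0.2]]      # an arbitrary proposal chain (rows sum to 1)

n = 3
A = [[0.0] * n for _ in range(n)]
for x in range(n):
    for y in range(n):
        if x != y:
            A[x][y] = T[x][y] * min(1.0, (pi[y] * T[y][x]) / (pi[x] * T[x][y]))
    A[x][x] = 1.0 - sum(A[x][y] for y in range(n) if y != x)   # rejection mass

balance_gap = max(abs(pi[x] * A[x][y] - pi[y] * A[y][x])
                  for x in range(n) for y in range(n))
invariance_gap = max(abs(sum(pi[x] * A[x][y] for x in range(n)) - pi[y])
                     for y in range(n))
```

Both gaps are zero up to floating-point error: detailed balance holds by construction, and summing it over x gives invariance.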

Page 7

General Markov Chain Simulation

• Question: how to simulate from a target distribution π(x) via a Markov chain?

• Key: find a transition function A(x, y) so that

  f₀Aⁿ → π,

  that is, π is an invariant distribution of A.
• This is different from traditional Markov chain theory: there the chain is given and we study its invariant distribution; here π is given and we must construct A.
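The condition f₀Aⁿ → π can be seen directly by power iteration on a small transition matrix; the 3-state chain below is an illustrative example whose invariant distribution is π = (0.25, 0.5, 0.25):

```python
# Iterate f <- f A starting from a point mass; f converges to the
# invariant distribution pi of A.
A = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

f = [1.0, 0.0, 0.0]                       # f0: an arbitrary starting distribution
for _ in range(200):                      # compute f0 A^n for n = 200
    f = [sum(f[i] * A[i][j] for i in range(3)) for j in range(3)]

pi = [0.25, 0.50, 0.25]                   # satisfies pi A = pi
gap = max(abs(f[j] - pi[j]) for j in range(3))
```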

Page 8

If the actual transition probability is

  A(x, y) = π(y) δ(x, y), where δ(x, y) is a symmetric function of x and y,

then the chain has π(x) as its invariant distribution. For Metropolis-Hastings,

  A(x, y) = T(x, y) min{1, π(y)T(y,x) / (π(x)T(x,y))} = π(y) min{T(x,y)/π(y), T(y,x)/π(x)},

and the min{·} factor is symmetric in x and y.

I learned this from Stein.

Page 9

• The moves are very "local."
• The chain tends to be trapped in a local mode.

Example: a mixture of two well-separated normal densities, such as

  π(x) = ½ N(x; 0, 1) + ½ N(x; 5, 0.25).

Page 10

Other Approaches?

• Gibbs sampler/heat bath: better or worse?
• Random directional search: should be better if we can do it ("hit-and-run").
• Adaptive directional sampling (ADS) (Gilks, Roberts and George, 1994): run multiple chains; at iteration t, a current point x_c is updated along a direction determined by an anchor x_a from the population.

(Figure: multiple chains at iteration t, with current point x_c and anchor point x_a.)

Page 11

Gibbs Sampler/Heat Bath

• Define a "neighborhood" structure N(x)
  – can be a line, a subspace, the trace of a group, etc.
• Sample from the conditional distribution.
• Conditional move along a chosen direction:

  x_new ~ p(x | x ∈ N(x_old))
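For a concrete conditional move, here is a two-component Gibbs sampler where each conditional distribution is available in closed form; the bivariate normal target with correlation ρ = 0.8 is an illustrative choice:

```python
import math, random

def gibbs_bivariate_normal(rho, n, rng):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Each conditional move draws from the exact conditional distribution:
    x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2 | x1."""
    x1 = x2 = 0.0
    sd = math.sqrt(1.0 - rho * rho)
    out = []
    for _ in range(n):
        x1 = rng.gauss(rho * x2, sd)    # conditional move along the x1 line
        x2 = rng.gauss(rho * x1, sd)    # conditional move along the x2 line
        out.append((x1, x2))
    return out

rng = random.Random(4)
draws = gibbs_bivariate_normal(0.8, 40_000, rng)
corr_hat = sum(a * b for a, b in draws) / len(draws)   # estimates E[x1*x2] = rho
```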

Page 12

How to sample along a line?

• What is the correct conditional distribution?
  – Random direction: p(t) ∝ π(x + t·r), giving the move x → x' = x + t·r.
  – Directions chosen a priori: the same as above.
  – In ADS? The move is x_c → x_c' along the direction r through x_c and the anchor x_a, and the correct density picks up a Jacobian factor:

    p(t) ∝ |t|^{d−1} π(x_a + t·r).

Page 13

The Snooker Theorem

• Suppose x ~ π and y is any point in the d-dimensional space. Let r = (x − y)/|x − y|. If t is drawn from

  p(t) ∝ |t|^{d−1} π(y + t·r),

then x' = y + t·r follows the target distribution π.

(Figure: the anchor y, the ray through x, and the density p(t) along it.)

If y is generated from a distribution, the new point x' is independent of y.
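The theorem can be checked numerically by drawing t from p(t) ∝ |t|^{d−1} π(y + t·r) on a fine grid (a "griddy" approximation); the N(0, I₂) target, the anchor y = (3, 0), and the grid bounds are illustrative choices:

```python
import math, random

def snooker_move(x, y, d, log_pi, rng, lo=-15.0, hi=15.0, m=3000):
    """One snooker move: r = (x - y)/|x - y|, draw t approximately from
    p(t) prop. to |t|^(d-1) * pi(y + t*r) on a grid, return x' = y + t*r."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    r = [(a - b) / dist for a, b in zip(x, y)]
    ts = [lo + (hi - lo) * i / m for i in range(m + 1)]
    logw = [(d - 1) * math.log(abs(t) + 1e-300) +
            log_pi([b + t * c for b, c in zip(y, r)]) for t in ts]
    mx = max(logw)
    t = rng.choices(ts, weights=[math.exp(v - mx) for v in logw], k=1)[0]
    return [b + t * c for b, c in zip(y, r)]

# If x ~ N(0, I_2) and the anchor is y = (3, 0), the move should leave the
# target invariant: the output points should again look like N(0, I_2).
rng = random.Random(5)
log_pi = lambda v: -0.5 * sum(c * c for c in v)
xs = [snooker_move([rng.gauss(0, 1), rng.gauss(0, 1)], [3.0, 0.0], 2, log_pi, rng)
      for _ in range(4000)]
mean0 = sum(v[0] for v in xs) / len(xs)
var0 = sum(v[0] ** 2 for v in xs) / len(xs)
```

Without the |t|^{d−1} Jacobian factor the output points would be biased toward the anchor, which is exactly the ADS subtlety from the previous slide.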

Page 14

Connection with transformation groups

• WLOG, we let y = 0.
• The move is now x → x' = t·x. The set {t : t ≠ 0} forms a transformation group (scaling).
• Liu and Wu (1999) show that if t is drawn from

  p(t) ∝ π(t·x) |J_t(x)| H(dt),

  where J_t is the Jacobian of the transformation and H is the left-Haar measure of the group, then the move is invariant with respect to π.

Page 15

Another Hurdle

• How to draw from something like

  p(t) ∝ |t|^{d−1} π(y + t·r)?

• Adaptive rejection? Approximation? Griddy Gibbs?
• M-H independence sampler (Hastings, 1970)
  – need to draw from something that is close enough to p(t).

Page 16

• Propose bigger jumps: they may be rejected too often.
• Use proposals with mixed step sizes.
• Try multiple times and select good one(s) (the "bridging effect," Frenkel & Smit, 1996).
• Is it still a valid MCMC algorithm?

Page 17

The current state is x.

• Draw y₁, …, y_k from the proposal T(x, y). (These can be dependent.)
• Select Y = y_j with probability proportional to π(y_j)T(y_j, x).
• Draw x*₁, …, x*_{k−1} from T(Y, ·). Let x*_k = x.
• Accept the proposed y_j with probability

  p = min{1, [π(y₁)T(y₁, x) + ⋯ + π(y_k)T(y_k, x)] / [π(x*₁)T(x*₁, Y) + ⋯ + π(x*_k)T(x*_k, Y)]}
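The four steps above assemble into a single MTM transition; the N(0,1) target, the wide uniform random-walk proposal, and k = 5 tries are illustrative choices:

```python
import math, random

def mtm_step(x, log_pi, propose, T_pdf, k, rng):
    """One Multiple-Try Metropolis step with weights w(y, x) = pi(y) * T(y, x)."""
    ys = [propose(x, rng) for _ in range(k)]                  # k trial proposals
    w = [math.exp(log_pi(v)) * T_pdf(v, x) for v in ys]
    y = rng.choices(ys, weights=w, k=1)[0]                    # select Y prop. to w
    refs = [propose(y, rng) for _ in range(k - 1)] + [x]      # x*_1..x*_{k-1}, x*_k = x
    w_ref = [math.exp(log_pi(v)) * T_pdf(v, y) for v in refs]
    p = min(1.0, sum(w) / sum(w_ref))                         # generalized M-H ratio
    return y if rng.random() < p else x

# Illustrative run: target N(0,1), proposal y = x + Unif(-4,4)
# (density 1/8 on |y - x| <= 4), k = 5 tries per step.
rng = random.Random(6)
x, chain = 0.0, []
for _ in range(20_000):
    x = mtm_step(x, lambda v: -0.5 * v * v,
                 lambda v, r: v + r.uniform(-4.0, 4.0),
                 lambda a, b: 0.125 if abs(a - b) <= 4.0 else 0.0, 5, rng)
    chain.append(x)
var_hat = sum(v * v for v in chain) / len(chain)   # target variance is 1
```

The step size here is far larger than a plain random-walk Metropolis sampler could tolerate; the multiple tries keep the acceptance rate usable.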

Page 18

A Modification

• If T(x, y) is symmetric, we can use a different acceptance probability:

  p = min{1, [π(y₁) + ⋯ + π(y_k)] / [π(x*₁) + ⋯ + π(x*_k)]}

Ref: Frenkel and Smit (1996)

Page 19

Random Ray Monte Carlo:

• Propose a random direction.
• Pick y from candidate points y₁, …, y₅ along the ray.
• Correct for the MTM bias.

(Figure: x with trial points y₁, …, y₅ along a random ray.)

Back to the example.

Page 20

An Interesting Twist

• One can choose multiple tries semi-deterministically, using random equal grids.

(Figure: trial points y₁, …, y₈ on a grid through x, and reference points x*₁, …, x*₈ on a grid through y.)

• Pick y from y₁, …, y₈.
• The correction rule is the same:

  p = min{1, [π(y₁) + ⋯ + π(y₈)] / [π(x*₁) + ⋯ + π(x*₈)]}

Page 21

Use Local Optimization in MCMC

• The ADS formulation is powerful, but its direction is too "random."
• How to make use of their framework?
  – Keep a population of samples S_t = {x_t^{(1)}, …, x_t^{(m)}}.
  – Randomly select one member, x_t^{(c)}, to be updated.
  – Use the rest to determine an "anchor point" y_t.
• Here we can use local optimization techniques;
• Use MTM to draw a sample along the line, with the help of the Snooker Theorem.

Page 22

(Figure: a distribution contour, with the current point x_c, the anchor point x_a, and a gradient or conjugate-gradient direction between them.)

Page 23

Numerical Examples

• An easy multimodal problem: a three-component bivariate normal mixture,

  π(x) = (1/3) N₂(0, I₂) + (1/3) N₂((4, 4)ᵀ, Σ) + (1/3) N₂((6, 6)ᵀ, Σ),

  where Σ has unit variances and correlation 0.9.

Page 24

Page 25

A More Difficult Test Example

• Mixture of 2 Gaussians:

  π(x) = (1/3) N₂(0, I₂) + (2/3) N₂((5, 5)ᵀ, I₂)

• MTM with CG (conjugate gradient) directions can sample the distribution.
• The Random-Ray method also worked well.
• The standard Metropolis cannot get across.

Page 26

Fitting a Mixture Model

• Likelihood:

  L(μ, σ, p | y₁, …, y_n) = ∏_{i=1}^n [ Σ_{j=1}^3 (p_j/σ_j) φ((y_i − μ_j)/σ_j) ],

  where φ is the standard normal density.

• Prior: uniform in all parameters (log σ_j, μ_j, p_j, j = 1, 2, 3), but with the constraints

  μ₁ ≤ μ₂ ≤ μ₃;  σ_j ≥ σ_min;  p₁ + p₂ + p₃ = 1;

  and each group has at least one data point.
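For reference, the likelihood above in code form; the toy data, the equal weights, and the parameter values are illustrative, and φ is the standard normal density:

```python
import math

def mixture_loglik(y, p, mu, sigma):
    """Log of L = prod_i sum_{j=1..3} (p_j / sigma_j) * phi((y_i - mu_j) / sigma_j)."""
    ll = 0.0
    for yi in y:
        dens = sum(pj / sj * math.exp(-0.5 * ((yi - mj) / sj) ** 2) /
                   math.sqrt(2.0 * math.pi)
                   for pj, mj, sj in zip(p, mu, sigma))
        ll += math.log(dens)
    return ll

# Toy data with three visible groups, and a parameter point that respects
# the ordering constraint mu_1 <= mu_2 <= mu_3.
y = [-5.1, -4.8, 0.2, 0.1, 4.9, 5.3]
ll_good = mixture_loglik(y, [1/3, 1/3, 1/3], [-5.0, 0.0, 5.0], [0.5, 0.5, 0.5])
ll_bad = mixture_loglik(y, [1/3, 1/3, 1/3], [0.0, 0.0, 0.0], [0.5, 0.5, 0.5])
```

This log-likelihood, plus the flat prior with constraints, is the target density the MTM samplers explore.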

Page 27

Page 28

Bayesian Neural Network Training

• Setting: Data = {(y₁, x₁), (y₂, x₂), …, (y_n, x_n)}.
  Nonlinear curve fitting: y_t = f(x_t) + ε_t.

• 1-hidden-layer feed-forward NN model:

  f̂(x_t) = Σ_{j=1}^M β_j ψ(α_jᵀ x_t)

  (Figure: network diagram with inputs x₁, …, x_p, hidden units h₁, …, h_M, and output y.)

• Objective function for optimization (the posterior):

  P(α, β, σ² | Data) ∝ ∏_t N(y_t | f̂(x_t), σ²) · N(α | 0, σ_α²) N(β | 0, σ_β²) g(σ²)
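A minimal forward pass for the model f̂(x) = Σ β_j ψ(α_jᵀ x), taking ψ = tanh (the activation used in the example later in the deck); the weight values, M = 2, and p = 1 are illustrative:

```python
import math

def nn_predict(x, alpha, beta):
    """1-hidden-layer feed-forward net: f_hat(x) = sum_j beta_j * tanh(alpha_j . x)."""
    return sum(bj * math.tanh(sum(a * xi for a, xi in zip(aj, x)))
               for aj, bj in zip(alpha, beta))

# Illustrative weights: M = 2 hidden units, p = 1 input dimension.
alpha = [[1.5], [-0.7]]
beta = [2.0, 1.0]
y_hat = nn_predict([0.3], alpha, beta)
```

The posterior above is a function of all M·p + M weights plus σ², and its multimodality (hidden units can be permuted or sign-flipped) is what makes this a hard MCMC problem.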

Page 29

Liang and Wong (1999) proposed a method that combines the snooker theorem, MTM, exchange MC, and genetic algorithm.

Activation function: tanh(z); number of hidden units M = 2.

Page 30