Multiple-Try Metropolis
04/22/23 MCMC and Statistics 1
Jun Liu
Department of Statistics
Stanford University
Based on the joint work with F. Liang and W.H. Wong.
The Basic Problems of Monte Carlo

• Draw a random variable X ~ π(x)
• Estimate the integral

    I = ∫ f(x) π(x) dx = E_π[ f(X) ]

Sometimes π is known only up to a normalizing constant.
How to Sample from π(x)

• The Inversion Method. If U ~ Unif(0,1), then

    X = F⁻¹(U) ~ π,  where F is the cdf of π.

• The Rejection Method. Find an “envelope” distribution g and a
  constant c with π(x) ≤ c·g(x) for all x; then
  – Generate x from g(x);
  – Draw u from Unif(0,1);
  – Accept x if u < π(x)/(c·g(x));
  – The accepted x follows π(x).

[Figure: the density π(x) lying entirely under its envelope c·g(x).]
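The rejection step can be sketched in a few lines of Python. The Beta(2,2) target, the Unif(0,1) envelope, and the constant c = 1.5 are illustrative choices, not examples from the talk:

```python
import random

def target_pdf(x):
    # Beta(2,2) density: pi(x) = 6x(1-x) on [0,1]
    return 6.0 * x * (1.0 - x)

C = 1.5   # envelope constant: pi(x) <= C * g(x) with g = Unif(0,1)

def rejection_sample(n):
    samples = []
    while len(samples) < n:
        x = random.random()                 # generate x from g
        u = random.random()                 # draw u from Unif(0,1)
        if u < target_pdf(x) / (C * 1.0):   # accept if u < pi(x)/(C g(x))
            samples.append(x)
    return samples

random.seed(0)
draws = rejection_sample(10000)
print(round(sum(draws) / len(draws), 2))    # Beta(2,2) has mean 0.5
```

The expected acceptance rate is 1/c, so the tighter the envelope, the fewer wasted proposals.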
High Dimensional Problems?

Ising model: X = (x₁, …, x_d) on the lattice points, with each
x_i ∈ {0, 1} and

    π(X) = Z⁻¹ exp{ −Eng(X) },

where Z is the partition function and the energy couples neighboring
sites, Eng(X) ~ −(1/T) Σ_{i~j} x_i x_j.

Metropolis Algorithm:
(a) pick a lattice point, say i, at random
(b) change the current x_i to 1 − x_i (so X(t) → X*)
(c) compute r = π(X*) / π(X(t))
(d) make the acceptance/rejection decision.
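A minimal sketch of steps (a)–(d) on a small 2-D lattice, assuming the nearest-neighbour energy Eng(x) = −(1/T) Σ_{i~j} x_i x_j with 0/1 spins (the slide only indicates this form; the lattice size and temperature are illustrative):

```python
import math, random

L, T = 8, 2.0   # lattice size and "temperature" in Eng(x)

def energy(x):
    # Eng(x) = -(1/T) * sum over nearest-neighbour pairs of x_i * x_j
    e = 0.0
    for i in range(L):
        for j in range(L):
            e -= x[i][j] * (x[(i + 1) % L][j] + x[i][(j + 1) % L]) / T
    return e

def metropolis_sweep(x):
    for _ in range(L * L):
        i, j = random.randrange(L), random.randrange(L)   # (a) pick a site
        old = energy(x)                  # global recompute: simple, not fast
        x[i][j] = 1 - x[i][j]            # (b) flip the chosen site
        r = math.exp(old - energy(x))    # (c) r = pi(X*) / pi(X(t))
        if random.random() >= min(1.0, r):
            x[i][j] = 1 - x[i][j]        # (d) reject: undo the flip
    return x

random.seed(1)
state = [[random.randint(0, 1) for _ in range(L)] for _ in range(L)]
for _ in range(20):
    state = metropolis_sweep(state)
print(sum(map(sum, state)))   # number of occupied (x_i = 1) sites
```

A practical implementation would only recompute the local energy change of the flipped site rather than the full sum.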
General Metropolis-Hastings Recipe

• Start with any X(0) = x₀, and a “proposal chain” T(x, y)
• Suppose X(t) = x_t. At time t+1,
  – Draw y ~ T(x_t, y) (i.e., propose a move for the next step)
  – Compute the Metropolis ratio (or “goodness” ratio)

        r = [π(y) T(y, x_t)] / [π(x_t) T(x_t, y)]

  – Acceptance/rejection decision: let p = min{1, r}, and set

        X(t+1) = y    with probability p,
        X(t+1) = x_t  with probability 1 − p   (“thinning down”).
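The recipe can be written as a short Python loop. The N(0,1) target and the Gaussian random-walk proposal are illustrative choices; since this T is symmetric, the T terms cancel in r:

```python
import math, random

def log_pi(x):
    return -0.5 * x * x   # log of the N(0,1) density, up to a constant

def mh_chain(n, s=1.0, x0=0.0):
    x, out = x0, []
    for _ in range(n):
        y = x + random.gauss(0.0, s)      # propose y ~ T(x, .)
        log_r = log_pi(y) - log_pi(x)     # log r; the symmetric T cancels
        if math.log(random.random()) < log_r:
            x = y                         # accept: move to y
        out.append(x)                     # on rejection, stay at x
    return out

random.seed(2)
chain = mh_chain(50000)
m = sum(chain) / len(chain)
v = sum((c - m) ** 2 for c in chain) / len(chain)
print(round(m, 2), round(v, 2))   # mean and variance near 0 and 1
```

Working on the log scale avoids underflow when π(x) is tiny far from the mode.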
• The detailed balance:

    π(x) T(x, y) min{ 1, [π(y) T(y, x)] / [π(x) T(x, y)] }
        = min{ π(x) T(x, y), π(y) T(y, x) }
        = π(y) T(y, x) min{ 1, [π(x) T(x, y)] / [π(y) T(y, x)] }

  Here T(x, y) min{1, ⋯} is the actual transition probability from x
  to y (for x ≠ y), and T(y, x) min{1, ⋯} is the transition
  probability from y to x, so the identity reads
  π(x) A(x, y) = π(y) A(y, x).
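The identity can be verified numerically on a small finite state space. The 3-state target and uniform proposal below are illustrative; the code builds the full Metropolis-Hastings transition matrix A and checks both detailed balance and invariance:

```python
# Build A(x,y) = T(x,y) * min{1, pi(y)T(y,x) / (pi(x)T(x,y))} for x != y,
# put the rejected mass on the diagonal, then check
# pi(x) A(x,y) = pi(y) A(y,x) and pi A = pi.
pi = [0.2, 0.3, 0.5]
T = [[1 / 3] * 3 for _ in range(3)]   # uniform (symmetric) proposal

A = [[0.0] * 3 for _ in range(3)]
for x in range(3):
    for y in range(3):
        if x != y:
            A[x][y] = T[x][y] * min(1.0, (pi[y] * T[y][x]) / (pi[x] * T[x][y]))
    A[x][x] = 1.0 - sum(A[x])         # rejection mass stays at x

for x in range(3):
    for y in range(3):
        assert abs(pi[x] * A[x][y] - pi[y] * A[y][x]) < 1e-12

piA = [sum(pi[x] * A[x][y] for x in range(3)) for y in range(3)]
print([round(p, 6) for p in piA])     # → [0.2, 0.3, 0.5]
```

Detailed balance implies invariance (sum both sides of the identity over x), which is exactly what the last check confirms.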
General Markov Chain Simulation

• Question: how to simulate from a target distribution π(X) via a
  Markov chain?
• Key: find a transition function A(X, Y) so that f₀ Aⁿ → π;
  that is, π is an invariant distribution of A.
• Different from traditional Markov chain theory.
If the actual transition probability can be written as

    A(x, y) = π(y) δ(x, y),   x ≠ y,

where δ(x, y) is a symmetric function of x and y, then the chain has
π(x) as its invariant distribution. Metropolis-Hastings is of this
form:

    A(x, y) = T(x, y) min{ 1, [π(y) T(y, x)] / [π(x) T(x, y)] }
            = π(y) min{ T(x, y)/π(y), T(y, x)/π(x) },

and the second factor is symmetric in x and y.

“I learnt it from Stein.”
• The moves are very “local”
• Tend to be trapped in a local mode.

Example: π(x) is a mixture of two narrow, well-separated normal
densities (components centered near 0 and near 5), so a local
random-walk proposal almost never crosses the valley between the
two modes.
Other Approaches?

• Gibbs sampler/Heat Bath: better or worse?
• Random directional search --- should be better if we can do it.
  “Hit-and-run.”
• Adaptive directional sampling (ADS) (Gilks, Roberts and George,
  1994).

[Figure: multiple chains at iteration t; a current point x_c and an
anchor point x_a define a sampling direction.]
Gibbs Sampler/Heat Bath

• Define a “neighborhood” structure N(x)
  – can be a line, a subspace, the trace of a group, etc.
• Sample from the conditional distribution.
• Conditional move along a chosen direction:

    x_new ~ π(x | x ∈ N(x_old))
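A sketch of the conditional move when N(x) is a coordinate line. The target here, a bivariate normal with correlation ρ = 0.9 whose full conditionals are one-dimensional normals, is an illustrative choice, not an example from the talk:

```python
import random

rho = 0.9
s = (1 - rho ** 2) ** 0.5   # conditional standard deviation

def gibbs(n, x=0.0, y=0.0):
    # Alternate the two conditional moves x ~ pi(x|y) and y ~ pi(y|x).
    out = []
    for _ in range(n):
        x = random.gauss(rho * y, s)   # draw x | y ~ N(rho*y, 1-rho^2)
        y = random.gauss(rho * x, s)   # draw y | x ~ N(rho*x, 1-rho^2)
        out.append((x, y))
    return out

random.seed(5)
draws = gibbs(50000)
mx = sum(p[0] for p in draws) / len(draws)
cxy = sum(p[0] * p[1] for p in draws) / len(draws)
print(round(mx, 2), round(cxy, 2))   # mean near 0, E[xy] near rho
```

The strong correlation makes the coordinate-wise moves small, which is exactly the weakness the directional methods on this slide try to address.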
How to sample along a line?

• What is the correct conditional distribution?
  – Random direction r:  p(t) ∝ π(x + t·r)
  – Directions chosen a priori: the same as above
  – In ADS? With anchor x_a and direction r = x_c − x_a, the new
    point is x_c' = x_a + t·r, and t must be drawn from

        p(t) ∝ |t|^{d−1} π(x_a + t·r)
The Snooker Theorem

• Suppose x ~ π and y is any point in the d-dim space. Let
  r = (x − y)/|x − y|. If t is drawn from

      p(t) ∝ |t|^{d−1} π(y + t·r),

  then

      x' = y + t·r

  follows the target distribution π.

[Figure: the anchor y, the current point x ~ π, and the profile p(t)
along the line through them.]

If y is generated from the distribution π, the new point x' is
independent of y.
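A sketch of a single snooker move, drawing t from p(t) ∝ |t|^{d−1} π(y + t·r) by a crude grid approximation (a stand-in for the exact 1-D draw the theorem requires). The bivariate normal target, the anchor y = (5, 5), and the grid limits are all illustrative assumptions:

```python
import math, random

d = 2

def pi(x):
    # standard bivariate normal density, up to a constant
    return math.exp(-0.5 * sum(v * v for v in x))

def snooker_move(x, y):
    norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    r = [(a - b) / norm for a, b in zip(x, y)]          # r = (x - y)/|x - y|
    ts = [0.05 * i for i in range(-300, 301) if i]      # grid for t, t != 0
    w = [abs(t) ** (d - 1) * pi([b + t * ri for b, ri in zip(y, r)]) for t in ts]
    t = random.choices(ts, weights=w)[0]                # t ~ p(t), on the grid
    return [b + t * ri for b, ri in zip(y, r)]          # x' = y + t*r

random.seed(3)
xs = []
for _ in range(2000):
    x = [random.gauss(0, 1), random.gauss(0, 1)]        # x ~ pi, as required
    xs.append(snooker_move(x, [5.0, 5.0]))              # arbitrary anchor y
m = sum(p[0] for p in xs) / len(xs)
print(round(m, 1))   # x' again follows pi, so the mean stays near 0
```

Note the |t|^{d−1} factor: without it the move would oversample points near the anchor and π would no longer be preserved.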
Connection with transformation groups

• WLOG, we let y = 0.
• The move is now: x → x' = t·x.
  The set {t : t ≠ 0} forms a transformation group.
• Liu and Wu (1999) show that if t is drawn from

      p(t) ∝ π(t·x) |J_t(x)| H(dt),

  where J_t is the Jacobian of the transformation and H is the Haar
  measure of the group, then the move is invariant with respect to π.
Another Hurdle

• How to draw from something like

      p(t) ∝ |t|^{d−1} π(y + t·r)?

• Adaptive rejection? Approximation? Griddy Gibbs?
• M-H Independence Sampler (Hastings, 1970)
  – need to draw from something that is close enough to p(t).
• Propose bigger jumps --- may be rejected too often.
• Proposals with mixed step sizes.
• Try multiple times and select good one(s) (the “bridging effect”;
  Frenkel & Smit, 1996).
• Is it still a valid MCMC algorithm?
Multiple-Try Metropolis (MTM)

Currently at x:
• Draw y₁, …, y_k from the proposal T(x, y) (these can be dependent).
• Select Y = y_j with probability ∝ π(y_j) T(y_j, x).
• Draw x*₁, …, x*_{k−1} from T(Y, x), and let x*_k = x.
• Accept the proposed y_j with probability

    p = min{ 1, [π(y₁)T(y₁, x) + ⋯ + π(y_k)T(y_k, x)]
               / [π(x*₁)T(x*₁, y_j) + ⋯ + π(x*_k)T(x*_k, y_j)] }
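The multiple-try step above can be sketched directly. The N(0,1) target, the symmetric Gaussian proposal, and k = 5 tries are illustrative choices:

```python
import math, random

def pi(x):
    return math.exp(-0.5 * x * x)   # N(0,1) target, up to a constant

def T_density(x, y, s=2.0):
    # density of the proposal T(x, y) = N(y; x, s^2), up to a constant
    return math.exp(-0.5 * ((y - x) / s) ** 2)

def mtm_step(x, k=5, s=2.0):
    ys = [x + random.gauss(0.0, s) for _ in range(k)]     # k trial points
    wy = [pi(y) * T_density(y, x, s) for y in ys]         # weights pi(y)T(y,x)
    y = random.choices(ys, weights=wy)[0]                 # select Y = y_j
    refs = [y + random.gauss(0.0, s) for _ in range(k - 1)] + [x]  # x*_k = x
    wx = [pi(z) * T_density(z, y, s) for z in refs]
    r = sum(wy) / sum(wx)                                 # generalized ratio
    return y if random.random() < min(1.0, r) else x

random.seed(4)
x, chain = 0.0, []
for _ in range(20000):
    x = mtm_step(x)
    chain.append(x)
m = sum(chain) / len(chain)
print(round(m, 2))   # mean near 0
```

Because the normalizing constants of π and T appear in both sums, they cancel in the ratio, so unnormalized densities suffice.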
A Modification

• If T(x, y) is symmetric, we can use a simpler rejection probability:

    p = min{ 1, [π(y₁) + ⋯ + π(y_k)] / [π(x*₁) + ⋯ + π(x*_k)] }

Ref: Frenkel and Smit (1996)
Random Ray Monte Carlo:

• Propose a random direction through the current point x.
• Pick y from trial points y₁, …, y₅ spaced along the ray.
• Correct for the MTM bias.

Back to the example.
An Interesting Twist

• One can choose the multiple tries semi-deterministically: place
  trial points y₁, …, y₈ on random equal grids along the line
  through x, with reference points x*₁, …, x*₈ on the corresponding
  grid through the selected y.
• Pick y from y₁, …, y₈.
• The correction rule is the same:

    p = min{ 1, [π(y₁) + ⋯ + π(y₈)] / [π(x*₁) + ⋯ + π(x*₈)] }
Use Local Optimization in MCMC

• The ADS formulation is powerful, but its direction is too “random.”
• How to make use of their framework?
  – Keep a population of samples S_t = {x_t^(1), …, x_t^(m)}.
  – Randomly select one member x_t^(c) to be updated.
  – Use the rest to determine an “anchor point” y_t.
• Here we can use local optimization techniques;
• Use MTM to draw a sample along the line, with the help of the
  Snooker Theorem.
[Figure: a distribution contour with the current point x_c, the
anchor point x_a, and a gradient or conjugate-gradient direction
connecting them.]
Numerical Examples

• An easy multimodal problem: an equal mixture of three bivariate
  normals,

    π = (1/3) [ N(0, I₂) + N((4,4)ᵀ, Σ) + N((6,6)ᵀ, Σ) ],

  where Σ has unit variances and correlation .9.
A More Difficult Test Example

• Mixture of 2 Gaussians in 5 dimensions:

    π(x) = (1/3) N₅(0, I₅) + (2/3) N₅((5, …, 5)ᵀ, I₅)

• MTM with CG can sample the distribution.
• The Random-Ray also worked well.
• The standard Metropolis cannot get across.
Fitting a Mixture Model

• Likelihood for a 3-component normal mixture:

    L(y₁, …, y_n | θ) = ∏_{i=1}^{n} { Σ_{j=1}^{3} (p_j/σ_j) φ((y_i − μ_j)/σ_j) }

• Prior: uniform in all parameters
  (log p₁, log p₂; μ_j, σ_j, j = 1, 2, 3), but with constraints

    μ₁ ≤ μ₂ ≤ μ₃;   σ_j ≥ σ_min;

  and each group has at least one data point.
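Evaluating the mixture likelihood above is straightforward on the log scale; the toy data and parameter values below are hypothetical, just to show the computation:

```python
import math

def log_likelihood(ys, p, mu, sigma):
    # log prod_i { sum_j (p_j / sigma_j) * phi((y_i - mu_j) / sigma_j) }
    total = 0.0
    for y in ys:
        dens = sum(
            pj / (sj * math.sqrt(2 * math.pi))
            * math.exp(-0.5 * ((y - mj) / sj) ** 2)
            for pj, mj, sj in zip(p, mu, sigma)
        )
        total += math.log(dens)
    return total

ys = [0.1, 0.3, 2.9, 3.2, 6.8]                  # toy data, 3 visible clusters
ll = log_likelihood(ys, p=[0.4, 0.3, 0.3],
                    mu=[0.0, 3.0, 7.0], sigma=[0.5, 0.5, 0.5])
print(round(ll, 2))
```

Summing logs rather than multiplying densities keeps the computation stable when n is large.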
Bayesian Neural Network Training

• Setting: Data = {(x₁, y₁), (x₂, y₂), …, (x_n, y_n)}
• Nonlinear curve fitting: y_t = f(x_t) + ε_t
• 1-hidden-layer feed-forward NN model (p inputs, M hidden units,
  activation ψ):

    f̂(x_t) = Σ_{j=1}^{M} β_j ψ(α_jᵀ x_t)

• Objective function for optimization, the posterior

    P ∝ ∏_t N(y_t | f̂(x_t), σ²) · N(α | 0, σ_α² I) · N(β | 0, σ_β² I)
Liang and Wong (1999) proposed a method that combines the Snooker
Theorem, MTM, exchange Monte Carlo, and a genetic algorithm.

Activation function: ψ(z) = tanh(z); number of hidden units: M = 2.