Upload
sylvie
View
55
Download
1
Embed Size (px)
DESCRIPTION
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License . CS 679: Text Mining. Lecture #9: Introduction to Markov Chain Monte Carlo, part 3. Slide Credit: Teg Grenager of Stanford University, with adaptations and improvements by Eric Ringger. - PowerPoint PPT Presentation
Citation preview
CS 679: Text Mining
Lecture #9: Introduction to Markov Chain Monte Carlo, part 3
Slide Credit: Teg Grenager of Stanford University, with adaptations and improvements by Eric Ringger.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
Announcements
Assignment #4 Prepare and give lecture on your 2 favorite papers Starting in approx. 1.5 weeks
Assignment #5 Pre-proposal Answer the 5 Questions!
Where are we? Joint distributions are useful for answering questions (joint or
conditional) Directed graphical models represent joint distributions Hierarchical bayesian models: make parameters explicit We want to ask conditional questions on our models
E.g., posterior distributions over latent variables given large collections of data Simultaneously inferring values of model parameters and answering the
conditional questions: “posterior inference” Posterior inference is a challenging computational problem Sampling is an efficient mechanism for performing that inference. Variety of MCMC methods: random walk on a carefully constructed
markov chain Convergence to desirable stationary distribution
Agenda
Motivation The Monte Carlo Principle Markov Chain Monte Carlo Metropolis Hastings Gibbs Sampling Advanced Topics
Metropolis-Hastings The symmetry requirement of the Metropolis proposal distribution
can be hard to satisfy Metropolis-Hastings is the natural generalization of the Metropolis
algorithm, and the most popular MCMC algorithm Choose a proposal distribution which is not necessarily symmetric Define a Markov chain with the following process:
Sample a candidate point x* from a proposal distribution q(x*|x(t)) Compute the importance ratio:
With probability min(r,1) transition to x*, otherwise stay in the samestate x(t)
)|q()p()|q()p()(*)(
*)(*
tt
t
xxxxxxr
MH convergence Theorem: The Metropolis-Hastings algorithm converges to the
target distribution p(x). Proof:
For all , WLOG assume
Thus, it satisfies detailed balance
),()p()|()p()|()p()|()p(
)|()p()|()p()|()p(
)|()p(),()p(
xyTyyxqyxyqxyxqy
yxqyyxqyxyqx
xyqxyxTx
candidate is always accepted (i.e., )b/c
transition prob.
multiply by 1
commute
Gibbs sampling A special case of Metropolis-Hastings which is applicable to state
spaces in which we have a factored state spaceAnd access to the full (“complete”) conditionals:
Perfect for Bayesian networks! Idea: To transition from one state (variable assignment) to another,
Pick a variable, Sample its value from the conditional distribution That’s it!
We’ll show in a minute why this is an instance of MH and thus must be sampling from joint or conditional distribution we wanted.
1 1 1p( | ,..., , ,..., ) p( | ) p( | )j j j n j j j jx x x x x x x x x
Markov blanket Recall that Bayesian networks
encode a factored representation of the joint distribution
Variables are independent of their non-descendents given their parents
Variables are independent of everything else in the network given their Markov blanket!
So, to sample each node, we only need to condition on its Markov blanket:
Gibbs sampling More formally, the proposal distribution is
The importance ratio is
So we always accept!
)|( )(* txxq )|( )(* tjj xxp if
0 otherwise* ( ) *
( ) * ( )
* ( ) ( )
( ) * *
* ( ) ( ) *
( ) * * (
( )
*
)
* *
( ) ( )
*
( )
p( )q( | )p( )q( | )p( ) p( | )p( ) p( | )
p( ) p( , ) p( )p( ) p( , ) p( )
p( ) p( ) p( )p( ) p( ) p( )
p( )1
p( )
t
t t
t tj j
tj j
t tj j j
t
t
tj j j
jt t
j
jtj
x x xrx x xx x xx x x
x x x xx x x x
x x xx x x
xx
Dfn of proposal distribution and
Dfn of conditional probability (twice)
Definition of
Cancel common terms
Gibbs sampling example Consider a simple, 2 variable Bayes net
Initialize randomly Sample variables alternately
A
B
B=1 B=0
A=1 0.8 0.2
A=0 0.2 0.8
A=1 A=0
0.5 0.5 b b
a
a
T
F
1
F
TF
1
F
T1
T
1
TT
Gibbs Sampling (in 2-D)
(MacKay, 2002)
Practical issues
How many iterations? How to know when to stop? M-H: What’s a good proposal function? Gibbs: How to derive the complete
conditionals?
Advanced Topics Simulated annealing, for global optimization, is a
form of MCMC Mixtures of MCMC transition functions Monte Carlo EM
stochastic E-step i.e., sample instead of computing full posterior
Reversible jump MCMC for model selection Adaptive proposal distributions
Next
Document Clustering by Gibbs Sampling on a Mixture of Multinomials From Dan Walker’s dissertation!