View
229
Download
0
Tags:
Embed Size (px)
Citation preview
Introduction to Monte Carlo MethodsD.J.C. Mackay
Rasit Onur Topaloglu
Ph.D. candidate
Outline•Problems Monte Carlo (MC) deals with•Uniform Sampling•Importance Sampling•Rejection Sampling•Metropolis•Gibbs Sampling•Speeding up MC : Hybrid MC and Over-relaxation
Definition of Problems
Problem1: Generate samples from a distribution
Problem2: Estimate expectation of a function given probability distribution for a variable
Monte Carlo for High Dimensions
•The accuracy is independent of the dimensionality of the space sampled
•Solving first problem => solving second one •Just evaluate function using the samples and average them out
Sampling from P(x)
•If P*(x) can be evaluated, still a hard problem as:• Z is not known•Not easy to draw at high dimensions (except Gaussian)
ex: A sample from univariate Gaussian can be calculated as :
•Assume we can evaluate P(x) within a multiplicative constant
u1 and u2 are uniform in [0,1]
Difficulty of Sampling at High Dimensions
•Can discretize the function (figure on right) and sample the discrete samples
•This is costly at high dimensions: BN for B bins and N dimensions
Uniform Sampling
•Draw samples uniformly from state space
•For distributions that have peaks at a small region, lots of points must be sampled so that (x) is calculated a number of times => requires lots of samples•Thus, uniform distribution seldom useful
•ZR is the normalizing constant:
•Estimate by:
•Tries to solve the 2nd problem
Importance Sampling•Tries to solve the 2nd problem
•Introduce a simpler density Q
•Values of x where Q(x)>P(x) are over-represented; values of x where Q(x)<P(x) are under-represented=>
introduce weights
Reliability of Importance Sampling
•An importance sampler should have heavy tails in problems where infrequent samples might be influential
The estimate vs number of samplesGaussian sampler Cauchy Sampler
Rejection Sampling•Again, a Q (proposal density) assumed
•Also assume we know a constant c s.t.:
•Generate x using Q*(x)•Find r.v. u uniformly in interval [0,cQ*(x)]•If u<=P*(x), accept and add x to the random number list
Transition to Markov Chain MC
•Importance and rejection sampling work well if Q is similar to P•In complex problems, it is difficult to find a single such Q over the state space
Metropolis Method•Proposal density Q depends on current state x(t):Q(x`;x(t))
Accept new state if a1,Else; accept with probability a:
•In comparison to rejection sampling, the rejected points are not discarded and hence influential on consequent samples => samples correlated
=> Metropolis may have to be run more to generate independent samples from P(x)!
Disadvantages of Large Step Size•Useful for high dimensions•Uses a length scale much smaller than state space L
•Because:•Large steps are highly unlikely to be accepted
=> Limited or no movement in space! => biased
A Lower Bound for Independent Samples
•Metropolis will explore using a random walk
•Random walks take a long time
•After T steps of size , state only moves T
•As Monte Carlo is trying to achieve independent samples, requires ~(L/ )2 samples to get a sample
independent of the initial condition
•This rule of thumb be used for a lower bound
An Example using this Bound
tim
e100th iteration
400th iteration
1200th iteration
•Metropolis still provides biases estimates even after a large number of iterations
•Takes ~ 102=100 steps to reach an end state
Gibbs Sampling
•As opposed to previous methods, at least 2 dimensional distributions required
•Q defined in terms of conditional distributions of joint distribution P(x)
•The assumption is P(x) complex to evaluate but P(xi|{xj}ji) is tractable
Gibbs Sampling on an Example
•Start with x=(x1,x2), fix x2(t) and sample x1 from P(x1|x2)
•Fix x1 and sample x2 from P(x2|x1)
Gibbs Sampling with K Variables
•In comparison to Metropolis, every proposal is always accepted
•In bigger models, groups of variables jointly sampled:
Comparison of MC Methods in High Dimensions
•Importance and rejection sampling result in high weights and constants respectively, resulting in inaccurate or length simulations => not practical
•Metropolis requires at least (max/ min)2 samples to acquire independent samples => might be lengthy
•Gibbs sampling has similar properties with Metropolis. No adjustable parameters => most practical
Practical Questions About Monte Carlo
•Can we predict how long it takes for equilibrium:Use the simple bound proposed
•Can we determine convergence in a running simulation: yet another difficult problem
•Can we speed up convergence time and time between independent samples?
Reducing Random Walk in Metropolis : Hybrid MC
•Most probabilities can be written in the form:
•Introduce a momentum variable p:
•K is kinetic energy:
•Create asymptotic samples from joint distribution:
•Pick p randomly from:
•Update x and p:
•This results in linear time convergence
An Illustrative Example for Hybrid MC
Hybrid Hybrid
Randomwalk
Randomwalk
•Over a number of iterations, hybrid trajectories indicate lesscorrelated samples
Reducing Random Walk in Gibbs : Over-relaxation
•Use former value x(t) as well to calculate x(t+1) :
•Useful for Gaussians, not straightforward for other types of conditional distributions
•Suitable to speed up the process when variables are highly correlated
An Illustrative Example for Over-Relaxation
•State space for a bi-variate Gaussian for 40 iterations
•Over-relaxation samples better covers the state space
Reducing Random : Simulated Annealing
•Introduce a parameter T (temperature) and gradually reduce it to 1:
•High T corresponds to being able to make transitionseasier
•As opposed to its use in optimization, T is not reduced to 0 but 1
Applications of Monte CarloDifferential Equations
Ex: Steady-state temperature distribution of an annulus:
The finite difference approximation using grid dimension h:
•With ¼ probability, one of the states is selected
•The value when a boundary is reached is kept, the mean of many such runs gives the solution
Applications of Monte CarloIntegration
To evaluate:
•Bound the function with a box
•Ratio of points under function to all points within the box gives the ratio of the areas of the function and the box
Applications of Monte CarloImage Processing :Automatic eye-glass removal:
•MCMC used instead of a gradient-based methods to solveMAP criterion that is used to locate points of eye-glasses
[C. Wu, C. Liu, H.-Y. Shum, Y.-Q. Xu and Z. Zhang, “Automatic Eyeglasses Removal from Face Images”, IEEE Tran. On Pattern Analysis and Machine Intelligence, Vol.26, No.3, pp. 322-336, Mar.2004]
Applications of Monte CarloImage segmentation:
•Data-driven techniques such as edge detection, tracing and clustering combined with MCMC to speed up the search
[Z. Tu and S.-C. Zhu, “Image Segmentation by Data-Driven Markov Chain Monte Carlo”, IEEE Tran. On Pattern Analysis and Machine Intelligence, Vol.24, No.5, pp. 657-673, May 2002]
The Problem
•The tree relates physical parameters to circuit parameters•Structured according to SPICE formula hierarchy•Given pdf’s for lowest level parameters; find pdf’s at highest
level
Lowest Level:
High level:
Ex: gm=(2*k*Id)
Motivation for Probability Propagation
Find a novel propagation method
•Estimation of distributions of high level parameters needed to examine effects of process variations•Gaussian assumption attributed to these parameters no longer accurate in current technologies
GOALS
Determinism : a stochastic output using known formulas
Algebraic tractability : enabling manual applicability
Speed & Accuracy : be comparable or outperform Monte Carloand other parametric methods
Parametric Belief Propagation
•Each node receives and sends messages to parents and children until equilibrium•Parent to child () : causal information
•Parent to parent () : diagnostic information
Calculationshandledat each node:
Parametric Belief Propagation
•When arrows in the hierarchy tree indicate linear addition operations on Gaussians, analytic formulations possible•Not straightforward for other distributions or non-standard distributions
Shortcomings of Monte Carlo
•Non-determinism : Not manually applicable
•Limited for certain distributions : Random number generators in most CAD packages only provide certain distributions
•Accuracy : May miss points that are less likely to occur due to random sampling unless very large number of samples used; limited by the performance of random number generator
P1
P2
Monte Carlo – FDPP Comparison one-to-many relationships and custom pdf’s
P3
P4
•Non-standard pdf’s not possible without a custom random number generator
•Monte Carlo overestimates in one-to-many relationships as same sample is used
Operations Necessary to Implement FDPP
•F (Forward) : Given a function, estimates the distribution of next node in the formula hierarchy
•Q (Quantize) : Discretize a pdf to operate on its samples
•B (Bandpass) : Decrements number of samples for computational efficiency
•R (Re-bin) : Reduces number of samples for computational efficiency
Analytic operation on continuous distributions difficult; instead operations on discrete distributions implemented :
T NSUB
PHIf
Necessary Operators (Q, F, B, R) on a Hierarchical Tree
Q Q
F
B,R
•Repeated until we acquire the high level distribution (ex. G)
spdf(X) or (X)pdf(X)
p-domain r-domain
Probability Discretization Theory: QN Operator; p and r domains
•QN band-pass filter pdf(X) and divide into bins
))(()( XpdfQX N
N in QN indicates number or bins
Certain operators easy to apply in r-domain
spdf(X) or (X)
r-domain
Characterizing an spdf
Ni
ii wxpX..1
)()(
•can write spdf(X) as :
im
im
i dxXpdfp)1(
)(
2)1(
imwi
where :
pi : probability for i’th impulse
wi : value of i’th impulse
F Operator •F operator implements a function over spdf’s
spdf(X) or (X)
))(),..,(()( 1 rXXFY Xi, Y : random variables
r
r
rss
Xs
Xs
Xs
Xs wwfyppY
,..,1
1
1
1
1
1
1)),..,((..)(
pXs : Set of all samples s belonging to X
•Function applied to individual impulses•Individual probabilities multiplied
))(()(' XBX e
Band-pass, Be, Operator
))((]),[(:
)()(Xwnmwi
ii
ii
wxpX
•Eliminate samples having values out of rangeMargin-based Definition:
))(()
)(max(:
)()(Xp
e
ppi
ii
iii
i
wxpX
Error-based Definition:•Eliminate samples having probabilities least likely to occur
Re-bin, RN, Operator ))(()(' XRX N
•Samples falling into the same bin congregated in one
i
ii wxpX )()( ijs
ji bwstppj
.where :
Impulses after F Unite into one bin Resulting spdf(X)
Error Analysis
12
2
jbjiji
ji pwmdi
)(:,
),(Total distortion:
dqQpdfqQEQ )(][2/
2/
222
Variance of quantization error:
•If quantizer uniform and small, quantization error random variable Q is uniformly distributed
2)(),( jiji wmwmd Distortion caused by representing samples in a bin by a single sample:
mi : center or i’th bin
Algorithm Implementing the F Operator
While each random variable has its spdf computed
For each rv. which has all ancestor spdf’s computed
For each sample in X1
For each sample in Xr
Place an impulse with height p1,..,pr at x=f(v1,..,vr)
Apply B and R algorithms to this rv.
Algorithm for the B and R Operators
Divide this range into M bins
For each binPlace a quantizing impulse at the center of the bin with a height pi equal to the sum of all impulses within bin
Find maximum probability, pi-max, of quantized impulses within bins
Find new maximum and minimum values wi within impulses
Divide this range into N bins
Find maximum and minimum values wi within impulses
Eliminate impulses within bins which have a quantized impulse with smaller probability than error-rate*pi-max
For each binPlace an impulse at the center of the bin with height equal to sum of all impulses within bin
•A close match is observed after interpolation
Monte Carlo – FDPP Comparison
solid : FDPP dotted : Monte Carlo
Pdf of VthPdf of ID
Monte Carlo – FDPP Comparison with a Low Sample Number
•Monte Carlo inaccurate for moderate number of samples•Indicates FDPP can be manually applied without major accuracy degradation
solid : FDPP,100 samples
Pdf of FPdf of F
noisy : Monte Carlo, 1000 and 100000 samples respectively
•Edges define a linear sum, ex: n5=n2+n3
Monte Carlo – FDPP ComparisonPdf of n7Benchmark example
solid : FDPP dotted : Monte Carlo triangles:belief propagation
•Distributions at internal nodes n4, n5, n6 should be re-sampled using Monte Carlo
Faulty Application of Monte Carlo
•Not optimal for internal nodes with non-standard distributions
Pdf of n7Benchmark example
solid : FDPP dotted : Monte Carlo triangles:belief propagation